aridkpi.io
Data loaders for common monitoring and simulation file formats.
Currently supported:
HOBO MX series CSV (Onset)
Generic CSV with timestamp + temperature columns
EnergyPlus .eso/.csv output (zone mean air temperature)
Pre-aggregated DataFrame validation (ensure_valid)
All loaders return a pandas Series or DataFrame with a tz-naive
DatetimeIndex sorted in ascending order. NaN values are preserved (use
ensure_valid to validate before computing KPIs).
- aridkpi.io.ensure_valid(series: Series, *, name: str = 'series', require_monotonic: bool = True, max_gap_minutes: float | None = None) → None
Run sanity checks on a time series before KPI computation.
Raises a ValueError with an informative message if any check fails. Does not modify the series.
- Parameters:
series – The series to validate.
name – Display name for error messages.
require_monotonic – If True, fail when the index is not strictly increasing.
max_gap_minutes – If set, fail when any gap between consecutive timestamps exceeds this threshold (minutes). Useful for detecting unflagged outages.
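The two documented checks can be sketched as follows. This is a minimal standalone illustration, not the library's implementation: `check_series` is a hypothetical name, and the real `ensure_valid` may run additional checks or word its messages differently.

```python
import pandas as pd


def check_series(series, *, name="series", require_monotonic=True,
                 max_gap_minutes=None):
    """Hypothetical stand-in for the checks described above."""
    idx = series.index

    # "Strictly increasing" means sorted ascending with no duplicate timestamps.
    if require_monotonic and not (idx.is_monotonic_increasing and idx.is_unique):
        raise ValueError(f"{name}: index is not strictly increasing")

    # Compare the largest gap between consecutive timestamps to the threshold.
    if max_gap_minutes is not None and len(idx) > 1:
        largest_gap = idx.to_series().diff().dropna().max()
        if largest_gap > pd.Timedelta(minutes=max_gap_minutes):
            raise ValueError(
                f"{name}: gap of {largest_gap} exceeds {max_gap_minutes} minutes"
            )


# Illustrative data: a 60-minute hole between the second and third samples.
idx = pd.to_datetime(["2024-07-01 00:00", "2024-07-01 00:10", "2024-07-01 01:10"])
s = pd.Series([30.0, float("nan"), 31.0], index=idx)

check_series(s)  # passes: NaN values alone do not trigger either check here
try:
    check_series(s, name="T_air", max_gap_minutes=30)
except ValueError as err:
    print(err)
```

The series is only read, never modified, which matches the documented contract.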
- aridkpi.io.load_energyplus_csv(path: str | Path, zone_air_temp_col: str | None = None, operative_temp_col: str | None = None) → DataFrame
Load an EnergyPlus .csv output (eplusout.csv).
EnergyPlus reports timestamps in the form MM/DD HH:MM:SS for the Date/Time column. We assume the simulation year is the current year unless embedded in the column name, which EnergyPlus does not do by default.
- Parameters:
path – Path to the EnergyPlus CSV.
zone_air_temp_col – Substring used to match the zone mean air temperature column. If None, auto-detected by matching "Zone Mean Air Temperature".
operative_temp_col – Substring used to match the zone operative temperature column. If None, auto-detected by matching "Zone Operative Temperature".
- Returns:
pd.DataFrame – Columns T_air and/or T_op, indexed by a DatetimeIndex.
Notes
EnergyPlus timestamps use 24:00:00 to denote midnight at the end of the day. We map this to 00:00:00 of the next day before parsing.
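The 24:00:00 mapping described in the note can be illustrated with the standard library alone. This is a sketch under stated assumptions: `parse_eplus_timestamp` and its explicit `year` argument are hypothetical, and the loader's own parsing code may differ.

```python
from datetime import datetime, timedelta


def parse_eplus_timestamp(text, year):
    """Parse 'MM/DD HH:MM:SS', treating 24:00:00 as midnight of the next day."""
    # split() also absorbs the padding spaces EnergyPlus puts around the fields.
    date_part, time_part = text.split()

    rollover = time_part.startswith("24:")
    if rollover:
        # strptime rejects hour 24, so parse as 00:00:00 and add a day afterwards.
        time_part = "00" + time_part[2:]

    stamp = datetime.strptime(f"{year}/{date_part} {time_part}", "%Y/%m/%d %H:%M:%S")
    return stamp + timedelta(days=1) if rollover else stamp


print(parse_eplus_timestamp(" 07/01  24:00:00", 2023))
print(parse_eplus_timestamp(" 12/31  24:00:00", 2023))
```

Adding the day after parsing, rather than editing the date string, lets month and year boundaries roll over without special cases.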
- aridkpi.io.load_generic_csv(path: str | Path, timestamp_col: str = 'timestamp', columns: Iterable[str] | None = None, parse_kwargs: dict | None = None) → DataFrame
Load any CSV with a timestamp column and one or more numeric columns.
- Parameters:
path – Path to the CSV.
timestamp_col – Name of the timestamp column.
columns – Iterable of column names to keep besides the timestamp. If None, all numeric columns are kept.
parse_kwargs – Extra keyword arguments forwarded to pandas.read_csv.
- Returns:
pd.DataFrame – DataFrame with the requested columns indexed by DatetimeIndex.
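The documented behaviour corresponds roughly to the pandas steps below. This is a hedged approximation rather than the library's code: `load_csv_sketch` is a hypothetical name, and the sample CSV text is made up for illustration.

```python
import io

import pandas as pd


def load_csv_sketch(source, timestamp_col="timestamp", columns=None,
                    parse_kwargs=None):
    """Hypothetical approximation of the behaviour documented above."""
    df = pd.read_csv(source, **(parse_kwargs or {}))

    # Parse the timestamps, index by them, and sort ascending.
    df[timestamp_col] = pd.to_datetime(df[timestamp_col])
    df = df.set_index(timestamp_col).sort_index()

    # Keep the requested columns, or every numeric column when none are given.
    if columns is None:
        return df.select_dtypes(include="number")
    return df[list(columns)]


# Made-up sample: rows out of order, plus one non-numeric column.
raw = (
    "timestamp,T,site\n"
    "2024-07-01 01:00,31.5,A\n"
    "2024-07-01 00:00,30.0,A\n"
)
df = load_csv_sketch(io.StringIO(raw))
print(df)
```

In this sample the non-numeric `site` column is dropped because `columns` is None, and the rows come back in ascending time order.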
- aridkpi.io.load_hobo_csv(path: str | Path, timestamp_col: str = 'Date Time', temperature_col: str | None = None, rh_col: str | None = None, skiprows: int = 1) → DataFrame
Load a HOBO MX-series CSV exported by Onset HOBOware / HOBOconnect.
HOBO CSVs typically have a 1-line header above the column names. The timestamp is parsed automatically. Temperature and RH columns can be provided explicitly or auto-detected by name fragment.
- Parameters:
path – Path to the CSV.
timestamp_col – Name of the timestamp column.
temperature_col – Name of the temperature column. If None, the first column whose name contains “Temp” (case-insensitive) is used.
rh_col – Name of the RH column. If None, the first column whose name contains “RH” or “Humidity” is used (or no RH if none found).
skiprows – Number of metadata rows to skip before the header (HOBO default = 1).
- Returns:
pd.DataFrame – Columns T (and optionally RH), indexed by a DatetimeIndex.
- Raises:
FileNotFoundError, KeyError – Standard pandas exceptions if the file or columns are not found.
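The name-fragment auto-detection described for temperature_col and rh_col can be sketched in plain Python. `detect_columns` is a hypothetical helper and the header strings are invented examples. The documentation only specifies case-insensitive matching for “Temp”, so matching “RH” exactly and “Humidity” case-insensitively is an assumption here.

```python
def detect_columns(column_names):
    """Return (temperature_column, rh_column); either may be None."""
    # First column whose name contains "temp", ignoring case.
    temperature = next(
        (c for c in column_names if "temp" in c.lower()), None
    )
    # Assumption: "RH" is matched as written, "humidity" ignoring case.
    rh = next(
        (c for c in column_names if "RH" in c or "humidity" in c.lower()), None
    )
    return temperature, rh


# Invented header rows, for illustration only.
with_rh = ["#", "Date Time", "Temperature (C)", "RH (%)"]
without_rh = ["#", "Date Time", "Ch1 Temp"]

print(detect_columns(with_rh))
print(detect_columns(without_rh))
```

Returning None for a missing RH column mirrors the documented "no RH if none found" behaviour, leaving the caller to decide whether the RH column is emitted.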