aridkpi.io

Data loaders for common monitoring and simulation file formats.

Currently supported:

  • HOBO MX series CSV (Onset)

  • Generic CSV with timestamp + temperature columns

  • EnergyPlus .csv output (zone mean air temperature and zone operative temperature)

  • Time-series validation before KPI computation (ensure_valid)

All loaders return a pandas DataFrame with a tz-naive DatetimeIndex sorted in ascending order. NaN values are preserved; validate each column with ensure_valid before computing KPIs.
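The output contract above (tz-naive index, ascending order, NaN kept) can be expressed as a short normalization step. This is a minimal sketch: `normalize_index` is a hypothetical helper, not part of aridkpi, and it assumes that timezone-aware inputs should keep their local wall-clock times when the timezone is dropped.

```python
import pandas as pd


def normalize_index(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: coerce to a tz-naive, ascending DatetimeIndex."""
    idx = pd.DatetimeIndex(df.index)
    if idx.tz is not None:
        # Assumption: drop the timezone but keep local wall-clock times.
        idx = idx.tz_localize(None)
    out = df.copy()
    out.index = idx
    # Sorting only reorders rows; NaN values are neither dropped nor filled.
    return out.sort_index()
```

If loggers from different time zones are combined, converting to UTC first (`idx.tz_convert("UTC").tz_localize(None)`) would be the safer variant.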

aridkpi.io.ensure_valid(series: Series, *, name: str = 'series', require_monotonic: bool = True, max_gap_minutes: float | None = None) → None

Run sanity checks on a time series before KPI computation.

Raises a ValueError with an informative message if any check fails. Does not modify the series.

Parameters:
  • series – The series to validate.

  • name – Display name for error messages.

  • require_monotonic – If True, fail when the index is not strictly increasing.

  • max_gap_minutes – If set, fail when any gap between consecutive timestamps exceeds this threshold (minutes). Useful for detecting unflagged outages.
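The monotonicity and gap checks can be sketched with plain pandas. This is a hypothetical `check_series`, not aridkpi's implementation; the DatetimeIndex type check is an added assumption, needed because the gap test relies on datetime arithmetic.

```python
from typing import Optional

import pandas as pd


def check_series(series: pd.Series, *, name: str = "series",
                 require_monotonic: bool = True,
                 max_gap_minutes: Optional[float] = None) -> None:
    """Hypothetical sketch of the documented checks; raises ValueError, never mutates."""
    idx = series.index
    if not isinstance(idx, pd.DatetimeIndex):
        # Assumed extra check: the gap test below needs datetime arithmetic.
        raise ValueError(f"{name}: index must be a DatetimeIndex")
    # "Strictly increasing" means sorted AND free of duplicate timestamps.
    if require_monotonic and not (idx.is_monotonic_increasing and idx.is_unique):
        raise ValueError(f"{name}: index is not strictly increasing")
    if max_gap_minutes is not None and len(idx) > 1:
        gaps = idx.to_series().diff().dropna()  # Timedelta between neighbours
        worst = gaps.max()
        if worst > pd.Timedelta(minutes=max_gap_minutes):
            raise ValueError(
                f"{name}: gap of {worst} ending at {gaps.idxmax()} "
                f"exceeds {max_gap_minutes} min"
            )
```

Reporting the timestamp where the largest gap ends makes an unflagged outage easy to locate in the raw logger file.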

aridkpi.io.load_energyplus_csv(path: str | Path, zone_air_temp_col: str | None = None, operative_temp_col: str | None = None) → DataFrame

Load an EnergyPlus .csv output (eplusout.csv).

EnergyPlus writes the Date/Time column as MM/DD HH:MM:SS, with no year. The loader therefore assumes the simulation year is the current year unless the year is embedded in the column name, which EnergyPlus does not do by default.
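Supplying the missing year can be sketched for a single timestamp string. `parse_eplus_stamp` is a hypothetical helper; the whitespace handling assumes EnergyPlus may pad the date and time fields with extra spaces.

```python
from datetime import datetime
from typing import Optional


def parse_eplus_stamp(raw: str, year: Optional[int] = None) -> datetime:
    """Hypothetical helper: parse 'MM/DD HH:MM:SS' by supplying the missing year."""
    if year is None:
        year = datetime.now().year  # mirrors the documented default (current year)
    # split() tolerates padding spaces around and between the two fields.
    date_part, time_part = raw.split()
    return datetime.strptime(f"{year}/{date_part} {time_part}", "%Y/%m/%d %H:%M:%S")
```

Because the year is a guess, a 02/29 row from a leap-year run raises ValueError whenever the current year is not a leap year; passing the year explicitly avoids this. The 24:00:00 convention described in the Notes also needs separate handling, since `%H` only accepts hours 00 to 23.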

Parameters:
  • path – Path to the EnergyPlus CSV.

  • zone_air_temp_col – Substring used to match the zone mean air temperature column. If None, auto-detected by matching "Zone Mean Air Temperature".

  • operative_temp_col – Substring used to match the zone operative temperature column. If None, auto-detected by matching "Zone Operative Temperature".
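The substring-based column selection can be sketched as below. Both helpers are hypothetical; the sketch assumes a case-sensitive match where the first matching column wins, so a multi-zone model would need a more specific substring (for example, one that includes the zone name). The example headers follow EnergyPlus's `KEY:Variable [unit](frequency)` style and are illustrative only.

```python
from typing import Dict, Optional, Sequence


def find_column(columns: Sequence[str], fragment: str) -> Optional[str]:
    """Return the first column name containing `fragment` (case-sensitive), else None."""
    for col in columns:
        if fragment in col:
            return col
    return None


def pick_temperature_columns(columns: Sequence[str],
                             zone_air_temp_col: Optional[str] = None,
                             operative_temp_col: Optional[str] = None) -> Dict[str, str]:
    """Hypothetical helper: map output names (T_air, T_op) to source columns."""
    found = {
        "T_air": find_column(columns, zone_air_temp_col or "Zone Mean Air Temperature"),
        "T_op": find_column(columns, operative_temp_col or "Zone Operative Temperature"),
    }
    # Keep only what was found, matching the "T_air and/or T_op" return contract.
    return {out: col for out, col in found.items() if col is not None}


cols = [
    "Date/Time",
    "Environment:Site Outdoor Air Drybulb Temperature [C](Hourly)",
    "ZONE ONE:Zone Mean Air Temperature [C](Hourly)",
]
mapping = pick_temperature_columns(cols)  # only T_air: no operative temperature column
```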

Returns:

pd.DataFrame – Columns T_air and/or T_op, indexed by a DatetimeIndex.

Notes

EnergyPlus timestamps use 24:00:00 to denote midnight at the end of the day. We map this to 00:00:00 of the next day before parsing.
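The 24:00:00 mapping can be sketched as a vectorized parse: rewrite the hour to 00:00:00, parse, then add one day back to the affected rows. `parse_eplus_index` is hypothetical and takes the year as an explicit argument.

```python
import pandas as pd


def parse_eplus_index(stamps: pd.Series, year: int) -> pd.DatetimeIndex:
    """Sketch: map '24:00:00' to 00:00:00 of the next day, then parse with a fixed year."""
    # Collapse padding such as " 01/01  24:00:00" to single spaces.
    s = stamps.str.split().str.join(" ")
    is_2400 = s.str.endswith("24:00:00")
    # Rewrite the hour so the parser accepts it; the lost day is added back below.
    s = s.str.replace("24:00:00", "00:00:00", regex=False)
    parsed = pd.to_datetime(f"{year}/" + s, format="%Y/%m/%d %H:%M:%S")
    parsed = parsed + pd.to_timedelta(is_2400.astype(int), unit="D")
    return pd.DatetimeIndex(parsed)


idx = parse_eplus_index(pd.Series([" 12/31  23:00:00", " 12/31  24:00:00"]), 2023)
```

Adding the day after parsing (rather than editing the date string) lets the final 12/31 24:00:00 row roll over into January 1 of the following year without special-casing month or year ends.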

aridkpi.io.load_generic_csv(path: str | Path, timestamp_col: str = 'timestamp', columns: Iterable[str] | None = None, parse_kwargs: dict | None = None) → DataFrame

Load any CSV with a timestamp column and one or more numeric columns.

Parameters:
  • path – Path to the CSV.

  • timestamp_col – Name of the timestamp column.

  • columns – Iterable of column names to keep besides the timestamp. If None, all numeric columns are kept.

  • parse_kwargs – Extra keyword arguments forwarded to pandas.read_csv.

Returns:

pd.DataFrame – DataFrame with the requested columns indexed by DatetimeIndex.
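The documented behaviour maps onto a few pandas calls. This `read_timestamped_csv` is a hypothetical sketch, not aridkpi's implementation; it accepts any path or buffer that `pandas.read_csv` accepts.

```python
import io
from typing import Iterable, Optional

import pandas as pd


def read_timestamped_csv(path_or_buf, timestamp_col: str = "timestamp",
                         columns: Optional[Iterable[str]] = None,
                         parse_kwargs: Optional[dict] = None) -> pd.DataFrame:
    """Hypothetical sketch of the documented behaviour."""
    df = pd.read_csv(path_or_buf, **(parse_kwargs or {}))  # extra kwargs go to read_csv
    df[timestamp_col] = pd.to_datetime(df[timestamp_col])
    df = df.set_index(timestamp_col)
    if columns is None:
        df = df.select_dtypes(include="number")  # default: keep every numeric column
    else:
        df = df[list(columns)]
    return df.sort_index()


sample = "timestamp,T,label\n2024-06-01 01:00,25.5,a\n2024-06-01 00:00,24.0,b\n"
df = read_timestamped_csv(io.StringIO(sample))  # keeps only T; rows sorted by time
```

Semicolon-separated exports, for instance, would pass `parse_kwargs={"sep": ";"}`.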

aridkpi.io.load_hobo_csv(path: str | Path, timestamp_col: str = 'Date Time', temperature_col: str | None = None, rh_col: str | None = None, skiprows: int = 1) → DataFrame

Load a HOBO MX-series CSV exported by Onset HOBOware / HOBOconnect.

HOBO CSVs typically have one metadata line above the row of column names. The timestamp column is parsed to datetimes automatically. Temperature and RH columns can be named explicitly or auto-detected by name fragment.

Parameters:
  • path – Path to the CSV.

  • timestamp_col – Name of the timestamp column.

  • temperature_col – Name of the temperature column. If None, the first column whose name contains “Temp” (case-insensitive) is used.

  • rh_col – Name of the RH column. If None, the first column whose name contains “RH” or “Humidity” is used; if no column matches, the result has no RH column.

  • skiprows – Number of metadata rows to skip before the header (HOBO default = 1).
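The skip-and-detect logic can be sketched as follows. `read_hobo_like` and `first_match` are hypothetical; the case-insensitive match for “RH”/“Humidity” and the KeyError on a missing temperature column are assumptions (the source states case-insensitivity only for “Temp”). Real HOBO timestamps such as `07/01/24 12:00:00 AM` may need an explicit `format` for `pd.to_datetime`; the sample below uses ISO timestamps.

```python
import io
from typing import Iterable, Optional, Sequence

import pandas as pd


def first_match(columns: Sequence[str], fragments: Iterable[str]) -> Optional[str]:
    """First column whose name contains any fragment, compared case-insensitively."""
    for col in columns:
        if any(frag.lower() in col.lower() for frag in fragments):
            return col
    return None


def read_hobo_like(path_or_buf, timestamp_col: str = "Date Time",
                   temperature_col: Optional[str] = None,
                   rh_col: Optional[str] = None, skiprows: int = 1) -> pd.DataFrame:
    """Hypothetical sketch of the documented HOBO behaviour."""
    raw = pd.read_csv(path_or_buf, skiprows=skiprows)  # skip metadata line(s)
    temp = temperature_col or first_match(raw.columns, ["Temp"])
    if temp is None:
        raise KeyError("no column name contains 'Temp'")
    rh = rh_col or first_match(raw.columns, ["RH", "Humidity"])
    rename = {temp: "T"}
    if rh is not None:
        rename[rh] = "RH"  # RH is optional: omitted when no column matches
    out = raw[list(rename)].rename(columns=rename)
    out.index = pd.DatetimeIndex(pd.to_datetime(raw[timestamp_col]))
    return out.sort_index()


sample = (
    "Plot Title: test logger\n"
    "#,Date Time,Temp (C),RH (%)\n"
    "1,2024-07-01 00:10:00,31.2,18.0\n"
    "2,2024-07-01 00:00:00,30.9,18.5\n"
)
df = read_hobo_like(io.StringIO(sample))  # columns T and RH, sorted by time
```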

Returns:

pd.DataFrame – Columns T (and RH, if found), indexed by a DatetimeIndex.

Raises:

FileNotFoundError, KeyError – Raised (via pandas) if the file does not exist or a required column is not found.