ehrapy.preprocessing.qc_lab_measurements

ehrapy.preprocessing.qc_lab_measurements(edata, *, layer=None, var_names=None, method='iqr', score_type='zscore', add_flag=True, add_score=True, groupby=None, copy=False)

Flag outliers and compute anomaly scores for numeric variables.

For each requested variable the function adds up to two columns in edata.obs:

  • {var}_outlier – boolean flag (True = outlier).

  • {var}_score – continuous anomaly score.

Parameters:
  • edata (EHRData) – Central data object.

  • var_names (list[str] | None, default: None) – Variables to evaluate. None (default) evaluates all variables in edata.var_names.

  • layer (str | None, default: None) – Layer to use instead of edata.X.

  • method (Literal['quantile', 'iqr', 'zscore', 'modified_zscore'], default: 'iqr') –

    Outlier detection method.

    • "iqr" – outside [Q1 − 1.5·IQR, Q3 + 1.5·IQR].

    • "quantile" – outside [2.5th, 97.5th] percentiles.

    • "zscore" – |z| > 3.

    • "modified_zscore" – |modified z| > 3.5 (computed from the median and MAD).

  • score_type (Literal['zscore', 'iqr_distance', 'percentile'], default: 'zscore') –

    Continuous score assigned to each observation.

    • "zscore" – (x − mean) / std.

    • "iqr_distance" – (x − median) / IQR.

    • "percentile" – percentile rank in [0, 100].

  • add_flag (bool, default: True) – Whether to add the {var}_outlier column.

  • add_score (bool, default: True) – Whether to add the {var}_score column.

  • groupby (str | None, default: None) – Column in edata.obs used to stratify the computation so that statistics are calculated within each group independently.

  • copy (bool, default: False) – If True, return a modified copy; otherwise modify in place.
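
The rules listed under method and score_type can be sketched in plain NumPy as follows. This is an illustration of the documented formulas only, not ehrapy's implementation: the helper names flag_outliers and anomaly_score are hypothetical, and the NaN handling, the population standard deviation (ddof=0), the 0.6745 scaling constant for the modified z-score, and the exact definition of the percentile rank are all assumptions.

```python
import numpy as np


def flag_outliers(x, method="iqr"):
    """Boolean mask (True = outlier) for a 1-D numeric array.

    Hypothetical helper mirroring the documented rules; NaNs are ignored
    when computing statistics (assumption).
    """
    x = np.asarray(x, dtype=float)
    if method == "iqr":
        q1, q3 = np.nanpercentile(x, [25, 75])
        iqr = q3 - q1
        return (x < q1 - 1.5 * iqr) | (x > q3 + 1.5 * iqr)
    if method == "quantile":
        lower, upper = np.nanpercentile(x, [2.5, 97.5])
        return (x < lower) | (x > upper)
    if method == "zscore":
        z = (x - np.nanmean(x)) / np.nanstd(x)  # ddof=0 is an assumption
        return np.abs(z) > 3
    if method == "modified_zscore":
        median = np.nanmedian(x)
        mad = np.nanmedian(np.abs(x - median))
        # 0.6745 is the conventional scaling constant; assumed here.
        # A MAD of zero would need special handling in real code.
        return np.abs(0.6745 * (x - median) / mad) > 3.5
    raise ValueError(f"unknown method: {method!r}")


def anomaly_score(x, score_type="zscore"):
    """Continuous score per observation (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    if score_type == "zscore":
        return (x - np.nanmean(x)) / np.nanstd(x)
    if score_type == "iqr_distance":
        q1, q3 = np.nanpercentile(x, [25, 75])
        return (x - np.nanmedian(x)) / (q3 - q1)
    if score_type == "percentile":
        # Share of non-missing values <= x, scaled to [0, 100] (assumed definition).
        valid = np.sort(x[~np.isnan(x)])
        ranks = np.searchsorted(valid, x, side="right") / valid.size * 100.0
        ranks[np.isnan(x)] = np.nan
        return ranks
    raise ValueError(f"unknown score_type: {score_type!r}")


# Made-up potassium-like values with one extreme reading at the end.
values = np.array([3.9, 4.1, 4.0, 4.2, 3.8, 4.3, 4.0, 9.5])
for name in ("iqr", "quantile", "zscore", "modified_zscore"):
    print(name, flag_outliers(values, name))
print(anomaly_score(values, "iqr_distance"))
```

On this eight-value sample the "iqr" and "modified_zscore" rules flag the last reading, while the "zscore" rule flags nothing: with a population standard deviation, |z| cannot exceed √(n − 1) ≈ 2.65 for n = 8, so the threshold of 3 is unreachable. This is one reason to prefer the robust rules for small groups.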

Return type:

EHRData | None

Returns:

None if copy=False (edata is modified in place); otherwise the modified copy.

Examples

>>> import ehrapy as ep
>>> import ehrdata as ed
>>> edata = ed.dt.mimic_2()
>>> ep.pp.qc_lab_measurements(edata, var_names=["potassium_first"])
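
To illustrate what stratifying with groupby means, the following pandas sketch applies the IQR rule once on pooled data and once within each group. It does not call ehrapy and is not its implementation: the obs table, the care_unit column, the values, and the iqr_flag helper are all made up for illustration.

```python
import pandas as pd

# Made-up observations: two groups with different typical potassium levels.
obs = pd.DataFrame(
    {
        "care_unit": ["ward"] * 6 + ["dialysis"] * 6,  # hypothetical grouping column
        "potassium": [3.9, 4.0, 4.1, 4.0, 4.2, 5.0, 5.4, 5.6, 5.5, 5.7, 5.3, 5.5],
    }
)


def iqr_flag(s: pd.Series) -> pd.Series:
    """True where a value lies outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return (s < q1 - 1.5 * iqr) | (s > q3 + 1.5 * iqr)


# Pooled: one set of quartiles for all rows.
obs["potassium_outlier_pooled"] = iqr_flag(obs["potassium"])

# Stratified: quartiles are computed within each care_unit independently.
obs["potassium_outlier_by_unit"] = (
    obs.groupby("care_unit")["potassium"].transform(iqr_flag).astype(bool)
)

print(obs)
```

In this toy table the reading of 5.0 is unremarkable against the pooled distribution but is flagged once the ward and dialysis groups are evaluated separately, which is the effect the groupby parameter is described as providing.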