ehrapy.preprocessing.qc_lab_measurements
- ehrapy.preprocessing.qc_lab_measurements(edata, *, layer=None, var_names=None, method='iqr', score_type='zscore', add_flag=True, add_score=True, groupby=None, copy=False)
Flag outliers and compute anomaly scores for numeric variables.
For each requested variable, the function adds up to two columns in edata.obs:
    - {var}_outlier – boolean flag (True = outlier).
    - {var}_score – continuous anomaly score.
- Parameters:
var_names (list[str] | None, default: None) – Variables to evaluate. None (default) evaluates all variables in edata.var_names.
layer (str | None, default: None) – Layer to use instead of edata.X.
method (Literal['quantile', 'iqr', 'zscore', 'modified_zscore'], default: 'iqr') – Outlier detection method.
    - "iqr" – outside [Q1 − 1.5·IQR, Q3 + 1.5·IQR].
    - "quantile" – outside [2.5th, 97.5th] percentiles.
    - "zscore" – |z| > 3.
    - "modified_zscore" – |modified z| > 3.5 (median / MAD).
score_type (Literal['zscore', 'iqr_distance', 'percentile'], default: 'zscore') – Continuous score assigned to each observation.
    - "zscore" – (x − mean) / std.
    - "iqr_distance" – (x − median) / IQR.
    - "percentile" – percentile rank in [0, 100].
add_flag (bool, default: True) – Whether to add the {var}_outlier column.
add_score (bool, default: True) – Whether to add the {var}_score column.
groupby (str | None, default: None) – Column in edata.obs used to stratify the computation so that statistics are calculated within each group independently.
copy (bool, default: False) – If True, return a modified copy; otherwise modify in place.
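The rules listed under method, score_type and groupby can be sketched in plain pandas as below. This is an illustration of the documented formulas, not ehrapy's implementation. The helper names (flag_outliers, anomaly_score, by_group) are hypothetical. The sample standard deviation (ddof=1), the 0.6745 scaling constant for the modified z-score, and the use of Series.rank for the percentile score are assumptions, since the text above does not specify them.

```python
import numpy as np
import pandas as pd


def flag_outliers(x: pd.Series, method: str = "iqr") -> pd.Series:
    """Boolean outlier flag for one numeric variable (illustrative only)."""
    if method == "iqr":
        q1, q3 = x.quantile([0.25, 0.75])
        iqr = q3 - q1
        return (x < q1 - 1.5 * iqr) | (x > q3 + 1.5 * iqr)
    if method == "quantile":
        lo, hi = x.quantile([0.025, 0.975])
        return (x < lo) | (x > hi)
    if method == "zscore":
        # Assumption: sample standard deviation (ddof=1, the pandas default).
        return ((x - x.mean()) / x.std()).abs() > 3
    if method == "modified_zscore":
        med = x.median()
        mad = (x - med).abs().median()
        # Assumption: conventional 0.6745 scaling; a MAD of zero is not handled.
        return (0.6745 * (x - med) / mad).abs() > 3.5
    raise ValueError(f"unknown method: {method!r}")


def anomaly_score(x: pd.Series, score_type: str = "zscore") -> pd.Series:
    """Continuous anomaly score for one numeric variable (illustrative only)."""
    if score_type == "zscore":
        return (x - x.mean()) / x.std()
    if score_type == "iqr_distance":
        iqr = x.quantile(0.75) - x.quantile(0.25)
        return (x - x.median()) / iqr
    if score_type == "percentile":
        # Assumption: average-rank percentile, scaled to 0..100.
        return x.rank(pct=True) * 100
    raise ValueError(f"unknown score_type: {score_type!r}")


def by_group(x: pd.Series, groups, fn, **kwargs) -> pd.Series:
    """Apply fn within each group, then restore the original row order.

    Assumes x has a unique index.
    """
    parts = [fn(part, **kwargs) for _, part in x.groupby(groups)]
    return pd.concat(parts).reindex(x.index)


# Toy potassium-like values with one implausible reading.
potassium = pd.Series([3.9, 4.1, 4.0, 4.2, 3.8, 4.0, 4.1, 3.9, 9.5])
iqr_flags = flag_outliers(potassium, method="iqr")
z_flags = flag_outliers(potassium, method="zscore")
```

On this toy series the "iqr" and "modified_zscore" rules flag only the 9.5 reading, while "zscore" flags nothing: with a sample standard deviation, |z| cannot exceed (n − 1)/√n, which is about 2.67 for nine values. This is one reason a robust rule may be preferable for small groups, for example when groupby splits the data into many strata.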
- Return type:
- Returns:
None if copy=False, otherwise the updated data object.
Examples
>>> import ehrapy as ep
>>> import ehrdata as ed
>>> edata = ed.dt.mimic_2()
>>> ep.pp.qc_lab_measurements(edata, var_names=["potassium_first"])