ehrapy.preprocessing.simple_impute
- ehrapy.preprocessing.simple_impute(edata, var_names=None, *, strategy='mean', warning_threshold=70, layer=None, copy=False)
Impute missing values in numerical data using mean/median/most frequent imputation.
When using the mean or median strategy, the data must be properly encoded first, since these strategies require numerical data only.
- Parameters:
edata (EHRData) – Central data object.
var_names (Iterable[str] | None, default: None) – A list of column names to apply imputation on (if None, impute all columns).
strategy (Literal['mean', 'median', 'most_frequent'], default: 'mean') – Imputation strategy to use. One of {'mean', 'median', 'most_frequent'}. If data is a dask.array.Array, only 'mean' is supported.
warning_threshold (int, default: 70) – Display a warning message if the percentage of missing values exceeds this threshold.
copy (bool, default: False) – Whether to return a copy of edata or modify it in place.
- Return type:
EHRData | None
- Returns:
If copy is True, a modified copy of the original data object with imputed X. If copy is False, the original data object is modified in place, and None is returned.
Examples
>>> import ehrdata as ed
>>> import ehrapy as ep
>>> edata = ed.dt.mimic_2()
>>> ep.pp.simple_impute(edata, strategy="median")
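To make the three strategies and the missing-value percentage concrete, here is a minimal standard-library sketch of what each strategy computes for a single column. It is not ehrapy's implementation: the helpers `is_missing`, `missing_percentage`, and `impute_column` and the `heart_rate` column are hypothetical names used for illustration only, and the sketch assumes missing entries are represented as `None` or NaN.

```python
import math
import statistics


def is_missing(value):
    """Treat None and float NaN as missing (an assumption of this sketch)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def missing_percentage(column):
    """Percentage of missing entries in a column (0.0 for an empty column)."""
    if not column:
        return 0.0
    return 100.0 * sum(is_missing(v) for v in column) / len(column)


def impute_column(column, strategy="mean"):
    """Return a new list with every missing entry replaced by one fill value."""
    observed = [v for v in column if not is_missing(v)]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    if strategy == "mean":
        fill = statistics.fmean(observed)
    elif strategy == "median":
        fill = statistics.median(observed)
    elif strategy == "most_frequent":
        fill = statistics.mode(observed)
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    return [fill if is_missing(v) else v for v in column]


# Hypothetical column with two missing entries out of six.
heart_rate = [60.0, None, 80.0, 60.0, float("nan"), 100.0]

print(missing_percentage(heart_rate))            # compare against a threshold such as 70
print(impute_column(heart_rate, "mean"))          # missing entries filled with 75.0
print(impute_column(heart_rate, "median"))        # missing entries filled with 70.0
print(impute_column(heart_rate, "most_frequent")) # missing entries filled with 60.0
```

Note that the mean and median branches only make sense for numerical values, which is why non-numerical columns need to be encoded before using those strategies, while the most frequent value can be computed for any hashable entries.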