ehrapy.preprocessing.simple_impute


ehrapy.preprocessing.simple_impute(adata, var_names=None, *, strategy='mean', copy=False, warning_threshold=70)

Impute missing values in numerical data using mean/median/most frequent imputation.

The mean and median strategies require numerical data only, so any non-numerical columns being imputed must be properly encoded beforehand.

Parameters:
  • adata (AnnData) – The annotated data matrix to impute missing values on.

  • var_names (Iterable[str] | None, default: None) – A list of column names to apply imputation on (if None, impute all columns).

  • strategy (Literal['mean', 'median', 'most_frequent'], default: 'mean') – Imputation strategy to use. One of {‘mean’, ‘median’, ‘most_frequent’}.

  • warning_threshold (int, default: 70) – Display a warning message if the percentage of missing values exceeds this threshold.

  • copy (bool, default: False) – Whether to return a copy of adata or modify it in place.
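To make the three strategies and the warning threshold concrete, here is a minimal standard-library sketch of what each one computes for a single numerical column. It is not ehrapy's implementation: the helpers impute_column and missing_percentage are hypothetical names introduced only for this illustration, and missing values are assumed to be represented as NaN.

```python
import math
import statistics


def impute_column(values, strategy="mean"):
    """Hypothetical helper: fill NaN entries of one numeric column with a single statistic."""
    # Only the observed (non-missing) entries contribute to the statistic.
    observed = [v for v in values if not math.isnan(v)]
    if strategy == "mean":
        fill = statistics.mean(observed)
    elif strategy == "median":
        fill = statistics.median(observed)
    elif strategy == "most_frequent":
        fill = statistics.mode(observed)
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    return [fill if math.isnan(v) else v for v in values]


def missing_percentage(values):
    """Hypothetical helper: share of missing entries, in percent."""
    return 100.0 * sum(math.isnan(v) for v in values) / len(values)


column = [1.0, 2.0, 2.0, float("nan"), 10.0]

print(impute_column(column, "mean"))           # NaN replaced by 3.75
print(impute_column(column, "median"))         # NaN replaced by 2.0
print(impute_column(column, "most_frequent"))  # NaN replaced by 2.0
print(missing_percentage(column))              # 20.0, below the default threshold of 70
```

Note how the single outlier (10.0) pulls the mean well above the median, which is one common reason to prefer strategy='median' for skewed measurements.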

Return type:

AnnData | None

Returns:

If copy is True, a modified copy of the original AnnData object with imputed X. If copy is False, the original AnnData object is modified in place, and None is returned.
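The return convention described above can be sketched with plain Python lists. This is only an illustration of the copy / in-place pattern, assuming a hypothetical fill_missing function that operates on a list of rows; it does not use AnnData or ehrapy.

```python
import copy as _copy


def fill_missing(table, fill_value, copy=False):
    """Hypothetical stand-in: replace None entries, following the copy/in-place convention."""
    # Work on a deep copy when requested, otherwise on the caller's object.
    target = _copy.deepcopy(table) if copy else table
    for row in target:
        for i, value in enumerate(row):
            if value is None:
                row[i] = fill_value
    # Return the modified copy, or None when the input was modified in place.
    return target if copy else None


table = [[1, None], [None, 4]]

result = fill_missing(table, 0, copy=True)
print(result)  # modified copy
print(table)   # original still contains None

print(fill_missing(table, 0))  # None: the change happened in place
print(table)                   # original is now filled
```

With copy=False, the return value is None, so assigning the result of the call to a variable would discard nothing but also give no reference to the data; keep using the original object instead.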

Examples

>>> import ehrapy as ep
>>> adata = ep.dt.mimic_2(encoded=True)
>>> ep.pp.simple_impute(adata, strategy="median")