ehrapy.preprocessing.simple_impute

ehrapy.preprocessing.simple_impute(adata, var_names=None, *, strategy='mean', copy=False, warning_threshold=70)

Impute missing values in numerical data using mean, median, or most-frequent-value imputation.

Parameters:
  • adata (AnnData) – The annotated data matrix to impute missing values on.

  • var_names (Iterable[str] | None) – A list of column names to apply imputation on (if None, impute all columns).

  • strategy (Literal['mean', 'median', 'most_frequent']) – Imputation strategy to use. One of {'mean', 'median', 'most_frequent'}.

  • warning_threshold (int) – Display a warning message if the percentage of missing values exceeds this threshold. Defaults to 70.

  • copy (bool) – Whether to return a copy of adata or modify it in place. Defaults to False.
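To make the strategy and warning_threshold parameters concrete, the following is a conceptual sketch in plain Python of what each strategy computes for a single column. It is not ehrapy's implementation: the helper names impute_column and missing_percentage are hypothetical, missing values are assumed to be encoded as NaN, and whether ehrapy measures the missing percentage per column or over the whole matrix is not stated here.

```python
import math
from statistics import mean, median, mode


def missing_percentage(values):
    """Percentage of NaN entries in a column (hypothetical helper)."""
    n_missing = sum(1 for v in values if math.isnan(v))
    return 100 * n_missing / len(values)


def impute_column(values, strategy="mean"):
    """Replace NaN entries with a summary statistic of the observed values.

    Illustration only; assumes a single numeric column with NaN as the
    missing-value marker.
    """
    observed = [v for v in values if not math.isnan(v)]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    if strategy == "mean":
        fill = mean(observed)
    elif strategy == "median":
        fill = median(observed)
    elif strategy == "most_frequent":
        fill = mode(observed)
    else:
        raise ValueError(f"unknown imputation strategy: {strategy!r}")
    return [fill if math.isnan(v) else v for v in values]


column = [1.0, 2.0, 2.0, float("nan"), 7.0]

pct = missing_percentage(column)            # 20.0
exceeds_default_threshold = pct > 70        # False, so no warning in this sketch

imputed_mean = impute_column(column, "mean")             # NaN -> 3.0
imputed_median = impute_column(column, "median")         # NaN -> 2.0
imputed_mode = impute_column(column, "most_frequent")    # NaN -> 2.0
```

The mean is pulled toward the outlier 7.0, while the median and the most frequent value are not, which is the usual reason to prefer 'median' for skewed measurements.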

Return type:

AnnData

Returns:

An updated AnnData object with imputed values.

Raises:
  • ValueError – If the selected imputation strategy is not applicable to the data.

  • ValueError – If an unknown imputation strategy is provided.
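
The two error cases above can be pictured as a pre-check like the one sketched below. This is a hypothetical illustration, not ehrapy's code: the names check_strategy and is_valid are made up, missing entries are represented as None, and the assumption that "not applicable" means a mean or median requested for non-numeric values is a guess that this page does not confirm.

```python
from numbers import Real

# Strategy names taken from the function signature.
KNOWN_STRATEGIES = ("mean", "median", "most_frequent")


def check_strategy(values, strategy):
    """Hypothetical pre-check mirroring the two documented ValueError cases."""
    if strategy not in KNOWN_STRATEGIES:
        raise ValueError(f"unknown imputation strategy: {strategy!r}")
    # Assumption: mean and median are only defined for numeric values,
    # while most_frequent also works for categorical values.
    numeric = all(isinstance(v, Real) for v in values if v is not None)
    if strategy in ("mean", "median") and not numeric:
        raise ValueError(f"strategy {strategy!r} is not applicable to non-numeric data")


def is_valid(values, strategy):
    """Return True if the pre-check passes, False if it raises ValueError."""
    try:
        check_strategy(values, strategy)
    except ValueError:
        return False
    return True


numeric_ok = is_valid([1.0, None, 3.0], "mean")                 # True
text_mean = is_valid(["a", None, "b"], "mean")                  # False
text_most_frequent = is_valid(["a", None, "b"], "most_frequent")  # True
unknown = is_valid([1.0, None, 3.0], "mode")                    # False
```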

Examples

>>> import ehrapy as ep
>>> adata = ep.dt.mimic_2(encoded=True)
>>> ep.pp.simple_impute(adata, strategy="median")