ehrapy.preprocessing.explicit_impute

ehrapy.preprocessing.explicit_impute(edata, replacement, *, layer=None, impute_empty_strings=True, warning_threshold=70, copy=False)

Replaces missing values with an explicitly passed replacement value, either in all columns or in a user-specified subset of columns.

Two scenarios are covered:

  1. Replace all missing values with a single specified value.

  2. Replace missing values in a subset of columns, with a specified value per column.

Parameters:
  • edata (EHRData | AnnData) – Central data object.

  • replacement (str | int | dict[str, str | int]) – The value to replace missing values with. If a dictionary is provided, the keys represent column names and the values represent replacement values for those columns.

  • layer (str | None, default: None) – The layer to impute.

  • impute_empty_strings (bool, default: True) – If True, empty strings are also replaced.

  • warning_threshold (int, default: 70) – Percentage of missing values above which a warning is displayed.

  • copy (bool, default: False) – If True, returns a modified copy of the original data object. If False, modifies the object in place.

Return type:

EHRData | AnnData | None

Returns:

If copy is True, a modified copy of the original data object with imputed X. If copy is False, the original data object is modified in place, and None is returned.

Examples

Replace all missing values in edata with the value 0:

>>> import ehrdata as ed
>>> import ehrapy as ep
>>> edata = ed.dt.mimic_2()
>>> ep.pp.explicit_impute(edata, replacement=0)
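
The difference between a single replacement value and a per-column dictionary can be illustrated with a small plain-Python sketch. This is not ehrapy's implementation: the helper names `is_missing` and `explicit_impute_sketch`, the toy column names, and the assumption that columns absent from the dictionary are left untouched are all illustrative only.

```python
import math


def is_missing(value, impute_empty_strings=True):
    """Treat None, NaN and (optionally) empty strings as missing."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    return impute_empty_strings and value == ""


def explicit_impute_sketch(columns, replacement, impute_empty_strings=True):
    """Return a copy of `columns` (name -> list of values) with missing values replaced.

    Hypothetical helper: a scalar `replacement` applies to every column,
    a dict applies only to the columns it names (assumed behaviour).
    """
    result = {}
    for name, values in columns.items():
        if isinstance(replacement, dict):
            if name not in replacement:
                # Column not listed in the dict: copy it unchanged.
                result[name] = list(values)
                continue
            fill = replacement[name]
        else:
            fill = replacement
        result[name] = [
            fill if is_missing(v, impute_empty_strings) else v for v in values
        ]
    return result


# Toy data with hypothetical column names.
data = {
    "age": [34, None, 51],
    "unit": ["ICU", "", "ward"],
    "bmi": [float("nan"), 22.5, 30.1],
}

# Scenario 1: one value for every missing entry.
all_filled = explicit_impute_sketch(data, 0)

# Scenario 2: one value per listed column; "bmi" is not listed.
subset_filled = explicit_impute_sketch(data, {"age": 0, "unit": "unknown"})
```

With ehrapy itself, the second scenario would correspond to passing such a dictionary as replacement, with keys taken from the actual column names of the data object.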