ehrapy.preprocessing.winsorize

ehrapy.preprocessing.winsorize(adata, vars=None, obs_cols=None, limits=None, copy=False, **kwargs)

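Winsorizes the selected features of an AnnData object: within each feature, values in the lowest and highest tails are replaced by the nearest remaining values.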

The implementation is based on https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.mstats.winsorize.html
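As an illustration of what the underlying SciPy function does to a single feature, here is a minimal sketch that calls scipy.stats.mstats.winsorize directly on a plain NumPy array. The values are illustrative, not taken from any dataset, and the sketch does not involve ehrapy itself.

```python
import numpy as np
from scipy.stats import mstats

# Illustrative values 1.0 .. 10.0 (not real data).
values = np.arange(1.0, 11.0)

# Cut 10% from the lower tail and 20% from the upper tail:
# the smallest value is replaced by the 2nd smallest, and the two
# largest values are replaced by the 3rd largest.
clipped = mstats.winsorize(values, limits=(0.1, 0.2))

# winsorize returns a masked array; getdata gives a plain ndarray.
print(np.ma.getdata(clipped))
# → [2. 2. 3. 4. 5. 6. 7. 8. 8. 8.]
```

Because `inplace` defaults to False in SciPy, `values` itself is left unchanged.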

Parameters:
  • adata (AnnData) – AnnData object to winsorize

  • vars (Union[str, list[str], set[str], None]) – The features to winsorize.

  • obs_cols (Union[str, list[str], set[str], None]) – Columns in obs to winsorize.

  • limits (Optional[list[float]]) – Pair of proportions to cut from the lower and the upper tail, respectively, as floats between 0. and 1. (interpreted as in scipy.stats.mstats.winsorize). Defaults to (0.01, 0.99).

  • copy (bool) – Whether to winsorize and return a copy of adata instead of modifying adata in place. Defaults to False.

  • **kwargs – Keyword arguments passed on to scipy.stats.mstats.winsorize.
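The per-column and copy behaviour described by obs_cols and copy can be sketched with a hypothetical helper, `winsorize_columns`, which is not part of ehrapy. It assumes the columns live in a pandas DataFrame (such as adata.obs) and applies scipy.stats.mstats.winsorize to each selected column separately. This is an assumption about the general approach, not ehrapy's actual implementation.

```python
import numpy as np
import pandas as pd
from scipy.stats import mstats


def winsorize_columns(df, cols, limits, copy=False, **kwargs):
    """Hypothetical helper (not part of ehrapy): winsorize selected DataFrame columns.

    Each column is winsorized independently. With copy=True a modified copy is
    returned and df is untouched; otherwise df is modified in place and None is returned.
    """
    if isinstance(cols, str):
        cols = [cols]
    target = df.copy() if copy else df
    for col in cols:
        values = target[col].to_numpy(dtype=float)
        clipped = mstats.winsorize(values, limits=limits, **kwargs)
        # winsorize returns a masked array; store the plain data back
        target[col] = np.ma.getdata(clipped)
    return target if copy else None


# Illustrative values only (not real patient data).
df = pd.DataFrame({"bmi": np.arange(1.0, 21.0), "age": np.arange(20, 40)})

# copy=True: df is left unchanged, the result holds the clipped column.
out = winsorize_columns(df, "bmi", limits=(0.05, 0.05), copy=True)
print(out["bmi"].min(), out["bmi"].max())  # → 2.0 19.0

# copy=False: df itself is modified and None is returned.
winsorize_columns(df, ["bmi"], limits=(0.05, 0.05))
print(df["bmi"].min(), df["bmi"].max())  # → 2.0 19.0
```

Columns not listed (here `age`) are left untouched.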

Return type:

AnnData

Returns:

The winsorized AnnData object if copy is True; otherwise adata is modified in place.

Examples

>>> import ehrapy as ep
>>> adata = ep.data.mimic_2(encoded=True)
>>> ep.pp.winsorize(adata, ['bmi'])