ehrapy.preprocessing.winsorize

ehrapy.preprocessing.winsorize(edata, vars=None, obs_cols=None, *, limits=(0.01, 0.99), layer=None, copy=False, **kwargs)

Returns a Winsorized version of the selected features of the input data object.

The implementation is based on https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.mstats.winsorize.html
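
As a rough illustration of what the underlying SciPy routine does, the sketch below calls scipy.stats.mstats.winsorize directly on a plain NumPy array. The sample values and the limits=(0.1, 0.1) setting are made up for this illustration and are not taken from ehrapy; how ehrapy forwards its own limits argument to SciPy is not shown here.

```python
import numpy as np
from scipy.stats.mstats import winsorize

# Hypothetical sample: nine ordinary values and one extreme outlier.
values = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 100.0])

# In SciPy, limits=(0.1, 0.1) replaces the lowest 10% and the highest 10%
# of the values with the nearest remaining value on each side.
clipped = winsorize(values, limits=(0.1, 0.1))

# SciPy returns a masked array; convert it to a plain ndarray for inspection.
result = np.asarray(clipped)
print(result.tolist())
# [2.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 9.0]
```

Unlike trimming, winsorizing keeps the array length unchanged: the extreme values are replaced, not dropped.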

Parameters:
  • edata (EHRData) – Central data object.

  • vars (Collection[str], default: None) – The features to winsorize.

  • obs_cols (Collection[str], default: None) – Columns in obs with features to winsorize.

  • limits (tuple[float, float], default: (0.01, 0.99)) – Tuple of the fractions to cut on each side of the array, given as floats between 0 and 1.

  • layer (str | None, default: None) – The layer to operate on.

  • copy (bool, default: False) – Whether to return a copy.

  • **kwargs – Keyword arguments passed to scipy.stats.mstats.winsorize.

Return type:

EHRData | None

Returns:

Winsorized data object if copy is True, otherwise None.

Examples

>>> import ehrdata as ed
>>> import ehrapy as ep
>>> edata = ed.dt.mimic_2()
>>> ep.pp.winsorize(edata, vars=["bmi"])