ehrapy.preprocessing.winsorize
- ehrapy.preprocessing.winsorize(edata, vars=None, obs_cols=None, *, limits=(0.01, 0.99), layer=None, copy=False, **kwargs)
Returns a Winsorized version of the input array.
The implementation is based on https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.mstats.winsorize.html
- Parameters:
edata (EHRData) – Central data object.
vars (Collection[str], default: None) – The features to winsorize.
obs_cols (Collection[str], default: None) – Columns in obs with features to winsorize.
limits (tuple[float, float], default: (0.01, 0.99)) – Tuple of the fractions to cut on each side of the array, given as floats between 0 and 1.
layer (str | None, default: None) – The layer to operate on.
copy (bool, default: False) – Whether to return a copy.
**kwargs – Keyword arguments passed to scipy.stats.mstats.winsorize.
- Return type:
- Returns:
Winsorized data object if copy is True.
Examples
>>> import ehrdata as ed
>>> import ehrapy as ep
>>> edata = ed.dt.mimic_2()
>>> ep.pp.winsorize(edata, vars=["bmi"])
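To illustrate what winsorizing does to the values themselves, the following minimal sketch calls the underlying scipy.stats.mstats.winsorize directly on a plain NumPy array. The array and the limits used here are illustrative assumptions and are not taken from ehrapy; how ehrapy forwards its own limits argument to SciPy is not shown here.

```python
import numpy as np
from scipy.stats.mstats import winsorize

# Illustrative data (an assumption, not an ehrapy dataset).
values = np.array([10, 4, 9, 8, 5, 3, 7, 2, 1, 6])

# In SciPy, limits=(0.1, 0.2) replaces the lowest 10% of values with the
# next-lowest remaining value and the highest 20% with the next-highest one.
clipped = winsorize(values, limits=(0.1, 0.2))

# winsorize returns a masked array; convert it to a regular ndarray.
result = np.asarray(clipped)
print(result)  # [8 4 8 8 5 3 7 2 2 6]
```

The extreme values are replaced rather than dropped, so the array keeps its original length and order.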