ehrapy.preprocessing.quantile_norm
- ehrapy.preprocessing.quantile_norm(edata, vars=None, group_key=None, layer=None, copy=False, **kwargs)
Apply quantile normalization.
Functionality is provided by scikit-learn's QuantileTransformer; see https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.QuantileTransformer.html for details. If edata.X is a Dask Array, functionality is provided by the dask-ml QuantileTransformer; see https://ml.dask.org/modules/generated/dask_ml.preprocessing.QuantileTransformer.html for details.
Supports both 2D and 3D data:
2D data: Standard normalization across observations
3D data: Per-variable normalization across samples and timestamps
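The two cases above can be illustrated with scikit-learn's QuantileTransformer applied directly to NumPy arrays. This is a minimal sketch, not ehrapy's implementation: the random data, the (observations, variables, timestamps) axis order, and the reshape-based handling of the 3D case are assumptions made for illustration.

```python
import numpy as np
from sklearn.preprocessing import QuantileTransformer

rng = np.random.default_rng(0)

# 2D case: rows are observations, columns are variables.
X2d = rng.lognormal(size=(50, 3))
qt = QuantileTransformer(n_quantiles=50, output_distribution="uniform")
X2d_norm = qt.fit_transform(X2d)  # each column is mapped onto [0, 1]

# 3D case (assumed layout: observations x variables x timestamps).
X3d = rng.lognormal(size=(20, 3, 5))
n_obs, n_vars, n_time = X3d.shape

# Move the variable axis last and flatten observations and timestamps
# together, so each variable is fitted on all of its values at once.
flat = np.moveaxis(X3d, 1, -1).reshape(-1, n_vars)
flat_norm = QuantileTransformer(n_quantiles=100).fit_transform(flat)

# Restore the original (observations, variables, timestamps) layout.
X3d_norm = np.moveaxis(flat_norm.reshape(n_obs, n_time, n_vars), -1, 1)
```

With the default uniform output distribution, the smallest and largest value of each variable map to 0 and 1, which matches the range shown in the Examples section.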
- Parameters:
edata (EHRData) – Central data object. Must already be encoded using encode().
vars (str | Sequence[str] | None, default: None) – List of the names of the numeric variables to normalize. If None, all numeric variables will be normalized.
group_key (str | None, default: None) – Key in edata.obs that contains group information. If provided, scaling is applied per group.
copy (bool, default: False) – Whether to return a copy or act in place.
**kwargs – Additional arguments passed to the QuantileTransformer.
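To make the effect of group_key concrete, the following sketch fits a separate scikit-learn QuantileTransformer for each group of rows. It only approximates what "scaling is applied per group" could look like; the groups array, the shifted data, and the loop are hypothetical and are not taken from ehrapy's source.

```python
import numpy as np
from sklearn.preprocessing import QuantileTransformer

rng = np.random.default_rng(1)

X = rng.normal(size=(40, 2))
# Hypothetical group labels, standing in for a column of edata.obs.
groups = np.array(["a"] * 20 + ["b"] * 20)
X[groups == "b"] += 100.0  # give group "b" a very different raw scale

X_norm = np.empty_like(X)
for g in np.unique(groups):
    mask = groups == g
    # n_quantiles is a QuantileTransformer argument; here it is set to
    # the group size so every sample acts as its own quantile.
    qt = QuantileTransformer(n_quantiles=int(mask.sum()))
    X_norm[mask] = qt.fit_transform(X[mask])
```

After per-group fitting, both groups span the same [0, 1] range even though their raw values differ by about 100. Arguments such as n_quantiles or output_distribution are the kind of QuantileTransformer options that **kwargs is documented to forward.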
- Return type:
- Returns:
None if copy=False (the passed edata is modified in place); otherwise an updated object. Also stores a record of applied normalizations as a dictionary in edata.uns["normalization"].
Examples
>>> import ehrdata as ed
>>> import ehrapy as ep
>>> import numpy as np
>>> edata = ed.dt.physionet2012(layer="tem_data")
>>> np.nanmin(edata.layers["tem_data"]), np.nanmax(edata.layers["tem_data"])
(-17.8, 36400.0)
>>> ep.pp.quantile_norm(edata, layer="tem_data")
>>> np.nanmin(edata.layers["tem_data"]), np.nanmax(edata.layers["tem_data"])
(0.0, 1.0)