ehrapy.preprocessing.minmax_norm


ehrapy.preprocessing.minmax_norm(edata, vars=None, group_key=None, layer=None, copy=False, **kwargs)

Apply min-max normalization.

Functionality is provided by scikit-learn's MinMaxScaler; see https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MinMaxScaler.html for details. If edata.X is a Dask Array, the dask-ml MinMaxScaler is used instead; see https://ml.dask.org/modules/generated/dask_ml.preprocessing.MinMaxScaler.html for details.
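As a rough illustration of what min-max scaling computes, the following NumPy sketch rescales each column to a target range. It is not the ehrapy or scikit-learn implementation: the helper name minmax_columns is hypothetical, and the NaN handling (ignored when computing the minimum and maximum, preserved in the output) and the treatment of constant columns are assumptions modelled on the documented behaviour of scikit-learn's MinMaxScaler. The feature_range argument mirrors the scikit-learn parameter of the same name.

```python
import numpy as np


def minmax_columns(x, feature_range=(0.0, 1.0)):
    """Scale each column of a 2D array to feature_range, ignoring NaNs."""
    x = np.asarray(x, dtype=float)
    lo, hi = feature_range
    col_min = np.nanmin(x, axis=0)
    col_max = np.nanmax(x, axis=0)
    span = col_max - col_min
    # Constant columns have zero span; divide by 1 so they map to `lo`.
    span = np.where(span == 0, 1.0, span)
    return (x - col_min) / span * (hi - lo) + lo


# Two variables on very different scales, one missing value.
x = np.array([[1.0, 10.0], [2.0, np.nan], [3.0, 30.0]])
scaled = minmax_columns(x)
```

Each column is scaled independently, so both variables end up spanning 0 to 1 while the missing entry stays NaN.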

Supports both 2D and 3D data:

  • 2D data: Standard normalization across observations

  • 3D data: Per-variable normalization across samples and timestamps
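The 3D case can be pictured with a small NumPy sketch. It assumes an axis layout of (observations, variables, timestamps), which is an assumption of this example rather than something stated above, and the helper name minmax_3d is hypothetical. Each variable gets a single minimum and maximum, computed over all samples and all timestamps.

```python
import numpy as np


def minmax_3d(x):
    """Scale a (n_obs, n_vars, n_time) array per variable, ignoring NaNs."""
    x = np.asarray(x, dtype=float)
    # Reduce over observations (axis 0) and timestamps (axis 2); keep variables.
    var_min = np.nanmin(x, axis=(0, 2), keepdims=True)
    var_max = np.nanmax(x, axis=(0, 2), keepdims=True)
    span = var_max - var_min
    # Guard against constant variables (zero span).
    span = np.where(span == 0, 1.0, span)
    return (x - var_min) / span


# 2 observations, 2 variables, 3 timestamps
x = np.array([
    [[0.0, 5.0, 10.0], [100.0, 150.0, np.nan]],
    [[2.0, 4.0, 6.0], [200.0, 120.0, 180.0]],
])
scaled = minmax_3d(x)
```

Because the statistics are shared across time, a variable's trajectory keeps its shape within each observation; only its scale changes.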

Parameters:
  • edata (EHRData | AnnData) – Central data object. Must already be encoded using encode().

  • vars (str | Sequence[str] | None, default: None) – Name or list of names of the numeric variables to normalize. If None, all numeric variables are normalized.

  • group_key (str | None, default: None) – Key in edata.obs that contains group information. If provided, scaling is applied per group.

  • layer (str | None, default: None) – The layer to normalize.

  • copy (bool, default: False) – Whether to return a copy or act in place.

  • **kwargs – Additional arguments passed to the MinMaxScaler.
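To illustrate what per-group scaling with group_key means, here is a minimal NumPy sketch in which every group is scaled with its own minimum and maximum. The helper minmax_by_group and the plain label array standing in for a column of edata.obs are hypothetical; the actual implementation may differ in detail.

```python
import numpy as np


def minmax_by_group(x, groups):
    """Min-max scale the columns of x separately within each group label."""
    x = np.asarray(x, dtype=float)
    groups = np.asarray(groups)
    out = np.empty_like(x)
    for g in np.unique(groups):
        mask = groups == g
        block = x[mask]
        g_min = np.nanmin(block, axis=0)
        span = np.nanmax(block, axis=0) - g_min
        # Guard against groups where a column is constant.
        span = np.where(span == 0, 1.0, span)
        out[mask] = (block - g_min) / span
    return out


# One variable, two groups measured on very different scales.
x = np.array([[1.0], [3.0], [100.0], [300.0], [200.0]])
groups = np.array(["a", "a", "b", "b", "b"])
scaled = minmax_by_group(x, groups)
```

With per-group scaling, each group spans the full 0 to 1 range on its own, so values are comparable within a group but no longer across groups.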

Return type:

EHRData | AnnData | None

Returns:

If copy=False, modifies the passed edata in place and returns None; otherwise returns an updated copy. Also stores a record of the applied normalizations as a dictionary in edata.uns["normalization"].

Examples

>>> import ehrdata as ed
>>> import ehrapy as ep
>>> import numpy as np
>>> edata = ed.dt.physionet2012(layer="tem_data")
>>> np.nanmin(edata.layers["tem_data"]), np.nanmax(edata.layers["tem_data"])
(-17.8, 36400.0)
>>> ep.pp.minmax_norm(edata, layer="tem_data")
>>> np.nanmin(edata.layers["tem_data"]), np.nanmax(edata.layers["tem_data"])
(0.0, 1.0)