ehrapy.preprocessing.missing_data_mask


ehrapy.preprocessing.missing_data_mask(edata, *, layer=None, mask_values=None, key_added='missing_data_mask', copy=False)

Create a boolean mask indicating missing values in the data matrix.

By default marks NaN values as missing. Optionally also marks user-specified sentinel values (e.g. -1, 0, 999) as missing.

The result is stored in edata.layers[key_added] and preserves the array backend of the source matrix: dense in / dense out, sparse in / sparse out, dask in / dask out.
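The dense-case semantics described above can be sketched in NumPy as follows. This is a minimal illustration, not ehrapy's implementation: `nan_and_sentinel_mask` is a hypothetical helper, and it assumes a numeric matrix. NaN needs `np.isnan` because NaN never compares equal to itself, while each sentinel is matched with `==` and OR-ed into the mask.

```python
import numpy as np


def nan_and_sentinel_mask(X, mask_values=None):
    """Hypothetical helper: True where X is NaN or equals a sentinel value."""
    X = np.asarray(X, dtype=float)
    # NaN != NaN, so equality cannot detect it; use np.isnan instead
    mask = np.isnan(X)
    for value in mask_values or ():
        # OR in each user-specified sentinel (e.g. -1, 999)
        mask |= X == value
    return mask


X = np.array([[1.0, np.nan, -1.0],
              [999.0, 5.0, 2.0]])

print(nan_and_sentinel_mask(X))
# [[False  True False]
#  [False False False]]

print(nan_and_sentinel_mask(X, mask_values=[-1, 999]))
# [[False  True  True]
#  [ True False False]]
```

The mask has the same shape as the input, which matches the `(1776, 46)` layer shown in the example below.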

Parameters:
  • edata (EHRData) – Central data object.

  • layer (str | None, default: None) – Layer to use instead of edata.X.

  • mask_values (Iterable[float | int] | None, default: None) – Additional values to treat as missing besides NaN. Not supported for sparse arrays; densify the matrix first or use a dense layer.

  • key_added (str, default: 'missing_data_mask') – Key under which the boolean mask is stored in edata.layers.

  • copy (bool, default: False) – If True, return a modified copy; otherwise modify in place.
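The page does not say why `mask_values` is unavailable for sparse input. One plausible reason, offered here as an assumption rather than ehrapy's stated rationale, is that sparse formats store only nonzero entries. The sketch below uses SciPy's COO/CSR matrices and a hypothetical helper `sparse_nan_mask` (not ehrapy's code) to show that a NaN mask only needs the stored entries and therefore stays sparse, whereas a sentinel such as 0 would have to match every implicit entry.

```python
import numpy as np
from scipy import sparse


def sparse_nan_mask(X):
    """Hypothetical helper: boolean NaN mask of a sparse matrix, kept sparse."""
    coo = sparse.csr_matrix(X).tocoo()
    # Implicit (unstored) entries are zeros and never NaN,
    # so only the stored values need to be checked.
    is_nan = np.isnan(coo.data)
    return sparse.coo_matrix(
        (np.ones(int(is_nan.sum()), dtype=bool), (coo.row[is_nan], coo.col[is_nan])),
        shape=coo.shape,
    ).tocsr()


X = sparse.csr_matrix(np.array([[0.0, np.nan, 3.0],
                                [0.0, 0.0, np.nan]]))

mask = sparse_nan_mask(X)
print(mask.nnz)  # 2: only the NaN positions are stored

# A sentinel of 0 would match entries that X does not even store:
zero_hits = int((X.toarray() == 0).sum())
print(zero_hits)  # 3, none of which are among X's 3 stored entries
```

Matching a 0 sentinel would turn every implicit entry into a stored True, so the mask would lose its sparsity.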

Return type:

EHRData | None

Returns:

None if copy=False (edata is modified in place); otherwise a modified copy of edata.

Examples

>>> import ehrdata as ed
>>> import ehrapy as ep
>>> edata = ed.dt.mimic_2()
>>> ep.pp.missing_data_mask(edata)
>>> edata
EHRData object with n_obs × n_vars × n_t = 1776 × 46 × 1
    layers: 'missing_data_mask'
    shape of .X: (1776, 46)
    shape of .missing_data_mask: (1776, 46)