ehrapy.preprocessing.missing_data_mask
- ehrapy.preprocessing.missing_data_mask(edata, *, layer=None, mask_values=None, key_added='missing_data_mask', copy=False)
Create a boolean mask indicating missing values in the data matrix.
By default, marks NaN values as missing. Optionally, also marks user-specified sentinel values (e.g. -1, 0, 999) as missing. The result is stored in edata.layers[key_added] and preserves the array backend of the source matrix: dense in / dense out, sparse in / sparse out, dask in / dask out.
- Parameters:
  - edata (EHRData) – Central data object.
  - layer (str | None, default: None) – Layer to use instead of edata.X.
  - mask_values (Iterable[float | int] | None, default: None) – Additional values to treat as missing besides NaN. Not supported on sparse arrays; densify first or use a dense layer.
  - key_added (str, default: 'missing_data_mask') – Key under which the boolean mask is stored in edata.layers.
  - copy (bool, default: False) – If True, return a modified copy; otherwise, modify in place.
- Return type:
  EHRData | None
- Returns:
  None if copy=False, otherwise the updated data object.
Examples
>>> import ehrdata as ed
>>> import ehrapy as ep
>>> edata = ed.dt.mimic_2()
>>> ep.pp.missing_data_mask(edata)
>>> edata
EHRData object with n_obs × n_vars × n_t = 1776 × 46 × 1
    layers: 'missing_data_mask'
    shape of .X: (1776, 46)
    shape of .missing_data_mask: (1776, 46)
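Because the stored layer is a plain boolean matrix, summarizing missingness reduces to boolean reductions along an axis. A minimal sketch, assuming the mask is a dense NumPy array of shape (n_obs, n_vars), as it would be for a dense X such as the 2-D mask shown in the example; a sparse or dask mask would need that backend's own reductions. The helper names are hypothetical and not part of ehrapy.

```python
import numpy as np


def missing_fraction_per_variable(mask: np.ndarray) -> np.ndarray:
    """Hypothetical helper: share of observations flagged missing, per column."""
    # The mean of a boolean array counts True as 1, so this is a fraction.
    return np.asarray(mask, dtype=bool).mean(axis=0)


def missing_fraction_per_observation(mask: np.ndarray) -> np.ndarray:
    """Hypothetical helper: share of variables flagged missing, per row."""
    return np.asarray(mask, dtype=bool).mean(axis=1)
```

With the example above, these would be called on edata.layers["missing_data_mask"].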