ehrapy.preprocessing.mcar_test

ehrapy.preprocessing.mcar_test(edata, method='little', *, layer=None)

Statistical hypothesis test for Missing Completely At Random (MCAR).

Performs Little’s MCAR test or pairwise t-tests.

The null hypothesis of Little’s test is that data is Missing Completely At Random (MCAR). A small p-value suggests the data is not MCAR.
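For orientation, the statistic behind Little’s test is usually written as below. This is the textbook form from Little (1988); this page does not state which mean and covariance estimates ehrapy uses internally, so treat the formula as background rather than a description of the implementation. Here J is the number of distinct missingness patterns, n_j the number of observations with pattern j, p_j the number of variables observed in pattern j, p the total number of variables, \bar{y}_j the mean of the observed variables within pattern j, and \hat{\mu}_j, \hat{\Sigma}_j the overall mean and covariance estimates restricted to the variables observed in pattern j.

```latex
d^2 = \sum_{j=1}^{J} n_j \,(\bar{y}_j - \hat{\mu}_j)^{\top} \hat{\Sigma}_j^{-1} (\bar{y}_j - \hat{\mu}_j),
\qquad
d^2 \sim \chi^2_{\nu} \ \text{(asymptotically, under MCAR)},
\quad
\nu = \sum_{j=1}^{J} p_j - p
```

A large d^2 relative to the chi-square reference distribution gives a small p-value, which is the case described above as evidence against MCAR.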

We advise using Little’s MCAR test with care. Rejecting the null hypothesis does not always mean that the data is not MCAR, and failing to reject it does not guarantee that the data is MCAR. See Schouten, R. M., & Vink, G. (2021). The Dance of the Mechanisms: How Observed Information Influences the Validity of Missingness Assumptions. Sociological Methods & Research, 50(3), 1243-1258. https://doi.org/10.1177/0049124118799376 for a thorough discussion of missingness mechanisms.

Parameters:
  • edata (EHRData) – Central data object.

  • method (Literal['little', 'ttest'], default: 'little') – "little" for a global chi-square test or "ttest" for pairwise Welch t-tests across all variable combinations.

  • layer (str | None, default: None) – Layer to apply the test to. Uses X if None.

Return type:

float | DataFrame

Returns:

A single p-value if Little’s test was applied, or a pandas DataFrame of t-test p-values for each pair of features.

Examples

>>> import ehrdata as ed
>>> import ehrapy as ep
>>> edata = ed.dt.ehrdata_blobs(
...     n_observations=100, n_variables=5, missing_values=0.1, random_state=0, n_centers=1, base_timepoints=1
... )
>>> ep.pp.mcar_test(edata)
0.327...
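
The idea behind the "ttest" option can be illustrated with a small standard-library sketch: for a pair of variables, split the observed values of one variable by whether the other is missing, then compare the two groups with Welch's t-test. This is not ehrapy's implementation. The helper names `welch_t` and `split_by_missingness` and the toy data are hypothetical, and the sketch stops at the t statistic and degrees of freedom, because turning them into a p-value needs a t-distribution CDF that the standard library does not provide.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Return Welch's t statistic and the Welch-Satterthwaite degrees of freedom."""
    # Squared standard errors of the two group means (unequal variances allowed).
    se2_a = variance(a) / len(a)
    se2_b = variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (
        se2_a**2 / (len(a) - 1) + se2_b**2 / (len(b) - 1)
    )
    return t, df


def split_by_missingness(rows, indicator_col, value_col):
    """Split observed values of value_col by whether indicator_col is missing (None)."""
    missing_group, observed_group = [], []
    for row in rows:
        value = row[value_col]
        if value is None:
            continue  # value itself is missing, so it cannot enter either group
        if row[indicator_col] is None:
            missing_group.append(value)
        else:
            observed_group.append(value)
    return missing_group, observed_group


# Toy data (made up): column 0 tends to be missing when column 1 is large,
# which is the kind of dependence that contradicts MCAR.
rows = [
    (1.0, 2.0), (2.0, 3.0), (1.5, 2.5), (2.5, 3.5),
    (None, 8.0), (None, 9.0), (None, 10.0),
]

missing_group, observed_group = split_by_missingness(rows, indicator_col=0, value_col=1)
t_stat, dof = welch_t(missing_group, observed_group)
print(f"t = {t_stat:.3f}, df = {dof:.2f}")
```

A large absolute t statistic for a pair indicates that values of one variable differ depending on whether the other is missing. Repeating this for every pair of variables yields the kind of pairwise table that the DataFrame return value represents.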