ehrapy.preprocessing.mice_forest_impute

ehrapy.preprocessing.mice_forest_impute(adata, var_names=None, *, warning_threshold=70, save_all_iterations=True, random_state=None, inplace=False, iterations=5, variable_parameters=None, verbose=False, copy=False)

Impute missing data using miceforest.

miceforest provides fast, memory-efficient Multiple Imputation by Chained Equations (MICE) with LightGBM. See https://github.com/AnotherSamWilson/miceforest for details.

Parameters:
  • adata (AnnData) – The AnnData object containing the data to impute.

  • var_names (Iterable[str] | None, default: None) – A list of variable names to impute. If None, impute all variables.

  • warning_threshold (int, default: 70) – Threshold for the percentage of missing values above which a warning is displayed.

  • save_all_iterations (bool, default: True) – Whether to save all imputed values from all iterations or just the latest. Saving all iterations allows for additional plotting, but may take more memory.

  • random_state (int | None, default: None) – Seed for the random number generator. Set it to make results reproducible.

  • inplace (bool, default: False) – If True, modify the input AnnData object in-place and return None. If False, return a copy of the modified AnnData object.

  • iterations (int, default: 5) – The number of iterations to run.

  • variable_parameters (dict | None, default: None) – Model parameters specified per variable. Keys should be variable names or indices, and each value should be a dict of parameters that apply to that variable only.

  • verbose (bool, default: False) – Whether to print information about the imputation process.

  • copy (bool, default: False) – Whether to return a copy of the AnnData object or modify it in-place.

Return type:

AnnData

Returns:

The imputed AnnData object.

Examples

>>> import ehrapy as ep
>>> adata = ep.dt.mimic_2(encoded=True)
>>> ep.ad.infer_feature_types(adata)
>>> ep.pp.mice_forest_impute(adata)
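
The following is a minimal, standalone sketch of the kind of check that warning_threshold describes. It is not ehrapy's implementation: the helper names missing_percentage and variables_over_threshold are hypothetical, the column names and values are made up, and it assumes the threshold is applied per variable with a strict "greater than" comparison.

```python
import math


def missing_percentage(values):
    """Return the percentage of entries that are None or NaN."""
    if not values:
        return 0.0
    missing = sum(
        1
        for v in values
        if v is None or (isinstance(v, float) and math.isnan(v))
    )
    return 100.0 * missing / len(values)


def variables_over_threshold(columns, warning_threshold=70):
    """Return the names of variables whose missing percentage exceeds the threshold.

    Assumption: the comparison is strict (>), applied to each variable separately.
    """
    return [
        name
        for name, values in columns.items()
        if missing_percentage(values) > warning_threshold
    ]


# Hypothetical toy data: one column with 25% missing, one with 75% missing.
columns = {
    "age": [34.0, float("nan"), 51.0, 47.0],
    "lactate": [float("nan"), float("nan"), float("nan"), 1.2],
}

flagged = variables_over_threshold(columns, warning_threshold=70)
print(flagged)
```

Under these assumptions only the mostly empty column is flagged. Imputed values for such a variable rest on very few observations, which is the situation the warning is meant to surface.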