ehrapy.preprocessing.mice_forest_impute(adata, var_names=None, warning_threshold=70, save_all_iterations=True, random_state=None, inplace=False, iterations=5, variable_parameters=None, verbose=False, copy=False)

Impute missing data using miceforest.

See Fast, memory efficient Multiple Imputation by Chained Equations (MICE) with lightgbm.

  • adata (AnnData) – The AnnData object containing the data to impute.

  • var_names (Iterable[str] | None) – A list of variable names to impute. If None, impute all variables.

  • warning_threshold (int) – Percentage of missing values in a variable above which a warning is displayed. Defaults to 70.

  • save_all_iterations (bool) – Whether to save all imputed values from all iterations or just the latest. Saving all iterations allows for additional plotting, but may take more memory. Defaults to True.

  • random_state (int | None) – Seed for the random number generator; set it to make results reproducible. Defaults to None.

  • inplace (bool) – If True, modify the input AnnData object in-place and return None. If False, return a copy of the modified AnnData object. Defaults to False.

  • iterations (int) – The number of iterations to run. Defaults to 5.

  • variable_parameters (dict | None) – Model parameters specified per variable. Keys should be variable names or indices, and each value should be a dict of parameters that apply to that variable only. Defaults to None.

  • verbose (bool) – Whether to print information about the imputation process. Defaults to False.

  • copy (bool) – Whether to return a copy of the AnnData object or modify it in-place. Defaults to False.
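The warning_threshold parameter above can be pictured with a small standalone sketch. It is not ehrapy's implementation: the helper names `missing_percentages` and `over_threshold` are hypothetical, the toy table is made up, and the use of a strict "greater than" comparison is an assumption.

```python
import math


def missing_percentages(rows, names):
    """Return {name: percent of missing cells} for a row-major table."""
    n_rows = len(rows)
    result = {}
    for j, name in enumerate(names):
        n_missing = sum(
            1
            for row in rows
            if row[j] is None or (isinstance(row[j], float) and math.isnan(row[j]))
        )
        result[name] = 100.0 * n_missing / n_rows
    return result


def over_threshold(percentages, warning_threshold=70):
    """List variables whose missing percentage exceeds the threshold (assumed strict '>')."""
    return [name for name, pct in percentages.items() if pct > warning_threshold]


# Toy table: four observations, three variables (illustrative names only).
rows = [
    [1.0, None, 3.0],
    [2.0, None, float("nan")],
    [3.0, None, 1.0],
    [4.0, 5.0, 2.0],
]
names = ["age", "lactate", "bmi"]

pct = missing_percentages(rows, names)
flagged = over_threshold(pct, warning_threshold=70)
print(pct)
print(flagged)
```

Here only the hypothetical "lactate" column, with three of four values missing, exceeds a threshold of 70.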

Returns:

The imputed AnnData object.


>>> import ehrapy as ep
>>> adata = ep.dt.mimic_2(encoded=True)
>>> ep.pp.mice_forest_impute(adata)
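
To give a feel for what "chained equations" and the iterations parameter mean, here is a minimal pure-Python sketch of the general idea. It is an illustration under stated assumptions, not what miceforest or ehrapy actually run: miceforest fits LightGBM models, whereas this sketch swaps in ordinary least squares on exactly two columns, and the function names `fit_line` and `chained_impute` are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = a + b * x; falls back to the mean of y if x is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return my, 0.0
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def chained_impute(cols, iterations=5):
    """Impute None cells in a two-column table by alternating regressions.

    cols maps a column name to a list of floats, with None marking missing cells.
    """
    names = list(cols)
    if len(names) != 2:
        raise ValueError("this sketch handles exactly two columns")
    missing = {k: [v is None for v in cols[k]] for k in names}

    # Step 1: initialise every missing cell with the observed column mean.
    filled = {}
    for k in names:
        observed = [v for v in cols[k] if v is not None]
        mean = sum(observed) / len(observed)
        filled[k] = [mean if v is None else v for v in cols[k]]

    # Step 2: in each iteration, re-predict each column's missing cells
    # from the current state of the other column.
    for _ in range(iterations):
        for target, predictor in ((names[0], names[1]), (names[1], names[0])):
            train = [i for i, m in enumerate(missing[target]) if not m]
            a, b = fit_line(
                [filled[predictor][i] for i in train],
                [filled[target][i] for i in train],
            )
            for i, m in enumerate(missing[target]):
                if m:
                    filled[target][i] = a + b * filled[predictor][i]
    return filled


# Toy data in which y = 2 * x holds for every fully observed row.
data = {
    "x": [1.0, 2.0, 3.0, None, 5.0],
    "y": [2.0, 4.0, None, 8.0, 10.0],
}
imputed = chained_impute(data, iterations=5)
print(imputed)
```

Each pass refines the imputed cells using models trained on the other variable's current values, which is why running more iterations can change the result until the values settle.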