ehrapy.preprocessing.mice_forest_impute

ehrapy.preprocessing.mice_forest_impute(adata, var_names=None, warning_threshold=70, save_all_iterations=True, random_state=None, inplace=False, iterations=5, variable_parameters=None, verbose=False, copy=False)

Impute data using miceforest.

miceforest provides fast, memory-efficient Multiple Imputation by Chained Equations (MICE) with LightGBM. See https://github.com/AnotherSamWilson/miceforest

Parameters:

adata (AnnData) – The AnnData object containing the data to impute.
var_names (Iterable[str] | None) – A list of variable names to impute. If None, impute all variables.
warning_threshold (int) – Percentage of missing values above which a warning is displayed. Defaults to 70.
save_all_iterations (bool) – Whether to save all imputed values from all iterations or just the latest. Saving all iterations allows for additional plotting, but may take more memory. Defaults to True.
random_state (int | None) – The random state ensures script reproducibility. Defaults to None.
inplace (bool) – If True, modify the input AnnData object in-place and return None. If False, return a copy of the modified AnnData object. Defaults to False.
iterations (int) – The number of iterations to run. Defaults to 5.
variable_parameters (dict | None) – Model parameters specified per variable. Keys should be variable names or indices, and each value should be a dict of parameters that apply to that variable only. Defaults to None.
verbose (bool) – Whether to print information about the imputation process. Defaults to False.
copy (bool) – Whether to return a copy of the AnnData object or modify it in-place. Defaults to False.
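
To illustrate the quantity that warning_threshold is compared against, here is a minimal sketch that computes the percentage of missing values per variable and warns when it exceeds a threshold. This is not ehrapy's implementation: the helper names `missing_percentages` and `warn_if_above`, the plain-dict data layout, and the strict "greater than" comparison are all assumptions made for illustration.

```python
import math
import warnings


def missing_percentages(columns):
    """Return the percentage of missing (None or NaN) entries per column.

    `columns` is a hypothetical mapping of variable name -> list of values.
    """
    result = {}
    for name, values in columns.items():
        missing = sum(
            1
            for v in values
            if v is None or (isinstance(v, float) and math.isnan(v))
        )
        result[name] = 100.0 * missing / len(values) if values else 0.0
    return result


def warn_if_above(columns, warning_threshold=70):
    """Warn for each column whose missing percentage exceeds the threshold.

    Assumption: a strict '>' comparison; the library may compare differently.
    """
    flagged = []
    for name, pct in missing_percentages(columns).items():
        if pct > warning_threshold:
            warnings.warn(f"{name}: {pct:.1f}% missing values")
            flagged.append(name)
    return flagged


# Toy data: "age" has 1 of 4 values missing, "lactate" has 3 of 4 missing.
data = {
    "age": [34.0, float("nan"), 51.0, 47.0],
    "lactate": [None, None, None, 2.1],
}
flagged = warn_if_above(data, warning_threshold=70)
```

With the toy data above, only the variable with three of four values missing crosses a threshold of 70.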

Return type:

AnnData

Returns:

The imputed AnnData object.

Examples

>>> import ehrapy as ep
>>> adata = ep.dt.mimic_2(encoded=True)
>>> ep.pp.mice_forest_impute(adata)
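
The call above relies on the defaults. As a hedged sketch of how the documented parameters might be combined, the wrapper below restricts imputation to selected variables and fixes the random state. The function name `impute_selected` and its arguments are hypothetical; only var_names, iterations, and random_state are taken from the signature above, and ehrapy is imported inside the function so that defining it does not require the library.

```python
def impute_selected(adata, columns, seed=0, n_iterations=5):
    """Impute only `columns` of `adata` with a fixed seed (illustrative wrapper).

    `columns` must be variable names that exist in `adata`; none are assumed here.
    """
    import ehrapy as ep  # imported lazily; requires ehrapy to be installed

    return ep.pp.mice_forest_impute(
        adata,
        var_names=columns,        # restrict imputation to these variables
        iterations=n_iterations,  # number of MICE iterations
        random_state=seed,        # fixed seed for reproducibility
    )
```

Passing the same seed on repeated runs is what the random_state description above refers to as script reproducibility.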