ehrapy.preprocessing.miss_forest_impute
- ehrapy.preprocessing.miss_forest_impute(adata, var_names=None, *, num_initial_strategy='mean', max_iter=3, n_estimators=100, random_state=0, warning_threshold=70, copy=False)
Impute data using the MissForest strategy.
This function uses the MissForest strategy to impute missing values in the data matrix of an AnnData object. The strategy works by fitting a random forest model on each feature containing missing values, and using the trained model to predict the missing values.
See https://academic.oup.com/bioinformatics/article/28/1/112/219101.
This imputation requires numerical data only, so any non-numerical variables need to be properly encoded before calling it.
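The strategy described above can be illustrated outside of ehrapy. The following is a minimal sketch, assuming scikit-learn's `IterativeImputer` with a `RandomForestRegressor` as the per-feature estimator; it is not necessarily how ehrapy implements the function internally. The toy data, the 10% missingness rate, and the reduced number of trees are assumptions chosen for illustration, while `initial_strategy`, `max_iter`, and `random_state` mirror this function's defaults.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (enables the import below)
from sklearn.impute import IterativeImputer

# Toy numerical data (an assumption for illustration, not an ehrapy dataset).
rng = np.random.default_rng(0)
X_full = rng.normal(size=(200, 4))
# Make the last column depend on the first two so a forest can learn it.
X_full[:, 3] = X_full[:, 0] + 0.5 * X_full[:, 1]

# Knock out roughly 10% of the entries at random.
X_missing = X_full.copy()
mask = rng.random(X_missing.shape) < 0.1
X_missing[mask] = np.nan

# MissForest-style imputation: start from a simple fill (mean), then
# repeatedly fit a random forest per feature and predict its missing entries.
imputer = IterativeImputer(
    estimator=RandomForestRegressor(n_estimators=50, random_state=0),
    initial_strategy="mean",
    max_iter=3,
    random_state=0,
)
X_imputed = imputer.fit_transform(X_missing)
```

Observed entries are left unchanged; only the entries that were `NaN` are replaced by forest predictions. With so few iterations scikit-learn may emit a convergence warning, which corresponds to the stop criterion not having been met within `max_iter`.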
- Parameters:
adata (AnnData) – The AnnData object to use MissForest Imputation on.
var_names (Iterable[str] | None, default: None) – Iterable of columns to impute.
num_initial_strategy (Literal['mean', 'median', 'most_frequent', 'constant'], default: 'mean') – The initial strategy to replace all missing numerical values with.
max_iter (int, default: 3) – The maximum number of iterations if the stop criterion has not been met yet.
n_estimators (int, default: 100) – The number of trees to fit for every missing variable. Has a big effect on the run time. Decrease for faster computations.
random_state (int, default: 0) – The random seed for the initialization.
warning_threshold (int, default: 70) – Threshold of percentage of missing values to display a warning for.
copy (bool, default: False) – Whether to return a copy or act in place.
- Return type:
- Returns:
If copy is True, a modified copy of the original AnnData object with imputed X. If copy is False, the original AnnData object is modified in place, and None is returned.
Examples
>>> import ehrapy as ep
>>> adata = ep.dt.mimic_2(encoded=True)
>>> ep.pp.miss_forest_impute(adata)
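Because `warning_threshold` is expressed as a percentage of missing values, it can help to inspect missingness before imputing. The sketch below is a standalone NumPy illustration, not ehrapy code: the helper `percent_missing_per_column` is hypothetical, and computing the percentage per variable with a strict `>` comparison is an assumption about how such a threshold might be applied.

```python
import numpy as np


def percent_missing_per_column(X):
    """Return the percentage of NaN entries in each column (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    return np.isnan(X).mean(axis=0) * 100.0


# Small toy matrix: rows are observations, columns are variables.
X = np.array(
    [
        [1.0, np.nan, 3.0],
        [np.nan, np.nan, 6.0],
        [7.0, np.nan, 9.0],
        [10.0, 11.0, 12.0],
    ]
)

warning_threshold = 70  # same default as in the signature above
pct = percent_missing_per_column(X)

# Indices of variables whose missingness exceeds the threshold
# (strict comparison is an assumption made for this sketch).
flagged = np.flatnonzero(pct > warning_threshold)
print(pct, flagged)
```

Variables flagged this way have few observed values for the forest to learn from, so their imputed values deserve extra scrutiny.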