ehrapy.preprocessing.mice_forest_impute
- ehrapy.preprocessing.mice_forest_impute(edata, var_names=None, *, warning_threshold=70, save_all_iterations_data=True, random_state=None, inplace=False, iterations=5, variable_parameters=None, verbose=False, layer=None, copy=False)
Impute data using the miceforest method.
See AnotherSamWilson/miceforest: fast, memory-efficient Multiple Imputation by Chained Equations (MICE) with LightGBM.
This imputation requires numerical data only, so encode any non-numerical variables before calling it.
Warning
This function is not supported on macOS.
- Parameters:
edata (EHRData) – Central data object.
var_names (Iterable[str] | None, default: None) – A list of variable names to impute. If None, impute all variables.
warning_threshold (int, default: 70) – Percentage of missing values in a variable above which a warning is displayed.
save_all_iterations_data (bool, default: True) – Whether to save the imputed values from all iterations or only the latest. Saving all iterations allows for additional plotting but may use more memory.
random_state (int | None, default: None) – Seed for the random state, to make results reproducible.
inplace (bool, default: False) – If True, modify the input data object in place and return None. If False, return a copy of the modified data object.
iterations (int, default: 5) – The number of iterations to run.
variable_parameters (dict | None, default: None) – Model parameters can be specified per variable here. Keys should be variable names or indices, and values should be a dict of parameters that apply to that variable only.
verbose (bool, default: False) – Whether to print information about the imputation process.
copy (bool, default: False) – Whether to return a copy of the data object or modify it in place.
- Return type:
- Returns:
If copy is True, a modified copy of the original data object with imputed X. If copy is False, the original data object is modified in place, and None is returned.
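To make the meaning of warning_threshold concrete, the following is a minimal sketch of a per-variable missingness check using plain NumPy. It is not ehrapy's implementation: the helper name `columns_above_missing_threshold`, the example variable names, and the strict `>` comparison are all assumptions made for illustration.

```python
import numpy as np


def columns_above_missing_threshold(X, var_names, warning_threshold=70):
    """Hypothetical helper: map variable name -> percent missing for
    every column whose share of NaN values exceeds the threshold."""
    X = np.asarray(X, dtype=float)
    # Fraction of NaN entries per column, expressed as a percentage.
    pct_missing = np.isnan(X).mean(axis=0) * 100
    return {
        name: float(pct)
        for name, pct in zip(var_names, pct_missing)
        if pct > warning_threshold  # assumption: strict comparison
    }


# Toy matrix: the second column is missing in 3 of 4 rows (75 %).
X = np.array(
    [
        [1.0, np.nan, 3.0],
        [2.0, np.nan, np.nan],
        [3.0, np.nan, 5.0],
        [4.0, 1.0, 6.0],
    ]
)
flagged = columns_above_missing_threshold(X, ["age", "lactate", "bmi"])
print(flagged)
```

With the default threshold of 70, only the made-up `lactate` column is flagged here; a variable that sparse gives the imputation models very little observed data to learn from.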
Examples
>>> import ehrdata as ed
>>> import ehrapy as ep
>>> edata = ed.dt.mimic_2()
>>> edata = ep.pp.encode(edata, autodetect=True)
>>> ep.pp.mice_forest_impute(edata)
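For readers unfamiliar with the chained-equations idea behind this function, the sketch below shows the basic loop in plain NumPy. It is a simplified illustration, not what miceforest or ehrapy actually run: it swaps in ordinary least squares for the LightGBM models, is fully deterministic, and produces a single completed matrix rather than multiple imputations. The function name `chained_equations_impute` is hypothetical.

```python
import numpy as np


def chained_equations_impute(X, iterations=5):
    """Simplified chained-equations imputation with linear regression."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    filled = X.copy()

    # Step 1: start from a crude guess (column means of observed values).
    col_means = np.nanmean(X, axis=0)
    filled[missing] = np.take(col_means, np.nonzero(missing)[1])

    # Step 2: repeatedly re-predict each incomplete column from the others.
    for _ in range(iterations):
        for j in range(X.shape[1]):
            rows_missing = missing[:, j]
            if not rows_missing.any():
                continue  # nothing to impute in this column
            others = np.delete(filled, j, axis=1)
            design = np.column_stack([np.ones(len(others)), others])
            rows_observed = ~rows_missing
            # Fit on rows where column j was actually observed.
            coef, *_ = np.linalg.lstsq(
                design[rows_observed], filled[rows_observed, j], rcond=None
            )
            # Overwrite only the originally missing entries.
            filled[rows_missing, j] = design[rows_missing] @ coef
    return filled


# Toy data where the second column equals 2 * first + 1.
X = np.array([[1.0, 3.0], [2.0, 5.0], [3.0, 7.0], [4.0, np.nan], [5.0, 11.0]])
completed = chained_equations_impute(X)
print(completed[3, 1])
```

Each pass treats one incomplete variable as the target and all other variables as predictors, which is why the input must be fully numerical before imputation.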