ehrapy.preprocessing.knn_impute
- ehrapy.preprocessing.knn_impute(edata, var_names=None, *, n_neighbors=5, layer=None, copy=False, backend='faiss', warning_threshold=70, backend_kwargs=None, **kwargs)
Imputes missing values in the input data object using K-nearest neighbor imputation.
This imputation works on numerical data only, so any categorical variables must be encoded numerically beforehand.
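The core idea can be sketched from scratch with NumPy. This is a minimal, exact (brute-force) illustration, not ehrapy's implementation. `nan_euclidean` and `knn_impute_sketch` are hypothetical helper names. The sketch assumes the NaN-aware Euclidean distance used by scikit-learn's `KNNImputer`, and it fills each gap with the plain mean of the neighbors' values. It also shows why numerical encoding is required: neither the distances nor the means are defined for strings.

```python
import numpy as np


def nan_euclidean(a: np.ndarray, b: np.ndarray) -> float:
    """Euclidean distance over coordinates observed in both rows, rescaled by
    (total coordinates / observed coordinates) to compensate for the gaps."""
    observed = ~np.isnan(a) & ~np.isnan(b)
    n_observed = int(observed.sum())
    if n_observed == 0:
        return float("inf")  # no shared coordinates: treat as infinitely far
    squared = float(np.sum((a[observed] - b[observed]) ** 2))
    return float(np.sqrt(squared * a.size / n_observed))


def knn_impute_sketch(X: np.ndarray, n_neighbors: int = 5) -> np.ndarray:
    """Fill each NaN with the mean of that column over the n_neighbors nearest
    rows that have the column observed. Returns a new array."""
    X = np.asarray(X, dtype=float)
    imputed = X.copy()
    for i, j in zip(*np.where(np.isnan(X))):
        # Candidate donors: every other row where column j is observed.
        donors = [r for r in range(X.shape[0]) if r != i and not np.isnan(X[r, j])]
        if not donors:
            continue  # no row observes column j; leave the NaN in place
        dists = np.array([nan_euclidean(X[i], X[r]) for r in donors])
        nearest = np.argsort(dists, kind="stable")[:n_neighbors]
        # Distances are always computed on the original X, not on imputed values.
        imputed[i, j] = X[[donors[k] for k in nearest], j].mean()
    return imputed


X = np.array([
    [1.0, 2.0, np.nan],
    [3.0, 4.0, 3.0],
    [1.0, 2.0, 1.0],
    [8.0, 8.0, 7.0],
])
print(knn_impute_sketch(X, n_neighbors=2)[0, 2])  # 2.0: mean of rows 2 and 1
```

The 'scikit-learn' backend performs this kind of exact search. The 'faiss' backend replaces the brute-force neighbor search with an approximate one.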
Warning
Currently, both n_neighbours and n_neighbors are accepted as parameters for the number of neighbors. However, in future versions, only n_neighbors will be supported. Please update your code accordingly.
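The catch-all `**kwargs` in the signature is what keeps the old spelling working. Below is a minimal sketch of such a compatibility shim. It assumes the legacy name is read out of `**kwargs` and reported with a `DeprecationWarning`. The helper `resolve_n_neighbors` is hypothetical and is not part of ehrapy.

```python
import warnings


def resolve_n_neighbors(n_neighbors: int = 5, **kwargs) -> int:
    """Hypothetical helper: accept the deprecated spelling `n_neighbours`
    while steering callers toward `n_neighbors`."""
    if "n_neighbours" in kwargs:
        warnings.warn(
            "'n_neighbours' is deprecated; use 'n_neighbors' instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        # An explicitly passed legacy value overrides the default.
        n_neighbors = kwargs.pop("n_neighbours")
    if kwargs:
        # Anything else left over is a genuine typo or unsupported argument.
        raise TypeError(f"unexpected keyword arguments: {sorted(kwargs)}")
    return n_neighbors
```

In calling code, the fix is simply to pass `n_neighbors=...`.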
- Parameters:
  - var_names (Iterable[str] | None, default: None) – A list of variable names indicating which columns to impute. If None, all columns are imputed.
  - n_neighbors (int, default: 5) – Number of neighbors to use when performing the imputation.
  - copy (bool, default: False) – Whether to perform the imputation on a copy of the original data object. If True, the original object remains unmodified.
  - backend (Literal['scikit-learn', 'faiss'], default: 'faiss') – The implementation to use for the KNN imputation. 'scikit-learn' uses an exact KNN search but is very slow. 'faiss' is drastically faster but uses an approximate KNN graph. In practice, the 'faiss' results are close to the 'scikit-learn' results.
  - warning_threshold (int, default: 70) – Percentage of missing values above which a warning is issued.
  - backend_kwargs (dict | None, default: None) – Keyword arguments passed to the backend. For 'faiss', set 'strategy' to "mean", "median", or "weighted" to choose the imputation strategy. See sklearn.impute.KNNImputer for details on the 'scikit-learn' backend and fknni.faiss.FaissImputer for details on the 'faiss' backend.
  - kwargs – Collects keyword arguments from earlier ehrapy versions for backwards compatibility. Prefer the current arguments listed here.
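The `strategy` option only decides how the values of the selected neighbors are combined into one fill value. The sketch below applies the three named strategies to a single missing entry. `aggregate_neighbors` is a hypothetical helper. Reading "weighted" as inverse-distance weighting is an assumption and is not confirmed for `fknni`. It resembles scikit-learn's `weights='distance'` option for `KNNImputer`.

```python
import numpy as np


def aggregate_neighbors(values, distances, strategy="mean", eps=1e-12):
    """Combine the donor values of the k nearest neighbors into one fill value.
    `distances` is only used by the (assumed) inverse-distance 'weighted' mode."""
    values = np.asarray(values, dtype=float)
    distances = np.asarray(distances, dtype=float)
    if strategy == "mean":
        return float(values.mean())
    if strategy == "median":
        # Robust to a single outlying neighbor.
        return float(np.median(values))
    if strategy == "weighted":
        # Assumption: closer neighbors count more; eps avoids division by zero.
        weights = 1.0 / (distances + eps)
        return float(np.average(values, weights=weights))
    raise ValueError(f"unknown strategy: {strategy!r}")


values, distances = [1.0, 3.0, 11.0], [1.0, 1.0, 2.0]
for s in ("mean", "median", "weighted"):
    print(s, round(aggregate_neighbors(values, distances, s), 3))
```

With one distant outlier (11.0), "median" and the distance-weighted mean both stay closer to the two nearby neighbors than the plain mean does.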
- Returns:
If copy is True, a modified copy of the original data object with imputed X. If copy is False, the original data object is modified in place, and None is returned.
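Because the in-place call returns None, writing `edata = ep.pp.knn_impute(edata)` would rebind `edata` to None. The sketch below shows the same copy/in-place contract on a plain NumPy array. `impute_column_means` is a hypothetical stand-in. It uses column-mean imputation only to keep the example short and does not perform KNN imputation.

```python
import numpy as np


def impute_column_means(X: np.ndarray, copy: bool = False):
    """Hypothetical stand-in with the same contract as the documented function:
    copy=True returns a new, imputed array and leaves X untouched;
    copy=False modifies X in place and returns None."""
    target = X.copy() if copy else X
    col_means = np.nanmean(target, axis=0)
    rows, cols = np.where(np.isnan(target))
    target[rows, cols] = col_means[cols]
    return target if copy else None


X = np.array([[1.0, np.nan], [3.0, 4.0]])
filled = impute_column_means(X, copy=True)  # X still contains the NaN
impute_column_means(X)                      # X is now filled; do not reassign
```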
Examples
>>> import ehrdata as ed
>>> import ehrapy as ep
>>> edata = ed.dt.mimic_2()
>>> ep.ad.infer_feature_types(edata)
>>> ep.pp.knn_impute(edata)