ehrapy.preprocessing.knn_impute

ehrapy.preprocessing.knn_impute(adata, var_names=None, *, n_neighbors=5, copy=False, backend='faiss', warning_threshold=70, backend_kwargs=None, **kwargs)

Imputes missing values in the input AnnData object using K-nearest neighbor imputation.
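To illustrate the general idea, the following is a minimal pure-Python sketch of mean-based KNN imputation. It is not the code ehrapy or its backends run: the function name `knn_impute_rows`, the toy `rows` data, and the distance rescaling for partially observed rows are assumptions chosen for illustration.

```python
import math


def knn_impute_rows(rows, n_neighbors=2):
    """Fill None entries with the column mean over the nearest donor rows."""

    def distance(a, b):
        # Compare only columns observed in both rows, then rescale so that
        # rows sharing few observed columns are not favoured.
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        squared = sum((x - y) ** 2 for x, y in shared)
        return math.sqrt(squared * len(a) / len(shared))

    imputed = [list(row) for row in rows]  # work on a copy of the input
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Donors are the other rows in which column j is observed.
            donors = [
                (distance(row, other), other[j])
                for k, other in enumerate(rows)
                if k != i and other[j] is not None
            ]
            donors.sort(key=lambda pair: pair[0])
            nearest = [v for _, v in donors[:n_neighbors]]
            if nearest:
                imputed[i][j] = sum(nearest) / len(nearest)
    return imputed


rows = [
    [1.0, 2.0, 10.0],
    [1.1, 2.1, 12.0],
    [9.0, 9.0, 50.0],
    [1.0, 2.0, None],  # the last value is missing
]
result = knn_impute_rows(rows, n_neighbors=2)
print(result[3][2])  # mean of the two closest donors (10.0 and 12.0)
```

The missing entry is filled from the two rows most similar in the observed columns, so the distant third row does not contribute.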

When KNN imputation is applied to mixed data (numerical and non-numerical), ordinal encoding is required, because KNN imputation works only on numerical data. The encoding is only a utility step and is undone once imputation has run successfully.
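The encode, impute, decode round trip can be sketched in plain Python. The helper names `ordinal_encode` and `ordinal_decode` are hypothetical, the imputed value `0.2` is a stand-in for a real imputer's output, and rounding an imputed code to the nearest category is an assumption; how ehrapy maps imputed codes back to categories is not stated here.

```python
def ordinal_encode(column):
    """Map each distinct non-missing category to a numeric code."""
    categories = sorted({v for v in column if v is not None})
    to_code = {cat: code for code, cat in enumerate(categories)}
    encoded = [None if v is None else float(to_code[v]) for v in column]
    return encoded, categories


def ordinal_decode(encoded, categories):
    """Round each (possibly imputed) code to the nearest valid category."""
    last = len(categories) - 1
    return [categories[min(max(int(round(code)), 0), last)] for code in encoded]


sex = ["F", "M", None, "F"]
encoded, categories = ordinal_encode(sex)

# Stand-in for the imputation step: pretend the imputer filled the gap with 0.2.
filled = [0.2 if code is None else code for code in encoded]

decoded = ordinal_decode(filled, categories)
print(decoded)
```

After decoding, the column holds category labels again, with the former gap replaced by the category whose code is closest to the imputed number.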

Warning

Currently, both n_neighbours and n_neighbors are accepted as parameters for the number of neighbors. However, in future versions, only n_neighbors will be supported. Please update your code accordingly.
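A common way to accept a legacy spelling while steering callers to the new one is shown below. This is a generic pattern, not ehrapy's actual implementation: the helper `resolve_n_neighbors` is hypothetical, and the warning class ehrapy emits (if any) is not specified in this documentation.

```python
import warnings


def resolve_n_neighbors(n_neighbors=5, **kwargs):
    """Hypothetical helper: accept the legacy spelling but warn about it."""
    if "n_neighbours" in kwargs:
        warnings.warn(
            "'n_neighbours' is deprecated; use 'n_neighbors' instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        n_neighbors = kwargs.pop("n_neighbours")
    return n_neighbors


# Capture warnings so the deprecated call can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    k = resolve_n_neighbors(n_neighbours=3)

print(k, len(caught))
```

Code that already passes `n_neighbors` goes through the same helper without triggering a warning.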

Parameters:
  • adata (AnnData) – An annotated data matrix containing EHR data.

  • var_names (Iterable[str] | None, default: None) – Names of the variables (columns) to impute. If None, all columns are imputed.

  • n_neighbors (int, default: 5) – Number of neighbors to use when performing the imputation.

  • copy (bool, default: False) – Whether to perform the imputation on a copy of the original AnnData object. If True, the original object remains unmodified.

  • backend (Literal['scikit-learn', 'faiss'], default: 'faiss') – The implementation to use for the KNN imputation. ‘scikit-learn’ is very slow but uses an exact KNN algorithm, whereas ‘faiss’ is drastically faster but uses an approximation for the KNN graph. In practice, the ‘faiss’ results are close to those of ‘scikit-learn’.

  • warning_threshold (int, default: 70) – Percentage of missing values above which a warning is issued.

  • backend_kwargs (dict | None, default: None) – Keyword arguments passed to the backend. For the ‘faiss’ backend, set ‘strategy’ to “mean”, “median”, or “weighted” to choose the imputation strategy. See sklearn.impute.KNNImputer for more information on the ‘scikit-learn’ backend. See fknni.faiss.FaissImputer for more information on the ‘faiss’ backend.

  • kwargs – Collects keyword arguments from earlier ehrapy versions for backwards compatibility. Prefer the current arguments listed here.
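The effect of `warning_threshold` can be pictured with the sketch below. It assumes the missing-value percentage is computed per column, which this documentation does not state explicitly; the helper `warn_if_too_sparse` and the toy `columns` data are hypothetical.

```python
import warnings


def warn_if_too_sparse(columns, warning_threshold=70):
    """Warn for each column whose share of missing values exceeds the threshold."""
    flagged = []
    for name, values in columns.items():
        missing_pct = 100.0 * sum(v is None for v in values) / len(values)
        if missing_pct > warning_threshold:
            warnings.warn(
                f"{name}: {missing_pct:.1f}% missing, imputation may be unreliable"
            )
            flagged.append(name)
    return flagged


columns = {
    "heart_rate": [80.0, None, 75.0, 90.0],  # 25% missing
    "lactate": [None, None, None, 1.2],      # 75% missing
}

# Capture warnings so they can be counted instead of printed.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    flagged = warn_if_too_sparse(columns, warning_threshold=70)

print(flagged, len(caught))
```

With the default threshold of 70, only the column that is 75% missing triggers a warning; lowering the threshold flags more columns.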

Return type:

AnnData

Returns:

An updated AnnData object with imputed values.

Raises:

ValueError – If the input data matrix contains only categorical (non-numeric) values.

Examples

>>> import ehrapy as ep
>>> adata = ep.dt.mimic_2(encoded=True)
>>> ep.ad.infer_feature_types(adata)
>>> ep.pp.knn_impute(adata)