ehrapy.preprocessing.knn_impute(adata, var_names=None, n_neighbours=5, copy=False, warning_threshold=70)

Imputes missing values in the input AnnData object using k-nearest neighbor (KNN) imputation.
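The minimal NumPy sketch below shows KNN imputation on a purely numeric matrix. It is not ehrapy's implementation: the function name `knn_impute_sketch` is hypothetical. It assumes a nan-Euclidean distance, that is, squared differences over the features both rows observe, rescaled by the fraction of shared features. This is similar in spirit to the default metric of scikit-learn's `KNNImputer`. Each missing entry is filled with the mean of that feature over the `n_neighbours` closest rows that observe it.

```python
import numpy as np


def knn_impute_sketch(X, n_neighbours=5):
    # Illustrative sketch, not ehrapy's implementation: fill each NaN with the
    # mean of that feature over the n_neighbours nearest rows that observe it.
    X = np.asarray(X, dtype=float)
    observed = ~np.isnan(X)
    filled = X.copy()
    n_features = X.shape[1]
    for i, j in zip(*np.nonzero(~observed)):
        # Donor rows: rows where feature j is observed (row i is excluded
        # automatically because X[i, j] is missing).
        donors = np.flatnonzero(observed[:, j])
        dists = np.full(donors.size, np.inf)
        for k, d in enumerate(donors):
            # Compare only the features observed in both rows.
            shared = observed[i] & observed[d]
            if shared.any():
                diff = X[i, shared] - X[d, shared]
                # Scale up so distances over fewer shared features stay comparable.
                dists[k] = np.sqrt(np.sum(diff**2) * n_features / shared.sum())
        finite = np.isfinite(dists)
        if not finite.any():
            continue  # no usable donor: leave the value missing
        donors, dists = donors[finite], dists[finite]
        nearest = donors[np.argsort(dists)[:n_neighbours]]
        filled[i, j] = X[nearest, j].mean()
    return filled
```

In this sketch, distances are always computed from the original matrix. Values imputed earlier in the loop therefore do not influence later imputations.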

When KNN imputation is applied to mixed data (numerical and non-numerical), the non-numerical columns must first be ordinally encoded, because KNN imputation works only on numerical data. This encoding is only a temporary utility step and is undone once imputation has completed successfully.
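The encode, impute, decode round trip described above can be sketched as follows. The helpers `ordinal_encode` and `ordinal_decode` are hypothetical and only illustrate the idea. They are not ehrapy functions. The fractional value written into the missing code stands in for whatever a KNN imputer would produce, for example a mean of neighbour codes. That is why decoding rounds to the nearest valid code.

```python
import numpy as np


def ordinal_encode(values):
    # Hypothetical helper: map each distinct non-missing category to an integer
    # code (in sorted order); missing entries (None) become NaN.
    categories = sorted({v for v in values if v is not None})
    lookup = {c: float(code) for code, c in enumerate(categories)}
    codes = np.array([np.nan if v is None else lookup[v] for v in values])
    return codes, categories


def ordinal_decode(codes, categories):
    # Imputed codes may be fractional, so round to the nearest valid code
    # before mapping back to the original labels; NaN stays missing (None).
    out = []
    for c in np.asarray(codes, dtype=float):
        if np.isnan(c):
            out.append(None)
        else:
            k = int(np.clip(np.rint(c), 0, len(categories) - 1))
            out.append(categories[k])
    return out


codes, cats = ordinal_encode(["never", "former", None, "current", "never"])
codes[2] = 1.2  # stand-in for a value produced by a KNN imputer
labels = ordinal_decode(codes, cats)
```

Ordinal encoding imposes an order and spacing on categories that may have none. Distances between encoded categorical values are therefore only a rough proxy for similarity.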

  • adata (AnnData) – An annotated data matrix containing the values to impute.

  • var_names (Iterable[str] | None) – The names of the variables (columns) to impute. If None, all columns are imputed. Defaults to None.

  • n_neighbours (int) – Number of neighbors to use when performing the imputation. Defaults to 5.

  • copy (bool) – Whether to perform the imputation on a copy of the original AnnData object. If True, the original object remains unmodified. Defaults to False.

  • warning_threshold (int) – Percentage of missing values above which a warning is issued. Defaults to 70.
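The following sketch illustrates how `var_names` and `warning_threshold` interact. It computes the percentage of missing values in each selected column and warns when that percentage exceeds the threshold. The helper `warn_if_mostly_missing` and its `column_names` argument are hypothetical. Whether ehrapy uses a strict `>` comparison is an assumption here.

```python
import warnings

import numpy as np


def warn_if_mostly_missing(X, column_names, var_names=None, warning_threshold=70):
    # Hypothetical helper (not ehrapy's code): for each selected column, compute
    # the percentage of missing values and warn if it is above warning_threshold.
    X = np.asarray(X, dtype=float)
    column_names = list(column_names)
    # Mirror the documented default: var_names=None means "all columns".
    selected = column_names if var_names is None else list(var_names)
    missing_pct = {}
    for name in selected:
        j = column_names.index(name)
        pct = 100.0 * float(np.isnan(X[:, j]).mean())
        missing_pct[name] = pct
        if pct > warning_threshold:  # assumption: strictly "above" the threshold
            warnings.warn(f"{name}: {pct:.1f}% of values are missing", stacklevel=2)
    return missing_pct
```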

Return type:

AnnData

Returns:

An updated AnnData object with imputed values.

Raises:

ValueError – If the input data matrix contains only categorical (non-numeric) values.

Examples:

>>> import ehrapy as ep
>>> adata = ep.dt.mimic_2(encoded=True)
>>> ep.pp.knn_impute(adata)