ehrapy.preprocessing.knn_impute

ehrapy.preprocessing.knn_impute(adata, var_names=None, n_neighbours=5, copy=False, warning_threshold=30)

Imputes missing values in the input AnnData object using K-nearest neighbor imputation.

When KNN imputation is applied to mixed data (numerical and non-numerical), ordinal encoding of the non-numerical columns is required, because KNN imputation only works on numerical data. The encoding is purely a utility step and is undone once imputation has completed successfully.
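The encode, impute, decode round trip described above can be illustrated with a small pure-Python sketch. This is not ehrapy's implementation: the helper names `ordinal_encode` and `knn_impute_rows` are hypothetical, the distance is a simple root-mean-square difference over the coordinates both rows have observed, and no feature scaling is applied.

```python
import math


def ordinal_encode(column):
    """Map each distinct non-missing category to a float code (hypothetical helper)."""
    categories = sorted({v for v in column if v is not None})
    to_code = {cat: float(i) for i, cat in enumerate(categories)}
    encoded = [to_code[v] if v is not None else math.nan for v in column]
    return encoded, categories


def knn_impute_rows(rows, n_neighbours=5):
    """Fill each NaN with the mean of that column over the nearest rows (hypothetical helper)."""
    imputed = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if not math.isnan(value):
                continue
            candidates = []
            for k, other in enumerate(rows):
                # A neighbour must be another row with an observed value in column j.
                if k == i or math.isnan(other[j]):
                    continue
                # Compare only the coordinates that both rows have observed.
                shared = [
                    (a - b) ** 2
                    for a, b in zip(row, other)
                    if not (math.isnan(a) or math.isnan(b))
                ]
                if shared:
                    distance = math.sqrt(sum(shared) / len(shared))
                    candidates.append((distance, other[j]))
            candidates.sort(key=lambda pair: pair[0])
            nearest = [v for _, v in candidates[:n_neighbours]]
            if nearest:
                imputed[i][j] = sum(nearest) / len(nearest)
    return imputed


# Toy mixed data: one numerical column and one categorical column with a gap.
ages = [30.0, 32.0, 70.0, 71.0]
sex = ["F", None, "M", "M"]

codes, categories = ordinal_encode(sex)              # encode
rows = [[a, c] for a, c in zip(ages, codes)]
filled = knn_impute_rows(rows, n_neighbours=1)       # impute on numbers only
decoded = [                                          # undo the encoding
    categories[min(max(round(r[1]), 0), len(categories) - 1)] for r in filled
]
```

In this toy data the second row's missing category is filled from its nearest neighbour by age, then mapped back to a label. Rounding the imputed code to the nearest category is one possible decoding choice, assumed here for illustration.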

Args:

adata: An annotated data matrix containing gene expression values.

var_names: A list of variable names indicating which columns to impute. If None, all columns are imputed. Defaults to None.

n_neighbours: Number of neighbors to use when performing the imputation. Defaults to 5.

copy: Whether to perform the imputation on a copy of the original AnnData object. If True, the original object remains unmodified. Defaults to False.

warning_threshold: Percentage of missing values above which a warning is issued. Defaults to 30.
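As a rough illustration of what a percentage-based threshold means, the sketch below flags columns whose share of missing values exceeds `warning_threshold`. The function `check_missing` is hypothetical and not part of ehrapy, and the text does not say whether the real check is applied per variable or over the whole matrix; a per-column check is assumed here.

```python
import math
import warnings


def check_missing(columns, warning_threshold=30):
    """Warn for every column whose missing share (in percent) exceeds the threshold.

    Hypothetical helper for illustration; assumes a per-column check.
    """
    flagged = {}
    for name, values in columns.items():
        n_missing = sum(
            1 for v in values if v is None or (isinstance(v, float) and math.isnan(v))
        )
        pct = 100.0 * n_missing / len(values)
        if pct > warning_threshold:
            warnings.warn(
                f"{name}: {pct:.1f}% missing values exceeds {warning_threshold}%"
            )
            flagged[name] = pct
    return flagged


columns = {
    "age": [30.0, math.nan, 70.0, 71.0],          # 25% missing, below threshold
    "lactate": [math.nan, math.nan, 1.2, math.nan],  # 75% missing, above threshold
}

# Record warnings instead of printing them so the result can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    flagged = check_missing(columns, warning_threshold=30)
```

With the default of 30, only the `lactate` column is flagged in this toy data.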

Returns:

An updated AnnData object with imputed values.

Raises:

ValueError: If the input data matrix contains only categorical (non-numeric) values.

Examples:
>>> import ehrapy as ep
>>> adata = ep.dt.mimic_2(encoded=True)
>>> ep.pp.knn_impute(adata)