ehrapy.preprocessing.iterative_svd_impute

ehrapy.preprocessing.iterative_svd_impute(adata, var_names=None, copy=False, warning_threshold=70, rank=10, convergence_threshold=1e-05, max_iters=200, gradual_rank_increase=True, svd_algorithm='arpack', init_fill_method='mean', min_value=None, max_value=None, verbose=False)

Impute missing values in an AnnData object using the IterativeSVD algorithm.

The IterativeSVD algorithm is a matrix completion method based on iterative low-rank singular value decomposition (SVD). This function can impute missing values for numerical and ordinal-encoded data.
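
To illustrate the general idea of iterative low-rank SVD imputation, the following is a minimal NumPy sketch. It is not ehrapy's implementation: the function name `iterative_svd_sketch` and its `n_iters` argument are hypothetical, it always fills with column means first, and it runs a fixed number of iterations instead of checking convergence.

```python
import numpy as np


def iterative_svd_sketch(X, rank=2, n_iters=50):
    """Fill NaNs in X by repeated rank-`rank` SVD reconstruction (illustrative only)."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)

    # Initialise missing entries with the column means of the observed values.
    col_means = np.nanmean(X, axis=0)
    filled = np.where(missing, col_means, X)

    for _ in range(n_iters):
        # Low-rank approximation: keep only the `rank` largest singular values.
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank]

        # Observed entries stay fixed; only missing entries are overwritten.
        filled[missing] = low_rank[missing]

    return filled


# Toy data: an exactly rank-2 matrix with roughly 10% of its entries removed.
rng = np.random.default_rng(0)
truth = rng.normal(size=(30, 2)) @ rng.normal(size=(2, 8))
X = truth.copy()
X[rng.random(truth.shape) < 0.1] = np.nan

imputed = iterative_svd_sketch(X, rank=2)
```

In this sketch the observed values are never modified; each pass only refines the guesses at the missing positions using the current low-rank reconstruction.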

Parameters:
  • adata (AnnData) – An AnnData object to impute missing values in.

  • var_names (Iterable[str] | None) – A list of var names indicating which columns to impute. If None, all columns will be imputed. Defaults to None.

  • copy (bool) – Whether to return a copy of the AnnData object or act in place. Defaults to False.

  • warning_threshold (int) – Percentage threshold of missing values that triggers a warning. Defaults to 70.

  • rank (int) – Rank of the low-rank SVD approximation. Defaults to 10.

  • convergence_threshold (float) – Convergence threshold for the iterative algorithm. The algorithm stops when the relative difference in Frobenius norm between two consecutive iterations falls below convergence_threshold. Defaults to 1e-05.

  • max_iters (int) – Maximum number of iterations. The algorithm stops after max_iters iterations if it does not converge. Defaults to 200.

  • gradual_rank_increase (bool) – Whether to increase the rank gradually or to use the rank value immediately. Defaults to True.

  • svd_algorithm (Literal['arpack', 'randomized']) – The SVD algorithm to use. One of {‘arpack’, ‘randomized’}. Defaults to ‘arpack’.

  • init_fill_method (Literal['zero', 'mean', 'median']) – The fill method used to initialize missing values before the first iteration. One of {‘zero’, ‘mean’, ‘median’}. Defaults to ‘mean’.

  • min_value (float | None) – The minimum value allowed for the imputed data. Any imputed value less than min_value is clipped to min_value. Defaults to None.

  • max_value (float | None) – The maximum value allowed for the imputed data. Any imputed value greater than max_value is clipped to max_value. Defaults to None.

  • verbose (bool) – Whether to print progress messages during the imputation. Defaults to False.
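
The stopping rule and the clipping bounds described above can be illustrated with plain NumPy. This is only a sketch under assumptions: the helper `relative_change` is hypothetical, and whether ehrapy measures the difference over all entries or only over the imputed ones is not stated here (the sketch uses all entries).

```python
import numpy as np


def relative_change(old, new):
    """Relative Frobenius-norm difference between two iterates."""
    # For 2-D arrays, np.linalg.norm defaults to the Frobenius norm.
    return np.linalg.norm(new - old) / np.linalg.norm(old)


old = np.array([[1.0, 2.0], [3.0, 4.0]])
new = old + 1e-7  # a very small update between two iterations

# Compare against convergence_threshold (1e-05 by default).
converged = relative_change(old, new) < 1e-05

# min_value / max_value bound the imputed values, much like np.clip.
imputed = np.array([-3.0, 0.5, 12.0])
clipped = np.clip(imputed, 0.0, 10.0)  # assumed bounds: min_value=0.0, max_value=10.0
```

If the relative change never drops below the threshold, max_iters caps the number of iterations.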

Return type:

AnnData

Returns:

An AnnData object with imputed values.

Raises:
  • ValueError – If svd_algorithm is not one of {‘arpack’, ‘randomized’}.

  • ValueError – If init_fill_method is not one of {‘zero’, ‘mean’, ‘median’}.

Examples

>>> import ehrapy as ep
>>> adata = ep.dt.mimic_2(encoded=True)
>>> ep.ad.infer_feature_types(adata)
>>> ep.pp.iterative_svd_impute(adata)