ehrapy.preprocessing.soft_impute

ehrapy.preprocessing.soft_impute(adata, var_names=None, *, copy=False, warning_threshold=70, shrinkage_value=None, convergence_threshold=0.001, max_iters=100, max_rank=None, n_power_iterations=1, init_fill_method='zero', min_value=None, max_value=None, normalizer=None, verbose=False)

Impute data using SoftImpute.

Matrix completion by iterative soft thresholding of SVD decompositions. See https://github.com/iskandr/fancyimpute/blob/master/fancyimpute/soft_impute.py
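The idea of iterative soft thresholding can be illustrated with a minimal NumPy sketch. This is not the code used by ehrapy or fancyimpute: the function name soft_impute_sketch is hypothetical, it always uses a full SVD and zero initialization, and its stopping rule is a simplified reading of the convergence_threshold description below.

```python
import numpy as np


def soft_impute_sketch(X, shrinkage_value, max_iters=100, convergence_threshold=0.001):
    """Fill NaNs in X by iterative soft-thresholded SVD (illustrative only)."""
    missing = np.isnan(X)
    # Start from the observed values with zeros in the missing positions.
    X_filled = np.where(missing, 0.0, X)
    for _ in range(max_iters):
        # Decompose the current estimate and shrink every singular value,
        # clamping at zero (the "soft threshold").
        U, s, Vt = np.linalg.svd(X_filled, full_matrices=False)
        s_shrunk = np.maximum(s - shrinkage_value, 0.0)
        X_low_rank = (U * s_shrunk) @ Vt
        # Keep observed entries as they are; update only the missing ones.
        X_new = np.where(missing, X_low_rank, X)
        # Relative change, measured with the Frobenius norm.
        change = np.linalg.norm(X_new - X_filled)
        scale = np.linalg.norm(X_filled)
        X_filled = X_new
        if scale > 0 and change / scale < convergence_threshold:
            break
    return X_filled
```

Observed entries are never modified in this sketch; only the positions that were NaN in the input receive values from the low-rank reconstruction.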

Parameters:
  • adata (AnnData) – The AnnData object to impute missing values for.

  • var_names (Iterable[str] | None) – Names of the variables (columns) to impute. If None, all columns are imputed.

  • copy (bool) – Whether to return a copy or act in place.

  • warning_threshold (int) – Percentage of missing values above which a warning is displayed. Defaults to 70.

  • shrinkage_value (float | None) – Value by which we shrink singular values on each iteration. If omitted then the default value will be the maximum singular value of the initialized matrix (zeros for missing values) divided by 50.

  • convergence_threshold (float) – Minimum ratio difference between iterations (as a fraction of the Frobenius norm of the current solution) before stopping. Defaults to 0.001.

  • max_iters (int) – Maximum number of SVD iterations. Defaults to 100.

  • max_rank (int | None) – Perform a truncated SVD on each iteration with this value as its rank. Defaults to None.

  • n_power_iterations (int) – Number of power iterations to perform with randomized SVD. Defaults to 1.

  • init_fill_method (str) – How to initialize missing values of the data matrix. Defaults to 'zero', which fills them with zeros.

  • min_value (float | None) – Smallest allowable value in the solution.

  • max_value (float | None) – Largest allowable value in the solution.

  • normalizer (object | None) – Any object (such as BiScaler) with fit() and transform() methods.

  • verbose (bool) – Print debugging info. Defaults to False.
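As a rough illustration of two of the parameters above, the sketch below computes the default described for shrinkage_value (largest singular value of the zero-filled matrix divided by 50) and applies min_value/max_value bounds to a solution. The helper names default_shrinkage and clip_solution are hypothetical and are not part of the ehrapy API; the actual implementation may differ in detail.

```python
import numpy as np


def default_shrinkage(X):
    """Largest singular value of the zero-filled matrix, divided by 50."""
    X_zero = np.where(np.isnan(X), 0.0, X)
    # Singular values are returned in descending order, so index 0 is the largest.
    max_singular_value = np.linalg.svd(X_zero, compute_uv=False)[0]
    return max_singular_value / 50


def clip_solution(X, min_value=None, max_value=None):
    """Restrict a solution to [min_value, max_value]; None means unbounded."""
    X = X.copy()
    if min_value is not None:
        X[X < min_value] = min_value
    if max_value is not None:
        X[X > max_value] = max_value
    return X


# Toy matrix with one missing entry, for illustration only.
X = np.array([[3.0, np.nan], [0.0, 4.0]])
print(default_shrinkage(X))
print(clip_solution(np.array([[-1.0, 5.0], [2.0, 10.0]]), min_value=0.0, max_value=8.0))
```

A larger shrinkage value removes more of each singular value per iteration and so tends to produce a lower-rank solution.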

Return type:

AnnData

Returns:

The AnnData object with imputed missing values.

Examples

>>> import ehrapy as ep
>>> adata = ep.dt.mimic_2(encoded=True)
>>> ep.ad.infer_feature_types(adata)
>>> ep.pp.soft_impute(adata)