ehrapy.preprocessing.matrix_factorization_impute(adata, var_names=None, warning_threshold=70, rank=40, learning_rate=0.01, max_iters=50, shrinkage_value=0, min_value=None, max_value=None, verbose=False, copy=False)

Impute data using matrix factorization.

Trains a matrix factorization model to predict empty entries in a matrix.

  • adata (AnnData) – The AnnData object to use MatrixFactorization on.

  • var_names (Iterable[str] | None) – A list of var names indicating which columns to impute. If None, all columns are imputed.

  • warning_threshold (int) – Percentage of missing values above which a warning is displayed. Defaults to 70.

  • rank (int) – Number of latent factors to use in the matrix factorization model. It determines the size of the latent feature space that will be used to estimate the missing values. A higher rank will allow for more complex relationships between the features, but it can also lead to overfitting. Defaults to 40.

  • learning_rate (float) – The learning rate is the step size at which the optimization algorithm updates the model parameters during training. A larger learning rate can lead to faster convergence, but if it is set too high, the optimization can become unstable. Defaults to 0.01.

  • max_iters (int) – Maximum number of iterations to train the matrix factorization model for. The algorithm stops once this number of iterations is reached, or if convergence is achieved earlier. Defaults to 50.

  • shrinkage_value (float) – The shrinkage value is a regularization parameter that controls the amount of shrinkage applied to the estimated values during optimization. This term is added to the loss function and serves to penalize large values in the estimated matrix. A higher shrinkage value can help prevent overfitting, but can also lead to underfitting if set too high. Defaults to 0.

  • min_value (float | None) – The minimum value allowed for the imputed data. Any imputed value less than min_value is clipped to min_value. Defaults to None.

  • max_value (float | None) – The maximum value allowed for the imputed data. Any imputed value greater than max_value is clipped to max_value. Defaults to None.

  • verbose (bool) – Whether to print training progress. Defaults to False.

  • copy (bool) – Whether to return a copy or act in place. Defaults to False.
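To make the roles of rank, learning_rate, max_iters, shrinkage_value, min_value and max_value concrete, here is a minimal NumPy sketch of matrix factorization imputation by gradient descent. It is not ehrapy's implementation: the function name mf_impute_sketch, the seed argument, the random initialization, and the choice to apply the shrinkage penalty to the two factor matrices are all assumptions made for illustration.

```python
import numpy as np


def mf_impute_sketch(X, rank=2, learning_rate=0.01, max_iters=50,
                     shrinkage_value=0.0, min_value=None, max_value=None,
                     seed=0):
    """Illustrative matrix factorization imputation (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    observed = ~np.isnan(X)                # True where a value is known
    X_filled = np.where(observed, X, 0.0)  # zeros are masked out of the loss

    # Assumption: small random start for the two factor matrices.
    rng = np.random.default_rng(seed)
    n_rows, n_cols = X.shape
    U = rng.normal(scale=0.1, size=(n_rows, rank))
    V = rng.normal(scale=0.1, size=(n_cols, rank))

    for _ in range(max_iters):
        # Reconstruction error, counted on observed entries only.
        residual = np.where(observed, U @ V.T - X_filled, 0.0)
        # Gradients of 0.5*||residual||^2 + 0.5*shrinkage*(||U||^2 + ||V||^2).
        grad_U = residual @ V + shrinkage_value * U
        grad_V = residual.T @ U + shrinkage_value * V
        U -= learning_rate * grad_U
        V -= learning_rate * grad_V

    estimate = U @ V.T
    if min_value is not None or max_value is not None:
        estimate = np.clip(estimate, min_value, max_value)
    # Observed values are kept; only missing entries are filled.
    return np.where(observed, X, estimate)


X = np.array([[1.0, 2.0, np.nan],
              [2.0, np.nan, 6.0],
              [3.0, 6.0, 9.0]])
X_imputed = mf_impute_sketch(X, rank=1, max_iters=500,
                             min_value=0.0, max_value=10.0)
```

In this sketch a larger rank adds columns to U and V, learning_rate scales each update step, and shrinkage_value pulls the factors toward zero. Clipping is applied only to the estimates, so observed values are never altered.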

Returns:

  The imputed AnnData object.


>>> import ehrapy as ep
>>> adata = ep.dt.mimic_2(encoded=True)
>>> ep.pp.matrix_factorization_impute(adata)
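Before imputing, it can help to see which columns would exceed warning_threshold. The sketch below computes the percentage of missing values per column with plain NumPy. The helper name columns_above_threshold, the example column names, and the strict "greater than" comparison are assumptions for illustration; ehrapy's own check may differ.

```python
import numpy as np


def columns_above_threshold(X, var_names, warning_threshold=70):
    """Map each column name to its missing percentage if above the threshold."""
    X = np.asarray(X, dtype=float)
    missing_pct = np.isnan(X).mean(axis=0) * 100  # percent missing per column
    return {
        name: float(pct)
        for name, pct in zip(var_names, missing_pct)
        if pct > warning_threshold  # assumption: strict comparison
    }


# Hypothetical data: 4 observations, 3 variables.
X = np.array([[1.0, np.nan, 3.0],
              [np.nan, np.nan, 6.0],
              [7.0, np.nan, 9.0],
              [10.0, 11.0, 12.0]])
flagged = columns_above_threshold(X, ["age", "lactate", "sodium"])
```

Columns flagged this way have few observed values to learn from, so their imputed values deserve extra scrutiny.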