ehrapy.preprocessing.pca
- ehrapy.preprocessing.pca(data, n_comps=None, zero_center=True, svd_solver='arpack', random_state=0, return_info=False, dtype='float32', copy=False, chunked=False, chunk_size=None)
Computes a principal component analysis.
Computes PCA coordinates, loadings and variance decomposition using the scikit-learn implementation.
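To make the three outputs concrete, the sketch below computes them for a small dense array with plain NumPy, using the SVD of the zero-centered data. It is an illustration of standard PCA under that assumption, not ehrapy's or scikit-learn's code path, and the helper name `pca_sketch` is hypothetical.

```python
import numpy as np

def pca_sketch(X, n_comps):
    """Zero-centered PCA via SVD; hypothetical helper for illustration only."""
    X = np.asarray(X, dtype=np.float64)
    n_obs = X.shape[0]
    Xc = X - X.mean(axis=0)                          # zero-center each feature
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    loadings = Vt[:n_comps].T                        # n_vars x n_comps, one column per PC
    X_pca = Xc @ loadings                            # n_obs x n_comps coordinates
    variance = S[:n_comps] ** 2 / (n_obs - 1)        # variance along each PC
    total_variance = (S ** 2).sum() / (n_obs - 1)    # total variance of the data
    variance_ratio = variance / total_variance       # share of total variance per PC
    return X_pca, loadings, variance, variance_ratio

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))
X_pca, loadings, variance, variance_ratio = pca_sketch(X, n_comps=3)
print(X_pca.shape, loadings.shape)  # (100, 3) (6, 3)
```

The coordinates are simply the centered data projected onto the loadings, so each column of `X_pca` has a sample variance equal to the matching entry of `variance`.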
- Parameters:
data (AnnData | ndarray | spmatrix) – The (annotated) data matrix of shape n_obs × n_vars. Rows correspond to observations and columns to features.
n_comps (int | None, default: None) – Number of principal components to compute. Defaults to 50, or to the minimum dimension size of the selected representation minus 1 if that is smaller.
zero_center (bool | None, default: True) – If True, compute standard PCA from the covariance matrix. If False, omit zero-centering the variables (uses TruncatedSVD), which allows sparse input to be handled efficiently. Passing None decides automatically based on the sparseness of the data.
svd_solver (str, default: 'arpack') – SVD solver to use:
  - 'arpack' (the default): the ARPACK wrapper in SciPy (svds()).
  - 'randomized': the randomized algorithm due to Halko (2009).
  - 'auto': chooses automatically depending on the size of the problem.
  - 'lobpcg': an alternative SciPy solver.
  Efficient computation of the principal components of a sparse matrix currently only works with the 'arpack' or 'lobpcg' solvers.
random_state (int | RandomState | None, default: 0) – Change to use different initial states for the optimization.
return_info (bool, default: False) – Only relevant when not passing an AnnData: see “Returns”.
dtype (str, default: 'float32') – NumPy data type string to which the result is converted.
copy (bool, default: False) – If an AnnData is passed, determines whether a copy is returned. Ignored otherwise.
chunked (bool, default: False) – If True, perform an incremental PCA on segments of chunk_size observations. The incremental PCA automatically zero-centers and ignores the settings of random_state and svd_solver. If False, perform a full PCA.
chunk_size (int | None, default: None) – Number of observations to include in each chunk. Required if chunked=True is passed.
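Two of the parameters above can be illustrated without ehrapy. The first helper encodes the documented fallback for n_comps=None (50, capped at one less than the smaller data dimension). The second shows why zero_center matters: it takes the leading right singular vector with and without centering, using NumPy's dense SVD as a stand-in for the TruncatedSVD path. Both helper names are hypothetical and the data are synthetic.

```python
import numpy as np

N_PCS_DEFAULT = 50  # default taken from the n_comps description

def default_n_comps(n_obs, n_vars):
    """Fallback for n_comps=None as described: 50, capped at min(n_obs, n_vars) - 1."""
    return min(N_PCS_DEFAULT, min(n_obs, n_vars) - 1)

def first_component(X, zero_center=True):
    """Leading right singular vector, with or without centering (illustration only)."""
    X = np.asarray(X, dtype=np.float64)
    if zero_center:
        X = X - X.mean(axis=0)  # standard PCA: center each feature first
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return Vt[0]

rng = np.random.default_rng(0)
# Feature 0: small spread but a large mean; feature 1: larger spread, zero mean.
X = rng.normal(size=(500, 2)) * [1.0, 3.0] + [100.0, 0.0]

print(default_n_comps(1000, 30))    # 29
print(default_n_comps(1000, 2000))  # 50
# Centered: the first component follows the high-variance feature 1.
print(np.round(np.abs(first_component(X)), 2))
# Uncentered: the first component is pulled toward the large mean of feature 0.
print(np.round(np.abs(first_component(X, zero_center=False)), 2))
```

Skipping centering keeps sparse matrices sparse, but, as the example suggests, the leading component then tends to describe the feature means rather than the spread around them.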
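The 'randomized' solver refers to the randomized range-finder approach of Halko et al. The bare-bones NumPy sketch below conveys the idea: sample the range of the matrix with a Gaussian test matrix, refine it with a few power iterations, then take a small exact SVD in that subspace. It is not scikit-learn's implementation; the function name and the n_oversamples and n_iter parameters are assumptions chosen for this sketch.

```python
import numpy as np

def randomized_svd_sketch(X, k, n_oversamples=10, n_iter=4, seed=0):
    """Randomized truncated SVD (range finder + power iterations), illustration only."""
    rng = np.random.default_rng(seed)
    # Sample the range of X with a Gaussian test matrix, slightly oversampled.
    omega = rng.standard_normal((X.shape[1], k + n_oversamples))
    Q, _ = np.linalg.qr(X @ omega)
    # Power iterations sharpen the captured subspace; re-orthonormalize each step.
    for _ in range(n_iter):
        Q, _ = np.linalg.qr(X.T @ Q)
        Q, _ = np.linalg.qr(X @ Q)
    # Project X onto the subspace and take a small exact SVD there.
    B = Q.T @ X
    Ub, S, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ Ub)[:, :k], S[:k], Vt[:k]

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 40))  # exactly rank 5
U, S, Vt = randomized_svd_sketch(X, k=5)
S_exact = np.linalg.svd(X, compute_uv=False)[:5]
print(np.allclose(S, S_exact))  # True
```

Because the test matrix here has exact rank 5, the sampled subspace contains the whole range of X and the result matches the exact SVD; on noisy data the randomized result is an approximation whose quality improves with oversampling and power iterations.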
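With chunked=True, only chunk_size observations need to be processed at a time. The sketch below conveys chunk-wise processing in a simplified way: it accumulates feature sums and cross-products over chunks, eigendecomposes the resulting covariance matrix, and then projects the data chunk by chunk. This is a stand-in chosen for clarity; scikit-learn's IncrementalPCA, which the chunked mode relies on, updates an SVD incrementally instead. The helper name `chunked_pca_sketch` is hypothetical.

```python
import numpy as np

def chunked_pca_sketch(X, n_comps, chunk_size):
    """PCA from covariance accumulated chunk by chunk (hypothetical helper)."""
    n_obs, n_vars = X.shape
    total = np.zeros(n_vars)
    cross = np.zeros((n_vars, n_vars))
    # First pass: accumulate sums and cross-products one chunk at a time.
    for start in range(0, n_obs, chunk_size):
        chunk = np.asarray(X[start:start + chunk_size], dtype=np.float64)
        total += chunk.sum(axis=0)
        cross += chunk.T @ chunk
    mean = total / n_obs
    # Naive one-pass covariance: fine for illustration, but it can lose
    # precision when feature means are large relative to their spread.
    cov = (cross - n_obs * np.outer(mean, mean)) / (n_obs - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)           # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_comps]      # largest first
    loadings = eigvecs[:, order]
    variance = eigvals[order]
    # Second pass: center and project each chunk onto the loadings.
    X_pca = np.vstack([
        (np.asarray(X[s:s + chunk_size], dtype=np.float64) - mean) @ loadings
        for s in range(0, n_obs, chunk_size)
    ])
    return X_pca, loadings, variance

rng = np.random.default_rng(2)
X = rng.normal(size=(250, 5)) @ rng.normal(size=(5, 5))
X_pca, loadings, variance = chunked_pca_sketch(X, n_comps=3, chunk_size=37)
print(X_pca.shape)  # (250, 3)
```

Note that the centering happens inside the sketch regardless of any setting, which mirrors the statement above that the incremental PCA always zero-centers.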
- Return type:
- Returns:
If data is array-like and return_info=False was passed, this function only returns X_pca…
adata : AnnData
…otherwise, if copy=True, it returns a copy of adata with the fields below added; if copy=False, it adds them to adata in place:
.obsm['X_pca'] – PCA representation of the data.
.varm['PCs'] – The principal components containing the loadings.
.uns['pca']['variance_ratio'] – Ratio of explained variance.
.uns['pca']['variance'] – Explained variance, equivalent to the eigenvalues of the covariance matrix.
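The equivalence stated for the variance field can be checked with NumPy alone: the squared singular values of the zero-centered data, divided by n_obs - 1, equal the leading eigenvalues of the sample covariance matrix, and dividing those by the trace of that matrix (the total variance) gives the variance ratio. The dictionary below is a hypothetical stand-in for the .uns['pca'] entry; the sketch does not call ehrapy.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4)) @ np.diag([3.0, 2.0, 1.0, 0.5])
n_comps = 2

# Route 1: SVD of the zero-centered data.
Xc = X - X.mean(axis=0)
S = np.linalg.svd(Xc, compute_uv=False)
variance_svd = S[:n_comps] ** 2 / (X.shape[0] - 1)

# Route 2: eigenvalues of the sample covariance matrix, largest first.
cov = np.cov(X, rowvar=False)
eigvals = np.linalg.eigvalsh(cov)[::-1]
variance_eig = eigvals[:n_comps]

# Hypothetical stand-in for .uns['pca'] (a plain dict, not an AnnData).
uns_pca = {
    "variance": variance_eig,
    "variance_ratio": variance_eig / np.trace(cov),  # trace = total variance
}
print(np.allclose(variance_svd, variance_eig))  # True
```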