ehrapy.preprocessing.qc_metrics

ehrapy.preprocessing.qc_metrics(adata, qc_vars=(), layer=None, inplace=True)

Calculates various quality control metrics.

Calculates the metrics from the original values rather than the encoded ones. See the Returns section for a more in-depth description of the calculated metrics.
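One plausible reason the distinction matters is that missing values may no longer appear as missing after encoding. The pure-Python sketch below is illustrative only and is not ehrapy code; the `smoker` column and the `one_hot` helper are hypothetical. A naive one-hot encoder turns the missing entry into an all-zero row, so a missing-value count on the encoded columns reports zero even though the original column has one missing entry. Other encoders may instead add a separate indicator column for missing values, which likewise leaves no missing entries in the encoded matrix.

```python
import math

# Hypothetical categorical feature for four patients; None marks a missing value.
smoker = ["yes", "no", None, "yes"]


def one_hot(values, categories):
    """Hypothetical naive one-hot encoder: a missing value becomes an all-zero row."""
    return [[1.0 if v == c else 0.0 for c in categories] for v in values]


def count_missing(column):
    """Count entries that are None or float NaN."""
    return sum(v is None or (isinstance(v, float) and math.isnan(v)) for v in column)


encoded = one_hot(smoker, ["yes", "no"])

# The original column reports one missing entry.
original_missing = count_missing(smoker)
# The encoded columns contain only 0.0 and 1.0, so no missing entries are reported.
encoded_missing = sum(count_missing(col) for col in zip(*encoded))

print(original_missing, encoded_missing)  # 1 0
```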

Parameters:
  • adata (AnnData) – Annotated data matrix.

  • qc_vars (Collection[str]) – Optional list of variables for which to calculate additional metrics.

  • layer (Optional[str]) – Layer from which to calculate the metrics.

  • inplace (bool) – Whether to add the metrics to obs/var or to only return them as a pandas DataFrame.

Return type:

Optional[DataFrame]

Returns:

Pandas DataFrame of all calculated QC metrics.

Observation-level metrics include:

missing_values_abs

Number of missing values per observation.

missing_values_pct

Percentage of missing values per observation.

Feature-level metrics include:

missing_values_abs

Number of missing values per feature.

missing_values_pct

Percentage of missing values per feature.

mean

Mean value of each feature.

median

Median value of each feature.

std

Standard deviation of each feature.

min

Minimum value of each feature.

max

Maximum value of each feature.
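The metric definitions listed above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not ehrapy's implementation: the toy matrix `X` and the helpers `obs_metrics`, `var_metrics` and `is_missing` are hypothetical; missing values are assumed to be None or NaN; percentages are assumed to be relative to the number of features (for observations) or observations (for features); summary statistics are assumed to ignore missing values; and `std` is assumed to be the population standard deviation (ddof=0, as in NumPy's default). ehrapy itself operates on AnnData objects and may use different conventions.

```python
import math
import statistics

# Hypothetical toy matrix: rows are observations (e.g. patients),
# columns are features; None marks a missing value.
X = [
    [1.0, 7.0, None],
    [2.0, None, None],
    [3.0, 9.0, 4.0],
    [None, 5.0, 6.0],
]


def is_missing(value):
    """Treat None and float NaN as missing (assumption)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def obs_metrics(matrix):
    """Observation-level metrics: missing values per row."""
    n_features = len(matrix[0]) if matrix else 0
    metrics = []
    for row in matrix:
        n_missing = sum(is_missing(v) for v in row)
        metrics.append({
            "missing_values_abs": n_missing,
            # Percentage relative to the number of features (assumption).
            "missing_values_pct": 100 * n_missing / n_features,
        })
    return metrics


def var_metrics(matrix):
    """Feature-level metrics: missing values plus summary statistics per column."""
    n_obs = len(matrix)
    metrics = []
    for column in zip(*matrix):
        present = [v for v in column if not is_missing(v)]
        n_missing = n_obs - len(present)
        # Summary statistics ignore missing values; NaN if nothing is present.
        stats = dict.fromkeys(["mean", "median", "std", "min", "max"], math.nan)
        if present:
            stats = {
                "mean": statistics.mean(present),
                "median": statistics.median(present),
                # Population standard deviation (ddof=0), assumed here.
                "std": statistics.pstdev(present),
                "min": min(present),
                "max": max(present),
            }
        metrics.append({
            "missing_values_abs": n_missing,
            # Percentage relative to the number of observations (assumption).
            "missing_values_pct": 100 * n_missing / n_obs,
            **stats,
        })
    return metrics


for name, rows in [("obs", obs_metrics(X)), ("var", var_metrics(X))]:
    for i, m in enumerate(rows):
        print(name, i, m)
```

In this toy matrix, the second observation misses two of three values (about 66.7 %), and the third feature misses two of four values (50 %) while its mean, median, min and max are computed from the two values that are present.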

Examples

>>> import ehrapy as ep
>>> import seaborn as sns
>>> import matplotlib.pyplot as plt
>>> adata = ep.dt.mimic_2(encoded=True)
>>> ep.pp.qc_metrics(adata)
>>> sns.displot(adata.obs["missing_values_abs"])
>>> plt.show()