ehrapy.tools.dendrogram

ehrapy.tools.dendrogram(adata, groupby, n_pcs=None, use_rep=None, var_names=None, cor_method='pearson', linkage_method='complete', optimal_ordering=False, key_added=None, inplace=True)[source]

Computes a hierarchical clustering for the given groupby categories.

By default, the PCA representation is used unless .X has fewer than 50 variables. Alternatively, a list of var_names (e.g. genes) can be given. The per-group average values of either the var_names or the components are used to compute a correlation matrix.

The hierarchical clustering can be visualized using ehrapy.pl.dendrogram() or with other plots that can include a dendrogram: matrixplot(), heatmap(), dotplot(), and stacked_violin().

Note

The hierarchical clustering is computed on the predefined groups, not per observation. By default, the correlation matrix is computed with the Pearson method, but other methods are available (see cor_method).
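The group-level computation described above can be sketched with pandas and SciPy. This is a minimal illustration on random toy data, not ehrapy's actual implementation: the group labels "A", "B", "C" are made up, and using 1 - correlation as the distance passed to the linkage step is an assumption of this sketch.

```python
import numpy as np
import pandas as pd
from scipy.cluster import hierarchy
from scipy.spatial import distance

rng = np.random.default_rng(0)

# Toy data: 60 observations x 5 features, split into three hypothetical groups.
values = rng.normal(size=(60, 5))
labels = np.repeat(["A", "B", "C"], 20)

# 1. Average every feature within each group -> (n_groups x n_features).
mean_df = pd.DataFrame(values).groupby(labels).mean()

# 2. Correlate the groups with each other (one row/column per group).
corr = mean_df.T.corr(method="pearson")

# 3. Turn correlation into a distance (assumption: 1 - r) and cluster the groups.
dist_matrix = 1 - corr.to_numpy()
np.fill_diagonal(dist_matrix, 0.0)  # guard against floating-point noise
condensed = distance.squareform(dist_matrix, checks=False)
linkage = hierarchy.linkage(condensed, method="complete", optimal_ordering=False)

# Leaf order of the resulting tree, computed without drawing anything.
info = hierarchy.dendrogram(linkage, labels=list(corr.index), no_plot=True)
print(info["ivl"])
```

Because the tree is built from one averaged profile per group, its leaves are the groups themselves, however many observations each group contains.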

Parameters:
  • adata (AnnData) – AnnData object containing all observations.

  • groupby (str) – Key of the observation grouping to use.

  • n_pcs (Optional[int], default: None) – Use this many PCs. If n_pcs==0 and use_rep is None, .X is used.

  • use_rep (Optional[str], default: None) – Use the indicated representation. ‘X’ or any key for .obsm is valid. If None, the representation is chosen automatically: For .n_vars < 50, .X is used, otherwise ‘X_pca’ is used. If ‘X_pca’ is not present, it’s computed with default parameters.

  • var_names (Optional[Sequence[str]], default: None) – List of var_names to use for computing the hierarchical clustering. If var_names is given, use_rep and n_pcs are ignored.

  • cor_method (str, default: 'pearson') – Correlation method to use. Options are ‘pearson’, ‘kendall’, and ‘spearman’.

  • linkage_method (str, default: 'complete') – Linkage method to use. See scipy.cluster.hierarchy.linkage() for more information.

  • optimal_ordering (bool, default: False) – Same as the optimal_ordering argument of scipy.cluster.hierarchy.linkage() which reorders the linkage matrix so that the distance between successive leaves is minimal.

  • key_added (Optional[str], default: None) – Key under which the dendrogram information is stored. By default, it is added to .uns[f’dendrogram_{groupby}’]. Note that the groupby information is added to the dendrogram.

  • inplace (bool, default: True) – If True, adds dendrogram information to adata.uns[key_added], else this function returns the information.
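The precedence among var_names, use_rep, and n_pcs described in the parameter list can be summarized in a few lines. The helper below is hypothetical (pick_representation is not part of ehrapy) and only restates the documented rules; it returns a label rather than touching any data.

```python
def pick_representation(n_vars, use_rep=None, n_pcs=None, var_names=None):
    """Return a label for the data the documented rules would select.

    Hypothetical helper for illustration only; not an ehrapy function.
    """
    # var_names wins: use_rep and n_pcs are ignored when it is given.
    if var_names is not None:
        return "var_names"
    # An explicit representation ('X' or any .obsm key) is used as-is.
    if use_rep is not None:
        return use_rep
    # Automatic choice: .X for n_pcs == 0 or fewer than 50 variables.
    if n_pcs == 0 or n_vars < 50:
        return "X"
    # Otherwise fall back to the PCA representation.
    return "X_pca"


print(pick_representation(n_vars=30))
print(pick_representation(n_vars=200))
print(pick_representation(n_vars=200, n_pcs=0))
print(pick_representation(n_vars=200, var_names=["age", "weight"]))
```

The variable names "age" and "weight" in the last call are placeholders and do not refer to a specific dataset.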

Return type:

Optional[dict[str, Any]]

Returns:

If inplace=False, returns dendrogram information, else adata.uns[key_added] is updated with it.
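The interaction between inplace and key_added can be illustrated with a plain dictionary standing in for adata.uns. This is a sketch of the documented contract under that assumption; store_or_return is a hypothetical name, and the placeholder dictionary does not reflect the full contents of the real dendrogram information.

```python
def store_or_return(uns, groupby, info, key_added=None, inplace=True):
    """Mimic the documented inplace/key_added behavior on a plain dict.

    Hypothetical helper; `uns` stands in for adata.uns.
    """
    if not inplace:
        # Nothing is stored; the caller receives the information directly.
        return info
    # Default key follows the documented pattern dendrogram_{groupby}.
    key = key_added if key_added is not None else f"dendrogram_{groupby}"
    uns[key] = info
    return None


uns = {}
placeholder = {"groupby": "service_unit"}  # stand-in for the real information

store_or_return(uns, "service_unit", placeholder)
store_or_return(uns, "service_unit", placeholder, key_added="my_dendrogram")
returned = store_or_return(uns, "service_unit", placeholder, inplace=False)

print(sorted(uns))
```

A custom key_added is useful when several dendrograms for the same groupby (for example, with different linkage methods) need to be kept side by side.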

Examples

>>> import ehrapy as ep
>>> adata = ep.data.mimic_2(encoded=True)
>>> ep.tl.dendrogram(adata, groupby="service_unit")
>>> ep.pl.dendrogram(adata, groupby="service_unit")