ehrapy.tools.marker_feature_overlap

ehrapy.tools.marker_feature_overlap(adata, reference_markers, *, key='rank_features_groups', method='overlap_count', normalize=None, top_n_markers=None, adj_pval_threshold=None, key_added='feature_overlap', inplace=False)

Calculate an overlap score between data-derived features and provided marker features.

Marker feature overlap scores can be reported as overlap counts, overlap coefficients, or Jaccard indices. The method returns a pandas DataFrame which can be used to annotate clusters based on feature overlaps.

Parameters:
  • adata (AnnData) – Annotated data matrix.

  • reference_markers (Union[Dict[str, set], Dict[str, list]]) – A marker gene dictionary object. Keys should be strings with the cell identity name and values are sets or lists of strings which match the format of adata.var_names.

  • key (str) – The key in adata.uns where the rank_features_groups output is stored (default: rank_features_groups).

  • method (Literal['overlap_count', 'overlap_coef', 'jaccard']) – Method to calculate marker gene overlap. ‘overlap_count’ uses the size of the intersection of the two feature sets, ‘overlap_coef’ uses the overlap coefficient, and ‘jaccard’ uses the Jaccard index (default: overlap_count).

  • normalize (Optional[Literal['reference', 'data']]) – Normalization option for the feature overlap output. This parameter can only be set when method is set to ‘overlap_count’. ‘reference’ normalizes the data by the total number of marker features given in the reference annotation per group. ‘data’ normalizes the data by the total number of marker genes used for each cluster.

  • top_n_markers (Optional[int]) – The number of top data-derived marker genes to use. By default the top 100 marker features are used. If adj_pval_threshold is set along with top_n_markers, then adj_pval_threshold is ignored.

  • adj_pval_threshold (Optional[float]) – A significance threshold on the adjusted p-values to select marker features. This can only be used when adjusted p-values are calculated by ep.tl.rank_features_groups. If adj_pval_threshold is set along with top_n_markers, then adj_pval_threshold is ignored.

  • key_added (str) – Name of the .uns field that will contain the marker overlap scores.

  • inplace (bool) – If True, store the marker overlap scores in adata.uns under key_added. If False (default), return them as a pandas DataFrame.

Returns:

A pandas DataFrame with the marker gene overlap scores if inplace=False. For inplace=True, adata.uns is updated with an additional field specified by the key_added parameter (default = ‘feature_overlap’).

Examples

TODO
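
The three scoring methods and the two normalization options described under Parameters can be illustrated in plain Python. This is only a sketch of the arithmetic, not ehrapy's implementation: the helper overlap_scores, the feature names, and the cluster labels are hypothetical, every marker set is assumed to be non-empty, and raising a ValueError when normalize is combined with another method is an assumption about how that restriction might be enforced.

```python
def overlap_scores(cluster_markers, reference_markers,
                   method="overlap_count", normalize=None):
    """Score every (reference group, cluster) pair by marker overlap.

    Hypothetical helper for illustration; assumes non-empty marker sets.
    """
    if normalize is not None and method != "overlap_count":
        # Assumed handling: normalize is documented as valid only
        # together with method='overlap_count'.
        raise ValueError("normalize requires method='overlap_count'")

    scores = {}
    for ref_name, ref_features in reference_markers.items():
        ref_set = set(ref_features)
        for cluster, features in cluster_markers.items():
            data_set = set(features)
            shared = len(ref_set & data_set)  # size of the intersection

            if method == "overlap_count":
                score = shared
                if normalize == "reference":
                    # Fraction of the reference markers that were recovered.
                    score = shared / len(ref_set)
                elif normalize == "data":
                    # Fraction of the cluster's markers found in the reference.
                    score = shared / len(data_set)
            elif method == "overlap_coef":
                # Overlap coefficient: intersection over the smaller set.
                score = shared / min(len(ref_set), len(data_set))
            elif method == "jaccard":
                # Jaccard index: intersection over union.
                score = shared / len(ref_set | data_set)
            else:
                raise ValueError(f"unknown method: {method!r}")

            scores[(ref_name, cluster)] = score
    return scores


# Hypothetical reference markers and data-derived markers per cluster.
reference = {
    "diabetes": {"glucose", "hba1c", "bmi"},
    "renal": {"creatinine", "urea"},
}
clusters = {
    "0": ["glucose", "hba1c", "age", "bmi"],
    "1": ["creatinine", "age"],
}

counts = overlap_scores(clusters, reference)
jaccard = overlap_scores(clusters, reference, method="jaccard")

# Annotate each cluster with the reference group of highest overlap count.
annotation = {
    cluster: max(reference, key=lambda ref: counts[(ref, cluster)])
    for cluster in clusters
}
print(annotation)
```

With these toy inputs, cluster "0" shares three features with the "diabetes" reference and cluster "1" shares one with "renal", so the annotation step assigns those labels. Ties between reference groups are not handled in this sketch.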