ehrapy.tools.dpt

ehrapy.tools.dpt(edata, *, n_dcs=10, n_branchings=0, min_group_size=0.01, allow_kendall_tau_shift=True, neighbors_key=None, copy=False)

Infer progression of observations through geodesic distance along the graph [HBW+16], [WHP+19].
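The distance of [HBW+16] can be illustrated with a small NumPy sketch on toy data. This is not ehrapy's or Scanpy's implementation. A dense Gaussian kernel with a hypothetical bandwidth `sigma` stands in for the graph that neighbors() builds, and the function name `diffusion_pseudotime` is illustrative. The sketch keeps the leading `n_dcs` eigenpairs of the diffusion operator and drops the trivial one. It weights each remaining component by λ/(1 − λ), which is the sum of λ^t over all t ≥ 1, and then measures the Euclidean distance to the root row.

```python
import numpy as np


def diffusion_pseudotime(X, iroot, n_dcs=10, sigma=1.0):
    """Toy DPT: distance of every observation from observation `iroot`.

    Sketch only: a dense Gaussian kernel replaces the kNN graph that
    neighbors() would build; `sigma` is a hypothetical bandwidth.
    """
    # Dense Gaussian affinities between all observations (rows of X).
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    K = np.exp(-sq_dists / (2.0 * sigma**2))

    # Symmetric normalization D^{-1/2} K D^{-1/2}; it has the same
    # eigenvalues as the random-walk transition matrix T = D^{-1} K.
    d = K.sum(axis=1)
    T_sym = K / np.sqrt(np.outer(d, d))

    # Keep the n_dcs largest eigenpairs (the trivial one included).
    evals, evecs = np.linalg.eigh(T_sym)
    order = np.argsort(evals)[::-1][:n_dcs]
    evals, evecs = evals[order], evecs[:, order]

    # Convert to right eigenvectors of T and drop the trivial component
    # (eigenvalue 1, constant eigenvector).
    psi = evecs[:, 1:] / np.sqrt(d)[:, None]
    lam = evals[1:]

    # Summing T^t over t >= 1 scales each component by lam / (1 - lam).
    weighted = psi * (lam / (1.0 - lam))
    dpt = np.sqrt(((weighted - weighted[iroot]) ** 2).sum(axis=1))

    # Rescale to [0, 1] for readability in this sketch.
    return dpt / dpt.max()
```

For points on a line, `diffusion_pseudotime(np.arange(20.0).reshape(-1, 1), iroot=0)` is 0 at the root and largest at the opposite end of the line.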

Reconstruct the progression of a biological process from snapshot data. Diffusion Pseudotime was introduced by [HBW+16] and implemented in Scanpy [WAT18]. Here, we use a further developed version that can handle disconnected graphs [WHP+19] and can be run in a hierarchical mode by setting n_branchings>1. We recommend, however, using dpt() only to compute pseudotime (n_branchings=0) and detecting branchings with paga().

dpt() requires that neighbors() has been run first. To compute pseudotime, you also need to annotate your data with a root observation, for instance:

edata.uns['iroot'] = np.flatnonzero(edata.obs['cell_types'] == 'Stem')[0]

To reproduce the original implementation of DPT, pass method='gauss' to neighbors(). The default method='umap' leads only to minor quantitative differences.
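If no observation matches, the np.flatnonzero idiom for choosing the root raises a bare IndexError. The sketch below adds a small guard. `choose_root_index` is a hypothetical helper and is not part of ehrapy. It returns the integer position of the first matching observation and raises a clearer error otherwise.

```python
import numpy as np


def choose_root_index(labels, root_label):
    """Return the position of the first observation labelled `root_label`.

    Hypothetical helper: wraps the np.flatnonzero idiom and raises a
    ValueError instead of a bare IndexError when nothing matches.
    """
    hits = np.flatnonzero(np.asarray(labels) == root_label)
    if hits.size == 0:
        raise ValueError(f"no observation labelled {root_label!r}")
    # uns['iroot'] holds a single integer position, not a label.
    return int(hits[0])
```

Usage would look like `edata.uns['iroot'] = choose_root_index(edata.obs['cell_types'], 'Stem')`. The 'cell_types' column comes from the example above and may not exist in your data.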

Parameters:
  • edata (EHRData) – Central data object.

  • n_dcs (int, default: 10) – The number of diffusion components to use.

  • n_branchings (int, default: 0) – Number of branchings to detect.

  • min_group_size (float, default: 0.01) – During recursive splitting of branches (‘dpt groups’) for n_branchings > 1, do not consider groups that contain fewer than min_group_size data points. If a float, min_group_size refers to a fraction of the total number of data points.

  • allow_kendall_tau_shift (bool, default: True) – If a very small branch is detected upon splitting, shift away from maximum correlation in the Kendall tau criterion of [HBW+16] to stabilize the splitting.

  • neighbors_key (str | None, default: None) – If not specified, dpt looks in .uns['neighbors'] for neighbors settings and in .obsp['connectivities'] and .obsp['distances'] for connectivities and distances, respectively (the default storage places for pp.neighbors). If specified, dpt looks in .uns[neighbors_key] for neighbors settings and in .obsp[.uns[neighbors_key]['connectivities_key']] and .obsp[.uns[neighbors_key]['distances_key']] for connectivities and distances, respectively.

  • copy (bool, default: False) – If True, copy edata before computation and return the copy. Otherwise, perform the computation in place and return None.
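The neighbors_key lookup rule and the fraction rule for min_group_size can be expressed with plain dictionaries. This is a hypothetical sketch. `resolve_neighbors` and `resolve_min_group_size` are not ehrapy functions, and the dicts stand in for edata.uns and edata.obsp. Truncating the fraction with int() is an assumption about rounding.

```python
def resolve_neighbors(uns, obsp, neighbors_key=None):
    """Return (settings, connectivities, distances) per the neighbors_key rule."""
    if neighbors_key is None:
        # Default storage places written by pp.neighbors.
        return uns["neighbors"], obsp["connectivities"], obsp["distances"]
    settings = uns[neighbors_key]
    # With a custom key, the settings name the .obsp entries to read.
    return (
        settings,
        obsp[settings["connectivities_key"]],
        obsp[settings["distances_key"]],
    )


def resolve_min_group_size(min_group_size, n_obs):
    """Convert min_group_size to an absolute number of observations."""
    if isinstance(min_group_size, float):
        # A float is read as a fraction of all observations (truncated here).
        return int(min_group_size * n_obs)
    return min_group_size
```

With 1,000 observations, the default min_group_size=0.01 resolves to 10 observations in this sketch.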

Return type:

EHRData | None

Returns:

Depending on copy, returns or updates edata with the following fields. If n_branchings==0, the dpt_groups field is not written.

  • dpt_pseudotime : pandas.Series (edata.obs, dtype float) Array with one entry per observation that stores its pseudotime, that is, its DPT distance from the root observation.

  • dpt_groups : pandas.Series (edata.obs, dtype category) Array with one entry per observation that stores its subgroup ID (‘0’, ‘1’, …).