Tools

Tools comprise any transformation of the data matrix that is not preprocessing. In contrast to a preprocessing function, a tool usually adds an easily interpretable annotation to the data matrix, which can then be visualized with a corresponding plotting function.

Embeddings

tools.tsne

Calculates t-SNE [vdMH08], [ADT+13], and [PVG+11].

tools.umap

Embed the neighborhood graph using UMAP [MHM18].

tools.draw_graph

Force-directed graph drawing [IKM+11], [JVHB14], and [Chi18].

tools.diffmap

Diffusion Maps [CLL+05], [HBT15], [WHP+19].

tools.embedding_density

Calculate the density of observations in an embedding (per condition).

Clustering and trajectory inference

tools.leiden

Cluster observations into subgroups [TWvE19].

tools.dendrogram

Computes a hierarchical clustering for the given groupby categories.

tools.dpt

Infer progression of observations through geodesic distance along the graph [HBW+16], [WHP+19].

tools.paga

Mapping out the coarse-grained connectivity structures of complex manifolds [WHP+19].

Feature Ranking

tools.rank_features_groups

Rank features for characterizing groups.

tools.filter_rank_features_groups

Filters out features based on fold change and the fraction of observations containing the feature within and outside the groupby categories.

tools.rank_features_supervised

Calculate feature importances for predicting a specified feature in adata.var.

Dataset integration

tools.ingest

Map labels and embeddings from reference data to new data.

Survival Analysis

tools.ols

Create an Ordinary Least Squares (OLS) Model from a formula and the data object.

tools.glm

Create a Generalized Linear Model (GLM) from a formula, a distribution, and the data object.

tools.kaplan_meier

Fit the Kaplan-Meier estimate for the survival function.
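For intuition, the Kaplan-Meier (product-limit) estimate is S(t) = product over event times t_i <= t of (1 - d_i / n_i), where d_i is the number of events at t_i and n_i the number of subjects still at risk just before t_i. The following is a minimal plain-Python sketch of that formula, not ehrapy's implementation; `kaplan_meier` is a hypothetical helper, and subjects censored at an event time are kept in that time's risk set, the usual convention.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Return [(t, S(t))] at each distinct event time.

    Illustrative sketch only, not the ehrapy API.
    durations: follow-up time per subject; events: 1 if the event was observed, 0 if censored.
    """
    deaths = Counter(t for t, e in zip(durations, events) if e)
    leaving = Counter(durations)  # subjects leaving the risk set at t (events + censored)
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        if deaths[t]:
            # Product-limit step: multiply by the conditional probability of surviving t.
            survival *= 1.0 - deaths[t] / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve
```

For durations `[1, 2, 2, 3, 4, 5]` with events `[1, 1, 0, 1, 0, 1]`, the estimate steps to about 0.833, 0.667, 0.444 and 0.0 at times 1, 2, 3 and 5; the censoring at time 4 shrinks the risk set without producing a step.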

tools.test_kmf_logrank

Calculates the p-value of the log-rank test comparing the survival functions of two groups.
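The log-rank statistic compares, at every distinct event time, the events observed in one group with those expected if both groups shared the same hazard, then refers the squared, variance-scaled sum to a chi-square distribution with one degree of freedom. A minimal plain-Python sketch follows; `logrank_test` is a hypothetical helper, not ehrapy's function, and it returns the statistic alongside the p-value.

```python
import math


def logrank_test(durations_a, events_a, durations_b, events_b):
    """Two-sample log-rank test; returns (chi-square statistic, p-value).

    Illustrative sketch only, not the ehrapy API.
    """
    times = sorted({t for t, e in zip(durations_a, events_a) if e}
                   | {t for t, e in zip(durations_b, events_b) if e})
    obs_minus_exp = 0.0
    variance = 0.0
    for t in times:
        n_a = sum(1 for d in durations_a if d >= t)  # group A at risk just before t
        n_b = sum(1 for d in durations_b if d >= t)
        d_a = sum(1 for d, e in zip(durations_a, events_a) if d == t and e)
        d_b = sum(1 for d, e in zip(durations_b, events_b) if d == t and e)
        n, d = n_a + n_b, d_a + d_b
        # Observed minus expected events in group A under a common hazard.
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            # Hypergeometric variance of d_a at this event time.
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    chi2 = obs_minus_exp ** 2 / variance
    # Chi-square (1 df) survival function: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

Swapping the two groups leaves the statistic unchanged, because only the sign of the observed-minus-expected sum flips.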

tools.test_nested_f_statistic

Calculate the p-value indicating whether a larger GLM, encompassing a smaller GLM's parameters, adds explanatory power.

tools.cox_ph

Fit Cox’s proportional hazards model for the survival function.

tools.cox_ph_adjusted_curves

Compute CoxPH adjusted survival curves stratified by a grouping variable.

tools.weibull_aft

Fit the Weibull accelerated failure time regression for the survival function.

tools.log_logistic_aft

Fit the log-logistic accelerated failure time regression for the survival function.

tools.nelson_aalen

Employ the Nelson-Aalen estimator to estimate the cumulative hazard function from censored survival data.
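The Nelson-Aalen estimator sums the event-time hazard increments, H(t) = sum over event times t_i <= t of d_i / n_i, with d_i events and n_i subjects at risk at t_i. The following plain-Python sketch shows the formula only; `nelson_aalen` is a hypothetical helper, not ehrapy's function, and tied event times are handled with the simple d_i / n_i increment rather than a tie-corrected variant.

```python
from collections import Counter


def nelson_aalen(durations, events):
    """Return [(t, H(t))] at each distinct event time.

    Illustrative sketch only, not the ehrapy API.
    """
    deaths = Counter(t for t, e in zip(durations, events) if e)
    leaving = Counter(durations)  # events + censored leaving the risk set at t
    at_risk = len(durations)
    cumulative_hazard = 0.0
    curve = []
    for t in sorted(leaving):
        if deaths[t]:
            # Hazard increment at t: events divided by subjects at risk.
            cumulative_hazard += deaths[t] / at_risk
            curve.append((t, cumulative_hazard))
        at_risk -= leaving[t]
    return curve
```

For durations `[1, 2, 2, 3, 4, 5]` with events `[1, 1, 0, 1, 0, 1]`, the cumulative hazard reaches 1/6 + 1/5 + 1/3 + 1 = 1.7 at time 5.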

tools.weibull

Employ the Weibull model in univariate survival analysis to understand event occurrence dynamics.

Causal Inference

ehrapy ships a small, dependency-light set of causal inference estimators built directly on top of scikit-learn. Average treatment effect (ATE) estimators handle binary treatments via inverse probability of treatment weighting (IPTW), parametric g-computation, the doubly robust augmented IPW (AIPW), and propensity score matching. Heterogeneous, conditional average treatment effects (CATE) are available via the T-, S-, and X-learner meta-learners. Two diagnostics, covariate balance and positivity, round out the toolkit.

tools.iptw

Estimate the average treatment effect by inverse probability of treatment weighting (IPTW).
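IPTW reweights each treated unit by 1/e(x) and each control by 1/(1 - e(x)), where e(x) = P(T = 1 | X = x) is the propensity score, so that both arms resemble the full population. To stay dependency-free, the following sketch swaps the fitted propensity model for an assumption: a single discrete confounder whose per-level treated fraction serves as e(x). It is not ehrapy's implementation; both helper names are hypothetical, and the weighted means use the self-normalised (Hajek) form.

```python
from collections import defaultdict


def propensity_by_stratum(treatment, covariate):
    """Hypothetical helper: P(T=1 | X=x) as the treated fraction within each level of a discrete X."""
    counts = defaultdict(lambda: [0, 0])  # x -> [n_treated, n_total]
    for t, x in zip(treatment, covariate):
        counts[x][0] += t
        counts[x][1] += 1
    return [counts[x][0] / counts[x][1] for x in covariate]


def iptw_ate(outcome, treatment, propensity):
    """Hajek (self-normalised) IPTW estimate of E[Y(1)] - E[Y(0)]; illustrative sketch only."""
    w1 = [t / e for t, e in zip(treatment, propensity)]  # weights for treated units
    w0 = [(1 - t) / (1 - e) for t, e in zip(treatment, propensity)]  # weights for controls
    mu1 = sum(w * y for w, y in zip(w1, outcome)) / sum(w1)
    mu0 = sum(w * y for w, y in zip(w0, outcome)) / sum(w0)
    return mu1 - mu0
```

With `outcome = [2, 0, 0, 0, 5, 5, 5, 3]`, `treatment = [1, 0, 0, 0, 1, 1, 1, 0]` and `covariate = [0, 0, 0, 0, 1, 1, 1, 1]`, the effect is 2 within each covariate level but treated units concentrate at X = 1, so the naive difference in means is 3.5; the weighted estimate recovers 2 (up to floating-point rounding). A propensity of exactly 0 or 1 raises a division error, which is the positivity problem in its most literal form.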

tools.g_computation

Estimate the ATE by parametric g-computation.
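G-computation fits an outcome model mu(t, x) = E[Y | T = t, X = x], predicts every unit's outcome under treatment and under control, and averages the difference over the observed covariates. The following sketch assumes a single discrete confounder and uses a saturated model (cell means) in place of a fitted regression, so it is the simplest instance of the idea rather than ehrapy's estimator; `g_computation_ate` is a hypothetical name.

```python
from collections import defaultdict


def g_computation_ate(outcome, treatment, covariate):
    """Standardisation: average mu(1, x_i) - mu(0, x_i) over the observed X; illustrative sketch only."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for y, t, x in zip(outcome, treatment, covariate):
        sums[(t, x)] += y
        counts[(t, x)] += 1

    def mu(t, x):
        # Saturated outcome model: mean outcome in the (treatment, covariate) cell.
        # An empty cell (no treated or no control units at level x) raises ZeroDivisionError.
        return sums[(t, x)] / counts[(t, x)]

    effects = [mu(1, x) - mu(0, x) for x in covariate]
    return sum(effects) / len(effects)
```

On the toy data `outcome = [2, 0, 0, 0, 5, 5, 5, 3]`, `treatment = [1, 0, 0, 0, 1, 1, 1, 0]`, `covariate = [0, 0, 0, 0, 1, 1, 1, 1]`, this returns 2.0, matching the IPTW estimate; with a saturated model on a discrete confounder the two approaches coincide.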

tools.aipw

Estimate the ATE by the augmented inverse-probability-weighted (AIPW) doubly robust estimator.
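AIPW adds an inverse-probability-weighted residual correction to the g-computation prediction. For each unit it averages mu1(x) - mu0(x) + T (Y - mu1(x)) / e(x) - (1 - T) (Y - mu0(x)) / (1 - e(x)), and the result remains consistent if either the propensity model or the outcome model is correctly specified (the doubly robust property). The following sketch takes already-fitted nuisance predictions as plain lists and leaves out how they are fitted; `aipw_ate` is a hypothetical helper, not ehrapy's function.

```python
def aipw_ate(outcome, treatment, propensity, mu1, mu0):
    """Doubly robust AIPW estimate of the ATE from pre-fitted nuisance predictions.

    Illustrative sketch only, not the ehrapy API.
    propensity[i] ~ P(T=1 | x_i); mu1[i], mu0[i] ~ E[Y | T=1, x_i], E[Y | T=0, x_i].
    """
    terms = []
    for y, t, e, m1, m0 in zip(outcome, treatment, propensity, mu1, mu0):
        # Outcome-model prediction plus an IPW-weighted residual correction per arm.
        augmented_1 = m1 + t * (y - m1) / e
        augmented_0 = m0 + (1 - t) * (y - m0) / (1 - e)
        terms.append(augmented_1 - augmented_0)
    return sum(terms) / len(terms)
```

On the toy data used for IPTW (true effect 2), zeroing the outcome predictions while keeping the correct propensities still returns 2, and so does setting every propensity to 0.5 while keeping correct cell-mean outcome predictions: whichever nuisance model is right rescues the other.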

tools.propensity_score_matching

Estimate the treatment effect by 1-to-k propensity score matching on the logit scale.
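Matching on the logit of the propensity score, log(e / (1 - e)), stretches the crowded ends of the [0, 1] scale, which is the usual reason matching is done on it rather than on the raw score. The following is a minimal sketch of 1-to-k nearest-neighbour matching with replacement that estimates the average effect on the treated (ATT). Whether ehrapy targets the ATT or the ATE, matches with or without replacement, or applies a caliper is not stated here, so treat those choices and the `psm_att` name as assumptions.

```python
import math


def logit(p):
    return math.log(p / (1 - p))


def psm_att(outcome, treatment, propensity, k=1):
    """ATT by 1-to-k nearest-neighbour matching (with replacement) on logit(propensity).

    Illustrative sketch only, not the ehrapy API; no caliper is applied.
    """
    controls = [(logit(e), y) for y, t, e in zip(outcome, treatment, propensity) if t == 0]
    differences = []
    for y, t, e in zip(outcome, treatment, propensity):
        if t != 1:
            continue
        score = logit(e)
        # The k controls closest to this treated unit on the logit scale.
        nearest = sorted(controls, key=lambda c: abs(c[0] - score))[:k]
        matched_mean = sum(c[1] for c in nearest) / len(nearest)
        differences.append(y - matched_mean)
    return sum(differences) / len(differences)
```

With treated scores 0.6 and 0.7 (outcomes 10 and 12) and controls at 0.58, 0.72 and 0.2 (outcomes 7, 9 and 0), k = 1 pairs each treated unit with a close control and gives 3.0, whereas k = 3 pulls in the poorly matched 0.2 control and inflates the estimate to about 5.67. A caliper is the usual guard against such distant matches.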

tools.t_learner

Two-model (T-learner) CATE estimator.
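A T-learner fits one outcome model on treated units and another on controls, and reads the CATE off as their difference, tau(x) = mu1(x) - mu0(x). In this dependency-free sketch each arm's model is a one-variable least-squares line; that model choice and the helper names `fit_line` and `t_learner` are assumptions, not ehrapy's API.

```python
def fit_line(xs, ys):
    """Hypothetical helper: least-squares fit of y = a + b * x; returns a prediction function."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return lambda x: intercept + slope * x


def t_learner(covariate, treatment, outcome):
    """Fit one outcome model per arm; return x -> mu1(x) - mu0(x). Illustrative sketch only."""
    treated = [(x, y) for x, t, y in zip(covariate, treatment, outcome) if t == 1]
    control = [(x, y) for x, t, y in zip(covariate, treatment, outcome) if t == 0]
    mu1 = fit_line([x for x, _ in treated], [y for _, y in treated])
    mu0 = fit_line([x for x, _ in control], [y for _, y in control])
    return lambda x: mu1(x) - mu0(x)
```

If treated outcomes follow y = 1 + 2x and control outcomes y = 1 + x, the returned function gives tau(x) = x, for example 2.0 at x = 2. Fitting the arms separately lets each model use its own slope, which is what makes the effect heterogeneous.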

tools.s_learner

Single-model (S-learner) CATE estimator.

tools.x_learner

X-learner CATE estimator of Künzel et al. (2019).

tools.covariate_balance

Report standardised mean differences (SMD) for each covariate, before and after weighting.
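The SMD of a covariate is the difference between the treated and control means divided by the pooled standard deviation, sqrt((s_1^2 + s_0^2) / 2); after weighting, the weighted means and variances are used instead. Absolute values below about 0.1 are commonly read as adequate balance. The following sketch uses population-style weighted variances, and the helper names are hypothetical rather than ehrapy's.

```python
import math


def weighted_mean_var(values, weights):
    """Hypothetical helper: weighted mean and population-style weighted variance."""
    total = sum(weights)
    mean = sum(w * v for w, v in zip(weights, values)) / total
    var = sum(w * (v - mean) ** 2 for w, v in zip(weights, values)) / total
    return mean, var


def standardized_mean_difference(values, treatment, weights=None):
    """SMD = (mean_treated - mean_control) / sqrt((var_treated + var_control) / 2).

    Illustrative sketch only, not the ehrapy API; weights=None gives the unweighted SMD.
    """
    if weights is None:
        weights = [1.0] * len(values)
    treated = [(v, w) for v, t, w in zip(values, treatment, weights) if t == 1]
    control = [(v, w) for v, t, w in zip(values, treatment, weights) if t == 0]
    m1, v1 = weighted_mean_var([v for v, _ in treated], [w for _, w in treated])
    m0, v0 = weighted_mean_var([v for v, _ in control], [w for _, w in control])
    return (m1 - m0) / math.sqrt((v1 + v0) / 2)
```

For the binary confounder `x = [0, 0, 0, 0, 1, 1, 1, 1]` with `treatment = [1, 0, 0, 0, 1, 1, 1, 0]`, the unweighted SMD is 2 / sqrt(3), about 1.15; with IPTW weights 1/e for treated and 1/(1 - e) for control units (e = 0.25 at x = 0, 0.75 at x = 1) it drops to 0.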

tools.positivity_check

Diagnose the positivity assumption by inspecting the propensity score distribution.
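Positivity requires every unit to have a non-zero chance of receiving either treatment, 0 < e(x) < 1. A quick numeric check reports the region of common support, where the two arms' propensity ranges overlap, and the number of scores close to 0 or 1, which would produce extreme inverse-probability weights. The following sketch makes that check concrete; the 0.05/0.95 cut-offs and the `positivity_summary` name are illustrative assumptions, not ehrapy defaults.

```python
def positivity_summary(propensity, treatment, lower=0.05, upper=0.95):
    """Summarise propensity-score overlap between arms.

    Illustrative sketch only, not the ehrapy API; the thresholds are arbitrary defaults.
    """
    treated = [e for e, t in zip(propensity, treatment) if t == 1]
    control = [e for e, t in zip(propensity, treatment) if t == 0]
    # Common support: the overlap of the two arms' propensity ranges.
    support = (max(min(treated), min(control)), min(max(treated), max(control)))
    outside_support = sum(1 for e in propensity if not support[0] <= e <= support[1])
    # Scores this close to 0 or 1 yield very large IPTW weights.
    extreme = sum(1 for e in propensity if e < lower or e > upper)
    return {
        "common_support": support,
        "n_outside_support": outside_support,
        "n_extreme": extreme,
    }
```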

tools.CausalEstimate

Result of a causal effect estimation.

Normalized Complexity Profile

tools.ncp

Non-negative CP (PARAFAC) decomposition of a 3D temporal EHR layer.

Cohort Tracking & summaries

tools.CohortTracker

Track cohort changes over multiple filtering or processing steps.

tools.stratified_table_one

Build a stratified "Table 1" comparing baseline characteristics across groups.
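A "Table 1" typically reports mean (SD) for continuous variables and count (percentage) for each level of a categorical variable, with one column per group. The following sketch builds that layout from a list of dict records. The record format, the formatting choices and the `table_one` name are assumptions, not ehrapy's API, and each group needs at least two observations for the standard deviation.

```python
import statistics


def table_one(rows, group_key, continuous, categorical):
    """Return {row label: {group: formatted cell}} for a list of dict records.

    Illustrative sketch only, not the ehrapy API.
    """
    groups = sorted({row[group_key] for row in rows})
    members = {g: [row for row in rows if row[group_key] == g] for g in groups}
    table = {}
    for var in continuous:
        # Continuous variables: mean (sample SD) per group.
        table[var] = {}
        for g in groups:
            values = [row[var] for row in members[g]]
            mean, sd = statistics.mean(values), statistics.stdev(values)
            table[var][g] = f"{mean:.1f} ({sd:.1f})"
    for var in categorical:
        # Categorical variables: one row per level, count (percentage of the group).
        for level in sorted({row[var] for row in rows}):
            label = f"{var} = {level}"
            table[label] = {}
            for g in groups:
                n = sum(1 for row in members[g] if row[var] == level)
                table[label][g] = f"{n} ({100 * n / len(members[g]):.0f}%)"
    return table
```

For two arms with ages 50 and 60 (A) and 40 and 44 (B), the age row reads "55.0 (7.1)" and "42.0 (2.8)".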