ehrapy.tools.propensity_score_matching

ehrapy.tools.propensity_score_matching(edata, treatment, outcome, *, covariates, propensity_model='logistic', k=1, caliper=0.2, replacement=True, target='att', n_bootstrap=200, random_state=None, layer=None)

Estimate the treatment effect by 1-to-\(k\) propensity score matching on the logit scale.

For each treated unit, the \(k\) nearest control units in logit-propensity space are selected as matches; when target='ate', each control unit is likewise matched to its \(k\) nearest treated units. If caliper is set, candidate matches whose logit-propensity distance exceeds caliper * SD(logit(e)) are discarded, and treated units left with no valid match are dropped from the estimate.
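
The matching rule described above can be illustrated with a minimal NumPy sketch. It is not ehrapy's implementation: the helper name match_att, the clipping of propensities before taking the logit, and the use of the pooled sample standard deviation for SD(logit(e)) are all assumptions made for illustration, and only the ATT direction with replacement is covered.

```python
import numpy as np


def logit(p):
    # Clip to avoid infinities at 0 or 1 (an assumption of this sketch).
    p = np.clip(p, 1e-6, 1 - 1e-6)
    return np.log(p / (1 - p))


def match_att(e, t, y, k=1, caliper=0.2):
    """Hypothetical helper: k-nearest-neighbour matching with replacement
    on logit(e), returning an ATT estimate and the matched indices."""
    z = logit(np.asarray(e, dtype=float))
    t = np.asarray(t).astype(bool)
    y = np.asarray(y, dtype=float)
    treated = np.flatnonzero(t)
    control = np.flatnonzero(~t)

    # Caliper in units of SD(logit(e)); pooled sample SD is assumed here.
    max_dist = np.inf if caliper is None else caliper * z.std(ddof=1)

    matches, effects = {}, []
    for i in treated:
        d = np.abs(z[control] - z[i])
        nearest = np.argsort(d, kind="stable")[:k]
        nearest = nearest[d[nearest] <= max_dist]  # discard matches outside the caliper
        if nearest.size == 0:
            continue  # treated unit without a valid match is dropped
        matches[int(i)] = control[nearest].tolist()
        effects.append(y[i] - y[control[nearest]].mean())

    att = float(np.mean(effects)) if effects else float("nan")
    return att, matches


e = [0.80, 0.75, 0.30, 0.78, 0.31, 0.99]
t = [1, 0, 0, 0, 1, 1]
y = [1, 0, 0, 1, 1, 1]
att, matches = match_att(e, t, y, k=1, caliper=0.2)
print(att, matches)
```

In this toy data the treated unit with propensity 0.99 has no control within the caliper, so it contributes nothing to the estimate; passing caliper=None would instead match it to its nearest control regardless of distance.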

Parameters:
  • edata (EHRData) – Central data object.

  • treatment (str) – Column name of the binary (0/1) treatment variable.

  • outcome (str) – Column name of the outcome variable.

  • covariates (Sequence[str]) – Adjustment set used to fit the propensity model. Each entry must refer to a name in edata.var_names or edata.obs.columns.

  • propensity_model (str | BaseEstimator, default: 'logistic') – Propensity model specification (see iptw() for the accepted values).

  • k (int, default: 1) – Number of matches per unit.

  • caliper (float | None, default: 0.2) – Maximum logit-propensity distance for a valid match, in units of SD(logit(e)). Use None to disable the caliper.

  • replacement (bool, default: True) – Whether matching is performed with replacement.

  • target (str, default: 'att') – 'att' for the average treatment effect on the treated, or 'ate' for the average treatment effect.

  • n_bootstrap (int, default: 200) – Number of bootstrap resamples used for the SE and 95% percentile confidence interval. Set to 0 to skip uncertainty estimation.

  • random_state (int | None, default: None) – Seed for the bootstrap resampler.

  • layer (str | None, default: None) – Layer of edata from which variables found in edata.var_names are read. If None, edata.X is used.
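
As a rough illustration of what n_bootstrap and random_state control, the sketch below computes a bootstrap standard error and a 95% percentile interval for an arbitrary statistic. The helper bootstrap_percentile_ci is hypothetical and not part of ehrapy; whether ehrapy refits the propensity model and repeats the matching inside each resample is not stated on this page, so the estimator is left as a generic callable.

```python
import numpy as np


def bootstrap_percentile_ci(estimator, data, n_bootstrap=200, random_state=None):
    """Hypothetical helper: bootstrap SE and 95% percentile CI of estimator(data)."""
    rng = np.random.default_rng(random_state)
    n = len(data)
    stats = np.empty(n_bootstrap)
    for b in range(n_bootstrap):
        idx = rng.integers(0, n, size=n)  # resample rows with replacement
        stats[b] = estimator(data[idx])
    se = float(stats.std(ddof=1))  # SE = spread of the resampled estimates
    lo, hi = np.percentile(stats, [2.5, 97.5])  # percentile interval
    return se, (float(lo), float(hi))


data = np.arange(100, dtype=float)
se, (lo, hi) = bootstrap_percentile_ci(np.mean, data, n_bootstrap=200, random_state=0)
print(se, lo, hi)
```

Fixing the seed makes the resamples, and therefore the reported SE and interval, reproducible across runs.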

Return type:

CausalEstimate

Returns:

A CausalEstimate whose params dict contains the propensity scores and the matched-pair indices.

Examples

>>> import ehrapy as ep
>>> import ehrdata as ed
>>> edata = ed.dt.mimic_2_preprocessed()
>>> est = ep.tl.propensity_score_matching(
...     edata,
...     "aline_flg",
...     "day_28_flg",
...     covariates=["age", "sofa_first", "sapsi_first"],
...     random_state=0,
... )
>>> print(est.summary())
Causal effect of 'aline_flg' on 'day_28_flg'
  method: propensity_score_matching_att
  ATE:    -0.0511
  SE:     0.0337
  95% CI: [-0.1209, 0.0051]
  n:      1776