ehrapy.tools.propensity_score_matching
ehrapy.tools.propensity_score_matching(edata, treatment, outcome, *, covariates, propensity_model='logistic', k=1, caliper=0.2, replacement=True, target='att', n_bootstrap=200, random_state=None, layer=None)
Estimate the treatment effect by 1-to-\(k\) propensity score matching on the logit scale.
For each treated unit, the \(k\) nearest control units in logit-propensity space are selected as matches (and vice versa when target='ate'). With caliper set, candidate matches with a logit-propensity distance above caliper * SD(logit(e)) are discarded; treated units with no valid match are dropped from the estimate.
- Parameters:
edata (EHRData) – Central data object.
treatment (str) – Column name of the binary (0/1) treatment variable.
outcome (str) – Column name of the outcome variable.
covariates (Sequence[str]) – Adjustment set used to fit the propensity model. Each entry must refer to a name in edata.var_names or edata.obs.columns.
propensity_model (str | BaseEstimator, default: 'logistic') – Propensity model specification (see iptw() for the accepted values).
k (int, default: 1) – Number of matches per unit.
caliper (float | None, default: 0.2) – Maximum logit-propensity distance for a valid match, in units of SD(logit(e)). Use None to disable the caliper.
replacement (bool, default: True) – Whether matching is performed with replacement.
target (str, default: 'att') – 'att' for the average treatment effect on the treated, or 'ate' for the average treatment effect.
n_bootstrap (int, default: 200) – Number of bootstrap resamples used for the SE and 95% percentile confidence interval. Set to 0 to skip uncertainty estimation.
random_state (int | None, default: None) – Seed for the bootstrap resampler.
layer (str | None, default: None) – Layer of edata to draw the var-side variables from. If None, edata.X is used.
- Return type:
CausalEstimate
- Returns:
A CausalEstimate whose params dict contains the propensity scores and the matched-pair indices.
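The matching rule described above can be illustrated with a small NumPy sketch. It is not ehrapy's implementation: the helper names match_on_logit and att_from_matches are hypothetical, the propensity scores are supplied directly instead of being fitted, matching is done with replacement, and the caliper is assumed to use the pooled sample SD of the logit scores (the library may compute it differently).

```python
import numpy as np


def match_on_logit(e, t, k=1, caliper=0.2):
    """Hypothetical helper: map each matched treated index to its control indices."""
    e = np.asarray(e, dtype=float)
    t = np.asarray(t, dtype=int)
    logit = np.log(e / (1.0 - e))
    # Assumption: the caliper is scaled by the pooled sample SD of logit(e).
    max_dist = np.inf if caliper is None else caliper * logit.std(ddof=1)
    treated = np.flatnonzero(t == 1)
    controls = np.flatnonzero(t == 0)
    matches = {}
    for i in treated:
        dist = np.abs(logit[controls] - logit[i])
        # k nearest controls, with replacement across treated units.
        nearest = np.argsort(dist, kind="stable")[:k]
        # Discard candidates outside the caliper.
        nearest = nearest[dist[nearest] <= max_dist]
        if nearest.size:  # treated units without a valid match are dropped
            matches[int(i)] = controls[nearest]
    return matches


def att_from_matches(y, matches):
    """Mean of (treated outcome - mean outcome of its matched controls)."""
    y = np.asarray(y, dtype=float)
    diffs = [y[i] - y[j].mean() for i, j in matches.items()]
    return float(np.mean(diffs))


# Toy data: propensity scores, treatment flags, and outcomes for six units.
e = np.array([0.80, 0.75, 0.30, 0.78, 0.35, 0.05])
t = np.array([1, 0, 1, 0, 0, 1])
y = np.array([1.0, 0.0, 1.0, 1.0, 0.0, 1.0])

matches = match_on_logit(e, t, k=1, caliper=0.2)
att = att_from_matches(y, matches)
```

In this toy data the treated unit with propensity 0.05 has no control within the caliper, so it is left out of the estimate, which mirrors the dropping behaviour described for caliper.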
Examples
>>> import ehrapy as ep
>>> import ehrdata as ed
>>> edata = ed.dt.mimic_2_preprocessed()
>>> est = ep.tl.propensity_score_matching(
...     edata,
...     "aline_flg",
...     "day_28_flg",
...     covariates=["age", "sofa_first", "sapsi_first"],
...     random_state=0,
... )
>>> print(est.summary())
Causal effect of 'aline_flg' on 'day_28_flg'
method: propensity_score_matching_att
ATE: -0.0511
SE: 0.0337
95% CI: [-0.1209, 0.0051]
n: 1776
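The SE and 95% percentile interval controlled by n_bootstrap and random_state can be pictured with the generic sketch below. It is an illustration under stated assumptions, not ehrapy's code: percentile_bootstrap is a hypothetical helper, whole observations are resampled with replacement, and a plain difference in means stands in for the matching estimator. Whether the library refits the propensity model inside each resample is not stated in this page.

```python
import numpy as np


def percentile_bootstrap(estimate_fn, n_obs, n_bootstrap=200, random_state=None):
    """Hypothetical helper: bootstrap SE and 95% percentile CI of estimate_fn."""
    rng = np.random.default_rng(random_state)
    draws = np.empty(n_bootstrap)
    for b in range(n_bootstrap):
        # Resample observation indices with replacement.
        idx = rng.integers(0, n_obs, size=n_obs)
        draws[b] = estimate_fn(idx)
    se = draws.std(ddof=1)
    lo, hi = np.percentile(draws, [2.5, 97.5])
    return float(se), (float(lo), float(hi))


# Simulated toy data (not MIMIC-II): binary treatment with a true effect of 0.5.
data_rng = np.random.default_rng(0)
t = data_rng.integers(0, 2, size=300)
y = 0.5 * t + data_rng.normal(size=300)


def diff_in_means(idx):
    # Stand-in estimator; a matching estimator would be re-run here instead.
    tb, yb = t[idx], y[idx]
    return yb[tb == 1].mean() - yb[tb == 0].mean()


se, (lo, hi) = percentile_bootstrap(
    diff_in_means, len(y), n_bootstrap=200, random_state=0
)
```

Because the interval is read off the 2.5th and 97.5th percentiles of the resampled estimates, it does not have to be symmetric around the point estimate, and fixing random_state makes the resamples reproducible.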