ehrapy.tools.covariate_balance
- ehrapy.tools.covariate_balance(edata, treatment, *, covariates, weights=None, propensity_model='logistic', layer=None)
Report standardised mean differences (SMD) for each covariate, before and after weighting.
SMD is the standard “love-plot” diagnostic: (mean_treated − mean_control) / pooled_SD. Values with |SMD| < 0.1 are conventionally considered balanced. The variance ratio is Var_treated / Var_control; values near 1 indicate similar spread between treatment arms.
When weights is provided, the “after” columns (smd_weighted and var_ratio_weighted) use the supplied weights (typically the IPTW output stored in estimate.params['weights']). When weights is None, this function fits its own propensity model and computes IPTW weights internally.
- Parameters:
  - edata (EHRData) – Central data object.
  - treatment (str) – Column name of the binary (0/1) treatment variable.
  - covariates (Sequence[str]) – Adjustment set to evaluate. Each entry must refer to a name in edata.var_names or edata.obs.columns.
  - weights (ndarray | None, default: None) – Optional pre-computed IPTW weight vector aligned with edata.obs.index. When None, weights are computed internally from a freshly fitted propensity model.
  - propensity_model (str | BaseEstimator, default: 'logistic') – Propensity model used to compute weights when weights is None (see iptw() for the accepted values).
  - layer (str | None, default: None) – Layer of edata to draw the var-side variables from. If None, edata.X is used.
- Return type: DataFrame
- Returns: A DataFrame indexed by covariate name with columns smd_unweighted, smd_weighted, var_ratio_unweighted, and var_ratio_weighted.
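The four returned columns follow from the SMD and variance-ratio definitions given above. The sketch below shows one way such statistics could be computed with NumPy; it is not ehrapy's implementation. The helper names (`weighted_mean_var`, `balance_stats`, `iptw_weights`) are hypothetical, the pooled SD is assumed to be sqrt((Var_treated + Var_control) / 2), the variances are assumed to be population-style weighted variances, and the IPTW weights are assumed to be the ATE form 1/e for treated and 1/(1 − e) for controls. The library may use different conventions.

```python
import numpy as np


def weighted_mean_var(x, w):
    """Weighted mean and population-style weighted variance of x."""
    mean = np.average(x, weights=w)
    var = np.average((x - mean) ** 2, weights=w)
    return mean, var


def balance_stats(x, treatment, weights=None):
    """Return (SMD, variance ratio) for one covariate.

    Assumption: pooled_SD = sqrt((Var_treated + Var_control) / 2).
    With weights=None every unit gets weight 1 (the "unweighted" columns).
    """
    x = np.asarray(x, dtype=float)
    t = np.asarray(treatment).astype(bool)
    w = np.ones_like(x) if weights is None else np.asarray(weights, dtype=float)
    mean_t, var_t = weighted_mean_var(x[t], w[t])
    mean_c, var_c = weighted_mean_var(x[~t], w[~t])
    pooled_sd = np.sqrt((var_t + var_c) / 2.0)
    return (mean_t - mean_c) / pooled_sd, var_t / var_c


def iptw_weights(treatment, propensity):
    """Assumed ATE-style IPTW weights: 1/e if treated, 1/(1 - e) if control."""
    t = np.asarray(treatment, dtype=float)
    e = np.asarray(propensity, dtype=float)
    return t / e + (1.0 - t) / (1.0 - e)


# Toy data (made up for illustration): one covariate, six patients.
x = np.array([1.0, 2.0, 3.0, 2.0, 3.0, 4.0])
t = np.array([0, 0, 0, 1, 1, 1])
# Hand-picked propensity scores that increase with x (not a fitted model).
e = np.array([0.2, 0.4, 0.6, 0.4, 0.6, 0.8])

smd, vr = balance_stats(x, t)
smd_w, vr_w = balance_stats(x, t, weights=iptw_weights(t, e))
print(round(smd, 3), round(vr, 3))
print(round(smd_w, 3), round(vr_w, 3))
```

In this toy data the two arms have the same spread and means that differ by 1, so the unweighted variance ratio is 1, and weighting by the inverse propensity pulls the two weighted means closer together.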
Examples
>>> import ehrapy as ep
>>> import ehrdata as ed
>>> edata = ed.dt.mimic_2_preprocessed()
>>> bal = ep.tl.covariate_balance(
...     edata,
...     "aline_flg",
...     covariates=["age", "sofa_first", "sapsi_first"],
... )
>>> print(bal.round(3).to_string())
             smd_unweighted  smd_weighted  var_ratio_unweighted  var_ratio_weighted
age                   0.117        -0.044                 0.896               1.018
sofa_first            0.818        -0.220                 1.135               0.480
sapsi_first           0.627        -0.157                 1.112               0.781
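One way to read this output is to flag the covariates whose absolute SMD is at or above the conventional 0.1 cut-off, before and after weighting. The sketch below rebuilds the table by hand from the printed values so that it runs without ehrapy or the dataset; the helper name `imbalanced` is hypothetical and not part of the library.

```python
import pandas as pd

# Values typed in from the example output above.
bal = pd.DataFrame(
    {
        "smd_unweighted": [0.117, 0.818, 0.627],
        "smd_weighted": [-0.044, -0.220, -0.157],
        "var_ratio_unweighted": [0.896, 1.135, 1.112],
        "var_ratio_weighted": [1.018, 0.480, 0.781],
    },
    index=["age", "sofa_first", "sapsi_first"],
)

SMD_THRESHOLD = 0.1  # conventional cut-off for "balanced"


def imbalanced(table, column, threshold=SMD_THRESHOLD):
    """Return covariate names whose |SMD| in `column` is at or above the threshold."""
    return table.index[table[column].abs() >= threshold].tolist()


before = imbalanced(bal, "smd_unweighted")
after = imbalanced(bal, "smd_weighted")
print(before)  # ['age', 'sofa_first', 'sapsi_first']
print(after)   # ['sofa_first', 'sapsi_first']
```

By this criterion, weighting brings age within the 0.1 threshold, while sofa_first and sapsi_first remain above it in absolute value.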