MIMIC-II IAC Causal Inference#

This tutorial continues the exploration of the MIMIC-II IAC dataset using causal inference methods. The dataset was created as a case study for the book Secondary Analysis of Electronic Health Records, published by Springer in 2016. In particular, it is used throughout Chapter 16 (Data Analysis) by Raffa J. et al. to investigate the effect of indwelling arterial catheters (IAC) on mortality outcomes in hemodynamically stable patients with respiratory failure. The dataset is derived from MIMIC-II, a publicly accessible critical care database, and contains summary clinical data and outcomes for 1,776 patients.

References:

[1] https://github.com/py-why/dowhy

[2] https://www.pywhy.org/dowhy/

[3] https://arxiv.org/abs/2011.04216

[4] https://physionet.org/content/mimic2-iaccd/1.0/

Importing ehrapy and setting plotting parameters#

# pip install ehrapy[causal]

import ehrapy as ep
import ehrdata as ed
import graphviz
from IPython.display import display

import warnings

warnings.filterwarnings("ignore")

ep.print_versions()

graphviz        0.20.3
----    ----
defusedxml      0.7.1
numba   0.61.0
charset-normalizer      3.4.1
google-resumable-media  2.7.2
sphinxcontrib-jsmath    1.0.1
interface-meta  1.3.0
Pygments        2.19.1
threadpoolctl   3.5.0
torch   2.6.0
Deprecated      1.2.18
statsmodels     0.14.4
PyYAML  6.0.2
sphinxcontrib-devhelp   2.0.0
grpcio  1.71.0rc2
prompt_toolkit  3.0.50
thefuzz 0.22.1
Bottleneck      1.5.0
jedi    0.19.2
traitlets       5.14.3
kiwisolver      1.4.8
asciitree       0.3.3
platformdirs    4.3.6
prodict 0.8.20
lamin_utils     0.13.10
h5py    3.13.0
igraph  0.11.8
pytz    2025.1
toolz   1.0.0
tblib   3.0.0
sphinxcontrib-applehelp 2.0.0
protobuf        5.29.3
scipy   1.14.1
pyzmq   26.2.1
six     1.17.0
pytest  8.3.5
google-cloud-core       2.4.2
certifi 2025.1.31 (2025.01.31)
ipykernel       6.29.5
sphinxcontrib-qthelp    2.0.0
session-info2   0.1.2
wcwidth 0.2.13
dask    2025.2.0
fsspec  2025.2.0
coverage        7.6.12
attrs   25.1.0
MarkupSafe      3.0.2
pydot   3.0.4
xarray  2025.1.2
pyasn1  0.6.1
tqdm    4.67.1
pyparsing       3.2.1
scikit-learn    1.6.1
zipp    3.21.0
rich    13.9.4
Cython  0.29.37
legacy-api-wrap 1.4.1
autograd        1.7.0
seaborn 0.13.2
imbalanced-learn        0.13.0
typing_extensions       4.12.2
texttable       1.7.0
idna    3.10
awkward 2.7.4
sphinxcontrib-bibtex    2.6.3
google-api-core 2.24.1
natsort 8.4.0
ipywidgets      8.1.5
rsa     4.9
jupyter_core    5.7.2
formulaic       1.1.1
importlib_metadata      8.6.1
pandas  2.2.3
fhiry   4.0.0
numpy   1.26.4
cachetools      5.5.2
stack-data      0.6.3
pyasn1_modules  0.4.1
requests        2.32.3
regex   2024.11.6 (2.5.148)
llvmlite        0.44.0
db-dtypes       1.4.2
msgpack 1.1.0
googleapis-common-protos        1.69.1
parso   0.8.4
setuptools      75.8.0
google-cloud-bigquery   3.30.0
autograd-gamma  0.5.0
patsy   1.0.1
Django  5.1.7
packaging       24.2
Jinja2  3.0.3
leidenalg       0.10.2
psutil  7.0.0
joblib  1.4.2
fast-array-utils        1.2.1
zarr    2.18.4
scanpy  1.10.4
executing       2.2.0
wrapt   1.17.2
array_api_compat        1.11.1
pyarrow 19.0.1
missingno       0.5.2
tabulate        0.9.0
tornado 6.4.2
pluggy  1.5.0
chardet 5.2.0
comm    0.2.2
anndata 0.11.3
cloudpickle     3.1.1
RapidFuzz       3.12.2
google-auth     2.38.0
sparse  0.15.5
ipython 9.0.1
tableone        0.9.1
sphinxcontrib-htmlhelp  2.1.0
cycler  0.12.1
pillow  11.1.0
iniconfig       2.0.0
debugpy 1.8.13
sphinxcontrib-serializinghtml   2.0.0
jupyter_client  8.6.3
grpcio-status   1.71.0rc2
timeago 1.0.16 (1.0.14)
decorator       5.2.1
urllib3 1.26.20
appnope 0.1.4
matplotlib      3.10.1
duckdb  1.2.1
awkward_cpp     44
asttokens       3.0.0
filelock        3.17.0
lifelines       0.30.0
zstandard       0.23.0
python-dateutil 2.9.0.post0
pure_eval       0.2.3
numcodecs       0.15.1
----    ----
Python  3.11.11 (main, Dec 11 2024, 10:25:04) [Clang 14.0.6 ]
OS      macOS-15.3-arm64-arm-64bit
CPU     8 logical CPU cores, arm
GPU     No GPU found
Updated 2025-09-05 15:13

MIMIC-II dataset preparation#

Let’s load the MIMIC-II dataset, infer the feature types, and use ehrapy to encode the categorical variables with a one-hot encoding.

edata = ed.dt.mimic_2()
ed.infer_feature_types(edata)
edata = ep.pp.encode(edata, autodetect=True)

! File ehrapy_data/ehrapy_mimic2.csv already exists! Using already downloaded dataset...
! Features 'aline_flg', 'gender_num', 'service_num', 'day_icu_intime_num', 'hour_icu_intime', 'hosp_exp_flg', 'icu_exp_flg', 'day_28_flg', 'censor_flg', 'sepsis_flg', 'chf_flg', 'afib_flg', 'renal_flg', 'liver_flg', 'copd_flg', 'cad_flg', 'stroke_flg', 'mal_flg', 'resp_flg' were detected as categorical features stored numerically.Please verify and adjust if necessary using `ed.replace_feature_types`.

 Detected feature types for EHRData object with 1776 obs and 46 vars
╠══ 📅 Date features
╠══ 📐 Numerical features
║   ╠══ abg_count
║   ╠══ age
║   ╠══ bmi
║   ╠══ bun_first
║   ╠══ chloride_first
║   ╠══ creatinine_first
║   ╠══ hgb_first
║   ╠══ hospital_los_day
║   ╠══ hr_1st
║   ╠══ icu_los_day
║   ╠══ iv_day_1
║   ╠══ map_1st
║   ╠══ mort_day_censored
║   ╠══ pco2_first
║   ╠══ platelet_first
║   ╠══ po2_first
║   ╠══ potassium_first
║   ╠══ sapsi_first
║   ╠══ sodium_first
║   ╠══ sofa_first
║   ╠══ spo2_1st
║   ╠══ tco2_first
║   ╠══ temp_1st
║   ╠══ wbc_first
║   ╚══ weight_first
╚══ 🗂️ Categorical features
    ╠══ afib_flg (2 categories)
    ╠══ aline_flg (2 categories)
    ╠══ cad_flg (2 categories)
    ╠══ censor_flg (2 categories)
    ╠══ chf_flg (2 categories)
    ╠══ copd_flg (2 categories)
    ╠══ day_28_flg (2 categories)
    ╠══ day_icu_intime (7 categories)
    ╠══ day_icu_intime_num (7 categories)
    ╠══ gender_num (2 categories)
    ╠══ hosp_exp_flg (2 categories)
    ╠══ hour_icu_intime (24 categories)
    ╠══ icu_exp_flg (2 categories)
    ╠══ liver_flg (2 categories)
    ╠══ mal_flg (2 categories)
    ╠══ renal_flg (2 categories)
    ╠══ resp_flg (2 categories)
    ╠══ sepsis_flg (1 categories)
    ╠══ service_num (2 categories)
    ╠══ service_unit (3 categories)
    ╚══ stroke_flg (2 categories)

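As described above, the MIMIC-II IAC dataset contains 1,776 patients, with 46 features before encoding. After encoding, the object holds 54 variables, which is consistent with the two string-valued features service_unit (3 categories) and day_icu_intime (7 categories) each being replaced by one indicator column per category (46 - 2 + 3 + 7 = 54).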

edata

EHRData object with n_obs × n_vars = 1776 × 54
    obs: 'service_unit', 'day_icu_intime'
    var: 'feature_type', 'unencoded_var_names', 'encoding_mode'
    layers: 'original'
    shape of .X: (1776, 54)
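As a rough illustration of why the number of variables grows, the following standalone sketch one-hot encodes a small list of made-up category values and repeats the column bookkeeping for the shapes printed above. The helper `one_hot` and the example values are hypothetical and are not part of ehrapy; the real encoding is done by `ep.pp.encode`.

```python
def one_hot(values):
    """Return (categories, rows): one 0/1 indicator column per distinct value."""
    categories = sorted(set(values))
    rows = [[int(value == category) for category in categories] for value in values]
    return categories, rows


# Made-up example values, not read from the dataset.
service_unit = ["MICU", "SICU", "FICU", "MICU"]
categories, rows = one_hot(service_unit)
print(categories)  # ['FICU', 'MICU', 'SICU']
print(rows[0])     # [0, 1, 0]

# Column bookkeeping for the output above: 46 features, two of which
# (with 3 and 7 categories) are replaced by their indicator columns.
n_vars_after_encoding = 46 - 2 + 3 + 7
print(n_vars_after_encoding)  # 54
```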

Causal Inference on the MIMIC-II dataset#

In the background, ehrapy uses the dowhy package to enable effortless causal inference on electronic health records (EHR). Any dowhy analysis is structured into 3 steps:

Formulate causal questions
Estimate causal effects
Perform refutation tests.
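To make the estimation step more concrete, here is a minimal sketch on simulated data that uses only the Python standard library. It is not how dowhy or ehrapy compute their estimates; it only illustrates the idea of adjusting for a common cause. All numbers, the confounder `sick`, and the helper functions are invented for this illustration.

```python
import random

random.seed(0)

# Simulated data (all numbers are invented): a binary confounder `sick`
# raises both the chance of being treated and the outcome itself.
TRUE_EFFECT = 2.0
rows = []
for _ in range(20000):
    sick = random.random() < 0.5
    treated = random.random() < (0.8 if sick else 0.2)
    outcome = 3.0 * sick + TRUE_EFFECT * treated + random.gauss(0, 1)
    rows.append((sick, treated, outcome))


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def diff_in_means(data):
    """Mean outcome of treated rows minus mean outcome of untreated rows."""
    treated = mean(y for _, t, y in data if t)
    control = mean(y for _, t, y in data if not t)
    return treated - control


# Naive comparison: biased, because treated patients are sicker on average.
naive = diff_in_means(rows)

# Backdoor adjustment: compare within each stratum of the confounder,
# then average the stratum effects weighted by stratum size.
adjusted = 0.0
for level in (False, True):
    stratum = [r for r in rows if r[0] == level]
    adjusted += len(stratum) / len(rows) * diff_in_means(stratum)

print(f"naive: {naive:.2f}, adjusted: {adjusted:.2f}, true: {TRUE_EFFECT:.2f}")
```

In this simulation the naive difference overstates the effect, while the adjusted value lands close to the true effect that was built into the data.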

Causal Graph#

The causal graph is a directed acyclic graph (DAG) that represents the causal relationships between the variables in the dataset. Here, we create it by manually writing out the connections between the variables as a DOT string; the GML graph format is another option. You can also construct the graph with a graphical tool such as DAGitty and export the graph string that it generates. That string is very close to the DOT format: to convert it, rename dag to digraph, remove the newlines, and add a semicolon after every line. The result can then be passed to DoWhy.
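The conversion described above could be sketched as follows. The helper `dagitty_to_dot` is hypothetical and only handles the simple case of one edge statement per line; a DAGitty export that carries node attributes (for example positions) would need additional handling that is not shown here.

```python
def dagitty_to_dot(dagitty: str) -> str:
    """Convert a simple DAGitty graph string into a one-line DOT string."""
    text = dagitty.strip()
    if not text.startswith("dag"):
        raise ValueError("expected a DAGitty string starting with 'dag'")
    # Rename the leading `dag` keyword to `digraph`.
    text = "digraph" + text[len("dag"):]
    open_idx = text.index("{")
    close_idx = text.rindex("}")
    header = text[:open_idx].strip()
    body = text[open_idx + 1 : close_idx]
    # One statement per non-empty line, each terminated by a semicolon.
    statements = [line.strip().rstrip(";") for line in body.splitlines() if line.strip()]
    return header + " { " + " ".join(s + ";" for s in statements) + " }"


# A made-up three-edge example in DAGitty-like syntax.
example = """dag {
age -> chf_flg
chf_flg -> aline_flg
aline_flg -> icu_los_day
}"""
dot = dagitty_to_dot(example)
print(dot)
# digraph { age -> chf_flg; chf_flg -> aline_flg; aline_flg -> icu_los_day; }
```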

Assumptions:#

both age and BMI increase the risk of medical problems (encoded by condition flags such as chf_flg)
having medical problems influences how long a patient stays in the ICU
having medical problems influences the likelihood of receiving an IAC
receiving an IAC influences how long a patient stays in the ICU

causal_graph = """digraph {
aline_flg[label="Indwelling arterial catheters used"];
icu_los_day[label="Days in ICU"];

age -> sepsis_flg;
age -> chf_flg;
age -> afib_flg;
age -> renal_flg;
age -> liver_flg;
age -> copd_flg;
age -> cad_flg;
age -> stroke_flg;
age -> resp_flg;
bmi -> sepsis_flg;
bmi -> chf_flg;
bmi -> afib_flg;
bmi -> renal_flg;
bmi -> liver_flg;
bmi -> copd_flg;
bmi -> cad_flg;
bmi -> stroke_flg;
bmi -> resp_flg;
sepsis_flg -> aline_flg;
chf_flg -> aline_flg;
afib_flg -> aline_flg;
renal_flg -> aline_flg;
liver_flg -> aline_flg;
copd_flg -> aline_flg;
cad_flg -> aline_flg;
stroke_flg -> aline_flg;
resp_flg -> aline_flg;
sepsis_flg -> icu_los_day;
chf_flg -> icu_los_day;
afib_flg -> icu_los_day;
renal_flg -> icu_los_day;
liver_flg -> icu_los_day;
copd_flg -> icu_los_day;
cad_flg -> icu_los_day;
stroke_flg -> icu_los_day;
resp_flg -> icu_los_day;
aline_flg -> icu_los_day;
}"""

g = graphviz.Source(causal_graph)
display(g)

[Figure: the causal graph rendered with graphviz]
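Because the edge list follows a regular pattern, the same graph string could also be generated instead of typed by hand. The sketch below builds the edges with list comprehensions and then checks that the result is acyclic using the standard-library `graphlib` module. The variable names `edges`, `generated_graph` and `parents` are illustrative only.

```python
from graphlib import TopologicalSorter

comorbidities = [
    "sepsis_flg", "chf_flg", "afib_flg", "renal_flg", "liver_flg",
    "copd_flg", "cad_flg", "stroke_flg", "resp_flg",
]

# age and bmi point to every condition flag; every flag points to both
# the treatment and the outcome; the treatment points to the outcome.
edges = [(cause, flag) for cause in ("age", "bmi") for flag in comorbidities]
edges += [(flag, "aline_flg") for flag in comorbidities]
edges += [(flag, "icu_los_day") for flag in comorbidities]
edges.append(("aline_flg", "icu_los_day"))

labels = [
    'aline_flg[label="Indwelling arterial catheters used"];',
    'icu_los_day[label="Days in ICU"];',
]
edge_lines = [f"{src} -> {dst};" for src, dst in edges]
generated_graph = "digraph {\n" + "\n".join(labels + edge_lines) + "\n}"

# Map every node to its direct causes; static_order() raises
# graphlib.CycleError if the graph contains a cycle.
parents = {}
for src, dst in edges:
    parents.setdefault(dst, set()).add(src)
    parents.setdefault(src, set())
order = list(TopologicalSorter(parents).static_order())
print(len(edges), order[-1])  # 37 icu_los_day
```

The outcome comes last in the topological order because it is the only node without outgoing edges.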

Causal inference#

This graph can now be passed to the causal_inference() function. We also have to specify an estimation_method; for now, we will use backdoor.linear_regression. Please refer to this example notebook and the official dowhy documentation for more information on the different estimation methods.

ep.tl.causal_inference(
    edata=edata,
    graph=causal_graph,
    treatment="aline_flg",
    outcome="icu_los_day",
    estimation_method="backdoor.linear_regression",
)

! Refutation 'placebo_treatment_refuter' returned invalid pval 'nan', retrying (1/10)
! Refutation 'placebo_treatment_refuter' returned invalid pval 'nan', retrying (2/10)
! Refutation 'placebo_treatment_refuter' returned invalid pval 'nan', retrying (3/10)
! Refutation 'placebo_treatment_refuter' returned invalid pval 'nan', retrying (4/10)
! Refutation 'placebo_treatment_refuter' returned invalid pval 'nan', retrying (5/10)
! Refutation 'placebo_treatment_refuter' returned invalid pval 'nan', retrying (6/10)
! Refutation 'placebo_treatment_refuter' returned invalid pval 'nan', retrying (7/10)
! Refutation 'placebo_treatment_refuter' returned invalid pval 'nan', retrying (8/10)
! Refutation 'placebo_treatment_refuter' returned invalid pval 'nan', retrying (9/10)
! Refutation 'placebo_treatment_refuter' returned invalid pval 'nan', retrying (10/10)
Causal inference results for treatment variable 'aline_flg' and outcome variable 'icu_los_day':
└- Increasing the treatment variable(s) [aline_flg] from 0 to 1 causes an increase of 2.2348813091683177 in the expected value of the outcome [['icu_los_day']], over the data distribution/population represented by the dataset.

Refutation results
├-Refute: Use a Placebo Treatment
|    ├- Estimated effect: 2.23
|    ├- New effect: 0.000
|    ├- p-value: nan
|    └- Test significance: 2.23
├-Refute: Add a random common cause
|    ├- Estimated effect: 2.23
|    ├- New effect: 2.235
|    ├- p-value: 0.455
|    └- Test significance: 2.23
├-Refute: Use a subset of data
|    ├- Estimated effect: 2.23
|    ├- New effect: 2.235
|    ├- p-value: 0.497
|    └- Test significance: 2.23
├-Refute: Add an Unobserved Common Cause
|    ├- Estimated effect: 2.23
|    ├- New effect: -0.36, 2.3
|    ├- p-value: Not applicable
|    └- Test significance: 2.23
├-Refute: Bootstrap Sample Dataset
|    ├- Estimated effect: 2.23
|    ├- New effect: 2.223
|    ├- p-value: 0.464
|    └- Test significance: 2.23
└-Refute: Use a Dummy Outcome
     ├- Estimated effect: 0.00
     ├- New effect: -0.004
     ├- p-value: 0.460
     └- Test significance: 0.00

<dowhy.causal_estimator.CausalEstimate at 0x30619d490>

As we can see, the function prints a summary of the estimated causal effect and the refutation results. The placebo_treatment_refuter failed in this case because our intervention variable is binary. By default, all six available refutation methods are run, but the user can also restrict the run to a subset of them via refute_methods. The call below spells out the full list, from which entries can be removed.

ep.tl.causal_inference(
    edata=edata,
    graph=causal_graph,
    treatment="aline_flg",
    outcome="icu_los_day",
    estimation_method="backdoor.linear_regression",
    refute_methods=[
        "placebo_treatment_refuter",
        "random_common_cause",
        "data_subset_refuter",
        "add_unobserved_common_cause",
        "bootstrap_refuter",
        "dummy_outcome_refuter",
    ],
)

! Refutation 'placebo_treatment_refuter' returned invalid pval 'nan', retrying (1/10)
! Refutation 'placebo_treatment_refuter' returned invalid pval 'nan', retrying (2/10)
! Refutation 'placebo_treatment_refuter' returned invalid pval 'nan', retrying (3/10)
! Refutation 'placebo_treatment_refuter' returned invalid pval 'nan', retrying (4/10)
! Refutation 'placebo_treatment_refuter' returned invalid pval 'nan', retrying (5/10)
! Refutation 'placebo_treatment_refuter' returned invalid pval 'nan', retrying (6/10)
! Refutation 'placebo_treatment_refuter' returned invalid pval 'nan', retrying (7/10)
! Refutation 'placebo_treatment_refuter' returned invalid pval 'nan', retrying (8/10)
! Refutation 'placebo_treatment_refuter' returned invalid pval 'nan', retrying (9/10)
! Refutation 'placebo_treatment_refuter' returned invalid pval 'nan', retrying (10/10)
Causal inference results for treatment variable 'aline_flg' and outcome variable 'icu_los_day':
└- Increasing the treatment variable(s) [aline_flg] from 0 to 1 causes an increase of 2.2348813091683177 in the expected value of the outcome [['icu_los_day']], over the data distribution/population represented by the dataset.

Refutation results
├-Refute: Use a Placebo Treatment
|    ├- Estimated effect: 2.23
|    ├- New effect: 0.000
|    ├- p-value: nan
|    └- Test significance: 2.23
├-Refute: Add a random common cause
|    ├- Estimated effect: 2.23
|    ├- New effect: 2.235
|    ├- p-value: 0.462
|    └- Test significance: 2.23
├-Refute: Use a subset of data
|    ├- Estimated effect: 2.23
|    ├- New effect: 2.235
|    ├- p-value: 0.500
|    └- Test significance: 2.23
├-Refute: Add an Unobserved Common Cause
|    ├- Estimated effect: 2.23
|    ├- New effect: -0.52, 2.25
|    ├- p-value: Not applicable
|    └- Test significance: 2.23
├-Refute: Bootstrap Sample Dataset
|    ├- Estimated effect: 2.23
|    ├- New effect: 2.250
|    ├- p-value: 0.461
|    └- Test significance: 2.23
└-Refute: Use a Dummy Outcome
     ├- Estimated effect: 0.00
     ├- New effect: 0.003
     ├- p-value: 0.471
     └- Test significance: 0.00

<dowhy.causal_estimator.CausalEstimate at 0x309c4c650>
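To build some intuition for what a refutation test checks, the sketch below mimics the idea behind a bootstrap-style check with the standard library only: the effect is re-estimated on resampled versions of the data, and a stable estimate should barely move. This is not DoWhy's implementation; the simulated data, the difference-in-means estimator and all names are assumptions made for this illustration.

```python
import random

random.seed(1)

# Invented data: binary treatment with a true effect of 2.0 on the outcome.
data = []
for _ in range(2000):
    treated = random.random() < 0.5
    data.append((treated, 2.0 * treated + random.gauss(0, 1)))


def effect(sample):
    """Difference in mean outcome between treated and untreated rows."""
    treated = [y for t, y in sample if t]
    control = [y for t, y in sample if not t]
    return sum(treated) / len(treated) - sum(control) / len(control)


original = effect(data)

# Re-estimate on resampled datasets (drawn with replacement, same size).
bootstrap_effects = [
    effect(random.choices(data, k=len(data))) for _ in range(200)
]
mean_bootstrap = sum(bootstrap_effects) / len(bootstrap_effects)
print(f"original: {original:.2f}, bootstrap mean: {mean_bootstrap:.2f}")
```

This corresponds to the pattern in the summaries above, where "New effect" stays close to "Estimated effect" for the data subset and bootstrap refuters.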

By default, we hide much of the standard dowhy output for clarity. However, it is possible to display it nonetheless.

ep.tl.causal_inference(
    edata=edata,
    graph=causal_graph,
    treatment="aline_flg",
    outcome="icu_los_day",
    estimation_method="backdoor.linear_regression",
    refute_methods=[
        # "placebo_treatment_refuter",  # we know it'll fail
        "random_common_cause",
        "data_subset_refuter",
        "add_unobserved_common_cause",
        "bootstrap_refuter",
        "dummy_outcome_refuter",
    ],
    print_causal_estimate=True,
    print_summary=False,
)

Plotting#

Furthermore, the causal_inference() function can generate several plots, if desired. For example, it is possible to output the causal graph …

ep.tl.causal_inference(
    edata=edata,
    graph=causal_graph,
    treatment="aline_flg",
    outcome="icu_los_day",
    estimation_method="backdoor.linear_regression",
    refute_methods=[
        # "placebo_treatment_refuter",  # we know it'll fail
        "random_common_cause",
        "data_subset_refuter",
        "add_unobserved_common_cause",
        "dummy_outcome_refuter",
    ],
    print_causal_estimate=False,
    print_summary=False,
    show_graph=True,
)

… or the graph generated by the add_unobserved_common_cause refutation method.

ep.tl.causal_inference(
    edata=edata,
    graph=causal_graph,
    treatment="aline_flg",
    outcome="icu_los_day",
    estimation_method="backdoor.linear_regression",
    refute_methods=[
        # "placebo_treatment_refuter",  # we know it'll fail
        "random_common_cause",
        "data_subset_refuter",
        "add_unobserved_common_cause",
        "dummy_outcome_refuter",
    ],
    print_causal_estimate=False,
    print_summary=False,
    show_graph=False,
    show_refute_plots=True,
)

While performing the dowhy analysis, a CausalEstimate object is created. causal_inference() returns it (requested explicitly here with return_as="estimate"), and we can then visualise it.

estimate = ep.tl.causal_inference(
    edata=edata,
    graph=causal_graph,
    treatment="aline_flg",
    outcome="icu_los_day",
    estimation_method="backdoor.linear_regression",
    refute_methods=[
        # "placebo_treatment_refuter",  # we know it'll fail
        "random_common_cause",
        "data_subset_refuter",
        "add_unobserved_common_cause",
        "bootstrap_refuter",
        "dummy_outcome_refuter",
    ],
    print_causal_estimate=False,
    print_summary=False,
    show_graph=False,
    show_refute_plots=False,
    return_as="estimate",
)

ep.pl.causal_effect(estimate)

<Axes: title={'center': 'DoWhy estimate $\\rho$ (slope) = 2.235'}, xlabel='aline_flg', ylabel='icu_los_day'>

[Figure: DoWhy estimate ρ (slope) = 2.235, with aline_flg on the x-axis and icu_los_day on the y-axis]

Advanced options#

Within dowhy, the user can specify several arguments for identification, estimation and refutation. These arguments can also be passed directly to the respective functions through the identify_kwargs, estimate_kwargs and refute_kwargs arguments of causal_inference().

estimate = ep.tl.causal_inference(
    edata=edata,
    graph=causal_graph,
    treatment="aline_flg",
    outcome="icu_los_day",
    estimation_method="backdoor.linear_regression",
    refute_methods=[
        # "placebo_treatment_refuter",  # we know it'll fail
        "random_common_cause",
        "data_subset_refuter",
        "add_unobserved_common_cause",
        "bootstrap_refuter",
        "dummy_outcome_refuter",
    ],
    print_causal_estimate=True,
    print_summary=True,
    show_graph=False,
    show_refute_plots="contour",
    return_as="estimate",
    identify_kwargs={"proceed_when_unidentifiable": True},
    estimate_kwargs={"target_units": "days"},
    refute_kwargs={"random_seed": 5},
)

[Figure: contour plot produced by the add_unobserved_common_cause refutation]

*** Causal Estimate ***

## Identified estimand
Estimand type: EstimandType.NONPARAMETRIC_ATE

### Estimand : 1
Estimand name: backdoor
Estimand expression:
     d
────────────(E[icu_los_day|stroke_flg,chf_flg,cad_flg,resp_flg,sepsis_flg,afib_flg,renal_flg,liver_flg,copd_flg])
d[aline_flg]
Estimand assumption 1, Unconfoundedness: If U→{aline_flg} and U→icu_los_day then P(icu_los_day|aline_flg,stroke_flg,chf_flg,cad_flg,resp_flg,sepsis_flg,afib_flg,renal_flg,liver_flg,copd_flg,U) = P(icu_los_day|aline_flg,stroke_flg,chf_flg,cad_flg,resp_flg,sepsis_flg,afib_flg,renal_flg,liver_flg,copd_flg)

## Realized estimand
b: icu_los_day~aline_flg+stroke_flg+chf_flg+cad_flg+resp_flg+sepsis_flg+afib_flg+renal_flg+liver_flg+copd_flg
Target units: days

## Estimate
Mean value: 2.2348813091683177

Causal inference results for treatment variable 'aline_flg' and outcome variable 'icu_los_day':
└- Increasing the treatment variable(s) [aline_flg] from 0 to 1 causes an increase of 2.2348813091683177 in the expected value of the outcome [['icu_los_day']], over the data distribution/population represented by the dataset.

Refutation results
├-Refute: Add a random common cause
|    ├- Estimated effect: 2.23
|    ├- New effect: 2.235
|    ├- p-value: 0.475
|    └- Test significance: 2.23
├-Refute: Use a subset of data
|    ├- Estimated effect: 2.23
|    ├- New effect: 2.235
|    ├- p-value: 0.499
|    └- Test significance: 2.23
├-Refute: Add an Unobserved Common Cause
|    ├- Estimated effect: 2.23
|    ├- New effect: -0.59, 2.29
|    ├- p-value: Not applicable
|    └- Test significance: 2.23
├-Refute: Bootstrap Sample Dataset
|    ├- Estimated effect: 2.23
|    ├- New effect: 2.226
|    ├- p-value: 0.473
|    └- Test significance: 2.23
└-Refute: Use a Dummy Outcome
     ├- Estimated effect: 0.00
     ├- New effect: -0.004
     ├- p-value: 0.466
     └- Test significance: 0.00
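The way per-step keyword dictionaries reach their respective steps can be pictured with the following toy pipeline. It is a sketch of the general forwarding pattern only and does not reproduce ehrapy's or dowhy's code: the functions `identify`, `estimate`, `refute` and `run_pipeline` are hypothetical stand-ins, and reusing the same refutation options for every refuter is an assumption of this sketch.

```python
def identify(**options):
    # Hypothetical stand-in for the identification step.
    return ("identify", options)


def estimate(**options):
    # Hypothetical stand-in for the estimation step.
    return ("estimate", options)


def refute(method, **options):
    # Hypothetical stand-in for a single refutation test.
    return (f"refute:{method}", options)


def run_pipeline(refute_methods, identify_kwargs=None, estimate_kwargs=None, refute_kwargs=None):
    """Forward each dictionary to the step it belongs to."""
    calls = [
        identify(**(identify_kwargs or {})),
        estimate(**(estimate_kwargs or {})),
    ]
    # Assumption of this sketch: the same options are reused for every refuter.
    calls += [refute(name, **(refute_kwargs or {})) for name in refute_methods]
    return calls


calls = run_pipeline(
    refute_methods=["random_common_cause", "bootstrap_refuter"],
    identify_kwargs={"proceed_when_unidentifiable": True},
    refute_kwargs={"random_seed": 5},
)
for name, options in calls:
    print(name, options)
```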