{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# MIMIC-II IAC Causal Inference" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This tutorial continues the exploration of the MIMIC-II IAC dataset using causal inference methods. The dataset was created as a case study for the book Secondary Analysis of Electronic Health Records, published by Springer in 2016. In particular, the dataset was used throughout Chapter 16 (Data Analysis) by Raffa J. et al. to investigate the effect of indwelling arterial catheters (IAC) on mortality outcomes in hemodynamically stable patients with respiratory failure. The dataset is derived from MIMIC-II, the publicly accessible critical care database. It contains summary clinical data and outcomes for 1,776 patients.\n", "\n", "References:\n", "\n", "[1] https://github.com/py-why/dowhy\n", "\n", "[2] https://www.pywhy.org/dowhy/\n", "\n", "[3] https://arxiv.org/abs/2011.04216\n", "\n", "[4] https://physionet.org/content/mimic2-iaccd/1.0/" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Importing ehrapy and setting plotting parameters" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import ehrapy as ep\n", "from IPython.display import display\n", "import graphviz" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import warnings\n", "\n", "warnings.filterwarnings(\"ignore\")" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "-----\n", "ehrapy 0.9.0\n", "rich NA\n", "scanpy 1.9.3\n", "session_info 1.0.0\n", "-----\n", "CoreFoundation NA\n", "Foundation NA\n", "Levenshtein 0.21.1\n", "PIL 10.0.0\n", "PyObjCTools NA\n", "anndata 0.9.2\n", "anyio NA\n", "appnope 0.1.3\n", "argcomplete NA\n", "arrow 1.2.3\n", "astor 0.8.1\n", "asttokens NA\n", "attr 23.1.0\n", "attrs 23.1.0\n", "autograd NA\n", 
"autograd_gamma NA\n", "babel 2.12.1\n", "backcall 0.2.0\n", "brotli 1.0.9\n", "cachetools 5.3.1\n", "causallearn NA\n", "certifi 2023.07.22\n", "cffi 1.15.1\n", "charset_normalizer 3.2.0\n", "cloudpickle 2.2.1\n", "comm 0.1.3\n", "cvxopt 1.3.1\n", "cycler 0.10.0\n", "cython_runtime NA\n", "dateutil 2.8.2\n", "db_dtypes 1.1.1\n", "debugpy 1.6.7\n", "decorator 5.1.1\n", "defusedxml 0.7.1\n", "dill 0.3.7\n", "dot_parser NA\n", "dowhy 0.11.1\n", "executing 1.2.0\n", "fastjsonschema NA\n", "fhiry 3.0.0\n", "filelock 3.12.2\n", "formulaic 0.6.4\n", "fqdn NA\n", "future 0.18.3\n", "gmpy2 2.1.2\n", "google NA\n", "graphlib NA\n", "graphviz 0.20.1\n", "grpc 1.56.2\n", "grpc_status NA\n", "h5py 3.9.0\n", "idna 3.4\n", "igraph 0.10.6\n", "imblearn 0.12.3\n", "interface_meta 1.3.0\n", "ipykernel 6.25.0\n", "ipywidgets 8.0.7\n", "isoduration NA\n", "jedi 0.19.0\n", "jinja2 3.1.4\n", "joblib 1.3.0\n", "json5 NA\n", "jsonpointer 2.0\n", "jsonschema 4.18.4\n", "jsonschema_specifications NA\n", "jupyter_events 0.7.0\n", "jupyter_server 2.7.0\n", "jupyterlab_server 2.24.0\n", "kiwisolver 1.4.4\n", "lamin_utils 0.13.2\n", "leidenalg 0.10.1\n", "lifelines 0.27.7\n", "llvmlite 0.40.1\n", "markupsafe 2.1.3\n", "matplotlib 3.7.2\n", "missingno 0.5.2\n", "mpl_toolkits NA\n", "mpmath 1.3.0\n", "natsort 8.4.0\n", "nbformat 5.9.2\n", "networkx 3.1\n", "numba 0.57.1\n", "numpy 1.24.0\n", "objc 9.2\n", "overrides NA\n", "packaging 23.1\n", "pandas 2.0.3\n", "parso 0.8.3\n", "patsy 0.5.6\n", "pexpect 4.8.0\n", "pickleshare 0.7.5\n", "pkg_resources NA\n", "platformdirs 3.10.0\n", "prometheus_client NA\n", "prompt_toolkit 3.0.39\n", "psutil 5.9.5\n", "ptyprocess 0.7.0\n", "pure_eval 0.2.2\n", "pyarrow 12.0.1\n", "pyasn1 0.5.0\n", "pyasn1_modules 0.3.0\n", "pydev_ipython NA\n", "pydevconsole NA\n", "pydevd 2.9.5\n", "pydevd_file_utils NA\n", "pydevd_plugins NA\n", "pydevd_tracing NA\n", "pydot 1.4.2\n", "pygments 2.15.1\n", "pyparsing 3.0.9\n", "pythonjsonlogger NA\n", "pytz 2023.3\n", "rapidfuzz 
3.1.2\n", "referencing NA\n", "requests 2.31.0\n", "rfc3339_validator 0.1.4\n", "rfc3986_validator 0.1.1\n", "rpds NA\n", "rsa 4.9\n", "scipy 1.11.1\n", "seaborn 0.12.2\n", "send2trash NA\n", "setuptools 68.0.0\n", "six 1.16.0\n", "sklearn 1.3.0\n", "sniffio 1.3.0\n", "socks 1.7.1\n", "sphinxcontrib NA\n", "stack_data 0.6.2\n", "statsmodels 0.14.2\n", "sympy 1.12\n", "tableone 0.9.1\n", "tabulate 0.9.0\n", "texttable 1.6.7\n", "thefuzz 0.19.0\n", "threadpoolctl 3.2.0\n", "torch 2.0.1\n", "tornado 6.3.2\n", "tqdm 4.65.0\n", "traitlets 5.9.0\n", "typing_extensions NA\n", "uri_template NA\n", "urllib3 2.0.4\n", "vscode NA\n", "wcwidth 0.2.6\n", "webcolors 1.13\n", "websocket 1.6.1\n", "wrapt 1.15.0\n", "yaml 6.0\n", "zmq 25.1.0\n", "zoneinfo NA\n", "-----\n", "IPython 8.14.0\n", "jupyter_client 8.3.0\n", "jupyter_core 5.3.1\n", "jupyterlab 4.0.3\n", "notebook 7.0.1\n", "-----\n", "Python 3.10.12 | packaged by conda-forge | (main, Jun 23 2023, 22:41:52) [Clang 15.0.7 ]\n", "macOS-13.1-arm64-arm-64bit\n", "-----\n", "Session information updated at 2024-06-28 11:35\n" ] } ], "source": [ "ep.print_versions()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## MIMIC-II dataset preparation\n", "Let's load the MIMIC-II dataset using ehrapy with default one-hot encoding." ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "tags": [] }, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "637ed865f5ab40b8a969ed6f5327d844", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Output()" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n" ], "text/plain": [] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n",
"\n"
],
"text/plain": [
"\n"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"adata = ep.dt.mimic_2(encoded=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The MIMIC-II dataset contains 1,776 patients, as described above, with 46 features. With one-hot encoding applied, the `AnnData` object shown below holds 54 variables."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"AnnData object with n_obs × n_vars = 1776 × 54\n",
" obs: 'service_unit', 'day_icu_intime'\n",
" var: 'feature_type', 'unencoded_var_names', 'encoding_mode'\n",
" layers: 'original'"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"adata"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Causal Inference on the MIMIC-II dataset\n",
"In the background, `ehrapy` uses the `dowhy` package to perform causal inference on electronic health records (EHR). A `dowhy` analysis is structured into 3 steps:\n",
"1) Formulate causal questions\n",
"2) Estimate causal effects\n",
"3) Perform refutation tests\n",
"\n",
"Within `ehrapy`, we have consolidated this workflow into a single function, `ehrapy.tl.causal_inference()`. This function takes the dataset, the treatment variable, the outcome variable, and further optional parameters. It then performs the 3 steps above and displays the results in a user-friendly way."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Causal Graph\n",
"\n",
"The causal graph is a directed acyclic graph (DAG) that represents the causal relationships between the variables in the dataset. Here, we create it by manually writing out the connections between the variables. Alternatively, the graph can be supplied in the GML or DOT format. You can also construct the graph with a graphical tool such as [DAGitty](http://dagitty.net/dags.html) and export the graph string that it generates. That string is very close to the DOT format: to convert it and pass it to DoWhy, rename `dag` to `digraph`, remove the newlines, and add a semicolon after every line."
]
},
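{
"cell_type": "markdown",
"metadata": {},
"source": [
"The DAGitty-to-DOT conversion described above can be sketched in a few lines of Python. This is a minimal illustration and not part of `ehrapy` or DoWhy: the helper name `dagitty_to_dot` and the node names in the example graph are hypothetical, and the sketch assumes the DAGitty export places one statement per line.\n",
"\n",
"```python\n",
"def dagitty_to_dot(dagitty_graph: str) -> str:\n",
"    # Hypothetical helper: turn a DAGitty export into a one-line DOT string.\n",
"    text = dagitty_graph.strip()\n",
"    # Step 1: rename the leading 'dag' keyword to 'digraph'.\n",
"    if text.startswith('dag'):\n",
"        text = 'digraph' + text[len('dag'):]\n",
"    # Separate the header from the statements between the outer braces.\n",
"    open_idx = text.index('{')\n",
"    close_idx = text.rindex('}')\n",
"    header = text[:open_idx].strip()\n",
"    statements = [\n",
"        line.strip()\n",
"        for line in text[open_idx + 1:close_idx].splitlines()\n",
"        if line.strip()\n",
"    ]\n",
"    # Steps 2 and 3: drop the newlines and end every statement with a semicolon.\n",
"    body = ' '.join(statement.rstrip(';') + ';' for statement in statements)\n",
"    return header + ' { ' + body + ' }'\n",
"\n",
"\n",
"# Illustrative graph only; these node names are assumptions, not dataset columns.\n",
"example = '''dag {\n",
"age -> problems\n",
"problems -> treatment\n",
"problems -> outcome\n",
"treatment -> outcome\n",
"}'''\n",
"\n",
"print(dagitty_to_dot(example))\n",
"```\n",
"\n",
"The resulting single-line string can then be used wherever a DOT graph string is expected."
]
},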
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Assumptions:\n",
" - Both age and being overweight increase the risk of medical problems\n",
" - Having many medical problems makes a patient more likely to die in the hospital\n",
" - Having many medical problems influences the likelihood of receiving an IAC\n",
" - Having an IAC influences the likelihood of dying in the hospital"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"image/svg+xml": [
"\n",
"\n",
"\n",
"\n",
"\n"
],
"text/plain": [
"