{ "cells": [ { "attachments": {}, "cell_type": "markdown", "id": "fd0136b6-54c9-496b-a1e3-d8f4509a8f28", "metadata": {}, "source": [ "# Introduction to ehrapy" ] }, { "attachments": {}, "cell_type": "markdown", "id": "e6b8d85e-d22e-4833-a5fc-642b6f4a146c", "metadata": {}, "source": [ "Welcome to ehrapy! \n", "\n", "\n", "ehrapy is a framework for the exploratory and targeted end-to-end analysis of complex electronic health record (EHR) datasets inspired by the biological omics world.\n", "Here, data points are not necessarily treated as complete patients, but as patient visits representing snapshots of the underlying system.\n", "The goal of an exploratory analysis is not necessarily to predict or classify a specific state, but to understand the system underlying the data manifold." ] }, { "attachments": {}, "cell_type": "markdown", "id": "fc830dce-867f-4d1e-8c47-b3fe340c0179", "metadata": {}, "source": [ "ehrapy is not a pure machine learning library or a pure statistics library, but a framework providing simplified access to fundamental algorithms to preprocess, visualize and analyze EHR data." ] }, { "attachments": {}, "cell_type": "markdown", "id": "1fbccfae-6f32-4f30-8e63-f4123af410b9", "metadata": {}, "source": [ "## Fundamental Principles" ] }, { "attachments": {}, "cell_type": "markdown", "id": "7b75214a-f619-47f2-84ce-8df1fd3344c0", "metadata": {}, "source": [ "One of the main advantages of ehrapy is that EHR datasets can be analyzed from beginning to end with a clear, but flexible, order of operations.\n", "\n", "![](images/ehrapy_overview.png)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "0063b640-4cf6-453f-b0d8-db9b342ff312", "metadata": {}, "source": [ "ehrapy borrows a lot from the single-cell world and the [scverse](https://github.com/scverse/) ecosystem. Notably, ehrapy uses the same data structure (AnnData) and many of the fundamental algorithms (scanpy). Both are briefly introduced in the following subsections." 
] }, { "attachments": {}, "cell_type": "markdown", "id": "aff45f3d-ee7a-4e47-ae04-891566160b91", "metadata": {}, "source": [ "## AnnData" ] }, { "attachments": {}, "cell_type": "markdown", "id": "cffb8332-8b43-4c41-81fa-500b5cd3153f", "metadata": {}, "source": [ "AnnData is short for Annotated Data and is the primary data structure used within ehrapy. Technically, it is a Python package for handling annotated data matrices in memory and on disk, positioned between Pandas and xarray. AnnData offers a broad range of computationally efficient features including, among others, sparse data support, lazy operations, and a PyTorch interface. From a user's perspective, it is based on the idea of a primary 2D matrix `X` of, for example, dimensions `n_patient_visits x n_features`. The patient visits would then also be the observations (`obs`) and the features would be the variables (`var`). AnnData allows us to annotate this matrix either with respect to the observations or the variables. Furthermore, AnnData allows for the addition of graph-like structures (`obsp`, `varp`), further structured matrices (`obsm`, `varm`) and unstructured data (`uns`) to be saved within the same object. These can then be readily used for various machine learning algorithms.\n", "\n", "Visualized, it looks like this:\n", "\n", "\n", "![](images/anndata_schema.jpg)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "6550d6b7-f6c6-494d-897d-9860d7d6dccd", "metadata": {}, "source": [ "Let us create an example AnnData object as it would be used in ehrapy." 
] }, { "cell_type": "code", "execution_count": 4, "id": "7d190ad9-d2d8-49f1-a7fb-9551a48b5675", "metadata": { "tags": [] }, "outputs": [], "source": [ "import anndata as ad\n", "import pandas as pd\n", "import numpy as np" ] }, { "attachments": {}, "cell_type": "markdown", "id": "9089a7a1-56e1-44b2-ba1d-f5e794633fab", "metadata": {}, "source": [ "After importing the required packages, we create an example dataset with a **patient_visit_id** column and some feature columns such as **age**, **b12_level** and **d3_level**. We further add a **service_unit** column that we do not want to include as data for our algorithms, but only as annotations." ] }, { "cell_type": "code", "execution_count": 5, "id": "10471aab-6329-412c-96f6-accb8d7a44a8", "metadata": {}, "outputs": [], "source": [ "data = {\n", " \"patient_visit_id\": [0, 1, 2],\n", " \"age\": [59, 24, 64],\n", " \"b12_level\": [560, 201, 450],\n", " \"d3_level\": [25, 19, 50],\n", " \"service_unit\": [\"NY\", \"NY\", \"BO\"],\n", "}\n", "df = pd.DataFrame(data)" ] }, { "cell_type": "code", "execution_count": 6, "id": "c92da069-474e-418b-874c-98149b70d941", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
patient_visit_idageb12_leveld3_levelservice_unit
005956025NY
112420119NY
226445050BO
\n", "
" ], "text/plain": [ " patient_visit_id age b12_level d3_level service_unit\n", "0 0 59 560 25 NY\n", "1 1 24 201 19 NY\n", "2 2 64 450 50 BO" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df" ] }, { "attachments": {}, "cell_type": "markdown", "id": "771f37ba-5543-4821-9083-ade2403fe7ad", "metadata": {}, "source": [ "Next, we import ehrapy and create an AnnData object using this Pandas DataFrame. Usually, EHR data comes in the form of `csv/tsv` tables that can be directly read into ehrapy using `ep.io.read_csv()`. For the sake of this example we transform an existing Pandas DataFrame into an AnnData object using the `df_to_anndata` function. Note that it has a `index_column` parameter to set the index and a `columns_obs_only` parameter which denotes features which should not be a part of the `X` matrix but of `obs` annotations. This will allow us to e.g. color plots by `service_unit`, but not to use these values for algorithms." ] }, { "cell_type": "code", "execution_count": 7, "id": "d1c24efb-a176-44b6-a7f2-3fe344403444", "metadata": {}, "outputs": [], "source": [ "import ehrapy as ep" ] }, { "cell_type": "code", "execution_count": 8, "id": "015f6cd8-7e52-4d7d-b0f4-5c4539f9434f", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/home/zeth/PycharmProjects/ehrapy/ehrapy/anndata/anndata_ext.py:108: DeprecationWarning: Converting `np.inexact` or `np.floating` to a dtype is deprecated. 
The current result is `float64` which is not strictly correct.\n", " X = X.astype(np.number) if all_num else X.astype(object)\n" ] } ], "source": [ "adata = ep.ad.df_to_anndata(\n", " df, index_column=\"patient_visit_id\", columns_obs_only=[\"service_unit\"]\n", ")" ] }, { "cell_type": "code", "execution_count": 9, "id": "4c31460d-45fb-4eb9-9577-b6985cb7c8da", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "AnnData object with n_obs × n_vars = 3 × 3\n", " obs: 'service_unit'\n", " var: 'ehrapy_column_type'\n", " layers: 'original'" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "adata" ] }, { "attachments": {}, "cell_type": "markdown", "id": "22622d76-aa15-45db-aece-3ab0d47d7527", "metadata": {}, "source": [ "When examining our AnnData object, we notice that it has a matrix of size 3 x 3: three patient visits by three features, namely our **age**, **B12** and **D3** measurements." ] }, { "cell_type": "code", "execution_count": 10, "id": "40d2e5e2-40ae-4154-a023-e0c67cf15e36", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
service_unit
patient_visit_id
0NY
1NY
2BO
\n", "
" ], "text/plain": [ " service_unit\n", "patient_visit_id \n", "0 NY\n", "1 NY\n", "2 BO" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "adata.obs" ] }, { "attachments": {}, "cell_type": "markdown", "id": "ef2162b6-a9b0-48d4-9dbe-cf2f83a53cd6", "metadata": {}, "source": [ "Furthermore, our `obs` has the service unit as expected. The AnnData object also has data in the `uns` (unstructured) slot that denotes which columns are numerical columns and which ones are not. This may be required for specific algorithms." ] }, { "cell_type": "code", "execution_count": 11, "id": "992fd973-69fc-4ce2-ae7b-53cd27657765", "metadata": {}, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAABoAAAAUCAYAAACTQC2+AAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjguNCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8fJSN1AAAACXBIWXMAABJ0AAASdAHeZh94AAABq0lEQVR4nL3VMWzNURTH8c9TCYMUHXVgZVEhNoJE0qSDycJoIJGQGAwMp2eyGbCI1SAREUljtkhISAwV0i4laSIRS0NUqp7h3Sf1evv3r6Fn+Q2/c8/3ntxz7+10u10bEZs2hILNTWZmnsEl7MU2HIyINyv8SVzHF7zA1YiYWRcoM/fhPr7hIebxaSDtGbZiDKewC4dr9TprnVFmXsQdXI6IW2ttaEX+axzAcER8HfSbzmik6Lt/QUq8Rwc7a2YTaKjoUktQP2+oZm7Y1DWBdhT93rLWYtHtrUGZ2cERdPGhJWiu6LGa+dfUZeY4TuIoDuFuRFxoQ8nMUTzHKB5jFvciYo7VHY3jSoFM43YbCETEPG7oDcVpXMOeakdlZ8M4jgd6N353RCy36GgCU6Wr85iJiD8Tu+pliIgFPMnMRzir9/xMt2jqRNHJiHg7aDZNXX8IRhpyVkb/os7VzCbQUoucWq2f6wX970dVXdcE+lG0+nZVop+3WDOb/qPZoucy8xXmI+LXYFJmbsF+vYu6gM+1Yk0dPdWbtgl8xHJmjg1AJksHLzGMm7XNwG+ZQXztc3ijGwAAAABJRU5ErkJggg==", "text/latex": [ "$\\displaystyle \\left\\{ \\right\\}$" ], "text/plain": [ "OrderedDict()" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "adata.uns" ] }, { "attachments": {}, "cell_type": "markdown", "id": "76880ffc-e7aa-4515-abc3-c1d4e73f2674", "metadata": {}, "source": [ "Finally, the `layers` slot of our object saves all original values before any modifications in `original`. 
When using ehrapy, the `X` matrix will constantly be modified when applying algorithms to the object (e.g. scaling). This layer is a copy of our original `X` which will allow us to e.g. scale the age, but use the original values when coloring a UMAP plot." ] }, { "cell_type": "code", "execution_count": 12, "id": "01673e27-c9c2-4d0f-88a0-348604d85958", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 59., 560., 25.],\n", " [ 24., 201., 19.],\n", " [ 64., 450., 50.]])" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "adata.layers[\"original\"]" ] }, { "attachments": {}, "cell_type": "markdown", "id": "642a3e49-a670-4e12-b4e8-1522f9a6835c", "metadata": {}, "source": [ "For more details please examine the [AnnData documentation](https://anndata.readthedocs.io/en/latest/) and the [AnnData paper](https://www.biorxiv.org/content/10.1101/2021.12.16.473007v1)." ] }, { "attachments": {}, "cell_type": "markdown", "id": "e143c4fb-c5ec-4b1c-89a7-ded10d9cafe8", "metadata": {}, "source": [ "## scanpy" ] }, { "attachments": {}, "cell_type": "markdown", "id": "759f0145-1d5f-4cc3-9825-a640e25dff15", "metadata": {}, "source": [ "[scanpy](https://github.com/theislab/scanpy/) is a framework for the analysis of single-cell data and ehrapy heavily builds upon it. While some of the implemented algorithms are single-cell specific (e.g. the **highly_variable_genes** function), many can be applied to any data (e.g. PCA or UMAP). ehrapy may also implement equivalents of single-cell specific functions that are EHR specific (e.g. the **highly_variable_features** function). 
All useful scanpy functions are wrapped in ehrapy to ensure that they are easily accessible and implemented in a fast and scalable way.\n", "\n", "![](images/scanpy.jpg)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "8f0c4672-f5e7-4da3-b882-ba07847f3a62", "metadata": {}, "source": [ "Like scanpy, ehrapy follows the API pattern of preprocessing (`pp`), tools (`tl`) and plots (`pl`). Hence, scanpy functions such as `scanpy.tl.umap` can be used from ehrapy in a similar fashion: `ep.tl.umap`.\n", "\n", "The documentation of ehrapy tries to hide as many details from the single-cell world as possible, but you may see the terms cell, gene or expression pop up somewhere. However, the tight integration of AnnData and scanpy into ehrapy also allows for the joint analysis of omics data and EHR data. We will provide a vignette for this in the future.\n", "\n", "To learn more about scanpy, please read the [scanpy documentation](https://scanpy.readthedocs.io/en/stable/) and the [scanpy paper](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-017-1382-0)." ] }, { "attachments": {}, "cell_type": "markdown", "id": "9a8fccc7-16bc-4e11-82fa-6b2a2268e89c", "metadata": {}, "source": [ "## ehrapy" ] }, { "attachments": {}, "cell_type": "markdown", "id": "ff73e643-340e-4dda-a92f-9f13ac856aa3", "metadata": {}, "source": [ "Now that we've covered the basics of AnnData and scanpy and we have an example dataset, we can apply some of ehrapy's tools to it.\n", "We will start by calculating and visualizing a PCA on our data." 
] }, { "cell_type": "code", "execution_count": 14, "id": "b8c3e1ab-718c-482d-a137-90db49898a7f", "metadata": {}, "outputs": [], "source": [ "ep.pp.pca(adata)" ] }, { "cell_type": "code", "execution_count": 15, "id": "791fcafb-eca5-4029-bc95-319f6fd60109", "metadata": {}, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAmQAAAGvCAYAAAD11slWAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjguNCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8fJSN1AAAACXBIWXMAAA9hAAAPYQGoP6dpAAAsZ0lEQVR4nO3de3gU5cH+8Xt2cyQnkpCQBBJADkEQgoJgrZzkFCq84gkEFbDW/qzailVbpa8FbKUoisqvpRWpJ0ALaEWgVRSFUm0RUKOACFEBkYQzOXDIcef9YyVtTAgJ2eTZnf1+rmuvmJndnTthYG6feXbGsm3bFgAAAIxxmQ4AAAAQ7ChkAAAAhlHIAAAADKOQAQAAGEYhAwAAMIxCBgAAYBiFDAAAwDAKGQAAgGEUMgAAAMMoZADOaPLkyWrfvr3pGD61bt06WZaldevWmY4CAFUoZACC3ksvvaQnn3zSdAwAQcziXpYAzqS8vFwej0fh4eGmo/iMx+NRWVmZwsLC5HJ5/5901KhR2rp1q3bv3m02HICgxQgZEEROnDjRoOeHhoY6qoxJksvlUkRERFUZAwB/wL9IgJ8pLi7WlClT1L59e4WHhys5OVnDhg3TRx99VPWcDz74QNnZ2YqLi1OLFi00cOBAvf/++9XeZ/r06bIsS5999pkmTJig+Ph4XXbZZXrsscdkWZb27NlTY9sPPPCAwsLCdOzYMUm1zyHzeDx66qmn1KNHD0VERCgpKUnZ2dnavHlztectWrRIvXv3VmRkpBISEnT99ddr7969DfpdnP4Zvuv555+XZVnVRrTat2+vUaNG6b333lPfvn0VERGh8847Ty+++GK11353DtmgQYP0t7/9TXv27JFlWbIsy3Hz5gD4PwoZ4Gduu+02/fGPf9Q111yjefPm6d5771VkZKS2b98uSXr33Xc1YMAAFRUVadq0aZo5c6YKCgp0+eWXa+PGjTXe77rrrtPJkyc1c+ZM3XrrrRo7dqwsy9LSpUtrPHfp0qUaPny44uPjz5jvlltu0ZQpU5Senq5HHnlE999/vyIiIrRhw4aq5zz88MOaOHGiOnfurDlz5mjKlCl65513NGDAABUUFDT+l3QGX3zxha699loNGzZMjz/+uOLj4zV58mRt27btjK/51a9+pV69eqlVq1ZauHChFi5cyHwyAM3PBuBX4uLi7DvuuKPWdR6Px+7cubM9YsQI2+PxVC0/efKk3aFDB3vYsGFVy6ZNm2ZLssePH1/jfb73ve/ZvXv3rrZs48aNtiT7xRdfrFo2adIku127dlXfv/vuu7Yk+2c/+1mt2Wzbtnfv3m273W774YcfrrZ+y5YtdkhISI3ldTn9M3zXc889Z0uyd+3aVbWsXbt2tiR7/fr1VcsOHjxoh4eH2/fcc0/VsrVr19qS7LVr11Ytu+KKK6r9nADQ3BghA/xMy5Yt9cEHHygvL6/GupycHOXm5mrChAk6cuSIDh8+rMOHD+vEiRMaMmSI1q9fL4/HU+01t912W433GTdunD788EN9+eWXVcuWLFmi8PBwXXnllWfM9uqrr8qyLE2bNq3GutOnFv/617/K4/Fo7NixVfkOHz6slJQUde7cWWvXrq3376KhunXrpv79+1d9n5SUpMzMTH311VdNtk0A8AUKGeBnHn30UW3dulXp6enq27evpk+fXlUoc
nNzJUmTJk1SUlJStceCBQtUWlqqwsLCau/XoUOHGtu47rrr5HK5tGTJEkmSbdtatmyZRo4cqdjY2DNm+/LLL5WWlqaEhIQzPic3N1e2batz5841Mm7fvl0HDx5s8O+kvjIyMmosi4+Pr5oTBwD+KsR0AADVjR07Vv3799drr72mt956S7Nnz9YjjzxSNfIkSbNnz1avXr1qfX10dHS17yMjI2s8Jy0tTf3799fSpUs1depUbdiwQV9//bUeeeSRRuf3eDyyLEtvvPGG3G73WfPVpbYJ/ZJUWVlZ6/Latid5CycA+DMKGeCHUlNTdfvtt+v222/XwYMHddFFF+nhhx/WE088IUmKjY3V0KFDG7WNcePG6fbbb9eOHTu0ZMkStWjRQqNHj67zNR07dtTq1at19OjRM46SdezYUbZtq0OHDurSpUujMp7+cEFBQYFatmxZtby2T4g2xpmKHwA0F05ZAn6ksrKyxinH5ORkpaWlqbS0VL1791bHjh312GOP6fjx4zVef+jQoXpv65prrpHb7dbLL7+sZcuWadSoUYqKijrra2zb1owZM2qsOz0KdfXVV8vtdmvGjBk1RqZs29aRI0fqnbFjx46SpPXr11ctO3HihF544YV6v0d9REVF1fi9A0BzYoQM8CPFxcVq27atrr32WmVlZSk6Olpr1qzRpk2b9Pjjj8vlcmnBggUaOXKkunfvrptvvllt2rTRvn37tHbtWsXGxmrlypX12lZycrIGDx6sOXPmqLi4WOPGjTvrawYPHqybbrpJc+fOVW5urrKzs+XxePTPf/5TgwcP1p133qmOHTvqt7/9rR544AHt3r1bY8aMUUxMjHbt2qXXXntNP/7xj3XvvffWK+Pw4cOVkZGhW265Rffdd5/cbreeffZZJSUl6euvv67Xe9RH7969tWTJEv385z/XxRdfrOjo6LOOFgKAT5n7gCeA7yotLbXvu+8+Oysry46JibGjoqLsrKwse968edWe9/HHH9tXX321nZiYaIeHh9vt2rWzx44da7/zzjtVzzl9yYhDhw6dcXvPPPOMLcmOiYmxT506VWP9dy97Ydu2XVFRYc+ePdvu2rWrHRYWZiclJdkjR460P/zww2rPe/XVV+3LLrvMjoqKsqOiouyuXbvad9xxh71jx44G/U4+/PBDu1+/fnZYWJidkZFhz5kz54yXvbjiiitqvH7gwIH2wIEDq76v7bIXx48ftydMmGC3bNnSlsQlMAA0O+5lCQAAYBhzyAAAAAxjDhmAZnfq1KmzTqJPSEhQWFhYMyUCALMoZACa3ZIlS3TzzTfX+Zy1a9dq0KBBzRMIAAxjDhmAZpefn1/nDb8l7ycf67rJOQA4CYUMAADAMCb1AwAAGGZkDpnH41FeXp5iYmK4ZQkAAAHCtm0VFxcrLS1NLhdjOr5kpJDl5eUpPT3dxKYBAEAj7d27V23btjUdw1GMFLKYmBhJ3j/Q2NhYExGCwvXzN2jrPu7Ph5qGnJ+kp66/yHQMAAGmqKhI6enpVcdx+I6RQnb6NGVsbCyFrIlUVHr0ZUGlXOEtTEeBH9p51MPfPQDnjOlGvscJYIfaeeC4Sis8pmPAT+0rOKWjJ8pMxwAAfItC5lBb9hWYjgA/9+k3BaYjAAC+RSFzqC3MHcNZML8QAPwHhcyhPs8vNh0Bfm77fvYRAPAXFDKHyi8sMR0Bfm4/+wgA+A0KmUMdOl5qOgL83MFiChkA+AsKmQMVnCxTGZ+wxFkcLKK0A4C/oJA50AEOtKiH0gqPCk+Wm44BABCFzJE4FYX6Yl8BAP9AIXMgRshQX+wrAOAfKGQOVHCSK7CjfgpOsa8AgD+gkDlQhcc2HQEBoqKSfQUA/AGFzIEqKWSoJ8o7APgHCpkDMeqB+qr0cHkUAPAHFDIHclmmEyBQWBY7CwD4AwqZA4W4+WNF/YS6KWQA4A84cjtQCENkqCe3i38CAMAf8K+xA0VHhJiOgAARHe42HQEAIAqZIyXHh
JuOgACRHBNhOgIAQBQyR+Igi/pKjqW8A4A/oJA5EAdZ1IfbZalVFPsKAPgDCpkDtYoO59IXOKtW0WFysaMAgF+gkDmQ22UpMZqRD9SNU9sA4D8oZA7VmtOWOAs+/AEA/oNC5lAdk6JNR4Cf65TMPgIA/oJC5lA92sSZjgA/16Mt+wgA+AsKmUNRyHA27CMA4D8oZA7VvU2cuG80ziQ2IkTtEqNMxwAAfItC5lDR4SHq0IoDLmp3AaNjAOBXKGQOxikpnAnzxwDAv1DIHCyrbUvTEeCn2DcAwL9QyBxsYGaS6QjwQ6FuS9/v1Mp0DADAf6GQOVjHpGidxzwyfMfF7RMUFxlqOgYA4L9QyBxuaLfWpiPAzww9n30CAPwNhczhOPjiu4ZR0gHA71DIHK53u3glRIWZjgE/0aV1tNITWpiOAQD4DgqZw7ldlgYxuR/fYsQUAPwThSwIjOnVxnQE+Ikr2RcAwC9RyIJA/86t1D6R01TBrm/7BGWmxJiOAQCoBYUsCFiWpRv6tTMdA4bd+D32AQDwVxSyIHFdn7aKCOWPO1i1ig7XyAtSTMcAAJwBR+gg0bJFmEb1TDMdA4aM75uuUDd/3QHAX/EvdBC56RJOWQUjt8vShH4ZpmMAAOpAIQsiWekt1addvOkYaGYjL0hRalyk6RgAgDpQyILML7K7mo6AZhTisnTP8EzTMQAAZ0EhCzJ9OyTo8q7JpmOgmVzXJ10duME8APg9ClkQ+kV2plyW6RRoahGhLk0Z2tl0DABAPVDIglDXlFiu2B4EJl/aQa1jI0zHAADUA4UsSP18WBeFcRkEx4qLDNVPBnU0HQMAUE8ckYNUekIL3XxZe9Mx0ETuGtJZcZGhpmMAAOqJQhbE7h7aRR2TmPDtNBe3j9fkS9ubjgEAaAAKWRCLCHVr9nVZTPB3kIhQl2ZfmyUXf6gAEFAoZEHuoox4/aj/eaZjwEfuG9FV7bnMBQAEHAoZ9PNhnLp0govbx+tmTlUCQECikIFTlw7AqUoACGwUMkjynrq8bwS3VQpUv7u6B6cqASCAUchQ5SeDOmpMrzTTMdBA/2/AebrqwramYwAAGoFChmpmXdNTWW3jTMdAPQ3OTNIvuWE8AAQ8ChmqiQh1a/7EPkqOCTcdBWfRMSlKc8dfyLwxAHAAChlqaB0boadv6q2wEHYPfxUXGaoFky5WTARX4wcAJ+CIi1pdmBGvOWP55KU/Cg9x6Y83XqQOTOIHAMegkOGMRvVM06PXZsmilPmNMLdLT9/UW5d2bGU6CgDAhyhkqNO1vdvq4TE9KGV+INRt6f9PuFCDMpNNRwEA+FiI6QDwfxP6ZchlSVNf2yKPbTpNcAoLcWnehIs0tFtr01EAAE2AETLUy/V9M/T42Cy5mVTW7CJCXVowsQ9lDAAcjEKGervqwraaf1NvRYczsNpcWkWHa/GP+mlAlyTTUQAATYhChgYZcn5rvXb7pWqX2MJ0FMe7oE2sVtz5ffVul2A6CgCgiVHI0GCdW8fo9Tu+r+93SjQdxbFG9UzVK7ddqrSWkaajAACaAYUM56RlizC9cHNfTb60vekojmJZ0r3Du+j3Ey5SRKjbdBwAQDOhkOGchbhdmv4/3TX72p6KCqM8NFZiVJieuamP7ry8s+koAIBmRiFDo13XJ12r7x7AKcxG+EGPFL119wA+SQkAQYpCBp9oG99Ci27pp9+OuYDRsgZIjArTHyZcpHk39FZiNDd0B4BgRSGDz1iWpRsvacdoWT2dHhW7omeq6SgAAMO4oBR87vRo2asf7dMTb+/UvoJTpiP5lY5JUfpFdleN6J5iOgoAwE9QyNAkLMvStb3banRWqhb+e4/+sPYLHTtZbjqWUalxEbprSGdd1yedOx4AAKqxbNtu9rsTFhUVKS4uToWFhYqNjW3uzcOA4pJyzV//lf783i6dLKs0HadZtWwRqp8M7KhJl7bnUhYAAhrH76ZDIUOzOlRcqvnrv9TSzd+o8JSzR8ySY
sI1oW+GfnhZB8VFhpqOAwCNxvG76VDIYERJeaVWfJKnRRv26NNvCk3H8al+HRJ00/faaUT3FIW6+dwMAOfg+N10mEMGIyJC3RrbJ11j+6Trk70FWrhhj1Z9mqeSco/paOckJjxEV13URjdd0k6dW8eYjgMACDCMkMFvFJWU6x87DmnN9gNat+OQ35/SbBUdpsu7Jmvo+a3Vv3OSIrn+GgCH4/jddBghg9+IjQjV6Kw0jc5KU0WlRxt3H9Wazw7qnc8PaM+Rk6bjSZK6tI7W0PNba8j5rXVheku5+LQkAMAHGCFDQMgrOKUt+wq15ZtCbdlXqK37CnXkRFmTbrN1bLh6tGmpHm3i1KNtrC5oE6fkmIgm3SYA+DOO302HETIEhLSWkUprGVntYqp5Bae0dV+hvjl2SgeLS3WwqMT7tbhEB4pK6zzlaVlSfIswJceEKzk2wvs1JlytYyOUnhBJ+QIANCsKGQLW6ZJ2JuWVHpVVeFThsVXp8Q4Eu12WQt2WwtwuhfAJSACAn6CQwbFC3S4uOwEACAgcrQAAAAyjkAEAABhGIQMAADCMQgYAAGAYhQwAAMAwChkAAIBhFDIAAADDKGQAAACGUcgAAAAMo5ABAAAYRiEDAAAwjEIGAABgGIUMAADAMAoZAACAYRQyAAAAwyhkAAAAhlHIAAAADKOQAQAAGEYhAwAAMIxCBgAAYBiFDAAAwDAKGQAAgGEUMgAAAMMoZAAAAIZRyAAAAAyjkAEAABhGIQMAADCMQgYAAGAYhQwAAMAwChkAAIBhFDIAAADDKGQAAACGUcgAAAAMo5ABAAAYRiEDAAAwjEIGAABgGIUMAADAMAoZAACAYRQyAAAAwyhkAAAAhlHIAAAADKOQAQAAGEYhAwAAMIxCBgAAYBiFDAAAwDAKGQAAgGEUMgAAAMMoZAAAAIZRyAAAAAyjkAEAABhGIQMAADCMQgYAAGAYhQwAAMAwChkAAIBhFDIAAADDKGQAAACGUcgAAAAMo5ABAAAYRiEDAAAwjEIGAABgGIUMAADAMAoZAACAYRQyAAAAwyhkAAAAhlHIAAAADKOQAQAAGEYhAwAAMIxCBgAAYBiFDAAAwDAKGQAAgGEUMgAAAMMoZAAAAIZRyAAAAAyjkAEAABhGIQMAADCMQgYAAGAYhQwAAMAwChkAAIBhFDIAAADDKGQAAACGUcgAAAAMo5ABAICgNXnyZFmWVfVITExUdna2Pv3006rnVFZW6oknnlCPHj0UERGh+Ph4jRw5Uu+//77PclDIAACA36j02Pr3l0f0es4+/fvLI6r02E2+zezsbOXn5ys/P1/vvPOOQkJCNGrUKEmSbdu6/vrr9dBDD+muu+7S9u3btW7dOqWnp2vQoEFavny5TzKE+ORdAAAAGunNrfmasfIz5ReWVC1LjYvQtNHdlH1BapNtNzw8XCkpKZKklJQU3X///erfv78OHTqkd999V6+88opWrFih0aNHV71m/vz5OnLkiH70ox9p2LBhioqKalQGRsgAAIBxb27N108WfVStjEnS/sIS/WTRR3pza36z5Dh+/LgWLVqkTp06KTExUS+99JK6dOlSrYydds899+jIkSN6++23G71dRsgAAIBRlR5bM1Z+ptpOTtqSLEkzVn6mYd1S5HZZPt/+qlWrFB0dLUk6ceKEUlNTtWrVKrlcLu3cuVPnn39+ra87vXznzp2NzsAIGQAAMGrjrqM1Rsb+my0pv7BEG3cdbZLtDx48WDk5OcrJydHGjRs1YsQIjRw5Unv27PFu3276eWyMkAEAAKMOFp+5jJ3L8xoqKipKnTp1qvp+wYIFiouL0zPPPKMuXbpo+/bttb7u9PIuXbo0OkODR8jy8/O1aNEi/f3vf1dZWVm1dSdOnNBDDz3U6FAAACB4JMdE+PR5jWVZllwul06dOqXrr79eubm5WrlyZY3nPf7440pMTNSwYcMavc0GjZBt2rRJw4cPl8fjUXl5udq0aaPly5ere
/fukrwT4WbMmKFf//rXjQ4GAACCQ98OCUqNi9D+wpJa55FZklLiItS3Q0KTbL+0tFT79++XJB07dky///3vdfz4cY0ePVoDBw7UsmXLNGnSJM2ePVtDhgxRUVGR/vCHP2jFihVatmxZoz9hKTVwhGzq1Km66qqrdOzYMR04cEDDhg3TwIED9fHHHzc6CAAACE5ul6Vpo7tJ8pav/3b6+2mjuzXJhH5JevPNN5WamqrU1FT169dPmzZt0rJlyzRo0CBZlqWlS5dq6tSpeuKJJ5SZman+/ftrz549WrduncaMGeOTDJbdgJlqCQkJ2rBhQ7VzpbNmzdKjjz6q1atXKyMjQ2lpaaqsrKzzfYqKihQXF6fCwkLFxsaee3oAANBsmvr4beo6ZP6gwZP6S0qqT6i7//77FRISouHDh+vZZ5/1WTAAABBcsi9I1bBuKdq466gOFpcoOcZ7mrKpRsb8SYMK2QUXXKB//etf6tmzZ7Xl9957rzwej8aPH+/TcAAAILi4XZa+1zHRdIxm16A5ZBMnTtR7771X67pf/OIXmjFjhjIyMnwSDAAAIFg0aA6ZrzCHDACAwMPxu+k0aISspKREK1asUHFxcY11RUVFWrFihUpLS30WDgAAIBg0qJA9/fTTeuqppxQTE1NjXWxsrObOnatnnnnGZ+EAAACCQYMK2eLFizVlypQzrp8yZYpefPHFxmYCAAAIKg0qZLm5ucrKyjrj+p49eyo3N7fRoQAAAIJJgwpZRUWFDh06dMb1hw4dUkVFRaNDAQAABJMGFbLu3btrzZo1Z1z/1ltvVd3XEgAAAPXToEL2wx/+UL/5zW+0atWqGutWrlyphx9+WD/84Q99Fg4AAAQZT6W065/Slle8Xz11346xsSZPnizLsjRr1qxqy5cvXy7LsrRw4UJFRUXpiy++qLY+Ly9P8fHx+v3vf++THA2+DtmNN96ol156SV27dlVmZqYk6fPPP9fOnTs1duxYvfzyy2d9D65jAgBA4Gny4/dnK6Q3fykV5f1nWWyalP2I1O1/fL89eQvZkiVLFBERoa+++krx8fGSvIXsqquukm3buvrqq3Xw4EGtX79eLpd3LOuKK65QaWmp3n77bVlW42/t1KARMklatGiRlixZoi5dumjnzp3asWOHMjMz9fLLL9erjAEAANTw2Qpp6cTqZUySivK9yz9b0WSbHjp0qFJSUvS73/2u1vVPP/20du7cqTlz5kiSnn/+eb3//vt67rnnfFLGpAbey7KyslKPPfaYVqxYobKyMo0aNUrTp09XZGSkT8IAAIAg5Kn0joyptpN2tiRLevN+qesVksvt88273W7NnDlTEyZM0M9+9jO1bdu22vqkpCTNnz9f48ePV1ZWlu6++2499dRTSk9P91mGBo2QzZw5U1OnTlV0dLTatGmjuXPn6o477vBZGAAAEIT2/KvmyFg1tlS0z/u8JnLVVVepV69emjZtWq3rx4wZo7Fjxyo7O1sDBw7UpEmTfLr9BhWyF198UfPmzdPq1au1fPlyrVy5UosXL5bH4/FpKAAAEESOH/Dt887RI488ohdeeEHbt2+vdf2DDz4oj8ej//3f//X5thtUyL7++mv94Ac/qPp+6NChsixLeXl1tVoAAIA6RLf27fPO0YABAzRixAg98MADta4PCQmp9tWXGvSOFRUVioiIqLYsNDRU5eXlPg0FAACCSLtLvZ+mLMpX7fPILO/6dpc2eZRZs2apV69eVVeSaC4NKmS2bWvy5MkKDw+vWlZSUqLbbrtNUVFRVcv++te/+i4hAABwNpfbe2mLpRMlWapeyr79FGP2rCaZ0P9dPXr00A033KC5c+c2+bb+W4NOWU6aNEnJycmKi4uretx4441KS0urtgwAAKBBuv2PNPZFKTa1+vLYNO/yJroOWW0eeuihZp8f3+ALw/oCF4YFACDwNMvx21Pp/TTl8QPeOWPtLm2WkTHTfD8rDQAA4Fy53FKH/qZTNLsGX6kfA
AAAvsUIGQAAgayiVDpxyPu1slzyVHiXu0Ikd6gUEi5FJXm/wm9RyAAA8GcVpdKBbVL+J1LBHql4v/dx/IBUnC+dOla/94mMl6JTpJhvH9Gtpfh2UmqW1PoCCpthFDIAAPxFZYW0/xMpL0fKz/F+Pbhd8vjgep+njnkfh2q5Cr0rVEruKqX2ktJ6SakXeouam5rQXPhNAwBgUkmR9MXb0o43pNy3pZKC5s/gKZf2b/E+Pl7oXRYRJ3UaJmWOlDoP836PJkMhAwCguRV+I21fJe18Q9r9vm9GwHytpFDa+or34Qr1Xn6i7WDTqRyLQgYAQHOwbemLNdKmBVLuW5LdvBcebRRPubTrH9Ln60wncSwKGQAATenkUe9pwM3PSsd2m04DP0UhAwCgKRzdJa1/zHvKr6LEdBr4OQoZAAC+dPyg9I9HpQ+f98+5YfBLFDIAAHyhpEj611zp3/Ok8hOm0yDAUMgAAGgMT6W0cb60frZ08ojpNAhQFDIAAM7VoR3S8p9I+z40nQQBjkIGAEBDeSq9pyfX/k6qLDWdBg5AIQMAoCEYFUMToJABAFBfG/4kvf1rRsXgcxQyAADOpqJUWnW3lLPYdBI4FIUMAIC6FB+QltwofbPRdBI4GIUMAIAzyftY+ssNUtE+00ngcBQyAABqs/VVafkdUsUp00kQBChkAAB814fPSyunSLINB0GwcJkOAACAX/ngacoYmh0jZAAAnLbhT9KbvzSdAkGIETIAACRp058pYzCGQgYAwCd/kf52j+kUCGIUMgBAcPt6g7Tip2LOGEyikAEAglfhN96LvlaWmU6CIEchAwAEp7KT0svjpROHTCcBKGQAgCD1+u3S/k9NpwAkUcgAAMFo/Wxp22umUwBVKGQAgODy9QfS2pmmUwDVUMgAAMGjvMR7qtL2mE4CVEMhAwAEj3d/Ix35wnQKoAYKGQAgOOzdKG2YZzoFUCsKGQDA+cpLpOWcqoT/opABAJxv3UzpSK7pFMAZUcgAAM5W8LW04U+mUwB1opABAJxt7e+kylLTKYA6UcgAAM514DPp07+YTgGcFYUMAOBc7zzERH4EBAoZAMCZvt4g7XzDdAqgXihkAABneuc3phMA9UYhAwA4T/6n0p73TKcA6o1CBgBwnk0LTCcAGoRCBgBwlpJCacsrplMADUIhAwA4S87LUvkJ0ymABqGQAQCcZfOfTScAGoxCBgBwjl3rpcM7TacAGoxCBgBwjq2vmk4AnBMKGQDAGWxb2rnadArgnFDIAADOkPeRVJxvOgVwTihkAABn2MFtkhC4KGQAAGfY8abpBMA5o5ABAAJfwV7pwBbTKYBzRiEDAAS+L98xnQBoFAoZACDw7fvIdAKgUShkAIDAl59jOgHQKBQyAEBgqyiTDm43nQJoFAoZACCwHdwmVZaZTgE0CoUMABDY8j42nQBoNAoZACCw5eWYTgA0GoUMABDYjnxpOgHQaBQyAEBg4/6VcAAKGQAgsB0/YDoB0GgUMgBA4CotlsqOm04BNBqFDAAQuIoZHYMzUMgAAIGL+WNwCAoZACBwMX8MDkEhAwAErrITphMAPkEhAwAELk+F6QSAT1DIAACBy1NpOgHgExQyAEAAs00HAHyCQgYACFwWhzE4A3syACBwuUJMJwB8gkIGAAhc7lDTCQCfoJABAAJXi0TTCQCfoJABAAJXTIrpBIBPUMgAAIErmkIGZ6CQAQACV3Qyn7SEI7AXAwACl8stRSWZTgE0GoUMABDYmEcGBwiMC7hUlkuF30jHD0jF+VLxt1+PH/A+yku89zPzlHuf7wrxPtxh3v9ziknxPqJT/vPfsW2ksBZmfy4AQOPFtpXyPzGdAmgU/ytkFWXSwW1SXo6Un+P9evAzqbLMt9uxXFJiJyntQim1l5TWS0rpKYVH+3Y7AICmlXKBtONvplMAjWK+kHk80jebpJ1vSF+tkw5s8335qo3tkQ7v9D4+XeJdZrmkxM5S+8ukzJFShwFSSHjTZ
wEAnLvUXqYTAI1mtpCtukfat1Y6cchojCq2Rzq8w/vY/GcpLFo6b5CU+QOpS7YUxQUIAcDvpPUynQBoNLOFbMsSKdwyGqFOZcelz1d5H5ZLyrhU6j1J6jZGCgkznQ4AIEmxaVJ0a++cYiBA8SnL+rI90p73pL/eKs05X1ozXSr42nQqAIDEaUsEPArZuTh5WHrvCempLGnxWCl3jelEABDcOG2JAEchawzbI+WulhZfIz0zRNr1T9OJACA4ZVxiOgHQKBQyX9m3WXphlLTwain/U9NpACC4tLtMCosxnQI4ZxQyX/vyHenpAdIrt0hHvzKdBgCCQ0iY1Oly0ymAc0YhaxK2tPUV6Q/9pH/MliorTAcCAOfL/IHpBMA5o5A1pcoyae1vpQVDvBe8BQA0nc7DJcttOgVwTihkzSE/R5o/iNEyAGhKLRKY3I+ARSFrLv89WnZwu+k0AOBMXa8wnQA4JxSy5paf471Exmevm04CAM7Tc5zk5h7ECDwUMhPKT0hLJ0lrfyfZtuk0AOAcUa2kbleaTgE0GIXMGFv6xyxp6USp7ITpMADgHBf/yHQCoMEoZKZtXyH9eQT3xQQAX8noJ6X0MJ0CaBAKmT84sEV65nJp/1bTSQDAGfrcYjoB0CAUMn9x4pD31kt5H5tOAgCBr+dYKTzOdAqg3ihk/uTUMemFK6VvNptOAgCBLSxKuuQ20ymAeqOQ+ZvSQmnR1VJejukkABDYLv2p1KKV6RRAvVDI/FFJobTwKm63BACNER4jDbjXdAqgXihk/urUUWnRNVLxftNJACBw9blFisswnQI4KwqZPyvOl/5yg1RRajoJAASmkDBp8FTTKYCzopD5u32bpZV3mU4BAIGr5zgpubvpFECdKGSB4JOXpffnmk4BAIHJ5ZJGzZEsDnnwX+ydgWLNNCn3bdMpACAwZVwi9fuJ6RTAGVHIAoXtkV65RSrYazoJAASmIQ9KCR1NpwBqRSELJKWF0oqfmk4BAIEpNFIaM49Tl/BL7JWB5qu10ubnTKcAgMDEqUv4KQpZIHrrQU5dAsC5GvKg1CrTdAqgGgpZICor5tQlAJyr0Ejp+pekCG4+Dv9BIQtUnLoEgHPXqpN07XOS5TadBJBEIQtsa6ZLpwpMpwCAwNRpiDTsIdMpAEkUssBWUiC9/6TpFAAQuC69U8qaYDoFQCELeBv+JBXlm04BAIFr9JNS276mUyDIUcgCXcUp6R+PmE4BAIErJFyasIT7XcIoCpkTfLxQOvKl6RQAELhaJEgTX5dadTGdBEGKQuYEngrp3d+YTgEAgS06SZq4QkrsbDoJghCFzCm2LZeOfmU6BQAEtthU6ea/S8ndTCdBkKGQOYYtbfqz6RAAEPiik6XJf5NSs0wnQRChkDlJzmKpvMR0CgAIfC0SpJvflLpdaToJggSFzElOHZO2vmo6BQA4Q1gL6boXpEFTJVmm08DhKGROs2mB6QQA4ByWJQ36pTRuoRQWbToNHIxC5jR5H0n7PjKdAgCc5fzR0i1vSS0zTCeBQ1HInChnsekEAOA8rbtLP/6H1GOs6SRwIAqZE+14w3QCAHCmFgnSNc9I178kRbc2nQYOQiFzoqJ9Ul6O6RQA4Fxdr5Bu3yD1uM50EjgEhcypGCUDgKbVIkG6ZgGjZfAJCplT7aSQAUCz6HqF9NOPpMG/ksJjTadBgKKQOVX+J1JRnukUABAcwqOlgb+QfpYjXXKH5A43nQgBhkLmZJy2BIDmFZUoZc+Ufvqh1OsGyeIwi/phT3GyvRtNJwCA4NQyXRozT7pzs/S9O6WIlqYTwc9RyJwsP8d0AgAIbokdpREPS/d8Ll35ByntQtOJ4KcoZE52eKdUdsJ0CgBAaKR04Y3Sj9dJt77r/e/IBNOpGo6RviYTYjoAmpDtkfZvkTIuMZ0EAHBam97ex+hKae8H0o6/e+f8HvnCdLLaJZwnZf5AyhwptewuTQ/AIhkAKGROl/cxhQwA/JHLLbW71PsY/
lvpcK63nO1a772498nDZnK1SJRSe0kdBniLWFKX/6wrKjKTKQhQyJyOK/YDQGBo1VlqdZf0/bu83xfs9c4Fzsv5z1dfl7QWraTULCmtl7eEpV3o/UACmh2FzOkObDOdAABwLlqmex/nj/7PsrKTUnG+dPyAVLzf+zi+Xzp+UKoolTzlkqfS+1yXW3KFSiHhUlSSFJMqxbSWolOkmG8fYVFmfjbUQCFzuqJ9phMAAHwlrIX3k5uJHU0ngY/xKUunO3VUqigznQIAANSBQhYMjh8wnQAAANSBQhYMivebTgAAAOpAIQsGxylkAAD4MwpZMGCEDAAAv0YhCwanjplOAAAA6kAhCwaV5aYTAACAOlDIgoGnwnQCAABQBwpZMKCQAQDg1yhkwcDijxkAAH/GkToYuENNJwAAAHWgkAUDF7csBQDAn1HIgkFYtOkEAACgDhSyYBCTYjoBAACoA4UsGFDIAADwaxSyYBBNIQMAwJ9RyIIBI2QAAPg1CpnThUVL4UzqBwDAn1HInC66tekEAADgLChkTpeUaToBAAA4CwqZ06X2Mp0AAACcBYXM6dIuNJ0AAACcBYXM6dJ6mU4AAADOgkLmZDFpUnSy6RQAAOAsjNx12rZtSVJRqW1i88EjvbtUVGQ6BQDAIYq+PaacPo7DdyzbwG/1m2++UXp6enNvFgAA+MDevXvVtm1b0zEcxUgh83g8ysvLU0xMjCzLau7NAwCAc2DbtoqLi5WWliaXi1lPvmSkkAEAAOA/qLcAAACGUcgAAAAMo5ABAAAYRiEDAAAwjEIGoE6TJ0+WZVmyLEthYWHq1KmTHnroIVVUVEjyfupq/vz56tevn6Kjo9WyZUv16dNHTz75pE6ePClJ2rZtm6655hq1b99elmXpySefNPgTAYD/oZABOKvs7Gzl5+crNzdX99xzj6ZPn67Zs2dLkm666SZNmTJFV155pdauXaucnBw9+OCDev311/XWW29Jkk6ePKnzzjtPs2bNUkpKiskfBQD8Epe9AFCnyZMnq6CgQMuXL69aNnz4cBUXF+vuu+/WuHHjtHz5cl155ZXVXmfbtoqKihQXF1dtefv27TVlyhRNmTKlGdIDQGBghAxAg0VGRqqsrEyLFy9WZmZmjTImSZZl1ShjAIDaUcgA1Jtt21qzZo1Wr16tyy+/XLm5ucrMzDQdCwACHoUMwFmtWrVK0dHRioiI0MiRIzVu3DhNnz6dGwwDgI+EmA4AwP8NHjxYf/zjHxUWFqa0tDSFhHj/6ejSpYs+//xzw+kAIPAxQgbgrKKiotSpUydlZGRUlTFJmjBhgnbu3KnXX3+9xmts21ZhYWFzxgSAgEUhA3DOxo4dq3Hjxmn8+PGaOXOmNm/erD179mjVqlUaOnSo1q5dK0kqKytTTk6OcnJyVFZWpn379iknJ0dffPGF4Z8AAPwDl70AUKfaLnvx3zwej+bPn69nn31W27ZtU0hIiDp37qyJEyfq1ltvVWRkpHbv3q0OHTrUeO3AgQO1bt26pv0BACAAUMgAAAAM45QlAACAYRQyAAAAwyhkAAAAhlHIAAAADKOQAQAAGEYhAwAAMIxCBgAAYBiFDAAAwDAKGQAAgGEUMgAAAMMoZAAAAIZRyAAAAAz7P+SF2djCGLycAAAAAElFTkSuQmCC", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "ep.pl.pca(adata, color=\"service_unit\")" ] }, { "attachments": {}, "cell_type": "markdown", "id": "6b3158d0-5f09-47cb-8ff2-249386257680", "metadata": {}, "source": [ "This is of course not a useful analysis since we only have three visits." ] }, { "cell_type": "markdown", "id": "8077a630-b72f-43ef-8597-1913f84a2f5f", "metadata": {}, "source": [ "## Making your dataset ready for ehrapy" ] }, { "cell_type": "markdown", "id": "91a05e19-9af4-44b4-a221-4bfaaab66d0e", "metadata": {}, "source": [ "### Data types" ] }, { "cell_type": "markdown", "id": "8d779967-51f5-4be0-81b1-8732896883ba", "metadata": {}, "source": [ "ehrapy requires data to be in two dimensional, vectorized format meaning anything that could be stored in a single Pandas DataFrame is suitable.\n", "It does not matter whether the data originally came from a database or several CSV files." ] }, { "cell_type": "code", "execution_count": 16, "id": "47a325f8-cbc0-4957-9b48-d001e76b090d", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
MixedColumn1MixedColumn2
0Apple15
110Cherry
2Banana5
320Date
\n", "
" ], "text/plain": [ " MixedColumn1 MixedColumn2\n", "0 Apple 15\n", "1 10 Cherry\n", "2 Banana 5\n", "3 20 Date" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# This is NOT okay\n", "data = {\n", " \"MixedColumn1\": [\"Apple\", 10, \"Banana\", 20],\n", " \"MixedColumn2\": [15, \"Cherry\", 5, \"Date\"],\n", "}\n", "df = pd.DataFrame(data)\n", "df" ] }, { "cell_type": "code", "execution_count": 17, "id": "908e6813-cb2f-40a9-94e8-55f3f46fec18", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Column1Column2Column3Column4Column5
0Apple10True2023-08-01dead
1Banana20False2023-08-15alive
2Cherry15True2023-08-10dead
3Date5False2023-08-05dead
\n", "
" ], "text/plain": [ " Column1 Column2 Column3 Column4 Column5\n", "0 Apple 10 True 2023-08-01 dead\n", "1 Banana 20 False 2023-08-15 alive\n", "2 Cherry 15 True 2023-08-10 dead\n", "3 Date 5 False 2023-08-05 dead" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# This is okay\n", "data = {\n", " \"Column1\": [\"Apple\", \"Banana\", \"Cherry\", \"Date\"],\n", " \"Column2\": [10, 20, 15, 5],\n", " \"Column3\": [True, False, True, False],\n", " \"Column4\": [\n", " pd.Timestamp(\"2023-08-01\"),\n", " pd.Timestamp(\"2023-08-15\"),\n", " pd.Timestamp(\"2023-08-10\"),\n", " pd.Timestamp(\"2023-08-05\"),\n", " ],\n", " \"Column5\": pd.Categorical([\"dead\", \"alive\", \"dead\", \"dead\"]),\n", "}\n", "\n", "df = pd.DataFrame(data)\n", "df" ] }, { "cell_type": "code", "execution_count": 18, "id": "9e7fb9c0-e4b0-4412-8dd8-452bdb3f5fd8", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "AnnData object with n_obs × n_vars = 4 × 5\n", " var: 'ehrapy_column_type'\n", " layers: 'original'" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "adata = ep.ad.df_to_anndata(df)\n", "adata" ] }, { "cell_type": "markdown", "id": "735a52cc-b1d8-4bb8-928d-377dd0b438c9", "metadata": {}, "source": [ "### Feature groups" ] }, { "cell_type": "markdown", "id": "9564dc0b-eb95-4dbc-95ed-22e18fbeb98e", "metadata": {}, "source": [ "For many analyses with ehrapy it is useful to group together features that belong to the same data modality.\n", "Examples are high level groups such as demography values, lab or vital sign measurements.\n", "This allows for simpler groupbys or the creation of subsets:" ] }, { "cell_type": "code", "execution_count": 19, "id": "aa1d27c8-e4ba-441f-b0bf-934c6d0fadf0", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
   gender  age  b12  d3
0    male   10  300  25
1  female   20  600  30
2  female   15  800  28
3    male    5  500  21
\n", "
" ], "text/plain": [ " gender age b12 d3\n", "0 male 10 300 25\n", "1 female 20 600 30\n", "2 female 15 800 28\n", "3 male 5 500 21" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data = {\n", " \"gender\": pd.Categorical([\"male\", \"female\", \"female\", \"male\"]),\n", " \"age\": [10, 20, 15, 5],\n", " \"b12\": [300, 600, 800, 500],\n", " \"d3\": [25, 30, 28, 21],\n", "}\n", "df = pd.DataFrame(data)\n", "df" ] }, { "cell_type": "code", "execution_count": 20, "id": "90837e87-a742-40e3-9dc2-b8b4bd380fba", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
        ehrapy_column_type
gender  non_numeric
age     numeric
b12     numeric
d3      numeric
\n", "
" ], "text/plain": [ " ehrapy_column_type\n", "gender non_numeric\n", "age numeric\n", "b12 numeric\n", "d3 numeric" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "adata = ep.ad.df_to_anndata(df)\n", "adata.var" ] }, { "cell_type": "code", "execution_count": 21, "id": "1bb859d1-be18-445a-bbf9-6e287f291d3f", "metadata": {}, "outputs": [], "source": [ "demographics_features = [\"age\", \"gender\"]\n", "lab_measurements_features = [\"b12\", \"d3\"]\n", "\n", "# Assign the measurement groups to features in .var\n", "measurement_group = []\n", "\n", "for feature in adata.var_names:\n", " if feature in demographics_features:\n", " measurement_group.append(\"demographics\")\n", " elif feature in lab_measurements_features:\n", " measurement_group.append(\"lab_measurements\")\n", "\n", "adata.var[\"measurement_group\"] = measurement_group" ] }, { "cell_type": "code", "execution_count": 22, "id": "19f04f29-82af-4ab6-a5be-f1a06a897eab", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "View of AnnData object with n_obs × n_vars = 4 × 2\n", " var: 'ehrapy_column_type', 'measurement_group'\n", " layers: 'original'" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "adata_demographics = adata[:, adata.var[\"measurement_group\"] == \"demographics\"]\n", "adata_demographics" ] }, { "cell_type": "markdown", "id": "62778a52-885b-4dd7-a6e5-9a9faebe4bfb", "metadata": {}, "source": [ "### Units" ] }, { "cell_type": "markdown", "id": "8e87e933-f254-405f-94a3-3891d5018dc9", "metadata": {}, "source": [ "EHR measurements are recorded in specific units that are ideally stored with the measurements:" ] }, { "cell_type": "code", "execution_count": 23, "id": "e41955a2-5954-47d3-8229-413ff306266f", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
   gender [categorical]  age [years]  b12 [pg/mL]  d3 [ng/mL]
0  male                  10           300          25
1  female                20           600          30
2  female                15           800          28
3  male                  5            500          21
\n", "
" ], "text/plain": [ " gender [categorical] age [years] b12 [pg/mL] d3 [ng/mL]\n", "0 male 10 300 25\n", "1 female 20 600 30\n", "2 female 15 800 28\n", "3 male 5 500 21" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data = {\n", " \"gender [categorical]\": pd.Categorical([\"male\", \"female\", \"female\", \"male\"]),\n", " \"age [years]\": [10, 20, 15, 5],\n", " \"b12 [pg/mL]\": [300, 600, 800, 500],\n", " \"d3 [ng/mL]\": [25, 30, 28, 21],\n", "}\n", "df = pd.DataFrame(data)\n", "df" ] }, { "cell_type": "code", "execution_count": 24, "id": "e8a29836-66d5-41b3-9efb-0f80085a21d4", "metadata": {}, "outputs": [], "source": [ "adata = ep.ad.df_to_anndata(df)\n", "\n", "# Extract feature names and units from var_names and store separately\n", "feature_names = [var_name.split(\"[\")[0].strip() for var_name in adata.var_names]\n", "unit_annotations = [\n", " var_name.split(\"[\")[-1][:-1] if \"[\" in var_name else \"\"\n", " for var_name in adata.var_names\n", "]\n", "\n", "# Update .var with feature names and units separately\n", "adata.var_names = feature_names\n", "adata.var[\"units\"] = unit_annotations" ] }, { "cell_type": "code", "execution_count": 25, "id": "5d32932e-3137-4205-962c-45496c1136ec", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
        ehrapy_column_type  units
gender  non_numeric         categorical
age     numeric             years
b12     numeric             pg/mL
d3      numeric             ng/mL
\n", "
" ], "text/plain": [ " ehrapy_column_type units\n", "gender non_numeric categorical\n", "age numeric years\n", "b12 numeric pg/mL\n", "d3 numeric ng/mL" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "adata.var" ] }, { "cell_type": "code", "execution_count": 26, "id": "c817e1e5-23d9-4975-9340-89608b638ea0", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Unit of 'd3': ng/mL\n" ] } ], "source": [ "d3_unit = adata.var[\"units\"][\"d3\"]\n", "print(f\"Unit of 'd3': {d3_unit}\")" ] }, { "cell_type": "markdown", "id": "4cba8354-8d87-4932-af33-b84afc2e0de7", "metadata": {}, "source": [ "## Conclusion" ] }, { "attachments": {}, "cell_type": "markdown", "id": "0a67aa8b-1d4d-4467-b4d8-fe8aed3e74c3", "metadata": {}, "source": [ "To get started check out the [MIMIC-II introduction tutorial](https://ehrapy.readthedocs.io/en/latest/tutorials/mimic_2_introduction.html) where you will learn to apply ehrapy to a real dataset to investigate the effect of intdwelling artherical catheters on patient survival over multiple notebooks." ] }, { "cell_type": "markdown", "id": "357159ae-602f-462b-95e8-b654335b9aa7", "metadata": {}, "source": [ "Please also consider consulting the [ehrapy API](https://ehrapy.readthedocs.io/en/latest/usage/usage.html) documentation." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.7" } }, "nbformat": 4, "nbformat_minor": 5 }