{ "cells": [ { "cell_type": "markdown", "id": "47bcd090-0737-4c94-91f9-a021efd73f18", "metadata": {}, "source": [ "# Ontology mapping" ] }, { "cell_type": "markdown", "id": "62dc1484-a7eb-4f5b-aedc-b57ce865c920", "metadata": {}, "source": [ "Ontologies are structured and standardized representations of knowledge in a specific domain, defining the concepts, relationships, and properties within that domain. They matter for Electronic Health Records (EHR) as they provide a common vocabulary and framework for organizing and integrating healthcare data. By using ontologies, EHR systems can improve interoperability and semantic understanding and facilitate effective data exchange, leading to enhanced decision support, data analysis, and collaboration among healthcare providers and analysts." ] }, { "cell_type": "markdown", "id": "86af2499-df15-4dac-b80d-7db600eba205", "metadata": {}, "source": [ "ehrapy is compatible with [Bionty](https://github.com/laminlabs/bionty), which provides access to public ontologies and functionality to map values against them.\n", "\n", "Here, we'll create an artificial AnnData object containing different diseases, which we will map against an ontology to ensure that all of our annotations adhere to it." ] }, { "cell_type": "code", "execution_count": 1, "id": "d7579faa-c9a8-48aa-b385-8ab0c9f52ddf", "metadata": { "tags": [] }, "outputs": [], "source": [ "import anndata as ad\n", "import numpy as np\n", "import pandas as pd" ] }, { "cell_type": "markdown", "id": "9223258c-cded-49ed-8380-93e3870b3182", "metadata": {}, "source": [ "Create an AnnData object with disease annotations in the `obs` slot." 
] }, { "cell_type": "code", "execution_count": 2, "id": "79e34af4-bb5d-420c-b284-66a72168d611", "metadata": { "tags": [] }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/home/zeth/miniconda3/envs/ehrapy/lib/python3.11/site-packages/anndata/_core/anndata.py:183: ImplicitModificationWarning: Transforming to str index.\n", " warnings.warn(\"Transforming to str index.\", ImplicitModificationWarning)\n" ] }, { "data": { "text/plain": [ "AnnData object with n_obs × n_vars = 3 × 3\n", " obs: 'Immune system disorders', 'nervous system disorder', 'injury'" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "adata = ad.AnnData(\n", " X=np.random.random((3, 3)),\n", " var=pd.DataFrame(index=[f\"Lab value {val}\" for val in range(3)]),\n", " obs=pd.DataFrame(\n", " columns=[\"Immune system disorders\", \"nervous system disorder\", \"injury\"],\n", " data=[\n", " [\"Rheumatoid arthritis\", \"Alzheimer's disease\", \"Fracture\"],\n", " [\"Celiac disease\", \"Parkinson's disease\", \"Traumatic brain injury\"],\n", " [\"Multipla sclurosis\", \"Epilepsy\", \"Fractured Femur\"],\n", " ],\n", " ),\n", ")\n", "adata" ] }, { "cell_type": "code", "execution_count": 3, "id": "bffd9a56-8127-45cb-bd8b-ab86cdb54d1f", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Immune system disorders nervous system disorder injury\n", "0 Rheumatoid arthritis Alzheimer's disease Fracture\n", "1 Celiac disease Parkinson's disease Traumatic brain injury\n", "2 Multipla sclurosis Epilepsy Fractured Femur" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "adata.obs" ] }, { "cell_type": "markdown", "id": "716fc989-1f56-428c-8c97-6c74e1a3b3a1", "metadata": {}, "source": [ "We notice that one of our immune system disorders, \"Multipla sclurosis\", is misspelled and will need to be corrected later." ] }, { "cell_type": "markdown", "id": "900b48e8-3785-46fe-8659-8ecc0c9fecd5", "metadata": {}, "source": [ "## Introduction to Bionty" ] }, { "cell_type": "markdown", "id": "7c267a1f-5e68-4761-ab6f-c704c7207479", "metadata": {}, "source": [ "First, we import Bionty." ] }, { "cell_type": "code", "execution_count": 4, "id": "f5b2634e-b809-4115-9440-7ef40399e38e", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "✅ wrote new records from public sources.yaml to /home/zeth/.lamin/bionty/versions/sources_local.yaml!\n", "\n", "if you see this message repeatedly, run: import bionty_base; bionty_base.reset_sources()\n" ] } ], "source": [ "import bionty_base as bt" ] }, { "cell_type": "markdown", "id": "abca4350-8799-41fe-a955-0ea96201fc7b", "metadata": {}, "source": [ "Bionty provides support for several ontologies related to diseases." ] }, { "cell_type": "code", "execution_count": 5, "id": "60d2a5b4-3a89-4dc7-9101-8d0ddb1705b7", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " source organism version \\\n", "entity \n", "Disease mondo all 2024-02-06 \n", "Disease mondo all 2023-08-02 \n", "Disease mondo all 2023-04-04 \n", "Disease mondo all 2023-02-06 \n", "Disease mondo all 2022-10-11 \n", "Disease doid human 2024-01-31 \n", "Disease doid human 2023-03-31 \n", "Disease doid human 2023-01-30 \n", "Disease icd human icd-11-2023 \n", "Disease icd human icd-10-2020 \n", "Disease icd human icd-9-2011 \n", "Disease icd human icd-10-2024 \n", "\n", " url \\\n", "entity \n", "Disease http://purl.obolibrary.org/obo/mondo/releases/... \n", "Disease http://purl.obolibrary.org/obo/mondo/releases/... \n", "Disease http://purl.obolibrary.org/obo/mondo/releases/... \n", "Disease http://purl.obolibrary.org/obo/mondo/releases/... \n", "Disease http://purl.obolibrary.org/obo/mondo/releases/... \n", "Disease http://purl.obolibrary.org/obo/doid/releases/2... \n", "Disease http://purl.obolibrary.org/obo/doid/releases/2... \n", "Disease http://purl.obolibrary.org/obo/doid/releases/2... \n", "Disease s3://bionty-assets/df_human__icd__icd-11-2023_... \n", "Disease s3://bionty-assets/df_human__icd__icd-10-2020_... \n", "Disease s3://bionty-assets/df_human__icd__icd-9-2011__... \n", "Disease s3://bionty-assets/df_human__icd__icd-10-2024_... 
\n", "\n", " md5 \\\n", "entity \n", "Disease 78914fa236773c5ea6605f7570df6245 \n", "Disease 7f33767422042eec29f08b501fc851db \n", "Disease 700c43dd9ba51aecc7a8edfc3bc2dab1 \n", "Disease 2b7d479d4bd02a94eab47d1c9e64c5db \n", "Disease 04b808d05c2c2e81430b20a0e87552bb \n", "Disease b36c15a4610757094f8db64b78ae2693 \n", "Disease 64f083a1e47867c307c8eae308afc3bb \n", "Disease 9f0c92ad2896dda82195e9226a06dc36 \n", "Disease 16263aef644d2c62c47b7b1ecfbad9d6 \n", "Disease 93ec5734fcc2edd64686d5ffc6f6105f \n", "Disease cb3aefb3c4f7b2c47bf3de38453350c7 \n", "Disease None \n", "\n", " source_name \\\n", "entity \n", "Disease Mondo Disease Ontology \n", "Disease Mondo Disease Ontology \n", "Disease Mondo Disease Ontology \n", "Disease Mondo Disease Ontology \n", "Disease Mondo Disease Ontology \n", "Disease Human Disease Ontology \n", "Disease Human Disease Ontology \n", "Disease Human Disease Ontology \n", "Disease International Classification of Diseases (ICD) \n", "Disease International Classification of Diseases (ICD) \n", "Disease International Classification of Diseases (ICD) \n", "Disease International Classification of Diseases (ICD) \n", "\n", " source_website \n", "entity \n", "Disease https://mondo.monarchinitiative.org \n", "Disease https://mondo.monarchinitiative.org \n", "Disease https://mondo.monarchinitiative.org \n", "Disease https://mondo.monarchinitiative.org \n", "Disease https://mondo.monarchinitiative.org \n", "Disease https://disease-ontology.org \n", "Disease https://disease-ontology.org \n", "Disease https://disease-ontology.org \n", "Disease https://www.cdc.gov/nchs/icd/icd9cm.htm \n", "Disease https://www.cdc.gov/nchs/icd/icd9cm.htm \n", "Disease https://www.cdc.gov/nchs/icd/icd9cm.htm \n", "Disease https://www.cdc.gov/nchs/icd/icd9cm.htm " ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "bt.display_available_sources().loc[\"Disease\"]" ] }, { "cell_type": "markdown", "id": 
"b345b269-6b21-42ed-86a3-fa3e8d468b77", "metadata": {}, "source": [ "Bionty provides several key functionalities, of which we use the following in this tutorial:\n", "\n", "1. `inspect`: Check whether our values (here diseases) can be validated against a specified ontology.\n", "2. `standardize`: Map values with inconsistent casing or synonyms to the names used by the ontology.\n", "3. `lookup` and `search`: Find the correct ontology term for values that cannot be validated, via auto-complete or fuzzy matching." ] }, { "cell_type": "markdown", "id": "81d67ba7-b8b1-4b6b-ad42-3fe122c60705", "metadata": {}, "source": [ "## Mapping against the MONDO Disease Ontology with Bionty" ] }, { "cell_type": "markdown", "id": "35e32be9-4ade-411d-9e0f-fff64357a338", "metadata": {}, "source": [ "We will now showcase how to access the [Mondo Disease Ontology](https://mondo.monarchinitiative.org/) with Bionty.\n", "The Mondo Disease Ontology (Mondo) aims to harmonize disease definitions across the world.\n", "\n", "There are several different sources available that provide definitions and data models for diseases, such as [HPO](https://hpo.jax.org/app), [OMIM](https://omim.org/), [SNOMED CT](http://www.snomed.org/), [ICD](https://www.cdc.gov/nchs/icd/icd10cm.htm), [PhenoDB](https://phenodb.org/), [MedDRA](https://www.meddra.org/), [MedGen](https://www.ncbi.nlm.nih.gov/medgen/), [ORDO](https://www.orpha.net/consor/cgi-bin/index.php?lng=EN), [DO](http://disease-ontology.org/), [GARD](https://rarediseases.info.nih.gov/), and others. However, these sources often overlap and sometimes conflict with each other, making it challenging to understand how they are related.\n", "\n", "Mondo was developed to address the need for a unified disease terminology that offers precise equivalences between disease concepts. It is designed to unify multiple disease resources using a logic-based structure." ] }, { "cell_type": "markdown", "id": "427cc9fe-ca9c-45ae-8670-a491fbac609c", "metadata": {}, "source": [ "Bionty is centered around Bionty entity objects that provide the functionality introduced above. 
We'll now create a Bionty Disease object with the MONDO ontology as our source and a specific version for reproducibility." ] }, { "cell_type": "code", "execution_count": 6, "id": "2cb0a9c2-0f75-448e-858d-f93f73b6cd9a", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/plain": [ "PublicOntology\n", "Entity: Disease\n", "Organism: all\n", "Source: mondo, 2023-02-06\n", "#terms: 25913\n" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "disease_bionty = bt.Disease(source=\"mondo\", version=\"2023-02-06\")\n", "disease_bionty" ] }, { "cell_type": "markdown", "id": "d6cd8893-bc5d-424d-872e-80242ba26c68", "metadata": {}, "source": [ "We can access the DataFrame that contains all ontology terms:" ] }, { "cell_type": "code", "execution_count": 7, "id": "8bf7ef19-aea8-4fc7-9360-8e544f59d3b7", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " name \\\n", "ontology_id \n", "MONDO:0000001 disease or disorder \n", "MONDO:0000002 obsolete 46,XX sex reversal \n", "MONDO:0000003 obsolete 17-hydroxysteroid dehydrogenase defic... \n", "MONDO:0000004 adrenocortical insufficiency \n", "MONDO:0000005 alopecia, isolated \n", "... ... \n", "MONDO:8000030 obsolete morphological anomaly \n", "MONDO:8000031 obsolete subtype of a disorder \n", "MONDO:8000032 obsolete malformation syndrome \n", "MONDO:8000033 obsolete group of disorders \n", "MONDO:8000034 obsolete disorder \n", "\n", " definition \\\n", "ontology_id \n", "MONDO:0000001 A Disease Is A Disposition To Undergo Patholog... \n", "MONDO:0000002 None \n", "MONDO:0000003 None \n", "MONDO:0000004 An Endocrine Or Hormonal Disorder That Occurs ... \n", "MONDO:0000005 None \n", "... ... \n", "MONDO:8000030 None \n", "MONDO:8000031 None \n", "MONDO:8000032 None \n", "MONDO:8000033 None \n", "MONDO:8000034 None \n", "\n", " synonyms \\\n", "ontology_id \n", "MONDO:0000001 disorders|medical condition|other disease|dise... \n", "MONDO:0000002 None \n", "MONDO:0000003 None \n", "MONDO:0000004 adrenal gland insufficiency|adrenal cortical i... \n", "MONDO:0000005 None \n", "... ... \n", "MONDO:8000030 None \n", "MONDO:8000031 None \n", "MONDO:8000032 None \n", "MONDO:8000033 None \n", "MONDO:8000034 None \n", "\n", " parents \n", "ontology_id \n", "MONDO:0000001 [] \n", "MONDO:0000002 [] \n", "MONDO:0000003 [] \n", "MONDO:0000004 [MONDO:0002816] \n", "MONDO:0000005 [MONDO:0021034] \n", "... ... 
\n", "MONDO:8000030 [] \n", "MONDO:8000031 [] \n", "MONDO:8000032 [] \n", "MONDO:8000033 [] \n", "MONDO:8000034 [] \n", "\n", "[25913 rows x 4 columns]" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "disease_bionty.df()" ] }, { "cell_type": "markdown", "id": "c2dc951a-5e92-4753-a364-0bc5a87e0dd5", "metadata": {}, "source": [ "Let's inspect all of our \"Immune system disorders\" to learn which terms map against the MONDO Disease ontology." ] }, { "cell_type": "code", "execution_count": 8, "id": "60563f60-b573-4182-9bae-9825ada8f943", "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "❗ \u001b[1;93m3 terms\u001b[0m (100.00%) are not validated for \u001b[3mname\u001b[0m: \u001b[1;93mRheumatoid arthritis, Celiac disease, Multipla sclurosis\u001b[0m\n", " detected \u001b[1;93m2 terms with inconsistent casing/synonyms\u001b[0m: \u001b[1;93mRheumatoid arthritis, Celiac disease\u001b[0m\n", "→ standardize terms via \u001b[3m.standardize()\u001b[0m\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ " __validated__\n", "Rheumatoid arthritis False\n", "Celiac disease False\n", "Multipla sclurosis False" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "disease_bionty.inspect(\n", " adata.obs[\"Immune system disorders\"], field=disease_bionty.name, return_df=True\n", ")" ] }, { "cell_type": "markdown", "id": "6d0383e6-09db-43f8-a52d-471046ebacdc", "metadata": {}, "source": [ "None of the values can be validated immediately, but \"Rheumatoid arthritis\" and \"Celiac disease\" were detected as having inconsistent casing or synonyms and can therefore be standardized." ] }, { "cell_type": "code", "execution_count": 9, "id": "d868811c", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "💡 standardized 2/3 terms\n" ] } ], "source": [ "adata.obs[\"Immune system disorders\"] = disease_bionty.standardize(adata.obs[\"Immune system disorders\"], field=disease_bionty.name)" ] }, { "cell_type": "code", "execution_count": 10, "id": "f533ceb7", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "✅ \u001b[1;92m2 terms\u001b[0m (66.70%) are validated for \u001b[3mname\u001b[0m\n", "❗ \u001b[1;93m1 term\u001b[0m (33.30%) is not validated for \u001b[3mname\u001b[0m: \u001b[1;93mMultipla sclurosis\u001b[0m\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ " __validated__\n", "rheumatoid arthritis True\n", "celiac disease True\n", "Multipla sclurosis False" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "disease_bionty.inspect(\n", " adata.obs[\"Immune system disorders\"], field=disease_bionty.name, return_df=True\n", ")" ] }, { "cell_type": "markdown", "id": "de59a7a1-8ccf-4b7c-a6e5-16bd2ca2e126", "metadata": {}, "source": [ "For the term that could not be validated, we can use Bionty's lookup functionality, which supports auto-complete, to try to find the corresponding term in the MONDO Disease Ontology.\n", "For this purpose, we create a lookup object." ] }, { "cell_type": "code", "execution_count": 11, "id": "5a570f24-44b4-43ad-b7ee-f1ba3bf922c9", "metadata": {}, "outputs": [], "source": [ "disease_bionty_lookup = disease_bionty.lookup()" ] }, { "cell_type": "code", "execution_count": 12, "id": "aa16c040-961d-4905-affc-297d66cc1a9c", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/plain": [ "Disease(ontology_id='MONDO:0005301', name='multiple sclerosis', definition='A Progressive Autoimmune Disorder Affecting The Central Nervous System Resulting In Demyelination. Patients Develop Physical And Cognitive Impairments That Correspond With The Affected Nerve Fibers.', synonyms=None, parents=array(['MONDO:0006704', 'MONDO:0000568', 'MONDO:0002562', 'MONDO:0005560'],\n", " dtype=object), _5='multiple sclerosis')" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "disease_bionty_lookup.multiple_sclerosis" ] }, { "cell_type": "markdown", "id": "53000247-6061-47c8-8f7d-794d9b8fc650", "metadata": {}, "source": [ "We found a match! Let's look at the definition of our result." 
] }, { "cell_type": "code", "execution_count": 13, "id": "87ef3eba-be0c-4816-95eb-5175d78250d9", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/plain": [ "'A Progressive Autoimmune Disorder Affecting The Central Nervous System Resulting In Demyelination. Patients Develop Physical And Cognitive Impairments That Correspond With The Affected Nerve Fibers.'" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "disease_bionty_lookup.multiple_sclerosis.definition" ] }, { "cell_type": "markdown", "id": "168311ef-c77c-42cb-9faf-587150c152cd", "metadata": {}, "source": [ "This is exactly what we've been looking for. Alternatively, we can search the ontology directly with the misspelled value; the results are ranked by their similarity to the query." ] }, { "cell_type": "code", "execution_count": 14, "id": "cdb65c6e-92b9-494e-96fd-f566cd68074d", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " ontology_id \\\n", "name \n", "multiple sclerosis MONDO:0005301 \n", "multiple sclerosis variant MONDO:0016428 \n", "pediatric multiple sclerosis MONDO:0018784 \n", "lateral sclerosis MONDO:0018155 \n", "glomerulosclerosis MONDO:0000490 \n", "... ... \n", "BAFopathy MONDO:0700120 \n", "hydrocele MONDO:0004920 \n", "XH antigen MONDO:0010760 \n", "angiomyxoma MONDO:0006086 \n", "Pygmy MONDO:0009941 \n", "\n", " definition \\\n", "name \n", "multiple sclerosis A Progressive Autoimmune Disorder Affecting Th... \n", "multiple sclerosis variant None \n", "pediatric multiple sclerosis Pediatric Multiple Sclerosis (Ms) Is A Rare Mu... \n", "lateral sclerosis Primary Lateral Sclerosis (Pls) Is An Idiopath... \n", "glomerulosclerosis A Hardening Of The Kidney Glomerulus Caused By... \n", "... ... \n", "BAFopathy Disorder Caused By Mutations In The Various Su... \n", "hydrocele None \n", "XH antigen None \n", "angiomyxoma A Benign Soft Tissue Neoplasm Characterized By... \n", "Pygmy None \n", "\n", " synonyms \\\n", "name \n", "multiple sclerosis None \n", "multiple sclerosis variant None \n", "pediatric multiple sclerosis None \n", "lateral sclerosis primary lateral sclerosis|adult-onset PLS|PLS|... \n", "glomerulosclerosis glomerular sclerosis \n", "... ... \n", "BAFopathy None \n", "hydrocele None \n", "XH antigen XH antigen \n", "angiomyxoma None \n", "Pygmy Pygmy \n", "\n", " parents \\\n", "name \n", "multiple sclerosis [MONDO:0006704, MONDO:0000568, MONDO:0002562, ... \n", "multiple sclerosis variant [MONDO:0005071] \n", "pediatric multiple sclerosis [MONDO:0016428] \n", "lateral sclerosis [MONDO:0024257] \n", "glomerulosclerosis [MONDO:0019722] \n", "... ... 
\n", "BAFopathy [MONDO:0003847] \n", "hydrocele [MONDO:0003150] \n", "XH antigen [MONDO:0003847] \n", "angiomyxoma [MONDO:0021581, MONDO:0044335] \n", "Pygmy [MONDO:0003847] \n", "\n", " __agg__ __ratio__ \n", "name \n", "multiple sclerosis multiple sclerosis 88.888889 \n", "multiple sclerosis variant multiple sclerosis variant 72.727273 \n", "pediatric multiple sclerosis pediatric multiple sclerosis 69.565217 \n", "lateral sclerosis lateral sclerosis 68.571429 \n", "glomerulosclerosis glomerulosclerosis 68.421053 \n", "... ... ... \n", "BAFopathy bafopathy 14.814815 \n", "hydrocele hydrocele 14.814815 \n", "XH antigen xh antigen 14.285714 \n", "angiomyxoma angiomyxoma 13.793103 \n", "Pygmy pygmy 8.695652 \n", "\n", "[25913 rows x 6 columns]" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "disease_bionty.search(\n", " \"Multipla sclurosis\", field=disease_bionty.name, case_sensitive=False\n", ")" ] }, { "cell_type": "markdown", "id": "b71eb051-d388-4587-8def-54cbad53d6ed", "metadata": {}, "source": [ "Now we can finally replace the misspelled value in our `obs` column with the corresponding MONDO Disease Ontology name." ] }, { "cell_type": "code", "execution_count": 15, "id": "0a862857-929f-4e0a-97b4-2e84a695a2a8", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/plain": [ "0 rheumatoid arthritis\n", "1 celiac disease\n", "2 multiple sclerosis\n", "Name: Immune system disorders, dtype: object" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Assign the result back instead of calling replace(..., inplace=True) on the column,\n", "# because pandas warns that the chained inplace form stops working in pandas 3.0.\n", "adata.obs[\"Immune system disorders\"] = adata.obs[\"Immune system disorders\"].replace(\n", "    {\"Multipla sclurosis\": disease_bionty_lookup.multiple_sclerosis.name}\n", ")\n", "adata.obs[\"Immune system disorders\"]" ] }, { "cell_type": "code", "execution_count": 16, "id": "c70b961d-d483-44ed-8f7b-575cb18265a3", "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "✅ \u001b[1;92m3 terms\u001b[0m (100.00%) are validated for \u001b[3mname\u001b[0m\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ " __validated__\n", "rheumatoid arthritis True\n", "celiac disease True\n", "multiple sclerosis True" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "disease_bionty.inspect(\n", " adata.obs[\"Immune system disorders\"], field=disease_bionty.name, return_df=True\n", ")" ] }, { "cell_type": "markdown", "id": "c7a08e5e-039e-40ba-bf5d-2997a3a16c52", "metadata": {}, "source": [ "Voilà, all of our immune system disorders are now mapped against the ontology. We could repeat this process for the other columns." ] }, { "cell_type": "markdown", "id": "8974c126-55c0-4d97-8889-39e396fc21ea", "metadata": {}, "source": [ "## Mapping against other Disease ontologies" ] }, { "cell_type": "markdown", "id": "9bb095a1-da4f-4f71-a8e6-d08459b27a8c", "metadata": {}, "source": [ "Besides the MONDO Disease Ontology, Bionty supports other ontologies such as the [Disease Ontology](https://disease-ontology.org/) or [ICD](https://www.who.int/standards/classifications/classification-of-diseases).\n", "\n", "We only need to adapt the source and the version." ] }, { "cell_type": "code", "execution_count": 17, "id": "1227ca5f-80b9-491c-a7e4-dd309c599fd9", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "PublicOntology\n", "Entity: Disease\n", "Organism: human\n", "Source: icd, icd-11-2023\n", "#terms: 35574\n" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "disease_bionty = bt.Disease(source=\"icd\", version=\"icd-11-2023\")\n", "disease_bionty" ] }, { "cell_type": "markdown", "id": "226ad0a8-bd1e-4eb3-baf9-11402123a57c", "metadata": {}, "source": [ "The remaining workflow is the same as above." 
] }, { "cell_type": "markdown", "id": "19bc232b-cd44-4505-a31f-ee65e37d29f6", "metadata": {}, "source": [ "## Conclusion" ] }, { "cell_type": "markdown", "id": "7a15db25-70ea-4d0a-b7a4-49f2d4e35e1f", "metadata": {}, "source": [ "ehrapy provides support for ontology management, inspection and mapping through [Bionty](https://github.com/laminlabs/bionty).\n", "Bionty provides access to ontologies such as the [Mondo Disease Ontology](https://mondo.monarchinitiative.org/), [Disease Ontology](https://disease-ontology.org/) and many others.\n", "To access these ontologies, we create Bionty Disease objects, which offer methods to standardize values and to inspect data for adherence to an ontology.\n", "Mismatches can be remedied by finding the correct ontology name using lookup objects or fuzzy search." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.7" } }, "nbformat": 4, "nbformat_minor": 5 }