Note

This page was generated from ontology_mapping.ipynb. Some tutorial content may look better in light mode.

Ontology mapping#

Ontologies are structured, standardized representations of knowledge in a specific domain that define the concepts, relationships, and properties within that domain. They matter for Electronic Health Records (EHR) because they provide a common vocabulary and framework for organizing and integrating healthcare data. By using ontologies, EHR systems can improve interoperability and semantic understanding and facilitate effective data exchange, which in turn supports decision making, data analysis, and collaboration among healthcare providers and analysts.

ehrapy is compatible with Bionty, which provides access to public ontologies and functionality to map values against them.

Here, we’ll create an artificial AnnData object containing different diseases and map them against an ontology to ensure that all of our annotations adhere to it.

[1]:
import anndata as ad
import numpy as np
import pandas as pd

Create an AnnData object with disease annotations in the obs slot.

[2]:
adata = ad.AnnData(
    X=np.random.random((3, 3)),
    var=pd.DataFrame(index=[f"Lab value {val}" for val in range(3)]),
    obs=pd.DataFrame(
        columns=["Immune system disorders", "nervous system disorder", "injury"],
        data=[
            ["Rheumatoid arthritis", "Alzheimer's disease", "Fracture"],
            ["Celiac disease", "Parkinson's disease", "Traumatic brain injury"],
            ["Multipla sclurosis", "Epilepsy", "Fractured Femur"],
        ],
    ),
)
adata
/home/zeth/miniconda3/envs/ehrapy/lib/python3.10/site-packages/anndata/_core/anndata.py:117: ImplicitModificationWarning: Transforming to str index.
  warnings.warn("Transforming to str index.", ImplicitModificationWarning)
[2]:
AnnData object with n_obs × n_vars = 3 × 3
    obs: 'Immune system disorders', 'nervous system disorder', 'injury'
[3]:
adata.obs
[3]:
  Immune system disorders nervous system disorder                  injury
0    Rheumatoid arthritis     Alzheimer's disease                Fracture
1          Celiac disease     Parkinson's disease  Traumatic brain injury
2      Multipla sclurosis                Epilepsy         Fractured Femur

We notice that one of our immune system disorders, “Multipla sclurosis”, is misspelled and does not exist under that name, so we expect to have to correct it later.

Introduction to Bionty#

First we import Bionty.

[4]:
import bionty as bt
✅ New records found in the public sources.yaml, updated /home/zeth/.lamin/bionty/versions/sources.local.yaml!

Bionty provides support for several ontologies related to diseases.

[5]:
bt.display_available_sources().loc["Disease"]
[5]:
entity   source  species  version     url   md5                               source_name             source_website
Disease  mondo   all      2023-02-06  None  2b7d479d4bd02a94eab47d1c9e64c5db  Mondo Disease Ontology  https://mondo.monarchinitiative.org/
Disease  mondo   all      2022-10-11  None  04b808d05c2c2e81430b20a0e87552bb  Mondo Disease Ontology  https://mondo.monarchinitiative.org/
Disease  doid    human    2023-01-30  None  9f0c92ad2896dda82195e9226a06dc36  Human Disease Ontology  https://disease-ontology.org/

Bionty provides three key functionalities:

  1. inspect: Check whether any of our values (here diseases) are mappable against a specified ontology.

  2. map_synonyms: Map values against synonyms. This is not relevant for our diseases.

  3. curate: Curate ontology values against the ontology to ensure compliance.
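
The first two ideas can be pictured with a small, self-contained toy example. This is only a sketch of the concepts and not Bionty's implementation: the helper names `toy_inspect` and `toy_map_synonyms` and the two-entry vocabulary are hypothetical, and the synonyms are a subset of those listed for celiac disease further down in this tutorial.

```python
# Toy vocabulary: canonical name -> known synonyms (illustrative data, not the real ontology).
TOY_ONTOLOGY = {
    "celiac disease": ["coeliac disease", "celiac sprue", "gluten intolerance"],
    "multiple sclerosis": [],
}


def toy_inspect(values, ontology):
    """Return {value: bool} telling whether each value is a canonical name."""
    return {value: value in ontology for value in values}


def toy_map_synonyms(values, ontology):
    """Replace known synonyms by their canonical name; leave other values unchanged."""
    synonym_to_name = {
        synonym: name for name, synonyms in ontology.items() for synonym in synonyms
    }
    return [synonym_to_name.get(value, value) for value in values]


values = ["coeliac disease", "multiple sclerosis", "Multipla sclurosis"]
mapped = toy_inspect(values, TOY_ONTOLOGY)
share = 100 * sum(mapped.values()) / len(mapped)
print(f"{sum(mapped.values())} of {len(mapped)} terms ({share:.1f}%) are mapped.")
print(toy_map_synonyms(values, TOY_ONTOLOGY))
```

In this toy setting the synonym is resolved to its canonical name, while the misspelled value is left untouched because it is neither a name nor a synonym. Values like that need a lookup or fuzzy matching.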

Mapping against the MONDO Disease Ontology with Bionty#

We will now showcase how to access the Mondo Disease Ontology with Bionty. The Mondo Disease Ontology (Mondo) aims to harmonize disease definitions across the world.

There are several different sources available that provide definitions and data models for diseases, such as HPO, OMIM, SNOMED CT, ICD, PhenoDB, MedDRA, MedGen, ORDO, DO, GARD, and others. However, these sources often overlap and sometimes conflict with each other, making it challenging to understand how they are related.

To address the need for a unified disease terminology that offers precise equivalences between disease concepts, Mondo was developed. Mondo is designed to unify multiple disease resources using a logic-based structure.

Bionty is centered around entity objects that provide the functionality introduced above. We’ll now create a Bionty Disease object with the MONDO ontology as our source and a specific version for reproducibility.

[6]:
disease_bionty = bt.Disease(source="mondo", version="2023-02-06")
disease_bionty
[6]:
Disease
Species: all
Source: mondo, 2023-02-06

📖 Disease.df(): ontology reference table
🔎 Disease.lookup(): autocompletion of ontology terms
🔗 Disease.ontology: Pronto.Ontology object

We can access the DataFrame that contains all ontology terms:

[7]:
disease_bionty.df()
[7]:
ontology_id                        name                               definition  synonyms  children
http://identifiers.org/hgnc/10001  RGS5                               None        None      []
http://identifiers.org/hgnc/10004  RGS9                               None        None      []
http://identifiers.org/hgnc/10006  RHAG                               None        None      []
http://identifiers.org/hgnc/10012  RHO                                None        None      []
http://identifiers.org/hgnc/10013  GRK1                               None        None      []
...                                ...                                ...         ...       ...
UBERON:8410056                     capillary of anorectum             None        None      []
UBERON:8410057                     capillary of colon                 None        None      []
UBERON:8420000                     hair of scalp                      None        None      []
UBERON:8440004                     laminar subdivision of the cortex  None        None      [UBERON:0002301]
UPHENO:0001001                     phenotype                          None        None      []

41623 rows × 4 columns

Let’s inspect all of our “Immune system disorders” to learn which terms map against the MONDO Disease ontology.

[8]:
disease_bionty.inspect(adata.obs["Immune system disorders"], field=disease_bionty.name, return_df=True)
✅ 1 terms (33.3%) are mapped.
🔶 2 terms (66.7%) are not mapped.
[8]:
                         __mapped__
Immune system disorders
Rheumatoid arthritis           True
Celiac disease                False
Multipla sclurosis            False

Apparently “Rheumatoid arthritis” could be mapped to the MONDO Disease ontology, but “Celiac disease” and “Multipla sclurosis” could not.

For the terms that could not be mapped, we can use Bionty’s lookup functionality, which supports auto-completion, to try to find the corresponding term in the MONDO Disease ontology. For this purpose we create a lookup object.
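
The idea behind such a lookup object can be pictured as follows. This is a minimal sketch under assumptions, not Bionty's implementation: `to_attribute_name` and `build_lookup` are hypothetical helpers, and the two records only reuse identifiers and names that appear in the outputs of this tutorial. Each term name is turned into an identifier-like key so that it can be reached by attribute access, which is what makes auto-completion possible in an interactive session.

```python
import re
from types import SimpleNamespace


def to_attribute_name(term_name):
    """Turn a term name such as "celiac disease" into an identifier-like key."""
    key = re.sub(r"\W+", "_", term_name.strip().lower()).strip("_")
    # Identifiers may not start with a digit, so prefix those keys.
    return f"_{key}" if key[:1].isdigit() else key


def build_lookup(terms):
    """Build an object exposing every term under its identifier-like key."""
    return SimpleNamespace(**{to_attribute_name(term["name"]): term for term in terms})


# Two records with identifiers and names as shown in this tutorial's outputs.
terms = [
    {"ontology_id": "MONDO:0005130", "name": "celiac disease"},
    {"ontology_id": "MONDO:0005301", "name": "multiple sclerosis"},
]
lookup = build_lookup(terms)
print(lookup.celiac_disease["ontology_id"])
```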

[9]:
disease_bionty_lookup = disease_bionty.lookup()
[10]:
disease_bionty_lookup.celiac_disease
[10]:
disease(ontology_id='MONDO:0005130', name='celiac disease', definition='An Autoimmune Genetic Disorder With An Unknown Pattern Of Inheritance That Primarily Affects The Digestive Tract. It Is Caused By Intolerance To Dietary Gluten. Consumption Of Gluten Protein Triggers An Immune Response Which Damages Small Intestinal Villi And Prevents Adequate Absorption Of Nutrients. Clinical Signs Include Abdominal Cramping, Diarrhea Or Constipation And Weight Loss. If Untreated, The Clinical Course May Progress To Malnutrition, Anemia, Osteoporosis And An Increased Risk Of Intestinal Malignancies. However, The Prognosis Is Favorable With Successful Avoidance Of Gluten In The Diet.', synonyms='gluten-induced enteropathy|celiac sprue|idiopathic steatorrhea|gluten intolerance|coeliac disease|non tropical sprue', children=array(['MONDO:0800124'], dtype=object))

We found a match! Let’s look at the definition of our result.

[11]:
disease_bionty_lookup.celiac_disease.definition
[11]:
'An Autoimmune Genetic Disorder With An Unknown Pattern Of Inheritance That Primarily Affects The Digestive Tract. It Is Caused By Intolerance To Dietary Gluten. Consumption Of Gluten Protein Triggers An Immune Response Which Damages Small Intestinal Villi And Prevents Adequate Absorption Of Nutrients. Clinical Signs Include Abdominal Cramping, Diarrhea Or Constipation And Weight Loss. If Untreated, The Clinical Course May Progress To Malnutrition, Anemia, Osteoporosis And An Increased Risk Of Intestinal Malignancies. However, The Prognosis Is Favorable With Successful Avoidance Of Gluten In The Diet.'

This is exactly what we’ve been looking for. To find a match for the misspelled “Multipla sclurosis” we use Bionty’s fuzzy matching.

[12]:
disease_bionty.fuzzy_match("Multipla sclurosis", field=disease_bionty.name, case_sensitive=False)
[12]:
name                ontology_id    definition                                          synonyms  children                        __ratio__
multiple sclerosis  MONDO:0005301  A Progressive Autoimmune Disorder Affecting Th...  None      [MONDO:0005314, MONDO:0005284]  88.888889
[13]:
disease_bionty_lookup.multiple_sclerosis
[13]:
disease(ontology_id='MONDO:0005301', name='multiple sclerosis', definition='A Progressive Autoimmune Disorder Affecting The Central Nervous System Resulting In Demyelination. Patients Develop Physical And Cognitive Impairments That Correspond With The Affected Nerve Fibers.', synonyms=None, children=array(['MONDO:0005314', 'MONDO:0005284'], dtype=object))
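
To get a feeling for what a fuzzy match score expresses, the following sketch compares strings with Python's standard-library `difflib`. It is an illustration only: Bionty's own matching may rely on a different library and scoring, and the helpers `similarity` and `best_match` as well as the three candidate names are assumptions made for this example.

```python
from difflib import SequenceMatcher


def similarity(query, candidate, case_sensitive=False):
    """Similarity in percent between two strings (100 means identical)."""
    if not case_sensitive:
        query, candidate = query.lower(), candidate.lower()
    return 100 * SequenceMatcher(None, query, candidate).ratio()


def best_match(query, names):
    """Return the candidate name with the highest similarity and its score."""
    scored = ((name, similarity(query, name)) for name in names)
    return max(scored, key=lambda pair: pair[1])


# A small candidate list; a real ontology contains many thousands of names.
names = ["celiac disease", "multiple sclerosis", "rheumatoid arthritis"]
name, score = best_match("Multipla sclurosis", names)
print(name, round(score, 1))
```

Two wrong characters out of eighteen still leave most of the string intact, so the misspelled value scores close to 89 against “multiple sclerosis” and far lower against the other candidates.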

Now we can finally replace the values of our obs column with the MONDO Disease ontology values.

[14]:
# Keep the first value as it is and replace the other two with the ontology names.
adata.obs["Immune system disorders"] = [
    adata.obs["Immune system disorders"].iloc[0],
    disease_bionty_lookup.celiac_disease.name,
    disease_bionty_lookup.multiple_sclerosis.name,
]
adata.obs["Immune system disorders"]
[14]:
0    Rheumatoid arthritis
1          celiac disease
2      multiple sclerosis
Name: Immune system disorders, dtype: object
[15]:
disease_bionty.inspect(adata.obs["Immune system disorders"], field=disease_bionty.name, return_df=True)
✅ 3 terms (100.0%) are mapped.
🔶 0 terms (0.0%) are not mapped.
[15]:
                         __mapped__
Immune system disorders
Rheumatoid arthritis           True
celiac disease                 True
multiple sclerosis             True

Voilà, all of our immune system disorders are mapped against the ontology. We could now repeat this process for all other columns.
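
Repeating the process column by column could be organized roughly like this. The sketch uses plain Python containers instead of AnnData and Bionty so that it runs on its own: `correct_columns`, the set of accepted names and the corrections dictionary are all hypothetical, and in practice the accepted names would come from the ontology and the corrections from lookups or fuzzy matches as shown above.

```python
def correct_columns(table, known_names, corrections):
    """Apply corrections column by column and report values that remain unmapped."""
    corrected, unmapped = {}, {}
    for column, values in table.items():
        corrected[column] = [corrections.get(value, value) for value in values]
        unmapped[column] = [v for v in corrected[column] if v not in known_names]
    return corrected, unmapped


table = {
    "Immune system disorders": ["Rheumatoid arthritis", "Celiac disease", "Multipla sclurosis"],
    "nervous system disorder": ["Alzheimer's disease", "Parkinson's disease", "Epilepsy"],
}
# Toy set of accepted names; it deliberately covers only the first column.
known_names = {"Rheumatoid arthritis", "celiac disease", "multiple sclerosis"}
# Corrections collected by hand, for example via lookups or fuzzy matches.
corrections = {
    "Celiac disease": "celiac disease",
    "Multipla sclurosis": "multiple sclerosis",
}

corrected, unmapped = correct_columns(table, known_names, corrections)
for column, values in unmapped.items():
    print(f"{column}: {len(values)} unmapped value(s) {values}")
```

Columns that still report unmapped values are the ones that need another round of lookups or fuzzy matching.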

Mapping against the Disease Ontology with Bionty#

Besides the MONDO Disease Ontology, Bionty supports other ontologies such as the Human Disease Ontology. The workflow here is very similar.

We only need to change the source and the version.

[16]:
disease_bionty = bt.Disease(source="doid", version="2023-01-30")

The remaining workflow would be the same as above.

Conclusion#

ehrapy provides support for ontology management, inspection and mapping through Bionty. Bionty provides access to ontologies such as the Mondo Disease Ontology, the Disease Ontology and many others. To access these ontologies we create Bionty Disease objects, whose methods map synonyms and inspect data for adherence to an ontology. Mismatches can be remedied by finding the correct ontology name using lookup objects or fuzzy matching.