Note

This page was generated from ontology_mapping.ipynb. Some tutorial content may look better in light mode.

Ontology mapping

Ontologies are structured, standardized representations of knowledge in a specific domain that define the domain’s concepts, relationships, and properties. They matter for Electronic Health Records (EHR) because they provide a common vocabulary and framework for organizing and integrating healthcare data. With ontologies, EHR systems gain interoperability and semantic consistency and can exchange data more effectively, which improves decision support, data analysis, and collaboration among healthcare providers and analysts.

ehrapy is compatible with Bionty which provides access to public ontologies and functionality to map values against them.

Here, we’ll create an artificial AnnData object containing several diseases and map them against an ontology to ensure that all of our annotations adhere to it.

[1]:
import anndata as ad
import numpy as np
import pandas as pd

Create an AnnData object with disease annotations in the obs slot.

[2]:
adata = ad.AnnData(
    X=np.random.random((3, 3)),
    var=pd.DataFrame(index=[f"Lab value {val}" for val in range(3)]),
    obs=pd.DataFrame(
        columns=["Immune system disorders", "nervous system disorder", "injury"],
        data=[
            ["Rheumatoid arthritis", "Alzheimer's disease", "Fracture"],
            ["Celiac disease", "Parkinson's disease", "Traumatic brain injury"],
            ["Multipla sclurosis", "Epilepsy", "Fractured Femur"],
        ],
    ),
)
adata
/home/zeth/miniconda3/envs/ehrapy/lib/python3.11/site-packages/anndata/_core/anndata.py:183: ImplicitModificationWarning: Transforming to str index.
  warnings.warn("Transforming to str index.", ImplicitModificationWarning)
[2]:
AnnData object with n_obs × n_vars = 3 × 3
    obs: 'Immune system disorders', 'nervous system disorder', 'injury'
[3]:
adata.obs
[3]:
  Immune system disorders  nervous system disorder                  injury
0    Rheumatoid arthritis      Alzheimer's disease                Fracture
1          Celiac disease      Parkinson's disease  Traumatic brain injury
2      Multipla sclurosis                 Epilepsy         Fractured Femur

Note that one of our immune system disorders, “Multipla sclurosis”, is misspelled, so we expect to have to correct it later.

Introduction to Bionty

First we import Bionty.

[4]:
import bionty_base as bt
✅ wrote new records from public sources.yaml to /home/zeth/.lamin/bionty/versions/sources_local.yaml!

if you see this message repeatedly, run: import bionty_base; bionty_base.reset_sources()

Bionty provides support for several ontologies related to diseases.

[5]:
bt.display_available_sources().loc["Disease"]
[5]:
source organism version url md5 source_name source_website
entity
Disease mondo all 2024-02-06 http://purl.obolibrary.org/obo/mondo/releases/... 78914fa236773c5ea6605f7570df6245 Mondo Disease Ontology https://mondo.monarchinitiative.org
Disease mondo all 2023-08-02 http://purl.obolibrary.org/obo/mondo/releases/... 7f33767422042eec29f08b501fc851db Mondo Disease Ontology https://mondo.monarchinitiative.org
Disease mondo all 2023-04-04 http://purl.obolibrary.org/obo/mondo/releases/... 700c43dd9ba51aecc7a8edfc3bc2dab1 Mondo Disease Ontology https://mondo.monarchinitiative.org
Disease mondo all 2023-02-06 http://purl.obolibrary.org/obo/mondo/releases/... 2b7d479d4bd02a94eab47d1c9e64c5db Mondo Disease Ontology https://mondo.monarchinitiative.org
Disease mondo all 2022-10-11 http://purl.obolibrary.org/obo/mondo/releases/... 04b808d05c2c2e81430b20a0e87552bb Mondo Disease Ontology https://mondo.monarchinitiative.org
Disease doid human 2024-01-31 http://purl.obolibrary.org/obo/doid/releases/2... b36c15a4610757094f8db64b78ae2693 Human Disease Ontology https://disease-ontology.org
Disease doid human 2023-03-31 http://purl.obolibrary.org/obo/doid/releases/2... 64f083a1e47867c307c8eae308afc3bb Human Disease Ontology https://disease-ontology.org
Disease doid human 2023-01-30 http://purl.obolibrary.org/obo/doid/releases/2... 9f0c92ad2896dda82195e9226a06dc36 Human Disease Ontology https://disease-ontology.org
Disease icd human icd-11-2023 s3://bionty-assets/df_human__icd__icd-11-2023_... 16263aef644d2c62c47b7b1ecfbad9d6 International Classification of Diseases (ICD) https://www.cdc.gov/nchs/icd/icd9cm.htm
Disease icd human icd-10-2020 s3://bionty-assets/df_human__icd__icd-10-2020_... 93ec5734fcc2edd64686d5ffc6f6105f International Classification of Diseases (ICD) https://www.cdc.gov/nchs/icd/icd9cm.htm
Disease icd human icd-9-2011 s3://bionty-assets/df_human__icd__icd-9-2011__... cb3aefb3c4f7b2c47bf3de38453350c7 International Classification of Diseases (ICD) https://www.cdc.gov/nchs/icd/icd9cm.htm
Disease icd human icd-10-2024 s3://bionty-assets/df_human__icd__icd-10-2024_... None International Classification of Diseases (ICD) https://www.cdc.gov/nchs/icd/icd9cm.htm

Bionty provides three key functionalities:

  1. inspect: Check whether any of our values (here diseases) are mappable against a specified ontology.

  2. map_synonyms: Map values against synonyms. This is not relevant for our diseases.

  3. curate: Curate ontology values against the ontology to ensure compliance.
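
To make the first of these concrete, here is a minimal sketch of what an inspection step does conceptually: it checks each unique value for an exact match against the reference names. This is not Bionty's implementation; `reference_names` and `inspect_values` are hypothetical names introduced for illustration, and the three reference names stand in for a full ontology.

```python
import pandas as pd

# Hypothetical miniature "ontology": a handful of reference names.
reference_names = {"rheumatoid arthritis", "celiac disease", "multiple sclerosis"}


def inspect_values(values: pd.Series, reference: set[str]) -> pd.DataFrame:
    """Flag each unique value as validated if it matches a reference name exactly."""
    unique_values = pd.unique(values)
    return pd.DataFrame(
        {"__validated__": [value in reference for value in unique_values]},
        index=unique_values,
    )


observed = pd.Series(["Rheumatoid arthritis", "celiac disease", "Multipla sclurosis"])
report = inspect_values(observed, reference_names)
print(report)
```

Because the comparison is exact, a value that differs only in casing fails validation just like a misspelled one. Telling those two cases apart is the job of a separate standardization step.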

Mapping against the MONDO Disease Ontology with Bionty

We will now showcase how to access the Mondo Disease Ontology with Bionty. The Mondo Disease Ontology (Mondo) aims to harmonize disease definitions across the world.

There are several different sources available that provide definitions and data models for diseases, such as HPO, OMIM, SNOMED CT, ICD, PhenoDB, MedDRA, MedGen, ORDO, DO, GARD, and others. However, these sources often overlap and sometimes conflict with each other, making it challenging to understand how they are related.

To address the need for a unified disease terminology that offers precise equivalences between disease concepts, Mondo was developed. Mondo is designed to unify multiple disease resources using a logic-based structure.

Bionty is centered around entity objects that provide the functionality introduced above. We’ll now create a Bionty Disease object with the MONDO ontology as its source, pinned to a specific version for reproducibility.

[6]:
disease_bionty = bt.Disease(source="mondo", version="2023-02-06")
disease_bionty
[6]:
PublicOntology
Entity: Disease
Organism: all
Source: mondo, 2023-02-06
#terms: 25913

We can access the DataFrame that contains all ontology terms:

[7]:
disease_bionty.df()
[7]:
name definition synonyms parents
ontology_id
MONDO:0000001 disease or disorder A Disease Is A Disposition To Undergo Patholog... disorders|medical condition|other disease|dise... []
MONDO:0000002 obsolete 46,XX sex reversal None None []
MONDO:0000003 obsolete 17-hydroxysteroid dehydrogenase defic... None None []
MONDO:0000004 adrenocortical insufficiency An Endocrine Or Hormonal Disorder That Occurs ... adrenal gland insufficiency|adrenal cortical i... [MONDO:0002816]
MONDO:0000005 alopecia, isolated None None [MONDO:0021034]
... ... ... ... ...
MONDO:8000030 obsolete morphological anomaly None None []
MONDO:8000031 obsolete subtype of a disorder None None []
MONDO:8000032 obsolete malformation syndrome None None []
MONDO:8000033 obsolete group of disorders None None []
MONDO:8000034 obsolete disorder None None []

25913 rows × 4 columns

Let’s inspect all of our “Immune system disorders” to learn which terms map against the MONDO Disease ontology.

[8]:
disease_bionty.inspect(
    adata.obs["Immune system disorders"], field=disease_bionty.name, return_df=True
)
3 terms (100.00%) are not validated for name: Rheumatoid arthritis, Celiac disease, Multipla sclurosis
   detected 2 terms with inconsistent casing/synonyms: Rheumatoid arthritis, Celiac disease
→  standardize terms via .standardize()
[8]:
__validated__
Rheumatoid arthritis False
Celiac disease False
Multipla sclurosis False

None of the values validate as-is, but “Rheumatoid arthritis” and “Celiac disease” differ from ontology terms only in casing or by synonym, so they can be standardized.

[9]:
adata.obs["Immune system disorders"] = disease_bionty.standardize(
    adata.obs["Immune system disorders"], field=disease_bionty.name
)
💡 standardized 2/3 terms
[10]:
disease_bionty.inspect(
    adata.obs["Immune system disorders"], field=disease_bionty.name, return_df=True
)
2 terms (66.70%) are validated for name
❗ 1 term (33.30%) is not validated for name: Multipla sclurosis
[10]:
__validated__
rheumatoid arthritis True
celiac disease True
Multipla sclurosis False
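
The standardization step can be pictured as a case-insensitive lookup over canonical names and their synonyms. The sketch below is only an illustration under that assumption, not Bionty's code; `canonical_names`, `synonyms`, `lookup_table`, and `standardize_values` are hypothetical, and the synonym entries are made up for the example.

```python
import pandas as pd

# Hypothetical reference data: canonical names and a few synonyms.
canonical_names = ["rheumatoid arthritis", "celiac disease", "multiple sclerosis"]
synonyms = {"celiac sprue": "celiac disease", "MS": "multiple sclerosis"}

# Build one case-insensitive lookup table covering names and synonyms.
lookup_table = {name.lower(): name for name in canonical_names}
lookup_table.update({alias.lower(): name for alias, name in synonyms.items()})


def standardize_values(values: pd.Series) -> pd.Series:
    """Replace values matching a name or synonym (ignoring case); keep the rest."""
    return values.map(lambda value: lookup_table.get(value.lower(), value))


observed = pd.Series(["Rheumatoid arthritis", "Celiac Sprue", "Multipla sclurosis"])
standardized = standardize_values(observed)
print(standardized.tolist())
```

Values with no match, such as the misspelled term, pass through unchanged. This mirrors the situation above, where one term remains unvalidated after standardization.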

For terms that still cannot be mapped, such as the misspelled “Multipla sclurosis”, we can use Bionty’s lookup functionality, which supports auto-complete, to find the corresponding term in the MONDO Disease ontology. For this purpose we create a lookup object.
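
One way to picture a lookup object is as a namespace whose attribute names are derived from the ontology term names, which is what lets an editor or notebook auto-complete them. The following is a rough sketch under that assumption rather than Bionty's actual mechanism; `to_identifier` and `lookup` are hypothetical names, and the three records are taken from the tables shown on this page.

```python
import re
from types import SimpleNamespace

# A few (name, ontology_id) pairs as they appear in the tables on this page.
records = {
    "multiple sclerosis": "MONDO:0005301",
    "adrenocortical insufficiency": "MONDO:0000004",
    "alopecia, isolated": "MONDO:0000005",
}


def to_identifier(name: str) -> str:
    """Turn a free-text name into a lowercase, attribute-friendly identifier."""
    identifier = re.sub(r"\W+", "_", name.lower()).strip("_")
    # Attribute names cannot start with a digit, so prefix those with "_".
    return f"_{identifier}" if identifier[:1].isdigit() else identifier


# Each attribute holds the original name together with its ontology ID.
lookup = SimpleNamespace(
    **{
        to_identifier(name): {"name": name, "ontology_id": ontology_id}
        for name, ontology_id in records.items()
    }
)

print(lookup.multiple_sclerosis["ontology_id"])
```

In this sketch, typing `lookup.mult` followed by the tab key in an interactive session would offer `multiple_sclerosis` as a completion.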

[11]:
disease_bionty_lookup = disease_bionty.lookup()
[12]:
disease_bionty_lookup.multiple_sclerosis
[12]:
Disease(ontology_id='MONDO:0005301', name='multiple sclerosis', definition='A Progressive Autoimmune Disorder Affecting The Central Nervous System Resulting In Demyelination. Patients Develop Physical And Cognitive Impairments That Correspond With The Affected Nerve Fibers.', synonyms=None, parents=array(['MONDO:0006704', 'MONDO:0000568', 'MONDO:0002562', 'MONDO:0005560'],
      dtype=object), _5='multiple sclerosis')

We found a match! Let’s look at the definition of our result.

[13]:
disease_bionty_lookup.multiple_sclerosis.definition
[13]:
'A Progressive Autoimmune Disorder Affecting The Central Nervous System Resulting In Demyelination. Patients Develop Physical And Cognitive Impairments That Correspond With The Affected Nerve Fibers.'

This is exactly what we’ve been looking for. We can also search directly.

[14]:
disease_bionty.search(
    "Multipla sclurosis", field=disease_bionty.name, case_sensitive=False
)
[14]:
ontology_id definition synonyms parents __agg__ __ratio__
name
multiple sclerosis MONDO:0005301 A Progressive Autoimmune Disorder Affecting Th... None [MONDO:0006704, MONDO:0000568, MONDO:0002562, ... multiple sclerosis 88.888889
multiple sclerosis variant MONDO:0016428 None None [MONDO:0005071] multiple sclerosis variant 72.727273
pediatric multiple sclerosis MONDO:0018784 Pediatric Multiple Sclerosis (Ms) Is A Rare Mu... None [MONDO:0016428] pediatric multiple sclerosis 69.565217
lateral sclerosis MONDO:0018155 Primary Lateral Sclerosis (Pls) Is An Idiopath... primary lateral sclerosis|adult-onset PLS|PLS|... [MONDO:0024257] lateral sclerosis 68.571429
glomerulosclerosis MONDO:0000490 A Hardening Of The Kidney Glomerulus Caused By... glomerular sclerosis [MONDO:0019722] glomerulosclerosis 68.421053
... ... ... ... ... ... ...
BAFopathy MONDO:0700120 Disorder Caused By Mutations In The Various Su... None [MONDO:0003847] bafopathy 14.814815
hydrocele MONDO:0004920 None None [MONDO:0003150] hydrocele 14.814815
XH antigen MONDO:0010760 None XH antigen [MONDO:0003847] xh antigen 14.285714
angiomyxoma MONDO:0006086 A Benign Soft Tissue Neoplasm Characterized By... None [MONDO:0021581, MONDO:0044335] angiomyxoma 13.793103
Pygmy MONDO:0009941 None Pygmy [MONDO:0003847] pygmy 8.695652

25913 rows × 6 columns
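
The `__ratio__` column ranks candidates by string similarity to the query. As a rough illustration of this kind of fuzzy ranking, the sketch below uses `difflib.SequenceMatcher` from the Python standard library. That choice is an assumption made for the example and may differ from the matcher Bionty uses internally, so the scores need not agree with the table above. `similarity` and `rank_candidates` are hypothetical helper names.

```python
from difflib import SequenceMatcher

# A few candidate names taken from the search output above.
candidate_names = [
    "hydrocele",
    "glomerulosclerosis",
    "lateral sclerosis",
    "multiple sclerosis",
]


def similarity(query: str, candidate: str) -> float:
    """Return a case-insensitive similarity score between 0 and 100."""
    return 100 * SequenceMatcher(None, query.lower(), candidate.lower()).ratio()


def rank_candidates(query: str, candidates: list[str]) -> list[tuple[str, float]]:
    """Score every candidate against the query and sort best-first."""
    scored = [(candidate, similarity(query, candidate)) for candidate in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


ranking = rank_candidates("Multipla sclurosis", candidate_names)
best_name, best_score = ranking[0]
print(best_name)
```

A similarity threshold could be added before accepting the top hit automatically, since a high rank alone does not guarantee that the match is clinically correct.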

Now we can finally replace the values of our obs column with the MONDO Disease ontology values.

[15]:
adata.obs["Immune system disorders"] = adata.obs["Immune system disorders"].replace(
    {"Multipla sclurosis": disease_bionty_lookup.multiple_sclerosis.name}
)
adata.obs["Immune system disorders"]
[15]:
0    rheumatoid arthritis
1          celiac disease
2      multiple sclerosis
Name: Immune system disorders, dtype: object
[16]:
disease_bionty.inspect(
    adata.obs["Immune system disorders"], field=disease_bionty.name, return_df=True
)
3 terms (100.00%) are validated for name
[16]:
__validated__
rheumatoid arthritis True
celiac disease True
multiple sclerosis True

Voilà, all of our immune system disorders are mapped against the ontology. We could now repeat this process for all other columns.
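
Repeating the process per column amounts to a loop that pairs each column with a reference vocabulary and collects the values that do not validate. The sketch below shows that pattern with plain pandas. The reference sets in `reference_by_column` are hypothetical stand-ins, not actual ontology contents, and a real workflow would call the ontology object's inspection method instead of the set difference used here.

```python
import pandas as pd

# Hypothetical reference names per column; real ones would come from an ontology.
reference_by_column = {
    "Immune system disorders": {"rheumatoid arthritis", "celiac disease", "multiple sclerosis"},
    "injury": {"fracture", "traumatic brain injury"},
}

obs = pd.DataFrame(
    {
        "Immune system disorders": ["rheumatoid arthritis", "celiac disease", "multiple sclerosis"],
        "injury": ["Fracture", "Traumatic brain injury", "Fractured Femur"],
    }
)

# For every column, list the values that have no exact match in its reference set.
unvalidated = {
    column: sorted(set(obs[column]) - reference)
    for column, reference in reference_by_column.items()
}
print(unvalidated)
```

Columns with an empty list are fully mapped. The remaining values are the ones to standardize, look up, or search for, as shown above.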

Mapping against other Disease ontologies

Bionty supports other disease ontologies besides the MONDO Disease Ontology, such as the Human Disease Ontology and ICD.

We only need to change the source and the version.

[17]:
disease_bionty = bt.Disease(source="icd", version="icd-11-2023")
disease_bionty
[17]:
PublicOntology
Entity: Disease
Organism: human
Source: icd, icd-11-2023
#terms: 35574

The remaining workflow would be the same as above.

Conclusion

ehrapy supports ontology management, inspection, and mapping through Bionty. Bionty provides access to ontologies such as the Mondo Disease Ontology, the Disease Ontology, and many others. To access these ontologies, we create Bionty Disease objects, whose methods map synonyms and inspect data for adherence to an ontology. Mismatches can be remedied by finding the correct ontology name using lookup objects or fuzzy matching.