ehrapy.tools.MedCAT

class ehrapy.tools.MedCAT(anndata, vocabulary=None, concept_db=None, model_pack_path=None)

Wrapper class for MedCAT. An instance holds references to the current AnnData object containing the data and to the current MedCAT model (with its vocabulary and concept database). Pass it to every function of the ehrapy NLP API that requires it.
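A minimal sketch of the two ways to construct the wrapper that the signature above allows: from an existing vocabulary and concept database, or from a saved model pack via `model_pack_path`. It assumes ehrapy is installed with MedCAT support, that the class is importable as `ehrapy.tools.MedCAT`, and that `adata` is an AnnData object. The helper names `medcat_from_parts` and `medcat_from_pack` are hypothetical. The import is deferred into each helper so the snippet can be defined without MedCAT installed.

```python
from __future__ import annotations

from typing import Any


def medcat_from_parts(adata: Any, vocab: Any, cdb: Any) -> Any:
    """Hypothetical helper: wrap `adata` with an existing Vocab and CDB."""
    # Deferred import: ehrapy/MedCAT are only needed when the helper is called.
    from ehrapy.tools import MedCAT

    return MedCAT(adata, vocabulary=vocab, concept_db=cdb)


def medcat_from_pack(adata: Any, model_pack_path: str) -> Any:
    """Hypothetical helper: wrap `adata` with a previously saved MedCAT model pack."""
    # Deferred import: ehrapy/MedCAT are only needed when the helper is called.
    from ehrapy.tools import MedCAT

    return MedCAT(adata, model_pack_path=model_pack_path)
```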

Methods table

create_concept_db(csv_path[, config])

Creates a MedCAT concept database and returns it.

create_vocabulary(vocabulary_data[, replace])

Creates a MedCAT Vocab and returns it.

load_concept_db(concept_db_path)

Loads the concept database.

load_vocabulary(vocabulary_path)

Loads a vocabulary.

save_concept_db(cdb, output_path)

Saves a concept database.

save_model_pack([model_pack_dir, name])

Saves a MedCAT model pack.

save_vocabulary(vocab, output_path)

Saves a vocabulary.

set_filter_by_tui(tuis)

Restricts the results of the annotation step to certain TUIs (type unique identifiers).

update_cat([vocabulary, concept_db])

Updates the current MedCAT instance with a new vocabulary and/or concept database.

update_cat_config(concept_db_config)

Updates the MedCAT configuration.

Methods

create_concept_db

static MedCAT.create_concept_db(csv_path, config=None)

Creates a MedCAT concept database and returns it.

Parameters:
  • csv_path (list[str]) –

    List of paths to one or more csv files containing all concepts. The concept csvs must look like:

        cui,name
        1,kidney failure
        7,coronavirus

  • config (Config) – Optional MedCAT concept database configuration. If not provided, a default configuration with config.general["spacy_model"] = "en_core_sci_md" is created.

Return type:

CDB

Returns:

Instance of a MedCAT CDB concept database
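Because `create_concept_db` expects a list of CSV paths in the `cui,name` layout shown above, a small standard-library helper can generate such a file. This is a sketch: `write_concept_csv` is a hypothetical name, and the CUIs are the toy values from the example above.

```python
from __future__ import annotations

import csv
from pathlib import Path


def write_concept_csv(path: str | Path, concepts: dict[str, str]) -> Path:
    """Write concepts to a CSV file with a ``cui,name`` header row."""
    path = Path(path)
    with path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["cui", "name"])  # header row expected by the concept CSV format
        for cui, name in concepts.items():
            # csv.writer quotes names that contain commas automatically.
            writer.writerow([cui, name])
    return path
```

The resulting path would then be passed as a list, e.g. `MedCAT.create_concept_db([str(path)])`, since `csv_path` is typed `list[str]`.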

create_vocabulary

static MedCAT.create_vocabulary(vocabulary_data, replace=True)

Creates a MedCAT Vocab and returns it.

Parameters:
  • vocabulary_data (str) –

    Path to the vocabulary data. It is a tsv file and each line must look like:

        <token>\t<word_count>\t<vector_embedding_separated_by_spaces>
        house\t34444\t0.3232 0.123213 1.231231

  • replace (bool) – Whether to replace existing words in the vocabulary.

Return type:

Vocab

Returns:

Instance of a MedCAT Vocab
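A standard-library sketch of producing a file in the tab-separated vocabulary format described above: one line per token with its word count and a space-separated embedding vector. The helper name `write_vocab_tsv` and the single-entry example are illustrative assumptions; real embeddings would come from a trained word-vector model.

```python
from __future__ import annotations

from pathlib import Path
from typing import Iterable, Sequence


def write_vocab_tsv(
    path: str | Path, entries: Iterable[tuple[str, int, Sequence[float]]]
) -> Path:
    """Write one line per token: token, word count and vector, separated by tabs."""
    path = Path(path)
    with path.open("w", encoding="utf-8") as handle:
        for token, count, vector in entries:
            # Vector components are separated by single spaces inside the third column.
            vector_str = " ".join(str(value) for value in vector)
            handle.write(f"{token}\t{count}\t{vector_str}\n")
    return path
```

For example, `write_vocab_tsv("vocab.tsv", [("house", 34444, [0.3232, 0.123213, 1.231231])])` writes the sample line from above, and the path can then be given to `MedCAT.create_vocabulary`.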

load_concept_db

static MedCAT.load_concept_db(concept_db_path)

Loads the concept database.

Parameters:

concept_db_path – Path to load the concept database from.

Return type:

CDB

load_vocabulary

static MedCAT.load_vocabulary(vocabulary_path)

Loads a vocabulary.

Parameters:

vocabulary_path – Path to load the vocabulary from.

Return type:

Vocab

save_concept_db

static MedCAT.save_concept_db(cdb, output_path)

Saves a concept database.

Parameters:
  • cdb – The concept database object.

  • output_path (str) – Path to save the concept database to.

Return type:

None

save_model_pack

MedCAT.save_model_pack(model_pack_dir='.', name='ehrapy_medcat_model_pack')

Saves a MedCAT model pack.

Parameters:
  • model_pack_dir (str) – Path to save the model to (defaults to current working directory).

  • name (str) – Name of the new model pack

Return type:

None

save_vocabulary

static MedCAT.save_vocabulary(vocab, output_path)

Saves a vocabulary.

Parameters:
  • vocab (Vocab) – The vocabulary object

  • output_path (str) – Path to write the vocabulary to.

Return type:

None
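The save and load methods are static, so they can be combined into a round trip without constructing a `MedCAT` instance. This minimal sketch assumes ehrapy with MedCAT support is installed; the helper name `persist_and_reload` and the file names `cdb.dat` and `vocab.dat` are arbitrary, hypothetical choices.

```python
from __future__ import annotations

from pathlib import Path
from typing import Any


def persist_and_reload(cdb: Any, vocab: Any, out_dir: str) -> tuple[Any, Any]:
    """Hypothetical helper: save a CDB and a Vocab, then load them back from disk."""
    # Deferred import: ehrapy/MedCAT are only needed when the helper is called.
    from ehrapy.tools import MedCAT

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    cdb_path = str(out / "cdb.dat")  # file names are arbitrary choices
    vocab_path = str(out / "vocab.dat")

    MedCAT.save_concept_db(cdb, cdb_path)
    MedCAT.save_vocabulary(vocab, vocab_path)

    # load_concept_db returns a CDB and load_vocabulary returns a Vocab.
    return MedCAT.load_concept_db(cdb_path), MedCAT.load_vocabulary(vocab_path)
```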

set_filter_by_tui

MedCAT.set_filter_by_tui(tuis)

Restricts the results of the annotation step to certain TUIs (type unique identifiers).

Note that this modifies the MedCAT object by updating its concept database config: every annotation run performed afterwards only reports entities whose type matches one of the given TUIs. A full list of TUIs can be found at: https://lhncbc.nlm.nih.gov/ii/tools/MetaMap/Docs/SemanticTypes_2018AB.txt

For example, setting tuis=["T047", "T048"] only annotates UMLS concepts (each identified by a CUI, a concept unique identifier) that are either diseases or syndromes (T047) or mental or behavioral dysfunctions (T048).

Parameters:

tuis (list[str]) – List of TUIs (default is

Return type:

None
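To make the filtering semantics concrete, here is a pure-Python illustration, not MedCAT's internal implementation: the allowed CUIs are the union of the CUIs belonging to each requested TUI, and an annotation is kept only if its CUI is in that set. The `TYPE_TO_CUIS` mapping, the CUI strings and the annotation dictionaries are all hypothetical toy data. On a real wrapper the equivalent call is simply `medcat.set_filter_by_tui(["T047", "T048"])`.

```python
from __future__ import annotations

# Hypothetical toy data: CUIs grouped by their UMLS semantic type (TUI).
TYPE_TO_CUIS: dict[str, set[str]] = {
    "T047": {"C001"},  # Disease or Syndrome
    "T048": {"C002"},  # Mental or Behavioral Dysfunction
    "T121": {"C003"},  # Pharmacologic Substance
}


def allowed_cuis(type_to_cuis: dict[str, set[str]], tuis: list[str]) -> set[str]:
    """Union of the CUIs that belong to any of the requested TUIs."""
    allowed: set[str] = set()
    for tui in tuis:
        allowed |= type_to_cuis.get(tui, set())  # unknown TUIs contribute nothing
    return allowed


def filter_annotations(
    annotations: list[dict], tuis: list[str], type_to_cuis: dict[str, set[str]]
) -> list[dict]:
    """Keep only annotations whose CUI falls under one of the requested TUIs."""
    keep = allowed_cuis(type_to_cuis, tuis)
    return [ann for ann in annotations if ann["cui"] in keep]


annotations = [
    {"text": "kidney failure", "cui": "C001"},
    {"text": "depression", "cui": "C002"},
    {"text": "aspirin", "cui": "C003"},
]
kept = filter_annotations(annotations, ["T047", "T048"], TYPE_TO_CUIS)
print([ann["text"] for ann in kept])  # ['kidney failure', 'depression']
```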

update_cat

MedCAT.update_cat(vocabulary=None, concept_db=None)

Updates the current MedCAT instance with a new vocabulary and/or concept database.

Parameters:
  • vocabulary (Vocab) – Vocabulary to update to.

  • concept_db (CDB) – Concept database to update to.
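A minimal sketch of rebuilding the model from fresh inputs and swapping it into an existing wrapper with `update_cat`. It assumes `medcat` is an existing `ehrapy.tools.MedCAT` instance and that the inputs follow the CSV and TSV formats documented for `create_concept_db` and `create_vocabulary`; with no `config`, the documented default uses the `en_core_sci_md` spaCy model, which would need to be installed. The helper name `rebuild_cat` is hypothetical. Both arguments are passed explicitly so the sketch does not rely on how `update_cat` treats a `None` argument.

```python
from __future__ import annotations

from typing import Any


def rebuild_cat(medcat: Any, concept_csvs: list[str], vocab_tsv: str) -> None:
    """Hypothetical helper: build a new Vocab and CDB and install both on `medcat`."""
    # Deferred import: ehrapy/MedCAT are only needed when the helper is called.
    from ehrapy.tools import MedCAT

    concept_db = MedCAT.create_concept_db(concept_csvs)  # default config when none is given
    vocabulary = MedCAT.create_vocabulary(vocab_tsv)
    medcat.update_cat(vocabulary=vocabulary, concept_db=concept_db)
```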

update_cat_config

MedCAT.update_cat_config(concept_db_config)

Updates the MedCAT configuration.

Parameters:

concept_db_config (Config) – Concept database configuration to update to.

Return type:

None