ehrapy.io.read_fhir

ehrapy.io.read_fhir(dataset_path, format='json', columns_obs_only=None, columns_x_only=None, return_df=False, cache=False, backup_url=None, index_column=None, download_dataset_name=None, archive_format=None)

Reads one or multiple FHIR files using fhiry.

Uses fhiry (https://github.com/dermatologist/fhiry) to read the FHIR file(s) into a Pandas DataFrame, which is subsequently transformed into an AnnData object.

Parameters:
  • dataset_path (str) – Path to one or multiple FHIR files.

  • format (Literal['json', 'ndjson']) – The file format of the FHIR data. One of ‘json’ or ‘ndjson’. Defaults to ‘json’.

  • columns_obs_only (Optional[list[str]]) – These columns will be added to obs only and not X.

  • columns_x_only (Optional[list[str]]) – These columns will be added to X only and all remaining columns to obs. Note that datetime columns are always added to .obs.

  • return_df (bool) – Whether to return one or several Pandas DataFrames instead of an AnnData object. Defaults to False.

  • cache (bool) – Whether to write to cache when reading or not. Defaults to False.

  • download_dataset_name (Optional[str]) – Name of the file or directory in case the dataset is downloaded.

  • index_column (Union[str, int, None]) – The index column for the generated object. Usually the patient or visit ID.

  • backup_url (Optional[str]) – URL to download the data file(s) from if they do not exist yet.

Return type:

DataFrame | AnnData

Returns:

A Pandas DataFrame or AnnData object containing the data read from the FHIR file(s).

Examples

>>> import ehrapy as ep
>>> adata = ep.io.read_fhir("/path/to/fhir/resources")