ehrapy.io.read_fhir

ehrapy.io.read_fhir(dataset_path, format='json', columns_obs_only=None, columns_x_only=None, return_df=False, cache=False, backup_url=None, index_column=None, download_dataset_name=None, archive_format=None)[source]

Reads one or multiple FHIR files using fhiry.

Uses https://github.com/dermatologist/fhiry to read the FHIR file into a Pandas DataFrame which is subsequently transformed into an AnnData object.

Be aware that FHIR data can be nested and return lists or dictionaries as values. In such cases, one can either:

  1. Transform the data into an awkward array and flatten it when needed.

  2. Extract values from all lists and dictionaries to store single values in the fields.

  3. Remove all lists and dictionaries. Only do this if the information is not relevant to you.

Parameters:
  • dataset_path (str) – Path to one or multiple FHIR files.

  • format (Literal['json', 'ndjson']) – The file format of the FHIR data. One of ‘json’ or ‘ndjson’. Defaults to ‘json’.

  • columns_obs_only (list[str] | None) – These columns will be added to obs only and not X.

  • columns_x_only (list[str] | None) – These columns will be added to X only and all remaining columns to obs. Note that datetime columns will always be added to .obs though.

  • return_df (bool) – Whether to return one or several Pandas DataFrames instead of an AnnData object.

  • cache (bool) – Whether to write to cache when reading or not. Defaults to False.

  • download_dataset_name (str | None) – Name of the file or directory in case the dataset is downloaded.

  • index_column (str | int | None) – The index column for the generated object. Usually the patient or visit ID.

  • backup_url (str | None) – URL to download the data file(s) from if they do not yet exist.

Return type:

DataFrame | AnnData

Returns:

A Pandas DataFrame or AnnData object containing the data read from the FHIR file(s).

Examples

>>> import ehrapy as ep
>>> adata = ep.io.read_fhir("/path/to/fhir/resources")

Be aware that most FHIR datasets have nested data that might need to be removed. In such cases consider working with DataFrames.

>>> df = ep.io.read_fhir("/path/to/fhir/resources", return_df=True)
>>> df.drop(
...     columns=[col for col in df.columns if any(isinstance(x, (list, dict)) for x in df[col].dropna())],
...     inplace=True,
... )
>>> df.drop(columns=df.columns[df.isna().all()], inplace=True)
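
As an alternative to removing nested columns, values can be extracted from lists and dictionaries so that each field holds a single value. The following is a minimal sketch of that idea in plain Python. It is not part of ehrapy or fhiry: the helper `flatten_record`, the dotted key convention, and the sample record are all hypothetical and only illustrate the approach.

```python
from typing import Any


def flatten_record(value: Any, prefix: str = "") -> dict[str, Any]:
    """Flatten nested dicts and lists into a single-level dict with dotted keys.

    Hypothetical helper for illustration; not an ehrapy or fhiry function.
    Empty lists and empty dicts contribute no keys to the result.
    """
    if isinstance(value, dict):
        items = value.items()
    elif isinstance(value, list):
        # List positions become part of the key, e.g. "name.0.family".
        items = enumerate(value)
    else:
        # Scalar leaf: store it under the accumulated key.
        return {prefix: value}

    flat: dict[str, Any] = {}
    for key, child in items:
        child_prefix = f"{prefix}.{key}" if prefix else str(key)
        flat.update(flatten_record(child, child_prefix))
    return flat


# Made-up record resembling one row of a DataFrame with nested FHIR values.
record = {
    "resource.id": "pat-1",
    "resource.name": [{"family": "Doe", "given": ["Jane"]}],
    "resource.gender": "female",
}

flat = flatten_record(record)
for key, val in flat.items():
    print(f"{key}: {val}")
```

Under the same assumptions, a DataFrame obtained with return_df=True could be rebuilt from flattened rows, for example with pd.DataFrame([flatten_record(r) for r in df.to_dict("records")]). Whether dotted keys with list positions are a suitable column naming scheme depends on the dataset.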