ehrapy.io.read_csv

ehrapy.io.read_csv(dataset_path, sep=',', index_column=None, columns_obs_only=None, columns_x_only=None, return_dfs=False, cache=False, backup_url=None, download_dataset_name=None, archive_format=None, **kwargs)

Reads, and if necessary downloads, a single csv/tsv file or a directory of csv/tsv files.

Parameters:
  • dataset_path (Path | str) – Path to the file or directory to read.

  • sep (str) – Separator used in the file. Either , (comma) or \t (tab). Defaults to , (comma).

  • index_column (dict[str, str | int] | str | int | None) – The index column of obs. Usually the patient visit ID or the patient ID.

  • columns_obs_only (dict[str, list[str]] | list[str] | None) – These columns will be added to obs only and not X.

  • columns_x_only (dict[str, list[str]] | list[str] | None) – These columns will be added to X only, and all remaining columns to obs. Note that datetime columns are always added to obs.

  • return_dfs (bool) – Whether to return one or several Pandas DataFrames.

  • cache (bool) – Whether to write to the cache when reading. Defaults to False.

  • download_dataset_name (str | None) – Name of the file or directory after download.

  • backup_url (str | None) – URL to download the data file(s) from, if the dataset is not yet on disk.

  • archive_format – Archive format of the downloaded file, if it is an archive. Defaults to None.

Return type:

AnnData | dict[str, AnnData]

Returns:

An AnnData object, or a dict that maps an identifier (the filename without its extension) to each AnnData object.
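
As a rough illustration of the dict keys described above, the sketch below derives an identifier from each filename using only the standard library. It is an assumption that this mirrors how ehrapy builds the keys internally; the helper name identifiers_for and the file names are hypothetical.

```python
from pathlib import Path


def identifiers_for(paths):
    """Map each filename without its extension to its path.

    Hypothetical helper for illustration; not part of ehrapy.
    """
    return {Path(p).stem: Path(p) for p in paths}


# Hypothetical files inside a dataset directory.
files = ["data/patients.csv", "data/visits.tsv"]
keys = sorted(identifiers_for(files))
```

Under this assumption, reading a directory that contains these two files would yield a dict whose entries are looked up by the keys "patients" and "visits".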

Examples

>>> import ehrapy as ep
>>> adata = ep.io.read_csv("myfile.csv")
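
The routing rules for columns_obs_only and columns_x_only can be illustrated without ehrapy. What follows is a minimal standard-library sketch, not ehrapy's implementation: the helper split_columns and the sample column names are hypothetical. It only models which column names would end up in obs and which in X. It assumes that at most one of the two options is given and that the index column is listed in neither, and it does not model the datetime exception.

```python
import csv
import io

# Hypothetical sample data; the column names are made up for illustration.
SAMPLE = "patient_id,sex,age,glucose\nP1,F,54,5.1\nP2,M,61,6.3\n"


def split_columns(header, index_column=None, columns_obs_only=None, columns_x_only=None):
    """Return (obs_columns, x_columns) for a CSV header.

    Models only the documented routing rules; not ehrapy's actual code.
    """
    # Assumption of this sketch: the two options are mutually exclusive.
    if columns_obs_only and columns_x_only:
        raise ValueError("pass either columns_obs_only or columns_x_only, not both")
    # Assumption: the index column labels the rows of obs, so it is
    # not listed as a regular column in either group.
    columns = [c for c in header if c != index_column]
    if columns_x_only is not None:
        # Listed columns go to X; everything else goes to obs.
        x_cols = [c for c in columns if c in columns_x_only]
        obs_cols = [c for c in columns if c not in columns_x_only]
    else:
        # Listed columns go to obs; everything else goes to X.
        obs_only = columns_obs_only or []
        obs_cols = [c for c in columns if c in obs_only]
        x_cols = [c for c in columns if c not in obs_only]
    return obs_cols, x_cols


header = next(csv.reader(io.StringIO(SAMPLE)))
obs_cols, x_cols = split_columns(
    header, index_column="patient_id", columns_obs_only=["sex"]
)
```

With ehrapy installed, the corresponding call would pass the same keyword arguments to ep.io.read_csv, for example index_column="patient_id" and columns_obs_only=["sex"].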