ehrapy.io.read_csv

ehrapy.io.read_csv(dataset_path, sep=',', index_column=None, columns_obs_only=None, columns_x_only=None, return_dfs=False, cache=False, download_dataset_name=None, backup_url=None, archive_format=None, **kwargs)

Reads a single csv/tsv file or a directory of csv/tsv files, downloading the data first if needed.

Parameters:

dataset_path (Path | str) – Path to the file or directory to read.
sep (str, default: ',') – Separator in the file. Delegates to pandas.read_csv().
index_column (dict[str, str | int] | str | int | None, default: None) – The index column of obs. Usually the patient visit ID or the patient ID.
columns_obs_only (dict[str, list[str]] | list[str] | None, default: None) – These columns will be added to obs only and not X.
columns_x_only (dict[str, list[str]] | list[str] | None, default: None) – These columns will be added to X only, and all remaining columns to obs. Note that datetime columns are always added to obs.
return_dfs (bool, default: False) – Whether to return one or several pandas DataFrames.
cache (bool, default: False) – Whether to write to the cache when reading.
download_dataset_name (str | None, default: None) – Name of the file or directory after download.
backup_url (str | None, default: None) – URL to download the data file(s) from, if the dataset is not yet on disk.
archive_format (Literal['zip', 'tar', 'tar.gz', 'tgz'], default: None) – The archive format of the downloaded file, if it is an archive.
**kwargs – Passed to pandas.read_csv().
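
Because sep and **kwargs are handed to pandas.read_csv(), their effect can be previewed with pandas alone. The following is a minimal sketch, not ehrapy code: the tab-separated table, its column names, and the "missing" marker are made up for illustration, and na_values is an ordinary pandas.read_csv() option chosen as an example of an extra keyword argument.

```python
import io

import pandas as pd

# Hypothetical tab-separated patient table held in memory.
raw = (
    "patient_id\tage\tglucose\n"
    "p1\t54\t5.6\n"
    "p2\t61\tmissing\n"
    "p3\t47\t4.9\n"
)

# sep selects the delimiter; na_values stands in for any extra
# keyword argument that would travel through **kwargs.
df = pd.read_csv(io.StringIO(raw), sep="\t", na_values=["missing"])

columns = list(df.columns)
n_missing = int(df["glucose"].isna().sum())
```

With ehrapy, the same two options would presumably be written as ep.io.read_csv(path, sep="\t", na_values=["missing"]), given the delegation described above.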

Return type:

AnnData | dict[str, AnnData]

Returns:

An AnnData object, or a dict that maps an identifier (the filename without extension) to each AnnData object.
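
To make the key convention concrete, here is a small standard-library sketch of how identifiers of that shape can be derived from a directory. It only mirrors the documented naming rule (filename without extension) and is not ehrapy's implementation; the helper identifiers_for and the file names are hypothetical.

```python
import tempfile
from pathlib import Path


def identifiers_for(directory):
    """Map each csv/tsv filename, minus its extension, to its path."""
    return {
        path.stem: path
        for path in sorted(Path(directory).iterdir())
        if path.suffix.lower() in {".csv", ".tsv"}
    }


with tempfile.TemporaryDirectory() as tmp:
    # Two tabular files and one unrelated file that should be skipped.
    for name in ("patients.csv", "visits.tsv", "notes.txt"):
        (Path(tmp) / name).write_text("id\n1\n")
    keys = list(identifiers_for(tmp))
```

For such a directory, the returned dict would be expected to be indexed by "patients" and "visits".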

Examples

>>> import ehrapy as ep
>>> adata = ep.io.read_csv("myfile.csv")
