About ehrapy#
ehrapy (EHR Analysis in Python) is a unified framework for the exploratory analysis of heterogeneous electronic health record (EHR) data, providing a standardized approach to handle the complexities of healthcare datasets. If you find ehrapy useful for your research, please check out Citing ehrapy.
Design principles#
The framework is based on three key principles: Standardization, Modularity, and Extensibility.
Standardization#
Healthcare data is complex, heterogeneous, and often suffers from biases, inconsistencies, and missing values. ehrapy tackles these challenges by organizing EHR data into a consistent format using the EHRData structure, which represents observations (patients/visits) as rows, variables (measurements) as columns, and time optionally in a third dimension. This structure enables efficient storage, unified preprocessing, and consistent analytical approaches across diverse datasets.
Modularity#
A typical ehrapy workflow consists of several independent yet interconnected components:
Data loading and preprocessing: Importing data from various sources, mapping against ontologies, quality control, normalization, and imputation.
Exploration and visualization: Generating lower-dimensional embeddings, clustering, and visualization to obtain meaningful patient landscapes.
Knowledge inference: Statistical analysis, patient stratification, survival analysis, causal inference, and trajectory inference.
This modular design allows users to customize their analysis pipeline while maintaining compatibility with the broader ecosystem of tools, including scverse.
Extensibility#
ehrapy is built on open standards and interfaces with widely-used scientific Python libraries. Its AnnData-based structure (EHRData) ensures compatibility with a growing ecosystem of analysis tools and visualization frameworks. The package seamlessly integrates with external libraries for machine learning, statistical analysis, and deep learning. This extensibility enables ehrapy to address new challenges, such as developing foundation models for biomedical research, without requiring significant architectural changes.
Why is it called “ehrapy”?#
The name ehrapy combines EHR (Electronic Health Records) with py (Python), reflecting its purpose as a Python framework for comprehensive EHR analysis. It provides a standardized approach to process, analyze, and derive insights from complex healthcare data in a computationally efficient and reproducible manner.