Changelog#

This project adheres to Semantic Versioning.

v0.14.0rc1#

🚀 Features#

Add LOCF imputation ep.pp.locf_impute() for longitudinal (3D) data with forward fill and configurable fallback strategies (#1020) @agerardy @eroell
Add non-negative CP decomposition ep.tl.ncp() for 3D tensor factorisation with companion plots ep.pl.ncp() and ep.pl.ncp_cluster_trajectories() (#1030) @eroell
Add ep.pp.variable_correlations() and plotting functions ep.pl.variable_correlations() / ep.pl.variable_dependencies() (#1010) @sueoglu
Longitudinal explicit impute ep.pp.explicit_impute() extended to enable different imputation values per timepoint (#1023) @sueoglu
Sankey diagram state-transition colours and hover function for timeseries plots (#1019) @sueoglu
Add longitudinal data analysis notebook (#1007) @eroell
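
As a rough illustration of what last-observation-carried-forward (LOCF) imputation does on longitudinal data, the sketch below forward-fills missing values along the time axis of a nested list shaped observations × variables × timesteps. It is not the implementation or signature of ep.pp.locf_impute(): the names `locf_fill` and `locf_fill_3d` are hypothetical, and the "fallback" is assumed here to be a single constant used for leading gaps that have no earlier observation.

```python
import math


def locf_fill(series, fallback=None):
    """Forward-fill NaNs in one time series.

    Leading NaNs (no earlier observation) are replaced by `fallback`
    when it is given, otherwise they are left as NaN.
    """
    filled = []
    last = None  # most recent observed (non-missing) value
    for value in series:
        is_missing = isinstance(value, float) and math.isnan(value)
        if not is_missing:
            last = value
            filled.append(value)
        elif last is not None:
            filled.append(last)  # carry the last observation forward
        elif fallback is not None:
            filled.append(fallback)  # nothing to carry yet: use fallback
        else:
            filled.append(value)  # keep the gap
    return filled


def locf_fill_3d(data, fallback=None):
    """Apply locf_fill along the last (time) axis of obs x var x time lists."""
    return [[locf_fill(var_series, fallback) for var_series in obs] for obs in data]


nan = math.nan
data = [[[nan, 1.0, nan, 3.0], [nan, nan, 5.0, nan]]]  # 1 obs, 2 vars, 4 timesteps
result = locf_fill_3d(data, fallback=0.0)
# result == [[[0.0, 1.0, 1.0, 3.0], [0.0, 0.0, 5.0, 5.0]]]
```

Without a fallback, values before the first observation stay missing, which is why some strategy for leading gaps is needed.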

🐛 Bug Fixes#

Fix AttributeError when normalizing with edata.X = None (#1015) @agerardy @eroell

🧰 Maintenance#

Fix plotting CI (#1011) @sueoglu @eroell
Continuous values don’t repeat title in CohortTracker’s barplot (#1021) @sueoglu
Remove legacy code deprecated in 0.13.0, fix test warnings & adjust to future scanpy arguments (#1016) @eroell
Update plotting ci dotplot (#1011) @sueoglu @Zethson @eroell

🐛 Bug Fixes#

ep.pp normalization functions now work when using a layer and .X is None (#1015) @agerardy @eroell
ep.tl.rank_features_groups can use .obs regardless of what is in .X or .layers (#1015) @agerardy @eroell

⚠️ Modified#

Update qc_lab_metrics (#1025) @eroell
remove deprecated ep.ad (moved to ehrdata): infer_feature_types, feature_type_overview, replace_feature_types, anndata_to_df, df_to_anndata, move_to_obs, move_to_x (#1016) @eroell
remove deprecated ep.dt (moved to ehrdata) (#1016) @eroell
remove deprecated ep.io (moved to ehrdata): df_to_anndata, read_csv, read_fhir, read_h5ad, write (#1016) @eroell

v0.13.1#

🧰 Maintenance#

improve syntax usage (#1005) @Zethson
fix fknni extra (#1003) @Zethson

v0.13.0#

🚀 Features#

Transitioning from AnnData to EHRData:
EHRData replaces AnnData as ehrapy’s core data structure to better support time-series electronic health record data. The key enhancement is native support for 3D tensors (observations × variables × timesteps) alongside the existing 2D matrices, enabling efficient storage of longitudinal patient data. A new .tem DataFrame provides time-point annotations, complementing the existing .obs and .var annotations for comprehensive temporal data description.
While EHRData maintains full backward compatibility with AnnData’s API, users can now seamlessly work with time-series data and leverage specialized methods for temporal analysis. Existing code using AnnData objects will continue to work, but migration to EHRData is strongly recommended to access enhanced time-series functionality.
The preferred central data object is now EHRData (#908) @eroell
The layers argument is now available for all functions operating on X or layers (#908) @eroell
Update expected behaviour of io.read_fhir (#922) @eroell
Move mimic_2, mimic_2_preprocessed, diabetes_130_raw, diabetes_130_fairlearn to ehrdata.dt (#908)
Deprecate all ep.dt.*, refer to datasets in ehrdata (#908) @eroell
Support Python 3.14 (#996) @Zethson
Move kaplan_meier & cox_ph plots to holoviews (#995) @Zethson
Longitudinal normalization (#958) @agerardy
Add interactive ols plot (#992) @Zethson
Longitudinal and new qc_metrics (#967) @sueoglu
Simple Impute for timeseries (#975) @eroell
Simple implementation of balanced sampling (#937) @sueoglu
Add Sankey diagram visualization functions (#989) @sueoglu
Add ep.pl.timeseries() to visualize variables over time (#994) @sueoglu
Add GPU CI & skeleton (#998) @Zethson
Add FAMD (#976) @Zethson
3D enabled implementation of ep.pp.filter_observations, ep.pp.filter_features (#953) @sueoglu
Add time series distances (#954) @Zethson
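
To make the observations × variables × timesteps layout from the EHRData transition note concrete, here is a toy stand-in built from plain nested lists. It does not use EHRData or its API; the variable names, values, and the helpers `shape` and `slice_at_time` are invented for illustration, with `time_points` loosely playing the role of the time-point annotations.

```python
# Toy 3D record: observations x variables x timesteps (made-up values).
obs_names = ["patient_0", "patient_1"]
var_names = ["heart_rate", "temperature"]
time_points = ["t0", "t1", "t2"]  # stand-in for time-point annotations

values = [
    [[72.0, 75.0, 71.0], [36.6, 36.9, 37.1]],  # patient_0
    [[88.0, 90.0, 85.0], [38.2, 37.8, 37.4]],  # patient_1
]


def shape(tensor):
    """Return (n_obs, n_vars, n_timesteps) of a regular nested list."""
    return (len(tensor), len(tensor[0]), len(tensor[0][0]))


def slice_at_time(tensor, t):
    """Return the 2D (obs x var) matrix at time index t."""
    return [[var_series[t] for var_series in obs] for obs in tensor]


snapshot = slice_at_time(values, time_points.index("t1"))
# shape(values) == (2, 2, 3)
# snapshot == [[75.0, 36.9], [90.0, 37.8]]
```

Fixing one timestep recovers an ordinary 2D observations × variables matrix, which is the shape that purely 2D code operates on.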

🐛 Bug Fixes#

All green if GPU skipped (#1000) @Zethson
Fix neighbors with timeseries (#973) @eroell
Fix use_rep when X none (#969) @eroell
Fix missing_values_barplot errors (#963) @sueoglu
Fix CR notebook (#939) @Zethson

🧰 Maintenance#

Update actions (#977) @Zethson
Cleanup simple_impute tests (#974) @eroell
Move to ehrdata 0.0.10 (#971) @eroell
Improved notebook CI (#959) @Zethson
Switch to template (#960) @Zethson
Tests for more plots (#919) @sueoglu
Lowerbound cvxpy (#935) @Zethson
Optimize var_metrics (#927) @Zethson
Refactor Dask usage pattern (#926) @Zethson
Add cover to README & remove some tokens (#923) @Zethson
Update test coverage reporting (#918) @eroell
Fix changelog links (#915) @Zethson
Fixed structure of Returns in _rank_features_groups.py documentation (#911) @agerardy
Add EHRData transition code (#897) @Zethson @eroell
Make test that downloads dermatology dataset more robust (#906) @Zethson
Update image source in README.md (#986) @eroell
Fix plot docs formatting (#952) @Zethson
Typo in the documentation of ehrapy.data.mimic_2_preprocessed (#917) @sueoglu

v0.12.1#

🚀 Features#

Make dowhy optional & remove medcat (#903) @Zethson
Add about page & improve citations (#902) @Zethson
Overhaul doc structure (#895) @Zethson
Move to biome & improve CI & reenable CR (#890) @Zethson
Clean up Round - cut down anndata extension functionality (#880) @eroell

v0.12.0#

🚀 Features#

Improved KM plot data depth and functionality (#853) @aGuyLearning
New Feature: Forestplot for CoxPH model (#838) @aGuyLearning
Datatype Support in Quality Control and Impute (#865) @aGuyLearning
Revamp survival analysis interface (#842) @aGuyLearning
Improve submodule documentation (#859) @Zethson
Update Kaplan Meier plots in survival analysis notebook (#864) @aGuyLearning

🐛 Bug Fixes#

Pass all non-nan features along desired var_names to impute (KNN) (#867) @nicolassidoux
Remove Syntax warnings (#869) @Zethson
Fix test_norm_power_group (#862) @Zethson

🧰 Maintenance#

Fix a typo in pl.paga_compare: pos -> pos, (#846) @VladimirShitov

v0.11.0#

✨ Features#

Add array type handling for normalization (#835) @eroell @Zethson

🐛 Bug Fixes#

Fix scipy array support (#844) @Zethson
Fix casting to float when assigning numeric values; fixes normalization of integer arrays (#837) @eroell

v0.9.0 & 0.10.0#

🚀 Features#

Make all imputation methods consistent in regard to encoding requirements (#827) @nicolassidoux
Add approximate KNN backend (#791) @nicolassidoux
Improve survival analysis interface (#825) @aGuyLearning
Python 3.12 support (#794) @Lilly-May
Python 3.10+ & use uv for docs & fix RTD & support numpy 2 (#830) @Zethson

🐛 Bug Fixes#

move_to_x: Fix name of non-implemented argument “copy” to “copy_x”, implement & test (#832) @eroell
Contributing typo fix (#821) @aGuyLearning
Fix miceforest (#800) @Zethson
style: == to is for type comparison (#774) @eroell

v0.8.0#

🚀 Features#

remove pyyaml & explicit scikit-learn (#729) @Zethson
Remove fancyimpute (#728) @Zethson
Unify feature type detection (#724) @Lilly-May
catplot (#721) @eroell
Simplify ehrapy (#719) @Zethson
Use all (#715) @Zethson
Add bias detection to preprocessing (#690) @Lilly-May
Use lamin logger (#707) @Zethson
Add faiss backend for KNN imputation (#704) @Zethson
Build RTD docs with uv (#700) @Zethson
Refactor feature importance ranking (#698) @Zethson
Simplify CI (#694) @Zethson
Refactor outliers and IQR (#692) @Zethson
Calculation of feature importances in a supervised setting (#677) @Lilly-May
Speed up winsorize (#681) @Zethson
Remove notebook prefix in tutorial URLs (#679) @Zethson
Add cohort tracking notebook (#678) @Zethson
Switch to uv (#674) @Zethson
Style: typing of _scale_func_group (#727) @eroell
Improved support of encoded features in detect_bias (#725) @Lilly-May
Enable Synchronous dataloader write (#722) @wxicu
Feature scaling on training set when computing feature importances (#716) @Lilly-May
add batch-wise normalization argument (#711) @eroell
add functools.wraps to type check (#705) @eroell
add bias notebook to list of notebooks (#696) @eroell
basic sampling (#686) @eroell
add options for subtitles in legend of cohorttrackers barplot (#688) @eroell
doc fix imputation: 70 instead of 30 (#683) @eroell

🐛 Bug Fixes#

Encoded dtype to float32 instead of np.number (#714) @Zethson
Fix feature importance warnings (#708) @Zethson
Remove notebook prefix in tutorial URLs (#679) @Zethson
fix name of log_rogistic_aft to log_logistic_aft (#676) @eroell

🧰 Maintenance#

Remove notebook prefix in tutorial URLs (#679) @Zethson
Add cohort tracking notebook (#678) @Zethson
knni amendments (#706) @eroell

v0.7.0#

🚀 Features#

Cohort Tracker (#658) @eroell
change diabetes-130 datasets which are provided (#672) @eroell
More sa functions (#664) @fatisati
Coxphfitter (#643) @fatisati
Implement little’s test (#667) @Zethson
Improve test design (#651) @Zethson
Improve QC docstring (#639) @Zethson
Refactor _missing_values calculation (#638) @Zethson

🐛 Bug Fixes#

Fix one-hot encoding tests (#644) @Zethson

v0.6.0#

🚀 Features#

Breaking changes#

Move information on numerical/non_numerical/encoded_non_numerical from .uns to .var (#630) @eroell

Make older AnnData objects compatible using:

def move_type_info_from_uns_to_var(adata, copy=False):
    """Move type information from adata.uns to adata.var['ehrapy_column_type'].

    The latter is the current, updated flavor used by ehrapy.
    """
    if copy:
        adata = adata.copy()

    adata.var['ehrapy_column_type'] = 'unknown'

    if 'numerical_columns' in adata.uns.keys():
        for key in adata.uns['numerical_columns']:
            adata.var.loc[key, 'ehrapy_column_type'] = 'numeric'
    if 'non_numerical_columns' in adata.uns.keys():
        for key in adata.uns['non_numerical_columns']:
            adata.var.loc[key, 'ehrapy_column_type'] = 'non_numeric'
    if 'encoded_non_numerical_columns' in adata.uns.keys():
        for key in adata.uns['encoded_non_numerical_columns']:
            adata.var.loc[key, 'ehrapy_column_type'] = 'non_numeric_encoded'

    if copy:
        return adata

New features#

Medcat refresh (#623) @eroell
Rank features groups obs (#622) @eroell
Add FHIR tutorial and simplify code (#626) @Zethson
Add input checks for imputers (#625) @Zethson
Removed unused dependencies (#615) @Zethson
Refactor encoding (#588) @Zethson

🐛 Bug Fixes#

Use fixtures for preprocessing tests (#577) @Zethson

🧰 Maintenance#

Refactoring (#627) @Zethson
Add FHIR tutorial and simplify code (#626) @Zethson
pre-commit (#587) @Zethson
Small edits (#599) @eroell

v0.5.0#

🚀 Features#

Add g-tests for rank features group (#546) @VladimirShitov
Causal Inference with dowhy (#502) @timtreis
Remove MuData support (#545) @Zethson

🐛 Bug Fixes#

Fixed reading format warnings (#569) @namsaraeva
Fixed inability to normalize AnnData that does not require encoding (#568) @namsaraeva
Fixed adata.uns["non_numerical_columns"] being empty in mimic_2 dataset (#567) @namsaraeva

v0.4.0#

🚀 Features#

Add Synthea dataset (#510) @namsaraeva
Added tiny examples to every function (#498) @namsaraeva
add a title parameter (#494) @xinyuejohn
Changed the hue of grey (#493) @namsaraeva
Logger info message when writing to .h5ad files (#458) @namsaraeva
Modified docstrings (#533) @namsaraeva
Added examples to missing modules (#531) @namsaraeva
Allow Python 3.11 (#523) @Zethson
Add test_kmf_logrank (#516) @Zethson
Add scget functions (#484) @Zethson
Add FHIR parsing support (#463) @Zethson
Add new tutorial & switch to python 3.10 (#454) @Zethson
Add docs group (#437) @Zethson
Add thefuzz (#434) @Zethson

🐛 Bug Fixes#

Fix CI (#524) @Zethson
Error message and minor fixes, issue #447 (#504) @namsaraeva
fix quality control (#495) @xinyuejohn
Fix MacOS CI (#435) @Zethson

🧰 Maintenance#

Add test_kmf_logrank (#516) @Zethson
Add scget functions (#484) @Zethson
Add new tutorial & switch to python 3.10 (#454) @Zethson

v0.3.0#

🚀 Features#

Add winsorize, clip quantiles and filter quantiles (#418) @Zethson
Remove PDF support (#430) @Zethson
Logging instance, issue #246 (#426) @namsaraeva
Negative values offset (#420) @Zethson
Missing values visualization, ref issue #271 (#419) @namsaraeva
Add copy_obs parameter to move_to_obs (#404) @namsaraeva
add anova_glm function (#400) @xinyuejohn
issue #397 “check for neighbors run before UMAP” fixed (#401) @namsaraeva
add more tutorials to CI (#382) @Zethson
add support for reading multiple files into Pandas DFs & adapted MIMIC-III Demo (#386) @Zethson
#321: Add X_only option for reading (#380) @Imipenem

🐛 Bug Fixes#

KeyError fix issue #423 (#428) @namsaraeva
fix qc_metrics bug (#425) @xinyuejohn
df_to_anndata logical XOR to OR, issue #422 (#429) @namsaraeva
Fix docs CI (#392) @Zethson
small fix in the qc_metrics() example (#407) @namsaraeva

🧰 Maintenance#

Add winsorize, clip quantiles and filter quantiles (#418) @Zethson
Remove PDF support (#430) @Zethson
Negative values offset (#420) @Zethson
Missing values visualization, ref issue #271 (#419) @namsaraeva
Fix docs CI (#392) @Zethson

v0.2.0#

🚀 Features#

Important cookietemple template update 2.1.0 released! (#343) @Zethson
add chronic kidney disease dataloader (#301) @xinyuejohn
dataloader for diabetes dataset (#292) @HorlavaNastassya
Add X_only option for reading (#380) @Imipenem
MedCAT API improvements & function renaming (#381) @Zethson
minor changes (#379) @xinyuejohn
add functions related to survival analysis (#371) @xinyuejohn
MedCat [#101]: extract biomedical concepts/entities from (free) text (#367) @Imipenem
Add heart dataset to docs (#377) @xinyuejohn
add heart disease data set to ehrapy (#376) @xinyuejohn
add highly_variable_features (#364) @xinyuejohn
add SoftImpute and IterativeSVD to imputation (#353) @xinyuejohn
(#307) Improve KNN with n_neighbours parameter (#365) @Imipenem
add furo theme & switch to markdown (#359) @Zethson
add several datasets and change Docstring examples (#355) @xinyuejohn
Add ability to compare laboratory measurements to reference values (#352) @Zethson
(Feature) New read API #263 (#351) @Imipenem
(Feature) Set index column #305 (#350) @Imipenem
Add encoded parameter to all new datasets and fix import (#336) @xinyuejohn
(FEATURE) #314: Autodetect binary (0,1) columns (#327) @Imipenem
(FEATURE) Display QC metrics of var #239 (#323) @Imipenem
Add several dataset loaders (#322) @xinyuejohn
(FEATURE) Improve type_overview #306 (#308) @Imipenem
Feature/deep translator integration (#303) @MxMstrmn
remove CLI module (#298) @Zethson
Improve missforest interface (#284) @Zethson
Add example calls and preview images to all plotting functions (#289) @xinyuejohn
add heart failure dataloader (#291) @Zethson

🐛 Bug Fixes#

(FIX) #255: Encode mutates input adata object (#348) @Imipenem
(FIX) Write .h5ad files (#347) @Imipenem
Fix #331: Improved autodetect docs (#344) @Imipenem
Add encoded parameter to all new datasets and fix import (#336) @xinyuejohn
(FIX) Autodetect encode + specify encode mode for autodetect (#310) @Imipenem

🧰 Maintenance#

MedCAT API improvements & function renaming (#381) @Zethson
add functions related to survival analysis (#371) @xinyuejohn
MedCat [#101]: extract biomedical concepts/entities from (free) text (#367) @Imipenem
Add heart dataset to docs (#377) @xinyuejohn
remove CLI module (#298) @Zethson

v0.1.0#

🚀 Features#

Input and output of CSVs, PDFs, h5ad files
Several encoding modes (one-hot, label, …)
Several imputation methods (simple, KNN, MissForest, …)
Several normalization methods (log, scale, …)
Full Scanpy API support
Initial MedCAT integration
DeepL & Google Translator support
