Changelog

This project adheres to Semantic Versioning.

v0.12.1

🚀 Features

  • Make dowhy optional & remove medcat #903 @Zethson

  • Add about page & improve citations #902 @Zethson

  • Overhaul doc structure #895 @Zethson

  • Move to biome & improve CI & reenable CR #890 @Zethson

  • Clean up Round - cut down anndata extension functionality #880 @eroell

v0.12.0

🚀 Features

  • Improved KM plot data depth and functionality #853 @aGuyLearning

  • New Feature: Forestplot for CoxPH model #838 @aGuyLearning

  • Datatype Support in Quality Control and Impute #865 @aGuyLearning

  • Revamp survival analysis interface #842 @aGuyLearning

  • Improve submodule documentation #859 @Zethson

  • Update Kaplan Meier plots in survival analysis notebook #864 @aGuyLearning

🐛 Bug Fixes

  • Pass all non-nan features along desired var_names to impute (KNN) #867 @nicolassidoux

  • Remove Syntax warnings #869 @Zethson

  • Fix test_norm_power_group #862 @Zethson

🧰 Maintenance

  • Fix a typo in pl.paga_compare: pos -> pos, #846 @VladimirShitov

v0.11.0

✨ Features

  • Add array type handling for normalization #835 @eroell @Zethson

🐛 Bug Fixes

  • Fix scipy array support #844 @Zethson

  • Fix casting to float when assigning numeric values; fixes normalization of integer arrays #837 @eroell

v0.9.0 & v0.10.0

🚀 Features

  • Make all imputation methods consistent in regard to encoding requirements #827 @nicolassidoux

  • Add approximate KNN backend #791 @nicolassidoux

  • Improve survival analysis interface #825 @aGuyLearning

  • Python 3.12 support #794 @Lilly-May

  • Python 3.10+ & use uv for docs & fix RTD & support numpy 2 #830 @Zethson

🐛 Bug Fixes

  • move_to_x: Fix name of non-implemented argument “copy” to “copy_x”, implement & test #832 @eroell

  • Contributing typo fix #821 @aGuyLearning

  • Fix miceforest #800 @Zethson

  • style: == to is for type comparison #774 @eroell

v0.8.0

🚀 Features

  • remove pyyaml & explicit scikit-learn #729 @Zethson

  • Remove fancyimpute #728 @Zethson

  • Unify feature type detection #724 @Lilly-May

  • catplot #721 @eroell

  • Simplify ehrapy #719 @Zethson

  • Use all #715 @Zethson

  • Add bias detection to preprocessing #690 @Lilly-May

  • Use lamin logger #707 @Zethson

  • Add faiss backend for KNN imputation #704 @Zethson

  • Build RTD docs with uv #700 @Zethson

  • Refactor feature importance ranking #698 @Zethson

  • Simplify CI #694 @Zethson

  • Refactor outliers and IQR #692 @Zethson

  • Calculation of feature importances in a supervised setting #677 @Lilly-May

  • Speed up winsorize #681 @Zethson

  • Remove notebook prefix in tutorial URLs #679 @Zethson

  • Add cohort tracking notebook #678 @Zethson

  • Switch to uv #674 @Zethson

  • Style: typing of _scale_func_group #727 @eroell

  • Improved support of encoded features in detect_bias #725 @Lilly-May

  • Enable Synchronous dataloader write #722 @wxicu

  • Feature scaling on training set when computing feature importances #716 @Lilly-May

  • add batch-wise normalization argument #711 @eroell

  • add functools.wraps to type check #705 @eroell

  • add bias notebook to list of notebooks #696 @eroell

  • basic sampling #686 @eroell

  • add options for subtitles in legend of cohorttrackers barplot #688 @eroell

  • doc fix imputation: 70 instead of 30 #683 @eroell

🐛 Bug Fixes

  • Encoded dtype to float32 instead of np.number #714 @Zethson

  • Fix feature importance warnings #708 @Zethson

  • Remove notebook prefix in tutorial URLs #679 @Zethson

  • fix name of log_rogistic_aft to log_logistic_aft #676 @eroell

🧰 Maintenance

  • Remove notebook prefix in tutorial URLs #679 @Zethson

  • Add cohort tracking notebook #678 @Zethson

  • knni amendments #706 @eroell

v0.7.0

🚀 Features

  • Cohort Tracker #658 @eroell

  • change diabetes-130 datasets which are provided #672 @eroell

  • More sa functions #664 @fatisati

  • Coxphfitter #643 @fatisati

  • Implement Little’s test #667 @Zethson

  • Improve test design #651 @Zethson

  • Improve QC docstring #639 @Zethson

  • Refactor _missing_values calculation #638 @Zethson

🐛 Bug Fixes

  • Fix one-hot encoding tests #644 @Zethson

v0.6.0

🚀 Features

Breaking changes

  • Move information on numerical/non_numerical/encoded_non_numerical from .uns to .var #630 @eroell

Older AnnData objects can be made compatible with the following helper:

def move_type_info_from_uns_to_var(adata, copy=False):
    """
    Move type information from adata.uns to adata.var['ehrapy_column_type'].

    The latter is the current, updated flavor used by ehrapy.
    With copy=True, a modified copy is returned; otherwise adata is updated
    in place and nothing is returned.
    """
    if copy:
        adata = adata.copy()

    # Features not listed under any legacy key keep the type 'unknown'.
    adata.var['ehrapy_column_type'] = 'unknown'

    # Translate each legacy .uns list into the matching column type label.
    if 'numerical_columns' in adata.uns:
        for key in adata.uns['numerical_columns']:
            adata.var.loc[key, 'ehrapy_column_type'] = 'numeric'
    if 'non_numerical_columns' in adata.uns:
        for key in adata.uns['non_numerical_columns']:
            adata.var.loc[key, 'ehrapy_column_type'] = 'non_numeric'
    if 'encoded_non_numerical_columns' in adata.uns:
        for key in adata.uns['encoded_non_numerical_columns']:
            adata.var.loc[key, 'ehrapy_column_type'] = 'non_numeric_encoded'

    if copy:
        return adata
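
Before calling the helper above, it can be useful to check whether an object still carries the old layout at all. The following is a minimal sketch under stated assumptions: `legacy_type_keys` and `LEGACY_KEY_TO_COLUMN_TYPE` are hypothetical names introduced here for illustration and are not part of ehrapy. The key names and labels are copied from the helper above, and plain dictionaries stand in for `adata.uns`.

```python
from collections.abc import Mapping

# Legacy .uns keys and the 'ehrapy_column_type' label each one maps to.
# Key names and labels are copied from the migration helper; this mapping
# itself is a hypothetical convenience and not an ehrapy object.
LEGACY_KEY_TO_COLUMN_TYPE = {
    "numerical_columns": "numeric",
    "non_numerical_columns": "non_numeric",
    "encoded_non_numerical_columns": "non_numeric_encoded",
}


def legacy_type_keys(uns: Mapping) -> list:
    """Return the legacy type-information keys still present in ``uns``."""
    return [key for key in LEGACY_KEY_TO_COLUMN_TYPE if key in uns]


# Plain dicts stand in for adata.uns of an old and a new object.
old_uns = {"numerical_columns": ["age"], "non_numerical_columns": ["sex"]}
new_uns = {}

print(legacy_type_keys(old_uns))  # ['numerical_columns', 'non_numerical_columns']
print(legacy_type_keys(new_uns))  # []
```

With a real object, a non-empty result from `legacy_type_keys(adata.uns)` would suggest running `move_type_info_from_uns_to_var(adata)` before using the current API.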

New features

  • Medcat refresh #623 @eroell

  • Rank features groups obs #622 @eroell

  • Add FHIR tutorial and simplify code #626 @Zethson

  • Add input checks for imputers #625 @Zethson

  • Removed unused dependencies #615 @Zethson

  • Refactor encoding #588 @Zethson

🐛 Bug Fixes

  • Use fixtures for preprocessing tests #577 @Zethson

🧰 Maintenance

  • Refactoring #627 @Zethson

  • Add FHIR tutorial and simplify code #626 @Zethson

  • pre-commit #587 @Zethson

  • Small edits #599 @eroell

v0.5.0

🚀 Features

  • Add g-tests for rank features group #546 @VladimirShitov

  • Causal Inference with dowhy #502 @timtreis

  • Remove MuData support #545 @Zethson

🐛 Bug Fixes

  • Fixed reading format warnings #569 @namsaraeva

  • Fixed inability to normalize AnnData that does not require encoding #568 @namsaraeva

  • Fixed adata.uns[“non_numerical_columns”] being empty in mimic_2 dataset #567 @namsaraeva

v0.4.0

🚀 Features

  • Add Synthea dataset #510 @namsaraeva

  • Added tiny examples to every function #498 @namsaraeva

  • add a title parameter #494 @xinyuejohn

  • Changed the hue of grey #493 @namsaraeva

  • Logger info message when writing to .h5ad files #458 @namsaraeva

  • Modified docstrings #533 @namsaraeva

  • Added examples to missing modules #531 @namsaraeva

  • Allow Python 3.11 #523 @Zethson

  • Add test_kmf_logrank #516 @Zethson

  • Add scget functions #484 @Zethson

  • Add FHIR parsing support #463 @Zethson

  • Add new tutorial & switch to python 3.10 #454 @Zethson

  • Add docs group #437 @Zethson

  • Add thefuzz #434 @Zethson

🐛 Bug Fixes

  • Fix CI #524 @Zethson

  • Error message and minor fixes, issue #447 #504 @namsaraeva

  • fix quality control #495 @xinyuejohn

  • Fix MacOS CI #435 @Zethson

🧰 Maintenance

  • Add test_kmf_logrank #516 @Zethson

  • Add scget functions #484 @Zethson

  • Add new tutorial & switch to python 3.10 #454 @Zethson

v0.3.0

🚀 Features

  • Add winsorize, clip quantiles and filter quantiles #418 @Zethson

  • Remove PDF support #430 @Zethson

  • Logging instance, issue #246 #426 @namsaraeva

  • Negative values offset #420 @Zethson

  • Missing values visualization, ref issue #271 #419 @namsaraeva

  • Add copy_obs parameter to move_to_obs #404 @namsaraeva

  • add anova_glm function #400 @xinyuejohn

  • issue #397 “check for neighbors run before UMAP” fixed #401 @namsaraeva

  • add more tutorials to CI #382 @Zethson

  • add support for reading multiple files into Pandas DFs & adapted MIMIC-III Demo #386 @Zethson

  • #321: Add X_only option for reading #380 @Imipenem

🐛 Bug Fixes

  • KeyError fix issue #423 #428 @namsaraeva

  • fix qc_metrics bug #425 @xinyuejohn

  • df_to_anndata logical XOR to OR, issue #422 #429 @namsaraeva

  • Fix docs CI #392 @Zethson

  • small fix in the qc_metrics() example #407 @namsaraeva

🧰 Maintenance

  • Add winsorize, clip quantiles and filter quantiles #418 @Zethson

  • Remove PDF support #430 @Zethson

  • Negative values offset #420 @Zethson

  • Missing values visualization, ref issue #271 #419 @namsaraeva

  • Fix docs CI #392 @Zethson

v0.2.0

🚀 Features

  • Important cookietemple template update 2.1.0 released! #343 @Zethson

  • add chronic kidney disease dataloader #301 @xinyuejohn

  • dataloader for diabetes dataset #292 @HorlavaNastassya

  • Add X_only option for reading #380 @Imipenem

  • MedCAT API improvements & function renaming #381 @Zethson

  • minor changes #379 @xinyuejohn

  • add functions related to survival analysis #371 @xinyuejohn

  • MedCat [#101]: extract biomedical concepts/entities from (free) text #367 @Imipenem

  • Add heart dataset to docs #377 @xinyuejohn

  • add heart disease data set to ehrapy #376 @xinyuejohn

  • add highly_variable_features #364 @xinyuejohn

  • add SoftImpute and IterativeSVD to imputation #353 @xinyuejohn

  • [#307] Improve KNN with n_neighbours parameter #365 @Imipenem

  • add furo theme & switch to markdown #359 @Zethson

  • add several datasets and change Docstring examples #355 @xinyuejohn

  • Add ability to compare laboratory measurements to reference values #352 @Zethson

  • [Feature] New read API #263 #351 @Imipenem

  • [Feature] Set index column #305 #350 @Imipenem

  • Add encoded parameter to all new datasets and fix import #336 @xinyuejohn

  • [FEATURE] #314: Autodetect binary (0,1) columns #327 @Imipenem

  • [FEATURE] Display QC metrics of var #239 #323 @Imipenem

  • Add several dataset loaders #322 @xinyuejohn

  • [FEATURE] Improve type_overview #306 #308 @Imipenem

  • Feature/deep translator integration #303 @MxMstrmn

  • remove CLI module #298 @Zethson

  • Improve missforest interface #284 @Zethson

  • Add example calls and preview images to all plotting functions #289 @xinyuejohn

  • add heart failure dataloader #291 @Zethson

🐛 Bug Fixes

  • [FIX] #255: Encode mutates input adata object #348 @Imipenem

  • [FIX] Write .h5ad files #347 @Imipenem

  • Fix #331: Improved autodetect docs #344 @Imipenem

  • Add encoded parameter to all new datasets and fix import #336 @xinyuejohn

  • [FIX] Autodetect encode + specify encode mode for autodetect #310 @Imipenem

🧰 Maintenance

  • MedCAT API improvements & function renaming #381 @Zethson

  • add functions related to survival analysis #371 @xinyuejohn

  • MedCat [#101]: extract biomedical concepts/entities from (free) text #367 @Imipenem

  • Add heart dataset to docs #377 @xinyuejohn

  • remove CLI module #298 @Zethson

v0.1.0

🚀 Features

  • Input and output of CSVs, PDFs, h5ad files

  • Several encoding modes (one-hot, label, …)

  • Several imputation methods (simple, KNN, MissForest, …)

  • Several normalization methods (log, scale, …)

  • Full Scanpy API support

  • Initial MedCAT integration

  • DeepL & Google Translator support