Preprocessing

Any transformation of the data matrix that is not a tool. Unlike tools, preprocessing steps usually don’t return an easily interpretable annotation; instead, they perform a basic transformation on the data matrix.

Basic preprocessing

preprocessing.encode

Encode categoricals of an AnnData object.

preprocessing.pca

Computes a principal component analysis.
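
To show what a PCA computes, here is a minimal NumPy sketch that centers each feature and takes a thin singular value decomposition. It is not ehrapy's implementation, which may use other solvers. The function name `pca_sketch` and its parameters are hypothetical.

```python
import numpy as np

def pca_sketch(X: np.ndarray, n_comps: int):
    """Return (scores, loadings, explained variance ratio) for the first n_comps PCs."""
    # Center each feature (column); PCA is defined on mean-centered data.
    Xc = X - X.mean(axis=0)
    # Thin SVD: Xc = U @ diag(S) @ Vt, singular values sorted in descending order.
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_comps] * S[:n_comps]  # observations projected onto the PCs
    loadings = Vt[:n_comps].T              # one column per principal component
    var = S**2 / (X.shape[0] - 1)          # variance captured by each component
    return scores, loadings, var[:n_comps] / var.sum()

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
scores, loadings, ratio = pca_sketch(X, n_comps=2)
print(scores.shape, loadings.shape)  # prints (100, 2) (5, 2)
```

The scores equal the centered data multiplied by the loadings, which is why PCA is often described as a rotation followed by truncation.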

preprocessing.regress_out

Regress out (mostly) unwanted sources of variation.
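
The core idea of regressing out a covariate can be sketched as an ordinary least-squares fit per feature, keeping only the residuals. This assumes a plain linear model with an intercept; the actual function may fit its models differently. The name `regress_out_sketch` and the simulated `age` covariate are hypothetical.

```python
import numpy as np

def regress_out_sketch(X, covariates):
    """Replace each column of X by its residuals after a least-squares fit on covariates."""
    X = np.asarray(X, dtype=float)
    # Design matrix: an intercept column plus the covariate(s), shape (n_obs, 1 + n_cov).
    design = np.column_stack([np.ones(X.shape[0]), covariates])
    # Fit all features at once; beta has shape (1 + n_cov, n_features).
    beta, *_ = np.linalg.lstsq(design, X, rcond=None)
    return X - design @ beta

rng = np.random.default_rng(1)
age = rng.uniform(20, 80, size=200)
# Feature 0 depends on age, feature 1 does not.
X = np.column_stack([0.5 * age + rng.normal(size=200), rng.normal(size=200)])
X_corrected = regress_out_sketch(X, age)
```

Because the model contains an intercept, the residuals have zero mean and are uncorrelated with the regressed-out covariate.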

preprocessing.subsample

Subsample to a fraction of the number of observations.

preprocessing.balanced_sample

Balance groups in the dataset.
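
One common way to balance groups is random undersampling: every group is down-sampled to the size of the smallest one. The sketch below shows only that strategy and is an assumption, not a description of the options `balanced_sample` offers. The function name is hypothetical.

```python
import numpy as np

def balanced_undersample_sketch(labels, seed=0):
    """Return sorted row indices such that every group in `labels` has the same size."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    groups, counts = np.unique(labels, return_counts=True)
    n_min = counts.min()  # size of the smallest group
    # Draw n_min rows per group, without replacement.
    idx = [rng.choice(np.flatnonzero(labels == g), size=n_min, replace=False) for g in groups]
    return np.sort(np.concatenate(idx))

labels = ["a"] * 6 + ["b"] * 2 + ["c"] * 3
idx = balanced_undersample_sketch(labels)
```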

preprocessing.highly_variable_features

Annotate highly variable features.

preprocessing.winsorize

Returns a Winsorized version of the input array.
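
Winsorizing replaces the most extreme values with the nearest value that is kept, instead of dropping them. The following NumPy sketch uses rank-based limits: a fraction `limits[0]` of the values at the low end and `limits[1]` at the high end are replaced. The function name and the default limits are hypothetical.

```python
import numpy as np

def winsorize_sketch(x, limits=(0.1, 0.1)):
    """Return a winsorized copy of a 1-D array."""
    x = np.asarray(x, dtype=float)
    n = x.size
    s = np.sort(x)
    k_lo = int(limits[0] * n)  # number of values replaced at the low end
    k_hi = int(limits[1] * n)  # number of values replaced at the high end
    # Clip to the smallest and largest values that are kept.
    return np.clip(x, s[k_lo], s[n - k_hi - 1])

print(winsorize_sketch([1, 2, 3, 4, 5, 6, 7, 8, 9, 100]).tolist())
# prints [2.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 9.0]
```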

preprocessing.clip_quantile

Clips (limits) features.

preprocessing.summarize_measurements

Summarizes numerical measurements into minimum, maximum and average values.
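
As an illustration of summarizing repeated measurements, the sketch below assumes a long format with one value per row and a patient identifier, and collapses it to one (minimum, maximum, mean) row per patient. This input layout, the function name and the example IDs are hypothetical; the actual function operates on an AnnData object.

```python
import numpy as np

def summarize_sketch(patient_ids, values):
    """Collapse repeated measurements to one (min, max, mean) row per patient."""
    values = np.asarray(values, dtype=float)
    ids, inverse = np.unique(patient_ids, return_inverse=True)
    n = len(ids)
    mins = np.full(n, np.inf)
    maxs = np.full(n, -np.inf)
    # Unbuffered in-place reductions: each value updates its patient's slot.
    np.minimum.at(mins, inverse, values)
    np.maximum.at(maxs, inverse, values)
    means = np.bincount(inverse, weights=values) / np.bincount(inverse)
    return ids, np.column_stack([mins, maxs, means])

ids, summary = summarize_sketch(["p1", "p1", "p2", "p2", "p2"], [5.0, 7.0, 1.0, 2.0, 6.0])
```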

Quality control

preprocessing.qc_metrics

Calculates various quality control metrics.
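
Missing-value counts are a typical example of such quality control metrics. The sketch below computes absolute and percentage missingness per observation (row) and per feature (column); the metric names in the returned dict are hypothetical and are not guaranteed to match the columns the actual function writes.

```python
import numpy as np

def missingness_metrics_sketch(X):
    """Absolute and percentage missing values per observation and per feature."""
    missing = np.isnan(np.asarray(X, dtype=float))
    obs_abs = missing.sum(axis=1)  # missing values per observation (row)
    var_abs = missing.sum(axis=0)  # missing values per feature (column)
    return {
        "obs_missing_abs": obs_abs,
        "obs_missing_pct": 100 * obs_abs / missing.shape[1],
        "var_missing_abs": var_abs,
        "var_missing_pct": 100 * var_abs / missing.shape[0],
    }

m = missingness_metrics_sketch([[1, np.nan], [np.nan, np.nan], [3, 4]])
```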

preprocessing.qc_lab_measurements

Examines lab measurements for reference ranges and outliers.

preprocessing.mcar_test

Statistical hypothesis test for Missing Completely At Random (MCAR).

preprocessing.detect_bias

Detects biases in the data using feature correlations, standardized mean differences, and feature importances.
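
One of the quantities named above, the standardized mean difference (SMD), compares a feature between two groups in units of their pooled standard deviation. A minimal sketch, assuming the common pooled-SD definition $(\bar{x}_a - \bar{x}_b)/\sqrt{(s_a^2 + s_b^2)/2}$; the exact definition used by `detect_bias` may differ, and the function name is hypothetical.

```python
import numpy as np

def standardized_mean_difference(x, group):
    """SMD of feature x between two groups given as a boolean mask."""
    x = np.asarray(x, dtype=float)
    group = np.asarray(group, dtype=bool)
    a, b = x[group], x[~group]
    # Pooled SD from the two sample variances (ddof=1).
    pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
    return (a.mean() - b.mean()) / pooled_sd

smd = standardized_mean_difference([1, 2, 3, 3, 4, 5], [True, True, True, False, False, False])
```

Here both groups have sample variance 1 and their means differ by 2, so the SMD is -2.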

Imputation

preprocessing.explicit_impute

Replaces missing values in all columns, or in a user-specified subset of columns, with the given replacement value.

preprocessing.simple_impute

Impute missing values in numerical data using mean/median/most frequent imputation.
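
Mean and median imputation fill each missing entry with a statistic of its column computed over the observed values. The NumPy sketch below covers only these two strategies, not most-frequent; the function name is hypothetical.

```python
import numpy as np

def simple_impute_sketch(X, strategy="mean"):
    """Fill NaNs column-wise with the column mean or median of the observed values."""
    X = np.array(X, dtype=float)  # copy, so the caller's array is not modified
    # nanmean/nanmedian ignore NaNs; an all-NaN column yields NaN (with a warning).
    stat = {"mean": np.nanmean, "median": np.nanmedian}[strategy](X, axis=0)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = stat[cols]
    return X

X = np.array([[1.0, np.nan], [3.0, 4.0], [np.nan, 8.0]])
filled = simple_impute_sketch(X)  # column means are 2 and 6
```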

preprocessing.knn_impute

Imputes missing values in the input AnnData object using k-nearest neighbor imputation.
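
The k-nearest neighbor idea can be sketched as follows: for each missing entry, find the k rows that have that feature observed and are closest on the features both rows share, then average their values. This is a plain NumPy illustration; ehrapy's implementation may compute distances and weights differently. The function name is hypothetical.

```python
import numpy as np

def knn_impute_sketch(X, k=2):
    """Impute each NaN with the mean of that feature over the k nearest donor rows."""
    X = np.asarray(X, dtype=float)
    out = X.copy()
    observed = ~np.isnan(X)
    for i, j in zip(*np.where(~observed)):
        # Donors: rows where feature j is observed (row i itself is excluded).
        donors = np.flatnonzero(observed[:, j])
        dists = []
        for d in donors:
            shared = observed[i] & observed[d]
            if not shared.any():
                dists.append(np.inf)  # nothing to compare on
            else:
                # Root-mean-square difference over the shared observed features.
                dists.append(np.sqrt(np.mean((X[i, shared] - X[d, shared]) ** 2)))
        nearest = donors[np.argsort(dists)[:k]]
        # Average the original (not previously imputed) values of the neighbors.
        out[i, j] = X[nearest, j].mean()
    return out

X = np.array([[1, 2, np.nan], [1, 2, 3], [1, 2.1, 5], [10, 10, 100]])
imputed = knn_impute_sketch(X, k=2)  # rows 1 and 2 are nearest to row 0, so (3 + 5) / 2
```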

preprocessing.miss_forest_impute

Impute data using the MissForest strategy.

preprocessing.mice_forest_impute

Impute data using the miceforest package.

Normalization

preprocessing.log_norm

Apply log normalization.
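
Log normalization compresses right-skewed values. A minimal sketch, assuming the form $\log(x + \text{offset})$ with the natural logarithm unless another base is given; the parameter names and defaults here are hypothetical and may not match the actual function.

```python
import numpy as np

def log_norm_sketch(X, base=None, offset=1.0):
    """Return log(X + offset), in the given base (natural log by default)."""
    X = np.asarray(X, dtype=float)
    if np.any(X + offset <= 0):
        # The log is undefined here; negative data would have to be shifted first.
        raise ValueError("all values must be greater than -offset")
    out = np.log(X + offset)
    if base is not None:
        out /= np.log(base)  # change of base
    return out

natural = log_norm_sketch([[0.0, np.e - 1]])  # ≈ [[0, 1]]
base10 = log_norm_sketch([[9, 99]], base=10)  # ≈ [[1, 2]]
```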

preprocessing.maxabs_norm

Apply max-abs normalization.

preprocessing.minmax_norm

Apply min-max normalization.
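
Min-max normalization rescales each feature to [0, 1] via $(x - \min) / (\max - \min)$. In the NumPy sketch below, constant columns are mapped to 0 rather than divided by zero; that handling is a choice made for the sketch, and the function name is hypothetical.

```python
import numpy as np

def minmax_norm_sketch(X):
    """Scale each column of X to [0, 1]."""
    X = np.asarray(X, dtype=float)
    mins, maxs = X.min(axis=0), X.max(axis=0)
    # Avoid division by zero for constant columns (they become all zeros).
    span = np.where(maxs > mins, maxs - mins, 1.0)
    return (X - mins) / span

scaled = minmax_norm_sketch([[1, 5], [3, 5], [5, 5]])  # [[0, 0], [0.5, 0], [1, 0]]
```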

preprocessing.power_norm

Apply power transformation normalization.

preprocessing.quantile_norm

Apply quantile normalization.

preprocessing.robust_scale_norm

Apply robust scaling normalization.
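
Robust scaling centers each feature on its median and divides by its interquartile range (IQR), so a single extreme value barely affects how the other values are scaled. A NumPy sketch assuming the 25th-75th percentile range; the function name is hypothetical.

```python
import numpy as np

def robust_scale_sketch(X):
    """Center each column on its median and scale by its interquartile range."""
    X = np.asarray(X, dtype=float)
    median = np.median(X, axis=0)
    q1, q3 = np.percentile(X, [25, 75], axis=0)
    iqr = np.where(q3 > q1, q3 - q1, 1.0)  # guard against a zero IQR
    return (X - median) / iqr

# The outlier 1000 does not change how the first four values are scaled.
scaled = robust_scale_sketch([[1], [2], [3], [4], [1000]])
```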

preprocessing.scale_norm

Apply scaling normalization.

preprocessing.offset_negative_values

Offsets negative values so that all values become non-negative, with the lowest negative value becoming 0.
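
The offset amounts to subtracting the minimum whenever it is negative. This sketch assumes a single offset for the whole matrix rather than one per feature; the function name is hypothetical.

```python
import numpy as np

def offset_negative_sketch(X):
    """Shift X so that its lowest value becomes 0, if that value is negative."""
    X = np.asarray(X, dtype=float)
    min_value = X.min()
    return X - min_value if min_value < 0 else X

shifted = offset_negative_sketch([[-3, 0], [2, 5]])  # [[0, 3], [5, 8]]
```

Applying this before a log transformation keeps the logarithm defined for all entries.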

Dataset Shift Correction

This section partially overlaps with dataset integration. Note that a simple batch correction method is available via pp.regress_out().

preprocessing.combat

ComBat function for batch effect correction [JLR06], [LJP+17], [Ped12].

Neighbors

preprocessing.neighbors

Compute a neighborhood graph of observations [MHM18].
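
A neighborhood graph links each observation to its nearest observations. The NumPy sketch below builds only a raw, directed k-nearest-neighbor distance graph with brute-force Euclidean distances; the actual function may use other metrics and approximate search, and may additionally derive weighted connectivities from these distances. The function name is hypothetical.

```python
import numpy as np

def knn_graph_sketch(X, n_neighbors=3):
    """(n_obs x n_obs) matrix holding, in each row, the distances to that row's
    n_neighbors nearest observations, and 0 elsewhere."""
    X = np.asarray(X, dtype=float)
    # Pairwise Euclidean distances via broadcasting, shape (n_obs, n_obs).
    d = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    np.fill_diagonal(d, np.inf)  # an observation is not its own neighbor
    nn = np.argsort(d, axis=1)[:, :n_neighbors]
    graph = np.zeros_like(d)
    rows = np.arange(X.shape[0])[:, None]
    graph[rows, nn] = d[rows, nn]
    return graph

graph = knn_graph_sketch([[0], [1], [3], [10]], n_neighbors=1)
```

The graph is not symmetric in general: observation 3 has observation 2 as its nearest neighbor, but not the other way around.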