ehrapy.tools.rank_features_supervised

ehrapy.tools.rank_features_supervised(adata, predicted_feature, model='regression', input_features='all', layer=None, test_split_size=0.2, key_added='feature_importances', feature_scaling='standard', percent_output=False, **kwargs)

Calculate feature importances for predicting a specified feature in adata.var.

Parameters:
  • adata (AnnData) – AnnData object storing the data.

  • predicted_feature (str) – The feature to predict by the model. Must be present in adata.var_names.

  • model (Literal['regression', 'svm', 'rf']) – The model to use for prediction. Choose between ‘regression’, ‘svm’, or ‘rf’. Note that multi-class classification is only possible with ‘rf’. Defaults to ‘regression’.

  • input_features (Union[Iterable[str], Literal['all']]) – The features in adata.var to use for prediction. Should be a list of feature names. If ‘all’, all features in adata.var will be used. Note that non-numeric input features will cause an error, so encode them before calling this function. Defaults to ‘all’.

  • layer (str | None) – The layer in adata.layers to use for prediction. If None, adata.X will be used. Defaults to None.

  • test_split_size (float) – The proportion of the data held out for testing the model, given as a float between 0 and 1. Defaults to 0.2.

  • key_added (str) – The key in adata.var to store the feature importances. Defaults to ‘feature_importances’.

  • feature_scaling (Optional[Literal['standard', 'minmax']]) – The type of feature scaling to apply to the input. Choose between ‘standard’, ‘minmax’, or None. ‘standard’ uses sklearn’s StandardScaler, ‘minmax’ uses MinMaxScaler. The scaler is fit and applied to each feature individually. Defaults to ‘standard’.

  • percent_output (bool) – Set to True to output the feature importances as percentages. Note that information about positive or negative coefficients for regression models will be lost. Defaults to False.

  • **kwargs – Additional keyword arguments to pass to the model. See the documentation of the respective model in scikit-learn for details.
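
To make the feature_scaling and percent_output options above more concrete, here is a minimal pure-Python sketch of what per-feature scaling and a percentage conversion could look like. It is not ehrapy's implementation: the helper names (standard_scale, minmax_scale, to_percent) and the sample values are hypothetical, and the percentage formula (absolute value divided by the sum of absolute values) is an assumption that is merely consistent with the note that coefficient signs are lost.

```python
from statistics import fmean, pstdev


def standard_scale(column):
    """Scale one feature to zero mean and unit variance (population std)."""
    mean = fmean(column)
    std = pstdev(column)
    if std == 0:
        # A constant feature carries no spread; map it to zeros.
        return [0.0 for _ in column]
    return [(x - mean) / std for x in column]


def minmax_scale(column):
    """Scale one feature linearly into the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant feature has no range; map it to zeros.
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


def to_percent(importances):
    """Assumed conversion: share of each absolute importance, in percent."""
    total = sum(abs(value) for value in importances.values())
    if total == 0:
        return {name: 0.0 for name in importances}
    return {name: 100 * abs(value) / total for name, value in importances.items()}


# Hypothetical values for a single feature and for two model coefficients.
heart_rate = [60.0, 80.0, 100.0]
coefficients = {"age": 0.5, "heart_rate": -1.5}

scaled_standard = standard_scale(heart_rate)
scaled_minmax = minmax_scale(heart_rate)
percentages = to_percent(coefficients)
```

In this sketch each feature is scaled using only its own values, which mirrors the "each feature individually" wording. Under the assumed formula the negative heart_rate coefficient becomes a positive share, which illustrates why the direction of a regression coefficient cannot be recovered once percent_output=True.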

Examples

>>> import ehrapy as ep
>>> adata = ep.dt.mimic_2(encoded=False)
>>> ep.ad.infer_feature_types(adata)
>>> ep.pp.knn_impute(adata, n_neighbours=5)
>>> input_features = [
...     feat for feat in adata.var_names if feat not in {"service_unit", "day_icu_intime", "tco2_first"}
... ]
>>> ep.tl.rank_features_supervised(adata, "tco2_first", model="rf", input_features=input_features)