ehrapy.tools.rank_features_supervised

ehrapy.tools.rank_features_supervised(adata, predicted_feature, *, model='rf', input_features='all', layer=None, test_split_size=0.2, key_added='feature_importances', feature_scaling='standard', percent_output=False, verbose=True, return_score=False, **kwargs)

Calculate feature importances for predicting a specified feature in adata.var.

Parameters:
  • adata (AnnData) – AnnData object storing the data.

  • predicted_feature (str) – The feature to predict by the model. Must be present in adata.var_names.

  • model (Literal['regression', 'svm', 'rf'], default: 'rf') – The model to use for prediction. Choose between ‘regression’, ‘svm’, or ‘rf’. Note that multi-class classification is only possible with ‘rf’.

  • input_features (Union[Iterable[str], Literal['all']], default: 'all') – The features in adata.var to use for prediction. Should be a list of feature names. If ‘all’, all features in adata.var will be used. Non-numeric input features will cause an error, so encode them properly beforehand.

  • layer (str | None, default: None) – The layer in adata.layers to use for prediction. If None, adata.X will be used.

  • test_split_size (float, default: 0.2) – The proportion of the data held out for testing the model. Should be a float between 0 and 1.

  • key_added (str, default: 'feature_importances') – The key in adata.var to store the feature importances.

  • feature_scaling (Optional[Literal['standard', 'minmax']], default: 'standard') – The type of feature scaling to apply to the input. Choose between ‘standard’, ‘minmax’, or None. ‘standard’ uses sklearn’s StandardScaler, ‘minmax’ uses MinMaxScaler. The scaler is fit and applied to each feature individually.

  • percent_output (bool, default: False) – Set to True to output the feature importances as percentages. Note that information about positive or negative coefficients for regression models will be lost.

  • verbose (bool, default: True) – Set to False to disable logging.

  • return_score (bool, default: False) – Set to True to return the model’s score on the test set: the R2 score or the accuracy.

  • **kwargs – Additional keyword arguments to pass to the model. See the documentation of the respective model in scikit-learn for details.
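The feature_scaling and percent_output options described above can be illustrated with a small standard-library sketch. This is not ehrapy's implementation: the helper names (standard_scale, minmax_scale, to_percent) and the toy values are hypothetical, and the percentage conversion assumes that importances are normalised by the sum of their absolute values, which is one plausible reading of why the sign of regression coefficients is lost.

```python
from statistics import fmean, pstdev


def standard_scale(column):
    """Centre to zero mean and divide by the population standard deviation."""
    mean = fmean(column)
    std = pstdev(column) or 1.0  # constant column: avoid dividing by zero
    return [(value - mean) / std for value in column]


def minmax_scale(column):
    """Rescale linearly so the smallest value maps to 0 and the largest to 1."""
    low, high = min(column), max(column)
    span = (high - low) or 1.0  # constant column: avoid dividing by zero
    return [(value - low) / span for value in column]


def to_percent(importances):
    """Express absolute importances as shares of their total, in percent.

    Assumption: normalisation by the sum of absolute values.
    """
    total = sum(abs(value) for value in importances.values())
    return {name: 100 * abs(value) / total for name, value in importances.items()}


# Hypothetical toy data: two features on very different scales.
features = {"age": [20.0, 40.0, 60.0], "lactate": [1.0, 2.0, 3.0]}

# Each feature is scaled on its own, independently of the others.
scaled = {name: minmax_scale(column) for name, column in features.items()}

# Hypothetical regression coefficients; "lactate" has a negative sign.
coefficients = {"age": 3.0, "lactate": -1.0}
percentages = to_percent(coefficients)
```

After scaling, both toy features span the same range even though their raw units differ. In percentages, the negative coefficient appears only as a positive share, so its direction of effect can no longer be read off.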

Return type:

float | None

Returns:

If return_score is True, the R2 score / accuracy of the model on the test set. Otherwise, None.
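For orientation, the two kinds of score mentioned above can be computed by hand as follows. This sketch assumes the returned value follows the usual scikit-learn convention, where a regressor's score is the coefficient of determination (R2) and a classifier's score is the fraction of correct predictions. The function names and toy values are hypothetical and only illustrate the definitions.

```python
def r2(y_true, y_pred):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels exactly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical held-out targets and predictions for a continuous feature.
regression_score = r2([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0])

# Hypothetical held-out labels and predictions for a categorical feature.
classification_score = accuracy(["a", "b", "a", "b"], ["a", "b", "b", "b"])
```

R2 is at most 1 and can be negative when the model predicts worse than the mean of the targets, whereas accuracy always lies between 0 and 1, so the two scores are not directly comparable.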

Examples

>>> import ehrapy as ep
>>> adata = ep.dt.mimic_2(encoded=False)
>>> ep.ad.infer_feature_types(adata)
>>> ep.pp.knn_impute(adata, n_neighbors=5)
>>> input_features = [
...     feat for feat in adata.var_names if feat not in {"service_unit", "day_icu_intime", "tco2_first"}
... ]
>>> ep.tl.rank_features_supervised(adata, "tco2_first", model="rf", input_features=input_features)