ehrapy.tools.rank_features_supervised

ehrapy.tools.rank_features_supervised(edata, predicted_feature, *, model='rf', input_features='all', layer=None, test_split_size=0.2, key_added='feature_importances', feature_scaling='standard', percent_output=False, verbose=True, return_score=False, **kwargs)

Calculate feature importances for predicting a specified feature in edata.var.

Parameters:

edata (EHRData | AnnData) – Central data object.
predicted_feature (str) – The feature to predict by the model. Must be present in edata.var_names.
model (Literal['regression', 'svm', 'rf'], default: 'rf') – The model to use for prediction. Choose between ‘regression’, ‘svm’, or ‘rf’. Multi-class classification is only possible with ‘rf’.
input_features (Iterable[str] | Literal['all'], default: 'all') – The features in edata.var to use for prediction, given as a list of feature names. If ‘all’, all features in edata.var are used. Non-numeric input features raise an error.
layer (str | None, default: None) – The layer in edata.layers to use for prediction. If None, edata.X will be used.
test_split_size (float, default: 0.2) – The proportion of the data held out for testing the model, given as a float between 0 and 1.
key_added (str, default: 'feature_importances') – The key in edata.var to store the feature importances.
feature_scaling (Literal['standard', 'minmax'] | None, default: 'standard') – The type of feature scaling to apply to the input. Choose between ‘standard’, ‘minmax’, or None. ‘standard’ uses sklearn’s StandardScaler, ‘minmax’ uses MinMaxScaler. The scaler is fit and applied to each feature individually.
percent_output (bool, default: False) – Set to True to output the feature importances as percentages. Note that information about positive or negative coefficients for regression models will be lost.
verbose (bool, default: True) – Set to False to disable logging.
return_score (bool, default: False) – Set to True to return the score of the model: the R2 score or the accuracy.
**kwargs – Additional keyword arguments to pass to the model. See the documentation of the respective model in scikit-learn for details.
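
To make the two feature_scaling options concrete, here is a minimal sketch in plain Python of what they do to a single feature column. It follows the formulas behind scikit-learn's StandardScaler (zero mean, unit population variance) and MinMaxScaler (rescale to the range 0 to 1) rather than calling ehrapy itself; the helper names standard_scale and minmax_scale and the sample values are hypothetical.

```python
from statistics import fmean, pstdev


def standard_scale(column):
    """Centre to zero mean and divide by the population standard deviation."""
    mean = fmean(column)
    std = pstdev(column) or 1.0  # constant column: avoid dividing by zero
    return [(x - mean) / std for x in column]


def minmax_scale(column):
    """Rescale linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(column), max(column)
    span = (hi - lo) or 1.0  # constant column: avoid dividing by zero
    return [(x - lo) / span for x in column]


ages = [20.0, 40.0, 60.0, 80.0]  # made-up values for one feature
scaled_standard = standard_scale(ages)
scaled_minmax = minmax_scale(ages)
```

Because each column is scaled on its own, features measured in very different units end up on comparable ranges before the model is fit.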

Return type:

float | None

Returns:

If return_score is True, the score of the model on the test set (the R2 score or the accuracy). Otherwise, None.
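
For reference, the two kinds of score can be sketched in plain Python as follows. These are the standard textbook definitions of R2 and classification accuracy, not ehrapy's internal code; presumably R2 applies to a continuous target and accuracy to a categorical one. The function names and toy values are made up for illustration.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Assumes y_true is not constant (otherwise SS_tot would be zero).
    """
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy test-set values: one regression target, one classification target.
r2 = r2_score([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 3.0])
acc = accuracy(["a", "b", "a", "b"], ["a", "b", "b", "b"])
```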

Examples

>>> import ehrdata as ed
>>> import ehrapy as ep
>>> edata = ed.dt.mimic_2()
>>> ed.infer_feature_types(edata)
>>> ep.pp.knn_impute(edata, n_neighbors=5)
>>> input_features = [
...     feat for feat in edata.var_names if feat not in {"service_unit", "day_icu_intime", "tco2_first"}
... ]
>>> ep.tl.rank_features_supervised(edata, "tco2_first", model="rf", input_features=input_features)
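
The percent_output note about lost signs can be illustrated with a small standalone sketch. It assumes that percentages are obtained by normalising absolute importances so that they sum to 100; this is an assumption about the implementation, not something stated on this page. The helper to_percent, the feature names, and the coefficient values are hypothetical.

```python
def to_percent(importances):
    """Normalise absolute importances so they sum to 100 (assumed behaviour)."""
    total = sum(abs(v) for v in importances.values())
    return {name: abs(v) / total * 100 for name, v in importances.items()}


# Hypothetical regression coefficients; the sign gives the direction of the effect.
coefficients = {"feature_a": 0.6, "feature_b": -0.3, "feature_c": 0.1}
percentages = to_percent(coefficients)
```

After the conversion, feature_b contributes a positive share like every other feature, so the fact that its coefficient was negative can no longer be read from the output.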
