ehrapy.tools.glm

ehrapy.tools.glm(edata, var_names=None, formula=None, *, family='Gaussian', use_feature_types=False, missing='none', as_continuous=None, layer=None)

Create a Generalized Linear Model (GLM) from a formula, a distribution, and the data object.

See https://www.statsmodels.org/stable/generated/statsmodels.formula.api.glm.html#statsmodels.formula.api.glm

Parameters:

edata (EHRData) – Central data object.
var_names (Iterable[str] | None, default: None) – Names of the variables (columns) to use in the GLM.
formula (str | None, default: None) – The formula specifying the model.
family (Literal['Gaussian', 'Binomial', 'Gamma', 'InverseGaussian'], default: 'Gaussian') – The distribution family. Available options are ‘Gaussian’, ‘Binomial’, ‘Gamma’, and ‘InverseGaussian’.
use_feature_types (bool, default: False) – If True, the feature types stored in the data object's .var are used.
missing (Literal['none', 'drop', 'raise'], default: 'none') – Available options are ‘none’, ‘drop’, and ‘raise’. If ‘none’, no NaN checking is done. If ‘drop’, any observations containing NaNs are dropped. If ‘raise’, an error is raised.
as_continuous (Iterable[str] | None, default: None) – Names of the variables (columns) to treat as continuous rather than categorical. The corresponding columns are cast to float.
layer (str | None, default: None) – The layer to use.
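
The three `missing` options can be illustrated with a small pure-Python sketch. This is not how ehrapy or statsmodels implement the check; `handle_missing` is a hypothetical helper written only to mirror the behaviour described above, and the use of `ValueError` for the ‘raise’ case is an assumption.

```python
import math


def handle_missing(rows, missing="none"):
    """Hypothetical helper mirroring the documented 'none' / 'drop' / 'raise' options."""

    def has_nan(row):
        # A row counts as incomplete if any of its float values is NaN.
        return any(isinstance(v, float) and math.isnan(v) for v in row.values())

    if missing == "none":
        return rows  # no NaN checking at all
    if missing == "drop":
        return [row for row in rows if not has_nan(row)]
    if missing == "raise":
        if any(has_nan(row) for row in rows):
            # Assumption: the real exception type may differ.
            raise ValueError("NaN values found in the data")
        return rows
    raise ValueError(f"unknown option for missing: {missing!r}")


# Made-up observations; the second one has a missing age.
rows = [
    {"day_28_flg": 0.0, "age": 71.2},
    {"day_28_flg": 1.0, "age": float("nan")},
    {"day_28_flg": 0.0, "age": 45.0},
]

kept = handle_missing(rows, missing="drop")
print(len(kept))  # 2
```

With ‘none’, the incomplete observation would be passed on unchanged, so NaNs may surface later during model fitting.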

Return type:

GLM

Returns:

The GLM model instance.

Examples

>>> import ehrdata as ed
>>> import ehrapy as ep
>>> edata = ed.dt.mimic_2()
>>> formula = "day_28_flg ~ age"
>>> var_names = ["day_28_flg", "age"]
>>> family = "Binomial"
>>> glm = ep.tl.glm(edata, var_names, formula, family=family, missing="drop", as_continuous=["age"])
