ehrapy.tools.glm

ehrapy.tools.glm(adata, var_names=None, formula=None, family='Gaussian', missing='none', as_continuous=None)

Create a Generalized Linear Model (GLM) from a formula, a distribution family, and an AnnData object.

See https://www.statsmodels.org/stable/generated/statsmodels.formula.api.glm.html#statsmodels.formula.api.glm

Parameters:
  • adata (AnnData) – The AnnData object containing the data to model.

  • var_names (Iterable[str] | None) – The names of the variables (columns) to use in the GLM.

  • formula (str | None) – The formula specifying the model, for example "day_28_flg ~ age".

  • family (Literal['Gaussian', 'Binomial', 'Gamma', 'InverseGaussian']) – The distribution family. Available options are ‘Gaussian’, ‘Binomial’, ‘Gamma’, and ‘InverseGaussian’. Defaults to ‘Gaussian’.

  • missing (Literal['none', 'drop', 'raise']) – How missing values are handled. Available options are ‘none’, ‘drop’, and ‘raise’. If ‘none’, no NaN checking is done. If ‘drop’, any observations with NaNs are dropped. If ‘raise’, an error is raised. Defaults to ‘none’.

  • as_continuous (Iterable[str] | None) – The names of the variables (columns) to treat as continuous rather than categorical. The corresponding columns are cast to float.
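
The semantics of var_names, as_continuous, and missing described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not ehrapy's implementation: the helper names prepare_rows and _is_missing, the column extra_col, and the toy rows are all hypothetical, and the sketch treats None like NaN for simplicity.

```python
import math


def _is_missing(value):
    """Treat None and float NaN as missing (an assumption of this sketch)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def prepare_rows(rows, var_names=None, as_continuous=None, missing="none"):
    """Hypothetical helper: select columns, cast to float, apply a missing policy."""
    if missing not in ("none", "drop", "raise"):
        raise ValueError(f"unknown missing policy: {missing!r}")
    continuous = set(as_continuous or ())
    prepared = []
    for row in rows:
        # Keep only the requested columns (all columns if var_names is None).
        keys = list(var_names) if var_names is not None else list(row)
        new_row = {}
        for key in keys:
            value = row[key]
            # Columns listed in as_continuous are cast to float.
            if key in continuous and value is not None:
                value = float(value)
            new_row[key] = value
        has_missing = any(_is_missing(v) for v in new_row.values())
        if has_missing and missing == "raise":
            raise ValueError("missing values found")
        if has_missing and missing == "drop":
            continue  # drop the whole observation
        prepared.append(new_row)  # with 'none', nothing is checked
    return prepared


# Toy observations, made up for illustration only.
rows = [
    {"day_28_flg": 0, "age": "72", "extra_col": "a"},
    {"day_28_flg": 1, "age": float("nan"), "extra_col": "b"},
    {"day_28_flg": 0, "age": 55, "extra_col": "c"},
]

kept = prepare_rows(rows, ["day_28_flg", "age"], ["age"], missing="drop")
print(kept)
```

With missing="drop" the second observation is discarded because its age is NaN; with missing="raise" the same input would raise an error, and with missing="none" all three rows pass through unchecked.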

Return type:

GLM

Returns:

The GLM model instance.

Examples

>>> import ehrapy as ep
>>> adata = ep.dt.mimic_2(encoded=False)
>>> formula = "day_28_flg ~ age"
>>> var_names = ["day_28_flg", "age"]
>>> family = "Binomial"
>>> glm = ep.tl.glm(adata, var_names, formula, family, missing="drop", as_continuous=["age"])