ehrapy.tools.glm

ehrapy.tools.glm(adata, var_names=None, formula=None, family='Gaussian', missing='none', as_continuous=None)

Create a Generalized Linear Model (GLM) from a formula, a distribution, and AnnData.

Internally, this uses statsmodels to create a GLM from a formula, a distribution family, and a DataFrame. See https://www.statsmodels.org/stable/generated/statsmodels.formula.api.glm.html#statsmodels.formula.api.glm

Parameters:
  • adata (AnnData) – The AnnData object for the GLM model.

  • var_names (Optional[Iterable[str]]) – A list of var names indicating which columns are used in the GLM.

  • formula (Optional[str]) – The formula specifying the model.

  • family (Literal['Gaussian', 'Binomial', 'Gamma', 'InverseGaussian']) – The distribution family. Available options are ‘Gaussian’, ‘Binomial’, ‘Gamma’, and ‘InverseGaussian’. Defaults to ‘Gaussian’.

  • missing (Literal['none', 'drop', 'raise']) – Available options are ‘none’, ‘drop’, and ‘raise’. If ‘none’, no nan checking is done. If ‘drop’, any observations with nans are dropped. If ‘raise’, an error is raised (default: ‘none’).

  • as_continuous – A list of var names indicating which columns are continuous rather than categorical. The corresponding columns will be set as type float.

Return type:

GLM

Returns:

The GLM model instance.

Examples

>>> import ehrapy as ep
>>> adata = ep.dt.mimic_2(encoded=False)
>>> formula = 'day_28_flg ~ age'
>>> var_names = ['day_28_flg', 'age']
>>> family = 'Binomial'
>>> glm = ep.tl.glm(adata, var_names, formula, family, missing='drop', as_continuous=['age'])