ehrapy.tools.ncp


ehrapy.tools.ncp(edata, *, layer, rank=4, n_iter_max=300, init='random', sigmoid_transform=False, key_added='ncp', random_state=0, copy=False)

Non-negative CP (PARAFAC) decomposition of a 3D temporal layer.

Decomposes the 3D layer (n_obs × n_vars × n_time) into three non-negative factor matrices, one each for samples, variables, and time, with rank columns apiece.

Uses tensorly.decomposition.non_negative_parafac().
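To make the role of the three factor matrices concrete, here is a minimal NumPy sketch of the CP model itself: each tensor entry is approximated by a sum over components of the product of one sample, one variable, and one time weight. The factor matrices below are random stand-ins, not the output of ncp(), and the variable names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
n_obs, n_vars, n_time, rank = 30, 8, 12, 3

# Hypothetical non-negative factor matrices, one per tensor mode.
sample_factors = rng.random((n_obs, rank))     # n_obs x rank
variable_factors = rng.random((n_vars, rank))  # n_vars x rank
temporal_factors = rng.random((n_time, rank))  # n_time x rank

# CP model: tensor[i, j, k] ~ sum_r A[i, r] * B[j, r] * C[k, r]
reconstruction = np.einsum(
    "ir,jr,kr->ijk", sample_factors, variable_factors, temporal_factors
)

print(reconstruction.shape)  # (30, 8, 12)
```

Because every factor entry is non-negative, each component can only add to the reconstruction, which is what tends to make the components readable as additive parts.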

Parameters:
  • edata (EHRData | AnnData) – Central data object.

  • layer (str) – Key of the 3D layer to decompose (shape n_obs × n_vars × n_time).

  • rank (int, default: 4) – Number of components (rank of the decomposition).

  • n_iter_max (int, default: 300) – Maximum number of iterations of the decomposition algorithm.

  • init (str, default: 'random') – Initialisation strategy passed to non_negative_parafac() ("random" or "svd").

  • sigmoid_transform (bool, default: False) – If True, apply a sigmoid transformation to the layer before decomposition. Useful when the layer contains raw logits.

  • key_added (str, default: 'ncp') – Key prefix for storing results. Results are stored as edata.obsm["X_{key_added}"] (sample factors, shape n_obs × rank), edata.varm["{key_added}_loadings"] (variable factors, shape n_vars × rank), and edata.uns["{key_added}"] (temporal factors + metadata).

  • random_state (int, default: 0) – Random seed for reproducibility.

  • copy (bool, default: False) – Whether to return a copy rather than modifying in place.
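As an illustration of why sigmoid_transform helps with logits, the sketch below applies the standard logistic function to a few values. It assumes the option uses the usual sigmoid 1 / (1 + exp(-x)); the exact implementation inside ehrapy is not shown in this page, and the helper name sigmoid is hypothetical.

```python
import numpy as np


def sigmoid(x):
    """Standard logistic function, mapping real values into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


# Raw logits can be negative, which a non-negative decomposition cannot represent.
logits = np.array([-4.0, -1.0, 0.0, 1.0, 4.0])
probs = sigmoid(logits)

print(probs.min() > 0.0, probs.max() < 1.0)
```

After the transform every entry lies strictly between 0 and 1, so the non-negativity constraint on the factors is compatible with the data.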

Return type:

EHRData | AnnData | None

Returns:

None if copy=False, else a modified copy of edata.

Examples

>>> import numpy as np, pandas as pd
>>> import ehrdata as ed, ehrapy as ep
>>> np.random.seed(0)
>>> tensor = np.abs(np.random.randn(30, 8, 12))  # patients × vars × time
>>> edata = ed.EHRData(
...     shape=(30, 8),
...     layers={"data": tensor},
...     var=pd.DataFrame(index=[f"var_{i}" for i in range(8)]),
... )
>>> ep.tl.ncp(edata, layer="data", rank=3)
>>> edata.obsm["X_ncp"].shape  # (30, 3)  – sample factors
>>> edata.varm["ncp_loadings"].shape  # (8, 3)   – variable factors
>>> edata.uns["ncp"]["temporal_factors"].shape  # (12, 3) – time factors
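One possible way to read the variable factors is to list, for each component, the variables with the largest loadings. The sketch below uses a random array as a stand-in for edata.varm["ncp_loadings"]; the helper top_variables is hypothetical and not part of ehrapy.

```python
import numpy as np

var_names = [f"var_{i}" for i in range(8)]
rng = np.random.default_rng(0)
# Stand-in for edata.varm["ncp_loadings"], shape n_vars x rank.
loadings = rng.random((8, 3))


def top_variables(loadings, names, k=3):
    """Return the k highest-loading variable names for each component."""
    top = {}
    for component in range(loadings.shape[1]):
        # Sort descending by loading and keep the first k indices.
        order = np.argsort(loadings[:, component])[::-1][:k]
        top[component] = [names[i] for i in order]
    return top


top = top_variables(loadings, var_names)
for component, names in top.items():
    print(component, names)
```

The same ranking idea could be applied to the sample factors in edata.obsm["X_ncp"] to find the patients most strongly associated with a component.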