Lineage tree

This example shows how to pass lineage trees to moscot. This is particularly useful for the LineageProblem, which requires lineage information. See moslin [Lange et al., 2023] for examples on real-world data.

moscot accepts lineage information in one of three forms:

  1. precomputed cost matrices,

  2. barcode information,

  3. a lineage tree given as a DiGraph.

In this notebook, we consider the lineage tree case.

Imports and data loading

from moscot import datasets
from moscot.problems.time import LineageProblem

Simulate data using simulate_data().

adata = datasets.simulate_data(n_distributions=3, key="day", quad_term="tree")
adata
AnnData object with n_obs × n_vars = 60 × 60
    obs: 'day', 'celltype'
    uns: 'trees'

We assume the trees are saved in uns as a dict, where each key is a time point (a value of obs["day"]) and each value is the DiGraph for the cells at that time point.

adata.uns["trees"]
{0: <networkx.classes.digraph.DiGraph at 0x7fb4786c9ae0>,
 1: <networkx.classes.digraph.DiGraph at 0x7fb4786c9720>,
 2: <networkx.classes.digraph.DiGraph at 0x7fb4786c94b0>}
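To make the expected layout concrete, here is a minimal sketch of how such a dict could be assembled by hand with networkx. The helper make_toy_tree, the node name "root" and the cell names are hypothetical and only serve as illustration. The sketch also assumes that the leaves of each tree are labelled with the names of the cells at the corresponding time point.

```python
import networkx as nx


def make_toy_tree(cell_names):
    """Hypothetical helper: attach every cell as a leaf below a single root."""
    tree = nx.DiGraph()
    tree.add_edges_from(("root", name) for name in cell_names)
    return tree


# One tree per time point; the keys mirror the values of the time key in obs.
trees = {
    0: make_toy_tree(["cell_0", "cell_1"]),
    1: make_toy_tree(["cell_2", "cell_3"]),
}

# Every value is a directed tree with edges pointing away from the root.
for time_point, tree in trees.items():
    print(time_point, nx.is_arborescence(tree), sorted(tree.nodes))
```

A dict like this could then be stored under a key in uns, in the same way as the simulated "trees" entry above.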

Leaf distance

Now we can instantiate the LineageProblem and prepare it. The lineage_attr argument specifies where the trees are stored and which cost to compute from them.

lp = LineageProblem(adata)
lp = lp.prepare(
    time_key="day",
    lineage_attr={"attr": "uns", "key": "trees", "cost": "leaf_distance"},
)
INFO     Computing pca with `n_comps=30` for `xy` using `adata.X`                                                  
INFO     Computing pca with `n_comps=30` for `xy` using `adata.X`                                                  

Internally, cost matrices have been computed from the trees using the shortest path distance between the leaves. Let us investigate the first few entries of the cost matrix computed from the first lineage tree.

lp[0, 1].x.data_src[:3, :3]
array([[0., 2., 3.],
       [2., 0., 3.],
       [3., 3., 0.]])

Similarly, we investigate parts of the cost matrix created from the second tree.

lp[0, 1].y.data_src[:3, :3]
array([[0., 2., 3.],
       [2., 0., 3.],
       [3., 3., 0.]])
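The shortest path distance between leaves can be illustrated with plain networkx. The following is a minimal sketch, not moscot's internal implementation: it builds a small hypothetical tree, treats the edges as undirected with unit length, and collects the pairwise path lengths between the leaves. The node names are made up for this example.

```python
import networkx as nx
import numpy as np

# Hypothetical toy tree: cell_0 and cell_1 share the parent "a",
# while cell_2 hangs directly off the root.
tree = nx.DiGraph()
tree.add_edges_from(
    [("root", "a"), ("a", "cell_0"), ("a", "cell_1"), ("root", "cell_2")]
)

# Leaves are the nodes without outgoing edges.
leaves = [node for node in tree.nodes if tree.out_degree(node) == 0]

# Path lengths are measured on the undirected tree, so that two leaves
# are connected through their common ancestor.
lengths = dict(nx.all_pairs_shortest_path_length(tree.to_undirected()))

cost = np.array([[lengths[u][v] for v in leaves] for u in leaves], dtype=float)
print(leaves)
print(cost)
```

In this toy tree, the two sibling leaves are two edges apart, and each of them is three edges away from the leaf attached to the root.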

Note that the gene expression term is still stored as two point clouds, one per time point. Its cost matrix will be computed by the backend.

lp[0, 1].xy.data_src.shape, lp[0, 1].xy.data_tgt.shape
((20, 30), (20, 30))
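As an illustration of what computing a cost matrix from two point clouds means, the sketch below evaluates a squared Euclidean cost between two random arrays of the same shapes as above. The choice of squared Euclidean cost and the random data are assumptions made for this example. The cost actually used, and how the backend evaluates it, depend on the arguments passed to prepare and on the backend itself.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for the source and target point clouds (20 cells, 30 components each).
src = rng.normal(size=(20, 30))
tgt = rng.normal(size=(20, 30))

# Pairwise squared Euclidean distances via broadcasting:
# diff[i, j, :] = src[i, :] - tgt[j, :]
diff = src[:, None, :] - tgt[None, :, :]
cost = np.sum(diff**2, axis=-1)

print(cost.shape)
```

The result has one row per source cell and one column per target cell.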