Skip to contents

When to use this workflow

Use this workflow when your data was generated with a long-read platform (PacBio, Oxford Nanopore) or when you have used a short-read isoform quantifier such as Bagpiper, FLAMES, or LIQA that produces per-transcript per-cell counts. These tools assign each read to a specific full-length transcript rather than counting at the junction level.

If you have STARsolo junction counts from standard 10x Chromium short reads, see the Short-read workflow instead.


What you need

Input Description
Seurat object Already processed: gene-expression normalisation, UMAP, cluster labels.
Transcript count matrix Transcripts x cells matrix of UMI counts. Row names are transcript IDs matching those in your annotation.
SUPPA2 IOE files One or more .ioe files from SUPPA2’s generateEvents command, one per event type (SE, RI, SS, MX, FL). These map transcripts to inclusion/exclusion sets for each splicing event.

Step 1 – Build the Matisse object

Pass transcript_counts and ioe_files together to trigger event mode. PSI is calculated automatically during construction – no separate CalculatePSI() call is needed.

library(Matisse)

# transcript_counts: transcripts x cells sparse matrix (e.g. from Bagpiper)
# ioe_files: SUPPA2 .ioe output files
obj <- CreateMatisseObject(
  seurat            = seu,
  transcript_counts = transcript_counts,
  ioe_files         = c(
    "events_SE.ioe",   # skipped exons
    "events_RI.ioe",   # retained introns
    "events_SS.ioe"    # alternative splice sites
  ),
  min_coverage = 5L
)

Each cell x event PSI entry is the fraction of transcripts carrying the included form of that exon:

PSIc,e=inclusion transcript countsinclusion counts+exclusion countsPSI_{c,e} = \frac{\sum \text{inclusion transcript counts}} {\sum \text{inclusion counts} + \sum \text{exclusion counts}}

Cells with fewer than min_coverage total transcript counts for a given event are left as NA.


Step 2 – Summarise and quality control

Summarise the PSI distribution across events, then compute per-cell QC metrics before filtering.

# Per-event summary: mean PSI, median PSI, variance, cell coverage
psi_summary <- SummarizePSI(obj)
head(psi_summary)
obj <- ComputeIsoformQC(obj)
PlotQCMetrics(obj, group_by = "cell_type")

Remove low-coverage cells and uninformative events:

obj <- FilterCells(obj, min_pct_covered = 10)
obj <- FilterEvents(obj, min_cells_covered = 20, min_psi_variance = 0.01)

Step 3 – Normalise transcript counts

For long-read data the transcript-level count matrix often has high technical variability. SCTransform() on a Matisse object in event mode automatically targets the "transcript" assay and then runs PCA – useful because isoform variation is subtler than gene-expression variation and benefits from a wider PCA space.

# Normalise, regress out sequencing depth, and run PCA in one step.
# Increase n_pca_dims if clusters look under-resolved.
obj <- SCTransform(obj, n_pca_dims = 50)

Step 4 – Cluster and embed

Standard Seurat clustering functions work directly on the Matisse object. The PSI data and all other slots are preserved throughout.

obj <- RunUMAP(obj, dims = 1:50)
obj <- FindNeighbors(obj, dims = 1:50)
obj <- FindClusters(obj, resolution = 0.5)

Step 5 – Visualise splicing patterns

Overlay PSI on the UMAP

Each dot is a cell coloured by its PSI value for one event: blue = exon skipped (low PSI), red = exon included (high PSI).

PlotUMAP(
  obj,
  feature = "SE:chr18:3433647-3436055:+",
  title   = "PTBP1 exon 9 -- PSI per cell"
)

Compare splicing between cell types

PlotViolin(
  obj,
  feature  = "SE:chr18:3433647-3436055:+",
  group_by = "cell_type"
)

Survey all events at once

PlotHeatmap(obj, group_by = "seurat_clusters", max_cells = 400)

Note on event IDs: SUPPA2 event IDs use the format TYPE:chr:coords:strand (e.g. SE:chr18:100-200:300-400:+). These become the row names in GetPSI(obj), so use the same string when calling PlotUMAP() or subsetting.


Accessing data at any point

# Extract the underlying Seurat object
seu <- GetSeurat(obj)

# Cell metadata via $ (forwarded from Seurat)
obj$seurat_clusters
obj$cell_type

# Full PSI matrix as a sparse matrix (cells x events)
psi <- GetPSI(obj)

# Raw transcript counts (transcripts x cells)
tx <- GetTranscriptCounts(obj)

# Subset to one cell type -- PSI and gene expression stay in sync
neurons <- obj[obj$cell_type == "Neuron", ]

Session info

sessionInfo()
#> R version 4.5.3 (2026-03-11)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.4 LTS
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0
#> 
#> locale:
#>  [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8       
#>  [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8   
#>  [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C          
#> [10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   
#> 
#> time zone: UTC
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> loaded via a namespace (and not attached):
#>  [1] digest_0.6.39     desc_1.4.3        R6_2.6.1          fastmap_1.2.0    
#>  [5] xfun_0.57         cachem_1.1.0      knitr_1.51        htmltools_0.5.9  
#>  [9] rmarkdown_2.31    lifecycle_1.0.5   cli_3.6.6         sass_0.4.10      
#> [13] pkgdown_2.2.0     textshaping_1.0.5 jquerylib_0.1.4   systemfonts_1.3.2
#> [17] compiler_4.5.3    tools_4.5.3       ragg_1.5.2        bslib_0.10.0     
#> [21] evaluate_1.0.5    yaml_2.3.12       otel_0.2.0        jsonlite_2.0.0   
#> [25] rlang_1.2.0       fs_2.1.0          htmlwidgets_1.6.4