STACAS is a method for scRNA-seq integration or batch effect correction. Through an open-source benchmark, we showed that STACAS outperforms competing methods such as Harmony, Seurat, and scVI/scANVI.
Prior cell type knowledge, given as cell type labels, can be provided to the algorithm to perform semi-supervised integration, leading to increased preservation of biological variability in the resulting integrated space.
STACAS scales well to large datasets and is robust to incomplete and imprecise input cell type labels, which are commonly encountered in real-life integration tasks.
To install STACAS directly from the Git repository, run the following code from within RStudio:
if (!requireNamespace("remotes")) install.packages("remotes")
library(remotes)
remotes::install_github("carmonalab/STACAS")
Standard integration (more here)
library(STACAS)
# get the test dataset "pbmcsca" from SeuratData package
if (!requireNamespace("remotes")) install.packages("remotes")
if (!requireNamespace("SeuratData")) remotes::install_github('satijalab/seurat-data')
library(SeuratData)
library(Seurat)
# raise the download timeout (in seconds), since the dataset download can be slow
options(timeout = 3000)
InstallData("pbmcsca")
data("pbmcsca")
pbmcsca <- UpdateSeuratObject(pbmcsca)
# Integrate scRNA-seq datasets generated in different batches (in this example, using different methods/technologies)
pbmcsca.integrated <- NormalizeData(pbmcsca) |>
  SplitObject(split.by = "Method") |>
  Run.STACAS()
pbmcsca.integrated <- RunUMAP(pbmcsca.integrated, dims = 1:30)
# Visualize
DimPlot(pbmcsca.integrated, group.by = c("Method", "CellType"))
Semi-supervised integration (more here)
# Provide cell type labels (here the "CellType" metadata column) to guide the integration
pbmcsca.semisup <- NormalizeData(pbmcsca) |>
  SplitObject(split.by = "Method") |>
  Run.STACAS(cell.labels = "CellType")
pbmcsca.semisup <- RunUMAP(pbmcsca.semisup, dims = 1:30)
Find a tutorial for STACAS in a complete Seurat integration pipeline at: STACAS demo (code and instructions here)
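Since cell type annotations are often incomplete in practice, it can help to see what a partially labeled metadata column might look like. The sketch below uses base R only and hides a random fraction of labels as NA. The helper `mask_labels()`, the toy label vector, and the column name `CellType.partial` mentioned afterwards are hypothetical names introduced for illustration. It is also an assumption, to be checked against `?Run.STACAS`, that unannotated cells should be encoded as NA.

```r
# Hypothetical helper (not part of STACAS): replace a random fraction of
# labels with NA to mimic an incompletely annotated dataset.
# Note: set.seed() changes the global RNG state of the R session.
mask_labels <- function(labels, frac = 0.2, seed = 42) {
  stopifnot(frac >= 0, frac <= 1)
  labels <- as.character(labels)            # also handles factor input
  set.seed(seed)
  n_mask <- floor(frac * length(labels))    # number of labels to hide
  idx <- sample.int(length(labels), n_mask) # positions to mask
  labels[idx] <- NA_character_
  labels
}

# Toy label vector standing in for a metadata column such as pbmcsca$CellType
toy_labels <- rep(c("B cell", "CD4+ T cell", "NK cell", "Monocyte"), times = 25)
partial <- mask_labels(toy_labels, frac = 0.3)

# Count the remaining labels per cell type, plus the masked (NA) entries
print(table(partial, useNA = "ifany"))
```

With a Seurat object, the masked vector could presumably be stored before splitting, for example with `pbmcsca$CellType.partial <- mask_labels(pbmcsca$CellType)`, and the new column name passed as `cell.labels = "CellType.partial"` to `Run.STACAS()`.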
See also how STACAS compares to other computational tools for the integration of heterogeneous datasets: STACAS vs other tools
Use scIntegrationMetrics to evaluate the quality of integration results, or the snakemake pipeline for reproducible benchmarking.
- Andreatta M, Herault L, Gueguen P, Gfeller D, Berenstein AJ, Carmona SJ - "Semi-supervised integration of single-cell transcriptomics data", Nature Communications (2024) - https://www.nature.com/articles/s41467-024-45240-z
- Andreatta M, Carmona SJ - "STACAS: Sub-Type Anchor Correction for Alignment in Seurat to integrate single-cell RNA-seq data", Bioinformatics (2021) - https://doi.org/10.1093/bioinformatics/btaa755