Introduction

Many Bioconductor workflows – scran normalization, scater QC, slingshot trajectory inference, MAST differential expression – operate on the SingleCellExperiment (SCE) container. Seurat provides built-in converters (as.SingleCellExperiment() and as.Seurat()) that move data between ecosystems. scConvert extends this by letting you route SCE objects directly to h5ad, h5Seurat, Loom, or Zarr via its hub architecture – no manual intermediate step required.

This article walks through the full workflow on the PBMC 3k dataset.


1 Load real PBMC 3k data

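# Packages used throughout; readH5AD() and scConvert() are assumed to be
# exported by the scConvert package.
library(Seurat)
library(scConvert)
library(ggplot2)

t0      <- proc.time()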
pbmc    <- readH5AD("../pbmc3k.h5ad", verbose = FALSE)
elapsed <- (proc.time() - t0)[["elapsed"]]
cat(sprintf("Loaded: %d cells x %d genes in %.2fs\n",
            ncol(pbmc), nrow(pbmc), elapsed))
#> Loaded: 2638 cells x 13714 genes in 1.25s

2 Seurat analysis pipeline

We run the standard pipeline so the object carries PCA, UMAP, and cluster labels for downstream inspection. Steps are skipped if reductions already exist in the imported file.

set.seed(42L)
pbmc <- NormalizeData(pbmc, verbose = FALSE)
pbmc <- FindVariableFeatures(pbmc, nfeatures = 2000L, verbose = FALSE)
pbmc <- ScaleData(pbmc, verbose = FALSE)
pbmc <- RunPCA(pbmc, npcs = 30L, verbose = FALSE)
pbmc <- RunUMAP(pbmc, dims = 1:20, verbose = FALSE)
pbmc <- FindNeighbors(pbmc, dims = 1:20, verbose = FALSE)
pbmc <- FindClusters(pbmc, resolution = 0.5, verbose = FALSE)
DimPlot(pbmc, reduction = "umap", label = TRUE, label.size = 4) +
  ggtitle("PBMC 3k: Seurat clusters") +
  theme(plot.title = element_text(hjust = 0.5))
PBMC 3k UMAP coloured by Seurat clusters.
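
The listing above runs every step unconditionally, so the skip logic is not visible in it. Below is a minimal sketch of one way to write such a guard. It assumes the imported reductions are stored under the names "pca" and "umap" (an importer may use different names), and missing_reductions() and ensure_reductions() are hypothetical helpers, not part of Seurat or scConvert.

```r
# Pure helper: which of the wanted reductions are absent?
# `present` is a character vector, e.g. the result of SeuratObject::Reductions(obj).
missing_reductions <- function(present, wanted = c("pca", "umap")) {
  setdiff(wanted, present)
}

# Hypothetical wrapper: run only the steps whose output is missing.
# Assumes reductions are named "pca" and "umap" in the Seurat object.
ensure_reductions <- function(obj, dims = 1:20) {
  todo <- missing_reductions(SeuratObject::Reductions(obj))
  if ("pca" %in% todo) {
    obj <- Seurat::NormalizeData(obj, verbose = FALSE)
    obj <- Seurat::FindVariableFeatures(obj, nfeatures = 2000L, verbose = FALSE)
    obj <- Seurat::ScaleData(obj, verbose = FALSE)
    obj <- Seurat::RunPCA(obj, npcs = 30L, verbose = FALSE)
  }
  # A freshly computed PCA makes any existing UMAP stale, so recompute it too.
  if (any(c("pca", "umap") %in% todo)) {
    obj <- Seurat::RunUMAP(obj, dims = dims, verbose = FALSE)
  }
  obj
}

missing_reductions("pca")              # "umap"
missing_reductions(c("pca", "umap"))   # character(0)
```

Keeping the name comparison in a separate pure function makes the decision easy to check without loading any single-cell data; the wrapper would be called as pbmc <- ensure_reductions(pbmc).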


3 Seurat to SingleCellExperiment

as.SingleCellExperiment() transfers counts, normalized data, dimensional reductions, and cell metadata into the SCE container.

suppressPackageStartupMessages(library(SingleCellExperiment))

sce <- as.SingleCellExperiment(pbmc)
cat(sprintf("SCE assays: %s\n",
            paste(assayNames(sce), collapse = ", ")))
#> SCE assays: counts, logcounts
cat(sprintf("Reduced dims: %s\n",
            paste(reducedDimNames(sce), collapse = ", ")))
#> Reduced dims: PCA, UMAP
cat(sprintf("colData columns: %d\n", ncol(colData(sce))))
#> colData columns: 8
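
To make the layout of the container concrete, here is a small sketch that builds a toy SCE by hand and reads each component back through the standard accessors. The toy matrix, the cell and gene names, and the plain log1p() used for logcounts are illustrative assumptions only; the real object above is produced by as.SingleCellExperiment().

```r
suppressPackageStartupMessages(library(SingleCellExperiment))

set.seed(1)
cells <- paste0("cell", 1:10)

# Toy count matrix: 6 genes x 10 cells (illustrative values only).
toy_counts <- matrix(rpois(60, lambda = 2), nrow = 6,
                     dimnames = list(paste0("gene", 1:6), cells))

# Toy embedding: one row per cell, as reducedDims expects.
toy_pca <- matrix(rnorm(20), nrow = 10,
                  dimnames = list(cells, c("PC_1", "PC_2")))

toy_sce <- SingleCellExperiment(
  assays      = list(counts = toy_counts, logcounts = log1p(toy_counts)),
  colData     = DataFrame(cluster = rep(c("a", "b"), 5), row.names = cells),
  reducedDims = list(PCA = toy_pca)
)

dim(counts(toy_sce))                  # genes x cells
logcounts(toy_sce)[1:2, 1:3]          # normalized values
head(reducedDim(toy_sce, "PCA"), 3)   # cells x components
table(toy_sce$cluster)                # colData column via the $ shortcut
```

The same accessors (counts(), logcounts(), reducedDim(), and $ for colData columns) apply unchanged to the converted PBMC object.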

4 Bioconductor QC workflow on SCE

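A typical Bioconductor entry point is computing per-cell QC metrics. Below we compute two of them directly with Matrix::colSums(): the log1p-transformed library size and the number of detected features per cell. These correspond to the sum column (before the log transform) and the detected column returned by scater::perCellQCMetrics(). Once stored in colData, the new columns are available to any SCE-aware tool.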

counts_mat <- assay(sce, "counts")
colData(sce)$log_library_size  <- log1p(Matrix::colSums(counts_mat))
colData(sce)$detected_features <- Matrix::colSums(counts_mat > 0)
cat("Added QC columns: log_library_size, detected_features\n")
#> Added QC columns: log_library_size, detected_features
cat(sprintf("Median library size (log1p): %.2f\n",
            median(colData(sce)$log_library_size)))
#> Median library size (log1p): 7.70
cat(sprintf("Median detected features: %.0f\n",
            median(colData(sce)$detected_features)))
#> Median detected features: 819
qc_df <- data.frame(
  cluster         = as.character(pbmc$seurat_clusters),
  log_library     = colData(sce)$log_library_size,
  detected        = colData(sce)$detected_features
)

p1 <- ggplot(qc_df, aes(x = cluster, y = log_library, fill = cluster)) +
  geom_violin(scale = "width", trim = TRUE) +
  geom_boxplot(width = 0.1, outlier.size = 0.5, fill = "white") +
  labs(x = "Cluster", y = "log(1 + library size)") +
  theme_classic() +
  theme(legend.position = "none",
        axis.text.x = element_text(size = 9))

p2 <- ggplot(qc_df, aes(x = cluster, y = detected, fill = cluster)) +
  geom_violin(scale = "width", trim = TRUE) +
  geom_boxplot(width = 0.1, outlier.size = 0.5, fill = "white") +
  labs(x = "Cluster", y = "Detected features") +
  theme_classic() +
  theme(legend.position = "none",
        axis.text.x = element_text(size = 9))

suppressPackageStartupMessages(library(patchwork))
p1 + p2
Per-cell QC metrics across Seurat clusters.
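
As a cross-check on the manual calculation, the sketch below compares hand-computed column sums with perCellQCMetrics() on a toy SCE. It loads scuttle, the package that implements the function and that scater depends on; the toy matrix and its dimensions are assumptions made for illustration.

```r
suppressPackageStartupMessages({
  library(SingleCellExperiment)
  library(scuttle)
})

set.seed(1)
# Toy count matrix: 20 genes x 10 cells (illustrative values only).
toy_counts <- matrix(rpois(200, lambda = 1), nrow = 20,
                     dimnames = list(paste0("gene", 1:20),
                                     paste0("cell", 1:10)))
toy_sce <- SingleCellExperiment(assays = list(counts = toy_counts))

# Package route: one row per cell with `sum`, `detected`, and `total`.
qc <- perCellQCMetrics(toy_sce)

# Manual route, mirroring the calculation in the text.
manual_library  <- colSums(toy_counts)
manual_detected <- colSums(toy_counts > 0)

# Both routes should agree on the raw quantities.
stopifnot(
  isTRUE(all.equal(as.numeric(qc$sum), as.numeric(manual_library))),
  isTRUE(all.equal(as.numeric(qc$detected), as.numeric(manual_detected)))
)

# The log-scale library size used above is log1p() of the `sum` column.
log_library_size <- log1p(qc$sum)
```

The manual route avoids an extra dependency, while perCellQCMetrics() also accepts feature subsets (for example mitochondrial genes) through its subsets argument.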


5 SCE back to Seurat

as.Seurat() reconstructs the Seurat object from SCE. The counts and data arguments map SCE assay names to the appropriate Seurat layers. Any columns added to colData during the Bioconductor workflow travel with the object.

seurat_rt <- as.Seurat(sce, counts = "counts", data = "logcounts")
cat(sprintf("Converted back: %d cells\n", ncol(seurat_rt)))
#> Converted back: 2638 cells
cat(sprintf("Metadata columns: %s\n",
            paste(head(colnames(seurat_rt[[]]), 6L), collapse = ", ")))
#> Metadata columns: orig.ident, nCount_RNA, nFeature_RNA, seurat_annotations, percent.mt, RNA_snn_res.0.5
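
The printout above shows only the first six metadata columns, so the QC columns added on the SCE side are not visible in it. A minimal sketch of how their survival could be checked follows, using a toy SCE so that it runs on its own; the toy data, the object names, and the plain log1p() used for logcounts are assumptions made for illustration.

```r
suppressPackageStartupMessages({
  library(Seurat)
  library(SingleCellExperiment)
})

set.seed(1)
cells <- paste0("cell", 1:10)

# Toy sparse count matrix: 20 genes x 10 cells (illustrative values only).
toy_counts <- Matrix::Matrix(
  matrix(rpois(200, lambda = 2), nrow = 20,
         dimnames = list(paste0("gene", 1:20), cells)),
  sparse = TRUE
)

toy_sce <- SingleCellExperiment(
  assays = list(counts = toy_counts, logcounts = log1p(toy_counts))
)

# Add a QC column on the Bioconductor side, as in the workflow above.
colData(toy_sce)$log_library_size <- log1p(Matrix::colSums(toy_counts))

# Convert back and confirm that the column and its values travelled along.
toy_seurat <- as.Seurat(toy_sce, counts = "counts", data = "logcounts")

"log_library_size" %in% colnames(toy_seurat[[]])
all.equal(toy_seurat$log_library_size,
          toy_sce$log_library_size,
          check.attributes = FALSE)
```

On the real objects, the equivalent check is whether c("log_library_size", "detected_features") all appear in colnames(seurat_rt[[]]).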

6 SCE to h5ad via scConvert

To share an SCE dataset with Python collaborators, pass it directly to scConvert(). The SCE is converted to a Seurat object internally, then written to the target format – no manual intermediate step is needed.

h5ad_path <- file.path(tempdir(), "pbmc3k_sce.h5ad")

t0      <- proc.time()
scConvert(sce, dest = h5ad_path, overwrite = TRUE)
elapsed <- (proc.time() - t0)[["elapsed"]]

cat(sprintf("h5ad size: %.1f MB (%.2fs)\n",
            file.size(h5ad_path) / 1e6, elapsed))
#> h5ad size: 12.9 MB (0.65s)

7 What is preserved

Seurat -> SCE -> Seurat round-trip

Component        Preserved
---------------  --------------------
Count matrix     Yes
Normalized data  Yes
Cell metadata    Yes
Gene metadata    Yes
PCA / UMAP       Yes (as reducedDims)
Neighbor graphs  Partial

Common Bioconductor workflows enabled by scConvert

Workflow                 Package    After scConvert
-----------------------  ---------  --------------------------
QC filtering             scater     perCellQCMetrics() on SCE
Normalization            scran      computeSumFactors() on SCE
Trajectory               slingshot  slingshot(sce, ...)
Differential expression  MAST       zlm(~ condition, sce)

8 Cleanup

unlink(h5ad_path)