Loom is an HDF5-based format developed by the Linnarsson lab and used
natively by loompy, RNA velocity tools (scVelo,
velocyto), and several single-cell atlases. It stores the expression
matrix at /matrix, cell metadata in
/col_attrs, gene metadata in /row_attrs, and
additional layers (spliced/unspliced counts, normalized data) in
/layers.
scConvert provides readLoom() and
writeLoom() for full Seurat <-> Loom conversion
without a Python dependency.
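The layout described above can be checked mechanically. The following is a minimal sketch and not part of scConvert: it assumes you have already collected the object paths of a file (for instance with h5py's `visit`), and the helper name `missing_loom_groups` is hypothetical. It reports which of the four top-level entries named above are absent.

```python
# Top-level entries named in the layout description above.
REQUIRED_ENTRIES = ("/matrix", "/row_attrs", "/col_attrs", "/layers")


def missing_loom_groups(paths):
    """Return the required top-level entries that do not appear in `paths`.

    `paths` is any iterable of HDF5 object paths, with or without a leading "/".
    """
    top_level = set()
    for path in paths:
        parts = [p for p in path.split("/") if p]
        if parts:
            top_level.add("/" + parts[0])
    return [entry for entry in REQUIRED_ENTRIES if entry not in top_level]


# Hypothetical listing of a small Loom-like file (illustration only).
example_paths = [
    "matrix",
    "row_attrs/Gene",
    "col_attrs/CellID",
    "col_attrs/seurat_clusters",
]
print(missing_loom_groups(example_paths))
```

With h5py, the path list could be gathered by passing `paths.append` to `File.visit`, which calls it once per object name.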
This article demonstrates the conversion on PBMC 3k (2,638 cells, 13,714 genes) and validates the written file with loompy and scanpy.
We first load the PBMC 3k h5ad into Seurat, then export to Loom.
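# Packages used below: scConvert for readH5AD()/readLoom()/writeLoom(),
# Seurat for the analysis steps, ggplot2 for ggtitle() and theme().
library(Seurat)
library(scConvert)
library(ggplot2)
input_h5ad <- "../pbmc3k.h5ad"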
pbmc_seurat <- readH5AD(input_h5ad, verbose = FALSE)
cat(sprintf("Loaded: %d cells x %d genes\n", ncol(pbmc_seurat), nrow(pbmc_seurat)))
#> Loaded: 2638 cells x 13714 genes
loom_path <- file.path(tempdir(), "pbmc3k.loom")
t0 <- proc.time()
writeLoom(pbmc_seurat, filename = loom_path, overwrite = TRUE, verbose = FALSE)
elapsed <- (proc.time() - t0)[["elapsed"]]
cat(sprintf("Wrote Loom: %.2fs | %.1f MB\n", elapsed, file.size(loom_path) / 1e6))
#> Wrote Loom: 2.43s | 22.3 MB
pbmc_loom <- readLoom(loom_path, verbose = FALSE)
cat(sprintf("Loaded: %d cells x %d genes\n", ncol(pbmc_loom), nrow(pbmc_loom)))
#> Loaded: 2638 cells x 13714 genes
pbmc_loom
#> An object of class Seurat
#> 13714 features across 2638 samples within 1 assay
#> Active assay: RNA (13714 features, 0 variable features)
#> 2 layers present: counts, data
#> 2 dimensional reductions calculated: pca, umap
head(pbmc_loom[[]], 4)
set.seed(42L)
pbmc_loom <- NormalizeData(pbmc_loom, verbose = FALSE)
pbmc_loom <- FindVariableFeatures(pbmc_loom, nfeatures = 2000L, verbose = FALSE)
pbmc_loom <- ScaleData(pbmc_loom, verbose = FALSE)
pbmc_loom <- RunPCA(pbmc_loom, npcs = 30L, verbose = FALSE)
pbmc_loom <- RunUMAP(pbmc_loom, dims = 1:20, verbose = FALSE)
DimPlot(
  pbmc_loom,
  reduction = "umap",
  group.by = "seurat_annotations",
  label = TRUE,
  label.size = 3.5,
  repel = TRUE
) +
  ggtitle("PBMC 3k: cell-type annotations (from Loom)") +
  theme(plot.title = element_text(hjust = 0.5))
PBMC 3k UMAP coloured by cell-type annotation after Loom round-trip.
| Component | Preserved | Loom path |
|---|---|---|
| Expression matrix | Yes | /matrix |
| Raw counts | Yes | /layers/counts |
| Cell metadata | Yes | /col_attrs |
| Gene metadata | Yes | /row_attrs |
| PCA / UMAP embeddings | Yes | /col_attrs/PC_1..n, UMAP_1..n |
| Seurat cluster labels | Yes | /col_attrs/seurat_clusters |
| Neighbor graphs | No | Recompute with FindNeighbors() |
| misc / uns | No | Not part of the Loom spec |
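Because the table lists embeddings as one column attribute per dimension (PC_1..n, UMAP_1..n), a reader working outside scConvert has to reassemble them into a per-cell coordinate matrix. The sketch below shows one way to do that in plain Python. `collect_embedding` is a hypothetical helper, and the attribute naming is taken from the table rather than verified against a file.

```python
import re


def collect_embedding(col_attrs, prefix):
    """Gather `<prefix>_1..n` column attributes into one row per cell.

    `col_attrs` maps attribute names to per-cell sequences of equal length.
    Dimensions are ordered by their numeric suffix, so PC_10 follows PC_9
    rather than PC_1.
    """
    pattern = re.compile(rf"^{re.escape(prefix)}_(\d+)$")
    dims = []
    for name in col_attrs:
        match = pattern.match(name)
        if match:
            dims.append((int(match.group(1)), name))
    if not dims:
        raise KeyError(f"no column attributes named {prefix}_<n>")
    dims.sort()
    columns = [col_attrs[name] for _, name in dims]
    # Transpose from one sequence per dimension to one row per cell.
    return [list(row) for row in zip(*columns)]


# Toy data: three cells, two UMAP dimensions, one unrelated attribute.
toy_attrs = {
    "UMAP_2": [0.5, 1.5, 2.5],
    "UMAP_1": [-1.0, 0.0, 1.0],
    "seurat_clusters": ["0", "1", "0"],
}
print(collect_embedding(toy_attrs, "UMAP"))
```

Sorting on the integer suffix matters once there are ten or more components, since a plain string sort would place `PC_10` before `PC_2`.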
library(reticulate)
Sys.setenv(NUMBA_THREADING_LAYER = "tbb", OMP_NUM_THREADS = "1")
use_condaenv("scverse")
import loompy
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt
with loompy.connect(r.loom_path, mode="r") as ds:
    print(f"Shape: {ds.shape[0]} genes x {ds.shape[1]} cells")
    print(f"Column attributes (cell metadata): {list(ds.ca.keys())[:8]}")
    print(f"Row attributes (gene metadata): {list(ds.ra.keys())[:5]}")
    # loompy lists the main matrix as the layer named '' (empty string).
    print(f"Layers: {list(ds.layers.keys())}")
#> Shape: 13714 genes x 2638 cells
#> Column attributes (cell metadata): ['CellID', 'RNA_snn_res.0.5', 'nCount_RNA', 'nFeature_RNA', 'orig.ident', 'percent.mt', 'seurat_annotations', 'seurat_clusters']
#> Row attributes (gene metadata): ['Gene']
#> Layers: ['']
scanpy can load a Loom file directly and treat it as an AnnData object.
import scanpy as sc
adata = sc.read_loom(r.loom_path, sparse=True, cleanup=False)
print(adata)
#> AnnData object with n_obs × n_vars = 2638 × 13714
#> obs: 'RNA_snn_res.0.5', 'nCount_RNA', 'nFeature_RNA', 'orig.ident', 'percent.mt', 'seurat_annotations', 'seurat_clusters'
print(f"obs columns: {list(adata.obs.columns)[:6]}")
#> obs columns: ['RNA_snn_res.0.5', 'nCount_RNA', 'nFeature_RNA', 'orig.ident', 'percent.mt', 'seurat_annotations']
Loom is the primary input format for RNA velocity analysis with scVelo and velocyto. scVelo expects two layers:
spliced and unspliced, which are produced by
velocyto or STARsolo during alignment.
A typical workflow:
1. Generate a .loom with spliced/unspliced counts (for example with velocyto).
2. Load it with readLoom() for QC and metadata annotation in Seurat.
3. Export with writeLoom(), which preserves the cluster labels and embeddings in /col_attrs.
scConvert writes expression counts to /layers/counts.
Users need to merge the spliced/unspliced layers from the velocyto
output separately before running scVelo.
import loompy
with loompy.connect(r.loom_path, mode="r") as ds:
    layers = list(ds.layers.keys())
    print(f"Layers in scConvert output: {layers}")
print("scVelo additionally requires: ['spliced', 'unspliced']")
print("Merge these from the velocyto loom before calling scvelo.pp.filter_and_normalize()")
#> Layers in scConvert output: ['']
#> scVelo additionally requires: ['spliced', 'unspliced']
#> Merge these from the velocyto loom before calling scvelo.pp.filter_and_normalize()
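Merging the layers requires matching cells between the two files first. The sketch below illustrates that alignment step only. It assumes velocyto-style IDs of the form `sample:BARCODEx` and Seurat-style barcodes with a `-1` suffix. Both conventions vary between pipelines, so the normalisation rule is an assumption to adapt, and the helper names are hypothetical.

```python
import re


def normalize_barcode(cell_id):
    """Reduce a cell ID to its bare barcode.

    Assumed conventions (adjust for your pipeline):
    - velocyto:   "sample:AAACATACAACCACx" -> "AAACATACAACCAC"
    - Seurat/10x: "AAACATACAACCAC-1"       -> "AAACATACAACCAC"
    """
    barcode = cell_id.split(":")[-1]         # drop a "sample:" prefix
    barcode = re.sub(r"-\d+$", "", barcode)  # drop a "-1" style suffix
    if barcode.endswith("x"):                # drop a trailing "x"
        barcode = barcode[:-1]
    return barcode


def align_cells(seurat_ids, velocyto_ids):
    """For each Seurat cell, return the matching column index in the
    velocyto file, or None when the cell is absent there."""
    position = {normalize_barcode(cid): i for i, cid in enumerate(velocyto_ids)}
    return [position.get(normalize_barcode(cid)) for cid in seurat_ids]


# Toy IDs for illustration only.
seurat_ids = ["AAACATACAACCAC-1", "AAACATTGAGCTAC-1", "AAACGCACTGGTAC-1"]
velocyto_ids = ["pbmc3k:AAACATTGAGCTACx", "pbmc3k:AAACATACAACCACx"]
print(align_cells(seurat_ids, velocyto_ids))
```

The returned indices can be used to reorder the columns of the spliced and unspliced matrices so they line up with the scConvert output. Cells mapped to `None` were filtered out by velocyto or never seen by it. For multi-sample data, bare barcodes can collide across samples, so the sample prefix should then be kept as part of the key. scVelo also ships a higher-level merge utility (`scv.utils.merge`) that may cover the same need on AnnData objects.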
unlink(loom_path)