This vignette demonstrates how to convert between Seurat objects and Loom files using srtdisk. The Loom format is an HDF5-based file format commonly used for storing single-cell RNA-seq data, particularly in RNA velocity workflows with tools like velocyto and scVelo.
library(Seurat)
library(srtdisk)
Loom files organize single-cell data in a specific HDF5 structure:
- /matrix: Main expression matrix (genes × cells, transposed from Seurat convention)
- /row_attrs: Gene/feature-level metadata (gene names, coordinates, etc.)
- /col_attrs: Cell/sample-level metadata (cell IDs, cluster labels, etc.)
- /layers: Additional expression layers (counts, normalized data, spliced/unspliced for velocity)
This format is particularly useful for RNA velocity workflows (velocyto, scVelo) and for sharing data with Python tools such as loompy and scanpy.
To save a Seurat object as a Loom file, use SaveLoom():
library(SeuratData)
if (!"pbmc3k.final" %in% rownames(InstalledData())) {
InstallData("pbmc3k")
}
data("pbmc3k.final", package = "pbmc3k.SeuratData")
pbmc <- UpdateSeuratObject(pbmc3k.final)
pbmc
#> An object of class Seurat
#> 13714 features across 2638 samples within 1 assay
#> Active assay: RNA (13714 features, 2000 variable features)
#> 3 layers present: counts, data, scale.data
#> 2 dimensional reductions calculated: pca, umap
DimPlot(pbmc, reduction = "umap", label = TRUE, pt.size = 0.5) + NoLegend()
Now save to Loom format:
SaveLoom(pbmc, filename = "pbmc3k.loom", overwrite = TRUE, verbose = TRUE)
When saving a Seurat object to Loom:
| Seurat Data | Loom Location | Notes |
|---|---|---|
| Default assay data | /matrix | Normalized expression |
| Counts layer | /layers/counts | Raw counts if different from data |
| Scale.data | /layers/scale.data | Scaled data if present |
| Cell names | /col_attrs/CellID | Cell barcodes |
| Gene names | /row_attrs/Gene | Feature names |
| Cell metadata | /col_attrs/* | All columns from meta.data |
| Feature metadata | /row_attrs/* | All columns from assay meta.features |
| Dimensional reductions | /reductions/* | PCA, UMAP embeddings, etc. |
| Graphs | /col_graphs/* | SNN graphs if present |
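The mapping in the table can be restated as a small path-building function. This is only an illustrative sketch of the layout described above, not srtdisk's implementation: the function name `planned_loom_paths` and the graph name `RNA_snn` are assumptions made for the example.

```python
def planned_loom_paths(layers, meta_columns, reductions, graphs):
    """List the Loom locations that the mapping table implies for an object."""
    # The main matrix and the two identifier attributes are always present.
    paths = ["/matrix", "/col_attrs/CellID", "/row_attrs/Gene"]
    # The "data" layer is the main matrix; other layers go under /layers.
    paths += [f"/layers/{name}" for name in layers if name != "data"]
    # Each meta.data column becomes one column attribute.
    paths += [f"/col_attrs/{col}" for col in meta_columns]
    paths += [f"/reductions/{name}" for name in reductions]
    paths += [f"/col_graphs/{name}" for name in graphs]
    return paths


paths = planned_loom_paths(
    layers=["counts", "data", "scale.data"],
    meta_columns=["orig.ident", "seurat_clusters"],
    reductions=["pca", "umap"],
    graphs=["RNA_snn"],  # illustrative graph name
)
for path in paths:
    print(path)
```

Listing the expected paths this way gives a checklist to compare against what loompy or an HDF5 browser actually reports for the saved file.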
The saved Loom file can be opened with loompy or scanpy in Python:
import loompy
# Connect to the loom file
with loompy.connect("pbmc3k.loom") as ds:
    print(f"Shape: {ds.shape[0]} genes x {ds.shape[1]} cells")
    print(f"\nRow attributes (genes): {list(ds.ra.keys())}")
    print(f"\nColumn attributes (cells): {list(ds.ca.keys())[:10]}...")  # First 10
    print(f"\nLayers: {list(ds.layers.keys())}")
#> Shape: 13714 genes x 2638 cells
#>
#> Row attributes (genes): ['Gene', 'vst.mean', 'vst.variable', 'vst.variance', 'vst.variance.expected', 'vst.variance.standardized']
#>
#> Column attributes (cells): ['CellID', 'RNA_snn_res.0.5', 'nCount_RNA', 'nFeature_RNA', 'orig.ident', 'percent.mt', 'seurat_annotations', 'seurat_clusters']...
#>
#> Layers: ['', 'counts', 'scale.data']
With scanpy:
import scanpy as sc
# Read loom file as AnnData
adata = sc.read_loom("pbmc3k.loom", sparse=True, cleanup=False)
print(adata)
#> AnnData object with n_obs × n_vars = 2638 × 13714
#> obs: 'RNA_snn_res.0.5', 'nCount_RNA', 'nFeature_RNA', 'orig.ident', 'percent.mt', 'seurat_annotations', 'seurat_clusters'
#> var: 'vst.mean', 'vst.variable', 'vst.variance', 'vst.variance.expected', 'vst.variance.standardized'
#> layers: 'counts', 'scale.data'
print("\nColumn attributes:", list(adata.obs.columns)[:10])
#>
#> Column attributes: ['RNA_snn_res.0.5', 'nCount_RNA', 'nFeature_RNA', 'orig.ident', 'percent.mt', 'seurat_annotations', 'seurat_clusters']
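The two printouts report the same matrix in opposite orientations: loompy shows genes x cells (13714 x 2638), while AnnData shows cells x genes (2638 × 13714). The sketch below demonstrates that transposition on a toy nested list using only the standard library. The names `loom_matrix` and `to_obs_by_var` are made up for illustration and are not part of loompy or scanpy.

```python
def to_obs_by_var(genes_by_cells):
    """Transpose a genes x cells nested list into cells x genes."""
    return [list(column) for column in zip(*genes_by_cells)]


# Toy Loom-style matrix: 3 genes (rows) x 2 cells (columns).
loom_matrix = [
    [1, 0],  # gene A
    [0, 2],  # gene B
    [5, 3],  # gene C
]

anndata_like = to_obs_by_var(loom_matrix)
print(len(loom_matrix), "genes x", len(loom_matrix[0]), "cells")
print(len(anndata_like), "cells x", len(anndata_like[0]), "genes")
```

Keeping the orientation in mind matters when indexing: a row index selects a gene in the Loom file but a cell in the AnnData object.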
Use LoadLoom() to read a Loom file as a Seurat object:
# Load the loom file we just created
loaded_pbmc <- LoadLoom("pbmc3k.loom", verbose = TRUE)
loaded_pbmc
#> An object of class Seurat
#> 13714 features across 2638 samples within 1 assay
#> Active assay: RNA (13714 features, 0 variable features)
#> 2 layers present: counts, data
#> 1 dimensional reduction calculated: umap
# Verify dimensions match
cat("Original:", ncol(pbmc), "cells,", nrow(pbmc), "genes\n")
#> Original: 2638 cells, 13714 genes
cat("Loaded: ", ncol(loaded_pbmc), "cells,", nrow(loaded_pbmc), "genes\n")
#> Loaded: 2638 cells, 13714 genes
# Check metadata was preserved
cat("\nMetadata columns preserved:\n")
#>
#> Metadata columns preserved:
print(intersect(colnames(pbmc[[]]), colnames(loaded_pbmc[[]])))
#> [1] "orig.ident" "nCount_RNA" "nFeature_RNA"
#> [4] "seurat_annotations" "percent.mt" "RNA_snn_res.0.5"
#> [7] "seurat_clusters"
LoadLoom() provides several options for customizing the import:
LoadLoom(
file, # Path to loom file
assay = NULL, # Name for the assay (default: "RNA" or from file)
cells = "CellID", # Column attribute containing cell names
features = "Gene", # Row attribute containing gene names
normalized = NULL, # Layer to load as normalized data
scaled = NULL, # Layer to load as scaled data
filter = "none", # Filter cells/features by Valid attributes
verbose = TRUE # Show progress messages
)
If your Loom file has additional layers (e.g., from velocyto):
# Load with specific layers
seurat_obj <- LoadLoom(
"velocity_data.loom",
normalized = "spliced", # Use spliced counts as normalized
scaled = "ambiguous" # Optional scaled layer
)
Loom files from velocyto contain spliced, unspliced, and ambiguous count matrices:
# Load velocyto loom file
# The main matrix typically contains spliced counts
velocity_obj <- LoadLoom(
"sample.loom",
cells = "CellID",
features = "Gene"
)
# The spliced/unspliced/ambiguous layers can be accessed after loading
# or you may need to load them separately depending on your analysis needs
srtdisk preserves data integrity during Seurat ↔︎ Loom conversion:
# Create a simple test object
set.seed(42)
test_obj <- CreateSeuratObject(
counts = pbmc[["RNA"]]$counts[1:100, 1:50],
project = "RoundTrip"
)
test_obj$custom_cluster <- sample(c("A", "B", "C"), 50, replace = TRUE)
test_obj$numeric_value <- rnorm(50)
# Save and reload
SaveLoom(test_obj, "test_roundtrip.loom", overwrite = TRUE, verbose = FALSE)
reloaded <- LoadLoom("test_roundtrip.loom", verbose = FALSE)
# Compare
cat("Cell names match:", all(colnames(test_obj) == colnames(reloaded)), "\n")
#> Cell names match: TRUE
cat("Gene names match:", all(rownames(test_obj) == rownames(reloaded)), "\n")
#> Gene names match: TRUE
cat("Metadata columns:", paste(colnames(reloaded[[]]), collapse = ", "), "\n")
#> Metadata columns: orig.ident, nCount_RNA, nFeature_RNA, custom_cluster, numeric_value
Verify expression data is preserved:
original_data <- GetAssayData(test_obj, layer = "data")
reloaded_data <- GetAssayData(reloaded, layer = "data")[
rownames(original_data),
colnames(original_data)
]
max_diff <- max(abs(as.matrix(original_data) - as.matrix(reloaded_data)))
cat("Maximum expression difference:", max_diff, "\n")
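#> Maximum expression difference: -Inf
A result of -Inf means that max() received an empty input, so no values were actually compared. test_obj was built from raw counts and never normalized, so its data layer is most likely empty. Run NormalizeData(test_obj) before saving so that there is a data layer to compare; a meaningful check should then report a difference at or near 0.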
A common use case for Loom files is RNA velocity analysis. Here’s a typical workflow:
# Run velocyto on Cell Ranger output (command line)
velocyto run10x -m repeat_mask.gtf sample_dir genes.gtf
This creates a .loom file with spliced/unspliced counts.
# Load velocyto output
velocity_data <- LoadLoom("sample.loom")
# Perform standard Seurat QC and clustering
velocity_data <- NormalizeData(velocity_data)
velocity_data <- FindVariableFeatures(velocity_data)
velocity_data <- ScaleData(velocity_data)
velocity_data <- RunPCA(velocity_data)
velocity_data <- FindNeighbors(velocity_data)
velocity_data <- FindClusters(velocity_data)
velocity_data <- RunUMAP(velocity_data, dims = 1:30)
# Add cell type annotations
velocity_data$cell_type <- ... # Your annotation method
# Save back to loom with annotations
SaveLoom(velocity_data, "sample_annotated.loom", overwrite = TRUE)
import scvelo as scv
# Load annotated loom file
adata = scv.read("sample_annotated.loom")
# Run velocity analysis
scv.pp.filter_and_normalize(adata)
scv.pp.moments(adata)
scv.tl.velocity(adata)
scv.tl.velocity_graph(adata)
# Visualize with Seurat annotations
scv.pl.velocity_embedding_stream(adata, color='cell_type')
| Seurat Location | Loom Location | Notes |
|---|---|---|
| GetAssayData(layer = "data") | /matrix | Main expression matrix |
| GetAssayData(layer = "counts") | /layers/counts | If different from data |
| GetAssayData(layer = "scale.data") | /layers/scale.data | If present |
| colnames(obj) | /col_attrs/CellID | Cell barcodes |
| rownames(obj) | /row_attrs/Gene | Gene names |
| obj[[]] (meta.data) | /col_attrs/* | Each column as attribute |
| obj[[assay]][[]] | /row_attrs/* | Feature metadata |
| Embeddings(obj, "pca") | /reductions/pca/embeddings | Transposed |
| Loadings(obj, "pca") | /reductions/pca/loadings | If present |
| Stdev(obj, "pca") | /reductions/pca/stdev | If present |
| Loom Location | Seurat Destination | Notes |
|---|---|---|
| /matrix | data layer | Stored as counts if no normalization detected |
| /layers/* | Additional layers | Via normalized/scaled parameters |
| /col_attrs/CellID | Cell names | Configurable via cells parameter |
| /row_attrs/Gene | Feature names | Configurable via features parameter |
| /col_attrs/* | meta.data | All except CellID and Valid |
| /row_attrs/* | meta.features | All except Gene and Valid |
| /reductions/*/embeddings | Reductions() | If present |
| /col_graphs/* | Graphs() | SNN/KNN graphs |
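The column-attribute rows of this table (cell names from CellID, everything else except Valid into meta.data) can be sketched in a few lines. This is a standard-library illustration of the rule as stated in the table, not the code srtdisk runs. The function name `split_col_attrs` and the toy barcodes are assumptions.

```python
def split_col_attrs(col_attrs, cells="CellID", reserved=("Valid",)):
    """Return (cell_names, metadata) from a dict of column attributes."""
    if cells not in col_attrs:
        raise KeyError(f"Cannot find cell names attribute: {cells!r}")
    cell_names = list(col_attrs[cells])
    # Everything except the cell-name attribute and reserved names is metadata.
    metadata = {
        name: list(values)
        for name, values in col_attrs.items()
        if name != cells and name not in reserved
    }
    return cell_names, metadata


# Toy stand-in for the contents of /col_attrs.
col_attrs = {
    "CellID": ["AAAC", "AAAG", "AACT"],
    "Valid": [1, 1, 1],
    "seurat_clusters": ["0", "2", "1"],
    "nCount_RNA": [2419, 4903, 3147],
}
cell_names, metadata = split_col_attrs(col_attrs)
print(cell_names)
print(sorted(metadata))
```

The `cells` argument plays the same role as the cells parameter of LoadLoom(): it names the attribute that holds the identifiers, and that attribute is then left out of the metadata.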
| Feature | Loom | h5Seurat | h5ad |
|---|---|---|---|
| Primary ecosystem | velocyto, loompy | Seurat | scanpy |
| Multiple assays | Via layers | Native support | Single X matrix |
| Spatial data | Limited | Full support | Full support |
| RNA velocity | Native | Not standard | Via layers |
| Graph storage | Native | Native | In obsp |
| Python access | loompy, scanpy | Limited | scanpy |
| R access | srtdisk | srtdisk | srtdisk |
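When to use Loom: RNA velocity workflows and pipelines built on velocyto or loompy.
When to use h5Seurat: Seurat-centric analyses, especially those with multiple assays or spatial data.
When to use h5ad: Python workflows built on scanpy.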
“Cannot find feature names dataset”
The Loom file uses non-standard attribute names. Specify them explicitly:
# Check what attributes exist
h5 <- hdf5r::H5File$new("data.loom", mode = "r")
print(names(h5[["row_attrs"]]))
h5$close_all()
# Use the correct attribute name
obj <- LoadLoom("data.loom", features = "gene_name")
“Cannot find cell names dataset”
Similar to above, check column attributes:
h5 <- hdf5r::H5File$new("data.loom", mode = "r")
print(names(h5[["col_attrs"]]))
h5$close_all()
# Use the correct attribute name
obj <- LoadLoom("data.loom", cells = "obs_names")
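Both fixes come down to choosing, from the attribute names the file actually contains, the one that holds the identifiers. A small sketch of that selection step is shown here, assuming the attribute names have already been listed as in the R snippets. The helper `pick_attribute` is hypothetical, and the extra names `Accession` and `Clusters` are only placeholders for whatever else a file might contain.

```python
def pick_attribute(available, candidates):
    """Return the first candidate name present in `available`, else None."""
    present = set(available)
    for name in candidates:
        if name in present:
            return name
    return None


# Names as they might be reported for row_attrs and col_attrs.
row_attr_names = ["gene_name", "Accession"]
col_attr_names = ["obs_names", "Clusters"]

# Try the default first, then the alternative used in the examples above.
features = pick_attribute(row_attr_names, ["Gene", "gene_name"])
cells = pick_attribute(col_attr_names, ["CellID", "obs_names"])
print(features, cells)
```

The chosen names are what would be passed as the features and cells arguments of LoadLoom(). A result of None signals that none of the expected names exist and the file needs manual inspection.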
Duplicate feature names
Loom files sometimes have duplicate gene names. srtdisk will make them unique with a warning:
# Warning: Duplicate feature names found, making unique
obj <- LoadLoom("data.loom")
# The names will be Gene, Gene.1, Gene.2, etc.
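The Gene, Gene.1, Gene.2 pattern matches the style of R's make.unique(). Whether srtdisk calls make.unique() internally is an assumption here. The sketch below only shows how such suffixing can be done so that the result contains no duplicates, and it may differ from R in edge cases where a suffixed name already exists.

```python
def make_unique(names, sep="."):
    """Append numeric suffixes to repeated names so every name is unique."""
    taken = set(names)   # names that must not be reused as a suffixed form
    seen = set()         # names already emitted unchanged
    counters = {}        # last suffix number used per base name
    result = []
    for name in names:
        if name not in seen:
            # First occurrence is kept as-is.
            seen.add(name)
            result.append(name)
            continue
        n = counters.get(name, 0) + 1
        candidate = f"{name}{sep}{n}"
        # Skip suffixes that would collide with an existing name.
        while candidate in taken:
            n += 1
            candidate = f"{name}{sep}{n}"
        counters[name] = n
        taken.add(candidate)
        result.append(candidate)
    return result


print(make_unique(["Gene", "Gene", "Gene", "Other"]))
```

Because only repeated occurrences are renamed, the first copy of each gene keeps its original name, which is worth remembering when matching features against an external annotation.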
sessionInfo()
#> R version 4.5.2 (2025-10-31)
#> Platform: aarch64-apple-darwin20
#> Running under: macOS Tahoe 26.2
#>
#> Matrix products: default
#> BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.1
#>
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#>
#> time zone: America/Indiana/Indianapolis
#> tzcode source: internal
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] stxKidney.SeuratData_0.1.0 stxBrain.SeuratData_0.1.2
#> [3] ssHippo.SeuratData_3.1.4 pbmcref.SeuratData_1.0.0
#> [5] pbmcMultiome.SeuratData_0.1.4 pbmc3k.SeuratData_3.1.4
#> [7] panc8.SeuratData_3.0.2 cbmc.SeuratData_3.1.4
#> [9] SeuratData_0.2.2.9002 Seurat_5.4.0
#> [11] SeuratObject_5.3.0 sp_2.2-0
#> [13] reticulate_1.44.1 srtdisk_0.2.0
#> [15] testthat_3.3.2
#>
#> loaded via a namespace (and not attached):
#> [1] RColorBrewer_1.1-3 rstudioapi_0.18.0 jsonlite_2.0.0
#> [4] magrittr_2.0.4 spatstat.utils_3.2-1 farver_2.1.2
#> [7] rmarkdown_2.30 fs_1.6.6 vctrs_0.7.0
#> [10] ROCR_1.0-11 memoise_2.0.1 spatstat.explore_3.7-0
#> [13] htmltools_0.5.9 usethis_3.2.1 sass_0.4.10
#> [16] sctransform_0.4.3 parallelly_1.46.1 bslib_0.9.0
#> [19] KernSmooth_2.23-26 htmlwidgets_1.6.4 desc_1.4.3
#> [22] ica_1.0-3 plyr_1.8.9 plotly_4.11.0
#> [25] zoo_1.8-15 cachem_1.1.0 igraph_2.2.1
#> [28] mime_0.13 lifecycle_1.0.5 pkgconfig_2.0.3
#> [31] Matrix_1.7-4 R6_2.6.1 fastmap_1.2.0
#> [34] fitdistrplus_1.2-5 future_1.69.0 shiny_1.12.1
#> [37] digest_0.6.39 patchwork_1.3.2 rprojroot_2.1.1
#> [40] tensor_1.5.1 RSpectra_0.16-2 irlba_2.3.5.1
#> [43] pkgload_1.4.1 labeling_0.4.3 progressr_0.18.0
#> [46] spatstat.sparse_3.1-0 httr_1.4.7 polyclip_1.10-7
#> [49] abind_1.4-8 compiler_4.5.2 remotes_2.5.0
#> [52] withr_3.0.2 bit64_4.6.0-1 S7_0.2.1
#> [55] fastDummies_1.7.5 pkgbuild_1.4.8 MASS_7.3-65
#> [58] rappdirs_0.3.4 sessioninfo_1.2.3 tools_4.5.2
#> [61] lmtest_0.9-40 otel_0.2.0 httpuv_1.6.16
#> [64] future.apply_1.20.1 goftest_1.2-3 glue_1.8.0
#> [67] nlme_3.1-168 promises_1.5.0 grid_4.5.2
#> [70] Rtsne_0.17 cluster_2.1.8.1 reshape2_1.4.5
#> [73] generics_0.1.4 hdf5r_1.3.12 gtable_0.3.6
#> [76] spatstat.data_3.1-9 tidyr_1.3.2 data.table_1.18.0
#> [79] spatstat.geom_3.7-0 RcppAnnoy_0.0.23 ggrepel_0.9.6
#> [82] RANN_2.6.2 pillar_1.11.1 stringr_1.6.0
#> [85] spam_2.11-3 RcppHNSW_0.6.0 later_1.4.5
#> [88] splines_4.5.2 dplyr_1.1.4 lattice_0.22-7
#> [91] survival_3.8-6 bit_4.6.0 deldir_2.0-4
#> [94] tidyselect_1.2.1 miniUI_0.1.2 pbapply_1.7-4
#> [97] knitr_1.51 gridExtra_2.3 scattermore_1.2
#> [100] xfun_0.56 devtools_2.4.6 brio_1.1.5
#> [103] matrixStats_1.5.0 stringi_1.8.7 yaml_2.3.12
#> [106] lazyeval_0.2.2 evaluate_1.0.5 codetools_0.2-20
#> [109] tibble_3.3.1 cli_3.6.5 uwot_0.2.4
#> [112] xtable_1.8-4 jquerylib_0.1.4 dichromat_2.0-0.1
#> [115] Rcpp_1.1.1 globals_0.18.0 spatstat.random_3.4-4
#> [118] png_0.1-8 spatstat.univar_3.1-6 parallel_4.5.2
#> [121] ellipsis_0.3.2 ggplot2_4.0.1 dotCall64_1.2
#> [124] listenv_0.10.0 viridisLite_0.4.2 scales_1.4.0
#> [127] ggridges_0.5.7 purrr_1.2.1 crayon_1.5.3
#> [130] rlang_1.1.7 cowplot_1.2.0