Single-cell ATAC-seq measures chromatin accessibility at single-cell resolution. The core data structure – a peak-by-cell count matrix – is structurally identical to a gene-by-cell RNA matrix. scConvert converts this matrix alongside cell metadata, embeddings, and cluster labels, enabling interoperability between R (Seurat/Signac) and Python (episcanpy, snapATAC2).
This article demonstrates the conversion mechanics using a CRC (colorectal cancer) sample as a proxy. The matrix structure and I/O paths are the same whether rows represent genes or ATAC peaks.
t0 <- proc.time()
crc <- readH5AD("../crc_sample.h5ad", verbose = FALSE)
elapsed <- (proc.time() - t0)[["elapsed"]]
cat(sprintf("Loaded: %d cells x %d features in %.2fs\n",
ncol(crc), nrow(crc), elapsed))
#> Loaded: 935 cells x 25344 features in 1.20s
We run a standard Seurat workflow to produce clusters and a UMAP embedding for visualization.
set.seed(42L)
crc <- NormalizeData(crc, verbose = FALSE)
crc <- FindVariableFeatures(crc, nfeatures = 2000L, verbose = FALSE)
crc <- ScaleData(crc, verbose = FALSE)
crc <- RunPCA(crc, npcs = 30L, verbose = FALSE)
crc <- FindNeighbors(crc, dims = 1:20, verbose = FALSE)
crc <- FindClusters(crc, resolution = 0.5, verbose = FALSE)
crc <- RunUMAP(crc, dims = 1:20, verbose = FALSE)
DimPlot(crc, reduction = "umap", label = TRUE, label.size = 4, pt.size = 0.5) +
ggtitle("CRC sample: Seurat clusters") +
theme(plot.title = element_text(hjust = 0.5))
CRC sample UMAP coloured by Seurat clusters.
writeH5AD() serialises the Seurat object to a
spec-compliant AnnData file. Embeddings are stored in obsm,
sparse matrices in X, and metadata in obs.
h5ad_out <- file.path(tempdir(), "crc_atac_demo.h5ad")
t0 <- proc.time()
writeH5AD(crc, h5ad_out, overwrite = TRUE)
elapsed <- (proc.time() - t0)[["elapsed"]]
cat(sprintf("Wrote h5ad: %.2fs | %.1f MB\n",
elapsed, file.size(h5ad_out) / 1e6))
#> Wrote h5ad: 1.82s | 20.4 MB
library(reticulate)
Sys.setenv(NUMBA_THREADING_LAYER = "tbb", OMP_NUM_THREADS = "1")
use_condaenv("scverse")
import anndata
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt
adata = anndata.read_h5ad(r.h5ad_out)
print(f"Validated: {adata.n_obs} cells x {adata.n_vars} features")
#> Validated: 935 cells x 25344 features
print(f"Obs columns: {list(adata.obs.columns[:6])}")
#> Obs columns: ['orig.ident', 'nCount_RNA', 'nFeature_RNA', 'total_counts', 'log1p_total_counts', 'Sample ID']
print(f"Embeddings: {list(adata.obsm.keys())}")
#> Embeddings: ['X_pca', 'X_umap']
For real scATAC-seq data, scConvert preserves the core data matrix and annotations. Signac-specific components that live outside the standard AnnData/h5ad schema require separate handling.
| Component | Seurat / Signac | h5ad path |
|---|---|---|
| Peak count matrix | counts layer |
/X (or /raw/X) |
| Cell metadata | meta.data |
/obs |
| Peak metadata | meta.features |
/var |
| LSI / PCA embeddings | reductions |
/obsm/X_lsi, /obsm/X_pca |
| UMAP coordinates | reductions$umap |
/obsm/X_umap |
| Cluster labels | meta.data$seurat_clusters |
/obs/seurat_clusters |
| Component | Why | Workaround |
|---|---|---|
| Fragment files | Large tabix files, not part of h5ad | Copy fragments.tsv.gz + .tbi
separately |
| Gene annotations | GRanges object (R-specific) | Reload from EnsDb after import |
| Motif matrices | Signac-specific slot | Recompute with AddMotifs() |
| ChromatinAssay class | R S4 class, not encodable in h5ad | Upgrade with CreateChromatinAssay() |
After loading an h5ad file that contains peak data, you can upgrade
the standard Seurat assay to a Signac ChromatinAssay for
peak-aware analyses such as motif enrichment and TF-IDF
normalization.
suppressPackageStartupMessages(library(Signac))
atac <- readH5AD("peaks.h5ad")
peak_counts <- GetAssayData(atac, layer = "counts")
atac[["peaks"]] <- CreateChromatinAssay(
counts = peak_counts,
sep = c("-", "-"),
min.cells = 0,
min.features = 0
)
# Optionally re-attach genome annotations:
# library(EnsDb.Hsapiens.v86)
# Annotation(atac) <- GetGRangesFromEnsDb(ensdb = EnsDb.Hsapiens.v86)
# Fragments(atac) <- CreateFragmentObject("fragments.tsv.gz")
unlink(h5ad_out)