Changelog • scConvert

scConvert 0.2.0

Release Date: 2026-05-04

Breaking changes

HDF5 1.14+ required. SystemRequirements bumped from HDF5 (>= 1.10.0) to HDF5 (>= 1.14.0). The hdf5r close path segfaults on libhdf5 1.10.x when closing files with many open child IDs (groups/datasets opened via [[]] subsetting); this is upstream and not catchable in R. Modern Linux distros (Ubuntu 24.04+, Debian 12+, Fedora 38+), recent macOS packages, and Bioconductor’s Rhdf5lib all ship 1.14+, so this matches real deployments.

New features

Existing native readers (carried from development)

Native Stereo-seq GEF reader (LoadStereoSeqGef()). Pure-R reader for BGI .gef and .cellbin.gef files using hdf5r only. Handles both the square-bin and cell-bin schemas documented by STOmics. No Python, stereopy, or reticulate dependency. Spot coordinates are stored in meta.data$spatial_x/y and misc$spatial_technology = "StereoSeq".
Native CosMx SMI reader (LoadCosMx()). Thin R wrapper around Seurat::LoadNanostring() that validates the canonical flat-file bundle (*exprMat*.csv, *metadata*.csv, *fov_positions*.csv, *tx_file*.csv) and tags the result with misc$spatial_technology = "CosMx". No squidpy or reticulate dependency.
CLI auto-delegation for vendor raw formats. The scconvert C binary now auto-detects .gef, .cellbin.gef, and CosMx bundle directories and transparently delegates to the R backend via Rscript + execvp(). Users can write scconvert mosta.gef mosta.h5ad directly. Rscript lookup happens up-front; paths are absolutised; no shell-parsed command strings.
FOV round-trip through h5ad. writeH5AD() now serializes FOV@boundaries and FOV@molecules into a stable uns/spatial/{library}/segmentation/ and uns/spatial/{library}/molecules/ contract. readH5AD() automatically rebuilds any FOV library it finds on load. Backward-compatible with squidpy and scanpy (they ignore unknown uns/spatial/{lib}/ children).
CLI varp preservation. The new sc_stream_varp() in src/sc_groups.c mirrors sc_stream_obsp() and maps /varp/ (h5ad) to /misc/__varp__/ (h5seurat). Handles both sparse (CSR/CSC group) and dense (array dataset) varp entries. Closes the manuscript limitation “CLI does not preserve varp”.
Loom factor-level preservation. writeLoom() now stores factor levels and ordered flag under /scConvert_extensions/col_factor_levels/{name}, outside col_attrs so loompy/scanpy continue to read the file without errors. readLoom() restores the factors on load.
h5mu per-modality uns round-trip. writeH5MU() and readH5MU() now mirror each modality’s uns group under obj@misc[["__h5mu_uns_per_mod__"]][[modality]] so per-modality uns entries survive h5mu round-trips instead of being flattened.
IMC multi-image support. SeuratSpatialToH5AD() now iterates over every image in deterministic sorted order instead of processing only Images()[1]. Fixes the IMC 14/15 -> 11/13 double-roundtrip degradation documented in NOTES.md section 3.

P0 robustness fixes (Codex review response, Part A)

SOMA / SpatialData generic dispatch. Lambdas registered for the scConvert generic now accept filename = so scConvert.character() reaches the right method. Adds tests/testthat/test-generic-dispatch.R.
CLI build hygiene. CLI .o files are isolated to src/cli_obj/; make -f Makefile.cli install-bin copies the binary to inst/bin/scconvert; sc_find_cli() prefers it over the source-tree src/scconvert.
Bounded-memory R sparse streaming. .stream_sparse_group() now reads in 64 MiB chunks (tunable via options("scConvert.stream_chunk_bytes")) instead of materialising whole sparse matrices. Adds tests/testthat/test-stream-memory.R.
Canonical h5mu layout on write. writeH5MU() now writes the top-level /var (concat of modalities), /obsmap/{mod} (always 0..n-1 for Seurat sources), and /varmap/{mod} (block-diagonal with -1 sentinels) in the muon convention.
Atomic SOMA / SpatialData writes. Both writers now build under a sibling temp name and rename on success, so a mid-write crash leaves the user’s existing path untouched. (writeSpatialData() initially shipped with an on.exit that deleted its own freshly-renamed output; fixed in 2026-05-01 via a write_succeeded disarm flag.)
Python-validation tests in CI. tests/scverse-env.yml + setup-micromamba make tests/testthat/test-python-validation.R runnable on the GitHub runner. Test no longer hardcodes the macOS conda path.

Robustness

C CLI memory-safety helpers. New sc_xmalloc(), sc_xcalloc(), sc_xrealloc(), and sc_check_mul_size() in src/sc_util.c replace raw malloc() calls with overflow-checked allocations at the dense-embedding transpose sites in src/sc_zarr.c:1043, 1668, 1674 and the column-buffer allocation in src/sc_loom.c:138. Prevents SIZE_MAX overflow on chip-scale embeddings.
Defensive close_all wrap on direct-path conversion. R/Convert.R wraps hfile$close_all() in tryCatch for HDF5 1.10.x graceful degradation. (1.14+ is required and tested; the wrap is no-op there.)

Bug fixes

Native Stereo-seq GEF reader (LoadStereoSeqGef()). Pure-R reader for BGI .gef and .cellbin.gef files using hdf5r only. Handles both the square-bin and cell-bin schemas documented by STOmics. No Python, stereopy, or reticulate dependency. Spot coordinates are stored in meta.data$spatial_x/y and misc$spatial_technology = "StereoSeq".
Native CosMx SMI reader (LoadCosMx()). Thin R wrapper around Seurat::LoadNanostring() that validates the canonical flat-file bundle (*exprMat*.csv, *metadata*.csv, *fov_positions*.csv, *tx_file*.csv) and tags the result with misc$spatial_technology = "CosMx". No squidpy or reticulate dependency.
CLI auto-delegation for vendor raw formats. The scconvert C binary now auto-detects .gef, .cellbin.gef, and CosMx bundle directories and transparently delegates to the R backend via Rscript + execvp(). Users can write scconvert mosta.gef mosta.h5ad directly. Rscript lookup happens up-front; paths are absolutised; no shell-parsed command strings.
FOV round-trip through h5ad. writeH5AD() now serializes FOV@boundaries and FOV@molecules into a stable uns/spatial/{library}/segmentation/ and uns/spatial/{library}/molecules/ contract. readH5AD() automatically rebuilds any FOV library it finds on load. Backward-compatible with squidpy and scanpy (they ignore unknown uns/spatial/{lib}/ children).
CLI varp preservation. The new sc_stream_varp() in src/sc_groups.c mirrors sc_stream_obsp() and maps /varp/ (h5ad) to /misc/__varp__/ (h5seurat). Handles both sparse (CSR/CSC group) and dense (array dataset) varp entries. Closes the manuscript limitation “CLI does not preserve varp”.
Loom factor-level preservation. writeLoom() now stores factor levels and ordered flag under /scConvert_extensions/col_factor_levels/{name}, outside col_attrs so loompy/scanpy continue to read the file without errors. readLoom() restores the factors on load.
h5mu per-modality uns round-trip. writeH5MU() and readH5MU() now mirror each modality’s uns group under obj@misc[["__h5mu_uns_per_mod__"]][[modality]] so per-modality uns entries survive h5mu round-trips instead of being flattened.
IMC multi-image support. SeuratSpatialToH5AD() now iterates over every image in deterministic sorted order instead of processing only Images()[1]. Fixes the IMC 14/15 -> 11/13 double-roundtrip degradation documented in NOTES.md section 3.

Robustness

C CLI memory-safety helpers. New sc_xmalloc(), sc_xcalloc(), sc_xrealloc(), and sc_check_mul_size() in src/sc_util.c replace raw malloc() calls with overflow-checked allocations at the dense-embedding transpose sites in src/sc_zarr.c:1043, 1668, 1674 and the column-buffer allocation in src/sc_loom.c:138. Prevents SIZE_MAX overflow on chip-scale embeddings.

Bug fixes

readH5AD(): handle unsorted CSR column indices. scanpy-processed files such as scanpy.datasets.pbmc3k_processed() ship with CSR matrices whose column indices are not sorted within each row. Previously these produced an invalid dgCMatrix on read. readH5AD() and readH5MU() now detect this condition via .sort_dgc_indices() in R/LoadH5AD.R (and the sibling helper in R/LoadH5MU.R) and sort indices column-wise before constructing the sparse matrix. A regression test has been added in tests/testthat/test-regression-fixes.R.
H5SeuratToZarr(): do not crash on 1D dense datasets. The direct h5Seurat -> Zarr converter previously attempted to infer (rows, cols) from a 1D dense HDF5 dataset and crashed with a dimensionality error. It now emits a warning and skips the offending dataset. Tracked as “Chain D” in the benchmark manuscript; regression test included.
writeZarr(): skip scale.data layer when its shape differs from X. Seurat::ScaleData() produces an (n_hvg x n_cells) matrix whose row count does not match X’s n_genes. AnnData requires all layers/* to match X’s shape, so writeZarr() now skips any layer whose dimensions differ from the default assay’s data matrix. A regression test covers this case. (See R/SaveZarr.R:131.)

Testing

Added tests/testthat/test-regression-fixes.R with three regression tests pinning the bugs listed above.
Added tests/testthat/test-generic-dispatch.R (SOMA/SpatialData lambda signature pinning).
Added tests/testthat/test-stream-memory.R (bounded-memory verification on tiny chunk budgets).
Appended a canonical-h5mu-layout block to tests/testthat/test-h5mu-multimodal.R.
Test suite is now 166 test_that blocks with 773 assertions on macOS / Ubuntu / Windows (was 137 / 448 at 0.1.0).
19 vignettes build cleanly under tools::buildVignettes() including the 6 with live Python interop chunks via reticulate.

CI

Python validation in CI via setup-micromamba and tests/scverse-env.yml. anndata 0.12, scanpy 1.11, squidpy 1.8, mudata 0.3, loompy 3.0.

Documentation

Added NOTES.md describing the state of recent bug fixes, benchmark findings that are specific to the manuscript (not to users), and investigations into dataset-pipeline issues (Stereo-seq MOSTA and CosMx SMI – upstream format is not HDF5; see the notes file for details).

scConvert 0.1.0

Release Date: 2026-03-10

Highlights

Initial public release of scConvert — a universal single-cell format converter for R.

Universal Format Conversion

Support for 7 formats: h5ad, h5Seurat, h5mu, Loom, Zarr, RDS, and SingleCellExperiment
Hub architecture with 30+ conversion paths via scConvert()
Direct HDF5 paths for h5ad/h5Seurat without intermediate loading

Direct h5ad Loading

readH5AD() for native h5ad-to-Seurat conversion without intermediate files
Sparse (CSR/CSC) and dense matrix support
Categorical metadata, dimensional reductions, neighbor graphs, and spatial data

MuData (h5mu) Multimodal Support

readH5MU() / writeH5MU() for multimodal single-cell data
Automatic modality-to-assay name mapping (rna->RNA, prot->ADT, atac->ATAC)
No MuDataSeurat or Python dependency required

Zarr AnnData Support

readZarr() and writeZarr() for Zarr-based AnnData stores (v2 format)
Sparse CSR/CSC and dense matrix support
Categorical metadata and dimensional reduction preservation

Spatial Data (Visium)

Bidirectional Visium spatial data conversion with image reconstruction
Proper coordinate handling and scale factor preservation
Compatible with scanpy/squidpy spatial analysis workflows

C CLI Binary

Standalone scconvert binary for h5ad/h5Seurat/h5mu conversions
Streaming on-disk conversion without R or Python runtime
Options: --assay, --gzip, --overwrite, --quiet

BPCells On-Disk Loading

readH5AD(..., use.bpcells = TRUE) for memory-efficient atlas-scale analysis
Compatible with all Seurat analysis functions

Seurat v5 Compatibility

Full support for Seurat v5 Assay5 objects
Proper handling of V5 layered data structure in conversions