Changelog
Source: NEWS.md
scConvert 0.2.0
Release Date: 2026-05-04
Breaking changes
- HDF5 1.14+ required. SystemRequirements bumped from HDF5 (>= 1.10.0) to HDF5 (>= 1.14.0). The hdf5r close path segfaults on libhdf5 1.10.x when closing files with many open child IDs (groups/datasets opened via [[]] subsetting); this is upstream and not catchable in R. Modern Linux distros (Ubuntu 24.04+, Debian 12+, Fedora 38+), recent macOS packages, and Bioconductor’s Rhdf5lib all ship 1.14+, so this matches real deployments.
New features
Existing native readers (carried from development)
- Native Stereo-seq GEF reader (LoadStereoSeqGef()). Pure-R reader for BGI .gef and .cellbin.gef files using hdf5r only. Handles both the square-bin and cell-bin schemas documented by STOmics. No Python, stereopy, or reticulate dependency. Spot coordinates are stored in meta.data$spatial_x/y, and the object is tagged with misc$spatial_technology = "StereoSeq".
- Native CosMx SMI reader (LoadCosMx()). Thin R wrapper around Seurat::LoadNanostring() that validates the canonical flat-file bundle (*exprMat*.csv, *metadata*.csv, *fov_positions*.csv, *tx_file*.csv) and tags the result with misc$spatial_technology = "CosMx". No squidpy or reticulate dependency.
- CLI auto-delegation for vendor raw formats. The scconvert C binary now auto-detects .gef, .cellbin.gef, and CosMx bundle directories and transparently delegates to the R backend via Rscript + execvp(). Users can write scconvert mosta.gef mosta.h5ad directly. Rscript lookup happens up-front; paths are absolutised; no shell-parsed command strings.
- FOV round-trip through h5ad. writeH5AD() now serializes FOV@boundaries and FOV@molecules into a stable uns/spatial/{library}/segmentation/ and uns/spatial/{library}/molecules/ contract. readH5AD() automatically rebuilds any FOV library it finds on load. Backward-compatible with squidpy and scanpy (they ignore unknown uns/spatial/{lib}/ children).
- CLI varp preservation. The new sc_stream_varp() in src/sc_groups.c mirrors sc_stream_obsp() and maps /varp/ (h5ad) to /misc/__varp__/ (h5seurat). Handles both sparse (CSR/CSC group) and dense (array dataset) varp entries. Closes the manuscript limitation “CLI does not preserve varp”.
- Loom factor-level preservation. writeLoom() now stores factor levels and the ordered flag under /scConvert_extensions/col_factor_levels/{name}, outside col_attrs so loompy/scanpy continue to read the file without errors. readLoom() restores the factors on load.
- h5mu per-modality uns round-trip. writeH5MU() and readH5MU() now mirror each modality’s uns group under obj@misc[["__h5mu_uns_per_mod__"]][[modality]] so per-modality uns entries survive h5mu round-trips instead of being flattened.
- IMC multi-image support. SeuratSpatialToH5AD() now iterates over every image in deterministic sorted order instead of processing only Images()[1]. Fixes the IMC 14/15 -> 11/13 double-roundtrip degradation documented in NOTES.md section 3.
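The CLI auto-delegation entry can be pictured with a minimal sketch along the following lines. It is not taken from the scconvert sources: the names detect_vendor_format, delegate_to_rscript, and the driver-script argument are hypothetical, and CosMx bundle-directory detection is left out. It only illustrates two points from the entry: the longer .cellbin.gef suffix has to be tested before .gef, and execvp() receives an argument array, so no shell ever parses the command.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical format tags; the real CLI may name these differently. */
typedef enum { FMT_OTHER = 0, FMT_GEF, FMT_CELLBIN_GEF } vendor_format;

/* Return 1 if `path` ends with `suffix`, 0 otherwise. */
int has_suffix(const char *path, const char *suffix) {
    size_t n = strlen(path), m = strlen(suffix);
    return n >= m && strcmp(path + (n - m), suffix) == 0;
}

/* Test the longer suffix first: every ".cellbin.gef" path also ends in ".gef". */
vendor_format detect_vendor_format(const char *path) {
    assert(path != NULL);
    if (has_suffix(path, ".cellbin.gef")) return FMT_CELLBIN_GEF;
    if (has_suffix(path, ".gef")) return FMT_GEF;
    return FMT_OTHER;
}

/* Replace the current process with Rscript.
 * `script` is a hypothetical path to an R driver script.
 * Arguments travel as an argv array, so spaces or quotes in the paths
 * are never interpreted by a shell. Returns -1 only if execvp() fails. */
int delegate_to_rscript(const char *script, const char *in, const char *out) {
    char *const argv[] = {
        (char *)"Rscript", (char *)script, (char *)in, (char *)out, NULL
    };
    execvp(argv[0], argv);
    perror("execvp");
    return -1;
}
```

A caller would run detect_vendor_format() on the input path and call delegate_to_rscript() only when the result is not FMT_OTHER.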
P0 robustness fixes (Codex review response, Part A)
- SOMA / SpatialData generic dispatch. Lambdas registered for the scConvert generic now accept filename = so scConvert.character() reaches the right method. Adds tests/testthat/test-generic-dispatch.R.
- CLI build hygiene. CLI .o files are isolated to src/cli_obj/; make -f Makefile.cli install-bin copies the binary to inst/bin/scconvert; sc_find_cli() prefers it over the source-tree src/scconvert.
- Bounded-memory R sparse streaming. .stream_sparse_group() now reads in 64 MiB chunks (tunable via options("scConvert.stream_chunk_bytes")) instead of materialising whole sparse matrices. Adds tests/testthat/test-stream-memory.R.
- Canonical h5mu layout on write. writeH5MU() now writes the top-level /var (concat of modalities), /obsmap/{mod} (always 0..n-1 for Seurat sources), and /varmap/{mod} (block-diagonal with -1 sentinels) in the muon convention.
- Atomic SOMA / SpatialData writes. Both writers now build under a sibling temp name and rename on success, so a mid-write crash leaves the user’s existing path untouched. (writeSpatialData() initially shipped with an on.exit that deleted its own freshly-renamed output; fixed on 2026-05-01 via a write_succeeded disarm flag.)
- Python-validation tests in CI. tests/scverse-env.yml + setup-micromamba make tests/testthat/test-python-validation.R runnable on the GitHub runner. The test no longer hardcodes the macOS conda path.
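The write-then-rename pattern behind the atomic-write entry can be sketched in C as below. This is an illustration only: the actual writers are R functions that build directory stores, whereas this sketch handles a single file, and the function name atomic_write and the ".tmp" suffix are assumptions. The write_succeeded flag plays the role described in the entry: cleanup runs only while the flag is unset, and it only ever touches the temp name.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write `data` to `final_path` by way of a sibling temp file.
 * Returns 0 on success and -1 on failure. On failure, whatever was
 * previously stored at final_path is left untouched. */
int atomic_write(const char *final_path, const char *data) {
    char tmp_path[4096];
    int write_succeeded = 0; /* disarm flag for the cleanup step */

    assert(final_path != NULL && data != NULL);

    /* Sibling temp name: same directory, hence the same filesystem,
     * which is what lets rename() swap the file in one step. */
    int n = snprintf(tmp_path, sizeof tmp_path, "%s.tmp", final_path);
    if (n < 0 || (size_t)n >= sizeof tmp_path) return -1;

    FILE *fp = fopen(tmp_path, "wb");
    if (fp == NULL) return -1;

    size_t len = strlen(data);
    int ok = fwrite(data, 1, len, fp) == len;
    if (fclose(fp) != 0) ok = 0;

    /* On POSIX systems rename() atomically replaces an existing target. */
    if (ok && rename(tmp_path, final_path) == 0) write_succeeded = 1;

    /* Cleanup removes the temp file only, and only after a failure. */
    if (!write_succeeded) remove(tmp_path);
    return write_succeeded ? 0 : -1;
}
```

Because cleanup targets the temp name and is skipped once write_succeeded is set, it cannot delete the freshly renamed output.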
Robustness
- C CLI memory-safety helpers. New sc_xmalloc(), sc_xcalloc(), sc_xrealloc(), and sc_check_mul_size() in src/sc_util.c replace raw malloc() calls with overflow-checked allocations at the dense-embedding transpose sites in src/sc_zarr.c:1043, 1668, 1674 and the column-buffer allocation in src/sc_loom.c:138. Prevents SIZE_MAX overflow on chip-scale embeddings.
- Defensive close_all wrap on direct-path conversion. R/Convert.R wraps hfile$close_all() in tryCatch for HDF5 1.10.x graceful degradation. (1.14+ is required and tested; the wrap is a no-op there.)
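The signatures of the sc_* helpers are not shown in this changelog, so the following is only a sketch of the general technique of overflow-checked allocation, using the hypothetical names check_mul_size and xmalloc_matrix. The product rows * cols * elem_size is validated against SIZE_MAX before malloc() is called, so a wrapped-around size can never yield an undersized buffer.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Store a * b in *out and return 1, or return 0 (leaving *out alone)
 * if the product would not fit in size_t. */
int check_mul_size(size_t a, size_t b, size_t *out) {
    assert(out != NULL);
    if (a != 0 && b > SIZE_MAX / a) return 0;
    *out = a * b;
    return 1;
}

/* Allocate an n_rows x n_cols buffer of elem_size-byte elements.
 * Exits the process on overflow or out-of-memory instead of returning
 * a buffer that is smaller than the caller believes. */
void *xmalloc_matrix(size_t n_rows, size_t n_cols, size_t elem_size) {
    size_t n_elems, n_bytes;
    if (!check_mul_size(n_rows, n_cols, &n_elems) ||
        !check_mul_size(n_elems, elem_size, &n_bytes)) {
        fprintf(stderr, "allocation size overflows size_t\n");
        exit(EXIT_FAILURE);
    }
    /* Request at least one byte so a zero-sized matrix still gets a
     * valid, freeable pointer. */
    void *p = malloc(n_bytes ? n_bytes : 1);
    if (p == NULL) {
        fprintf(stderr, "out of memory\n");
        exit(EXIT_FAILURE);
    }
    return p;
}
```

Exiting on failure is a design choice for a command-line tool; a library would more likely return an error code to the caller.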
Bug fixes
- readH5AD(): handle unsorted CSR column indices. scanpy-processed files such as scanpy.datasets.pbmc3k_processed() ship with CSR matrices whose column indices are not sorted within each row. Previously these produced an invalid dgCMatrix on read. readH5AD() and readH5MU() now detect this condition via .sort_dgc_indices() in R/LoadH5AD.R (and the sibling helper in R/LoadH5MU.R) and sort indices column-wise before constructing the sparse matrix. A regression test has been added in tests/testthat/test-regression-fixes.R.
- H5SeuratToZarr(): do not crash on 1D dense datasets. The direct h5Seurat -> Zarr converter previously attempted to infer (rows, cols) from a 1D dense HDF5 dataset and crashed with a dimensionality error. It now emits a warning and skips the offending dataset. Tracked as “Chain D” in the benchmark manuscript; regression test included.
- writeZarr(): skip scale.data layer when its shape differs from X. Seurat::ScaleData() produces an (n_hvg x n_cells) matrix whose row count does not match X’s n_genes. AnnData requires all layers/* to match X’s shape, so writeZarr() now skips any layer whose dimensions differ from the default assay’s data matrix. A regression test covers this case. (See R/SaveZarr.R:131.)
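To make the unsorted-index fix concrete, here is a small C sketch of sorting the indices inside each compressed slice while keeping the values aligned. It is not the .sort_dgc_indices() helper, which lives in R; the function names are hypothetical, and insertion sort is used only because it keeps the example short. The layout follows the usual compressed sparse convention: indptr[r] to indptr[r + 1] delimits the entries of slice r in indices and data.

```c
#include <assert.h>
#include <stddef.h>

/* Return 1 if indices are in non-decreasing order within every slice. */
int csr_rows_sorted(const int *indptr, const int *indices, int n_rows) {
    for (int r = 0; r < n_rows; r++)
        for (int k = indptr[r] + 1; k < indptr[r + 1]; k++)
            if (indices[k - 1] > indices[k]) return 0;
    return 1;
}

/* Sort indices within each slice in place, moving each value in `data`
 * together with its index. indptr is unchanged because the number of
 * entries per slice does not change. */
void csr_sort_rows(const int *indptr, int *indices, double *data, int n_rows) {
    assert(indptr != NULL && indices != NULL && data != NULL);
    for (int r = 0; r < n_rows; r++) {
        for (int k = indptr[r] + 1; k < indptr[r + 1]; k++) {
            int idx = indices[k];
            double val = data[k];
            int j = k - 1;
            /* Shift larger entries of this slice one position right. */
            while (j >= indptr[r] && indices[j] > idx) {
                indices[j + 1] = indices[j];
                data[j + 1] = data[j];
                j--;
            }
            indices[j + 1] = idx;
            data[j + 1] = val;
        }
    }
}
```

Running the sortedness check first lets already-sorted files skip the sort entirely, which matches the detect-then-sort behaviour the entry describes.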
Testing
- Added tests/testthat/test-regression-fixes.R with three regression tests pinning the bugs listed above.
- Added tests/testthat/test-generic-dispatch.R (SOMA/SpatialData lambda signature pinning).
- Added tests/testthat/test-stream-memory.R (bounded-memory verification on tiny chunk budgets).
- Appended a canonical-h5mu-layout block to tests/testthat/test-h5mu-multimodal.R.
- Test suite is now 166 test_that blocks with 773 assertions on macOS / Ubuntu / Windows (was 137 / 448 at 0.1.0).
- 19 vignettes build cleanly under tools::buildVignettes() including the 6 with live Python interop chunks via reticulate.
scConvert 0.1.0
Release Date: 2026-03-10
Highlights
Initial public release of scConvert — a universal single-cell format converter for R.
Universal Format Conversion
- Support for 7 formats: h5ad, h5Seurat, h5mu, Loom, Zarr, RDS, and SingleCellExperiment
- Hub architecture with 30+ conversion paths via scConvert()
- Direct HDF5 paths for h5ad/h5Seurat without intermediate loading
Direct h5ad Loading
- readH5AD() for native h5ad-to-Seurat conversion without intermediate files
- Sparse (CSR/CSC) and dense matrix support
- Categorical metadata, dimensional reductions, neighbor graphs, and spatial data
MuData (h5mu) Multimodal Support
- readH5MU() / writeH5MU() for multimodal single-cell data
- Automatic modality-to-assay name mapping (rna->RNA, prot->ADT, atac->ATAC)
- No MuDataSeurat or Python dependency required
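The modality-to-assay mapping listed above amounts to a small lookup table. The sketch below shows that idea in C for illustration; the package itself does this in R, the function name modality_to_assay is hypothetical, and passing unknown modality names through unchanged is an assumption rather than documented behaviour.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Map an h5mu modality name to a Seurat assay name.
 * Only the three pairs named in the changelog are listed. */
const char *modality_to_assay(const char *modality) {
    static const struct { const char *mod; const char *assay; } table[] = {
        { "rna",  "RNA"  },
        { "prot", "ADT"  },
        { "atac", "ATAC" },
    };
    assert(modality != NULL);
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
        if (strcmp(modality, table[i].mod) == 0) return table[i].assay;
    /* Assumption: names without a table entry are returned as-is. */
    return modality;
}
```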
Zarr AnnData Support
- readZarr() and writeZarr() for Zarr-based AnnData stores (v2 format)
- Sparse CSR/CSC and dense matrix support
- Categorical metadata and dimensional reduction preservation
Spatial Data (Visium)
- Bidirectional Visium spatial data conversion with image reconstruction
- Proper coordinate handling and scale factor preservation
- Compatible with scanpy/squidpy spatial analysis workflows
C CLI Binary
- Standalone scconvert binary for h5ad/h5Seurat/h5mu conversions
- Streaming on-disk conversion without R or Python runtime
- Options: --assay, --gzip, --overwrite, --quiet