Reads a TileDB-SOMA experiment (local or remote) and returns a Seurat object. Supports cell/feature filtering via value queries, multiple measurements (assays), embeddings (obsm), and neighbor graphs (obsp).
Usage
readSOMA(
uri,
measurement = "RNA",
obs_query = NULL,
var_query = NULL,
obs_column_names = NULL,
var_column_names = NULL,
verbose = TRUE
)
Arguments
- uri
URI to a SOMA experiment. Can be a local path or a cloud URI (e.g., tiledb://... or s3://...).
- measurement
Name of the measurement to use as the default assay (default: "RNA").
- obs_query
Optional value filter string for cells, passed to SOMADataFrame$read(value_filter = ...). For example, "cell_type == 'T cell'".
- var_query
Optional value filter string for features.
- obs_column_names
Optional character vector of obs column names to read. If NULL (default), all columns are read.
- var_column_names
Optional character vector of var column names to read. If NULL (default), all columns are read.
- verbose
Show progress messages (default: TRUE).
Details
TileDB-SOMA is the cloud-native format underlying CELLxGENE Census (61M+ cells). This function requires the tiledbsoma R package.
The reader performs the following steps:
1. Opens the SOMA experiment at uri
2. Reads obs (cell metadata) with optional filtering
3. Reads var (feature metadata) from the specified measurement
4. Reads the X matrix (data layer) as a sparse matrix
5. Reads any obsm entries as dimensional reductions
6. Reads any obsp entries as neighbor graphs
7. Reads additional measurements as extra Seurat assays
Cell names are determined from the following columns in priority order:
obs_id, _index, index, barcode, cell_id.
If none are present, synthetic names (cell_1, cell_2, ...)
are generated.
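The priority rule above can be illustrated in base R. This is a minimal sketch, not the package's actual implementation: `resolve_cell_names()` is a hypothetical helper, and the two toy data frames are made up for illustration.

```r
# Hypothetical helper illustrating the documented priority order.
# Not part of the package's API; the real implementation may differ.
resolve_cell_names <- function(obs) {
  # Candidate columns, highest priority first (as listed above)
  candidates <- c("obs_id", "_index", "index", "barcode", "cell_id")
  # Subsetting `candidates` keeps the priority order intact
  hit <- candidates[candidates %in% colnames(obs)]
  if (length(hit) > 0) {
    return(as.character(obs[[hit[1]]]))
  }
  # No candidate column present: fall back to synthetic names
  paste0("cell_", seq_len(nrow(obs)))
}

# Toy obs tables (illustrative data only)
obs_a <- data.frame(barcode = c("AAAC", "AAAG"), cell_id = c("c1", "c2"))
obs_b <- data.frame(n_counts = c(10, 20, 30))

names_a <- resolve_cell_names(obs_a)  # barcode outranks cell_id
names_b <- resolve_cell_names(obs_b)  # no candidate column, synthetic names
```

In the first table both barcode and cell_id are present, so barcode is used because it appears earlier in the priority list; the second table has neither, so names of the form cell_1, cell_2, ... are generated.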
CELLxGENE Census workflow
CELLxGENE Census exposes 60M+ cells as a single SOMA experiment hosted
on public S3 (s3://cellxgene-census-public-us-west-2/).
readSOMA() can stream a filtered slice directly via
tiledbsoma, which uses S3 byte-range requests, so no full download is needed.
For programmatic Census workflows (release version pinning, metadata
discovery, source-collection lookups), prefer the dedicated
cellxgene.census R package; readSOMA() converts the resulting
SOMA experiment to Seurat in a single step.
The hub dispatcher (scConvert) recognises only file-extension
formats; SOMA URIs do not have one and therefore cannot be passed to
scConvert("s3://...", "out.h5ad") directly. Use an explicit
two-step pattern instead: read the experiment into a Seurat object with
readSOMA(), then write that object to the target format.
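A minimal sketch of such a two-step conversion follows. The Census URI is the one used in the Examples on this page; `out_path` is a hypothetical output file. The call form `scConvert(obj, out_path)` in step 2 is an assumption (that the dispatcher accepts an in-memory Seurat object as input) and is not confirmed by this page; substitute whichever writer the package documents. The remote calls are guarded with `if (FALSE)` because they need network access and the tiledbsoma package.

```r
# Census URI taken from this page's Examples; out_path is hypothetical.
census_uri <- paste0(
  "s3://cellxgene-census-public-us-west-2/",
  "cell-census/2024-07-01/soma/census_data/homo_sapiens"
)
out_path <- "out.h5ad"

# Why dispatch by extension cannot work on the URI: it has no file extension,
# whereas the output path does.
uri_ext <- tools::file_ext(census_uri)
out_ext <- tools::file_ext(out_path)

if (FALSE) {
  # Step 1: read a filtered slice of the SOMA experiment into a Seurat object
  obj <- readSOMA(uri = census_uri, obs_query = "cell_type == 'T cell'")

  # Step 2 (ASSUMED call form): write the Seurat object to the target format,
  # which is identified by the extension of out_path
  scConvert(obj, out_path)
}
```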
Examples
if (FALSE) { # \dontrun{
  # Read a local SOMA experiment
  obj <- readSOMA("path/to/experiment")

  # Stream a filtered slice from CELLxGENE Census public S3
  census_uri <- paste0(
    "s3://cellxgene-census-public-us-west-2/",
    "cell-census/2024-07-01/soma/census_data/homo_sapiens"
  )
  obj <- readSOMA(
    uri = census_uri,
    obs_query = "cell_type == 'T cell' & tissue_general == 'blood'"
  )

  # Read a specific measurement
  obj <- readSOMA("path/to/experiment", measurement = "ATAC")
} # }