NanoString CosMx SMI (Spatial Molecular Imager) is an imaging-based spatial transcriptomics platform. It combines in situ hybridization with cyclic fluorescence imaging to capture RNA and protein targets at single-cell and subcellular resolution across the full tissue section, with panels that scale to 6,000+ genes.
CosMx data arrives as a flat-file bundle: an expression matrix, per-cell metadata (cell morphology and co-registered immunofluorescence intensity), per-FOV position coordinates, and optionally transcript-level positions and segmentation polygons. Each FOV (field of view) is an independently imaged tile that is later assembled into a whole-slide mosaic via the global pixel coordinates.
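How per-FOV tiles become one mosaic can be sketched in a few lines of Python. The sketch assumes that a cell's global position is its local (within-tile) position plus the global offset of its FOV. The offset column names `x_global_px` / `y_global_px`, the toy values, and the helper `to_global` are illustrative assumptions, not part of scConvert; a real export may use a different axis convention, so check the result against the `CenterX_global_px` / `CenterY_global_px` columns in your own metadata.

```python
# Toy FOV offsets: global pixel position of each tile's origin (values are made up).
fov_offsets = {
    1: {"x_global_px": 0.0, "y_global_px": 0.0},
    2: {"x_global_px": 5000.0, "y_global_px": 0.0},
}

# Toy per-cell records with local (within-tile) centroids.
cells = [
    {"fov": 1, "cell_ID": 1, "CenterX_local_px": 120.0, "CenterY_local_px": 340.0},
    {"fov": 2, "cell_ID": 1, "CenterX_local_px": 120.0, "CenterY_local_px": 340.0},
]


def to_global(cell, offsets):
    """Shift a cell's local centroid by its FOV offset (assumed to be additive)."""
    off = offsets[cell["fov"]]
    return (
        cell["CenterX_local_px"] + off["x_global_px"],
        cell["CenterY_local_px"] + off["y_global_px"],
    )


global_xy = [to_global(c, fov_offsets) for c in cells]
print(global_xy)
```

The two toy cells share the same local centroid but land in different places on the slide because they belong to different tiles.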
scConvert loads the full bundle (counts, centroids, and the immunofluorescence metadata) in a single call and exports a squidpy-compatible h5ad without manual reformatting.
The demo dataset is the NanoString NSCLC Lung9 Rep1 sample from the public NanoString data portal. It contains 91,972 cells across 20 FOVs profiled with a 980-gene panel, co-registered with a membrane stain and four further fluorescence channels: PanCK (epithelial), CD45 and CD3 (immune), and DAPI (nuclear).
Public data from the same study (Lung5 Rep2, 106K cells) is used in
the squidpy CosMx tutorial. Both samples share identical bundle layout
and metadata columns and are interchangeable as inputs to
LoadCosMx().
# Download and extract the flat-file bundle from NanoString's public S3
# (example shown for Lung5 Rep2, the squidpy tutorial dataset)
url <- paste0(
  "https://nanostring-public-share.s3.us-west-2.amazonaws.com/",
  "SMI-Compressed/Lung5_Rep2+SMI+Flat+data.tar.gz"
)
dest <- file.path(dirname(getwd()), "cosmx_lung5.tar.gz")
download.file(url, dest, method = "wget", extra = "-c")
untar(dest, exdir = file.path(dirname(getwd()), "cosmx_lung5"))
# The rest of this article reads the Lung9 Rep1 bundle, which is extracted
# the same way into ../cosmx_lung9
data_dir <- normalizePath(
  "../cosmx_lung9/Lung9_Rep1-Flat_files_and_images",
  mustWork = FALSE
)
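Before loading, it can help to confirm that the extracted directory actually contains the expected flat files. The following is a minimal Python sketch of such a check, not scConvert's own validation. Only the `*metadata_file.csv` pattern is named in this article; the other glob patterns, the placeholder file names, and the helper `check_bundle` are assumptions based on the usual naming of the public bundles and may need adjusting.

```python
from pathlib import Path
import tempfile

# Glob patterns per component. Only *metadata_file.csv is named in this
# article; the remaining patterns are assumptions.
REQUIRED = {
    "expression matrix": "*exprMat_file.csv",
    "cell metadata": "*metadata_file.csv",
    "FOV positions": "*fov_positions_file.csv",
}
OPTIONAL = {
    "transcripts": "*tx_file.csv",
    "polygons": "*polygons.csv",
}


def check_bundle(data_dir):
    """Return (missing required components, optional components found)."""
    root = Path(data_dir)
    missing = [name for name, pat in REQUIRED.items() if not any(root.glob(pat))]
    present = [name for name, pat in OPTIONAL.items() if any(root.glob(pat))]
    return missing, present


# Demonstration on a throwaway directory holding empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    for fname in ("Lung9_Rep1_exprMat_file.csv", "Lung9_Rep1_metadata_file.csv"):
        (Path(tmp) / fname).touch()
    missing, present = check_bundle(tmp)

print(missing, present)
```

In the demonstration the FOV positions file is deliberately left out, so it is reported as missing, and no optional files are found.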
LoadCosMx() validates the bundle layout, invokes
Seurat::ReadNanostring() for the count matrix and
centroids, and automatically attaches the
*metadata_file.csv to the Seurat object. Attaching the metadata is
the step that Seurat::LoadNanostring() itself skips when polygon
files are absent.
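Conceptually, attaching the metadata is a keyed join between the cells in the count matrix and the rows of the metadata file. The Python sketch below illustrates that idea on toy data; it is not the LoadCosMx() implementation. The key format `<cell_ID>_<fov>` is an assumption about how cell names are built, and all values are made up.

```python
import csv
import io

# Toy stand-in for *metadata_file.csv (values are made up).
metadata_csv = io.StringIO(
    "fov,cell_ID,Area,Mean.PanCK,Mean.CD45\n"
    "1,1,1500,820.5,40.2\n"
    "1,2,900,35.0,610.7\n"
    "2,1,1200,15.3,22.8\n"
)

# Cell names as they might appear on the count matrix
# (assumed format: "<cell_ID>_<fov>").
matrix_cells = ["1_1", "2_1", "1_2", "7_2"]

# Index the metadata rows by the same key.
meta_by_cell = {
    f"{row['cell_ID']}_{row['fov']}": row for row in csv.DictReader(metadata_csv)
}

# Keep metadata only for cells present in both sources, in matrix order.
attached = {c: meta_by_cell[c] for c in matrix_cells if c in meta_by_cell}
unmatched = [c for c in matrix_cells if c not in meta_by_cell]

print(list(attached), unmatched)
print(attached["2_1"]["Mean.CD45"])
```

Cells without a metadata row end up in `unmatched`, which is worth inspecting whenever the loaded cell count differs from the number of metadata rows.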
t0 <- proc.time()
obj <- LoadCosMx(data_dir, fov = "cosmx", verbose = FALSE)
elapsed <- (proc.time() - t0)[["elapsed"]]
cat(sprintf(
  "Loaded: %d cells x %d genes in %.1fs\n FOVs: %d\n Metadata cols: %d\n",
  ncol(obj), nrow(obj), elapsed,
  length(unique(obj$fov)),
  ncol(obj[[]])
))
#> Loaded: 91805 cells x 980 genes in 11.7s
#> FOVs: 20
#> Metadata cols: 23
# First few rows of the per-cell metadata
head(obj[[]], 6)
The fov column identifies each cell’s field of view
(1–20 for this sample). Columns with Mean.* /
Max.* prefixes are co-registered immunofluorescence
intensities measured alongside the transcriptome:
| Column | Marker | Biology |
|---|---|---|
| Mean.MembraneStain | Membrane stain | Cell boundary reference |
| Mean.PanCK | Pan-cytokeratin | Epithelial / tumor cells |
| Mean.CD45 | CD45 (PTPRC) | All immune cells |
| Mean.CD3 | CD3 | T lymphocytes |
| Mean.DAPI | DAPI nuclear stain | Nuclear segmentation quality |
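Because the immunofluorescence columns follow a fixed prefix convention, they can be picked out programmatically once the data is on the Python side. The sketch below uses the per-cell column names as they appear in the exported object shown later in this article; the variable names are illustrative.

```python
# Per-cell column names, as listed for the exported object in this article.
obs_columns = [
    "orig.ident", "nCount_Nanostring", "nFeature_Nanostring", "fov", "cell_ID",
    "Area", "AspectRatio", "CenterX_local_px", "CenterY_local_px",
    "CenterX_global_px", "CenterY_global_px", "Width", "Height",
    "Mean.MembraneStain", "Max.MembraneStain", "Mean.PanCK", "Max.PanCK",
    "Mean.CD45", "Max.CD45", "Mean.CD3", "Max.CD3", "Mean.DAPI", "Max.DAPI",
]

# str.startswith accepts a tuple, so both prefixes are tested in one call.
if_columns = [c for c in obs_columns if c.startswith(("Mean.", "Max."))]

# Strip the prefix to recover the distinct marker names.
markers = sorted({c.split(".", 1)[1] for c in if_columns})

print(len(if_columns), markers)
```

With a real AnnData object the same filter can be applied to `adata.obs.columns`.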
# Plotting packages; these may already be attached in a setup chunk
library(ggplot2)
library(patchwork)
meta <- obj[[]]
# Total transcripts per cell, on a log10 axis
p1 <- ggplot(meta, aes(x = nCount_Nanostring)) +
  geom_histogram(bins = 80, fill = "#3182bd", color = NA) +
  scale_x_log10() +
  labs(x = "Total transcripts per cell (log10)", y = "Cells") +
  theme_classic(base_size = 11)
# Number of distinct genes detected per cell
p2 <- ggplot(meta, aes(x = nFeature_Nanostring)) +
  geom_histogram(bins = 80, fill = "#31a354", color = NA) +
  labs(x = "Unique genes per cell", y = "Cells") +
  theme_classic(base_size = 11)
# Cells per FOV, with FOVs ordered numerically rather than alphabetically
fov_counts <- as.data.frame(table(fov = factor(
  meta$fov,
  levels = as.character(sort(unique(as.integer(meta$fov))))
)))
p3 <- ggplot(fov_counts, aes(x = fov, y = Freq)) +
  geom_col(fill = "#756bb1") +
  labs(x = "FOV", y = "Cells") +
  theme_classic(base_size = 10) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
p1 + p2 + p3 + plot_layout(widths = c(1, 1, 2))
cat(sprintf(
  "Median transcripts/cell : %.0f\nMedian genes/cell : %.0f\nCells per FOV (median) : %.0f\n",
  median(meta$nCount_Nanostring),
  median(meta$nFeature_Nanostring),
  median(table(meta$fov))
))
#> Median transcripts/cell : 155
#> Median genes/cell : 98
#> Cells per FOV (median) : 4468
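The two QC columns summarized above follow the usual Seurat convention: nCount_Nanostring is the total number of transcripts counted in a cell, and nFeature_Nanostring is the number of genes with at least one count. A minimal Python sketch on a made-up count matrix shows how both are derived and how the medians are taken; the variable names are illustrative.

```python
from statistics import median

# Toy counts: one row per cell, one column per gene (values are made up).
counts = [
    [0, 3, 1, 0],
    [5, 0, 0, 2],
    [0, 0, 0, 1],
]

# Total transcripts per cell (analogous to nCount_Nanostring).
n_count = [sum(row) for row in counts]

# Genes with at least one transcript per cell (analogous to nFeature_Nanostring).
n_feature = [sum(1 for v in row if v > 0) for row in counts]

print(n_count, n_feature)
print(median(n_count), median(n_feature))
```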
Co-registered protein markers are in obj@meta.data;
plotting them on the spatial map reveals tissue architecture without any
clustering step.
# Single FOV to keep rendering fast; FOV 1 is shown here
# Note: the fov argument takes the image name set in LoadCosMx(fov = "cosmx")
p_panck <- ImageFeaturePlot(obj, fov = "cosmx", features = "Mean.PanCK",
                            max.cutoff = "q95") +
  ggtitle("PanCK (epithelial/tumor)") +
  theme(plot.title = element_text(size = 11))
p_cd45 <- ImageFeaturePlot(obj, fov = "cosmx", features = "Mean.CD45",
                           max.cutoff = "q95") +
  ggtitle("CD45 (immune cells)") +
  theme(plot.title = element_text(size = 11))
p_panck | p_cd45
scConvert() serializes the Seurat object to a
squidpy-compatible h5ad. The output preserves:
- X
- obsm/spatial (global pixel coordinates)
- obs
- uns/spatial/cosmx/segmentation/ (when present)
- uns/spatial/cosmx/molecules/ (when present)

h5ad_out <- file.path(tempdir(), "cosmx_lung5.h5ad")
t0 <- proc.time()
scConvert(obj, h5ad_out, overwrite = TRUE)
elapsed <- (proc.time() - t0)[["elapsed"]]
cat(sprintf("Wrote h5ad: %.1fs | %.1f MB\n",
elapsed, file.size(h5ad_out) / 1e6))
#> Wrote h5ad: 12.8s | 830.8 MB
import anndata
import scanpy as sc
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
adata = anndata.read_h5ad(r.h5ad_out)
print(adata)
#> AnnData object with n_obs × n_vars = 91805 × 980
#> obs: 'orig.ident', 'nCount_Nanostring', 'nFeature_Nanostring', 'fov', 'cell_ID', 'Area', 'AspectRatio', 'CenterX_local_px', 'CenterY_local_px', 'CenterX_global_px', 'CenterY_global_px', 'Width', 'Height', 'Mean.MembraneStain', 'Max.MembraneStain', 'Mean.PanCK', 'Max.PanCK', 'Mean.CD45', 'Max.CD45', 'Mean.CD3', 'Max.CD3', 'Mean.DAPI', 'Max.DAPI'
#> uns: 'cosmx', 'spatial', 'spatial_technology'
#> obsm: 'spatial'
print(f"\nobs columns : {list(adata.obs.columns[:10])} ...")
#>
#> obs columns : ['orig.ident', 'nCount_Nanostring', 'nFeature_Nanostring', 'fov', 'cell_ID', 'Area', 'AspectRatio', 'CenterX_local_px', 'CenterY_local_px', 'CenterX_global_px'] ...
print(f"obsm keys : {list(adata.obsm.keys())}")
#> obsm keys : ['spatial']
print(f"uns/spatial : {list(adata.uns.get('spatial', {}).keys())}")
#> uns/spatial : ['cosmx']
The spatial coordinates live in obsm["spatial"] (global
pixel coordinates, matching the original CenterX_global_px
/ CenterY_global_px columns).
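A quick structural check can confirm that an exported file has the layout described here before any plotting. The helper below is a hypothetical sketch, not part of scConvert or squidpy. It only relies on attributes that AnnData objects expose (`n_obs`, `obs.columns`, `obsm`), and it is demonstrated on a small stand-in object so that it runs without anndata installed.

```python
from types import SimpleNamespace


def check_spatial_layout(
    adata, required_obs=("fov", "CenterX_global_px", "CenterY_global_px")
):
    """Return a list of problems; an empty list means the layout looks as expected."""
    problems = []
    if "spatial" not in adata.obsm:
        problems.append("obsm['spatial'] is missing")
    elif tuple(adata.obsm["spatial"].shape) != (adata.n_obs, 2):
        problems.append("obsm['spatial'] is not (n_obs, 2)")
    for col in required_obs:
        if col not in adata.obs.columns:
            problems.append(f"obs column {col!r} is missing")
    return problems


# Stand-in object that mimics the attributes used above.
fake = SimpleNamespace(
    n_obs=3,
    obsm={"spatial": SimpleNamespace(shape=(3, 2))},
    obs=SimpleNamespace(columns=["fov", "CenterX_global_px"]),
)
print(check_spatial_layout(fake))
```

With the object loaded in this article, `check_spatial_layout(adata)` would be expected to return an empty list.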
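# Color all cells by FOV; each color is one tile of the tissue mosaic.
# scConvert already wrote these coordinates to obsm["spatial"]; the assignment
# below rebuilds the same array from the obs columns, so it is redundant but harmless.
adata.obsm["spatial"] = adata.obs[["CenterX_global_px",
                                   "CenterY_global_px"]].to_numpy()
sc.pl.embedding(
    adata,
    basis = "spatial",
    color = "fov",
    s = 2,
    frameon = False,
    title = "CosMx NSCLC: cells colored by FOV (20 tiles)",
    legend_loc = None
)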
plt.tight_layout()
plt.show()
# PanCK intensity on the spatial map — highlights epithelial/tumor regions
sc.pl.embedding(
    adata,
    basis = "spatial",
    color = "Mean.PanCK",
    s = 2,
    frameon = False,
    color_map = "viridis",
    vmax = "p95",
    title = "PanCK intensity (epithelial / tumor cells)"
)
plt.tight_layout()
plt.show()
# CD45 intensity — highlights immune infiltration
sc.pl.embedding(
    adata,
    basis = "spatial",
    color = "Mean.CD45",
    s = 2,
    frameon = False,
    color_map = "magma",
    vmax = "p95",
    title = "CD45 intensity (immune cells)"
)
plt.tight_layout()
plt.show()
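Both the Seurat plots (max.cutoff = "q95") and the scanpy plots (vmax = "p95") cap the color scale at the 95th percentile so that a few very bright cells do not wash out the rest of the map. The sketch below illustrates the idea on made-up intensities using only the standard library. The helper `clip_at_percentile` is hypothetical, and the exact interpolation rule used by either plotting library may differ from the one shown.

```python
from statistics import quantiles


def clip_at_percentile(values, pct=95):
    """Cap values at their pct-th percentile (linear interpolation)."""
    # quantiles(..., n=100) returns the 1st to 99th percentiles,
    # so index pct - 1 selects the pct-th one.
    cutoff = quantiles(values, n=100, method="inclusive")[pct - 1]
    return cutoff, [min(v, cutoff) for v in values]


# Toy intensities: twenty ordinary cells plus one bright outlier (made up).
intensities = [float(i) for i in range(20)] + [1000.0]
cutoff, clipped = clip_at_percentile(intensities)

print(cutoff, max(clipped))
```

After clipping, the outlier shares the top color with the brightest ordinary cells instead of stretching the scale on its own.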
| Component | h5ad location | Preserved |
|---|---|---|
| Expression matrix | X | Yes |
| Cell centroids (global) | obsm/spatial | Yes |
| FOV identifier | obs/fov | Yes |
| Cell morphology (Area, AspectRatio…) | obs | Yes |
| IF intensities (Mean.PanCK, Mean.CD45…) | obs | Yes |
| Global pixel coordinates | obs/CenterX_global_px, obs/CenterY_global_px | Yes |
| Segmentation polygons | uns/spatial/cosmx/segmentation/ | Yes (when present) |
| Transcript molecules | uns/spatial/cosmx/molecules/ | Yes (when present) |
Immunofluorescence columns are loaded from the
*metadata_file.csv during LoadCosMx(). Seurat::ReadNanostring()
alone does not return them, so they must be read and attached
explicitly, which LoadCosMx() handles automatically.