Analyze collections of spatial data objects
In this context, spatial data is any dataset in which a set of observations has been measured and each observation also has a location in physical space.
A good example of this is spatial transcriptomics, where each cell (observation) has been measured for the expression of a collection of genes (features), and where the location of each cell is given in x/y coordinates.
The goal of this project is to allow users to:
- Select and annotate regions from one or more spatial datasets (e.g. cores from a TMA)
- Build an aggregate collection which consists of multiple regions from one or more source datasets
- Save that collection as a cohesive data package
- Cluster cells (e.g. with k-means, Leiden, or Louvain) and annotate each cluster (e.g. with a cell type)
- Perform neighborhood analysis across the aggregate collection
- Compute summary metrics across cell types, neighborhoods, and regions
- Visualize the entire spatial data collection in an interactive way
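Neighborhood analysis is commonly done by describing each cell by the cell-type composition of its nearest neighbors and then clustering those composition vectors with k-means. The sketch below illustrates that general technique in pure Python. It is an assumption, not confirmed here, that the workflow follows exactly these steps. The function names are hypothetical, while n_neighbors and n_neighborhoods mirror the workflow parameters of the same names.

```python
import math
import random
from collections import Counter
from typing import List, Sequence, Tuple


def neighborhood_profiles(coords: Sequence[Tuple[float, float]],
                          cell_types: Sequence[str],
                          n_neighbors: int) -> List[List[float]]:
    """Describe each cell by the cell-type fractions among its n_neighbors nearest cells."""
    types = sorted(set(cell_types))
    profiles = []
    for i, (xi, yi) in enumerate(coords):
        # Brute-force nearest neighbors (O(n^2)); real data would call for a KD-tree.
        dists = sorted(
            (math.hypot(xi - xj, yi - yj), j)
            for j, (xj, yj) in enumerate(coords)
            if j != i  # the cell itself is excluded from its own neighborhood
        )
        nearest = [j for _, j in dists[:n_neighbors]]
        counts = Counter(cell_types[j] for j in nearest)
        denom = max(len(nearest), 1)
        profiles.append([counts[t] / denom for t in types])
    return profiles


def kmeans(points: List[List[float]], k: int, n_iter: int = 100, seed: int = 0) -> List[int]:
    """Plain Lloyd's k-means; returns a cluster label for each point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels: List[int] = []
    for _ in range(n_iter):
        # Assignment step: nearest center by squared Euclidean distance.
        new_labels = [
            min(range(k), key=lambda c, p=p: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        if new_labels == labels:
            break  # converged: no assignment changed
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cellular_neighborhoods(coords, cell_types, n_neighbors: int, n_neighborhoods: int) -> List[int]:
    """Assign each cell to one of n_neighborhoods neighborhoods (hypothetical helper)."""
    profiles = neighborhood_profiles(coords, cell_types, n_neighbors)
    return kmeans(profiles, k=n_neighborhoods)
```

For example, with five cells of type "A" near the origin and five of type "B" near (100, 100), calling cellular_neighborhoods(coords, types, n_neighbors=3, n_neighborhoods=2) gives the first group one label and the second group another. Whether a cell counts among its own neighbors is a design choice; this sketch excludes it.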
Regions can be selected interactively with the Streamlit app defined by app.py and the app/ directory.
To modify the app, clone the GitHub repository and run it on your local system. With uv installed:
- Install the prerequisites:
    uv sync --locked
- Run the app:
    uv run streamlit run app.py
Local changes to the app's code can be applied to the running app automatically, without restarting the process.
A pre-built version of the app can be launched using Docker:
docker run -p 8000:8000 quay.io/hdc-workflows/quantitative-spatial-analysis
By default the latest tag is used. Specific versions of the code are tagged by short commit hash (e.g. quay.io/hdc-workflows/quantitative-spatial-analysis:21864a8).
The full list of available tags can be found here: https://quay.io/repository/hdc-workflows/quantitative-spatial-analysis?tab=tags.
Each region is defined by:
- A source dataset of a particular type (e.g. "xenium") that is accessible from a supported data repository (e.g. "cirro").
- One or more outlines, given as arrays of x/y coordinates; any point that falls within one of these shapes is included in the region.
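The rule that a region includes any point inside one of its outlines can be implemented with a standard ray-casting point-in-polygon test, taking the union over all outlines. The sketch below assumes each outline is a dict with equal-length "x" and "y" lists, as in the region JSON format. The function names are hypothetical, and the app itself may use a different geometry routine that treats points exactly on an edge differently.

```python
from typing import Dict, List, Sequence


def point_in_outline(px: float, py: float, xs: Sequence[float], ys: Sequence[float]) -> bool:
    """Ray casting: count how many polygon edges a ray from (px, py) toward +x crosses."""
    inside = False
    j = len(xs) - 1
    for i in range(len(xs)):
        xi, yi, xj, yj = xs[i], ys[i], xs[j], ys[j]
        # Only edges that straddle the horizontal line y = py can be crossed.
        if (yi > py) != (yj > py):
            # x coordinate where edge (j -> i) meets y = py
            x_cross = xj + (py - yj) * (xi - xj) / (yi - yj)
            if px < x_cross:
                inside = not inside  # each crossing toggles inside/outside
        j = i
    return inside


def in_region(px: float, py: float, outlines: List[Dict[str, List[float]]]) -> bool:
    """A point belongs to the region if it lies inside any of the region's outlines."""
    return any(point_in_outline(px, py, o["x"], o["y"]) for o in outlines)
```

Selecting the cells of a region then amounts to filtering cell coordinates with in_region(x, y, region["outline"]).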
Serialized as JSON, a region follows this pattern:
{
  "dataset": {
    "cirro_source": {
      "domain": "organization.cirro.bio",
      "project": "00000000-0000-0000-0000-000000000000",
      "dataset": "00000000-0000-0000-0000-000000000000",
      "path": "data/analysis-subfolder"
    },
    "type": "xenium"
  },
  "outline": [
    {
      "x": [
        1671.0085411393027,
        1497.2881668808081,
        1005.0804398150743,
        3431.375000292045
      ],
      "xref": "x",
      "y": [
        4049.5208189474565,
        3861.008928715988,
        3528.3408871310435,
        4265.75504597767
      ],
      "yref": "y"
    }
  ]
}
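Because each outline pairs its "x" and "y" arrays point by point, a region file is only usable if both arrays have the same length. The sketch below parses a region definition with the standard json module and checks that constraint. The Region and Outline classes and the parse_region function are hypothetical, and treating any key ending in "_source" as the repository block is an assumption generalized from the "cirro_source" example.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class Outline:
    x: List[float]
    y: List[float]


@dataclass
class Region:
    dataset_type: str       # e.g. "xenium"
    source: dict            # e.g. the "cirro_source" block (domain, project, dataset, path)
    outlines: List[Outline]


def parse_region(text: str) -> Region:
    """Parse a region definition and check that every outline is a usable polygon."""
    obj = json.loads(text)
    dataset = obj["dataset"]
    # Assumption: exactly one "<repository>_source" block sits next to "type".
    source_keys = [k for k in dataset if k.endswith("_source")]
    if len(source_keys) != 1:
        raise ValueError(f"expected one *_source block, found {source_keys}")
    outlines = []
    for shape in obj["outline"]:
        xs, ys = shape["x"], shape["y"]
        if len(xs) != len(ys):
            raise ValueError(f"outline has {len(xs)} x values but {len(ys)} y values")
        if len(xs) < 3:
            raise ValueError("an outline needs at least 3 vertices")
        outlines.append(Outline(x=[float(v) for v in xs], y=[float(v) for v in ys]))
    return Region(dataset_type=dataset["type"], source=dataset[source_keys[0]], outlines=outlines)
```

Rejecting mismatched arrays early gives a clear error instead of a silently malformed polygon later in the analysis.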
The analyze_regions.nf workflow analyzes one or more spatial regions using the analysis steps outlined above. It is a Nextflow workflow and can be run by following the official Nextflow documentation.
The workflow takes the following parameters:
- regions: The path to a CSV file listing the regions, with the columns id (the name/identifier of the region) and uri (the path to the region.json file defined above)
- resolution: The Leiden clustering resolution used to determine cell types
- n_neighbors: The number of neighbors to consider for each cell when performing neighborhood analysis
- n_neighborhoods: The number of neighborhoods to return (used as k for k-means clustering)
- outdir: The base path for all output files. Because of how Nextflow publishes files, this path cannot have a leading or trailing slash (/).
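The regions input is a plain CSV with id and uri columns, so it can be generated with Python's csv module. In the sketch below, the regions_csv function, the region IDs, and the s3:// URIs are hypothetical placeholders; which URI schemes the workflow accepts is not stated here.

```python
import csv
import io
from typing import Dict


def regions_csv(regions: Dict[str, str]) -> str:
    """Render the `regions` input: one row per region with `id` and `uri` columns."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["id", "uri"], lineterminator="\n")
    writer.writeheader()
    for region_id, uri in regions.items():
        writer.writerow({"id": region_id, "uri": uri})
    return buf.getvalue()


# Hypothetical regions; each URI points at a region.json file.
print(regions_csv({
    "core_A1": "s3://example-bucket/regions/core_A1.json",
    "core_B2": "s3://example-bucket/regions/core_B2.json",
}), end="")
```

Saved as regions.csv, the file could then be passed with Nextflow's standard parameter syntax, for example nextflow run analyze_regions.nf --regions regions.csv --resolution 1.0 --n_neighbors 10 --n_neighborhoods 6 --outdir results, where all parameter values are placeholders.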
The workflow writes its outputs under outdir with the following layout:
├── combined
│ ├── counts.csv # The number of cells per region, per cell type, per neighborhood
│ └── spatialdata.h5ad # The spatial coordinates, measurements (e.g. gene expression) and annotations for each individual cell
├── logs
│ ├── cluster_points.txt
│ ├── make_plots.txt
│ ├── neighborhood_analysis.txt
│ ├── summary_stats.txt
│ └── vitessce.txt
└── regions
├── <regionId> # Information for each individual region
│ ├── cell_types.vt.json
│ ├── neighborhoods.vt.json
│ └── spatialdata.zarr.zip # SpatialData object in zarr format
└── <regionId>.json # Definition of the region as shown above
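Since combined/counts.csv holds the number of cells per region, per cell type, and per neighborhood, a typical summary metric is the fraction of each cell type within each neighborhood of each region. The sketch below assumes, hypothetically, a long-format file with the columns region, cell_type, neighborhood, and count; if the actual file is laid out differently (for example with one column per cell type), the parsing would need to change.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Tuple


def cell_type_fractions(counts_csv: str) -> Dict[Tuple[str, str, str], float]:
    """Fraction of each cell type within each (region, neighborhood) pair.

    Assumes hypothetical long-format columns: region, cell_type, neighborhood, count.
    Keys of the result are (region, neighborhood, cell_type).
    """
    totals: Dict[Tuple[str, str], int] = defaultdict(int)
    per_type: Dict[Tuple[str, str, str], int] = defaultdict(int)
    for row in csv.DictReader(io.StringIO(counts_csv)):
        key = (row["region"], row["neighborhood"])
        n = int(row["count"])
        totals[key] += n
        per_type[key + (row["cell_type"],)] += n
    # Divide each cell-type count by the total for its (region, neighborhood).
    return {k: v / totals[k[:2]] for k, v in per_type.items()}
```

Applied to the workflow output, this would be called as cell_type_fractions(open("combined/counts.csv").read()), provided the column assumption holds.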