Quantitative Spatial Analysis

Analyze collections of spatial data objects

Background

In this context, spatial data is any dataset in which measurements have been collected on a set of observations, each of which also has a location in physical space.

A good example of this is spatial transcriptomics, where each cell (observation) has been measured for the expression of a collection of genes (features), and where the location of each cell is given in x/y coordinates.
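As a rough illustration of that shape, the hypothetical container below stores one row of feature measurements per observation, together with its x/y location. The project itself works with AnnData (.h5ad) and SpatialData (.zarr) files, as listed under Outputs. The class and method names here are illustrative only and are not part of this repository.

```python
from dataclasses import dataclass, field


@dataclass
class SpatialTable:
    """Hypothetical minimal container: observations x features, plus x/y per observation."""

    feature_names: list[str]
    obs_ids: list[str] = field(default_factory=list)
    x: list[float] = field(default_factory=list)
    y: list[float] = field(default_factory=list)
    values: list[list[float]] = field(default_factory=list)

    def add(self, obs_id: str, x: float, y: float, measurements: list[float]) -> None:
        """Append one observation (e.g. a cell) with its location and one value per feature."""
        if len(measurements) != len(self.feature_names):
            raise ValueError("expected one measurement per feature")
        self.obs_ids.append(obs_id)
        self.x.append(x)
        self.y.append(y)
        self.values.append(list(measurements))

    def value(self, obs_id: str, feature: str) -> float:
        """Look up a single measurement, e.g. the expression of one gene in one cell."""
        row = self.obs_ids.index(obs_id)
        col = self.feature_names.index(feature)
        return self.values[row][col]

    def __len__(self) -> int:
        return len(self.obs_ids)


# Two cells measured for two genes, each with an x/y position.
table = SpatialTable(feature_names=["CD3E", "EPCAM"])
table.add("cell_1", 10.0, 20.0, [5.0, 0.0])
table.add("cell_2", 12.5, 18.0, [0.0, 7.0])
print(len(table), table.value("cell_2", "EPCAM"))  # 2 7.0
```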

Purpose

The goal of this project is to allow users to:

Select and annotate regions from one or more spatial datasets (e.g. cores from a TMA)
Build an aggregate collection which consists of multiple regions from one or more source datasets
Save that collection as a cohesive data package
Perform clustering on cells (e.g. k-means, Leiden, Louvain) and annotate each cluster with a cell type
Perform neighborhood analysis across the aggregate collection
Compute summary metrics across cell types, neighborhoods, and regions
Visualize the entire spatial data collection in an interactive way

Components

Interactive App

Region selection is performed interactively with the Streamlit app defined in app.py and the app/ directory.

Development

To modify the app, clone the GitHub repository and run it locally. With uv installed:

Install prerequisites with uv sync --locked
Run the app with uv run streamlit run app.py

Modifications to the app's code made locally can be automatically applied to the running process, with no need to relaunch the process.

Deployment

A pre-built version of the app can be launched using Docker:

docker run -p 8000:8000 quay.io/hdc-workflows/quantitative-spatial-analysis

By default the latest tag will be used. Specific versions of the code are indicated by short commit hash (e.g. quay.io/hdc-workflows/quantitative-spatial-analysis:21864a8). The full list of available tags can be found here: https://quay.io/repository/hdc-workflows/quantitative-spatial-analysis?tab=tags.

Region Definition

Each region is defined on the basis of:

A source dataset of a particular type (e.g. "xenium") that is accessible from a supported data repository (e.g. "cirro").
One or more outlines, each given as paired x/y coordinate arrays; any point falling inside one of these shapes is included in the region.
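This README doesn't say exactly which containment rule is applied to the outlines. A common choice, assumed in the sketch below, is the even-odd (ray casting) test: a point is inside a polygon if a horizontal ray cast from it crosses the polygon's edges an odd number of times, and a point belongs to the region if it lies inside any of the region's outlines. The function names are hypothetical. Libraries such as matplotlib (`matplotlib.path.Path.contains_points`) or shapely provide tested implementations, and points lying exactly on an edge may be classified either way.

```python
def point_in_polygon(px: float, py: float, xs: list[float], ys: list[float]) -> bool:
    """Even-odd (ray casting) test for the polygon with vertices (xs[k], ys[k])."""
    if len(xs) != len(ys):
        raise ValueError("x and y arrays must have the same length")
    inside = False
    j = len(xs) - 1  # previous vertex, so the polygon closes on itself
    for i in range(len(xs)):
        xi, yi, xj, yj = xs[i], ys[i], xs[j], ys[j]
        # Only edges that straddle the horizontal line y = py can be crossed.
        if (yi > py) != (yj > py):
            # x coordinate where edge (j -> i) crosses y = py; yi != yj here.
            x_cross = xj + (py - yj) * (xi - xj) / (yi - yj)
            if px < x_cross:
                inside = not inside  # the ray to the right crosses this edge
        j = i
    return inside


def in_region(px: float, py: float, outlines: list[dict]) -> bool:
    """A point belongs to the region if it is inside any of its outlines."""
    return any(point_in_polygon(px, py, o["x"], o["y"]) for o in outlines)


square = {"x": [0, 10, 10, 0], "y": [0, 0, 10, 10]}
print(in_region(5, 5, [square]), in_region(15, 5, [square]))  # True False
```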

The JSON-serializable format for the object follows the pattern:

{
    "dataset": {
        "cirro_source": {
            "domain": "organization.cirro.bio",
            "project": "00000000-0000-0000-0000-000000000000",
            "dataset": "00000000-0000-0000-0000-000000000000",
            "path": "data/analysis-subfolder"
        },
        "type": "xenium"
    },
    "outline": [
        {
            "x": [
                1671.0085411393027,
                1497.2881668808081,
                1005.0804398150743,
                3431.375000292045
            ],
            "xref": "x",
            "y": [
                4049.5208189474565,
                3861.008928715988,
                3528.3408871310435,
                4265.75504597767
            ],
            "yref": "y"
        }
    ]
}
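Because each outline pairs an `x` array with a `y` array, the two arrays must have the same length to describe a polygon. The hypothetical helper below parses a region definition in the format above with Python's `json` module and checks that minimal structure. It covers only the documented fields and is a sketch, not the validation the app actually performs.

```python
import json


def load_region(text: str) -> dict:
    """Parse a region definition (hypothetical helper) and sanity-check its outlines."""
    region = json.loads(text)
    if "type" not in region.get("dataset", {}):
        raise ValueError("dataset must declare a 'type', e.g. 'xenium'")
    outlines = region.get("outline", [])
    if not outlines:
        raise ValueError("a region needs at least one outline")
    for i, shape in enumerate(outlines):
        xs, ys = shape["x"], shape["y"]
        # Each vertex is the pair (x[k], y[k]), so the arrays must line up.
        if len(xs) != len(ys):
            raise ValueError(f"outline {i}: {len(xs)} x values but {len(ys)} y values")
        # A polygon needs at least three vertices to enclose any area.
        if len(xs) < 3:
            raise ValueError(f"outline {i}: need at least 3 vertices, got {len(xs)}")
    return region


example = """
{
    "dataset": {"cirro_source": {"domain": "organization.cirro.bio"}, "type": "xenium"},
    "outline": [{"x": [0, 10, 10, 0], "xref": "x", "y": [0, 0, 10, 10], "yref": "y"}]
}
"""
region = load_region(example)
print(region["dataset"]["type"], len(region["outline"]))  # xenium 1
```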

Analysis Workflow

The analyze_regions.nf workflow can be used to analyze one or more spatial regions using the analysis steps outlined above. It is a Nextflow workflow and can be run by following the official Nextflow documentation.

Inputs

regions: The path to a CSV file containing a list of regions, with the columns id (the name/identifier of the region) and uri (the path to the region.json file defined above)
resolution: The leiden clustering resolution used for determining cell types
n_neighbors: The number of nearest neighbors to consider for each cell when performing neighborhood analysis
n_neighborhoods: The number of neighborhoods to return (provided as the k for k-means clustering)
outdir: The base path for all output files. Note that this cannot have a leading or trailing slash (/) because of how Nextflow publishes output files.
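The README doesn't spell out how n_neighbors and n_neighborhoods are combined. A common approach, assumed here and not confirmed as this workflow's implementation, describes each cell by the cell-type composition of its n_neighbors nearest neighbors and then clusters those composition vectors with k-means, using n_neighborhoods as k. The stdlib-only sketch below shows that idea; the function names are hypothetical, and a real implementation would more likely use a spatial index (e.g. a KD-tree) and an optimized k-means.

```python
import math
import random


def neighbor_compositions(cells, n_neighbors):
    """For each (x, y, cell_type) cell, the cell-type fractions among its n_neighbors nearest other cells."""
    types = sorted({t for _, _, t in cells})
    index = {t: i for i, t in enumerate(types)}
    vectors = []
    for i, (xi, yi, _) in enumerate(cells):
        # Brute-force nearest-neighbor search: O(n^2) overall, fine for a sketch.
        nearest = sorted(
            (math.hypot(xj - xi, yj - yi), j)
            for j, (xj, yj, _) in enumerate(cells)
            if j != i
        )[:n_neighbors]
        vec = [0.0] * len(types)
        for _, j in nearest:
            vec[index[cells[j][2]]] += 1.0 / len(nearest)
        vectors.append(vec)
    return types, vectors


def kmeans(vectors, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means, initialized from k distinct input vectors."""
    distinct = sorted({tuple(v) for v in vectors})
    if k > len(distinct):
        raise ValueError("k exceeds the number of distinct composition vectors")
    centers = [list(c) for c in random.Random(seed).sample(distinct, k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each vector goes to its nearest center (squared Euclidean).
        new_labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centers[c])))
            for v in vectors
        ]
        if new_labels == labels:
            break  # converged: assignments stopped changing
        labels = new_labels
        # Update step: move each center to the mean of the vectors assigned to it.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def neighborhood_analysis(cells, n_neighbors, n_neighborhoods):
    """Label each cell with a neighborhood id in range(n_neighborhoods)."""
    _, vectors = neighbor_compositions(cells, n_neighbors)
    return kmeans(vectors, n_neighborhoods)


cells = [
    (0, 0, "T cell"), (1, 0, "T cell"), (0, 1, "T cell"),
    (100, 100, "B cell"), (101, 100, "B cell"), (100, 101, "B cell"),
]
print(neighborhood_analysis(cells, n_neighbors=2, n_neighborhoods=2))
```

Which integer each neighborhood receives depends on the random initialization. Only the grouping of cells is meaningful.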

Outputs

├── combined
│   ├── counts.csv # The number of cells per region, per cell type, per neighborhood
│   └── spatialdata.h5ad # The spatial coordinates, measurements (e.g. gene expression) and annotations for each individual cell
├── logs
│   ├── cluster_points.txt
│   ├── make_plots.txt
│   ├── neighborhood_analysis.txt
│   ├── summary_stats.txt
│   └── vitessce.txt
└── regions
    ├── <regionId> # Information for each individual region
    │   ├── cell_types.vt.json
    │   ├── neighborhoods.vt.json
    │   └── spatialdata.zarr.zip # SpatialData object in zarr format
    └── <regionId>.json # Definition of the region as shown above
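counts.csv is described above as the number of cells per region, per cell type, per neighborhood. Assuming a per-cell table that already carries those three annotations, such counts amount to a group-by-and-count. The sketch below uses `collections.Counter` and the `csv` module. The long-format column names (region, cell_type, neighborhood, n_cells) are hypothetical and may not match the workflow's actual file.

```python
import csv
import io
from collections import Counter


def count_cells(cells):
    """Count cells per (region, cell type, neighborhood); `cells` is a list of dicts."""
    return Counter((c["region"], c["cell_type"], c["neighborhood"]) for c in cells)


def counts_to_csv(counts):
    """Write the counts in long format with hypothetical column names."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["region", "cell_type", "neighborhood", "n_cells"])
    for (region, cell_type, neighborhood), n in sorted(counts.items()):
        writer.writerow([region, cell_type, neighborhood, n])
    return buf.getvalue()


cells = [
    {"region": "core_A", "cell_type": "T cell", "neighborhood": 0},
    {"region": "core_A", "cell_type": "T cell", "neighborhood": 0},
    {"region": "core_A", "cell_type": "B cell", "neighborhood": 1},
    {"region": "core_B", "cell_type": "T cell", "neighborhood": 1},
]
print(counts_to_csv(count_cells(cells)), end="")
# region,cell_type,neighborhood,n_cells
# core_A,B cell,1,1
# core_A,T cell,0,2
# core_B,T cell,1,1
```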

FredHutch/quantitative-spatial-analysis
