This repository houses tutorial notebooks for running GPU-accelerated single-cell analysis workflows using RAPIDS-singlecell, a GPU-accelerated library developed by scverse®. The goal of this repository is to help users try out and explore different capabilities of RAPIDS-singlecell on datasets ranging from 250 thousand to 11 million cells. To make this as easy as possible, we set up two different GPU environments on Brev that are designed to get you working with GPU-accelerated single-cell workflows as quickly as possible (see Quickstart). We've also provided instructions to run these notebooks on your own CUDA-enabled GPU systems (see Bring your own compute).
These notebooks will be valuable for single-cell scientists who want to quickly evaluate ease of use and explore the biological interpretability of RAPIDS-singlecell results. Scientists will also find value in learning to apply these methods to very large datasets. This repository is also broadly useful for any data scientist or developer who wants to run and evaluate single-cell methods using RAPIDS-singlecell. The datasets used for this tutorial were made publicly available by 10X and CZ CELLxGENE. The base container is the 25.04 RAPIDSAI Notebooks Container, which you can freely get from NVIDIA's NGC Catalog following the instructions below.
If you like these notebooks and this GPU-accelerated capability, and want to support scverse's efforts, please learn more about them here and consider joining their community.
The quickest way to use these blueprints is to use one of our pre-configured NVIDIA Brev resources.
- Select your resource size, and click "Deploy Now":
- Click Deploy Launchable on the Brev.dev Launchable page
- Wait for the Container status to show Ready (this can take up to 8 minutes). Then, click Access GPU
- On the Instance page, click Open Notebook
You should drop into a fully installed and populated JupyterLab environment. Open up your desired notebook from the list below, and have a great time!
This repository contains a diverse set of notebooks to help get anyone started using RAPIDS-singlecell developed by scverse.
The outline below is a suggested exploration flow. Unless otherwise noted, you can choose any notebook to get started, as long as you have the GPU resources to run the notebook.
For those who are new to basic single-cell data analysis, the end-to-end analysis in 01_scRNA_analysis_preprocessing.ipynb is the best place to start. It walks you through the steps of data preprocessing, cleanup, visualization, and investigation.
Notebook | Description | Min GPU Size / Instance |
---|---|---|
01_scRNA_analysis_preprocessing.ipynb | End-to-end workflow, where we understand the cells, run ETL on the dataset, then visualize and explore the results. This tutorial is good for all users | 24GB / Standard RSC Instance |
02_scRNA_analysis_extended.ipynb | This notebook continues from the outputs of 01_scRNA_analysis_preprocessing.ipynb as an overview of methods that can be used to investigate transcriptional regulation | 24GB / Standard RSC Instance |
03_scRNA_analysis_with_pearson_residuals.ipynb | End-to-end workflow, like 01_scRNA_analysis_preprocessing.ipynb, but uses Pearson residuals for normalization | 24GB / Standard RSC Instance |
04_scRNA_analysis_dask_out_of_core.ipynb | In this notebook, we show how the analysis scales to up to 11M cells by using Dask. Requires a 48GB GPU | 48GB / Standard RSC Instance |
05_scRNA_analysis_multi_GPU.ipynb | This notebook enhances the 11M cell dataset analysis with Dask without exceeding memory limits. It scales to use all available GPUs, uses chunk-based execution, and manages memory efficiently. Requires 8x H100s or better. For all other GPU systems, please run 04_scRNA_analysis_dask_out_of_core.ipynb instead | 8x 80GB / Large RSC Instance |
06_scRNA_analysis_90k_brain_example.ipynb | In this notebook, we show the diversity of capabilities by running a workflow similar to 01_scRNA_analysis_preprocessing.ipynb, but on brain cells | 24GB / Standard RSC Instance |
07_scRNA_analysis_1.3M_brain_example.ipynb | In this notebook, we scale up the analysis of 06_scRNA_analysis_90k_brain_example.ipynb to 1.3 million brain cells. Requires an 80GB GPU, like an H100 | 80GB / Large RSC Instance |
You can find more detail on each notebook in the Notebooks README.
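If you are unsure which notebooks fit your hardware, the minimum GPU sizes from the table can be written down as a small lookup. The snippet below is only an illustrative sketch: the `NOTEBOOK_REQUIREMENTS` dictionary and the `runnable_notebooks` helper are hypothetical names that are not part of this repository, and the numbers are simply copied from the table.

```python
# Minimum GPU requirements copied from the notebook table.
# Each entry maps a notebook to (number of GPUs, memory per GPU in GB).
# This dictionary is a hypothetical helper, not something shipped in the repo.
NOTEBOOK_REQUIREMENTS = {
    "01_scRNA_analysis_preprocessing.ipynb": (1, 24),
    "02_scRNA_analysis_extended.ipynb": (1, 24),
    "03_scRNA_analysis_with_pearson_residuals.ipynb": (1, 24),
    "04_scRNA_analysis_dask_out_of_core.ipynb": (1, 48),
    "05_scRNA_analysis_multi_GPU.ipynb": (8, 80),
    "06_scRNA_analysis_90k_brain_example.ipynb": (1, 24),
    "07_scRNA_analysis_1.3M_brain_example.ipynb": (1, 80),
}


def runnable_notebooks(gpu_count, memory_gb_per_gpu):
    """Return the notebooks whose listed minimum GPU size is satisfied."""
    return [
        name
        for name, (min_gpus, min_memory_gb) in NOTEBOOK_REQUIREMENTS.items()
        if gpu_count >= min_gpus and memory_gb_per_gpu >= min_memory_gb
    ]


# Example: a single 24GB GPU.
for name in runnable_notebooks(gpu_count=1, memory_gb_per_gpu=24):
    print(name)
```

Keep in mind that 02_scRNA_analysis_extended.ipynb continues from the outputs of 01_scRNA_analysis_preprocessing.ipynb, so meeting the memory requirement alone is not enough to run it first.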
Note
To ensure you have the maximum GPU memory available, please remember to shut down your completed notebook's kernel before starting a new notebook. If you don't, you may run into Out Of Memory (OOM) errors. To fix that, shut down all the kernels, and then restart only the kernel for the notebook you want to run.
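One way to confirm that a previous kernel has actually released its memory is to ask `nvidia-smi` how much memory is in use on each GPU. The following is a minimal sketch, assuming `nvidia-smi` is on your `PATH` and reports plain integer MiB values; the helper names `parse_gpu_memory` and `query_gpu_memory` are made up for this example.

```python
import subprocess


def parse_gpu_memory(csv_text):
    """Parse 'used, total' lines (in MiB) from nvidia-smi's CSV output."""
    gpus = []
    for line in csv_text.strip().splitlines():
        if not line.strip():
            continue  # skip blank lines
        used, total = (int(field.strip()) for field in line.split(","))
        gpus.append(
            {"used_mib": used, "total_mib": total, "free_mib": total - used}
        )
    return gpus


def query_gpu_memory():
    """Run nvidia-smi and return one memory summary per visible GPU."""
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=memory.used,memory.total",
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_gpu_memory(result.stdout)
```

Calling `query_gpu_memory()` from a terminal or a fresh notebook cell before opening the next notebook returns a list with one dictionary per GPU. If `used_mib` is still high after you closed a notebook, its kernel is most likely still running.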
The goal of this repository is to make it easy to try GPU-accelerated single-cell analysis workflows on different compute environments and datasets. Our preferred environment is NVIDIA Brev, but you can also run these in your own GPU-connected environment. We've provided a few tutorials below on how to set this up, and the easiest place to start is to follow the Quickstart instructions.
Follow our Quickstart Instructions above.
If you want to try a compute environment on Brev that's not one of the Quickstart Launchables, you will need to create a new Launchable or Standalone Compute Instance. This will let you select your desired cloud provider and compute resource. Note that we have not tested every combination of cloud provider and instance type, so your experience may vary.
If you're interested in trying this out, please follow the instructions here: Setting up your Custom Brev Launchable
Some of you may want to take this experience off of Brev and run it on your own hardware. Great! We wrote a (somewhat) easy tutorial here: Bring your own compute
If you have any questions about these notebooks or need support, please open an Issue on this repository and we will respond there.