SPIDER (Supervised Pathology Image-DEscription Repository) is a large, high-quality, and diverse patch-level dataset designed to advance AI-driven computational pathology. It provides multi-organ coverage, expert-annotated labels, and strong baseline models to support research and development in digital pathology.
This repository serves as a central hub for accessing the SPIDER datasets, pre-trained models, and related resources.
For a detailed description of SPIDER, methodology, and benchmark results, refer to our research paper:
SPIDER: A Comprehensive Multi-Organ Supervised Pathology Dataset and Baseline Models
View on arXiv: https://arxiv.org/abs/2503.02876
SPIDER consists of four organ-specific datasets (skin, colorectal, thorax, and breast), available for download from the Hugging Face Hub.
Each dataset contains:
- 224×224 central patches with expert-verified class labels
- 24 surrounding context patches forming a 1120×1120 composite region
- 20X magnification for high-detail analysis
- Train-test splits ensuring robust benchmarking
See individual dataset pages for more details.
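The patch geometry described above implies a 5×5 grid: 25 patches of 224 pixels per side give a 1120×1120 region, with one central patch and 24 context patches. The sketch below only illustrates that arithmetic. The row-major grid indexing and the `patch_boxes` helper are assumptions made for illustration, not the dataset's actual file layout.

```python
PATCH = 224  # side length of one patch in pixels (from the dataset description)
GRID = 5     # 5 x 5 grid: 1 central patch + 24 context patches


def patch_boxes(patch=PATCH, grid=GRID):
    """Map each (row, col) grid cell to its (left, top, right, bottom) pixel box."""
    return {
        (row, col): (col * patch, row * patch, (col + 1) * patch, (row + 1) * patch)
        for row in range(grid)
        for col in range(grid)
    }


boxes = patch_boxes()
center = (GRID // 2, GRID // 2)                      # middle cell of the grid
context = [cell for cell in boxes if cell != center]  # every other cell

print(len(context))   # number of context patches
print(boxes[center])  # pixel box of the central patch inside the composite
print(GRID * PATCH)   # side length of the composite region
```

Consult the individual dataset pages for how the central and context patches are actually stored and named.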
Baseline models are trained on the SPIDER datasets using the Hibou-L foundation model with an attention-based classification head. They are available for download from the Hugging Face Hub.
Each model supports:
- Patch-level classification with multi-class labels
- Improved accuracy using surrounding context patches
- Easy deployment for pathology AI applications
See individual model pages for inference instructions.
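To give a rough intuition for how an attention-based head can combine a central patch with its context, the sketch below pools a set of patch embeddings with softmax attention weights. It is a generic illustration in plain Python, not the SPIDER architecture: the `attention_pool` function, the fixed `query` vector (learned in a real model), the two-dimensional toy embeddings, and the choice of index 12 as the central patch are all assumptions.

```python
import math


def attention_pool(embeddings, query):
    """Softmax-weighted average of patch embeddings.

    embeddings: list of equal-length float vectors, one per patch
    query: float vector of the same length (hypothetical; learned in practice)
    """
    # Dot-product score of each patch embedding against the query.
    scores = [sum(q * x for q, x in zip(query, emb)) for emb in embeddings]
    # Numerically stable softmax over the scores.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [v / total for v in exps]
    # Weighted sum of embeddings, one output value per dimension.
    pooled = [
        sum(w * emb[i] for w, emb in zip(weights, embeddings))
        for i in range(len(query))
    ]
    return pooled, weights


# 25 toy embeddings: index 12 stands in for the central patch of a 5 x 5 grid.
embeddings = [[0.0, 1.0] for _ in range(25)]
embeddings[12] = [3.0, 0.0]

pooled, weights = attention_pool(embeddings, query=[1.0, 0.0])
print(weights[12])  # the patch that matches the query receives the largest weight
```

In a real model the pooled vector would feed a classification layer. The individual model pages describe the actual head.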
Download any SPIDER dataset using huggingface_hub:
from huggingface_hub import snapshot_download
snapshot_download(repo_id="histai/SPIDER-colorectal", repo_type="dataset", local_dir="./spider_colorectal")
Or clone directly using Git:
git lfs install
git clone https://huggingface.co/datasets/histai/SPIDER-colorectal
Extract dataset files:
cat spider-colorectal.tar.* | tar -xvf -
Load a pretrained model for inference:
from transformers import AutoModel, AutoProcessor
from PIL import Image
model = AutoModel.from_pretrained("histai/SPIDER-colorectal-model", trust_remote_code=True)
processor = AutoProcessor.from_pretrained("histai/SPIDER-colorectal-model", trust_remote_code=True)
image = Image.open("path_to_image.png")
inputs = processor(images=image, return_tensors="pt")
outputs = model(**inputs)
print(outputs.predicted_class_names)

| Organ | Accuracy | Precision | F1 Score |
|---|---|---|---|
| Skin | 0.940 | 0.935 | 0.937 |
| Colorectal | 0.914 | 0.917 | 0.915 |
| Thorax | 0.962 | 0.958 | 0.960 |
| Breast | 0.902 | 0.896 | 0.897 |
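For readers who want to compute comparable numbers on their own predictions, the sketch below derives accuracy, precision, and F1 from lists of true and predicted labels. It assumes macro-averaging over classes, which may not be the averaging scheme behind the table above. The `macro_metrics` function and the toy labels are hypothetical.

```python
def macro_metrics(y_true, y_pred):
    """Return (accuracy, macro precision, macro F1) for two equal-length label lists."""
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)

    precisions, f1_scores = [], []
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in pairs)  # true positives
        fp = sum(t != label and p == label for t, p in pairs)  # false positives
        fn = sum(t == label and p != label for t, p in pairs)  # false negatives
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        f1_scores.append(f1)

    n = len(precisions)
    return accuracy, sum(precisions) / n, sum(f1_scores) / n


# Toy example with two hypothetical classes.
y_true = ["a", "a", "b", "b"]
y_pred = ["a", "b", "b", "b"]
accuracy, precision, f1 = macro_metrics(y_true, y_pred)
print(f"accuracy={accuracy:.3f} precision={precision:.3f} f1={f1:.3f}")
```

The paper remains the authoritative source for how the reported metrics were computed.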
This project is licensed under CC BY-NC 4.0. The dataset and models are available for research use only.
Authors: Dmitry Nechaev, Alexey Pchelnikov, Ekaterina Ivanova
If you use SPIDER in your research, please cite:
@misc{nechaev2025spidercomprehensivemultiorgansupervised,
title={SPIDER: A Comprehensive Multi-Organ Supervised Pathology Dataset and Baseline Models},
author={Dmitry Nechaev and Alexey Pchelnikov and Ekaterina Ivanova},
year={2025},
eprint={2503.02876},
archivePrefix={arXiv},
primaryClass={eess.IV},
url={https://arxiv.org/abs/2503.02876},
}