SPIDER: A multi-organ pathology dataset with expert annotations and pre-trained models for AI-driven research. Available on Hugging Face 🤗.

HistAI/SPIDER

SPIDER: A Multi-Organ Supervised Pathology Dataset and Baseline Models

Overview

SPIDER (Supervised Pathology Image-DEscription Repository) is a large, high-quality, and diverse patch-level dataset designed to advance AI-driven computational pathology. It provides multi-organ coverage, expert-annotated labels, and strong baseline models to support research and development in digital pathology.

This repository serves as a central hub for accessing the SPIDER datasets, pre-trained models, and related resources.


📄 Paper

For a detailed description of SPIDER, methodology, and benchmark results, refer to our research paper:

📄 SPIDER: A Comprehensive Multi-Organ Supervised Pathology Dataset and Baseline Models
View on arXiv: https://arxiv.org/abs/2503.02876


Resources

📂 Datasets

SPIDER consists of four organ-specific datasets (skin, colorectal, thorax, and breast), available for download from the Hugging Face Hub 🤗.

Each dataset contains:

  • 224×224 central patches with expert-verified class labels
  • 24 surrounding context patches forming a 1120×1120 composite region
  • 20X magnification for high-detail analysis
  • Train-test splits ensuring robust benchmarking

📌 See individual dataset pages for more details.
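
The patch geometry listed above implies a 5×5 grid: one central patch plus 24 context patches, each 224 px wide, giving 5 × 224 = 1120 px per side. The sketch below only illustrates that arithmetic. The helper name `patch_boxes` and the row-major ordering are assumptions made for illustration, not the layout used in the dataset files.

```python
PATCH = 224   # side of one patch in pixels (from the dataset description)
GRID = 5      # 5 x 5 grid = 1 central patch + 24 context patches

def patch_boxes(patch=PATCH, grid=GRID):
    """Return (left, top, right, bottom) boxes in row-major order.

    Hypothetical helper: the ordering is an assumption, not the dataset's own.
    """
    return [
        (col * patch, row * patch, (col + 1) * patch, (row + 1) * patch)
        for row in range(grid)
        for col in range(grid)
    ]

boxes = patch_boxes()
center = boxes[len(boxes) // 2]      # middle cell of the 5 x 5 grid
composite_side = GRID * PATCH        # side length of the composite region

print(len(boxes) - 1, "context patches")
print("composite:", composite_side, "x", composite_side)
print("central patch box:", center)
```

Boxes in this form can be passed to an image library's crop or paste call when a composite region needs to be split or assembled.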

🤖 Pretrained Models

Baseline models are trained on the SPIDER datasets using the Hibou-L foundation model with an attention-based classification head. They are available for download from the Hugging Face Hub 🤗.

Each model supports:

  • Patch-level classification with multi-class labels
  • Improved accuracy using surrounding context patches
  • Easy deployment for pathology AI applications

📌 See individual model pages for inference instructions.


🔧 Getting Started

🛠 Using the Dataset

Download any SPIDER dataset using huggingface_hub:

from huggingface_hub import snapshot_download
snapshot_download(repo_id="histai/SPIDER-colorectal", repo_type="dataset", local_dir="./spider_colorectal")

Or clone directly using Git:

git lfs install
git clone https://huggingface.co/datasets/histai/SPIDER-colorectal

Extract dataset files:

cat spider-colorectal.tar.* | tar -xvf -
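
Where a shell is not available, the same concatenate-and-extract step can be sketched with the Python standard library. This assumes the archive parts sort correctly by file name, as the shell glob above does. `extract_split_tar` is a hypothetical helper, not part of any SPIDER tooling.

```python
import glob
import os
import shutil
import tarfile
import tempfile

def extract_split_tar(pattern, dest):
    """Concatenate archive parts matching `pattern` and extract them into `dest`.

    Hypothetical helper; assumes the parts sort correctly by file name.
    """
    parts = sorted(glob.glob(pattern))
    if not parts:
        raise FileNotFoundError(f"no files match {pattern!r}")
    os.makedirs(dest, exist_ok=True)
    # Join the parts into one temporary file, like `cat part.* | tar -xf -`.
    with tempfile.TemporaryFile() as joined:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, joined)
        joined.seek(0)
        with tarfile.open(fileobj=joined, mode="r:*") as archive:
            # Use the safer "data" extraction filter where this Python has it.
            kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
            archive.extractall(dest, **kwargs)
    return parts
```

A call such as `extract_split_tar("spider-colorectal.tar.*", "./spider_colorectal")` would mirror the shell command. Note that this version temporarily needs extra disk space equal to the size of the joined archive.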

🤖 Using the Model

Load a pretrained model for inference:

import torch
from PIL import Image
from transformers import AutoModel, AutoProcessor

# trust_remote_code=True runs the model code shipped in the Hub repository
model = AutoModel.from_pretrained("histai/SPIDER-colorectal-model", trust_remote_code=True)
processor = AutoProcessor.from_pretrained("histai/SPIDER-colorectal-model", trust_remote_code=True)

image = Image.open("path_to_image.png")
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():  # gradients are not needed for inference
    outputs = model(**inputs)
print(outputs.predicted_class_names)

📈 Benchmark Results

Organ       Accuracy  Precision  F1 Score
Skin        0.940     0.935      0.937
Colorectal  0.914     0.917      0.915
Thorax      0.962     0.958      0.960
Breast      0.902     0.896      0.897
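
For readers who want to compute comparable numbers on their own predictions, the sketch below shows one way to derive accuracy, precision, and F1 from patch-level labels. It assumes macro averaging over classes, which the table does not state. The class names and labels are made-up toy data, not SPIDER results.

```python
def macro_metrics(y_true, y_pred):
    """Return (accuracy, macro precision, macro F1) for two label lists."""
    classes = sorted(set(y_true) | set(y_pred))
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    precisions, f1s = [], []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        # F1 is the harmonic mean of precision and recall for this class.
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        precisions.append(precision)
        f1s.append(f1)
    n = len(classes)
    return accuracy, sum(precisions) / n, sum(f1s) / n

# Toy labels for illustration only (hypothetical class names).
y_true = ["tumor", "tumor", "stroma", "stroma", "fat", "fat"]
y_pred = ["tumor", "stroma", "stroma", "stroma", "fat", "fat"]
accuracy, precision, f1 = macro_metrics(y_true, y_pred)
print(f"accuracy={accuracy:.3f} precision={precision:.3f} f1={f1:.3f}")
```

If the reported figures use a different averaging scheme (for example, weighting classes by support), the per-class values would need to be combined accordingly.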

🔗 More Information


📜 License

This project is licensed under CC BY-NC 4.0. The dataset and models are available for research use only.


📧 Contact

Authors: Dmitry Nechaev, Alexey Pchelnikov, Ekaterina Ivanova
📩 Emails: [email protected], [email protected], [email protected]


📖 Citation

If you use SPIDER in your research, please cite:

@misc{nechaev2025spidercomprehensivemultiorgansupervised,
      title={SPIDER: A Comprehensive Multi-Organ Supervised Pathology Dataset and Baseline Models}, 
      author={Dmitry Nechaev and Alexey Pchelnikov and Ekaterina Ivanova},
      year={2025},
      eprint={2503.02876},
      archivePrefix={arXiv},
      primaryClass={eess.IV},
      url={https://arxiv.org/abs/2503.02876}, 
}
