🎉 TL;DR: We introduce the geospatial pixel reasoning task, construct the first benchmark dataset (EarthReason), and propose a simple yet effective baseline (SegEarth-R1).

Remote sensing has become critical for understanding environmental dynamics, urban planning, and disaster management. However, traditional remote sensing workflows often rely on explicit segmentation or detection methods, which struggle to handle complex, implicit queries that require reasoning over spatial context, domain knowledge, and implicit user intent. Motivated by this, we introduce a new task, i.e., geospatial pixel reasoning, which allows implicit querying and reasoning and generates the mask of the target region. To advance this task, we construct and release the first large-scale benchmark dataset called EarthReason, which comprises 5,434 manually annotated image masks with over 30,000 implicit question-answer pairs. Moreover, we propose SegEarth-R1, a simple yet effective language-guided segmentation baseline that integrates a hierarchical visual encoder, a large language model (LLM) for instruction parsing, and a tailored mask generator for spatial correlation. The design of SegEarth-R1 incorporates domain-specific adaptations, including aggressive visual token compression to handle ultra-high-resolution remote sensing images, a description projection module to fuse language and multi-scale features, and a streamlined mask prediction pipeline that directly queries description embeddings. Extensive experiments demonstrate that SegEarth-R1 achieves state-of-the-art performance on both reasoning and referring segmentation tasks, significantly outperforming traditional and LLM-based segmentation methods.
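To make the pipeline described above more concrete, the sketch below wires together stand-ins for the components named in the paragraph: a hierarchical visual encoder, aggressive visual token compression, an LLM that parses the instruction jointly with visual tokens, a description projection module, and a mask generator that directly queries the description embeddings. All module choices, names, and dimensions here are illustrative placeholders, not the released implementation; please refer to the code in this repo for the actual architecture.

```python
# Minimal conceptual sketch of a geospatial pixel reasoning pipeline.
# Every module below is a placeholder (e.g., conv stages instead of the real
# hierarchical encoder, a small Transformer instead of the real LLM).
import torch
import torch.nn as nn


class GeospatialPixelReasoner(nn.Module):
    def __init__(self, vis_dim=256, llm_dim=2048, num_scales=4):
        super().__init__()
        # Hierarchical visual encoder (stand-in): produces multi-scale feature maps.
        self.visual_encoder = nn.ModuleList(
            [nn.Conv2d(3 if i == 0 else vis_dim, vis_dim, 3, stride=2, padding=1)
             for i in range(num_scales)]
        )
        # Aggressive token compression (stand-in): pool the coarsest scale so
        # ultra-high-resolution imagery fits into the LLM context.
        self.compress = nn.AdaptiveAvgPool2d((8, 8))
        self.vis_proj = nn.Linear(vis_dim, llm_dim)
        # LLM (stand-in): reasons over the implicit instruction and visual tokens.
        self.llm = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(llm_dim, nhead=8, batch_first=True),
            num_layers=2,
        )
        # Description projection: maps LLM description embeddings into the mask
        # generator's feature space.
        self.desc_proj = nn.Linear(llm_dim, vis_dim)
        # Mask generator (stand-in): the description embedding queries pixel
        # features via cross-attention and the mask is predicted from it directly.
        self.cross_attn = nn.MultiheadAttention(vis_dim, num_heads=8, batch_first=True)

    def forward(self, image, text_embeds):
        # Multi-scale visual features.
        feats, x = [], image
        for stage in self.visual_encoder:
            x = stage(x)
            feats.append(x)
        b, c, h, w = feats[-1].shape
        # Compress visual tokens before handing them to the LLM.
        vis_tokens = self.compress(feats[-1]).flatten(2).transpose(1, 2)  # (B, 64, C)
        vis_tokens = self.vis_proj(vis_tokens)
        # Joint reasoning over instruction embeddings and compressed visual tokens.
        hidden = self.llm(torch.cat([vis_tokens, text_embeds], dim=1))
        desc = self.desc_proj(hidden[:, -1:, :])             # description embedding as query
        pixel_feats = feats[-1].flatten(2).transpose(1, 2)   # (B, H*W, C)
        # The description embedding directly queries pixel features to form the mask.
        attn_out, _ = self.cross_attn(desc, pixel_feats, pixel_feats)
        mask_logits = (pixel_feats @ attn_out.transpose(1, 2)).view(b, 1, h, w)
        return mask_logits


if __name__ == "__main__":
    model = GeospatialPixelReasoner()
    img = torch.randn(1, 3, 512, 512)
    txt = torch.randn(1, 16, 2048)  # pretend instruction embeddings
    print(model(img, txt).shape)    # torch.Size([1, 1, 32, 32])
```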
- 2025-05-04: The code, dataset, and checkpoints are released.
- 2025-04-15: 🔥🔥🔥 We release the paper of SegEarth-R1 on arXiv.
Follow the guidelines below to set up, train, and evaluate:
- Preparation ⚙️: Instructions for organizing datasets and pretrained weights for proper model training and inference.
- Installation 💻: Set up the `segearthr1` conda environment, install dependencies, and clone the repo.
- Training 🏋️♂️: Run `scripts/train.sh` with DeepSpeed, modifying parameters like data and model paths for training.
- Evaluation 🎯: Run `scripts/eval.sh` to evaluate the model, updating paths as needed.
If you find this project useful, please consider citing us:
```bibtex
@article{li2025segearth,
  title={SegEarth-R1: Geospatial Pixel Reasoning via Large Language Model},
  author={Li, Kaiyu and Xin, Zepeng and Pang, Li and Pang, Chao and Deng, Yupeng and Yao, Jing and Xia, Guisong and Meng, Deyu and Wang, Zhi and Cao, Xiangyong},
  journal={arXiv preprint arXiv:2504.09644},
  year={2025}
}
```
We thank PSALM and Mask2Former for open-sourcing their models and code.