Native support for verl on Ascend NPU has attracted the attention of some developers. This roadmap tracks the progress of that work; you are welcome to join the discussion.
Q1 Roadmap
The verl-NPU workflow depends on the vLLM-ascend version, so it has only been rebased onto the vLLM-ascend tag v0.7.3rc1. We will continue to rebase as vLLM-ascend is updated.
Quick Start
Documentation: ascend.rst
Plan
Dependencies (Q1 done)
- Native support
- transformers - Native support
- ray - Native support
- FSDP worker - Support
- vLLM-ascend v0.7.1 - Support
- vLLM-ascend v0.7.3 (some features have been temporarily circumvented and are marked in the Q2 Plan)
Q2 Plan
- vLLM-ascend sleep mode: [Feature]: Support sleep mode in vllm-ascend vllm-project/vllm-ascend#320 (see the sketch after this list)
- megatron/mindspeed worker (for NPU, megatron ≈ mindspeed) - Support: add mindspeed (matching megatron v0.10.x), WIP @Chendong98
- torchtitan/FSDP2 worker - Native support
- --use_remove_padding - Performance Optimization
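For context on why sleep mode blocks the hybrid engine, below is a minimal sketch of the upstream vLLM sleep-mode API that the roadmap item above refers to; it runs on CUDA today, and vllm-project/vllm-ascend#320 tracks bringing it to NPU. Treat it as an illustration under the assumption of vLLM ≥ 0.7.x, not as verl code; the model name is only an example.

```python
# Sketch of vLLM's sleep mode: release device memory between rollout phases
# so the trainer can use it, then restore the rollout engine. Assumes
# vLLM >= 0.7.x on a backend that implements sleep mode (not yet Ascend).
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen2.5-0.5B-Instruct", enable_sleep_mode=True)

outputs = llm.generate(["What is GRPO?"], SamplingParams(max_tokens=32))

llm.sleep(level=1)   # offload weights, drop KV-cache blocks
# ... the FSDP actor update would run here in a hybrid-engine setup ...
llm.wake_up()        # reload weights before the next rollout
```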
Release Accuracy Comparison Results
The default config is modified as little as possible to preserve accuracy; a sketch of how the mean-absolute-error comparison can be computed follows the notes below.
| | ALGO | Model | Result (Mean Absolute Error) | CANN |
|---|---|---|---|---|
| ✅ | SFT | Qwen2.5-0.5B-Instruct[1] | chart @as12138 | 8.1.RC1 (not released) |
| | GRPO | Qwen2-7B-Instruct[2] | chart @as12138 | 8.1.RC1 (not released) |
| | GRPO | Qwen2.5-VL-3B-Instruct[1] | WIP @as12138 | 8.1.RC1 (not released) |
| | GRPO | Qwen2.5-VL-7B-Instruct | Waiting for sleep mode | 8.1.RC1 (not released) |
| | GRPO | Qwen2.5-7B-Instruct | Waiting for sleep mode | 8.1.RC1 (not released) |
Notes:
[1] The NPU does not yet support sleep mode (required for the hybrid engine). To obtain results efficiently, we use 2B/3B-scale models on 8 devices when verifying the algorithms.
[2] Qwen2-7B-Instruct was tested on 2*8 devices, and many batch_size-related parameters had to be reduced, so this result is for reference only. We will publish the reward results with the default parameters as soon as sleep mode is supported.
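The "Result (Mean Absolute Error)" column compares a metric curve logged on NPU against the GPU baseline from an otherwise-identical run. Below is a minimal, hypothetical sketch of that comparison; the function name, the example numbers, and the choice of per-step mean reward as the metric are assumptions for illustration, not verl's evaluation code.

```python
# Hypothetical helper for the "Mean Absolute Error" column: compare a metric
# curve (e.g. mean reward per training step) from an NPU run against the GPU
# baseline. Names and data are illustrative only.
import numpy as np


def mean_absolute_error(npu_curve, gpu_curve):
    """MAE over the overlapping training steps of two metric curves."""
    n = min(len(npu_curve), len(gpu_curve))
    npu = np.asarray(npu_curve[:n], dtype=np.float64)
    gpu = np.asarray(gpu_curve[:n], dtype=np.float64)
    return float(np.mean(np.abs(npu - gpu)))


if __name__ == "__main__":
    npu_rewards = [0.12, 0.18, 0.25, 0.31]   # example values only
    gpu_rewards = [0.11, 0.19, 0.24, 0.33]
    print(f"MAE = {mean_absolute_error(npu_rewards, gpu_rewards):.4f}")
```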
Ease of use
flash-attn is not supported on Ascend NPU, so we need to use torch_npu.npu_fusion_attention as a replacement (a sketch follows the list below).
- [Temporary solution] NPU support SDPA: NPU support SDPA huggingface/transformers#35165
- Using flash_attn on NPU @FightingZhen: [Feature] Support using FlashAttention2 on Ascend NPU huggingface/transformers#36696
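Until the transformers-side solutions above land, a wrapper along the following lines can stand in for flash_attn_func. This is a hedged sketch, not verl's actual patch: the wrapper name is hypothetical, only the non-causal case is shown, and the full torch_npu.npu_fusion_attention argument list (masks, sparse modes, variable-length inputs) is documented in torch_npu.

```python
# Sketch: replace flash_attn.flash_attn_func with the Ascend fused kernel.
# Wrapper name and defaults are assumptions; causal/decoder attention would
# additionally need an atten_mask.
import torch
import torch_npu


def npu_flash_attn_func(q, k, v, dropout_p=0.0, softmax_scale=None):
    """Stand-in for flash_attn_func(q, k, v, ...) on NPU.

    q, k, v are laid out as (batch, seqlen, num_heads, head_dim), which is
    what flash-attn expects; input_layout="BSND" tells the fused kernel to
    read that layout directly.
    """
    head_num = q.shape[2]
    if softmax_scale is None:
        softmax_scale = q.shape[-1] ** -0.5
    # npu_fusion_attention returns a tuple; the first element is the
    # attention output with the same layout as the inputs.
    out = torch_npu.npu_fusion_attention(
        q, k, v, head_num,
        input_layout="BSND",
        scale=softmax_scale,
        keep_prob=1.0 - dropout_p,
    )[0]
    return out
```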
Long-term Planning
- ring attention
- torch.compile