Native support for verl on Ascend NPU has attracted the attention of some developers. This roadmap tracks the progress of that work; you are welcome to join the discussion.
Q1 Roadmap
The verl-NPU workflow depends on the vLLM-ascend version, so it has only been rebased onto the vLLM-ascend tag v0.7.3rc1. We will continue to rebase as vLLM-ascend is updated.
Quick Start
Documentation: ascend.rst
Plan
Dependencies (Q1 done)
- Native support
- transformers - Native support
- ray - Native support
- FSDP worker - Support
- vLLM-ascend v0.7.1 - Support
- vLLM-ascend v0.7.3 (some features have been temporarily circumvented and are marked in the Q2 Plan)
Q2 Plan
- vLLM-ascend sleep mode: [Feature]: Support sleep mode in vllm-ascend vllm-project/vllm-ascend#320 (see the sketch after this list)
- megatron/mindspeed worker (for NPU, megatron ≈ mindspeed) - Support: add mindspeed (matching megatron v0.10.x), WIP @Chendong98
- torchtitan/FSDP2 worker - Native support
- --use_remove_padding - Performance Optimization
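For context on why sleep mode blocks the hybrid engine, below is a minimal sketch of the upstream vLLM sleep-mode API that the roadmap item above refers to; it runs on CUDA today, and vllm-project/vllm-ascend#320 tracks bringing it to NPU. Treat it as an illustration under the assumption of vLLM ≥ 0.7.x, not as verl code; the model name is only an example.

```python
# Sketch of vLLM's sleep mode: release device memory between rollout phases
# so the trainer can use it, then restore the rollout engine. Assumes
# vLLM >= 0.7.x on a backend that implements sleep mode (not yet Ascend).
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen2.5-0.5B-Instruct", enable_sleep_mode=True)

outputs = llm.generate(["What is GRPO?"], SamplingParams(max_tokens=32))

llm.sleep(level=1)   # offload weights, drop KV-cache blocks
# ... the FSDP actor update would run here in a hybrid-engine setup ...
llm.wake_up()        # reload weights before the next rollout
```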
Release Accuracy Comparison Results
The default config is modified as little as possible to preserve accuracy; a sketch of how the mean-absolute-error comparison can be computed follows the notes below.
| | ALGO | Model | Result (Mean Absolute Error) | CANN |
|---|---|---|---|---|
| ✅ | SFT | Qwen2.5-0.5B-Instruct[1] | chart @as12138 | 8.1.RC1 (not released) |
| | GRPO | Qwen2-7B-Instruct[2] | chart @as12138 | 8.1.RC1 (not released) |
| | GRPO | Qwen2.5-VL-3B-Instruct[1] | WIP @as12138 | 8.1.RC1 (not released) |
| | GRPO | Qwen2.5-VL-7B-Instruct | Waiting for sleep mode | 8.1.RC1 (not released) |
| | GRPO | Qwen2.5-7B-Instruct | Waiting for sleep mode | 8.1.RC1 (not released) |
Notes:
[1] The NPU does not yet support sleep mode (required for the hybrid engine). To obtain results efficiently, we use 2B/3B-scale models on 8 devices when verifying the algorithms.
[2] Qwen2-7B-Instruct was tested on 2*8 devices, and many batch_size-related parameters had to be reduced, so this result is for reference only. We will publish the reward results with the default parameters as soon as sleep mode is supported.
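The "Result (Mean Absolute Error)" column compares a metric curve logged on NPU against the GPU baseline from an otherwise-identical run. Below is a minimal, hypothetical sketch of that comparison; the function name, the example numbers, and the choice of per-step mean reward as the metric are assumptions for illustration, not verl's evaluation code.

```python
# Hypothetical helper for the "Mean Absolute Error" column: compare a metric
# curve (e.g. mean reward per training step) from an NPU run against the GPU
# baseline. Names and data are illustrative only.
import numpy as np


def mean_absolute_error(npu_curve, gpu_curve):
    """MAE over the overlapping training steps of two metric curves."""
    n = min(len(npu_curve), len(gpu_curve))
    npu = np.asarray(npu_curve[:n], dtype=np.float64)
    gpu = np.asarray(gpu_curve[:n], dtype=np.float64)
    return float(np.mean(np.abs(npu - gpu)))


if __name__ == "__main__":
    npu_rewards = [0.12, 0.18, 0.25, 0.31]   # example values only
    gpu_rewards = [0.11, 0.19, 0.24, 0.33]
    print(f"MAE = {mean_absolute_error(npu_rewards, gpu_rewards):.4f}")
```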
Ease of use
flash-attn is not supported on Ascend NPU, so we need to use torch_npu.npu_fusion_attention as a replacement (a sketch follows the list below).
- [Temporary solution] NPU support SDPA: NPU support SDPA huggingface/transformers#35165
- Using flash_attn on NPU @FightingZhen: [Feature] Support using FlashAttention2 on Ascend NPU huggingface/transformers#36696
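Until the transformers-side solutions above land, a wrapper along the following lines can stand in for flash_attn_func. This is a hedged sketch, not verl's actual patch: the wrapper name is hypothetical, only the non-causal case is shown, and the full torch_npu.npu_fusion_attention argument list (masks, sparse modes, variable-length inputs) is documented in torch_npu.

```python
# Sketch: replace flash_attn.flash_attn_func with the Ascend fused kernel.
# Wrapper name and defaults are assumptions; causal/decoder attention would
# additionally need an atten_mask.
import torch
import torch_npu


def npu_flash_attn_func(q, k, v, dropout_p=0.0, softmax_scale=None):
    """Stand-in for flash_attn_func(q, k, v, ...) on NPU.

    q, k, v are laid out as (batch, seqlen, num_heads, head_dim), which is
    what flash-attn expects; input_layout="BSND" tells the fused kernel to
    read that layout directly.
    """
    head_num = q.shape[2]
    if softmax_scale is None:
        softmax_scale = q.shape[-1] ** -0.5
    # npu_fusion_attention returns a tuple; the first element is the
    # attention output with the same layout as the inputs.
    out = torch_npu.npu_fusion_attention(
        q, k, v, head_num,
        input_layout="BSND",
        scale=softmax_scale,
        keep_prob=1.0 - dropout_p,
    )[0]
    return out
```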
Long-term Planning
- ring attention
- torch.compile