[Ascend NPU] Add initial support for Ascend devices #99


Draft: noemotiovon wants to merge 8 commits into main

Conversation

noemotiovon

What does this PR do?

This commit introduces native support for Ascend NPUs in the ROLL project, while preserving compatibility with existing CUDA-based infrastructure.

Key changes include:

  • Introduced a unified device abstraction interface to encapsulate device initialization, memory management, and synchronization, enabling extensibility for both CUDA and Ascend devices (a rough sketch of the idea follows below).
  • Replaced direct usage of CUDA APIs and Ray CUDA resource APIs with the new abstraction layer to support multi-device environments.
  • Integrated Ascend inference backend via vLLM + vLLM-ascend.
  • Added experimental support for training with MindSpeed on Ascend hardware.

This enhancement lays the groundwork for seamless switching across CUDA and Ascend devices.
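
As a rough illustration of the abstraction idea (the class and method names below are hypothetical, not the actual interface added by this PR), the CUDA and Ascend backends expose the same operations so callers never touch torch.cuda or torch.npu directly:

import torch

class DevicePlatform:
    # Operations the rest of the codebase is allowed to call.
    device_type: str = "cpu"

    def is_available(self) -> bool:
        return False

    def device_count(self) -> int:
        return 0

    def synchronize(self) -> None:
        pass

    def empty_cache(self) -> None:
        pass

class CudaPlatform(DevicePlatform):
    device_type = "cuda"

    def is_available(self) -> bool:
        return torch.cuda.is_available()

    def device_count(self) -> int:
        return torch.cuda.device_count()

    def synchronize(self) -> None:
        torch.cuda.synchronize()

    def empty_cache(self) -> None:
        torch.cuda.empty_cache()

class NpuPlatform(DevicePlatform):
    device_type = "npu"

    def is_available(self) -> bool:
        import torch_npu  # noqa: F401  (import registers the torch.npu namespace)
        return torch.npu.is_available()

    def device_count(self) -> int:
        return torch.npu.device_count()

    def synchronize(self) -> None:
        torch.npu.synchronize()

    def empty_cache(self) -> None:
        torch.npu.empty_cache()

def current_platform() -> DevicePlatform:
    # Prefer the NPU backend when torch_npu is importable and reports a device.
    for platform in (NpuPlatform(), CudaPlatform()):
        try:
            if platform.is_available():
                return platform
        except ImportError:
            continue
    return DevicePlatform()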

Related to #83

@CLAassistant

CLAassistant commented Jul 15, 2025

CLA assistant check
All committers have signed the CLA.

@noemotiovon
Author

Environment

  • CANN: 8.1 RC1
  • torch: 2.5.1
  • torch-npu: 2.5.1
  • vllm: 0.8.4
  • vllm-ascend: 0.8.4rc2
  • deepspeed: 0.16.4
  • sglang: not supported yet
  • megatron: not supported yet
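
A quick way to confirm this stack is wired up (illustrative snippet; it assumes torch-npu is installed, whose import registers the torch.npu namespace):

# Print installed versions and check that the NPU backend is visible to torch.
import importlib.metadata as md

import torch
import torch_npu  # noqa: F401  (registers torch.npu)

for pkg in ("torch", "torch-npu", "vllm", "vllm-ascend", "deepspeed"):
    try:
        print(f"{pkg}: {md.version(pkg)}")
    except md.PackageNotFoundError:
        print(f"{pkg}: not installed")

print("NPU available:", torch.npu.is_available())
print("NPU count:", torch.npu.device_count())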

@noemotiovon
Author

Test 1 Script

bash examples/agentic_demo/run_agentic_pipeline_frozen_lake_single_node_demo.sh

Yaml

defaults:
  - ../config/envs@_here_
  - ../config/deepspeed_zero@_here_
  - ../config/deepspeed_zero2@_here_
  - ../config/deepspeed_zero3@_here_
  - ../config/deepspeed_zero3_cpuoffload@_here_

hydra:
  run:
    dir: .
  output_subdir: null

exp_name: "agentic_pipeline"
seed: 42
logging_dir: ./output/logs
output_dir: ./output
render_save_dir: /home/lichenguang25/tmp/data/oss_bucket_0/yali/output/render
system_envs:
  USE_MODELSCOPE: '1'

#track_with: wandb
#tracker_kwargs:
#  api_key:
#  project: roll-agentic
#  name: ${exp_name}_frozen_lake
#  notes: "agentic_pipeline"
#  tags:
#    - agentic
#    - roll
#    - baseline


track_with: tensorboard
tracker_kwargs:
  log_dir: /home/lichenguang25/tmp/data/oss_bucket_0/yali/llm/tensorboard/roll_exp/agentic_sokoban

num_gpus_per_node: 1

max_steps: 100
save_steps: 10000
logging_steps: 1
eval_steps: 10
resume_from_checkpoint: false

rollout_batch_size: 16
val_batch_size: 16
sequence_length: 4096

reward_clip: 20
advantage_clip: 10.0
ppo_epochs: 1
adv_estimator: "reinforce"
#pg_clip: 0.1
#dual_clip_loss: True
init_kl_coef: 0.0
whiten_advantages: true
entropy_loss_coef: 0

pretrain: Qwen/Qwen2.5-0.5B-Instruct
reward_pretrain: Qwen/Qwen2.5-0.5B-Instruct

actor_train:
  model_args:
    flash_attn: fa2
    disable_gradient_checkpointing: false
    dtype: fp16
    model_type: ~
  training_args:
    learning_rate: 1.0e-6
    weight_decay: 0
    per_device_train_batch_size: 1
    gradient_accumulation_steps: 16
    warmup_steps: 10
  data_args:
    template: qwen2_5
  strategy_args:
    strategy_name: deepspeed_train
    strategy_config: ${deepspeed_zero2}
    # strategy_name: megatron_train
    # strategy_config:
    #   tensor_model_parallel_size: 1
    #   pipeline_model_parallel_size: 1
    #   expert_model_parallel_size: 1
    #   use_distributed_optimizer: true
    #   recompute_granularity: full
  device_mapping: list(range(0,1))
  infer_batch_size: 1

actor_infer:
  model_args:
    flash_attn: fa2
    disable_gradient_checkpointing: true
    dtype: fp16
  generating_args:
    max_new_tokens: 32 # single-turn response length
    top_p: 0.99
    top_k: 100
    num_beams: 1
    temperature: 0.99
    num_return_sequences: 1
  data_args:
    template: qwen2_5
  strategy_args:
    strategy_name: vllm
    strategy_config:
      gpu_memory_utilization: 0.8
      block_size: 16
      load_format: auto
  device_mapping: list(range(0,1))
  infer_batch_size: 1

reference:
  model_args:
    flash_attn: fa2
    disable_gradient_checkpointing: true
    dtype: fp16
    model_type: ~
  data_args:
    template: qwen2_5
  strategy_args:
    strategy_name: hf_infer
    strategy_config: ~
  device_mapping: list(range(0,1))
  infer_batch_size: 1

enable_response_mask: True
action_sep: "||"
use_turn_scores: False # important for GAE when applying token-level rewards to compute token-level advantages. If False, the sum of scores is used as the reward for the last turn.
enable_think: False # False -> no think RL
max_actions_per_traj: 10
reward_normalization:
  grouping: tags # can be tags (env_type) / traj_group_id (group) / batch (rollout_batch)...; reward/adv are computed per group_by
  method: identity # asym_clip / identity / mean_std

custom_envs:
  SimpleSokoban:
    env_type: sokoban
    max_actions_per_traj:  ${max_actions_per_traj} # used in environment state manager to control the actual max actions executed per trajectory
    max_steps_per_traj: ${max_actions_per_traj}
    env_instruction: "You are solving the Sokoban puzzle. You are the player and you need to push all boxes to targets. When you are right next to a box, you can push it by moving in the same direction. You cannot push a box through a wall, and you cannot pull a box. The answer must be one of action in a turn, format is <answer>Right</answer>"
    max_tokens: 100 # used to curate llm prompt "max words", not used for rollout
    env_config: # keys should be a subset of SokobanConfig
      dim_x: 6
      dim_y: 6
      num_boxes: 1
      max_steps: ${max_actions_per_traj}
  LargerSokoban:
    env_type: sokoban
    max_actions_per_traj:  ${max_actions_per_traj}
    max_steps_per_traj: ${max_actions_per_traj}
    env_instruction: "You are solving the Sokoban puzzle. You are the player and you need to push all boxes to targets. When you are right next to a box, you can push it by moving in the same direction. You cannot push a box through a wall, and you cannot pull a box. The answer must be one of action in a turn, format is <answer>Right</answer>"
    max_tokens: 100
    env_config:
      dim_x: 8
      dim_y: 8
      num_boxes: 2
      max_steps: ${max_actions_per_traj}
      search_depth: 10
  SokobanDifferentGridVocab:
    env_type: sokoban
    max_actions_per_traj:  ${max_actions_per_traj}
    max_steps_per_traj: ${max_actions_per_traj}
    env_instruction: "You are solving the Sokoban puzzle. You are the player and you need to push all boxes to targets. When you are right next to a box, you can push it by moving in the same direction. You cannot push a box through a wall, and you cannot pull a box. The answer must be one of action in a turn, format is <answer>Right</answer>"
    max_tokens: 100
    env_config: # keys should be a subset of SokobanConfig
      search_depth: 30
      dim_x: 6
      dim_y: 6
      num_boxes: 1
      max_steps: ${max_actions_per_traj}
      grid_lookup: { 0: "W", 1: ".", 2: "G", 3: "C", 4: "B", 5: "A", 6: "@" }
      grid_vocab: { "W": "wall", ".": "empty", "G": "target", "C": "box on target", "B": "box", "A": "player", "@": "player on target" }
  FrozenLake:
    env_type: frozen_lake
    max_actions_per_traj:  ${max_actions_per_traj}
    max_steps_per_traj: ${max_actions_per_traj}
    env_instruction: "You are solving the FrozenLake puzzle. Forbid the whole and go to the target. You may move to the unintended direction due to the slippery ice. The answer must be one of action in a turn, format is <answer>Right</answer>"
    max_tokens: 100
    env_config:
      is_slippery: false

train_env_manager:
  format_penalty: -0.001
  env_groups: 1
  group_size: 1
  tags: [FrozenLake]
  n_groups: [1] # If not set, all env names divide nums equally. Under the same group, the env config and env seed (prompt) are equal in each generation

val_env_manager:
  env_groups: 2
  group_size: 1 # should be set to 1 because val temperature is set to 0 and same prompt leads to same output
  tags: [SimpleSokoban, FrozenLake]
  n_groups: [1, 1] # TODO: If not set, all env names divide nums equally. Under the same group, the env config and env seed (prompt) are equal in each generation
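
For readers unfamiliar with the ${...} syntax above (e.g. ${max_actions_per_traj}, ${deepspeed_zero2}): these are OmegaConf interpolations that Hydra resolves when composing the config. A minimal standalone example of the mechanism:

# Illustration of OmegaConf interpolation as used throughout the YAML above.
from omegaconf import OmegaConf

yaml_snippet = """
max_actions_per_traj: 10
custom_envs:
  SimpleSokoban:
    max_steps_per_traj: ${max_actions_per_traj}
"""
cfg = OmegaConf.create(yaml_snippet)
OmegaConf.resolve(cfg)
print(cfg.custom_envs.SimpleSokoban.max_steps_per_traj)  # -> 10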

Result

(result screenshots attached)

@noemotiovon
Author

noemotiovon commented Jul 29, 2025

Test 2 Script

bash examples/qwen2.5-0.5B-agentic_ds/run_agentic_pipeline_sokoban.sh

Yaml

defaults:
  - ../config/traj_envs@_here_
  - ../config/deepspeed_zero@_here_
  - ../config/deepspeed_zero2@_here_
  - ../config/deepspeed_zero3@_here_
  - ../config/deepspeed_zero3_cpuoffload@_here_

hydra:
  run:
    dir: .
  output_subdir: null

exp_name: "agentic_pipeline"
seed: 42
logging_dir: ./output/logs
output_dir: ./output
render_save_dir: ./output/render
system_envs:
  USE_MODELSCOPE: '1'

#track_with: wandb
#tracker_kwargs:
#  api_key:
#  project: roll-agentic
#  name: ${exp_name}_sokoban
#  notes: "agentic_pipeline"
#  tags:
#    - agentic
#    - roll
#    - baseline

track_with: tensorboard
tracker_kwargs:
  log_dir: /home/lichenguang25/tmp/data/oss_bucket_0/yali/llm/tensorboard/roll_exp/agentic_sokoban


checkpoint_config:
  type: file_system
  output_dir: /home/lichenguang25/tmp/data/cpfs_0/rl_examples/models/${exp_name}

num_gpus_per_node: 4

max_steps: 1024
save_steps: 10000
logging_steps: 1
eval_steps: 10
resume_from_checkpoint: false

rollout_batch_size: 128
val_batch_size: 1024
sequence_length: 2048

advantage_clip: 0.2
ppo_epochs: 1
adv_estimator: "grpo"
#pg_clip: 0.1
#dual_clip_loss: True
init_kl_coef: 0.0
whiten_advantages: true
entropy_loss_coef: 0
max_grad_norm: 1.0

pretrain: Qwen/Qwen2.5-0.5B-Instruct
reward_pretrain: Qwen/Qwen2.5-0.5B-Instruct

actor_train:
  model_args:
    attn_implementation: fa2
    disable_gradient_checkpointing: false
    dtype: bf16
    model_type: ~
  training_args:
    learning_rate: 1.0e-6
    weight_decay: 0
    per_device_train_batch_size: 2
    gradient_accumulation_steps: 64
    warmup_steps: 10
    lr_scheduler_type: cosine
  data_args:
    template: qwen2_5
  strategy_args:
    strategy_name: deepspeed_train
    strategy_config: ${deepspeed_zero3}
    # strategy_name: megatron_train
    # strategy_config:
    #   tensor_model_parallel_size: 1
    #   pipeline_model_parallel_size: 1
    #   expert_model_parallel_size: 1
    #   use_distributed_optimizer: true
    #   recompute_granularity: full
  device_mapping: list(range(0,2))
  infer_batch_size: 1

actor_infer:
  model_args:
    disable_gradient_checkpointing: true
    dtype: bf16
  generating_args:
    max_new_tokens: 128 # single-turn response length
    top_p: 0.99
    top_k: 100
    num_beams: 1
    temperature: 0.99
    num_return_sequences: 1
  data_args:
    template: qwen2_5
  strategy_args:
    strategy_name: vllm
    strategy_config:
      gpu_memory_utilization: 0.8
      block_size: 16
      load_format: auto
  device_mapping: list(range(2,3))

reference:
  model_args:
    attn_implementation: fa2
    disable_gradient_checkpointing: true
    dtype: bf16
    model_type: ~
  data_args:
    template: qwen2_5
  strategy_args:
    strategy_name: hf_infer
    strategy_config: ~
  device_mapping: list(range(3,4))
  infer_batch_size: 1

reward_normalization:
  grouping: traj_group_id # can be tags (env_type) / traj_group_id (group) / batch (rollout_batch)...; reward/adv are computed per group_by
  method: mean_std # asym_clip / identity / mean_std

train_env_manager:
  format_penalty: -0.15 # sokoban env penalty_for_step=-0.1
  max_env_num_per_worker: 16
  num_env_groups: 128
  # under the same group, the env config and env seed are ensured to be equal
  group_size: 8
  tags: [SimpleSokoban]
  num_groups_partition: [128] # If not set, all env names divide nums equally. Under the same group, the env config and env seed (prompt) are equal in each generation

val_env_manager:
  max_env_num_per_worker: 32
  num_env_groups: 1024
  group_size: 1 # should be set to 1 because val temperature is set to 0 and same prompt leads to same output
  tags: [SimpleSokoban, LargerSokoban, SokobanDifferentGridVocab, FrozenLake]
  num_groups_partition: [256, 256, 256, 256] # TODO: If not set, all env names divide nums equally. Under the same group, the env config and env seed (prompt) are equal in each generation


# Here, you can override variables defined in the imported envs. max_tokens_per_step: 128 in custom_env.SimpleSokoban, here replaced by 64
max_tokens_per_step: 64

custom_envs:
  SimpleSokoban:
    ${custom_env.SimpleSokoban}
  LargerSokoban:
    ${custom_env.LargerSokoban}
  SokobanDifferentGridVocab:
    ${custom_env.SokobanDifferentGridVocab}
  FrozenLake:
    ${custom_env.FrozenLake}
  FrozenLakeThink:
    ${custom_env.FrozenLakeThink}
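
For reference, grouping: traj_group_id with method: mean_std standardizes rewards within each trajectory group before advantages are computed. A minimal numpy sketch of that normalization (illustrative, not the ROLL implementation):

# Per-group mean/std reward normalization, sketched for clarity only.
import numpy as np

def mean_std_normalize(rewards: np.ndarray, group_ids: np.ndarray) -> np.ndarray:
    out = np.empty_like(rewards, dtype=np.float64)
    for gid in np.unique(group_ids):
        mask = group_ids == gid
        group = rewards[mask]
        out[mask] = (group - group.mean()) / (group.std() + 1e-8)
    return out

rewards = np.array([1.0, 0.0, 0.5, 2.0])
group_ids = np.array([0, 0, 1, 1])
print(mean_std_normalize(rewards, group_ids))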


Result

(result screenshot attached)

@HuangJoJo
Collaborator

Thanks for your contribution to ROLL! This design helps a lot with support work for other hardware.

noemotiovon and others added 7 commits August 12, 2025 03:22
…raction

This commit introduces native support for Ascend NPUs in the ROLL project,
while preserving compatibility with existing CUDA-based infrastructure.

Key changes include:
- Introduced a unified device abstraction interface to encapsulate device
  initialization, memory management, and synchronization, enabling extensibility
  for both CUDA and Ascend devices.
- Replaced direct usage of CUDA APIs and Ray CUDA resource APIs with the new
  abstraction layer to support multi-device environments.
- Integrated Ascend inference backend via vLLM + vLLM-ascend.
- Added experimental support for training with MindSpeed on Ascend hardware.

This enhancement lays the groundwork for seamless switching across CUDA and
Ascend devices.

Signed-off-by: noemotiovon <[email protected]>
…raction


# Conflicts:
#	roll/distributed/executor/cluster.py
#	roll/distributed/executor/worker.py
#	roll/distributed/scheduler/initialize.py
Signed-off-by: noemotiovon <[email protected]>
Signed-off-by: noemotiovon <[email protected]>
@noemotiovon
Author

Test 3 Script

bash examples/qwen2.5-0.5B-agentic/run_agentic_pipeline_frozen_lake.sh

Yaml

defaults:
  - ../config/traj_envs@_here_
  - ../config/deepspeed_zero@_here_
  - ../config/deepspeed_zero2@_here_
  - ../config/deepspeed_zero3@_here_
  - ../config/deepspeed_zero3_cpuoffload@_here_

hydra:
  run:
    dir: .
  output_subdir: null

exp_name: "agentic_pipeline"
seed: 42
logging_dir: ./output/logs
output_dir: ./output
render_save_dir: ./output/render
system_envs:
  USE_MODELSCOPE: '1'

#track_with: wandb
#tracker_kwargs:
#  api_key:
#  project: roll-agentic
#  name: ${exp_name}_sokoban
#  notes: "agentic_pipeline"
#  tags:
#    - agentic
#    - roll
#    - baseline

track_with: tensorboard
tracker_kwargs:
  log_dir: /home/lichenguang25/tmp/data/oss_bucket_0/yali/llm/tensorboard/roll_exp/agentic_frozen_lake


checkpoint_config:
  type: file_system
  output_dir: /home/lichenguang25/tmp/data/cpfs_0/rl_examples/models/${exp_name}

num_gpus_per_node: 4

max_steps: 1024
save_steps: 10000
logging_steps: 1
eval_steps: 10
resume_from_checkpoint: false

rollout_batch_size: 128
val_batch_size: 1024
sequence_length: 2048

advantage_clip: 0.2
ppo_epochs: 1
adv_estimator: "grpo"
#pg_clip: 0.1
#dual_clip_loss: True
init_kl_coef: 0.0
whiten_advantages: true
entropy_loss_coef: 0
max_grad_norm: 1.0

pretrain: Qwen/Qwen2.5-0.5B-Instruct
reward_pretrain: Qwen/Qwen2.5-0.5B-Instruct

actor_train:
  model_args:
    attn_implementation: fa2
    disable_gradient_checkpointing: false
    dtype: bf16
    model_type: ~
  training_args:
    learning_rate: 1.0e-6
    weight_decay: 0
    per_device_train_batch_size: 2
    gradient_accumulation_steps: 64
    warmup_steps: 10
    lr_scheduler_type: cosine
  data_args:
    template: qwen2_5
  strategy_args:
    strategy_name: deepspeed_train
    strategy_config: ${deepspeed_zero3}
    # strategy_name: megatron_train
    # strategy_config:
    #   tensor_model_parallel_size: 1
    #   pipeline_model_parallel_size: 1
    #   expert_model_parallel_size: 1
    #   use_distributed_optimizer: true
    #   recompute_granularity: full
  device_mapping: list(range(0,2))
  infer_batch_size: 1

actor_infer:
  model_args:
    disable_gradient_checkpointing: true
    dtype: bf16
  generating_args:
    max_new_tokens: 128 # single-turn response length
    top_p: 0.99
    top_k: 100
    num_beams: 1
    temperature: 0.99
    num_return_sequences: 1
  data_args:
    template: qwen2_5
  strategy_args:
    strategy_name: vllm
    strategy_config:
      gpu_memory_utilization: 0.8
      block_size: 16
      load_format: auto
  device_mapping: list(range(2,3))

reference:
  model_args:
    attn_implementation: fa2
    disable_gradient_checkpointing: true
    dtype: bf16
    model_type: ~
  data_args:
    template: qwen2_5
  strategy_args:
    strategy_name: hf_infer
    strategy_config: ~
  device_mapping: list(range(3,4))
  infer_batch_size: 1

reward_normalization:
  grouping: traj_group_id # can be tags (env_type) / traj_group_id (group) / batch (rollout_batch)...; reward/adv are computed per group_by
  method: mean_std # asym_clip / identity / mean_std

train_env_manager:
  format_penalty: -0.15 # sokoban env penalty_for_step=-0.1
  max_env_num_per_worker: 16
  num_env_groups: 128
  # under the same group, the env config and env seed are ensured to be equal
  group_size: 8
  tags: [FrozenLake]
  num_groups_partition: [128] # If not set, all env names divide nums equally. Under the same group, the env config and env seed (prompt) are equal in each generation

val_env_manager:
  max_env_num_per_worker: 32
  num_env_groups: 1024
  group_size: 1 # should be set to 1 because val temperature is set to 0 and same prompt leads to same output
  tags: [SimpleSokoban, LargerSokoban, SokobanDifferentGridVocab, FrozenLake]
  num_groups_partition: [256, 256, 256, 256] # TODO: If not set, all env names divide nums equally. Under the same group, the env config and env seed (prompt) are equal in each generation


# Here, you can override variables defined in the imported envs. max_tokens_per_step: 128 in custom_env.SimpleSokoban, here replaced by 64
max_tokens_per_step: 64

custom_envs:
  SimpleSokoban:
    ${custom_env.SimpleSokoban}
  LargerSokoban:
    ${custom_env.LargerSokoban}
  SokobanDifferentGridVocab:
    ${custom_env.SokobanDifferentGridVocab}
  FrozenLake:
    ${custom_env.FrozenLake}
  FrozenLakeThink:
    ${custom_env.FrozenLakeThink}
  FrozenLakeLocallyDefineExamples:  # Can import from unified envs config or define dict locally
    env_type: frozen_lake
    max_tokens_per_step: ${max_tokens_per_step}
    user_prompt_format: ${user_prompt_think_format}
    env_manager_cls: ${env_manager_cls}
    use_thread_lock: true
    env_config:
      env_instruction: "You are solving the FrozenLake puzzle. Forbid the whole and go to the target. You may move to the unintended direction due to the slippery ice. The answer must be one of action in a turn, format is <answer>Right</answer>"
      action_pattern: ${think_action_pattern}
      max_steps: ${max_actions_per_traj}
      is_slippery: false
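
The device_mapping entries above place actor_train, actor_infer, and reference on disjoint devices; on Ascend this depends on Ray accounting for NPUs instead of GPUs. A hedged sketch of pinning a Ray actor to one NPU via a resource tag (the resource name and the visible-devices variable are assumptions about the setup, not ROLL's actual wiring):

# Illustrative only: expose NPUs as a Ray resource and schedule one worker per NPU.
import os
import ray

ray.init(resources={"NPU": 4})  # e.g. a single node exposing 4 NPUs

@ray.remote(resources={"NPU": 1})
class InferWorker:
    def visible_devices(self) -> str:
        # Report whichever NPUs the launcher made visible to this process;
        # ASCEND_RT_VISIBLE_DEVICES plays the role CUDA_VISIBLE_DEVICES plays on GPUs.
        return os.environ.get("ASCEND_RT_VISIBLE_DEVICES", "unset")

worker = InferWorker.remote()
print(ray.get(worker.visible_devices.remote()))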

Result

(result screenshot attached)

@lowdy1

lowdy1 commented Aug 15, 2025

Test 4 Script

python examples/start_agentic_rollout_pipeline.py --config_path qwen2.5-0.5B-agentic  --config_name agentic_rollout_sokoban

Yaml

defaults:
  - ../config/traj_envs@_here_
  - ../config/deepspeed_zero@_here_
  - ../config/deepspeed_zero2@_here_
  - ../config/deepspeed_zero3@_here_
  - ../config/deepspeed_zero3_cpuoffload@_here_

hydra:
  run:
    dir: .
  output_subdir: null

exp_name: "agentic_pipeline"
seed: 42
render_save_dir: ./oss_bucket_0/yali/llm/output/render

#track_with: wandb
#tracker_kwargs:
#  api_key:
#  project: roll-agentic
#  name: ${exp_name}_sokoban
#  notes: "agentic_pipeline"
#  tags:
#    - agentic
#    - roll
#    - baseline

track_with: tensorboard
tracker_kwargs:
  log_dir: ./oss_bucket_0/yali/llm/tensorboard/roll_exp/agentic_sokoban

num_gpus_per_node: 4

max_steps: 128
save_steps: 10000

rollout_batch_size: 16
sequence_length: 1024

pretrain: Qwen/Qwen2.5-0.5B-Instruct

actor_infer:
  model_args:
    disable_gradient_checkpointing: true
    dtype: bf16
  generating_args:
    max_new_tokens: 128 # single-turn response length
    top_p: 0.99
    top_k: 100
    num_beams: 1
    temperature: 0.99
    num_return_sequences: 1
  data_args:
    template: qwen2_5
  strategy_args:
    strategy_name: vllm
    strategy_config:
      gpu_memory_utilization: 0.8
      block_size: 16
      load_format: auto # should set 'auto' here, because default load_format is 'dummy'
  device_mapping: list(range(0,4))

train_env_manager:
  format_penalty: -0.15 # sokoban env penalty_for_step=-0.1
  max_env_num_per_worker: 1
  num_env_groups: 1
  # under the same group, the env config and env seed are ensured to be equal
  group_size: 1
  tags: [SimpleSokoban]
  num_groups_partition: [1] # If not set, all env names divide nums equally. Under the same group, the env config and env seed (prompt) are equal in each generation


# Here, you can override variables defined in the imported envs. max_tokens_per_step: 128 in custom_env.SimpleSokoban, here replaced by 64
max_tokens_per_step: 64

custom_envs:
  SimpleSokoban:
    ${custom_env.SimpleSokoban}
  LargerSokoban:
    ${custom_env.LargerSokoban}
  SokobanDifferentGridVocab:
    ${custom_env.SokobanDifferentGridVocab}
  FrozenLake:
    ${custom_env.FrozenLake}
  FrozenLakeThink:
    ${custom_env.FrozenLakeThink}
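
The vllm strategy_config above corresponds to standard vLLM engine arguments. A rough standalone equivalent is sketched below; it assumes vllm and vllm-ascend are installed so the NPU platform is auto-detected, and it assumes (not verified here) that ROLL forwards these keys to the engine unchanged:

# Rough standalone equivalent of the actor_infer vllm strategy above (illustrative).
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen2.5-0.5B-Instruct",
    gpu_memory_utilization=0.8,   # strategy_config.gpu_memory_utilization
    block_size=16,                # strategy_config.block_size
    load_format="auto",           # the pipeline's default is 'dummy', so set 'auto' explicitly
)

params = SamplingParams(
    max_tokens=128,               # generating_args.max_new_tokens
    top_p=0.99,
    top_k=100,
    temperature=0.99,
)
print(llm.generate(["You are solving the Sokoban puzzle."], params)[0].outputs[0].text)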
Result

(result screenshot attached)

Signed-off-by: noemotiovon <[email protected]>
@noemotiovon
Author

Test 5 Script

bash examples/qwen2.5-3B-dpo_megatron/run_dpo_pipeline.sh 

Yaml

defaults:
  - ../config/deepspeed_zero@_here_
  - ../config/deepspeed_zero2@_here_
  - ../config/deepspeed_zero3@_here_
  - ../config/deepspeed_zero3_cpuoffload@_here_

hydra:
  run:
    dir: .
  output_subdir: null

num_gpus_per_node: 4

exp_name: "distill_zero3"
seed: 42
logging_dir: ./output/logs
output_dir: ./output

checkpoint_config:
  type: file_system
  output_dir: /home/lichenguang25/tmp/data/oss_bucket_0/chuye/roll/distill/models/zero3

save_steps: 100
logging_steps: 1
resume_from_checkpoint: false

student_pretrain: Qwen/Qwen2.5-0.5B-Instruct
teacher_pretrain: Qwen/Qwen2.5-1.5B-Instruct

# distill config
distill_loss_weight: 0.85
kd_objective: forward_kl
distill_on_prompt: True

sequence_length: 1024
max_grad_norm: 1.0

question_key: question_zh
answer_key: answer_zh


student:
  model_args:
    attn_implementation: fa2
    disable_gradient_checkpointing: false
    dtype: bf16
    model_type: ~
  training_args:
    learning_rate: 2.0e-5
    weight_decay: 1.0e-2
    lr_scheduler_type: constant
    per_device_train_batch_size: 1
    gradient_accumulation_steps: 1
    warmup_steps: 0
    num_train_epochs: 1
  data_args:
    template: qwen2_5
    file_name:
      - data/GSM8K_zh/GSM8K_zh.json        #https://huggingface.co/datasets/meta-math/GSM8K_zh
    preprocessing_num_workers: 4

  strategy_args:
    strategy_name: deepspeed_train
    strategy_config: ${deepspeed_zero3}
  device_mapping: list(range(0,2))

teacher:
  model_args:
    attn_implementation: fa2
    disable_gradient_checkpointing: true
    dtype: bf16
  data_args:
    template: qwen2_5
  strategy_args:
    strategy_name: deepspeed_infer
    strategy_config: ${deepspeed_zero3}
  device_mapping: list(range(2,4))

system_envs:
  RAY_PROFILING: "0"
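
For context on the distill config: kd_objective: forward_kl minimizes KL(teacher || student) over the vocabulary at each token and blends it with the usual LM loss via distill_loss_weight. A minimal PyTorch sketch of that objective (illustrative, not the ROLL implementation):

# Forward-KL distillation loss: KL(teacher || student), averaged over unmasked tokens.
import torch
import torch.nn.functional as F

def forward_kl_loss(student_logits: torch.Tensor,
                    teacher_logits: torch.Tensor,
                    mask: torch.Tensor) -> torch.Tensor:
    # student_logits / teacher_logits: [batch, seq, vocab]; mask: [batch, seq]
    student_logp = F.log_softmax(student_logits, dim=-1)
    teacher_logp = F.log_softmax(teacher_logits, dim=-1)
    teacher_p = teacher_logp.exp()
    kl = (teacher_p * (teacher_logp - student_logp)).sum(dim=-1)
    return (kl * mask).sum() / mask.sum().clamp(min=1)

# Blend with the LM loss using distill_loss_weight = 0.85 from the config:
# total_loss = 0.85 * forward_kl_loss(s, t, m) + (1 - 0.85) * lm_loss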

Result

(result screenshot attached)
