Why does applying sequence parallelism reduce the step count? #4553

@mungg

Thanks for building such a great framework!

I have a question about using GRPO with sequence parallelism (SP). I trained on 1,000 samples with 2 GPUs and per_device_train_batch_size=1, and noticed something that confused me:

With SP=2 → training steps = 250
With SP=1 → training steps = 500

I expected the opposite: with SP=2, each sequence is split across the two GPUs, so both GPUs work on the same sequences and it should take more steps, not fewer, to process the same number of sequences.
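To make the discrepancy concrete, here is a rough back-of-the-envelope check using only the numbers from this post. It assumes (my assumption, not something I have confirmed in the framework) that one epoch consumes every prompt exactly once, so the number of prompts per step is simply the dataset size divided by the observed step count. The function name `per_step_consumption` is just a helper I made up for illustration.

```python
# Back-of-the-envelope check using only numbers from this post.
# Assumption (not confirmed by the framework): one epoch sees every prompt
# exactly once, so prompts_per_step = num_prompts / observed_steps.

num_prompts = 1000      # size of the dataset subset used for training
num_generations = 8     # --num_generations

# sequence_parallel_size -> training steps reported by the trainer
observed_steps = {1: 500, 2: 250}


def per_step_consumption(steps: int) -> tuple[float, float]:
    """Return (prompts, completions) consumed per training step."""
    prompts = num_prompts / steps
    return prompts, prompts * num_generations


for sp_size, steps in observed_steps.items():
    prompts, completions = per_step_consumption(steps)
    print(f"SP={sp_size}: {prompts:g} prompts/step, {completions:g} completions/step")
```

Under that assumption, SP=1 works out to 2 prompts (16 completions) per step, which happens to equal per_device_train_batch_size (1) × gradient_accumulation_steps (8) × 2 GPUs. SP=2 works out to 4 prompts (32 completions) per step, so each step seems to consume twice as much data rather than half.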

Am I misunderstanding something here? Would love your help!

Here is the command line I used to train the model:

NPROC_PER_NODE=2 \
PYTORCH_CUDA_ALLOC_CONF='' \
swift rlhf \
    --rlhf_type grpo \
    --model Qwen/Qwen2.5-7B \
    --train_type full \
    --use_vllm true \
    --vllm_mode colocate \
    --vllm_gpu_memory_utilization 0.5 \
    --vllm_max_model_len 1024 \
    --vllm_tensor_parallel_size 1 \
    --dataset AI-MO/NuminaMath-TIR@1000 \
    --torch_dtype bfloat16 \
    --num_train_epochs 1 \
    --max_length 1024 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 4 \
    --gradient_accumulation_steps 8 \
    --eval_steps 1000 \
    --save_steps 1000 \
    --learning_rate 1e-6 \
    --save_total_limit 2 \
    --logging_steps 5 \
    --output_dir output \
    --warmup_ratio 0.05 \
    --dataloader_num_workers 4 \
    --max_completion_length 1024 \
    --reward_funcs accuracy format \
    --num_generations 8 \
    --system examples/train/grpo/prompt.txt \
    --deepspeed zero3_offload \
    --temperature 1.0 \
    --top_p 1.0 \
    --top_k 80 \
    --attn_impl flash_attn \
    --log_completions true \
    --async_generate false \
    --offload_optimizer true \
    --offload_model true \
    --padding_free true \
    --sequence_parallel_size 2 \
    --gc_collect_after_offload true \
    --dataloader_drop_last true \
    --sleep_level 1 \
    --split_dataset_ratio 0
