
Please correct the following DeepSpeed config values that mismatch TrainingArguments values: scheduler.params.total_num_steps=0 vs hf num_training_steps (calculated)= 260 #29348

@srcao-bingo

Description


System Info

  • transformers version: 4.36.2
  • Platform: Linux-4.15.0-213-generic-x86_64-with-glibc2.27
  • Python version: 3.9.18
  • Huggingface_hub version: 0.21.1
  • Safetensors version: 0.4.2
  • Accelerate version: 0.27.2
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.0.1+cu117 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script?:
  • Using distributed or parallel set-up in script?:

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

raise ValueError(
ValueError: Please correct the following DeepSpeed config values that mismatch TrainingArguments values:
- ds scheduler.params.total_num_steps=0 vs hf num_training_steps (calculated)=260
The easiest method is to set these DeepSpeed config values to 'auto'.

When I use transformers==4.28.1 + deepspeed==0.13.3 for Llama 2 fine-tuning, the code runs normally and training completes. The error above occurs when I upgrade transformers to 4.36.x, 4.37.x, or 4.38.1.
I have not modified DeepSpeed's default_offload_opt_param.json file. Its contents are as follows:

{
  "bf16": {
    "enabled": "auto"
  },
  "optimizer": {
    "type": "AdamW",
    "params": {
      "lr": "auto",
      "betas": "auto",
      "eps": "auto",
      "weight_decay": "auto"
    }
  },
  "scheduler": {
    "type": "WarmupDecayLR",
    "params": {
      "total_num_steps": "auto",
      "warmup_min_lr": "auto",
      "warmup_max_lr": "auto",
      "warmup_num_steps": "auto"
    }
  },
  "zero_optimization": {
    "stage": 3,
    "offload_optimizer": {
      "device": "cpu",
      "pin_memory": true
    },
    "offload_param": {
      "device": "cpu",
      "pin_memory": true
    },
    "overlap_comm": true,
    "contiguous_gradients": true,
    "sub_group_size": 1e9,
    "reduce_bucket_size": "auto",
    "stage3_prefetch_bucket_size": "auto",
    "stage3_param_persistence_threshold": "auto",
    "stage3_max_live_parameters": 1e9,
    "stage3_max_reuse_distance": 1e9,
    "stage3_gather_16bit_weights_on_model_save": true
  },
  "gradient_accumulation_steps": "auto",
  "gradient_clipping": "auto",
  "steps_per_print": 5,
  "train_batch_size": "auto",
  "train_micro_batch_size_per_gpu": "auto",
  "wall_clock_breakdown": false
}
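For context, the config file is passed to the Trainer in the usual way. Below is a minimal sketch of that setup, not my exact script: the model checkpoint, dataset, and hyperparameters are illustrative placeholders.

```python
# Minimal sketch of wiring the DeepSpeed config into the Trainer.
# Checkpoint, dataset, and hyperparameters are placeholders, not the exact
# values from my run. Launch under the DeepSpeed launcher, e.g.:
#   deepspeed --num_gpus=1 train.py
from datasets import Dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

model_name = "meta-llama/Llama-2-7b-hf"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Tiny dummy dataset; in my run this is the tokenized fine-tuning data.
train_dataset = Dataset.from_dict({"input_ids": [[1, 2, 3]], "labels": [[1, 2, 3]]})

args = TrainingArguments(
    output_dir="./out",
    num_train_epochs=3,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,
    bf16=True,
    deepspeed="default_offload_opt_param.json",  # the config shown above
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    tokenizer=tokenizer,
)
trainer.train()  # raises the ValueError above on transformers >= 4.36
```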

The value of scheduler.params.total_num_steps in the config is always "auto", yet the error reports it as 0.
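This can be confirmed by reading the file back (a quick check, assuming default_offload_opt_param.json sits in the working directory):

```python
import json

# Confirm the scheduler parameter really is "auto" in the config on disk.
with open("default_offload_opt_param.json") as f:
    ds_config = json.load(f)

print(ds_config["scheduler"]["params"]["total_num_steps"])  # prints: auto
```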

Expected behavior

Training should start with this config: scheduler.params.total_num_steps set to "auto" should be resolved to the calculated number of training steps (260) instead of raising a mismatch error, as it was with transformers 4.28.1.
