System Info
- docker image: huggingface/transformers-pytorch-deepspeed-latest-gpu:latest
- transformers version: 4.36.0.dev0
- Platform: Linux-6.2.0-37-generic-x86_64-with-glibc2.29
- Python version: 3.8.10
- Huggingface_hub version: 0.19.4
- Safetensors version: 0.4.1
- Accelerate version: 0.25.0.dev0
- Accelerate config: not found
- PyTorch version (GPU?): 2.1.0+cu118 (True)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?: yes, a single RTX 4060 Ti 16 GB
- Using distributed or parallel set-up in script?: no
Who can help?
No response
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
Run:
deepspeed --autotuning run \
./script/run_classification.py \
--model_name_or_path ckip-joint/bloom-1b1-zh \
--do_train \
--do_eval \
--output_dir ./bloom \
--train_file ./data/train.csv \
--validation_file ./data/test.csv \
--text_column_names sentence \
--label_column_name label \
--overwrite_output_dir \
--fp16 \
--torch_compile \
--deepspeed cfg/auto.json
cfg/auto.json:
{
"train_micro_batch_size_per_gpu": "auto",
"autotuning": {
"enabled": true,
"fast": false
}
}
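As far as I understand, DeepSpeed's autotuner launches each profiling experiment through its pdsh runner over ssh, even when the only resource is localhost, so ssh into the container itself has to work before the command above can run. A rough pre-check sketch (my assumption is a Debian-based image without sshd preinstalled; package names and paths are guesses, not what the failing run used):

# Make ssh to localhost work inside the container (assumed prerequisite for the pdsh launcher)
apt-get update && apt-get install -y openssh-server pdsh
service ssh start
ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
ssh -o StrictHostKeyChecking=no localhost hostname   # should print the container hostname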
The error:
[2023-12-04 11:51:42,325] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-12-04 11:51:43,363] [WARNING] [runner.py:203:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
[2023-12-04 11:51:43,363] [INFO] [autotuner.py:71:__init__] Created autotuning experiments directory: autotuning_exps
[2023-12-04 11:51:43,364] [INFO] [autotuner.py:84:__init__] Created autotuning results directory: autotuning_exps
[2023-12-04 11:51:43,364] [INFO] [autotuner.py:200:_get_resource_manager] active_resources = OrderedDict([('localhost', [0])])
[2023-12-04 11:51:43,364] [INFO] [runner.py:362:run_autotuning] [Start] Running autotuning
[2023-12-04 11:51:43,364] [INFO] [autotuner.py:669:model_info_profile_run] Starting model info profile run.
0%| | 0/1 [00:00<?, ?it/s][2023-12-04 11:51:43,366] [INFO] [scheduler.py:344:run_experiment] Scheduler wrote ds_config to autotuning_results/profile_model_info/ds_config.json, /workspaces/hf/autotuning_results/profile_model_info/ds_config.json
[2023-12-04 11:51:43,367] [INFO] [scheduler.py:351:run_experiment] Scheduler wrote exp to autotuning_results/profile_model_info/exp.json, /workspaces/hf/autotuning_results/profile_model_info/exp.json
[2023-12-04 11:51:43,367] [INFO] [scheduler.py:378:run_experiment] Launching exp_id = 0, exp_name = profile_model_info, with resource = localhost:0, and ds_config = /workspaces/hf/autotuning_results/profile_model_info/ds_config.json
localhost: ssh: connect to host localhost port 22: Cannot assign requested address
pdsh@b97c1584d47d: localhost: ssh exited with exit code 255
[2023-12-04 11:51:59,057] [INFO] [scheduler.py:430:clean_up] Done cleaning up exp_id = 0 on the following workers: localhost
[2023-12-04 11:51:59,057] [INFO] [scheduler.py:393:run_experiment] Done running exp_id = 0, exp_name = profile_model_info, with resource = localhost:0
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:25<00:00, 25.01s/it]
[2023-12-04 11:52:08,378] [ERROR] [autotuner.py:699:model_info_profile_run] The model is not runnable with DeepSpeed with error = (
[2023-12-04 11:52:08,378] [INFO] [runner.py:367:run_autotuning] [End] Running autotuning
[2023-12-04 11:52:08,378] [INFO] [autotuner.py:1110:run_after_tuning] No optimal DeepSpeed configuration found by autotuning.
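If I read the log correctly, the step that fails is the scheduler launching exp_id = 0 through pdsh/ssh ("localhost: ssh: connect to host localhost port 22: Cannot assign requested address"). A minimal way to reproduce just that connection attempt, assuming pdsh and an ssh client are available in the container:

# Reproduce the launcher's connection attempt without DeepSpeed
ssh -p 22 localhost true
pdsh -R ssh -w localhost hostname

If both of these fail in the same way, the problem is presumably the container's ssh setup rather than the autotuner itself.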
Expected behavior
Training starts and runs successfully when autotuning is enabled.