Hello, first of all thank you for releasing the training code for Alpaca, we really appreciate it.
I am running the fine-tuning script on a 4xA100-SXM4-80GB node and currently getting a 24-hour ETA. That doesn't really match the reported "3 hours on 8 80GB A100s" mentioned on https://crfm.stanford.edu/2023/03/13/alpaca.html. Shouldn't it be around 6 hours, or even 12 hours considering that the script "is not particularly optimized"?
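For reference, here is the rough arithmetic behind my 6-12 hour estimate. This is a minimal sketch: the per-device batch size and gradient accumulation values are illustrative assumptions, and I'm assuming the script takes the standard HuggingFace `TrainingArguments` flags (`per_device_train_batch_size`, `gradient_accumulation_steps`).

```python
# Rough scaling arithmetic (illustrative values; the actual batch-size and
# accumulation settings used for the reported 3-hour run are assumptions).

def effective_batch_size(per_device_batch: int, grad_accum_steps: int, num_gpus: int) -> int:
    """Global batch size consumed per optimizer step."""
    return per_device_batch * grad_accum_steps * num_gpus

# Hypothetical 8x A100 configuration for the reported ~3-hour run.
eight_gpu = effective_batch_size(per_device_batch=4, grad_accum_steps=8, num_gpus=8)

# On 4 GPUs, doubling gradient accumulation keeps the same global batch size;
# with per-GPU throughput unchanged, wall-clock time should roughly double
# (~3 h -> ~6 h), not quadruple to 24 h.
four_gpu = effective_batch_size(per_device_batch=4, grad_accum_steps=16, num_gpus=4)

assert eight_gpu == four_gpu  # 256 in both cases
print(f"effective batch size: {eight_gpu}")
```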
Is anyone else encountering this issue? If this is expected, what methods did you use to optimize the fine-tuning process?
Running on CUDA 12.1, Torch 1.13, and the transformers fork of llama at the commit you mentioned.
Thanks.