Feature request
Hi HF team,
I'm wondering about the current status of tensor parallelism (TP) support in Hugging Face. I've noticed that some standard models, such as llama4 and mixtral, include TP sharding plans, and .from_pretrained appears to support loading models with a TP plan. So it seems that TP is supported for inference.
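For reference, this is the kind of inference-side loading I mean. It's only a minimal sketch based on my reading of the docs; I'm assuming the tp_plan="auto" argument to .from_pretrained and a torchrun launch, and the model id is just an example:

```python
# Minimal sketch of loading a model with its built-in TP plan (assumption:
# tp_plan="auto" is the supported entry point). Launch with something like:
#   torchrun --nproc-per-node 4 tp_inference.py
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mixtral-8x7B-v0.1"  # example model that ships a TP sharding plan

# tp_plan="auto" asks from_pretrained to shard the weights across the
# available torch.distributed ranks using the model's TP plan.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    tp_plan="auto",
    torch_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

inputs = tokenizer("Tensor parallelism shards every weight across ranks.", return_tensors="pt")
inputs = inputs.to(model.device)
outputs = model(**inputs)  # forward pass runs with TP-sharded layers
```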
However, I'm curious about training support. Does the transformers library support TP combined with data parallelism (DP) during training? Also, it looks like .save_pretrained doesn't currently support saving TP-sharded models; can you confirm if that's the case, or if there's a workaround?
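To make the saving question concrete, this is roughly the pattern I would expect to work; again just a sketch, with the tp_plan kwarg and the output path being illustrative assumptions on my part:

```python
# Sketch of the save path in question: load a TP-sharded model, train it,
# then call save_pretrained. It's unclear to me whether save_pretrained
# gathers the shards into a full checkpoint here, or what each rank writes.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mixtral-8x7B-v0.1",  # example model with a TP plan
    tp_plan="auto",
)

# ... TP(+DP) training loop would go here ...

model.save_pretrained("./tp_checkpoint")  # does this handle TP-sharded weights?
```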
Thanks in advance!
Motivation
To support large-scale LLM training with TP.
Your contribution
Happy to contribute if there is a concrete direction for supporting TP+DP training.