
Tensor parallel support for LLM training. #37505

@czkkkkkk

Description

Feature request

Hi HF team,

I'm wondering about the current status of tensor parallelism (TP) support in Hugging Face Transformers. I've noticed that some standard models, such as Llama 4 and Mixtral, include TP sharding plans, and .from_pretrained appears to support loading models with a TP plan. So it seems that TP is supported for inference.
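
For reference, here is a minimal sketch of the inference path I'm describing, based on my reading of the multi-GPU docs; the checkpoint id is just an example of a model that ships a TP plan:

```python
# Launch with e.g.: torchrun --nproc-per-node 4 tp_inference.py
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"  # example checkpoint with a built-in TP plan

# tp_plan="auto" shards the model across the processes launched by torchrun,
# using the sharding plan defined in the model's modeling code.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    tp_plan="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

inputs = tokenizer("Tensor parallelism shards each layer across GPUs.", return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model(**inputs)
```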

However, I'm curious about training support. Does the transformers library support TP combined with data parallelism (DP) during training? Also, it looks like .save_pretrained doesn't currently support saving TP-sharded models; can you confirm whether that's the case, or whether there's a workaround?
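
For context, outside of transformers this TP + DP combination is usually expressed directly in PyTorch with a 2-D device mesh: TP via DTensor on one mesh dimension and data parallelism (e.g. FSDP2) on the other. A rough sketch on a toy MLP (not transformers API; the mesh sizes and layer names are just for illustration):

```python
# Launch with e.g.: torchrun --nproc-per-node 8 tp_dp_sketch.py
import os
import torch
import torch.nn as nn
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.tensor.parallel import (
    ColwiseParallel, RowwiseParallel, parallelize_module,
)
# FSDP2; on PyTorch < 2.6 the import path is torch.distributed._composable.fsdp
from torch.distributed.fsdp import fully_shard


class ToyMLP(nn.Module):
    def __init__(self, dim: int = 1024):
        super().__init__()
        self.up = nn.Linear(dim, 4 * dim, bias=False)
        self.down = nn.Linear(4 * dim, dim, bias=False)

    def forward(self, x):
        return self.down(torch.relu(self.up(x)))


torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))
world_size = int(os.environ["WORLD_SIZE"])
tp_size = 2  # illustrative split: 8 GPUs -> 4-way DP x 2-way TP
mesh = init_device_mesh(
    "cuda", (world_size // tp_size, tp_size), mesh_dim_names=("dp", "tp")
)

model = ToyMLP().cuda()
# Shard the MLP across the TP sub-mesh: column-wise for the up projection,
# row-wise for the down projection.
parallelize_module(model, mesh["tp"], {"up": ColwiseParallel(), "down": RowwiseParallel()})
# Apply data parallelism across the DP sub-mesh on top of the TP-sharded parameters.
fully_shard(model, mesh=mesh["dp"])

# Ordinary training step; gradients are reduced over the DP and TP dimensions as needed.
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
loss = model(torch.randn(8, 1024, device="cuda")).pow(2).mean()
loss.backward()
opt.step()
```

If something along these lines is (or will be) supported inside Trainer, or via .from_pretrained / .save_pretrained, pointers would be very welcome.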

Thanks in advance!

Motivation

To support large-scale LLM training with TP.

Your contribution

Happy to contribute if there is a preferred approach for supporting TP + DP training.
