Unable to save model after training with tensor parallel

### System Info

Currently, attempting to save a model after training with tensor parallelism raises `RuntimeError: Attempted to access the data pointer on an invalid python storage`. This happens because the state dict is not gathered from the sharded tensors before saving.
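To illustrate what "gathering" means here, below is a minimal sketch, not the code from the linked PR. It assumes the sharded parameters are PyTorch `DTensor` objects, which expose a `full_tensor()` method; the helper name `gather_full_state_dict` is hypothetical.

```python
from typing import Any, Dict


def gather_full_state_dict(state_dict: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `state_dict` with sharded tensors replaced by full tensors.

    Assumption: sharded values are DTensor-like objects exposing `full_tensor()`.
    Values without that method (plain tensors, buffers, scalars) pass through
    unchanged.
    """
    full_state_dict: Dict[str, Any] = {}
    for name, value in state_dict.items():
        if hasattr(value, "full_tensor"):
            # For a real DTensor this is a collective operation, so every rank
            # in the tensor-parallel group must reach this call.
            value = value.full_tensor()
        full_state_dict[name] = value
    return full_state_dict
```

In a real run this would presumably be called on every rank, with only the main process writing to disk afterwards, for example by passing the result as the `state_dict` argument of `save_pretrained`.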

Fix here: https://github.com/huggingface/transformers/pull/36434

![Image](https://github.com/user-attachments/assets/5a0052e2-548c-4486-977d-8c7c37d2bee9)

### Who can help?

_No response_

### Information

- [ ] The official example scripts
- [ ] My own modified scripts

### Tasks

- [ ] An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)

### Reproduction

Train the model with tensor parallelism by passing `tp_size >= 2` to the trainer, and make sure to specify `output_dir` as the model saving directory.

### Expected behavior

Model is saved upon completion of training.


Unable to save model after training with tensor parallel #36436
