import_ckpt isn't thread safe. #14479

@leoleoasd

Describe the bug

I'm using torchrun to launch distributed training, and the script calls model.import_ckpt. Since every rank runs the same script, every rank executes this call, and each one saves the tokenizer to the same /tmp/nemo_tokenizer path, so the concurrent writes race with each other: import_ckpt is not thread/process safe.

Steps/Code to reproduce bug

import torch
import lightning as pl
from nemo import lightning as nl
from nemo.collections import llm
from megatron.core.optimizer import OptimizerConfig
from nemo.collections.nlp.modules.common.tokenizer_utils import get_nmt_tokenizer
from transformers import AutoTokenizer

if __name__ == "__main__":
    seq_length = 4096
    global_batch_size = 16

    # tokenizer = get_nmt_tokenizer(
    #     "megatron", "GPT2BPETokenizer"
    # )
    tokenizer = get_nmt_tokenizer(
        library="huggingface",
        model_name='meta-llama/Meta-Llama-3-8B-Instruct',
        use_fast=True,
    )
    print(1)
    model = llm.LlamaModel(llm.Llama3Config8B(), tokenizer=tokenizer)
    print(2)
    print(tokenizer, model)
    model.config.seq_length = seq_length
    ckpt_path = model.import_ckpt(path='hf://meta-llama/Meta-Llama-3-8B-Instruct')
    print(ckpt_path)
    data = llm.SquadDataModule(seq_length=seq_length, global_batch_size=global_batch_size, tokenizer=tokenizer)
    print(3)
    ## initialize the strategy
    strategy = nl.MegatronStrategy(
        context_parallel_size=1,
        tensor_model_parallel_size=8,
        pipeline_model_parallel_size=1,
        pipeline_dtype=torch.bfloat16,
    )
    print(4)

    ## setup the optimizer
    opt_config = OptimizerConfig(
        optimizer='adam',
        lr=6e-4,
        bf16=True,
    )
    opt = nl.MegatronOptimizerModule(config=opt_config)
    print(5)
    trainer = nl.Trainer(
        devices=8, ## you can change the number of devices to suit your setup
        max_steps=50,
        accelerator="gpu",
        strategy=strategy,
        plugins=nl.MegatronMixedPrecision(precision="bf16-mixed"),
    )
    wandb_logger = pl.pytorch.loggers.WandbLogger()
    nemo_logger = nl.NeMoLogger(
        log_dir="test_logdir", ## logs and checkpoints will be written here
        wandb=wandb_logger,
    )

    # llm.train(
    #     model=model,
    #     data=data,
    #     trainer=trainer,
    #     log=nemo_logger,
    #     tokenizer='data',
    #     optim=opt,
    # )
    print(6)
    trainer.fit(model, data, ckpt_path=ckpt_path)
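
A possible workaround, sketched but untested: serialize import_ckpt across the processes on a node with a file lock, so only one rank at a time writes the cached tokenizer/checkpoint under /tmp/nemo_tokenizer. The helper name and lock path below are my own choice, not NeMo API; it only assumes the filelock package is installed.

from filelock import FileLock

def import_ckpt_serialized(model, path, lock_file="/tmp/nemo_import_ckpt.lock"):
    # Only one process on the node holds the lock at a time; the other ranks
    # block here and then import from the cache the first process already wrote.
    with FileLock(lock_file):
        return model.import_ckpt(path=path)

# usage in the script above:
# ckpt_path = import_ckpt_serialized(model, 'hf://meta-llama/Meta-Llama-3-8B-Instruct')

A torch.distributed barrier could guard this too, but at this point in the script the process group is usually not initialized yet (the Trainer sets it up later), so a file lock is the simpler option.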

Expected behavior

The model loads successfully, even when import_ckpt is called concurrently from all ranks launched by torchrun.

Environment overview (please complete the following information)

  • Environment location: [Bare-metal, Docker, Cloud (specify cloud provider - AWS, Azure, GCP, Colab)]
  • Method of NeMo install: [pip install or from source]. Please specify exact commands you used to install.
  • If method of install is [Docker], provide docker pull & docker run commands used

Environment details

If an NVIDIA Docker image is used, you don't need to specify these.
Otherwise, please provide:

  • OS version
  • PyTorch version
  • Python version

Additional context

Add any other context about the problem here.
Example: GPU model
