KeyError: 'SLURM_NODEID' #112

@xiaobaitu123321

When I try to train or evaluate, the run fails with the following error:

  File "/opt/conda/lib/python3.10/site-packages/lightning/pytorch/trainer/connectors/accelerator_connector.py", line 577, in _lazy_init_strategy
    self.strategy.set_world_ranks()
  File "/opt/conda/lib/python3.10/site-packages/lightning/pytorch/strategies/ddp.py", line 210, in set_world_ranks
    self.cluster_environment.set_global_rank(self.node_rank * self.num_processes + self.local_rank)
  File "/opt/conda/lib/python3.10/site-packages/lightning/pytorch/strategies/parallel.py", line 63, in node_rank
    return self.cluster_environment.node_rank() if self.cluster_environment is not None else 0
  File "/opt/conda/lib/python3.10/site-packages/lightning/fabric/plugins/environments/slurm.py", line 155, in node_rank
    return int(os.environ["SLURM_NODEID"])
  File "/opt/conda/lib/python3.10/os.py", line 680, in __getitem__
    raise KeyError(key) from None
KeyError: 'SLURM_NODEID'
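
The last two frames show a plain `os.environ["SLURM_NODEID"]` lookup failing, so the SLURM environment plugin was selected while that variable was missing from the process environment. A minimal sketch for checking this from the same shell or job script is below. The helper names `slurm_vars` and `slurm_node_rank` are hypothetical and not part of Lightning, and the fallback value of 0 is an assumption that only makes sense for a single-node run.

```python
import os


def slurm_vars(environ=None):
    """Collect the SLURM_* variables visible to this process, for debugging."""
    env = os.environ if environ is None else environ
    return {key: value for key, value in env.items() if key.startswith("SLURM_")}


def slurm_node_rank(environ=None, default=0):
    """Return the node rank from SLURM_NODEID, or `default` when it is unset.

    Hypothetical helper: it mirrors the lookup in the traceback but uses
    .get() instead of indexing, so a missing variable does not raise KeyError.
    """
    env = os.environ if environ is None else environ
    return int(env.get("SLURM_NODEID", default))


if __name__ == "__main__":
    # Print whatever SLURM variables are set, then the (possibly defaulted) rank.
    for key, value in sorted(slurm_vars().items()):
        print(f"{key}={value}")
    print("node rank:", slurm_node_rank())
```

If the printout lists some `SLURM_*` variables but not `SLURM_NODEID`, the environment is only partially populated, which would be consistent with the error above. Whether that is the actual cause here is not confirmed by the traceback.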
