Loading a dataset from local files, e.g. `dataset = nlp.load_dataset('csv', data_files=['file_1.csv', 'file_2.csv'])`, concurrently from multiple processes raises a `FileExistsError` from line 430 of the builder: https://github.com/huggingface/nlp/blob/6655008c738cb613c522deb3bd18e35a67b2a7e5/src/nlp/builder.py#L423-L438
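For reference, a minimal reproduction sketch of the race (the use of `multiprocessing` and the pool size are my own illustration, not from an actual run):

```python
# Hypothetical reproduction: several processes call load_dataset on the same
# CSV files that have never been cached before, so they race to create the
# same cache directory and at least one of them typically hits FileExistsError.
import multiprocessing

import nlp


def load(_):
    nlp.load_dataset('csv', data_files=['file_1.csv', 'file_2.csv'])


if __name__ == '__main__':
    with multiprocessing.Pool(4) as pool:
        pool.map(load, range(4))
```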
This is most likely because multiple processes step into `download_and_prepare` at the same time: https://github.com/huggingface/nlp/blob/6655008c738cb613c522deb3bd18e35a67b2a7e5/src/nlp/load.py#L550-L554
This can happen when launching distributed training with a command like `python -m torch.distributed.launch --nproc_per_node 4` on a collection of files that has never been loaded before.
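Until locking is in place, one possible interim workaround (assuming the process group has already been initialized by the launcher) is to let rank 0 build the cache first while the other ranks wait at a barrier:

```python
# Sketch of a user-side workaround, not part of the library: rank 0 prepares
# the cache, the other ranks wait at a barrier and then load from the cache.
import torch.distributed as dist

import nlp

data_files = ['file_1.csv', 'file_2.csv']

if dist.get_rank() == 0:
    dataset = nlp.load_dataset('csv', data_files=data_files)
    dist.barrier()
else:
    dist.barrier()
    dataset = nlp.load_dataset('csv', data_files=data_files)
```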
I can create a PR that adds file locks. It would be helpful to know the project's convention for naming and placing the lock file.
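For concreteness, this is roughly the kind of locking I have in mind, sketched with the `filelock` package; the helper name, lock file name, and cache location below are placeholders pending the project's conventions:

```python
import os

from filelock import FileLock


def locked_download_and_prepare(builder_instance, **prepare_kwargs):
    """Serialize download_and_prepare across processes with a file lock.

    Hypothetical helper: the lock name and location are placeholders and
    should follow whatever convention the project prefers, e.g. a lock file
    next to the builder's cache directory.
    """
    cache_root = os.path.expanduser('~/.cache/huggingface/datasets')
    os.makedirs(cache_root, exist_ok=True)
    lock_path = os.path.join(cache_root, 'builder.lock')
    with FileLock(lock_path):
        # Only the first process actually builds the cache; the others block
        # here and then find the data already prepared on disk.
        builder_instance.download_and_prepare(**prepare_kwargs)
```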