Using custom DownloadConfig results in an error #560

@ynouri

Version / Environment

Ubuntu 18.04
Python 3.6.8
nlp 0.4.0

Description

Loading the imdb dataset works fine when I don't specify a download_config argument. When I create a custom DownloadConfig object and pass it to nlp.load_dataset, the call fails with an IsADirectoryError.

How to reproduce

Example without DownloadConfig --> works

import os

os.environ["HF_HOME"] = "/data/hf-test-without-dl-config-01/"

import logging
import nlp

logging.basicConfig(level=logging.INFO)

if __name__ == "__main__":
    imdb = nlp.load_dataset(path="imdb")

Example with DownloadConfig --> doesn't work

import os

os.environ["HF_HOME"] = "/data/hf-test-with-dl-config-01/"

import logging
import nlp
from nlp.utils import DownloadConfig

logging.basicConfig(level=logging.INFO)

if __name__ == "__main__":
    download_config = DownloadConfig()
    imdb = nlp.load_dataset(path="imdb", download_config=download_config)

Error traceback:

Traceback (most recent call last):
  File "/.../example_with_dl_config.py", line 13, in <module>
    imdb = nlp.load_dataset(path="imdb", download_config=download_config)
  File "/.../python3.6/python3.6/site-packages/nlp/load.py", line 549, in load_dataset
    download_config=download_config, download_mode=download_mode, ignore_verifications=ignore_verifications,
  File "/.../python3.6/python3.6/site-packages/nlp/builder.py", line 463, in download_and_prepare
    dl_manager=dl_manager, verify_infos=verify_infos, **download_and_prepare_kwargs
  File "/.../python3.6/python3.6/site-packages/nlp/builder.py", line 518, in _download_and_prepare
    split_generators = self._split_generators(dl_manager, **split_generators_kwargs)
  File "/.../python3.6/python3.6/site-packages/nlp/datasets/imdb/76cdbd7249ea3548c928bbf304258dab44d09cd3638d9da8d42480d1d1be3743/imdb.py", line 86, in _split_generators
    arch_path = dl_manager.download_and_extract(_DOWNLOAD_URL)
  File "/.../python3.6/python3.6/site-packages/nlp/utils/download_manager.py", line 220, in download_and_extract
    return self.extract(self.download(url_or_urls))
  File "/.../python3.6/python3.6/site-packages/nlp/utils/download_manager.py", line 158, in download
    self._record_sizes_checksums(url_or_urls, downloaded_path_or_paths)
  File "/.../python3.6/python3.6/site-packages/nlp/utils/download_manager.py", line 108, in _record_sizes_checksums
    self._recorded_sizes_checksums[url] = get_size_checksum_dict(path)
  File "/.../python3.6/python3.6/site-packages/nlp/utils/info_utils.py", line 79, in get_size_checksum_dict
    with open(path, "rb") as f:
IsADirectoryError: [Errno 21] Is a directory: '/data/hf-test-with-dl-config-01/datasets/extracted/b6802c5b61824b2c1f7dbf7cda6696b5f2e22214e18d171ce1ed3be90c931ce5'
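
The last frame of the traceback can be reproduced without nlp: get_size_checksum_dict calls open(path, "rb"), and the path it receives here sits under datasets/extracted/, which is a directory. The snippet below is only a minimal sketch of that failure mode. open_error is a hypothetical helper, not part of nlp, and the sketch does not explain why the extracted directory is passed in place of the downloaded archive.

```python
import os
import tempfile


def open_error(path):
    """Hypothetical helper: return the OSError raised by open(path, "rb"), or None."""
    try:
        with open(path, "rb"):
            return None
    except OSError as exc:
        # IsADirectoryError on Linux; other platforms may raise a different OSError subclass.
        return exc


with tempfile.TemporaryDirectory() as tmp_dir:
    # A regular file stands in for the downloaded archive.
    file_path = os.path.join(tmp_dir, "archive.bin")
    with open(file_path, "wb") as f:
        f.write(b"payload")

    file_result = open_error(file_path)  # opening a regular file succeeds
    dir_result = open_error(tmp_dir)     # opening a directory fails

print(type(file_result).__name__, type(dir_result).__name__)
```

On the Ubuntu setup described above, the second call should raise the same IsADirectoryError ([Errno 21]) that appears in the traceback.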
