This repository was archived by the owner on Jan 15, 2024. It is now read-only.

@leezu leezu commented Nov 19, 2019

Because an nlp.data.SpacyTokenizer instance is created as a class attribute,
SpacyTokenizer is required as soon as Python imports data_pipeline.py: a class
body is executed at import time, not when the class is first used. This means
users always need to install the "optional" SpacyTokenizer dependencies, even
if they don't plan to use them. For example, just running an unrelated test in
the scripts folder currently raises the following error.

ImportError while loading conftest '/home/ubuntu/projects/gluon-nlp/scripts/tests/conftest.py'.
scripts/tests/conftest.py:23: in <module>
    from ..question_answering.data_pipeline import SQuADDataPipeline
scripts/question_answering/data_pipeline.py:433: in <module>
    class SQuADDataTokenizer:
scripts/question_answering/data_pipeline.py:435: in SQuADDataTokenizer
    spacy_tokenizer = nlp.data.SpacyTokenizer()
src/gluonnlp/data/transforms.py:248: in __init__
    lang=lang))
E   OSError: SpaCy Model for the specified language="en_core_web_sm" has not been downloaded. You need to check the installation guide in https://spacy.io/usage/models. Usually, the installation command should be `python -m spacy download en_core_web_sm`.
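
To illustrate the failure mode above, here is a minimal, self-contained sketch. It does not use gluonnlp or spaCy: `MissingModelTokenizer`, `define_eager` and `LazyTokenizer` are hypothetical names made up for the example, and deferring construction behind a property is only one possible way to avoid the import-time error, not necessarily the change made in this PR.

```python
class MissingModelTokenizer:
    """Hypothetical stand-in for a tokenizer whose constructor needs an
    optional dependency that is not installed."""

    def __init__(self):
        raise OSError("optional model has not been downloaded")


def define_eager():
    # A class body runs as soon as the class statement is executed, so the
    # tokenizer is constructed here even if nobody ever uses the class.
    class EagerTokenizer:
        tokenizer = MissingModelTokenizer()

    return EagerTokenizer


class LazyTokenizer:
    """Defers construction until the tokenizer is actually requested."""

    def __init__(self):
        self._tokenizer = None

    @property
    def tokenizer(self):
        # Built on first access only, so defining or instantiating
        # LazyTokenizer never touches the optional dependency.
        if self._tokenizer is None:
            self._tokenizer = MissingModelTokenizer()
        return self._tokenizer


# Merely defining the eager class triggers the error.
try:
    define_eager()
    eager_failed = False
except OSError:
    eager_failed = True

# Creating the lazy variant succeeds; the error would only surface on
# the first access to `lazy.tokenizer`.
lazy = LazyTokenizer()
```

With the lazy variant, code that imports the module but never touches the tokenizer (such as an unrelated test) is not affected by the missing optional dependency.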

cc @dmlc/gluon-nlp-team

@leezu leezu requested a review from Ishitori November 19, 2019 08:47
@leezu leezu requested a review from a team as a code owner November 19, 2019 08:47

codecov bot commented Nov 19, 2019

Codecov Report

Merging #1013 into master will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master    #1013   +/-   ##
=======================================
  Coverage   88.27%   88.27%           
=======================================
  Files          67       67           
  Lines        6254     6254           
=======================================
  Hits         5521     5521           
  Misses        733      733

@leezu leezu force-pushed the fixqadatapipelinespacy branch from 51eb0c4 to 6cfdfa6 Compare November 20, 2019 08:22

mli commented Nov 20, 2019

Job PR-1013/3 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-1013/3/index.html

@leezu leezu force-pushed the fixqadatapipelinespacy branch from 6cfdfa6 to 49b861d Compare December 3, 2019 03:44

mli commented Dec 3, 2019

Job PR-1013/4 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-1013/4/index.html

@leezu leezu force-pushed the fixqadatapipelinespacy branch 2 times, most recently from a87a9e3 to 78fdd38 Compare December 3, 2019 07:42
@leezu leezu force-pushed the fixqadatapipelinespacy branch from 78fdd38 to b048122 Compare December 3, 2019 07:47

mli commented Dec 3, 2019

Job PR-1013/7 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-1013/7/index.html

@leezu leezu removed the request for review from Ishitori December 3, 2019 08:28

mli commented Dec 3, 2019

Job PR-1013/8 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-1013/8/index.html

@leezu leezu merged commit 7b7bf60 into dmlc:master Dec 4, 2019
@leezu leezu deleted the fixqadatapipelinespacy branch December 4, 2019 03:45