Remove NLTKMosesTokenizer in favor of SacreMosesTokenizer #942

leezu · 2019-09-24T10:34:45Z

Description

NLTKMosesTokenizer works only with the unsupported, outdated nltk==3.2.5. It was
kept in GluonNLP because SacreMosesTokenizer does not support Python 2. Now that
Python 2 support has been dropped, remove it.

Checklist

Essentials

PR's title starts with a category (e.g. [BUGFIX], [MODEL], [TUTORIAL], [FEATURE], [DOC], etc)
Changes are complete (i.e. I finished coding on this PR)
All changes have test coverage
Code is well-documented

Changes

Remove NLTKMosesTokenizer in favor of SacreMosesTokenizer
Remove NLTKMosesDetokenizer in favor of SacreMosesDetokenizer

Comments

By design this is backwards incompatible.

cc @dmlc/gluon-nlp-team

mli · 2019-09-24T11:10:50Z

Job PR-942/2 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-942/2/index.html

mli · 2019-09-24T11:11:38Z

Job PR-942/1 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-942/1/index.html

codecov · 2019-09-24T18:58:50Z

Codecov Report

Merging #942 into master will increase coverage by 0.47%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master     #942      +/-   ##
==========================================
+ Coverage   90.12%   90.59%   +0.47%     
==========================================
  Files          67       67              
  Lines        6389     6380       -9     
==========================================
+ Hits         5758     5780      +22     
+ Misses        631      600      -31

Impacted Files	Coverage Δ
src/gluonnlp/data/transforms.py	`86% <100%> (+9%)`	⬆️
src/gluonnlp/model/bert.py	`84.95% <0%> (-14.51%)`	⬇️
src/gluonnlp/data/word_embedding_evaluation.py	`96.96% <0%> (+0.75%)`	⬆️
src/gluonnlp/data/dataset.py	`99.2% <0%> (+1.58%)`	⬆️
src/gluonnlp/data/batchify/batchify.py	`96.59% <0%> (+3.4%)`	⬆️
src/gluonnlp/data/corpora/wikitext.py	`100% <0%> (+5.17%)`	⬆️
src/gluonnlp/model/parameter.py	`100% <0%> (+8%)`	⬆️

mli · 2019-09-24T19:37:00Z

Job PR-942/3 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-942/3/index.html

sxjscience · 2019-10-03T20:54:34Z

Let's raise a warning instead of removing the code.
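
For illustration only, the warning-based alternative could look roughly like the sketch below. It is not code from this PR or from GluonNLP: the class body is a stand-in, the message wording is an assumption, and the whitespace split is a placeholder so the example runs on its own.

```python
import warnings


class NLTKMosesTokenizer:
    """Hypothetical stand-in for the deprecated class (real logic omitted)."""

    def __init__(self):
        # Emit the warning at construction time so callers see it once per
        # instance; stacklevel=2 points the warning at the caller's line.
        warnings.warn(
            'NLTKMosesTokenizer is deprecated and only works with '
            'nltk==3.2.5; use SacreMosesTokenizer instead.',
            DeprecationWarning,
            stacklevel=2,
        )

    def __call__(self, sample):
        # Placeholder behavior (whitespace split), not Moses tokenization.
        return sample.split()
```

Note that Python's default filters hide DeprecationWarning in many contexts unless it is triggered from code run as `__main__`, so a warning like this can easily go unseen by downstream users.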

leezu · 2019-10-03T21:22:02Z

People who use the NLTKMosesTokenizer or NLTKMosesDetokenizer class may violate the LGPL without noticing. That can have a severe impact, so I think it's better to remove the code. For example, in a corporate setting, use of LGPL software typically requires special approval.

Remove NLTKMosesTokenizer in favor of SacreMosesTokenizer

9eaf6ca

NLTKMosesTokenizer works only with unsupported / outdated nltk==3.2.5 but was kept in GluonNLP as SacreMosesTokenizer does not support Python 2. Drop it as Python 2 support has been dropped.

leezu requested a review from a team as a code owner September 24, 2019 10:34

Add sacremoses to extra dependencies

c333988

Fix lint

1e8c5c7

leezu requested a review from sxjscience October 4, 2019 21:26

sxjscience approved these changes Oct 4, 2019

sxjscience merged commit 93d25fa into dmlc:master Oct 4, 2019

leezu deleted the removenltkmoses branch October 4, 2019 22:11
