Skip to content
This repository was archived by the owner on Jan 15, 2024. It is now read-only.

Conversation

leezu
Copy link
Contributor

@leezu leezu commented Sep 24, 2019

Description

NLTKMosesTokenizer works only with unsupported / outdated nltk==3.2.5 but was
kept in GluonNLP as SacreMosesTokenizer does not support Python 2. Drop it as
Python 2 support has been dropped.

Checklist

Essentials

  • PR's title starts with a category (e.g. [BUGFIX], [MODEL], [TUTORIAL], [FEATURE], [DOC], etc)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage
  • Code is well-documented

Changes

  • Remove NLTKMosesTokenizer in favor of SacreMosesTokenizer
  • Remove NLTKMosesDetokenizer in favor of SacreMosesDetokenizer

Comments

  • By design this is backwards incompatible.

cc @dmlc/gluon-nlp-team

NLTKMosesTokenizer works only with unsupported / outdated nltk==3.2.5 but was
kept in GluonNLP as SacreMosesTokenizer does not support Python 2. Drop it as
Python 2 support has been dropped.
@leezu leezu requested a review from a team as a code owner September 24, 2019 10:34
@mli
Copy link
Member

mli commented Sep 24, 2019

Job PR-942/2 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-942/2/index.html

@mli
Copy link
Member

mli commented Sep 24, 2019

Job PR-942/1 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-942/1/index.html

@codecov
Copy link

codecov bot commented Sep 24, 2019

Codecov Report

Merging #942 into master will increase coverage by 0.47%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #942      +/-   ##
==========================================
+ Coverage   90.12%   90.59%   +0.47%     
==========================================
  Files          67       67              
  Lines        6389     6380       -9     
==========================================
+ Hits         5758     5780      +22     
+ Misses        631      600      -31
Impacted Files Coverage Δ
src/gluonnlp/data/transforms.py 86% <100%> (+9%) ⬆️
src/gluonnlp/model/bert.py 84.95% <0%> (-14.51%) ⬇️
src/gluonnlp/data/word_embedding_evaluation.py 96.96% <0%> (+0.75%) ⬆️
src/gluonnlp/data/dataset.py 99.2% <0%> (+1.58%) ⬆️
src/gluonnlp/data/batchify/batchify.py 96.59% <0%> (+3.4%) ⬆️
src/gluonnlp/data/corpora/wikitext.py 100% <0%> (+5.17%) ⬆️
src/gluonnlp/model/parameter.py 100% <0%> (+8%) ⬆️

@mli
Copy link
Member

mli commented Sep 24, 2019

Job PR-942/3 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-942/3/index.html

@sxjscience
Copy link
Member

Let's raise a warning instead of removing the code.

@leezu
Copy link
Contributor Author

leezu commented Oct 3, 2019

People that use the NLTKMosesTokenizer or NLTKMosesDetokenizer class may break LGPL without noticing. It can have severe impact, thus I think it's better to remove the code. For example, in a corporate setting usage of LGPL software typically requires special approval.

@leezu leezu requested a review from sxjscience October 4, 2019 21:26
@sxjscience sxjscience merged commit 93d25fa into dmlc:master Oct 4, 2019
@leezu leezu deleted the removenltkmoses branch October 4, 2019 22:11
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants