
Conversation

@leezu (Contributor) commented on Aug 28, 2019

Description

The `enforce_max_size` function incorrectly overwrote an internal `TokenEmbedding` data structure, breaking the use of the `unknown_token` embedding for `'<pad>'`. When this script was written, modifying `TokenEmbedding`'s internals was required; based on #750, we can now use the proper API to make the required changes.

In general, however, `vocab = nlp.Vocab(nlp.data.count_tokens(tokens))` can be replaced with `vocab = nlp.Vocab(nlp.data.count_tokens(tokens), unknown_token=token_embedding_.unknown_token, padding_token=None, bos_token=None, eos_token=None)`. I'm opening a PR for these two changes.
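
For illustration, a minimal sketch of this replacement, assuming GluonNLP's standard `Vocab` and `TokenEmbedding` APIs; `tokens` and `token_embedding_` here are hypothetical stand-ins for the script's variables:

```python
import gluonnlp as nlp

# Hypothetical stand-ins for the script's token list and pretrained embedding.
tokens = ['hello', 'world', 'hello']
token_embedding_ = nlp.embedding.create('glove', source='glove.6B.50d')

# Before: Vocab adds its own '<unk>', '<pad>', '<bos>' and '<eos>' tokens,
# which the pretrained embedding may have no vectors for.
# vocab = nlp.Vocab(nlp.data.count_tokens(tokens))

# After: reuse the embedding's own unknown token and disable the special
# tokens that the embedding does not cover.
vocab = nlp.Vocab(nlp.data.count_tokens(tokens),
                  unknown_token=token_embedding_.unknown_token,
                  padding_token=None, bos_token=None, eos_token=None)
vocab.set_embedding(token_embedding_)
```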

Unfortunately, this error slipped through the tests, so I'm also extending the test case.

Fixes #905

Checklist

Essentials

  • PR's title starts with a category (e.g. [BUGFIX], [MODEL], [TUTORIAL], [FEATURE], [DOC], etc.)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage
  • Code is well-documented

Changes

  • Fix the `word_embeddings/evaluate_pretrained.py` script when `--analogy-max-vocab-size` is used

@leezu leezu requested a review from szha as a code owner August 28, 2019 11:32
@codecov (bot) commented on Aug 28, 2019

Codecov Report

❗ No coverage uploaded for pull request head (fixevaluatepretrained@6224bcd).
The diff coverage is n/a.

@codecov (bot) commented on Aug 28, 2019

Codecov Report

Merging #906 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #906      +/-   ##
==========================================
+ Coverage   90.48%   90.48%   +<.01%     
==========================================
  Files          66       66              
  Lines        6400     6401       +1     
==========================================
+ Hits         5791     5792       +1     
  Misses        609      609
| Impacted Files | Coverage Δ |
|---|---|
| src/gluonnlp/vocab/vocab.py | 97.32% <100%> (+0.01%) ⬆️ |

@mli (Member) commented on Aug 28, 2019

Job PR-906/1 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-906/1/index.html

@mli (Member) commented on Aug 28, 2019

Job PR-906/2 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-906/2/index.html

@mli (Member) commented on Sep 2, 2019

Job PR-906/13 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-906/13/index.html

@szha szha requested a review from a team September 2, 2019 16:59
@szha szha merged commit b810d10 into dmlc:master Sep 11, 2019
Development

Successfully merging this pull request may close these issues.

KeyError: '<pad>' when running scripts/word_embeddings/evaluate_pretrained.py with the --analogy-max-vocab-size flag