Fix custom tokenizers test #19052
Conversation
  elif is_tf_available():
      returned_tensor = "tf"
- else:
+ elif is_flax_available():
This test was failing when no framework (PyTorch, TF, or Flax) was installed. It now returns early instead of failing.
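A minimal sketch of the fixed selection logic, assuming the availability helpers exported by transformers (is_torch_available, is_tf_available, is_flax_available); the test body is paraphrased here for illustration, not the exact file contents of the PR:

# Sketch only: paraphrases the fixed framework-selection logic.
from transformers.utils import is_flax_available, is_tf_available, is_torch_available

def pick_returned_tensor():
    if is_torch_available():
        return "pt"
    elif is_tf_available():
        return "tf"
    elif is_flax_available():
        return "jax"
    # No framework installed: signal the caller to skip rather than fail.
    return None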
The documentation is not available anymore as the PR was closed or merged.
LGTM, thanks.
But I don't understand this test well: why do we test tokenization for these 3 models only? Also, from a quick search I can see the @custom_tokenizers decorator is applied to cpm and BertJapanese, but here we run tests for openai and clip? I must be missing some context.
@ydshieh Those three files contain tests that require specific extra dependencies to be installed (ftfy for openai and clip), so in the other test jobs those tests are never run.
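For context, a hedged sketch of how an opt-in decorator like @custom_tokenizers can be built with unittest; the RUN_CUSTOM_TOKENIZERS variable name follows the pattern transformers uses for gated tests, but the implementation below is illustrative, not the library's exact code:

import os
import unittest

# Assumed opt-in switch; set RUN_CUSTOM_TOKENIZERS=1 to enable these tests.
_run_custom_tokenizers = os.environ.get("RUN_CUSTOM_TOKENIZERS", "0").upper() in ("1", "TRUE", "YES")

def custom_tokenizers(test_case):
    """Skip the decorated test unless custom tokenizer tests are enabled."""
    return unittest.skipUnless(_run_custom_tokenizers, "test of custom tokenizers")(test_case)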
LGTM
What does this PR do?
The custom tokenizers tests were never run because the test fetcher was not executed before we looked at its output to decide whether or not to run the tests. This PR fixes that and also adds the missing tests to the nightly suite.
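To illustrate the ordering bug, here is a hedged sketch of what the fix implies: the fetcher must run before its output is consulted. The script and output file names (utils/tests_fetcher.py, test_list.txt) are assumptions for illustration, not the exact CI configuration:

import subprocess
from pathlib import Path

# Run the test fetcher first so its output file actually exists...
subprocess.run(["python", "utils/tests_fetcher.py"], check=True)

# ...then read the output to decide which tests to launch.
test_list = Path("test_list.txt")
if test_list.exists() and test_list.read_text().strip():
    subprocess.run(["python", "-m", "pytest", *test_list.read_text().split()], check=True)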