
Conversation

@oneraghavan (Contributor) commented Jul 21, 2022

What does this PR do?

modeling_canine has doctests set up but is not included in documentation_tests.txt; this PR adds it.

Fixes #16292

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

@ArthurZucker

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@HuggingFaceDocBuilderDev commented Jul 21, 2022

The documentation is not available anymore as the PR was closed or merged.

@LysandreJik requested a review from ydshieh July 21, 2022 09:51
@oneraghavan (Contributor, Author)

@ydshieh Requesting your review.

@ydshieh (Collaborator) commented Aug 16, 2022

Sorry, I missed this PR. Looking at it now.

@ydshieh (Collaborator) commented Aug 16, 2022

@oneraghavan, the doctest for canine would fail at the moment. We have to add the expected values (loss or some outputs).

Would you like to follow the changes made in PR #16441 to modeling_longformer.py, more precisely in LongformerForSequenceClassification and LongformerForTokenClassification? See those changes here.

Don't hesitate to ask if you have any questions. Thank you!
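
For illustration, here is a hedged sketch of what that Longformer-style change looks like when applied to a CANINE head (the checkpoint name, expected_output, and expected_loss below are placeholders, not the values used in this PR):

    # Sketch only: the checkpoint and expected values are placeholders.
    @add_start_docstrings_to_model_forward(CANINE_INPUTS_DOCSTRING.format("batch_size, sequence_length"))
    @add_code_sample_docstrings(
        processor_class=_TOKENIZER_FOR_DOC,
        checkpoint="some-org/canine-s-finetuned-sst2",  # hypothetical fine-tuned checkpoint
        output_type=SequenceClassifierOutput,
        config_class=_CONFIG_FOR_DOC,
        expected_output="'LABEL_0'",  # label string the doctest expects to be printed
        expected_loss=0.81,  # rounded loss the doctest expects
    )
    def forward(self, input_ids=None, attention_mask=None, labels=None, **kwargs):
        ...

The decorator formats the per-task sample docstring with these values, so the doctest runner has concrete outputs to compare against.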

@github-actions bot closed this Sep 18, 2022
@oneraghavan (Contributor, Author) commented Sep 26, 2022

@ydshieh I will add those changes. Could you please reopen this PR?

@huggingface deleted a comment from the github-actions bot Sep 26, 2022
@ydshieh reopened this Sep 26, 2022
@ydshieh (Collaborator) commented Sep 26, 2022

@oneraghavan Thank you 🤗. I reopened the PR. Before continuing the work, don't forget to update your local main branch first, then rebase your working branch on main.

@oneraghavan (Contributor, Author)

@ydshieh Could you reopen the PR again? I have fixed the checkpoints, and the tests should pass now.

@ydshieh reopened this Sep 26, 2022
@oneraghavan (Contributor, Author)

@ydshieh I think this is good to merge.

@@ -1467,9 +1469,64 @@ def __init__(self, config):
     @add_start_docstrings_to_model_forward(CANINE_INPUTS_DOCSTRING.format("batch_size, sequence_length"))
     @add_code_sample_docstrings(
         processor_class=_TOKENIZER_FOR_DOC,
-        checkpoint=_CHECKPOINT_FOR_DOC,
+        checkpoint="aliosm/sha3bor-poetry-diacritizer-canine-s",
Collaborator comment on this line:

This checkpoint is in Arabic, and was trained for diacritization.

"LABEL_3",
"LABEL_3",
],
expected_loss=0.46,
Collaborator comment on this line:

This list is too long and doesn't really provide meaningful information, as the model is character-based. If the list is short, we could accept it (as done in a few other doc examples).

@ydshieh (Collaborator) left a review comment

@oneraghavan The doctest for this model indeed passes!

However, I have some concerns about CanineForTokenClassification:

  • The checkpoint is for Arabic text, and was trained for diacritization.
  • The model is character-based, and the expected output contains a very long list of LABEL_X.

I would suggest not using add_code_sample_docstrings for CanineForTokenClassification, but instead overriding the code sample in the modeling file without checking expected outputs.

@sgugger Any opinion on this?

@sgugger (Collaborator) commented Sep 27, 2022

Yes, agreed @ydshieh. In general, any result that is LABEL_0, or a list of those, should really not be included.

@oneraghavan (Contributor, Author)

@ydshieh I agree with the point that LABEL_X outputs are not very meaningful. Duplicating the function would make later debugging hard, so I will remove the test for token classification.

@sgugger Can we make the add_code_sample_docstrings decorator treat the expected output as optional? That is, if the call does not provide an expected output, just skip validating it?

@ydshieh (Collaborator) commented Sep 27, 2022

I don't think we have an easy way to ignore the doctest in this case. The >>> predicted_tokens_classes part of PT_TOKEN_CLASSIFICATION_SAMPLE in src/transformers/utils/doc.py requires an expected output for predicted_tokens_classes. If there is none, the test just fails.

    >>> predicted_tokens_classes
    {expected_output}
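
Concretely, once the template is formatted, the doctest runner compares whatever the example prints against the literal expected_output text, so with an empty value there is nothing to match and the test fails. An illustrative (made-up) rendering of the filled-in template:

    >>> predicted_tokens_classes
    ['LABEL_3', 'LABEL_3', 'LABEL_0']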

@oneraghavan (Contributor, Author)

@ydshieh @sgugger Can we add a parameter to add_code_sample_docstrings and leave its default as None? Then, in the places where we need a custom sample, we can pass it from there.

The function definition would look like this:

    def add_code_sample_docstrings(
        *docstr,
        processor_class=None,
        checkpoint=None,
        output_type=None,
        config_class=None,
        mask="[MASK]",
        qa_target_start_index=14,
        qa_target_end_index=15,
        model_cls=None,
        modality=None,
        expected_output="",
        expected_loss="",
        code_sample="",
    ):

Inside, I can use code_sample if it has been passed, or otherwise look up the code sample from the templates, as in the rough sketch below.
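
A hypothetical sketch of that branch inside the decorator (task, doc_kwargs, and sample_docstrings here just stand in for however the real implementation selects and formats the per-task template):

    # Hypothetical sketch of the proposed fallback, not the actual implementation.
    def docstring_decorator(fn):
        if code_sample:
            # caller supplied a hand-written sample: use it as-is
            built_sample = code_sample.format(**doc_kwargs)
        else:
            # otherwise keep the current behaviour and pick the per-task template
            built_sample = sample_docstrings[task].format(**doc_kwargs)
        fn.__doc__ = (fn.__doc__ or "") + "".join(docstr) + built_sample
        return fn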

Let me know if this is okay.

@ydshieh (Collaborator) commented Sep 30, 2022

I don't see how the latest change is better than just putting the docstring under CanineForTokenClassification directly.

I will leave @sgugger to give his opinion.

@sgugger (Collaborator) commented Sep 30, 2022

We don't need any other tooling here. Either the model falls in the "automatic docstring" category or it does not. If it does not, we just write the docstring (with the replace return decorator).
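
For reference, a rough sketch of what that hand-written pattern could look like for CanineForTokenClassification (the example body and checkpoint are illustrative, not the docstring that was ultimately committed). Nothing in the example asserts an expected output, so the doctest only checks that the example runs without error:

    @add_start_docstrings_to_model_forward(CANINE_INPUTS_DOCSTRING.format("batch_size, sequence_length"))
    @replace_return_docstrings(output_type=TokenClassifierOutput, config_class=_CONFIG_FOR_DOC)
    def forward(self, input_ids=None, attention_mask=None, labels=None, **kwargs):
        r"""
        Returns:

        Example:

        ```python
        >>> from transformers import CanineTokenizer, CanineForTokenClassification

        >>> tokenizer = CanineTokenizer.from_pretrained("google/canine-s")
        >>> model = CanineForTokenClassification.from_pretrained("google/canine-s")

        >>> inputs = tokenizer("HuggingFace is a company", return_tensors="pt")
        >>> outputs = model(**inputs)
        >>> predicted_token_class_ids = outputs.logits.argmax(-1)
        ```"""
        ...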
