
Conversation

@oneraghavan (Contributor) commented Jul 21, 2022

What does this PR do?

modeling_canine has doctests set up but is not included in documentation_tests.txt; this PR adds it.

Fixes #16292

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

@ArthurZucker

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@HuggingFaceDocBuilderDev commented Jul 21, 2022

The documentation is not available anymore as the PR was closed or merged.

@LysandreJik requested a review from ydshieh July 21, 2022 09:51
@oneraghavan (Contributor, Author)

@ydshieh Requesting your review.

@ydshieh (Collaborator) commented Aug 16, 2022

Sorry, I missed this PR. Looking at it now.

@ydshieh (Collaborator) commented Aug 16, 2022

@oneraghavan, the doctest for canine would fail at the moment. We have to add the expected values (loss or some outputs).

Would you like to follow the changes made in PR #16441 to modeling_longformer.py, more precisely in LongformerForSequenceClassification and LongformerForTokenClassification? See those changes here.

Don't hesitate to ask if you have any questions. Thank you!
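
For illustration, here is a hedged sketch of what that Longformer-style change looks like when applied to a CANINE head (the checkpoint name, expected_output, and expected_loss below are placeholders, not the values used in this PR):

    # Sketch only: the checkpoint and expected values are placeholders.
    @add_start_docstrings_to_model_forward(CANINE_INPUTS_DOCSTRING.format("batch_size, sequence_length"))
    @add_code_sample_docstrings(
        processor_class=_TOKENIZER_FOR_DOC,
        checkpoint="some-org/canine-s-finetuned-sst2",  # hypothetical fine-tuned checkpoint
        output_type=SequenceClassifierOutput,
        config_class=_CONFIG_FOR_DOC,
        expected_output="'LABEL_0'",  # label string the doctest expects to be printed
        expected_loss=0.81,  # rounded loss the doctest expects
    )
    def forward(self, input_ids=None, attention_mask=None, labels=None, **kwargs):
        ...

The decorator formats the per-task sample docstring with these values, so the doctest runner has concrete outputs to compare against.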

@github-actions bot closed this Sep 18, 2022
@oneraghavan (Contributor, Author) commented Sep 26, 2022

@ydshieh I will add those changes. Could you please reopen this PR?

@huggingface deleted a comment from the github-actions bot Sep 26, 2022
@ydshieh reopened this Sep 26, 2022
@ydshieh (Collaborator) commented Sep 26, 2022

@oneraghavan Thank you 🤗. I reopened the PR. Before continuing the work, don't forget to update your local main branch first, then rebase your working branch on main.

@oneraghavan (Contributor, Author)

@ydshieh Could you reopen the PR again? I have fixed the checkpoints, and the tests should pass now.

@ydshieh reopened this Sep 26, 2022
@oneraghavan (Contributor, Author)

@ydshieh I think this is good to merge.

@@ -1467,9 +1469,64 @@ def __init__(self, config):
     @add_start_docstrings_to_model_forward(CANINE_INPUTS_DOCSTRING.format("batch_size, sequence_length"))
     @add_code_sample_docstrings(
         processor_class=_TOKENIZER_FOR_DOC,
-        checkpoint=_CHECKPOINT_FOR_DOC,
+        checkpoint="aliosm/sha3bor-poetry-diacritizer-canine-s",
Collaborator comment on this line:

This checkpoint is in Arabic, and was trained for diacritization.

"LABEL_3",
"LABEL_3",
],
expected_loss=0.46,
Collaborator comment on this line:

This list is too long and doesn't really provide meaningful information, as the model is character-based. If the list is short, we could accept it (as done in a few other doc examples).

@ydshieh (Collaborator) left a review comment

@oneraghavan The doctest for this model indeed passes!

However, I have some concerns about CanineForTokenClassification:

  • The checkpoint is for Arabic text, and was trained for diacritization.
  • The model is character-based, and the expected output contains a very long list of LABEL_X.

I would suggest not using add_code_sample_docstrings for CanineForTokenClassification, but instead overriding the code sample in the modeling file without checking expected outputs.

@sgugger Any opinion on this?

@sgugger (Collaborator) commented Sep 27, 2022

Yes, agreed @ydshieh. In general, any result that is LABEL_0, or a list of those, should really not be included.

@oneraghavan (Contributor, Author)

@ydshieh I agree with the point that LABEL_X outputs are not very meaningful. Duplicating the function would make later debugging hard, so I will remove the test for token classification.

@sgugger Can we make the add_code_sample_docstrings decorator treat the expected output as optional? That is, if the call does not provide an expected output, just skip validating it?

@ydshieh (Collaborator) commented Sep 27, 2022

I don't think we have an easy way to ignore the doctest in this case. The >>> predicted_tokens_classes part of PT_TOKEN_CLASSIFICATION_SAMPLE in src/transformers/utils/doc.py requires an expected output for predicted_tokens_classes. If there is none, the test just fails.

    >>> predicted_tokens_classes
    {expected_output}
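
Concretely, once the template is formatted, the doctest runner compares whatever the example prints against the literal expected_output text, so with an empty value there is nothing to match and the test fails. An illustrative (made-up) rendering of the filled-in template:

    >>> predicted_tokens_classes
    ['LABEL_3', 'LABEL_3', 'LABEL_0']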

@oneraghavan (Contributor, Author)

@ydshieh @sgugger Can we add a parameter to add_code_sample_docstrings and leave its default as None? Then, in the places where we need a custom sample, we can pass it from there.

The function definition would look like this:

    def add_code_sample_docstrings(
        *docstr,
        processor_class=None,
        checkpoint=None,
        output_type=None,
        config_class=None,
        mask="[MASK]",
        qa_target_start_index=14,
        qa_target_end_index=15,
        model_cls=None,
        modality=None,
        expected_output="",
        expected_loss="",
        code_sample="",
    ):

Inside, I can use code_sample if it has been passed, or otherwise look up the code sample from the templates, as in the rough sketch below.
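
A hypothetical sketch of that branch inside the decorator (task, doc_kwargs, and sample_docstrings here just stand in for however the real implementation selects and formats the per-task template):

    # Hypothetical sketch of the proposed fallback, not the actual implementation.
    def docstring_decorator(fn):
        if code_sample:
            # caller supplied a hand-written sample: use it as-is
            built_sample = code_sample.format(**doc_kwargs)
        else:
            # otherwise keep the current behaviour and pick the per-task template
            built_sample = sample_docstrings[task].format(**doc_kwargs)
        fn.__doc__ = (fn.__doc__ or "") + "".join(docstr) + built_sample
        return fn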

Let me know if this is okay.

@ydshieh (Collaborator) commented Sep 30, 2022

I don't see how the latest change is better than just putting the docstring under CanineForTokenClassification directly.

I will leave @sgugger to give his opinion.

@sgugger (Collaborator) commented Sep 30, 2022

We don't need any other tooling here. Either the model falls in the "automatic docstring" category or it does not. If it does not, we just write the docstring (with the replace return decorator).
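
For reference, a rough sketch of what that hand-written pattern could look like for CanineForTokenClassification (the example body and checkpoint are illustrative, not the docstring that was ultimately committed). Nothing in the example asserts an expected output, so the doctest only checks that the example runs without error:

    @add_start_docstrings_to_model_forward(CANINE_INPUTS_DOCSTRING.format("batch_size, sequence_length"))
    @replace_return_docstrings(output_type=TokenClassifierOutput, config_class=_CONFIG_FOR_DOC)
    def forward(self, input_ids=None, attention_mask=None, labels=None, **kwargs):
        r"""
        Returns:

        Example:

        ```python
        >>> from transformers import CanineTokenizer, CanineForTokenClassification

        >>> tokenizer = CanineTokenizer.from_pretrained("google/canine-s")
        >>> model = CanineForTokenClassification.from_pretrained("google/canine-s")

        >>> inputs = tokenizer("HuggingFace is a company", return_tensors="pt")
        >>> outputs = model(**inputs)
        >>> predicted_token_class_ids = outputs.logits.argmax(-1)
        ```"""
        ...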
