anton-l (Member) commented Apr 14, 2022

What does this PR do?

This PR adds several improvements to the evaluation parts of the XTREME-S script:

  • fix the bug where filtering by language was repeated redundantly across parallel workers
  • use preprocess_logits_for_metrics to convert the logits into pred_ids before they are concatenated, which avoids OOMs
  • add the --language_group parameter to train on the FLEURS dataset in groups of languages (Western/Eastern European languages, South Asian languages, etc.)
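
To illustrate the second point, here is a minimal sketch of why reducing logits to pred_ids before concatenation saves memory. It uses NumPy arrays as a stand-in for the torch tensors that the Trainer passes to `preprocess_logits_for_metrics`; the shapes and vocabulary size are made up, and the hook in the actual script may differ in detail.

```python
import numpy as np


def preprocess_logits_for_metrics(logits, labels):
    """Reduce (batch, time, vocab) logits to (batch, time) predicted ids."""
    # argmax over the vocabulary axis drops the vocab dimension entirely,
    # so only one integer per frame is kept for metric computation
    return np.argmax(logits, axis=-1)


# Hypothetical shapes: 2 utterances, 50 frames, vocabulary of 32 tokens
rng = np.random.default_rng(0)
logits = rng.standard_normal((2, 50, 32)).astype(np.float32)
labels = np.zeros((2, 10), dtype=np.int64)  # unused here, kept for the hook signature

pred_ids = preprocess_logits_for_metrics(logits, labels)
print(logits.shape, "->", pred_ids.shape)
print(logits.nbytes, "bytes ->", pred_ids.nbytes, "bytes")
```

Since the evaluation loop accumulates the hook's output over all batches, the saving grows with the vocabulary size and the number of evaluation samples.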

Misc:

  • add --ctc_zero_infinity to handle the noisy FLEURS transcriptions
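
Some background on why this flag helps, assuming it maps to the `zero_infinity` option of PyTorch's CTC loss: the loss becomes infinite when the input is too short to be aligned to the target, which can happen when a transcription is much longer than its audio warrants, and `zero_infinity` zeroes such losses and their gradients. The sketch below is plain Python, not code from the script; the helper names and label ids are hypothetical. It checks whether a target can be aligned at all.

```python
def min_ctc_input_length(target):
    """Smallest number of frames that can be CTC-aligned to `target`."""
    # Each label needs at least one frame, and two identical neighbouring
    # labels need a blank frame between them to stay distinct.
    repeats = sum(1 for a, b in zip(target, target[1:]) if a == b)
    return len(target) + repeats


def is_alignable(num_frames, target):
    """True if CTC can produce `target` from `num_frames` frames."""
    return num_frames >= min_ctc_input_length(target)


target = [3, 3, 7, 1]  # hypothetical label ids with one repeated pair
print(min_ctc_input_length(target))  # 5
print(is_alignable(4, target))       # False: the CTC loss would be infinite
print(is_alignable(5, target))       # True
```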

HuggingFaceDocBuilderDev commented Apr 14, 2022

The documentation is not available anymore as the PR was closed or merged.

anton-l changed the title from "[WIP][Research] Speed up evaluation for XTREME-S" to "[Research] Speed up evaluation for XTREME-S" Apr 26, 2022
anton-l marked this pull request as ready for review April 26, 2022 15:45
anton-l (Member, Author) commented Apr 26, 2022

@patrickvonplaten these are ready to merge now I think

Also cc @sanchit-gandhi: the fixes should make your life much easier if you decide to do a run of multilingual translation :)

anton-l merged commit a4a88fa into huggingface:main Apr 27, 2022
chamidullinr pushed a commit to chamidullinr/transformers that referenced this pull request Apr 28, 2022
* Avoid repeated per-lang filtering

* Language groups and logits preprocessing

* Style
anton-l deleted the faster-xtreme-s-eval branch April 28, 2022 09:39
elusenji pushed a commit to elusenji/transformers that referenced this pull request Jun 12, 2022
* Avoid repeated per-lang filtering

* Language groups and logits preprocessing

* Style