Add fast tokenizer for BARTpho #17254
Closed
70 commits
b889418 Add BartphoTokenizerFast (datquocnguyen)
4c64432 Add BartphoTokenizerFast (datquocnguyen)
3496219 Add test for BartphoTokenizerFast (datquocnguyen)
aa77c99 Revise BARTpho slow and fast tokenizers to be independent (datquocnguyen)
f5931ed Fix formatting (datquocnguyen)
4655d6d Fix formatting (datquocnguyen)
3618b77 Fix formatting (datquocnguyen)
a81aa03 Merge branch 'main' into main (datquocnguyen)
8bfea5a Fix formatting (datquocnguyen)
069612f Update src/transformers/models/bartpho/tokenization_bartpho_fast.py (datquocnguyen)
7cb6707 Fix formatting (datquocnguyen)
d76fffa Remove hardcoded value (datquocnguyen)
6c1b82c Revert the new slow tokenizer to the original slow one (datquocnguyen)
af0aa0e Fix formatting (datquocnguyen)
1b22570 The fast tokenizer with the same tokenization strategy as the slow one (datquocnguyen)
a915922 Fix formatting (datquocnguyen)
8835d18 Add fast tokenizers for PhoBERT and BERTweet (datquocnguyen)
b3677f5 Fix formatting (datquocnguyen)
7d9d477 Add require_torch (datquocnguyen)
542cfe2 Improved tokenization strategy for BartphoTokenizerFast (datquocnguyen)
cd82d13 Original BERTweet and PhoBERT tokenizers (datquocnguyen)
f59b4af Fix format (datquocnguyen)
0b25f8c Improve get_added_vocabulary_hacking (datquocnguyen)
176e323 Merge pull request #1 from datquocnguyen/main (datquocnguyen)
a7aba09 Fix formatting (datquocnguyen)
0047c83 Fast tokenizers for PhoBERT and BERTweet (datquocnguyen)
18e7684 Fix formatting (datquocnguyen)
a41f761 Fix formatting (datquocnguyen)
a65cef8 Merge pull request #2 from huggingface/main (datquocnguyen)
7692138 Merge pull request #4 from huggingface/main (datquocnguyen)
64a27eb Merge pull request #5 from huggingface/main (datquocnguyen)
d0ad0de Merge pull request #6 from huggingface/main (datquocnguyen)
72651a2 Merge pull request #7 from huggingface/main (datquocnguyen)
a14b6a5 Merge pull request #8 from huggingface/main (datquocnguyen)
68c3148 Merge pull request #9 from huggingface/main (datquocnguyen)
cf9a23c Merge pull request #10 from huggingface/main (datquocnguyen)
321c148 Merge pull request #11 from huggingface/main (datquocnguyen)
d592599 Merge pull request #12 from huggingface/main (datquocnguyen)
9630bce Merge pull request #13 from huggingface/main (datquocnguyen)
5c0fdac Merge pull request #14 from huggingface/main (datquocnguyen)
3da785c Merge pull request #15 from huggingface/main (datquocnguyen)
2f0940f Merge pull request #17 from huggingface/main (datquocnguyen)
99b1c05 Merge pull request #18 from huggingface/main (datquocnguyen)
f29a771 Merge pull request #19 from huggingface/main (datquocnguyen)
3f0bdce Merge pull request #20 from huggingface/main (datquocnguyen)
0e79af5 Merge pull request #21 from huggingface/main (datquocnguyen)
0db5b71 Merge pull request #22 from huggingface/main (datquocnguyen)
c21aadb Merge pull request #23 from datquocnguyen/fast_tokenizers_BARTpho_Pho… (datquocnguyen)
d4a6fbb Update test_tokenization_bartpho.py (datquocnguyen)
af834cf Merge pull request #24 from datquocnguyen/fast_tokenizers_BARTpho_Pho… (datquocnguyen)
1bd229f Update test_tokenization_bartpho.py (datquocnguyen)
57a7f67 Merge pull request #25 from datquocnguyen/fast_tokenizers_BARTpho_Pho… (datquocnguyen)
2140b76 Merge pull request #27 from datquocnguyen/tmp_branch (datquocnguyen)
30275f1 Merge pull request #28 from datquocnguyen/fast_tokenizers_BARTpho_Pho… (datquocnguyen)
e56cb63 Merge pull request #29 from huggingface/main (datquocnguyen)
0b61789 Merge pull request #30 from huggingface/main (datquocnguyen)
0a4b3c1 Merge pull request #31 from datquocnguyen/fast_tokenizers_BARTpho_Pho… (datquocnguyen)
f214fa0 Merge pull request #32 from huggingface/main (datquocnguyen)
8303666 Merge pull request #33 from datquocnguyen/fast_tokenizers_BARTpho_Pho… (datquocnguyen)
5a7b682 Merge pull request #34 from huggingface/main (datquocnguyen)
6833da7 Merge pull request #35 from datquocnguyen/fast_tokenizers_BARTpho_Pho… (datquocnguyen)
85ecfbd Merge pull request #36 from huggingface/main (datquocnguyen)
53a577e Merge pull request #37 from datquocnguyen/fast_tokenizers_BARTpho_Pho… (datquocnguyen)
391a440 Merge pull request #38 from huggingface/main (datquocnguyen)
2a06fa7 Merge pull request #39 from datquocnguyen/fast_tokenizers_BARTpho_Pho… (datquocnguyen)
8048f3a Merge pull request #40 from huggingface/main (datquocnguyen)
0787507 Merge pull request #41 from huggingface/main (datquocnguyen)
c0726f1 Merge pull request #42 from datquocnguyen/fast_tokenizers_BARTpho_Pho… (datquocnguyen)
0f90212 Merge pull request #43 from huggingface/main (datquocnguyen)
809f738 Merge pull request #44 from datquocnguyen/fast_tokenizers_BARTpho_Pho… (datquocnguyen)