Add OPT #17088
Merged
Commits (158)
c8cf718 First version - OPT model (younesbelkada)
9ee623d Final changes (younesbelkada)
0484ca1 few changes (younesbelkada)
b931db8 few changes (younesbelkada)
681dfc5 fix style issues (younesbelkada)
1e21983 few changes (younesbelkada)
1363221 Update src/transformers/models/auto/tokenization_auto.py (younesbelkada)
8427279 add gen tests (younesbelkada)
5e8e2f5 few changes (younesbelkada)
be0e434 few changes (younesbelkada)
51db79e some changes (younesbelkada)
99001d3 fix code quality (younesbelkada)
a777bbc major changes (younesbelkada)
38f7463 rm useless classes (younesbelkada)
c6f3a69 Removed autodoc calls to non-existant classes (ArthurZucker)
30d3db2 Update src/transformers/__init__.py (younesbelkada)
f903445 Update src/transformers/__init__.py (younesbelkada)
bb4ab4a Update src/transformers/models/auto/modeling_tf_auto.py (younesbelkada)
2a6e288 Replaced OPTTokeniser with GPT2 tokenizer (ArthurZucker)
cb853fd added GPT2Tokenizer.from_pretrained("patrickvonplaten/opt_gpt2_tokeni… (ArthurZucker)
337e71f Removed OPTTokenizer (ArthurZucker)
0d9130f make style (ArthurZucker)
290b7f0 Make style replaces (ArthurZucker)
096eb74 make repo consistency (ArthurZucker)
020843a Removed PretrainedOPTModel (ArthurZucker)
c63d9f8 fix opt.mdx removed other heads (ArthurZucker)
8b6e496 fix init, removed 3 heads (ArthurZucker)
0303f2b removed heads (ArthurZucker)
2c0327d finished cleaning head (ArthurZucker)
4aa6ab2 removed seauence classif and question answering (ArthurZucker)
752f512 removed unused imports (ArthurZucker)
14eeb13 removed useless dummy object for QA, SC and CG (ArthurZucker)
9c96f09 removed tests for removed useless dummy object for QA, SC and CG (ArthurZucker)
54fc962 Removed head_mask using encoder layers which don't exist (ArthurZucker)
06f42ca fixed test (ArthurZucker)
76e52ac fix line (ArthurZucker)
556c2f4 added OPT to toctree (ArthurZucker)
1460025 Updated model path with pushed weigths (ArthurZucker)
db100a5 fix model path (ArthurZucker)
d16d40d Merge branch 'main' of https://github.com/huggingface/transformers in… (ArthurZucker)
c10f347 fixed code quality (ArthurZucker)
f1fe820 fixed embeddings and generation tests (ArthurZucker)
9b9c65b update paths (ArthurZucker)
4fb9608 clean comments (ArthurZucker)
ab57047 removed OPTClassificationHead for sentence classification (ArthurZucker)
0c1c791 renamed hidden layer (ArthurZucker)
ac50b44 renamed num layers to standard num_hidden_layers (ArthurZucker)
1505de5 num_attention_heads fix (ArthurZucker)
8ace67b changes for 125m (younesbelkada)
80296cb Merge branch 'opt-350-m' of https://github.com/younesbelkada/transfor… (younesbelkada)
752c1d2 add first version for 125m (younesbelkada)
77e6e04 add first version - flax (younesbelkada)
1564dac Merge branch 'opt-350-m' of https://github.com/younesbelkada/transfor… (younesbelkada)
abd1f3c add new version (younesbelkada)
23ff89c Merge branch 'opt-350-m' of https://github.com/younesbelkada/transfor… (younesbelkada)
5c5c858 causal LM output (ArthurZucker)
41fad01 Merge branch 'opt-350-m' of github.com:younesbelkada/transformers int… (ArthurZucker)
27b55c9 replace output type with BaseModelOutputWithPastAndCrossAttentions (ArthurZucker)
aebd19e revert working config from 150m to 350m (ArthurZucker)
d0723aa clean (ArthurZucker)
7575749 removed decoder input ids (ArthurZucker)
66e8298 fixed embed dim (ArthurZucker)
8d4920e more embed_dim issues (ArthurZucker)
c005840 make style + removed enc_dec test (ArthurZucker)
84eb497 update falx model (ArthurZucker)
043a109 removed troublesome copy (ArthurZucker)
8ba7cbc added is_encoder_decoder=False to config (ArthurZucker)
2099b5f added set_input emb fuinction to model class (ArthurZucker)
1c9580f requires torch on embed test (ArthurZucker)
9f6291d use head mask instead of decoder head mask input param solves a test (ArthurZucker)
740fcf5 8 test remaining, update (ArthurZucker)
f8c276b Updated create_and_check_decoder_model_past_large_inputs (ArthurZucker)
fff035f Make style (ArthurZucker)
30ed9f6 update op tokenizer with condition (ArthurZucker)
69c7ae6 make style (ArthurZucker)
ff09958 See if I can push (patrickvonplaten)
0555b92 some clean up (patrickvonplaten)
5491431 remove linear head hack (patrickvonplaten)
521822f save intermediate (patrickvonplaten)
61e8023 save correct attention (patrickvonplaten)
7b27a91 add copied from from bart (patrickvonplaten)
26729d7 Merge branch 'main' of https://github.com/huggingface/transformers in… (patrickvonplaten)
7661453 Update src/transformers/models/opt/modeling_opt.py (ArthurZucker)
25a40b1 fix part of the reviewss (ArthurZucker)
aefa63d Merge pull request #2 from younesbelkada/opt_branch/opt-350-m (ArthurZucker)
f3b5e24 same changes in naming / conversion (patrickvonplaten)
0365e27 correct mask (patrickvonplaten)
929be23 more fixes (patrickvonplaten)
f6b032b delete FlaxOPT and TfOPT (ArthurZucker)
d633832 clean traces of Flax and Tf (ArthurZucker)
85ce8e8 fix mask (patrickvonplaten)
d6fc7f3 Merge branch 'opt-350-m' of https://github.com/younesbelkada/transfor… (patrickvonplaten)
95d7ead fixed positionnal embedding length when past key value is provoded (ArthurZucker)
412bdab get 125m, 6.7b to work (patrickvonplaten)
974d44c Merge branch 'opt-350-m' of https://github.com/younesbelkada/transfor… (patrickvonplaten)
cc1b4c9 Added do_layer_norm (ArthurZucker)
156866d solved mismatch in load dictionnary (ArthurZucker)
849afd3 clean up preapre opt input dict (ArthurZucker)
1acb47a fixed past key value as bool (ArthurZucker)
668246b fix previus (ArthurZucker)
769b9d6 fixed return dict False tuple issue (ArthurZucker)
def917e All tests are passing (ArthurZucker)
5131932 Make style (ArthurZucker)
5ec0766 Ignore OPTDecoder non tested (ArthurZucker)
2ed32a8 make fix-copies (ArthurZucker)
1db5f2b make repo consistency (ArthurZucker)
f57a0b5 small fix (ArthurZucker)
5f96836 removed uselss @torch.no_grad decorator (ArthurZucker)
70c2196 make styl;e (ArthurZucker)
49e905d fix previous opt test (ArthurZucker)
2c1bce4 style (ArthurZucker)
9c3f0c0 make style (ArthurZucker)
29987ed added opt documentation (ArthurZucker)
145838f update OPT_PRETRAINED_MODEL_ARCHIVE_LIST (ArthurZucker)
e2c932b up (patrickvonplaten)
3bf333d more fixes (patrickvonplaten)
b24ac4b model & config work (patrickvonplaten)
2e1d4f4 Update src/transformers/models/opt/modeling_opt.py (ArthurZucker)
6737d09 Update src/transformers/models/opt/modeling_opt.py (ArthurZucker)
994c104 Update src/transformers/models/opt/modeling_opt.py (ArthurZucker)
136983b added comment on padding hack (+2) (ArthurZucker)
6834c7b cleaup (ArthurZucker)
014674d review update (ArthurZucker)
2c7102b docstring for missing arg (ArthurZucker)
598ef8d Update docs/source/en/model_doc/opt.mdx (ArthurZucker)
0a58092 Update docs/source/en/model_doc/opt.mdx (ArthurZucker)
66c807a Update docs/source/en/model_doc/opt.mdx (ArthurZucker)
fd91198 Update src/transformers/models/opt/__init__.py (ArthurZucker)
dfb00c0 update pretrained map (ArthurZucker)
f6c587c Merge branch 'opt-350-m' of https://github.com/younesbelkada/transfor… (ArthurZucker)
7923a46 update path and tests (ArthurZucker)
192c407 make style (ArthurZucker)
ab1c4fb styling (ArthurZucker)
0215920 make consistency (ArthurZucker)
6de5a2d add gpt2 tok new (patrickvonplaten)
4dbd565 Merge branch 'opt-350-m' of https://github.com/younesbelkada/transfor… (patrickvonplaten)
484f24f more tok fixes (patrickvonplaten)
e3b7c4b Update src/transformers/models/auto/tokenization_auto.py (patrickvonplaten)
46f6401 Update docs/source/en/model_doc/opt.mdx (ArthurZucker)
27437f7 Update docs/source/en/model_doc/opt.mdx (ArthurZucker)
c325b0a Update docs/source/en/model_doc/opt.mdx (ArthurZucker)
5fc7b7b Update src/transformers/models/opt/modeling_opt.py (ArthurZucker)
133a465 Update tests/models/opt/test_modeling_opt.py (ArthurZucker)
109abdc Update src/transformers/models/opt/modeling_opt.py (ArthurZucker)
d69db00 Update src/transformers/models/opt/modeling_opt.py (ArthurZucker)
51cba40 Update src/transformers/models/opt/modeling_opt.py (ArthurZucker)
6554537 Update src/transformers/models/opt/modeling_opt.py (ArthurZucker)
200ac36 Update src/transformers/models/opt/modeling_opt.py (ArthurZucker)
368620a Update based on reviews (ArthurZucker)
e1bbc22 Merge branch 'opt-350-m' of https://github.com/younesbelkada/transfor… (ArthurZucker)
e53e8f7 Apply suggestions from code review (patrickvonplaten)
4c1e494 make style (patrickvonplaten)
4c9c360 make tokenizer auto tests pass (patrickvonplaten)
6055da9 apply Lysandre suggestion (patrickvonplaten)
22c89b4 Merge branch 'main' of https://github.com/huggingface/transformers in… (patrickvonplaten)
776b42c finish tests (patrickvonplaten)
c39f5bd add some good tokenizer tests (patrickvonplaten)
d8070cd improve docs slighly (patrickvonplaten)
docs/source/en/model_doc/opt.mdx (new file, +47 lines)

<!--Copyright 2022 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# OPT

## Overview

The OPT model was proposed in [Open Pre-trained Transformer Language Models](https://arxiv.org/pdf/2205.01068) by Meta AI.
OPT is a series of open-sourced large causal language models whose performance is similar to that of GPT-3.

The abstract from the paper is the following:

*Large language models, which are often trained for hundreds of thousands of compute days, have shown remarkable capabilities for zero- and few-shot learning. Given their computational cost, these models are difficult to replicate without significant capital. For the few that are available through APIs, no access is granted to the full model weights, making them difficult to study. We present Open Pre-trained Transformers (OPT), a suite of decoder-only pre-trained transformers ranging from 125M to 175B parameters, which we aim to fully and responsibly share with interested researchers. We show that OPT-175B is comparable to GPT-3, while requiring only 1/7th the carbon footprint to develop. We are also releasing our logbook detailing the infrastructure challenges we faced, along with code for experimenting with all of the released models.*

Tips:
- OPT has the same architecture as [`BartDecoder`].
- Contrary to GPT-2, OPT adds the EOS token `</s>` to the beginning of every prompt. **Note**: Make sure to pass `use_fast=False` when loading OPT's tokenizer with [`AutoTokenizer`] to get the correct tokenizer.

This model was contributed by [Arthur Zucker](https://huggingface.co/ArthurZ), [Younes Belkada](https://huggingface.co/ybelkada), and [Patrick Von Platen](https://huggingface.co/patrickvonplaten).
The original code can be found [here](https://github.com/facebookresearch/metaseq).

## OPTConfig

[[autodoc]] OPTConfig

## OPTModel

[[autodoc]] OPTModel
- forward

## OPTForCausalLM

[[autodoc]] OPTForCausalLM
- forward
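The second tip above can be made concrete with a small sketch (plain Python, no `transformers` dependency; the vocabulary here is a toy stand-in, not the real GPT-2 vocabulary): unlike GPT-2, which encodes a prompt with no leading special token, OPT prepends the `</s>` token id to every prompt.

```python
# Toy sketch of the OPT prompt-encoding convention. TOY_VOCAB and both
# encode_* helpers are illustrative names, not transformers internals.
TOY_VOCAB = {"</s>": 2, "Hello": 101, "world": 102}

def encode_gpt2_style(tokens):
    # GPT-2 style: no special token is added at the start of the prompt.
    return [TOY_VOCAB[t] for t in tokens]

def encode_opt_style(tokens):
    # OPT style: the </s> token id is prepended to every prompt.
    return [TOY_VOCAB["</s>"]] + [TOY_VOCAB[t] for t in tokens]

print(encode_gpt2_style(["Hello", "world"]))  # [101, 102]
print(encode_opt_style(["Hello", "world"]))   # [2, 101, 102]
```

This prepending behavior is exactly why a plain fast GPT-2 tokenizer is not yet correct for OPT, and why the doc recommends `use_fast=False` for now.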
src/transformers/__init__.py

```diff
@@ -87,6 +87,7 @@
     mt5,
     nystromformer,
     openai,
+    opt,
     pegasus,
     perceiver,
     phobert,
```
src/transformers/models/auto/tokenization_auto.py

```diff
@@ -137,6 +137,7 @@
     ("openai-gpt", ("OpenAIGPTTokenizer", "OpenAIGPTTokenizerFast" if is_tokenizers_available() else None)),
     ("gpt2", ("GPT2Tokenizer", "GPT2TokenizerFast" if is_tokenizers_available() else None)),
     ("gptj", ("GPT2Tokenizer", "GPT2TokenizerFast" if is_tokenizers_available() else None)),
+    ("opt", ("GPT2Tokenizer", None)),
     ("transfo-xl", ("TransfoXLTokenizer", None)),
     (
         "xlnet",
```

Review comment on the added line: Need to add a fast tokenizer in a follow-up PR that is able to prepend the bos_token at the beginning. For this, a new converter needs to be written.
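The hunk above registers OPT in the auto-tokenizer mapping as a (slow, fast) pair of class names, with `None` for the fast slot. A minimal sketch of how such a mapping can be resolved is below (the function name and mapping are illustrative assumptions, not the actual `transformers` internals): the fast class is used only when it is registered and the `tokenizers` backend is available, otherwise the lookup falls back to the slow class.

```python
# Illustrative (slow, fast) tokenizer-class mapping, mirroring the shape of
# the entries in tokenization_auto.py.
TOKENIZER_MAPPING = {
    "gpt2": ("GPT2Tokenizer", "GPT2TokenizerFast"),
    "opt": ("GPT2Tokenizer", None),  # no fast tokenizer registered yet
}

def resolve_tokenizer_class(model_type, use_fast=True, tokenizers_available=True):
    """Pick the fast class only if requested, registered, and installable;
    otherwise fall back to the slow class."""
    slow_cls, fast_cls = TOKENIZER_MAPPING[model_type]
    if use_fast and tokenizers_available and fast_cls is not None:
        return fast_cls
    return slow_cls

print(resolve_tokenizer_class("gpt2"))                  # GPT2TokenizerFast
print(resolve_tokenizer_class("opt"))                   # GPT2Tokenizer
print(resolve_tokenizer_class("gpt2", use_fast=False))  # GPT2Tokenizer
```

With `("opt", ("GPT2Tokenizer", None))`, every lookup for OPT resolves to the slow `GPT2Tokenizer`, which matches the review comment: a fast tokenizer (and its converter) is deferred to a follow-up PR.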