
Commit ca2a55e

Authored by younesbelkada, thomwolf, thomasw21, sgugger, and sIncerass
* adding template
* update model
* model update
* update conf for debug model
* update conversion
* update conversion script
* update conversion script
* fix missing keys check
* add tests to test the tokenizer in the local machine
* Change variable name
* add tests on xnli dataset
* add more description
* add descriptions + clearer code
* clearer code
* adding new tests + skipping a few tests because of env problems
* change comment
* add dtype on the configuration
* add test embeddings
* add hardcoded test
* fix dtype issue
* adding torch.float16 to config
* adding more metrics (min, max, mean)
* add sum
* now the test passes with almost equal
* add files for conversion - test passes on cpu gpu
* add final changes
* cleaning code
* add new args in the docstring
* fix one-liner function
* remove macros
* remove forward attention
* clean up init function
* add comments on the issue
* rm scale mask softmax
* do make style
* fix dtype in init
* fixing for loop on att probs
* fix style with black
* fix style + doc error
* fix and debug CI errors (docs + style)
* some updates - change new operations - finally add scaled softmax - added new args in the config
* make use cache working
* add changes - save sharded models - final changes on the modeling script
* add changes - comment on alibi - add TODO on seq length
* test commit - added a text to test the commit Co-authored-by: thomasw21 <[email protected]>
* final changes - attention mask change - generation works on BS176b Co-authored-by: thomasw21 <[email protected]>
* changes - model + conversion
* move to correct dir
* put ,
* few fixes
* fix tokenizer autodoc
* fix minor CI issues
* fix minor CI issues
* fix minor CI issues
* fix style issue
* fix minor import issues
* fix few issues
* remove def main on the test
* add require torch
* replace decorator with 'with'
* fix style
* change to bloom
* add quick fix tokenizer
* fix tokenizer file
* fix tokenizer - merge tests - small fixes
* fix import issue
* add bloom to readme
* fix consistency
* Update docs/source/en/model_doc/bloom.mdx Co-authored-by: Sylvain Gugger <[email protected]>
* Apply suggestions from code review - fix comment issues on file headers Co-authored-by: Sylvain Gugger <[email protected]>
* fix doc issue
* small fix - modeling test
* some changes - refactor some code - taking into account reviews - more tests should pass - removed pruning tests
* remove useless division
* more tests should pass
* more tests should pass
* more tests should pass
* let's try this one - add alibi offset - remove all permutes to make the grad operations work - fingers crossed
* refactor - refactor code - style changes - add new threshold for test
* major changes - change BLOOM to Bloom - add quick doc on bloom.mdx - move embeddings test to modeling test
* modify readme
* small fixes
* small fix - better threshold for a test
* remove old test file from fetcher
* fix small typo
* major change - change BloomLMHead to BloomForCausalLM
* remove onnx config
* major changes - refactor the code - remove asserts - change tol for test
* make style
* small change
* adding a slow test + commenting old ones for now
* make style
* Apply suggestions from code review Co-authored-by: Sylvain Gugger <[email protected]>
* make style
* fix duplicates
* cleaning comments on config
* clean a bit conversion file
* refactor a bit modeling file
* refactor tokenizer file
* fix tokenization test issue
* fix tokenization issue #2
* fix tokenization issue second try
* fix test issue
* make style + add suggestions
* change test fetcher
* try this one - slow tests should pass - fingers crossed
* possible final changes
* make style
* try fix padding side issue
* fix side
* fix padding issue
* fix ko-readme
* fix config auto
* cleaning modeling file
* keep bloom in caps in ko
* update config docs
* remove pretraining_pp
* remove model parallel
* update config - add correct config files
* fix duplicates
* fix fetcher
* fix refactor issue - remove divide function
* try to remove alibi
* small fixes - fix alibi - remove seq length - refactor a bit the code
* put correct values - fix bos and eos token ids
* fix attention mask loop Co-authored-by: thomasw21 <[email protected]>
* small fixes - remove skip bias add
* small fixes - fix typo in readme - fix typos in config
* small changes - remove a test - add reconstruction test - change config
* small changes - change Scaled Softmax to BloomScaledSoftmax
* small fixes - fix alibi dtype
* major changes - removing explicit dtype when loading modules - fixing test args (torch_dtype=auto) - add docstring
* fix readmes
* major changes - now bloom supports alibi shifting - refactor a bit the code - better test tolerance now
* refactor a bit
* refactor a bit
* put correct name on test
* change docstring
* small changes - fix docstring modeling - fix test tolerance
* fix small nit - take dtype from tensors in the conversion script
* minor fix - fix mdx issue
* minor fix - change config docstring
* forward contrib credits from PR14084
* Apply suggestions from code review Co-authored-by: Stas Bekman <[email protected]>
* apply modifications Co-authored-by: Stas Bekman <[email protected]>
* resolve softmax upcast
* Apply suggestions from code review Co-authored-by: Stas Bekman <[email protected]>
* Update src/transformers/models/bloom/modeling_bloom.py Co-authored-by: Niklas Muennighoff <[email protected]>
* final changes modeling Co-authored-by: Stas Bekman <[email protected]>
* Merge commit 'd156898f3b9b2c990e5963f5030a7143d57921a2'
* merge commit
* Apply suggestions from code review Co-authored-by: Stas Bekman <[email protected]>
* apply suggestions from Stas' comments Co-authored-by: Stas Bekman <[email protected]>
* Fix gradient checkpointing Co-authored-by: Stas Bekman <[email protected]>
* add slow but exact
* add accelerate compatibility Co-authored-by: Nicolas Patry <[email protected]>
* forward contrib credits Co-authored-by: thomasw21 <[email protected]> Co-authored-by: sgugger <[email protected]> Co-authored-by: patrickvonplaten <[email protected]> Co-authored-by: Niklas Muennighoff <[email protected]> Co-authored-by: LysandreJik <[email protected]>
* Apply suggestions from code review Co-authored-by: Patrick von Platen <[email protected]>
* fix torch device on tests
* make style
* Apply suggestions from code review Co-authored-by: Patrick von Platen <[email protected]>
* fix nits Co-authored-by: patrickvonplaten <[email protected]>
* remove final nits
* fix doc - add more details on the doc - add links to checkpoints
* Update src/transformers/__init__.py Co-authored-by: Sylvain Gugger <[email protected]>
* Update src/transformers/models/bloom/modeling_bloom.py Co-authored-by: Sylvain Gugger <[email protected]>
* apply suggestions Co-authored-by: sgugger <[email protected]>
* put test torchscript to false
* Update src/transformers/models/bloom/modeling_bloom.py Co-authored-by: justheuristic <[email protected]>
* fix alibi - create alibi only once
* add small doc
* make quality
* replace torch.nn
* remove token type emb
* fix fused op + output bias
* add fused op - now can control fused operation from config
* remove fused op
* make quality
* small changes - remove unused args on config - removed bias gelu file - make the model torchscriptable - add torchscript slow tests
* Update src/transformers/models/bloom/modeling_bloom.py
* fix slow
* make style
* add accelerate support
* add bloom to deepspeed tests
* minor changes
* Apply suggestions from code review Co-authored-by: Patrick von Platen <[email protected]>
* minor change
* slow tests pass
* Apply suggestions from code review Co-authored-by: Sylvain Gugger <[email protected]>
* Update docs/source/en/model_doc/bloom.mdx Co-authored-by: Sylvain Gugger <[email protected]>
* minor changes - change docstring - add link to paper

Co-authored-by: Thomwolf <[email protected]>
Co-authored-by: Thomas Wolf <[email protected]>
Co-authored-by: thomasw21 <[email protected]>
Co-authored-by: Sylvain Gugger <[email protected]>
Co-authored-by: sIncerass <[email protected]>
Co-authored-by: Stas Bekman <[email protected]>
Co-authored-by: Niklas Muennighoff <[email protected]>
Co-authored-by: Nicolas Patry <[email protected]>
Co-authored-by: thomasw21 <[email protected]>
Co-authored-by: sgugger <[email protected]>
Co-authored-by: patrickvonplaten <[email protected]>
Co-authored-by: LysandreJik <[email protected]>
Co-authored-by: Patrick von Platen <[email protected]>
Co-authored-by: justheuristic <[email protected]>
Co-authored-by: Stas Bekman <[email protected]>
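A recurring theme in the log above is ALiBi (Attention with Linear Biases), the position scheme BLOOM uses in place of learned position embeddings. For reference, here is a minimal sketch of the standard ALiBi bias from Press et al. (2021) as an illustration of the idea; it is not the exact helper shipped in `modeling_bloom.py`, and it assumes `num_heads` is a power of two:

```python
import torch

def build_alibi_bias(num_heads: int, seq_len: int) -> torch.Tensor:
    """Minimal ALiBi bias sketch (assumes num_heads is a power of two)."""
    # Head slopes form a geometric sequence: 2^(-8/n), 2^(-16/n), ..., 2^(-8).
    base = 2 ** (-8.0 / num_heads)
    slopes = torch.tensor([base ** (h + 1) for h in range(num_heads)])
    # BLOOM-style formulation: bias each key position j by slope * j; under a
    # causal softmax this is equivalent to penalizing query-key distance,
    # since per-query constant shifts cancel out.
    positions = torch.arange(seq_len, dtype=torch.float32).view(1, 1, seq_len)
    return slopes.view(num_heads, 1, 1) * positions  # (num_heads, 1, seq_len)

# The result is broadcast-added to the (num_heads, q_len, k_len) attention scores.
bias = build_alibi_bias(num_heads=16, seq_len=8)
```

Because the bias depends only on head index and key position, it can be computed once and reused across layers, which is what the "fix alibi - create alibi only once" item above refers to.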
1 parent dfc76b2 commit ca2a55e

23 files changed: +2582 −3 lines

README.md

Lines changed: 1 addition & 0 deletions
@@ -240,6 +240,7 @@ Current number of checkpoints: ![](https://img.shields.io/endpoint?url=https://h
 1. **[BigBird-RoBERTa](https://huggingface.co/docs/transformers/model_doc/big_bird)** (from Google Research) released with the paper [Big Bird: Transformers for Longer Sequences](https://arxiv.org/abs/2007.14062) by Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed.
 1. **[Blenderbot](https://huggingface.co/docs/transformers/model_doc/blenderbot)** (from Facebook) released with the paper [Recipes for building an open-domain chatbot](https://arxiv.org/abs/2004.13637) by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
 1. **[BlenderbotSmall](https://huggingface.co/docs/transformers/model_doc/blenderbot-small)** (from Facebook) released with the paper [Recipes for building an open-domain chatbot](https://arxiv.org/abs/2004.13637) by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
+1. **[BLOOM](https://huggingface.co/docs/transformers/main/model_doc/bloom)** (from the BigScience workshop) released by the [BigScience Workshop](https://bigscience.huggingface.co/).
 1. **[BORT](https://huggingface.co/docs/transformers/model_doc/bort)** (from Alexa) released with the paper [Optimal Subarchitecture Extraction For BERT](https://arxiv.org/abs/2010.10499) by Adrian de Wynter and Daniel J. Perry.
 1. **[ByT5](https://huggingface.co/docs/transformers/model_doc/byt5)** (from Google Research) released with the paper [ByT5: Towards a token-free future with pre-trained byte-to-byte models](https://arxiv.org/abs/2105.13626) by Linting Xue, Aditya Barua, Noah Constant, Rami Al-Rfou, Sharan Narang, Mihir Kale, Adam Roberts, Colin Raffel.
 1. **[CamemBERT](https://huggingface.co/docs/transformers/model_doc/camembert)** (from Inria/Facebook/Sorbonne) released with the paper [CamemBERT: a Tasty French Language Model](https://arxiv.org/abs/1911.03894) by Louis Martin*, Benjamin Muller*, Pedro Javier Ortiz Suárez*, Yoann Dupont, Laurent Romary, Éric Villemonte de la Clergerie, Djamé Seddah and Benoît Sagot.

README_ko.md

Lines changed: 1 addition & 0 deletions
@@ -221,6 +221,7 @@ Flax, PyTorch, TensorFlow 설치 페이지에서 이들을 conda로 설치하는
 1. **[BigBird-RoBERTa](https://huggingface.co/docs/transformers/model_doc/big_bird)** (from Google Research) released with the paper [Big Bird: Transformers for Longer Sequences](https://arxiv.org/abs/2007.14062) by Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed.
 1. **[Blenderbot](https://huggingface.co/docs/transformers/model_doc/blenderbot)** (from Facebook) released with the paper [Recipes for building an open-domain chatbot](https://arxiv.org/abs/2004.13637) by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
 1. **[BlenderbotSmall](https://huggingface.co/docs/transformers/model_doc/blenderbot-small)** (from Facebook) released with the paper [Recipes for building an open-domain chatbot](https://arxiv.org/abs/2004.13637) by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
+1. **[BLOOM](https://huggingface.co/docs/transformers/main/model_doc/bloom)** (from the BigScience workshop) released by the [BigScience Workshop](https://bigscience.huggingface.co/).
 1. **[BORT](https://huggingface.co/docs/transformers/model_doc/bort)** (from Alexa) released with the paper [Optimal Subarchitecture Extraction For BERT](https://arxiv.org/abs/2010.10499) by Adrian de Wynter and Daniel J. Perry.
 1. **[ByT5](https://huggingface.co/docs/transformers/model_doc/byt5)** (from Google Research) released with the paper [ByT5: Towards a token-free future with pre-trained byte-to-byte models](https://arxiv.org/abs/2105.13626) by Linting Xue, Aditya Barua, Noah Constant, Rami Al-Rfou, Sharan Narang, Mihir Kale, Adam Roberts, Colin Raffel.
 1. **[CamemBERT](https://huggingface.co/docs/transformers/model_doc/camembert)** (from Inria/Facebook/Sorbonne) released with the paper [CamemBERT: a Tasty French Language Model](https://arxiv.org/abs/1911.03894) by Louis Martin*, Benjamin Muller*, Pedro Javier Ortiz Suárez*, Yoann Dupont, Laurent Romary, Éric Villemonte de la Clergerie, Djamé Seddah and Benoît Sagot.

README_zh-hans.md

Lines changed: 1 addition & 0 deletions
@@ -245,6 +245,7 @@ conda install -c huggingface transformers
 1. **[BigBird-RoBERTa](https://huggingface.co/docs/transformers/model_doc/big_bird)** (来自 Google Research) 伴随论文 [Big Bird: Transformers for Longer Sequences](https://arxiv.org/abs/2007.14062) 由 Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed 发布。
 1. **[Blenderbot](https://huggingface.co/docs/transformers/model_doc/blenderbot)** (来自 Facebook) 伴随论文 [Recipes for building an open-domain chatbot](https://arxiv.org/abs/2004.13637) 由 Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston 发布。
 1. **[BlenderbotSmall](https://huggingface.co/docs/transformers/model_doc/blenderbot-small)** (来自 Facebook) 伴随论文 [Recipes for building an open-domain chatbot](https://arxiv.org/abs/2004.13637) 由 Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston 发布。
+1. **[BLOOM](https://huggingface.co/docs/transformers/main/model_doc/bloom)** (from the BigScience workshop) released by the [BigScience Workshop](https://bigscience.huggingface.co/).
 1. **[BORT](https://huggingface.co/docs/transformers/model_doc/bort)** (来自 Alexa) 伴随论文 [Optimal Subarchitecture Extraction For BERT](https://arxiv.org/abs/2010.10499) 由 Adrian de Wynter and Daniel J. Perry 发布。
 1. **[ByT5](https://huggingface.co/docs/transformers/model_doc/byt5)** (来自 Google Research) 伴随论文 [ByT5: Towards a token-free future with pre-trained byte-to-byte models](https://arxiv.org/abs/2105.13626) 由 Linting Xue, Aditya Barua, Noah Constant, Rami Al-Rfou, Sharan Narang, Mihir Kale, Adam Roberts, Colin Raffel 发布。
 1. **[CamemBERT](https://huggingface.co/docs/transformers/model_doc/camembert)** (来自 Inria/Facebook/Sorbonne) 伴随论文 [CamemBERT: a Tasty French Language Model](https://arxiv.org/abs/1911.03894) 由 Louis Martin*, Benjamin Muller*, Pedro Javier Ortiz Suárez*, Yoann Dupont, Laurent Romary, Éric Villemonte de la Clergerie, Djamé Seddah and Benoît Sagot 发布。

README_zh-hant.md

Lines changed: 1 addition & 0 deletions
@@ -257,6 +257,7 @@ conda install -c huggingface transformers
 1. **[BigBird-RoBERTa](https://huggingface.co/docs/transformers/model_doc/big_bird)** (from Google Research) released with the paper [Big Bird: Transformers for Longer Sequences](https://arxiv.org/abs/2007.14062) by Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed.
 1. **[Blenderbot](https://huggingface.co/docs/transformers/model_doc/blenderbot)** (from Facebook) released with the paper [Recipes for building an open-domain chatbot](https://arxiv.org/abs/2004.13637) by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
 1. **[BlenderbotSmall](https://huggingface.co/docs/transformers/model_doc/blenderbot-small)** (from Facebook) released with the paper [Recipes for building an open-domain chatbot](https://arxiv.org/abs/2004.13637) by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
+1. **[BLOOM](https://huggingface.co/docs/transformers/main/model_doc/bloom)** (from the BigScience workshop) released by the [BigScience Workshop](https://bigscience.huggingface.co/).
 1. **[BORT](https://huggingface.co/docs/transformers/model_doc/bort)** (from Alexa) released with the paper [Optimal Subarchitecture Extraction For BERT](https://arxiv.org/abs/2010.10499) by Adrian de Wynter and Daniel J. Perry.
 1. **[ByT5](https://huggingface.co/docs/transformers/model_doc/byt5)** (from Google Research) released with the paper [ByT5: Towards a token-free future with pre-trained byte-to-byte models](https://arxiv.org/abs/2105.13626) by Linting Xue, Aditya Barua, Noah Constant, Rami Al-Rfou, Sharan Narang, Mihir Kale, Adam Roberts, Colin Raffel.
 1. **[CamemBERT](https://huggingface.co/docs/transformers/model_doc/camembert)** (from Inria/Facebook/Sorbonne) released with the paper [CamemBERT: a Tasty French Language Model](https://arxiv.org/abs/1911.03894) by Louis Martin*, Benjamin Muller*, Pedro Javier Ortiz Suárez*, Yoann Dupont, Laurent Romary, Éric Villemonte de la Clergerie, Djamé Seddah and Benoît Sagot.

docs/source/en/_toctree.yml

Lines changed: 2 additions & 0 deletions
@@ -176,6 +176,8 @@
     title: Blenderbot
   - local: model_doc/blenderbot-small
     title: Blenderbot Small
+  - local: model_doc/bloom
+    title: BLOOM
   - local: model_doc/bort
     title: BORT
   - local: model_doc/byt5

docs/source/en/index.mdx

Lines changed: 2 additions & 0 deletions
@@ -63,6 +63,7 @@ The library currently contains JAX, PyTorch and TensorFlow implementations, pret
 1. **[BigBird-RoBERTa](model_doc/big_bird)** (from Google Research) released with the paper [Big Bird: Transformers for Longer Sequences](https://arxiv.org/abs/2007.14062) by Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed.
 1. **[Blenderbot](model_doc/blenderbot)** (from Facebook) released with the paper [Recipes for building an open-domain chatbot](https://arxiv.org/abs/2004.13637) by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
 1. **[BlenderbotSmall](model_doc/blenderbot-small)** (from Facebook) released with the paper [Recipes for building an open-domain chatbot](https://arxiv.org/abs/2004.13637) by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
+1. **[BLOOM](model_doc/bloom)** (from the BigScience workshop) released by the [BigScience Workshop](https://bigscience.huggingface.co/).
 1. **[BORT](model_doc/bort)** (from Alexa) released with the paper [Optimal Subarchitecture Extraction For BERT](https://arxiv.org/abs/2010.10499) by Adrian de Wynter and Daniel J. Perry.
 1. **[ByT5](model_doc/byt5)** (from Google Research) released with the paper [ByT5: Towards a token-free future with pre-trained byte-to-byte models](https://arxiv.org/abs/2105.13626) by Linting Xue, Aditya Barua, Noah Constant, Rami Al-Rfou, Sharan Narang, Mihir Kale, Adam Roberts, Colin Raffel.
 1. **[CamemBERT](model_doc/camembert)** (from Inria/Facebook/Sorbonne) released with the paper [CamemBERT: a Tasty French Language Model](https://arxiv.org/abs/1911.03894) by Louis Martin*, Benjamin Muller*, Pedro Javier Ortiz Suárez*, Yoann Dupont, Laurent Romary, Éric Villemonte de la Clergerie, Djamé Seddah and Benoît Sagot.
@@ -193,6 +194,7 @@ Flax), PyTorch, and/or TensorFlow.
 | BigBird-Pegasus | | | | | |
 | Blenderbot | | | | | |
 | BlenderbotSmall | | | | | |
+| BLOOM | | | | | |
 | CamemBERT | | | | | |
 | CANINE | | | | | |
 | CLIP | | | | | |

docs/source/en/model_doc/bloom.mdx

Lines changed: 47 additions & 0 deletions
@@ -0,0 +1,47 @@
+<!--Copyright 2022 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License.
+-->
+
+# BLOOM
+
+## Overview
+
+The BLOOM model, in its various versions, has been proposed through the [BigScience Workshop](https://bigscience.huggingface.co/). BigScience is inspired by other open science initiatives where researchers have pooled their time and resources to collectively achieve a higher impact.
+The architecture of BLOOM is essentially similar to GPT-3 (an auto-regressive model for next-token prediction), but it has been trained on 46 different languages, including code.
+Several smaller versions of the model have been trained on the same dataset. BLOOM is available in the following versions:
+
+- [bloom-350m](https://huggingface.co/bigscience/bloom-350m)
+- [bloom-760m](https://huggingface.co/bigscience/bloom-760m)
+- [bloom-1b3](https://huggingface.co/bigscience/bloom-1b3)
+- [bloom-2b5](https://huggingface.co/bigscience/bloom-2b5)
+- [bloom-6b3](https://huggingface.co/bigscience/bloom-6b3)
+- [bloom](https://huggingface.co/bigscience/bloom) (175B parameters)
+
+## BloomConfig
+
+[[autodoc]] BloomConfig
+    - all
+
+## BloomModel
+
+[[autodoc]] BloomModel
+    - forward
+
+## BloomTokenizerFast
+
+[[autodoc]] BloomTokenizerFast
+    - all
+
+## BloomForCausalLM
+
+[[autodoc]] BloomForCausalLM
+    - forward
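The classes documented above follow the library's standard causal-LM API, so a minimal usage sketch looks like the following (the checkpoint name is one of the versions listed in the overview; generation settings are illustrative):

```python
from transformers import BloomForCausalLM, BloomTokenizerFast

# Load the smallest checkpoint from the list above.
tokenizer = BloomTokenizerFast.from_pretrained("bigscience/bloom-350m")
model = BloomForCausalLM.from_pretrained("bigscience/bloom-350m")

# Greedy next-token generation from a short prompt.
inputs = tokenizer("BigScience is an open science initiative that", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```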

src/transformers/__init__.py

Lines changed: 18 additions & 0 deletions
@@ -156,6 +156,7 @@
         "BlenderbotSmallConfig",
         "BlenderbotSmallTokenizer",
     ],
+    "models.bloom": ["BLOOM_PRETRAINED_CONFIG_ARCHIVE_MAP", "BloomConfig"],
     "models.bort": [],
     "models.byt5": ["ByT5Tokenizer"],
     "models.camembert": ["CAMEMBERT_PRETRAINED_CONFIG_ARCHIVE_MAP", "CamembertConfig"],
@@ -497,6 +498,7 @@
     _import_structure["models.big_bird"].append("BigBirdTokenizerFast")
     _import_structure["models.blenderbot"].append("BlenderbotTokenizerFast")
     _import_structure["models.blenderbot_small"].append("BlenderbotSmallTokenizerFast")
+    _import_structure["models.bloom"].append("BloomTokenizerFast")
     _import_structure["models.camembert"].append("CamembertTokenizerFast")
     _import_structure["models.clip"].append("CLIPTokenizerFast")
     _import_structure["models.convbert"].append("ConvBertTokenizerFast")
@@ -858,6 +860,14 @@
             "BigBirdPegasusPreTrainedModel",
         ]
     )
+    _import_structure["models.bloom"].extend(
+        [
+            "BLOOM_PRETRAINED_MODEL_ARCHIVE_LIST",
+            "BloomForCausalLM",
+            "BloomModel",
+            "BloomPreTrainedModel",
+        ]
+    )
     _import_structure["models.blenderbot"].extend(
         [
             "BLENDERBOT_PRETRAINED_MODEL_ARCHIVE_LIST",
@@ -2755,6 +2765,7 @@
         BlenderbotSmallConfig,
         BlenderbotSmallTokenizer,
     )
+    from .models.bloom import BLOOM_PRETRAINED_CONFIG_ARCHIVE_MAP, BloomConfig
     from .models.byt5 import ByT5Tokenizer
     from .models.camembert import CAMEMBERT_PRETRAINED_CONFIG_ARCHIVE_MAP, CamembertConfig
     from .models.canine import CANINE_PRETRAINED_CONFIG_ARCHIVE_MAP, CanineConfig, CanineTokenizer
@@ -3064,6 +3075,7 @@
     from .models.big_bird import BigBirdTokenizerFast
     from .models.blenderbot import BlenderbotTokenizerFast
     from .models.blenderbot_small import BlenderbotSmallTokenizerFast
+    from .models.bloom import BloomTokenizerFast
     from .models.camembert import CamembertTokenizerFast
     from .models.clip import CLIPTokenizerFast
     from .models.convbert import ConvBertTokenizerFast
@@ -3382,6 +3394,12 @@
         BlenderbotSmallModel,
         BlenderbotSmallPreTrainedModel,
     )
+    from .models.bloom import (
+        BLOOM_PRETRAINED_MODEL_ARCHIVE_LIST,
+        BloomForCausalLM,
+        BloomModel,
+        BloomPreTrainedModel,
+    )
     from .models.camembert import (
        CAMEMBERT_PRETRAINED_MODEL_ARCHIVE_LIST,
        CamembertForCausalLM,
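The net effect of these hunks is that the Bloom classes become importable from the top-level `transformers` namespace, resolved lazily through `_import_structure`. A small sanity-check sketch, assuming `torch` and `tokenizers` are installed:

```python
import transformers

# Resolved via the "models.bloom" entries registered above.
config = transformers.BloomConfig()
print(config.model_type)  # "bloom"

# Model and tokenizer classes are likewise exposed at the top level.
assert hasattr(transformers, "BloomForCausalLM")
assert hasattr(transformers, "BloomTokenizerFast")
```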

src/transformers/models/__init__.py

Lines changed: 1 addition & 0 deletions
@@ -31,6 +31,7 @@
     bigbird_pegasus,
     blenderbot,
     blenderbot_small,
+    bloom,
     bort,
     byt5,
     camembert,

src/transformers/models/auto/configuration_auto.py

Lines changed: 3 additions & 2 deletions
@@ -38,6 +38,7 @@
     ("bigbird_pegasus", "BigBirdPegasusConfig"),
     ("blenderbot", "BlenderbotConfig"),
     ("blenderbot-small", "BlenderbotSmallConfig"),
+    ("bloom", "BloomConfig"),
     ("camembert", "CamembertConfig"),
     ("canine", "CanineConfig"),
     ("clip", "CLIPConfig"),
@@ -51,7 +52,6 @@
     ("deberta", "DebertaConfig"),
     ("deberta-v2", "DebertaV2Config"),
     ("decision_transformer", "DecisionTransformerConfig"),
-    ("decision_transformer", "DecisionTransformerConfig"),
     ("deit", "DeiTConfig"),
     ("detr", "DetrConfig"),
     ("distilbert", "DistilBertConfig"),
@@ -155,6 +155,7 @@
     ("bigbird_pegasus", "BIGBIRD_PEGASUS_PRETRAINED_CONFIG_ARCHIVE_MAP"),
     ("blenderbot", "BLENDERBOT_PRETRAINED_CONFIG_ARCHIVE_MAP"),
     ("blenderbot-small", "BLENDERBOT_SMALL_PRETRAINED_CONFIG_ARCHIVE_MAP"),
+    ("bloom", "BLOOM_PRETRAINED_CONFIG_ARCHIVE_MAP"),
     ("camembert", "CAMEMBERT_PRETRAINED_CONFIG_ARCHIVE_MAP"),
     ("canine", "CANINE_PRETRAINED_CONFIG_ARCHIVE_MAP"),
     ("clip", "CLIP_PRETRAINED_CONFIG_ARCHIVE_MAP"),
@@ -262,6 +263,7 @@
     ("bigbird_pegasus", "BigBird-Pegasus"),
     ("blenderbot", "Blenderbot"),
     ("blenderbot-small", "BlenderbotSmall"),
+    ("bloom", "BLOOM"),
     ("bort", "BORT"),
     ("byt5", "ByT5"),
     ("camembert", "CamemBERT"),
@@ -362,7 +364,6 @@
     ("van", "VAN"),
     ("vilt", "ViLT"),
     ("vision-encoder-decoder", "Vision Encoder decoder"),
-    ("vision-encoder-decoder", "Vision Encoder decoder"),
     ("vision-text-dual-encoder", "VisionTextDualEncoder"),
     ("visual_bert", "VisualBERT"),
     ("vit", "ViT"),
