Description
Environment info
- `transformers` version: 4.17.0.dev0
- Platform: Linux-5.13.0-27-generic-x86_64-with-glibc2.34
- Python version: 3.9.7
- PyTorch version (GPU?): 1.10.1+cu102 (True)
- Tensorflow version (GPU?): 2.7.0 (False)
- Flax version (CPU?/GPU?/TPU?): 0.3.6 (cpu)
- Jax version: 0.2.26
- JaxLib version: 0.1.75
- Using GPU in script?: yes
- Using distributed or parallel set-up in script?: no
Who can help
Information
In EncoderDecoder models one can pass `encoder_outputs` as a tuple of `Tensor`s. However, if you do that, this line will fail with `AttributeError: 'tuple' object has no attribute 'last_hidden_state'`, since the tuple is never converted in the `forward` method.
So if `encoder_outputs` is a tuple, it could perhaps be wrapped in a `ModelOutput` class or something similar, or the tuple could be handled explicitly.
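A minimal sketch of what that wrapping could look like (this is a suggestion, not the actual `transformers` implementation; the helper name and the tuple layout `(last_hidden_state, hidden_states, attentions)` are assumptions):

```python
import torch
from transformers.modeling_outputs import BaseModelOutput


def wrap_encoder_outputs(encoder_outputs):
    # Hypothetical helper: if the user passed encoder_outputs as a plain
    # tuple, wrap it in a BaseModelOutput so that attribute access like
    # encoder_outputs.last_hidden_state keeps working in forward().
    if isinstance(encoder_outputs, tuple):
        encoder_outputs = BaseModelOutput(
            last_hidden_state=encoder_outputs[0],
            hidden_states=encoder_outputs[1] if len(encoder_outputs) > 1 else None,
            attentions=encoder_outputs[2] if len(encoder_outputs) > 2 else None,
        )
    return encoder_outputs


# A tuple as a user might pass it: (last_hidden_state,)
wrapped = wrap_encoder_outputs((torch.ones(1, 5, 16),))
```

Since `ModelOutput` subclasses also support index access, existing tuple-style code (`encoder_outputs[0]`) would keep working after the wrap.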
On a slight tangent: I made a `SpeechEncoderDecoderModel` for the robust speech challenge: https://huggingface.co/jsnfly/wav2vec2-large-xlsr-53-german-gpt2. I found that adding the position embeddings of the decoder model to the outputs of the encoder model improved performance significantly (it basically didn't work without them). This requires small modifications to the `__init__` and `forward` methods of `SpeechEncoderDecoderModel`.
At the moment this seems like too much of a "hack" to add to the `SpeechEncoderDecoderModel` class in general (for example via a flag), because it may differ between decoder models and probably needs more verification. @patrickvonplaten showed some interest in including this in Transformers nonetheless. What do you think?
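To make the idea concrete, here is a rough sketch of the trick (the helper name and wiring are assumptions, not the actual code from the linked model): look up the decoder's learned position embedding table (e.g. GPT-2's `wpe`) for each encoder frame position and add it to the encoder hidden states before cross-attention.

```python
import torch
import torch.nn as nn


def add_decoder_position_embeddings(encoder_hidden_states, decoder_pos_emb):
    # Hypothetical helper: add the decoder's learned position embeddings
    # (one per encoder frame position) to the encoder's hidden states.
    # encoder_hidden_states: (batch, seq_len, hidden_size)
    # decoder_pos_emb: an nn.Embedding, e.g. GPT-2's `wpe` table.
    seq_len = encoder_hidden_states.size(1)
    positions = torch.arange(seq_len, device=encoder_hidden_states.device)
    return encoder_hidden_states + decoder_pos_emb(positions)


# Toy shapes: batch of 2, 10 encoder frames, hidden size 8.
pos_emb = nn.Embedding(1024, 8)
enc_out = torch.zeros(2, 10, 8)
shifted = add_decoder_position_embeddings(enc_out, pos_emb)
```

In the actual model this would sit in `forward` between the encoder call and the decoder's cross-attention, which is why it needs the small `__init__`/`forward` modifications mentioned above.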