Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 4 additions & 0 deletions src/transformers/models/gpt2/modeling_gpt2.py
Original file line number Diff line number Diff line change
Expand Up @@ -565,6 +565,10 @@ class GPT2DoubleHeadsModelOutput(ModelOutput):
- 1 for tokens that are **not masked**,
- 0 for tokens that are **masked**.

If `past_key_values` is used, `attention_mask` needs to contain the masking strategy that was used for
`past_key_values`. In other words, the `attention_mask` always has to have the length:
`len(past_key_values) + len(input_ids)`

[What are attention masks?](../glossary#attention-mask)
token_type_ids (`torch.LongTensor` of shape `(batch_size, input_ids_length)`, *optional*):
Segment token indices to indicate first and second portions of the inputs. Indices are selected in `[0,
Expand Down
4 changes: 4 additions & 0 deletions src/transformers/models/gpt2/modeling_tf_gpt2.py
Original file line number Diff line number Diff line change
Expand Up @@ -655,6 +655,10 @@ class TFGPT2DoubleHeadsModelOutput(ModelOutput):
- 1 for tokens that are **not masked**,
- 0 for tokens that are **masked**.

If `past_key_values` is used, `attention_mask` needs to contain the masking strategy that was used for
`past_key_values`. In other words, the `attention_mask` always has to have the length:
`len(past_key_values) + len(input_ids)`

[What are attention masks?](../glossary#attention-mask)
token_type_ids (`tf.Tensor` or `Numpy array` of shape `(batch_size, sequence_length)`, *optional*):
Segment token indices to indicate first and second portions of the inputs. Indices are selected in `[0,
Expand Down