
XLA generation error with repetition_penalty #18630

@AlekseyKorshuk

Description


System Info

  • transformers version: 4.22.0.dev0
  • Platform: Linux-5.13.0-40-generic-x86_64-with-glibc2.29
  • Python version: 3.8.10
  • Huggingface_hub version: 0.8.1
  • PyTorch version (GPU?): 1.10.1+cu113 (True)
  • Tensorflow version (GPU?): 2.9.1 (True)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script?: Yes
  • Using distributed or parallel set-up in script?: No

Who can help?

@gante
@Rocketknight1

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

To reproduce the error (code adapted from https://huggingface.co/blog/tf-xla-generate):

import tensorflow as tf
from transformers import AutoTokenizer, TFAutoModelForCausalLM

generation_kwargs = {
    "max_new_tokens": 64,
    "eos_token_id": 198,
    "do_sample": True,
    "temperature": 0.72,
    "top_k": 0,
    "top_p": 0.725,
    "repetition_penalty": 1.13,
}

tokenizer = AutoTokenizer.from_pretrained(
    "gpt2", padding_side="left", pad_token="</s>"
)
model = TFAutoModelForCausalLM.from_pretrained("gpt2")
model.config.pad_token_id = model.config.eos_token_id
input_text = "repetition_penalty error"

xla_generate = tf.function(model.generate, jit_compile=True)

tokenized_input = tokenizer(input_text, return_tensors="tf")

print("model.generate")
model.generate(**tokenized_input, **generation_kwargs)

print("xla_generate")
xla_generate(**tokenized_input, **generation_kwargs)    # error here

Error:

    File "/usr/local/lib/python3.8/dist-packages/transformers/generation_tf_utils.py", line 604, in generate  *
        seed=model_kwargs.pop("seed", None),
    File "/usr/local/lib/python3.8/dist-packages/transformers/generation_tf_utils.py", line 1651, in _generate  *
        input_ids,
    File "/usr/local/lib/python3.8/dist-packages/transformers/generation_tf_utils.py", line 2475, in sample_body_fn  *
        next_tokens_scores = logits_processor(generated, next_token_logits, cur_len)
    File "/usr/local/lib/python3.8/dist-packages/transformers/generation_tf_logits_process.py", line 94, in __call__  *
        scores = processor(input_ids, scores, cur_len)
    File "/usr/local/lib/python3.8/dist-packages/transformers/generation_tf_logits_process.py", line 278, in __call__  *
        score_penalties = self._create_score_penalties(input_ids[:, :cur_len], scores)
    File "/usr/local/lib/python3.8/dist-packages/transformers/generation_tf_logits_process.py", line 265, in _create_score_penalties  *
        indexable_prev_input_ids = tf.concat(

    ValueError: None values not supported.

Setting repetition_penalty to 1.0, or removing the parameter entirely, makes generation work without errors.
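For context, here is a minimal sketch of what I believe triggers this class of error. This is an assumption based on the traceback, not the actual transformers code: inside tf.function, slicing a tensor with a traced length (as in input_ids[:, :cur_len]) leaves the static shape as None along that axis, and any op that consumes the static .shape then raises "ValueError: None values not supported". Plain tf.function tracing already reproduces it, without jit_compile:

```python
import tensorflow as tf

# Hypothetical minimal reproduction (not the transformers implementation):
# slicing with a tensor bound makes the static shape None along that axis.

@tf.function
def broken(input_ids, cur_len):
    sliced = input_ids[:, :cur_len]  # static shape: (2, None)
    # Reading the static shape fails at trace time because dim 1 is None:
    return tf.zeros((sliced.shape[0], sliced.shape[1]))

@tf.function
def fixed(input_ids, cur_len):
    sliced = input_ids[:, :cur_len]
    return tf.zeros(tf.shape(sliced))  # dynamic shape works

input_ids = tf.ones((2, 5), dtype=tf.int32)
cur_len = tf.constant(3)

try:
    broken(input_ids, cur_len)
except Exception as err:
    print("broken raised:", type(err).__name__)

print("fixed shape:", fixed(input_ids, cur_len).shape)  # (2, 3)
```

If this guess is right, replacing static-shape reads with tf.shape (or avoiding the slice with cur_len altogether, e.g. via masking) in the repetition-penalty processor would make it XLA-compatible.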

Expected behavior

Text generation with repetition_penalty should run without errors under XLA, producing the same behavior as eager model.generate.
