CodeGen Fix causal mask for half precision #18467
Conversation
- Small hotfix for the causal mask for half-precision models
- Explicitly cast the causal mask to uint8 for compatibility with `torch.where`
- check huggingface#18467
The documentation is not available anymore as the PR was closed or merged.
  # compute causal mask from causal mask buffer
  query_length, key_length = query.size(-2), key.size(-2)
- causal_mask = self.causal_mask[:, :, key_length - query_length : key_length, :key_length]
+ causal_mask = self.causal_mask[:, :, key_length - query_length : key_length, :key_length].to(torch.uint8)
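For context, a minimal sketch (not from the PR) of why the condition passed to `torch.where` needs to be a boolean/integer mask rather than a half-precision tensor:

```python
import torch

# The causal mask buffer is a lower-triangular matrix of ones.
mask = torch.tril(torch.ones(4, 4))
scores = torch.randn(4, 4)
fill = torch.tensor(float("-inf"))

# A boolean condition is always accepted; the PR casts to uint8, which the
# PyTorch versions of the time also accepted as a condition dtype.
masked = torch.where(mask.bool(), scores, fill)

# Once torch_dtype=torch.float16 has cast the buffer to fp16, passing it
# directly as the condition fails.
try:
    torch.where(mask.half(), scores, fill)
except RuntimeError as err:
    print("torch.where rejected the fp16 mask:", err)
```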
Let's have a comment here to explain why we need .to(torch.uint8) 🙏
Added more comments on 8b81ac1
Let me know if anything is unclear!
Stupid question, why is that not
- causal_mask = self.causal_mask[:, :, key_length - query_length : key_length, :key_length].to(torch.uint8)
+ self.causal_mask = self.causal_mask[:, :, key_length - query_length : key_length, :key_length].to(torch.uint8)
feels like something you want to do only once, no?
It's a no-op when the tensor is already in the correct dtype.
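For reference, a quick check (not from the thread) that `.to()` returns the original tensor unchanged when the dtype already matches:

```python
import torch

mask = torch.zeros(2, 2, dtype=torch.uint8)
# Tensor.to returns self when no dtype/device conversion is needed, so casting
# an already-uint8 mask on every forward pass is a no-op.
print(mask.to(torch.uint8) is mask)  # True
```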
It won't be needed anymore as we found the root cause with Younes :-)
sgugger left a comment:
LGTM, thanks for fixing!
sgugger left a comment:
Let's close this one to focus on the right fix @younesbelkada :-)
Yeah let's move the discussion to: #18471
What does this PR do?
This PR forces the causal mask to stay in `torch.uint8`. An error occurs when loading a model in half precision, since `torch_dtype=torch.float16` also casts the buffers to fp16. A minimal script to reproduce the error is sketched below.
In a future PR we could address not casting the buffers (i.e. keeping them in their native dtype).
Can also confirm the slow tests pass!
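A hypothetical reproduction sketch (the original script was collapsed in the PR page; the checkpoint name, prompt, and generation call are assumptions):

```python
# Hypothetical reproduction sketch -- not the original script from the PR.
# The checkpoint name is an assumption; any CodeGen checkpoint loaded in fp16
# should trigger the issue. A CUDA device is assumed for fp16 inference.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "Salesforce/codegen-350M-mono"  # assumed checkpoint
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

# torch_dtype=torch.float16 also casts the registered buffers (including the
# causal mask) to fp16, which torch.where later rejects as a condition tensor.
model = AutoModelForCausalLM.from_pretrained(
    checkpoint, torch_dtype=torch.float16
).to("cuda")

inputs = tokenizer("def hello_world():", return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=16)  # raised before this fix
print(tokenizer.decode(outputs[0]))
```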
cc @ydshieh