forked from vllm-project/vllm
Prepare sin/cos buffers for rope outside model forward #566
Merged
Conversation
Force-pushed from bf8e714 to efaddfa
michalkuligowski approved these changes on Nov 29, 2024
Force-pushed from efaddfa to 8fb04d1
Author: Let's not merge yet; I've found some problems with TP=2 with fp8.
Author: Actually, it looks like this problem also occurs on habana_main.
Force-pushed from 8fb04d1 to cd2fdd3
Force-pushed from aeeb21d to fd9c667
Force-pushed from fd9c667 to 785fc70
mswiniarsk approved these changes on Dec 4, 2024
vivekgoe pushed a commit that referenced this pull request on Jan 2, 2025:
#566 breaks the long-context + LoRA flow. It assumes that caching the sin/cos buffer for the first decoder layer is sufficient to handle all cases, which is not applicable to long-context + LoRA. This PR ignores the `_prepare_cos_sin` call prior to the HpuModelAdapter forward in the long-context + LoRA flow.
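A minimal sketch of what such a guard might look like. `HpuModelAdapter` and `_prepare_cos_sin` come from the commit message above; the `long_context_lora` flag, the `rotary_emb` attribute, and the method signatures are illustrative assumptions, not the fork's actual code:
```
import torch

class HpuModelAdapterSketch:
    """Illustrative stand-in for the fork's HpuModelAdapter."""

    def __init__(self, model, long_context_lora: bool = False):
        self.model = model
        # Hypothetical flag marking the long-context + LoRA flow.
        self.long_context_lora = long_context_lora

    def _prepare_cos_sin(self, positions: torch.Tensor) -> None:
        # Gather the sin/cos rows for these positions once, up front,
        # so every decoder layer can reuse the same buffers.
        self.model.rotary_emb.prepare_cos_sin(positions)

    def forward(self, input_ids, positions, **kwargs):
        if not self.long_context_lora:
            # Fast path: a single gather outside the layer loop.
            self._prepare_cos_sin(positions)
        # Long-context + LoRA skips the cached buffers and lets rope
        # compute sin/cos per layer as before.
        return self.model(input_ids, positions, **kwargs)
```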
michalkuligowski pushed a commit that referenced this pull request on Jan 7, 2025:
Error reported in https://jira.habana-labs.com/browse/SW-212516. Two recently merged PRs break Spec Decode functionality:
1. #491 overrides the existing WorkerWrapperBase design for speculative decoding. The block
```
if model_runner_cls is not None:
    ModelRunnerClass = model_runner_cls
```
is no longer needed, since we now initialize model_runner_cls with the code below, following the upstream design:
```
if model_runner_cls is not None:
    self.model_runner = model_runner_cls(self.model_runner)
```
2. #566 does not work in Spec Decode Eagle mode, because the input tensors differ from the earlier assumption that decode_fwd provides only one token per sequence; Spec Decode provides multiple candidate tokens as q. To fix this, a new env var, **VLLM_COS_SIN_RECOMPUTE**=true, was added and must be set to trigger recomputation of cos and sin for spec decode.

Signed-off-by: Chendi.Xue <[email protected]>
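A hedged sketch of the env-gated recompute described in item 2. Only the VLLM_COS_SIN_RECOMPUTE name comes from the commit message; the class, attribute names, and shapes are illustrative assumptions:
```
import os
import torch

# Env var named in the commit message; the parsing convention is an assumption.
RECOMPUTE = os.environ.get("VLLM_COS_SIN_RECOMPUTE", "false").lower() in ("1", "true")

class RopeBuffers:
    """Illustrative holder for pre-gathered sin/cos buffers."""

    def __init__(self, cos_cache: torch.Tensor, sin_cache: torch.Tensor):
        self.cos_cache = cos_cache  # [max_position, rot_dim]
        self.sin_cache = sin_cache
        self.cos = self.sin = None

    def prepare(self, positions: torch.Tensor) -> None:
        # One gather for the whole batch, shared by all decoder layers.
        flat = positions.flatten()
        self.cos = self.cos_cache.index_select(0, flat)
        self.sin = self.sin_cache.index_select(0, flat)

    def get(self, positions: torch.Tensor):
        if RECOMPUTE or self.cos is None:
            # Spec decode (Eagle) feeds multiple candidate tokens per
            # sequence, so buffers prepared under the one-token-per-seq
            # decode assumption would have the wrong shape; regather
            # from the actual positions instead.
            self.prepare(positions)
        return self.cos, self.sin
```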
Moving sin/cos buffer preparation for rope outside the model forward boosts performance by eliminating unnecessary gather and memcopy ops before rope.
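To make the optimization concrete, here is a minimal sketch, not the fork's actual code: the positional gather happens once before model.forward instead of inside every decoder layer's rope call. The cache construction follows the standard RoPE formulation; the model call signature taking cos/sin keyword arguments is an assumption:
```
import torch

def build_rope_caches(max_position: int, rot_dim: int, base: float = 10000.0):
    # Standard RoPE tables: one row of cos/sin values per position.
    inv_freq = 1.0 / (base ** (torch.arange(0, rot_dim, 2).float() / rot_dim))
    freqs = torch.outer(torch.arange(max_position).float(), inv_freq)
    emb = torch.cat((freqs, freqs), dim=-1)
    return emb.cos(), emb.sin()

cos_cache, sin_cache = build_rope_caches(max_position=4096, rot_dim=128)

def forward_with_prepared_rope(model, input_ids, positions):
    # Before this PR: every decoder layer gathered cos_cache[positions]
    # itself (a gather plus a copy per layer). After: gather once here
    # and hand the shared buffers to all layers.
    flat = positions.flatten()
    cos = cos_cache.index_select(0, flat)
    sin = sin_cache.index_select(0, flat)
    return model(input_ids, positions, cos=cos, sin=sin)
```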