Support fused add rmsnorm for LLaMA #1667

beginlner · 2023-11-15T02:48:45Z

This PR reduces CPU overhead, increasing the end-to-end token generation speed in LLaMA by 10%.
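For readers unfamiliar with the operation, here is a minimal pure-Python sketch of what a fused add + RMSNorm computes: the residual addition and the normalization happen in one call instead of two separate operations, so there is one dispatch from the CPU side rather than two. The function names `rms_norm` and `fused_add_rms_norm`, the list-based signature, and the default `eps` are assumptions made for illustration only. The PR itself does this in a GPU kernel, and this sketch shows only the arithmetic, not that implementation.

```python
import math
from typing import List, Tuple


def rms_norm(x: List[float], weight: List[float], eps: float = 1e-6) -> List[float]:
    """Reference RMSNorm over one hidden vector: x / sqrt(mean(x^2) + eps) * weight."""
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]


def fused_add_rms_norm(
    x: List[float],
    residual: List[float],
    weight: List[float],
    eps: float = 1e-6,
) -> Tuple[List[float], List[float]]:
    """Illustrative fused step (hypothetical helper, not the PR's kernel).

    Adds the residual to x, then normalizes the sum. Returns both the
    normalized output and the updated residual, so the caller can pass the
    residual on to the next layer without a separate add.
    """
    new_residual = [a + b for a, b in zip(x, residual)]
    return rms_norm(new_residual, weight, eps), new_residual


if __name__ == "__main__":
    hidden = [1.0, -2.0, 3.0, 0.5]
    residual = [0.5, 0.5, -1.0, 2.0]
    weight = [1.0, 1.0, 1.0, 1.0]
    normed, residual = fused_add_rms_norm(hidden, residual, weight)
    print(normed, residual)
```

The result matches the unfused sequence `residual = x + residual` followed by `rms_norm(residual, weight)`; the saving comes from issuing one operation instead of two.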

WoosukKwon

@beginlner Thanks for the PR! Good work. Left some minor comments. Please take a look at them.

vllm/model_executor/layers/layernorm.py

vllm/model_executor/models/llama.py

WoosukKwon

@beginlner LGTM! Thanks for the optimization!

…#1667) Because libfabric needs to consume some HBM memory, reduce util to 0.7. Otherwise vLLM may crash at 32k model length (30k/2k)

Support fused add rmsnorm for LLaMA

598f7ea

WoosukKwon self-requested a review November 15, 2023 08:19

WoosukKwon reviewed Nov 16, 2023


vllm/model_executor/layers/layernorm.py (outdated, resolved)

vllm/model_executor/models/llama.py (outdated, resolved)

vllm/model_executor/models/llama.py (resolved)

beginlner added 8 commits November 17, 2023 09:01

Minor fix

98ad913

Support fused add rmsnorm for Mistral

bb1cc17

Fix

c6134bc

Support fused add rmsnorm for Internlm

3cb9d78

Support fused add rmsnorm for Baichuan

71ba885

Support fused add rmsnorm for Qwen

d7350ff

Support fused add rmsnorm for Yi

2d5bd36

Minor fix

05998fc

WoosukKwon self-requested a review November 19, 2023 00:46

WoosukKwon approved these changes Nov 19, 2023


WoosukKwon merged commit e105424 into vllm-project:main Nov 19, 2023

yxl pushed a commit to yxl/vllm that referenced this pull request Nov 29, 2023

[Optimization] Implement fused add rmsnorm (vllm-project#1667)

34c75a9

beginlner deleted the fused_rms_norm branch December 19, 2023 01:41

hongxiayang pushed a commit to hongxiayang/vllm that referenced this pull request Feb 13, 2024

[Optimization] Implement fused add rmsnorm (vllm-project#1667)

6865a1a
