clean up activation hardcoding to silu #28503
base: main
Conversation
Summary: No functional changes
Test Plan: CI Passes
Differential Revision: D86799190
Code Review
This pull request aims to remove the hardcoded 'silu' activation in the MoE expert implementation to support more general activation functions. The changes are in the right direction, but I've identified a critical issue in torch_moe where a call to the updated torch_experts function is now incorrect, which will lead to a runtime error. I've provided a fix for this. Additionally, I've suggested a small performance optimization in torch_experts to avoid redundant function calls inside a loop.
def apply_moe_activation(act_str, act_input):
    act_fn = get_act_and_mul_fn(act_str)
    return act_fn(act_input)
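For context, here is a minimal, self-contained sketch of what the new helper does. The get_act_and_mul_fn below is a simplified stand-in for vLLM's real helper (which resolves registered activation modules), and the shapes are illustrative only:

```python
import torch
import torch.nn.functional as F


# Simplified stand-in for vLLM's get_act_and_mul_fn: maps an activation name
# to a gated "act(gate) * up" function over the two halves of the input.
def get_act_and_mul_fn(act_str: str):
    act = {"silu": F.silu, "gelu": F.gelu}[act_str]

    def act_and_mul(x: torch.Tensor) -> torch.Tensor:
        gate, up = x.chunk(2, dim=-1)
        return act(gate) * up

    return act_and_mul


def apply_moe_activation(act_str, act_input):
    act_fn = get_act_and_mul_fn(act_str)
    return act_fn(act_input)


tmp1 = torch.randn(4, 2 * 128)             # output of the first expert GEMM
tmp2 = apply_moe_activation("silu", tmp1)  # gated activation halves the last dim
print(tmp2.shape)                          # torch.Size([4, 128])
```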
The apply_moe_activation helper function repeatedly calls get_act_and_mul_fn inside the loop over experts. This is inefficient. It's better to fetch the activation function once before the loop.
I suggest removing this helper function and instead initializing the activation function before the loop:
act_fn = get_act_and_mul_fn(activation)
for i in range(num_experts):
    mask = topk_ids == i
    if mask.sum():
        if quant_dtype is None:
            # ...
            tmp2 = act_fn(tmp1)
            # ...

This change should be applied to all calls to apply_moe_activation within the loop.
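To make the suggestion concrete, here is a self-contained toy version of the hoisted pattern. The dense per-expert loop, the masking, and the shapes are simplified stand-ins for the real torch_experts reference implementation, and topk_weight scaling is omitted for brevity:

```python
import torch
import torch.nn.functional as F


def silu_and_mul(x: torch.Tensor) -> torch.Tensor:
    gate, up = x.chunk(2, dim=-1)
    return F.silu(gate) * up


# Toy stand-in for get_act_and_mul_fn.
def get_act_and_mul_fn(name: str):
    return {"silu": silu_and_mul}[name]


def toy_experts(a, w1, w2, topk_ids, activation="silu"):
    num_experts = w1.shape[0]
    out = torch.zeros(a.shape[0], w2.shape[1], dtype=a.dtype)
    # Resolve the activation once, outside the per-expert loop.
    act_fn = get_act_and_mul_fn(activation)
    for i in range(num_experts):
        mask = (topk_ids == i).any(dim=-1)
        if mask.sum():
            tmp1 = a[mask] @ w1[i].t()     # first GEMM: gate/up projection
            tmp2 = act_fn(tmp1)            # gated activation, no per-call lookup
            out[mask] += tmp2 @ w2[i].t()  # second GEMM: down projection
    return out


tokens, hidden, inter, E, topk = 6, 16, 32, 4, 2
a = torch.randn(tokens, hidden)
w1 = torch.randn(E, 2 * inter, hidden)
w2 = torch.randn(E, hidden, inter)
topk_ids = torch.randint(0, E, (tokens, topk))
print(toy_experts(a, w1, w2, topk_ids).shape)  # torch.Size([6, 16])
```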
💡 Codex Review
Here are some automated review suggestions for this pull request.
    topk_weight,
    topk_ids,
    global_num_experts,
    b_bias1,
    b_bias2,
    w1_bias,
    w2_bias,
    expert_map,
)
Provide activation argument when calling torch_experts
The new activation parameter in torch_experts is required, but torch_moe still calls the helper with the old positional layout. As written, the integer global_num_experts is passed into get_act_and_mul_fn as the activation name and the bias tensors are shifted into the wrong parameters, so any test that reaches this code will raise before exercising the kernel. Please either give activation a default (e.g., "silu") or add an activation argument to torch_moe and forward it correctly.
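A minimal sketch of the fix Codex describes, with heavily simplified signatures; the real torch_experts/torch_moe in vLLM's test utilities take many more parameters (weights, biases, quantization arguments, and so on):

```python
# Simplified signatures, for illustration only.
def torch_experts(hidden, topk_weight, topk_ids, activation: str = "silu",
                  expert_map=None):
    # Defaulting activation to "silu" keeps existing positional callers working.
    print(f"reference experts running with activation={activation}")


def torch_moe(hidden, topk_weight, topk_ids, activation: str = "silu",
              expert_map=None):
    # Forward the activation by keyword so that positional arguments such as
    # global_num_experts can never slide into the activation slot.
    torch_experts(hidden, topk_weight, topk_ids,
                  activation=activation, expert_map=expert_map)


# Placeholder arguments; only the keyword forwarding matters here.
torch_moe(None, None, None, activation="silu")
```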
Summary: Pull Request resolved: vllm-project#28503. GPT OSS has a different activation function from Llama4. This diff removes activation function hardcoding to allow general fused moe support.
Test Plan: CI Passes
Differential Revision: D86720793
Force-pushed from 42d2563 to a5dcebd
Summary: GPT OSS has a different activation function from Llama4. This diff removes activation function hardcoding to allow general fused moe support.
Test Plan: CI Passes
Differential Revision: D86720793
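For context on why the hardcoding has to go: Llama4-style MoE applies plain SiLU-and-mul between the two expert GEMMs, while GPT OSS uses a clamped, gated SwiGLU variant. The sketch below contrasts the two; the constants and clamping in gpt_oss_glu follow the commonly cited GPT OSS reference code and the gate/up split convention is simplified, so treat it as an illustration of "not plain silu", not as this PR's implementation:

```python
import torch
import torch.nn.functional as F


def silu_and_mul(x: torch.Tensor) -> torch.Tensor:
    # Llama4-style gate: silu(gate) * up.
    gate, up = x.chunk(2, dim=-1)
    return F.silu(gate) * up


def gpt_oss_glu(x: torch.Tensor, alpha: float = 1.702, limit: float = 7.0) -> torch.Tensor:
    # GPT OSS-style gate (assumed from the public reference code, not from this
    # PR): clamped inputs and a scaled sigmoid instead of plain SiLU.
    gate, up = x.chunk(2, dim=-1)
    gate = gate.clamp(max=limit)
    up = up.clamp(min=-limit, max=limit)
    return (up + 1) * gate * torch.sigmoid(alpha * gate)


x = torch.randn(4, 2 * 8)
print(silu_and_mul(x).shape, gpt_oss_glu(x).shape)  # both torch.Size([4, 8])
```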