AWQ resolved mappings -- ensure shapes align #1372
Conversation
👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review. Note: this is required to complete the testing suite; please only add the label once the PR is code complete and local testing has been performed.
Shouldn't we be erroring if the user passes a bad mapping, rather than excluding model-specific names?
Also, is there an architecture-agnostic way to verify a mapping? How do we know that the model doesn't perform a reshaping in between mappings?
It does; it will fail in the call if, for example, a user has
I talked to @rahul-tuli about an architecture-agnostic way, but I think for now we resolved to scope this to exactly how AutoAWQ handles it here. In reality, if smooth_layer.out_features doesn't match balance_layer.in_features, it will fail during runtime. Wonkier architectures may hit further errors, but we resolved to handle those as they arise rather than attempting a more all-encompassing solution.
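A minimal sketch of that runtime constraint, assuming plain nn.Linear modules (shapes_align is an illustrative helper, not part of this PR):

```python
import torch.nn as nn

def shapes_align(smooth_layer: nn.Linear, balance_layer: nn.Linear) -> bool:
    # AWQ scales computed on the smooth layer's outputs can only be folded
    # into a balance layer whose input width matches that output width.
    return smooth_layer.out_features == balance_layer.in_features

# e.g. a typical MLP mapping where the widths line up
up_proj = nn.Linear(4096, 11008)
down_proj = nn.Linear(11008, 4096)
assert shapes_align(up_proj, down_proj)
```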
@brian-dellabetta I'm worried about breaking/polluting other model architectures. If we're excluding names specific to an architecture, shouldn't we be creating a model architecture mappings registry, similar to smoothquant?
@kylesayrs we do have a mappings registry for each model family, see the diff just above here. However, the specific model can have different shapes. For example,
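As a hypothetical illustration of the kind of shape difference at play (the numbers below are illustrative, not taken from a particular checkpoint):

```python
# Grouped-query attention: fewer KV heads than attention heads
hidden_size = 4096
num_attention_heads = 32
num_key_value_heads = 8
head_dim = hidden_size // num_attention_heads   # 128

v_proj_out_features = num_key_value_heads * head_dim  # 1024
o_proj_in_features = num_attention_heads * head_dim   # 4096

# v_proj's output is narrower than o_proj's input, so a
# v_proj -> o_proj smoothing mapping cannot be applied as-is.
assert v_proj_out_features != o_proj_in_features
```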
@brian-dellabetta I see. I'd suggest that, rather than logging the exclusion for each layer (which could be noisy), we hoist the logic out of the for loops and modify the mappings based on the config:

def validate_mapping(self, model: PreTrainedModel, mappings: List[AWQMapping]):
    if model.config.model_type == "llama":
        num_attn_heads = model.config.num_attention_heads
        num_kv_heads = model.config.num_key_value_heads
        if num_attn_heads != num_kv_heads:
            # GQA: v_proj's output width no longer matches o_proj's input width
            logger.info("Excluding v_proj due to shape mismatch")
            mappings = [
                mapping for mapping in mappings
                if mapping.smooth_layer != "re:.*v_proj"
            ]
    return mappings
@kylesayrs it is not limited to llama. See AutoAWQ's qwen, mistral and gemma implementations for a few additional examples. |
@brian-dellabetta My point is that AWQ explicitly lists which architectures this rule applies to, whereas the current implementation applies the rule to all architectures. However, it seems like this is consistent enough across architectures to merit a blanket rule. If logging for every layer is noisy, we can talk about logging only once. |
@kylesayrs yeah, I don't think they picked those model architectures in particular; I bet they were added as errors arose, because without this check the algorithm will error out further down. I am not sure why they are doing shape checks, which seems more exclusive than just checking
Yeah, I thought about the logging being noisy for large models. Since we are restricting this to o_proj/v_proj layers, I'll change it to a count.
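A minimal sketch of the count-based logging approach, assuming a simplified mapping structure (ResolvedMapping and filter_mappings are illustrative names, not the PR's actual code):

```python
from typing import List, NamedTuple

import torch.nn as nn
from loguru import logger

class ResolvedMapping(NamedTuple):
    # simplified stand-in for the modifier's resolved mapping object
    smooth_layer: nn.Linear
    balance_layer: nn.Linear

def filter_mappings(mappings: List[ResolvedMapping]) -> List[ResolvedMapping]:
    """Drop mappings whose shapes don't align and log a single summary count."""
    kept = [
        m for m in mappings
        if m.smooth_layer.out_features == m.balance_layer.in_features
    ]
    num_skipped = len(mappings) - len(kept)
    if num_skipped:
        logger.info(f"Skipped {num_skipped} mappings due to shape mismatch")
    return kept
```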
Force-pushed from f250c1c to 51a58a0
Thanks for adding tests! Logs are much cleaner too. Works for me locally 🎉
SUMMARY:
Add a robustness check to AWQ to exclude mappings where the layer shapes don't align. This is a known issue; I wanted to do it in a separate PR because it occurs in a different location in AutoAWQ, and I wanted to keep the initial AWQ PR as close to the AutoAWQ implementation as possible so the git history reflects our changes well.
TEST PLAN:
Added unit tests covering the default case and edge cases.
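For reference, a hedged sketch of what such tests might look like, reusing the illustrative ResolvedMapping/filter_mappings helpers from the sketch above (not the PR's actual test code):

```python
import torch.nn as nn

def test_mismatched_mapping_is_excluded():
    # GQA-style shapes: the smooth layer is narrower than the balance layer's input
    v_proj = nn.Linear(64, 16)
    o_proj = nn.Linear(64, 64)
    assert filter_mappings([ResolvedMapping(v_proj, o_proj)]) == []

def test_aligned_mapping_is_kept():
    up_proj = nn.Linear(64, 256)
    down_proj = nn.Linear(256, 64)
    mappings = [ResolvedMapping(up_proj, down_proj)]
    assert filter_mappings(mappings) == mappings
```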