
xytpai commented Sep 24, 2025

We registered new pattern-matching rules so that the compiled graph can use the fused kernels aiter::_fused_rms_mxfp4_quant_kernel and aiter::act_mul_and_mxfp4_quant.
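The following pure-Python sketch shows the unfused computation that the two fused ops are assumed to replace. Each op applies RMSNorm (or SiLU-and-mul) and then MXFP4 quantization, using one power-of-two scale per 32-element block as in the OCP Microscaling (MX) v1.0 spec. It is not aiter's kernel. Scale rounding, tie-breaking, residual-add handling, and packing two FP4 values per byte are either assumptions or left out. All function names here are hypothetical.

```python
import math

# FP4 E2M1 magnitudes per the OCP MX spec, in ascending order.
E2M1_VALUES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
E2M1_EMAX = 2  # exponent of the largest E2M1 value: 6.0 = 1.5 * 2**2
BLOCK = 32     # MX block size: one shared power-of-two scale per 32 elements


def rms_norm(x, weight, eps=1e-6):
    """Reference RMSNorm of one row (no residual add)."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / rms * w for v, w in zip(x, weight)]


def silu_and_mul(x):
    """Reference SiLU-and-mul: silu(first half) * second half."""
    d = len(x) // 2
    return [a / (1.0 + math.exp(-a)) * b for a, b in zip(x[:d], x[d:])]


def quantize_e2m1(v):
    """Round |v| to the nearest E2M1 magnitude (ties go to the smaller one,
    values beyond the grid saturate at 6.0), keeping the sign of v."""
    mag = min(E2M1_VALUES, key=lambda q: abs(abs(v) - q))
    return math.copysign(mag, v)


def mxfp4_quant(x):
    """Quantize a row block by block; returns (FP4 values, per-block scale exponents)."""
    values, scale_exps = [], []
    for start in range(0, len(x), BLOCK):
        block = x[start:start + BLOCK]
        amax = max(abs(v) for v in block)
        # E8M0 stores only an exponent, so the shared scale is 2**exp.
        exp = (math.floor(math.log2(amax)) - E2M1_EMAX) if amax > 0 else 0
        values.extend(quantize_e2m1(v / 2.0 ** exp) for v in block)
        scale_exps.append(exp)
    return values, scale_exps


# Unfused references of the two fused ops targeted by the new patterns.
def rms_norm_then_mxfp4(x, weight, eps=1e-6):
    return mxfp4_quant(rms_norm(x, weight, eps))


def silu_mul_then_mxfp4(x):
    return mxfp4_quant(silu_and_mul(x))
```

The fused kernel emits the FP4 values and block scales in a single pass. Presumably this is the benefit the patterns aim for: the normalized (or activated) tensor never has to be written out in higher precision before the f4 GEMM consumes it.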

xytpai (author) commented Sep 24, 2025

[image]

xytpai marked this pull request as draft September 25, 2025 03:50
xytpai marked this pull request as ready for review September 25, 2025 06:48
xytpai (author) commented Sep 25, 2025

To enable the MXFP4 fusion, pass:

--compilation-config '{"cudagraph_mode": "FULL_AND_PIECEWISE", "custom_ops": ["+rms_norm", "+silu_and_mul"]}'
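The flag value is a JSON object, which makes shell quoting easy to get wrong. The minimal sketch below builds the same argument from a Python dict using only the standard library. The variable names are illustrative. In vLLM, a "+" prefix in custom_ops enables that op's custom implementation. Presumably this keeps rms_norm and silu_and_mul as recognizable custom-op nodes in the traced graph so the new patterns can match them.

```python
import json
import shlex

# Compilation settings from the comment above.
compilation_config = {
    "cudagraph_mode": "FULL_AND_PIECEWISE",
    # "+<op>" enables vLLM's custom-op implementation of <op>.
    "custom_ops": ["+rms_norm", "+silu_and_mul"],
}

# Serialize to JSON and single-quote it so the shell passes it as one argument.
flag = "--compilation-config " + shlex.quote(json.dumps(compilation_config))
print(flag)
```

Generating the string from a dict keeps the JSON valid when more ops are added to custom_ops.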

empty_fp4(32, 32), # weight_gemm
empty_fp4(32, 1), # scale
empty_fp4(32, 4), # weight_gemm
empty_fp4(1, 1), # scale


Why do we use empty dummy tensors for all the arguments of this pattern? And what do the shapes here mean?

xytpai (author) replied Sep 25, 2025


xytpai changed the title [355_wip] Let dynamo capture add+rmsnorm+f4gemm pattern [355_wip] Let dynamo capture rms/silu_mul+f4gemm pattern Sep 25, 2025
xytpai requested a review from dllehr-amd September 26, 2025 03:08