
Drop flash_attn skip for quantizing_moe example tests #1396


Merged
kylesayrs merged 2 commits into main from drop-flash-attn-skip on Apr 29, 2025

Conversation


@dbarbuzzi dbarbuzzi commented Apr 28, 2025

SUMMARY:
Drop the skip requiring `flash_attn` to be installed in the tests for the `quantizing_moe` examples. Recent CI failures related to this package and its CUDA compatibility with the recently released PyTorch 2.7.0 have shown that it is not required for these tests.
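For context, the dropped guard is the standard pytest module-availability pattern. A minimal sketch of what such a skip typically looks like (hypothetical names, not the repository's exact code):

```
import importlib.util

import pytest

# Hypothetical sketch of the kind of guard this PR removes: skip the
# quantizing_moe example tests when flash_attn is not importable.
HAS_FLASH_ATTN = importlib.util.find_spec("flash_attn") is not None


@pytest.mark.skipif(not HAS_FLASH_ATTN, reason="requires flash_attn")
def test_deepseek_example_script():
    ...
```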

TEST PLAN:
An [internal test run][1] that drops the installation of `flash-attn` and runs the changes on this branch indicates that the tests will pass (one test has passed so far; the PR will be marked ready once the run completes and the remaining tests show the expected results).

Specific relevant output (will update with other tests’ results):
```
tests/examples/test_quantizing_moe.py::TestQuantizingMOE::test_deepseek_example_script[deepseek_moe_w8a8_int8.py] PASSED

tests/examples/test_quantizing_moe.py::TestQuantizingMOE::test_deepseek_example_script[deepseek_moe_w8a8_fp8.py] PASSED
```

[1]: https://github.com/neuralmagic/llm-compressor-testing/actions/runs/14712618904
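
To reproduce the test plan locally, the two parametrized cases above can be run via pytest in an environment without flash-attn installed; a minimal sketch (the `-k deepseek` filter is an assumption based on the test IDs above):

```
# Hypothetical local reproduction of the test plan: run the deepseek
# example tests in an environment that does not have flash-attn.
import subprocess
import sys

subprocess.run(
    [
        sys.executable, "-m", "pytest",
        "tests/examples/test_quantizing_moe.py",
        "-k", "deepseek",
        "-v",
    ],
    check=True,  # raise if any selected test fails
)
```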

Signed-off-by: Domenic Barbuzzi <[email protected]>

👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review.

Note: this label is required to complete the testing suite; please add it only once the PR is code complete and local testing has been performed.

@dbarbuzzi dbarbuzzi marked this pull request as ready for review April 29, 2025 13:34
@dbarbuzzi dbarbuzzi added the `ready` label (When a PR is ready for review) Apr 29, 2025
@kylesayrs kylesayrs enabled auto-merge (squash) April 29, 2025 23:29
@kylesayrs kylesayrs merged commit 564140d into main Apr 29, 2025
8 checks passed
@kylesayrs kylesayrs deleted the drop-flash-attn-skip branch April 29, 2025 23:29
kylesayrs pushed a commit that referenced this pull request May 4, 2025