
Drop flash_attn skip for quantizing_moe example tests #1396


Merged
kylesayrs merged 2 commits into main from drop-flash-attn-skip on Apr 29, 2025

Conversation


@dbarbuzzi dbarbuzzi commented Apr 28, 2025

SUMMARY:
Drop the skip requiring `flash_attn` to be installed in the tests for the `quantizing_moe` examples. Recent CI failures related to this package and its CUDA compatibility with the recently released PyTorch 2.7.0 have shown that it is not required for these tests.
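For context, the dropped guard is the standard pytest module-availability pattern. A minimal sketch of what such a skip typically looks like (hypothetical names, not the repository's exact code):

```
import importlib.util

import pytest

# Hypothetical sketch of the kind of guard this PR removes: skip the
# quantizing_moe example tests when flash_attn is not importable.
HAS_FLASH_ATTN = importlib.util.find_spec("flash_attn") is not None


@pytest.mark.skipif(not HAS_FLASH_ATTN, reason="requires flash_attn")
def test_deepseek_example_script():
    ...
```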

TEST PLAN:
An [internal test run][1] that drops the installation of `flash-attn` and runs the changes on this branch indicates that the tests will pass (one test has passed so far; the PR will be marked ready once the run completes and the remaining tests show the expected results).

Specific relevant output (will update with other tests’ results):
```
tests/examples/test_quantizing_moe.py::TestQuantizingMOE::test_deepseek_example_script[deepseek_moe_w8a8_int8.py] PASSED

tests/examples/test_quantizing_moe.py::TestQuantizingMOE::test_deepseek_example_script[deepseek_moe_w8a8_fp8.py] PASSED
```

[1]: https://github.com/neuralmagic/llm-compressor-testing/actions/runs/14712618904
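
To reproduce the test plan locally, the two parametrized cases above can be run via pytest in an environment without flash-attn installed; a minimal sketch (the `-k deepseek` filter is an assumption based on the test IDs above):

```
# Hypothetical local reproduction of the test plan: run the deepseek
# example tests in an environment that does not have flash-attn.
import subprocess
import sys

subprocess.run(
    [
        sys.executable, "-m", "pytest",
        "tests/examples/test_quantizing_moe.py",
        "-k", "deepseek",
        "-v",
    ],
    check=True,  # raise if any selected test fails
)
```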

Signed-off-by: Domenic Barbuzzi <[email protected]>

👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review.

Note: this label is required to complete the testing suite; please add it only once the PR is code complete and local testing has been performed.

@dbarbuzzi dbarbuzzi marked this pull request as ready for review April 29, 2025 13:34
@dbarbuzzi dbarbuzzi added the `ready` label (When a PR is ready for review) Apr 29, 2025
@kylesayrs kylesayrs enabled auto-merge (squash) April 29, 2025 23:29
@kylesayrs kylesayrs merged commit 564140d into main Apr 29, 2025
8 checks passed
@kylesayrs kylesayrs deleted the drop-flash-attn-skip branch April 29, 2025 23:29
kylesayrs pushed a commit that referenced this pull request May 4, 2025