Fix GPTQ ROCm type conversion bug causing gibberish output
- Fix double type conversion bug in q_gemm.cu affecting all GPTQ models with tensor parallelism on ROCm
- Move half2 res2 declaration inside loop with proper zero initialization
- Remove problematic __half_as_ushort/__ushort_as_half conversions
- Fix spurious Triton flash attention warning shown for sliding-window models when VLLM_USE_TRITON_FLASH_ATTN=0
- Changes match upstream PR vllm-project#17583
This fixes silent data corruption that was causing GPTQ models to produce gibberish output on ROCm with tensor parallelism.
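
For reviewers unfamiliar with this class of bug, here is a minimal host-side sketch of how a "double type conversion" can corrupt data silently. It is not the code from q_gemm.cu. It uses float and uint32_t as stand-ins for half and unsigned short, and every function name is hypothetical. The assumption is that the failure follows the usual pattern: a value that already has a floating-point type is implicitly converted to an integer by value, and that integer is then reinterpreted as a floating-point bit pattern.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for __half_as_ushort: copy the raw bits, no numeric conversion.
static std::uint32_t float_as_uint(float f) {
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    return u;
}

// Hypothetical stand-in for __ushort_as_half: treat the integer as a raw bit pattern.
static float uint_as_float(std::uint32_t u) {
    float f;
    std::memcpy(&f, &u, sizeof f);
    return f;
}

// Intended behaviour: the partial result stays a floating-point value throughout.
static float accumulate_direct(float acc, float partial) {
    return acc + partial;
}

// Assumed bug pattern: 'partial' is already a float, but it is first converted
// to an integer by value (1.5f becomes 1) and that integer is then reinterpreted
// as float bits (0x00000001 is a tiny denormal, not 1.5f).
static float accumulate_double_converted(float acc, float partial) {
    std::uint32_t by_value = static_cast<std::uint32_t>(partial);
    return acc + uint_as_float(by_value);
}
```

Under these assumptions, a pure bit round trip such as uint_as_float(float_as_uint(x)) preserves x, while accumulate_double_converted quietly returns an unrelated number with no error or warning. Dropping the reinterpreting conversions, as this commit does, keeps the value in its native type so that path cannot occur.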
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <[email protected]>