1 file changed: +11 −4 lines changed

@@ -11,7 +11,7 @@ This documentation includes information for running the popular Llama 3.1 series
 The pre-built image includes:
 
 - ROCm™ 6.3.1
-- vLLM 0.6.6
+- vLLM 0.7.3
 - PyTorch 2.7dev (nightly)
 
 ## Pull latest Docker Image
@@ -20,18 +20,25 @@ Pull the most recent validated docker image with `docker pull rocm/vllm-dev:main
 
 ## What is New
 
-nightly_fixed_aiter_integration_final_20250305:
-- Performance improvement
+20250305_aiter:
+- vLLM 0.7.3
+- hipBLASLt 0.13
+- AITER improvements
+- Support for FP8 skinny GEMM
+
 20250207_aiter:
 - More performant AITER
 - Bug fixes
+
 20250205_aiter:
 - [AITER](https://github.com/ROCm/aiter) support
 - Performance improvement for custom paged attention
 - Reduced memory overhead bug fix
+
 20250124:
 - Fix accuracy issue with 405B FP8 Triton FA
 - Fixed accuracy issue with TP8
+
 20250117:
 - [Experimental DeepSeek-V3 and DeepSeek-R1 support](#running-deepseek-v3-and-deepseek-r1)
@@ -359,7 +366,7 @@ docker run -it --rm --ipc=host --network=host --group-add render \
   --cap-add=CAP_SYS_ADMIN --cap-add=SYS_PTRACE \
   --device=/dev/kfd --device=/dev/dri --device=/dev/mem \
   -e VLLM_USE_TRITON_FLASH_ATTN=0 \
-  -e VLLM_FP8_PADDING=0 \
+  -e VLLM_MLA_DISABLE=1 \
   rocm/vllm-dev:main
 # Online serving
 vllm serve deepseek-ai/DeepSeek-V3 \
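Once the container is up and `vllm serve` has finished loading the model (the serve command in this hunk is truncated), the server exposes vLLM's OpenAI-compatible API. A minimal request sketch, assuming the default port 8000 and the model name shown above:

```shell
# Sketch: query the OpenAI-compatible completions endpoint started by
# `vllm serve`. Assumes the default listen port 8000; the "model" field
# must match the name passed to `vllm serve`.
curl -s http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "deepseek-ai/DeepSeek-V3",
        "prompt": "The capital of France is",
        "max_tokens": 16
      }'
```

The response is a JSON object whose `choices[0].text` holds the generated continuation; swap `/v1/completions` for `/v1/chat/completions` (with a `messages` array) when using the chat interface.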