
Commit 1524dad

reidliu41 authored and Yuqi Zhang committed
[doc] add the print result (vllm-project#17584)
Signed-off-by: reidliu41 <[email protected]>
Co-authored-by: reidliu41 <[email protected]>
Signed-off-by: Yuqi Zhang <[email protected]>
1 parent d5078c4 commit 1524dad

File tree

  • docs/source/features/quantization/fp8.md

1 file changed: +3 −0 lines


docs/source/features/quantization/fp8.md

Lines changed: 3 additions & 0 deletions
@@ -30,6 +30,7 @@ from vllm import LLM
 model = LLM("facebook/opt-125m", quantization="fp8")
 # INFO 06-10 17:55:42 model_runner.py:157] Loading model weights took 0.1550 GB
 result = model.generate("Hello, my name is")
+print(result[0].outputs[0].text)
 ```
 
 :::{warning}
@@ -106,6 +107,7 @@ Load and run the model in `vllm`:
 from vllm import LLM
 model = LLM("./Meta-Llama-3-8B-Instruct-FP8-Dynamic")
 result = model.generate("Hello my name is")
+print(result[0].outputs[0].text)
 ```
 
 Evaluate accuracy with `lm_eval` (for example on 250 samples of `gsm8k`):
@@ -188,4 +190,5 @@ from vllm import LLM
 model = LLM(model="Meta-Llama-3-8B-Instruct-FP8/")
 # INFO 06-10 21:15:41 model_runner.py:159] Loading model weights took 8.4596 GB
 result = model.generate("Hello, my name is")
+print(result[0].outputs[0].text)
 ```
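The line this commit adds, `print(result[0].outputs[0].text)`, relies on the shape of `LLM.generate()`'s return value: a list with one `RequestOutput` per prompt, each holding an `outputs` list of completions with a `.text` field. The sketch below mocks that shape with plain dataclasses so the indexing can be checked without a GPU or vLLM installed; the class and attribute names mirror vLLM's, and the sample text is made up for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CompletionOutput:
    """Stand-in for one generated completion (vLLM exposes .text on it)."""
    text: str


@dataclass
class RequestOutput:
    """Stand-in for the per-prompt result; .outputs holds its completions."""
    outputs: List[CompletionOutput]


# Stand-in for: result = model.generate("Hello, my name is")
# (one prompt in, so one RequestOutput back; sample text is hypothetical)
result = [RequestOutput(outputs=[CompletionOutput(text=" John.")])]

# The line added by this commit: first prompt, first completion, its text.
print(result[0].outputs[0].text)
```

With a single prompt and default sampling, `result[0].outputs[0]` is the only completion, which is why the docs index both levels at `[0]`.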
