
Commit f40de68

horheynm authored and kylesayrs committed
[E2E Testing] KV-Cache (#1004)
~~Contingent on merge of vllm-project/vllm#11354~~ ^ merged

SUMMARY: Add kv-cache e2e testing

* One small model - tinyllama - with kv-cache
* One small model - tinyllama - with kv-cache + gptq
* Fused Model - phi3 - with kv-cache

Signed-off-by: Kyle Sayers <[email protected]>
1 parent 78fab24 commit f40de68

File tree

5 files changed: +43 −0 lines changed

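For context, the compression half of each test amounts to a one-shot run of the named recipe over calibration data. Below is a minimal sketch, assuming the `llmcompressor.transformers.oneshot` entrypoint; the output directory and calibration settings are illustrative, not the harness's actual values:

```python
# Minimal sketch of the compression step these e2e tests exercise (assumed
# llmcompressor.transformers.oneshot API; values below are illustrative).
from llmcompressor.transformers import oneshot

oneshot(
    model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    dataset="ultrachat_200k",                      # matches dataset_id in the configs
    splits={"calibration": "train_sft[:512]"},     # matches dataset_split
    recipe="tests/e2e/vLLM/recipes/kv_cache/default.yaml",
    output_dir="TinyLlama-1.1B-Chat-v1.0-FP8-KV",  # hypothetical output name
    max_seq_length=2048,
    num_calibration_samples=512,
)
```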
Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
cadence: "nightly"
test_type: "regression"
model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
recipe: tests/e2e/vLLM/recipes/kv_cache/gptq.yaml
dataset_id: HuggingFaceH4/ultrachat_200k
dataset_split: train_sft
scheme: kv_cache_default_tinyllama
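Each test config is a flat key/value file like the one above; a sketch of how a runner might consume it (the config's file path is hypothetical, since the diff header does not show it):

```python
# Sketch of loading a flat e2e test config; the path is hypothetical.
import yaml

with open("tests/e2e/vLLM/configs/kv_cache_gptq_tinyllama.yaml") as f:
    cfg = yaml.safe_load(f)

assert cfg["cadence"] == "nightly" and cfg["test_type"] == "regression"
model_id = cfg["model"]         # TinyLlama/TinyLlama-1.1B-Chat-v1.0
recipe_path = cfg["recipe"]     # tests/e2e/vLLM/recipes/kv_cache/gptq.yaml
dataset_id = cfg["dataset_id"]  # HuggingFaceH4/ultrachat_200k
```

The two remaining configs below follow the same layout, varying only the model and recipe.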
Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
cadence: "nightly"
test_type: "regression"
model: microsoft/Phi-3-mini-4k-instruct
recipe: tests/e2e/vLLM/recipes/kv_cache/default.yaml
dataset_id: HuggingFaceH4/ultrachat_200k
dataset_split: train_sft
scheme: kv_cache_default_phi3
Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
cadence: "nightly"
test_type: "regression"
model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
recipe: tests/e2e/vLLM/recipes/kv_cache/default.yaml
dataset_id: HuggingFaceH4/ultrachat_200k
dataset_split: train_sft
scheme: kv_cache_default_tinyllama
Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
quant_stage:
  quant_modifiers:
    QuantizationModifier:
      kv_cache_scheme:
        {num_bits: 8, type: float, symmetric: true, strategy: tensor}
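This recipe attaches a per-tensor FP8 scheme to the kv-cache without touching the weights. The inference half of the test then serves the compressed model in vLLM; a sketch, with an illustrative model path (`kv_cache_dtype="fp8"` is a real vLLM option, the rest is assumed):

```python
# Sketch of the vLLM inference check; the model path is illustrative.
from vllm import LLM, SamplingParams

llm = LLM(model="TinyLlama-1.1B-Chat-v1.0-FP8-KV", kv_cache_dtype="fp8")
outputs = llm.generate(
    ["The capital of France is"],
    SamplingParams(max_tokens=16),
)
print(outputs[0].outputs[0].text)
```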
Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
quant_stage:
  quant_modifiers:
    QuantizationModifier:
      kv_cache_scheme:
        {num_bits: 8, type: float, symmetric: true, strategy: tensor}
    GPTQModifier:
      sequential_update: false
      ignore: ["lm_head"]
      config_groups:
        group_0:
          weights:
            num_bits: 4
            type: "int"
            symmetric: true
            strategy: "channel"
            actorder: False
          targets: ["Linear"]
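This recipe layers channel-wise INT4 GPTQ weight quantization on top of the same FP8 kv-cache scheme. One way to sanity-check that both modifiers landed in the saved checkpoint is to inspect its quantization config; a sketch assuming the compressed-tensors serialization layout, with a hypothetical checkpoint path:

```python
# Sketch of checking a saved checkpoint for both schemes; the path and the
# exact key layout are assumptions based on compressed-tensors serialization.
import json

with open("TinyLlama-1.1B-Chat-v1.0-W4A16-FP8-KV/config.json") as f:
    qcfg = json.load(f)["quantization_config"]

print(qcfg.get("kv_cache_scheme"))  # expect the FP8 per-tensor scheme
print(qcfg["config_groups"]["group_0"]["weights"]["num_bits"])  # expect 4
```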
