Commit c1fe865

fixing reproducibility of lmeval tests (#1220)
SUMMARY:
LM Eval weekly tests are failing. This resolves two issues:
1. Installs pillow, which I had locally through vllm but which is not installed as part of llm-compressor.
2. Adds a random seed to the lmeval tests, which, after a good amount of testing, seems to resolve the issue. The nondeterminism is entirely during calibration/quantization; lm-eval behavior is deterministic, as it always sets a seed. It is a bit surprising that the seed can have such a drastic effect, but these are 2B vision-language models and a difficult multiple-choice dataset, so accuracy is not too far from random guessing.

TEST PLAN:
no new src code

Signed-off-by: Brian Dellabetta <[email protected]>
1 parent 64175da commit c1fe865

5 files changed: +18 -7 lines changed

setup.py

Lines changed: 6 additions & 3 deletions
@@ -60,9 +60,12 @@
         "datasets",
         "accelerate>=0.20.3,!=1.1.0",
         "pynvml",
-        "compressed-tensors"
-        if version_info.build_type == "release"
-        else "compressed-tensors-nightly",
+        "pillow",
+        (
+            "compressed-tensors"
+            if version_info.build_type == "release"
+            else "compressed-tensors-nightly"
+        ),
     ],
     extras_require={
         "dev": [

tests/lmeval/configs/vl_fp8_dynamic_per_token.yaml

Lines changed: 1 addition & 1 deletion
@@ -2,6 +2,7 @@ cadence: weekly
 model: Qwen/Qwen2-VL-2B-Instruct
 model_class: TraceableQwen2VLForConditionalGeneration
 scheme: FP8_DYNAMIC
+seed: 42 # compressed model is sensitive to random seed
 lmeval:
   model: "hf-multimodal"
   model_args:
@@ -10,7 +11,6 @@ lmeval:
     convert_img_format: True
   task: mmmu_val_economics
   num_fewshot: 0
-  limit: 1000
   batch_size: 8
   metrics:
     acc,none: 0.333

tests/lmeval/configs/vl_int8_w8a8_dynamic_per_token.yaml

Lines changed: 1 addition & 1 deletion
@@ -5,6 +5,7 @@ scheme: INT8_dyn_per_token
 recipe: tests/e2e/vLLM/recipes/INT8/recipe_int8_channel_weight_dynamic_per_token.yaml
 dataset_id: lmms-lab/flickr30k
 dataset_split: "test[:512]"
+seed: 42 #compressed model is sensitive to random seed
 lmeval:
   model: "hf-multimodal"
   model_args:
@@ -13,7 +14,6 @@ lmeval:
     convert_img_format: True
   task: mmmu_val_economics
   num_fewshot: 0
-  limit: 1000
   metrics:
     acc,none: 0.233
   batch_size: 8

tests/lmeval/configs/vl_w4a16_actorder_weight.yaml

Lines changed: 2 additions & 2 deletions
@@ -5,6 +5,7 @@ recipe: tests/e2e/vLLM/recipes/actorder/recipe_w4a16_actorder_weight.yaml
 dataset_id: lmms-lab/flickr30k
 dataset_split: "test[:512]"
 scheme: W4A16_actorder_group
+seed: 42 #compressed model is sensitive to random seed
 lmeval:
   model: "hf-multimodal"
   model_args:
@@ -13,7 +14,6 @@ lmeval:
     convert_img_format: True
   task: mmmu_val_economics
   num_fewshot: 0
-  limit: 1000
   metrics:
-    acc,none: 0.4
+    acc,none: 0.366
   batch_size: 4
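
The commit message notes that these accuracies sit close to random guessing. The sketch below gives a rough sense of how noisy such a metric can be. The question count (30) and the four-option chance rate (0.25) are assumptions for illustration only and are not stated in this commit; the helper names are hypothetical.

```python
import math


def accuracy_step(n_questions: int) -> float:
    """Smallest possible change in accuracy: one question flipping."""
    return 1.0 / n_questions


def accuracy_standard_error(p: float, n_questions: int) -> float:
    """Binomial standard error of an accuracy estimate p over n questions."""
    return math.sqrt(p * (1.0 - p) / n_questions)


# ASSUMPTION: 30 questions and four answer options (chance = 0.25).
N_QUESTIONS = 30
CHANCE = 0.25

# Expected accuracies taken from the three configs in this commit.
expected = {
    "vl_fp8_dynamic_per_token": 0.333,
    "vl_int8_w8a8_dynamic_per_token": 0.233,
    "vl_w4a16_actorder_weight": 0.366,
}

step = accuracy_step(N_QUESTIONS)
for name, acc in expected.items():
    se = accuracy_standard_error(acc, N_QUESTIONS)
    # Distance from chance, measured in standard errors of the estimate.
    z = (acc - CHANCE) / se
    print(f"{name}: acc={acc:.3f} step={step:.3f} se={se:.3f} z_vs_chance={z:+.2f}")
```

Under these assumptions a single flipped answer moves the metric by about 0.033, so a small change in the quantized weights caused by a different calibration seed could plausibly shift the reported accuracy by a noticeable amount.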

tests/lmeval/test_lmeval.py

Lines changed: 8 additions & 0 deletions
@@ -1,9 +1,11 @@
 import os
+import random
 import shutil
 from pathlib import Path

 import numpy
 import pytest
+import torch
 import yaml
 from loguru import logger
 from pydantic import BaseModel
@@ -73,6 +75,12 @@ def set_up(self):
         self.quant_type = eval_config.get("quant_type")
         self.save_dir = eval_config.get("save_dir")

+        seed = eval_config.get("seed", None)
+        if seed is not None:
+            random.seed(seed)
+            numpy.random.seed(seed)
+            torch.manual_seed(seed)
+
         logger.info("========== RUNNING ==============")
         logger.info(self.scheme)
