10 changes: 5 additions & 5 deletions docs/en/advanced_guides/needleinahaystack_eval.md
@@ -48,7 +48,7 @@ If evaluating locally, the command will use all available GPUs. You can control

```bash
# Local evaluation
- python run.py --dataset needlebench_v2_128k --models vllm_qwen2_5_7b_instruct_128k --summarizer needlebench/needlebench_v2_128k_summarizer
+ python run.py --datasets needlebench_v2_128k --models vllm_qwen2_5_7b_instruct_128k --summarizer needlebench/needlebench_v2_128k_summarizer
```

##### Evaluation on Slurm Cluster
@@ -57,21 +57,21 @@ For Slurm environments, you can add options like `--slurm -p partition_name -q r

```bash
# Slurm evaluation
- python run.py --dataset needlebench_v2_128k --models vllm_qwen2_5_7b_instruct_128k --summarizer needlebench/needlebench_v2_128k_summarizer --slurm -p partition_name -q reserved --max-num-workers 16
+ python run.py --datasets needlebench_v2_128k --models vllm_qwen2_5_7b_instruct_128k --summarizer needlebench/needlebench_v2_128k_summarizer --slurm -p partition_name -q reserved --max-num-workers 16
```

##### Evaluating Specific Subsets

If you only want to test the original Needle In A Haystack task (e.g., single-needle 128k), adjust the dataset parameter:

```bash
- python run.py --dataset needlebench_v2_single_128k --models vllm_qwen2_5_7b_instruct_128k --summarizer needlebench/needlebench_v2_128k_summarizer --slurm -p partition_name -q reserved --max-num-workers 16
+ python run.py --datasets needlebench_v2_single_128k --models vllm_qwen2_5_7b_instruct_128k --summarizer needlebench/needlebench_v2_128k_summarizer --slurm -p partition_name -q reserved --max-num-workers 16
```

To evaluate only the Chinese versions, specify the sub-dataset after `/`:

```bash
- python run.py --dataset needlebench_v2_single_128k/needlebench_zh_datasets --models vllm_qwen2_5_7b_instruct_128k --summarizer needlebench/needlebench_v2_128k_summarizer --slurm -p partition_name -q reserved --max-num-workers 16
+ python run.py --datasets needlebench_v2_single_128k/needlebench_zh_datasets --models vllm_qwen2_5_7b_instruct_128k --summarizer needlebench/needlebench_v2_128k_summarizer --slurm -p partition_name -q reserved --max-num-workers 16
```

Ensure `VLLM` is installed beforehand:
@@ -92,7 +92,7 @@ You can then run evaluation with:
python run.py configs/eval_needlebench_v2.py --slurm -p partition_name -q reserved --max-num-workers 16
```

- No need to manually specify `--dataset`, `--models`, or `--summarizer` again.
+ No need to manually specify `--datasets`, `--models`, or `--summarizer` again.
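
For reference, a config like `configs/eval_needlebench_v2.py` typically bundles these three pieces via OpenCompass's standard `read_base()` mechanism. The sketch below is a minimal illustration under that assumption; the import paths and variable names are assumed, not the file's actual contents:

```python
# Minimal sketch of a config like configs/eval_needlebench_v2.py.
# The read_base() pattern is standard OpenCompass; every module path and
# variable name below is an assumption, not the repo's actual content.
from mmengine.config import read_base

with read_base():
    # Datasets: the NeedleBench V2 128k suite (module path assumed).
    from .datasets.needlebench_v2.needlebench_v2_128k.needlebench_v2_128k import (
        needlebench_datasets as datasets,
    )
    # Model: vLLM-served Qwen2.5-7B-Instruct with a 128k window (path assumed).
    from .models.qwen.vllm_qwen2_5_7b_instruct_128k import models
    # Summarizer: aggregates scores across needle depths and context lengths.
    from .summarizers.needlebench import needlebench_v2_128k_summarizer as summarizer
```

With `datasets`, `models`, and `summarizer` all resolved inside the config, the command line only needs the config path.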

### Visualization

10 changes: 5 additions & 5 deletions docs/zh_cn/advanced_guides/needleinahaystack_eval.md
@@ -48,7 +48,7 @@ pip install -e .

```bash
# Local evaluation
- python run.py --dataset needlebench_v2_128k --models vllm_qwen2_5_7b_instruct_128k --summarizer needlebench/needlebench_v2_128k_summarizer
+ python run.py --datasets needlebench_v2_128k --models vllm_qwen2_5_7b_instruct_128k --summarizer needlebench/needlebench_v2_128k_summarizer
```

##### Evaluation on Slurm Cluster
@@ -57,21 +57,21 @@ python run.py --dataset needlebench_v2_128k --models vllm_qwen2_5_7b_instruct_12

```bash
# Slurm evaluation
- python run.py --dataset needlebench_v2_128k --models vllm_qwen2_5_7b_instruct_128k --summarizer needlebench/needlebench_v2_128k_summarizer --slurm -p partition_name -q reserved --max-num-workers 16
+ python run.py --datasets needlebench_v2_128k --models vllm_qwen2_5_7b_instruct_128k --summarizer needlebench/needlebench_v2_128k_summarizer --slurm -p partition_name -q reserved --max-num-workers 16
```

##### Evaluating Specific Subsets

If you only want to test the original Needle In A Haystack task setting, change the dataset parameter to `needlebench_v2_single_128k`, which corresponds to the single-needle version of the test at 128k length:

```bash
- python run.py --dataset needlebench_v2_single_128k --models vllm_qwen2_5_7b_instruct_128k --summarizer needlebench/needlebench_v2_128k_summarizer --slurm -p partition_name -q reserved --max-num-workers 16
+ python run.py --datasets needlebench_v2_single_128k --models vllm_qwen2_5_7b_instruct_128k --summarizer needlebench/needlebench_v2_128k_summarizer --slurm -p partition_name -q reserved --max-num-workers 16
```

You can also narrow this further to a sub-dataset, e.g. by setting the `--datasets` parameter to `needlebench_v2_single_128k/needlebench_zh_datasets`, which runs only the Chinese single-needle 128k test. The part after `/` names the sub-dataset; the available sub-dataset variables can be found in `opencompass/configs/datasets/needlebench_v2/needlebench_v2_128k/needlebench_v2_single_128k.py` (a sketch of their layout follows the command below). For example:

```bash
- python run.py --dataset needlebench_v2_single_128k/needlebench_zh_datasets --models vllm_qwen2_5_7b_instruct_128k --summarizer needlebench/needlebench_v2_128k_summarizer --slurm -p partition_name -q reserved --max-num-workers 16
+ python run.py --datasets needlebench_v2_single_128k/needlebench_zh_datasets --models vllm_qwen2_5_7b_instruct_128k --summarizer needlebench/needlebench_v2_128k_summarizer --slurm -p partition_name -q reserved --max-num-workers 16
```
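
As a rough illustration of how such sub-dataset variables tend to be laid out: only the name `needlebench_zh_datasets` comes from the documentation above; the dataset class, fields, and depth grid below are assumptions, not the actual file contents.

```python
# Illustrative sketch only -- not the actual contents of
# needlebench_v2_single_128k.py. Field names and the dataset class are assumed.
needlebench_zh_datasets = [
    dict(
        abbr=f'needlebench_zh_128k_depth{depth}',  # hypothetical naming scheme
        type='NeedleBenchOriginDataset',           # hypothetical dataset class
        length=128000,                             # 128k context length
        depth_percent=depth,                       # where the needle is inserted
    )
    for depth in range(0, 101, 10)
]

# An English counterpart (e.g. `needlebench_en_datasets`) would typically sit
# alongside it; `--datasets needlebench_v2_single_128k/needlebench_zh_datasets`
# selects just this list.
```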

Make sure the [VLLM](https://docs.vllm.ai/en/latest/getting_started/installation/gpu.html) tool is installed before running the evaluation.
@@ -95,7 +95,7 @@ pip install vllm
python run.py configs/eval_needlebench_v2.py --slurm -p partition_name -q reserved --max-num-workers 16
```

- Note that we do not need to pass parameters such as `--dataset, --models, --summarizer` here, since these configurations are already defined in the config file. You can manually adjust `--max-num-workers` to control the number of parallel workers.
+ Note that we do not need to pass parameters such as `--datasets, --models, --summarizer` here, since these configurations are already defined in the config file. You can manually adjust `--max-num-workers` to control the number of parallel workers.

### Visualization
