
[Fix] Set correct paths for the examples #2198

Open
wants to merge 1 commit into base: main
2 changes: 1 addition & 1 deletion docs/en/advanced_guides/circular_eval.md
@@ -110,4 +110,4 @@ summarizer = dict(
)

For more complex evaluation examples, refer to this sample code: https://github.com/open-compass/opencompass/tree/main/configs/eval_circular.py
For more complex evaluation examples, refer to this sample code: https://github.com/open-compass/opencompass/tree/main/examples/eval_circular.py
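As background, CircularEval evaluates each multiple-choice question under circular rotations of its options and, in its strict form, only counts a question as solved when every rotation is answered correctly. A toy sketch of generating the rotated variants (illustrative only, not OpenCompass's implementation):

```python
def circular_variants(question: str, options: list[str]) -> list[tuple[str, list[str]]]:
    """Return one copy of the question per circular rotation of its options."""
    variants = []
    for shift in range(len(options)):
        rotated = options[shift:] + options[:shift]
        variants.append((question, rotated))
    return variants

# Example: a 4-option question yields 4 rotated variants (ABCD, BCDA, CDAB, DABC)
for _, opts in circular_variants("2 + 2 = ?", ["3", "4", "5", "6"]):
    print(opts)
```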
4 changes: 2 additions & 2 deletions docs/en/advanced_guides/code_eval.md
@@ -52,7 +52,7 @@ We also need model responses with randomness, thus setting the `generation_kwargs`

Note: `num_return_sequences` must be greater than or equal to k, as pass@k itself is a probability estimate.

You can specifically refer to the following configuration file [configs/eval_code_passk.py](https://github.com/open-compass/opencompass/blob/main/configs/eval_code_passk.py)
You can specifically refer to the following configuration file [examples/eval_code_passk.py](https://github.com/open-compass/opencompass/blob/main/examples/eval_code_passk.py)
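As background, pass@k is commonly computed with the unbiased estimator introduced alongside HumanEval, which is why at least k samples per problem are needed; a minimal sketch:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate from n samples of which c passed the tests."""
    if n - c < k:
        return 1.0  # every size-k subset contains at least one correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 10 sampled completions, 3 correct, estimate pass@5
print(pass_at_k(10, 3, 5))  # ~0.917
```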

### For Models That Do Not Support Multiple Responses

@@ -101,4 +101,4 @@ For `mbpp`, modify the `type`, `eval_cfg.evaluator.type`, `reader_cfg.output_column`

We also need model responses with randomness, thus setting the `generation_kwargs` parameter is necessary.

You can specifically refer to the following configuration file [configs/eval_code_passk_repeat_dataset.py](https://github.com/open-compass/opencompass/blob/main/configs/eval_code_passk_repeat_dataset.py)
You can specifically refer to the following configuration file [examples/eval_code_passk_repeat_dataset.py](https://github.com/open-compass/opencompass/blob/main/examples/eval_code_passk_repeat_dataset.py)
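For sampling-based pass@k, the model entry typically enables stochastic decoding and requests k return sequences. A hedged sketch of such a `generation_kwargs` block (the exact fields accepted depend on the model wrapper; the linked example config is authoritative):

```python
# Illustrative values only -- tune them to your model and budget.
generation_kwargs = dict(
    do_sample=True,           # enable stochastic decoding
    top_p=0.95,               # nucleus sampling
    temperature=0.8,          # randomness across the k samples
    num_return_sequences=10,  # must be >= k for the pass@k estimate
)
```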
2 changes: 1 addition & 1 deletion docs/en/advanced_guides/code_eval_service.md
@@ -62,7 +62,7 @@ When the model inference and code evaluation services are running on the same host

### Configuration File

We provide [the configuration file](https://github.com/open-compass/opencompass/blob/main/configs/eval_codegeex2.py) of using `humanevalx` for evaluation on `codegeex2` as reference.
We provide [the configuration file](https://github.com/open-compass/opencompass/blob/main/examples/eval_codegeex2.py) of using `humanevalx` for evaluation on `codegeex2` as reference.

The dataset and related post-processing configuration files can be found at this [link](https://github.com/open-compass/opencompass/tree/main/configs/datasets/humanevalx); pay attention to the `evaluator` field in `humanevalx_eval_cfg_dict`.
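The `evaluator` field is what points OpenCompass at the running code-evaluation service; a rough sketch of its shape (the type name and field names below are assumptions for illustration, the linked dataset configs hold the real ones):

```python
# Hypothetical evaluator entry -- check configs/datasets/humanevalx for the
# actual class and parameters used by the project.
humanevalx_eval_cfg_dict = dict(
    evaluator=dict(
        type='HumanevalXEvaluator',  # assumed evaluator class name
        ip_address='localhost',      # host running the code eval service
        port=5001,                   # port the service listens on
        language='python',           # language split being scored
    ),
)
```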

2 changes: 1 addition & 1 deletion docs/en/advanced_guides/contamination_eval.md
@@ -72,7 +72,7 @@ will report the accuracy or perplexity of ceval on subsets composed of these three

- If the performance of the three is relatively close, the contamination level of the model on that test set is light; otherwise, it is heavy.

The following configuration file can be referenced [link](https://github.com/open-compass/opencompass/blob/main/configs/eval_contamination.py):
The following configuration file can be referenced [link](https://github.com/open-compass/opencompass/blob/main/examples/eval_contamination.py):

```python
from mmengine.config import read_base
...
```
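The comparison rule stated above is easy to express in code; a toy illustration over the three subset scores (the 0.02 tolerance is arbitrary, chosen only to make the idea concrete):

```python
def contamination_level(acc_clean: float, acc_input: float, acc_input_label: float,
                        tol: float = 0.02) -> str:
    """Close scores across the three subsets suggest light contamination."""
    scores = (acc_clean, acc_input, acc_input_label)
    spread = max(scores) - min(scores)
    return 'light' if spread <= tol else 'heavy'

print(contamination_level(0.48, 0.49, 0.62))  # 'heavy': labels were likely seen
```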
2 changes: 1 addition & 1 deletion docs/en/advanced_guides/evaluation_lightllm.md
@@ -63,7 +63,7 @@ else:
### Step-2: Evaluate the above model using OpenCompass.

```shell
python run.py configs/eval_lightllm.py
python run.py examples/eval_lightllm.py
```

You are expected to get the evaluation results after the inference and evaluation.
2 changes: 1 addition & 1 deletion docs/en/advanced_guides/needleinahaystack_eval.md
@@ -89,7 +89,7 @@ For other models, it is recommended to write your own config file (such as `exam
You can then run evaluation with:

```bash
python run.py configs/eval_needlebench_v2.py --slurm -p partition_name -q reserved --max-num-workers 16
python run.py examples/eval_needlebench_v2.py --slurm -p partition_name -q reserved --max-num-workers 16
```

No need to manually specify `--dataset`, `--models`, or `--summarizer` again.
2 changes: 1 addition & 1 deletion docs/en/advanced_guides/objective_judgelm_evaluation.md
@@ -19,7 +19,7 @@ OpenCompass currently supports most datasets that use `GenInferencer` for inference

### Step One: Building Evaluation Configurations, Using MATH as an Example

Below is the Config for evaluating the MATH dataset with JudgeLLM, with the evaluation model being *Llama3-8b-instruct* and the JudgeLLM being *Llama3-70b-instruct*. For more detailed config settings, please refer to `configs/eval_math_llm_judge.py`. The following is a brief version of the annotations to help users understand the meaning of the configuration file.
Below is the Config for evaluating the MATH dataset with JudgeLLM, with the evaluation model being *Llama3-8b-instruct* and the JudgeLLM being *Llama3-70b-instruct*. For more detailed config settings, please refer to `examples/eval_math_llm_judge.py`. The following is a brief version of the annotations to help users understand the meaning of the configuration file.

```python
# Most of the code in this file is copied from https://github.com/openai/simple-evals/blob/main/math_eval.py
...
```
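Conceptually, a JudgeLLM pipeline sends the candidate answer together with the reference to the judge model and parses a verdict out of its reply; a deliberately generic sketch (not the actual prompt or parsing logic used by the config above):

```python
import re

JUDGE_PROMPT = (
    "Problem:\n{problem}\n\nReference answer:\n{reference}\n\n"
    "Candidate answer:\n{prediction}\n\n"
    "Reply with exactly one word: CORRECT or INCORRECT."
)

def parse_judge_verdict(judge_reply: str) -> bool:
    """Return True if the judge declared the candidate answer correct."""
    match = re.search(r"\b(CORRECT|INCORRECT)\b", judge_reply.upper())
    return bool(match) and match.group(1) == "CORRECT"

print(parse_judge_verdict("The derivation matches the reference. CORRECT"))  # True
```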
2 changes: 1 addition & 1 deletion docs/en/advanced_guides/prompt_attack.md
@@ -90,7 +90,7 @@ attack = dict(
Please use `--mode infer` when running the attack experiment, and set the `PYTHONPATH` environment variable.

```shell
python run.py configs/eval_attack.py --mode infer
python run.py examples/eval_attack.py --mode infer
```

All the results will be saved in the `attack` folder.
2 changes: 1 addition & 1 deletion docs/en/advanced_guides/subjective_evaluation.md
@@ -25,7 +25,7 @@ We support the use of GPT-4 (or other JudgeLLM) for the subjective evaluation of

## Initiating Subjective Evaluation

Similar to existing objective evaluation methods, you can configure related settings in `configs/eval_subjective.py`.
Similar to existing objective evaluation methods, you can configure related settings in `examples/eval_subjective.py`.

### Basic Parameters: Specifying models, datasets, and judgemodels

2 changes: 1 addition & 1 deletion docs/zh_cn/advanced_guides/circular_eval.md
@@ -108,4 +108,4 @@ summarizer = dict(
)

For more complex evaluation examples, refer to this sample code: https://github.com/open-compass/opencompass/tree/main/configs/eval_circular.py
For more complex evaluation examples, refer to this sample code: https://github.com/open-compass/opencompass/tree/main/examples/eval_circular.py
4 changes: 2 additions & 2 deletions docs/zh_cn/advanced_guides/code_eval.md
@@ -53,7 +53,7 @@ models = [
Note: `num_return_sequences` must be greater than or equal to k, since pass@k itself is a probability estimate.

For details, refer to the following configuration file
[configs/eval_code_passk.py](https://github.com/open-compass/opencompass/blob/main/configs/eval_code_passk.py)
[examples/eval_code_passk.py](https://github.com/open-compass/opencompass/blob/main/examples/eval_code_passk.py)

### For Models That Do Not Support Multiple Responses

@@ -103,4 +103,4 @@ models = [
We also need randomness in the model's responses, so the `generation_kwargs` parameter must be set as well.

For details, refer to the following configuration file
[configs/eval_code_passk_repeat_dataset.py](https://github.com/open-compass/opencompass/blob/main/configs/eval_code_passk_repeat_dataset.py)
[examples/eval_code_passk_repeat_dataset.py](https://github.com/open-compass/opencompass/blob/main/examples/eval_code_passk_repeat_dataset.py)
2 changes: 1 addition & 1 deletion docs/zh_cn/advanced_guides/code_eval_service.md
@@ -62,7 +62,7 @@ telnet your_service_ip_address your_service_port

### Configuration File

We provide [the configuration file](https://github.com/open-compass/opencompass/blob/main/configs/eval_codegeex2.py) for evaluating `codegeex2` with `humanevalx` as a reference.
We provide [the configuration file](https://github.com/open-compass/opencompass/blob/main/examples/eval_codegeex2.py) for evaluating `codegeex2` with `humanevalx` as a reference.
The dataset and related post-processing configuration files can be found at this [link](https://github.com/open-compass/opencompass/tree/main/configs/datasets/humanevalx); pay attention to the `evaluator` field in `humanevalx_eval_cfg_dict`.

```python
...
```
2 changes: 1 addition & 1 deletion docs/zh_cn/advanced_guides/contamination_eval.md
@@ -70,7 +70,7 @@ gsm8k-ref-ppl f729ba average_ppl unknown 1.55 1.2

- If the performance on the three subsets is relatively close, the contamination of the model on that test set is light; otherwise, it is heavy.

The following configuration file can be used as a reference [link](https://github.com/open-compass/opencompass/blob/main/configs/eval_contamination.py):
The following configuration file can be used as a reference [link](https://github.com/open-compass/opencompass/blob/main/examples/eval_contamination.py):

```python
from mmengine.config import read_base
...
```
2 changes: 1 addition & 1 deletion docs/zh_cn/advanced_guides/evaluation_lightllm.md
@@ -63,7 +63,7 @@ else:
### Step 2: Evaluate the above model using OpenCompass

```shell
python run.py configs/eval_lightllm.py
python run.py examples/eval_lightllm.py
```

Once the model has finished inference and metric computation, we obtain its evaluation results.
2 changes: 1 addition & 1 deletion docs/zh_cn/advanced_guides/needleinahaystack_eval.md
@@ -92,7 +92,7 @@ pip install vllm
Once the test `config` file is written, we can pass its path to `run.py` on the command line, for example:

```bash
python run.py configs/eval_needlebench_v2.py --slurm -p partition_name -q reserved --max-num-workers 16
python run.py examples/eval_needlebench_v2.py --slurm -p partition_name -q reserved --max-num-workers 16
```

Note that we do not need to pass `--dataset`, `--models`, `--summarizer`, etc. here, since these settings are already defined in the config file. You can manually adjust `--max-num-workers` to control the number of parallel workers.
2 changes: 1 addition & 1 deletion docs/zh_cn/advanced_guides/objective_judgelm_evaluation.md
@@ -19,7 +19,7 @@

### Step 1: Build the Evaluation Config, Using MATH as an Example

Below is the config for evaluating the MATH dataset with JudgeLLM, with the evaluated model being *Llama3-8b-instruct* and the JudgeLLM being *Llama3-70b-instruct*. For more detailed config settings, please refer to `configs/eval_math_llm_judge.py`. Brief annotations follow to help users understand the meaning of the configuration file.
Below is the config for evaluating the MATH dataset with JudgeLLM, with the evaluated model being *Llama3-8b-instruct* and the JudgeLLM being *Llama3-70b-instruct*. For more detailed config settings, please refer to `examples/eval_math_llm_judge.py`. Brief annotations follow to help users understand the meaning of the configuration file.

```python
# Most of the code in this file is copied from https://github.com/openai/simple-evals/blob/main/math_eval.py
...
```
2 changes: 1 addition & 1 deletion docs/zh_cn/advanced_guides/prompt_attack.md
@@ -90,7 +90,7 @@ attack = dict(
Please use the `--mode infer` option when running the attack experiment, and set `PYTHONPATH`.

```shell
python run.py configs/eval_attack.py --mode infer
python run.py examples/eval_attack.py --mode infer
```

All results will be saved in a folder named `attack`.
4 changes: 2 additions & 2 deletions docs/zh_cn/advanced_guides/subjective_evaluation.md
@@ -25,7 +25,7 @@

## Launching a Subjective Evaluation

Similar to the existing objective evaluation methods, the related settings can be configured in configs/eval_subjective.py
Similar to the existing objective evaluation methods, the related settings can be configured in examples/eval_subjective.py

### Basic Parameters: Specifying models, datasets, and judgemodels

@@ -134,7 +134,7 @@ The judgemodel is usually set to a strong model such as GPT-4 and can be used directly as configured in the config file
### Step 3: Launch the Evaluation and Output the Results

```shell
python run.py configs/eval_subjective.py -r
python run.py examples/eval_subjective.py -r
```

- The `-r` flag reuses existing model inference and evaluation results.
18 changes: 9 additions & 9 deletions docs/zh_cn/get_started/quick_start.md
@@ -12,7 +12,7 @@

**Visualization**: Once the evaluation is complete, OpenCompass organizes the results into an easy-to-read table and saves them as CSV and TXT files. You can also enable Lark (Feishu) status reporting to receive timely evaluation status reports in the Lark client.

Next, we demonstrate the basic usage of OpenCompass by evaluating the base model [InternLM2-1.8B](https://huggingface.co/internlm/internlm2-1_8b) and the chat models [InternLM2-Chat-1.8B](https://huggingface.co/internlm/internlm2-chat-1_8b) and [Qwen2-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2-1.5B-Instruct) on subsampled versions of the [GSM8K](https://github.com/openai/grade-school-math) and [MATH](https://github.com/hendrycks/math) datasets. Their configuration files can be found in [configs/eval_chat_demo.py](https://github.com/open-compass/opencompass/blob/main/configs/eval_chat_demo.py) and [configs/eval_base_demo.py](https://github.com/open-compass/opencompass/blob/main/configs/eval_base_demo.py).
Next, we demonstrate the basic usage of OpenCompass by evaluating the base model [InternLM2-1.8B](https://huggingface.co/internlm/internlm2-1_8b) and the chat models [InternLM2-Chat-1.8B](https://huggingface.co/internlm/internlm2-chat-1_8b) and [Qwen2-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2-1.5B-Instruct) on subsampled versions of the [GSM8K](https://github.com/openai/grade-school-math) and [MATH](https://github.com/hendrycks/math) datasets. Their configuration files can be found in [examples/eval_chat_demo.py](https://github.com/open-compass/opencompass/blob/main/examples/eval_chat_demo.py) and [examples/eval_base_demo.py](https://github.com/open-compass/opencompass/blob/main/examples/eval_base_demo.py).

Before running this experiment, make sure you have installed OpenCompass locally. This example should run successfully on a single _GTX-1660-6G_ GPU.

@@ -136,7 +136,7 @@ python tools/list_configs.py llama mmlu

Besides configuring an experiment via the command line, OpenCompass also allows users to write the complete experiment configuration in a config file and run it directly through `run.py`. The config file is organized in Python format and must include the `datasets` and `models` fields.

The configuration for this test is in [configs/eval_chat_demo.py](https://github.com/open-compass/opencompass/blob/main/configs/eval_chat_demo.py). This config imports the required dataset and model configurations through the [inheritance mechanism](../user_guides/config.md#继承机制) and combines the `datasets` and `models` fields in the required format.
The configuration for this test is in [examples/eval_chat_demo.py](https://github.com/open-compass/opencompass/blob/main/examples/eval_chat_demo.py). This config imports the required dataset and model configurations through the [inheritance mechanism](../user_guides/config.md#继承机制) and combines the `datasets` and `models` fields in the required format.

```python
from mmengine.config import read_base
...
```
@@ -154,7 +154,7 @@ models = hf_qwen2_1_5b_instruct_models + hf_internlm2_chat_1_8b_models
When running the task, we simply pass the path of the config file to `run.py`:

```bash
python run.py configs/eval_chat_demo.py --debug
python run.py examples/eval_chat_demo.py --debug
```
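For orientation, such a demo config simply combines the imported dataset and model lists; a hypothetical minimal sketch (the import paths below are assumptions for illustration, the actual contents live in examples/eval_chat_demo.py):

```python
# Illustrative sketch only -- module paths are assumed, not copied from the repo.
from mmengine.config import read_base

with read_base():
    # dataset configs for the subsampled GSM8K / MATH demo sets
    from .datasets.demo.demo_gsm8k_chat_gen import gsm8k_datasets
    from .datasets.demo.demo_math_chat_gen import math_datasets
    # model configs for the two chat models under test
    from .models.qwen.hf_qwen2_1_5b_instruct import models as hf_qwen2_1_5b_instruct_models
    from .models.hf_internlm.hf_internlm2_chat_1_8b import models as hf_internlm2_chat_1_8b_models

datasets = gsm8k_datasets + math_datasets
models = hf_qwen2_1_5b_instruct_models + hf_internlm2_chat_1_8b_models
```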

:::{dropdown} About `models`
@@ -190,7 +190,7 @@ models = [

Similar to models, dataset configuration files are provided under `configs/datasets`. Users can specify them with `--datasets` on the command line, or import the relevant configurations in a config file via inheritance.

Below is the dataset-related configuration snippet from `configs/eval_chat_demo.py`:
Below is the dataset-related configuration snippet from `examples/eval_chat_demo.py`:

```python
from mmengine.config import read_base  # use mmengine.read_base() to read the base configurations
...
```
@@ -270,7 +270,7 @@ python run.py \

Besides configuring an experiment via the command line, OpenCompass also allows users to write the complete experiment configuration in a config file and run it directly through `run.py`. The config file is organized in Python format and must include the `datasets` and `models` fields.

The configuration for this test is in [configs/eval_base_demo.py](https://github.com/open-compass/opencompass/blob/main/configs/eval_base_demo.py). This config imports the required dataset and model configurations through the [inheritance mechanism](../user_guides/config.md#继承机制) and combines the `datasets` and `models` fields in the required format.
The configuration for this test is in [examples/eval_base_demo.py](https://github.com/open-compass/opencompass/blob/main/examples/eval_base_demo.py). This config imports the required dataset and model configurations through the [inheritance mechanism](../user_guides/config.md#继承机制) and combines the `datasets` and `models` fields in the required format.

```python
from mmengine.config import read_base
...
```
@@ -288,7 +288,7 @@ models = hf_qwen2_1_5b_models + hf_internlm2_1_8b_models
When running the task, we simply pass the path of the config file to `run.py`:

```bash
python run.py configs/eval_base_demo.py --debug
python run.py examples/eval_base_demo.py --debug
```

:::{dropdown} About `models`
@@ -324,7 +324,7 @@ models = [

Similar to models, dataset configuration files are provided under `configs/datasets`. Users can specify them with `--datasets` on the command line, or import the relevant configurations in a config file via inheritance.

Below is the dataset-related configuration snippet from `configs/eval_base_demo.py`:
Below is the dataset-related configuration snippet from `examples/eval_base_demo.py`:

```python
from mmengine.config import read_base  # use mmengine.read_base() to read the base configurations
...
```
@@ -358,7 +358,7 @@ OpenCompass generally assumes that the running environment has network access. If you encounter network
Since OpenCompass launches evaluation processes in parallel by default, we can run the evaluation in `--debug` mode the first time and check for problems. We have used the `--debug` switch throughout all of the preceding documentation. In `--debug` mode, tasks are executed sequentially and their output is printed in real time.

```bash
python run.py configs/eval_chat_demo.py -w outputs/demo --debug
python run.py examples/eval_chat_demo.py -w outputs/demo --debug
```

The chat models 'internlm/internlm2-chat-1_8b' and 'Qwen/Qwen2-1.5B-Instruct' will be automatically downloaded from HuggingFace during the first run.
@@ -371,7 +371,7 @@ python run.py configs/eval_chat_demo.py -w outputs/demo --debug
You can then press `Ctrl+C` to interrupt the program and run the following command in normal mode:

```bash
python run.py configs/eval_chat_demo.py -w outputs/demo
python run.py examples/eval_chat_demo.py -w outputs/demo
```

In normal mode, evaluation tasks are executed in parallel in the background, and their output is redirected to the output directory `outputs/demo/{TIMESTAMP}`. The progress bar in the frontend only indicates the number of completed tasks, regardless of success or failure. **Any backend task failure will only trigger a warning message in the terminal.**
8 changes: 4 additions & 4 deletions opencompass/configs/datasets/CHARM/README.md
@@ -95,11 +95,11 @@ ln -snf ${path_to_CHARM_repo}/data/CHARM ./data/CHARM
```bash
cd ${path_to_opencompass}

# modify config file `configs/eval_charm_rea.py`: uncomment or add models you want to evaluate
python run.py configs/eval_charm_rea.py -r --dump-eval-details
# modify config file `examples/eval_charm_rea.py`: uncomment or add models you want to evaluate
python run.py examples/eval_charm_rea.py -r --dump-eval-details

# modify config file `configs/eval_charm_mem.py`: uncomment or add models you want to evaluate
python run.py configs/eval_charm_mem.py -r --dump-eval-details
# modify config file `examples/eval_charm_mem.py`: uncomment or add models you want to evaluate
python run.py examples/eval_charm_mem.py -r --dump-eval-details
```
The inference and evaluation results would be in `${path_to_opencompass}/outputs`, like this:
```bash
...
```
8 changes: 4 additions & 4 deletions opencompass/configs/datasets/CHARM/README_ZH.md
@@ -93,11 +93,11 @@ ln -snf ${path_to_CHARM_repo}/data/CHARM ./data/CHARM
```bash
cd ${path_to_opencompass}

# modify config file `configs/eval_charm_rea.py`: uncomment or add the models you want to evaluate
python run.py configs/eval_charm_rea.py -r --dump-eval-details
# modify config file `examples/eval_charm_rea.py`: uncomment or add the models you want to evaluate
python run.py examples/eval_charm_rea.py -r --dump-eval-details

# modify config file `configs/eval_charm_mem.py`: uncomment or add the models you want to evaluate
python run.py configs/eval_charm_mem.py -r --dump-eval-details
# modify config file `examples/eval_charm_mem.py`: uncomment or add the models you want to evaluate
python run.py examples/eval_charm_mem.py -r --dump-eval-details
```
The inference and evaluation results are located under `${path_to_opencompass}/outputs`, as shown below:
```bash
...
```
2 changes: 1 addition & 1 deletion opencompass/configs/datasets/babilong/README.md
@@ -11,7 +11,7 @@ BABILong paper provides in total 20 tasks, we provide 10 tasks configurations in
OpenCompass provides a demo for evaluating language models on the BABILong dataset.

```bash
opencompass configs/eval_babilong.py
opencompass examples/eval_babilong.py
```
OpenCompass provides the results of some models on the BABILong dataset. The evaluation results were obtained with LMDeploy using default model settings.

4 changes: 2 additions & 2 deletions opencompass/configs/datasets/chinese_simpleqa/README.md
@@ -84,9 +84,9 @@ We provide three evaluation methods.


- Step 3: Configure your launch in configs/eval_chinese_simpleqa.py: set the models to be evaluated, set your judge model (we recommend GPT-4o), and launch it!
- Step 3: Configure your launch in examples/eval_chinese_simpleqa.py: set the models to be evaluated, set your judge model (we recommend GPT-4o), and launch it!
```
python run.py configs/eval_chinese_simpleqa.py
python run.py examples/eval_chinese_simpleqa.py
```
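A judge model is configured like any other model entry; a rough sketch of pointing it at GPT-4o through an OpenAI-compatible wrapper (parameter names here are assumptions, the example config is authoritative):

```python
# Hypothetical judge-model entry -- field names are illustrative assumptions.
from opencompass.models import OpenAI

judge_models = [
    dict(
        type=OpenAI,
        path='gpt-4o',     # judge model name
        key='ENV',         # read the API key from an environment variable
        max_out_len=2048,
        batch_size=8,
    )
]
```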


2 changes: 1 addition & 1 deletion opencompass/configs/datasets/inference_ppl/README.md
@@ -13,7 +13,7 @@ where Eq. (1) is the normal mean ppl computation formula, for inference-ppl, we

```shell
cd opencompass
python run.py configs/eval_inference_ppl.py
python run.py examples/eval_inference_ppl.py
```
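For context, the mean perplexity referred to above is the exponential of the average negative log-likelihood over the scored tokens; a small self-contained sketch (illustrative, not the repository's implementation):

```python
import math

def mean_ppl(token_logprobs: list[float]) -> float:
    """Perplexity = exp(-(1/N) * sum of token log-probabilities)."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

# Example with natural-log probabilities of four tokens
print(mean_ppl([-0.5, -1.2, -0.3, -2.0]))  # ~2.72
```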

# Some results