`docs/en/advanced_guides/custom_dataset.md`

# Dataset Quick Evaluation Tutorial

OpenCompass provides two paths for quickly evaluating user-provided data: a data format protocol based on ChatMLDataset, and a data format protocol based on CustomDataset.

Compared to the complete dataset integration process in [new_dataset.md](./new_dataset.md), these two evaluation paths are more convenient and efficient: they enter the evaluation process directly, without adding any new configuration files.

However, if you have specific needs for custom reading/inference/evaluation, it is still recommended to follow the complete integration process to add a new dataset.

## Data Format Protocol and Fast Evaluation Based on ChatMLDataset

OpenCompass has recently introduced a dataset evaluation mode based on the ChatML dialogue template, which allows users to provide a dataset `.json` file that conforms to the ChatML dialogue template and simply set up the dataset information config, much like a model config, to start evaluating directly.

### Format Requirements for Data Files

This evaluation method only supports data files in `.json` format, and each sample must comply with the following format:

The format of a text-only dataset with a simple structure:

```jsonl
{
    "question": [
        {
            "role": "system",  # Omittable
            "content": Str
        },
        {
            "role": "user",
            "content": Str
        }
    ],
    "answer": [
        Str
    ]
}
{
    ...
}
...
```
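
For instance, a concrete sample conforming to this protocol might look as follows (the question and answer here are invented purely for illustration):

```jsonl
{
    "question": [
        {
            "role": "system",
            "content": "You are a helpful assistant that answers arithmetic questions."
        },
        {
            "role": "user",
            "content": "What is 17 + 25?"
        }
    ],
    "answer": [
        "42"
    ]
}
```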

The format of multi-round and multimodal datasets:

```jsonl
{
    "question": [
        {
            "role": "system",
            "content": Str
        },
        {
            "role": "user",
            "content": Str or List
            [
                {
                    "type": Str,  # "image"
                    "image_url": Str
                },
                ...
                {
                    "type": Str,  # "text"
                    "text": Str
                }
            ]
        },
        {
            "role": "assistant",
            "content": Str
        },
        {
            "role": "user",
            "content": Str or List
        },
        ...
    ],
    "answer": [
        Str,
        Str,
        ...
    ]
}
{
    ...
}
...
```

(As OpenCompass currently does not support multimodal evaluation, the template above is for reference only.)
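
Multi-round, text-only dialogues can be expressed with the same template. A sample might look like the following; the dialogue contents are invented, and the pairing of `answer` entries with rounds is our reading of the template above rather than a documented guarantee:

```jsonl
{
    "question": [
        {
            "role": "user",
            "content": "Name a prime number greater than 10."
        },
        {
            "role": "assistant",
            "content": "11 is a prime number greater than 10."
        },
        {
            "role": "user",
            "content": "And the next prime after that?"
        }
    ],
    "answer": [
        "11",
        "13"
    ]
}
```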

When ChatMLDataset reads `.json` files, it uses `pydantic` to perform simple format validation on them. You can use `tools/chatml_fformat_test.py` to check the data file you provide.
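
Conceptually, the check is just a pydantic schema applied to every sample. As a rough illustration only (this is not OpenCompass's internal model; it assumes pydantic v2 and mirrors the field names from the protocol above):

```python
from typing import List, Union

from pydantic import BaseModel, ValidationError


class Message(BaseModel):
    role: str                  # "system", "user", or "assistant"
    content: Union[str, list]  # plain text, or a list of typed segments


class Sample(BaseModel):
    question: List[Message]
    answer: List[str]


# Raises ValidationError if a record does not match the protocol.
try:
    Sample.model_validate({
        "question": [{"role": "user", "content": "What is 17 + 25?"}],
        "answer": ["42"],
    })
except ValidationError as err:
    print(err)
```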

After the format check passes, add a config dictionary named `chatml_datasets` to your running config file so that the data file is converted into an OpenCompass dataset at runtime. An example is as follows:

```python
chatml_datasets = [
    dict(
        abbr='YOUR_DATASET_NAME',
        path='YOUR_DATASET_PATH',
        evaluator=dict(
            type='cascade_evaluator',
            rule_evaluator=dict(
                type='math_evaluator',
            ),
            llm_evaluator=dict(
                type='llm_evaluator',
                prompt="YOUR_JUDGE_PROMPT",
                judge_cfg=dict(),  # your judge model config
            ),
        ),
        n=1,  # repeat number
    ),
]
```

The ChatML evaluation module currently provides four preset evaluators: `mcq_rule_evaluator`, for MCQ evaluation; `math_evaluator`, for LaTeX mathematical formula evaluation; `llm_evaluator`, for evaluating answers that are open-ended or difficult to extract; and `cascade_evaluator`, an evaluation mode composed of a rule evaluator and an LLM evaluator cascaded together.

In addition, if you need to use ChatML-template-based datasets over the long term, you can contribute your dataset configs to `opencompass/config/chatml_datasets`. An eval example that calls these dataset configs is provided in `examples/evalchat_datasets.py`.

## Data Format Protocol and Fast Evaluation Based on CustomDataset

(This module is no longer being updated, but it can still be used when quick CLI-based evaluation is needed.)

This module supports two types of tasks: multiple choice (`mcq`) and question & answer (`qa`). For `mcq`, both ppl and gen inference are supported; for `qa`, gen inference is supported.

### Dataset Format

We support datasets in both `.jsonl` and `.csv` formats.

#### Multiple Choice (`mcq`)

For `mcq` datasets, the default fields are as follows:

...

```
question,A,B,C,answer
...
504+811+870+445=,2615,2630,2750,B
```

#### Question & Answer (`qa`)

For `qa` datasets, the default fields are as follows:

...

```
question,answer
...
649+215+412+495+220+738+989+452=,4170
```

### Command Line List

Custom datasets can be directly called for evaluation through the command line.

...set them based on the following logic:

- If options like `A`, `B`, `C`, etc., can be parsed from the dataset file, it is considered an `mcq` dataset; otherwise, it is considered a `qa` dataset.
- The default `infer_method` is `gen`.

### Configuration File

In the original configuration file, simply add a new item to the `datasets` variable. Custom datasets can be mixed with regular datasets.

```python
datasets = [
    ...
]
```
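
A new item is just another entry in this list. As a purely illustrative sketch (the key names here are hypothetical and should be checked against the current OpenCompass documentation; the `mcq`/`qa` and `ppl`/`gen` values follow the task and inference types described above):

```python
datasets = [
    # ...existing dataset configs...
    dict(
        path='xxx/test_mcq.csv',  # hypothetical path to your custom dataset file
        data_type='mcq',          # 'mcq' or 'qa'
        infer_method='ppl',       # 'ppl' or 'gen'
    ),
]
```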

### Supplemental Information for Dataset `.meta.json`

OpenCompass will try to parse the input dataset file by default, so in most cases, the `.meta.json` file is **not necessary**. However, if the dataset field names are not the default ones, or custom prompt words are required, it should be specified in the `.meta.json` file.
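
For example, a `.meta.json` along these lines could override field names and supply a custom prompt. The keys shown are hypothetical placeholders, not a verified schema; consult the OpenCompass documentation for the exact supported keys:

```json
{
    "abbr": "my_custom_qa",
    "data_type": "qa",
    "infer_method": "gen",
    "question": "my_question_column",
    "answer": "my_answer_column",
    "human_prompt": "Question: {question}\nAnswer:"
}
```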