docs/source/add_dataset.rst (1 addition, 0 deletions)
@@ -250,6 +250,7 @@ The base :class:`nlp.BuilderConfig` class is very simple and only comprises the
You can sub-class the base :class:`nlp.BuilderConfig` class to add additional attributes that you may want to use to control the generation of a dataset. The specific configuration class that will be used by the dataset is set in the :attr:`nlp.DatasetBuilder.BUILDER_CONFIG_CLASS`.
There are two ways to populate the attributes of a :class:`nlp.BuilderConfig` class or sub-class:
- a list of predefined :class:`nlp.BuilderConfig` classes or sub-classes can be set in the :attr:`nlp.DatasetBuilder.BUILDER_CONFIGS` attribute of the dataset. Each specific configuration can then be selected by giving its ``name`` as the ``name`` keyword to :func:`nlp.load_dataset`,
- when calling :func:`nlp.load_dataset`, all the keyword arguments which are not specific to the :func:`nlp.load_dataset` method will be used to set the associated attributes of the :class:`nlp.BuilderConfig` class and override the predefined attributes if a specific configuration was selected.
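For illustration, here is a minimal sketch of how a dataset script could combine both mechanisms. The ``MyDatasetConfig`` class, its ``language`` attribute and the configuration names below are hypothetical placeholders, and the other methods a builder must implement are omitted:

.. code-block::

    import nlp

    class MyDatasetConfig(nlp.BuilderConfig):
        """Hypothetical configuration adding a ``language`` attribute."""

        def __init__(self, language="en", **kwargs):
            super().__init__(**kwargs)
            self.language = language

    class MyDataset(nlp.GeneratorBasedBuilder):
        # tell the builder which configuration class to use ...
        BUILDER_CONFIG_CLASS = MyDatasetConfig
        # ... and optionally predefine a few named configurations
        BUILDER_CONFIGS = [
            MyDatasetConfig(name="en", language="en", description="English subset"),
            MyDatasetConfig(name="fr", language="fr", description="French subset"),
        ]
        # _info(), _split_generators() and _generate_examples() omitted

    # select a predefined configuration by its name ...
    dataset = nlp.load_dataset('my_dataset', name='fr')
    # ... or override some of its attributes with extra keyword arguments
    dataset = nlp.load_dataset('my_dataset', name='fr', language='fr-CA')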
The library also provides a selection of metrics focusing in particular on:
- providing a common API across a range of NLP metrics,
- providing metrics associated with some benchmark datasets provided by the library such as GLUE or SQuAD,
- providing access to recent and somewhat complex metrics such as BLEURT or BERTScore,
@@ -127,6 +128,7 @@ In several settings, computing metrics in distributed or parrallel processing en
Let's first see how to use a metric in a distributed setting before giving a few words about the internals. Let's say we train and evaluate a model in 8 parallel processes (e.g. using PyTorch's `DistributedDataParallel <https://pytorch.org/tutorials/intermediate/ddp_tutorial.html>`__ on a server with 8 GPUs).
We assume your Python script has access to:
- the total number of processes as an integer we'll call ``num_process`` (in our example 8),
- the process id of each process as an integer between 0 and ``num_process-1`` that we'll call ``rank`` (in our case between 0 and 7 inclusive).
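As an illustration, here is a minimal sketch of what each of the 8 processes could run. The keyword arguments ``num_process`` and ``process_id`` on :func:`nlp.load_metric`, the metric name and the ``local_*`` variables are assumptions made for this sketch:

.. code-block::

    import nlp

    # each process loads the same metric and declares its rank in the group
    metric = nlp.load_metric('my_metric', num_process=num_process, process_id=rank)

    # each process adds only its own shard of predictions/references
    metric.add_batch(predictions=local_predictions, references=local_references)

    # compute() takes care of gathering the shards from all processes before scoring
    final_score = metric.compute()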
docs/source/share_dataset.rst (1 addition, 0 deletions)
@@ -2,6 +2,7 @@ Sharing your dataset
=============================================
Once you've written a new dataset loading script as detailed on the :doc:`add_dataset` page, you may want to share it with the community, for instance on the `HuggingFace Hub <https://huggingface.co/datasets>`__. There are two options for doing that:
- add it as a canonical dataset by opening a pull-request on the `GitHub repository for 🤗nlp <https://github.com/huggingface/nlp>`__,
- directly upload it to the Hub as a community-provided dataset.
Evaluating a model's predictions with :class:`nlp.Metric` involves just a couple of methods:

- :func:`nlp.Metric.add` and :func:`nlp.Metric.add_batch` are used to add pairs of predictions/references (or just predictions if a metric doesn't make use of references) to a temporary and memory-efficient cache table,
- :func:`nlp.Metric.compute` then gathers all the cached predictions and references to compute the metric score.
A typical **two-step workflow** to compute the metric is thus as follows:
@@ -13,13 +14,13 @@ A typical **two-steps workflow** to compute the metric is thus as follow:
metric = nlp.load_metric('my_metric')
for model_input, gold_references in evaluation_dataloader:
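    # sketch of the rest of the block (elided in this diff): run the model on each
    # batch and buffer the prediction/reference pairs in the metric's cache
    model_predictions = model(model_input)
    metric.add_batch(predictions=model_predictions, references=gold_references)
# once every batch has been added, gather the cached pairs and compute the score
final_score = metric.compute()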
Alternatively, when the model predictions over the whole evaluation dataset can be computed in one step, a **single-step workflow** can be used by directly feeding the predictions/references to the :func:`nlp.Metric.compute` method as follows:
.. code-block::
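    # sketch of the elided block: compute predictions over the whole evaluation set
    # at once, then feed predictions/references to compute() in a single call
    model_predictions = model(model_inputs)
    final_score = metric.compute(predictions=model_predictions, references=gold_references)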
@@ -34,60 +35,61 @@ Alternatively, when the model predictions can be computed in one step, a **singl
.. note::
Under the hood, both the two-step workflow and the single-step workflow use memory-mapped temporary cache tables to store predictions/references before computing the scores (similarly to a :class:`nlp.Dataset`). This is convenient for several reasons:

- it lets us easily handle metrics whose scores depend on the evaluation set in non-additive ways, i.e. when ``f(A∪B) ≠ f(A) + f(B)``,
- it is very efficient in terms of CPU/GPU memory (effectively requiring no CPU/GPU memory to use the metrics),
- it enables easy distributed computation for the metrics by using the cache files as synchronization objects across the various processes.
Adding predictions and references
-----------------------------------------
Adding model predictions and references to a :class:`nlp.Metric` instance can be done using any one of the :func:`nlp.Metric.add`, :func:`nlp.Metric.add_batch` and :func:`nlp.Metric.compute` methods.
These methods are pretty simple to use and only accept two arguments for the predictions/references:
- ``predictions`` (for :func:`nlp.Metric.add_batch`) and ``prediction`` (for :func:`nlp.Metric.add`) should contain the predictions of a model to be evaluated by means of the metric. For :func:`nlp.Metric.add` this will be a single prediction, for :func:`nlp.Metric.add_batch` this will be a batch of predictions.
- ``references`` (for :func:`nlp.Metric.add_batch`) and ``reference`` (for :func:`nlp.Metric.add`) should contain the references that the model predictions are compared to (if the metric requires references). For :func:`nlp.Metric.add` this will be the reference associated with a single prediction, for :func:`nlp.Metric.add_batch` this will be the references associated with a batch of predictions. Note that some metrics accept several references to compare each model prediction to.
:func:`nlp.Metric.add` and :func:`nlp.Metric.add_batch` require the use of **named arguments** to avoid the silent error of mixing predictions with references.
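For illustration, here is a small sketch of both methods using named arguments, with made-up sacrebleu-style inputs (one prediction string and a list of reference strings per prediction):

.. code-block::

    metric = nlp.load_metric('sacrebleu')

    # a single prediction together with its (possibly multiple) references
    metric.add(prediction="hello there general kenobi",
               reference=["hello there general kenobi", "hello there !"])

    # a batch of predictions with one list of references per prediction
    metric.add_batch(predictions=["on our way to the restaurant"],
                     references=[["we are on our way to the restaurant"]])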
The model predictions and references can be provided in a wide number of formats (Python lists, NumPy arrays, PyTorch tensors, TensorFlow tensors). The metric object will take care of converting them to a suitable format for temporary storage and computation (as well as bringing PyTorch tensors back to the CPU and detaching them from gradients).
The exact format of the inputs is specific to each metric script and can be found in :obj:`nlp.Metric.features`, :obj:`nlp.Metric.inputs_descriptions` and the string representation of the :class:`nlp.Metric` object:
    predictions: The system stream (a sequence of segments)
    references: A list of one or more reference streams (each a sequence of segments)
    smooth: The smoothing method to use
    smooth_value: For 'floor' smoothing, the floor to use
    force: Ignore data that looks already tokenized
    lowercase: Lowercase the data
    tokenize: The tokenizer to use
    Returns:
        'score': BLEU score,
        'counts': Counts,
        'totals': Totals,
        'precisions': Precisions,
        'bp': Brevity penalty,
        'sys_len': predictions length,
        'ref_len': reference length,
    """, stored examples: 3)
Note that the format of the inputs is a bit different from the official sacrebleu format: here we provide the references for each prediction in a list nested inside the list entry associated with that prediction, while the official sacrebleu example is nested the other way around (one outer list per reference number, with the examples inside).
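To make the difference concrete, here is a small sketch with made-up sentences, for two predictions with two references each:

.. code-block::

    # format used here: one inner list of references per prediction
    predictions = ["hello there general kenobi", "on our way to the restaurant"]
    references = [["hello there general kenobi", "hello there !"],
                  ["we are on our way to the restaurant", "on our way to the restaurant"]]

    # official sacrebleu layout: one outer list per reference number,
    # each containing that reference for every example
    sacrebleu_references = [["hello there general kenobi", "we are on our way to the restaurant"],
                            ["hello there !", "on our way to the restaurant"]]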
Querying the length of a Metric object will return the number of stored evaluation examples: as we can see on the last line above, we have stored three evaluation examples in our metric.
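For instance, with the three examples stored above:

.. code-block::

    >>> len(metric)
    3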
Now let's compute the sacrebleu score from these 3 evaluation datapoints.
Computing the metric scores
-----------------------------------------
The evaluation of metric scores is done using the :func:`nlp.Metric.compute` method.
This method can accept several arguments:
- ``predictions`` and ``references``: you can add predictions and references here (they will be added at the end of the cache if you have used :func:`nlp.Metric.add` or :func:`nlp.Metric.add_batch` before),
- specific arguments that may be required by, or may modify the behavior of, some metrics (print the metric input description to see the details with ``print(metric)`` or ``print(metric.inputs_description)``).
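For illustration, here is a sketch of passing metric-specific arguments taken from the sacrebleu description printed above (whether the metric accepts exactly these keyword arguments is an assumption based on that description):

.. code-block::

    >>> score = metric.compute(smooth='floor', smooth_value=0.1)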
In the simplest case, when the predictions and references have already been added with ``add`` or ``add_batch`` and no specific argument needs to be set to modify the default behavior of the metric, we can just call :func:`nlp.Metric.compute`:
.. code-block::
>>> score = metric.compute()
Done writing 3 examples in 265 bytes /Users/thomwolf/.cache/huggingface/metrics/sacrebleu/default/default_experiment-0430a7c7-31cb-48bf-9fb0-2a0b6c03ad81-1-0.arrow.
Set __getitem__(key) output type to python objects for no columns (when key is int or slice) and don't output other (un-formatted) columns.
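>>> # the returned score is a mapping with the fields listed in the sacrebleu
>>> # description above ('score', 'counts', 'totals', 'precisions', 'bp',
>>> # 'sys_len', 'ref_len') -- a sketch of reading the BLEU value itself:
>>> print(score['score'])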