
Commit b93ab01

thomwolf and lhoestq authored and committed
Doc metrics (#579)
* adding doc [WIP]
* metrics doc WIP
* adding loading doc
* WIP metrics
* updates and clean up on metrics class
* style
* fix metric info attribute since _info already exists
* style
* add_metric doc
* cleaning up
* updating doc for formating
* fix doc
* fix typos
* fix docstring
* fix doc

Co-authored-by: Quentin Lhoest <[email protected]>
1 parent 30090ec commit b93ab01

17 files changed: +995 −63 lines changed

Makefile

Lines changed: 5 additions & 0 deletions
@@ -12,3 +12,8 @@ quality:
 style:
 	black --line-length 119 --target-version py36 tests src benchmarks datasets metrics
 	isort tests src benchmarks datasets metrics
+
+# Run tests for the library
+test:
+	python -m pytest -n auto --dist=loadfile -s -v ./tests/

docs/source/add_dataset.rst

Lines changed: 1 addition & 0 deletions
@@ -250,6 +250,7 @@ The base :class:`nlp.BuilderConfig` class is very simple and only comprises the
 You can sub-class the base :class:`nlp.BuilderConfig` class to add additional attributes that you may want to use to control the generation of a dataset. The specific configuration class that will be used by the dataset is set in the :attr:`nlp.DatasetBuilder.BUILDER_CONFIG_CLASS`.

 There are two ways to populate the attributes of a :class:`nlp.BuilderConfig` class or sub-class:
+
 - a list of predefined :class:`nlp.BuilderConfig` classes or sub-classes can be set in the :attr:`nlp.DatasetBuilder.BUILDER_CONFIGS` attribute of the dataset. Each specific configuration can then be selected by giving its ``name`` as the ``name`` keyword to :func:`nlp.load_dataset`,
 - when calling :func:`nlp.load_dataset`, all the keyword arguments which are not specific to the :func:`nlp.load_dataset` method will be used to set the associated attributes of the :class:`nlp.BuilderConfig` class and override the predefined attributes if a specific configuration was selected.

docs/source/add_metric.rst

Lines changed: 182 additions & 0 deletions
@@ -0,0 +1,182 @@
Writing a metric loading script
=============================================

If you want to use your own metric, or if you would like to share a new metric with the community, for instance on the `HuggingFace Hub <https://huggingface.co/metrics>`__, you can define a new metric loading script.

This chapter explains how metrics are loaded and how you can write a metric loading script from scratch or adapt an existing one.

.. note::

    You can start from the `template for a metric loading script <https://github.com/huggingface/nlp/blob/master/templates/new_metric_script.py>`__ when writing a new metric loading script. You can find this template in the ``templates`` folder of the GitHub repository.

To create a new metric loading script, one mostly needs to specify three methods in a :class:`nlp.Metric` class:

- :func:`nlp.Metric._info`, which is in charge of specifying the metric metadata as a :obj:`nlp.MetricInfo` dataclass, and in particular the :class:`nlp.Features` which define the types of the predictions and the references,
- :func:`nlp.Metric._download_and_prepare`, which is in charge of downloading or retrieving any external data files the metric needs,
- :func:`nlp.Metric._compute`, which is in charge of computing the actual score(s), given some predictions and references.

.. note::

    On naming: the metric class should be camel case, while the metric name is its snake-case equivalent (e.g. :obj:`class Rouge(nlp.Metric)` for the metric ``rouge``).
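The naming convention above can be sketched with a small helper. This is purely illustrative (``metric_name`` is a hypothetical function, not part of the library):

```python
import re

def metric_name(cls) -> str:
    """Convert a CamelCase metric class name to its snake_case metric name."""
    # Insert an underscore before each capital letter that follows a
    # lowercase letter or digit, then lowercase the whole string.
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", cls.__name__).lower()

class Rouge: ...
class SacreBleu: ...

print(metric_name(Rouge))      # rouge
print(metric_name(SacreBleu))  # sacre_bleu
```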
Adding metric metadata
----------------------------------

The :func:`nlp.Metric._info` method is in charge of specifying the metric metadata as a :obj:`nlp.MetricInfo` dataclass, and in particular the :class:`nlp.Features` which define the types of the predictions and the references. :class:`nlp.MetricInfo` has a predefined set of attributes and cannot be extended. The full list of attributes can be found in the package reference.

The most important attributes to specify are:

- :attr:`nlp.MetricInfo.features`: a :class:`nlp.Features` instance defining the names and types of the predictions and references,
- :attr:`nlp.MetricInfo.description`: a :obj:`str` describing the metric,
- :attr:`nlp.MetricInfo.citation`: a :obj:`str` containing the citation for the metric in BibTex format, for inclusion in communications citing the metric,
- :attr:`nlp.MetricInfo.homepage`: a :obj:`str` containing a URL to the original homepage of the metric,
- :attr:`nlp.MetricInfo.format`: an optional :obj:`str` specifying the format of the predictions and references passed to :func:`nlp.Metric._compute`. It can be set to ``"numpy"``, ``"torch"``, ``"tensorflow"`` or ``"pandas"``.

Here is for instance the :func:`nlp.Metric._info` of the sacrebleu metric, taken from the `sacrebleu metric loading script <https://github.com/huggingface/nlp/tree/master/metrics/sacrebleu/sacrebleu.py>`__:

.. code-block::

    def _info(self):
        return nlp.MetricInfo(
            description=_DESCRIPTION,
            citation=_CITATION,
            homepage="https://github.com/mjpost/sacreBLEU",
            inputs_description=_KWARGS_DESCRIPTION,
            features=nlp.Features({
                'predictions': nlp.Value('string'),
                'references': nlp.Sequence(nlp.Value('string')),
            }),
            codebase_urls=["https://github.com/mjpost/sacreBLEU"],
            reference_urls=["https://github.com/mjpost/sacreBLEU",
                            "https://en.wikipedia.org/wiki/BLEU",
                            "https://towardsdatascience.com/evaluating-text-output-in-nlp-bleu-at-your-own-risk-e8609665a213"]
        )
The :class:`nlp.Features` define the types of the predictions and the references, and can describe arbitrary nested objects with fields of various types. More details on the available ``features`` can be found in the guide on features :doc:`features` and in the package reference on :class:`nlp.Features`. Many examples of features can also be found in the various `metric scripts <https://github.com/huggingface/nlp/tree/master/metrics>`__ and `dataset scripts <https://github.com/huggingface/nlp/tree/master/datasets>`__ provided on the GitHub repository, or inspected directly on the `🤗nlp viewer <https://huggingface.co/nlp/viewer>`__.

Here are for instance the features of the sacrebleu metric, taken from the `sacrebleu metric loading script <https://github.com/huggingface/nlp/tree/master/metrics/sacrebleu/sacrebleu.py>`__:

.. code-block::

    nlp.Features({
        'predictions': nlp.Value('string'),
        'references': nlp.Sequence(nlp.Value('string')),
    })

We can see that each prediction is a string, and each reference is a sequence of strings. We can then use the metric as follows:

.. code-block::

    >>> import nlp

    >>> metric = nlp.load_metric('./metrics/sacrebleu')
    >>> reference_batch = [['The dog bit the man.', 'The dog had bit the man.'],
    ...                    ['It was not unexpected.', 'No one was surprised.'],
    ...                    ['The man bit him first.', 'The man had bitten the dog.']]
    >>> sys_batch = ['The dog bit the man.', "It wasn't surprising.", 'The man had just bitten him.']
    >>> metric.add_batch(predictions=sys_batch, references=reference_batch)
    >>> score = metric.compute()
    >>> print(score)
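The add-then-compute flow used above can be sketched schematically. This is a toy accumulator illustrating the pattern, not the real :class:`nlp.Metric` implementation (which buffers predictions in an Arrow table rather than Python lists):

```python
class ToyAccuracyMetric:
    """Toy stand-in for the add_batch/compute pattern described above."""

    def __init__(self):
        self._predictions = []
        self._references = []

    def add_batch(self, predictions, references):
        # Buffer a batch of predictions and references.
        self._predictions.extend(predictions)
        self._references.extend(references)

    def compute(self):
        # Compute the final score over everything that was added.
        correct = sum(p == r for p, r in zip(self._predictions, self._references))
        return {"accuracy": correct / len(self._predictions)}

metric = ToyAccuracyMetric()
metric.add_batch(predictions=[1, 0], references=[1, 1])
metric.add_batch(predictions=[1], references=[1])
print(metric.compute())  # accuracy = 2/3
```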
Downloading data files
-------------------------------------------------

The :func:`nlp.Metric._download_and_prepare` method is in charge of downloading the data files (or retrieving them locally) if needed.

This method **takes as input** a :class:`nlp.DownloadManager`, a utility which can be used to download files (or to retrieve them from the local filesystem if they are local files or are already in the cache).

Let's have a look at a simple example of a :func:`nlp.Metric._download_and_prepare` method. We'll take the example of the `bleurt metric loading script <https://github.com/huggingface/nlp/tree/master/metrics/bleurt/bleurt.py>`__:

.. code-block::

    def _download_and_prepare(self, dl_manager):

        # check that config name specifies a valid BLEURT model
        if self.config_name not in CHECKPOINT_URLS.keys():
            raise KeyError(f"{self.config_name} model not found. You should supply the name of a model checkpoint for bleurt in {CHECKPOINT_URLS.keys()}")

        # download the model checkpoint specified by self.config_name and set up the scorer
        model_path = dl_manager.download_and_extract(CHECKPOINT_URLS[self.config_name])
        self.scorer = score.BleurtScorer(os.path.join(model_path, self.config_name))

As you can see, this method downloads a model checkpoint depending on the configuration name of the metric. The checkpoint URL is then provided to the :func:`nlp.DownloadManager.download_and_extract` method, which takes care of downloading the file (or retrieving it from the local file system). :func:`nlp.DownloadManager.download_and_extract` can take as input a single URL/path or a list or dictionary of URLs/paths, and will return an object of the same structure (a single path, or a list or dictionary of paths) pointing to the local version of the requested files. This method also takes care of extracting compressed tar, gzip and zip archives.

:func:`nlp.DownloadManager.download_and_extract` can download files from a large set of origins, but if your data files are hosted on a special-access server, it's also possible to provide the ``DownloadManager`` with a callable that takes care of the downloading process, using :func:`nlp.DownloadManager.download_custom`.

.. note::

    In addition to :func:`nlp.DownloadManager.download_and_extract` and :func:`nlp.DownloadManager.download_custom`, the :class:`nlp.DownloadManager` class also provides more fine-grained control over the download and extraction process through several methods, including :func:`nlp.DownloadManager.download`, :func:`nlp.DownloadManager.extract` and :func:`nlp.DownloadManager.iter_archive`. Please refer to the package reference on :class:`nlp.DownloadManager` for details on these methods.
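The structure-preserving behavior of :func:`nlp.DownloadManager.download_and_extract` (same shape in, same shape out) can be sketched like this. ``map_nested`` and ``fake_fetch`` are hypothetical stand-ins for the library's internals and the actual download-and-cache step:

```python
def map_nested(fetch_one, url_or_urls):
    """Apply fetch_one to a single URL, a list of URLs, or a dict of URLs,
    returning an object with the same structure."""
    if isinstance(url_or_urls, dict):
        return {key: map_nested(fetch_one, value) for key, value in url_or_urls.items()}
    if isinstance(url_or_urls, (list, tuple)):
        return [map_nested(fetch_one, value) for value in url_or_urls]
    return fetch_one(url_or_urls)

# A fake "download" that just maps a URL to a local cache path:
fake_fetch = lambda url: "/cache/" + url.rsplit("/", 1)[-1]

print(map_nested(fake_fetch, "http://host/a.zip"))  # '/cache/a.zip'
print(map_nested(fake_fetch, {"train": "http://host/tr.gz",
                              "test": ["http://host/t1", "http://host/t2"]}))
```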
Computing the scores
-------------------------------------------------

The :func:`nlp.Metric._compute` method is in charge of computing the metric scores, given predictions and references that are in the format specified by the ``features`` set in :func:`nlp.Metric._info`.

Here again, let's take the simple example of the `xnli metric loading script <https://github.com/huggingface/nlp/tree/master/metrics/xnli/xnli.py>`__:

.. code-block::

    def simple_accuracy(preds, labels):
        return (preds == labels).mean()

    class Xnli(nlp.Metric):
        def _info(self):
            return nlp.MetricInfo(
                description=_DESCRIPTION,
                citation=_CITATION,
                inputs_description=_KWARGS_DESCRIPTION,
                features=nlp.Features({
                    'predictions': nlp.Value('int64' if self.config_name != 'sts-b' else 'float32'),
                    'references': nlp.Value('int64' if self.config_name != 'sts-b' else 'float32'),
                }),
                codebase_urls=[],
                reference_urls=[],
                format='numpy'
            )

        def _compute(self, predictions, references):
            return {"accuracy": simple_accuracy(predictions, references)}

Here, the accuracy is computed by the ``simple_accuracy`` function, which uses NumPy's ``.mean()`` on the element-wise comparison of predictions and labels.

The predictions and references objects passed to ``_compute`` are sequences of integers or floats, and these sequences are formatted as NumPy arrays since the format specified in the :obj:`nlp.MetricInfo` object is set to ``"numpy"``.
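Because the format is ``"numpy"``, the element-wise comparison in ``simple_accuracy`` works directly on arrays. A self-contained illustration:

```python
import numpy as np

def simple_accuracy(preds, labels):
    # preds == labels gives a boolean array; its mean is the fraction of matches
    return (preds == labels).mean()

preds = np.array([0, 1, 1, 2])
labels = np.array([0, 1, 2, 2])
print(simple_accuracy(preds, labels))  # 0.75
```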
Specifying several metric configurations
-------------------------------------------------

Sometimes you want to provide several ways of computing the scores.

It is possible to have different configurations for a metric. The configuration name is stored in the :obj:`nlp.Metric.config_name` attribute, and can be specified by the user when instantiating a metric:

.. code-block::

    >>> from nlp import load_metric
    >>> metric = load_metric('bleurt', name='bleurt-base-128')
    >>> metric = load_metric('bleurt', name='bleurt-base-512')

Here, depending on the configuration name, a different checkpoint will be downloaded and used to compute the BLEURT score.

You can access :obj:`nlp.Metric.config_name` from inside :func:`nlp.Metric._info`, :func:`nlp.Metric._download_and_prepare` and :func:`nlp.Metric._compute`.
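Schematically, a configuration-dependent metric branches on ``config_name`` inside these methods. Here is a toy sketch (it does not use the real :class:`nlp.Metric` base class; the configuration names are made up):

```python
class ToyConfigurableMetric:
    """Toy sketch of a metric whose behavior depends on its configuration name."""

    def __init__(self, config_name="max"):
        self.config_name = config_name

    def _compute(self, predictions, references):
        # Pick a different aggregation depending on the configuration
        errors = [abs(p - r) for p, r in zip(predictions, references)]
        if self.config_name == "max":
            return {"max_error": max(errors)}
        elif self.config_name == "mean":
            return {"mean_error": sum(errors) / len(errors)}
        raise KeyError(f"Unknown configuration: {self.config_name}")

print(ToyConfigurableMetric("max")._compute([1, 2], [1, 4]))   # {'max_error': 2}
print(ToyConfigurableMetric("mean")._compute([1, 2], [1, 4]))  # {'mean_error': 1.0}
```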
Testing the metric loading script
-------------------------------------------------

Once you're finished creating or adapting a metric loading script, you can try it locally by giving the path to the metric loading script:

.. code-block::

    >>> from nlp import load_metric
    >>> metric = load_metric('PATH/TO/MY/SCRIPT.py')

If your metric has several configurations, you can use the arguments of :func:`nlp.load_metric` accordingly:

.. code-block::

    >>> from nlp import load_metric
    >>> metric = load_metric('PATH/TO/MY/SCRIPT.py', 'my_configuration')

docs/source/index.rst

Lines changed: 8 additions & 6 deletions
@@ -42,21 +42,23 @@ The documentation is organized in five parts:
   loading_datasets
   exploring
   processing
+  torch_tensorflow
   faiss_and_ea

 .. toctree::
    :maxdepth: 2
-   :caption: Creating and sharing datasets
+   :caption: Using metrics

-   add_dataset
-   share_dataset
+   loading_metrics
+   using_metrics

 .. toctree::
    :maxdepth: 2
-   :caption: Using metrics
+   :caption: Adding new datasets/metrics

-   loading_metrics
-   using_metrics
+   add_dataset
+   share_dataset
+   add_metric

 .. toctree::
    :maxdepth: 2

docs/source/loading_datasets.rst

Lines changed: 1 addition & 0 deletions
@@ -183,6 +183,7 @@ CSV files
 All the CSV files in the dataset should have the same organization and in particular the same datatypes for the columns.

 A few interesting features are provided out-of-the-box by the Apache Arrow backend:
+
 - multi-threaded or single-threaded reading
 - automatic decompression of input files (based on the filename extension, such as ``my_data.csv.gz``)
 - fetching column names from the first row in the CSV file
