docs/source/add_dataset.rst
5 additions & 5 deletions
@@ -97,7 +97,7 @@ Here are the features of the SQuAD dataset for instance, which is taken from the
 }
 )

-These features should be mostly self-explanatory given the above introduction. One specific behavior here is the fact that the ``Sequence`` field in ``"answers"`` is given a dictionnary of sub-fields. As mentioned in the above note, in this case, this feature is actually **converted in a dictionnary of lists** (instead of the list of dictionnary that we read in the feature here).
+These features should be mostly self-explanatory given the above introduction. One specific behavior here is the fact that the ``Sequence`` field in ``"answers"`` is given a dictionary of sub-fields. As mentioned in the above note, in this case, this feature is actually **converted in a dictionary of lists** (instead of the list of dictionary that we read in the feature here).

 We can see a confirmation of that in the structure of the examples yield by the generation method at the very end of the `squad dataset loading script <https://github.com/huggingface/nlp/tree/master/datasets/squad/squad.py>`__:
@@ -114,7 +114,7 @@ We can see a confirmation of that in the structure of the examples yield by the
-As you can see this method first prepare a dict of URL to the original data files for SQuAD. This dict is then provided to the :func:`nlp.DownloadManager.download_and_extract` method which will take care of downloading or retriving from the local file system these files and returning a object of the same type and organization (here a dictionary) with the path to the local version of the requetsed files. :func:`nlp.DownloadManager.download_and_extract` can take as input a single URL/path or a list or dictionnary of URLs/paths and will return an object of the same structure (single URL/path, list or dictionnary of URLs/paths) with the path to the local files.
+As you can see this method first prepare a dict of URL to the original data files for SQuAD. This dict is then provided to the :func:`nlp.DownloadManager.download_and_extract` method which will take care of downloading or retriving from the local file system these files and returning a object of the same type and organization (here a dictionary) with the path to the local version of the requetsed files. :func:`nlp.DownloadManager.download_and_extract` can take as input a single URL/path or a list or dictionary of URLs/paths and will return an object of the same structure (single URL/path, list or dictionary of URLs/paths) with the path to the local files.

 This method also takes care of extracting compressed tar, gzip and zip archives.
@@ -208,7 +208,7 @@ Generating the samples in each split

 The :func:`nlp.DatasetBuilder._generate_examples` is in charge of reading the data files for a split and yielding examples with the format specified in the ``features`` set in :func:`nlp.DatasetBuilder._info`.

-The input arguments of :func:`nlp.DatasetBuilder._generate_examples` are defined by the :obj:`gen_kwargs` dictionnary returned by the :func:`nlp.DatasetBuilder._split_generator` method we detailed above.
+The input arguments of :func:`nlp.DatasetBuilder._generate_examples` are defined by the :obj:`gen_kwargs` dictionary returned by the :func:`nlp.DatasetBuilder._split_generator` method we detailed above.

 Here again, let's take the simple example of the `squad dataset loading script <https://github.com/huggingface/nlp/tree/master/datasets/squad/squad.py>`__:
@@ -244,7 +244,7 @@ The input argument is the ``filepath`` provided in the :obj:`gen_kwargs` of each

 The method read and parse the inputs files and yield a tuple constituted of an ``id_`` (can be arbitrary be should be unique (for backward compatibility with TensorFlow dataset) and an example.

-The example is a dictionnary with the same structure and element types as the ``features`` defined in :func:`nlp.DatasetBuilder._info`.
+The example is a dictionary with the same structure and element types as the ``features`` defined in :func:`nlp.DatasetBuilder._info`.
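
The documentation text touched by this diff describes two behaviors that may be easier to follow with a small illustration: a ``Sequence`` of sub-fields ends up stored as a dictionary of lists, and the generation method yields ``(id_, example)`` tuples. The sketch below is plain Python and does not call the ``nlp`` library. The helper ``to_dict_of_lists``, the standalone function ``generate_examples``, and the sample ``articles`` data are all hypothetical names introduced only for this illustration; they are not taken from the SQuAD loading script.

```python
def to_dict_of_lists(records):
    """Turn a list of dicts that share the same keys into a dict of lists."""
    if not records:
        return {}
    # Use the keys of the first record; every record is assumed to have them.
    return {key: [record[key] for record in records] for key in records[0]}


def generate_examples(articles):
    """Yield (id_, example) tuples from already-parsed, hypothetical input."""
    for article in articles:
        for qa in article["qas"]:
            id_ = qa["id"]  # should be unique across the split
            yield id_, {
                "id": id_,
                "question": qa["question"],
                # A list of {"text", "answer_start"} dicts becomes a dict of lists.
                "answers": to_dict_of_lists(qa["answers"]),
            }


# Made-up sample data, shaped loosely like a question-answering record.
articles = [
    {
        "qas": [
            {
                "id": "q1",
                "question": "Who wrote it?",
                "answers": [
                    {"text": "Ada", "answer_start": 0},
                    {"text": "Ada Lovelace", "answer_start": 0},
                ],
            }
        ]
    }
]

examples = list(generate_examples(articles))
```

Here ``examples[0]`` is a tuple whose second element has an ``"answers"`` entry with one list per sub-field, which is the layout the edited paragraph calls a dictionary of lists.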