Commit b85ae23

DOCS: Fix typo (#467)
1 parent 9f6fd88 commit b85ae23

1 file changed: +5 -5 lines changed

docs/source/add_dataset.rst

Lines changed: 5 additions & 5 deletions
@@ -97,7 +97,7 @@ Here are the features of the SQuAD dataset for instance, which is taken from the
 }
 )

-These features should be mostly self-explanatory given the above introduction. One specific behavior here is the fact that the ``Sequence`` field in ``"answers"`` is given a dictionnary of sub-fields. As mentioned in the above note, in this case, this feature is actually **converted in a dictionnary of lists** (instead of the list of dictionnary that we read in the feature here).
+These features should be mostly self-explanatory given the above introduction. One specific behavior here is the fact that the ``Sequence`` field in ``"answers"`` is given a dictionary of sub-fields. As mentioned in the above note, in this case, this feature is actually **converted in a dictionary of lists** (instead of the list of dictionary that we read in the feature here).

 We can see a confirmation of that in the structure of the examples yield by the generation method at the very end of the `squad dataset loading script <https://github.com/huggingface/nlp/tree/master/datasets/squad/squad.py>`__:

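The hunk above says that a ``Sequence`` given a dictionary of sub-fields is stored as a dictionary of lists rather than a list of dictionaries. As a rough illustration in plain Python (not the library's own conversion code), the helper below performs that reshaping; ``list_of_dicts_to_dict_of_lists`` is a hypothetical name and the answer values are made up.

```python
from typing import Any, Dict, List


def list_of_dicts_to_dict_of_lists(
    items: List[Dict[str, Any]], keys: List[str]
) -> Dict[str, List[Any]]:
    """Hypothetical helper: [{"a": 1}, {"a": 2}] -> {"a": [1, 2]}.

    ``keys`` plays the role of the sub-fields declared in the ``Sequence``,
    so an empty list still yields every key (mapped to an empty list).
    """
    return {key: [item[key] for item in items] for key in keys}


# Illustrative answers as one might read them: a list of dictionaries.
answers = [
    {"text": "Paris", "answer_start": 10},
    {"text": "the city of Paris", "answer_start": 3},
]
print(list_of_dicts_to_dict_of_lists(answers, ["text", "answer_start"]))
# {'text': ['Paris', 'the city of Paris'], 'answer_start': [10, 3]}
```

Passing the keys explicitly mirrors the fact that the feature declaration, not the data, decides which sub-fields exist.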
@@ -114,7 +114,7 @@ We can see a confirmation of that in the structure of the examples yield by the
 "answers": {"answer_start": answer_starts, "text": answers,},
 }

-Here the ``"answers"`` is accordingly provided with a dictionnary of lists and not a list of dictionnary.
+Here the ``"answers"`` is accordingly provided with a dictionary of lists and not a list of dictionary.

 Let's take another example of features from the `large-scale reading comprehension dataset Race <https://huggingface.co/datasets/race>`__:

@@ -185,7 +185,7 @@ Let's have a look at a simple example of a :func:`nlp.DatasetBuilder._split_gene
 nlp.SplitGenerator(name=nlp.Split.VALIDATION, gen_kwargs={"filepath": downloaded_files["dev"]}),
 ]

-As you can see this method first prepare a dict of URL to the original data files for SQuAD. This dict is then provided to the :func:`nlp.DownloadManager.download_and_extract` method which will take care of downloading or retriving from the local file system these files and returning a object of the same type and organization (here a dictionary) with the path to the local version of the requetsed files. :func:`nlp.DownloadManager.download_and_extract` can take as input a single URL/path or a list or dictionnary of URLs/paths and will return an object of the same structure (single URL/path, list or dictionnary of URLs/paths) with the path to the local files.
+As you can see this method first prepare a dict of URL to the original data files for SQuAD. This dict is then provided to the :func:`nlp.DownloadManager.download_and_extract` method which will take care of downloading or retriving from the local file system these files and returning a object of the same type and organization (here a dictionary) with the path to the local version of the requetsed files. :func:`nlp.DownloadManager.download_and_extract` can take as input a single URL/path or a list or dictionary of URLs/paths and will return an object of the same structure (single URL/path, list or dictionary of URLs/paths) with the path to the local files.

 This method also takes care of extracting compressed tar, gzip and zip archives.

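The structure-preserving behaviour described in the hunk above (a single URL/path, a list or a dictionary goes in; an object of the same structure comes out) can be sketched with a small recursive helper. This is an illustrative stand-in, not the implementation of :func:`nlp.DownloadManager.download_and_extract`: ``map_structure`` and ``fake_download`` are hypothetical names, the URLs are placeholders, and ``fake_download`` only computes a local path instead of fetching or extracting anything.

```python
from typing import Any, Callable


def map_structure(fn: Callable[[str], str], data: Any) -> Any:
    """Apply ``fn`` to every URL/path while keeping the container shape."""
    if isinstance(data, dict):
        return {key: map_structure(fn, value) for key, value in data.items()}
    if isinstance(data, (list, tuple)):
        return type(data)(map_structure(fn, value) for value in data)
    # A single URL/path: process it directly.
    return fn(data)


def fake_download(url: str, cache_dir: str = "/tmp/cache") -> str:
    """Hypothetical stand-in: compute a local cache path, download nothing."""
    filename = url.rstrip("/").split("/")[-1]
    return f"{cache_dir}/{filename}"


urls_to_download = {
    "train": "https://example.com/squad/train-v1.1.json",
    "dev": "https://example.com/squad/dev-v1.1.json",
}
downloaded_files = map_structure(fake_download, urls_to_download)
print(downloaded_files["dev"])  # /tmp/cache/dev-v1.1.json
```

A list goes through the same function: ``map_structure(fake_download, ["https://example.com/a.zip"])`` returns ``["/tmp/cache/a.zip"]``.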
@@ -208,7 +208,7 @@ Generating the samples in each split

 The :func:`nlp.DatasetBuilder._generate_examples` is in charge of reading the data files for a split and yielding examples with the format specified in the ``features`` set in :func:`nlp.DatasetBuilder._info`.

-The input arguments of :func:`nlp.DatasetBuilder._generate_examples` are defined by the :obj:`gen_kwargs` dictionnary returned by the :func:`nlp.DatasetBuilder._split_generator` method we detailed above.
+The input arguments of :func:`nlp.DatasetBuilder._generate_examples` are defined by the :obj:`gen_kwargs` dictionary returned by the :func:`nlp.DatasetBuilder._split_generator` method we detailed above.

 Here again, let's take the simple example of the `squad dataset loading script <https://github.com/huggingface/nlp/tree/master/datasets/squad/squad.py>`__:

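To make the link between ``gen_kwargs`` and the arguments of :func:`nlp.DatasetBuilder._generate_examples` concrete, here is a self-contained sketch in which each split's ``gen_kwargs`` dictionary is unpacked as keyword arguments. ``ToySplitGenerator``, ``ToyBuilder`` and ``build_split`` are hypothetical names that only mimic the pattern; they are not the library's classes, and the library's own driver code may differ in detail.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterator, List, Tuple


@dataclass
class ToySplitGenerator:
    """Hypothetical stand-in for a split generator: a name plus gen_kwargs."""
    name: str
    gen_kwargs: Dict[str, Any] = field(default_factory=dict)


class ToyBuilder:
    def _split_generators(self) -> List[ToySplitGenerator]:
        # The keys of gen_kwargs must match the parameter names of _generate_examples.
        return [
            ToySplitGenerator(name="train", gen_kwargs={"filepath": "train.json"}),
            ToySplitGenerator(name="validation", gen_kwargs={"filepath": "dev.json"}),
        ]

    def _generate_examples(self, filepath: str) -> Iterator[Tuple[int, Dict[str, str]]]:
        # A real script would open and parse ``filepath``; here we only echo it.
        yield 0, {"source": filepath}


def build_split(builder: ToyBuilder, split: ToySplitGenerator) -> List[Tuple[int, Dict[str, str]]]:
    # gen_kwargs is unpacked as keyword arguments of _generate_examples.
    return list(builder._generate_examples(**split.gen_kwargs))


builder = ToyBuilder()
for split in builder._split_generators():
    print(split.name, build_split(builder, split))
# train [(0, {'source': 'train.json'})]
# validation [(0, {'source': 'dev.json'})]
```

A key in ``gen_kwargs`` that does not match a parameter name would raise a ``TypeError`` at the ``**`` call, which is why the two have to be kept in sync.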
@@ -244,7 +244,7 @@ The input argument is the ``filepath`` provided in the :obj:`gen_kwargs` of each

 The method read and parse the inputs files and yield a tuple constituted of an ``id_`` (can be arbitrary be should be unique (for backward compatibility with TensorFlow dataset) and an example.

-The example is a dictionnary with the same structure and element types as the ``features`` defined in :func:`nlp.DatasetBuilder._info`.
+The example is a dictionary with the same structure and element types as the ``features`` defined in :func:`nlp.DatasetBuilder._info`.

 Specifying several dataset configurations
 -------------------------------------------------
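As a rough, self-contained sketch of the pattern described in the hunk above, the generator below reads a SQuAD v1.1-style JSON file and yields ``(id_, example)`` tuples whose example dictionaries store ``"answers"`` as a dictionary of lists. It is written as a hypothetical free function ``generate_examples`` rather than a builder method, the output keys are assumed to match a SQuAD-like ``features`` definition, and it is not copied from the actual loading script; the sample data is made up.

```python
import json
import os
import tempfile
from typing import Any, Dict, Iterator, Tuple


def generate_examples(filepath: str) -> Iterator[Tuple[str, Dict[str, Any]]]:
    """Hypothetical free-function version: yield (id_, example) pairs."""
    with open(filepath, encoding="utf-8") as f:
        squad = json.load(f)
    for article in squad["data"]:
        title = article.get("title", "").strip()
        for paragraph in article["paragraphs"]:
            context = paragraph["context"].strip()
            for qa in paragraph["qas"]:
                id_ = qa["id"]  # should be unique within the split
                # Build the dictionary of lists expected by the Sequence feature.
                answer_starts = [answer["answer_start"] for answer in qa["answers"]]
                answers = [answer["text"].strip() for answer in qa["answers"]]
                yield id_, {
                    "id": id_,
                    "title": title,
                    "context": context,
                    "question": qa["question"].strip(),
                    "answers": {"answer_start": answer_starts, "text": answers},
                }


# Made-up sample in the SQuAD v1.1 JSON layout.
sample = {"data": [{"title": "Toy", "paragraphs": [{
    "context": "Paris is in France.",
    "qas": [{"id": "q1", "question": "Where is Paris?",
             "answers": [{"answer_start": 12, "text": "France"}]}],
}]}]}

with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False, encoding="utf-8") as tmp:
    json.dump(sample, tmp)
    path = tmp.name
try:
    examples = list(generate_examples(path))
finally:
    os.remove(path)
print(examples[0][1]["answers"])  # {'answer_start': [12], 'text': ['France']}
```

The file is closed before it is read back, so the sketch also works on platforms that do not allow reopening an open temporary file.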
