Commit 5ac61ee

Bram Vanroy authored and committed
Release v2.0.0
1 parent 9866f68 commit 5ac61ee

1 file changed, +41 -2 lines changed

HISTORY.rst

Lines changed: 41 additions & 2 deletions
@@ -2,9 +2,48 @@
 History
 #######
 
-**************************
+**********************
+2.0.0 (May 11th, 2020)
+**********************
+**Fully reworked version!**
+
+* Tested support for both `spacy-stanza`_ and `spacy-udpipe`_! (Neither is included as a dependency; install them manually.)
+* Added a useful utility function :code:`init_parser` that can easily initialise a parser together with the custom
+  pipeline component. (See the README or `examples`_.)
+* Added the :code:`disable_pandas` flag to the formatter class in case you want to disable setting the pandas
+  attribute even when pandas is installed.
+* Added custom properties for Tokens as well. A Doc, its sentence Spans, and its Tokens now all have custom attributes.
+* Reworked datatypes of output. In version 2.0.0 the data types are as follows:
+    - :code:`._.conll`: raw CoNLL format
+        - in :code:`Token`: a dictionary containing all the expected CoNLL fields as keys and the parsed properties as
+          values.
+        - in sentence :code:`Span`: a list of its tokens' :code:`._.conll` dictionaries (list of dictionaries).
+        - in a :code:`Doc`: a list of its sentences' :code:`._.conll` lists (list of lists of dictionaries).
+    - :code:`._.conll_str`: string representation of the CoNLL format
+        - in :code:`Token`: tab-separated representation of the contents of the CoNLL fields ending with a newline.
+        - in sentence :code:`Span`: the expected CoNLL format where each row represents a token. When
+          :code:`ConllFormatter(include_headers=True)` is used, two header lines are included as well, as per the
+          `CoNLL format`_.
+        - in :code:`Doc`: all its sentences' :code:`._.conll_str` combined and separated by new lines.
+    - :code:`._.conll_pd`: ``pandas`` representation of the CoNLL format
+        - in :code:`Token`: a :code:`Series` representation of this token's CoNLL properties.
+        - in sentence :code:`Span`: a :code:`DataFrame` representation of this sentence, with the CoNLL names as column
+          headers.
+        - in :code:`Doc`: a concatenation of its sentences' :code:`DataFrame` objects, leading to a new :code:`DataFrame` whose
+          index is reset.
+* :code:`field_names` has been removed, assuming that you do not need to change the column names of the CoNLL properties.
+* Removed the :code:`Spacy2ConllParser` class.
+* Many documentation changes, added tests, and a few examples.
+
+
+.. _`spacy-stanza`: https://github.com/explosion/spacy-stanza
+.. _`spacy-udpipe`: https://github.com/TakeLab/spacy-udpipe
+.. _`examples`: examples/
+.. _`CoNLL format`: https://universaldependencies.org/format.html#sentence-boundaries-and-comments
+
+************************
 1.3.0 (April 28th, 2020)
-**************************
+************************
 * **IMPORTANT**: This will be the last release that supports the deprecated Spacy2ConllParser class!
 * Community addition (@KoichiYasuoka): add SpaceAfter=No to the Misc field when applicable.
 * Fixed failing tests
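
The nesting that the 2.0.0 entry describes for :code:`._.conll` and :code:`._.conll_str` can be illustrated without spaCy at all. The sketch below is not the library's implementation. The field names are an assumption based on the CoNLL-U column order, the token values are made up, and the helpers (:code:`token_str`, :code:`sentence_str`, :code:`doc_str`, :code:`make_token`) are hypothetical. The optional header lines are left out.

```python
# Illustration only: plain-Python stand-ins for the shapes described in the 2.0.0 entry.
# The field names are an assumption based on the CoNLL-U column order.
FIELDS = ["id", "form", "lemma", "upos", "xpos", "feats", "head", "deprel", "deps", "misc"]


def token_str(token: dict) -> str:
    """Tab-separated field values ending with a newline (Token level)."""
    return "\t".join(str(token[field]) for field in FIELDS) + "\n"


def sentence_str(sentence: list) -> str:
    """One row per token (sentence Span level, without header lines)."""
    return "".join(token_str(token) for token in sentence)


def doc_str(doc: list) -> str:
    """Sentence blocks joined with a newline, leaving a blank line between them (Doc level)."""
    return "\n".join(sentence_str(sentence) for sentence in doc)


def make_token(idx, form, head, deprel):
    # Hypothetical helper: fills unused columns with the CoNLL-U placeholder "_".
    token = dict.fromkeys(FIELDS, "_")
    token.update(id=idx, form=form, lemma=form.lower(), head=head, deprel=deprel)
    return token


# Token -> dict, sentence -> list of dicts, document -> list of lists of dicts.
sent_1 = [make_token(1, "Hello", 0, "ROOT"), make_token(2, "world", 1, "vocative")]
sent_2 = [make_token(1, "Bye", 0, "ROOT")]
doc = [sent_1, sent_2]

print(doc_str(doc))
```

Each level is built only from the level below it, which is why a sentence is a list of token dictionaries and a document is a list of such lists.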

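
The three :code:`._.conll_pd` levels from the 2.0.0 entry map onto standard ``pandas`` objects. The following is a minimal sketch, assuming ``pandas`` is installed and using made-up rows with a reduced, hypothetical set of columns. How the library builds its frames internally is not stated in this changelog.

```python
import pandas as pd

# Made-up token rows with a reduced, hypothetical column set.
sent_1 = [
    {"id": 1, "form": "Hello", "head": 0},
    {"id": 2, "form": "world", "head": 1},
]
sent_2 = [{"id": 1, "form": "Bye", "head": 0}]

# Token level: a Series whose index holds the field names.
token_series = pd.Series(sent_1[0])

# Sentence level: a DataFrame with the field names as column headers.
sent_frames = [pd.DataFrame(sent_1), pd.DataFrame(sent_2)]

# Document level: concatenate the sentence frames and reset the index,
# so rows are numbered 0..n-1 across the whole document.
doc_frame = pd.concat(sent_frames, ignore_index=True)

print(doc_frame)
```

Without the index reset, the row labels would restart at 0 for every sentence, so a label alone would not identify a unique row in the document-level frame.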