|
2 | 2 | History |
3 | 3 | ####### |
4 | 4 |
|
5 | | -************************** |
| 5 | +********************** |
| 6 | +2.0.0 (May 11th, 2020) |
| 7 | +********************** |
| 8 | +**Fully reworked version!** |
| 9 | + |
| 10 | +* Tested support for both `spacy-stanza`_ and `spacy-udpipe`_! (Neither is included as a dependency; install them manually.) |
| 11 | +* Added a useful utility function :code:`init_parser` that can easily initialise a parser together with the custom |
| 12 | + pipeline component. (See the README or `examples`_.) |
| 13 | +* Added the :code:`disable_pandas` flag to the formatter class in case you want to disable setting the pandas |
| 14 | + attribute even when pandas is installed. |
| 15 | +* Added custom properties for Tokens as well, so a Doc, its sentence Spans, and its Tokens now all have custom attributes. |
| 16 | +* Reworked datatypes of output. In version 2.0.0 the data types are as follows: |
| 17 | + - :code:`._.conll`: raw CoNLL format |
| 18 | + - in :code:`Token`: a dictionary containing all the expected CoNLL fields as keys and the parsed properties as |
| 19 | + values. |
| 20 | + - in sentence :code:`Span`: a list of its tokens' :code:`._.conll` dictionaries (list of dictionaries). |
| 21 | + - in a :code:`Doc`: a list of its sentences' :code:`._.conll` lists (list of list of dictionaries). |
| 22 | + - :code:`._.conll_str`: string representation of the CoNLL format |
| 23 | + - in :code:`Token`: tab-separated representation of the contents of the CoNLL fields ending with a newline. |
| 24 | + - in sentence :code:`Span`: the expected CoNLL format where each row represents a token. When |
| 25 | + :code:`ConllFormatter(include_headers=True)` is used, two header lines are included as well, as per the |
| 26 | + `CoNLL format`_. |
| 27 | + - in :code:`Doc`: all its sentences' :code:`._.conll_str` combined and separated by new lines. |
| 28 | + - :code:`._.conll_pd`: ``pandas`` representation of the CoNLL format |
| 29 | + - in :code:`Token`: a :code:`Series` representation of this token's CoNLL properties. |
| 30 | + - in sentence :code:`Span`: a :code:`DataFrame` representation of this sentence, with the CoNLL names as column |
| 31 | + headers. |
| 32 | + - in :code:`Doc`: a concatenation of its sentences' :code:`DataFrame` objects, leading to a new :code:`DataFrame` whose |
| 33 | + index is reset. |
| 34 | +* :code:`field_names` has been removed, on the assumption that you do not need to change the column names of the CoNLL properties. |
| 35 | +* Removed the :code:`Spacy2ConllParser` class |
| 36 | +* Many documentation changes, added tests, and a few examples. |
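To make the 2.0.0 output shapes listed above more concrete, here is a minimal plain-Python sketch. It does not call spaCy or spacy_conll. The helper names (`token_conll_str`, `sent_conll_str`, `doc_conll_str`), the use of the ten CoNLL-U column names as dictionary keys, the `# sent_id = ` / `# text = ` wording of the two header lines, and the token values are all hypothetical, chosen only to mirror the nesting (Token dict, Span list of dicts, Doc list of lists) and the string layout the changelog describes.

```python
# Hypothetical sketch of the 2.0.0 output shapes using plain Python data.
# It does NOT call spaCy or spacy_conll; all names and values are illustrative.

# Assumption: the ten CoNLL-U columns, used here as dictionary keys.
CONLL_FIELDS = ["ID", "FORM", "LEMMA", "UPOS", "XPOS",
                "FEATS", "HEAD", "DEPREL", "DEPS", "MISC"]


def token_conll_str(token_conll):
    """Token level: tab-separated field values, ending with a newline."""
    return "\t".join(str(token_conll[field]) for field in CONLL_FIELDS) + "\n"


def sent_conll_str(sent_conll, sent_id, text, include_headers=False):
    """Sentence level: one row per token, optionally preceded by two comment lines."""
    # Assumption: header wording follows the CoNLL-U sent_id/text comment convention.
    headers = f"# sent_id = {sent_id}\n# text = {text}\n" if include_headers else ""
    return headers + "".join(token_conll_str(tok) for tok in sent_conll)


def doc_conll_str(doc_conll, texts, include_headers=False):
    """Doc level: the sentence strings joined by newlines."""
    return "\n".join(
        sent_conll_str(sent, i, text, include_headers)
        for i, (sent, text) in enumerate(zip(doc_conll, texts), start=1)
    )


# Token-level ._.conll: a dict keyed by field name (values made up for illustration).
hello = dict(zip(CONLL_FIELDS, [1, "Hello", "hello", "INTJ", "UH", "_", 2, "intj", "_", "_"]))
world = dict(zip(CONLL_FIELDS, [2, "world", "world", "NOUN", "NN", "Number=Sing", 0, "ROOT", "_", "_"]))

sentence = [hello, world]  # sentence-level ._.conll: list of token dicts
doc = [sentence]           # Doc-level ._.conll: list of sentence lists

print(doc_conll_str(doc, ["Hello world"], include_headers=True))
```

Because each sentence string already ends with a newline, joining sentences with another newline leaves one blank line between them, which is how the CoNLL-U format separates sentences.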
| 37 | + |
| 38 | + |
| 39 | +.. _`spacy-stanza`: https://github.com/explosion/spacy-stanza |
| 40 | +.. _`spacy-udpipe`: https://github.com/TakeLab/spacy-udpipe |
| 41 | +.. _`examples`: examples/ |
| 42 | +.. _`CoNLL format`: https://universaldependencies.org/format.html#sentence-boundaries-and-comments |
| 43 | + |
| 44 | +************************ |
6 | 45 | 1.3.0 (April 28th, 2020) |
7 | | -************************** |
| 46 | +************************ |
8 | 47 | * **IMPORTANT**: This will be the last release that supports the deprecated Spacy2ConllParser class! |
9 | 48 | * Community addition (@KoichiYasuoka): add SpaceAfter=No to the Misc field when applicable. |
10 | 49 | * Fixed failing tests |
|