
coref not using proper noun #1326

@fakerybakery


Hi,
Thanks for this tool.
I noticed that coref sometimes doesn't use the proper noun when it replaces a mention. Is there any way to make it use the proper noun?
Here is my code (WIP):

```python
import stanza

pipe = stanza.Pipeline("en", processors="tokenize,coref")
doc = pipe('"I am doing this," John said. He did it.')

final = []
for sentence in doc.to_dict():
    sent = []
    exclude_ids = []
    for word in sentence:
        if word['id'] in exclude_ids:
            continue
        # A multi-word token has a tuple id; skip the words it expands to
        if isinstance(word['id'], tuple):
            exclude_ids += word['id']
        chains = word.get('coref_chains')
        # Replace a non-representative mention with its chain's representative text
        # (the old check `type(x == list)` was always truthy)
        if isinstance(chains, list) and chains and not chains[0].is_representative:
            print(chains[0].to_json())
            sent.append(chains[0].chain.representative_text)
        else:
            sent.append(word['text'])
    sent = [item.strip() for item in sent if item and item.strip()]
    # Rejoin tokens, attaching punctuation to the previous token
    x = ''
    for i in sent:
        if i in ['.', ',', '?', ';', ':']:
            x += i
        else:
            x += ' ' + i
    if sent:
        final.append(x.strip())

print(' '.join(final))
```
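The space-joining loop at the end of the script is what puts spaces inside the quotation marks (`" I am doing this, "`). A possible alternative, assuming each word's character offsets are available (Stanza word dicts normally carry `start_char` and `end_char`; check your version), is to splice the replacements into the original string instead of re-joining tokens. `splice` below is a hypothetical helper, not part of Stanza, and the offsets in the example are computed by hand for this input:

```python
from typing import List, Tuple

# Hypothetical replacement format: (start_char, end_char, new_text),
# with offsets measured on the ORIGINAL string.
Replacement = Tuple[int, int, str]


def splice(text: str, replacements: List[Replacement]) -> str:
    """Apply non-overlapping replacements by character offset.

    Working from the last span back to the first keeps earlier offsets
    valid even when a replacement changes the string's length.
    """
    out = text
    for start, end, new in sorted(replacements, key=lambda r: r[0], reverse=True):
        out = out[:start] + new + out[end:]
    return out


text = '"I am doing this," John said. He did it.'
# "I" is at 1..2, "He" at 30..32, "it" at 37..39 in the original string
print(splice(text, [(1, 2, "John"), (30, 32, "John"), (37, 39, "this")]))
# '"John am doing this," John said. John did this.'
```

Because untouched text is copied through verbatim, the original spacing around quotes and punctuation survives.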

Output: " I am doing this, " I said. I did this.
It should be: " John am doing this, " John said. John did this.
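One possible workaround, sketched here and not taken from Stanza: instead of using `chain.representative_text`, pick the first mention in the chain that contains a proper noun, based on UPOS tags from Stanza's `pos` processor, and fall back to the representative text otherwise. `Mention` and `pick_display_text` are hypothetical stand-ins; how they map onto Stanza's own chain and mention objects is an assumption to verify against the library:

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Mention:
    # Hypothetical stand-in for one coref mention: its words as (text, upos) pairs
    words: List[Tuple[str, str]]

    @property
    def text(self) -> str:
        # Naive join; good enough for names such as "John Smith"
        return " ".join(w for w, _ in self.words)


def pick_display_text(mentions: List[Mention], representative_text: str) -> str:
    """Return the first mention containing a proper noun (UPOS "PROPN"),
    otherwise the chain's representative text."""
    for mention in mentions:
        if any(upos == "PROPN" for _, upos in mention.words):
            return mention.text
    return representative_text


chain = [
    Mention([("I", "PRON")]),
    Mention([("John", "PROPN")]),
    Mention([("He", "PRON")]),
]
print(pick_display_text(chain, representative_text="I"))  # John
```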
Thank you!
