Closed
Hi,
Thanks for this tool.
I noticed that coref sometimes doesn't use the proper noun as the representative text. Is there any way to make it use the proper noun?
Here is my code (wip):
import stanza

pipe = stanza.Pipeline("en", processors="tokenize,coref")
t = pipe('"I am doing this," John said. He did it.')
final = []
nouns = []
for sente in t.to_dict():
    sent = []
    exclude_ids = []
    for word in sente:
        if word['id'] not in exclude_ids:
            # A tuple id marks a multi-word token; skip the words it covers.
            if isinstance(word['id'], tuple):
                exclude_ids += word['id']
            if "coref_chains" in word and isinstance(word['coref_chains'], list):
                # Replace a non-representative mention with its chain's representative text.
                if word['coref_chains'] and not word['coref_chains'][0].is_representative:
                    print(word['coref_chains'][0].to_json())
                    sent.append(word['coref_chains'][0].chain.representative_text)
                else:
                    sent.append(word['text'])
            else:
                sent.append(word['text'])
    sent = [item.strip() for item in sent if item and item.strip()]
    # Rejoin the tokens, attaching punctuation to the preceding word.
    x = ''
    for i in sent:
        if i in ['.', ',', '?', ';', ':']:
            x += i
        else:
            x += ' ' + i
    if sent:
        final.append(x.strip())
print(' '.join(final))
Output: " I am doing this, " I said. I did this.
It should be: " John am doing this, " John said. John did this.
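To make the behavior I'm after concrete, here is a minimal sketch on plain Python data, not stanza objects. The function name `pick_representative` and the `(text, upos_tags)` tuple layout are hypothetical, made up for illustration. The idea is to prefer a mention that contains a proper noun, and only fall back to some other heuristic (here, the longest mention) when the chain has none.

```python
# Hypothetical sketch: each mention is a plain (text, upos_tags) pair,
# not a stanza object. The names here are assumptions for illustration.
def pick_representative(mentions):
    """Return the text of the first mention that contains a proper noun,
    falling back to the longest mention text if none does."""
    for text, tags in mentions:
        if "PROPN" in tags:
            return text
    return max(mentions, key=lambda m: len(m[0]))[0]


chain = [("I", ["PRON"]), ("John", ["PROPN"]), ("He", ["PRON"])]
print(pick_representative(chain))  # prints: John
```

With a rule like this, every mention in the chain above would be replaced by "John" rather than "I".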
Thank you!