
RELTask generates prompts that use ints instead of entity labels #366

@peter-axion


The relation task tells the LLM:

The text below contains pre-extracted entities, denoted in the following format within the text:
<entity text>[ENT<entity id>:<entity label>]

However, it generates prompt text like:

well[ENT0:14862748245026736845] hello[ENT1:14230521632333904559] there[ENT2:149303876845869574]!

Reproduce via:

import spacy
import spacy_llm

nlp = spacy.load("en_core_web_sm")
doc = nlp("well hello there!")
# Mark each of the first three tokens as its own entity, labelled A, B and C.
doc.set_ents([
    spacy.tokens.Span(doc, 0, 1, "A"),
    spacy.tokens.Span(doc, 1, 2, "B"),
    spacy.tokens.Span(doc, 2, 3, "C"),
])
tsk = spacy_llm.tasks.make_rel_task(labels=["A", "B", "C"])
# Print the prompt the REL task generates for this doc.
for prompt in tsk.generate_prompts([doc]):
    print(prompt)

The problem is a single character difference here:

annotation = f"[ENT{i}:{ent.label}]"

It should be ent.label_ instead of ent.label. I'd open the PR myself, but it feels odd that label holds the non-string (integer) version of the entity label, so maybe the real problem is upstream.

print(doc.ents[0].label)
print(doc.ents[0].label_)

14862748245026736845
A
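For anyone who wants to see the difference without installing a trained pipeline or calling spacy_llm, here is a minimal sketch. It assumes only spaCy v3 and a blank English pipeline. The annotate helper is hypothetical and written just for this illustration, not the library's actual code. It only joins the entity spans, so the trailing "!" from the prompt above is not reproduced.

```python
import spacy
from spacy.tokens import Span

# A blank pipeline is enough here: only the tokenizer is needed.
nlp = spacy.blank("en")
doc = nlp("well hello there!")
doc.set_ents([
    Span(doc, 0, 1, label="A"),
    Span(doc, 1, 2, label="B"),
    Span(doc, 2, 3, label="C"),
])


def annotate(doc, use_string_label=True):
    """Hypothetical helper: tag each entity as <text>[ENT<id>:<label>]."""
    pieces = []
    for i, ent in enumerate(doc.ents):
        # ent.label is the integer hash; ent.label_ is the readable string.
        label = ent.label_ if use_string_label else ent.label
        pieces.append(f"{ent.text}[ENT{i}:{label}]")
    return " ".join(pieces)


buggy = annotate(doc, use_string_label=False)  # integer hashes, as in this report
fixed = annotate(doc, use_string_label=True)   # string labels, as the prompt promises
print(buggy)
print(fixed)

# The integer can be mapped back to the string through the vocab's StringStore.
first = doc.ents[0]
print(doc.vocab.strings[first.label] == first.label_)
```

The integer in label is the label's hash in the vocab's StringStore, so the two attributes carry the same information. Only the trailing-underscore form is meant for display.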


Labels: bug (Something isn't working), feat/task (Feature: tasks)
