Description of the bug
I'm trying to train a model that builds a knowledge base from the OPC UA Companion Specifications as part of my thesis.
My dataset consists of PDFs. I used a third-party program to convert them into HTML and tried my best to preserve the structural information (I get the same result even if I parse the PDFs alone).
I then followed the hardware_fonduer_model tutorial to extract the candidates.
The problem is that the parser splits sentences incorrectly: it treats the end of a line as the end of a sentence.
I tried to debug this with SimpleParser.split_sentences(text), and it turned out that Python needs a backslash to split a statement across multiple lines.
So I thought I could pass the parameter replacements=['[\n]', ' '] so that the split would work better, but I get ValueError: too many values to unpack (expected 2).
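A minimal sketch of how this error can arise, assuming the parser unpacks each entry of `replacements` as a (pattern, replacement) pair. The helper names `compile_replacements` and `apply_replacements` are hypothetical and only mimic that behavior; they are not Fonduer's API. Under that assumption, a flat list makes the loop try to unpack the three-character string `"[\n]"` into two names, while a list containing one tuple works.

```python
import re


# Hypothetical helpers that mimic a parser expecting each entry of
# `replacements` to be a (pattern, replacement) pair. Not Fonduer's API.
def compile_replacements(replacements):
    compiled = []
    for pattern, replace in replacements:  # each entry is unpacked into two values
        compiled.append((re.compile(pattern, flags=re.UNICODE), replace))
    return compiled


def apply_replacements(text, replacements):
    for regex, replace in compile_replacements(replacements):
        text = regex.sub(replace, text)
    return text


flat = ["[\n]", " "]      # flat list: the first entry is a 3-character string
paired = [("[\n]", " ")]  # list holding one (pattern, replacement) tuple

try:
    compile_replacements(flat)
    flat_error = None
except ValueError as exc:
    flat_error = str(exc)  # unpacking "[", "\n", "]" into two names fails

joined = apply_replacements("move command\nrequest.", paired)
print(flat_error)
print(joined)
```

If the assumption holds, passing the pair as a tuple inside the list may be enough to get past the ValueError; whether it also fixes the segmentation depends on when the parser applies the replacements.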
What is the default configuration for sentence segmentation?
How can I get multiple sentences as a single mention? (I tried MentionNgrams with n_max up to 100 and still get only one sentence.)
I would really appreciate hearing back from you.
Many thanks in advance.
Example: Text to be parsed
Boolean indicating if a profile /signature should be generated by this move command
request.If the optional VariableSignatureRequestStatus is not provided on the Object, this
parameter is ignored by the Server.
Expected behavior
sentence 1 : Boolean indicating if a profile /signature should be generated by this move command request.
sentence 2 : If the optional VariableSignatureRequestStatus is not provided on the Object, this
parameter is ignored by the Server.
Actual behavior
sentence 1 : Boolean indicating if a profile /signature should be generated by this move command
sentence 2 : request.
sentence 3 : request.If the optional VariableSignatureRequestStatus is not provided on the Object, this
sentence 4 : parameter is ignored by the Server.
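The expected segmentation can be approximated outside Fonduer by joining hard-wrapped lines before splitting. This is only a sketch: `join_wrapped_lines` and `naive_split_sentences` are hypothetical helpers, and the regex splitter is a simple stand-in rather than the segmenter Fonduer actually uses. It assumes that a sentence ends at `.`, `!` or `?` followed by an uppercase letter, which would break on abbreviations and on dotted identifiers.

```python
import re


def join_wrapped_lines(text):
    # Replace each line break (and any spaces around it) with a single space.
    return re.sub(r"\s*\n\s*", " ", text).strip()


def naive_split_sentences(text):
    # Split after ., ! or ? when the next non-space character is uppercase.
    # The whitespace is optional, so "request.If" is split as well.
    parts = re.split(r"(?<=[.!?])\s*(?=[A-Z])", text)
    return [p.strip() for p in parts if p.strip()]


raw = (
    "Boolean indicating if a profile /signature should be generated by this move command\n"
    "request.If the optional VariableSignatureRequestStatus is not provided on the Object, this\n"
    "parameter is ignored by the Server."
)

sentences = naive_split_sentences(join_wrapped_lines(raw))
for number, sentence in enumerate(sentences, start=1):
    print(f"sentence {number} : {sentence}")
```

Joining the lines first means a line break is never mistaken for a sentence boundary; the split then relies only on punctuation.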
Environment
- OS: Ubuntu 20.04.1 LTS
- PostgreSQL Version: 12.0
- Poppler Utils Version: 0.2.1
- Fonduer Version: 0.8.2
MDISCompanionSpecification.pdf