Closed
Description
While trying to get Pub2TEI working on the grobid gold standard data from PMC, I ran into the DTD issues mentioned in the README. After some research, I found that loading of external DTDs can be disabled with the following Saxon switch:
--parserFeature?uri=http%3A//apache.org/xml/features/nonvalidating/load-external-dtd:false
References:
- Saxon docs on the --feature switch
- StackOverflow response from Saxon developer Michael Kay
- Xerces documentation on load-external-dtd
I've attached a file from the grobid PMC gold standard data I was having trouble with. The new switch allows the conversion to proceed.
The sample command in the README could be updated to:
java -jar Samples/saxon9he.jar \
--parserFeature?uri=http%3A//apache.org/xml/features/nonvalidating/load-external-dtd:false \
-a:off \
-dtd:off \
-expand:off \
-o:out.tei.xml \
-s:Samples/TestPubInput/BMJ/bmj_sample.xml \
-t \
-xsl:Stylesheets/Publishers.xsl
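For anyone driving the parse from Java rather than from the Saxon command line, the same Xerces feature can be set through the standard JAXP API. The following is only a minimal sketch of what the switch does, not how Pub2TEI or Saxon wire it up internally. The class name, the helper method, and the DTD filename are hypothetical, and it assumes the JAXP parser in use is Xerces-based (as the JDK's built-in parser is), since other parsers may reject this feature URI.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of Pub2TEI or Saxon.
final class SkipExternalDtd {

    // Xerces feature URI: the same one named in the Saxon switch above.
    static final String LOAD_EXTERNAL_DTD =
        "http://apache.org/xml/features/nonvalidating/load-external-dtd";

    // Parses the XML string without fetching its external DTD and
    // returns the name of the root element.
    static String parseRootName(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(false);
        // Assumption: the underlying parser is Xerces-based and
        // recognizes this feature; otherwise setFeature throws.
        factory.setFeature(LOAD_EXTERNAL_DTD, false);
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getNodeName();
    }

    public static void main(String[] args) throws Exception {
        // The DOCTYPE points at a DTD that does not exist (hypothetical name).
        String xml = "<?xml version=\"1.0\"?>\n"
            + "<!DOCTYPE article SYSTEM \"missing-hypothetical.dtd\">\n"
            + "<article><title>Sample</title></article>";
        System.out.println(parseRootName(xml));
    }
}
```

With the feature left at its default, the parser would try to resolve the DTD named in the DOCTYPE and fail when it cannot be found, which is the same symptom the switch works around on the command line.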