Copyright 2010 Inera Incorporated. All Rights Reserved
NLM DTD Flexibility: How and Why Applications of the NLM DTD Vary
Presented by
Bruce D. Rosenblum
CEO
Inera Incorporated
Journal Article Tag Suite Conference, 1 November 2010
Remember When…
Scholarly DTDs, Circa 2001
ISO 12083
Elsevier 1.1.0
Elsevier 2.1.1
Elsevier 3.0.0
Elsevier 4.1
Blackwell 2.2
Blackwell 3.0
Blackwell 4.0
Keton
Cadmus
Capital City
Charlesworth
Alden
Highwire 4.2.8
PMC 1.0
AIP
UCP
Wiley
IEEE
Nature
BioOne
U Chicago Press
Cambridge University Press
American Geophysical
American Medical
New England Journal
American Chemical
National Research Canada
Academic Press
Oxford University Press
Springer
Kluwer Academic
Scholarly DTDs 2010
NLM DTD
Elsevier DTD
Springer DTD
Wiley-Blackwell DTD
And a few others…
No longer a grand mess, but…
  NLM DTD Suite applications vary
  Specific tagging practices meet publisher-specific requirements
Data and Methodology
Data from 25 eXtyles and refXpress implementations since 2003
Not a scientific survey
However useful to show NLM DTD usage variations
Supplier requirements differ from publishers
  Serve multiple publishers who deliver to different platforms
NLM DTD Adoption By Year
Organization DTD Year Version Prior XML
Publisher 1 Archive * 2003 3.0 † No
Publisher 2 Archive 2005 2.0 No
Publisher 3 Archive 2005 2.3 No
Publisher 4 Archive 2006 2.2 No
Publisher 5 Archive 2006 2.3 Yes
Publisher 6 Publish 2006 2.3 Yes
Publisher 7 Publish & book 2006 2.3 No
Publisher 8 Book * 2007 2.2 No
Publisher 9 Publish 2007 2.3 No
Publisher 10 Archive 2007 2.3 No
Publisher 11 Publish 2007 2.3 No
Publisher 12 Publish 2007 2.3 No
Publisher 13 Publish 2008 2.3 Yes
Publisher 14 Publish & book 2008 2.3 No
Publisher 15 Publish 2008 2.3 No
Publisher 16 Book 2009 2.3 No
Publisher 17 Publish 2009 2.3 No
Publisher 18 Publish 2010 2.3 No
Publisher 19 Publish * 2010 3.0 No
Publisher 20 Archive 2010 3.0 No
Publisher 21 Publish * 2010 3.0 Yes
JATS-con Authoring 2010 3.0 Yes
Supplier 1 Publish 2008 2.3 Yes
Supplier 2 Publish 2007 2.3 No
Supplier 3 Book 2010 3.0 Yes
* Customized version of DTD beyond OASIS-CALS addition
† Upgraded from 1.0 to 3.0 in 2010
Year of DTD Adoption
Few implementations prior to 2006
  Mostly related to PMC deposit
Adoption rate grows in 2006 and later
  Maturity of version 2.0 in August 2004
  Greater public awareness by 2006
    Freely available and modifiable
    Flexible
    Not just for life science content
  More off-the-shelf tool support from NCBI and others
3.0 upgrade not automatic; not fully backwards compatible
Prior Markup Experience
Most had not used full-text XML or SGML
  Driven to NLM DTD for:
    More modern XML-based workflow
    Desire for full-text to drive HTML and archive needs
    PMC deposit
Those with SGML experience
  SGML to XML conversion choice
    Convert existing DTD to XML
    Adopt NLM DTD
DTD Selection
Most adopters use Journal Publishing (blue) DTD
Early adopters chose Archive and Interchange (green) DTD
  Blue was too restrictive prior to 2.0
  ISSN optional in green; hosts non-serial publications without modification
Book DTD use growing in recent years
  Not as mature as journals, but useful
Implementation Characteristics
Organization Char Encoding Math Tables List Labels Ref PCDATA
Publisher 1 ISO MathML HTML DROP DROP
Publisher 2 ISO Graphic HTML DROP DROP
Publisher 3 Unicode Graphic HTML DROP KEEP
Publisher 4 ISO MathML CALS DROP KEEP
Publisher 5 ISO MathML HTML DROP KEEP
Publisher 6 ISO TeX CALS DROP KEEP
Publisher 7 Unicode Graphic HTML DROP DROP
Publisher 8 ISO Graphic CALS KEEP KEEP
Publisher 9 ISO MathML HTML DROP DROP
Publisher 10 Unicode Graphic HTML DROP DROP
Publisher 11 Unicode MathML HTML DROP DROP
Publisher 12 Unicode MathML HTML KEEP KEEP
Publisher 13 Unicode Graphic CALS KEEP KEEP
Publisher 14 Unicode Graphic CALS KEEP KEEP
Publisher 15 Unicode Graphic HTML DROP DROP
Publisher 16 Unicode Graphic HTML KEEP KEEP
Publisher 17 Unicode Graphic HTML DROP KEEP
Publisher 18 Unicode Graphic HTML DROP KEEP
Publisher 19 Unicode Graphic CALS KEEP NA
Publisher 20 Unicode NA NA NA KEEP
Publisher 21 Unicode MathML CALS KEEP KEEP
JATS-con Unspecified MathML HTML DROP KEEP
Supplier 1 Unicode MathML HTML KEEP KEEP
Supplier 2 ISO TeX CALS KEEP KEEP
Supplier 3 Unicode MathML+graphic CALS KEEP KEEP
Character Encoding
Most implementations use Unicode entities (e.g., &#x03B2;)
  Quasi-human readable (unlike UTF-8)
Some use ISO entities (e.g., &beta;)
  Most human-readable
  But transform required for HTML
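The transform from ISO entities can be pictured as rewriting named entity references into numeric character references. This is a minimal Python sketch, not any publisher's actual transform. It assumes the HTML entity table in the standard library (`html.entities.name2codepoint`) as a stand-in for the ISO entity sets, which it overlaps but does not fully cover. The function name `named_to_numeric` is hypothetical.

```python
import re
from html.entities import name2codepoint

# XML's predefined entities must stay as references to keep the markup well-formed.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

def named_to_numeric(text):
    """Rewrite named entity references (e.g. &beta;) as hex character references."""
    def replace(match):
        name = match.group(1)
        if name in XML_PREDEFINED:
            return match.group(0)
        codepoint = name2codepoint.get(name)
        if codepoint is None:
            # Unknown name: leave untouched for a fuller ISO entity mapping.
            return match.group(0)
        return f"&#x{codepoint:04X};"
    return re.sub(r"&([A-Za-z][A-Za-z0-9]*);", replace, text)

print(named_to_numeric("&beta;-carotene &amp; vitamin A"))
```

A production transform would load the full ISO entity declarations rather than the HTML subset assumed here.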
Generated and Boilerplate Text
Generated Text: Inconsequential, formulaic, or stereotypical text, punctuation, and formatting omitted from an XML file, which is applied to content by a style sheet when an XML file is rendered
Boilerplate Text: Inconsequential, formulaic, or stereotypical text, punctuation, and formatting that could have been omitted but which the publisher has chosen to keep in the XML file rather than to generate with a style sheet
NLM DTD Structure
NLM DTD is flexible
  Permits generated or boilerplate text
Degree varies by tag set
  Green DTD allows greatest degree of Boilerplate Text
  Includes the <x> element
Hypothesis: Flexibility of generated versus boilerplate text increased NLM DTD adoption
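To make the distinction concrete, this is a minimal Python sketch that treats the same name both ways. In one version the separator is kept in the XML inside `<x>`. In the other it is stripped and supplied at render time. The fragment, the sample name, and the helpers `drop_x` and `render_name` are all hypothetical, and the fragment has not been validated against any version of the DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: the separator is kept in the XML as boilerplate text.
BOILERPLATE = "<name><surname>Doe</surname><x>, </x><given-names>Jane</given-names></name>"

def drop_x(root):
    """Remove every <x> element, keeping any text that followed it."""
    for parent in list(root.iter()):
        previous = None
        for child in list(parent):
            if child.tag != "x":
                previous = child
                continue
            tail = child.tail or ""
            if previous is None:
                parent.text = (parent.text or "") + tail
            else:
                previous.tail = (previous.tail or "") + tail
            parent.remove(child)
    return root

def render_name(name):
    """Stand-in for a style sheet rule that generates the separator."""
    return f"{name.findtext('surname')}, {name.findtext('given-names')}"

kept = ET.fromstring(BOILERPLATE)
print("".join(kept.itertext()))                    # text read straight from the XML
stripped = drop_x(ET.fromstring(BOILERPLATE))
print(ET.tostring(stripped, encoding="unicode"))   # no separator left in the file
print(render_name(stripped))                       # separator supplied at render time
```

Both routes yield the same rendered string. The difference is whether the punctuation lives in the file or in the style sheet.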
List Labels
List-type attribute carries format information
Most publishers don’t keep list label
  Possibly because HTML excludes list label
Books are an exception
  List label useful for discontinuous lists (e.g., items 1 to 4, intervening text, then items 5 to 8)
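The discontinuous-list case can be sketched in a few lines of Python. The sketch assumes a style sheet that generates labels from the list-type value alone and knows nothing about an earlier list. The function names and the handful of list-type values handled here are illustrative, not a complete implementation.

```python
def make_label(list_type, n):
    """Generate the label a style sheet might produce for item n (1-based)."""
    if list_type == "order":
        return f"{n}."
    if list_type == "alpha-lower":
        return f"{chr(ord('a') + n - 1)}."  # valid for n <= 26 only
    if list_type == "bullet":
        return "\u2022"
    return ""  # e.g. "simple": no label

def generate_labels(list_type, count, start=1):
    """Labels for `count` items, numbering from `start`."""
    return [make_label(list_type, n) for n in range(start, start + count)]

# First list: items 1 to 4. Generated labels are correct.
first = generate_labels("order", 4)

# Second list after intervening text: regenerated labels restart at 1,
# whereas labels kept in the XML would carry the intended 5 to 8.
second_generated = generate_labels("order", 4)
second_kept = ["5.", "6.", "7.", "8."]

print(first, second_generated, second_kept)
```

A style sheet could be given a start value, as the `start` parameter shows, but that value then has to be stored somewhere in the XML, which is one reason to keep the label itself.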
Early Reference Models
Versions 1.0 through 2.3 had the <citation> and <nlm-citation> elements
  <citation> allowed PCDATA and any element order
  <nlm-citation> allowed only elements in prescribed order
No way to restrict PCDATA without enforcing element order
Problematic when mixing parsed and unparsed references (e.g., gray literature)
Reference Tagging 3.0
<mixed-citation> and <element-citation>
  Former allows PCDATA
  Latter allows only semantic elements
  Neither prescribes order
Reference Tagging
Most publishers keep PCDATA
All suppliers keep PCDATA
Reasons
  Less style sheet setup (PDF, HTML, etc.)
  PCDATA can easily be dropped
  Suppliers: multiple publisher styles require less setup
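The point that PCDATA can easily be dropped can be illustrated with a short Python sketch. It copies a `<mixed-citation>`, renames it to `<element-citation>`, and discards the punctuation and spacing between elements. The sample reference and the function name `drop_pcdata` are hypothetical. The sketch also assumes every meaningful part of the reference is already tagged, so any untagged text in a partly parsed reference would be lost.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical reference with punctuation kept between the tagged parts.
MIXED = (
    '<mixed-citation publication-type="journal">'
    '<person-group person-group-type="author"><name><surname>Doe</surname> '
    '<given-names>J</given-names></name></person-group>. '
    '<article-title>A hypothetical title</article-title>. '
    '<source>J Example</source>. <year>2010</year>;<volume>1</volume>:'
    '<fpage>1</fpage>-<lpage>9</lpage>.'
    '</mixed-citation>'
)

def drop_pcdata(mixed):
    """Return an element-citation copy with inter-element text removed."""
    element = copy.deepcopy(mixed)
    element.tag = "element-citation"
    for node in element.iter():
        if len(node):
            node.text = None   # text before the first child of a container
        node.tail = None       # text that followed this element in its parent
    return element

citation = ET.fromstring(MIXED)
result = drop_pcdata(citation)
print(ET.tostring(result, encoding="unicode"))
```

Going the other way, from element-only tagging back to punctuated text, needs a style rule per publication type and per publisher style, which is the setup cost the slide refers to.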
PCDATA Correlations
All element-citation users drop list labels
Some mixed-citation users drop list labels
Publishers decide on boilerplate text on per-element basis, not global all or nothing
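These correlations can be checked against the Implementation Characteristics table with a small cross-tabulation. The Python sketch below transcribes the List Labels and Ref PCDATA columns from that table. It assumes that a Ref PCDATA value of DROP corresponds to element-citation-style tagging and KEEP to mixed-citation-style tagging, which the slides imply but do not state.

```python
from collections import Counter

# (organization, list labels, ref PCDATA), transcribed from the
# Implementation Characteristics table.
ROWS = [
    ("Publisher 1", "DROP", "DROP"), ("Publisher 2", "DROP", "DROP"),
    ("Publisher 3", "DROP", "KEEP"), ("Publisher 4", "DROP", "KEEP"),
    ("Publisher 5", "DROP", "KEEP"), ("Publisher 6", "DROP", "KEEP"),
    ("Publisher 7", "DROP", "DROP"), ("Publisher 8", "KEEP", "KEEP"),
    ("Publisher 9", "DROP", "DROP"), ("Publisher 10", "DROP", "DROP"),
    ("Publisher 11", "DROP", "DROP"), ("Publisher 12", "KEEP", "KEEP"),
    ("Publisher 13", "KEEP", "KEEP"), ("Publisher 14", "KEEP", "KEEP"),
    ("Publisher 15", "DROP", "DROP"), ("Publisher 16", "KEEP", "KEEP"),
    ("Publisher 17", "DROP", "KEEP"), ("Publisher 18", "DROP", "KEEP"),
    ("Publisher 19", "KEEP", "NA"), ("Publisher 20", "NA", "KEEP"),
    ("Publisher 21", "KEEP", "KEEP"), ("JATS-con", "DROP", "KEEP"),
    ("Supplier 1", "KEEP", "KEEP"), ("Supplier 2", "KEEP", "KEEP"),
    ("Supplier 3", "KEEP", "KEEP"),
]

# Count each (ref PCDATA, list labels) combination.
crosstab = Counter((pcdata, labels) for _, labels, pcdata in ROWS)
for (pcdata, labels), count in sorted(crosstab.items()):
    print(f"PCDATA {pcdata:4} / labels {labels:4}: {count}")

# Which list-label choices occur among organizations that drop PCDATA?
labels_when_pcdata_dropped = {labels for _, labels, pcdata in ROWS if pcdata == "DROP"}
print(labels_when_pcdata_dropped)
```

Under that assumption, every organization that drops reference PCDATA also drops list labels, while those that keep PCDATA are split between keeping and dropping labels.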
Math & Tables by Comp Application
Organization Composition Application Math Tables
Publisher 8 3B2 Graphic CALS
Publisher 21 3B2 MathML CALS
Publisher 6 3B2 TeX CALS
Supplier 2 3B2 TeX CALS
Publisher 1 3B2 MathML HTML
Supplier 1 3B2 MathML HTML
Publisher 5 3B2 & InDesign MathML HTML
Publisher 11 Antenna House MathML HTML
Publisher 4 Frame MathML CALS
Publisher 19 InDesign Graphic CALS
Publisher 2 InDesign Graphic HTML
Publisher 3 InDesign Graphic HTML
Publisher 15 InDesign Graphic HTML
Publisher 16 InDesign Graphic HTML
Publisher 18 InDesign Graphic HTML
Publisher 13 InDesign/Typefi Graphic CALS
Publisher 14 InDesign/Typefi Graphic CALS
Supplier 3 InDesign/Typefi MathML+graphic CALS
Publisher 7 InDesign/Typefi Graphic HTML
JATS-con NA MathML HTML
Publisher 20 NA NA NA
Publisher 17 PDF from Word Graphic HTML
Publisher 9 PDF from Word MathML HTML
Publisher 12 PDF from Word MathML HTML
Publisher 10 Ventura Graphic HTML
Table Markup
XHTML is default NLM DTD model
CALS requires DTD modification
  CALS has cell borders and table groups
  InDesign & Frame support CALS, but not XHTML tables
  3B2 users seem to prefer CALS tables
  Must be converted to XHTML for online delivery
Theory: publishers adopt CALS when more appropriate for PDF/print composition systems
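The CALS-to-XHTML conversion needed for online delivery can be sketched as a structural mapping: `tgroup` sections become table sections, `row` becomes `tr`, and `entry` becomes `th` or `td`. The Python below is a deliberately reduced illustration. It handles only the first `tgroup`, flattens cell content to text, uses un-namespaced element names for brevity, and ignores column specifications, spans, and borders, all of which a real conversion would have to handle. The function name `cals_to_xhtml` and the sample table are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical CALS table with one header row and one body row.
CALS = (
    '<table><tgroup cols="2">'
    '<colspec colname="c1"/><colspec colname="c2"/>'
    '<thead><row><entry>Organization</entry><entry>Tables</entry></row></thead>'
    '<tbody><row><entry>Publisher 4</entry><entry>CALS</entry></row></tbody>'
    '</tgroup></table>'
)

def cals_to_xhtml(cals_table):
    """Map the first tgroup of a CALS table onto XHTML table markup."""
    html = ET.Element("table")
    tgroup = cals_table.find("tgroup")
    if tgroup is None:
        return html
    for section in tgroup:
        if section.tag not in ("thead", "tbody", "tfoot"):
            continue  # colspec and spanspec carry layout hints only
        out_section = ET.SubElement(html, section.tag)
        cell_tag = "th" if section.tag == "thead" else "td"
        for row in section.findall("row"):
            tr = ET.SubElement(out_section, "tr")
            for entry in row.findall("entry"):
                cell = ET.SubElement(tr, cell_tag)
                cell.text = "".join(entry.itertext())
    return html

xhtml = cals_to_xhtml(ET.fromstring(CALS))
print(ET.tostring(xhtml, encoding="unicode"))
```

The parts left out here, especially border and span attributes, are the features that make CALS attractive for print composition in the first place.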
Math Markup
NLM DTD permits MathML, TeX, pointers to graphic files
MathML is native XML markup, but…
MathML has limited browser support
  Firefox is good; Safari is OK; IE has no MathML support
  Most publishers deliver online math as images
MathML has limited composition support
  InDesign does not have native MathML rendering
  3B2 native rendering is TeX
Math model driven by PDF creation requirements
Composition and Hosting
Organization Comp Application Comp Location Online PMC
Publisher 1 3B2 Outsource Self-hosted No
Publisher 2 InDesign In-House Self-hosted Yes
Publisher 3 InDesign In-House Self-hosted Yes
Publisher 4 Frame In-House Self-hosted No
Publisher 5 3B2 & InDesign Outsource Highwire Yes
Publisher 6 3B2 Outsource Highwire Yes
Publisher 7 InDesign/Typefi In-House Self-hosted Yes
Publisher 8 3B2 Outsource Self-hosted No
Publisher 9 PDF from Word In-House Self-hosted Yes
Publisher 10 Ventura In-House Self-hosted No
Publisher 11 Antenna House In-House Self-hosted Yes
Publisher 12 PDF from Word In-House Self-hosted Yes
Publisher 13 InDesign/Typefi In-House Highwire Yes
Publisher 14 InDesign/Typefi In-House Self-hosted No
Publisher 15 InDesign In-House Self-hosted No
Publisher 16 InDesign In-House Self-hosted No
Publisher 17 PDF from Word In-House Self-hosted Yes
Publisher 18 InDesign In-House Self-hosted Yes
Publisher 19 InDesign In-House Self-hosted No
Publisher 20 NA NA Self-hosted No
Publisher 21 3B2 In-House Self-hosted Some
JATS-con NA NA Self-hosted No
Supplier 1 3B2 Supplier Various Some
Supplier 2 3B2 Supplier Various No
Supplier 3 InDesign/Typefi Supplier Various No
Composition and Online Hosting
Majority of users
  Typeset in-house
  Self-host online version
PMC delivery requirement for half of users
However… this correlation may be significant only among organizations that have chosen to create XML in-house
Conclusions
NLM DTD flexibility led to broader adoption
  Application of DTD can be adjusted to meet needs of specific publishing requirements or tools
NLM DTD standard facilitates in-house XML implementation
  Eliminates R&D requirement to create a DTD
  Customizable off-the-shelf tools available
  Cost-effective solution for small and medium-size publishers
Questions?
Bruce Rosenblum
Inera Incorporated
+1 (617) 932-1932