Copyright 2010 Inera Incorporated. All Rights Reserved NLM DTD Flexibility: How and Why Applications of the NLM DTD Vary Presented by Bruce D. Rosenblum CEO Inera Incorporated Journal Article Tag Suite Conference, 1 November 2010

NLM DTD Flexibility: How and Why Applications of the NLM DTD Vary

Presented by

Bruce D. Rosenblum

CEO

Inera Incorporated

Journal Article Tag Suite Conference, 1 November 2010

Remember When…

Scholarly DTDs, Circa 2001

- ISO 12083
- Elsevier 1.1.0
- Elsevier 2.1.1
- Elsevier 3.0.0
- Elsevier 4.1
- Blackwell 2.2
- Blackwell 3.0
- Blackwell 4.0
- Keton
- Cadmus
- Capital City
- Charlesworth
- Alden
- Highwire 4.2.8
- PMC 1.0
- AIP
- UCP
- Wiley
- IEEE
- Nature
- BioOne
- U Chicago Press
- Cambridge University Press
- American Geophysical
- American Medical
- New England Journal
- American Chemical
- National Research Canada
- Academic Press
- Oxford University Press
- Springer
- Kluwer Academic

Scholarly DTDs 2010

- NLM DTD
- Elsevier DTD
- Springer DTD
- Wiley-Blackwell DTD
- And a few others…
- No longer a grand mess, but…
  - NLM DTD Suite applications vary
  - Specific tagging practices meet publisher-specific requirements

Data and Methodology

- Data from 25 eXtyles and refXpress implementations since 2003
- Not a scientific survey
- However, useful to show NLM DTD usage variations
- Supplier requirements differ from publishers
  - Serve multiple publishers who deliver to different platforms

NLM DTD Adoption By Year

| Organization | DTD | Year | Version | Prior XML |
|---|---|---|---|---|
| Publisher 1 | Archive * | 2003 | 3.0 † | No |
| Publisher 2 | Archive | 2005 | 2.0 | No |
| Publisher 3 | Archive | 2005 | 2.3 | No |
| Publisher 4 | Archive | 2006 | 2.2 | No |
| Publisher 5 | Archive | 2006 | 2.3 | Yes |
| Publisher 6 | Publish | 2006 | 2.3 | Yes |
| Publisher 7 | Publish & book | 2006 | 2.3 | No |
| Publisher 8 | Book * | 2007 | 2.2 | No |
| Publisher 9 | Publish | 2007 | 2.3 | No |
| Publisher 10 | Archive | 2007 | 2.3 | No |
| Publisher 11 | Publish | 2007 | 2.3 | No |
| Publisher 12 | Publish | 2007 | 2.3 | No |
| Publisher 13 | Publish | 2008 | 2.3 | Yes |
| Publisher 14 | Publish & book | 2008 | 2.3 | No |
| Publisher 15 | Publish | 2008 | 2.3 | No |
| Publisher 16 | Book | 2009 | 2.3 | No |
| Publisher 17 | Publish | 2009 | 2.3 | No |
| Publisher 18 | Publish | 2010 | 2.3 | No |
| Publisher 19 | Publish * | 2010 | 3.0 | No |
| Publisher 20 | Archive | 2010 | 3.0 | No |
| Publisher 21 | Publish * | 2010 | 3.0 | Yes |
| JATS-con | Authoring | 2010 | 3.0 | Yes |
| Supplier 1 | Publish | 2008 | 2.3 | Yes |
| Supplier 2 | Publish | 2007 | 2.3 | No |
| Supplier 3 | Book | 2010 | 3.0 | Yes |

* Customized version of DTD beyond OASIS-CALS addition

† Upgraded from 1.0 to 3.0 in 2010

Year of DTD Adoption

- Few implementations prior to 2006
  - Mostly related to PMC deposit
- Adoption rate grows in 2006 and later
  - Maturity of version 2.0 in August 2004
  - Greater public awareness by 2006
    - Freely available and modifiable
    - Flexible
    - Not just for life science content
  - More off-the-shelf tool support from NCBI and others
- 3.0 upgrade not automatic; not fully backwards compatible

Prior Markup Experience

- Most had not used full-text XML or SGML
  - Driven to NLM DTD for:
    - More modern XML-based workflow
    - Desire for full-text to drive HTML and archive needs
    - PMC deposit
- Those with SGML experience
  - SGML to XML conversion choice
    - Convert existing DTD to XML
    - Adopt NLM DTD

DTD Selection

- Most adopters use Journal Publishing (blue) DTD
- Early adopters chose Archive and Interchange (green) DTD
  - Blue was too restrictive prior to 2.0
  - ISSN optional in green; hosts non-serial publications without modification
- Book DTD use growing in recent years
  - Not as mature as journals, but useful

Implementation Characteristics

| Organization | Char Encoding | Math | Tables | List Labels | Ref PCDATA |
|---|---|---|---|---|---|
| Publisher 1 | ISO | MathML | HTML | DROP | DROP |
| Publisher 2 | ISO | Graphic | HTML | DROP | DROP |
| Publisher 3 | Unicode | Graphic | HTML | DROP | KEEP |
| Publisher 4 | ISO | MathML | CALS | DROP | KEEP |
| Publisher 5 | ISO | MathML | HTML | DROP | KEEP |
| Publisher 6 | ISO | TeX | CALS | DROP | KEEP |
| Publisher 7 | Unicode | Graphic | HTML | DROP | DROP |
| Publisher 8 | ISO | Graphic | CALS | KEEP | KEEP |
| Publisher 9 | ISO | MathML | HTML | DROP | DROP |
| Publisher 10 | Unicode | Graphic | HTML | DROP | DROP |
| Publisher 11 | Unicode | MathML | HTML | DROP | DROP |
| Publisher 12 | Unicode | MathML | HTML | KEEP | KEEP |
| Publisher 13 | Unicode | Graphic | CALS | KEEP | KEEP |
| Publisher 14 | Unicode | Graphic | CALS | KEEP | KEEP |
| Publisher 15 | Unicode | Graphic | HTML | DROP | DROP |
| Publisher 16 | Unicode | Graphic | HTML | KEEP | KEEP |
| Publisher 17 | Unicode | Graphic | HTML | DROP | KEEP |
| Publisher 18 | Unicode | Graphic | HTML | DROP | KEEP |
| Publisher 19 | Unicode | Graphic | CALS | KEEP | NA |
| Publisher 20 | Unicode | NA | NA | NA | KEEP |
| Publisher 21 | Unicode | MathML | CALS | KEEP | KEEP |
| JATS-con | Unspecified | MathML | HTML | DROP | KEEP |
| Supplier 1 | Unicode | MathML | HTML | KEEP | KEEP |
| Supplier 2 | ISO | TeX | CALS | KEEP | KEEP |
| Supplier 3 | Unicode | MathML+graphic | CALS | KEEP | KEEP |

Character Encoding

- Most implementations use Unicode entities (e.g., &#x03B2;)
  - Quasi-human readable (unlike UTF-8)
- Some use ISO entities (e.g., &beta;)
  - Most human-readable
  - But transform required for HTML
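
The transform mentioned for ISO entities can be sketched roughly as follows. This is only an illustration, not a description of any publisher's actual pipeline: it assumes Python's `html.entities.name2codepoint` table, which covers the HTML 4 named entities and therefore only a subset of the ISO entity sets, and the function name `named_to_numeric` is hypothetical.

```python
import re
from html.entities import name2codepoint

# The entities predefined by XML itself are left untouched.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

def named_to_numeric(text):
    """Replace named entities such as &beta; with hex references such as &#x03B2;."""
    def swap(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)  # leave predefined or unknown entities as they are
        return "&#x{:04X};".format(name2codepoint[name])
    return re.sub(r"&([A-Za-z][A-Za-z0-9]*);", swap, text)

print(named_to_numeric("<p>&beta;-carotene &amp; &alpha;-tocopherol</p>"))
```

A production conversion would need a mapping for the full ISO entity sets rather than the HTML subset assumed here.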

Generated and Boilerplate Text

- Generated Text: Inconsequential, formulaic, or stereotypical text, punctuation, and formatting omitted from an XML file, which is applied to content by a style sheet when an XML file is rendered
- Boilerplate Text: Inconsequential, formulaic, or stereotypical text, punctuation, and formatting that could have been omitted but which the publisher has chosen to keep in the XML file rather than to generate with a style sheet

NLM DTD Structure

- NLM DTD is flexible
  - Permits generated or boilerplate text
- Degree varies by tag set
  - Green DTD allows greatest degree of Boilerplate Text
  - Includes the <x> element
- Hypothesis: Flexibility of generated versus boilerplate text increased NLM DTD adoption
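
To make the generated versus boilerplate distinction concrete, here is a minimal sketch using a keyword group. The sample keywords are invented, and the placement of `<x>` inside `<kwd-group>` is assumed to match the green DTD's content model; check the tag library before relying on it. The two rendering functions are hypothetical stand-ins for a style sheet.

```python
import xml.etree.ElementTree as ET

# Boilerplate style: separators are stored in the file inside <x> elements.
BOILERPLATE = (
    "<kwd-group><kwd>XML</kwd><x>; </x><kwd>DTD</kwd><x>; </x>"
    "<kwd>publishing</kwd></kwd-group>"
)
# Generated style: only the keywords themselves are stored.
GENERATED = "<kwd-group><kwd>XML</kwd><kwd>DTD</kwd><kwd>publishing</kwd></kwd-group>"

def render_boilerplate(xml_text):
    """Concatenate all text in document order; the file already holds the punctuation."""
    return "".join(ET.fromstring(xml_text).itertext())

def render_generated(xml_text, separator="; "):
    """Supply the separator at render time, as a style sheet would."""
    return separator.join(kwd.text for kwd in ET.fromstring(xml_text).iter("kwd"))

print(render_boilerplate(BOILERPLATE))
print(render_generated(GENERATED))
```

Both calls produce the same rendered string; the difference is only where the punctuation lives.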

List Labels

- List-type attribute carries format information
- Most publishers don’t keep list label
  - Possibly because HTML excludes list label
- Books are an exception
- List label useful for discontinuous lists (e.g., items 1 to 4, intervening text, then items 5 to 8)
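
The discontinuous-list case can be illustrated with a small sketch. The sample content is invented, and `labels_generated` is a deliberately naive stand-in for a style sheet that numbers each `<list>` from 1 using only `list-type`.

```python
import xml.etree.ElementTree as ET

# A numbered list interrupted by running text, then continued (hypothetical content).
SECTION = """<sec>
<list list-type="order">
<list-item><label>1.</label><p>Collect</p></list-item>
<list-item><label>2.</label><p>Tag</p></list-item>
</list>
<p>Intervening text.</p>
<list list-type="order">
<list-item><label>3.</label><p>Validate</p></list-item>
</list>
</sec>"""

def labels_kept(sec):
    """KEEP approach: read the labels stored in the XML."""
    return [item.findtext("label") for item in sec.iter("list-item")]

def labels_generated(sec):
    """DROP approach: ignore stored labels and number each list from 1."""
    labels = []
    for lst in sec.iter("list"):
        for n, _ in enumerate(lst.findall("list-item"), start=1):
            labels.append(f"{n}.")
    return labels

sec = ET.fromstring(SECTION)
print(labels_kept(sec))
print(labels_generated(sec))
```

With generated labels the second list restarts at 1, which is the problem stored labels avoid.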

Early Reference Models

- Versions 1.0 through 2.3 had the <citation> and <nlm-citation> elements
  - <citation> allowed PCDATA and any element order
  - <nlm-citation> allowed only elements in prescribed order
- No way to restrict PCDATA without enforcing element order
- Problematic when mixing parsed and unparsed references (e.g., gray literature)

Reference Tagging 3.0

- <mixed-citation> and <element-citation>
  - Former allows PCDATA
  - Latter allows only semantic elements
  - Neither prescribes order
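
The difference between the two 3.0 models can be checked mechanically. The sketch below uses an invented reference, and `stray_text` is a hypothetical helper that reports any text sitting directly between a citation's child elements.

```python
import xml.etree.ElementTree as ET

MIXED = (
    '<mixed-citation publication-type="journal">'
    "<string-name><surname>Doe</surname> <given-names>J</given-names></string-name>. "
    "<source>J Ex</source>. <year>2009</year>;<volume>12</volume>:"
    "<fpage>34</fpage>.</mixed-citation>"
)
ELEMENT = (
    '<element-citation publication-type="journal">'
    "<name><surname>Doe</surname><given-names>J</given-names></name>"
    "<source>J Ex</source><year>2009</year><volume>12</volume>"
    "<fpage>34</fpage></element-citation>"
)

def stray_text(citation):
    """Return non-whitespace text found directly between the citation's children."""
    pieces = [citation.text or ""] + [child.tail or "" for child in citation]
    return [p.strip() for p in pieces if p.strip()]

print(stray_text(ET.fromstring(MIXED)))
print(stray_text(ET.fromstring(ELEMENT)))
```

The mixed citation carries its punctuation as PCDATA; the element citation yields an empty list, so a style sheet must supply every separator.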

Reference Tagging

- Most publishers keep PCDATA
- All suppliers keep PCDATA
- Reasons
  - Less style sheet setup (PDF, HTML, etc.)
  - PCDATA can easily be dropped
  - Suppliers: multiple publisher styles require less setup
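
As a rough illustration of why PCDATA "can easily be dropped", the sketch below strips the text between elements in an invented mixed citation. The helper `drop_pcdata` is hypothetical, it does not rename the element, and it makes no claim that the result is valid against any particular DTD.

```python
import xml.etree.ElementTree as ET

MIXED = (
    '<mixed-citation publication-type="journal">'
    "<string-name><surname>Doe</surname> <given-names>J</given-names></string-name>. "
    "<source>J Ex</source>. <year>2009</year>;<volume>12</volume>:"
    "<fpage>34</fpage>.</mixed-citation>"
)

def drop_pcdata(citation):
    """Remove text between elements, keeping the text inside leaf elements."""
    for node in citation.iter():
        if len(node):             # element with children: its own text is separator text
            node.text = None
        if node is not citation:
            node.tail = None      # text that follows the element inside its parent
    return citation

stripped = drop_pcdata(ET.fromstring(MIXED))
print(ET.tostring(stripped, encoding="unicode"))
```

Note the limit of this approach: any untagged text that is real content rather than punctuation, as in a partly parsed gray-literature reference, would be discarded too.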

PCDATA Correlations

- All element-citation users drop list labels
- Some mixed-citation users drop list labels
- Publishers decide on boilerplate text on per-element basis, not global all or nothing
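
These correlations can be recomputed from the List Labels and Ref PCDATA columns of the Implementation Characteristics table. The cross-tabulation below is a sketch; it treats a Ref PCDATA value of DROP as element-citation-style tagging and KEEP as mixed-citation-style tagging, which is an interpretation of the slide rather than something the table states.

```python
from collections import Counter

# (organization, list labels, reference PCDATA), transcribed from the
# Implementation Characteristics table.
ROWS = [
    ("Publisher 1", "DROP", "DROP"), ("Publisher 2", "DROP", "DROP"),
    ("Publisher 3", "DROP", "KEEP"), ("Publisher 4", "DROP", "KEEP"),
    ("Publisher 5", "DROP", "KEEP"), ("Publisher 6", "DROP", "KEEP"),
    ("Publisher 7", "DROP", "DROP"), ("Publisher 8", "KEEP", "KEEP"),
    ("Publisher 9", "DROP", "DROP"), ("Publisher 10", "DROP", "DROP"),
    ("Publisher 11", "DROP", "DROP"), ("Publisher 12", "KEEP", "KEEP"),
    ("Publisher 13", "KEEP", "KEEP"), ("Publisher 14", "KEEP", "KEEP"),
    ("Publisher 15", "DROP", "DROP"), ("Publisher 16", "KEEP", "KEEP"),
    ("Publisher 17", "DROP", "KEEP"), ("Publisher 18", "DROP", "KEEP"),
    ("Publisher 19", "KEEP", "NA"), ("Publisher 20", "NA", "KEEP"),
    ("Publisher 21", "KEEP", "KEEP"), ("JATS-con", "DROP", "KEEP"),
    ("Supplier 1", "KEEP", "KEEP"), ("Supplier 2", "KEEP", "KEEP"),
    ("Supplier 3", "KEEP", "KEEP"),
]

# Count each (list labels, ref PCDATA) combination.
crosstab = Counter((labels, pcdata) for _, labels, pcdata in ROWS)

for (labels, pcdata), count in sorted(crosstab.items()):
    print(f"list labels {labels:4} / ref PCDATA {pcdata:4}: {count}")
```

No organization keeps list labels while dropping reference PCDATA, which is consistent with the first bullet.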

Math & Tables by Comp Application

| Organization | Composition Application | Math | Tables |
|---|---|---|---|
| Publisher 8 | 3B2 | Graphic | CALS |
| Publisher 21 | 3B2 | MathML | CALS |
| Publisher 6 | 3B2 | TeX | CALS |
| Supplier 2 | 3B2 | TeX | CALS |
| Publisher 1 | 3B2 | MathML | HTML |
| Supplier 1 | 3B2 | MathML | HTML |
| Publisher 5 | 3B2 & InDesign | MathML | HTML |
| Publisher 11 | Antenna House | MathML | HTML |
| Publisher 4 | Frame | MathML | CALS |
| Publisher 19 | InDesign | Graphic | CALS |
| Publisher 2 | InDesign | Graphic | HTML |
| Publisher 3 | InDesign | Graphic | HTML |
| Publisher 15 | InDesign | Graphic | HTML |
| Publisher 16 | InDesign | Graphic | HTML |
| Publisher 18 | InDesign | Graphic | HTML |
| Publisher 13 | InDesign/Typefi | Graphic | CALS |
| Publisher 14 | InDesign/Typefi | Graphic | CALS |
| Supplier 3 | InDesign/Typefi | MathML+graphic | CALS |
| Publisher 7 | InDesign/Typefi | Graphic | HTML |
| JATS-con | NA | MathML | HTML |
| Publisher 20 | NA | NA | NA |
| Publisher 17 | PDF from Word | Graphic | HTML |
| Publisher 9 | PDF from Word | MathML | HTML |
| Publisher 12 | PDF from Word | MathML | HTML |
| Publisher 10 | Ventura | Graphic | HTML |

Table Markup

- XHTML is default NLM DTD model
- CALS requires DTD modification
  - CALS has cell borders and table groups
  - InDesign & Frame support CALS, but not XHTML tables
  - 3B2 users seem to prefer CALS tables
  - Must be converted to XHTML for online delivery
- Theory: publishers adopt CALS when more appropriate for PDF/print composition systems
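
The CALS-to-XHTML conversion needed for online delivery might look roughly like the sketch below. It is a simplification under several assumptions: element names are unprefixed (a customized NLM DTD may namespace the CALS elements), only the first `<tgroup>` is handled, cells contain plain text only, and borders and alignment are ignored. The function name `cals_to_xhtml` and the sample table are hypothetical.

```python
import xml.etree.ElementTree as ET

CALS = """<table frame="all"><tgroup cols="2">
<colspec colname="c1"/><colspec colname="c2"/>
<thead><row><entry>Organization</entry><entry>Tables</entry></row></thead>
<tbody>
<row><entry namest="c1" nameend="c2">Publishers</entry></row>
<row><entry>Publisher 4</entry><entry>CALS</entry></row>
</tbody></tgroup></table>"""

def cals_to_xhtml(cals):
    """Map tgroup/thead/tbody/row/entry onto table/thead/tbody/tr/th|td."""
    tgroup = cals.find("tgroup")
    # Map column names to positions so namest/nameend spans become colspan.
    cols = {c.get("colname"): i for i, c in enumerate(tgroup.findall("colspec"))}
    table = ET.Element("table")
    for part, cell_tag in (("thead", "th"), ("tbody", "td")):
        src = tgroup.find(part)
        if src is None:
            continue
        dst = ET.SubElement(table, part)
        for row in src.findall("row"):
            tr = ET.SubElement(dst, "tr")
            for entry in row.findall("entry"):
                cell = ET.SubElement(tr, cell_tag)
                cell.text = entry.text
                if entry.get("namest") and entry.get("nameend"):
                    span = cols[entry.get("nameend")] - cols[entry.get("namest")] + 1
                    cell.set("colspan", str(span))
                if entry.get("morerows"):
                    # CALS counts additional rows; XHTML counts total rows.
                    cell.set("rowspan", str(int(entry.get("morerows")) + 1))
    return table

print(ET.tostring(cals_to_xhtml(ET.fromstring(CALS)), encoding="unicode"))
```

Even this reduced mapping shows why the step cannot be skipped: spans are expressed by column name in CALS and by count in XHTML.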

Math Markup

- NLM DTD permits MathML, TeX, pointers to graphic files
- MathML is native XML markup, but…
  - MathML has limited browser support
    - Firefox is good; Safari is OK; IE has no MathML support
    - Most publishers deliver online math as images
  - MathML has limited composition support
    - InDesign does not have native MathML rendering
    - 3B2 native rendering is TeX
- Math model driven by PDF creation requirements
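
The three math options can be told apart by the elements a formula contains. The sketch below is illustrative only: the sample formulas and the file name `eq1.gif` are invented, and `math_models` is a hypothetical helper, not part of any NLM tooling.

```python
import xml.etree.ElementTree as ET

MML_MATH = "{http://www.w3.org/1998/Math/MathML}math"

def math_models(formula):
    """Report which math representations a formula element contains."""
    found = []
    if formula.find(f".//{MML_MATH}") is not None:
        found.append("MathML")
    if formula.find(".//tex-math") is not None:
        found.append("TeX")
    if any(formula.find(f".//{tag}") is not None for tag in ("graphic", "inline-graphic")):
        found.append("Graphic")
    return "+".join(found) or "Unknown"

SAMPLES = [
    '<inline-formula xmlns:mml="http://www.w3.org/1998/Math/MathML">'
    "<mml:math><mml:msup><mml:mi>x</mml:mi><mml:mn>2</mml:mn></mml:msup></mml:math>"
    "</inline-formula>",
    "<inline-formula><tex-math>x^2</tex-math></inline-formula>",
    '<inline-formula xmlns:xlink="http://www.w3.org/1999/xlink">'
    '<inline-graphic xlink:href="eq1.gif"/></inline-formula>',  # hypothetical file name
]

for sample in SAMPLES:
    print(math_models(ET.fromstring(sample)))
```

A formula that carries both MathML and an image fallback would be reported as "MathML+Graphic", similar to the combined entry in the tables.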

Composition and Hosting

| Organization | Comp Application | Comp Location | Online | PMC |
|---|---|---|---|---|
| Publisher 1 | 3B2 | Outsource | Self-hosted | No |
| Publisher 2 | InDesign | In-House | Self-hosted | Yes |
| Publisher 3 | InDesign | In-House | Self-hosted | Yes |
| Publisher 4 | Frame | In-House | Self-hosted | No |
| Publisher 5 | 3B2 & InDesign | Outsource | Highwire | Yes |
| Publisher 6 | 3B2 | Outsource | Highwire | Yes |
| Publisher 7 | InDesign/Typefi | In-House | Self-hosted | Yes |
| Publisher 8 | 3B2 | Outsource | Self-hosted | No |
| Publisher 9 | PDF from Word | In-House | Self-hosted | Yes |
| Publisher 10 | Ventura | In-House | Self-hosted | No |
| Publisher 11 | Antenna House | In-House | Self-hosted | Yes |
| Publisher 12 | PDF from Word | In-House | Self-hosted | Yes |
| Publisher 13 | InDesign/Typefi | In-House | Highwire | Yes |
| Publisher 14 | InDesign/Typefi | In-House | Self-hosted | No |
| Publisher 15 | InDesign | In-House | Self-hosted | No |
| Publisher 16 | InDesign | In-House | Self-hosted | No |
| Publisher 17 | PDF from Word | In-House | Self-hosted | Yes |
| Publisher 18 | InDesign | In-House | Self-hosted | Yes |
| Publisher 19 | InDesign | In-House | Self-hosted | No |
| Publisher 20 | NA | NA | Self-hosted | No |
| Publisher 21 | 3B2 | In-House | Self-hosted | Some |
| JATS-con | NA | NA | Self-hosted | No |
| Supplier 1 | 3B2 | Supplier | Various | Some |
| Supplier 2 | 3B2 | Supplier | Various | No |
| Supplier 3 | InDesign/Typefi | Supplier | Various | No |

Composition and Online Hosting

- Majority of users
  - Typeset in-house
  - Self-host online version
- PMC delivery requirement for half of users
- However… this correlation may be significant only among organizations that have chosen to create XML in-house

Conclusions

- NLM DTD flexibility led to broader adoption
  - Application of DTD can be adjusted to meet needs of specific publishing requirements or tools
- NLM DTD standard facilitates in-house XML implementation
  - Eliminates R&D requirement to create a DTD
  - Customizable off-the-shelf tools available
  - Cost-effective solution for small and medium-size publishers

Questions?

Bruce Rosenblum
Inera Incorporated
+1 (617) 932-1932

[email protected]