The$Long$Road$to$JATS$ - Nature Researchdata.nature.com/downloads/docs/The-long-road-to-JATS.pdf ·...

Preview:

Citation preview

The  Long  Road  to  JATS  

Paul  Donohoe  –  Senior  XML  Developer  Jenny  Sherman  –  XML  Developer  Ashwin  Mistry    -­‐  XML  Developer  

 Macmillan  Science  and  EducaBon  

 

In  the  beginning  …  

•  over  half  a  million  arBcles  (increased  to  a  million  by  the  Bme  the  project  started)  

•  143  journals  (increased  to  180)  •  three  in-­‐house  DTDs  •  five  typesePers  •  numerous  workflows  and  producBon  systems  •  teams  based  across  the  world  

Mapping  process  

Mapping  to  JATS  –  the  value  of  examining  the  content    <!ELEMENT  artpubdt  (#PCDATA)>  <!-­‐-­‐  Date  of  iniBal  online  arBcle  publicaBon  under  "conBnuous  publishing"  program.  -­‐-­‐>    Feasibility  report  mapping    <artpubdt>19991125</artpubdt>      Maps  to:      <pub-­‐date  pub-­‐type="epreprint"  year="1999"  month="11"  day="25"/>  

Mapping  to  JATS  –  the  value  of  examining  the  content    <!ELEMENT  artpubdt  (#PCDATA)>  <!-­‐-­‐  Date  of  iniBal  online  arBcle  publicaBon  under  "conBnuous  publishing"  program.  -­‐-­‐>    Content-­‐based  mapping    <artpubdt>19991125</artpubdt>      Maps  to:    Nothing.    Used  for  only  10  arBcles  in  one  supplement.    

StandardizaBon  to  JATS  

List  type  mapping  from  JATS  to  AJ  or  NPG      

AJ   JATS   NPG  

Numbers   1      2      3   1      2      3   1      2      3  

Numbers  with  parens   1)    2)    3)   1)    2)    3)  

LePers   a      b      c   a      b      c   a      b      c  

LePers  with  parens   a)    b)    c)   a)    b)    c)  

Uppercase  lePers   A      B      C   A      B      C  

Roman  numerals   i      ii      iii   i      ii      iii   i      ii      iii  

Roman  numerals  with  parens   i)    ii)    iii)   i)    ii)    iii)  

Uppercase  Roman  numerals   I      II      III   I      II      III  

Bullets   •      •      •   •      •      •   •      •      •  

Project  kickoff  

External  needs  

Internal  needs  

ExisBng  workflow  

Staging  Live  

ExisBng  to  final  desired  workflow  

Staging  Live  

Content  Hub  

Suggested  interim  workflow  

Staging  Live  

Staging  

Current  hybrid  workflow  

Staging  

Live  

OpportuniBes  

OpportuniBes  

npg:baseUriTemplate  "{journal-­‐id}/{year}/{arBcle-­‐id}/"  ;  

Scrum  Development  Process  

(Re)Write a test  

Does the test fail?  

(Re)Write production

code  

Run all tests

Clean up code

Test Succeeds  

Test Fails Tests Fail

All Tests Succeed

Repeat

Test  Driven  Development  

Unit  TesBng  

   

MarkLogic  

Content Ingestion Service  

Content Hub API  

Asset Service  

Triple-store  

Content Gateway  

MongoDB  

Transformation Service  

Validation Service  

FTP Hot

Folder  

New Publishing Platform

incl

Article Rendering

Search  

Product Set Up tool  

MySQL  

Content Work Flow

Tool

File System  

Ontologies

Content  Hub  Architecture  

Core  Ontology  

Content  Hub  { "article": { "id": "nplants20151", "titleXml": "<article-title>Plant hormones: On-the-spot reporting</article-title>", ... "hasContributor": { "id": "rainer-waadt-nplants20151", "type": "contributors", "name": "Rainer Waadt", "isCorresponding": true }, ... "hasSummary": { "bodyXml": "<abstract><p>The development of a new ... </p></abstract>", "hasSummaryType": {"id": "standfirst"} }, ... "hasFigure": { "id": "nplants20151-f1", "captionXml": "<p><bold>a</bold>, Expression-based reporters ...</p>", "titleXml": “<title>The toolbox of plant hormone reporters.<title>", "hasImageAsset": {"id": "nplants20151-f1.jpg"} } } }

The  Big  Picture  

New  Publishing  Plalorm  

New  Search  

Other  benefits  

ConBnuous  Modelling  

Four  DTDs  

Next  Steps  

•  Single  environment  publishing  •  Data  sends  for  third  parBes  •  Archive  conversion  

Lessons  Learned  

•  Don’t  start  from  here!  •  Do  not  rely  on  DTD  to  DTD  mappings  •  Agile  development  methodology  for  tool  development  

Recommended