Advancing the International Plant Names Index (IPNI)
Nicky Nicolson, Alan Paton, Jim Croft, James Macklin, Paul Morris, Greg Whitbread, Kanchi Gandhi
Advancing IPNI
• Current - where IPNI is now
• Issues
• Future - where we’d like to go and how to get there
What data?
• What data types:
  – ICBN governed nomenclatural acts
  – Standardised author list
  – Publications
• Which groups:
  – Vascular plants
• Which ranks:
  – Family and below
How is data entered?
• Data entry:
  – From literature scanning, journals received by library at Kew, Harvard, Canberra (2 years - 95%)
  – User reports of missing nomenclatural acts, usually accompanied by a link to digitised literature page (BHL)
• How many?
  – About 7400 names entered in average year
  – About 6100 nomenclatural acts published / year
  – … of these about 2800 are tax. novs.
How is data managed?
• Full audit history on core objects – names / authors / publications.
• Average 300,000 edits on name records / year
• Standardisation effort ongoing:
  – Epithet
  – Author citation
  – Publication title
  – Collation
  – Year
Standardisation – author and title
[Chart: Author and Title standardization. Series: standardized author citations, standardized publication title. Y-axis: 30% to 90%.]
Standardisation – epithet updates
[Chart: epithet updates over time. X-axis: months from 2006-01 to 2011-05. Y-axis: 0 to 5000.]
Standardisation of epithets
• Why important
  – Main search criterion
  – Improving epithets enables other improvements in dataset e.g.:
    • basionym linkage
    • de-duplication
  – Errors propagate
Rhus keamcyi was an OCR error for Rhus kearneyi but the incorrect value persists in datasets derived from IPNI
Statistics
• Dataset can be used for trends analysis:
  – Publication rates
  – Combination rates
  – Author collaborations
• Audit history used to determine changes in data-set over time
http://www.ipni.org/stats.html
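A trend such as publication rate comes down to counting name records per year. A minimal sketch, assuming each record is reduced to a (name, publication year) pair; the records and the function name `names_per_year` are made up for illustration and are not IPNI data or IPNI code:

```python
from collections import Counter

# Hypothetical (name, publication year) pairs standing in for name records.
records = [
    ("Aus bus", 2009),
    ("Aus cus", 2010),
    ("Aus dus", 2010),
]

def names_per_year(records):
    """Count how many names were published in each year, ordered by year."""
    counts = Counter(year for _, year in records)
    return dict(sorted(counts.items()))

print(names_per_year(records))
```

The same grouping applied to dated audit-history entries would give the number of edits per period.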
As well as the data…
• IPNI editors respond to user queries about the data, dealing with c. 50 cases / month
• Includes an expert service re interpretation of ICBN
• Can provide worked examples illustrating particular articles of the code
Why should anyone care?
• c. 55,000 searches / day
BUT
• dataset is not being used to full advantage
• inputs not being handled efficiently:
  – limited to partnership
  – missing out on community input
• expertise is hidden
Future
• Increase efficiency of input
  – provision of core data
  – annotating and linking existing data
  – solving nomenclatural problems
• Increase output
  – usage of IPNI data
  – benefit from on-going curation effort
  – benefit from nomenclatural expertise
Data in - contributor services
• Pre-publication data entry
• Batch submission of datasets
• Annotation
• Addition of links within dataset
• Facilitate interpretation of nomenclatural issues
• Accreditation – credit for helping improve the data
Pre-publication data entry
• Workflow currently being trialled
  – Author or publisher submits data to IPNI once article has been accepted for publication
  – Generated record suppressed until publication effective under the code
  – But this is not yet automated!
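The suppression rule could be automated along these lines. This is only a sketch under stated assumptions: the record shape, the field names and the `is_visible` check are hypothetical, not IPNI's schema or workflow, and the name and dates are placeholders.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class PrePublicationRecord:
    """Hypothetical record shape; not IPNI's actual schema."""
    name: str
    accepted_on: date
    # Unknown until the article is effectively published under the code.
    effective_publication: Optional[date] = None

    def is_visible(self, today: date) -> bool:
        # Suppressed until an effective publication date is recorded
        # and that date has been reached.
        return (
            self.effective_publication is not None
            and self.effective_publication <= today
        )

# Placeholder name and dates, for illustration only.
record = PrePublicationRecord("Aus bus", accepted_on=date(2011, 1, 10))
print(record.is_visible(date(2011, 2, 1)))   # no publication date yet

record.effective_publication = date(2011, 3, 1)
print(record.is_visible(date(2011, 3, 1)))   # publication date reached
```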
Electronic Publication Example - PhytoKeys
A nomenclator of Pacific oceanic island Phyllanthus (Phyllanthaceae), including Glochidion
Warren L. Wagner, David H. Lorence
• 5. Phyllanthus atalotrichus (A.C. Sm.) W.L. Wagner & Lorence, comb. nov.
urn:lsid:ipni.org:names:77112693-1
PhytoKeys 4: 67–94 (2011)
doi: 10.3897/phytokeys.4.1581
www.phytokeys.com
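The identifier in this example is an LSID, which has the general form urn:lsid:authority:namespace:object with an optional revision. A minimal sketch of splitting it into parts; `parse_lsid` and the `Lsid` tuple are illustrative names, not part of any IPNI tool:

```python
from typing import NamedTuple, Optional

class Lsid(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str]

def parse_lsid(text: str) -> Lsid:
    """Split an LSID of the form urn:lsid:authority:namespace:object[:revision]."""
    parts = text.strip().split(":")
    if (
        len(parts) not in (5, 6)
        or parts[0].lower() != "urn"
        or parts[1].lower() != "lsid"
    ):
        raise ValueError(f"not an LSID: {text!r}")
    revision = parts[5] if len(parts) == 6 else None
    return Lsid(parts[2], parts[3], parts[4], revision)

lsid = parse_lsid("urn:lsid:ipni.org:names:77112693-1")
print(lsid.authority, lsid.namespace, lsid.object_id)
# prints: ipni.org names 77112693-1
```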
Pre-publication issues
• Name squatting – mitigated by only entering names which are in papers accepted for publication
• Curation of record throughout publication process
• Electronic and effective publication – before this the record will not be visible
• IPNI editors provide visible expert service re validity of name
Where IPNI data are placed
Any name occurrence: e.g. specimens, reports, literature citation
concepts
Standard form of name
Data out - links
• To concept layer:
  – embed IPNI identifiers
  – storage of factual concepts / links to concept layer
• To name occurrence layer:
  – seed lexical reconciliation projects (e.g. GNI)
• To allied information:
  – literature
  – types
Links to concept layer
Embed IPNI identifiers in externally held names lists
• IPNI holds curated name data, labelled with persistent identifiers.
• Need a tool to seed IPNI identifiers into datasets (in prototype)
• Can devolve curation of name elements in other systems to IPNI
Benefit from on-going curation:
• 300,000 edits per year
Report on changes in name list since date
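Seeding identifiers and reporting changes since a date could look roughly like this. It is a sketch only: the in-memory `curated` table, the identifiers and the function names are hypothetical stand-ins, not the prototype tool or an IPNI API.

```python
from datetime import date

# Hypothetical curated records: identifier -> (current name, date last edited).
curated = {
    "id-1": ("Aus bus", date(2010, 3, 1)),
    "id-2": ("Aus cus", date(2011, 2, 15)),
}

def seed_identifiers(names, curated):
    """Attach an identifier to each external name that matches exactly."""
    by_name = {name: ident for ident, (name, _) in curated.items()}
    return {name: by_name.get(name) for name in names}

def changed_since(identifiers, curated, since):
    """List the held identifiers whose curated record was edited on or after `since`."""
    return sorted(
        ident
        for ident in identifiers
        if ident in curated and curated[ident][1] >= since
    )

external_list = ["Aus bus", "Aus cus", "Aus dus"]
seeded = seed_identifiers(external_list, curated)
held = [ident for ident in seeded.values() if ident is not None]
print(seeded)
print(changed_since(held, curated, date(2011, 1, 1)))
```

Once a list holds identifiers instead of name strings, the change report is a lookup on the identifier, so later corrections to spelling or author citation need no re-matching.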
Links to the Concept Layer
Example: The Plant List
Link to name occurrence layer
• IPNI’s version history can be used to seed lexical reconciliation projects (GNI), e.g.:
  – Plectranthus macrophylius -> Plectranthus macrophyllus
• These editorialised translations are of higher value than programmatically derived operations of the same edit distance, e.g.:
  – Plectranthus microphyllus -> Plectranthus macrophyllus
• Standardisation tools and techniques opened up for use in allied projects
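The two Plectranthus pairs above can be checked with a standard Levenshtein distance: both differ by a single substitution, so distance alone cannot separate a genuine correction from a false merge. A minimal sketch; the `curated_corrections` table and `reconcile` function are illustrative, not GNI or IPNI code:

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]

# Illustrative table seeded from an editor's correction in the version history.
curated_corrections = {
    "Plectranthus macrophylius": "Plectranthus macrophyllus",
}

def reconcile(name: str) -> str:
    """Apply only corrections that an editor actually made."""
    return curated_corrections.get(name, name)

print(edit_distance("Plectranthus macrophylius", "Plectranthus macrophyllus"))  # 1
print(edit_distance("Plectranthus microphyllus", "Plectranthus macrophyllus"))  # 1
print(reconcile("Plectranthus macrophylius"))
print(reconcile("Plectranthus microphyllus"))
```

The lookup rewrites only the spelling an editor corrected and leaves the other name untouched, even though both sit at the same distance from the target.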
Conclusion
• Facilitate electronic publication - pilot registration
• Foster larger community to support the data and automate workflows
• Stronger links between:
  – the people who produce names
  – the places where they are published
  – the downstream users
• Technical redevelopment