Talk presented at the Genomic Standards Consortium 15 conference.
The blessing and the curse: handshaking between general and specialist data repositories
Hilmar Lapp (NESCent), Todd Vision (UNC Chapel Hill)
GSC 15 Conference, Bethesda, MD
April 22-24, 2013
> 180 for biological sciences alone
Which data goes where? Which is required?
Addressing the long tail of orphan data
[Figure: volume plotted against rank frequency of datatype, with regions labeled "Specialized repositories (e.g. GenBank, GBIF)" and "Orphan data"]
After Heidorn (2008) http://hdl.handle.net/2142/9127
Many datasets belong to the long tail. Though less standardized, they can be rich in information content and have unique value
General purpose repositories cater to long-tail data
And that’s aside from the proverbial Babel of data formats.
Where does this leave the user?
Where to deposit what, and how?
[Form prompts shown: "Enter Publication:", "Please enter your publication:", "Publication:", "Enter Publication:"]
Metadata has to be provisioned redundantly
How to concisely link to the supporting data?
Given the article, how do I find the data?
Given a data record, how do I find related data?
How do I assess quality and fitness for purpose?
Lessons from Dryad/TreeBASE handshaking
• The End
  To make data archiving and reuse a standard part of scholarly communication.
• The Means
  Integrate data archiving with the process of publication.
  Make archiving easy and low burden for both authors and journals.
  Give researchers incentives to archive their data.
  Promote responsible data reuse.
  Empower journals, societies & publishers in shared governance.
  Ensure sustainability and long-term preservation.
  Work with and support trusted, specialized disciplinary repositories.
• The Scope
  Research data in sciences and medicine (early focus on evolution and ecology).
  Content must be complementary to existing disciplinary repositories.
  Data must be associated with a vetted publication (article, thesis, book chapter, etc.).
  Associated non-data content (e.g. software scripts, figures) where appropriate.
Lessons learnt
• Different priorities on deposit versus metadata richness may void benefits
• Advantages of one-stop deposition and when to use it are not obvious to users
• Custom-building handshaking protocols is not robust, doesn’t scale
How to promote
• Minimum metadata reporting standards?
• Uptake of community specialist repositories?
• Archival of all long-tail data?
• Linking between repositories?
Data, Metadata, Links
Standards for repository & web of data interoperability
Promoting community rallying around standards?
Repo: http://datadryad.org
Blog: http://blog.datadryad.org
Wiki: http://datadryad.org/wiki
Code: http://code.google.com/p/dryad
List: [email protected]
@datadryad
Dryad