Avoiding a Digital Dark Age for Data: why data and publications belong together Integration of Research Data and Publications Eefke Smit International Association of STM Publishers Director, Standards and Technology ICSTI workshop Delivering Data in Science PARIS, 5 March 2012

Avoiding a Digital Dark Age for Data: why data and publications belong together

Integration of Research Data and Publications

Eefke Smit

International Association of STM Publishers

Director, Standards and Technology

ICSTI workshop Delivering Data in Science

PARIS, 5 March 2012

A famous paper in Nature: DNA structure - 1953

• 1 page
• 2 authors
• 1 figure
• no data

Source: V. Kiermer, Nature Publishing Group, 2011

Nature in 2001: The human genome issue

• 62 pages, 49 figures, 27 tables

Source: V. Kiermer, Nature Publishing Group, 2011

The human genome at 10 – 2010

Nature now in an iPad edition:

Source: V. Kiermer, Nature Publishing Group, 2011

A thousand genomes – 2010

http://www.nature.com/nature/journal/v467/n7319/full/nature09534.html

Raw data: 12,145 SRA run ids submitted to Short Read Archive

Source: V. Kiermer, Nature Publishing Group, 2011

• Author information
• Live updates
• Collapsible sections
• Tool box to print, download reference, share: email, social media, bookmark
• Figure previewer
• Related content
• New publishing models
• DOI
• Article-level metrics

Source: V. Kiermer, Nature Publishing Group, 2011

From the Biochemical Journal, Portland Press:

Ever wanted to inspect data referenced in articles? Utopia Documents allows you to interact directly with curated database entries. Play with molecular structures; edit sequence and alignment data; even plot curated tabular data yourself. http://www.biochemj.org/bj/semantic_faq.htm

Elsevier offers gene and protein viewers from within the article, to data stored elsewhere:

How big is the Data Problem?

Depositions of datasets in archives continue to grow, surpassing journal articles in biomedical research

Growth of biomedical research publications (red; current total >19 million), alongside the accumulation of research data, including nucleic acid sequences (black; current total ~163 million), computer-annotated protein sequences (magenta; current total 9 million), manually annotated protein sequences (green; current total 500,000) and protein structures (blue; current total 60,000)

Source: Biochemical Journal 2009 424, 317-333 - Teresa K. Attwood, Douglas B. Kell and others.

How big is the Data Problem for journals? Too big for the Journal of Neuroscience and Cell:

Journal of Neuroscience: The graph depicts the average size of a Journal of Neuroscience article and its supplemental material in megabytes.

As a consequence, the Journal no longer accepts supplementary files with manuscripts; the supplementary material would soon have outgrown the article volume. The burden on the peer review process simply became too large.

Journal Cell: Editors suspect researchers of treating supplements as data dumping grounds (Emily Markus, Cell)

General: Publishers cannot guarantee proper preservation and future accessibility of supplementary files.

Maunsell J. J. Neurosci. 2010;30:10599-10600

©2010 by Society for Neuroscience

Researchers foresee higher volumes of data per research project:

Estimated amount of data stored per research project

Amount of data   Current   In 2 years   In 5 years
0MB              1%        1%           2%
1-100MB          17%       8%           5%
100MB-1GB        25%       19%          13%
1GB-1TB          40%       41%          36%
1TB-1PB          6%        13%          20%
1PB-10PB         1%        3%           5%
>10PB            0%        0%           2%
Don't Know       11%       14%          17%

Source: PARSE.Insight survey 2008

Where do you currently store your research data? (multiple answers possible)

Source: PARSE.Insight survey 2009, N = 1202

Where would you be willing to submit your research data? (multiple answers)

Source: PARSE.Insight survey 2009, N = 1202

Project ODE: Opportunities for Data Exchange

Objectives

To consider the impact that data sharing, re-use and preservation are having on scholarly communication, and to identify incentives for researchers and other stakeholders that will help to optimise the take-up of future e-Infrastructure.

Specific objective:

• Establish the baseline practices for integrating datasets with publications and vice versa.

Data Publication Pyramid: there is data, data and data...

The Data Publication Pyramid

(1) Data contained and explained within the article

(2) Further data explanations in any kind of supplementary files to articles

(3) Data referenced from the article and held in data centers and repositories

(4) Data publications, describing available datasets

(5) Data in drawers and on disks at the institute

The Pyramid’s likely short term reality:

(1) Top of the pyramid is stable but small

(2) Risk that supplements to articles turn into Data Dumping places

(3) Too many disciplines lack a community-endorsed data archive

(4) Estimates are that at least 75% of research data is never made openly available

The Ideal Pyramid

(1) More integration of text and data, viewers and seamless links to interactive datasets

(2) Only if data cannot be integrated in the article, and only relevant extra explanations

(3) Seamless links (bi-directional) between publications and data, interactive viewers within the articles

(4) More Data Journals that describe datasets, data management plans and data methods

How publishers view data: Brussels Declaration on Data in 2007

Raw research data should be made freely available to all researchers. Publishers encourage the public posting of the raw data outputs of research. Sets or sub-sets of data that are submitted with a paper to a journal should wherever possible be made freely accessible to other scholars.

Signed by 45 leading publishers and 14 publishers' organisations.

STM is working with DataCite on a new statement

How can publishers help to make things better*

• Stricter editorial policies on the availability of underlying data

• Recommend reliable and trustworthy Data Archives to authors

• Enhance articles for better integration of underlying data

• Endorse guidelines for proper citation of data

• Launch and sponsor Data Journals

• Ensure persistent identifiers and bi-directional linking

• Partner with reliable Data Archives for further integration of Data and Publications, including interactivity for re-use.

* See http://www.alliancepermanentaccess.org/wp-content/uploads/downloads/2011/11/ODE-ReportOnIntegrationOfDataAndPublications-1_1.pdf

Questions?

Eefke Smit

International Association of STM Publishers

Director, Standards and Technology

[email protected]