Research data and scholarly publications: Going from casual acquaintances to something more. Todd Vision, Dept of Biology, University of North Carolina at Chapel Hill and the U.S. National Evolutionary Synthesis Center. ALPSP, September 2011. Abort, Retry, Fail? Data and the scholarly literature


Presented at the ALPSP annual meeting 2011 in Oxfordshire, UK, during a session entitled "Abort, Retry, Fail? Data and the scholarly literature"


Page 1: Research data and scholarly publications: going from casual acquaintances to something more

Research data and scholarly publications:

Going from casual acquaintances to something more

Todd Vision

Dept of Biology, University of North Carolina at Chapel Hill

and the U.S. National Evolutionary Synthesis Center

ALPSP, September 2011

Abort, Retry, Fail? Data and the scholarly literature

Page 2: Research data and scholarly publications: going from casual acquaintances to something more
Page 3: Research data and scholarly publications: going from casual acquaintances to something more
Page 4: Research data and scholarly publications: going from casual acquaintances to something more

Peer-to-peer ‘sharing’ fails

Wicherts and colleagues requested data from 141 articles in American Psychological Association journals.

“6 months later, after … 400 emails, [sending] detailed descriptions of our study aims, approvals of our ethical committee, signed assurances not to share data with others, and even our full resumes…” only 27% of authors complied.

Wicherts, J.M., Borsboom, D., Kats, J., & Molenaar, D. (2006). The poor availability of psychological research data for reanalysis. American Psychologist, 61, 726-728.

Page 5: Research data and scholarly publications: going from casual acquaintances to something more

[Figure: Information Content (y-axis) plotted against Time (x-axis). Labels along the curve: Time of publication; Specific details; General details; Accident; Retirement or career change; Death.]

(Michener et al. 1997)

Page 6: Research data and scholarly publications: going from casual acquaintances to something more

Bumpus HC (1898) The Elimination of the Unfit as Illustrated by the Introduced Sparrow, Passer domesticus. Biological Lectures from the Marine Biological Laboratory: 209-226.

Page 7: Research data and scholarly publications: going from casual acquaintances to something more
Page 8: Research data and scholarly publications: going from casual acquaintances to something more

Source: Publishing Research Consortium, http://publishingresearch.net

n=3824

Page 9: Research data and scholarly publications: going from casual acquaintances to something more
Page 10: Research data and scholarly publications: going from casual acquaintances to something more

Taxonomy of data archiving benefits

Modified from Beagrie et al. (2009) Keeping Research Data Safe 2

Direct
• Verification of published research
• Preserving accessibility to data
• Allowing reuse and repurposing of data
• Discoverability of data

Indirect (costs avoided)
• Redundant data collection
• Inefficient legacy data curation
• Burden of sharing-upon-request
• Opportunity cost of science not done

Near term
• Protection against personnel turnover
• Availability for review and validation

Long term
• Secure long-term stewardship
• Increased impact per publication

Private
• Increased citations
• New collaborations
• New research opportunities
• Fulfilling funding mandates

Public
• More efficient use of research dollars
• Public trust in science
• Educational opportunities
• Improved methodologies
• More informed policy

Page 11: Research data and scholarly publications: going from casual acquaintances to something more

Joint Data Archiving Policy (JDAP)

Data are important products of the scientific enterprise, and they should be preserved and usable for decades in the future.

As a condition for publication, data supporting the results in the article should be deposited in an appropriate public archive.

Authors may elect to embargo access to the data for a period up to a year after publication.

Exceptions may be granted at the discretion of the editor, especially for sensitive information.

Whitlock, M. C., M. A. McPeek, M. D. Rausher, L. Rieseberg, and A. J. Moore. 2010. Data Archiving. American Naturalist. 175(2):145-146.

Page 12: Research data and scholarly publications: going from casual acquaintances to something more

The long tail of orphan data in “small science”

[Figure: Volume (y-axis) plotted against Rank frequency of datatype (x-axis). Labels: Specialized repositories (e.g. GenBank, PDB); Orphan data. After B. Heidorn.]

“Most of the bytes are at the high end, but most of the datasets are at the low end” – Jim Gray

Page 13: Research data and scholarly publications: going from casual acquaintances to something more

Smit E (2011) Abelard and Héloise: Why Data and Publications Belong Together. D-Lib Magazine doi:10.1045/january2011-smit

Page 14: Research data and scholarly publications: going from casual acquaintances to something more

• The End
  - To make data archiving and reuse a standard part of research and publishing.

• The Means
  - Enable low-burden data archiving at the time of manuscript submission.
  - Promote researcher benefits from data archiving.
  - Promote responsible data reuse.
  - Empower journals, societies & publishers in shared governance.
  - Ensure sustainability and long-term preservation.

• The Scope
  - Data underlying peer-reviewed articles in basic and applied biosciences.

Page 15: Research data and scholarly publications: going from casual acquaintances to something more

[Figure, built up across pages 15-25: data deposit relative to manuscript submission, shown for an integrated and a non-integrated journal. Labels are listed once, in the order the build slides add them.]

Integrated
• Submit manuscript
• Manuscript metadata
• Prompt author
• Submit data
• Peer review
• Review passcode
• Acceptance notification
• Curation
• Data DOI
• Production
• Article metadata
• Curation
• Article publication
• Data publication
• Article DOI/final metadata harvested

Non-integrated
• Submit data
• Author includes data DOI
• Data DOI
• Article publication
• DOI/final metadata harvested

Page 26: Research data and scholarly publications: going from casual acquaintances to something more


Dryad relative to Supplementary Online Materials

Feature | Dryad | SOM
Discoverable: indexed and exposed to both web and bibliographic search engines | ✔ | ✗
Identifiable: DataCite DOIs within articles serve as permanent, resolvable identifiers | ✔ | ✗*
Permanent: processes in place to promote preservation (incl. format migration) | ✔ | ✔/✗**
Curated: quality control by both automated processes and human inspection | ✔ | ✗*
Ease of deposit: streamlined deposit, allowance for large and complex datasets | ✔ | ✔/✗**
Formatted for reuse: support for non-PDF file formats | ✔ | ✔/✗**
Updatable: new versions of data files can be added, metadata can be enhanced | ✔ | ✗
Support for embargoes: can delay release of data in accordance with journal policy | ✔ | ✗
Free reuse: no paywall, clear terms of reuse (all data released under CC Zero) | ✔ | ✔/✗**
Economy of scale: cost efficiency from shared infrastructure | ✔ | ✔/✗**
Alignment to organizational mission: focus on archiving and reuse of scientific data | ✔ | ✗

* A few publisher SOM sites are exceptions to the general rule
** Practices differ among publishers, see Smit (2011), doi:10.1045/january2011-smit

Page 27: Research data and scholarly publications: going from casual acquaintances to something more

Article citation

Wu D, Wu M, Halpern A, Rusch DB, Yooseph S, Frazier M, Venter JC, Eisen JA (2011) Stalking the fourth domain in metagenomic data: searching for, discovering, and interpreting novel, deep branches in phylogenetic trees of phylogenetic marker genes. PLoS ONE 6(3): e18011. doi:10.1371/journal.pone.0018011

Data citation

Wu D, Wu M, Halpern A, Rusch DB, Yooseph S, Frazier M, Venter JC, Eisen JA (2011) Data from: Stalking the fourth domain in metagenomic data: searching for, discovering, and interpreting novel, deep branches in phylogenetic trees of phylogenetic marker genes. Dryad Digital Repository. doi:10.5061/dryad.8384
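The two citations above follow a simple pattern: the data citation reuses the article's authors and year, prefixes the title with "Data from:", and names the repository and the data DOI. A minimal Python sketch of that pattern follows. The names `Article`, `data_citation` and `doi_url` are hypothetical and not part of any Dryad tool; the `https://doi.org/` prefix is the standard DOI resolver address.

```python
from dataclasses import dataclass


@dataclass
class Article:
    """Just the article fields the data citation reuses (hypothetical type)."""
    authors: str
    year: int
    title: str


def data_citation(article: Article, data_doi: str,
                  repository: str = "Dryad Digital Repository") -> str:
    """Format a data citation in the style shown on the slide."""
    return (f"{article.authors} ({article.year}) Data from: {article.title}. "
            f"{repository}. doi:{data_doi}")


def doi_url(doi: str) -> str:
    """Turn a bare DOI into a resolvable link."""
    return f"https://doi.org/{doi}"


article = Article(
    authors=("Wu D, Wu M, Halpern A, Rusch DB, Yooseph S, Frazier M, "
             "Venter JC, Eisen JA"),
    year=2011,
    title=("Stalking the fourth domain in metagenomic data: searching for, "
           "discovering, and interpreting novel, deep branches in "
           "phylogenetic trees of phylogenetic marker genes"),
)

print(data_citation(article, "10.5061/dryad.8384"))
print(doi_url("10.5061/dryad.8384"))
```

Because the data citation is derived from the article's own metadata, the two records stay recognizably linked while each keeps its own DOI.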

Page 28: Research data and scholarly publications: going from casual acquaintances to something more

Rebbeck CA, Leroi AM, Burt A (2011) Mitochondrial capture by a transmissible cancer. Science 331, 303

Page 29: Research data and scholarly publications: going from casual acquaintances to something more
Page 30: Research data and scholarly publications: going from casual acquaintances to something more

[Figure: distributions of data packages. Axis labels: Number of data packages; Number of files; Total data package size (bytes).]

Page 31: Research data and scholarly publications: going from casual acquaintances to something more

20 papers from Delsuc and Douzery going back to 2002

Page 32: Research data and scholarly publications: going from casual acquaintances to something more

By now, downloaded >1000X

Page 33: Research data and scholarly publications: going from casual acquaintances to something more

Fulfilling the role of a journal

Journal Dryad

Registration ✓ ✓

Certification ✓ (peer review) ✓ (curation)

Awareness ✓ ✓ (distribution)

Archiving ✓

Rewarding ✓ ✓

Page 34: Research data and scholarly publications: going from casual acquaintances to something more
Page 35: Research data and scholarly publications: going from casual acquaintances to something more

Does sharing imply that it need be altruistic?

Piwowar H, et al. (2007) Sharing Detailed Research Data Is Associated with Increased Citation Rate. PLoS ONE 2(3): e308.

• For a set of 85 cancer microarray clinical trials
  - 48% had publicly available data
  - These received 85% of the article citations
  - Independent of journal impact factor, publication date, author nationality
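To make the two percentages easier to compare, the short Python calculation below turns them into a raw citations-per-trial ratio. It uses only the 48% and 85% figures from the slide. The result is an unadjusted ratio implied by those two numbers; it is not the covariate-adjusted estimate from the paper.

```python
# Figures taken from the slide (Piwowar et al. 2007).
share_of_trials_with_data = 0.48      # trials with publicly available data
share_of_citations_to_those = 0.85    # share of all citations they received

# Citation share per unit of trial share, for each group.
with_data = share_of_citations_to_those / share_of_trials_with_data
without_data = (1 - share_of_citations_to_those) / (1 - share_of_trials_with_data)

# Raw (unadjusted) ratio of citations per trial, shared vs. not shared.
raw_ratio = with_data / without_data
print(f"Raw citations-per-trial ratio: {raw_ratio:.1f}")
```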

Page 36: Research data and scholarly publications: going from casual acquaintances to something more


Page 37: Research data and scholarly publications: going from casual acquaintances to something more

Piwowar HA, Chapman WW (2008) A review of journal policies for sharing research data. Presented at ELPUB2008, Nature Precedings hdl:10101/npre.2008.1700.1

Data policies among bioscience journals

n=70

IF=3.6

IF=4.5

IF=6.0

Page 38: Research data and scholarly publications: going from casual acquaintances to something more

The value proposition

• For researchers
  - Increase the impact of, and citations to, published research.
  - Preserve and make data available to verify published results, to refine methodologies, and to repurpose.
  - Free researchers from the burden of data preservation and access.

• For journals, publishers and societies
  - Free journals from the burden of managing supplemental data
  - Increase the discoverability, impact, and integrity of articles
  - Increase their value to the community they serve.

• For funders
  - A cost-effective mechanism to make research more accessible
  - Leverage existing investments in order to enable new science

Page 39: Research data and scholarly publications: going from casual acquaintances to something more

Sustainability and governance

• Business model
  - Long-term preservation requires a long-term organization
  - In Dryad’s case, a membership-based nonprofit
  - Revenue received from a broad array of ‘customers’, including journals, societies, publishers, and researchers

• Deposit charges
  - Paid upfront, when the majority of costs are incurred
  - Ensure free access to the data in perpetuity
  - Allow revenue to naturally scale with costs (i.e. volume of deposits)
  - Distribute costs fairly among stakeholders

• Governance
  - 12-member Board of Directors nominated and elected by the Membership
  - Membership serves in an advisory capacity, and is a community of practice

Page 40: Research data and scholarly publications: going from casual acquaintances to something more

Costs

• Moderate economies of scale are required
  - At 10K packages/yr, <$50/deposit, depending on curation

• What are the costs for SOM?
  - Journal of Clinical Investigation: $300 flat fee
  - Ecological Archives: $250 <10Mb, more fees beyond that
  - FASEB: $100 per file

Beagrie N, Eakin-Richards L, Vision TJ (2009) Business models and cost estimation: Dryad repository case study. iPRES 2010

Page 41: Research data and scholarly publications: going from casual acquaintances to something more

Proposed payment plans

1. Journal-based
  - Annual fee based on all research articles published/yr (~$25/per*)
  - Covers any deposits from the journal (even from prior yrs)

2. Voucher-based
  - Pay in advance for some number of deposits (<$50/per deposit)

3. Pay-as-you-go
  - Be invoiced retrospectively for deposits (>$50/per deposit)

4. Author-pays
  - Author pays online at time of deposit
  - Journal can still facilitate archiving through submission integration

* These are rates for Members, which include a 10% discount
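Which plan is cheapest for a journal depends on how many of its articles come with a data deposit. The Python sketch below compares the first three plans under stated assumptions. The slide gives only "~$25" per article, "<$50" per voucher deposit and ">$50" per pay-as-you-go deposit, so the $45 and $55 rates are illustrative placeholders, and the function names are hypothetical. The author-pays plan is left out because the slide gives no rate for it.

```python
JOURNAL_RATE_PER_ARTICLE = 25.0  # "~$25/per" research article published per year (slide)
VOUCHER_RATE_PER_DEPOSIT = 45.0  # ASSUMPTION: the slide only says "<$50"
PAYG_RATE_PER_DEPOSIT = 55.0     # ASSUMPTION: the slide only says ">$50"


def annual_costs(articles_published: int, deposits: int) -> dict:
    """Yearly cost to a journal under each plan, in dollars."""
    return {
        "journal-based": JOURNAL_RATE_PER_ARTICLE * articles_published,
        "voucher-based": VOUCHER_RATE_PER_DEPOSIT * deposits,
        "pay-as-you-go": PAYG_RATE_PER_DEPOSIT * deposits,
    }


def cheapest_plan(articles_published: int, deposits: int) -> str:
    """Name of the plan with the lowest yearly cost."""
    costs = annual_costs(articles_published, deposits)
    return min(costs, key=costs.get)


# A journal publishing 100 articles a year, at three levels of data deposit.
for deposits in (20, 50, 80):
    print(deposits, annual_costs(100, deposits), cheapest_plan(100, deposits))
```

With these placeholder rates, the flat journal-based fee becomes the cheaper option once more than about 25/45 (roughly 56%) of published articles have an associated deposit; the exact crossover moves with the real per-deposit rates.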

Page 42: Research data and scholarly publications: going from casual acquaintances to something more

What is the return on investment?

• A rigorous framework is lacking
  - But we can look at comparators

• Marginal cost of data archiving
  - $50/article is <2% of publication costs (>$2.5K)
  - And 0.2% of grant costs/article (~$25K)

• Is the data worth 2% of the research investment?
  - Using DNA microarray data in GEO as a model
  - 2,711 submissions in 2007
  - Data reused by 3rd parties in >1,150 articles

Vision (2011) Open data and social contract of scientific publishing. BioScience, 60(5):330-330

Piwowar H, Vision TJ, Whitlock MC (2011) Data archiving is a good investment. Nature 473:285
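The percentages on this slide can be checked directly from the dollar figures it quotes. The Python snippet below is a worked calculation. It takes the slide's bounds (">$2.5K", "~$25K", ">1,150") at their stated values, so the publication share is an upper bound and the reuse ratio a lower bound.

```python
archiving_cost = 50          # $ per article (slide)
publication_cost = 2_500     # $ per article; the slide says ">$2.5K"
grant_cost_per_article = 25_000  # $ per article; the slide says "~$25K"

share_of_publication = archiving_cost / publication_cost
share_of_grant = archiving_cost / grant_cost_per_article
print(f"Share of publication cost: {share_of_publication:.1%}")  # 2.0%
print(f"Share of grant cost:       {share_of_grant:.1%}")        # 0.2%

# GEO comparator: third-party reuse relative to one year's submissions.
geo_submissions_2007 = 2_711
third_party_reuse_articles = 1_150   # the slide says ">1,150"
reuse_per_submission = third_party_reuse_articles / geo_submissions_2007
print(f"Reuse articles per submission: {reuse_per_submission:.2f}")  # 0.42
```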

Page 43: Research data and scholarly publications: going from casual acquaintances to something more
Page 44: Research data and scholarly publications: going from casual acquaintances to something more

• http://datadryad.org
• http://blog.datadryad.org
• http://datadryad.org/wiki
• http://code.google.com/p/dryad
• [email protected]
• @datadryad
• Dryad

Page 45: Research data and scholarly publications: going from casual acquaintances to something more

A very incomplete list of contributors

JDAP: M. Whitlock

DryadUS: R. Scherle, E. Feinstein, J. Greenberg, H. Piwowar, P. Schaeffer

DryadUK: B. Hole, Max Wilkinson, D. Shotton

Sustainability planning: N. Beagrie, L. Eakin-Richards