Overcoming Hurdles to Data Publication Laurie Goodman, PhD Editor-in-Chief GigaScience ORCID ID: 0000-0001-9724-5976 @GigaScience (Personal Twitter Acct @Grimhawk1- but this is mostly me whining about Donald Trump, Pitbull Discrimination, and why I hate TSA and Homeland Security)


Page 1: Laurie Goodman: Overcoming Hurdles to Data Publication

Overcoming Hurdles to Data Publication

Laurie Goodman, PhD
Editor-in-Chief, GigaScience

ORCID ID: 0000-0001-9724-5976
@GigaScience

(Personal Twitter Acct @Grimhawk1- but this is mostly me whining about Donald Trump, Pitbull Discrimination, and why I hate TSA and Homeland Security)

Page 2: Laurie Goodman: Overcoming Hurdles to Data Publication

Why should we “publish” data?

1. Ioannidis et al. (2009). Repeatability of published microarray gene expression analyses. Nature Genetics 41: 142.
Ioannidis JPA (2005). Why Most Published Research Findings Are False. PLoS Med 2(8).

Out of 18 microarray papers, results from 10 could not be reproduced

Page 3: Laurie Goodman: Overcoming Hurdles to Data Publication

Deconstructing a paper into accessible, useable, trackable, interlinked units

Need to provide credit to reward sharing and proper organization of:
• Narrative
• Data/Metadata availability/curation
• Source Code, Software availability
• Interoperability
• Availability of workflows
• Transparent analyses

Data/MetaData

Source Code, Software

Methods

Narrative

Page 4: Laurie Goodman: Overcoming Hurdles to Data Publication

How we view publishing at GigaScience

• Paper in GigaScience: open-access journal
• Data sets in GigaDB: data publishing platform (under CC0 waiver), linked to the paper
• Analyses in GigaGalaxy: data analysis platform, linked to the paper

DOIs from

Page 5: Laurie Goodman: Overcoming Hurdles to Data Publication

GigaScience Publishes (or links to) All Research Objects

Article (Narrative) + Data + Software + Source Code + Methods + Workflows + Containers/Docker + VMs

• GigaScience paper
• Data sets in GigaDB (Data DOI), linked to the paper
• Analyses in GigaGalaxy (Workflow DOI), linked to the paper

Page 6: Laurie Goodman: Overcoming Hurdles to Data Publication

What is Data Publication?

1. Publishing a standard article that describes the data.

2. Making the data itself citable.

Page 7: Laurie Goodman: Overcoming Hurdles to Data Publication

Make it easy to cite

See where it got cited!

Describe the data
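One way to make a dataset easy to cite is to generate the citation string directly from its metadata. The following is a minimal sketch, assuming a DataCite-style layout (creators, year, title, publisher, DOI link). The `DatasetRecord` fields, the `format_data_citation` helper, and the sample record are all hypothetical and are not taken from GigaDB.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DatasetRecord:
    """Hypothetical minimal metadata for a published dataset."""
    creators: List[str]
    year: int
    title: str
    publisher: str
    doi: str  # bare DOI, e.g. "10.1234/example"


def format_data_citation(record: DatasetRecord) -> str:
    """Build a citation string that keeps the DOI as a resolvable link."""
    authors = "; ".join(record.creators)
    return (
        f"{authors} ({record.year}): {record.title}. "
        f"{record.publisher}. https://doi.org/{record.doi}"
    )


# Placeholder record for illustration only (not a real dataset or DOI).
example = DatasetRecord(
    creators=["Doe, J", "Roe, R"],
    year=2011,
    title="Example genome assembly data",
    publisher="Example Data Repository",
    doi="10.1234/example",
)
citation = format_data_citation(example)
print(citation)
```

Keeping the DOI in the generated string (rather than a repository-specific URL) is what lets citation trackers match the reference back to the dataset.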

Page 8: Laurie Goodman: Overcoming Hurdles to Data Publication

Current list of Darwin Finch Data Citations on Google Scholar

…And more

Page 9: Laurie Goodman: Overcoming Hurdles to Data Publication

Data Publication Hurdles

If only it were easy…

• Data isn’t “scholarly” enough to be a citable entity (a ‘real’ paper)

• If I publish my data, I may not be able to publish the analysis paper later because journals will consider it Prior Publication

• If I publish my data, #DataParasites will use it!!*

*http://www.nejm.org/doi/full/10.1056/NEJMe1516564
Response from Functional Genomics Data Society: http://fged.org/projects/data-sharing-and-research-parasites/

Page 10: Laurie Goodman: Overcoming Hurdles to Data Publication

F1000 Research

Checked with Publishers and Journals about Data Publication being considered “Prior Publication”

Page 11: Laurie Goodman: Overcoming Hurdles to Data Publication

http://blogs.biomedcentral.com/gigablog/2014/05/14/the-latest-weapon-in-publishing-data-the-polar-bear/

The polar bear DATA was published -as a citable entity- in 2011 before publication of a data analysis paper

Page 12: Laurie Goodman: Overcoming Hurdles to Data Publication

BUT #dataparasites!

Polar bear data were used before the data producer’s analysis paper was published, but the data publication garnered 5 citations.

Hailer, F et al., Nuclear genomic sequences reveal that polar bears are an old and distinct bear lineage. Science. 2012 Apr 20;336(6079):344-7. doi:10.1126/science.1216424.

Cahill, JA et al., Genomic evidence for island population conversion resolves conflicting theories of polar bear evolution. PLoS Genet. 2013;9(3):e1003345. doi:10.1371/journal.pgen.1003345.

Morgan, CC et al., Heterogeneous models place the root of the placental mammal phylogeny. Mol Biol Evol. 2013 Sep;30(9):2145-56. doi:10.1093/molbev/mst117.

Cronin, MA et al., Molecular Phylogeny and SNP Variation of Polar Bears (Ursus maritimus), Brown Bears (U. arctos), and Black Bears (U. americanus) Derived from Genome Sequences. J Hered. 2014; 105(3):312-23. doi:10.1093/jhered/est133.

Bidon, T et al., Brown and Polar Bear Y Chromosomes Reveal Extensive Male-Biased Gene Flow within Brother Lineages. Mol Biol Evol. 2014 Apr 4. doi:10.1093/molbev/msu109

http://blogs.biomedcentral.com/gigablog/2014/05/14/the-latest-weapon-in-publishing-data-the-polar-bear/

Page 13: Laurie Goodman: Overcoming Hurdles to Data Publication

The main analysis paper was published in Cell (and made the cover), even though the data had been released 2 years earlier and been cited in other papers.

However, this paper didn’t include the data citation…

The Data Publication has since garnered 6 more citations.

Page 14: Laurie Goodman: Overcoming Hurdles to Data Publication

Data Publication is being tracked by this and other tracking resources

AND THAT MEANS You can get a Data IF!!

Page 15: Laurie Goodman: Overcoming Hurdles to Data Publication

How are Data Citations Doing Overall?

Proportions of Citation Types Per Year

The Location of the Citation: Are Data Citation Recommendations Having an Effect? Elizabeth Hull, DataCite Blog
https://blog.datacite.org/location-of-the-citation/

Looked at 1,125 Journal Articles with associated data in Dryad from 2011-2014

Highlights:
• Dryad DOI in the works cited, as recommended = only 6% of total articles
• Dryad DOI in the body only (including data availability sections) = 75%
• No citation (Dryad DOI not found anywhere in the article) = 20%

Good News:
• Works cited in references increased from 5% to 8% from 2011-2014
• Articles with no data citation declined from 31% to 15%

Bad News: With the current growth rate, expect to see 90% in the works cited section in 2031
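The "bad news" projection can be roughly sanity-checked from the two figures on the slide. This is a minimal sketch, assuming the works-cited share keeps growing by the same multiplicative factor each year. That compound-growth model is an assumption made here for illustration (the slide does not state how 2031 was derived), and `years_to_reach` is a hypothetical helper.

```python
import math


def years_to_reach(start_share: float, end_share: float,
                   span_years: float, target_share: float) -> float:
    """Years after the observed span until target_share is reached,
    assuming the share multiplies by the same factor every year."""
    annual_factor = (end_share / start_share) ** (1.0 / span_years)
    return math.log(target_share / end_share) / math.log(annual_factor)


# Figures from the slide: 5% (2011) to 8% (2014); target is 90%.
extra_years = years_to_reach(0.05, 0.08, span_years=3, target_share=0.90)
estimated_year = 2014 + extra_years
print(f"Roughly {extra_years:.1f} more years, i.e. around {estimated_year:.0f}")
```

Under this assumption the estimate falls a year or two before the slide's 2031. The gap may come from the rounded percentages used here or from a different fitting method in the original analysis.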

Page 16: Laurie Goodman: Overcoming Hurdles to Data Publication

More Education Needed

“Easiest” Way Forward is to Engage the Journal Community
• Organizations providing citation guidelines should engage “Editor Evangelists”
• Editor Evangelists will do the following:
  o Get Data Citation Guidelines in the Guide To Authors
  o Get Data Citation Guidelines in the Copy Editor Handbook
  o Tell All their Editor Friends and Get a Cult following

Example: The Standardization of Gene Nomenclature in articles
• The Human Genome Organization (HUGO) worked with journal editors in the late 1990s to drive use of appropriate Gene Nomenclature, getting it into the guide to authors.
• Within about 3 years, standard nomenclature was used by all

Oh- and don’t forget to have the Editors tell the Production Department that DOIs shouldn’t be stripped out and replaced with URLs.

Page 17: Laurie Goodman: Overcoming Hurdles to Data Publication

Thanks to:
Scott Edmunds, Executive Editor
Nicole Nogoy, Commissioning Editor
Peter Li, Lead Data Manager
Chris Hunter, Lead BioCurator
Xiao (Jesse) Si Zhe, Database Developer
Sam Rose, Journal Development Manager
Rob Davidson, Open Data Lead, Office for National Statistics

Contact us:
[email protected]@gigasciencejournal.com

Follow us:
@GigaScience
facebook.com/GigaScience
blogs.openaccesscentral.com/blogs/gigablog

http://gigascience.biomedcentral.com
www.gigadb.org