TODAY’S SCIENTIFIC ARTICLE – HOW REPRODUCIBLE IS IT?
MICHAEL MARKIE, Associate Publisher, F1000Research
@mmmarksman
f1000research.com  @f1000research
PRODUCING A TYPICAL ARTICLE
The process of science in a nutshell:
1. Collect data, 2. Evaluate the data, 3. Present the result(s) in a scientific paper
Articles are normally written in a Word document (or perhaps a LaTeX file) and then typically converted to JATS XML format.
ARTICLE METADATA
“The backing singer”: it’s not part of the main body text/graphics
Its job is to identify/describe the article
In the XML, the standard bibliographic elements are:
1. Authorship
2. Article title
3. Copyright year and publication date
4. Descriptive material such as keywords/abstracts
5. Persistent identifiers (DOIs, PMIDs etc.)
Metadata makes an article discoverable: easily shared and interoperable
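As a rough illustration of how the metadata elements listed above can be read by a machine, here is a minimal sketch that pulls them out of a JATS-style fragment using Python's standard library. The XML sample is hand-written and heavily trimmed, and the DOI, names and keywords in it are invented for illustration; real JATS files carry many more elements and attributes.

```python
import xml.etree.ElementTree as ET

# Hand-written, trimmed JATS-style fragment. All values are made up.
JATS_SAMPLE = """
<article>
  <front>
    <article-meta>
      <article-id pub-id-type="doi">10.0000/example.1</article-id>
      <title-group>
        <article-title>An example article</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <name><surname>Doe</surname><given-names>Jane</given-names></name>
        </contrib>
        <contrib contrib-type="author">
          <name><surname>Roe</surname><given-names>Richard</given-names></name>
        </contrib>
      </contrib-group>
      <pub-date><year>2014</year></pub-date>
      <kwd-group><kwd>reproducibility</kwd><kwd>open data</kwd></kwd-group>
    </article-meta>
  </front>
</article>
"""


def extract_metadata(jats_xml: str) -> dict:
    """Collect authorship, title, date, keywords and DOI from a JATS fragment."""
    root = ET.fromstring(jats_xml.strip())
    meta = root.find("front/article-meta")
    authors = [
        f"{name.findtext('given-names')} {name.findtext('surname')}"
        for name in meta.findall(
            "contrib-group/contrib[@contrib-type='author']/name"
        )
    ]
    return {
        "doi": meta.findtext("article-id[@pub-id-type='doi']"),
        "title": meta.findtext("title-group/article-title"),
        "authors": authors,
        "year": meta.findtext("pub-date/year"),
        "keywords": [kwd.text for kwd in meta.findall("kwd-group/kwd")],
    }


if __name__ == "__main__":
    for key, value in extract_metadata(JATS_SAMPLE).items():
        print(f"{key}: {value}")
```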
ARTICLE METADATA 2.
Benefits of Open Access journals
Greater visibility: dissemination is free and can be achieved via a simple Internet connection
The OAI protocols allow metadata harvesting for inclusion in many digital archives
Most journals provide the full text of the XML for data mining purposes – Why?
1. Drives users to content
2. Stimulates collaboration
3. Allows for the creation of new services for discovery
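The metadata harvesting mentioned above typically runs over OAI-PMH, where a harvester sends an HTTP request with a `verb` parameter and receives XML records back. The sketch below builds a `ListRecords` request URL and reads titles out of a response. The base URL and the sample response are invented for illustration (no network call is made), and a real harvester would also need to handle resumption tokens and error responses.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespaces used by OAI-PMH responses carrying Dublin Core metadata.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL for a repository endpoint."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


# Invented, trimmed response of the shape a repository might return.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example article</dc:title>
          <dc:creator>Doe, Jane</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def titles_by_identifier(response_xml):
    """Map each harvested record identifier to its Dublin Core title."""
    root = ET.fromstring(response_xml)
    titles = {}
    for record in root.findall("oai:ListRecords/oai:record", NS):
        identifier = record.findtext("oai:header/oai:identifier", namespaces=NS)
        titles[identifier] = record.findtext(".//dc:title", namespaces=NS)
    return titles


if __name__ == "__main__":
    # "https://example.org/oai" is a placeholder endpoint, not a real service.
    print(list_records_url("https://example.org/oai"))
    print(titles_by_identifier(SAMPLE_RESPONSE))
```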
A NEW CHALLENGE - PUBLISHING DATA
• Outputs are more than just text – data and code are involved in the process too!
BUT IT’S NOT THAT EASY...
• Data is heterogeneous depending on the discipline
• Datasets are often generated with incomplete metadata
• Scientific user-communities are often small and specialised - so is their data
• Scientific metadata are more extensive and less standardised than non-scientific metadata
WHY SHOULD WE MAKE DATA AVAILABLE IN ARTICLES?
“We conclude there is a direct effect of third-party data reuse that persists for years beyond the time when researchers have published most of the papers reusing their own data...We further conclude that...a substantial fraction of archived datasets are reused, and that the intensity of dataset reuse has been steadily increasing since 2003.”
Piwowar HA, Vision TJ. Data reuse and the open data citation advantage. PeerJ (2013) doi: 10.7717/peerj.175
1. Correlates with higher citations
BUT WHY SHOULD WE MAKE DATA AVAILABLE IN ARTICLES?
“• We examined the availability of data from 516 studies between 2 and 22 years old: the odds of a data set being reported as extant fell by 17% per year
• Broken e-mails and obsolete storage devices were the main obstacles to data sharing
• Policies mandating data archiving at publication are clearly needed”
Vines TH. et al. The availability of research data declines rapidly with article age. Curr Biol 24, 94–7 (2014)
2. Research data becomes harder to access with age
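To get a feel for what a 17% yearly fall in odds means, the arithmetic can be sketched as simple compounding: each year multiplies the odds by 0.83. This is only an illustration of the quoted figure, not the statistical model used in the paper, and the baseline odds below are a hypothetical starting value chosen for the example.

```python
def odds_after(years, baseline_odds, yearly_drop=0.17):
    """Compound a fixed yearly drop in odds over a number of years."""
    return baseline_odds * (1 - yearly_drop) ** years


def odds_to_probability(odds):
    """Convert odds (p / (1 - p)) back to a probability p."""
    return odds / (1 + odds)


if __name__ == "__main__":
    # Hypothetical baseline: even odds (probability 0.5) at year 0.
    baseline = 1.0
    for years in (0, 2, 10, 22):
        odds = odds_after(years, baseline)
        print(f"{years:>2} years: odds {odds:.3f}, "
              f"probability {odds_to_probability(odds):.3f}")
```

Under this simple compounding, ten years leave roughly 16% of the original odds, whatever the starting value.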
BUT WHY SHOULD WE MAKE DATA AVAILABLE IN ARTICLES?
“We evaluated the replication of data analyses in 18 articles on microarray-based gene expression profiling published in Nature Genetics in 2005–2006...We reproduced two analyses in principle and six partially or with some discrepancies; ten could not be reproduced. The main reason for failure to reproduce was data unavailability.”
Ioannidis JPA et al. Repeatability of published microarray gene expression analyses. Nature Genetics 41, 149–55 (2009)
3. Sharing data allows replication
BUT WHY SHOULD WE MAKE DATA AVAILABLE IN ARTICLES?
Increasing government and funding mandates to do so
New research:
• testing new hypotheses
• new analysis methods
• meta-analyses to create new datasets
• studies on data collection methods
Diversity of analyses and opinion
Reduction of error and fraud
Education for new researchers
4. Lots of other reasons!
PUBLISHING DATA IN ARTICLES TODAY – RISE OF THE DATA PAPER
The Data Paper describes a particular dataset and is peer-reviewed. Can it provide the missing link between the data and the research article?
THE DATA ENFORCERS!
• Reproducible research or data sharing statements in published papers (Annals of Internal Medicine, BMJ)
• Data sharing implied by submission (BioMed Central)
• Data sharing as a condition of publication (PLoS, NPG) - AND data must be available to reviewers/editors
• Open data a condition of submission (F1000Research) - Papers will be rejected if data are not made freely available*
MAKING DATA ACCESSIBLE
• ‘Openly accessible’ – apply the principles of the Budapest Open Access Initiative (originally created for scholarly articles) to scholarly data too, i.e.:
• Free to view/access
• Free to download
• Free to re-analyse
• Free to modify
• Use a license that facilitates the ease of sharing and reuse: CC0
• Apply community norms regarding acknowledgement and citation of data.
HOW TO MAKE DATA USABLE/REPRODUCIBLE
• Present data in a usable format (i.e. not in a supplemental PDF)
• Share data in non-proprietary formats
• Specify how the data was generated (context)
• Provide quality assurance (what were the limitations?)
• Specify how to access the software required to view the data
• Specify the software parameters used to analyze the data
THE MORE INFORMATION ABOUT A DATASET THE BETTER
White EP et al. Nine simple ways to make it easier to (re)use your data. Ideas in Ecology and Evolution 6(2):1–10 (2013)
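One way to act on the checklist above is to ship the data as plain CSV alongside a small machine-readable metadata file recording context, limitations, software and parameters. The following is a minimal sketch under that assumption; the metadata field names and the example values are hypothetical, not a community standard.

```python
import csv
import json
from pathlib import Path

# Hypothetical set of fields mirroring the checklist; not an official schema.
REQUIRED_FIELDS = ("how_generated", "limitations", "software", "parameters")


def write_dataset(directory, stem, columns, rows, metadata):
    """Write rows as CSV plus a JSON sidecar describing the dataset."""
    missing = [name for name in REQUIRED_FIELDS if name not in metadata]
    if missing:
        raise ValueError(f"metadata is missing: {', '.join(missing)}")

    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)

    # Non-proprietary, line-oriented format that any tool can read.
    data_path = directory / f"{stem}.csv"
    with data_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(columns)
        writer.writerows(rows)

    # Sidecar file keeps the context next to the data it describes.
    meta_path = directory / f"{stem}.metadata.json"
    sidecar = dict(metadata, columns=list(columns))
    meta_path.write_text(json.dumps(sidecar, indent=2), encoding="utf-8")
    return data_path, meta_path


if __name__ == "__main__":
    # All values below are invented for illustration.
    write_dataset(
        "example_output",
        "walking_speed",
        columns=["fly_id", "speed_mm_per_s"],
        rows=[[1, 12.4], [2, 10.9]],
        metadata={
            "how_generated": "Tracked in an open arena for 5 minutes",
            "limitations": "Single session per fly",
            "software": "Any CSV reader",
            "parameters": {"frame_rate_hz": 10},
        },
    )
```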
DATA REVIEW AT F1000RESEARCH
INTERNAL EDITORIAL CHECKS
• Where to store the data: a discipline-specific repository where possible; otherwise a general-science repository such as figshare, Dryad Digital Repository or Dataverse
• Format – what file type
• How the data is presented – layout, labels
• Is there adequate data?
• Is there adequate protocol information?
DATA REVIEW AT F1000RESEARCH
EXTERNAL PEER REVIEW CHECKS
• Were the methods used appropriate?
• Was the format/structure usable?
• Were the limitations and sources of error included?
• Is there adequate information to enable potential replication?
DISCOVERING DATA AT F1000RESEARCH
DISCOVERING CODE AT F1000RESEARCH
STANDARDISING THE METADATA – WHAT’S BEING DONE
• Lists of recommended repositories and standards are being developed with community-driven efforts such as:
• Force11 Data Citation Implementation Group (DCIG) – Aiming to revise the NISO/JATS XML schema for direct data citation
• The Research Data Alliance forms focused Working Groups and Interest Groups to discuss the social and technical bridges needed for sharing data
• The Data Citation Index – data in repositories, whether or not they are linked to papers, will be recognised as independent evidence products
• Training – Lots of initiatives to help standardise how we work.
BUT THE ARTICLE AS WE KNOW IT NEEDS TO CHANGE!
An article for the digital age...?
[Diagram: the publication surrounded by its components: reagents, workflows, software, data, results, discussion, preprint, open peer review and commenting, alternative metrics, and interactive tools for analysis]
IN-ARTICLE DATA MANIPULATION
FIGURES THAT DON’T EXIST
Simply data + code
Creates opportunities to change the definition of a figure
Colomb J and Brembs B. Sub-strains of Drosophila Canton-S differ markedly in their locomotor behavior [v1; ref status: indexed, http://f1000r.es/3is] F1000Research 2014, 3:176
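The idea of a figure that is "simply data + code" can be sketched as a function that redraws the figure from the underlying data every time it is called, so no static image needs to be stored. The example below renders a plain-text bar chart of group means to stay dependency-free; the group names and values are hypothetical and do not come from the cited article, which uses its own plotting code.

```python
from statistics import mean


def render_figure(groups, width=40):
    """Render a text bar chart of group means, recomputed from the raw data."""
    means = {name: mean(values) for name, values in groups.items()}
    top = max(means.values())
    if top <= 0:
        raise ValueError("this sketch only handles positive means")
    lines = []
    for name, group_mean in means.items():
        # Bar length is proportional to the group mean.
        bar = "#" * round(width * group_mean / top)
        lines.append(f"{name:<10} {bar} {group_mean:.2f}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical measurements for two groups.
    data = {"strain_A": [10.2, 11.8, 9.5], "strain_B": [14.1, 15.3, 13.7]}
    print(render_figure(data))
```

Because the chart is derived on demand, changing the data or the code changes the figure, which is what opens up the definition of a figure.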
LIVING FIGURES
Colomb J and Brembs B.
Sub-strains of Drosophila Canton-S differ markedly in their locomotor behavior [v1; ref status: indexed, http://f1000r.es/3is]
F1000Research 2014, 3:176
Other labs can attempt to replicate the study and then submit their data directly onto the figure in the article (with associated metadata).
Provides a new way to show reproducibility attempts and could change fundamentally what an article is.
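A living figure of this kind could be modelled as a collection that accepts new datasets only when they arrive with their metadata, and then summarises every contribution side by side. This is a hypothetical data structure for illustration, not the mechanism F1000Research uses; the required metadata fields and the example values are assumptions.

```python
from dataclasses import dataclass, field
from statistics import mean

# Hypothetical minimum metadata for a replication submission.
REQUIRED_METADATA = ("lab", "date", "protocol")


@dataclass
class LivingFigure:
    title: str
    datasets: list = field(default_factory=list)

    def submit(self, values, metadata):
        """Add a replication dataset; reject it if metadata is incomplete."""
        missing = [key for key in REQUIRED_METADATA if key not in metadata]
        if missing:
            raise ValueError(f"metadata is missing: {', '.join(missing)}")
        if not values:
            raise ValueError("a submission needs at least one data point")
        self.datasets.append(
            {"values": list(values), "metadata": dict(metadata)}
        )

    def summary(self):
        """Return (lab, sample size, mean) for every dataset on the figure."""
        return [
            (d["metadata"]["lab"], len(d["values"]), mean(d["values"]))
            for d in self.datasets
        ]


if __name__ == "__main__":
    # Invented labs and measurements.
    figure = LivingFigure("Walking speed by lab")
    figure.submit([10.0, 12.0], {"lab": "Original lab", "date": "2014-01-01",
                                 "protocol": "v1"})
    figure.submit([11.0, 13.0, 12.0], {"lab": "Replicating lab",
                                       "date": "2014-06-01", "protocol": "v1"})
    for lab, n, average in figure.summary():
        print(f"{lab}: n={n}, mean={average:.2f}")
```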