
Revolutionising the Journal through Big Data Computational Research


Amye Kenall, Journal Development Manager, Open Data


DataCite Annual Conference, Inist-CNRS

Vandoeuvre-lès-Nancy, France, 26 August 2014

Who are we?

• Founded in 2000 (bought by Springer in 2008)

• Publish over 260 open access journals

• ~25,000 peer reviewed research articles published annually

• Genomics and computational biology are a significant fraction, e.g. Genome Biology, BMC Genomics, BMC Bioinformatics

• Other key fields include Public Health / Global Health / Infectious Disease and Cancer

• All research articles are CC-BY licensed for reuse

• Since mid 2013, all data is covered by a CC0 rights waiver

Data reuse @BioMedCentral

• Availability of Data section and Data Citation

• Encourage use of ISA-TAB (especially GigaScience and BMC Research Notes)

• Authors of all journals are strongly encouraged to provide underlying datasets; this is required on a select number of journals (e.g. Genome Biology, Genome Medicine, GigaScience)

• CC0 + CC-BY 4.0 by default

In the works…

• Interactive tabular data

• DOIs for all additional files

• Searchability of additional files

• Data Citation clearly tagged in XML to aid harvesting, e.g. by the Data Citation Index


Journal, data-platform and database for large-scale data

In conjunction with


Linking and Citation


Publishing Reproducible Science: SOAPdenovo2, a case study


Lessons Learned?

• With enough work, results can be replicated with a push of a button.

• But a lot of work costs a lot of money! No one would pay an APC that reflects that cost.

• The process teaches you a huge amount about the study and provides a lot of information not present in the paper.

• Needs to happen before publication.


Reproducibility of computational research

• Computational research in principle should be easier to replicate/reproduce than bench studies

• However, practical issues get in the way

• Even if source code is shared, reproducing entire technical setup/porting software, gathering appropriate input data, rerunning analysis is a significant effort

• This means readers and even reviewers don’t bother

• We would like to reduce this ‘activation energy’
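One small piece of that setup effort can be automated. The following is a minimal sketch, not a method described in the talk: it assumes the analysis runs under Python and records the interpreter version, platform, and installed packages so a reader can see what the original environment looked like. The function name `capture_environment` is hypothetical.

```python
import json
import platform
from importlib import metadata


def capture_environment() -> dict:
    """Describe the current Python environment in a JSON-serialisable dict."""
    # Collect "name==version" strings for every installed distribution,
    # skipping any whose metadata lacks a name.
    packages = sorted(
        {
            f"{name}=={dist.version}"
            for dist in metadata.distributions()
            if (name := dist.metadata["Name"])
        }
    )
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "packages": packages,
    }


if __name__ == "__main__":
    # Print the record so it can be saved alongside the analysis outputs.
    print(json.dumps(capture_environment(), indent=2))
```

A record like this does not rebuild the environment by itself, but it gives reviewers a concrete list to compare against when a rerun behaves differently.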


Strong interest from potential partners


Key technologies

Partners + Technologies + Journal Article


Flexible management/deployment of packaged data/analysis suites using VM infrastructure

Complementary roles of publishers, academia, and cloud providers

• Publishers have a role in enforcement of community standards

• Public/academic databases can provide credible long term archiving for key data with a focus on curation and metadata standards

• Academic grid computing infrastructure can provide access for researchers to large-scale computing resources

• Commercial cloud providers universalize/democratize access to large-scale computing. Even if you are not at an institution with its own facilities, you can carry out high-end computations. No bureaucracy/politics: simply pay per CPU-hour.

Specific challenges with respect to data

• To what extent can/should datasets be included in the VM/suite or pulled in externally?

• How can we avoid the costliness of moving data around, as it gets bigger and bigger?

• To what extent are cross-domain standards for referring to and pulling in underlying datasets feasible? Dataset DOIs typically point to metadata rather than to the data itself.

• Multiple versions of datasets. To what extent is it practical, when dealing with evolving datasets/databases, to make them available as reproducible snapshots?

• Culture of data sharing. How to get authors to share their data?
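To make the snapshot question concrete, here is a minimal sketch of one possible approach, assuming a dataset is a directory of files: a manifest records each file's size and SHA-256 checksum, so a later copy can be checked against the version an article used. The function names and manifest layout are hypothetical illustrations, not something described in the talk.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path, version: str) -> dict:
    """Record every file under root with its size and checksum."""
    files = {
        p.relative_to(root).as_posix(): {
            "bytes": p.stat().st_size,
            "sha256": sha256_of(p),
        }
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }
    return {"version": version, "files": files}


def verify(root: Path, manifest: dict) -> list:
    """Return relative paths that are missing or differ from the manifest."""
    changed = []
    for rel, meta in manifest["files"].items():
        target = root / rel
        if not target.is_file() or sha256_of(target) != meta["sha256"]:
            changed.append(rel)
    return changed


if __name__ == "__main__":
    # Demonstrate on a throwaway directory with a single small file.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "reads.txt").write_bytes(b"ACGT\n")
        manifest = build_manifest(root, "v1")
        print(verify(root, manifest))  # nothing has changed yet
        (root / "reads.txt").write_bytes(b"ACGA\n")
        print(verify(root, manifest))  # the edited file is reported
```

A manifest of this kind is small enough to travel with an article or a VM image even when the data stays in an external archive. It does not address the cost of moving the data.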

Conclusions

• With big data and computational tools, research is becoming more “reproducible/reusable”

• The infrastructure is out there; we need to do a better job of using it

• What authors need to communicate their research is also changing, and as publishers we must respond

• Clearly, publishers have a role, with other organisations, in setting some community standards

• It took a few hundred years, but publishing is now getting exciting


Questions?

“One reason that the worldwide web worked was because people reused each other’s content in ways never imagined or achieved by those who created it. The same will be true of open data.”

– Tim Berners-Lee and Nigel Shadbolt, The Times, New Year’s Eve 2011

Amye Kenall, Journal Development Manager (Open Data), BioMed Central

@AmyeKenall (also @OpenDataBMC)