Importance and Challenges of Reproducible Research

Vladimir Kanchev [email protected]

May 2016 © 2016 IEEE


[Figure; source: http://www.software.ac.uk/blog/2014-03-21-reproducible-research-impossible-dream]

Agenda

1. Personal introduction
2. Introduction to Reproducible Research (RR)
3. Software tools
4. The context – personal experience
5. The situation in Bulgaria and abroad
6. Additional resources for RR
7. Discussion


Personal Introduction

• Defense of my Ph.D. thesis at TU-Sofia is pending
• Research in image/MR image segmentation
• Publications in peer-reviewed journals
• Some experience in industry


Introduction to Reproducible Research: Definitions

Reproducible Research (RR) is an approach aiming at complementing classical printed scientific articles with everything required to independently reproduce the results they present *. "Everything" covers:

• data
• computer codes
• a precise description of how the code was applied to the data

* Delescluse, Matthieu, et al. "Making neurophysiological data analysis reproducible: Why and how?" Journal of Physiology-Paris 106.3 (2012): 159-170.

Another definition (signal processing): An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment and the complete set of instructions which generated the figures * (D. Donoho).

* D. Donoho et al., "Reproducible Research in Computational Harmonic Analysis," Computing in Science & Eng., vol. 11, no. 1, 2009, pp. 8–18

• Replication – independent researchers collect new data to verify a published finding * (Roger Peng). It is considered the scientific gold standard.

• Reproduction – independent researchers analyze the same data and obtain the same result * (Roger Peng). The focus is on the validity of the data analysis.

* http://simplystatistics.org/2011/12/02/reproducible-research-in-computational-science/
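The reproduction check described above (same data, same analysis, same result) can be sketched in a few lines. This is only an illustration under stated assumptions: the `analyze` function, the sample values, and the idea of publishing a digest of the result are hypothetical and do not come from the slides.

```python
import hashlib
import json
import statistics


def analyze(values):
    """Hypothetical analysis step: summary statistics of the data."""
    return {
        "n": len(values),
        "mean": statistics.fmean(values),
        "stdev": statistics.stdev(values),
    }


def result_digest(result):
    """Hash a result in canonical JSON form so two runs can be compared."""
    canonical = json.dumps(result, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def is_reproduced(published_digest, data):
    """Re-run the analysis on the given data and compare digests."""
    return result_digest(analyze(data)) == published_digest


data = [2.0, 4.0, 4.0, 5.0]   # illustrative values, not real measurements
published = result_digest(analyze(data))

print(is_reproduced(published, data))           # same data, same analysis
print(is_reproduced(published, data + [9.0]))   # different data
```

A digest comparison demands bit-identical results. Floating-point output can differ slightly between platforms and library versions, so comparing numbers within a tolerance is often the more practical choice.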

[Figure; source: Peng, R. D. (2011). Reproducible research in computational science. Science, 334(6060), 1226.]

Introduction to Reproducible Research: History

The RR "movement" grew out of what economists have been calling replication since the early 1980s and developed into what is now called reproducible research in computational data analysis. Currently, it is influenced by the open science and open source movements.

Introduction to Reproducible Research: Relation to scientific method

Steps of a scientific method *:
1. Define a question
2. Observe – gather information and resources
3. Form an explanatory hypothesis
4. Test the hypothesis by performing an experiment and collecting data in a reproducible manner
5. Analyze the data
6. Interpret the data and draw a conclusion
7. Publish results
8. Retest (reproduce) by other researchers

* Crawford S, Stucki L (1990), "Peer review and the changing research record", J Am Soc Info Science, vol. 41, pp. 223–228

The steps related to Reproducible Research are set in italic type on the original slide.

[Figure; source: https://scischol102.wordpress.com/category/science/]

Principles of a scientific method:
1. Empirically testable
2. Replicable
3. Objective
4. Transparent
5. Falsifiable
6. Logically consistent

Introduction to Reproducible Research: Scheme

[Figure: scheme, modified from http://www.biostat.jhsph.edu/~rpeng/research.html]

Introduction to Reproducible Research: Current situation

Current situation with RR in different fields:
• Medicine (cancer research), social sciences (psychology), etc.: a replication/reproducibility crisis, in which the results of many scientific experiments are difficult or impossible to replicate
• Natural sciences
• Computer science

[Figure; source: Baker, M. (2016). 1,500 scientists lift the lid on reproducibility. Nature, 533(7604), 452-454.]

Reproducibility in medical imaging, computer vision, and machine learning:
• Public test sets are available
• Code for most methods is available (papers from major conferences and journals)
• High pressure and workload on researchers to make their work reproducible
• Benchmark comparison with other methods is compulsory
• Experiment automation
• Differences between the medical imaging field and the computer vision and machine learning fields

Example: IPOL journal

Introduction to Reproducible Research: Reasons

Reasons for the reproducibility/replication crisis:
• "Publish or perish" culture – pressure to obtain publishable results
• Reluctance to make method code public – improving its quality takes additional time and effort
• Most graduate students outside CS are not taught software engineering and statistics

[Figure; source: Baker, M. (2016). 1,500 scientists lift the lid on reproducibility. Nature, 533(7604), 452-454.]

Other problems:
• Insufficient description of the experiment in publications
• Test datasets and method code are not publicly available – common in social sciences
• The statistical methods used are open to malpractice – p-hacking (data dredging), failing to report non-significant tests, and including or excluding data points or results until the desired result is achieved
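Why unreported tests matter can be shown with a short calculation. The sketch below assumes the tests are independent and each uses the same significance level; the function names are illustrative, and the Bonferroni correction is one common remedy that is not mentioned on the slides.

```python
def familywise_error_rate(alpha, n_tests):
    """Chance of at least one false positive among independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that keeps the overall rate near alpha."""
    return alpha / n_tests


# With alpha = 0.05, running more tests quickly inflates the error rate.
for n_tests in (1, 5, 20):
    rate = familywise_error_rate(0.05, n_tests)
    print(n_tests, round(rate, 3))
```

With 20 independent tests at the 0.05 level, the chance of at least one spurious "significant" result is about 0.64, which is why reporting only the significant tests is misleading.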

Problems with method code:
• Reproducibility issues – missing data and code, errors in the code, not all figures and tables can be reproduced
• Documentation issues – missing README file, poor code documentation
• Programming style issues – poor coding style

[Figure; source: Wolkovich, E. M., Regetz, J., & O'Connor, M. I. (2012). Advances in global change research require open science by individual researchers. Global Change Biology, 18(7), 2102-2110.]

Introduction to Reproducible Research: Guidance (Biostatistics journal)

Authors should provide all data and code needed to reproduce all results, images, and tables, covering:

• README file
• Consistent coding style and documentation
• Test data sets
• Simulations and random numbers
• General advice

* Peng, R. D. (2009). Reproducible research and biostatistics. Biostatistics, 10(3), 405-408.
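For the "simulations and random numbers" item, a common practice is to fix and report the random seed. The following is a minimal sketch using only the Python standard library; the simulation itself, the function name, and the seed value are hypothetical.

```python
import random


def simulate_mean(seed, n=1000):
    """Hypothetical simulation: mean of n standard normal draws."""
    rng = random.Random(seed)   # local generator, no hidden global state
    draws = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return sum(draws) / n


SEED = 20160501   # arbitrary value; report it together with the results

first = simulate_mean(SEED)
second = simulate_mean(SEED)
print(first == second)   # the same seed gives the same sequence of draws
```

Passing the seed explicitly, rather than seeding a global generator once, keeps each simulation repeatable even when other code also draws random numbers.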


Software tools

Recommended tools for achieving reproducibility:
• LaTeX (TeX editor)
• Version control systems – Git
• Make – pipeline automation

Literate programming concept (Knuth).
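The idea behind a Make-style pipeline is that an output is rebuilt only when it is missing or older than its inputs. The sketch below imitates that rule in Python for illustration; it is not Make itself, and the file names and the `make_figure` action are hypothetical.

```python
import os
import tempfile


def is_stale(target, dependencies):
    """Return True if target is missing or older than any dependency."""
    if not os.path.exists(target):
        return True
    target_time = os.path.getmtime(target)
    return any(os.path.getmtime(dep) > target_time for dep in dependencies)


def build(target, dependencies, action):
    """Run action only when the target needs rebuilding; report if it ran."""
    if is_stale(target, dependencies):
        action()
        return True
    return False


with tempfile.TemporaryDirectory() as workdir:
    data = os.path.join(workdir, "data.csv")        # hypothetical input
    figure = os.path.join(workdir, "figure.txt")    # hypothetical output
    with open(data, "w") as handle:
        handle.write("x,y\n1,2\n")

    def make_figure():
        with open(figure, "w") as handle:
            handle.write("figure built from data.csv\n")

    ran_first = build(figure, [data], make_figure)    # target missing

    older = os.path.getmtime(figure) - 10
    os.utime(data, (older, older))                    # input older than output
    ran_second = build(figure, [data], make_figure)   # nothing to do

    newer = os.path.getmtime(figure) + 10
    os.utime(data, (newer, newer))                    # input changed afterwards
    ran_third = build(figure, [data], make_figure)    # rebuilt

print(ran_first, ran_second, ran_third)
```

Encoding every figure and table as a target with explicit inputs means the whole paper can be regenerated with one command, and only the affected steps run again after a change.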

Matlab programming language:
• Matlab File Exchange
• Proprietary Matlab toolboxes – disadvantages
• Examples of RR toolboxes – Wavelab, Sparselab
• Matlab publish – no literate programming support

R programming language:
• RStudio – development environment for the R programming language
• Graphics packages, such as ggplot2
• Packages such as knitr or rmarkdown – literate programming support

Python programming language:
• Many open scientific libraries available – scipy, numpy, etc.
• IPython notebook
• Sumatra package – saves parameter values, code state, output results and files
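The kind of record listed for Sumatra (parameter values, code state, outputs) can be illustrated with the standard library alone. The sketch below does not use Sumatra's API; the function names, the parameters, and the stand-in code and output strings are all hypothetical.

```python
import hashlib
import json
import platform
import sys


def sha256_of_text(text):
    """Fingerprint a piece of text, e.g. a source file or a result file."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def make_run_record(parameters, code_text, output_text):
    """Collect what is needed to repeat and check one run."""
    return {
        "parameters": parameters,
        "code_sha256": sha256_of_text(code_text),
        "output_sha256": sha256_of_text(output_text),
        "python_version": platform.python_version(),
        "platform": sys.platform,
    }


record = make_run_record(
    parameters={"threshold": 0.5, "iterations": 100},   # hypothetical values
    code_text="def segment(image): ...",                 # stand-in for source
    output_text="result=placeholder",                    # stand-in for results
)
print(json.dumps(record, indent=2, sort_keys=True))
```

Storing one such record per run, next to the outputs, makes it possible to tell later which parameters and which version of the code produced a given figure.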

[Figure; source: ISMB/ECCB 2013 Keynote]

The context – personal experience

Making a research project reproducible only at the end of the process is not the best way …

[Figure; source: http://www.idiap.ch/~marcel/professional/BTAS_SS_2015.html]

Difficulties with:
• Exact reproduction of all figures and results
• Setting exact parameter values
• Finding time to improve code quality and add documentation

Motivation for achieving reproducibility:
• Better visibility of research
• More citations and higher impact
• Increased trust in research quality (outside academia, e.g. from industry)
• Help from readers of the publication in improving the developed method


The situation in Bulgaria and abroad

RR in Bulgaria:
• Its introduction in the scientific community is still at an early stage
• Its principles need to be taught at undergraduate and graduate level
• Paper code and test datasets are generally not available online in most fields

Advances in RR implementation would:
• Increase the impact of research conducted by Bulgarian researchers abroad
• Improve reputation and applicability, especially among people from industry
• Allow quality work to be recognized faster and lower-quality papers to improve steadily
• Allow researchers to profit from the fast development of scientific computing, machine learning, data science, and AI
• Attract more bright young people to research (open source movement and open data)

RR abroad:
• A major issue in social and biomedical sciences
• An important criterion for manuscript evaluation by reviewers in many CS fields
• One of the major requirements of funding agencies abroad for the evaluation of project proposals


Additional resources for research and RR methods

MOOC courses:
1. Data Science Specialization (www.coursera.org) (Johns Hopkins University) – course 5, Reproducible Research
2. Methods and Statistics in Social Sciences Specialization (www.coursera.org) (University of Amsterdam)
3. Research Methods: An Engineering Approach (www.edx.org) (Wits University)
4. Research Data Management and Sharing (www.coursera.org) (The University of North Carolina at Chapel Hill & The University of Edinburgh)

Software tools for RR:
1. Software Carpentry (www.Software-carpentry.org) – basic computing skills for researchers
2. Bootcamps – one- or two-day courses teaching coding and professional skills for researchers
3. MOOC courses – www.coursera.org, www.edx.org, www.udacity.org – for programming skills in R, Python, Matlab

Books:
1. Stodden, V., Leisch, F., & Peng, R. D. (Eds.) (2014). Implementing Reproducible Research. CRC Press
2. Gandrud, C. (2013). Reproducible Research with R and R Studio. CRC Press
3. Subramanian, G. (2015). Python Data Science Cookbook. Packt Publishing Ltd
4. Milovanovic, I., Foures, D., & Vettigli, G. (2015). Python Data Visualization Cookbook. Packt Publishing Ltd


Discussion

Topics for discussion:
• What do you think about reproducibility in general?
• Have you already encountered RR in your work?
• How might the application of reproducibility impact your work as researchers, engineers, or programmers?


End