Reproducible Research for OMNeT++ Based on Python and …€¦ · · 2017-09-07How to Guarantee Match Between Input Files and Output Data •Online generation of results •Include

Reproducible Research for OMNeT++Based on Python and Pweave

Kyeong Soo (Joseph) Kim

Department of Electrical and Electronic Engineering

Xi’an Jiaotong-Liverpool University

07 September 2017

Outline

• Reproducible Research

• Python and Pweave

• Reproducible Research for OMNeT++

• Example: OMNeT++ FIFO Simulation

Reproducible Research

Reproducible Research

• Reproducible research is a key to any scientific method and ensures repeating an experiment and the results of its analysis in any place with any person.

• A study can be truly reproducible when it satisfies at least the following three criteria:• All experimental methods are fully reported.

• All data and files used for the analysis are (publicly) available.

• The process of analyzing raw data is well reported and preserved.

• Reproducible research is to ensure• Same data + Same script = Same results

Why Do We Need Reproducible Research:Two Examples

• LIGO - Gravitational Wave Detection

• Schön scandal - Molecular Computing

LIGO - Gravitational Wave Detection

• The Laser Interferometer Gravitational-Wave Observatory (LIGO) is a large-scale physics experiment and observatory to detect cosmic gravitational waves.• The detection of gravitational wave

was reported in Physical Review Letters in Feb. 2016, together with ipython notebook with analysis code and data.

https://en.wikipedia.org/wiki/LIGO

https://losc.ligo.org/s/events/GW150914/GW150914_tutorial.html

Schön Scandal - Molecular Computing

• No records found for his ground-breaking experimental results, including lab notebook, experimental samples and data, hard disk drives.

• During the investigation, he kept repeating“I clearly observed them in the Lab but …”

Python and Pweave

R/Sweave to Python/Pweave

• Until recently, R was the language of choice for statistical processing and data analysis.• Still, R has the largest code base for a wide variety of statistical and graphical

techniques.

• Like ipython (now jupyter), R provides a nice tool called Sweave (now replaced by knitr) to weave documentation and the results of the execution of R code chunks into one source file for integrated documentation.

• Python one of the most popular languages in scientific computing, including artificial intelligence & machine learning recently takes over R in statistical processing and data analysis as well.• Thanks to pandas implementing DataFrame object similar to R and Pweave,

python can replace R for most statistical and data analysis tasks, while retaining its many advantages over R (i.e., fully-featured programming language with easy syntax and higher speed).

http://pandas.pydata.org/

http://mpastell.com/pweave/

Snippets of R Source Codeand Sweave File for LaTeX

Pweave source file (“*.Plw”)• Mix of documentation (e.g.,

LaTeX) and code Chunks (e.g., Python)

Weaving (pweave) Tangling (ptangle)

Document 1

Result of the Execution of Code Chunk 1

Document N

Result of the Execution of Code Chunk N

Code Chunk 1

Code Chunk N

A program source file for separate execution and debugging• e.g., “*.py” for Python

A documentation source file• e.g., “*.tex” for LaTeX

Weaving Example: Automatic Table Generation

Weaving & LaTeXing

Reproducible Research for OMNeT++

How to Deal with Simulation Input Files

• Include them the document.• OK for small simulations

• Use a snapshot of the whole configurations.• e.g., git commit hashes

How to Guarantee Match BetweenInput Files and Output Data

• Online generation of results• Include simulation execution code within a document

• Refer to the provided sample Pweave file.

• OK for smaller simulations, but not for larger simulations.

• Use a snapshot of the whole configurations and data• e.g., git commit hashes

• Version controlling output data together with source code and input configuration files, however, may greatly increase the size of a repository.

How to Present and Analyze Output Data

• Unstacking of stacked DataFrame• Use pivot function (see the example shown here).

• Aggregated processing of measurement data over independent variables• Use pivot_table function.• Useful for the calculation of mean and confidence

intervals over multiple iterations.

• Online calculation of confidence intervals• Confidence intervals (CIs) can be calculated by assigning

a custom function for CI to aggfunc paramenter of pivot_table function.

• Now pandas support error bars in its own plot functions.

Example: OMNeT++ FIFO Simulation

Documents

Reproducible Research for OMNeT++ Based on Python and …€¦ · · 2017-09-07How to Guarantee Match Between Input Files and Output Data •Online generation of results •Include