47
Collecting and Analyzing Provenance on Interactive Notebooks: when IPython meets noWorkflow João Felipe Nicolaci Pimentel (UFF), Juliana Freire (NYU), Leonardo Murta (UFF), Vanessa Braganholo (UFF)

on Interactive Notebooks: when IPython meets noWorkflowworkshops.inf.ed.ac.uk/tapp2015/TAPP15_III_2_slides.pdf · on Interactive Notebooks: when IPython meets noWorkflow João Felipe

  • Upload
    others

  • View
    23

  • Download
    0

Embed Size (px)

Citation preview

Page 1: on Interactive Notebooks: when IPython meets noWorkflowworkshops.inf.ed.ac.uk/tapp2015/TAPP15_III_2_slides.pdf · on Interactive Notebooks: when IPython meets noWorkflow João Felipe

Collecting and Analyzing Provenance on Interactive Notebooks: when

IPython meets noWorkflow

João Felipe Nicolaci Pimentel (UFF),Juliana Freire (NYU), Leonardo Murta (UFF),

Vanessa Braganholo (UFF)

Page 2: on Interactive Notebooks: when IPython meets noWorkflowworkshops.inf.ed.ac.uk/tapp2015/TAPP15_III_2_slides.pdf · on Interactive Notebooks: when IPython meets noWorkflow João Felipe

Outline

• Motivation– Exploratory research

– Example

– Interactive Notebooks

– Example

– Provenance Limitations

• Approach– Provenance Collection

– Provenance Analysis

• Conclusion

Collecting and Analyzing Provenance on Interactive Notebooks: when IPython meets noWorkflow

João Felipe Nicolaci Pimentel 2

Page 3: on Interactive Notebooks: when IPython meets noWorkflowworkshops.inf.ed.ac.uk/tapp2015/TAPP15_III_2_slides.pdf · on Interactive Notebooks: when IPython meets noWorkflow João Felipe

Motivation

Collecting and Analyzing Provenance on Interactive Notebooks: when IPython meets noWorkflow

João Felipe Nicolaci Pimentel 3

Page 4: on Interactive Notebooks: when IPython meets noWorkflowworkshops.inf.ed.ac.uk/tapp2015/TAPP15_III_2_slides.pdf · on Interactive Notebooks: when IPython meets noWorkflow João Felipe

Exploratory research

Collecting and Analyzing Provenance on Interactive Notebooks: when IPython meets noWorkflow

Implement or Change

Execution

Analysis

Hypothesis

João Felipe Nicolaci Pimentel 4

Page 5: on Interactive Notebooks: when IPython meets noWorkflowworkshops.inf.ed.ac.uk/tapp2015/TAPP15_III_2_slides.pdf · on Interactive Notebooks: when IPython meets noWorkflow João Felipe

Example of exploratory research

• Analyze precipitation data from Rio de Janeiro

• Hypothesis: “The precipitation for each month remains constant across years”

• Data: 2013, 2014 [BDMEP]

João Felipe Nicolaci Pimentel 5Collecting and Analyzing Provenance on Interactive

Notebooks: when IPython meets noWorkflow

Implement or Change

Execution

Analysis

Hypothesis

Page 6: on Interactive Notebooks: when IPython meets noWorkflowworkshops.inf.ed.ac.uk/tapp2015/TAPP15_III_2_slides.pdf · on Interactive Notebooks: when IPython meets noWorkflow João Felipe

Collecting and Analyzing Provenance on Interactive Notebooks: when IPython meets noWorkflow

João Felipe Nicolaci Pimentel 6

1| import numpy as np2| import matplotlib.pyplot as plt3| from precipitation import read, prepare4|5| def bar_graph(years):6| global PREC, MONTHS7| prepare(PREC, MONTHS, years, plt)8| plt.savefig("out.png")9|10| MONTHS = np.arange(12) + 111| d13, d14 = read('p13.dat'), read('p14.dat')12| PREC = prec13, prec14 = [], []14| for i in MONTHS:15| prec13.append(sum(d13[i]))16| prec14.append(sum(d14[i]))18| bar_graph(['2013', '2014'])

Page 7: on Interactive Notebooks: when IPython meets noWorkflowworkshops.inf.ed.ac.uk/tapp2015/TAPP15_III_2_slides.pdf · on Interactive Notebooks: when IPython meets noWorkflow João Felipe

1st Iteration

• $ python experiment.py

• $ display out.png

• Analysis: “Drought in 2014”

• New Hypothesis:

• Data: 2012, 2013, 2014 [BDMEP]

João Felipe Nicolaci Pimentel 7Collecting and Analyzing Provenance on Interactive

Notebooks: when IPython meets noWorkflow

“The precipitation for each month remains constant across years if there is no drought”

Page 8: on Interactive Notebooks: when IPython meets noWorkflowworkshops.inf.ed.ac.uk/tapp2015/TAPP15_III_2_slides.pdf · on Interactive Notebooks: when IPython meets noWorkflow João Felipe

Collecting and Analyzing Provenance on Interactive Notebooks: when IPython meets noWorkflow

João Felipe Nicolaci Pimentel 8

10| MONTHS = np.arange(12) + 111| d12 = read('p12.dat') 12| d13, d14 = read('p13.dat'), read('p14.dat')13| PREC = prec12, prec13, prec14 = [], [], []14|15| for i in MONTHS:16| prec12.append(sum(d12[i]))17| prec13.append(sum(d13[i]))18| prec14.append(sum(d14[i]))19|20| bar_graph(['2012', '2013', '2014'])

2nd Iteration

Page 9: on Interactive Notebooks: when IPython meets noWorkflowworkshops.inf.ed.ac.uk/tapp2015/TAPP15_III_2_slides.pdf · on Interactive Notebooks: when IPython meets noWorkflow João Felipe

2nd Iteration

• $ python experiment.py

• $ display out.png

• Analysis: “2012 was similar to 2013”

• Cycle continues

João Felipe Nicolaci Pimentel 9Collecting and Analyzing Provenance on Interactive

Notebooks: when IPython meets noWorkflow

Page 10: on Interactive Notebooks: when IPython meets noWorkflowworkshops.inf.ed.ac.uk/tapp2015/TAPP15_III_2_slides.pdf · on Interactive Notebooks: when IPython meets noWorkflow João Felipe

• Documents

– Text, code, plots, rich media

– Share the documents with results

• Most famous:

– IPython Notebook, knitr

• IPython Notebook has more than 500,000 active users

• Good for exploratory research

Interactive Notebooks

Collecting and Analyzing Provenance on Interactive Notebooks: when IPython meets noWorkflow

João Felipe Nicolaci Pimentel 10

Page 11: on Interactive Notebooks: when IPython meets noWorkflowworkshops.inf.ed.ac.uk/tapp2015/TAPP15_III_2_slides.pdf · on Interactive Notebooks: when IPython meets noWorkflow João Felipe

Collecting and Analyzing Provenance on Interactive Notebooks: when IPython meets noWorkflow

João Felipe Nicolaci Pimentel 11

Example - 1st Iteration

Page 12: on Interactive Notebooks: when IPython meets noWorkflowworkshops.inf.ed.ac.uk/tapp2015/TAPP15_III_2_slides.pdf · on Interactive Notebooks: when IPython meets noWorkflow João Felipe

Collecting and Analyzing Provenance on Interactive Notebooks: when IPython meets noWorkflow

João Felipe Nicolaci Pimentel 12

Page 13: on Interactive Notebooks: when IPython meets noWorkflowworkshops.inf.ed.ac.uk/tapp2015/TAPP15_III_2_slides.pdf · on Interactive Notebooks: when IPython meets noWorkflow João Felipe

Collecting and Analyzing Provenance on Interactive Notebooks: when IPython meets noWorkflow

João Felipe Nicolaci Pimentel 13

Page 14: on Interactive Notebooks: when IPython meets noWorkflowworkshops.inf.ed.ac.uk/tapp2015/TAPP15_III_2_slides.pdf · on Interactive Notebooks: when IPython meets noWorkflow João Felipe

Collecting and Analyzing Provenance on Interactive Notebooks: when IPython meets noWorkflow

João Felipe Nicolaci Pimentel 14

Page 15: on Interactive Notebooks: when IPython meets noWorkflowworkshops.inf.ed.ac.uk/tapp2015/TAPP15_III_2_slides.pdf · on Interactive Notebooks: when IPython meets noWorkflow João Felipe

Collecting and Analyzing Provenance on Interactive Notebooks: when IPython meets noWorkflow

João Felipe Nicolaci Pimentel 15

Page 16: on Interactive Notebooks: when IPython meets noWorkflowworkshops.inf.ed.ac.uk/tapp2015/TAPP15_III_2_slides.pdf · on Interactive Notebooks: when IPython meets noWorkflow João Felipe

Collecting and Analyzing Provenance on Interactive Notebooks: when IPython meets noWorkflow

João Felipe Nicolaci Pimentel 16

Page 17: on Interactive Notebooks: when IPython meets noWorkflowworkshops.inf.ed.ac.uk/tapp2015/TAPP15_III_2_slides.pdf · on Interactive Notebooks: when IPython meets noWorkflow João Felipe

Exploratory research

Collecting and Analyzing Provenance on Interactive Notebooks: when IPython meets noWorkflow

Change

Execution

Analysis

Hypothesis

João Felipe Nicolaci Pimentel 17

• Change existing cells• Add new one

• Execute cells

• Observe the results• Use code for analysis

InteractiveNotebooks

Page 18: on Interactive Notebooks: when IPython meets noWorkflowworkshops.inf.ed.ac.uk/tapp2015/TAPP15_III_2_slides.pdf · on Interactive Notebooks: when IPython meets noWorkflow João Felipe

1. Which version of matplotlib, numpy and precipitation module is it using? Will it work on another environment?

Provenance Limitations

Collecting and Analyzing Provenance on Interactive Notebooks: when IPython meets noWorkflow

João Felipe Nicolaci Pimentel 18

Page 19: on Interactive Notebooks: when IPython meets noWorkflowworkshops.inf.ed.ac.uk/tapp2015/TAPP15_III_2_slides.pdf · on Interactive Notebooks: when IPython meets noWorkflow João Felipe

Provenance Limitations

Collecting and Analyzing Provenance on Interactive Notebooks: when IPython meets noWorkflow

João Felipe Nicolaci Pimentel 19

2. The cell [3] were replaced by [5]. Where did the “prec13” and “prec14” come from? What changed from [3] to [5]?

Page 20: on Interactive Notebooks: when IPython meets noWorkflowworkshops.inf.ed.ac.uk/tapp2015/TAPP15_III_2_slides.pdf · on Interactive Notebooks: when IPython meets noWorkflow João Felipe

3. What happens inside the cell? What does read('p12.dat') return? How long did it take to execute each function?

Provenance Limitations

Collecting and Analyzing Provenance on Interactive Notebooks: when IPython meets noWorkflow

João Felipe Nicolaci Pimentel 20

Page 21: on Interactive Notebooks: when IPython meets noWorkflowworkshops.inf.ed.ac.uk/tapp2015/TAPP15_III_2_slides.pdf · on Interactive Notebooks: when IPython meets noWorkflow João Felipe

4. What is the content of 'p12.dat'?

Provenance Limitations

Collecting and Analyzing Provenance on Interactive Notebooks: when IPython meets noWorkflow

João Felipe Nicolaci Pimentel 21

Page 22: on Interactive Notebooks: when IPython meets noWorkflowworkshops.inf.ed.ac.uk/tapp2015/TAPP15_III_2_slides.pdf · on Interactive Notebooks: when IPython meets noWorkflow João Felipe

João Felipe Nicolaci Pimentel 22Collecting and Analyzing Provenance on Interactive

Notebooks: when IPython meets noWorkflow

Provenance of Python Scripts

• API: (Bochner; Gude; Schreiber, 2008)– Require API calls

• StarFlow (Angelino; Yamins; Seltzer, 2010)– Require annotations

• Sumatra (Davison, 2012)– Require version control system

• noWorkflow (Murta et al., 2014)– Transparent collection

• None supports notebooks

Page 23: on Interactive Notebooks: when IPython meets noWorkflowworkshops.inf.ed.ac.uk/tapp2015/TAPP15_III_2_slides.pdf · on Interactive Notebooks: when IPython meets noWorkflow João Felipe

• Transparently captures provenance of Python scripts – no changes required!

• Allows users to analyze provenance data

noWorkflow

João Felipe Nicolaci Pimentel 23Collecting and Analyzing Provenance on Interactive

Notebooks: when IPython meets noWorkflow

Page 24: on Interactive Notebooks: when IPython meets noWorkflowworkshops.inf.ed.ac.uk/tapp2015/TAPP15_III_2_slides.pdf · on Interactive Notebooks: when IPython meets noWorkflow João Felipe

Collecting and Analyzing Provenance on Interactive Notebooks: when IPython meets noWorkflow

João Felipe Nicolaci Pimentel 24

1| import numpy as np2| import matplotlib.pyplot as plt3| from precipitation import read, prepare4|5| def bar_graph(years):6| global PREC, MONTHS7| prepare(PREC, MONTHS, years, plt)8| plt.savefig("out.png")9|10| MONTHS = np.arange(12) + 111| d13, d14 = read('p13.dat'), read('p14.dat')12| PREC = prec13, prec14 = [], []14| for i in MONTHS:15| prec13.append(sum(d13[i]))16| prec14.append(sum(d14[i]))18| bar_graph(['2013', '2014'])

1.9.2

1.4.31.0.1

PATH = /home/joao/...PYTHON_VERSION = 2.7.6

Page 25: on Interactive Notebooks: when IPython meets noWorkflowworkshops.inf.ed.ac.uk/tapp2015/TAPP15_III_2_slides.pdf · on Interactive Notebooks: when IPython meets noWorkflow João Felipe

Collecting and Analyzing Provenance on Interactive Notebooks: when IPython meets noWorkflow

João Felipe Nicolaci Pimentel 25

1| import numpy as np2| import matplotlib.pyplot as plt3| from precipitation import read, prepare4|5| def bar_graph(years):6| global PREC, MONTHS7| prepare(PREC, MONTHS, years, plt)8| plt.savefig("out.png")9|10| MONTHS = np.arange(12) + 111| d13, d14 = read('p13.dat'), read('p14.dat')12| PREC = prec13, prec14 = [], []14| for i in MONTHS:15| prec13.append(sum(d13[i]))16| prec14.append(sum(d14[i]))18| bar_graph(['2013', '2014'])

PREC = [[…], […]]MONTHS = array(1,…,12)

p13.dat content b/ap14.dat content b/a

read(‘p14.dat’) -> {1: [7.1, 0.8, 0.0, …],2: …}

Page 26: on Interactive Notebooks: when IPython meets noWorkflowworkshops.inf.ed.ac.uk/tapp2015/TAPP15_III_2_slides.pdf · on Interactive Notebooks: when IPython meets noWorkflow João Felipe

noWorkflow

João Felipe Nicolaci Pimentel 26Collecting and Analyzing Provenance on Interactive

Notebooks: when IPython meets noWorkflow

Interactive Provenance

Interactive + Provenance

noW

Objective

Page 27: on Interactive Notebooks: when IPython meets noWorkflowworkshops.inf.ed.ac.uk/tapp2015/TAPP15_III_2_slides.pdf · on Interactive Notebooks: when IPython meets noWorkflow João Felipe

Approach

Collecting and Analyzing Provenance on Interactive Notebooks: when IPython meets noWorkflow

João Felipe Nicolaci Pimentel 27

Page 28: on Interactive Notebooks: when IPython meets noWorkflowworkshops.inf.ed.ac.uk/tapp2015/TAPP15_III_2_slides.pdf · on Interactive Notebooks: when IPython meets noWorkflow João Felipe

Collecting and Analyzing Provenance on Interactive Notebooks: when IPython meets noWorkflow

João Felipe Nicolaci Pimentel 28

Provenance Collection – 1 / 2

Page 29: on Interactive Notebooks: when IPython meets noWorkflowworkshops.inf.ed.ac.uk/tapp2015/TAPP15_III_2_slides.pdf · on Interactive Notebooks: when IPython meets noWorkflow João Felipe

Collecting and Analyzing Provenance on Interactive Notebooks: when IPython meets noWorkflow

João Felipe Nicolaci Pimentel 29

Provenance Collection – 2 / 2

Page 30: on Interactive Notebooks: when IPython meets noWorkflowworkshops.inf.ed.ac.uk/tapp2015/TAPP15_III_2_slides.pdf · on Interactive Notebooks: when IPython meets noWorkflow João Felipe

Collecting and Analyzing Provenance on Interactive Notebooks: when IPython meets noWorkflow

João Felipe Nicolaci Pimentel 30

Provenance Collection – 2 / 2

Page 31: on Interactive Notebooks: when IPython meets noWorkflowworkshops.inf.ed.ac.uk/tapp2015/TAPP15_III_2_slides.pdf · on Interactive Notebooks: when IPython meets noWorkflow João Felipe

João Felipe Nicolaci Pimentel 31Collecting and Analyzing Provenance on Interactive

Notebooks: when IPython meets noWorkflow

Provenance Analysis

• Call trial methods and properties

• Load Trial

• Trial visualization

• Perform SQL queries

• Perform Prolog queries

• Read file content before and after writing

• Advanced Analysis

Page 32: on Interactive Notebooks: when IPython meets noWorkflowworkshops.inf.ed.ac.uk/tapp2015/TAPP15_III_2_slides.pdf · on Interactive Notebooks: when IPython meets noWorkflow João Felipe

Call Trial Method

Collecting and Analyzing Provenance on Interactive Notebooks: when IPython meets noWorkflow

João Felipe Nicolaci Pimentel 32

1. Which version of precipitation module is it using?

Page 33: on Interactive Notebooks: when IPython meets noWorkflowworkshops.inf.ed.ac.uk/tapp2015/TAPP15_III_2_slides.pdf · on Interactive Notebooks: when IPython meets noWorkflow João Felipe

Load Trial

Collecting and Analyzing Provenance on Interactive Notebooks: when IPython meets noWorkflow

João Felipe Nicolaci Pimentel 33

2. What changed from [3] to [5]?

Page 34: on Interactive Notebooks: when IPython meets noWorkflowworkshops.inf.ed.ac.uk/tapp2015/TAPP15_III_2_slides.pdf · on Interactive Notebooks: when IPython meets noWorkflow João Felipe

Collecting and Analyzing Provenance on Interactive Notebooks: when IPython meets noWorkflow

João Felipe Nicolaci Pimentel 34

Provenance Visualization

3. What happened inside the cell?

Page 35: on Interactive Notebooks: when IPython meets noWorkflowworkshops.inf.ed.ac.uk/tapp2015/TAPP15_III_2_slides.pdf · on Interactive Notebooks: when IPython meets noWorkflow João Felipe

SQL Queries

Collecting and Analyzing Provenance on Interactive Notebooks: when IPython meets noWorkflow

João Felipe Nicolaci Pimentel 35

Page 36: on Interactive Notebooks: when IPython meets noWorkflowworkshops.inf.ed.ac.uk/tapp2015/TAPP15_III_2_slides.pdf · on Interactive Notebooks: when IPython meets noWorkflow João Felipe

Prolog Queries

Collecting and Analyzing Provenance on Interactive Notebooks: when IPython meets noWorkflow

João Felipe Nicolaci Pimentel 36

Page 37: on Interactive Notebooks: when IPython meets noWorkflowworkshops.inf.ed.ac.uk/tapp2015/TAPP15_III_2_slides.pdf · on Interactive Notebooks: when IPython meets noWorkflow João Felipe

Read file content

Collecting and Analyzing Provenance on Interactive Notebooks: when IPython meets noWorkflow

João Felipe Nicolaci Pimentel 37

4. What is the content of 'p12.dat'?

Page 38: on Interactive Notebooks: when IPython meets noWorkflowworkshops.inf.ed.ac.uk/tapp2015/TAPP15_III_2_slides.pdf · on Interactive Notebooks: when IPython meets noWorkflow João Felipe

Advanced Analysis

• Combine:

– Python code

– SQL queries

– Prolog queries

– File content

– External tools

João Felipe Nicolaci Pimentel 38Collecting and Analyzing Provenance on Interactive

Notebooks: when IPython meets noWorkflow

Page 39: on Interactive Notebooks: when IPython meets noWorkflowworkshops.inf.ed.ac.uk/tapp2015/TAPP15_III_2_slides.pdf · on Interactive Notebooks: when IPython meets noWorkflow João Felipe

Limitations

• Capture one cell at a time.

– It is necessary to repeat “%%now_run” for every cell

• No Out [x]

– The Out [x] is replaced by the trial object

• No IPython superset

– It is not possible to invoke other special commands

João Felipe Nicolaci Pimentel 39Collecting and Analyzing Provenance on Interactive

Notebooks: when IPython meets noWorkflow

Page 40: on Interactive Notebooks: when IPython meets noWorkflowworkshops.inf.ed.ac.uk/tapp2015/TAPP15_III_2_slides.pdf · on Interactive Notebooks: when IPython meets noWorkflow João Felipe

Related Work

• Ducktape (Wibisono et al., 2014)

– Use notebook only for interactive provenance visualization

• Lancet (Stevens, Elver, Bednar, 2013)

– Requires definition of special launchers to capture provenance

– Steep learning curve

João Felipe Nicolaci Pimentel 40Collecting and Analyzing Provenance on Interactive

Notebooks: when IPython meets noWorkflow

Page 41: on Interactive Notebooks: when IPython meets noWorkflowworkshops.inf.ed.ac.uk/tapp2015/TAPP15_III_2_slides.pdf · on Interactive Notebooks: when IPython meets noWorkflow João Felipe

Conclusion

Collecting and Analyzing Provenance on Interactive Notebooks: when IPython meets noWorkflow

• Mechanism to collect and analyze provenance from IPython Notebooks

– Invoke noWorkflow through special functions

– Analytic tools: SQL queries, Prolog queries, object properties, graphs

• noWorkflow tracks history, environment, intermediate results and files

• Reproducible notebook!

João Felipe Nicolaci Pimentel 41

Page 42: on Interactive Notebooks: when IPython meets noWorkflowworkshops.inf.ed.ac.uk/tapp2015/TAPP15_III_2_slides.pdf · on Interactive Notebooks: when IPython meets noWorkflow João Felipe

Future Work

• New visualization methods for Provenance

– Dependency graph

– Diff visualization

• Integration with Pandas

– Improve data analysis

• Collect provenance from other languages supported by Jupyter

João Felipe Nicolaci Pimentel 42Collecting and Analyzing Provenance on Interactive

Notebooks: when IPython meets noWorkflow

Page 43: on Interactive Notebooks: when IPython meets noWorkflowworkshops.inf.ed.ac.uk/tapp2015/TAPP15_III_2_slides.pdf · on Interactive Notebooks: when IPython meets noWorkflow João Felipe

Collecting and Analyzing Provenance on Interactive Notebooks: when

IPython meets noWorkflow

João Felipe Nicolaci Pimentel (UFF),Juliana Freire (NYU), Leonardo Murta (UFF),

Vanessa Braganholo (UFF)

https://github.com/gems-uff/noworkflow [email protected]

Page 44: on Interactive Notebooks: when IPython meets noWorkflowworkshops.inf.ed.ac.uk/tapp2015/TAPP15_III_2_slides.pdf · on Interactive Notebooks: when IPython meets noWorkflow João Felipe

SQL Schema

João Felipe Nicolaci Pimentel 44Collecting and Analyzing Provenance on Interactive

Notebooks: when IPython meets noWorkflow

Page 45: on Interactive Notebooks: when IPython meets noWorkflowworkshops.inf.ed.ac.uk/tapp2015/TAPP15_III_2_slides.pdf · on Interactive Notebooks: when IPython meets noWorkflow João Felipe

Prolog Schema

João Felipe Nicolaci Pimentel 45Collecting and Analyzing Provenance on Interactive

Notebooks: when IPython meets noWorkflow

Page 46: on Interactive Notebooks: when IPython meets noWorkflowworkshops.inf.ed.ac.uk/tapp2015/TAPP15_III_2_slides.pdf · on Interactive Notebooks: when IPython meets noWorkflow João Felipe

Complex Analysis

João Felipe Nicolaci Pimentel 46Collecting and Analyzing Provenance on Interactive

Notebooks: when IPython meets noWorkflow

Page 47: on Interactive Notebooks: when IPython meets noWorkflowworkshops.inf.ed.ac.uk/tapp2015/TAPP15_III_2_slides.pdf · on Interactive Notebooks: when IPython meets noWorkflow João Felipe

Complex Analysis

João Felipe Nicolaci Pimentel 47Collecting and Analyzing Provenance on Interactive

Notebooks: when IPython meets noWorkflow