
Improving Reproducible Deep Learning Workflows with DeepDIVA

M. Alberti¹*, V. Pondenkandath¹*, L. Vögtlin¹, M. Würsch¹,², R. Ingold¹, M. Liwicki¹,³

*Equal contribution

¹DIVA Group, University of Fribourg, Switzerland
²IIT, FHNW University of Applied Sciences and Arts Northwestern Switzerland, Switzerland
³EISLAB Machine Learning, Luleå University of Technology, Sweden

Reproducibility Crisis: Trust or Verify?


Joelle Pineau, “Reproducible, Reusable, and Robust Reinforcement Learning”, invited talk @NeurIPS 2018, Montreal, Canada

No possibility to verify

No possibility to extend

Lots of overhead created

Leads to no trust in scientific results

Why Is This a Problem?


Ensure reproducibility

Of your own experiments

Of other people’s experiments

Promote open-source code

Make it easy to have “good enough” code

Enable code trustworthiness

How To Make Steps Forward?


Open-Source

Python framework

Built on top of PyTorch

Makes your life easier for: reproducing your own and other people’s experiments

Provides boilerplate code for: common deep learning scenarios

Handling time-consuming everyday problems

Documentation & Tutorial available

How We Contribute: DeepDIVA


Reproducing Your Own Experiments

Short-term, or work in progress

Long-term, or finished work


Kilometres of poor or incomplete log files

Stochasticity in the process

Short-term Reproducibility Dangers


Meaningful logging

Saving all run parameters and command line args

Providing concise coloured logs

Deterministic runs

Seeding the pseudo-random number generators: Python, NumPy, and PyTorch (see the sketch below)

Disabling CuDNN (NVIDIA CUDA Deep Neural Network library) when necessary

How DeepDIVA Ensures Short-term Reproducibility
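
For illustration, a minimal sketch of this kind of seeding and CuDNN determinism in PyTorch. This is not DeepDIVA's actual code; the `set_deterministic` helper and its `seed` argument are hypothetical.

```python
import random

import numpy as np
import torch


def set_deterministic(seed: int) -> None:
    """Seed every pseudo-random number generator involved in a run (illustrative)."""
    random.seed(seed)                 # Python's built-in RNG
    np.random.seed(seed)              # NumPy RNG
    torch.manual_seed(seed)           # PyTorch CPU RNG
    torch.cuda.manual_seed_all(seed)  # PyTorch CUDA RNGs (all devices)

    # Trade speed for determinism: CuDNN auto-tuning and non-deterministic
    # kernels can make otherwise identical runs diverge.
    torch.backends.cudnn.benchmark = False
    torch.backends.cudnn.deterministic = True
```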


Poor (or non-existent!) use of version control

Bad programming habits that die hard

Silent data modifications

Long-term Reproducibility Dangers


Git status

Linking every run to a specific commit in Git

Allowing this feature to be disabled for dev purposes

Copy code

Copying the entire running code into the output folder

Data Integrity Management

Footprint of the data in a JSON file using SHA-1 hashes (see the sketch below)

How DeepDIVA Ensures Long-term Reproducibility
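
For illustration, a minimal sketch of recording a run's Git commit and writing a SHA-1 data footprint to JSON. This is not DeepDIVA's actual implementation; the helper names and paths are hypothetical.

```python
import hashlib
import json
import subprocess
from pathlib import Path


def current_git_commit() -> str:
    """Full SHA of the commit the experiment is running from (illustrative helper)."""
    return subprocess.check_output(["git", "rev-parse", "HEAD"], text=True).strip()


def write_dataset_footprint(dataset_dir: str, output_file: str) -> None:
    """Store one SHA-1 hash per file so silent data modifications become detectable."""
    footprint = {
        str(path): hashlib.sha1(path.read_bytes()).hexdigest()
        for path in sorted(Path(dataset_dir).rglob("*"))
        if path.is_file()
    }
    Path(output_file).write_text(json.dumps(footprint, indent=2))
```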


Reproducing Other People’s Experiments

Given a paper, try to replicate the results and observations


In order to reproduce an experiment, one needs (see the sketch after this list):

Git repository URL

Git commit identifier (full SHA)

List of command line arguments used

The data

Reproducing Other People’s Experiments
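
A minimal sketch of such a replication workflow, assuming the four items above are available. Every value below is a placeholder, and the entry-point name is hypothetical; this is not a DeepDIVA command.

```python
import subprocess

# Every value below is a placeholder: in practice the repository URL, the full
# commit SHA and the recorded command-line arguments come from the authors.
REPO_URL = "https://github.com/<user>/<experiment-repo>.git"
COMMIT_SHA = "<full 40-character SHA>"
RECORDED_ARGS = ["--experiment-name", "replication", "--seed", "42"]

subprocess.run(["git", "clone", REPO_URL, "experiment"], check=True)
subprocess.run(["git", "-C", "experiment", "checkout", COMMIT_SHA], check=True)
# Entry-point name is hypothetical; use whatever the repository documents.
subprocess.run(["python", "experiment/main.py", *RECORDED_ARGS], check=True)
```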


Productivity Out-of-the-box

Making your life easier: do not reinvent the wheel!


“One click away” Deep Learning Scenarios


“when the data is ready the task is solved”

Download a dataset with a click

Natural images, medical images, historical documents, …

Split your dataset: train, validation, and test splits (see the sketch below)

Analyse the data

Mean/std and class distributions

Ensure data integrity

Compare the footprints

Prepare Your Data
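
A minimal sketch of splitting a dataset and computing per-channel mean/std with plain PyTorch/torchvision. DeepDIVA ships its own download, split, and analysis scripts; the dataset path, image size, and split ratio below are placeholders.

```python
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Illustrative only: path, resize target and split ratio are placeholders.
transform = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])
dataset = datasets.ImageFolder("data/my_dataset", transform=transform)

# Hold out 10% for validation (a test split would be held out the same way).
n_val = int(0.1 * len(dataset))
train_set, val_set = torch.utils.data.random_split(dataset, [len(dataset) - n_val, n_val])

# Per-channel mean/std of the training split, e.g. for input normalisation.
loader = DataLoader(train_set, batch_size=64)
count, mean, sq_mean = 0, torch.zeros(3), torch.zeros(3)
for images, _ in loader:
    count += images.size(0)
    mean += images.mean(dim=(2, 3)).sum(dim=0)          # per-image channel means
    sq_mean += images.pow(2).mean(dim=(2, 3)).sum(dim=0)
mean /= count
std = (sq_mean / count - mean.pow(2)).sqrt()
print(f"mean={mean.tolist()}  std={std.tolist()}")
```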


Real-time Visualizations


TensorBoard (from TensorFlow)

Confusion Matrix

Feature Visualization

Weight Histograms

Performance Evaluation
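
A minimal sketch of pushing such values to TensorBoard with PyTorch's SummaryWriter. The tags, model, and metric values below are placeholders, not DeepDIVA's actual logging code.

```python
import torch
from torch.utils.tensorboard import SummaryWriter

# Placeholders throughout: a real run would log real metrics and a real model.
writer = SummaryWriter(log_dir="output/tensorboard")
model = torch.nn.Linear(128, 10)  # stand-in for a real network

for epoch in range(10):
    train_loss = 0.5 / (epoch + 1)       # dummy loss value
    val_accuracy = 0.60 + 0.03 * epoch   # dummy accuracy value

    # Scalar curves appear live in TensorBoard while the run is ongoing.
    writer.add_scalar("train/loss", train_loss, epoch)
    writer.add_scalar("val/accuracy", val_accuracy, epoch)

    # One histogram per named parameter shows how the weights evolve.
    for name, param in model.named_parameters():
        writer.add_histogram(name, param.detach().cpu(), epoch)

writer.close()
```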

Let machine learning find the best values

No expensive grid or random search

Automatic Hyper-Parameter Optimization
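
As an illustration of the idea (not necessarily the optimizer backend DeepDIVA uses), a sketch of sequential model-based search with Optuna; the search space and the `train_and_evaluate` stand-in are hypothetical.

```python
import optuna  # one possible model-based optimiser, used here purely for illustration


def train_and_evaluate(lr: float, batch_size: int) -> float:
    """Hypothetical stand-in for a full training run returning validation accuracy."""
    return 1.0 - abs(lr - 1e-3) * 100 - abs(batch_size - 64) / 1000


def objective(trial):
    # Hypothetical search space: learning rate (log scale) and batch size.
    lr = trial.suggest_float("lr", 1e-5, 1e-1, log=True)
    batch_size = trial.suggest_categorical("batch_size", [32, 64, 128])
    return train_and_evaluate(lr=lr, batch_size=batch_size)


# The optimiser models the objective and proposes promising configurations,
# needing far fewer runs than an exhaustive grid or blind random search.
study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=30)
print(study.best_params)
```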


Be A Part Of It

Getting Started With DeepDIVA


No Setup Time: from source on Ubuntu (or other flavours of Linux); Docker image coming soon

Documentation: online and in the code

Tutorials: learn new features efficiently

Fork It: extensive and modular for easy modifications

How To Use It


Make Your Experiment Reproducible

bit.ly/DeepDIVA

