
Improving Reproducible Deep Learning Workflows with DeepDIVA

M. Alberti¹*, V. Pondenkandath¹*, L. Vögtlin¹, M. Würsch¹,², R. Ingold¹, M. Liwicki¹,³

*Equal contribution

¹DIVA Group, University of Fribourg, Switzerland
²IIT, FHNW University of Applied Sciences and Arts Northwestern Switzerland, Switzerland
³EISLAB Machine Learning, Luleå University of Technology, Sweden

Reproducibility Crisis: Trust or Verify?


Joelle Pineau, “Reproducible, Reusable, and Robust Reinforcement Learning”, invited talk @NeurIPS 2018, Montreal, Canada

No possibility to verify

No possibility to extend

Lots of overhead created

Leads to no trust in scientific results

Why Is This a Problem?


Ensure reproducibility

Of your own experiments

Of other people’s experiments

Promote open-source code

Make it easy to have “good enough” code

Enable code trustworthiness

How To Make Steps Forward?


Open-Source

Python framework

Built on top of PyTorch

Makes your life easier for: reproducing your own and other people’s experiments

Provides boilerplate code for: common deep learning scenarios

Handling time-consuming everyday problems

Documentation & Tutorial available

How We Contribute: DeepDIVA


Reproducing Your Own Experiments

Short-term, or work in progress

Long-term, or finished work


Kilometres of poor or incomplete log files

Stochasticity in the process

Short-term Reproducibility Dangers


Meaningful logging

Saving all run parameters and command line args

Providing concise coloured logs

Deterministic runs

Seeding the pseudo-random number generators: Python, NumPy, and PyTorch (see the sketch below)

Disabling CuDNN (NVIDIA CUDA Deep Neural Network library) when necessary

How DeepDIVA Ensures Short-term Reproducibility
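
For illustration, a minimal sketch of this kind of seeding and CuDNN determinism in PyTorch. This is not DeepDIVA's actual code; the `set_deterministic` helper and its `seed` argument are hypothetical.

```python
import random

import numpy as np
import torch


def set_deterministic(seed: int) -> None:
    """Seed every pseudo-random number generator involved in a run (illustrative)."""
    random.seed(seed)                 # Python's built-in RNG
    np.random.seed(seed)              # NumPy RNG
    torch.manual_seed(seed)           # PyTorch CPU RNG
    torch.cuda.manual_seed_all(seed)  # PyTorch CUDA RNGs (all devices)

    # Trade speed for determinism: CuDNN auto-tuning and non-deterministic
    # kernels can make otherwise identical runs diverge.
    torch.backends.cudnn.benchmark = False
    torch.backends.cudnn.deterministic = True
```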


Poor (or non-existent!) use of version control

Bad programming habits that die hard

Silent data modifications

Long-term Reproducibility Dangers


Git status

Linking every run to a specific commit in Git

Allowing this feature to be disabled for dev purposes

Copy code

Copying the entire running code into the output folder

Data Integrity Management

Footprint of the data in a JSON file using SHA-1 hashes (see the sketch below)

How DeepDIVA Ensures Long-term Reproducibility
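
For illustration, a minimal sketch of recording a run's Git commit and writing a SHA-1 data footprint to JSON. This is not DeepDIVA's actual implementation; the helper names and paths are hypothetical.

```python
import hashlib
import json
import subprocess
from pathlib import Path


def current_git_commit() -> str:
    """Full SHA of the commit the experiment is running from (illustrative helper)."""
    return subprocess.check_output(["git", "rev-parse", "HEAD"], text=True).strip()


def write_dataset_footprint(dataset_dir: str, output_file: str) -> None:
    """Store one SHA-1 hash per file so silent data modifications become detectable."""
    footprint = {
        str(path): hashlib.sha1(path.read_bytes()).hexdigest()
        for path in sorted(Path(dataset_dir).rglob("*"))
        if path.is_file()
    }
    Path(output_file).write_text(json.dumps(footprint, indent=2))
```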


Reproducing Other People’s Experiments

Given a paper, try to replicate the results and observations


In order to reproduce an experiment, one needs (see the sketch after this list):

Git repository URL

Git commit identifier (full SHA)

List of command line arguments used

The data

Reproducing Other People’s Experiments
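
A minimal sketch of such a replication workflow, assuming the four items above are available. Every value below is a placeholder, and the entry-point name is hypothetical; this is not a DeepDIVA command.

```python
import subprocess

# Every value below is a placeholder: in practice the repository URL, the full
# commit SHA and the recorded command-line arguments come from the authors.
REPO_URL = "https://github.com/<user>/<experiment-repo>.git"
COMMIT_SHA = "<full 40-character SHA>"
RECORDED_ARGS = ["--experiment-name", "replication", "--seed", "42"]

subprocess.run(["git", "clone", REPO_URL, "experiment"], check=True)
subprocess.run(["git", "-C", "experiment", "checkout", COMMIT_SHA], check=True)
# Entry-point name is hypothetical; use whatever the repository documents.
subprocess.run(["python", "experiment/main.py", *RECORDED_ARGS], check=True)
```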


Productivity Out-of-the-box

Making your life easier: do not reinvent the wheel!


“One click away” Deep Learning Scenarios


“when the data is ready the task is solved”

Download a dataset with a click

Natural images, medical images, historical documents, …

Split your dataset: train, validation, and test splits (see the sketch below)

Analyse the data

Mean/std and class distributions

Ensure data integrity

Compare the footprints

Prepare Your Data
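
A minimal sketch of splitting a dataset and computing per-channel mean/std with plain PyTorch/torchvision. DeepDIVA ships its own download, split, and analysis scripts; the dataset path, image size, and split ratio below are placeholders.

```python
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Illustrative only: path, resize target and split ratio are placeholders.
transform = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])
dataset = datasets.ImageFolder("data/my_dataset", transform=transform)

# Hold out 10% for validation (a test split would be held out the same way).
n_val = int(0.1 * len(dataset))
train_set, val_set = torch.utils.data.random_split(dataset, [len(dataset) - n_val, n_val])

# Per-channel mean/std of the training split, e.g. for input normalisation.
loader = DataLoader(train_set, batch_size=64)
count, mean, sq_mean = 0, torch.zeros(3), torch.zeros(3)
for images, _ in loader:
    count += images.size(0)
    mean += images.mean(dim=(2, 3)).sum(dim=0)          # per-image channel means
    sq_mean += images.pow(2).mean(dim=(2, 3)).sum(dim=0)
mean /= count
std = (sq_mean / count - mean.pow(2)).sqrt()
print(f"mean={mean.tolist()}  std={std.tolist()}")
```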


Real-time Visualizations


TensorBoard (from TensorFlow)

Confusion Matrix

Feature Visualization

Weight Histograms

Performance Evaluation
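
A minimal sketch of pushing such values to TensorBoard with PyTorch's SummaryWriter. The tags, model, and metric values below are placeholders, not DeepDIVA's actual logging code.

```python
import torch
from torch.utils.tensorboard import SummaryWriter

# Placeholders throughout: a real run would log real metrics and a real model.
writer = SummaryWriter(log_dir="output/tensorboard")
model = torch.nn.Linear(128, 10)  # stand-in for a real network

for epoch in range(10):
    train_loss = 0.5 / (epoch + 1)       # dummy loss value
    val_accuracy = 0.60 + 0.03 * epoch   # dummy accuracy value

    # Scalar curves appear live in TensorBoard while the run is ongoing.
    writer.add_scalar("train/loss", train_loss, epoch)
    writer.add_scalar("val/accuracy", val_accuracy, epoch)

    # One histogram per named parameter shows how the weights evolve.
    for name, param in model.named_parameters():
        writer.add_histogram(name, param.detach().cpu(), epoch)

writer.close()
```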

Let machine learning find the best values

No expensive grid or random search

Automatic Hyper-Parameter Optimization
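
As an illustration of the idea (not necessarily the optimizer backend DeepDIVA uses), a sketch of sequential model-based search with Optuna; the search space and the `train_and_evaluate` stand-in are hypothetical.

```python
import optuna  # one possible model-based optimiser, used here purely for illustration


def train_and_evaluate(lr: float, batch_size: int) -> float:
    """Hypothetical stand-in for a full training run returning validation accuracy."""
    return 1.0 - abs(lr - 1e-3) * 100 - abs(batch_size - 64) / 1000


def objective(trial):
    # Hypothetical search space: learning rate (log scale) and batch size.
    lr = trial.suggest_float("lr", 1e-5, 1e-1, log=True)
    batch_size = trial.suggest_categorical("batch_size", [32, 64, 128])
    return train_and_evaluate(lr=lr, batch_size=batch_size)


# The optimiser models the objective and proposes promising configurations,
# needing far fewer runs than an exhaustive grid or blind random search.
study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=30)
print(study.best_params)
```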


Be A Part Of It

Getting Started With DeepDIVA


No Setup Time: from source on Ubuntu (or other flavours of Linux); Docker image coming soon

Documentation: online and in the code

Tutorials: learn new features efficiently

Fork It: extensive and modular for easy modifications

How To Use It


Make Your Experiment Reproducible

bit.ly/DeepDIVA

