Moritz Meister, [email protected], @morimeister
Maggy: Open-Source Asynchronous Distributed Hyperparameter Optimization Based on Apache Spark
FOSDEM'20, 02.02.2020
The Bitter Lesson (of AI)*
"Methods that scale with computation are the future of AI"**
"The two (general purpose) methods that seem to scale ... are search and learning."*
Rich Sutton (Father of Reinforcement Learning)
* http://www.incompleteideas.net/IncIdeas/BitterLesson.html
** https://www.youtube.com/watch?v=EeMCEQa85tw
Distribution and Deep Learning
[Diagram: approaches for reducing the generalization error, including better regularization methods, better model design, hyperparameter optimization, larger training datasets, better optimization algorithms, and distribution.]
Inner and Outer Loop of Deep Learning
[Diagram (image: http://tiny.cc/51yjdz): the inner loop is data-parallel training, where workers 1..N compute gradients ∆1, ∆2, ..., ∆N on the training data and synchronize them; the outer loop is a search method that issues trials (hyperparameters, architecture, etc.) and receives a metric back.]
[The same diagram, annotated to show that the outer loop corresponds to SEARCH and the inner loop to LEARNING, the two scalable methods from the Bitter Lesson.]
In Reality This Means Rewriting Training Code
[Diagram of the ML pipeline: Data Pipelines (Ingest & Prep) → Feature Store → Machine Learning Experiments (Explore/Design Model, Hyperparameter Optimization, Ablation Studies) → Data-Parallel Training → Model Serving.]
Towards Distribution Transparency
The distribution-oblivious training function (pseudo-code) is the inner loop; a sketch follows below.
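The code shown on the original slide is not preserved in this transcript, so the following is a minimal illustrative sketch of such a distribution-oblivious training function: plain Python/Keras code that maps hyperparameters to a metric and contains no Spark or distribution logic. The dataset, model architecture, and parameter names (kernel, pool, dropout) are assumptions for illustration, not the original slide content.

def train(kernel, pool, dropout):
    # Sketch of a distribution-oblivious training function (illustrative, not
    # the original slide code): it only maps hyperparameters to a metric and
    # contains no Spark or distribution logic at all.
    import tensorflow as tf

    (x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
    x_train, x_test = x_train / 255.0, x_test / 255.0

    model = tf.keras.Sequential([
        tf.keras.layers.Reshape((28, 28, 1), input_shape=(28, 28)),
        tf.keras.layers.Conv2D(32, kernel_size=kernel, activation='relu'),
        tf.keras.layers.MaxPooling2D(pool_size=pool),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dropout(dropout),
        tf.keras.layers.Dense(10, activation='softmax'),
    ])
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    model.fit(x_train, y_train, epochs=1, batch_size=128, verbose=0)

    _, accuracy = model.evaluate(x_test, y_test, verbose=0)
    return accuracy        # the framework consumes this as the optimization metric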
Towards Distribution Transparency
• Trial and Error is slow
• Iterative approach is greedy
• Search spaces are usually large
• Sensitivity and interaction of hyperparameters
[Loop: Set Hyperparameters → Train Model → Evaluate Performance]
Sequential Black Box Optimization
[Diagram: a meta-level learning & optimization process (the outer loop) draws configurations from the search space, hands them to the learning black box, and receives a metric back.]
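As a concrete, illustrative instance of this loop, a minimal sequential random search can be sketched as follows; sample_config and sequential_search are hypothetical names, and train_fn stands for a metric-returning function like the one sketched above.

import random

def sample_config(search_space):
    # Draw one configuration uniformly at random from continuous ranges.
    return {name: random.uniform(low, high)
            for name, (low, high) in search_space.items()}

def sequential_search(train_fn, search_space, num_trials):
    # Outer loop: suggest a configuration, evaluate the black box, keep the best.
    best_config, best_metric = None, float('-inf')
    for _ in range(num_trials):
        config = sample_config(search_space)
        metric = train_fn(**config)          # the learning black box (inner loop)
        if metric > best_metric:
            best_config, best_metric = config, metric
    return best_config, best_metric

# Example (illustrative):
# best, acc = sequential_search(train, {'dropout': (0.0, 0.5)}, num_trials=10)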
Parallel Search
[Diagram: a global controller keeps a queue of trials drawn from a search space or ablation study and dispatches them to parallel workers; each worker evaluates the learning black box and reports its metric back to the controller (the outer loop).]
Which algorithm to use for search? How to monitor progress?
How to aggregate results? Fault tolerance?
[The same diagram, with the global controller replaced by meta-level learning & optimization, and the same open questions.]
This should be managed with platform support!
Maggy
A flexible framework for asynchronous parallel execution of trials for ML experiments on Hopsworks:
ASHA, Random Search, Grid Search, LOCO Ablation, Bayesian Optimization, and more to come…
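To make this concrete, below is a usage sketch of how such an experiment is launched from a PySpark notebook. It approximates Maggy's Python API at the time of this talk; the names Searchspace and experiment.lagom, and all bounds and parameter values, should be treated as assumptions and checked against the documentation linked at the end.

# Sketch of launching a Maggy experiment from a PySpark notebook. Names and
# signatures approximate the Maggy API at the time of the talk; treat them as
# assumptions and verify against https://maggy.readthedocs.io.
from maggy import experiment, Searchspace

# Define the hyperparameter search space (bounds are illustrative).
sp = Searchspace(kernel=('INTEGER', [2, 8]),
                 pool=('INTEGER', [2, 8]),
                 dropout=('DOUBLE', [0.01, 0.99]))

# 'train' is the distribution-oblivious training function sketched earlier.
result = experiment.lagom(train,
                          searchspace=sp,
                          optimizer='randomsearch',
                          direction='max',
                          num_trials=15,
                          name='MNIST_demo')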
Synchronous Search
[Diagram: with Spark's bulk-synchronous execution, each stage runs tasks Task11..Task1N, Task21..Task2N, Task31..Task3N against HDFS, with a barrier after every stage; the driver only receives Metrics1, Metrics2, Metrics3 once all tasks of a stage have completed.]
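Expressed directly with the Spark RDD API, such a synchronous search maps one trial per task and blocks at the stage barrier until the whole batch has finished. This is a minimal sketch; configs and train are the illustrative names used earlier.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sync-search").getOrCreate()
sc = spark.sparkContext

# Illustrative grid of configurations, one per Spark task.
configs = [{'kernel': k, 'pool': p, 'dropout': 0.3}
           for k in (2, 4, 8) for p in (2, 4)]

# collect() acts as the stage barrier: the slowest (or least promising) trial
# holds back the whole batch.
metrics = sc.parallelize(configs, len(configs)) \
            .map(lambda c: (c, train(**c))) \
            .collect()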
Add Early Stopping and Asynchronous Algorithms
[Same diagram: even when trials are stopped early, their tasks must wait at each stage barrier, so every stage ends with wasted compute.]
Performance Enhancement
Early Stopping:
● Median Stopping Rule (sketched after this list)
● Performance curve prediction
Multi-fidelity Methods:
● Successive Halving Algorithm
● Hyperband
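As an illustration of the simplest of these, the median stopping rule stops a trial whose running metric falls below the median of the other trials' metrics at the same step. The sketch below assumes higher metric is better; the function and variable names are illustrative assumptions.

import statistics

def should_stop(trial_history, other_histories, step):
    # Median stopping rule (sketch): stop a trial if its best metric up to
    # 'step' is worse than the median of the other trials' metrics at the
    # same step.
    peers = [max(h[:step + 1]) for h in other_histories if len(h) > step]
    if not peers:
        return False
    return max(trial_history[:step + 1]) < statistics.median(peers)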
Asynchronous Successive Halving Algorithm (ASHA)
Liam Li et al. "Massively Parallel Hyperparameter Tuning". In: CoRR abs/1810.05934 (2018).
Animation: https://blog.ml.cmu.edu/2018/12/12/massively-parallel-hyperparameter-optimization/
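The core of ASHA is its asynchronous promotion decision: whenever a worker asks for work, promote a trial that ranks in the top 1/η of its rung and has not been promoted yet; otherwise start a new random trial at the lowest rung. Below is a minimal sketch of that decision following Li et al.; the function name and data layout are assumptions for illustration.

def get_job(rungs, eta=3):
    # ASHA promotion rule (sketch). rungs[r] is a list of (trial_id, metric)
    # pairs completed at rung r, with rung 0 the smallest resource budget.
    # Returns (trial_id, next_rung) to promote, or None to start a fresh
    # trial at rung 0 instead.
    for r in range(len(rungs) - 2, -1, -1):          # scan from just below the top rung down to 0
        ranked = sorted(rungs[r], key=lambda tm: tm[1], reverse=True)
        top_k = len(ranked) // eta                   # only the top 1/eta are promotable
        already_promoted = {tid for tid, _ in rungs[r + 1]}
        for tid, _ in ranked[:top_k]:
            if tid not in already_promoted:
                return tid, r + 1
    return None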
Ablation Studies
[Example: Titanic-style dataset with features such as Pclass, name, and sex and a 'survive' label, illustrating the effect of leaving individual features out.]
Replacing the Maggy Optimizer with an Ablator:
• Feature Ablation using the Feature Store (see the sketch after this list)
• Leave-One-Layer-Out Ablation
• Leave-One-Component-Out (LOCO)
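A leave-one-feature-out ablation can be sketched as a simple loop that retrains the model once per omitted feature and compares against the full-feature baseline. The names below (feature_ablation, train_fn, the pandas-style dataframe) are illustrative assumptions, not Maggy's Ablator API.

def feature_ablation(train_fn, df, features, label):
    # Leave-one-feature-out ablation (sketch): retrain once per omitted
    # feature and report the metric drop relative to the full-feature baseline.
    baseline = train_fn(df[features], df[label])
    results = {}
    for feature in features:
        kept = [f for f in features if f != feature]
        results[feature] = baseline - train_fn(df[kept], df[label])
    return results

# Example (illustrative, Titanic-style columns):
# feature_ablation(train_classifier, df, ['pclass', 'name', 'sex'], 'survived')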
Challenge
How can we fit this into the bulk-synchronous execution model of Spark?
Mismatch: Spark tasks and stages vs. trials
Databricks' approach: Project Hydrogen (barrier execution mode) & SparkTrials in Hyperopt
The Solution
[Diagram: one Spark stage with long-running tasks Task11..Task1N inside a single barrier; the tasks stream metrics to the driver and receive new trials or early-stop signals back.]
Long-running tasks and communication with the driver.
(Hyperopt, by contrast: one job per trial, requiring many threads on the driver.)
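Conceptually, each long-running task runs a loop that asks the driver for work until the experiment is over. The sketch below is schematic; driver.get_next_trial and driver.report are assumed helper names for illustration, not Maggy's actual internal RPC interface.

def worker_loop(driver, train_fn):
    # Schematic loop inside one long-running Spark task (sketch). The driver
    # hands out trials, collects final metrics, and ends the experiment.
    while True:
        trial = driver.get_next_trial()       # blocks until a trial is available
        if trial is None:                     # experiment finished
            break
        metric = train_fn(**trial.params)     # run the distribution-oblivious training function
        driver.report(trial.id, metric)       # send the metric back to the optimizer on the driver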
Conclusions
● Avoid manual, iterative hyperparameter optimization
● Black-box optimization is hard
● State-of-the-art algorithms can be deployed asynchronously
● Maggy: platform support for automated hyperparameter optimization and ablation studies
● Save resources with asynchronous execution
● Early stopping focuses compute on sensible models
What’s next?
● More algorithms
● Distribution transparency
● Comparability / reproducibility of experiments
● Implicit provenance
● Support for PyTorch
Acknowledgements
Thanks to the entire Logical Clocks Team ☺
Contributions from colleagues:
Robin Andersson @robzor92
Sina Sheikholeslami @cutlash
Kim Hammar @KimHammar1
Alex Ormenisan @alex_ormenisan
• Maggy
https://github.com/logicalclocks/maggy
https://maggy.readthedocs.io/en/latest/
• Hopsworks
https://github.com/logicalclocks/hopsworks
https://www.logicalclocks.com/whitepapers/hopsworks
• Feature Store: the missing data layer in ML pipelines?
https://www.logicalclocks.com/feature-store/
@hopsworks