
➢ Details

We circumvent the expensive evaluation with a deterministic RBF surrogate, defined as:
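The equation can be sketched as follows, assuming the cubic kernel and linear polynomial tail that are standard for this family of surrogate methods (an assumption, not a detail read off the poster). After n evaluations the surrogate is

s_n(x) = \sum_{i=1}^{n} \lambda_i \, \varphi(\lVert x - x_i \rVert) + p(x), \qquad \varphi(r) = r^3,

where the coefficients \lambda_i and the linear tail p(x) are determined by the interpolation conditions s_n(x_i) = f(x_i) at the n evaluated points.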

We tackle the high dimensionality by reducing the probability φ of searching along each dimension as the optimization progresses:
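As a sketch of the DYCORS-style schedule (the specific constants, including \varphi_0 = \min(20/D, 1), are assumptions taken from the DYCORS formulation rather than from this poster):

\varphi_n = \varphi_0 \left[ 1 - \frac{\ln(n - n_0 + 1)}{\ln(N_{\max} - n_0)} \right], \qquad \varphi_0 = \min\!\left(\frac{20}{D}, 1\right),

where D is the number of hyperparameters, n the current number of evaluations, n_0 the number of initial evaluations, and N_max the evaluation budget; the probability of perturbing any given dimension thus shrinks as the search progresses.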

We escape local minima by computing a compound score for each candidate point. The score is a dynamic weighted average of a distance metric, based on the distance from the best found solution, and a surrogate value metric, based on the candidate's surrogate value (both metrics are defined below).

Efficient Hyperparameter Optimization of Deep Learning Algorithms using Deterministic RBF Surrogates

Ilija Ilievski a), Taimoor Akhtar b), Jiashi Feng c), Christine Annette Shoemaker b)

[email protected] [email protected] [email protected] [email protected]

a) Graduate School for Integrative Sciences and Engineering   b) Industrial and Systems Engineering   c) Electrical and Computer Engineering

➢ More: bit.ly/hord-aaai   Supplement and more at: ilija139.github.io

➢ Results

Optimization of 19 CNN hyperparameters · Optimization of 8 CNN hyperparameters · Optimization of 15 CNN hyperparameters · Optimization of 6 MLP hyperparameters

➢ Algorithm

HORD: Hyperparameter Optimization using deterministic RBF surrogate and DYCORS
Input: n_0 initial evaluations, N_max maximum evaluations, m candidate points per iteration

Sample n_0 points X_{n_0} := {x_i}
Populate A_{n_0} := {X_{n_0}, f(X_{n_0})}
while n < N_max:
    update surrogate S_n with A_n := {X_n, f(X_n)}
    set x_best := argmin{f(X_n)}
    sample m candidate points t_i around x_best according to probabilities φ_n
    compute V_ev, V_dm, and W_n for all t_i
    set x* := argmin{W_n(t_i)}
    set A_{n+1} := A_n ∪ {(x*, f(x*))}
end while
return x_best
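The loop maps fairly directly onto code. Below is a minimal Python sketch, not the authors' implementation: the cubic RBF kernel, the Gaussian perturbation radius, the DYCORS probability schedule, and the cycling weight values are all assumptions, and fit_rbf and hord are hypothetical helper names.

# Minimal sketch of the HORD loop above (assumptions noted in the lead-in).
import numpy as np
from scipy.spatial.distance import cdist

def fit_rbf(X, y):
    # Cubic RBF interpolant s(x) = sum_i lam_i * ||x - x_i||^3 + c.x + b
    n, d = X.shape
    Phi = cdist(X, X) ** 3
    P = np.hstack([X, np.ones((n, 1))])
    A = np.block([[Phi, P], [P.T, np.zeros((d + 1, d + 1))]])
    coef = np.linalg.solve(A, np.concatenate([y, np.zeros(d + 1)]))
    lam, poly = coef[:n], coef[n:]
    return lambda Z: cdist(Z, X) ** 3 @ lam + np.hstack([Z, np.ones((len(Z), 1))]) @ poly

def hord(f, lb, ub, n0=8, n_max=50, m=100, seed=0):
    rng = np.random.default_rng(seed)
    d = len(lb)
    X = rng.uniform(lb, ub, size=(n0, d))          # initial design
    y = np.array([f(x) for x in X])
    sigma = 0.2 * (ub - lb)                        # candidate perturbation radius (assumption)
    for n in range(n0, n_max):
        s = fit_rbf(X, y)                          # update surrogate S_n with A_n
        x_best = X[np.argmin(y)]
        # DYCORS-style probability of perturbing each dimension (assumed schedule)
        phi = max(min(20.0 / d, 1.0) * (1.0 - np.log(n - n0 + 1) / np.log(n_max - n0)), 1e-3)
        mask = rng.random((m, d)) < phi
        empty = ~mask.any(axis=1)
        mask[empty, rng.integers(0, d, size=empty.sum())] = True   # perturb at least one dim
        cand = np.clip(x_best + mask * rng.normal(0.0, sigma, size=(m, d)), lb, ub)
        # surrogate value metric V_ev and distance metric V_dm, both scaled to [0, 1]
        sv = s(cand)
        v_ev = (sv - sv.min()) / (sv.max() - sv.min() + 1e-12)
        dist = cdist(cand, X).min(axis=1)
        v_dm = (dist.max() - dist) / (dist.max() - dist.min() + 1e-12)
        w = (0.3, 0.5, 0.8, 0.95)[(n - n0) % 4]    # cycling weight w_n (assumed values)
        x_new = cand[np.argmin(w * v_ev + (1.0 - w) * v_dm)]       # compound score W_n
        X = np.vstack([X, x_new])
        y = np.append(y, f(x_new))
    return X[np.argmin(y)], y.min()

For example, hord(lambda x: float(np.sum((x - 0.3) ** 2)), lb=np.zeros(6), ub=np.ones(6)) returns an approximate minimizer of a 6-dimensional quadratic after 50 evaluations; in the actual setting f(x) would train a network with hyperparameters x and return its validation error.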

➢ Comparison

Evaluations required by HORD to match the best error found by state-of-the-art hyperparameter optimization algorithms.

➢ In short

Deep learning algorithms are powerful but very sensitive to their many hyperparameters: the number of layers and nodes, the learning rate, the weight initialization, and so on. Optimizing the validation error with respect to the hyperparameters therefore requires minimizing a highly multimodal and expensive function in high dimensions. We propose an algorithm that matches the performance of state-of-the-art hyperparameter optimization algorithms while using up to 6 times fewer evaluations.

Candidate scoring uses three quantities: the distance metric, the surrogate value metric, and the final candidate score.
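These can be sketched in the standard DYCORS/HORD style; the exact normalizations and the weight schedule are assumptions rather than values read off the poster. For a candidate t_i:

Distance metric: V^{dm}_n(t_i) = \frac{\Delta^{\max} - \Delta(t_i)}{\Delta^{\max} - \Delta^{\min}}, with \Delta(t_i) the distance described in the text and \Delta^{\min}, \Delta^{\max} its extremes over the m candidates.

Surrogate value metric: V^{ev}_n(t_i) = \frac{s_n(t_i) - s^{\min}}{s^{\max} - s^{\min}}, with s^{\min}, s^{\max} the extreme surrogate values over the candidates.

Final candidate score: W_n(t_i) = w_n V^{ev}_n(t_i) + (1 - w_n) V^{dm}_n(t_i), with w_n a weight that cycles across iterations; the next point to evaluate is x^* = \arg\min_{t_i} W_n(t_i).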