Auto-WEKA: Combined Selection and Hyperparameter Optimization of Classification Algorithms

Chris Thornton, Frank Hutter, Holger H. Hoos, Kevin Leyton-Brown

Department of Computer Science, University of British Columbia, Canada

COSEAL Workshop, Münster, Germany, 2013/07/29

Auto-WEKA: Combined Selection and Hyperparameter Optimization …hoos/Talks/coseal-13-slides.pdf · 2013-07-30 · Auto-WEKA: Combined Selection and Hyperparameter Optimization of




Kevin Leyton-Brown (UBC)

Frank Hutter (UBC)

Chris Thornton (UBC)


Thomas Stützle (U. Libre de Bruxelles)

Chris Fawcett (UBC)

Marius Schneider (U. Potsdam)

James Styles (UBC)

Alan Hu (UBC)

Domagoj Babić (UBC)

Torsten Schaub (U. Potsdam)

Benjamin Kaufmann (U. Potsdam)

Martin Müller (U. of Alberta)

Marco Chiarandini (U. Southern Denmark)

Alfonso Gerevini (U. di Brescia)

Alessandro Saetti (U. di Brescia)

Mauro Vallati (U. di Brescia)

Malte Helmert (U. Freiburg)

Erez Karpas (Technion)

Gabriele Röger (U. Freiburg)

Jendrik Seipp (U. Freiburg)

Thomas Bartz-Beielstein (FH Köln)


Fundamental problem:

Which of the many available algorithms (models) applicable to a given machine learning problem should be used, and with which hyper-parameter settings?

Example: WEKA contains 39 classification algorithms and 3 × 8 feature selection methods.

Key idea:

simultaneously solve the algorithm selection + hyper-parameter optimisation problem

Thornton, Hutter, Hoos, Leyton-Brown: Auto-WEKA 2


1


Auto-WEKA approach:

• model the space of all combinations of classification algorithms and feature selection methods as a single parametric algorithm

• select between the 39 × 3 × 8 algorithms using high-level categorical choices

• consider hyper-parameters for each algorithm
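The single-parametric-algorithm view can be sketched as a sampler over one combined space. This is a minimal illustration under stated assumptions: the three classifiers, their hyper-parameter names and ranges, and the feature selection options below are a hypothetical, heavily reduced stand-in, not Auto-WEKA's actual encoding. What it shows is that top-level categorical choices decide which other parameters are active at all.

```python
import random

# Hypothetical, heavily reduced stand-in for the Auto-WEKA search space.
# A list means a categorical domain; a (low, high) tuple means a continuous range.
CLASSIFIERS = {
    "J48": {"confidence_factor": (0.005, 0.5), "min_leaf": [1, 2, 5, 10]},
    "IBk": {"k": [1, 3, 5, 9], "distance_weighting": ["none", "inverse", "similarity"]},
    "SMO": {"complexity": (0.1, 10.0), "kernel": ["poly", "rbf"]},
}
FEATURE_SEARCH = ["none", "BestFirst", "GreedyStepwise"]
FEATURE_EVAL = ["CfsSubset", "Relief"]


def sample_value(domain, rng):
    """Draw one value from a categorical list or a continuous (low, high) range."""
    if isinstance(domain, list):
        return rng.choice(domain)
    low, high = domain
    return rng.uniform(low, high)


def sample_configuration(rng):
    """Sample one configuration of the single combined parametric algorithm.

    Top-level categorical choices decide which other parameters are active:
    a feature evaluator only exists if a search method was chosen, and only the
    selected classifier's hyper-parameters appear in the configuration.
    """
    config = {"feature_search": rng.choice(FEATURE_SEARCH)}
    if config["feature_search"] != "none":
        config["feature_eval"] = rng.choice(FEATURE_EVAL)
    clf = rng.choice(sorted(CLASSIFIERS))
    config["classifier"] = clf
    for name, domain in CLASSIFIERS[clf].items():
        config[f"{clf}.{name}"] = sample_value(domain, rng)
    return config
```

Calling sample_configuration(random.Random(0)) repeatedly yields configurations in which, for example, IBk.k never appears together with SMO.kernel: the inactive hyper-parameters are conditional on the classifier choice.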


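[Figure: conditional parameter space for the classifier. A top-level choice is_base (true/false) decides between a single base classifier (base class) and a meta or ensemble method. Meta methods such as AdaBoostM1 (iterations, percentage, use_resampling) and Bagging (iterations, percentage, out_of_bag_err) take one base classifier (meta_base); ensemble methods such as Voting and Stacking (combination_rule) use num_classes and base classifiers base_1, base_2, …, base_5, each possibly (none), with their own hyper-parameters.]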


[Figure: conditional parameter space for feature selection. feat_sel (true/false) activates a search method feat_ser, e.g. Best First (direction, non-improving nodes, lookup cache) or Greedy Stepwise (fwd./bkwd., conservative, threshold), and an evaluator feat_eval, e.g. CFS Subset (missing as separate, include locally predictive) or RELIEF (num neighbours, weight by distance, …).]


Auto-WEKA approach (cont.):

• solve the resulting algorithm configuration problem using a general-purpose configurator

Automated configuration process:

• configurator: SMAC (Hutter, HH, Leyton-Brown 2011–13)

• performance objective: cross-validated mean error rate

• time budget: 4 × 30 CPU hours
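The performance objective can be sketched as a function that maps one configuration to its cross-validated mean error rate, which the configurator then minimises. Assumptions: the classifier is a hypothetical k-nearest-neighbour stand-in written from scratch, its only hyper-parameter is k, and 10 folds are used; Auto-WEKA itself evaluated WEKA classifiers, and this is not its code.

```python
from collections import Counter


def knn_predict(train, x, k):
    """Predict the label of x by majority vote among the k nearest training points."""
    nearest = sorted(train, key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], x)))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def cv_error(data, k_neighbours, n_folds=10):
    """Objective for one configuration: mean misclassification rate over the folds.

    data is a list of (feature_tuple, label) pairs; fold f holds every n_folds-th
    point starting at index f, and the remaining points form its training set.
    """
    fold_errors = []
    for f in range(n_folds):
        test = data[f::n_folds]
        train = [p for i, p in enumerate(data) if i % n_folds != f]
        wrong = sum(knn_predict(train, x, k_neighbours) != y for x, y in test)
        fold_errors.append(wrong / len(test))
    return sum(fold_errors) / n_folds
```

On two well-separated clusters cv_error returns 0.0 for k = 1, while on points whose labels alternate along a line 1-NN misclassifies every point; a configurator such as SMAC would call a function of this shape once per candidate configuration.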


Selected results (mean error rate in %; #Instances = training + test)

Dataset       #Instances   #Features  #Classes  Best Def.  Auto-WEKA (TPE)  Auto-WEKA (SMAC)
Semeion       1115+478     256        10        8.18       8.26             5.08
KR-vs-KP      2237+959     37         2         0.31       0.54             0.31
Waveform      3500+1500    40         3         14.40      14.23            14.42
Gisette       4900+2100    5000       2         2.81       3.94             2.24
MNIST Basic   12k+50k      784        10        5.19       12.28            3.64
CIFAR-10      50k+10k      3072       10        64.27      66.01            61.15

Best Def.: best of the classification algorithms run with default hyper-parameter settings.

Auto-WEKA better than full grid search in 15/21 cases

Further details: KDD-13 paper (to appear)


Which classifiers were chosen by Auto-WEKA?


Some results for regression problems (RMSE; #Instances = training + test)

Dataset              #Instances   #Features  Best Def.  Auto-WEKA (TPE)  Auto-WEKA (SMAC)
Forest Fires         362+155      12         63.55      63.73            64.36
Crime                1396+598     126        0.1404     0.1356           0.1376
Abalone              2924+1253    8          2.130      2.072            2.101
Parkinsons – Motor   4113+1762    20         0.6323     0.5627           0.4047
Parkinsons – Total   4113+1762    20         0.7999     0.3837           0.1606
COIL                 5822+4000    85         0.2328     0.2471           0.2317


Auto-WEKA ...

• beats the oracle (optimal) choice from a large set of ML algorithms with default hyper-parameter settings

• beats full grid search over all algorithms and hyper-parameters in most cases (15 of 21 classification datasets); also beats random search (Bergstra & Bengio 12)

• effectively solves the combined algorithm selection + hyper-parameter optimisation problem on a standard 4-core machine in less than 1.5 days

Note:

• the general-purpose algorithm configurator (SMAC) outperforms the best method from the ML literature (TPE, Bergstra et al. 2011)


2


Algorithm configuration

Observation: Many algorithms have parameters (sometimes hidden / hardwired) whose settings affect performance.

Challenge: Find parameter settings that achieve good / optimal performance on a given type of input data.

Example: IBM ILOG CPLEX

• widely used industrial optimisation software

• exact solver, based on a sophisticated branch & cut algorithm and numerous heuristics

• 159 parameters, 81 of which directly control the search process

• goal: find parameter settings that solve MIP-encoded wildlife corridor construction problems as fast as possible


The algorithm configuration problem

Given:

• parameterised target algorithm A with configuration space C

• set of (training) inputs I

• performance metric m (w.l.o.g. to be minimised)

Want: $c^* \in \arg\min_{c \in C} m(A[c], I)$
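A toy instance of this definition, under hypothetical choices: the target algorithm A[c] is gradient descent with step size c on f(x) = a·x², each input in I is a curvature a, and m is the mean number of iterations until |x| < 10⁻⁶, with runs capped at 1000 iterations (a censored run). For a small finite C the argmin can be found exhaustively; real configuration spaces are far too large for this.

```python
from statistics import mean

CAP = 1000  # cutoff on iterations; runs that hit it are censored


def run_target(step_size, a, tol=1e-6):
    """Toy target algorithm A[c]: gradient descent on f(x) = a * x**2 from x = 1.

    Returns the number of iterations until |x| < tol, or CAP if it never gets there.
    """
    x = 1.0
    for it in range(CAP):
        if abs(x) < tol:
            return it
        x -= step_size * 2 * a * x  # the gradient of a * x**2 is 2 * a * x
    return CAP


def m(step_size, instances):
    """Performance metric m(A[c], I): mean iteration count over the input set."""
    return mean(run_target(step_size, a) for a in instances)


def configure(space, instances):
    """Exhaustive search for c* in argmin_{c in C} m(A[c], I) over a finite C."""
    return min(space, key=lambda c: m(c, instances))
```

With C = [0.05, 0.1, 0.2, 0.4, 0.8] and I = [0.5, 1.0, 2.0], configure returns 0.4: step size 0.8 diverges on a = 2 and is charged the full cap, while the smaller step sizes converge, but slowly.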


Algorithm configuration is challenging:

• size of configuration space

• parameter interactions

• discrete / categorical parameters

• conditional parameters

• performance varies across inputs (problem instances)

• evaluating poor configurations can be very costly

• censored algorithm runs

→ standard optimisation methods are insufficient


Algorithm configuration approaches

• Sampling methods (e.g., REVAC, REVAC++ – Nannen & Eiben 06–09)

• Racing (e.g., F-Race – Birattari, Stützle, Paquete, Varrentrapp 02; Iterative F-Race – Balaprakash, Birattari, Stützle 07)

• Model-free search (e.g., ParamILS – Hutter, HH, Stützle 07; Hutter, HH, Leyton-Brown, Stützle 09; GGA – Ansótegui, Sellmann, Tierney 09)

• Sequential model-based (aka Bayesian) optimisation (e.g., SPO – Bartz-Beielstein 06; SMAC – Hutter, HH, Leyton-Brown 11–12)
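The racing idea can be sketched as evaluating all surviving candidate configurations on one more input per round and discarding those that are clearly worse. This is a simplified variant, assuming a fixed elimination margin on mean cost; F-Race itself decides eliminations with the Friedman test, and the function and parameter names here are hypothetical.

```python
from statistics import mean


def race(candidates, instances, cost, min_instances=3, margin=1.0):
    """Simplified racing over a finite set of (hashable) candidate configurations.

    All surviving candidates are run on one more instance per round; from round
    `min_instances` on, any candidate whose mean cost exceeds the best mean by
    more than `margin` is eliminated. Assumes at least one instance. Returns the
    surviving candidate with the lowest mean cost.
    """
    alive = list(candidates)
    costs = {c: [] for c in alive}
    for n, inst in enumerate(instances, start=1):
        for c in alive:
            costs[c].append(cost(c, inst))
        if n >= min_instances:
            best = min(mean(costs[c]) for c in alive)
            alive = [c for c in alive if mean(costs[c]) <= best + margin]
            if len(alive) == 1:
                break  # race decided: stop spending runs
    return min(alive, key=lambda c: mean(costs[c]))
```

The saving comes from not running clearly inferior candidates on every input: with 10 candidates, 10 instances and a cost that separates them quickly, the race can finish after far fewer than 100 runs.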


Sequential model-based optimisation (e.g., Jones (1998), Bartz-Beielstein (2006))

• Key idea: use a predictive performance model (response surface model) to find good configurations

• perform runs for selected configurations (initial design) and fit a model (e.g., a noise-free Gaussian process model)

• iteratively select a promising configuration, perform a run and update the model
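The loop above can be sketched on a one-dimensional integer parameter. Assumptions: the surrogate is a deliberately crude stand-in for a Gaussian process (it predicts the response of the nearest evaluated point and treats the distance to that point as uncertainty), the acquisition is an optimistic lower confidence bound with a hypothetical weight kappa, and objective is a hypothetical black box with its minimum at 30.

```python
def objective(x):
    """Hypothetical expensive black box (e.g. an algorithm run); minimum at x = 30."""
    return (x - 30) ** 2


def surrogate(history, x):
    """Toy response surface model: predicted response of the nearest evaluated
    point, and the distance to that point as an uncertainty estimate."""
    nearest_x, nearest_y = min(history, key=lambda p: abs(p[0] - x))
    return nearest_y, abs(nearest_x - x)


def smbo(objective, domain, initial_design, budget, kappa=20):
    """Sequential model-based optimisation over a finite integer domain."""
    history = [(x, objective(x)) for x in initial_design]  # initial design
    for _ in range(budget):
        seen = {x for x, _ in history}
        candidates = [x for x in domain if x not in seen]
        if not candidates:
            break

        def acquisition(x):
            mean, uncertainty = surrogate(history, x)
            return mean - kappa * uncertainty  # optimistic lower confidence bound

        x_next = min(candidates, key=acquisition)   # most promising configuration
        history.append((x_next, objective(x_next)))  # perform run; model sees it next round
    return min(history, key=lambda p: p[1]), history
```

Starting from the initial design [0, 50, 100] on range(101), the first proposal explores the unsampled gap at 75 and the second lands at 26, already close to the optimum; the returned incumbent is always the best measured configuration.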


Sequential Model-based Optimisation

[Figure, animated across several slides: response plotted against a parameter. After initialisation, the measured responses are used to fit a model; the model's predicted best configuration is then measured and the model refit, and this repeats until a new incumbent is found.]

Sequential Model-based Algorithm Configuration (SMAC): Hutter, HH, Leyton-Brown (2011)

• uses a random forest model to predict the performance of parameter configurations

• predictions are based on algorithm parameters and instance features, aggregated across instances

• finds promising configurations based on the expected improvement criterion, using multi-start local search and random sampling

• imposes a time limit on algorithm runs based on the performance observed so far (adaptive capping)

• initialisation with a single configuration (algorithm default or randomly chosen)
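Two of these ingredients can be sketched in isolation. The sketch assumes a Gaussian predictive distribution whose mean and standard deviation are taken from the spread of the individual trees' predictions; this is a simplified reading, not SMAC's actual implementation. The runner passed to run_with_adaptive_capping is hypothetical and is assumed to stop a run after cap seconds.

```python
import math
from statistics import mean, pstdev


def forest_prediction(tree_predictions):
    """Predictive mean and standard deviation from the individual trees' predictions."""
    return mean(tree_predictions), pstdev(tree_predictions)


def expected_improvement(mu, sigma, f_min):
    """Expected improvement over the incumbent cost f_min (minimisation),
    assuming the cost is Gaussian with mean mu and standard deviation sigma."""
    if sigma <= 0:
        return max(f_min - mu, 0.0)
    z = (f_min - mu) / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))          # standard normal CDF
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)   # standard normal PDF
    return (f_min - mu) * cdf + sigma * pdf


def run_with_adaptive_capping(run, challenger, instances, incumbent_total):
    """Evaluate a challenger, never letting its total runtime exceed the incumbent's.

    run(challenger, instance, cap) is a hypothetical runner that returns the
    runtime, stopping the run after `cap` seconds. Returns (total, rejected).
    """
    total = 0.0
    for inst in instances:
        total += run(challenger, inst, incumbent_total - total)
        if total >= incumbent_total:
            return total, True  # already no better than the incumbent: stop early
    return total, False
```

EI rewards both a low predicted cost (the first term) and high uncertainty (the second term), which is how the criterion balances exploitation and exploration. Adaptive capping means that a slow challenger costs at most as much time as the incumbent needed, instead of running to completion.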


CPLEX 11 on Wildlife Corridor Design

[Figure: scatter plot with log-scaled axes from 10^-3 to 10^4; x-axis: runtime of default configuration [CPU s]; y-axis: runtime of configuration found by SMAC [CPU s].]

191× speedup on average!


3


Programming by Optimisation (PbO): HH (2010–12)

Key idea:

• program (large) spaces of programs

• encourage software developers to
  – avoid premature commitment to design choices
  – seek & maintain design alternatives

• automatically find performance-optimising designs for given use context(s)
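A minimal sketch of the idea: a quicksort whose insertion-sort cutoff and pivot rule are exposed as design choices instead of being hardwired as magic constants, plus a trivial exhaustive "design optimiser" that picks the design with the fewest comparisons on a set of benchmark inputs. The sorting example, the design space and the comparison-count metric are hypothetical illustrations, not taken from the PbO tools.

```python
import random
from itertools import product

# Design space: choices a developer might otherwise hardwire.
DESIGN_SPACE = {
    "cutoff": [0, 4, 8, 16, 32],          # below this size, switch to insertion sort
    "pivot": ["first", "middle", "median3"],
}


def sort(xs, cutoff, pivot, counter):
    """Quicksort with exposed design choices; counter[0] accumulates comparisons."""
    xs = list(xs)

    def less(a, b):
        counter[0] += 1
        return a < b

    def insertion(lo, hi):
        for i in range(lo + 1, hi + 1):
            j = i
            while j > lo and less(xs[j], xs[j - 1]):
                xs[j], xs[j - 1] = xs[j - 1], xs[j]
                j -= 1

    def choose_pivot(lo, hi):
        if pivot == "first":
            return xs[lo]
        mid = (lo + hi) // 2
        if pivot == "middle":
            return xs[mid]
        return sorted((xs[lo], xs[mid], xs[hi]))[1]  # median of three (not counted)

    def quick(lo, hi):
        if hi - lo + 1 <= max(cutoff, 1):
            insertion(lo, hi)
            return
        p = choose_pivot(lo, hi)
        i, j = lo, hi
        while i <= j:  # partition around the pivot value p
            while less(xs[i], p):
                i += 1
            while less(p, xs[j]):
                j -= 1
            if i <= j:
                xs[i], xs[j] = xs[j], xs[i]
                i += 1
                j -= 1
        quick(lo, j)
        quick(i, hi)

    quick(0, len(xs) - 1)
    return xs


def configure(inputs):
    """Exhaustively pick the design with the fewest total comparisons on the inputs."""
    best = None
    for cutoff, pivot in product(DESIGN_SPACE["cutoff"], DESIGN_SPACE["pivot"]):
        counter = [0]
        for xs in inputs:
            sort(xs, cutoff, pivot, counter)
        if best is None or counter[0] < best[0]:
            best = (counter[0], cutoff, pivot)
    return best


if __name__ == "__main__":
    rng = random.Random(42)
    benchmark = [[rng.randint(0, 1000) for _ in range(200)] for _ in range(5)]
    print(configure(benchmark))  # (comparisons, cutoff, pivot) for this use context
```

Different benchmark sets (for instance, mostly pre-sorted inputs) can lead to a different chosen design, which is the point of keeping the alternatives in the code.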


[Figure, two frames: first, each of application contexts 1, 2 and 3 uses a separate solver; then a single parametric solver solver[·] is instantiated as solver[p1], solver[p2] and solver[p3] for the three contexts.]

Levels of PbO:

Level 4: Make no design choice prematurely that cannot be justified compellingly.

Level 3: Strive to provide design choices and alternatives.

Level 2: Keep and expose design choices considered during software development.

Level 1: Expose design choices hardwired into existing code (magic constants, hidden parameters, abandoned design alternatives).

Level 0: Optimise settings of parameters exposed by existing software.


Success in optimising speed:

Application (#design choices)                Reference                                       Speedup    PbO level
SAT-based software verification (Spear, 41)  Hutter, Babić, HH, Hu (2007)                    4.5–500×   2–3
AI Planning (LPG, 62)                        Vallati, Fawcett, Gerevini, HH, Saetti (2011)   3–118×     1
Mixed integer programming (CPLEX, 76)        Hutter, HH, Leyton-Brown (2010)                 2–52×      0

... and solution quality:

• University timetabling, 18 design choices, PbO level 2–3: new state of the art; UBC exam scheduling (Fawcett, Chiarandini, HH 2009)

• Machine learning / Classification, 786 design choices, PbO level 0–1: outperforms specialised model selection & hyper-parameter optimisation methods from machine learning (Thornton, Hutter, HH, Leyton-Brown 2012–13)


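Software development in the PbO paradigm

[Figure: PbO-<L> source(s) are processed by the PbO-<L> weaver into parametric <L> source(s) and a design space description; the PbO design optimiser uses these together with benchmark inputs from the use context to produce instantiated <L> source(s), from which the deployed executable is built.]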


PbO enables . . .

• performance optimisation for different use contexts (as shown for many problems)

• adaptation to changing use contexts (see, e.g., life-long learning – Thrun 1996)

• self-adaptation while solving a given problem instance (e.g., Battiti et al. 2008; Carchrae & Beck 2005; Da Costa et al. 2008; Wessing et al. 2011)

• automated generation of instance-based solver selectors (e.g., SATzilla – Leyton-Brown et al. 2003, Xu et al. 2008; Hydra – Xu et al. 2010; ISAC – Kadioglu et al. 2010)

• automated generation of parallel solver portfolios (e.g., Huberman et al. 1997; Gomes & Selman 2001; Schneider et al. 2012)
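The last two items can be sketched in a few lines: a per-instance selector that picks, for a new instance, the solver that was fastest on the most similar training instance (1-nearest neighbour on instance features), and the wall-clock time of a parallel portfolio, which is that of its fastest member. The nearest-neighbour rule is a hypothetical simplification; SATzilla, Hydra and ISAC use more elaborate models, and real feature vectors and runtimes would come from actual solver runs.

```python
import math


def build_selector(training):
    """Build a per-instance selector from training data.

    training: list of (feature_vector, {solver_name: runtime}) pairs.
    The selector returns the solver that was fastest on the nearest training
    instance in feature space (1-nearest neighbour, Euclidean distance).
    """
    def select(features):
        _, runtimes = min(training, key=lambda t: math.dist(t[0], features))
        return min(runtimes, key=runtimes.get)
    return select


def portfolio_runtime(runtimes, portfolio):
    """Wall-clock time of running the solvers in `portfolio` in parallel on one
    instance: the first to finish decides (ignores resource contention)."""
    return min(runtimes[s] for s in portfolio)
```

With training data [((0.0, 1.0), {"A": 10.0, "B": 2.0}), ((5.0, 5.0), {"A": 1.0, "B": 30.0})], the selector picks "B" for features (0.2, 0.9) and "A" for (4.0, 6.0).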


Communications of the ACM, 55(2), pp. 70–80, February 2012

www.prog-by-opt.net


Take-home message:

• state-of-the-art algorithm configuration procedures enable effective selection and hyper-parameter optimisation of machine learning algorithms (Auto-WEKA)

• ... as well as an algorithm design approach that avoids premature commitment to design choices and leverages human creativity (PbO)

• ... this is just the beginning; lots of further work remains to be done (methodology, tools, applications)

Auto-WEKA paper, code: www.cs.ubc.ca/labs/beta/Projects/autoweka
