Page 1

Stochastic Optimization

Marc Schoenauer

DataScience @ LHC Workshop 2015

Page 2

Content

1 Setting the scene

2 Continuous Optimization

3 Discrete/Combinatorial Optimization

4 Non-Parametric Representations

5 Conclusion


Page 4

Stochastic Optimization

Hypotheses

Search Space Ω with some topological structure

Objective function F assume some weak regularity

Hill-Climbing

Randomly draw x_0 ∈ Ω and compute F(x_0)    Initialisation

Until(happy)

    y = Best neighbor(x_t)    neighbor structure on Ω

    Compute F(y)
    If F(y) ≤ F(x_t) then x_{t+1} = y    accept if improvement
    else x_{t+1} = x_t

Comments

Find closest local optimum defined by neighborhood structure

Iterate from different x0’s Until(very happy)
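A minimal Python sketch of this hill-climbing template, purely illustrative (F, neighbors and sample are user-supplied placeholders, assuming minimization):

    def hill_climb(F, neighbors, x0, max_iters=10_000):
        # Greedy hill-climbing: move to the best neighbor as long as it improves F
        x, fx = x0, F(x0)                      # Initialisation
        for _ in range(max_iters):             # Until(happy)
            y = min(neighbors(x), key=F)       # best neighbor w.r.t. the neighborhood structure
            fy = F(y)
            if fy <= fx:                       # accept if improvement
                x, fx = y, fy
            else:
                break                          # closest local optimum reached
        return x, fx

    def restarts(F, neighbors, sample, n_restarts=20):
        # Iterate from different x0's and keep the best local optimum found (Until(very happy))
        return min((hill_climb(F, neighbors, sample()) for _ in range(n_restarts)),
                   key=lambda pair: pair[1])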

Page 5

Stochastic Optimization

Hypotheses

Search Space Ω with some topological structure

Objective function F assume some weak regularity

Stochastic (Local) Search

Randomly draw x_0 ∈ Ω and compute F(x_0)    Initialisation

Until(happy)

    y = Random neighbor(x_t)    neighbor structure on Ω

    Compute F(y)
    If F(y) ≤ F(x_t) then x_{t+1} = y    accept if improvement
    else x_{t+1} = x_t

Comments

Find one close local optimum defined by neighborhood structure

Iterate, leaving current optimum Iterated Local Search

Page 6

Stochastic Optimization

Hypotheses

Search Space Ω with some topological structure

Objective function F assume some weak regularity

Stochastic (Local) Search – alternative viewpoint

Randomly draw x_0 ∈ Ω and compute F(x_0)    Initialisation

Until(happy)

    y = Move(x_t)    stochastic variation on Ω

    Compute F(y)
    If F(y) ≤ F(x_t) then x_{t+1} = y    accept if improvement
    else x_{t+1} = x_t

Comments

Find one close local optimum defined by the Move operator

Iterate, leaving current optimum Iterated Local Search

Page 7

Stochastic Optimization

Hypotheses

Search Space Ω with some topological structure

Objective function F assume some weak regularity

Stochastic Search aka Metaheuristics

Randomly draw x_0 ∈ Ω and compute F(x_0)    Initialisation

Until(happy)

    y = Move(x_t)    stochastic variation on Ω

    Compute F(y)
    x_{t+1} = Select(x_t, y)    stochastic selection, biased toward better points

Comments

Can escape local optima e.g., Select = Boltzmann selection

Iterate ?
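A minimal Python sketch of this template with a Boltzmann (Metropolis-style) acceptance rule; F, move, x0 and the cooling schedule are illustrative placeholders, not part of the slides:

    import math, random

    def stochastic_search(F, move, x0, temperature=1.0, cooling=0.999, max_iters=100_000):
        # Accepting worse points with probability exp(-(F(y)-F(x))/T) allows escaping local optima
        x, fx = x0, F(x0)
        best, fbest = x, fx
        for _ in range(max_iters):
            y = move(x)                                   # stochastic variation on Ω
            fy = F(y)
            if fy <= fx or random.random() < math.exp(-(fy - fx) / temperature):
                x, fx = y, fy                             # selection biased toward better points
            if fx < fbest:
                best, fbest = x, fx
            temperature *= cooling                        # one common annealing schedule
        return best, fbest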

Page 8

Stochastic Optimization

Hypotheses

Search Space Ω with some topological structure

Objective function F assume some weak regularity

Population-based Metaheuristics

Randomly draw x^i_0 ∈ Ω and compute F(x^i_0), i = 1, …, µ    Initialisation

Until(happy)

    y^i = Move(x^1_t, …, x^µ_t), i = 1, …, λ    stochastic variations on Ω

    Compute F(y^i), i = 1, …, λ

    (x^1_{t+1}, …, x^µ_{t+1}) = Select(x^1_t, …, x^µ_t, y^1, …, y^λ)    (µ + λ) selection
                              = Select(y^1, …, y^λ)    (µ, λ) selection

Evolutionary Metaphor: ’natural’ selection and ’blind’ variations

y^i = Move_m(x^i_t) for some i: mutation

y^i = Move_c(x^i_t, x^j_t) for some (i, j): crossover
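A compact Python sketch of the population-based template with mutation, crossover and (µ + λ) selection; F, sample, mutate and crossover are illustrative placeholders:

    import random

    def evolutionary_algorithm(F, sample, mutate, crossover, mu=10, lam=40, generations=200):
        # (µ + λ) evolutionary algorithm: parents compete with their offspring for survival
        parents = [sample() for _ in range(mu)]                      # Initialisation
        for _ in range(generations):                                 # Until(happy)
            offspring = []
            for _ in range(lam):                                     # 'blind' stochastic variations on Ω
                if random.random() < 0.5:
                    offspring.append(mutate(random.choice(parents)))         # mutation
                else:
                    offspring.append(crossover(*random.sample(parents, 2)))  # crossover
            pool = parents + offspring                               # (µ + λ) selection
            parents = sorted(pool, key=F)[:mu]                       # 'natural' selection: keep the µ best
        return min(parents, key=F)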

Page 9

Metaheuristics: an Alternative Viewpoint

Population and variation operators define a distribution on Ω

−→ directly evolve a parameterized distribution P (θ)

Distribution-based Metaheuristics

Initialize distribution parameters θ_0

Until(happy)

    Sample distribution P(θ_t) → y^1, …, y^λ ∈ Ω
    Evaluate y^1, …, y^λ on F
    Update parameters θ_{t+1} ← F_θ(θ_t, y^1, …, y^λ, F(y^1), …, F(y^λ))    includes selection

Covers

Estimation of Distribution Algorithms

Population-based Metaheuristics: Evolutionary Algorithms,
Particle Swarm Optimization, Differential Evolution, . . .

Deterministic algorithms

Page 10

This is (mostly) Experimental Science

Theory

lagging behind practice

though rapidly catching up in recent years

not discussed here

Results need to be experimentally validated

Experiments

Validation relies on

grounded statistical validation CPU costly

informed (meta)parameter setting/tuning CPU costly ++

reproducibility Open Science

Page 11

From Solution Space to Search Space

Choices

Objective function

Search space representation

and move operators / parameterized distribution

Approximation bias vs optimization error

This talk

Non-convex, non-smooth objective parametric continuous optimization

→ Rank-based ML seeded identification of influencers

Search spaces and crossover operators the TSP

Non-parametric representations exploration vs optimization

Page 12

Content

1 Setting the scene

2 Continuous Optimization: The Evolution Strategy · The CMA-ES (Covariance Matrix Adaptation Evolution Strategy) · The Rank-based DataScience

3 Discrete/Combinatorial Optimization

4 Non-Parametric Representations

5 Conclusion


Page 14

Stochastic Optimization Templates

Population-based Metaheuristics

Randomly draw x^i_0 ∈ Ω and compute F(x^i_0), i = 1, …, µ    Initialisation

Until(happy)

    y^i = Move(x^1_t, …, x^µ_t), i = 1, …, λ    stochastic variations on Ω

    Compute F(y^i), i = 1, …, λ

    (x^1_{t+1}, …, x^µ_{t+1}) = Select(x^1_t, …, x^µ_t, y^1, …, y^λ)    (µ + λ) selection
                              = Select(y^1, …, y^λ)    (µ, λ) selection

Distribution-based Metaheuristics

Initialize distribution parameters θ_0

Until(happy)

    Sample distribution P(θ_t) → y^1, …, y^λ ∈ Ω
    Evaluate y^1, …, y^λ on F
    Update parameters θ_{t+1} ← F_θ(θ_t, y^1, …, y^λ, F(y^1), …, F(y^λ))


Page 16

The (µ, λ)−Evolution Strategy

Ω = R^n    P(θ): normal distributions

Initialize distribution parameters m, σ, C, set population size λ ∈ N

While not terminate

    1  Sample distribution N(m, σ²C) → y_1, …, y_λ ∈ R^n
       y_i ∼ m + σ N_i(0, C) for i = 1, …, λ

    2  Evaluate y_1, …, y_λ on F
       Compute F(y_1), …, F(y_λ)

    3  Select the µ best samples y_1, …, y_µ

    4  Update distribution parameters
       m, σ, C ← F(m, σ, C, y_1, …, y_µ, F(y_1), …, F(y_µ))
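A minimal numpy sketch of this loop with fixed σ and C = I, i.e., before any adaptation (F, x0 and all parameter values are illustrative):

    import numpy as np

    def mu_lambda_es(F, x0, sigma=0.3, mu=5, lam=20, generations=300):
        # (µ, λ)-Evolution Strategy with an isotropic Gaussian and a constant step-size
        m = np.asarray(x0, dtype=float)
        n = m.size
        for _ in range(generations):
            Y = m + sigma * np.random.randn(lam, n)        # 1. sample y_i ~ m + σ N(0, I)
            fitness = np.array([F(y) for y in Y])          # 2. evaluate on F
            best = Y[np.argsort(fitness)[:mu]]             # 3. select the µ best samples
            m = best.mean(axis=0)                          # 4. update the mean ("crossover" of the µ best)
        return m, F(m)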

Page 17

The (µ, λ)−Evolution Strategy (2)

Gaussian Mutations

mean vector m ∈ R^n is the current solution

the so-called step-size σ ∈ R_+ controls the step length

the covariance matrix C ∈ R^{n×n} determines the shape of the distribution ellipsoid

How to update m, σ, and C?

m = (1/µ) Σ_{i=1}^µ x_{i:λ}    “Crossover” of the best µ individuals

Adaptive σ and C

Page 18

Need for adaptation

Sphere function f(x) = Σ_i x_i², n = 10

Page 19

Content

1 Setting the scene

2 Continuous Optimization: The Evolution Strategy · The CMA-ES (Covariance Matrix Adaptation Evolution Strategy) · The Rank-based DataScience

3 Discrete/Combinatorial Optimization

4 Non-Parametric Representations

5 Conclusion

Page 20

Cumulative Step-Size Adaptation (CSA)

Measure the length of the evolution path, i.e., the path of the mean vector m over the generation sequence

If consecutive steps cancel each other out (short path) → decrease σ
If consecutive steps point in the same direction (long path) → increase σ

Compare to a random walk without selection

Page 21–27

Covariance Matrix Adaptation – Rank-One Update

m ← m + σ y_w,    y_w = Σ_{i=1}^µ w_i y_{i:λ},    y_i ∼ N_i(0, C)

y_w is the movement of the population mean m (disregarding σ)

Starting from the initial distribution (C = I), mix the distribution C with the step y_w:

new distribution: C ← 0.8 × C + 0.2 × y_w y_w^T

ruling principle: the adaptation increases the probability of successful steps, y_w, to appear again
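A tiny numpy sketch of one such update step; the 0.8/0.2 mixing weights are the toy values used above, and the example inputs are arbitrary:

    import numpy as np

    def rank_one_update(C, y_w, c1=0.2):
        # Mix the current covariance C with the outer product of the successful step y_w
        y_w = y_w.reshape(-1, 1)                       # column vector
        return (1.0 - c1) * C + c1 * (y_w @ y_w.T)     # C ← 0.8·C + 0.2·y_w·y_w^T for c1 = 0.2

    # usage: start from C = I and feed in the weighted mean step of the selected offspring
    C = rank_one_update(np.eye(2), np.array([1.0, 0.5]))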

Page 28

CMA-ES in one slide

Input: m ∈ R^n, σ ∈ R_+, λ
Initialize: C = I, and p_c = 0, p_σ = 0
Set: c_c ≈ 4/n, c_σ ≈ 4/n, c_1 ≈ 2/n², c_µ ≈ µ_w/n², c_1 + c_µ ≤ 1, d_σ ≈ 1 + √(µ_w/n),
and w_{i=1…λ} such that µ_w = 1 / Σ_{i=1}^µ w_i² ≈ 0.3 λ

While not terminate

    x_i = m + σ y_i,  y_i ∼ N_i(0, C), for i = 1, …, λ    sampling

    m ← Σ_{i=1}^µ w_i x_{i:λ} = m + σ y_w  where  y_w = Σ_{i=1}^µ w_i y_{i:λ}    update mean

    p_c ← (1 − c_c) p_c + 1_{‖p_σ‖<1.5√n} √(1 − (1 − c_c)²) √µ_w y_w    cumulation for C

    p_σ ← (1 − c_σ) p_σ + √(1 − (1 − c_σ)²) √µ_w C^{−1/2} y_w    cumulation for σ

    C ← (1 − c_1 − c_µ) C + c_1 p_c p_c^T + c_µ Σ_{i=1}^µ w_i y_{i:λ} y_{i:λ}^T    update C

    σ ← σ × exp( (c_σ/d_σ) (‖p_σ‖ / E‖N(0, I)‖ − 1) )    update of σ

Not covered on this slide: termination, restarts, active or mirrored sampling,

outputs, and boundaries
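In practice one rarely re-implements these update rules; for instance, Hansen's pycma package (cma on PyPI, assumed installed here) exposes the ask/tell loop directly. A short usage sketch on the sphere function used earlier:

    import cma

    def sphere(x):                                       # f(x) = Σ x_i²
        return sum(xi * xi for xi in x)

    es = cma.CMAEvolutionStrategy(10 * [1.0], 0.5)       # m = (1, ..., 1) in R^10, initial σ = 0.5
    while not es.stop():
        solutions = es.ask()                             # sample λ candidates from N(m, σ²C)
        es.tell(solutions, [sphere(x) for x in solutions])   # selection + update of m, σ, C
    print(es.result.xbest, es.result.fbest)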

Page 29

Invariances: Guarantee for Generalization

Invariance properties of CMA-ES

Invariance to order-preserving transformations in function space

like all comparison-based algorithms

Translation and rotation invariance to rigid transformations of the search space


→ Natural Gradient in distribution space NES, Schmidhuber et al., 08

IGO, Ollivier et al., 11

CMA-ES is almost parameterless

Tuning done on a small set of functions    Hansen & Ostermeier 2001

Default values generalize to whole classes

Exception: population size for multi-modal functions, but see IPOP-CMA-ES    Auger & Hansen, 05

and BIPOP-CMA-ES Hansen, 09

Page 30

BBOB – Black-Box Optimization Benchmarking

ACM-GECCO workshops, 2009-2015 http://coco.gforge.inria.fr/

Set of 25 benchmark functions, dimensions 2 to 40

With known difficulties (ill-conditioning, non-separability, . . . )

Noisy and non-noisy versions

Competitors include

BFGS (Matlab version),

Fletcher-Powell,

DFO (Derivative-Free Optimization, Powell 04)

Differential Evolution

Particle Swarm Optimization

and many more

[Figure: empirical runtime distributions on f1–24, proportion of functions solved vs. log10(ERT / dimension), for all entrants from pure random search up to the IPOP- and BIPOP-CMA-ES variants and the best-2009 reference]


(should be) available in ROOT6 Benazera, Hansen, Kegl, 15

Page 32

Content

1 Setting the scene

2 Continuous Optimization: The Evolution Strategy · The CMA-ES (Covariance Matrix Adaptation Evolution Strategy) · The Rank-based DataScience

3 Discrete/Combinatorial Optimization

4 Non-Parametric Representations

5 Conclusion

Page 33

Seeded Influencer Identification    Philippe Caillou & Michele Sebag, coll. SME Augure

Goal

Find the influencers in a given social network

from public data sources tweets, blogs, articles, . . .

Commercial goal: so as to bribe them :-)

Scientific goal: with little user input

Page 34

What is an Influencer?

No clear definition

# followers?

Highly retweeted? but # retweets disagrees with # followers

Sources of information cascades?

# invitations to join?

Topic-specific PageRanking?

Presence on Wikipedia?

Need for

Topic-specific features

and graph features

and user input no generic ground truth

Page 35

Seeded Influencer Ranking

Input

A social network

(Big) Data of user interactions Tweets, blogs, messages, . . .

Some identified influencers

The global picture

Derive features representing the users using content and traces

Optimize scoring function giving highest scores to known influencers

Return the top-k scoring users    the most (known + new) influencers

Page 36

Search Space and Learning Criterion

The scoring function

Each individual is described by d features: x_i ∈ IR^d

Non-linear score h defined by a pair (v, a) ∈ IR^d × IR^d

h_{v,a}(x) = Σ_{i=1}^d v_i |x_i − a_i|

A rank-based criterion

Each score h_{v,a} induces a ranking R_{v,a} on the dataset

Goal: known influencers (KIs) ranked highest    best has rank n

Non-convex optimization

ArgMax_{(v,a)} Σ_{x_i ∈ KI} R_{v,a}(x_i)

−→ Stochastic Optimizer    e.g., sep-CMA-ES
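A numpy sketch of this criterion (the feature matrix X, the index set of known influencers and the argsort-based ranking are illustrative; the resulting objective is then handed to a stochastic optimizer such as sep-CMA-ES):

    import numpy as np

    def score(X, v, a):
        # h_{v,a}(x) = Σ_i v_i |x_i - a_i| for every row x of the feature matrix X
        return np.abs(X - a) @ v

    def rank_based_fitness(X, known_influencers, v, a):
        # Sum of the ranks of the known influencers (highest score gets rank n); to be maximized
        s = score(X, v, a)
        ranks = np.empty(len(s))
        ranks[np.argsort(s)] = np.arange(1, len(s) + 1)
        return ranks[known_influencers].sum()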

Page 37

Data

Domains

Fashion: 18/23 influencers with twitter account

SmartPhones: 69/206 influencers with twitter account

Social Influence: 39/39 influencers with twitter account

Wine: 28/235 influencers with twitter account, coming from public sources (e.g., Time Magazine)

Human Resources: 99 influencers from the 100 most influential people in HR and recruiting on twitter

Sources

random 10% of all retweets of November 2014

100M tweets, 45M unique tweets,

with origin-destination pairs → 43M nodes graph

only words in at least 0.001% and at most 10% of tweets considered → 10.5M candidates

Page 38

Features

100 Content-based Features

for each medium, possibly

For each user x, identify W(x), the N words with max tf-idf    term frequency – inverse document frequency

50 words that appear most often in all W(influencer)

50 words with max sum of tf-idf over ⋃ W(influencer)

each selected word is a feature for x: 0 if not present in W(x), tf-idf otherwise    many are null for most candidates

6 Network Features

from the weighted graph of retweets centrality, PageRank, . . .

5 Social Features

from twitter user profiles    # tweets, # followers, . . .

Page 39

Seeded Influencer Identification: Discussion

Results

Better than using only social features

Found unexpected influencers

Confirmed by experts

Forthcoming application

Unveil some influential xxx-sceptics    global warming, genocides, . . .

Page 40

Content

1 Setting the scene

2 Continuous Optimization

3 Discrete/Combinatorial Optimization: The Crossover Controversy · The TSP Illustration

4 Non-Parametric Representations

5 Conclusion


Page 42

Stochastic Optimization Templates (reminder)

Population-based Metaheuristics

Randomly draw x^i_0 ∈ Ω and compute F(x^i_0), i = 1, …, µ    Initialisation

Until(happy)

    y^i = Move(x^1_t, …, x^µ_t), i = 1, …, λ    stochastic variations on Ω

    Compute F(y^i), i = 1, …, λ

    (x^1_{t+1}, …, x^µ_{t+1}) = Select(x^1_t, …, x^µ_t, y^1, …, y^λ)    (µ + λ) selection
                              = Select(y^1, …, y^λ)    (µ, λ) selection

Distribution-based Metaheuristics

Initialize distribution parameters θ_0

Until(happy)

    Sample distribution P(θ_t) → y^1, …, y^λ ∈ Ω
    Evaluate y^1, …, y^λ on F
    Update parameters θ_{t+1} ← F_θ(θ_t, y^1, …, y^λ, F(y^1), …, F(y^λ))


Page 44–46

Crossover or not Crossover

Issues with crossover

Crossover is ubiquitous in nature    though not in all bacteria

Shown to mainly work on dynamical problems    compared to ILS, SA, Ch. Papadimitriou et al., 15

Yes, but

Most artificial crossover operators exchange chunks of solutions

Nature does not exchange “organs”, but chunks of programs that build the solution

Better no crossover than a poor crossover

Page 47

Content

1 Setting the scene

2 Continuous Optimization

3 Discrete/Combinatorial Optimization: The Crossover Controversy · The TSP Illustration

4 Non-Parametric Representations

5 Conclusion

Page 48

The Traveling Salesman Problem

Find shortest Hamiltonian circuit on n cities

dk11455.tsp

Permutation Representation

Ordered list of cities: no semantic link with the objective

Blind exchange of chunks is meaningless, and leads to unfeasible circuits

Edge Representation

List of all edges of the tour: unordered

clearly related to objective

but blind exchange of edges doesn't work either

Challenge or opportunity?

A strong exact solver Concorde

A very strong heuristic solver LKH-2
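To make the neighborhood structure concrete, a small Python sketch of 2-opt local search on the permutation representation; this is a toy baseline, nothing like LKH-2, and dist is an assumed distance matrix:

    def tour_length(tour, dist):
        # Total length of the closed tour for a distance matrix dist[i][j]
        return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))

    def two_opt(tour, dist):
        # Repeatedly reverse the segment between two positions whenever it shortens the tour
        improved = True
        while improved:
            improved = False
            for i in range(1, len(tour) - 1):
                for j in range(i + 1, len(tour)):
                    candidate = tour[:i] + tour[i:j][::-1] + tour[j:]
                    if tour_length(candidate, dist) < tour_length(tour, dist):
                        tour, improved = candidate, True
        return tour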

Page 49

Lin-Kernighan-Helsgaun heuristic Helsgaun, 06

LKH-2

Local Search

Deep k-opt moves efficient implementation

Many ad hoc heuristics

Multi-trial LKH-2

Iterated Local Search

Using . . . a crossover operator between best-so-far and new local optima

Iterative Partial Transcription, Mobius et al., 99-08

a particular case of GPX    next slide

Holds most of the best known results on non-exactly-solved instances, up to 10M cities

Page 50

Generalized Partition Crossover (GPX) Whitley, Hains, & Howe, 09-12

Respectful and Transmitting crossover

Find k partitions of cost 2, if they exist    O(n)

Chooses the best of 2^k − 2 solutions    O(k)

All common edges are kept
No new edge is created

Two possible hybridizations

Evolutionary Algorithm using GPX and iterated 2-opt as mutation operator

Some best results on small instances    < 10k cities

Replacing IPT in the LKH-2 heuristic
Improved some state-of-the-art results

Page 51

Edge Assembly Crossover Nagata & Kobayashi, 06-13

The EAX crossover

1 Merge A and B

2 Alternate A/B edges

3 Local, then global choice strategy

4 Use sub-tours to remove parent edges

5 Local search for minimal new edges

The EAX algorithm

For each randomly chosen parent pair    typically 300 iterations

apply EAX N times    typically 30

keep best of parents + N offspring    diversity-preserving selection

Best known results on several instances . . . ≤ 200k cities

Page 52

Evolutionary TSP: Discussion

Meta or not Meta?

GPX can be applied to other graph problems

EAX is definitely a specialized TSP operator

Take Home Message

Efficient crossovers might exist

for semantically relevant representations

but require domain knowledge . . . and fine-tuning

Page 53

Content

1 Setting the scene

2 Continuous Optimization

3 Discrete/Combinatorial Optimization

4 Non-Parametric Representations: The Velocity Identification · The Planning Decomposition

5 Conclusion


Page 55

Identification of Spatial Geological Models

[Figure: seismic source and line of measure points]

The direct problem

Function φ

applied to model M
Find result R = φ(M)    → numerical simulation

The inverse problem

Given experimental results R*
find model M* such that φ(M*) = R*

→ ArgMin_M ‖φ(M) − R*‖²

MS, Ehinger, Braunschweig, 98

(simulated) 3D seismogram

New first seismograms: shot 11

[Figure: plane/trace display of the 120 × 2D seismograms, traces 10–120, values 0.00–2.04]

Page 56

Voronoi Representation

The solution space

Velocities in some 3D domain

Blocky model piecewise constant

Values on a grid/mesh

Thousands of variables

Uncorrelated

Voronoi representation

List of labelled Voronoi sites

Velocity = site label inside each cell

Variation Operators

Geometric exchange crossover

Add, remove, move sites mutations

[Figures: a variable-length unordered list of Voronoi sites with the corresponding 'colored' Voronoi diagram, and the crossover operator producing Offspring 1 / Offspring 2 from Parent 1 / Parent 2]
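A minimal sketch of how such a genotype can be decoded into a blocky velocity model (nearest labelled site gives the value in each cell); the data layout is illustrative, not the original implementation:

    import numpy as np

    def decode_voronoi(sites, labels, grid_points):
        # sites: (k, 3) site coordinates (variable-length genotype), labels: (k,) velocities,
        # grid_points: (m, 3) points where the piecewise-constant model is evaluated
        d2 = ((grid_points[:, None, :] - sites[None, :, :]) ** 2).sum(axis=2)
        return labels[d2.argmin(axis=1)]     # velocity = label of the nearest Voronoi site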

Page 57

Other Voronoi successes

Evolutionary chairs

Minimize weight

Constraint on stiffness

Permanent collection of the Beaubourg Modern Art Museum

The Seroussi architectural competition

A building for private exhibitions

Optimize natural lighting of paintings all year round

Room seeds for adaptive room design

1st prize (out of 8 submissions)

Page 58

Content

1 Setting the scene

2 Continuous Optimization

3 Discrete/Combinatorial Optimization

4 Non-Parametric Representations: The Velocity Identification · The Planning Decomposition

5 Conclusion

Page 59

The TGV paradigm

Improve a dummy algorithm

Cannot directly go from I to G
Can go sequentially from I to the intermediate stations, and then to G

→ evolve the sequence of intermediate ’stations’

Representation

Variable-length ordered lists of intermediate 'stations' (1, 2, 3, …)

Variation operators

Chromosome-like swap variable-length crossover

Move, add or remove intermediate stations mutations

Page 60

Evolutionary AI Planning MS, P. Saveant, V. Vidal, 06

AI Planning problem (S,A, I, G)

Given a state space S and a set of actions A conditions, effects

An initial state I and a goal state G possibly partially defined

Find optimal sequence of actions from I to G.

Many complete / satisficing planners    biennial IPC since 1998

Evolving intermediate states

Search space: Variable-length ordered lists of states (S1, . . . , Sl)

Solve (S, A, S_i, S_{i+1}) using a given planner    “dummy”

All sub-problems solved → solution of initial problem
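As a sketch, evaluating one individual only requires calling an off-the-shelf planner on each consecutive pair of states; solve_subproblem below is a placeholder for that planner, not an actual API:

    def evaluate_station_sequence(stations, initial, goal, solve_subproblem):
        # stations: the variable-length list of intermediate states (S_1, ..., S_l)
        # solve_subproblem(s, t) is assumed to return a plan from s to t, or None if the
        # 'dummy' planner fails within its budget
        waypoints = [initial] + list(stations) + [goal]
        plan = []
        for s, t in zip(waypoints, waypoints[1:]):
            sub_plan = solve_subproblem(s, t)
            if sub_plan is None:               # one unsolved sub-problem -> no global solution
                return None
            plan.extend(sub_plan)
        return plan                            # concatenating the sub-plans solves the initial problem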

Highlights    J. Bibaï et al., 09-11

Solves instances where ’dummy’ planner fails

Silver Medal, Humies Awards, GECCO 2010

Winner, satisficing track, IPC 2011

First multi-objective planner, IJCAI 2013 M. Khouadjia et al.

Page 61

Non-Parametric Representations: Discussion

Further examples

Spatial geological models: horizontal layers + geological parameters    patented

Neural Networks    HyperNEAT, Stanley 07; Compressed NNs, Schmidhuber 10

Non-parametric representations

Explore larger search spaces, but with some constraints → explore a sub-variety of the whole space

Modifies both the approximation bias    how far from the true optimum is the best solution under the streetlight?

and the optimization error our algorithm is not perfect

Optimization as an exploratory tool

Page 62

Content

1 Setting the scene

2 Continuous Optimization

3 Discrete/Combinatorial Optimization

4 Non-Parametric Representations

5 Conclusion

Page 63

The Black-Box Hypothesis

Meta or not Meta?

A continuum of embedded classes of problems
Where does Black Box start?    e.g., GPX and EAX

Continuous optimization: at least the dimension is known

The more information the better    e.g., multimodality for restarts

Learn about F during the search → online tuning beyond distribution parameters

Page 64

Not Covered Today

General

Statistical tests Tutorials available

From parameter setting Programming by Optimization, Hoos, 2014

to hyper-heuristics Burke et al., 08-15

Multi-objective Full track with own conferences

Continuous

Surrogates ML helping Optimization

Constraints from Lagrangian to specific approaches

Noise handling    e.g., using Hoeffding or Bernstein inequalities

Other Representations/Paradigms

Genetic Programming Search a space of programs

Generative Developmental Systems    … that build the solution

Page 65

Conclusion

Flexibility