
Page 1: Evolutionary (deep) neural network

Evolutionary Deep Neural Network (or NeuroEvolution)

신수용, 2017. 8. 29

@SNU TensorFlow Study

Page 2: Evolutionary (deep) neural network

https://www.youtube.com/watch?v=aeWmdojEJf0
https://github.com/ssusnic/Machine-Learning-Flappy-Bird

Page 3: Evolutionary (deep) neural network


Evolutionary DNN

• Usually used to determine the DNN structure

– Number of layers, number of nodes, etc.

• Can also be used to determine the weight values

– Flappy bird example

Page 4: Evolutionary (deep) neural network

Evolutionary Computation

Page 5: Evolutionary (deep) neural network


Biological Basis

• Biological systems adapt themselves to a new environment by evolution.

• Biological evolution

– Production of descendants that differ from their parents

– Selective survival of some of these descendants to produce more descendants

Survival of the Fittest

Page 6: Evolutionary (deep) neural network


Evolutionary Computation

• Stochastic search (or problem solving) techniques that mimic the metaphor of natural biological evolution.

Page 7: Evolutionary (deep) neural network

General Framework

• Generate an initial population of solutions

• Evaluate fitness (using the fitness function)

• Terminated? If yes, output the best solution found; if no, continue

• Select parent individuals (selection operator)

• Generate offspring (crossover and mutation operators), evaluate their fitness, and loop back to the termination test
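A minimal sketch of this loop in Python. All names are illustrative: fitness, select_parent, crossover, and mutate are placeholders to be filled in for a concrete problem.

import random

def evolve(init_population, fitness, select_parent, crossover, mutate,
           generations=100):
    """Generic evolutionary loop: evaluate, select, recombine, mutate."""
    population = list(init_population)
    for _ in range(generations):
        scored = [(fitness(ind), ind) for ind in population]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        if terminated(scored):
            break
        # Produce the next generation from selected parents.
        population = [
            mutate(crossover(select_parent(scored), select_parent(scored)))
            for _ in range(len(population))
        ]
    return max(population, key=fitness)

def terminated(scored):
    # Placeholder termination test: stop once the best fitness is high enough.
    return scored[0][0] >= 0.999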

Page 8: Evolutionary (deep) neural network

Paradigms in EC

• Genetic Algorithm (GA) – [J. Holland, 1975]

– Bitstrings, mainly crossover, proportionate selection

• Genetic Programming (GP) – [J. Koza, 1992]

– Trees, mainly crossover, proportionate selection

• Evolutionary Programming (EP) – [L. Fogel et al., 1966]

– FSMs, mutation only, tournament selection

• Evolution Strategy (ES) – [I. Rechenberg, 1973]

– Real values, mainly mutation, ranking selection

Page 9: Evolutionary (deep) neural network

Genetic Algorithms

Page 10: Evolutionary (deep) neural network

GA (Genetic Algorithms)

• A method that obtains fit hypotheses by mimicking natural genetic processes

• Characteristics
– Evolution is a successful and sophisticated adaptation mechanism in nature
– Applicable even to complex problems that are hard to model
– Parallelizable, so it can benefit from hardware performance

• A general optimization process that searches a large search space for the solution with the best fitness

• Not guaranteed to find the optimal solution, but can obtain solutions with high fitness

Page 11: Evolutionary (deep) neural network

Basic GA Terms (1/2)

• Chromosome / Individual

– A possible solution (or hypothesis) for the given problem

– Usually represented as a string

– The string's elements (integers, reals, etc.) are chosen as the problem requires

• Population

– A set of individuals (hypotheses)

1 1 0 1 0 0 1 1

Page 12: Evolutionary (deep) neural network

Basic GA Terms (2/2)

• Fitness
– A numeric measure of how good a hypothesis is

– A value evaluating how well each individual fits its environment

– For optimization problems, typically the objective function value, or a penalty function value when constraints must be taken into account

• Fitness function
– The criterion used to compute the fitness

Page 13: Evolutionary (deep) neural network

GA Operators (1/5)

• Selection operator

– Selects individuals to serve as parents

– To produce many good offspring (and thus find a solution), individuals with better fitness are given a relatively higher probability of being selected, as sketched below.

– Proportional (roulette wheel) selection

– Tournament selection

– Ranking-based selection
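A minimal sketch of roulette-wheel (fitness-proportionate) selection, assuming non-negative fitness values; function and variable names are illustrative.

import random

def roulette_wheel_select(population, fitnesses):
    """Pick one individual with probability proportional to its fitness."""
    total = sum(fitnesses)
    pick = random.uniform(0.0, total)
    running = 0.0
    for individual, fit in zip(population, fitnesses):
        running += fit
        if running >= pick:
            return individual
    return population[-1]  # guard against floating-point rounding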

Page 14: Evolutionary (deep) neural network

GA Operators (2/5)

• Crossover operator

– Creates offspring by crossing the parents' chromosomes, much as organisms do in sexual reproduction.

– Whether crossover is actually performed is decided by a probability called the crossover rate.

Page 15: Evolutionary (deep) neural network

GA Operators (3/5)

– One-point crossover (crossover point after the fifth bit)

Parent 1: 1 1 0 1 0 | 0 1 1
Parent 2: 0 1 1 1 0 | 1 1 0

Child 1:  1 1 0 1 0 | 1 1 0
Child 2:  0 1 1 1 0 | 0 1 1
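A minimal sketch of one-point crossover on bitstrings; the function name and list representation are illustrative.

import random

def one_point_crossover(parent1, parent2):
    """Swap the tails of two equal-length bit lists at a random cut point."""
    assert len(parent1) == len(parent2)
    point = random.randint(1, len(parent1) - 1)  # cut strictly inside the string
    child1 = parent1[:point] + parent2[point:]
    child2 = parent2[:point] + parent1[point:]
    return child1, child2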

Page 16: Evolutionary (deep) neural network

GA Operators (4/5)

• Mutation operator

– Flips a bit with a given probability called the mutation rate

– Applied with a very small probability, e.g. 0.001

Original: 1 1 0 1 0 0 1 1
Mutated:  1 1 0 1 1 0 1 1   (fifth bit flipped)
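A matching sketch of bit-flip mutation; names are illustrative, and "rate" corresponds to the mutation rate above.

import random

def bit_flip_mutation(chromosome, rate=0.001):
    """Independently flip each bit with probability `rate`."""
    return [bit ^ 1 if random.random() < rate else bit
            for bit in chromosome]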

Page 17: Evolutionary (deep) neural network


Example of Genetic Algorithm

Page 18: Evolutionary (deep) neural network


Hypothesis Space Search

• Compared with other search methods

– Less likely to get stuck in local minima (large jumps in the search space are possible)

• Crowding

– The phenomenon in which similar individuals come to occupy a large fraction of the population

– Reduces diversity

Page 19: Evolutionary (deep) neural network

Crowding

• Remedies for crowding

– Change the selection method
• Tournament selection, ranking selection

– "Fitness sharing"
• Reduce an individual's fitness when many similar individuals are present (a sketch follows below)

– Restrict which individuals can mate
• Making the most similar individuals mate with each other forms clusters or multiple subspecies
• Distribute individuals spatially and allow mating only between nearby individuals
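A minimal sketch of fitness sharing, assuming a user-supplied distance function and a sharing radius sigma_share; all names are illustrative.

def shared_fitness(raw_fitness, index, population, distance, sigma_share=1.0):
    """Divide raw fitness by a niche count that grows with nearby individuals."""
    niche_count = 0.0
    for other in population:
        d = distance(population[index], other)
        if d < sigma_share:
            niche_count += 1.0 - d / sigma_share  # triangular sharing kernel
    return raw_fitness / niche_count  # niche_count >= 1 (self-distance is 0)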

Page 20: Evolutionary (deep) neural network


Typical behavior of an EA

• Phases in optimizing on a 1-dimensional fitness landscape

Early phase: quasi-random population distribution

Mid-phase: population arranged around/on hills

Late phase: population concentrated on high hills

Page 21: Evolutionary (deep) neural network


Geometric Analogy - Mathematical Landscape

Page 22: Evolutionary (deep) neural network

Typical run: progression of fitness

Typical run of an EA shows so-called "anytime behavior"

(Plot: best fitness in the population vs. time in number of generations)

Page 23: Evolutionary (deep) neural network

Are long runs beneficial?

(Plot: best fitness in the population vs. time in number of generations, contrasting progress in the 1st half of the run with progress in the 2nd half)

• Answer:
– It depends on how much you want the last bit of progress
– It may be better to do more, shorter runs

Page 24: Evolutionary (deep) neural network

ECs as problem solvers: Goldberg's 1989 view

(Plot: performance of methods across the scale of "all" problems. A special, problem-tailored method performs best on a narrow range of problems, random search performs uniformly poorly, and an evolutionary algorithm performs reasonably well across the whole range.)

Page 25: Evolutionary (deep) neural network


Advantages of EC

• No presumptions w.r.t. problem space

• Widely applicable

• Low development & application costs

• Easy to incorporate other methods

• Solutions are interpretable (unlike NN)

• Can be run interactively, accommodate user proposed solutions

• Provide many alternative solutions

Page 26: Evolutionary (deep) neural network


Disadvantages of EC

• No guarantee for optimal solution within finite time

• Weak theoretical basis

• May need parameter tuning

• Often computationally expensive, i.e. slow

Page 27: Evolutionary (deep) neural network

Genetic Programming

Page 28: Evolutionary (deep) neural network


Genetic Programming

• Genetic programming uses variable-size tree-representations rather than fixed-length strings of binary values.

• Program tree

= S-expression

= LISP parse tree

• Tree = Functions (Nonterminals) + Terminals

Page 29: Evolutionary (deep) neural network


GP Tree: An Example

• Function set: internal nodes

– Functions, predicates, or actions which take one or more arguments

• Terminal set: leaf nodes

– Program constants, actions, or functions which take no arguments

S-expression: (+ 3 (/ (− 5 4) 7))
Terminals = {3, 4, 5, 7}
Functions = {+, −, /}

Page 30: Evolutionary (deep) neural network

30

Tree based representation

• Trees are a universal form, e.g. consider

• Arithmetic formula: 2π + ((x + 3) − y / (5 + 1))

• Logical formula: (x ∧ true) → ((x ∨ y) ∨ (z ↔ (x ∧ y)))

• Program:

i = 1;
while (i < 20)
{
    i = i + 1;
}
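A minimal sketch of evaluating such a program tree, using a nested-tuple encoding of S-expressions; all names are illustrative, and protected division is an assumption (a common GP convention, not stated on the slide).

import operator

# Function set: symbol -> implementation (protected division avoids /0).
FUNCTIONS = {
    '+': operator.add,
    '-': operator.sub,
    '/': lambda a, b: a / b if b != 0 else 1.0,
}

def evaluate(tree, env):
    """Evaluate a nested-tuple S-expression; leaves are constants or variables."""
    if isinstance(tree, tuple):
        fn = FUNCTIONS[tree[0]]
        return fn(*(evaluate(arg, env) for arg in tree[1:]))
    return env.get(tree, tree)  # variable lookup, otherwise a constant

# Example: (+ 3 (/ (- 5 4) 7))
print(evaluate(('+', 3, ('/', ('-', 5, 4), 7)), env={}))  # 3.1428...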

Page 31: Evolutionary (deep) neural network


Tree based representation

• In GA, ES, and EP, chromosomes are linear structures (bit strings, integer strings, real-valued vectors, permutations)

• Tree-shaped chromosomes are non-linear structures

• In GA, ES, and EP, the size of the chromosome is fixed

• Trees in GP may vary in depth and width

Page 32: Evolutionary (deep) neural network

Crossover: Subtree Exchange

(Diagram: two parent expression trees built from +, a, and b exchange randomly chosen subtrees, producing two offspring trees)
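A minimal sketch of subtree crossover, representing program trees as nested tuples such as ('+', 'a', 'b'); the representation and helper names are illustrative.

import random

def subtree_indices(tree, path=()):
    """List the paths of all subtrees in a nested-tuple program tree."""
    paths = [path]
    if isinstance(tree, tuple):
        for i, child in enumerate(tree[1:], start=1):
            paths += subtree_indices(child, path + (i,))
    return paths

def get_at(tree, path):
    for i in path:
        tree = tree[i]
    return tree

def replace_at(tree, path, new_subtree):
    if not path:
        return new_subtree
    i = path[0]
    return tree[:i] + (replace_at(tree[i], path[1:], new_subtree),) + tree[i + 1:]

def subtree_crossover(parent1, parent2):
    """Swap a random subtree of parent1 with a random subtree of parent2."""
    p1 = random.choice(subtree_indices(parent1))
    p2 = random.choice(subtree_indices(parent2))
    child1 = replace_at(parent1, p1, get_at(parent2, p2))
    child2 = replace_at(parent2, p2, get_at(parent1, p1))
    return child1, child2

For example, crossing ('+', 'a', 'b') with ('+', ('+', 'a', 'a'), 'b') may swap the leaf 'a' with the subtree ('+', 'a', 'a').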

Page 33: Evolutionary (deep) neural network

Mutation

(Diagram: a randomly chosen subtree of the parent tree is replaced by a newly generated random subtree)

Page 34: Evolutionary (deep) neural network

Evolution strategies

Page 35: Evolutionary (deep) neural network

ES quick overview

• Developed: Germany in the 1970s

• Early names: I. Rechenberg, H.-P. Schwefel

• Typically applied to:
– numerical optimisation

• Attributed features:
– fast
– good optimizer for real-valued optimisation
– relatively much theory

• Special:
– self-adaptation of (mutation) parameters is standard

Page 36: Evolutionary (deep) neural network

ES technical summary

Representation       Real-valued vectors
Recombination        Discrete or intermediary
Mutation             Gaussian perturbation
Parent selection     Uniform random
Survivor selection   (μ,λ) or (μ+λ)
Specialty            Self-adaptation of mutation step sizes

Page 37: Evolutionary (deep) neural network

Introductory example

• Task: minimise f : ℝⁿ → ℝ

• Algorithm: "two-membered ES" using

– Vectors from ℝⁿ directly as chromosomes

– Population size 1

– Only mutation, creating one child

– Greedy selection
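A minimal sketch of this two-membered (1+1)-ES under the stated assumptions, with a fixed mutation step size sigma; in practice the step size would be adapted (e.g. by the 1/5 success rule). Names are illustrative.

import random

def two_membered_es(f, x0, sigma=0.1, iterations=10_000):
    """(1+1)-ES: mutate the single parent with Gaussian noise, keep the better."""
    parent = list(x0)
    best = f(parent)
    for _ in range(iterations):
        child = [xi + random.gauss(0.0, sigma) for xi in parent]
        fc = f(child)
        if fc <= best:  # minimisation: greedy (plus) selection
            parent, best = child, fc
    return parent, best

# Example: minimise the sphere function f(x) = sum(x_i^2).
if __name__ == "__main__":
    solution, value = two_membered_es(lambda x: sum(xi * xi for xi in x),
                                      x0=[5.0, -3.0])
    print(solution, value)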

Page 38: Evolutionary (deep) neural network


Parent selection

• Parents are selected by uniform random distribution whenever an operator needs one/some

• Thus: ES parent selection is unbiased – every individual has the same probability of being selected

• Note that in ES “parent” means a population member (in GA’s: a population member selected to undergo variation)

Page 39: Evolutionary (deep) neural network


Survivor selection

• Applied after creating children from the parents by mutation and recombination

• Deterministically chops off the “bad stuff”

• Basis of selection is either:

– The set of children only: (μ,λ)-selection

– The set of parents and children: (μ+λ)-selection

Page 40: Evolutionary (deep) neural network

Survivor selection cont'd

• (μ+λ)-selection is an elitist strategy

• (μ,λ)-selection can "forget"

• Often (μ,λ)-selection is preferred for:
– Better at leaving local optima
– Better at following moving optima
– With the plus strategy, bad σ values can survive in ⟨x, σ⟩ too long if their host x is very fit

• Selective pressure in ES is very high (λ ≈ 7·μ is the common setting)

Page 41: Evolutionary (deep) neural network

Evolutionary Programming

Page 42: Evolutionary (deep) neural network

EP quick overview

• Developed: USA in the 1960s

• Early names: L. Fogel

• Typically applied to:
– traditional EP: machine learning tasks by finite state machines
– contemporary EP: (numerical) optimization

• Attributed features:
– very open framework: any representation and mutation op's OK
– crossbred with ES (contemporary EP)
– consequently: hard to say what "standard" EP is

• Special:
– no recombination
– self-adaptation of parameters is standard (contemporary EP)

Page 43: Evolutionary (deep) neural network

EP technical summary tableau

Representation       Real-valued vectors
Recombination        None
Mutation             Gaussian perturbation
Parent selection     Deterministic
Survivor selection   Probabilistic (μ+λ)
Specialty            Self-adaptation of mutation step sizes (in meta-EP)

Page 44: Evolutionary (deep) neural network

Evolutionary Neural Networks (or Neuro-evolution)

Page 45: Evolutionary (deep) neural network


ENN

• The back-propagation learning algorithm cannot guarantee an optimal solution.

• In real-world applications, the back-propagation algorithm might converge to a set of sub-optimal weights from which it cannot escape.

• As a result, the neural network is often unable to find a desirable solution to a problem at hand.

Page 46: Evolutionary (deep) neural network


ENN

• Another difficulty is related to selecting an optimal topology for the neural network.

– The “right” network architecture for a particular problem is often chosen by means of heuristics, and designing a neural network topology is still more art than engineering.

• Genetic algorithms are an effective optimization technique that can guide both weight optimization and topology selection.

Page 47: Evolutionary (deep) neural network

Encoding a set of weights in a chromosome

(Diagram: a feedforward network with inputs x1, x2, x3 (neurons 1–3), hidden neurons 4–7, and output neuron 8 producing y; each connection weight becomes one gene.)

Weight matrix (from neuron → to neuron; rows for the input neurons are all zero):

To neuron 4:   0.9  -0.3  -0.7   (from neurons 1, 2, 3)
To neuron 5:  -0.8   0.6   0.3   (from neurons 1, 2, 3)
To neuron 6:   0.1  -0.2   0.2   (from neurons 1, 2, 3)
To neuron 7:   0.4   0.5   0.8   (from neurons 1, 2, 3)
To neuron 8:  -0.6   0.1  -0.2   0.9   (from neurons 4, 5, 6, 7)

Chromosome: 0.9 -0.3 -0.7 -0.8 0.6 0.3 0.1 -0.2 0.2 0.4 0.5 0.8 -0.6 0.1 -0.2 0.9

Page 48: Evolutionary (deep) neural network


Fitness function

• The second step is to define a fitness function for evaluating the chromosome’s performance.

– This function must estimate the performance of a given neural network.

– A simple fitness function is the sum of squared errors over the training set (lower is better).
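A minimal sketch of evaluating such a chromosome for the 3-4-1 network above: the flat gene list is reshaped into the two weight layers, and fitness is the negative sum of squared errors. All function names are illustrative, and biases are omitted as on the slide.

import math

def decode(chromosome):
    """Split the 16 genes into hidden weights (4 rows of 3) and output weights (4)."""
    hidden = [chromosome[i * 3:(i + 1) * 3] for i in range(4)]  # neurons 4-7
    output = chromosome[12:16]                                  # neuron 8
    return hidden, output

def forward(chromosome, x):
    hidden_w, output_w = decode(chromosome)
    sigmoid = lambda s: 1.0 / (1.0 + math.exp(-s))
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in hidden_w]
    return sigmoid(sum(w * hi for w, hi in zip(output_w, h)))

def fitness(chromosome, data):
    """Negative sum of squared errors over (input, target) pairs."""
    return -sum((forward(chromosome, x) - t) ** 2 for x, t in data)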

Page 49: Evolutionary (deep) neural network

Crossover

(Diagram: two parent networks with identical topology; single-point crossover on their weight chromosomes produces a child that takes its first genes from Parent 1 and the rest from Parent 2.)

Parent 1: 0.1 -0.7 -0.6 0.5 -0.8 | -0.2 0.9 0.4 -0.3
Parent 2: 0.3 0.2 0.3 -0.9 0.6 | 0.9 -0.5 -0.8 -0.1

Child:    0.1 -0.7 -0.6 0.5 -0.8 | 0.9 -0.5 -0.8 -0.1

Page 50: Evolutionary (deep) neural network

Mutation

(Diagram: mutation perturbs randomly selected genes in the weight chromosome; here the genes 0.4 and -0.3 are replaced by -0.1 and 0.2.)

Original network: 0.1 -0.7 -0.6 0.5 -0.8 -0.2 0.9 0.4 -0.3
Mutated network:  0.1 -0.7 -0.6 0.5 -0.8 -0.2 0.9 -0.1 0.2
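A minimal sketch of this kind of weight mutation, here as Gaussian perturbation of randomly chosen genes; the perturbation scheme and names are assumptions, since the slide does not specify how the new values are drawn.

import random

def mutate_weights(chromosome, rate=0.1, scale=0.5):
    """Add Gaussian noise to each gene independently with probability `rate`."""
    return [w + random.gauss(0.0, scale) if random.random() < rate else w
            for w in chromosome]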

Page 51: Evolutionary (deep) neural network


Architecture Selection

• The architecture of the network (i.e. the number of neurons and their interconnections) often determines the success or failure of the application.

• Usually the network architecture is decided by trial and error; there is a great need for a method of automatically designing the architecture for a particular application.

– Genetic algorithms may well be suited for this task.

Page 52: Evolutionary (deep) neural network

Encoding

(Diagram: a network with inputs x1 (neuron 1) and x2 (neuron 2), hidden neurons 3–5, and output neuron 6 producing y. The architecture is encoded as a 6×6 binary connectivity matrix whose entry in row i, column j is 1 when there is a connection from neuron j to neuron i.)

From neuron:   1  2  3  4  5  6
To neuron 1:   0  0  0  0  0  0
To neuron 2:   0  0  0  0  0  0
To neuron 3:   1  1  0  0  0  0
To neuron 4:   1  0  0  0  0  0
To neuron 5:   0  1  0  0  0  0
To neuron 6:   0  1  1  1  1  0

Chromosome:

0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 1 1 1 1 0
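A minimal sketch of decoding this chromosome back into a connectivity matrix; names are illustrative.

def decode_architecture(chromosome, n_neurons=6):
    """Reshape a flat 0/1 gene list into an n x n connectivity matrix.

    matrix[i][j] == 1 means there is a connection from neuron j+1
    to neuron i+1, matching the row-major encoding on the slide."""
    assert len(chromosome) == n_neurons * n_neurons
    return [chromosome[i * n_neurons:(i + 1) * n_neurons]
            for i in range(n_neurons)]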

Page 53: Evolutionary (deep) neural network

Process

(Diagram: in generation i, each candidate neural network j is trained on the training data set and assigned a fitness value, e.g. Fitness = 117. Parents are then selected, crossover produces Child 1 and Child 2, mutation is applied, and the resulting networks form generation i + 1.)

Page 54: Evolutionary (deep) neural network

Evolutionary DNN

Page 55: Evolutionary (deep) neural network


Good reference blog

• https://medium.com/@stathis/design-by-evolution-393e41863f98

Page 56: Evolutionary (deep) neural network


Evolving Deep Neural Networks

• https://arxiv.org/pdf/1703.00548.pdf

• CoDeepNEAT

– A method for optimizing deep learning architectures through evolution

– Evolving DNNs for CIFAR-10

– Evolving LSTM architectures

– The experimental comparison is not so clear..

Page 57: Evolutionary (deep) neural network


Large-Scale Evolution of Image Classifiers

• https://arxiv.org/abs/1703.01041

• Individual

– a trained architecture

• Fitness

– Individual’s accuracy on a validation set

• Selection (tournament selection)

– Randomly choose two individuals

– Select the better one as the parent

Page 58: Evolutionary (deep) neural network


Large-Scale Evolution of Image Classifiers

• Mutation

– Pick a mutation from a predetermined set

• Train child

• Repeat.
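A minimal sketch of this tournament-style loop under the stated assumptions; train_and_eval, the mutation set, and the population handling are illustrative placeholders, not the paper's actual code.

import copy
import random

def evolve_architectures(population, train_and_eval, mutations, steps=1000):
    """Tournament evolution: compare two random individuals, mutate the winner.

    Each individual is (architecture, fitness); fitness is validation accuracy."""
    for _ in range(steps):
        a, b = random.sample(population, 2)
        parent, loser = (a, b) if a[1] >= b[1] else (b, a)
        child_arch = random.choice(mutations)(copy.deepcopy(parent[0]))
        child = (child_arch, train_and_eval(child_arch))  # train, then validate
        population.remove(loser)   # the worse individual dies
        population.append(child)   # the trained child joins the population
    return max(population, key=lambda ind: ind[1])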

Page 59: Evolutionary (deep) neural network


Large-Scale Evolution of Image Classifiers

Page 60: Evolutionary (deep) neural network


Convolution by Evolution

• https://arxiv.org/pdf/1606.02580.pdf

• GECCO 2016 paper

• Differentiable version of the Compositional Pattern Producing Network (DPPN)

– The topology is evolved but the weights are learned

– Compressed the weights of a denoising autoencoder from 157,684 to roughly 200 parameters with comparable image reconstruction accuracy

Page 61: Evolutionary (deep) neural network


Page 62: Evolutionary (deep) neural network


Page 63: Evolutionary (deep) neural network

[email protected]

@likesky3