System Identification and Curve Fitting with a Genetic Algorithm Hierarchy. Alice E. Smith and Mehmet Gulsen, Department of Industrial Engineering, University of Pittsburgh. INFORMS Fall 1997.


Page 1: System Identification and Curve Fitting with a Genetic Algorithm Hierarchy

System Identification and Curve Fitting with a Genetic Algorithm Hierarchy

Alice E. Smith and Mehmet Gulsen

Department of Industrial Engineering

University of Pittsburgh

INFORMS Fall 1997

Page 2: System Identification and Curve Fitting with a Genetic Algorithm Hierarchy

Curve Fitting

Process of fitting a closed-form function to a given data set of independent variables and a dependent variable (variable selection, closed-form function selection, coefficient estimation).

Used for:
– System identification
– Judging the strength of a relationship
– Identifying main variables and interactions between variables
– Interpolating/extrapolating to new data

Page 3: System Identification and Curve Fitting with a Genetic Algorithm Hierarchy

Conventional Approaches

– Various regression techniques
– Time series analysis
– Spline fitting
– Neural networks

Page 4: System Identification and Curve Fitting with a Genetic Algorithm Hierarchy

Genetic Algorithm Hierarchy

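[Figure: two-module hierarchy. The upper module (function and variable selection) passes candidate functions to the lower module (coefficient estimation), which returns the optimized coefficients for those functions.]

Example candidate function: $y = c_1 x_1^2 + c_2 \cos(c_3 + c_4 x_2)$

With optimized coefficients: $y = 9.234 x_1^2 + 2.123 \cos(0.093 + 4.823 x_2)$, SSE = 0.34627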

Page 5: System Identification and Curve Fitting with a Genetic Algorithm Hierarchy

Search Structure

[Figure: search structure. Labels: Upper GA Search, Upper GA Population, Lower GA Search, Lower GA Population, Data.]

Page 6: System Identification and Curve Fitting with a Genetic Algorithm Hierarchy

Genetic Search Process

[Figure: genetic search process. An initial population of size n produces n1 offspring and n2 mutants; the best n solutions form the final population. Selection labels: Top Half Selection, Uniform Selection.]
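One generation of this search process can be sketched roughly as follows. This is a minimal sketch, not the authors' implementation: it assumes that parents are drawn uniformly from the top half, that mutants are drawn uniformly from the whole population, and that the best n of the combined pool survive. The names `next_generation`, `breed`, `mutate`, and `error` are hypothetical.

```python
import random

def next_generation(population, error, breed, mutate, n_offspring, n_mutants, rng=random):
    """Return the best len(population) solutions after one generation.

    Assumptions (not stated on the slide): lower error is better, parents come
    uniformly from the top half, mutants come uniformly from the whole population.
    """
    n = len(population)
    ranked = sorted(population, key=error)
    top_half = ranked[: max(2, n // 2)]
    # n1 offspring: each bred from two distinct parents in the top half.
    offspring = [breed(*rng.sample(top_half, 2)) for _ in range(n_offspring)]
    # n2 mutants: each a perturbed copy of a uniformly chosen member.
    mutants = [mutate(rng.choice(population)) for _ in range(n_mutants)]
    pool = population + offspring + mutants
    return sorted(pool, key=error)[:n]

# Toy usage: search for the value 3.0 on the real line.
rng = random.Random(0)
population = [0.0, 1.0, 5.0, 8.0]
error = lambda v: abs(v - 3.0)
new_population = next_generation(
    population, error,
    breed=lambda a, b: (a + b) / 2,
    mutate=lambda v: v + rng.gauss(0, 1),
    n_offspring=3, n_mutants=2, rng=rng,
)
```

Because the survivors are taken from a pool that still contains the old population, the best error can never get worse from one generation to the next under these assumptions.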

Page 7: System Identification and Curve Fitting with a Genetic Algorithm Hierarchy

Upper GA - Function Selection

Explore the possible functional forms that could represent the underlying relationship between independent and dependent variables of a data set

Objective Function: Minimize “adjusted” total error corresponding to the functional form. Adjustment is performed by penalizing more complex representations (more variables, higher order terms)

Stopping Criteria: Search is terminated when no improvement is observed for a specific number of generations
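The stopping criterion can be illustrated with a small loop. This is a sketch under stated assumptions: `step` is a hypothetical callable that runs one generation and returns the best adjusted error found, and `patience` stands in for the "specific number of generations", whose value the slides do not give.

```python
def run_until_stagnant(step, initial_best, patience):
    """Run generations until `patience` consecutive generations bring no improvement."""
    best = initial_best
    stale = 0          # generations since the last improvement
    generations = 0
    while stale < patience:
        candidate = step()
        generations += 1
        if candidate < best:
            best, stale = candidate, 0
        else:
            stale += 1
    return best, generations

# Toy usage with a made-up sequence of per-generation best errors.
errors = iter([5.0, 4.0, 4.0, 4.0, 3.0, 3.0, 3.0, 3.0])
best, generations = run_until_stagnant(lambda: next(errors), float("inf"), patience=3)
```

In this toy run the search stops after the eighth generation, once three generations in a row fail to improve on the best error of 3.0.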

Page 8: System Identification and Curve Fitting with a Genetic Algorithm Hierarchy

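Upper GA Function Selection - Encoding

Tree structure for $y = C_1 + C_2 x_1^3 + C_3 \cos(C_4 x_2) + C_5 x_1 x_2$

[Figure: expression tree for this function. Internal nodes are the operators +, * and cos; leaves are the coefficients C1 to C5, the variables x1 and x2, and the constant 1.]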
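A tree encoding of this kind can be sketched with nested tuples. This is an illustrative representation, not the authors' data structure: the node tags (`"const"`, `"var"`) and the function name `evaluate` are assumptions, and the tree below is one of several equivalent layouts for the example function $y = C_1 + C_2 x_1^3 + C_3 \cos(C_4 x_2) + C_5 x_1 x_2$.

```python
import math

# A node is ("const", i) for coefficient C_(i+1), ("var", j) for x_(j+1),
# or (operator, child, ...) for an internal node. Hypothetical encoding.
def evaluate(node, coeffs, x):
    """Evaluate an expression tree for given coefficients and variable values."""
    kind = node[0]
    if kind == "const":
        return coeffs[node[1]]
    if kind == "var":
        return x[node[1]]
    if kind == "+":
        return evaluate(node[1], coeffs, x) + evaluate(node[2], coeffs, x)
    if kind == "*":
        return evaluate(node[1], coeffs, x) * evaluate(node[2], coeffs, x)
    if kind == "cos":
        return math.cos(evaluate(node[1], coeffs, x))
    raise ValueError(f"unknown node type: {kind}")

x1, x2 = ("var", 0), ("var", 1)
C = lambda i: ("const", i)

# y = C1 + C2*x1^3 + C3*cos(C4*x2) + C5*x1*x2
tree = ("+",
        ("+", C(0), ("*", C(1), ("*", x1, ("*", x1, x1)))),
        ("+", ("*", C(2), ("cos", ("*", C(3), x2))),
              ("*", C(4), ("*", x1, x2))))

y = evaluate(tree, coeffs=[1.0, 2.0, 3.0, 4.0, 5.0], x=[2.0, 0.0])
```

With these made-up inputs the terms are 1, 2·8, 3·cos(0), and 5·2·0, so `y` comes out as 20. Keeping the functional form (the tree) separate from the coefficient values is what lets one search work on structure while another works on numbers.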

Page 9: System Identification and Curve Fitting with a Genetic Algorithm Hierarchy

Upper GA Function Selection - Penalty Function

[Figure: the same expression tree as on the encoding slide.]

Penalty = $\left[\frac{\text{number of nodes}}{\text{constant}}\right]^{m} = \left(\frac{14}{5}\right)^{0.05} = 1.0528$

Penalty Factor = 0.05
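The arithmetic on this slide can be checked with a few lines. This sketch assumes the penalty is the node count divided by a constant, raised to the penalty factor, with 14 nodes, a constant of 5, and a penalty factor of 0.05 as read from the slide; the function name `complexity_penalty` is hypothetical.

```python
def complexity_penalty(number_of_nodes, constant, penalty_factor):
    """Penalty multiplier that grows slowly with the size of the expression tree."""
    return (number_of_nodes / constant) ** penalty_factor

penalty = complexity_penalty(number_of_nodes=14, constant=5, penalty_factor=0.05)
print(round(penalty, 4))  # 1.0528
```

With such a small exponent, even a tree nearly three times the reference size is penalized by only about 5%, so accuracy still dominates the comparison between candidate functions.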

Page 10: System Identification and Curve Fitting with a Genetic Algorithm Hierarchy

Upper GA Function Selection - Crossover

[Figure: subtree crossover. Before: Parent 1 and Parent 2. After: Offspring 1 and Offspring 2, formed by exchanging a subtree between the two parent trees.]

Parent 1: $y = C_1 + C_2 x_1^3 + C_3 \cos(C_4 x_2) + C_5 x_1 x_2$

The equations for Parent 2 and the two offspring involve sin, ln, and division terms.
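Subtree crossover on tree-encoded functions can be sketched as below. This is a generic illustration, not the authors' code: trees are nested tuples, coefficient leaves are written as `("const",)` placeholders on the assumption that they are numbered later for coefficient estimation, and the crossover points are chosen uniformly over all nodes. All names are hypothetical.

```python
import random

def paths(node, prefix=()):
    """Yield the index path to every node of a nested-tuple tree."""
    yield prefix
    for i, child in enumerate(node[1:], start=1):
        if isinstance(child, tuple):
            yield from paths(child, prefix + (i,))

def get(node, path):
    """Return the subtree found by following `path` from the root."""
    for i in path:
        node = node[i]
    return node

def replace(node, path, new):
    """Return a copy of `node` with the subtree at `path` replaced by `new`."""
    if not path:
        return new
    i = path[0]
    return node[:i] + (replace(node[i], path[1:], new),) + node[i + 1:]

def crossover(parent1, parent2, rng=random):
    """Swap one randomly chosen subtree between two parents."""
    p1 = rng.choice(list(paths(parent1)))
    p2 = rng.choice(list(paths(parent2)))
    child1 = replace(parent1, p1, get(parent2, p2))
    child2 = replace(parent2, p2, get(parent1, p1))
    return child1, child2

parent1 = ("+", ("const",), ("*", ("const",), ("cos", ("var", 1))))
parent2 = ("/", ("sin", ("var", 0)), ("ln", ("var", 1)))
child1, child2 = crossover(parent1, parent2, random.Random(1))
```

Because the two subtrees are exchanged rather than copied, the total number of nodes across the two offspring equals that of the two parents.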

Page 11: System Identification and Curve Fitting with a Genetic Algorithm Hierarchy

Upper GA Function Selection - Mutation

[Figure: subtree mutation. Before: Parent 1. After: Mutant, in which one subtree of the parent is replaced by a randomly generated tree.]

Before: $y = C_1 + C_2 x_1^3 + C_3 \cos(C_4 x_2) + C_5 x_1 x_2$

After: the mutant keeps the leading terms of the parent and gains an exp term, with coefficients C1 to C7.
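Replacing a subtree with a randomly generated tree can be sketched as follows. This is an assumption-laden illustration: the operator sets, the leaf probability of 0.3, the depth limit, and all function names are made up for the example, and coefficient leaves are `("const",)` placeholders to be numbered later.

```python
import random

UNARY = ("cos", "exp")
BINARY = ("+", "*")

def random_tree(depth, n_vars, rng):
    """Grow a random expression tree with at most `depth` operator levels."""
    if depth == 0 or rng.random() < 0.3:
        # Leaf: either a variable or a coefficient placeholder.
        return ("var", rng.randrange(n_vars)) if rng.random() < 0.5 else ("const",)
    if rng.random() < 0.5:
        return (rng.choice(UNARY), random_tree(depth - 1, n_vars, rng))
    return (rng.choice(BINARY),
            random_tree(depth - 1, n_vars, rng),
            random_tree(depth - 1, n_vars, rng))

def paths(node, prefix=()):
    """Yield the index path to every node of a nested-tuple tree."""
    yield prefix
    for i, child in enumerate(node[1:], start=1):
        if isinstance(child, tuple):
            yield from paths(child, prefix + (i,))

def replace(node, path, new):
    """Return a copy of `node` with the subtree at `path` replaced by `new`."""
    if not path:
        return new
    i = path[0]
    return node[:i] + (replace(node[i], path[1:], new),) + node[i + 1:]

def mutate(tree, n_vars, rng=random, max_depth=2):
    """Replace one uniformly chosen subtree with a freshly generated tree."""
    target = rng.choice(list(paths(tree)))
    return replace(tree, target, random_tree(max_depth, n_vars, rng))

parent = ("+", ("const",), ("*", ("const",), ("cos", ("var", 1))))
mutant = mutate(parent, n_vars=2, rng=random.Random(0))
```

The parent tuple is left untouched, so the same parent can be reused for breeding in the same generation.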

Page 12: System Identification and Curve Fitting with a Genetic Algorithm Hierarchy

Lower GA - Coefficient Estimation

Estimate the coefficients of a given closed form function which minimize the total error over the set of data points

Objective Function: Minimize total squared error

Minimize $\sum_{i=1}^{K} \left(y_i^{\text{actual}} - y_i^{\text{model}}\right)^2$

K: number of data points

Stopping Criteria: Search is terminated when no improvement is observed for a specific number of generations

Detailed results are published in “International Journal of Production Research”, Vol. 33, No. 7, 1995
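The lower GA objective, total squared error over the K data points, can be written as a short function. This is a minimal sketch: the `model(coeffs, x)` calling convention and the toy linear model are assumptions made for the example, not part of the original slides.

```python
def sum_squared_error(model, coeffs, xs, ys):
    """Total squared error between observed ys and model predictions over all data points."""
    return sum((y - model(coeffs, x)) ** 2 for x, y in zip(xs, ys))

# Toy usage with a made-up linear model y = c0 + c1 * x.
linear = lambda c, x: c[0] + c[1] * x
xs = [0.0, 1.0, 2.0]
ys = [1.0, 3.0, 5.0]

exact = sum_squared_error(linear, [1.0, 2.0], xs, ys)  # perfect fit
off = sum_squared_error(linear, [0.0, 2.0], xs, ys)    # each residual is 1
```

Here `exact` is 0 and `off` is 3, since each of the three points is missed by exactly 1.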

Page 13: System Identification and Curve Fitting with a Genetic Algorithm Hierarchy

Lower GA Coefficient Estimation - Encoding

$y = C_1 + C_2 x_1^3 + C_3 \cos(C_4 x_2) + C_5 x_1 x_2$

Encoded as the coefficient vector: C1 C2 C3 C4 C5

Page 14: System Identification and Curve Fitting with a Genetic Algorithm Hierarchy

Lower GA - Selection/Breeding

– Parents are selected for breeding uniformly from the superior half of the population
– The values of the offspring’s coefficients are determined by calculating the arithmetic mean of the corresponding coefficients of the two parents

Parent A:   45.876  32.958  12.098   -3.892  0.2356
Parent B:   12.988  35.832   0.234  -12.984  2.4576
Offspring:  29.432  34.395   6.166   -8.438  1.3466
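The breeding rule can be reproduced directly from the numbers on this slide. The sketch below assumes two distinct parents are drawn per offspring and that lower error is better; `select_parents` and `breed` are hypothetical names.

```python
import random

def select_parents(population, error, rng=random):
    """Pick two distinct parents uniformly from the better half of the population."""
    ranked = sorted(population, key=error)
    top_half = ranked[: max(2, len(ranked) // 2)]
    return rng.sample(top_half, 2)

def breed(parent_a, parent_b):
    """Offspring coefficients are the arithmetic mean of the parents' coefficients."""
    return [(a + b) / 2 for a, b in zip(parent_a, parent_b)]

parent_a = [45.876, 32.958, 12.098, -3.892, 0.2356]
parent_b = [12.988, 35.832, 0.234, -12.984, 2.4576]
offspring = breed(parent_a, parent_b)
print([round(c, 4) for c in offspring])  # [29.432, 34.395, 6.166, -8.438, 1.3466]
```

Averaging always places the offspring between its parents, so breeding alone can only contract the population; exploring outside the current range is left to mutation.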

Page 15: System Identification and Curve Fitting with a Genetic Algorithm Hierarchy

Lower GA - Mutation

– Perturbs existing solutions to explore new regions of the search space
– The perturbation value is obtained by multiplying the current population range by a random factor

Each coefficient C1, ..., C5 receives its own perturbation: the range of that coefficient in the current population multiplied by a random factor k1, ..., k5.
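The perturbation rule can be sketched as below. The slides do not specify how the random factor is distributed or whether the perturbation is added or subtracted, so this sketch assumes a factor drawn uniformly from [-1, 1] and added to the coefficient; the function name `mutate` is hypothetical.

```python
import random

def mutate(coeffs, population, rng=random):
    """Perturb each coefficient by (random factor) x (its range in the current population)."""
    mutant = []
    for i, c in enumerate(coeffs):
        values = [member[i] for member in population]
        value_range = max(values) - min(values)
        k = rng.uniform(-1.0, 1.0)  # assumed distribution of the random factor
        mutant.append(c + k * value_range)
    return mutant

population = [[0.0, 10.0], [2.0, 20.0], [4.0, 40.0]]
mutant = mutate(population[1], population, random.Random(0))
```

Tying the step size to the population range means mutations shrink automatically as the population converges, which gives coarse exploration early and fine adjustment late.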

Page 16: System Identification and Curve Fitting with a Genetic Algorithm Hierarchy

Test Problem

$y = C_1 + C_2 x_1 + C_3 x_2 + C_4 x_3 + C_5 x_1^2 + C_6 x_2^2 + C_7 x_3^2 + C_8 x_1 x_2 + C_9 x_1 x_3 + C_{10} x_2 x_3$

C    Run 1   Run 2   Run 3   Run 4   Run 5   Run 6   Mean    Sd.Dv.
1    9.986   9.998   10.002  10.000  9.996   10.001  9.997   0.005
2    9.999   10.000  10.000  10.000  10.000  10.000  10.000  0.000
3    10.000  10.000  10.000  10.000  10.000  10.000  10.000  0.000
4    10.000  10.000  10.000  10.000  10.000  10.000  10.000  0.000
5    10.000  10.000  10.000  10.000  10.000  10.000  10.000  0.000
6    10.000  10.000  10.000  10.000  10.000  10.000  10.000  0.000
7    10.000  10.000  10.000  10.000  10.000  10.000  10.000  0.000
8    10.000  10.000  10.000  10.000  10.000  10.000  10.000  0.000
9    10.000  10.000  10.000  10.000  10.000  10.000  10.000  0.000
10   10.000  10.000  10.000  10.000  10.000  10.000  10.000  0.000
SE.  0.0017  0.000   0.0000  0.000   0.000   0.000   0.000   -

Page 17: System Identification and Curve Fitting with a Genetic Algorithm Hierarchy

Test Problem - Different Error Metrics

[Figure: log10 of squared error (0 to 8) versus number of generations (0 to 1500) for three error metrics: squared error, absolute error, and maximum error.]

Page 18: System Identification and Curve Fitting with a Genetic Algorithm Hierarchy

Test Problem - Different Numbers of Data Points

[Figure: log10 of squared error (-8 to 8) versus number of generations (0 to 4000) for 25 points and 100 points.]

Page 19: System Identification and Curve Fitting with a Genetic Algorithm Hierarchy

Empirical Data Sets

Five benchmark problems from the literature:

1. onion growth
2. children growth
3. sunspots
4. chemical plant
5. slip casting

– Range from a single variable with 50 observations to 13 variables with 1000 observations
– Cover nonlinear regression, time series analysis, and model identification

Page 20: System Identification and Curve Fitting with a Genetic Algorithm Hierarchy

Test Problem 3, Sunspot Data

– Sunspot data from 1700 to 1995
– Highly cyclic, with peak and bottom values approximately every 11.1 years
– The cycle is not symmetric: the count reaches its maximum faster than it drops to a minimum
– Training range: 1700-1979
– Validation range: 1980-1995

Page 21: System Identification and Curve Fitting with a Genetic Algorithm Hierarchy

Functions Identified

Model A (SSE 61964):
$1.1965(t-1) - 0.4585(t-2) + 0.2471(t-9)$

Model B (SSE 45533):
$0.8337(t-1) - 1.1989\exp(-0.3512(t-9)) + 15.7476\exp(-2.7260\exp(-0.3263(t-1)) - 0.6271(t-2))$

Model C (SSE 40341):
$1.2410\exp(-0.6099\cos(1.4097(t-4) + 0.4282(t-1)) - 0.8446(t-2))(t-1) + 0.8064(t-1) - 0.1316(t-4) + 0.1148(t-9)$

Model D (SSE 38715):
$1.6258\exp(-0.5564\cos(-1.4893(t-4) + 0.6979\exp(-3.3442\cos(0.2561(t-4)) + 2.8807(t-2)) - 3.1756(t-2)) - 0.7485(t-2)) - 0.9362(t-2)(t-1) + 0.8253(t-1) - 0.1413(t-4) + 0.1046(t-9)$
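As an illustration of how such a model would be applied, the sketch below evaluates Model A one step ahead. It rests on an assumption the slides do not state explicitly: that a term such as 0.4585(t-2) means the coefficient times the series value two years before t. The function name `model_a` and the constant test series are made up for the example.

```python
def model_a(series, t):
    """One-step-ahead value from Model A, reading (t-k) as the series value k steps back."""
    if t < 9:
        raise ValueError("Model A needs at least 9 earlier observations")
    return (1.1965 * series[t - 1]
            - 0.4585 * series[t - 2]
            + 0.2471 * series[t - 9])

# Made-up constant series, used only to check the arithmetic.
series = [10.0] * 12
prediction = model_a(series, t=9)
```

For a constant series the prediction is simply the sum of the three coefficients (0.9851) times the constant, which makes the arithmetic easy to check by hand.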

Page 22: System Identification and Curve Fitting with a Genetic Algorithm Hierarchy

Model D

[Figure: Model D fitted to the sunspot data. x-axis: Year (1700 to 2000); y-axis: 20 x Annual Number of Sunspots (0 to 10).]

Page 23: System Identification and Curve Fitting with a Genetic Algorithm Hierarchy

Extrapolation of Model D

[Figure: extrapolation of Model D over the validation range. x-axis: Year (1980 to 1995); y-axis ticks 0 to 9; series: Data, Fitted Function.]

Page 24: System Identification and Curve Fitting with a Genetic Algorithm Hierarchy

Conclusions

– A unique approach for curve fitting problems
– Provides a closed-form function for the given data set
– Can handle non-linear, discontinuous functions
– Flexible in terms of error metric
– Can be used separately for function selection and coefficient optimization
– Computationally intensive and needs a priori setting of search parameters and penalty function components

Forthcoming paper: “A hierarchical genetic algorithm for system identification and curve fitting with a supercomputer implementation,” Mehmet Gulsen and Alice E. Smith, Institute for Mathematics and its Applications, Volumes in Mathematics and its Applications, Volume on Evolutionary Computing.