Fuzzy Modeling

8/4/2019 Fuzzy Modeling

1/65

By: Saeed

Fuzzy modeling



2/65

Introduction

Classical approach:

- Low accuracy in complicated systems

- Not suitable for systems for which first-principles and theoretical methods are not fully developed

Solutions:

1- Imitating human parallel processing: neural networks

2- Imitating human reasoning and inference: fuzzy models


3/65

- Although neural networks have many advantages, they have three main problems:

1- The learned knowledge is stored in parameters that are not interpretable.

2- Training is a nonlinear optimization problem.

3- Expert knowledge cannot be incorporated.


4/65

Fuzzy models

- A mathematical model that in some way uses fuzzy sets is called a fuzzy model [1].

- A method for modeling complex, ill-defined, and less tractable systems.

if (premise) then (consequent)

- Premise: validity of the rule, described by fuzzy sets

- Consequent: output of the rule, described by fuzzy sets (Mamdani) or functions (Takagi-Sugeno)

Example (Mamdani): If pressure is high, then volume is small.

Example (TSK): If velocity is high, then force = k *


5/65

- Two different ideas lie behind these modeling approaches: the Mamdani model tries to imitate the human reasoning mechanism, while the Takagi-Sugeno model tries to represent a system by several simple local models when no single model describes it accurately. For this reason, the Takagi-Sugeno model is sometimes called a local model.


6/65

- Input space partitioning

Partitioning of the input space:

- Grid partitioning: 1- ANFIS (Jang), 2- FUREGA

- Tree partitioning: LOLIMOT (Nelles)

- Scatter partitioning: Clustering (Babuška)


7/65

ANFIS (Adaptive-Network-Based Fuzzy Inference System)


8/65

Main problems of fuzzy modeling before ANFIS:

1) No standard methods exist for transforming human knowledge or experience into the rule base and database of a fuzzy inference system.

2) Effective methods are needed for tuning the membership functions (MFs) so as to minimize the output error measure or maximize a performance index.


9/65

Neural networks

Neuron structure:

Output of neuron k:

$$y_k = \varphi\left(\sum_{j=1}^{m} w_{kj}\,x_j + b_k\right)$$


10/65

- Activation function $\varphi(\cdot)$:

The logistic function: $\varphi(v) = \dfrac{1}{1+\exp(-v)}$

Hyperbolic tangent: $\varphi(v) = \tanh(v)$

The nonlinear activation function gives neural networks their nonlinear behavior.
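A minimal sketch of the neuron output formula with the two activation functions above, assuming plain Python lists for the inputs x_j and weights w_kj. The function names `logistic` and `neuron_output` and the numeric values are illustrative, not taken from the slides.

```python
import math
from typing import Callable, Sequence


def logistic(v: float) -> float:
    """Logistic activation: 1 / (1 + exp(-v))."""
    return 1.0 / (1.0 + math.exp(-v))


def neuron_output(x: Sequence[float], w: Sequence[float], b: float,
                  phi: Callable[[float], float] = logistic) -> float:
    """Output of neuron k: phi(sum_j w_kj * x_j + b_k)."""
    v = sum(wj * xj for wj, xj in zip(w, x)) + b  # weighted sum plus bias
    return phi(v)


x = [1.0, -2.0, 0.5]
w = [0.4, 0.1, -0.6]
b = 0.2
print(neuron_output(x, w, b))             # logistic activation
print(neuron_output(x, w, b, math.tanh))  # hyperbolic tangent activation
```

The weighted sum is identical in both calls; only the nonlinearity changes, which is what makes the network nonlinear.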


11/65

Multilayer perceptron (MLP):

An arbitrary number of hidden layers can be used.


12/65

Training MLPs (backpropagation)

- Training data: {u(i), y(i)}, i = 1, ..., N

(u(i): input to the MLP, y(i): desired output, ŷ(i): MLP output for u(i))

- Cost function:

$$J = \frac{1}{2}\sum_{i=1}^{N} e^2(i), \qquad e(i) = y(i) - \hat{y}(i)$$

- What should be optimized: the neuron weights w


13/65

- Optimization algorithm

Steepest descent: the search direction is opposite to the gradient direction:

$$w(k+1) = w(k) - \eta\,\frac{\partial J}{\partial w}$$

where $\partial J / \partial w$ is the gradient of the output error with respect to the weight w, and η is the step size.

- The most important advantage of the backpropagation algorithm is that the gradient for each weight can be calculated with the aid of the gradients of the neurons in the next layer.


14/65

- Training procedure:

It is a two-pass optimization method. In the forward pass, the inputs go through the MLP, and the output ŷ and the error e can be calculated. In the backward pass, the error propagates from the output layer back to the input layer, and all of the MLP weights are updated. This procedure is repeated over all data samples many times.
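The two-pass procedure can be sketched with NumPy for a one-hidden-layer MLP (tanh hidden neurons, one linear output neuron), trained sample by sample with steepest descent on J = ½e². The layer size, learning rate, number of epochs, and the sine target are illustrative assumptions; `train_mlp` and `predict` are hypothetical helper names.

```python
import numpy as np


def train_mlp(U, Y, hidden=8, eta=0.05, epochs=200, seed=0):
    """Train a 1-hidden-layer MLP (tanh hidden, linear output) by backpropagation."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(scale=0.5, size=(hidden, U.shape[1]))
    b1 = np.zeros(hidden)
    W2 = rng.normal(scale=0.5, size=hidden)
    b2 = 0.0
    for _ in range(epochs):                       # repeat over the data many times
        for u, y in zip(U, Y):
            # Forward pass: hidden outputs, network output and error.
            h = np.tanh(W1 @ u + b1)
            y_hat = W2 @ h + b2
            e = y - y_hat
            # Backward pass: local gradients of J = 0.5 * e**2.
            delta_out = -e                               # dJ/dy_hat
            delta_hid = delta_out * W2 * (1.0 - h ** 2)  # reuses the next layer's gradient
            # Steepest-descent updates: w <- w - eta * dJ/dw.
            W2 -= eta * delta_out * h
            b2 -= eta * delta_out
            W1 -= eta * np.outer(delta_hid, u)
            b1 -= eta * delta_hid
    return W1, b1, W2, b2


def predict(params, U):
    W1, b1, W2, b2 = params
    return np.tanh(U @ W1.T + b1) @ W2 + b2


U = np.linspace(-1, 1, 41).reshape(-1, 1)
Y = np.sin(np.pi * U[:, 0])
params = train_mlp(U, Y)
mse = np.mean((predict(params, U) - Y) ** 2)
print(f"training MSE: {mse:.4f}")
```

Note that `delta_hid` is computed before `W2` is updated, so the backward pass uses the weights that produced the forward pass.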


15/65

Fuzzy Inference System (FIS)



16/65

Fuzzy Inference System (FIS):

1- Compare the input variables with the membership functions in the premise part to obtain the membership values (or compatibility measures) of each linguistic label. (This step is often called fuzzification.)

2- Combine (through a specific T-norm operator, usually multiplication or min) the membership values in the premise part to get the firing strength (weight) of each rule.

3- Generate the qualified consequent (either fuzzy or crisp) of each rule depending on its firing strength.

4- Aggregate the qualified consequents to produce a crisp output. (This step is called defuzzification.)
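The four steps can be traced on a two-input, two-rule, first-order TSK system. Gaussian membership functions, the product T-norm, and weighted-average aggregation are common choices but are assumptions here; the rule base `RULES` and its parameters are invented for illustration.

```python
import math


def gauss(x, c, sigma):
    """Gaussian membership function with center c and width sigma."""
    return math.exp(-0.5 * ((x - c) / sigma) ** 2)


# Hypothetical rule base: ((c, sigma) for x1, (c, sigma) for x2, (p, q, r)).
RULES = [
    ((0.0, 1.0), (0.0, 1.0), (1.0, 1.0, 0.0)),   # R1: x1 small & x2 small -> y = x1 + x2
    ((2.0, 1.0), (2.0, 1.0), (-1.0, 0.5, 3.0)),  # R2: x1 large & x2 large -> y = -x1 + 0.5x2 + 3
]


def tsk_infer(x1, x2):
    num, den = 0.0, 0.0
    for (c1, s1), (c2, s2), (p, q, r) in RULES:
        mu1, mu2 = gauss(x1, c1, s1), gauss(x2, c2, s2)  # 1) fuzzification
        w = mu1 * mu2                                    # 2) firing strength (product T-norm)
        f = p * x1 + q * x2 + r                          # 3) crisp consequent, qualified by w
        num += w * f
        den += w
    return num / den                                     # 4) weighted average: crisp output


print(tsk_infer(0.0, 0.0))  # dominated by R1
print(tsk_infer(1.0, 1.0))  # both rules fire equally
print(tsk_infer(2.0, 2.0))  # dominated by R2
```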


17/65

- Example (figure panels: Mamdani, Type 1, Type 2, TSK)


18/65

Each such set of if-then rules can be represented as an adaptive network:

- Nodes with adaptive parameters

- Nodes with fixed operations

Adaptive parameters: the centers and widths of the membership functions (premise) and the rule consequent parameters.


19/65

Example of a FIS with two inputs and three membership functions for each input


20/65

Training procedure: two passes in the hybrid learning procedure for ANFIS

                         Forward pass              Backward pass
Premise parameters       Fixed                     Gradient descent
Consequent parameters    Least-squares estimate    Fixed
Signals                  Node outputs              Error rates


21/65

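Why the least-squares algorithm can be used for the consequent parameters (for example, for the TSK model on page 18): with the premise parameters fixed, the normalized firing strengths $\bar{w}_i$ are known, and the output

$$f = \bar{w}_1 f_1 + \bar{w}_2 f_2 = (\bar{w}_1 x)\,p_1 + (\bar{w}_1 y)\,q_1 + \bar{w}_1 r_1 + (\bar{w}_2 x)\,p_2 + (\bar{w}_2 y)\,q_2 + \bar{w}_2 r_2$$

with $f_i = p_i x + q_i y + r_i$ is linear in the consequent parameters. Estimating them is therefore a linear regression problem.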
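A sketch of this forward-pass least-squares step: each sample contributes one row [w̄1·x, w̄1·y, w̄1, w̄2·x, w̄2·y, w̄2] of the regressor matrix. The firing strengths and the "true" consequent parameters below are synthetic assumptions, chosen only to show that least squares recovers the parameters when the premise part is held fixed.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 50
x, y = rng.uniform(-1, 1, N), rng.uniform(-1, 1, N)

# Fixed premise part: unnormalized firing strengths of two rules (synthetic).
w1 = np.exp(-((x + 0.5) ** 2 + (y + 0.5) ** 2))
w2 = np.exp(-((x - 0.5) ** 2 + (y - 0.5) ** 2))
wb1, wb2 = w1 / (w1 + w2), w2 / (w1 + w2)           # normalized firing strengths

# Regressor matrix: the output is linear in (p1, q1, r1, p2, q2, r2).
A = np.column_stack([wb1 * x, wb1 * y, wb1, wb2 * x, wb2 * y, wb2])

theta_true = np.array([1.0, -2.0, 0.5, 3.0, 0.0, -1.0])
f = A @ theta_true                                   # noise-free target outputs

theta_hat, *_ = np.linalg.lstsq(A, f, rcond=None)    # least-squares estimate
print(np.round(theta_hat, 6))
```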


22/65

- In the backward pass, the gradient descent algorithm is used to optimize the premise parameters while the error propagates backward through the network (as in backpropagation for neural networks).


23/65

Remark 1: Since the consequent parameters are optimized in each iteration with the least-squares algorithm, the nonlinear optimization problem in the backward pass can be solved more efficiently, and problems such as being trapped in local minima or slow convergence are less severe.


24/65

- Remark 2: The TSK model is more popular in the ANFIS structure since it has more adjustable parameters in the rule consequents. This reduces training time and effort, because the model output is linear in these parameters, so they can be estimated very efficiently by the least-squares algorithm.


25/65

- Remark 3: Optimizing the premise parameters (input membership functions) sometimes deteriorates the interpretability of the rule base.


26/65

Example: f(u) = 0.6 sin(πu) + 0.3 sin(3πu) + 0.1 sin(5πu), u ∈ [-1, 1]

3 membership functions for each input (9 rules)


27/65

4 membership functions for each input (16 rules)


28/65

5 membership functions for each input (25 rules)

Loss of interpretability


29/65

FUREGA

Fuzzy Rule Extraction using a Genetic Algorithm


30/65

FUREGA:

1- Start with a grid-based network using prior knowledge.

2- Select rules with a genetic algorithm.

3- Use least squares to optimize the output (consequent) parameters.

4- Use constrained nonlinear optimization of the membership functions.


31/65

Properties:

Potentially the best solution (accuracy)

Time-consuming training

Curse of dimensionality

Interpretability?


32/65

Local Linear Model Tree

LOLIMOT



33/65

What are local models?


34/65

Example:



35/65

LOLIMOT algorithm:

- The algorithm has an outer loop (upper level) that determines the input partitions (structure) where the local linear models are valid, and an inner loop (lower level) that estimates the parameters of those local linear models by an efficient weighted least-squares algorithm.

Consequent parameter estimation:

$$\hat{y} = \sum_{i=1}^{M} \left(w_{i0} + w_{i1}u_1 + \dots + w_{in}u_n\right)\Phi_i(\mathbf{u}; \mathbf{c}_i, \boldsymbol{\sigma}_i)$$

where $w_{ij}$ are the local linear model parameters, $\mathbf{u} = [u_1 \dots u_n]^T$ is the input vector, and $\Phi_i$ is the normalized Gaussian weighting function for the i-th model with center coordinates $c_{ij}$ and standard deviations $\sigma_{ij}$.


36/65

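$$\Phi_i(\mathbf{u}; \mathbf{c}_i, \boldsymbol{\sigma}_i) = \frac{\mu_i(\mathbf{u})}{\sum_{l=1}^{M}\mu_l(\mathbf{u})}$$

where

$$\mu_i(\mathbf{u}) = \exp\left(-\frac{1}{2}\sum_{j=1}^{n}\frac{(u_j - c_{ij})^2}{\sigma_{ij}^2}\right)$$

- Assume the weighting functions have already been determined. Then the parameters of each local linear model are estimated separately by a weighted least-squares technique. With the data matrix X (the known model inputs, with a leading column of ones for the offset), the diagonal weighting matrix $Q_i$ (each entry is the weighting-function value $\Phi_i$ of the corresponding data sample), and the desired outputs y, the optimal parameters of the i-th model are:

$$\hat{\mathbf{w}}_i = \left(X^T Q_i X\right)^{-1} X^T Q_i\,\mathbf{y}$$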
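The validity functions and the local weighted least-squares estimate can be written in a few lines of NumPy. The helper names (`validity`, `local_wls`, `lolimot_predict`), the two hand-placed centers and widths, and the |u| test target are illustrative assumptions; Q_i is never formed explicitly, since multiplying the columns of Xᵀ by Φ_i gives the same product XᵀQ_i.

```python
import numpy as np


def validity(U, centers, sigmas):
    """Normalized Gaussian validity functions Phi_i(u) for all samples (N x M)."""
    # mu[k, i] = exp(-0.5 * sum_j (u_kj - c_ij)^2 / sigma_ij^2)
    d2 = (((U[:, None, :] - centers[None, :, :]) / sigmas[None, :, :]) ** 2).sum(axis=2)
    mu = np.exp(-0.5 * d2)
    return mu / mu.sum(axis=1, keepdims=True)


def local_wls(U, y, Phi):
    """Estimate each local model separately: w_i = (X^T Q_i X)^-1 X^T Q_i y."""
    X = np.column_stack([np.ones(len(U)), U])   # regressors [1, u1, ..., un]
    W = []
    for i in range(Phi.shape[1]):
        XtQ = X.T * Phi[:, i]                   # X^T Q_i without building Q_i
        W.append(np.linalg.solve(XtQ @ X, XtQ @ y))
    return np.array(W)


def lolimot_predict(U, W, Phi):
    X = np.column_stack([np.ones(len(U)), U])
    return np.sum((X @ W.T) * Phi, axis=1)      # y_hat = sum_i L_i(u) * Phi_i(u)


# Two local models on a 1-D input; the target has a kink at u = 0.
U = np.linspace(-1, 1, 101).reshape(-1, 1)
y = np.abs(U[:, 0])
centers = np.array([[-0.5], [0.5]])
sigmas = np.array([[0.2], [0.2]])
Phi = validity(U, centers, sigmas)
W = local_wls(U, y, Phi)
print(np.round(W, 3))  # rows: [offset, slope] of each local model
```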


37/65

- Input space partitioning

1- Set the first hyper-rectangle so that it contains all data points. Estimate a global linear model.

2- For all input dimensions j = 1, ..., n:

2a. Cut the hyper-rectangle into two halves along dimension j.

2b. Estimate local linear models for each half.

2c. Calculate the global approximation error (output error) for the model with this cut.

3- Determine which cut has led to the smallest approximation error.


38/65

4- Perform this cut. Place a weighting function at the center of each of the two hyper-rectangles. Set the standard deviations of both weighting functions proportional to the extension of the hyper-rectangle in each dimension. Apply the corresponding estimated local linear models (from 2b).

5- Calculate the local error measures $J_i$ on the basis of a parallel running model for each hyper-rectangle.

6- Choose the hyper-rectangle with the largest local error measure $J_i$.

7- If the global approximation error of the parallel model (output error) is too large, go to step 2.

8- Convergence. Stop.
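A simplified sketch of the partitioning loop, assuming: the standard deviations are k_σ = 1/3 times the box extension, the local error is J_i = Σ_k e²(k)·Φ_i(u_k), and the loop stops at a fixed model count or a mean-squared-error threshold. The helper names `fit` and `lolimot` and all parameter values are hypothetical; this is an illustration of steps 1 to 8, not Nelles' reference implementation.

```python
import numpy as np


def fit(U, y, boxes, k_sigma=1/3):
    """Place one Gaussian per hyper-rectangle and estimate the local models by WLS."""
    c = np.array([(lo + hi) / 2 for lo, hi in boxes])        # centers of the boxes
    s = np.array([k_sigma * (hi - lo) for lo, hi in boxes])  # sigma ~ box extension
    d2 = (((U[:, None, :] - c[None]) / s[None]) ** 2).sum(axis=2)
    mu = np.exp(-0.5 * d2)
    Phi = mu / mu.sum(axis=1, keepdims=True)                 # normalized validities
    X = np.column_stack([np.ones(len(U)), U])
    W = np.array([np.linalg.solve((X.T * Phi[:, i]) @ X, (X.T * Phi[:, i]) @ y)
                  for i in range(len(boxes))])
    y_hat = np.sum((X @ W.T) * Phi, axis=1)
    return y_hat, Phi


def lolimot(U, y, max_models=8, tol=1e-3):
    """Greedy tree construction following steps 1-8 (simplified)."""
    boxes = [(U.min(axis=0), U.max(axis=0))]                 # step 1: global model
    y_hat, Phi = fit(U, y, boxes)
    while len(boxes) < max_models and np.mean((y - y_hat) ** 2) > tol:  # step 7
        J = ((y - y_hat) ** 2) @ Phi                         # step 5: local errors J_i
        worst = int(np.argmax(J))                            # step 6: worst box
        lo, hi = boxes[worst]
        best = None
        for j in range(U.shape[1]):                          # step 2: every dimension
            mid = (lo[j] + hi[j]) / 2
            hi_left, lo_right = hi.copy(), lo.copy()
            hi_left[j], lo_right[j] = mid, mid               # 2a: cut into two halves
            cand = boxes[:worst] + [(lo, hi_left), (lo_right, hi)] + boxes[worst + 1:]
            yh, ph = fit(U, y, cand)                         # 2b: local linear models
            err = np.mean((y - yh) ** 2)                     # 2c: global output error
            if best is None or err < best[0]:                # step 3: best cut so far
                best = (err, cand, yh, ph)
        _, boxes, y_hat, Phi = best                          # step 4: perform the cut
    return boxes, y_hat


U = np.linspace(-1, 1, 201).reshape(-1, 1)
y = np.sin(np.pi * U[:, 0])
boxes, y_hat = lolimot(U, y, max_models=6)
print(len(boxes), "local models, MSE =", np.mean((y - y_hat) ** 2))
```

Steps 5 and 6 are evaluated at the top of each iteration; for the first pass there is only one box, so this matches the order on the slides.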


39/65

LOLIMOT



40/65

Example:



41/65

Properties:

High interpretability of rules

Automatic partitioning of the input space according to the system properties

Different objective functions for the modeling error and for structure optimization

Low sensitivity to user-selected parameters

No curse of dimensionality for high-dimensional problems


42/65

Implementing Hierarchical Fuzzy Clustering in Fuzzy Identification Using Weighted Fuzzy C-Means


43/65

Clustering

- Definition: to divide the data set in such a way that objects belonging to the same cluster are as similar as possible and objects belonging to different clusters are as dissimilar as possible

- Types

1- Crisp

2- Fuzzy

- Properties

1- Unsupervised learning task

2- Nonlinear optimization

3- Computational economy

4- Needs user-defined parameters


44/65

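Fuzzy C-means (FCM)

Cost function:

$$J = \sum_{i=1}^{C}\sum_{k=1}^{N} \mu_{ik}^{m}\,\|\mathbf{u}_k - \mathbf{c}_i\|^2, \qquad \sum_{i=1}^{C}\mu_{ik} = 1$$

m → 1: clusters → crisp; m → ∞: clusters → fuzzy

Iterative training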
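The iterative training can be sketched as the usual alternating optimization: centers are membership-weighted means, and memberships are updated as μ_ik ∝ d_ik^(-2/(m-1)), normalized over clusters. Random membership initialization, the stopping tolerance, and the two-blob test data are assumptions; m must be greater than 1.

```python
import numpy as np


def fcm(U, C, m=2.0, iters=100, tol=1e-6, seed=0):
    """Fuzzy C-means by alternating updates of centers and memberships."""
    rng = np.random.default_rng(seed)
    M = rng.random((C, len(U)))
    M /= M.sum(axis=0)                                # constraint: sum_i mu_ik = 1
    for _ in range(iters):
        Mm = M ** m
        V = (Mm @ U) / Mm.sum(axis=1, keepdims=True)  # centers: weighted means
        D = np.linalg.norm(U[None, :, :] - V[:, None, :], axis=2)  # C x N distances
        D = np.fmax(D, 1e-12)                         # avoid division by zero
        inv = D ** (-2.0 / (m - 1.0))
        M_new = inv / inv.sum(axis=0)                 # membership update
        done = np.max(np.abs(M_new - M)) < tol
        M = M_new
        if done:
            break
    return V, M


rng = np.random.default_rng(42)
U = np.vstack([rng.normal([0, 0], 0.3, (50, 2)), rng.normal([3, 3], 0.3, (50, 2))])
V, M = fcm(U, C=2)
print(np.round(V, 2))
```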


45/65

Example of fuzzy C-means


46/65

Weighted fuzzy C-means (WFCM): some data points are more important than others, so each point is given a weight in the cost function.
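A minimal sketch assuming the common weighted formulation J = Σ_i Σ_k w_k μ_ik^m ‖u_k − c_i‖². Only the center update changes (each point's contribution is scaled by w_k); the membership update is the same as in FCM because w_k factors out of each point's term. The function name `wfcm` and the 1-D test data are illustrative.

```python
import numpy as np


def wfcm(U, w, C, m=2.0, iters=100, tol=1e-6, seed=0):
    """Weighted fuzzy C-means: point k enters the cost with weight w[k]."""
    rng = np.random.default_rng(seed)
    M = rng.random((C, len(U)))
    M /= M.sum(axis=0)
    for _ in range(iters):
        Mw = (M ** m) * w[None, :]                    # w_k * mu_ik^m
        V = (Mw @ U) / Mw.sum(axis=1, keepdims=True)  # weighted centers
        D = np.fmax(np.linalg.norm(U[None] - V[:, None], axis=2), 1e-12)
        inv = D ** (-2.0 / (m - 1.0))
        M_new = inv / inv.sum(axis=0)                 # same update as plain FCM
        done = np.max(np.abs(M_new - M)) < tol
        M = M_new
        if done:
            break
    return V, M


# The heavy point at u = 1 pulls the first center close to it.
U = np.array([[0.0], [1.0], [10.0], [11.0]])
w = np.array([1.0, 9.0, 1.0, 1.0])
V, M = wfcm(U, w, C=2)
print(np.sort(V[:, 0]))
```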


47/65

Self-organizing map (SOM):

The most famous neural-network-based clustering method

K-means (crisp C-means) with sequential training


48/65

SOM algorithm:

1- Choose initial values for the C neuron vectors $\mathbf{c}_i$, i = 1, ..., C. This can be done by randomly picking C different data samples.

2- Choose one sample u from the data set. This can be done either randomly or by systematically going through the whole data set.

3- Calculate the distance of the selected data sample to all neuron vectors. Typically, the Euclidean distance measure is used. The neuron whose vector is closest to the data sample is called the winner neuron.

4- Update the vector of the winner neuron in a way that moves it toward the selected data sample u:

$$\mathbf{c}_{win} \leftarrow \mathbf{c}_{win} + \eta\,(\mathbf{u} - \mathbf{c}_{win})$$

5- If any neuron vector has been moved significantly in the previous step, go to step 2; otherwise stop.
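The five steps can be sketched as winner-take-all sequential training. Two choices are assumptions not stated on the slides: the step size decays as η0/(1 + epoch) so that the moves shrink and step 5 can eventually fire, and "moved significantly" is taken to mean a move larger than `tol`. The name `som_sequential` and the two-blob data are illustrative.

```python
import numpy as np


def som_sequential(U, C, eta0=0.3, max_epochs=200, tol=1e-3, seed=0):
    """Winner-take-all SOM (sequential K-means) following steps 1-5."""
    rng = np.random.default_rng(seed)
    V = U[rng.choice(len(U), size=C, replace=False)].astype(float)  # step 1
    for epoch in range(max_epochs):
        eta = eta0 / (1.0 + epoch)            # decaying step size (assumption)
        max_move = 0.0
        for k in rng.permutation(len(U)):     # step 2: samples in random order
            u = U[k]
            win = int(np.argmin(np.linalg.norm(V - u, axis=1)))  # step 3: winner
            step = eta * (u - V[win])         # step 4: move winner toward u
            V[win] += step
            max_move = max(max_move, float(np.linalg.norm(step)))
        if max_move < tol:                    # step 5: no significant movement
            break
    labels = np.argmin(np.linalg.norm(U[:, None] - V[None], axis=2), axis=1)
    return V, labels


rng = np.random.default_rng(7)
U = np.vstack([rng.normal([0, 0], 0.3, (40, 2)), rng.normal([4, 4], 0.3, (40, 2))])
V, labels = som_sequential(U, C=2)
print(np.round(V, 2))
```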


49/65

Fuzzy clustering for fuzzy identification

It is an unsupervised learning task, so it does not need any additional data.

The input space term sets are derived directly from the result of the clustering process.

Computational economy


50/65

Application of clustering in fuzzy modeling

1- Applying clustering algorithms to input data only

2- Applying clustering algorithms to output data only

3- Applying clustering algorithms to a vector composed of input and output data


51/65

FCM for input space partitioning

Problem: FCM requires a priori knowledge of the number of clusters.

- Determine the number of clusters in an iterative manner

- Use optimal fuzzy clustering methods

Problem: dependence of FCM on the initialization

- Hierarchical clustering

Problem: interpretability of the final fuzzy model

- Model simplification methods


52/65

Algorithm:



53/65

Algorithm:

1- Apply the SOM algorithm to classify the N data samples into n crisp clusters ($\mathbf{c}_j$, j = 1, ..., n).

2- Select the n cluster centers ($\mathbf{c}_j$, j = 1, ..., n) from the previous step and assign a weight to each of them according to its relative cardinality.

3- Apply WFCM to classify the n weighted cluster centers ($\mathbf{c}_j$, $w_j$) into C new clusters.
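Step 2 can be sketched as counting how many of the N samples the SOM assigned to each crisp cluster and dividing by N. The labels below are made up for illustration, and `cardinality_weights` is a hypothetical helper name.

```python
import numpy as np


def cardinality_weights(labels, n):
    """Weight of each crisp cluster center = (samples in cluster) / (total samples)."""
    counts = np.bincount(labels, minlength=n)
    return counts / counts.sum()


labels = np.array([0, 0, 1, 2, 2, 2, 2, 1, 0, 2])  # hypothetical SOM assignments
w = cardinality_weights(labels, n=4)
print(w)  # cluster 3 received no samples, so its weight is 0
```

These weights are what WFCM receives in step 3, so a densely populated SOM cluster pulls the final cluster centers more strongly than a sparse one.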


54/65

4- The centers of the Gaussian membership functions in the premise of the fuzzy rules are obtained by simply projecting the final cluster centers onto each axis. To calculate the respective standard deviations, use the fuzzy covariance matrix [5].

5- Use weighted least squares to optimize the consequent parameters and steepest descent for the premise parameters (formulas in [5]).

6- Merge similar membership functions for interpretability, using a similarity measure.

7- Optimize the consequent parameters again.
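Steps 4 and 6 can be sketched as follows, under stated assumptions. The fuzzy covariance is taken as F_i = Σ_k μ_ik^m (u_k − c_i)(u_k − c_i)ᵀ / Σ_k μ_ik^m, with σ from its diagonal. The slides' own similarity formula is not recoverable, so the fuzzy Jaccard index |A ∩ B| / |A ∪ B| (min/max on a grid) is used as a stand-in; the merge rule (average the two parameter pairs) and the 0.8 threshold are also hypothetical.

```python
import numpy as np


def premise_mfs(U, V, M, m=2.0):
    """Centers: cluster centers projected on each axis; sigma: sqrt(diag(F_i))."""
    sig = np.empty_like(V)
    for i in range(len(V)):
        w = M[i] ** m
        diff = U - V[i]
        F = (w[:, None] * diff).T @ diff / w.sum()   # fuzzy covariance matrix F_i
        sig[i] = np.sqrt(np.diag(F))
    return V.copy(), sig


def jaccard_similarity(c1, s1, c2, s2, grid):
    """Assumed similarity measure: sum(min(A, B)) / sum(max(A, B)) on a grid."""
    a = np.exp(-0.5 * ((grid - c1) / s1) ** 2)
    b = np.exp(-0.5 * ((grid - c2) / s2) ** 2)
    return np.minimum(a, b).sum() / np.maximum(a, b).sum()


def merge_if_similar(mf1, mf2, grid, threshold=0.8):
    """Replace two similar Gaussians (c, sigma) by one with averaged parameters."""
    (c1, s1), (c2, s2) = mf1, mf2
    if jaccard_similarity(c1, s1, c2, s2, grid) >= threshold:
        return [((c1 + c2) / 2, (s1 + s2) / 2)]
    return [mf1, mf2]


grid = np.linspace(-5, 5, 1001)
print(merge_if_similar((0.0, 1.0), (0.1, 1.0), grid))   # nearly identical: merged
print(merge_if_similar((-3.0, 0.3), (3.0, 0.3), grid))  # disjoint: kept separate
```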


55/65

Example I:



56/65

Example:

(Figure: data samples in the (x1, x2) plane, the SOM cluster centers, and the resulting WFCM clusters.)

Weights of the SOM cluster centers: green w = 2/51, red w = 4/51, black w = 10/51, dark blue w = 8/51, blue w = 3/51


57/65

Example (cont.):

(Figure: initial, final, and simplified term sets for x1 and x2. Simplified term sets: small, medium, large for x1; small, large for x2.)

R1: if x1 is small and x2 is small then y = 17.3 - 2.6x1 + 1.4x2

R2: if x1 is medium and x2 is large then y = 7.5 - 2.9x1 - 0.02x2

R3: if x1 is large and x2 is small then y = 4.7 + 2.7x1 - 7.8x2

R4: if x1 is large and x2 is large then y = 2.8 - 0.2x1 - 0.2x2

J = 0.1801, J = 0.0018, J = 0.0154


58/65

Example II:

(Figure: time series x(t) for t = 0 to 500.)

Inputs: x(t-18), x(t-12), x(t-6), x(t)

Output: x(t+6)
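The input/output pairs for this example can be built from any sampled series by stacking lagged copies. A minimal sketch; the slides do not show how the series was generated, so the helper `embed` works on an arbitrary array `x`, and the `np.arange` series below only makes the indexing easy to check.

```python
import numpy as np


def embed(x, lags=(18, 12, 6, 0), horizon=6):
    """Inputs [x(t-18), x(t-12), x(t-6), x(t)] and target x(t+6) for every valid t."""
    x = np.asarray(x, dtype=float)
    t = np.arange(max(lags), len(x) - horizon)      # t with all lags and target available
    X = np.column_stack([x[t - lag] for lag in lags])
    y = x[t + horizon]
    return X, y


X, y = embed(np.arange(100))
print(X.shape, X[0], y[0])
```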


59/65

Example II (cont.):

(Figure: initial, final, and simplified term sets for x(t-18), followed by the corresponding panels for another input.)

60/65

Example II (cont.):

(Figure: initial, final, and simplified term sets for the remaining inputs, including x(t).)

J = 0.0166, J = 0.0072, J = 0.0128


61/65

Benefits compared with similar approaches:

It does not need any additional data

Low sensitivity to user-selected parameters and initial conditions

Computational economy

Curse of dimensionality

Interpretability

Sensitivity to data distribution


62/65

Universal approximator


63/65

Proof: [6]


64/65

References:

1- Babuška, R. and Verbruggen, H. (2003). Neuro-fuzzy methods for nonlinear system identification (review). Annual Reviews in Control, 27, 73-85.

2- Haykin, S. (1998). Neural Networks: A Comprehensive Foundation. Prentice Hall.

3- Jang, J.-S. R. (1993). ANFIS: Adaptive-network-based fuzzy inference system. IEEE Transactions on Systems, Man, and Cybernetics, 23(3), 665-685.

4- Nelles, O. and Isermann, R. (1996). Basis function networks for interpolation of local linear models. In: IEEE Conference on Decision and Control (CDC), 470-475.

5- Oliveira, J. V. and Pedrycz, W. (2007). Advances in Fuzzy Clustering and its Applications. John Wiley & Sons, chapter 12.

6- Espinosa, J., Vandewalle, J., and Wertz, V. (2004). Fuzzy Logic, Identification and Predictive Control. Springer Verlag, Berlin.

7- Nelles, O. (2002). Nonlinear System Identification. Springer Verlag, Berlin.


65/65

Questions and Discussion

Thanks for your attention
