
Fuzzy Systems: Fuzzy Rule Generation

Rudolf Kruse, Christian Moewes
{kruse,cmoewes}@iws.cs.uni-magdeburg.de

Otto-von-Guericke University of Magdeburg
Faculty of Computer Science

Department of Knowledge Processing and Language Engineering

R. Kruse, C. Moewes Fuzzy Systems – Fuzzy Rule Generation 2010/01/27 1 / 61

Outline

1. Motivation

2. Extracting Grid-Based Fuzzy Rules

3. Extracting Individual Fuzzy Rules

4. Rule Generation by Fuzzy Clustering

5. Different Approaches

Extracting Fuzzy Systems from Data

How can fuzzy systems automatically be derived from example data?

• suppose, input space X and output space Y

• we observe n training patterns (xi , yi ) ∈ S ⊆ X × Y, 1 ≤ i ≤ n

• given numerical input, X = (X1, . . . , Xp) ⊂ IRp and thus each input xi is a p-dimensional vector

• fuzzy rule base shall approximate S

• classical two-step approach:

1. find fuzzy sets by

• either predefining them on input and output variables

• or constructing them throughout learning procedure

2. find fuzzy rules

• either directly

• or iteratively

Outline

1. Motivation

2. Extracting Grid-Based Fuzzy Rules
   Wang & Mendel Algorithm
   Higgins & Goodman Algorithm

Fixed Grid-Based Algorithms

• each Xj is partitioned into a small set of linguistic values

• rules use all or subset of possible combinations

⇒ global granulation of X into tiles

R1,...,1 : if x1 is µ1,1 and . . . and xp is µ1,p then . . .

. . .

Rl1,...,lp : if x1 is µl1,1 and . . . and xp is µlp,p then . . .

• lj (1 ≤ j ≤ p): number of linguistic values for Xj

• problems:

• exponentially many rules in high-dimensional spaces

• fine grid causes very high computational costs

• wrong choice of grid may skip extrema
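The exponential growth can be made concrete with a small sketch. It assumes a complete grid, i.e., one rule per tile, so the rule count is the product l1 · . . . · lp; the function name `grid_rule_count` is only illustrative.

```python
from math import prod

def grid_rule_count(values_per_input):
    """Number of rules in a complete grid: l_1 * l_2 * ... * l_p."""
    return prod(values_per_input)

# Five linguistic values per input, for a growing number of inputs p.
for p in (2, 5, 10):
    print(p, grid_rule_count([5] * p))
```

With five linguistic values per input, two inputs need 25 rules, while ten inputs already need 5^10 = 9,765,625 rules.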

Wang & Mendel Algorithm

• basic fuzzy rule learning method

1. predefine global grid for X and Y

2. determine best possible output fuzzy set for each rule

• one run through S determines closest x to geometrical rule center

• closest output fuzzy value ↦ corresponding fuzzy rule

• problems if

1. function extrema are far away from center point

2. number of rules gets huge

1. Granulate X and Y

[Figure: two-dimensional grid of triangular fuzzy sets µ1,1, µ1,2, µ1,3 with one rule per tile, e.g., R1,3, R2,2, R2,3]

• divide each Xj into lj equidistant triangular fuzzy sets

• similarly, Y is granulated into ly triangular fuzzy sets
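A minimal sketch of such a granulation, assuming that each triangle peaks at its center and drops to zero at the centers of its neighbors (so neighboring sets overlap and their degrees sum to one). The function names are illustrative, and at least two sets per variable are assumed.

```python
import numpy as np

def triangular_partition(lo, hi, n_sets):
    """Centers of n_sets equidistant triangular fuzzy sets on [lo, hi]."""
    return np.linspace(lo, hi, n_sets)

def memberships(x, centers):
    """Membership degree of scalar x in each triangular set.

    Each triangle peaks at its center and reaches 0 at the neighboring centers.
    """
    width = centers[1] - centers[0]
    return np.clip(1.0 - np.abs(x - centers) / width, 0.0, 1.0)

centers = triangular_partition(0.0, 1.0, 3)   # centers at 0, 0.5, 1
print(memberships(0.25, centers))             # halfway between the first two sets
```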

2. Determine best consequence for each rule

• for each (x, y) = (x1, . . . , xp, y) ∈ S compute membership degree to each possible tile

$$\mu_{(k_1,\dots,k_p,k_y)}(x, y) = \min\{\mu_{k_1,1}(x_1), \dots, \mu_{k_p,p}(x_p), \mu_{k_y}(y)\}$$

with 1 ≤ kj ≤ lj and 1 ≤ ky ≤ ly

• µkj,j = membership function of kj-th linguistic value of Xj

• for all k1, . . . , kp, the tile maximizing µ(k1,...,kp,ky) over 1 ≤ ky ≤ ly gives one rule

R(k1,...,kp) : if x1 is µk1,1 and . . . and xp is µkp,p then y is µky

• membership degree is used as rule weight β(k1,...,kp)

Determining y based on new x

• given arbitrary x, rules determine crisp output y

1. for each rule compute degree of fulfillment

$$\mu_{(k_1,\dots,k_p)}(x) = \min\{\mu_{k_1,1}(x_1), \dots, \mu_{k_p,p}(x_p)\}$$

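2. compute y by some kind of COG defuzzification

$$y = \frac{\sum_{k_1=1,\dots,k_p=1}^{l_1,\dots,l_p} \beta_{(k_1,\dots,k_p)} \cdot \mu_{(k_1,\dots,k_p)}(x) \cdot y_{(k_1,\dots,k_p)}}{\sum_{k_1=1,\dots,k_p=1}^{l_1,\dots,l_p} \beta_{(k_1,\dots,k_p)} \cdot \mu_{(k_1,\dots,k_p)}(x)}$$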

• y(k1,...,kp) = center of output region of R(k1,...,kp)
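The two steps and the defuzzification can be sketched as follows. This is a minimal illustration, not a reference implementation: it assumes equidistant triangular sets given by their centers, uses the center of the chosen output set as y(k1,...,kp), and the names `tri`, `learn_rules`, and `predict` are hypothetical.

```python
import itertools
import numpy as np

def tri(x, centers):
    """Degrees of scalar x in equidistant triangular sets peaking at `centers`."""
    width = centers[1] - centers[0]
    return np.clip(1.0 - np.abs(x - centers) / width, 0.0, 1.0)

def learn_rules(X, y, x_centers, y_centers):
    """One pass through the data: for every input cell keep the output set
    with the highest tile membership min{mu_k1(x1), ..., mu_kp(xp), mu_ky(y)}."""
    rules = {}  # (k1, ..., kp) -> (ky, beta)
    for xi, yi in zip(X, y):
        mu_x = [tri(v, c) for v, c in zip(xi, x_centers)]
        mu_y = tri(yi, y_centers)
        ky = int(np.argmax(mu_y))  # best output set for this example
        # Only cells with nonzero membership in every input can be affected.
        for cell in itertools.product(*(np.nonzero(m)[0] for m in mu_x)):
            degree = min(min(m[k] for m, k in zip(mu_x, cell)), mu_y[ky])
            if degree > rules.get(cell, (None, 0.0))[1]:
                rules[cell] = (ky, float(degree))
    return rules

def predict(x, rules, x_centers, y_centers):
    """Weighted center-of-gravity output for one input vector x."""
    mu_x = [tri(v, c) for v, c in zip(x, x_centers)]
    num = den = 0.0
    for cell, (ky, beta) in rules.items():
        fire = min(m[k] for m, k in zip(mu_x, cell))  # degree of fulfillment
        num += beta * fire * y_centers[ky]
        den += beta * fire
    return num / den if den > 0 else float("nan")

# One input, one output: approximate y = x^2 on [0, 1] with a 5 x 5 grid.
xs = np.linspace(0.0, 1.0, 21)
X, y = xs.reshape(-1, 1), xs ** 2
x_centers = [np.linspace(0.0, 1.0, 5)]
y_centers = np.linspace(0.0, 1.0, 5)
rules = learn_rules(X, y, x_centers, y_centers)
print(predict([0.5], rules, x_centers, y_centers))
```

Because each rule's output is decided by the single best-matching example, the sketch shows the same weakness as the algorithm itself: extrema that lie between rule centers are not represented.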

Example: Wang & Mendel Algorithm

[Figure: left panel “given data points”, right panel “step 1: granulate X and Y”]

• example data set with one input and one output

• note that closest points to corresponding rules are red

Example: Wang & Mendel Algorithm (cont.)

[Figure: left panel “step 2: generate rules”, right panel “resulting crisp approximation”]

• fuzzy rules are shown by their α = 0.5-cuts

• learned model misses extrema far away from rule centers

Example: Wang & Mendel Algorithm (cont.)

• generated rule base:

R1 : if x is zerox then y is mediumy

R2 : if x is smallx then y is mediumy

R3 : if x is mediumx then y is largey

R4 : if x is largex then y is mediumy

• intuitively, rule R2 should probably be used to describe minimum

R′2 : if x is smallx then y is smally

Summary

• one pattern per rule is used to compute rule’s outcome

• high variance would lead to model failures

• predefined fixed grid yields a fuzzy model which

• either does not fit underlying function very well

• or consists of large number of rules

⇒ wish to automatically determine granulations of X and Y

Higgins & Goodman Algorithm

• extension of Wang & Mendel algorithm

1. only one membership function is used for each Xj and Y

⇒ one large rule covering entire feature space

2. new membership functions are placed at points of maximum error

• both steps are repeated until

• maximum number of divisions is reached or

• approximation error remains below threshold

1. Initialization

• create membership function for each Xj covering entire domain range

• create membership functions for Y at corner points of X

• at corner point, each Xj takes the maximum or minimum of its domain range

• for each corner point, closest example from S is used to add membership function at its output value

2. Adding new Membership Functions

• find point within S with maximum error

• defuzzification is the same as in Wang & Mendel

• for each Xj, add new membership function at corresponding value of “maximal error point”

⇒ this point is now perfectly described by model

3. Create new Cell-based Rule Set

• new rules: associate output membership functions with newly created cells

⇒ take closest point to all membership functions of X (equals Wang & Mendel)

• associated output membership function is closest one to output value of “closest point”

• if output value of “closest point” is far away, new output function is created

4. Termination Detection

• if error is below threshold (or if certain number of iterations have been done), then stop algorithm

• otherwise continue at step 2
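The four steps can be sketched for the simplest case of one input variable. The sketch makes several simplifying assumptions that are not part of the slides: rule weights are all 1, each rule outputs a crisp value instead of an output membership function, the input values are distinct, and triangular sets peak at their knot and reach zero at the neighboring knots, so that the center-of-gravity output between two knots reduces to linear interpolation. The name `higgins_goodman_1d` is hypothetical.

```python
import numpy as np

def higgins_goodman_1d(x, y, max_sets=10, tol=1e-3):
    """Greedy refinement for one input variable (simplified sketch).

    Returns the knots (peaks of the input sets), the rule outputs,
    and the remaining maximum error on the training data.
    """
    x, y = np.asarray(x, float), np.asarray(y, float)

    def output_at(knot):
        # Rule output = y of the training example closest to the knot.
        return y[np.argmin(np.abs(x - knot))]

    knots = [x.min(), x.max()]                  # step 1: corner points of the domain
    while True:
        knots.sort()
        outs = [output_at(k) for k in knots]    # step 3: rebuild the rule outputs
        err = np.abs(np.interp(x, knots, outs) - y)
        worst = int(np.argmax(err))             # step 2: example with maximum error
        if err[worst] <= tol or len(knots) >= max_sets:   # step 4: termination
            return np.array(knots), np.array(outs), float(err[worst])
        knots.append(x[worst])                  # new membership function there

x = np.linspace(0.0, 1.0, 11)
y = np.abs(x - 0.3)                             # minimum far from the domain corners
knots, outs, max_err = higgins_goodman_1d(x, y)
print(knots, max_err)
```

In this example a single additional knot at the minimum is enough, which illustrates why the approach models extrema better than a fixed grid, and also why a single outlier would attract a membership function of its own.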

Summary

• this approach can model extrema better than Wang & Mendel

• it favors extrema

⇒ strong tendency to fit outliers

• data driven granulation: difficult to interpret

• greedy algorithm: grid is often suboptimal

• it is possible to simplify learned rule base, e.g., by rule ranking and search for best rules

Outline

1. Motivation

3. Extracting Individual Fuzzy Rules
   Individual Membership Functions
   Berthold & Huber Algorithm

Extensions avoiding Global Grids

• in high-dimensional X , global granulation leads to many rules

⇒ now, no global dependence on granulation

• individual membership functions for each rule

• better modeling of local properties

R1 : if x1 is µ1,1 and . . . and xp is µ1,p then . . .

. . .

Rr : if x1 is µr,1 and . . . and xp is µr,p then . . .

• not all attributes will be used for all rules

• individual choice of constraints on few attributes per rule

• better interpretability in high dimensions

• no exponential number of rules with increasing dimensionality

Local Granulation

[Figure: local granulation with individual fuzzy sets µ1,1, µ1,2, µ1,3 per rule]

• 3 rules in 2 dimensions

• compare with global granulation à la Wang & Mendel

• possible disadvantage

• potential loss of interpretation

• projection of all fuzzy sets onto one Xj is usually not meaningful

Berthold & Huber Algorithm

• constructs rule base with individual fuzzy sets per rule

• parameters that must be specified:

• granulation of Y, i.e., number and shape of membership functions

• c fuzzy sets defined by µ^k_y with 1 ≤ k ≤ c

• algorithm iterates over S and fine-tunes evolving model

• final rule base consists of fuzzy rules R^k_d, 1 ≤ d ≤ rk

• rk = number of rules for output region k

• output for k-th region and some x is computed as in a Mamdani controller

$$\mu^k(x) = \max_{1 \le d \le r_k} \min_{1 \le j \le p} \{\mu^k_{d,j}(x_j)\}$$

Form of Rules

• all rules rely on trapezoidal membership functions

⇒ each rule can be described by 4 parameters per Xj

if x1 is 〈a1, b1, c1, d1〉 and . . . and xp is 〈ap, bp, cp, dp〉 then y is µk

• however, if some trapezoid covers entire domain of an Xj,

then rule’s degree of fulfillment is independent of this Xj
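A small sketch of how such rules can be evaluated. It assumes that 〈a, b, c, d〉 denotes support (a, d) and core [b, c], and that an infinite a or d means the attribute is unconstrained on that side, so the membership there is 1. The function names are illustrative.

```python
import math

def trapezoid(x, a, b, c, d):
    """Trapezoidal membership <a, b, c, d>: support (a, d), core [b, c].

    Assumption: an infinite a or d leaves that side unconstrained (degree 1).
    """
    if b <= x <= c:
        return 1.0
    if x <= a or x >= d:
        return 0.0
    if x < b:   # rising edge between a and b
        return 1.0 if math.isinf(a) else (x - a) / (b - a)
    # falling edge between c and d
    return 1.0 if math.isinf(d) else (d - x) / (d - c)

def rule_degree(x, trapezoids):
    """Degree of fulfillment of one rule: minimum over all attributes."""
    return min(trapezoid(xj, *t) for xj, t in zip(x, trapezoids))

def region_output(x, rules):
    """Output for one output region: maximum over its rules."""
    return max(rule_degree(x, r) for r in rules)

inf = math.inf
rule_a = [(1.0, 2.0, 3.0, 4.0), (-inf, -inf, inf, inf)]  # second attribute is free
rule_b = [(3.0, 5.0, 6.0, 7.0), (0.0, 1.0, 2.0, 3.0)]
print(region_output([1.5, 100.0], [rule_a, rule_b]))
```

For `rule_a` the second attribute has no influence on the result, which is the independence described in the last bullet.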

Algorithm 1 Berthold & Huber

Input: S = {(xi, ci) | 1 ≤ i ≤ n} with xi ∈ IRp and ci ∈ IN the class of xi

do {
    for each training example (x, c) ∈ S {
        if correct rule of class c exists {
            increase weight by one                            // COVER
            adjust core region of rule to cover x
        } else {
            insert new rule with core equals x                // COMMIT
            support equals ∞ (i.e., rule is not constrained)
        }
        reduce support of all rules of conflicting class that cover x   // SHRINK
    }
} while changes have occurred

• COVER and COMMIT are easy to implement

• SHRINK is based on heuristics (e.g., volume-based)
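The loop can be sketched in Python as follows. This is a simplified illustration under several assumptions that the slides do not fix: core and support are axis-parallel boxes, support boundaries are open, SHRINK cuts the dimension in which the conflicting pattern lies farthest outside the core (one possible stand-in for a volume-based heuristic), and conflicts that fall inside a core are left unresolved. The class `Rule` and the functions `shrink` and `train` are hypothetical names.

```python
import numpy as np

class Rule:
    def __init__(self, x, label):
        self.label = label
        self.weight = 1
        self.core_lo = x.copy()                     # core starts as the point x
        self.core_hi = x.copy()
        self.sup_lo = np.full(x.shape, -np.inf)     # support is unconstrained
        self.sup_hi = np.full(x.shape, np.inf)

    def covers(self, x):
        """True if x lies strictly inside the support region."""
        return bool(np.all((self.sup_lo < x) & (x < self.sup_hi)))

def shrink(rule, x):
    """Cut the support in one dimension so that x is no longer covered.

    Assumed heuristic: use the dimension where x is farthest from the core.
    Returns False if x lies inside the core (not handled in this sketch).
    """
    gap_lo = rule.core_lo - x        # > 0 where x is below the core
    gap_hi = x - rule.core_hi        # > 0 where x is above the core
    gaps = np.maximum(gap_lo, gap_hi)
    j = int(np.argmax(gaps))
    if gaps[j] <= 0:
        return False
    if gap_lo[j] > 0:
        rule.sup_lo[j] = x[j]
    else:
        rule.sup_hi[j] = x[j]
    return True

def train(X, labels, max_epochs=100):
    rules = []
    for _ in range(max_epochs):
        changed = False
        for x, c in zip(np.asarray(X, float), labels):
            match = next((r for r in rules if r.label == c and r.covers(x)), None)
            if match is not None:                            # COVER
                match.weight += 1
                match.core_lo = np.minimum(match.core_lo, x)
                match.core_hi = np.maximum(match.core_hi, x)
            else:                                            # COMMIT
                rules.append(Rule(x, c))
                changed = True
            for r in rules:                                  # SHRINK
                if r.label != c and r.covers(x):
                    changed |= shrink(r, x)
        if not changed:
            break
    return rules

X = [[0, 0], [1, 0], [0, 1], [5, 5], [6, 5], [5, 6]]
labels = [0, 0, 0, 1, 1, 1]
rules = train(X, labels)
print([(r.label, r.core_lo, r.core_hi) for r in rules])
```

On this toy data the loop settles on one rule per class after a few passes; a trapezoid per attribute is then given by 〈support low, core low, core high, support high〉.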

Some Remarks on Core and Support Regions

• algorithm finds rule base that completely describes data

• each rule is partial hypothesis for subset S′ ⊂ S

• core = most specific hypothesis covering S′

• support = (one of the) most general hypotheses covering S′

⇒ support is more general than core

• both core and support regions can be seen as

• smallest area with highest degree of confidence (evidence)

• largest area without conflict (no counter-example)

Example: Berthold & Huber Algorithm

• given two-dimensional X and training data S where |Y| = 2

• task: fuzzy binary classification

• first, start with empty rule base for each region/class

• Java applet is available that demonstrates algorithm

Example: Berthold & Huber Algorithm (cont.)

• insert general rule for first example pattern

• suppose that 2nd pattern is from different class

⇒ new rule is inserted for 2nd pattern

• also adjust 1st (conflicting) rule

• suppose that 3rd pattern is from same class as 2nd one

⇒ adjust free feature to avoid conflict with 3rd pattern

• and so on. . .

Choosing the Right Feature to Shrink

• in p-dimensional feature space, there are p choices

• algorithm uses several heuristics:

1. maximize remaining volume ⇒ low rule numbers, good coverage

2. minimize number of constrained attributes ⇒ feature reduction

3. minimize number of constraints on free features ⇒ interpretability

4. use information theoretic measures ⇒ feature importance

Outline

1. Motivation

4. Rule Generation by Fuzzy Clustering
   Extend Membership Values to Continuous Membership Functions
   Example: The Iris Data
   Information Loss from Projection
   Example: Transfer Passenger Analysis

Rule Generation by Fuzzy Clustering

1. apply fuzzy clustering to X ⇒ fuzzy partition matrix U = [uij ]

2. use obtained U = [uij ] to define membership functions

• usually X is multidimensional

⇒ How to specify meaningful labels for multidim. membership functions?

Extend uij to Continuous Membership Functions

• assigning labels for one-dimensional domains is easier ⇒

1. project U down to X1, . . . , Xp axis, respectively

2. only consider upper envelope of membership degrees

3. linearly interpolate membership values ⇒ membership functions

4. cylindrically extend membership functions

• original clusters are interpreted as conjunction of cyl. extensions

• e.g., cylindrical extensions “x1 is low”, “x2 is high” ⇒ multidimensional cluster label “x1 is low and x2 is high”

• labeled clusters = classes characterized by labels

• every cluster = one fuzzy rule
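The projection steps can be sketched for one cluster as follows. The sketch assumes the membership degrees come from a fuzzy partition matrix row, sets the membership to 0 outside the observed value range (an assumption, since the slides do not say how to extrapolate), and models the conjunction of cylindrical extensions by the minimum. All names are illustrative.

```python
import numpy as np

def upper_envelope(values, u):
    """Project one cluster onto one attribute.

    values: attribute values of the n data points
    u:      membership degrees of the same points to the cluster
    Returns the sorted distinct values and the maximum degree at each value.
    """
    values, u = np.asarray(values, float), np.asarray(u, float)
    grid = np.unique(values)
    degrees = np.array([u[values == g].max() for g in grid])
    return grid, degrees

def membership(x, grid, degrees):
    """Linear interpolation between envelope points, 0 outside the data range."""
    return float(np.interp(x, grid, degrees, left=0.0, right=0.0))

def cluster_rule_degree(x, envelopes):
    """Conjunction of the cylindrical extensions: minimum over all attributes."""
    return min(membership(xj, g, d) for xj, (g, d) in zip(x, envelopes))

# Two attributes, four data points, memberships to a single cluster.
u = [0.2, 0.6, 1.0, 0.4]
env1 = upper_envelope([1.0, 1.0, 2.0, 3.0], u)
env2 = upper_envelope([5.0, 6.0, 6.0, 8.0], u)
print(cluster_rule_degree([1.5, 6.0], [env1, env2]))
```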

Convex Completion [Höppner et al., 1999]

• problem of this approach: non-convex fuzzy sets

⇒ having upper envelope, compute convex completion

• we denote p1, . . . , pn with p1 ≤ . . . ≤ pn as ordered projections of x1, . . . , xn and µi1, . . . , µin as respective membership values

• eliminate each point (pt, µit), t = 1, . . . , n, for which two limit indices tl, tr = 1, . . . , n, tl < t < tr, exist s.t.

$$\mu_{it} < \min\{\mu_{it_l}, \mu_{it_r}\}$$

• after that: apply linear interpolation of remaining points
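The elimination rule can be sketched with running maxima: a point is removed exactly when some point to its left and some point to its right both have a strictly larger membership degree. The function name `convex_completion` is illustrative.

```python
import numpy as np

def convex_completion(p, mu):
    """Drop every point that lies below both a left and a right neighbor.

    p:  projected attribute values
    mu: membership degrees of the upper envelope at these values
    Returns the remaining points, ordered by p, ready for linear interpolation.
    """
    p, mu = np.asarray(p, float), np.asarray(mu, float)
    order = np.argsort(p)
    p, mu = p[order], mu[order]
    left_max = np.maximum.accumulate(mu)                # max over indices <= t
    right_max = np.maximum.accumulate(mu[::-1])[::-1]   # max over indices >= t
    keep = mu >= np.minimum(left_max, right_max)
    return p[keep], mu[keep]

kept_p, kept_mu = convex_completion([1, 2, 3, 4, 5], [0.2, 0.8, 0.5, 1.0, 0.3])
print(kept_p, kept_mu)
```

In the example the point at p = 3 is removed, because its degree 0.5 lies below 0.8 on the left and 1.0 on the right; the remaining degrees rise to the maximum and then fall, so interpolation gives a convex fuzzy set.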

Example: The Iris Data

[Images: Iris setosa, Iris versicolor, Iris virginica; © Iris Species Database http://www.badbear.com/signa/]

• collected by Ronald Aylmer Fisher (famous statistician)

• 150 cases in total, 50 cases per Iris flower type

• measurements: sepal length/width, petal length/width (in cm)

• most famous dataset in pattern recognition and data analysis

Example: The Iris Data

• shown: sepal length and petal length

• Iris setosa (red), Iris versicolor (green), Iris virginica (blue)

1. Membership Degrees from FCM

[Figure: membership degrees over sepal length (left panel) and petal length (right panel)]

• raw, unmodified membership degrees

2. Upper Envelope


• for every attribute value and cluster center, only consider maximum membership degree

3. Convex Completion


• convex completion removes spikes [Höppner et al., 1999]

4. Linear Interpolation


• interpolation for missing values (needed for normalization)

5. Stretched and Normalized Fuzzy Sets


• every µi(xj) ↦ µi(xj)5 (extends core and support)

• finally, normalization is performed

Information Loss from Projection

• rule derived from fuzzy cluster represents approximation of cluster

• information gets lost by projection

• cluster shape of FCM is spherical

• cluster projection leads to hypercube

• hypercube contains hypersphere

• loss of information can be kept small using axes-parallel clusters

Example: Transfer Passenger Analysis [Keller and Kruse, 2002]

• German Aerospace Center (DLR) developed macroscopic passenger flow model for simulating passenger movements on airport’s land side

• for passenger movements in terminal areas: distribution functions are used today

• goal: build fuzzy rule base describing transfer passenger amount between aircraft

• these rules can be used to improve macroscopic simulation

• idea: find rules based on probabilistic fuzzy c-means (FCM)

Attributes for Passenger Analysis

• maximal amount of passengers in certain aircraft (depending on type of aircraft)

• distance between airport of departure and airport of destination (in three categories: short-, medium-, and long-haul)

• time of departure

• percentage of transfer passengers in aircraft

General Clustering Procedure

[Flowchart with three phases:

• preparation: preprocessing (identification of outliers, scale adaption), parameter selection (clustering technique, number of clusters or validity measure, similarity measure)

• calculation: initialization, calculation of prototypes, calculation of membership degrees

• evaluation: sufficient classification?, extraction of fuzzy rules]

Distance Measure

• distance between x = (x1, x2) and c = (0, 0)

$$d^2(c, x) = \|c - x\|^2$$

$$d^2_\tau(c, x) = \frac{1}{\tau^p} \|c - x\|^2$$

Distance Measure with Size Adaption

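$$d^2_{ij} = \frac{1}{\tau_i^p} \cdot \|c_i - x_j\|^2$$

$$c_i = \frac{\sum_{j=1}^n u_{ij}^m x_j}{\sum_{j=1}^n u_{ij}^m}$$

$$\tau_i = \frac{\sum_{j=1}^n u_{ij}^m d^2_{ij}}{\sum_{k=1}^c \sum_{j=1}^n u_{kj}^m d^2_{kj}}$$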

• p determines emphasis put on size adaption during clustering

Constraints for the Objective function

• probabilistic clustering

• noise clustering

• influence of outliers

Probabilistic and Noise Clustering

Influence of Outliers

• A weighting factor ωj is attached to each datum x j

• weighting factors are adapted during clustering

• using concept of weighting factors:

• outliers in data set can be identified and

• outliers’ influence on partition is reduced

Membership Degrees and Weighting Factors

Influence of Outliers

• minimize objective function

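$$J(X, U, C) = \sum_{i=1}^c \sum_{j=1}^n u_{ij}^m \cdot \frac{1}{\omega_j^q} \cdot d^2_{ij}$$

subject to

$$\forall j \in [n]: \sum_{i=1}^c u_{ij} = 1, \quad \forall i \in [c]: \sum_{j=1}^n u_{ij} > 0, \quad \sum_{j=1}^n \omega_j = \omega$$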

• q determines emphasis put on weight adaption during clustering

• update equations for memberships and weights, resp.

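$$u_{ij} = \frac{d_{ij}^{\frac{2}{1-m}}}{\sum_{k=1}^c d_{kj}^{\frac{2}{1-m}}}, \qquad \omega_j = \frac{\left(\sum_{i=1}^c u_{ij}^m d^2_{ij}\right)^{\frac{1}{q+1}}}{\sum_{k=1}^n \left(\sum_{i=1}^c u_{ik}^m d^2_{ik}\right)^{\frac{1}{q+1}}} \cdot \omega$$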
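One alternating update step can be sketched as follows. The membership update is the standard fuzzy c-means formula. The weight update is an assumption: it is what minimizing an objective of the form sum over i, j of u_ij^m · d_ij² / ω_j^q yields under the constraint that the weights sum to ω, namely weights proportional to (sum over i of u_ij^m d_ij²)^(1/(q+1)). Function names and the small distance floor are illustrative choices.

```python
import numpy as np

def update_memberships(d2, m=2.0):
    """u_ij = d_ij^(2/(1-m)) / sum_k d_kj^(2/(1-m)); d2 holds squared distances, shape (c, n)."""
    inv = np.maximum(d2, 1e-12) ** (1.0 / (1.0 - m))   # floor avoids division by zero
    return inv / inv.sum(axis=0, keepdims=True)

def update_weights(u, d2, m=2.0, q=1.0, omega=1.0):
    """Weights proportional to (sum_i u_ij^m d_ij^2)^(1/(q+1)), scaled to sum to omega."""
    a = ((u ** m) * d2).sum(axis=0) ** (1.0 / (q + 1.0))
    return omega * a / a.sum()

# Two clusters, two data points; the second point is far from the first cluster.
d2 = np.array([[1.0, 4.0],
               [1.0, 1.0]])
u = update_memberships(d2)
w = update_weights(u, d2)
print(u)
print(w)
```

A datum with large weighted distances receives a large ω_j, and since its term in the objective is divided by ω_j^q, its influence on the partition shrinks.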

Determining the Number of Clusters

• here, validity measures evaluating whole partition of data

⇒ global validity measures

• clustering is run for varying number of clusters

• validity of resulting partitions is compared
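The procedure can be sketched with a plain fuzzy c-means and one global validity measure. The slides do not name the measure used; the partition coefficient below is chosen only as a well-known example, and the data, the seeds, and the function names are made up for illustration.

```python
import numpy as np

def fcm(X, c, m=2.0, iters=100, seed=0):
    """Plain probabilistic fuzzy c-means; returns (centers, U) with U of shape (c, n)."""
    rng = np.random.default_rng(seed)
    U = rng.random((c, len(X)))
    U /= U.sum(axis=0)                      # memberships of each datum sum to 1
    for _ in range(iters):
        W = U ** m
        centers = (W @ X) / W.sum(axis=1, keepdims=True)
        d2 = ((centers[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
        inv = np.maximum(d2, 1e-12) ** (1.0 / (1.0 - m))
        U = inv / inv.sum(axis=0)
    return centers, U

def partition_coefficient(U):
    """Mean of sum_i u_ij^2: 1/c for a uniform partition, 1 for a crisp one."""
    return float((U ** 2).sum(axis=0).mean())

# Synthetic data: three compact groups in two dimensions.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc, 0.1, size=(30, 2)) for loc in ((0, 0), (3, 0), (0, 3))])

# Run the clustering for several numbers of clusters and compare the validity.
scores = {c: partition_coefficient(fcm(X, c)[1]) for c in range(2, 6)}
for c, s in scores.items():
    print(c, round(s, 3))
```

The number of clusters with the best validity value would then be selected; other measures can be swapped in by replacing `partition_coefficient`.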

Fuzzy Rules and Induced Vague Areas

• intensity of color indicates firing strength of specific rule

• vague areas = fuzzy clusters where color intensity indicates membership degree

• tips of fuzzy partitions in single domains = projections of multidimensional cluster centers

Simplification of Fuzzy Rules

• similar fuzzy sets are combined to one fuzzy set

• fuzzy sets similar to universal fuzzy set are removed

• rules with same input sets are

• combined if they also have same output set(s) or

• otherwise removed from rule set
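The last simplification step can be sketched as follows, assuming that similar fuzzy sets have already been merged so that each rule is given by a tuple of input-set labels and one output-set label. The labels and the function name are made up for illustration.

```python
from collections import defaultdict

def simplify_rules(rules):
    """Merge rules with identical inputs and outputs, drop contradicting ones.

    rules: list of (inputs, output) pairs, `inputs` being a tuple of set labels.
    """
    outputs = defaultdict(set)
    for inputs, output in rules:
        outputs[inputs].add(output)
    # Keep an input combination only if all of its rules agree on the output.
    return [(inputs, next(iter(outs)))
            for inputs, outs in outputs.items() if len(outs) == 1]

rules = [
    (("small", "late"), "high"),
    (("small", "late"), "high"),    # duplicate: combined with the rule above
    (("medium", "noon"), "high"),
    (("large", "noon"), "low"),
    (("large", "noon"), "high"),    # contradiction: both rules are removed
]
print(simplify_rules(rules))
```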

Results

• FCM with c = 18, outlier and size adaptation, Euclidean distance:

[Figure: left panel “resulting fuzzy sets”, right panel “simplified fuzzy sets”]

Evaluation of the Rule Base

rule   max. no. of pax   dest.   depart.   % transfer pax
1      paxmax1           R1      time1     tpax1
2      paxmax2           R1      time2     tpax2
3      paxmax3           R1      time3     tpax3
4      paxmax4           R1      time4     tpax4
5      paxmax5           R5      time1     tpax5
. . .  . . .             . . .   . . .     . . .

• rules 1 and 5: aircraft with relatively small amount of maximal passengers (80-200), short- to medium-haul destination, and departing late at night usually have high amount of transfer passengers (80-90%)

• rule 2: flights with medium-haul destination and small aircraft (about 150 passengers), starting about noon, carry relatively high amount of transfer passengers (ca. 70%)


Different Approaches

• constructive: find fuzzy rules by growing singletons

• hierarchical: merge grid cells if no points are covered or same class is predicted

• adaptive:

• initialize rules (randomly or, e.g., with expert knowledge) and iteratively optimize rule parameters (e.g., location, number of fuzzy sets)

• based on, e.g., gradient descent, neural networks, . . .

• evolutionary: find rules by mutation/crossover over generations

• neuro-fuzzy: inject fuzzy rules into ANN, use its learning algorithm

Literature about Fuzzy Rule Generation

Berthold, M. R. and Hand, D. J., editors (2003). Intelligent Data Analysis: An Introduction. Springer, Berlin, Germany, 2nd edition.

Borgelt, C., Klawonn, F., Kruse, R., and Nauck, D. (2003). Neuro-Fuzzy-Systeme: Von den Grundlagen künstlicher Neuronaler Netze zur Kopplung mit Fuzzy-Systemen. Vieweg, Wiesbaden, Germany, 3rd edition.

Höppner, F., Klawonn, F., Kruse, R., and Runkler, T. (1999). Fuzzy Cluster Analysis: Methods for Classification, Data Analysis and Image Recognition. John Wiley & Sons Ltd, New York, NY, USA.

Keller, A. and Kruse, R. (2002). Fuzzy rule generation for transfer passenger analysis. In Wang, L., Halgamuge, S. K., and Yao, X., editors, Proceedings of the 1st International Conference on Fuzzy Systems and Knowledge Discovery (FSKD’02), pages 667–671, Orchid Country Club, Singapore.
