Efficient fuzzy compiler for SIMD architectures

Applied Soft Computing 4 (2004) 287–301

Efficient fuzzy compiler for SIMD architectures

Enrique Frıas-Martıneza, Julio Gutiérrez-Rıosb,∗, Felipe Fernández-Hernándezb

a Computer Science Department, Courant Institute of Mathematical Sciences, New York University, 715 Broadway, New York, NY 10003, USAb Dpto. de Tecnolog´ıa Fotónica, Facultad de Informática, Universidad Politécnica de Madrid, 28660 Boadilla del Monte, Madrid, Spain

Received 7 January 2004; accepted 6 March 2004

Abstract

This paper presents a real-time full-programmable fuzzy compiler based on piecewise linear interpolation techniquesdesigned to be executed in single instruction multiple data (SIMD) architectures. A full-programmable fuzzy processor isdefined as a system where the set of rules, the membership functions, thet-norm, thet-conorm, the aggregation operator, thepropagation operator, and the defuzzyfication algorithm can be defined by any valid algorithm. The SIMD platforms selectedare the Intel Pentium III (using the SSE set of instructions) and the Texas Instruments TI DSP C6x family. The final speedobtained in both implementations is highly satisfactory and better than the speed provided by standard specific hardware.© 2004 Elsevier B.V. All rights reserved.

Keywords:Fuzzy compiler; Single instruction multiple data; Linear interpolation techniques

1. Introduction

Fuzzy logic has been successfully introduced in awide range of applications, ranging from classical con-trol systems to decision support systems like nuclearplant supervision[12], medical applications[40] orautomotive applications[7]. A wide variety of fuzzylogic applications are described in[1,17]. Neverthe-less, two are the main drawbacks of fuzzy logic solu-tions: (1) the processing speed and (2) the limited pro-grammable capabilities of both the specific platformsand the design environments.

The need to process fuzzy knowledge base systemswith high speed resulted in the development of fuzzyhardware architectures. The first developments weredone in the mid 1980s by Togai[34] using a digi-tal architecture, and by Yamakawa[36] using analog

∗ Corresponding author. Tel.:+34-91-3367-370;fax: +34-91-3367-372.E-mail address:[email protected] (J. Gutierrez-Rıos).

techniques. Before that, other ASICs designed to pro-cess fuzzy knowledge bases with high speed wheredeveloped[6,25]. During the mid 1990s due to the ad-vance in the performance of standard architectures, itbecame possible to use them as a platform for highspeed fuzzy processing. In order to obtain the max-imum performance possible in standard architecturesfuzzy compilers where developed[23,39]. These com-pilers are based in the idea that the fuzzy syntax of therules is a useful way of representing knowledge, andthe fuzzy algorithm is a useful way of processing it, butthey are not the best way of processing a fuzzy logicsystem in a standard architecture. In order to processfuzzy systems with high speed in standard architec-tures it is needed a compilation from the fuzzy systemsyntax to a syntax suitable for a standard architecture.

The design characteristics of a fuzzy system areapplication dependent. Different applications willrequire differentt-norm, different membership func-tions, or different defuzzyfication algorithms. Be-sides, the same problem can be solved by different

1568-4946/$ – see front matter © 2004 Elsevier B.V. All rights reserved.doi:10.1016/j.asoc.2004.03.007

288 E. Frıas-Martınez et al. / Applied Soft Computing 4 (2004) 287–301

designers using fuzzy systems with different char-acteristics (different membership functions, differentinference mechanism, or different defuzzyficationalgorithms, for example). Nevertheless, a commoncharacteristic to almost all fuzzy coprocessors is thatonly the set of rules and the membership functionsof the system can be defined, while the fuzzy algo-rithm is implemented by hardware, and cannot beprogrammed. This problem is also present in fuzzycompilers; they are designed for a specific fuzzy sys-tem (usually Takagi–Sugeno (TS)[22,39]) and, again,only rules and membership functions can be defined.Some fuzzy logic integrated development environ-ments (IDE) do allow programmable capabilities,but the final inference mechanism obtained does notprovide high-speed processing.

In this paper a full-programmable fuzzy system isdefined as a system where the rules, membership func-tions, thet-norm, thet-conorm, the propagation oper-ator, the aggregation operator, and the defuzzyficationalgorithm can be defined. A full-programmable fuzzycompiler will allow to execute any kind of applicationusing standard hardware.

The platforms selected for the implementation ofthe proposed compiler are Intel Pentium III[16] andthe Texas Instruments DSP C6x family (C6201[30]and C6701[30]). These two families are, respec-tively, good examples of a microprocessor and of adigital signal processor architecture, which are thetypical platforms for the execution of fuzzy logicsystems. Also both platforms have SIMD architec-tures which make them ideal to execute the proposedcompiler.

In the rest of the paper, first the objectives and therelated work are described. Next a full-programmablefuzzy compiler and its implementation on SIMD ar-chitectures is introduced. In the last section, the char-acteristics of the controller implemented are comparedwith other implementations.

2. Objectives and related work

The main objective is the design of a high-perfor-mance full-programmable compiler for fuzzy systems.This goal makes the model to be involved in two fun-damental aspects: first, full programmability and, sec-ond, high performance in standard architectures.

Full programmability means to be able to define ab-solutely all the processes that a fuzzy controller is ableto carry out. That definition consists on determiningthe algorithm that characterises each process. The pro-grammable characteristics should be: (1) membershipfunctions (any kind of membership function should bepossible with the only constrain of being fuzzy num-bers), (2)t-norm andt-conorm, (3) propagation oper-ator, (4) aggregation operator and (5) defuzzyficationalgorithm.

On the other hand, the term high performanceis a time-varying concept depending on the currenttechnological state: nowadays, we may assume thatit is acceptable to have an execution speed of theorder of tenths of MFLIPS. With a response time inthe order of nanoseconds, it is ensured the possibil-ity to serve a majority of applications, as shown in[4].

Some authors have already presented the idea of us-ing a compiler to execute a fuzzy system. In[22,23]a compiler is presented, based on interpolation for TSsystems with membership functions with an overlapfactor of two. The ASIC implementation of this com-piler achieves 2 MFLIPS. In[39] it is presented a co-processor, the FZP-0401A, based on the compilationof the knowledge base also for TS systems that has aprocessing speed of 0.48 MFLIPS.

Although the concept of compiler has already beenused, as far as we know, no compiler has been devel-oped for full-programmable fuzzy systems. The de-velopment of such a compiler can be done using theconcept of approximate fuzzy compilation.

3. Approximate fuzzy compilation

Standard architectures are not adequate forhigh-performance fuzzy processing, because theirstructure has been designed for numerical applica-tions and, tough fuzzy inference is usually made ina digital way by fuzzy coprocessors, the nature ofthe treated information is not essentially numerical.Consequently, our purpose is to introduce a pre-vious compilation of the original fuzzy system inorder to adapt it to the operations usually made by ahigh-performance standard processor.

As shown inFig. 1, the compiler starts from thespecification of ann-dimensional fuzzy system (FS),

E. Frıas-Martınez et al. / Applied Soft Computing 4 (2004) 287–301 289

Fig. 1. Concept of approximate fuzzy compilation.

defined by the vector:

FS= (Tn, Tc,PO,AO,D,R,MFi,MFo,M, I,O)

(1)

whereTn is thet-norm of the system,Tc thet-conorm,PO the propagation operator, AO the agregation oper-ator,D the defuzzyfication algorithm,R the set of rulesof the system, MFi are the set of membership functionsof each input,i = 1, . . . , n, defined as fuzzy numbers,MFo the membership functions of the linguistic labelsdefined on each output,I = (I1, . . . , In) the vectorof inputs of the system, andO = (O1, . . . , On) thevector of the outputs of the system.

The compiler extracts an approximated fuzzy sys-tem (AFS) characterised by mathematical structuresable to make a fast evaluation of the system.

The control surface (CS) is the output of the systemfor any input values inside the universe of discourse.The main condition that must fulfil a fuzzy model isthe ability to generate a control surface CS′ such that,

CS′ ≈ CS (2)

Indeed, what becomes important under the point ofview of a fuzzy model is to behave as close as pos-sible to the fuzzy system. Although the output of theapproximate system is not exactly the same, this is notrelevant since a fuzzy controller is not conceived asa precise system. The designer determines the outputof the system only in a reduced set of well-selectedsamples and the rest of the output is obtained as a co-herent mix of them. Taking into account these ideas,the compiler must keep the value of the system in thepoints given by the designer in the fuzzy knowledgebase, but can obtain an approximation for the rest ofthe points for which no value was specified. The com-piler will preserve the certainty of the designer in theapproximate fuzzy system obtained.

The concept of certainty in this context is relatedwith the design of the fuzzy membership functions.For any given membership function, the kernel rep-resents the set of points for which the designer has acomplete certainty of the output of the system. Thecompiler will preserve the output value of these in-puts. For the rest of the points, where the membershipfunction goes from 0 to 1 or vice versa, the designerhas only a partial certainty of the output of the system.In these points the compiler can produce an approxi-mation.

In fact, many fuzzy systems are type-2[19,21]fuzzylogic systems. In these kind of systems uncertainty ismodelled using the third dimension of type-2 fuzzysets. A type-2 FS is able to consider uncertainties aboutmeasurements, rule labels of antecedents and conse-quents, fuzzy logical operators in use, etc. When alluncertainties disappear, then a type-2 FS reduces to atype-1 FS. Although a complete theory of type-2 FSsexists for general type-2 FSs, it is only for intervaltype-2 FSs that type-2 FSs are commonly practical.This paper uses this uncertainty as a positive factor toreduce the complexity of the corresponding approxi-mate fuzzy algorithms.

More formally, the control surface CS of an intervaltype-2 FS can be modelled by an interval functionF:

F : x → (z1, z2) (3)

wherex is the multivariate input of the MISO systemconsidered and (z1, z2) is the corresponding outputinterval.

To implement this interval fuzzy functionF an ap-proximation functionF′ is defined:

F ′ : x → z0 (4)

such thatz0 ∈ (z1, z2). This approximation is goodenough in many practical applications. The main ad-vantage of this consideration is that some functionsF′


can be modelled in many circumstances with a muchmore simple structure than the original fuzzy function.

A very simple approximator to implement is themulti-linear interpolator that is well-known in finiteelements analysis and computer graphics. Multi-linearinterpolators are simple extensions of linear ones to themultidimensional case, and are defined by a sequenceof linear interpolations.

These multi-linear interpolators can also be speci-fied in a fuzzy way by means of a product–sum TSfuzzy system with triangular fuzzy partitions on theantecedents domain[8,18]. Moreover, it is possibleto use the referred characteristic points of the speci-fied antecedent partitions: interval corners of supportand core of the corresponding antecedents, to directlyspecify each initial approximate triangular fuzzy par-tition. The corresponding domain points can also beconsidered such a suitable nonuniform sampling of thespecified fuzzy rule system, and the values of outputfunction on these characteristic grid points are used todefine the approximate multivariate TS system con-sidered:

{Ri : If x isNi thenz iszi} (5)

whereNi is the corresponding multivariate second or-der pyramidal spline andzi is the corresponding outputfunction in the multidimensional sampling pointi.

Section 4presents the derived model to efficientlycompute this TS system in a standard processor archi-tecture.

4. High-performance full-programmable fuzzymodel

The developed fuzzy model has been structuredin two steps, the first one compilation and the sec-ond one, execution. The compilation step transformsthe original fuzzy system into an equivalent approxi-mate fuzzy system capable of being executed in realtime in a standard architecture and keeping the rel-evant information of the original fuzzy system. Theexecution step describes how the output is computed

using the information obtained in the compilationstep.

4.1. Full-programmable fuzzy compiler

Let a fuzzy system FS be defined by a vector asstated inEq. (1).

4.1.1. Definitions of some useful functions and spacesIn this section a set of functions and spaces are

defined in order to make easier the description of themodel:

• For(V): letV = (A1, . . . , Ap) be a vector withAi ∈R. For(V) sorts the componentsAi in ascendant or-der, obtaining the outputV ′ = {A′

1, . . . , A′p} with

A′1 < . . . < A′

p.• Fe(V): let V = (A1, . . . , Ap) be a vector with

Ai ∈ R. Fe(V) takes out the components ofVwhich are repeated and generates a new vectorV ′ =(A′

1, . . . , A′l) with A′

1 �= . . . �= A′l.• Fi(V): let V = (A1, . . . , Ap) be a vector withAi ∈

R. Fi(V) obtains the intervals of vectorV:

Fi(A1, . . . , Ap) = ([A1, A2), . . . , [Ap−1, Ap))

(6)

• Card(A): gives the cardinality of the parameterA.• S: is the input space of the fuzzy system FS.• Kernele(A): obtains the two extreme points that de-

fine the kernel ofA.• Suppe(A): obtains the two extreme points that define

the support ofA.

4.1.2. Partition of the input space SRule activation depends on the region in which the

inputs are included. Consequently, it is important tomake a coherent partition of the input space, accord-ingly to the position of the linguistic labels.

We define the activation points of the inputIi, APi,with i = 1, . . . , n, as the set of points determined bythe extreme points of the kernels and supports of eachmembership function ofIi, in ascendant order. Usingthe functions defined inSection 4.1.1, we can expressthe activation points as:

APi = Fe

For

Card(MFi)⋃k=1

(Suppe(MFi,k) ∪ Kernele(MFi,k))

(7)


The activation intervals of an inputIi, AI i, is definedas the intervals of the activation points:

AI i = Fi(APi) (8)

Let FS be a fuzzy system with activation inter-vals AI = {AI1, . . . ,AIn} of the input variablesI ={I1, . . . , In}, andS then-dimensional input space de-fined by the vectorI of inputs. The partitionP of theinput spaceSof the system is defined as the cartesianproduct of the elements of AI:

1

n

ii

P AI=

= ×(9)

The next step of the model obtains the output of theFS in the vertex that define the cells of the partition P.Starting from these values, we construct the approxi-mate control surface CS′. The number of cells of thepartition P is given by the following expression:

Card(P) =n∏

i=1

Card(AIi) (10)

Each one of the cells in P, Pk, is determined by2n vertex, n being the dimension of the system,n = Card(I). Calling VP the total number of thevertexes of P, we can write the following expression:

VP =n∏

k=1

Card(APi) (11)

4.1.3. Characteristic matrix of the fuzzy system FSWe define the vertex matrix V of a fuzzy system

FS, as the matrix that contains the vertexes of parti-tion P. The characteristic matrix CM is defined as then-dimensional matrix containing the output of the FSin the vertexes of partition P. Dimi = Card(APi), be-ing the number of elements of the activation pointsof the dimension i, the element Va1,... ,an of V can bewritten as follows:

Va1,... ,an = (AP1,a1 , . . . ,APn,an) (12)

Consequently, V can be expressed as the Cartesianproduct of the activation points APi of each input tothe FS:

1

n

ii

V AP=

= ×(13)

Each vertex Va1,... ,an in V is an identifier of a cellPk of the partition P, where k is obtained as:

k =1∑

i=n

(ai − 1)

n∏k=i+1

Card(AIk)

+ 1 (14)

with a1 = 1 . . .Card(AP1) − 1, . . . , an = 1 . . .Card(APn)−1. The cell Pk is defined by the set of vertexesVEk:

VEk = (Va1,a2,... ,an , Va1+1,a2,... ,an ,

Va1,a2+1,... ,an , . . . , Va1+1,a2+1,... ,an+1) (15)

The characteristic matrix CM is obtained as the outputof FS in each element of V:

CMa1,a2,... ,an = FS(Va1,a2,... ,an) (16)

In order to accelerate the execution, the compilationphase defines two more functions: equalisation (E) andnormalisation (N).

4.1.4. Equalisation of the inputs of the systemLet I = (I1, I2, . . . , In) be the vector of inputs of

the fuzzy system FS. We define the equalisation ofthe dimension k (k = 1, . . . , n) as the natural numberindicating the activation interval AIk in which the in-put Ik is found. Consequently, being j = Card(APk)for each input Ik, the equalisation function Ek(Ik) isdefined as follows:

Ek(Ik) =

1 si Ik ∈ AIk,12 si Ik ∈ AIk,2...

j − 1 si Ik ∈ AIk,j−1

(17)

Fig. 2 presents a graphical representation of thefunction Ek(Ik).

The equalisation vector E is defined as the set of allthe equalisation functions:

E = (E1(I1), . . . , En(In)) (18)

In the same way, the equalisation vector of an inputI, E(I), is defined as the set of values of the equalisationfunctions for that input:

E(I)=(E1(I1), . . . , En(In))=(a1, a2, . . . , an) (19)

The equalisation vector of I identifies the cell Pk ofpartition P in which the input I is included, as statedin Eqs. (14) and (15).


Fig. 2. Equalisation function Ek(Ik).

4.1.5. Normalisation functionsThe objective of the normalisation functions is to

make all inputs to be normalised in the range [0,1) ineach activation interval.

Each coefficient ak of Eq. (19) points to the activa-tion interval of Ik which is currently active. [A,B) be-ing the activation interval ak of Ik, the normalisationfunction of Ik in the activation interval ak is definedas follows:

Nk,ai = Ik − A

B − A(20)

For each input Ik of the input vector I, there will beCard(AIk) normalisation functions. Nk(Ik) is definedas a vector containing all the normalisation functionsof Ik:

Nk(Ik) = (Nk,1(Ik), . . . , Nk,Card(AIk)(Ik)) (21)

The normalisation matrix (N) is defined as the setof all normalisation functions:

N = (N1(I1), . . . , Nn(In)) (22)

Finally, as it was the case for the equalisation func-tions, the normalisation vector of the input I, N(I), isdefined as the vector obtained after applying the re-spective normalisation functions to each one of thedimensions of the input vector I:

N(I) = (N1,a1(I1), · · · , Nn,an(In)) (23)

where, as stated in Eq. (19), (a1, a2, . . . , an) representboth, the activation intervals of I, and the vertex of thecell in which the input I is included.

4.1.6. Approximate fuzzy systemAs a product of the compilation we obtain three

items:

1. Characteristic matrix (CM) described in Section4.1.3, Eq. (16).

2. Equalisation vector defined in Eq. (18): E ={E1(I1), . . . , En(In)}.

3. Normalisation matrix defined in Eq. (22): N ={N1(I1), . . . , Nn(In)}.

The set of these three items constitutes the so calledapproximate fuzzy system of the original fuzzy systemFS:

AFS = {MC, E,N} (24)

4.2. Execution step of the full-programmable fuzzysystem

Once the AFS has been obtained, the first step forthe computation of the output of the model is to get therelevant information. With this purpose, it is necessaryto calculate the equalisation vector E(I) of the input Iusing Eq. (19). Using the equalisation vector E(I) asan index in the characteristic matrix CM, the outputsof FS for the vertexes of the cell Pk in which the inputI is included are obtained.

We define the characteristic vector CV(I) of the in-put I, as the set of values of the output of the FS on thevertexes of the cell Pk in which the input I is included:

CV(I) = (CMa1,a2,... ,an ,CMa1+1,a2,... ,an ,

CMa1,a2+1,... ,an , . . . ,CMa1,a2,... ,an+1)

(25)

The vector (a1, a2, . . . , an), has the necessary in-formation to obtain N(I) applying Eq. (23):

N(I) = (N1,a1(I1), N1,a2(I2), . . . , N1,an(In)) (26)

The approximate fuzzy system function of the inputI, AFS(I), is defined as the computation of the char-acteristic vector CV(I) and the normalisation vectorN(I) making use of the information provided by theapproximate fuzzy system AFS:

AFS(I) = {CV(I), N(I)} (27)

The second step of the execution phase, obtainsthe output of the system using the data obtained inEq. (27); O being the output of the model, O is ob-tained as:

O = FO(AFS(I)) = FO(CV(I), N(I)) (28)


where FO is the approximation function that pro-duces the output that the model provides. Thehigh-performance full-programmable fuzzy model, ormodel M for short, is defined as a tuple formed byAFS and FO:

M = {AFS, FO} (29)

4.2.1. Choosing the approximation function FO

The approximation function FO should verify thefollowing characteristics:

• FO must be defined as:

FO(R2n, Rn) → R (30)

• The result of FO for an input I must be as near tothe original FS as possible:

FS(I) ≈ FO(AFS(I)) (31)

• FO must be able to be efficiently implemented in aSIMD architecture.

The most immediate approximation function FO

that fulfills these premises is multi-linear interpolation.

4.3. Generalisation of the model M to multipleoutput systems

Let FS = (Tn, Tc,PO,AO,D,R,MFi,MFo,M,

I,O) be a multiple output (MIMO) fuzzy system,with O = (O1, . . . , Om), where m is the numberof outputs. The adaptation of the model M to theMIMO FS consists on considering the original FS ascomposed by m fuzzy systems:

FS1 = (Tn, Tc,PO,AO,D,R,MFi,MFo,M, I,O1)

...

FSm = (Tn, Tc,PO,AO,D,R,MFi,MFo,M, I,Om)

(32)

From each one of the fuzzy systems {FS1, . . . ,FSm}an associated model is obtained:

M1 = (AFS1, FO) · · ·Mm = (AFSm, FO) (33)

Consequently, the model M of the MIMO FS isgiven by the set of all the submodels:

M = {M1, . . . ,Mm} (34)

4.4. Validation of the model M: the model M as aTakagi–Sugeno compiler

The validity of the model M can be proved demon-strating that it produces an approximation of thecontrol surface CS as stated in Eq. (2), and that thisapproximation can be made as close to the originalsurface as needed. The proof consists only on provingthat the proposed compiler provides a Takagi–Sugenosystem.

The fact that the model M provides an approxima-tion to the original CS of FS is based on the followingresults:

• A zero-order Takagi–Sugeno system with triangu-lar linguistic labels, unity partition, overlapping fac-tor of two, product as t-norm and weighted sum ast-conorm is a multi-linear interpolator. This resultcan be found, for example in [18].

• A zero-order Takagi–Sugeno system with triangularlinguistic labels, unity fuzzy partition, overlappingfactor of two, product as t-norm and weighted sumas t-conorm, is a universal approximator. Castro [3]demonstrated that Takagi–Sugeno systems, as wellas many others, are universal approximators. Simi-lar results may be seen in [20,37,38].

Starting from those results the proof is immediate:as stated in [18] a zero-order Takagi–Sugeno systemwith triangular linguistic labels, unity fuzzy partition,overlapping factor of two and product–sum as t-normand t-conorm, respectively, is equivalent to carryingout a multi-linear interpolation.

The model M uses multi-linear interpolation as theapproximation function FO in each cell Pk of the inputspace. Consequently, in all those cells it is possible toobtain an equivalent Takagi–Sugeno system with theprevious characteristics.

For a system of two inputs, Fig. 3 shows a cellPk of P with the linguistic labels of the equivalentTakagi–Sugeno system and the output of the FS inthe vertexes of the cell. The four rules which woulddefine the equivalent Takagi–Sugeno system are thefollowing ones:

If I1 isEk and I2 isEp then Z = CMk,p

If I1 isEk+1 and I2 isEp then Z = CMk+1,p

If I1 isEk and I2 isEp+1 then Z = CMk,p+1

If I1 isEk+1 and I2 isEp+1 then Z = CMk+1,p+1

(35)


Fig. 3. Takagi–Sugeno system generated in a concrete cell.

Making a generalisation of these concepts, theconclusion is that it is possible to get the equivalentTakagi–Sugeno system for all the cells derived fromthe compilation step of the model M, using partitionP and the characteristic matrix CM. The model Mprovides a Takagi–Sugeno system which, accordingto [3], is a universal approximator. Then, the modelM is a universal approximator.

As a particular example, any zero-order Takagi–Sugeno system with triangular linguistic labels andoverlapping factor of two is approximated by themodel M without error.

4.5. Estimations of the model M

Once the model M has been defined, it is im-portant to evaluate the computational cost. Withthis purpose, the most important parameters are theamount of memory needed to store the data struc-tures generated by the model M, and the executiontime.

If we call B the number of bytes of a real numberor an integer in the chosen architecture, the amount ofnecessary memory will be given by:

Mem1 = B

n∏i=1

Card(APi) (36)

In the case of having the equalisation and normal-isation functions digitised, they would involve tablesof memory. Let pi (i = 1, . . . , n) be the number ofbytes used by the sensors of every one of the inputs ofthe system, B1 and B2 the size in bytes of the equal-isation and the normalisation of an input, and n thedimension of the system, the amount of necessary

memory to keep the tables of digitisation Mdig wouldbe:

Mdig =n∑

i=1

2pi(B1 + B2) (37)

The execution time of the model M is highly depen-dent on the approximation function FO employed. Thevalues given in this analysis are referred to multi-linearinterpolation. The number of interpolations to be madein an n-dimensional system equals 2n − 1. Conse-quently, τ being the processing time for a single in-terpolation, the total amount of time T1 necessary tocarry out the multi-linear interpolation will be:

T1 = τ(2n − 1) (38)

The estimation of τ may be done as the additionof the processing time of a product, an addition and asubstraction:

τ = τprod + τsum + τsub (39)

In order to estimate the latency to obtain the char-acteristic vector CV(I) and the normalisation vectorN(I), we will distinguish between discrete inferenceengines and non-discrete inference engines; c beingthe evaluation time of an equalisation function, p theevaluation time of a normalisation function and l theaccess time to memory, the total amount of time T toobtain the output in a non-discrete engine is:

T = T1 + nc + np + 2nl (40)

In the case of discrete engines, the evaluation timeof the equalisation and normalisation functions is re-duced to a single memory access:

T = T1 + nl + 2nl (41)

However, as it will be described latter, if the ar-chitecture of the processor is suitably designed and acache memory is installed in order to keep the last ac-cessed values, the evaluation time of N(I) and CV(I)become negligible. In this case, the total amount oftime necessary to get the output T, is reduced to thelatency of the multi-linear interpolation:

T = T1 (42)

This reveals the crucial importance of the compu-tational efficiency of the approximation function FO.


Taking into account Eqs. (36) and (37), it is easyto see that the amount of memory grows expo-nentially with the number n of dimensions of thesystem, and also linearly with the number of ac-tivation points of each dimension. Similarly, theexecution time also grows exponentially with thenumber of dimensions. Thus, the number of dimen-sions presents a serious problem, since the modelcould become unfeasible. Of course, this is not anexclusive problem for the model M, but for all thefuzzy models, being especially critical for real-timeapplications.

However, in a practical system, that kind of prob-lems are usually avoided, since complex systemsare generally designed in a tree structure of smallersub-controllers. This makes the memory to grow lin-early instead of exponentially with the dimensionof the system. Concerning to execution time, thetree structure provides such a degree of parallelismthat increment is linear with the dimension of thesystem.

5. Architecture of the controller

The controller implemented is divided in four mod-ules: the sensor, which receives the input of the sys-tem and produces de equalisation and linearisation ofthat input, the interpolation cache, which contains alsothe model memory and the inference engine, whichfrom the interpolation values obtained by the inter-polation cache produces the output of the system.The architecture of these components is presented inFig. 4.

5.1. Sensor

The sensor implements the set of equalisation andnormalisation functions, Ek(Ik), Nk,p(Ik). From the in-put of the system I, the sensor obtains (a1, . . . , an) andN(I).

Fig. 4. Interconnection of the modules of the controller.

5.2. Memory model

The memory model stores the characteristic matrixof the system, CM.

5.3. Interpolation cache

The interpolation cache receives the equalisationand the normalisation of the input, (a1, . . . , an) andN(I). From the equalisation of the input, it obtains thecharacteristic vector CV accessing the memory model.

Due to the locality of the inputs, if an input is ina cell Pk, the most probable situation is that the nextinput of the system I will be in the same cell Pk. Thismeans that the inference engine will have to workwith the same characteristic vector as in the previousinference, so the interpolation cache does not need toaccess the memory model to obtain it. The output ofthe interpolation cache is the characteristic vector CVand the normalisation of the input.

5.4. Inference engine on Intel Pentium III

The inference engine receives the characteristic vec-tor CV and the normalisation of the input N(I) andobtains the output O applying multi-linear interpola-tion. Two different versions of the inference enginehave been implemented for Pentium III: the first oneis based on the compiler optimisation capabilities andthe second one uses the set of SSE instructions of theprocessor.

5.4.1. Inference engine using compiler optimisationsThis implementation directly codes the multi-lineal

interpolation and uses the optimisations of the C com-piler [15] to produce efficient code. In order to obtainoptimum performance, Vtune Performance Analyzer[16] has been used. Also, some rules have been con-sidered when developing the code:

• Unwind of the existing loops to increase the execu-tion speed.


Table 1Measurements of the model M in an Intel Pentium III 1 GHz

System Time (ns) MFRPS MFLIPS

2 inputs/1 output 8.67 5,390 1113 inputs/1 output 12.98 26,675 774 inputs/1 output 28.97 80,025 33

• To gather the calculations for a better use of thetemporal values [35]. Intel C compiler is capable tominimise the number of memory accesses, and opti-mise the use of temporal variables. Because of that,the calculation of the output of the system is carriedout in a single line of code. The compiler will gen-erate the optimal structure in assembler language.

Table 1 presents the values of the inference enginein a Intel Pentium III 1 GHz processor. The numberof mega fuzzy rules per second (MFRPS) has beenmade assuming seven linguistic labels per input and acomplete rule-base.

The maximum number of clock cycles per interpo-lation is three. This happens in the case of two inputssystems because the number of operations to be madeis not enough to exploit all the resources of the pro-cessor; n being the number of inputs, m the number ofoutputs, and f the clock rate, the upper bound of theexecution time T is:

T ≤ (3f−1(2n − 1))m (ns) (43)

5.4.2. Inference engine using the SSE instructionsThe SSE set of instructions [15] of an Intel Pen-

tium III is an excellent environment to implement amulti-linear interpolation. This can be done using theSIMD [26] architecture of the processor.

The Pentium III processor has a 128-bit registerwhich can be accessed as a mm128 data type. Thisdata type can be also viewed as four 32-bit registers.Using this structure, one operation can be done inparallel to four different data, as can be seen in Fig. 5.

To access and operate with m128 data Intel com-piler has a set of intrinsics [15]. An intrinsic is a Cfunction that directly translates into an assembler op-eration that cannot be obtained using standard C code.Table 2 has examples of some intrinsics.

The code presented in Fig. 6 implements four in-ference engines (one in each element of the SIMD

Fig. 5. SIMD data structure of a Pentium III.

Table 2Examples of intrinsics

Intrinsic Description

m128 mm mul ps( m128 x, m128 y)

Multiplies two m128 data

m128 mm add ps( m128 x, m128 y)

Adds two m128 data

vector) of two inputs and one output, using the set ofSSE instructions where OutputG is the output of thesystem, CV G is the characteristic vector of the input,and N G the normalisation vector.

The feedback of the controller will be given by thearchitecture of the implemented fuzzy system. Thismakes possible that the code given in Fig. 6 can imple-ment different architectures. Some examples can be:four controllers of 2 inputs/1 output, one controller of8 inputs/4 outputs, one controller of 5 inputs/1 output,etc.

Each inference engine implemented in Fig. 6 has aspeed of 55 MFLIPS for a 2 inputs/1 output system.This gives a total throughput of 220 MFLIPS in aPentium III 1 GHz. (55 MFLIPS each inference systemmultiplied by four inference engines implemented).Table 3 presents the speed obtained for four inferenceengines of 2 inputs/1 output and four inference enginesof 4 inputs/1 output in a Intel Pentium III 1 GHz.

Table 3Speed obtained with the proposed model

Inference engine Throughput

Four systems 2 I/1 O 55 MFLIPS 220 MFLIPSFour systems 4 I/1 O 7.04 MFLIPS 28.16 MFLIPS


Fig. 6. Code of the inference engines.

5.5. Inference engine on Texas Instruments TI DSPC6x family

As in the previous case, the inference engine re-ceives the characteristic vector CV and the normal-isation of the input N(I) and obtains the output Oapplying multi-linear interpolation. The inference en-gine has been implemented in a fixed-point DSP, theTMS320C6201 and in a floating-point DSP, the DSPTMS320C6701.

The code has been developed in C language forboth implementations, and has been optimised suc-cessfully using TI compiler options. Although the codehas been developed in a high level language the speedachieved is very satisfactory. This has an importantadvantage, the inference engine developed can be exe-cuted in any architecture with C support. The code hasalso been manually optimised using Code ComposerStudio.

Code Composer Studio [33] (CCS) is an inte-grated development environment (IDE) that providesa variety of tools to simplify the coding process andaccelerate development time. CCS includes a debug-ger, an editor, a profiler, and a project manager. Italso has a highly efficient optimising C compiler andan assembly optimiser, which allows to choose thelevel of performance required. The compiler enablesoptimising features such as instruction packing, con-ditional branching, intrinsics, in-line assembly andprogram-level optimisation.

The main optimisations introduced in the code are:

• Unwind of the loops [31,32] of the approximationfunction FO.

• Segmentation of the operations [5,31,32]. TI C com-piler works better when instructions are very parti-tioned, because it is able to optimise the multiplesunits to be handled, and to reach a higher degree ofparallelism.

5.5.1. Inference engine in a TMS320C6201The TI TMS320C6201 [30] is a fixed point DSP de-

veloped by Texas Instruments, and the first of the C62xgeneration. It is designed with Texas Instruments’ Ve-lociTI architecture, which is an advanced very longinstruction word (VLIW) architecture. The CPU coreof the DSP consist on two data paths with a generalpurpose register and four functional units each path.

The performance of the inference engine in this plat-form is given in Table 4.

The estimation of number of mega fuzzy rules persecond has been made assuming seven linguistic labelsper input and a complete rule-base. Table 4 revealsthat the maximum time to make an interpolation is42 ns; n being the number of inputs and m the numberof outputs, an upper bound of the execution time is:

T ≤ (42(2n − 1))m (ns) (44)

5.5.2. Inference engine in a TMS320C6701The TI TMS320C6701 [30] is a floating point DSP

developed by Texas Instruments, and basically has thesame architectural characteristics as the C62x. Theperformance of the inference engine in this implemen-tation is given in Table 5.

Table 5 reveals that the maximum time to make aninterpolation is 67 ns; n being the number of inputsand m the number of outputs, an upper bound of theexecution time is:

T ≤ (67(2n − 1))m (ns) (45)

Table 4Response time of the model M with a TI TMS320C6201

System Clockcycles

Time (ns) MFRPS MFLIPS

2 inputs/1 output 23 115 421.4 8.63 inputs/1 output 45 225 1509 4.44 inputs/1 output 126 630 3601 1.5


Table 5Response time of the model M with a TI TMS320C6701

System Clockcycles

Time (ns) MFRPS MFLIPS

2 inputs/1 output 30 201 245 53 inputs/1 output 43 288 1200 3.54 inputs/1 output 66 442.2 5282 2.2

6. Features of the high-speed full-programmablefuzzy model

Fig. 7 shows the original control surface of a 2input/1 output fuzzy system FS with Max–Min as theinference method and centre of gravity (COG) as thedefuzzyfication algorithm.

Both the inputs and the output have seven linguis-tic labels: negative big (nb), negative medium (nm),negative small (ns), zero (z), positive small (ps), pos-itive medium (pm) and positive big (pb). The shapesof the membership functions are trapezoidal or trian-gular. The rule-base contains 49 rules.

6.1. Output comparison of the model M

This section compares the output of the system FSof Fig. 7 with the output obtained with the approx-imated fuzzy system AFS obtained from the compi-lation of FS with the proposed model M, shown inFig. 8.

Fig. 7. Control surface of FS.

Fig. 8. Control surface produced by the compilation of FS withmodel M.

The values of the control surface of FS where ob-tained with the fuzzy integrated development environ-ment (IDE) FuzzyTech [14]. The output values of theapproximated fuzzy system AFS were obtained exe-cuting the inference engine on an Intel Pentium III1 GHz.

The comparison of the value of 48,682 points ofboth systems shows that: (1) 35% of the points of AFShave exactly the same value as in FS, (2) the deviationof the other 65% of the points is smaller than 0.5%with respect to the output range and (3) the averagedeviation of all the points is 0.57%.

All that, in spite of the fact that the original FSuses triangular membership functions to define someof the inputs. This is the worst case scenario, be-cause triangular membership functions only have onepoint of certainty, and the model M is designed tokeep the certainty of the designer. This means thatwhen the original systems FS uses trapezoidal la-bels, or any label whose kernel is wider than just onepoint, the approach produced by AFS would be evenbetter.

A useful advantage is derived from using multi-linearinterpolation as an approximation function of themodel M: the elimination of small variations in theoriginal control surface. As a matter of fact, triangularor trapezoidal labels usually are naive representationsof the real concept and they produce non desired vari-ations in the control surface whose filtering providesbenefices in control applications [9,10].


Table 6Processing time of some commercial fuzzy coprocessors

System WARP 2.0 (�s) SAE81C99 (�s)

3 I/1 O – 124 I/2 O 33.1 –

Model M is based on the fact that the design of FS istolerant by nature. The designer knows the output hewants, only in several points of reference and the restof the response of the system is deduced by means ofa combination of the output in those points. The out-put of the system in the reference points is expressedby rules, and the algorithm of combination is givenalong with the FS programming. Because model Mgenerates a control surface which preserves the posi-tions where the output is well known by the designer,and approaches the rest of the positions, the result isthat model M preserves the certainty of the designer.

6.2. Speed comparison of the model M

Table 6 gives the processing time of some com-mercial fuzzy coprocessors. WARP 2.0 [25] andSAE81C99 [6] are fuzzy commercial coprocessors ofST Microelectronics and Siemens, respectively.

Values of Table 6 show that model M implementedin DSPs reduces the execution time of a fuzzy system,in the worst case, by a factor of 15. Implementationon Intel Pentium III 1 GHz reduces the execution timeby a factor between 550 and 990, depending on thenumber of inputs and the considered coprocessor.

Usually high-speed fuzzy processing systems arebased on ASIC developments or on compilers thatadapt the original fuzzy system into some representa-tion suitable of being executed with high speed. In thissection we present some of the most relevant relatedwork in order to compare the speed achieved with themodel M.

Some of the most relevant solutions based on ASICsare:

• Project high energy physic experiments (HEPE)uses fuzzy logic to compute the trajectory of par-ticles in an accelerator. In [11] it is describedthe design and implementation of a fuzzy co-processor of four inputs and one output whichemploys PROD–SUM as t-norm and t-conorm.

This coprocessor obtains a processing speed of50 MFLIPs without defuzzyfication, but only 12MFLIPs including defuzzyfication.

• In [13] it is also presented a Min–Max coprocessor.It makes defuzzyfication by the method of cen-tre of gravity which is computationally heavy. Itsmaximum execution speed is of 6 MFLIPS.

• The processor shown in [24] reached 10 MFLIPSusing Lukasiewicz as t-norm, with a pipelinedarchitecture.

Some of the most relevant solutions based on com-pilation of the rule-base are:

• In [39] it is described a fuzzy coprocessor, FZP-0401A based on the compilation of the rule-base.The authors also prove that a Takagi–Sugeno FS isequivalent to an interpolator. This result is the baseof the coprocessor implemented on an ASIC. Thespeed obtained is 0.48 MFLIPS.

• Rovatti et al. [22] developed a model of inferencebased on simplicial interpolation, starting fromTakagi–Sugeno systems. It uses Min–Max as aninference method, triangular labels with degree ofoverlapping of two in the input, and singletons atthe consequent. This model was implemented onan ASIC and a speed of 2 MFLIPS was obtained.

As shown in Tables 1, 4 and 5, the speed obtainedby model M in a DSP is faster than the best part ofrelated work, and when comparing the Pentium III im-plementation, the speed is remarkably improved whencompared with related work. At the same time it isimportant to take into account that model M is fullyprogrammable. Therefore, it can be adapted to any in-ference procedure, shape of labels or defuzzyficationmethod, and do not present any restriction, being ableto implement any fuzzy solution.

7. Conclusions and future work

This paper has presented a compiler that allowsto execute real-time full-programmable fuzzy systemsusing standard SIMD architectures.

This compiler makes it possible to introducefuzzy logic techniques to applications that requirereal-time processing and more programmable capa-bilities. Using a standard architecture as a DSP or a


microprocessor also makes possible to develop thefinal solution very fast and at a low cost, comparedto specific fuzzy circuits. Also, a full-programmablemodel allows to use the same hardware and the samemodel for any kind of application.

The future work is divided in two main areas: (1)optimize the model M in order to reduce the numberof computations needed to obtain the output and (2)implement the model M in new platforms that willincrease the speed obtained.

Some improvements that can be made as part of thefirst area are the optimisation of the normalisation andequalisation functions, especially to avoid unneededoutput calculations.

The second area should explore the implementationof the model M in new platforms such as TMS320C64[27], TMS320C6211 [29] and TMS320C6711 [28]and Pentium IV. The design of an ASIC that imple-ments the execution phase of the model M could im-prove the performance of the system. An interestingintermediate solution between standard architecturesand ASICs are configurable instruction set processor(CISCP) architectures, such as [2]. In this case the de-sign of an interpolation instruction should improve theprocessing speed.

References

[1] C. Von Altrock, Fuzzy Logic and Neurofuzzy Applicationsin Business, Prentice-Hall, 1997.

[2] ARC Cores Ltd., CISP Spells the End of Off-the-PegProcessors in Electronic Express Europe, October 1999, p. 22.

[3] J.L. Castro, Fuzzy logic controllers are universal approxi-mators, IEEE Trans. Syst. Man Cybern. 25 (4) (1995) 629–635.

[4] A. Costa, A. de Gloria, Hardware solutions for fuzzy control,Proc. IEEE 83 (3) (1995) 422–434.

[5] J. Edwards, Optimizing Code for Advanced DSPs in Embed-ded Systems Programming Europe, March 1998, pp. 8–19.

[6] H. Eichfeld, A 12 b general-purpose fuzzy logic controllerchip, IEEE Trans. Fuzzy Syst. 4 (4) (1996) 460–475.

[7] D. Elting, M. Fennich, R. Kowalczyk, Fuzzy Anti-LockBrake System Solution in Intel C. Automot. Oper. Center,http://www.developer.intel.com/design/mcs96/designex/2351.htm, 1998.

[8] F. Fernández, J. Gutiérrez, The FRISC Model, Technicalreport, Dept. de Tecnologıa Fotónica, Universidad Politécnicade Madrid, November 1998.

[9] F. Fernández, J. Gutiérrez, J.C. Crespo, G. Triviño, A shape-preserving affine Takagi–Sugeno model based on a piecewiseconstant nonuniform fuzzyfication transform, in: Proceedingsof the International Conference WSEAS, 2002.

[10] F. Fernández, J. Gutiérrez, J.C. Crespo, G. Triviño,Construction of shape-preserving first-order Takagi–Sugenosystems via linear-rational spline fuzzy partitions, in:Proceedings of the 4th International Conference on RecentAdvances in Soft Computing, December 2002.

[11] A. Gabrielli, E. Gandolfi, A Fast Digital Fuzzy Processor,IEEE Micro, January–February 1999, pp. 68–79.

[12] J. Gebhardt, C. Von Altrock, Recent successful fuzzy logicapplications in industrial automation, in: Proceedings of the5th IEEE International Conference on Fuzzy Systems, FUZZ-IEEE 96, New Orleans, September 1998.

[13] S. Guo, L. Peters, A High-Speed, Reconfigurable Fuzzy LogicController, IEEE Micro, December 1995, pp. 1–11.

[14] FuzzyTECH 4.1, Reference Manual, Inform GmbH, 2000.[15] Intel C/C++ Compiler User’s Guide with Support for the

Streaming SIMD Extensions, Intel Corporation, 2000.[16] Vtune Performance Enhancement Environment, Intel Cor-

poration, 1999.[17] M. Jamshidi, A. Titli, L.A. Zadeh, S. Boverie, Applications of

Fuzzy Logic: Towards a High Machine Intelligence QuotientSystems, Prentice-Hall, 1997.

[18] A. Jiménez, F. Matıa, A fast piecewise-linear implementationof fuzzy controllers, in: Proceedings of the 1st InternationalJoint Conference of NAFIPS-IFIS-NASA, San Antonio, TX,December 1994, pp. 27–31.

[19] J. Klir, B. Yuan, Fuzzy Sets and Fuzzy Logic: Theory andApplications, Prentice-Hall, PTR, 1995.

[20] V. Kreinovich, H.T. Nguyen, Y. Yam, Fuzzy systemsare universal approximators for a smooth function andits derivatives, en technical report CUHK-MAE-99002,The Chinese University of Hong Kong, Ene., 1999,http://www.citeseer.nj.nec.com/kreinovich99fuzzy.html.

[21] J.M. Mendel, Uncertainty, fuzzy logic, and signal processing,Signal Process. 80 (6) (2000) 913–933.

[22] R. Rovatti, C. Fantuzzi, S. Simani, High speed DSP-basedimplementation of picewiseaffine and picewise-quadraticfuzzy systems, Signal Process. 80 (6) (2000) 951–963.

[23] R. Rovatti, Linear and fuzzy piecewise-linear signalprocessing with an extended DSP architecture, in: Proceedingsof the 7th IEEE International Conference on Fuzzy Systems,FUZZ-IEEE 98, pp. 1082–1087.

[24] L. de Salvador, Modelo de un Microprocesador Borroso deAltas Prestaciones y su Diseño, Ph.D. thesis, UniversidadPolitécnica de Madrid, 1996.

[25] Weight Associative Rule Processor WARP 2.0, SGS-ThomsonMicroelectronics, 1994.

[26] S. Thakkar, Streaming SIMD Extension, IEEE Computer,December 1999, pp. 26–34.

[27] Texas Instruments, TMS320C64x Technical Overview,February 2000.

[28] Texas Instruments, TMS320C6711, Floating-point digitalsignal processor, September 2000, http://www.ti.com/sc/docs/products/dsp/tms320c6711.html.

[29] Texas Instruments, TMS320C6211, Fixed-point digitalsignal processor, September 2000, http://www.ti.com/sc/docs/products/dsp/tms320c6211.html.

[30] TMS320C62x/C67x CPU, Reference Guide, Texas Instru-ments, March 1998.

http://www.developer.intel.com/design/mcs96/designex/2351.htm

http://www.developer.intel.com/design/mcs96/designex/2351.htm

http://www.citeseer.nj.nec.com/kreinovich99fuzzy.html

http://www.ti.com/sc/docs/products/dsp/tms320c6711.html





[31] TMS320C6x Optimizing C Compiler, User’s Guide, TexasInstruments, February 1998.

[32] TMS320C62x/C67x, Programmer’s Guide, Texas Instru-ments, February 1998.

[33] TMX320C6000, Code Composer Studio, Tutorial, TexasInstruments, 1999.

[34] M. Togai, Expert system on a chip: an engine for approximatereasoning, IEEE Expert 1 (3) (1986) 55–62.

[35] J.H. Wolf, Programming methods for the Pentium III proce-ssor’s streaming SIMD extensions using the Vtuneperformance enhancement environment, Intel Technol. J. Q2(1999), http://www.developer.intel.com/technology/itj/index.htm.

[36] T. Yamakawa, A simple fuzzy computer hardware systememploying min and max operations, in: Proceedings of the2nd IFSA Congress, 1987, pp. 122–130.

[37] H. Ying, General Takagi–Sugeno fuzzy systems are universalapproximators, in: Proceedings of the 7th IEEE InternationalConference on Fuzzy Systems, Anchorage, Alaska, May1998, pp. 819–823.

[38] H. Ying, Y. Ding,. S. Li, Typical Takagi–Sugeno andMamdami fuzzy systems as universal approximators:necessary conditions and comparison, in: Proceedings ofthe 7th IEEE International Conference on Fuzzy Systems,Anchorage, Alaska, May 1998, pp. 824–828.

[39] N. Yubazaki, M. Otani, A. Mutuo, T. Ashida, Fuzzy inferencechip FZP-0401A based on interpolation algorithm, Fuzzy SetsSyst. 98 (3) 299–310.

[40] G. Zahlman, M. Scherf, Monitoring glaucoma by means of aneurofuzzy classifier, Fuzzy Logic Application Note Series,Inform Corp., 1998, http://www.fuzzytech.com.

http://www.developer.intel.com/technology/itj/index.htm

http://www.developer.intel.com/technology/itj/index.htm

http://www.fuzzytech.com

Documents

Efficient fuzzy compiler for SIMD architectures