Sets, fuzzy sets and rough sets - York University...Lotfi Zadeh proposed completely new, elegant approach to vagueness called fuzzy set theory oach an element can belong to a set to

Rough Sets

Zdzisław Pawlak

Institute of Theoretical and Applied Informatics, Polish Academy of Sciences,

ul. Bałtycka 5, 44 100 Gliwice, Poland

University of Information Technology and Management ul. Newelska 6, 01-447 Warsaw, Poland

[email protected]

1

mailto:[email protected]

Contents

Introduction ......................................................................................................................... 3

CHAPTER 1 Rough Sets – Basic Concepts........................................................................ 5

CHAPTER 2 Rough Sets and Reasoning from Data ........................................................ 14

CHAPTER 3 Rough Sets and Bayes’ Theorem................................................................ 29

CHAPTER 4 Data Analysis and Flow Graphs.................................................................. 37

CHAPTER 5 Rough Sets and Conflict Analysis .............................................................. 45

2

Introduction

Rough set theory is a new mathematical approach to imperfect knowledge. The problem of imperfect knowledge has been tackled for a long time by philosophers,

logicians and mathematicians. Recently it became also a crucial issue for computer scientists, particularly in the area of artificial intelligence. There are many approaches to the problem of how to understand and manipulate imperfect knowledge. The most successful one is, no doubt, the fuzzy set theory proposed by Zadeh [2].

Rough set theory proposed by the author in [1] presents still another attempt to this problem. The theory has attracted attention of many researchers and practitioners all over the world, who contributed essentially to its development and applications.

Rough set theory has an overlap with many other theories. However we will refrain to discuss these connections here. Despite of the above mentioned connections rough set theory may be considered as the independent discipline in its own rights.

Rough set theory has found many interesting applications. The rough set approach seems to be of fundamental importance to AI and cognitive sciences, especially in the areas of machine learning, knowledge acquisition, decision analysis, knowledge discovery from databases, expert systems, inductive reasoning and pattern recognition.

The main advantage of rough set theory in data analysis is that it does not need any preliminary or additional information about data − like probability in statistics, or basic probability assignment in Dempster-Shafer theory, grade of membership or the value of possibility in fuzzy set theory.

The proposed approach

• provides efficient algorithms for finding hidden patterns in data, • finds minimal sets of data (data reduction), • evaluates significance of data, • generates sets of decision rules from data, • it is easy to understand, • offers straightforward interpretation of obtained results, • most algorithms based on the rough set theory are particularly suited for parallel

processing.

Basic ideas of rough set theory and its extensions, as well as many interesting applications can be found on the internet, e.g., http://www.roughsets.org

The booklet is organized as follows: Chapter 1 (Basic Concepts) contains general formulation of basic ideas of rough set theory together with brief discussion of its place in classical set theory. Chapter 2 (Rough Sets and Reasoning from Data) presents the application of rough set concept to reason from data (data mining). Chapter 3 (Rough Sets and Bayes’ Theorem) gives a new look on Bayes’ theorem and shows that Bayes’ rule can be used differently to that offered by classical Bayesian reasoning

em thodology. In Chapter 4 (Data Analysis and Flow Graphs) we show that many problems in data analysis can be boiled down to flow analysis in a flow network.

3

Chapter 5 (Rough Sets and Conflict Analysis) discuses the application of rough set concept to study conflict.

This booklet is a modified version of lectures delivered at the Tarragona University seminar on Formal Languages and Rough Sets in August 2003.

References

[1] Z. Pawlak: Rough sets, International Journal of Computer and Information Sciences, 11, 341-356, 1982

[2] L. Zadeh: Fuzzy sets, Information and Control, 8, 338-353, 1965

4

CHAPTER 1

Rough Sets – Basic Concepts

1. Introduction

In this chapter we give some general remarks on a concept of a set and the place of rough sets in set theory. The concept of a set is fundamental for the whole mathematics. Modern set theory was formulated by George Cantor [1].

Bertrand Russell has discovered that the intuitive notion of a set proposed by Cantor leads to antinomies [8]. Two kinds of remedy for this discontent have been proposed: axiomatization of Cantorian set theory and alternative set theories.

Another issue discussed in connection with the notion of a set is vagueness. Mathematics requires that all mathematical notions (including set) must be exact (Gottlob Frege[2]). However philosophers and recently computer scientists got interested in vague concepts.

The notion of a fuzzy set proposed by Lotfi Zadeh [10] is the first very successful approach to vagueness. In this approach sets are defined by partial membership, in contrast to crisp membership used in classical definition of a set.

Rough set theory, introduced by the author, [4] expresses vagueness, not by means of membership, but employing a boundary region of a set. If the boundary region of a set is empty it means that the set is crisp, otherwise the set is rough (inexact). Nonempty boundary region of a set means that our knowledge about the set is not sufficient to define the set precisely.

In this paper the relationship between sets, fuzzy sets and rough sets will be outlined and briefly discussed.

2. Sets

The notion of a set is not only basic for the whole mathematics but it also plays an important role in natural language. We often speak about sets (collections) of various objects of interest, e.g., collection of books, paintings, people etc.

Intuitive meaning of a set according to some dictionaries is the following:

“A number of things of the same kind that belong or are used together.” Webster’s Dictionary

“Number of things of the same kind, that belong together because they are similar or complementary to each other.”

The Oxford English Dictionary

Thus a set is a collection of things which are somehow related to each other but the nature of this relationship is not specified in these definitions.

5

In fact these definitions are due to the original definition given by the creator of set theory, George Cantor [1], which reads as follows:

“Unter einer Mannigfaltigkeit oder Menge verstehe ich nämlich allgenein jedes Viele, welches sich als Eines denken lässt, d.h. jeden Inbegriff bestimmter Elemente, welcher durch ein Gesetz zu einem Ganzen verbunden werden kann.”

Thus according to Cantor a set is a collection of any objects, which according to some law can be considered as a whole.

All mathematical objects, e.g., relations, functions, numbers, etc., are some kind of sets. In fact set theory is needed in mathematics to provide rigor.

Bertrand Russell discovered that the intuitive notion of a set given by Cantor leads to antinomies (contradictions) [8]. One of the best known antinomies called the powerset antinomy goes as follows: consider (infinite) set X of all sets. Thus X is the greatest set. Let Y denote the set of all subsets of X. Obviously Y is greater then X, because the number of subsets of a set is always greater the number of its elements. For example, if X = {1, 2, 3} then Y = {∅,{1},{2},{3},{1, 2}, {1, 3}, {2, 3}, {1, 2, 3}}, where ∅ denotes the empty set. Hence X is not the greatest set as assumed and we arrived at contradiction. Thus the basic concept of mathematics, the concept of a set, is contradictory. That means tha

posed. For ex

• Axiomatic set theory (Zermello and Fraenkel, 1904) • • Theory of classes (v. Neumann, 1920)

All these improvements consist on restrictions, put on objects which can form a set. The res

on, some ma

• Mereology (Leśniewski, 1915) • Alternative set theory (Vopenka, 1970) • “Penumbral” set theory (Apostoli and Kanada, 1999)

No doubt the most interesting proposal was given by Stanisław Leśniewski [3], who pro

e mentioned above “new” set theories were accepted by mathematicians, ho

other words, it means tha

also in vague (imprecise) notions.

t a set cannot be a collection of arbitrary elements as was stipulated by Cantor. As a remedy for this defect several improvements of set theory have been proample,

Theory of types (Whitehead and Russell, 1910)

trictions are expressed by properly chosen axioms, which say how the set can be build. They are called, in contrast to Cantors’ intuitive set theory, axiomatic set theories.

Instead of improvements of Cantors’ set theory by its axiomatizatithematicians proposed escape from classical set theory by creating completely new idea of

a set, which would free the theory from antinomies. Some of them are listed below.

posed instead of membership relation between elements and sets, employed in classical set theory, the relation of “being a part”. In his set theory, called mereology, this relation is a fundamental one.

None of the threwever Leśniewski’s mereology attracted some attention of philosophers and recently also

computer scientists, (e.g., Lech Polkowski and Andrzej Skowron [7]). In classical set theory a set is uniquely determined by its elements. Int every element must be uniquely classified as belonging to the set or not. That is to say the

notion of a set is a crisp (precise) one. For example, the set of odd numbers is crisp because every number is either odd or even. In mathematics we have to use crisp notions, otherwise precise reasoning would be impossible. However philosophers for many years were interested

6

For example, in contrast to odd numbers, the notion of a beautiful painting is vague, because we are unable to classify uniquely all paintings into two classes: beautiful and not beautiful. Some paintings cannot be decided whether they are beautiful or not and thus they rem

o for computer sci

in 1893 by the father of modern logic Gottlob Frege [2]. He wrote:

r kein Bezirk;

ave a sharp boundary. To the concept without a sharp boundary there would correspond an area that had not a sharp boundary-line all around.”

I.e., mathematics must use crisp, not vague concepts, otherwise it would be impossible to rea

y • e

3. Fuz

Lotfi Zadeh proposed completely new, elegant approach to vagueness called fuzzy set theory oach an element can belong to a set to a degree k (0 ≤ k ≤ 1), in contrast to

classical set theory where an element must definitely belong or not to a set. E.g., in classical

whLet us observe that the definition nvolves more advanced mathematical

concepts, real numbers and functions, ssical set theory the notion of a set is

ain in the doubtful area. Thus beauty is not a precise but a vague concept. Almost all concepts we are using in natural language are vague. Therefore common sense

reasoning based on natural language must be based on vague concepts and not on classical logic. This is why vagueness is important for philosophers and recently als

entists. Vagueness is usually associated with the boundary region approach (i.e., existing of

objects which cannot be uniquely classified to the set or its complement) which was first formulated

“Der Begrieff muss scharf begrenzt sein. Einem unscharf begrenzten Begriff würde ein Bezirk ensprechen, der nicht überall ein scharfe Grentzlinie hätte, sondern stellenweise gantz verschwimmend in die Umgebung übergine. Das wäre eigentlich gaund so wird ein unscharf definirter Begrieff mit Unrecht Begrieff gennant. Solche begriffsartige Bildungen kann die Logik nicht als Begriffe anerkennen; es is unmäglich, von ihnen genaue Gesetze auszustellen. Das Gesetz des ausgeschlossenen Drititten ist ja eigentlich nur in anderer From die Forderung, dass der Begriff scharf begrentz sei. Ein beliebiger Gegenstand x fält entwerder unter der Begriff y, oder er fällt nich unter ihn: tertium non datur.”

Thus according to Frege

“The concept must h

son precisely. Summing up, vagueness is

• Not allowed in mathematics • Interesting for philosoph

Necessary for computer scienc

zy Sets

[10]. In his appr

set theory one can be definitely ill or healthy, whereas in fuzzy set theory we can say that someone is ill (or healthy) in 60 percent (i.e. in the degree 0.6). Of course, at once the question arises where we get the value of degree from. This issue raised a lot of discussion, but we will refrain from considering this problem here.

Thus fuzzy membership function can be presented as

µX(x)∈<0,1>

ere, X is a set and x is an element. of fuzzy set iwhereas in cla

7

used as a fundamental notion of whole mathematics and is used to derive any other mathematical concepts, e.g., numbers and functions. Consequently fuzzy set theory cannot replace classical set theory, because, in fact, the theory is needed to define fuzzy sets.

Fuzzy membership function has the following properties.

a) )(1)( xx XXU µµ −=− for any Ux∈

b) ))(),(()( xxmaxx YXYX µµµ =∪ for any Ux∈

c) )(), x Yµ ) for any (()( xminx XYX µµ =∩ Ux∈

ership of an element to the union and intersection of sets is ership to constituent sets. This is a very nice property and

, which is a very important feature both the

Rough set theory [4] is still another approach to vagueness. Similarly to fuzzy set theory it is to classical set theory but it is embedded in it. Rough set theory can be

viewed as a specific implementation of Frege’s idea of vagueness, i.e., imprecision in this

this problem more precisely. Suppose we are given a set of objects U cal

sake of simplicity we assume that R is an eq

spect to R). •

•

Now

• region of X is empty. undary region of X is nonempty.

Thu the set is crisp (p

The approximations and the boundary region can be defined more precisely. To this end we

oted by R(x). The indiscernibility relation in certain sense describes our lack of knowledge about the universe.

That means that the membuniquely determined by its memballows very simple operations on fuzzy sets

oretically and practically. Fuzzy set theory and its applications developed very extensively over last years and

attracted attention of practitioners, logicians and philosophers worldwide.

4. Rough Sets

not an alternative

approach is expressed by a boundary region of a set, and not by a partial membership, like in fuzzy set theory.

Rough set concept can be defined quite generally by means of topological operations, interior and closure, called approximations.

Let us describeled the universe and an indiscernibility relation R ⊆ U × U, representing our lack of

knowledge about elements of U. For theuivalence relation. Let X be a subset of U. We want to characterize the set X with respect to

R. To this end we will need the basic concepts of rough set theory given below.

• The lower approximation of a set X with respect to R is the set of all objects, which can be for certain classified as X with respect to R (are certainly X with reThe upper approximation of a set X with respect to R is the set of all objects which can be possibly classified as X with respect to R (are possibly X in view of R). The boundary region of a set X with respect to R is the set of all objects, which can be classified neither as X nor as not-X with respect to R.

we are ready to give the definition of rough sets.

Set X is crisp (exact with respect to R), if the boundary• Set X is rough (inexact with respect to R), if the bo

s a set is rough (imprecise) if it has nonempty boundary region; otherwise recise). This is exactly the idea of vagueness proposed by Frege.

need some additional notation. The equivalence class of R determined by element x will be den

8

Eqwe are able to perceive due to R. Thus in view of the

ind

uivalence classes of the indiscernibility relation, called granules generated by R, represent elementary portion of knowledge

iscernibility relation, in general, we are an able to observe individual objects but we are forced to reason only about the accessible granules of knowledge.

Formal definitions of approximations and the boundary region are as follows:

• R-lower approximation of X

( ) ( ) ( ){ }∪ XxRxRxR ⊆= :* Ux∈

• R-upper approximation of X

( )* ( ) ( ){ }R

• R-boundary region of X

∪Ux

XxRxRx∈

∅≠∩= :

( ) ( ) ( )XRXRX * −=

As we can see from ons are expressed in terms of granules of knowledge. The lower approximation of a set is union of all granules which are entirely included in the set; the upper approxim is union of all granules which have non-empty intersection with the set; the boundary region of set is the difference between the upper and the

RN R *

the definition approximati

ation −

lower approximation. This definition is clearly depicted in Figure 1.

Granules of knowledge The set of objects

Fig. 1

to tions of classical sets, fuzzy sets and rough sets. Classical itive notion and is defined intuitively or axiomatically. Fuzzy sets are def

The upper approximation

The set The lower approximation

It is interestinget is a prim ined by

employing the fuzzy membership function, athematical structures,

compare defini

swhich involves advanced m

9

nu

)(Y∗

)() YR∗⊆

)(XR∗=

)() XR∗=

fact interior and closure operations in a ry and rough set theory require completely

diffe

>→< 1,0:URµ

( )

mbers and functions. Rough sets are defined by approximations. Thus this definition also requires advanced mathematical concepts.

Approximations have the following properties:

1) )()( XRXXR ∗∗ ⊆⊆

2) UURUR RR ==∅=∅=∅ ∗∗

∗∗ )()(;)()(

3) R)(( XRYXR ∗∗ ∪=∪

4) )()()( YRXRYXR ∗∗∗ ∩=∩

5) )()()( YRXRYXR ∗∗∗ ∪⊇∪6) )()()( YRXRYXR ∗∗∗ ∩⊆∩

7) (&)()( XR YRXRYX ∗∗∗ ⊆→⊆

8) )()( XRXR ∗∗ −=−

9) )()( XRXR ∗∗ −=−

10) )()( XRRXRR ∗∗

∗∗ =

11) ()( XRRXRR ∗∗

∗∗ =

It is tions are in easily seen that approximatopology generated by data. Thus fuzzy set theo

rent mathematical setting. Rough sets can be also defined employing, instead of approximation, rough membership

function [5] X

where

( )| xRX ∩( ) ||

|xR

xRX =µ

and |X| denotes the cardinality of

The rough membership function expr x belongs to X given that x belongs to X in view of information about x

X.

esses conditional probability that R and can be interpreted as a degree expressed by R.

The meaning of rough membership function can be depicted as shown in Fig.2.

10

0 < RXµ

R(x)

0)( =xRXµ

x

R(x) X

1)( <x

x

X R(x)

1)( =xRXµ

x

X

Fig. 2

The rough membership function can be used to define approximations and the boundary region of a set, as shown below:

( ) ( ){ }1: =∈=∗ xUxXR RXµ ,

( ) ( ){ }0: >∈=∗ xUxXR RXµ ,

( ) ( ){ }10: <<∈= xUxXRN RXR µ .

It can be shown that the membership function has the following properties [5]:

1) iff 1)( =xRXµ )(* XRx∈

2) iff 0)( =xRXµ )(* XRUx −∈

3) iff 1)(0 << xRXµ )(XRNx R∈

4) for any x∈U )(1)( xx RX

RXU µµ −=−

5) max ( for any x∈U ≥∪ )(xRYXµ ))(),( xx R

YRX µµ

6) min ( for any x∈U ≤∩ )(xRYXµ ))(),( xx R

YRX µµ

From the properties it follows that the rough membership differs essentially from the fuzzy membership, for properties 5) and 6) show that the membership for union and intersection of sets, in general, cannot be computed – as in the case of fuzzy sets – from their constituents membership. Thus formally the rough membership is a generalization of fuzzy membership. Besides, the rough membership function, in contrast to fuzzy membership function, has a probabilistic flavour.

11

Now we can give two definitions of rough sets.

Definition 1: Set X is rough with respect to R if . )()( XRXR ∗∗ ≠

Definition 2: Set X rough with respect to R if for some x, . 1)(0 << xRXµ

It is interesting to observe that definition 1 and definition 2 are not equivalent [5], but we will not discuss this issue here.

Let us also mention that rough set theory clearly distinguishes two very important concepts, vagueness and uncertainty, very often confused in the AI literature. Vagueness is the property of sets and can be described by approximations, whereas uncertainty is the property of elements of a set and can expressed by the rough membership function.

5. Summary

Basic concept of mathematics, the set, leads to antinomies, i.e., it is contradictory. This deficiency of sets, has rather philosophical than practical meaning, for sets used in

mathematics are free from the above discussed faults. Antinomies are associated with very “artificial” sets constructed in logic but not found in sets used in mathematics. That is why we can use mathematics safely.

Fuzzy set and rough set theory are two different approaches to vagueness and are not remedy for classical set theory difficulties.

Both theories represent two different approaches to vagueness. Fuzzy set theory addresses gradualness of knowledge, expressed by the fuzzy membership – whereas rough set theory addresses granularity of knowledge, expressed by the indiscernibility relation.

References

[1] G. Cantor: Grundlagen einer allgemeinen Mannigfaltigkeitslehre, Leipzig, 1883 [2] G. Frege, Grundlagen der Arithmetik, 2, Verlag von Herman Pohle, Jena, 1893 [3] St. Leśniewski: Grungzüge eines neuen Systems der Grundlagen der Mathematik,

Fundamenta Matemaicae, XIV, 1929, 1- 81 [4] Z. Pawlak: Rough sets, Int. J. of Information and Computer Sciences, 11, 5, 341-356,

1982 [5] Z. Pawlak, A. Skowron: Rough membership function, in: R. E Yeager, M. Fedrizzi and

J. Kacprzyk (eds.), Advaces in the Dempster-Schafer of Evidence, Wiley, New York, 1994, 251-271

[6] L. Polkowski: Rough Sets, Mathematical Foundations, Advances in Soft Computing, Physica – Verlag, A Springer-Verlag Company, 2002

[7] L. Polkowski, A. Skowron: Rough mereological calculi granules: a rough set approach to computation, computational intelligence: An International Journal 17, 2001, 472-479

[8] B. Russell: The Principles of Mathematics, London, George Allen & Unwin Ltd., 1st Ed. 1903 (2nd Edition in 1937)

[9] A. Skowron, J. Komorowski, Z. Pawlak, L. Polkowski: A rough set perspective on data and knowledge, in: W. Kloesgen, J. Zytkow (eds.), Handbook of KDD, Oxford University Press, 2002, 134-149

12

[10] L. Zadeh: Fuzzy sets, Information and Control, 8, 338-353, 1965

13

CHAPTER 2

Rough Sets and Reasoning from Data

1. Introduction

In this chapter we define basic concepts of rough set theory in terms of data, in contrast to general formulation presented in Chapter 1. This is necessary if we want to apply rough sets to reason from data.

As mentioned in the previous chapter rough set philosophy is based on the assumption that, in contrast to classical set theory, we have some additional information (knowledge, data) about elements of a universe of discourse. Elements that exhibit the same information are indiscernible (similar) and form blocks that can be understood as elementary granules of knowledge about the universe. For example, patients suffering from a certain disease, displaying the same symptoms are indiscernible and may be thought of as representing a granule (disease unit) of medical knowledge. These granules are called elementary sets (concepts), and can be considered as elementary building blocks of knowledge. Elementary concepts can be combined into compound concepts, i.e., concepts that are uniquely determined in terms of elementary concepts. Any union of elementary sets is called a crisp set, and any other sets are referred to as rough (vague, imprecise).

Due to the granularity of knowledge, rough sets cannot be characterized by using available knowledge. Therefore with every rough set we associate two crisp sets, called its lower and upper approximation. Intuitively, the lower approximation of a set consists of all elements that surely belong to the set, whereas the upper approximation of the set constitutes of all elements that possibly belong to the set. The difference of the upper and the lower approximation is a boundary region. It consists of all elements that cannot be classified uniquely to the set or its complement, by employing available knowledge. Thus any rough set, in contrast to a crisp set, has a non-empty boundary region.

In rough set theory sets are defined by approximations. Notice, that sets are usually defined by the membership function. Rough sets can be also defined using, instead of approximations, membership function, however the membership function is not a primitive concept in this approach, and both definitions are not equivalent.

2. An Example

For the sake of simplicity we first explain the proposed approach intuitively, by means of a simple tutorial example.

Data are often presented as a table, columns of which are labeled by attributes, rows by objects of interest and entries of the table are attribute values. For example, in a table containing information about patients suffering from a certain disease objects are patients (strictly speaking their ID's), attributes can be, for example, blood pressure, body temperature etc., whereas the entry corresponding to object Smith and the attribute blood preasure can be

14

normal. Such tables are known as information systems, attribute-value tables or information tables. We will use here the term information table.

Below an example of information table is presented. Suppose we are given data about 6 patients, as shown in Table 1.

Patient Headache Muscle-pain

Temperature

Flu

p1 no yes high yes p2 yes no high yes p3 yes yes very high yes p4 no yes normal no p5 yes no high no p6 no yes very high yes

Table 1 Columns of the table are labeled by attributes (symptoms) and rows – by objects (patients),

whereas entries of the table are attribute values. Thus each row of the table can be seen as information about specific patient. For example, patient p2 is characterized in the table by the following attribute-value set

(Headache, yes), (Muscle-pain, no), (Temperature, high), (Flu, yes),

which form the information about the patient. In the table patients p2, p3 and p5 are indiscernible with respect to the attribute Headache,

patients p3 and p6 are indiscernible with respect to attributes Muscle-pain and Flu, and patients p2 and p5 are indiscernible with respect to attributes Headache, Muscle-pain and Temperature. Hence, for example, the attribute Headache generates two elementary sets {p2, p3, p5} and {p1, p4, p6}, whereas the attributes Headache and Muscle-pain form the following elementary sets: {p1, p4, p6}, {p2, p5} and {p3}. Similarly one can define elementary sets generated by any subset of attributes.

Patient p2 has flu, whereas patient p5 does not, and they are indiscernible with respect to the attributes Headache, Muscle-pain and Temperature, hence flu cannot be characterized in terms of attributes Headache, Muscle-pain and Temperature. Hence p2 and p5 are the boundary-line cases, which cannot be properly classified in view of the available knowledge. The remaining patients p1, p3 and p6 display symptoms which enable us to classify them with certainty as having flu, patients p2 and p5 cannot be excluded as having flu and patient p4 for sure does not have flu, in view of the displayed symptoms. Thus the lower approximation of the set of patients having flu is the set {p1, p3, p6} and the upper approximation of this set is the set {p1, p2, p3, p5, p6}, whereas the boundary-line cases are patients p2 and p5. Similarly p4 does not have flu and p2, p5 cannot be excluded as having flu, thus the lower approximation of this concept is the set {p4} whereas - the upper approximation – is the set {p2, p4, p5} and the boundary region of the concept “not flu” is the set {p2, p5}, the same as in the previous case.

3. Rough Sets and Approximations

As mentioned in the introduction, the starting point of rough set theory is the indiscernibility relation, generated by information about objects of interest. The indiscernibility relation is intended to express the fact that due to the lack of knowledge we are unable to discern some objects employing the available information. That means that, in general, we are unable to

15

deal with single objects but we have to consider clusters of indiscernible objects, as fundamental concepts of our theory.

Now we present above considerations more precisely. Suppose we are given two finite, non-empty sets U and A, where U is the universe, and A –

a set of attributes. With every attribute a∈A we associate a set Va, of its values, called the domain of a. Any subset B of A determines a binary relation I(B) on U, which will be called an indiscernibility relation, and is defined as follows:

xI(B)y if and only if a(x) = a(y) for every a∈A, ement x.

Obviously I(B) is an equivalence relation. The family of all equivalence classes of I(B), i.e

lence classes of the

ation will be used next to define approximations, basic concepts of rou

tions can be defined as follows:

where a(x) denotes the value of attribute a for el

., partition determined by B, will be denoted by U/I(B), or simple U/B; an equivalence class of I(B), i.e., block of the partition U/B, containing x will be denoted by B(x).

If (x, y) belongs to I(B) we will say that x and y are B-indiscernible. Equiva relation I(B) (or blocks of the partition U/B) are referred to as B-elementary sets. In the

rough set approach the elementary sets are the basic building blocks (concepts) of our knowledge about reality.

The indiscernibility relgh set theory. Now approxima

( ) { ( ) }XxBUxXB ⊆∈=∗ :

( ),

( ){ }∅≠∩∈= XxBUxX∗

assigning to every subset X B*(X) called the B-lower and

will be referred to as the i.e., BN (X) = ∅, then the set X is crisp

(ex

ations can be presented now as:

1) ,

2) ∗

3)

4) ,

5) and ∗ ,

7) ,

8)

9) ,

10) ,

B : ,

of the universe U two sets B (X) and *the B-upper approximation of X, respectively. The set

∗ )()()( XBXBXBNB ∗−=

B-boundary region of X. If the boundary region of X is the empty set, B

act) with respect to B; in the opposite case, i.e., if BNB(X) ≠∅, the set X is to as rough (inexact) with respect to B.

The properties of approxim

)()( XBXXB∗ ⊆⊆

B BB ∅=∅=∅ ∗ ;)()

∗

UUBU ==∗∗ )()(( ,

)()(( YBXBYXB ∗∗∗ ∪=∪ ,

)()()( YBXBYXB ∩=∩ ∗∗∗

YX ⊆ implies ()( BXB ⊆ )Y∗∗

)() YBX ∗∗ ∪ , )()( YBXB∗ ⊆

6) () BY ⊇(XB∗ ∪

)()()( YBXBYXB ∗∗∗ ∩⊆∪

)()( XBXB ∗−=− , ∗

)()( XBXB∗ −=− ∗

())(( BBXBB ∗= )())( XBX ∗∗∗∗ =

16

11) , )())(())(( XBXBBXBB ∗∗∗

∗∗ ==

where –X denotes U – X. It is easily seen that the lower and the upper approximations of a set are interior and

closure operations in a topology generated by the indiscernibility relation. One can define the following four basic classes of rough sets, i.e., four categories of

vagueness:

a) and, iff X is roughly B-definable, ∅≠∗ )(XB ≠∗ )(XBb) and , iff X is internally B-indefinable, ∅=∗ )(XB UXB ≠∗ )(c) and , iff X is externally B-definable, ∅≠∗ )(XB UXB =∗ )(d) and , iff X is totally B-indefinable. ∅=∗ )(XB UXB =∗ )(

The intuitive meaning of this classification is the following. If X is roughly B-definable, this means that we are able to decide for some elements of U

whether they belong to X or −X, using B. If X is internally B-indefinable, this means that we are able to decide whether some

elements of U belong to −X, but we are unable to decide for any element of U, whether it belongs to X or not, using B.

If X is externally B-indefinable, this means that we are able to decide for some elements of U whether they belong to X, but we are unable to decide, for any element of U whether it belongs to −X or not, using B.

If X is totally B-indefinable, we are unable to decide for any element of U whether it belongs to X or −X, using B.

Rough set can be also characterized numerically by the following coefficient

|)(||)(|)(

XBXBXB ∗

∗=α

called accuracy of approximation, where |X| denotes the cardinality of X. Obviously 1)(0 ≤≤ XBα . If 1)( =XBα , X is crisp with respect to B (X is precise with respect to B), and

otherwise, if 1) <X(α , X is rough with respect to B (X is vague} with respect to B). B

Let us depict above definitions by examples referring to Table 1. Consider the concept “flu”, i.e., the set X = { p1, p2, p3, p6} and the set of attributes B = {Headache, Muscle-pain, Temperature}. Concept “flu” is roughly B-definable, because ∅≠=∗ }6,3,1{)( pppXB and

. For this case we get αB(“flu”) = 3/5. It means that the concept “flu” can be characterized partially employing symptoms Headache, Muscle-pain and

UpppppXB ≠=∗ }6,5,3,2,1{)(

Temperature. Taking only one symptom B = {Headache} we get and , which means that the concept “flu” is totally indefinable in terms of attribute

Headache, i.e., this attribute is not characteristic for flu whatsoever. However, taking single attribute B = {Temperature} we get

∅=∗ )(XBUXB =∗ )(

}6,3{)( ppXB =∗ and , thus the concept “flu” is again roughly definable, but in this case we obtain αB(X)= 2/5, which means that the single symptom Temperature is less characteristic for flu, than the whole set of symptoms, and patient p1 cannot be now classified as having flu in this case.

}6,5 p,3,2,1{)( ppppXB =∗

17

4. Rough Sets and Membership Function

As shown Chapter 1 rough sets can be also defined using a rough membership function [3], defined as

|)(||)(|)(

xBxBXxB

X∩

=µ .

Obviously

]1,0[)( ∈xBXµ .

Value of the membership function µX(x) is a kind of conditional probability, and can be interpreted as a degree of certainty to which x belongs to X (or 1 − µX(x), as a degree of uncertainty).

The rough membership function can be used to define approximations and the boundary region of a set, as shown below:

}1)(:{)( =∈=∗ xUxXB BXµ ,

}0)(:{)( >∈=∗ xUxXB BXµ ,

}1)(0:{)( <<∈= xUxXBN BXB µ .

The rough membership function has the following properties [3]:

a) iff 1)( =xBXµ )(* XBx∈ ,

b) iff , 0)( =xBXµ )(* XBx −∈

c) iff , 1)(0 << xBXµ )(XBNx B∈

d) If , then is the characteristic function of X, }:),{()( UxxxBI ∈= )(xBXµ

e) If xI(B)y, then = provided I(B), )(xBXµ )(yB

Xµ

f) for any x∈U, )(1)( xx BX

BXU µµ −=−

g) max for any x∈U, ≥∪ )(xBYXµ ))(),(( xx B

YBX µµ

h) min for any x∈U, ≤∩ )(xBYXµ ))(),(( xx B

YBX µµ

The above properties show clearly the difference between fuzzy and rough membership. In particular properties g) and h) show that the rough membership formally can be regarded as a generalization of fuzzy membership. Let us recall that the “rough membership”, in contrast to the “fuzzy membership”, has probabilistic flavor.

It can be easily seen that there exists a strict connection between vagueness and uncertainty. As we mentioned above vagueness is related to sets (concepts), whereas uncertainty is related to elements of sets. Rough set approach shows clear connection between these two concepts.

5. Decision Tables and Decision Algorithms

Sometimes we distinguish in an information table two classes of attributes, called condition and decision (action) attributes. For example, in Table 1 attributes Headache, Muscle-pain

18

and Temperature can be considered as condition attributes, whereas the attribute Flu − as a decision attribute.

Each row of a decision table determines a decision rule, which specifies decisions (actions) that should be taken when conditions pointed out by condition attributes are satisfied. For example, in Table 1 the condition (Headache, no), (Muscle-pain, yes), (Temperature, high) determines uniquely the decision (Flu, yes). Objects in a decision table are used as labels of decision rules.

Decision rules 2) and 5) in Table 1 have the same conditions but different decisions. Such rules are called inconsistent (nondeterministic, conflicting); otherwise the rules are referred to as consistent (certain, deterministic, non-conflicting). Sometimes consistent decision rules are called sure rules, and inconsistent rules are called possible rules. Decision tables containing inconsistent decision rules are called inconsistent (nondeterministic, conflicting); otherwise the table is consistent (deterministic, non-conflicting).

The number of consistent rules to all rules in a decision table can be used as consistency factor of the decision table, and will be denoted by γ(C, D), where C and D are condition and decision attributes respectively. Thus if γ(C, D) = 1 the decision table is consistent and if γ(C, D) ≠ 1 the decision table is inconsistent. For example, for Table 1, we have γ(C, D) = 4/6. Decision rules are often presented as implications called “if...then...” rules. For example, rul

if (Headache, no) and (Muscle-pain, yes) and (Temperature, high) then (Flu, yes).

A set of decision rules is called a decision algorithm. Thus with each decision table we can ass

de

6. Dependency of Attributes

Another important issue in data analysis is discovering dependencies between attributes.

means that only some values of D are determined by values of C.

e 1) in Table 1 can be presented as implication

ociate a decision algorithm consisting of all decision rules occurring in the decision table. We must however, make distinction between decision tables and decision algorithms. Acision table is a collection of data, whereas a decision algorithm is a collection of

implications, e.g., logical expressions. To deal with data we use various mathematical methods, e.g., statistics but to analyze implications we must employ logical tools. Thus these two approaches are not equivalent, however for simplicity we will often present here decision rules in form of implications, without referring deeper to their logical nature, as it is often practiced in AI.

Intuitively, a set of attributes D depends totally on a set of attributes C, denoted C ⇒D, if all values of attributes from D are uniquely determined by values of attributes from C. In other words, D depends totally on C, if there exists a functional dependency between values of D and C. For example, in Table 1 there are no total dependencies whatsoever. If in Table 1, the value of the attribute Temperature for patient p5 were “no” instead of “high”, there would be a total dependency {Temperature}⇒{Flu}, because to each value of the attribute Temperature there would correspond unique value of the attribute Flu.

We would need also a more general concept of dependency of attributes, called a partial dependency of attributes.

Let us depict the idea by example, referring to Table 1. In this table, for example, the attribute Temperature determines uniquely only some values of the attribute Flu. That is, (Temperature, very high) implies (Flu, yes), similarly (Temperature, normal) implies (Flu, no), but (Temperature, high) does not imply always (Flu, yes). Thus the partial dependency

19

Formally dependency can be defined in the following way. Let D and C be subsets of A. We will say that D depends on C in a degree k (0 ≤ k ≤ 1), denoted C ⇒kD, if k = γ(C, D).

y (in

the partition U/D, employing attributes C.

four out of six patients can be uniquely classified as having flu or not, em

ture}⇒{Flu}, we would get k =

the concept of de

If k = 1 we say that D depends totally on C, and if k < 1, we say that D depends partiall a degree k) on C. The coefficient k expresses the ratio of all elements of the universe, which can be properly

classified to blocks ofThus the concept of dependency of attributes is strictly connected with that of consistency

of the decision table. For example, for dependency {Headache, Muscle-pain, Temperature}⇒{Flu} we get

k = 4/6 = 2/3, becauseploying attributes Headache, Muscle-pain and Temperature. If we were interested in how exactly patients can be diagnosed using only the attribute

Temperature, that is − in the degree of the dependence {Tempera 3/6 = 1/2, since in this case only three patients p3, p4 and p6 out of six can be uniquely

classified as having flu. In contrast to the previous case patient p4 cannot be classified now as having flu or not. Hence the single attribute Temperature offers worse classification than the whole set of attributes Headache, Muscle-pain and Temperature. It is interesting to observe that neither Headache nor Muscle-pain can be used to recognize flu, because for both dependencies {Headache}⇒{Flu} and {Muscle-pain}⇒{Flu} we have k = 0.

It can be easily seen that if D depends totally on C then I(C) ⊆ I(D). That means that the partition generated by C is finer than the partition generated by D. Notice, that

pendency discussed above corresponds to that considered in relational databases. If D depends in degree k, 0 ≤ k ≤ 1, on C, then

||U|)(DSC ,

where

∪ )()(C XCDPOS ∗= .

The expression POSC(D), called a U/D with respect to C, is the set of all elements of U ed to blocks of the partition U/D, by

uely classified to blocks of the partition U/D, employing C.

7. Reduction of Attributes

her we can remove some data from a data table preserving its basic properties, that is − whether a table contains some superfluous data.

one, in regard to ap

nd let a belong to B.

|),( PODC =γ

)(/ DIUX∈

positive region of the partition that can be uniquely classifi

means of C. Summing up: D is totally (partially) dependent on C, if all (some) elements of the universe

U can be uniq

We often face a question whet

For example, it is easily seen that if we drop in Table 1 either the attribute Headache or Muscle-pain we get the data set which is equivalent to the original

proximations and dependencies. That is we get in this case the same accuracy of approximation and degree of dependencies as in the original table, however using smaller set of attributes.

In order to express the above idea more precisely we need some auxiliary notions. Let B be a subset of A a

20

• We say that a is dispensable in B if I(B) = I(B − {a}); otherwise a is indispensable in B.

• Set B is independent if all its attributes are indispensable. Sub• set B' of B is a reduct of B if B' is independent and I(B') = I(B).

Thu a means that a reduct is the minima of the universe as

g the notion of the core and reducts

where Red(B) is the set off all reducts ofBecause the core is the intersection of all reducts, it is included in every reduct, i.e., each

Thus, in a sense, the core is the most important sub

e are still able to discern objects in the table as the original on

the value of attribute a is indispensable for x.

The set of all indispensable values of attributes in B for x will be called the value core of B for x, a w

)()( BdReB xx ∩= ,

where Redx(B) is the family ofSuppose we are given a dependency y happen that the set D depends not on

ight be interested to find this subset. In ord

s reduct is a set of attributes that preserves partition. It l subset of attributes that enables the same classification of elements

the whole set of attributes. In other words, attributes that do not belong to a reduct are superfluous with regard to classification of elements of the universe.

Reducts have several important properties. In what follows we will present two of them. First, we define a notion of a core of attributes. Let B be a subset of A. The core of B is the set off all indispensable attributes of B. The following is an important property, connectin

∩ )()( BdReBCore = ,

B.

element of the core belongs to some reduct.set of attributes, for none of its elements can be removed without affecting the

classification power of attributes. To further simplification of an information table we can eliminate some values of attribute

from the table in such a way that we. To this end we can apply similar procedure as to eliminate superfluous attributes, which

is defined next.

• We will say that the value of attribute a∈B, is dispensable for x, if [x]I(B) = [x]I(B − {a}); otherwise

• If for every attribute a∈B the value of a is indispensable for x, then B will be called orthogonal for x.

• Subset B' ⊆ B is a value reduct of B for x, iff B' is orthogonal for x and [x]I(B) = [x]I(B').

nd ill be denoted COREx(B). Also in this case we have

CORE

all reducts of B for x. C ⇒D. It ma

the whole set C but on its subset C' and therefore we mer to solve this problem we need the notion of a relative reduct, which will be defined and

discussed next. Let C,D ⊆ A. Obviously if C' ⊆ C is a D-reduct of C, then C' is a minimal subset of C such

that

),(),( DCDC ′= γγ .

• We will say that attribute a C, if POSC(D) = POS(C−{a})(D); otherwise the attribute a is

∈C is D-dispensabD-indispensable in C.

le in

• If all attributes a∈C are C-indispensable in C, then C will be called D-independent.

21

• Subset C' ⊆ C is a D-reduct of C, iff C' is D-independent and POSC(D) = POSC'(D).

The d

)(CD ,

where RedD(C) is the family of all

ve reducts with respect to Flu, {Headache, Te

Patient Headache Temperatur Flu

set of all D-indispensable attributes in C will be called D-core of C, and will be denoteby CORED(C). In this case we have also the property

)( dReCCORED ∩=D-reducts of C.

If D = C we will get the previous definitions. For example, in Table 1 there are two relatimperature} and {Muscle-pain, Temperature} of the set of condition attributes {Headache,

Muscle-pain, Temperature}. That means that either the attribute Headache or Muscle-pain can be eliminated from the table and consequently instead of Table 1 we can use either Table 2

e

p2 yes high yes p3 yes very high yes p4 no normal no p5 yes high no p6 no very high yes

ble 2

p1 no high yes

Ta

or Table 3

Patient Muscle- Temperatur Flu pain e

y higp2 no high yes p3 yes very high yes p4 yes normal no p5 no high no p6 yes very high yes

ble 3

p1 es h yes

Ta

For Table 1 the relative core of with respect to the set {Headache, Muscle-pain, Te

given a de

otherwise the value• or x, then C will be called

D-independent (orthogonal) for x.

mperature} is the Temperature. This confirms our previous considerations showing that Temperature is the only symptom that enables, at least, partial diagnosis of patients.

We will need also a concept of a value reduct and value core. Suppose we arependency DC ⇒ where C is relative D-reduct of C. To further investigation of the

dependency w t be interested to know exactly how values of attributes from D depend on values of attributes from C. To this end we need a procedure eliminating values of attributes form C which does not influence on values of attributes from D.

• We say that value of attribute a∈C, is D-dispensable for x∈U, if

e migh

[x]I(C) ⊆ [x]I(D) implies [x]I(C−{a}) ⊆ [x]I(D); of attribute a is D-indispensable for x.

If for every attribute a∈C value of a is D-indispensable f

22

• Subset C' ⊆ C is a D-reduct of C for x (a value reduct), iff C' is D-independent for x and

[x]I(C) ⊆ [x]I(D) implies [x]I(C') ⊆ [x]I(D).

t of alThe se l D-indispensable for x values of attributes in C will be called the D-core of C for x (the value core), and will

)(CdRe xDD ,

where )(CdRe xD is the family of

plified as follows

be denoted )(CCORE xD .

We have also the following property

)(Cx ∩=CORE

all D-reducts of C for x. Using the concept of a value reduct, Table 2 and Table 3 can be sim

Patient Headache Temperature

Flu

p1 no high yes

p3 − very high yes p4 − normal no p5 yes high no p6 − very high yes

p2 yes high yes

Table 4

Patient Muscle-pain

eratur Flu Tempe

p1 yes high yes no high

p3 − very high yes p4 − normal no p5 no high no p6 − very high yes

ble 5

p2 yes

Ta

We can also present the obtained results in a form of a decision algorithm. For Table 4 we get

nd (Temperature, high) then (Flu, yes),

then (Flu, no), ).

an

if (Muscle-pain, yes) and (Temperature, high) then (Flu, yes), d (Temperature, high) then (Flu, yes),

) then (Flu, no),

if (Headache, no) and (Temperature, high) then (Flu, yes), if (Headache, yes) aif (Temperature, very high) then (Flu, yes), if (Temperature, normal) then (Flu, no), if (Headache, yes) and (Temperature, high) if (Temperature, very high) then (Flu, yes

d for Table 5 we have

if (Muscle-pain, no) anif (Temperature, very high) then (Flu, yes), if (Temperature, normal) then (Flu, no), if (Muscle-pain, no) and (Temperature, high

23

if (Temperature, very high) then (Flu, yes).

ct of B,

con

every C' ⊆ C,

in p

C, then B ⇒{a}, for every a∈C.

Mo

ct of B, then neither {a} ⇒{b} nor {b} ⇒{a} holds, for every a, b∈B', tes in a reduct are pairwise independent.

8. Indiscernibility Matrices and Functions

se discernibility matrix [4], which is defined next.

The following important property

a) B' ⇒ B − B', where B' is a redu

nects reducts and dependency. Besides, we have:

b) If B ⇒ C, then B ⇒ C', for

articular

c) If B ⇒

reover, we have:

d) If B' is a redui.e., all attribu

To compute easily reducts and the core we will u

By an discernibility matrix of B ⊆ A denoted M(B) we will mean n × n matrix defined as:

)}()(:{)( jiij xaxBac ≠∈= for nji ,,2,1, …= .

Thus entry cij is the set ofThe discernibility m y a subset of attributes

all attributes which discern objects atrix M(B) assigns to each pair of objects

xi and xjx and

.

Byx ⊆),(δ , with the following properties:

i) δ (x, x) = ∅, ii) δ (x, y) = δ (y, x),

i δ (y, z).

ties of semi-distance, and therefore the function δ may be regarded as qualitative semi-matrix and ),( yx

ii) δ (x, z) ⊆ δ (x, y) ∪

These properties resemble properδ − qualitative semi-distance. Thus the

discernibility matrix can be seen as a semi-dista c (qualitative) matrix.

Let us also note that for every Uzyx

n e

∈,, we have

iv) |δ (x, x)| = 0, v) |δ (x, y)| = |δ (y, x)|,

|δ (y, z)|.

It i t of all single element entries of the discernibility matrix M(B), i.e.,

}{:{)) acBaBCORE ij

vi) |δ (x, z)| ≤ |δ (x, y)| +

s easily seen that the core is the se

=∈= , for some }, ji

Obviously BB ⊆′ o inclusion) subset of B such that

∅≠

is a reduct of B, if B' is the minimal (with res tpect

∩′ cB for any nonempty entry )( ∅≠cc in M(B).

24

In other words reduct is the m bjects discernible by the whole set of attributes.

inimal subset of attribute scerns all os that di

Every discernibility matrix M(B) defines uniquely a discernibility (boolean) function f(B) defined as follows.

Let us assign to each attribute Ba∈ a binary Boolean variable a , and let ),( yxδΣ denote Boolean sum of all Boolean variables assigned to the set of attributes ( ), yxδ . Then the discernibility function can be defined by the formula

∏ ∈= ),(:),({)( UyxyxBf δΣ∈ 2),( Uyx

The following property establishes the relationship between disjunctive norm

2 and }),( ∅≠yxδ .

al form of the

e normal form of the function f(B) are all reducts of

∈Uy

function f(B) and the set of all reducts of B.

All constituents in the minimal disjunctivB. In order to compute the value core and value reducts for x we can also use the

discernibility matrix as defined before and the discernibility function, which must be slightly modified: ∏ ∈=x UyyxBf :),({)( δΣ and }),( ∅≠yxδ .

Relative reducts and core can be com atrix, which needs slight modification

puted also using discernibility m

)()(:{ jiij xaxaCac ≠∈= and )},( ji xxw ,

where ,( POSxxxw iji or )( and )( DPOSxD CjC ∉∈≡ POSx or )( and )( DPOSxD ∈ CjCi

)(),( and )(, DIxxDPOSxx jjCji ∉∈ ∉

for ji ,2,1, …=If the partition def then the condition ),( ji xxw in the above

duced to )(),( DIxx ji ∉ .

).

ries of the discernibility matrix MD(C), i.e.,

}, somefor ),( jia

n, . ined by D is definable by C

defi renition can be Thus entry cij is the set of all attributes which discern objects xi and x not belong to

the same equivalence class of (Dj that do

the relation IThe remaining definitions need little changes. The D-core is the set of all single element ent

:{)( cCaCCORE =∈= ijD .

Set CC ⊆′ is the D l (with respect to inclusion) subset of C such that

-reduct of C, if C' is the minima

)(in )(entry nonempty any for CMcccC D∅≠∅≠∩′ .

Thus D-reduct is the mrelation I(D).

defined as before. We have also the following property:

inimal subset of attributes that discerns all equivalence classes of the

Every discernibility matrix MD(C) defines uniquely a discernibility (Boolean) function fD(C) which is

All constituents in the disjunctive normal form of the function fD(C) are all D-reducts of C.

25

For computing value reducts and the value core for relative reducts we use as a starting point the discernibility matrix M (C) and discernibility function will have the form:

puting relative reducts for the set of attributes {Headache, .

6

D

∏ ∅≠∈=xD yxUyyxCf }),( and :),({)( δδΣ .

∈Uy

Let us illustrate the above considerations by comMuscle-pain, Temperature} with respect to Flu

The corresponding discernibility matrix is shown in Table 6.

1 2 3 4 5 1 2 3 4 T , M, T H 5 H, M , T M 6 T , M, T H

6 Table

In this table H, M, T denote Headache, Mu and Temperature, respectively.

))( TMTM

scle-painThe discernibility function for this table is

)(( HMHT ++ ++ ,

where + denotes the boolean sum itted in the formula. he

TH + TH,

which says that there are two reducts TH a he data table and T is the core.

9. Significance of Attributes and Approximate Reducts

butes, they cannot be equally

n attribute can be evaluated by measuring effect of removing the attribute fro

ion and decision attributes respectively and let a be a co

and the boolean multiplication is omAfter simplication the discernibility function using laws of Boolean algebra we obtain t

following expression

nd TM in t

As it follows from considerations concerning reduction of attriimportant, and some of them can be eliminated from an information table without losing information contained in the table. The idea of attribute reduction can be generalized by introducing a concept of significance of attributes, which enables us evaluation of attributes not only by two-valued scale, dispensable − indispensable, but by assigning to an attribute a real number from the closed interval [0,1], expressing how important is an attribute in an information table.

Significance of am an information table on classification defined by the table. Let us first start our

consideration with decision tables. Let C and D be sets of conditndition attribute, i.e., Ca∈ . As shown previously the number ),( DCγ expresses a degree

of consistency of the decision table, or the degree of dependency between attributes C and D, or accuracy of approximation of U/D by C. We can ask how the coefficient ),( DCγ changes when removing the attribute a, i.e., what is the difference between )D ,(Cγ and

)},{( DaC −γ . We can normalize the difference and define the significance of the attribute a as

26

),(),(),( DCDCDC γ)},{(1))},{(),(()( DaCDaCDCa γ

γγγσ −

−=−−

= ,

and denoted sim

σ

of t

ITematt

denoted by ε (If B is a reduct of

denoted simpexactly the set of

also the following approxim

ple by )(aσ , when C and D are understood. )( ≤aObviously 10 ≤σ . The more important is the attribute a the greater is the number

)(aσ . For example for ition attributes in Table 1 we havcond e the following results:

.75.

Because the significance of the attribute Temperature or Muscle-pain is zero, removing either ion attributes does not effect the set of consistent decision rules,

the attribute Headache from the reduct, i.e., using only the attribute of four) of consistent decision rules will be lost, and dropping the

follows:

σ (Headach(Muscle-pain) = 0,

e) = 0,

σ (Temperature) = 0

he attributes from conditwhatsoever. Hence the attribute Temperature is the most significant one in the table. That means that by removing the attribute Temperature, 75% (three out of four) of consistent decision rules will disappear from the table, thus lack of the attribute essentially effects the “decisive power” of the decision table.

For a reduct of condition attributes, e.g., {Headache, Temperature}, we get

σ (Headache) = 0.25, σ (Temperature) = 1.00.

n this case, removing perature, 25% (one out

ribute Temperature, i.e., using only the attribute Headache 100% (all) consistent decision rules will be lost. That means that in this case making decisions is impossible at all, whereas by employing only the attribute Temperature some decision can be made.

Thus the coefficient σ(a) can be understood as an error which occurs when attribute a is dropped. The significance coefficient can be extended to set of attributes as

),(),(1

),()),(),(()(),( DC

DBCDC

DBCDCBDC γγ

γγγσ −

−=−−

=

B), if a set of decision rules

C and D are understood, where B is a subset of C. C, then ε (B) = 1, i.e., removing any reduct from

unables to make sure decisions, whatsoever. Any subset B of C will be called an approximate reduct of C, and the number

),(( DCγ −),(

1),(

)(),( DCDCBDC γγ

ε −== ),()),( DBDBγ γ

le as ε(B . It expresses how C. Obviously

he, Temperature}, and

But for the whole set of condition attributes {Headache, Muscle-pain, Temperature} we have ate reduct

), will be called an error of reduct approximation attributes B approximates the set of condition attributes

ε (B) = 1 − σ (B) and ε (B) = 1 − ε (C − B). For any subset B of C we have ε (B) ≤ ε (C). If B is a reduct of C, then ε (B) = 0.

For example, either of attributes Headache and Temperature can be considered as approximate reducts of {Headac

ε (Headache) = 1, ε (Temperature) = 0.25.

27

ε (Headache, Muscle-pain) = 0.75.

The concept of an approximate reduct is a generalization of the concept of a reduct con l subset B of condition attributes C, such that

(sidered previously. The minima

),(), DBDC γγ = , or 0)(),( =BDCε is a reduct in the previous sense. The idea of an approximate reduct can be useful in cases when a smaller number of condition attributes is

uracy ion.

10. Summary

preferred over acc of classificat

Rough set Theory has found many applications in medical data analysis, finance, voice recognition, image processing and others. However the approach presented in this paper is too

References

lak: Rough sets, International Journal of Computer and Information Sciences, 11, 341-356, 1982

s, Boston, London, Dordrecht, 1991

r Theory of Evidence, John

[4] s, in: R. Słowiński (ed.), Intelligent Decision Support. Handbook of

[5] overy (W. Klösgen, J. Żytkow eds.), Oxford University

[6] er-Verlag Company, 2002, 1-534

simple to many real-life applications and was extended in many ways by various authors. The detailed discussion of the above issues can be found in [5], [6] and the internet (e.g., http://www.roughsets.org)

[1] Z. Paw

[2] Z. Pawlak: Rough Sets − Theoretical Aspects of Reasoning about Data, Kluwer Academic Publisher

[3] Z. Pawlak, A. Skowron: Rough membership functions, in: R. R Yaeger, M. Fedrizzi and J. Kacprzyk (eds.), Advances in the Dempster ShafeWiley & Sons, Inc., New York, Chichester, Brisbane, Toronto, Singapore, 1994, 251-271 A. Skowron, C. Rauszer: The discernibility matrices and functions in information systemApplications and Advances of the Rough Set Theory, Kluwer Academic Publishers, Dordrecht, 1992, 311-362 A. Skowron et al: Rough set perspective on data and konwedge, Handbook of Data Mining and Knoledge DiscPress, 2002, 134-149 L. Polkowski: Rough Sets − Mathematical Foundations, Advances in Soft Computing, Physica-Verlag, Spring

[7] L. Zadeh: Fuzzy sets, Information and Control 8, 333-353, 1965

28

CHAPTER 3

Rough Sets and Bayes’ Theorem

1. Introduction

The Bayes’ theorem is the essence of statistical inference. “The result of the Bayesian data analysis process is the posterior distribution that

represents a revision of the prior distribution on the light of the evidence provided by the data” [5]. “Opinion as to the values of Bayes’ theorem as a basic for statistical inference has swung be

look on Bayes’ theorem off

irst studied by Łukasiewicz [6] (see also [1]

fro

2. Bayes’ Theorem

f H denotes an hypothesis and D denotes data, the theorem says that

With P(H) regarded as a pro ut H before obtaining data D, the left-hand side P(H|D) becomes an probabilistic statement of belief about H after obtaining D.

H, or the distribution of H priori. Correspondingly,

tween acceptance and rejection since its publication on 1763” [4]. Rough set theory offers new insight into Bayes’ theorem [7]. Theered by rough set theory is completely different to that used in the Bayesian data analysis

philosophy. It does not refer either to prior or posterior probabilities, inherently associated with Bayesian reasoning, but it reveals some probabilistic structure of the data being analyzed. It states that any data set (decision table) satisfies total probability theorem and Bayes’ theorem. This property can be used directly to draw conclusions from data without referring to prior knowledge and its revision if new evidence is available. Thus in the presented approach the only source of knowledge is the data and there is no need to assume that there is any prior knowledge besides the data. We simple look what the data are telling us. Consequently we do not refer to any prior knowledge which is updated after receiving some data. Moreover, the rough set approach to Bayes’ theorem shows close relationship between logic of implications and probability, which was f

). Bayes’ theorem in this context can be used to “invert” implications, i.e., to give reasons for decisions. This is a very important feature of utmost importance to data mining and decision analysis, for it extends the class of problem which can be considered in this domains.

Besides, we propose a new form of Bayes’ theorem where basic role plays strength of decision rules (implications) derived from the data. The strength of decision rules is computed

m the data or it can be also a subjective assessment. This formulation gives new look on Bayesian method of inference and also simplifies essentially computations.

“In its simplest form, i

P(H|D) = P(D|H) × P(H)/P(D).

babilistic statement of belief abo

Having specified P(D|H) and P(D), the mechanism of the theorem provides a solution to the problem of how to learn from data.

In this expression, P(H), which tells us what is known about H without knowing of the data, is called the prior distribution of

29

P(

analysis. Such a dis

3. Information Systems and Decision Rules

ns, results etc.) determined, when some conditions are satisfied. In other words each row of the decision table specifies a decision rule

Every Ux

H|D), which tells us what is known about H given knowledge of the data, is called the posterior distribution of H given D, or the distribution of H a posteriori” [3].

“A prior distribution, which is supposed to represent what is known about unknown parameters before the data is available, plays an important role in Bayesian

tribution can be used to represent prior knowledge or relative ignorance” [4].

Every decision table describes decisions (actio

which determines decisions in terms of conditions. In what follows we will describe decision rules more exactly. Let S = (U, C, D) be a decision table. ∈ determines a sequence

(1c )(,),( ),(,), xdxdxcx …… where Ccc =},,{ … and ,,{ dd …1 mn n1 1 .} Dm

The sequence will be called a decision rule induced x (in S) and denoted by )x or in .

by (,),()(,),(1 dxdxcxc …… → short DC →

=

1 mn x

The n m er supp (C,D C(x) ∩ D(x) will e called a support of the decision rule u b x ) = |DC → and the number

| bx

||U

of the decision rule

),(),( DCpsupDC xx =σ ,

will be referred to as the strength Dx→ , where |X| denotes the →

decision rule, denoted cer (C, D

Ccardinality of X. With every decision rule DC x we associate the certainty factor of the

x ) and defined as follows:

() CD(|)()(|),( CpsupxDxCDCcer xx

σ==

∩=

))((|)(||)(| xCxCxC π

|||)(

UxC .

),, Dx ,

where |))(( xC =π

y be interpreted as a conditional probability that y belongs to D(x) given ), symbolically ).|( CD

The acertainty factor my belongs to C(x x

If cerx(C, D) = 1, then DC x→ will be called a certain decision rule in S; if 0 < cer (C, D) rred to as an

π x

< 1 the decision rule will be ref uncertain decision rule in S. euse a cBesides, we will also overage factor of the decision rule, denoted covx(C, D)

defined as

))((|)(||)(| xDxDxD π

|)(xD .

),(),(|)()(|),( DCDCpsupxDxCDCcov xxx

σ==

∩=

where ||U

Similarly

|))(( xD =π

).|(),( DCDCcov xx π=

If DC x→ is a decision rule then inverse decision rule. The inverse decision rules can be used to give explanations (reasons) for a decision.

DC x→ will be called an

30

Let us observe that

)(),( xDCcer Cµ= and covx(C, D). )(xDx

That means that the certainty factor expresses the degree of membership of x to the decision class D(x), given C, whereas the coverage factor expresses the degree of membership of

4. Decision Language

ibe decision tables in logical terms. To this end we define a formal language called a decision language.

). Formulas of For (B) are built up from attribute-value pa

f Φ in S.

S is defined inductively as follows: (a, v)||S = {x∈U : a(v) = x} for all a∈B and v∈Va, ||Φ ∨ Ψ||S = ||Φ||S ∪ ||Ψ||S, ||Φ ∧ Ψ||S = ||Φ||S ∩

d next.

or(D) and C, D are condition and decision attributes, res

a probability distribution pU(x) = 1/|U| for x∈U where U is the (non-empty) un

SUS

x to condition class C(x), given D.

It is often useful to descr

Let S = (U, A) be an information system. With every B ⊆ A we associate a formal language, i.e., a set of formulas For(B

irs (a, v) where a∈B and v∈Va by means of logical connectives ∧ (and), ∨ (or), ∼ (not) in the standard way.

For any Φ∈For(B) by ||Φ||S we denote the set of all objects x∈U satisfying Φ in S and refer to as the meaning o

The meaning ||Φ||S of Φ in||

||Ψ||S, ||∼ Φ||S = U − ||Φ||S. If S = (U, C, D) is a decision table then with every row of the decision table we associate a

decision rule, which is defineA decision rule in S is an expression Φ →S Ψ or simply Φ → Ψ if S is understood, read if

Φ then Ψ, where Φ∈For(C), Ψ∈Fpectively; Φ and Ψ are referred to as conditions part and decisions part of the rule,

respectively. The number suppS(Φ,Ψ ) = |(||Φ ∧ Ψ||S)| will be called the support of the rule Φ → Ψ in S.

We consider iverse of objects of S; we have pU(X) = |X|/|U| for X ⊆ U. For any formula Φ we associate

its probability in S defined by

)||(||)( p Φ=Φπ .

With every decision rule Φ → Ψ we associate a conditional probability

)|||||||(||)|( SSUS p ΦΨ=ΦΨπ

called the certainty factor ). This idea was used first by Łukasiewicz [6] (see also [1]) plications. We have

of the decision rule, denoted cerS(Φ to estimate the probability of im

,Ψ

|)||(||| S

ining and is called

|)||(|||)|(),( SSScer

ΦΨ∧Φ

=ΦΨ=ΨΦ π

where ∅≠Φ S|||| . This coefficient is now widely used in data m confidence coefficient. If 1)|( =ΦΨSπ , then Ψ→Φ will be called a certain decision rule; if 1|(0 <ΦΨ< Sπ

the decision rule will be referred to as a uncertain decision rule.

31

There is an interesting relationship between decision rules and their certain decision rules correspond to the lower approximation, w

approximations: ncertain decisionhereas the u

rules correspond to the boundary region. Besides, we will also use a coverage factor of the decision rule, denoted covS(Φ, Ψ)

defined by

)|||| |||(||)|( SSUS p ΨΦ=ΨΦπ .

Obviously we have

|)||(||||)||)|(),( S

SScovΨ

(||| Ψ∧Φ

S

There are three possibilities to interpret the certainty and the coverage fand mereological (degree of inclusion).

=ΨΦ=ΨΦ π .

actors: statistical (frequency), logical (degree of truth)

jects having the pro

We will use here mainly the statistical interpretation, i.e., the certainty factors will be interpreted as the frequency of objects having the property Ψ in the set of ob

perty Φ and the coverage factor − as the frequency of objects having the property Φ in the set of objects having the property Ψ.

Let us observe that the factors are not assumed arbitrarily but are computed from the data. The number

)()|(),( Φ⋅ΦΨ=),( ΨΦ

=ΨΦ SSS

S Uππσ

will be called the strength

supp

of the decision rule Ψ→Φ in S, and will play an imour approach, which will be discussed in section 6.

portant role in

of attributes in S = (U, A). W if S is understood, in symbols Φ

We will need also the notion of an equivalen rmulas. Let Φ, Ψ be formulas in For(A) where A is the set

ce of fo

e say that Φ and Ψ are equivalent in S, or simply, equivalentΨ≡ , if and only if Ψ→Φ and Φ→Ψ . It means that Ψ≡Φ if and only if

≡

SS |||||||| Ψ=Φ . We need also approximate equivalence of formulas which is defined as follows:

ΨΦ k and only if cov kcer Φ ),()

accuracy )10( ≤≤

,( .=ΨΦ=Ψ

Besides, we define also approximate equivalence of formulas with the εε , which is defined as follows:

Ψ≡Φ ε,k if and only if )},(),,({( ΨΦΨΦ= covcermink and ε≤ΨΦ−ΨΦ covcer |),(),(| .

5. Decision Algorithms

In this section we define the notion of a decision algorithm, which is a logical counterpart of a

Let miiiSDec }{)( =Ψ→Φ= , 2≥m , be a set of decision rules in a decision table

decision table. 1

S = (U, C, D).

1) )(SDecIf for every Ψ→Φ , ∈Ψ′→′Φ we have Φ′=Φ or ∅=Φ′∧Φ S|||| , and Ψ′=Ψ or Ψ′∧Ψ |||| ∅=S , then we will say that $Dec(S)$ is the set of pairwise

mutually exclusive (independent) decision rules in S.

32

2) If Si

m=Φ ||/\|| S|| ll say that t of de ec(S)

covers U.

U and i=1

Ui

m

i=Ψ

=||/\

1 we wi he set cision rules D

3) (S 0)If )Dec∈Ψ→Φ and ,(ΦSsuppadm

If C∗∪issible in S.

≠Ψ we will say that the decision rule Ψ→Φ is

4) SX ||/\)( Φ= , where Dec+(S) is the set of all certain decision rules

decision rules Dec(S f the decision table S = (U, C, D).

U, pres

DUX /∈ →Φ

SS )(+

), we will say that the set ofDec∈Ψ

from Dec(consistency part o

) preserves the

The set of decision rules t satisfies 1), 2) 3) and 4), i.e., is independent, covers erves the consistency of S and all decision rules )(SDec

Dec(S) tha∈Ψ→Φ are admissible in S

e called a decision algorithm in S. − will bHence, if Dec(S) is a decision algorithm in S then the conditions of rules from Dec(S)

)(XC∪define in S a partition of U. Moreover, the positive region of D with respect to C, i.e., the set

/ DUX∈∗

If Ψ→Φ is a decision rule then the decision rule is partitioned by the conditions of some of these rules, which are certain in S.

Φ→Ψ will be called an inverse decision rule of Ψ→Φ .

Let Dec*(S) denote the set of all inverse decision rules of Dec(S). lgorithm in S.

decisions pointed out by the

on of decision algorithms from decision tables is a complex task and we will not dis

erences.

Decision tables have important probabilistic properties which are discussed next. )(xC

It can be shown that Dec*(S) satisfies 1), 2), 3) and 4), i.e., it is a decision aIf De s a decision algorithm then Dec*(S) will be called an inverse decision algorithm

of Dec(S). c(S) i

The inverse decision algorithm gives reasons (explanations) for decision algorithms. A decision algorithm is a description of a decision table in the decision language. Generaticuss this issue here, for it does not lie in the scope of this paper. The interested reader is

advised to consult the ref

6. Probabilistic Properties of Decision Tables

Let DC x→ be a decision rule in S and let =Γ and )(xD=∆ . Then the following properties are valid:

1) ∑ =DCcer 1),( Γ∈y

y

2) 1),( =∑Γ∈y

y DCcov

3) ∑ ⋅y CDCcer (),( π ∑Γ∈y

⋅y yDDCcov ((),( π

Γ∈

==y

y DCyxD ),())())(( σπ

4) ∑ ∑∆∈ ∆∈y y

y== DCxC ),())))(( σπ

33

5) ))((),(

xCDCx

x π),(),(

))((),())((),(),

DCDC

yDDCcovxDDCcovD

y

x

y

x(Ccer σσ

σπ

π==

⋅⋅

=∑∑yy ∆∈Γ∈

6) ))((),(

),(),(

))((),())((),(),(

xDDC

DCDC

yCDCercxCDCercDCcov x

x

x

y

xx π

σσ

σπ

π==

⋅⋅

=∑∑

yy Γ∈Γ∈

That is, any decision table, satisfies 1),...,6). Observe that 3) and 4) refer to the well known , whereas 5) and 6) refer to Bayes' theorem. total probability theorem

decision rules only. The str

xample, shown in Table 1.

Fact Diseas Age Sex Test Support

Thus in order to compute the certainty and coverage factors of decision rules according to formula 5) and 6) it is enough to know the strength (support) of all

ength of decision rules can be computed from data or can be a subjective assessment.

7. Illustrative Example

Let us now consider an e

e 1 yes old man + 400

3 no old man − 100 4 yes old man − 40 5 no y woman oung − 220 6 yes m woman iddle − 60

Tab

Attributes disease, age and sex are condition attributes, whereas test is the decision attribute.

tributes disease, age and sex.

le 1

2 yes middle woman + 80

We want to explain the test result in terms of patients state, i.e., to describe attribute test in terms of at

The strength, certainty and coverage factors for decision table are shown in Table 2.

Fact Strength

Certainty Coverage1 0.44 0.92 0.83 2 0.09 0.56 0.17

4 0.04 0.08 0.10 5 0.24 1.00 0.52 6 0.07 0.44 0.14

le 2

3 0.11 1.00 0.24

Tab

Below a decision algorithm associated with Table 1 is presented.

1) if (disease, yes) and (age, old) then

3) if (disease, no) then (test, −)

(test, +) 2) if (disease, yes) and (age, middle) then (test, +)

34

4) if (disease, yes) and (age, old) then (test, −) 5) if (disease, yes) and (age, middle) then (test, −)

The certainty and coverage factors for the above algorithm are given in Table 3.

Rule Strengt Certainty Coverageh 1 0.44 0.92 0.83

2 0.09 0.56 0.17 3 0.36 1.00 0.76 4 0.04 0.08 0.10

le 3

ing co

5 0.24 0.44 0.14

Tab

The certainty factors of the decision rules lead the follow nclusions:

− 92% ill and old patients have positive test result − 56% ill and middle age patients more positive test result −

− 8% ill and old patients have negative test result − 44% ill and old patients have negative test result

In

est result (probability = 0.92) sitive test result (probability = 0.56)

ents have certainly negative test result (probability = 1.00)

No

−

4') if (test, −) then (disease, yes) and (age, old) −

Em orithm and the coverage factor we get the following exp

− reason for positive test results are most probably patients disease and old age

It follows from Table 2 that there are two interesting approximate equivalences of test results and

According to rule 1) the disease and old age are approximately equivalent to positive test result (

all healthy patients have negative test result

other words:

− ill and old patients most probably have positive t− ill and middle age patients most probably have po− healthy pati

w let us examine the inverse decision algorithm, which is given below:

1') if (test, +) then (disease, yes) and (age, old) 2') if (test, +) then (disease, yes) and (age, middle) 3') if (test, ) then (disease, no)

5') if (test, ) then (disease, yes) and (age, middle)

ploying the inverse decision alglanation of test results:

(probability = 0.83) − reason for negative test result is most probably lack of the disease (probability = 0.76)

the disease.

k = 0.83, ε = 0.11), and lack of the disease according to rule 3) is approximately equivalent to negative test result (k = 0.76, ε = 0.24).

35

8.

between employing Bayes’ theorem in statistical reasoning and the role of Bayes’ theorem in rough set based data analysis.

Bayesian inference consists in update prior probabilities by means of data to posterior

In the rough set approach Bayes' theorem reveals data patterns, which are used next to

pplication of Probability to Deductive Logic. D. Reidel Publishing Company, Dordrecht, Boston, 1975

s: An essay toward solving a problem in the doctrine of chances, Phil. Trans. c., 53, 370-418; (1763); Reprint Biometrika 45, 296-315, 1958

rk, Brisbane, Toronto,

[4] John Wiley and

[5] , New York, 1999

North Holland

[7] n Algorithms, in: W. Ziarko, Y. Y. Yao (eds.),

Summary

From the example it is easily seen the difference

probabilities.

draw conclusions from data, in form of decision rules.

References

[1] E. W. Adams: The Logic of Conditionals, an A

[2] T. BayeRoy. So

[3] J. M. Bernardo, A. F. M. Smith: Bayesian Theory, Wiley Series in Probability and Mathematical Statistics, John Wiley & Sons, Chichester, New YoSingapore, 1994 G. E. P. Box, G. C. Tiao: Bayesian Inference in: Statistical Analysis, Sons, Inc., New York, Chichester, Brisbane, Toronto, Singapore, 1992 M. Berthold, D. J. Hand: Intelligent Data Analysis, an Introduction, Springer-Verlag, Berlin, Heidelberg

[6] J. Łukasiewicz: Die logischen Grundlagen der Wahrscheinilchkeitsrechnung. Kraków, 1913, in: L. Borkowski (ed.), Jan Łukasiewicz − Selected Works,Publishing Company, Amsterdam, London, Polish Scientific Publishers, Warsaw, 1970 Z. Pawlak: Rough Sets and DecisioSecond International Conference, Rough Sets and Current Trends in Computing, RSCTC 2000, Banff, Canada, October 2000, LNAI 2000, 30-45

36

CHAPTER 4

Data Analysis and Flow Graphs

1. Introduction

In [2] Jan Łukasiewicz proposed to use logic as mathematical foundation of probability. He claims that probability is “purely logical concept” and that his approach frees probability from its obscure philosophical connotation. He recommends to replace the concept of probability by truth values of indefinite propositions, which are in fact propositional functions.

Let us explain this idea more closely. Let U be a non empty finite set, and let Φ(x) be a propositional function. The meaning of Φ(x) in U, denoted by |Φ(x)|, is the set of all elements of U, that satisfies Φ(x) in U. The truth value of Φ(x) is defined as card |Φ(x)| / card U. For example, if U = {1, 2, 3, 4, 5, 6} and Φ(x) is the propositional function x > 4, then the truth value of Φ(x) = 2/6 = 1/3. If the truth value of Φ(x) is 1, then the propositional function is true, and if it is 0, then the function is false. Thus the truth value of any propositional function is a number between 0 and 1. Further, it is shown that the truth values can be treated as probability and that all laws of probability can be obtained by means of logical calculus.

In this paper we show that the idea of Łukasiewicz can be also expressed differently. Instead of using truth values in place of probability, stipulated by Łukasiewicz, we propose, in this paper, using of deterministic flow analysis in flow networks (graphs). In the proposed setting, flow is governed by some probabilistic rules (e.g., Bayes’ rule), or by the corresponding logical calculus proposed by Łukasiewicz, though, the formulas have entirely deterministic meaning, and need neither probabilistic nor logical interpretation. They simply describe flow distribution in flow graphs. However, flow graphs introduced here are different from those proposed by Ford and Fulkerson [1] for optimal flow analysis, because they model rather, e.g., flow distribution in a plumbing network, than the optimal flow.

The flow graphs considered in this paper are basically meant not to physical media (e.g., water) flow analysis, but to information flow examination in decision algorithms. To this end branches of a flow graph are interpreted as decision rules. With every decision rule (i.e. branch) three coefficients are associated, the strength, certainty and coverage factors. In classical decision algorithms language they have probabilistic interpretation. Using Łukasiewicz’s approach we can understand them as truth values. However, in the proposed setting they can be interpreted simply as flow distribution ratios between branches of the flow graph, without referring to their probabilistic or logical nature.

This interpretation, in particular, leads to a new look on Bayes’ theorem, which in this setting, has entirely deterministic explanation.

The presented idea can be used, among others, as a new tool for data analysis, and knowledge representation.

We start our considerations giving fundamental definitions of a flow graph and related notions. Next, basic properties of flow graphs are defined and investigated. Further, the

37

relationship between flow graphs and decision algorithms is discussed. Finally, a simple tutorial example is used to illustrate the consideration.

2. Flow Graphs

A flow graph is a directed, acyclic, finite graph G = (N, B, ϕ), where N is a set of nodes, B ⊆ N × N is a set of directed branches, ϕ : B →R+ is a flow function and R+ is the set of non-negative reals. If ∈B then x is an input of y and y is an output of x. ),( yx If x∈N then I(x) is the set of all inputs of x and O(x) is the set of all outputs of x.

Input and output of a graph G are defined I(G) = {x∈N : I(x) = ∅}, O(G) = {x∈N : O(x) = ∅}.

Inputs and outputs of G are external nodes of G; other nodes are internal nodes of G. If ∈B then ϕ is a troughflow from x to y. We will assume in what follows that ),( yx

0) ≠y),( yx

,(xϕ for every ∈B. ),( yxWith every node x of a flow graph G we associate its inflow

∑∈

+ =)(

),()(xIy

xyx ϕϕ ,

and outflow

∑∈

− =)(

),()(xOy

yxx ϕϕ .

Similarly, we define an inflow and an outflow for the whole flow graph G, which are defined as

∑∈

−+ =)(

)()(GIx

xG ϕϕ

∑∈

+− =)(

)()(GOx

xG ϕϕ

We assume that for any internal node x, )()()( xxx ϕϕϕ == −+ , where )(xϕ is a troughflow of node x. Obviously, )()()( GGG ϕϕϕ == −+ , where )(Gϕ is a troughflow of graph G. The above formulas can be considered as flow conservation equations [1]. We will define now a normalized flow graph. A normalized flow graph is a directed, acyclic, finite graph G = (N, B, σ), where N is a set of nodes, B ⊆ N × N is a set of directed branches and σ : B → <0,1> is a normalized flow of

and ),( yx

)(),(),(

Gyxyx

ϕϕσ =

is strength of . Obviously, 0 ≤ σ ≤ 1. The strength of the branch expresses simply the percentage of a total flow through the branch.

),( yx ),( yx

In what follows we will use normalized flow graphs only, therefore by a flow graphs we will understand normalized flow graphs, unless stated otherwise.

38

With every node x alized inflow and outflow defined as

of a flow graph G we associate its norm

∑∈

++ ==

)(),(

)()(

)(xIy

xyGx

x σϕϕ

σ ,

.),( yxσ )( )(∈ xOyGϕ

any internal node x, we have )()()( xxx

)()( ∑−

− ==x

xϕ

σ

Obviously for σσσ == −+ , where )(xσ is a normalized troughflow of x.

Moreover, let

∑∈

−+

+ ==)(

)()()()(

GIx

xGGG σ

ϕϕ

σ ,

∑∈

+=)(

)()()(

GOx

xGG

σϕ

.

1

−ϕ− =)(Gσ

Obviously, )()()( === −+ GGG σσσ .

With every branch ),( yx of a flow graph G we associate the certainty and the coverage factors .

The certainty and the coverage of

3. Certainty and coverage factors

),( yx are defined as

)(),(

xyxcer

σ),( yxσ

= ,

and

),( yx)(yσ

respectively

)y,(xcov σ= .

, where 0)( ≠xσ and 0)( ≠yσ . onsequences of definitions given above are

presented: Below some properties, which are immediate c

1) ∑∈

=)(

1),(xOy

yxcer ,

2) ∑∈

=)(

1),(yIx

yxcov ,

3) ∑ ∑∈ ∈

==)( )(

),()(),()(xOy xOy

yxxyxercx σσσ ,

4) ∑ ∑∈ ∈

=)

)(),(yIx yIx

yyxcov σ=(

)(y σσ( )

),( yx ,

5) )(

)(),(),(x

yyxcovyxσ

σ=cer ,

39

6) )(

)(),(),(y

xyxceryxcovσ

σ= .

Obviously the above properties have a probabilistic flavor, e.g., equations (3) and (4) have a fo

e flow dis

A (directed) path from x to x1,…,xn such that x1 = x, xn = y and (xi, xi+1) ∈B for every to y is denoted by [x…y].

The certainty, the coverage 1… xn] are defined as

∏

rm of total probability theorem, whereas formulas (5) and (6) are Bayes’ rules. However, these properties in our approach are interpreted in a deterministic way and they describ

tribution among branches in the network. y, x ≠ y in G is a sequence of nodes i, 1 ≤ i ≤ n-1. A path from x

and the strength of the path [x

∏−

=+=

1

111 ),(][

n

iiin xxcerxxcer … ,

−

=+=

111 ),(][

iiin xxcovxxcov … ,

ely.

1n

σ [x…y] = σ (x) cer[x…y] = σ (y) y],

respectiv

cov[x…

The set of all paths fro y (x ≠ y) in G denoted >m x to < yx, , will be called a connection from x to y in G. In other words, connection >< yx, is a sub-graph of G determined by nodes x and y. For every connection < >yx, we define its certainty, coverage and strength as shown below:

∑>∈< yxyx ,][ ……

>< yx, is

=> yxcer yxcer ][, ,

the coverage of the connection

<

∑>∈< yxyx ,][ …

of the connection >< y

=>< yxcov yxcov ][, … ,

and the strength x, is

><>=< yxcovyyxcerx ,)(,)( σσ .

will be referred to as complete.

==>< ∑ yx yx ][, σσ …>∈< yxyx ,][ …

Let [x…y] be a path such that x and y are input and output of the graph G, respectively. Such a path

The set of all complete paths from x to y will be called a complete connection from x to y in G. In what follows we will consider complete paths and connections only, unless stated

Let x and y be an input and output of a graph G respectively. If we substitute for every complete connection >< y

otherwise.

x, in G a single branch ),( yx such ><= yxyx ,),( σσ , ><= yxceryxcer ,),( , ><= yxcovyxcov ,),( then we obtain a new flow graph G´ such

that )()( GG ′= σσ . The new flow graph will be called a combined flow graph. The combined resents a relationship between its inputs and outputs. flow graph for a given flow graph rep

40

4. Dependencies in flow graphs

Let ),( yx ∈B. Nodes x and y

σ ),( yx = σ (x) σ (y).

Consequently

are independent on each other if

)(),()(

yyxcerxσ

),( yx σσ== ,

and

).(),()(),( xyxcov

yyx σ

σσ

==

If

cer > σ (y),

imilarly, if

er ),( yx < σ (y), or

yx σ

then x and y depend negativelymetric ones, and

are analogous to that used in statistics.

),( yx

or

cov > σ (x), ),( yx

then x and y depend positively on each other. S

c

cov ),( < (x

on each other. Let us observe that relations of independency and dependencies are sym

),

For every (x ), y ∈B we define a dependency factor ),( yxη defined as

cer)(),()(),(

)(),()(),(),(

xyxcovxyxcov

yyxceryyxyx

σσ

σση

+−

=+−

= .

heck that if 0),(It is easy to c =yxη , then x and y are independent on each other, if <<− 0),(1 yxη , then x and y are negatively dependent and if 1),(0 yxη then x and y are

positively dependent on each other. <<

5. An Example

Now s by means of a we will illustrate ideas introduced in the previous sectionmple concerning votes distribution of various age groups and

simple xa social classes of voters etween political parties.

Consider three disjoint age groups of voters y1 (old), y2 (middle aged) and y3 (young) – belonging to three social classes x1 (high), x2 (middle) and x3 (low). The voters voted for four political parties z1 (Conservatives), z2 (Labor), z3 (Liberal Democrats) and z4 (others)

Social class and age group votes distribution is shown in Fig. 1.

eb

41

Fig. 1

First we want to find votes distribution with respect to age group. The result is shown in Fig.2.

Fig. 2

From the flow graph presented in Fig. 2 we can see that, e.g., party z1 obtained 19% of total votes, all of them from age group y1; party z2 – 44% votes, which 82% are from age group y2 and 18% – from age group y3, etc.

If we want to know how votes are distributed between parties with respects to social classes we have to eliminate age groups from the flow graph. Employing the algorithm presented in section 5 we get results shown in Fig. 3.

42

Fig. 3

From the flow graph presented in Fig. 3 we can see that party z1 obtained 22% votes from social class x1 and 78% − from social class x2, etc.

We can also present the obtained results employing decision algorithms. For simplicity we present only some decision rules of the decision algorithm. For example, from Fig.2 we obtain decision rules:

If Party (z1) then Age group (y1) (0.19)

If Party (z2) then Age group (y2) (0.36)

If Party (z2) then Age group (y3) (0.08), etc.

The number at the end of each decision rule denotes strength of the rule. Similarly, from Fig.3 we get:

If Party (z1) then Soc. class (x1) (0.04)

If Party (z1) then Soc. class (x2) (0.14), etc.

We can also invert decision rules and, e.g., from Fig. 3 we have:

If Soc. class (x1) then Party (z1) (0.04)

If Soc. class (x1) then Party (z2) (0.02)

If Soc. class (x1) then Party (z3) (0.04), etc

From the examples given above one can easily see the relationship between the role of modus ponens and modus tollens in logical reasoning and using flow graphs in reasoning about data.

Dependencies between Social class and Parties are shown in Fig. 4.

43

Fig.4

9. Summary

In this paper we have shown a new mathematical model of flow networks, which can be used to decision algorithm analysis. In particular it has been revealed a new interpretation of Bayes’ theorem, where the theorem has entirely deterministic meaning, and can be used to decision algorithm study. Besides, a new look of dependencies in databases, based on Łukasiewicz’s ideas of independencies of logical formulas, is presented.

References

[1] L. R. Ford, D. R. Fulkerson: Flows in Networks, Princeton University Press, Princeton, New Jersey

[2] J. Łukasiewicz: Die logishen Grundlagen der Wahrscheinilchkeitsrechnung. Kraków (1913), in: L. Borkowski (ed.), Jan Łukasiewicz – Selected Works, North Holland Publishing Company, Amsterdam, London, Polish Scientific Publishers, Warsaw, 1970

[3] Z. Pawlak: Flow graphs and decision algorithms, in: G. Wang, Q. Liu, Y. Y. Yao, A. Skowron (eds.), Proceedings of the Ninth International Conference on Rough Sets, Fuzzy Sets, Data Mining and Granular Computing RSFDGrC’2003), Chongqing, China, May 26-29, 2003, LNAI 2639, Springer-Verlag, Berlin, Heidelberg, New York, 1-11

44

Chapter 5

Rough Sets and Conflict Analysis

1. Introduction

Conflict analysis and resolution play an important role in business, governmental, political and lawsuits disputes, labor-management negotiations, military operations and others. To this end many mathematical formal models of conflict situations have been proposed and studied, e.g., [1-6].

Various mathematical tools, e.g., graph theory, topology, differential equations and others, have been used to that purpose.

Needless to say that game theory can be also considered as a mathematical model of conflict situations.

In fact there is no, as yet, “universal” theory of conflicts and mathematical models of conflict situations are strongly domain dependent.

We are going to present in this paper still another approach to conflict analysis, based on some ideas of rough set theory − along the lines proposed in [5]. We will illustrate the proposed approach by means of a simple tutorial example of voting analysis in conflict situations.

The considered model is simple enough for easy computer implementation and seems adequate for many real life applications but to this end more research is needed.

2. Basic concepts of conflict theory

In this section we give after [5] definitions of basic concepts of the proposed approach. Let us assume that we are given a finite, non-empty set U called the universe. Elements of

U will be referred to as agents. Let a function v :U →{-1, 0, 1}, or in short {-, 0, +}, be given assigning to every agent the number -1, 0 or 1, representing his opinion, view, voting result, etc. about some discussed issue, and meaning against, neutral and favorable, respectively.

The pair S = (U, v) will be called a conflict situation. In order to express relations between agents we define three basic binary relations on the

universe: conflict, neutrality and alliance. To this end we first define the following auxiliary function:

.1)()( if ,1

, and 0)()( if ,0,or 1)()( if ,1

),(

−=−≠=

===

yvxvyxyvxv

yxyvxvyxvφ

This means that, if v 1),( =yxφ , agents x and y have the same opinion about issue v (are allied on v); if 0),( =yxvφ means that at least one agent x or y has neutral approach to issue a (is

45

neutral on a), and if 1),( −=yxvφ , means that both agents have different opinions about issue v (are in conflict on v).

),( xyRv+

),( zyRv+

),( xyRv−

),( zyRv−

),( zyRv+

),( xy

1),( −=Φ yxv−v

In what follows we will define three basic relations , and on U2 called alliance, neutrality and conflict relations respectively, and defined as follows:

+vR 0

vR −vR

1),( iff ),( =+ yxyxR vv φ ,

0),( iff ),(0 =yxyxR vv φ ,

1),( iff ),( −=− yxyxR vv φ .

It is easily seen that the alliance relation has the following properties:

(i) , ),( xxRv+

(ii) implies , ),( yxRv+

(iii) and implies , ),( yxRv+ ),( zxRv

+

i.e., is an equivalence relation. Each equivalence class of alliance relation will be called coalition with respect to v. Let us note that the condition (iii) can be expressed as “a friend of my friend is my friend”.

+vR

For the conflict relation we have the following properties:

(iv) not , ),( xxRv−

(v) implies , ),( yxRv−

(vi) and implies , ),( yxRv− ),( zxRv

+

(vii) and implies . ),( yxRv− ),( zxRv

−

Conditions (vi) and (vii) refer to well known sayings “an enemy of my enemy is my friend” and “a friend of my enemy is my enemy”.

For the neutrality relation we have:

(viii) not , ),(0 xxRv

(ix) . ),( 00 RyxR vv =

Let us observe that in the conflict and neutrality relations there are no coalitions. The following property holds: because if ( then 20 URRR vvv =∪∪ −+ 2), Uyx ∈ 1),( =Φ yxv

or or so r ( or . All the three relations , , are pairwise disjoint, i.e., every pair of objects (x, y) belongs to exactly one of the above defined relations (is in conflict, is allied or is neutral).

0),( =Φ yxv+vR 0

vR

+∈ vRyx ),( o 0), vRyx ∈ −vR∈yx ),(

R

With every conflict situation we will associate a conflict graph G . ),,( 0 −+= vvvS RRR An example of a conflict graph is shown in Fig. 1.

46

Fig. 1

Solid lines are denoting conflicts, doted line − alliance, and neutrality, for simplicity, is not shown explicitly in the graph. Of course, B, C, and D form a coalition.

3. An example

In this section we will illustrate the above presented ideas by means of a very simple tutorial example using concepts presented in the previous.

Table 1 presents a decision table in which the only condition attribute is Party, whereas the decision attribute is Voting. The table describes voting results in a parliament containing 500 members grouped in four political parties denoted A, B, C and D. Suppose the parliament discussed certain issue (e.g., membership of the country in European Union) and the voting result is presented in column Voting, where +, 0 and − denoted yes, abstention and no respectively. The column support contains the number of voters for each option.

Fact Party Voting Support1 A + 200 2 A 0 30 3 A − 10 4 B + 15 5 B − 25 6 C 0 20 7 C − 40 8 D + 25 9 D 0 35 10 D − 100

Table 1

The strength, certainty and the coverage factors for Table 1 are given in Table 2.

47

Fact Strengt

h Certaint

y Coverage

2 0.06 0.125 0.353 3 0.02 0.042 0.057 4 0.03 0.375 0.063 5 0.05 0.625 0.143 6 0.04 0.333 0.235 7 0.08 0.667 0.229 8 0.05 0.156 0.104 9 0.07 0.219 0.412 10 0.20 0.625 0.571

1 0.40 0.833 0.833

Table 2

From the certainty factors we can conclude, for example, that:

• 83.3% of party A voted yes •

From the coverage factors we can get, for example, the following explanation of voting

• 83.3% yes votes came from party A •

The flow graph associated with Table 2 is shown in Fig. 2.

12.5% of party A abstained • 4.2% of party A voted no

results:

6.3% yes votes came from party B • 10.4% yes votes came from party C

Branches of the flow graph represent decision rules together with their certainty and co

Fig. 2

verage factors. For example, the decision rule A → 0 has the certainty and coverage factors 0.125 and 0.353, respectively.

48

The flow graph gives a clear insight into the voting structure of all parties. For many applications exact values of certainty of coverage factors of decision rules are

not necessary. To this end we introduce “approximate” decision rules, denoted C⇒D and read “C mostly implies D”. C⇒D if and only if cer(C, D) > 0.5. Thus we can replace flow graph shown in Fig. 2 by "approximate” flow graph presented in Fig. 3.

Fig. 3

From this graph we can see that parties B, C and D form a coalition, which is in conflict with party A, i.e., every member of the coalition is in conflict with party A. The corresponding conflict graph is shown in Fig. 4.

Fig. 4

Moreover from the flow graph shown in Figure 2 we can obtain an “inverse” approximate flow graph which is shown in Fig. 5.

Fig. 5

49

This flow graph contains all inverse decision rules with certainty factor greater than 0.5. From this graph we can see that yes votes were obtained mostly from party A and no votes − mostly from party D.

We can also compute dependencies between parties and voting results the results are shown in Fig. 6.

Fig. 6

4. Summary

It is shown that with any conflict situation a flow graph can be associated. Flow distribution in the graph can be used to study the relationship between agents involved in the conflict.

References

[1] J. L. Casti: Alternative Realities − Mathematical Models of Nature and Man, Wiley, New York, 1989

[2] C. H. Coombs, G. S. Avruin: The Structure of Conflicts, Lawrence Erlbaum, London, 1988

[3] R. Deja: Conflict analysis, rough set methods and applications, in: L. Polkowski, S Tsumoto, T.Y. Lin (eds.), Studies in Fuzzyness and Soft Computing, Physica-Verlag, A Springer-Verlag Company, 2000, 491-520

[4] Y. Maeda, K. Senoo, H. Tanaka: Interval density function in conflict analysis, in: N. Zhong, A. Skowron, S. Ohsuga, (eds.), New Directions in Rough Sets, Data Mining and Granular-Soft Computing, Springer, 1999, 382-389

[5] A. Nakamura: Conflict Logic with Degrees, Rough Fuzzy Hybridization − A New Trend in Decison-Making, (S. K. Pal, A. Skowron, eds.), Springer, 1999, 136-150

[6] Z. Pawlak: An inquiry into anatomy of conflicts, Journal of Information Sciences 109, 1998, 65-68

50

51

[7] F. Roberts: Discrete Mathematical Models with Applications to Social, Biological and Environmental Problems, Englewood Clifs, 1976, Prince Hall.

Documents

Sets, fuzzy sets and rough sets - York University...Lotfi Zadeh proposed completely new, elegant approach to vagueness called fuzzy set theory oach an element can belong to a set to