Data classification based on tolerant rough set
reporter: yanan yean
Abstract
• The similarity between two data objects is described by a distance function over all constituent attributes.
• Optimal similarity threshold values
– Determined by a genetic algorithm (GA)
• Two-stage classification method
– Lower approximation
– Rough membership functions obtained from the upper approximation
• BPNN, OFUNN, FCM
Outline
• Introduction
• Tolerant rough set
• Determination of similarity thresholds
• Data classification based on the tolerant rough set
• Simulation results and discussion
• Conclusion
1.Introduction
• Carpenter and Grossberg
– Fuzzy adaptive resonance theory (ART)
• Lin and Lee
– A general neural-network model for fuzzy logic control and decision systems
• Simpson
– A fuzzy min-max classification neural network
• Banzan et al.
– Multi-modal logics for automatic feature extraction
– Rough-set-based inductive reasoning for discovering an optimal feature set
• Nguyen et al.
– The tolerance relation among the objects for pattern classification
2.Tolerant rough set
• Objects that cannot be told apart by the given attributes are related by an indiscernibility relation I.
• A tolerance relation satisfies only the reflexive and symmetric properties; transitivity is not required.
• A tolerance set TS(x): the set of objects that are tolerant with an object x
• Define a similarity measure that quantifies the closeness between attribute values of objects.
– t(a) is a similarity threshold value
• The tolerance relation is tied to the similarity measure through the threshold: x and y are tolerant with respect to an attribute a when their similarity on a is at least t(a).
• One of the most important tasks in data classification using the similarity measure defined above is determining the optimal similarity thresholds.
• The GA is applied to solve this optimization problem.
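The definitions in this section can be sketched in code. The sketch below is an illustration under stated assumptions, not the formulas from the slides (which are not reproduced in this text): the similarity on one attribute is assumed to be 1 minus the absolute difference divided by the attribute's value range, and two objects are assumed tolerant when every attribute passes its own threshold t(a) and the mean similarity passes t(A). The names `similarity`, `is_tolerant`, and `tolerance_set` are hypothetical.

```python
from typing import List, Sequence


def similarity(x_val: float, y_val: float, value_range: float) -> float:
    """ASSUMED per-attribute similarity: 1 - normalized absolute distance."""
    if value_range == 0:
        return 1.0  # a constant attribute cannot separate two objects
    return 1.0 - abs(x_val - y_val) / value_range


def is_tolerant(x: Sequence[float], y: Sequence[float],
                ranges: Sequence[float],
                t_attr: Sequence[float], t_all: float) -> bool:
    """ASSUMED combination rule: each attribute passes t(a), and the
    mean similarity over all attributes passes t(A)."""
    sims = [similarity(a, b, r) for a, b, r in zip(x, y, ranges)]
    per_attribute_ok = all(s >= t for s, t in zip(sims, t_attr))
    overall_ok = sum(sims) / len(sims) >= t_all
    return per_attribute_ok and overall_ok


def tolerance_set(i: int, data: Sequence[Sequence[float]],
                  ranges: Sequence[float],
                  t_attr: Sequence[float], t_all: float) -> List[int]:
    """Indices of all objects tolerant with object i (includes i itself,
    because the relation is reflexive)."""
    return [j for j, row in enumerate(data)
            if is_tolerant(data[i], row, ranges, t_attr, t_all)]


# Tiny illustrative table: three objects, two attributes scaled to [0, 1].
data = [[0.0, 0.0], [0.1, 0.1], [1.0, 1.0]]
ranges = [1.0, 1.0]
ts0 = tolerance_set(0, data, ranges, t_attr=[0.8, 0.8], t_all=0.85)
```

Under these assumptions the relation is reflexive and symmetric by construction, but not transitive: raising or lowering the thresholds shrinks or grows each tolerance set.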
3.Determination of similarity thresholds
3-1. Chromosome representation
• Inputs: the information table and the similarity measure
• Output: a set of optimal similarity threshold values
• An object is represented by n attributes
• The chromosome for the GA consists of n+1 consecutive real numbers: the n per-attribute similarity thresholds t(ai) followed by t(A)
• t(A): the similarity threshold that defines the tolerance relation when all attributes A are considered together
3-2. Initial population generation
• The initial gene values in each chromosome are obtained by generating n+1 real-valued random numbers in the interval [0.5, 1.0].
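A minimal sketch of this initialization step follows. The population size, the seed parameter, and the function name `initial_population` are assumptions added for illustration; only the gene count (n+1) and the interval [0.5, 1.0] come from the slide.

```python
import random
from typing import List, Optional


def initial_population(n_attributes: int, pop_size: int,
                       low: float = 0.5, high: float = 1.0,
                       seed: Optional[int] = None) -> List[List[float]]:
    """Create pop_size chromosomes, each holding n+1 thresholds:
    one per attribute plus one for the full attribute set A."""
    rng = random.Random(seed)  # local generator keeps runs reproducible
    return [[rng.uniform(low, high) for _ in range(n_attributes + 1)]
            for _ in range(pop_size)]


# Example: 4 attributes, so every chromosome carries 5 genes.
population = initial_population(n_attributes=4, pop_size=10, seed=0)
```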
3-3. Fitness function
• If two objects x and y are tolerant, we say that there is a connection between them.
• When two objects are tolerant and contained in the same class, they have a good connection.
• Objects that are tolerant of each other should be included in the same class as far as possible.
• The quality of approximation of classification expresses the ratio of all classified objects to all objects.
• di: a set of objects contained in the same class
• An object x counts as classified when all elements of its tolerance set TS(x) are contained in the same class di
• This coefficient depends on the size of the tolerant sets, which is set by the similarity thresholds.
• The ratio of good connections
– Expresses the ratio of good connections to all possible connections
– It also depends on the size of the tolerant sets and therefore on the similarity thresholds
• The fitness function F balances the two coefficients.
• The first term pushes tolerant objects to be contained in the same class.
• The second term pushes the objects in the same class to be tolerant.
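The two coefficients and their combination can be sketched as below. The exact formulas are not reproduced in this text, so the sketch rests on assumptions: "all possible connections" is read as all pairs of objects in the same class, and F is taken to be a weighted sum with a hypothetical `weight` parameter. The function names are illustrative only.

```python
from itertools import combinations
from typing import Hashable, Sequence, Set


def quality_of_approximation(tol_sets: Sequence[Set[int]],
                             labels: Sequence[Hashable]) -> float:
    """Fraction of objects whose whole tolerance set lies in one class."""
    classified = sum(
        1 for i, ts in enumerate(tol_sets)
        if all(labels[j] == labels[i] for j in ts)
    )
    return classified / len(labels)


def good_connection_ratio(tol_sets: Sequence[Set[int]],
                          labels: Sequence[Hashable]) -> float:
    """ASSUMPTION: good connections divided by all same-class pairs."""
    same_class = [(i, j) for i, j in combinations(range(len(labels)), 2)
                  if labels[i] == labels[j]]
    if not same_class:
        return 0.0
    good = sum(1 for i, j in same_class if j in tol_sets[i])
    return good / len(same_class)


def fitness(tol_sets: Sequence[Set[int]], labels: Sequence[Hashable],
            weight: float = 0.5) -> float:
    """ASSUMED form of F: weighted sum of the two coefficients."""
    return (weight * quality_of_approximation(tol_sets, labels)
            + (1.0 - weight) * good_connection_ratio(tol_sets, labels))


labels = ["a", "a", "b"]
clean = [{0, 1}, {0, 1}, {2}]      # tolerance sets never cross classes
mixed = [{0, 1, 2}, {0, 1}, {0, 2}]  # objects 0 and 2 are tolerant
```

With thresholds that are too low, tolerance sets grow and cross class boundaries, which lowers the first coefficient; with thresholds that are too high, same-class objects stop being tolerant, which lowers the second. A weighted sum is one simple way to trade these off.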
3-4. Genetic operations
• Reproduction
– First selection method: F
– Second selection method: a modified k-tournament method
• F, k chromosomes are chosen at random from the upper class of fitness values => reproduction
– Chromosomes: C1, C2 => Cc+m
• Crossover
– Parents: (C1, t1(ai), F1) and (C2, t2(ai), F2)
– The new chromosome Cc created by the crossover operation is computed from the parents' thresholds as an average weighted by fitness value.
• Mutation
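The crossover step can be sketched as follows, reading "an average weighted by fitness value" as tc(ai) = (F1·t1(ai) + F2·t2(ai)) / (F1 + F2). This reading is an assumption, since the slide's formula is not reproduced here; the function name `crossover` and the fallback for zero total fitness are also hypothetical.

```python
from typing import List, Sequence


def crossover(t1: Sequence[float], f1: float,
              t2: Sequence[float], f2: float) -> List[float]:
    """Child thresholds as the fitness-weighted average of two parents."""
    total = f1 + f2
    if total == 0:
        # ASSUMPTION: fall back to a plain average when both parents
        # have zero fitness, to avoid dividing by zero.
        return [(a + b) / 2.0 for a, b in zip(t1, t2)]
    return [(f1 * a + f2 * b) / total for a, b in zip(t1, t2)]


# The fitter parent (F2 = 3.0) pulls the child towards its thresholds.
child = crossover([0.6, 0.8], 1.0, [0.8, 0.6], 3.0)
```

Because the child is a convex combination of the parents, each of its thresholds stays between the two parental values, so it never leaves the interval the parents occupy.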
4.Data classification based on the tolerant rough set
• We define a rough membership function μdi(x)
– Expresses the degree of inclusion of the sample x in the decision class di
• 1st stage: classification using the lower approximation set
– Based on the tolerant set of a test sample x
• 2nd stage: classification using the upper approximation set
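A sketch of the two-stage procedure follows. It assumes the standard rough membership form, the share of the test sample's tolerant set that falls in class di, and it assumes the tolerant set is given as indices of training objects; the slides' own formulas are not reproduced in this text. The names `rough_membership` and `classify` and the handling of an empty tolerant set are hypothetical.

```python
from typing import Hashable, Optional, Sequence, Set


def rough_membership(tol_set: Set[int], labels: Sequence[Hashable],
                     cls: Hashable) -> float:
    """ASSUMED form: |TS(x) ∩ di| / |TS(x)|."""
    return sum(1 for j in tol_set if labels[j] == cls) / len(tol_set)


def classify(tol_set: Set[int],
             labels: Sequence[Hashable]) -> Optional[Hashable]:
    """Two-stage decision for one test sample x.

    tol_set holds the indices of training objects tolerant with x.
    """
    if not tol_set:
        return None  # ASSUMPTION: no tolerant neighbour, leave unclassified
    classes = {labels[j] for j in tol_set}
    if len(classes) == 1:
        # Stage 1: TS(x) lies inside one class (lower approximation).
        return next(iter(classes))
    # Stage 2: TS(x) overlaps several classes (upper approximation);
    # pick the class with the largest rough membership value.
    # Sorting by string form makes ties resolve deterministically.
    return max(sorted(classes, key=str),
               key=lambda c: rough_membership(tol_set, labels, c))


train_labels = ["a", "a", "b", "b", "b"]
stage1 = classify({0, 1}, train_labels)
stage2 = classify({0, 2, 3}, train_labels)
```

Stage 1 gives a certain decision whenever it applies; stage 2 is the fallback for samples near class boundaries, where the membership value also serves as a confidence score.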
5.Simulation results and discussion