2010 / 03 / 17 Yi - Xian Lin 1
A Fuzzy Self-Constructing Feature Clustering Algorithm for Text
Classification
Jung-Yi Jiang, Ren-Jia Liou, and Shie-Jue Lee
Accepted by IEEE Transactions on Knowledge and Data Engineering
Reporter: Yi-Xian Lin
National University of Tainan
Outline
• Motivation & Objective
• Feature Reduction
• Feature Clustering
• Fuzzy Feature Clustering
• Text Classification
• Experimental results
• Advantages
Motivation & Objective
• In text classification, the dimensionality of the feature vector is
usually huge
• Problems with existing feature clustering methods
  ◦ The desired number of extracted features has to be specified in advance
  ◦ When calculating similarities, the variance of the underlying cluster is
not considered
• How can the dimensionality of feature vectors for text classification be
reduced so that classification runs faster?
Feature Reduction
• Purpose
  ◦ Reduce the classifier's computation load
  ◦ Increase data consistency
• Techniques
  ◦ Eliminate redundant data
  ◦ Find representative data
  ◦ Reduce the dimensions of the feature sets
  ◦ Find the set of vectors that best separates the patterns
• Two ways of doing feature reduction: feature selection
and feature extraction
Feature Reduction
• Feature selection
  ◦ Let the word set $W = \{w_1, w_2, \ldots, w_m\}$ be the feature vector of the
document set
  ◦ Find a new word set $W' = \{w'_1, w'_2, \ldots, w'_k\}$, $k < m$
  ◦ Then $W'$ is used as input for classification tasks
• Feature extraction
  ◦ Extracted features are obtained by a projecting process through
algebraic transformations
  ◦ Let a corpus of documents be represented as an $m \times n$ matrix $X \in R^{m \times n}$
  ◦ Find an optimal transformation matrix $F^* \in R^{m \times k}$
Feature Clustering
• Feature clustering is an efficient approach for feature reduction
• Groups all features into some clusters where features in a
cluster are similar to each other
• Let D be the matrix consisting of all the original documents
with m features and D’ be the matrix consisting of the
converted documents with new k features
• The new feature set $W' = \{w'_1, w'_2, \ldots, w'_k\}$ corresponds to a partition
{W1,W2,…,Wk} of the original feature set W
Fuzzy Feature Clustering
• A document set D of n documents d1,d2,...,dn
• Feature vector W of m words w1,w2,...,wm
• p classes c1,c2,...,cp
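• Construct one word pattern for each word in W:
$x_i = \langle x_{i1}, x_{i2}, \ldots, x_{ip} \rangle = \langle P(c_1|w_i), P(c_2|w_i), \ldots, P(c_p|w_i) \rangle$
where
$P(c_j|w_i) = \frac{\sum_{q=1}^{n} d_{qi} \times \delta_{qj}}{\sum_{q=1}^{n} d_{qi}}$, for $1 \le j \le p$
Here $d_{qi}$ is the number of occurrences of $w_i$ in document $d_q$, and $\delta_{qj}$ is 1 if $d_q$ belongs to class $c_j$ and 0 otherwise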
Fuzzy Feature Clustering
$x_6 = \langle P(c_1|w_6), P(c_2|w_6) \rangle$
$P(c_2|w_6) = \frac{1 \times 0 + 2 \times 0 + 0 \times 0 + 1 \times 0 + 1 \times 1 + 1 \times 1 + 1 \times 1 + 1 \times 1 + 0 \times 1}{1 + 2 + 0 + 1 + 1 + 1 + 1 + 1 + 0} = 0.50$
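The word-pattern construction can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `word_patterns` and the list-of-lists layout are assumptions, and the data is the single word w6 from the worked example, with the first four documents taken to be in c1 and the last five in c2, as the 0/1 factors in the numerator suggest.

```python
def word_patterns(counts, labels, num_classes):
    """Return one pattern <P(c_1|w_i), ..., P(c_p|w_i)> per word.

    counts[q][i] is d_qi, the number of occurrences of word i in document q.
    labels[q] is the 0-based class index of document q, so delta_qj is
    simply (labels[q] == j).
    """
    num_words = len(counts[0])
    patterns = []
    for i in range(num_words):
        total = sum(doc[i] for doc in counts)
        pattern = []
        for j in range(num_classes):
            in_class = sum(doc[i] for doc, c in zip(counts, labels) if c == j)
            # Assumption: a word that never occurs gets an all-zero pattern.
            pattern.append(in_class / total if total else 0.0)
        patterns.append(pattern)
    return patterns


# Occurrences of w6 in nine documents, read off the worked example.
w6_counts = [[1], [2], [0], [1], [1], [1], [1], [1], [0]]
doc_labels = [0, 0, 0, 0, 1, 1, 1, 1, 1]
x6 = word_patterns(w6_counts, doc_labels, num_classes=2)[0]
print(x6)  # [0.5, 0.5]
```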
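Fuzzy Feature Clustering
• Let G be a cluster containing q word patterns x1,x2,...,xq
• Let $x_j = \langle x_{j1}, x_{j2}, \ldots, x_{jp} \rangle$, $1 \le j \le q$
• The mean $m = \langle m_1, m_2, \ldots, m_p \rangle$, with $m_i = \frac{\sum_{j=1}^{q} x_{ji}}{|G|}$
• The deviation $\sigma = \langle \sigma_1, \sigma_2, \ldots, \sigma_p \rangle$, with $\sigma_i = \sqrt{\frac{\sum_{j=1}^{q} (x_{ji} - m_i)^2}{|G|}}$, for $1 \le i \le p$
• The fuzzy similarity of a word pattern x to cluster G:
$\mu_G(x) = \prod_{i=1}^{p} \exp\left[ -\left( \frac{x_i - m_i}{\sigma_i} \right)^2 \right]$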
Fuzzy Feature Clustering
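• A word pattern close to the mean of a cluster is regarded to
be very similar to this cluster, i.e., $\mu_G(x) \approx 1$
• Suppose m1 = < 0.4, 0.6 > , σ1 = < 0.3 , 0.5 > , and x2 = < 0.2 , 0.8 >. Then
$\mu_{G_1}(x_2) = \exp\left[ -\left( \frac{0.2 - 0.4}{0.3} \right)^2 \right] \times \exp\left[ -\left( \frac{0.8 - 0.6}{0.5} \right)^2 \right] = 0.6412 \times 0.8521 = 0.5464$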
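The membership computation is a product of one Gaussian-like factor per class dimension. A minimal Python sketch follows; the function name `membership` is an assumption for illustration, and the numbers are the ones from the example above.

```python
import math


def membership(x, mean, sigma):
    """Fuzzy similarity of pattern x to a cluster with the given mean and deviation."""
    result = 1.0
    for xi, mi, si in zip(x, mean, sigma):
        # One factor exp(-((x_i - m_i) / sigma_i)^2) per dimension.
        result *= math.exp(-(((xi - mi) / si) ** 2))
    return result


mu = membership([0.2, 0.8], [0.4, 0.6], [0.3, 0.5])
print(round(mu, 4))  # 0.5464
```

A pattern equal to the cluster mean gives a membership of exactly 1, which is the "very similar" case described on the slide.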
Fuzzy Feature Clustering
• A predefined threshold ρ, $0 \le \rho \le 1$
• If $\mu_{G_j}(x_i) \ge \rho$, xi passes the similarity test on cluster Gj
• If the user intends to have larger clusters, give a smaller
threshold
• Two cases may occur
  ◦ No existing fuzzy clusters on which xi has passed the similarity test
  ◦ Create a new cluster Gh , h = k + 1 ( k is the number of currently
existing clusters) , with $m_h = x_i$ and $\sigma_h = \sigma_0$
  ◦ $\sigma_0 = \langle \sigma_0, \ldots, \sigma_0 \rangle$ is a user-defined constant vector
Fuzzy Feature Clustering
• If there are existing clusters on which xi has passed the
similarity test, let cluster Gt be the cluster with the largest
membership degree, $t = \arg\max_{1 \le j \le k} \mu_{G_j}(x_i)$
• Modification to cluster Gt
$m_{tj} = \frac{S_t \times m_{tj} + x_{ij}}{S_t + 1}$
$\sigma_{tj} = \sqrt{A - B} + \sigma_0$
$A = \frac{(S_t - 1)(\sigma_{tj} - \sigma_0)^2 + S_t \times m_{tj}^2 + x_{ij}^2}{S_t}$
$B = \frac{S_t + 1}{S_t} \left( \frac{S_t \times m_{tj} + x_{ij}}{S_t + 1} \right)^2$
for $1 \le j \le p$, and $S_t = S_t + 1$
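The two cases (create a new cluster, or update the best-matching one) can be combined into a single feeding step. The sketch below is illustrative only: the dict layout, the names `feed` and `membership`, and the clamp on A − B are assumptions, and it assumes the right-hand sides of the update formulas use the values of the mean, deviation, and size from before the update. It uses σ0 = 0.5 and ρ = 0.64, the settings from the slides' example.

```python
import math


def membership(x, mean, sigma):
    """Product of exp(-((x_i - m_i) / sigma_i)^2) over all dimensions."""
    return math.prod(math.exp(-(((xi - mi) / si) ** 2))
                     for xi, mi, si in zip(x, mean, sigma))


def feed(clusters, x, rho, sigma0):
    """Feed one word pattern x and return the index of the cluster it joins.

    clusters is a list of dicts with keys "S" (size), "m" (mean), "sigma".
    """
    scores = [membership(x, c["m"], c["sigma"]) for c in clusters]
    passed = [j for j, s in enumerate(scores) if s >= rho]
    if not passed:
        # Case 1: no cluster passes the similarity test, so create G_{k+1}.
        clusters.append({"S": 1, "m": list(x), "sigma": [sigma0] * len(x)})
        return len(clusters) - 1
    # Case 2: update the passing cluster with the largest membership degree.
    t = max(passed, key=lambda j: scores[j])
    c = clusters[t]
    S = c["S"]
    for j, xj in enumerate(x):
        m_old, s_old = c["m"][j], c["sigma"][j]
        m_new = (S * m_old + xj) / (S + 1)
        A = ((S - 1) * (s_old - sigma0) ** 2 + S * m_old ** 2 + xj ** 2) / S
        B = (S + 1) / S * m_new ** 2
        # Assumption: clamp at 0 so rounding cannot make the radicand negative.
        c["sigma"][j] = math.sqrt(max(A - B, 0.0)) + sigma0
        c["m"][j] = m_new
    c["S"] = S + 1
    return t


clusters = []
feed(clusters, [1.0, 0.0], rho=0.64, sigma0=0.5)  # creates the first cluster
feed(clusters, [1.0, 0.0], rho=0.64, sigma0=0.5)  # identical pattern: updates it
print(clusters)  # [{'S': 2, 'm': [1.0, 0.0], 'sigma': [0.5, 0.5]}]
```

Because the number of clusters only grows when a pattern fails every similarity test, the number of extracted features is determined by ρ and the data rather than being fixed in advance.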
Fuzzy Feature Clustering
• The order in which the word patterns are fed in influences the
clusters obtained
• Sort all the patterns, in decreasing order, by their largest
components
  ◦ Let x1 = < 0.1 , 0.3 , 0.6 > , x2 = < 0.3, 0.3, 0.4 > , x3 = < 0.8, 0.1, 0.1 >
  ◦ The largest components in these word patterns are 0.6, 0.4, and 0.8
  ◦ The sorted list is 0.8, 0.6, 0.4
  ◦ The order of feeding is x3, x1, x2
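The feeding order is a plain sort on each pattern's largest component. A short Python sketch with the three patterns from the example (the dict of named patterns is just an illustrative layout):

```python
patterns = {
    "x1": [0.1, 0.3, 0.6],
    "x2": [0.3, 0.3, 0.4],
    "x3": [0.8, 0.1, 0.1],
}

# Sort pattern names by their largest component, in decreasing order.
order = sorted(patterns, key=lambda name: max(patterns[name]), reverse=True)
print(order)  # ['x3', 'x1', 'x2']
```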
Fuzzy Feature Clustering
• The order of feeding : x5, x7, x10, x1, x4, x9, x2, x3, x8, x6
• No clusters exist at the beginning , k = 0
• Set σ0 = 0.5 , ρ=0.64
• Create G1
cluster   Size S   mean m             deviation σ
G1        1        < 1.00 , 0.00 >    < 0.5 , 0.5 >
Fuzzy Feature Clustering
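• Feeding : x7 , μG1(x7) = 1 > ρ
$m_{11} = \frac{1 \times 1.00 + 1.00}{1 + 1} = 1.00$, $m_{12} = \frac{1 \times 0.00 + 0.00}{1 + 1} = 0.00$
$m_1 = \langle 1.00, 0.00 \rangle$
$A_{11} = \frac{(1 - 1)(0.5 - 0.5)^2 + 1 \times 1.00^2 + 1.00^2}{1}$, $B_{11} = \frac{1 + 1}{1} \left( \frac{1 \times 1.00 + 1.00}{1 + 1} \right)^2$
$A_{12} = \frac{(1 - 1)(0.5 - 0.5)^2 + 1 \times 0.00^2 + 0.00^2}{1}$, $B_{12} = \frac{1 + 1}{1} \left( \frac{1 \times 0.00 + 0.00}{1 + 1} \right)^2$
$\sigma_{11} = \sqrt{A_{11} - B_{11}} + 0.5 = 0.5$, $\sigma_{12} = \sqrt{A_{12} - B_{12}} + 0.5 = 0.5$
$\sigma_1 = \langle 0.5, 0.5 \rangle$, $S_1 = 1 + 1 = 2$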
Fuzzy Feature Clustering
• After self-constructing clustering
• Similarities of patterns to clusters
Fuzzy Feature Clustering
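• Data transformation: $D' = DT$, where
$D = [d_1\ d_2\ \cdots\ d_n]^T$, $D' = [d'_1\ d'_2\ \cdots\ d'_n]^T$,
and $T = [t_{ij}]$ is the $m \times k$ weighting matrix
• H-FFC (hard weighting)
  ◦ each word is allowed to belong to only one cluster, and so it
contributes to only one new extracted feature
$t_{ij} = \begin{cases} 1, & \text{if } j = \arg\max_{1 \le \alpha \le k} \mu_{G_\alpha}(x_i) \\ 0, & \text{otherwise} \end{cases}$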
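As a rough illustration of hard weighting and the transformation D' = DT, the sketch below builds T from cluster memberships and multiplies it with a small document matrix. The function names, the (mean, deviation) tuple layout, and the toy data are all hypothetical; ties in the arg max are broken by taking the first cluster, which is an assumption.

```python
import math


def membership(x, mean, sigma):
    """Product of exp(-((x_i - m_i) / sigma_i)^2) over all dimensions."""
    return math.prod(math.exp(-(((xi - mi) / si) ** 2))
                     for xi, mi, si in zip(x, mean, sigma))


def hard_weights(patterns, clusters):
    """T[i][j] = 1 if cluster j has the largest membership for word i, else 0."""
    T = []
    for x in patterns:
        scores = [membership(x, m, s) for m, s in clusters]
        best = scores.index(max(scores))  # first cluster wins on ties
        T.append([1.0 if j == best else 0.0 for j in range(len(clusters))])
    return T


def transform(D, T):
    """Matrix product D' = D T, with matrices stored as lists of rows."""
    return [[sum(d[i] * T[i][j] for i in range(len(T)))
             for j in range(len(T[0]))] for d in D]


# Hypothetical toy data: three words, two clusters given as (mean, deviation).
patterns = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]]
clusters = [([1.0, 0.0], [0.5, 0.5]), ([0.0, 1.0], [0.5, 0.5])]
T = hard_weights(patterns, clusters)
D = [[2, 1, 0], [0, 0, 3]]  # two documents, counts of the three words
print(transform(D, T))  # [[3.0, 0.0], [0.0, 3.0]]
```

With hard weighting, each new feature is simply the sum of the counts of the words assigned to that cluster.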
Fuzzy Feature Clustering
H-FFC :
Fuzzy Feature Clustering
• S-FFC (soft weighting)
  ◦ each word is allowed to contribute to all new extracted features,
with the degrees depending on the values of the membership
functions: $t_{ij} = \mu_{G_j}(x_i)$
• M-FFC (mixed weighting)
  ◦ a combination of the hard-weighting approach and the
soft-weighting approach: $t_{ij} = \gamma \times t^H_{ij} + (1 - \gamma) \times t^S_{ij}$
  ◦ γ is a user-defined constant lying between 0 and 1
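The three weighting schemes differ only in how one row of T is filled, so a single function parameterized by γ can sketch all of them: γ = 1 reduces to hard weighting and γ = 0 to soft weighting. The function names and toy values below are hypothetical.

```python
import math


def membership(x, mean, sigma):
    """Product of exp(-((x_i - m_i) / sigma_i)^2) over all dimensions."""
    return math.prod(math.exp(-(((xi - mi) / si) ** 2))
                     for xi, mi, si in zip(x, mean, sigma))


def mixed_weights(patterns, clusters, gamma):
    """Rows of T with t_ij = gamma * t^H_ij + (1 - gamma) * t^S_ij."""
    T = []
    for x in patterns:
        soft = [membership(x, m, s) for m, s in clusters]       # S-FFC weights
        best = soft.index(max(soft))
        hard = [1.0 if j == best else 0.0 for j in range(len(soft))]  # H-FFC
        T.append([gamma * h + (1 - gamma) * s for h, s in zip(hard, soft)])
    return T


# Hypothetical toy data: two words, two clusters given as (mean, deviation).
patterns = [[0.9, 0.1], [0.1, 0.9]]
clusters = [([1.0, 0.0], [0.5, 0.5]), ([0.0, 1.0], [0.5, 0.5])]
T_hard = mixed_weights(patterns, clusters, gamma=1.0)
T_soft = mixed_weights(patterns, clusters, gamma=0.0)
T_mixed = mixed_weights(patterns, clusters, gamma=0.5)
print(T_hard)  # [[1.0, 0.0], [0.0, 1.0]]
```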
Fuzzy Feature Clustering
S-FFC :
Fuzzy Feature Clustering
M-FFC :
Text Classification
• Training: training document data set → feature reduction → training data sets for class 1, ..., class p → train 1st, ..., p-th classifier (SVM)
• Classification: unknown pattern → feature reduction → classifiers
• p classifiers are constructed.
Text Classification
• Training data set and target sets for SVMs
Class   Target 1   Target 2
C1      +1         -1
C1      +1         -1
C1      +1         -1
C1      +1         -1
C2      -1         +1
C2      -1         +1
C2      -1         +1
C2      -1         +1
C2      -1         +1
Target 1: training target set for class C1
Target 2: training target set for class C2
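The target sets follow a one-against-rest scheme: for the classifier of class c, documents of class c get +1 and all others get -1. A minimal Python sketch that reproduces the table (the function name is an assumption):

```python
def one_vs_rest_targets(labels, classes):
    """Map each class to its +1/-1 target column over all training documents."""
    return {c: [+1 if y == c else -1 for y in labels] for c in classes}


labels = ["C1"] * 4 + ["C2"] * 5
targets = one_vs_rest_targets(labels, ["C1", "C2"])
print(targets["C1"])  # [1, 1, 1, 1, -1, -1, -1, -1, -1]
print(targets["C2"])  # [-1, -1, -1, -1, 1, 1, 1, 1, 1]
```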
Text Classification
• Training classifiers
  ◦ $D'_H$ + target 1 → training classifier (SVM1)
  ◦ $D'_H$ + target 2 → training classifier (SVM2)
• Feature reduction for unknown pattern
  ◦ Unknown pattern → unknown pattern after feature reduction
Text Classification
• Classify the unknown pattern
  ◦ Trained classifier (SVM1) outputs -1
  ◦ Trained classifier (SVM2) outputs +1
  ◦ Unknown pattern d is classified to class C2
Experimental results
• Performance measures
$TP_i$ : true positives wrt the i-th class
$TN_i$ : true negatives wrt the i-th class
$FP_i$ : false positives wrt the i-th class
$FN_i$ : false negatives wrt the i-th class
$p$ : number of classes
$MicroP = \frac{\sum_{i=1}^{p} TP_i}{\sum_{i=1}^{p} (TP_i + FP_i)}$, $MicroR = \frac{\sum_{i=1}^{p} TP_i}{\sum_{i=1}^{p} (TP_i + FN_i)}$
$MicroF1 = \frac{2 \times MicroP \times MicroR}{MicroP + MicroR}$, $MicroAcc = \frac{\sum_{i=1}^{p} (TP_i + TN_i)}{\sum_{i=1}^{p} (TP_i + TN_i + FP_i + FN_i)}$
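The microaveraged measures pool the per-class counts before dividing. A small Python sketch follows; the function name, the tuple layout, and the counts are hypothetical, and it assumes all denominators are nonzero.

```python
def micro_metrics(per_class):
    """per_class holds one (TP, TN, FP, FN) tuple per class."""
    tp = sum(c[0] for c in per_class)
    tn = sum(c[1] for c in per_class)
    fp = sum(c[2] for c in per_class)
    fn = sum(c[3] for c in per_class)
    micro_p = tp / (tp + fp)
    micro_r = tp / (tp + fn)
    micro_f1 = 2 * micro_p * micro_r / (micro_p + micro_r)
    micro_acc = (tp + tn) / (tp + tn + fp + fn)
    return micro_p, micro_r, micro_f1, micro_acc


# Hypothetical counts for two classes, not taken from the experiments.
counts = [(3, 5, 1, 0), (4, 3, 0, 2)]
micro_p, micro_r, micro_f1, micro_acc = micro_metrics(counts)
print(micro_p, micro_r, micro_f1, micro_acc)
```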
Experimental results
• 20 Newsgroups data set
Number of classes                    20
Number of documents                  20000
Proportion of training documents     2/3
Proportion of testing documents      1/3
Number of features                   25718
Experimental results
Execution time (sec) of different methods on 20 Newsgroups data
Experimental results
Microaveraged accuracy (%) of different methods on 20 Newsgroups data
Experimental results
Microaveraged F1 (%) of M-FFC with different γ values
for 20 Newsgroups data
Advantages
• A fuzzy self-constructing feature clustering (FFC)
algorithm, which is an incremental clustering approach
to reduce the dimensionality of the features in text
classification
• Determines the number of features automatically
• Matches membership functions closely with the real
distribution of the training data
• Runs faster than the compared methods
• Extracts better features than other methods