SVM based Semi-Supervised Classification
Topics in Pattern Recognition

Bigyan Bhar
M.E. CSA, IISc
4710-410-091-07064
Oct 11th, 2010
Bigyan Bhar Seminar, Topics in PR
Outline

1 Classification
2 Support Vector Machine (SVM)
3 Using SVM for Semi-Supervised Classification
   Transductive SVM & Modifications
   Augmented Lagrangian
   Other Methods
   All Methods
4 Results
5 Conclusion
   New Facts
   Further Directions
   Acknowledgments
   References
What is Classification?

Classification refers to an algorithmic procedure for assigning a given piece of input data to one of a given number of categories.

Class test   Final Exam   Project   Seminar   Grade
    13           35          16        18       A
    10           31           5        19       B
    11           21           9        11       C
    12           29          10        15       B
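The grade table above can be read as a tiny classification problem: each row of marks is an input, the grade is its category. A minimal sketch, assuming a 1-nearest-neighbour rule and a made-up query row (neither is part of the original slides):

```python
# 1-nearest-neighbour classification of the grade table above.
# The query row [13, 34, 15, 17] is an illustrative, invented example.
import math

train = [
    ([13, 35, 16, 18], "A"),
    ([10, 31,  5, 19], "B"),
    ([11, 21,  9, 11], "C"),
    ([12, 29, 10, 15], "B"),
]

def classify(x):
    """Assign the label of the closest training row (Euclidean distance)."""
    return min(train, key=lambda row: math.dist(row[0], x))[1]

print(classify([13, 34, 15, 17]))  # closest to the first row, so "A"
```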
Traditional Classifier

Build phase: Labeled Data → Classifier Builder → Classifier
Use phase: Unlabeled Data → Classifier → Label for Data
Classifier

A classifier is supposed to classify unlabeled data.
We have a lot of unlabeled data, typically far more than labeled data.
So far we have seen classifiers built using only labeled data.
What if we could also use the large set of unlabeled data to build a better classifier?
Semi-supervised Classifier

Build phase: Labeled Data + Unlabeled Data → Semi-supervised Classifier Builder → Classifier
Use phase: Unlabeled Data → Classifier → Label for Data
How to use the unlabeled data?

The separating plane has to pass through a low-density region.
Labeling Constraint

The "low density region" principle that we observed can be realized using a fractional constraint:

(# of positive class examples) / (total # of examples) = r

r is a user-supplied input.
We enforce the above constraint on the unlabeled examples, as they are large in number.
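Checking the fractional constraint for a given separator is a one-liner. A minimal sketch, where the weight vector, bias and unlabeled points are all made-up illustrative values:

```python
# Fraction of unlabeled points labeled positive by the current separator.
# w, b and X_unlabeled are invented values for illustration only.
import numpy as np

w, b = np.array([1.0, -1.0]), 0.0
X_unlabeled = np.array([[2.0, 1.0], [0.5, 1.5], [3.0, 0.0], [1.0, 2.0]])

# The constraint asks this fraction to equal the user-supplied r.
fraction_positive = np.mean(X_unlabeled @ w + b > 0)
print(fraction_positive)  # 2 of the 4 points fall on the positive side -> 0.5
```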
What is SVM?

SVM = Support Vector Machine
Maximal Margin Classifier
SVM Continued

[Figure: separating plane w^T x + b = 0 with supporting planes w^T x + b = ±1, and the margin between them]

Total margin = 1/‖w‖ + 1/‖w‖ = 2/‖w‖

Optimization problem:

min_w (1/2) w^T w

subject to: y_i (w^T x_i + b) ≥ 1, ∀ 1 ≤ i ≤ l
SVM Formulation

Using the KKT conditions, we get the final SVM problem as:

w* = argmin_w [ (1/2) ∑_{i=1}^{l} loss(y_i w^T x_i) + (λ/2) w^T w ]
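With the hinge loss, loss(z) = max(0, 1 − z), the objective above can be minimised directly. A hedged sketch using plain subgradient descent (the works cited in this talk use a modified finite Newton method; subgradient descent is shown here only because it is short), on synthetic data with invented hyperparameters:

```python
# Subgradient descent on (1/2) * sum_i hinge(y_i w^T x_i) + (lam/2) * w^T w.
# Data, lam, step sizes and iteration count are illustrative choices.
import numpy as np

rng = np.random.default_rng(0)
y = np.where(rng.random(200) < 0.5, 1.0, -1.0)
X = rng.normal(size=(200, 2)) + 2.0 * y[:, None]    # class-dependent shift

lam, w = 0.1, np.zeros(2)
for t in range(1, 501):
    margins = y * (X @ w)
    active = margins < 1                   # points with nonzero hinge loss
    # Subgradient: lam * w - (1/2) * sum over active points of y_i x_i
    grad = lam * w - 0.5 * (y[active, None] * X[active]).sum(axis=0)
    w -= (0.5 / t) * grad                  # decreasing step size

train_accuracy = np.mean(np.sign(X @ w) == y)
```

On this well-separated synthetic data the learned separator classifies nearly all training points correctly.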
Transductive SVM (TSVM)

min_{w, {y'_j}_{j=1}^{u}}  (λ/2)‖w‖² + (1/2l) ∑_{i=1}^{l} loss(y_i w^T x_i) + (λ'/2u) ∑_{j=1}^{u} loss(y'_j w^T x'_j)

subject to:

(1/u) ∑_{j=1}^{u} max[0, sign(w^T x'_j)] = r
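The joint optimisation over w and the pseudo-labels y'_j is typically attacked by alternating the two. A rough switching-style sketch, assuming hinge loss and a simplified, unweighted combination of the labeled and unlabeled losses (it is not the deterministic-annealing or switching algorithms discussed later, just an illustration of the alternation; all data and hyperparameters are invented):

```python
# Alternate (a) reassigning pseudo-labels y'_j so a fraction r of the
# unlabeled points is positive, and (b) refitting w on labeled + pseudo-
# labeled data. Data, lam, steps and the outer iteration count are illustrative.
import numpy as np

rng = np.random.default_rng(1)
Xl = np.vstack([rng.normal(2.0, 1.0, (10, 2)), rng.normal(-2.0, 1.0, (10, 2))])
yl = np.array([1.0] * 10 + [-1.0] * 10)
Xu = np.vstack([rng.normal(2.0, 1.0, (50, 2)), rng.normal(-2.0, 1.0, (50, 2))])
r = 0.5

def fit_svm(X, y, lam=0.1, steps=300):
    """Subgradient descent on (lam/2)||w||^2 + mean hinge loss."""
    w = np.zeros(X.shape[1])
    for t in range(1, steps + 1):
        m = y * (X @ w)
        g = lam * w - (y[m < 1, None] * X[m < 1]).sum(axis=0) / len(y)
        w -= (1.0 / t) * g
    return w

w = fit_svm(Xl, yl)                        # start from the supervised solution
for _ in range(5):                         # alternate labels <-> weights
    scores = Xu @ w
    # Threshold at the (1 - r) quantile so a fraction r ends up positive.
    yu = np.where(scores >= np.quantile(scores, 1 - r), 1.0, -1.0)
    w = fit_svm(np.vstack([Xl, Xu]), np.concatenate([yl, yu]))
```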
Modifying TSVM

What is the cost-to-importance ratio of the terms in the TSVM formulation?

min_{w, {y'_j}_{j=1}^{u}}  (λ/2)‖w‖² + (1/2l) ∑_{i=1}^{l} loss(y_i w^T x_i) + (λ'/2u) ∑_{j=1}^{u} loss(y'_j w^T x'_j)

Clearly the third term, the unlabeled loss, is the costliest: it requires computation of y'_j for the large set of unlabeled examples.
What if we could avoid it altogether?
Modified TSVM

TSVM formulation:

min_{w, {y'_j}_{j=1}^{u}}  (λ/2)‖w‖² + (1/2l) ∑_{i=1}^{l} loss(y_i w^T x_i) + (λ'/2u) ∑_{j=1}^{u} loss(y'_j w^T x'_j)

Our formulation:

min_w  (λ/2)‖w‖² + (1/2l) ∑_{i=1}^{l} loss(y_i w^T x_i)

subject to:

(1/u) ∑_{j=1}^{u} max[0, sign(w^T x'_j)] = r
Augmented Lagrangian Technique

The augmented Lagrangian is a technique for solving minimization problems with equality constraints. It converges faster than the generalized methods.

Original problem: min f(x), subject to g(x) = 0

This can be written as an unconstrained minimization over:

L(x, λ, µ) = f(x) − λ g(x) + (1/2µ) ‖g(x)‖²

Since f and the Lagrangian (for any λ) agree on the feasible set g(x) = 0, the basic idea remains the same as that of the Lagrangian:
- a small value of µ forces the minimizer(s) of L to lie close to the feasible set
- values of x that reduce f are preferred
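A toy run of this scheme makes the two nested loops concrete. A minimal sketch on the invented problem min f(x) = x1² + x2² subject to g(x) = x1 + x2 − 1 = 0, whose solution is x = (0.5, 0.5); the step size, iteration counts and µ are illustrative, and the multiplier update matches the minus-λg sign convention of the slide:

```python
# Augmented Lagrangian on a toy equality-constrained problem.
# Inner loop: gradient descent on L(x, lam, mu); outer loop: multiplier update.
import numpy as np

f_grad = lambda x: 2.0 * x                 # gradient of f(x) = ||x||^2
g = lambda x: x[0] + x[1] - 1.0            # equality constraint g(x) = 0
g_grad = np.array([1.0, 1.0])              # gradient of g (constant here)

x, lam_mult, mu = np.zeros(2), 0.0, 0.1
for _ in range(20):                        # outer multiplier updates
    for _ in range(500):                   # inner minimisation of L(x, lam, mu)
        grad_L = f_grad(x) - lam_mult * g_grad + (g(x) / mu) * g_grad
        x -= 0.05 * grad_L
    lam_mult -= g(x) / mu                  # multiplier estimate from the slide's L

print(x)  # approaches the constrained minimiser (0.5, 0.5)
```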
Modified TSVM using Augmented Lagrangian

Our formulation:

min_w [f(w)]  ⟹  min_w [ (λ/2)‖w‖² + (1/2l) ∑_{i=1}^{l} loss(y_i w^T x_i) ]

subject to:

g(w) = 0  ⟹  (1/u) ∑_{j=1}^{u} max[0, sign(w^T x'_j)] − r = 0

Augmented Lagrangian:

min_x [L(x, λ, µ)] = min_x [ f(x) − λ g(x) + (1/2µ) ‖g(x)‖² ]
Penalty Method

Augmented Lagrangian:

min_x [L(x, λ, µ)] = min_x [ f(x) − λ g(x) + (1/2µ) ‖g(x)‖² ]

Penalty Method:

min_x [ f(x) + (1/2µ) ‖g(x)‖² ]
SVM based Methods

Supervised SVM (SSVM):

w* = argmin_{w ∈ R^d} [ (λ/2)‖w‖² + (1/2) ∑_{i=1}^{l} loss(y_i w^T x_i) ]

SSVM with Threshold Adjustment:
- Obtain w* from SSVM
- Adjust the threshold to satisfy the labeling constraint:

(1/u) ∑_{j=1}^{u} max[0, sign(w^T x'_j)] = r
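The threshold-adjustment step reduces to a quantile computation: keep w from the supervised SVM and shift the bias so that exactly a fraction r of the unlabeled scores ends up positive. A minimal sketch, with the unlabeled scores w^T x'_j replaced by invented random values:

```python
# Shift the bias b so a fraction r of unlabeled points is labeled positive.
# The scores stand in for w^T x'_j on unlabeled data; r is user-supplied.
import numpy as np

rng = np.random.default_rng(2)
scores = rng.normal(size=1000)             # invented values of w^T x'_j
r = 0.3

# Choosing -b as the (1 - r) quantile of the scores makes the top r
# fraction of points positive under sign(w^T x' + b).
b = -np.quantile(scores, 1 - r)
fraction_positive = np.mean(scores + b > 0)
print(fraction_positive)  # close to r = 0.3
```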
All Methods at a Glance

SVM based:
- SSVM on labeled data
- SSVM on labeled data with threshold adjustment

Methods proposed in this work:
- Augmented Lagrangian
- Penalty Method

TSVM:
- Deterministic Annealing
- Switching
[Figure: Accuracy vs. # of Labeled Examples (gcat)]
[Figure: Accuracy vs. # of Labeled Examples (aut-avn)]
[Figure: Accuracy vs. Noise in r (gcat)]
[Figure: Accuracy vs. Noise in r (aut-avn)]
Some Results

- The simple penalty method is the most robust method with respect to the estimation of r.
- TSVM still leads in terms of accuracy.
- The Augmented Lagrangian is a direction worth investigating due to its faster computation time.
- Beating the SSVM is possible only with a reasonably accurate estimate of r.
- If the labeled dataset does not follow r, then the alternate methods perform better.
Future Directions

- Establish theoretical bounds for the accuracy of our methods relative to that of TSVM.
- Look at non-SVM based semi-supervised classifiers (e.g. decision trees) and come up with a way to express the fractional constraint.
- Can we use something other than the fractional constraint to enforce the low-density criterion?
Acknowledgments
I thank the following persons for their able guidance and help in
this work:
S S Keerthi (Yahoo! Labs)
M N Murthy (IISc)
S Sundararajan (Yahoo! Labs)
S Shevade (IISc)
References

- M. S. Gockenbach. The augmented Lagrangian method for equality-constrained optimization.
- V. Sindhwani, S. S. Keerthi. Newton Methods for Fast Solution of Semi-supervised Linear SVMs.
- S. S. Keerthi, D. DeCoste. A modified finite Newton method for fast solution of large scale linear SVMs.