SVM based Semi-Supervised Classification
Topics in Pattern Recognition

Bigyan Bhar
M.E. CSA, IISc
4710-410-091-07064
Oct 11th, 2010
Bigyan Bhar Seminar, Topics in PR
Outline

1 Classification
2 Support Vector Machine (SVM)
3 Using SVM for Semi-Supervised Classification
   Transductive SVM & Modifications
   Augmented Lagrangian
   Other Methods
   All Methods
4 Results
5 Conclusion
   New Facts
   Further Directions
   Acknowledgments
   References
What is Classification?

Classification refers to an algorithmic procedure for assigning a given piece of input data to one of a given number of categories.

Class test   Final Exam   Project   Seminar   Grade
    13           35          16        18       A
    10           31           5        19       B
    11           21           9        11       C
    12           29          10        15       B
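The grade table above can be read as a tiny classification problem: each row of marks is an input, the grade is its category. A minimal sketch, assuming a 1-nearest-neighbour rule and a made-up query row (neither is part of the original slides):

```python
# 1-nearest-neighbour classification of the grade table above.
# The query row [13, 34, 15, 17] is an illustrative, invented example.
import math

train = [
    ([13, 35, 16, 18], "A"),
    ([10, 31,  5, 19], "B"),
    ([11, 21,  9, 11], "C"),
    ([12, 29, 10, 15], "B"),
]

def classify(x):
    """Assign the label of the closest training row (Euclidean distance)."""
    return min(train, key=lambda row: math.dist(row[0], x))[1]

print(classify([13, 34, 15, 17]))  # closest to the first row, so "A"
```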
Traditional Classifier

Build phase: Labeled Data → Classifier Builder → Classifier
Use phase: Unlabeled Data → Classifier → Label for Data
Classifier

A classifier is supposed to classify unlabeled data.
We have a lot of unlabeled data, typically far more than labeled data.
So far we have seen classifiers built using only labeled data.
What if we could also use the large set of unlabeled data to build a better classifier?
Semi-supervised Classifier

Build phase: Labeled Data + Unlabeled Data → Semi-supervised Classifier Builder → Classifier
Use phase: Unlabeled Data → Classifier → Label for Data
How to use the unlabeled data?

The separating plane has to pass through a low-density region.
Labeling Constraint

The "low density region" principle that we observed can be realized using a fractional constraint:

(# of positive class examples) / (total # of examples) = r

r is a user-supplied input.
We enforce the above constraint on the unlabeled examples, as they are large in number.
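Checking the fractional constraint for a given separator is a one-liner. A minimal sketch, where the weight vector, bias and unlabeled points are all made-up illustrative values:

```python
# Fraction of unlabeled points labeled positive by the current separator.
# w, b and X_unlabeled are invented values for illustration only.
import numpy as np

w, b = np.array([1.0, -1.0]), 0.0
X_unlabeled = np.array([[2.0, 1.0], [0.5, 1.5], [3.0, 0.0], [1.0, 2.0]])

# The constraint asks this fraction to equal the user-supplied r.
fraction_positive = np.mean(X_unlabeled @ w + b > 0)
print(fraction_positive)  # 2 of the 4 points fall on the positive side -> 0.5
```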
What is SVM?

SVM = Support Vector Machine
Maximal Margin Classifier
SVM Continued

[Figure: separating plane w^T x + b = 0 with supporting planes w^T x + b = ±1, and the margin between them]

Total margin = 1/‖w‖ + 1/‖w‖ = 2/‖w‖

Optimization problem:

min_w (1/2) w^T w

subject to: y_i (w^T x_i + b) ≥ 1, ∀ 1 ≤ i ≤ l
SVM Formulation

Using the KKT conditions, we get the final SVM problem as:

w* = argmin_w [ (1/2) ∑_{i=1}^{l} loss(y_i w^T x_i) + (λ/2) w^T w ]
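With the hinge loss, loss(z) = max(0, 1 − z), the objective above can be minimised directly. A hedged sketch using plain subgradient descent (the works cited in this talk use a modified finite Newton method; subgradient descent is shown here only because it is short), on synthetic data with invented hyperparameters:

```python
# Subgradient descent on (1/2) * sum_i hinge(y_i w^T x_i) + (lam/2) * w^T w.
# Data, lam, step sizes and iteration count are illustrative choices.
import numpy as np

rng = np.random.default_rng(0)
y = np.where(rng.random(200) < 0.5, 1.0, -1.0)
X = rng.normal(size=(200, 2)) + 2.0 * y[:, None]    # class-dependent shift

lam, w = 0.1, np.zeros(2)
for t in range(1, 501):
    margins = y * (X @ w)
    active = margins < 1                   # points with nonzero hinge loss
    # Subgradient: lam * w - (1/2) * sum over active points of y_i x_i
    grad = lam * w - 0.5 * (y[active, None] * X[active]).sum(axis=0)
    w -= (0.5 / t) * grad                  # decreasing step size

train_accuracy = np.mean(np.sign(X @ w) == y)
```

On this well-separated synthetic data the learned separator classifies nearly all training points correctly.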
Transductive SVM (TSVM)

min_{w, {y'_j}_{j=1}^{u}}  (λ/2)‖w‖² + (1/2l) ∑_{i=1}^{l} loss(y_i w^T x_i) + (λ'/2u) ∑_{j=1}^{u} loss(y'_j w^T x'_j)

subject to:

(1/u) ∑_{j=1}^{u} max[0, sign(w^T x'_j)] = r
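The joint optimisation over w and the pseudo-labels y'_j is typically attacked by alternating the two. A rough switching-style sketch, assuming hinge loss and a simplified, unweighted combination of the labeled and unlabeled losses (it is not the deterministic-annealing or switching algorithms discussed later, just an illustration of the alternation; all data and hyperparameters are invented):

```python
# Alternate (a) reassigning pseudo-labels y'_j so a fraction r of the
# unlabeled points is positive, and (b) refitting w on labeled + pseudo-
# labeled data. Data, lam, steps and the outer iteration count are illustrative.
import numpy as np

rng = np.random.default_rng(1)
Xl = np.vstack([rng.normal(2.0, 1.0, (10, 2)), rng.normal(-2.0, 1.0, (10, 2))])
yl = np.array([1.0] * 10 + [-1.0] * 10)
Xu = np.vstack([rng.normal(2.0, 1.0, (50, 2)), rng.normal(-2.0, 1.0, (50, 2))])
r = 0.5

def fit_svm(X, y, lam=0.1, steps=300):
    """Subgradient descent on (lam/2)||w||^2 + mean hinge loss."""
    w = np.zeros(X.shape[1])
    for t in range(1, steps + 1):
        m = y * (X @ w)
        g = lam * w - (y[m < 1, None] * X[m < 1]).sum(axis=0) / len(y)
        w -= (1.0 / t) * g
    return w

w = fit_svm(Xl, yl)                        # start from the supervised solution
for _ in range(5):                         # alternate labels <-> weights
    scores = Xu @ w
    # Threshold at the (1 - r) quantile so a fraction r ends up positive.
    yu = np.where(scores >= np.quantile(scores, 1 - r), 1.0, -1.0)
    w = fit_svm(np.vstack([Xl, Xu]), np.concatenate([yl, yu]))
```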
Modifying TSVM

What is the cost-to-importance ratio of the terms in the TSVM formulation?

min_{w, {y'_j}_{j=1}^{u}}  (λ/2)‖w‖² + (1/2l) ∑_{i=1}^{l} loss(y_i w^T x_i) + (λ'/2u) ∑_{j=1}^{u} loss(y'_j w^T x'_j)

Clearly the third term, the unlabeled loss, is the costliest: it requires computation of y'_j for the large set of unlabeled examples.
What if we could avoid it altogether?
Modified TSVM

TSVM formulation:

min_{w, {y'_j}_{j=1}^{u}}  (λ/2)‖w‖² + (1/2l) ∑_{i=1}^{l} loss(y_i w^T x_i) + (λ'/2u) ∑_{j=1}^{u} loss(y'_j w^T x'_j)

Our formulation:

min_w  (λ/2)‖w‖² + (1/2l) ∑_{i=1}^{l} loss(y_i w^T x_i)

subject to:

(1/u) ∑_{j=1}^{u} max[0, sign(w^T x'_j)] = r
Augmented Lagrangian Technique

The augmented Lagrangian is a technique for solving minimization problems with equality constraints. It converges faster than the generalized methods.

Original problem: min f(x), subject to g(x) = 0

This can be written as an unconstrained minimization over:

L(x, λ, µ) = f(x) − λ g(x) + (1/2µ) ‖g(x)‖²

Since f and the Lagrangian (for any λ) agree on the feasible set g(x) = 0, the basic idea remains the same as that of the Lagrangian:
- a small value of µ forces the minimizer(s) of L to lie close to the feasible set
- values of x that reduce f are preferred
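A toy run of this scheme makes the two nested loops concrete. A minimal sketch on the invented problem min f(x) = x1² + x2² subject to g(x) = x1 + x2 − 1 = 0, whose solution is x = (0.5, 0.5); the step size, iteration counts and µ are illustrative, and the multiplier update matches the minus-λg sign convention of the slide:

```python
# Augmented Lagrangian on a toy equality-constrained problem.
# Inner loop: gradient descent on L(x, lam, mu); outer loop: multiplier update.
import numpy as np

f_grad = lambda x: 2.0 * x                 # gradient of f(x) = ||x||^2
g = lambda x: x[0] + x[1] - 1.0            # equality constraint g(x) = 0
g_grad = np.array([1.0, 1.0])              # gradient of g (constant here)

x, lam_mult, mu = np.zeros(2), 0.0, 0.1
for _ in range(20):                        # outer multiplier updates
    for _ in range(500):                   # inner minimisation of L(x, lam, mu)
        grad_L = f_grad(x) - lam_mult * g_grad + (g(x) / mu) * g_grad
        x -= 0.05 * grad_L
    lam_mult -= g(x) / mu                  # multiplier estimate from the slide's L

print(x)  # approaches the constrained minimiser (0.5, 0.5)
```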
Modified TSVM using Augmented Lagrangian

Our formulation:

min_w [f(w)]  ⟹  min_w [ (λ/2)‖w‖² + (1/2l) ∑_{i=1}^{l} loss(y_i w^T x_i) ]

subject to:

g(w) = 0  ⟹  (1/u) ∑_{j=1}^{u} max[0, sign(w^T x'_j)] − r = 0

Augmented Lagrangian:

min_x [L(x, λ, µ)] = min_x [ f(x) − λ g(x) + (1/2µ) ‖g(x)‖² ]
Penalty Method

Augmented Lagrangian:

min_x [L(x, λ, µ)] = min_x [ f(x) − λ g(x) + (1/2µ) ‖g(x)‖² ]

Penalty Method:

min_x [ f(x) + (1/2µ) ‖g(x)‖² ]
SVM based Methods

Supervised SVM (SSVM):

w* = argmin_{w ∈ R^d} [ (λ/2)‖w‖² + (1/2) ∑_{i=1}^{l} loss(y_i w^T x_i) ]

SSVM with Threshold Adjustment:
- Obtain w* from SSVM
- Adjust the threshold to satisfy the labeling constraint:

(1/u) ∑_{j=1}^{u} max[0, sign(w^T x'_j)] = r
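The threshold-adjustment step reduces to a quantile computation: keep w from the supervised SVM and shift the bias so that exactly a fraction r of the unlabeled scores ends up positive. A minimal sketch, with the unlabeled scores w^T x'_j replaced by invented random values:

```python
# Shift the bias b so a fraction r of unlabeled points is labeled positive.
# The scores stand in for w^T x'_j on unlabeled data; r is user-supplied.
import numpy as np

rng = np.random.default_rng(2)
scores = rng.normal(size=1000)             # invented values of w^T x'_j
r = 0.3

# Choosing -b as the (1 - r) quantile of the scores makes the top r
# fraction of points positive under sign(w^T x' + b).
b = -np.quantile(scores, 1 - r)
fraction_positive = np.mean(scores + b > 0)
print(fraction_positive)  # close to r = 0.3
```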
All Methods at a Glance

SVM based:
- SSVM on labeled data
- SSVM on labeled data with threshold adjustment

Methods proposed in this work:
- Augmented Lagrangian
- Penalty Method

TSVM:
- Deterministic Annealing
- Switching
[Figure: Accuracy vs. # of Labeled Examples (gcat)]
[Figure: Accuracy vs. # of Labeled Examples (aut-avn)]
[Figure: Accuracy vs. Noise in r (gcat)]
[Figure: Accuracy vs. Noise in r (aut-avn)]
Some Results

- The simple penalty method is the most robust method with respect to the estimation of r.
- TSVM still leads in terms of accuracy.
- The Augmented Lagrangian is a direction worth investigating due to its faster computation time.
- Beating the SSVM is possible only with a reasonably accurate estimate of r.
- If the labeled dataset does not follow r, then the alternate methods perform better.
Future Directions

- Establish theoretical bounds for the accuracy of our methods relative to that of TSVM.
- Look at non-SVM based semi-supervised classifiers (e.g. decision trees) and come up with a way to express the fractional constraint.
- Can we use something other than the fractional constraint to enforce the low-density criterion?
Acknowledgments
I thank the following persons for their able guidance and help in
this work:
S S Keerthi (Yahoo! Labs)
M N Murthy (IISc)
S Sundararajan (Yahoo! Labs)
S Shevade (IISc)
References

- M. S. Gockenbach. The augmented Lagrangian method for equality-constrained optimization.
- V. Sindhwani, S. S. Keerthi. Newton Methods for Fast Solution of Semi-supervised Linear SVMs.
- S. S. Keerthi, D. DeCoste. A modified finite Newton method for fast solution of large scale linear SVMs.