8/18/2019 A introduction to svm.pdf
1/48
Practical Guide to Support Vector Machines
Tingfan Wu
MPLAB, UCSD
Outline
• Data Classification
• High-level Concepts of SVM
• Interpretation of SVM Model/Result
• Use Case Study
What does it mean to learn?
• Acquire new skills?
• Make predictions about the world?
Making predictions is fundamental to survival
Will that bear eat me? Is there water in that canyon? Is that person a good mate?
These are all examples of classification problems.
Boot Camp Related
• Face recognition / speaker identification
• Motion classification
• Brain-computer interface / spike classification
Driver Fatigue Detection from Facial Expression
Data Classification
• Given training data (class labels known), predict test data (class labels unknown)
• Not just fitting: generalization!

Pipeline: Sensor → Data → Preprocessing → Features → Classifier (SVM, AdaBoost, Neural Network) → Prediction
Generalization
Many possible classification models; which one generalizes better?
Why SVM? (my opinion)
• With careful data preprocessing and proper use, SVM and neural networks give similar performance.
• SVM is easier to use properly.
• SVM provides a reasonably good baseline performance.
Outline
• Data Classification
• High-level Concepts of SVM
• Interpretation of SVM Model/Result
• Use Case Study
A Simple Dilemma
Who do I invite to my birthday party?
Problem Formulation
• Training data as vectors: xi
• Binary labels: [+1, -1]

Name   Gift?   Income   Fondness
John   Yes     3k       3/5
Mary   No      5k       1/5

class       feature vector
y1 = +1     x1 = [3000, 0.6]
y2 = -1     x2 = [5000, 0.2]
Vector space

(Figure: training points plotted in feature space, with x1 = Fondness and x2 = Disposable Income; the '+' points cluster on the Gift side and the '-' points on the No-Gift side.)
A Line

(Figure: the line w^T x + b = 0 in the plane, with axes x1 = first feature, x2 = second feature, and normal vector w; '+' points on one side, '-' points on the other.)

The line is a "hyperplane" in high-dimensional space.
The inequalities and regions

The hyperplane w^T x + b = 0 splits the space into two regions: w^T xi + b > 0 on one side and w^T xi + b < 0 on the other.

Decision function: f(x) = sign(w^T x_new + b)

The pair (w, b) is the model.
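As an illustration, the decision function above can be sketched in a few lines of NumPy. The weights, bias, and data points are hypothetical, loosely echoing the earlier party example (income, fondness):

```python
import numpy as np

def decision(w, b, x_new):
    """Linear SVM decision: sign(w^T x_new + b)."""
    return int(np.sign(w @ x_new + b))

# Hypothetical model: small negative weight on income, positive on fondness
w = np.array([-0.001, 5.0])  # assumed illustrative weights, not a trained model
b = 1.0
print(decision(w, b, np.array([3000.0, 0.6])))  # fondness-heavy point -> +1
print(decision(w, b, np.array([5000.0, 0.2])))  # income-heavy point -> -1
```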
Large Margin
Maximal Margin
Data not linearly separable
Case 1: a few outliers violate an otherwise good linear separator. Case 2: the data are inherently nonlinear.
Trick 1: Soft-Margin
Add a penalty for violating data points. These points are usually outliers; the hyperplane should not be biased too much by them.
Soft-margin
[Ben-Hur & Weston 2005]
Support vectors
The more important data points: those that support (define) the hyperplane.
Trick 2: Map to Higher Dimension

Mapping phi: take a 1-D input x1 and map it to (x1, x1^2). (Figure: in the mapped space the points lie on the parabola x2 = x1^2, where a straight line can separate the classes.)
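The trick can be sketched in NumPy. The mapping phi(x1) = (x1, x1^2) matches the plot above, while the sample points and the separating threshold are made up for illustration:

```python
import numpy as np

def phi(x1):
    """Map 1-D input to 2-D: (x1, x1^2)."""
    return np.array([x1, x1 ** 2])

# 1-D points that no single threshold can separate: class +1 near 0, -1 far away
pts = [-3.0, -0.5, 0.5, 3.0]
mapped = [phi(p) for p in pts]
# In the mapped space, the horizontal line x2 = 2 separates the two classes
labels = [1 if m[1] < 2 else -1 for m in mapped]
print(labels)  # [-1, 1, 1, -1]
```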
Mapping to Infinite Dimension
• Is it possible to create a universal mapping?
• What if we can map to infinite dimension? Every problem becomes separable!
• Consider the "Radial Basis Function (RBF)": phi(x)^T phi(y) = Kernel(x, y)
• w: infinite number of variables!
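The RBF kernel value can be computed directly, without ever materializing the infinite-dimensional phi; a minimal sketch (the gamma value here is arbitrary):

```python
import numpy as np

def rbf_kernel(x, y, gamma=0.5):
    """K(x, y) = exp(-gamma * ||x - y||^2), equal to phi(x)^T phi(y)."""
    diff = np.asarray(x) - np.asarray(y)
    return np.exp(-gamma * diff @ diff)

x = np.array([1.0, 2.0])
print(rbf_kernel(x, x))          # 1.0: a point is maximally similar to itself
print(rbf_kernel(x, x + 100.0))  # near 0: distant points are dissimilar
```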
Dual Problem
• Primal: optimize over w (infinitely many variables under an infinite-dimensional mapping)
• Dual: optimize over one variable per training example, using only kernel values: a finite calculation
Gaussian/RBF Kernel
Small gamma: behaves roughly like a linear kernel. Very large gamma: behaves like nearest neighbor and overfits.
[Ben-Hur & Weston 2005]
Recap
The two knobs: soft-ness (C) and nonlinearity (gamma).
Check out the SVM Toy
• http://www.csie.ntu.edu.tw/~cjlin/libsvm/
• -c (cost): controls the softness of the margin / number of SVs
• -g (gamma): controls the curvature of the hyperplane
Cross Validation
• What is the best (C, gamma)? → Data dependent
• It needs to be determined by "testing performance":
  – Split the training data into pseudo "training" and "test" sets
  – Exhaustive grid search for the best (C, gamma)
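The grid-search step above can be sketched as follows. Here `cv_accuracy` is a hypothetical stand-in for a real k-fold train/evaluate loop, and the exponential grids follow the common libsvm recommendation:

```python
import itertools

def grid_search(cv_accuracy, Cs, gammas):
    """Exhaustive search: return the (C, gamma) with the best CV accuracy."""
    return max(itertools.product(Cs, gammas),
               key=lambda cg: cv_accuracy(*cg))

# Typical exponential grids
Cs = [2 ** k for k in range(-5, 16, 2)]
gammas = [2 ** k for k in range(-15, 4, 2)]

# Hypothetical scorer: pretend accuracy peaks at C=8, gamma=0.125
fake = lambda C, g: -abs(C - 8) - abs(g - 0.125)
print(grid_search(fake, Cs, gammas))  # (8, 0.125)
```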
Outline
• Machine Learning → Classification
• High-level Concepts of SVM
• Interpretation of SVM Model/Result
• Use Case Study
(1) Decision value as strength
Decision function: f(x) = sign(w^T x_new + b). The sign picks the '+' or '-' side; the magnitude of w^T x_new + b measures how strongly the point falls on that side.
Facial Movement Classification
• Classes: brow up (+) or down (-)
• Features: pixels of Gabor-filtered images
Decision value as strength
Probability estimates derived from decision values are also available.
(2) Weight as feature importance
• Magnitude of a weight: feature importance
• Similar to regression
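A minimal sketch of reading feature importance off a linear SVM's weight vector; the weights and feature names are invented for illustration:

```python
import numpy as np

def rank_features(w, names):
    """Order features by |w_i|, largest (most important) first."""
    order = np.argsort(-np.abs(w))
    return [names[i] for i in order]

w = np.array([0.1, -2.3, 0.7])  # hypothetical trained weights
names = ["brightness", "edge_energy", "texture"]
print(rank_features(w, names))  # ['edge_energy', 'texture', 'brightness']
```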
(3) Weights as profiles
• Fluorescent images of cells at various dosages of a certain drug
• Various image-based features
• Clustering the weights reveals the primary and secondary effects of the drug
Outline
• Machine Learning → Classification
• High-level Concepts of SVM
• Interpretation of SVM Model/Result
• Use Case Study
The Software
• SVM requires a constrained quadratic optimization solver: not easy to implement.
• Off-the-shelf software:
  – LIBSVM by Chih-Jen Lin et al.
  – SVMlight by Thorsten Joachims
• Incorporated into many ML packages:
  – MATLAB / PyML / R ...
Beginners may…
1. Convert their data into the format of an SVM software package
2. Not conduct scaling
3. Randomly try a few parameters without cross validation
4. Get good results on training data, but poor results on testing data
Data scaling
Without scaling, a feature with a large dynamic range may dominate the separating hyperplane.

label    X    Height   Gender
y1 = 0   x1   150      2
y2 = 1   x2   180      1
y3 = 1   x3   185      1
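Per-feature scaling to [0, 1] can be sketched as below. Note that the min and range must be learned on the training set and reused on the test set; this two-function split is a hypothetical sketch, not libsvm's actual svm-scale implementation:

```python
import numpy as np

def fit_minmax(X):
    """Learn per-feature min and range from the training data."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    return lo, np.where(hi > lo, hi - lo, 1.0)  # guard constant features

def scale(X, lo, rng):
    """Map each feature into [0, 1] using the training statistics."""
    return (X - lo) / rng

X_train = np.array([[150.0, 2.0], [180.0, 1.0], [185.0, 1.0]])  # height, gender
lo, rng = fit_minmax(X_train)
print(scale(X_train, lo, rng))  # both columns now span [0, 1]
```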
Parameter Selection
Contour plot of cross-validation accuracy over (C, gamma); pick parameters from the good area.
User case: Astroparticle scientist
• User: "I am using libsvm in an astroparticle physics application... First, let me congratulate you on a really easy to use and nice package. Unfortunately, it gives me astonishingly bad test results..."
• Developers: "OK, please send us your data. We are able to get 97% test accuracy. Is that good enough for you?"
• User: "You have earned a copy of my PhD thesis."
Dynamic Range Mismatch
• A problem from astroparticle physics:

1 1:2.6173e+01 2:5.88670e+01 3:-1.89469e-01 4:1.25122e+02
1 1:5.7073e+01 2:2.21404e+02 3:8.60795e-02 4:1.22911e+02
1 1:1.7259e+01 2:1.73436e+02 3:-1.29805e-01 4:1.25031e+02
1 1:2.1779e+01 2:1.24953e+02 3:1.53885e-01 4:1.52715e+02
1 1:9.1339e+01 2:2.93569e+02 3:1.42391e-01 4:1.60540e+02
1 1:5.5375e+01 2:1.79222e+02 3:1.65495e-01 4:1.11227e+02
1 1:2.9562e+01 2:1.91357e+02 3:9.90143e-02 4:1.03407e+02

• #Training set: 3,089; #testing set: 4,000
• Large dynamic range of some features
Overfitting
• Training
$ ./svm-train train.1   (default parameters used)
optimization finished, #iter = 6131
nSV = 3053, nBSV = 724
Total nSV = 3053
• Training accuracy
$ ./svm-predict train.1 train.1.model o
Accuracy = 99.7734% (3082/3089)
• Testing accuracy
$ ./svm-predict test.1 train.1.model test.1.out
Accuracy = 66.925% (2677/4000)

nSV and nBSV: number of SVs and bounded SVs (alpha_i = C).
• Without scaling, one feature may dominate the kernel values → overfitting
• 3053/3089 training data become support vectors → overfitting
• Training accuracy high, but testing accuracy low → overfitting
Suggested Procedure
• Pre-scale data: to the range [0, 1] or to unit variance
• Use the (default) Gaussian (RBF) kernel
• Use cross-validation to find the best parameters (C, gamma)
• Train your model with the best parameters
• Test!

All of the above is done automatically by the "easy.py" script provided with libsvm.
Large Scale SVM
• (#training data >> #features) and linear kernel:
  – Use primal solvers (e.g. liblinear)
• To get approximate results in a short time:
  – Allow an inaccurate stopping condition: svm-train -e 0.01
  – Use stochastic gradient descent solvers
Resources
• LIBSVM: http://www.csie.ntu.edu.tw/~cjlin/libsvm
• LIBSVM Tools: http://www.csie.ntu.edu.tw/~cjlin/libsvmtools
• Kernel Machines Forum: http://www.kernel-machines.org
• Hsu, Chang, and Lin: A Practical Guide to Support Vector Classification
• My email: [email protected]
• Acknowledgement: many slides from Dr. Chih-Jen Lin, NTU