53
DS 4400 Alina Oprea Associate Professor, CCIS Northeastern University October 23 2018 Machine Learning and Data Mining I

Machine Learning and Data Mining I · 2018. 10. 24. · Machine Learning and Data Mining I. Logistics •Project proposal is due on Oct 24 (1 page on Gradescope) ... (increase for

  • Upload
    others

  • View
    6

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Machine Learning and Data Mining I · 2018. 10. 24. · Machine Learning and Data Mining I. Logistics •Project proposal is due on Oct 24 (1 page on Gradescope) ... (increase for

DS 4400

Alina Oprea

Associate Professor, CCIS

Northeastern University

October 23 2018

Machine Learning and Data Mining I

Page 2: Machine Learning and Data Mining I · 2018. 10. 24. · Machine Learning and Data Mining I. Logistics •Project proposal is due on Oct 24 (1 page on Gradescope) ... (increase for

Logistics

• Project proposal is due on Oct 24 (1 page on Gradescope)– Project Title

– Problem Description

– Dataset

– Approach

• Final project – Presentation: Monday, Dec 3

– Report: Friday, Dec 7

• Final exam– Tuesday, Dec 11

2

Page 3: Machine Learning and Data Mining I · 2018. 10. 24. · Machine Learning and Data Mining I. Logistics •Project proposal is due on Oct 24 (1 page on Gradescope) ... (increase for

Review

• Maximum margin classifier– Classifier of maximum margin

– For linearly separable data

– An “optimized” perceptron

• Support vector classifier– Allows some slack and sets a total error budget

(hyper-parameter)

– Final classifier on a point is a linear combination of inner product of point with support vectors

– Efficient to evaluate

3

Page 4: Machine Learning and Data Mining I · 2018. 10. 24. · Machine Learning and Data Mining I. Logistics •Project proposal is due on Oct 24 (1 page on Gradescope) ... (increase for

Outline

• Support vector classifier

– Review

– Hinge loss

• SVM

– Non-linear decision boundaries

– Kernels

– Polynomial and Radial SVM

• Density estimators

4

Page 5: Machine Learning and Data Mining I · 2018. 10. 24. · Machine Learning and Data Mining I. Logistics •Project proposal is due on Oct 24 (1 page on Gradescope) ... (increase for

Linear separability

5

Page 6: Machine Learning and Data Mining I · 2018. 10. 24. · Machine Learning and Data Mining I. Logistics •Project proposal is due on Oct 24 (1 page on Gradescope) ... (increase for

Separating hyperplane

6

𝑦(𝑖)(𝜃0 + 𝜃1𝑥1(𝑖)+⋯𝜃𝑑𝑥𝑑

(𝑖)) > 0

For all training

data 𝑥(𝑖), 𝑦(𝑖),𝑖 ∈ {1,… , 𝑛}

h(x, 𝜃) = sign(𝜃𝑇𝑥)

Page 7: Machine Learning and Data Mining I · 2018. 10. 24. · Machine Learning and Data Mining I. Logistics •Project proposal is due on Oct 24 (1 page on Gradescope) ... (increase for

Maximum Margin

Class 1Class -1

f x

𝜃

yest

f(x, 𝜃) = sign(𝜃𝑇𝑥)

The maximum margin linear classifier is the linear classifier with the maximum margin!

7

Page 8: Machine Learning and Data Mining I · 2018. 10. 24. · Machine Learning and Data Mining I. Logistics •Project proposal is due on Oct 24 (1 page on Gradescope) ... (increase for

Classifier margin

8

• Support vectors are “closest” to hyperplane• At least 2 support vectors (1 positive, 1 negative)

Page 9: Machine Learning and Data Mining I · 2018. 10. 24. · Machine Learning and Data Mining I. Logistics •Project proposal is due on Oct 24 (1 page on Gradescope) ... (increase for

Finding the maximum margin classifier

9

• Training data 𝑥(1), … , 𝑥 𝑛 with 𝑥(𝑖) = 𝑥1(𝑖), … , 𝑥𝑑

(𝑖) T

• Labels are from 2 classes: 𝑦𝑖 ∈ {−1,1}

max M

𝑦(𝑖) 𝜃0 + 𝜃1𝑥1(𝑖)+⋯𝜃𝑑𝑥𝑑

(𝑖)≥ 𝑀 ∀𝑖

𝜃2= 1

Normalization constraintEach point is on the right side of hyper-

plane at distance ≥ 𝑀

Page 10: Machine Learning and Data Mining I · 2018. 10. 24. · Machine Learning and Data Mining I. Logistics •Project proposal is due on Oct 24 (1 page on Gradescope) ... (increase for

Support vector classifier

• Allow for small number of mistakes on training data

• Obtain a more robust model

max M

𝑦(𝑖) 𝜃0 + 𝜃1𝑥1(𝑖)+⋯𝜃𝑑𝑥𝑑

(𝑖)≥ 𝑀 1 − 𝜖𝑖 ∀𝑖

𝜃2= 1

𝜖𝑖 ≥ 0,σ𝑖 𝜖𝑖 = 𝐶

10Error Budget (Hyper-parameter)

Slack

Page 11: Machine Learning and Data Mining I · 2018. 10. 24. · Machine Learning and Data Mining I. Logistics •Project proposal is due on Oct 24 (1 page on Gradescope) ... (increase for

Equivalent formulation

• Min 𝜃2+ 𝐶σ𝑖 𝜖𝑖

• 𝑦(𝑖) 𝜃0 + 𝜃1𝑥1(𝑖)+⋯𝜃𝑑𝑥𝑑

(𝑖)≥ 1 − 𝜖𝑖 ∀𝑖

• 𝜖𝑖 ≥ 0

• When i is correctly classified, 𝑦(𝑖)ℎ 𝑥(𝑖) ≥ 1

• When i is not correctly classified

1 − 𝑦(𝑖)ℎ 𝑥(𝑖) ≤ 𝜖𝑖

• max 0,1 − 𝑦(𝑖)ℎ 𝑥(𝑖) ≤ 𝜖𝑖 11

h 𝑥(𝑖) = 𝜃0 + 𝜃1𝑥1(𝑖)+⋯𝜃𝑑𝑥𝑑

(𝑖)

Page 12: Machine Learning and Data Mining I · 2018. 10. 24. · Machine Learning and Data Mining I. Logistics •Project proposal is due on Oct 24 (1 page on Gradescope) ... (increase for

Hinge Loss

• 𝐽 𝜃 = σ𝑖=1𝑛 max 0,1 − 𝑦(𝑖)ℎ 𝑥(𝑖) + 𝜆σ𝑗=1

𝑑 𝜃𝑗2

12

Hinge loss

h 𝑥(𝑖) = 𝜃0 + 𝜃1𝑥1(𝑖)+⋯𝜃𝑑𝑥𝑑

(𝑖)

Total Error Budget Regularization Term

𝐽 𝜃 = 𝐶

𝑖=0

𝑛

max 0,1 − 𝑦(𝑖)ℎ 𝑥(𝑖) +𝑗=1

𝑑

𝜃𝑗2

𝐶 =1

λ

Page 13: Machine Learning and Data Mining I · 2018. 10. 24. · Machine Learning and Data Mining I. Logistics •Project proposal is due on Oct 24 (1 page on Gradescope) ... (increase for

Error Budget and Margin

13

Larger CLow variance

Smaller COver-fitting

Find best hyper-parameter C by cross-validation

Page 14: Machine Learning and Data Mining I · 2018. 10. 24. · Machine Learning and Data Mining I. Logistics •Project proposal is due on Oct 24 (1 page on Gradescope) ... (increase for

Non-linear decision

14

Page 15: Machine Learning and Data Mining I · 2018. 10. 24. · Machine Learning and Data Mining I. Logistics •Project proposal is due on Oct 24 (1 page on Gradescope) ... (increase for

More examples

15

Page 16: Machine Learning and Data Mining I · 2018. 10. 24. · Machine Learning and Data Mining I. Logistics •Project proposal is due on Oct 24 (1 page on Gradescope) ... (increase for

Kernels

• Support vector classifier– h 𝑧 = 𝜃0 + σ𝑖∈𝑆𝛼𝑖 < 𝑧, 𝑥(𝑖) >=

= 𝜃0 + σ𝑖∈𝑆𝛼𝑖 σ𝑗=1 𝑧𝑗𝑥𝑗(𝑖)

– S is set of support vectors

– Replace with h 𝑧 = 𝜃0 +σ𝑖∈𝑆𝛼𝑖𝐾(𝑧, 𝑥(𝑖))

• What is a kernel?– Function that characterizes similarity between 2

observations

– 𝐾 𝑎, 𝑏 =< 𝑎, 𝑏 > = σ𝑗 𝑎𝑗𝑏𝑗 linear kernel!

– The “closest” the points, the larger the kernel

• Intuition– The closest support vectors to the point play larger role in

classification

16

Page 17: Machine Learning and Data Mining I · 2018. 10. 24. · Machine Learning and Data Mining I. Logistics •Project proposal is due on Oct 24 (1 page on Gradescope) ... (increase for

The Kernel Trick

17

• Enlarge feature space• Shape of the kernel changes the decision boundary

Page 18: Machine Learning and Data Mining I · 2018. 10. 24. · Machine Learning and Data Mining I. Logistics •Project proposal is due on Oct 24 (1 page on Gradescope) ... (increase for

Kernels

• Linear kernels

– 𝐾 𝑎, 𝑏 =< 𝑎, 𝑏 > = σ𝑖 𝑎𝑖𝑏𝑖

• Polynomial kernel of degree m

– 𝐾 𝑎, 𝑏 = 1 + σ𝑖=0𝑑 𝑎𝑖𝑏𝑖

𝑚

• Radial Basis Function (RBF) kernel (or Gaussian)

– 𝐾 𝑎, 𝑏 = exp −𝛾σ𝑖=0𝑑 (𝑎𝑖−𝑏𝑖)

2

• Support vector machine classifier

– h 𝑧 = 𝜃0 + σ𝑖∈𝑆 𝛼𝑖𝐾(𝑧, 𝑥(𝑖))

18

Page 19: Machine Learning and Data Mining I · 2018. 10. 24. · Machine Learning and Data Mining I. Logistics •Project proposal is due on Oct 24 (1 page on Gradescope) ... (increase for

General SVM classifier

• S = set of support vectors

• SVM with polynomial kernel

– ℎ 𝑧 = 𝜃0 +σ𝑖∈𝑆 𝛼𝑖 1 + σ𝑗=0𝑑 𝑧𝑗𝑥𝑗

(𝑖) 𝑚

– Hyper-parameter m (degree of polynomial)

• SVM with radial kernel

– ℎ 𝑧 = 𝜃0 +σ𝑖∈𝑆 𝛼𝑖exp −𝛾σ𝑗=0𝑑 (𝑧𝑗−𝑥𝑗

(𝑖))2

– Hyper-parameter 𝛾 (increase for non-linear data)

– As testing point z is closer to support vector, kernel is close to 1

– Local behavior: points far away have negligible impact on prediction

19

Page 20: Machine Learning and Data Mining I · 2018. 10. 24. · Machine Learning and Data Mining I. Logistics •Project proposal is due on Oct 24 (1 page on Gradescope) ... (increase for

Kernel Example

20

Page 21: Machine Learning and Data Mining I · 2018. 10. 24. · Machine Learning and Data Mining I. Logistics •Project proposal is due on Oct 24 (1 page on Gradescope) ... (increase for

Advantages of Kernels

• Generate non-linear features

• More flexibility in decision boundary

• Generate a family of SVM classifiers

• Testing is computationally efficient– Cost depends only on support vectors and kernel

operation

• Disadvantages– Kernels need to be tuned (additional hyper-

parameters)

21

Page 22: Machine Learning and Data Mining I · 2018. 10. 24. · Machine Learning and Data Mining I. Logistics •Project proposal is due on Oct 24 (1 page on Gradescope) ... (increase for

When to use different kernels?

• If data is (close to) linearly separable, use linear SVM

• Radial or polynomial kernels preferred for non-linear data

• Training radial or polynomial kernels takes longer than linear SVM

• Other kernels

– Sigmoid

– Hyperbolic Tangent

22

Page 23: Machine Learning and Data Mining I · 2018. 10. 24. · Machine Learning and Data Mining I. Logistics •Project proposal is due on Oct 24 (1 page on Gradescope) ... (increase for

Comparing SVM with other classifiers

• SVM is resilient to outliers– Similar to Logistic Regression

– LDA or kNN are not

• SVM can be trained with Gradient Descent– Hinge loss cost function

• Supports regularization– Can add penalty term (ridge or Lasso) to cost

function

• Linear SVM is most similar to Logistic Regression

23

Page 24: Machine Learning and Data Mining I · 2018. 10. 24. · Machine Learning and Data Mining I. Logistics •Project proposal is due on Oct 24 (1 page on Gradescope) ... (increase for

Connection to Logistic Regression

• 𝐽 𝜃 = σ𝑖=0𝑛 max 0,1 − 𝑦(𝑖)ℎ 𝑥(𝑖) + 𝜆σ𝑗=1

𝑑 𝜃𝑗2

24

Hinge loss

𝑦(𝑖) 𝜃0 + 𝜃1𝑥1(𝑖)+⋯𝜃𝑑𝑥𝑑

(𝑖)

h 𝑥(𝑖) = 𝜃0 + 𝜃1𝑥1(𝑖)+⋯𝜃𝑑𝑥𝑑

(𝑖)

Page 25: Machine Learning and Data Mining I · 2018. 10. 24. · Machine Learning and Data Mining I. Logistics •Project proposal is due on Oct 24 (1 page on Gradescope) ... (increase for

SVM for Multiple Classes

25

Page 26: Machine Learning and Data Mining I · 2018. 10. 24. · Machine Learning and Data Mining I. Logistics •Project proposal is due on Oct 24 (1 page on Gradescope) ... (increase for

Lab – Linear SVM

26

Page 27: Machine Learning and Data Mining I · 2018. 10. 24. · Machine Learning and Data Mining I. Logistics •Project proposal is due on Oct 24 (1 page on Gradescope) ... (increase for

Lab – Linear SVM

27

Page 28: Machine Learning and Data Mining I · 2018. 10. 24. · Machine Learning and Data Mining I. Logistics •Project proposal is due on Oct 24 (1 page on Gradescope) ... (increase for

Lab – Linear SVM

28

Page 29: Machine Learning and Data Mining I · 2018. 10. 24. · Machine Learning and Data Mining I. Logistics •Project proposal is due on Oct 24 (1 page on Gradescope) ... (increase for

Lab – Radial SVM

29

Page 30: Machine Learning and Data Mining I · 2018. 10. 24. · Machine Learning and Data Mining I. Logistics •Project proposal is due on Oct 24 (1 page on Gradescope) ... (increase for

Lab – Radial SVM

30

Page 31: Machine Learning and Data Mining I · 2018. 10. 24. · Machine Learning and Data Mining I. Logistics •Project proposal is due on Oct 24 (1 page on Gradescope) ... (increase for

Lab – Multiple Classes

31

Page 32: Machine Learning and Data Mining I · 2018. 10. 24. · Machine Learning and Data Mining I. Logistics •Project proposal is due on Oct 24 (1 page on Gradescope) ... (increase for

Lab – Multiple Classes

32

Page 33: Machine Learning and Data Mining I · 2018. 10. 24. · Machine Learning and Data Mining I. Logistics •Project proposal is due on Oct 24 (1 page on Gradescope) ... (increase for

Review SVM

33

Page 34: Machine Learning and Data Mining I · 2018. 10. 24. · Machine Learning and Data Mining I. Logistics •Project proposal is due on Oct 24 (1 page on Gradescope) ... (increase for

Outline

• Support vector classifier

– Review

– Hinge loss

• SVM

– Non-linear decision boundaries

– Kernels

– Polynomial and Radial SVM

• Density estimators

34

Page 35: Machine Learning and Data Mining I · 2018. 10. 24. · Machine Learning and Data Mining I. Logistics •Project proposal is due on Oct 24 (1 page on Gradescope) ... (increase for

Essential probability concepts

35

Page 36: Machine Learning and Data Mining I · 2018. 10. 24. · Machine Learning and Data Mining I. Logistics •Project proposal is due on Oct 24 (1 page on Gradescope) ... (increase for

Prior and Joint Probabilities

36

Page 37: Machine Learning and Data Mining I · 2018. 10. 24. · Machine Learning and Data Mining I. Logistics •Project proposal is due on Oct 24 (1 page on Gradescope) ... (increase for

The Joint Distribution

37

Page 38: Machine Learning and Data Mining I · 2018. 10. 24. · Machine Learning and Data Mining I. Logistics •Project proposal is due on Oct 24 (1 page on Gradescope) ... (increase for

Computing Prior Probabilities

38

Page 39: Machine Learning and Data Mining I · 2018. 10. 24. · Machine Learning and Data Mining I. Logistics •Project proposal is due on Oct 24 (1 page on Gradescope) ... (increase for

Learning Joint Distributions

39

Page 40: Machine Learning and Data Mining I · 2018. 10. 24. · Machine Learning and Data Mining I. Logistics •Project proposal is due on Oct 24 (1 page on Gradescope) ... (increase for

Example – Learning Joint Probability Distribution

40

Page 41: Machine Learning and Data Mining I · 2018. 10. 24. · Machine Learning and Data Mining I. Logistics •Project proposal is due on Oct 24 (1 page on Gradescope) ... (increase for

Density Estimation

41

Page 42: Machine Learning and Data Mining I · 2018. 10. 24. · Machine Learning and Data Mining I. Logistics •Project proposal is due on Oct 24 (1 page on Gradescope) ... (increase for

Density Estimation

42

Page 43: Machine Learning and Data Mining I · 2018. 10. 24. · Machine Learning and Data Mining I. Logistics •Project proposal is due on Oct 24 (1 page on Gradescope) ... (increase for

Evaluating Density Estimators

43

Page 44: Machine Learning and Data Mining I · 2018. 10. 24. · Machine Learning and Data Mining I. Logistics •Project proposal is due on Oct 24 (1 page on Gradescope) ... (increase for

Evaluating Density Estimators

44

Page 45: Machine Learning and Data Mining I · 2018. 10. 24. · Machine Learning and Data Mining I. Logistics •Project proposal is due on Oct 24 (1 page on Gradescope) ... (increase for

Example

45

Page 46: Machine Learning and Data Mining I · 2018. 10. 24. · Machine Learning and Data Mining I. Logistics •Project proposal is due on Oct 24 (1 page on Gradescope) ... (increase for

Example

46

Page 47: Machine Learning and Data Mining I · 2018. 10. 24. · Machine Learning and Data Mining I. Logistics •Project proposal is due on Oct 24 (1 page on Gradescope) ... (increase for

Log Probabilities

47

Page 48: Machine Learning and Data Mining I · 2018. 10. 24. · Machine Learning and Data Mining I. Logistics •Project proposal is due on Oct 24 (1 page on Gradescope) ... (increase for

Example

48

Page 49: Machine Learning and Data Mining I · 2018. 10. 24. · Machine Learning and Data Mining I. Logistics •Project proposal is due on Oct 24 (1 page on Gradescope) ... (increase for

Evaluation on Test Set

49

Page 50: Machine Learning and Data Mining I · 2018. 10. 24. · Machine Learning and Data Mining I. Logistics •Project proposal is due on Oct 24 (1 page on Gradescope) ... (increase for

Overfitting

50

Page 51: Machine Learning and Data Mining I · 2018. 10. 24. · Machine Learning and Data Mining I. Logistics •Project proposal is due on Oct 24 (1 page on Gradescope) ... (increase for

Curse of Dimensionality

51

Page 52: Machine Learning and Data Mining I · 2018. 10. 24. · Machine Learning and Data Mining I. Logistics •Project proposal is due on Oct 24 (1 page on Gradescope) ... (increase for

Pros and Cons of Density Estimators

• Pros

– Density Estimators can learn distribution of training data

– Can compute probability for a record

– Can do inference (predict likelihood of record)

• Cons

– Can overfit to the training data and not generalize to test data

– Curse of dimensionality

52

Naïve Bayes classifier fixes these cons!

Page 53: Machine Learning and Data Mining I · 2018. 10. 24. · Machine Learning and Data Mining I. Logistics •Project proposal is due on Oct 24 (1 page on Gradescope) ... (increase for

Acknowledgements

• Slides made using resources from:

– Andrew Ng

– Eric Eaton

– David Sontag

– Andrew Moore

• Thanks!

53