
Semi-Supervised Learning

Barnabas Poczos

Slides Courtesy: Jerry Zhu, Aarti Singh

Supervised Learning

[Diagram: labeled training data, drawn from a feature space X and a label space Y, are fed to the learning algorithm, which outputs a prediction rule.]

Goal: learn a good prediction rule f : X → Y.

The optimal predictor (the Bayes rule) depends on the unknown distribution P_XY, so instead we learn a good prediction rule from the training data.

Labeled and Unlabeled data

[Diagram: unlabeled raw data vs. data labeled by a human expert, special equipment, or an experiment.]

Unlabeled data are cheap and abundant; labeled data are expensive and scarce. Example label sets: “Crystal” / “Needle” / “Empty” for scientific images, “0” / “1” / “2” / … for handwritten digits, “Sports” / “News” / “Science” for documents.

Free-of-cost labels?

Luis von Ahn: Games with a purpose (reCAPTCHA)

A word that is challenging for OCR (Optical Character Recognition) is shown to users; by typing it, you provide a free label!

Semi-Supervised learning

Supervised learning (SL): only the labeled data are fed to the learning algorithm.

Semi-Supervised learning (SSL): both the labeled and the unlabeled data are fed to the learning algorithm.

Goal: learn a better prediction rule than is possible from the labeled data alone.

Semi-Supervised learning in Humans


Can unlabeled data help?

Assume each class is a coherent group (e.g. Gaussian)

Then unlabeled data can help identify the boundary more accurately.

[Figure: positive and negative labeled points plus unlabeled points, with the supervised decision boundary and the semi-supervised decision boundary.]

Can unlabeled data help?

[Figure: unlabeled handwritten digit images, to be assigned the labels “0”, “1”, “2”, …]

“Similar” data points have “similar” labels.

This embedding can be done by manifold learning algorithms

Some SSL Algorithms

▪ Self-Training

▪ Generative methods, mixture models

▪ Graph-based methods

▪ Co-Training

▪ Semi-supervised SVM

▪ Many others


Notation

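The notation commonly used for SSL, and assumed in the sketches below: labeled data (x_1, y_1), …, (x_l, y_l) drawn from the joint distribution P_XY, unlabeled data x_{l+1}, …, x_{l+u} drawn from the marginal P_X, typically with u ≫ l.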

Self-training


Self-training Example

Propagating 1-NN

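Self-training follows a generic loop: train a classifier on the labeled data, predict labels for the unlabeled data, move the most confident predictions into the labeled set, and repeat. A minimal numpy sketch of the “propagating 1-NN” variant used in the example, where at each step the unlabeled point closest to any labeled point adopts its neighbor's label (function and variable names are illustrative, not from the slides):

```python
import numpy as np

def propagating_1nn(X_l, y_l, X_u):
    """Self-training with a 1-NN base learner: repeatedly label the
    unlabeled point closest to any currently labeled point."""
    X_l, y_l, X_u = X_l.copy(), y_l.copy(), X_u.copy()
    while len(X_u) > 0:
        # pairwise distances between unlabeled and labeled points
        d = np.linalg.norm(X_u[:, None, :] - X_l[None, :, :], axis=-1)
        i, j = np.unravel_index(np.argmin(d), d.shape)
        # adopt the label of the nearest labeled neighbor (pseudo-label)
        X_l = np.vstack([X_l, X_u[i]])
        y_l = np.append(y_l, y_l[j])
        X_u = np.delete(X_u, i, axis=0)
    return X_l, y_l

# toy usage: two labeled points, a few unlabeled ones
X_l = np.array([[0.0, 0.0], [5.0, 5.0]])
y_l = np.array([0, 1])
X_u = np.array([[0.5, 0.2], [4.5, 5.1], [2.4, 2.5]])
print(propagating_1nn(X_l, y_l, X_u)[1])   # propagated labels
```

Because mistakes made early are never revisited, self-training can reinforce its own errors; the propagating 1-NN example illustrates how a single outlier can drag an entire region to the wrong label.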

Mixture Models for Labeled Data


Mixture Models for Labeled Data

Decision rule: predict class 1 for a point x if P(Y = 1 | X = x) > 1/2, and class 2 otherwise.

Estimate the mixture parameters from the labeled data; the resulting rule gives a decision for any test point, not only points in the labeled dataset.
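Written out, the decision rule follows from Bayes' rule applied to a two-component Gaussian mixture (one standard form, not necessarily the exact expression on the slide):

$$
P(Y = 1 \mid X = x) \;=\; \frac{\pi_1\, \mathcal{N}(x;\, \mu_1, \Sigma_1)}{\pi_1\, \mathcal{N}(x;\, \mu_1, \Sigma_1) + \pi_2\, \mathcal{N}(x;\, \mu_2, \Sigma_2)} \;>\; \tfrac{1}{2},
$$

where the class priors $\pi_k$ and the Gaussian parameters $(\mu_k, \Sigma_k)$ are estimated from the labeled data (class proportions, class-wise sample means and covariances).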

Mixture Models for Labeled Data


Mixture Models for SSL Data


Mixture Models


Mixture Models: SL vs SSL

Mixture Models


Gaussian Mixture Models


EM for Gaussian Mixture Models

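As a rough sketch of how EM is typically adapted to partly labeled data (assumptions: one spherical Gaussian per class with a known common variance, integer class labels 0..K-1, illustrative names, not code from the slides): labeled points keep their known class as a fixed, hard responsibility, while unlabeled points receive soft responsibilities in each E-step.

```python
import numpy as np

def ssl_gmm_em(X_l, y_l, X_u, n_classes, n_iter=50, var=1.0):
    """EM for a Gaussian mixture with one spherical component per class.
    Labeled points have fixed one-hot responsibilities; unlabeled points
    get soft responsibilities in the E-step."""
    X = np.vstack([X_l, X_u])
    n_l = len(X_l)
    # initialize responsibilities: one-hot for labeled, uniform for unlabeled
    R = np.full((len(X), n_classes), 1.0 / n_classes)
    R[:n_l] = 0.0
    R[np.arange(n_l), y_l] = 1.0
    for _ in range(n_iter):
        # M-step: class priors and means from the current responsibilities
        Nk = R.sum(axis=0)
        pi = Nk / Nk.sum()
        mu = (R.T @ X) / Nk[:, None]
        # E-step (unlabeled rows only): posterior over classes
        d2 = ((X[n_l:, None, :] - mu[None, :, :]) ** 2).sum(-1)
        log_p = np.log(pi)[None, :] - d2 / (2 * var)
        log_p -= log_p.max(axis=1, keepdims=True)   # numerical stability
        p = np.exp(log_p)
        R[n_l:] = p / p.sum(axis=1, keepdims=True)
    return pi, mu, R
```

If the model assumption is wrong (the classes are not well described by one Gaussian each), the unlabeled data can actively hurt, which is what the following "Assumption for GMMs" slides illustrate.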

Assumption for GMMs


Assumption for GMMs


Assumption for GMMs


Related: Cluster and Label

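"Cluster and label" is the related heuristic: cluster the labeled and unlabeled data together, then give every point in a cluster the majority label of the labeled points that fall into it. A minimal sketch assuming scikit-learn's KMeans and integer class labels (the function name and details are illustrative):

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_and_label(X_l, y_l, X_u, n_clusters):
    """Cluster labeled + unlabeled data together, then label every point in
    a cluster with the majority label of the labeled points it contains.
    Returns -1 for points in clusters with no labeled evidence."""
    X = np.vstack([X_l, X_u])
    assign = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(X)
    y_u = np.full(len(X_u), -1, dtype=int)
    for c in range(n_clusters):
        labeled_in_c = y_l[assign[:len(X_l)] == c]
        if len(labeled_in_c) == 0:
            continue  # no labeled point fell into this cluster
        y_u[assign[len(X_l):] == c] = np.bincount(labeled_in_c).argmax()
    return y_u
```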

Graph Based Methods

Assumption: Similar unlabeled data have similar labels.


Graph Regularization

Similarity Graphs: model local neighborhood relations between data points.

Assumption: nodes connected by heavy edges tend to have similar labels.

Graph Regularization

If data points i and j are similar (i.e. the weight w_ij is large), then their labels should be similar: f_i ≈ f_j.

The objective combines a loss on the labeled data (e.g. mean square or 0-1 loss) with a graph-based smoothness prior over the labeled and unlabeled data.
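A common concrete form of this objective (one standard choice, not necessarily the exact formula on the slide) is min_f Σ_{i ∈ labeled} (f_i − y_i)² + λ Σ_{i,j} w_ij (f_i − f_j)². When the labeled values are clamped exactly, the minimizer is the harmonic function of Zhu, Ghahramani & Lafferty (2003), which the sketch below computes, assuming a precomputed symmetric weight matrix and 0/1 labels (names are illustrative):

```python
import numpy as np

def harmonic_label_propagation(W, y_l, labeled_idx):
    """Harmonic-function solution to graph-based SSL:
    minimize sum_ij w_ij (f_i - f_j)^2 subject to f = y on labeled nodes.
    W: (n, n) symmetric similarity (weight) matrix.
    y_l: 0/1 labels of the labeled nodes, in the order of labeled_idx."""
    n = W.shape[0]
    unlabeled_idx = np.setdiff1d(np.arange(n), labeled_idx)
    D = np.diag(W.sum(axis=1))
    L = D - W                                   # graph Laplacian
    L_uu = L[np.ix_(unlabeled_idx, unlabeled_idx)]
    W_ul = W[np.ix_(unlabeled_idx, labeled_idx)]
    # f_u = L_uu^{-1} W_ul y_l  (solve the linear system instead of inverting)
    f_u = np.linalg.solve(L_uu, W_ul @ y_l)
    f = np.zeros(n)
    f[labeled_idx] = y_l
    f[unlabeled_idx] = f_u
    return f    # threshold at 0.5 for a hard 0/1 labeling
```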

Co-training

Co-training (Blum & Mitchell, 1998) (Mitchell, 1999) assumes that (i) features can be split into two sets; (ii) each sub-feature set is sufficient to train a good classifier.

• Initially two separate classifiers are trained with the labeled data, on the two sub-feature sets respectively.

• Each classifier then classifies the unlabeled data and ‘teaches’ the other classifier with the few unlabeled examples (together with their predicted labels) about which it is most confident.

• Each classifier is retrained with the additional training examples given by the other classifier, and the process repeats.


Co-training Algorithm

Co-training Algorithm (Blum & Mitchell, 1998)
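A minimal sketch of the loop described above, assuming scikit-learn-style classifiers, two feature views given as column-index arrays, and a single shared pool of pseudo-labeled examples (a simplification of Blum & Mitchell's original algorithm; all names are illustrative):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def co_training(X_l, y_l, X_u, view1, view2, rounds=10, k=2):
    """Co-training in the spirit of Blum & Mitchell (1998): two classifiers,
    one per feature view, repeatedly teach each other by moving their most
    confident predictions from the unlabeled pool into the labeled set.
    view1 / view2 are arrays of column indices defining the two views."""
    X_l, y_l, X_u = X_l.copy(), y_l.copy(), X_u.copy()
    for _ in range(rounds):
        clf1 = LogisticRegression(max_iter=1000).fit(X_l[:, view1], y_l)
        clf2 = LogisticRegression(max_iter=1000).fit(X_l[:, view2], y_l)
        if len(X_u) == 0:
            break
        moved = set()
        for clf, view in ((clf1, view1), (clf2, view2)):
            proba = clf.predict_proba(X_u[:, view])
            conf = proba.max(axis=1)
            for i in np.argsort(-conf)[:k]:     # k most confident examples
                if i not in moved:
                    moved.add(i)
                    X_l = np.vstack([X_l, X_u[i]])
                    y_l = np.append(y_l, clf.classes_[proba[i].argmax()])
        X_u = np.delete(X_u, list(moved), axis=0)
    return clf1, clf2
```

The method helps when the two views are (roughly) conditionally independent given the label and each view alone is sufficient for classification, as stated in the assumptions above.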

Semi-Supervised SVMs

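In one standard formulation (not necessarily the exact one on these slides), the semi-supervised SVM augments the usual hinge-loss objective with a penalty that pushes the decision boundary away from the unlabeled points:

$$
\min_{w,\,b}\; \tfrac{1}{2}\lVert w\rVert^2
\;+\; C_1 \sum_{i \in \text{labeled}} \max\!\bigl(0,\; 1 - y_i\,(w^\top x_i + b)\bigr)
\;+\; C_2 \sum_{j \in \text{unlabeled}} \max\!\bigl(0,\; 1 - \lvert w^\top x_j + b\rvert\bigr).
$$

The unlabeled term (the “hat loss”) makes the objective non-convex, so S3VMs are usually trained with heuristics such as label switching, continuation, or convex relaxations.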

Semi-Supervised Learning

▪ Generative methods

▪ Graph-based methods

▪ Co-Training

▪ Semi-Supervised SVMs

▪ Many other methods

SSL algorithms can use unlabeled data to improve prediction accuracy, provided the data satisfy appropriate assumptions.
