
Page 1: Tamara Berg Machine Learning

Tamara Berg

Machine Learning

790-133: Recognizing People, Objects, & Actions

Page 2: Tamara Berg Machine Learning

Announcements

• Topic presentation groups posted. Anyone not have a group yet?

• Last day of background material

• For Monday - Object recognition papers will be posted online. Please read!


Page 3: Tamara Berg Machine Learning

What is machine learning?

• Computer programs that can learn from data

• Two key components
  – Representation: how should we represent the data?
  – Generalization: the system should generalize from its past experience (observed data items) to perform well on unseen data items.

Page 4: Tamara Berg Machine Learning

Types of ML algorithms

• Unsupervised
  – Algorithms operate on unlabeled examples

• Supervised
  – Algorithms operate on labeled examples

• Semi/Partially-supervised
  – Algorithms combine both labeled and unlabeled examples

Page 5: Tamara Berg Machine Learning

Unsupervised Learning


Page 7: Tamara Berg Machine Learning

K-means clustering

• Want to minimize the sum of squared Euclidean distances between points x_i and their nearest cluster centers m_k

Algorithm:
• Randomly initialize K cluster centers
• Iterate until convergence:
  – Assign each data point to the nearest center
  – Recompute each cluster center as the mean of all points assigned to it

D(X, M) = Σ_k Σ_{x_i ∈ cluster k} ||x_i - m_k||^2

source: Svetlana Lazebnik
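
The K-means loop above fits in a few lines. This is a minimal NumPy sketch (random initialization, assign, recompute), written only for illustration; it is not the lecture's reference implementation.

```python
import numpy as np

def kmeans(X, K, n_iters=100, seed=0):
    """Minimal K-means: X is an (N, D) array; returns (centers, assignments)."""
    rng = np.random.default_rng(seed)
    # Randomly pick K data points as the initial cluster centers.
    centers = X[rng.choice(len(X), size=K, replace=False)]
    for _ in range(n_iters):
        # Assign each point to the nearest center (squared Euclidean distance).
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        assign = d2.argmin(axis=1)
        # Recompute each center as the mean of the points assigned to it.
        new_centers = np.array([X[assign == k].mean(axis=0) if np.any(assign == k)
                                else centers[k] for k in range(K)])
        if np.allclose(new_centers, centers):
            break  # converged
        centers = new_centers
    return centers, assign
```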

[Pages 8-18: image-only slides; no extractable text]

Page 19: Tamara Berg Machine Learning

Different clustering strategies

• Agglomerative clustering
  – Start with each point in a separate cluster
  – At each iteration, merge two of the “closest” clusters

• Divisive clustering
  – Start with all points grouped into a single cluster
  – At each iteration, split the “largest” cluster

• K-means clustering
  – Iterate: assign points to clusters, compute means

• K-medoids
  – Same as k-means, only the cluster center cannot be computed by averaging
  – The “medoid” of each cluster is the most centrally located point in that cluster (i.e., the point with the lowest average distance to the other points)

source: Svetlana Lazebnik
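
For comparison, a hedged sketch of agglomerative clustering using SciPy's hierarchical-clustering utilities; the toy points, the average-linkage choice, and the cut at 2 clusters are assumptions made for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Toy 2-D points: two loose groups.
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])

# Agglomerative clustering: start with singletons, repeatedly merge the
# two "closest" clusters (here, closest by average linkage).
Z = linkage(X, method="average", metric="euclidean")

# Cut the merge tree so that exactly 2 clusters remain.
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)  # e.g. [1 1 1 2 2 2]
```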

Page 20: Tamara Berg Machine Learning

Supervised Learning

[Pages 21-24: slides from Dan Klein; figures only, no extractable text]
Page 25: Tamara Berg Machine Learning

Example: Image classification

Input: images. Desired output: category labels (apple, pear, tomato, cow, dog, horse).

Slide credit: Svetlana Lazebnik

Page 26: Tamara Berg Machine Learning

Slide from Dan Klein (http://yann.lecun.com/exdb/mnist/index.html)

Page 27: Tamara Berg Machine Learning

Example: Seismic data

Scatter plot of surface wave magnitude vs. body wave magnitude, with nuclear explosions and earthquakes forming separate classes.

Slide credit: Svetlana Lazebnik

Page 28: Tamara Berg Machine Learning

Slide from Dan Klein

Page 29: Tamara Berg Machine Learning

The basic classification framework

y = f(x)

• Learning: given a training set of labeled examples {(x1,y1), …, (xN,yN)}, estimate the parameters of the prediction function f

• Inference: apply f to a never-before-seen test example x and output the predicted value y = f(x)

(y: output; f: classification function; x: input)

Slide credit: Svetlana Lazebnik
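
As an illustration of the learning/inference split, here is a hedged sketch using scikit-learn's estimator interface, where fit plays the role of learning and predict the role of inference; the toy data and the nearest-neighbor model are made up for this example.

```python
from sklearn.neighbors import KNeighborsClassifier

# Learning: estimate the parameters of f from labeled pairs {(x_i, y_i)}.
X_train = [[0.0, 0.0], [0.1, 0.3], [5.0, 5.2], [4.8, 5.1]]
y_train = ["apple", "apple", "cow", "cow"]
f = KNeighborsClassifier(n_neighbors=1)
f.fit(X_train, y_train)

# Inference: apply f to a never-before-seen test example x.
x_test = [[4.9, 5.0]]
print(f.predict(x_test))  # -> ['cow']
```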

Page 30: Tamara Berg Machine Learning

Some ML classification methods

(10^6 examples)

• Nearest neighbor: Shakhnarovich, Viola, Darrell 2003; Berg, Berg, Malik 2005; …
• Neural networks: LeCun, Bottou, Bengio, Haffner 1998; Rowley, Baluja, Kanade 1998; …
• Support Vector Machines and Kernels: Guyon, Vapnik; Heisele, Serre, Poggio 2001; …
• Conditional Random Fields: McCallum, Freitag, Pereira 2000; Kumar, Hebert 2003; …

Slide credit: Antonio Torralba

Page 31: Tamara Berg Machine Learning

Example: Training and testing

• Key challenge: generalization to unseen examples

Training set (labels known) Test set (labels unknown)

Slide credit: Svetlana Lazebnik

Page 32: Tamara Berg Machine Learning

Slide credit: Dan Klein

Page 33: Tamara Berg Machine Learning

Slide from Min-Yen Kan

Classification by Nearest Neighbor

Word vector document classification – here the vector space is illustrated as having 2 dimensions. How many dimensions would the data actually live in?


Page 34: Tamara Berg Machine Learning

Slide from Min-Yen Kan

Classification by Nearest Neighbor


Page 35: Tamara Berg Machine Learning

Classification by Nearest Neighbor

Classify the test document as the class of the document “nearest” to the query document (use vector similarity to find most similar doc)

Slide from Min-Yen Kan


Page 36: Tamara Berg Machine Learning

Classification by kNN

Classify the test document as the majority class of the k documents “nearest” to the query document.

Slide from Min-Yen Kan
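
A minimal sketch of kNN document classification as described above, using word-count vectors and cosine similarity; the toy documents, the similarity choice, and k = 3 are assumptions made for illustration.

```python
from collections import Counter
import math

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors (Counters)."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def knn_classify(query, training, k=3):
    """Label the query with the majority class of its k most similar training documents."""
    q = Counter(query.lower().split())
    ranked = sorted(training, key=lambda doc: cosine(q, Counter(doc[0].lower().split())),
                    reverse=True)
    top_labels = [label for _, label in ranked[:k]]
    return Counter(top_labels).most_common(1)[0][0]

docs = [("the match ended in a late goal", "sports"),
        ("shares fell as markets opened", "finance"),
        ("the striker scored a goal", "sports")]
print(knn_classify("another goal in the match", docs, k=3))  # -> 'sports'
```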

Page 37: Tamara Berg Machine Learning

Slide from Min-Yen Kan

What are the features? What’s the training data? Testing data? Parameters?

Classification by kNN


[Pages 38-42: slides from Min-Yen Kan; figures only, no extractable text]

Page 43: Tamara Berg Machine Learning

Slide from Min-Yen Kan

What are the features? What’s the training data? Testing data? Parameters?

Classification by kNN


Page 44: Tamara Berg Machine Learning

NN for vision

Fast Pose Estimation with Parameter Sensitive Hashing (Shakhnarovich, Viola, Darrell)

Page 45: Tamara Berg Machine Learning

J. Hays and A. Efros, Scene Completion using Millions of Photographs, SIGGRAPH 2007

NN for vision

Page 46: Tamara Berg Machine Learning

J. Hays and A. Efros, IM2GPS: estimating geographic information from a single image, CVPR 2008

NN for vision

Page 47: Tamara Berg Machine Learning

Decision tree classifier

Example problem: decide whether to wait for a table at a restaurant, based on the following attributes:
1. Alternate: is there an alternative restaurant nearby?
2. Bar: is there a comfortable bar area to wait in?
3. Fri/Sat: is today Friday or Saturday?
4. Hungry: are we hungry?
5. Patrons: number of people in the restaurant (None, Some, Full)
6. Price: price range ($, $$, $$$)
7. Raining: is it raining outside?
8. Reservation: have we made a reservation?
9. Type: kind of restaurant (French, Italian, Thai, Burger)
10. WaitEstimate: estimated waiting time (0-10, 10-30, 30-60, >60)

Slide credit: Svetlana Lazebnik

Page 48: Tamara Berg Machine Learning

Decision tree classifier

Slide credit: Svetlana Lazebnik

Page 49: Tamara Berg Machine Learning

Decision tree classifier

Slide credit: Svetlana Lazebnik
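
A hedged sketch of a decision tree on a made-up numeric encoding of a few of the restaurant attributes above, using scikit-learn's DecisionTreeClassifier; the encoding, data, and labels are invented for illustration.

```python
from sklearn.tree import DecisionTreeClassifier

# Tiny invented dataset: [Patrons (0=None,1=Some,2=Full), Hungry (0/1), WaitEstimate bucket (0..3)]
X = [[1, 1, 0], [2, 1, 3], [0, 0, 0], [2, 0, 1], [1, 0, 0], [2, 1, 2]]
y = [1, 0, 0, 1, 1, 0]  # 1 = wait for a table, 0 = leave

tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(X, y)
print(tree.predict([[2, 1, 1]]))  # predicted decision for a new situation
```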

Page 50: Tamara Berg Machine Learning

Linear classifier

• Find a linear function to separate the classes

f(x) = sgn(w_1 x_1 + w_2 x_2 + … + w_D x_D) = sgn(w · x)

Slide credit: Svetlana Lazebnik
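
A two-line NumPy illustration of the decision rule f(x) = sgn(w · x); the weight vector and points are made up.

```python
import numpy as np

w = np.array([1.0, -2.0])            # weight vector (made-up values)
X = np.array([[3.0, 1.0], [0.5, 2.0]])
print(np.sign(X @ w))                # -> [ 1. -1.]: predicted class of each point
```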

Page 51: Tamara Berg Machine Learning

Discriminant Function

• It can be an arbitrary function of x, such as: Nearest Neighbor, Decision Tree, Linear Functions

g(x) = w^T x + b

Slide credit: Jinwei Gu

Page 52: Tamara Berg Machine Learning

Linear Discriminant Function

• g(x) is a linear function:

g(x) = w^T x + b

In the (x_1, x_2) feature space this defines a hyperplane: w^T x + b = 0 on the boundary, w^T x + b > 0 on one side, and w^T x + b < 0 on the other (the markers in the figure denote the +1 and -1 classes).

Slide credit: Jinwei Gu

Page 53: Tamara Berg Machine Learning

Linear Discriminant Function

• How would you classify these points (classes +1 and -1) using a linear discriminant function in order to minimize the error rate?

Infinite number of answers!

Slide credit: Jinwei Gu

[Pages 54-55 repeat the question from Page 53]

Page 56: Tamara Berg Machine Learning

Linear Discriminant Function

• How would you classify these points using a linear discriminant function in order to minimize the error rate?

Infinite number of answers! Which one is the best?

Slide credit: Jinwei Gu

Page 57: Tamara Berg Machine Learning

Large Margin Linear Classifier

• The linear discriminant function (classifier) with the maximum margin is the best

• The margin is defined as the width by which the boundary could be increased before hitting a data point (the “safe zone”)

• Why is it the best? Strong generalization ability

Linear SVM. Slide credit: Jinwei Gu

Page 58: Tamara Berg Machine Learning

Large Margin Linear Classifier

Figure: in the (x_1, x_2) plane, the decision boundary w^T x + b = 0 lies midway between the margin hyperplanes w^T x + b = 1 and w^T x + b = -1; the points x+ and x- lying on these margin hyperplanes are the support vectors.

Slide credit: Jinwei Gu

Page 59: Tamara Berg Machine Learning

Large Margin Linear Classifier

• Formulation:

minimize (1/2) ||w||^2

such that

for y_i = +1: w^T x_i + b ≥ 1
for y_i = -1: w^T x_i + b ≤ -1

Slide credit: Jinwei Gu

Page 60: Tamara Berg Machine Learning

Large Margin Linear Classifier

• Formulation:

minimize (1/2) ||w||^2   such that   y_i (w^T x_i + b) ≥ 1

Slide credit: Jinwei Gu

Page 61: Tamara Berg Machine Learning

Solving the Optimization Problem

minimize (1/2) ||w||^2   s.t.   y_i (w^T x_i + b) ≥ 1

This is a quadratic program with linear constraints.

Slide credit: Jinwei Gu

Page 62: Tamara Berg Machine Learning

Solving the Optimization Problem

The linear discriminant function is:

g(x) = Σ_{i ∈ SV} α_i y_i x_i^T x + b

Notice that it relies on a dot product between the test point x and the support vectors x_i.

Slide credit: Jinwei Gu

Page 63: Tamara Berg Machine Learning

Linear separability

Slide credit: Svetlana Lazebnik

Page 64: Tamara Berg Machine Learning

Non-linear SVMs: Feature Space

General idea: the original input space can be mapped to some higher-dimensional feature space where the training set is separable:

Φ: x → φ(x)

Slide courtesy of www.iro.umontreal.ca/~pift6080/documents/papers/svm_tutorial.ppt

Page 65: Tamara Berg Machine Learning

Nonlinear SVMs: The Kernel Trick

With this mapping, our discriminant function becomes:

g(x) = w^T φ(x) + b = Σ_{i ∈ SV} α_i y_i φ(x_i)^T φ(x) + b

No need to know this mapping explicitly, because we only use the dot product of feature vectors in both training and testing.

A kernel function is defined as a function that corresponds to a dot product of two feature vectors in some expanded feature space:

K(x_i, x_j) = φ(x_i)^T φ(x_j)

Slide credit: Jinwei Gu

Page 66: Tamara Berg Machine Learning

Nonlinear SVMs: The Kernel Trick

Examples of commonly used kernel functions:

• Linear kernel: K(x_i, x_j) = x_i^T x_j

• Polynomial kernel: K(x_i, x_j) = (1 + x_i^T x_j)^p

• Gaussian (Radial Basis Function, RBF) kernel: K(x_i, x_j) = exp(-||x_i - x_j||^2 / (2σ^2))

• Sigmoid kernel: K(x_i, x_j) = tanh(β_0 x_i^T x_j + β_1)

Slide credit: Jinwei Gu
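
A small numeric check of the kernel idea (my example, not from the slides): for 2-D inputs, the polynomial kernel (1 + x_i^T x_j)^2 equals an explicit dot product in a 6-dimensional feature space, so that feature mapping never has to be formed.

```python
import numpy as np

def phi(x):
    """Explicit feature map whose dot product reproduces (1 + x.y)^2 for 2-D x."""
    x1, x2 = x
    return np.array([1.0, np.sqrt(2) * x1, np.sqrt(2) * x2,
                     x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])

xi, xj = np.array([1.0, 2.0]), np.array([3.0, -1.0])
k_direct = (1.0 + xi @ xj) ** 2      # kernel evaluated in the input space
k_mapped = phi(xi) @ phi(xj)         # dot product in the expanded feature space
print(k_direct, k_mapped)            # both print 4.0
```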

Page 67: Tamara Berg Machine Learning

Support Vector Machine: Algorithm

1. Choose a kernel function

2. Choose a value for C and any other parameters (e.g. σ)

3. Solve the quadratic programming problem (many software packages available)

4. Classify held out validation instances using the learned model

5. Select the best learned model based on validation accuracy

6. Classify test instances using the final selected model
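
A hedged sketch of steps 1-6 with scikit-learn's SVC; the toy dataset, the candidate values of C and gamma, and the validation split are assumptions made for illustration (for the RBF kernel, gamma plays the role of 1/(2σ^2)).

```python
from sklearn.svm import SVC
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split

# Toy data split into train / validation / test.
X, y = make_moons(n_samples=300, noise=0.2, random_state=0)
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

best_model, best_acc = None, 0.0
for C in [0.1, 1.0, 10.0]:                            # step 2: candidate values of C
    for gamma in [0.1, 1.0]:                          # step 2: kernel parameter
        model = SVC(kernel="rbf", C=C, gamma=gamma)   # step 1: choose a kernel
        model.fit(X_train, y_train)                   # step 3: solve the QP
        acc = model.score(X_val, y_val)               # step 4: classify validation instances
        if acc > best_acc:                            # step 5: keep the best model
            best_model, best_acc = model, acc

print("test accuracy:", best_model.score(X_test, y_test))   # step 6
```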

Page 68: Tamara Berg Machine Learning

Some Issues

• Choice of kernel
  – Gaussian or polynomial kernel is the default
  – If ineffective, more elaborate kernels are needed
  – Domain experts can give assistance in formulating appropriate similarity measures

• Choice of kernel parameters
  – e.g. σ in the Gaussian kernel
  – In the absence of reliable criteria, applications rely on the use of a validation set or cross-validation to set such parameters

This slide is courtesy of www.iro.umontreal.ca/~pift6080/documents/papers/svm_tutorial.ppt

Page 69: Tamara Berg Machine Learning

Summary: Support Vector Machine

1. Large Margin Classifier
  – Better generalization ability & less over-fitting

2. The Kernel Trick
  – Map data points to a higher-dimensional space in order to make them linearly separable.
  – Since only the dot product is used, we do not need to represent the mapping explicitly.

Slide credit: Jinwei Gu

Page 70: Tamara Berg Machine Learning

Boosting

• A simple algorithm for learning robust classifiers
  – Freund & Schapire, 1995
  – Friedman, Hastie, Tibshirani, 1998

• Provides an efficient algorithm for sparse visual feature selection
  – Tieu & Viola, 2000
  – Viola & Jones, 2003

• Easy to implement, doesn’t require external optimization tools.

Slide credit: Antonio Torralba

Page 71: Tamara Berg Machine Learning

Boosting

• Defines a classifier using an additive model:

H(x) = α_1 f_1(x) + α_2 f_2(x) + α_3 f_3(x) + …

where H is the strong classifier, each f_t is a weak classifier, each α_t is a weight, and x is the features vector.

Slide credit: Antonio Torralba

Page 72: Tamara Berg Machine Learning

Boosting

• Defines a classifier using an additive model:

H(x) = α_1 f_1(x) + α_2 f_2(x) + α_3 f_3(x) + …

• We need to define a family of weak classifiers: each f_t(x) is drawn from a family of weak classifiers.

Slide credit: Antonio Torralba

Page 73: Tamara Berg Machine Learning

Adaboost

Slide credit: Antonio Torralba

Page 74: Tamara Berg Machine Learning

Boosting

• It is a sequential procedure. Each data point x_t has a class label y_t = +1 or -1 and a weight, initially w_t = 1.

Slide credit: Antonio Torralba

Page 75: Tamara Berg Machine Learning

Toy example

• Weak learners from the family of lines
• h => p(error) = 0.5: it is at chance
• Each data point has a class label y_t = +1 or -1 and a weight, initially w_t = 1

Slide credit: Antonio Torralba

Page 76: Tamara Berg Machine Learning

Toy example

• This one seems to be the best.
• This is a ‘weak classifier’: it performs slightly better than chance.

Slide credit: Antonio Torralba

Page 77: Tamara Berg Machine Learning

Toy example

• Each data point has a class label y_t = +1 or -1 and a weight. We update the weights: w_t ← w_t exp{-y_t H_t}

Slide credit: Antonio Torralba

[Pages 78-80 repeat the same weight-update step of the toy example]

Page 81: Tamara Berg Machine Learning

Toy example

• The strong (non-linear) classifier is built as the combination of all the weak (linear) classifiers f_1, f_2, f_3, f_4.

Slide credit: Antonio Torralba
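
To make the sequential procedure concrete, here is a hedged, minimal AdaBoost sketch that uses axis-aligned decision stumps as the weak classifiers (a choice made for illustration; the toy example above uses lines). Each round fits the best weak classifier on the weighted data, computes its weight α, and re-weights the points, which is the per-round form of the update shown above.

```python
import numpy as np

def stump_predict(X, feat, thresh, sign):
    """Axis-aligned decision stump: sign * (+1 if x[feat] > thresh else -1)."""
    return sign * np.where(X[:, feat] > thresh, 1.0, -1.0)

def adaboost(X, y, n_rounds=5):
    """Minimal AdaBoost with decision stumps; labels y must be +1/-1."""
    n = len(y)
    w = np.ones(n) / n                          # every point starts with equal weight
    stumps, alphas = [], []
    for _ in range(n_rounds):
        best = None
        # Choose the weak classifier (stump) with the lowest weighted error.
        for feat in range(X.shape[1]):
            for thresh in np.unique(X[:, feat]):
                for sign in (1.0, -1.0):
                    pred = stump_predict(X, feat, thresh, sign)
                    err = np.sum(w * (pred != y))
                    if best is None or err < best[0]:
                        best = (err, feat, thresh, sign, pred)
        err, feat, thresh, sign, pred = best
        err = np.clip(err, 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)   # weight of this weak classifier
        w = w * np.exp(-alpha * y * pred)       # re-weight: misclassified points get heavier
        w = w / w.sum()
        stumps.append((feat, thresh, sign))
        alphas.append(alpha)
    return stumps, alphas

def strong_classify(X, stumps, alphas):
    """Strong classifier: sign of the weighted sum of the weak classifiers."""
    H = sum(a * stump_predict(X, f, t, s) for (f, t, s), a in zip(stumps, alphas))
    return np.sign(H)

# Made-up toy data: two clusters of 2-D points with labels +1 / -1.
X = np.array([[1, 1], [2, 2], [3, 2], [2, 3], [5, 5], [6, 4], [4, 6], [6, 6]], dtype=float)
y = np.array([-1, -1, -1, -1, +1, +1, +1, +1], dtype=float)
stumps, alphas = adaboost(X, y)
print(strong_classify(X, stumps, alphas))       # predictions on the training points
```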

Page 82: Tamara Berg Machine Learning

Adaboost

Slide credit: Antonio Torralba

Page 83: Tamara Berg Machine Learning

Semi-Supervised Learning


Page 85: Tamara Berg Machine Learning

However, for many problems, labeled data can be rare or expensive: you need to pay someone to do it, it may require special testing, …

Unlabeled data is much cheaper.

Slide Credit: Avrim Blum

Page 86: Tamara Berg Machine Learning

However, for many problems, labeled data can be rare or expensive (you need to pay someone to do it, it may require special testing, …), while unlabeled data is much cheaper. Examples: speech, images, medical outcomes, customer modeling, protein sequences, web pages.

Slide Credit: Avrim Blum

Page 87: Tamara Berg Machine Learning

[Figure from Jerry Zhu, making the same point]

Slide Credit: Avrim Blum

Page 88: Tamara Berg Machine Learning


Can we make use of cheap unlabeled data?

Slide Credit: Avrim Blum

Page 89: Tamara Berg Machine Learning

Semi-Supervised Learning

Can we use unlabeled data to augment a small labeled sample to improve learning?

But unlabeled data is missing the most important info! Still, it may have useful regularities that we can use.

Slide Credit: Avrim Blum

Page 90: Tamara Berg Machine Learning

Method 1: EM

Page 91: Tamara Berg Machine Learning

How to use unlabeled data

• One way is to use the EM algorithm
  – EM: Expectation Maximization

• The EM algorithm is a popular iterative algorithm for maximum likelihood estimation in problems with missing data.

• The EM algorithm consists of two steps:
  – Expectation step, i.e., filling in the missing data
  – Maximization step, i.e., calculating a new maximum a posteriori estimate for the parameters

Page 92: Tamara Berg Machine Learning

Algorithm Outline

1. Train a classifier with only the labeled documents.

2. Use it to probabilistically classify the unlabeled documents.

3. Use ALL the documents to train a new classifier.
4. Iterate steps 2 and 3 to convergence.
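
A hedged sketch of this outline for text, using scikit-learn's multinomial Naive Bayes; treating the probabilistic labels as confidence-weighted pseudo-labels is a simplification of full EM, and the documents, classes, and number of iterations are made up for illustration.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

labeled_docs = ["goal scored in the final match", "stocks fell on the open market"]
labels = np.array([0, 1])                       # 0 = sports, 1 = finance
unlabeled_docs = ["a late goal won the match", "markets rallied as stocks rose"]

vec = CountVectorizer()
X_lab = vec.fit_transform(labeled_docs)
X_unl = vec.transform(unlabeled_docs)

# 1. Train a classifier with only the labeled documents.
clf = MultinomialNB().fit(X_lab, labels)

for _ in range(5):                              # 4. iterate steps 2 and 3
    # 2. Probabilistically classify the unlabeled documents.
    proba = clf.predict_proba(X_unl)
    pseudo = proba.argmax(axis=1)
    conf = proba.max(axis=1)
    # 3. Use ALL the documents to train a new classifier,
    #    weighting pseudo-labeled documents by the model's confidence.
    X_all = np.vstack([X_lab.toarray(), X_unl.toarray()])
    y_all = np.concatenate([labels, pseudo])
    w_all = np.concatenate([np.ones(len(labels)), conf])
    clf = MultinomialNB().fit(X_all, y_all, sample_weight=w_all)

print(clf.predict(vec.transform(["another goal in the match"])))
```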

Page 93: Tamara Berg Machine Learning

Method 2: Co-Training

Page 94: Tamara Berg Machine Learning

Co-training[Blum&Mitchell’98] Many problems have two different sources of info

(“features/views”) you can use to determine label.E.g., classifying faculty webpages: can use words on page or words on links pointing to the page.

My AdvisorProf. Avrim Blum My AdvisorProf. Avrim Blum

x2- Text infox1- Link infox - Link info & Text info

Slide Credit: Avrim BlumSlide 99 of 113

Page 95: Tamara Berg Machine Learning

Co-training

Idea: Use a small labeled sample to learn initial rules.
  – E.g., “my advisor” pointing to a page is a good indicator that it is a faculty home page.
  – E.g., “I am teaching” on a page is a good indicator that it is a faculty home page.

Slide Credit: Avrim Blum

Page 96: Tamara Berg Machine Learning

Co-training

Idea: Use a small labeled sample to learn initial rules.
  – E.g., “my advisor” pointing to a page is a good indicator that it is a faculty home page.
  – E.g., “I am teaching” on a page is a good indicator that it is a faculty home page.

Then look for unlabeled examples ⟨x1, x2⟩ where one view is confident and the other is not. Have it label the example for the other.

Train 2 classifiers, one on each type of info, using each to help train the other.

Slide Credit: Avrim Blum

Page 97: Tamara Berg Machine Learning


Co-training Algorithm [Blum and Mitchell, 1998]

Given: labeled data L,

unlabeled data U

Loop:

Train h1 (e.g., hyperlink classifier) using L

Train h2 (e.g., page classifier) using L

Allow h1 to label p positive, n negative examples from U

Allow h2 to label p positive, n negative examples from U

Add these most confident self-labeled examples to L
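
A hedged sketch of the loop above with two Naive Bayes classifiers, one per view; the aligned two-view toy data, the Gaussian model, and letting each classifier self-label only its single most confident example per round (p = n = 1 in spirit) are assumptions made for illustration.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

def cotrain(L1, L2, yL, U1, U2, rounds=3):
    """Minimal co-training sketch.
    L1, L2: the two views of the labeled pool; yL: its labels (0/1).
    U1, U2: the two (aligned) views of the unlabeled pool."""
    L1, L2, yL = list(L1), list(L2), list(yL)
    U1, U2 = list(U1), list(U2)
    for _ in range(rounds):
        h1 = GaussianNB().fit(L1, yL)   # classifier on view 1 (e.g., hyperlink text)
        h2 = GaussianNB().fit(L2, yL)   # classifier on view 2 (e.g., page text)
        for h, U_own in ((h1, U1), (h2, U2)):
            if not U1:
                break
            proba = h.predict_proba(U_own)
            i = int(np.max(proba, axis=1).argmax())   # this view's most confident example
            label = int(proba[i].argmax())
            # Move example i from the unlabeled pool to L (both views, self-labeled).
            L1.append(U1.pop(i)); L2.append(U2.pop(i)); yL.append(label)
    return GaussianNB().fit(L1, yL), GaussianNB().fit(L2, yL)

# Toy two-view data: each view is a 1-D feature; rows of U1 and U2 describe the same examples.
L1 = [[0.0], [5.0]]; L2 = [[0.1], [4.9]]; yL = [0, 1]
U1 = [[0.2], [4.8], [0.3], [5.2]]; U2 = [[0.0], [5.1], [0.4], [4.7]]
h1, h2 = cotrain(L1, L2, yL, U1, U2)
print(h1.predict([[0.1]]), h2.predict([[5.0]]))   # -> [0] [1]
```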

Page 98: Tamara Berg Machine Learning

Watch, Listen & Learn: Co-training on Captioned Images and Videos

Sonal Gupta, Joohyun Kim, Kristen Grauman, Raymond Mooney
The University of Texas at Austin, U.S.A.

Page 99: Tamara Berg Machine Learning

Goals

• Classify images and videos with the help of visual information and associated text captions

• Use unlabeled image and video examples

Page 100: Tamara Berg Machine Learning

Image Examples

Example captions: “Cultivating farming at Nabataean Ruins of the Ancient Avdat”, “Bedouin Leads His Donkey That Carries Load Of Straw”, “Ibex Eating In The Nature”, “Entrance To Mikveh Israel Agricultural School”. Classes: Desert, Trees.

Page 101: Tamara Berg Machine Learning

Approach

• Combining two views of images and videos using the Co-training (Blum and Mitchell ’98) learning algorithm

• Views: Text and Visual

• Text View
  – Caption of image or video
  – Readily available

• Visual View
  – Color, texture, temporal information in image/video

Page 102: Tamara Berg Machine Learning

Co-training

Diagram: initially labeled instances (+, +, -, +), each with a Text View and a Visual View, feed a Text Classifier and a Visual Classifier.

Page 103: Tamara Berg Machine Learning

Co-training

Diagram: supervised learning step. The initially labeled instances (+, +, -, +) train the Text Classifier on their text views and the Visual Classifier on their visual views.

Page 104: Tamara Berg Machine Learning

Co-training

Diagram: unlabeled instances, each with a text view and a visual view, are presented to the Text Classifier and the Visual Classifier.

Page 105: Tamara Berg Machine Learning

Co-training

Diagram: each classifier labels the unlabeled instances it is most confident about (+, +, -, -), producing classifier-labeled instances.

Page 106: Tamara Berg Machine Learning

Co-training

Diagram: the classifiers are retrained on the enlarged labeled set, which now includes the newly self-labeled instances (+, +, -, -).

Page 107: Tamara Berg Machine Learning

Video Features

• Detect interest points
  – Harris-Förstner corner detector over both spatial and temporal space

• Describe interest points
  – Histogram of Oriented Gradients (HoG)

• Create spatio-temporal vocabulary
  – Quantize interest points to create a 200-visual-word dictionary

• Represent each video as a histogram of visual words

[Laptev, IJCV ’05]

Page 108: Tamara Berg Machine Learning

Textual Features

Raw text commentary, e.g.:
• That was a very nice forward camel.
• Well I remember her performance last time.
• He has some delicate hand movement.
• She gave a small jump while gliding.
• He runs in to chip the ball with his right foot.
• He runs in to take the instep drive and executes it well.
• The small kid pushes the ball ahead with his tiny kicks.

Pipeline: raw text commentary → Porter stemmer → remove stop words → standard bag-of-words representation.
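
A hedged sketch of this kind of text pipeline (stemming, stop-word removal, bag-of-words counts); the tiny stop-word list is made up, and NLTK's PorterStemmer stands in for whatever stemmer the authors actually used.

```python
from collections import Counter
from nltk.stem import PorterStemmer

STOP_WORDS = {"a", "the", "to", "with", "his", "her", "in", "that", "was", "i", "he", "she"}  # toy list
stemmer = PorterStemmer()

def bag_of_words(sentence):
    """Lowercase, drop stop words, stem, and count the remaining tokens."""
    tokens = sentence.lower().replace(".", "").split()
    return Counter(stemmer.stem(t) for t in tokens if t not in STOP_WORDS)

print(bag_of_words("He runs in to chip the ball with his right foot."))
# e.g. Counter({'run': 1, 'chip': 1, 'ball': 1, 'right': 1, 'foot': 1})
```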

Page 109: Tamara Berg Machine Learning

Conclusion

• Combining textual and visual features can help improve accuracy

• Co-training can be useful to combine textual and visual features to classify images and videos

• Co-training helps in reducing the labeling of images and videos

[More information on http://www.cs.utexas.edu/users/ml/co-training]


Page 110: Tamara Berg Machine Learning

Co-training vs. EM

• Co-training splits features, EM does not.

• Co-training incrementally uses the unlabeled data.

• EM probabilistically labels all the data at each round; EM iteratively uses the unlabeled data.
