ECS 289H: Visual Recognition Fall 2014 Yong Jae Lee Department of Computer Science



Plan for today

• Questions?

• Research overview


Standard supervised visual learning

Annotators

Category models

Novel images

building

tree

• The number of training images required can be costly to annotate

• Assumes a closed-world setting where all categories are known

Unsupervised visual discovery

Visual world

Discovered categories

Unsupervised visual discovery

Visual world

Object segmentations in images and video

Unsupervised visual discovery

• No human to explicitly guide visual recognition process

Visual world

Storyboard visual summary

1:00 pm 2:00 pm 3:00 pm 4:00 pm

Why visual discovery?

Exploring new environments

Why visual discovery?

Summarization

MSR SenseCam

Why visual discovery?

• 6 billion images

• 70 billion images

• 1 billion images served daily

• 10 billion images

• 100 hours uploaded per minute

Almost 90% of web traffic is visual!

Most of it is unlabeled!!

Inputs today

• Personal photo albums

• Surveillance and security

• Movies, news, sports

• Medical and scientific images

Understand, organize, and index all this data!!

Slide credit: Svetlana Lazebnik

Let’s first explore… what we can do with big data!

Everyday use of big data: Predictive text

Predictive drawing?

ShadowDraw

• video


Research goal: Visual discovery

Visual world

Discovered categories

Key challenges

• Simultaneously estimate segmentation and groups

• Unknown variability in appearance

• What is the proper distance metric?

How similar are two pictures?

• [gray patch] - [gray patch] = grayvalue distance of 50 values

• [point] - [point] in the x-y plane = Euclidean distance of 5 units

• CLIME - CRIME = Hamming distance of 1 letter

• [picture] - [picture] = ?

Slide credit: Alyosha Efros
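The three well-defined distances on this slide can be written down in a few lines, which makes the contrast with the open question (a distance between two pictures) concrete. This is a minimal sketch; the function names and the example inputs other than CLIME/CRIME are illustrative assumptions, not taken from the lecture.

```python
import math

def gray_distance(a: int, b: int) -> int:
    """Absolute difference between two gray values (e.g. in 0..255)."""
    return abs(a - b)

def euclidean_distance(p, q) -> float:
    """Straight-line distance between two points given as coordinate tuples."""
    return math.dist(p, q)

def hamming_distance(s: str, t: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(s) != len(t):
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(c1 != c2 for c1, c2 in zip(s, t))

# Illustrative inputs chosen to reproduce the numbers on the slide.
print(gray_distance(200, 150))              # 50
print(euclidean_distance((0, 0), (3, 4)))   # 5.0
print(hamming_distance("CLIME", "CRIME"))   # 1
```

None of these carries over directly to whole pictures, which is the point of the slide: the appropriate metric for images is itself part of the problem.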

How similar are two pictures?

[picture] ?= [picture]

Problem

Clusters formed from full image matches


Mutual Relationship between Foreground Features and Clusters

• If we have only foreground features, we can form good clusters…

Clusters formed from full image matches

Clusters formed from foreground matches


Mutual Relationship between Foreground Features and Clusters

• If we have good clusters, we can detect the foreground…


Our Approach

• Unsupervised task that iteratively seeks the mutual support between discovered objects and their defining features

• Update clusters based on weighted feature matches

• Refine feature weights given current clusters

[Plot: feature weights vs. feature index]

[Lee & Grauman, Foreground Focus, IJCV 2009]
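The alternation between clustering and feature re-weighting can be sketched on toy data. This is not the authors' implementation: a simple farthest-first seeding stands in for normalized cuts, features are plain 2-D points, and the similarity function, weight update, and all names are assumptions made only for illustration.

```python
import math

def best_match(f, feats):
    """Similarity in (0, 1] between feature f and its closest feature in feats."""
    return max(math.exp(-math.dist(f, g)) for g in feats)

def affinity(a, wa, b, wb):
    """Symmetric matching score between images a and b, normalized by total weight."""
    ab = sum(w * best_match(f, b) for f, w in zip(a, wa)) / sum(wa)
    ba = sum(w * best_match(f, a) for f, w in zip(b, wb)) / sum(wb)
    return 0.5 * (ab + ba)

def cluster(images, weights, k):
    """Toy stand-in for normalized cuts: farthest-first seeds, nearest-seed labels."""
    n = len(images)
    aff = [[affinity(images[i], weights[i], images[j], weights[j])
            for j in range(n)] for i in range(n)]
    seeds = [0]
    while len(seeds) < k:
        rest = [i for i in range(n) if i not in seeds]
        # Next seed: the image least similar to every seed chosen so far.
        seeds.append(min(rest, key=lambda i: max(aff[i][s] for s in seeds)))
    return [max(range(k), key=lambda c: aff[i][seeds[c]]) for i in range(n)]

def refine_weights(images, weights, labels):
    """Weight each feature by how well it matches the other images in its cluster."""
    new_weights = []
    for i, feats in enumerate(images):
        mates = [images[j] for j, lab in enumerate(labels)
                 if lab == labels[i] and j != i]
        if not mates:  # singleton cluster: nothing to compare against
            new_weights.append(list(weights[i]))
            continue
        new_weights.append([sum(best_match(f, m) for m in mates) / len(mates)
                            for f in feats])
    return new_weights

def alternate(images, k, iterations=3):
    """Alternate clustering and weight refinement, starting from uniform weights."""
    weights = [[1.0] * len(feats) for feats in images]
    labels = []
    for _ in range(iterations):
        labels = cluster(images, weights, k)
        weights = refine_weights(images, weights, labels)
    return labels, weights

# Each toy image is [foreground feature, background feature].
images = [
    [(0.0, 0.0), (5.0, 1.0)],
    [(0.1, 0.0), (1.0, 5.0)],
    [(10.0, 10.0), (5.0, 1.2)],
    [(10.0, 10.1), (1.0, 5.2)],
]
labels, weights = alternate(images, k=2)
print(labels)
print([[round(w, 3) for w in ws] for ws in weights])
```

In this toy setup the background features of images 0 and 2 are near-duplicates, so full-image matching is pulled toward the wrong grouping; after re-weighting, the features shared within a cluster dominate the affinity.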


Cluster and Feature Weight Refinement: Iteration 1

• Images as Local Feature Sets

• Pair-wise Matching

• Normalized Cuts Clustering

• Initial Set of Clusters

[Plot: feature weights vs. feature index]

Cluster and Feature Weight Refinement: Iteration 1

• Compute Feature Weights

• New Feature Weights

[Plot: feature weights vs. feature index]

Cluster and Feature Weight Refinement: Iteration 2

• New Set of Clusters

• Compute Feature Weights

• New Feature Weights

[Plot: feature weights vs. feature index]

Cluster and Feature Weight Refinement: Iteration 3

• Pair-wise Matching + Normalized Cuts

• Final Set of Clusters

• New Feature Weights

[Plot: feature weights vs. feature index]

Quality of Clusters Formed

• Black dotted lines indicate the best possible quality that could be obtained if the ground truth segmentation were known


Quality of Foreground Detection

10-class subset: highly weighted features

Shape

• Invariant to lighting conditions

• Relatively stable compared to intra-category appearance (texture, color) variations

Can we discover common object shapes within unlabeled multi-category collections of images?

Anchoring Edge Fragments to Local Patches

Even with accurate patch matches, there’s a limit to how much shape information can be captured.

By anchoring edge fragments to patch features, we can produce more reliable matches and describe the object’s shape.


Foreground Shape Discovery: Prototypical Shape

[Figure: examples of discovered object contours; panel label: Our shapes]

[Lee & Grauman, Shape Discovery, CVPR 2009]

• Works well for object-centric images

• Complex images with multiple objects remain challenging…

Existing approaches

Previous work treats unsupervised visual discovery as an appearance-grouping problem.

[Figure: image regions numbered 1 to 4]

Our idea

How can seeing previously learned objects in novel images help to discover new categories?

[Figure: image regions numbered 1 to 4]

Our idea

Discover visual categories within unlabeled images by modeling interactions between the unfamiliar regions and familiar objects.

[Figure: image regions numbered 1 to 4]

[Lee & Grauman, Object-graphs, CVPR 2010]

Context-aware visual discovery

[Figure: three scenes, each containing an unknown region (?) among familiar regions. Scene labels: drive-way, sky, house, grass; grass, sky, truck, house, drive-way; grass, sky, house, drive-way, fence]

[Lee & Grauman, Object-graphs, CVPR 2010]

Learn “known” categories

• Offline: Train region-based classifiers for N “known” categories using labeled training data.

[Figure: example known categories: sky, road, building, tree]

Pipeline: Learn Models, Detect Unknowns, Object-level Context, Discovery

Identifying unknown regions

Input: unlabeled pool of novel images

Compute multiple segmentations for each unlabeled image


Identifying unknown regions

• Deem each segment as “known” or “unknown” based on the entropy of its class posterior P(class | region): high entropy → prediction: unknown.

[Figure: four histograms of P(class | region); three are predicted “known” and the high-entropy one is predicted “unknown”]
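The entropy test can be sketched as follows. The slide does not give the threshold, so the cutoff used here (a fraction of the maximum possible entropy, log N) and the example posteriors are assumptions for illustration only.

```python
import math

def entropy(posterior):
    """Shannon entropy (in nats) of a discrete distribution P(class | region)."""
    return -sum(p * math.log(p) for p in posterior if p > 0)

def is_unknown(posterior, threshold_fraction=0.8):
    """Flag a segment as unknown when its entropy exceeds a fraction of log(N).

    threshold_fraction is a hypothetical setting, not a value from the lecture.
    """
    max_entropy = math.log(len(posterior))
    return entropy(posterior) > threshold_fraction * max_entropy

confident = [0.90, 0.05, 0.03, 0.02]   # one known class dominates
ambiguous = [0.30, 0.25, 0.25, 0.20]   # no known class explains the region

print(is_unknown(confident))   # False
print(is_unknown(ambiguous))   # True
```

A peaked posterior means one of the N known classifiers is confident, so the region is treated as known; a nearly flat posterior means none of them fits, which is the cue for an unfamiliar category.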


Object-graphs

Model the topology of category predictions relative to the unknown (unfamiliar) region.

[Figure: an unknown region within an image]

Object-graphs

Consider spatially near regions above and below, record distributions for each known class.

[Figure: an unknown region within an image (node 0, self) and the closest nodes in its object-graph (1a, 1b, 2a, 2b, 3a, 3b); each histogram is over the known classes b, t, s, r]

g(s) = [H0(s), H1(s), ..., HR(s)]

• H0(s): the region itself (self)

• H1(s): 1st nearest regions above (1a) and below (1b)

• HR(s): out to the Rth nearest regions above (Ra) and below (Rb)
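One way to assemble such a descriptor is sketched below. The `Region` type, the use of centroids to decide above versus below, and the choice to average the posteriors of the r nearest regions at each level are assumptions made for illustration; the slide only states that class distributions are recorded for near regions above and below, from the 1st nearest out to the Rth.

```python
import math
from dataclasses import dataclass

@dataclass
class Region:
    centroid: tuple   # (x, y) in image coordinates; y grows downward
    posterior: list   # P(class | region) over the N known classes

def mean_posterior(regions, n_classes):
    """Average class distribution of a list of regions (zeros if the list is empty)."""
    if not regions:
        return [0.0] * n_classes
    return [sum(r.posterior[c] for r in regions) / len(regions)
            for c in range(n_classes)]

def object_graph(unknown, others, R):
    """Concatenate [H0, H1, ..., HR] for one unknown region."""
    n = len(unknown.posterior)
    by_distance = sorted(
        others, key=lambda r: math.dist(r.centroid, unknown.centroid))
    above = [r for r in by_distance if r.centroid[1] < unknown.centroid[1]]
    below = [r for r in by_distance if r.centroid[1] >= unknown.centroid[1]]
    descriptor = list(unknown.posterior)             # H0: the region itself
    for r in range(1, R + 1):
        descriptor += mean_posterior(above[:r], n)   # Hr, part above
        descriptor += mean_posterior(below[:r], n)   # Hr, part below
    return descriptor

# Classes in the order b, t, s, r (building, tree, sky, road).
unknown = Region((5, 5), [0.25, 0.25, 0.25, 0.25])
sky = Region((5, 1), [0, 0, 1, 0])
road = Region((5, 9), [0, 0, 0, 1])
building = Region((1, 4), [1, 0, 0, 0])

g = object_graph(unknown, [sky, road, building], R=2)
print(len(g))   # 20 = 4 * (1 + 2 * R)
```

Two unknown regions with different appearance but the same surroundings (for example, sky above and road below) end up with similar descriptors, which is what lets context contribute to the region-region affinities.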


Example object-graphs

[Legend: building, sky, road, unknown]

• Colors indicate the predicted known category (max posterior)

Unknown Regions

Clusters from region-region affinities

Object-level context provides more robust affinities

Results: object discovery accuracy

[Plots for four datasets: MSRC-v2, PASCAL 2008, Corel, MSRC-v0]

Example discoveries


Context-aware face discovery

• System can suggest novel people to name based on their appearance and co-occurrence with familiar people.

[Figure: faces labeled David and Kate appearing alongside an unlabeled face marked “name?”]

[Lee & Grauman, Face discovery, BMVC 2011]

Results: Context-aware face discovery

• Dataset: Gallagher, Friends, Buffy

• 12,542 images, 8,452 faces and 23 unique people

• Two splits: 8 unknowns, and 15 unknowns

[Figure: discovered faces shown with their co-occurring faces]

[Lee & Grauman, Face discovery, BMVC 2011]

Self-paced discovery

• Previous work treats unsupervised visual discovery as a one-pass “batch” procedure.

[Figure: Traditional Batch k-way]

Self-paced discovery

• Focus on the “easier” instances first, and gradually discover new models of increasing complexity.

[Figure: Single Easiest (Ours)]

[Lee & Grauman, Self-paced discovery, CVPR 2011]

Identify Easy Objects

Pipeline stages: Initialize Stuff, Detect Easy Instances, Discover New Category, Expand Knowledge

Easiness (ES) = Objectness (Obj) + Context-Awareness (CA)

[Figure: Familiarity Map (F)]

• Obj: how well a window contains any generic object.

• CA: how well surrounding regions resemble familiar categories.
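The slide's diagram combines the two cues with a "+", so a minimal sketch of the selection step is to score each candidate window by the sum and process windows from easiest to hardest. The window names and scores below are made up for illustration, and treating both cues as already normalized to comparable ranges is an assumption.

```python
def easiness(objectness: float, context_awareness: float) -> float:
    """ES = Obj + CA, following the '+' in the slide's diagram."""
    return objectness + context_awareness

# Hypothetical candidate windows: name -> (Obj, CA).
windows = {
    "w1": (0.9, 0.2),   # object-like, but unfamiliar surroundings
    "w2": (0.6, 0.8),   # fairly object-like, familiar surroundings
    "w3": (0.3, 0.4),   # weak on both cues
}

# Easiest first: these are the instances a self-paced learner would take on first.
ranked = sorted(windows, key=lambda w: easiness(*windows[w]), reverse=True)
print(ranked)   # ['w2', 'w1', 'w3']
```

In a self-paced loop, only the top of this ranking would be used to discover the next category; the remaining windows are re-scored once the newly discovered category has been added to the familiar set.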


Object Discovery Accuracy

[Figure: object discovery accuracy results; extracted values: 3, 9, 22, 29, 12, 13, 14, 20]

Unsupervised visual discovery

Visual world

Object segmentations in images and video

Collect-Cut

Unsupervised Segmentation Examples

[Figure: Unlabeled Images; Discovered Ensemble from Unlabeled Multi-Object Images; Best Bottom-up (with multi-segs) compared with Collect-Cut (ours)]

[Lee & Grauman, Collect-Cut, CVPR 2010]

Problem: Video object segmentation

Input: Unannotated video

Desired output: Segmentation of high-ranking foreground object

• Existing methods group pixels using low-level features, which can result in an “over-segmentation.” [Brendel & Todorovic 2009, Vazquez-Reina et al. 2010, Grundmann et al. 2010, Brox & Malik 2010]

How to segment the foreground objects in video when

– background is moving and changing

– categories of foreground objects are unknown in advance

Key-segment discovery

• Discover a set of “object-like” key-segments for category-independent video object segmentation

– Resist over-segmentation by detecting regions with “object-like” appearance and motion

[Lee, Kim, Grauman, Key-segments, ICCV 2011]

Key-segment discovery

1) Find object-like regions using appearance and motion cues

2) Group regions across video to discover key-segment hypotheses

3) Rank hypotheses and build segmentation models for each hypothesis

4) For a given hypothesis, segment the corresponding foreground object using the models

[Figure: color model, shape model, output segmentation]

[Lee, Kim, Grauman, Key-segments, ICCV 2011]

Results: Key-segment video segmentation

• Detect and segment people and discovered important objects without category-specific models

• Success in spite of moving camera, background changes, low resolution

Results: Key-segment video segmentation

[Figure: two examples, each comparing Grundmann et al. 2010 with ours]

• Resists over-segmentation by detecting regions with “object-like” appearance and motion

Results: Key-segment video segmentation

[Table: segmentation error rate. Baselines: [29]: Tsai et al. BMVC 2010, [7]: Chockalingam et al. ICCV 2009]

• Background subtraction falls apart

• Ours produces state-of-the-art results even when compared to supervised methods

Unsupervised visual discovery

Visual world

Storyboard visual summary

1:00 pm 2:00 pm 3:00 pm 4:00 pm

Mining first-person camera data

[Devices: GoPro, Google Glass, Looxcie, Pivothead, Tobii, SMI]

Mining first-person camera data

[Photo: Steve Mann, life logger, 90’s]

Problem: Summarizing egocentric videos

Input: Egocentric video of the camera wearer’s day (wearable camera)

Output: Storyboard summary of discovered important people and objects

9:00 am 10:00 am 11:00 am 12:00 pm 1:00 pm 2:00 pm

[Lee, Ghosh, Grauman, Egocentric video summarization, CVPR 2012]

Important person/object discovery

• Discover important people and objects for egocentric video summarization

– Important: things with which the camera wearer has significant interaction

[Lee, Ghosh, Grauman, Egocentric video summarization, CVPR 2012]

Data collection

• 15 fps, 320 x 480 resolution

• 10 videos, 3-5 hrs in length; total of 37 hrs

• Four subjects: one undergraduate, two grad students, and one office worker

Pipeline stages: Collect training data, Learn Importance, Segment video into events, Discover important regions, Storyboard summary


Learning region importance

Egocentric features: distance to hand, distance to frame center, frequency

Object features: object-like appearance and motion (candidate region’s appearance and motion compared with the surrounding area’s appearance and motion), overlap w/ face detection

Region features: size, width, height, centroid

Learning region importance

• Regressor to learn and predict a region’s degree of importance

• Expect significant interactions between the features; e.g., a region near the hand is important only if it is object-like in appearance

• For training: [equation not preserved; its annotations label the importance I(r), the learned parameters, and the i’th feature value xi(r)]

• For testing: predict I(r) given xi(r)’s
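The training equation itself did not survive extraction. One model form consistent with the bullets (a regressor over feature values with explicit feature interactions) is a linear model with pairwise product terms; the sketch below assumes that form, and every parameter value and feature name in it is hypothetical.

```python
import math
from itertools import combinations

def predict_importance(x, beta0, beta, beta_pair):
    """I(r) = beta0 + sum_i beta[i]*x[i] + sum_{i<j} beta_pair[(i, j)]*x[i]*x[j].

    Assumed model form: linear terms plus pairwise interaction terms.
    """
    linear = sum(b * xi for b, xi in zip(beta, x))
    pairwise = sum(beta_pair[(i, j)] * x[i] * x[j]
                   for i, j in combinations(range(len(x)), 2))
    return beta0 + linear + pairwise

# Hypothetical features: x[0] = nearness to hand, x[1] = object-likeness.
beta0 = 0.0
beta = [0.1, 0.1]            # weak effect of each feature on its own
beta_pair = {(0, 1): 0.8}    # strong effect only when both are present

near_hand_and_object_like = predict_importance([1.0, 1.0], beta0, beta, beta_pair)
near_hand_only = predict_importance([1.0, 0.0], beta0, beta, beta_pair)
print(near_hand_and_object_like, near_hand_only)
```

The interaction weight captures the slide's example: being near the hand contributes little on its own, but a lot when the region is also object-like. In practice the parameters would be fit to the collected training data rather than set by hand.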


Results: Important region prediction

Good predictions

[Figure columns: Ours; Object-like [Carreira, 2010]; Object-like [Endres, 2010]; Saliency [Walther, 2006]]

Results: Important region prediction

Failure cases

[Figure columns: Ours; Object-like [Carreira, 2010]; Object-like [Endres, 2010]; Saliency [Walther, 2006]]

Generating a storyboard summary

[Figure: storyboard frames grouped into Event 1, Event 2, Event 3, Event 4]

• Display event boundaries and frames of the selected important people and objects

Results: Egocentric video summarization

[Figure: original video (3 hours) compared with our summary (12 frames)]

Results: Egocentric video summarization

Fine-grained recognition


AverageExplorer

• video


Coming up

• Sign up for papers

• Next class

– Object Recognition from Local Scale-Invariant Features. D. Lowe. ICCV 1999.

– Video Google: A Text Retrieval Approach to Object Matching in Videos. J. Sivic and A. Zisserman. ICCV 2003.

• Read both papers

• Write a review for one of them