25
1 Chen et al., Analysis of Visual Similarity in News Videos with Robust and Memory-Efficient Image Retrieval Analysis of Visual Similarity in News Videos Analysis of Visual Similarity in News Videos with Robust and Memory with Robust and Memory - - Efficient Efficient Image Retrieval Image Retrieval David Chen, Peter Vajda, Sam Tsai, Maryam Daneshi, Matt Yu, Huizhong Chen, Andre Araujo, Bernd Girod Image, Video, and Multimedia Systems Group Stanford University

Analysis of visual similarity in news videos with robust and memory efficient image retrieval

Embed Size (px)

DESCRIPTION

Analysis of visual similarity in news videos with robust and memory efficient image retrieval

Citation preview

Page 1: Analysis of visual similarity in news videos with robust and memory efficient image retrieval

11Chen et al., Analysis of Visual Similarity in News Videos with Robust and Memory-Efficient Image Retrieval

Analysis of Visual Similarity in News Videos Analysis of Visual Similarity in News Videos with Robust and Memorywith Robust and Memory--Efficient Efficient

Image RetrievalImage Retrieval

David Chen, Peter Vajda, Sam Tsai, Maryam Daneshi, Matt Yu, Huizhong Chen, Andre Araujo, Bernd Girod

Image, Video, and Multimedia Systems GroupStanford University

Page 2: Analysis of visual similarity in news videos with robust and memory efficient image retrieval

22Chen et al., Analysis of Visual Similarity in News Videos with Robust and Memory-Efficient Image Retrieval

Page 3: Analysis of visual similarity in news videos with robust and memory efficient image retrieval

33Chen et al., Analysis of Visual Similarity in News Videos with Robust and Memory-Efficient Image Retrieval

Page 4: Analysis of visual similarity in news videos with robust and memory efficient image retrieval

44Chen et al., Analysis of Visual Similarity in News Videos with Robust and Memory-Efficient Image Retrieval

Page 5: Analysis of visual similarity in news videos with robust and memory efficient image retrieval

55Chen et al., Analysis of Visual Similarity in News Videos with Robust and Memory-Efficient Image Retrieval

Plays 30 second clip around query phrase match

Would benefit from accurate segmentation of stories

Would benefit from reliable generation of summary clips

Page 6: Analysis of visual similarity in news videos with robust and memory efficient image retrieval

66Chen et al., Analysis of Visual Similarity in News Videos with Robust and Memory-Efficient Image Retrieval

Applications of Anchor DetectionApplications of Anchor Detection1. Provide strong cues for story segmentation

2. Extract news story summaries/previews

3. Identify anchors for general person recognition

TURNING TO TECH, SHARES OF RESEARCH IN MOTION REBOUNDED FROM A ONE MONTH LOW. THE COMPANY'S NEXT GENERATION BLACKBERRY-10 PRODUCT LINE IS EXPECTED TO BE UNVEILED IN JUST A FEW WEEKS. YOU MAY REMEMBER SHARES SOLD OFF LAST WEEK AFTER THE COMPANY ISSUED A CAUTIOUS OUTLOOK FOR ITS FOURTH QUARTER RESULTS. BUT TODAY SHARES BOUNCED BACK: UP 11.5% TO A UNDER $12.

Anchor Brian Williams

Anchor Susie Gharib

Don’t confuse anchors with other people in the videos

Page 7: Analysis of visual similarity in news videos with robust and memory efficient image retrieval

77Chen et al., Analysis of Visual Similarity in News Videos with Robust and Memory-Efficient Image Retrieval

Applications of Preview MatchingApplications of Preview Matching1. Provide strong cues for story segmentation

2. Extract news story summaries/previews

3. Indicate the most important stories in a broadcast

JUST A MESS. IN WASHINGTON, LAWMAKERS LEAVE TOWN FOR THE HOLIDAYS. THE CLOCK TICKS DOWN TO THE SO-CALLED FISCAL CLIFF. LATE TODAY, THE PRESIDENT HASTILY APPEARS TO ASK IF SOME OF THIS BUSINESS CAN BE FINISHED SOON.

Page 8: Analysis of visual similarity in news videos with robust and memory efficient image retrieval

88Chen et al., Analysis of Visual Similarity in News Videos with Robust and Memory-Efficient Image Retrieval

OutlineOutline• Related work in news video analysis• Long-range visual similarity• Anchor detection algorithm• Preview matching algorithm• Experimental results

Page 9: Analysis of visual similarity in news videos with robust and memory efficient image retrieval

99Chen et al., Analysis of Visual Similarity in News Videos with Robust and Memory-Efficient Image Retrieval

Related Work in News Video AnalysisRelated Work in News Video Analysis• Model-based anchor detection

[Zhang et al., 1998] [Hanjalic et al., 1998] [Liu et al., 2000]

• Model-free anchor detection[Gao et al., 2002] [De Santo et al., 2006] [D’Anna et al., 2007] [Ma et al., 2008] [Broilo et al., 2011]

• Spatio-temporal slices for reporter detection[Liu et al., 2007] [Zheng et al., 2010]

• Classification of news video shots[Bertini et al., 2001] [Xiao et al., 2010] [Lee et al., 2011]

Page 10: Analysis of visual similarity in news videos with robust and memory efficient image retrieval

1010Chen et al., Analysis of Visual Similarity in News Videos with Robust and Memory-Efficient Image Retrieval

LongLong--Range Visual SimilarityRange Visual Similarity

Frame Number

Fram

e N

umbe

r

1 501 1001 1501 2001 2501 3001

1

501

1001

1501

2001

2501

3001

0

0.05

0.1

0.15

0.2

0.25

0.3

0.35

0.4

0.45

0.5

Page 11: Analysis of visual similarity in news videos with robust and memory efficient image retrieval

1111Chen et al., Analysis of Visual Similarity in News Videos with Robust and Memory-Efficient Image Retrieval

LongLong--Range Visual SimilarityRange Visual Similarity

Frame Number

Fram

e N

umbe

r

1 501 1001 1501 2001 2501 3001 3501

1

501

1001

1501

2001

2501

3001

3501 0

0.05

0.1

0.15

0.2

0.25

0.3

0.35

0.4

0.45

0.5

What causes these long-range visual similarities?What causes these long-range visual similarities?

Page 12: Analysis of visual similarity in news videos with robust and memory efficient image retrieval

1212Chen et al., Analysis of Visual Similarity in News Videos with Robust and Memory-Efficient Image Retrieval

LongLong--Range Visual SimilarityRange Visual Similarity

NBC Nightly News on Dec. 21, 2012

Page 13: Analysis of visual similarity in news videos with robust and memory efficient image retrieval

1313Chen et al., Analysis of Visual Similarity in News Videos with Robust and Memory-Efficient Image Retrieval

LongLong--Range Visual SimilarityRange Visual SimilarityAnchor: Brian

Williams

Analyst: David

Gregory

Reporter: Kelly

O’Donnell

Reporter: Andrea Mitchell

Page 14: Analysis of visual similarity in news videos with robust and memory efficient image retrieval

1414Chen et al., Analysis of Visual Similarity in News Videos with Robust and Memory-Efficient Image Retrieval

LongLong--Range Visual SimilarityRange Visual Similarity

Page 15: Analysis of visual similarity in news videos with robust and memory efficient image retrieval

1515Chen et al., Analysis of Visual Similarity in News Videos with Robust and Memory-Efficient Image Retrieval

Anchor Detection PipelineAnchor Detection Pipeline

Exclude Frames

Without Faces

Extract Image Signatures

Compare Image

Signatures

Form Initial Anchor

Candidates

Prune Away False

Candidates

Include Temporally

Nearby Candidates

Keyframes

Detections

Similarity Matrix

Count number of long-range local peaks in the current row of the similarity matrix and pick initial

candidates from high-count rows

Compare initial candidates to one another and prune out candidates which are not very similar to

the other initial candidates

From pruned set of candidates, expand to include temporally nearby candidates which are also very

similar in appearance

Page 16: Analysis of visual similarity in news videos with robust and memory efficient image retrieval

1616Chen et al., Analysis of Visual Similarity in News Videos with Robust and Memory-Efficient Image Retrieval

IntraIntra--Episode vs. InterEpisode vs. Inter--EpisodeEpisode• Intra-episode: compare frames within a single

episode of a news program• Inter-episode: compare frames between different

episodes of a news program

Page 17: Analysis of visual similarity in news videos with robust and memory efficient image retrieval

1717Chen et al., Analysis of Visual Similarity in News Videos with Robust and Memory-Efficient Image Retrieval

Preview Matching PipelinePreview Matching Pipeline

Detect and Recognize

Text

Adaptively Crop to Preview Region

Extract Image Signature

Compare Image

Signatures

Verify Geometry in

Shortlist

JUST A MESS

JUST A MESS

COMING UP

COMING UP

Database of Image Signatures

Frame

Matches

Page 18: Analysis of visual similarity in news videos with robust and memory efficient image retrieval

1818Chen et al., Analysis of Visual Similarity in News Videos with Robust and Memory-Efficient Image Retrieval

REVV: Residual Enhanced Visual VectorREVV: Residual Enhanced Visual Vector

Extract Local Features

Vector Quantize to

Visual Words

Visual Codebook

Perform Mean Aggregation of Residuals

Query Image

……

Regularize with Power

Law

Reduce Dimensions

by LDA

Binarize Components

from Sign

Compute Weighted

Correlations

Database Signatures

Ranked List1.741.751.791.801.831.84

Page 19: Analysis of visual similarity in news videos with robust and memory efficient image retrieval

1919Chen et al., Analysis of Visual Similarity in News Videos with Robust and Memory-Efficient Image Retrieval

Experimental SetupExperimental Setup• Anchor detection

– Training on 12 episodes of NBC Nightly News (1 anchor/episode), ABC World News (1 anchor/episode), Nightly Business Report (2 anchors/episode)

– Testing on 21 episodes of same three programs– Measure precision / recall / F-score

• Preview matching– Testing on 10 episodes of NBC Nightly News and ABC

World News– Measure precision / recall / F-score

• Comparison of two memory-efficient signatures– GIST: 66 MB/episode [Oliva et al., 2001] [Douze et al., 2009]– REVV: 10 MB/episode [Chen et al., 2013]

Page 20: Analysis of visual similarity in news videos with robust and memory efficient image retrieval

2020Chen et al., Analysis of Visual Similarity in News Videos with Robust and Memory-Efficient Image Retrieval

Anchor Detection ResultsAnchor Detection Results

Recall Precision F-ScoreGIST Intra 0.53 0.84 0.65REVV Intra 0.87 0.90 0.88

Page 21: Analysis of visual similarity in news videos with robust and memory efficient image retrieval

2121Chen et al., Analysis of Visual Similarity in News Videos with Robust and Memory-Efficient Image Retrieval

Anchor Detection ResultsAnchor Detection Results

Recall Precision F-ScoreREVV Intra 0.87 0.90 0.88

REVV Intra + Inter 0.90 0.91 0.90

Page 22: Analysis of visual similarity in news videos with robust and memory efficient image retrieval

2222Chen et al., Analysis of Visual Similarity in News Videos with Robust and Memory-Efficient Image Retrieval

Preview Matching ResultsPreview Matching Results

Recall Precision F-ScoreGIST 0.48 1.00 0.65REVV 0.90 1.00 0.95

Type A: Preview occurs at beginning of broadcast

Page 23: Analysis of visual similarity in news videos with robust and memory efficient image retrieval

2323Chen et al., Analysis of Visual Similarity in News Videos with Robust and Memory-Efficient Image Retrieval

Preview Matching ResultsPreview Matching Results

Recall Precision F-ScoreGIST 0.62 1.00 0.77REVV 0.93 1.00 0.96

Type B: Preview occurs prior to a commercial

Page 24: Analysis of visual similarity in news videos with robust and memory efficient image retrieval

2424Chen et al., Analysis of Visual Similarity in News Videos with Robust and Memory-Efficient Image Retrieval

ConclusionsConclusions• Long-range visual similarity in news videos provides

a general and effective method for anchor detection and preview matching

• A robust image signature is required to handle challenging appearance variations throughout a newscast

• The image signature should be memory-efficient to enable parallelized processing of large video archives

Page 25: Analysis of visual similarity in news videos with robust and memory efficient image retrieval

Thank YouThank YouThank [email protected]@stanford.edu