Supplementary Material for TVSum: Summarizing …openaccess.thecvf.com/content_cvpr_2015/supplemental/...Supplementary Material for TVSum: Summarizing Web Videos Using Titles Yale

Supplementary Material for

TVSum: Summarizing Web Videos Using Titles

Yale Song, Jordi Vallmitjana, Amanda Stent, Alejandro Jaimes

Yahoo Labs, New York

{yalesong,jvallmi,stent,ajaimes}@yahoo-inc.com

This supplementary material includes detailed information about our experiments presented in Section 5.2. Below gives a

summary of the material; see captions in each Table and Figure for more detailed explanations.

• Figure 1 shows a tutorial page and the annotation user interface of our Amazon Mechanical Turk (AMT) study.

• Table 1 shows the original titles (provided in the SumMe dataset) and our substituted titles, along with YouTube unique

video identifiers (some are missing because we could not find them).

• Figure 2 and Figure 3 show a list of 50 videos in our TVSum50 dataset.

• Figure 4 shows per-video F1 scores on our TVSum50 dataset.

• Figure 4 shows detailed experimental results (show importance scores and visualizations of the learned co-archetypes)

on our TVSum50 dataset.

1

978-1-4673-6964-0/15/$31.00 ©2015 IEEE

(a)

(b)

(c)

target range et ra

Figure 1. Amazon Mechanical Turk study user interface. A tutorial page (top) and the annotation user interface (bottom). (a) By clicking

the image below “Watch Video,” users can replay the video at any time during the task. (b) A group of visually similar shots; we clustered

shots using the k-means algorithm. Users can assign a score to each shot in a group (by clicking a thumbnail image), or all shots in a group

as a whole (by clicking the group header). The groups and shots are presented in a random order to avoid the chronological bias. (c) A

monitoring gadget showing the current progress towards the target range of each score.

Table 1. Substituted titles on the SumMe dataset. We used the original title of a video whenever possible. “(no change)” means we used

the title as provided in the SumMe dataset. For those videos that we were able to find from the YouTube website, we also include their

Youtube unique video identifiers. For the video ID u9E9HpuJQ7U, we changed the title to “Hibachi Cooking” because the original title

“Amazing Cooking!”) was not very descriptive of the video content.

Original Title Substituted Title Youtube ID (link)

Air Force One Barack Obama Berlin |Die Air Force One schwebt ber TXL ein uWIftWgKFc8

Base Jumping Base Jumping, Wingsuit in Norway 8rmIDJEYjoU

Bearpark Climbing Brenpark Bern |Berna, Ursina |Bjrk am klettern im Baum on32amsEoNA

Bike Polo (no change)

Bus in Rock Tunnel Charter bus in rock tunnel bBcsQcOyuIo

Car over camera Car runs over camera

Car railcrossing Meanwhile At The Railway Crossing R 0 o1N9hBw

Cockpit Landing Cockpit landing

Cooking Hibachi Cooking u9E9HpuJQ7U

Eiffel Tower (no change)

Excavators river crossing Crazy Russians crossing river excavators nGMxQfJe7OA

Fire Domino Amazing Fire Domino U1oT 6HzzVI

Jump Giant Waterslide Jump w4vW3fsJ8pk

Kids playing in leaves (no change)

Notre Dame (no change)

Paintball Paintball fight

Paluma Jump Rock Jumping at little Crystal Creek

Playing ball Crow and dog playing with pingpongball ZOYuB7-KHeU

Playing on water slide (no change) pxnRYQnz6Ws

Saving Dolphins Dolphins stranding and incredibly saved ekmMD8oYtJ0

Scuba Scuba diving

St Maarten Landing St. Maarten KLM Boeing 747 landing SCIJ0F62og4

Statue of Liberty (no change)

Uncut evening flight (no change) Eqs328fkiaQ

Valparaiso Downhill (no change) 9hvfYvqS-bE

[VT] Changing a vehicle tire

How to change tires for

off road vehicles

How to use a tyre repair kit

- Which? guide

Flat tire KODA Tips

How to Repair Your Tyre

When to Replace

Your Tires GMC

AwmHb44 ouw (5:54) 98MoyGZKHXc (3:07) J0nA4VgnoCo (9:44) gzDbaEs1Rlg (4:48) XzYM3PfTM4w (1:51)

how-to how-to vlog how-to how-to

[VU] Getting a vehicle unstuck

The stuck truck of Mark,

The rut that filled...

BBC -

Train crash

Girl gets van stuck

in the back forty

Smart Electric Vehicle

Balances on Two Wheels

Electric cars making

earth more green

HT5vyqe0Xaw (5:22) sTEELN-vY30 (2:29) vdmoEJ5YbrQ (5:29) xwqBXPGE9pQ (3:53) akI8YFjEmUw (2:13)

egocentric news egocentric interview news

[GA] Grooming an animal

Pet Joy Spa Grooming

Services — Brentwood, CA

I am a puppy dog groomer Nail clipper Gloria Pets

professional grooming

Dog Grooming

in Buenos Aires

How to Clean Your Dog’s

Ears - Vetoquinol USA

i3wAGJaaktw (2:36) Bhxk-O1Y7Ho (7:30) 0tmA C6XwfM (2:21) 3eYKfiOEJNs (3:14) xxdtq8mxegs (2:24)

commercial vlog how-to story how-to

[MS] Making a sandwich

Mexican Fried Chicken

Sandwich Recipe

Reuben Sandwich with

Corned Beef & Sauerkraut

Poor Man’s Meals:

Spicy Sausage Sandwich

Saigon Sandwich -

Vietnamese Sandwiches...

...Joseph Leonard’s

Fried Chicken Sandwich...

WG0MBPpPC6I (6:37) Hl- g2gn A (4:03) Yi4Ij2NM7U4 (6:45) 37rzWOQsNIw (3:11) LRw obCPUt0 (4:20)

how-to how-to how-to vlog story

[PK] Parkour

David Belle |Fondateur du parkour...

Charlotte Parkour

|Charlotte Video Project

Singapore Parkour

Free Running...

Parkour Camp Leipzig Jam Parkour Via del Mar

cjibtmSLxQ4 (10:47) b626MiF1ew4 (3:55) XkqCExn6 Us (3:08) GsAD1KT1xo8 (2:25) PJrm840pAUI (4:34)

story interview UGC UGC UGC

Figure 2. TVSum50 Dataset (page 1 of 2). Videos are grouped into query categories (e.g., [VT] Changing a vehicle tire). For each video,

we show its title, thumbnail image, YouTube unique video identifier, duration, and genre. Click thumbnail images to watch videos.

[PR] Parade

Chinese New Year Parade

NYC Chinatown

Alumnae Parade &

Laurel Chain Ceremony

LA Kings Stanley Cup

South Bay Parade

Chinatown Parade Southend-on-Sea and

Kortrijk Bike Friendly...

91IHQYk1IQM (1:50) RBCABdttQmI (6:04) z 6gVvQb2d0 (4:36) fWutDQy1nnY (9:45) 4wU LUjG5Ic (2:47)

UGC UGC egocentric vlog UGC

[FM] Flash mob gathering

Flash mob protest for Syria |Sydney, Australia

ICC World Twenty

Bangladesh Flash Mob...

Bluffton Teachers

Flash Mob Dance...

ICC World Twenty

Bangladesh, Flash Mob...

CASACL - Flashmob in

Copenhagen underground...

VuWGsYPqAX8 (3:36) JKpqYvAdIsw (2:32) xmEERLqJ2kU (7:26) byxOvuiIJV0 (2:34) xMr-HKMfVA ()

UGC UGC egocentric UGC UGC

[BK] Beekeeping

Beekeeping 101 A Year of Beekeeping Paper Wasp Removal |From the Ground Up

Killer Bees Kill 1000-lb

Hog in Bisbee AZ

Apis Mellifera in a

Vertical Log Hive

WxtbjNsCQ8A (4:25) uGu 10sucQo (2:47) EE-bNr36nyA (1:38) Se3oxnaPsz0 (2:18) oDXZc0tZe04 (6:20)

interview UGC how-to news UGC

[BT] Attempting a bike trick

...How to Whip

a Motocross Bike

Smage Bros.

Motorcycle Stunt Show

How to stop your Fixie How to lock your bike.

The RIGHT way!

Pure Fix TV:

How to Wheelie

qqR6AEXwxoQ (4:29) EYqVtI9YWJA (3:18) eQu1rNs0an0 (2:44) JgHubY5Vw3Y (2:23) iVt07TCkFM0 (1:44)

interview interview how-to how-to how-to

[DS] Dog show

German Shepherd

Dog Show (KCI)...

The Dog Show HD Oliver’s Show - Dog’s tale ...Obie the obese dog

works toward weight loss

Will A Cat Eat Dog Food?

E11zDS9XGzg (8:30) NyBmCxDoHJU (3:09) kLxoNp-UchI (2:10) jcoYJXDG9sw (3:19) -esJrBWj2d8 (3:50)

vlog UGC UGC news story egocentric

Figure 3. TVSum50 Dataset (page 2 of 2). Videos are grouped into query categories (e.g., [PR] Parade). For each video, we show its title,

thumbnail image, YouTube unique video identifier, duration, and genre. Click thumbnail images to watch videos.

AwmHb44_ouw 98MoyGZKHXc J0nA4VgnoCo gzDbaEs1Rlg XzYM3PfTM4w Average0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

Mean Pairwise F1−Scores on the TVSum50 Dataset (category: VT)

0.5966

(0.5833)

0.4304

(0.4483)

0.5510

(0.4848)

0.4851

(0.4814)

0.5451

(0.4671)

0.5216

(0.4679)

+

o

o

o+

o

+

SU SR CK CS LL WP AA1 AA2 CA

HT5vyqe0Xaw sTEELN−vY30 vdmoEJ5YbrQ xwqBXPGE9pQ akI8YFjEmUw Average0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

Mean Pairwise F1−Scores on the TVSum50 Dataset (category: VU)

0.5621

(0.4874)

0.5484

(0.4969)

0.5035

(0.5122)

0.5793

(0.5726)

0.5492

(0.6536)

0.5485

(0.5161)

o

o

o

o ++


i3wAGJaaktw Bhxk−O1Y7Ho 0tmA_C6XwfM 3eYKfiOEJNs xxdtq8mxegs Average0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

Mean Pairwise F1−Scores on the TVSum50 Dataset (category: GA)

0.5593

(0.4555)

0.4816

(0.5709)

0.4459

(0.4761)

0.1461

(0.5284)

0.3945

(0.4579)

0.4055

(0.4596)


WG0MBPpPC6I Hl−__g2gn_A Yi4Ij2NM7U4 37rzWOQsNIw LRw_obCPUt0 Average0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

Mean Pairwise F1−Scores on the TVSum50 Dataset (category: MS)

0.6093

(0.5007)

0.6247

(0.5800)

0.5842

(0.4215)

0.5854

(0.4712)

0.4992

(0.4645)

0.5806

(0.4525)

o

oo

o

+

o

+o


cjibtmSLxQ4 b626MiF1ew4 XkqCExn6_Us GsAD1KT1xo8 PJrm840pAUI Average0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

Mean Pairwise F1−Scores on the TVSum50 Dataset (category: PK)

0.3612

(0.5895)

0.4959

(0.5370)

0.3823

(0.4320)

0.4931

(0.5253)

0.4611

(0.4312)

0.4387

(0.4875)

+

+

+


91IHQYk1IQM RBCABdttQmI z_6gVvQb2d0 fWutDQy1nnY 4wU_LUjG5Ic Average0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

Mean Pairwise F1−Scores on the TVSum50 Dataset (category: PR)

0.5271

(0.4276)

0.4858

(0.4428)

0.6045

(0.5851)

0.5502

(0.4889)

0.4967

(0.4901)

0.5329

(0.4195)

o

o

o

oo

+

+


VuWGsYPqAX8 JKpqYvAdIsw xmEERLqJ2kU byxOvuiIJV0 _xMr−HKMfVA Average0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

Mean Pairwise F1−Scores on the TVSum50 Dataset (category: FM)

0.4296

(0.4892)

0.4423

(0.4641)

0.5922

(0.5118)

0.5003

(0.4786)

0.6061

(0.4572)

0.5141

(0.4204)

oo

+

+ +


WxtbjNsCQ8A uGu_10sucQo EE−bNr36nyA Se3oxnaPsz0 oDXZc0tZe04 Average0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

Mean Pairwise F1−Scores on the TVSum50 Dataset (category: BK)

0.5597

(0.4274)

0.4896

(0.4248)

0.4392

(0.5284)

0.3638

(0.5690)

0.5041

(0.5345)

0.4713

(0.4408)

+o

o

+


qqR6AEXwxoQ EYqVtI9YWJA eQu1rNs0an0 JgHubY5Vw3Y iVt07TCkFM0 Average0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

Mean Pairwise F1−Scores on the TVSum50 Dataset (category: BT)

0.4898

(0.6513)

0.3674

(0.5231)

0.4776

(0.4369)

0.5329

(0.4959)

0.5732

(0.4941)

0.4882

(0.4609)

+

o

o

oo


E11zDS9XGzg NyBmCxDoHJU kLxoNp−UchI jcoYJXDG9sw −esJrBWj2d8 Average0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

Mean Pairwise F1−Scores on the TVSum50 Dataset (category: DS)

0.4704

(0.5211)

0.5418

(0.5221)

0.4104

(0.4663)

0.4533

(0.6211)

0.5126

(0.4614)

0.4777

(0.5184)

ooo

+

ooo


Figure 4. Experimental results on each video category on the TVSum50 dataset. Models compared include uniform sampling (SU),

random sampling (SR), k-means clustering (CK), spectral clustering (CS), LiveLight by Zhao and Xing (LL), web image prior by Khosla et

al. (WP), archetypal analysis with video frames (AA1), archetypal analysis with a combination of video frames and images (AA2), and

our co-archetypal analysis (CA). Bold-faced numbers represent our results; numbers in parentheses represent the highest score from any

baseline. The last column shows average scores, where we also show the statistical significance of the pairwise difference between our

approach (CA) and the baselines (the ’o’ mark indicates p < 0.01; the ’+’ mark indicates p < 0.05). Except for a few videos, our approach

outperforms all other baselines.

(a) Ruben Sandwich with Corned Beef & Sauerkraut F1Score = 0.6247

vid

eo

imag

e

(b) CASACL Ð Flashmob in Copenhagen underground Ð Peer Gynt F1Score = 0.6061

vid

eo

imag

e

F1Score = 0.3612 (c) David Belle | Fondatuer du parkour | Reportage de TF1

vid

eo

imag

e

F1Score = 0.1461 (d) Dog grooming in Buenos Aires

vid

eo

imag

e

Human Labeled Score

Co-archetypal Analysis Score

Human Labeled Score


Human Labeled Score


Human Labeled Score


Figure 5. Detailed results of co-archetypal analysis on the TVSum50 dataset. Shown here are four example results (a-d), each with a

video title, an F1 score of our approach, video frames, two heatmap representations of normalized shot importance scores (one from human

labels, another from our approach), and four learned co-archetypes represented in terms of video frames and images; stacked in that order.

(a-b) Two example results with highest F1 scores; (c-d) two example results with lowest F1 scores. The results with high F1 scores tend to

show co-archetypes that match more closely with video segments scored higher by humans (e.g., (a) sandwich, (b) underground) compared

to the results with low F1 scores.

Documents

Supplementary Material for TVSum: Summarizing …openaccess.thecvf.com/content_cvpr_2015/supplemental/...Supplementary Material for TVSum: Summarizing Web Videos Using Titles Yale