Upload
others
View
10
Download
0
Embed Size (px)
Citation preview
Supplementary Material for
TVSum: Summarizing Web Videos Using Titles
Yale Song, Jordi Vallmitjana, Amanda Stent, Alejandro Jaimes
Yahoo Labs, New York
{yalesong,jvallmi,stent,ajaimes}@yahoo-inc.com
This supplementary material includes detailed information about our experiments presented in Section 5.2. Below gives a
summary of the material; see captions in each Table and Figure for more detailed explanations.
• Figure 1 shows a tutorial page and the annotation user interface of our Amazon Mechanical Turk (AMT) study.
• Table 1 shows the original titles (provided in the SumMe dataset) and our substituted titles, along with YouTube unique
video identifiers (some are missing because we could not find them).
• Figure 2 and Figure 3 show a list of 50 videos in our TVSum50 dataset.
• Figure 4 shows per-video F1 scores on our TVSum50 dataset.
• Figure 4 shows detailed experimental results (show importance scores and visualizations of the learned co-archetypes)
on our TVSum50 dataset.
1
978-1-4673-6964-0/15/$31.00 ©2015 IEEE
(a)
(b)
(c)
target range et ra
Figure 1. Amazon Mechanical Turk study user interface. A tutorial page (top) and the annotation user interface (bottom). (a) By clicking
the image below “Watch Video,” users can replay the video at any time during the task. (b) A group of visually similar shots; we clustered
shots using the k-means algorithm. Users can assign a score to each shot in a group (by clicking a thumbnail image), or all shots in a group
as a whole (by clicking the group header). The groups and shots are presented in a random order to avoid the chronological bias. (c) A
monitoring gadget showing the current progress towards the target range of each score.
Table 1. Substituted titles on the SumMe dataset. We used the original title of a video whenever possible. “(no change)” means we used
the title as provided in the SumMe dataset. For those videos that we were able to find from the YouTube website, we also include their
Youtube unique video identifiers. For the video ID u9E9HpuJQ7U, we changed the title to “Hibachi Cooking” because the original title
“Amazing Cooking!”) was not very descriptive of the video content.
Original Title Substituted Title Youtube ID (link)
Air Force One Barack Obama Berlin |Die Air Force One schwebt ber TXL ein uWIftWgKFc8
Base Jumping Base Jumping, Wingsuit in Norway 8rmIDJEYjoU
Bearpark Climbing Brenpark Bern |Berna, Ursina |Bjrk am klettern im Baum on32amsEoNA
Bike Polo (no change)
Bus in Rock Tunnel Charter bus in rock tunnel bBcsQcOyuIo
Car over camera Car runs over camera
Car railcrossing Meanwhile At The Railway Crossing R 0 o1N9hBw
Cockpit Landing Cockpit landing
Cooking Hibachi Cooking u9E9HpuJQ7U
Eiffel Tower (no change)
Excavators river crossing Crazy Russians crossing river excavators nGMxQfJe7OA
Fire Domino Amazing Fire Domino U1oT 6HzzVI
Jump Giant Waterslide Jump w4vW3fsJ8pk
Kids playing in leaves (no change)
Notre Dame (no change)
Paintball Paintball fight
Paluma Jump Rock Jumping at little Crystal Creek
Playing ball Crow and dog playing with pingpongball ZOYuB7-KHeU
Playing on water slide (no change) pxnRYQnz6Ws
Saving Dolphins Dolphins stranding and incredibly saved ekmMD8oYtJ0
Scuba Scuba diving
St Maarten Landing St. Maarten KLM Boeing 747 landing SCIJ0F62og4
Statue of Liberty (no change)
Uncut evening flight (no change) Eqs328fkiaQ
Valparaiso Downhill (no change) 9hvfYvqS-bE
[VT] Changing a vehicle tire
How to change tires for
off road vehicles
How to use a tyre repair kit
- Which? guide
Flat tire KODA Tips
How to Repair Your Tyre
When to Replace
Your Tires GMC
AwmHb44 ouw (5:54) 98MoyGZKHXc (3:07) J0nA4VgnoCo (9:44) gzDbaEs1Rlg (4:48) XzYM3PfTM4w (1:51)
how-to how-to vlog how-to how-to
[VU] Getting a vehicle unstuck
The stuck truck of Mark,
The rut that filled...
BBC -
Train crash
Girl gets van stuck
in the back forty
Smart Electric Vehicle
Balances on Two Wheels
Electric cars making
earth more green
HT5vyqe0Xaw (5:22) sTEELN-vY30 (2:29) vdmoEJ5YbrQ (5:29) xwqBXPGE9pQ (3:53) akI8YFjEmUw (2:13)
egocentric news egocentric interview news
[GA] Grooming an animal
Pet Joy Spa Grooming
Services — Brentwood, CA
I am a puppy dog groomer Nail clipper Gloria Pets
professional grooming
Dog Grooming
in Buenos Aires
How to Clean Your Dog’s
Ears - Vetoquinol USA
i3wAGJaaktw (2:36) Bhxk-O1Y7Ho (7:30) 0tmA C6XwfM (2:21) 3eYKfiOEJNs (3:14) xxdtq8mxegs (2:24)
commercial vlog how-to story how-to
[MS] Making a sandwich
Mexican Fried Chicken
Sandwich Recipe
Reuben Sandwich with
Corned Beef & Sauerkraut
Poor Man’s Meals:
Spicy Sausage Sandwich
Saigon Sandwich -
Vietnamese Sandwiches...
...Joseph Leonard’s
Fried Chicken Sandwich...
WG0MBPpPC6I (6:37) Hl- g2gn A (4:03) Yi4Ij2NM7U4 (6:45) 37rzWOQsNIw (3:11) LRw obCPUt0 (4:20)
how-to how-to how-to vlog story
[PK] Parkour
David Belle |Fondateur du parkour...
Charlotte Parkour
|Charlotte Video Project
Singapore Parkour
Free Running...
Parkour Camp Leipzig Jam Parkour Via del Mar
cjibtmSLxQ4 (10:47) b626MiF1ew4 (3:55) XkqCExn6 Us (3:08) GsAD1KT1xo8 (2:25) PJrm840pAUI (4:34)
story interview UGC UGC UGC
Figure 2. TVSum50 Dataset (page 1 of 2). Videos are grouped into query categories (e.g., [VT] Changing a vehicle tire). For each video,
we show its title, thumbnail image, YouTube unique video identifier, duration, and genre. Click thumbnail images to watch videos.
[PR] Parade
Chinese New Year Parade
NYC Chinatown
Alumnae Parade &
Laurel Chain Ceremony
LA Kings Stanley Cup
South Bay Parade
Chinatown Parade Southend-on-Sea and
Kortrijk Bike Friendly...
91IHQYk1IQM (1:50) RBCABdttQmI (6:04) z 6gVvQb2d0 (4:36) fWutDQy1nnY (9:45) 4wU LUjG5Ic (2:47)
UGC UGC egocentric vlog UGC
[FM] Flash mob gathering
Flash mob protest for Syria |Sydney, Australia
ICC World Twenty
Bangladesh Flash Mob...
Bluffton Teachers
Flash Mob Dance...
ICC World Twenty
Bangladesh, Flash Mob...
CASACL - Flashmob in
Copenhagen underground...
VuWGsYPqAX8 (3:36) JKpqYvAdIsw (2:32) xmEERLqJ2kU (7:26) byxOvuiIJV0 (2:34) xMr-HKMfVA ()
UGC UGC egocentric UGC UGC
[BK] Beekeeping
Beekeeping 101 A Year of Beekeeping Paper Wasp Removal |From the Ground Up
Killer Bees Kill 1000-lb
Hog in Bisbee AZ
Apis Mellifera in a
Vertical Log Hive
WxtbjNsCQ8A (4:25) uGu 10sucQo (2:47) EE-bNr36nyA (1:38) Se3oxnaPsz0 (2:18) oDXZc0tZe04 (6:20)
interview UGC how-to news UGC
[BT] Attempting a bike trick
...How to Whip
a Motocross Bike
Smage Bros.
Motorcycle Stunt Show
How to stop your Fixie How to lock your bike.
The RIGHT way!
Pure Fix TV:
How to Wheelie
qqR6AEXwxoQ (4:29) EYqVtI9YWJA (3:18) eQu1rNs0an0 (2:44) JgHubY5Vw3Y (2:23) iVt07TCkFM0 (1:44)
interview interview how-to how-to how-to
[DS] Dog show
German Shepherd
Dog Show (KCI)...
The Dog Show HD Oliver’s Show - Dog’s tale ...Obie the obese dog
works toward weight loss
Will A Cat Eat Dog Food?
E11zDS9XGzg (8:30) NyBmCxDoHJU (3:09) kLxoNp-UchI (2:10) jcoYJXDG9sw (3:19) -esJrBWj2d8 (3:50)
vlog UGC UGC news story egocentric
Figure 3. TVSum50 Dataset (page 2 of 2). Videos are grouped into query categories (e.g., [PR] Parade). For each video, we show its title,
thumbnail image, YouTube unique video identifier, duration, and genre. Click thumbnail images to watch videos.
AwmHb44_ouw 98MoyGZKHXc J0nA4VgnoCo gzDbaEs1Rlg XzYM3PfTM4w Average0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
Mean Pairwise F1−Scores on the TVSum50 Dataset (category: VT)
0.5966
(0.5833)
0.4304
(0.4483)
0.5510
(0.4848)
0.4851
(0.4814)
0.5451
(0.4671)
0.5216
(0.4679)
+
o
o
o+
o
+
SU SR CK CS LL WP AA1 AA2 CA
HT5vyqe0Xaw sTEELN−vY30 vdmoEJ5YbrQ xwqBXPGE9pQ akI8YFjEmUw Average0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
Mean Pairwise F1−Scores on the TVSum50 Dataset (category: VU)
0.5621
(0.4874)
0.5484
(0.4969)
0.5035
(0.5122)
0.5793
(0.5726)
0.5492
(0.6536)
0.5485
(0.5161)
o
o
o
o ++
SU SR CK CS LL WP AA1 AA2 CA
i3wAGJaaktw Bhxk−O1Y7Ho 0tmA_C6XwfM 3eYKfiOEJNs xxdtq8mxegs Average0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
Mean Pairwise F1−Scores on the TVSum50 Dataset (category: GA)
0.5593
(0.4555)
0.4816
(0.5709)
0.4459
(0.4761)
0.1461
(0.5284)
0.3945
(0.4579)
0.4055
(0.4596)
SU SR CK CS LL WP AA1 AA2 CA
WG0MBPpPC6I Hl−__g2gn_A Yi4Ij2NM7U4 37rzWOQsNIw LRw_obCPUt0 Average0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
Mean Pairwise F1−Scores on the TVSum50 Dataset (category: MS)
0.6093
(0.5007)
0.6247
(0.5800)
0.5842
(0.4215)
0.5854
(0.4712)
0.4992
(0.4645)
0.5806
(0.4525)
o
oo
o
+
o
+o
SU SR CK CS LL WP AA1 AA2 CA
cjibtmSLxQ4 b626MiF1ew4 XkqCExn6_Us GsAD1KT1xo8 PJrm840pAUI Average0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
Mean Pairwise F1−Scores on the TVSum50 Dataset (category: PK)
0.3612
(0.5895)
0.4959
(0.5370)
0.3823
(0.4320)
0.4931
(0.5253)
0.4611
(0.4312)
0.4387
(0.4875)
+
+
+
SU SR CK CS LL WP AA1 AA2 CA
91IHQYk1IQM RBCABdttQmI z_6gVvQb2d0 fWutDQy1nnY 4wU_LUjG5Ic Average0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
Mean Pairwise F1−Scores on the TVSum50 Dataset (category: PR)
0.5271
(0.4276)
0.4858
(0.4428)
0.6045
(0.5851)
0.5502
(0.4889)
0.4967
(0.4901)
0.5329
(0.4195)
o
o
o
oo
+
+
SU SR CK CS LL WP AA1 AA2 CA
VuWGsYPqAX8 JKpqYvAdIsw xmEERLqJ2kU byxOvuiIJV0 _xMr−HKMfVA Average0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
Mean Pairwise F1−Scores on the TVSum50 Dataset (category: FM)
0.4296
(0.4892)
0.4423
(0.4641)
0.5922
(0.5118)
0.5003
(0.4786)
0.6061
(0.4572)
0.5141
(0.4204)
oo
+
+ +
SU SR CK CS LL WP AA1 AA2 CA
WxtbjNsCQ8A uGu_10sucQo EE−bNr36nyA Se3oxnaPsz0 oDXZc0tZe04 Average0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
Mean Pairwise F1−Scores on the TVSum50 Dataset (category: BK)
0.5597
(0.4274)
0.4896
(0.4248)
0.4392
(0.5284)
0.3638
(0.5690)
0.5041
(0.5345)
0.4713
(0.4408)
+o
o
+
SU SR CK CS LL WP AA1 AA2 CA
qqR6AEXwxoQ EYqVtI9YWJA eQu1rNs0an0 JgHubY5Vw3Y iVt07TCkFM0 Average0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
Mean Pairwise F1−Scores on the TVSum50 Dataset (category: BT)
0.4898
(0.6513)
0.3674
(0.5231)
0.4776
(0.4369)
0.5329
(0.4959)
0.5732
(0.4941)
0.4882
(0.4609)
+
o
o
oo
SU SR CK CS LL WP AA1 AA2 CA
E11zDS9XGzg NyBmCxDoHJU kLxoNp−UchI jcoYJXDG9sw −esJrBWj2d8 Average0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
Mean Pairwise F1−Scores on the TVSum50 Dataset (category: DS)
0.4704
(0.5211)
0.5418
(0.5221)
0.4104
(0.4663)
0.4533
(0.6211)
0.5126
(0.4614)
0.4777
(0.5184)
ooo
+
ooo
SU SR CK CS LL WP AA1 AA2 CA
Figure 4. Experimental results on each video category on the TVSum50 dataset. Models compared include uniform sampling (SU),
random sampling (SR), k-means clustering (CK), spectral clustering (CS), LiveLight by Zhao and Xing (LL), web image prior by Khosla et
al. (WP), archetypal analysis with video frames (AA1), archetypal analysis with a combination of video frames and images (AA2), and
our co-archetypal analysis (CA). Bold-faced numbers represent our results; numbers in parentheses represent the highest score from any
baseline. The last column shows average scores, where we also show the statistical significance of the pairwise difference between our
approach (CA) and the baselines (the ’o’ mark indicates p < 0.01; the ’+’ mark indicates p < 0.05). Except for a few videos, our approach
outperforms all other baselines.
(a) Ruben Sandwich with Corned Beef & Sauerkraut F1Score = 0.6247
vid
eo
imag
e
(b) CASACL Ð Flashmob in Copenhagen underground Ð Peer Gynt F1Score = 0.6061
vid
eo
imag
e
F1Score = 0.3612 (c) David Belle | Fondatuer du parkour | Reportage de TF1
vid
eo
imag
e
F1Score = 0.1461 (d) Dog grooming in Buenos Aires
vid
eo
imag
e
Human Labeled Score
Co-archetypal Analysis Score
Human Labeled Score
Co-archetypal Analysis Score
Human Labeled Score
Co-archetypal Analysis Score
Human Labeled Score
Co-archetypal Analysis Score
Figure 5. Detailed results of co-archetypal analysis on the TVSum50 dataset. Shown here are four example results (a-d), each with a
video title, an F1 score of our approach, video frames, two heatmap representations of normalized shot importance scores (one from human
labels, another from our approach), and four learned co-archetypes represented in terms of video frames and images; stacked in that order.
(a-b) Two example results with highest F1 scores; (c-d) two example results with lowest F1 scores. The results with high F1 scores tend to
show co-archetypes that match more closely with video segments scored higher by humans (e.g., (a) sandwich, (b) underground) compared
to the results with low F1 scores.