In traditional broadcasting, the advertising model has long been a financial pillar. The onslaught of new media such as broadband and 3G has caused advertisers to sit up and take notice of the opportunities that new channels of video delivery have to offer; for instance, live sporting events are increasingly webcast over IP networks and 3G streaming servers. However, the standard model of a fixed-slot advertising run might not apply to these new media infrastructures: Why would anyone incur expensive 3G airtime to watch a commercial break? In addition, media’s mass digitization and the convenience of time-shifted, nonlinear viewing of digital content have also brought new challenges to the advertising industry. For instance, TiVo’s ability to fast-forward through advertisements has raised concerns over whether the 30-second spot has lost its bite.
In response to this complex media environment, advertisers are turning more and more to product placement, branded TV programming, and virtual advertising insertion. The key is to blend all advertising opportunities into the video content. Pioneering systems include the virtual first-down line in American football [1] and the graphical overlay of athletes’ country flags and other performance-related information such as skiing paths [2] and tennis shots [3]. They demonstrate that an important factor in user acceptance is the value the technology adds to the game.
These systems generally apply camera-based sensors at the video source and reference manually input 3D grids for accurate subpixel object tracking and chroma keying. Hence, the systems decide all virtual effects centrally and permanently encode them into the video for broadcast downstream. Moreover, the system setup is expensive, making a viable business model an issue. The webcast infrastructure offers significant advantages over broadcast in this regard:
❚ Webcasts can reach a demographic segment.
❚ Webcasts combine traditional media’s familiarity with the Internet’s one-to-one interactivity.
❚ Webcast viewers are likely to be more tolerant of extraneous video effects such as advertising insertions than TV viewers because webcasts’ video quality is generally lower than TV’s.
❚ Webcast audiences are generally more technologically savvy, affluent, and likely to spend money on advertised items than TV audiences [4].
By carefully timing the exposures and appropriately placing the virtual content in strategic positions in the video, webcasters can balance the advertisers’ need for more eyeballs and the viewers’ need for less clutter and interference. To this end, we created a sports advertising insertion system that combines automated sports content analysis with manual techniques that exploit an understanding of gameplay that only humans possess.
Prior work in sports content analysis
Substantial research has focused on machine algorithms for automating sports content analysis [5, 6]. Motivating this research is the notion that the ability to efficiently tag sports video with relevant metadata will enable content reuse and access.
In particular, Ekin and Tekalp’s algorithm uses cinematic features, such as shot type and shot length, to detect play breaks in a sports video [6]. Xie et al. also noted that, during a game, the changes in a sports video’s visual features can be good indicators of the game’s natural play-break structures. So, they proceeded to train a detection system based on hidden Markov models [7]. Unfortunately, as Chang’s paper clarified, the difficult task of machines achieving a level of understanding equivalent to that of humans remains [8]. Moreover, Chang goes on to say that in the premium sports domain, where content value is high, content owners can easily afford manual annotations.

Advertising Insertion in Sports Webcasts
Kongwah Wan and Xin Yan, Infocomm Research
Multimedia at Work (editor: Qibin Sun, Infocomm Research)
1070-986X/07/$25.00 © 2007 IEEE. Published by the IEEE Computer Society.

These considerations prompted us to adopt a toolkit approach to designing our sports advertisement insertion system. We aimed to expedite the operator’s laborious job of annotation, rather than to replace the operator altogether.
Overall system architecture
Figure 1 shows a high-level snapshot of the system in action during a baseball game. Video from a live sporting event feeds into a terminal, where a human operator monitors the game and manually inserts control signals by selecting from GUI buttons that denote game status, such as play break, home run, and pitching. The operator can also specify coordinate information, such as bounding boxes for advertising images, by click-dragging the mouse. The system time-stamps these control signals and multiplexes them into the video stream for decoding by a downstream player/receiver.
While normal video decoding and playback run on a client player, the system simultaneously inserts advertising content. First, it preloads a database of indexed advertising content (logo, images, or video) onto the client players/receivers. Then, during playback, control signals with the current presentation time stamps are decoded into a set of client-side actions. By predefining actions such as video overlay, the system can blend advertising content into the video.
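
The client-side decoding step described above can be pictured with a minimal sketch. Everything named here is hypothetical and not taken from the prototype: the `ControlSignal` record, its fields, and the `active_overlays` helper are illustrative assumptions about how a time-stamped signal might be matched against a preloaded advertising database at playback time.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

BBox = Tuple[int, int, int, int]  # x, y, width, height in pixels (assumed layout)


@dataclass(frozen=True)
class ControlSignal:
    """Hypothetical control signal multiplexed into the video stream."""
    pts_ms: int                    # presentation time stamp, in milliseconds
    duration_ms: int               # how long the signal stays active
    game_status: str               # e.g. "play_break", "home_run", "pitching"
    ad_id: Optional[str] = None    # index into the preloaded advertising database
    bbox: Optional[BBox] = None    # where to place the advertising content


def active_overlays(signals: List[ControlSignal],
                    ad_db: Dict[str, str],
                    now_ms: int) -> List[Tuple[str, BBox]]:
    """Return (ad content, bounding box) pairs to blend at playback time now_ms."""
    actions = []
    for s in signals:
        if s.ad_id is None or s.bbox is None:
            continue  # status-only signal, nothing to overlay
        is_active = s.pts_ms <= now_ms < s.pts_ms + s.duration_ms
        if is_active and s.ad_id in ad_db:
            actions.append((ad_db[s.ad_id], s.bbox))
    return actions
```

A player loop would call `active_overlays` once per decoded frame with the frame's presentation time and hand the result to whatever overlay routine the client uses.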
Automatic machine detection
Figure 1 also shows a parallel video processing module that takes the live video as input. This automatic module complements the manual input system. For example, the machine runs a hard-cut detection algorithm and accurately identifies shot boundaries. This produces a more pleasant advertising insertion effect when combined with the operator’s manual inputs. For instance, while observing a particular baseball pitching scene, the operator can indicate, on the part of the video screen corresponding to the backstop, a preferred bounding box for advertising image insertion. The system then automatically extends the advertising insertion interval to all frames in the shot, hence causing the eventual advertising exposure to appear more natural and realistic. Moreover, if the image area within the operator-specified bounding box is homogeneous, such as a part of the backstop that is not already cluttered by advertising banners, the machine can also compute the separation of foreground objects from the background. Then, through the system’s chroma keying functionality, we can make the advertising insertion look more natural by causing the foreground objects to walk over the overlay content.
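
As a rough illustration of how shot boundaries can be used to extend an operator's input, the sketch below detects hard cuts with a histogram-difference test and then widens a single marked frame to the whole shot. The article does not say which hard-cut algorithm the system uses, so the histogram test, the threshold, and the function names are all assumptions; frames are simplified to flat lists of 8-bit gray values.

```python
from typing import List, Sequence, Tuple


def histogram(frame: Sequence[int], bins: int = 16) -> List[float]:
    """Normalized gray-level histogram of one frame (pixel values 0..255)."""
    counts = [0] * bins
    for p in frame:
        counts[min(p * bins // 256, bins - 1)] += 1
    n = len(frame) or 1
    return [c / n for c in counts]


def hard_cuts(frames: Sequence[Sequence[int]], threshold: float = 0.5) -> List[int]:
    """Return indices i such that a new shot starts at frame i."""
    cuts = []
    prev = None
    for i, frame in enumerate(frames):
        h = histogram(frame)
        if prev is not None:
            # Half the L1 distance between histograms lies in [0, 1].
            diff = 0.5 * sum(abs(a - b) for a, b in zip(h, prev))
            if diff > threshold:
                cuts.append(i)
        prev = h
    return cuts


def extend_to_shot(frame_index: int, cuts: Sequence[int], n_frames: int) -> Tuple[int, int]:
    """Return the inclusive (start, end) frame range of the shot containing frame_index."""
    start, end = 0, n_frames - 1
    for c in sorted(cuts):
        if c <= frame_index:
            start = c
        else:
            end = c - 1
            break
    return start, end
```

With this, a bounding box drawn on one frame of a pitching scene could be applied to every frame between `start` and `end`, so the insertion appears and disappears on shot changes rather than mid-shot.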
Figure 1. System workflow. (Diagram labels: live sporting event, video, live stream, manual input, manual control signal, machine auto detect, automatic control signal, multiplexer/encode, streaming center, video-on-demand center, decode, encode, file, Ad-DB, PC.)
April–June 2007
Having an automatic module would also greatly facilitate detection of events or visual landmarks that regularly appear in the sporting event video. For example, in a tennis match, loud and sustained audience applause typically occurs at the end of a good play. The system can perform automatic audio detection of applause to isolate these moments [9] as possible candidates for advertising content placement in a postclimax scene [10].
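
To make the "loud and sustained" criterion concrete, here is a deliberately simple stand-in: it flags runs of audio frames whose short-time energy stays above a threshold for a minimum length. This is only a loudness heuristic under assumed parameters, not the audio classification approach of the cited work [9], and the function name and inputs are hypothetical.

```python
from typing import List, Sequence, Tuple


def sustained_loud_segments(energies: Sequence[float],
                            threshold: float,
                            min_frames: int) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) index ranges where energy stays at or
    above threshold for at least min_frames consecutive audio frames."""
    segments = []
    start = None
    for i, e in enumerate(energies):
        if e >= threshold:
            if start is None:
                start = i  # a loud run begins here
        else:
            if start is not None and i - start >= min_frames:
                segments.append((start, i - 1))
            start = None
    # Close a run that lasts until the end of the input.
    if start is not None and len(energies) - start >= min_frames:
        segments.append((start, len(energies) - 1))
    return segments
```

The frame just after each returned segment would then be a candidate time for a postclimax insertion; a real detector would also need to distinguish applause from other loud sounds such as commentary.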
As another example, when a soccer team adopts a highly defensive strategy, we expect gameplay to be occurring mostly midfield, generally perceived as a lull in the game. Intuitively, at this moment, viewers are more likely to accept a higher advertising frequency. Because the soccer midfield line generally appears as a long, white, vertical line, machine detection is relatively easy. Figure 2 shows other examples of visual landmark detection and advertising effects. Although computationally more involved, detecting clear line markings on the playing field is technically feasible [11, 12].
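
One way to see why a long, white, vertical line is an easy target: count near-white pixels in each image column and keep the columns that are white over most of their height. The sketch below does exactly that on a grayscale frame given as a list of rows. The thresholds and the function name are assumptions, and a real broadcast view would need to tolerate a slanted line and white objects such as players' kits.

```python
from typing import List, Sequence


def vertical_white_line_columns(gray: Sequence[Sequence[int]],
                                white_level: int = 200,
                                min_fraction: float = 0.8) -> List[int]:
    """Return the x positions of columns that are near-white over at least
    min_fraction of the frame height (a crude midfield-line cue)."""
    if not gray:
        return []
    height, width = len(gray), len(gray[0])
    columns = []
    for x in range(width):
        white = sum(1 for y in range(height) if gray[y][x] >= white_level)
        if white / height >= min_fraction:
            columns.append(x)
    return columns
```

A non-empty result on consecutive frames could be treated as a sign that play is around midfield, which the text above suggests is a moment when a higher advertising frequency is acceptable.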
An economy of advertising space
Adopting an advertising model similar to that of broadcasting, the webcast architecture (see Figure 1) makes decisions on advertising spaces centrally; that is, the available advertising slots are specified at the video source end on the basis of the current video’s visual content. However, the system doesn’t apply any advertising effects until farther downstream when the video reaches a streaming center or a video-on-demand center. You can think of these centers as middleman nodes that might cater to a large subscriber base made up of advertisers.
When distributed hierarchically, the centers become local and geographical stations that hold niche knowledge of the demographics they’re serving. With the video received, each center node can decode and retrieve all available (and allowable) advertising slots. Based on established criteria, the node then decides whether to buy. The system logs these decisions over the Web, and they become legally binding purchase agreements between parties.
A node that buys a slot can choose to embed designated advertising content into the video, making the insertions permanent fixtures in the video downstream. Or the node can choose to resell the slot to downstream takers. By customizing the advertising content database at these centers, each center maximizes its marketing opportunities by sending meaningful advertising messages to its targeted audience. At the hierarchy’s root node, the potential for personalized advertising to the individual is clear.
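
The buy, embed, or resell choices can be sketched as a small model of a slot passing through a chain of center nodes. The article does not specify the purchase criteria or any pricing scheme, so the price ceiling, the markup, and the names `AdSlot` and `CenterNode` are purely illustrative assumptions.

```python
from dataclasses import dataclass, field, replace
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class AdSlot:
    """Hypothetical advertising slot announced at the video source."""
    slot_id: str
    game_status: str                    # visual context of the slot, e.g. "play_break"
    price: float
    owner: str = "source"
    embedded_ad: Optional[str] = None   # set once content is permanently inserted


@dataclass
class CenterNode:
    """Hypothetical streaming or video-on-demand center."""
    name: str
    max_price: float                    # assumed purchase criterion
    ad_db: Dict[str, str]               # game status -> local advertising content
    markup: float = 0.1                 # assumed resale markup
    purchase_log: List[Tuple[str, str, str, float]] = field(default_factory=list)

    def handle(self, slot: AdSlot) -> AdSlot:
        # Slots embedded upstream are permanent; unaffordable slots pass through.
        if slot.embedded_ad is not None or slot.price > self.max_price:
            return slot
        # Record the purchase as (slot, seller, buyer, price).
        self.purchase_log.append((slot.slot_id, slot.owner, self.name, slot.price))
        ad = self.ad_db.get(slot.game_status)
        if ad is not None:
            return replace(slot, owner=self.name, embedded_ad=ad)
        # No suitable local content: offer the slot to downstream takers.
        resale_price = round(slot.price * (1 + self.markup), 2)
        return replace(slot, owner=self.name, price=resale_price)
```

Chaining `handle` calls from a regional center to a local one mimics the hierarchy described above: the first node with matching content embeds it, and every purchase along the way is logged.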
Implementations
A prototype for streaming advertising insertions in sports webcasts has been implemented on the Windows Media and Windows 2003 Streaming Server platform [13]. Figure 3 shows the high-level schematic of the three main system components: the encoder, transcoder, and player. Developed for the baseball domain in collaboration with a Japanese partner, the encoder provides the basic video monitoring and annotating functionalities expected of a broadcast studio tool. Simple templates are available for monitoring teams, players, scores, and advertising. An illustrative score bar overlays the score at the top of the video.
Figure 2. Automatic detection of visual landmarks for advertising insertion in sports video. (Panels (a) to (d); the labels 30% and 18% appear in the figure.)
Figure 3. Windows Media implementation. (Diagram labels: encoder (SYS-A); Internet or broadband; transcoder (SYS-B), shown three times; Windows 2003 streaming server; overlay player (SYS-C) or Windows Media Player (WMP), shown three times.)
Figure 4 shows a snapshot of the encoder implementing two simple forms of advertising overlay: text and image. These overlays are permanently encoded into the video, and each has a fixed location of exposure. As these types of insertion are common in video presentations, we offer them as a standard feature in our system. Our system’s niche, however, is that operators can add manual control signals by specifying signal duration (the minimum duration for which the signal is embedded in the video payload) and bounding box coordinates for advertising content placement (for example, the red bounding box in Figure 4). The system sends the signal downstream to the transcoder module, which decodes the control signal, decides whether to buy the advertising space specified by the bounding box coordinates, and, if it decides to buy, performs the image overlay.
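
The final step, placing an image into the bounding box carried by the control signal, can be sketched as follows. This is a simplified stand-in rather than the prototype's Windows Media code: frames and advertising images are grayscale lists of rows, the image is stretched to the box with nearest-neighbor sampling, and the function name is hypothetical.

```python
from typing import List, Sequence, Tuple


def overlay_image(frame: Sequence[Sequence[int]],
                  ad: Sequence[Sequence[int]],
                  bbox: Tuple[int, int, int, int]) -> List[List[int]]:
    """Return a copy of frame with ad scaled into bbox = (x, y, width, height).

    Parts of the box that fall outside the frame are ignored."""
    x0, y0, w, h = bbox
    out = [list(row) for row in frame]
    ad_h, ad_w = len(ad), len(ad[0])
    for dy in range(h):
        y = y0 + dy
        if not 0 <= y < len(out):
            continue
        for dx in range(w):
            x = x0 + dx
            if not 0 <= x < len(out[0]):
                continue
            # Nearest-neighbor lookup of the matching advertising pixel.
            out[y][x] = ad[dy * ad_h // h][dx * ad_w // w]
    return out
```

Because the function returns a new frame, two transcoders can call it on the same decoded frame with different `ad` images, which matches the idea of each receiving location inserting its own content.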
Figure 5 shows two corresponding snapshots of the transcoder modules at two different receiving locations downstream from the encoder, each of which has decided to perform an insertion in the bounding box. Each transcoder module can serve audiences from different demographic locations and, hence, insert different advertising content in each bounding box. Note also that the transcoder module shows the “VOX” overlay, which the encoder module permanently encoded upstream. Figure 6 also shows an example of the chroma keying effect after separating the foreground objects (players) from the homogeneous background (backstop). The player now appears to be standing in front of the inserted image. From here, the transcoder sends the new content downstream to the player, as Figure 7 shows.
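
The chroma keying effect relies on the background inside the bounding box being homogeneous. A minimal sketch of the idea, under strong assumptions: the frame is grayscale, the box lies fully inside the frame, the advertising image already has the box's size, the background level is estimated as the median of the box, and any pixel far from that level is treated as foreground and left untouched. The tolerance and the function name are hypothetical, and the article does not describe the separation method actually used.

```python
from statistics import median
from typing import List, Sequence, Tuple


def keyed_overlay(frame: Sequence[Sequence[int]],
                  ad: Sequence[Sequence[int]],
                  bbox: Tuple[int, int, int, int],
                  tolerance: int = 20) -> List[List[int]]:
    """Replace only background-like pixels inside bbox = (x, y, width, height)
    with the advertising image, so foreground objects stay in front of it."""
    x0, y0, w, h = bbox
    region = [frame[y][x] for y in range(y0, y0 + h) for x in range(x0, x0 + w)]
    background = median(region)  # assumed level of the homogeneous background
    out = [list(row) for row in frame]
    for dy in range(h):
        for dx in range(w):
            if abs(frame[y0 + dy][x0 + dx] - background) <= tolerance:
                out[y0 + dy][x0 + dx] = ad[dy][dx]
    return out
```

In color video the same test would compare color distance rather than gray level, but the outcome is the one shown in Figure 6: the inserted image fills the backstop while the player's pixels are preserved.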
Conclusions
Sports content continues to generate a global appeal that transcends national, cultural, religious, and gender boundaries. Historically, TV technology’s success has been intertwined with the development of televised sports. In the famous words of pioneering TV sports director Harry Coyle, “Television got off the ground because of sports.” Sports showcases offer a splendid platform to promote new media technologies. During the 2006 FIFA (Fédération Internationale de Football Association) World Cup tournament in Germany, major mobile companies launched a plethora of mobile TV services with such offerings as video streaming, text-based services, ringtone downloads, and mobile blogging.
By augmenting a video presentation with advertising content, our semiautomatic system proposes a new way of enhancing the commercial value of sports video webcasts. Clearly, the number of advertising exposures must be managed to avoid viewers’ perceiving them as unnecessary clutter in the video. As with legacy video broadcasting systems staffed by human operators, this check can be easily managed manually. Automatic machine detection techniques, however, can further facilitate advertising content insertion.
Figure 4. Two types of encoder overlay: “EPSON” text overlay on the top-right score bar and a “VOX” image overlay at the bottom right.
Figure 5. Transcoder modules at different receiving locations separately insert images into the bounding box location that the control signal specified.
Acknowledgments
The authors would like to thank Yiqun Li for her implementation of the advertising effect in baseball video and Akira Miyata of Digital VOX Corporation for valuable discussions.
References
1. PVI Virtual Media Services, http://www.pvi.tv/pvi/index.html.
2. Sportvision, http://www.sportvision.com.
3. Hawk-Eye Sports Tracking, http://www.hawkeyeinnovations.co.uk.
4. Arbitron/Edison Media Research, Internet Study V: Startling New Insights About the Internet and Streaming, The Arbitron Company & Edison Media Research, 2000; http://www.arbitron.com/study_m/internet_study_v.asp.
5. A. Kokaram et al., “Browsing Sports Video: Trends in Sports-Related Indexing and Retrieval Work,” IEEE Signal Processing Magazine, vol. 23, no. 2, 2006, pp. 47-58.
6. A. Ekin and A.M. Tekalp, “Generic Play-Break Event Detection for Summarization and Hierarchical Sports Video Analysis,” Proc. IEEE Int’l Conf. Multimedia and Expo (ICME), IEEE Press, 2003, pp. 169-172.
7. L. Xie et al., “Structure Analysis of Soccer Video with Domain Knowledge and Hidden Markov Models,” Pattern Recognition Letters, vol. 25, no. 7, 2004, pp. 767-775.
8. S.F. Chang, “The Holy Grail of Content-Based Media Analysis,” IEEE MultiMedia, vol. 9, no. 2, 2002, pp. 6-10.
9. Z. Xiong et al., “Audio Events Detection Based Highlights Extraction from Baseball, Golf and Soccer Games in a Unified Framework,” Proc. IEEE Conf. Acoustics Speech and Signal Processing (ICASSP), IEEE Press, vol. 5, 2003, pp. 632-635.
10. K. Wan and C. Xu, “Automatic Content Placement in Sports Highlights,” Proc. IEEE Int’l Conf. Multimedia and Expo (ICME), IEEE Press, 2006, pp. 1893-1896.
11. Y. Li, “Real Time Advertisement Insertion in Baseball Video Based on Advertisement Effect,” Proc. ACM Int’l Conf. Multimedia, ACM Press, 2005, pp. 343-346.
12. K. Wan et al., “Real-Time Goal-Mouth Detection in MPEG Soccer Video,” Proc. ACM Int’l Conf. Multimedia, ACM Press, 2003, pp. 311-314.
13. C. Simonetti, B. Birney, and J. Travis, “Using Windows Media Technologies for Advertising on the Internet,” Microsoft Digital Media Division; http://msdn2.microsoft.com/en-us/library/ms983661.aspx.
Readers may contact Kongwah Wan at kongwah@i2r.a-star.edu.sg.
Contact Multimedia at Work editor Qibin Sun at qibin@i2r.a-star.edu.sg.
Figure 6. Foreground objects can also be automatically separated from the homogeneous background.
Figure 7. The player plays back video received from the transcoder in Figure 5 that inserted the A-Star image.