Multimedia Grand Challenge 2012
Mei-Chen Yeh
04/24/2012
Midterm Report
• Submission due date: May 8
  – report
  – short presentation (10 mins)
• Max 4 pages, double column
  – Word template
  – LaTeX
• Come up with a solution to one of the grand challenges
  – http://www.acmmm12.org/call-for-multimedia-grand-challenge-solutions/
Why should I care about this?
• I want to pass this course.
• Look for ideas for your final project / thesis.
• Writing a report and doing a project always take time. Why not turn the report/project into something beneficial?
Here comes the opportunity!
• 6 problems that Google, HP, NHK, and other companies see in the future of multimedia
• Cash award
  – 3 prizes last year
  – For every finalist team this year
[Slide graphic: two sample resumes compared side by side. Both list Education (master, NTNU / master, NXU) and Experiences; one also lists a Publication: xxx, "A new approach for automatic music video generation", ACM Multimedia Grand Challenges, 2012.]
Make your resume stand out!
Great experience and great location!
Scottsdale, 2011
Beijing, 2009
Nara, 2012
Florence, 2010
2012 Challenges
• Google: Automatic Music Video Generation
• 3DLife / Huawei: Realistic Interaction in Online Virtual Environments
• HP: Understanding the Emotional Impact of Images and Videos
• NHK: "Where is beauty?" Video Segment Extraction Based on Aesthetic Quality Assessment
• NTT Docomo: Event Understanding through Social Media and its Text-Visual Summarization
• Technicolor: Audiovisual Recognition of Specific Events
Google Challenge: Automatic Music Video Generation
Google Challenge
• Music Video = Visual + Audio
• A befitting soundtrack makes a video compelling; likewise, Lady Gaga's music videos greatly enhance her songs.
• Automatic Music Video Generation
  – How to auto-suggest a cool soundtrack for a user-generated video?
  – How to auto-generate interesting music videos?
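One naive way to approach the first question is to match a video's "motion tempo" against song BPMs. The sketch below is purely illustrative and not part of the challenge materials: it assumes videos are summarized by a per-frame motion-energy series and songs by a known BPM, and it suggests the song whose BPM is closest to the video's estimated tempo.

```python
# Toy sketch of soundtrack auto-suggestion by tempo matching.
# Assumption (not from the slides): each video is summarized by a
# motion-energy series; each candidate song has a known BPM.

def estimate_motion_bpm(motion_energy, fps=1.0):
    """Crudely estimate a tempo (peaks per minute) by counting
    local peaks in the motion-energy series."""
    peaks = 0
    for i in range(1, len(motion_energy) - 1):
        if motion_energy[i] > motion_energy[i - 1] and motion_energy[i] >= motion_energy[i + 1]:
            peaks += 1
    duration_min = len(motion_energy) / fps / 60.0
    return peaks / duration_min if duration_min > 0 else 0.0

def suggest_soundtrack(motion_energy, song_catalog, fps=1.0):
    """Return the song whose BPM is closest to the video's motion tempo."""
    video_bpm = estimate_motion_bpm(motion_energy, fps)
    return min(song_catalog, key=lambda song: abs(song["bpm"] - video_bpm))

# A calm video: few motion peaks -> low tempo -> the slow song wins.
calm = [0.1, 0.2, 0.1, 0.1, 0.2, 0.1, 0.1, 0.1, 0.2, 0.1]  # 10 s at 1 fps
catalog = [{"title": "ballad", "bpm": 70}, {"title": "dance", "bpm": 128}]
print(suggest_soundtrack(calm, catalog)["title"])
```

A real system would of course use proper onset detection on both streams; the point is only that the matching step reduces to a nearest-neighbor search over simple descriptors.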
Use Case 1
• You have shot a few family videos on your smartphone, but you don’t want to upload them to YouTube because they look boring.
• What if you could find a matching soundtrack? Wouldn’t it improve the appeal of the video and make you want to upload it?
• Goal: make a video much more attractive for sharing by adding a matching soundtrack to it.
• Bonus point: the application runs on Android or iPhone.
Use Case 2
• Suppose you are hosting a home party. You have a playlist of party music, but you don't have any matching music videos to show on your 50-inch TV.
• Goal: automatically generate entertaining music videos that match the songs.
• Bonus point: personalize the music videos to the people who are viewing them.
You may focus on either of the two use cases.
Evaluation
• Novelty of the music video generation system
• Entertainment value of the produced music videos
http://www.mtv.com/
http://www.mtv.com.tw/
HP Challenge: Understanding the Emotional Impact of Images and Videos
HP Challenge
• Images and videos can serve as a powerful communications vehicle, conveying a wealth of information as well as emotional impact.
HP Challenge
• Images and videos are used extensively by professionals on web sites, magazine covers and printed advertisements to draw attention, communicate a message and leave a lasting emotional impression.
HP Challenge
• Understanding the Emotional Impact of Images and Videos: 6 research problems:
  1. How do we characterize the response categories and levels of emotional impact?
  2. What attributes of images and videos are associated with their emotional impact?
     • The color, composition, content, lighting, sharpness, and movement of an image or a video, …
  3. What affective models can be used to predict the emotional impact of images?
HP Challenge
4. How can we use the affective models to rank images and videos?
5. Can we use image and video transformations to change the emotional impact?
6. What are the applications of affective models?
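To make problems 3 and 4 concrete, here is a minimal sketch, assuming a hand-weighted linear model over simple image attributes. The attribute names and weights are invented for illustration; a real affective model would be learned from rated data.

```python
# Toy linear affective model (an assumption, not HP's method): map
# simple image attributes to a scalar valence score, then rank images.

def valence_score(attrs, weights=None):
    """Predict a 'pleasantness' score from hand-picked attributes.
    Both the attribute set and the weights here are illustrative."""
    if weights is None:
        weights = {"brightness": 0.5, "colorfulness": 0.3, "sharpness": 0.2}
    return sum(weights[k] * attrs.get(k, 0.0) for k in weights)

def rank_by_emotion(images):
    """Research problem 4: rank images by predicted emotional impact."""
    return sorted(images, key=lambda im: valence_score(im["attrs"]), reverse=True)

images = [
    {"name": "dim_photo",   "attrs": {"brightness": 0.2, "colorfulness": 0.3, "sharpness": 0.5}},
    {"name": "sunny_beach", "attrs": {"brightness": 0.9, "colorfulness": 0.8, "sharpness": 0.7}},
]
print([im["name"] for im in rank_by_emotion(images)])
```

Problem 5 (changing the emotional impact) then amounts to searching for an image transformation that moves these attributes, and hence the predicted score, in a desired direction.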
HP Challenge
• Evaluation
  – How well a deep understanding of emotional impact is used to create novel and compelling applications on the web, for mobile devices, and for social networks
NHK Challenge: “Where is beauty?” Video Segment Extraction Based on Aesthetic Quality Assessment
美學的 (aesthetic)
NHK Challenge
• Goal:
  – "Where is beauty?" -- automatic recognition of beautiful scenes in broadcast programs
• Two key questions:
  – how beauty is defined
  – how to approach beauty
• Dataset provided!
NHK Challenge
• Input
  – Broadcast video program "Japan's Scenic Beauty" (25 min x 10 programs)
    • Video format: MPEG-1 (704 x 480 pixels)
    • Audio: MPEG Audio, 44.1 kHz stereo, 224 kbps (English)
  – Shot boundary data (XML file)
• Output
  – List of extracted beautiful scenes ranked in the top 10%
    • Each scene should be described by the provided shot number, or by frame number and duration
  – Recommended video:
    • A 1- to 2-minute short video composed of the extracted beautiful scenes
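However the aesthetic scores are computed, the required output step is just a rank-and-select. A minimal sketch, with made-up scores standing in for a real aesthetic quality model:

```python
import math

# Sketch of the NHK output step: given per-shot aesthetic scores
# (toy values here), keep the top 10% of shots, best first.

def top_beautiful_shots(shots, fraction=0.10):
    """shots: list of (shot_number, duration_sec, aesthetic_score).
    Returns (shot_number, duration_sec) for the top `fraction`."""
    k = max(1, math.ceil(len(shots) * fraction))
    ranked = sorted(shots, key=lambda s: s[2], reverse=True)
    return [(num, dur) for num, dur, _ in ranked[:k]]

# 20 shots with toy scores -> top 10% = the 2 best-scoring shots.
shots = [(i, 4.0, score) for i, score in enumerate(
    [0.1, 0.5, 0.9, 0.2, 0.3, 0.8, 0.4, 0.1, 0.6, 0.2,
     0.7, 0.3, 0.95, 0.2, 0.1, 0.4, 0.5, 0.3, 0.2, 0.6])]
print(top_beautiful_shots(shots))
```

The selected (shot number, duration) pairs map directly onto the required scene list, and concatenating those shots gives the optional 1- to 2-minute highlight video.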
NHK Challenge
• Evaluation
  – Originality and adequacy of the proposed algorithm
  – Reliability and variety of the submitted beautiful scenes
  – Quality of the submitted short video (if submitted)
NTT Docomo Challenge: Event Understanding through Social Media and its Text-Visual Summarization
NTT Docomo Challenge
• Goal:
  – Data-mining on social media to retrieve, summarize, and visualize events for a selected topic
• Example
  – Topic: "local events for New York City"
  – Summarize Twitter/Flickr data into a magazine like "New York of the Day"
NTT Docomo Challenge
• Input
  – Researchers working on this challenge should collect the necessary data from Twitter or Flickr. At least three types of data are required:
    • Images: from Twitter, Flickr, or both
    • Text: tweets from Twitter
    • 3rd-party content: news websites such as the New York Times, blogs, and others
• Output
  – The output could be in the format of a magazine, in which each article represents an event and is associated with related images and/or text. These images and texts should be self-explanatory for the article. The magazine could be compiled on a daily basis, an hourly basis, or even more frequently.
NTT Docomo Challenge
• Research problems
  – Extract local events from the Twitter data
  – Assign location information to the images
  – Create a text summary of each local event from tweets and other 3rd-party content
  – Assign the most relevant images to each local event
  – Lay out the articles and design the magazine
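The first research problem can be prototyped very simply. The sketch below is an illustrative assumption, not the challenge's method: it groups tweets into candidate "local events" by shared hashtag within the same hour, where a real system would add burst detection and geo-coordinates.

```python
from collections import defaultdict
from datetime import datetime

# Toy event extraction: bucket tweets by (hashtag, hour) and keep
# buckets with enough tweets to suggest a real local event.

def extract_events(tweets, min_size=2):
    """tweets: list of dicts with 'text' and 'time' (ISO 8601 string).
    Returns {(hashtag, hour): [texts]} for buckets with >= min_size tweets."""
    buckets = defaultdict(list)
    for tw in tweets:
        hour = datetime.fromisoformat(tw["time"]).strftime("%Y-%m-%d %H:00")
        for word in tw["text"].split():
            if word.startswith("#"):
                buckets[(word.lower(), hour)].append(tw["text"])
    return {key: texts for key, texts in buckets.items() if len(texts) >= min_size}

tweets = [
    {"text": "Crowd gathering #marathon", "time": "2012-05-08T09:10:00"},
    {"text": "Start line photos #marathon", "time": "2012-05-08T09:40:00"},
    {"text": "Lunch break", "time": "2012-05-08T12:00:00"},
]
print(list(extract_events(tweets).keys()))
```

Each surviving bucket then becomes one magazine article: its tweets feed the text summary, and the remaining research problems attach images and lay out the page.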
NTT Docomo Challenge
• Evaluation
  – Relevance of the summary/article to the actual topic
  – Relevance of the related images to the abstract text, or vice versa
  – Quality of the magazine design
Technicolor Challenge: Audiovisual Recognition of Specific Events
Technicolor Challenge
• Goal
  – Given a short video sequence with audio, stemming from the coverage of a public event, the system should produce precise textual information about it.
Technicolor Challenge
• A description at the event-identity level:
  – Which event is it?
  – When and where did it take place?
  – What is its context?
  – What is precisely happening in the audio-visual scene?
  – In particular, who are the persons in the scene?
  – Where are they in the image?
  – What are they doing or saying?
Example
Key ideas
• Automatically extract as much information as possible from the audio-visual query and use it to search the intertwined textual, audio, and visual data available online!
  – Extraction of compact low-level audio-visual signatures
  – Detection and recognition of text present in the images
  – Detection and recognition of speech present in the audio track
  – Semantic analysis of the audio-visual content
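To give a flavor of the first idea, here is a deliberately tiny sketch of a "compact low-level signature": binarize an audio energy envelope by whether each frame is louder than the previous one, then compare clips by Hamming distance. This is an illustrative toy, not a real fingerprinting scheme such as those used in production audio-ID systems.

```python
# Toy audio signature: one bit per frame transition (louder / not
# louder), compared with Hamming distance. Illustrative only.

def signature(envelope):
    """Turn a per-frame energy envelope into a list of bits."""
    return [1 if b > a else 0 for a, b in zip(envelope, envelope[1:])]

def hamming(sig_a, sig_b):
    """Number of differing bits (signatures must be equal length)."""
    return sum(a != b for a, b in zip(sig_a, sig_b))

query      = [0.1, 0.4, 0.3, 0.8, 0.2, 0.5]
same_event = [0.2, 0.5, 0.4, 0.9, 0.3, 0.6]   # same shape, just louder
other      = [0.9, 0.1, 0.8, 0.2, 0.7, 0.1]   # different pattern
print(hamming(signature(query), signature(same_event)))  # small distance
print(hamming(signature(query), signature(other)))       # large distance
```

Because the signature keeps only relative changes, it is invariant to overall loudness, which is exactly the kind of robustness a signature for matching noisy event recordings needs.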
Huawei/3DLife Challenge: Realistic Interaction in Online Virtual Environments
Huawei/3DLife Challenge
• Goal
  – Support real-time, realistic interaction between humans in online virtual environments
• Scenario
  – An online dance class where a dance teacher and a student perform a series of movements
Huawei/3DLife Challenge
• Not limited to a particular capture technology
  – Visual sensing techniques: a single camera, a camera network, wearable inertial motion sensing
  – Gaming controllers: the Nintendo Wii, the Microsoft Kinect
Huawei/3DLife Challenge
• Work with the provided data set to illustrate the key technical components required to realize this kind of online interaction and communication:
  – 3D data acquisition and processing from multiple sensor data sources
  – Realistic (optionally real-time) rendering of 3D data based on noisy or incomplete sources
  – Realistic and naturalistic marker-less motion capture
  – Human factors around interaction modalities in virtual worlds
• http://perso.telecom-paristech.fr/~essid/3dlife-gc-12
Huawei/3DLife Challenge
• A data set is provided, including:
  – Synchronization data between each of the multiple calibrated sources capturing the student's movements
  – Original music excerpts consisting of a few tracks at tempos varying from slow to fast
  – Inertial (accelerometer + gyroscope + magnetometer) sensor data captured from multiple sensors on the student's body
  – Depth maps of the student's performance captured using a Microsoft Kinect
  – Ratings of the student's performances by the teacher
  – A form of annotation of the choreographies performed (mostly basic steps and movements for salsa beginners)
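A first practical step with such a multi-sensor data set is temporal alignment. The sketch below is an assumption about how one might use the synchronization data, not the data set's actual format: it pairs each Kinect depth frame with the nearest inertial sample by timestamp.

```python
import bisect

# Toy multi-sensor alignment: for each depth frame (~30 fps), find the
# index of the nearest inertial sample (~100 Hz) by timestamp.
# Timestamps and rates here are invented for illustration.

def align(frame_times, sensor_times):
    """For each frame timestamp, return the index of the nearest
    sensor sample. sensor_times must be sorted ascending."""
    out = []
    for t in frame_times:
        i = bisect.bisect_left(sensor_times, t)
        candidates = [j for j in (i - 1, i) if 0 <= j < len(sensor_times)]
        out.append(min(candidates, key=lambda j: abs(sensor_times[j] - t)))
    return out

frames  = [0.00, 0.033, 0.066]                                       # depth frames
sensors = [0.000, 0.010, 0.020, 0.030, 0.040, 0.050, 0.060, 0.070]   # inertial samples
print(align(frames, sensors))
```

Once streams are aligned, the fused per-frame records (depth map + inertial reading) become the input for motion capture, rendering, and performance-rating experiments.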
Start Early!
• Upload your report to Moodle by 11:55pm, May 8, 2012
• At most 4 pages, using the ACM MM template
• Prepare a short presentation (<10 mins) to share your ideas on a challenge
• More information:
  – http://www.acmmm12.org/call-for-multimedia-grand-challenge-solutions/