Multimedia Grand Challenge 2012
Mei-Chen Yeh
04/24/2012
Midterm Report
• Submission due date: May 8
  – report
  – short presentation (10 mins)
• Max 4 pages, double column
  – Word template
  – LaTeX
• Come up with a solution to one of the grand challenges
  – http://www.acmmm12.org/call-for-multimedia-grand-challenge-solutions/
Why should I care about this?
• I want to pass this course.
• Look for ideas for your final project / thesis.
• Writing a report and doing a project always take time. Why not turn the report/project into something beneficial?
Here comes the opportunity!
• 6 problems that Google, HP, NHK, and other companies see in the future of multimedia
• Cash award
  – 3 prizes last year
  – For every finalist team this year
[Slide graphic: two sample resumes compared side by side. Both list Education (master, NTNU / master, NXU) and Experiences; one also lists a Publication: xxx, "A new approach for automatic music video generation", ACM Multimedia Grand Challenges, 2012.]
Make your resume stand out!
Great experience and great location!
Scottsdale, 2011
Beijing, 2009
Nara, 2012
Florence, 2010
2012 Challenges
• Google: Automatic Music Video Generation
• 3DLife / Huawei: Realistic Interaction in Online Virtual Environments
• HP: Understanding the Emotional Impact of Images and Videos
• NHK: "Where is beauty?" Video Segment Extraction Based on Aesthetic Quality Assessment
• NTT Docomo: Event Understanding through Social Media and its Text-Visual Summarization
• Technicolor: Audiovisual Recognition of Specific Events
Google Challenge: Automatic Music Video Generation
Google Challenge
• Music Video = Visual + Audio
• A befitting soundtrack makes a video compelling; likewise, Lady Gaga's music videos greatly enhance her songs.
• Automatic Music Video Generation
  – How to auto-suggest a cool soundtrack for a user-generated video?
  – How to auto-generate interesting music videos?
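One naive way to approach the first question is to match a video's "motion tempo" against song BPMs. The sketch below is purely illustrative and not part of the challenge materials: it assumes videos are summarized by a per-frame motion-energy series and songs by a known BPM, and it suggests the song whose BPM is closest to the video's estimated tempo.

```python
# Toy sketch of soundtrack auto-suggestion by tempo matching.
# Assumption (not from the slides): each video is summarized by a
# motion-energy series; each candidate song has a known BPM.

def estimate_motion_bpm(motion_energy, fps=1.0):
    """Crudely estimate a tempo (peaks per minute) by counting
    local peaks in the motion-energy series."""
    peaks = 0
    for i in range(1, len(motion_energy) - 1):
        if motion_energy[i] > motion_energy[i - 1] and motion_energy[i] >= motion_energy[i + 1]:
            peaks += 1
    duration_min = len(motion_energy) / fps / 60.0
    return peaks / duration_min if duration_min > 0 else 0.0

def suggest_soundtrack(motion_energy, song_catalog, fps=1.0):
    """Return the song whose BPM is closest to the video's motion tempo."""
    video_bpm = estimate_motion_bpm(motion_energy, fps)
    return min(song_catalog, key=lambda song: abs(song["bpm"] - video_bpm))

# A calm video: few motion peaks -> low tempo -> the slow song wins.
calm = [0.1, 0.2, 0.1, 0.1, 0.2, 0.1, 0.1, 0.1, 0.2, 0.1]  # 10 s at 1 fps
catalog = [{"title": "ballad", "bpm": 70}, {"title": "dance", "bpm": 128}]
print(suggest_soundtrack(calm, catalog)["title"])
```

A real system would of course use proper onset detection on both streams; the point is only that the matching step reduces to a nearest-neighbor search over simple descriptors.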
Use Case 1
• You have shot a few family videos on your smartphone, but you don’t want to upload them to YouTube because they look boring.
• What if you could find a matching soundtrack? Wouldn’t it improve the appeal of the video and make you want to upload it?
• Goal: make a video much more attractive for sharing by adding a matching soundtrack to it.
• Bonus point: the application runs on Android or iPhone.
Use Case 2
• Suppose you are hosting a home party. You have a playlist of party music, but you don't have any matching music videos to show on your 50-inch TV.
• Goal: automatically generate entertaining music videos that match the songs.
• Bonus point: personalize the music videos to the people who are viewing them.
You may focus on either of the two use cases.
Evaluation
• Novelty of the music video generation system
• Entertainment value of the produced music videos
http://www.mtv.com/
http://www.mtv.com.tw/
HP Challenge: Understanding the Emotional Impact of Images and Videos
HP Challenge
• Images and videos can serve as a powerful communications vehicle, conveying a wealth of information as well as emotional impact.
HP Challenge
• Images and videos are used extensively by professionals on web sites, magazine covers and printed advertisements to draw attention, communicate a message and leave a lasting emotional impression.
HP Challenge
• Understanding the Emotional Impact of Images and Videos: 6 research problems:
  1. How do we characterize the response categories and levels of emotional impact?
  2. What attributes of images and videos are associated with their emotional impact?
     • The color, composition, content, lighting, sharpness, and movement of an image or a video, …
  3. What affective models can be used to predict the emotional impact of images?
HP Challenge
4. How can we use the affective models to rank images and videos?
5. Can we use image and video transformations to change the emotional impact?
6. What are the applications of affective models?
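To make problems 3 and 4 concrete, here is a minimal sketch, assuming a hand-weighted linear model over simple image attributes. The attribute names and weights are invented for illustration; a real affective model would be learned from rated data.

```python
# Toy linear affective model (an assumption, not HP's method): map
# simple image attributes to a scalar valence score, then rank images.

def valence_score(attrs, weights=None):
    """Predict a 'pleasantness' score from hand-picked attributes.
    Both the attribute set and the weights here are illustrative."""
    if weights is None:
        weights = {"brightness": 0.5, "colorfulness": 0.3, "sharpness": 0.2}
    return sum(weights[k] * attrs.get(k, 0.0) for k in weights)

def rank_by_emotion(images):
    """Research problem 4: rank images by predicted emotional impact."""
    return sorted(images, key=lambda im: valence_score(im["attrs"]), reverse=True)

images = [
    {"name": "dim_photo",   "attrs": {"brightness": 0.2, "colorfulness": 0.3, "sharpness": 0.5}},
    {"name": "sunny_beach", "attrs": {"brightness": 0.9, "colorfulness": 0.8, "sharpness": 0.7}},
]
print([im["name"] for im in rank_by_emotion(images)])
```

Problem 5 (changing the emotional impact) then amounts to searching for an image transformation that moves these attributes, and hence the predicted score, in a desired direction.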
HP Challenge
• Evaluation
  – How well a deep understanding of emotional impact is used to create novel and compelling applications on the web, for mobile devices, and for social networks
NHK Challenge: “Where is beauty?” Video Segment Extraction Based on Aesthetic Quality Assessment
美學的 (aesthetic)
NHK Challenge
• Goal:
  – "Where is beauty?" -- automatic recognition of beautiful scenes in broadcast programs
• Two key questions:
  – how beauty is defined
  – how to approach beauty
• Dataset provided!
NHK Challenge
• Input
  – Broadcast video program "Japan's Scenic Beauty" (25 min x 10 programs)
    • Video format: MPEG-1 (704 x 480 pixels)
    • Audio: MPEG Audio, 44.1 kHz stereo, 224 kbps (English)
  – Shot boundary data (XML file)
• Output
  – List of extracted beautiful scenes ranked in the top 10%
    • Each scene should be described by the provided shot number, or by frame number and duration
  – Recommended video:
    • A 1- to 2-minute short video composed of the extracted beautiful scenes
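However the aesthetic scores are computed, the required output step is just a rank-and-select. A minimal sketch, with made-up scores standing in for a real aesthetic quality model:

```python
import math

# Sketch of the NHK output step: given per-shot aesthetic scores
# (toy values here), keep the top 10% of shots, best first.

def top_beautiful_shots(shots, fraction=0.10):
    """shots: list of (shot_number, duration_sec, aesthetic_score).
    Returns (shot_number, duration_sec) for the top `fraction`."""
    k = max(1, math.ceil(len(shots) * fraction))
    ranked = sorted(shots, key=lambda s: s[2], reverse=True)
    return [(num, dur) for num, dur, _ in ranked[:k]]

# 20 shots with toy scores -> top 10% = the 2 best-scoring shots.
shots = [(i, 4.0, score) for i, score in enumerate(
    [0.1, 0.5, 0.9, 0.2, 0.3, 0.8, 0.4, 0.1, 0.6, 0.2,
     0.7, 0.3, 0.95, 0.2, 0.1, 0.4, 0.5, 0.3, 0.2, 0.6])]
print(top_beautiful_shots(shots))
```

The selected (shot number, duration) pairs map directly onto the required scene list, and concatenating those shots gives the optional 1- to 2-minute highlight video.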
NHK Challenge
• Evaluation
  – Originality and adequacy of the proposed algorithm
  – Reliability and variety of the submitted beautiful scenes
  – Quality of the submitted short video (if submitted)
NTT Docomo Challenge: Event Understanding through Social Media and its Text-Visual Summarization
NTT Docomo Challenge
• Goal:
  – Data-mining on social media to retrieve, summarize, and visualize events for a selected topic
• Example
  – Topic: "local events for New York City"
  – Summarize Twitter/Flickr data into a magazine like "New York of the Day"
NTT Docomo Challenge
• Input
  – Researchers working on this challenge should collect the necessary data from Twitter or Flickr. At least three types of data are required:
    • Images: from Twitter, Flickr, or both
    • Text: tweets from Twitter
    • 3rd-party content: news websites such as the New York Times, blogs, and others
• Output
  – The output could be in the format of a magazine, in which each article represents an event and is associated with related images and/or text. These images and texts should be self-explanatory for the article. The magazine could be compiled on a daily basis, an hourly basis, or even more frequently.
NTT Docomo Challenge
• Research problems
  – Extract local events from the Twitter data
  – Assign location information to the images
  – Create a text summary of each local event from tweets and other 3rd-party content
  – Assign the most relevant images to each local event
  – Lay out the articles and design the magazine
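The first research problem can be prototyped very simply. The sketch below is an illustrative assumption, not the challenge's method: it groups tweets into candidate "local events" by shared hashtag within the same hour, where a real system would add burst detection and geo-coordinates.

```python
from collections import defaultdict
from datetime import datetime

# Toy event extraction: bucket tweets by (hashtag, hour) and keep
# buckets with enough tweets to suggest a real local event.

def extract_events(tweets, min_size=2):
    """tweets: list of dicts with 'text' and 'time' (ISO 8601 string).
    Returns {(hashtag, hour): [texts]} for buckets with >= min_size tweets."""
    buckets = defaultdict(list)
    for tw in tweets:
        hour = datetime.fromisoformat(tw["time"]).strftime("%Y-%m-%d %H:00")
        for word in tw["text"].split():
            if word.startswith("#"):
                buckets[(word.lower(), hour)].append(tw["text"])
    return {key: texts for key, texts in buckets.items() if len(texts) >= min_size}

tweets = [
    {"text": "Crowd gathering #marathon", "time": "2012-05-08T09:10:00"},
    {"text": "Start line photos #marathon", "time": "2012-05-08T09:40:00"},
    {"text": "Lunch break", "time": "2012-05-08T12:00:00"},
]
print(list(extract_events(tweets).keys()))
```

Each surviving bucket then becomes one magazine article: its tweets feed the text summary, and the remaining research problems attach images and lay out the page.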
NTT Docomo Challenge
• Evaluation
  – Relevance of the summary/article to the actual topic
  – Relevance of the related images to the abstract text, or vice versa
  – Quality of the magazine design
Technicolor Challenge: Audiovisual Recognition of Specific Events
Technicolor Challenge
• Goal
  – Given a short video sequence with audio, stemming from the coverage of a public event, the system should produce precise textual information about it.
Technicolor Challenge
• A description at the event-identity level:
  – Which event is it?
  – When and where did it take place?
  – What is its context?
  – What is precisely happening in the audio-visual scene?
  – In particular, who are the persons in the scene?
  – Where are they in the image?
  – What are they doing or saying?
Example
Key ideas
• Automatically extract as much information as possible from the audio-visual query and use it to search the intertwined textual, audio, and visual data available online!
  – Extraction of compact low-level audio-visual signatures
  – Detection and recognition of text present in the images
  – Detection and recognition of speech present in the audio track
  – Semantic analysis of the audio-visual content
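To give a flavor of the first idea, here is a deliberately tiny sketch of a "compact low-level signature": binarize an audio energy envelope by whether each frame is louder than the previous one, then compare clips by Hamming distance. This is an illustrative toy, not a real fingerprinting scheme such as those used in production audio-ID systems.

```python
# Toy audio signature: one bit per frame transition (louder / not
# louder), compared with Hamming distance. Illustrative only.

def signature(envelope):
    """Turn a per-frame energy envelope into a list of bits."""
    return [1 if b > a else 0 for a, b in zip(envelope, envelope[1:])]

def hamming(sig_a, sig_b):
    """Number of differing bits (signatures must be equal length)."""
    return sum(a != b for a, b in zip(sig_a, sig_b))

query      = [0.1, 0.4, 0.3, 0.8, 0.2, 0.5]
same_event = [0.2, 0.5, 0.4, 0.9, 0.3, 0.6]   # same shape, just louder
other      = [0.9, 0.1, 0.8, 0.2, 0.7, 0.1]   # different pattern
print(hamming(signature(query), signature(same_event)))  # small distance
print(hamming(signature(query), signature(other)))       # large distance
```

Because the signature keeps only relative changes, it is invariant to overall loudness, which is exactly the kind of robustness a signature for matching noisy event recordings needs.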
Huawei/3DLife Challenge: Realistic Interaction in Online Virtual Environments
Huawei/3DLife Challenge
• Goal
  – Support real-time, realistic interaction between humans in online virtual environments
• Scenario
  – An online dance class where a dance teacher and a student perform a series of movements
Huawei/3DLife Challenge
• Not limited to a particular capture technology
  – Visual sensing techniques: a single camera, a camera network, wearable inertial motion sensing
  – Gaming controllers: the Nintendo Wii, the Microsoft Kinect
Huawei/3DLife Challenge
• Work with the provided data set to illustrate the key technical components required to realize this kind of online interaction and communication:
  – 3D data acquisition and processing from multiple sensor data sources
  – Realistic (optionally real-time) rendering of 3D data based on noisy or incomplete sources
  – Realistic and naturalistic marker-less motion capture
  – Human factors around interaction modalities in virtual worlds
• http://perso.telecom-paristech.fr/~essid/3dlife-gc-12
Huawei/3DLife Challenge
• A data set is provided, including:
  – Synchronization data between each of the multiple calibrated sources capturing the student's movements
  – Original music excerpts consisting of a few tracks at tempos varying from slow to fast
  – Inertial (accelerometer + gyroscope + magnetometer) sensor data captured from multiple sensors on the student's body
  – Depth maps of the student's performance captured using a Microsoft Kinect
  – Ratings of the student's performances by the teacher
  – A form of annotation of the choreographies performed (mostly basic steps and movements for salsa beginners)
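A first practical step with such a multi-sensor data set is temporal alignment. The sketch below is an assumption about how one might use the synchronization data, not the data set's actual format: it pairs each Kinect depth frame with the nearest inertial sample by timestamp.

```python
import bisect

# Toy multi-sensor alignment: for each depth frame (~30 fps), find the
# index of the nearest inertial sample (~100 Hz) by timestamp.
# Timestamps and rates here are invented for illustration.

def align(frame_times, sensor_times):
    """For each frame timestamp, return the index of the nearest
    sensor sample. sensor_times must be sorted ascending."""
    out = []
    for t in frame_times:
        i = bisect.bisect_left(sensor_times, t)
        candidates = [j for j in (i - 1, i) if 0 <= j < len(sensor_times)]
        out.append(min(candidates, key=lambda j: abs(sensor_times[j] - t)))
    return out

frames  = [0.00, 0.033, 0.066]                                       # depth frames
sensors = [0.000, 0.010, 0.020, 0.030, 0.040, 0.050, 0.060, 0.070]   # inertial samples
print(align(frames, sensors))
```

Once streams are aligned, the fused per-frame records (depth map + inertial reading) become the input for motion capture, rendering, and performance-rating experiments.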
Start Early!
• Upload your report to Moodle by 11:55pm, May 8, 2012
• At most 4 pages, using the ACM MM template
• Prepare a short presentation (<10 mins) to share your ideas on a challenge
• More information:
  – http://www.acmmm12.org/call-for-multimedia-grand-challenge-solutions/