FORGE 1 year: Jarno Kallio, PacketVideo

VisualLabel: Video Analytics Made Open in a Big Way!

Jarno Kallio / PacketVideo Finland Oy, on behalf of the VisualLabel team

Motivation 1/2

• Applications utilizing “Big data” computing are expected to provide an ICT breakthrough in the coming years.

• Big data computing started from Web search engines 15 years ago.

• Now “googling” has become the standard way of searching and analyzing textual information.

• Extending this to media items (images and videos) is still incomplete, and existing in-house solutions are available only to the big IT giants.

Motivation 2/2

• Implementing such a media analytics service has not been possible until the recent developments in cloud technology.

• Cloud platforms such as Forge provide computational resources for resource-intensive content analysis services.

• Additionally, Forge has the unique characteristic of being a shared sandbox for open collaboration between companies and universities.

Demo: Video Analysis Using the PicSOM System

You will see authentic, unprocessed object recognition results as green subtitles over the video.

Demo of Video Analysis Capabilities:

The example clip (2:19) is a trailer from the open-source movie “Vaalkama”.

Achieved Goals So Far:

1. Design a media analysis service for photos and videos

2. Design an integration specification for content analysis back-ends and cloud storage

   • Publicly available REST APIs

   • Data formats…

3. Implement prototype applications that demonstrate the basic system features

Target for 2015:

4. Refine the reference service and platform so that they are easily available for testing by third parties

5. Design and validate self-learning algorithms for media content analysis

Challenges... and How Forge Came to the Rescue!

Before Forge we ran a couple of pilots with test users. The results themselves looked promising, but...

• Getting all 3 services running smoothly on separate university and company clouds was a pain because:

   1. We didn’t have separate stable and dev environments

   2. The university cloud environment was sometimes running other services -> offline for test users

• Now that both the frontend and backend services are running in Forge, these problems are solved!

How It All Works 1/3: FORGE Deployment

How It All Works 2/3: Task-Based Analysis

How It All Works 3/3: Additional Details

• Performed for content stored on third-party providers

• No media storage, only metadata is synchronized

• Support for popular content providers (e.g. Google, Facebook, Twitter)

• On-demand analysis for directly uploaded content

• Generate tags (keywords) from media content

• Generate sentences that describe the image in the English language

• Facial recognition

• Detect similarities for extended search functionality
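The "no media storage, only metadata is synchronized" point can be illustrated with a small data-structure sketch. Everything here is an assumption made for illustration: the `MediaRecord` fields, the `synchronize` helper and the in-memory `store` dictionary are hypothetical and are not taken from the VisualLabel specification. The idea shown is only that the service keeps a reference to the remote item plus analysis results, never the media bytes.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Tuple


@dataclass
class MediaRecord:
    """Hypothetical metadata record: a pointer to remote media plus analysis output."""
    provider: str          # e.g. "twitter" (illustrative value)
    remote_id: str         # identifier of the item at the provider
    url: str               # where the media lives; the media itself is not copied
    tags: List[str] = field(default_factory=list)


def synchronize(store: Dict[Tuple[str, str], MediaRecord],
                provider: str,
                items: Iterable[Tuple[str, str]]) -> Dict[Tuple[str, str], MediaRecord]:
    """Add records for new (remote_id, url) pairs; keep existing records and their tags."""
    for remote_id, url in items:
        key = (provider, remote_id)
        if key not in store:
            store[key] = MediaRecord(provider, remote_id, url)
    return store
```

Re-running `synchronize` with the same items leaves already generated tags untouched, which is one simple way to make repeated synchronization safe.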

Back-end Analysis Systems:

• PicSOM (Aalto)

• Tag-Engine (Arcada)

• CMUVIS (TUT-SGN)

PicSOM media analysis system

• Developed at Aalto University since 1998

• Content analysis of images and videos with a large variety of visual features, including many types of SIFT, Fisher Vector and deep convolutional neural network features

• Also provides interactive search with iterative query refinement based on relevance feedback

• Uses very fast linear Support Vector Machines for content classification and visual category detection

• Self-Organizing Maps are used for iterative search and class distribution analysis

• Participated in NIST's annual video search evaluation TRECVID since 2005 and ranked 2nd in the Semantic Indexing task in 2014

Automatic Tag Extraction from Social Media for Visual Labeling (Arcada)

1. FacebookAnalyzer

   • Retrieves basic Facebook profile content and generates from it tags for the user’s interests and hobbies

2. TwitterAnalyzer

   • Retrieves tweet-image pairs from public Twitter accounts

   • Analyzes the tweet text to extract hashtags, named entities, keywords and phrases

   • Post-processes the result to remove noisy tags
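The hashtag extraction step of the TwitterAnalyzer can be sketched with a regular expression. This is a minimal illustration and not the Tag-Engine implementation: the function name `extract_hashtags`, the patterns, and the choice to strip links before matching are all assumptions.

```python
import re
from typing import List

# Assumed patterns: links are removed first so that '#' inside a URL is not picked up.
URL_RE = re.compile(r"(?:https?://\S+|pic\.twitter\.com/\S+)")
HASHTAG_RE = re.compile(r"#(\w+)")  # \w is Unicode-aware in Python 3, so 'ä' matches


def extract_hashtags(tweet: str) -> List[str]:
    """Return hashtag texts (without '#') in order of appearance, duplicates removed."""
    text = URL_RE.sub(" ", tweet)
    seen = set()
    result = []
    for tag in HASHTAG_RE.findall(text):
        if tag.lower() not in seen:      # simple de-duplication as a noise filter
            seen.add(tag.lower())
            result.append(tag)
    return result
```

Because the pattern stops at the next `#`, hashtags written without a separating space, as in `#some#pärstäpankki`, are still returned as separate tags.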

Core of the Tag-Engine

• Make use of existing content structure information to identify relevant parts of the content

• Allow preferences in content weighting to customize the system for different types of profiles

• Heuristic rules; statistical term weighting method: TF-IDF term weighting, with adjustment

• N-grams, named entities, hashtags

• Simple, generic, handling multiple languages
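The combination of n-grams and TF-IDF weighting named above can be sketched as follows. This is a generic textbook variant under stated assumptions, not the Tag-Engine's actual method: the tokenizer, the smoothed IDF formula `log((1 + N) / (1 + df)) + 1`, and the function names `ngrams` and `tfidf_tags` are illustrative choices, and the "adjustment" mentioned on the slide is not modeled because the slide does not specify it.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def ngrams(text: str, max_n: int = 4) -> List[str]:
    """Lowercase the text and return all word n-grams for n = 1..max_n."""
    tokens = re.findall(r"[#@]?\w+", text.lower())  # keep '#'/'@' prefixes on tokens
    grams = []
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def tfidf_tags(docs: List[str], doc_index: int,
               top_k: int = 5, max_n: int = 4) -> List[Tuple[str, float]]:
    """Rank the n-grams of docs[doc_index] by TF-IDF against the whole collection."""
    counts = [Counter(ngrams(d, max_n)) for d in docs]
    doc_freq = Counter()
    for c in counts:
        doc_freq.update(c.keys())        # each n-gram counted once per document
    target = counts[doc_index]
    total = sum(target.values())
    if total == 0:
        return []
    n_docs = len(docs)
    scores = {
        gram: (cnt / total) * (math.log((1 + n_docs) / (1 + doc_freq[gram])) + 1.0)
        for gram, cnt in target.items()
    }
    # Highest score first; ties broken alphabetically for deterministic output.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]
```

Since the weighting only needs token counts, the same sketch applies to any language that can be split into words, which fits the "simple, generic" design goal.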

Twitter Analysis: English Tweet

President Obama meets with @VP Biden and members of his National Security Council in the Situation Room. pic.twitter.com/SDWvZ1sSnL

"Neil Armstrong, Buzz Aldrin and Michael Collins took the 1st small steps of our giant leap into the future." —Obama pic.twitter.com/OVwaxP1kgm

Tags for the first tweet:
Text Tags: situation room, president obama, meets, national security council, members of his national, council in the situation, obama meets with @vp, meets with @vp biden, @vp biden and members
Named Entity Tags: Obama
Hashtags: VP

Tags for the second tweet:
Text Tags: neil armstrong, giant leap, 1st small steps, steps of our giant, buzz aldrin and michael, collins took the 1st, aldrin and michael collins, leap into the future
Named Entity Tags: Obama, Neil Armstrong, Buzz Aldrin, Michael Collins

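Twitter Analysis: Finnish Tweet

Olympialaisten avajaisissa nähtiin koreita kuvioita – ja osasi se Akukin jo 20 vuotta sitten... #Sotshi #Lillehammer pic.twitter.com/ZRwl9PfLZw
(English: Fancy patterns were seen at the Olympic opening ceremony, and Aku (Donald Duck) already knew how 20 years ago...)

#Facebook täyttää tänään 10. Vuodesta 2010 mukana ollut Aku Ankka onnittelee. #some #pärstäpankki pic.twitter.com/tO74l0Jc7o
(English: Facebook turns 10 today. Aku Ankka (Donald Duck), on board since 2010, sends congratulations.)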

Tags for the first tweet:
Text Tags: opening ceremony, olympic games, koreita patterns -, 20 years ago, knew it akukin, patterns - and knew, ceremony of the olympic, akukin already 20 years, games was seen koreita
Named Entity Tags: Akukin
Hashtags: Sotshi, Lillehammer

Tags for the second tweet:
Text Tags: today 10, donald duck congratulates, involved had donald duck, 2010 involved had donald
Text Tags in Original Language: tänään 10, aku ankka onnittelee, mukana oli aku ankka, 2010 mukana oli aku
Named Entity Tags: Donald Duck, aku ankka
Hashtags: Facebook, some, pärstäpankki


Motivation and Goals

• Goal(s):

• Detection of different indoor furniture types in user uploaded images.

• Automatically recognize different furniture types in vendor-provided images.

Cloud MUVIS: Visual Label Project

Cloud MUVIS Architecture

[Architecture diagram. Offline processing: Images/Audios/Videos/Text, Data Partition, Machine Learning (Feature Extraction). Online processing: Images/Audios/Videos/Text, Match score/Recommendation.]

Challenges

• Unlimited Object Categories

• Appearance Variation

• Object Pose

• View Angle

• Illumination Variation

• Occlusions

• Image Quality Variation

Prototype Applications

Prototype Application #1: Smart Photo Service



Prototype Application #2: Facebook Profile Summarization


Partners:

Thank You!

Q & A

Backup Slides

Summarization

• Analysis of the user’s social media accounts

• Generate tags from profile content (posts, tweets, timelines...)

• Provide tags as suggestions for content modifications

Search

• Text-based search using pre-generated keywords

• Similarity search using features extracted from analyzed content

• On-demand similarity search

   – Direct file upload

   – URL content

• Search targeted to the user’s synchronized content (accounts), not the public Internet
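Similarity search over extracted features can be sketched as a nearest-neighbour lookup. The cosine measure, the brute-force scan and the names `cosine` and `similar_items` are assumptions chosen for illustration; the slides do not say which distance or index the back-ends actually use.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def similar_items(query: Sequence[float],
                  index: Dict[str, Sequence[float]],
                  top_k: int = 3) -> List[Tuple[str, float]]:
    """Return the top_k (item_id, similarity) pairs from the user's own indexed content."""
    scored = [(item_id, cosine(query, vec)) for item_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```

For an on-demand search, the query vector would come from analyzing the uploaded file or URL content, while `index` holds only features of the user's synchronized content, matching the scope restriction in the last bullet.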

Feedback

• Indirectly from the user’s actions

   – Content modifications

• Directly from users

   – Feedback for search results

• Delivered to back-ends to facilitate self-learning algorithms

Technology
