The Speech Recognition Virtual Kitchen Florian Metze and Eric Fosler-Lussier INTERSPEECH 2012

Page 1: The Speech Recognition Virtual Kitchen

The Speech Recognition Virtual Kitchen

Florian Metze and Eric Fosler-Lussier

INTERSPEECH 2012

Page 2: The Speech Recognition Virtual Kitchen

Multimedia Retrieval and Summarization

“Traditional” Multimedia Retrieval and Summarization

Select frames and shots that are most informative

Save user time by avoiding repetitions etc. (BBC Rushes Summarization)

Recent Advances in Natural Language Processing

Replace “extractive” summarization of text with “abstractive” techniques

Use Statistical Machine Translation as a general technique to convert long “foreign” symbol sequence into concise English text

Would this not apply nicely to multimedia?

Easily have huge amounts of data

“Skimming”, “tagging” with keywords, or “liking” clearly doesn’t do justice to the relevance, complexity and potential of multimedia

Page 3: The Speech Recognition Virtual Kitchen

What’s Next?

Generate more detailed synopses, add temporal aspects, properties

Add more modalities (sounds, etc.)

“What is in these videos?”

Text could summarize multiple videos at once

Attract interest to (groups of) videos

“Why is this video relevant? Or different?”

Text can relate a retrieved video to the query

Text can potentially flag false alarms, outliers

Page 4: The Speech Recognition Virtual Kitchen

Thank You!

Page 5: The Speech Recognition Virtual Kitchen

Feature Definition

Event name: Changing a vehicle tire

Definition: One or more people work to replace a tire on a vehicle

Explication: A vehicle is any device, motorized or not, used to transport people and/or other items. Tires are ring-shaped inflated objects, usually made of rubber, that fit over the wheel of a vehicle. The process for replacing a tire includes removing the existing tire and installing the new tire onto the wheel of the vehicle. Tires typically are replaced because they are damaged or worn down. If a tire is damaged and loses air pressure as a result, it is called a “flat tire”. Generally the driver of the vehicle with a flat tire will stop the vehicle as soon as possible and replace the affected tire with a temporary tire called a “spare tire”, which may be stored elsewhere on/in the vehicle. In other cases, the tire may be changed not by the vehicle operator, but by a professional (e.g. a mechanic) who may use dedicated tools and work in a repair shop or similar setting.

Evidential description:

scene: garage, outdoors, street, parking lot

objects/people: tire, lug wrench, hubcap, vehicle (car, bike, lawnmower, etc.), tire jack

activities: removing hubcap, turning lug wrench, unscrewing bolts, pulling rim out of tire

audio: narration of the process; sounds of tools being used; street/traffic noise; background noises from repair shop

Extract candidates for relevant objects from “Event Kit”

Determine salient objects from MED features

Intersect both sets

Use ontologies to resolve synonyms, etc.

Combine data-driven and knowledge-based sources
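
The selection steps on this slide can be sketched roughly as follows. This is only an illustration under stated assumptions, not the system described in the talk: the `SYNONYMS` table stands in for a real ontology lookup, and the detector scores, the `threshold` value, and the function names are all hypothetical.

```python
# Hypothetical synonym table standing in for an ontology lookup.
SYNONYMS = {
    "tyre": "tire",
    "automobile": "car",
    "lugwrench": "lug wrench",
}


def canonical(term):
    """Normalize a term and map known synonyms to one canonical label."""
    cleaned = term.strip().lower()
    return SYNONYMS.get(cleaned, cleaned)


def select_objects(event_kit_terms, detector_scores, threshold=0.5):
    """Intersect Event Kit candidates with salient detected objects.

    event_kit_terms: object names taken from the textual event description.
    detector_scores: label -> confidence (assumed to come from visual features).
    threshold: assumed cut-off above which a detected object counts as salient.
    """
    kit = {canonical(t) for t in event_kit_terms}
    salient = {}
    for label, score in detector_scores.items():
        if score >= threshold:
            key = canonical(label)
            salient[key] = max(score, salient.get(key, 0.0))
    shared = kit & set(salient)
    # Highest-scoring shared objects first; ties broken alphabetically.
    return sorted(shared, key=lambda k: (-salient[k], k))


# Made-up example values for the "Changing a vehicle tire" event.
kit_terms = ["tire", "lug wrench", "hubcap", "car", "tire jack"]
scores = {"tyre": 0.91, "automobile": 0.75, "dog": 0.80, "hubcap": 0.30}
selected = select_objects(kit_terms, scores)
print(selected)
```

In this toy run, "dog" is salient but absent from the Event Kit, and "hubcap" is in the Event Kit but falls below the assumed threshold, so neither survives the intersection.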

Page 6: The Speech Recognition Virtual Kitchen

MER Approach: Feature Extraction

What to mention:

Take visual evidence (for 100s of classes) for video

Re-rank using manually determined “importance”

How to mention:

Present as corroborating or contraindicative evidence

Place additional constraints

Similar for ASR hypotheses

Based on unigrams for now

Move from “hand-engineered” to automatic methods

Now: similar to a TF-IDF measure over bag-of-words (BOW) features

Future: Bipartite graph matching to determine “good” concepts
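
As a rough illustration of a TF-IDF-like measure over bag-of-words concept features, the sketch below treats each event as a "document" made of the concept labels detected in its videos. The data, the function name `tfidf_by_event`, and the exact weighting formula are assumptions chosen for the example; the slide does not specify the measure actually used.

```python
import math
from collections import Counter


def tfidf_by_event(event_videos):
    """Score concepts per event with a simple TF-IDF weighting.

    event_videos: event name -> list of videos, each a list of concept labels.
    Returns: event name -> {concept: tf * idf}.
    """
    n_events = len(event_videos)
    tf = {}
    for event, videos in event_videos.items():
        counts = Counter(label for video in videos for label in video)
        total = sum(counts.values())
        tf[event] = {c: n / total for c, n in counts.items()} if total else {}

    # Document frequency: in how many events does each concept occur?
    df = Counter()
    for weights in tf.values():
        df.update(weights.keys())

    return {
        event: {c: w * math.log(n_events / df[c]) for c, w in weights.items()}
        for event, weights in tf.items()
    }


# Made-up concept detections for three events.
videos = {
    "Birthday": [["cake", "people", "candle"], ["cake", "people"]],
    "Flash mob": [["crowd", "people", "dancing"]],
    "Changing a vehicle tire": [["tire", "car", "people"], ["tire", "lug wrench"]],
}
scores = tfidf_by_event(videos)
top = {event: max(weights, key=weights.get) for event, weights in scores.items()}
print(top["Changing a vehicle tire"], top["Birthday"])
```

A concept such as "people" that occurs in every event receives a weight of zero, which is one simple way to keep uninformative evidence out of the generated text.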

Example events (figure panels): Birthday, Vehicle unstuck, Flash mob, Vehicle tire

Page 7: The Speech Recognition Virtual Kitchen

INTERSPEECH AFTERPARTY “Speech Recognition Virtual Kitchen”

Broadway 3 & 4

4:30pm on Thursday, September 13

We want your input to grow this idea further – show your support

Come and see more demos of VMs

Discuss with potential users or content providers from outside the speech community

Present your own ideas in a short presentation(?)

http://www.speechkitchen.org/