Multimodality, universals, natural interaction… and some other stories…
Kostas Karpouzis & Stefanos Kollias, ICCS/NTUA, HUMAINE WP4

Page 1:

multimodality, universals, natural interaction…

and some other stories…

Kostas Karpouzis & Stefanos Kollias

ICCS/NTUA

HUMAINE WP4

Page 2:

going multimodal

• ‘multimodal’ is this decade’s main aspect of ‘affective interaction’.

• plethora of modalities available to capture and process
  – visual, aural, haptic…
  – ‘visual’ can be broken down into ‘facial expressivity’, ‘hand gesturing’, ‘body language’, etc.
  – ‘aural’ into ‘prosody’, ‘linguistic content’, etc.

Page 3:

why multimodal?

• Extending unimodality…
  – recognition from traditional unimodal inputs had serious limitations
  – multimodal corpora are becoming available

• What is there to gain?
  – have recognition rates improved?
  – or have we just introduced more uncertain features?

Page 4:

essential reading

• S. Oviatt, ‘Ten Myths of Multimodal Interaction’, Communications of the ACM, Nov. 1999, Vol. 42, No. 11, pp. 74-81

Page 5:

putting it all together

• myth #6: multimodal integration involves redundancy of content between modes

• you have features from a person’s
  – facial expressions and body language
  – speech prosody and linguistic content
  – even their heart rate

• so, what do you do when their face tells you something different than their …heart? (a fusion sketch follows below)
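One common way to handle this (a minimal sketch, not the specific fusion method used in HUMAINE) is decision-level fusion: each modality contributes a class distribution plus a confidence weight, so a calm face and a racing heart end up in a weighted compromise instead of a hard contradiction. The labels, scores and weights below are invented for illustration.

```python
# Decision-level ("late") fusion sketch: each modality contributes a class
# distribution plus a confidence weight; contradictions become soft compromises.
# All labels, scores and weights are made up for illustration.

def fuse(predictions):
    """predictions: list of (class_probabilities: dict, confidence: float)."""
    fused = {}
    total_weight = sum(conf for _, conf in predictions) or 1.0
    for probs, conf in predictions:
        for label, p in probs.items():
            fused[label] = fused.get(label, 0.0) + conf * p
    # Normalise so the fused scores form a distribution again.
    return {label: score / total_weight for label, score in fused.items()}

face = ({"neutral": 0.7, "stressed": 0.3}, 0.9)   # calm face, tracker is confident
heart = ({"neutral": 0.2, "stressed": 0.8}, 0.5)  # racing heart, noisier sensor

result = fuse([face, heart])
print(max(result, key=result.get), result)
```

With these made-up numbers the fused decision stays ‘neutral’, but only barely; raising the heart-rate sensor’s confidence flips it.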

Page 6:

first, look at this video

Page 7:

and now, listen!

Page 8:

but it can be good

• what happens when one of the available modalities is not robust?
  – better yet, when the ‘weak’ modality changes over time?

• consider the ‘bartender problem’ (a sketch follows below)
  – very little linguistic content reaches its target
  – mouth shape is available (visemes)
  – limited vocabulary
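As a rough illustration of why the weak channel can still help (a sketch under invented numbers, not a published bartender-problem system): weight the acoustic channel by an estimated signal-to-noise ratio, so that when the bar gets loud, visemes plus the restricted drink vocabulary carry the decision. The vocabulary, scores and SNR-to-weight mapping below are all hypothetical.

```python
# Bartender-problem sketch: a tiny drink vocabulary, scored from two channels.
# The acoustic channel's weight shrinks as the estimated noise level rises,
# so lip shapes (visemes) plus the restricted vocabulary take over.
# Vocabulary, scores and the SNR-to-weight mapping are all illustrative.

VOCABULARY = ["martini", "mojito", "whisky"]

def recognise(acoustic_scores, viseme_scores, snr_db):
    # Map the signal-to-noise estimate to a weight in [0, 1] (hypothetical mapping).
    audio_weight = max(0.0, min(1.0, snr_db / 20.0))
    combined = {}
    for word in VOCABULARY:
        combined[word] = (audio_weight * acoustic_scores.get(word, 0.0)
                          + (1.0 - audio_weight) * viseme_scores.get(word, 0.0))
    return max(combined, key=combined.get), combined

# Loud bar: speech is nearly useless (SNR ~2 dB), mouth shape still visible.
acoustic = {"martini": 0.40, "mojito": 0.35, "whisky": 0.25}   # almost flat
visemes  = {"martini": 0.15, "mojito": 0.70, "whisky": 0.15}   # rounded lips win

print(recognise(acoustic, visemes, snr_db=2.0))
```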

Page 9:

but it can be good

Page 10:

again, why multimodal?

• holy grail: assigning labels to different parts of human-human or human-computer interaction

• yes, labels can be nice!
  – humans do it all the time
  – and so do computers (e.g., classification)
  – OK, but what kind of label? (see the sketch below)
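The closing question deserves a concrete contrast (an illustrative sketch; the quadrant names and zero thresholds are my own, not a HUMAINE-defined scheme): the same moment of interaction can receive a categorical tag or a point on a continuous valence/arousal plane, Feeltrace-style, which can then be coarsened back into quadrant labels when needed.

```python
# Two kinds of label for the same moment of interaction:
#   - a categorical tag ("angry", "happy", ...)
#   - a point in a continuous valence/arousal plane (Feeltrace-style)
# The quadrant names and zero thresholds below are illustrative choices.

def quadrant_label(valence, arousal):
    """Map a (valence, arousal) point in [-1, 1]^2 to a coarse quadrant label."""
    if arousal >= 0:
        return "positive/active" if valence >= 0 else "negative/active"
    return "positive/passive" if valence >= 0 else "negative/passive"

# A short Feeltrace-like trace: (time in seconds, valence, arousal).
trace = [(0.0, 0.1, 0.2), (0.5, -0.4, 0.6), (1.0, -0.6, 0.7)]

for t, v, a in trace:
    print(f"t={t:.1f}s valence={v:+.1f} arousal={a:+.1f} -> {quadrant_label(v, a)}")
```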

Page 11:

In the beginning …

• Based on the claim that ‘there are six facial expressions recognized universally across cultures’…

• all video databases used to contain images of sad, angry, happy or fearful people…

• thus, more sad, angry, happy or fearful people appear, even when data involve HCI, and subtle emotions/additional labels are out of the picture
  – can you really be afraid that often when using your computer?

Page 12:

the Humaine approach

• so where is Humaine in all that?
  – subtle emotions
  – natural expressivity
  – alternative emotion representations
  – discussing dynamics
  – classification of emotional episodes from life-like HCI and reality TV

Page 13:

Humaine WP4 results

ERMIS SAL (QUB-ICCS)
  – Frames/users/length: four subjects, ~2 hr of audio/video annotated with Feeltrace
  – Modalities present: facial expressions, speech prosody, head pose
  – Features extracted: FAPs (MPEG-4 Facial Animation Parameters) per frame, acoustic features per tune, phonemes/visemes
  – Until now: one subject analyzed (~34,000 frames, ~800 tunes)
  – Plans for 2007: extract facial and prosody features from the three remaining subjects; analyze head pose
  – Recognition rates: recurrent NNs 87%, rule-based 78.4%, possibilistic 65.1%

EmoTV (LIMSI)
  – Frames/users/length: 28 clips, ~5 minutes total
  – Modalities present: subtle facial expressions, restricted gesturing
  – Features extracted: overall activation (FAPs or prosody not possible)
  – Until now: all clips
  – Plans for 2007: extract remaining expressivity features (where possible)
  – Recognition rates: correlation with manual annotator κ = 0.83

EmoTaboo (LIMSI)
  – Frames/users/length: 2 clips, ~5 minutes
  – Modalities present: facial expressions, speech prosody
  – Features extracted: FAPs
  – Until now: all clips
  – Plans for 2007: head pose, prosody features
  – Recognition rates: annotation not yet available

CEICES (FAU)
  – Frames/users/length: 51 children, ~9 hrs recorded and annotated
  – Modalities present: speech prosody
  – Features extracted: acoustic features per turn/word
  – Until now: all clips
  – Plans for 2007: completed analysis, pending comparison of recognition schemes
  – Recognition rates: mean recognition rate 55.8%

Genoa06 corpus (Genoa)
  – Frames/users/length: 10 subjects, ~50 gesture repetitions each, ~1 hour
  – Modalities present: FAPs, gesturing, pseudolanguage
  – Features extracted: FAPs, gestures, speech
  – Until now: all clips
  – Plans for 2007: expressivity features from hand movement
  – Recognition rates: facial 59.6%, gestures 67.1%, speech 70.8%, multimodal 78.3%

GEMEP (GERG)
  – Frames/users/length: 1200 clips total
  – Modalities present: FAPs, gesturing, pseudolanguage
  – Features extracted: expressivity, gestures, FAPs, speech
  – Until now: 8 body clips, 30 face clips
  – Plans for 2007: analyze remaining 1200 clips
  – Recognition rates: few clips analyzed
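The EmoTV entry above reports agreement with a manual annotator as a kappa value; for readers unfamiliar with the statistic, here is a minimal computation of one common variant, Cohen's kappa, over two categorical label sequences (the sequences themselves are made up).

```python
# Minimal Cohen's kappa: chance-corrected agreement between two annotators.
# The label sequences below are invented for illustration.
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    assert labels_a and len(labels_a) == len(labels_b)
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    # Expected agreement if both annotators labelled at random with their own rates.
    expected = sum((counts_a[c] / n) * (counts_b[c] / n)
                   for c in set(counts_a) | set(counts_b))
    return (observed - expected) / (1.0 - expected)

annotator_1 = ["pos", "pos", "neg", "neu", "pos", "neg"]
annotator_2 = ["pos", "neu", "neg", "neu", "pos", "neg"]
print(f"kappa = {cohens_kappa(annotator_1, annotator_2):.2f}")
```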

Page 14:

HUMAINE 2010

three years from now, in a galaxy (not) far, far away…

Page 15:

a fundamental question

Page 16:

a fundamental question

• OK, people may be angry or sad, or express positive/active emotions

• face recognition provides an answer to the ‘who?’ question

• ‘when?’ and ‘where?’ are usually known or irrelevant

• but, does anyone know ‘why?’
  – context information
  – semantics

Page 17:

a fundamental question (2)

Page 18:

is it me or?...

Page 19:

is it me or?...

• some modalities may display no clues or, worse, contradictory clues

• the same expression may mean different things coming from different people

• can we ‘bridge’ what we know about someone or about the interaction with what we sense?
  – and can we adapt what we know based on that? (a simple adaptation sketch follows)
  – or can we align what we sense with other sources?
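One simple reading of ‘adapting what we know’ (an illustrative sketch, not the HUMAINE adaptation mechanism): keep a slowly updated per-user baseline for each feature and interpret new observations relative to it, so the same raw smile intensity counts for more coming from a habitually flat face.

```python
# Per-user adaptation sketch (illustrative, not the HUMAINE method):
# keep an exponentially-smoothed baseline of each feature per user and
# interpret new observations relative to that baseline.

class UserBaseline:
    def __init__(self, smoothing=0.05):
        self.smoothing = smoothing      # how quickly the baseline adapts
        self.baseline = {}              # feature name -> running mean

    def normalise(self, features):
        """Return baseline-relative features, then update the baseline."""
        relative = {}
        for name, value in features.items():
            base = self.baseline.get(name, value)
            relative[name] = value - base
            self.baseline[name] = (1 - self.smoothing) * base + self.smoothing * value
        return relative

# The same raw smile intensity means more from a habitually flat face.
expressive_user, reserved_user = UserBaseline(), UserBaseline()
for _ in range(50):                     # build up each user's habitual level
    expressive_user.normalise({"smile": 0.6})
    reserved_user.normalise({"smile": 0.1})

print(expressive_user.normalise({"smile": 0.7}))  # small deviation
print(reserved_user.normalise({"smile": 0.7}))    # large deviation
```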

Page 20:

another kind of language

Page 21:

another kind of language

• sign language analysis poses a number of interesting problems
  – image processing and understanding tasks
  – syntactic analysis
  – context (e.g. when referring to a third person)
  – natural language processing
  – vocabulary limitations

Page 22:

want answers?

Let us try to extend some of the issues already raised!

Page 23:

Semantic Analysis

Semantics – Context (a peek at the future)

[Architecture diagram: visual data → segmentation → feature extraction → classifiers (C1, C2, …, Cn) → fusion → adaptation → labelling; supported by a visual analysis module, a context analysis module, an ontology infrastructure, the Fuzzy Reasoning Engine (FiRE) and a centralised/decentralised knowledge repository.]
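One way to read the diagram in code (a loose illustration, not the actual FiRE engine or ontology infrastructure): visual classifiers propose labels with degrees of support, and fuzzy-style context rules, using min/max operators, rescale those degrees before the final labelling step. The rule contents and numbers below are invented.

```python
# Illustrative reading of the pipeline above (not the actual FiRE engine):
# classifiers propose labels with degrees of support in [0, 1]; simple
# fuzzy-style context rules rescale those degrees before the final labelling.

def apply_context_rules(class_degrees, context):
    """class_degrees: label -> degree in [0, 1]; context: facts about the scene."""
    adjusted = dict(class_degrees)
    # Rule: in a formal meeting, strong 'ecstatic' readings are usually overstated.
    if context.get("setting") == "meeting":
        adjusted["ecstatic"] = min(adjusted.get("ecstatic", 0.0), 0.3)
    # Rule: if the task just failed, raise support for frustration-like labels.
    if context.get("task_outcome") == "failure":
        adjusted["frustrated"] = max(adjusted.get("frustrated", 0.0), 0.6)
    return adjusted

visual_output = {"ecstatic": 0.8, "frustrated": 0.2, "neutral": 0.4}
context = {"setting": "meeting", "task_outcome": "failure"}

adjusted = apply_context_rules(visual_output, context)
print(max(adjusted, key=adjusted.get), adjusted)
```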

Page 24:

Standardisation Activities

• W3C Multimedia Semantics Incubator Group

• W3C Emotion Incubator Group

Provide machine-understandable representations of available emotion modelling, analysis and synthesis theory, cues and results, to be accessed through the Web and used in all types of affective interaction.