VoiceUNL : a proposal to represent speech control mechanisms within the Universal Networking Digital Language Mutsuko Tomokiyo (GETA-CLIPS-IMAG) & Gérard

VoiceUNL : a proposal to represent speech control mechanisms within the Universal Networking Digital Language

Mutsuko Tomokiyo (GETA-CLIPS-IMAG) & Gérard Chollet (ENST)

Reviewed by Christian Boitet (GETA-CLIPS-IMAG)

Content

• Background of this work

• Proposal of extension of UNL - Speech to speech MT

• Emotion representation

Background

• Normalangue - normalization of linguistic resources (2002)

• TECHNOVOC and RNIL (2002) - normalization of technologies applied to the engineering of written and spoken language; SIEMENS, TELISMA, IDYLIC, DIALOCA, ELAN Speech, ST Microelectronics, LORIA, ENST Paris

• Lingtour (2002) - multilingual, multimedia MT; TsingHua University (China), Paris 8 University (France), INT (France), ENST-Paris and Bretagne (France) and CLIPS (France)

Extension of UNL (1)

Example :
- May I smoke?
- No! You may not, Victor.

[S:01]
{org:e1}- May I smoke?{/org}
{unl}
agt(smoke(icl>do).@entry.@present.@may.@interrogative, I)
{/unl}
[/S]


Extension of UNL (2)

[S:02]
{org:e2}- No! you may not, Victor [arte]{/org}
{unl}
agt(smoke(icl>do).@entry.@present.@may.@not, you)
mod(smoke(icl>do).@entry.@present.@may.@not, no)
mod(no, !(icl>symbol).@interjection)
mod(smoke(icl>do).@entry.@present.@may.@not, Victor(icl>name).@vocative)
{/unl}
[/S]
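The binary relations in the two examples above share one shape: a relation label, then two universal words, each with an optional restriction in parentheses and a chain of `.@` attributes. A minimal Python sketch of a reader for that shape follows. The names `UniversalWord`, `Relation`, `parse_uw` and `parse_relation` are hypothetical and introduced here for illustration only; the sketch covers just the forms that appear in these examples, not the full UNL specification.

```python
import re
from dataclasses import dataclass, field


@dataclass
class UniversalWord:
    headword: str                                     # e.g. "smoke", "I", "!"
    restriction: str = ""                             # e.g. "icl>do"
    attributes: list = field(default_factory=list)    # e.g. ["entry", "may"]


@dataclass
class Relation:
    label: str               # e.g. "agt", "mod"
    source: UniversalWord
    target: UniversalWord


# headword, optional "(restriction)", then zero or more ".@attribute"
_UW = re.compile(
    r"^(?P<head>[^(.@]+)(?:\((?P<restr>[^)]*)\))?(?P<attrs>(?:\.@\w+)*)$"
)


def parse_uw(text):
    """Parse one universal word such as 'smoke(icl>do).@entry.@may'."""
    match = _UW.match(text.strip())
    if match is None:
        raise ValueError(f"not a universal word: {text!r}")
    attrs = [a for a in match.group("attrs").split(".@") if a]
    return UniversalWord(match.group("head"), match.group("restr") or "", attrs)


def parse_relation(text):
    """Parse 'label(uw1, uw2)', splitting at the comma outside parentheses."""
    text = text.strip()
    open_idx = text.index("(")
    if not text.endswith(")"):
        raise ValueError(f"unbalanced relation: {text!r}")
    label, body = text[:open_idx], text[open_idx + 1:-1]
    depth = 0
    for i, ch in enumerate(body):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 0:
            return Relation(label, parse_uw(body[:i]), parse_uw(body[i + 1:]))
    raise ValueError(f"no top-level comma in: {text!r}")


rel = parse_relation("agt(smoke(icl>do).@entry.@present.@may.@interrogative, I)")
print(rel.label, rel.source.headword, rel.source.attributes, rel.target.headword)
```

Tracking parenthesis depth, rather than splitting on the first comma, keeps the sketch correct if a restriction ever contains a comma.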

Speech to Speech Machine Translation (SSMT)

1. Speaker recognition

2. Gestures, facial movement and speech recognition

3. Transcription and text transfer (UNL)

4. Target language generation (Ariane-G5)

5. Voice, speech, gestures synthesis

[Furui, 03; Blanchon, 02]

Emotion representation (1)

Classification of emotions :

(1) happiness, (2) sadness,

(3) disgust, (4) surprise,

(5) fear, (6) anger,

(7) irritation, (8) hesitation,

(9) uncertainty, (10) neutral

[Morita, 89; Ekman, 79, 03; OOC, 90; ESPIRE, 00]


Emotion eliciting factors and task facets in SSMT:

• lexicon (sad, happy, etc.)
• phatics (ah, hein, etc.)
• prosodies (fast, slow, strong, etc.)
• voice (noisy, soft, young, etc.)
• gestures (movements of hands, mouth, eyes, etc.)


1 2 3 4 5 6 7 8 9 10
lexicon * * * * * * * * * *
phatics * * * * * * * * * *
prosodies * * * * * * * * *
voice * * * * * * * *
hands * *
mouth *
eyes * * * * * *
eyebrows * * * *
head *
shoulders * * *


Speaker recognition and voice synthesis :

• gender,

• age,

• variant (natural, artificial, etc.),

• voice name (high-pitched, husky, etc.)

[BMC, 02; W3C rec., 02]


Prosody :
• Pitch : x-high, high, medium, low, x-low, default
• Range
• Rate : x-fast, fast, medium, slow, x-slow, default
• Duration
• Volume : silent, x-soft, soft, medium, loud, x-loud, default
• Emphasis
• Break
[BMC, 02; W3C rec., 02]
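The labelled values for pitch, rate and volume listed above can be checked before they are written out as markup. The Python sketch below assumes that these labels are meant to be used as attribute values on a `prosody` element, in the style of the W3C speech synthesis markup the slide cites. The helper name `prosody` and the choice to emit all three attributes are illustrative assumptions, not part of the proposal.

```python
import xml.etree.ElementTree as ET

# Label sets copied from the prosody list above.
PITCH = {"x-high", "high", "medium", "low", "x-low", "default"}
RATE = {"x-fast", "fast", "medium", "slow", "x-slow", "default"}
VOLUME = {"silent", "x-soft", "soft", "medium", "loud", "x-loud", "default"}


def prosody(text, pitch="default", rate="default", volume="default"):
    """Build a <prosody> element after validating each label (hypothetical helper)."""
    checks = (("pitch", pitch, PITCH), ("rate", rate, RATE), ("volume", volume, VOLUME))
    for name, value, allowed in checks:
        if value not in allowed:
            raise ValueError(f"{name}={value!r} is not one of {sorted(allowed)}")
    element = ET.Element("prosody", {"pitch": pitch, "rate": rate, "volume": volume})
    element.text = text
    return element


markup = ET.tostring(prosody("No!", pitch="high", volume="loud"), encoding="unicode")
print(markup)
# <prosody pitch="high" rate="default" volume="loud">No!</prosody>
```

Rejecting unknown labels early keeps a typo such as `xhigh` from reaching the synthesizer as a silently ignored attribute.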


Lexicon and Speech acts : Inform, Offer, Offer-follow-up, Promise, Yn-question, Action-request, Confirmation-question, Do-you-understand-question, Permission-request, Wh-question, Yes, No, Acknowledge, Thanks, Thanks-response, Farewell, Good-wishes, Good-wishes-response, Greet, Apology, Apology-response, Alert, Instruct, Confirmation-question-to-self, Invite, Vocative, Topic, Expressive

[Tomokiyo, 00]


Facial movements : left, right, up, down
• mouth
• eyes
• eyebrows

Body movements : left, right, up, down
• hands
• shoulders
• head
[ACE, 02; BMC, 02; MPEG-4, 00]


<?xml version="1.0" encoding="iso-8859-1"?>





<!DOCTYPE D (View Source for full doctype...)>

<D dn=" TV " on="TV. 1.2" dt="2003">

<Paragraph number="1">

<Sentence snumber="1">

<org lang="el"> May I smoke?</org>

<unlsem>

agt(smoke(icl>do).@entry.@present.@may.@interrogative, I)

</unlsem>

<speech-act> type="Yn-question" may I smoke ? </speech-act>

<prosody>may I <emphasis> smoke</emphasis> ? </prosody>

</Sentence >

</Paragraph>

</D>


<?xml version="1.0" encoding="iso-8859-1"?>





<!DOCTYPE D (View Source for full doctype...)>

<D dn=" TV " on="TV. 1.2" dt="2003">

<Paragraph number="1">

<Sentence snumber="2">

<org lang="el"> No!, you may not, Victor.</org>

<unlsem>

aoj(smoke(icl>do).@entry.@present.@may.@not, you)

mod(smoke(icl>do).@entry.@present.@may.@not, no)

mod(no, !(icl>symbol))

mod(smoke(icl>do).@entry.@present.@may.@not, Victor(icl>name).@vocative)

</unlsem>

<speech-act> type="Expressive" No! type="Inform" you may not, type="Vocative" Victor </speech-act>

<prosody> <emphasis> No!</emphasis> you may <emphasis> not </emphasis> Victor </prosody>

<emotion> type="surprise" lexicon="No!" eyebrows="left and right raised" No! you may not</emotion>

</Sentence >

</Paragraph>

</D>
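Documents shaped like the two samples above can be read back with a standard XML parser. The Python sketch below uses a trimmed copy of the first sample, leaving out the XML prolog and the elided DOCTYPE line, and collects the original text, the UNL relations and the emphasised words for each sentence. The function name `read_sentences` and the dictionary keys are hypothetical; the element and attribute names are taken from the samples.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the first sample; prolog and DOCTYPE are left out on purpose.
SAMPLE = """<D dn="TV" on="TV. 1.2" dt="2003">
  <Paragraph number="1">
    <Sentence snumber="1">
      <org lang="el">May I smoke?</org>
      <unlsem>
        agt(smoke(icl>do).@entry.@present.@may.@interrogative, I)
      </unlsem>
      <prosody>may I <emphasis>smoke</emphasis> ?</prosody>
    </Sentence>
  </Paragraph>
</D>"""


def read_sentences(xml_text):
    """Return one dict per <Sentence> with its text, relations and emphasis."""
    root = ET.fromstring(xml_text)
    sentences = []
    for sent in root.iter("Sentence"):
        unlsem = sent.findtext("unlsem", default="")
        # One UNL relation per non-empty line inside <unlsem>.
        relations = [line.strip() for line in unlsem.splitlines() if line.strip()]
        emphasised = [e.text.strip() for e in sent.iter("emphasis") if e.text]
        sentences.append({
            "snumber": sent.get("snumber"),
            "org": (sent.findtext("org") or "").strip(),
            "relations": relations,
            "emphasis": emphasised,
        })
    return sentences


for entry in read_sentences(SAMPLE):
    print(entry["snumber"], entry["org"], entry["relations"], entry["emphasis"])
```

The `>` inside `icl>do` does not need escaping in XML character data, so the relation can be stored in `unlsem` as written.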

Reflections and next steps

• Extension of UNL - from written text processing to multimodal and multilingual SSMT, focussing on emotion representation

• Visual corpus development

• Development of a prototype with speech and image interface
