http://fvalente.zxq.net/presentations/socrole_presentation.pdf
Language-Independent Socio-Emotional Role Recognition in the AMI Meetings Corpus
Fabio Valente1 and Alessandro Vinciarelli2
[email protected],[email protected]
1 - Idiap Research Institute (Switzerland)
2 - University of Glasgow (UK)
Interspeech 2011
Conversation Analysis
• Conversation analysis and role recognition have been active research
fields for a long time [Sachs74].
• Automatic role recognition based on statistical classifiers has been studied on
the CMU corpus [Banerjee06], the AMI corpus [Vinciarelli05], the ICSI corpus
[Lakowski04] and various broadcast conversations [Sibel09].
• Typical features consist of turn-taking patterns, turn durations, overlaps
between participants, stylistic and prosodic features as well as lexical
features.
• Applications in summarization, indexing and analysis.
• The roles considered in those studies are mainly formal roles constant over
the entire duration of the conversation, e.g., the Project Manager during a
professional meeting, the professor during a faculty meeting or the
moderator during a broadcast talk show.
Conversation Analysis
• Formal roles do not generalize to every type of conversation.
• Formal roles are not directly related to the type of conversation or to the
phenomena that occur in it.
• Several phenomena have been studied in (meetings) conversations like
hotspots and engagement [Shriberg03], dominance [Japygoci08],
agreement/disagreement [Hillard03].
• Socio-Emotional roles [Pianesi 2006] are a general coding scheme for small
group conversations and are more related to the type of meeting and its
dynamics.
• They are inspired by Bales' IPA [Bales76] and characterize the relationships between
group members and their roles “oriented toward the functioning of the
group as a group” [Pianesi 2006].
• Initial investigation using the same language-independent features used for
formal role recognition.
Outline
• Socio-Emotional Roles definition and previous works.
• AMI corpus annotations
• Feature extraction
• Statistical Modeling
1. basic generative model
2. modeling influence
3. jointly modeling formal and social roles
• Results and discussion
Socio-Emotional Roles
• [PROTAGONIST] - A speaker that takes the floor, drives
the conversation, asserts authority and assumes a personal
perspective.
• [SUPPORTER] - A speaker that shows a cooperative
attitude, demonstrating attention and acceptance and providing
technical and relational support.
• [NEUTRAL] - A speaker that passively accepts others' ideas
and serves as audience.
• [GATEKEEPER] - A speaker that acts as group
moderator, mediating and encouraging communication.
• [ATTACKER] - A speaker that deflates the status of others,
expresses disapproval and attacks other speakers.
Socio-Emotional Roles
• 1. A participant holds exactly one of these roles at any given time instant.
• 2. Multiple participants can hold the same role at a given time instant.
• These roles can be related to a number of phenomena studied in meetings, like
engagement, dominance and hot-spots.
• Automatic recognition of social roles studied in the Mission Survival Corpus
2 (CHIL project).
• Activity features (audio and video), i.e., non-linguistic features, were used to
train statistical classifiers like SVM, HMM or coupled HMMs.
Dataset and Annotations
• The AMI Meeting Corpus is a collection of meetings captured in specially
instrumented meeting rooms, which record the audio and video for each meeting
participant.
• In the scenario meetings, four participants play the roles of a design team
(Project Manager (PM), Marketing Expert (ME), User Interface Designer (UI) and
Industrial Designer (ID)) tasked with designing a new remote control.
• The meeting is supervised by the Project Manager (PM).
Dataset and Annotations
• The annotation guidelines are the same as for the Mission Survival Corpus [Pianesi 2006] and include
a number of physical-behaviour and inferential questions.
• Five scenario meetings; annotators were provided with audio and video.
• Given a set of participants {S} and the role set {R} = {P, S, N, G, A}, the
speaker-to-role mapping ϕt(S) → R is available.
• Roles are post-processed such that the role becomes the most frequent role that the
speaker has in a one-minute long window centered around time t.
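The post-processing step above can be sketched as a majority vote inside a sliding window. This is only an illustration: it assumes the annotations of one speaker are available as evenly spaced (time, role) samples, and the function name `smooth_roles`, the 10-second sampling and the tie-breaking rule are hypothetical choices, not details given in the slides.

```python
from collections import Counter


def smooth_roles(labels, window=60.0):
    """Replace each label with the most frequent role in a window centered on it.

    `labels` is a list of (time_in_seconds, role) pairs for one speaker.
    Ties go to the role encountered first in the window (arbitrary choice).
    """
    half = window / 2.0
    smoothed = []
    for t, _ in labels:
        # Collect every annotated role whose time stamp falls inside the window.
        in_window = [role for (u, role) in labels if abs(u - t) <= half]
        most_common_role = Counter(in_window).most_common(1)[0][0]
        smoothed.append((t, most_common_role))
    return smoothed


# Hypothetical annotations for one speaker, one sample every 10 seconds.
raw = [(0, "N"), (10, "N"), (20, "P"), (30, "N"), (40, "N"), (50, "S"), (60, "N")]
smoothed = smooth_roles(raw)
```

In this toy sequence the isolated P and S labels are absorbed by the surrounding N labels, which is the intended effect of the one-minute window.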
Dataset and Annotations
• Resulting role distribution (percentage of total speaking time) of the five meetings:
[Figure: "Role Distribution", bar chart over Protagonist, Supporter, Neutral, Gatekeeper and Attacker; vertical axis from 0 to 40.]
• Most of the time is attributed to the Protagonist/Supporter/Neutral roles and only
5% of the time is attributed to the Gatekeeper.
• No speaker is labeled as Attacker because of the collaborative nature of the
professional meeting.
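A distribution of this kind (share of total speaking time per role) could be computed as in the sketch below. The function name `role_distribution` and the toy durations are hypothetical; the numbers are invented for illustration and are not taken from the AMI annotations.

```python
from collections import defaultdict


def role_distribution(turns):
    """Percentage of total speaking time attributed to each social role.

    `turns` is a list of (duration_in_seconds, social_role) pairs.
    """
    time_per_role = defaultdict(float)
    for duration, role in turns:
        time_per_role[role] += duration
    total = sum(time_per_role.values())
    return {role: 100.0 * t / total for role, t in time_per_role.items()}


# Invented durations, only to show the computation.
toy_turns = [(40.0, "P"), (25.0, "S"), (25.0, "N"), (10.0, "G")]
distribution = role_distribution(toy_turns)
```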
[Figure: "Social Role distribution conditioned to Formal roles", bar chart over Protagonist, Supporter, Neutral and Gatekeeper; legend PM, ID, UI, ME; vertical axis from 0 to 0.8.]
• The Gatekeeper role, i.e., the moderator of the discussion, is consistently taken by
the Project Manager.
Feature Extraction
• A meeting is a sequence of speaker turns (simplified turn definition [Shriberg01]).
To further simplify the problem, the time in overlapping regions is assigned to the floor
holder.
• The prosodic feature vector Xn of each turn contains F0 (mean, standard deviation,
minimum, maximum and median over the turn), energy (mean and standard deviation
over the turn) and the mean speech rate over the turn.
• Meeting M = {(t1, d1, X1, s1, r1, f1), ..., (tN, dN, XN, sN, rN, fN)} where:
tn is the beginning time of the n-th turn.
dn is its duration.
Xn is the prosodic feature vector.
sn is the speaker associated with the turn.
rn is the social role associated with the turn.
fn is the formal role.
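The turn representation above maps naturally onto a small record type. The sketch below is illustrative only: the class name `Turn`, the helper `prosodic_vector`, the per-frame inputs and the use of the population standard deviation are assumptions, not details stated in the slides.

```python
from dataclasses import dataclass
from statistics import mean, median, pstdev
from typing import List


@dataclass
class Turn:
    start: float          # t_n, beginning time of the turn in seconds
    duration: float       # d_n, duration in seconds
    prosody: List[float]  # X_n, prosodic feature vector
    speaker: str          # s_n
    social_role: str      # r_n, one of P, S, N, G, A
    formal_role: str      # f_n, one of PM, ME, UI, ID


def prosodic_vector(f0, energy, speech_rate):
    """Build X_n from the per-frame F0 and energy values of one turn."""
    return [
        mean(f0), pstdev(f0), min(f0), max(f0), median(f0),  # F0 statistics
        mean(energy), pstdev(energy),                        # energy statistics
        speech_rate,                                         # mean speech rate
    ]


# Hypothetical frame-level measurements for a single turn.
x = prosodic_vector(f0=[110.0, 120.0, 130.0], energy=[0.5, 0.7], speech_rate=4.0)
turn = Turn(start=12.3, duration=2.5, prosody=x,
            speaker="A", social_role="P", formal_role="PM")
```

A meeting M is then simply a list of `Turn` objects ordered by start time.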
Statistical modeling
• Simple generative conversation model as a first-order Markov chain:
$$p(M) = \prod_{n=1}^{N} P(X_n \mid r_n)\, P(d_n \mid r_n)\, P(r_n \mid r_{n-1})$$
• $P(r_n \mid r_{n-1})$ represents the turn-taking patterns (bigram LM).
• $P(X_n \mid r_n)$ represents the prosodic feature distribution (GMM).
• $P(d_n \mid r_n)$ represents the turn duration (Gamma distribution).
• Scaling factors are introduced to bring the distributions to comparable
ranges.
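As a rough sketch, the model above can be evaluated in the log domain. Several points here are assumptions rather than details from the slides: the scaling factors are applied as exponents on the prosody and duration terms (weights `w_x` and `w_d`), the first turn uses a separate initial distribution `initial` because r_0 is not defined, and the GMM has diagonal covariances. All function names are hypothetical.

```python
import math


def log_gamma_pdf(d, shape, scale):
    """Log density of a Gamma(shape, scale) distribution at duration d > 0."""
    return ((shape - 1.0) * math.log(d) - d / scale
            - math.lgamma(shape) - shape * math.log(scale))


def log_gmm_pdf(x, components):
    """Log density of a diagonal-covariance GMM.

    `components` is a list of (weight, means, variances) tuples.
    """
    logs = []
    for weight, means, variances in components:
        ll = math.log(weight)
        for xi, mu, var in zip(x, means, variances):
            ll += -0.5 * (math.log(2.0 * math.pi * var) + (xi - mu) ** 2 / var)
        logs.append(ll)
    top = max(logs)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(l - top) for l in logs))


def log_p_meeting(turns, bigram, gmms, gammas, initial, w_x=1.0, w_d=1.0):
    """log p(M) for a list of (X_n, d_n, r_n) turns.

    w_x and w_d are the (assumed) scaling exponents; `initial` is P(r_1).
    """
    total, prev = 0.0, None
    for x, d, r in turns:
        trans = initial[r] if prev is None else bigram[(prev, r)]
        total += math.log(trans)                          # turn-taking term
        total += w_x * log_gmm_pdf(x, gmms[r])            # prosody term
        total += w_d * log_gamma_pdf(d, *gammas[r])       # duration term
        prev = r
    return total


# Toy parameters: two roles, standard-normal prosody, exponential durations.
std_gmm = [(1.0, [0.0], [1.0])]
toy_gmms = {"P": std_gmm, "N": std_gmm}
toy_gammas = {"P": (1.0, 1.0), "N": (1.0, 1.0)}
toy_initial = {"P": 0.5, "N": 0.5}
toy_bigram = {(a, b): 0.5 for a in "PN" for b in "PN"}
toy_turns = [([0.0], 2.0, "P"), ([0.0], 1.0, "N")]
log_p = log_p_meeting(toy_turns, toy_bigram, toy_gmms, toy_gammas, toy_initial)
```

Working with log-probabilities avoids underflow on long meetings and turns the scaling factors into simple multiplicative weights.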
Experimental Setup and Evaluation
• Leave-one-out approach on the five annotated meetings.
• The social role of each speaker is assumed constant over the one-minute
long window both during training and testing.
• The center of the window is then progressively shifted by 20 seconds and the
procedure is repeated till the end of the meeting.
• All possible speaker-to-role mappings ϕ∗t(S) → R are searched and the one
that maximizes the model probability p(M) is selected.
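The exhaustive search over speaker-to-role mappings and the shifting window can be sketched as follows. The names `best_mapping` and `window_centers`, the scoring callback and the handling of the last window are hypothetical; the slides only state that all mappings are searched and that the window center moves in 20-second steps.

```python
from itertools import product

ROLES = ["P", "S", "N", "G", "A"]


def best_mapping(speakers, score):
    """Return the speaker-to-role mapping with the highest score.

    `score` takes a dict {speaker: role} and returns log p(M) for the
    window under that assignment. Several speakers may share a role, so
    all len(ROLES) ** len(speakers) assignments are enumerated.
    """
    best, best_score = None, float("-inf")
    for roles in product(ROLES, repeat=len(speakers)):
        mapping = dict(zip(speakers, roles))
        s = score(mapping)
        if s > best_score:
            best, best_score = mapping, s
    return best, best_score


def window_centers(meeting_length, window=60.0, shift=20.0):
    """Centers of one-minute windows, shifted by 20 seconds each step.

    Assumption: a window is kept only if it fits inside the meeting.
    """
    centers = []
    c = window / 2.0
    while c + window / 2.0 <= meeting_length:
        centers.append(c)
        c += shift
    return centers


# Toy scoring function standing in for the model likelihood.
toy_pref = {"a": "G", "b": "P", "c": "N", "d": "S"}


def toy_score(mapping):
    return sum(1.0 for spk, role in mapping.items() if toy_pref[spk] == role)


mapping, value = best_mapping(["a", "b", "c", "d"], toy_score)
centers = window_centers(100.0)
```

With four speakers and five roles there are only 5^4 = 625 candidate mappings per window, so brute-force enumeration is cheap.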
            Random   Turns (Unigram)   Turns (Bigram)   Duration   Prosody
Accuracy    0.26     0.35              0.49             0.43       0.41

            Total   Protagonist   Supporter   Neutral   Gatekeeper
Model 1     0.59    0.61          0.62        0.68      0
Modeling Influence
• Social roles are indicative of group behaviors: the influence that a speaker
has on others has been pointed out as a central effect in determining those roles
[Dong08] in the MSC corpus.
• The influence is verified not only on the speech activity but also on the
prosodic behavior, body movement and focus of attention.
            Total   Protagonist   Supporter   Neutral   Gatekeeper
Model 1     0.59    0.61          0.62        0.68      0
Influence   0.65    0.70          0.63        0.79      0
Formal and Social Roles
• Even though Gatekeeper is a rare role, it is consistently taken by the PM in the
AMI meetings.
• This information can be modeled by simply computing the probabilities
p(rn|rn−1, fn).
• The formal role fn of the speaker taking turn n is assumed known and
constant over the entire meeting.
$$p(M) = \prod_{n=1}^{N} P(d_n \mid r_n^{r_{n-1}})\, P(X_n \mid r_n^{r_{n-1}})\, P(r_n \mid r_{n-1}, f_n)$$
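The probabilities p(rn|rn−1, fn) could be estimated by counting transitions in the training meetings, as in the sketch below. The function name `estimate_transitions` and the add-alpha smoothing are assumptions added so that unseen transitions keep a non-zero probability; the slides do not say how the estimates are smoothed.

```python
from collections import Counter, defaultdict


def estimate_transitions(sequences, roles, alpha=1.0):
    """Estimate P(r_n | r_{n-1}, f_n) from labeled turn sequences.

    Each sequence is a list of (social_role, formal_role) pairs in turn
    order. `alpha` is an add-alpha smoothing constant (an assumption).
    """
    counts = defaultdict(Counter)
    for seq in sequences:
        # Pair every turn with its predecessor; the context is the previous
        # social role together with the formal role of the current speaker.
        for (prev_r, _), (r, f) in zip(seq, seq[1:]):
            counts[(prev_r, f)][r] += 1
    probs = {}
    for context, c in counts.items():
        total = sum(c.values()) + alpha * len(roles)
        probs[context] = {r: (c[r] + alpha) / total for r in roles}
    return probs


# Invented toy sequence: the PM takes the Gatekeeper role after a Protagonist.
toy_seq = [("P", "ME"), ("G", "PM"), ("P", "ME"), ("G", "PM"), ("N", "UI")]
probs = estimate_transitions([toy_seq], roles=["P", "S", "N", "G"])
```

In the toy data the context (previous role P, current speaker PM) is followed by G twice, so G receives the largest share of that context's probability mass.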
                   Total   Protagonist   Supporter   Neutral   Gatekeeper
Influence          0.65    0.70          0.63        0.79      0
Influence+Formal   0.68    0.72          0.65        0.80      0.15
Discussion and Conclusions
• Social roles characterize the relationships between group members and
can be related to several phenomena studied in conversations, e.g.,
engagement, hot-spots and dominance.
• They are universal and thus could generalize to any type of discussion.
• Turn-taking patterns, turn duration and prosodic features
recognize social roles with an accuracy of 59%.
• When influence is introduced, the accuracy rises to 65%.
• Integrating the formal role information into the conversation model increases
the recognition rate to 68% and permits the recognition of Gatekeeper
instances.
• In the future: more data, more features (possibly lexical) and more types of
conversations...
Thank You