socrole_presentation

Language-Independent Socio-Emotional Role Recognition in the AMI Meetings Corpus

Fabio Valente1 and Alessandro Vinciarelli2


1 - Idiap Research Institute (Switzerland)

2 - University of Glasgow (UK)

Interspeech 2011

Conversation Analysis

• Conversation analysis and role recognition have been active research fields for a long time [Sachs74].

• Automatic role recognition based on statistical classifiers has been studied on the CMU corpus [Banerjee06], the AMI corpus [Vinciarelli05], the ICSI corpus [Lakowski04] and various broadcast conversations [Sibel09].

• Typical features consist of turn-taking patterns, turn durations, overlaps between participants, stylistic and prosodic features, as well as lexical features.

• Applications include summarization, indexing and analysis.

• The roles considered in those studies are mainly formal roles constant over

the entire duration of the conversation, e.g., the Project Manager during a

professional meeting, the professor during a faculty meeting or the

moderator during a broadcast talk show.

Conversation Analysis

• Formal roles do not generalize to any type of conversation.

• Formal roles are not directly related to the type of conversation or to the phenomena occurring in it.

• Several phenomena have been studied in (meeting) conversations, such as hot-spots and engagement [Shriberg03], dominance [Japygoci08] and agreement/disagreement [Hillard03].

• Socio-Emotional roles [Pianesi 2006] are a general coding scheme for small

group conversations and are more related to the type of meeting and its

dynamics.

• They are inspired by Bales' IPA [Bales76] and characterize the relationships between group members and their roles “oriented toward the functioning of the group as a group” [Pianesi 2006].

• Initial investigation using the same language-independent features used for

formal role recognition.

Outline

• Socio-Emotional Roles: definition and previous work.

• AMI corpus annotations

• Feature extraction

• Statistical Modeling

1. basic generative model

2. modeling influence

3. jointly modeling formal and social roles

• Results and discussion

Socio-Emotional Roles

• [PROTAGONIST] - A speaker that takes the floor, drives the conversation, asserts authority and assumes a personal perspective.

• [SUPPORTER] - A speaker that shows a cooperative attitude, demonstrating attention and acceptance and providing technical and relational support.

• [NEUTRAL] - A speaker that passively accepts others' ideas and serves as audience.

• [GATEKEEPER] - A speaker that acts as group moderator, mediates and encourages the communication.

• [ATTACKER] - A speaker who deflates the status of others, expresses disapproval and attacks other speakers.

Socio-Emotional Roles

• 1. A participant has exactly one of these roles at any given time instant.

• 2. Multiple participants can have the same role at a given time instant.

• They can be related to a number of phenomena studied in meetings, such as engagement, dominance and hot-spots.

• Automatic recognition of social roles has been studied on the Mission Survival Corpus 2 (CHIL project).

• Activity features (audio and video), i.e., non-linguistic features, were used to

train statistical classifiers like SVM, HMM or coupled HMMs.

Dataset and Annotations

• The AMI Meeting Corpus is a collection of meetings captured in specially

instrumented meeting rooms, which record the audio and video for each meeting

participant.

• In the scenario meetings, four participants play the roles of a design team: Project Manager (PM), Marketing Expert (ME), User Interface Designer (UI) and Industrial Designer (ID), tasked with designing a new remote control.

• The meeting is supervised by the Project Manager (PM)

Dataset and Annotations

• Annotation guidelines are the same as for the Mission Survival Corpus [Pianesi 2006] and include a number of physical-behaviour and inferential questions.

• Five scenario meetings were annotated; annotators were provided with audio and video.

• Given a set of participants {S} and the role set R = {P, S, N, G, A}, the speaker-to-role mapping ϕt(S) → R is available.

• Roles are post-processed such that the role of a speaker at time t becomes the most frequent role that the speaker has in a one-minute-long window centered around time t.
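The post-processing step above amounts to a sliding majority vote. The following is a minimal sketch, not the authors' code: it assumes the annotation of one speaker is available as a list with one role label per second (the sampling step and the name `smooth_roles` are hypothetical), truncates the window at the meeting boundaries, and resolves ties in favour of the label seen first in the window.

```python
from collections import Counter


def smooth_roles(labels, window=60):
    """Replace each label by the most frequent label in a centered window.

    labels: role labels of one speaker, one per second (assumed sampling).
    window: window length in samples (60 = one minute at one label/second).
    """
    half = window // 2
    smoothed = []
    for t in range(len(labels)):
        # The window is truncated at the start and end of the meeting.
        segment = labels[max(0, t - half): t + half + 1]
        # On ties, most_common keeps the label encountered first.
        smoothed.append(Counter(segment).most_common(1)[0][0])
    return smoothed
```

With this sketch, an isolated label surrounded by a different role is absorbed into the surrounding role, which is the effect the one-minute window is meant to have.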

Dataset and Annotations

• Resulting role distribution (percentage of total speaking time) of the five meetings:

[Figure: Role Distribution. Bar chart over Protagonist, Supporter, Neutral, Gatekeeper and Attacker; vertical axis from 0 to 40 (% of total speaking time).]

• Most of the time is attributed to the Protagonist/Supporter/Neutral roles and only

5% of the time is attributed to the Gatekeeper.

• No speaker is labeled as Attacker because of the collaborative nature of the

professional meeting.

[Figure: Social role distribution conditioned on formal roles. Bars for Protagonist, Supporter, Neutral and Gatekeeper, one per formal role (PM, ID, UI, ME); vertical axis from 0 to 0.8.]

• The Gatekeeper role, i.e., the moderator of the discussion, is consistently taken by the Project Manager.

Feature Extraction

• A meeting is a sequence of speaker turns (simplified turn definition [Shriberg01]).

To further simplify the problem, the time in overlapping regions is given to the floor

holder.

• Prosodic features (Xn) are extracted for each turn: F0 (mean, standard deviation, minimum, maximum and median), energy (mean and standard deviation) and mean speech rate.

• Meeting M = {(t1, d1, X1, s1, r1, f1), ..., (tN, dN, XN, sN, rN, fN)} where:

tn is the beginning time of the n-th turn.

dn is its duration.

Xn is the prosodic feature vector.

sn is the speaker associated with the turn.

rn is the social role associated with the turn.

fn is the formal role.
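The turn tuple and the prosodic vector described above can be written down as a small data structure. This is a minimal sketch under stated assumptions: the names `Turn` and `prosodic_vector` are hypothetical, F0 and energy are assumed to be given as per-frame values for one turn, and the population standard deviation is used (the slides do not say which estimator was used).

```python
from dataclasses import dataclass
from statistics import mean, median, pstdev
from typing import List


@dataclass
class Turn:
    start: float          # t_n, beginning time of the turn (seconds)
    duration: float       # d_n, duration of the turn (seconds)
    prosody: List[float]  # X_n, prosodic feature vector
    speaker: str          # s_n, speaker associated with the turn
    social_role: str      # r_n, social role associated with the turn
    formal_role: str      # f_n, formal role


def prosodic_vector(f0, energy, speech_rate):
    """Build the 8-dimensional X_n from per-frame values of one turn."""
    return [
        mean(f0), pstdev(f0), min(f0), max(f0), median(f0),  # F0 statistics
        mean(energy), pstdev(energy),                         # energy statistics
        speech_rate,                                          # mean speech rate
    ]
```

A turn is then built as, for example, `Turn(12.3, 4.0, prosodic_vector(f0, energy, rate), "A", "P", "PM")`, where the values are purely illustrative.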

Statistical modeling

• Simple generative conversation model as First-order Markov Chain:

\[ p(M) = \prod_{n=1}^{N} P(X_n \mid r_n)\, P(d_n \mid r_n)\, P(r_n \mid r_{n-1}) \]

• P (rn|rn−1) represents the turn-taking patterns (Bigram LM).

• P (Xn|rn) represents the prosodic feature distribution (GMM).

• P (dn|rn) represents the turn duration (Gamma distribution).

• Scaling factors are introduced to bring the distributions to comparable

ranges.
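The three factors of the model can be combined in the log domain. The sketch below is an illustration rather than the authors' implementation: it assumes diagonal-covariance GMMs, a Gamma distribution parameterized by shape and scale, scaling factors applied as weights on the log-probabilities, and a `None` start symbol for the role preceding the first turn. The names `log_p_meeting`, `model` and `scales` are hypothetical.

```python
import math


def log_gamma_pdf(d, shape, scale):
    """Log-density of a Gamma(shape, scale) distribution at duration d > 0."""
    return ((shape - 1) * math.log(d) - d / scale
            - math.lgamma(shape) - shape * math.log(scale))


def log_gmm_pdf(x, components):
    """Log-density of a diagonal-covariance GMM.

    components: list of (weight, means, variances) tuples.
    """
    logs = []
    for weight, means, variances in components:
        ll = math.log(weight)
        for xi, m, v in zip(x, means, variances):
            ll += -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        logs.append(ll)
    top = max(logs)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(v - top) for v in logs))


def log_p_meeting(turns, model, scales=(1.0, 1.0, 1.0)):
    """log p(M) for a list of (X_n, d_n, r_n) tuples.

    model["gmm"][role]    -> GMM components for P(X_n | r_n)
    model["gamma"][role]  -> (shape, scale) for P(d_n | r_n)
    model["bigram"][(previous_role, role)] -> P(r_n | r_{n-1});
        the previous role of the first turn is None (assumed start symbol).
    scales: assumed log-domain weights for prosody, duration and bigram.
    """
    a, b, c = scales
    total, prev = 0.0, None
    for x, d, r in turns:
        total += a * log_gmm_pdf(x, model["gmm"][r])
        total += b * log_gamma_pdf(d, *model["gamma"][r])
        total += c * math.log(model["bigram"][(prev, r)])
        prev = r
    return total
```

Working with log-probabilities avoids numerical underflow over long turn sequences and makes the scaling factors simple multiplicative weights.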

Experimental Setup and Evaluation

• Leave-one-out approach on the five annotated meetings.

• The social role of each speaker is assumed constant over the one-minute

long window both during training and testing.

• The center of the window is then progressively shifted by 20 seconds and the procedure is repeated until the end of the meeting.

• All possible speaker-to-role mappings ϕ*t(S) → R are searched and the one that maximizes the model probability p(M) is selected.
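The search over mappings is small enough to be done exhaustively. The following is a minimal sketch: `best_mapping` and the `score` callback are hypothetical names, and `score` is assumed to return log p(M) for the current window under a given speaker-to-role assignment.

```python
from itertools import product


def best_mapping(speakers, roles, score):
    """Return the speaker-to-role mapping with the highest score.

    score: callable taking a dict {speaker: role} and returning the
           (log-)probability of the window under that mapping
           (hypothetical interface).
    """
    best, best_score = None, float("-inf")
    # Several speakers may share a role, so every combination is tried:
    # len(roles) ** len(speakers) candidates in total.
    for assignment in product(roles, repeat=len(speakers)):
        mapping = dict(zip(speakers, assignment))
        value = score(mapping)
        if value > best_score:
            best, best_score = mapping, value
    return best, best_score
```

With four participants and five roles this enumerates 5^4 = 625 candidate mappings per window, which is cheap enough to repeat for every 20-second shift.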

         | Random | Turns (Unigram) | Turns (Bigram) | Duration | Prosody
Accuracy | 0.26   | 0.35            | 0.49           | 0.43     | 0.41

        | Total | Protagonist | Supporter | Neutral | Gatekeeper
Model 1 | 0.59  | 0.61        | 0.62      | 0.68    | 0

Modeling Influence

• Social roles are indicative of group behaviors: the influence that a speaker has on others has been pointed out as a central effect in determining those roles [Dong08] in the MSC corpus.

• The influence is verified not only on the speech activity but also on the

prosodic behavior, body movement and focus of attention.


          | Total | Protagonist | Supporter | Neutral | Gatekeeper
Model 1   | 0.59  | 0.61        | 0.62      | 0.68    | 0
Influence | 0.65  | 0.70        | 0.63      | 0.79    | 0

Formal and Social Roles

• Even though Gatekeeper is a rare role, it is consistently taken by the PM in the AMI meetings.

• This information can be modeled simply by computing the probabilities p(rn | rn−1, fn).

• The formal role fn of the speaker taking turn n is assumed known and constant over the entire meeting.

\[ p(M) = \prod_{n=1}^{N} P(d_n \mid r_n^{r_{n-1}})\, P(X_n \mid r_n^{r_{n-1}})\, P(r_n \mid r_{n-1}, f_n) \]
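The formal-role-conditioned transition term P(r_n | r_{n-1}, f_n) can be estimated by counting over the training meetings. The sketch below is an assumption-laden illustration: the function name `estimate_transitions` is hypothetical, and the additive smoothing constant `alpha` is not taken from the slides, which do not say how the probabilities were smoothed.

```python
from collections import defaultdict


def estimate_transitions(sequences, roles, alpha=1.0):
    """Estimate P(r_n | r_{n-1}, f_n) by counting with additive smoothing.

    sequences: one list per meeting of (social_role, formal_role) per turn.
    roles: the social roles to distribute probability mass over.
    alpha: smoothing constant (an assumption, not given in the slides).
    """
    counts = defaultdict(lambda: defaultdict(float))
    for turns in sequences:
        # Pair each turn with its predecessor; the context is the previous
        # social role together with the formal role of the current speaker.
        for (prev_role, _), (role, formal) in zip(turns, turns[1:]):
            counts[(prev_role, formal)][role] += 1.0
    probs = {}
    for context, row in counts.items():
        total = sum(row.values()) + alpha * len(roles)
        probs[context] = {r: (row[r] + alpha) / total for r in roles}
    return probs
```

Conditioning on the formal role lets a rare role such as Gatekeeper receive most of its probability mass in the contexts where the current speaker is the PM.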


                 | Total | Protagonist | Supporter | Neutral | Gatekeeper
Influence        | 0.65  | 0.70        | 0.63      | 0.79    | 0
Influence+Formal | 0.68  | 0.72        | 0.65      | 0.80    | 0.15

Discussion and Conclusions

• Social roles characterize the relationships between group members and they

can be related to several phenomena studied in conversations, e.g.,

engagement, hot-spots and dominance.

• They are universal and thus could generalize to any type of discussion.

• The use of turn-taking patterns, turn duration and prosodic features can

recognize social roles with an accuracy of 59%.

• When influence is introduced, the accuracy becomes 65%.

• Integrating the formal role information into the conversation model increases the recognition rate to 68%, permitting the recognition of Gatekeeper instances.

• Future work: more data, more features (possibly lexical) and more types of conversations...

Thank You
