Kinect-taped communication: Using motion sensing to study gesture use and similarity in face-to-face and computer-mediated brainstorming


Hao-Chuan Wang, Chien-Tung Lai
National Tsing Hua University, Taiwan

How do media influence communication?

Computer-mediated communication (CMC) tools are prevalent, but are they all equal?
•  E.g., video vs. audio

Media properties influence aspects of communication differently:
•  Task performance, grounding, styles, similarity of language patterns, social processes and outcomes, etc.

[cf. Bos et al., 2002; Setlock et al., 2004; Scissors et al., 2008; Wang et al., 2009]

The (missing) non-verbal aspect in CMC research

Communication can be more than speaking. Both verbal and non-verbal channels are active during conversations.
•  Facial expression
•  Gesture

[cf. Goldin-Meadow, 1999; Giles & Coupland, 1991]

Studying gesture use in communication

Current methods:
•  Videotaping with manual coding.
•  Giving specific instructions to participants (e.g., to gesture or not).
•  Using confederates, etc.

Problems to solve:
•  High cost and labor-intensiveness.
•  Resolution of manual analysis: hard to recognize and reliably label small movements.
•  Scalability: hard to study arbitrary communication in the wild.

“Kinect-taping” method

Like videotaping, but with motion sensing devices, such as Microsoft Kinect, recording hand and body movements during conversations.
•  Detailed, easier-to-process representations.
•  Behavioral science instrument (“microscope”) to study non-verbal communication in ad hoc groups.
•  Low cost if automatic measures are satisfactory.

Re-appropriating motion sensors in HCI: Sensing-aided user research for future designs

From sensors as design elements to sensors as research instruments that help future designs.

Figure 1. A sample study setting that compares (a) F2F to (b) video-mediated communication by using Kinect as a behavioral science instrument.
(a) Face-to-face (F2F) communication
(b) Video-mediated communication

[cf. Mark et al., 2014]

A media comparison study

Investigate how people use gestures during face-to-face and computer-mediated brainstorming.

Compare three communication media:
•  Face-to-Face
•  Video
•  Audio


Hypotheses

H1. Visibility increases gesture use
Proportion of gesture: Face-to-Face > Video > Audio

H2. Visibility increases accommodation
Similarity between group members’ gestures: Face-to-Face > Video > Audio

Also explore how gesture use, level of understanding, and ideation productivity correlate.

[cf. Clark & Brennan, 1991]

[cf. Giles & Coupland, 1991]

Experimental design

36 individuals in 18 two-person groups

Kinect-taped group brainstorming sessions:
•  Three trials (15 min each) in counterbalanced order
•  Media conditions: Face-to-Face, Video, Audio

Data analysis:
•  Amount and similarity of gestures
•  Level of understanding
•  Productivity

How to quantify gestures?

How many gestures are there in a 15 min talk?

[Figure: movement trace labeled "moving" / "not moving". Panel captions: "Two unit motions with speed threshold 0" and "Three unit motions with speed threshold 2".]

Choose the thresholds

[Figure: candidate speed thresholds (m/s). Region labels: "Too few signals", "Almost everything", and "Data points of interest".]
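A minimal sketch of how a speed threshold could split a recording into unit motions, assuming a unit motion is a maximal run of consecutive frames whose speed exceeds the threshold. The function names and the sample trace are hypothetical, and the slides do not specify the actual segmentation rule or frame rate. The trace is chosen so that it reproduces the pattern in the figure: two unit motions at threshold 0 and three at threshold 2.

```python
from typing import List, Tuple


def segment_unit_motions(speeds: List[float], threshold: float) -> List[Tuple[int, int]]:
    """Return (start, end) frame index pairs, end exclusive, for each maximal
    run of frames whose speed is strictly above the threshold.
    Assumption: this run-based rule stands in for the authors' method."""
    segments = []
    start = None
    for i, speed in enumerate(speeds):
        moving = speed > threshold
        if moving and start is None:
            start = i                      # a new unit motion begins
        elif not moving and start is not None:
            segments.append((start, i))    # the current unit motion ends
            start = None
    if start is not None:                  # trace ended while still moving
        segments.append((start, len(speeds)))
    return segments


def gesture_proportion(speeds: List[float], threshold: float) -> float:
    """Percentage of frames that fall inside some unit motion."""
    if not speeds:
        return 0.0
    moving_frames = sum(end - start for start, end in segment_unit_motions(speeds, threshold))
    return 100.0 * moving_frames / len(speeds)


# Hypothetical per-frame hand speeds (illustrative values, not study data).
trace = [0.0, 3.0, 4.0, 1.0, 3.0, 0.0, 0.0, 5.0, 0.0]

print(segment_unit_motions(trace, 0))   # two unit motions
print(segment_unit_motions(trace, 2))   # three unit motions
print(gesture_proportion(trace, 0))
```

Raising the threshold splits the long first movement at its slow frame, which is why the count of unit motions depends on the threshold and why the threshold has to be chosen with care.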

How to measure similarity between unit motions?

Feature extraction and representation

Unit motions are represented as feature vectors:
•  Time length, path length, displacement, velocity, speed, angular movement, etc.
•  Features extracted for both hands and both elbows.

73 features extracted for each unit motion.

Similarity between unit motions: cosine of the angle between the two feature vectors.
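The cosine measure can be sketched as below. The three-element vectors are hypothetical stand-ins for the 73-dimensional feature vectors, and the zero-vector convention is an assumption; the slides do not say whether or how features were normalized before comparison.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors of equal length."""
    if len(u) != len(v):
        raise ValueError("feature vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # assumed convention: a zero vector is similar to nothing
    return dot / (norm_u * norm_v)


# Hypothetical unit-motion feature vectors (illustrative values only).
motion_a = [1.0, 2.0, 2.0]
motion_b = [2.0, 4.0, 4.0]   # same direction as motion_a, larger magnitude
motion_c = [2.0, -1.0, 0.0]  # orthogonal to motion_a

print(cosine_similarity(motion_a, motion_b))
print(cosine_similarity(motion_a, motion_c))
```

Because cosine ignores vector magnitude, two motions with proportional feature values score as maximally similar, so features with large numeric ranges can dominate unless they are rescaled.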

Validating the similarity metric

[Figure: validation procedure. Randomly select motion queries from the Kinect-taped motion database, retrieve similar and dissimilar motions, then compare the machine ranking (1, 2, 3) with the human ranking (1, 2, 3).]

Contingency analysis (counts)

                  Human Rank
Machine Rank      R1    R2    R3
R1                29     2     5
R2                 7    27     2
R3                 0     7    29

χ² = 107.97, p < .001
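The reported statistic can be checked from the contingency table of machine versus human ranks above. The slide does not say which chi-square statistic was used; in this sketch the likelihood-ratio form reproduces the reported 107.97, while Pearson's form gives about 103.83 on the same counts. The function names are hypothetical, and the p-value helper applies only to 4 degrees of freedom, which is what a 3 x 3 table has.

```python
import math
from typing import List

# Rows: machine rank R1..R3; columns: human rank R1..R3 (counts from the slide).
counts = [
    [29, 2, 5],
    [7, 27, 2],
    [0, 7, 29],
]


def expected_counts(table: List[List[int]]) -> List[List[float]]:
    """Expected cell counts under independence of rows and columns."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]


def pearson_chi2(table: List[List[int]]) -> float:
    """Pearson chi-square: sum of (observed - expected)^2 / expected."""
    expected = expected_counts(table)
    return sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))


def likelihood_ratio_chi2(table: List[List[int]]) -> float:
    """Likelihood-ratio chi-square: 2 * sum of observed * ln(observed / expected).
    Cells with an observed count of zero contribute nothing."""
    expected = expected_counts(table)
    return 2.0 * sum(o * math.log(o / e)
                     for obs_row, exp_row in zip(table, expected)
                     for o, e in zip(obs_row, exp_row) if o > 0)


def chi2_sf_df4(x: float) -> float:
    """Upper-tail probability of a chi-square variable with exactly 4 degrees
    of freedom, using the closed form exp(-x/2) * (1 + x/2)."""
    return math.exp(-x / 2.0) * (1.0 + x / 2.0)


g = likelihood_ratio_chi2(counts)
print(round(g, 2), round(pearson_chi2(counts), 2), chi2_sf_df4(g) < 0.001)
```

Most counts lie on the diagonal, meaning the machine ranking and the human ranking usually agree, which is what drives the large statistic.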

Key Results

H1: Amount of gesture use

H2: Similarity between group members

Associations:
•  Amount of gesture and understanding
•  Amount of gesture and ideation productivity
•  Gesture similarity and ideation productivity

Visibility on proportion of gesture use

[Figure: Proportion of Gesture Use (%) for Face-to-face, Video, and Audio; y-axis from 0 to 16.]

H1 not supported. Media did not influence the percentage of gesture use.
People gesture as much in Audio as in F2F and Video.

Association between self-gesture and level of understanding

[Figure: Model-Predicted Understanding and Model-Predicted Number of Ideas plotted against Proportion of Individual’s Own Gesture Use (%), by medium (Audio, F2F, Video).]

Non-communicative function of gesture.
•  Understanding correlates with self-gesture but not partner-gesture.
•  Stronger correlation with reduced or no visibility.

Similarity between group members

[Figure: Between-participant Gestural Similarity for Face-to-face, Video, and Audio; y-axis from 0.46 to 0.55.]

H2 supported. Similarity: F2F > Video > Audio.
People gesture more similarly when they can see each other.

Summary and implications

Motion sensing for studying non-verbal behaviors in CMC.

Media Comparison Study:
•  Visibility influences similarity but not amount of gesture.
•  Only self-gesture correlates with understanding.
•  Gesture doesn’t seem to convey much meaning to the partner. Seeing the partner is not crucial to understanding.

Kinect-taping Method:
•  Study communication of ad hoc groups in the wild.
•  Distributed deployment study of CMC tools.
•  Cross-lingual and cross-cultural communication.

Summary and implications (cont.)

•  The value of video may be relatively limited to the social and collaborative aspects (similarity, etc.).
•  Feedback that promotes self-gesturing may help understanding.

Acknowledgement

Microsoft Research Asia (UR FY13-RES-OPP-027)
Ministry of Science and Technology, Taiwan (NSC 102-2221-E-007-073-MY3)

Contact: Hao-Chuan Wang 王浩全 [email protected]