
Research Collection

Doctoral Thesis

Observing Teams with Wearable Sensors

Author(s): Feese, Sebastian F.

Publication Date: 2014

Permanent Link: https://doi.org/10.3929/ethz-a-010261327

Rights / License: In Copyright - Non-Commercial Use Permitted

This page was generated automatically upon download from the ETH Zurich Research Collection. For more information please consult the Terms of use.

ETH Library


Diss. ETH No. 21894

Observing Teams with Wearable Sensors

A dissertation submitted to

ETH Zurich

for the degree of Doctor of Sciences

presented by

Sebastian Franz Feese

Dipl.-Ing., TU Berlin

born May 9, 1983

citizen of Berlin, Germany

accepted on the recommendation of

Prof. Dr. Gerhard Tröster, examiner
Prof. Dr. Klaus Jonas, co-examiner

2014


Sebastian Feese
Observing Teams with Wearable Sensors
Diss. ETH No. 21894

First edition 2014
Published by ETH Zurich, Switzerland

Printed by Buch- und Offsetdruckerei H. Heenemann GmbH & Co. KG, Bessemerstrasse 83-91, 12103 Berlin

Copyright © 2014 by Sebastian Feese

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior permission of the author.


To my parents.


Acknowledgments

My time at the Wearable Computing Lab. of ETH Zurich has been a wonderful and rewarding experience. I am sincerely thankful to my academic supervisor Prof. Dr. Gerhard Tröster for giving me this opportunity, offering his support and guidance while allowing me freedom in my work. I would also like to thank Prof. Dr. Klaus Jonas for co-examining my PhD thesis.

In addition, I would particularly like to thank Dr. Bert Arnrich, who supported me during the first years of the work and with whom I had many fruitful discussions. Special thanks go to Dr. Michael Burtscher and Dr. Bertolt Meyer from the University of Zurich, with whom I had the pleasure to work together on our interdisciplinary SNF project. I would like to thank them for sharing their expertise in social psychology with me and for the many discussions on our joint experiments.

Also, I would like to thank all members of the Zurich fire brigade for their participation and engagement throughout the experiments. Especially, I thank Dr. Jan Bauke and Pascal Eichmann for their continuous support, advice and curiosity.

A big thank you goes to all members of the Wearable Computing Lab. In particular, I thank all those who contributed with joint research, valuable input, helpful discussions or in any other way: Alberto, Amir, Burcu, Christina, Christoph, Holger, Julia, Kilian, Martin K., Martin W., Mirco, Thomas H., Thomas K., Tobias, Ulf. I would also like to thank Ruth for her help with many administrative tasks and Fredy for his technical support.

Thank you also to all my friends outside ETH for many legendary moments of joy. I am deeply grateful to Andrea for sharing her time and positive energy with me, jumping from one event to the next.

Finally, I want to thank the most important people to me – my family. My parents, brother and sister have always been there, supporting me in every way possible. If it were not for their endless love, I would not have been able to follow my ideas.

Zurich, April 2014 Sebastian Feese


Contents

Abstract

Zusammenfassung

1. Introduction
   1.1 Teams and Team Performance
   1.2 Sensing Teamwork
   1.3 Related Work
   1.4 Objectives of the Thesis
   1.5 Thesis Outline and Paper List
   1.6 Additional Publications
   Bibliography

2. Thesis Summary
   2.1 Outline of Contributions
   2.2 Conducted Experimental Studies
   2.3 Quantifying Behavioral Mimicry between Team Members
   2.4 Measuring Verbal Communication of Teams in Mobile and Noisy Scenarios
   2.5 Capturing Spatial and Temporal Coordination in Teams
   2.6 Observing Teams in the Wild
   2.7 Conclusions
   2.8 Limitations
   2.9 Outlook
   Bibliography

3. Quantifying Behavioral Mimicry
   3.1 Introduction
   3.2 Experiment
   3.3 Individual Nonverbal Cues from Body Motion
   3.4 Interpersonal Cues from Body Motion - Behavioral Mimicry
   3.5 Results and Discussion
   3.6 Conclusion and Outlook
   Bibliography

4. Discriminating Leadership Style
   4.1 Introduction
   4.2 Prior and Related Work
   4.3 Experiment
   4.4 Methods
   4.5 Results and Discussion
   4.6 Conclusion and Outlook
   Bibliography

5. Noise Robust Speech Activity Detection
   5.1 Introduction
   5.2 Related Work
   5.3 Noise Robust Speech Detection
   5.4 Evaluation
   5.5 Speech Detection during Firefighting
   5.6 Conclusion
   Bibliography

6. Sensing Group Proximity Dynamics
   6.1 Introduction
   6.2 ANT-based Proximity
   6.3 Group Clustering
   6.4 Firefighting Experiment
   6.5 Evaluation
   6.6 Conclusion
   6.7 Acknowledgements
   Bibliography

7. Sensing of Team Coordination Indicators
   7.1 Introduction
   7.2 Related Work
   7.3 Sensing Spatial and Temporal Coordination
   7.4 Extraction of Team Coordination Indicators
   7.5 Evaluation of Team Coordination Indicators
   7.6 Discussion
   7.7 Acknowledgements
   Bibliography

8. Monitoring Firefighters in Real-world Missions
   8.1 Introduction
   8.2 Related Work
   8.3 Performance Indicators
   8.4 CoenoFire System
   8.5 Validation of Performance Metrics
   8.6 CoenoFire in the Wild
   8.7 Discussion and Conclusion
   8.8 Acknowledgements
   Bibliography

Glossary

Curriculum Vitae


Abstract

Teams and teamwork are essential in today's organizations. Difficult, complex and interdependent tasks in the workplace call for teams which deliberately combine individual skills and knowledge. To perform well as a team, members need to direct, align, monitor and support their work, requiring them to interact with each other.

In order to improve team performance, it is mandatory to observe how team members work and interact with one another. However, traditional observation approaches rely on self-report, which has low temporal resolution, and on manual behavior observation, which is labour intensive and thus costly.

It was the aim of this thesis to investigate a wearable computing approach to observe team behaviors automatically. First, actions and social behaviors of team members were captured with body-worn sensors such as accelerometers, barometers, microphones and wireless radios. Second, individual and interpersonal behavioral cues were extracted from the sensor data. Third, behavioral cues were aggregated on the team level to capture team behaviors.

We focused on the recognition of three behaviors: i) behavioral mimicry, referring to the alignment of gestures and postures between interaction partners; ii) the amount and timing of verbal communication, indicating information exchange; and iii) moving sub-groups and movement synchrony, capturing spatial and temporal aspects of team coordination.

Behavioral mimicry in group discussions was quantified by counting how frequently team members adopted the gestures and postures shown by other team members. Gestures and postures of the lower arms were recognized from movement data captured with body-worn inertial sensors. In a naturalistic meeting scenario, nodding was detected with 85 % subject-independent accuracy, face touch and flat arms with 99 %, and gesticulating, fidgeting and posture changes with 67 % across 30 individuals.

In order to unobtrusively capture verbal communication in firefighting teams, we used the microphone of the smartphone to detect speech in the vicinity of each team member. The placement of the smartphone inside jacket pockets and the often noisy environment required a noise-robust approach to speech detection. We utilized dictionary learning and sparse representation to robustly detect the presence of speech with an average accuracy of 85 % in noisy ambient audio recorded during firefighting missions.

We captured team coordination by a) detecting moving sub-groups over time from radio-based proximity data and b) quantifying movement synchrony between team members by analyzing their motion activity levels. In a firefighting training scenario, team members were assigned to the correct sub-group with an average accuracy of 95 %. The proposed team coordination metrics correlated significantly with completion time and perceived coordination.

As part of the thesis, we collected behavioral data of 55 group discussions, 18 training missions of professional firefighters as well as 76 real-world missions of actual fire incidents. Over 220 persons participated in three experiments and more than 165 hours of data have been collected.

Further, we designed and implemented a smartphone-based sensing system to observe teams unobtrusively in the field. We demonstrated the value of the team sensing approach by monitoring professional firefighting teams during training and actual incidents. Positive feedback from incident commanders and instructors showed that the visualization of sensor data is a useful tool for training and mission feedback.

We conclude that the proposed team behavior sensing approach contributes a practical tool to automatically and objectively observe and monitor important aspects of teamwork in the wild.


Zusammenfassung

Teams and teamwork are indispensable in today's organizations. Difficult, complex and interdependent tasks demand teams in which individual skills and knowledge are deliberately combined. To work well together as a team, team members must adjust to one another, support one another and interact with each other.

To improve the performance of a team, it is necessary to observe the collaboration and interaction between team members. Traditional observation approaches, however, mostly rely on questionnaires with low temporal resolution and on manual, labor-intensive and therefore costly behavior observation.

The aim of this work was to investigate a wearable computing approach to observing behavior within teams automatically. First, social behaviors of team members were captured with body-worn sensors such as accelerometers, barometers, microphones and wireless radios. Subsequently, individual and interpersonal behaviors were extracted from the sensor data and finally aggregated at the team level in order to capture team behavior.

We concentrated on the recognition of three behaviors: i) mirroring of gestures and postures between interaction partners, ii) amount and timing of verbal communication, and iii) detection of sub-groups and synchronous movement to capture spatial and temporal aspects of team coordination.

Behavioral mirroring in discussion groups was measured by counting how often team members followed one another in their gestures and postures. Gestures and postures of the lower arms were captured with worn motion sensors. In a discussion scenario, head nodding was recognized person-independently with 85 % accuracy, face touches and flat arms with 99 %, and gesticulating arms, fidgeting arms and changes of arm posture with 67 %.

For the unobtrusive capture of verbal communication we used the microphone of the smartphone, which enabled us to detect speech in the vicinity of individual team members. The placement of the smartphones in jacket pockets and the often loud environments required a noise-robust approach to speech detection. We used dictionary learning and sparse representation to robustly detect speech during firefighting training with an average accuracy of 85 %.

We captured team coordination by a) detecting sub-groups that move together and b) quantifying movement synchrony between team members through analysis of their motion activity. In a firefighting training scenario, team members were assigned to the correct sub-group with an average accuracy of 95 %. The proposed team coordination indicators correlated significantly with mission duration and perceived coordination.

As part of this work, we recorded behavioral data of 55 group discussions, 18 runs of a firefighting training exercise and 75 fire brigade missions using wearable sensors. More than 220 persons took part in the three experiments, and in total more than 165 hours of behavioral data of teams were collected.

Furthermore, a smartphone-based measurement system for the unobtrusive observation of teams in the field was designed and implemented. We demonstrated the significance of the proposed team observation method by monitoring professional firefighting teams. Positive feedback from incident commanders and instructors showed that the visualization of sensor data constitutes a useful tool for mission debriefing and training.

We conclude that the proposed approach of measuring team behavior in the field with wearable sensors can serve as a practical instrument for the automatic and objective observation of important aspects of teamwork.


1 Introduction


1.1 Teams and Team Performance

Teamwork has become one of the most common forms of organizational collaboration. As a result, teams are found across many disciplines, e.g. in health care, aviation, emergency response, engineering and others. The widespread use of teams has made their performance crucial for organizational success [1].

In a team, a group of people works together towards a shared goal, and individuals contribute their skills and knowledge. Teams are part of a larger organization and their members regard each other as a social entity. Teams are particularly appropriate for highly complex tasks that can only be solved when the skills of all team members are thoughtfully combined. In a coordinated effort, the team profits from complementary strengths and minimizes individual weaknesses.

1.1.1 Team Effectiveness Framework

Team effectiveness is the evaluation of team performance (process) in terms of an outcome measure related to a) the quantity or quality of the produced outputs, b) the effects on members' affect and viability (e.g. satisfaction, commitment) or c) the team's capability to perform effectively in the future [2]. To investigate the underlying variables of team effectiveness, team research has proposed numerous conceptual frameworks. Most of the frameworks share the view that inputs at different levels influence teamwork processes, which are responsible for producing outputs [1, 3]. In this thesis, we adopt the team effectiveness framework presented in [4], which follows Ilgen et al. [5] and distinguishes behavioral processes and emergent states related to affect and cognition. A slightly simplified version of the framework is illustrated in Figure 1.1.

Generally, teamwork is enabled and constrained by a set of input factors which are found on three contextual levels: individual characteristics of team members (e.g. skills and knowledge), team characteristics (e.g. leadership style, team composition) and organizational characteristics (e.g. reward systems, organizational climate). Thus, the input factors define the initial position of a team.

While working towards the shared goal, team members engage in teamwork, interacting with each other to coordinate their work. As such, teamwork includes behavioral processes and emergent states which capture cognitive, motivational and affective states of team members. Behavioral processes include communication, back-up behavior and coordination, whereas team cohesion is one example of an emergent state. Given the theoretical framework of team effectiveness, researchers investigate how team processes and emergent states mediate the effects of inputs on outcomes.

Figure 1.1: Team Effectiveness Framework: processes and emergent states mediate inputs to outcomes. Inputs comprise the organizational context, the team context and the members; outcomes comprise team performance and members' affect and viability. Adapted from [4].

1.1.2 Measurement of Team Performance

Traditionally, researchers measure team performance with the help of questionnaires and behavior observation. To assess the performance components of the theoretical model, effective and ineffective behaviors as well as relevant cognitive and affective concepts have to be identified with the help of subject matter experts [6]. Team performance episodes are then observed, and the specified behaviors are quantified using checklists and frequency counts or rated by trained observers. For example, team communication could be rated in terms of quality or quantified in terms of frequency. Additionally, in the case of simulation-based training, event-based measures can be used to assess team performance. Events in the training scenarios are generated in such a way that they should trigger an action by the team. Effective and ineffective responses can then be quantified [7]. As an example, a team member could be overloaded with work on purpose to trigger back-up behavior of other team members.

In contrast to behaviors, cognitive and affective aspects cannot be directly observed and are therefore often captured with questionnaires [7]. As one example, interpersonal attraction between team members can be assessed by self-report.


1.1.3 The Need for Automatic Observational Tools

Behavior observation and questionnaires are best practice to measure team performance. However, these methods suffer from the following limitations: self-reports require team members to actively answer questionnaire items, which can distract from primary tasks. As a direct consequence, questionnaires are administered at limited points in time and thus have low temporal resolution. Manual behavior observation, on the other hand, offers high temporal resolution, yet is notably absent from group research [8]. One reason is that this method is time-consuming, as the manual encoding of behavior usually takes a multiple of the actual interaction time. As a result, most studies are limited to small samples and short observational periods.

Acknowledging the limitations of traditional observational methods, Salas et al. explicitly state the need for new “unobtrusive and real-time measures of team performance that can be practically implemented, especially in the field” [1, p. 544]. New tools for behavior observation of teams are not only needed in team research, but also in team training and monitoring, where team observation is the foundation of feedback aimed at improving overall team performance.

1.2 Sensing Teamwork

It is the aim of this thesis to develop new tools for automatic behavior observation of teams. Following a wearable computing approach, the goal is to sense individual and interpersonal behaviors with body-worn sensors in an automatic and standardized way in order to enable real-time monitoring of team performance. The proposed sensing framework includes the following three steps, which are illustrated in Figure 1.2:

1. Sensing Behavior - Actions and social behaviors of team members are captured with wearable sensors. We measure body motion, speech and proximity to others with sensors that are either directly worn on the body or integrated into mobile devices carried by team members.

2. Extracting Behavioral Cues - From the sensor data we extract behavioral cues which describe individual behaviors of team members, such as their speech and motion activity, as well as interpersonal behaviors, such as the alignment of postures and gestures between team members (behavioral mimicry).

3. Team Metrics - In order to characterize team performance, we derive team metrics by aggregating individual and interpersonal behavioral cues on the team level. The team metrics can then be linked to input and output variables of the team effectiveness framework.

Figure 1.2: Observing teams with wearable sensors. Sensed behavior (motion, speech, proximity) yields individual cues (speech activity, motion activity, gestures and postures) and interpersonal cues (mimicry, co-location, motion synchrony), which are aggregated into team-level metrics (mimicry, communication, coordination) and linked to team inputs (e.g. leadership style) and team outcomes (e.g. performance).
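To make the three steps concrete, the following minimal Python sketch illustrates the flow from raw per-member sensor streams to a team-level metric. All function names and the particular choice of activity and synchrony measures are illustrative assumptions, not the thesis implementation:

```python
import numpy as np

def motion_activity(accel, win=50):
    """Individual cue: mean acceleration magnitude per window (illustrative)."""
    mag = np.linalg.norm(accel, axis=1)        # magnitude of each 3-axis sample
    n = len(mag) // win
    return mag[: n * win].reshape(n, win).mean(axis=1)

def motion_synchrony(act_a, act_b):
    """Interpersonal cue: correlation of two members' activity level series."""
    n = min(len(act_a), len(act_b))
    return float(np.corrcoef(act_a[:n], act_b[:n])[0, 1])

def team_metrics(accel_streams):
    """Team level: aggregate individual and pairwise cues over all members."""
    acts = [motion_activity(a) for a in accel_streams]
    sync = [motion_synchrony(acts[i], acts[j])
            for i in range(len(acts)) for j in range(i + 1, len(acts))]
    return {"mean_activity": float(np.mean([a.mean() for a in acts])),
            "mean_synchrony": float(np.mean(sync))}

# Toy usage: three members, 3-axis accelerometer, 10 s at 50 Hz.
rng = np.random.default_rng(0)
print(team_metrics([rng.normal(size=(500, 3)) for _ in range(3)]))
```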

The proposed team sensing framework was developed within the interdisciplinary project “Micro-level behavior and team performance: A social signal processing approach to teamwork” [9], funded by the Swiss National Science Foundation. Within the project, new behavior sensing methods are integrated into team research. The project formulates two main hypotheses: 1) team behaviors can be sensed with wearable sensors; 2) inputs are mediated to outputs by an interaction of micro-level behaviors and affective and cognitive states. While the first hypothesis is part of this thesis, the second hypothesis is tested by the group of Prof. Dr. Klaus Jonas at the University of Zurich. Their previous research [10, 11] has shown initial evidence for hypothesis 2.


1.3 Related Work

In the following, we review related work on platforms and applications of behavior sensing, as well as on the automatic extraction of behavioral cues and their relationship to psychological measures.

1.3.1 Behavior Sensing Platforms and Applications

In order to automatically capture aspects of human behavior, sensing platforms are required that incorporate different sensing modalities such as speech, vision, motion and others. While permanently installed audio-visual recording systems can be used in stationary scenarios like smart meeting rooms, mobile sensing systems are needed to capture human behavior in the field. One of the first wearable social sensing devices was the sociometer [12]. With this custom device, everyday conversations, face-to-face interaction and body movement could be measured out of the lab. However, as more and more sensors were integrated into commercially available mobile and wearable devices (in particular the smartphone), researchers adopted these devices as social sensing platforms [13, 14].

The automatic analysis of meetings is one of the stationary scenarios in which the behavior of meeting participants can be observed with pre-installed cameras and microphones. Previous research on the automatic analysis of social interactions in group meetings focused on the detection of group functional roles [15], dominance and status [16], conversational patterns [17, 18], group cohesion [19] and emerging leadership [20, 21].

Reality mining, on the other hand, refers to the collection and analysis of mobile phone sensor data in order to identify patterns of human behavior. Eagle et al. first investigated the use of the mobile phone to capture daily routines of individuals and to detect communities from Bluetooth proximity networks [13]. Olguin et al. proposed to use sociometric badges, a new version of the sociometer, to sense human interaction in organizations [22, 23]. Focusing on the individual, the measurement of well-being is another application of behavior sensing. Previous research has shown how the smartphone and its integrated sensors can be used to infer daily routines [24], to detect basic emotions of a user [25], to detect perceived stress during a street promotion task [26], and to quantify sleep quantity and sociability [27].


Figure 1.3: Sensors, sensing modalities and corresponding behavioral cues that can be detected on the individual, dyad or group level. The sensors (microphone, camera, inertial sensors, GPS, GSM cell ID, WiFi, Bluetooth) provide the modalities audio, motion, location and proximity, from which cues such as speech activity, speech prosody, voiced speech and in-conversation state, gestures and postures, physical activity, movement synchrony, daily routines, important places, co-location, co-location networks, conversation networks and moving sub-groups are derived.

1.3.2 Automatic Extraction of Behavioral Cues

Behavioral cues represent aspects of human behavior that can be extracted from sensor data. While individual cues capture the behavior of a single group member, interpersonal cues capture the interdependencies between group members. Individual and interpersonal cues are aggregated to capture behavior on the team level. Figure 1.3 shows an overview of the sensing modalities and the behavioral cues that have been detected in previous work.

Individual Cues

Speech activity cues capture who speaks when and for how long, and have mostly been used for the automatic analysis of group discussions. Speech activity cues include, among others, total speaking time, number of short utterances and number of speaking turns [16]. Additionally, speaking energy and speech prosodic features such as pitch have been used as nonverbal cues [26].
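As an illustration of such cues (a minimal sketch; the exact cue definitions in [16] differ in detail, and the frame length and threshold below are arbitrary assumptions), speaking time, turns and short utterances can be derived from a binary speech activity sequence:

```python
import numpy as np

def speech_activity_cues(vad, frame_s=0.03, short_s=1.0):
    """Derive speaking cues from a binary voice activity sequence.

    vad     : sequence of 0/1 frames (1 = member is speaking)
    frame_s : frame length in seconds
    short_s : utterances shorter than this count as 'short'
    """
    vad = np.asarray(vad, dtype=int)
    # A speaking turn starts wherever the sequence rises from 0 to 1.
    edges = np.diff(np.concatenate(([0], vad, [0])))
    starts, ends = np.where(edges == 1)[0], np.where(edges == -1)[0]
    durations = (ends - starts) * frame_s
    return {"total_speaking_time": float(vad.sum() * frame_s),
            "num_turns": len(starts),
            "num_short_utterances": int((durations < short_s).sum())}

print(speech_activity_cues([0, 1, 1, 1, 0, 0, 1, 0, 1, 1]))
```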

In the meeting scenario, motion of the upper body and the hands was detected using vision-based methods [15], whereas gross body motion was measured with accelerometers [12, 22] in mobile settings.

As hand and head gestures play an important role in human interaction [28], vision-based approaches to their automatic detection were previously investigated in the domain of human machine interaction [29, 30]. Recently, communicative cues such as gesticulating, self-touch, hand on table and hiding hands have been detected from video sequences of job interviews [31].

Interpersonal Cues

To capture interdependencies between team members, the sensor signals of two group members have to be analyzed. In the case of speech activity cues, multiple speaking state sequences are compared to detect interpersonal speech activity cues such as interruptions (overlapping speech) and speaker turns [16].

As the alignment of behavior and motion synchrony between interaction partners have been found to be linked to the affective relationship [28, 32], behavior alignment is an important cue to be captured. In the meeting scenario, synchrony between group members was measured as the mutual information of two corresponding motion or speech activity sequences [19]. In psychotherapy sessions, body movement synchrony between therapist and client was quantified by the time-lagged correlation between two motion energy signals [33]. A recent review on evaluation methods of interaction synchrony can be found in [34].
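A minimal sketch of the time-lagged correlation idea (simplified relative to [33], which additionally uses windowing and surrogate testing; the maximum lag here is an arbitrary assumption):

```python
import numpy as np

def lagged_synchrony(energy_a, energy_b, max_lag=5):
    """Movement synchrony as the maximum absolute correlation of two
    motion energy signals over lags of -max_lag..+max_lag samples."""
    best = 0.0
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:                       # shift signal a forward by `lag`
            x, y = energy_a[lag:], energy_b[:len(energy_b) - lag]
        else:                              # negative lag: shift b instead
            x, y = energy_a[:lag], energy_b[-lag:]
        n = min(len(x), len(y))
        best = max(best, abs(float(np.corrcoef(x[:n], y[:n])[0, 1])))
    return best

# Toy usage: b is a delayed, noisy copy of a, so synchrony is high.
a = np.sin(np.linspace(0, 10, 200))
b = np.roll(a, 3) + 0.05 * np.random.default_rng(1).normal(size=200)
print(lagged_synchrony(a, b))
```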

In the mobile setting, individuals move and interact with one another in their daily environment. Mobility therefore adds a spatial component to behavior sensing and requires detecting when group members are in proximity (co-located) and when they interact with one another. In the field, proximity between group members has been estimated by repeated Bluetooth scans [13, 23, 35, 36]. As it was assumed that mobile devices were carried by their users, detected devices corresponded to nearby persons. In order to sense face-to-face interaction, the sociometer sends and receives device IDs via infrared signals, which limits the direction and distance at which messages can be exchanged. In this way, only people who are near and face each other are detected [12, 22]. Wyatt et al. showed how conversation and co-location can be inferred by comparing voiced speech sequences recorded simultaneously on different devices [37].

Aggregation on Group Level

In order to aggregate individual cues on the group level, members' cues are often combined by taking their sum, mean, maximum or minimum [16, 20]. For example, the total speaking time of a team would be the sum of the individual speaking times. Pairwise measures, on the other hand, are often represented in network form and combined on the team level with the help of social network analysis metrics which capture the overall structure of the sensed group networks, e.g. network density or degree centralization [12, 22]. The bag-of-nonverbal-cues is another approach to describe the social interaction within a group [17]. Essentially, group cues can be defined by the cue distribution across group members or by classifying aggregated cues as either greater or less than the average value of a training corpus. For example, the speaking time distribution over an interaction period is classified as either one member speaking, two members speaking, equal speaking or other. Group cues such as overlapping speech are quantized into two levels corresponding to greater or less than the average value observed within a corpus.
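For illustration (a sketch under simple assumptions: an undirected group network with an edge wherever a pairwise cue is nonzero; member names and values are made up), summing individual cues and computing network density could look as follows:

```python
import itertools
import numpy as np

def aggregate_individual(cues, how="sum"):
    """Combine members' individual cues, e.g. total team speaking time."""
    return float({"sum": np.sum, "mean": np.mean,
                  "max": np.max, "min": np.min}[how](cues))

def network_density(pairwise):
    """Fraction of member pairs whose pairwise cue (e.g. co-location
    time in seconds) is nonzero in the sensed group network."""
    members = sorted({m for pair in pairwise for m in pair})
    possible = len(members) * (len(members) - 1) / 2
    present = sum(1 for pair in itertools.combinations(members, 2)
                  if pairwise.get(tuple(sorted(pair)), 0) > 0)
    return present / possible

# Toy usage: three members a, b, c.
speaking_times = [120.0, 45.5, 60.0]                    # seconds per member
colocation = {("a", "b"): 300.0, ("a", "c"): 0.0, ("b", "c"): 30.0}
print(aggregate_individual(speaking_times))             # 225.5
print(network_density(colocation))                      # 2 of 3 edges
```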

1.3.3 Sensed Group Behavior and Psychological Measures

Olguin et al. collected behavioral data with sociometric badges worn by 67 nurses in the Post Anesthesia Care Unit of a hospital [22]. Linear regression was used to investigate the link between aggregated group cues and group outcomes. The analysis showed a positive relationship of group body motion energy and speech activity with perceived workload. The amount of voiced speech best predicted perceived group productivity.

Another approach to linking behavioral cues to psychological constructs relies on supervised classification [38]. For example, behavioral cues for predicting group cohesion were investigated in [19]. From the AMI Meeting Corpus [39], which includes audio-visual recordings of project meetings, 120 segments of two minutes were annotated by external observers. Segments with high inter-rater reliability were selected for the classification task of high versus low cohesion. Group cohesion was classified with up to 90 % accuracy using the group's total silence as a cue. The relationship between nonverbal cues and emergent leadership was investigated in [20, 21]. Twenty groups of three or four persons solved a hidden profile task, and team members were afterwards asked to rate each other to determine the emerged leader of the group. It was found that emergent leaders spoke more than other group members and that perceived leadership was highly correlated with dominance. Emergent leadership was classified with up to 85 % accuracy using nonverbal cues extracted from the audio and video signals.


In contrast to the supervised approaches, Jayagopi et al. used topic models [40] to mine conversational patterns from interaction periods of two or five minutes in an unsupervised way. Using a bag-of-nonverbal-cues approach, interaction periods were described by group cues to capture group speaking patterns [17]. From the AMI Meeting Corpus [39], two and five minute long interaction periods were extracted from 37 meetings. In this data set, three patterns related to different leadership styles (autocratic, participative, free-rein) were discovered.

1.4 Objectives of the Thesis

The thesis aims at developing new behavioral observation tools for team research, team training and team monitoring. Relying on wearable computing, activity recognition and speech detection, the goal is to characterize interaction in teams by analyzing behavioral cues automatically extracted from sensor data. The first three objectives (see Sections 1.4.1, 1.4.2 and 1.4.3) focus on 1) the automatic quantification of behavioral mimicry between team members, 2) the measurement of verbal communication of teams in noisy environments and 3) the capturing of spatial and temporal coordination in teams. The fourth objective is to demonstrate the value of these automatically extracted behavioral cues by showing that they can be used to observe teams “in the wild”.

1.4.1 Quantifying Behavioral Mimicry between Team Members

Behavioral mimicry is the unconscious alignment of gestures and postures between two interaction partners. It is an important behavioral cue which has been shown to be linked to interpersonal affect, liking and rapport. Chartrand et al. described mimicry as the “social glue that binds and bonds people together” [41]. Thus, behavioral mimicry is likely to play a role in teamwork. Mimicry can be captured either by comparing discrete individual behavioral cues with respect to their timing and frequency [42], or by investigating the similarity of continuous data streams of multiple persons, such as their activity levels [33]. In this thesis, we follow the first approach and detect explicit behavioral cues which are easily understood and directly characterize the interaction (a small sketch of this counting approach follows the questions below). To test our approach, we quantify behavioral mimicry in decision teams that are led by leaders exhibiting different leadership styles. Thus, we address the following questions:


• How well can gestures and postures of the lower arms be detected from wearable motion sensors in naturalistic human interactions?

• How can behavioral mimicry between interaction partners be automatically quantified using wearable sensors?

• Can leadership style influence mimicry behavior in teams?

• At what time of the discussion can we best discriminate leadership style?
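As a toy illustration of the event-based counting approach (our simplification, not the detection pipeline of Chapter 3; the 4-second window is an arbitrary assumption), a mimicry event can be counted whenever one member shows a behavior shortly after another member displayed it:

```python
def count_mimicry(events_model, events_mimicker, window=4.0):
    """Count mimicry events: the mimicker shows the same behavior
    within `window` seconds after the model showed it.

    events_* : list of (timestamp_seconds, behavior_label)
    """
    count = 0
    for t_model, behavior in events_model:
        if any(label == behavior and 0 < t - t_model <= window
               for t, label in events_mimicker):
            count += 1
    return count

model = [(1.0, "face_touch"), (10.0, "arms_crossed")]
mimicker = [(3.5, "face_touch"), (30.0, "arms_crossed")]
print(count_mimicry(model, mimicker))  # 1: only the face touch is mimicked
```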

1.4.2 Measuring Verbal Communication of Teams in Mobile and Noisy Scenarios

At the heart of team interaction lies the verbal exchange of information. Like others [12, 22, 37], we focus on the amount and timing of verbal communication instead of detecting information content. In order to unobtrusively capture verbal communication between team members, we use the smartphone to detect speech in the vicinity of each team member. In contrast to previous studies, we aim to capture verbal communication in first responder teams. This requires a noise-robust speech detection approach, as these teams often work in noisy environments and carry the smartphone inside their jacket pockets (a sketch of the detection idea follows the questions below). We evaluate the approach in teams of firefighters. Furthermore, we investigate in our experiments how speech activity cues relate to leadership style and team performance. We concentrate on the following research questions:

• How well can speech in the vicinity of team members be detected in mobile and noisy environments using the smartphone?

• Can leadership style influence the leader’s speech activity cues?

• Can team speech activity be linked to team performance?
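The following sketch illustrates the dictionary-based classification idea in its simplest form (our illustrative reduction, not the method of Chapter 5; the toy features, dictionary sizes and sparsity levels are arbitrary assumptions): one dictionary is learned on speech frames and one on noise frames, and a new frame is labeled according to which dictionary reconstructs it with smaller error.

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning

def learn_dict(frames, n_atoms=16):
    """Learn a sparse dictionary from (n_frames, n_features) data."""
    return DictionaryLearning(n_components=n_atoms,
                              transform_algorithm="omp",
                              transform_n_nonzero_coefs=3,
                              max_iter=20, random_state=0).fit(frames)

def residual(model, frames):
    """Reconstruction error of frames under a learned dictionary."""
    codes = model.transform(frames)            # sparse codes per frame
    recon = codes @ model.components_          # reconstruct from atoms
    return np.linalg.norm(frames - recon, axis=1)

# Toy spectra: 'speech' has extra low-band energy, 'noise' is broadband.
rng = np.random.default_rng(0)
speech = np.abs(rng.normal(size=(200, 32))); speech[:, :8] += 3.0
noise = np.abs(rng.normal(size=(200, 32)))
d_speech, d_noise = learn_dict(speech), learn_dict(noise)

test = np.abs(rng.normal(size=(5, 32))); test[:, :8] += 3.0
is_speech = residual(d_speech, test) < residual(d_noise, test)
print(is_speech)  # expect mostly True for speech-like frames
```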

1.4.3 Capturing Spatial and Temporal Coordination in Teams

Team coordination occurs when “team members execute their activities in a timely and integrated manner” [43]. Coordination is the act of organizing team work flow, requiring the team to manage its task interdependencies [44]. We aim to capture team work flow, and as such the status of coordination, by a) detecting moving sub-groups over time from radio-based proximity data and b) quantifying movement synchrony between team members by analyzing their motion activity levels. Thus, we focus on the spatial and temporal aspects of team coordination (a toy sketch of the sub-group detection follows the questions below). In particular, we address the following questions:

• How can proximity dynamics within teams be measured with the smartphone?

• How can moving sub-groups be detected from the proximity data?

• How can team coordination indicators be extracted from the sensor data?

• Can the derived team coordination indicators be linked to perceived coordination and team performance?
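To make the sub-group idea tangible (a toy sketch; Chapter 6 develops the actual ANT-based method, and both the proximity values and the threshold below are made up), sub-groups can be read off as connected components of a thresholded pairwise proximity matrix:

```python
import numpy as np

def subgroups(proximity, threshold=0.5):
    """Connected components of the thresholded proximity graph.

    proximity : symmetric (n, n) matrix, higher = closer
    returns   : list of sets of member indices
    """
    n = len(proximity)
    adj = proximity >= threshold
    unseen, groups = set(range(n)), []
    while unseen:
        stack, comp = [unseen.pop()], set()
        while stack:                      # depth-first search
            i = stack.pop()
            comp.add(i)
            nbrs = {j for j in range(n) if adj[i, j] and j != i} & unseen
            unseen -= nbrs
            stack.extend(nbrs)
        groups.append(comp)
    return groups

# Members 0-2 move together, 3-4 form a second sub-group.
P = np.array([[1, .9, .8, .1, .0],
              [.9, 1, .7, .0, .1],
              [.8, .7, 1, .2, .0],
              [.1, .0, .2, 1, .9],
              [.0, .1, .0, .9, 1]])
print(subgroups(P))  # [{0, 1, 2}, {3, 4}]
```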

1.4.4 Observing Teams ’in the Wild’

We apply our team sensing method in a real work environment to demonstrate the value of the approach in the field. We monitor teams of firefighters over a six-week period during their actual incidents. This requires the sensing method to be mobile and unobtrusive. The thesis addresses the following research questions:

• How can a smartphone-based behavior sensing system be deployed in real work environments?

• What factors most influence data completeness?

• How can the sensor data of a team be visualized to be a useful tool for team feedback?

1.5 Thesis Outline and Paper List

The thesis is structured into 8 chapters. Chapter 2 summarizes the achievements, discusses limitations and concludes the work by giving an outlook on future work. Chapters 3 to 8 comprise the 6 scientific articles listed in Table 1.1. Figure 1.4 presents the outline of the thesis and indicates where each chapter's contribution lies with respect to the behavioral cues extracted and the scenario complexity used for evaluation. Chapter 3 presents a method to quantify behavioral mimicry and investigates how mimicry behavior differs in groups which were led by authoritarian and considerate leaders. In Chapter 4, authoritarian and considerate leaders are discriminated by their speech activity cues. Chapter 5 presents a robust method for voice activity detection, and Chapter 6 presents and evaluates a method to measure proximity dynamics in groups. Chapter 7 presents a method for measuring team coordination. Chapter 8 presents how a mobile sensing system can be deployed in a real work environment and how mission feedback can be supported by sensed behavioral data.

Figure 1.4: Outline of the thesis, structured into 8 chapters. The chapters (Chapter 3: Quantifying Behavioral Mimicry; Chapter 4: Discriminating Leadership Style; Chapter 5: Robust Voice Activity Detection; Chapter 6: Sensing Proximity Dynamics; Chapter 7: Sensing Team Coordination Indicators; Chapter 8: Monitoring Team Performance Indicators) are arranged by the behavioral cues they address (behavioral mimicry, communication, coordination) and by the scenario complexity used for evaluation: team research (naturalistic lab study), team training (real-life scenario) and team monitoring (real-life, unrestricted).


Chapter 3: Quantifying Behavioral Mimicry by Automatic Detection of Nonverbal Cues from Body Motion
S. Feese, B. Arnrich, G. Tröster, B. Meyer, K. Jonas
Proceedings of the 4th International Conference on Social Computing (SocialCom), pages 520–525, Amsterdam, Netherlands, 2012, IEEE.

Chapter 4: Discriminating Individually Considerate and Authoritarian Leaders by Speech Activity Cues
S. Feese, A. Muaremi, B. Arnrich, G. Tröster, B. Meyer, K. Jonas
Proceedings of the 3rd International Conference on Social Computing (SocialCom), pages 1460–1465, Cambridge - MA, USA, 2011, IEEE.

Chapter 5: Robust Voice Activity Detection for Social Sensing
S. Feese, G. Tröster
Proceedings of the International Conference on Pervasive and Ubiquitous Computing Adjunct Publication (UbiComp), pages 931–938, Zurich, Switzerland, 2013, ACM.

Chapter 6: Sensing Group Proximity Dynamics of Firefighting Teams using Smartphones
S. Feese, B. Arnrich, G. Tröster, M. Burtscher, B. Meyer, K. Jonas
Proceedings of the 17th International Symposium on Wearable Computers (ISWC), pages 97–104, Zurich, Switzerland, 2013, ACM.

Chapter 7: Continuous Sensing of Team Coordination Indicators in Naturalistic Environments using the Smartphone
S. Feese, G. Tröster, M. Burtscher, K. Jonas
Journal of Human-centric Computing and Information Science, 2014.

Chapter 8: CoenoFire: Monitoring Performance Indicators of Firefighters in Real-world Missions using Smartphones
S. Feese, B. Arnrich, G. Tröster, M. Burtscher, B. Meyer, K. Jonas
Proceedings of the International Joint Conference on Pervasive and Ubiquitous Computing (UbiComp), pages 83–92, Zurich, Switzerland, 2013, ACM.

Table 1.1: Publications and their corresponding chapters in this thesis.


1.6 Additional Publications

The following publications have been authored and co-authored inaddition to those presented in this thesis:

• M. Rossi, S. Feese, O. Amft, N. Braune, S. Martis and G. Tröster. AmbientSense: A Real-Time Ambient Sound Recognition System for Smartphones. In Proceedings of the International Workshop on the Impact of Human Mobility in Pervasive Systems and Applications (PerMoby), San Diego - CA, USA, 2013.

• M. Rossi, O. Amft, S. Feese, C. Käslin and G. Tröster. MyConverse: Recognizing and Visualizing Personal Conversations using Smartphones. In Proceedings of the 2013 ACM Conference on Pervasive and Ubiquitous Computing Adjunct Publication (UbiComp), Zurich, Switzerland, pages 1275–1284, ACM, 2013.

• J. Seiter, S. Feese, B. Arnrich, G. Tröster, O. Amft, L. Macrea, M. Konrad. Activity Monitoring in Daily Life as an Outcome Measure for Surgical Pain Relief Intervention using Smartphones. In Proceedings of the 17th International Symposium on Wearable Computers (ISWC), pages 127–128, ACM, 2013.

• S. Feese, B. Arnrich, M. Rossi, G. Tröster, M. Burtscher, B. Meyer, K. Jonas. Towards Monitoring Firefighting Teams with the Smartphone. In Proceedings of the IEEE International Conference on Pervasive Computing and Communications Workshops (PERCOM Workshops), San Diego - CA, USA, pages 381–384, IEEE, 2013.

• M. Wirz, P. Schläpfer, M. Kjærgaard, D. Roggen, S. Feese, G. Tröster. Towards an Online Detection of Pedestrian Flocks in Urban Canyons by Smoothed Spatio-temporal Clustering of GPS Trajectories. In Proceedings of the 3rd ACM SIGSPATIAL International Workshop on Location-Based Social Networks, Chicago - IL, USA, pages 17–24, ACM, 2011.

• S. Feese, B. Arnrich, G. Tröster, B. Meyer, K. Jonas. Detecting Posture Mirroring in Social Interactions with Wearable Sensors. In Proceedings of the 15th International Symposium on Wearable Computers (ISWC), San Francisco - CA, USA, pages 119–120, ACM, 2011.


Bibliography

[1] E. Salas, N. J. Cooke, and M. A. Rosen, “On teams, teamwork, and team performance: Discoveries and developments,” Human Factors, vol. 50, no. 3, pp. 540–547, 2008.

[2] J. R. Hackman, “The design of work teams,” in Handbook of Organizational Behavior, pp. 315–342, Prentice-Hall, 1987.

[3] J. A. LePine, R. F. Piccolo, C. L. Jackson, J. E. Mathieu, and J. R. Saul, “A meta-analysis of teamwork processes: Tests of a multidimensional model and relationships with team effectiveness criteria,” Personnel Psychology, vol. 61, no. 2, pp. 273–307, 2008.

[4] J. Mathieu, M. T. Maynard, T. Rapp, and L. Gilson, “Team effectiveness 1997-2007: A review of recent advancements and a glimpse into the future,” Journal of Management, vol. 34, no. 3, pp. 410–476, 2008.

[5] D. R. Ilgen, J. R. Hollenbeck, M. Johnson, and D. Jundt, “Teams in organizations: From input-process-output models to IMOI models,” Annual Review of Psychology, vol. 56, pp. 517–543, 2005.

[6] K. J. Krokos, D. P. Baker, A. Alonso, and R. Day, “Assessing team processes in complex environments: Challenges in transitioning research to practice,” in Team Effectiveness in Complex Organizations: Cross-disciplinary Perspectives and Approaches, pp. 383–408, Routledge, 2009.

[7] E. Salas, M. A. Rosen, J. D. Held, and J. J. Weissmuller, “Performance measurement in simulation-based training: A review and best practices,” Simulation & Gaming, vol. 40, no. 3, pp. 328–376, 2009.

[8] R. L. Moreland, J. D. Fetterman, J. J. Flagg, and K. Swanenburg, “Behavioral assessment practices among social psychologists who study small groups,” in Then A Miracle Occurs: Focusing on Behavior in Social Psychological Theory and Research, pp. 28–53, New York: Oxford University Press, 2010.


[9] SNF Project, “Micro-level behavior and team performance: A social signal processing approach to teamwork,” Jan. 2014. http://p3.snf.ch/Project-137741, Grant agreement no.: CR12I1_137741.

[10] B. Meyer, M. Shemla, and C. C. Schermuly, “Social category salience moderates the effect of diversity faultlines on information elaboration,” Small Group Research, vol. 42, no. 3, pp. 257–282, 2011.

[11] M. J. Burtscher, M. Kolbe, J. Wacker, and T. Manser, “Interactions of team mental models and monitoring behaviors predict team performance in simulated anesthesia inductions,” Journal of Experimental Psychology: Applied, vol. 17, no. 3, p. 257, 2011.

[12] T. Choudhury and A. S. Pentland, “Sensing and modeling human networks using the sociometer,” in Proc. Int. Symposium on Wearable Computers (ISWC), 2003.

[13] N. Eagle and A. Pentland, “Reality mining: Sensing complex social systems,” Personal and Ubiquitous Computing, vol. 10, no. 4, pp. 255–268, 2005.

[14] N. D. Lane, E. Miluzzo, H. Lu, D. Peebles, T. Choudhury, and A. T. Campbell, “A survey of mobile phone sensing,” IEEE Communications Magazine, vol. 48, no. 9, pp. 140–150, 2010.

[15] M. Zancanaro, B. Lepri, and F. Pianesi, “Automatic detection of group functional roles in face to face interactions,” in Proc. Int. Conf. Multimodal Interfaces (ICMI), p. 28, ACM Press, 2006.

[16] D. B. Jayagopi, H. Hung, C. Yeo, and D. Gatica-Perez, “Modeling dominance in group conversations using nonverbal activity cues,” IEEE Trans. Audio, Speech, and Language Processing, vol. 17, no. 3, pp. 501–513, 2009.

[17] D. B. Jayagopi and D. Gatica-Perez, “Mining group nonverbal conversational patterns using probabilistic topic models,” IEEE Trans. Multimedia, vol. 12, no. 8, pp. 790–802, 2010.

[18] K. Otsuka, “Conversation scene analysis [social sciences],” IEEE Signal Processing Magazine, vol. 28, no. 4, pp. 127–131, 2011.


[19] H. Hung and D. Gatica-Perez, “Estimating cohesion in small groups using audio-visual nonverbal behavior,” IEEE Trans. Multimedia, vol. 12, no. 6, pp. 563–575, 2010.

[20] D. Sanchez-Cortes, O. Aran, and M. Schmid Mast, “Identifying emergent leadership in small groups using nonverbal communicative cues,” in Proc. Int. Conf. Multimodal Interaction (ICMI), 2010.

[21] D. Sanchez-Cortes, O. Aran, M. Schmid Mast, and D. Gatica-Perez, “A nonverbal behavior approach to identify emergent leaders in small groups,” IEEE Trans. Multimedia, vol. 14, no. 3, pp. 816–832, 2012.

[22] D. Olguin, P. A. Gloor, and A. S. Pentland, “Capturing individual and group behavior with wearable sensors,” in AAAI Symp. Human Behavior Modeling, 2009.

[23] D. Olguin, B. Waber, T. Kim, A. Mohan, K. Ara, and A. Pentland, “Sensible organizations: Technology and methodology for automatically measuring organizational behavior,” IEEE Trans. Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 39, no. 1, pp. 43–55, 2009.

[24] K. Farrahi and D. Gatica-Perez, “What did you do today?: Discovering daily routines from large-scale mobile data,” in Proc. Int. Conf. ACM Multimedia, 2008.

[25] K. K. Rachuri, C. Mascolo, P. J. Rentfrow, and C. Longworth, “EmotionSense: A mobile phones based adaptive platform for experimental social psychology research,” in Proc. Int. Conf. Ubiquitous Computing (UbiComp), 2010.

[26] H. Lu, D. Frauendorfer, M. Rabbi, M. Schmid Mast, G. T. Chittaranjan, A. T. Campbell, D. Gatica-Perez, and T. Choudhury, “StressSense: Detecting stress in unconstrained acoustic environments using smartphones,” in Proc. Int. Conf. Ubiquitous Computing (UbiComp), 2012.

[27] M. Rabbi, S. Ali, T. Choudhury, and E. Berke, “Passive and in-situ assessment of mental and physical well-being using mobile sensors,” in Proc. Int. Conf. Ubiquitous Computing (UbiComp), 2011.


[28] M. L. Knapp and J. A. Hall, Nonverbal Communication in HumanInteraction. Wadsworth, 2005.

[29] S. Mitra and T. Acharya, “Gesture recognition: A survey,” IEEETrans. Systems, Man, and Cybernetics, Part C: Applications and Re-views, vol. 37, no. 3, pp. 311–324, 2007.

[30] L.-P. Morency, A. Quattoni, and T. Darrell, “Latent-dynamic dis-criminative models for continuous gesture recognition,” in Proc.Int. Conf. Computer Vision and Pattern Recognition (CVPR), 2007.

[31] A. Marcos-Ramiro, D. Pizarro-Perez, M. Marron-Romera, L. S.Nguyen, and D. Gatica-Perez, “Body communicative cue extrac-tion for conversational analysis,” in Proc. Int. Conf. Automatic Faceand Gesture Recognition, Apr. 2013.

[32] T. L. Chartrand and R. van Baaren, “Chapter 5 human mimicry,” inAdvances in Experimental Social Psychology, Academic Press, 2009.

[33] F. Ramseyer and W. Tschacher, “Nonverbal synchrony in psy-chotherapy: Coordinated body movement reflects relationshipquality and outcome.,” Journal of Consulting and Clinical Psychol-ogy, vol. 79, no. 3, pp. 284–95, 2011.

[34] E. Delaherche, M. Chetouani, A. Mahdhaoui, C. Saint-Georges,S. Viaux, and D. Cohen, “Interpersonal synchrony: A survey ofevaluation methods across disciplines,” IEEE Trans. A↵ective Com-puting, vol. 3, no. 3, pp. 349–365, 2012.

[35] N. Aharony, W. Pan, C. Ip, I. Khayal, and A. Pentland, “SocialfMRI: Investigating and shaping social mechanisms in the real-world,” Pervasive and Mobile Computing, vol. 7, pp. 643–659, 2011.

[36] T. M. T. Do and D. Gatica-Perez, “Human interaction discoveryin smartphone proximity networks,” Personal and Ubiquitous Com-puting, vol. 17, no. 3, pp. 413–431, 2011.

[37] D. Wyatt, T. Choudhury, J. Bilmes, and J. A. Kitts, “Inferring colo-cation and conversation networks from privacy-sensitive audiowith implications for computational social science,” ACM Trans.Intelligent Systems and Technology, vol. 2, pp. 7:1–7:41, Jan. 2011.

[38] D. Gatica-Perez, “Automatic nonverbal analysis of social inter-action in small groups: A review,” Image and Vision Computing,vol. 27, no. 12, pp. 1775–1787, 2009.

Page 38: In Copyright - Non-Commercial Use Permitted Rights ...46843/eth-46843-02.pdfDiss. ETH No. 21894 Observing Teams with Wearable Sensors A dissertation submitted to ETH Z¨urich for the

21

[39] M. et. al., “The ami meeting corpus,” in Proc. Int. Conf. Methodsand Techniques in Behavioral Research, 2005.

[40] D. M. Blei, “Probabilistic topic models,” Communications of theACM, vol. 55, no. 4, pp. 77–84, 2012.

[41] T. L. Chartrand, W. W. Maddux, and J. L. Lakin, “Beyond theperception-behavior link: The ubiquitous utility and motivationalmoderators of nonconscious mimicry,” The New Unconscious,pp. 334–361, 2005.

[42] M. Lafrance and M. Broadbent, “Group rapport: Posture shar-ing as a nonverbal indicator,” Group & Organization Management,vol. 1, no. 3, pp. 328–333, 1976.

[43] T. Dickinson and R. McIntyre, “A conceptual framework for team-work measurement,” in Team performance assessment and measure-ment: Theory, Methods, and Applications, pp. 19–43, Lawrence Erl-baum Associates, 1997.

[44] S. W. J. Kozlowski and B. S. Bell, “Work groups and teams in orga-nizations,” in Handbook of psychology: Industrial and organizationalpsychology, vol. 12, pp. 333–375, London: Wiley, 2003.

Page 39: In Copyright - Non-Commercial Use Permitted Rights ...46843/eth-46843-02.pdfDiss. ETH No. 21894 Observing Teams with Wearable Sensors A dissertation submitted to ETH Z¨urich for the
Page 40: In Copyright - Non-Commercial Use Permitted Rights ...46843/eth-46843-02.pdfDiss. ETH No. 21894 Observing Teams with Wearable Sensors A dissertation submitted to ETH Z¨urich for the

2 Thesis summary

Chapter 2 summarizes the main approaches and contributions of this thesis, discusses the limitations and presents an outlook on opportunities for future research. Detailed descriptions and discussions of the contributions can be found in the referenced publication chapters.


2.1 Outline of Contributions

The most relevant contributions of this thesis are listed in Figure 2.1 along with the thesis objectives defined in Section 1.4.

Objective: Quantifying Behavioral Mimicry (Section 2.3)
• Quantification of mimicry by detecting and comparing gestures and postures of interaction partners using body-worn motion sensors.
Investigated link to: Leadership Style. Study: Leadership.

Objective: Measuring Verbal Communication (Section 2.4)
• Noise robust speech activity detection using dictionary learning and sparse representation.
Investigated link to: Leadership Style, Team Performance. Studies: Leadership, Firefighting Training.

Objective: Spatial and Temporal Team Coordination (Section 2.5)
• Detection of moving sub-groups using radio based proximity information.
• Measurement of temporal movement alignment between team members.
• Set of team coordination indicators that capture spatial and temporal aspects of team coordination.
Investigated link to: Team Performance, Perceived Coordination. Study: Firefighting Training.

Objective: Observing Teams in the Wild (Section 2.6)
• Development of a smartphone based mobile sensing system.
• Real-work deployment of the system in the Zurich fire brigade over a period of six weeks.
• Analysis of deployment and data collection procedure.
• Visualization of sensor data to illustrate team workflow and enable data supported mission feedback.
Study: Firefighting Real-work.

Figure 2.1: Outline of contributions presented in this thesis according to the objectives defined in Section 1.4.

The thesis contributed methods to detect the following three team behaviors from sensor signals: Behavioral Mimicry (Section 2.3, p. 28), Verbal Communication (Section 2.4, p. 32) and Spatial and Temporal Coordination (Section 2.5, p. 35). Additionally, the methods were applied in the setting of team research by investigating the links between the sensed behavioral cues and variables of the team effectiveness framework such as Leadership Style and Team Performance. Further, three experiments were conducted to collect behavioral data of over 140 team interactions involving more than 220 participants.

Section 2.3 details the detection of Behavioral Mimicry using on-body motion sensors. Section 2.4 deals with the measurement of Verbal Communication in teams and presents a noise robust method to detect speech in ambient audio data. Section 2.5 describes how Spatial and Temporal Coordination of firefighting teams can be captured with sensors and wireless radios integrated into the smartphone.

Sections 2.3.3 and 2.4.2 summarize how Leadership Style (input variable of the team effectiveness framework) influenced Behavioral Mimicry and Speech Activity in our experiments. Sections 2.4.3 and 2.5.4 present how Team Speech Activity and Team Coordination Indicators correlated in our experiments with Team Performance (output variable of the team effectiveness framework).

The following Section 2.2 presents the conducted experiments and the sensor setups that were used to capture the three behaviors in teams.

2.2 Conducted Experimental Studies

Throughout this thesis we conducted three experiments in which teams interacted in scenarios of increasing complexity. The subsequent sub-sections present the three experiments that we used for the development and validation of our methods.

2.2.1 Study 1 - Sensing Leadership Behavior in Small Groups

We conducted a lab study with 55 teams aimed at analyzing the nonverbal behavior of individually considerate leaders.

Scenario In groups of three, the participants had to solve a hidden problem decision-making task. Led by a selected leader, the team ranked four fictive candidates with regard to their suitability for an open job position (see Section 3.2 on Page 53). Half of the leaders were instructed to show individually considerate leadership, whereas the other half was instructed to be authoritarian, which in our study referred to the absence of individual consideration (see Section 4.3.1 on Page 73).

Sensor Setup To capture body movement of the group members, we used Inertial Measurement Units (IMUs). The IMUs included an accelerometer, a gyroscope and a magnetometer (XSens MTx). During the experiments, all participants wore a sensor shirt with five integrated IMUs to capture the movement of the lower and upper arms as well as back movements. Head movements were captured with an additional IMU mounted on baseball caps (see Figure 4.1 on Page 74). Speech of all group members was synchronously recorded with clip-on lapel microphones which were also integrated into the sensor shirts (see Section 4.3.2 on Page 74).

Data Set In total, we recorded data of 165 subjects (112 female, 53 male; 25.4 ± 4.2 years) in 55 group discussions. Due to sensor failures, data of 9 groups were partially missing. The final data set includes 46 group discussions (more than 15 h) led by 22 authoritarian and 24 individually considerate leaders.

2.2.2 Study 2 - Measuring Firefighting Teams during Training

Moving out of the lab, we conducted the second experiment in close collaboration with the Zurich fire brigade. We monitored professional firefighting teams during real training missions in which the firefighters were confronted with actual fires, extreme heat, high humidity and restricted visibility.

Scenario The scenario was designed to be challenging so that it would maximize differences in team coordination and performance. In the scenario, a fire on the third floor of an apartment building had to be extinguished (see Section 8.5.1 on Page 165). After arrival at the scene, the hose was prepared and the first troop used the turntable ladder to reach the roof window, where they entered the building. Subsequently, the firefighters navigated blindly through the smoke-filled building to reach the fire. On their way, an unexpected dummy person had to be found and rescued, and only afterwards was the fire to be extinguished. Impressions of the scenario are presented in Figure 8.2 on Page 167.

Mobile Sensing System In order to record movement and speech activity of each firefighter as well as proximity between firefighters, we developed a mobile sensing system using smartphones (see Section 8.4 on Page 160). Each firefighter placed a smartphone into the left pocket of their jacket. Raw data from the integrated accelerometer, barometer, microphone and GPS receiver were recorded, and ANT radio messages were sent and received between devices to estimate proximity. Additional signal features such as motion intensity were calculated and transferred in real time to a server to visualize system status and team indicators such as movement intensity (see Figure 8.1 on Page 161).

Data Set We recorded 18 training runs of the described scenario. In total, 51 male professional firefighters, aged 35 ± 10, took part in the data collection. In five runs one smartphone partially failed to record; in another three runs one firefighter did not participate in the study. This left us with 10 complete runs totaling over 2 h of training data.

2.2.3 Study 3 - Monitoring Firefighting Teams in the Wild

To show that our smartphone-based team sensing approach can be used in the field, we monitored a squad of professional firefighters from the Zurich fire brigade in order to collect data of firefighting teams during actual incidents (see Section 8.6.1 on Page 170). For the data collection we used our smartphone based sensing system CoenoFire (see Section 8.4 on Page 160).

Scenario Over a period of six weeks, the sensing system was deployed in the fire brigade to monitor squads of firefighters (see Figure 8.4 on Page 171). Each squad consisted of a turntable ladder with three firefighters and a fire truck with five to six firefighters, varying with the station's work plan. During the data collection period, the monitored squads were involved in 76 incidents, of which 43 were triggered by automatic fire alarm systems, 9 were real fire incidents and the rest were other incident types such as a burning garbage container, a person trapped in an elevator or water inside a building.

Data Set The data set includes 76 real-world missions that occurred during 33 monitored work shifts. Overall, 71 firefighters participated in our study and more than 148 mission hours were recorded. In total, we collected 236 sensor recordings and 156 post-incident questionnaires.


2.3 Quantifying Behavioral Mimicry between Team Members

Behavioral Mimicry refers to the unconscious alignment of gestures and postures between two or more interaction partners. It is an important behavioral cue that has been shown to be related to interpersonal affect, liking and rapport [1]. One example of behavioral mimicry is shown in Figure 2.2, in which two persons simultaneously display the face-touch posture.

Figure 2.2: An example of behavioral mimicry: the left and right person both display the face-touch posture. Motion sensors are placed on both lower arms, the back and the head.

In order to quantify behavioral mimicry automatically from motion sensor data, we first detected gestures and postures of each team member independently and then detected whether one team member followed another.

2.3.1 Detection of Gestures and Postures

To detect postures and gestures with wearable motion sensors we relied on activity recognition methods. We focused on seven cues that are important in face-to-face interaction (see Section 3.3.2 on Page 56). We considered static postures (face touch, arms crossed and arms diagonal) and dynamic gestures (gesticulating, fidgeting, posture changes and head nodding).


Methods For the detection of lower arm postures and gestures we applied a two-step approach (see Section 3.3 on Page 55). In a first step, we segmented the motion data of the lower arm sensors into active segments belonging to dynamic gestures and static segments belonging to postures. Depending on the segmentation outcome, these segments were classified in a second step using logistic regression. Head nods were classified independently using only the motion data of the head sensor. For the classification of each segment we used the mean orientation represented by Euler angles and additional statistical features calculated on each axis of the acceleration and gyroscope signals. Because fidgeting and posture changes were enclosed by different static postures, we additionally computed the absolute difference between the mean orientation angles of the previous and the next segment to capture the similarity of the neighbouring static segments. Before classification, highly correlated features (r > 0.8) and features with low standard deviation (σ < 0.01) were removed.
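To make the classification stage concrete, the following Python sketch illustrates the feature pruning and logistic regression steps described above. It is a minimal illustration, not the thesis implementation: the feature matrix and labels are random placeholders, and only the thresholds r > 0.8 and σ < 0.01 come from the text.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def prune_features(X, r_max=0.8, sigma_min=0.01):
    """Drop near-constant features, then one of each highly correlated pair."""
    idx = np.flatnonzero(np.std(X, axis=0) >= sigma_min)
    corr = np.corrcoef(X[:, idx], rowvar=False)
    selected = []
    for j in range(len(idx)):
        if all(abs(corr[j, k]) <= r_max for k in selected):
            selected.append(j)
    return idx[selected]

# X: per-segment features (mean Euler angles plus per-axis statistics of
# acceleration and gyroscope signals); y: segment labels, e.g.
# 0 = gesticulating, 1 = fidgeting, 2 = posture change.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 24))     # placeholder feature matrix
y = rng.integers(0, 3, size=200)   # placeholder labels
cols = prune_features(X)
clf = LogisticRegression(max_iter=1000).fit(X[:, cols], y)
```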

Evaluation For classifier training and evaluation, we manually labeled 240 min of discussion data (see Section 3.2.3 on Page 54). The subject-independent performance of the individual classifiers is summarized in the combined confusion matrix shown in Figure 2.3.

• The face touch posture could be detected with high accuracy across all participants, whereas the arm postures arms diagonal and arms crossed could not be well discriminated. This was because of magnetic disturbances that occurred when subjects moved their arms over the table, which led to wrong heading information. Collapsing arms diagonal and arms crossed into one arms flat class led to a subject-independent accuracy of 99 %.

• Dynamic arm gestures were recognized with 66.8 % accuracy and confused with each other about 6 % to 24 % of the time. The confusion between gesticulating and fidgeting was due to lightly performed instances of gesticulating and more accented instances of fidgeting. The confusion between posture changes and fidgeting could be explained by the difficulty of discriminating the static postures arms crossed and arms diagonal, as in this case the neighbourhood features, which were important to differentiate fidgeting from posture changes, also suffered from magnetic disturbances.


Head cues (% of time)        nodding   not nodding
  nodding                      72.1        27.9
  not nodding                   2.8        97.2

Dynamic cues (% of time)  gesticulating  fidgeting  posture change
  gesticulating                71.4        22.8          5.8
  fidgeting                    24.6        67.2          8.2
  posture change               13.7        24.4         61.9

Static cues (% of time)    face touch  arms crossed  arms diagonal
  face touch                   98.0         0.2           1.8
  arms crossed                  0.0        62.0          38.0
  arms diagonal                 0.0         4.6          95.4

Figure 2.3: Detection performance of individual nonverbal cues. Time-based confusion matrix (rows: ground truth, columns: prediction) for the 30 subjects of the evaluation set. Independent detection results for head-related, dynamic and static cues are combined.

• Nodding was detected in 72.1 % of the ground truth time, and 97.2 % of the time was correctly recognized as not nodding. Thus, nodding was detected with 84.7 % accuracy. The confusion stemmed from insertions and deletions and was mostly due to individual nodding habits (see Table 3.2 on Page 61).

2.3.2 Measuring Behavioral Mimicry

Following LaFrance [2], we defined mimicry as an event that occurs when person B follows person A in her behavior, that is, she displays the same gesture or posture. Behavioral mimicry of one behavioral cue is illustrated schematically in Figure 3.2 on Page 58. To count as a mimicry event, person B needed to start displaying behavior x after person A started, but at most one second after person A had stopped displaying behavior x (see Section 3.4 on Page 56).
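This event definition translates directly into a comparison of display intervals. The sketch below is a minimal illustration (not the thesis code) and assumes each person's displays of a cue are available as a list of (start, stop) times in seconds:

```python
def count_mimicry(a_intervals, b_intervals, max_lag=1.0):
    """Count events where B starts displaying the cue after A started,
    but no later than max_lag seconds after A stopped displaying it."""
    events = 0
    for b_start, _ in b_intervals:
        if any(a_start < b_start <= a_stop + max_lag
               for a_start, a_stop in a_intervals):
            events += 1
    return events

# Example: A touches the face twice; B follows the first touch in time.
a = [(10.0, 14.0), (30.0, 33.0)]
b = [(14.5, 16.0), (50.0, 52.0)]
print(count_mimicry(a, b))  # -> 1
```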

2.3.3 Behavioral Mimicry & Leadership Style (Study 1)

Within the leadership study (Study 1, Section 2.2.1) we investigated how leadership style influenced mimicry behavior between leaders and followers. We quantified behavioral mimicry between leaders and followers in two directions (see Section 3.5.2 on Page 63). We counted how often the leader mimicked one or both of his followers and how often one of the followers mimicked their leader's behavior. We used random permutation tests to analyze mimicry behavior in 24 teams that were led by an individually considerate (IC) leader and in 22 teams that were led by an authoritarian (AU) leader. Further, we assumed that errors made during the automatic detection of individual gestures and postures were distributed equally across the two groups and were independent of leadership style. In the following, we summarize our findings based on the automatic detection of mimicry (see Section 3.5.2 on Page 63):

• IC leaders more often mimicked the nodding behavior of their followers (p = 0.031, Cliff's delta = −0.20). On average, IC leaders mimicked 23.2 % of their followers' nods, whereas AU leaders mimicked only 10.9 % of the nods of their followers. The higher amount of nodding mimicry might have signaled that IC leaders gave more attention to their followers.

• IC leaders more often mimicked posture changes of their followers (p < 0.001, Cliff's delta = −0.51). On average, IC leaders mimicked 5.2 % of their followers' posture changes, whereas AU leaders did not mimic posture changes of their followers.

• Followers who were led by an IC leader more often mimicked the face touch of their leaders (p = 0.038, Cliff's delta = −0.29). On average, followers of an IC leader mimicked 10.4 % of their leaders' face touches, whereas followers of AU leaders mimicked 3.3 % of their leaders' face touches.

Given our definition of mimicry measurement, we found more behavioral mimicry in teams that were led by an IC leader. Because mimicry is related to empathy, trust and the smoothness of a conversation [1], this finding fits into the theoretical framework of considerate leadership, which places the focus on the personal relationship between leaders and their followers [3].


2.4 Measuring Verbal Communication of Teams in Mobile and Noisy Scenarios

In this thesis, we focused on the amount and timing of communication and did not attempt to recognize informational content. While the low background noise of the lab experiment (Study 1, Section 2.2.1) allowed the use of an energy-based speaker diarization algorithm to detect who speaks when (see Section 4.4.2 on Page 76), the noisy outdoor environment of the firefighting scenario (Study 2, Section 2.2.2 and Study 3, Section 2.2.3) required a more noise robust approach. Therefore, we developed a noise robust voice activity detector to detect speech in the vicinity of team members.

2.4.1 Measuring Speech Activity in Noisy Environments

To detect speech in noisy environments, we utilized dictionary learning and sparse representation of the noisy audio signal (see Chapter 5 on Page 87). Sparse representation refers to the representation of a signal by a linear combination of dictionary atoms in such a way that the approximation error and the number of used dictionary atoms are minimized. The trade-off between approximation error and sparsity is controlled by a factor. In dictionary learning, one searches for the dictionary that best represents some training data given the sparsity inducing factor.
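Written out in a common l1-regularized form (shown for illustration; the thesis text specifies only a sparsity-controlling factor, denoted λ here), the sparse code α of a signal frame x over a dictionary D solves

```latex
\hat{\alpha} \;=\; \arg\min_{\alpha}\; \tfrac{1}{2}\,\lVert x - D\alpha \rVert_2^2 \;+\; \lambda\,\lVert \alpha \rVert_1
```

and dictionary learning minimizes the same cost jointly over D (typically with unit-norm atoms) and the codes of all training frames.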

We first learned a dictionary on clean speech data to best represent noise-free speech. As training data, we sampled 10^6 voiced speech frames from 200 randomly selected sentences of the TIMIT database [4]. To learn the clean speech dictionary, we computed the dictionary that best represents the training data while having a sparse solution. Using the clean speech dictionary to find a sparse representation of the noisy speech signal enhanced the signal-to-noise ratio, since speech was better represented than noise. Thus, the sparse coefficients enabled speech detection even in low signal-to-noise conditions.

The details of the speech detection chain are presented in Figure 5.2 on Page 91. In the pre-processing stage, the audio signal was framed (30 ms, 50 % overlap) using a Hamming window and then transformed into the sparse representation with the clean speech dictionary. Frames of the sparse representation were used to calculate features on longer windows (1 s to 2 s), which were fed into a logistic regression classifier for speech / non-speech detection (see Section 5.3.5 on Page 92).
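The following Python sketch mirrors this chain with scikit-learn. It is a simplified illustration on random placeholder audio; the dictionary size, sparsity factor and pooled features are assumptions rather than the thesis settings.

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning, sparse_encode
from sklearn.linear_model import LogisticRegression

FS = 16000                      # assumed sampling rate
FRAME = int(0.030 * FS)         # 30 ms frames
HOP = FRAME // 2                # 50 % overlap

def frame_signal(x):
    """Hamming-windowed frames with 50 % overlap, one frame per row."""
    win = np.hamming(FRAME)
    n = (len(x) - FRAME) // HOP + 1
    return np.stack([x[i*HOP:i*HOP+FRAME] * win for i in range(n)])

rng = np.random.default_rng(0)

# 1) Learn a dictionary on clean speech frames (random placeholder here;
#    the thesis sampled voiced frames from TIMIT).
clean = frame_signal(rng.standard_normal(4 * FS))
dico = DictionaryLearning(n_components=64, alpha=1.0, max_iter=5).fit(clean)

# 2) Sparse-code noisy audio and pool code magnitudes over 1 s windows.
noisy = frame_signal(rng.standard_normal(10 * FS))
codes = sparse_encode(noisy, dico.components_,
                      algorithm='lasso_lars', alpha=1.0)
per_sec = FS // HOP
feats = np.array([np.abs(codes[i:i + per_sec]).mean(axis=0)
                  for i in range(0, len(codes) - per_sec, per_sec)])

# 3) Train a speech / non-speech classifier on the windowed features.
labels = np.arange(len(feats)) % 2   # placeholder labels
clf = LogisticRegression(max_iter=1000).fit(feats, labels)
```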


Evaluation We evaluated the proposed speech detection algorithm using two data sets that we created by combining clean speech with different noise types at three signal-to-noise ratios (0 dB, −5 dB, −10 dB). Additionally, we analyzed the detection performance on ambient sound data that was recorded during a firefighting training mission (see Section 5.5 on Page 94).

• On average, speech could be detected with an accuracy of 87 % and 92 % on window lengths of one and two seconds, respectively. A comparison to an alternative noise robust approach [5] showed an improvement in accuracy of 6 % (see Section 5.4.2 on Page 93).

• Evaluation on firefighting training data showed that speech could be detected on average with 85 % accuracy in a noisy firefighting scenario from smartphones that were placed in the breast pocket of the firefighting jacket (see Section 5.5 on Page 94).

2.4.2 Speech Activity Cues & Leadership Style (Study 1)

In the leadership experiment (Study 1, Section 2.2.1) we examined how well speech activity cues could discriminate individually considerate and authoritarian leaders (see Chapter 4 on Page 69). Speech activity cues such as speaking time and number of speaking turns summarized the speaking behavior of each leader during discussion periods (see Section 4.4.2 on Page 76). We calculated the speech activity cues on discussion slices of 1 min to 6 min in length (moving window approach).

For the analysis of the speech activity cues, we used logistic regression to discriminate individually considerate from authoritarian leadership. We only considered the speech activity cues of the group leaders and did not take into account the behavior of the followers.

• We analyzed the minimum slice length needed to discriminate the two leadership styles. We found four minutes to be the optimal slice length, discriminating leadership style with 75.5 % accuracy (see Figure 4.4 on Page 81).

• The ability to distinguish the leadership styles depended on the relative discussion time. The best classification accuracy was achieved towards the middle of the discussion (see Figure 4.4 on Page 81).


Further, we investigated how the two leadership styles affected the verbal communication patterns of the leader. The analysis of the regression coefficients (see Figure 4.5 on Page 82) showed that individually considerate leaders not only had shorter turns and spoke less, but also used more short utterances and interrupted followers more often. Taken together, these findings could signal effective listening, which would be in line with the literature on considerate leadership [3, page 7].

2.4.3 Team Speech Activity & Team Performance (Study 2)

In the firefighting training scenario (Study 2, Section 2.2.2) we investigated the link between speech activity and team performance (see Section 8.5.2 on Page 168). The analysis was done for two phases: the preparation phase, which started with the arrival on site and ended when the first troop was ready to enter the building, and the execution phase, which included navigating through the building, rescuing the dummy person and extinguishing the fire. We found the following correlations (see Section 8.5.2 on Page 168):

• Speech activity during the preparation phase was not correlated with the duration of the preparation phase. We believe this finding was to be expected, since the preparation followed standard procedure and did not require any exchange of additional information between the firefighters.

• Teams that needed longer for the execution phase showed more speech activity than faster teams (r = 0.57, p < 0.05, N = 16). This might have indicated an extra need for explicit coordination between the troop and the incident commander.

• Though not statistically significant, teams that needed longer to complete the training mission also tended to speak more (r = 0.39, ns, N = 16).


2.5 Capturing Spatial and Temporal Coordination in Teams

This section summarizes how we measured spatial and temporal aspects of team coordination (see Chapter 7 on Page 123). Our approach includes three steps: 1) the detection of moving sub-groups to capture the spatial aspect of team coordination (Section 2.5.1); 2) the estimation of temporal movement alignment between team members to capture the temporal aspect of team coordination (Section 2.5.2); 3) a set of team coordination indicators which summarize pairwise coordination measures between team members on the team level (Section 2.5.3).

2.5.1 Detecting Moving Sub-Groups

Estimating Proximity We estimated proximity between individuals by sending and receiving radio messages between smartphones. We assumed that individuals wearing the smartphone were in sight and in proximity of each other when radio messages could be received. Instead of Bluetooth, we used the low-power ANT radio protocol. To search for nearby devices, we implemented a list search scanning repeatedly for the presence of predefined devices (see Section 6.2.1 on Page 105). This allowed us to scan for present devices every 2 s. In case a device was not present, search timeouts were likely to occur, which increased the update interval to 4 s (see Section 6.2.2 on Page 106). Because radio waves are influenced by external factors such as the environment [6], maximum transmission ranges vary. We tested the maximum search distance at which messages were still received from the transmitting devices in different environments. On average, we observed about 1 m range for back-to-back, 1 m to 4 m for face-to-back, and 9 m to 20 m for face-to-face configurations (see Section 6.2.3 on Page 108).

Sub-Group Detection To detect moving sub-groups from proximity information, we followed a two-step approach (see Section 6.3 on Page 109): for consecutive time intervals, we first evaluated between which devices messages were transmitted. In the second step, moving sub-groups were clustered. To be part of a sub-group, a group member needed to be connected to at least one other group member (single-link criterion). As a result, if A was connected with B and B with C, but not with A, all three individuals were still clustered into one group. Clusters were first identified independently for each time interval and then smoothed with a temporal filter, so that clusters persisted for at least 10 s.

Using only radio-based proximity information led to individuals on different height levels (e.g. different floors of a building) being clustered into one group. To address this problem, we added height information derived from the barometer in the smartphone. If the absolute atmospheric pressure difference between two devices was greater than a predefined threshold, the two devices were considered to be on different height levels and were thus not clustered into the same sub-group.
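A minimal sketch of one clustering interval, assuming per-interval radio links and per-device pressure readings are available (the 0.3 hPa threshold is illustrative, not the thesis value, and the subsequent 10 s temporal smoothing is omitted):

```python
def subgroups(members, links, pressure, pressure_thresh=0.3):
    """Single-link clustering for one time interval: connected components
    over radio links, cutting links between devices whose pressure
    difference exceeds the threshold (different height levels)."""
    parent = {m: m for m in members}

    def find(m):
        while parent[m] != m:
            parent[m] = parent[parent[m]]
            m = parent[m]
        return m

    for a, b in links:
        if abs(pressure[a] - pressure[b]) <= pressure_thresh:
            parent[find(a)] = find(b)

    groups = {}
    for m in members:
        groups.setdefault(find(m), []).append(m)
    return list(groups.values())

# Example: A-B and B-C exchange messages, but C is one floor higher.
members = ['A', 'B', 'C', 'D']
links = [('A', 'B'), ('B', 'C')]
pressure = {'A': 965.0, 'B': 965.1, 'C': 961.0, 'D': 965.0}
print(subgroups(members, links, pressure))  # [['A', 'B'], ['C'], ['D']]
```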

We evaluated our algorithm for detecting moving sub-groups of firefighters during the described training scenario by comparing the results to a manually annotated ground truth. On average, team members were assigned to the correct sub-group with 95 % accuracy (see Section 6.5 on Page 114).

Visualization To visualize the detected sub-groups, we used narrative charts to show how sub-groups merge and split. In Figure 2.4, the narrative chart of a training mission is presented. Each colored line represents one team member, and lines that are close together indicate a moving sub-group. Additionally, it is shown which groups operate on the ground floor and which ones operate above ground level. This form of visualization allows firefighters to quickly gain an overview of how a mission evolved over time.


[Narrative chart: one colored line per team member (Incident Commander, Troop 1, Troop 2, Engineer, Ladder Operator) over mission time in seconds, separated into Above Ground and Ground Level lanes.]

Figure 2.4: Visualization of the group clustering of a firefighting training mission. ANT-based proximity was combined with atmospheric pressure signals to also consider height differences of firefighters.


2.5.2 Temporal Movement Alignment

Temporal coordination indicates how well team members align their actions in time. As firefighting requires team members to work and move together, we assumed that coordinated team members change their activity level at similar points in time. We therefore captured temporal movement alignment between team members by comparing their motion activity levels (see Section 7.4.2 on Page 136).

The motion activity level expresses the fraction of time that an individual is active within a moving window of length L. It is therefore independent of the motion intensity, as long as the intensity is above the minimum required to count as active. The value of L affects the temporal resolution: a small value requires individuals to change their activity level closer in time, whereas a larger value allows for a delay between activity changes.

To measure the statistical dependency between two activity level signals, we used mutual information. Mutual information is an information theoretic measure which captures how much information two random variables share, in our case two activity level signals.
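As an illustration, the activity level and a histogram-based mutual information estimate can be sketched as follows (window length, activity threshold and bin count are placeholder choices, not the thesis parameters):

```python
import numpy as np

def activity_level(intensity, window, threshold=0.1):
    """Fraction of active samples (intensity above threshold) within a
    moving window of the given length."""
    active = (intensity > threshold).astype(float)
    return np.convolve(active, np.ones(window) / window, mode='valid')

def mutual_information(x, y, bins=8):
    """Histogram estimate of the mutual information between two signals."""
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy /= pxy.sum()
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)
    nz = pxy > 0
    return float((pxy[nz] * np.log2(pxy[nz] / np.outer(px, py)[nz])).sum())

rng = np.random.default_rng(1)
a = rng.random(600)                            # motion intensity, member A
b = np.clip(a + 0.1 * rng.random(600), 0, 1)   # member B moves with A
la, lb = activity_level(a, 50), activity_level(b, 50)
print(mutual_information(la, lb))              # high value: aligned movement
```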

2.5.3 Team Coordination Indicators

Based on the detection of moving sub-groups and the estimation of temporal movement alignment, we constructed two types of team networks. The Sub-Group Network expressed how long each team member was in a sub-group with each other team member. The Movement Alignment Network captured how well each pair of team members aligned their motion activity levels. Figure 2.5 shows examples of sub-group networks of firefighting teams that performed the training mission of Study 2 (see Section 2.2.2).

Based on social network analysis metrics, we proposed a set of team coordination indicators to aggregate and characterize the structure of the two types of team networks (see Section 7.4.3 on Page 138). We used density to quantify the average connectedness of the network and degree centralization to measure how central the most central node was in relation to all other nodes.
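Both indicators can be sketched on a toy, unweighted team network with networkx (the thesis networks are weighted; node names are illustrative):

```python
import networkx as nx

def degree_centralization(G):
    """Freeman degree centralization: how strongly the most central
    node dominates all others (1.0 for a star, 0.0 for a cycle)."""
    n = G.number_of_nodes()
    if n <= 2:
        return 0.0
    c = nx.degree_centrality(G)   # degree / (n - 1)
    c_max = max(c.values())
    return sum(c_max - v for v in c.values()) / (n - 2)

# Toy sub-group network: an edge means two members shared a sub-group.
G = nx.Graph([('Commander', 'Engineer'), ('Commander', 'Troop1'),
              ('Troop1', 'Troop2'), ('Troop2', 'LadderOperator')])
print(nx.density(G))             # average connectedness of the team
print(degree_centralization(G))  # dominance of the most central member
```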

2.5.4 Team Coordination Indicators & Team Performance (Study 2)

In the firefighting training scenario (Study 2, Section 2.2.2) we analyzed the correlation between the proposed team coordination indicators and perceived coordination.


Sub-group network statistics (centralization / density / duration):
  fastest team:      0.0976 / 0.4234 / 10.57 min
  2nd fastest team:  0.1414 / 0.5789 / 10.63 min
  2nd slowest team:  0.0582 / 0.4754 / 15.90 min
  slowest team:      0.0651 / 0.5775 / 16.23 min

Figure 2.5: Sub-group networks of the two fastest and the two slowest firefighting teams completing the training mission of Study 2 (see Section 2.2.2). Team members are represented by nodes whose size is proportional to how long a team member was on average in a sub-group with each other team member. Links between nodes represent the time that the corresponding team members were in one sub-group; nodes of team members who were longer in the same sub-group are drawn closer together. Roles (troop at top of ladder, troop inside building, ground support, ladder operator, incident commander, engineer) are indicated by different colors.

Further, we investigated the link between the coordination indicators and team performance as measured by completion time (see Table 7.1 on Page 141). We found:

• Degree centralization of the sub-group network was highly negatively correlated with completion time (ρ = −0.83, p < 0.05, N = 10) and positively correlated with implicit coordination (ρ = 0.68, p < 0.05, N = 10). Faster teams showed higher degree centralization in the sub-group network, meaning that team members were more heterogeneously distributed across sub-groups. Some firefighters were in well connected sub-groups for a long time, whereas others were longer on their own or part of a small sub-group.

• Density of the activity coordination networks was highly negatively correlated with completion time (ρ = −0.81, p < 0.01, N = 10) and positively correlated with implicit coordination (ρ = 0.53, ns, N = 10), meaning that teams which showed higher activity coordination were faster and perceived their implicit coordination as better compared to teams which showed less activity coordination.


The visual analysis of the sub-group networks (see Figure 2.5 and Figure 7.7 on Page 145) explained why faster teams had higher degree centralization in the sub-group network: faster teams tended to have one firefighter on top of the ladder (red) and two well connected sub-groups, the troop and the remaining firefighters on the ground. The observed high degree centralization was due to faster teams splitting more quickly into these three groups, so that the firefighter on top of the ladder was alone for a relatively long time, which explains his low degree centrality. Because the other firefighters were part of a sub-group, their respective degrees were higher; hence the high degree centralization.

In terms of temporal coordination, the results showed that faster teams exhibited higher movement alignment. This finding seemed reasonable, as it indicated that firefighters in faster teams worked well together and aligned their movements accordingly. The visual inspection of the movement alignment networks (see Figure 7.8 on Page 146) showed that the highest movement alignment occurred between members who worked closely together (e.g. the troop inside the building).


2.6 Observing Teams in the Wild

In this section, we summarize the data collection procedure (see Section 8.6.1 on Page 170) that we designed to collect behavioral data from professional firefighting teams in real-world missions (Study 3, Section 2.2.3). Further, we present the main factors which influenced the data collection. Additionally, we show how the sensor data can be visualized to become a useful tool for data supported mission feedback.

2.6.1 Data Collection Procedure

We integrated the data collection procedure into the daily routine of the fire brigade as well as possible. The smartphones were placed on a sideboard close to the fire truck (see Figure 8.4 on Page 171) and connected to the charger so that they were ready to be used at any time. As soon as an alarm occurred, each firefighter unplugged his assigned smartphone and put it inside the left breast pocket of his jacket. Unplugging the smartphone from the charging cable triggered the recording app to automatically start the data collection. The recordings were stopped as soon as the phones were reconnected to the charging cables after the firefighters returned to the station. Reconnecting also triggered a short questionnaire to assess perceived workload and coordination of the incident.

Evaluation of Data Completeness Over the course of the data collection, the monitored squad was involved in 76 incidents. This resulted in 621 expected recordings (the number of incidents times the number of firefighters involved in each incident). In 93 % of the expected recordings the smartphones were charged and ready to be used. However, given that the phones were charged, only 41 % of the expected recordings were collected. To better understand when firefighters did not carry the smartphone, we investigated several factors (see Section 8.6.4 on Page 175). We found the following to be most relevant:

• The data completeness rate decreased over the period of the data collection. While within the first two weeks 62 % of all expected recordings were completed, the completeness rate dropped to 42 % in the second fortnight and reached 28 % in the last two weeks. One reason for this was that the end of the data collection was not well communicated to the firefighters; another reason was that they did not see any immediate benefit from the data collection.

• We noticed that firefighters of the fire truck remembered the phone almost twice as often compared to firefighters of the turntable ladder. Most likely this was because the smartphones were located close to the fire truck, but further away from the turntable ladder.

• We observed a higher than average data completeness for incidents that occurred in the afternoon and the lowest for incidents at night. At the first incident of the day and at night, firefighters forgot to pick up the smartphones more often.

2.6.2 Data Visualization for Mission Feedback

We considered a data visualization that shows how team members work together over time. We chose to present raw atmospheric pressure data to show when team members moved up or down, moving sub-groups to show which firefighters were close together and when, motion intensity to illustrate physical activity, and speech activity to visualize team communication. Additionally, mission phases such as approach and on-site were indicated by different background shading.

Figure 2.6 shows an example visualization of the smartphone sensor data collected during an actual fire in an apartment building. From the visualization it can easily be spotted when the first (see 2a, 2b) and second troop (see 3a, 3b) reached the third floor, where the fire was located, and when the missing persons were rescued and carried to the ground floor (see 4a, 5a). Additionally, it can be seen that the squad merged (see 1b) and showed high motion intensity (see 1c) during the approach phase. This was due to imprecise location information, which required the firefighters to check the first possible incident location. Shortly after the arrival on site, team communication decreased, which can be seen from the decreasing amount of detected speech activity (see minutes 11-13); most likely, at that stage of the mission trained automatisms were at play and every firefighter knew what to do.

Positive feedback from the firefighters confirmed the usefulness of the proposed data visualization. The training instructor of the fire brigade found the visualization to be a "great" tool for post-incident feedback and training, allowing one to quickly gain an overview of a mission.


[Four time-aligned panels over mission time in minutes: atmospheric pressure [hPa], group proximity (ground level / higher level), motion intensity [m/s^2] and speech activity [%] for the Incident Commander, Troop 1, Troop 2 and the Engineer; approach and on-site phases are shaded, with annotated events 1b-5c.]

Figure 2.6: Impressions and visualization of the smartphone data recorded during the first 30 minutes of a real-world firefighting mission in a multi-family residential home. Mission time starts as soon as the firefighters leave the station. Shown are, from top to bottom, atmospheric pressure, group proximity, movement intensity and speech activity. The pressure signals alone indicate when the first (2a) and second troop (3a) reached higher floors and when two missing persons were rescued (4a, 5a).


2.7 Conclusions

This thesis investigated the potential of observing teams with wearable sensors. The major aim of the thesis was to develop and evaluate methods to automatically extract behavioral team metrics from sensor data in order to characterize team processes. We focused on three important behaviors:

• Behavioral Mimicry relates to the affective relationship between team members. We used on-body motion sensors and activity recognition methods to detect gestures and postures of team members. Head nodding was detected with 85 % subject-independent accuracy, face touch and flat arms with 99 %, and gesticulating, fidgeting and posture changes with 67 %. We quantified behavioral mimicry by counting how often a team member followed another member in showing the same gesture or posture.

• Verbal Communication: We measured the total amount of speech within a team to capture the amount of direct communication. We developed a noise robust speech activity detection algorithm to reliably detect speech in noisy environments. Evaluations showed that even when the smartphone was placed in a jacket pocket, speech in the vicinity of a team member could be detected with an average accuracy of 85 %.

• Coordination: We captured spatial and temporal aspects of coordination in firefighting teams by detecting moving sub-groups and estimating temporal body motion alignment between team members. In a training scenario, team members were assigned to the correct sub-group with an average accuracy of 95 %. Using metrics from social network analysis, we proposed a set of team coordination indicators that significantly correlated with the completion time of the training mission and with perceived coordination.

We have placed the proposed team metrics in the context of the team effectiveness framework. In the leadership experiment (Study 1, Section 2.2.1), we have investigated how leadership style affected mimicry between leader and followers as well as the communication behavior of the leader. In the firefighting training scenario (Study 2, Section 2.2.2), we have analyzed how team coordination indicators and speech activity correlated with team performance.


Further, we developed a smartphone-based sensing system to unobtrusively observe teams in their working environment. By deploying the mobile sensing system in a fire brigade over a period of six weeks, we demonstrated that the system can be used to monitor teams in the wild. Positive feedback from the training instructors showed that the visualization of sensor data can be a useful tool for mission and training feedback.

2.8 Limitations

The thesis demonstrated the successful deployment of wearable systems to observe teams in their natural environments. However, the following limitations remain:

• Using wearable sensors requires all team members to wear sensing devices in order to capture the complete team interaction. If a team member does not take part in the data collection, or a sensing device fails, an incomplete view of the team interaction is recorded and teams cannot easily be compared.

• To capture the amount of communication between team members, we detected speech in the vicinity of each team member. In the firefighting scenario we did not discriminate who speaks or whether team members communicated in person or over the radio. In real-life incidents, the measured amount of communication is likely to be influenced by third parties who are not equipped with the sensing system.

• For the detection of moving sub-groups we relied on radio messages sent back and forth between devices. However, because radio signals are affected by transceiver movement and environmental conditions, the search distance varies. In our experiments the maximum search distance was in the range of 1 m to 20 m. This accuracy proved to be good enough for the detection of moving sub-groups during firefighting. In scenarios where sub-groups are closer together, such as during social events, the spatial resolution might not be sufficient.

• We measured temporal coordination as simultaneous change in activity level. As the measure captures temporal alignment of body movement independent of the actual activity, it can only estimate temporal coordination in settings where team members move together to solve the task (e.g. in firefighting). In office scenarios this measure of temporal coordination is likely not meaningful, as simultaneous change in body movement is not important in typical office teams.

2.9 Outlook

This thesis has presented a promising research perspective that might lead to a new tool for team behavior observation in team research, training and monitoring. However, to become an accepted and practical tool, further research is needed and may address the following challenges:

• Generalizability - Our experiments showed the potential of the team metrics in two limited settings. Because the focus of the thesis was on the extraction of team metrics from sensor data, the links between the team metrics and team input and outcome measures were investigated only exemplarily. In order to validate the generalizability of the approach, future research should investigate how variables such as team type, task and organizational context affect the team metrics.

• Other modalities - In this thesis, we limited ourselves to three modalities: body motion, speech and proximity. Future studies could consider physiological signals to capture emergent states related to stress and affect. Furthermore, cameras integrated into new wearable devices could be used to analyze facial expressions of interaction partners in natural environments.

• Missing data problem - We have assumed that all team members wear the sensing device, no sensor errors occur and no third parties are involved. However, in practice these assumptions are not always valid: team members might forget to wear the device, sensors may fail to record data, and third persons who are not monitored may interact with team members. In this work, we have ignored all incomplete data. To handle the missing data case more efficiently, future research should investigate what type of information can be recovered from other team members' data in which kinds of scenarios, and how interaction with third parties wearing no sensors can be detected.

Page 64: In Copyright - Non-Commercial Use Permitted Rights ...46843/eth-46843-02.pdfDiss. ETH No. 21894 Observing Teams with Wearable Sensors A dissertation submitted to ETH Z¨urich for the

Bibliography

[1] T. L. Chartrand and R. van Baaren, “Human mimicry,” in Advances in Experimental Social Psychology, ch. 5, Academic Press, 2009.

[2] M. Lafrance and M. Broadbent, “Group rapport: Posture sharing as a nonverbal indicator,” Group & Organization Management, vol. 1, no. 3, pp. 328–333, 1976.

[3] B. M. Bass and R. E. Riggio, Transformational Leadership. Routledge,2nd ed., 2006.

[4] J. S. Garofolo, L. F. Lamel, W. M. Fisher, J. G. Fiscus, D. S. Pallett, N. L. Dahlgren, and V. Zue, “TIMIT acoustic-phonetic continuous speech corpus,” 1993.

[5] P. K. Ghosh, A. Tsiartas, and S. Narayanan, “Robust voice activity detection using long-term signal variability,” IEEE Trans. Audio, Speech, and Language Processing, vol. 19, no. 3, pp. 600–613, 2011.

[6] B. Fong, P. Rapajic, G. Hong, and A. Fong, “Factors causing uncertainties in outdoor wireless wearable communications,” IEEE Pervasive Computing, vol. 2, no. 2, pp. 16–19, 2003.


3 Quantifying Behavioral Mimicry

Sebastian Feese, Bert Arnrich, Gerhard Tröster, Bertolt Meyer, Klaus Jonas

Quantifying Behavioral Mimicry by Automatic Detection of Nonverbal Cues from Body Motion

Proceedings of the International Conference on Social Computing (SocialCom), pp. 520–525, 2012.

© 2012 IEEE.


Abstract

Effective leadership can increase team performance; however, the underlying micro-level behaviors that support team performance are still unclear. At the same time, traditional behavioral observation methods rely on manual video annotation, which is a time consuming and costly process. In this work, we employ wearable motion sensors to automatically extract nonverbal cues from body motion. We utilize activity recognition methods to detect relevant nonverbal cues such as head nodding, gesticulating and posture changes. Further, we combine the detected individual cues to quantify behavioral mimicry between interaction partners. We evaluate our methods on data that was acquired during a psychological experiment in which 55 groups of three persons worked on a decision-making task. Group leaders were instructed to either lead with individual consideration or in an authoritarian way. We demonstrate that nonverbal cues can be detected with an F1-measure between 56 % and 100 %. Moreover, we show how our methods can highlight nonverbal behavioral differences between the two leadership styles. Our findings suggest that individually considerate leaders mimic head nods of their followers twice as often, and that their face touches are mimicked three times as often by their followers, when compared with authoritarian leaders.

3.1 Introduction

Whenever we interact with others, we not only take conversational turns and thus coordinate the flow of speech, but we also tend to align our gestures, postures, body movements and other nonverbal behaviors to match those of our interaction partners. Even though we are most often unaware of this matching and synchronization of our nonverbal behavior, it “is present in nearly all aspects of our social lives, helping us to negotiate our daily face-to-face encounters” [1].

In psychology the matching of nonverbal behaviors during face-to-face interaction is known as mimicry which includes facial, prosodicand behavioral mimicry. Mimicry has been found to relate to liking andrapport and in fact to be a unconscious tool to a�liate and disa�liatewith others [2]. When individuals want to a�liate with others theyunconsciously engage in more mimicry, whereas they mimic less whenthey disa�liate. It has been shown that mimicry can lead to empathywhich helps in understanding the emotions of others and that it canlead to more similar attitudes and shared viewpoints. Mimicry leads to


more pro-social behavior towards the mimicker. Altogether, mimicry "binds and bonds" people together and supports their face-to-face interactions [2].

Despite the fact that mimicry is pervasive in human behavior and has led to a vast amount of literature in psychology, there exist only a few approaches to measure and quantify it. In psychotherapy, the measurement of body movement synchronization has been shown to be a useful tool to track the quality of the patient-therapist relationship [3].

We envision that this tracking of relationship quality is a valuable tool outside psychotherapy as well; for example, it could be a complementary tool to measure the interpersonal relationships of employees with their leader. In a leadership training scenario in which a team leader learns to engage and motivate his followers, the coordination measures could be part of objective feedback to the trainees about their interaction with their followers.

Traditional measurements of behavioral mimicry depend on video footage and manual encoding of mimicry, which is time consuming, costly and bound to the lab setting. In order to go out of the lab and measure natural mimicry behavior in everyday situations without the need for any fixed infrastructure, we employ wearable motion sensors. In this work, we evaluate the potential of such a wearable setup.

3.1.1 Paper Contribution

In this paper, we concentrate on information derived from wearable motion sensors to automatically detect nonverbal cues of interaction partners. We present the following results:

1. We show how on-body motion sensors can provide quantitative information about the nonverbal behavior of humans in naturalistic interactions. In particular, we present a method to detect individual nonverbal cues from motion sensor data using activity recognition methods. We evaluate our methods on an evaluation set of 30 participants.

2. We show how the automatically detected nonverbal cues of individuals can be combined to capture the behavioral interdependencies between two or more interacting persons to measure and assess behavioral mimicry.


3. We apply our methods to sensor data recorded during a recent psychological study on leadership behavior in teams [4] and demonstrate how they can help to uncover nonverbal behavior differences of two different leadership styles.

3.1.2 Related Work

Psychology Background

Leadership has been examined from many perspectives. Much work has been devoted to understanding different leadership styles. Two important types are individualized considerate and authoritarian leadership. Individualized considerate leadership is a person-focused leadership style. Considerate leaders pay special attention to their followers' needs and listen effectively [5]. As such, individual consideration is connected to a "preference for and use of two-way communication, empathy, and willingness to use delegation" [5, page 132]. As a substantial facet of transformational leadership, individual consideration has been found to increase team performance particularly well [6]. In contrast to individually considerate leaders, authoritarian leaders take decisions without consulting their followers [7]. Consequently, authoritarian leadership can only work as long as there is no need for input from followers and their motivation does not depend on their involvement in the decision-making process.

Social Computing

Previous work on automatic analysis of social interactions in small groups has dealt with automatic inference of conversational structure, analysis of social attention and the detection of personality traits and roles. A review on the topic can be found in [8]. These works have mostly relied on speech-related cues such as speaking length, speaker turns and number of successful interruptions. More recently, nonverbal cues extracted from audio and video have been used for predicting group cohesion [9] and identifying emerging leaders in small groups [10]. Individually considerate leaders were characterized using speech activity cues in [4]. First steps to measure mimicry using vision-based methods have been taken in [3, 11]. These works aim to measure movement synchronization on a lower level using abstract signal features, whereas our approach relies on discrete nonverbal cues and is thus closer to the works of LaFrance [12] and Chartrand et al. [13]. In the


field of human-robot interaction, head nods have been detected using vision-based methods [14]. In contrast to previous works on meeting corpora, our approach to extract nonverbal cues relies on sensor data from wearable motion sensors. Pentland and collaborators first investigated how wearable sensors can be employed to measure 'honest signals' that capture aspects of human behavior in daily life [15].

3.2 Experiment

For this work, we use data recorded during a recent psychological experiment on leadership [4]. During the experiment, participants worked in groups of three (one leader, two followers) on a simulated personnel selection task. Each group was asked to rank four fictitious candidates with regard to their suitability for an open job position. For the task, each group member received five pieces of information about each candidate that were partly shared among group members (hidden-profile decision-making task). Under the guidance of the group leader, the group discussed the suitability of each candidate and was asked to agree on a rank order, which served as a measure of group performance.

3.2.1 Leadership Manipulation

Half of the leaders were instructed to show individually considerate leadership, whereas the other half was instructed to display authoritarian leadership. Within the study, authoritarian leadership refers to the absence of individual consideration. The oldest group member was selected as the group leader and received a short leadership training focusing either on individually considerate or authoritarian leadership. In one-minute instruction videos, typical behaviors of each leadership style were presented and the leader was asked to show these behaviors throughout the later discussion. As an incentive, leaders received a raffle ticket for a cash prize for each behavior that they displayed. Individually considerate leaders were instructed to stimulate their followers, to make sure that their followers contribute to the final decision, to avoid pushing for their own opinion and to make suggestions on the discussion structure. Authoritarian leaders were instructed to determine the structure of the discussion, be the first to suggest the rank order of candidates, to interrupt unsuitable contributions of followers


and to decide on the optimal rank order of candidates after listening to the followers' opinions.

3.2.2 Data Set

The upper body motion of each group member was captured with six inertial measurement units (IMU, XSens MTx) placed on the upper body. The IMUs were located on both lower and upper arms, the back and the head (see Figure 3.1). Each IMU includes an accelerometer, a gyroscope and a magnetic field sensor. All sensors were sampled at a frequency of 32 Hz. For an easier setup, the IMUs were integrated into a sensor-shirt, a long-sleeve stretch shirt which allowed identical sensor placement on all participants. Additionally, speech was recorded with separate lapel microphones, and physiological data such as heart rate and breathing rate was recorded with a monitoring chest-belt (Zephyr BioHarness).

In total, we recorded data from 165 subjects (112 female, 53 male; 25.4 ± 4.2 years) in 55 group discussions. Due to sensor failures during the first groups under investigation, the sensor data of 11 group discussions were partially missing. This left us with 44 discussions with in total over 15 hours of discussion time.

3.2.3 Behavior Annotation

Based on the Discussion Coding Manual [16], we selected seven relevant nonverbal cues which can be derived from body motion. We summarize these nonverbal cues in Table 3.1. Generally, the nonverbal cues can be categorized into static postures and dynamic gestures or posture changes.

In order to evaluate our detection algorithms, we obtained a ground truth by manually labeling the first 8 minutes of 10 randomly selected sessions (five of each leadership style). Consequently, the evaluation set includes 30 subjects and totals 240 minutes of discussion data. For the labeling of the lower-arm cues, we followed a semi-automatic approach by first employing an activity segmentation of the motion data stream to pre-segment it into static and dynamic segments (for details see Section 3.3.1). Static segments of each lower arm were then labeled as either face touch, arms crossed or arms diagonal. Dynamic segments were labeled as gesticulating whenever the arm was used for a gesture, as fidgeting when the arm was lightly moved


Table 3.1: Extracted nonverbal cues. Number of instances (#) and the average length in seconds (len) in the evaluation set.

cue              description (related to)                          #      len
face touch       touching one's own face (listening, thinking)    253    5.4
arm crossed      closed posture (hostile behavior)                135   17.2
arm diagonal     hands folded on table (listening)               1291    8.6
gesticulating    hands gesticulate (emphasis, dominance)          691    3.2
fidgeting        light arm movements (nervousness)               1082    1.5
posture change   posture changes, incl. all other movements       279    2.7
nodding          head nodding (back-channel, agreement)           287    1.8

while not changing the posture, and as posture change otherwise. In addition, head nodding was annotated manually without any automatic segmentation.

3.3 Individual Nonverbal Cues from Body Motion

3.3.1 Pre-processing and Segmentation

In a pre-processing step, we calibrated the heading (yaw angle) of the orientation data to face straight forward to the middle of the table. This heading calibration was done in order to be independent of earth's magnetic north and allowed us to use the same model for all three participants around the table.

For the detection of gestures and postures, we first segmented the motion stream of each body part into dynamic and static segments using a sliding window approach. On a sliding window (length: 500 ms; step size: 31.25 ms) a segmentation feature is calculated and a detection threshold is used to segment the motion streams. For the lower arms, we used the mean of the standard deviations of the acceleration magnitude and the gyroscope magnitude as segmentation feature, $f_{seg} = \frac{1}{2}(\sigma_{\|acc\|} + \sigma_{\|gyr\|})$. The detection threshold τ controls the sensitivity of the motion segmentation and was empirically set to τ = 0.1, which we found to be a good balance between static postures and fidgeting movements. To smooth the segmentation output and prevent over-segmentation, we deleted all segments shorter than 500 ms and then merged all segments within 500 ms.
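To make the segmentation step concrete, the following is a minimal Python sketch, not the original implementation; the feature definition follows the reading of the formula above, and all constants and function names are illustrative assumptions.

```python
import numpy as np

FS = 32        # IMU sampling rate (Hz)
WIN = 16       # 500 ms window at 32 Hz
STEP = 1       # 31.25 ms step = 1 sample at 32 Hz
TAU = 0.1      # empirically set detection threshold

def segment_motion(acc, gyr, tau=TAU):
    """Mark samples as dynamic where the windowed feature exceeds tau.

    acc, gyr: (N, 3) accelerometer / gyroscope streams of one body part.
    """
    acc_mag = np.linalg.norm(acc, axis=1)
    gyr_mag = np.linalg.norm(gyr, axis=1)
    dynamic = np.zeros(len(acc_mag), dtype=bool)
    for start in range(0, len(acc_mag) - WIN + 1, STEP):
        w = slice(start, start + WIN)
        # f_seg: mean of the standard deviations of both magnitudes
        f_seg = 0.5 * (acc_mag[w].std() + gyr_mag[w].std())
        if f_seg > tau:
            dynamic[w] = True
    return dynamic

def runs(mask):
    """(start, end, value) runs of a boolean array."""
    edges = np.flatnonzero(np.diff(mask.astype(int))) + 1
    bounds = np.concatenate(([0], edges, [len(mask)]))
    return [(a, b, bool(mask[a])) for a, b in zip(bounds[:-1], bounds[1:])]

def smooth(mask, min_len=WIN):
    """Delete segments shorter than 500 ms, then merge segments within 500 ms."""
    out = mask.copy()
    for a, b, v in runs(out):              # drop too-short dynamic segments
        if v and b - a < min_len:
            out[a:b] = False
    for a, b, v in runs(out):              # close too-short gaps between segments
        if not v and b - a < min_len and 0 < a and b < len(out):
            out[a:b] = True
    return out
```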


3.3.2 Detection of Gestures and Postures

For each static segment, we calculated the mean of the orientation data represented by Euler angles (roll, pitch, yaw). In case of dynamic segments, a number of additional signal-level features were calculated for each axis of the acceleration and gyroscope sensors. We used the following features: maximum, minimum, range, maximum absolute value, number of maximum peaks, number of minimum peaks, mean time between peaks and standard deviation.

Because the dynamic cues fidgeting and posture change depend on their neighbouring static segments (fidgeting is enclosed by the same posture, whereas posture changes occur between two different postures), we also computed neighbourhood features. For each segment, we calculated the absolute difference of each mean orientation angle of the previous and the next segment to capture the similarity of the neighbouring static segments.
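A sketch of the per-segment feature extraction might look as follows; the peak definitions and helper names are assumptions, using SciPy's find_peaks for the peak counts.

```python
import numpy as np
from scipy.signal import find_peaks

def dynamic_features(axis_signal):
    """Signal-level features for one sensor axis of a dynamic segment."""
    x = np.asarray(axis_signal, dtype=float)
    peaks_max, _ = find_peaks(x)
    peaks_min, _ = find_peaks(-x)
    gaps = np.diff(peaks_max)        # in samples; divide by 32 Hz for seconds
    return {
        "max": x.max(), "min": x.min(), "range": x.max() - x.min(),
        "max_abs": np.abs(x).max(),
        "n_max_peaks": len(peaks_max), "n_min_peaks": len(peaks_min),
        "mean_time_between_peaks": gaps.mean() if gaps.size else 0.0,
        "std": x.std(),
    }

def neighbourhood_features(prev_mean_euler, next_mean_euler):
    """Absolute differences of the mean orientation angles (roll, pitch, yaw)
    of the neighbouring static segments."""
    return np.abs(np.asarray(prev_mean_euler) - np.asarray(next_mean_euler))
```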

Before classification, highly correlated features (Pearson's correlation coefficient, r > 0.8) and features with low standard deviation (σ < 0.01) were removed. For the classification of nonverbal cues, we used logistic regression. For dynamic gestures, we added the lasso penalty term for automatic feature selection. The lasso penalty λ was set to 0.8. All our results are cross-validated leaving one discussion (three subjects) out and are thus subject independent.
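The feature pruning and the penalized classifier can be sketched with scikit-learn. Note that scikit-learn expresses the L1 strength as C = 1/λ, so λ = 0.8 maps to C = 1.25; this is a hedged sketch under that assumption, not the original code.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

def prune_features(X, corr_thresh=0.8, std_thresh=0.01):
    """Drop near-constant features, then one of each highly correlated pair."""
    X = X[:, X.std(axis=0) >= std_thresh]
    corr = np.corrcoef(X, rowvar=False)
    drop = set()
    for i in range(corr.shape[0]):
        for j in range(i + 1, corr.shape[1]):
            if j not in drop and abs(corr[i, j]) > corr_thresh:
                drop.add(j)
    keep = [c for c in range(X.shape[1]) if c not in drop]
    return X[:, keep]

# L1-penalised ("lasso") logistic regression; lambda = 0.8 -> C = 1.25.
clf = LogisticRegression(penalty="l1", solver="liblinear", C=1 / 0.8)

# Leave-one-discussion-out cross-validation: `groups` holds one discussion
# id per sample, so each fold holds out all three subjects of a discussion.
# scores = cross_val_score(clf, X, y, cv=LeaveOneGroupOut(), groups=groups)
```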

3.3.3 Detection of Head Nods

For the head nod detection, only the acceleration and gyroscope sensors of the head-mounted IMU are utilized. The detection approach is similar to the detection of arm gestures and postures; however, a sliding window with fixed step size (250 ms) and a window length of 1.5 s is used in the segmentation step. In our experiments, the sliding window approach outperformed other segmentation approaches for head nod detection, and a window length of 1.5 s was found to be optimal.

3.4 Interpersonal Cues from Body Motion - BehavioralMimicry

In the previous section, we described how individual nonverbal cues can be detected from body motion. In the following, we combine the individual cues of two persons to measure mimicry.


Figure 3.1: An example of behavioral mimicry: The left and right person both display the face touch posture. Motion sensors are placed on both lower arms, back and head.

Behavioral mimicry takes place whenever a person A adopts the behavior of another person B. An example of behavioral mimicry is illustrated in Figure 3.1.

3.4.1 Definition

We define mimicry as an event that occurs when person B follows person A in her behavior, that is, both display the same nonverbal cue. Behavioral mimicry of one behavioral cue is illustrated schematically in Figure 3.2. To count as a mimicry event, person B needs to start displaying behavior x after person A started, but within a certain time dt after person A stopped displaying behavior x (compare examples 1, 2, 4 and 5). To avoid double counting in case person A displays behavior x again (example 3), person B needs to display x before A starts again. In case person B displays x multiple times (example 5), only one mimicry event is counted.

Figure 3.2: Definition of behavioral mimicry. Person A displays behavior x five times and person B mimics person A four times.

More formally, mimicry events are defined as follows: Given a sequence of behaviors $b^A_{1...N}$ of person A out of a set of behaviors X, a behavior instance is given by $b^A_i$. Start and end times of each behavior instance are accessed by $t_1[b^A_i]$ and $t_2[b^A_i]$, respectively. A behavior $b^A_i$ of person A is mimicked by person B if a behavior instance $b^B_j$ exists that meets the following constraints:

$$b^A_i = b^B_j, \qquad t_1[b^B_j] > t_1[b^A_i], \qquad t_1[b^B_j] < \min\{\, t_2[b^A_i] + dt,\ t_1[b^A_{i+1}] \,\} \tag{3.1}$$

The start time of a mimicry event is given by $t_1[b^A_i]$, while the stop time is given by $\min\{\, t_2[b^B_j], t_1[b^A_{i+1}] \,\}$, with j being the maximum index that fulfills the conditions in Equation 3.1 so that $b^A_i$ is considered as mimicked.
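Equation 3.1 translates directly into a few lines of Python. In this hedged sketch, each behavior instance is a (t1, t2) pair and the function name is illustrative; several matching instances of B still count only once per instance of A, as required by the definition.

```python
import math

def mimicry_events(a, b, dt):
    """Indices i of A's behavior instances that were mimicked by B (Eq. 3.1).

    a, b: lists of (t1, t2) start/stop pairs for the same cue x,
    sorted by start time; dt: allowed lag after A stops.
    """
    mimicked = []
    for i, (a1, a2) in enumerate(a):
        next_a1 = a[i + 1][0] if i + 1 < len(a) else math.inf
        deadline = min(a2 + dt, next_a1)
        # B must start after A started and before the deadline.
        if any(a1 < b1 < deadline for b1, _ in b):
            mimicked.append(i)
    return mimicked
```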

3.4.2 Metrics

To be able to compare behavioral mimicry across dyads, we define a set of metrics that summarize the amount of mimicry displayed by two persons in an interaction. The number of times that person B mimics the behavior x of person A is expressed by:

$$n(B|A, x) = \sum_{i=1}^{N} [[\, b^A_i, b^B, x \,]], \tag{3.2}$$

with $[[\, b^A_i, b^B, x \,]]$ being an indicator function that returns one if there exists at least one $b^B_j = x$ that fulfills all conditions given in Equation 3.1.

In our experiment, we are especially interested in the amount of mimicking between leader L and followers F1, F2; that is, we want to know how often the leader is mimicked by one of his followers:

$$f(F|L, x) = \frac{n(F_1|L, x) + n(F_2|L, x)}{2 \sum_{i=1}^{N} [[\, b^L_i = x \,]]} \tag{3.3}$$

and how often a follower is mimicked by the leader:

$$f(L|F, x) = \frac{n(L|F_1, x) + n(L|F_2, x)}{\sum_{i=1}^{N_{F_1}} [[\, b^{F_1}_i = x \,]] + \sum_{i=1}^{N_{F_2}} [[\, b^{F_2}_i = x \,]]}. \tag{3.4}$$

In order to obtain a relative mimicking measure, the absolute mimicry counts in the above equations are normalized by the number of mimicking opportunities that the leader (Equation 3.3) and the followers (Equation 3.4) have.
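Building on the mimicry_events sketch above, the normalized metrics of Equations 3.3 and 3.4 could be computed as follows; function names are illustrative, and event lists are (t1, t2) pairs for one cue x.

```python
def f_leader_mimicked(leader, f1, f2, dt):
    """Eq. 3.3: fraction of the leader's cue instances mimicked by a follower."""
    opportunities = 2 * len(leader)   # each follower could mimic every instance
    n = len(mimicry_events(leader, f1, dt)) + len(mimicry_events(leader, f2, dt))
    return n / opportunities if opportunities else 0.0

def f_leader_mimics(leader, f1, f2, dt):
    """Eq. 3.4: fraction of the followers' cue instances mimicked by the leader."""
    opportunities = len(f1) + len(f2)
    n = len(mimicry_events(f1, leader, dt)) + len(mimicry_events(f2, leader, dt))
    return n / opportunities if opportunities else 0.0
```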

3.5 Results and Discussion

3.5.1 Automatic Detection of Nonverbal Cues - Evaluation

We evaluate our approach to detect nonverbal cues with wearable sensors by comparing the detection results to a manually annotated ground truth. As we are interested in how often and how long a certain behavior is shown by a person, we evaluate the recognition performance of the nonverbal cue detection with a time-based and an event-based measure as proposed by Ward [17]. The time-based measure evaluates the time overlap of prediction and ground truth, whereas the event-based measure evaluates the number of correctly classified nonverbal cues as a fraction of ground truth cues.
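A minimal sketch of the two evaluation views, assuming boolean frame-level arrays for prediction and ground truth and a list of ground-truth event intervals; the matching rules in [17] are richer than this simple overlap criterion.

```python
import numpy as np

def time_based_prf(pred, truth):
    """Frame-wise precision, recall, F1 for one cue (boolean timelines)."""
    tp = np.sum(pred & truth)
    precision = tp / max(pred.sum(), 1)
    recall = tp / max(truth.sum(), 1)
    f1 = 2 * precision * recall / max(precision + recall, 1e-9)
    return precision, recall, f1

def event_based_recall(pred, truth_events):
    """Fraction of ground-truth events overlapped by the prediction at all."""
    hits = sum(bool(pred[a:b].any()) for a, b in truth_events)
    return hits / max(len(truth_events), 1)
```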

The detection accuracies are summarized by the time-based confusion matrix presented in Figure 3.3. Because nodding is classified independently of any arm gestures and postures, a separate not nodding class is presented to represent all other head movements. Furthermore, there is no confusion between dynamic gestures and static postures, as they are classified depending on the outcome of the activity segmentation.

Turning to the results for the head nod detection, we can see from the confusion matrix that nodding is detected in 72.1 % of the ground truth time and that 97.2 % are correctly recognized as not nodding. The confusion stems from insertions and deletions and is mostly due to individual nodding habits, as will be shown below. From the event-based measure, one can see that false positives (insertions) and false negatives (deletions) occur roughly equally often, which is due to the fact that we have optimized the equal error rate, the break-even point of precision and recall.

In the lower right corner of the confusion matrix, the detection results for the lower arm postures are shown. The face touch posture can be detected with high accuracy, whereas the arm postures arms diagonal and arms crossed cannot be well discriminated.


Figure 3.3: Detection performance of individual nonverbal cues. Time-based confusion matrix for 30 subjects of the evaluation set; independent detection results for head-related, dynamic and static cues are combined. Values are in % of ground-truth time per class:

Ground truth \ Prediction   nod.   not nod.   gest.   fidg.   post.ch.   face   crossed   diag.
nodding                     72.1    27.9        –       –        –         –       –        –
not nodding                  2.8    97.2        –       –        –         –       –        –
gesticulating                 –       –       71.4    22.8      5.8        –       –        –
fidgeting                     –       –       24.6    67.2      8.2        –       –        –
posture change                –       –       13.7    24.4     61.9        –       –        –
face touch                    –       –         –       –        –       98.0     0.2      1.8
arms crossed                  –       –         –       –        –        0.0    62.0     38.0
arms diagonal                 –       –         –       –        –        0.0     4.6     95.4

This can be seen from the confusion between arms crossed and arms diagonal. When we inspected the reasons for this confusion, we noticed that the heading (yaw angle) of the orientation sensor is often corrupted by magnetic disturbances that occur when subjects move their arms over the table. Arms diagonal has high accuracy because it is the most frequent posture throughout the whole experiment.

Dynamic arm gestures are confused in about 6 % to 24 % of the time. The confusions between gesticulating and fidgeting are due to lightly performed instances of gesticulating and more accented fidgeting instances. The confusion between posture changes and fidgeting can be explained by the difficulty of discriminating the static postures arms crossed and arms diagonal, as in this case the neighbourhood features, which are important to differentiate fidgeting from posture changes, also suffer from magnetic disturbances.

Table 3.2: Nonverbal cue detection performance. Median and inter-quartile range across 30 subjects of the evaluation set.

(a) time-based

Nonverbal Cue     Recall        Precision     F-Measure
nodding           0.68 (0.31)   0.54 (0.36)   0.56 (0.22)
gesticulating     0.80 (0.32)   0.79 (0.47)   0.68 (0.28)
fidgeting         0.73 (0.18)   0.67 (0.32)   0.66 (0.15)
posture change    0.53 (0.43)   0.55 (0.38)   0.50 (0.34)
face touch        1.00 (0.02)   1.00 (0.00)   1.00 (0.02)
arms crossed      0.44 (0.97)   0.00 (0.91)   0.00 (0.61)
arms diagonal     0.99 (0.07)   1.00 (0.11)   0.97 (0.12)

(b) event-based

Nonverbal Cue     Recall        Precision     F-Measure
nodding           0.67 (0.29)   0.73 (0.35)   0.67 (0.18)
gesticulating     0.68 (0.38)   0.68 (0.35)   0.56 (0.25)
fidgeting         0.82 (0.11)   0.78 (0.23)   0.78 (0.09)
posture change    0.54 (0.43)   0.65 (0.31)   0.57 (0.32)
face touch        1.00 (0.02)   1.00 (0.00)   1.00 (0.05)
arms crossed      0.44 (0.88)   0.00 (0.81)   0.00 (0.50)
arms diagonal     0.98 (0.08)   1.00 (0.12)   0.98 (0.09)

In order to automatically observe and characterize teams, it is important to detect the nonverbal cues with high accuracy and low variance across a wide range of persons. To investigate the person dependency, we summarize the individual detection performances in Table 3.2 in terms of recall, precision and F1-measure across all 30 subjects of the evaluation set. It can be seen that the face touch posture is recognized with an F1-measure of about 1.00 across all participants. With an inter-quartile range between 0.09 and 0.25, higher user variability is observed for nodding, gesticulating and fidgeting. The high user variability for posture change, arms crossed and arms diagonal can be explained by the difficulty of differentiating these classes due to magnetic disturbances.

3.5.2 Leadership Experiment

To demonstrate the value of our methods for detecting nonverbal behavior, we applied them to the complete data set of the leadership experiment. As described in Section 3.2, the groups in our experiment were led either with individual consideration (IC) or in an authoritarian (AU) fashion. In the following, we use our methods to


uncover some of the nonverbal behavioral differences of the two leadership styles. Due to the difficulty of differentiating the postures arms crossed and arms diagonal, we combined them into an arms flat posture. Under the assumption that possible detection errors are equally distributed across the two groups, we test whether the two leadership styles differ in terms of individual and interpersonal nonverbal cues. We report the p-values obtained by a non-parametric permutation test to indicate statistical significance, as well as Cliff's delta as a non-parametric measure of effect size.
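Both statistics are straightforward to compute. The sketch below assumes the permutation statistic is the difference of group means (the chapter does not specify it) and uses illustrative function names.

```python
import numpy as np

def cliffs_delta(x, y):
    """Cliff's delta: P(X > Y) - P(X < Y) over all sample pairs."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    greater = (x[:, None] > y[None, :]).sum()
    smaller = (x[:, None] < y[None, :]).sum()
    return (greater - smaller) / (len(x) * len(y))

def permutation_test(x, y, n_perm=10000, seed=0):
    """Two-sided permutation p-value for the difference of group means."""
    rng = np.random.default_rng(seed)
    pooled = np.concatenate([x, y]).astype(float)
    observed = abs(np.mean(x) - np.mean(y))
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(pooled[:len(x)].mean() - pooled[len(x):].mean())
        hits += diff >= observed
    return (hits + 1) / (n_perm + 1)   # add-one smoothing for a valid p-value
```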

Leadership Style and Individual Cues

We first compare individually considerate and authoritarian leaders in terms of their individual nonverbal cues, using the average cue rate. Concretely, we count how often a cue is displayed by a person and normalize by the discussion length. In Figure 3.4, the median values across all leaders of both groups are illustrated. A statistically significant difference (p = 0.048, Cliff's delta = 0.47) between the two leadership styles is observed for the arms flat cue, which suggests that authoritarian leaders move their lower arms more often than individually considerate leaders. This finding is in line with our expectations, since authoritarian leaders spoke more and are thus more likely to gesticulate than individually considerate leaders. For the other individual cues, we have not observed any differences in terms of the cue rate, suggesting that the two leadership types are relatively similar when we only look at individual cues of the leader. The reason lies in the individual cues themselves: they do not explicitly capture the interdependency between interaction partners and thus cannot uncover behavioral differences that are important for relating to other people, which is one main characteristic of individual consideration.

As the two leadership styles might also affect the behavior of the followers, we also present in Figure 3.4 the median values across all followers. As one can see, the followers only behave differently in terms of posture changes (p = 0.024, Cliff's delta = −0.33), suggesting that followers in teams of IC leaders change their posture more often. Active followers would fit nicely in the framework of individually considerate leadership, as individually considerate leaders aim to stimulate their followers to contribute their skills and ideas. However, one has to remain cautious, as the detection accuracy of posture changes varied the most across participants during the evaluation (see Table 3.2).


Figure 3.4: Individual nonverbal cues (instances per minute of nodding, gesticulating, fidgeting, posture change, face touch and arms flat) for individually considerate (IC) and authoritarian (AU) groups, shown separately for leaders and followers. Median values across all discussions. A star '*' indicates a significant difference at the 5 %-level.

Leadership Style and Behavioral Mimicry

Some of the interdependencies between interaction partners can be captured with the interpersonal cues that quantify behavioral mimicry. In our setting, we are interested in two directions of mimicry: the first one quantifies the leader's mimicry behavior towards his followers (Eq. 3.4), while the second quantifies the mimicry behavior of the followers towards their leader (Eq. 3.3). Generally, the amount of mimicry describes the fraction of cue events, e.g. nodding instances of one of the followers, that were mimicked by the leader. We refer to Section 3.4.2 for details.

Figure 3.5 summarizes the mimicry behavior between leaders and followers. Comparing the median amount of the leaders' mimicry towards their followers, we find that individually considerate leaders more often mimic the nodding behavior of their followers (p = 0.031, Cliff's delta = −0.20).


Figure 3.5: Mimicry amount in percent between leaders and their followers (leader mimics followers; leader is mimicked by followers) for the cues nodding, gesticulating, fidgeting, posture change, face touch and arms flat. Median values for individually considerate (IC) and authoritarian (AU) leaders. A star '*' indicates a significant difference at the 5 %-level.

Cli↵’s delta = �0.20). On average IC leaders mimic 23.2 % of their fol-lower’s nods, whereas AU leaders mimic only 10.9 % of the nods oftheir followers. One might conclude that individually considerate lead-ers seem to agree more with their followers, however this hypothesisis not supported by the analysis of the nodding rate of the leader(see Figure 3.4). The higher amount of nodding mimicry might signalthat IC leaders give more attention to their followers. Furthermore,we observed that IC leaders more often mimic posture changes of theirfollowers (p = 0.001, Cli↵’s delta = �0.51) as opposed to AU leaders. Itseems that IC leaders react more often to posture changes of their follow-ers. The median mimicry amount of followers mimicking their leaderis illustrated in Figure 3.5. It appears that followers who are led by anIC leader more often mimic the face touch of their leader (p = 0.038,Cli↵’s delta = �0.29).


3.6 Conclusion and Outlook

We have presented a method for deriving information about nonverbal communication in groups from body-worn motion sensors. Using activity recognition methods, we detect important nonverbal cues from body motion data. In a naturalistic meeting scenario and on a test set of 30 randomly chosen subjects, our results show that basic postures like the face touch can be detected nearly perfectly (1.00 F1-measure), whereas more similar cues such as gesticulating, fidgeting and posture changes can be detected with an F1-measure between 56 % and 78 %. Nodding can be detected with an F1-measure of 67 %. Furthermore, we have shown how individual nonverbal cues can be combined to quantify behavioral mimicry and proposed a metric of behavioral mimicry. We have applied our methods to investigate the nonverbal behavior in groups that were led by leaders of two distinct leadership styles. We identified characteristic differences between individually considerate and authoritarian leaders: Individually considerate leaders seem

• to be less active

• to mimic their followers’ nodding behavior more often

• to mimic posture changes of their followers more often

and followers of IC leaders seem to

• mimic their leader’s face touch more often.

The presented results demonstrate the potential of on-body sensing for the detection of nonverbal cues from body motion. The automatically found characteristics of IC leaders are in line with the theoretical framework of individualized considerate leadership. Future out-of-the-lab studies will have to prove the value of the presented methods for quantifying behavioral mimicry in the wild.


Bibliography

[1] F. J. Bernieri and R. Rosenthal, "Interpersonal coordination: Behavior matching and interactional synchrony," in Fundamentals of Nonverbal Behavior, Cambridge University Press, 1991.

[2] T. L. Chartrand and R. van Baaren, "Human mimicry," in Advances in Experimental Social Psychology, Academic Press, 2009.

[3] F. Ramseyer and W. Tschacher, "Nonverbal synchrony in psychotherapy: Coordinated body movement reflects relationship quality and outcome," Journal of Consulting and Clinical Psychology, vol. 79, no. 3, pp. 284–295, 2011.

[4] S. Feese, A. Muaremi, B. Arnrich, G. Tröster, B. Meyer, and K. Jonas, "Discriminating individually considerate and authoritarian leaders by speech activity cues," in Proc. Int. Conf. Social Computing (SocialCom), 2011.

[5] B. M. Bass and R. E. Riggio, Transformational Leadership. Routledge, 2nd ed., 2006.

[6] G. L. Stewart, "A meta-analytic review of relationships between team design features and team performance," Journal of Management, vol. 32, no. 1, pp. 29–55, 2006.

[7] K. Lewin, R. Lippitt, and R. K. White, "Patterns of aggressive behavior in experimentally created social climates," Journal of Social Psychology, vol. 10, no. 2, pp. 271–299, 1939.

[8] D. Gatica-Perez, "Automatic nonverbal analysis of social interaction in small groups: A review," Image and Vision Computing, vol. 27, no. 12, pp. 1775–1787, 2009.

[9] H. Hung and D. Gatica-Perez, "Estimating cohesion in small groups using audio-visual nonverbal behavior," IEEE Trans. Multimedia, vol. 12, no. 6, pp. 563–575, 2010.

[10] D. Sanchez-Cortes, O. Aran, and M. Schmid Mast, "Identifying emergent leadership in small groups using nonverbal communicative cues," in Proc. Int. Conf. Multimodal Interaction (ICMI), 2010.


[11] X. Sun, K. Truong, M. Pantic, and A. Nijholt, "Towards visual and vocal mimicry recognition in human-human interactions," in Proc. Int. Conf. Systems, Man, and Cybernetics, 2011.

[12] M. LaFrance and M. Broadbent, "Group rapport: Posture sharing as a nonverbal indicator," Group & Organization Management, vol. 1, no. 3, pp. 328–333, 1976.

[13] T. L. Chartrand and J. A. Bargh, "The chameleon effect: The perception-behavior link and social interaction," Journal of Personality and Social Psychology, vol. 76, no. 6, pp. 893–910, 1999.

[14] L.-P. Morency, A. Quattoni, and T. Darrell, "Latent-dynamic discriminative models for continuous gesture recognition," in Proc. Int. Conf. Computer Vision and Pattern Recognition (CVPR), 2007.

[15] A. Pentland, Honest Signals: How they shape our world. Cambridge: The MIT Press, 2008.

[16] C. C. Schermuly and W. Scholl, Das Instrument zur Kodierung von Diskussionen (IKD). Hogrefe, 2011.

[17] J. A. Ward, P. Lukowicz, and H. Gellersen, "Performance metrics for activity recognition," ACM Trans. Intelligent Systems and Technology, vol. 2, pp. 6:1–6:23, 2011.


4 Discriminating Leadership Style

Sebastian Feese, Amir Muaremi, Bert Arnrich, Bertolt Meyer, Klaus Jonas, Gerhard Tröster

Discriminating Individually Considerate and Authoritarian Leaders by Speech Activity Cues

Proceedings International Conference on Social Computing (SocialCom 2011), pp. 1460–1465, 2011.

© 2011 IEEE.


Abstract

Effective leadership can increase team performance; however, up to now the influence of specific micro-level behavioral patterns on team performance is unclear. At the same time, current behavior observation methods in social psychology mostly rely on manual video annotations that impede research. In our work, we follow a sensor-based approach to automatically extract speech activity cues to discriminate individualized considerate from authoritarian leadership. On a subset of 35 selected group discussions led by leaders of different styles, we predict leadership style with 75.5 % accuracy using logistic regression. We find that leadership style predictability depends on the relative discussion time and is highest for the middle parts of the discussions. Analysis of the regression coefficients suggests that individually considerate leaders start speaking more often while others speak, use short utterances more often, change their speech loudness more and speak less than authoritarian leaders.

4.1 Introduction

In today’s business world, teams are a central aspect of organizationalcooperation and their performance is crucial for organizational suc-cess [1]. It is widely accepted that e↵ective leadership style can increaseteam performance. However, the influence of specific behavioral pat-terns of the team members on team performance is unclear. Recently,psychologists started to investigate how specific micro-level behav-iors of team members like gestures or vocal expression influence theoverall team performance. One major challenge when investigating be-havioral patterns is that the available methods in psychology are stillmostly based on manual annotation of video recordings and are thuslabor intensive, time consuming, and prone to error due to subjectiveassessments.

A sensor-based, automatic acquisition and detection of nonverbal cues from body posture, gestures and vocal expressions could potentially discover such specific micro-level behaviors in an objective way and thus contribute to a better understanding of effective leadership style. Moreover, a sensor-based approach has the potential to measure micro-level behaviors outside the lab.

In this work, we describe our effort to characterize and classify two important leadership styles, individualized consideration and authoritarian leadership, with nonverbal cues automatically extracted from sensor data.


We present an interdisciplinary laboratory study in which 165 participants completed a decision-making task in groups of three under the guidance of a leader. Although group leaders were trained to show specific behaviors for each leadership style, not all leaders played their role perfectly, which motivated us to select a subset of discussions in which the leaders played their roles well. We describe our selection method and investigate how well speech activity cues differentiate leadership style. This work represents one step towards sensor-based discovery of micro-level behaviors of team leaders during meetings.

4.2 Prior and Related Work

4.2.1 Leadership Style in Social Psychology

Leadership has been examined from many perspectives and several leadership styles have been identified within the last century. In this paper, we consider individualized considerate and authoritarian leadership. Individual consideration is a substantial facet of transformational leadership that has been found to increase team performance particularly well [2]. Individually considerate leaders pay special attention to their followers' needs and listen effectively [3]. As such, individual consideration is supposedly connected to a "preference for and use of two-way communication, empathy, and willingness to use delegation" [3, page 132]. Authoritarian leaders, on the other hand, take decisions without consulting their followers [4]. Consequently, authoritarian leadership can only work as long as there is no need for input from the followers and their motivation does not depend on their involvement in the decision-making process. However, in the presented study, authoritarian leadership simply refers to the absence of individual consideration.

4.2.2 Social Computing

A review on the automatic analysis of social interactions in small groups can be found in [5]. Previous work in the social signal processing domain dealt with automatic inference of conversation structure [6, 7, 8], analysis of social attention [9, 10] and the detection of personality traits [11, 12] and roles [13, 14, 15]. These works mostly relied on speech-related cues such as speaking length, speaker turns and number of successful interruptions. Additionally, physical activity cues were estimated with vision-based methods, but only in few works


were motion sensors used to track body motion. For classifying leadership style, the detection of dominance is especially important, because authoritarian leaders are more dominant than individually considerate leaders. On five-minute slices extracted from 11 meetings of the AMI Meeting Corpus, dominance and status were automatically detected in [16]. For the dominance classification task, 59 meeting slices were manually annotated by three raters, and two sets of either full or majority agreement among annotators were considered. Two classifiers were compared. An unsupervised approach simply classified the person with the smallest (highest) value of a cue, e.g. speaking time, as the most dominant person. This simple method was compared to Support Vector Machines for feature subsets. Accuracies ranged from 80 % to 90 % when classifying the most dominant of four persons in a meeting. Nonverbal cues for predicting cohesion in small groups were investigated in [17]. From the AMI Corpus, 120 two-minute segments were annotated by external observers, and segments with high inter-rater reliability were selected for the classification task of high and low cohesion. Nonverbal cues were compared by a simple threshold-based classifier. More recently, correlations between emerging leadership in small groups and speech-related nonverbal cues have been examined in [18]. A method to measure posture mirroring in social interaction was presented in [19], and the results indicate that posture mirroring differs across groups with different leadership. In contrast to the works on meeting corpora, Pentland and collaborators investigated how wearable sensors can be employed to measure aspects of human behavior in daily life. Human behavior such as physical activity, speech activity and face-to-face interaction was recorded with sociometric badges to predict personality traits and group performance from sensor data [11].

4.3 Experiment

In order to investigate how micro-level behavior differentiates leadership styles, we conducted an experiment in which participants discussed in groups of three persons under the guidance of a selected leader. Fifty-five groups were asked to work on a decision-making task: to rank four fictitious candidates with regard to their suitability for an open job position. For the task, each group member received five pieces of information about each candidate that were partly shared among group members (hidden-profile decision-making task). Under the guidance of the group leader, the group had to discuss the suitability of each candidate and agree on a rank order, which served as a measure of group performance.


The experiment design was proposed and first validated in [20].

4.3.1 Leadership Manipulation

As we are interested in behavior differences across leadership styles, leadership style was manipulated. Half of the leaders were instructed to show individually considerate leadership, whereas the other half was instructed to be authoritarian.

Upon arrival at the laboratory, the oldest group member was selected as the group leader and was led to a separate room where she received a short leadership training focusing either on authoritarian leadership or on individually considerate leadership. In five one-minute instruction videos, typical behaviors of each leadership style were presented, and the leader was asked to show these behaviors throughout the later discussion. As an incentive, leaders received a raffle ticket for a cash prize for each behavior that they displayed.

Leaders that were instructed to lead with individual consideration received the following instructions:

• Try to stimulate each of your followers to contribute his or her views and knowledge to the discussion

• Make sure that all of your employees contribute to the final decision

• Avoid pushing for your own opinion, e.g. after the group has arrived at a rank order, ask each group member about any doubts regarding the decision

• Make suggestions on how the discussion might be structured and discuss these with your followers

In the control group, authoritarian leaders were instructed to show the following behaviors:

• Determine the structure of the discussion

• Be the first to suggest the rank order of candidates

• Interrupt unsuitable contributions of followers

• Decide on the optimal rank order of candidates after listening to the followers' opinions


Figure 4.1: Experiment setup: participants wearing sensor shirts.

4.3.2 Sensor Data Acquisition

Each group member was equipped with a separate clip-on lapel microphone. The speech of all group members was synchronously recorded at a sampling rate of 44.1 kHz via a USB audio interface on a PC. The upper body motion of each group member was captured with six inertial measurement units (IMU, XSens MTx), which were located on both lower and upper arms, the back and the head (Figure 4.1). Additionally, physiological data such as heart rate and breathing rate of each group member was recorded with a monitoring chest-belt (Zephyr BioHarness).

4.3.3 Video Annotation

All discussions were recorded on video and coded with the Discussion Coding System (DCS) [21]. The DCS is a state-of-the-art coding system to analyze group interaction. It dissects the group interaction into individual statements or acts of communication. Each act is transcribed in brief. Its accompanying interpersonal affect is coded on two dimensions: power (dominance vs. submissiveness) and affiliation (friendliness vs. hostility). The ratings on these dimensions are based on verbal and nonverbal cues as described in the DCS manual [21].


Examples include interrupting someone else or expressive gesticulation as markers of dominance. The function of a speech act is divided into main and minor categories. For the main category, it is coded whether the act is a social-emotional statement (differentiated into positive or negative), whether it is a statement with regard to the content of the task, or whether it is aimed at regulating the discussion. For each of these three main categories, the two minor categories proposal and question are coded, as these mark important process elements for decisions. Additionally, the reactions (agreement, rejection) following an act are coded.

4.3.4 Data Set

In total, we recorded data from 165 subjects (112 female, 53 male; age = 25.4 ± 4.2 years) in 55 group discussions. Due to a technical problem in one of the sensor shirts at the beginning of the experiment, we lost the sensor data of 11 subjects. In consequence, we ended up with a data set that includes 44 group discussions (16 groups were led authoritarian and 18 with individual consideration) with three participants each. In the 44 selected sessions were eight male and ten female individually considerate leaders and five male and eleven female authoritarian leaders. Our data set totals over 15 hours of discussion time.

4.4 Methods

4.4.1 Check of Leadership Manipulation

After the discussion, the followers rated their team leader on the individualized consideration scale of the MLQ 5X leadership questionnaire [22]. Individually considerate leaders were evaluated as more individually considerate (M = 3.28, SD = 0.79) than authoritarian leaders (M = 2.58, SD = 1.00; t(108) = 4.13, p < .001). Despite the statistical difference in the perceived individualized consideration, we noticed throughout the experiment that some of the group leaders did not lead their followers as instructed. This noise in the class labels decreases the performance of the leadership style classification task and motivated us to select a subset of discussions in which the leaders played their roles well. We therefore also check the leadership style manipulation with the help of the video annotation. If we assume that the DCS captures relevant behaviors to differentiate leadership style and that only few


leaders did not play their role well, we can calculate cues that summarize the leadership behavior of each group leader and use these for style prediction to exclude misclassified discussions. From the DCS, we calculated the following cues that summarize the behavior of the leader:

• DCS Speaking Time measures the relative speaking time in terms of discussion length

• DCS Number of Questions Asked is the number of questions asked by the leader divided by the total number of communication acts within the discussion

• DCS Number of Proposals Made is the number of proposals made by the leader divided by the total number of communication acts within the discussion

• DCS Affiliation: the affiliation of each communication act was encoded on a five-point scale; we use the mean over all statements of the leader to measure the affiliation of the leader towards the followers

• DCS Power: the power of each communication act was encoded on a five-point scale; we use the mean over all statements of the leader to measure the power of the leader towards his followers

• DCS Number of Times Addressed measures how often the leader was addressed by her followers, normalized by the total number of communication acts per discussion

With these cues from the DCS, we fit a linear logistic regression model to predict leadership style. All discussions that are misclassified are excluded from the later analysis.

4.4.2 Speech Activity Cues

From the audio recordings, we extract speech activity cues adopted from [16] to summarize the speaking behavior of each group member throughout the discussion. In a first step, relevant audio features such as signal energy were extracted for each frame (frame length: 25 ms, step size: 10 ms) with openSMILE [23]. Speaker diarization was performed with a simple threshold-based approach: speech for a group member was detected if the difference between the group member's energy value and the mean energy value of the other group members was greater than an empirically set threshold. Speech activity segments shorter than 30 ms were then removed, and segments of the same speaker within 1000 ms were merged.
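The threshold-based diarization can be sketched as follows, assuming a frame-energy matrix with one column per group member; the threshold value and the names are illustrative, and the short-segment removal and merging would follow the same delete/merge post-processing as described in the text.

```python
import numpy as np

def diarize(frame_energy, thresh):
    """Frame-level speech activity per group member.

    frame_energy: (n_frames, n_members) energies from 25 ms frames, 10 ms step.
    A member is speaking when her energy exceeds the mean of the other
    members' energies by the empirically set threshold.
    """
    active = np.zeros(frame_energy.shape, dtype=bool)
    for s in range(frame_energy.shape[1]):
        others = np.delete(frame_energy, s, axis=1).mean(axis=1)
        active[:, s] = (frame_energy[:, s] - others) > thresh
    # post-processing (not shown): drop active runs shorter than 30 ms and
    # merge runs of the same speaker separated by less than 1000 ms
    return active
```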


As in [16, 12], we follow a slice-based approach to calculate cues on discussion excerpts. We cut the discussion into non-overlapping slices of fixed length, ranging from one minute to six minutes, and calculate the following speech activity cues for each slice:

• Average Single Speaking Energy (ASSE) is the median of the signal energy per frame when a speaker speaks alone. The energy per frame is the sum of squared signal values multiplied by a Hamming window.

• Change Single Speaking Energy (CSSE) is the inter-quartile range of the signal energy per frame when a speaker speaks alone.

• Single Speaking Length (SSL) measures the amount of time that a person speaks alone.

• Multiple Speaking Length (MSL) measures the amount of time that a person speaks while at least one other person speaks.

• Total Speaking Length (TSL) is the total amount of speech for each speaker; it is the sum of SSL and MSL.

• Speaking Turns (ST) is the number of speaking turns of the person.

• Successful Interruptions (SI) is the number of successful interruptions. Person i interrupts person j if person i starts talking while person j talks and person j stops before person i (see the sketch after this list).

• Unsuccessful Interruptions (UI) is the number of unsuccessful interruptions. Person i does not interrupt person j if person i starts talking while person j talks and person i stops before person j.

• Average Speaking Turn Duration (ASTD) is the median turn duration.

• Change in Speaking Turn Duration (CSTD) is the inter-quartile range of the turn duration.

• Short Utterances (SU) is the number of turns shorter than one second.
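The interruption cues reduce to simple interval logic over the diarization output. A hedged sketch, with speaking turns given as (start, end) pairs in seconds and illustrative function names:

```python
def successful_interruptions(turns_i, turns_j):
    """SI: person i starts while person j talks, and j stops before i."""
    return sum(
        1
        for s_i, e_i in turns_i
        for s_j, e_j in turns_j
        if s_j < s_i < e_j and e_j < e_i
    )

def short_utterances(turns, max_len=1.0):
    """SU: number of speaking turns shorter than one second."""
    return sum(1 for s, e in turns if e - s < max_len)
```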


For comparison with the slice-based approach, we also calculated the speech activity features over the whole length of each discussion and normalized them by the discussion length.

In addition to speech activity cues, we extracted prosodic features such as fundamental frequency, voice quality, voicing probability and formants. Prosodic features have been used in emotion recognition and capture how a person speaks and how much emphasis they give to a statement, rather than how much a person speaks. We summarized the prosodic features over each slice by their median and inter-quartile range. However, we excluded all prosodic cues from further analysis because our data set contains an unequal distribution of male and female leaders (see Section 4.3.4) and prosodic features also characterize gender. We tested the gender dependence of the prosodic features with the Wilcoxon rank-sum test. The test revealed that most of the cues were significantly dependent on gender, which was the reason for us not to include prosodic features in the further analysis.

4.4.3 Classification of Leadership Style from Speech Cues

For the task of leadership style classification from automatically extracted speech cues, we use logistic regression with the lasso penalty term. We chose the logistic regression classifier because the lasso shrinkage offers variable selection and the learned models can be easily understood by an analysis of the regression coefficients. As training data, we take all slices from the leader and fit logistic regression models for each slice length. To obtain person-independent results, we employ a leave-one-discussion-out cross-validation scheme and exclude in each fold all slices of one discussion as test data. From the 35 available discussions, we randomly sample 15 of each leadership style and calculate the cross-validated accuracy. We repeat this procedure 1000 times and report the mean and standard deviation of the accuracy. In addition to the slice-based accuracy, we use majority voting on the predictions of all slices of one discussion to predict which style the leader displayed in a particular discussion.
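The cross-validation scheme with per-discussion majority voting might be sketched as follows with scikit-learn; this is an illustrative sketch, assuming class labels coded as 0/1 so that np.bincount can tally the votes, and the C = 1/λ parameterization noted earlier.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneGroupOut

def discussion_accuracy(X, y, groups, C=1 / 0.8):
    """Leave-one-discussion-out CV over slices with per-discussion majority vote.

    X: slice-level speech cues, y: style label (0/1) per slice,
    groups: discussion id per slice.
    """
    clf = LogisticRegression(penalty="l1", solver="liblinear", C=C)
    correct = []
    for train, test in LeaveOneGroupOut().split(X, y, groups):
        clf.fit(X[train], y[train])
        votes = clf.predict(X[test])
        majority = np.bincount(votes).argmax()   # majority vote over slices
        correct.append(majority == y[test][0])
    return float(np.mean(correct))
```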

To investigate whether style prediction depends on the flow of the discussion, we calculate slices of fixed length at equally spaced intervals of the discussion. Considering the different durations of the discussions, the start of each slice is relative to the discussion length. For each time step, we randomly sample 15 discussions of each leadership style to calculate the cross-validated accuracy. We report the mean and standard deviation for 100 sampling iterations.


Figure 4.2: Coefficients of the logistic regression model fitted on the discussion coding system (DCS) variables Power, Affiliation, Speaking Time, Number of Questions Asked, Number of Proposals Made and Number of Times Addressed (coefficients from −1.5 to 1.5; negative values point to authoritarian, positive values to individually considerate leadership). Questions and speaking time are the most important variables to distinguish leadership style with variables of the discussion coding system.

standard deviation for 100 sampling iterations.

4.5 Results and Discussion

4.5.1 Check of Leadership Manipulation

Classifying the leadership style using the cues from the DCS, we achieve an accuracy of 79.2%. Nine of 44 discussions were not correctly classified and were excluded. Thus the selected subset contains 35 discussions, out of which 17 were led with an authoritarian style and 18 with individual consideration. These discussions are 100% distinguishable with a linear logistic regression model and the cues from the DCS. The coefficients of the learned model are displayed in Figure 4.2. Analyzing the coefficients, we notice that the most predictive variables are DCS Number of Questions Asked and DCS Speaking Time. This suggests that individually considerate leaders ask more questions and speak less than authoritarian leaders.

4.5.2 Leadership Style Detection from Speech Cues

The results of the leadership style classification task for different slice lengths are presented in Figure 4.3. The mean accuracy increases from slightly above chance level for one minute slices to 72.1% for five


Figure 4.3: Person independent performance of leadership style classification from speech activity cues: mean and standard deviation of cross-validated accuracy for slice lengths of one to six minutes and for the whole discussion (slice based vs. discussion based).

minute slices. The mean accuracy for slices over the entire discussion is 70.5%. Since one discussion consists of multiple slices, we can use the majority voting principle for the discussion based classification. For a discussion to be counted as correctly classified, more than half of all slices of that particular discussion need to be correctly classified. The higher the slice based accuracy (for values above 50%) and the number of slices within a discussion, the higher is the discussion based accuracy. The optimal ratio for our one minute step analysis is reached at four minutes with an accuracy of 75.5%. Four minutes seems to be the shortest slice length in our data which captures enough speech activity for the extraction of meaningful speech cues to discriminate leadership style.

Figure 4.4 depicts the slice accuracy over the relative discussion time. The slice length is fixed to four minutes. It can be seen from the graph that the detection accuracy is low at the beginning and the end of the discussion and reaches its maximum in the middle of the discussion. From these observations, it can be stated that the ability to distinguish the leadership styles depends on the relative time of the discussion, and the best classification accuracy is achieved towards the middle of the discussion.

In order to better understand the importance of each speech cue,


Figure 4.4: Cross-validated leadership style classification accuracy (mean and SD, ranging from 0.50 to 0.75) over relative discussion time (start time of the four minute slice at 0–100% of the discussion length). Four minute slices are shifted in time relative to the discussion length. Predictability of the leader is highest for the middle parts of the discussion.

we analyze the coefficients of the fitted logistic regression models. A box plot summarizing the coefficients of the models trained on data of four minute slices is presented in Figure 4.5. The most predictive variables are Change in Single Speaking Energy, Speaking Time, Short Utterances and Interruptions. Analysis of the coefficients reveals that authoritarian leaders speak more and have longer turns. This is coherent, as these speech cues have also been found to be good predictors of dominance [16]. Individually considerate leaders instead vary their speech loudness, have more short utterances and interrupt followers more often. These speech cues are linked to back-channeling and could indicate effective listening, which is typical for individually considerate leaders [3, page 7].

4.6 Conclusion and Outlook

We have presented a psychological experiment in which 165 subjects participated in groups of three under the guidance of a leader. Aiming at a better understanding of the micro-level behavior of two different leadership styles (individualized consideration and authoritarian), we have used a subset of discussions for automatic prediction of leadership style. To select discussions in which leaders played their role as


Figure 4.5: Box-plot of regression coefficients for four minute slices, covering Total Speaking Length (TSL), Single Speaking Length (SSL), Multiple Speaking Length (MSL), Speaking Turns (ST), Successful Interruptions (SI), Unsuccessful Interruptions (UI), Average Speaking Turn Duration (ASTD), Change in Speaking Turn Duration (CSTD), Short Utterances (SU), Average Single Speaking Energy (ASSE) and Change in Single Speaking Energy (CSSE); negative coefficients point to authoritarian, positive coefficients to individually considerate leadership. The coefficients suggest that authoritarian leaders speak more and have longer turns, whereas individually considerate leaders have more short utterances, vary their speech loudness and speak more often while followers speak.

instructed, we have used a logistic regression model fitted on variables that summarize the leaders' behavior as manually encoded by external observers. Using automatically extracted speech activity cues and a logistic regression, we detect the leadership style with an accuracy of 75.5%. Analysis of the regression coefficients shows that individually considerate leaders not only have shorter turns, but also use more short utterances and interrupt followers more often, which taken together could signal effective listening and would be in line with the literature on leadership [3, page 7].

In the present study we limited ourselves to speech cues from the leader. However, to better capture the discussion flow and conversational patterns, the speech of all group members needs to be analyzed. Future work will also include the analysis of body posture and gestures.


Bibliography

[1] E. Salas, N. J. Cooke, and M. A. Rosen, "On teams, teamwork, and team performance: Discoveries and developments," Human Factors, vol. 50, no. 3, pp. 540–547, 2008.

[2] G. L. Stewart, "A meta-analytic review of relationships between team design features and team performance," Journal of Management, vol. 32, no. 1, pp. 29–55, 2006.

[3] B. M. Bass and R. E. Riggio, Transformational Leadership. Routledge, 2nd ed., 2006.

[4] K. Lewin, R. Lippitt, and R. K. White, "Patterns of aggressive behavior in experimentally created social climates," Journal of Social Psychology, vol. 10, no. 2, pp. 271–299, 1939.

[5] D. Gatica-Perez, "Automatic nonverbal analysis of social interaction in small groups: A review," Image and Vision Computing, vol. 27, no. 12, pp. 1775–1787, 2009.

[6] T. Choudhury and S. Basu, "Modeling conversational dynamics as a mixed memory Markov process," in Proc. Int. Conf. Neural Information Processing Systems (NIPS), 2004.

[7] K. Otsuka, J. Yamato, Y. Takemae, and H. Murase, "A probabilistic inference of multiparty-conversation structure based on Markov-switching models of gaze patterns, head directions, and utterances," in Proc. Int. Conf. Multimodal Interfaces (ICMI), pp. 191–198, 2005.

[8] D. B. Jayagopi and D. Gatica-Perez, "Mining group nonverbal conversational patterns using probabilistic topic models," IEEE Trans. Multimedia, vol. 12, no. 8, pp. 790–802, 2010.

[9] S. O. Ba and J.-M. Odobez, "Recognizing visual focus of attention from head pose in natural meetings," IEEE Trans. Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 39, no. 1, pp. 16–33, 2009.

[10] R. Subramanian, J. Staiano, K. Kalimeri, N. Sebe, and F. Pianesi, "Putting the pieces together: Multimodal analysis of social attention in meetings," in Proc. Int. Conf. ACM Multimedia, p. 659, 2010.

[11] D. Olguin, P. A. Gloor, and A. S. Pentland, "Capturing individual and group behavior with wearable sensors," in AAAI Symp. Human Behavior Modeling, 2009.

[12] B. Lepri, K. Kalimeri, and F. Pianesi, "Honest signals and their contribution to the automatic analysis of personality traits - a comparative study," in Human Behavior Understanding, vol. 6219 of Lecture Notes in Computer Science, Springer Berlin / Heidelberg, 2010.

[13] M. Zancanaro, B. Lepri, and F. Pianesi, "Automatic detection of group functional roles in face to face interactions," in Proc. Int. Conf. Multimodal Interfaces (ICMI), p. 28, ACM Press, 2006.

[14] W. Dong, B. Lepri, A. Cappelletti, A. S. Pentland, F. Pianesi, and M. Zancanaro, "Using the influence model to recognize functional roles in meetings," in Proc. Int. Conf. Multimodal Interfaces (ICMI), 2007.

[15] H. Salamin, S. Favre, and A. Vinciarelli, "Automatic role recognition in multiparty recordings: Using social affiliation networks for feature extraction," IEEE Trans. Multimedia, vol. 11, no. 7, pp. 1373–1380, 2009.

[16] D. B. Jayagopi, H. Hung, C. Yeo, and D. Gatica-Perez, "Modeling dominance in group conversations using nonverbal activity cues," IEEE Trans. Audio, Speech, and Language Processing, vol. 17, no. 3, pp. 501–513, 2009.

[17] H. Hung and D. Gatica-Perez, "Estimating cohesion in small groups using audio-visual nonverbal behavior," IEEE Trans. Multimedia, vol. 12, no. 6, pp. 563–575, 2010.

[18] D. Sanchez-Cortes, O. Aran, and M. Schmid Mast, "Identifying emergent leadership in small groups using nonverbal communicative cues," in Proc. Int. Conf. Multimodal Interaction (ICMI), 2010.

[19] S. Feese, B. Arnrich, G. Tröster, B. Meyer, and K. Jonas, "Detecting posture mirroring in social interactions with wearable sensors," in Proc. Int. Symp. Wearable Computers (ISWC), 2011.

[20] C. C. Schermuly, Das Instrument zur Kodierung von Diskussionen (IKD) - Untersuchung der psychometrischen Qualität und experimenteller Einsatz zur Prüfung des Empowermentkonstrukts. PhD thesis, Humboldt-Universität zu Berlin, 2011.

[21] C. C. Schermuly and W. Scholl, Das Instrument zur Kodierung von Diskussionen (IKD). Hogrefe, 2011.

[22] B. Bass and B. Avolio, Multifactor Leadership Questionnaire for Research: Permission Set. Mindgarden, 1995.

[23] F. Eyben, M. Wöllmer, and B. Schuller, "OpenSMILE: The Munich versatile and fast open-source audio feature extractor," in Proc. Int. Conf. ACM Multimedia, p. 1459, 2010.


5
Noise Robust Speech Activity Detection

Sebastian Feese, Gerhard Tröster

Robust Voice Activity Detection for Social Sensing

Proceedings of the Conference on Pervasive and Ubiquitous Computing Adjunct Publication (UbiComp Adjunct), pp. 931–938, 2013.

© ACM 2013.


Abstract

The speech modality is a rich source of personal information. As such, speech detection is a fundamental function of many social sensing applications. Simply the amount of speech present in our surroundings can give indications about our sociability and communication patterns. In this work, we present and evaluate a speech detection approach utilizing dictionary learning and sparse signal representation. Transforming the noisy audio data to the sparse representation with a dictionary learned from clean speech data, we show that speech and non speech can be discriminated even in low signal-to-noise conditions with up to 92 % accuracy. In addition to an evaluation with simulated data, we evaluate the algorithm on a real-world data set recorded during firefighting missions. We show that the speech activity of firefighters can be detected with 85 % accuracy when using a smartphone that was placed in the firefighting jacket.

5.1 Introduction

Speech is an important modality which reveals personal information, e.g. about our state of mind, our emotions and our connection to others. In groups of persons, communication patterns can indicate social structure and can characterize relationships. Within work teams, simply the amount of speech can indicate explicit coordination.

Within the interdisciplinary SNSF-funded research project "Micro-level behavior and team performance", we apply social sensing to team research. One of our goals is to measure communication patterns in first responder teams such as firefighters automatically with the smartphone. Noisy work environments and the placement of the smartphone in the firefighting jacket require a robust voice activity detection in order to estimate the amount of communication accurately in the field. The fact that the detection system must work across various noise types and at different signal-to-noise levels renders the detection task challenging.

In order to detect speech in noisy ambient sound recorded with the smartphone, we rely on dictionary learning and sparse representation. Our contributions are the following:

1. We present a noise robust voice activity detection system based on dictionary learning and sparse representation.


2. We evaluate the approach on simulated data using the TIMIT and NOISEX-92 databases.

3. We test the voice activity detection algorithm on ambient sound data recorded with the smartphone during firefighting trainings.

5.2 Related Work

In recent years the smartphone became a true sensing platform and enabled ubiquitous sensing of human behavior. Previous research has shown how user context and behavior can be inferred from different sensor modalities. In particular, ambient sound proved to be a rich source of personal information. The built-in microphone of smartphones was utilized to sense ambient sound patterns [1], to recognize emotions of the user [2], to detect user conversations [3], as well as to indicate levels of sociability as one factor of well-being [4]. However, most of these applications were designed for office use, where noise conditions are at an acceptable level to make inferences about personal states. Outdoors, for example on noisy streets, even the detection of speech becomes challenging.

In the signal processing community, various robust voice activity detectors have been developed which work in noisy environments. For example, the long-term-spectral-variability (LTSV) measure introduced in [5] captures the non-stationarity of signals. Because speech and noise exhibit different levels of non-stationarity, the measure can be used for voice activity detection. Recently, an approach for robust voice activity detection based on dictionary learning and sparse representation was proposed in [6]. In this work, we compare the two approaches to detect speech / non speech in ambient sound recordings collected with the smartphone placed in a jacket pocket.

5.3 Noise Robust Speech Detection

5.3.1 Approach

Our approach to detect speech in noisy environments utilizes dictionary learning and a sparse representation of the noisy audio signal. Using a dictionary learned on clean speech data, the sparse representation approximates speech better than noise signals and thus can be used for speech detection. Because of the sparsity constraints, speech


Figure 5.1: Sparse representation of noisy speech. Top: spectrogram of the noisy signal; Middle: sparse coefficients; Bottom: total coefficient energy per frame smoothed with short and long term sliding windows.

can be detected even in low signal-to-noise conditions when speech is barely audible. The approach is illustrated in Figure 5.1. In the example, one sentence from the TIMIT database [7] was mixed with three different noise types from the NOISEX-92 database [8] at −10 dB SNR. From the spectrogram, the difficulty of the detection at such a low noise level becomes apparent. However, in the sparse representation the detection task becomes feasible, as the squared coefficients highlight voiced parts of the spoken sentence. As can be seen, the total energy of the sparse coefficients is much higher for speech than for noise and peaks at the voiced parts of speech (bottom). Comparing short-term and long-term averages of the total coefficient energy has been shown to robustly detect speech [6].

5.3.2 Recognition Chain

The speech detection chain is presented in Figure 5.2. The audio signal is framed using a hamming window and then transformed into the sparse representation. Frames of the sparse representation are used to calculate features on longer windows which are fed into a classifier for speech / non speech detection.


Figure 5.2: Proposed voice activity detection chain: the audio source is framed, transformed into the sparse representation using the clean speech dictionary, windowed, and summarized by features which a classification model maps to speech / no speech.

5.3.3 Dictionary Learning

To learn a dictionary from data, one searches the dictionary D_opt that best represents the training data X = [x_1, .., x_n], while having a sparse solution. This is expressed by the l1-sparse-coding problem:

$$D_{opt} = \arg\min_{D,\alpha} \frac{1}{n} \sum_{i=1}^{n} \left( \lVert x_i - D\alpha_i \rVert_2^2 + \lambda \lVert \alpha_i \rVert_1 \right), \quad (5.1)$$

where λ is the regularization parameter corresponding to the effective sparsity of the solution.

For learning the dictionary from clean speech data, we randomly sample frames x_i of length fl from 200 randomly selected sentences of the TIMIT database [7]. Because voiced parts of speech are most discriminative in noisy conditions, we only consider frames that include at least 80 % of voiced speech. Each frame is multiplied by a hamming window. In total, we sample 10^6 frames for each considered frame length. We use the online method of Mairal et al. [9] to solve Equation 5.1.

To illustrate the learned dictionary, we present in Figure 5.3 the spectrogram of each dictionary atom. As can be seen, the learned dictionary atoms appear to be similar to voiced parts of speech.


Figure 5.3: Learned dictionary (k = 50, fl = 100 ms): For each of the 50 atoms the spectrogram is presented.

5.3.4 Sparse Representation

To find a sparse representation α = [α_1, .., α_n] of the audio signal given by frames [x_1, .., x_n], one needs to find the coefficients that minimize the representation error while being sparse. Similar to above, this is expressed by:

$$\alpha = \arg\min_{\alpha} \frac{1}{n} \sum_{i=1}^{n} \left( \lVert x_i - D_{opt}\alpha_i \rVert_2^2 + \lambda \lVert \alpha_i \rVert_1 \right) \quad (5.2)$$

5.3.5 Classification

The classification of speech / non speech is done on hopping windows of length W and step size S. Having observed that voiced speech has high energy coefficients in the sparse representation whereas noise signals have low energy coefficients (compare Figure 5.1), we compute the maximum total coefficient energy within a window and subtract the median value for reasons of normalization. Normalization is necessary due to different noise types. In formula, the feature for each window i


is given by:

$$f(i) = \max_{i \le j < i+W} e(j) - \operatorname{median}_{i \le j < i+W} e(j), \quad (5.3)$$

where e(j) = ‖α_j‖₂² is the total coefficient energy of frame j.

For classification we use logistic regression. All results reported below used 10-fold cross-validation.

5.4 Evaluation

For the evaluation we randomly selected 96 sentences of the TIMIT database that were not previously used for the dictionary learning and concatenated them with 3 seconds of silence in between. The selected sentences had an average length of 3.25 ± 0.95 seconds. The clean speech data was mixed at three different SNRs (0, −5, −10 dB) with 12 noise types of the NOISEX-92 database, from which we did not include 'babble' and 'destroyerop' because they included speech. SNR was only calculated when speech was present. In total 6 h of audio data were used for the evaluation, which was done independently of noise type and SNR.
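A sketch of the mixing step under these conventions: computing the SNR only over speech-active samples follows the text, while the function and variable names are illustrative assumptions:

import numpy as np

def mix_at_snr(speech, noise, snr_db, speech_active):
    # scale noise so that the speech-to-noise power ratio over
    # speech-active samples equals snr_db; speech_active is a boolean mask
    p_speech = np.mean(speech[speech_active] ** 2)
    p_noise = np.mean(noise[speech_active] ** 2)
    target = p_speech / (10 ** (snr_db / 10.0))  # required noise power
    return speech + noise * np.sqrt(target / p_noise)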

5.4.1 Dictionary Parameters

To find suitable dictionary parameters, we compared the detection accuracy for different frame lengths and dictionary sizes. All other parameters were fixed: the regularization parameter λ was set to 0.15, the frame overlap to 50 %, the window length W to 1 s and the step size S to 100 ms. In Figure 5.4 the accuracies of the different combinations are presented. It can be seen that the best detection performance on 1 s long prediction windows is reached at a frame length of fl = 100 ms and a dictionary size of k = 50.

5.4.2 Comparison to LTSV

We compared the presented approach to another robust voice activity detector based on the long-term-spectral-variability measure introduced in [5]. LTSV was calculated with parameters as presented in [5] on frames of 20 ms length and a step size of 10 ms. Similar to above, the frame based measure was aggregated on a longer window length. For each window the root-mean-square over all LTSV-frames included in one window was calculated and classified using logistic regression.


Figure 5.4: Average detection accuracies [%] across different noise types and levels at different dictionary sizes and frame lengths:

dictionary size k    20 ms   40 ms   80 ms   100 ms
50                   72.8    77.7    85.9    87.4
100                  71.8    76.1    80.9    81.5
200                  69.8    74.8    79.0    80.1
400                  68.0    73.9    77.7    77.7

The results are presented in Figure 5.5 for different window lengths. As can be seen, LTSV is better than the sparse representation approach (SR) for window lengths shorter than 800 ms, whereas for longer windows SR is better. In both cases the detection accuracy increases with longer windows. That LTSV is better at shorter window sizes is due to the fact that internally LTSV includes information of 1 s long windows.

5.5 Speech Detection during Firefighting

In order to sense the amount of team communication within first responder teams, we have tested the speech detection algorithm on audio data that was recorded during a one day training of firefighters.

5.5.1 Experiment

The experiment was conducted in the fire simulation building at the training facilities of the Zurich fire brigade, where a variety of fire scenarios can be realistically simulated, ranging from kitchen fires to a burning car in the garage. In the chosen scenario a kitchen fire on the third floor of the training building had to be extinguished.

Two teams of a voluntary fire brigade completed the scenario one after the other. Each team consisted of five firefighters including the


Figure 5.5: Comparison of average detection accuracies across different noise types and levels at different window lengths (400–2000 ms) for SR and LTSV. k = 50, fl = 100 ms, S = 50 ms.

incident commander (IC) who led mission operations and the troop leader (TL) who led the troop that went inside the building to extinguish the fire. To coordinate mission operations, incident commander and troop leader had to communicate. Impressions of the scenario are shown in Figure 5.6.

For data collection, we used the Sony Xperia Active smartphone and a custom Android app. Based on the funf-open-sensing-framework (http://funf.org/), we designed an Android app to record ambient sound data at a sample rate of 11 250 Hz, which was later down-sampled to 8 kHz. The phone was placed in the left pocket of the firefighting jacket (see Figure 5.6) where firefighters were used to carrying their mobile phone. For more details on the experiment please refer to [10].

5.5.2 Test on Firefighting Noise

In order to test the accuracy of the detection algorithms on noise types that are observed during firefighting, we manually selected eight different noise snippets from the ambient sound recorded during the training missions. These included engine noise of the fire truck, rustling noise when walking, and background noise of the fire house such as a loud



Figure 5.6: Smartphone placement and impressions of the firefighting training scenario (incident commander IC and troop leader TL).


Figure 5.7: Voice activity detection accuracies when clean speech was mixed with typical noise types observed during a firefighting training mission. Left: accuracy over window length (400–2000 ms) for SR and LTSV; Right: accuracy at 2 s windows over SNR (0, −5, −10 dB).

fan, and breathing noise when using the self contained breathing apparatus. As above, these noise types were mixed with the same clean speech data at the three different noise levels. In Figure 5.7 the results on the simulated noisy speech data are presented. It can be seen that the SR approach to speech detection is also robust to typical noise types observed in a firefighting training scenario. The average accuracy over all SNRs and noise types at a window length of 2 s is above 90 %. Compared with the LTSV approach the detection accuracies are about 8 % higher.

5.5.3 Test on Firefighting Audio Data

To test our speech detection algorithm on noisy speech data observed during firefighting, we manually labeled 50 min of the recorded audio data for speech presence of the incident commanders and troop leaders.

In the left of Figure 5.8 the accuracies are shown. As can be seen, the voice activity detection works about 10 % better for the incident commanders (IC1, IC2) who were outside the building compared to the troop leaders (TL1, TL2) who were inside. This difference in detection accuracy can be explained by different levels of environmental noise, as the building ventilation was very noisy. In the right of Figure 5.8


Figure 5.8: Continuous voice activity detection results for ambient sound data recorded during firefighting training, per firefighter (TL1, TL2, IC1, IC2) over window lengths of 1000–2000 ms. Left: accuracy; Right: event-based F-measure.

the event-based F-measure is presented. At a window length of 1 s, the F-measure is above 85 % for all four firefighters, meaning that only very few speech events were inserted or deleted.

5.6 Conclusion

We have presented a robust voice activity detection algorithm which is based on sparse representation. To best represent speech, we used a dictionary learned from clean speech data. The evaluation on simulated noisy speech data proved the approach robust even in low signal-to-noise conditions. On average, accuracies of 87 % and 92 % were reached for window lengths of one and two seconds, respectively. To test real-world noisy scenarios, we applied the detection algorithm to ambient sound data which was recorded during firefighting trainings. On average an accuracy of 85 % and an event-based F-measure of 91 % were obtained at a window length of 1 s. Future work should address the problem of speaker diarization in low signal-to-noise conditions.


Bibliography

[1] H. Lu, W. Pan, N. D. Lane, T. Choudhury, and A. T. Campbell, "SoundSense: Scalable sound sensing for people-centric applications on mobile phones," in Proc. Int. Conf. Mobile Systems, Applications, and Services (MobiSys), 2009.

[2] K. K. Rachuri, C. Mascolo, P. J. Rentfrow, and C. Longworth, "EmotionSense: A mobile phones based adaptive platform for experimental social psychology research," in Proc. Int. Conf. Ubiquitous Computing (UbiComp), 2010.

[3] H. Lu, A. J. B. Brush, B. Priyantha, A. K. Karlson, and J. Liu, "SpeakerSense: Energy efficient unobtrusive speaker identification on mobile phones," in Proc. Int. Conf. Pervasive Computing (Pervasive), 2011.

[4] N. Lane, M. Mohammod, M. Lin, X. Yang, H. Lu, S. Ali, A. Doryab, E. Berke, T. Choudhury, and A. Campbell, "BeWell: A smartphone application to monitor, model and promote wellbeing," in Proc. Int. Conf. Pervasive Computing Technologies for Healthcare (PervasiveHealth), 2011.

[5] P. K. Ghosh, A. Tsiartas, and S. Narayanan, "Robust voice activity detection using long-term signal variability," IEEE Trans. Audio, Speech, and Language Processing, vol. 19, no. 3, pp. 600–613, 2011.

[6] D. You, J. Han, G. Zheng, and T. Zheng, "Sparse power spectrum based robust voice activity detector," in Proc. Int. Conf. Acoustics, Speech, and Signal Processing (ICASSP), 2012.

[7] J. S. Garofolo, L. F. Lamel, W. M. Fisher, J. G. Fiscus, D. S. Pallett, N. L. Dahlgren, and V. Zue, "TIMIT acoustic-phonetic continuous speech corpus," 1993.

[8] A. Varga and H. J. M. Steeneken, "Assessment for automatic speech recognition II: NOISEX-92: A database and an experiment to study the effect of additive noise on speech recognition systems," Speech Communication, vol. 12, pp. 247–251, 1993.


[9] J. Mairal, F. Bach, J. Ponce, and G. Sapiro, "Online dictionary learning for sparse coding," in Proc. Int. Conf. Machine Learning (ICML), 2009.

[10] S. Feese, B. Arnrich, M. Rossi, M. Burtscher, B. Meyer, K. Jonas, and G. Tröster, "Towards monitoring firefighting teams with the smartphone," in Proc. Int. Conf. Pervasive Computing and Communications (PerCom): Work in Progress, 2013.


6
Sensing Group Proximity Dynamics

Sebastian Feese, Bert Arnrich, Bertolt Meyer, Klaus Jonas, Gerhard Tröster

Sensing Group Proximity Dynamics of Firefighting Teams using Smartphones

Proceedings International Symposium on Wearable Computers (ISWC), pp. 97–104, 2013.

Copyright is held by the authors. Publication rights licensed to ACM.


Abstract

Firefighters work in dangerous and unfamiliar situations under a high degree of time pressure and thus team work is of utmost importance. Relying on trained automatisms, firefighters coordinate their actions implicitly by observing the actions of their team members. To support training instructors with objective mission data, we aim to automatically detect when a firefighter is in sight of other firefighters and to visualize the proximity dynamics of firefighting missions. In our approach, we equip firefighters with smartphones and use the built-in ANT protocol, a low-power communication radio, to measure proximity to other firefighters. In a second step, we cluster the proximity data to detect moving sub-groups. To evaluate our method, we recorded proximity data of 16 professional firefighting teams performing a real-life training scenario. We manually labeled six training sessions, involving 51 firefighters, to obtain 79 minutes of ground truth data. On average, our algorithm assigns each group member to the correct ground truth cluster with 80% accuracy. Considering height information derived from atmospheric pressure signals increases the group assignment accuracy to 95%.

6.1 Introduction

During firefighting missions each firefighter fulfills a specific function and relies on his peers. Firefighting teams usually split into sub-groups to work in parallel on different tasks. Depending on mission complexity and the commander's strategy, these sub-groups are more or less stable and can merge and split again at any time. As the whole firefighting team works towards a common goal, it is crucial that the sub-groups coordinate their actions. However, coordination between members of different sub-groups is complicated by the fact that they might not be in visual contact.

Wearable computing can provide details on these group dynamics by automatically measuring how the group structure changes during a mission. A graphical representation of who was when in close proximity to whom illustrates mission development over time, allowing instructors to pinpoint possible coordination problems, which can be addressed in further trainings.

In this paper, we present a methodology to measure and visualize group proximity dynamics of firefighting teams. Using the built-in


ANT (www.thisisant.com) radio of smartphones, we scan nearby devices fast and efficiently in order to detect sub-groups based on the measured proximity. In particular, we make the following contributions:

1. We investigate the use of the low-power ANT radio to measure proximity between individuals and detail our search strategy to detect nearby devices. Further, we characterize discovery time and search distance.

2. We present a methodology to cluster moving sub-groups within action teams using ANT-based proximity information and extend the approach to also incorporate height information derived from atmospheric pressure signals.

3. For an easy understanding of group dynamics, we visualize the group clusters over time in the form of narrative charts which represent who was when in a sub-group with whom.

4. We evaluate our group clustering algorithms in real-world firefighting training sessions and compare the results to manually annotated ground truth. We further show how a firefighting training mission evolves over time and highlight important steps of the mission.

6.1.1 Related Work

Several research projects funded by the European Union aimed at supporting and increasing the work safety of firefighters. The ProeTEX project [1] developed a smart textile to monitor the physiological status of firefighters. To support tactical navigation under poor visibility, a beacon based relative positioning system was proposed during the wearIT@work project [2]. To better integrate current practices of firefighting brigades, the approach was adapted in the ProFiTex project [3] and resulted in a Smart Lifeline which enabled relative positioning. The NIST Smart Firefighting Project [4] combines smart building technology, smart firefighter equipment and robotics. Like in previous projects the aim is to provide real-time information on firefighter location, firefighter vital signs, and environmental conditions. The Fire Information and Rescue Equipment project [5] at UC Berkeley combined wireless



sensor networks (WSN) and head-mounted displays to support firefighters. A pre-installed WSN enabled room-level localization of emergency responders within a building [6]. The benefits and drawbacks of pre-installed location systems, wireless sensor systems and inertial tracking systems for emergency responders were compared in [7].

In contrast to the above systems, we focus on group proximity rather than on localization to capture mission development and team activity. Our primary goal is to support post incident feedback with objective mission data. Although previous system prototypes were tested in simulated scenarios, none of them were used in real-world trainings. In this paper, we deploy and evaluate our method in real-world training sessions.

In the data mining community, spatio-temporal data is mined for moving objects by clustering methods which combine time and location information [8, 9, 10]. Kalnis et al. [9] split trajectory data in time slices and used a density based clustering method to group close objects. Similar clusters found in consecutive time slices were then considered as a moving cluster of objects. In previous work [11], we have extended the approach to handle noisy data and applied the clustering method to GPS-trajectories of people walking in groups through a city.

In the field of reality mining, the works by Eagle and Pentland first explored the use of the mobile phone to measure proximity to others using repeated Bluetooth scans [12]. They showed that communities and daily routines of persons can be identified from Bluetooth proximity networks. More recent work discovered human interactions from proximity networks using topic models [13]. In both approaches, the measured interaction data is aggregated in time slices of at least 10 min duration and thus the discovered patterns are on an even larger time scale.

Contrary to previous work, we use the low-power ANT protocol to scan for nearby devices. This allows us to detect devices in close proximity much faster, usually in less than 600 ms compared to the 30 s of a typical Bluetooth scan. This time resolution, increased by a factor of up to 50, enables us to measure in real-time how groups of firefighters split and merge during a mission.

6.2 ANT-based Proximity

ANT, similar to Bluetooth Low Energy, is an ultra low-power, low bandwidth wireless protocol which operates in the 2.4 GHz range.


Contrary to Bluetooth Low Energy, it allows a node to be master and slave at the same time and thus supports many network configurations. Currently, ANT is mostly used in fitness devices such as chest-belts and pedometers. ANT chips support up to eight logical channels on one physical 2.4 GHz radio link using time division multiplexing. Each ANT channel is identified by a tuple of network ID, type ID and device ID. Configuring a channel includes setting the ID, the frequency and the period.

6.2.1 Search Strategy

ANT offers different strategies to discover other devices: one can search for a device with a known ID, search for devices which match certain properties, e.g. are of a particular type, search near devices using proximity search, and one can utilize background searches. However, because the ANT chip included in our Sony Xperia Active phones supported neither proximity nor background search mode, we implemented a list search strategy. On each device one master channel constantly transmits a device ID with a specified period, and seven slave channels are used to search in parallel for devices specified in a search list.

In Figure 6.1 the search strategy is presented in the form of a UML activity diagram. Given a list of devices to search for, the first device ID is popped from the list, a slave channel with the desired device ID is opened and a search timer is started. In case a device is found or the search times out, the channel is closed, the device ID is appended at the end of the list, the next device ID is popped from the list and the channel is reopened. In case the device is found, the received signal strength indicator (RSSI) is saved.

The implemented list search is a simple device discovery strategy. It is robust, as the search result of one device is independent of the search results of other devices. But the search strategy does not scale to a large search list, as it takes time to find all devices before a device can be searched again. To handle large search lists, one could utilize a collaborative search strategy, in which devices also send their known neighbours to reduce the number of devices that have to be searched on each device. A minimal sketch of the rotation performed by one slave channel follows.
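The sketch below assumes a hypothetical search_channel(device_id, timeout_s) helper that wraps opening the ANT slave channel, waiting for the device or a timeout, and closing the channel again; it returns the RSSI on success and None on timeout.

from collections import deque

def list_search(device_ids, search_channel, timeout_s=2.0):
    # Continuous list-search rotation: pop the next device ID,
    # search it, log the RSSI if found, and re-append the ID so
    # every device is searched in turn (runs until interrupted).
    queue = deque(device_ids)
    while queue:
        device_id = queue.popleft()                  # pop from search list
        rssi = search_channel(device_id, timeout_s)  # None on timeout
        if rssi is not None:
            print(device_id, rssi)                   # log (device ID, RSSI)
        queue.append(device_id)                      # append at the end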


Figure 6.1: Implemented list search to discover nearby devices, shown as a UML activity diagram: a device ID is popped from the search list, the slave channel is configured and opened and a search timer started; when the device is found the (device ID, RSSI) pair is logged; on found or timeout the channel is closed, the device ID is appended to the search list and the search continues on the next channel.

6.2.2 Search Interval

The maximum search interval SI is the time between two searches of the same device. Fixing the channel frequency, the search interval depends on the number of channels used for searching, as well as on the length of the search list. In the best case, all devices that are searched are also discovered and no channel is blocked until a search timeout occurs. The search interval is then given by:

$$SI(x) = \left\lceil \frac{x}{\#\,\text{search channels}} \right\rceil \cdot SI(1), \quad (6.1)$$

where x is the number of searched devices, ⌈.⌉ is the integer ceiling function and SI(1) is the maximum time that it takes to search one present device. In the worst case, only one of the searched devices is present and timeouts will occur. The search interval is then given by:

$$SI(x) = \left( \left\lceil \frac{x}{\#\,\text{search channels}} \right\rceil - 1 \right) \cdot t_{st} + SI(1), \quad (6.2)$$

where t_st is the time of the search timeout.

6.2. ANT-based Proximity 107

Figure 6.2: Search interval distribution. Left: when searching for 1–7 devices, with one or 12 devices present. Right: when searching for 8–12 devices; best case: 12 devices present; worst case: only one device present. Device IDs were transmitted at 6 Hz.

To evaluate the search interval in best and worst case scenarios,

we measured the search interval using seven search channels. In the best case, 12 devices were present and continuously transmitted their device IDs six times per second. In the worst case only one device was present and transmitted its device ID. For both cases, we repeatedly measured the search interval over a period of ten minutes, each time increasing the number of devices to search for from 1 to 12. For each configuration, we randomly sampled 250 search intervals, totaling 6000 search intervals in our analysis.

With seven search channels, up to seven devices can be searched in parallel and the worst and best case search intervals do not differ. The distribution of measured search intervals is shown in the left of Figure 6.2. On average, devices are found again within 600 ms and within SI(1) = 1500 ms at most. We therefore set the search timeout t_st conservatively to 2000 ms. In case more than seven devices are searched, the search interval will depend on the number of present devices. In the best case, all devices that are searched are present; in the worst case, only the device at the end of the search list is present, and the search timer times out at least once until the device is found. The distribution of measured search intervals is shown in the right of Figure 6.2.


Figure 6.3: Average relationship between RSSI (−85 to −50 dBm) and distance (0.5–20 m) in three different environments (free-space, corridor, garage) as measured between two ANT-enabled smartphones. Transmit power was set to 0 dBm.

6.2.3 Search Distance

The free-space path loss is proportional to the square of the distance between the transmitter and receiver. However, the received signal strength (RSS) is generally not proportional to the distance due to the influence of other parameters such as variation of transceivers, antenna orientation, height of transceivers and other environmental factors [14, 15, 16]. To illustrate the problem, we measured the RSSI for different distances between receiver and transmitter in three different environments. Figure 6.3 shows the average RSSI measured at different receiver-transmitter distances. As can be seen, the RSSI does not decrease monotonically with increasing distance. Because of these nonlinear effects, we decided to ignore the RSSI level and to consider persons to be in proximity if messages arrive at all.

In order to reduce the maximum communication distance, we set the transmit power as low as possible, to −20 dBm. With this setting, we tested the maximum distance at which messages are still received from the transmitting device in different scenarios. In the distance experiments, two persons held the smartphone in the hand in front of their


upper body and either faced each other, turned their backs to each other, or looked in the same direction, so that one person looked at the back of the other person. The tests were conducted in different environments: an office corridor with metal cupboards to the left and right, a garage with pillars, a foyer of a university building and outside in an alley between two buildings. In all cases, there were no other objects between the two persons. On average, we observed about 1 m range for back-to-back, 1 m to 4 m range for face-to-back, and 9 m to 20 m range for face-to-face configurations. In our targeted application of monitoring sub-groups of firefighters during missions these maximum distance ranges seem reasonable as, for example, troop members usually work within reach, holding on to each other for safety reasons. Firefighters that operate outside the building can be considered to be part of one sub-group as long as they can see each other and are within 20 m distance.

6.3 Group Clustering

Our approach to detect moving groups over time from the recorded proximity data is illustrated in Figure 6.4. Like [11], the approach consists of two steps: first, the proximity data is clustered independently for each time slice; second, the clustering output is smoothed using temporal filtering.

6.3.1 Group Clustering

For each time slice t, the binary elements of the proximity matrix D^t_ij indicate if device i received any message from device j within the last period P. Because we are not interested in directed links, we symmetrize the proximity matrix by adding the transpose:

$$D^t_{sym} = D^t + (D^t)^T. \quad (6.3)$$

Based on the symmetrized proximity matrix, each time slice is clustered by the single-link criterion, so that all connected pairs are merged to one cluster using Algorithm 1. In principle, for each of the N individuals i, one cluster is created containing i and all its neighbours. In case there already exists another cluster that contains at least one of the current members, the two clusters are merged.


Figure 6.4: Group clustering: At each time step the proximity matrix D^t_sym indicates who is in proximity to whom and groups are detected based on the single-link criterion. The results of each time step are then smoothed by a temporal filter. In the example two groups are present.

6.3.2 Considering Height Levels

Using only radio based proximity information might lead to individuals on different height levels being clustered into one group. However, depending on the goal of the clustering, this might not be desired. To consider height levels, we additionally take the absolute atmospheric pressure difference APD^t_ij between individuals i and j at time step t into account. If APD^t_ij is smaller than a threshold value τ_height, individuals i and j are considered to be on the same level, which is expressed by L^t_ij = [APD^t_ij < τ_height]. If height differences should be considered during clustering, each element of the proximity matrix D^t has to be multiplied with the corresponding element of L^t at each time step t:

$$D^t_{sym} = D^t \circ L^t + (D^t \circ L^t)^T, \quad (6.4)$$

where ∘ denotes element-wise matrix multiplication.


Algorithm 1 clustering of time slice L^t

function cluster(L^t)
    C^t = ∅
    for i = 1 : N do
        nc = {i}
        for j = i + 1 : N do
            if L^t_ij then
                nc = nc ∪ {j}
            end if
        end for
        C^t = merge(C^t, nc)
    end for
    return C^t
end function

function merge(clusters, nc)
    mc = ∅
    for c ∈ clusters do
        if c ∩ nc = ∅ then
            mc = {mc, c}
        else
            mc = {mc, c ∪ nc}
        end if
    end for
    return mc
end function
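The following is a minimal Python sketch of the per-slice grouping, combining the symmetrization of Eq. (6.3), the optional height mask of Eq. (6.4) and the single-link merging of Algorithm 1, here realized as connected components over the binary matrix; the array shapes and the pressure input are assumptions:

import numpy as np

def cluster_time_slice(D, pressure=None, tau_height=1.0):
    # D: binary proximity matrix (n x n); pressure: per-person
    # atmospheric pressure in hPa (optional height masking)
    D = np.asarray(D, dtype=bool)
    if pressure is not None:
        L = np.abs(pressure[:, None] - pressure[None, :]) < tau_height
        D = D & L                          # Eq. (6.4), element-wise mask
    sym = D | D.T                          # Eq. (6.3) for binary links
    n = len(sym)
    labels = -np.ones(n, dtype=int)        # cluster label per individual
    current = 0
    for i in range(n):                     # connected components =
        if labels[i] >= 0:                 # single-link clusters
            continue
        stack = [i]
        labels[i] = current
        while stack:
            u = stack.pop()
            for v in np.where(sym[u])[0]:
                if labels[v] < 0:
                    labels[v] = current
                    stack.append(v)
        current += 1
    return labels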

6.3.3 Temporal Smoothing

As we are interested in clusters that persist for at least τ time steps, we smooth the individual clusterings by applying a temporal filter as suggested in [11]. At each time step, a group cluster is either an active or a potential cluster. A group cluster is considered an active cluster if it has persisted for at least τ time steps, and a potential cluster otherwise. Only if a potential cluster has lasted longer than τ time steps is it promoted to an active cluster. In case an active cluster has not been detected in any of the previous δ time steps, it is deleted. At each time step the active clusters are taken as the smoothed output. If at any time step a person is not assigned to an active cluster, the clustering of the previous time step is used instead. A sketch of this filter follows.
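A minimal sketch of the temporal filter, representing each cluster as a frozenset of member ids and counting τ and δ in time slices (the evaluation below gives them in seconds; dividing by the slice period is an assumed conversion):

def temporal_filter(slices, tau=2, delta=1):
    # slices: iterable of sets of frozensets (per-slice clusterings)
    seen = {}      # cluster -> consecutive detections
    missing = {}   # active cluster -> steps since last detection
    active = set()
    output = []
    for clusters in slices:
        for c in clusters:
            seen[c] = seen.get(c, 0) + 1
            if c in active or seen[c] >= tau:   # promote / refresh
                active.add(c)
                missing[c] = 0
        for c in list(active):                  # age out stale clusters
            if c not in clusters:
                missing[c] += 1
                if missing[c] > delta:
                    active.discard(c)
        for c in list(seen):                    # persistence is consecutive
            if c not in clusters:
                seen[c] = 0
        output.append(set(active))
    return output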


Figure 6.5: Visualization of group clustering over time in the form of a narrative chart, with marked starts and stops of group clusters as well as splits and merges. Individuals that are in proximity are represented by closely spaced colored lines. Groups of lines that are apart from one another represent groups of individuals not in proximity.

6.3.4 Visualization

Based on the idea of narrative charts (http://xkcd.com/657), which display when characters in a movie appear together, we visualize the proximity of group members over time. An example of a narrative chart is presented in Figure 6.5. Each individual is represented by one line of different color. Individuals who form a group cluster are represented by lines which are close together. Each start and end of a group is represented by a vertical bar. When individuals change groups, splits and merges occur. Because we do not measure the distance between individuals, the space between lines does not correspond directly to distance but indicates different groups of individuals.

6.4 Firefighting Experiment

In close collaboration with a professional fire brigade, we examined how our group proximity clustering methods perform in training sessions of professional firefighters and how the proximity dynamics can be used to analyze the training sessions. In this section, we explain the conducted experiment and describe the training scenario.

The experiment took place in a fire simulation building in which a variety of incidents, ranging from kitchen fires to burning cars in the garage, can be staged. During trainings, firefighters are confronted



Figure 6.6: Training scenario (steps 1–6; IC: incident commander, TL: turntable ladder, E: engineer). Firefighters had to enter through the roof window and navigate blindly to the fire below a spiral staircase. On the way to the fire a dummy person had to be found and rescued.

with real fires, extreme heat, high humidity and thick smoke that severely restricts visibility. Together with the training instructors, we designed a non-standard training scenario with increased difficulty to ensure that different teams would not perform equally well.

Each firefighter has a specific role which is fixed to the seating position in the fire trucks. The incident commander (IC) leads the operation and is in charge. On-site, the driver of the turntable ladder (TL) is responsible for operating the ladder, whereas the driver of the fire truck becomes the engineer (E) who operates the water pumps. The engineer keeps track of which firefighter uses the self-contained breathing apparatus (SCBA). All other firefighters are part of a troop. The two troops are led by a troop leader (T1a, T2a) and contain one or two additional firefighters (T1b, T1c, T2b, T2c).

In Figure 6.6 the training scenario is illustrated. In the scenario, a fire on the third floor of the training building had to be put out. In the beginning of the training mission, the hose was prepared, the engineer connected the fire hose to the hydrant and the first troop was transported with the turntable ladder to the roof window, which was the only entrance point allowed (see 1, 2). Once the troop was inside the building (see 3), the firefighters had to fight against the heat of the fire while maneuvering from the fourth floor to the third floor (see 4). On the way towards the fire, an unexpected dummy person had to be found and rescued. As the troop leader was not informed of any missing person, he had to decide how to respond to the new situation. Only after the dummy person was safe should the fire have been extinguished, either by the first troop or by a second troop ordered to do so (see 5, 6).

6.5 Evaluation

First, we will qualitatively evaluate the clustering result of one training mission to investigate where the group clustering performs well and where the results do not match the ground truth. Second, we will quantitatively compare the clustering solutions to the ground truth in terms of clustering accuracy.

We successfully recorded 16 training runs of the same scenario and in total 51 professional firefighters took part in our experiment. The duration of the training missions ranged from 10 to 16 minutes. All training runs were videotaped with two regular and one thermographic camera.

In all evaluations, we used the following parameter settings: the slice period P was set to 5 s. For the temporal smoothing, we set the parameter τ to 10 s and δ to 5 s. We set the parameter τ_height to 1 hPa to consider firefighters with a height difference of less than 8 m to work on the same height level.
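As a plausibility check of the τ_height setting: near sea level, atmospheric pressure drops by roughly 0.12 hPa per metre of altitude, i.e. about 8.3 m per hPa, so a 1 hPa threshold indeed corresponds to a height tolerance of roughly 8 m. The short sketch below illustrates this relationship; the gradient constant and the sample readings are illustrative values, not measurements from the experiment.

# Back-of-the-envelope check of the pressure threshold (illustrative values).
HPA_PER_METRE = 0.12   # approximate near-sea-level pressure gradient

def same_height_level(p1_hpa, p2_hpa, tau_height_hpa=1.0):
    """True if two barometer readings differ by less than the threshold."""
    return abs(p1_hpa - p2_hpa) < tau_height_hpa

print(1.0 / HPA_PER_METRE)              # ~8.3 m tolerated by a 1 hPa threshold
print(same_height_level(966.2, 965.8))  # True: considered same height level
print(same_height_level(966.2, 964.9))  # False: considered different levels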

6.5.1 Qualitative Analysis

Figure 6.7 shows how the group structure changes over time within one training mission. Presented are two visualizations: the first one represents the clustering results when only ANT-based proximity information was used, whereas the second one represents the clustering results when ANT-based proximity information was combined with atmospheric pressure signals. The pictures on top of the graphs in Figure 6.7 show important steps of the training mission.

We first look at how well the two clustering solutions capture mission operations. At the start of the mission, two groups are identified which correspond to the seating in the fire trucks. At (a) T2c is on his own while running around the house in order to familiarize himself with the situation behind it. At (1) all firefighters are in sight of each other and form one group when they are preparing the turntable ladder and the quick-attack hose. At (2) the first troop is brought upwards with the turntable ladder, which can be seen in the two clustering results as the first troop is shown to split (T1a, T1b, T1c are separated from the rest).


Figure 6.7: Visualization of the group clustering of a firefighting training mission (time axis in seconds; vertical panels distinguish "Ground Level" and "Above Ground"). Presented are two clustering results. Top: only ANT-based proximity was used. Bottom: ANT-based proximity was combined with atmospheric pressure signals to also consider height differences of firefighters.


At (d) the two clustering solutions begin to differ. As can be seen in picture (d), one of the troop members turns towards the firefighters on the ground and as such connects the two separate groups. As a consequence, the two groups are joined if the clustering is based only on ANT-proximity. At (3) troop members T1a and T1b enter the building through the roof window and begin to navigate, which can be seen in both clustering solutions (4). Around (5) the second troop climbs the turntable ladder. As there is still a connection to the other firefighters outside the building, this cannot be seen in the first clustering, but only in the second one, which considers the height difference. At (7) T2a and T2b enter the building, and because there is no connection to other firefighters this can also be seen in the first clustering solution. This split is seen a few time steps later in the first clustering solution due to the temporal filtering.

To evaluate in which situations the group clustering works well, we now look at the splits and merges of E and T2c, which are equal in both clustering solutions. At (b) and (c) E is shown to be separate from the other firefighters; this is because the line-of-sight to the other firefighters is blocked by some rocks while connecting the hose to the hydrant. At (e) and (f) E is in proximity to IC and TL when standing to the left of the fire truck, whereas E is alone at (g), (h), (i) and (j) when he is behind the fire truck operating the machinery. Turning to T2c, it appears that T2c is on his own at (e) and (f), but from the corresponding pictures we see that he is in fact behind the turntable ladder but turned his body away from the other firefighters. At (g) and (j) T2c turned his body more towards the incident commander at the left side of the picture and as a consequence is in a group with IC and TL.

Summarizing the qualitative analysis, we find that the clustering solution which combines ANT-based proximity with pressure signals is better suited to show how a mission evolves over time, as mission-relevant information is presented more clearly. From the proximity graph, one can easily identify important mission events such as when the turntable ladder reached its final position and when and for how long a troop is operating on levels above ground. However, also from the ANT-based proximity clustering one can infer when and for how long a troop is inside the building. From the examples of E we saw that the group clustering overall corresponds well to the in-sight criterion; however, in the case of T2c, we have seen that a firefighter can be in sight of other firefighters, but because he blocks the radio signals with his body he is not detected to be in proximity with others. The effect of the body blocking the radio signals also became apparent at (d) when the troop used the turntable ladder facing the building.

6.5.2 Quantitative Analysis

For a quantitative analysis of the presented group clustering algorithms, we calculate the accuracy metrics proposed in [11] and evaluate how well the group clustering results match manually annotated ground truth.

Accuracy Metrics

Let the set of individuals be defined as $I = \{1, ..., N\}$ with $N$ being the total number of individuals. Further, let a clustering at time step $t$ be described by the partition $C^t$ and $|C^t|$ be the number of clusters at time step $t \in \{1, ..., T\}$ with total time steps $T$. A particular cluster is indexed by $C^t(i)$. Clusters of the ground truth and the algorithm are indicated by $C^t_{GT}$ and $C^t_A$, respectively.

The Number of Groups Detected Accuracy (NGDA) expresses the fraction of time steps in which the algorithm detects the same number of group clusters $N_c$ as are present in the ground truth:

$$\mathrm{NGDA} = \frac{1}{T} \sum_{t=1}^{T} \left[\, |C^t_{GT}| = |C^t_{A}| \,\right] \qquad (6.5)$$

To measure how far off the algorithm is, we also calculate the Average Number of Groups Detected Error (ANGDE) as the average over all time steps of the absolute difference between the number of groups in the ground truth and as detected by the algorithm:

$$\mathrm{ANGDE} = \frac{1}{T} \sum_{t=1}^{T} \left|\, |C^t_{GT}| - |C^t_{A}| \,\right| \qquad (6.6)$$

The Average Group Assignment Accuracy (AGAA) expresses how well individuals are assigned to the correct group cluster on average over all time steps. For each time step, we calculate the number of correctly assigned individuals $c_t$ as follows: for each ground truth cluster, the predicted cluster with the highest number of shared group members is searched and the number of shared group members is added to the number of correctly assigned individuals. In formula:

$$c_t = \sum_{i=1}^{|C^t_{GT}|} \max_j |C^t_{GT}(i) \cap C^t_{A}(j)|. \qquad (6.7)$$

AGAA is then given by:

$$\mathrm{AGAA} = \frac{\sum_{t=1}^{T} c_t}{N \cdot T}. \qquad (6.8)$$
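The three metrics translate directly into code. The following Python sketch assumes that both the ground truth and the algorithm output are given as lists (one entry per time step) of lists of member sets; the function names and the toy example at the end are invented for illustration.

# Sketch of the accuracy metrics above (Eqs. 6.5, 6.6 and 6.7/6.8).
def ngda(gt, pred):
    return sum(len(g) == len(p) for g, p in zip(gt, pred)) / len(gt)

def angde(gt, pred):
    return sum(abs(len(g) - len(p)) for g, p in zip(gt, pred)) / len(gt)

def agaa(gt, pred, n_individuals):
    correct = 0
    for g_clusters, p_clusters in zip(gt, pred):
        for g in g_clusters:
            # best-matching predicted cluster for this ground-truth cluster
            correct += max(len(g & p) for p in p_clusters)
    return correct / (n_individuals * len(gt))

# toy example with N = 3 individuals and T = 2 time steps
gt   = [[{0, 1}, {2}], [{0}, {1, 2}]]
pred = [[{0, 1, 2}],   [{0}, {1, 2}]]
print(ngda(gt, pred))      # 0.5: correct group count in 1 of 2 time steps
print(angde(gt, pred))     # 0.5: off by one group on average
print(agaa(gt, pred, 3))   # 1.0: every individual matched a best cluster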

Ground Truth Annotation

In order to evaluate our clustering algorithm, we manually labeled the video recordings of six complete training runs in two different ways. The first ground truth places the focus on being in sight of each other: we consider two firefighters to be in proximity when at least one of them can see the other and no object blocks the line-of-sight. The motivation for the first ground truth lies in the fact that implicit coordination requires seeing or hearing each other so that a firefighter can overview the actions of his peers. With the second ground truth we incorporate the height level that a firefighter works on, that is, we distinguish whether a firefighter is on ground level or operates at the top of the turntable ladder or in the building on floors above ground level. Consequently, a firefighter who is on ground level and can see the firefighter on top of the turntable ladder is now not considered to be in proximity with the firefighter on top. Thus, the focus is placed on mission operations, as this ground truth additionally captures who is above ground level.

Group Cluster Accuracy

We applied the group clustering to six annotated runs of the training scenario, totaling 79 minutes of training mission data. The mean and standard deviation of the accuracy metrics across the training missions of six teams are summarized in Table 6.1. On average, the proximity-based clustering algorithm detects the correct number of groups in 37% of all time steps. In terms of NGDA the performance of the group proximity clustering appears to be low; however, one should keep in mind that NGDA is a rather hard accuracy metric, as already the mis-assignment of one individual results in an error at the corresponding time step, even if all other individuals are assigned correctly. From the ANGDE, we conclude that on average the algorithm is close to the number of groups present in the ground truth, meaning that the algorithm detects on average one group too many or too few. When proximity information is combined with atmospheric pressure, we see that the performance of the clustering algorithm increases: in two-thirds of the time steps, the correct number of groups is detected and individuals are assigned to the correct cluster with an AGAA of 95%. Temporal smoothing did not improve the results when only proximity information was used for the clustering, but it slightly increased clustering performance when additional height information was utilized.

method                                 filter   NGDA [%]   ANGDE         AGAA [%]
ANT-based proximity (Eq. 6.3)          no       37 (12)    0.74 (0.11)   80 (6)
                                       yes      37 (14)    0.73 (0.13)   80 (6)
incl. atmospheric pressure (Eq. 6.4)   no       64 (15)    0.38 (0.17)   95 (4)
                                       yes      66 (14)    0.35 (0.14)   95 (3)

Table 6.1: Accuracy of group clustering algorithms. Mean and standard deviation (in brackets) across six firefighting teams performing a training scenario.

6.6 Conclusion

We presented a methodology to cluster moving groups over time from radio-based proximity data. Relying on ANT-based radio messages instead of Bluetooth scans enabled us to scan nearby devices at a rate up to 50 times faster than with commonly used Bluetooth scans. The increased time resolution allowed us to capture group proximity dynamics of firefighting teams. Further, we presented how group proximity dynamics of firefighting teams can be visualized in the form of narrative charts showing which firefighters were in proximity to each other and when. This compact representation of mission operations enables an incident commander or training instructor to easily inspect how a mission evolved over time and to pinpoint important mission events. Moreover, we evaluated our group clustering algorithm on real-life data of six professional firefighting teams performing a training scenario in a fire simulation building. When compared to manually annotated ground truth, our ANT-based algorithm assigned firefighters to the correct group 80% of the time. When ANT-based proximity information was combined with atmospheric pressure signals, the average group assignment accuracy (AGAA) increased to 95%. In future work, we will analyze the group proximity graphs of differently performing firefighting teams.

6.7 Acknowledgements

The authors would like to thank all members of the fire brigade for their participation and support throughout the experiments. This work is partly funded by the SNSF interdisciplinary project "Micro-level behavior and team performance" (grant agreement no.: CR12I1_137741).


Bibliography

[1] “ProeTEX - Advanced e-textiles for firefighters and civilian victims,” Mar 2013. http://www.proetex.org.

[2] “Wear IT at work,” Mar 2013. http://www.wearitatwork.com.

[3] “ProFiTex - Advanced Protective Firefighting Equipment,” Mar 2013. https://www.project-profitex.eu.

[4] “NIST Smart Firefighting Project,” Mar 2013. http://www.nist.gov/el/fire_research/firetech/project_sff.cfm.

[5] “Fire - fire information and rescue equipment,” Mar 2013. http://fire.me.berkeley.edu.

[6] J. Wilson, V. Bhargava, A. Redfern, and P. Wright, “A wireless sensor network and incident command interface for urban firefighting,” in Proc. Int. Conf. Mobile and Ubiquitous Systems: Computing, Networking and Services (MobiQuitous), 2007.

[7] C. Fischer and H. Gellersen, “Location and navigation support for emergency responders: A survey,” IEEE Pervasive Computing, vol. 9, no. 1, pp. 38–47, 2010.

[8] Y. Li, J. Han, and J. Yang, “Clustering moving objects,” in Proc. Int. Conf. Knowledge Discovery and Data Mining (SIGKDD), 2004.

[9] P. Kalnis, N. Mamoulis, and S. Bakiras, “On discovering moving clusters in spatio-temporal data,” in Advances in Spatial and Temporal Databases, pp. 364–381, Springer, 2005.

[10] C. Jensen, D. Lin, and B. Ooi, “Continuous clustering of moving objects,” Knowledge and Data Engineering, vol. 19, no. 9, pp. 1161–1174, 2007.

[11] M. Wirz, M. B. Kjaergaard, S. Feese, P. Schläpfer, D. Roggen, and G. Tröster, “Towards an online detection of pedestrian flocks in urban canyons by smoothed spatio-temporal clustering of GPS trajectories,” in Proc. Int. Workshop Location-Based Social Networks, pp. 17–24, 2011.


[12] N. Eagle and A. Pentland, “Reality mining: sensing complex social systems,” Personal and Ubiquitous Computing, vol. 10, no. 4, pp. 255–268, 2005.

[13] T. M. T. Do and D. Gatica-Perez, “Human interaction discovery in smartphone proximity networks,” Personal and Ubiquitous Computing, vol. 17, no. 3, pp. 413–431, 2011.

[14] D. Lymberopoulos, Q. Lindsey, and A. Savvides, “An empirical characterization of radio signal strength variability in 3-D IEEE 802.15.4 networks using monopole antennas,” Wireless Sensor Networks, vol. 3868, pp. 326–341, 2006.

[15] T. Stoyanova, F. Kerasiotis, A. Prayati, and G. Papadopoulos, “Evaluation of impact factors on RSS accuracy for localization and tracking applications in sensor networks,” Telecommunication Systems, vol. 42, pp. 235–248, 2009.

[16] K. Srinivasan, P. Dutta, A. Tavakoli, and P. Levis, “An empirical study of low-power wireless,” ACM Trans. Sensor Networks, vol. 6, pp. 16:1–16:49, 2010.


7 Sensing of Team Coordination Indicators

Sebastian Feese, Michael Burtscher, Klaus Jonas, Gerhard Tröster

Sensing Spatial and Temporal Coordination in Teams using the Smartphone

Journal of Human-centric Computing and Information Sciences, 2014 4:15.


Abstract

Teams are at the heart of today’s organizations and their performance is crucial for organizational success. It is therefore important to understand and monitor team processes. Traditional approaches employ questionnaires, which have low temporal resolution, or manual behavior observation, which is labor-intensive and thus costly. In this work, we propose to apply mobile behavior sensing to capture team coordination processes in an automatic manner, thereby enabling cost-effective and real-time monitoring of teams. In particular, we use the built-in sensors of smartphones to sense interpersonal body movement alignment and to detect moving sub-groups. We aggregate the data on the team level in the form of networks that capture a) how long team members are together in a sub-group and b) how synchronized team members move. Density and centralization metrics extract team coordination indicators from the team networks. We demonstrate the validity of our approach in firefighting teams performing a realistic training scenario and investigate the link between the coordination indicators and team performance as well as experienced team coordination. Our method enables researchers and practitioners alike to capture temporal and spatial team coordination automatically and objectively in real-time.

7.1 Introduction

Teams and team work are essential in today’s organizations [1]. To perform well as a team, members need to share information, coordinate their actions and support each other. These activities are commonly referred to as team processes, which convert inputs such as individual members’ abilities into outcomes such as team performance [2].

In order to improve team performance, it is mandatory to monitor how team members work and interact with one another. However, current approaches to monitor these team processes in situ and over time are limited. While questionnaires are ill-suited to capture the temporal sequence of interactions, manual behavioral observation, which would be more suitable for that purpose, is notably absent from group research [3]. One reason is that behavioral observation is time-consuming, as the manual encoding of behavior usually takes many times longer than the actual interaction. As a result, most studies are limited to small samples and short observational periods. Consequently, researchers have called for new measurement systems capable of capturing the complexity of team processes [4].


In our view, ubiquitous computing can help to continuously monitor team processes in realistic environments and provide a new observational tool that can support team researchers and trainers with objective data on how team members interact and work with each other.

In this paper, we focus on team coordination, which is regarded as a central teamwork process [5, 6]. Team “coordination occurs when team members perform the same or compatible actions at the same time” [7, p. 423]. Following a mobile behavior sensing approach [8], we propose to automatically capture the temporal and spatial aspects of team coordination with the built-in sensors of smartphones. On the one hand, we capture the temporal aspect of coordination by continuously measuring and comparing motion activity levels of team members to quantify how well team members align their movement in time. On the other hand, we assess the spatial component of coordination by detecting moving sub-groups from radio-based proximity information. In particular, we make the following contributions:

1. We present an approach to use the smartphone as a sensing platform to capture individual and team behaviors. We record body movement of each team member and estimate proximity between team members.

2. From the sensor data, we extract sub-group and movement alignment networks that summarize a) how long team members are together in a sub-group and b) how synchronized team members move. Further, we propose to summarize the structures of the extracted team networks using density and centralization metrics as used in social network analysis.

3. We validate our approach in a study with professional firefighting teams performing high-fidelity training missions in a firehouse and show how the proposed coordination indicators are correlated with objective and subjective coordination measures.

7.2 Related Work

7.2.1 Team Work

Team coordination in safety-critical environments has been assessed using different methodologies. A common approach includes behavioral observation. By observing recorded videos of the team interaction and encoding predefined behaviors, researchers can investigate temporal aspects of team processes such as patterns of interaction and changes over time [9, 10, 11]. Behavioral observation, however, is very resource-intensive and impractical for applied settings.

Another approach to team processes focuses on the structural characteristics of teamwork. Crawford et al. recently introduced a theoretical framework that considers structure [12]. By drawing on social network analysis (SNA), their theory proposes different types of networks to provide a more comprehensive explanation of the relationship between team processes and performance. SNA expresses the social environment as patterns or regularities in relationships among interaction units [13]. These relationships (i.e., ties) can be of different types. For example, a communication network could capture which members of a department communicate with each other on a regular basis. Such a network can reveal those members that are central to the dissemination of information within this department. The relationship data that makes up a social network can be represented and analyzed in different ways [14]. For a quantitative analysis, different relationship metrics can be derived from the data. These metrics can describe properties of individual members or of the whole team. The most common team-level metrics include density and centralization [15]. Network density is defined as the ratio between the actual number and the total possible number of ties in a network and is often used as an indicator of cohesion [13]. Centralization refers to the variance in ties per team member; low values indicate a structure in which each member has the same number of ties. Centralization reflects aspects of work organization and hierarchy.

Researchers have applied SNA to teams in organizations. For example, it has been suggested that centralization has a negative impact on team performance in complex tasks [16]. Zohar et al. showed that the density of a military team’s communication network mediated the effects of transformational leadership on climate strength [17]. Likewise, a series of case studies with police and firefighting teams suggests that both teams have different network architectures (distributed vs. split) [18].

Despite the potential of SNA to uncover the underlying structure of team processes, the number of studies using SNA in team research is small [15, 12]. We believe that part of this problem lies in the method itself, as SNA, like behavioral observation, is very resource-intensive and often impractical for applied settings. One way to address this issue includes taking advantage of new developments from the field of mobile behavior sensing.

7.2.2 Mobile Behavior Sensing

Mobile behavior sensing aims at measuring and analyzing human behavior from sensor data recorded with mobile devices [19, 20, 8]. Research in wearable and ubiquitous computing has shown how user context and behavior can be inferred from the smartphone’s sensor data using signal processing and machine learning techniques. The integrated sensors capture device interaction, body movement, location and speech of the user as well as characteristics of the user’s environment such as ambient light and sound. Characteristic features are then extracted from the sensor signals to make inferences about the context, state and behavior of a user.

Farrahi et al. used coarse location information from cell towers and clustered individual location traces to discover daily routines such as “going to work at 10am” or “leaving work at night” [21]. In order to give semantic meaning to recorded location information, features from ambient sound and video were fused to categorize location into place categories such as “college/education”, “food/restaurant” or “home” [22, 23].

The audio modality has been analyzed in the mobile setting to detect conversations, recognize speakers and estimate speaking duration [24, 25, 26], to recognize perceived basic emotions [27] and perceived stress during a street promotion task [28], and to quantify sociability as one aspect of well-being [29].

On a macro level, Eagle et al. first showed how mobile phones can be used to infer proximity networks of communities [20]. Relying on repeated Bluetooth scans, mobile phones were used to detect other nearby devices to estimate proximity between individuals. On the same data set, topic models were later used to discover human interactions from the proximity data [30].

Before the smartphone was available, Choudhury et al. introduced the sociometer, a wearable device, to automatically sense body motion, communication and proximity networks [24]. Extending this line of research, Olguin et al. used a new version of the sociometer to collect behavioral data of nurses in a hospital. The results showed a positive relationship of group motion energy and speaking time with group productivity [31].


In previous work, we adopted the idea of using motion and speech activity to monitor teams. Our feasibility study showed that speech and motion activity are promising performance indicators in firefighting teams [32], which motivated us to design, build and distribute our mobile sensing app CoenoFire in a real fire brigade [33]. In this paper, we build on our approach to sense team proximity dynamics with the smartphone [34].

7.3 Sensing Spatial and Temporal Coordination

From the review of related work, we conclude: firstly, the smartphone can be used to capture the behavior of its user, and secondly, teams can be characterized by network metrics as commonly applied in social network analysis. Based on these findings, we propose to automatically sense spatial and temporal aspects of team coordination using the smartphone.

The spatial component of coordination is concerned with how team activities are distributed in space. For this reason, we detect moving sub-groups of team members. Team members within the same sub-group are in close proximity, whereas those of another sub-group are not. The temporal component of coordination is related to how team activities are aligned in time. Instead of detecting concrete activities (e.g. running), we measure the movement activity level, which captures how long a team member was physically active during consecutive time intervals. By comparing the motion activity level signals of two team members, we measure how well they aligned their body movement in time. This is especially important for members of first responder teams such as firefighters, who move and work at least in pairs.

7.3.1 Approach

Our approach is schematically presented in Figure 7.1 and consists of the following steps:

1. Sensor data is recorded on the smartphones carried by each team member. The phone’s sensors capture body motion by sensing acceleration, proximity to others by exchanging radio messages between nearby devices and height information by sensing atmospheric pressure.


Figure 7.1: Schematic of our approach to sensing team coordination indicators: 1) The smartphone records body motion and proximity to others. 2) Pairwise movement alignment and sub-grouping are represented as networks. 3) The network metrics density and centralization extract team coordination indicators related to performance and perceived coordination.

2. Data of each team member is processed to derive the sub-group network, which captures who was for how long in a sub-group with another team member, as well as the movement alignment network, which captures dependencies in activity levels between team members.

3. Network metrics as used in SNA are extracted from the team networks to capture the overall structure of the networks. Network density describes how well the nodes (team members) within the network are connected, whereas centralization measures how heterogeneously the nodes are connected to each other. Depending on the type of network (sub-group network or movement alignment network), connected therefore refers to how long team members were in the same sub-group or to how well they aligned their movement activity levels. We refer to density and centralization of the two team networks as team coordination indicators.

In the example presented in Figure 7.1, person A is standing still while being in proximity with the running persons B and C. Person D on the other hand is walking behind a wall and is therefore not in any sub-group with another person. This leads to the presented sub-group network. As person A is in sight of persons B and C, the sub-group graph shows them to be in one group, whereas person D is indicated to be alone. The movement alignment network shows persons B and C to be best aligned because they are both running, whereas person A is worst aligned as she is the only person not moving. From the sub-group and movement alignment networks, SNA metrics are derived to characterize overall network structures in order to capture team coordination indicators.

7.3.2 Smartphone Sensing Platform

Our data collection framework called CoenoFire is illustrated in Figure 7.2 and consists of two parts: the smartphone data recording app as the sensing front-end and a database and visualization server at the back-end. CoenoFire was developed to monitor firefighters during real-world incidents. Details on the system architecture were previously presented in [33].

For data collection, we used the Sony Xperia Active smartphone, which is dust- and water-resistant, has a 3-inch capacitive touchscreen and a built-in ANT radio (http://www.thisisant.com). ANT is a low-power wireless protocol that was developed to connect fitness devices such as heart-rate belts and pedometers with sport watches; however, in this work we use it for proximity estimation. We developed an Android app to continuously record data of the phone’s built-in sensors. Therefore, we extended the funf-open-sensing-framework [35] to also detect nearby devices by transceiving ANT radio messages and to save the raw sensor data locally to the memory card.

Data was recorded from the following built-in sensors: acceleration and orientation sensors were used to measure body movement, the barometer measured atmospheric pressure and was used to infer whether individuals were on the same floor level, and ANT radio messages were sent and received to find out which team member was in proximity to another one.

As our goal was to sense team behaviors, all devices carried by the team members needed to be synchronized to allow comparison of the sensor signals across team members. Therefore, we measured the offset between system time and a common reference time every 5 min using the network time protocol. With this approach, we were able to achieve a time synchronization across devices with a maximum time difference of 500 ms. To enable remote monitoring, we configured the framework to upload a subset of calculated features, such as the battery level, to a central server every five minutes. Because we used the smartphone as a sensing platform, we installed our app as the default home screen and blocked all soft buttons. In this way, our app was always visible and the use of the smartphone was restricted to our data collection.
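A minimal sketch of such a periodic offset measurement is shown below. The thesis does not name the library used; ntplib and the public pool server are assumptions chosen for illustration, and the loop is truncated to three iterations.

# Illustrative sketch of a periodic NTP clock-offset measurement.
import time
import ntplib

def clock_offset(server="pool.ntp.org"):
    """Offset between local system time and an NTP reference, in seconds."""
    return ntplib.NTPClient().request(server, version=3).offset

for _ in range(3):   # three iterations for illustration; the app re-measured
    print(f"clock offset: {clock_offset() * 1000:.0f} ms")
    time.sleep(300)  # every 5 minutes, as described in the text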

Figure 7.2: CoenoFire: smartphone-based data collection framework. Raw smartphone sensor data is saved to the SD card and features are transmitted via the mobile network to enable real-time monitoring of performance metrics and system status, e.g. battery level.

At the back-end, we ran one web server to receive and store the data from the smartphones in a central database. A second web server provided a web-based user interface that offered real-time monitoring of the system. A screenshot of the web interface showing the battery status of the devices is presented on the right of Figure 7.2. The interface also allows visualization of real-time data of the firefighters’ movement and speech activity.

7.3.3 Experimental Study

In order to validate our approach, we tested it in a sample of firefighting teams completing a training scenario. The study was conducted in cooperation with the Zurich fire department and approved by the Ethics committee of the University of Zurich. Written consent from all participants was obtained prior to data collection.

Scenario

In cooperation with the training instructors, we designed the difficulty level of the training scenario to be challenging for the firefighters in order to maximize differences in team coordination and performance. Impressions of the scenario are shown in Figure 7.3. The scenario involved a fire on the third floor of an apartment building at which teams arrived with two fire trucks. The building had to be accessed via the roof. Thus, the firefighters had to prepare a turntable ladder. A first troop entered the building and navigated blindly to the source of the fire. On their way the troop detected an unconscious dummy person. As a first priority, this person had to be evacuated. After that, the fire had to be extinguished, which could be done either by the first troop or by a second troop, depending on the decision of the incident commander (IC). The scenario ended when the fire was extinguished.

Figure 7.3: Impressions of the training scenario. Firefighters had to enter through a roof window and navigate in low visibility to a fire on the third floor, rescue an unexpected dummy person and extinguish the fire.

Setting and Procedure

The scenario took place in a burn building, a multi-story training facility that allows for a highly realistic simulation of fire incidents. During training sessions, firefighters were confronted with actual fires, extreme heat, high humidity and thick smoke restricting visibility. Trainings were performed using standard equipment including vehicles, protection suits, and self-contained breathing apparatus (SCBA). We informed the participants about the study two weeks prior to data collection during their morning reports. Upon arrival at the training site, participants were again informed about the study, and completed the consent form and a personal background questionnaire. They also received a briefing about the scenario from a training instructor. Then, the first trial was conducted. Each firefighter carried one smartphone in the left jacket pocket of his protection suit (see Figure 7.3). We videotaped all trials using two regular cameras to record outside and a thermographic camera to record inside the building. After the scenario, they completed the coordination questionnaire (see below) and received a technical debriefing about their performance. Team members switched roles and started the next scenario after a short break.

Measurement of Perceived Coordination

Perceived explicit and implicit coordination were measured via self-report after each trial. To assess explicit coordination we used three items of the German translation of the subscale coordination of the transactive memory scale [36]. A sample item is “Our team worked together in a well-coordinated fashion.” The scale had a high reliability (α = .80). In the absence of a validated scale, we developed five items to assess implicit coordination based on its definition. Sample items included “We automatically adjusted our working styles to each other” and “We understood each other blindly.” The scale had a high reliability (α = .87). All items were answered on a 5-point scale ranging from 1 = “strongly disagree” to 5 = “strongly agree”.

Data Set

In total, 51 professional firefighters from the Zurich Fire Brigade participated in our study. All participants were male, aged 35 ± 10 years. They completed a simulated fire incident in teams of 7-9 members. The data collection was conducted on four consecutive days. We recorded 18 training runs of the described scenario. Most firefighters took part in more than one trial because of the limited overall sample size. However, we made sure that participants switched their roles after each trial to ensure variation. In five trials one smartphone partially failed to record; in another three runs one firefighter did not participate in the study. This left us with 10 complete runs totaling over 2 h of training data.

7.4 Extraction of Team Coordination Indicators

In the following, we present how we extract team coordination indicators from automatically sensed team networks. First, we describe how moving sub-groups are detected to derive the sub-group network. Second, we describe how temporal activity alignment is quantified to extract the activity alignment network. Third, we detail how team coordination indicators are extracted from the team networks.

7.4.1 Detection and Visualization of Moving Sub-Groups

In our previous work [34], we have shown how moving sub-groups within teams can be detected from radio-based proximity data obtained with smartphones. The detected sub-groups can be visualized in the form of narrative charts to display which team members were in sub-groups at each point in time and show how sub-groups merge and split over time. The narrative chart presented in Figure 7.4a illustrates the moving sub-groups of firefighters during the described training scenario. The chart allows, for example, to identify the points in time when the first (1) and second troop (3) reached the top of the turntable ladder and when the first troop entered the building (2). As can be seen in the narrative chart, the lines representing the members of the first troop (T1a, T1b, T1c) split from the other lines (other team members) at time point (1) when the first troop forms a sub-group and uses the turntable ladder to reach the roof window. At time point (2) two members (T1a, T1b) of the first troop enter the building and team member T1c remains outside on the top of the ladder. In the narrative chart this is shown by the splitting of the yellow line from the orange and purple lines. At time point (3) members of the second troop (T2a, T2b) climb the turntable ladder and join team member T1c, which is shown in the chart by the merging of the red and brown lines with the yellow line.

Having detected moving sub-groups, we are able to calculate a sub-group network that summarizes which team member was for how long in a sub-group with another team member. Thus, the sub-group network captures the overall spatial structure of the team during a mission. In Figure 7.4a the corresponding sub-group network is presented on the right of the narrative chart. The network graph highlights three sub-groups: the first and second troop that enter the building via the turntable ladder, as well as the ground support team that includes the incident commander, the turntable ladder operator and the engineer. In the graph, darker links between nodes correspond to team members that were together in a sub-group for longer than 60% of the mission.

In the following we briefly describe our method to detect moving sub-groups using radio-based proximity data; please refer to [34] for more details.

Figure 7.4: Measuring the sub-group and movement alignment networks. a) Narrative chart representing the proximity dynamics of a team. Each colored line represents a team member; close lines indicate moving sub-groups. Team members are either on ground level or above ground level. The sub-group network summarizes how long each team member was in a sub-group with each other team member. Darker links indicate pairs that were together in a sub-group for more than 60% of the mission. b) Examples of high and low mutual information between two activity level signals. Activity level signals change simultaneously more often in the top graph than in the bottom graph; consequently, mutual information is higher for the signals shown in the top graph. The movement alignment network summarizes how well team members aligned their activity levels in time. Darker links indicate pairs of team members that showed higher motion alignment than 60% of all other pairs in the data set.

We follow a two-stage approach to detect moving sub-groups: we first calculate the proximity matrix $D^t$ for consecutive time intervals $t$ of length L = 5 s. Each binary element $D^t_{ij}$ indicates whether device $i$ received any ANT message of device $j$ during time interval $t$. Considering proximity to be undirected, we further symmetrize the proximity matrix to obtain $D^t_{sym}$.

In the second stage, moving sub-groups are clustered from the proximity data. Clusters are first identified independently from the symmetrized proximity matrices of each time interval and secondly, the clustering output is smoothed by applying a temporal filter, so that clusters last for at least 10 s. We cluster each symmetrized proximity matrix $D^t_{sym}$ using the single-link criterion. As a result, if group member A is connected with B and B with C, but not with A, all three devices are still clustered into one group.

Using only radio-based proximity information might lead to individuals on different height levels being clustered into one group. To address this problem, we added height information derived from the atmospheric pressure sensor. If the absolute atmospheric pressure difference between two devices is greater than a predefined threshold, the two devices are considered to be on different height levels and are thus not clustered to the same sub-group. To obtain the sub-group network, we average the clustering results over all time intervals.
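The averaging step can be written compactly. The sketch below uses illustrative names and assumes the per-interval clusterings are given as lists of member sets; for every pair it counts the fraction of intervals spent in the same cluster, which becomes the tie strength in the sub-group network.

# Sketch: sub-group network as average pairwise co-membership over intervals.
from itertools import combinations

def subgroup_network(clusterings, n_members):
    ties = [[0.0] * n_members for _ in range(n_members)]
    for clusters in clusterings:
        for cluster in clusters:
            for i, j in combinations(sorted(cluster), 2):
                ties[i][j] += 1       # pair (i, j) shared a cluster here
                ties[j][i] += 1
    t = len(clusterings)
    return [[v / t for v in row] for row in ties]  # fraction of intervals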

As the ANT radio protocol operates in the 2.4 GHz band, radio signals are particularly influenced by the surrounding environment. In our experiments, we observed that depending on the relative orientation and environment of the individuals, the maximal transmit distance varied in the range of 1 m to 20 m. In [34] we evaluated our algorithm to detect moving sub-groups of firefighters during the described training scenario by comparing the results to a manually annotated ground truth. On average, team members were assigned to the correct sub-group with 95% accuracy.

7.4.2 Temporal Alignment of Activity Level

In order to capture the temporal aspect of coordination in teams, we measure and compare activity levels of individual team members. Thus, we assume that well-coordinated team members change their activity level at similar points in time.


Figure 7.5: Calculation of activity level. Top: linear acceleration magnitude and its moving standard deviation (pink); rectangles indicate segments in which the standard deviation exceeds the activity detection threshold. Bottom: activity level calculated on a hopping window (L = 5 s, S = 200 ms).

We define the activity level to be the fraction of time that an individual is active within a moving window of length L. The activity level increases when individuals become active and decreases as soon as team members stop moving. The window length L determines the slope of the activity level and the minimum time that an individual needs to be active to reach the maximum activity level. The value of L also affects the temporal resolution: a small value requires individuals to change their activity closer in time, whereas a larger value allows for a delay between activity changes, as the activity level is calculated over a longer period.

Figure 7.5 illustrates the calculation of the motion activity level. In a first step, we detect when a team member is active by thresholding the moving standard deviation of the linear acceleration magnitude. When a predefined threshold is exceeded, motion activity is detected (top in Figure 7.5). In a second step, the motion activity level is calculated as the percentage of time that motion activity was detected within a hopping window of length L and step size S (bottom in Figure 7.5). For further processing, the continuous activity level is linearly quantized into 10 discrete activity levels {0..9}.
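The two steps map onto a few lines of numpy, as in the sketch below. The sampling rate and the detection threshold are assumptions (the thesis does not state them); L and S follow Figure 7.5, and all names are illustrative.

# Sketch of the activity-level computation from acceleration magnitude.
import numpy as np

FS = 50            # sampling rate in Hz (assumption)
THRESH = 0.5       # activity threshold on the moving std [m/s^2] (assumption)
L, S = 5.0, 0.2    # window length and step size in seconds (Figure 7.5)

def activity_level(acc_mag):
    """Quantized activity level {0..9} from linear-acceleration magnitude."""
    win = FS // 2                                     # short moving-std window
    std = np.array([acc_mag[i:i + win].std()
                    for i in range(len(acc_mag) - win)])
    active = std > THRESH                             # step 1: activity detection
    hop, wlen = int(S * FS), int(L * FS)
    levels = np.array([active[i:i + wlen].mean()      # step 2: fraction active
                       for i in range(0, len(active) - wlen + 1, hop)])
    return np.minimum((levels * 10).astype(int), 9)   # linear quantization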

In order to compare two motion activity level signals $X, Y \in \{0..9\}$ of two team members, we use the mutual information as similarity measure. In general, mutual information measures the dependency between two random variables, that is, how much information two variables share, and is defined as:

$$I(X,Y) = \sum_{x \in X} \sum_{y \in Y} p(x,y) \log \frac{p(x,y)}{p(x)\,p(y)} \qquad (7.1)$$

The dependency between X and Y is expressed by the joint distribution p(x, y) and compared to the joint distribution when independence is assumed, in which case p(x, y) = p(x)p(y). Thus, I(X,Y) is zero if and only if X and Y are independent. Two examples of activity level alignment that occurred during the firefighting training scenario are presented in Figure 7.4b. The two activity levels presented in the top graph belong to two team members from the first troop (T1a, T1b), whereas the activity levels shown in the bottom graph belong to team member T1b and the incident commander. While the activity levels of the troop members often change together in time and are well aligned, the activity levels of troop member T1b and the incident commander are not well aligned in time. Consequently, the observed mutual information is higher between the activity levels of the troop members as opposed to those of troop member T1b and the incident commander.
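For two quantized activity-level signals, Eq. 7.1 can be estimated from empirical counts, as in the following sketch (a plug-in estimator with the usual convention 0 log 0 = 0; function and variable names are illustrative, not from the thesis).

# Sketch of Eq. 7.1 for two quantized signals with values in {0..9}.
import numpy as np

def mutual_information(x, y, n_levels=10):
    x, y = np.asarray(x), np.asarray(y)
    joint = np.zeros((n_levels, n_levels))
    for xi, yi in zip(x, y):
        joint[xi, yi] += 1
    joint /= joint.sum()                     # empirical p(x, y)
    px = joint.sum(axis=1, keepdims=True)    # marginal p(x)
    py = joint.sum(axis=0, keepdims=True)    # marginal p(y)
    nz = joint > 0                           # convention: 0 log 0 = 0
    return float((joint[nz] * np.log(joint[nz] / (px @ py)[nz])).sum())

x = [0, 0, 5, 9, 9, 5]
y = [1, 1, 4, 9, 9, 4]                       # y tracks x closely
print(mutual_information(x, y))              # ~1.10 nats: high alignment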

In order to summarize the temporal alignment for the whole team, the mutual information between all pairs of activity levels is calculated. This results in the activity alignment network. An example is presented on the right side of Figure 7.4b. As can be seen, troop member T1b had the highest activity alignment with troop member T1a and the lowest with the incident commander.

7.4.3 Team Coordination Indicators

On each of the extracted team networks (sub-group network and activity alignment network), we calculate network density and degree centralization in order to characterize the global network structure. In summary, we extract the following team coordination indicators from the team networks:

• Density of the sub-group network measures how long team members were on average in sub-groups. As the sub-group network captures the spatial distribution of team members, a high density indicates that many team members were together for a long time, whereas a low density indicates that team members were mostly on their own.

• Degree centralization of the sub-group network measures how differently team members were part of a sub-group. A high degree centralization indicates that there was at least one well-connected large group and one other small group.

• Density of the activity alignment network measures how well team members aligned their activity level on average. It can thus be seen as an overall measure of how coordinated a team moved.

• Degree centralization of the activity alignment network measures how differently the team members aligned their activity levels with those of others. It can thus be seen as an overall measure of how differently team members’ motions were coordinated.

In the following, we give the definitions of network density and centralization. Degree centrality of a node in the network captures how well each node (team member) in the network is connected to other nodes (other team members). Degree centrality of node $i$ is defined by

$$d_i = \frac{1}{N-1} \sum_{j \neq i} a_{ij},$$

with $a_{ij} \in [0..1]$ being an element of the adjacency matrix defining the network and $i, j \in [1..N]$, with $N$ being the number of nodes in the network. Network density $D$ is the average degree centrality and thus captures how well nodes in the network are connected with each other. Network density is given by

$$D = \frac{1}{N} \sum_{i=1}^{N} d_i.$$

Degree centralization $DC$ measures how central the most central node (highest degree) is in relation to how central all the other nodes are. Thus, it is a measure of degree variation; it is zero in a homogeneously connected network where each node has the same degree and one in a star network. Degree centralization is defined by

$$DC = \frac{1}{N-1} \sum_{i=1}^{N} |d^{*} - d_i|,$$

with $d^{*}$ being the maximum degree observed in the network.
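The three definitions translate directly into code. The sketch below operates on a weighted adjacency matrix with entries in [0, 1], as produced by the sub-group or movement alignment networks; with the normalization given above, a star network's centralization approaches one as the network grows.

# Sketch of the network metrics above for a weighted adjacency matrix.
import numpy as np

def degree_centrality(a):
    a = np.asarray(a, dtype=float)
    n = a.shape[0]
    return (a.sum(axis=1) - np.diag(a)) / (n - 1)   # exclude self-ties

def density(a):
    return float(degree_centrality(a).mean())       # average degree

def degree_centralization(a):
    d = degree_centrality(a)
    return float(np.abs(d.max() - d).sum() / (len(d) - 1))

# 11-node star: one central node tied to all others
star = np.zeros((11, 11))
star[0, 1:] = star[1:, 0] = 1.0
print(degree_centralization(star))                        # 0.9, near 1
print(degree_centralization(np.ones((11, 11)) - np.eye(11)))  # 0.0, homogeneous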


7.5 Evaluation of Team Coordination Indicators

7.5.1 Correlation Analysis

In order to evaluate our approach, we correlated the proposed coordination indicators derived from the sensor data with three validation criteria. We used perceived implicit and explicit coordination to validate the indicators. Explicit coordination includes those actions intentionally used for team coordination and is achieved by means of verbal communication. By contrast, implicit coordination refers to the anticipation of other members’ actions and the dynamic adjustment of one’s own actions accordingly, without the need for overt communication [37].

We additionally used time to complete the training mission as an objective validation criterion. As time is critical in firefighting, our reasoning was that well coordinated teams would be faster.
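The validation step itself reduces to linear correlations between per-team indicators and outcomes; a hedged sketch, where the placeholder arrays stand in for the N = 10 team values:

from scipy.stats import pearsonr

def validate(indicator, outcome):
    # indicator, outcome: sequences with one value per team
    r, p = pearsonr(indicator, outcome)
    return r, p

# e.g. validate(subgroup_centralization, completion_time_min)
# would reproduce one entry of Table 7.1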

Our findings reveal significant relationships between both subjective and objective validation criteria, as can be seen from Table 7.1. Figure 7.6 shows the relationships graphically. In summary, we find:

• Degree centralization of the sub-group network is highly negatively correlated with completion time and positively with implicit team coordination. That is, the centralization of the sub-group network decreased with completion time (compare Figure 7.6a) and increased with perceived implicit coordination (compare Figure 7.6c). In other words, faster teams showed a higher degree of centralization in the sub-group network, meaning that team members were more heterogeneously connected, e.g. some firefighters were in well connected sub-groups for a long time, whereas others were longer on their own or part of a small sub-group.

• Density of the activity alignment network is highly negatively correlated with completion time and positively with implicit coordination. That is, the density of the movement alignment network decreased with completion time (compare Figure 7.6f) and increased with perceived implicit coordination (compare Figure 7.6h). Thus, faster teams showed more alignment of their activity levels and perceived their implicit coordination as better than slower teams.


Table 7.1: Linear Correlation Analysis: Relationship between team coordination indicators and outcome measures team performance and experienced coordination (implicit and explicit) (N=10, L=5s).

network              metric           completion time    impl. coordination   expl. coordination
                                      r         p        r         p          r         p
sub-group            density          +0.207    0.567    -0.609    0.062      -0.587    0.074
                     centralization   -0.841**  0.002    +0.644*   0.044      +0.427    0.219
movement-alignment   density          -0.824**  0.003    +0.657*   0.039      +0.415    0.233
                     centralization   -0.356    0.313    +0.411    0.238      +0.569    0.086

Notes: *p < 0.05, **p < 0.01


Figure 7.6: Linear trend lines: Relationship between team coordination indicators with completion time and experienced implicit coordination (N=10, L=5s). Completion time [min] is plotted against sub-group network centralization (a, R² = 0.7062), sub-group network density (b, R² = 0.0427), movement alignment network centralization (e, R² = 0.1267) and movement alignment network density (f, R² = 0.680); implicit coordination is plotted against the same four indicators in panels (c, R² = 0.4153), (d, R² = 0.3714), (g, R² = 0.1685) and (h, R² = 0.4326).


The finding that faster teams had a sub-group network with a higher degree of centralization makes sense, because the chosen scenario demanded teams to split into at least three sub-groups of different size: the troop that went inside the building, the firefighter on top of the ladder that helped with the fire hose, and the remaining team members on the ground outside the building. Thus, faster teams organized their spatial structure more efficiently than slower teams.

In terms of activity alignment, the results showed that faster teams exhibited higher temporal movement coordination. This finding seems reasonable as it indicates that firefighters in faster teams worked well together and aligned their movements accordingly. Thus, faster teams moved on average more synchronously than slower teams.

The correlation analysis did not indicate any significant relationships between the proposed team coordination indicators and perceived explicit coordination. This is likely due to the fact that the proposed coordination indicators measure spatial and temporal aspects and did not include direct verbal communication, which is essential for explicit coordination.

7.5.2 Visual Analysis of Team Networks

To further analyze the correlation results, we visually inspect the team networks. In Figure 7.7 the sub-group networks of the teams are presented. As can be seen, faster teams tended to have one firefighter on top of the ladder (red circle) and two well connected sub-groups, the troop and the remaining firefighters on the ground. The observed high degree centralization stems from the fact that faster teams split quickly into these three groups, so that the firefighter on top of the ladder was relatively long alone, which is the reason for the low degree (small circle). As the other firefighters were part of a sub-group, their respective degrees were higher (larger circle). Due to this difference in the individual degrees, the overall degree centralization of the network is high. While not being statistically significant, it can also be observed from the networks that slower teams showed higher density, meaning that their team members stayed longer in well connected sub-groups, which indicates that they needed more time to split into the required sub-groups.

The movement alignment networks of all groups are presented in Figure 7.8. As can be seen, faster teams (top row) have better connected networks and thus higher network density. Further, the networks show


that the highest activity coordination occurred between members that worked close together, such as the troop inside the building. Troop members of faster teams had higher alignment between their activity levels (represented by bigger circles) than those of slower teams (represented by smaller circles).


Figure 7.7: Sub-Group Networks sorted from fastest team (top left) to slowest team (bottom right); each network is annotated with its centralization, density and mission duration. Each team member is represented by a node whose size is proportional to its degree. Degree expresses how long a team member was on average in a sub-group with each other team member. Roles (Ladder Operator, Incident Commander, Ground Support, Engineer, Troop at top of ladder, Troop inside building) are indicated by different colors. Links between nodes represent the time that the corresponding team members were in one sub-group; nodes of team members which were longer in the same sub-group are shown closer together. Sub-group networks of faster teams are more heterogeneous, whereas slower teams have more homogeneously connected networks. This indicates that faster teams split more quickly into the task-demanded sub-groups and thus were better spatially coordinated.


Figure 7.8: Movement Alignment Networks sorted from fastest team (top left) to slowest team (bottom right); each network is annotated with its centralization, density and mission duration. Size of a node corresponds to its degree. Degree expresses the average activity coordination with all other team members. Roles are indicated by different colors. Link width corresponds to the activity coordination between two team members; nodes of team members which better aligned their motion activity levels are shown closer together. Movement alignment networks of faster teams are highly connected, whereas slower teams have less connected networks. This indicates that faster teams had better temporal coordination.


7.6 Discussion

The aim of the current paper was to introduce smartphone based behavior sensing and data processing as a means to automatically observe team coordination processes in realistic environments. To this end, we described the method and its theoretical background and reported the findings of a validation study. Our method consists of three steps: First, individual activity levels and the proximity between team members are measured with the integrated sensors of smartphones. Second, the individual data streams are processed and compared with each other to derive motion alignment and sub-group networks which capture team coordination processes. Third, to derive team coordination indicators, we used social network analysis to quantify temporal and spatial coordination on the team level. In a firefighting training scenario, we validated the team coordination indicators by investigating their link to team performance and experienced team coordination.

7.6.1 Implications

We see four main implications of our method:

• First, we provide a method that is capable of continuously monitoring team coordination processes in a variety of settings. Thereby, we provide a new measurement tool for team research.

• Second, the smartphone allows capturing different types of networks that go beyond classical self-report based social networks which focus on content such as advice, information or friendship. We introduced proximity based sub-groups and motion activity alignment as contents. Moreover, the smartphone data has a high temporal resolution (in the order of seconds) and thus captures changes in network structure over time.

• Third, the smartphone-based behavior sensing approach enables easier data collection as no user input is required. In addition, the smartphone-based approach offers a higher degree of anonymity than videos, which potentially increases the willingness to participate in a study.

• Finally, our approach bears the potential to open up new settings for team research. The smartphone can be used in settings


where traditional behavioral observation is not feasible. For example, firefighters can be monitored during real incidents. In earlier work [33], we have shown that a smartphone-based data collection approach is feasible in such settings.

7.6.2 Practical Implications

Our proposed team sensing approach also has potential implications for practitioners. Instructors of first responder teams can use the smartphone during training to collect objective data on team processes. These data could then serve as additional input for debriefing and thus enable data-supported training feedback. Even more so, as the data can be illustrated using different graphs. For example, the narrative chart (compare Figure 7.4) can be used to get a quick overview of when sub-groups formed and disbanded over the course of a mission.

As the smartphone allows for continuous, real-time assessment of teams, it could also be used during actual missions to monitor performance and coordination. This has a large potential for error prevention. For example, using radio-based proximity data, mission commanders can detect when a team member moves out of sight of the rest of the team without relying on GPS or any installed infrastructure. Being alone may pose a threat to this person because it will be more difficult for his teammates to recognize potential dangers and to provide timely backup. As smartphones are widely used, it would not be difficult to implement such a monitoring system.

7.6.3 Limitations

In order to detect moving sub-groups over time, we make use of radio-based proximity estimation. The accuracy of nearby device detection depends on architectural constraints and consequently varies across locations. Experiments showed that the maximum detection distance varies between 1 m and 20 m. This accuracy proved to be good enough for the detection of moving sub-groups in the firefighting scenario. We therefore believe our method to be applicable to other first responder teams as well. Having room-level accuracy, the approach is also useful to track white-collar workers in office buildings in order to capture their co-location networks, which could be used to identify important persons in a social network. However, at a social event in which individuals stand close together, the spatial resolution is likely not to


be sufficient to reliably detect social interaction. In such scenarios, the used technique can only give rough proximity cues.

Further, we measured temporal coordination as simultaneous change in activity level. As the activity level captures the amount of body movement, it is only a rough estimate of action coordination. For firefighting teams, and most likely also for other first responder teams, the simultaneous change in body movement is clearly related to team coordination, because in such teams it is important to move together to solve the task. For white-collar workers in an office building, however, this simultaneous change in movement has no clear meaning. In typical office work, body movement itself is not a driving process behind team performance.

7.6.4 Conclusion

We proposed a set of team coordination indicators that can be measured with the built-in sensors of smartphones. We demonstrated the validity of our approach in firefighting teams performing a realistic training scenario and investigated the link between the coordination indicators and team performance as well as experienced team coordination. Our method enables researchers to capture temporal and spatial team coordination automatically and objectively. However, to prove the generality of the approach, future studies have to be carried out in different architectural configurations and with other types of teams.

7.7 Acknowledgements

The authors would like to thank all members of the Zurich fire brigade for their participation and support throughout the experiments. We are grateful to Bert Arnrich and to Bertolt Meyer for their help in designing this study. We thank Anna-Lena Köng, Laura Fischer and Nadja Ott for their help with the data collection and for behavioral coding. This work was funded by the SNSF interdisciplinary project "Micro-level behavior and team performance" (grant agreement no. CR12I1_137741).


Bibliography

[1] E. Salas, N. J. Cooke, and M. A. Rosen, "On teams, teamwork, and team performance: Discoveries and developments," Human Factors, vol. 50, no. 3, pp. 540–547, 2008.

[2] S. G. Cohen and D. E. Bailey, "What makes team work: Group effectiveness from the shop floor to the executive suite," Journal of Management, vol. 23, no. 3, pp. 239–290, 1997.

[3] R. L. Moreland, J. D. Fetterman, J. J. Flagg, and K. Swanenburg, "Behavioral assessment practices among social psychologists who study small groups," in Then A Miracle Occurs: Focusing on Behavior in Social Psychological Theory and Research, pp. 28–53, New York: Oxford University Press, 2010.

[4] M. A. Rosen, W. L. Bedwell, J. L. Wildman, B. A. Fritzsche, E. Salas, and C. S. Burke, "Managing adaptive performance in teams: Guiding principles and behavioral markers for measurement," Human Resource Management Review, vol. 21, no. 2, pp. 107–122, 2011.

[5] M. T. Brannick and C. Prince, "An overview of team performance measurement," in Team Performance Assessment and Measurement: Theory, Methods, and Applications, pp. 3–16, London: Lawrence Erlbaum Associates, 1997.

[6] S. W. J. Kozlowski and B. S. Bell, "Work groups and teams in organizations," in Handbook of Psychology: Industrial and Organizational Psychology, vol. 12, pp. 333–375, London: Wiley, 2003.

[7] S. J. Guastello and D. D. Guastello, "Origins of coordination and team effectiveness: A perspective from game theory and nonlinear dynamics," Journal of Applied Psychology, vol. 83, no. 3, pp. 423–437, 1998.

[8] N. D. Lane, E. Miluzzo, H. Lu, D. Peebles, T. Choudhury, and A. T. Campbell, "A survey of mobile phone sensing," IEEE Communications Magazine, vol. 48, no. 9, pp. 140–150, 2010.


[9] M. J. Burtscher, T. Manser, M. Kolbe, G. Grote, B. Grande, D. R. Spahn, and J. Wacker, "Adaptation in anaesthesia team coordination in response to a simulated critical event and its relationship to clinical performance," British Journal of Anaesthesia, vol. 106, no. 6, pp. 801–806, 2011.

[10] M. J. Burtscher, J. Wacker, G. Grote, and T. Manser, "Managing non-routine events in anesthesia: the role of adaptive coordination," Human Factors, vol. 52, no. 2, pp. 282–294, 2010.

[11] A. A. Stachowski, S. A. Kaplan, and M. J. Waller, "The benefits of flexible team interaction during crises," Journal of Applied Psychology, vol. 94, no. 6, pp. 1536–1543, 2009.

[12] E. R. Crawford and J. A. LePine, "A configural theory of team processes: Accounting for the structure of taskwork and teamwork," Academy of Management Review, vol. 38, no. 1, pp. 32–48, 2013.

[13] S. Wasserman and K. Faust, Social Network Analysis: Methods and Applications. New York: Cambridge University Press, 1994.

[14] D. Knoke and S. Yang, Social Network Analysis, vol. 154 of Quantitative Applications in the Social Sciences. Thousand Oaks, CA: Sage, 2008.

[15] P. Balkundi and D. A. Harrison, "Ties, leaders, and time in teams: Strong inference about network structure effects on team viability and performance," Academy of Management Journal, vol. 49, no. 1, pp. 49–68, 2006.

[16] R. T. Sparrowe, R. C. Liden, S. J. Wayne, and M. L. Kraimer, "Social networks and the performance of individuals and groups," Academy of Management Journal, vol. 44, no. 2, pp. 316–325, 2001.

[17] D. Zohar and O. Tenne-Gazit, "Transformational leadership and group interaction as climate antecedents: a social network analysis," Journal of Applied Psychology, vol. 93, no. 4, pp. 744–757, 2008.

[18] R. J. Houghton, C. Baber, R. McMaster, N. A. Stanton, P. Salmon, R. Stewart, and G. Walker, "Command and control in emergency services operations: a social network analysis," Ergonomics, vol. 49, no. 12-13, pp. 1204–1225, 2006.


[19] A. Pentland, Honest Signals: How They Shape Our World. Cambridge: The MIT Press, 2008.

[20] N. Eagle and A. Pentland, "Reality mining: sensing complex social systems," Personal and Ubiquitous Computing, vol. 10, no. 4, pp. 255–268, 2005.

[21] K. Farrahi and D. Gatica-Perez, "What did you do today?: Discovering daily routines from large-scale mobile data," in Proc. Int. Conf. ACM Multimedia, 2008.

[22] Y. Chon, N. D. Lane, F. Li, H. Cha, and F. Zhao, "Automatically characterizing places with opportunistic crowdsensing using smartphones," in Proc. Int. Conf. Ubiquitous Computing (UbiComp), 2012.

[23] M. Rossi, O. Amft, and G. Tröster, "Recognizing daily life context using web-collected audio data," in Proc. Int. Symp. Wearable Computers (ISWC), 2012.

[24] T. Choudhury and A. S. Pentland, "Sensing and modeling human networks using the sociometer," in Proc. Int. Symp. Wearable Computers (ISWC), 2003.

[25] D. Wyatt, J. Bilmes, T. Choudhury, and J. Kitts, "Towards the automated social analysis of situated speech data," in Proc. Int. Conf. Ubiquitous Computing (UbiComp), 2008.

[26] M. Rossi, O. Amft, and G. Tröster, "Collaborative personal speaker identification: A generalized approach," Pervasive and Mobile Computing, vol. 8, pp. 180–189, 2012.

[27] K. K. Rachuri, C. Mascolo, P. J. Rentfrow, and C. Longworth, "EmotionSense: A mobile phones based adaptive platform for experimental social psychology research," in Proc. Int. Conf. Ubiquitous Computing (UbiComp), 2010.

[28] H. Lu, D. Frauendorfer, M. Rabbi, M. Schmid Mast, G. T. Chittaranjan, A. T. Campbell, D. Gatica-Perez, and T. Choudhury, "StressSense: Detecting stress in unconstrained acoustic environments using smartphones," in Proc. Int. Conf. Ubiquitous Computing (UbiComp), 2012.


[29] M. Rabbi, S. Ali, T. Choudhury, and E. Berke, "Passive and in-situ assessment of mental and physical well-being using mobile sensors," in Proc. Int. Conf. Ubiquitous Computing (UbiComp), 2011.

[30] T. M. T. Do and D. Gatica-Perez, "Human interaction discovery in smartphone proximity networks," Personal and Ubiquitous Computing, vol. 17, no. 3, pp. 413–431, 2011.

[31] D. Olguin, P. A. Gloor, and A. S. Pentland, "Capturing individual and group behavior with wearable sensors," in AAAI Symp. Human Behavior Modeling, 2009.

[32] S. Feese, B. Arnrich, M. Rossi, M. Burtscher, B. Meyer, K. Jonas, and G. Tröster, "Towards monitoring firefighting teams with the smartphone," in Proc. Int. Conf. Pervasive Computing and Communications (PerCom), Work in Progress, 2013.

[33] S. Feese, B. Arnrich, M. Burtscher, B. Meyer, K. Jonas, and G. Tröster, "CoenoFire: Monitoring performance indicators of firefighters in real-world missions using smartphones," in Proc. Int. Conf. Ubiquitous Computing (UbiComp), 2013.

[34] S. Feese, B. Arnrich, M. Burtscher, B. Meyer, K. Jonas, and G. Tröster, "Sensing group proximity dynamics of firefighting teams using smartphones," in Proc. Int. Symp. Wearable Computers (ISWC), 2013.

[35] N. Aharony, W. Pan, C. Ip, I. Khayal, and A. Pentland, "Social fMRI: Investigating and shaping social mechanisms in the real world," Pervasive and Mobile Computing, vol. 7, pp. 643–659, 2011.

[36] K. Lewis, "Measuring transactive memory systems in the field: Scale development and validation," Journal of Applied Psychology, vol. 88, no. 4, pp. 587–604, 2003.

[37] R. Rico, M. Sanchez-Manzanares, F. Gil, and C. Gibson, "Team implicit coordination processes: A team knowledge-based approach," Academy of Management Review, vol. 33, no. 1, pp. 163–184, 2008.


8 Monitoring Firefighters in Real-world Missions

Sebastian Feese, Bert Arnrich, Gerhard Tröster, Michael Burtscher, Bertolt Meyer, Klaus Jonas

CoenoFire: Monitoring Performance Indicators of Firefighters in Real-world Missions using Smartphones

Proceedings International Joint Conference on Pervasive and Ubiquitous Computing (UbiComp), pp. 83–92, 2013.

© 2013 ACM.


Abstract

Firefighting is a dangerous task and many research projects have aimed at supporting firefighters during missions by developing new and often costly equipment. In contrast to previous approaches, we use the smartphone to monitor firefighters during real-world missions in order to provide objective data that can be used in post-incident briefings and trainings. In this paper, we present CoenoFire, a smartphone based sensing system aimed at monitoring temporal and behavioral performance indicators of firefighting missions. We validate the performance metrics by showing that they can indicate why certain teams performed faster than others in a training scenario conducted by 16 firefighting teams. Furthermore, we deployed CoenoFire over a period of six weeks in a professional fire brigade. In total, 71 firefighters participated in our study and the collected data includes 76 real-world missions totaling over 148 hours of mission data. Additionally, we visualize real-world mission data and show how mission feedback is supported by the data.

8.1 Introduction

Firefighting is a dangerous and potentially life threatening task. Firefighters work in unfamiliar situations under a high degree of uncertainty, and time is critical [1]. To overcome these challenges, team work is of utmost importance. High team performance of firefighters is crucial for saving lives and protecting property and environment.

During missions each firefighter fulfills a specific function and relies on his peers. These individual functions and their related activities have to be coordinated within the team. As a result, effective coordination is vital for firefighting, which is in line with the general finding that team coordination is an important correlate of performance [2]. As coordination and performance unfold in time, continuous monitoring is important to investigate these processes in detail.

In our view, ubiquitous computing can help to continuously monitor performance indicators of firefighters during real-world missions and to assist incident commanders as well as training instructors with objective data during post-incident feedback and training. As most firefighters of our study already carry their mobile phone with them, even during missions, the smartphone can serve as a rich sensor platform to unobtrusively monitor firefighters.


Our goal is a system that can be used to capture performance indicators of firefighters in training scenarios as well as real-world missions. In close collaboration with a professional fire brigade, we defined a set of performance metrics that can be extracted from the smartphone data. Furthermore, we visualize the sensor data to show how missions evolve over time and to automatically create a high-level log book with important events of a mission. In particular, our contributions are:

1. We describe how sensor data can be collected over a period of several weeks in a hazardous, real-world work environment. We analyze requirements, detail the implementation of our sensing system CoenoFire and present lessons learned.

2. Considering speech and movement activity as proxies of explicit team coordination and team effort, we analyze their relationship to the critical performance measure of completion time in a realistic training scenario.

3. We show how real life missions evolve over time and demonstrate how mission phases and important firefighting events, such as time of arrival and first troop in house, can be automatically logged.

8.2 Related Work

In this section, we detail related work on the two main aspects of the paper. First, we summarize technical projects which aimed at supporting firefighters. Second, we review current smartphone sensing applications.

8.2.1 Supporting Firefighters

Previous research projects which aimed to support firefighters focused on three aspects: monitoring of firefighters' health during missions, monitoring of the environment of firefighters for toxic gases and high temperatures, and providing navigational support.

The European Union funded several research projects which aimed at supporting and increasing work safety of firefighters. The ProeTEX project [3] developed a system including a smart textile to monitor the physiological status of the firefighter. Within the emergency response part of the wearIT@work project [4], the LifeNet, a beacon-based


relative positioning system, was proposed to support tactical navigation under poor visibility. To increase acceptance by the firefighters, the LifeNet approach was adapted in the ProFiTex project [5] to better integrate with current practices of firefighting brigades and resulted in a Smart Lifeline to which firefighters are connected and over which data can be transmitted out of the building to the incident commander. The NIST Smart Firefighting Project [6] combines research in smart building technology, smart firefighter equipment and robotics. As in previous projects, the aim is to provide real-time information on firefighter location, firefighter vital signs, and environmental conditions to the firefighter, incident commander, and other firefighters. The Fire Information and Rescue Equipment project [7] at UC Berkeley combined wireless sensor networks and small head-mounted displays to support firefighters. In [8] a fixed wireless sensor network enables the communication between emergency responders and the incident commander. Multiple prototypes of localization and navigation systems have been developed to support firefighters. In a recent review [9], the benefits and drawbacks of pre-installed location systems, wireless sensor systems and inertial tracking systems for emergency responders were compared.

All of the above projects focused on supporting firefighters on-site. Although system prototypes were tested in simulated scenarios, none of these project ideas were used in real-world missions. Our approach puts the focus on real-world deployment and usage during actual incidents. In this paper, we do not put our primary focus on supporting firefighters during missions when time is critical, but rather on documenting mission operations in order to support post mission feedback and further trainings. However, CoenoFire offers real-time feedback of performance indicators and can be used to monitor ongoing missions, provided that the area of operation is covered by the mobile network.

8.2.2 Smartphone Sensing

As we use the smartphone as our sensing platform, we review existing smartphone sensing applications targeted at monitoring an individual or a group of persons.

The smartphone, with more and more built-in sensors, has evolved into a ubiquitous sensing platform, and recent research has shown how user context and behavior can be inferred. Studies dealt with the inference and detection of important places [10], the detection of daily routines [11],


as well as the detection of users' emotions [12], experienced stress [13] and personality [14]. Automatic assessment of well-being with the smartphone was explored in [15]. On a population level, communities were first identified from Bluetooth proximity networks by Eagle et al. [16], and only recently topic models were used to discover human interactions from proximity networks [17].

Instead of using the smartphone, Olguin et al. used sociometric badges to collect behavioral data of 67 nurses in the Post Anesthesia Care Unit of a hospital [18]. The results showed a positive relationship of group body motion energy and speaking time with group productivity.

In previous work [19], we adopted the idea of using motion and speech activity to monitor teams. Our feasibility study showed that speech and motion activity are promising performance indicators in firefighting teams. However, to the best of our knowledge, no previous study has attempted the monitoring of professional firefighters during real-world missions using only the smartphone.

8.3 Performance Indicators

The performance of a firefighting squad depends on a set of criteria and there exists no single measure of performance. During missions, firefighters have to keep several objectives in mind, but obviously their own safety stands above all, followed by rescuing other lives and protecting property.

Firefighting squads can be considered as action teams that are characterized by expert members conducting complex, time-limited tasks in challenging environments [20]. As delays can be disastrous, time is a critical factor in evaluating the performance of firefighters [1]. In this regard, two temporal aspects can be distinguished: speed and timing. Action teams need to complete their tasks quickly. Moreover, the right timing of team activities (when to do what) is crucial for success. Phase models of team processes [21] highlight this second aspect of temporality. For example, planning activities should be finished prior to task execution.

To assess the speed aspect of time related team performance, we propose to use timing measures of important events during missions. This includes the time of arrival on-site, as well as the time of a first troop entering a building. We will therefore aim to detect these events automatically from the smartphone sensor data.


In addition, we extract the following behavioral performance metrics from the sensor data. We measure team effort as the amount, intensity and variability of physical activity, reasoning that higher team effort is expressed in more physical activity. Furthermore, we assess team coordination as the amount of speech activity. The idea behind this is that the more firefighters have to explicitly coordinate their actions, the more they have to communicate.

8.4 CoenoFire System

In the following, we detail our approach to monitor performance indicators of firefighters and describe CoenoFire, our mobile sensing system to monitor firefighters on duty.

8.4.1 Requirements on Monitoring System

For a successful data collection in a real working environment, it is of utmost importance not to interfere with day-to-day operations. In the case of monitoring firefighters this is particularly true, as time is critical and firefighters will not accept any delays when they leave the station for a mission. At the same time, the system should run reliably and always be ready to record data. Consequently, in order to monitor firefighting missions 24/7, one has to find a practical solution to charge the smartphones reliably without much user effort and to start and stop the data recording automatically.

To ease administration and to be able to respond to possible data collection problems, for example when firefighters forget to charge the smartphone after returning from a mission, the system should further support some real-time feedback on its current state, such as the battery level of each smartphone.

Additional information about an incident may be obtained through a post mission questionnaire. However, the number of questions asked of each firefighter should be kept to a minimum, because time consuming questionnaires will result in missing data, as firefighters have more important tasks than filling out questionnaires.

8.4.2 Data Collection Framework

Our data collection framework CoenoFire consists of two parts, the smartphone data collector as the sensing front-end and a database and visualization server in the back-end.


Figure 8.1: CoenoFire: Smartphone based data collection framework. Raw smartphone sensor data is saved to the SD card and features are transmitted via the mobile network to enable real-time monitoring of performance metrics and system status, e.g. battery level.

The overall data flow is illustrated in Figure 8.1.

For data collection, we used the Sony Xperia Active smartphone, which was designed for active people. It features a dust and water-resistant case, a 3-inch capacitive touchscreen and a built-in ANT radio (http://www.thisisant.com) to communicate with fitness devices such as heart-belts. We chose the phone for the data collection because of its small form factor and its robust design.

Front-End: Smartphone Data Collector

Based on the funf-open-sensing-framework [22], we designed an Android app to sample the phone's built-in sensors. For robustness reasons, each sensor was sampled in a separate background service, and we extended the framework to save the raw sensor data to the memory card.

We recorded the data from the following built-in sensors: acceleration and orientation sensors were used to measure body movement, the barometer measured atmospheric pressure and was used to infer whether firefighters were on different floor or ground levels, the microphone captured raw audio data which was analyzed for speech, GPS location fixes were used to record incident location and driving



speed, and ANT-based radio messages were sent and received to find out which firefighter was in proximity to another one.

As we aim to monitor firefighting teams, the timestamps of all devices have to be synchronized. We used the network time protocol (NTP) to measure the offset between system time and a common reference time every 5 min. With this approach, we were able to achieve a time synchronization across devices with a maximum time difference of 500 ms.
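The offset measurement can be illustrated as follows; this is only a sketch of the idea using the ntplib package, not the deployed code, which ran on Android:

import ntplib

client = ntplib.NTPClient()

def measure_offset(server="pool.ntp.org"):
    # offset between local system time and the NTP reference, in
    # seconds; logged every 5 min and used to align timestamps offline
    return client.request(server).offset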

In order to monitor the status of the smartphone data collector, we configured the framework to upload a subset of calculated features, such as the battery level and a sliding mean value of the acceleration signal, to a central server. The upload period was set to five minutes.

We installed our app as the default home screen and blocked all soft buttons of the smartphone to prevent the firefighters from playing around with any smartphone settings. In this way, our app was always visible and the use of the smartphone was restricted to our data collection.

Back-End: Database and Visualization

On the server side, we ran one web server to receive the data from the smartphones via http-post requests. Upon each request, the data was extracted and stored in a central database. A second web server provided a web-based user interface for monitoring the system in real-time. A screenshot of the web interface showing the battery status of the devices is presented on the right of Figure 8.1. The interface also allows visualizing real-time data of the firefighters' movement and speech activity. For the implementation, we used Tornado (http://www.tornadoweb.org) as our web server and chose MongoDB (http://www.mongodb.org) as our database. For data visualization, we used the JavaScript libraries d3.js (http://d3js.org) and rickshaw.js (http://code.shutterstock.com/rickshaw).

8.4.3 Detection of Mission Phases

Based on GPS location fixes, we segment each mission into three different phases. The approach phase is the first phase of each mission and starts when the fire trucks leave the station. In the second phase the firefighters are on-site, and in the third phase the mission is completed



and the firefighters return to the station. To detect the phases, the average location of all firefighters was aggregated for each second of the mission operation by taking the mean of all GPS fixes recorded within one second. Based on the squad's location, we then calculated the distance to the fire station and the driving speed and applied a moving average filter of 5 s to smooth both measures. We defined the on-site phase to be the longest time segment in which the distance to the fire station was constant and greater than 200 m. The start of the approach phase was defined to be the first second in which the distance to the fire station was at least 50 m and the squad's movement speed was higher than 20 km/h. The return phase ended as soon as the distance to the fire station was smaller than 50 m and the squad's movement speed was less than 20 km/h.
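A minimal sketch of this segmentation, assuming per-second distance-to-station and speed series as inputs; helper names and the "constant distance" tolerance are illustrative:

import numpy as np

def smooth(x, w=5):
    # 5 s moving average, as described above
    return np.convolve(x, np.ones(w) / w, mode="same")

def longest_run(mask):
    # (start, end) indices of the longest contiguous True segment
    best, start = (0, 0), None
    for i, m in enumerate(list(mask) + [False]):
        if m and start is None:
            start = i
        elif not m and start is not None:
            if i - start > best[1] - best[0]:
                best = (start, i)
            start = None
    return best

def detect_phases(dist_m, speed_kmh):
    dist = smooth(np.asarray(dist_m, float))
    speed = smooth(np.asarray(speed_kmh, float))
    # approach starts: at least 50 m from the station, faster than 20 km/h
    approach_start = int(np.argmax((dist >= 50) & (speed > 20)))
    # on-site: longest segment with roughly constant distance > 200 m
    on_site = longest_run((np.abs(np.gradient(dist)) < 1.0) & (dist > 200))
    # return ends: back within 50 m and slower than 20 km/h
    back = np.where((dist < 50) & (speed < 20))[0]
    back = back[back > on_site[1]]
    return_end = int(back[0]) if len(back) else len(dist) - 1
    return approach_start, on_site, return_end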

8.4.4 Detection of Group Proximity

Contrary to previous works which have relied on Bluetooth scans to detect proximity between people [16, 17], we use the low-power ANT protocol to scan for nearby devices. This allows us to detect devices in close proximity at a lower power budget and much faster, usually in less than 600 ms compared to the 30 s of a typical Bluetooth scan. This increase in time resolution by a factor of up to 50 enables us to measure how groups of firefighters split and merge during a mission.

Each smartphone constantly transmits a unique ID and searches in parallel for devices contained in a search list. Every five seconds, we determine which of the devices was seen by each other device and cluster this proximity data to detect groups of firefighters that are in close proximity. The clustering is done by grouping all pairs of devices together that are connected by at least one link. A temporal filter is then applied to smooth the clustering result. To also consider whether two firefighters are on the same floor level, the measured difference in atmospheric pressure is taken into consideration. In case the absolute pressure difference is more than 1 hPa, which corresponds to about 8 m to 10 m in height difference, we conclude that the two firefighters are on different levels and thus not close to each other. A more detailed description and evaluation of our group proximity sensing method can be found in [23].
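One 5 s clustering step can be sketched as connected components over the proximity links, with the pressure check removing cross-floor links; the names and the networkx-based implementation are assumptions, not the deployed code:

import networkx as nx

def group_step(links, pressure_hpa):
    # links: (id_a, id_b) pairs seen within the last 5 s window
    # pressure_hpa: dict id -> current atmospheric pressure reading
    g = nx.Graph()
    g.add_nodes_from(pressure_hpa)
    for a, b in links:
        # more than 1 hPa difference (~8-10 m) means different floors
        if abs(pressure_hpa[a] - pressure_hpa[b]) <= 1.0:
            g.add_edge(a, b)
    # groups = devices connected by at least one remaining link
    return [set(c) for c in nx.connected_components(g)]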


8.4.5 Performance Metrics

From the smartphone sensor data, we extract the following temporal and behavioral performance metrics. In general, the behavioral performance metrics are calculated for each firefighter over a defined period of time, such as a mission phase or the complete mission. Additionally, to address the timing aspect of team performance and to visualize how a mission evolves over time, we calculate the behavioral metrics on consecutive periods of 30 seconds.

Behavioral Performance Metrics

Movement Activity To detect body movement activity, first the sliding standard deviation of the acceleration magnitude $\sigma_a$ over one second is calculated, and then a threshold based approach is used to segment the motion data into active and non-active segments. The movement activity describes how much of a period a firefighter was active and is given by

$$ \text{movement activity} = \frac{1}{N} \sum_{n=1}^{N} \left[ \sigma_a(n) > \tau_a \right], \tag{8.1} $$

with $\tau_a$ being an activity threshold and $[\cdot]$ being the indicator function. The activity threshold $\tau_a$ was learned from the movement data using a two component Gaussian Mixture Model.

Movement Intensity is given by the median of the absolute linear acceleration magnitude. Linear acceleration is calculated by subtracting the median value from the acceleration magnitude.

Movement Variability is given by the inter-quartile range of the absolute linear acceleration magnitude.
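A compact sketch of the three movement metrics; the fixed threshold below stands in for the GMM-learned $\tau_a$ described above:

import numpy as np

def movement_metrics(acc_mag, fs, tau_a=0.3):
    # acc_mag: acceleration magnitude in m/s^2, sampled at fs Hz
    win = int(fs)  # 1 s windows
    sigma = np.array([acc_mag[i:i + win].std()
                      for i in range(0, len(acc_mag) - win + 1, win)])
    activity = float(np.mean(sigma > tau_a))   # fraction of active seconds
    linear = np.abs(acc_mag - np.median(acc_mag))
    intensity = float(np.median(linear))       # median linear acceleration
    q75, q25 = np.percentile(linear, [75, 25])
    variability = float(q75 - q25)             # inter-quartile range
    return activity, intensity, variability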

Speech Activity To automatically detect speech from the recorded raw audio data, we use the long-term spectral variability (LTSV) measure presented in [24]. In our previous work [19], we have shown that LTSV can detect speech activity with high accuracy even in noisy firefighting scenarios. Analogous to movement activity, speech activity describes how much of the mission time a firefighter or someone near him spoke.
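The LTSV idea can be roughly sketched as follows, based on [24]: for each frequency bin, compute the entropy of the spectral power over the last R frames; speech makes these entropies vary across bins, while stationary noise does not. The parameter values and names below are assumptions, not the exact formulation used:

import numpy as np

def ltsv(power, R=30):
    # power: (n_frames, n_bins) short-time spectral power of the audio
    scores = np.zeros(power.shape[0])
    for m in range(R, power.shape[0]):
        seg = power[m - R:m]                        # last R frames
        p = seg / (seg.sum(axis=0, keepdims=True) + 1e-12)
        entropy = -(p * np.log(p + 1e-12)).sum(axis=0)  # per bin
        scores[m] = entropy.var()  # speech: high variability across bins
    return scores  # threshold to obtain speech / non-speech segments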

Temporal Performance Metrics

First Above Ground In missions which require the turntable ladder to be used, the time that a firefighter is first above ground level is


calculated using the atmospheric pressure signal. Using the pressure signal of the engineer, who operates the firetruck on ground level, as the reference signal, we calculate the difference between the atmospheric pressure measured at each firefighter and at the engineer. In case the pressure difference is more than 1 hPa, which equals roughly 8 m in height difference, the time that a firefighter is first above ground level is calculated.
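A sketch of this event detection, assuming time-aligned 1 Hz pressure traces in hPa; function and variable names are illustrative:

import numpy as np

def first_above_ground(p_firefighter, p_engineer):
    # pressure drops with height, so more than 1 hPa below the
    # engineer's reading corresponds to roughly 8 m above ground
    above = (p_engineer - p_firefighter) > 1.0
    return int(np.argmax(above)) if above.any() else None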

Arrival On-Site For real-world incidents, the time of arrival on-site is given by the length of the approach phase.

8.5 Validation of Performance Metrics

In order to validate the proposed performance metrics, we monitored 16 firefighting teams during a training scenario in a fire simulation building. Based on the performance metrics, we compare the teams and show how the metrics relate to the mission completion time, one important measure of performance.

8.5.1 Data Collection during Trainings

The data collection took place at the training facilities for first responders in a major city of Switzerland. We staged our experiments in the fire simulation building, where a variety of training scenarios can be realistically simulated. During trainings, which range from kitchen fires to burning cars in the garage, firefighters are confronted with real fires, extreme heat, high humidity, severely restricted visibility and thick smoke.

Together with the training instructors, we designed a non-standard training scenario with increased difficulty to ensure that different teams would not perform equally well. In the chosen training scenario, a fire on the third floor of an apartment building is reported by an automatic fire alarm system and the fire department sends a squad consisting of a fire truck and a turntable ladder. The squad includes eight to nine firefighters, split into three firefighters on the turntable ladder and five to six firefighters on the fire truck.

Each firefighter has a specific role which is fixed to the seating position in the firetrucks. The incident commander (IC) is in charge and keeps track of the ongoing operation. On-site, the driver of the turntable ladder (L) is responsible for operating the ladder, whereas the driver of the fire truck becomes the engineer (E) who operates the


water pumps. The engineer is also responsible for keeping track of which firefighter uses the self contained breathing apparatus (SCBA) and for how long. All other firefighters are part of a troop and thus potentially use the SCBA. The first and second troop are composed of a troop leader (T1a, T2a) and one or two other firefighters (T1b, T1c, T2b, T2c).

As soon as the squad arrives at the scene, the incident commander analyses the scene and decides how to position the fire trucks, which hose to use, the size of the first troop and where to enter the building. After the decision is made, the incident commander gives orders to his squad and the preparation to enter the building via the turntable ladder begins. As soon as the hose is prepared and the troop is ready, the turntable ladder brings the troop upwards to the roof window where the troop members enter the building.

When the first firefighters enter the building, it is already filled with thick smoke, so that the troop has to navigate blindly to the fire, which is located one floor below the level of the roof window at a staircase of a maisonette apartment. On the way towards the fire, an unexpected dummy person has to be found and rescued. At this point the troop leader has to decide how to respond to the new situation, as he did not know in advance that a person was at risk. Only after the dummy person is safe should the fire be extinguished, which can be done either by the first troop or by a second troop.

We successfully recorded 16 training runs of the same scenario. All training runs were videotaped. We used two regular cameras to record outside and a thermographic camera to record inside the building. Impressions of the scenario are presented in Figure 8.2. In all runs, the location of entrance was fixed to be the roof window. We chose a single point of entrance for two reasons: First, it made runs more comparable as it reduced variability between runs, and second, it increased the difficulty as firefighters had to fight against the heat of the fire while maneuvering from upper to lower floors.

In total, 51 male professional firefighters, aged 35 ± 10, took part in the data collection. The data recording was scheduled on four consecutive days. In order to have many different team compositions, the firefighters of the morning and afternoon sessions were exchanged completely, and in each run of one session the roles of the firefighters were changed in such a way that at least the troop was always composed of different firefighters. The incident commander within one session always stayed the same.

For later analysis, we used the video recordings to manually split the training scenario into two phases.


Figure 8.2: Impressions of the training scenario. Firefighters had to enter through a roof window and navigate in low visibility to a fire on the third floor, rescue an unexpected dummy person and extinguish the fire.

In the preparation phase, the turntable ladder is positioned, the hose is prepared and the troop uses the turntable ladder to reach the roof window. We defined the preparation phase to start when the first truck reached its final position and to last until the turntable ladder was fully extended. The execution phase lasted until the troop reported to the incident commander that the fire had been extinguished.


Table 8.1: Correlations between performance metrics and duration of preparation and execution phase, as well as for the complete training duration.

                         duration of
metric                   preparation   execution   complete
movement activity        -0.03         0.01        0.03
movement intensity       -0.55*        -0.34       -0.32
movement variability     -0.55*        -0.33       -0.25
speech activity          0.06          0.57*       0.39
first above ground       0.87**        -0.10       0.41

Notes: *p < 0.05, **p < 0.01

8.5.2 Analysis of Performance Metrics

In the following, we investigate how the performance metrics are related to mission completion time, one of the most critical indicators of team performance in firefighting. We compare the 16 teams in terms of their performance metrics averaged over all involved firefighters. In Figure 8.3 the mean values of the performance metrics observed for slow, middle and fast teams are shown for the preparation and execution phase, as well as for the complete training mission.

The categorization into slow, middle and fast teams was done for each phase separately by the quartiles of the phase completion time. The completion times of the slow teams are consequently in the highest quartile, whereas the completion times of the fast teams are in the lowest quartile. In addition to the bar plots, the linear correlation coefficients between the performance metrics and the phase duration times are given in Table 8.1.
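The quartile split can be sketched as follows; names are illustrative:

import numpy as np

def categorize(times):
    # lowest quartile of completion time -> "fast", highest -> "slow"
    q1, q3 = np.percentile(times, [25, 75])
    return ["fast" if t < q1 else "slow" if t > q3 else "middle"
            for t in times]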

As can be seen in the top of Figure 8.3, all teams were active for about 70 % of the preparation phase; however, faster teams showed higher movement intensity and movement variability. This relationship is also seen in the negative linear correlation between the movement related metrics and the phase duration: the higher the average movement intensity and variability across firefighters, the shorter the preparation phase.


[Figure 8.3 consists of bar charts of speech and motion activity [%] and of movement intensity and variability [m/s²], each for fast, middle and slow teams in the preparation phase (fast: t < 3.1 min, slow: t > 4.7 min), the execution phase (fast: t < 6.9 min, slow: t > 9.0 min) and the complete training (fast: t < 10.6 min, slow: t > 12.9 min).]

Figure 8.3: Performance metrics during the preparation and execution phases, as well as for the complete training. For each phase, teams were split into slower, middle and faster teams by the first and third quartiles of the respective phase durations (the respective threshold times are given in brackets).


Interestingly, speech activity is not correlated with the duration of the preparation phase. Most likely this stems from the fact that the preparation phase of the chosen training scenario is standard procedure and thus known by heart, so that no extra coordination is required. The performance metric first above ground is a good indicator of the length of the preparation phase (R = 0.87); the two measures are not perfectly correlated because troops needed more or less time to enter the roof window.
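One plausible way to compute the first above ground metric from the barometer is sketched below; the threshold, the baseline window and the data layout are our assumptions for illustration, not necessarily the exact implementation used in this thesis. Near sea level, a pressure drop of roughly 0.12 hPa corresponds to about 1 m of altitude gain.

import numpy as np

def first_above_ground(pressure_hpa, t_sec, drop_hpa=0.4, baseline_sec=30.0):
    """Return the first time [s] at which the pressure falls drop_hpa
    below the on-site baseline (median of the first baseline_sec),
    or None if the firefighter never leaves ground level."""
    baseline = np.median(pressure_hpa[t_sec < t_sec[0] + baseline_sec])
    above = pressure_hpa < baseline - drop_hpa
    return float(t_sec[np.argmax(above)]) if above.any() else None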

As in the preparation phase, faster teams also showed higher movement intensity and movement variability during the execution phase. Again, this can be seen in the negative correlations between movement intensity, movement variability and the execution phase duration. During the execution phase, slower teams showed more speech activity than faster teams. Thus, we observe a positive correlation between speech activity and execution phase duration. The higher amount of communication might indicate more need for explicit coordination, which consequently leads to longer execution phases.

Analyzing the complete training duration, we find that overall slower teams tend to speak more, as seen by the positive correlation between speech activity and training duration. Slower teams showed less movement intensity and movement variability, while being active for the same amount of time.

The analysis of the performance metrics showed that the metrics are valid performance indicators: they are not only correlated with the temporal performance measures of phase and training duration, but also provide more insight into why some teams might have been faster than others.

8.6 CoenoFire in the Wild

In the following, we describe the conducted real-world study with professional firefighters. We detail the data collection procedure during real-world deployment, analyze mission operations of a real-world fire incident and show how the smartphone data can support post-mission feedback.

8.6.1 Data Collection

Over a period of six weeks, we monitored a squad of nine professional firefighters in 33 shifts during real-world incidents. Each squad was composed of a turntable ladder with three firefighters and a fire truck with five to six firefighters, varying with the station's work plan.

[Figure 8.4: photographs of the deployment, with panels labeled Alarm, Incident and Return.]

Figure 8.4: Deployment of smartphones during the data collection phase. Smartphones were placed and charged next to the fire truck, to be picked up by the firefighters before leaving the fire station.

Work is organized in three 24-hour shifts, meaning that a firefighter is on duty for 24 hours and off for the next 48 hours. Each shift begins at 7 am with a report from the previous shift and ends with a handover to the next shift the following morning. During a shift, firefighters maintain equipment, take part in special training and keep themselves fit with sports. In case of an incident alarm, the firefighters stop their everyday activities, put on their protective clothing and jump on the fire trucks to drive to the incident location.


With the requirements of an in-workplace recording in mind (see 'Requirements on Monitoring System'), we integrated the data collection procedure into the daily routine of the fire brigade as follows: The smartphones were placed on a sideboard to the left of the fire truck and were attached to a powered USB hub which served as charging station (see Figure 8.4). In this way, the phones were always charged and ready to be used. As soon as an alarm occurred, the firefighters unplugged the smartphone labeled with the number of their daily position and put it inside the left inside pocket of their jacket. Unplugging the smartphone from the charging cable triggered the recording app to automatically start the data collection. In this way, the firefighters were not further disturbed in their normal routine. When the firefighters returned to the station, they reconnected the smartphone to the charging cable, which triggered the app to display a short post-mission questionnaire of 10 questions.

During the data collection period, the monitored squads were involved in 76 incidents, of which 43 were triggered by automatic fire alarm systems, 9 were real fire incidents and the rest were other incident types such as a burning garbage container, a trapped person in an elevator or water inside a building. In total, 71 firefighters participated in the real-world data collection.

8.6.2 A Real-World Mission

In the following, we visualize the first 30 minutes of smartphone data recorded during a fire at a multi-family residential home. We chose to show this mission because a detailed mission report was available. Impressions of the fire incident are presented at the top of Figure 8.5.

When leaving the fire station, only a street intersection was provided as the incident location and the detailed address of the incident was unclear. When the squad arrived at the incident scene, the police informed the incident commander that two persons were still missing in the apartment on the third floor. Consequently, the first concern of the incident commander was to rescue the missing persons, and he ordered the first troop to search and rescue them via the staircase using the quick-attack hose. Afterwards, the incident commander ordered the second troop to attack the fire at the balcony via the turntable ladder in order to extinguish the fire and to save the roof soffit. The incident commander then ordered a second squad for backup. The persons were found and rescued by the first troop, and four other persons were evacuated via a side balcony on the fourth floor. The whole mission lasted more than three hours.


[Figure 8.5 plots, over mission time (0–30 min, split into an Approach and an On-site phase): atmospheric pressure [hPa] (961–965), group proximity at ground and higher levels, motion intensity [m/s²] and speech activity [%] for the incident commander, the engineer, troop 1 and troop 2, with annotated events 1b–5c.]

Figure 8.5: Impressions and visualization of the smartphone data recorded during the first 30 minutes of a real-world firefighting mission in a multi-family residential home. Mission time starts as soon as the firefighters leave the station. Shown are, from top to bottom, atmospheric pressure, group proximity, movement intensity and speech activity. The pressure signals alone indicate when the first (2a) and second troop (3a) reached higher floors and when the two missing persons were rescued (4a, 5a).



8.6.3 Data Supported Mission Feedback

In Figure 8.5, the smartphone data illustrates how the firefighting operations evolved over time. Presented are, from top to bottom, atmospheric pressure, groups of firefighters who are in proximity to each other, motion intensity and speech activity. The mission phases are underlaid in different colors, the approach phase in light blue and the on-site phase in light orange.

From the pressure signal, we can infer altitude changes while approaching the incident site and relative differences in altitude between firefighters during the on-site phase, indicating when a troop operated above ground level. The proximity graph displays, in the form of a narrative chart, which firefighters were in proximity at each point in time of the mission. Each firefighter is represented by a line of a different color, and lines that are close to each other represent a group of firefighters that are in proximity. The graph also indicates on which level relative to the ground a group of firefighters operates. The movement intensity of the firefighters is aggregated every 30 seconds and is illustrated in the form of a stacked bar chart, which allows one to infer when the squad was most active and which firefighters were most active. Analogously, the speech activity detected at each firefighter is displayed. The values are normalized, and 100 % would mean that speech was detected at all firefighters for the entire 30-second period.
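The aggregation behind the lower two panels can be sketched in a few lines of Python; the sampling rate, array shapes and random inputs below are illustrative assumptions only.

import numpy as np

FS = 25            # assumed sensor sampling rate [Hz]
WIN = 30 * FS      # 30-second window in samples

def windowed_mean(x):
    """Average x (n_firefighters x n_samples) in non-overlapping 30 s windows."""
    n = (x.shape[1] // WIN) * WIN
    return x[:, :n].reshape(x.shape[0], -1, WIN).mean(axis=2)

accel_mag = np.abs(np.random.randn(8, 45000))    # placeholder |acceleration| per firefighter
voiced = np.random.rand(8, 45000) > 0.7          # placeholder binary speech detections

motion_per_window = windowed_mean(accel_mag)     # per-firefighter bars of the stacked chart
speech_pct = 100 * windowed_mean(voiced.astype(float)).mean(axis=0)  # squad-level speech [%]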

Looking at the approach phase first, we notice a short peak in movement intensity (see 1c) and a merging of all firefighters (see 1b). This is the result of the uncertain incident location and of the fact that the given street intersection exists twice, as one of the streets is circular. Consequently, the squad stopped at the first intersection only to find out that they had to continue driving to reach the second intersection, where the incident was located.

The high peak of movement intensity at the beginning of the on-site phase (see 2c) indicates the rapid start of all firefighters, especially of the first troop, which had to rescue the missing persons. Already within one minute after arrival on-site, the first troop is at least 8 meters above the engineer, which can be seen from the pressure signals (see 2a) and the group clustering (see 2b). Together with the high motion intensity, this shows the fast operation of the first troop.

Between minutes 10 and 14 of the mission, the engineer E and the incident commander IC moved intensively, while at the same time the speech activity of all firefighters dropped considerably. In this period of the mission, automatisms were at play and all firefighters followed their role-specific tasks, indicating that everything went as planned. For the engineer E this meant connecting the fire truck to the nearest fire hydrant, while the incident commander maintained an overview of the situation on-site.

At minute 15 of the mission, the second troop arrived at the balcony to extinguish the fire at the roof, which can be seen from the pressure signal (see 3a) and the proximity clustering (see 3b). It appears that only one firefighter was involved in this task; however, this is not the case: because we monitored only one squad, not all firefighters involved in the mission carried a smartphone.

Twelve minutes after arriving on-site, the first person was found and rescued by troop member T1b (see 4a, 4c). From the fourth floor, T1b carried the person down the staircase to the first responders waiting outside the building. A little later, the second person was rescued and carried down by troop leader T1a (see 5a, 5c).

8.6.4 Data Completeness

In the following, we analyze data completeness and evaluate how well the data collection procedure could be integrated into the daily routine of the firefighters. We first look at how well the charging procedure worked during deployment. In the top left of Figure 8.6, the overall data completeness is shown. It can be seen that the smartphones were on and ready to record in 93 % of the expected recordings, where the number of expected recordings is given by the product of the number of missions and the number of firefighters involved. In total, we collected 236 recordings.
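The bookkeeping behind these figures is straightforward; the following sketch uses an invented mission log, not the study's data, purely to show the arithmetic.

missions = [                                   # hypothetical mission log
    {"incident": "alarm", "crew": 8, "recordings": 6},
    {"incident": "fire",  "crew": 9, "recordings": 7},
    {"incident": "alarm", "crew": 8, "recordings": 3},
]
expected = sum(m["crew"] for m in missions)     # missions x firefighters involved
recorded = sum(m["recordings"] for m in missions)
print(f"data completeness: {recorded}/{expected} = {recorded / expected:.0%}")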

To better understand when firefighters did not take the smartphone with them, we looked at the following factors, which might have had an influence on the data collection. All factors are illustrated in Figure 8.6.


[Figure 8.6 consists of bar charts of data completeness: overall (phones on for 93 % of expected recordings, 7 % off; 236 recordings collected), by period of data collection (weeks 1–2, 3–4, 5–6), by incident time (morning, afternoon, evening, night), by vehicle (fire truck vs. turntable ladder), by workgroup (1–3) and by incident type (alarm, fire residence, fire airport, aircraft, other).]

Figure 8.6: Overall data completeness and factors which influenced data completeness during real-world deployment. The system was on and ready most of the time, and firefighters carried the smartphone in more than one third of all possible incidents. Given are absolute values for the number of recordings; percentage values indicate the fraction of actual recordings given the number of expected recordings.

Page 194: In Copyright - Non-Commercial Use Permitted Rights ...46843/eth-46843-02.pdfDiss. ETH No. 21894 Observing Teams with Wearable Sensors A dissertation submitted to ETH Z¨urich for the

8.7. Discussion and Conclusion 177

• Period of Data Collection: We noticed that the data completeness rate decreased over the period of the data collection. Within the first two weeks, 62 % of all expected recordings were completed. The completeness rate dropped to 42 % in the second fortnight and reached 28 % in the last two weeks. The low data completeness towards the end of the data collection is probably due to this period falling into a holiday season and to some firefighters thinking that the data collection had already ended.

• Incident Time: We observed a higher than average data completeness for incidents that occurred in the afternoon and the lowest for incidents at night. At night and at the first incident of the day, firefighters forgot to pick up the smartphones more often.

• Incident Type: Depending on the incident type, the data completeness rate varied from 36 % to 69 %, with one clear exception: in case of an aircraft incident, almost no data was recorded. Because firefighters have to be at the airplane within three minutes, time is extremely scarce and the firefighters could not spend any additional time unplugging the smartphone.

• Fire Truck: We noticed that firefighters of the fire truck remembered the phone almost twice as often as firefighters of the turntable ladder. Most likely this is due to the fact that all smartphones were located close to the fire truck, but further away from the turntable ladder.

• Workgroup: Comparing the three shift workgroups, we observed that the first workgroup had a data completeness rate of 49 %, whereas the two other groups had completeness rates of 36 % and 34 %, respectively. It appears that the first workgroup was highly motivated to participate in the data collection.

8.7 Discussion and Conclusion

We have presented CoenoFire, a smartphone-based sensing system for monitoring performance indicators of firefighting missions. We successfully deployed CoenoFire in a professional fire brigade over a period of six weeks, in which 71 firefighters used the system in 76 real-world missions.

The performance of firefighters depends on many factors, and any metric derived from smartphone data can only give indications of what might have been good or bad during a training or mission. However, we have demonstrated that, with only a smartphone in the jacket of each firefighter, detailed information can be extracted that is valuable for incident commanders and training instructors. In the recorded training scenario, we have seen that longer mission durations are correlated with more speech activity of the squad, which could indicate that more explicit coordination was needed. We also found that shorter preparation and execution phases were related to higher movement intensity and variability.

We have seen that in scenarios which spread across different floors, the signal of a pressure sensor alone can provide information about when the first troop reached a level above or below the reference level of the engineer, which firefighters were in that troop, and for how long the troop was operating. Combined with proximity information derived from low-power communication radios, we showed how groups merge and split during missions to perform different tasks. We showed how the smartphone data can be visualized to show the temporal evolution of a mission and how important mission phases and events can be detected. As the training instructor of the fire brigade put it: "I can really see how the mission evolved, it is a great tool for post-incident feedback and training."
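The merge-and-split view of groups can be reproduced with a standard connected-components computation; the sketch below assumes that pairwise proximity detections per time window are already available (e.g. from radio signal strength) and uses the networkx library for illustration, which is our choice rather than the thesis' implementation.

import networkx as nx

def proximity_groups(firefighters, close_pairs):
    """Groups are the connected components of the proximity graph;
    singletons are firefighters operating on their own."""
    g = nx.Graph()
    g.add_nodes_from(firefighters)
    g.add_edges_from(close_pairs)
    return [sorted(c) for c in nx.connected_components(g)]

# One 30 s window: IC and engineer at the truck, troop 1 together upstairs.
print(proximity_groups(["IC", "E", "T1a", "T1b", "T2a"],
                       [("IC", "E"), ("T1a", "T1b")]))
# e.g. [['E', 'IC'], ['T1a', 'T1b'], ['T2a']]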

From the results of the data completeness analysis and personal feedback from the firefighters, we conclude that overall the acceptance of using the smartphone to record data during missions was high, but that the already simple data collection procedure has to be improved further. This could be achieved, for example, by integrating the smartphone better into the jacket to reduce the user effort.

In future work, the audio data could be mined for reoccurring ambient sounds, which could further improve the logging of important events throughout missions.

8.8 Acknowledgements

The authors would like to thank all members of the fire brigade for their participation and support throughout the experiments. This work is partly funded by the SNSF interdisciplinary project "Micro-level behavior and team performance" (grant agreement no. CR12I1_137741).


Glossary

activity level: captures the fraction of time that a person is actively moving during a defined time interval. 10, 38, 45

ANT: wireless low-power communication protocol. 26, 35, 37

behavioral mimicry: refers to the alignment of gestures and postures between two or more interaction partners. 5, 10, 12, 24, 25, 28, 31, 44, 50, 51, 57, 58, 63, 65

clean speech dictionary: dictionary learned using clean speech as training data to best represent clean speech. 32

completion time: refers to the time it takes to complete a task. In the firefighting scenario it was used as an outcome measure of team performance. 39, 44, 165, 168

considerate leadership: is a person-focused leadership style. Considerate leaders pay special attention to their followers' needs and listen effectively. 13, 14, 25, 31, 33, 34, 52, 53, 62–65, 71–73, 75, 79, 81, 82

degree centrality: of a node in a network, measures how well the node is connected to the other nodes. 40

degree centralization: measures how central the most central node of a network is with respect to the centrality of all other nodes. 9, 38–40

logistic regression: probabilistic statistical classification model. 29, 32, 33, 56, 76, 78, 79, 81, 82, 93

moving sub-group: sub-group of a team that moves together in space and time. 11, 12, 35, 36, 38, 42, 44, 45, 103

network density: measures how well the nodes of a network are connected with each other. 9, 38

sparse representation: representation of a signal using a limited number of code book elements. 32, 88–90, 94

speech activity cues: describe patterns of speech activity of one or more team members, e.g. total speaking time, number of short utterances or number of speaker turns. 7, 8, 11, 13, 33, 71, 77, 78, 82

speech detection: refers to the binary detection of speech in sound; in this thesis it is used interchangeably with voice activity detection and speech activity detection. 10, 11, 32, 33, 90, 94, 97

team effectiveness: evaluation of team performance in terms of an outcome measure such as quantity of produced units, speed, or a team's viability. 2, 3, 5, 24, 25, 44

team performance: all processes, cognitive and affective states that are involved when team members work and interact with each other to reach their shared goals. 2–5, 11, 25, 34, 39, 44, 52, 70, 71, 156, 159, 164, 168

Curriculum Vitae

Personal Information

Sebastian Feese
Born on May 9, 1983, in Berlin, Germany
Citizen of Germany

Education

2009–2014  PhD studies (Dr. sc. ETH) in Information Technology and Electrical Engineering, ETH Zurich, Switzerland.
2002–2008  Dipl.-Ing. in Computer Engineering, Technische Universität Berlin, Germany.
2004–2005  Student exchange in Computer Engineering, RMIT University, Melbourne, Australia.
1996–2002  Abitur, Droste-Hülshoff-Oberschule, Berlin, Germany.

Work Experience

2009–2014  Research assistant, Electronics Laboratory, ETH Zurich, Switzerland.
2007–2008  Research student, Technische Universität Berlin, Germany.
2006–2007  Working student, Nokia Siemens Networks GmbH & Co. KG, Berlin, Germany.
