
A Learning-based Control Architecture for Socially Assistive Robots Providing Cognitive Interventions

by

Jeanie Chan

A thesis submitted in conformity with the requirements for the degree of Master of Applied Science

Mechanical and Industrial Engineering University of Toronto

© Copyright by Jeanie Chan 2011


A Learning-based Control Architecture for Socially Assistive

Robots Providing Cognitive Interventions

Jeanie Chan

Master of Applied Science

Mechanical and Industrial Engineering

University of Toronto

2011

Abstract

Due to the world’s rapidly growing elderly population, dementia is becoming increasingly

prevalent. This poses considerable health, social, and economic concerns as it impacts

individuals, families and healthcare systems. Current research has shown that cognitive

interventions may slow the decline of or improve brain functioning in older adults. This research

investigates the use of intelligent socially assistive robots to engage individuals in person-

centered cognitively stimulating activities. Specifically, in this thesis, a novel learning-based

control architecture is developed to enable socially assistive robots to act as social motivators

during an activity. A hierarchical reinforcement learning approach is used in the architecture so

that the robot can learn appropriate assistive behaviours based on activity structure and

personalize an interaction based on the individual’s behaviour and user state. Experiments show

that the control architecture is effective in determining the robot’s optimal assistive behaviours

for a memory game interaction and a meal assistance scenario.


Acknowledgments

I would like to thank my advisor, Professor Goldie Nejat, for her guidance and support of my

research work, and my M.A.Sc. thesis committee for their time and input. I would also like to

thank the following undergraduate students for their valuable contributions to this project: Nelson

Tran, Bijan Shahriari, Jingcong Chen, Greg Jhin, Howard Tseng, John Adler, Sean Feng, Kelly

Payette, Andy Tseung, Clarence Leung, Ray Zhao, Manav Agarwal, Amy Do, and Adib Saad.

Lastly, I would like to thank all of my lab mates, friends, and family for their motivation,

encouragement, and support during the two-year period of my research.


Table of Contents

Acknowledgments
Table of Contents
List of Tables
List of Figures
Chapter 1 Introduction
   1.1 Motivation
      1.1.1 Cognitive Interventions
   1.2 Robots as Assistive Aids for Cognitively Impaired Persons
      1.2.1 Social Robots as Therapeutic Aids in Cognitive Interventions
      1.2.2 Activity Guidance Systems for Cognitively Impaired Persons
      1.2.3 The Socially Assistive Robot Brian 2.0
   1.3 Problem Definition
   1.4 Proposed Methodology and Tasks
      1.4.1 Literature Review
      1.4.2 Design of Control Architecture for Socially Assistive Robots
      1.4.3 Learning-based Decision Making for the Behaviour Deliberation Module
      1.4.4 Implementation
      1.4.5 Conclusion
Chapter 2 Literature Review
   2.1 Development of a Socially Assistive Robot for HRI
      2.1.1 Learning Strategies for Socially Intelligent Robots
      2.1.2 Strategies for Addressing Uncertainty in Social HRI
Chapter 3 Design of Control Architecture for Socially Assistive Robots
   3.1 Proposed HRI Control Architecture
      3.1.1 Methods of Inter-module Communication
      3.1.2 Addressing Uncertainty
   3.2 Memory Game Scenario
      3.2.1 The Card Game of Memory
      3.2.2 Control Architecture for the Memory Game Scenario
   3.3 Meal-time Scenario
      3.3.1 Control Architecture for Meal-assistance Robot
Chapter 4 Learning-based Decision Making for the Behaviour Deliberation Module
   4.1 Behaviour Deliberation Module
   4.2 Model of HRI Scenario
   4.3 MAXQ Hierarchical Reinforcement Learning
      4.3.1 Task Decomposition
      4.3.2 Value Function Decomposition
      4.3.3 MAXQ Learning Algorithm
   4.4 Memory Game Scenario
      4.4.1 Knowledge Clarification Layer
      4.4.2 Intelligence Layer
   4.5 Meal-time Scenario
      4.5.1 Knowledge Clarification Layer
      4.5.2 Intelligence Layer
Chapter 5 Experiments
   5.1 Memory Game Scenario
      5.1.1 Performance Assessment: Control Architecture
      5.1.2 HRI Study: Activity Engagement
      5.1.3 Performance Assessment: Learning-based Decision Making
      5.1.4 HRI Study: Minimizing Task-Induced Stress
   5.2 Meal-time Scenario
      5.2.1 Performance Assessment
      5.2.2 Human-Robot Interaction Studies
Chapter 6 Conclusion
   6.1 Summary of Contributions
      6.1.1 Control Architecture for Socially Assistive Robots
      6.1.2 Learning-based Robot Assistive Behaviours
      6.1.3 Metrics Explored for the Evaluation of HRI
   6.2 Discussion of Future Work
   6.3 Final Concluding Statement
References
Appendix
   A.1 List of My Publications


List of Tables

Table 1: Background Subtraction Method
Table 2: List of Recognized Questions for the Memory Game Scenario
Table 3: Task-based User States
Table 4: Robot Emotional State for Memory Game Scenario
Table 5: Facial Action Units for Angry [84][86]
Table 6: Activity State Parameters for the Meal-time Scenario
Table 7: Robot Emotional State for Meal-assistance Robot
Table 8: Examples of Primitive Robot Actions for Memory Game
Table 9: State Functions for Memory Game Scenario
Table 10: Bi-gram User Simulation Model (Memory Game Scenario)
Table 11: Speech Recognition Rates
Table 12: Card Identity Detection Rates
Table 13: Detection Rates for the Number of Cards Flipped Over
Table 14: Task Termination Conditions for Meal-time Scenario
Table 15: State Functions for Meal-time Scenario
Table 16: Examples of Primitive Actions for Meal-time Scenario
Table 17: Human User Model for Meal-time Scenario
Table 18: Sensor Error Model (Meal-time Scenario)
Table 19: Activity State Identification Results (Memory Game Scenario)
Table 20: Robot Emotion-based Behaviour Selection and Execution Results
Table 21: Engagement Results (Memory Game Scenario)
Table 22: Robot Behaviours Effective at Relieving Stress
Table 23: Performance Results for Activity State Module (Meal-time Scenario): Sensitivity
Table 24: Performance Results for Activity State Module (Meal-time Scenario): Specificity
Table 25: User State Module Performance (Meal-time Scenario): Sensitivity
Table 26: User State Module Performance (Meal-time Scenario): Specificity
Table 27: Performance of Behaviour Deliberation Module (Meal-time Scenario)
Table 28: Construct Definitions [94]
Table 29: Users’ Acceptance Questionnaire [94]
Table 30: Users’ Acceptance Results
Table 31: Robot Behaviours Effective at Engaging the Person in the Meal-time Scenario
Table 32: Most Liked Robot Characteristics


List of Figures

Figure 1: Brian 2.0
Figure 2: Control Architecture for a Socially Assistive Robot
Figure 3: Brian 2.0 in a Memory Game Scenario
Figure 4: Sensory System for the Memory Game Scenario
Figure 5: Keypoint Identification for the Memory Game
Figure 6: Card Cluster Identification: a square is drawn symmetrically around keypoint pij (red dot within the square) and expands in the directions denoted by the arrows.
Figure 7: Card Recovery: The picture card in the game (right) is matched to the database card (left). (a) The picture card in the game is upright, (b) the picture card in the game is rotated, and (c) the picture card in the game is partially obstructed by a person’s fingers.
Figure 8: Division of camera image into a 4x4 grid
Figure 9: Brian 2.0 in a (a) happy state, (b) neutral state, and (c) sad state.
Figure 10: Meal-assistance Robot
Figure 11: Sensory System for the Meal-assistance Robot
Figure 12: Activity Sensing System (Note: load cells are under the side dish and cup)
Figure 13: Meal Tray Sensing Platform Schematic (Courtesy of Amy Do)
Figure 14: Clip-on Device for Utensil Position Sensing
Figure 15: Horizontal face orientations: (a) facing right, (b) facing center, and (c) facing left.
Figure 16: Vertical face orientations: (a) facing down, (b) facing level, and (c) facing up.
Figure 17: Two regions of interest for eyebrow sensing
Figure 18: Detection of the eyebrow and its slope: (a) neutral face and (b) angry expression
Figure 19: Hierarchical task graph for the memory game scenario (primitive robot actions on the bottom row are defined in Table 8)
Figure 20: Comparison of MAXQ and flat Q-learning for the Memory Game Scenario
Figure 21: Task Decomposition for Meal-time Scenario
Figure 22: Flowchart of Human Actions for Meal-time Scenario
Figure 23: MAXQ vs. Flat-Q Comparison for Meal-time Scenario
Figure 24: Experimental Set-up for Performance Assessment (Memory Game Scenario)
Figure 25: (a) Baseline Scenario and (b) Robot Interaction Scenario
Figure 26: Examples of robot behaviour during interactions: (a) Robot providing celebration in a happy emotional state after a correct match, (b) Robot providing help in a neutral state, and (c) Robot providing instruction in a sad state when game disengagement occurs.
Figure 27: Experimental Setup for Evaluation of Learning-based Decision Making
Figure 28: Participant user states detected during the memory game
Figure 29: Interaction details for all participants
Figure 30: Rewards for the Flip Over subtask
Figure 31: Reward for Help subtask
Figure 32: (a) Baseline Scenario and (b) HRI Scenario
Figure 33: The percentage of the interaction that a participant is stressed
Figure 34: Comparison of the percentage of the interaction that a participant is in a stressed or positive state during the HRI scenario
Figure 35: Examples of robot behaviours during interactions: (a) Robot providing help in a neutral emotional state, (b) Robot providing celebration in a happy state after a correct match, and (c) Robot providing instruction in a sad state when game disengagement occurs.
Figure 36: Experimental Setup for Performance Assessment (Meal-time Scenario)
Figure 37: Details of interactions involving the main dish
Figure 38: Details of interactions involving the beverage
Figure 39: Details of interactions involving the side dish
Figure 40: Rewards for the Obtain food from main dish subtask
Figure 41: Rewards for the Pick up beverage subtask
Figure 42: Rewards for the Obtain food from side dish subtask
Figure 43: Rewards for the Eat food subtask
Figure 44: Rewards for the Drink beverage subtask
Figure 45: (a) Baseline Scenario A and (b) Scenario B for HRI Study (Meal-time Scenario)


Chapter 1 Introduction

1.1 Motivation

Cognitive impairment progressively diminishes a person’s memory, orientation, verbal skills,

visuospatial ability, abstract reasoning and attentional skills [1]. Impairment can range from mild

to severe and is associated with many disorders and disabilities that are either present at birth or

acquired later in life, i.e. as a result of an illness or accident. Common causes include brain

injury, autism spectrum disorder, learning disorders, dementia, and substance dependencies. In

particular, for the elderly population, the risk of developing cognitive impairment increases as

one ages and due to the world’s rapidly growing elderly population, dementia is becoming

progressively more prevalent worldwide. In 2010, an estimated 35.6 million people were living with dementia and, by 2050, an estimated 115.4 million people worldwide are predicted to suffer from cognitive impairment related to the condition [2]. The rapid increase

in people suffering from dementia poses considerable medical, social, and economic concerns as

it impacts individuals, families, and healthcare systems. The total worldwide cost of dementia in

2010 was estimated to be $604 billion USD, which includes the direct costs of medical and

social care in addition to the costs of informal care provided by family members [2].

With dementia, the ability to independently initiate and perform daily activities can be

compromised as specific cognitive abilities such as activity planning, problem solving, self-

initiation, attention, and memory can all be severely affected [3]. If a person is incapable of

performing these activities, assistance from other people and/or mechanical devices may be

necessary. The prevalence rate for elderly persons who have difficulties performing basic daily tasks increases significantly with advancing age and is especially high for persons aged 85 and

over [4]. There is no cure for dementia, but there is hope that the use of pharmacological

interventions as well as cognitive, social, and physical interventions may help reduce the decline

of or improve brain functioning in people suffering from this condition.

1.1.1 Cognitive Interventions

Currently, a growing body of research supports the effectiveness of using non-pharmacological

interventions to reduce the decline of or improve brain functioning in people suffering from


dementia. Based on recent human studies, sustained engagement in cognitively stimulating

activities has been found to impact the neural structure in humans [5]-[6].

There is a considerable amount of literature documenting the positive effects of cognitive

training programs on the cognitive functioning of older adults [7]-[12]. These training programs

include the adult development and enrichment project (ADEPT) and the advanced cognitive

training for independent and vital elderly (ACTIVE). ADEPT is designed to improve the

reasoning capability of older adults through training activities which include solving pictorial

reasoning and numerical pattern identification problems. Results from studies using the program

have shown significant improvement in reasoning abilities after five one-hour training sessions [7],[13].

The objective of the ACTIVE program is to design training interventions that can improve the

performance of older adults in cognitive-based daily functioning tasks. Training activities

include mnemonic strategies, and pattern and object identification. The interventions have been shown to have positive effects on memory, reasoning, and speed of processing [12],[14].

Other training programs that currently exist have focused solely on the use of computer-based

interventions, i.e. [15]-[18]. For example, Hofmann et al. [15] utilized a touch screen based

computer program simulating a walk into the center of a simulated town for patients suffering

from mild to moderate Alzheimer's disease. Patients were instructed to "move through"

simulated scenes by touching the correct picture shown on the screen and to complete tasks such

as shopping or answering multiple-choice questions. Tárraga et al. [16] used the interactive

internet-based computer program, Smartbrain, for interventions with people suffering from mild

Alzheimer's disease. The program provides stimulation exercises across the domains of memory,

attention, orientation, recognition, language, calculation and executive functions. Günther et al.

investigated the effects of computer-assisted cognitive training on older adults via the software,

Cognition I [17]. The software includes exercises involving anagrams, reading comprehension,

mental arithmetic, and the reading and setting of an analogue clock. With the aid of computers,

cognitive training programs are easier to implement and sustain; however, they have yet to be conclusively proven effective for the cognitively impaired, and they do not readily accommodate crucial complementary therapies such as social interventions, which are also beneficial in maintaining or improving brain functioning [19]-[22].


During meal-times in a long-term care facility, personal support workers (PSWs) provide one-

on-one assistance and, if necessary, feed the individual in order to provide social and cognitive

stimulation during the interaction. However, this becomes challenging to implement as PSWs

become overwhelmed with providing individual care to so many people during a short meal-

time, in addition to performing other tasks. The consequences of an inadequate number of

knowledgeable and well-trained staff during meal-times may include neglect of the social dimensions of meal-time, residents not receiving the necessary assistance, and residents being fed forcefully [23]. Moreover, if the residents do not consume an adequate amount of food

during meal-times, serious health problems such as malnutrition may arise. Malnutrition is

defined as faulty or inadequate nutritional status, undernourishment characterized by insufficient

dietary intake, poor appetite, muscle wasting, and weight loss [24]. Malnutrition is a serious

problem amongst the elderly living in long-term care facilities as it contributes significantly to

morbidity, decreased quality of life and mortality. Namely, malnourished elderly patients have

longer hospital stays, 2 to 20 times more health complications than healthy older adults, frequent

re-admissions to hospitals, and delayed recovery times [24]. Therefore, it is imperative to

investigate new methods to effectively promote independent eating habits and ways to aid PSWs

in addressing the needs of elderly persons during meal-times.

The aforementioned cognitive interventions show potential in reducing the decline of,

maintaining, or even improving cognitive and global functioning in persons suffering from

cognitive impairment. However, more research is required as these initiatives still have

inadequate ecological validity and unproven outcomes due to the fact that they lack the

experimental evidence needed to assess their effectiveness. Moreover, the implementation and

sustaining of such therapeutic measures on a long-term basis can be very difficult and time-

consuming for already busy healthcare staff as they require considerable resources and people.

Because of fast-growing demographic trends, the available care needed to provide supervision

and coaching for cognitive therapy is already lacking and on a recognized steady decline [25].

Therefore, there exists an urgent need to further investigate the potential use of cognitive training

interventions as a tool to aid the rapidly growing numbers of people suffering from dementia.


1.2 Robots as Assistive Aids for Cognitively Impaired Persons

Recently, due to the fast-growing elderly population in the world, there has been great interest in

the development of social robots [26]-[31] and systems [32]-[35] as aids for cognitively impaired

persons in leisure and daily living activities. The aim of these robots and systems is to aid

healthcare workers in a variety of different scenarios by providing the necessary attention,

cognitive and social stimulation, and guidance to cognitively impaired persons who may not

otherwise receive the necessary care. The following subsections discuss various social robots and

systems utilized for engaging cognitively impaired individuals in cognitively and/or socially

stimulating activities.

1.2.1 Social Robots as Therapeutic Aids in Cognitive Interventions

To date, only a handful of research groups have focused on developing life-like social robots to

engage different individuals in varying socially and/or cognitively stimulating activities [26]-

[31]. For example, the seal-like robot Paro, [26], has been designed to engage elderly persons,

including those with dementia, in animal therapy scenarios by learning which of the robot’s

behaviours (i.e., moving its body parts and making seal sounds) are desired by the way a person

pets, holds, or speaks to it. In [27], the wirelessly controlled robotic dog, AIBO, performs dog-

like actions such as fetching objects and chasing a ball to engage persons with dementia in card

and ball games designed to improve memory, control of emotions and social skills. Bandit II,

[28], engages a person with dementia in a music game by providing assistance and

encouragement via a pre-recorded human voice, social cues like applauding, and other human-

like body movements. The game is designed to improve or maintain cognitive attention. The

robot also adapts its behaviour to a person’s task performance and disability level. KASPAR, a child-sized tele-operated humanoid robot, engages an autistic child in imitation games by

displaying various facial expressions, waving its hand, and drumming on a tambourine [29].

Keepon, a small soft yellow snowman-like robot, is designed to perform emotional and attention

exchange with children suffering from developmental delays/disorders as well as normally

developing children. The robot acquires attention by orienting its face to a person. It expresses

emotions with sounds and by rocking or bobbing its body [30]. Lastly, the IROMEC (Interactive

Robotic Social Mediators as Companions) robot is designed to encourage the development of

communication, motor, cognitive, sensory and social interaction skills for autistic children via


various interactive play scenarios. The robot engages a child in an activity with its graphical user

interface, buttons and wireless switches [31].

1.2.2 Activity Guidance Systems for Cognitively Impaired Persons

Currently, a few automated computer-based activity guidance systems have been developed to

aid elderly persons in various activities of daily living. The aim of these systems is to reduce the

dependence of elderly persons on caregivers by monitoring their actions and providing prompts

to guide the person through the steps of an activity. For example, Hoey et al. have created a real-

time vision-based system to assist a person with dementia with washing his/her hands. Via video

inputs, assistance is given in the form of verbal or visual prompts, or through the enlistment of a

human caregiver's help if necessary [32]. In [33], the Erroneous Plan Recognition (EPR) system

monitors a person with dementia during meal-time and determines if he/she has executed a

correct or erroneous action according to a pre-defined plan. Pressure sensors, near-field Radio

Frequency Identification (RFID) antennas, Pyroelectric InfraRed (PIR) sensors, reed switches,

and accelerometers are deployed in the dining area and the kitchen to detect actions such as the

opening of cupboards and bringing food from a plate to the mouth. If the person performs an

erroneous action, the system provides audio and visual prompts to correct the action. Si et al.

have developed a guidance system for older people with dementia to support them in activities

such as tea making, shaving, and self-hairstyling [34]. Attached pressure sensors or

accelerometers are utilized to sense the usage of the tools relevant to the activity. The system is

designed to first learn the person’s routine (i.e. the order in which he/she uses the tools) and then

provide prompts in the form of blinking red or green LED lights to guide him/her through the

learned routine. In [35], the Pearl robot was designed to assist elderly individuals with mild

cognitive and physical impairments in their daily activities by providing appointment reminders

and information via audio prompts as well as physically guiding them to their appointment. The

robot uses information obtained through navigation (laser range-finder) and interaction sensors

(speech recognition and a touch-screen). The aforementioned systems demonstrate the potential to effectively guide a person through a daily activity via instructive prompts; however, none of these systems has investigated the potential benefits of having a human-like embodied robotic

system.


1.2.3 The Socially Assistive Robot Brian 2.0

One of the main objectives of the Autonomous Systems and Biomechatronics Laboratory

(ASBLab) is to provide insight into the use of innovative robotic technologies to manage mild to

moderate dementia. One particular type of robot being developed is the socially assistive robot, which can provide assistance to individuals through social and cognitive interaction.

The initial work of the ASBLab in this area focused on developing the human-like socially

assistive robot Brian, who provides monitoring, reminders, and companionship to individuals in

social human-robot interaction (HRI) scenarios [36]-[38]. A person’s accessibility level towards

Brian, as determined by his/her body language and the assistive tasks to be accomplished, was

utilized by the Q-learning algorithm to determine the robot’s appropriate assistive behaviour.

Brian’s behaviour was then displayed through both verbal (e.g., speech) and non-verbal (e.g.,

facial expressions, body language) communication means.

Currently, the ASBLab is developing the next generation robot, Brian 2.0 (Figure 1). Brian 2.0 is

being designed as a tool to engage people suffering from dementia in personalized cognitive

interventions in order to reduce their dependence on healthcare workers as well as provide them

with an avenue to interact and socialize during the course of these activities. The significance of

using a human-like social robot lies in the ability to directly incorporate a person’s existing

capabilities to communicate naturally as well as his/her ability to understand these forms of

communication. Namely, the robot will tap into remnants of already existing communication

skills of a person suffering from dementia in order to provide effective guidance as well as

cognitive and social stimuli. Another objective of this research is to study how these robots can

contribute to therapeutic protocols aimed at improving or maintaining residual social, cognitive,

affective, and global functioning in persons with dementia. Finally, the ASBLab aims to

ultimately make cognitive interventions more accessible to residents in long-term care facilities

through the aid of socially assistive robots.


Figure 1: Brian 2.0

1.3 Problem Definition

This thesis focuses on the development of a novel learning-based HRI control architecture for

Brian 2.0. This control architecture will enable the robot to effectively engage an individual in

one-on-one person-centered scenarios and provide task assistance as needed. In particular, the

architecture allows Brian 2.0 to be a social motivator that provides a variety of assistive,

instructive, encouraging, orienting, and celebratory prompts during the course of an activity. A

hierarchical reinforcement learning (HRL) approach is used in the architecture to provide the

robot with the ability to: (i) learn appropriate assistive behaviours based on the structure of the

activity, and (ii) personalize an interaction based on human actions/affect during HRI. The

control architecture will be implemented for a Memory Game Scenario and a Meal-time

Scenario.

1.4 Proposed Methodology and Tasks

The overall design of the learning-based control architecture for Brian 2.0 comprises the following components, with corresponding references to the thesis chapters:

1.4.1 Literature Review

In Chapter 2, a literature review of the following two areas, which are critical to the development

of intelligent socially assistive robots, is presented: (i) social intelligence and (ii) strategies for

addressing uncertainty in social HRI.


1.4.2 Design of Control Architecture for Socially Assistive Robots

In Chapter 3, the overall design of a novel control architecture is presented. The control

architecture will be first discussed in terms of the general functionality of each module within the

architecture, the different types of inter-module communication systems that can be utilized, and

how uncertainty is addressed. The implementation of the control architecture for the Memory

Game Scenario and the Meal-time Scenario will also be presented. Namely, the specific

functionality of each module as it pertains to each HRI scenario will be shown, as well as which

inter-module communication systems are utilized.

1.4.3 Learning-based Decision Making for the Behaviour Deliberation Module

In Chapter 4, the detailed design of the Behaviour Deliberation module is presented. A brief

background of Markov Decision Processes (MDPs) and the MAXQ HRL technique is first

presented. Then, the application of the developed MAXQ HRL technique on the Memory Game

Scenario and the Meal-time Scenario is shown. Namely, for each scenario, the task

decomposition, the state and action definitions, and a two-stage training procedure are proposed.

The 1st training stage determines the appropriate behaviours for the robot based on the structure

of the activity and the 2nd

stage focuses on developing personalized interactions based on human

actions/affect during HRI.

1.4.4 Implementation

In Chapter 5, extensive experiments are presented to evaluate the control architecture and its

learning-based decision making capabilities for the Memory Game Scenario and the Meal-time

Scenario. Two types of experiments were conducted for each scenario. Namely, the 1st set assessed the performance of the key modules within the control architecture. The 2nd set studied the effect of the robot’s behaviours during HRI scenarios. Discussions to illustrate the

effectiveness of the proposed designs are also presented.

1.4.5 Conclusion

Lastly, Chapter 6 presents concluding remarks on the development of the control architecture,

highlighting the main contributions of the thesis and future work.


Chapter 2 Literature Review

2.1 Development of a Socially Assistive Robot for HRI

In order for robots to effectively engage human participants in different types of interactions,

they must possess the necessary intelligence to adapt to human behaviours in each type of

interaction. Furthermore, the robot should be capable of dealing with uncertainty due to

incomplete and inconsistent sensory data and non-deterministic human actions/behaviours. The

1st subsection will discuss the learning strategies that have been utilized by socially intelligent robots to adapt to various social HRI settings. The 2nd subsection will discuss strategies for addressing uncertainty in social HRI.

2.1.1 Learning Strategies for Socially Intelligent Robots

It is envisioned that robots will need to have social intelligence in order to be effectively

integrated into human society. Social intelligence allows a robot to share information with, relate

to, and interact with humans. HRI research involves empowering a robot with the social

functionalities needed to engage human participants in different types of interactions. In order to

be socially intelligent, robots must be able to [39]: (i) perceive and interpret human activity and

behaviour, (ii) respond in a natural, appropriate, and believable manner, (iii) display

understandable social cues such as the expression of emotions, and (iv) operate at human

interaction rates. A number of these characteristics will need to be formulated via the study and

development of social learning capabilities for robots.

In general, learning strategies can be utilized by robots in order to enable them to adapt to social

HRI settings. Recently, a number of socially intelligent robots have been developed that are

capable of learning their behaviours for social HRI scenarios. A common approach has been to

utilize reinforcement learning (RL) strategies to solve HRI control problems that are modeled as

either a Markov decision process (MDP) [36]-[38],[40]-[43] or partially observable Markov

decision process (POMDP) [44], where the latter deals with noise and state uncertainty.
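
As background, the tabular Q-learning update referenced throughout this subsection takes the standard form

$$Q(s,a) \leftarrow Q(s,a) + \alpha \left[ r + \gamma \max_{a'} Q(s',a') - Q(s,a) \right]$$

where $\alpha$ is the learning rate, $\gamma$ is the discount factor, and $r$ is the reward received on transitioning from state $s$ to state $s'$ after taking action $a$.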

In [40], the Leonardo robot used a Q-learning approach to learn to turn on/off a set of buttons

through a turn-taking socially guided interaction. The interaction consisted of verbal instructions


such as “press” and “look”, task feedback such as “good” and “not quite” provided by a person

as well as the person physically demonstrating how to press the buttons. In [41], the HOPE-3

robot used Q-learning in order to learn various social body gestures through imitation. Namely,

Q-learning was used to extract optimal symbolic postures from a human and incorporate

interpolation techniques for generating the same postures on the robot. Prommer et al. [42] used

the Watkins’ Q(λ) method to manage the dialog of an early-stage robot bartender whose main

goal was to identify an object of interest, such as a bottle, plate or cup, by asking a human

customer a series of questions. The STAIR home/office robotic assistant used RL to learn the

optimal dialogue for a speaker identification scenario where the robot attempts to identify the

person with whom it was interacting through a set of questions [43]. Other approaches have

focused on utilizing policy gradient reinforcement learning (PGRL) when there is no obvious

notion of state, i.e., [28],[45]. In particular, PGRL was used in [28] to determine Bandit II’s

behaviour in terms of the amount and speed of its movements, and the type of help it should

provide based on a person’s task performance. In [45], PGRL was used to tailor a mobile robot’s

behaviour to a person’s personality during post-stroke rehabilitation exercises. Namely, learning

was used to determine the mobile robot’s behaviour in terms of interaction distance, robot speed,

and vocal content, based on the person’s introversion-extroversion level.

Hierarchical reinforcement learning (HRL) methods have also been proposed for HRI scenarios. In

the case of HRL, the decision making problem is decomposed into a collection of smaller sub-

problems so that they can be solved more efficiently [46]. This results in faster learning as the

value function requires less data to be learned. For example, in [35], a hierarchical POMDP

approach was implemented in the dialog-based guidance task of the Pearl robot in order for the

robot to perform tasks such as reminding a person of an appointment, navigation and/or

information assistance. The control policy was computed off-line; hence, during task execution,

the controller simply looked up the appropriate robot action to be implemented.
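
To make the decomposition concrete, in the MAXQ framework (adopted later in this thesis, Chapter 4), the value of invoking subtask $a$ within parent task $i$ splits into the value of the subtask itself plus a completion term:

$$Q(i,s,a) = V(a,s) + C(i,s,a)$$

where $V(a,s)$ is the expected cumulative reward of executing subtask $a$ from state $s$, and $C(i,s,a)$ is the expected cumulative reward for completing parent task $i$ after $a$ terminates.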

As an alternative to RL, neural networks (NN) have also been implemented for learning in social

HRI scenarios, i.e., [47],[48]. For example, in [47], an NN was used to teach the receptionist

robot Arisco to convey appropriate communicative behaviours such as facial expressions based

on various input stimuli including clapping, color, movement, speech, IR signals, and a human

face. Breazeal et al. also used an NN to enable Leonardo to imitate facial expressions of a human


[48]. Correspondence between perceived facial features from the human and the robot’s own

facial features was determined through learning.

2.1.2 Strategies for Addressing Uncertainty in Social HRI

For our social HRI application, the robot should be capable of dealing with uncertainty due to

incomplete and inconsistent sensory data and non-deterministic human actions/behaviours.

Uncertainty from sensor data can be addressed at the sensor data processing level

[42],[35],[49],[50], and/or at the decision making level of the robot’s control architecture

[42],[35],[49],[51], whereas uncertainty resulting from non-deterministic human behaviours is

usually addressed at the robot decision making level [42],[44],[35],[49],[51].

The advantage of resolving sensor data uncertainty at the sensory data processing level is that

dedicated sensor-specific algorithms can be developed to directly deal with uncertainties and

noise acquired from raw sensor readings, resulting in a more accurate representation of the state

of the interaction prior to the decision making process. Common practices in addressing

uncertainty include the utilization of data filtering and data fusion techniques. For example, in

[50], the tour-guide robot RoboX deployed at the Swiss National Exhibition fused visitors’ dialogue with the presence of visitors in close proximity to the robot (determined by a laser

scanner) via a Bayesian network framework in order to determine a visitor’s intention (e.g. if the

visitor wants to know more about the exhibit or go see the next exhibit). Namely, the laser

scanner was utilized to reduce the speech recognition errors that arise from noisy environments

consisting of crowds of people and other moving robots.

At the decision-making level, stochastic, non-deterministic decision theory techniques [42],[44],[35],[49],[51] are a popular approach for robot decision-making as they can

take into account the non-deterministic nature of social HRI scenarios. Non-deterministic

approaches allow for multiple paths to be taken from a given starting point. Some paths arrive at

the same outcome and some arrive at different outcomes. Nonetheless, all outcomes are valid

regardless of the choices that are made during execution. Typically, an MDP [42] or POMDP

[44],[35],[49],[51] approach is taken, where the latter approach deals directly with state

uncertainty.


Unlike at the sensory data processing level, an in-depth analysis of uncertainties acquired from

actual sensor readings cannot be performed at the decision-making level as all sensor inputs are

treated the same. However, techniques unique to the decision-making level such as state

prediction and error correction methods can be equally effective when dealing with sensor data

uncertainty. For example, state prediction methods used with a POMDP model process inputs

from the sensory level into a belief state by using observation and transition models in a

Bayesian update step [49]. On the other hand, error correction methods can be implemented by

incorporating state validation questions [42],[35],[51] or by repeating a recognition action [49].

For example, state prediction and error correction methods using clarification questions were

utilized within a POMDP model to resolve speech recognition [35],[51] and self-localization

[51] uncertainty.
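
For reference, the Bayesian belief update step underlying these POMDP approaches takes the standard form

$$b'(s') = \eta \, O(o \mid s', a) \sum_{s} T(s' \mid s, a)\, b(s)$$

where $b$ is the current belief state, $a$ the action taken, $o$ the resulting observation, $O$ the observation model, $T$ the transition model, and $\eta$ a normalizing constant.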

Hybrid approaches also exist, for which uncertainty can be resolved at both the sensor data

processing level and at the decision making level. For example, in [49], Schmidt-Rohr et al. used

a feature filter system at the sensory data processing level to handle the abstraction of multi-

modal perception and sensor uncertainty for a service robot designed to identify if a person

would like a cup and then fetch the cup to place it where the person chooses. Each filter

processed sensory data from one or more sensors in a single sensor group (i.e. object

localization, human speech, and human activity) into a belief state probability distribution. The

output of these filters was then merged via Bayesian forward filtering into a single belief state

that was used at the robot’s decision making level, where simplified state prediction was

performed symbolically within a discrete POMDP model.

Uncertainty resulting from non-deterministic human actions/behaviour can also be addressed

using a probabilistic user model for the training of MDPs and POMDPs. A simple stochastic

approach for user modeling is the n-gram model [52], which predicts the human response based on the last n system actions. The advantage of the n-gram model is that it allows for the multiplicity (i.e. the person can perform multiple actions at the same time) and multi-modality of

user actions to be modeled. For example, in [42], a bi-gram model was used to represent user

actions defined by a person’s speech and pointing gestures in response to the robot bartender’s

actions. Other user models that have been used specifically in human-computer interaction (HCI)

applications include Levin [53], Pietquin [54], and Hidden Markov [55] models.
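
To illustrate, the following is a minimal Python sketch of a bi-gram user simulation model of the kind described above; the action names and probability values are hypothetical placeholders, not the actual models used in [42] or in this thesis.

import random

# Hypothetical bi-gram user model: P(user action | last system action).
# Action names and probabilities are illustrative placeholders only.
BIGRAM_MODEL = {
    "ask_object_type": {"say_bottle": 0.5, "say_cup": 0.3, "no_response": 0.2},
    "confirm_object": {"say_yes": 0.7, "say_no": 0.2, "no_response": 0.1},
}

def sample_user_action(last_system_action):
    """Sample a simulated user action conditioned on the last system action."""
    dist = BIGRAM_MODEL[last_system_action]
    actions, weights = zip(*dist.items())
    return random.choices(actions, weights=weights)[0]

# Example: simulate a user's response to a clarification question.
print(sample_user_action("ask_object_type"))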


Chapter 3 Design of Control Architecture for Socially Assistive Robots

3.1 Proposed HRI Control Architecture

A generic learning-based HRI control architecture is proposed to enable the robot to monitor the

person’s user state and behaviour (i.e. verbal and physical actions) during the activity and adapt

its own emotion-based behaviour to the current interactive scenario. A modular design approach

is applied to the overall control architecture, allowing for the addition and/or substitution of

different sensor modalities as needed based on the intended activity. Figure 2 shows an overview

of the HRI control architecture for a socially assistive robot.

Figure 2: Control Architecture for a Socially Assistive Robot (block diagram showing the Activity State, User State, Speech Recognition & Analysis, Robot Emotional State, and Actuator Control modules, the Behaviour Deliberation module with its Knowledge Clarification and Intelligence layers, and the associated hardware: activity sensors, user state sensors, microphone, motors, and speakers)

Sensory information is acquired for: (i) human verbal action recognition via a microphone, (ii)

user state recognition via user state sensors (e.g. microphone, heart-rate sensor, camera), and (iii)

activity state monitoring using activity sensors (e.g. camera, load cell). The Activity State

module is used to monitor the state of the activity during the interaction. Human verbal intent is

recognized via the Speech Recognition and Analysis module, while user state is determined

using the User State Recognition module. The Robot Emotional State module uses the person’s

user state and the current assistive action of the robot to determine the emotional state of the

robot. The objective of the emotional state module is to determine which robot emotion will

elicit an appropriate response from the human in order to accomplish a given task while also

responding appropriately to a person’s user state. The Behaviour Deliberation module is the main

decision making module of the architecture and is utilized to determine the robot’s effective

assistive behaviour. This module requires inputs from all four of the aforementioned modules.


Robot behaviour is executed by the Actuator Control module. In particular, the module is

responsible for physically implementing the robot’s behaviour using a combination of speech,

facial expressions and gestures via the appropriate actuator hardware (e.g. speakers and motors).

Within this module, a voice synthesizer is utilized to generate the robot’s voice based on the

robot’s emotion. Sensors, actuators, and the specific functionality of each of the aforementioned

modules can be customized to the chosen activity.
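
To make the module structure concrete, the following is a minimal Python sketch of one control-loop iteration; all class names, fields, and the placeholder decision logic are hypothetical illustrations, not the thesis's actual implementation (which runs the modules as parallel processes communicating through pipelines and data-pools, described in the next subsection).

from dataclasses import dataclass

@dataclass
class State:
    activity: dict       # from the Activity State module
    user: str            # from the User State module (e.g. "stressed")
    speech: str          # from the Speech Recognition and Analysis module
    robot_emotion: str   # from the Robot Emotional State module

class BehaviourDeliberation:
    """Main decision-making module: maps the observed state to an
    assistive robot behaviour (placeholder logic only)."""
    def decide(self, state):
        return "encourage" if state.user == "stressed" else "instruct"

class ActuatorControl:
    """Executes a behaviour as speech, facial expressions, and gestures."""
    def execute(self, behaviour, emotion):
        print(f"Robot performs '{behaviour}' in a {emotion} emotional state")

# One control-loop iteration under the sketch's assumptions.
state = State(activity={"cards_flipped": 1}, user="stressed",
              speech="which card?", robot_emotion="neutral")
ActuatorControl().execute(BehaviourDeliberation().decide(state),
                          state.robot_emotion)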

In this work, the proposed generic control architecture is applied to the Memory Game Scenario

and Meal-time Scenario; however, due to its generality, it can be applied to any person-centered

guidance-type activity in HRI scenarios. This control architecture has the potential to provide

socially assistive robots with the necessary intelligence to be effective social motivators for

individuals that need assistance.

3.1.1 Methods of Inter-module Communication

There are two ways that the aforementioned modules communicate with each other in this

control architecture: (i) via a pipeline system or (ii) via a data-pool system. The pipeline system

is utilized for synchronization purposes. Namely, it enables a module to start or pause the

operation of another module by sending commands through the pipeline. Conversely, the data-

pool system is used to store state information, which can be accessed and updated by multiple

modules in the control architecture.

3.1.1.1 Pipeline System

The pipe system consists of unique unidirectional first-in, first-out (FIFO) connections between

the modules in the control architecture via a reserved area of computer memory. Conceptually, a

message can be sent from one module to another through a pipe. Once the message is read, it is

destroyed and no longer present in the pipe. Furthermore, to avoid reading from an empty pipe,

which will cause the module to be suspended by the computer operating system, the receiving

module always checks if there is data present in the pipe before reading from it. The advantage

of the pipe system is that it does not allow memory to be written and read at the same time by

two modules. Moreover, modules that do not need to be continuously running can be paused

until needed.
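As a minimal illustrative sketch (not taken from the thesis implementation), the check-before-read pattern described above could be realized in C++ as follows, assuming POSIX named pipes; the pipe path and command string are hypothetical:

#include <fcntl.h>
#include <poll.h>
#include <unistd.h>
#include <string>

// Returns true and fills msg only if a command is waiting in the pipe, so the
// receiving module is never suspended on an empty pipe.
bool tryReadCommand(int pipe_fd, std::string& msg) {
    pollfd pfd{pipe_fd, POLLIN, 0};
    if (poll(&pfd, 1, 0) <= 0) return false;   // no data present: skip the read
    char buf[256];
    ssize_t n = read(pipe_fd, buf, sizeof(buf));
    if (n <= 0) return false;
    msg.assign(buf, n);                        // the message is consumed (destroyed) by the read
    return true;
}

int main() {
    // One unidirectional FIFO per sender-receiver pair, as described above.
    int fd = open("/tmp/behaviour_to_activity", O_RDONLY | O_NONBLOCK);
    std::string cmd;
    while (true) {
        if (tryReadCommand(fd, cmd) && cmd == "GET_STATE") {
            // ...determine the current state and reply on a second pipe...
        }
        usleep(10000);  // idle until the next poll
    }
}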


3.1.1.2 Data-pool System

The data-pool system consists of a shared pool of data stored in a reserved area of computer

memory that can be accessed and updated by multiple modules in the control architecture.

Sensory input modules (e.g. the Activity State module) update the data-pool whenever they
detect a change in state, and whenever the Behaviour Deliberation module needs to observe the
state, it queries this data-pool. Flags indicate the current status of the data-pool (i.e. in-use or
free). Before querying a data-pool, the module must check this status; otherwise,

if the module queries a data-pool that is in-use, it, and/or the other module using it, may be

suspended by the computer operating system. The advantage of the data-pool system is that

modules can operate in parallel and communicate with each other without running the risk of

being suspended by the operating system.
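A minimal sketch of this flag-guarded data-pool, assuming the modules run as threads within one process (the thesis does not specify the implementation), might look like:

#include <atomic>
#include <string>

struct DataPool {
    std::atomic_flag in_use = ATOMIC_FLAG_INIT;  // status flag: in-use or free
    std::string state;                           // latest state written by a sensor module

    // A module checks the flag before touching the pool, as described above, so
    // that two modules never read and write the memory at the same time.
    bool tryUpdate(const std::string& new_state) {
        if (in_use.test_and_set()) return false; // pool is in-use: try again later
        state = new_state;
        in_use.clear();
        return true;
    }
    bool tryQuery(std::string& out) {
        if (in_use.test_and_set()) return false;
        out = state;
        in_use.clear();
        return true;
    }
};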

3.1.2 Addressing Uncertainty

In the proposed HRI control architecture, a hybrid approach is applied to resolve uncertainty at

both the sensor data processing level and at the decision making level. At the sensor data

processing level, sensor-specific algorithms are utilized to obtain the best possible state

representation prior to the decision making process. These algorithms directly deal with

uncertainties and noise acquired from raw sensor readings, resulting in a more accurate

representation of the state of the interaction. At the decision making level, a knowledge

clarification layer, which utilizes clarification dialogue or multi-modal fusion techniques, is

incorporated to reduce state recognition errors. Furthermore, non-deterministic human

behaviours are accounted for by the MAXQ algorithm. On-line training is also utilized to adapt

to non-deterministic scenarios as well as new users.

3.2 Memory Game Scenario

Studies have shown that individuals with dementia residing in long-term care facilities are at a

higher risk for understimulation because they lack the initiative to begin or sustain leisure

activities [56],[57]. Prolonged lack of stimulation can be harmful to these individuals as it can

increase the boredom, apathy, loneliness, and depression that accompany the progression of

dementia [58],[59]. Therefore, engagement is a critical priority for the mental and emotional


health of individuals suffering from dementia. Studies have shown that social engagement also

plays an important role in the prevention of dementia [19]-[22].

Cognitive leisure activities are types of activities which individuals engage in for their enjoyment

or overall well-being, and may include writing, puzzles, reading, games, playing musical

instruments and participating in social discussions [60]. The goal is to design a robotic social

motivator to provide interventions that focus on strengthening the remaining cognitive abilities

of a person, while promoting engagement in the cognitively stimulating leisure activity at hand.

Brian 2.0 is designed to provide the elderly with
opportunities to socialize and interact during such cognitively stimulating activities. Two criteria

identified in the literature have been used to design the cognitive intervention that Brian 2.0 can

provide to individuals in order to better engage them in an activity of interest. Firstly, the focus is

on matching the stimuli provided by the robot to a person's skill and interest level, which
has been shown to significantly increase a person's engagement and positive affect [61]. Secondly, the

robot is designed to provide one-on-one social stimuli, which has been identified to be one of the

most engaging forms of stimuli [62].

In this work, the card game of memory has been chosen as the cognitively stimulating activity. In

this scenario, the proposed control architecture will enable Brian 2.0 to effectively engage an

individual in a one-on-one, person-centered cognitively stimulating activity, i.e. Figure 3. In

particular, the architecture allows Brian 2.0 to be a social motivator by providing assistance,

encouragement and celebration during the course of an activity.

Figure 3: Brian 2.0 in a Memory Game Scenario


3.2.1 The Card Game of Memory

The memory game consists of 16 picture cards turned face down in a 4x4 grid formation. The

objective is for the human player to flip over pairs of cards and match the pictures on the cards

correctly. Once a pair has been matched, the two cards are removed from the game. The game is

over when all cards have been matched. Individuals play the game as single players while the

robot autonomously provides preferred amounts of social stimulation in order to keep these

individuals engaged in the game. The memory functions within the brain that are trained while

playing this card game include the visual object memory and the updating function of the central

executive component of the working memory [63].

For the memory game, the notion of winning is not as crucial as keeping the person stimulated

and engaged in the activity. In order to do this, herein, the focus is on reducing activity-induced

stress of the person. Activity-induced stress is known to result in negative moods, and lead to

disturbances in motivation (e.g., loss of task interest) and cognition (e.g., worry) [64]. Moreover,

stress has been found to progress the symptoms of dementia after its onset [65].

3.2.2 Control Architecture for the Memory Game Scenario

The proposed HRI control architecture for the Memory Game Scenario is presented in the

following subsections. The HRI control architecture focuses on determining the person’s user

state, his/her task performance, and speech during an interaction with Brian 2.0, and adjusting

the robot’s behaviour to reflect the task to be completed given a particular user state.

3.2.2.1 Sensory System

Figure 4 shows the sensory system for the Memory Game Scenario. Sensory information is

acquired for: (i) recognizing human verbal actions via a Logitech noise-cancelling microphone,

(ii) user state recognition via an emWave ear-clip heart rate sensor or a Logitech noise-cancelling

microphone, and (iii) activity state monitoring using a Logitech 1.3MP webcam. The heart rate

sensor is utilized to determine a person’s affective arousal level during activity engagement.


Figure 4: Sensory System for the Memory Game Scenario

3.2.2.2 Activity State Module

The objective of this module is to identify and analyze the state of the cards (game state) as a

person plays the memory game. 1.3 Megapixel images taken by the webcam are utilized to identify,

locate and track the picture cards during the course of the game. The camera is placed above the

card game and provides a top view perspective of the game set-up. Card recovery errors can arise

during activity state recognition when cards become obstructed. This mainly occurs due to the

temporary presence of human hands. Uncertainty is minimized by capturing and analyzing n
images of the same activity state. A probabilistic voting system is then utilized on the

images to determine the current activity state.
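For illustration only, the voting step could be sketched as follows, where the n per-image classifications are assumed to arrive as strings:

#include <map>
#include <string>
#include <vector>

// The activity state observed most often across the n analyzed images is
// taken as the current activity state.
std::string voteOnActivityState(const std::vector<std::string>& observed_states) {
    std::map<std::string, int> votes;
    for (const auto& s : observed_states) ++votes[s];
    std::string best;
    int best_count = 0;
    for (const auto& [state, count] : votes)
        if (count > best_count) { best = state; best_count = count; }
    return best;  // state with the highest vote across the n images
}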

The game state is defined by five distinct classifications: (i) start of game s(n_f), (ii) zero cards
flipped s(c), (iii) one card flipped s(c, i, l1_{x,y}, lm_{x,y}), (iv) two cards flipped s(c, i, l1_{x,y}, l2_{x,y}, m), and (v)
end of game s(n_m). n_f is defined as the total number of cards flipped in the game and c represents
the number of cards flipped in a single round, where c = 0, 1, 2. i represents the identity of a card
as defined by the picture on it, where i = 1 to 8. The locations of the cards that have been flipped
over are defined by l1_{x,y} and l2_{x,y}, respectively. lm_{x,y} represents the location of the pair card of a card that has
been flipped. This location is only known if that pair card was already flipped over in a previous
round of the game. m represents whether a matched pair has been found, and n_m represents the
total number of matches a person has found during a game.

Two card recognition and localization approaches were developed to identify the aforementioned

states: (i) a SIFT-based method, and (ii) a colour-based method. The 1st method utilizes the


Scale-Invariant Feature Transform (SIFT) technique, developed by Lowe [66], and is more

robust than the 2nd method as card recognition is invariant to image translation, scaling, and

rotation, as well as partially invariant to illumination changes and affine or 3D projection. The

sampling time (i.e. approximately 10-20 seconds) for this method is slower than the colour-based

method (i.e. approximately 3 seconds). However, interactions with elderly individuals with
cognitive impairments will be slower, and so the SIFT-based method will be sufficient. For our HRI

experiments in the lab, we found the colour-based method to be more effective for our

interactions. The SIFT-based method was used for the 1st set of experiments, which are presented
in Sections 5.1.1 and 5.1.2. The colour-based method was developed and used for the 2nd set of
experiments, which are presented in Sections 5.1.3 and 5.1.4.

SIFT-based Method

The overall proposed SIFT-based approach will be discussed herein outlining its most pertinent

steps: Step 1- Identifying keypoints on cards; Step 2- Identifying card clusters; Step 3- Card

recovery; and Step 4- Card matching. By performing the four steps described below, the card(s)

of interest will be successfully located and identified. A 1024x768 resolution camera image is

used in this approach.

Keypoint Identification (Step 1)

In order to identify each of the picture cards as they are flipped over during game playing, a card

recognition and localization approach that utilizes the SIFT technique has been developed to

identify distinctive invariant features on the picture cards captured by the camera. SIFT

transforms an image provided by the camera into a large collection of local feature vectors,

which are called SIFT keypoints. Pairs of picture cards utilized in the memory game have unique

SIFT keypoints, allowing them to be distinguished from each other. Herein, SIFT is utilized to

identify the keypoints on the cards in an image taken by the camera, i.e. Figure 5. The blue dots

in the figure represent the keypoints.


Figure 5: Keypoint Identification for the Memory Game

Once the keypoints are identified, the total number of keypoints found in the image, K, is

compared with a Minimum Keypoint Threshold (Kmin). In general, when there are no cards

flipped over, there is a smaller number of keypoints found in the image. These keypoints are

mainly a result of the shadows projected by the edges of the cards, as can be seen in Figure 5.

Conversely, when there are one or more card(s) flipped over, the number of keypoints found in

an image is considerably large due to the texture provided by the pictures on the card(s). This, in

turn, results in K exceeding the threshold Kmin. When this occurs, Step 2 is implemented in order

to cluster keypoints belonging to the same cards together. Otherwise, the module notifies the

Behaviour Deliberation module that there are no cards flipped over.

Card Cluster Identification (Step 2)

A nearest neighbor search algorithm is proposed that defines regions in the 2D images containing

keypoints that may potentially represent picture cards that have been flipped over. The following

sub-steps outline the search algorithm:

Sub-Step 1: A random keypoint, pij, is chosen on the image.

Sub-Step 2: A square of length l and width w is drawn symmetrically around pij to

search for its nearest neighbour keypoints, Figure 6. The objective is to

find a starting point with a large number of nearest neighbours. If the


number of nearest neighbours initially found is small (less than a defined

threshold nsmin), Sub-Step 1 is repeated. Otherwise, each side of the square

is extended one by one (in a clockwise fashion starting with the top side)

in order to determine all the nearest neighbours of pij, i.e. Figure 6. The

extension stops when the number of keypoints in an extended area is

below a minimum threshold, nemin.

Once a boundary for the cluster of keypoints has been identified, the cluster is said to represent a

card, whose location is defined by the center of the cluster (xc,yc). This cluster is used to

determine the identification of the card. After the first card has been localized, the keypoints

belonging to this card are removed from the keypoint clustering search. At this time the

remaining keypoints in the image are utilized to determine if a subsequent cluster representing

another card exists.

Figure 6: Card Cluster Identification: a square is drawn symmetrically around keypoint pij

(red dot within the square) and expands in the directions denoted by the arrows.
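A simplified sketch of this square-expansion search is given below; the seed selection, the step size, and the exact clockwise extension policy are assumptions for illustration:

#include <vector>

struct Pt { float x, y; };

// Count keypoints inside an axis-aligned box.
int countInBox(const std::vector<Pt>& kps, float x0, float y0, float x1, float y1) {
    int n = 0;
    for (const auto& p : kps)
        if (p.x >= x0 && p.x <= x1 && p.y >= y0 && p.y <= y1) ++n;
    return n;
}

// Grow a square seeded at keypoint 'seed' until an extension adds fewer than
// ne_min new keypoints, mirroring the side-by-side extension of Sub-Step 2.
void growCluster(const std::vector<Pt>& kps, Pt seed, float step,
                 float& x0, float& y0, float& x1, float& y1, int ne_min) {
    x0 = seed.x - step; x1 = seed.x + step;
    y0 = seed.y - step; y1 = seed.y + step;
    bool grew = true;
    while (grew) {
        grew = false;
        // Try each side in turn; keep an extension only if it is dense enough.
        if (countInBox(kps, x0, y1, x1, y1 + step) >= ne_min) { y1 += step; grew = true; } // top
        if (countInBox(kps, x1, y0, x1 + step, y1) >= ne_min) { x1 += step; grew = true; } // right
        if (countInBox(kps, x0, y0 - step, x1, y0) >= ne_min) { y0 -= step; grew = true; } // bottom
        if (countInBox(kps, x0 - step, y0, x0, y1) >= ne_min) { x0 -= step; grew = true; } // left
    }
}

The centre of the final box, (xc, yc), then defines the card location as described above.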

Card Recovery (Step 3)

The card recovery module is utilized to determine the identity of the card(s) that have been

flipped over. A database of the keypoints for each picture card is utilized to identify the cards

that have been flipped over during the game. Namely, the clusters of keypoints representing

cards found in Step 2 are matched to the database of keypoints that has been defined for each

individual picture card, i.e. Figure 7a. Matching utilizes the Best-Bin-First (BBF) method [66].

This can be achieved in terms of matching the descriptors of the keypoints, which can


correspond to finding a set of nearest neighbors (NN) to a query point. Since individual SIFT

keypoints are easily distinguishable, the majority of keypoints can be matched correctly to

identify the card of interest. During game playing, cards may be slightly moved and rotated from

their starting positions or partially obstructed. The robustness of the method enables a card to be

identified in all of these circumstances as long as the card stays within the viewing area of the

camera. Figure 7b shows an example of a rotated card matching correctly and Figure 7c shows

an example of a partially obstructed card matching correctly. Within this step, the identity of the

card is stored along with the location of the card for use in future rounds.

Figure 7: Card Recovery: The picture card in the game (right) is matched to the database
card (left). (a) The picture card in the game is upright, (b) the picture card in the game is
rotated, and (c) the picture card in the game is partially obstructed by a person's fingers.

Card Matching (Step 4)

The card matching technique checks to see if the two cards that are flipped over match by using

the BBF method to match the card clusters found in Step 2. If the keypoints in the two clusters

are correctly matched, then the cards are considered a match, otherwise it is defined to be a no

match condition.

Colour-based Method

In the colour-based method, the cards are differentiated based on the colour of the features on the

card. Card identification based on colour is a quick and accurate process as long as the colour

content of each card is unique and lighting is controlled as much as possible in terms of intensity

and tint. A 640x480 resolution camera image is used, resulting in a fast (i.e. 3 seconds) analysis

time. The method is invariant to image translation, scaling, and rotation and is sufficient for our

experiments in the lab. The overall proposed colour-based approach, which was developed by

another ASBLab member [67], consists of the following steps:


First, the camera image is divided into a 4x4 grid of 16 sections (Figure 8).


Figure 8: Division of camera image into a 4x4 grid

Within each section, the colour content is checked using RGB values. If more than half of the

section is detected to be black, the program assumes that there is no card present and moves on

to analyzing the next section. However, if there is a large non-black entity detected in the

section, the presence of a card is inferred. Black and white areas in the image are then subtracted

by setting their pixel values to 0, leaving only the coloured areas, i.e., Table 1. Lastly, to

determine the identity of the card, the RGB and CMYK (i.e., cyan, magenta, yellow and black)

values of these coloured areas are compared with a database of pre-defined colour ranges for

each card.
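As an illustrative sketch of the grid scan and the card-presence check (OpenCV types are assumed here, and the black-pixel threshold is a placeholder rather than the value used in [67]):

#include <opencv2/opencv.hpp>

// If more than half of a section is near-black, assume no card is present there.
bool cardPresentInSection(const cv::Mat& bgr_section) {
    int black = 0;
    for (int r = 0; r < bgr_section.rows; ++r)
        for (int c = 0; c < bgr_section.cols; ++c) {
            cv::Vec3b px = bgr_section.at<cv::Vec3b>(r, c);
            if (px[0] < 40 && px[1] < 40 && px[2] < 40) ++black;  // near-black pixel
        }
    return black * 2 < bgr_section.rows * bgr_section.cols;
}

void scanGrid(const cv::Mat& image) {  // camera image, assumed 640x480
    int w = image.cols / 4, h = image.rows / 4;
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j) {
            cv::Mat section = image(cv::Rect(j * w, i * h, w, h));
            if (cardPresentInSection(section)) {
                // ...subtract the black and white areas, then compare the
                // remaining colours with the per-card colour-range database...
            }
        }
}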

Table 1: Background Subtraction Method (image pairs showing the camera image and the
processed image, with black and white areas subtracted, for an unflipped card and a flipped card)


3.2.2.3 Speech Recognition and Analysis Module

Human speech is recognized via the Speech Recognition and Analysis module. Recognition is

performed by Julius, a two-pass large vocabulary continuous speech recognition (LVCSR)

decoder [68]. The LVCSR utilizes the person independent VoXForge acoustic model [69], which

is composed of statistical representations, created via Hidden Markov Models, for each phoneme

in the English language to account for persons with different accents and speaking styles. The

acoustic model has been trained using 625 unique voices.

Words are recognized based on their phonemes and their approximate location in an utterance.

The software can also recognize syntax or patterns of words (i.e. sentence structure). When given

a speech input, it searches for the most likely word sequence under constraint of the given

grammar. The sampling period is 625 nanoseconds. At the end of each sample, the program

outputs the words to the speech analyzer. The speech analyzer compares corresponding synsets

to its own database of words which are grouped into nouns and other lexical categories to

identify a match.

The reliability of the spoken utterance is determined using word confidence scores which are

based on a combination of predictor features (e.g., acoustic and language model scores). If the

weighted average of all the confidence scores of an utterance is low, this information is sent to

the Knowledge Clarification layer in the Behaviour Deliberation module in order to resolve the

uncertainty via the robot asking clarification questions.
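The utterance-level test could be sketched as follows; the per-word weights and the low-confidence threshold are illustrative assumptions:

#include <vector>

// Returns true when the weighted average of the word confidence scores of an
// utterance falls below the threshold, triggering a clarification question.
bool needsClarification(const std::vector<double>& word_scores,
                        const std::vector<double>& weights, double threshold) {
    double num = 0, den = 0;
    for (size_t i = 0; i < word_scores.size(); ++i) {
        num += weights[i] * word_scores[i];
        den += weights[i];
    }
    return (num / den) < threshold;  // low average: ask a clarification question
}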

For the Memory Game Scenario, the LVCSR software has been customized to support the

vocabulary, dialog and action-based context needed during game playing. In particular, the

vocabulary and grammar definitions have been configured with the syntactic constraints of a

“response” or “question” posed to Brian 2.0. Examples of “responses” include “Yes” and “No”.

Table 2 shows a list of “questions” that Brian 2.0 can recognize during the HRI scenario.

They are categorized into three types of phrases: (i) Localize, (ii) Identify, and (iii) Recall.


Table 2: List of Recognized Questions for the Memory Game Scenario

Localize:
  Where is the card located?
  Where is this card located?
  Where's the card located?
  Where is the location of the card?

Identify:
  What is this card?
  What is the card?
  What is on the card?

Recall:
  Have I seen this card before?
  Have I seen the card before?
  Have I seen this card?
  Have I seen the card?
  Have I seen it already?

3.2.2.4 User State Recognition Module

The User State recognition module is used to determine a person’s user state during game

playing. Two user state detection approaches were developed for the Memory Game Scenario: (i)

verbal intonation and (ii) a combination of affective arousal and activity performance. The 1st

approach was used for the 1st set of performance assessment experiments and HRI studies

(Sections 5.1.1 and 5.1.2). Experimental results showed that this approach provided limited

opportunities in detecting user state during the course of the activity. Namely, a change in user

state (if present) could only be detected when the person spoke, which may not be often during

the memory game. Therefore, a 2nd approach was developed to allow for a more continuous
monitoring of user state during HRI. The 2nd approach was used for the 2nd set of performance
assessment experiments and HRI studies (Sections 5.1.3 and 5.1.4).

Verbal Intonation

Verbal intonation can be used to determine user state during HRI. In particular, this approach

utilizes Layered-Voice Analysis (LVA) software developed by Nemesysco Ltd. [70]. LVA uses

wide range spectrum analysis to detect anomalies in brain activity that are revealed by minute

changes in speech waveform. In this work, the software is used to determine the following

affective states during game playing: stressed, bored, neutral or positively excited.


The “affect signature” is analyzed for each sentence uttered by the person. Numerical values for

the parameters stress, bored and positively excited are determined by the software for each

utterance. These values are compared with threshold values for each parameter to determine the

dominant affective state of a person. If none of the parameters exceed their threshold, then the

person is determined to be in a neutral state. The thresholds for the parameters are set based on

the input from 100 different experimental trials.

Affective Arousal and Activity Performance

In this approach, user state is determined using a combination of affective arousal and activity

performance. Affective arousal is the intensity with which emotional stimuli are perceived [71].

Herein, a person’s arousal level is based on his/her heart rate value. Heart rate has a long history

of being used as an index of arousal [72]. Heart rate data is gathered from the user during

interaction at a sampling rate of 2Hz. A smoothing algorithm is employed to eliminate outliers

due to sensory noise. Every data point is compared with a threshold of ±4bpm to an average of

four data points before it and an average of four data points after it. The baseline heart rate,

which is an average of 10 valid data points, is acquired before the start of the activity.

Subsequent valid heart rate readings are compared to this baseline, with a threshold of 5bpm, to

determine if the person is in a high or low affective arousal state. Activity performance is

determined by whether or not matching card pairs were found in the previous round of the

memory game by the Activity State module, Table 3. Table 3 has been developed through the

monitoring of numerous user experiments and the acquiring of participant feedback. In these

experiments, the heart-rate sensor was able to detect increased heart rate when a person was

faced with both a stressful and exciting situation in an activity. In the context of the memory

game, stress was directly related to the scenario when a matching card pair could not be found

and excitement was directly related to matching a pair of cards.

Table 3: Task-based User States

                         Activity Performance
                         No Match               Match
Arousal    High          State = 0 (Stressed)   State = 3 (Excited)
           Low           State = 1 (Neutral)    State = 2 (Pleased)
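A minimal sketch of the outlier filter and the arousal test, using the ±4 bpm and 5 bpm thresholds quoted above (window handling is simplified for illustration):

#include <cmath>
#include <vector>

// A reading is kept only if it lies within +/-4 bpm of the averages of the
// four data points before it and the four data points after it.
bool isValid(const std::vector<double>& hr, size_t i) {
    if (i < 4 || i + 4 >= hr.size()) return false;
    double before = (hr[i-1] + hr[i-2] + hr[i-3] + hr[i-4]) / 4.0;
    double after  = (hr[i+1] + hr[i+2] + hr[i+3] + hr[i+4]) / 4.0;
    return std::fabs(hr[i] - before) <= 4.0 && std::fabs(hr[i] - after) <= 4.0;
}

// High arousal if a valid reading exceeds the baseline by the 5 bpm threshold.
bool highArousal(double reading, double baseline) {
    return reading > baseline + 5.0;
}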


3.2.2.5 Robot Emotional State Module

The Robot Emotional State module uses the person’s user state and the current assistive action of

the robot to determine the emotional state of the robot. A finite-state machine approach is used to

match the appropriate robot emotion to a given user state and the robot’s assistive action within

the context of the cognitively stimulating activity. For the memory game, the robot emotional

states are: happy, neutral and sad (Figure 9).

Figure 9: Brian 2.0 in a (a) happy state, (b) neutral state, and (c) sad state.

A summary of the robot's emotional state for various scenarios is shown in Table 4. When the

person finds a matching card pair and is in an excited state, the robot celebrates with him/her by

being in a happy state. The robot is sad when it has to repeat an instruction after a long period of

waiting. In general, in all cases when the user is stressed, regardless of the robot action to be

implemented, the robot will try to improve the user state of a person by being in a happy state.

For all other cases not mentioned here, the robot’s emotional state is neutral.

Table 4: Robot Emotional State for Memory Game Scenario

Human User     Current Robot Assistive Action
State          Instruct    Celebrate    Encourage    Help
Stressed       Happy       Happy        Happy        Happy
Neutral        Neutral     Neutral      Neutral      Neutral
Pleased        Neutral     Neutral      Neutral      Neutral
Excited        Neutral     Happy        Neutral      Neutral
Distracted     Sad         Neutral      Neutral      Neutral
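Since Table 4 is a complete state-action mapping, the finite-state machine reduces to a table lookup; a sketch with illustrative enumeration names:

// Encoding of Table 4 as a lookup (names are illustrative, not from the thesis code).
enum UserState { STRESSED, NEUTRAL_U, PLEASED, EXCITED, DISTRACTED };
enum Action    { INSTRUCT, CELEBRATE, ENCOURAGE, HELP };
enum Emotion   { HAPPY, NEUTRAL, SAD };

Emotion robotEmotion(UserState u, Action a) {
    static const Emotion table[5][4] = {
        /* Stressed   */ {HAPPY,   HAPPY,   HAPPY,   HAPPY},
        /* Neutral    */ {NEUTRAL, NEUTRAL, NEUTRAL, NEUTRAL},
        /* Pleased    */ {NEUTRAL, NEUTRAL, NEUTRAL, NEUTRAL},
        /* Excited    */ {NEUTRAL, HAPPY,   NEUTRAL, NEUTRAL},
        /* Distracted */ {SAD,     NEUTRAL, NEUTRAL, NEUTRAL},
    };
    return table[u][a];
}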


3.2.2.6 Behaviour Deliberation Module

The Behaviour Deliberation module is the main decision making module within the HRI control
architecture, and its design is one of the main tasks of this thesis. This module requires inputs from all four of

the aforementioned modules in order to determine the robot’s effective assistive behaviour via a

MAXQ hierarchical reinforcement learning approach [46]. The detailed design of the Behaviour

Deliberation module will be discussed as it pertains to the robot engaging a person in the card

game of memory in Section 4.4.

3.2.2.7 Inter-module Communication System

Communication between the Activity State module and the Behaviour Deliberation module is

facilitated by the use of a pipe system. A pipeline system is used here in order to minimize the

computational load of Activity State module. In this system, there are two pipes. The Behaviour

Deliberation module requests information from the Activity State module by sending a request

for the current game state via the first pipe. Once the Activity State module receives the request,

it performs the necessary tasks to determine the current game state as previously discussed in

Section 3.2.2.2. The response is then sent to the Behaviour Deliberation module via the second

pipe.

All other inter-module communication is facilitated via the use of the data-pool system. A

designated data-pool is set up for each of the remaining modules. These modules update to their

respective data-pools whenever they detect a change in state and the Behaviour Deliberation

module queries these data-pools whenever it needs to observe the state. Parallel operation of the

modules is important in this application because speech and user state sensing must be performed

continuously and simultaneously. For example, the Speech Recognition and Analysis module

must sense quick actions (e.g., a person answering “Yes” in response to a question), which may

be missed if the module only monitors the activity when requested by the Behaviour Deliberation

module. The User State module monitors heart-rate continuously in order to filter out sensor

noise.

3.3 Meal-time Scenario

Recently, there have been numerous studies [73]-[77] that investigate approaches to prevent

malnutrition by improving eating habits among the elderly. In general, social interaction during


meal-time has been found to play an important role at improving dietary intake for the elderly

[73]-[76]. Results from a qualitative study of meal-time in a long-term care facility suggest that

caregivers need to socially interact and encourage meaningful activities for the older person in

order to improve the quality of the meal-time experience and his/her nutritional intake [74].

Addressing the resident was found to be an important aspect of the social interaction. For

example, personal greetings, invitations to sit down, and comments about the appearance of the food
were shown to be effective at engaging the person in the social meal-time event. Nonverbal

exchanges such as smiles and laughter, and occasionally exchanging a few sentences on topics

unrelated to food were also effective at increasing engagement in the activity of eating.

Slowness in the activity of eating can hinder meal-time independence for cognitively impaired

elderly persons. Treatable causes of slowness include drug induced lethargy, easy distractibility,

prolonged chewing related to dry mouth, getting "lost" in repetitive behaviours such as chewing

and forgetting to swallow, and unappetizing food [75]. As an intervention to slowness, Osborn et

al. suggest providing frequent orienting information, verbal reminders, prompts, praise and

encouragement [75]. Hellen [76] suggests that when feeding elderly persons suffering from

dementia, it is important to present them with one food item at a time because they may not be

able to focus on an entire tray of food. Furthermore, they may need to be reminded to eat, chew,

or swallow in order to keep their attention on the meal and prevent choking. Findings from a

study by Coyne et al. confirm that directed verbal prompting and positive reinforcement can

increase the eating performance of elderly persons suffering from dementia [77].

The goal, herein, is to design a robot that can assist personal support workers during meal-times

by acting as a social motivator that promotes independent eating habits during the meal. The

robot’s task is to engage a person in the meal eating activity, while adding a social element to the

meal-time experience. The intelligence of Brian 2.0 is designed in this work to allow the robot

to motivate the person to consume the contents of the meal by providing meal-related cues,

reminders, encouragement, praise and orientating statements via verbal/non-verbal

communication means. Herein, the contents of the meal consist of a main dish, side dish, and

beverage, i.e. Figure 10. The robot will focus the person’s attention to one dish at a time by

referring to a specific dish until that dish is fully consumed. The order in which the dishes are

referred to is based on a meal plan. The meal plan that is implemented is to consume the meal in

this order: main dish, drink, and then side dish. To add a social element to the meal, Brian 2.0


will also greet the person by name, invite him/her to sit down, and provide various meal-related

or non-meal-related jokes and comments. Since the robot may potentially interact with elderly

persons with different interaction preferences and/or varying degrees of cognitive impairment, it

also has the ability to personalize its actions based on the person’s task compliance.

Figure 10: Meal-assistance Robot

The Meal-time Scenario was chosen to be the 2nd HRI scenario for the following reasons:

(i) eating is an important daily activity for elderly persons suffering from dementia as it is

directly related to their health and quality of life, and (ii) to test the design of the learning-based

control architecture on a less structured, preference-driven, human-robot interaction.

3.3.1 Control Architecture for Meal-assistance Robot

The proposed HRI control architecture for the Meal-time Scenario is described in the following

subsections. The HRI control architecture focuses on determining the person’s user state, his/her

task performance, and speech during an interaction with Brian 2.0, and adjusting the behaviour

of the robot to reflect the task to be completed given a particular user state.


3.3.1.1 Sensory System

Figure 11 shows the sensory system for the Meal-assistance robot. Sensory information is

acquired for: (i) recognizing human verbal actions via a Logitech noise-cancelling microphone

(i.e. on the table), (ii) user state recognition via a front-facing 10MP Creative webcam (i.e. on the

robot’s left shoulder) to capture the person’s facial expressions, and (iii) activity state monitoring

using an embedded activity sensing system (i.e. on the table).

Figure 11: Sensory System for the Meal-assistance Robot

The activity sensing system (Figure 12) consists of a meal sensing tray platform and a computer

vision-based utensil tracking system. This system is designed to be easily implemented into

meal-time routines at dining halls in any long-term care facility. Non-contact sensors are utilized

within the system to minimize the disturbance of the sensors to the users and their natural eating

habits. Furthermore, the system is designed to accommodate any type of dishes, cups and

utensils used in the facilities so that it does not require extra preparation from caregivers prior to

meal time.


Figure 12: Activity Sensing System (Note: load cells are under the side dish and cup)

Figure 13 shows the schematic of the meal tray sensing platform, which was developed by

another ASBLab member [78]. The platform consists of the following embedded sensors: (i) a

DYMO M10 Postal Scale (i.e. load cells) with a capacity of 10 lb for monitoring the main plate,

and (ii) two pairs of Phidgets shear micro load cells with a 0.78 kg capacity interfaced with a
4-input Phidgets bridge for the side dish and beverage. The resolution of weight outputs for the

postal scale and the micro load cells is 2g and 1g, respectively. The weight outputs are fed into

the Activity State module of the control architecture. Weight readings from the pair of micro

load cells for the side dish and beverage are averaged. The meal components are physically

confined to rest on a certain area of the tray via tapered supports in order to achieve optimal and

repeatable contact between the dish/cup and the sensors, resulting in an accurate weight sensing

of the food and drink.


Figure 13: Meal Tray Sensing Platform Schematic (Courtesy of Amy Do)

The utensil tracking system, which was developed by another ASBLab member [78] consists of:

(i) an IR camera from a Nintendo Wii remote and (ii) two 940nm infrared lights with a 120°

viewing angle. The Wii remote is mounted on the right shoulder of Brian 2.0, i.e. Figure 11.

Communication between the computer and the Wii remote is performed via an ASUS Bluetooth

dongle. The IR camera has a resolution of 1024x768 and can sense up to four infrared lights.

Utilizing the Wiiuse C library [79], the location of the infrared light within the camera image can

be obtained. The infrared lights are mounted on a small circuit board, which is attached to the

utensil via a clip-on attachment, i.e. Figure 14. One light is placed on each side of the circuit

board in order to accommodate left or right handed users.

Figure 14: Clip-on Device for Utensil Position Sensing (front, back, and top views, with the
ON/OFF switch, battery, and the two infrared lights labelled)


3.3.1.2 User State Recognition Module

For the Meal-time Scenario, the following user states are detected: happy, neutral, angry, and

distracted. Herein, user state is determined using a combination of facial orientation and

expression detection techniques. The resolution of the video image utilized for analysis is

640x480 pixels and images are acquired at 30fps. Both techniques require the positional data and

dimensions of the person’s face and its features within the video image. The program used to

detect a person’s face and its features was developed by another ASBLab member [80]. The

presence of a face and its features is detected utilizing Haar feature-based cascade classifiers

[81]. Once the face and its features are located within the image, a tracking algorithm is then

utilized to monitor their positions during the course of the interaction. If the tracking algorithm

loses the face and its features due to them exiting the view of the camera, tracking is disabled

and detection is performed again. The following sections demonstrate how this data is used to

determine the user state.

User State: Happy

An open-source smile detection program [82], which is based on facial expression detection

algorithms developed by Bartlett et al. [83], is used to detect that the person is smiling. The video

image is further downsized to a resolution of 320x240 pixels for this program. The program first

scans video image to detect approximately upright-frontal faces. The faces found are then scaled

into image patches of equal size, convolved with Gabor energy filters, and then passed to a facial

expressions recognition engine. The engine utilizes an Adaptive Boosting learning algorithm to

select a subset of Gabor filters and then trains Support Vector Machines (SVMs) on the outputs

of the Gabor filters. Facial expressions are recognized based on the Facial Action Coding System

(FACS) [84].

User State: Distracted

The distracted user state is defined to be when a person’s face is not oriented towards the robot

or the meal. The person is seated in front of the robot at approximately eye level and the meal is

situated on the table directly in front of him/her. Therefore, the user is distracted when he/she is

looking left, right, or up, with respect to looking straight ahead at the robot. Facial orientation is

detected by monitoring the distances between the following facial features: eyes, nose, and

mouth. For a sampling period of 0.5 seconds, the face orientation is monitored. A voting system


is used to determine the person’s detected face orientation for that sampling period. Namely, the

orientation that occurs the most during the sampling period is the detected orientation. The

algorithms utilized to detect if the person is distracted were developed by another ASBLab

member [80].

Horizontal face orientation is found by comparing the horizontal distance between the right eye
and the nose (x_{RE,N}) to the horizontal distance between the left eye and the nose (x_{LE,N}).
First, the difference of these two distances (\Delta x) is determined:

\Delta x = x_{RE,N} - x_{LE,N}                                        (1)

\Delta x is then compared to the threshold x_t to determine the horizontal face orientation of the
person for a single video image:

Horizontal Face Orientation = Center if |\Delta x| \leq x_t; Left if \Delta x > x_t; Right if \Delta x < -x_t    (2)

Vertical face orientation is found by comparing the average vertical distance between the eyes
and the nose (y_{E,N}) to the vertical distance between the nose and mouth (y_{N,M}). Since both
distances are not similar, an offset value y_{offset} is empirically determined through calibration for
each user. First, the difference of these two distances plus the offset (\Delta y) is determined:

\Delta y = y_{E,N} - y_{N,M} + y_{offset}                             (3)

\Delta y is then compared to the threshold y_t to determine the vertical face orientation of the
person for a single video image:

Vertical Face Orientation = Level if |\Delta y| \leq y_t; Up if \Delta y > y_t; Down if \Delta y < -y_t    (4)
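A sketch of Eqs. (1)-(4) in code form is given below; the thresholds, the per-user offset, and the sign-to-direction conventions are assumptions for illustration:

// Orientation tests of Eqs. (1)-(4); x_t, y_t, and offset are calibration values.
enum HOrient { CENTER, LEFT, RIGHT };
enum VOrient { LEVEL, UP, DOWN };

HOrient horizontal(double xRE_N, double xLE_N, double x_t) {
    double dx = xRE_N - xLE_N;          // Eq. (1)
    if (dx >  x_t) return LEFT;         // mapping of sign to side assumed
    if (dx < -x_t) return RIGHT;
    return CENTER;                      // Eq. (2)
}

VOrient vertical(double yE_N, double yN_M, double offset, double y_t) {
    double dy = yE_N - yN_M + offset;   // Eq. (3)
    if (dy >  y_t) return UP;           // sign convention assumed
    if (dy < -y_t) return DOWN;
    return LEVEL;                       // Eq. (4)
}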

Figure 15 and Figure 16 show the different horizontal and vertical face orientations that can be

detected. For the case when the person’s face turns 45 degrees or more away from the robot and

the frontal face is lost, the person’s profile face will be detected. Since some facial features are

occluded and cannot be detected on the person’s profile face, the aforementioned face orientation


detection algorithm cannot be applied. Therefore, the orientation of the profile face will be

assumed to be the same as the last detected horizontal face orientation.

Figure 15: Horizontal face orientation: (a) facing right, (b) facing center, and (c) facing left.

Figure 16: Vertical face orientations: (a) facing down, (b) facing level, and (c) facing up.

User State: Anger

Anger detection is performed by sensing the key emotion-related facial actions as defined in the

Facial Action Coding System Affect Interpretation Database (FACSAID) [85]. Table 5 shows

the facial action units related to the emotion anger. Based on these facial action units, the

proposed detection algorithm utilizes two indicators to define anger: (i) the decrease of space

between the eyebrow and the eye and/or (ii) the increased slope of the eyebrow relative to the

center of the face (i.e. the eyebrows make a “V” shape). These indicators can be detected by

tracking two points on each eyebrow and the location of each eye.

Table 5: Facial Action Units for Anger [84][85]

Emotion    Action Unit         Description
Anger      Brow Lowerer        Lowers the eyebrow
           Upper Lid Raiser    Widens the eye aperture
           Lid Tightener       Narrows the eye aperture


Two regions of interest are created to find two points on each eyebrow, i.e. Figure 17. The size

and location of the region is relative to the size and location of the detected eye. The 1st region

(i.e., red rectangle) encompasses the inner portion of the eyebrow and the 2nd region (i.e., green
rectangle) encompasses the outer portion of the eyebrow.

Figure 17: Two regions of interest for eyebrow sensing

Each region is first converted into a binary image in order to highlight the contrast between the

darker eyebrow and the lighter skin. For persons with light eyebrows and darker skin, the image

is inverted before being converted into a binary image. To locate the vertical position of the

eyebrow within the region, the algorithm determines the percentage of white pixels in each row

of pixels in the region. Beginning at the top of the region, the first row of pixels with a high

percentage (i.e., 40% or higher) of white pixels is defined as the vertical location of the eyebrow.
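The row scan could be sketched as follows (OpenCV is assumed as the imaging library; 40% is the threshold quoted above):

#include <opencv2/opencv.hpp>

// Returns the first row (from the top) whose share of white pixels is >= 40%,
// taken as the vertical location of the eyebrow, or -1 if no such row exists.
int eyebrowRow(const cv::Mat& binary_region) {
    for (int r = 0; r < binary_region.rows; ++r) {
        int white = cv::countNonZero(binary_region.row(r));
        if (white * 100 >= 40 * binary_region.cols) return r;
    }
    return -1;
}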

Figure 18 shows the program detecting a person’s eyebrows when the person has a neutral

(Figure 18a) and an angry expression (Figure 18b). It can be seen from Figure 18b that the inner

eyebrow (i.e. red circles) moves closer to the eye and the slope of the eyebrows changes when

the person is displaying an angry expression.

Figure 18: Detection of the eyebrow and its slope: (a) neutral face and (b) angry expression


3.3.1.3 Activity State Module

The objective of this module is to monitor: (i) the consumption of the meal and (ii) the state of

the utensil. Namely, with respect to meal consumption, the module keeps track of the amount of

food or drink remaining and if there is a change in weight detected in each dish or cup. The

module also detects if the cup has been taken from the tray. The state of the utensil is detected in

terms of its current location and the direction in which it is moving.

An overall activity state a is defined by two distinct classifications: (i) the person is idle or

obtaining food/beverage s(mw,sw,dt,ul,um) and (ii) the person has consumed the food item

s(ul,db). Table 6 shows a summary of all the detected activity state parameters and their

respective detected levels.

Table 6: Activity State Parameters for the Meal-time Scenario

Activity State Parameter          Levels
Main plate weight level, m        0: 76 - 100% full
Side dish weight level, s         1: 51 - 75% full
Drink weight level, d             2: 26 - 50% full
                                  3: 6 - 25% full
                                  4: 0 - 5% full
Main plate weight change, mw      0: No weight decrease
Side dish weight change, sw       1: Weight decrease
Drink weight change, db
Presence of drink, dt             0: Drink is on meal tray
                                  1: Drink has been picked up
Utensil location, ul              0: At tray
                                  1: At mouth of person
Utensil movement, um              0: Not moving
                                  1: Moving towards mouth
                                  2: Moving towards tray

Meal Consumption

The calibration procedure for the meal sensing tray and the algorithms utilized to track meal

consumption has been developed by another ASBLab member [78]. The meal sensing tray is

designed to accommodate a variety of standard dishware that is used in long-term care facilities.

If new dishware is introduced, the program needs to be calibrated with the weight of an empty

dish/cup. This value can be obtained by placing the object on the scale for the main dish and

running a calibration program, which will ask the user if the object is the main dish, side dish, or


cup. This information is then stored in a text file, which can be read by the Activity State

module. If the calibration program is not run, the robot will assume that the same dishware is

used.

In the beginning of the interaction, the initial weight of each meal item is obtained by the weight

sensors as the user is greeted by Brian 2.0. Based on this initial reading, five consumption levels

are created for each dish or cup, which are defined in Table 6 for the main dish, drink and side

dish. During the interaction, the weight sensory data from the meal tray sensing platform is

utilized to determine the user’s current consumption level of each meal item. Namely, the

module checks to see in which consumption range the current weight reading of the food item

falls. In addition, the weight readings are used to determine if the user has obtained some food

from either the main plate or side dish by searching for a small food weight decrease of at least

10g. For the drink, a small weight change of 10g would indicate that the user has taken a sip from

the cup. Specifically for the drink, a weight reading of zero indicates that the user has picked up

the cup.
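For illustration, mapping a weight reading to the five consumption levels of Table 6 could look like the following, assuming the empty-dish tare and the initial food weight are known:

// Maps a current weight reading to the consumption levels of Table 6; the
// tare and initial weights are obtained at calibration and at the greeting.
int consumptionLevel(double current_w, double empty_w, double initial_w) {
    double frac = (current_w - empty_w) / (initial_w - empty_w);  // fraction of food left
    if (frac > 0.75) return 0;  // 76 - 100% full
    if (frac > 0.50) return 1;  // 51 - 75% full
    if (frac > 0.25) return 2;  // 26 - 50% full
    if (frac > 0.05) return 3;  // 6 - 25% full
    return 4;                   // 0 - 5% full
}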

The weight readings from the load cells are subjected to data acquisition delays, sensor noise and

errors caused by the user exerting pressure onto the sensor with his/her utensil. For example,

pushing on the dishware with one’s utensil is a common occurrence when obtaining food as

contact is made when one scrapes, cuts or mixes one’s food. To minimize the effect of these

aforementioned errors and to obtain accurate weight readings, sensor signals are passed through

a median filtering algorithm. This method was chosen because it was found that filters that

manipulate raw sensor data using averaging techniques such as low-pass filters were

unsuitable for the application of the meal tray. Averaging techniques would erroneously include

increased weight values caused by utensil pressure in the calculation of meal weight and weight

change. Through testing, it was found that applying the median filter on twenty-one sensor

readings per cycle was effective at minimizing the effects of utensil pressure. With this filtering

process, it takes 2 seconds to achieve an accurate weight reading.
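A minimal sketch of the per-cycle median filter over the twenty-one raw readings:

#include <algorithm>
#include <vector>

// Returns the median of the readings gathered in one cycle (21 samples here);
// the middle value rejects utensil-pressure spikes that averaging would keep.
double medianWeight(std::vector<double> readings) {
    std::nth_element(readings.begin(),
                     readings.begin() + readings.size() / 2, readings.end());
    return readings[readings.size() / 2];
}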

State of the Utensil

The algorithms utilized to obtain utensil location have been developed by another ASBLab

member [78]. The state of the utensil is obtained by analyzing the current location of the utensil

relative to the person’s face, which is obtained by the User State module (Section 3.3.1.2), and


the utensil’s previous location. For example, if the detected location of the utensil is close to the

detected location of the face, then the utensil is identified as being at the mouth. Conversely, if

the utensil is detected to be far away and below the face, then it is identified as being at the tray.

In order to detect if the utensil is moving and in which direction, the current location of the

utensil is compared to its last five previous locations. From that, the utensil can be identified as

moving up or down or not at all. During HRI, the utensil location is tracked through the images

provided by the camera.
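A simplified sketch of the movement test against the five previous locations; the image-axis convention and the motion threshold are assumptions for illustration:

#include <deque>

// Compares the current utensil height with the average of its last five
// locations; return values follow the um codes of Table 6.
int utensilMovement(std::deque<double>& history, double current_y, double eps) {
    double avg = 0;
    for (double y : history) avg += y;
    avg /= history.size();
    history.pop_front();
    history.push_back(current_y);           // keep only the five most recent samples
    if (current_y < avg - eps) return 1;    // moving towards the mouth (up in the image)
    if (current_y > avg + eps) return 2;    // moving towards the tray (down)
    return 0;                               // not moving
}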

In order to accurately compare the location of the face and the utensil, visual data from the

640x480 pixel resolution webcam and the 1024x768 pixel resolution IR camera must be related

to physical distances relative to a reference point. First a common reference point (i.e. a point in

the bottom left corner of both images) is chosen to be the [0,0] coordinate. From that reference

point, an object is moved a certain distance and the pixel location of the object is observed in

both types of images. This action is performed ten times in order to find two linear relationships:

(i) between actual distances and pixel location in the webcam image, and (ii) between actual

distances and pixel location within the IR camera image.
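For illustration, each of the two pixel-to-distance relationships could be obtained with a least-squares line fit over the ten calibration samples:

#include <vector>

// Least-squares fit of distance = a * pixel + b over the calibration samples,
// computed once per camera (webcam and IR camera).
void fitLinear(const std::vector<double>& pixel, const std::vector<double>& dist,
               double& a, double& b) {
    double n = pixel.size(), sx = 0, sy = 0, sxx = 0, sxy = 0;
    for (size_t i = 0; i < pixel.size(); ++i) {
        sx += pixel[i]; sy += dist[i];
        sxx += pixel[i] * pixel[i]; sxy += pixel[i] * dist[i];
    }
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx);
    b = (sy - a * sx) / n;
}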

3.3.1.4 Speech Recognition and Analysis Module

Human speech is recognized via the Speech Recognition and Analysis module. Herein, speech

recognition and analysis is performed with the same software (i.e. Julius) that was utilized for the

Memory Game Scenario. For the Meal-time Scenario, the vocabulary definitions have been

configured to recognize the words “Yes” or “No”, which can be spoken by the user in response

to a question posed by the robot.

3.3.1.5 Robot Emotional State Module

Similar to the Memory Game Scenario, a finite-state machine approach is used to match the

appropriate robot emotion to a given user state and the robot's current assistive action within
the context of the Meal-time Scenario. The same robot emotions are also utilized in this scenario.

Table 7 shows a summary of the robot's emotional state for various situations. The robot is in a
happy state when providing encouragement. Regardless of the robot action to be implemented,

when the user is angry, the robot will try to improve the user state of a person by being in a


happy state. The robot is sad when the person is distracted. For all other cases, the robot’s

emotional state is neutral.

Table 7: Robot Emotional State for Meal-assistance Robot

Human User     Current Robot Assistive Action
State          Encourage    Cue        Orient     Monitor
Happy          Happy        Neutral    Neutral    Neutral
Neutral        Happy        Neutral    Neutral    Neutral
Angry          Happy        Happy      Happy      Happy
Distracted     Happy        Neutral    Neutral    Sad

3.3.1.6 Behaviour Deliberation Module

The detailed design of the Behaviour Deliberation module will be discussed as it pertains to the

robot engaging a person in the meal-assistance activity.

3.3.1.7 Inter-module Communication System

Communication between all modules is facilitated by the use of a data-pool system. Parallel

operation of the modules is important in this application because activity, speech, and user state

sensing must be performed continuously and simultaneously. For example, the Activity State and

the Speech Recognition and Analysis module must sense quick actions (e.g., a person picking up

the cup or answering “Yes” in response to a question), which may be missed if the module only

monitors the activity when requested by the Behaviour Deliberation module. The User State

module tracks the face and its features instead of detecting them every time, which minimizes the

module’s computational load, but requires that the module be run continuously.


Chapter 4 Learning-based Decision Making for the Behaviour Deliberation Module

4.1 Behaviour Deliberation Module

As previously mentioned, the main decision making module of the HRI control architecture is

the Behaviour Deliberation module. The module requires inputs from all sensor data analysis

modules in order to determine the robot’s effective assistive behaviour. The module is composed

of two layers: (i) the Knowledge Clarification layer and (ii) the Intelligence layer. The role of the Knowledge Clarification layer is to clarify the current state of the activity and the person before it is sent to the Intelligence layer. Multi-modal fusion techniques and/or clarification dialogue

techniques can be integrated into this layer to ensure that the state submitted to the Intelligence

layer is as accurate as possible. The Intelligence layer consists of the MAXQ hierarchical

reinforcement learning technique which is capable of adapting the robot’s behaviour to the

current assistive interactive scenario. This layer determines the overall behaviour of the robot as

a function of both verbal (speech) and nonverbal (gestures, and facial expressions and intonation

based on the robot’s emotions) communication means. An HRL approach is utilized to provide the

robot with the ability to: (i) learn appropriate assistive behaviours based on the structure of the

activity, and (ii) personalize an interaction based on a person’s behaviour and user state during

HRI.

4.2 Model of HRI Scenario

The modelling of the HRI scenario follows a standard Markov Decision Process (MDP) setup

[46], which consists of:

- S: a finite set of states of the environment.
- A: a finite set of available actions for the current state s.
- P: the probability that the environment will transition from the current state s to a resulting state s’ when an action is performed.
- R: a real-valued reward that the agent receives based on a, s, and s’.


The role of reinforcement learning is to solve the MDP by determining the policy, π, which maps a particular state to an action (i.e., a = π(s)). The goal is to determine a policy that maximizes the expected cumulative reward and minimizes the cost of the actions taken to reach a terminal state.
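As a concrete illustration, the (S, A, P, R) tuple and a learned policy can be represented as follows. This is a generic Python sketch with illustrative names, not the implementation used in this work:

    from dataclasses import dataclass
    from typing import Callable, Dict, List, Tuple

    @dataclass
    class MDP:
        states: List[str]                               # S
        actions: Callable[[str], List[str]]             # A(s): actions available in state s
        transition: Dict[Tuple[str, str, str], float]   # P: (s, a, s') -> probability
        reward: Callable[[str, str, str], float]        # R(a, s, s')

    def greedy_policy(s, actions, Q):
        """pi(s) = argmax_a Q(s, a); reinforcement learning solves the MDP
        by learning the Q-values that make this mapping optimal."""
        return max(actions(s), key=lambda a: Q.get((s, a), 0.0))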

4.3 MAXQ Hierarchical Reinforcement Learning

MAXQ provides a hierarchical decomposition of a given reinforcement learning problem (task)

into a set of sub-problems (sub-tasks). MAXQ is able to support temporal abstraction, state

abstraction, and subtask abstraction which are important in the decision making process for the

socially assistive robot in the HRI scenario. The need for temporal abstraction exists since,

depending on the person’s skill level in the chosen activity and their interaction preferences,

some actions may take varying amounts of time to execute. State abstraction is beneficial since not all state variables are needed for every subtask; due to state abstraction, the overall value function can be represented more compactly by utilizing only a subset of the state

variables. Subtask abstraction is also necessary because it allows subtasks to be learned only

once; the solution can then be shared by other subtasks. The next few sections outline the core

components of MAXQ.

4.3.1 Task Decomposition

Utilizing the MAXQ task decomposition method, a given MDP M is decomposed into a finite set

of subtasks {M0, M1,…, Mn} [46]. M0 is the Root Task, which is the overall assistive task for the

chosen activity. Each subtask consists of a set of actions A, which can be performed to achieve

subtask Mi. These actions can be either primitive robot behavioural actions or other subtasks.

Since the MDP is decomposed into subtasks, a hierarchical policy, π, which is a set containing a policy for each subtask (i.e., π = {π0, π1, …, πn}), must be determined.

The exploration policy, πx, for a given task determines if the action for that task is chosen

randomly or based on the action’s Q-value and the current state s, i.e. a = πx(s). Two examples of

exploration strategies that can be used within the MAXQ framework are: greedy and epsilon-

decreasing. With a greedy exploration strategy, the action that is chosen is always the one with

the highest Q-value. Conversely, with an epsilon-decreasing strategy, the action with the highest

Q-value is chosen 1-ε of the time. For rest of the time, the action is chosen randomly. ε gradually


decreases over time; therefore, action selection in the beginning is highly explorative and as time

progresses, action selection becomes more and more exploitative.
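A minimal sketch of epsilon-decreasing action selection is shown below; the decay constant is illustrative, as the text does not specify the schedule:

    import random

    def select_action(s, actions, q, epsilon):
        """Epsilon-decreasing selection: with probability epsilon explore a
        random action; otherwise exploit the highest-valued action."""
        if random.random() < epsilon:
            return random.choice(actions)
        return max(actions, key=lambda a: q.get((s, a), 0.0))

    # Illustrative decay schedule: epsilon shrinks after every episode, so
    # early action selection is explorative and later selection exploitative.
    epsilon, decay = 1.0, 0.995
    # after each training episode:
    # epsilon = max(0.0, epsilon * decay)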

4.3.2 Value Function Decomposition

At the core of MAXQ is the value function decomposition, which describes how to decompose

the overall value function (i.e. Q-value) for a policy into a collection of value functions for the

individual subtasks, recursively [86]. The Q-value for a parent task p, state s, and action a, is

decomposed into two components [86]:

Q(p, s, a) = V(a, s) + C(p, s, a)                                        (5)

V(a,s) is the value function for action a, which can be subtask Mi or a primitive action. If the

action is a subtask, V(a,s) is further decomposed into the same two components (i.e. V(a,s) and

C(p,s,a)) for its own state and actions. To terminate the recursion, V(a,s) for a primitive action is

defined as the expected one-step reward of performing the primitive action a at state s. C(p,s,a)

is the completion function. Conceptually, V(a,s) is the reward for performing action a and

C(p,s,a) is the reward for completing the parent task.
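The recursion in Equation (5) can be sketched as follows, assuming V and C are tables of learned values and children maps each subtask to its available actions; the dictionary-based layout is an assumption for illustration, not the thesis data structures:

    def v_value(a, s, V, C, children):
        """V(a, s): expected one-step reward if a is primitive; otherwise
        the value of the best child action, evaluated recursively."""
        if a not in children:
            return V.get((a, s), 0.0)
        return max(q_value(a, s, a2, V, C, children) for a2 in children[a])

    def q_value(p, s, a, V, C, children):
        """Equation (5): Q(p, s, a) = V(a, s) + C(p, s, a)."""
        return v_value(a, s, V, C, children) + C.get((p, s, a), 0.0)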

4.3.3 MAXQ Learning Algorithm

The MAXQ value function for all tasks is learned utilizing the MAXQ learning algorithm [86]:

function MAXQ(state s, subtask p) returns float
    Let TotalReward = 0
    while p is not terminated do
        Choose action a = πx(s) according to the exploration policy πx.
        Execute a.
        if a is primitive, observe one-step reward r
        else r := MAXQ(s, a), which invokes subroutine a and returns the
            total reward received while a is executed.
        TotalReward := TotalReward + r
        Observe resulting state s'
        if a is primitive
            V(a, s) := (1 - α)·V(a, s) + α·r                       // update the value function
        else // a is a subtask
            C(p, s, a) := (1 - α)·C(p, s, a) + α·max_a' [V(a', s') + C(p, s', a')]  // update the completion function
    end // while
    return TotalReward
end

where α is the learning rate [86].


4.4 Memory Game Scenario

The Behaviour Deliberation module for the Memory Game Scenario is composed of two layers:

(i) the Knowledge Clarification layer and (ii) the Intelligence layer.

4.4.1 Knowledge Clarification Layer

For the Memory Game Scenario, this layer is in charge of generating a clarification dialogue

between a person and the robot in order to reduce speech recognition errors.

Namely, if the average confidence score for the utterance by the person is low, as determined by

the Speech Recognition and Analysis module, the robot will state the utterance that has the

highest relative confidence score and ask the person to confirm his/her request by providing

positive/negative feedback in the form of yes or no answers. This allows the robot to match the

utterance with its own stored activity-specific utterance templates and hence, increase the

accuracy of speech recognition.
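A minimal sketch of this confirmation logic is given below; the confidence threshold and the robot_say/listen_yes_no stubs are illustrative placeholders, since the text does not specify these details:

    CONFIDENCE_THRESHOLD = 0.6   # illustrative; the thesis does not state a value

    def robot_say(text):
        """Placeholder for the robot's speech output."""
        print(text)

    def listen_yes_no():
        """Placeholder for yes/no speech recognition."""
        return input("> ").strip().lower() == "yes"

    def clarify_utterance(hypotheses):
        """hypotheses: list of (utterance, confidence) pairs from the recognizer.
        If the average confidence is low, state the best-scoring hypothesis and
        ask the person to confirm it with a yes/no answer."""
        average = sum(c for _, c in hypotheses) / len(hypotheses)
        best, _ = max(hypotheses, key=lambda h: h[1])
        if average >= CONFIDENCE_THRESHOLD:
            return best
        robot_say("Did you ask: '%s'? Please answer yes or no." % best)
        return best if listen_yes_no() else None   # None: re-prompt the person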

4.4.2 Intelligence Layer

For the Memory Game Scenario, the Intelligence layer utilizes the MAXQ hierarchical

reinforcement learning technique to adapt the robot’s behaviour to the current assistive

interactive scenario. The following subsections will describe how this is performed.

4.4.2.1 Task Decomposition

The proposed hierarchical task graph for the Memory Game Scenario is presented in Figure 19.

M0 is the Root Task and is defined, herein, to be the overall assistive task, which aligns with the

objective of the card game: To identify and check that cards flipped over result in a

corresponding pair match. The other subtasks are designed to determine the appropriate assistive

behaviours of the robot based on the current user state and activity state.


Figure 19: Hierarchical task graph for the memory game scenario (primitive robot actions on the bottom row are defined in Table 8).

Table 8: Examples of Primitive Robot Actions for Memory Game

Instruction:
  "Let's play a round of the memory game. Please flip over a card."

Celebration (with prompting):
  "Congratulations, you have made a successful match. Please remove the cards from the game."

Encouragement (with prompting):
  "Those are interesting cards that you have flipped, but they are not the same. Please flip back the cards and try again. I know you can do this."

Help: Identify (player asks the robot to identify a card):
  I1 (related question): "This is a very good question. This card shows a picture of a dog."

Help: Recall (player asks the robot to recall a card):
  R1 (level of difficulty = high): "Yes! You have definitely seen this card before in the game."
  R2 (level of difficulty = low): "Yes, you have seen this card before here." (Robot points at the location of the card)
  R3 (card location is not known): "You have not yet seen this card before."

Help: Localize (player asks the robot to locate a card):
  L1 (level of difficulty = high): "The card is located in the top left corner."
  L2 (level of difficulty = low): "The card is located here." (As identified by the pointing gesture of the robot)
  L3 (card location is not known): "You have not yet flipped over this card."

Help: Identify/Recall/Localize:
  I2/R4/L4 (unrelated question): "I'm sorry. I cannot answer your question at this time in the game. Please try again later."


The three main 1st level subtasks are defined as: Flip over cards, Remove (matched) cards from the game, and Flip back (unmatched) cards. Each 1st level subtask is divided into a primitive action, which includes Celebration, Encouragement, and Instruction, and a 2nd level subtask: Help. The Help subtask is further divided into three 3rd level subtasks: 1) Localize a particular card in the game, 2) Recall if a card has been flipped over in a previous round, and 3) Identify the picture on a particular card. For example, if there are no cards flipped over, the path taken in the task graph should be: Root Task → Flip Over → Instruction. Alternatively, if there is one card flipped over and the player has asked the robot to localize the matching pair of this card, which was flipped over in a previous round, the path taken should be: Root Task → Flip Over → Help → Localize → L1, where L1 is the primitive robot action in which the robot informs the person of the location of the matching card.

Every subtask in the task graph has a termination condition. For example, for the Root Task, the

termination condition is that eight pair matches are found in the game. For the Flip Over subtask,

the termination condition is that there are two cards flipped over. If the termination condition for

this subtask is not met after i iterations, the robot becomes sad since the person has

become disengaged from the game. This change in robot emotion is used to re-engage the

person. The termination condition for both the Flip Back and Remove subtasks is that there are 0

cards flipped over. The termination condition for the Help, Localize, Identify, and Recall

subtasks is that there is no human speech input.
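The task graph and its termination conditions can be encoded compactly; the sketch below mirrors Figure 19 and the conditions above, with illustrative key names for the state dictionary:

    # Task graph for the memory game (Figure 19) as a parent -> children map;
    # leaves are the primitive actions of Table 8.
    TASK_GRAPH = {
        "root":      ["flip_over", "remove", "flip_back"],
        "flip_over": ["instruction", "help"],
        "remove":    ["celebration", "help"],
        "flip_back": ["encouragement", "help"],
        "help":      ["localize", "identify", "recall"],
        "localize":  ["L1", "L2", "L3", "L4"],
        "identify":  ["I1", "I2"],
        "recall":    ["R1", "R2", "R3", "R4"],
    }

    # Termination predicates over an (illustrative) state dictionary.
    TERMINATION = {
        "root":      lambda s: s["matches"] == 8,
        "flip_over": lambda s: s["cards_flipped"] == 2,
        "remove":    lambda s: s["cards_flipped"] == 0,
        "flip_back": lambda s: s["cards_flipped"] == 0,
        "help":      lambda s: not s["speech_input"],
    }

A mapping of this form could also serve as the children argument of the recursive value-function sketch given in Section 4.3.2.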

4.4.2.2 State and Action Definition

A set of states, S, have been determined for the aforementioned subtasks to be utilized within the

MAXQ framework. Specifically, the state functions for the robot’s subtasks are defined in Table

9, where s ∈ S:

Table 9: State Functions for Memory Game Scenario

| Task            | State function      |
|-----------------|---------------------|
| Root Task       | s(mc, c, m)         |
| Flip Over cards | s(c, hs, hu, re)    |
| Remove cards    | s(c, hs)            |
| Flip back cards | s(c, hs)            |
| Help            | s(c, hs, hu, re)    |
| Localize        | s(c, hs, gd, l, I)  |
| Recall          | s(c, hs, gd, l, I)  |
| Identify        | s(c, hs, I)         |


mc represents the number of matches found in the game, c represents the number of cards flipped

over in a single round, m represents if a matched pair has been found, hu represents a person’s

user state, hs represents human speech, re represents the emotional state of the robot, l is the pair

location of the flipped over card, I is the identity of the flipped over card, and gd is the level of

difficulty of the game, which changes based on the number of incorrect matches the person has

made in the last n rounds.

Table 8 shows examples of primitive robot actions. The primitive actions for the subtasks Localize, Recall and Identify provide varying levels of encouragement and assistance to keep a person engaged in the game. The first primitive action for Identify (i.e., I1) is to inform the player of the identity of the card in question. A 2nd primitive action for Identify (i.e., I2) is used when the person asks a question that is not related to the activity state. Similarly, for Localize and Recall, if the person asks a question that is not related to the activity state, the fourth action (i.e., L4 or R4) is chosen. The remaining primitive actions for Localize and Recall provide two different levels of difficulty of the game (i.e., L1 and L2, or R1 and R2). The third action (i.e., L3 or R3) is used to handle the case when a card's location is unknown because the card has yet to be flipped over.

At the start and end of the game, the Deliberation module implements the following behavioural

actions for the robot: 1) At Game Start: “Hi, my name is Brian. I am glad you want to play the

memory game with me. Let’s start.”, and 2) At Game End: “Congratulations, you have

completed the memory game.”

4.4.2.3 MAXQ Training

A two-stage training procedure has been implemented for the MAXQ approach, as discussed in the following subsections. In the 1st stage, the primary focus is on determining appropriate behaviours for the robot based on the structure of the game. After the robot has learned its optimal behaviours with respect to the card game, the 2nd training stage focuses on developing personalized interactions for each person utilizing his/her user state during game playing.

MAXQ Off-line Training

The objective of the 1st training stage is to learn the robot’s optimal behaviours based on human

actions and activity states. On-line training would be unrealistic to use at this stage due to the


large amount of possible states and actions that need to be explored, as well as the extensive

amount of experience required to learn the optimal strategy. Therefore, an off-line training

procedure is utilized. The procedure incorporates a human user simulation model, error models

for both speech recognition and activity state detection, and an epsilon-decreasing exploration

strategy that can provide the extensive interaction experience needed for policy learning.

Human User Simulation Model

A simple probabilistic approach for user modeling is the n-gram model [52], which predicts human behaviour based on the last n-1 system actions. n-grams have proven to be effective in simulating real user behaviours for learning scenarios. Their main advantage is

that they allow for the multiplicity (i.e. the person can perform multiple actions at the same time)

and multi-modality of user actions to be modeled. Furthermore, they can easily be trained and

are fully domain-independent [52]. Herein, a bi-gram (n=2) human user model is used to

represent both human verbal and physical actions during the proposed assistive HRI scenario.

Experiments consisting of ten participants, each playing the memory game while interacting with

the robot, were performed to acquire the necessary data for the bi-gram model. In this bi-gram

user model approach, a person’s action is dependent on the last robot action, i.e.

p=P(actionhuman|actionrobot). For these experiments, an action is defined as any possible

behaviour of the robot or person related to the game. Examples of human actions include a

person flipping over a card or asking the robot a help-related question. Examples of the robot’s

actions are presented in Table 8. Full cooperation of the user during the interaction is assumed.

Namely, the user’s actions are related to the memory game, abiding by the rules of the game in

order to find all possible matches. The bi-gram user model as obtained from these experiments is

presented in Table 10.


Table 10: Bi-gram User Simulation Model (Memory Game Scenario); entries give P(human action | last robot action)

| Robot Actions                 | Flips over 1st card | Flips over 2nd card* | Flips back cards | Remove cards | Ask "What..?" (Identify) | Ask "Where..?" (Localize) | Ask "Have..?" (Recall) |
|-------------------------------|------|------|-----|-----|-----|-----|-----|
| Instruct/Flip Over (0 cards)  | 76%  | 21%  | 0%  | 0%  | 1%  | 1%  | 1%  |
| Instruct/Flip Over (1 card)   | 0%   | 62%  | 0%  | 0%  | 11% | 8%  | 19% |
| Encourage/Flip back           | 0%   | 0%   | 97% | 0%  | 1%  | 1%  | 1%  |
| Celebrate/Remove              | 0%   | 0%   | 0%  | 97% | 1%  | 1%  | 1%  |
| Answers Identify              | 0%   | 86%  | 0%  | 0%  | 1%  | 12% | 1%  |
| Answers Localize              | 0%   | 97%  | 0%  | 0%  | 1%  | 1%  | 1%  |
| Answers Recall                | 0%   | 67%  | 0%  | 0%  | 1%  | 31% | 1%  |

* If there are 0 cards initially flipped over, this action is described as flipping over two cards at once.
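For simulation, each row of Table 10 can be sampled directly as a categorical distribution. The sketch below encodes the first two rows; the action labels are illustrative shorthand, and the remaining rows follow the same pattern:

    import random

    # P(human action | last robot action), from Table 10.
    BIGRAM = {
        "instruct_0_cards": [("flip_1st_card", 0.76), ("flip_2_at_once", 0.21),
                             ("ask_identify", 0.01), ("ask_localize", 0.01),
                             ("ask_recall", 0.01)],
        "instruct_1_card":  [("flip_2nd_card", 0.62), ("ask_identify", 0.11),
                             ("ask_localize", 0.08), ("ask_recall", 0.19)],
    }

    def sample_human_action(robot_action):
        """Draw the simulated person's next action from the bi-gram model."""
        actions, weights = zip(*BIGRAM[robot_action])
        return random.choices(actions, weights=weights)[0]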

Sensor Error Model

To account for variations in recognition performance caused by noise and speaker-dependent

differences, a speech recognition error model that assumes a new speaker for every game is

utilized. For each recognition task (RT), the following equation is used to compute the

recognition rate (RR) [42]:

RR_Game(RT) = RR_Overall(RT) + Sample(N(0,1)) · σ_RR(RT)                  (6)

The recognition results of ten different speakers are used to compute the overall RR and standard

deviation for Recall, Identify and Localize, i.e. Table 11. These errors are incorporated into the

simulation model for when the robot needs to detect a person’s verbal action.

Table 11: Speech Recognition Rates

|                               | Recall | Identify | Localize |
|-------------------------------|--------|----------|----------|
| Number of utterances detected | 250    | 150      | 200      |
| RR_Overall(RT)                | 82.0%  | 97.3%    | 97.0%    |
| σ_RR(RT)                      | 0.148  | 0.064    | 0.063    |
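A sketch of how Equation (6) can be applied with the values in Table 11 is shown below; clamping the sampled rate to [0, 1] is an added assumption:

    import random

    # Overall recognition rates and standard deviations from Table 11.
    RR_OVERALL = {"recall": 0.820, "identify": 0.973, "localize": 0.970}
    SIGMA_RR   = {"recall": 0.148, "identify": 0.064, "localize": 0.063}

    def game_recognition_rate(task):
        """Equation (6): sample a per-game recognition rate for a new speaker."""
        rr = RR_OVERALL[task] + random.gauss(0.0, 1.0) * SIGMA_RR[task]
        return min(1.0, max(0.0, rr))   # clamping is an added assumption

    # One rate is drawn per simulated game (a new speaker each game); each
    # utterance is then flagged as recognized with that probability:
    rr_recall = game_recognition_rate("recall")
    recognized = random.random() < rr_recall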


The activity state detection error is based on determining: (i) the identity of the cards in the

game, and (ii) the number of cards flipped over by the user. The card identification error is

incorporated into the simulation model for when the robot must provide help to the user. The

game area is split into a 4x4 grid, representing the location of the cards. Table 12 shows the

detection rates for each section based on the results of ten detection trials per section.

Table 12: Card Identity Detection Rates

| Row \ Column | 1    | 2    | 3    | 4    |
|--------------|------|------|------|------|
| 1            | 90%  | 100% | 90%  | 100% |
| 2            | 100% | 100% | 100% | 100% |
| 3            | 100% | 100% | 100% | 100% |
| 4            | 100% | 100% | 100% | 100% |

Errors resulting from detecting an incorrect number of cards flipped over are also incorporated in

the simulation model for when the robot must provide the appropriate instructions based on the

activity state. Table 13 shows the detection rates for detecting if there are 0, 1 or 2 cards flipped

over.

Table 13: Detection Rates for the Number of Cards Flipped Over

| Number of cards flipped | 0    | 1   | 2    |
|-------------------------|------|-----|------|
| Number of occurrences   | 75   | 32  | 142  |
| Detection Rate          | 100% | 94% | 100% |

Rewards System

The aim of the reward system is to minimize the cost of the actions taken to reach the ultimate

goal of completing the activity. Therefore, the cost of an undesired robot action is higher than a

desired action. In the memory game, a desired action is defined as an appropriate action for the

current state (e.g. the robot congratulating a person when he/she has found matching cards).

Every completed primitive action is given a negative reward of -1; whereas undesired actions are

given an additional negative reward of -20. Desired primitive actions are not further rewarded. A

positive reward of +21 is given at 1st level subtasks if a person is asking a help-related question

and the appropriate Help subtask is chosen. A game reward of +400 is given at the Root Task

when the player finds all 8 matches in the game. The reward values presented here were chosen

in a manner that allows a clear distinction between desired and undesired actions.


For example, when a player flips over the last two matching cards of the game, the correct robot

action is to celebrate the match and inform the player to remove the cards. If the person

completes the task of removing the cards, the resulting reward is r = 399 since a completed

primitive action has been implemented (-1) and the game is finished (+400). Alternatively, if

there is one card flipped over and the player asks the robot to localize a card that has been

previously flipped over, the appropriate robot action would be to inform the user of the correct

location of the card. In this case, the resulting reward is r = +20 since a primitive action has been

implemented (-1) and the appropriate Help subtask has been chosen based on the player’s

question (+21).
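The reward computation can be summarized in a few constants; the helper below is an illustrative sketch of the scheme described above, not the thesis code:

    STEP_COST      = -1    # every completed primitive action
    UNDESIRED_COST = -20   # additional penalty for an inappropriate action
    HELP_BONUS     = +21   # correct Help subtask chosen for a help-related question
    GAME_REWARD    = +400  # all eight matches found (given at the Root Task)

    def primitive_reward(desired):
        """Per-step reward for a completed primitive action."""
        return STEP_COST + (0 if desired else UNDESIRED_COST)

    # Worked examples from the text:
    #   final match removed:     STEP_COST + GAME_REWARD    -> +399
    #   correct Help answer:     STEP_COST + HELP_BONUS     -> +20
    #   wrong primitive action:  STEP_COST + UNDESIRED_COST -> -21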

Exploration Policy

An epsilon-decreasing exploration strategy is applied during off-line training. At the beginning of off-line training, ε is set to 1 for the Root Task as well as all 1st and 2nd level subtasks to encourage the maximum amount of exploration possible. 3rd level subtasks, which only evoke primitive actions, employ a greedy policy (i.e., ε = 0), where the action with the highest potential reward (Q-value) will always be chosen. Since a previously implemented action will result in a negative reward for that action, a greedy policy is used here to ensure that all primitive actions are explored at least once. Once all primitive actions have been explored, ε gradually reduces to 0 at the Root Task as well as all 1st and 2nd level subtasks. This is done so that the Q-values at these subtasks will converge to their optimal values.

Performance Analysis

A study was performed to compare the rate of convergence of the MAXQ approach versus a

traditional flat Q-learning approach, [87], for the proposed memory game scenario. The same

learning rate (i.e. α=0.8), initial Q-values, state parameters, primitive actions and user simulation

model were used for both implementations. Figure 20 presents the cumulative rewards for the

overall assistive task. The results from this study show that the MAXQ method converges at a

faster rate than the flat Q-learning approach. For flat Q-learning, there were 33 state parameters

and 13 primitive actions, resulting in 20,736 unique states and 269,568 Q-values. With state and

subtask abstraction, the MAXQ approach significantly reduces the number of Q-values that need to be stored to only 1,707, making it a considerably more efficient solution to this

decision making problem.


Figure 20: Comparison of MAXQ and flat Q-learning for the Memory Game Scenario

MAXQ On-line Training

Once the 1st training stage has determined the appropriate behaviours of the robot based on the

structure of the memory game, the 2nd stage of the training process is implemented. This on-line

training stage is used to allow Brian 2.0 to learn its optimal assistive behaviours based on a

person’s user states during game engagement. The aim is to select the robot’s behaviours in an

attempt to maintain positive (i.e., pleased or excited) user states during game playing. It is

postulated that this will, in turn, allow a person to be more engaged in the cognitively stimulating

activity.

A novel on-line training procedure has been developed utilizing a person’s user state to explore

robot behaviours such as providing instruction or help when appropriate, and reward the

behaviours that succeed at improving user state during the memory game. Exploration of

behaviours is triggered by the robot detecting that the person is in a stressed state. At this user

state, the exploration policy, ε, is non-greedy for the Flip Over and Help subtasks. ε is gradually

reduced at every successful robot action. The robot will eventually revert back to the greedy

exploration policy when ε finally decreases to 0, where the action with the highest Q-value will

be chosen. This on-line training procedure is repeated for every new user that interacts with the

robot.
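The trigger-and-decay logic of this procedure can be sketched as follows; the reset value of ε on triggering and the decay factor are illustrative assumptions, since only the qualitative behaviour is specified above:

    # Sketch of the on-line exploration trigger for the memory game.
    EPSILON_RESET, EPSILON_DECAY = 1.0, 0.9   # illustrative constants

    class OnlineAdaptation:
        def __init__(self):
            self.epsilon = 0.0                 # greedy until exploration triggers

        def observe_user_state(self, user_state):
            if user_state == "stressed":       # trigger non-greedy exploration
                self.epsilon = EPSILON_RESET   # for the Flip Over and Help subtasks

        def observe_outcome(self, robot_action_succeeded):
            if robot_action_succeeded:         # successful action: reward it and
                self.epsilon *= EPSILON_DECAY  # step back toward exploitation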



At the end of the 1st stage of training, the Instruction behaviour (at the Flip Over subtask level)

has a higher Q-value than the Help behaviour. To promote exploration of both these behaviours

in the 2nd stage of training, the first successful Help behaviour is given a reward of +20 so that it

has the same Q-value as the Instruction behaviour. Subsequent successful Help and all successful

Instruction behaviours are given a reward of +10. At the Help subtask level, a reward of +1 is

given to a successful Localize, Identify, or Recall action.

The on-line training procedure was implemented and tested on ten healthy adult participants (20

to 35 years old) as they played the memory game twice while interacting with the robot. Results

from this experiment will be discussed in Section 5.1.3.

4.5 Meal-time Scenario

4.5.1 Knowledge Clarification Layer

For the Meal-time Scenario, the Knowledge Clarification layer is used to clarify speech recognition results and to utilize a multi-modal fusion method to determine the current human

activity state. Similar to the Memory Game Scenario, this layer generates a clarification dialogue

between a person and the robot in order to reduce speech recognition errors. Human

intent is based on multiple inputs from the Activity State and User State modules. These modules

provide complementary inputs so that there are multiple indicators for each human activity state

ha(t). For example, the following combination of inputs indicates that the person may need

assistance from a staff member: (i) the person is currently idle as determined by the activity state

module, (ii) the person is distracted or angry as determined by the user state module, and (iii) the

person has been idle for a long period of time as determined by the length of time that has passed

since the person was first at the idle state.

Utilizing inputs from various modules, three distinct human activity states are defined: (i) the

person is engaged in the activity s(a,ha(t-1)), (ii) the person may need assistance from a staff

member s(a,hu,it), and (iii) the person has confirmed that he/she needs assistance from a staff

member s(ha(t-1),hs). a indicates the activity state. hu represents the user state of the person and

hs represents human speech. it indicates if the person has idled for a long period of time (i.e. 10

minutes). Lastly, ha(t-1) is the previous human activity state.
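These fusion rules can be sketched as a simple decision function; the 10-minute idle threshold follows the text, while the label strings are illustrative:

    IDLE_LIMIT_S = 600   # 10 minutes, per the text

    def human_activity_state(activity, user_state, idle_seconds, confirmed_help):
        """Rule-based fusion of the Activity State and User State module outputs.
        Return labels are illustrative names for the three states above."""
        if confirmed_help:                                   # state (iii)
            return "assistance_confirmed"
        if activity == "idle" and (user_state in ("distracted", "angry")
                                   or idle_seconds >= IDLE_LIMIT_S):
            return "may_need_assistance"                     # state (ii)
        return "engaged"                                     # state (i)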


4.5.2 Intelligence Layer

The MAXQ HRL task hierarchy for the Meal-time Scenario is shown in Figure 21. The root task

for this scenario is defined to be the overall assistive task, which is to motivate the person to

consume the contents of the meal.

Figure 21: Task Decomposition for Meal-time Scenario

The root task Motivate to eat or drink is divided into two subtasks and one primitive action. The

two 1st level subtasks are defined as: Obtain food/drink and Put food/drink in mouth. The 1st level primitive action is Monitor, where the robot asks if the person wants further assistance from a staff member. This action is only evoked when the robot has sensed that the person is distracted or has been idling for a long time (i.e., 10 minutes). Based on the meal plan, Obtain

food/drink is a subtask designed to motivate the person to either obtain the food on a particular

dish with a utensil or pick up the beverage from the tray. Therefore, Obtain food/drink is divided

into three 2nd level subtasks: Obtain food from main dish, Pick up beverage, and Obtain food

from side dish. Each of these subtasks is further decomposed into three primitive actions, which

are Encourage, Cue, and Orient. Based on whether food was previously obtained or the beverage

was picked up, Put food/drink in mouth is a subtask designed to motivate the person to either eat

the food he/she has obtained or drink the beverage. Therefore, Put food/drink in mouth is divided

into two 2nd level subtasks: Put food in mouth and Drink beverage. These subtasks are each

decomposed into two primitive actions, which are Encourage and Cue. For example, if the

person is currently idle and the robot should motivate him/her to eat from the side dish, the path

taken in the task graph could be: Motivate to eat or drink → Obtain food or drink → Obtain food


from side dish → Encourage. Alternatively, if the person has picked up the beverage, the path

taken could be: Motivate to eat or drink → Put food or drink in mouth → Drink beverage →

Cue.

Table 14 shows the termination conditions for each task. In general, the termination occurs: (i)

when the objective of a root task or subtask is completed, or (ii) if the user appears to be in need

of further assistance from a staff member. The only exception is for subtasks Obtain food from

main dish, Pick up beverage, and Obtain food from side dish. These subtasks also terminate

when the person has progressed in the interaction. For example, when the robot prompts the

person to obtain food from the main dish, but the person obtains food from the side dish, the

Obtain food/drink and Obtain food from the main dish subtasks have to terminate because the

person has now moved on to consuming the obtained food. In this case, the robot must adapt to

the obtained meal item and continue to provide the appropriate prompting.

Table 14: Task Termination Conditions for Meal-time Scenario

| Task | Termination Conditions |
|------|------------------------|
| Motivate to eat or drink | The meal is completed; assistance from a staff member has been requested. |
| Obtain food or drink / Obtain food from main dish / Pick up beverage / Obtain food from side dish | The person has obtained food or picked up their beverage; the person is distracted or has been idling for a long time. |
| Put food or drink in mouth | The person has brought the food or drink to their mouth; the person is distracted or has been idling for a long time. |
| Put food in mouth | The person has brought the food to their mouth; the person is distracted or has been idling for a long time. |
| Drink beverage | The person has brought the drink to their mouth; the person is distracted or has been idling for a long time. |

4.5.2.1 State and Action Definitions

S is a set of states that have been determined for the aforementioned subtasks to be utilized

within the MAXQ framework. Table 15 shows the state functions for the robot’s subtasks, where

s ∈ S.


Table 15: State Functions for Meal-time Scenario

| Task                       | State function |
|----------------------------|----------------|
| Motivate to eat or drink   | s(m, ha)       |
| Obtain food or drink       | s(mp, ha)      |
| Put food or drink in mouth | s(ha, ms, fd)  |
| Obtain food from main dish | s(mw, m, hu)   |
| Pick up beverage           | s(dt, d, hu)   |
| Obtain food from side dish | s(sw, s, hu)   |
| Put food in mouth          | s(fe, hu)      |
| Drink beverage             | s(db, hu)      |

m represents how much of the meal has been already consumed. mp represents the designated

meal plan, which is the order in which the dishes and drink should be consumed. ha is the current

human activity. ms is the meal status, which indicates if the meal is still in progress or finished.

fd represents whether a food item or drink has been obtained by the person. mw indicates that the

person has removed food from the main dish, resulting in a weight change. Similarly, sw

indicates that food has been taken from the side dish. dt represents that the person has taken the

beverage off the tray. m, d, and s represent how much of the main dish, drink, and side dish has been consumed by the person, respectively. fe indicates that the food has been successfully brought to the mouth and eaten by the person. db indicates that the person has drunk the beverage, resulting in a weight change. Finally, hu represents the user state of the person.

The Monitor action is used to call a staff member to assist with the scenario at hand. The

Encourage, Cue, and Orient primitive actions represent the three different techniques used to

motivate the person to complete the given task. Encourage actions are positive reasoning tactics

to convince the person to perform the task. Namely, the robot may promote the task by informing

him/her of the positive health benefits of a certain food/drink type, providing reinforcement for

completing the previous task, adding courteous and encouraging words to cuing phrases,

promoting healthy eating habits (e.g., chewing the food or drinking slowly), or commenting on

the aroma or visual appeal of the food/drink. Cueing actions are designed to be straightforward

and direct instructions. In this case, the robot simply cues the person to eat a certain type of food,

use a utensil to obtain the food, pick up the beverage, or slow down if he/she is eating too fast.

Lastly, Orienting actions are designed to provide general awareness of the situation. Namely, the

robot may inform the person of which meal he/she is attending, his/her location, what type of


food is in the meal (e.g. pasta with sauce, juice), and the location of the food and drink items on

the tray. Table 16 shows examples of the aforementioned primitive actions.

At the start and end of the meal, the Deliberation module implements the following behavioural

actions for the robot: (i) at the beginning of the meal: “Hi, my name is Brian. You look very nice

today. I will be joining you for lunch. Let’s eat.”, and (ii) at the end of the meal: “I see that

you’ve finished your meal. Thank you for letting me join you for lunch. Have a nice day!” In

order to add more social elements to the interaction, as suggested in [73]-[76], the robot also

provides jokes and general positive statements about the interaction and the meal every 2

minutes.

Table 16: Examples of Primitive Actions for Meal-time Scenario

Monitor:
  "I detect that you have not eaten in a long time. Would you like me to ask a staff member to help you with your meal?"

Encourage:
  "The main dish smells amazing. Why don't you pick up some food with your fork?"
  "Drinking some of your beverage will make your food easier to swallow."
  "I see that you have finished eating the entire main dish. Good job. Try the side dish."
  "What you have on your spoon looks delicious. Why don't you take a bite and see how it tastes?"
  "Please drink your beverage slowly. You will enjoy it more."

Cue:
  "Use your spoon to pick up your food from the main dish."
  "Take a drink from your cup."
  "Please chew your food before taking another bite."
  "Slow down. Please drink your beverage slowly."

Orient:
  "The menu today includes chicken breast, vegetable medley, and rice."
  "We are in the dining room and you are enjoying lunch with me."
  "Your side dish is located at the bottom right corner of your tray."

4.5.2.2 MAXQ Training

Similar to the Memory Game Scenario, the proposed MAXQ training procedure for the Meal-time Scenario is performed in two stages. In the 1st stage, appropriate robot behaviours are learned based on the structure of the meal-time activity. In the 2nd stage, the interaction is personalized by learning which motivating technique is effective at convincing the person to complete a specific task.


MAXQ Off-line Training

MAXQ off-line training for the Meal-time Scenario is presented in the following subsections.

Human User Simulation Model

Similar to the MAXQ training setup used for the Memory Game, a bi-gram human user model is

used to represent human activity stages during the proposed assistive HRI scenario. Examples of

the robot’s actions are presented in Table 16. This model assumes full cooperation of the user

during the interaction. Namely, the user’s actions are related to the meal activity, there is

generally a higher probability that the user will act as directed by the robot, and he/she has the

objective of consuming the entire meal. Figure 22 and Table 17 present the proposed bi-gram

model used for off-line training.

Figure 22: Flowchart of Human Actions for Meal-time Scenario


Table 17: Human User Model for Meal-time Scenario (entries give the probability of each human activity stage following a robot action)

| Robot Actions | Idle | Obtain food from main dish | Take drink | Obtain food from side dish | Put food in mouth | Drink beverage | Needs Attention |
|---------------|------|------|------|------|------|------|------|
| Encourage 1   | 10%  | 75%  | 5%   | 5%   | 0%   | 0%   | 5%   |
| Cue 1         | 10%  | 75%  | 5%   | 5%   | 0%   | 0%   | 5%   |
| Orient 1      | 10%  | 75%  | 5%   | 5%   | 0%   | 0%   | 5%   |
| Encourage 2   | 10%  | 5%   | 75%  | 5%   | 0%   | 0%   | 5%   |
| Cue 2         | 10%  | 5%   | 75%  | 5%   | 0%   | 0%   | 5%   |
| Orient 2      | 10%  | 5%   | 75%  | 5%   | 0%   | 0%   | 5%   |
| Encourage 3   | 10%  | 5%   | 5%   | 75%  | 0%   | 0%   | 5%   |
| Cue 3         | 10%  | 5%   | 5%   | 75%  | 0%   | 0%   | 5%   |
| Orient 3      | 10%  | 5%   | 5%   | 75%  | 0%   | 0%   | 5%   |
| Encourage 4   | 0%   | 0% or 10%* | 0% | 0% or 10%* | 85% | 0%  | 5%   |
| Cue 4         | 0%   | 0% or 10%* | 0% | 0% or 10%* | 85% | 0%  | 5%   |
| Encourage 5   | 0%   | 0%   | 10%  | 0%   | 0%   | 85%  | 5%   |
| Cue 5         | 0%   | 0%   | 10%  | 0%   | 0%   | 85%  | 5%   |
| Monitor       | 95%  | 0%   | 0%   | 0%   | 0%   | 0%   | 5%   |

* 0% or 10% depending on where the food was obtained from. For example, if food was obtained from the main dish, then there would be a 10% probability that Encourage 4 and Cue 4 would result in the person staying in the same state (i.e., the person has not yet brought the food to their mouth) and a 0% probability that the person obtained food from the side dish.

Sensor Error Model

The activity state detection error model is presented in Table 18. Detection rates are determined

via repeatability testing.


Table 18: Sensor Error Model (Meal-time Scenario)

| Activity State Parameter | Item                | Detection Rate |
|--------------------------|---------------------|----------------|
| Consumption Levels       | Main Dish           | 99%            |
|                          | Beverage            | 100%           |
|                          | Side Dish           | 96%            |
| Weight Change            | Main Dish           | 100%           |
|                          | Beverage            | 97%            |
|                          | Side Dish           | 98%            |
| Beverage                 | Picked up from tray | 100%           |
| Utensil Location         | At tray             | 99%            |
|                          | At mouth            | 99%            |
| Utensil Movement         | Not moving          | 100%           |
|                          | Moving to mouth     | 99%            |
|                          | Moving to tray      | 100%           |

Reward System

In the Meal-time Scenario, a desired action is defined as an appropriate action for the current

state (e.g. the robot encouraging the person to eat the food that he/she has obtained from the

main dish). Every completed primitive action is given a negative reward of -1; whereas

undesired actions are given an additional negative reward of -20. Desired primitive actions are

not further rewarded. A reward of +300 is given at the root task if the activity ends with the

person finishing his/her meal. Conversely, a reward of +50 is given if the activity ends with the

person requesting further assistance from a staff member. The reward values presented here

were chosen in a manner that allows a clear distinction between desired and undesired actions.

For example, at the end of the meal, the person has obtained the last remaining portion of food or

drink in the meal and the correct robot action is to encourage or cue the person to consume what

he/she has obtained. If the person performs the action as directed, and thus, ends the activity by

completing the meal, the resulting reward is r = 299, since a completed primitive action has been

implemented (-1) and the meal is completed (+300). Alternatively, if the person is idle and the

robot performs the incorrect action of cuing the person to eat the food he/she has obtained from

the main dish, the reward is r = -21, since a primitive action has been implemented (-1) and an

undesired action is performed (-20).


Exploration Policy

An epsilon-decreasing exploration strategy is applied during off-line training. ε is originally set

to 1 for the root task and the 1st level subtasks to encourage the maximum amount of exploration

possible at the beginning of off-line training. 2nd level subtasks, which only evoke primitive actions, employ a greedy policy (i.e., ε = 0), where the action with the highest potential reward

(Q-value) will always be chosen. A greedy policy is used here to ensure that all primitive actions

are explored at least once since a previously implemented action will result in a negative reward.

Over time, the exploration policy at the root task and 1st level subtasks gradually reduces to 0,

which enables their Q-values to ultimately converge to their optimal values.

Performance Analysis

A convergence study for the Meal-time Scenario was also performed to confirm that the MAXQ

approach is more efficient than a traditional flat Q-learning approach. The same learning rate

(i.e. α=0.8), initial Q-values, state parameters, primitive actions and user simulation model were

used for both implementations. Figure 23 shows the convergence rates for the MAXQ method

and the flat Q-learning approach for this scenario. For flat Q-learning, there were 6,000,000

unique states and 78,000,000 Q-values. With state and subtask abstraction, the MAXQ approach

significantly reduces the number of Q-values that need to be stored to only 650, making it

a considerably more efficient solution to this decision making problem.

Figure 23: MAXQ vs. Flat-Q Comparison for Meal-time Scenario



MAXQ On-line Training

The aim of the 2nd stage of MAXQ training is to learn the optimal assistive robot behaviours that will motivate the person to ultimately consume his/her entire meal. Herein, the effectiveness of three motivation styles (i.e., Encourage, Cue, and Orient) is investigated for each person in terms of convincing the person to consume a portion of his/her meal.

In this on-line procedure, up to three levels of customization can be applied. Namely, the robot can learn the preferred motivation style for each 2nd level subtask, user state, and consumption level of the dish/drink (the latter for the 2nd level subtasks of Obtain food or drink only). This is performed by first setting a non-greedy exploration policy at the 2nd level subtasks to trigger the exploration of assistive behaviours. At each 2nd level subtask, there are up to three possible assistive behaviours from which the robot can choose: Encourage, Cue, and Orient (the last for the subtasks of Obtain food or drink only). Based on the exploration policy, an assistive behaviour is chosen and executed, and the resulting human action is observed. If the robot is successful in motivating the person to complete the task, the action is rewarded with an on-line training reward of +1. The exploration policy is gradually reduced at every successful robot action. The robot will eventually revert back to the greedy exploration policy when ε finally decreases to 0, where the action with the highest Q-value will be chosen. This on-line training procedure is repeated for every new user.
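A sketch of the customization keys described above is given below; the dictionary-based layout and function names are illustrative assumptions, not the thesis data structures:

    from collections import defaultdict

    # On-line Q-values keyed by the customization levels described above:
    # (2nd level subtask, user state, consumption level) -> value per style.
    q_online = defaultdict(float)

    def best_style(subtask, user_state, consumption_bin, styles):
        """Pick the motivation style with the highest learned value."""
        return max(styles, key=lambda st: q_online[(subtask, user_state,
                                                    consumption_bin, st)])

    def reward_success(subtask, user_state, consumption_bin, style):
        """+1 on-line reward when the chosen style motivates the person."""
        q_online[(subtask, user_state, consumption_bin, style)] += 1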

The on-line training procedure was implemented and tested on five healthy adult participants as

they each interacted with the robot during two meals. Results from this experiment will be

discussed in Section 5.2.1.2.


Chapter 5 Experiments

Experiments were performed to evaluate the proposed control architecture and its learning-based

decision making capabilities for the Memory Game Scenario and the Meal-time Scenario. The

following two types of experiments were conducted for each scenario: (i) performance

assessment of the key modules within the control architecture and (ii) HRI studies that

investigate the effect of the robot’s behaviours during HRI scenarios.

5.1 Memory Game Scenario

For the Memory Game Scenario, four experiments were performed to evaluate: (i) the

performance of the individual modules within the control architecture, (ii) the robot’s ability to

engage an individual in a cognitively stimulating activity, (iii) the performance of the overall

proposed MAXQ online training procedure, and (iv) the robot's ability to minimize task-induced stress during a cognitively stimulating activity. The following subsections will

discuss the findings from all experiments.

5.1.1 Performance Assessment: Control Architecture

In this experiment, ten healthy adult participants (21 to 40 years old) were instructed to play the

memory game with Brian 2.0 in its entirety with the aim of utilizing all its instruction and help

functions during game playing. The objective of this experiment is to evaluate the ability of the

control architecture to: (i) accurately detect human behaviours during HRI and (ii) choose the

appropriate robot behaviour based on the current interactive scenario and the policy learned from

MAXQ offline training. The experimental set-up is shown in Figure 24. All participants were

familiar with the memory game. 16 picture cards were used in total for the game. The SIFT-

based card recognition and localization method, as described in Section 3.2.2.2, was utilized in

this set of experiments. Moreover, voice intonation was utilized to determine user state.


Figure 24: Experimental Set-up for Performance Assessment (Memory Game Scenario)

5.1.1.1 Results and Discussions

Results of the experiments are presented in Table 19 and Table 20. Table 19 presents the

performance of the Activity State module and Table 20 shows the robot’s ability to successfully

determine and execute the appropriate behaviour (as determined by the Behaviour Deliberation

module) for a given game state. The number of trials represents the number of opportunities that

existed to observe a particular activity state parameter or robot behaviour.

As shown in Table 19, high success rates were achieved for all the tested game state parameters

during the experiments. It is interesting to note that there were significantly more opportunities

to detect the presence of two cards flipped over, compared to one card flipped over, as most

participants preferred to flip two cards at once rather than one card at a time. Typically, it was

found that human hands have an insignificant number of features associated with them compared

to a picture card; therefore, during game state analysis, if there is a hand present in an image, the

hand is ignored the majority of the time. Instead, the system focuses on locating clusters with a

large number of features. However, in two occurrences during the experiments, each happening

within two different trials, a participant’s hand was erroneously identified as a 2nd

card that has

been flipped over when only one card was flipped over in that trial. In general, a hand has the

potential to be falsely identified if a person is wearing nail polish, jewellery, or anything else that

provides additional texture to the hand. False identification of a card also occurred twice during

game play in two different trials with separate participants. Both participants had accidently

moved the card beyond the defined game area, leaving only a small portion of the card to be


visible to the 2D camera. In one case, this false identification subsequently led to the incorrect

recognition of a “match” due to the lack of a sufficient number of keypoints.

Table 19: Activity State Identification Results (Memory Game Scenario)

| Activity State Parameter                 | No. of Trials | No. of Failures | Success Rate |
|------------------------------------------|---------------|-----------------|--------------|
| Number of cards flipped: 0 cards         | 75            | 0               | 100%         |
| Number of cards flipped: 1 card          | 32            | 2               | 94%          |
| Number of cards flipped: 2 cards         | 142           | 0               | 100%         |
| Localization and identification of cards | 160           | 2               | 99%          |
| "Match" recognition                      | 80            | 1               | 99%          |
| "No match" recognition                   | 62            | 0               | 100%         |

From Table 20, it can be seen that the robot was successful at selecting and executing

appropriate emotion-based behaviours throughout the interactions. The two previously

mentioned types of errors resulted in the robot also implementing the wrong behavioural actions

(i.e., instructions and celebration (with prompting)). Failures related to Help behavioural actions

were primarily a result of the speech recognition software incorrectly recognizing the intent of

the participants. Namely, the acoustic model utilized for speech recognition was sensitive to the

different word pronunciations and speaking styles amongst the 10 participants, as well as

background noise during the interactions. The two following words were the most difficult to

recognize: “What” and “Have”. The acoustic model utilized in these experiments was not trained

to be participant-dependent and thus, as a general limitation of person-independent speech recognition techniques, it inherently has difficulty correctly recognizing different pronunciations of

the same words. This limitation is reflected in the results shown in Table 20.

Table 20: Robot Emotion-based Behaviour Selection and Execution Results

| Expected Robot Behaviour              | No. of Trials | No. of Failures | Success Rate |
|---------------------------------------|---------------|-----------------|--------------|
| Game Start                            | 10            | 0               | 100%         |
| Instruction                           | 111           | 2               | 98%          |
| Celebration (with prompting)          | 80            | 1               | 99%          |
| Encouragement (with prompting)        | 62            | 0               | 100%         |
| Help: Player asks to recall a card    | 42            | 9               | 79%          |
| Help: Player asks to locate a card    | 43            | 3               | 93%          |
| Help: Player asks to identify a card  | 44            | 8               | 82%          |
| Game End                              | 10            | 0               | 100%         |


5.1.2 HRI Study: Activity Engagement

To assess the ability of Brian 2.0 using the proposed control architecture to engage people in the

memory game, an AB test design was implemented. One of the main symptoms of dementia is

the potential for people suffering from the disease to be easily distracted from a particular task

due to limited attention span and concentration [88]. The objective of this experiment is to

simulate scenarios in which a person can be distracted and show that, in such situations, the robot

can be effectively used to engage the person in the memory game.

5.1.2.1 Study Procedure

The experiment consisted of ten healthy adult participants (21 to 40 years old) interacting in a

one-on-one scenario with the robot in a laboratory setting. All participants were familiar with the

memory game. 16 picture cards were used in total for the game. Each participant was situated in

an environment in which they would be easily distracted from the memory game. The specific

aim was to assess and validate the robot’s ability to provide positive interactions during HRI-

based person-directed activities prior to initiating pilot testing at the ASBLab’s collaborative

healthcare facility with persons suffering from mild cognitive impairment. Namely, within this

specific aim the following hypothesis was addressed: The robot Brian 2.0 with social interaction

capabilities will increase the likelihood of a person engaging in a specific cognitively stimulating

activity.

The definition of activity engagement used by Brenske et al. [89] is applied to the cognitively

stimulating activity performed during the HRI scenario. Within the context of the memory game,

engagement is identified to consist of any of the following activities performed by the person: (i)

manipulation of the cards, (ii) looking towards the game area or the robot, and (iii) partaking in

verbal dialogue with the robot.

During the baseline (i.e., experimental set A), a table and chair set-up was utilized, where the

memory card game was arranged in the center of the table, in front of the person. Distractions in

the form of a tennis ball, magazines, a toy robot and a robotic dog were placed around the game,

i.e. Figure 25a. Each participant was instructed that the memory game was available to them to

play for a 20 minute time period. For the robot interaction scenario (i.e., experimental set B), the

sessions were designed to be the exact same as in the baseline except that the robot was present


to provide social engagement, i.e. Figure 25b. The baseline scenario was conducted first and the

robot interaction scenario was conducted the following day. The objective of the overall

experiment was not revealed to the participants until the experiment was completed.

Figure 25: (a) Baseline Scenario and (b) Robot Interaction Scenario

Observations were targeted at the activities related to engagement in the memory game. Namely,

engagement was noted only if the person's attention was solely directed towards the memory game

or Brian 2.0. A participant was observed every 30 seconds, during which the presence or non-

presence of engagement indicators were recorded. The percentage of intervals in which the

participants engaged in the memory game was determined and used to identify a participant’s

level of engagement. All observations were recorded using a small webcam to monitor the

participants indirectly in order to relieve any potential psychological pressure instilled by the

presence of an observer. In this experiment the SIFT-based method was used for card recognition

and voice intonation was utilized to determine user state.

5.1.2.2 Results and Discussions

Table 21 presents the results for the baseline and robot interaction scenario for all 10

participants. A one-tailed, paired t-test was performed to validate the hypothesis that the robot Brian 2.0 with social interaction capabilities will increase the likelihood of a person engaging in a cognitively stimulating activity. A statistically significant difference (p<0.001) between the mean engagement values of the no-robot and robot sessions was found, confirming that engagement in the memory game was noticeably greater in the robot interaction scenario and supporting the hypothesis.


Table 21: Engagement Results (Memory Game Scenario)

Participant   | Baseline Scenario “A” (%) | Robot Interaction Scenario “B” (%)
1             | 38   | 45
2             | 10   | 100
3             | 25   | 60
4             | 10   | 47.5
5             | 15   | 100
6             | 0    | 82.5
7             | 10   | 55
8             | 0    | 65
9             | 30   | 45
10            | 28   | 72.5
Mean          | 16.5 | 67.3
Standard Dev. | 12.9 | 21.1
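As a check on the reported statistic, the paired one-tailed t-test can be reproduced directly from the values in Table 21. The short Python sketch below (using SciPy, which is not part of the thesis software) illustrates the calculation.

    # Minimal sketch: paired one-tailed t-test on the Table 21 engagement data.
    # SciPy's ttest_rel is two-tailed, so the one-tailed p-value is p/2 when
    # the t-statistic has the hypothesized sign (higher engagement with robot).
    from scipy import stats

    baseline = [38, 10, 25, 10, 15, 0, 10, 0, 30, 28]         # scenario A (%)
    robot = [45, 100, 60, 47.5, 100, 82.5, 55, 65, 45, 72.5]  # scenario B (%)

    t_stat, p_two = stats.ttest_rel(robot, baseline)
    p_one = p_two / 2 if t_stat > 0 else 1 - p_two / 2
    print(f"t = {t_stat:.2f}, one-tailed p = {p_one:.5f}")    # p < 0.001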

It is interesting to note that two participants had zero engagement in the memory game during the

baseline. Instead of playing the memory game, the two participants read the magazines and

played with the toys for the full 20 minute duration. In general, during the baseline, the

distracters worked well in causing the participants to deviate from (and in two cases ignore) the

memory game. Namely, on average the participants only played the memory game for 3.3

minutes during the baseline. In short, participants found the memory game, on its own, engaging

for only a short period of time. Even during game playing, they were often distracted by the

other objects and hence, at times, not fully concentrating on playing the game.

Conversely, for the robot interaction scenario, participants were more engaged in the memory

game due to the presence of the social robot. The robot’s instructions and help functions were the

most effective at keeping a participant’s attention on the game during the interaction period. On

average, the robot kept the participants engaged in the memory game for approximately 13.5

minutes. For two participants, the robot was even able to keep them engaged in the game for the

full 20 minutes. If the participants did not manipulate the cards or interact with the robot for a 1

minute interval, the robot would provide instructions or help to direct the participant’s attention

back to the game. This would re-engage the participants in the game, and their affective states would also improve. Because of this, the participants never deviated from the


memory game for a long duration of time but rather for short time intervals. Examples of the

robot’s behaviour during the interactions are presented in Figure 26.

Figure 26: Examples of robot behaviour during interactions: (a) Robot providing celebration in a happy emotional state after a correct match, (b) Robot providing help in a neutral state, and (c) Robot providing instruction in a sad state when game disengagement occurs.

5.1.2.3 Participant Survey Results

Once the participants were finished interacting with the robot, they were asked to provide their

feedback on the robot’s behaviour during game playing via a Likert scale survey. They were

asked to answer questions regarding the effectiveness of the robot’s communication and

intelligence attributes in terms of engaging them and enhancing their experience of the game.

80% of the participants stated that the instruction, celebration, and encouragement phrases

provided by the robot helped keep them engaged and interested in the game. 70% of the

participants found the help functions of the robot to be very useful, especially how these

functions assisted them in finding a match and provided specific details about the location and

identity of the cards. Furthermore, 70% of participants explicitly commented on how they liked

the robot’s ability to communicate with them in a clear and natural manner as they played the

game. Specifically, they liked that the robot was intelligent enough to recognize what questions

they were asking and give the appropriate answer, the variety of different phrases it was capable

of saying, and how its facial movements complemented its speech.

5.1.3 Performance Assessment: Learning-based Decision Making

The aim of this experiment is to evaluate the proposed on-line training procedure, which was

discussed in Section 4.5.2.2. In this experiment, ten healthy adult participants (20 to 35 years


old) played the memory game twice while interacting with the robot. A baseline heart rate was

obtained for each participant prior to game initiation. A successful action is defined as a robot

action that improves a person’s user state from a stressed state to a non-stressed state. Figure 27

shows the experimental setup.

Observations from the first two experiments showed that voice intonation (i.e. Section 3.2.2.4)

provided limited opportunities in detecting user state during the course of the activity. Therefore,

the heart-rate based method (i.e. Section 3.2.2.4) was utilized from this point forward to allow

for a more continuous monitoring of user state during HRI. Moreover, the colour-based card

recognition and localization method (Section 3.2.2.2) was utilized from this point forward to

improve the sampling time for the activity state module.

Figure 27: Experimental Setup for Evaluation of Learning-based Decision Making

In this experiment, a scenario involving activity-induced stress is simulated in order to

demonstrate that, in such situations, the robot can be effectively used to minimize this type of

stress. As the participants are healthy adults, the following constraint was imposed on the game:

each participant must try to win the game with five or fewer incorrect matches.

5.1.3.1 Results and Discussions

Preliminary experiments demonstrate that the proposed on-line training procedure allows the

robot to learn its optimal assistive behaviours during personalized interactions. Namely, the robot

successfully detects a participant’s user state at every interaction, explores different behaviours,

and is rewarded when its behaviours improve user states.

Figure 28 shows the user states of all ten participants during the two games. One interaction is

defined to include a robot detecting a user’s action (which updates the activity state) as well as


the robot’s reaction during game playing. For example, an interaction may include the robot

detecting a person has flipped over one card and the robot providing instruction to the person to

flip over another card. From Figure 28, it can be seen that the robot was successful at detecting

the participants’ change in user state throughout the activity.
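To make the structure of one interaction concrete, the following minimal Python sketch outlines the sense-act-sense cycle and the reward assignment used in this experiment; the function names and the random stand-in for user state detection are illustrative assumptions, not the thesis implementation.

    import random

    USER_STATES = ["stressed", "neutral", "pleased", "excited"]

    def detect_user_state():
        # Stand-in for the heart-rate-based user state detection (Section 3.2.2.4).
        return random.choice(USER_STATES)

    def run_interaction(choose_behaviour, execute):
        """One interaction: sense the user state, act, sense again, and reward
        the behaviour if it moved the person out of a stressed state."""
        before = detect_user_state()
        behaviour = choose_behaviour(before)      # e.g., Instruction or Help
        execute(behaviour)
        after = detect_user_state()
        reward = 1 if (before == "stressed" and after != "stressed") else 0
        return behaviour, reward

    behaviour, reward = run_interaction(
        lambda s: "Help" if s == "stressed" else "Instruction",
        lambda b: print(f"Robot performs: {b}"))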

Figure 29 provides a more detailed view of two sets of ten different interactions for each of the

participants, i.e., one for each game. The robot was able to explore and determine appropriate

behaviours during game playing utilizing the proposed MAXQ control architecture and on-line

training procedure based on the participants’ user states and activity states. For example, for

Participant A, the robot was able to detect that the person was in a stressed state at interactions 8,

12 and 35, and provided assistance via the Identify and Locate help actions. Similarly, the

Identify action was explored for Participant D at interactions 7 and 15, and for Participant H at

interaction 12. The Recall help action was explored for Participant F at interaction 28. The

Instruction action was also explored and found to be effective at improving user states for

Participant C at interactions 10 and 35; Participant E at interactions 3, 8, and 40; Participant F at

interaction 22; Participant G at interactions 3, 5, 9, and 39; Participant I at interactions 14, 42 and

47; and Participant J at interactions 10 and 68. During the second game, the policy resulted in the

robot performing more exploitation than exploration, due to the decrease in ε. For example, this

was observed with Participants B, E and I, where the Instruction action was repeatedly chosen

during game 2 at interactions 65 and 69; 36 and 40; and 42 and 47, respectively. In general, the

robot was, on average, 70% successful at improving user state with its Instruction and Help

actions during the experiments.
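The exploration-exploitation balance described above is consistent with the standard ε-greedy scheme: with probability ε the robot tries a random behaviour, and otherwise it exploits the behaviour with the highest learned value. A minimal Python sketch, with illustrative action names and values (not the learned values from the experiments), is given below.

    import random

    def epsilon_greedy(q_values, epsilon):
        # Explore with probability epsilon; otherwise exploit the behaviour
        # with the highest learned value.
        if random.random() < epsilon:
            return random.choice(list(q_values))   # explore
        return max(q_values, key=q_values.get)     # exploit

    # Illustrative values for one state; as epsilon decays between games,
    # the highest-valued action (here Instruction) is chosen more often.
    q = {"Instruction": 40.0, "Help": 12.0, "Encouragement": -5.0}
    print(epsilon_greedy(q, epsilon=0.9))   # early: mostly exploration
    print(epsilon_greedy(q, epsilon=0.1))   # later: usually "Instruction"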


Figure 28: Participant user states detected during the memory game

[Figure 28 panels: detected user state (scale 0 to 3) plotted against interaction number for each of Participants A through J.]


Figure 29: Interaction details for all participants

[Figure 29 panels: for each of Participants A through J, two windows of ten interactions (one per game) showing the detected user state (Stressed, Neutral, Pleased, Excited), the game state, and the robot behaviour at each interaction.]

Game state codes: 0 = Zero cards flipped; 1 = One card flipped; 2 = Two cards flipped + No match; 2M = Two cards flipped + Match.

Robot behaviours: Instruction (Flip over card); Encouragement (Flip back cards); Celebration (Remove cards); Help: Localize; Help: Recall; Help: Identify.


Figure 30 shows the rewards for the Flip Over subtask for all ten participants during the experiment to illustrate how the rewarding of actions is implemented. After the two games, one can observe which robot behaviours obtained the highest rewards and thus became the current optimal behaviour for each participant. For the majority of the participants, the Instruction action has the highest rewards, followed by the Help action.

Figure 30: Rewards for the Flip Over subtask

[Figure 30 panels: reward plotted against interaction number for the Instruction and Help actions, for each of Participants A through J.]


Figure 31 shows the rewards for the Help subtask for Participants A and D. For Participant A, at

the end of the two games, the reward for Localize is higher than the rewards for the other Help

actions, and for Participant D, Identify has the highest reward.

Figure 31: Rewards for the Help subtask

5.1.4 HRI Study: Minimizing Task-Induced Stress

The goal of this work is to design robotic interventions focusing on cognitively stimulating

leisure activities in order to strengthen the remaining cognitive abilities of a person and promote

meaningful engagement in these activities. Therefore, the notion of completing the activity is not

as essential as keeping the person stimulated and engaged in the activity. To accomplish this,

herein, a person’s activity-induced stress must be reduced. In general, activity/task-induced stress

is known to result in negative moods and lead to disturbances in motivation (e.g., loss of task

interest) and cognition (e.g., worry) [90]. Moreover, stress has been found to accelerate the progression of dementia symptoms [91]. Hence, a social robot motivator is utilized to keep a person engaged

in a cognitively stimulating activity so that he/she may potentially receive the positive benefits

from the directed stimulus.

A 2nd HRI study was conducted to verify the use of Brian 2.0 in minimizing activity-induced

stress during cognitively stimulating activities. In this study, the proposed MAXQ online training

procedure was performed during HRI. The study consisted of one-on-one interaction scenarios

between six healthy adult participants (21 to 35 years of age) and the robot in a laboratory

setting. The memory game was used as the stimulating activity. All participants had past

experience with the game.

The specific aim of this study is to assess and validate the robot’s influence on the user state of a

person during HRI-based person-directed activities. Namely, within this specific aim the

following hypothesis was addressed: The robot Brian 2.0 with social interaction capabilities will

minimize activity-induced stress during a specific cognitively stimulating activity.



The study simulated a scenario in which the participants could experience activity-induced stress

in order to demonstrate that, in such situations, the robot can be effectively used to minimize this

type of stress. As the participants are healthy adults, the following constraint is imposed on the

game: each participant must try to win the game with five or fewer incorrect matches.

5.1.4.1 Brian 2.0's Influence on User State

The robot’s ability to improve user state is evaluated by examining the following two

performance indices: (i) the duration of the game (measured as percentage of game rounds) that

the participant is in a stressed state, and (ii) the average number of times during game playing

that the robot is actively able to improve user state when the participant is stressed. The 1st index

is calculated to validate the hypothesis, and the 2nd index is calculated to further investigate the effectiveness of the robot's behaviours.
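Both indices can be computed mechanically from a per-round log of detected user states and robot actions. The Python sketch below illustrates the calculation under an assumed log format (a list of per-round tuples), which is not the thesis data structure.

    # Illustrative computation of the two performance indices. Each log entry
    # is an assumed tuple: (user_state, robot_acted, user_state_after_action).
    def performance_indices(rounds):
        n = len(rounds)
        stressed = sum(1 for s, _, _ in rounds if s == "stressed")
        pct_stressed = 100.0 * stressed / n                  # index (i)
        improvements = sum(1 for s, acted, s_next in rounds  # index (ii)
                           if s == "stressed" and acted and s_next != "stressed")
        return pct_stressed, improvements

    log = [("stressed", True, "neutral"), ("neutral", False, "neutral"),
           ("stressed", True, "pleased"), ("pleased", False, "pleased")]
    print(performance_indices(log))   # (50.0, 2)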

5.1.4.2 Study Procedure

An AB test design was used to evaluate the ability of the robot to detect and improve the task-

based user state of a person during the memory game. During the baseline scenario, experimental

set A, a table and chair set-up was used, where the card game was arranged in the center of the

table, in front of the person, Figure 32a. Each participant was instructed to play the memory card

game with the aforementioned constraint imposed. Conversely, for the HRI scenario,

experimental set B, the sessions were designed to be the exact same as in the baseline except that

the robot was present to provide encouragement, help, and instructions during game playing,

Figure 32b. A baseline heart rate was obtained for each participant prior to each experiment. All

tests were recorded using a small webcam in order to monitor the participants indirectly to

relieve any potential psychological pressure instilled by the presence of an observer.

Figure 32: (a) Baseline Scenario and (b) HRI Scenario


5.1.4.3 Results and Discussions

In order to calculate the performance indices defined in Section 5.1.4.1, the changes in user state

during the baseline scenario and the HRI scenario were analyzed. In particular, the task-based

user state of a participant was detected and recorded at every round of the game. Figure 33

presents the results for the baseline and HRI scenarios with respect to the percentage of game

rounds that a participant was in a stressed state.

Figure 33: The percentage of the interaction that a participant is stressed

In the baseline scenario, participants spent, on average, 42.7% (σ = 11.5%) of the total number of

rounds in a stressed state. In the HRI scenario, participants spent an average of 21.8% (σ = 9.6%)

of the total number of rounds in a stressed state. To validate the hypothesis, a one-tailed, paired t-

test was performed. A statistically significant difference (p<0.005) between the mean percentages of rounds spent in a stressed state in the two sessions was found, confirming that the number of game rounds spent in a stressed state was noticeably lower in the HRI scenario and supporting the hypothesis.

Figure 34 shows a comparison of the percentage of rounds in the HRI scenario for a stressed or

positive state (i.e., pleased or excited). On average, the participants were in a positive state

57.2% (σ = 5.6%) of the total number of rounds compared to the 21.8% for the stressed state.

This result shows that the robot’s behaviours can help promote positive user states during game



playing. The most effective robot behaviours in relieving stress were instructions and helping a

participant locate a corresponding card pair. It was observed that when utilizing these

behaviours, the robot was, on average, able to directly improve the user state a total of four times per participant during game playing when the participant was stressed. Examples of the robot's behaviour

during the interactions are presented in Figure 35.

Figure 34: Comparison of the percentage of the interaction that a participant is in a stressed or positive state during the HRI scenario

Figure 35: Examples of robot behaviours during interactions: (a) Robot providing help in a neutral emotional state, (b) Robot providing celebration in a happy state after a correct match, and (c) Robot providing instruction in a sad state when game disengagement occurs.

5.1.4.4 Participant Survey Results

A post-experiment survey was administered to the participants after the HRI scenario to obtain

feedback on the robot’s behaviour during game playing. The types of questions that were asked

in the survey pertained to the robot’s social intelligence attributes as well as the participants’



overall impressions of the robot. The participants were asked to choose their responses from a

list of robot behaviours.

The participants were first asked to identify the robot behaviours they found were the most

effective at relieving stress during game playing. Table 22 summarizes their responses based on

a ranking of the total number of responses for each behaviour. The robot providing instructions

was ranked the highest by the participants, which concurs with the quantitative experimental

results discussed above.

Table 22: Robot Behaviours Effective at Relieving Stress

Ranking | Robot Behaviour
1st     | Providing instructions
2nd     | Celebrating with you when you get a match
3rd     | Prompting you to continue the game
4th     | Encouraging you when you do not get a match
5th     | Providing help

With respect to their overall impressions of the robot, ninety percent of the participants stated

that the robot’s life-like appearance and its ability to express different emotions via facial

expressions and tone of voice while providing instruction, celebration, and encouragement

phrases helped keep them engaged and interested in the game. Seventy percent stated that they

liked the fact that the robot provided companionship by just being present during the activity.

5.2 Meal-time Scenario

For the Meal-time Assistance scenario, the performance of the control architecture was validated

and HRI experiments were also performed.

5.2.1 Performance Assessment

Two sets of experiments were performed in order to individually evaluate the performance of: (i)

the control architecture and (ii) its learning-based decision making capabilities. The following

subsections will discuss the findings from both sets.


5.2.1.1 Control Architecture

Experiments were conducted to assess the performance of the proposed control architecture for

the Meal-assistance robot. Namely, the following three key modules were evaluated: (i) Activity

State, (ii) User State, and (iii) Behaviour Deliberation.

Activity State Module

For the Activity State module, a series of repeatability tests were conducted to evaluate the

module’s ability to detect changes in activity state. Each activity state parameter was tested by

“consuming” a meal multiple times. Pasta was used as the main dish food, fresh fruit was used as

the side dish food, and water was used as the beverage.

User State Module

The User State module was tested utilizing images from two validated facial expression

databases as well as real-time natural expressions from three volunteer participants. To test the

recognition for the user state angry, the User State module was tested with 115 front-facing

images of 31 different subjects displaying an angry expression. 88 images of 9 female and 13

male subjects were chosen from the Cohn-Kanade facial expression database [92]. 27 images of

9 female subjects were also chosen from the Japanese Female Facial Expression (JAFFE)

database [93]. Similarly, in order to test the recognition for the user state happy, the User State

module was tested with 124 front-facing images of 33 different subjects displaying a happy

expression. 96 images of 10 female and 14 male subjects were chosen from the Cohn-Kanade

facial expression database [92]. 28 images of 9 female subjects were chosen from the JAFFE

database [93]. All images used to test the user states angry and happy were coded with a

validated emotional label, which was determined by the database’s own coding system. The

distracted user state was tested using 824 images of the three participants. These participants

were instructed to look toward or away from the robot when directed to do so.

Behaviour Deliberation Module

Lastly, the Behaviour Deliberation module was tested via an in-lab HRI experiment consisting of

six healthy adults (21 to 33 years old). Participants were instructed to mimic eating of two meals

while interacting with Brian 2.0. Figure 36 shows the experimental setup. Participants were

asked to mimic eating by removing the food off the plates and out of the cup and pretending to


eat/drink by bringing food to their mouths. This experiment was used to test the robot behaviours

determined for the proposed scenario. Behaviours as determined from offline training were used

in this experiment.

Figure 36: Experimental Setup for Performance Assessment (Meal-time Scenario)

Results and Discussions

Results from this set of experiments are presented in Tables 23 to 27. As shown in Table 23 and Table 24, high sensitivity and specificity are achieved for all the tested activity state

parameters during the repeatability tests. A significant number of false negatives related to the

consumption levels were caused by undesired forces applied to the dishware during the act of

obtaining food with the utensil. In general, when food is scooped or pierced with a utensil, there

is a pressure applied to the dish. If there is prolonged contact (i.e., 5 seconds), the module

incorrectly detects that there is food added to the dish. In this test, the side dish was most

sensitive to these actions. Errors related to weight change may be attributed to there being an

insufficient amount of food/drink obtained from the dish or cup. In these tests, the system could

sense weight changes of 10g or more. Utensil location errors are the result of the inaccurate face

location data, which may occur when the User State module cannot detect the location of the

face. In this test, the person looked downwards while eating, causing the camera view of the

facial features to be skewed. Namely, the User State module could not detect the facial features

at this perspective and thus temporarily lost detection of the face. In Table 24, the higher


number of false positives for the detection of weight change in the side dish are attributed to

vibrations caused by aggressively placing the cup into its holder.

Table 23: Performance Results for Activity State Module (Meal-time Scenario): Sensitivity

Activity State Parameter           | No. of False Negatives | No. of True Positives | Sensitivity
Consumption Levels - Main Dish     | 5  | 402  | 99%
Consumption Levels - Beverage      | 11 | 1476 | 99%
Consumption Levels - Side Dish     | 25 | 567  | 96%
Weight Change - Main Dish          | 0  | 121  | 100%
Weight Change - Beverage           | 3  | 100  | 97%
Weight Change - Side Dish          | 2  | 106  | 98%
Beverage - Picked up from tray     | 0  | 103  | 100%
Utensil Location - At tray         | 1  | 100  | 99%
Utensil Location - At mouth        | 1  | 99   | 99%
Utensil Movement - Not moving      | 3  | 803  | 100%
Utensil Movement - Moving to mouth | 2  | 230  | 99%
Utensil Movement - Moving to tray  | 0  | 243  | 100%

Table 24: Performance Results for Activity State Module (Meal-time Scenario): Specificity

Activity State Parameter       | No. of False Positives | No. of True Negatives | Specificity
Consumption Levels - Main Dish | 0  | 2079 | 100%
Consumption Levels - Beverage  | 0  | 999  | 100%
Consumption Levels - Side Dish | 0  | 1894 | 100%
Weight Change - Main Dish      | 0  | 314  | 100%
Weight Change - Beverage       | 3  | 329  | 99%
Weight Change - Side Dish      | 27 | 327  | 92%
Beverage - Picked up from tray | 0  | 332  | 100%
Utensil Location - At tray     | 1  | 99   | 99%
Utensil Location - At mouth    | 1  | 100  | 99%

Table 25 and Table 26 show high sensitivity and specificity for detecting the angry, happy, or

distracted user state. False negatives in detecting the correct facial expression were caused by the

limitation of the User State module in detecting very subtle expressions. This can be corrected by

changing certain detection thresholds; however, it was found that adjusting these thresholds to

detect subtle expressions also results in the detection of more false positives. Errors in detecting


the distracted user state are attributed to losing the tracking or detection of key facial features.

For example, the module may lose the location of an eye if the person tilts his/her head too far in

one direction such that the User State module can no longer detect the eye at that perspective.

Table 25: User State Module Performance (Meal-time Scenario): Sensitivity

User State | No. of False Negatives | No. of True Positives | Sensitivity
Angry      | 13 | 102 | 89%
Happy      | 14 | 110 | 89%
Distracted | 78 | 768 | 91%

Table 26: User State Module Performance (Meal-time Scenario): Specificity

User State | No. of False Positives | No. of True Negatives | Specificity
Angry      | 1  | 123  | 99%
Happy      | 4  | 111  | 97%
Distracted | 60 | 1632 | 96%
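The sensitivity and specificity values reported in Tables 23 to 26 follow from the standard definitions, sensitivity = TP/(TP+FN) and specificity = TN/(TN+FP). A short Python sketch of the calculation, using the Angry user state counts from Tables 25 and 26 as the example, is given below.

    def sensitivity(tp, fn):
        # Proportion of actual positives that are correctly detected.
        return tp / (tp + fn)

    def specificity(tn, fp):
        # Proportion of actual negatives that are correctly rejected.
        return tn / (tn + fp)

    # Example: the Angry user state (Tables 25 and 26).
    print(f"{sensitivity(tp=102, fn=13):.0%}")   # 89%
    print(f"{specificity(tn=123, fp=1):.0%}")    # 99%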

As seen in Table 27, the Behaviour Deliberation module has a high success rate in choosing the

appropriate robot behaviour based on the person’s current action. Sources of error that were

found during this experiment include: (i) the person resting his/her hand on the dish causing a

false weight change, (ii) the person covering the infrared light with their hand causing a false

utensil location reading, (iii) vibration from slamming down the cup causing a false weight

change in the side dish, and (iv) the person leaving the utensil on the dish causing a false

consumption level reading.

Table 27: Performance of Behaviour Deliberation Module (Meal-time Scenario)

Expected Robot Behaviour   | No. of Trials | No. of Failures | Success Rate
Meal Start                 | 12  | 0 | 100%
Obtain food from main dish | 150 | 3 | 98%
Pick up beverage           | 78  | 2 | 97%
Obtain food from side dish | 102 | 5 | 95%
Eat food                   | 183 | 4 | 98%
Drink beverage             | 59  | 2 | 97%
Monitor                    | 2   | 2 | 100%
Meal End                   | 12  | 0 | 100%


5.2.1.2 Learning-based Decision Making

On-line training was utilized to adapt the system on the fly to the preferred motivation style of a

person eating a meal. Five healthy adult participants (21 to 33 years old) interacted with the

robot during two meals. The experimental setup is similar to the one utilized for the evaluation of

the offline training procedure (i.e. Figure 36). In these experiments, the first level of

customization is performed. Namely, during HRI, the robot learns the preferred motivation style

for each 2nd level subtask. A successful action is defined to be when a person completes the

requested task. For example, if the robot asks a person to obtain food from the main dish, then a

successful action is when the person complies with the robot’s request. For this set of

experiments, the exploration policy for all subtasks is set to gradually decrease from 1 to 0 over the first 15 successful actions for each subtask.
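One way to realize such a schedule is a linear decay of the exploration rate per subtask; the Python sketch below is illustrative (the bookkeeping is an assumption, not the thesis code).

    # Illustrative per-subtask exploration schedule: epsilon decreases
    # linearly from 1 to 0 over the first 15 successful actions.
    def exploration_rate(successful_actions, horizon=15):
        return max(0.0, 1.0 - successful_actions / horizon)

    print(exploration_rate(0))    # 1.0  (fully explorative at the start)
    print(exploration_rate(4))    # ~0.73 (still explores often)
    print(exploration_rate(15))   # 0.0  (fully exploitative)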

Results and Discussions

The experimental results demonstrate the ability of the proposed on-line training procedure to

allow the robot to learn its optimal motivation styles for each participant during HRI. Namely,

the robot successfully explores different behaviours at every interaction based on the current

exploration policy, and is rewarded when its behaviours enable the person to complete the requested action. Figures 37 to 39 show the details of the interactions for the main dish, beverage, and side dish, respectively, during the 1st and 2nd meals. The left column of each figure shows ten interactions during the 1st meal and the right column shows ten interactions during the 2nd meal. It can be seen that there is, as predicted, explorative behaviour during the first few interactions with the dish or beverage, where the robot is trying different behaviours. As the HRI scenario progressed into the 2nd meal, robot behaviours became more exploitative (i.e., >50% exploitative actions), which can be seen in the later interactions for all participants except Participants A, B, and C. For Participant A, the robot's behaviour is still very explorative for the Pick up beverage subtask, since there had only been four successful actions prior to the interactions shown for the 2nd meal. For Participants A, B, and C, the robot's behaviour is still very explorative (i.e., >40% explorative actions) for the Drink beverage subtask.


Figure 37: Details of interactions involving the main dish

[Figure 37 panels: for each of Participants A through E, two windows of ten interactions with the main dish (one per meal), showing the person's meal state (Needs attention, Eating/Drinking, Obtained Food/Drink, Idle) and the subtask requested by the robot at each interaction.]

Subtask codes: M = Obtain food from main dish; B = Pick up beverage; S = Obtain food from side dish; E = Eat food; D = Drink beverage. Persuasion styles: Encourage, Cue, Orient.


Figure 38: Details of interactions involving the beverage



Figure 39: Details of interactions involving the side dish

Figures 40 to 44 show the rewards of all 2nd level subtasks during the experiment to illustrate how the rewarding of actions is implemented. After the two meals, one can observe which robot behaviours obtained the highest rewards and thus became the current optimal behaviours for each participant. For example, all of the participants preferred the persuasion style Encourage for the subtask Obtain food from main dish (Figure 40). For the Pick up beverage subtask (Figure 41), Encourage was the most effective for Participants C, D, and E, and Orient was the most effective for Participant A. For Participant B, no preference of persuasion style was indicated for this subtask; namely, at the end of the two meals, the reward for all three robot actions was the same. Participants B to E preferred the Encourage persuasion style for the Obtain food from side dish subtask (Figure 42), whereas Participant A preferred the Orient actions more for



this subtask. For the Eat food subtask (Figure 43), Participants B to E preferred the Encourage

persuasion style and Participant A preferred the Cue persuasion style. Lastly, for the Drink

beverage subtask (Figure 44), the Cue style was preferred by Participants B-D and the

Encourage style was preferred by Participants A and E.

Figure 40: Rewards for the Obtain food from main dish subtask

Figure 41: Rewards for the Pick up beverage subtask



Figure 42: Rewards for the Obtain food from side dish subtask

Figure 43: Rewards for the Eat food subtask

Figure 44: Rewards for the Drink beverage subtask



5.2.2 Human-Robot Interaction Studies

In order to investigate the potential benefits of having the human-like embodied Brian 2.0

provide meal-assistance, an HRI and HCI comparison experiment was conducted. The specific

aim of this experiment was to study the users’ acceptance of the robot as well as the effects of

embodiment for the socially assistive robot Brian 2.0 in the Meal-time Scenario. The study

consisted of one-on-one interaction scenarios between six healthy adult participants (21 to 33

years of age) and the robot in a laboratory setting. An AB test design was used to evaluate the

users' acceptance of the meal-assistance system in two forms: (i) a screen agent presented on the computer screen, and (ii) an embodied socially assistive robot. Namely, within this specific aim the

following hypothesis was addressed: The embodied robot Brian 2.0 with both verbal and non-

verbal (e.g., facial expressions, body language) communication means will have improved user

acceptance over a screen agent.

During the baseline scenario, experimental set A, a table and chair set-up was used, where the

meal tray with food was placed on the table in front of the person and a still image of Brian 2.0

was shown on a computer screen, Figure 45a. Only verbal communication took place with

respect to the robot. Conversely, for the HRI scenario, experimental set B, the sessions were

designed to be the exact same as in the baseline except the robot Brian 2.0 was used, Figure 45b.

For both scenarios, each participant was invited to eat a meal while interacting with Brian 2.0.

Pasta and fresh fruit were utilized as “food” to be consumed by the participants during HRI.

Water was utilized as the beverage. Participants were asked to mimic eating and drinking similar

to the previous experiments. After each scenario, participants were directed to fill out a users’

acceptance questionnaire.

Figure 45: (a) Baseline Scenario A and (b) Scenario B for HRI Study (Meal-time Scenario)


The questionnaire utilized in this experiment is based on the technology acceptance model

developed by Heerink et al. [94] to evaluate users’ acceptance of social robots for elderly users.

The questions are grouped into eleven constructs, which are defined in Table 28. The complete

questionnaire is presented in Table 29. Participants were instructed to indicate their agreement

with each statement using a five point Likert scale (5=strong agreement, 3=neutral, 1=strong

disagreement). The scale is inverted for statements in the Anxiety construct (i.e. 1=strong

agreement, 3=neutral, 5=strong disagreement).
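Scoring with the inverted Anxiety scale amounts to a simple transform of the raw Likert responses; the Python sketch below is an illustrative implementation, not the scoring procedure used in the study.

    # Illustrative Likert scoring: Anxiety (ANX) items are reverse-scored so
    # that a higher score is always the more favourable outcome.
    def score_item(construct, response):              # response in 1..5
        return 6 - response if construct == "ANX" else response

    def construct_mean(construct, responses):
        scored = [score_item(construct, r) for r in responses]
        return sum(scored) / len(scored)

    print(construct_mean("ANX", [1, 2, 1, 1]))    # low anxiety -> high score
    print(construct_mean("PENJ", [4, 5, 4, 4]))   # enjoyment scored directly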

Table 28: Construct Definitions [94]

Code  | Construct               | Definition
ANX   | Anxiety                 | Evoking anxious or emotional reactions when using the system.
ATT   | Attitude                | Positive or negative feelings about the appliance of the technology.
FC    | Facilitating conditions | Objective factors in the environment that facilitate using the system.
ITU   | Intention to use        | The outspoken intention to use the system over a longer period in time.
PAD   | Perceived adaptability  | The perceived ability of the system to be adaptive to the changing needs of the user.
PENJ  | Perceived enjoyment     | Feelings of joy or pleasure associated by the user with the use of the system.
PEOU  | Perceived ease of use   | The degree to which the user believes that using the system would be free of effort.
PS    | Perceived sociability   | The perceived ability of the system to perform sociable behaviour.
PU    | Perceived usefulness    | The degree to which a person believes that using the system would enhance his or her daily activities.
SP    | Social presence         | The experience of sensing a social entity when interacting with the system.
Trust | Trust                   | The belief that the system performs with personal integrity and reliability.


Table 29: Users’ Acceptance Questionnaire [94]

Construct | No. | Statement
ANX   | 1  | If I should use the robot, I would be afraid to make mistakes with it
ANX   | 2  | If I should use the robot, I would be afraid to break something
ANX   | 3  | I find the robot scary
ANX   | 4  | I find the robot intimidating
ATT   | 5  | I think it’s a good idea to use the robot
ATT   | 6  | The robot would make my life more interesting
ATT   | 7  | It’s good to make use of the robot
FC    | 8  | I have everything I need to make good use of the robot.
FC    | 9  | I know enough of the robot to make good use of it.
ITU   | 10 | I think I’ll use the robot again
ITU   | 11 | I am certain to use the robot again
ITU   | 12 | I’m planning to use the robot again
PAD   | 13 | I think the robot can be adaptive to what I need
PAD   | 14 | I think the robot will only do what I need at that particular moment
PAD   | 15 | I think the robot will help me when I consider it to be necessary
PENJ  | 16 | I enjoy the robot talking to me
PENJ  | 17 | I enjoy doing things with the robot
PENJ  | 18 | I find the robot enjoyable
PENJ  | 19 | I find the robot fascinating
PENJ  | 20 | I find the robot boring
PEOU  | 21 | I think I will know quickly how to use the robot
PEOU  | 22 | I find the robot easy to use
PEOU  | 23 | I think I can use the robot without any help
PEOU  | 24 | I think I can use the robot when there is someone around to help me
PEOU  | 25 | I think I can use the robot when I have a good manual
PS    | 26 | I consider the robot a pleasant conversational partner
PS    | 27 | I find the robot pleasant to interact with
PS    | 28 | I feel the robot understands me
PS    | 29 | I think the robot is nice
PU    | 30 | I think the robot is useful to me
PU    | 31 | It would be convenient for me to have the robot
PU    | 32 | I think the robot can help me with many things
SP    | 33 | When interacting with the robot I felt like I’m talking to a real person
SP    | 34 | It sometimes felt as if the robot was really looking at me
SP    | 35 | I can imagine the robot to be a living creature
SP    | 36 | I often think the robot is not a real person.
SP    | 37 | Sometimes the robot seems to have real feelings
Trust | 38 | I would trust the robot if it gave me advice
Trust | 39 | I would follow the advice the robot gives me


5.2.2.1 Results and Discussions

Table 30 shows the descriptive statistics for Scenarios A and B. In general, the proposed robot in

Scenario B, on average, scored high on questions within the Attitude, Intention to Use, Perceived Enjoyment, and Perceived Sociability constructs. Table 30 also shows the results from a paired two-tail t-test, which was performed to evaluate the differences between the results of both scenarios. Even though significance testing should be conducted with a larger group of participants, the results here indicate whether any potential relationships exist. Statements that show a t-

test result of “n/a” have exactly the same agreement scores for both scenarios. Of particular note in Table 30 are the questions that have a higher agreement score in Scenario B than in Scenario A within this group, with a statistical significance of p≤0.20. Based on the results, the

overall hypothesis is validated for these particular questions. Namely, the embodied robot Brian

2.0 with both verbal and non-verbal communication means has improved user acceptance over a

screen agent with respect to these particular statements.


Table 30: Users’ Acceptance Results
(Each scenario column lists Min / Max / Mean / Std. Dev.; Scenario A: HCI, Scenario B: HRI)

Construct | No. | Scenario A (HCI)      | Scenario B (HRI)      | t-stat | p-value (two-tail)
ANX   | 1  | 1.0 / 4.0 / 1.8 / 1.2 | 1.0 / 5.0 / 2.7 / 1.6 | -2.712 | 0.04
ANX   | 2  | 1.0 / 4.0 / 2.7 / 1.5 | 1.0 / 5.0 / 2.8 / 1.7 | -0.307 | 0.77
ANX   | 3  | 1.0 / 3.0 / 1.7 / 1.0 | 1.0 / 4.0 / 2.7 / 1.2 | -1.732 | 0.14
ANX   | 4  | 1.0 / 4.0 / 1.8 / 1.2 | 1.0 / 4.0 / 2.2 / 1.2 | -0.542 | 0.61
ATT   | 5  | 3.0 / 5.0 / 4.2 / 0.8 | 4.0 / 5.0 / 4.3 / 0.5 | -0.542 | 0.61
ATT   | 6  | 3.0 / 5.0 / 4.3 / 0.8 | 3.0 / 5.0 / 4.2 / 0.8 | -0.542 | 0.61
ATT   | 7  | 3.0 / 5.0 / 4.2 / 0.8 | 3.0 / 5.0 / 4.2 / 0.8 | 0.000  | 1.00
FC    | 8  | 2.0 / 4.0 / 3.2 / 1.0 | 1.0 / 4.0 / 3.2 / 1.3 | 0.000  | 1.00
FC    | 9  | 1.0 / 4.0 / 2.7 / 1.2 | 1.0 / 5.0 / 3.5 / 1.6 | -1.746 | 0.14
ITU   | 10 | 2.0 / 5.0 / 3.7 / 1.2 | 3.0 / 5.0 / 4.5 / 0.8 | -1.536 | 0.19
ITU   | 11 | 3.0 / 5.0 / 3.8 / 1.0 | 3.0 / 5.0 / 4.3 / 0.8 | -1.464 | 0.20
ITU   | 12 | 2.0 / 5.0 / 3.7 / 1.2 | 3.0 / 5.0 / 4.2 / 1.0 | -0.889 | 0.42
PAD   | 13 | 3.0 / 5.0 / 3.7 / 0.8 | 3.0 / 4.0 / 3.5 / 0.5 | 0.542  | 0.61
PAD   | 14 | 2.0 / 4.0 / 3.3 / 1.0 | 1.0 / 5.0 / 3.5 / 1.4 | -0.542 | 0.61
PAD   | 15 | 2.0 / 5.0 / 3.7 / 1.2 | 3.0 / 5.0 / 3.8 / 0.8 | -0.542 | 0.61
PENJ  | 16 | 3.0 / 5.0 / 4.2 / 0.8 | 4.0 / 5.0 / 4.3 / 0.5 | -0.542 | 0.61
PENJ  | 17 | 2.0 / 5.0 / 3.8 / 1.2 | 4.0 / 5.0 / 4.3 / 0.5 | -1.168 | 0.30
PENJ  | 18 | 2.0 / 5.0 / 3.8 / 1.2 | 3.0 / 5.0 / 4.3 / 0.8 | -1.464 | 0.20
PENJ  | 19 | 2.0 / 5.0 / 3.2 / 1.2 | 3.0 / 5.0 / 4.3 / 0.8 | -2.445 | 0.06
PENJ  | 20 | 1.0 / 4.0 / 2.3 / 1.4 | 1.0 / 3.0 / 1.7 / 0.8 | 1.348  | 0.24
PEOU  | 21 | 2.0 / 5.0 / 3.7 / 1.0 | 2.0 / 5.0 / 3.8 / 1.2 | -1.000 | 0.36
PEOU  | 22 | 3.0 / 5.0 / 4.0 / 0.6 | 1.0 / 5.0 / 3.8 / 1.5 | 0.415  | 0.70
PEOU  | 23 | 1.0 / 5.0 / 3.7 / 1.5 | 1.0 / 5.0 / 3.2 / 1.6 | 1.464  | 0.20
PEOU  | 24 | 4.0 / 5.0 / 4.8 / 0.4 | 4.0 / 5.0 / 4.8 / 0.4 | n/a    | n/a
PEOU  | 25 | 3.0 / 5.0 / 4.7 / 0.8 | 3.0 / 5.0 / 4.7 / 0.8 | n/a    | n/a
PS    | 26 | 2.0 / 5.0 / 3.7 / 1.2 | 3.0 / 5.0 / 4.2 / 0.8 | -1.464 | 0.20
PS    | 27 | 2.0 / 5.0 / 3.7 / 1.2 | 3.0 / 5.0 / 4.2 / 0.8 | -1.464 | 0.20
PS    | 28 | 2.0 / 4.0 / 2.5 / 0.8 | 2.0 / 4.0 / 3.2 / 0.8 | -3.162 | 0.03
PS    | 29 | 2.0 / 5.0 / 4.0 / 1.1 | 3.0 / 5.0 / 4.2 / 0.8 | -1.000 | 0.36
PU    | 30 | 2.0 / 5.0 / 3.5 / 1.2 | 2.0 / 5.0 / 3.3 / 1.2 | 0.542  | 0.61
PU    | 31 | 2.0 / 5.0 / 3.3 / 1.5 | 2.0 / 5.0 / 3.7 / 1.0 | -0.791 | 0.47
PU    | 32 | 2.0 / 5.0 / 2.8 / 1.2 | 1.0 / 5.0 / 3.0 / 1.4 | -0.542 | 0.61
SP    | 33 | 2.0 / 4.0 / 3.0 / 0.9 | 1.0 / 5.0 / 2.7 / 1.9 | 0.598  | 0.58
SP    | 34 | 2.0 / 5.0 / 3.7 / 1.0 | 2.0 / 5.0 / 3.7 / 1.4 | 0.000  | 1.00
SP    | 35 | 1.0 / 5.0 / 2.7 / 1.6 | 1.0 / 5.0 / 2.8 / 1.8 | -0.349 | 0.74
SP    | 36 | 2.0 / 5.0 / 4.3 / 1.2 | 1.0 / 5.0 / 3.2 / 1.7 | 1.941  | 0.11
SP    | 37 | 1.0 / 4.0 / 3.0 / 1.1 | 2.0 / 4.0 / 3.3 / 0.8 | -0.598 | 0.58
Trust | 38 | 3.0 / 4.0 / 3.7 / 0.5 | 3.0 / 5.0 / 4.0 / 0.6 | -1.581 | 0.18
Trust | 39 | 3.0 / 4.0 / 3.5 / 0.5 | 3.0 / 5.0 / 3.8 / 0.8 | -1.581 | 0.18


5.2.2.2 Participant Survey Results

A post-experiment survey was administered to the participants after the HRI scenario to obtain

feedback on the robot’s behaviour during the Meal-time Scenario. The types of questions that

were asked in the survey pertained to the robot’s social intelligence attributes as well as the

participants’ overall impressions of the robot. The participants were asked to choose their

responses from a list of robot behaviours.

The participants were first asked to identify the robot behaviours they found were the most

effective at engaging them in the meal assistance activity. They were also asked what

characteristics of the robot they liked the most and helped keep them engaged in the interaction

during the meal. Table 31 and Table 32 summarize their responses for both questions based on a

ranking of the total number of responses for each behaviour and characteristic, respectively.

Table 31: Robot Behaviours Effective at Engaging the Person in the Meal-time Scenario

Ranking | Robot Behaviour
1st     | Providing verbal prompts/cues
2nd     | Providing jokes
3rd     | Providing comments about the food
4th     | Providing non-verbal prompts/cues (e.g. robot pointing to the dish)
4th     | Robot’s facial expressions in reaction to your actions
5th     | Providing encouragement
5th     | Providing orienting statements
6th     | Providing non-meal-related commentary about the interaction

Table 32: Most Liked Robot Characteristics

Ranking   Robot Characteristic
1st       The robot's human voice
2nd       The companionship the robot provides by just being there
3rd       The robot's life-like appearance and demeanour
4th       The robot expressing different emotions through facial expressions and different tones of voice


Chapter 6 Conclusion

6.1 Summary of Contributions

The primary contributions of this work are summarized in the following subsections.

6.1.1 Control Architecture for Socially Assistive Robots

A novel HRI control architecture has been developed to enable socially assistive robots to

effectively engage a person in cognitively stimulating person-centred activities. A modular

design approach has been applied to the overall control architecture, allowing for the addition

and/or substitution of different sensor modalities as needed based on the intended activity.

During HRI, the control architecture monitors the person's user state and behaviour throughout the activity and adapts the robot's emotion-based behaviour to the current interactive scenario. This control architecture has been successfully applied to the Memory Game and Meal-time HRI scenarios.
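As an illustration of this modular design principle, the sketch below shows how sensor modalities could share a common interface so that inputs can be added or substituted per activity. The interface and sensor names here are hypothetical and do not reflect the architecture's actual implementation.

```python
# Hypothetical sketch of the modular sensory design: each modality
# implements the same interface, so modules can be swapped per activity.
from abc import ABC, abstractmethod

class SensorModule(ABC):
    @abstractmethod
    def observe(self) -> dict:
        """Return this modality's contribution to the user/activity state."""

class HeartRateSensor(SensorModule):
    def observe(self) -> dict:
        return {"heart_rate_bpm": 72}          # placeholder reading

class FacialExpressionSensor(SensorModule):
    def observe(self) -> dict:
        return {"facial_expression": "smile"}  # placeholder classification

def build_user_state(modules: list) -> dict:
    # The rest of the architecture depends only on the shared interface,
    # so adding or substituting a modality leaves this aggregation unchanged.
    state = {}
    for module in modules:
        state.update(module.observe())
    return state

state = build_user_state([HeartRateSensor(), FacialExpressionSensor()])
```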

6.1.2 Learning-based Robot Assistive Behaviours

A learning-based decision making module has been developed for the proposed control architecture to determine the robot's effective assistive behaviours during HRI. The module is composed of two layers: (i) the Knowledge Clarification layer and (ii) the Intelligence layer. The role of the Knowledge Clarification layer is to clarify the current state of the activity and the person. Based on the current assistive interactive scenario, the Intelligence layer then uses the MAXQ hierarchical reinforcement learning (HRL) technique to determine the robot's behaviour. The HRL approach provides the robot with the ability to: (i) learn appropriate assistive behaviours based on the structure of the activity, and (ii) personalize an interaction based on a person's behaviour or user state during HRI. The learning-based decision making module has been successfully applied to the Memory Game and Meal-time HRI scenarios.

6.1.2.1 MAXQ for Assistive HRI Scenarios

To date, this work presents the first application of MAXQ to assistive HRI scenarios. The MAXQ technique is a more efficient solution than the traditional Q-learning approach for multimodal HRI, as the latter requires the exploration of a large number of states and actions and an extensive amount of experience to learn the optimal policy. MAXQ solves the reinforcement learning problem more efficiently for assistive scenarios by decomposing the overall assistive problem into a set of sub-problems. MAXQ also supports temporal abstraction, state abstraction, and subtask abstraction, which are important in the decision making process for the socially assistive robot in an HRI scenario.
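To make the decomposition concrete, the sketch below implements the core MAXQ recursion, V(i, s) = max_a [ V(a, s) + C(i, s, a) ], over a small hypothetical subtask hierarchy; the subtask names are illustrative only and do not reflect the thesis's actual state or action spaces.

```python
# Minimal sketch of the MAXQ value decomposition (Dietterich).
# Subtask names are hypothetical examples for an assistive interaction.
from collections import defaultdict

class MaxQNode:
    def __init__(self, name, children=None):
        self.name = name
        self.children = children or []   # no children => primitive action
        self.V = defaultdict(float)      # learned expected reward (primitive)
        self.C = defaultdict(float)      # completion values C(i, s, a)

    def value(self, state):
        # Primitive node: V(i, s) is its learned one-step expected reward.
        if not self.children:
            return self.V[state]
        # Composite node: V(i, s) = max_a [ V(a, s) + C(i, s, a) ].
        return max(child.value(state) + self.C[(state, child.name)]
                   for child in self.children)

# Hypothetical hierarchy: a root assistive task decomposed into
# prompting subtasks that ground out in primitive robot behaviours.
verbal = MaxQNode("verbal_prompt")
point  = MaxQNode("point_to_object")
prompt = MaxQNode("provide_prompt", children=[verbal, point])
praise = MaxQNode("provide_encouragement")
root   = MaxQNode("assist_user", children=[prompt, praise])

print(root.value("user_idle"))   # evaluates the recursion from the root
```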

6.1.2.2 Affect-based Learning

In this work, affect-based learning is an essential part of the personalization of HRI. In affect-based learning, the robot learns its optimal assistive behaviours based on the person's affective state during HRI. This is a novel addition to the decision making processes of socially assistive robots that engage individuals in socially and/or cognitively stimulating activities; typically, robots developed for these purposes adapt their behaviours based only on task performance. The aim in utilizing affect-based learning is to select the robot's behaviours so as to maintain positive affective states during interactions. It is postulated that this will, in turn, enable the person to be more engaged in the activity.
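For illustration, an affect-based reward signal of this kind could combine a task-performance term with an affect term, so that behaviours leading to positive user states accrue more value. The function below is a hypothetical sketch with made-up weights, not the reward function used in the experiments.

```python
def affect_based_reward(step_completed: bool, valence: float) -> float:
    """Hypothetical reward: reward task progress while steering the
    interaction toward positive affect. `valence` is assumed to come
    from the affect-recognition module, scaled to [-1, 1]."""
    task_term = 1.0 if step_completed else -0.1   # task-performance term
    affect_term = 0.5 * valence                   # favours positive user states
    return task_term + affect_term
```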

6.1.3 Metrics Explored for the Evaluation of HRI

Different evaluation metrics for HRI with Brian 2.0 were explored in this work. Namely, three

HRI studies were performed to investigate: (i) the effectiveness of the robot in engaging a person

in a cognitively stimulating activity, (ii) the ability of the robot to minimize task-induced stress

during a cognitively stimulating activity, and (iii) user acceptance of an embodied robot versus a

screen agent in a meal-time scenario.

6.2 Discussion of Future Work

Future work consists of optimizing the overall robotic system in order to perform a pilot study with the robot at the ASBLab's collaborating healthcare facility with elderly persons suffering from mild cognitive impairment. Experimental findings and participant feedback have provided insight into how the capabilities of Brian 2.0 can be further improved for future studies. With respect to the development of the control architecture, the main focus should be on improving user state detection.


In both the Memory Game and Meal-time scenarios, only one sensor was used to detect the affect component of the user state. One sensor provided enough information to evaluate the proposed control architecture; however, to obtain a more accurate indication of a person's affect during HRI, more inputs should be added. For example, additional sensors can be incorporated to detect more affect indicators (e.g. vocal intonation and body language). Moreover, the detection of more human affective states (e.g. sadness, surprise, and fear) from the existing inputs (i.e. heart rate and facial expression) can also be explored. In general, the detection of more affective states will allow the robot to learn how to respond appropriately to human affect, resulting in more opportunities for bidirectional emotion-based HRI.
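One simple way to combine such additional inputs would be a confidence-weighted fusion of per-channel affect estimates, sketched below under the assumption that each channel reports a valence score and a confidence; the channel names and numbers are hypothetical.

```python
def fuse_valence(channels: list) -> float:
    """channels: list of (valence in [-1, 1], confidence in [0, 1]) pairs,
    e.g. from facial expression, heart rate, and vocal intonation."""
    total_confidence = sum(conf for _, conf in channels)
    if total_confidence == 0:
        return 0.0  # no reliable cue available: report neutral affect
    return sum(val * conf for val, conf in channels) / total_confidence

# Hypothetical readings from three affect channels:
fused = fuse_valence([(0.6, 0.8),    # facial expression: positive, confident
                      (-0.2, 0.3),   # heart-rate channel: slightly negative, noisy
                      (0.4, 0.5)])   # vocal intonation: positive, moderate
```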

6.3 Final Concluding Statement

This work provides valuable insight into the use of innovative robotic technologies as therapeutic

or assistive aids to manage dementia. Specifically, the proposed learning-based control

architecture provides a socially assistive robot with the necessary abilities to be an effective

social motivator in cognitively stimulating activities. As a social motivator, the robot can

effectively engage individuals in important activities to promote task completion, and thus,

reduce dependence on caregivers. Moreover, the architecture’s learning capabilities can increase

activity engagement over time by personalizing the robot’s behaviours. Namely, it can enable the

robot to learn which of its behaviours will promote positive user states and/or increase task

compliance. The affect detection aspect of this control architecture provides opportunities for

bidirectional emotion-based HRI, which are critical for cognitively impaired individuals who

have begun to lose their ability to communicate verbally. Proposed improvements to user state detection within the control architecture can potentially enhance the quality of this bidirectional emotion-based HRI, resulting in interactions that are more natural and believable.


Appendix

A.1 List of My Publications

[1] J. Chan and G. Nejat, "Designing intelligent socially assistive robots as effective tools in cognitive interventions," International Journal of Humanoid Robotics, vol. 8, no. 1, pp. 103-126, 2011.

[2] J. Chan and G. Nejat, "Minimizing task-induced stress in cognitively stimulating activities using an intelligent socially assistive robot," IEEE International Symposium on Robot and Human Interactive Communication, Atlanta, GA, 2011, in print.

[3] J. Chan and G. Nejat, "A learning-based control architecture for an assistive robot providing social engagement during cognitively stimulating activities," IEEE International Conference on Robotics and Automation, Shanghai, 2011, in print.

[4] J. Chan and G. Nejat, "The design of an intelligent socially assistive robot for person-centered cognitive interventions," ASME International Design Engineering Technical Conferences, Montreal, 2010, IDETC2010-26861.

[5] J. Chan and G. Nejat, "Promoting engagement in cognitively stimulating activities using an intelligent socially assistive robot," IEEE/ASME International Conference on Advanced Intelligent Mechatronics, Montreal, 2010, pp. 533-538.