
Constructive learning for human–robot interaction

AMARJOT SINGH, SRIKRISHNA KARANAM, AND DEVINDER KUMAR


Digital Object Identifier 10.1109/MPOT.2012.2189443

Date of publication: 22 July 2013

Robots have become an integral part of life in today's modern and advanced society. Technological advancement in the field of machine vision has resulted in many new applications wherein work that could previously be performed only by humans is now being performed by robots; examples include pick-and-place operations and quality control. Robots are also starting to play a prominent role in areas such as household chores, markets, schools, and health-care facilities. The introduction of robots in workplaces has greatly increased the efficiency with which work can be done.

As the role of robots in the day-to-day activities of human beings gained prominence, researchers around the world contemplated introducing robots in the field of teaching, hoping they would enhance the learning rate of students. This thought process led to research in the field of human–robot interaction, wherein the focus is to study the effects of interaction with robots on the human mindset. The ability of a robot to interact efficiently with humans is extremely important in the case of learning in order to maintain a positive learning rate. Learning refers to acquiring knowledge and skills that require thinking on the student's part. However, emotions impact the process of thinking, which in turn influences the learning process. Emotions and learning are thus interdependent, and it is therefore important that the robot be capable of identifying and understanding students' emotions in order to establish a positive learning rate.

The human body is the primary channel for interpersonal communication, conveying information related to interpersonal attitudes and emotions. The most expressive way humans display emotions is through facial expressions, and humans detect and interpret faces and facial expressions in a scene with little or no effort. The learning process of students has rarely been modeled by educators, who give importance to conveying information and facts. It is, however, important to make students realize that failure is also a part of learning and to recognize their emotional states so as to impact their learning positively. In any learning process, the learner's emotions are to be identified and necessary action taken to optimize his or her learning. Literature surveys indicate that many image processing techniques have been used to analyze constructive learning and the interplay between emotions and learning. The face plays an important role in the human perception of emotion; in face-to-face interactions, facial expressions are the main channels for conveying emotions. Methods such as tracking facial feature points and observing the subject's upper body and eyes have been used to measure emotions so as to aid the learning process. By analyzing emotions, inferences can be drawn about the affective state of the learner, which can be used to optimize his or her learning.

This article focuses on maintaining a positive learning rate of a student being taught in a classroom using facial expression recognition and a tree-augmented naive (TAN) Bayes classifier on a biped robot platform. A TAN Bayes classifier is used


to recognize the affective emotional states. Once the emotional state of the student is recognized, his or her learning rate is observed using the theory of constructive learning. The tutor continuously aims to maintain a healthy learning rate, and the actions performed by the tutor to that end are mimicked by the wireless biped robot using TAN classification and facial expression recognition. It is envisaged that in the future, robots and not human teachers will be tutoring students. To this end, the proposed system incorporates a biped robot to mimic the human teacher's actions. The biped robot will thus help us better understand human–robot interaction and its effects on the learning process of students.

Constructive learning

Previous theories studying human emotions have proposed a number of basic or prototype emotions, from a minimum of two to a maximum of 20. Eight basic emotions (fear, anger, sorrow, joy, disgust, acceptance, anticipation, and surprise) were distinguished by Plutchik. Fear, anger, sadness, and joy are the most common emotions appearing on many theorists' lists. Ekman focused on a set of six to eight basic emotions associated with facial expressions. However, none of the existing theories seem to address emotions commonly seen in learning experiences, some of which we have noted in Fig. 1.

Figure 2 indicates the interconnection between the cognitive dynamics of the learning process and the emotion axes. Positive emotions are to the right of the horizontal axis, whereas negative emotions are to the left. The vertical axis above the horizontal axis represents constructive learning, while the one below it symbolizes destructive learning. In order to maintain a healthy learning rate, the affective emotional state of the student should be kept within the first two quadrants only. If the emotional state is observed to be shifting into the third or fourth quadrant, the tutor needs to take action to prevent the transition, which may otherwise lead to a restart of the whole learning process.
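As a minimal illustration of this quadrant model, the quadrant can be computed from the signs of the two axes. This is our own sketch; the function names and sign conventions are not from the original system:

```python
def quadrant(valence: float, learning: float) -> int:
    """Quadrant of the emotion/learning plane of Fig. 2.

    valence: emotion axis, negative (left) to positive (right).
    learning: constructive (positive) or destructive (negative).
    """
    if learning >= 0:
        return 1 if valence >= 0 else 2  # motivation / confusion
    return 3 if valence < 0 else 4       # frustration / hope

# The tutor intervenes once the state leaves quadrant 1 and
# intensifies the intervention in quadrant 3 (described below).
assert quadrant(0.7, 0.5) == 1   # motivated
assert quadrant(-0.4, 0.5) == 2  # confused
```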

Assume a situation in class where a human instructor is lecturing the students and the robot is mimicking the teacher. Also assume that during the course of the lecture, the instructor observes a shift in the emotional state of the students from a motivated to a confused state. He immediately makes an attempt to help the students understand the topic being studied, to move their emotional state back out of the second quadrant, the confused state. This attempt is mimicked by the biped robot by moving left. The attempt has to be made by the instructor since it is possible that a student might lose track of what is being taught and his or her emotional state might subsequently move into the next quadrants. If the initial attempts turn out to be futile, the instructor should intensify them, as the emotional state of the students can shift from a confused state (the second quadrant) to a depressed state (the third quadrant) and possibly to the fourth quadrant. All these shifts in the emotional states of the students are observed and recognized with the help of the TAN Bayes classifier, and the actions of the teacher are mimicked by the biped robot. The instructor should analyze the emotional state of the students carefully and should try to move it back to the first quadrant since, if it moves to the third quadrant (the state of depression), it is highly likely that it will further move to the fourth quadrant and lead to a restart of the whole learning process.

Fig. 2 A graphical representation of the relationship between emotions and learning: the horizontal axis runs from negative to positive emotions, the vertical axis from destructive to constructive learning, with the quadrants labeled motivation, confusion, frustration, and hope.

Fig. 1 Indicating the transitions between emotions, from negative through neutral to positive affect: anxiety, worry, hopeful, confident; boredom, indifference, interest, curiosity; frustration, confusion, insight, enlightenment; dispirited, dissatisfied, satisfied, thrilled.

Facial expression recognition

A facial emotion recognition algorithm is applied to the face to recognize the emotional state of the person as one of four emotions: happy, confused, frustrated, and sad (as shown in Fig. 3). The algorithm identifies 12 action points on the face, as shown in Table 1. These action points, or motion features, are obtained using a face-tracking algorithm. The motion vectors are then fed into the Bayesian network classifier as inputs, resulting in a specific emotion.
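In outline, the recognition step chains the tracker and the classifier. A minimal sketch, where track_action_points and predict are hypothetical stand-ins for the PBVD tracker and the TAN classifier described below:

```python
EMOTIONS = ("happy", "confused", "frustrated", "sad")

def recognize_emotion(prev_frame, curr_frame, tracker, classifier):
    """Map two consecutive video frames to one of the four emotions."""
    # 1) Track the 12 facial action points (Table 1) between the frames;
    #    the result is a 12-dimensional vector of motion features.
    features = tracker.track_action_points(prev_frame, curr_frame)
    # 2) Feed the motion features to the Bayesian network classifier.
    return classifier.predict(features)  # one of EMOTIONS
```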

Face tracking and feature extraction

The face-tracking algorithm used in this article is based on a system developed by Tao and Huang, popularly known as the piecewise Bezier volume deformation (PBVD) tracker. The system uses a model-based approach to construct a 3-D wireframe model of a face using a Bezier volume. A 3-D Bezier volume is defined as

$$f(x, y, z) = \sum_{i=0}^{n} \sum_{j=0}^{m} \sum_{k=0}^{l} b_{ijk}\, B_i^n(x)\, B_j^m(y)\, B_k^l(z),$$

where $f(x, y, z)$ is a point inside the volume; the variables $x$, $y$, and $z$ are parameters in the range 0 to 1; $b_{ijk}$ are the control points of the Bezier volume; and $B_i^n(x)$, $B_j^m(y)$, and $B_k^l(z)$ are Bernstein polynomials.
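A direct, unoptimized evaluation of this triple sum is easy to write down. The following is a sketch, assuming the control points are stored in a NumPy array of shape (n+1, m+1, l+1, 3):

```python
import numpy as np
from math import comb

def bernstein(i: int, n: int, t: float) -> float:
    """Bernstein basis polynomial B_i^n(t) for t in [0, 1]."""
    return comb(n, i) * t**i * (1.0 - t)**(n - i)

def bezier_volume(b: np.ndarray, x: float, y: float, z: float) -> np.ndarray:
    """Evaluate f(x, y, z) for control points b of shape (n+1, m+1, l+1, 3)."""
    n, m, l = b.shape[0] - 1, b.shape[1] - 1, b.shape[2] - 1
    point = np.zeros(3)
    for i in range(n + 1):
        for j in range(m + 1):
            for k in range(l + 1):
                w = bernstein(i, n, x) * bernstein(j, m, y) * bernstein(k, l, z)
                point += w * b[i, j, k]
    return point

# A trilinear volume (2 x 2 x 2 control points): the center of the unit
# cube maps to the average of the eight control points.
b = np.random.rand(2, 2, 2, 3)
assert np.allclose(bezier_volume(b, 0.5, 0.5, 0.5), b.reshape(8, 3).mean(axis=0))
```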

To obtain accurate face models, multiple Bezier volumes based on facial feature points, such as eye corners and mouth corners, are generated. The movement of the facial points is observed by recording the change in the control points. (This is quite important, as tracking the face determines the movement of the facial points, which is essential to compute the emotional state of the student.) A change in each control point $b_{ijk}$ by an amount $d_{ijk}$ leads to a displacement of each face-model point, which can be written as

$$S(x, y, z) = \sum_{i=0}^{n} \sum_{j=0}^{m} \sum_{k=0}^{l} d_{ijk}\, B_i^n(x)\, B_j^m(y)\, B_k^l(z).$$

This displacement equation plays an important role in tracking the face, as it governs the movement of the facial model points. It can also conveniently be written in matrix form as

$$S = BD,$$

where $S$ is the matrix representing the displacements of the facial model points, the matrix $B$ contains the Bernstein polynomials, and the matrix $D$ contains the displacement vectors of the control points $b_{ijk}$. The locations of the control points $b_{ijk}$ in the Bezier volume can be used to alter the shape of the face model generated using the Bezier curve. From this matrix, we can conveniently determine the displacements of the facial points by programming the matrix $S$ in MATLAB.

Table 1. Action units and descriptions.

Action unit  Description
1   Vertical movement of the center of the upper lip
2   Vertical movement of the center of the lower lip
3   Horizontal movement of the left mouth corner
4   Vertical movement of the left mouth corner
5   Horizontal movement of the right mouth corner
6   Vertical movement of the right mouth corner
7   Vertical movement of the right brow
8   Vertical movement of the left brow
9   Lifting of the right cheek
10  Lifting of the left cheek
11  Blinking of the right eye
12  Blinking of the left eye

Fig. 3 Depicting different kinds of emotions of two different people: (a) frustrated, (b) confused, (c) happy, and (d) hopeless or sad.
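The matrix form $S = BD$ is convenient because $B$ depends only on where the model points sit inside the volume, so it can be assembled once per mesh. The following is a sketch under the same array assumptions as above; the helper names and shapes are illustrative:

```python
import numpy as np
from math import comb

def bernstein(i, n, t):
    return comb(n, i) * t**i * (1.0 - t)**(n - i)

def bernstein_matrix(params, n, m, l):
    """B: one row of Bernstein products per tracked face-model point.

    params has shape (P, 3), holding the (x, y, z) volume parameters of
    each model point; the result has shape (P, (n+1)*(m+1)*(l+1)).
    """
    return np.array([
        [bernstein(i, n, x) * bernstein(j, m, y) * bernstein(k, l, z)
         for i in range(n + 1) for j in range(m + 1) for k in range(l + 1)]
        for x, y, z in params])

params = np.random.rand(5, 3)           # 5 tracked face-model points
B = bernstein_matrix(params, 1, 1, 1)   # trilinear volume: 8 control points
D = 0.01 * np.random.randn(8, 3)        # control-point displacements d_ijk
S = B @ D                               # model-point displacements, shape (5, 3)
```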

Once the model is fitted to the face, it can be used to track head motion and facial deformations of features such as the eyebrows, eyelids, and mouth. The model is matched from the initial frame (t = 0) to the next time step (t = 1) to measure the 2-D facial motion features. Each facial feature is stated in terms of Bezier volume control parameters, which symbolize a simple deformation of the face. These motion vectors are referred to as motion units (MUs). They are similar to Ekman's action units (AUs) but not equivalent, as they are numeric in nature.

A linear combination of MUs is used to express the general motion model, which can be written as

$$S = B\,[D_0\ \ D_1\ \ \cdots\ \ D_q] \begin{bmatrix} q_0 \\ q_1 \\ \vdots \\ q_q \end{bmatrix} = BDQ, \qquad (1)$$

where $q_k$ is the magnitude of each displacement vector $D_k$.
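In tracking, $B$ and the MU basis are fixed, so (1) is linear in the magnitudes $q$; given observed point displacements, $q$ can be recovered by least squares. A sketch with random stand-in data follows; the dimensions and the least-squares fit are our illustration, since the article does not specify how $q$ is estimated:

```python
import numpy as np

rng = np.random.default_rng(0)
P, K = 12, 6                      # tracked model points, motion units
# Columns of A: the displacement pattern B @ D_k produced by each MU
# at unit magnitude, flattened over the points' coordinates.
A = rng.standard_normal((3 * P, K))

q_true = rng.standard_normal(K)   # ground-truth MU magnitudes
S_obs = A @ q_true                # observed (flattened) point motion

# Recover the MU magnitudes that best explain the observed motion.
q, *_ = np.linalg.lstsq(A, S_obs, rcond=None)
assert np.allclose(q, q_true)     # exact here since S_obs is noise-free
```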

To illustrate the principle behind the face-tracking algorithm, we have shown above how a face is modeled using the Bezier volume concept.

Tree-augmented naive Bayes classifier

This article uses a tree-augmented naive (TAN) Bayes classifier to identify features, as shown in Fig. 4. The algorithm aims at finding, out of all possible TAN structures, the one that maximizes the likelihood function given the training data. The tree structure is evaluated by computing the pairwise class-conditional mutual information among the features and building a maximum weighted spanning tree using the pairwise mutual information as the weights of the arcs in the tree. Kruskal's algorithm is used to compute this maximum weighted acyclic spanning tree; it builds the tree by taking one edge at a time. Further, in order to find the TAN structure with maximum likelihood, the modified Chow–Liu algorithm is used. The five steps of the TAN algorithm are described in Algorithm 1. This procedure gives us the TAN model that maximizes the likelihood of the data we have.

Fig. 4 An example of a TAN classifier: a class node with feature nodes Feature 1, Feature 2, ..., Feature N.

Fig. 5 The biped robot.

Fig. 6 (a) The transmitter circuit of the wireless robot, (b) the receiver circuit, (c) the data sheet of the transmitter circuit, and (d) the data sheet of the receiver circuit.

Biped robot

Once the emotion has been recognized, the wireless biped robot mimics the actions taken by the tutor in order to keep the affective emotional state in the first two quadrants, using a PC-controlled transmitter and receiver. The robot is a 12-degrees-of-freedom (DOF) servo-controlled bipedal walking robot, with three DOF at each thigh joint, one at each knee joint, and two at each ankle joint, actuated by 12 servo motors. The transmitter is connected to a PC, while the receiver is placed on the wireless biped robot shown in Fig. 5. A signal transmitted to the robot is received by the receiver, and the robot performs the desired action with respect to the signal.

The signal is transmitted using a transmitter [Fig. 6(a)] with an HT12E encoder, which encodes the signal to be transmitted to the receiver [Fig. 6(b)]. The transmitter is connected to the PC using MCT2E optocoupler integrated circuits. The information to be sent is encoded in a 12-b signal sent over a serial channel; the 12-b signal is a combination of eight address bits and four data bits. The information sent by the transmitter is received by the HT12D decoder connected to the receiver, where it is again analyzed as eight address bits and four data bits. The detailed circuit diagrams of the transmitter and receiver are shown in Fig. 6(c) and (d).
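The 12-b frame itself is easy to sketch in code. This shows only the bit packing; placing the address in the high bits is our assumption, and the HT12E/HT12D pair handles the actual line encoding and timing:

```python
def pack_frame(address: int, data: int) -> int:
    """Pack 8 address bits and 4 data bits into a 12-b word."""
    assert 0 <= address < 256 and 0 <= data < 16
    return (address << 4) | data  # address in the high bits (assumption)

def unpack_frame(word: int) -> tuple[int, int]:
    """Recover (address, data) from a 12-b word."""
    return (word >> 4) & 0xFF, word & 0xF

word = pack_frame(address=0b1010_0101, data=0b0011)
assert unpack_frame(word) == (0b1010_0101, 0b0011)
```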

The biped robot mimics the actions performed by the tutor to maintain a healthy learning rate. When the emotional state of a person shifts from the first quadrant, the motivated state, to the second quadrant, the confused state, the tutor repeats the topic being taught to shift the emotional state from confused back to motivated, that is, from the second quadrant to the first. Even though it is possible that the student is still following the topic while in the second quadrant, it is equally likely that he or she may lose track of what is being taught due to confusion, because the second quadrant represents a confused state.

Algorithm 1. The TAN algorithm.

1) For each pair of features $(X_i, X_j)$, $i \neq j$, $i, j \in \{1, \dots, n\}$, compute the class-conditional pairwise mutual information, given by

$$I_P(X_i; X_j \mid C) = \sum_{x_i, x_j, c} P(x_i, x_j, c) \log \frac{P(x_i, x_j \mid c)}{P(x_i \mid c)\, P(x_j \mid c)}.$$

2) Build a complete undirected graph in which each vertex is a variable and the weight of each edge is the mutual information computed in step 1.

3) Build a maximum weighted spanning tree using Kruskal's algorithm.

4) Transform the undirected maximum weighted spanning tree of step 3 into a directed graph by choosing a root node and pointing the arrows of all edges away from the root.

5) Make the class node the parent of all the feature nodes in the directed graph of step 4.
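A compact sketch of these five steps on discrete features follows, using empirical counts with no smoothing. The arrays X (samples by features) and y (class labels), and the choice of feature 0 as root, are our assumptions:

```python
import itertools
import numpy as np

def cond_mutual_info(xi, xj, c):
    """Empirical I(Xi; Xj | C) for discrete 1-D arrays (step 1)."""
    mi = 0.0
    for a, b, k in set(zip(xi.tolist(), xj.tolist(), c.tolist())):
        p_abc = np.mean((xi == a) & (xj == b) & (c == k))   # P(xi, xj, c)
        p_c = np.mean(c == k)
        p_ab_c = p_abc / p_c                                # P(xi, xj | c)
        p_a_c = np.mean((xi == a) & (c == k)) / p_c         # P(xi | c)
        p_b_c = np.mean((xj == b) & (c == k)) / p_c         # P(xj | c)
        mi += p_abc * np.log(p_ab_c / (p_a_c * p_b_c))
    return mi

def tan_structure(X, y):
    """Return {feature: parent feature} for the TAN tree; the class node
    is implicitly also a parent of every feature (step 5)."""
    n = X.shape[1]
    # Steps 1-2: weighted complete graph over the features.
    edges = sorted(
        ((cond_mutual_info(X[:, i], X[:, j], y), i, j)
         for i, j in itertools.combinations(range(n), 2)), reverse=True)
    # Step 3: Kruskal's algorithm, one edge at a time, skipping cycles.
    root_of = list(range(n))          # union-find forest
    def find(u):
        while root_of[u] != u:
            root_of[u] = root_of[root_of[u]]
            u = root_of[u]
        return u
    tree = []
    for _, i, j in edges:
        if find(i) != find(j):
            root_of[find(i)] = find(j)
            tree.append((i, j))
    # Step 4: direct the tree away from an arbitrary root (feature 0).
    adj = {i: [] for i in range(n)}
    for i, j in tree:
        adj[i].append(j)
        adj[j].append(i)
    parent, stack, seen = {}, [0], {0}
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                parent[v] = u
                stack.append(v)
    return parent
```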

Fig. 7 A schematic of the corrective action taken by the tutor and mimicked by the biped robot: while the student stays motivated (first quadrant), no action is needed; in the second quadrant the tutor takes corrective action, mimicked by one left turn of the robot; in the frustrated third quadrant the corrective action is intensified, mimicked by two left turns; a further shift restarts the learning process.


Thus, the tutor repeats the topic being taught, and this action is mimicked by the robot taking a left turn. The teacher intensifies the process of repeating topics when a transition from the second quadrant to the third quadrant is observed, as a shift of the emotional state from confused to frustrated ultimately results in a restart of the learning process; in this case, the robot takes the left turn twice, as shown in Fig. 7. In summary, the biped robot moves forward when the emotional state is motivated and turns left when the emotional state becomes confused; turning left is analogous to an action performed to keep the emotional state in the first quadrant.

If frustration is recognized as the emotional state, the robot moves left two times, reflecting an increased intensity on the teacher's part in trying to make the students understand the topic and hence keep the emotional state in the first or second quadrant. If the student goes into the fourth quadrant, the whole learning process is restarted, as shown in Fig. 7.
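This policy reduces to a small lookup from the recognized emotion to the command sent to the robot. A sketch with illustrative command names and data codes, since the article does not list the actual 4-b codes:

```python
# Recognized emotion -> (tutor action mimicked by the robot, 4-b data code).
# The code values are illustrative; only the behavior is from the article.
ACTION_FOR_EMOTION = {
    "happy":      ("move_forward", 0b0001),  # Q1: lecture proceeds
    "confused":   ("turn_left",    0b0010),  # Q2: topic repeated once
    "frustrated": ("turn_left_x2", 0b0011),  # Q3: intensified repetition
    "sad":        ("restart",      0b0100),  # Q4: restart learning process
}

def frame_for(emotion: str, address: int = 0b1010_0101) -> int:
    """12-b frame (8 address + 4 data bits) for the matching action."""
    _, data = ACTION_FOR_EMOTION[emotion]
    return (address << 4) | data

print(ACTION_FOR_EMOTION["confused"][0], bin(frame_for("confused")))
```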

Results

The results obtained from the simulations enable us to analyze and maintain a healthy learning rate of a student by observing affective emotional states through facial expressions. The emotional state of a specific student, among a group of students being taught by a human teacher, is studied and analyzed. The emotional states are recognized using a TAN classifier, which is further used to maintain a positive learning rate. The necessary actions performed by the teacher to maintain a positive learning rate are mimicked by the biped robot. The simulations are performed in MATLAB on a Core 2 Duo, 1.83-GHz machine.

A topic familiar to the student is taught in the beginning. The emotional state of the student is recognized using the 12 action points, or motion features, shown in Fig. 8. A Bezier volume is constructed connecting the motion points, and a TAN classifier is then used to recognize the emotional state of the student. The resultant emotion recognition confusion matrix generated by the TAN classifier is shown in Table 2. The student is able to easily understand the topic and is happy and motivated to learn; hence, the affective emotional state of the student is in the first quadrant of the learning cycle, with a positive learning rate. As the topic becomes complicated or difficult for the student to understand, his or her emotional state shifts from motivated to confused, into the second quadrant. In order to maintain a healthy learning rate, the system makes an attempt to keep the emotional state of the student in the first two quadrants.

The topic being taught, which results in a shift in the emotional state, is repeated once if the emotional state is shifting from the first to the second quadrant, and the action is repeated twice if the emotional state is shifting from the second to the third quadrant. Figure 9 shows the variation in the emotional state, with time along the x axis and the frequency of actions performed along the y axis. This repetition is mimicked by the movement of the wireless biped robot: the robot turns left once for the shift from the first to the second quadrant, while the action is repeated twice (i.e., the biped robot makes two left turns, mimicking the intensified attempt on the part of the teacher) for a shift to the third quadrant. A shift in the emotional state of the student to the fourth quadrant leads to a fresh beginning of the whole learning process, as shown in the schematic in Fig. 7. A graph showing the learning rate of the student (on the y axis) plotted against time in seconds (on the x axis) is shown in Fig. 10. Since the learning rate of the student is directly related to the movement of the robot, as explained in the preceding lines, we can even determine how the robot moves with respect to time.

Fig. 8 Motion features.

Fig. 9 A graph showing the student's emotional state over time (t = 5 to 50), with blue corresponding to quadrant 1 and red corresponding to quadrant 2.

Fig. 10 A graph showing the learning rate of a person, plotted against time in seconds.

Table 2. The emotion recognition confusion matrix (in percent).

Emotion      Happy   Confused   Frustrated   Sad
Happy        93.17   3.93       2.4          0.5
Confused     3.88    92.96      2.87         0.27
Frustrated   1.96    2.99       91.38        3.67
Sad          4.12    2.58       0.79         92.48

Conclusion

This article presents a system that can be used to train students to learn at a healthy rate. The affective emotional state of a student is analyzed using a TAN classifier, and an attempt is made to maintain a positive learning rate. It is observed that the student's emotional state lies in the first and second quadrants for a positive learning rate and shifts to the third and fourth quadrants for a negative learning rate. The system attempts to maintain a positive learning rate by repeating the topic responsible for the fall in the learning rate: the action is performed once if the emotional state is in the second quadrant and twice in the case of the third quadrant. The actions are mimicked by a biped robot, which turns left for a particular action. In this way, a positive learning rate is maintained.

Acknowledgments

The authors would like to thank the anonymous reviewers for their constructive suggestions to improve the presentation of this manuscript.

Read more about it
• J. Han, M. Jo, S. Park, and S. Kim, "The educational use of home robots for children," in Proc. IEEE Int. Workshop Robot and Human Interactive Communication (ROMAN 2005), 2005, pp. 378–383.
• R. Murphy, T. Nomura, A. Billard, and J. Burke, "Human–robot interaction," IEEE Robot. Automat. Mag., vol. 17, pp. 85–89, 2010.
• B. Fagin and L. Merkle, "Measuring the effectiveness of robots in teaching computer science," in Proc. 34th SIGCSE Tech. Symp. Computer Science Education, Jan. 2003, vol. 35, no. 1, pp. 307–311.
• B. Kort, R. Reilly, and R. W. Picard, "An affective model of interplay between emotions and learning: Reengineering educational pedagogy–Building a learning companion," in Proc. IEEE Int. Conf. Advanced Learning Technologies, 2001, pp. 43–46.
• R. W. Picard, "Affective computing: Challenges," Int. J. Human–Comput. Stud., vol. 59, no. 1–2, pp. 55–64, 2003.
• S. Afzal, T. M. Sezgin, Y. Gao, and P. Robinson, "Perception of emotional expressions in different representations using facial feature points," in Proc. Affective Computing and Intelligent Interaction and Workshops, 2009, pp. 1–6.
• J. D. Vermunt, "The regulation of constructive learning processes," British J. Educ. Psychol., vol. 68, no. 2, pp. 149–171, 1998.
• P. Ekman, "Universals and cultural differences in facial expression of emotion," in Proc. Nebraska Symp. Motivation, 1972, pp. 207–283.
• R. Plutchik, Emotion, a Psychoevolutionary Synthesis. New York: Harper and Row, 1980.
• P. Ekman, "An argument for basic emotions," Cogn. Emot., vol. 6, no. 3–4, pp. 169–200, 1992.
• H. Tao and T. S. Huang, "Connected vibrations: A modal analysis approach to non-rigid motion tracking," in Proc. IEEE Conf. Computer Vision and Pattern Recognition, 1998, pp. 735–740.
• T. W. Sederberg and S. R. Parry, "Free-form deformation of solid geometric models," in Proc. SIGGRAPH, 1986, pp. 151–160.

About the authors

Amarjot Singh ([email protected]) is a research engineer with the Tropical Marine Science Institute at the National University of Singapore (NUS). He completed his bachelor's degree in electrical and electronics engineering at the National Institute of Technology Warangal, India.

Srikrishna Karanam ([email protected]) is currently pursuing his bachelor's degree in electronics and communication engineering at the National Institute of Technology Warangal, India. He is an IEEE Student Member.

Devinder Kumar ([email protected]) is an undergraduate student researcher currently pursuing his bachelor's degree in electrical and electronics engineering at the National Institute of Technology Warangal, India.
