Workshop on Utilizing EEG Input in Intelligent Tutoring Systems (ITS2014 WSEEG)

Workshop Co-Chairs

Kai-min Chang 1, Claude Frasson 2

1 Carnegie Mellon University, USA
2 University of Montreal, Canada

https://sites.google.com/site/its2014wseeg/






Preface

The ultimate intelligent tutoring system could peer directly into students' minds to identify their mental states (e.g. engagement, cognitive load, competencies, intentions) and decide accordingly what and how to teach at each moment. Recent advances in brain imaging technologies have led to several portable, commercially available EEG headsets that show promise for use in intelligent tutoring systems. The EEG signal is a voltage signal that can be measured on the surface of the scalp, arising from large areas of coordinated neural activity manifested as synchronization (groups of neurons firing at the same rate). This neural activity varies as a function of development, mental state, and cognitive activity, and the EEG signal can measurably detect such variation.

Using signals recorded from low-cost, portable EEG devices, Chang et al. trained machine learning classifiers to detect reading difficulty in an intelligent tutoring system (Chang, Nelson, Pant, & Mostow, 2013), student confusion while watching course material (Wang, 2013), and user frustration while using a spoken dialog interface (Sridharan, Chen, Chang, & Rudnicky, 2012). Frasson et al. also used EEG to model learners' reactions in ITS (Blanchard, Chalfoun, & Frasson, 2007), detect learners' emotions (Heraz & Frasson, 2007), assess learners' attention (Derbali, Chalfoun, & Frasson, 2011), and, more recently, to show that subliminal cues are cognitively processed and have a positive influence on learners' performance and intuition (Chalfoun & Frasson, 2012; Jraidi, Chalfoun, & Frasson, 2012). Szafir and Mutlu demonstrated that ARTFul, an adaptive review technology for flipped learning that monitors students' attention during educational presentations and adapts its review of lesson content accordingly, can improve student recall by 29% in less time (Szafir & Mutlu, 2013). Azcarraga and Suarez used a combination of EEG brainwaves and mouse behavior to predict the level of academic emotions such as confidence, excitement, frustration, and interest (Azcarraga & Suarez, 2012).

These early results show promise for augmenting intelligent tutoring systems with EEG signals. Advances in EEG-ITS require close collaboration between education researchers, machine learning scientists, and computational neuroscientists. To this end, an interdisciplinary workshop can play a key role in advancing existing research and initiating new research. Our workshop is the first of this type to be held at the ITS conference. We hope that it will attract an interdisciplinary audience of researchers in education, machine learning, and neuroscience.

June 2014

Kai-min Chang and Claude Frasson


Program Committee

Judith Azcarraga, De La Salle University, Manila, Philippines
Tiffany Barnes, North Carolina State University, USA
Carole Beal, University of Arizona, USA
Pierre Chalfoun, University of Montreal, Canada
Maher Chaouachi, University of Montreal, Canada
Lotfi Derbali, University of Montreal, Canada
Karola Dillenburger, Queen's University Belfast, UK
Imène Jraidi, University of Montreal, Canada
Jack Mostow, Carnegie Mellon University, USA
Brian Murphy, Queen's University Belfast, UK
Bilge Mutlu, University of Wisconsin–Madison, USA
Daniel Szafir, University of Wisconsin–Madison, USA
Martin Talbot, Warner Bros, Canada
Merlin Teodosia Suarez, De La Salle University, Manila, Philippines
Yanbo Xu, Carnegie Mellon University, USA


Table of Contents

Modelling EEG Signals for the Prediction of Academic Emotions ... 1
  Judith Azcarraga, Nelson Marcos, and Merlin Teodosia Suarez

Emotional Transitions in Driving ... 7
  Pierre Olivier Brosseau, Thi Hong Dung Tran, and Claude Frasson

Smart headbands for monitoring functional brain activity ... 10
  James Dieffenderfer, Mychael Chance Bair, Justis Peters, Andrew Krystal, and Alper Bozkurt

A Study of Learner's Mental Profile in Different Categories of Tasks ... 12
  Ramla Ghali and Claude Frasson

Exploring the Behavior of Novice Programmers' EEG Signals for Affect-based Student Modeling ... 19
  Tita R. Herradura, Joel P. Ilao, and Merlin Teodosia C. Suarez

Classification of video game players using EEG and logistic regression with ridge estimator ... 21
  Gustavo A. Lujan-Moreno, Robert Atkinson, George Runger, Javier Gonzalez-Sanchez, and Maria Elena Chavez-Echeagaray

Predicting subsequent memory from single-trial EEG ... 27
  Eunho Noh, Grit Herzmann, Tim Curran, and Virginia R. de Sa

Extracting temporal EEG features with BCIpy ... 29
  Justis Peters, Sagar Jauhari, and Tiffany Barnes

Intelligent tutors exploiting novel sensing modalities for decoding students' attention ... 35
  Alvaro Soto, Felipe Orihuela-Espina, Diego Cosmelli, Cristian Alcholado, Patrick Heyer, and L. Enrique Sucar

An Exploration of Two Methods for using fMRI to Identify Student Problem Solving Strategies ... 37
  Caitlin Tenison and John R. Anderson

EEG Helps Knowledge Tracing! ... 43
  Yanbo Xu, Kai-min Chang, Yueran Yuan, and Jack Mostow

A Public Toolkit and ITS Dataset for EEG ... 49
  Yueran Yuan, Kai-min Chang, Yanbo Xu, and Jack Mostow


Modelling EEG Signals for the Prediction of Academic Emotions

Judith Azcarraga, Nelson Marcos, Merlin Teodosia Suarez

Center for Empathic Human-Computer Interactions College of Computer Studies, De La Salle University, Manila, Philippines

jay.azcarraga,nelson.marcos,[email protected]

Abstract. Forty-nine (49) young learners of ages 12 to 16, all academic achievers in Mathematics and Science, are asked to answer algebra problems while their brainwaves are being captured using an Emotiv headset. While engaged in the Aplusix algebra learning software, the learners are prompted to report their academic emotions, namely frustration, confusion, boredom and interest, giving a score from 0 to 100 for each of the emotions. Using 126 features based on 14 EEG channels from the Emotiv sensor, several classifiers are built. The initial classifiers built using decision trees showed very limited ability to predict the academic emotions, with a mean accuracy rate for the four academic emotions of only 0.48. The prediction rate improves significantly, however, by using Multi-Layered Perceptrons (MLP) instead, yielding a mean accuracy of 0.60. The prediction performance is shown to further improve when training and testing are restricted to specific categories of learners, i.e. based on gender as well as hand-dominance. For the MLP classifiers, prediction performance consistently yields an accuracy of higher than 0.60, to as high as 0.75, for each of the four academic emotions when train and test datasets have been restricted to all right-handed males or all right-handed females. Aside from the accuracy, the Area Under the Curve (AUC) is also reported as an additional measure of performance.

Keywords: EEG, academic emotions, decision trees, MLP

1 Introduction

Students experience various emotions while engaged in learning activities, and these emotions have both cognitive and affective dimensions. Emotions tend to influence their performance, memory, motivation and even decision-making [1]. As such, intervention by a human teacher or a computer tutor may be necessary when students are bored, frustrated or confused, in order to improve their learning performance and increase their motivation to continue with the rest of the learning activity. The expression of such emotions, however, may not be obvious unless the teacher or tutor is an expert in assessing student emotions. Fortunately, such emotions may be manifested not only through facial expressions but also through other physiological reactions. In one neurophysiology study, negative emotions, such as

1


“disgust” were found to be associated with right-sided activation in the frontal and anterior temporal regions whereas “happiness” was found to be associated with a left-sided activation in the anterior temporal region [2].

Brainwave patterns, as physiological signals, may be captured by physiological sensors such as an electroencephalogram (EEG) device. In this paper we report some results of a recent study that built classification models for predicting the academic emotions of a group of academically-gifted students based solely on their EEG signals. The academic emotions considered in the study are confused, interested, frustrated and bored.

Intellectually-gifted children, characterized by high intellectual abilities based on intelligence measures and academic achievement, possess learning needs that differ from those of average-intellect children. Given these special abilities, such students are provided with differentiated instructional strategies to match their learning abilities [3].

2 Experimental Set-up and Data Preparation

2.1 Data Collection

Forty-nine (49) Grades 7 and 8 students, all academic achievers aged 12 to 16, participated in the study. Students were selected based on their excellent overall academic performance, particularly in Science and Mathematics, and for having passed a highly selective entrance examination.

The selected academically-gifted students were subjected to a calibrated, video-taped learning session using two learning systems for Mathematics, namely Aplusix [4] and Scatterplot [5], and the participants were made to wear an Emotiv headset [6], an emotion-sensing device connected to a computer. Software modules were deployed for simultaneously collecting the EEG sensor signals, capturing the user screen, and recording self-reports of emotions.

The learning session that involves the use of Aplusix is the last part of an hour-long series of activities. During the entire session, the Emotiv EPOC EEG sensor was attached to the participant's head. Each participant was asked to report the level of their confusion, boredom, frustration and interest, as well as the difficulty of the task, by clicking on a sliding bar. Prior to the session, the participants were instructed to report their emotions every 2 minutes, after solving each problem, and whenever they sensed a change of emotion or task difficulty.

For the first activity, each participant was asked to rest for about 3 minutes. The EEG signals collected during this 'resting state' were used as baseline data for the participant. The succeeding activities involved a slide presentation about Scatterplot, Scatterplot problem solving, a slide presentation about the Aplusix software, and lastly, solving algebra problems using the Aplusix software. The Aplusix session lasted about 10 minutes and involved 6 algebra problems ordered from easy to moderate to difficult.

2


2.2 Data Preprocessing and Data Preparation

According to studies on emotion, emotions persist for about 0.5 to 4 seconds [7,8]. Given this, the data signals collected from the experiments were segmented into 2-second window samples: all the pre-processed EEG data and self-reported emotion tags were carefully synchronized, merged and uniformly segmented into 2-second windows with 1-second overlap. Each segment was treated as a single instance in the dataset.
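The windowing step can be sketched as follows (an illustrative sketch, not the authors' code; the 2-second window and 1-second overlap follow the paper, the 128 Hz rate is the EPOC's nominal sampling rate, and the function and variable names are our own):

```python
import numpy as np

def segment(signal, sample_rate, window_sec=2.0, overlap_sec=1.0):
    """Split a 1-D signal into fixed-length windows with overlap.

    With a 2-second window and 1-second overlap, consecutive
    windows start 1 second apart, as described in the paper.
    """
    win = int(window_sec * sample_rate)
    step = int((window_sec - overlap_sec) * sample_rate)
    return [signal[start:start + win]
            for start in range(0, len(signal) - win + 1, step)]

# 10 seconds of placeholder EEG sampled at 128 Hz
x = np.zeros(10 * 128)
windows = segment(x, sample_rate=128)
# 2-second windows starting every 1 second over 10 s -> 9 windows
```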

Since the techniques used for predicting the academic emotion associated with each test instance are all supervised, the self-reported emotions were used as tags (or labels) during training. Note that the slide bar provided for self-report allows the user to rate each emotion (confused, frustrated, bored, interested) from 0 to 100. The emotion rating is then discretized to either low or high, and this discretized value is used to tag each instance: the label is low if the emotion value is less than 50; otherwise, the label is set to high.
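The labeling rule can be stated directly in code (a trivial sketch; the helper name is ours):

```python
def discretize(rating):
    """Map a 0-100 self-reported emotion rating to a binary label.

    Ratings below 50 are labeled 'low', 50 and above 'high',
    following the rule described in the paper.
    """
    return "low" if rating < 50 else "high"

labels = [discretize(r) for r in (10, 49, 50, 100)]
# -> ['low', 'low', 'high', 'high']
```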

The EEG sensor used in this study is the Emotiv EPOC, a commercial product typically used for gaming. It is equipped with 14 channels (AF3, F7, F3, FC5, T7, P7, O1, O2, P8, T8, FC6, F4, F8, AF4) placed according to the international 10-20 standard locations.

Because there are substantial noise and artifacts in the EEG signals captured by hardware EEG sensors (which would significantly degrade the quality of the signals), various data preparation and pre-processing operations were performed. Artifacts are removed by transforming the EEG data, i.e. a frequency transform followed by the application of low-pass and high-pass filters.

Brainwave signals are typically within the frequency range of 0-45 Hz. There are five major frequency bands, namely alpha, theta, beta, delta and gamma. The alpha, beta and gamma bands were chosen as the frequency bands to which the raw data were transformed. The segmented 2-second window samples were transformed using Fast Fourier Transform functions, using built-in functions of Octave, an open-source high-level language for numerical computations. With beta split into low and high sub-bands, there were four resultant frequency bands: alpha (8-12 Hz), low beta (12-21 Hz), high beta (21-30 Hz) and gamma (31-50 Hz).

Two features were extracted from each frequency band of each EEG channel: the peak magnitude and the mean spectral power. Aside from these, other features were extracted [9], such as the average energy over brain areas (frontal, parietal, temporal, occipital), the brain asymmetry scores (lateralization) of the 7 right-left electrode pairs from the alpha band, and the energy of beta for all the electrodes. All told, a total of 126 features were extracted from each segmented raw sample.
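A per-channel, per-band version of the two spectral features can be sketched as follows (illustrative only: the band edges follow the paper, but the paper does not give exact formulas, so the definitions of peak magnitude and mean spectral power below are our assumptions):

```python
import numpy as np

BANDS = {"alpha": (8, 12), "low_beta": (12, 21),
         "high_beta": (21, 30), "gamma": (31, 50)}

def band_features(window, sample_rate):
    """Peak magnitude and mean spectral power per band for one channel."""
    spectrum = np.abs(np.fft.rfft(window))
    freqs = np.fft.rfftfreq(len(window), d=1.0 / sample_rate)
    feats = {}
    for name, (lo, hi) in BANDS.items():
        mag = spectrum[(freqs >= lo) & (freqs < hi)]
        feats[name] = (mag.max(), np.mean(mag ** 2))  # (peak, mean power)
    return feats

# a 2-second window of a pure 10 Hz tone sampled at 128 Hz;
# the tone's energy lands in the alpha (8-12 Hz) band
t = np.arange(256) / 128.0
feats = band_features(np.sin(2 * np.pi * 10 * t), sample_rate=128)
```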

Two sets of EEG data for each participant were processed. The first set is the EEG data taken during the 'resting state', while the other was gathered during the learning session. In this paper, we only report results based on the EEG data collected while the learner was using the Aplusix algebra problem-solving software. The data taken during the Aplusix session were converted into deviations from the baseline EEG of the 'resting-state' session. The processed data are then normalized and standardized

3


into z-scores within the range of [-3, 3]. Extreme z-scores were treated as aberrations and were clipped to -3 or +3.
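The baseline correction, standardization and clipping can be sketched as follows (a hypothetical helper; the paper does not spell out which statistics the z-scores use, so the choices below are assumptions):

```python
import numpy as np

def standardize(session, baseline):
    """Deviation from the resting-state baseline, z-scored per feature
    and clipped to [-3, 3] as described in the paper."""
    dev = session - baseline.mean(axis=0)           # deviation from baseline
    z = (dev - dev.mean(axis=0)) / dev.std(axis=0)  # per-feature z-scores
    return np.clip(z, -3.0, 3.0)                    # clip extreme values

rng = np.random.default_rng(0)
z = standardize(rng.normal(5, 2, (100, 126)), rng.normal(0, 1, (30, 126)))
# every value now lies within [-3, 3]
```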

Table 1 shows the distribution of participants. Different datasets based on gender and hand dominance were built for each emotion, and these datasets were used in building the different classification models for the 4 emotions. Note that datasets composed of left-handed male and left-handed female participants were not included, as there were too few such participants to build any meaningful classification model.

Table 1. Distribution of Participants

              Male  Female  Total
Right-Handed    25      16     41
Left-Handed      6       2      8
Total           31      18     49

3 Results and Discussion

Each of the above-listed datasets was classified using the C4.5 Decision Tree (DT) and Multi-Layered Perceptron (MLP) modeling techniques. RapidMiner [10] was used in building the models and generating the performance rates. For the validation of a classifier, student-level cross-validation was employed. Prior to testing, the training set is balanced by repeating randomly-selected instances until the number of instances in the two classes, low and high, is the same; this data preparation step is needed for the proper use of the Multi-Layered Perceptron (MLP) as a classifier.
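The balancing step described above amounts to random oversampling of the minority class, which can be sketched as follows (names and details are ours; RapidMiner's own preprocessing may differ):

```python
import random

def balance(instances, labels, seed=0):
    """Duplicate randomly-chosen minority-class instances until the
    two classes ('low'/'high') contain the same number of instances."""
    rng = random.Random(seed)
    by_class = {"low": [], "high": []}
    for x, y in zip(instances, labels):
        by_class[y].append(x)
    minority = min(by_class, key=lambda c: len(by_class[c]))
    majority = "high" if minority == "low" else "low"
    while len(by_class[minority]) < len(by_class[majority]):
        by_class[minority].append(rng.choice(by_class[minority]))
    return by_class

b = balance([1, 2, 3, 4, 5], ["low", "low", "high", "high", "high"])
# both classes now contain 3 instances
```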

In evaluating the prediction performance of the classifiers for each of the 4 academic emotions, two performance measures were used, namely Accuracy and Area Under the Curve (AUC). These measures are based on the contingency table (confusion matrix) and are computed from the True Positive (TP), False Positive (FP), True Negative (TN) and False Negative (FN) counts. The results are shown in Table 2.
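Of the two measures, accuracy follows directly from the four confusion-matrix counts, while AUC additionally requires the classifier's ranked scores. The accuracy computation is simply (an illustrative helper with made-up counts):

```python
def accuracy(tp, fp, tn, fn):
    """Fraction of correctly classified instances,
    computed from confusion-matrix counts."""
    return (tp + tn) / (tp + fp + tn + fn)

acc = accuracy(tp=30, fp=10, tn=30, fn=30)
# (30 + 30) / 100 = 0.6
```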

Compared to using a Decision Tree, the performance in predicting emotions significantly improves when the MLP classifier is employed. Note from Table 2 that the accuracy on the entire dataset (All) increased from 0.49 to 0.52 for Frustrated, 0.59 to 0.64 for Confused, 0.44 to 0.59 for Bored and 0.39 to 0.63 for Interested. The mean accuracy over all four emotions and for all participants in the study is 0.48 using a Decision Tree, and this increases to 0.60 using MLP.

We can also notice a general improvement in prediction performance when the dataset is restricted to one gender and/or one hand dominance. In Table 2, the figures in bold-face are the performance rates of restricted datasets that are better than the performance rate for the entire dataset (All), written in italics. For the MLP classifiers, prediction performance consistently yields an accuracy of higher than 0.60, to as high as 0.75, for each of the four academic emotions when train and test datasets have been restricted to all right-handed males or all right-handed females. In fact, compared to the 0.60 mean accuracy for the entire dataset, the mean

4


accuracy rates for the All-Right-Handed-Male and All-Right-Handed-Female datasets are 0.66 and 0.67, respectively.

Table 2. Performance of Decision Tree (DT) and Multi-Layered Perceptron (MLP)

                           Decision Tree        MLP
                           Accuracy   AUC    Accuracy   AUC

Frustrated
  All                         0.50   0.55      0.52   0.53
  All Male                    0.69   0.70      0.61   0.62
  All Female                  0.41   0.42      0.75   0.73
  All Right-Handed            0.49   0.52      0.52   0.52
  All Left-Handed             0.30   0.27      0.30   0.26
  All Right-Handed Male       0.60   0.60      0.63   0.63
  All Right-Handed Female     0.26   0.27      0.70   0.69

Confused
  All                         0.59   0.52      0.64   0.65
  All Male                    0.60   0.58      0.66   0.66
  All Female                  0.71   0.74      0.66   0.61
  All Right-Handed            0.57   0.50      0.67   0.68
  All Left-Handed             0.37   0.38      0.30   0.29
  All Right-Handed Male       0.51   0.51      0.70   0.70
  All Right-Handed Female     0.81   0.77      0.61   0.61

Bored
  All                         0.45   0.52      0.59   0.59
  All Male                    0.59   0.476     0.69   0.68
  All Female                  0.49   0.48      0.71   0.70
  All Right-Handed            0.61   0.61      0.63   0.63
  All Left-Handed             0.38   0.35      0.27   0.23
  All Right-Handed Male       0.53   0.47      0.67   0.68
  All Right-Handed Female     0.43   0.43      0.75   0.75

Interested
  All                         0.39   0.53      0.63   0.52
  All Male                    0.46   0.43      0.74   0.56
  All Female                  0.54   0.36      0.53   0.43
  All Right-Handed            0.40   0.50      0.63   0.53
  All Left-Handed             0.44   0.47      0.62   0.40
  All Right-Handed Male       0.50   0.46      0.63   0.59
  All Right-Handed Female     0.62   0.40      0.61   0.50

Aside from the accuracy, the Area Under the Curve (AUC), as an additional measure of performance, is also shown in Table 2. The same general trend can be observed, with the AUC for the MLP being higher than that of the Decision Tree. Furthermore, whereas the AUC of the MLP for the entire dataset is 0.57, the AUC for the All-Right-Handed-Male and All-Right-Handed-Female datasets are 0.65 and 0.64 respectively.

5


4 Summary and Conclusion

This study provides baseline data and information that will be useful for the future design of affect-aware computer-based learning systems targeted at academically gifted learners.

The initial classifiers built using decision trees showed very limited ability to predict the academic emotions, with a mean accuracy rate for the four academic emotions of only 0.48. The prediction rate improves significantly, however, by using Multi-Layered Perceptrons (MLP) instead, yielding a mean accuracy of 0.60. The prediction performance is shown to further improve when training and testing are restricted to specific categories of learners, i.e. based on gender as well as hand-dominance. For the MLP classifiers, prediction performance consistently yields an accuracy of higher than 0.60, to as high as 0.75, for each of the four academic emotions when train and test datasets have been restricted to all right-handed males or all right-handed females.

Aside from the accuracy, the Area Under the Curve (AUC), as an additional measure of performance, is also reported. The same general trend is observed, with the AUC for the MLP being higher than that of the Decision Tree. Also, whereas the AUC of the MLP for the entire dataset is 0.57, the AUC for the All-Right-Handed-Male and All-Right-Handed-Female datasets are 0.65 and 0.64 respectively.

5 References

1. Forgas, J.P.: Network Theories and Beyond. In: Handbook of Cognition and Emotion, pp. 589–611. John Wiley & Sons, Ltd (1999).

2. Davidson, R.J., Ekman, P., Saron, C.D., Senulis, J.A., Friesen, W.V.: Approach-withdrawal and cerebral asymmetry: emotional expression and brain physiology. I. Journal of Personality and Social Psychology 58, 330–341 (1990).

3. Lee, S.Y., Olszewski-Kubilius, P.: A Study of Instructional Methods Used in Fast-Paced Classes. The Gifted Child Quarterly 50, 216–235, 273 (2006).

4. Nicaud, J.-F., Bouhineau, D., Huguet, T.: The Aplusix-Editor: A New Kind of Software for the Learning of Algebra. In: Cerri, S., Gouardères, G., Paraguaçu, F. (eds.) Lecture Notes in Computer Science, pp. 178–187. Springer, Berlin/Heidelberg (2002).

5. Baker, R., Walonoski, J., Heffernan, N., Roll, I., Corbett, A., Koedinger, K.: Why Students Engage in "Gaming the System" Behavior in Interactive Learning Environments. Journal of Interactive Learning Research 19, 185–224 (2008).

6. Emotiv EPOC Headset, http://www.emotiv.com

7. Ekman, P.: Expression and the Nature of Emotion. In: Scherer, K., Ekman, P. (eds.) Approaches to Emotion, pp. 319–344. Erlbaum, Hillsdale, NJ (1984).

8. Levenson, R.W.: Emotion and the Autonomic Nervous System. A Prospectus for Research on Autonomic Specificity. In: Wagner, H.L. (ed.) Social Psychophysiology and Emotion: Theory and Clinical Applications, pp. 17–42. John Wiley & Sons, Hoboken, NJ (1988).

9. Chanel, G.: Emotion Assessment for Affective Computing Based on Brain and Peripheral Signals (2009).

10. RapidMiner Tool, http://www.rapidminer.com

6


Emotional Transitions in Driving

Pierre Olivier Brosseau, Thi Hong Dung Tran, Claude Frasson

Université de Montréal, Département d'informatique et de recherche opérationnelle

2920 Chemin de la Tour, Montréal, H3T-1J4, Canada pierre-olivier.brosseau,mylife.tran, [email protected]

Abstract: Emotions resulting from driving situations can have an impact on safety, both for drivers and for passengers of the car. In this experiment we have built a virtual driving environment able to detect and assess emotions felt by a driver. For that purpose we use EEG systems, with the driver immersed in virtual emotional driving situations, and we observe how emotions evolve from one situation to another. According to the situation and the driver's profile, different advice is given by an agent to calm the corresponding emotions.

Keywords: Emotions, Simulation, EEG, Driving, Emotional Transitions

1 Introduction

In driving situations, emotions can arise and have an impact on the driver's reactions [1]. It seems that road rage concerns more than 16 million drivers in the United States [2]. Generally, emotions that increase reaction time in driving situations are the most dangerous. Cai et al. [3] found that anger and excitement, in a scenario involving several drivers, caused an increase in heart rate, breathing and skin conductivity. More specifically, drivers who are not in the neutral state commit infractions more often. Work undertaken by a team at the Institute for Human-Machine Communication in Munich confirms the influence of the affective state on driver performance. Jones and Jonsson [4] have presented a method to identify five emotional states of the driver during simulations. They used neural networks as classifiers, but did not study the impact of ambient noise. In this paper we address the following questions: How do we measure or estimate the emotion of the driver in certain situations? How can we reduce these emotions? The use of electroencephalogram (EEG) sensors is precise and is the most up-to-date technology [5]; EEG signals are able to detect emotions and cerebral states which can highlight what happens in the brain.

2 The Emotional Car Simulator

To generate and assess emotions in a driving situation, we have built a virtual environment able to simulate specific driving situations including sources of emotions. The virtual environment takes the form of a game in which the player is a driver who

7


is presented with a variety of realistic situations (nine scenarios likely to provoke emotions) that anybody could experience every day in traffic. The emotion corrector, represented by a virtual emotional agent, is intended to reduce the emotions of the driver by giving advice. To collect the data we used the EPOC headset built by Emotiv. EPOC is a high-resolution, multi-channel, wireless neuroheadset which uses a set of 14 sensors plus 2 references to tune into electric signals produced by the brain and detect the user's thoughts, feelings and expressions in real time. From the Emotiv EPOC we detect four primary emotions: boredom, excitement, frustration and meditation. For example, the following scenarios show 1) a participant who has to find a place in a public parking lot: there is only one place left and, before the participant can reach it, another car takes it, so the participant has to look around to find another place (Figure 1); and 2) a fire truck comes from behind and starts its siren: the participant has to move his car to the right and stay immobilised until the fire truck is gone (Figure 2).

Figure 1. The parking slot (Scenario #1)    Figure 2. The Fire Truck (Scenario #2)

In scenario 1, when the driver is looking around to find a place to park, we observe the following emotional transitions: 85% of participants who were excited remained excited, while a significant 68% of excited participants transited to frustration. 73% of participants who were in the engagement state remained engaged, and 78% transited to a state of excitement. 89% of participants who were engaged became frustrated, and 89% of participants who were frustrated became bored.

References

1. Grimm, M., Kroschel, K., Harris, H., Nass, C., Schuller, B., Rigoll, G., Moosmayr, T.: On the Necessity

and Feasibility of Detecting a Driver’s Emotional State While Driving, Affective Computing and Intelligent Interaction, Lecture Notes in Computer Science Volume 4738, 2007, pp 126-138 (2007).

2. CNN News Health Study: 16 million might have road rage disorder, June 5, 2006. http://www.cnn.com/2006/HEALTH/06/05/road.rage.disease.ap/

3. Cai, H., Lin, Y., Mourant, R. R., 2007, Study on Driver Emotion in Driver-Vehicle-Environment Systems Using Multiple Networked Driving Simulators. DSC 2007 North America – Iowa City – September 2007.

4. Jones, C., Jonsson, I.M.: Automatic recognition of affective cues in the speech of car drivers to allow appropriate responses. In: Proc. OZCHI (2005).

8


5. Chaouachi, M., Jraidi, I., Frasson, C.: Modeling Mental Workload Using EEG Features for Intelligent Systems. User Modeling and User-Adapted Interaction, Girona, Spain, 50-61(2011).

9


Smart headbands for monitoring functional brain activity

James Dieffenderfer1, Mychael Chance Bair1, Justis Peters1, Andrew Krystal2 and Alper Bozkurt1

1 Department of Electrical and Computer Engineering, North Carolina State University, Raleigh, NC 27695-7911, United States

2 Department of Psychiatry, Duke University, Durham, NC 27710, United States

Abstract. This paper presents our efforts towards a smart headband with an incorporated, miniaturized wireless functional near infrared spectroscopy system as a sensor node for wearable body area networks. We also present our efforts towards incorporating an EEG amplifier into this headband. The built prototype was used to successfully transmit the deoxygenation of forearm muscle tissue during pressure-cuff-induced ischemia over an established Bluetooth link. The system can run for around 5 hours continuously on the provided 90 mAh lithium polymer batteries and can transmit data over distances of more than 75 meters.

Keywords: Near infrared spectroscopy, body area networks, Bluetooth, physiological sensing.

1. Introduction

Metabolic activity in human tissue involves the transformation of energy and matter for survival and functioning. Metabolic state is often directly reflected in local oxygenation levels, which in turn influence the tissue's optical properties. As oxygenated (HbO2) and deoxygenated (Hb) hemoglobin demonstrate different optical properties in the near infrared (NIR) region, functionally induced changes in these can be detected through functional optical imaging methods [1]. Functional near infrared spectroscopy (fNIRS) is one such method: it uses the absorption and scattering spectra of hemoglobin in its different states and provides a noninvasive, portable and low-cost way of characterizing functional change in tissue oxygenation [2]-[3]. fNIRS would provide a useful sensing node for wireless body area networks (wBAN) by assessing local and global oxygenation information of muscle and brain tissue during daily life activities. We present here our initial efforts towards a smart headband with fNIRS capability to wirelessly transmit local oxygenation information through a Bluetooth link.


Page 15: Workshop on Utilizing EEG Input in Intelligent Tutoring ...frasson/FrassonPub/ITS-2014-W5...Aplusix [4] and Scatterplot [5], and the participants were made to wear an Emotiv headset

2. System Description

The fNIRS system can be described as two parts: the wireless fNIRS headband and the base station. The microcontroller sends time-multiplexed pulse trains to the light emitting diode (LED) drivers, which output intensities depending on the pulse width. The 660 nm LED pulses at a frequency of 1 kHz for 30 ms, then the 940 nm LED pulses for 30 ms at the same frequency, and finally both LEDs are turned off for 40 ms before the cycle is repeated. The backscattered light from the tissue is collected by two photodiodes on either side of the LED package. The photodiode output is first amplified and then filtered before being input to the microcontroller.

2.1 Electronic Control Layer

The electronic control layer is the top layer of the rigid PCB (Figure 1A). It is connected to the smart headband and contains three different circuit stages. The first stage is the microcontroller and radio. We used a single-chip solution, the CC2540 (Texas Instruments), combining an 8051 microcontroller with a radio-frequency transceiver. Coming in a 6x6 mm2 package, the CC2540 provides 8 KB of RAM and up to 256 KB of flash memory. It also provides tailored software to fit in with 2.4 GHz Bluetooth standards to establish connections with computers and smartphones. Optimizing the power budget is also possible with its flexible power modes. The second stage of the control layer consists of two LED driver circuits, as the microcontroller itself is unable to sink enough current (10-100 mA) to turn the LEDs on. The CC2540 lacks digital-to-analog converters; therefore the driver circuit provides variable intensity from the LEDs via the duty cycle of pulse-width-modulated signals sent from the CC2540.

2.2 Incorporating EEG to the Headband

We are working towards adding one-channel EEG recording to our headband system prototype, where the EEG system and its related amplification (based on an INA118) and filter circuits were connected to the CC2540. A representative eye closing/opening test is presented in Figure 2.

Fig. 1. Circuitry for the smart bandage in the form factor of a headband. (a) Rigid PCB sitting in the headband. (b) Silicone cushion on the face of the device.

Fig. 2. A smart bandage with EEG capability to be incorporated into the headband system.

References

1. D. Boas, M. Franceschini, A. Dunn, G. Strangman, "Noninvasive Imaging of Cerebral Activation with Diffuse Optical Tomography," CRC Press, pp. 193-221, 2002.
2. A. Villringer, B. Chance, "Non-invasive optical spectroscopy and imaging of human brain function," Trends in Neurosciences, 20:10, pp. 435-442, Oct. 1997.
3. A. Bozkurt, A. Rosen, H. Rosen, B. Onaral, "A portable near infrared spectroscopy system for bedside monitoring of newborn brain," Biomedical Engineering Online, 4:1, p. 29, Jan. 2005.


A Study of Learner’s Mental Profile in Different Categories of Tasks

Ramla Ghali and Claude Frasson

Département d’informatique et de recherche opérationnelle Université de Montréal

2920 Chemin de la Tour, Montréal Québec, Canada, H3C 3J7

ghaliram, [email protected]

Abstract: Adapting learning according to the learner's profile is an essential characteristic of Intelligent Tutoring Systems. Several studies have proposed different approaches to that aim, based mainly on the learner's individual traits, performance and emotions. However, few studies have considered how the learner's behavior varies with the nature and type of the presented task. In this paper, we focus on the learner's mental profile, derived from electroencephalogram signal analysis and classification, across different cognitive tasks. These tasks fall into three main categories (memory, concentration and reasoning) and are organized by increasing difficulty level, from easiest to hardest. Preliminary results show that learner performance depends on the category of a task. Furthermore, some mental states (engagement and workload) are correlated with the cognitive task category.

Keywords: Memory cognitive tasks, concentration cognitive tasks, reasoning cognitive tasks, EEG, engagement, workload.

1 Introduction

In Intelligent Tutoring Systems (ITS), adapting learning according to the learner's profile is a fundamental criterion of intelligence. Several researchers have suggested spending more effort on defining a precise architecture for the learner's profile and adapting learning to the different components of this profile [8, 9]. Defining a precise and stable profile for a learner is nevertheless very challenging, because learning is a complex process. It can be influenced by several factors: external factors related to the environment (interface quality, course organization, etc.) and internal factors related to the learner (current emotions while learning, motivation, engagement in a task, etc.). All these factors can directly influence the learner's performance and consequently learning success. Thus, many studies from different disciplines (artificial intelligence, human-computer interaction, cognition and neuroscience) have focused on detecting and assessing users' mental profile through different approaches, most notably electroencephalogram (EEG) signal analysis and classification [3, 4, 6]. Most of these systems rely on two fundamental mental metrics: mental workload and mental engagement. Mental workload refers to the portion of operator information processing capacity or resources actually required to meet system demands [5]. Mental engagement is related to the level of mental vigilance, attention and alertness during the task. However, most of these approaches take into consideration neither the brain's specificities nor the type of cognitive task involved.

In this paper we aim to assess how the learner's mental states vary across different categories of a set of cognitive tasks that we developed. These mental states are derived from the signal processing and analysis provided by the B-Alert software [1]. We formulate the following hypotheses: 1) the type of a task has an impact on the learner's performance; 2) the type of a task influences the learner's mental states, producing more or less cognitive workload and engagement.

2 Related Work

To date, several studies have tried to detect, assess and predict the evolution of learner states that can influence learning during interaction with e-learning environments, such as emotions, motivation and behavior. For example, to detect whether a student is engaged in a task, Beck [2] built a student model based on three parameters (response time, question difficulty and correctness of the answer). This model traces the learner's engagement by calculating a probability based on the learner's previous performance and behavior. Johns and colleagues [7] used dichotomous Item Response Theory (IRT) models to estimate students' proficiency in answering multiple-choice questions. These approaches are mainly based on learner statistics gathered while interacting with a system. Other researchers have turned to physiological sensors, most notably electroencephalography (EEG), to detect learner engagement and disengagement. For example, Pope [10] developed an EEG engagement index based on brainwave band power spectral densities and applied it in a closed-loop system to modulate task allocation. Performance improvement was reported when this engagement index drove the task allocation mode (manual or automated). The index has also proved effective for detecting learner attention and vigilance in learning tasks [4]. Furthermore, Stevens et al. [11] explored the feasibility of monitoring EEG indices of engagement and workload measured during the performance of cognitive tests. Results showed an increase in engagement and workload during the encoding period of verbal and image learning and memory tests compared with the recognition period. They also showed that workload increased linearly with the level of difficulty.

Moreover, Galan and Beal [6] positively evaluated the use of EEG for estimating attention and cognitive load (workload) during math problems; a combination of the engagement and workload measures established by Stevens and colleagues could be used to predict learner success or failure. In the same vein, we propose in this work a sensor-based approach to track the learner's mental evolution across different categories of cognitive tasks. We also use the two metrics proposed by Stevens et al. [11] to track the evolution of the learner's mental states. However, we think that these two metrics depend not only on the difficulty of a proposed task but also on its nature and type. To test this assumption, we developed a set of different categories of cognitive tasks. We also conducted an experiment to gather learner EEG data. Our main goal in this first experiment was to study how the learner's mental states (essentially engagement and workload) evolve according to the nature of a proposed task, using the B-Alert software [1]. In the following, we briefly describe these cognitive tasks.
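The engagement index introduced by Pope et al. [10] is commonly computed as the ratio of beta band power to the sum of alpha and theta band powers. The B-Alert software's internal metrics are proprietary; the sketch below only illustrates the published index:

```python
import numpy as np

def engagement_index(alpha, theta, beta):
    """Pope et al.'s EEG engagement index: beta / (alpha + theta).

    Arguments are band-power estimates (e.g. from a power spectral density),
    scalars or arrays of the same shape (one value per epoch or channel)."""
    alpha, theta, beta = map(np.asarray, (alpha, theta, beta))
    return beta / (alpha + theta)
```

In a closed-loop setting, a rising index (more beta power relative to slow-wave power) is read as higher engagement and can drive task allocation decisions.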

3 Cognitive Tasks Categories

We developed a set of 2D cognitive tasks for studying learners' performance and brain behavior. This set contains three categories of tasks (memory, concentration and reasoning) and three difficulty levels (easy, medium and hard) presented in ascending order. The user can freely choose the category and the task to perform each time, but has to complete every task at least once. The tasks are thus ordered differently according to the user's choices and are grouped by task category. Each category includes two to three subcategories of different tasks. In what follows, we present these tasks ordered by category name.

3.1 Memory

This category is based mainly on the popular Digit Span task. In this task, we show the learner a series of numbers and ask him to remember and type them afterwards. We implemented two versions of this task: Forward Digit Span (FDS), in which the numbers must be typed in the same order in which they appeared on the screen, and Backward Digit Span (BDS), in which they must be typed in the reverse order of their appearance. Each version has six difficulty levels ordered from easiest (L1) to hardest (L6).
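The FDS/BDS mechanics just described can be sketched as follows. The digits-per-level mapping is a hypothetical choice made here purely for illustration, since the paper does not state the sequence length at each level; the all-or-nothing scoring matches Section 5.1:

```python
import random

def make_trial(level, rng=None):
    """Generate the digit sequence for a difficulty level (1..6).

    Assumes, for illustration only, that level n presents n + 2 digits."""
    rng = rng or random.Random()
    return [rng.randint(0, 9) for _ in range(level + 2)]

def score(trial, answer, backward=False):
    """All-or-nothing scoring: 1 point for a fully correct response, else 0.

    For BDS (backward=True) the correct response is the reversed sequence."""
    target = list(reversed(trial)) if backward else list(trial)
    return 1 if list(answer) == target else 0
```

For example, `score([1, 2, 3], [3, 2, 1], backward=True)` counts as a correct BDS response.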

3.2 Concentration

This category contains two subcategories of concentration tasks, Feature Match and Rotations, described below.

3.2.1 Feature Match (FM)
This task consists in identifying whether the two images appearing on the screen are identical or not according to their shapes, numbers and colors. It also has six difficulty levels (L1 to L6), which vary in the number and geometry of the shapes (see Figure 1).

Figure 1 Example of Feature Match for level 6


3.2.2 Rotations (RT)
This task is similar to the previous one. It has five difficulty levels, which vary in the complexity of the image content (number of shapes). It consists in identifying whether two images match once rotated.

3.3 Reasoning

This category contains three sets of reasoning tasks: Arithmetic Addition, Odd One Out, and Intuitive Reasoning.

3.3.1 Arithmetic Addition (AA)
In this task, we ask the learner to add two numbers. Like the other tasks, this one has three difficulty levels; at each level we vary the number of digits to add, from 2 up to 4.

3.3.2 Odd One Out (OO)
This task has three difficulty levels, each with a fixed series of images. Every series has a certain correspondence between images (color, shape, number, etc.) and one odd image out, which differs in one or more characteristics (see Figure 2). The learner has to identify the odd image in each series.

Figure 2 Example of Odd One Out task

3.3.3 Intuitive Reasoning (IR)
This task has three difficulty levels (varying according to a time constraint: unlimited, 1 min and 30 s) and 15 series in total, with 5 series of exercises per level. Unlike the other tasks, this task is based on intuitive or analogical reasoning (see Figure 3).

Figure 3 Example of a series of intuitive reasoning task

4 Experiment

In order to study how learners' mental states vary across task categories, twenty participants (9 women and 11 men, mean age = 28, standard deviation = 4.67) were invited to play our cognitive tasks. The study lasted about 2 hours, divided into three steps: (1) we first installed the B-Alert X10 headset on the participant to set up the EEG; (2) the participant performed the 3 baseline tasks defined by the headset's manufacturer [1] to calibrate the classification of mental states; (3) the participant finally played our set of cognitive tasks, composed of the 3 categories described above.

Throughout the experiment, the electroencephalogram (EEG) was recorded using a Wi-Fi cap with two linked mastoid references. Nine sensors (F3, Fz, F4, C3, Cz, C4, P3, POz and P4) were placed on the participant's head following the international 10-20 system. The EEG was sampled at 256 Hz and converted to power spectral densities (alpha, beta, theta and sigma), then processed by the B-Alert software [1]. This software provides a real-time classification of certain mental states (sleep onset, distraction, low engagement, high engagement and high workload). From this set, we selected the workload and engagement states for this study. We then synchronized the EEG mental states of engagement and workload with all the categories of cognitive tasks, using timestamps from the learners' log files (recorded while they accomplished the cognitive tasks) and from the B-Alert software (EEG mental states).
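The step of converting 256 Hz EEG samples into per-band power spectral densities can be sketched generically as follows. The actual B-Alert pipeline is proprietary; the Welch windowing parameters and band edges here are illustrative assumptions:

```python
import numpy as np
from scipy.signal import welch

FS = 256  # sampling rate used in the experiment (Hz)
BANDS = {"theta": (4, 8), "alpha": (8, 13), "sigma": (12, 16), "beta": (13, 30)}

def band_powers(eeg, fs=FS):
    """Average PSD per band for one channel of raw EEG (1-D array)."""
    # 1-second Hann windows give 1 Hz frequency resolution.
    freqs, psd = welch(eeg, fs=fs, nperseg=fs)
    return {name: psd[(freqs >= lo) & (freqs < hi)].mean()
            for name, (lo, hi) in BANDS.items()}
```

Values like these, computed per epoch and per sensor, are the inputs a classifier can turn into mental-state labels such as engagement and workload.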

5 Statistical Results

We recall that in this work we consider the following points: (1) examine how the learner's performance varies depending on the task category; (2) study how the mental states vary depending on the task category.

5.1 Learners’ performance and category of task

First, for each category and each task, we computed the average of the learners' scores as well as their standard deviation (see Table 1). Task scores are calculated as follows: each correct answer is worth 1 point and each incorrect answer 0. Then, for each task, we calculated the percentage of the total score achieved in the task (TST: Total Score in Task), as well as the percentage of the total score achieved in the task category (TSTC: Total Score in Task Category).

Table 1 Distribution of scores between tasks

Task Category   Mean (SD) of TSTC   Task Name   Mean (SD) of TST
Memory          65.66 (2.46)        FDS         64.61 (3.14)
                                    BDS         67.64 (3.96)
Concentration   81.90 (1.35)        FM          82.20 (1.70)
                                    RT          81.32 (2.24)
Reasoning       58.14 (2.52)        AA          67.79 (4.47)
                                    OO          63.03 (5.85)
                                    RI          49.47 (2.97)

From this table, we can notice that the concentration category has the best score percentage across all learners, with Feature Match the best single task, while the reasoning category is ranked last. This suggests that the task category affects learner performance: concentration tasks do not require much mental workload compared to reasoning tasks, which demand considerable concentration, memory work, arithmetic calculation, etc. To confirm this hypothesis, we conducted a one-way ANOVA test after checking the normal distribution of scores for each category (using SPSS Q-Q plots). This test yields a highly significant result (F(2,477) = 31.1, p = 0.000*). We can therefore conclude that learner performance depends on the task category.
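A one-way ANOVA of this kind can be reproduced with standard tools. The sketch below uses small hypothetical groups of per-learner scores, since the raw data behind the reported F(2,477) statistic are not included here:

```python
from scipy.stats import f_oneway

# Hypothetical per-learner score fractions grouped by task category
# (illustrative values only, not the study's data).
memory        = [0.60, 0.65, 0.70, 0.62, 0.68]
concentration = [0.80, 0.85, 0.78, 0.83, 0.84]
reasoning     = [0.55, 0.52, 0.60, 0.58, 0.61]

# H0: all three categories share the same mean score.
f_stat, p_value = f_oneway(memory, concentration, reasoning)
```

A small p-value, as in the study, supports rejecting the hypothesis that performance is independent of task category.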

5.2 Learner’s mental profile and category of task

To analyze the relationship between mental states and task category, we first conducted a descriptive analysis comparing the distributions of engagement and workload per task category (see Table 2).

Table 2 Distribution of engagement and workload by task category

                Engagement (%)                  Workload (%)
Task Category   Min     Max     Mean (SD)       Min     Max     Mean (SD)
Memory          33.06   80.11   59.44 (1.09)    53.13   84.53   69.36 (0.75)
Concentration   36.14   92.15   61.14 (1.47)    21.43   77.66   60.78 (1.2)
Reasoning       33.34   98.70   64.86 (1.84)    26.02   79.23   66.09 (1.21)

From this table, we can see that the memory category is the most demanding: it has the highest workload (69.36%) compared to the other categories, while concentration is the easiest. Furthermore, the reasoning tasks elicit the most engagement, which can be explained by the fact that reasoning tasks are challenging and increase learners' interest and involvement.

Second, we ran one-way ANOVA tests on the mental states. For workload, the results are highly significant (F(2,224) = 18.33, p = 0.000*); for engagement, the results are also significant (F(2,224) = 3.32, p = 0.04*). Furthermore, we obtained a moderate correlation between the workload and engagement states (R = 0.4, p = 0.00*). We can therefore conclude that the workload and engagement states depend on the task category. This result is consistent with the idea that the learner's concentration and mental activity increase with the nature of the proposed task: the more interesting the task category, the more engaged the learner; and the more engaged the learner, the more he tries to reason and learn, so his workload increases.

6 Conclusion

In the present study we have shown that both the learner's performance and mental profile (composed of the engagement and workload states) depend on the category of the cognitive task involved. We can therefore confirm that the type or nature of a proposed learning task has a significant impact on the learner's mental states and consequently on his performance. This finding leads us to take the learner's cognitive capacity into consideration before proposing a task, and then to adapt learning according to the evolution of selected outputs derived from the EEG signal. More specifically, our future work will focus on adapting to the learner in real time according to the variation of his mental states and the type of the proposed task.


Acknowledgments. We acknowledge the Fonds Québécois de la Recherche sur la Nature et les Technologies (FQRNT), CRSH (LEADS project) and NSERC for funding this work.

References
1. Advanced Brain Monitoring, B-Alert X10, 2013. http://advancedbrainmonitoring.com/xseries/x10/
2. Beck, J.: Engagement tracing: using response times to model student disengagement. International Conference on Artificial Intelligence in Education, 88-95 (2005)
3. Berka, C., Levendowski, D.J., et al.: Real-Time Analysis of EEG Indexes of Alertness, Cognition, and Memory Acquired With a Wireless EEG Headset. International Journal of Human-Computer Interaction, 17, 151-170 (2004)
4. Chaouachi, M., Chalfoun, P., Jraidi, I., Frasson, C.: Affect and mental engagement: towards adaptability for intelligent systems. FLAIRS, 355-361 (2010)
5. Eggemeier, F.T., Wilson, G.F., et al.: Workload assessment in multi-task environments. In: Damos, D.L. (ed.) Multiple Task Performance, pp. 207-216. Taylor & Francis, London (1991)
6. Galan, F., Beal, C.R.: EEG estimates of engagement and cognitive workload predict math problem solving outcomes. Proceedings of UMAP, Montreal, Canada (2012)
7. Johns, J., Mahadevan, S., Woolf, B.: Estimating Student's Proficiency using an Item Response Theory Model. International Conference on Intelligent Tutoring Systems (2006)
8. Murray, T.: Authoring intelligent tutoring systems: an analysis of the state of the art. Journal of Artificial Intelligence in Education, 10, 98-129 (1999)
9. Oppermann, R., Rasher, R.: Adaptability and adaptivity in learning systems. Knowledge Transfer (1997)
10. Pope, A.T., Bogart, E.H., Bartolome, D.S.: Biocybernetic system evaluates indices of operator engagement in automated task. Biological Psychology 40, 187-195 (1995)
11. Stevens, R., Galloway, T., Berka, C.: EEG-Related Changes in Cognitive Workload, Engagement and Distraction as Students Acquire Problem Solving Skills. In: Conati, C., McCoy, K., Paliouras, G. (eds.) User Modeling 2007, vol. 4511, pp. 187-196. Springer, Berlin/Heidelberg (2007)


Exploring the Behavior of Novice Programmers' EEG Signals for Affect-based Student Modeling

Tita R. Herradura1, Joel P. Ilao2, Merlin Teodosia C. Suarez2

1College of Science and Computer Studies, De La Salle University-Dasmariñas, City of Dasmariñas, Philippines
[email protected]

2College of Computer Studies, De La Salle University, Philippines
joel.ilao, [email protected]

Abstract. Ten (10) first-year college programming students participated in the study and reported their emotions during a learning session. The Emotiv EPOC sensor was used to gather brainwave signals, and digital signal processing techniques such as filtering and transformation were used to preprocess the data. The study visualizes the behavior of the EEG signals for each academic emotion. In spectral plots, engagement exhibits higher amplitude for most of the participants while boredom exhibits lower amplitude; confusion and frustration show inconsistencies in their behavior. Statistical features were used for feature extraction, and several machine learning algorithms were used to classify the emotions. C4.5 achieved an accuracy rate of 97.91%.

1 Introduction

The study focuses on characterizing the behavior of students' emotions using brainwave signals. Using digital signal processing techniques, it visualizes the behavior of the EEG signals for each academic emotion. It is limited to beta waves, since these are associated with cognitive activity [1]. The study also explores the behavior of the frontal lobes in these beta waves. Recent EEG studies have shown that when a subject performs cognitive or judgment tasks that require keeping something in mind over a short period, a number of areas in the prefrontal cortex are active [2]. Since no research has explored the behavior of the frontal lobes in beta waves in a learning context, this work aims to fill that gap.
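Restricting the analysis to beta waves amounts to band-pass filtering each channel to roughly 13-30 Hz before further processing. A minimal sketch of that step, assuming a 128 Hz sampling rate (the study does not state the rate used):

```python
import numpy as np
from scipy.signal import butter, filtfilt

FS = 128  # assumed EEG sampling rate (Hz)

def beta_band(eeg, fs=FS, low=13.0, high=30.0, order=4):
    """Zero-phase band-pass filter isolating the beta band (13-30 Hz)."""
    # Normalize band edges to the Nyquist frequency for a Butterworth design.
    b, a = butter(order, [low / (fs / 2), high / (fs / 2)], btype="band")
    # filtfilt runs the filter forward and backward to avoid phase distortion.
    return filtfilt(b, a, eeg)
```

Spectral plots of the filtered channels then show how beta amplitude differs between the reported academic emotions.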

2 Methodology and Results

The engaged academic emotion shows higher energy for most of the students compared with the other emotions, which may imply that the student shows interest in the activity. Figure 1 shows the average amplitude of the students' reported emotions for the eight (8) frontal nodes. Boredom signals have lower amplitude, which may imply that boredom is associated with the absence of change in cognitive action (see Fig. 1). Confusion and frustration show inconsistencies in their behavior, which may imply that students have difficulty differentiating the two academic emotions, or that confusion and frustration signals behave similarly.

Six (6) statistical features were extracted to produce the dataset, and several classification algorithms were used to determine the classification accuracy. C4.5 achieved an accuracy rate of 97.91%, higher than the accuracy reported in [3]. This may be attributed to the data preprocessing techniques used to generate the dataset.
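The feature-extraction-plus-tree pipeline can be sketched as follows. The six statistics below are an assumed set (the paper does not list which six it used), and since C4.5 itself is not available in scikit-learn, an entropy-criterion decision tree is used as the closest stand-in:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def features(window):
    """Six simple statistical features for one EEG window (assumed set)."""
    w = np.asarray(window, dtype=float)
    return [w.mean(), w.std(), w.min(), w.max(),
            np.median(w), np.mean(np.abs(np.diff(w)))]

# Entropy-based splits approximate C4.5's information-gain criterion.
clf = DecisionTreeClassifier(criterion="entropy", random_state=0)
```

Training consists of calling `clf.fit(X, y)` on feature rows labeled with the self-reported emotion, then evaluating with cross-validation.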

3 Conclusion

Using digital signal processing techniques on EEG data provided relevant results for characterizing the behavior of learners' academic emotions. Engagement was found to have higher amplitude for most of the participants, while boredom exhibited lower amplitude; confusion and frustration showed inconsistencies in their behavior. These findings are dependent on the data obtained from the ten participants. Future work includes increasing the number of participants and identifying the personality traits of the students.

Acknowledgements. We thank all the participants in this study. We also thank Ms. Judith J. Azcarraga, the Center for Empathic Human-Computer Interaction and the Commission on Higher Education for all the support in conducting this research.

References

1. Boutros, N., Galderisi, S., Pogarell, O.: Standard Electroencephalography in Clinical Psychiatry: A Practical Handbook. Wiley (2011)
2. Ackerman, S.: Discovering the Brain. National Academies Press, Washington, DC (1991)
3. Mampusti, E.T., Ng, J.S., Quinto, J.J.I., Teng, G.L., Suarez, M.T.C., Trogo, R.S.: Measuring Academic Affective States of Students via Brainwave Signals. In: 2011 Third International Conference on Knowledge and Systems Engineering (KSE), pp. 226-231. IEEE (2011)

Fig. 1. Average amplitude of students' academic emotions


Classification of video game players using EEG and logistic regression with ridge estimator

Gustavo A. Lujan-Moreno, Robert Atkinson, George Runger, Javier Gonzalez-Sanchez and Maria Elena Chavez-Echeagaray

Arizona State University, School of Computing, Informatics and Decision Systems Engineering, Tempe, Arizona
{gustavo.lujanmoreno, robert.atkinson, george.runger, jgonza24, mchaveze}@asu.edu

Abstract. The objective is to classify a group of subjects playing a video game as experts or novices using electroencephalogram (EEG) signals as inputs. Analytical methods applied to multi-channel EEG recordings are described. A fast Fourier transform (FFT) is used to calculate the power spectral density for a number of bands (delta, theta, alpha and beta) and ratios (e.g., theta/beta). A regularized logistic regression learning algorithm (L2 penalty) was applied to the extracted features. We successfully classified 80% of the instances using 10-fold cross-validation.

Keywords: data mining, electroencephalogram (EEG), gaming, linear regression, logistic regression, machine learning, ridge estimators.

1 Introduction

In the past, the use of headsets to gather brainwaves was restricted mainly to health applications. Lately, the applications have expanded to other areas such as tutoring systems, learning environments and video games [5]. One study, for example, used a Microsoft Kinect and EEG to isolate the body movements of participants while they played a virtual ball game [2]. Concerns regarding usability, standardization, minimum setup time and ethical issues remain among the challenges of using these types of devices [5].

Although different devices and methods exist to monitor brain activity, such as electrocorticography (ECoG), electroencephalography (EEG), functional magnetic resonance imaging (fMRI), functional near-infrared spectroscopy (fNIRS), positron emission tomography (PET), magnetoencephalography (MEG), near-infrared spectroscopy (NIRS), and intracortical electrodes (ICE) [2][5], EEG has consistently been shown to be accurate and convenient for practical purposes [5].

Furthermore, different areas of EEG application have been identified, such as device control, user-state monitoring, evaluation, training and education, cognitive improvement, and gaming and entertainment [5]. Several commercial EEG devices even try to estimate the affective state of users in real time, including frustration, meditation, fatigue, stress, drowsiness, distraction, task engagement and mental workload [6][7].


Past studies have tried to classify or predict an outcome based on physiological information. For example, recent studies have used non-parametric tests and regression analysis on predictors based on heart rate, skin conductance and EEG to assess learners' attention while overcoming obstacles [8].

In this study we use an EEG system to generate features that are then used to classify participants as experts or novices. For this purpose we use the video game Guitar Hero, which requires attention, rhythm and coordination. Video games have been shown to have a substantial influence on education, healthcare and even social change [3].

Our hypothesis is that experts and novices exhibit different cognitive processes when playing a video game. Novices normally go through a learning curve covering the video game interface, the purpose of the game and finally the skill itself. Experts, on the other hand, normally try to achieve a "flow" state; experts in different areas describe this feeling as if time and even the purpose of the game did not matter. Researchers have described this "flow" zone as an enjoyable and satisfying experience [4].

We present the process of classifying experts and novices using EEG signals as inputs. Furthermore, we approach the task from a statistical machine learning perspective, applying a logistic regression algorithm that uses ridge estimators.
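The modeling step just described (L2-penalized, i.e. ridge-style, logistic regression with 10-fold cross-validation) can be sketched with standard tools. The feature matrix below is synthetic and purely illustrative, standing in for the band-power and ratio features described in the abstract:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Hypothetical data: rows = EEG epochs, columns = band powers/ratios
# (e.g. delta, theta, alpha, beta, theta/beta); 1 = expert, 0 = novice.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (40, 5)),   # novice epochs
               rng.normal(1.0, 1.0, (40, 5))])  # expert epochs
y = np.array([0] * 40 + [1] * 40)

# penalty="l2" applies the ridge-style shrinkage to the coefficients;
# C is the inverse regularization strength.
clf = LogisticRegression(penalty="l2", C=1.0, max_iter=1000)
scores = cross_val_score(clf, X, y, cv=10)  # 10-fold CV, as in the abstract
```

The shrinkage keeps coefficient estimates stable when band powers are correlated, which is common across neighboring EEG bands.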

2 Methods

2.1 Game Environment

As previously mentioned, the Guitar Hero video game was used for this study. Guitar Hero involves holding a guitar interface while listening to music and watching a video screen. This type of video game uses a combination of graphics, multimedia and challenges that require players to develop different skills, providing a good context in which to elicit changes in the brainwave frequencies that users generate. Guitar Hero offers a scenario where subjects are challenged in different ways that demand skills related to a learning process, such as concentration as well as visual, motor and auditory skills. The objective of the game is to hit the right button(s) by watching the "notes" streaming on the screen. The user has five colored buttons to press on the guitar fingerboard. Both hands are needed to play, since the left hand presses the color buttons on the guitar arm and the right hand depresses a switch that resembles a guitar strum or string picking.

2.2 Participants and Design

We recruited 21 subjects from Arizona State University, of which 14 were men and 7 were women. Ages ranged from 18 to 28 years. Participants were compensated with $10 USD and had the option to leave the study at any time. Participants were asked to self-report their experience playing video games. A total of 8 participants were selected: 4 men and 4 women, aged 18 to 28 years. According to the self-report, four of them were classified as novices and four as experts. In this study a novice is defined as a person who does not normally play video games. On the contrary, an expert is defined as a person who not only plays video games frequently but also considers himself proficient at playing Guitar Hero. The self-report was validated by the final score of the participants while playing Guitar Hero. Participants played two songs: an easy and a difficult one. The easy song, "Story of my life", had a length of 5 minutes and 40 seconds, 19 segments, and a total of 511 notes. The difficult song, "One", had a length of 7 minutes and 3 seconds, 25 segments, and a total of 2189 notes. We had a total of 16 data sets, one for each possible player-song combination. We used Weka 3.6.10 to perform the analysis, and the PSD was computed with EEGLAB, a Matlab toolbox for processing EEG data that performs time/frequency analysis.

2.3 Data Sources: EEG

Several studies have shown that physiological measures are reliable sources for measuring learners' attention [14]. In this case, we used a brain-computer interface (BCI) device to capture neural oscillations, also known as brain-wave signals. Neural oscillations are generated by neural tissue and can vary in frequency, phase and amplitude.

For the EEG system we used the Emotiv EEG headset, a high-resolution, multi-channel, wireless portable EEG system. It has 14 EEG channels named according to the international 10-10 locations: AF3, F7, F3, FC5, T7, P7, O1, O2, P8, T8, FC6, F4, F8, AF4. The signal from the 14 channels is sampled at 128 SPS.

2.4 Feature generation

The signal from the 14 channels was filtered with a band-pass filter (0.2–45 Hz). Furthermore, the Emotiv software applies digital notch filters at 50 Hz and 60 Hz in order to remove environmental artifacts. Once we had the decontaminated raw EEG signal, we transformed the EEG data from the time to the frequency domain using the Fast Fourier Transform (FFT). In this study the Power Spectral Density (PSD) was calculated for the following bandwidths: delta, 0.1-4 Hz; theta, 4-8 Hz; alpha, 8-14 Hz; and beta, 14-30 Hz. The time/frequency decomposition was done for each of the 14 channels. Two ratios were also computed: theta/beta and delta/beta. Studies have shown that the theta/beta ratio may provide useful information in the study of affective and emotional regulation [2]. Additionally, power density ratios in frequency bands have been studied in neuroscience; for example, a study showed that slow wave/fast wave (SW/FW) ratios increase in subjects with attention deficit hyperactivity disorder (ADHD) [2]. Since we had 8 participants and each one played 2 songs, we had a total of 16 combinations. However, one dataset was discarded due to problems with the timestamp. Consequently, with 15 rows of information, we decided to average the power of each of the bandwidths and the two ratios in order to have only 6 variables.
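The band-power features described above can be sketched as follows. This is only a minimal illustration using SciPy's Welch PSD estimate rather than the EEGLAB pipeline the study actually used; the array shapes and function names are our assumptions, not the study's code.

```python
import numpy as np
from scipy.signal import welch

FS = 128  # Emotiv sampling rate, samples per second
BANDS = {"delta": (0.1, 4), "theta": (4, 8), "alpha": (8, 14), "beta": (14, 30)}

def band_powers(eeg, fs=FS):
    """Mean PSD per band, averaged over channels and frequency bins.

    eeg: array of shape (n_channels, n_samples) of filtered EEG.
    """
    freqs, psd = welch(eeg, fs=fs, nperseg=2 * fs, axis=-1)
    return {name: psd[:, (freqs >= lo) & (freqs < hi)].mean()
            for name, (lo, hi) in BANDS.items()}

def feature_row(eeg):
    """One row of the 6-variable feature representation per player-song pair."""
    p = band_powers(eeg)
    p["theta/beta"] = p["theta"] / p["beta"]
    p["delta/beta"] = p["delta"] / p["beta"]
    return p
```

Averaging over the whole recording, as here, is what reduces each player-song combination to a single row of 6 features.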


Therefore, the logistic regression model would only need to compute 7 parameters (6 variables plus the intercept).

2.5 Logistic regression with ridge estimators

Logistic regression is a widely used method for classifying binary data. If we define Y_i = 1 as the event of classifying subject i as an expert and Y_i = 0 otherwise, then the probability that Y_i = 1 given the value X_i = (X_{i1}, …, X_{i7}) can be defined as:

p(X_i) = exp(Σ_{j=1}^{7} β_j X_{ij}) / (1 + exp(Σ_{j=1}^{7} β_j X_{ij}))

Normally, the optimal value for β would be found by maximizing the log-likelihood function:

l(β) = Σ_i [ Y_i log p(X_i) + (1 − Y_i) log(1 − p(X_i)) ]

Maximizing this yields the well-known MLE for β. Le Cessie and van Houwelingen (1992) proposed the same approach, but this time including a penalty λ on the norm of β:

l_λ(β) = l(β) − λ‖β‖²

The ridge parameter λ controls the length of the norm of β. Setting λ = 0 is equivalent to the ordinary MLE; on the other hand, as λ → ∞ all parameters β_j tend to 0. The effect in the logistic regression model is the same as in linear regression: we allow a small bias in the parameters β_j in exchange for reduced variance and a more stable model, especially for predictions. The optimal value of the ridge parameter is found through cross-validation such that the mean error rate is minimal. For more details refer to [10].
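A penalized fit of this kind can be sketched with scikit-learn, whose LogisticRegression applies an L2 (ridge) penalty with strength C roughly playing the role of 1/(2λ). The paper itself used Weka's implementation; the data below are random stand-ins, not the study's measurements.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in for the 15 x 6 matrix of averaged band powers and ratios.
rng = np.random.default_rng(42)
X = rng.standard_normal((15, 6))
y = np.array([1] * 7 + [0] * 8)  # 1 = expert, 0 = novice

# L2-penalized (ridge) logistic regression; a small lambda such as 0.001
# corresponds to a large C under the C ~ 1/(2*lambda) mapping.
pipe = make_pipeline(
    StandardScaler(),  # standardize inputs to mean 0, sd 1
    LogisticRegression(penalty="l2", C=1.0 / (2 * 0.001), solver="lbfgs"),
)

# Tune the penalty by cross-validation, minimizing misclassification.
search = GridSearchCV(pipe, {"logisticregression__C": [0.1, 1.0, 10.0, 500.0]},
                      cv=StratifiedKFold(3))
search.fit(X, y)
```

Grid-searching over C is the scikit-learn analogue of the cross-validated choice of λ described above.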

3 Experiments and results

As we previously mentioned, out of the 16 possible combinations we used 15, due to problems with the timestamp log in one of the expert-hard data sets. Each input was standardized to zero mean and unit variance. The ridge parameter was tuned using cross-validation, and we found that the algorithm performed well with λ = 0.001. The logistic regression algorithm with ridge estimators and 10-fold cross-validation was able to correctly classify 80% of the samples. The true positive rate for class 1 (expert) was 0.86, with 6 experts classified as experts and 1 expert classified as a novice. On the other hand, the true positive rate for class 0 (novice) was 0.75, with 6 novices correctly classified and 2 novices classified as experts. These results can be seen in Table 1.

Table 1. Accuracy by class and confusion matrix

No. Instances          15
Accuracy               80%
TP Rate Class 1        0.86
TP Rate Class 0        0.75

                 Classified as:
                    1     0
Actual    1         6     1
          0         2     6

Table 2 shows the coefficients for the 7 parameters (6 variables plus intercept), alongside the corresponding odds ratios. The odds ratio is the estimated multiplicative change in the odds of success p(Y_i = 1) for a one-unit change in the value of the variable x_j. We observe that the largest odds ratio belongs to the delta variable, and the second largest to the predictor beta. The two columns correspond directly, since the odds ratio can also be computed as e^{β_j}.

Table 2. Coefficients and odds ratios for logistic regression with ridge parameter

Variable      Coefficient (Class 1)   Odds ratio (Class 1)
delta               12.1738              193652.07
theta              -16.7003                   0
alpha                1.1019                   3.01
beta                 4.8335                 125.6488
Theta/Beta           2.3244                  10.2202
Delta/Beta          -0.013                    0.9871
Intercept           -4.5503

We can see that the variables with the largest coefficients are delta and theta; the significance for delta is Pr(> |t|) = 0.0207 and for theta is Pr(> |t|) = 0.0051. The interpretation of the coefficients in logistic regression is similar to that for linear regression: an increase in the delta variable will increase the chances of classifying a subject as an expert, assuming all other predictors are held constant, while an increase in the theta variable will decrease those chances.
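Since the odds ratios in Table 2 are just the exponentiated coefficients, the two columns can be checked directly; the values below use the published coefficients and agree with the odds-ratio column up to rounding.

```python
import math

# Coefficients for class 1 (expert), as reported in Table 2.
coefs = {"delta": 12.1738, "theta": -16.7003, "alpha": 1.1019,
         "beta": 4.8335, "Theta/Beta": 2.3244, "Delta/Beta": -0.013}

# Odds ratio = e^beta_j for each predictor.
odds = {name: math.exp(b) for name, b in coefs.items()}
# e.g. odds["beta"] is about 125.65 and odds["Delta/Beta"] about 0.9871,
# matching the table; the large negative theta coefficient gives an odds
# ratio indistinguishable from 0 at the table's precision.
```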

4 Conclusion and future work

We successfully classified experts and novices using logistic regression with ridge estimators. The final ridge parameter was tuned to 0.001, selected to minimize the cross-validation misclassification error. The results suggest that the cognitive process of an expert differs from that of a novice in a gaming context. The delta bandwidth (0.1-4 Hz) turned out to be significant, and an increase in this variable increases the chances of classifying a subject as an expert. On the other hand, the largest coefficient in absolute value belonged to the theta bandwidth (4-8 Hz). These conclusions shed some light on the importance of low frequencies when classifying experts and novices: experts tend to have higher delta power, while novices are more inclined to have higher power in the theta bandwidth. More studies with larger sample sizes are needed to confirm these results. One potential application of this finding is that we could certify a person as an expert not only by the number of training hours or the final score but also by analysis of brainwaves. This knowledge could be applied, for example, to determine whether a driver is really processing information as an expert during a driving test, with no need for a human evaluator, or whether a medical student is in "flow" or still struggling during surgery practice. The so-called "flow state" could then be described in terms of a model and not only as an abstract definition.

References

1. P. Putman, J. Peer, L. Maimari and S. Werff, "EEG theta/beta ratio in relation to fear-modulated response-inhibition, attentional control, and affective traits", Biological Psychology 83 (2010) 73-78.

2. G. Moitzi, I. Daly and G.R. Müller-Putz, "On the use of games for noninvasive EEG-based functional brain mapping", IEEE Transactions on Computational Intelligence and AI in Games, Vol. 5, No. 2, June 2013.

3. J.M. Quick, R.K. Atkinson and L. Lin, "The Gameplay Enjoyment Model", International Journal of Games and Computer Mediated Simulations, 4(4), 2012.

4. R. Berta, F. Bellotti, A. de Gloria, D. Pranantha and C. Schatten, "Electroencephalogram and Physiological Signal Analysis for Assessing Flow in Games", IEEE Transactions on Computational Intelligence and AI in Games, Vol. 5, No. 2, June 2013.

5. J.B.F. van Erp, F. Lotte and M. Tangermann, "Brain-Computer Interfaces: Beyond Medical Applications", Vol. 45, No. 4, 26-34, April 2012.

6. R.R. Johnson, D.P. Popovic, R.E. Olmstead, M. Stikic, D.J. Levendowski and C. Berka, "Drowsiness/alertness algorithm development and validation using synchronized EEG and cognitive performance to individualize a generalized model", Biological Psychology 87(2), 241-250.

7. C. Berka, D.J. Levendowski, M.N. Lumicao, A. Yau, G. Davis, V.T. Zivkovic, R.E. Olmstead, P.D. Tremoulet and P.L. Grave, "EEG correlates of task engagement and mental workload in vigilance, learning and memory tasks", Aviation, Space, and Environmental Medicine 78(5), B231-B244.

8. L. Derbali, P. Chalfoun and C. Frasson, "Assessment of learners' attention while overcoming errors and obstacles: an empirical study", AIED 2011, LNAI 6738, 39-46, 2011.

9. G.F. Wilson and F. Fisher, "Cognitive task classification based upon topographic EEG data", Biological Psychology 40 (1995) 239-250.

10. S. Le Cessie and J.C. Van Houwelingen, "Ridge Estimators in Logistic Regression", Journal of the Royal Statistical Society, Series C (Applied Statistics), Vol. 41, No. 1, 1992.


Predicting subsequent memory from single-trial EEG

Eunho Noh1, Grit Herzmann2, Tim Curran3, and Virginia R. de Sa4

1 Department of Electrical and Computer Engineering, University of California, San Diego
[email protected]
2 Department of Psychology, The College of Wooster
3 Department of Psychology and Neuroscience, University of Colorado Boulder

4 Department of Cognitive Science, University of California, San Diego

Abstract. We show that it is possible to successfully predict subsequent memory performance based on single-trial EEG activity before and during item presentation in the study phase. Many studies have shown that the EEG signals during encoding of pictures or words are different between the later remembered vs. forgotten trials [11, 9]. In addition to brain activity during encoding, it has also been found that the signals preceding the stimulus presentation are different between the later remembered vs. forgotten trials [7, 8, 10, 3, 2]. These differences in brain activity between the subsequently remembered and forgotten trials are often referred to as subsequent memory effects (SMEs).

EEG for this study was previously recorded during a visual memory task [4]. The experiment was divided into 8 blocks, where each block consisted of a study phase and a recognition phase. In the study phases, the participants were given pictures of birds and cars in different blocks. In the recognition phases, they had to discriminate these target items from random new items using a rating scale with 5 options (recollect, definitely familiar, maybe familiar, maybe unfamiliar, and definitely unfamiliar). The subjects were instructed to give recollect responses only when they had a conscious recollection of learning the item in the study phase.

Two-class classification was conducted on the recollected vs. unfamiliar trials by combining the pre- and during-stimulus information in the EEG signal. The pre-stimulus classifier utilized the spectral information in multiple frequency bands ranging from 4-40 Hz in the pre-stimulus period to predict good and bad brain states for memory encoding. The during-stimulus classifier combined the temporal and spectral information in the alpha band (7-12 Hz) during study item presentation to predict whether the encoding process was successful or not. The results from the individual classifiers were then combined to predict subsequent memory for each trial.

By combining the pre- and during-stimulus classifier outputs, we were able to achieve an overall classification accuracy (calculated over all trials from the 18 subjects available for the classification analysis) of 59.6%. The subject with the highest classification rate showed an accuracy of 71.1% and the subject with the lowest classification rate showed an accuracy of 51.8%. The combined classification results showed a 2% increase in performance over the individual pre- and during-stimulus classifier results. The pre-stimulus and during-stimulus classifiers each gave individual classification results significantly above chance with p < 0.05 for 9 subjects (where the threshold for chance performance was defined based on the total number of trials for each subject [5]). The combined classification results were significantly above chance with p < 0.05 for 13 of the 18 subjects.

A passive BCI system based on these classifiers could be developed to augment tutoring tools and tailor study scheduling to each individual's brain dynamics. The system would measure the brain activity of a user in order to infer the user's preparedness for learning and present study items at estimated optimal times. It would also monitor the brain activity during learning/encoding to assess whether the encoding process was successful or not. Items deemed unsuccessfully encoded could be re-presented for restudy purposes. As a long-term goal, we would like to explore the effects of long-term use of our system. We hypothesize that the implicit neurofeedback users would get from being presented study items only during predicted good brain states (for memory encoding) could help them learn to get into and remain in a receptive brain state more often. Thus it is possible that the system could be used to train students to be more effective learners. As requested in the Workshop on Utilizing EEG Input in Intelligent Tutoring Systems call for papers, the majority of the work has been previously published [6].
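The combination of per-trial classifier outputs described above can be illustrated as a simple late fusion. This is purely a sketch under our own assumptions: the weighted-average rule and the function name are hypothetical, not the published combination method [6].

```python
import numpy as np

def combine_predictions(pre_prob, during_prob, w=0.5):
    """Hypothetical late fusion of pre- and during-stimulus classifiers.

    Each input is a per-trial probability of 'later remembered' from one
    classifier; a weighted average of the two gives a combined score,
    thresholded at 0.5.  The published combination rule may differ.
    """
    p = w * np.asarray(pre_prob) + (1 - w) * np.asarray(during_prob)
    return (p >= 0.5).astype(int)  # 1 = predicted remembered
```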


ACKNOWLEDGEMENTS

This research was funded by NSF grants # CBET-0756828 and # IIS-1219200, NIH Grant MH64812, NSF grants # SBE-0542013 and # SMA-1041755, a James S. McDonnell Foundation grant, and the KIBM Innovative Research Grant. We would like to thank Dr. Marta Kutas and Dr. Tom Urbach for helpful comments.

References

1. Agresti, A., Caffo, B.: Simple and effective confidence intervals for proportions and differences of proportions result from adding two successes and two failures. The American Statistician 54(4), 280–288 (2000)

2. Fell, J., Ludowig, E., Staresina, B.P., Wagner, T., Kranz, T., Elger, C.E., Axmacher, N.: Medial temporal theta/alpha power enhancement precedes successful memory encoding: evidence based on intracranial EEG. Journal of Neuroscience 31(14), 5392–5397 (2011)

3. Guderian, S., Schott, B.H., Richardson-Klavehn, A., Duezel, E.: Medial temporal theta state before an event predicts episodic encoding success in humans. Proceedings of the National Academy of Sciences 106(13), 5365–5370 (2009)

4. Herzmann, G., Curran, T.: Experts' memory: an ERP study of perceptual expertise effects on encoding and recognition. Memory & Cognition 39(3), 412–432 (2011)

5. Müller-Putz, G., Scherer, R., Brunner, C., Leeb, R., Pfurtscheller, G.: Better than random? A closer look on BCI results. International Journal of Bioelectromagnetism 10(1), 52–55 (2008)

6. Noh, E., Herzmann, G., Curran, T., de Sa, V.R.: Using single-trial EEG to predict and analyze subsequent memory. NeuroImage 84, 712–723 (2014)

7. Otten, L.J., Quayle, A.H., Akram, S., Ditewig, T.A., Rugg, M.D.: Brain activity before an event predicts later recollection. Nature Neuroscience 9(4), 489–491 (2006)

8. Otten, L.J., Quayle, A.H., Puvaneswaran, B.: Prestimulus subsequent memory effects for auditory and visual events. Journal of Cognitive Neuroscience 22(6), 1212–1223 (2010)

9. Paller, K.A., Wagner, A.D.: Observing the transformation of experience into memory. Trends in Cognitive Sciences 6(2), 93–102 (2002)

10. Park, H., Rugg, M.D.: Neural correlates of encoding within- and across-domain inter-item associations. Journal of Cognitive Neuroscience 9, 2533–2543 (2010)

11. Sanquist, T.F., Rohrbaugh, J.W., Syndulko, K., Lindsley, D.B.: Electrocortical signs of levels of processing: perceptual analysis and recognition memory. Psychophysiology 17(6), 568–576 (1980)


Extracting temporal EEG features with BCIpy

Justis Peters, Sagar Jauhari, and Tiffany Barnes

North Carolina State University, Raleigh, NC
jpgeter2,sjauhar,[email protected]

Abstract. We present BCIpy, an open source toolkit1 written in Python, which focuses on temporal features in EEG (electroencephalography) recordings from a single channel. BCIpy extracts and charts features and trains a support-vector classifier (SVC) with these features. BCIpy is intended to support classifying subject responses to stimuli in intelligent tutoring systems, particularly when those responses include event-related potentials (ERPs). We present a case study of EEGs recorded while students read passages in the Project LISTEN reading tutor [1]. To test our hypothesis that ERPs distinguish between hard and easy passages, we transform the first second of the EEG signal with a rolling median, train an SVC, and evaluate its classification accuracy. We conclude with recommendations for study designs and data collection that would support more accurate detection of ERPs, potentially leading to successful classification using temporal features of EEGs.

Keywords: EEG, BCI, brain-computer interface, NeuroSky, ERP, event-related potential, SVM, SVC

1 Introduction

Accurately adapting the difficulty of exercises in an ITS can speed learning outcomes and maintain flow for the student, but traditional forms of assessment such as in-system quizzes and feedback questionnaires can interrupt flow and frustrate students. Ideally, an ITS would adapt difficulty without interrupting flow. As Tan states in his thesis, EEG could move us closer to this goal by aiding inference about the student's mental state and supporting personalization of instruction [8]. Such inference from EEG allows insight without interrupting flow.

Low-cost systems, such as those from NeuroSky, enable in situ studies with electroencephalography (EEG) at greater scale and lower cost. EEG provides rich information for personalization of instruction, but we must first understand how voltage samples from single-channel EEG systems relate to learning.

In this paper, we present BCIpy, an open source toolkit which focuses on temporal features in EEG recordings from a single channel. BCIpy extracts and charts features and can train a support-vector classifier (SVC) with these features. Our main goal was to support classifying subject response to stimuli in intelligent tutoring systems, particularly when those responses include event-related potentials (ERPs). BCIpy is open source, written in Python, licensed under the GPLv3, and its code is published on GitHub2.

1 See http://bcipy.org for code and documentation.

Identifying temporal patterns in EEG is difficult, because the underlying processes of EEG are non-stationary [4]. Classifying subject response to stimulus via EEG is also difficult, because neural activity includes many processes which are unrelated to engagement with the stimulus, and these processes create noise which may obscure the signal for the process under classification [7]. Research about ERPs typically addresses nonstationarity by registering the timeline of presentation with the timeline of EEG recording, and addresses noise by presenting the same stimulus multiple times and averaging the signal across multiple recordings. To use EEG in ITS, though, we want to maximize inference from the subject's first engagement with each stimulus. Therefore, we present rolling median as a means of averaging the signal from one trial. To address nonstationarity, we discuss the importance of precise timestamps, accurate measures of subject engagement with stimulus, and proper registration of EEG recordings with timestamps of subject engagement.

As a case study, we test the hypothesis that a rolling median over one second of EEG after stimulus is enough to differentiate subject response to easy and hard passages in the Project LISTEN reading tutor [1]. We focus on the first second because ERPs typically occur within the first second after subject engagement with a stimulus (e.g., N1, P2 and P300). Further, Chang et al. found evidence that the first second contains enough information to classify the difficulty of a passage a student reads aloud [1]. This evidence inspired our hypothesis and supported the idea that ERPs may be present in these data.

SVCs have been used in classification of EEG [9,11]. To test our hypothesis, we trained an SVC and found it had accuracy no better than chance. We include analysis and discussion of how careful data collection and temporal registration could support classification with temporal features of an EEG.

2 Data, Toolkit, and Case Study

Our research analyzes EEG data collected by Chang et al. [1] while students read passages in Project LISTEN's reading tutor, an ITS. Their pilot study recorded data from a NeuroSky MindSet, an inexpensive headset equipped with a single-channel, dry-contact EEG sensor. As Chang notes, the limitations of this headset are well balanced by the opportunities its convenience affords.

The participants were allowed to make body movements, as well as read at their own pace and click the next button on the screen [8]. Using these data, they trained binary classifiers to predict exercise difficulty with above-chance accuracy, thus demonstrating that one or more correlations exist between EEG signal and passage difficulty. Their experiment showed that EEG data could be used to classify the difficulty of reading passages with 41%-69% accuracy across different classification tasks, with chance performance at 50%.

2 BCIpy uses the Pandas data analysis library [5,6] for the data structures and functions it provides, and uses NumPy and SciPy [3] for matrix operations, SVCs, cross-validation, and statistics on classifier accuracy.

The EEG data were recorded at 512 Hz, but the timestamp was truncated to the second. Fortunately, the timestamps for presentation of stimulus included milliseconds. To align these with the EEG data, we treated the first EEG sample within a second as 0 ms and linearly interpolated the time between seconds.
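One way to implement this alignment is sketched below, assuming an array of per-sample timestamps truncated to whole seconds; the function name is illustrative rather than BCIpy's actual API.

```python
import numpy as np

def interpolate_timestamps(truncated):
    """Linearly interpolate sub-second timestamps.

    truncated: per-sample timestamps truncated to whole seconds (e.g. many
    consecutive 512 Hz samples all stamped with the same second).  The
    first sample within each second is treated as offset 0 ms and the
    remaining samples are spaced evenly up to the next second.
    """
    truncated = np.asarray(truncated, dtype=float)
    out = truncated.copy()
    for sec in np.unique(truncated):
        idx = np.flatnonzero(truncated == sec)
        out[idx] = sec + np.arange(len(idx)) / len(idx)  # evenly in [sec, sec+1)
    return out
```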

Table 1: Extracted features

Feature         Scope       Units    Description                        Hz
word count      stimulus    words    Number of words in passage.
is passage      stimulus    boolean  Stimuli with more than one word.
filtered EEG    timeseries  mV       Butterworth filter                 512
rolling power   timeseries  dB       power spectral density             8
rolling median  timeseries  mV       median                             10

We detail the extracted features in Table 1. The word count and is passage features are useful in filtering the tasks, in order to train classifiers that consider only certain subsets of tasks. They could also be used as features in training, particularly if the other features included in training have patterns in common regardless of word count, but include some features which are dependent on word count. For rolling window features, our default window size is 128 samples, which corresponds to 0.25 seconds' worth of 512 Hz data. The window size is configurable at runtime, via a window size parameter.

We hypothesize that the EEG while the subject reads text stimuli exhibits ERPs that are a function of the subject's mental processing of qualities within the text. An ERP is a notable deflection in the mean EEG voltage measured during a specific window of time after presentation of a stimulus. Some ERPs are well studied and have commonly accepted names, such as P300 and N400. Figure 1a illustrates this concept, with shortened versions of the names.

In most research, ERPs are studied by presenting the stimulus multiple times and averaging the signal from all trials. This eliminates noise from other mental processing and increases the signal from activity that is strongly dependent on the time at which stimulus is presented. To accomplish a similar effect with a single trial, we selected rolling median as our feature. We compared different window sizes for the rolling median function and chose 128 (0.25 seconds), as it smooths out high-frequency variation while preserving low-frequency variation. Further, we selected median instead of mean in order to accommodate some variance in the latency between presentation of the stimulus on the screen and the subject's engagement with the stimulus. Figure 1b compares the original EEG signal with the rolling median of window size 128.
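The rolling-median feature can be sketched with pandas as follows; the function and constant names here are illustrative rather than BCIpy's actual API.

```python
import numpy as np
import pandas as pd

FS = 512        # NeuroSky MindSet sampling rate
WINDOW = 128    # 0.25 s, the window size chosen above
TARGET_HZ = 10  # rate of the downsampled feature vector

def rolling_median_feature(volts):
    """Rolling median over one second of EEG, downsampled to 10 Hz.

    volts: 1-D array of the 512 raw samples after stimulus onset.
    Returns a length-10 feature vector.
    """
    smoothed = pd.Series(volts).rolling(WINDOW, min_periods=1).median()
    step = FS // TARGET_HZ                  # keep roughly every 51st sample
    return smoothed.iloc[::step].to_numpy()[:TARGET_HZ]
```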

Finally, we downsample the rolling median to 10 Hz, because the window function smoothed out higher-frequency information and because it reduces the dimensionality of the feature vector. Although we selected a window size of 128 to accommodate the variance in latency that may exist in this corpus, for experiments with better controls over this variance we encourage considering a window size of 64 (0.125 seconds). This will reduce the overlap between windows and will thus reduce the effect of activity occurring before or after distinct ERPs.

[Figure 1 omitted. Fig. 1: Using rolling median to find event-related potentials (ERPs). (a) ERPs after a stimulus (source: Wikipedia [10]); (b) a 10 Hz rolling median calculated over 1 second of 512 Hz data, comparing window sizes of 32, 64, and 128.]

We trained an SVC with an RBF (radial basis function) kernel on the first second of rolling median as its feature vector. We included data only for tasks where the student read a passage aloud, excluding any single-word trials.
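A minimal version of this training setup looks like the following; the features here are random stand-ins rather than the Project LISTEN data.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC

# Stand-in data: one 10-sample rolling-median vector per passage trial.
rng = np.random.default_rng(0)
X = rng.standard_normal((166, 10))
y = np.repeat([0, 1], 83)  # 0 = easy, 1 = hard, balanced classes

clf = SVC(kernel="rbf")    # RBF-kernel support-vector classifier
scores = cross_val_score(clf, X, y, cv=StratifiedKFold(4))
```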

Table 2: Subject 24, unbalanced classes

Class  Precision  Recall  F1    Support
Easy   0.75       0.75    0.75  12
Hard   0.50       0.50    0.50  6
Avg    0.67       0.67    0.67  18

Table 3: All subjects, balanced classes

Class  Precision  Recall  F1    Support
Easy   0.00       0.00    0.00  83
Hard   0.50       0.99    0.66  83
Avg    0.25       0.49    0.33  166

With these data, we ran two experiments: one on a specific subject (#24) and one on all subjects. We began with subject #24 because it was the subject for which the classifier in Chang 2013 [1] had the highest accuracy. In both experiments, we reserved 20% of the data for our test set and used the remaining 80% as the training set. Using scikit-learn's StratifiedKFold cross-validation with 4 folds, we trained the SVC on the training set and tested its classification accuracy on the test set.

We did not balance the class sizes for the SVC trained on data from subject 24. The results in Table 2 show an accuracy of 67%, but this is the same as a naive classifier which predicts "easy" for every passage. We did balance the class sizes for the SVC trained on all subjects and, as Table 3 demonstrates, it had accuracy no better than chance.

3 Discussion

An ERP is a strongly timed process, so it is important to properly align training data such that each example begins at the same point in the process. We call this alignment "temporal registration", analogous to image registration in computer vision. In this case, we analyze the EEG signal as it responds to the subject experiencing the stimulus. Thus, we want to align the beginning of each example with the exact moment at which the subject's experience begins.

We partly addressed temporal registration by interpolating timestamps for the 512 Hz data and aligning task boundaries at millisecond resolution. The task boundaries, however, are not an accurate measure of the start and end times of the subject's engagement with the stimulus. The beginning of a task is recorded as the moment at which the system presented the stimulus, but the subject may have been looking away or thinking about something other than the stimulus. Further, each word may trigger one or more ERPs. Thus, variance in reading speed can confound temporal alignment. Focusing on the first second minimizes but does not eliminate this effect.

For passages which the student reads aloud, we could infer the time of experience through features in timestamped recordings of the student's speech. At a minimum, we could consider the first signal above a silence threshold after presentation of the stimulus. Timestamped audio could further allow segmentation on word boundaries, similar to the methods used in Chen et al. [2]. This could provide far more specific information about the mental process, including reaction to lexical qualities and dynamic processes which may capture semantic interplay between words in a passage.

We also note that some of the subject's experience with a passage may occur during visual processing and before vocalization of the words. Thus, it may also be interesting to include eye tracking equipment or video recordings in future studies of EEG in a reading tutor. If it does not detract from other goals of the experiment, one could also consider rapid serial visual presentation (RSVP) of one or two words at a time, for more granular control of the subject's encounter with each stimulus.

Acknowledging that there may exist other processes informing our model, we maintain our hypothesis that ERPs are present in these data and that these ERPs are, at least partially, a function of the difficulty class of the presented stimulus. We look forward to testing this hypothesis in future work.

To continue our work toward building classifiers on temporal features of EEG, we plan to use synthetic data to test the properties of the rolling median as the feature vector for an SVC. Synthesizing data allows us to model idealized ERPs and test how varying levels of noise and errors in temporal registration affect classifier accuracy. Establishing these bounds will help inform parameters for feature extraction and controls on data collection. We may also explore public data which were collected in experiments designed to elicit specific ERPs.
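A sketch of the planned experiment, assuming an idealized Gaussian-bump ERP and scikit-learn's SVC; all shapes, amplitudes, and noise levels here are invented for illustration:

```python
import numpy as np
import pandas as pd
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

def synth_trial(difficult, n=128, noise=0.5):
    """One synthetic 1-second trial: an idealized ERP bump whose
    amplitude depends on the difficulty class, plus Gaussian noise."""
    t = np.linspace(0, 1, n)
    amp = 2.0 if difficult else 1.0
    erp = amp * np.exp(-((t - 0.3) ** 2) / 0.005)  # bump peaking near 300 ms
    return erp + rng.normal(0, noise, n)

def rolling_median_features(trial, window=8):
    """Rolling median of the signal, used as the feature vector."""
    return pd.Series(trial).rolling(window, min_periods=1).median().to_numpy()

X = np.array([rolling_median_features(synth_trial(d))
              for d in [True] * 50 + [False] * 50])
y = np.array([1] * 50 + [0] * 50)
acc = cross_val_score(SVC(), X, y, cv=5).mean()
```

Sweeping the noise level and jittering the bump's onset would then give the accuracy-versus-registration-error bounds the text describes.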

As we have discussed here, temporal features of the EEG signal from a single electrode may be useful in classifying subject response to stimulus. BCIpy helps you extract and analyze these features, and our examples demonstrate how to train classifiers with them. We hope that BCIpy and the analyses we present here move our community closer to useful applications of EEG in ITS, as a form of assessment which minimizes interruptions and helps a student in "flow" remain there. We look forward to further study of this topic.

4 Acknowledgments

We provide our sincere thanks to Vinaya Polamreddi, Yueran Yuan, Kai-min Chang, Jack Mostow, Ryan Baker, Alper Bozkurt, and Thomas Price.

References

1. K.-m. Chang, J. Nelson, U. Pant, and J. Mostow. Toward exploiting EEG input in a reading tutor. International Journal of Artificial Intelligence in Education, 22(1):19–38, 2013.

2. Y.-N. Chen, K.-m. Chang, and J. Mostow. Towards using EEG to improve ASR accuracy. In Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 382–385. Association for Computational Linguistics, 2012.

3. E. Jones, T. Oliphant, and P. Peterson. SciPy: Open source scientific tools for Python, 2001–.

4. A. Y. Kaplan, A. A. Fingelkurts, A. A. Fingelkurts, S. V. Borisov, and B. S. Darkhovsky. Nonstationary nature of the brain activity as revealed by EEG/MEG: methodological, practical and conceptual challenges. Signal Processing, 85(11):2190–2212, 2005.

5. W. McKinney. Pandas: Python data analysis library. http://pandas.pydata.org. Accessed: 2013-12-13.

6. W. McKinney. Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython. O'Reilly Media, 2012.

7. S. Sanei and J. A. Chambers. EEG Signal Processing. John Wiley & Sons, 2008.

8. B. H. Tan. Using a low-cost EEG sensor to detect mental states. Master's thesis, Carnegie Mellon University, 2012.

9. B. Wang and F. Wan. Classification of single-trial EEG based on support vector clustering during finger movement. In Advances in Neural Networks – ISNN 2009, pages 354–363. Springer, 2009.

10. Wikipedia. File:ComponentsofERP.svg, 2009. [Online; accessed 13-December-2013].

11. J. Zhou, J. Yao, J. Deng, and J. Dewald. EEG-based classification for elbow versus shoulder torque intentions involving stroke subjects. Computers in Biology and Medicine, 39(5):443–452, 2009.


Intelligent tutors exploiting novel sensing modalities for decoding students' attention

Alvaro Soto1, Felipe Orihuela-Espina2, Diego Cosmelli1, Cristian Alcholado1, Patrick Heyer2 and L. Enrique Sucar2

1 Pontificia Universidad Católica de Chile, Santiago, Chile
2 Instituto Nacional de Astrofísica, Óptica y Electrónica, Puebla, Mexico

Abstract. To afford personalized instruction, Intelligent Tutoring Systems (ITS) require appropriate technologies to effectively access the internal state of each student. This includes attentional disposition, emotional attitude, and cognitive state in general. This work presents our initial steps towards building an ITS exploiting electroencephalography (EEG) and body posture to deduce relevant aspects of the attentional state of the student. The binarized attentional state of a student based on posture alone can be successfully discriminated with an F-measure of 76.47 ± 4.58. Emerging patterns in a preliminary exploration of the EEG, still underway, suggest that non-cued identification of attention is a feasible undertaking.

Keywords: Cognitive Tutors, Attention Detection, EEG.

1 Introduction

Exploitation of underused sensing modalities, combined with existing psychophysiological indexes of attentional state, emotional disposition and exploratory attitude, can boost the possibilities of ITS. Electroencephalography (EEG) for attention [3] and computer vision technologies for posture [2] are among the technologies amenable to decoding the internal cognitive states of an ITS user.

2 Methodology

We focus on topics related to mathematical properties of fractions [1]. The activity is organised into 5 levels of increasing difficulty. Progress across levels is achieved based on the number of correct answers.

Five volunteers in 5th grade and three more in 7th grade of the Chilean basic educational system participated (average age: 10.75 yrs), following prior consent from a parent or legal guardian. None had any history of neurological or psychiatric disorder, and IQs were in the average normal range. The experimental session ended when the participant failed to advance any further in the task or reported boredom and/or fatigue to the experimenter using a small bell; sessions lasted approximately 15 to 40 minutes. Figure 1 illustrates the ITS interface and the experimental set-up.



Fig. 1. (a) User interface of the educational activity indicating the 4 main areas of the interface: (1) problem statement zone, (2) answer selection zone operable by scroll controls, (3) decision zone to confirm the answer, and (4) feedback zone. (b) Experimental set-up showing a participant wearing an EEG headset to capture brain activity during a session with our educational software. (c) Two examples of attentional episodes extracted during preliminary visual exploratory analysis of the EEG.

Digital EEG was obtained from 32 channels of the 10/20 system at 2048 Hz. Mastoid electrodes were recorded and used for off-line referencing. A simultaneous video stream accompanied by depth perception maps was recorded using a Kinect system (Microsoft, USA) at 15 fps for monitoring posture. A log file of relevant actions in the educational software was also automatically recorded. Two coders labelled the data obtained during the task with the specific behaviors that we expect to recover from the EEG data analysis.
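Offline mastoid re-referencing is commonly implemented by subtracting the mastoid average from every scalp channel; this sketch assumes that convention, which the paper does not spell out:

```python
import numpy as np

def rereference_to_mastoids(eeg, left_mastoid, right_mastoid):
    """Offline re-referencing sketch: subtract the average of the two
    recorded mastoid channels from every scalp channel.
    `eeg` is (channels, samples); each mastoid argument is (samples,).
    A common convention, assumed here rather than taken from the paper."""
    ref = (np.asarray(left_mastoid) + np.asarray(right_mastoid)) / 2.0
    return np.asarray(eeg) - ref  # broadcasts the reference over channels
```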

3 Preliminary Results

Analysis of the multi-modal data is still in progress. Preliminary visual exploratory analysis highlights a recurrent pattern following episodes of attention in the EEG (see Figure 1).

4 Conclusions

Incorporating underused sensing modalities such as EEG and Kinect to access the internal state of the student is a feasible endeavour that can potentially boost the capabilities of ITS to provide personalized education.

Acknowledgements. Funded by Microsoft LACCIR project R1211LAC0001.

References

1. Alcoholado, C., Nussbaum, M., et al.: One Mouse per Child: Interpersonal Computer for Individual Arithmetic Practice. Journal of Computer Assisted Learning, 28, 295–309 (2012)

2. Heyer, P., Herrera-Vega, J., et al.: Posture Based Detection of Attention in Human Computer Interaction. In: 6th Pacific-Rim Symposium, PSIVT 2013, Guanajuato, Mexico, October 28 - November 1, 2013, pp. 220–229

3. Wang, Q., Sourina, O.: Real-time mental arithmetic task recognition from EEG signals. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 21, 225–232 (2013)


An Exploration of Two Methods for Using fMRI to Identify Student Problem Solving Strategies

Caitlin Tenison1, John R. Anderson1

1 Psychology Department, Carnegie Mellon University, Pittsburgh, PA 15213
[email protected] and [email protected]

Abstract. Using a math-learning paradigm, we explore two potential uses for fMRI when modeling problem-solving strategies. First, we use fMRI as an additional data source for our model. Second, we employ fMRI as a method of testing and understanding our behavioral models. We evaluate each method and consider which gains the greatest benefit from the inclusion of fMRI data.

1 Introduction

Intelligent tutoring systems (ITSs) emerged from the idea that we can model the cognitive learning processes of students to better inform instruction. Early ITS modeling relied heavily on latency and student productions as observable responses of underlying cognition; however, with the improved ability to record neural responses, methods like functional magnetic resonance imaging (fMRI) are increasingly used to advance research. In their study of an Algebra ITS, Anderson et al. [1] found that a model using both fMRI and keystroke information better predicted when a student was problem solving than one that relied on a single data source. They attribute the model's success to merging information from the brain with behavioral measures. Predicting whether or not a student is problem solving, however, is a simpler task than predicting how a student is problem solving. Additionally, when predicting problem-solving strategy there is more variety in the type of behavioral data that can be collected. In this paper, we identify two potential functions of fMRI in modeling problem-solving states. The first is as an additional data source to build better student models; the second is as a method for testing and understanding behavioral models. To investigate this question we use a simple math-learning paradigm. As students gain practice problem solving, the strategies that they use change from calculation to retrieval. In a previous study we found students also employ intermediary strategies containing a mixture of both retrieval and calculation [2]. Rather than predicting if a student is problem solving, our models will predict what strategy the student is using. We will build two models: one that uses behavioral indicators of strategy use, and one that uses a combination of behavioral and neurological indicators. We will compare these two models to assess the value of incorporating fMRI as an additional data source. Finally, we will consider the use of brain data to better understand the behavioral model.


2 Materials and Methods

2.1 Participants

Twenty right-handed university students (9 females; mean age 22; SD 2.3) participated in the study. Participants gave informed written consent and received monetary compensation for their participation.

2.2 Stimuli and Experimental Design

To investigate the change in strategy that occurs when learning a new type of operation, we trained participants on a novel operation. Participants learned how to calculate the value of a pyramid expression b$n by adding n decreasing numbers starting with b [2]. For instance, 11$4 = 11+10+9+8 = 38. Participants used the keypad to type out the answers to these problems and to indicate the problem-solving strategies that were used. After answering a problem, participants chose from a list of strategies the option that best matched the one used to solve the problem. We compiled the strategy options from the strategies reported in a previous study [2]. Students chose from 4 options: "retrieve" was defined as remembering the answer; "calculate" was defined as using arithmetic to find the answer; "partial" was described as partially calculating and partially remembering the problem; and "other" covered anything else, though only one participant indicated use of this strategy.
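The pyramid operation is easy to state in code; the text's worked example 11$4 = 38 serves as a check:

```python
def pyramid(b, n):
    """Value of the pyramid expression b$n: the sum of n decreasing
    integers starting at b, e.g. 11$4 = 11 + 10 + 9 + 8 = 38."""
    return sum(b - i for i in range(n))
```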

2.3 Scanning Procedure

Participants completed 6 blocks of fMRI scans, with a concurrent assessment on the 2nd, 4th and 6th scans. Alternating the scans featuring an assessment allowed us to check its reactivity (no reaction was found). Overall there were 3 practiced problems that were repeated 36 times over the course of the experiment and 18 novel problems that were repeated twice. Pyramid problems were presented on the screen following a 2-second fixation period. Once the problem appeared on the screen, the participant was allowed a maximum of 30 seconds to indicate knowledge of a solution by pressing the return key on the numeric keypad. After pressing return, participants input a solution using the keypad and were given correctness feedback. Problem-solving time was defined as the time between the appearance of the math problem and the point at which the participant indicated a readiness to input the answer. fMRI images were acquired using gradient echo-echo planar image acquisition on a Siemens 3T Verio scanner using a 32-channel RF head coil, with a 2 s repetition time (TR). More detailed data acquisition and processing steps are described in Tenison et al. [2].

2.4 fMRI Analysis

To create a single measure of strategy use from the fMRI data, we used a classification analysis to quantify how similar a given trial was to other retrieval trials.


Without a direct report of retrieval for all problems, we trained our classifier on the distinction between practiced and novel problems, since we knew novel problems could not be solved by retrieval, whereas many practiced problems would be. For the purposes of this paper, we briefly summarize the processing steps applied to our data (again, see Tenison et al. [2] for details on a similar analysis). First, we subdivided the brain into 4x4x4-voxel cubes (a voxel is 3.2 x 3.125 x 3.125 mm) over 32 slices of the 64x64 acquisition matrix to create an initial 408 mega-voxel regions of interest (ROIs) [1]. The second step was to eliminate regions that had highly variable fMRI signals: ROIs containing more than 15 TRs across all participants that fluctuated more than 15% during a block were eliminated. The reduced sample comprised 288 4x4x4-voxel regions of raw data. For the 288 regions, we estimated the activity during problem solving for each trial and calculated the z-scores of this measure; normalizing allowed for comparison across subjects. To eliminate fluctuations in the blood-oxygen-level-dependent signal that were physiologically implausible, z-scores were Winsorized such that scores greater than 5 or less than -5 were changed to 5 or -5, respectively. As a third step, we performed dimensionality reduction using Principal Components Analysis (PCA), which creates a set of uncorrelated variables from linear combinations of the ROI activity. We then performed a linear discriminant analysis (LDA) on the first 50 factors extracted from the PCA to identify which of these factors contributed to distinguishing between practiced and novel problems. We used a leave-one-out cross-validation method in which we predicted each subject from the results of the other subjects. Besides returning a predicted category for each item, an LDA generates a continuously varying evidence measure for category membership and a posterior probability that an item is from a category. These measures were used in subsequent analysis.
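The processing chain (z-score, Winsorize at +/-5, PCA, LDA with posterior probabilities) can be sketched with scikit-learn; shapes and names here are illustrative, and the real pipeline is described in Tenison et al. [2]:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def classify_trials(X, y, n_components=50):
    """Toy version of the analysis chain: z-score each ROI column,
    Winsorize at +/-5, reduce with PCA, then fit an LDA.
    X is (trials, ROIs); y holds practiced/novel labels."""
    z = (X - X.mean(axis=0)) / X.std(axis=0)
    z = np.clip(z, -5, 5)  # Winsorize physiologically implausible values
    pcs = PCA(n_components=min(n_components, *z.shape)).fit_transform(z)
    lda = LinearDiscriminantAnalysis().fit(pcs, y)
    # LDA yields both hard labels and per-trial posterior probabilities
    return lda.predict(pcs), lda.predict_proba(pcs)
```

The posterior probabilities are what the text calls the continuously varying evidence measure; a faithful replication would also use leave-one-subject-out validation rather than fitting and predicting on the same data as this sketch does.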

3 Results

3.1 Effects of Practice

A repeated measures ANOVA run on the latency data revealed a significant main effect of problem group (practiced vs. novel), F(1,18)=69.28, p<0.0001, of scan block, F(5,90)=18.66, p<0.0001, and a significant problem by scan block interaction, F(5,90)=14.95, p<0.0001. The time it took participants to solve practiced problems decreased, while the time to solve novel problems remained constant. Additionally, there was an increase in reports of retrieval with practice, F(2,38)=42.04, p<0.0001, and a decrease in reports of both computation, F(2,38)=8.396, p=0.001, and partial strategies, F(2,38)=18.598, p<0.0001. Novel problems showed no changes in reported strategy use. Averaged across all assessed trials, practiced problems were reported as retrieved 81.7% of the time and novel problems were reported as calculated 89% of the time. We took this as evidence that an LDA classifier trained to distinguish between practiced and novel problems would use information similar to what would be used to distinguish between retrieval and calculation (Figure 1). In cross-subject tests, the classifier predicted all subjects better than chance. The average d-prime measure of performance in this analysis was 1.71, t(19) = 14.5, p<0.001, with a hit rate of 60% and a false alarm rate of 11%. The major contribution of this classifier to this study is to label each trial with the probability that it was retrieved. We will use this evidence score as one source of information about strategy use.

Fig. 1. Four classification analyses are represented here in m-n format. Warm voxels are more active for m problems, cool voxels are more active for n problems. The locus of the HIPS is represented by the black square.
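For reference, d' is computed from hit and false-alarm rates as z(hit) - z(FA). Note that plugging the pooled rates above (60% hits, 11% false alarms) into this formula gives roughly 1.48; the reported 1.71 is an average of per-subject d' values, which need not equal the pooled-rate figure:

```python
from scipy.stats import norm

def d_prime(hit_rate, false_alarm_rate):
    """Sensitivity index d' = z(hit rate) - z(false alarm rate),
    where z is the inverse of the standard normal CDF."""
    return norm.ppf(hit_rate) - norm.ppf(false_alarm_rate)
```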

3.2 Results from Two Hidden Markov Models

Our first aim of this study is to assess the value of fMRI as an additional data source for modeling strategy changes. We used a Hidden Markov Model (HMM) to study the three practiced problems over the course of the experiment. We ran two HMMs; the first used only behavioral data (the reports, the latency and the accuracy information). Each state was associated with three measures: the probabilities of the three reported strategies, latency (the normalized log latency was modeled as a Gaussian), and the probability of a correct solution. We calculated the probability of a student being in a specific strategy-use state after having observed that problem a given number of times using a forward-backward algorithm. We fit HMMs with 1-10 states, using BIC to penalize added parameters. The best model had 3 strategy states. Table 1 shows the mean parameter value in each state for the behavioral HMM. The second HMM combined the latency and fMRI evidence data along with the accuracy and report data. Since latency and fMRI are highly correlated, we orthogonalized these two measures by use of a PCA. The first component of the PCA proved to carry all the information, accounting for 88.7% of the variance. This first component can be taken as a general strength measure and was used, in combination with the other measures, to train the HMM. The right side of Table 1 indicates the mean parameter values of each state for the combined brain and behavioral HMM. The HMM generates the likelihood of state membership; we assign each problem to the most likely state. Using this discrete assignment we can compare the two HMMs by looking at the similarity of state assignments. We ran a Cohen's kappa calculation to show that the agreement between the two HMMs is above chance (κ = 0.80). There are no cases in which a problem is assigned to State 1 by one model and State 3 by the other. This evidence suggests the addition of the brain data brings little additional benefit to the state estimation.

Table 1. Average parameter values for the two models

                          Behavioral                  Brain and Behavioral
                     State 1  State 2  State 3     State 1  State 2  State 3
Accuracy (%)           84.2     95.3     98.4        85.7     95.7     98.7
Latency (s)             3.6      1.4     0.72         3.3      1.3     0.69
fMRI Evidence           0.50    -0.25   -1.3          0.46    -0.29   -1.44
Percent strategy use:
  Calculation          35.7      9.4     1.1         35        6.4     1.1
  Partial              52       20.6     0.78        56.4     13.3     0.74
  Retrieval            12.2     69.9    98.1          8.54    80.3    98.1

Fig. 2. Mean activation of the bilateral HIPS (percent activation in States 1-3, left and right panels). Error bars represent standard error.
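The forward-backward posterior computation at the heart of this analysis can be illustrated on a toy discrete-emission HMM; this is a generic sketch, not the authors' model, which also includes a Gaussian latency component:

```python
import numpy as np

def state_posteriors(pi, A, B, obs):
    """Scaled forward-backward pass for a discrete-emission HMM.
    pi: (K,) initial state probabilities, A: (K, K) transition matrix,
    B: (K, M) emission probabilities, obs: sequence of symbol indices.
    Returns P(state at step t | all observations), shape (T, K)."""
    pi, A, B = np.asarray(pi), np.asarray(A), np.asarray(B)
    K, T = len(pi), len(obs)
    alpha = np.zeros((T, K))
    beta = np.ones((T, K))
    alpha[0] = pi * B[:, obs[0]]
    alpha[0] /= alpha[0].sum()  # rescale each step to avoid underflow
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
        alpha[t] /= alpha[t].sum()
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
        beta[t] /= beta[t].sum()
    post = alpha * beta  # per-step scaling cancels after normalization
    return post / post.sum(axis=1, keepdims=True)
```

Assigning each step to its most probable state, as the text describes, is then an argmax over each row of the returned matrix.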

3.3 Understanding the States

Our second aim of this study is to explore the potential of fMRI data as a means for understanding the states identified by behavioral models. We ran an LDA similar to the one described in Section 2.4, but this time we classified state assignments from the behavioral HMM. We mapped the weights from the classifier back to the brain in order to observe the areas associated with the classification of the different states (Figure 1). Among the regions used by the classifier to distinguish between states, we identified the horizontal intraparietal sulcus (HIPS), an area used in calculation and numerical cognition. We used the coordinates from a meta-analysis of numerical cognition [3] (maxima at -31, -50, 45) to investigate whether there were significant differences in bilateral HIPS activity for the three states. We found significant bilateral differences between States 1 and 3 (left: t(19) = 2.6, p<0.05; right: t(19) = 2.7, p<0.05) and States 2 and 3 (left: t(19) = 4.0, p<0.05; right: t(19) = 3.6, p<0.05) and no differences between States 1 and 2 (Figure 2).


4 Discussion

In this study we put forth two possible methods for using fMRI to inform how we model student strategy use. The first method was to use brain data as an additional source to incorporate in our models. To this end, the fMRI data did little to change the classification we obtained from behavioral data alone. The second method was to use brain data to interpret the latent states identified when modeling behavioral data. Our results bring some insight into the nature of the three states identified by our model. It is clear from both the behavioral signatures and the brain data that State 1 is a calculation state and State 3 is a retrieval state. The nature of State 2 is less clear: according to the participant reports, State 2 is a retrieval state, but the brain data suggest that State 2 is more similar to a calculation than a retrieval state. The classification of the brain data associated with the states showed that the HIPS, an area used in math calculation, helped distinguish the 3 states. A further ROI analysis of this region verified that the bilateral HIPS showed significantly more activation in States 1 and 2 than in State 3, and no difference between States 1 and 2. The direction of the HIPS activation is supporting evidence that States 1 and 2 involve more number processing than State 3 [3]. Future studies could use exploratory analyses, comparisons with the untrained problems, or techniques such as representational similarity analysis to build a more detailed picture of the states. Our goal in this study was to consider how we use fMRI in modeling cognitive states. Previous work found that models using both fMRI and keystroke information better predicted when students were problem solving [1]; however, it was unclear whether the benefit of fMRI held when identifying shorter cognitive states or when using additional behavioral information. Our findings suggest that the high spatial and low temporal resolution of fMRI make it better suited for understanding models than for building models of brief cognitive states. Future studies could explore how systems with better temporal resolution, such as MEG or EEG, perform in such scenarios.

Acknowledgments

This work was supported in part by NSF grant DRL-1007945 and by CMU's Program in Interdisciplinary Education Research, funded by IES grant R305B090023.

References

1. Anderson, J.R., Betts, S., Ferris, J.L., Fincham, J.M.: Neural imaging to track mental states while using an intelligent tutoring system. Proceedings of the National Academy of Sciences 107(15) (April 2010) 7018–23

2. Tenison, C., Fincham, J.M., Anderson, J.R.: Detecting math problem solving strategies: an investigation into the use of retrospective self-reports, latency and fMRI data. Neuropsychologia 54 (February 2014) 41–52

3. Cohen Kadosh, R., Lammertyn, J., Izard, V.: Are numbers special? An overview of chronometric, neuroimaging, developmental and comparative studies of magnitude representation. Progress in Neurobiology 84(2) (February 2008) 132–47


EEG Helps Knowledge Tracing!

Yanbo Xu, Kai-min Chang, Yueran Yuan, and Jack Mostow

Carnegie Mellon University
yanbox, kkchang, yuerany, [email protected]

Abstract. Knowledge tracing (KT) is widely used in Intelligent Tutoring Systems (ITS) to measure student learning. Inexpensive portable electroencephalography (EEG) devices are viable as a way to help detect a number of student mental states relevant to learning, e.g. engagement or attention. In this paper, we combine such EEG measures with KT to improve estimates of the students' hidden knowledge state. We propose two approaches to inserting the EEG-measured mental states into KT, fitting the parameters learn, forget, guess and slip specifically for the different mental states. Both approaches improve on the original KT prediction, and one of them outperforms KT significantly.

Keywords: EEG, knowledge tracing, logistic regression

1 Introduction

Knowledge tracing (KT) is widely used in Intelligent Tutoring Systems (ITS) to measure student learning. In this paper, we improve KT's estimates of students' hidden knowledge states by incorporating input from inexpensive EEG devices. EEG sensors record brainwaves, which result from coordinated neural activity. Patterns in these recorded brainwaves have been shown to correlate with a number of mental states relevant to learning, e.g. workload [1], associative learning [2], reading difficulty [3], and emotion [4]. Importantly, cost-effective, portable EEG devices (like those used in this work) allow us to collect longitudinal data, tracking student performance over months of learning.

Prior work on adding extra information to KT includes using student help requests as an additional source of input [5] and individualizing student knowledge [6]. Here, for the first time, students' longitudinal EEG signals are directly used as input to dynamic Bayes nets to help trace their knowledge of different skills. An EEG-enhanced student model allows direct assessment to be performed unobtrusively in real time. The ability to detect learning while it occurs, instead of waiting to observe future performance, could accelerate teaching dramatically. Current EEG is much too noisy to detect learning reliably on its own. However, as we show in this paper, combining EEG with KT allows us to detect learning significantly better than using KT alone.


2 Approach

KT is a type of Hidden Markov Model which uses a binary latent variable (K(i)) to model whether a student knows a skill at step i. It estimates the hidden variable from a sequence of observations (C(i)) of whether the student has applied the skill correctly up to step i. In this paper, KT is used to capture the changes in knowledge state of a word over time (e.g., the school year), based on observations of whether or not the student read the word fluently (defined in more detail in Section 3). Standard KT usually has 4 (sometimes 5) parameters: initial knowledge (L0), learning rate (t), forgetting rate (f) (usually set to zero, but not in this paper), guessing rate (g), and slipping rate (s). We add another observed variable (E(i)), representing the EEG-measured mental state that is extracted from the EEG signals and time-aligned to the student's performance at step i. We present two approaches to inserting this variable into KT so that the student's hidden knowledge is inferred not only from the student's observed performance but also from the student's EEG-measured mental state.
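The standard KT update can be written out directly from these definitions; this generic sketch keeps the forgetting rate f as a free parameter rather than fixing it at zero, as in the paper:

```python
def kt_update(p_know, correct, t, f, g, s):
    """One knowledge-tracing step: condition P(K) on the observed
    response, then apply learning (t) and forgetting (f).
    These are the standard KT equations; names follow the text."""
    if correct:
        evidence = p_know * (1 - s) + (1 - p_know) * g
        posterior = p_know * (1 - s) / evidence
    else:
        evidence = p_know * s + (1 - p_know) * (1 - g)
        posterior = p_know * s / evidence
    return posterior * (1 - f) + (1 - posterior) * t
```

Starting from L0 and folding this update over the sequence of C(i) observations yields the estimated knowledge trajectory for one word.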

Fig. 1: Add EEG measures into KT: (a) EEG-KT; (b) EEG-LRKT.

Approach I: Insert a 1-dimensional binary EEG measure into KT (EEG-KT). EEG-derived signals are often described as a type of measure of human mental states. For example, NeuroSky uses the EEG signal to derive proprietary attention and meditation measures that indicate focus and calmness in students [7]. By adding a binary EEG input into KT, we hypothesize that a student can have a higher learning rate t given that the student is focusing at that step. Thus EEG-KT, shown in Figure 1a, extends KT by adding a binary variable E(i) computed from EEG input. We started with a binary (vs. continuous) EEG input for ease of implementation. This approach is reported in [8].

Approach II: Combine multi-dimensional continuous EEG measures in KT (EEG-LRKT). We also try an m-dimensional continuous variable E(i), denoting m EEG measures extracted from the raw EEG signal at step i. Xu and Mostow [9] proposed a method that uses logistic regression to trace multiple subskills in a Dynamic Bayes Net (LR-DBN). Without exploding the conditional probability tables in a DBN, LR-DBN combines the multi-dimensional inputs via a sigmoid function, which increases the number of parameters linearly (in the number of inputs) instead of exponentially. This combination function was used in tracing multiple subskills [10]. Similarly, EEG-LRKT uses logistic regression to combine continuous EEG measures in KT. Figure 1b shows the graphical representation of EEG-LRKT, where circle nodes denote continuous variables. Hidden knowledge states are now determined by various EEG inputs. KT parameters te and fe are computed by logistic regression over all m EEG measures.
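The logistic parameterization amounts to a sigmoid over the m EEG measures, so each EEG-conditioned parameter costs m+1 weights; the weights in this sketch are placeholders (in the model they are fit by EM):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def eeg_learning_rate(weights, bias, eeg_features):
    """EEG-LRKT-style parameterization: the learning rate t_e is a
    logistic function of the m continuous EEG measures, keeping the
    parameter count linear in m. Weights here are illustrative."""
    return sigmoid(bias + np.dot(weights, eeg_features))
```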

3 Evaluation and Results

3.1 Data sets

Our EEG data come from children 6-8 years old who used Project LISTEN's Reading Tutor at their primary school during the 2013-2014 school year [11]. We model the growth of students' oral reading fluency by labeling a word as fluent if it was 1) accepted as read by the automatic speech recognizer (ASR) [12], 2) read with no hesitation (the latency determined by the ASR is less than 0.05 s), and 3) read without the student clicking on the word for help from the tutor.
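The three fluency criteria translate directly into a labeling function (argument names are ours):

```python
def is_fluent(asr_accepted, latency_s, clicked_for_help):
    """Label a word reading as fluent per the three criteria above:
    accepted by the ASR, latency under 0.05 s, and no help click."""
    return asr_accepted and latency_s < 0.05 and not clicked_for_help
```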

EEG raw signals are captured by NeuroSky's BrainBand device at 512 Hz and are denoised as in [11]. We use NeuroSky's proprietary algorithms to generate 4 channels: signal quality, attention, meditation, and rawwave. We then use the Fast Fourier Transform to generate 5 additional channels from rawwave: delta, theta, alpha, beta, and gamma. We break the EEG data into 1-second-long segments and filter out any segment with a poor EEG signal quality score (cutting off at 100 on the 0 to 200 signal quality scale provided by NeuroSky). We then remove any observation for which more than 50% of its corresponding EEG signal is filtered out. We also remove every word encounter whose next encounter (by the same student) has poor EEG signal quality, e.g. the first encounter of "cat" by a student is removed because the second encounter of "cat" by the same student has bad EEG quality. Keeping only encounters whose next encounter had good signal quality reduces our data size by 1/3.
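The observation-level quality filter described above might look like the following sketch. The thresholds (a score of 100 on the 0-200 scale, at most 50% bad segments) come from the text; the helper name is an assumption.

```python
# Sketch of the segment-quality filter (illustrative helper).

def usable_observation(segment_quality_scores, bad_threshold=100, max_bad_frac=0.5):
    """Keep an observation only if at most half of its 1-second EEG
    segments have a poor signal-quality score (>= bad_threshold on
    NeuroSky's 0-200 scale)."""
    if not segment_quality_scores:
        return False
    bad = sum(1 for q in segment_quality_scores if q >= bad_threshold)
    return bad / len(segment_quality_scores) <= max_bad_frac

keep = usable_observation([0, 20, 0, 150])     # 1 of 4 segments bad -> keep
drop = usable_observation([120, 200, 0, 180])  # 3 of 4 segments bad -> drop
```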

The original data set includes 16 students who read 600 distinct words. We discard 4 students who have fewer than 100 observations, leaving 6,313 observations from 12 students. To maintain enough data for EM estimation of the parameters, however, we always keep the 4 students who have well over 500 observations in the training data and cross-validate on the other 8 students.

3.2 Train classifiers as an extra EEG measure

We train Gaussian Naive Bayes classifiers to predict fluency. As the classifier features, we compute the average and variance of the values of each of the 8 channels (excluding signal quality) over the duration of each word according to the ASR (16 features in total). The validation is between-subject (i.e. training on all but one subject and testing on that remaining subject). Because the large majority class in this dataset would create overpowering priors, we pre-balance our data using under-sampling. This classifier uses a training pipeline similar to [11], with a few notable differences: 1) no feature selection, due to the large training set; 2) to account for individual differences, we normalize every feature by converting it to z-scores over the distribution of that feature for that subject. Normalization is done before we train the classifier.

The classifier has a prediction accuracy of 61.8%. We evaluate it against a 50:50 chance classifier, since we train the classifier on pre-balanced data. Our classifier performs significantly above chance on a chi-squared test (p < 0.05). Finally, in Eq. 1, we compute a confidence-of-fluency (Fconf) metric as our 9th EEG measure and use it in the same way as the above 8 EEG scalar features:

Fconf = Pr(fluent | 2 × 8 features) − Pr(disfluent | 2 × 8 features)    (1)
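Since the two class posteriors sum to one, Eq. 1 reduces to a one-liner. A small sketch, with hypothetical posterior values (in practice the posteriors come from the Gaussian Naive Bayes classifier above):

```python
# Sketch of the confidence-of-fluency measure in Eq. 1 (illustrative).

def f_conf(p_fluent):
    """Fconf = Pr(fluent | features) - Pr(disfluent | features).
    With two classes the posteriors sum to 1, so Fconf = 2 * Pr(fluent) - 1,
    ranging from -1 (confidently disfluent) to +1 (confidently fluent)."""
    return p_fluent - (1.0 - p_fluent)

confident = f_conf(0.9)   # strongly fluent
uncertain = f_conf(0.5)   # no information either way
```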

3.3 Model fit with cross validation

We compare EEG-KT and EEG-LRKT to KT on a real data set. We normalize each EEG measure within student by subtracting the measure's mean and dividing by the measure's standard deviation across each student's observations. As EEG-KT requires, we discretize each measure as a binary variable: TRUE if the value is above zero, FALSE otherwise. We individually insert each of the binary EEG measures into KT and obtain 9 EEG-KT models in total: ATT(ention)-KT, MED(itation)-KT, RAW-KT, Delta-KT, Theta-KT, Alpha-KT, Beta-KT, Gamma-KT, and Fconf-KT. EEG-LRKT directly combines the 8 normalized EEG measures (excluding Fconf). In addition, we fit Rand-KT and Rand-LRKT, which replace EEG with randomly generated values from Bernoulli and standard Normal distributions, respectively. We use EM algorithms to estimate the parameters, and implement the models in the Matlab Bayesian Net Toolkit for Student Modeling (BNT-SM) [13, 10].
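The per-student normalization and zero-thresholding can be sketched as follows (illustrative helper; the mean and standard deviation are computed over a single student's observations of one measure):

```python
from statistics import mean, pstdev

# Sketch of the per-student z-scoring and binarization used for EEG-KT.

def binarize_measure(values):
    """Z-score one student's EEG measure, then threshold at zero:
    TRUE when the value is above that student's mean, FALSE otherwise."""
    mu, sd = mean(values), pstdev(values)
    return [((v - mu) / sd) > 0 for v in values]

flags = binarize_measure([3.0, 5.0, 4.0, 8.0])  # student's mean is 5.0
```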

We conduct a leave-one-student-out cross validation (CV), which trains word-specific models on 11 out of 12 students and tests on the remaining single student. We use the receiver operating characteristic (ROC) curve and area under the curve (AUC) to assess the performance of model prediction (i.e., binary classification), since we have unbalanced data with 83% labeled as fluent. Since we do not change the parameter of initial knowledge (L0) in EEG-KT or EEG-LRKT, we clamp L0 to 0.4 in our experiments in order to assess only the effect of the modified KT parameters. To test the statistical significance of differences between the proposed models and KT, we do two-tailed paired t-tests on AUC scores across the 8 students. EEG-LRKT significantly outperforms KT; the other 8 EEG measures and Rand-KT do not differ significantly from KT. Rand-LRKT seems to have a high AUC, but lacks results for half of the tested skills because of rank deficiency when fitting random values with logistic regression in the DBN. Figure 2a shows a ROC graph with only the models that have significantly better AUC scores than KT with 8-fold CV; Table 2b shows a full list of AUC scores.
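AUC's insensitivity to the 83:17 class imbalance follows from its rank-based (Mann-Whitney) formulation: AUC is the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. A minimal sketch:

```python
# Sketch of AUC via the Mann-Whitney formulation (illustrative; ties count
# as half a win).

def auc(scores, labels):
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

a = auc([0.9, 0.8, 0.4, 0.3], [1, 1, 0, 1])  # 3 positives, 1 negative
```

Because only the ranking of scores matters, predicting the majority class for everything yields an AUC of 0.5 regardless of how skewed the class distribution is.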


[Figure: ROC curves, true positive rate vs. false positive rate. Legend AUC values: EEG-LRKT 0.7665, Rand-LRKT* 0.7255, KT 0.6479, Rand-KT 0.6146, Majority 0.5000.]

(a) ROC curves


ROC plots the true positive rate (TPR) vs. the false positive rate (FPR) at different thresholds for cutting off the predicted probabilities. TPR (also known as recall) is the percentage of positive instances (e.g., fluently read words) correctly classified as positive; FPR (also known as fall-out) is the percentage of negative instances (e.g., not fluently read words) incorrectly classified as positive. The curve thus shows a trade-off between recall (benefits) and fall-out (costs). AUC, the area under the ROC curve, is insensitive to an unbalanced dataset. A perfect classification model would reach the top left corner of ROC space and yield an AUC score of 1, while a majority vote model (probability 1 of predicting a word as fluent and 0 of predicting it as disfluent) would show a diagonal line from the bottom left to the top right corner of ROC space and get an AUC score of 0.5.
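One ROC operating point (TPR, FPR) at a given probability cutoff can be computed as in this sketch (illustrative):

```python
# Sketch of one ROC operating point at a probability cutoff.

def roc_point(probs, labels, threshold):
    tp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 1)
    fn = sum(1 for p, y in zip(probs, labels) if p < threshold and y == 1)
    fp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 0)
    tn = sum(1 for p, y in zip(probs, labels) if p < threshold and y == 0)
    # (TPR i.e. recall, FPR i.e. fall-out)
    return tp / (tp + fn), fp / (fp + tn)

tpr, fpr = roc_point([0.9, 0.6, 0.4, 0.2], [1, 1, 0, 0], threshold=0.5)
```

Sweeping the threshold from 1 down to 0 traces out the full ROC curve.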




Models       AUC      Models     AUC
EEG-LRKT     0.7665   Beta-KT    0.6355
Rand-LRKT*   0.7255   Gamma-KT   0.6317
Fconf-KT     0.6613   RAW-KT     0.6275
Theta-KT     0.6568   MED-KT     0.6230
KT           0.6479   Delta-KT   0.6224
ATT-KT       0.6435   Rand-KT    0.6146
Alpha-KT     0.6429

Looking at the per-student results in more detail: Fconf-KT beats KT for 6 out of 8 students, and Theta-KT beats KT for 5 out of 8 students. ATT-, Alpha-, and Beta-KT are close to KT, with no significant difference. The other 4 EEG measures actually hurt KT's model fit, though they remain better than Rand-KT. About half of the tested skills have no fitted parameters for Rand-LRKT, due to rank deficiency in the EM algorithm, so its AUC is computed on only roughly half of the test data.

So far, we have shown that EEG signals from a simple portable device can help KT predictions, even with possible random noise. Now we want to infer the amount of noise in the EEG signals. We say an EEG-KT model predicts perfectly if the binary EEG variable agrees with the true label of reading a word fluently or not. The model fit starts to decline when some values of the variable disagree with the true labels (as if labels were flipped), and the noise level increases as the flips increase. Thus we generate a set of binary variables by randomly flipping the true fluency labels, from 0% (perfect, named F100%) up to 50% (random, named F50%). We insert each of these simulated variables into KT as new EEG-KT models and compare them with Fconf-KT, the model based on the real EEG measures. Recall that Fconf denotes the confidence score of using all the EEG measures to predict the true label of fluent. The goal is to see where Fconf-KT falls among the F100% to F50% KT models, so that we can approximate the EEG noise level by the extent to which Fconf-KT helps KT recover the true labels.
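The label-flipping construction can be sketched as follows (illustrative; the real experiment flips the fluency labels used as the binary EEG variable in KT):

```python
import random

# Sketch of the noise simulation: flip a fraction of the true fluency
# labels to build synthetic binary "EEG" variables, from F100% (perfect
# agreement, 0% flipped) down to F50% (pure noise, 50% flipped).

def flip_labels(labels, flip_fraction, seed=0):
    rng = random.Random(seed)                       # deterministic for the demo
    n_flip = round(flip_fraction * len(labels))
    idx = rng.sample(range(len(labels)), n_flip)    # distinct indices to flip
    out = list(labels)
    for i in idx:
        out[i] = 1 - out[i]
    return out

truth = [1, 1, 0, 1, 0, 1, 0, 0, 1, 1]
f100 = flip_labels(truth, 0.0)  # F100%: agrees with every true label
f50 = flip_labels(truth, 0.5)   # F50%: half the labels flipped
```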

(b) AUC scores

Fig. 2: Model fit comparison by 8-fold CV (underlined if p-value < 0.05 in a paired t-test with KT; Rand-LRKT (starred*) is based on incomplete test data)

4 Conclusion and Future Directions

In this paper, we combine EEG measures with KT to improve estimates of the student's hidden knowledge state. Estimating Pr(K) enables us to predict performance (e.g. fluency) more accurately than estimating performance directly, since the estimate of Pr(K) is conditioned on all observations so far. We present two approaches: 1) EEG-KT adds one binary EEG measure to KT, and 2) EEG-LRKT uses logistic regression on various continuous EEG measures in KT. Both approaches outperform the original KT (significantly so for EEG-LRKT) in terms of ROC and AUC when predicting an unseen student's reading fluency on words in the Reading Tutor. For the first time, EEG measures are directly used to help model students' knowledge. Though not all the single-channel measures (like Theta) from EEG can help knowledge tracing, the combined EEG measure significantly improves KT predictions.

EEG studies in the neuroscience literature have better instrumentation, but not longitudinal data like ours. EEG-based information (especially from a single-sensor device like NeuroSky's BrainBand) is noisy and is by no means a reliable, precise measure of a meaningful brain state. As demonstrated in this paper, however, longitudinal EEG provides a measurable improvement in predictive accuracy anyway.

In this paper, we focus on reading (specifically, fluency development), which is well suited to studying EEG-enriched KT thanks to its density of sensing (many words per minute). The framework we propose is also applicable to other types of learning. Another future direction is to analyze the practical significance of the result in terms of impact on learning. As Beck and Gong [14] pointed out, tiny improvements in predictive accuracy don't matter - actionable intelligence does. We want to estimate the possible speedup in learning that would result from being able to use EEG to detect learning while it occurs (instead of waiting to observe future performance).


5 Acknowledgements

This work was supported by the National Science Foundation under Cyberlearning Grant IIS1124240. The opinions expressed are those of the authors and do not necessarily represent the views of the National Science Foundation.

References

1. Berka, C., Levendowski, D. J., Lumicao, M. N., Yau, A., Davis, G., Zivkovic, V. T., Olmstead, R. E., Tremoulet, P. D., Craven, P. L.: EEG correlates of task engagement and mental workload in vigilance, learning, and memory tasks. Aviation, Space, and Environmental Medicine, 78 (Supp 1), B231-B244 (2007).

2. Miltner, W. H. R., Braun, C., Arnold, M., Witte, H., Taub, E.: Coherence of gamma-band EEG activity as a basis for associative learning. Nature, 397, 434-436 (1999).

3. Chang, K. M., Nelson, J., Pant, U., Mostow, J.: Toward Exploiting EEG Input in a Reading Tutor. International Journal of Artificial Intelligence in Education, 22(1-2), 19-38 (2013).

4. Heraz, A., Frasson, C.: Predicting the three major dimensions of the learner's emotions from brainwaves. World Academy of Science, Engineering and Technology, 31, 323-329 (2007).

5. Beck, J. E., Chang, K. M., Mostow, J., Corbett, A.: Does help help? Introducing the Bayesian evaluation and assessment methodology. In Proceedings of the 9th International Conference on Intelligent Tutoring Systems, 383-394 (2008).

6. Pardos, Z. A., Heffernan, N. T.: Modeling individualization in a Bayesian networks implementation of knowledge tracing. In User Modeling, Adaptation, and Personalization, pp. 255-266. Springer Berlin Heidelberg (2010).

7. NeuroSky: NeuroSky's eSense meters and detection of mental state. NeuroSky, Inc. (2009).

8. Xu, Y., Chang, K. M., Yuan, Y., Mostow, J.: Using EEG in Knowledge Tracing. In Proceedings of the 7th International Conference on Educational Data Mining, London, UK (2014).

9. Xu, Y., Mostow, J.: Using logistic regression to trace multiple subskills in a dynamic Bayes net. In Proceedings of the 4th International Conference on Educational Data Mining, 241-246 (2011).

10. Xu, Y., Mostow, J.: Comparison of methods to trace multiple subskills: Is LR-DBN best? In Proceedings of the 5th International Conference on Educational Data Mining, 41-48 (2012).

11. Yuan, Y., Chang, K. M., Xu, Y., Mostow, J.: A Toolkit and Dataset for EEG in Reading. In Proceedings of the 12th International Conference on Intelligent Tutoring Systems Workshop on Utilizing EEG Input in Intelligent Tutoring Systems (to appear).

12. Mostow, J., Beck, J. E.: When the Rubber Meets the Road: Lessons from the In-School Adventures of an Automated Reading Tutor that Listens. In B. Schneider & S.-K. McDonald (Eds.), Scale-Up in Education (Vol. 2, pp. 183-200) (2007).

13. Chang, K. M., Beck, J. E., Mostow, J., Corbett, A.: A Bayes net toolkit for student modeling in intelligent tutoring systems. In Proceedings of the 8th International Conference on Intelligent Tutoring Systems, 104-113 (2006).

14. Beck, J. E., Gong, Y.: Wheel-spinning: Students who fail to master a skill. In Proceedings of the 16th International Conference on Artificial Intelligence in Education, 431-440 (2013).


A Public Toolkit and ITS Dataset for EEG

Yueran Yuan, Kai-min Chang, Yanbo Xu, Jack Mostow

Language Technologies Institute, Carnegie Mellon University, Pittsburgh, PA, USA
yuerany, kkchang, yanbox, [email protected]

Abstract. We present a dataset collected since 2012 containing children's EEG signals logged during their usage of Project LISTEN's Reading Tutor. We also present EEG-ML, an integrated machine learning toolkit to preprocess EEG data, extract and select features, train and cross-validate classifiers to predict behavioral labels, and analyze their statistical reliability. To illustrate, we describe and evaluate a classifier that estimates a student's amount of prior exposure to a given word. We make this dataset and toolkit publicly available1 to help researchers explore how EEG might improve intelligent tutoring systems.

Keywords: EEG; toolkit; reading comprehension; machine-learning; Project LISTEN’s Reading Tutor

1 Introduction

With the rising importance of educational data mining and analytics, measuring and predicting student actions and mental states has become a key part of building better educational technologies. Electroencephalography (EEG) records a student's brain activity using electrodes on the scalp. Studies show EEG can be informative of many educationally relevant metrics, including workload [1] and learning [2]. However, most of those studies were done in laboratories using laboratory-grade devices and fail to simulate the cost or environmental constraints of real classroom deployment of EEG devices. To explore the feasibility of practical classroom usage of EEG devices, we need to 1) use EEG devices simple enough for students to wear without assistance and cheap enough for schools to afford en masse, and 2) record students' data in a realistic school setting. Collecting data in this way introduces two notable challenges: 1) the reduced dimensionality resulting from fewer sensors on a cheaper device, and 2) the environmental noise inherent in an uncontrolled setting.

In this paper, we present a dataset from 3 years of school usage of Project LISTEN's Reading Tutor during which we recorded students' EEG signals. The signals were recorded with a consumer-grade single-channel device that now costs less than $100 each. We also describe EEG-ML, a machine learning toolkit to create and evaluate EEG-based classifiers of student actions and mental states. Many general-purpose EEG processing software and machine learning packages have been implemented and distributed; however, combining EEG processing with machine learning often involves complicated coding effort. EEG-ML simplifies the research process by providing a single pipeline for signal processing, classifier building, and cross-validated evaluation. We do not claim that the toolkit is an algorithmic innovation, but rather a framework and baseline implementation that allows researchers to explore different algorithms and prediction tasks without needing to write much code. We demonstrate a use case on the Reading Tutor dataset by applying the toolkit to estimate students' level of prior exposure to the word they are reading.

1 https://sites.google.com/site/its2014wseeg/eeg_ml

2 Project LISTEN’s Reading Tutor EEG Dataset

Project LISTEN's Reading Tutor [3] (Fig. 1) is an intelligent tutoring system that displays text, listens to a student read it aloud, uses automated speech recognition to track the student's position in the text and detect miscues [4], and responds with spoken and graphical feedback. For 3 years, students 7-13 years old have worn EEG devices while they used the Reading Tutor. Our dataset consists of EEG signals and Reading Tutor logs collected during this period.

Fig. 1. On the left, NeuroSky’s BrainBand. On the right, two students wear BrainBands that log their EEG data while they use the Reading Tutor.

2.1 Behavioral Data

We define a trial as a behavioral event with some outcome label, along with the corresponding EEG signal recorded during that time. Two types of such events are:

Sentence Encounter. During a session with the Reading Tutor, the tutor presents one sentence (or fragment) at a time and asks the student to read it aloud. As the student reads the sentence, the words recognized by the tutor turn green.

Word Encounter. The student’s speech is recognized as a time-aligned sequence of words by an Automated Speech Recognizer (ASR). The ASR estimates when each text word was read and whether the reading was correct. The tutor computes latency as the duration between successive text words. The ASR is imperfect – it detects only about 25% of misread words and falsely rejects about 2% of correctly read words.


2.2 EEG Data

We used NeuroSky's BrainBands to collect EEG data (see Fig. 1). The BrainBand is a wireless device with one electrode on the forehead (frontal lobe), roughly between Fp1 and Fp2 in the 10-20 system. BrainBands output raw EEG signals at a rate of 512 Hz and NeuroSky's proprietary eSense measures at 1 Hz. The BrainBand is a consumer product, so it is designed with ease of use in mind. Unlike the multi-channel electrode nets worn in labs, the BrainBand requires no gel or saline for recording, making it easier to wear and maintain. Students are able to put on the headset with minimal supervision. Students included in the dataset encountered ~160,000 sentences containing ~800,000 words, and we recorded about 108 hours of EEG data, though not all sentence and word encounters have corresponding EEG data, and vice versa. Our example word-exposure classifier used data from the 17 students with the most data, who had an average of 3,600 words with aligned EEG.

3 The EEG-ML Toolkit

Machine Learning for EEG (EEG-ML) is a toolkit for studying EEG in the context of intelligent tutoring systems. The pipeline attempts to cover the complete process of signal processing, machine learning, and evaluation/analysis. Much of this pipeline has been described previously [5]. See Fig. 2 for the pipeline's overall structure. We describe the important components below (see the project website1 for a full description).

Fig. 2. Overview of pipeline of EEG machine-learning toolkit

As a demonstration of the toolkit and dataset, we use them to create a predictor of students' number of prior encounters of particular words. Notably, this measure (word exposure) is fairly well insulated from ASR error; we urge caution when studying measures that could be heavily impacted by ASR error, such as latency or correctness. We will use this classifier as a running example as we describe our pipeline.


Inputs. The pipeline's inputs are 1) a spreadsheet of behavioral data containing the label that we want to predict, 2) a spreadsheet of EEG data, and 3) a set of parameters specifying the algorithms and arguments to be used in the pipeline. For our word-exposure classifier, we labeled the first 11 encounters of a word (by each subject) as 'early encounters' and the remaining encounters as 'late encounters'. Example: if the subject saw 'cat' 30 times, the first 11 times are early encounters and the final 19 times are late encounters. We chose 11 as the threshold so that we would have roughly the same number of early and late encounters. To avoid skewing our models with subjects who have little data, we removed subjects with less than 8,000 seconds of EEG recordings and analyzed 17 students who read ~62,000 words in total.
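The early/late labeling rule can be sketched as follows (the threshold of 11 comes from the text; the function name is illustrative):

```python
# Sketch of the word-exposure labeling rule (illustrative).

def exposure_labels(n_encounters, threshold=11):
    """Label the first `threshold` encounters of a word by a subject as
    'early' and all remaining encounters as 'late'."""
    return ['early' if k <= threshold else 'late'
            for k in range(1, n_encounters + 1)]

labels = exposure_labels(30)  # e.g. a subject saw 'cat' 30 times
```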

Pipeline. Given the behavioral and EEG data, the pipeline 1) aligns the corresponding EEG signals to each trial in the behavioral data, 2) filters and derives features for each trial from the EEG signals aligned to that trial, and 3) splits the data into training and testing sets following a cross-validation scheme. Within each cross-validation fold, we use feature selection to reduce the dimensionality, then train a classifier on the training set and apply it to the testing set. Finally, the pipeline aggregates classification results and evaluates the classifier's performance. This entire process happens offline.

EEG Preprocessing and Filtering. Many noise sources (including eye blinks and facial expressions) can introduce artifacts into the recorded signals. To remove potential artifacts, the pipeline uses soft thresholding with wavelets to denoise the signals [6]. The pipeline also allows experimenters to remove trials whose EEG signal had a certain proportion of low-quality samples. We use NeuroSky's PoorSignal score as a measure of signal quality. In building our word-exposure classifier, we filter aggressively: we filter out all trials where more than 50% of the corresponding signals have poor reported signal quality (a score of 100 or higher on NeuroSky's 0 to 200 poor-signal scale).

Feature Generation. The unit of analysis is an individual trial. The pipeline breaks the trial into several epochs - EEG segments of a fixed length. For example, a 3-second-long trial could be broken into 3 epochs of 1 second each with no overlap, or into 5 epochs of 1 second each with 0.5 seconds of overlap between epochs.

The pipeline uses the Fast Fourier Transform to extract oscillation features from each epoch - the delta (1-3 Hz), theta (4-7 Hz), alpha (8-11 Hz), beta (12-29 Hz), and gamma (30-100 Hz) frequency bands. Using these per-epoch features, the pipeline derives a set of higher-level features (e.g. mean, variance) for each trial. Our word-exposure classifier used 5 features: the means of the delta, theta, alpha, beta, and gamma features over the trial's epochs.
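The epoching scheme, including the overlapping case in the example above, can be sketched as follows (illustrative helper; the per-epoch FFT step is omitted):

```python
# Sketch of splitting a trial's samples into fixed-length epochs, optionally
# overlapping, before per-epoch FFT features are computed.

def epochs(samples, rate_hz, epoch_s=1.0, step_s=1.0):
    """Return epochs of epoch_s seconds taken every step_s seconds
    (step_s < epoch_s yields overlapping epochs)."""
    length, step = int(epoch_s * rate_hz), int(step_s * rate_hz)
    return [samples[i:i + length]
            for i in range(0, len(samples) - length + 1, step)]

trial = list(range(3 * 512))                   # a 3 s trial at 512 Hz
no_overlap = epochs(trial, 512)                # 3 epochs of 1 s each
half_overlap = epochs(trial, 512, step_s=0.5)  # 5 epochs with 0.5 s overlap
```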

Cross-Validation. The pipeline supports leave-one-out cross-validation with a within-subject or between-subject scheme. In the within-subject scheme, the training set and test set are taken from the same subject, creating a subject-specific classifier that trains on all but one trial from the subject and tests on the left-out trial. In the between-subject scheme, we train on all trials from all but one subject and test on the trials of the left-out subject. The between-subject scheme lets us simulate how the algorithm will perform on unseen subjects. We used within-subject cross-validation to evaluate our word-exposure classifier.
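The between-subject (leave-one-subject-out) scheme can be sketched as follows (the data layout, a dict of per-subject trial lists, is illustrative):

```python
# Sketch of leave-one-subject-out splitting (illustrative data layout).

def leave_one_subject_out(trials_by_subject):
    """Yield (held_out, train, test): test on one subject's trials and
    train on everyone else's, simulating an unseen student."""
    for held_out in trials_by_subject:
        test = trials_by_subject[held_out]
        train = [t for s, ts in trials_by_subject.items()
                 if s != held_out for t in ts]
        yield held_out, train, test

data = {'s1': ['a', 'b'], 's2': ['c'], 's3': ['d', 'e']}
folds = list(leave_one_subject_out(data))  # one fold per subject
```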


Feature Selection. When we have little data but many features, we often want to use feature selection to reduce the dimensionality of the data before feeding it to a classifier, in order to learn a classifier less sensitive to noise. The pipeline supports two feature selection methods - Principal Component Analysis (PCA) and t-test based rank feature selection. Because of the high level of noise in our data, our word-exposure classifier used 3-dimensional PCA to avoid over-fitting.

Train/Apply Classifier. Our pipeline supports two types of classifiers - linear SVM and Gaussian Naive Bayes. The linear SVM classifier is more commonly used in brain signal processing. A Gaussian Naive Bayes classifier allows us to train non-linear classifiers, and to train them more quickly than SVM. Our word-exposure classifier used a Gaussian Naive Bayes classifier.

Evaluation. The pipeline computes 1) classification accuracy (ACC), 2) a chi-squared test comparing accuracy to chance (one over the number of categories), and 3) the receiver operating characteristic (ROC) curve and area under the curve (AUC). ACC is intuitive and widely used, but it can have issues with class size imbalance – a majority class classifier could obtain above-chance results. AUC calculates the area under the ROC curve, which is insensitive to data set imbalance. A majority class model would show a diagonal line from the bottom left to the top right corner in ROC space, and get an AUC score of 0.5.
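The chance comparison can be sketched as a one-degree-of-freedom chi-squared test on the correct/incorrect counts (the counts below are illustrative, not the paper's; for 1 df the p-value equals erfc(sqrt(x2/2))):

```python
import math

# Sketch of a 1-df chi-squared test of accuracy against chance (= 1 over
# the number of categories; 0.5 for a two-class problem).

def chi2_vs_chance(n_correct, n_total, chance=0.5):
    expected = n_total * chance
    x2 = ((n_correct - expected) ** 2 / expected
          + ((n_total - n_correct) - (n_total - expected)) ** 2
          / (n_total - expected))
    # Chi-squared survival function for 1 degree of freedom.
    return x2, math.erfc(math.sqrt(x2 / 2.0))

x2, p = chi2_vs_chance(570, 1000)  # hypothetical: 57% accuracy on 1000 trials
```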

Outputs. The outputs of the pipeline are 1) a table showing accuracy, AUC, N (number of data points), and p-value, and 2) a spreadsheet in which each row of the original behavioral data is annotated with the prediction made by the classifiers in the experiment, so that further analysis may be done in other programs if desired. Our word-exposure classifier had an average accuracy of 57%, which was significantly above chance (p < 0.05), with an AUC of 0.60. A measure whose accuracy is only in the high 50s is not practical on its own, but it can potentially be used as a feature in combination with other features to improve student modeling, as shown by Xu et al. [7].

Though significantly above chance, our accuracy is relatively low compared to that claimed in other EEG studies of learning [8]. A subtle difference in independence assumptions might be one reason [5]. Also, we expect the lower-end device and noisy in vivo setting to reduce accuracy. However, further analysis of features (e.g. which bands are most useful) and algorithms (e.g. different classifiers and kernels) could produce incremental improvements to results. Indeed, a key motive for releasing this toolkit and dataset is to provide the research community with a baseline to build upon and a common dataset to evaluate different algorithms.

4 Conclusion

We present a multi-year dataset of EEG data from in vivo usage of an intelligent tutoring system. We also present EEG-ML, a machine learning toolkit to produce and evaluate EEG-based classifiers. We hope the dataset and toolkit will allow researchers to focus on experimentation and analysis rather than data collection and technical implementation, facilitating research into new applications of brain signal processing for building better intelligent tutoring systems.


5 Acknowledgement

This work was supported by the National Science Foundation under Cyberlearning Grant IIS1124240. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

References

[1] Berka, C., D.J. Levendowski, M.N. Lumicao, A. Yau, G. Davis, V.T. Zivkovic, R.E. Olmstead, P.D. Tremoulet, and P.L. Craven. EEG correlates of task engagement and mental workload in vigilance, learning, and memory tasks. Aviation, Space, and Environmental Medicine, 2007. 78 (Supp 1): p. B231-B244.

[2] Miltner, W.H.R., C. Braun, M. Arnold, H. Witte, and E. Taub. Coherence of gamma-band EEG activity as a basis for associative learning. Nature, 1999. 397: p. 434-436.

[3] Mostow, J. and J.E. Beck. When the Rubber Meets the Road: Lessons from the In-School Adventures of an Automated Reading Tutor that Listens. In B. Schneider and S.-K. McDonald, Editors, Scale-Up in Education, 183-200. Rowman & Littlefield Publishers: Lanham, MD, 2007.

[4] Mostow, J. Why and How Our Automated Reading Tutor Listens. Proceedings of the International Symposium on Automatic Detection of Errors in Pronunciation Training (ISADEPT), 43-52. 2012. Stockholm, Sweden: KTH, Computer Science and Communication, Department of Speech, Music and Hearing.

[5] Chang, K.M., J. Nelson, U. Pant, and J. Mostow. Toward Exploiting EEG Input in a Reading Tutor. International Journal of Artificial Intelligence in Education, 2013. 22(1-2): p. 19-38.

[6] Donoho, D.L. De-noising by soft-thresholding. IEEE Transactions on Information Theory, 1995. 41(3): p. 613-627.

[7] Xu, Y., K.M. Chang, Y. Yuan, and J. Mostow. EEG Helps Knowledge Tracing! In Proceedings of the 12th International Conference on Intelligent Tutoring Systems Workshop on Utilizing EEG Input in Intelligent Tutoring Systems. 2014: Honolulu.

[8] Heraz, A. and C. Frasson. Predicting the three major dimensions of the learner's emotions from brainwaves. World Academy of Science, Engineering and Technology, 2007. 31: p. 323-329.
