
Classification and Transfer Learning of EEG during a Kinesthetic Motor Imagery Task using Deep Convolutional Neural Networks

A Thesis

Presented to

The Faculty of the Department of Electrical Engineering

University of Houston

In Partial Fulfillment

Of the Requirements for the degree

Master of Science

in Electrical Engineering

by

Alexander Craik

December 2018


Classification and Transfer Learning of EEG during a Kinesthetic Motor Imagery Task using Deep Convolutional Neural Networks

______________________

Alexander Craik

Approved: ____________________________
Chair of the Committee
Dr. Jose Contreras-Vidal, Professor, Electrical and Computer Engineering

Committee Members: ____________________________
Dr. Saurabh Prasad, Assistant Professor, Electrical and Computer Engineering

____________________________
Dr. Luca Pollonini, Assistant Professor, Computer Engineering Technology

____________________________
Dr. Suresh K. Khator, Associate Dean, Cullen College of Engineering

____________________________
Dr. Badri Roysam, Professor and Chair, Dept. of Electrical and Computer Engineering


Acknowledgements

This thesis is dedicated to my parents, Keith Craik and Diane Alexander, who have

provided me with an immeasurable amount of support and advice throughout my years in

academia. I’d also like to thank my research advisor, Dr. Jose Contreras-Vidal, for his

mentorship, and my friend and colleague, Dr. Yongtian He, for his insights and research

guidance.


Classification and Transfer Learning of EEG during a Kinesthetic Motor Imagery Task using Deep Convolutional Neural Networks

An Abstract

of a

Thesis

Presented to

The Faculty of the Department of Electrical Engineering

University of Houston

In Partial Fulfillment

Of the Requirements for the degree

Master of Science

in Electrical Engineering

by

Alexander Craik

December 2018


Abstract

The reliable classification of Electroencephalogram (EEG) signals is a crucial step

towards making EEG-controlled non-invasive Brain-Machine exoskeleton rehabilitation

a practical reality. EEG signals collected during motor imagery tasks have been proposed

to act as a control signal for exoskeleton applications. Here, a Deep Convolutional Neural

Network (DCNN) was optimized to classify a two-class kinesthetic motor imagery EEG

dataset, leading to an architecture consisting of four convolutional layers and three fully

connected layers. Transfer learning, or the leveraging of data from past subjects to

classify the intentions of a new subject, is important for rehabilitation as it helps to

minimize the number of training sessions required from disabled subjects, who lack full

motor functionality. The transfer learning training paradigm investigated through this

thesis utilized region criticality trends to reduce the number of new subject training

sessions and increase the classification performance when compared against a single-

subject non-transfer-learning classifier.


Table of Contents

Acknowledgements
Abstract
Table of Contents
List of Figures
List of Tables
Chapter 1: Introduction
    1.1 Exoskeleton Assisted Therapy
    1.2 EEG Signal Classification
    1.3 Deep Learning Classification of EEG
    1.4 Classification of the Kinesthetic Motor Imagery Task Dataset
    1.5 Transfer Learning
    1.6 Specific Aims and Contributions
    1.7 Thesis Organization
Chapter 2: Methods
    2.1 Data Acquisition and Experimental Design
    2.2 EEG Signal Pre-Processing
    2.3 Neural Network Optimization Method
    2.4 Transfer Learning Method
Chapter 3: Results
    3.1 Single-subject Architecture Optimization
    3.2 Transfer Learning
        3.2.1 Region Criticality Analysis Results
        3.2.2 Transfer Learning Results
Chapter 4: Discussion
Chapter 5: Conclusion
References


List of Figures

Figure 1: Two types of exoskeletons in use today
Figure 2: A) The international 10-20 system for EEG electrode placement B) A 60-channel electrode cap with 4 visible EOG electrodes
Figure 3: A) Common neural network feed-forward algorithm and B) common neural network backpropagation algorithm
Figure 4: 13 Regions of importance
Figure 5: Removal of EOG artifacts
Figure 6: Example deep convolutional neural network architecture
Figure 7: Regions of importance implementation
Figure 8: Region criticality analysis of the external subjects 2 and 3 while subject 1 is the intended primary subject
Figure 9: Accuracy comparisons as a function of the number of convolutional layers
Figure 10: Accuracy comparisons as a function of the number of fully-connected classifier layers
Figure 11: Differences in the shape of the classifier block
Figure 12: Accuracy comparisons as a function of the classifier block shape
Figure 13: The optimized deep convolutional neural network architecture
Figure 14: Region criticality analysis, single-subject model (A) and transfer learning model (B), and scalp maps for Subject 1
Figure 15: Region criticality analysis, single-subject model (A) and transfer learning model (B), and scalp maps for Subject 2
Figure 16: Region criticality analysis, single-subject model (A) and transfer learning model (B), and scalp maps for Subject 3
Figure 17: Region criticality analysis and scalp map for all three subjects combined
Figure 18: The session-by-session (A) and average accuracies (B) found from different dataset formations for Subject 1
Figure 19: The session-by-session (A) and average accuracies (B) found from different dataset formations for Subject 2
Figure 20: The session-by-session (A) and average accuracies (B) found from different dataset formations for Subject 3


List of Tables

Table 1: Common frequency band designations, corresponding frequency ranges, and characteristic mental states
Table 2: Index and names for the thirteen regions of importance
Table 3: Transfer Learning Model Results


Chapter 1: Introduction

1.1 Exoskeleton Assisted Therapy

Hundreds of thousands of Americans are living with severe motor disabilities due to

limb loss, spinal cord injuries, and neurodegenerative diseases (Forbes, Duncan, &

Zimmerman, 1997; B. B. Lee, Cripps, Fitzharris, & Wing, 2014). Subjects who have

survived multiple stroke events tend to walk in an overcompensating manner, leading to

the eventual necessity of relearning gait patterns during rehabilitation (Contreras-Vidal et

al., 2016; Forbes et al., 1997). Subjects with spinal cord injuries or limb loss live with

severely reduced freedom of movement and a reliance on wheelchairs or caregivers (B.

B. Lee et al., 2014). Rehabilitation for these subjects is typically labor and cost intensive

as significant support from physical therapists is required (Zemke, Heagerty, Lee, &

Cramer, 2003). These limitations and recent technological advances in control systems

and robotics have led researchers to investigate the possibility of using exoskeletons to assist

in the rehabilitation of these subjects (Jarrassac et al., 2014; Strausser, Swift, & Zoss,

2018). Figure 1 shows two examples of the types of exoskeletons in use today.


Figure 1: Two types of exoskeletons in use today - A) HAL5 – exoskeleton designed to support four extremities with motor function assist B) ReWalk – exoskeleton designed to support hip and knee movement (Medicine, Miko, & Miko, 2013)

Exoskeletons are a robotic application that can be used to expand or improve upon a

user’s motor functionality. Exoskeletons have been proposed to act as an alternative to

wheelchairs, to increase a subject’s mobility during certain activities (walking, climbing

stairs, upper-body functionality) (Contreras-Vidal et al., 2016), and as a tool to assist in

gait re-education therapy following stroke or lower-body injuries (Strausser et al., 2018).

Traditionally, exoskeleton control is either fully or partially manual in nature. Manual

control is accomplished with the use of user-activated buttons or joysticks; however, this

is not always a feasible solution as manual control deprives the subject of his or her hand

freedom (Noda et al., 2012). Surgically invasive control signal acquisition methods have

also been proposed for both primates and tetraplegic subjects, but the dangers due to


surgery and the degradation of the control signal over time make this a less than

practical solution (Carmena et al., 2003; Hochberg et al., 2012). Due to these practicality

and health risk issues, electroencephalography (EEG) signals, which reflect electrical activity of the

brain measured at the scalp, have been proposed as a possible solution due to their high

temporal resolution, non-invasiveness, and relatively low financial cost (He et al., 2018;

K. Lee, Liu, Perroud, Chavarriaga, & Millán, 2017; Lotte et al., 2007; Noda et al., 2012;

Pouratian, 2012).

1.2 EEG Signal Classification

Electroencephalography is the typically non-invasive

recording of electrical activity from the brain, measured at the scalp (Teplan, 2002). EEG

signals are collected via electrodes placed along the scalp using a standard electrode

placement system called the 10-20 international system. The 10-20 international system,

shown in Figure 2A, represents the recognized standard location protocol for EEG

electrode placement. The ‘10’ and ‘20’ designations refer to the distances between

adjacent electrodes in that an electrode is either 10% or 20% of the total left-right or

front-back skull length from any adjacent electrode. Figure 2B shows a subject with a 60-

channel electrode cap. Also present are four additional electrodes placed near the eyes.

These electrodes measure electrooculography activity, which helps to identify and

eliminate eye-blink artifacts.


Figure 2: A) The international 10-20 system for EEG electrode placement B) A 60-channel electrode cap with 4 visible EOG electrodes.

EEG signals are complex and are defined in terms of rhythmic characteristics. The

rhythmic activity is distributed into different frequency bands. People have different

amplitude and frequency characteristics of EEG signals depending on factors like age

(Duffy, Mcanulty, & Albert, 1996). Based on frequency ranges, six types of waves can

be identified. They are delta (δ), theta (θ), mu (µ), alpha (α), beta (β), and low/high

gamma (γ), listed from low to high frequency. Different mental states and artifacts

are associated with different frequency bands, as described in Table 1. In this

table, three major EEG artifact sources are listed in their respective frequency ranges.

Electrooculography (EOG) artifacts, signals produced by eye movements, are contained

within the Delta band, while electromyography (EMG) artifacts, signals produced by

muscle movement, and electrical line noise are contained within the high Gamma band.


Table 1: Common frequency band designations, corresponding frequency ranges, and characteristic mental states

Band | Frequency Range (Hz) | Characteristic Mental State
Delta (δ) | 0-4 | Continuous attention tasks (Kirmizi-Alsan et al., 2006); contains dominant EOG frequency range
Theta (θ) | 4-7 | Associated with inhibition of particular responses (Kirmizi-Alsan et al., 2006)
Mu (µ) | 8-12 | Rest-state motor neurons, visualization of motor actions (Lazarou, Nikolopoulos, Petrantonakis, Kompatsiaris, & Tsolaki, 2018)
Alpha (α) | 8-15 | Associated with inhibition, relaxed state (Kirmizi-Alsan et al., 2006)
Beta (β) | 16-30 | Active thinking, focused state (Baker, 2007)
Low Gamma (γ) | 30-50 (Greyson, Kelly, & Dunseath, 2013) | Related to synchronization process of different parts of the brain (Kort, Cuesta, Houde, & Nagarajan, 2016)
High Gamma (γ) | >50 (Greyson et al., 2013) | Contains the dominant EMG frequency range (50 Hz+) (Luca, 2002) and electrical line noise (60 Hz)
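
As a small illustration of how the band designations in Table 1 might be used programmatically, the following sketch maps a frequency to the band or bands that contain it. The band edges are transcribed from Table 1; the half-open interval convention and the names `BANDS` and `bands_for` are assumptions made for this example only. Because the Mu and Alpha ranges overlap, a frequency can belong to more than one band, and the table leaves small gaps (7-8 Hz and 15-16 Hz) that map to no band.

```python
# Band edges in Hz, transcribed from Table 1.
# Assumption: each range is treated as half-open, [low, high).
BANDS = {
    "delta": (0.0, 4.0),
    "theta": (4.0, 7.0),
    "mu": (8.0, 12.0),
    "alpha": (8.0, 15.0),
    "beta": (16.0, 30.0),
    "low gamma": (30.0, 50.0),
    "high gamma": (50.0, float("inf")),
}


def bands_for(freq_hz):
    """Return the names of every band whose range contains freq_hz."""
    return [name for name, (low, high) in BANDS.items() if low <= freq_hz < high]


print(bands_for(1.0))   # a frequency inside the Delta range
print(bands_for(10.0))  # Mu and Alpha overlap here
print(bands_for(60.0))  # electrical line noise falls in high Gamma
```
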

The measurements and classifications of these signals are used to control either

software- or hardware-based external objects (Bahy, Hosny, Mohamed, & Ibrahim,

2017). In order to produce reliable control signals for brain-computer interfaces (BCIs), and in particular


exoskeletons, EEG measurements are recorded during various tasks and machine learning

is used to classify a certain stage within that task. The most prevalent types

of tasks found within the literature fall into five general groups: emotion recognition,

motor imagery, mental workload, sleep pattern classification, and seizure detection.

Emotion recognition studies attempt to gauge a subject’s current emotional state by

training a classifier through the subject’s repetition of a multi-emotion state task, such as

(Zheng & Lu, 2015), which used video clips that were identified as producing a specific

emotion. The primary drive for emotion recognition studies is the eventual application in

brain-machine interfaces as understanding a subject’s emotion will help the underlying

algorithm decide whether a selected movement was the intended movement. More

generally, emotion recognition studies help computers better understand the current

emotional state of the user.

Mental workload tasks involve measuring EEG data while the subject is under

varying degrees of mental task complexity, such as airplane pilot and long-range driving

studies (Hajinoroozi, Mao, & Huang, 2015; Yin & Zhang, 2017). This kind of task may

be applied in two general areas: cognitive stress monitoring or brain-machine

performance monitoring.

Seizure detection studies (Hosseini, Pompili, Elisevich, & Soltanian-Zadeh, 2017;

Korshunova et al., 2017) are designed for the eventual application of detecting

upcoming seizures in order to preemptively notify the epileptic subject. Sleep stage

scoring tasks focus on reducing the reliance on trained personnel in the analysis and


understanding of a subject’s sleep stages (Dong et al., 2018; Tsinalis, Matthews, & Guo,

2016).

The fifth general task type is motor imagery tasks, which involve having the

subject imagine certain muscle movements of the limbs and/or the tongue (Pfurtscheller

& Neuper, 2001). Motor imagery tasks are the most prominent EEG task type used for

exoskeleton control (Pouratian, 2012). The specific subject protocol for this type of task

falls into two groups (Fery, 2014): visual motor imagery (VMI) and kinesthetic motor

imagery (KMI). In visual motor imagery tasks, the subject imagines seeing himself or

another person performing the motor action, which is referred to as a ‘third-person

process’. Visual motor imagery tasks have shown a higher activation of the occipital

lobe, which indicates that this type of task is primarily visual in character (Stinear,

Byblow, & Swinnen, 2006). Kinesthetic motor imagery tasks involve the subject

imagining self-performed actions in an interior view, referred to as a ‘first-person

process’, by focusing on the activated muscle groups associated with the internalized

motor action. KMI tasks produce higher activations of the sensorimotor areas (Stinear et

al., 2006) rather than the occipital lobe.

One major drawback of using EEG signals is that the signal-to-noise ratio

(SNR) is low. This characteristic leads to two distinct challenges, the first being accurate

classification. Recent advances in deep learning techniques, in particular deep

convolutional neural networks, have allowed past research to accurately classify EEG

data.


1.3 Deep Learning Classification of EEG

Neural networks did not immediately receive the high attention seen today in neural

classification applications because of practical issues, such as very long computation time

and problems with vanishing or exploding gradients (Bengio, Simard, & Frasconi,

1994). Fortunately, the recent development of graphics processing units (GPUs) brought

neural network researchers an inexpensive and powerful solution to their hardware

bottleneck (Lecun, Bengio, & Hinton, 2015), allowing them to investigate deep learning

architectures (neural network architectures containing at least two hidden layers). These

innovations have led to an exponential increase in interest and applications of deep

learning in the past decade. Because neural networks iteratively and automatically

optimize many of their parameters, they are generally believed to require less prior expert

knowledge about the dataset to perform well (Lecun et al., 2015). This advantage led to

early adoption in the realm of medical imaging (Greenspan, van Ginneken, &

Summers, 2016), which usually involves large datasets that are otherwise difficult to

interpret, even by experts. Recently, deep learning frameworks have been applied to the

classification of EEG signals.

To better understand the state of deep learning classification of EEG, a review of the

state of the art was performed. For motor imagery

tasks, the type of task this thesis investigates, two deep learning implementations were

found among the reviewed studies: convolutional neural networks and deep belief

networks. Of those two options, studies that used deep belief networks were drawn

towards hand-crafted features, whereas studies that implemented convolutional neural

networks were able to process raw data directly. For instance, (Tang, Li, & Sun, 2017)


achieved 92.5% accuracy in a 2-class motor imagery problem, whereas (Liu, Cheng, &

Zhang, 2015) achieved 100% accuracy in a 4-class motor imagery problem with both

studies utilizing deep convolutional neural networks. As the minimization of pre-

processing effort has significant benefits for online exoskeleton practicality, this thesis

revolves around using deep convolutional neural networks to classify a kinesthetic motor

imagery task.

Deep Convolutional Neural Networks (DCNN) are biologically-inspired variations

of feedforward multi-layer neural networks. Neural networks are based on a series of

connected nodes, which ‘learn’ the connection weights between themselves by

considering examples in a feed-forward process and then adjusting the interconnecting weights

through a non-linear process of backpropagation (Lecun et al., 2015). Figure 3 describes

the feed-forward and back-propagation process of a neural network with two hidden

layers.


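$y_l = f(z_l)$ (1.3.1)

$z_l = \sum_{k} w_{kl}\, y_k$ (1.3.2)

$k \in H_2$ (1.3.3)

$y_k = f(z_k)$ (1.3.4)

$z_k = \sum_{j} w_{jk}\, y_j$ (1.3.5)

$j \in H_1$ (1.3.6)

$y_j = f(z_j)$ (1.3.7)

$z_j = \sum_{i} w_{ij}\, y_i$ (1.3.8)

$i \in \text{input}$ (1.3.9)

$\frac{\partial E}{\partial y_l} = y_l - t_l$ (1.3.10)

$\frac{\partial E}{\partial z_l} = \frac{\partial E}{\partial y_l} \frac{\partial y_l}{\partial z_l}$ (1.3.11)

$\frac{\partial E}{\partial y_k} = \sum_{l \in \text{out}} w_{kl} \frac{\partial E}{\partial z_l}$ (1.3.12)

$\frac{\partial E}{\partial z_k} = \frac{\partial E}{\partial y_k} \frac{\partial y_k}{\partial z_k}$ (1.3.13)

$\frac{\partial E}{\partial y_j} = \sum_{k \in H_2} w_{jk} \frac{\partial E}{\partial z_k}$ (1.3.14)

$\frac{\partial E}{\partial z_j} = \frac{\partial E}{\partial y_j} \frac{\partial y_j}{\partial z_j}$ (1.3.15)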

Figure 3: A) Common neural network feed-forward algorithm and B) common neural network backpropagation algorithm (Lecun et al., 2015)


In the feed-forward algorithm presented in Figure 3-A, input data is fed into the

input neurons and proceeds upwards using the following basic protocol. The value for

each neuron, $z_j$, is a summation of the input data, $x_i$ (written $y_i$ in eq. 1.3.8), each multiplied by its connection weight,

$w_{ij}$ (eqs. 1.3.8 and 1.3.9). A non-linear activation function is then applied to the value of the neuron,

$y_j = f(z_j)$ (eq. 1.3.7). $y_j$ now acts as an input for the following layer. This process

repeats through each layer until the final classification layer, which outputs the class

prediction (eq. 1.3.1).
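
The feed-forward protocol described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the implementation used in this thesis: the layer sizes, the random weights, and the choice of `tanh` as the activation function $f$ are all hypothetical.

```python
import numpy as np


def forward(x, weights, f=np.tanh):
    """Propagate an input vector through successive fully connected layers.

    weights is a list of matrices; weights[n][a, b] connects unit a of one
    layer to unit b of the next. Returns a list of (z, y) pairs, one per layer.
    """
    y = x
    trace = []
    for W in weights:
        z = y @ W   # weighted sum of the previous layer's outputs (eqs. 1.3.2, 1.3.5, 1.3.8)
        y = f(z)    # non-linear activation (eqs. 1.3.1, 1.3.4, 1.3.7)
        trace.append((z, y))
    return trace


rng = np.random.default_rng(0)
sizes = [4, 5, 3, 2]  # hypothetical sizes: input, H1, H2, output
weights = [rng.normal(scale=0.5, size=(a, b)) for a, b in zip(sizes[:-1], sizes[1:])]
x = rng.normal(size=sizes[0])

trace = forward(x, weights)
y_out = trace[-1][1]  # output-layer activations, one value per class
```
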

In the backpropagation algorithm presented in Figure 3-B, the process starts at the

output layer, where the error derivative is computed by comparing the predicted output

with the true output using a cost function (eq. 1.3.10). The specific cost function varies

between applications, but the general process involves propagating the error derivative

backwards through each layer (eqs. 1.3.11 to 1.3.15), calculating the derivative of the

error with respect to each connection weight, $w_{ij}$, and iteratively adjusting the weights to

minimize error (Lecun et al., 2015).
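
The backward pass can be sketched as follows. This is an illustrative example rather than the training code used in this thesis. It assumes a squared-error cost $E = \frac{1}{2}\sum_l (y_l - t_l)^2$, which yields the output derivative of eq. 1.3.10, and a `tanh` activation, for which $\partial y / \partial z = 1 - y^2$. The layer sizes, targets, and learning rate are hypothetical. A finite-difference check is included to confirm that the analytic gradient is consistent with the cost.

```python
import numpy as np


def forward(x, Ws):
    """Return the activations of every layer, starting with the input itself."""
    ys = [x]
    for W in Ws:
        ys.append(np.tanh(ys[-1] @ W))
    return ys


def loss(x, t, Ws):
    """Squared-error cost (an assumption; the cost function varies by application)."""
    return 0.5 * np.sum((forward(x, Ws)[-1] - t) ** 2)


def backward(x, t, Ws):
    """Return dE/dW for every weight matrix via backpropagation."""
    ys = forward(x, Ws)
    dE_dy = ys[-1] - t                          # eq. 1.3.10
    grads = [None] * len(Ws)
    for n in reversed(range(len(Ws))):
        dE_dz = dE_dy * (1.0 - ys[n + 1] ** 2)  # eqs. 1.3.11, 1.3.13, 1.3.15 with tanh
        grads[n] = np.outer(ys[n], dE_dz)       # dE/dw_ab = y_a * dE/dz_b
        dE_dy = Ws[n] @ dE_dz                   # eqs. 1.3.12, 1.3.14
    return grads


rng = np.random.default_rng(1)
sizes = [4, 5, 3, 2]  # hypothetical sizes: input, H1, H2, output
Ws = [rng.normal(scale=0.5, size=(a, b)) for a, b in zip(sizes[:-1], sizes[1:])]
x = rng.normal(size=sizes[0])
t = np.array([0.5, -0.5])  # hypothetical target output

grads = backward(x, t, Ws)

# Finite-difference check of a single weight's gradient.
eps = 1e-6
Wp = [W.copy() for W in Ws]
Wm = [W.copy() for W in Ws]
Wp[0][0, 0] += eps
Wm[0][0, 0] -= eps
numeric = (loss(x, t, Wp) - loss(x, t, Wm)) / (2 * eps)

# One gradient-descent step with a small, hypothetical learning rate.
learning_rate = 0.01
before = loss(x, t, Ws)
after = loss(x, t, [W - learning_rate * g for W, g in zip(Ws, grads)])
```
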

Convolutional neural networks are structured with a series of convolution and pooling

stages prior to one or more fully-connected layers. Individual units of a convolution layer

are organized into feature maps, and each unit is linked to local patches of the feature maps

from the previous layer through a collection of shared weights called a filter bank. The

pooling layers combine similar features from the convolutional layer into a single feature.

The use of local receptive fields, weight sharing, and pooling layers helps to reduce the

high dimensionality of EEG data (Lawrence, 1997). The filter banks necessary to perform

these convolutions are automatically adjusted through back-propagation. More

conceptual information on neural networks can be found in (Lecun et al., 2015).
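
To make the ideas of weight sharing and pooling concrete, the sketch below applies one shared filter along a short one-dimensional signal and then max-pools the resulting feature map. It is a toy example: the signal, the three-tap filter, and the pooling width are hypothetical, and a real convolutional layer would learn many such filters through backpropagation.

```python
import numpy as np


def conv1d_valid(signal, kernel):
    """Slide one shared kernel along the signal (cross-correlation, no padding)."""
    n_out = len(signal) - len(kernel) + 1
    return np.array([np.dot(signal[i:i + len(kernel)], kernel) for i in range(n_out)])


def max_pool1d(feature_map, width):
    """Keep the maximum of each non-overlapping window; a trailing remainder is dropped."""
    usable = len(feature_map) - len(feature_map) % width
    return feature_map[:usable].reshape(-1, width).max(axis=1)


signal = np.array([0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0, 1.0, 2.0])
kernel = np.array([-1.0, 0.0, 1.0])  # hypothetical filter that responds to rising slopes

feature_map = conv1d_valid(signal, kernel)  # the same three weights are reused at every position
pooled = max_pool1d(feature_map, 2)         # fewer values than the feature map
```

Here nine input samples are reduced to seven feature-map values using only three weights, and pooling reduces those to three values, which illustrates how these stages lower the dimensionality passed to the fully-connected layers.
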


EEG studies that utilize machine learning methods other than deep

convolutional neural networks, such as multi-layer perceptrons, deep belief networks, or

stacked auto-encoders, typically use hand-crafted features, which are time consuming to create and

rely heavily on expert prior knowledge (Lotte et al., 2007). Convolutional neural

networks have been shown to work well with the raw EEG signal (O’Shea, Lightbody,

Boylan, & Temko, n.d.; Tang et al., 2017; Van Putten, Olbrich, & Arns, 2018), which

gives them an advantage for applications where the amount of available pre-processing time

is limited. Among deep learning EEG studies, DCNN studies had the greatest proportion

of studies using signal values as inputs and the majority of studies did not limit the

number of channels, indicating that DCNNs are more capable of handling the high

dimensionality and size of EEG signal value datasets when compared to other machine

learning algorithms. For example, (Yanagimoto & Sugimoto, 2016) achieved higher

accuracy with raw signal values from all channels when classifying a public EEG dataset

(Soleymani, Member, & Lee, 2012) than other studies that required extensive effort

creating inputs (Jirayucharoensak et al., 2014; Li, Huang, Zhou, & Zhong, 2017; Xiang

Li et al., 2016).

In order to optimize the neural network architecture, key parameters must be

compared. For convolutional neural networks, the number and size of the convolutional

and fully-connected classifier layers are typically seen as the most influential parameters.

Four studies that used raw EEG data compared accuracies while varying the number of

convolutional layers. (Yanagimoto & Sugimoto, 2016), an emotion recognition study,

and (Acharya, Oh, Hagiwara, Tan, & Adeli, 2017), a seizure detection study, found that

five convolutional layers achieved the best accuracies. (Antoniades et al., 2017) found


that accuracy peaked with four convolutional layers and trended downwards as

convolutional layers increased. (Schirrmeister et al., 2017) compared a shallow CNN with two

convolutional layers against a deep CNN with four convolutional layers and found that the

deep CNN consistently outperformed the shallow CNN. While there were no studies that

specifically compared the different numbers of classifier layers, the identified studies

used one or two fully connected layers.

Of the four studies that used convolutional neural networks to classify motor

imagery data (Liu et al., 2015; Sakhavi, Guan, & Yan, 2015; Tabar & Halici, 2017; Tang

et al., 2017), none specifically implemented or described experiment protocols that would

allow a designation of either kinesthetic or visual motor imagery tasks. While the cortical

activations share some similarities (Stinear et al., 2006), trends for motor imagery task

studies found through the review cannot be entirely relied upon and reliable classification

of KMI tasks requires further optimization analyses.

1.4 Classification of the Kinesthetic Motor Imagery Task Dataset

The dataset analyzed within this thesis was first collected and researched by

(Kilicarslan, Prasad, Grossman, & Member, 2013; Kilicarslan, Grossman, & Contreras-

Vidal, 2016; Zhang, Prasad, Kilicarslan, & Contreras-vidal, 2017). Subjects were fitted

with a wearable powered exoskeleton (REX from REX Bionics Ltd., New Zealand) and

asked to perform a two-class kinesthetic motor imagery task. During these studies,

healthy subjects were asked to imagine themselves moving forward in the exoskeleton

while in the ‘walking’ stage, followed by a period of immobility, the ‘resting’ stage (a

‘GO-NO GO’ task as described in (Kirmizi-Alsan et al., 2006)). The subjects were

specifically asked to imagine themselves walking from a first-person viewpoint, as


opposed to the third-person formulation found in visual motor imagery tasks, with the

stated focus on imagining the muscle groups that would be active if moving

independently of the exoskeleton, thus making this a kinesthetic motor imagery task.

The three studies differed in their approaches to pre-processing and classification.

All three studies used a 2nd order Butterworth filter with a frequency range of 0.1-2 Hz,

meaning the classification focus was solely on the lower Delta frequency band. The

differences in pre-processing center around how each study handled electrooculography

artifacts. The preliminary papers by Kilicarslan and Zhang relied upon the Butterworth

filter to remove artifacts, while Kilicarslan’s second paper introduced a robust adaptive

denoising filter strategy (H∞ filtering) in addition to the Butterworth filter, which proved

to be effective in cleaning EEG signals contaminated with eye-blink

artifacts.
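
The band-limiting step described above can be sketched with SciPy. This is an illustration only and not the pre-processing code of the cited studies. The sampling rate `fs`, the synthetic test signal, the reading of "2nd order" as the order argument passed to `butter`, and the use of zero-phase filtering with `sosfiltfilt` are all assumptions. Zero-phase filtering runs the filter forwards and backwards over the whole recording, so it is suited to offline analysis rather than online control.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

fs = 100.0  # assumed sampling rate in Hz; not taken from the cited studies

# Band-pass Butterworth design for 0.1-2 Hz, returned as second-order sections.
sos = butter(2, [0.1, 2.0], btype="bandpass", fs=fs, output="sos")

# Synthetic signal: a 1 Hz component inside the pass band plus a 20 Hz component outside it.
t = np.arange(0.0, 120.0, 1.0 / fs)
slow = np.sin(2 * np.pi * 1.0 * t)
fast = np.sin(2 * np.pi * 20.0 * t)

filtered = sosfiltfilt(sos, slow + fast)

# Away from the edges, the output should closely follow the 1 Hz component.
middle = slice(len(t) // 4, 3 * len(t) // 4)
worst_error = np.max(np.abs(filtered[middle] - slow[middle]))
```
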

In (Kilicarslan et al., 2016, 2013), a feature matrix was extracted from the raw signal

by using Local Fisher’s Discriminant Analysis, which reduces the dimensionality of the

data while retaining the multi-modal nature. Then, a Gaussian Mixture Model was

employed for classification, which is a probabilistic model that combines multiple

Gaussian distributions and predicts where a particular observation lies. Both studies

achieved validation accuracies of over 95% based on testing the classifier on a randomly

sub-sampled set of subject EEG data. The focus of (Zhang et al., 2017) was two-fold:

accurate classification on previously untouched data as well as a region of importance

analysis. For classification, the authors opted for a type of Support Vector Machine

(SVM) called Multi-Kernel Learning support vector machine (MKL). SVMs are

supervised machine learning classifiers that are especially suited for linear classification

problems, excelling in binary classification. MKLs use the ‘kernel trick’, which allows

SVMs to handle higher dimensional non-linear classification by using the computation

of inner products through kernel functions to replace the computationally expensive need of

directly modeling higher dimensional space (Moore & Ezra, 2002). Accuracy trends for

this study were reported in a per-session basis and found the subject’s accuracy increased

linearly as the number of included sessions increased and were contained within a range

of approximately 83% to 92% accuracy. In addition to classification, (Zhang et al., 2017)

performed a region of importance criticality analysis. The surface of the scalp was

divided into 13 regions and the corresponding electrodes were formed into groups (Figure

4). The designated region names are displayed in Table 2.
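
The kernel trick mentioned above can be illustrated with a small worked example. The sketch below is not taken from (Zhang et al., 2017); it assumes a degree-2 polynomial kernel on two-dimensional inputs, and the names `phi`, `dot`, and `poly_kernel` are hypothetical. It shows that evaluating the kernel on the original inputs gives the same value as an inner product in the explicitly mapped feature space.

```python
import math


def phi(x):
    """Explicit degree-2 feature map for a 2-D input (illustrative assumption)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)


def dot(a, b):
    """Plain inner product of two equal-length sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))


def poly_kernel(x, y):
    """Homogeneous polynomial kernel of degree 2: k(x, y) = (x . y)^2."""
    return dot(x, y) ** 2


x, y = (1.0, 2.0), (3.0, -1.0)
explicit = dot(phi(x), phi(y))  # inner product computed in the mapped 3-D space
implicit = poly_kernel(x, y)    # same quantity computed from the 2-D inputs only
```

Both routes give the same number, but the kernel evaluation never constructs the higher dimensional vectors, which is the saving the kernel trick provides.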

Figure 4: 13 Regions of importance - the scalp divided into 13 regions in order to assess the individual importance of each region towards the classification of the EEG signal.

Table 2: Index and names for the thirteen regions of importance as formulated by (Zhang et al., 2017)

Index  ROI Name                 Index  ROI Name
1      Anterior Frontal         8      Left Parieto-Occipital
2      Left Fronto-Central      9      Middle Parieto-Occipital
3      Midline Fronto-Central   10     Right Parieto-Occipital
4      Right Fronto-Central     11     Left Temporal
5      Left Centro-Parietal     12     Right Temporal
6      Midline Centro-Parietal  13     Occipital
7      Right Centro-Parietal

In the MKL approach, each region was treated as an individual signal

source. After the model was trained, the weights corresponding to each region were

compared. Through this analysis, three regions, Midline Fronto-Central (3), Right

Fronto-Central (4), and Left Centro-Parietal (5), were found to have the greatest

impact on classification performance. It is important to re-emphasize that this

region of importance analysis was based on a limited EEG frequency range, so the

same regional importance trends may not hold when analyzing the entire frequency

range.

All three papers classified this dataset on a within-subject basis, meaning that the

classifier for a particular subject was trained exclusively on data from that subject.

No cross-subject or transfer learning analysis has yet been applied to this dataset.

1.5 Transfer Learning

As previously mentioned, EEG signals suffer from a low SNR. The first of two

challenges this creates is the difficulty of accurately classifying the data, which was

discussed in the preceding two sections. The second major challenge that a low

SNR creates for EEG-based BCI applications, and in particular exoskeleton BCIs, is the

necessity of long training periods for disabled subjects. Transfer learning is the process of

leveraging past subject data to predict a current subject’s intentions. This has been

proposed as a way to reduce the necessary training time for other tasks such as predicting

cognitive performance while driving (Hajinoroozi et al., 2015), drowsiness classification

and detection (Wei et al., 2016), and the classification of single-trial event related

potentials (Wu, Lance, & Lawhern, 2014).

(Hajinoroozi, Mao, Jung, Lin, & Huang, 2016) compared transfer learning

classification performances between different machine learning classifiers and found that

variations of deep convolutional neural networks outperformed other methods, such as

support vector machines and deep belief networks. (Wei et al., 2016; Wu et al., 2014)

both compared the performances of single-subject models versus transfer learning models

and found that transfer learning models reliably outperformed single-subject models

given the same number of primary subject sessions. To date, no structured study has

investigated transfer learning paradigms with a motor imagery dataset, kinesthetic or

otherwise.

1.6 Specific Aims and Contributions

The focus of this thesis is to investigate deep learning architecture optimization for

accurate classification and transfer learning strategies in order to reduce the amount of

time needed from disabled subjects. This will be accomplished by focusing on the

following two specific aims.

Specific Aim 1: Develop offline single-subject neural classifiers by optimizing key

DCNN architecture parameters. Due to the lack of research on DCNN applications to

kinesthetic motor imagery tasks, little is known about the parameters necessary

for reliable classification. To optimize the DCNNs for kinesthetic motor imagery tasks,

specific parameters of DCNNs, namely the number of convolutional and fully-connected

classifier layers, will be varied and the classification performance of the different

models will be compared. The proposed approach will help optimize the framework for

future DCNN research on this type of task as it will advance understanding of the DCNN

parameters necessary for high classification performance. As the goal of Specific Aim 1

is to develop single-subject classifiers, each DCNN will be trained solely on a single

subject’s data, whereas Specific Aim 2 will focus on using the combined data

from all subjects. This analysis will conclude with a set of well-performing subject-

specific DCNN parameters for this kinesthetic motor imagery EEG dataset.

Specific Aim 2: Develop cross-subject neural classifiers based on region criticality

analysis and transfer learning. Current approaches require extensive training time for

subjects who have limited motor functionality, so any classification paradigm that helps

to reduce the necessary training time is preferred. A proposed solution to long calibration

time is to design cross-subject classifiers, also called transfer learning, which can predict

a new subject’s intent using data exclusively from previous subjects. Since no cross-

subject classification has yet been performed on this data, Specific Aim 2 has two parts.

First, classifiers will be trained using individual regions for each subject and the

accuracies will be compared. The findings from part one will serve as a guide in creating

subsets of regions, which will be used in the transfer learning paradigm. The hope here is

that, by defining and comparing region criticality between subjects, more reliable

cross-subject DCNNs can be trained with fewer regions. Additionally, critical region

comparisons between subjects will help in understanding the common neural correlates

produced by kinesthetic motor imagery. Specific Aim 2 will conclude with a description

of the critical regions specific to this type of task and an assessment of the ability of DCNNs to

reliably perform transfer learning for kinesthetic motor imagery task classification.

1.7 Thesis Organization

The rest of this thesis is organized as follows. Chapter 2 describes the methods

used during this study. This includes the original data acquisition information, the

pre-processing decisions and reasoning, and the methods for the DCNN optimization and

transfer learning analyses. Chapter 3 describes the results found through this process,

including the architecture optimization, region criticality trends, and transfer learning

results. Chapter 4 presents a discussion motivated by the results, whereas Chapter 5

details this author’s conclusions for this thesis and suggestions for future research.

Chapter 2: Methods

2.1 Data Acquisition and Experimental Design

The original dataset was collected through multiple studies (Kilicarslan et al., 2016,

2013; Zhang et al., 2017) at the University of Houston. The Institutional Review Board

(IRB) at the University of Houston approved the original research protocols. Three able-

bodied subjects (two males, aged 28 and 30, and a female, aged 21) gave their informed

written consent. The subjects were fitted with a wearable powered exoskeleton (REX,

REX Bionics Ltd., New Zealand) and asked to perform a task.

During this task, subjects executed walking and stop motions (‘GO/NO GO’) based

on audible beep instructions. Each trial contained at least 10 stop-to-walk or walk-to-stop

transitions. The length of each ‘GO’ or ‘NO GO’ stage was varied between 10 and 20

seconds in order to limit the subject’s anticipatory response.

Specific instructions on the kinesthetic motor imagery process were given before each

session. This included the emphasis that subjects should focus on imagining the muscle

groups that would be activated if physically executing the intended imagined motor

action. The subjects were trained over multiple sessions over a 30-day period in this

kinesthetic motor imagery task (10, 12, and 9 sessions for Subjects 1, 2, and 3

respectively).

EEG signal collection was accomplished with a 64-channel active-electrode EEG

system, which was composed of two 32-channel amplifiers (actiCap system, Brain

Products GmbH, Germany). 60 channels were reserved for EEG collection while four

were relocated to positions around the eyes to collect EOG signals. These EOG signals

were then used to help identify and remove eye blink artifacts in the pre-processing stage.

The electrodes were placed and labeled in accordance with the extended 10–20

international system. A wireless interface (MOVE system, Brain Products GmbH,

Germany) sampled the data at 100 Hz and sent this data to the host PC. The original

authors minimized motion artifacts by using EEG collection best practices, which are

further detailed in (Nathan & Contreras-Vidal, 2016). This practice includes careful EEG

cap set-up, the sufficient and controlled application of conductive gel, the use of medical-

grade mesh to fixate individual electrode wirings, and the deployment of a wireless

active-electrode EEG system, which amplifies the signal directly at the electrode

location, thereby increasing the signal to noise ratio.

2.2 EEG Signal Pre-Processing

Eye blinks and eye movements are among the primary sources of EEG signal

contamination. Eye blinks can be selected and removed manually, but this is highly labor

intensive. The removal of eye-blink contaminated sections of EEG data may also remove

information important for classification (Fatourechi, Bashashati, Ward, & Birch, 2007).

A method proposed by (Kilicarslan et al., 2016) and used within this thesis adapts a

control system denoising scheme, H-Infinity, to remove eye-blink artifacts. A comparison

of a single EEG channel segment with and without the H-Infinity denoising scheme is

displayed in Figure 5.

Figure 5: Removal of EOG artifacts - Signal comparison of a single channel with and without the H-Infinity denoising scheme, showing the removal of a single eye-blink EOG artifact (highlighted)

In (Kilicarslan et al., 2016), the authors demonstrated that the H-Infinity denoising

algorithm outperformed two other leading EEG denoising schemes based on

classification performance results and the minimization of kurtosis and skewness. The

two denoising schemes were an Independent Component Analysis denoising strategy and

a principal-component based Artifact Subspace Reconstruction (ASR) scheme. For a

more detailed theoretical and practical explanation of H-Infinity, please see (Kilicarslan

et al., 2016).

After cleaning the data with H-Infinity, a 2nd order Butterworth filter was applied to

the data. All three previous studies opted to limit the 2nd order Butterworth filter ranges to

0.1-2 Hz, which solely includes the lower half of the Delta frequency band. This thesis

investigates classification performance while utilizing a less restrictive filter so that data

from higher frequency bands can be leveraged for classification. Specifically, this includes

the mu band, which has been shown to be active during motor visualization (Lazarou

et al., 2018), the alpha band, associated with the inhibition and relaxation stages of

‘GO-NO GO’ tasks (Kirmizi-Alsan et al., 2006), the beta band, associated with active thinking

(Baker, 2007), and the gamma band. The gamma band includes frequencies above 30 Hz

(Greyson et al., 2013) and is associated with neural synchronization activities, but this

band is often contaminated at frequencies above 50 Hz because it coincides with the

dominant frequency range for EMG signals (Luca, 2002) and with 60 Hz power line noise

(Xue, Li, Li, & Wan, 2006).

To determine whether the inclusion of the gamma frequency band would negatively

affect classification performance, two filter frequency ranges were used during the

analysis for Specific Aim 1: 0.1-30 Hz and 0.1-50 Hz. In the first 360 simulations, 80%

of the models performed better with the inclusion of the lower gamma band (0.1-50 Hz

filter range), so only this larger filter range was used for the

remainder of the simulations within this thesis.

This cleaned EEG signal was then segmented into one-second intervals. Each

interval was composed of 100 time points by 60 channels. This is not an ideal input

formulation, as stacking the channels in this way creates an artificial spatial

relationship between the electrodes. It was used for the same reason that

hand-crafted features were not used: this input formulation requires significantly

less expert prior knowledge.
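
The segmentation step described above can be sketched in a few lines. This is a minimal illustration rather than the code used in this thesis: the function name `segment_windows` is hypothetical, the windows are assumed to be non-overlapping, and any trailing samples that do not fill a complete window are assumed to be dropped.

```python
def segment_windows(signal, fs=100, window_s=1.0):
    """Split a (time x channel) recording into fixed-length windows.

    signal: list of samples, each sample being a list of channel values.
    Assumptions (not stated in the text): windows do not overlap and an
    incomplete trailing window is discarded.
    """
    win = int(fs * window_s)           # samples per window (100 at 100 Hz)
    n_windows = len(signal) // win     # only complete windows are kept
    return [signal[i * win:(i + 1) * win] for i in range(n_windows)]


# Toy recording: 250 samples x 60 channels of zeros (placeholder data).
recording = [[0.0] * 60 for _ in range(250)]
windows = segment_windows(recording)
```

With the toy recording, two complete windows of 100 time points by 60 channels are produced and the final 50 samples are discarded.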

2.3 Neural Network Optimization Method

In order to investigate the optimization of key DCNN parameters, which is the

primary goal of Specific Aim 1, several DCNN architectures were designed. The first

parameter investigated through this process was the number of convolutional layers. Five

DCNNs were designed by adapting architectures described and tested in (Lawhern,

Solon, Waytowich, Gordon, & May, 2018; Schirrmeister et al., 2017). This specific

architecture was chosen based on Schirrmeister’s stated focus on generic parameter

selection and reproducible results. The adapted DCNN architecture for the analysis of

five convolutional layers is shown in Figure 6 and was implemented in Python using

TensorFlow and Keras.

Figure 6: Example deep convolutional neural network architecture - A deep convolutional neural network with five convolutional layers, adapted from architectures described in (Lawhern et al., 2018; Schirrmeister et al., 2017).

In this architecture, the structure is composed of convolutional and classifier blocks.

The first convolutional block includes a set of temporal and spatial convolutional layers

with 25 linear and exponential linear units, respectively, followed by a maximum pooling

layer. The pool size used here and in all models is 1x2. Each convolutional block that

follows contains a convolutional layer with a varied number of exponential linear units

followed by a maximum pooling layer. The convolutional filter size used in this thesis is

1x3, which differs from the implementations in (Lawhern et al., 2018; Schirrmeister

et al., 2017), which used convolutional filters of sizes 1x5 and 1x10 respectively. The

reason for this difference is the signal sampling rate. In Schirrmeister, the sampling rate was 250

Hz while the sampling rate with Lawhern was 128 Hz. Lawhern, who had also adapted

Schirrmeister’s original algorithm, found that a reduced stride value was necessary when

using a sampling rate slower than that used in Schirrmeister’s architecture. This thesis

analyzes data sampled at 100 Hz, so it was necessary to further reduce the convolutional

filter size. The final convolutional block flattens all remaining units and sends these

values into the classifier block. The classifier block in the DCNN architecture shown in

Figure 6 has a single fully-connected layer of two neurons with a softmax activation for

classification.
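
One way to see how the filter size interacts with a 100-sample input is to track the temporal length through successive convolution and pooling blocks. The sketch below is illustrative arithmetic only and is not the implementation used in this thesis. It assumes stride-1 convolutions without padding and non-overlapping 1x2 pooling, neither of which is stated in the text, and it treats each block as one temporal convolution (the spatial convolution in the first block does not change the temporal length). The function name `temporal_lengths` is hypothetical.

```python
def temporal_lengths(n_samples, kernel, pool=2, n_blocks=5):
    """Temporal length after each conv + max-pool block.

    Assumes stride-1 convolutions with no padding and non-overlapping
    pooling. Stops early when a block can no longer be applied, so the
    returned list may be shorter than n_blocks.
    """
    lengths = []
    n = n_samples
    for _ in range(n_blocks):
        n = n - kernel + 1      # unpadded convolution shortens the signal
        if n < pool:            # too few samples left to pool
            break
        n = n // pool           # 1x2 pooling halves the length
        lengths.append(n)
    return lengths


by_kernel = {k: temporal_lengths(100, k) for k in (3, 5, 10)}
```

Under these assumptions, a one-second window at 100 Hz supports all five blocks with a 1x3 filter, four blocks with a 1x5 filter, and three blocks with a 1x10 filter. Padded convolutions would change these numbers.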

Each model was compiled using categorical cross-entropy as the loss function and

Adam as the optimizer, which were selected based on the recommendations outlined in

the two original studies. Parameters not being optimized in this study were held constant

between the individual models so that, for example, for the four convolutional block

DCNN, the final convolutional block was removed from the five block DCNN and the

classifier block was connected to the remaining structure. This was iterated for the other

three DCNN architectures.

During the optimization stage of this thesis, an accuracy comparison was performed

based on results from feeding the input data from each subject into the individual neural

networks. The input data was randomized and separated into three sections: 10% of the

data was set aside for testing, 25% was used for validation, and 65% was used as training

data. The neural network would train on the training data, but only save models when

the validation error improved (a hold-out validation technique). This validation

technique was combined with a dropout rate of 0.5 in order to prevent overfitting. The

trained model was then tested on the remaining 10% of the data. This final testing

accuracy was collected for comparisons between models.
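
The split and the save-on-improvement rule described above can be sketched without any deep learning library. This is a hedged illustration, not the thesis code: `split_indices` and `best_checkpoint` are hypothetical names, the fixed random seed is an assumption made for reproducibility, and the validation errors in the usage example are made-up numbers.

```python
import random


def split_indices(n, test_frac=0.10, val_frac=0.25, seed=0):
    """Shuffle sample indices and split them into train/validation/test."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)       # randomize the sample order
    n_test = round(n * test_frac)
    n_val = round(n * val_frac)
    test = idx[:n_test]
    val = idx[n_test:n_test + n_val]
    train = idx[n_test + n_val:]           # the remaining ~65%
    return train, val, test


def best_checkpoint(val_errors):
    """Epoch whose weights are kept when saving only on validation improvement."""
    best_epoch, best_err = None, float("inf")
    for epoch, err in enumerate(val_errors):
        if err < best_err:                 # save only when validation error drops
            best_epoch, best_err = epoch, err
    return best_epoch


train, val, test = split_indices(1000)
kept_epoch = best_checkpoint([0.90, 0.70, 0.75, 0.60, 0.65])  # made-up errors
```

In Keras, the same rule is commonly expressed with a checkpoint callback that monitors the validation loss and saves only the best weights; whether this thesis used that callback is not stated.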

After the convolutional layer analysis, the best performing number of convolutional

layers was used as the base structure to assess changes to the classifier block. In this

part of the analysis, the number and size of the fully-connected layers were varied.

Further details on model parameter optimization can be found in Section 3.1.

2.4 Transfer Learning Method

The transfer learning paradigm described in this thesis centers around attempting to

leverage EEG data and region criticality trends from two subjects (external subjects) in

order to predict the intentions of a third subject (primary subject). To accomplish this,

three types of transfer learning specific datasets were composed for each subject: ‘All

Regions’, ‘Good Regions Only’, and ‘Worst Regions Removed’. First, the scalp was

divided into 13 regions as outlined in (Zhang et al., 2017) and shown in Figure 7.

Figure 7: Regions of importance implementation - The scalp divided into 13 regions of importance for the region criticality assessment and transfer learning paradigm.

Each region was individually processed using the best performing architecture found

in the single-subject DCNN analysis. Then, the accuracy results were normalized and the

region criticality score was compared between subjects. For a primary subject, the

consistently well-performing regions between the two external subjects were found and

used to create the two region-restrictive datasets. The ‘Good Regions Only’ dataset

included channels from regions that performed above average for both of the external

subjects. The ‘Worst Regions Removed’ dataset was composed by removing all channels

contained within regions that were below average for both external subjects. The third

dataset, ‘All Regions’, did not remove regions and serves as a baseline for comparison

against the restricted region datasets. These three datasets were used in the transfer

learning analysis stage. Figure 8 shows the region of importance analysis in the test case

where Subject 1 is the primary subject. For Subject 1, the ‘Good Regions Only’ dataset

only includes regions in the green boxes, while the ‘Worst Regions Removed’ dataset

removes all regions in the red boxes. In this case, the shared good regions between

Subjects 2 and 3 were the Left Centro-Parietal, Midline Centro-Parietal, Right Centro-

Parietal, and Middle Parieto-Occipital, whereas the consistently underperforming regions

included the Anterior Frontal, Right Fronto-Central, and Right Temporal regions. Further

region criticality analysis results can be found in Section 3.2.1.
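
The selection rule for the two region-restricted datasets can be sketched as follows. This is an illustration under stated assumptions rather than the thesis implementation: the function name `region_subsets` is hypothetical, the accuracies in the usage example are invented placeholders for four of the thirteen regions, and each subject's regions are compared against that subject's own mean accuracy. Because only the sign relative to the mean matters, mean-centering or z-scoring the accuracies would select the same regions.

```python
def region_subsets(ext_a, ext_b):
    """Return (good, worst) region lists from two external subjects' accuracies.

    ext_a, ext_b: dicts mapping region name -> single-region accuracy.
    good:  regions above that subject's mean for BOTH external subjects.
    worst: regions below that subject's mean for BOTH external subjects.
    """
    def split_by_mean(scores):
        mean = sum(scores.values()) / len(scores)
        above = {r for r, s in scores.items() if s > mean}
        below = {r for r, s in scores.items() if s < mean}
        return above, below

    above_a, below_a = split_by_mean(ext_a)
    above_b, below_b = split_by_mean(ext_b)
    return sorted(above_a & above_b), sorted(below_a & below_b)


# Placeholder accuracies for illustration only (not results from this thesis).
subject_a = {"Anterior Frontal": 0.60, "Left Centro-Parietal": 0.80,
             "Right Temporal": 0.62, "Occipital": 0.74}
subject_b = {"Anterior Frontal": 0.58, "Left Centro-Parietal": 0.78,
             "Right Temporal": 0.72, "Occipital": 0.64}
good, worst = region_subsets(subject_a, subject_b)
```

The ‘Good Regions Only’ dataset would keep only channels in `good`, while the ‘Worst Regions Removed’ dataset would drop channels in `worst` and keep everything else, including regions where the two external subjects disagree.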

Figure 8: Region criticality analysis of the external subjects 2 and 3 while subject 1 is the intended primary subject. The green boxes highlight regions that consistently performed above average for both subjects 2 and 3, while the red boxes indicate regions that were consistently underperforming for the two subjects.

Following the region criticality and dataset preparation stage, a transfer learning

analysis was performed in three cases, each case having a different subject acting as the

primary subject. In this stage, an initial simulation was run using data exclusively from

the external subjects to predict the intentions of the primary subject across all sessions as

a baseline. Then, simulations were run in the following manner. All of the data from the

external subjects was included in the training dataset for the neural network while an

increasing number of sessions from the primary subject were included in both the training

and validation datasets. In other words, model weights were saved only

when the validation error on the primary subject decreased, rather than whenever

the error on the training set improved, with the idea that the neural network would

identify general patterns in the external subjects while applying these patterns to the

primary subject. Each model was then tested by classifying the remaining sessions of the

primary subject. This iterative process was repeated until five full sessions had been

included from the primary subject.
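
The iterative protocol above can be summarized as a sequence of session splits. The sketch below is a hypothetical illustration: the name `transfer_splits` is not from this thesis, and it assumes that primary-subject sessions are added in chronological order, which the text does not state explicitly.

```python
def transfer_splits(primary_sessions, max_included=5):
    """Yield (k, included, held_out) for k = 0 .. max_included.

    included: primary-subject sessions added to training and validation
              (all external-subject data is always in the training set).
    held_out: remaining primary-subject sessions used for testing.
    k = 0 is the baseline that uses external-subject data only.
    """
    for k in range(max_included + 1):
        yield k, primary_sessions[:k], primary_sessions[k:]


# Subject 1 completed 10 sessions, numbered here 1 through 10.
splits = list(transfer_splits(list(range(1, 11))))
```

For a subject with 10 sessions this gives six cases, from the baseline with all 10 sessions held out to the final case with five sessions included and five held out.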

The main goal through this analysis is to demonstrate how past data can be leveraged

to limit the number of sessions needed from new subjects, so the maximum number of

sessions included from the primary subject was roughly half of the total number of sessions.

Beyond this point it would be advisable to change the training and testing paradigm since, with five

sessions from the primary subject, it becomes possible to revise the region subsets

by using trends found between all three subjects. Dataset accuracies, by session and

overall, were then compared as a function of the increasing number of included sessions

from the primary subject. These results can be found in Section 3.2.2.

Chapter 3: Results

3.1 Single-subject architecture optimization

To accomplish the single-subject neural network architecture optimization, EEG

data from the kinesthetic motor imagery task was first cleaned of EOG artifacts using the

H-Infinity algorithm, as described in the Methods section. This cleaned data was filtered

using a second order Butterworth filter. In the first 300 simulations, the initial set of

classifiers performed better with a filter range of 0.1-50 Hz, rather than the filter range of

0.1-30 Hz, for 80% of simulations. For this reason, this section will present findings

solely using data filtered with the 0.1-50 Hz 2nd order Butterworth filter.

To investigate the optimum number of convolutional blocks for this dataset, five

different DCNNs were designed, as described in the Methods section. The EEG data

from each subject was individually passed through each different model and the

accuracies were compared. These results are shown in Figure 9.

Figure 9: Accuracy comparisons as a function of the number of convolutional layers

While accuracies for Subjects 1 and 2 did not vary dramatically with an increase in

the number of convolutional layers, all three subjects shared a similar trend in that

accuracy peaked with four convolutional layers. For Subjects 2 and 3, accuracy then

decreased with the addition of a fifth convolutional layer, whereas, for Subject 1,

accuracy leveled off with the inclusion of a fifth layer. Therefore, four was chosen as the

optimum number of convolutional layers for further optimization and transfer learning

efforts.

Using the four convolutional layer design as a base design, analysis continued with

an investigation into the number and general shape of the classifier block. First, each

subject dataset was individually fed into three different neural network architectures,

which had a varied number of fully-connected classifier layers. Specifically,

single-layer, double-layer, and triple-layer classifier blocks were compared. These results are

shown in Figure 10.

Figure 10: Accuracy comparisons as a function of the number of fully-connected classifier layers

The accuracy changes were, again, not dramatic, but the general trends between each

subject were similar in that accuracy increased as the number of fully-connected classifier

layers increased. This result was used for the final optimization stage – identifying the

general shape and size of the classifier layers. Three different classifier blocks were

designed: thin, medium, and large variations of the three-fully-connected-layer classifier

block, which is shown in Figure 11. Accuracy comparisons were made by subject and

are shown in Figure 12. The classification performance of all three subjects decreased

as the shape of the classifier block moved from thin to large.

Figure 11: Differences in the shape of the classifier block

Figure 12: Accuracy comparisons as a function of the classifier block shape

These three results guided the development of an optimized deep convolutional

architecture, containing four convolutional layers and three fully-connected classifier

layers, which are composed of eight, four, and two fully connected nodes respectively.

The final complete architecture is presented in Figure 13.

Figure 13: The optimized deep convolutional neural network architecture

3.2 Transfer Learning

3.2.1 Region Criticality Analysis Results

In order to investigate this thesis’ transfer learning paradigm, three datasets were

created for each subject: ‘All Regions’, ‘Good Regions Only’, and ‘Worst Regions

Removed’. A region criticality analysis was performed in order to create the two

region-restricted datasets. The scalp was divided into 13 regions as described in the Methods

section and the resulting accuracies were normalized for each subject individually. For a

designated primary subject, for example Subject 1, the two region-restricted datasets were

formed by analyzing the region criticality results from the designated external subjects, in

this case Subjects 2 and 3. The ‘Good Regions Only’ dataset only included regions that

performed better than average for both of the external subjects. The ‘Worst Regions

Removed’ dataset removed regions that were below average for both of the external

subjects. The results for each subject are presented in Figure 14, Figure 15, and Figure

16.

Figure 14: Region criticality analysis, single-subject model (A) and transfer learning model (B) and corresponding scalp maps for Subject 1

Figure 15: Region criticality analysis, single-subject model (A) and transfer learning model (B) and corresponding scalp maps for Subject 2

Figure 16: Region criticality analysis, single-subject model (A) and transfer learning model (B) and corresponding scalp maps for Subject 3

Figure 14 presents the region criticality analysis for Subject 1. Figure 14-A shows

the single-subject region criticality analysis results and corresponding scalp map, while

Figure 14-B presents the transfer learning region criticality analysis and corresponding

scalp map when using Subjects 2 and 3 as the external subjects. The regions in all scalp

maps are color coded based on the region importance score, with red indicating a region

that performed exclusively below average, green indicating a region that performed

exclusively above average, and yellow indicating a region that was above average for one

external subject and below average for the second external subject.

Figure 15 and Figure 16 present these results for Subject 2 and Subject 3

respectively. Figure 17 presents the region criticality analysis based on trends from all

three subjects combined. These particular results are not used in the transfer learning

paradigm since the hypothetical transfer learning model would not be able to use trends

from primary subject data that would not yet exist. However, these results help to

generalize trends for kinesthetic motor imagery tasks and will be explored in the

discussion section.

Figure 17: Region criticality analysis and scalp map for all three subjects combined

3.2.2 Transfer Learning Results

Once the region criticality analysis and transfer learning dataset formation were

completed, each subject was classified according to the transfer learning paradigm

described in the Methods section. First, each primary subject was classified using data

exclusively from the external subjects; in other words, no data from the primary

subject was used when training the model. This served as a baseline for

comparisons against the accuracy results when an increasing number of sessions from the

primary subject were included for training. The comparison between the baseline

predictions and the cases with an increasing number of sessions is shown in Figure 18,

Figure 19, and Figure 20 for Subjects 1, 2, and 3

respectively. Each figure includes a session-by-session comparison between the four different

datasets (A) and the corresponding average accuracies over all remaining sessions based

on the varied number of included sessions (B).

Figure 18: The session-by-session (A) and average accuracies (B) found from different dataset formations for Subject 1

Figure 19: The session-by-session (A) and average accuracies (B) found from different dataset formations for Subject 2

Figure 20: The session-by-session (A) and average accuracies (B) found from different dataset formations for Subject 3

Table 3: Transfer Learning Model Results - The average benefit of the transfer learning models over the single-subject models, the number of sessions required by the transfer learning model to outperform the highest accuracy achieved by the single-subject model, and the number of test cases where the transfer learning model failed to outperform the single-subject model.

Model                     Subject 1    Subject 2    Subject 3

Average Benefit of Transfer Learning Models
Good Regions Only         14.80%       5.36%        4.08%
Worst Regions Removed     13.62%       7.32%        3.80%
All Regions               13.14%       6.93%        1.13%

Number of Sessions Required to Outperform the Single-Subject Model
Good Regions Only         1            3            2
Worst Regions Removed     2            3            3
All Regions               2            2            5

Transfer Learning Dataset Failed to Outperform the Single-Subject Model
Good Regions Only         0            2            0
Worst Regions Removed     0            0            0
All Regions               0            0            3
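
The three summary measures reported in Table 3 can be illustrated with a short Python sketch. The accuracy values below are hypothetical and are not results from this thesis, and the function names are illustrative. The sketch also assumes that the average benefit is a difference in accuracy averaged over matched test cases, and that a tie counts as a failure to outperform.

```python
from typing import Dict, Optional


def average_benefit(transfer: Dict[int, float], single: Dict[int, float]) -> float:
    """Mean accuracy gain of the transfer model over matched test cases."""
    shared = sorted(set(transfer) & set(single))
    return sum(transfer[k] - single[k] for k in shared) / len(shared)


def sessions_to_outperform(
    transfer: Dict[int, float], single: Dict[int, float]
) -> Optional[int]:
    """Fewest included sessions at which the transfer model beats the best
    single-subject accuracy, or None if it never does."""
    target = max(single.values())
    for k in sorted(transfer):
        if transfer[k] > target:
            return k
    return None


def failed_cases(transfer: Dict[int, float], single: Dict[int, float]) -> int:
    """Number of matched test cases where the transfer model does not win."""
    shared = set(transfer) & set(single)
    return sum(1 for k in shared if transfer[k] <= single[k])


# Hypothetical accuracies keyed by number of included primary sessions.
single_subject = {1: 0.60, 2: 0.65, 3: 0.70}
transfer_model = {1: 0.68, 2: 0.72, 3: 0.69}

print(average_benefit(transfer_model, single_subject))
print(sessions_to_outperform(transfer_model, single_subject))
print(failed_cases(transfer_model, single_subject))
```

Keying both dictionaries by the number of included primary subject sessions keeps the comparison restricted to test cases that exist for both models.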

The results for each subject in Figures 18, 19, and 20, further outlined in Table 3,

show a trend common to all three subjects: including an increasing number of sessions

from the primary subject always improved classification performance for all three

transfer learning datasets when compared against the case where no primary subject

session was included. For Subject 1, it is also evident that the ‘Good Regions Only’

model was able to successfully classify session 10, which was handled poorly by the

three other dataset formulations, demonstrating how effective that region restriction

method was in eliminating poorly performing regions.

The average accuracy comparison presented in Figure 18 and Table 3 provides evidence

that, for this subject, leveraging external subject data was always beneficial towards

classification performance.

Subject 2’s transfer learning results (Figure 19) show the same trend found in

Subject 1’s analysis: adding sessions from the primary subject increased

classification performance on the remaining sessions. In contrast to Subject 1, in all but a

single test case the average accuracy was highest when using the ‘Worst Regions

Removed’ dataset, as presented in Figure 19. Notably, for the cases where four and five

primary subject sessions were included for training, the accuracy for the ‘Good Regions

Only’ dataset fell below that of the single-subject model, meaning that leveraging external

subject data negatively affected classification performance in those cases.

The transfer learning analysis for Subject 3 (Figure 20) shares the general trend

found with Subjects 1 and 2 in that an increasing number of primary subject sessions

improved classification performance. However, moving from zero included sessions to a

single session did not produce the dramatic increase in performance seen in Subjects 1

and 2. For Subject 3, the ‘Good Regions

Only’ and ‘Worst Regions Removed’ datasets always outperformed the single-subject

dataset, but the ‘All Regions’ dataset classifier failed to outperform the single-subject

model for average accuracy in three test cases.

Chapter 4: Discussion

This thesis presented a DCNN optimization process for a two-class kinesthetic motor

imagery ‘GO – NO GO’ task. Through this process, an optimized architecture consisting

of four convolutional layers and three fully-connected layers with eight, four, and two

hidden neurons respectively was found to outperform other variations of a base DCNN

algorithm. The general accuracy trends across the three subjects were similar, but the

main motivation for these parameter selections comes from the analysis of the third

subject, which was the worst performing subject. In the results for this subject, the

changes in accuracy due to changes in architecture parameters were most pronounced,

indicating that these architecture parameter decisions are more influential for datasets

more difficult to classify. Further research is needed in analyzing and optimizing other

DCNN parameters, such as pool size, convolutional filter, and stride length, but these

preliminary optimization results can serve to form a framework for future kinesthetic

imagery task research.

The region of importance criticality analysis accomplished two goals: transfer

learning dataset creation and the understanding of kinesthetic motor imagery task neural

correlates. Figure 17 details the results when using region of importance trends from all

three subjects combined. The results found through this analysis do not match the results

found in (Zhang et al., 2017), where the fronto-central and centro-parietal regions

appeared to be the most important for classification. However, as previously described,

(Zhang et al., 2017) bandpass filtered the datasets with a filter range of 0.1–2 Hz whereas

this thesis analyzed the much larger frequency range of 0.1–50 Hz. This thesis instead

found the left and midline centro-parietal regions and the middle parietal-occipital region

to consistently impact the classification performance positively, while the anterior frontal,

right fronto-central and right temporal regions were found to consistently lead to poor

classification performance. The most dramatic difference between the results found in

this thesis and (Zhang et al., 2017) was the importance rating of the right fronto-central

region, as this thesis found the region to be consistently below average in performance,

whereas (Zhang et al., 2017) ranked this as the second most important region. This

difference points to the possibility that, for this type of task, the fronto-central regions act

as the dominant cortical sources for delta band frequencies, but play a relatively smaller

role in the formation of other frequency bands. Future research could further investigate

this possibility by analyzing different EEG bands individually and performing a region

criticality analysis to assess how classification performance differs with the individually

analyzed frequency band.

The transfer learning paradigm investigated whether past external subject data could

be leveraged in order to improve the classification performance of a primary subject.

External subject data was leveraged by first analyzing the region of importance trends

and using those trends to restrict the number of regions used to classify the primary

subject. Two processes were used to create the region-restricted transfer learning

datasets. The ‘Good Regions Only’ dataset was composed solely of regions that

performed exclusively above average for both external subjects, whereas the ‘Worst

Regions Removed’ dataset removed the regions that performed exclusively below

average for both external subjects. The classification performances for these two region-

restricted datasets were compared to the performance when using the ‘All Regions’

dataset in addition to the single-subject model, which is outlined in Table 3.
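
The two region restriction rules described above can be expressed as simple set operations. The following Python sketch is illustrative only: it assumes a single importance score per region for each external subject, uses hypothetical scores and region labels, and compares each score against that subject's mean across regions. It is not the implementation used in this thesis.

```python
from typing import Dict, List, Set


def above_average(scores: Dict[str, float]) -> Set[str]:
    """Regions scoring above this subject's mean across regions."""
    mean = sum(scores.values()) / len(scores)
    return {region for region, s in scores.items() if s > mean}


def below_average(scores: Dict[str, float]) -> Set[str]:
    """Regions scoring below this subject's mean across regions."""
    mean = sum(scores.values()) / len(scores)
    return {region for region, s in scores.items() if s < mean}


def good_regions_only(external: List[Dict[str, float]]) -> Set[str]:
    """Keep only regions that are above average for every external subject."""
    return set.intersection(*(above_average(s) for s in external))


def worst_regions_removed(external: List[Dict[str, float]]) -> Set[str]:
    """Drop only regions that are below average for every external subject."""
    worst = set.intersection(*(below_average(s) for s in external))
    return set(external[0]) - worst


# Hypothetical region scores for two external subjects.
subject_a = {"anterior_frontal": 0.55, "right_fronto_central": 0.58,
             "left_centro_parietal": 0.72, "mid_parieto_occipital": 0.70}
subject_b = {"anterior_frontal": 0.52, "right_fronto_central": 0.66,
             "left_centro_parietal": 0.71, "mid_parieto_occipital": 0.61}

print(sorted(good_regions_only([subject_a, subject_b])))
print(sorted(worst_regions_removed([subject_a, subject_b])))
```

In this toy example the ‘Good Regions Only’ rule keeps a single region while the ‘Worst Regions Removed’ rule keeps three, which shows why the former is the more restrictive of the two.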

This comparison led to two observations. First, a reduced-region dataset performed

better than or as well as the ‘All Regions’ dataset in all but a single test case

for Subject 2. This indicates that some form of region restriction leads to statistical

improvements when compared against the ‘All Regions’ dataset formulation. Since it was

not always the case that the ‘Worst Regions Removed’ training paradigm outperformed

the ‘Good Regions Only’ dataset or vice versa, further research is needed in order to

optimize the region restriction process. However, the second observation sheds light as to

which direction future research should focus and this is based on the fact that the ‘Worst

Regions Removed’ dataset always performed better than the single-subject model. Both

the ‘Good Regions Only’ and ‘All Regions’ datasets had test cases where the models

failed to outperform the single-subject classification performance, which indicates that,

for those test cases, leveraging the external subjects negatively affected classification

performance for the primary subject. Since the ‘Worst Regions Removed’ dataset always

beat the single-subject dataset, there was no test case where leveraging external subjects

did not increase the classification accuracy of all remaining sessions. Of the two region-restricted

datasets, the ‘Worst Regions Removed’ dataset removes the smaller number of regions, indicating

that, for subjects such as Subject 2, the ‘Good Regions Only’ dataset was too restrictive,

while, for subjects such as Subject 3, the ‘All Regions’ dataset was too inclusive. Future

research is encouraged to use a region restriction paradigm, but, due to the possibility of

over- and under-restriction, is cautioned against relying on a single region restriction

process.

As previously discussed, limiting the number of time- and effort-intensive sessions

required of subjects with motor disabilities is crucial towards making exoskeleton-based

rehabilitation a practical reality. This thesis has presented a transfer learning paradigm

that leverages past subject data to help limit the number of sessions from the primary

subject. For instance, the results in Table 3 show that, for Subject 1, the average

accuracy found when using five sessions with the single-subject model was equal to or

below the performance found when using at most two primary subject sessions with the

two region-restricted transfer learning models (one session for ‘Good Regions Only’ and

two for ‘Worst Regions Removed’). This means that, in this hypothetical transfer

learning test case, a disabled subject would need at most two training sessions to

achieve the same accuracy with the transfer learning models, compared with the five

sessions needed for the single-subject model. For Subject 2, the transfer learning models

exposed to a single session from the primary subject outperformed the single-subject

model trained on three primary subject sessions. The single-subject models for

Subject 3 always performed worse than the two region-restricted models, but not to the

extent seen for Subjects 1 and 2. Regardless, it is clear that past subject data, when

properly restricted, can always improve upon the single-subject classification

performance, and typically allows the primary subject to undergo fewer training sessions.

Future offline analysis research may benefit from introducing a more adaptive training

scheme so that region of importance trends can be updated as each additional session

from the primary subject is included.

Chapter 5: Conclusion

Reliable classification of electroencephalogram (EEG) signals is a crucial step

towards making EEG-controlled non-invasive brain-machine exoskeleton rehabilitation

a practical reality. The classification of EEG signals during motor imagery tasks has been

proposed as a way to isolate a control signal for exoskeleton use. This thesis adapted a

Deep Convolutional Neural Network (DCNN) design to optimize key neural network

parameters for an existing kinesthetic motor imagery EEG dataset. This led to an

optimized architecture consisting of four convolutional layers and three fully connected

layers. This optimized structure was then used to investigate a transfer learning paradigm.

Transfer learning, or the leveraging of data from past subjects to classify the intentions of

a new subject, is important for rehabilitation as it minimizes the number of training

sessions required from disabled subjects who lack full motor functionality. The transfer

learning paradigm investigated through this thesis utilized region criticality trends to

reduce the number of required new subject training sessions and generally increase the

classification performance when compared against the single-subject non-transfer

learning classifier. Future research would benefit from additional focus on optimizing

other key parameters of DCNNs, designing different region restriction strategies, and

creating an adaptive transfer learning paradigm that utilizes the region criticality trends

that evolve with the inclusion of additional new subject training sessions.

References

Acharya, U. R., Oh, S. L., Hagiwara, Y., Tan, J. H., & Adeli, H. (2017). Deep

convolutional neural network for the automated detection and diagnosis of seizure

using EEG signals. Computers in Biology and Medicine, (September), 1–9.

https://doi.org/10.1016/j.compbiomed.2017.09.017

Antoniades, A., Spyrou, L., Martin-Lopez, D., Valentin, A., Alarcon, G., Sanei, S., &

Took, C. C. (2017). Detection of Interictal Discharges with Convolutional Neural

Networks Using Discrete Ordered Multichannel Intracranial EEG. IEEE

Transactions on Neural Systems and Rehabilitation Engineering, 25(12), 2285–2294.

https://doi.org/10.1109/TNSRE.2017.2755770

Bahy, M. M. El, Hosny, M., Mohamed, W. A., & Ibrahim, S. (2017). Proceedings of the

International Conference on Advanced Intelligent Systems and Informatics 2016,

533. https://doi.org/10.1007/978-3-319-48308-5

Baker, S. N. (2007). Oscillatory interactions between sensorimotor cortex and the

periphery. Current Opinion in Neurobiology, 17(6), 649–655.

https://doi.org/10.1016/j.conb.2008.01.007

Bengio, Y., Simard, P., & Frasconi, P. (1994). Learning long-term dependencies with

gradient descent is difficult. IEEE Transactions on Neural Networks, 5(2), 157–166.

Carmena, J. M., Lebedev, M. A., Crist, R. E., O’Doherty, J. E., Santucci, D. M.,

Dimitrov, D. F., Nicolelis, M. A. L. (2003). Learning to control a brain-machine

interface for reaching and grasping by primates. PLoS Biology, 1(2), 193–208.

https://doi.org/10.1371/journal.pbio.0000042

Contreras-Vidal, J. L., Bhagat, N. A., Brantley, J., Cruz-Garza, J. G., He, Y., Manley,

Pons, J. L. (2016). Powered exoskeletons for bipedal locomotion after spinal cord

injury. Journal of Neural Engineering, 13(3).

https://doi.org/10.1088/1741-2560/13/3/031001

Dong, H., Supratak, A., Pan, W., Wu, C., Matthews, P. M., & Guo, Y. (2018). Mixed

Neural Network Approach for Temporal Sleep Stage Classification. IEEE

Transactions on Neural Systems and Rehabilitation Engineering, 26(2), 324–333.

https://doi.org/10.1109/TNSRE.2017.2733220

Duffy, F. H., McAnulty, G. B., & Albert, M. S. (1996). Effects of age upon

interhemispheric EEG coherence in normal adults. Neurobiology of Aging, 17(4),

587–599. https://doi.org/10.1016/0197-4580(96)00007-3

Fatourechi, M., Bashashati, A., Ward, R. K., & Birch, G. E. (2007). EMG and EOG

artifacts in brain computer interface systems: A survey. Clinical Neurophysiology,

118(3), 480–494. https://doi.org/10.1016/j.clinph.2006.10.019

Fery, Y. (2014). Differentiating visual and kinesthetic imagery in mental practice,

(August).

https://doi.org/10.1037/h0087408

Forbes, S. A., Duncan, P. W., & Zimmerman, M. K. (1997). Review criteria for stroke

rehabilitation outcomes. Archives of Physical Medicine and Rehabilitation, 78(10),

1112–1116. https://doi.org/10.1016/S0003-9993(97)90137-4

Graves, A., Mohamed, A., & Hinton, G. (2013). Speech recognition with deep recurrent

neural networks. In Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE

International Conference on (pp. 6645–6649).

Greenspan, H., van Ginneken, B., & Summers, R. M. (2016). Guest Editorial Deep

Learning in Medical Imaging: Overview and Future Promise of an Exciting New

Technique. IEEE Transactions on Medical Imaging, 35(5), 1153–1159.

https://doi.org/10.1109/TMI.2016.2553401

Greyson, B., Kelly, E. F., & Dunseath, W. J. R. (2013). Surge of neurophysiological

activity in the dying brain. Proceedings of the National Academy of Sciences,

110(47), E4405–E4405. https://doi.org/10.1073/pnas.1316937110

Guerra, E., de Lara, J., Malizia, A., & Díaz, P. (2009). Supporting user-oriented analysis

for multi-view domain-specific visual languages. Information and Software

Technology, 51(4), 769–784. https://doi.org/10.1016/j.infsof.2008.09.005

Hajinoroozi, M., Mao, Z., & Huang, Y. (2015). Prediction of driver’s drowsy and alert

states from EEG signals with deep learning. 2015 IEEE 6th International Workshop

on Computational Advances in Multi-Sensor Adaptive Processing, CAMSAP 2015,

493–496. https://doi.org/10.1109/CAMSAP.2015.7383844

Hajinoroozi, M., Mao, Z., Jung, T. P., Lin, C. T., & Huang, Y. (2016). EEG-based

prediction of driver’s cognitive performance by deep convolutional neural network.

Signal Processing: Image Communication, 47, 549–555.

https://doi.org/10.1016/j.image.2016.05.018

He, Y., Eguren, D., Azorín, J. M., Grossman, R., Luu, T. P., & Contreras-Vidal, J. L.

(Pepe). (2018). Brain–machine interfaces for controlling lower-limb powered

robotic systems. Journal of Neural Engineering.

https://doi.org/10.1088/1741-2552/aaa8c0

Hochberg, L. R., Bacher, D., Jarosiewicz, B., Masse, N. Y., Simeral, J. D., Vogel, J.,

Donoghue, J. P. (2012). Reach and grasp by people with tetraplegia using a neurally

controlled robotic arm. Nature, 485(7398), 372–375.

https://doi.org/10.1038/nature11076

Hosseini, M.-P., Pompili, D., Elisevich, K., & Soltanian-Zadeh, H. (2017). Optimized

Deep Learning for EEG Big Data and Seizure Prediction BCI via Internet of Things.

IEEE Transactions on Big Data, 3(4), 392–404.

https://doi.org/10.1109/TBDATA.2017.2769670

Jarrasica, N., Proietti, T., Crocher, V., Robertson, J., Sahbani, A., Morel, G., &

Roby-Brami, A. (2014). Robotic Exoskeletons: A Perspective for the Rehabilitation of

Arm Coordination in Stroke Patients. Frontiers in Human Neuroscience,

8(December), 1–13. https://doi.org/10.3389/fnhum.2014.00947

Jirayucharoensak, S., Pan-Ngum, S., & Israsena, P. (2014). EEG-Based Emotion

Recognition Using Deep Learning Network with Principal Component Based Covariate

Shift Adaptation, 2014, e627892. https://doi.org/10.1155/2014/627892

Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., & Li, F. F. (2014).

Large-scale video classification with convolutional neural networks. Proceedings of

the IEEE Computer Society Conference on Computer Vision and Pattern

Recognition, 1725–1732. https://doi.org/10.1109/CVPR.2014.223

Kilicarslan, A., Grossman, R. G., & Contreras-Vidal, J. L. (2016). A robust adaptive

denoising framework for real-time artifact removal in scalp EEG measurements.

Journal of Neural Engineering, 13(2).

https://doi.org/10.1088/1741-2560/13/2/026013

Kilicarslan, A., Prasad, S., Grossman, R. G., & Member, J. L. C. S. (2013). High

Accuracy Decoding of User Intentions Using EEG to Control a, 5606–5609.

Kirmizi-Alsan, E., Bayraktaroglu, Z., Gurvit, H., Keskin, Y. H., Emre, M., & Demiralp,

T. (2006). Comparative analysis of event-related potentials during Go/NoGo and

CPT: Decomposition of electrophysiological markers of response inhibition and

sustained attention. Brain Research, 1104(1), 114–128.

https://doi.org/10.1016/j.brainres.2006.03.010

Korshunova, I., Kindermans, P.-J., Degrave, J., Verhoeven, T., Brinkmann, B. H., &

Dambre, J. (2017). Towards improved design and evaluation of epileptic seizure

predictors. IEEE Transactions on Biomedical Engineering, 65(3), 1–1.

https://doi.org/10.1109/TBME.2017.2700086

Kort, N. S., Cuesta, P., Houde, J. F., & Nagarajan, S. S. (2016). Bihemispheric network

dynamics coordinating vocal feedback control. Human Brain Mapping, 37(4), 1474–1485.

https://doi.org/10.1002/hbm.23114

Lawhern, V. J., Solon, A. J., Waytowich, N. R., Gordon, S. M., & May, L. G. (2018).

EEGNet: A Compact Convolutional Neural Network for EEG-based

Brain-Computer Interfaces, 1–30.

Lawrence, S. (1997). Face Recognition: A Convolutional Neural-Network Approach.

IEEE Transactions on Neural Networks, 627(1), 202–206.

https://doi.org/10.1016/j.gene.2017.06.018

Lazarou, I., Nikolopoulos, S., Petrantonakis, P. C., Kompatsiaris, I., & Tsolaki, M.

(2018). EEG-Based Brain–Computer Interfaces for Communication and

Rehabilitation of People with Motor Impairment: A Novel Approach of the 21st

Century. Frontiers in Human Neuroscience, 12(January), 1–18.

https://doi.org/10.3389/fnhum.2018.00014

LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436–444.

https://doi.org/10.1038/nature14539

Lee, B. B., Cripps, R. A., Fitzharris, M., & Wing, P. C. (2014). The global map for

traumatic spinal cord injury epidemiology: Update 2011, global incidence rate.

Spinal Cord, 52(2), 110–116. https://doi.org/10.1038/sc.2012.158

Lee, K., Liu, D., Perroud, L., Chavarriaga, R., & Millán, J. del R. (2017). A brain-

controlled exoskeleton with cascaded event-related desynchronization classifiers.

Robotics and Autonomous Systems, 90, 15–23.

https://doi.org/10.1016/j.robot.2016.10.005

Li, Y., Huang, J., Zhou, H., & Zhong, N. (2017). Human Emotion Recognition with

Electroencephalographic Multidimensional Features by Hybrid Deep Neural

Networks. Applied Sciences, 7(10), 1060. https://doi.org/10.3390/app7101060

Liu, J., Cheng, Y., & Zhang, W. (2015). Deep learning EEG response representation for

brain computer interface. Chinese Control Conference, CCC, 2015–Septe, 3518–3523.

https://doi.org/10.1109/ChiCC.2015.7260182

Lotte, F., Congedo, M., Lécuyer, A., Lamarche, F., & Arnaldi, B. (2007). A review of

classification algorithms for EEG-based brain–computer interfaces. HAL Id:

inria-00134950.

Luca, C. J. De. (2002). Delsys Surface Electromyography: Detection and Recording.

Delsys Incorporated, 10(2), 1–10. https://doi.org/10.5121/ijsea.2013.4501

Medicine, E., Miko, E., & Miko, D. (2013). Exoskeletons in Neurological Diseases –

Current Exoskeletons in Neurological Diseases, (May 2017).

Moore, J. E., & Ezra, J. E. J. (2002). Pattern Recognition.

https://doi.org/10.1016/B978-0-12-815489-2.00016-2

Nathan, K., & Contreras-Vidal, J. L. (2016). Negligible Motion Artifacts in Scalp

Electroencephalography (EEG) During Treadmill Walking. Frontiers in Human

Neuroscience, 9(January), 1–12. https://doi.org/10.3389/fnhum.2015.00708

Noda, T., Sugimoto, N., Furukawa, J., Sato, M. A., Hyon, S. H., & Morimoto, J. (2012).

Brain-controlled exoskeleton robot for BMI rehabilitation. IEEE-RAS International

Conference on Humanoid Robots, 21–27.

https://doi.org/10.1109/HUMANOIDS.2012.6651494

O’Shea, A., Lightbody, G., Boylan, G., & Temko, A. (n.d.). Neonatal Seizure Detection

Using Convolutional Neural Networks. Retrieved from

https://arxiv.org/pdf/1709.05849.pdf

Pfurtscheller, G., & Neuper, C. (2001). Motor Imagery and Direct Brain–Computer

Communication, 89(7), 1123–1134.

Pouratian, N. (2012). On the feasibility of using motor imagery EEG-based

brain–computer interface in chronic tetraplegics for assistive robotic arm control: a clinical

test and long-term post-trial follow-up. Spinal Cord, 50(9), 716.

https://doi.org/10.1038/sc.2012.29

Sakhavi, S., Guan, C., & Yan, S. (2015). Parallel convolutional-linear neural network for

motor imagery classification. 2015 23rd European Signal Processing Conference,

EUSIPCO 2015, 2736–2740. https://doi.org/10.1109/EUSIPCO.2015.7362882

Schirrmeister, R. T., Springenberg, J. T., Fiederer, L. D. J., Glasstetter, M.,

Eggensperger, K., Tangermann, M., … Ball, T. (2017a). Deep learning with

convolutional neural networks for EEG decoding and visualization. Human Brain

Mapping, 38(11), 5391–5420. https://doi.org/10.1002/hbm.23730

Schirrmeister, R. T., Springenberg, J. T., Fiederer, L. D. J., Glasstetter, M.,

Eggensperger, K., Tangermann, M., … Ball, T. (2017b). Deep learning with

convolutional neural networks for EEG decoding and visualization. Human Brain

Mapping, 38(11), 5391–5420. https://doi.org/10.1002/hbm.23730

Soleymani, M., Member, S., & Lee, J. (2012). DEAP: A Database for Emotion Analysis

Using Physiological Signals, 3(1), 18–31. https://doi.org/10.1109/T-AFFC.2011.15

Stinear, C. M., Byblow, W. D., & Swinnen, S. P. (2006). Kinesthetic, but not visual,

motor imagery modulates corticomotor excitability, 157–164.

https://doi.org/10.1007/s00221-005-0078-y

Strausser, K. A., Swift, T. A., & Zoss, A. B. (2018). Prototype medical exoskeleton for

paraplegic mobility: first experimental results, 1–6.

Sutskever, I., Martens, J., & Hinton, G. E. (2011). Generating text with recurrent neural

networks. In Proceedings of the 28th International Conference on Machine

Learning (ICML-11) (pp. 1017–1024).

Tabar, Y. R., & Halici, U. (2017). A novel deep learning approach for classification of

EEG motor imagery signals. Journal of Neural Engineering, 14(1).

https://doi.org/10.1088/1741-2560/14/1/016003

Tang, Z., Li, C., & Sun, S. (2017). Single-trial EEG classification of motor imagery using

deep convolutional neural networks. Optik, 130, 11–18.

https://doi.org/10.1016/j.ijleo.2016.10.117

Teplan, M. (2002). Fundamentals of EEG measurement. Measurement

Science, 2, 1–11.

Tsinalis, O., Matthews, P. M., & Guo, Y. (2016). Automatic Sleep Stage Scoring Using

Time-Frequency Analysis and Stacked Sparse Autoencoders. Annals of Biomedical

Engineering, 44(5), 1587–1597. https://doi.org/10.1007/s10439-015-1444-y

Van Putten, M. J. A. M., Olbrich, S., & Arns, M. (2018). Predicting sex from brain

rhythms with deep learning. Scientific Reports, 8(1), 1–7.

https://doi.org/10.1038/s41598-018-21495-7

Wei, C. S., Lin, Y. P., Wang, Y. Te, Jung, T. P., Bigdely-Shamlo, N., & Lin, C. T.

(2016). Selective Transfer Learning for EEG-Based Drowsiness Detection.

Proceedings - 2015 IEEE International Conference on Systems, Man, and

Cybernetics, SMC 2015, 3229–3232. https://doi.org/10.1109/SMC.2015.560

Wu, D., Lance, B., & Lawhern, V. (2014). Transfer learning and active transfer learning

for reducing calibration data in single-trial classification of visually-evoked

potentials. Conference Proceedings - IEEE International Conference on Systems,

Man and Cybernetics, 2014–January (January), 2801–2807.

https://doi.org/10.1109/smc.2014.6974353

Xiang Li, Dawei Song, Peng Zhang, Guangliang Yu, Yuexian Hou, & Bin Hu. (2016).

Emotion recognition from multi-channel EEG data through Convolutional Recurrent

Neural Network. 2016 IEEE International Conference on Bioinformatics and

Biomedicine (BIBM), 352–359. https://doi.org/10.1109/BIBM.2016.7822545

Xue, Z. X. Z., Li, J. L. J., Li, S. L. S., & Wan, B. W. B. (2006). Using ICA to Remove

Eye Blink and Power Line Artifacts in EEG. First International Conference on

Innovative Computing, Information and Control - Volume I (ICICIC’06), 3, 2–5.

https://doi.org/10.1109/ICICIC.2006.543

Yanagimoto, M., & Sugimoto, C. (2016). Recognition of persisting emotional valence

from EEG using convolutional neural networks. 2016 IEEE 9th International

Workshop on Computational Intelligence and Applications (IWCIA), 27–32.

https://doi.org/10.1109/IWCIA.2016.7805744

Yin, Z., & Zhang, J. (2017). Cross-session classification of mental workload levels using

EEG and an adaptive deep learning model. Biomedical Signal Processing and

Control, 33, 30–47. https://doi.org/10.1016/j.bspc.2016.11.013

Zemke, A. C., Heagerty, P. J., Lee, C., & Cramer, S. C. (2003). Motor Cortex

Organization After Stroke Is Related to Side of Stroke and Level of Recovery.

Stroke, 34(5), e23–e26. https://doi.org/10.1161/01.STR.0000065827.35634.5E

Zhang, Y., Prasad, S., Kilicarslan, A., & Contreras-Vidal, J. L. (2017). Multiple Kernel

Based Region Importance Learning for Neural Classification of Gait States from

EEG Signals, 11(April), 1–11. https://doi.org/10.3389/fnins.2017.00170

Zheng, W. L., & Lu, B. L. (2015). Investigating Critical Frequency Bands and Channels

for EEG-Based Emotion Recognition with Deep Neural Networks. IEEE

Transactions on Autonomous Mental Development, 7(3), 162–175.

https://doi.org/10.1109/TAMD.2015.2431497
