A Novel Combination of Natural Language Processing
and Brain Computer Interface in a Communication
System
by
Maryam Fallah
A thesis submitted in conformity with the requirements for the degree of
Master of Applied Science
Graduate Department of Electrical and Computer Engineering
University of Toronto
© Copyright by Maryam Fallah 2019
Abstract
A brain-computer interface (BCI) is a communication system that enables individuals with
severe physical disabilities to communicate or control external devices through their brain
activities. In this study, we proposed a communication system combining natural language
processing (NLP) and BCI in a question and answer paradigm. Specifically, we combined a
context-aware predictive speller and an answer generation engine that comprehends the
question being asked of the user, to efficiently present potential conversational responses. The
user could either type a response or select from suggested answers. If the user started typing,
the cells containing suggestions were repopulated with context-relevant words matching the
user’s typed characters, thereby reducing typing time.
We evaluated the proposed system over four sessions per subject in terms of accuracy,
bit rate, timing, and user satisfaction. Our data analysis showed that the proposed
paradigm doubled typing speed, increased accuracy and reduced the mental demand of
message composition.
Keywords: Brain-computer interface, electroencephalography, P300, natural language
processing, context-aware, context-independent, answer generation, context-dependent.
Acknowledgments
I would first like to express my deepest gratitude to my supervisor Dr. Tom Chau for his
unceasing support and guidance throughout the past two years of my post-graduate degree. I
will forever be grateful for this opportunity.
Thank you to my committee members Dr. Tilak Dutta, Dr. Elaine Biddiss and my external
examiner Dr. Dimitrios Hatzinakos for their insightful questions and suggestions.
Thank you to all the members of the PRISM lab for your support and friendship. Special thanks
to Pierre Duez for his endless guidance and technical expertise, as well as Ka Lun Tam for his
unconditional help.
Thanks to all those who volunteered for my study, without whom this work would not have
been possible.
Lastly, I would like to thank my family for their unconditional love and support without which
I would not have been where I am today.
Contents

Acknowledgments
List of Tables
List of Figures
List of Acronyms

Introduction
    1.1 Motivation
    1.2 BCI system
    1.3 BCI cycle
    1.4 BCI Types and Control Signals
    1.5 P300 Response
    1.6 BCI speller
        1.6.1 SSVEP Spellers
        1.6.2 MI Spellers
        1.6.3 P300 Spellers
    1.7 Combination of NLP and P300 Spellers
        1.7.1 NLP for word completion
        1.7.2 NLP for language models in classification
        1.7.3 Performance metrics
    1.8 Project Overview
    1.9 Research Questions and Objectives

Methodology
    2.1 Participants
    2.2 Instrumentation
    2.3 Experimental Protocol
        2.3.1 Offline
        2.3.2 Online sessions
    2.4 Data Analysis
        2.4.1 Offline Session
        2.4.2 Online Session
    2.5 Assessment Metrics

A Novel Combination of Natural Language Processing and Brain Computer Interfaces in a
Question and Answer Context
    3.1 Abstract
    3.2 Introduction
    3.3 Methods
        3.3.1 Participants
        3.3.2 Experimental design
        3.3.3 Data collection
        3.3.4 Evaluation metrics
    3.4 Results
        3.4.1 ERP response
        3.4.2 Online performance
        3.4.3 Surveys
    3.5 Discussion
        3.5.1 Limitations and future directions
    3.6 Conclusion

Results
    4.1 Overview
    4.2 Feature Extraction
    4.3 ERP responses
    4.4 Participant-specific offline classification results
    4.5 Participant-specific classification online results
        4.5.1 Constrained Blocks (Blocks 2-4)
        4.5.2 Unconstrained selection blocks
    4.6 Surveys

Discussion
    5.1 Overview
    5.2 Context-based corpus
    5.3 Design parameters
    5.4 Interface modifications
    5.5 Error correction
    5.6 Alternative modalities
    5.7 BCI target population

Conclusion
    6.1 Overview
    6.2 Future work
        6.2.1 Adaptively expanding the corpus
        6.2.2 Expressive communication: Taking turns
        6.2.3 Optimisation
        6.2.4 Customising the interface

Bibliography
Appendices
    Appendix A1
    Appendix A2
List of Tables

Table 1.1: Difference between active and reactive BCI
Table 3.1: Average accuracies, ITR, MI for constrained selection blocks in online sessions
Table 3.2: Selection rates, accuracies and ITR of the CI and CD blocks
Table 3.3: Selection rates, accuracies and MI of the CI and CD blocks
Table 4.1: Offline performance
Table 4.2: Average accuracies, ITR and MI for constrained blocks in online sessions
Table 4.3: Selection rates, accuracies and ITR of the CI and CD blocks
Table 4.4: Selection rates, accuracies and MI of the CI and CD blocks
Table 4.5: Completion time and number of selections of the CI and CD blocks
List of Figures

Figure 1.1: Visualisation of access solutions
Figure 1.2: The BCI cycle
Figure 1.3: GUI of the Bremen speller
Figure 1.4: Hex-O-Speller
Figure 1.5: RC, CB and RB paradigms
Figure 1.6: Chroma Speller
Figure 1.7: Familiar face stimulus
Figure 1.8: The T9 speller
Figure 2.1: Electrode configuration
Figure 2.2: The proposed interface
Figure 2.3: The GUI
Figure 2.4: Timing of events during a trial
Figure 2.5: An iterative selection trial
Figure 2.6: Offline session structure
Figure 2.7: Online session structure
Figure 2.8: CI block
Figure 3.1: Proposed interface
Figure 3.2: Timing of events during a trial
Figure 3.3: ERP responses
Figure 3.4: Topographic map of ERP response
Figure 4.1: LDA score distribution for the spatiotemporal feature set
Figure 4.2: LDA score distribution for the concatenation feature set
Figure 4.5: ERP classification accuracy versus stimulus repetition
Figure 4.6: ERP classification accuracy versus the threshold value
Figure 4.7: Comparing average character accuracy of CI and CD blocks
Figure 4.8: Comparing average ITR using CI and CD predictive spellers
Figure 4.9: Comparing average word accuracy using CD and CI predictive spellers
Figure 4.10: Comparing average MI using CD and CI predictive spellers
Figure 4.11: Comparing average completion time using CD and CI predictive spellers
Figure 4.12: Comparing average number of selections using CD and CI predictive spellers
List of Acronyms

ACC    accuracy
ALS    amyotrophic lateral sclerosis
BCI    brain-computer interface
CB     checkerboard
CD     context dependent
CI     context independent
CP     cerebral palsy
CPM    characters per minute
DSLM   dynamic stopping language model
EEG    electroencephalography
EMCP   eye movement correction procedure
EOG    electrooculogram
ERD    event-related desynchronisation
ERP    event-related potential
ErrP   error-related potential
ERS    event-related synchronisation
FF     familiar face
GIBS   gaze-independent block speller
GFF    green familiar face
GUI    graphical user interface
HMM    hidden Markov model
ISI    inter-stimulus interval
ITR    information transfer rate
LDA    linear discriminant analysis
LSC    lateral single-character speller
MI     motor imagery
MI     mutual information
MMG    mechanomyography
MRI    magnetic resonance imaging
NIRS   near-infrared spectroscopy
NLP    natural language processing
OCM    output characters per minute
OOV    out-of-vocabulary words
PBR    practical bit rate
PF     particle filtering
RB     region based
RC     row/column
SC     single character
SCP    slow cortical potentials
SNR    signal-to-noise ratio
SR     sensorimotor rhythms
SSVEP  steady-state visually evoked potential
SVM    support vector machine
TRCA   task-related component analysis
TVEP   transient visual evoked potential
WSR    word symbol rate
WPM    words per minute
VEP    visual evoked potential
Introduction
1.1 Motivation
Expressive communication entails the transmission of one’s needs and emotions to a
communication partner through body gestures, hand movements, speech or facial expressions.
However, many individuals living with severe disabilities are unable to communicate
through these channels [1].
Some technologies provide an alternative communication pathway for those individuals. For
instance, opening the mouth can be detected with infrared cameras [2], small muscle vibrations
can be measured using mechanomyography (MMG) sensors [3], or tongue protrusion can be
detected by computer vision [4]. Figure 1.1 conceptually depicts the components of such access
solutions. However, these technologies still require some physical movement and are
therefore not suitable for individuals who have severe motor impairments due to cerebral
palsy, degenerative neuromuscular conditions, or acquired brain injuries.
A brain-computer interface (BCI) is a technology which makes communication feasible
through neural activity, eliminating the need for body movement [5].
Figure 1.1: Visualisation of access solutions [1]
There are a number of methods to measure functional brain activities. Electroencephalography
(EEG), magnetic resonance imaging (MRI) and near-infrared spectroscopy (NIRS) are the
most common measurement modalities [6]. EEG signals can reflect electrocortical activity
before, during or after sensory, motor or cognitive events, known as event-related potentials
(ERP) [7].
Different types of brain signals have been used in BCI. Examples include visual evoked
potentials, slow cortical potentials and sensorimotor rhythms [8]. Each has been deployed in
different BCI applications. For instance, the visually evoked P300 potential has been used
as a control signal for spellers since 1988, when it was proposed by Farwell and Donchin [9].
Their P300 speller consisted of a 6 × 6 matrix of characters where each row and column flashed at random.
Users were asked to focus on the character they intended to spell and count the number of times
the row or column containing that character flashed. Each flash of the desired row/column
elicited a P300 brain signal; therefore, by signal detection the intended character could be
identified. Although this research yielded promising results, the BCI was very slow (2
characters/min). Since then, much research has been conducted to improve communication
rates and accuracy. Different interfaces [10]–[12], stimuli [13]–[15] and control signals [5]
have been proposed. The inclusion of language models [7], [16]–[22] has also been suggested.
Despite numerous efforts to enhance the performance of P300 spellers, further improvement
of the information transfer rate (ITR) remains an elusive challenge.
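Communication rate in this literature is typically quantified with Wolpaw's ITR formula, which combines the number of selectable targets, the selection accuracy and the selection speed. The sketch below (function name and the example figures are ours, not taken from this thesis) shows how the original 36-character speller at roughly 2 selections per minute yields fewer than 10 bits/min:

```python
import math

def wolpaw_itr(n_choices: int, accuracy: float, selections_per_min: float) -> float:
    """Information transfer rate in bits/min (Wolpaw formula).

    n_choices: number of selectable targets (e.g. 36 for a 6x6 matrix)
    accuracy: probability of a correct selection, 0 < accuracy <= 1
    selections_per_min: completed selections per minute
    """
    n, p = n_choices, accuracy
    bits = math.log2(n)  # information per selection at perfect accuracy
    if 0.0 < p < 1.0:
        # penalty for errors, assuming errors are uniform over the other targets
        bits += p * math.log2(p) + (1 - p) * math.log2((1 - p) / (n - 1))
    return bits * selections_per_min

# Hypothetical figures: 36 targets, 2 selections/min, 95% accuracy.
print(round(wolpaw_itr(36, 0.95, 2.0), 2))  # -> 9.25
```

Note how strongly accuracy matters: at 50% accuracy the same speller transmits only about 3.2 bits/min, which is why much P300 research targets accuracy as well as raw speed.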
The overall goal of this study was to investigate the potential merit of endowing a BCI with
natural language processing (NLP) capabilities. The most intuitive approach is to accelerate
communication by generating potential answers to a given question through NLP. However,
restricting a user’s response to machine-generated phrases may limit the spontaneity and
variability of natural conversation. An additional feature that provides more conversational
flexibility is to use NLP to generate context-relevant words while typing. The specific aim of
this research was thus to design, implement and evaluate an NLP-BCI communication interface
in terms of communication rate and user satisfaction.
In this chapter we will discuss the basics of BCI, application of NLP in BCI and literature on
BCI spellers.
1.2 BCI system
A BCI system is a communication pathway that does not require any muscular activity but
rather is dependent exclusively on neural activities [23]. As such, a BCI may be a suitable
alternative access pathway for people with severe motor impairments, due to conditions such
as amyotrophic lateral sclerosis (ALS), brain stroke, cervical spinal cord injury, cerebral palsy
(CP), or muscular dystrophies [7].
1.3 BCI cycle
The first step in a BCI cycle is to measure brain signals, for example through EEG or NIRS.
The choice of modality depends on the BCI application and the mental task used for control.
EEG measures the summation of electrical activity at the scalp, caused primarily by synaptic
activity in the upper layers of the cortex [24], whereas NIRS is an optical spectroscopy method
that measures the hemodynamic response during neural activity by irradiating near-infrared
light through the skull [25]. The next step is pre-processing, which is necessary due to the low
signal-to-noise ratio (SNR) of the brain signals. The SNR is low because the signals cross
various skull layers and are contaminated by background noise from inside the brain and
externally over the scalp [26], [27]. This step will maximise the probability of detecting task-
related brain activity. After pre-processing, discriminative features must be extracted, which is
very challenging as there are many irrelevant and confounding brain activities [8]. Feature
engineering is critical for avoiding the curse of dimensionality, that is, for creating a lower-
dimensional feature vector without loss of relevant information [28]. The implementation of the
classification step depends on the application and data; here, one algorithmically categorises
the mental state of the user on the basis of the extracted features. The detected mental state is
subsequently used to control an external device, such as a wheelchair, a speller, etc. The BCI
cycle concludes with the user’s perception of the output. Figure 1.2 provides a schematic
summary of the BCI cycle.
Figure 1.2: The BCI cycle [25]
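The stages of the cycle can be sketched end-to-end in code. The following is a minimal, self-contained illustration on synthetic data (the sampling rate, filter band, bump shape and all other parameters are invented for the example, not taken from this thesis): a simulated single-channel epoch, band-pass filtering as pre-processing, downsampling as a crude feature-extraction step, and a two-class Fisher discriminant, the core of LDA, trained from scratch.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

rng = np.random.default_rng(0)
FS = 250  # Hz, hypothetical sampling rate

def make_epoch(target: bool) -> np.ndarray:
    """Simulate an 800 ms single-channel epoch; targets get a P300-like bump."""
    t = np.arange(0, 0.8, 1 / FS)                 # 200 samples
    x = rng.normal(0.0, 1.0, t.size)              # background "EEG" noise
    if target:
        x += 2.0 * np.exp(-((t - 0.3) ** 2) / (2 * 0.05 ** 2))  # bump near 300 ms
    return x

SOS = butter(4, [1.0, 12.0], btype="band", fs=FS, output="sos")

def preprocess(x: np.ndarray) -> np.ndarray:
    """Band-pass filter to raise the SNR of the slow ERP component."""
    return sosfiltfilt(SOS, x)

def features(x: np.ndarray) -> np.ndarray:
    """Average every 10 samples: a low-dimensional feature vector."""
    return x.reshape(-1, 10).mean(axis=1)         # 200 samples -> 20 features

# "Classification": fit a two-class Fisher discriminant on labelled epochs.
X = np.array([features(preprocess(make_epoch(i % 2 == 0))) for i in range(200)])
y = np.array([i % 2 == 0 for i in range(200)])
mu_t, mu_n = X[y].mean(axis=0), X[~y].mean(axis=0)
cov = np.cov(X.T) + 1e-6 * np.eye(X.shape[1])     # feature covariance, regularised
w = np.linalg.solve(cov, mu_t - mu_n)             # discriminant direction
threshold = w @ (mu_t + mu_n) / 2

def classify(epoch: np.ndarray) -> bool:
    """True if the epoch is judged to contain the target response."""
    return bool(w @ features(preprocess(epoch)) > threshold)

hits = sum(classify(make_epoch(True)) for _ in range(50))
misses = sum(classify(make_epoch(False)) for _ in range(50))
print(f"targets detected: {hits}/50, false positives: {misses}/50")
```

In a real BCI the classifier output would then drive the device (e.g. select a character), closing the cycle with the user's perception of that output.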
1.4 BCI Types and Control Signals
BCIs can be either invasive or non-invasive. Although invasive BCIs offer substantially higher
signal-to-noise ratios and spatial resolution, their clinical translation to date is very limited. We
will therefore focus exclusively on non-invasive BCIs from here on.
Non-invasive EEG BCIs can be divided into two categories based on the brain signals used for
control: active (endogenous) BCIs and reactive (exogenous) BCIs. The former require the user
to actively engage in a cognitive activity; examples include BCIs using Slow Cortical Potentials
(SCP) [29] and Sensorimotor Rhythms (SR) [30]. Exogenous BCIs, on the other hand, rely on
the brain activity associated with the user's natural reaction to an external stimulus [31].
Examples of exogenous BCIs include those based on Visual Evoked Potentials (VEP), the P300
and Steady State Visual Evoked Potentials (SSVEP).
Compared to reactive BCIs, active BCIs give the user more control over the system [26].
However, they require extensive training before the user gains sufficient command of the BCI
[32]. Common examples of active control tasks are mental arithmetic and mental singing, which
activate the prefrontal cortex. These tasks are usually unintuitive, however, since they bear no
relation to the output command [33]. Motor Imagery (MI) is another commonly used active
control signal, which involves imagining the movement of body parts without overt execution
[34]. This cognitive task produces amplitude modulations in the sensorimotor rhythms, known
as Event Related Desynchronisation (ERD) and Event Related Synchronisation (ERS), which
can be used to infer which body part the user imagined moving and to translate that to a specific
output command [35]. During MI training, it is crucial to emphasise kinaesthetic experience
rather than visual imagery of movement; kinaesthetic imagery can be challenging for
individuals who have experienced a stroke or lost control of their limbs [36].
One of the common reactive BCI control signals is the SSVEP. The SSVEP is a type of VEP,
which are fluctuations in visual cortex activity in response to a visual stimulus [37]. Depending
on the frequency of the stimulus, VEPs are separated into two groups: Transient VEPs (TVEP)
for frequencies below 6 Hz and SSVEPs for higher frequencies [8]. VEPs are usually elicited
by LEDs flashing at different frequencies. This stimulus requires the user to visually fixate on
a flashing light source, eliciting a brain response at the same frequency and its harmonics [38].
SSVEP BCIs can be used for spatial navigation [39] and achieve relatively high information
transfer rates (ITR); however, they pose the risk of inducing seizures [40].
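The fixed-frequency structure of the SSVEP makes decoding conceptually simple: compare spectral power at each candidate flicker frequency (and its harmonics) and pick the largest. A minimal sketch on synthetic data follows; the sampling rate, candidate frequencies and amplitudes are invented for illustration, and real systems use more robust detectors such as canonical correlation analysis.

```python
import numpy as np

FS = 250                                # Hz, hypothetical sampling rate
STIM_FREQS = [8.0, 10.0, 12.5, 15.0]    # one flicker frequency per target

def detect_ssvep(signal: np.ndarray) -> float:
    """Return the candidate frequency with the most spectral power,
    summing power at the fundamental and its second harmonic."""
    spectrum = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(signal.size, d=1 / FS)

    def power_at(f: float) -> float:
        return spectrum[np.argmin(np.abs(freqs - f))]   # nearest FFT bin

    return max(STIM_FREQS, key=lambda f: power_at(f) + power_at(2 * f))

# Simulate 2 s of "EEG" while the user fixates the 12.5 Hz target:
# a response at the stimulus frequency, a weaker harmonic, plus noise.
rng = np.random.default_rng(3)
t = np.arange(0, 2, 1 / FS)
eeg = (np.sin(2 * np.pi * 12.5 * t)
       + 0.3 * np.sin(2 * np.pi * 25.0 * t)
       + rng.normal(0.0, 1.0, t.size))
print(detect_ssvep(eeg))  # -> 12.5
```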
Table 1.1 summarises the differences between active and reactive BCIs.
Table 1.1: Difference between active and reactive BCI [28]

Active BCI
  Brain signals: Slow Cortical Potentials (SCPs); sensorimotor rhythms
  Advantages: does not require computer stimuli; can be operated freely at will; can be
    used by individuals with sensory impairments; suitable for cursor control applications
  Drawbacks: time-consuming training; not all users are able to obtain control;
    multichannel EEG recordings required for good performance; lower bit rate
    (20-30 bits/min)

Reactive BCI
  Brain signals: P300; Steady State Visual Evoked Potentials (SSVEP)
  Advantages: minimal training required; control signal set up easily and quickly; high
    bit rate (60 bits/min); only one EEG channel required
  Drawbacks: requires sustained attention to external stimuli; may cause visual and
    mental fatigue
1.5 P300 Response
Another widely used reactive control signal is the P300 response [41], an ERP elicited by an
infrequent target stimulus within a series of frequent stimuli, known as the oddball paradigm
[7]. This ERP manifests as a positive deflection in the EEG signal. P300 potentials can be
separated into two groups, P3a and P3b, which differ in latency and scalp topography. The P3a
originates from the frontal area as a result of attention mechanisms during task processing, with
a latency of 250-280 ms, while the P3b originates from the parietal lobe and is associated with
attention and subsequent memory processing, with a latency of 250-500 ms [42].
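In the matrix-speller form of the oddball paradigm, the rarity that elicits the P300 comes from the flashing scheme itself: each repetition flashes all six rows and six columns once in random order, so only 2 of every 12 flashes contain the attended character. A small sketch of such a stimulus schedule (layout and names are ours):

```python
import random

ROWS, COLS = 6, 6  # the classic 6 x 6 character matrix

def flash_sequence(n_repetitions: int, seed: int = 0):
    """Each repetition flashes all 6 rows and 6 columns once, in random order."""
    rng = random.Random(seed)
    flashes = []
    for _ in range(n_repetitions):
        block = [("row", i) for i in range(ROWS)] + [("col", j) for j in range(COLS)]
        rng.shuffle(block)
        flashes.extend(block)
    return flashes

# Suppose the user attends the character at row 2, column 4: exactly two of
# the twelve flashes in every repetition are "targets" for that character.
target = {("row", 2), ("col", 4)}
seq = flash_sequence(10)
rarity = sum(f in target for f in seq) / len(seq)
print(rarity)  # -> 2/12, i.e. about 0.167
```

Because the target flashes are both rare and unpredictable in order, each one should elicit a P300, which is exactly the property the classifier exploits.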
This phenomenon was first characterised in 1964 by Chapman and Bragdon [43]. Later, in 1965,
Sutton et al. further explored this positive deflection by presenting participants with a series of
stimuli, either a flashing light or a sound [44]. First, a sound or flash stimulus was presented as
a cue; then, after three to five seconds (randomly selected), a test stimulus followed. Some of
the test cases were predetermined, i.e. the participant was certain of the type of test stimulus
(flash or sound). In other test cases, the participant was uncertain about the modality of the test
stimulus and was asked to predict the type of the upcoming test stimulus in the interval between
cueing and testing. The study found that in the uncertain case a peak in the EEG waveform
occurred approximately 200 ms after the stimulus, and that the amplitude of this peak was
modulated by the probability of the stimulus; that is, a less probable stimulus resulted in a larger
peak [44]. As previously mentioned, the P300 response is known to be associated with attention
and memory processes [42]. According to the "context updating theory" [45], the P300 response
is generated by the updating of working memory when the current event differs from the
previous one. The less frequent stimulus is usually referred to as the target stimulus, while the
more expected stimulus is known as the non-target. Some studies have focused on the effect of
target probability on the P300 response, showing that the response is enhanced when the target
is infrequent and therefore less expected by the user [46], [47]. Another factor that can modulate
the P300 response is the order of stimuli, i.e. whether a target occurs immediately after a
previous target; it has also been suggested that P300 responses can occur even when target and
non-target stimuli are equally frequent [45], [48].
Different stimulus modalities can be used to elicit P300 responses, as the early studies have
shown [44]. The stimulus can be visual, where users are shown a series of n items flashing
sequentially in random order and are asked to focus on one specific item [9]. The detected ERP
can then be translated to control an external device, such as a robotic arm [49] or a cursor on
the screen [50]. To eliminate the need for functional vision, alternative stimuli (e.g., tactile [51]
and auditory [52]) have also been considered. However, these alternative stimulus modalities
have elicited weaker ERP responses and achieved lower classification rates compared to visual
stimuli [53]–[55]. A combination of modalities has been suggested as a solution to this problem;
a hybrid auditory-tactile BCI study in [56] demonstrated improvement in transfer rates by
exploiting multiple brain responses from SSVEP and P300 modalities.
In the rest of this chapter, we will focus on previous studies of BCI spellers and delve into
research conducted on P300 spellers as they form the basis of our proposed NLP-BCI system.
1.6 BCI speller
One application of brain computer interfaces is BCI speller. A BCI speller is a communication
device that enables individuals with motor and speech difficulties to communicate through a
graphical user interface (GUI). Through brain signal recordings and analysis, the user selects
his/her desired characters from the screen [7].
BCI spellers are similar to typical keyboards with the main difference being the method of
typing. While with regular keyboards, users press each button to produce the corresponding
letter on the screen, in a BCI speller, users simply select characters through cognitive activity.
Three control signals studied to interact with a BCI speller are P300, SSVEP and Motor
Imagery (MI). P300 spellers, as we will discuss further in the next section, consist of a series of
stimuli where the user has to focus on a specific cue (target). The occurrence of the target
stimulus in a random manner manifests as a positive deflection in the EEG signal that can be
classified using machine learning to determine the user’s desired character [57].
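The target/non-target averaging that underlies P300 detection can be sketched as follows. This is an illustration only: the signal values, onset times and function name are synthetic and not taken from any cited study.

```python
def average_erp(eeg, onsets, n_samples):
    """Average fixed-length epochs extracted at the given stimulus onsets.

    eeg       : list of single-channel EEG samples (e.g. from electrode Pz)
    onsets    : sample indices of stimulus onsets
    n_samples : epoch length in samples after each onset
    """
    epochs = [eeg[t:t + n_samples] for t in onsets if t + n_samples <= len(eeg)]
    return [sum(col) / len(epochs) for col in zip(*epochs)]

# Toy demonstration at 1000 Hz: a synthetic positive deflection roughly
# 300 ms after each target flash, and none after non-target flashes.
eeg = [0.0] * 20000
targets, nontargets = [1000, 5000, 9000], [3000, 7000, 11000]
for t in targets:
    for i in range(t + 280, t + 320):
        eeg[i] += 5.0  # simulated P300 peak

target_avg = average_erp(eeg, targets, 800)
nontarget_avg = average_erp(eeg, nontargets, 800)
# target_avg shows a peak near sample 300 (~300 ms); nontarget_avg stays flat.
```

In practice, averaging over repeated flashes suppresses background EEG activity so the small P300 deflection becomes separable by a classifier.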
1.6.1 SSVEP Spellers
SSVEP spellers are controlled by gazing at light sources that flicker at different frequencies. One
of the early SSVEP spellers was the Bremen-BCI speller [58], which consisted of a 32-character
diamond-shaped grid with five command buttons: four arrows and one select button. Each of
these five control buttons flickered at a different frequency. The cursor was at the middle of
the screen by default. In order to make a selection, the user had to gaze at the arrow buttons to
move the cursor in their desired direction followed by fixating on the select button. Figure 1.3
shows the Bremen speller. Later, Volosyak et al. used a built-in dictionary to accelerate the
prediction process and boosted the ITR from 25.67 bits/minute to 32.71 bits/minute [59]. This
was the first SSVEP speller with predictive spelling. After further improvements in the signal
processing phase, the transfer rate was increased to an average of 61.70 bits/minute in a test
with seven participants [58]. Many other studies have been conducted on the effect of the
number of stimuli and different GUIs on the performance of SSVEP spellers such as Wang et
al. [60] who increased the number of target stimuli to sixteen and gained on average 75.4
bits/minute and 97.2% accuracy. Similarly, the increased number of stimuli and the use of
spatial filters to remove background noise have led to higher bit rates [61]. For more
information on SSVEP spellers please refer to [7].
Figure 1.3: GUI of the Bremen speller [58].
Some studies have tested hybrid BCI spellers and exploited ERPs elicited by different stimulus
modalities to boost BCI performance [5], [62].
A limitation of the discussed spellers so far is gaze-dependency; the user must have control of
his or her gaze in order to interact with such systems [9]. This is known as overt attention.
Some studies have invoked covert attention with BCI spellers. Such interfaces do not require
fixation of gaze, thereby minimising ocular movement by using alternative features in colour
and shape to localise stimuli in a single, central location [63]–[67]. These techniques however,
still require functional sight. Auditory and tactile stimuli are two complete gaze independent
alternatives [68]–[70]. However, these BCIs are typically characterized by much lower ITRs
than their visual counterparts simply because of longer stimulus presentation times. An
alternative solution to the gaze dependency issue could be MI spellers.
1.6.2 MI Spellers
MI spellers are controlled by imagining movement of different body parts and are therefore
considered as active gaze-independent BCI spellers. One of the early MI spellers was the
Hex-O-Speller by Blankertz et al. [71], which consisted of a two-step process. As depicted in Figure
1.4, six hexagons each with five characters were arranged on the screen with an arrow that was
used as a region selector. The user had to imagine right hand or foot movement to select one
of the hexagons. After a region was selected, the five characters were spread each in one
hexagon and the same process continued for character selection. Although this paradigm offered a
good solution for gaze independence, its disadvantages included extended user training, mental
fatigue and slower transfer rates [7].
For more information on MI spellers refer to [7].
Figure 1.4: Hex-O-Speller. Each region was selected by imagining right-hand or foot movement and moving the
pointer [7].
1.6.3 P300 Spellers
As mentioned earlier, to maximise detectability of the P300, there should be a notable signal
difference between target and non-target event-related potentials. Usually, the user interacts
with a visual interface on a computer screen. The most well-known is the Row/Column (RC)
paradigm introduced by Farwell and Donchin in 1988 [9], as depicted in Figure 1.5.A.
This paradigm consisted of six rows and six columns including twenty-six alphabet letters and
ten digits. In this typical interface, each row and column flashed at random while the user
fixated on the desired character, counting the number of times it flashed. Each time the
corresponding row or column flashed, a peak in the user’s brain signal occurred, whereas,
flashing of the non-target rows/columns ideally did not elicit such changes in the brain signal.
This signal difference makes it feasible to detect the desired row and column and therefore, the
desired character. This study achieved a maximum accuracy of 95% and transfer rate of 12
bits/minute with four typically developed participants [9]. One advantage of this system was
that no user training was required. However, there were several issues that limited its clinical
utility.
The attention span, levels of fatigue and motivation, and mental state of the participant directly
affect BCI performance. Käthner et al. [72] argue that high workload conditions attenuate the
P300 amplitude, underscoring the need for careful selection of stimuli and Inter-Stimulus
Intervals (ISI) [73]. Further, calibration is typically required as each user has slightly different
evoked brain response patterns. Repetition blindness, habituation and artefacts, can also
diminish real-time accuracies. To overcome these limitations, Treder et al. evaluated three
different variants of fast-paced, gaze independent visual spellers [74]. Participants could use
covert spatial attention, non-spatial feature attention (i.e., attention to colour and form) in two
paradigms, and overt attention in the third paradigm. Mean symbol selection accuracies of 85–
90% were achieved with thirty symbols, suggesting that overt attention is not necessary for
highly accurate responses. Other studies have investigated the effect of different matrix sizes
and concluded that performance decreases as symbol size is reduced [75]. Salvaris et al.
showed that a green and blue chromatic flicker matrix offers better performance than a black
and grey one [76].
Farwell and Donchin’s RC speller became the basis of most subsequent P300 spellers, which were
developed to improve the system’s speed, classification accuracy and user-friendliness. Below
we discuss alternative paradigms as depicted in Figure 1.5 that have addressed some of the
shortcomings of the initial proposed interface.
Single character (SC)
As an alternative to the regular RC interface, Guger et al. suggested flashing one character at a
time (Figure 1.5.B) [77]. Although this interface has the advantage of captivating user attention
during the experiment and therefore eliciting higher P300 amplitudes, it is slower compared to
the RC paradigm. SC flashing inevitably lengthens the time required to detect the target
character. To be more specific, it was shown in [77] that with a 60 ms flash and a 40 ms ISI
period, 54 seconds are needed to flash each character fifteen times with a 6 × 6 matrix. On the
other hand, with a 100 ms flash and a 60 ms ISI, the RC interface requires 28.8 seconds to
present thirty flashes of each character. The SC paradigm is therefore roughly twice as slow as the
RC paradigm. With nineteen participants, mean accuracies were 85.3% for RC and 77.9%
for SC.
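The timing figures quoted from [77] follow directly from the flash and ISI durations; a short sketch reproduces the arithmetic (the function name is ours):

```python
def selection_time_s(flashes_per_repetition, repetitions, flash_ms, isi_ms):
    """Total stimulus-presentation time for one character selection, in seconds."""
    return flashes_per_repetition * repetitions * (flash_ms + isi_ms) / 1000.0

# SC paradigm: all 36 cells flash individually, 15 repetitions,
# 60 ms flash + 40 ms ISI.
sc = selection_time_s(36, 15, 60, 40)    # 54.0 s
# RC paradigm: 6 rows + 6 columns flash, 15 repetitions (so each character
# is intensified 30 times), 100 ms flash + 60 ms ISI.
rc = selection_time_s(12, 15, 100, 60)   # 28.8 s
```

The RC paradigm's advantage comes from each flash covering six characters at once, so far fewer flashes are needed per selection.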
Checkerboard (CB)
One issue with the RC interface is that adjacent cells flash simultaneously. This is a source of
distraction as the non-target responses may appear as target responses. It has been discovered
that when such distraction occurs in the RC paradigm, the majority of incorrect selections lie
in the same row or column as that of the desired character [77]. An alternative approach to
mitigate such drawbacks is the checkerboard (CB) as depicted in Figure 1.5.C & D. The rows
and columns of the matrix are disassociated in the CB paradigm and Townsend et al. [78]
demonstrated that this disassociation enhances the performance by reducing distraction. As
depicted, the CB paradigm was an 8 × 9 matrix superimposed on a checkerboard. The items
were randomly placed in the white and black squares. Since these matrices were disassociated,
adjacent flashes did not occur. After populating the matrix at random with the items, they
flashed sequentially in the following order: white rows, black rows, white columns, and finally
black columns. After the first sequence ended, the matrices were repopulated at random again
and the next sequence occurred. Another advantage of this paradigm over the RC is its capacity
for more on-screen squares (72 vs. 36), which decreases the probability of target character
occurrence and therefore increases the amplitude of the elicited P300 during the oddball
paradigm. This paradigm was tested on eighteen participants, yielding a mean accuracy of 92%
and a mean bit rate of 23 bits/minute.
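The flash-group construction described above can be sketched as follows. The virtual 6 × 6 arrangement of each colour's cells follows the published design of Townsend et al. [78]; the function name and item encoding are our own illustration.

```python
import random

def checkerboard_flash_groups(items, seed=None):
    """Construct CB flash groups for a 72-item (8 x 9) grid.

    Items are scattered at random over the 36 white and 36 black squares;
    each colour is then treated as a virtual 6 x 6 matrix whose rows and
    columns define the flash groups, presented in the order: white rows,
    black rows, white columns, black columns.  Squares of one checkerboard
    colour are never adjacent on screen, so no group flashes neighbouring cells.
    """
    items = list(items)
    assert len(items) == 72
    random.Random(seed).shuffle(items)
    white, black = items[:36], items[36:]
    rows = [m[i * 6:(i + 1) * 6] for m in (white, black) for i in range(6)]
    cols = [m[j::6] for m in (white, black) for j in range(6)]
    return rows + cols  # 24 groups of six items each

groups = checkerboard_flash_groups(range(72), seed=1)
# Every item appears in exactly two groups: one virtual row and one virtual column.
```

Repopulating the matrix between sequences amounts to calling the function again with a different seed, which is what makes the flash order unpredictable to the user.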
Improved Checkerboard design
A number of studies have attempted to improve the baseline checkerboard. Lakey et al. [79]
studied the effects of attentional resources and demonstrated that mindfulness induction
significantly improved classification accuracy over a non-induction control group in the RC
and CB paradigms. Another study showed that the CB paradigm could be further improved by
suppressing and not flashing the items surrounding the attended item during calibration. Online
results showed that this suppression calibration method leads to enhanced performance
compared to the standard CB paradigm.
Region-Based
Fazel-Rezai et al. suggested a region-based (RB), two-level paradigm, as depicted in Figure
1.5 panels E and F, where all the characters were divided into several regions [80]. In the first
level, the user focused on the desired character while all the regions flashed. After several
flashes of each group, the selected group was detected. Afterwards, in the second level, each
character in the group flashed until the selected character was identified. It was shown in [80]
and [81] that this paradigm significantly decreased, human error and the adjacency problem. It
was found that the overall spelling accuracies averaged for the same set of subjects, trials, and
characters for RC, SC, and two variations of RB paradigms were 85%, 72.2%, 90.6%, and
86.1%, respectively [81].
The RB and CB paradigms were new directions in P300 BCI research that produced superior
performance over the traditional RC approach.
A number of other interfaces that deviate from the standard RC P300 speller have been
suggested to mitigate the adjacency problem and gaze dependency. These include the Chroma
speller [64], Geospell [65], Gaze Independent Block Speller (GIBS) [67], Lateral Single
Character Speller (LSC) [82] and T9 [17].
The Chroma speller for instance, was designed to eliminate gaze dependency by having each
row in a distinctive colour. A letter was selected based on a two-step process. The user needed
to focus on the colour of the desired row rather than the specific letter. After a row had been
selected, the corresponding letters were spread in each colourful row and a single letter was
selected in a similar manner. This two-stage paradigm is suitable for ALS patients, as they may
have limited oculomotor control. However, this system has not yet been tested on ALS
individuals [64]. Figure 1.6 illustrates this two-stage selection speller.
Some of these paradigms were later integrated with a predictive speller to improve efficiency,
as detailed in the next section.
Figure 1.5: A: Rows and columns are flashed. B: A single character is flashed. C&D: Checkerboard paradigm.
E&F: Region-based, two level selection, one region is expanded at the second level selection. [55]
Figure 1.6: Chroma Speller. At the first stage, a row is selected by focusing on its colour. After a row has been
selected, the blue row for instance, letters of that row spread among the different rows and the same process
occurs to make a single letter selection [7].
Alternative stimuli
Although flashing stimuli are the most typical and common in P300 spellers, there have been
studies suggesting alternative stimuli. For instance, Guo et al. studied the performance of a
virtual keyboard that deployed a moving vertical bar as a stimulus instead of flashing [83]. In
this interface, a vertical bar appeared below each key and moved leftward at random intervals.
This study showed that moving stimuli can elicit strong P300 ERPs for offline studies. Another
study compared the performance of the flash stimuli against that of moving stimuli on the
typical 6 × 6 matrix [84] in an offline paradigm. This work concluded that the moving stimuli
elicit a stronger P300 signal than the flashing stimuli. An online comparison of these two
stimuli was presented in [85], where twelve participants interacted with an online P300
interface subsequent to individual offline calibration (selecting among six letters). A noticeably
high average transfer rate was achieved (42.1 bits/min) with motion-onset visually evoked
potentials.
In another study, Jin et al. compared three types of stimuli: the typical flash, the vertically
moving stimulus and a combination of the two. Ten individuals participated in this study and all
had better performance with the hybrid stimuli compared to flashing or moving stimuli alone
[86].
An alternative stimulus suggested by Kaufmann et al. was familiar faces [87]. In this variation
of the P300 speller, familiar faces were transparently superimposed on the letters of the P300
matrix (see Figure 1.7 panel A). This type of stimulus elicited other ERPs such as N200 and
N400f (“f” for face) which were negative peaks roughly 200 and 400 milliseconds following
stimulus presentation. The latter negative peak originates from the inferior temporal gyrus
which is associated with visual stimuli processing, object recognition and face perception
[88]. The appearance of additional ERPs facilitated detection and therefore increased the
transfer rate.
In another study, Kaufmann et al. implemented two simultaneous stimuli where some regions
were illuminated with a familiar face and others with a symbol [89]. This two-stimulus
paradigm achieved noticeably higher transfer rates (~80 bits/minute), but at reduced accuracy
(81.25%). Nonetheless, this finding suggests that there is still potential to enhance the speed of
P300 BCI spellers.
Studies have shown colour sensitivities in the parietal, occipital and temporal areas [90].
Further work has studied the influence of chromatic properties on the familiar-face stimulus,
demonstrating that a green-coloured face stimulus elicited higher-amplitude P300 ERPs [15].
Figure 1.7 panel B shows this stimulus.
Figure 1.7: A) Familiar-face stimulus. B) Green familiar-face stimulus [15].
In summary, recent studies show that alternative stimuli which elicit stronger ERPs yield better
performance and can be considered substitutes for the canonical flashing stimulus.
1.7 Combination of NLP and P300 Spellers
Natural language has been studied for many years in the domains of linguistics [36],
machine translation [6] and speech recognition [3]. However, language models have only
recently been integrated into the BCI domain [39]. The most common use of NLP in the field of BCI
is in P300 spellers. Language models can be exploited for word completion [38], signal
classification, and error correction [39], ultimately increasing communication rate [20].
1.7.1 NLP for word completion
Donchin et al. mentioned “substantial sequential dependencies in English” which could be
leveraged in classification [91]. Including known patterns and structure of language in a BCI
communication system can effectively improve the spelling rate, accuracy, and error
correction. Ryan et al. [21] added a spelling checker to the standard CB paradigm consisting
of an 8 × 9 matrix and reported increased typing speed. They used the output of the P300
speller as input to an assistive word completion software, WordQ2 (version 2.5, Quillsoft, Ltd,
Toronto, ON), which references a dictionary for potential word suggestions. The top
suggestions were sent back to the user for selection according to their number in the suggestion
list. This application is similar to the word completion suggestions on a smart phone.
Interestingly, accuracy decreased as the task and interface became increasingly complicated.
However, the spelling speed increased since complete words could be typed with fewer
selections. The following year, Kaufmann et al. combined a similar dictionary lookup method
with the P300 speller for German words [22]. This approach mitigated the workload issue of
Ryan et al. as it included the suggested words in the same matrix as that for selecting characters.
In this study, word suggestions were listed by looking up German webpages sorted by the
number of repetitions. The algorithm searched this list after a few letters were selected by the
user and presented the top six matches. These top matches were then presented in a column in
the P300 speller. A delete button was also included in the interface in case none of the
suggestions were correct and the user wanted to go back to typing mode. Later, Akram et al.
suggested an interface similar to that proposed by Ryan et al. However, instead of selecting the
number of the desired word from the 8 × 9 matrix, the interface switched to a 3 × 3 matrix of
numbers, each corresponding to one of the nine suggested words [17]. This interface was later
embedded into one single T9 paradigm [18]. This integration may have reduced the complexity
caused by switching between interfaces; however, the two-step selection of letters and then
words can be confusing for users, especially now that smartphones no longer use the T9 interface.
Another issue was that each selection corresponded to at least three letters and as more
selections were made, the combination of possible target letters increased. Also, since the
suggestions were only shown when the number of words retrieved from the dictionary was
nine or fewer, the system’s transfer rate was limited. Figure 1.8 demonstrates this interface.
A recent study by Guy et al. used smiley faces as stimuli in a matrix speller and implemented
a word prediction dictionary with the Presage library to present the top ten suggested words on
the right side of the keyboard [19]. This interface was tested on twenty ALS participants and
65% of them gained above 95% accuracy with 5.04 correct symbols/minute.
Clearly, integration of predictive spellers can reduce frustration, mental demand and selection
time. These systems make predictions by simply checking selected letters against dictionary
entries. When the letters do not match any sequences in the dictionary, the system will change
the letter sequence to match one in the dictionary. However, an issue of this model is that it
cannot manage Out Of Vocabulary (OOV) words. Also, none of the studies mentioned in this
section have included prior knowledge of natural language in their classifiers; that is, in the
classification step, an equal probability for all the cells was assumed. However, based on the
letters already selected, some prior assumptions can be made; e.g., the probability of selecting
the letter “u” after “q” is higher than that of any other letter.
In the next section we focus on previous studies that have included this prior knowledge in
their classification process.
Figure 1.8: The T9 P300 speller. A) At the first step the user had to focus on the numbers associated with the
desired letter. The predictive speller searched for words starting with the selected letters. B) When these
suggestions numbered nine or fewer, they were presented on the screen and indexed numerically. C, D) The user
focused on the target number.
1.7.2 NLP for language models in classification
Language models attempt to model character patterns based on corpora of existing text.
These models provide a probability distribution for target characters based on previous
selections, which can be used as a prior probability for future selections. The simplest of such
models captures patterns by finding the relative frequency of n-grams, i.e. sequences of $n$
consecutive characters. These models are created by parsing through a corpus of text and
counting the number of occurrences of these sequences. The conditional probability of a
character, $x_t$, given the previous $n-1$ characters can then be computed as [92]:

$$p(x_t \mid x_{t-1}, \ldots, x_{t-n+1}) = \frac{c(x_t, x_{t-1}, \ldots, x_{t-n+1})}{c(x_{t-1}, \ldots, x_{t-n+1})} \tag{1.1}$$

where $c(x_t, \ldots, x_{t-n+1})$ is the number of occurrences of the character sequence
$x_t x_{t-1} \ldots x_{t-n+1}$. The number of n-grams is exponential in $n$; therefore, the algorithm will be
slow in real-time classification. Many studies have used n-grams, specifically bigrams [92],
[93], and trigrams [92], [94], [95] as language models to improve the classification of ERPs
using naïve Bayes and the Hidden Markov Model (HMM). The latter is a representation of
processes that cannot be directly observed, but which can be predicted by state-dependent
output. The objective of an HMM is to determine the optimal sequence of states that may have
produced a certain outcome [96]. A typed word is interpreted as a sequence of states of the
process 𝒙 = (𝑥0, … , 𝑥𝑛) that can only be indirectly observed through the recorded EEG data.
The goal is to determine 𝒙 by observing the EEG data [73], [97], [98].
More recently, Speier et al. used Particle Filtering (PF) to compute prior probabilities and
correct errors through statistical modelling [20]. This classifier computes the probability
distribution over possible outputs by sampling a batch of possible realisations, i.e. a batch of
potential output strings typed by the user. Each of these particles moves through the model
independently based on the transition probabilities [20]. This model is useful when the
estimation of the probability distribution over all possible strings in real-time is
computationally intractable.
1.7.3 Performance metrics
The most commonly used metrics are accuracy, which is the number of correct selections over
the total number of selections, and ITR, which is the number of error-free bits per minute [26].
ITR is computed from the average information conveyed by each selection, B, defined as the
mutual information between the selection y and the target character x, divided by the selection time [99].
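As an illustration, the Wolpaw approximation of the bits per selection is commonly used in BCI studies; it assumes all N targets are equally likely a priori and that errors are spread uniformly over the N − 1 wrong targets. This is a standard simplification and not necessarily the exact ITR formula used in this thesis (which is given in the next chapter).

```python
import math

def bits_per_selection(n_targets, accuracy):
    """Wolpaw estimate of the information (in bits) carried by one selection.

    Assumes all N targets are equally likely a priori and that errors are
    distributed uniformly over the N - 1 incorrect targets.
    """
    n, p = n_targets, accuracy
    if p == 1.0:
        return math.log2(n)
    return (math.log2(n)
            + p * math.log2(p)
            + (1 - p) * math.log2((1 - p) / (n - 1)))

def itr_bits_per_minute(n_targets, accuracy, seconds_per_selection):
    """Information transfer rate in bits/minute."""
    return bits_per_selection(n_targets, accuracy) * 60.0 / seconds_per_selection
```

For example, a 36-target speller at 95% accuracy with one selection every 20 s yields roughly 13.9 bits/minute under these assumptions.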
Written Symbol Rate (WSR)
Another metric used is written symbol rate [100]. First, the Symbol Rate (SR) is computed as
the bits per trial scaled by its maximum possible value, $\log_2 N$, where $N$ is the number of
possible targets. SR is considered the probability of a correct selection. This metric is not
suitable for cases when multiple decisions are required for a correct selection. The average
number of selections necessary to choose one character is then found by determining the
number of additional selections required for correcting errors. WSR becomes zero if the
number of errors exceeds the number of correct selections, i.e. $SR \le 0.5$ [101]:

$$WSR = \begin{cases} \dfrac{2\,SR - 1}{T} & SR > 0.5 \\[4pt] 0 & SR \le 0.5 \end{cases} \tag{1.2}$$

where $T$ is the time per selection.
Practical Bit Rate (PBR)
Practical bit rate simulates error correction and uses accuracy ($P$) instead of SR. It divides
the bits of information in a single correct selection (assuming all characters have equal
probability) by the average number of selections needed to make one correct selection [78], [101]:

$$PBR = \begin{cases} \dfrac{(2P - 1)\log_2 N}{T} & P > 0.5 \\[4pt] 0 & P \le 0.5 \end{cases} \tag{1.3}$$
Characters per Minute (CPM)
Characters per minute is similar to PBR, with the difference that it does not consider the size of
the grid [101]:

$$CPM = \begin{cases} \dfrac{2P - 1}{T} & P > 0.5 \\[4pt] 0 & P \le 0.5 \end{cases} \tag{1.4}$$
Output Character per Minute (OCM)
This metric is only suitable for cases that require the user to correct all errors. It is computed
by dividing the total number of characters by the time required to type them [101].
Mutual Information (MI)
This is a similar metric to ITR with the difference that it does not assume that all selections are
equally likely and considers the accuracy at a word level, eliminating the issue of longer words
transferring more information [101]. The formula for ITR and MI can be found in section five
of the next chapter.
1.8 Project Overview
Based on previous studies, P300 spellers seem to be a promising solution to communication
challenges. The implementation of various visual features in the interface, such as colour, stimulus
type and selection process, has improved ERP responses and therefore performance. Later
studies have investigated the effects of integrating predictive spellers and language models.
However, to the best of our knowledge, none of these studies has implemented a context-aware
P300 speller to further facilitate communication. To this end, this thesis focused on the
development of a context-dependent P300 speller in a question and answer context.
1.9 Research Questions and Objectives
This research aimed to answer the following question:
What magnitude of change, if any, in BCI classification accuracy and bit rate, can be achieved
through the combination of a P300 BCI, context relevant predictive speller and an answer
generation engine in a single adjacency pair conversation?
An adjacency pair is an organizational unit of conversation, consisting of two utterances in
succession, by two conversation partners. A question posed by one speaker followed by an
answer from the other is an example of an adjacency pair, i.e. a type of conversational turn-
taking.
The objectives of this study were threefold:
1- Design an offline NLP-BCI interface for a question and answer context with an accuracy of
at least 70%
2- Implement this interface online with a minimum accuracy of 70%
3- Contrast the performance of the proposed system against that of previous relevant research
This study consisted of two main technical components: natural language processing and the
BCI speller. To some extent, there are similarities between this research and that conducted
in [16], [50]; however, neither previous study included language models. In this study, we
validated that using such prior knowledge (i.e., a language model) improves communication
performance.
Based on these questions and objectives, we hypothesise that the combination of NLP and a
P300 speller in the context mentioned will improve the BCI performance.
Methodology
2.1 Participants
This study was approved by the Holland Bloorview Kids Rehabilitation Hospital and the
University of Toronto ethics review board. Ten typically-developed adults aged 20-40, with no
verbal, motor or neurological conditions and normal/corrected vision were recruited through
Holland Bloorview Kids Rehabilitation Hospital and the University of Toronto. Participants
gave informed consent prior to their participation. The study consisted of one offline and three
online sessions, each of an hour duration. Data were collected from each participant on four
different days.
2.2 Instrumentation
EEG data were collected from eight channels, namely Fz, Cz, Pz, P3, P4, PO7, PO8 and Oz
[102], using the BrainAmp DC amplifier (Brain Products GmbH, Germany). All signals were
sampled at a rate of 1000 Hz and the impedance of each active electrode was maintained below
10 kΩ for the duration of all sessions. As depicted in Figure 2.1, the electrodes were grounded
to AFz and referenced to FCz.
Figure 2.1: Electrode configuration [102].
2.3 Experimental Protocol
Participants were seated comfortably in a chair located approximately 80 cm from a 22” LED
computer monitor with a resolution of 1680 × 1080 pixels. Our design consisted of a speech-
to-text tool that converted the question asked by the conversation partner, who in this study
was the researcher, into text. The text of the question was displayed on the screen for the
participant. We used Google’s API for the speech-to-text conversion. In the next step, this text
was sent to an NLP engine to classify the intent of the question. We used MITIE open source
library [103] for detecting the context of the question. The detected intent was then used to
generate six potential answers to the question. The potential responses were displayed in the
6th column (the suggestion column) of a 6 × 6 speller. The initial suggestions were
predetermined based on frequency and popularity and tagged with the relevant context to
facilitate retrieval by the answer generation engine on the basis of detected intent. Participants
were given five seconds to locate their target cell. These suggestions were retrieved from a
context-based dictionary with twenty categories and 3302 words. The dictionary was designed
specifically for this study, as we did not find any off-the-shelf context-based corpora. The
number of categories was determined based on the design of the experiment, i.e. how many
questions fit in an hour-long session. We implemented the familiar-face stimuli with a green hue
[15] and pseudo-random flashes [104] with a stimulus onset of 100 ms and an inter-stimulus interval
of 200 ms. A five-second gap followed each selection, allowing the participant to check the
current word suggestions and to navigate through the grid to the target letter or word. Figures
2.2 and 2.3 illustrate the general flow of a session and the interface, respectively.
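The intent-to-suggestions step described above can be sketched as follows. This is a hypothetical illustration: the dictionary contents, intent labels and function name are our own, not the actual twenty-category, 3302-word dictionary used in the study, and the MITIE and speech-to-text calls are not reproduced.

```python
# The intent label returned by the NLP engine selects a category in the
# context-based dictionary; the six most frequent entries populate the
# suggestion column.  (Illustrative data only.)
CONTEXT_DICTIONARY = {
    "fruit": [("apple", 120), ("banana", 95), ("orange", 90),
              ("grape", 60), ("mango", 40), ("kiwi", 25), ("papaya", 10)],
    "drink": [("water", 200), ("tea", 150), ("coffee", 140),
              ("juice", 80), ("milk", 70), ("soda", 30)],
}

def initial_suggestions(intent, k=6):
    """Return the k most frequent words tagged with the detected intent."""
    entries = CONTEXT_DICTIONARY.get(intent, [])
    return [word for word, _ in sorted(entries, key=lambda e: -e[1])[:k]]

suggestions = initial_suggestions("fruit")  # six words, most frequent first
```

Pre-tagging each dictionary entry with a category is what allows retrieval to run in the short gap between the question and the first flash sequence.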
Figure 2.2: The experimental paradigm. The researcher asked a question verbally, which was detected through
speech recognition and converted to text. The question was sent to the NLP engine for entity recognition and
retrieval of potential answers. The grid with letters and suggested words was then presented to the participant
and the brain signals were subjected to pattern recognition.
Figure 2.3: After the question was asked verbally by the researcher, the corresponding text was shown on the
screen and suggested answers populated the last column.
Figure 2.2: Timing of events during a trial. The question was only asked verbally in the last two blocks of the
online sessions. The timing of classification varied among sessions (offline or online) and individuals.
2.3.1 Offline
The offline session consisted of five blocks. Each block had six trials (questions). In each trial,
a question, the designated answer (shown underneath the question) and a grid of letters flanked
by a suggestion column (as in Figure 2.2) were shown on the screen for six seconds, giving the
participant the time to read and prepare. The cell (letter or word) within the grid that the
participant was to select, herein referred to as the target, appeared in red highlight (Figure 2.4a)
and subsequently flashed for three seconds.
Three out of the six trials in each block were randomly selected as iterative selection trials,
where the participant was asked to start by typing the first letter of the designated answer (not
among the answers in the suggestion column). As the participant started to type the characters
of the answer one letter at a time, the suggestions were updated accordingly. An example is
depicted in Figure 2.4. The question was “What fruit do you want to eat?” and the designated
answer “orange” appeared under the question with the letter “o” highlighted in red as a cue
for the participant (Figure 2.4A). The fact that only the first letter was highlighted indicated
that the answer was not in the suggestion column. The participant focused on “o” in the grid
and after fourteen flashes of all the cells, feedback was given to the participant showing that
letter “o” had been selected (Figure 2.4B). The number fourteen was determined based on
literature [20]–[22], [93] and confirmed via pilot sessions. Our context-relevant predictive speller then searched the category of fruits for words with the smallest Levenshtein distance from the selected letter and repopulated the suggestion column with the top six such words (Figure 2.4B). The Levenshtein distance [105] is defined as the minimum number of edits, namely insertions, deletions and substitutions, needed to transform string a into string b.
Subsequent to the updates to the suggestion column, the next target was highlighted in red,
directing the participant’s focus accordingly. In the current example, the next target was the word “orange”, which now appeared in the suggestion column.
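The suggestion-update step described above can be sketched as follows. This is a minimal illustration rather than the thesis code: the word list and the prefix-comparison convention are assumptions, while the Levenshtein recurrence itself follows the definition given in the text.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]

def top_suggestions(typed, category_words, k=6):
    """Rank the category words by Levenshtein distance between the typed
    characters and each word's prefix of the same length; return the top k."""
    return sorted(category_words,
                  key=lambda w: levenshtein(typed.lower(), w.lower()[:len(typed)]))[:k]

fruits = ["orange", "apple", "banana", "olive", "grape", "peach", "melon", "okra"]
print(top_suggestions("o", fruits))  # words starting with "o" rank first
```

With "o" typed, the three o-words tie at distance zero and fill the head of the suggestion column, matching the behaviour described for the "orange" trial.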
The other three trials of a block entailed selecting among the answers provided in the
suggestion column (single selections), as shown in Figure 2.4C.
(A)
(B)
(C)
Figure 2.4: An iterative selection trial (A), where the designated answer was not among the suggestions and the participant had to type letter by letter until the target appeared in the suggestion column (B). A single selection trial, where the target was among the suggestions from the start (C).
The structure of the offline session is summarized in Figure 2.5.
Figure 2.5: Offline session structure. Each block consisted of six questions. Three of the six were single selection trials, where the answer was in the suggestion column; the rest were iterative selection trials, where the designated answer was not initially among the suggestions.
2.3.2 Online sessions
Participants completed three online sessions. For the first four blocks of each online session,
the distribution of trials resembled that of the offline session (i.e., 3 single selection, 3 iterative
selection trials; participants prompted with target selection).
First Online Block
The first block was offline and used as same-day training data. The number of flashes was fixed at ten instead of fourteen to reduce the risk of fatigue.
Blocks Two to Four: Constrained Selection Blocks
For blocks two to four, the answer was provided to the participant in the same manner as in the
offline trials (red highlight) with a slight difference. In these blocks, the participants had to
navigate through the grid to find the target on their own, prior to the stimulus flashes. This was
to prepare the participants for a more realistic interaction with the system in the last two,
“unconstrained selection blocks”. The structure of the online sessions is summarised in Figure
2.6.
Figure 2.6: Online session structure. Blocks 1-4 were structured in the same way as blocks in the offline session, except that in blocks 2-4 the feedback was the result of online classification. For blocks 5 and 6, questions were asked verbally and the participant decided how to respond. The classification model was retrained after each block.
The number of flashes in the online blocks varied from two to eight. After each flash sequence
of the grid, probabilities of each of the thirty-six cells being the target were updated and if any
cell had a probability higher than 80%, it was determined to be the participant’s intended
character. We will discuss how we decided on our threshold level later. The participants were
asked not to correct their mistakes. Allowing for backspace poses complex modelling challenges and would not allow for fully exploiting the information from the language model [19], [20], [106], [107].
The offline blocks and constrained selection blocks (blocks 2-4) had designated answers which
varied from one session to another but were consistent among different participants.
Knowledge of the ground truth in the online sessions helped us retrain our model with the
additional data accumulated after each block.
Last two online blocks: Unconstrained Selection Blocks
In order to determine if our proposed paradigm could outperform previously studied P300-NLP
spellers in terms of communication rate and to also test our system in a more realistic manner,
we included two unconstrained selection blocks at the end of each online session. The
participant was given the freedom to respond with a word at their discretion. For one
unconstrained selection block, the BCI ignored the context of the question (context-
independent block), while for the other, the BCI invoked the context-dependent answer
generation engine (context-dependent block). The presentation of the last two blocks was
pseudo-randomized to minimize any potential order effects.
In these unconstrained selection blocks, a standard set of five questions was asked verbally by
the researcher. Through speech-to-text, the transcript of the question was displayed on the
screen and the participant had five seconds to think of how they would like to respond. These
questions were different for every session but standardised across participants.
For the context-independent (CI) block, we used a corpus of the most commonly used English
words as our source for the suggestion column. As in previous studies, given the absence of
context, the suggestion column was initially empty, forcing the participant to type the first letter
of their response (Figure 2.7). As the system detected the participant’s desired letter(s), the last
column was (re)populated with the most frequent words having the closest distance to the
letters typed so far. However, these words may have been irrelevant to the context of the
question asked.
On the other hand, in the context-dependent (CD) block, after each question was asked, the
transcript of the question was subjected to the NLP engine for intent recognition. Based on the
detected category, the last column was populated with context relevant suggestions that the
participant may potentially have had in mind (Figure 2.2). The participant could decide to either
select among those suggestions or type letters until their intended response appeared in the
suggestion column.
In both blocks, similar to the other online blocks, participants were asked not to correct their
mistakes. If the target word was not among any of the updated suggestions as they were typing
the word letter by letter, they had to select DONE to proceed to the next trial. In order to avoid
misclassification of DONE or any of the other command buttons, we designed our classifier to
only act on these commands after they were selected twice in succession, i.e. selecting the
command cells once had no consequence [22].
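The two-selections-in-succession rule for command cells can be sketched as below. DONE comes from the text; the other command labels are hypothetical, as the thesis does not name them here.

```python
COMMAND_CELLS = {"DONE", "DEL", "SPACE", "HOME"}  # only DONE is named in the text

class CommandGate:
    """Act on a command cell only when it is selected twice in succession;
    a single selection of a command has no consequence (cf. [22])."""
    def __init__(self):
        self._pending = None

    def select(self, cell):
        if cell in COMMAND_CELLS:
            if cell == self._pending:
                self._pending = None
                return ("execute", cell)   # second consecutive selection: act
            self._pending = cell
            return ("armed", cell)         # first selection: wait for confirmation
        self._pending = None               # any other selection resets the gate
        return ("typed", cell)

gate = CommandGate()
print(gate.select("DONE"))   # ('armed', 'DONE')
print(gate.select("DONE"))   # ('execute', 'DONE')
```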
Participants completed a survey, the NASA Task Load Index (NASA-TLX) at the conclusion
of each session to capture their experience with our proposed communication system [108].
They also comparatively rated and commented upon the context dependent and independent
systems.
2.4 Data Analysis
2.4.1 Offline Session
Preprocessing
EEG signals were resampled to 50 Hz and band-pass filtered between 1 Hz and 25 Hz with a finite impulse response (FIR) filter. A notch filter at 60 Hz was applied to suppress power line artefacts. Next, trials were epoched from 200 ms prior to stimulus onset to 800 ms post-stimulus. The average of the 200 ms of pre-stimulus data was subtracted from each epoch to cancel the baseline amplitude offset.
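One possible implementation of this preprocessing chain is sketched below, assuming the filters are applied at the original 1000 Hz rate before downsampling (the text does not specify the order, and a 60 Hz notch is only meaningful before resampling to 50 Hz); the synthetic data stand in for real EEG.

```python
import numpy as np
from scipy.signal import firwin, filtfilt, iirnotch, resample_poly

FS_RAW, FS_NEW = 1000, 50        # acquisition and analysis sampling rates

def preprocess(eeg, stim_samples):
    """eeg: (n_channels, n_samples) at FS_RAW; stim_samples: stimulus onsets.
    Returns epochs (n_stim, n_channels, n_points) at FS_NEW, baseline-corrected
    with the mean of the 200 ms pre-stimulus window."""
    b_n, a_n = iirnotch(w0=60.0, Q=30.0, fs=FS_RAW)      # suppress line noise
    eeg = filtfilt(b_n, a_n, eeg, axis=-1)
    bp = firwin(numtaps=251, cutoff=[1.0, 25.0], pass_zero=False, fs=FS_RAW)
    eeg = filtfilt(bp, [1.0], eeg, axis=-1)              # 1-25 Hz FIR band-pass
    eeg = resample_poly(eeg, FS_NEW, FS_RAW, axis=-1)    # downsample to 50 Hz
    pre, post = int(0.2 * FS_NEW), int(0.8 * FS_NEW)     # -200 ms .. +800 ms
    epochs = []
    for s in stim_samples:
        s = s * FS_NEW // FS_RAW                         # onset in the new rate
        ep = eeg[:, s - pre:s + post]
        epochs.append(ep - ep[:, :pre].mean(axis=1, keepdims=True))  # baseline
    return np.stack(epochs)

rng = np.random.default_rng(0)
epochs = preprocess(rng.standard_normal((8, 5000)), stim_samples=[1000, 2000, 3000])
print(epochs.shape)  # (3, 8, 50)
```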
Feature Extraction and Selection
The most commonly used feature extraction method for the oddball paradigm is trial averaging, which, along with several alternatives, is described below.
Trial Averaging Method
Recall that the P300 is an event-related potential appearing as a reaction to an infrequent target stimulus in a series of frequent non-target stimuli. Owing to the noise inherent in EEG measurement, this ERP is not visible after a single target stimulus presentation. In order to amplify this peak and reduce the noise, multiple epochs corresponding to the target stimulus are typically averaged. This method is not suitable for cases where the latency of the P300 varies among sessions.

Figure 2.7: A question asked in the context-independent, unconstrained selection block. The suggestion column was initially empty, as the system did not consider the context of the question and only provided suggestions after the user started to type.
Temporal, spectral and frequency features
Different characteristics of the EEG waveforms can be considered as features, including the following [109]: 1) ERP latency, 2) maximum signal amplitude, 3) latency/amplitude ratio, 4) absolute maximum amplitude, 5) absolute latency/amplitude ratio, 6) positive area, 7) negative area, 8) sum of positive and negative areas, 9) absolute value of the sum of positive and negative areas, 10) sum of absolute positive and absolute negative areas, 11) average absolute signal slope, 12) peak-to-peak amplitude, 13) peak-to-peak time window, 14) peak-to-peak slope, 15) number of zero crossings in the peak-to-peak time window, 16) zero crossings per time unit in the peak-to-peak time window, 17) slope sign alterations, 18) mode frequency, 19) median frequency, 20) mean frequency, and 21) wavelet coefficients.
Concatenation Method
In this method, the epochs of all $N$ channels are concatenated to create one feature vector of length $N \times D$, where $D$ is the number of data points in each epoch after downsampling. In our case, we had eight channels, each with fifty samples, leading to a feature vector of length 400.
In the next chapter, we justify the concatenation method as the preferred approach in this study
for distinguishing target from the non-target classes.
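The concatenation step amounts to flattening the channel-by-sample epoch matrix; a minimal sketch:

```python
import numpy as np

def concatenate_features(epoch):
    """epoch: (n_channels, n_points) array for a single flash at 50 Hz.
    Concatenates the channels into one feature vector of length N x D."""
    return epoch.reshape(-1)   # row-major: channel 1 samples, then channel 2, ...

epoch = np.zeros((8, 50))                 # eight channels, fifty samples each
print(concatenate_features(epoch).shape)  # (400,)
```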
Classification
We tested the selected features with a number of classifiers, namely, Support Vector Machines
(SVM), Random Forest, Linear Discriminant Analysis (LDA), and Naïve Bayes. We
conducted a 10-fold cross validation on the offline data and compared the performance of
different classifiers. The best results were obtained with a Naïve Bayes classifier.
The Bayes Theorem computes the probability of hypothesis $H$ given some data $D$, i.e. $P(H \mid D)$:

$$P(H \mid D) = \frac{P(D \mid H)\,P(H)}{P(D)} \qquad (2.1)$$
where
- $P(H \mid D)$ is the probability of hypothesis $H$ given the data $D$; formally, this term is the posterior.
- $P(D \mid H)$, known as the likelihood, is the probability of the data $D$ given that the hypothesis is correct.
- $P(H)$ is the probability of $H$ irrespective of the data, known as the prior probability of the hypothesis.
- $P(D)$ is the probability of the data regardless of the hypothesis.
In the case of a P300 speller, the probabilities in the Bayes Theorem can be rewritten as below [92]:

$$P(x_t \mid \mathbf{y}_t, x_{t-1}, \ldots, x_0) = \frac{P(x_t \mid x_{t-1}, \ldots, x_0)\,P(\mathbf{y}_t \mid x_t, \ldots, x_0)}{P(\mathbf{y}_t \mid x_{t-1}, \ldots, x_0)} = \frac{1}{Z}\,P(x_t \mid x_{t-1}, \ldots, x_0)\prod_i f(y_t^i \mid x_t) \qquad (2.2)$$
where
- $P(x_t \mid \mathbf{y}_t, x_{t-1}, \ldots, x_0)$ is the probability of typing character $x_t$ given the score of that character flashing and the characters typed so far.
- $P(x_t \mid x_{t-1}, \ldots, x_0)$ is the prior probability of having character $x_t$ after $x_{t-1}, \ldots, x_0$. This is computed through a language model.
- $Z$ is the normalising constant.
- $P(\mathbf{y}_t \mid x_t, \ldots, x_0)$ is the likelihood and reflects the distribution of scores during stimulation. Based on [5], [92], [93], consecutive flashes are assumed to be drawn independently from a Gaussian distribution. The probability density function for the likelihood can be computed as,
$$f(y_t^i \mid x_t) = \begin{cases} \dfrac{1}{\sqrt{2\pi\sigma_a^2}}\, e^{-\frac{(y_t^i - \mu_a)^2}{2\sigma_a^2}} & \text{if } x_t \in \mathbf{A}_t^i \\[2ex] \dfrac{1}{\sqrt{2\pi\sigma_n^2}}\, e^{-\frac{(y_t^i - \mu_n)^2}{2\sigma_n^2}} & \text{if } x_t \notin \mathbf{A}_t^i \end{cases} \qquad (2.3)$$
where
- $y_t^i$ is the score for character $x_t$ for the $i$th flash.
- $\mathbf{A}_t^i$ is the set of characters illuminated during the $i$th flash for character $x_t$ in the sequence.
- $\mu_a, \sigma_a$ and $\mu_n, \sigma_n$ are the means and standard deviations of the distributions for the attended (i.e., target) and non-attended flashes, respectively. These values are computed from the offline data and updated between online blocks as mentioned earlier.
The class which maximises the posterior probability will be the output of the classifier.
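Equations 2.2 and 2.3 amount to a per-flash Bayesian update over the 36 grid cells with a stopping threshold; a sketch under assumed score distributions ($\mu$ and $\sigma$ would be fitted to the offline data in the real system) is:

```python
import numpy as np

MU_A, SIG_A = 1.0, 1.0   # attended-score distribution (placeholder parameters)
MU_N, SIG_N = 0.0, 1.0   # non-attended-score distribution (placeholder parameters)

def gauss(y, mu, sig):
    return np.exp(-0.5 * ((y - mu) / sig) ** 2) / (sig * np.sqrt(2 * np.pi))

def update_posterior(prior, flashes, threshold=0.8):
    """prior: language-model prior over the 36 grid cells.
    flashes: (flashed_cell_indices, classifier_score) pairs, one per flash.
    Applies Eq. 2.2/2.3 after each flash, stopping once a cell exceeds threshold."""
    post = np.asarray(prior, dtype=float)
    cells = np.arange(post.size)
    for flashed, y in flashes:
        lik = np.where(np.isin(cells, flashed),
                       gauss(y, MU_A, SIG_A),   # x_t in the flashed set A_t^i
                       gauss(y, MU_N, SIG_N))   # x_t not flashed
        post = post * lik
        post /= post.sum()                      # the 1/Z normalisation
        if post.max() > threshold:
            return post, int(post.argmax())
    return post, None                           # undecided: keep flashing

prior = np.full(36, 1 / 36)
row0, col0 = list(range(6)), list(range(0, 36, 6))
post, cell = update_posterior(prior, [(row0, 1.2), (col0, 1.1)] * 6)
print(cell)  # 0: the cell at the intersection of the flashed row and column
```

Evidence accumulates at the row/column intersection until the 0.8 threshold is crossed, which is how the online blocks could stop after as few as two flash sequences.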
Language Model and Prior Probabilities
Early studies of P300 spellers considered the prior distribution to be uniform, i.e. $\frac{1}{N}$, where $N$ is the number of cells in the grid. In other words, a constant prior probability of 1/36 was assigned to all cells in all trials for a 6 × 6 grid. This naïve approach does not take into account the differential frequency of letter occurrence given the previously typed letters; e.g., after the letter q, the letter u is the most likely to occur.
More recently, studies have taken this prior language knowledge into account. Speier et al. [92] suggested a trigram model using the second-order Markov assumption. Under this model, the probability of a character $x_t$ being typed given the last two characters is:

$$P(x_t \mid x_{t-1}, \ldots, x_0) = \frac{c(x_{t-2}, x_{t-1}, x_t)}{c(x_{t-2}, x_{t-1})} \qquad (2.4)$$
where $c(x_{t-2}, x_{t-1}, x_t)$ is the number of occurrences of the string ‘$x_{t-2}x_{t-1}x_t$’ in the corpus. For the first two characters of a string, i.e., when $x_{t-2}$ and $x_{t-1}$ are not defined, the prior probability can be computed as:

$$P(x_t \mid \cdot) = \begin{cases} \dfrac{c(\mathrm{start}, x_t)}{c(\mathrm{start})} & \text{if } t = 0 \\[2ex] \dfrac{c(\mathrm{start}, x_{t-1}, x_t)}{c(\mathrm{start}, x_{t-1})} & \text{if } t = 1 \end{cases} \qquad (2.5)$$
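The trigram counting of Equations 2.4 and 2.5 can be illustrated on a toy corpus; the toy word list below is our own, and the zero-count behaviour it exposes is exactly the failure mode the text goes on to discuss.

```python
from collections import Counter

def trigram_model(corpus_words):
    """Builds the counts of Eq. 2.4/2.5; 'start' pads the beginning of each word."""
    tri, bi = Counter(), Counter()
    for w in corpus_words:
        chars = ["start", "start"] + list(w.lower())
        for a, b, c in zip(chars, chars[1:], chars[2:]):
            tri[(a, b, c)] += 1
            bi[(a, b)] += 1

    def prior(x_t, x_prev2, x_prev1):
        """P(x_t | x_{t-2}, x_{t-1}) = c(x_{t-2}, x_{t-1}, x_t) / c(x_{t-2}, x_{t-1})."""
        denom = bi[(x_prev2, x_prev1)]
        return tri[(x_prev2, x_prev1, x_t)] / denom if denom else 0.0

    return prior

prior = trigram_model(["queen", "quiet", "quick", "tree"])
print(prior("u", "start", "q"))  # 1.0: every initial q is followed by u here
print(prior("u", "n", "n"))      # 0.0: the zero-count case that blocks error recovery
```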
We decided not to adopt this model for the following reason. The main assumption in the trigram model is that the last two characters $x_{t-2}, x_{t-1}$ have been correctly classified and thus the subsequent search would be for ‘$x_{t-2}x_{t-1}x_t$’ in the corpus. This scheme is problematic if at least one of $x_{t-2}$ or $x_{t-1}$ had been misclassified, as the subsequent search would then be for an incorrect string. For instance, for the target word ‘Interesting’, if the first character was classified as N instead of I, the probability of classifying N as the next character would be zero, as the count of words that start with NN is zero. This makes it impossible for the system to recover from a mistake.
Kindermans et al. regularised the n-gram model by applying Witten-Bell smoothing [110], which assigns small non-zero probabilities to n-grams that do not exist in the corpus [107]. However, we designed our system differently to account for out-of-corpus strings. For each letter, the probability of it being the target is computed as follows:
$$0.475 \times \left( 0.85 \times \frac{c(x_{t-1}, x_t)}{c(x_{t-1})} + 0.15 \times \frac{c(x_t)}{c(*)} \right) \qquad (2.6)$$
This method is philosophically akin to the smoothing algorithm used by [107] and very similar
to the approach taken in [93]. There are a few things to note:
- We assumed that almost half of the time, the participant would not select among the suggestions. This is the justification for the 0.475 weight on selecting a character.
- In order for the system to recover from a mistake, we split the letter probability into two terms. The first term assumes that the previous character $x_{t-1}$ was correctly classified, with an 85% confidence. The second term ignores what has been typed so far and counts the number of words that have $x_t$ in position $t$ regardless of the other $t-1$ characters; $c(*)$ denotes the total number of words in the corpus with a minimum length of $t$. Similar to [93], the weights were set based on offline analysis.
- If no word can be found to match the desired sequence, we set the count to one. This mitigates the issue of out-of-vocabulary (OOV) words.
The reason for using a method different from Witten-Bell smoothing was to allow the limited corpus to be adaptively extended as the participant interacts with the system. In other words, the non-zero value that smoothing methods assign to a zero-occurrence sequence cannot differentiate between OOV words and misclassifications. Separating these two cases, as in our method, allows for the automatic addition of new words to the context-based corpus. This matter is discussed in further detail in chapter 6.
The probability of selecting a full word $w$ from the suggestion column is as follows:

$$0.475 \times \begin{cases} \dfrac{1}{6} & \text{if no letter has been selected} \\[1ex] c(w) \times (\text{penalty\_value})^{\text{distance}} & \text{otherwise} \end{cases} \qquad (2.7)$$
When the participant has not typed anything, all six suggestions have the same probability of being selected. As the participant types letters, the probability of each word becomes proportional to its frequency in the corpus (this count is one for all words in the context-relevant case), weighted by a penalty value raised to the power of the word’s Levenshtein distance from the typed sequence. The penalty value was set to 0.8 empirically.
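Equations 2.6 and 2.7 leave some counting conventions implicit; the sketch below adopts one consistent reading (the toy corpus, the $t=0$ fallback, and the position-based counts are our assumptions) to show how the zero-count-to-one OOV rule and the distance penalty interact.

```python
CORPUS = ["orange", "olive", "okra", "apple", "grape", "melon"]  # toy category corpus

def letter_probability(typed, x_t, corpus=CORPUS):
    """Eq. 2.6 under one reading of the counts: the first term counts words whose
    letter at position t follows the previously typed letter; the second counts
    words with x_t at position t regardless of earlier letters.
    Zero counts are raised to one so the speller can recover from mistakes."""
    t = len(typed)
    if t == 0:  # assumed fallback: no previous letter, use first-letter counts
        return 0.475 * max(sum(1 for w in corpus if w[0] == x_t), 1) / len(corpus)
    prev = typed[-1]
    c_pair = sum(1 for w in corpus if len(w) > t and w[t - 1] == prev and w[t] == x_t)
    c_prev = sum(1 for w in corpus if len(w) > t - 1 and w[t - 1] == prev)
    c_pos = sum(1 for w in corpus if len(w) > t and w[t] == x_t)
    c_star = sum(1 for w in corpus if len(w) >= t + 1)           # c(*)
    return 0.475 * (0.85 * max(c_pair, 1) / max(c_prev, 1)
                    + 0.15 * max(c_pos, 1) / max(c_star, 1))

def word_probability(typed, word, distance):
    """Eq. 2.7: six equiprobable suggestions before typing; afterwards the word
    count (one per context word) is decayed by 0.8 per unit of edit distance."""
    if not typed:
        return 0.475 / 6
    return 0.475 * 1 * 0.8 ** distance(typed, word)

print(round(letter_probability("", "o"), 4))  # 0.2375: three of six words start with o
```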
Threshold Determination
In previous studies, the threshold probability that maximised the bit rate was chosen per subject [92], [98]. We instead chose a fixed empirical value for all participants that was higher than the average ITR-optimal threshold. Based on multiple pilots with different participants, we set this value to 0.8. More details on this empirical decision can be found in chapter 5.
2.4.2 Online Session
Online signal processing was similar to that invoked offline. The features were extracted by
concatenation and classified using the method described above. Between every session and
every block within a session, the distribution parameters of the LDA scores were updated for
each participant.
2.5 Assessment Metrics
Conventional performance metrics for BCI systems are accuracy and ITR. However, ITR is not appropriate for the proposed paradigm because it rests on three assumptions that do not hold here: 1) selections are independent of one another, 2) marginal probabilities are uniform over the characters in the grid, and 3) errors are uniform over the non-target characters.
ITR is computed as follows:

$$BR = \log_2 N + ACC_c \log_2 ACC_c + (1 - ACC_c)\log_2 \frac{1 - ACC_c}{N - 1}, \qquad ITR = BR \times CPM \qquad (2.8)$$

where $BR$ denotes bit rate, $N$ is the number of cells in the grid, $ACC_c = \frac{1}{n}\sum_{t=1}^{n} \delta_{x_t z_t}$ is the proportion of correctly classified characters over the total number of characters selected, $\delta_{x_t z_t}$ is the indicator function, which assumes a value of one when the classifier output $z_t$ equals the intended target character $x_t$ and zero otherwise, and $CPM$ is the number of characters per minute.
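As a worked example, Equation 2.8 can be evaluated directly (the accuracy and speed values below are illustrative, not results from this study):

```python
import math

def bits_per_selection(n_cells, acc):
    """BR of Eq. 2.8 (logs base 2); acc is the character accuracy ACC_c.
    The 0*log(0) limits are handled explicitly."""
    term_correct = acc * math.log2(acc) if acc > 0 else 0.0
    term_error = (1 - acc) * math.log2((1 - acc) / (n_cells - 1)) if acc < 1 else 0.0
    return math.log2(n_cells) + term_correct + term_error

def itr(n_cells, acc, chars_per_min):
    """ITR = BR x CPM."""
    return bits_per_selection(n_cells, acc) * chars_per_min

print(round(itr(36, 0.9, 5.0), 2))  # 20.94 bits/min for a 6x6 grid at 90% accuracy
```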
Also, ITR largely depends on the length of the word and assigns high values to incorrect strings
that have many letters in common with the target.
Speier et al. suggested an alternative metric, mutual information (MI), to overcome these shortcomings [101]. MI is computed as follows:

$$BR = \sum_{z} p(z)\left( ACC_w \log_2 \frac{ACC_w}{p(z)} + (1 - ACC_w)\log_2 \frac{1 - ACC_w}{1 - p(z)} \right), \qquad MI = BR \times WPM \qquad (2.9)$$

The summation is over all the words in the corpus and $p(z)$ is the probability of word $z$ occurring. $ACC_w = \frac{1}{n}\sum_t \delta_{x_t z_t}$ is the proportion of correctly classified words over the total number of selected words, and $WPM$ is the number of words per minute. The MI computation had to be slightly altered for our system because, unlike in traditional predictive spellers, not all the words in the corpus were considered for each selection. For each selection in our system, only a subset of words belonging to a specific, context-relevant category was considered. Therefore, to estimate the bit rate, we considered the relevant subset of the corpus for each selection and averaged over all selections. For the sake of comparison with previous work, we have nonetheless reported both ITR and MI in our results. Also, in order to compare the study and
control blocks (unconstrained selection blocks) in terms of communication speed, we measured
the completion time and the number of selections for both blocks. NASA-TLX forms were completed at the end of each session to ascertain the factors that contributed to the system’s task load.
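Equation 2.9 can likewise be evaluated numerically; the uniform word distribution below is a toy stand-in for corpus statistics, not data from this study.

```python
import math

def mutual_information_rate(word_probs, acc_w, wpm):
    """BR of Eq. 2.9 summed over words z with occurrence probability p(z);
    in our variant the sum would run over the context-relevant subset per
    selection, averaged over selections."""
    def plog(q, r):   # q * log2(r), with the 0 * log2(0) = 0 convention
        return q * math.log2(r) if q > 0 else 0.0
    br = sum(p * (plog(acc_w, acc_w / p) + plog(1 - acc_w, (1 - acc_w) / (1 - p)))
             for p in word_probs if 0 < p < 1)
    return br * wpm   # MI = BR x WPM

# Toy example: a uniform 100-word corpus, 90% word accuracy, 2 words per minute
print(round(mutual_information_rate([1 / 100] * 100, 0.9, 2.0), 2))  # 11.02 bits/min
```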
A Novel Combination of Natural Language Processing and Brain Computer Interfaces in a Question and Answer Context
The following section is a journal article written based on the work completed in this thesis. The material presented here can be found in greater detail in the other chapters.
3.1 Abstract
A P300 speller is a brain computer interface that can be used as a communication device for
individuals with speech and language impairments. Recent studies have incorporated natural language processing to further improve the performance of these systems, by allowing multiple characters to be selected simultaneously and/or by computing prior probability
distributions based on previously selected characters. In this study, we exploited natural
language processing to endow a P300 speller with awareness of conversational context in a
single adjacency pair conversation (i.e., question and answer). Context awareness of the system
was manifested as the generation of appropriate suggestions based both on the question posed
by the communication partner and the characters typed by the user. The proposed paradigm
was tested with ten typically developed adults and compared with previous context independent
systems. The integration of a context relevant predictive speller and answer generation engine
with a P300 brain-based speller led to increases in typing speed (by 42.84%) as well as
character and word accuracies on average across participants when compared to a context
independent P300 speller. Participant satisfaction was also higher with the context dependent
speller. The introduction of conversational context has potential to enhance the function and
user experience of a P300 speller for responding to questions.
Keywords: Brain-computer interface, electroencephalography, P300, natural language
processing, context aware, context independent, answer generation, context dependent.
3.2 Introduction
Some form of communication is necessary for expressing one’s needs and emotions, whether through body gestures, hand movements, speech or facial expressions. However, many individuals living with severe disabilities are incapable of communicating through these channels [1]. A brain-computer interface (BCI) such as the P300 speller is a technology that makes communication feasible through neural activity, eliminating the need for body movement [5].
A typical P300 speller interface involves a grid of letters and special characters; each row and column flashes in a pseudo-random sequence while the user fixates on the desired character and counts the number of times that character flashes. Each time the corresponding row or column flashes, a peak occurs in the user’s brain signal, whereas flashing of the non-target rows/columns ideally should not elicit such changes. This difference in the brain signal makes it feasible to detect the desired row and column and therefore identify the desired character. The main challenge with this BCI system is slow speed, as multiple repetitions are required to increase the signal-to-noise ratio (SNR). Studies have attempted to
improve the communication speed of P300 spellers by optimising system parameters [68], [73],
interface design [17], [65], [75], [77], [79], [80], [82], [111], [112], stimulus hue and pattern
[76], signal processing techniques and classifiers [57], [113]–[115].
The field of natural language processing has been studied for many years in the domains of linguistics [36], machine translation [6] and speech recognition [3]. However, language models have only recently been integrated into the BCI domain [39]. The most common use of NLP in the field of BCI
is in P300 spellers. Language models can be exploited for word completion [38], signal
classification, and error correction [39], ultimately increasing communication rate [20].
Predictive spellers increase typing speed by allowing multiple characters to be chosen through
one selection. One of the first studies to present a predictive P300 speller deployed the Quillsoft
WordQ2 (version 2.5, Quillsoft, Ltd, Toronto, ON) assistive software to generate suggestions
as the user typed; the suggestions in turn, could be selected by focusing on their corresponding
numerical index in the original grid [111]. Although this two-step interface enhanced typing
speed, workload was also increased and accuracy was reduced. Later, Kaufmann et al.
integrated the suggestions into the original grid mitigating the additional cognitive load [22].
Later studies attempted to further improve this system by modifying the interface design and
stimulus [17], [19]. An additional approach to improve the performance of the P300 speller is
to incorporate a language model into the classification stage, i.e. to compute the weights of
each cell in the grid. Each letter has a likelihood of being selected next based on some
probability distribution conditioned on the previous selections. The simplest of such
probabilistic models is the naïve Bayes or hidden Markov model, which captures the relative frequency of n-grams, sequences of $n$ consecutive characters [92]–[95]. These models are
created by parsing through a corpus of text and counting the number of occurrences of these
sequences. The conditional probability of a character given the previous 𝑛 − 1 characters can
then be computed [92]. More recently, Speier et al. used particle filtering (PF) to compute prior
probabilities and correct errors through statistical modelling [20]. This classifier computed the
probability distribution over possible outputs by sampling a batch of possible realisations, i.e.
a batch of potential output strings typed by the user. Each of these particles moved through the
model independently based on the transition probabilities [20]. This model is useful when it is
impractical to compute the probability distribution over all possible strings in real-time.
The goal of the present study was to further enhance the communication rate of P300 spellers
in a single adjacency conversation pair. Specifically, we combined a context aware predictive
speller and an answer generation engine that comprehends the question being asked of the
participant, to efficiently present potential conversational responses. The participant could
either type a response or select from suggested answers. If the participant started typing, the
cells containing suggestions were repopulated with context-relevant words matching the
participant’s typed characters, thereby reducing typing time. With ten typically developed
adults, we investigated whether the incorporation of context awareness and answer generation
yields improvements in online communication rate over a generic P300 speller with predictive
spelling and a language model.
3.3 Methods
3.3.1 Participants
This study was approved by the research ethics boards of Holland Bloorview Kids
Rehabilitation Hospital and the University of Toronto. Ten typically-developed adults aged 20-
40, with no verbal, motor or neurological conditions and normal/corrected vision were
recruited through Holland Bloorview Kids Rehabilitation Hospital and the University of
Toronto. Participants provided informed consent. The study consisted of one offline and three
online sessions, each an hour in duration. Data were collected from each participant on four
different days.
3.3.2 Experimental design
Our design consisted of a speech-to-text tool that converted the question asked by the conversation partner, who in this study was the researcher, into text. The text of the question was displayed on the screen for the participant. We used Google’s API for the speech-to-text
conversion. In the next step, this text was sent to an NLP engine to classify the intent of the question. We used the MITIE open-source library [103] to detect the context of the question.
The detected intent was then used to generate six potential answers to the question. The
potential responses were displayed in the 6th column (the suggestion column) of a 6 × 6 speller.
The other 30 cells consisted of letters A-Z and four command cells. The initial suggestions
were predetermined based on frequency and popularity and tagged with the relevant context to
facilitate retrieval on the basis of the detected intent, by the answer generation engine.
Participants were given five seconds to locate their target cell. Suggestions were retrieved from
a context-based dictionary with twenty categories and 3302 words. The dictionary was
designed specifically for this study as we did not find any off-the-shelf context-based corpora.
The number of categories was determined based on the design of the experiment, i.e. how many
questions could be accommodated in an hour-long session. We implemented the famous-faces
stimuli with green hue [15] and pseudo-random flashes [104] with stimulus onset of 100 ms
and inter-stimulus interval of 200 ms. A five-second gap followed each selection, allowing the
participant to check the current word suggestions and to navigate through the grid to the target
letter or word. Figures 3.1 and 3.2 illustrate the interface and the general flow of a session,
respectively.
Figure 3.1: After the question was asked verbally by the researcher, its text was shown on the screen and suggested answers populated the last column.

Figure 3.2: Timing of events during a trial. The question was only asked verbally in the last two blocks of the online sessions. The timing of classification varied among sessions (offline or online) and individuals.

Participants attended one offline session consisting of five blocks, each with six questions (trials). For each trial, a question and answer pair were shown on the screen for a second, followed by the presentation of the grid flanked by a suggestion column populated with
context-relevant suggestions. For each selection, the target letter/word was highlighted in red.
In three out of six trials, the designated answer was not found in the suggestion column and the
participant was guided to focus on the answer’s first letter. After fourteen flashes, feedback
was provided and the suggestions were updated. Based on the updates, further selections took
place. These trials are herein referred to as iterative selection trials. The other three trials in the
block were single selection trials, meaning the designated answer was found among the
suggestions from the beginning of the trial. The reason for this split between iterative and single
selection trials was the assumption that, half of the time, the participant would not find their answer among the generated answers.
Participants attended three online sessions. The arrangement of trials in the first four blocks of
the online sessions resembled that of the offline session. The first block was offline and used
as same day data. The number of flashes were fixed to ten instead of fourteen to reduce the risk
of fatigue. For blocks 2 to 4 of the online sessions, namely the constrained selection blocks,
the answer was provided to the participant in the same manner as in the offline trials (red
highlight); however the number of flashes varied based on the confidence level of the classifier.
The question and answer pairs varied between sessions but were consistent among different
participants. After each of these online blocks, the classifier was retrained. The participants were asked not to correct any potential misclassifications, as allowing for backspace poses complex modelling challenges; that is, at each selection, the possibility of earlier incorrect selections affects the computation of prior probabilities and would not allow for the full exploitation of information from the language model [19], [20], [106], [107].
In order to determine if our proposed paradigm could outperform previously studied P300-NLP
spellers in terms of communication rate and to also test our system in a more realistic manner,
we included two unconstrained selection blocks at the end of each online session. In these
blocks the questions were asked verbally and converted to text. The participant was given the
freedom to respond with a word at their discretion. For one unconstrained selection block, the
BCI ignored the context of the question (context-independent block), while for the other, the
BCI invoked the context-dependent answer generation engine (context-dependent block). The
presentation of the last two blocks was pseudo-randomized to mitigate any order effect.
Participants completed the NASA Task Load Index at the conclusion of each session to capture
their experience with our proposed communication system [108]. They also comparatively
rated and commented on the context-dependent and context-independent systems.
3.3.3 Data collection
All data were collected using eight active EEG electrodes, namely Fz, Cz, Pz, P3, P4, PO7,
and PO8, mounted in an electrode cap connected to a BrainAmp DC amplifier (Brain Products
GmbH, Germany), sampled at 1000 Hz, grounded at AFz and referenced to FCz [102].
Conductive gel was applied to each electrode, with impedances maintained below 10 kΩ.
EEG signals were resampled to 50 Hz and bandpass filtered between 1 Hz and 25 Hz with a
finite impulse response (FIR) filter. A notch filter at 60 Hz was applied to suppress power-line
artefacts. Next, trials were epoched from 200 ms prior to stimulus onset to 800 ms post-
stimulus. The average of the 200 ms pre-stimulus data was subtracted from each epoch to
cancel the baseline amplitude offset. Features were extracted according to [57], where the
epochs across the eight channels were concatenated to obtain a feature vector.
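The preprocessing chain above can be sketched as follows. This is an illustrative reconstruction rather than the thesis code: the function and variable names are our own, the filter orders are arbitrary, and the 60 Hz notch is applied before downsampling because at 50 Hz the Nyquist limit is 25 Hz, so a 60 Hz notch is only realisable at the original rate.

```python
import numpy as np
from scipy.signal import firwin, filtfilt, iirnotch, resample_poly

FS_RAW, FS_NEW = 1000, 50  # amplifier rate and analysis rate (Hz)

def preprocess(eeg, stim_onsets_s):
    """eeg: (n_channels, n_samples) array at 1000 Hz.
    stim_onsets_s: stimulus onset times in seconds.
    Returns one concatenated-channel feature vector per stimulus."""
    # 60 Hz notch for power-line artefacts, applied at the raw rate.
    b_n, a_n = iirnotch(w0=60.0, Q=30.0, fs=FS_RAW)
    eeg = filtfilt(b_n, a_n, eeg, axis=1)
    # 1-25 Hz FIR bandpass, then resample 1000 Hz -> 50 Hz.
    b_bp = firwin(numtaps=501, cutoff=[1.0, 25.0], pass_zero=False, fs=FS_RAW)
    eeg = filtfilt(b_bp, [1.0], eeg, axis=1)
    eeg = resample_poly(eeg, FS_NEW, FS_RAW, axis=1)
    # Epoch from -200 ms to +800 ms around each stimulus and subtract the
    # mean of the 200 ms pre-stimulus window to cancel the baseline offset.
    pre, post = int(0.2 * FS_NEW), int(0.8 * FS_NEW)
    features = []
    for t in stim_onsets_s:
        i = int(round(t * FS_NEW))
        epoch = eeg[:, i - pre:i + post]                        # (channels, 50)
        epoch = epoch - epoch[:, :pre].mean(axis=1, keepdims=True)
        features.append(epoch.reshape(-1))                      # concatenate channels
    return np.array(features)
```

Each 50-sample, eight-channel epoch yields a 400-dimensional feature vector, matching the channel-concatenation approach of [57].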
Classification
No explicit artefact removal was implemented; the discrimination between valid ERPs and
other signals, i.e., artefacts, was made by the classifier [20]. We used a method similar to the
Bayesian Dynamic Stopping Language Model (DSLM) [93], which consists of an offline and
an online portion. During the offline session, the probability density functions of target and
non-target signals were computed. These were used in the online sessions to compute the
likelihood of an epoch belonging to one of the two classes. The posterior probability was also
proportional to the prior probability, which depended on the language model. A bigram model
was used for the prior probability of letters A-Z, as formulated in Equation 3.1. The prior
probabilities were weighted by the expected frequency at which cells would be selected: a 0.05
weight for the four command cells and 0.95 for the other cells, i.e., letter cells and suggested
words. Similar to [93], the probability of selecting a letter was split into two terms to account
for previous misclassifications.
P(x_t) = 0.475 × (0.85 × c(x_{t−1}, x_t) / c(x_{t−1}) + 0.15 × c(x_t) / c(*))        (3.1)
where c(x_{t−1}, x_t) is the number of occurrences of the string sequence x_{t−1}x_t in the corpus,
while c(*) is the total number of words in the corpus with a minimum length of t. The first
term represents the prediction based on the bigram language model, i.e., the conditional
probability of typing x_t assuming x_{t−1} was correctly predicted, whereas the second term
ignores the language model, i.e., the probability that x_t occurs in position t of the word,
regardless of what had been previously typed. An empirical confidence of 85% was assigned
to the bigram search, thereby accommodating instances where x_{t−1} may have been incorrectly
predicted. The constant 0.475 was chosen to reflect that approximately half of the time (0.95/2),
the user would not choose among the suggested words and would elect to type. The prior
probabilities for the words were computed as in Equation 3.2. At the beginning of each trial,
all six words had the same probability. As letters were typed, the word probability became
proportional to the corresponding word count, c(w), in the corpus, penalised by a value
dependent on the Levenshtein distance between the typed string and the suggested word. The
penalty value was set to 0.80 empirically.
P(w) = 0.475 × { 1/6                                if no letter has been selected
               { c(w) × (penalty_value)^distance    otherwise                        (3.2)
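Equations 3.1 and 3.2 can be illustrated with a toy implementation. This is a sketch under stated assumptions: we treat c(x_{t−1}, x_t) and c(x_t) as position-specific counts over the corpus, compare the typed string against an equal-length prefix of each suggested word for the Levenshtein penalty, and leave the priors unnormalised (in the running system they would be normalised over all thirty-six cells).

```python
def levenshtein(a, b):
    """Edit distance via the classic dynamic-programming recurrence."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def letter_prior(words, typed, letter):
    """Eq. 3.1 for position t = len(typed) (t >= 1): a bigram term weighted
    0.85 plus a position-frequency term weighted 0.15, scaled by 0.475."""
    t, prev_letter = len(typed), typed[-1]
    big = sum(1 for w in words if len(w) > t and w[t - 1] == prev_letter and w[t] == letter)
    ctx = sum(1 for w in words if len(w) > t - 1 and w[t - 1] == prev_letter)
    pos = sum(1 for w in words if len(w) > t and w[t] == letter)
    tot = sum(1 for w in words if len(w) > t)   # c(*): words long enough for position t
    bigram_term = big / ctx if ctx else 0.0
    position_term = pos / tot if tot else 0.0
    return 0.475 * (0.85 * bigram_term + 0.15 * position_term)

def word_prior(word_counts, typed, word, penalty=0.80):
    """Eq. 3.2: uniform over the six suggestions before any letter is typed,
    then proportional to the corpus count penalised by edit distance."""
    if not typed:
        return 0.475 / 6
    distance = levenshtein(typed, word[:len(typed)])
    return 0.475 * word_counts.get(word, 0) * penalty ** distance
```

For example, after typing "c" in a corpus where every "c"-initial word has "a" next, the bigram term is 1 and the prior is dominated by the 0.85 weight.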
The number of flashes before classification was determined by the posterior probabilities of
the cells. A decision was made once the cell with the maximum probability exceeded a
threshold of 80%.
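The dynamic-stopping rule can be sketched as follows; the evidence stream and the flash budget are illustrative assumptions (in the actual system the per-flash likelihoods come from the offline target/non-target density estimates, and the prior from Equations 3.1-3.2).

```python
import numpy as np

def dynamic_stop(flash_logliks, prior, threshold=0.80, max_flashes=48):
    """Accumulate per-flash log-likelihood evidence over the grid cells and
    classify once the posterior of any cell exceeds `threshold`, or once the
    flash budget is exhausted.

    flash_logliks: iterable of length-N arrays (one per flash) holding each
    cell's log-likelihood contribution; prior: length-N prior over cells."""
    log_post = np.log(np.asarray(prior, dtype=float))
    post = np.exp(log_post - log_post.max())
    post /= post.sum()
    n_flashes = 0
    for loglik in flash_logliks:
        n_flashes += 1
        log_post = log_post + np.asarray(loglik, dtype=float)
        post = np.exp(log_post - log_post.max())   # numerically stable normalisation
        post /= post.sum()
        if post.max() >= threshold or n_flashes >= max_flashes:
            break
    return int(post.argmax()), float(post.max()), n_flashes
```

Because evidence accumulates multiplicatively, strongly discriminable responses cross the 80% threshold after only a few flashes, which is what makes the variable flash count pay off.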
3.3.4 Evaluation metrics
BCI systems are usually assessed based on their accuracy and speed. Most commonly,
information transfer rate (ITR) is computed as a measure of speed; however, it is not a suitable
metric for this system since it assumes an equal probability for all cells and a uniform
distribution of errors across the grid. Another issue introduced by a predictive speller is that
incorrectly selected words may differ in length from the target. Therefore, considering
accuracy at the character level is not informative [20]. To mitigate this issue, Speier et al.
proposed word-level accuracies and speed estimates through mutual information (MI) [20]. In
this study, the MI computation had to be slightly altered for our system since, unlike traditional
predictive spellers, not all words in the corpus were considered for each selection. For each
selection in our system, only a subset of words belonging to a specific, context-relevant
category was considered. Therefore, to estimate the bit rate, we considered the subset of the
corpus for each selection and took an average over all selections. For the sake of comparison
with previous work, we have nonetheless reported both ITR and MI.
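For reference, the classical (Wolpaw) ITR rests on exactly the assumptions the paragraph above flags as violated here: equiprobable cells and errors spread uniformly over the wrong cells. A minimal sketch:

```python
import math

def wolpaw_itr(n_cells, accuracy, selections_per_minute):
    """Bits per minute under the Wolpaw model: N equally likely targets and
    errors distributed uniformly over the N - 1 incorrect cells."""
    n, p = n_cells, accuracy
    bits_per_selection = math.log2(n)
    if 0.0 < p < 1.0:
        bits_per_selection += p * math.log2(p) + (1 - p) * math.log2((1 - p) / (n - 1))
    return bits_per_selection * selections_per_minute
```

At 100% accuracy on a 36-cell grid each selection carries log2(36) ≈ 5.17 bits, so speed differences translate directly into ITR differences; the MI measure of Speier et al. [20] instead replaces the uniform assumptions with language-model probabilities.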
3.4 Results
3.4.1 ERP response
As expected, all participants exhibited a negative peak in their EEG response around 200 ms
and a positive peak around 300 ms after stimulus presentation (Figure 3.3). This corroborates
the waveforms reported in previous works using familiar face stimuli [15], [104].
Topographic scalp maps were generated using EEG data from participant two in the target and
non-target conditions (Figure 3.4). These maps provide some insight into the regions of the
brain involved in the selective attention task.
Figure 3.3: Average and standard deviation of stimulus response for participant 2 for target (blue) and non-target
(orange) stimuli. Signals were averaged across channels PO7 and PO8. The first arrow indicates the N200 peak,
a negative peak induced by the familiar face stimulus and the second arrow points to the P300 occurrence.
Figure 3.4: Topographic map of ERP response in participant 2. This figure shows the expected negative
inflection at 200 ms followed by a positive inflection.
3.4.2 Online performance
Participants achieved an accuracy of at least 95.97%, with an average ITR of 43.33 bits/minute.
These accuracies significantly exceeded the chance level of 66.67% (p < 0.05). Unsurprisingly,
misclassifications tended to involve cells proximal to the target, i.e., they were attributable to
flashes of cells in the neighbourhood of the target. Considering the word-level metrics, a
minimum accuracy of 96.29% was obtained with an average MI of 10.67 bits/minute. Table 3.1
summarises the performance of all participants for the constrained blocks in the online sessions.
Table 3.1: Average character (ACC_c) and word (ACC_w) accuracies, information transfer rate (ITR) and
mutual information (MI) for the constrained selection blocks in online sessions.
Participant   ACC_c (%)   ITR (bits/minute)   ACC_w (%)   MI (bits/minute)
1 95.97 27.3 98.14 7.21
2 100 49.37 100 11.8
3 97.56 52.34 98.14 12.74
4 97.42 39.17 96.29 10.09
5 100 49.56 100 10.99
6 100 44.29 100 10.81
7 97.52 37.59 98.14 9.42
8 99.43 49.43 100 11.28
9 99.39 46.61 100 12.88
10 97.11 37.65 98.15 9.64
Average 98.44 43.33 98.89 10.67
STD 1.48 7.78 1.3 1.61
Unconstrained selection blocks
In the unconstrained blocks, all participants achieved higher than chance-level accuracy in both
blocks; chance levels were 82% and 71% for the CD and CI blocks, respectively (p < 0.05).
With the CI predictive speller, a minimum accuracy of 90.97%, an average ITR of 18.65
bits/minute and 3.65 CPM were achieved. By incorporating context awareness, all participants
achieved significantly higher accuracy, with a minimum of 97.85% (p = 0.01), an average ITR
of 42.64 bits/minute (p << 10^−5) and 8.38 CPM (p = 0.005).
Considering word accuracy and MI, all participants performed better with the CD speller
(Table 3.3). With the CI predictive speller, participants selected on average 0.67 words/minute
with 94% accuracy, resulting in an average mutual information rate of 6.35 bits/minute. When
using the answer generation engine and CD speller, participants achieved significant
improvements, with an average of 1.49 words/minute (p = 0.005), 11.11 bits/minute (p = 0.005)
and an accuracy of 98.66% (p = 0.009).
Table 3.2: Character selection rates, accuracies and information transfer rates for all participants using the
context independent and context dependent predictive spellers
Participant   CPM (characters/minute)   ACC_c (%)   ITR (bits/minute)
              CI       CD                CI    CD    CI       CD
1 2.91 8.08 90.97 100 12.28 38.6
2 3.7 9.93 96.67 97.85 17.94 50.4
3 3.63 7.79 97.23 98.55 16.37 39.45
4 3.31 5.37 98.92 100 16.7 27.78
5 3.93 11.25 96 100 18.36 58.18
6 3.49 7.82 100 100 18.04 40.43
7 2.71 5.72 100 100 13.99 29.59
8 4.88 10 97.22 100 32.7 51.69
9 4.21 10.64 100 100 21.76 54.99
10 3.77 7.17 97.3 100 18.4 35.33
Average 3.65 8.38 97.43 99.64 18.65 42.64
STD 0.59 1.92 2.72 0.78 5.57 10.6
Table 3.3: Word selection rates, accuracies and mutual information for all participants using the context
independent and context dependent predictive spellers
Participant   WPM (words/minute)   ACC_w (%)   MI (bits/minute)
              CI      CD            CI    CD    CI      CD
1 0.65 1.77 86.67 100 5.22 13.34
2 0.71 1.9 93.33 93.33 6.35 13.48
3 0.65 1.41 86.67 93.33 6.09 10.73
4 0.56 0.9 93.33 100 5.13 6.82
5 0.67 1.85 93.33 100 5.81 14.06
6 0.55 1.23 100 100 5.53 9.29
7 0.56 1.19 100 100 7.03 9.12
8 0.94 1.47 93.33 100 8.83 10.39
9 0.83 2.1 100 100 8.35 16.03
10 0.56 1.04 93.33 100 5.15 7.88
Average 0.67 1.49 94 98.66 6.35 11.11
STD 0.12 0.38 4.92 2.81 1.26 2.83
3.4.3 Surveys
NASA TLX surveys were collected and analysed after each session. From the offline to the
online sessions, there seemed to be a decrease in the levels of mental demand, effort and
frustration. This reduction was expected, as the number of flashes was fixed at fourteen for the
offline session but variable and capped at four sequences for the online sessions. As such, the
stimulus intervals, and hence the period of required attention (effort and mental demand), were
shortened, likely inducing less frustration among the participants. A decrease in temporal
demand and effort was seen in 60% of the participants across the online sessions.
Comparing the weights of all six factors, mental demand had the highest rank with an average
of 3.73 ± 1.44, which was not surprising given that the BCI task required attention. The overall
task load was below 28.57/100 for all participants, and for 60% of them a decrease in overall
task load was seen between the first and last online sessions.
All participants preferred the CD block, stating that it was easier, afforded more flexibility in
expressing their answers, reduced mental demand and fatigue, and converged to their desired
answers faster. The comments on the CI block were that the irrelevant suggestions were
distracting and at times caused frustration, as more selections were necessary to arrive at the
desired answer.
3.5 Discussion
Incorporating the answer generation engine and context-dependent predictive speller increased
ITR on average by 128% and character accuracy by 2.3%. Likewise, MI increased by 75%
while word accuracy rose by 5%. These significant improvements were due to the ability to
select an appropriate word with fewer selections, if not immediately at the beginning of a trial.
For this study, we built a context-based corpus consisting of twenty different categories and
3302 words. This corpus was created manually and had fewer words compared to standard
corpora such as the Brown corpus [116]. The size of the corpus impacts the system's
performance in two ways: one being the diversity of word suggestions and the other being the
mutual information rate. It is important to have a broad enough corpus to be able to predict any
word the participant might think of. For a small number of questions in the unconstrained
selection blocks, the word that some participants had in mind did not exist in the corpus
(participant 2 for two questions, participants 4 and 10 for one question each). In such trials, the
participants' only recourse was to type out the entire word, leading to increased completion
times and numbers of selections. However, since this occurred for at most two questions for
any participant, the context-dependent paradigm remained advantageous in terms of the
selected metrics. We updated our corpus after each of these sessions. The size of the corpus also affects
the mutual information rate. Recall that the word bit rate is the amount of information that is
conveyed in a single word selection [101]. The more words in a corpus, the lower the
probability of each word. The summation of such small word probabilities over the entire
corpus leads to a higher number of bits per selection. To compute the information rate over
time, this bit rate is multiplied by the average number of words per minute. Although we had
fewer words in our corpus, its context awareness led to a significant difference in WPM
between the CD and CI blocks (p < 0.005), resulting in a higher MI rate compared to the
larger CI corpus.
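The corpus-size argument above is the Shannon entropy of the word-selection distribution; a small illustrative sketch (ignoring, for simplicity, the error handling in the full MI computation of [20], [101]):

```python
import math

def bits_per_selection(word_probs):
    """Expected information (in bits) conveyed by one word selection, i.e.
    the Shannon entropy of the distribution over candidate words."""
    return -sum(p * math.log2(p) for p in word_probs if p > 0)

# A uniform choice among more candidates carries more bits per selection:
small_corpus = bits_per_selection([1 / 8] * 8)     # 3 bits per selection
large_corpus = bits_per_selection([1 / 64] * 64)   # 6 bits per selection
```

A context-restricted subset therefore carries fewer bits per selection, but the much higher words-per-minute it enables is what gave the CD speller the higher overall MI rate.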
It is important to note that performance metrics, e.g., ITR and MI, are highly dependent on the
design and timing of the paradigm, the length of the words the participant decides on, and the
software and hardware utilised. Some studies have created their own software [17], [98], while
others have deployed products available on the market [19], [21]. Also, different studies utilise
different machines, bioamplifiers, caps and other instrumentation. Therefore, it is not possible
to conduct an objective comparison between studies. The focus of this study was to investigate
the effect of combining a P300 BCI, a context-relevant predictive speller and an answer
generation engine in a single adjacency pair conversation. Therefore, a core component of our
paradigm was asking and/or displaying a question on the screen for a few seconds, giving the
participant time to process what they had been asked. No previous study has examined a BCI
speller in the context of a conversation, and earlier paradigms thus had shorter time gaps
between selections. To enable a comparison with previous studies, we conducted the CI and
CD blocks at the end of each online session and measured the performance. Although the
additional time allocated to the beginning of each trial clearly reduces the measured ITR and
MI in general, when comparing context-independent and context-dependent systems in a
question and answer context, our findings point to distinct speed and accuracy advantages of
the latter.
This paper verified the potential improvements achievable in a P300 speller by integrating a
context relevant predictive speller and answer generation engine in a single adjacency pair
conversation. However, the proposed paradigm was tested exclusively with typically
developed adults. In the spirit of previous studies that have reported promising use of P300
spellers by clinical populations [19], [50], [91], [117], further investigation is necessary to
confirm the usefulness of this system with individuals with complex communication
challenges, e.g. individuals with ALS or CP.
3.5.1 Limitations and future directions
The manually constructed context-based corpus was limited compared to standard corpora. In
order to reduce the chance of out-of-vocabulary (OOV) words during interaction with the
system, adaptive and automatic addition of new words to the corpus would be beneficial. As
discussed in chapter 2, we split the probability of selecting a letter into two terms to account
for both correct and incorrect previous selections. This computation can be used to flag whether
the participant is trying to select a word outside the corpus and to automatically add that word
to the appropriate corpus category.
Another way of expanding the corpus could be to algorithmically screen internet articles and
webpages, preprocess the text, detect the category of each word and automatically add them to
a context-based dictionary.
Extension of the language model will be necessary to account for typing phrases and sentences.
This will possibly require modifications to the language model to transition from word to space
while maintaining the context. The proposed interface supported a unidirectional conversation
led by the researcher. However, a more realistic system should allow for a bidirectional
conversation, affording more control to the participant. It is therefore important to
accommodate both conversational response and initiation. From an implementation
perspective, one possible approach to this challenge is a command button that switches
between response and initiation modes, where, for example, the latter would allow the
participant to pose a question to their conversation partner. This would lead to a cumulative
context that needs to be tracked by the NLP engine to make appropriate suggestions as the
dialogue evolves.
Allowing for interaction between this system and other software, such as games and web
browsers, is another direction that would lead to more realistic use cases of BCI applications.
Also, many aspects of the BCI hardware itself require further simplification and improvement
to allow daily usage by individuals with communication challenges, e.g., comfortable gel-free
electrodes on wireless caps and quicker set-up times.
It will be useful to have a GUI that allows for customisation. Some participants may prefer
speed over accuracy and therefore be willing to decrease the decision threshold. This is
understandable as in many cases the intended answer can be comprehended regardless of some
misclassifications. Another useful feature could be to allow the participant or caregiver to
predefine the initial suggestions based on common words preferred by the participant, e.g.
favourite food, games, etc.
3.6 Conclusion
In this work a communication system was designed with the ultimate objective of improving
the conversational function of a P300 speller. Our findings suggest that machine awareness of
conversational context, as realized through a combination of a context sensitive predictive
speller and an answer generation engine, can significantly improve classification speed and
accuracy in the P300 speller in single adjacency pair conversations. Subjective workload is
also reduced in the context-dependent paradigm. Collectively, these findings support future
incorporation of natural language processing, predictive spelling and language models in brain-
controlled communication devices.
Results
4.1 Overview
In this chapter, we expand upon the results of our study and compare them with previous
studies. Since the accuracies and bit rates were not normally distributed, the Wilcoxon
signed-rank test was used. Note that some results in section 4.5.2 are replicated from the
previous chapter; here, however, we discuss them in further detail.
4.2 Feature Extraction
We computed the temporal and spectral features as described in chapter 2. This feature set
performed poorly in distinguishing between the target and non-target groups, as seen in
Figure 4.1.
We then concatenated the EEG signals from the eight channels and, as depicted in Figure 4.2,
the distributions were distinctly separable. The shapes of the distributions were similar for all
participants, with slight differences in the separation of the target and non-target groups.
Figure 4.1: LDA score distribution of target and non-target signals for participant 8 using the spatiotemporal
features as described in chapter 2. This set of features was not able to differentiate between target and non-target
signals.
Figure 4.2: LDA score distribution for target and non-target for participant 8 using the concatenation method.
This set of features was able to differentiate between target and non-target signals.
4.3 ERP responses
As expected, all participants exhibited a negative peak in their EEG response around 200 ms
and a positive peak around 300 ms after stimulus presentation (Figure 4.3). This corroborates
the waveforms reported in previous works using familiar face stimuli [15], [104].
Topographic scalp maps were generated using EEG data from participant two in the target and
non-target conditions (Figure 4.4). These maps provide some insight into the regions of the
brain involved in the selective attention task.
4.4 Participant-specific offline classification results
As the number of flashes increased, the accuracy also increased due to the accumulation of
P300 scores (Figure 4.5); however, there was a trade-off between accuracy and speed. We
therefore had our system dynamically decide whether to classify a selection or continue
flashing. This decision was determined by two factors.
1- Threshold, i.e., whether any of the cells had a probability higher than a certain value.
2- Maximum number of flashes, which was empirically set to four for the online sessions
after inspecting the offline data. Note that the number of flashes is per row/column, i.e.,
four flashes per row means each cell flashes eight times at maximum.
We investigated the effect of the threshold on performance in Figure 4.6. As expected, the
higher the threshold, the more information was gathered and the better the performance. The
threshold value was determined through multiple pilot runs with different individuals. After
each session, we inspected the highest probability among the thirty-six cells after each flash
sequence. We noticed that in almost all selections the highest probability attained by a
non-target cell was less than 80%, i.e., when a cell exceeded an 80% chance of being the target,
it was most likely the true target. We therefore empirically set the threshold value to 80% for
all participants. The average character and word accuracies for the offline session with a fixed
threshold of 80% were 96.49 ± 4.76% and 96.34 ± 5.66%, respectively. These values were the
result of averaging over a 10-fold cross-validation. All participants achieved above-chance
accuracy, i.e., >64% (p < 0.05), which was estimated using the binomial distribution and the
number of selections, as the dataset was relatively small [118].
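The binomial estimate of the chance level can be sketched as below. The per-selection guessing probability is left as a free parameter here, since the effective number of candidate cells in this paradigm (and hence the reported chance levels of 64-66.7%) depends on the experimental design rather than a uniform 1/36 guess; the machinery is the same either way.

```python
from math import comb

def chance_level(n_selections, p_guess, alpha=0.05):
    """Smallest proportion of correct selections at which a random guesser
    would be rejected at level alpha: the (1 - alpha) quantile of a
    Binomial(n_selections, p_guess), expressed as an accuracy."""
    cdf = 0.0
    for k in range(n_selections + 1):
        cdf += comb(n_selections, k) * p_guess ** k * (1 - p_guess) ** (n_selections - k)
        if cdf >= 1 - alpha:
            return k / n_selections
    return 1.0
```

With few selections the significance threshold sits well above p_guess and shrinks as selections accumulate, which is why a small dataset calls for a binomial rather than an asymptotic estimate [118].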
The misclassifications occurred at cells adjacent to the target. This was expected, as the
illumination of cells close to the target is a source of distraction and, to some degree, inevitable
even with pseudo-random flashes. It took on average two sequences, i.e., four flashes, to
achieve this accuracy. While only four out of fourteen flashes were taken into account to obtain
this high accuracy, we included the additional data to ensure that we trained a generalizable
model. Table 4.1 summarises the offline performance for all participants.
Figure 4.5: ERP classification accuracy in 10-fold cross-validation versus the number of stimulus repetitions.
Figure 4.6: ERP classification accuracy in 10-fold cross-validation versus the threshold value.
Table 4.1: Offline performance
Participant   ACC_c (%)   ACC_w (%)   Average # Repetitions
1 100 100 2.4
2 96.89 96.67 1.78
3 100 100 1.66
4 100 100 2.44
5 96.89 96.67 1.28
6 96.27 96.67 2.08
7 96.27 96.67 2.53
8 95.34 96.67 1.63
9 100 100 1.62
10 83.23 80 2.78
Average 96.49 96.34 2.02
STD 4.76 5.66 0.47
4.5 Participant-specific classification online results
4.5.1 Constrained Blocks (Blocks 2-4)
All participants achieved above-chance accuracy (chance: 66.67%; p < 0.05). In many cases,
mistakes occurred while the participant was distracted by neighbouring cells, as in the offline
session. Table 4.2 summarises the performance of all participants for blocks 2-4 of the online
sessions.
Table 4.2: Average accuracies, information transfer rate and mutual information for constrained blocks in
online sessions.
Participant   ACC_c (%)   ITR (bits/minute)   ACC_w (%)   MI (bits/minute)
1 95.97 27.3 98.14 7.21
2 100 49.37 100 11.8
3 97.56 52.34 98.14 12.74
4 97.42 39.17 96.29 10.09
5 100 49.56 100 10.99
6 100 44.29 100 10.81
7 97.52 37.59 98.14 9.42
8 99.43 49.43 100 11.28
9 99.39 46.61 100 12.88
10 97.11 37.65 98.15 9.64
Average 98.44 43.33 98.89 10.67
STD 1.48 7.78 1.3 1.61
4.5.2 Unconstrained selection blocks
The performance of the context-dependent (CD) and context-independent (CI) unconstrained
selection blocks was compared in terms of accuracy, ITR, MI, number of selections and
completion time. All participants achieved higher than chance-level accuracy in both blocks;
chance levels were 82% and 71% for the CD and CI blocks, respectively (p < 0.05). All
participants achieved a higher information transfer rate when using the CD predictive speller.
When using the CI predictive speller, participants selected on average 3.65 characters/minute
with 97.43% accuracy, resulting in an average bit rate of 18.65 bits/minute. When using the
CD speller, participants achieved significant speed improvements, with an average CPM of
8.38 characters/minute (p = 0.005), an average bit rate of 42.64 bits/minute (p = 1.06 × 10^−6)
and an accuracy of 99.64% (p = 0.01). For more details, refer to Table 4.3. Figures 4.7 and 4.8
graphically depict character-level performance differences between the CI and CD blocks for
each participant, specifically in terms of character accuracy and information transfer rate,
respectively.
The high standard deviation of participant 2 (CD block) in Figure 4.8 arose because two of the
five target words in the participant's first online session did not exist in the context-based
corpus, so the participant had to type the entire words. We discuss this matter in further detail
in section 5.2.
Table 4.3: Selection rates, character accuracies and information transfer rates for all participants using the
context independent and context dependent predictive spellers
Participant   CPM (characters/minute)   ACC_c (%)   ITR (bits/minute)
              CI       CD                CI    CD    CI       CD
1 2.91 8.08 90.97 100 12.28 38.6
2 3.7 9.93 96.67 97.85 17.94 50.4
3 3.63 7.79 97.23 98.55 16.37 39.45
4 3.31 5.37 98.92 100 16.7 27.78
5 3.93 11.25 96 100 18.36 58.18
6 3.49 7.82 100 100 18.04 40.43
7 2.71 5.72 100 100 13.99 29.59
8 4.88 10 97.22 100 32.7 51.69
9 4.21 10.64 100 100 21.76 54.99
10 3.77 7.17 97.3 100 18.4 35.33
Average 3.65 8.38 97.43 99.64 18.65 42.64
STD 0.59 1.92 2.72 0.78 5.57 10.6
Figure 4.7: Comparing average character accuracy using the context independent (green) and context dependent
(red) predictive spellers for all participants.
Figure 4.8: Comparing average information transfer rate using the context independent and context dependent
predictive spellers for all participants.
When using word-level metrics, all participants achieved a higher bit rate with the CD
predictive speller, as reported in Table 4.4. With the CI predictive speller, participants selected
on average 0.67 words/minute with 94% accuracy, resulting in an average mutual information
rate of 6.35 bits/minute. When using the CD speller, participants roughly doubled their speed,
with an average WPM of 1.49 words/minute (p = 0.005), an average bit rate of 11.11
bits/minute (p = 0.005) and an accuracy of 98.66% (p = 0.009).
Figures 4.9 and 4.10 graphically contrast the CI and CD blocks for each participant on a word
level.
Table 4.4: Selection rates, word accuracies and mutual information for all participants using the context
independent and context dependent predictive spellers
Participant   WPM (words/minute)   ACC_w (%)   MI (bits/minute)
              CI      CD            CI    CD    CI      CD
1 0.65 1.77 86.67 100 5.22 13.34
2 0.71 1.9 93.33 93.33 6.35 13.48
3 0.65 1.41 86.67 93.33 6.09 10.73
4 0.56 0.9 93.33 100 5.13 6.82
5 0.67 1.85 93.33 100 5.81 14.06
6 0.55 1.23 100 100 5.53 9.29
7 0.56 1.19 100 100 7.03 9.12
8 0.94 1.47 93.33 100 8.83 10.39
9 0.83 2.1 100 100 8.35 16.03
10 0.56 1.04 93.33 100 5.15 7.88
Average 0.67 1.49 94 98.66 6.35 11.11
STD 0.12 0.38 4.92 2.81 1.26 2.83
These results were expected, as the CD predictive speller produced relevant suggestions as
soon as the question was asked, decreasing the number of selections, the chance of error and
the completion time, and thereby minimising user fatigue.
All participants completed the unconstrained selection block in less time and with fewer
selections when using the CD speller (Table 4.5). With the CI predictive speller, participants
answered the five questions in 9.07 minutes on average, with an average of 24.33 selections.
When using the CD speller, participants' completion times improved by 57.55%, with an
average completion time of 3.85 minutes (p = 0.005) and an average of 10.53 selections (p =
0.005). Figures 4.11 and 4.12 graphically compare the completion time and number of
selections between the CI and CD blocks for each participant.
Figure 4.9: Comparing average word accuracy using the context independent (green) and context dependent (red)
predictive spellers for all participants.
Figure 4.10: Comparing average mutual information using the context dependent and context independent
predictive spellers for all participants.
Table 4.5: Completion time and number of selections for all participants using the context independent and
context dependent speller.
Participant   Completion Time (minutes)   Number of Selections
              CI       CD                  CI       CD
1 8.15 2.98 17.33 8.33
2 7.37 3.9 22 10.67
3 8.18 3.76 27.67 11.33
4 8.97 5.61 23.67 14
5 8.54 2.8 26 6.67
6 9.18 4.28 29.67 12.67
7 9.07 4.28 23.67 9.67
8 7.37 3.44 28 11.33
9 6.19 2.46 20.33 6.67
10 8.95 5 25 14
Average 9.07 3.85 24.33 10.53
STD 0.92 0.94 3.57 2.56
Figure 4.11: Comparing average completion time using the context dependent and context independent predictive
spellers for all participants.
Figure 4.12: Comparing average number of selections using the context dependent and context independent
predictive spellers for all participants.
4.6 Surveys
At the end of each session, the NASA TLX survey was administered to the participants (see
Appendix A.1) to gauge their experience in terms of: 1) the overall workload of the different
tasks and 2) the main sources of workload [50]. Task load is defined as a "hypothetical
construct that represents the cost incurred by a human operator to achieve a particular level of
performance" [108]. In this standard survey, participants were asked to rate six factors in terms
of difficulty from 0 to 100: mental demand, physical demand, temporal demand, performance,
effort and frustration. The participants then weighed each of these factors against the others in
pairwise comparisons. These ratings and weights were then used to compute the overall task
load of interacting with the system.
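The standard weighted NASA-TLX score combines the 0-100 ratings with weights from the 15 pairwise comparisons of the six factors. A sketch with invented responses for one hypothetical participant:

```python
def tlx_overall(ratings, weights):
    """Weighted NASA-TLX workload score (0-100). `weights[f]` is the number
    of the 15 pairwise comparisons that factor f won, so weights sum to 15."""
    assert sum(weights.values()) == 15
    return sum(ratings[f] * weights[f] for f in ratings) / 15.0

# Illustrative (made-up) responses; the factor names follow the survey.
ratings = {"mental": 40, "physical": 5, "temporal": 20,
           "performance": 15, "effort": 30, "frustration": 10}
weights = {"mental": 5, "physical": 0, "temporal": 3,
           "performance": 2, "effort": 4, "frustration": 1}
overall = tlx_overall(ratings, weights)  # (200 + 0 + 60 + 30 + 120 + 10) / 15
```

The weighting step is what lets a factor such as mental demand, which won the most pairwise comparisons here, dominate the overall score even when other ratings are comparable.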
From the offline to the online sessions, there seemed to be a decrease in the levels of mental
demand, effort and frustration. This was expected, as the number of flashes was fixed at
fourteen for the offline session whereas for the online blocks it was at most six (the maximum
was set to eight in our design, but for all participants and all sessions at most six flashes were
needed before deciding on a selection). This decrease shortened the stimulus intervals, leading
to a shorter period of required attention (effort and mental demand) and less frustration among
the participants.
Mental demand ranks were generally consistent for each individual but varied between
participants.
A decrease in the temporal demand and effort was seen among 60% of the participants across
online sessions. These individuals mentioned in the additional survey that as they attended
more sessions, they became more accustomed to the timing of the trials.
There seems to be a negative correlation between effort and frustration. This is plausible, as
the more effort one invests in attending to the stimuli, the faster and more accurate the system
becomes, which can lead to less frustration.
Comparing the weights of all six factors, mental demand had the highest rank with an average
of 3.73 ± 1.44, which was not surprising since the BCI task necessitated visual attention.
The overall task load was less than 28.57/100 for all participants, and for 60% of them a
decrease in overall task load was seen between the first and last online sessions.
For the online sessions, we provided an additional survey (see Appendix A.2) asking
specifically about the participants' preference regarding the unconstrained selection blocks and
the reasoning behind their choice. All participants preferred the CD block, stating that it was
easier and more flexible, reduced mental demand and fatigue, and converged to their desired
answers faster. The comments on the CI block were that the irrelevant suggestions were
distracting and at times caused frustration, as more selections were required to reach the
desired answer.
Discussion
5.1 Overview
While the results presented in the previous chapter were in line with our expectations, there remain a number of challenges that must be addressed to further improve this system. In this
chapter, we will discuss different aspects of the designed NLP-BCI system that require further
study and inspection. This chapter is an extended version of the discussion section (3.5) from
Chapter 3.
5.2 Context-based corpus
Recall that for this study we built a context-based corpus consisting of twenty different
categories and 3302 words. This corpus was created manually and thus was modest in size
compared to standard corpora such as the Brown corpus [116]. The size of the corpus affects
the system’s performance in two ways.
One is the variety of suggestions made; it is important to have a broad enough corpus such that
any word of which the participant thinks could be predicted. For a small number of questions
in the unconstrained selection blocks, the answers that some participants had in mind did not
exist in the corpus (participant 2 for two questions, participant 4 and participant 10 for one
question). In such trials, the participants had to type out the entire word, leading to an increased
completion time and number of selections. However, since this occurred for at most two questions for any participant, the overall findings were unaffected. We updated our corpus after each of these sessions.
The size of the corpus also affects the mutual information rate. As mentioned earlier, word bit
rate is the amount of information that is conveyed in a single word selection [101]. The more
words in a corpus, the lower the probability of each word. The summation of such small word
probabilities over the entire corpus leads to a higher bit rate per selection. To compute the
information rate over time, this bit rate is multiplied by the average number of words per
minute. Although we had fewer words in our corpus, its context awareness led to a significant
difference in the WPM between the CD and CI blocks (p < 0.005) resulting in a higher MI rate
compared to the larger CI corpus.
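The relationship between corpus size and bits per selection can be made concrete with Shannon entropy. The sketch below is illustrative only: it assumes, for simplicity, a uniform distribution over corpus words, whereas the actual language model assigns context-dependent probabilities, and the large-corpus size is a stand-in value.

```python
import math

def word_entropy_bits(word_probs):
    # Average information (bits) conveyed by one word selection: the more
    # words share the probability mass, the higher this entropy.
    return -sum(p * math.log2(p) for p in word_probs if p > 0)

def mutual_information_rate(word_probs, words_per_minute):
    # Bits per minute: bits per selection multiplied by selections per minute.
    return word_entropy_bits(word_probs) * words_per_minute

# Uniform toy distributions; 3302 is this corpus's size, 1_000_000 a
# hypothetical stand-in for a large context-independent corpus.
print(round(word_entropy_bits([1 / 3302] * 3302), 2))            # ~11.69 bits/selection
print(round(word_entropy_bits([1 / 1_000_000] * 1_000_000), 2))  # ~19.93 bits/selection
```

Under this uniform assumption the smaller corpus conveys fewer bits per selection, so the higher MI rate observed for the CD block comes from its significantly higher words-per-minute factor.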
5.3 Design parameters
It is important to note that performance metrics such as the ITR and MI are highly dependent on the design and timing of the paradigm, the length of the words the user decides on, and the software and hardware utilised. Some studies have created their own software [17], [98], while others have used products available on the market [19], [21]. Different studies also utilise different machines, bioamplifiers, caps, etc. Therefore, an objective comparison between studies is not possible.
The focus of this study was to investigate the effect of combining a P300 BCI, context relevant
predictive speller and an answer generation engine in a single adjacency pair conversation.
Therefore, a core component of our paradigm was asking and/or displaying a question on the
screen for a few seconds giving the participant time to process what they have been asked.
None of the previous studies examined a BCI speller in the context of a conversation and thus had shorter time gaps between selections. In order to compare previous
studies with ours, we conducted the CI and CD block at the end of each online session and
measured the performance. Although the additional time allocated at the beginning of each trial reduces the measured ITR and MI in general, when comparing a context-independent and a context-dependent system in a question-and-answer context, our results validate that the context-dependent system outperforms the context-independent one.
There were a number of other design decisions made in this study, such as dividing the trials into single and iterative selections. This division was based on the assumption that approximately half of the time, the user would not find their answer among the suggestions and would start typing letters until the word was suggested. The results from the unconstrained context-dependent selection blocks validated this assumption: participants found their desired response among the initial suggestions 56% of the time on average. It is worth noting that the order of the two unconstrained blocks was pseudo-randomised to minimise order effects.
Another design parameter was the decision threshold and number of stimuli repetitions.
Different approaches have been taken regarding this parameter in previous studies. Earlier
studies used a fixed number of repetitions [21], [22] based on offline accuracy. This does not
seem to be appropriate given the inter and intra-participant variability [93]. More recent studies
have investigated a dynamic (early) stopping criterion [92], [93]. Speier et al. optimised the
threshold per participant based on offline bit rates [92], but used a constant value of 95% in
their later studies [20], [119]. Kindermans et al. and Mainsah et al. empirically allocated fixed
thresholds of 99% and 90% for all participants [93], [107]. A similar approach to [93] was
taken in this design; based on pilot results of multiple participants, we decided on the value of
80%.
The maximum number of repetitions also varied among different studies. Speier et al. and
Kindermans et al. set a maximum of fifteen sequences before making a decision, i.e. if no cell
exceeded the threshold and every cell had been illuminated thirty times, a decision had to be
made [20], [92], [107], [119]. Mainsah et al. set the maximum number of stimulus sequences to seven (fourteen flashes of each cell) in an earlier study, and later to ten sequences (twenty flashes of each cell) [93], [94]. None of these studies set a minimum number of flashes; after the first sequence, the maximum cell score or probability was compared with the threshold value and a classification decision was made accordingly. However, Kaufmann et al.
set a minimum of four sequences (eight flashes) to prevent high error rates in long spelling
sessions and a maximum based on the offline results of each participant. If the offline accuracy
exceeded 75% for a participant, the maximum was set to two more than the number of
repetitions that yielded the highest offline accuracy. In the case that a participant did not
achieve a minimum of 75% accuracy offline, a maximum of fifteen sequences was set for their
online sessions [22]. Based on these studies and our own pilot sessions, we set the maximum
repetition to four sequences. The collected data validated this choice as the average number of
flash sequences prior to a decision was 1.99 ± 0.39. Previous studies have not reported this
average, precluding direct comparison with past literature.
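The dynamic stopping rule adopted in this design (an 80% threshold with a maximum of four sequences) can be sketched as follows. The cell-posterior interface and the toy probability trajectory are hypothetical; only the threshold and sequence cap follow the design described above.

```python
def classify_with_early_stopping(sequence_posteriors, threshold=0.80, max_sequences=4):
    """After each flash sequence, select the best-scoring cell if its
    cumulative posterior exceeds the threshold; otherwise continue, up to
    a fixed maximum number of sequences.

    sequence_posteriors yields, per sequence, a dict of cell -> posterior.
    Returns (selected_cell, sequences_used).
    """
    best, n = None, 0
    for n, posteriors in enumerate(sequence_posteriors, start=1):
        best = max(posteriors, key=posteriors.get)
        if posteriors[best] >= threshold or n >= max_sequences:
            break
    return best, n

# Hypothetical posterior trajectory for a three-cell toy grid: the decision
# is reached after the second sequence, once cell "A" exceeds 0.80.
trajectory = [{"A": 0.45, "B": 0.30, "C": 0.25},
              {"A": 0.85, "B": 0.10, "C": 0.05}]
print(classify_with_early_stopping(trajectory))  # ('A', 2)
```

When no cell reaches the threshold, the rule falls back to the best-scoring cell after the maximum number of sequences, mirroring the fixed-cap behaviour described above.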
Artefact detection and removal is another important design factor. Studies have taken different
approaches regarding ocular artefact removal. Some studies have band pass filtered the data in
bands higher than the range of ocular and blink artefacts (>3 Hz) [5]. Some have used other
methods such as Eye Movement Correction Procedure (EMCP), which estimates a propagation
factor describing the relationship between electro-oculogram (EOG) and EEG records [120].
Yet other studies have shown that the classifier is able to detect valid ERPs in an epoch without
artefact rejection [20], [22]. We further investigated this statement by conducting a pilot session
where the participant was asked to blink abnormally during the stimulus presentation. No
changes were noticed in the system’s performance. We therefore decided not to take any
additional step in ocular artefact removal.
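For reference, the filtering approach of [5], i.e. band-pass filtering above the ocular artefact band rather than explicit artefact rejection, can be sketched as below. The sampling rate and upper cut-off are hypothetical placeholders, not the parameters used in this study.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def bandpass_eeg(eeg, fs=256.0, low=3.0, high=30.0, order=4):
    # Zero-phase Butterworth band-pass; a low cut-off above ~3 Hz attenuates
    # slow ocular/blink components without a separate rejection step.
    b, a = butter(order, [low, high], btype="band", fs=fs)
    return filtfilt(b, a, eeg, axis=-1)

# Hypothetical use: 8 channels, 2 s of EEG at 256 Hz.
eeg = np.random.randn(8, 512)
print(bandpass_eeg(eeg).shape)  # (8, 512)
```

Zero-phase filtering (forward-backward `filtfilt`) avoids shifting the P300 latency, which a causal filter would introduce.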
5.4 Interface modifications
The designed interface validated our research hypothesis; however, there remain questions as
to whether the arrangement of the grid was the most appropriate. Past studies with predictive
spellers have proposed different designs, some of which were described in Chapter 1. Ryan et al. and Akram et al. presented participants with an additional window listing the top retrieved suggestions from the dictionary [17], [21], and participants had to focus on the number in the original grid corresponding to the desired word in the suggestion list. This was subsequently shown to increase the cognitive workload for the user, resulting in a decrease in accuracy [22]. A later study presented the suggested words in an additional column, positioned to the left of the 6 × 6 letter grid, further from the grid than the intercolumn distance within the grid [22]. Another study designed a 6 × 6 grid that included the positive single-digit numbers. After the user typed a letter, the top six suggestions replaced the letters in the first row and the rows were shifted one row downwards, keeping only digits eight and nine [20]. It appears that in this experimental design, the user was never asked to select a number, and therefore replacing the numbers with suggestions did not affect the user's target cell. Guy et al. designed an asymmetric grid with 43 cells and the suggestions in the rightmost column [19].
mentioned the reasoning behind their design. Additional research must be completed to
determine the optimal BCI interface from a human factors perspective. During the initial online
sessions (CD block), some participants missed their desired word among the suggestions for a variety of reasons. Some felt rushed to fixate on a target cell, while others confused the procedure with the CI block. Conceivably, the context-independent experience of typing on their smartphones, where suggestions arise only upon character entry, contributed to this confusion. Although participants became more familiar with the system and its timing over subsequent sessions, it is unclear whether these mistakes could have been avoided with an alternative positioning of the suggested words.
5.5 Error correction
Studies have taken different approaches to the correction of typing mistakes based on whether
a language model was used. Those without language models provided backspace or delete command buttons and asked users to correct mistakes during online trials [18], [19], [21], [22]. In these studies, all grid cells had the same chance of occurrence, as no prior language probability was taken into account. Other studies have integrated error-related potentials (ErrP)
to correct errors automatically [121]–[123]. The ErrP signal is typically generated 50-100 ms
after an error is detected by the user [7]. However, studies including language models did not
allow for correction by the users and relied strictly on the language model [20], [92]–[94],
[106], [107], [119]. The reasoning behind this decision was that selecting backspace in a history-based language model complicates inference. Allowing for correction means that at each selection two cases are possible: the currently classified character is either correct, or incorrect, in which case the user has to select backspace. This scheme weakens the inference approach, as no history is considered and the language model is not used to its fullest extent [107]. We therefore decided not to allow corrections by the user and relied on the language model in that regard.
5.6 Alternative modalities
This study verified that integrating a context-aware predictive speller with a BCI speller in a single adjacency pair conversation improves performance (i.e., speed and accuracy). However,
further research must be conducted to verify whether modifications in the P300 stimulus, using
other BCI modalities or combining two modalities could further enhance this performance. As
mentioned in chapter 1, SSVEP paradigms tend to gain higher bit rates as they do not require
a minimum of two flashes [58]–[61]. Some studies have focused on combining modalities such
as SSVEP with P300 [5], [124] and eye tracking with SSVEP [125]. An eye tracker is a high
speed, commercially available device that could be used if certain conditions are met. Limitations of eye trackers include a high dependency on ambient lighting, poor performance with light-coloured eyes, and the need for strict user positioning within the camera's field of view. Stawicki et al. leveraged the high speed of eye trackers and the high
classification accuracies of SSVEP-based BCIs in a speller and showed that this combination
improved performance compared to a stand-alone eye-tracking system or SSVEP speller.
Additionally, the two simultaneous stimuli proposed by Kaufmann et al. in [89] could also be
considered as an alternative to increase bit rate.
5.7 BCI target population
This thesis verified the potential improvements in a P300 speller by integrating a context
relevant predictive speller and answer generation engine in a single adjacency pair
conversation. However, the proposed paradigm was tested on typically developed adults.
Further investigation is necessary to confirm the usefulness of this system with individuals with
complex communication needs, e.g. individuals with advanced ALS or severe CP. Studies with
clinical populations using regular P300 spellers, i.e. with no language model and predictive
speller, have reported promising results. Donchin et al. tested a 6 × 6 speller with four
participants with paraplegia and gained an 80% accuracy level with 5.9 items/minute [91].
Sellers et al. achieved above 75% accuracy for nine out of fifteen participants with ALS [117].
Zickler et al. studied the feasibility of performing four tasks, namely, copy spelling, free
spelling of a sentence, typing a word and selecting appropriate commands to send an email,
and using a web browser on four participants with severe disabilities in their homes. They
reported above 70% accuracy with an average ITR of 8.56 bits/minute [50]. Although the
number of selections were predefined and the different tasks were not integrated into one
system, this study showed the capacity of P300 BCIs for daily usage. More recently, Guy et al.
conducted an experiment with twenty individuals with ALS using a P300 BCI integrated with
a predictive speller and the familiar face stimulus in a semi-realistic environment, where an
occupational therapist with no prior BCI experience set up the BCI in a regular office space in
a hospital. Impressively, 65% of participants gained up to 95% accuracy with 5.04 correct
symbols/minute [19]. More recent studies tested P300 spellers integrated with language models
on non-typically developed adults. Speier et al. tested a P300 speller that used a language model
with six participants with ALS in their homes. All participants gained above 84% accuracy and
a minimum of 6 CPM. Mainsah et al. conducted a study with a similar experimental set-up, feature set and classifier involving ten participants with ALS and reported 76.39% accuracy.
P300 studies have also been conducted with paediatric participants with conditions such as CP, with promising results indicating little difference in the latency and amplitude of the P300 between children with mild CP and typically developing children [126], [127].
For participants facing challenges in controlling gaze dependent P300 spellers, alternative
interfaces can be used as discussed in chapter 1.
Conclusion
6.1 Overview
This work demonstrated the design and implementation of an NLP-P300 speller in a single
adjacency pair conversation context. The paradigm consisted of a speech-to-text tool that
converted the question asked by the conversation partner into text. The text of the question was
displayed on the screen for the user. In the next step, this text was sent to an NLP engine which
generated six potential answers to the question. The potential responses were displayed in the
6th column (the suggestion column) of a 6 × 6 speller. The initial suggestions were
predetermined based on frequency and popularity and tagged with the relevant context so that the answer generation engine retrieved words based on the detected intent. We compared our
proposed system with previous studies by designing two unconstrained selection blocks at the
end of each online session. In the context-independent (CI) unconstrained block, five questions were asked of the participant and a context-independent predictive speller populated the
suggestion column based on what had been typed; therefore, the user was forced to type at least
one letter for suggestions to be retrieved. On the other hand, in the context dependent (CD)
unconstrained block, the same five questions were asked and potential answers populated the
suggestion column allowing the user to select a word from the beginning. Users were asked to
give the same answers they gave in the CI block, assuming the CI block occurred first. If the
user did not find their answer among the suggestions, other context relevant words were
suggested as they started typing letters. This system was tested on 10 typically developed
adults. All participants gained above-chance accuracy and achieved a higher information transfer rate with the CD predictive speller, averaging 8.24 characters/minute (p = 0.005), an average bit rate of 42.01 bits/minute (p = 1.06 × 10⁻⁶) and an accuracy of 99.55% (p = 0.01).
6.2 Future work
6.2.1 Adaptively expanding the corpus
As mentioned previously, the context-based corpus was built manually and is therefore limited compared to standard corpora. In order to reduce the chance of OOV words during
interaction with the system, adaptive and automatic addition of new words to the corpus will
be beneficial. As discussed in Chapter 2, we split the probability of selecting a letter into two terms to account for correct and incorrect previous classifications. The result of this computation can be used to flag whether the user is trying to select a word that is not in the corpus and, if flagged as a new word, automatically add it to the corpus under its appropriate category.
Another way of expanding the corpus could be by writing a script to screen internet articles
and webpages, preprocess the text, detect the category of each word and automatically add the
words to a context-based dictionary.
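Either expansion pathway reduces to the same bookkeeping step: detect that a completed word is outside the corpus, then file it under the detected category. A minimal sketch of that step follows; the `corpus` structure and the function interface are hypothetical, not part of the implemented system.

```python
def add_if_new(corpus, word, category, oov_log):
    # corpus: category -> set of words. If the completed word is not yet in
    # the corpus, flag it as out-of-vocabulary and add it under the detected
    # category so it is available as a suggestion in later sessions.
    words = corpus.setdefault(category, set())
    if word not in words:
        oov_log.append((word, category))
        words.add(word)
    return corpus

corpus = {"food": {"pizza", "pasta"}}
oov_log = []
add_if_new(corpus, "sushi", "food", oov_log)
print(oov_log)                    # [('sushi', 'food')]
print("sushi" in corpus["food"])  # True
```

Keeping an explicit OOV log would also allow a caregiver to review automatically added words before they become permanent corpus entries.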
6.2.2 Expressive communication: Taking turns
Extension of the language model will be necessary to account for typing phrases and sentences. This will likely require modifying the language model to handle transitions from words to spaces while maintaining the context.
The proposed interface was studied via a unidirectional conversation held by the researcher.
However, a more realistic system should allow for a bidirectional conversation flow allowing
greater control by the user. Therefore, it is important to consider an expressive communication
pathway as well as a receptive one. From an implementation perspective, one possible approach could be a command button that switches between receptive and expressive modes, allowing the user to type a question to their conversation partner. This will potentially lead to a cumulative context that must be tracked by the NLP engine for appropriate suggestions.
6.2.3 Optimisation
Allowing for interaction between this system and other software, such as games and web browsers, is another area that would lead to more realistic use cases for BCI applications. In addition, many aspects of the BCI hardware itself require further simplification and improvement to allow for daily use by individuals with communication challenges, e.g., comfortable gel-free electrodes on wireless caps and reduced setup time.
6.2.4 Customising the interface
It will be useful to have a GUI that allows for customisation. Some users may prefer speed over
accuracy and therefore be willing to decrease the decision threshold. This is understandable as
in many cases the intended answer can be comprehended regardless of some misclassifications.
Another useful feature could be to allow the user or caregiver to predefine the initial
suggestions based on the common words used personally by the user, e.g. favourite food,
games, etc.
Bibliography
[1] K. Tai, S. Blain, and T. Chau, “A review of emerging access technologies for individuals
with severe motor impairments,” Assist. Technol., vol. 20, no. 4, pp. 204–221, 2008.
[2] N. Alves and T. Chau, “Uncovering patterns of forearm muscle activity using multi-
channel mechanomyography,” J. Electromyogr. Kinesiol., vol. 20, no. 5, pp. 777–786,
2010.
[3] N. Memarian, A. N. Venetsanopoulos, and T. Chau, “Infrared thermography as an access
pathway for individuals with severe motor impairments,” J. Neuroeng. Rehabil., vol. 6,
no. 1, 2009.
[4] B. Leung and T. Chau, “A multiple camera tongue switch for a child with severe spastic
quadriplegic cerebral palsy.,” Disabil. Rehabil. Assist. Technol., vol. 5, no. 1, pp. 58–
68, 2010.
[5] E. Yin, T. Zeyl, R. Saab, T. Chau, D. Hu, and Z. Zhou, “A Hybrid Brain-Computer
Interface Based on the Fusion of P300 and SSVEP Scores,” IEEE Trans. Neural Syst.
Rehabil. Eng., vol. 23, no. 4, pp. 693–701, 2015.
[6] K. Cho et al., “Learning Phrase Representations using RNN Encoder-Decoder for
Statistical Machine Translation,” 2014.
[7] A. Rezeika, M. Benda, P. Stawicki, F. Gembler, A. Saboor, and I. Volosyak, “Brain–
Computer Interface Spellers: A Review,” Brain Sci., vol. 8, no. 4, p. 57, 2018.
[8] L. F. Nicolas-Alonso and J. Gomez-Gil, “Brain computer interfaces, a review,” Sensors,
vol. 12, no. 2, pp. 1211–1279, 2012.
[9] L. A. Farwell and E. Donchin, “Talking off the top of your head: toward a mental
prosthesis utilizing event-related brain potentials,” Electroencephalogr. Clin.
Neurophysiol., vol. 70, no. 6, pp. 510–523, 1988.
[10] Y. Li, C. S. Nam, B. B. Shadden, and S. L. Johnson, “A P300-based brain–computer
interface: Effects of interface type and screen size,” Intl. J. Human–Computer Interact.,
vol. 27, no. 1, pp. 52–68, 2010.
[11] Y. Sakai and T. Yagi, “Alphabet matrix layout in P300 speller may alter its
performance,” in Biomedical Engineering International Conference (BMEiCON), 2011,
2012, pp. 89–92.
[12] J. Jin, E. W. Sellers, and X. Wang, “Targeting an efficient target-to-target interval for
P300 speller brain-computer interfaces,” Med. Biol. Eng. Comput., vol. 50, no. 3, pp.
289–296, 2012.
[13] Y. Liu, Z. Zhou, and D. Hu, “Comparison of stimulus types in visual P300 speller of
brain-computer interfaces,” in Cognitive Informatics (ICCI), 2010 9th IEEE
International Conference on, 2010, pp. 273–279.
[14] I. Käthner, A. Kübler, and S. Halder, “Rapid P300 brain-computer interface
communication with a head-mounted display,” Front. Neurosci., vol. 9, p. 207, 2015.
[15] Q. Li, S. Liu, J. Li, and O. Bai, “Use of a green familiar faces paradigm improves p300-
speller brain-computer interface performance,” PLoS One, vol. 10, no. 6, p. e0130325,
2015.
[16] J. Jarmolowska, M. M. Turconi, P. Busan, J. Mei, and P. P. Battaglini, “A multimenu
system based on the p300 component as a time saving procedure for communication
with a brain-computer interface,” Front. Neurosci., vol. 7, no. 7 MAR, pp. 1–10, 2013.
[17] F. Akram, S. M. Han, and T. S. Kim, “An efficient word typing P300-BCI system using
a modified T9 interface and random forest classifier,” Comput. Biol. Med., vol. 56, pp.
30–36, 2015.
[18] F. Akram, M. K. Metwally, H. S. Han, H. J. Jeon, and T. S. Kim, “A novel P300-based
BCI system for words typing,” 2013 Int. Winter Work. Brain-Computer Interface, BCI
2013, no. February, pp. 24–25, 2013.
[19] V. Guy, M. H. Soriani, M. Bruno, T. Papadopoulo, C. Desnuelle, and M. Clerc, “Brain
computer interface with the P300 speller: Usability for disabled people with
amyotrophic lateral sclerosis,” Ann. Phys. Rehabil. Med., vol. 61, no. 1, pp. 5–11, 2018.
[20] W. Speier, C. Arnold, N. Chandravadia, D. Roberts, S. Pendekanti, and N. Pouratian,
“Improving P300 spelling rate using language models and predictive spelling,” Brain-
Computer Interfaces, vol. 2621, pp. 1–10, 2017.
[21] D. B. Ryan et al., “Predictive spelling with a P300-based brain-computer interface:
Increasing the rate of communication,” Int. J. Hum. Comput. Interact., vol. 27, no. 1,
pp. 69–84, 2011.
[22] T. Kaufmann, S. Völker, L. Gunesch, and A. Kübler, “Spelling is just a click away –
A user-centered brain-computer interface including auto-calibration and predictive text
entry,” Front. Neurosci., vol. 6, no. MAY, pp. 1–10, 2012.
[23] A. Kübler, B. Kotchoubey, J. Kaiser, J. R. Wolpaw, and N. Birbaumer, “Brain–computer
communication: Unlocking the locked in.,” Psychol. Bull., vol. 127, no. 3, p. 358, 2001.
[24] E. J. Speckman, C. E. Elger, and A. Gorji, “Neurophysiologic Basis of EEG and DC
Potentials,” pp. 1–16.
[25] S. M. Coyle, T. E. Ward, and C. M. Markham, “Brain-computer interface using a
simplified functional near-infrared spectroscopy system.,” J. Neural Eng., vol. 4, no. 3,
pp. 219–226, 2007.
[26] J. R. Wolpaw et al., “Brain-computer interface technology: a review of the first
international meeting,” IEEE Trans. Rehabil. Eng., vol. 8, no. 2, pp. 164–173, 2000.
[27] S. Moghimi, A. Kushki, A. Marie Guerguerian, and T. Chau, “A review of EEG-Based
brain-computer interfaces as access pathways for individuals with severe disabilities,”
Assist. Technol., vol. 25, no. 2, pp. 99–110, 2013.
[28] J. Wang, G. Xu, L. Wang, and H. Zhang, “Feature extraction of brain-computer interface
based on improved multivariate adaptive autoregressive models,” in 2010 3rd
International Conference on Biomedical Engineering and Informatics, 2010, vol. 2, pp.
895–898.
[29] N. N. Birbaumer et al., “A spelling device for the paralysed.,” Nature, vol. 398, no.
6725, pp. 297–298, 1999.
[30] E. Pasqualotto, S. Federici, and M. O. Belardinelli, “Toward functioning and usable
brain-computer interfaces (BCIs): A literature review,” Disabil. Rehabil. Assist.
Technol., vol. 7, no. 2, pp. 89–103, 2012.
[31] T. O. Zander and C. Kothe, “Towards passive brain-computer interfaces: Applying
brain-computer interface technology to human-machine systems in general,” J. Neural
Eng., vol. 8, no. 2, 2011.
[32] K. Cassady, A. You, A. Doud, and B. He, “The impact of mind-body awareness training
on the early learning of a brain-computer interface,” Technology, vol. 2, no. 03, pp. 254–
260, 2014.
[33] S. D. Power, A. Kushki, and T. Chau, “Towards a system-paced near-infrared
spectroscopy brain–computer interface: differentiating prefrontal activity due to mental
arithmetic and mental singing from the no-control state,” J. Neural Eng., vol. 8, no. 6,
p. 66004, 2011.
[34] J. Milton, S. L. Small, and A. Solodkin, “Imaging motor imagery: methodological issues
related to expertise,” Methods, vol. 45, no. 4, pp. 336–341, Aug. 2008.
[35] G. Pfurtscheller and C. Neuper, “Motor imagery and direct brain-computer
communication,” Proc. IEEE, vol. 89, no. 7, pp. 1123–1134, 2001.
[36] S. Bajaj, A. J. Butler, D. Drake, and M. Dhamala, “Brain effective connectivity during
motor-imagery and execution following stroke and rehabilitation,” NeuroImage. Clin.,
vol. 8, pp. 572–582, Jun. 2015.
[37] J. V. Odom et al., “Visual evoked potentials standard (2004),” Doc. Ophthalmol., vol.
108, no. 2, pp. 115–123, 2004.
[38] M. Wang et al., “A new hybrid BCI paradigm based on P300 and SSVEP,” J. Neurosci.
Methods, vol. 244, pp. 16–25, 2015.
[39] J. Chen, D. Zhang, A. K. Engel, Q. Gong, and A. Maye, “Application of a single-flicker
online SSVEP BCI for spatial navigation,” PLoS One, vol. 12, no. 5, pp. 1–13, 2017.
[40] Y. Y. Chien et al., “Polychromatic SSVEP stimuli with subtle flickering adapted to
brain-display interactions,” J. Neural Eng., vol. 14, no. 1, 2017.
[41] S. Sur and V. K. Sinha, “Event-related potential: An overview,” Ind. Psychiatry J., vol.
18, no. 1, p. 70, 2009.
[42] J. Polich, “Updating P300: An integrative theory of P3a and P3b,” Clin. Neurophysiol.,
vol. 118, no. 10, pp. 2128–2148, 2007.
[43] R. M. Chapman and H. R. Bragdon, “Evoked responses to numerical and non-numerical
visual stimuli while problem solving,” Nature, vol. 203, no. 4950, p. 1155, 1964.
[44] S. Sutton, M. Braren, J. Zubin, and E. R. John, “Evoked-potential correlates of stimulus
uncertainty,” Science, vol. 150, no. 3700, pp. 1187–1188, 1965.
[45] J. Polich, “Neuropsychology of P300,” Oxford Handb. event-related potential
components, vol. 159, p. 88, 2012.
[46] C. C. Duncan‐Johnson and E. Donchin, “On quantifying surprise: The variation of
event‐related potentials with subjective probability,” Psychophysiology, vol. 14, no. 5,
pp. 456–467, 1977.
[47] J. Polich and C. Margala, “P300 and probability: comparison of oddball and single-
stimulus paradigms,” Int. J. Psychophysiol., vol. 25, no. 2, pp. 169–176, 1997.
[48] E. Donchin, M. Kubovy, M. Kutas, R. Johnson, and R. I. Tterning, “Graded changes in
evoked response (P300) amplitude as a function of cognitive activity,” Percept.
Psychophys., vol. 14, no. 2, pp. 319–324, 1973.
[49] M. Palankar et al., “Control of a 9-DoF wheelchair-mounted robotic arm system using
a P300 brain computer interface: Initial experiments,” in 2008 IEEE International
Conference on Robotics and Biomimetics, 2009, pp. 348–353.
[50] C. Zickler et al., “A brain-computer interface as input channel for a standard assistive
technology software,” Clin. EEG Neurosci., vol. 42, no. 4, pp. 236–244, 2011.
[51] A.-M. Brouwer and J. B. F. Van Erp, “A tactile P300 brain-computer interface,” Front.
Neurosci., vol. 4, p. 19, 2010.
[52] R. W. McCarley et al., “Auditory P300 abnormalities and left posterior superior
temporal gyrus volume reduction in schizophrenia,” Arch. Gen. Psychiatry, vol. 50, no.
3, pp. 190–197, 1993.
[53] X. An, J. Höhne, D. Ming, and B. Blankertz, “Exploring combinations of auditory and
visual stimuli for gaze-independent brain-computer interfaces,” PLoS One, vol. 9, no.
10, p. e111070, 2014.
[54] J. Polich, P. C. Ellerson, and J. Cohen, “P300, stimulus intensity, modality, and
probability,” Int. J. Psychophysiol., vol. 23, no. 1–2, pp. 55–62, 1996.
[55] F. Aloise et al., “Multimodal stimulation for a P300-based BCI,” Int. J. Bioelectromagn,
vol. 9, no. 3, pp. 128–130, 2007.
[56] E. Yin, T. Zeyl, R. Saab, D. Hu, Z. Zhou, and T. Chau, “An Auditory-Tactile Visual
Saccade-Independent P300 Brain–Computer Interface,” Int. J. Neural Syst., vol. 26, no.
01, p. 1650001, 2016.
[57] D. J. Krusienski et al., “A comparison of classification techniques for the P300 Speller,”
J. Neural Eng., vol. 3, no. 4, p. 299, 2006.
[58] I. Volosyak, “SSVEP-based Bremen–BCI interface—boosting information transfer
rates,” J. Neural Eng., vol. 8, no. 3, p. 36020, 2011.
[59] I. Volosyak, A. Moor, and A. Gräser, “A dictionary-driven SSVEP speller with a
modified graphical user interface,” in International Work-Conference on Artificial
Neural Networks, 2011, pp. 353–361.
[60] Y. Wang and T. Jung, “Visual stimulus design for high-rate SSVEP BCI,” Electron.
Lett., vol. 46, no. 15, pp. 1057–1058, 2010.
[61] M. Nakanishi, Y. Wang, X. Chen, Y. Wang, X. Gao, and T. Jung, “Enhancing Detection
of SSVEPs for a High-Speed Brain Speller Using Task-Related Component Analysis,”
IEEE Trans. Biomed. Eng., vol. 65, no. 1, pp. 104–112, 2018.
[62] E. Yin, Z. Zhou, J. Jiang, F. Chen, Y. Liu, and D. Hu, “A speedy hybrid BCI spelling
approach combining P300 and SSVEP,” IEEE Trans. Biomed. Eng., vol. 61, no. 2, pp.
473–483, 2014.
[63] Z. Lin, C. Zhang, Y. Zeng, L. Tong, and B. Yan, “A novel P300 BCI speller based on
the Triple RSVP paradigm,” Sci. Rep., vol. 8, no. 1, p. 3350, 2018.
[64] L. Acqualagna, M. S. Treder, and B. Blankertz, “Chroma Speller: Isotropic visual
stimuli for truly gaze-independent spelling,” in Neural Engineering (NER), 2013 6th
International IEEE/EMBS Conference on, 2013, pp. 1041–1044.
[65] F. Aloise et al., “A covert attention P300-based brain–computer interface: Geospell,”
Ergonomics, vol. 55, no. 5, pp. 538–551, 2012.
[66] W. Speier, C. Arnold, and N. Pouratian, “Integrating language models into classifiers
for BCI communication: A review,” J. Neural Eng., vol. 13, no. 3, pp. 1–13, 2016.
[67] G. Pires, U. Nunes, and M. Castelo-Branco, “GIBS block speller: toward a gaze-
independent P300-based BCI,” in Engineering in Medicine and Biology Society, EMBC,
2011 Annual International Conference of the IEEE, 2011, pp. 6360–6364.
[68] M. S. Treder, H. Purwins, D. Miklody, I. Sturm, and B. Blankertz, “Decoding auditory
attention to instruments in polyphonic music using single-trial EEG classification,” J.
Neural Eng., vol. 11, no. 2, 2014.
[69] I. Käthner, C. A. Ruf, E. Pasqualotto, C. Braun, N. Birbaumer, and S. Halder, “A
portable auditory P300 brain-computer interface with directional cues,” Clin.
Neurophysiol., vol. 124, no. 2, pp. 327–338, 2013.
[70] A. Onishi, K. Takano, T. Kawase, H. Ora, and K. Kansaku, “Affective stimuli for an
auditory P300 brain-computer interface,” Front. Neurosci., vol. 11, no. SEP, pp. 1–9,
2017.
[71] B. Blankertz et al., “The Berlin Brain-Computer Interface presents the novel mental
typewriter Hex-o-Spell.,” 2006.
[72] I. Käthner, S. C. Wriessnegger, G. R. Müller-Putz, A. Kübler, and S. Halder, “Effects
of mental workload and fatigue on the P300, alpha and theta band power during
operation of an ERP (P300) brain-computer interface,” Biol. Psychol., vol. 102, no. 1,
pp. 118–129, 2014.
[73] D. E. Thompson et al., “Performance measurement for brain-computer or brain-machine
interfaces: A tutorial,” J. Neural Eng., vol. 11, no. 3, 2014.
[74] M. S. Treder and B. Blankertz, “(C)overt attention and visual speller design in an ERP-
based brain-computer interface,” Behav. Brain Funct., vol. 6, pp. 1–13, 2010.
[75] B. Z. Allison and J. A. Pineda, “Effects of SOA and flash pattern manipulations on
ERPs, performance, and preference: Implications for a BCI system,” Int. J.
Psychophysiol., vol. 59, no. 2, pp. 127–140, 2006.
[76] M. Salvaris and F. Sepulveda, “Visual modifications on the P300 speller BCI paradigm,”
J. Neural Eng., vol. 6, no. 4, 2009.
[77] C. Guger et al., “How many people are able to control a P300-based brain-computer
interface (BCI)?,” Neurosci. Lett., vol. 462, no. 1, pp. 94–98, 2009.
[78] G. Townsend et al., “A novel P300-based brain-computer interface stimulus
presentation paradigm: Moving beyond rows and columns,” Clin. Neurophysiol., vol.
121, no. 7, pp. 1109–1120, 2010.
[79] C. E. Lakey, D. R. Berry, and E. W. Sellers, “Manipulating attention via mindfulness
induction improves P300-based brain-computer interface performance,” J. Neural Eng.,
vol. 8, no. 2, 2011.
[80] R. Fazel-Rezai and K. Abhari, “A region-based P300 speller for brain-computer
interface,” Can. J. Electr. Comput. Eng., vol. 34, no. 3, pp. 81–85, 2009.
[81] R. Fazel-Rezai and W. Ahmad, “P300-based brain-computer interface paradigm
design,” in Recent advances in brain-computer interface systems, InTech, 2011.
[82] G. Pires, U. Nunes, and M. Castelo-Branco, “Comparison of a row-column speller vs. a
novel lateral single-character speller: assessment of BCI for severe motor disabled
patients,” Clin. Neurophysiol., vol. 123, no. 6, pp. 1168–1181, 2012.
[83] F. Guo, B. Hong, X. Gao, and S. Gao, “A brain–computer interface using motion-onset
visual evoked potential,” J. Neural Eng., vol. 5, no. 4, p. 477, 2008.
[84] B. Hong, F. Guo, T. Liu, X. Gao, and S. Gao, “N200-speller using motion-onset visual
response,” Clin. Neurophysiol., vol. 120, no. 9, pp. 1658–1666, 2009.
[85] T. Liu, L. Goldberg, S. Gao, and B. Hong, “An online brain–computer interface using
non-flashing visual evoked potentials,” J. Neural Eng., vol. 7, no. 3, p. 36003, 2010.
[86] J. Jin, B. Z. Allison, X. Wang, and C. Neuper, “A combined brain–computer interface
based on P300 potentials and motion-onset visual evoked potentials,” J. Neurosci.
Methods, vol. 205, no. 2, pp. 265–276, 2012.
[87] T. Kaufmann, S. M. Schulz, C. Grünzinger, and A. Kübler, “Flashing characters with
famous faces improves ERP-based brain–computer interface performance,” J. Neural
Eng., vol. 8, no. 5, p. 56016, 2011.
[88] S. H. Patel and P. N. Azzam, “Characterization of N200 and P300: selected studies of
the event-related potential,” Int. J. Med. Sci., vol. 2, no. 4, pp. 147–154, 2005.
[89] T. Kaufmann and A. Kübler, “Beyond maximum speed – a novel two-stimulus paradigm
for brain–computer interfaces based on event-related potentials (P300-BCI),” J. Neural
Eng., vol. 11, no. 5, 2014.
[90] B. R. Conway, S. Moeller, and D. Y. Tsao, “Specialized Color Modules in Macaque
Extrastriate Cortex,” Neuron, vol. 56, no. 3, pp. 560–573, 2007.
[91] E. Donchin, K. M. Spencer, and R. Wijesinghe, “The mental prosthesis: Assessing the
speed of a P300-based brain- computer interface,” IEEE Trans. Rehabil. Eng., vol. 8,
no. 2, pp. 174–179, 2000.
[92] W. Speier, C. Arnold, J. Lu, R. K. Taira, and N. Pouratian, “Natural language processing
with dynamic classification improves P300 speller accuracy and bit rate,” J. Neural
Eng., vol. 9, no. 1, 2012.
[93] B. O. Mainsah et al., “Increasing BCI communication rates with dynamic stopping
towards more practical use: An ALS study,” J. Neural Eng., vol. 12, no. 1, p. 16013,
2015.
[94] B. O. Mainsah, K. A. Colwell, L. M. Collins, and C. S. Throckmorton, “Utilizing a
Language Model to Improve Online Dynamic Data Collection in P300 Spellers,” IEEE
Trans. Neural Syst. Rehabil. Eng., vol. 22, no. 4, pp. 837–846, 2014.
[95] U. Orhan et al., “Fusion with Language Models Improves Spelling Accuracy for ERP-
based Brain Computer Interface Spellers,” 2011 Annu. Int. Conf. IEEE Eng. Med. Biol.
Soc., pp. 5774–5777, 2011.
[96] S. Fine, Y. Singer, and N. Tishby, “The hierarchical hidden Markov model: Analysis
and applications,” Mach. Learn., vol. 32, no. 1, pp. 41–62, 1998.
[97] U. Orhan, H. Nezamfar, M. Akcakaya, D. Erdogmus, and M. Higger, “Probabilistic
simulation framework for EEG-based BCI design,” Brain-Computer Interfaces, vol. 3,
no. 4, pp. 1–15, 2016.
[98] W. Speier, C. Arnold, J. Lu, A. Deshpande, and N. Pouratian, “Integrating language
information with a hidden markov model to improve communication rate in the P300
speller,” IEEE Trans. Neural Syst. Rehabil. Eng., vol. 22, no. 3, pp. 678–684, 2014.
[99] T. Schreiber, “Measuring information transfer,” Phys. Rev. Lett., vol. 85, no. 2, p. 461,
2000.
[100] A. Furdea et al., “An auditory oddball (P300) spelling system for brain-computer
interfaces,” Psychophysiology, vol. 46, no. 3, pp. 617–625, 2009.
[101] W. Speier, C. Arnold, and N. Pouratian, “Evaluating True BCI Communication Rate
through Mutual Information and Language Models,” PLoS One, vol. 8, no. 10, 2013.
[102] K. Takano, T. Komatsu, N. Hata, Y. Nakajima, and K. Kansaku, “Visual stimuli for the
P300 brain-computer interface: A comparison of white/gray and green/blue flicker
matrices,” Clin. Neurophysiol., vol. 120, no. 8, pp. 1562–1566, 2009.
[103] D. King, “MITIE: MIT Information Extraction.”
[104] S. K. Yeom, S. Fazli, K. R. Müller, and S. W. Lee, “An efficient ERP-based brain-
computer interface using random set presentation and face familiarity,” PLoS One, vol.
9, no. 11, 2014.
[105] V. I. Levenshtein, “Binary codes capable of correcting deletions, insertions, and
reversals,” Sov. Phys. Dokl., vol. 10, no. 8, pp. 707–710, 1966.
[106] S. Dudy, S. Xu, S. Bedrick, and D. Smith, “A Multi-Context Character Prediction Model
for a Brain-Computer Interface,” pp. 72–77, 2018.
[107] P. J. Kindermans, M. Tangermann, K. R. Müller, and B. Schrauwen, “Integrating
dynamic stopping, transfer learning and language models in an adaptive zero-training
ERP speller,” J. Neural Eng., vol. 11, no. 3, 2014.
[108] S. G. Hart and L. E. Staveland, “Development of NASA-TLX (Task Load Index):
Results of empirical and theoretical research,” in Advances in psychology, vol. 52,
Elsevier, 1988, pp. 139–183.
[109] V. Abootalebi, M. H. Moradi, and M. A. Khalilzadeh, “A new approach for EEG feature
extraction in P300-based lie detection,” Comput. Methods Programs Biomed., vol. 94,
no. 1, pp. 48–57, 2009.
[110] S. F. Chen and J. Goodman, “Empirical study of smoothing techniques for language
modeling,” Comput. Speech Lang., vol. 13, no. 4, pp. 359–394, 1999.
[111] D. B. Ryan, G. E. Frye, G. Townsend, D. R. Berry, N. A. Gates, and E. W. Sellers,
“Predictive Spelling With a P300-Based Brain–Computer Interface: Increasing the
Rate of Communication,” Int. J. Hum. Comput. Interact., vol. 27, no. 1, pp. 69–84, 2011.
[112] R. Fazel-Rezai, B. Z. Allison, C. Guger, E. W. Sellers, S. C. Kleih, and A. Kübler, “P300
brain computer interface: current challenges and emerging trends,” Front. Neuroeng.,
vol. 5, no. July, pp. 1–14, 2012.
[113] M. Kaper, P. Meinicke, U. Grossekathoefer, T. Lingner, and H. Ritter, “BCI competition
2003-data set IIb: support vector machines for the P300 speller paradigm,” IEEE Trans.
Biomed. Eng., vol. 51, no. 6, pp. 1073–1076, 2004.
[114] N. Xu, X. Gao, B. Hong, X. Miao, S. Gao, and F. Yang, “BCI competition 2003-data
set IIb: enhancing P300 wave detection using ICA-based subspace projections for BCI
applications,” IEEE Trans. Biomed. Eng., vol. 51, no. 6, pp. 1067–1072, 2004.
[115] H. Serby, E. Yom-Tov, and G. F. Inbar, “An improved P300-based brain-computer
interface,” IEEE Trans. neural Syst. Rehabil. Eng., vol. 13, no. 1, pp. 89–98, 2005.
[116] W. Francis and H. Kucera, “Brown Corpus Manual.” 1979.
[117] E. W. Sellers, A. Kübler, and E. Donchin, “Brain-computer interface research at the
University of South Florida cognitive psychophysiology laboratory: The P300 speller,”
IEEE Trans. Neural Syst. Rehabil. Eng., vol. 14, no. 2, pp. 221–224, 2006.
[118] E. Combrisson and K. Jerbi, “Exceeding chance level by chance: The caveat of
theoretical chance levels in brain signal classification and statistical assessment of
decoding accuracy,” J. Neurosci. Methods, vol. 250, pp. 126–136, 2015.
[119] W. Speier, C. W. Arnold, A. Deshpande, J. Knall, and N. Pouratian, “Incorporating
advanced language models into the P300 speller using particle filtering,” J. Neural Eng.,
vol. 12, no. 4, p. 46018, 2015.
[120] G. Gratton, M. G. H. Coles, and E. Donchin, “A new method for off-line removal of
ocular artifact,” Electroencephalogr. Clin. Neurophysiol., vol. 55, no. 4, pp. 468–484,
1983.
[121] N. M. Schmidt, B. Blankertz, and M. S. Treder, “Online detection of error-related
potentials boosts the performance of mental typewriters,” BMC Neurosci., vol. 13, no.
1, p. 19, 2012.
[122] M. Spüler, M. Bensch, S. Kleih, W. Rosenstiel, M. Bogdan, and A. Kübler, “Online use
of error-related potentials in healthy users and people with severe motor impairment
increases performance of a P300-BCI,” Clin. Neurophysiol., vol. 123, no. 7, pp. 1328–
1337, 2012.
[123] B. Dal Seno, M. Matteucci, and L. Mainardi, “Online detection of P300 and error
potentials in a BCI speller,” Comput. Intell. Neurosci., vol. 2010, p. 11, 2010.
[124] E. Yin, Z. Zhou, J. Jiang, F. Chen, Y. Liu, and D. Hu, “A novel hybrid BCI speller based
on the incorporation of SSVEP into the P300 paradigm,” J. Neural Eng., vol. 10, no. 2,
2013.
[125] P. Stawicki, F. Gembler, A. Rezeika, and I. Volosyak, “A novel hybrid mental spelling
application based on eye tracking and SSVEP-based BCI,” Brain Sci., vol. 7, no. 4,
2017.
[126] C. Morales et al., “Single trial P300 detection in children using expert knowledge and
SOM,” 2014 36th Annu. Int. Conf. IEEE Eng. Med. Biol. Soc. EMBC 2014, pp. 3801–
3804, 2014.
[127] E. Hakkarainen, S. Pirilä, J. Kaartinen, K. Eriksson, and J. J. Van Der Meere, “Visual
attention study in youth with spastic cerebral palsy using the event-related potential
method,” J. Child Neurol., vol. 26, no. 12, pp. 1525–1528, 2011.
Appendices
Appendix A1
NASA Task Load Index
Hart and Staveland’s NASA Task Load Index (TLX) method
assesses workload on six 7-point scales. Increments of high,
medium and low estimates for each point result in 21
gradations on the scales.
Participant Number
Date
Session #
Mental Demand How mentally demanding was the task?
Very Low Very High
Physical Demand How physically demanding was the task?
Very Low Very High
Temporal Demand How hurried or rushed was the pace of the task?
Very Low Very High
Performance How successful were you in accomplishing what
you were asked to do?
Perfect Failure
Effort How hard did you have to work to accomplish
your level of performance?
Very Low Very High
Frustration How insecure, discouraged, irritated, stressed,
and annoyed were you?
Very Low Very High
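The 21-gradation scoring described above can be illustrated with a short sketch. This is a hypothetical helper (the function name `raw_tlx` and the 0–100 rescaling convention are illustrative assumptions, not part of the thesis software): the unweighted “raw TLX” variant simply averages the six subscale ratings.

```python
def raw_tlx(ratings):
    """Compute an unweighted (raw) NASA-TLX workload score.

    `ratings` maps each of the six subscales to a value on the
    21-gradation scale (0-20). The mean is rescaled to 0-100,
    a common reporting convention for the raw (unweighted) variant.
    """
    subscales = {"mental", "physical", "temporal",
                 "performance", "effort", "frustration"}
    if set(ratings) != subscales:
        raise ValueError("expected exactly the six NASA-TLX subscales")
    if not all(0 <= v <= 20 for v in ratings.values()):
        raise ValueError("each rating must lie on the 0-20 gradation scale")
    # Unweighted mean of the six ratings, rescaled from 0-20 to 0-100.
    return sum(ratings.values()) / 6 * 5

# Hypothetical ratings for one session:
example = {"mental": 14, "physical": 2, "temporal": 10,
           "performance": 6, "effort": 12, "frustration": 4}
print(raw_tlx(example))  # -> 40.0
```

The original NASA-TLX procedure additionally weights each subscale by pairwise-comparison counts; the raw (unweighted) mean shown here is a widely used simplification.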
Appendix A2
Post-session questionnaire
During the last block, you may have noticed that, after each question was asked, the
suggestions presented in the last column were either relevant to the question or independent
of the context. Please answer the questions below regarding your experience interacting with
the interface during the last block.
1- Using the scales below (middle being neutral), please indicate your preference for
either of the suggestion methods for expressing your answers during the last
block.
2- Please tell us your opinion of the two suggestion methods and why you prefer
one over the other for expressing your thoughts (if you have a
preference):
Context-dependent Context-independent
Context-independent Context-dependent