A Novel Combination of Natural Language Processing
and Brain Computer Interface in a Communication
System
by
Maryam Fallah
A thesis submitted in conformity with the requirements for the degree of
Master of Applied Science
Graduate Department of Electrical and Computer Engineering
University of Toronto
© Copyright by Maryam Fallah 2019
Abstract
A brain-computer interface (BCI) is a communication system that enables individuals with
severe physical disabilities to communicate or control external devices through their brain
activities. In this study, we proposed a communication system combining natural language
processing (NLP) and BCI in a question and answer paradigm. Specifically, we combined a
context-aware predictive speller and an answer generation engine that comprehends the
question being asked of the user, to efficiently present potential conversational responses. The
user could either type a response or select from suggested answers. If the user started typing,
the cells containing suggestions were repopulated with context-relevant words matching the
user’s typed characters, thereby reducing typing time.
We evaluated the proposed system over four sessions per subject in terms of accuracy,
bit rate, timing, and user satisfaction. Our data analysis showed that the proposed
paradigm doubled typing speed, increased accuracy and reduced the mental demand of
message composition.
Keywords: Brain-computer interface, electroencephalography, P300, natural language
processing, context-aware, context-independent, answer generation, context-dependent.
Acknowledgments
I would first like to express my deepest gratitude to my supervisor Dr. Tom Chau for his
unceasing support and guidance throughout the past two years of my post-graduate degree. I
will forever be grateful for this opportunity.
Thank you to my committee members Dr. Tilak Dutta, Dr. Elaine Biddiss and my external
examiner Dr. Dimitrios Hatzinakos for their insightful questions and suggestions.
Thank you to all the members of the PRISM lab for your support and friendship. Special thanks
to Pierre Duez for his endless guidance and technical expertise, as well as Ka Lun Tam for his
unconditional help.
Thanks to all those who volunteered for my study, without whom this work would not have
been possible.
Lastly, I would like to thank my family for their unconditional love and support without which
I would not have been where I am today.
Contents

Acknowledgments
List of Tables
List of Figures
List of Acronyms

Introduction
    1.1 Motivation
    1.2 BCI system
    1.3 BCI cycle
    1.4 BCI Types and Control Signals
    1.5 P300 Response
    1.6 BCI speller
        1.6.1 SSVEP Spellers
        1.6.2 MI Spellers
        1.6.3 P300 Spellers
    1.7 Combination of NLP and P300 Spellers
        1.7.1 NLP for word completion
        1.7.2 NLP for language models in classification
        1.7.3 Performance metrics
    1.8 Project Overview
    1.9 Research Questions and Objectives

Methodology
    2.1 Participants
    2.2 Instrumentation
    2.3 Experimental Protocol
        2.3.1 Offline
        2.3.2 Online sessions
    2.4 Data Analysis
        2.4.1 Offline Session
        2.4.2 Online Session
    2.5 Assessment Metrics

A Novel Combination of Natural Language Processing and Brain Computer Interfaces in a
Question and Answer Context
    3.1 Abstract
    3.2 Introduction
    3.3 Methods
        3.3.1 Participants
        3.3.2 Experimental design
        3.3.3 Data collection
        3.3.4 Evaluation metrics
    3.4 Results
        3.4.1 ERP response
        3.4.2 Online performance
        3.4.3 Surveys
    3.5 Discussion
        3.5.1 Limitations and future directions
    3.6 Conclusion

Results
    4.1 Overview
    4.2 Feature Extraction
    4.3 ERP responses
    4.4 Participant-specific offline classification results
    4.5 Participant-specific classification online results
        4.5.1 Constrained Blocks (Blocks 2-4)
        4.5.2 Unconstrained selection blocks
    4.6 Surveys

Discussion
    5.1 Overview
    5.2 Context-based corpus
    5.3 Design parameters
    5.4 Interface modifications
    5.5 Error correction
    5.6 Alternative modalities
    5.7 BCI target population

Conclusion
    6.1 Overview
    6.2 Future work
        6.2.1 Adaptively expanding the corpus
        6.2.2 Expressive communication: Taking turns
        6.2.3 Optimisation
        6.2.4 Customising the interface

Bibliography
Appendices
    Appendix A1
    Appendix A2
List of Tables

Table 1.1: Difference between active and reactive BCI
Table 3.1: Average accuracies, ITR, MI for constrained selection blocks in online sessions
Table 3.2: Selection rates, accuracies and ITR of the CI and CD blocks
Table 3.3: Selection rates, accuracies and MI of the CI and CD blocks
Table 4.1: Offline performance
Table 4.2: Average accuracies, ITR and MI for constrained blocks in online sessions
Table 4.3: Selection rates, accuracies and ITR of the CI and CD blocks
Table 4.4: Selection rates, accuracies and MI of the CI and CD blocks
Table 4.5: Completion time and number of selections of the CI and CD blocks
List of Figures

Figure 1.1: Visualisation of access solutions
Figure 1.2: The BCI cycle
Figure 1.3: GUI of the Bremen speller
Figure 1.4: Hex-O-Speller
Figure 1.5: RC, CB and RB paradigms
Figure 1.6: Chroma Speller
Figure 1.7: Familiar face stimulus
Figure 1.8: The T9 speller
Figure 2.1: Electrode configuration
Figure 2.2: The proposed interface
Figure 2.3: The GUI
Figure 2.4: Timing of events during a trial
Figure 2.5: An iterative selection trial
Figure 2.6: Offline session structure
Figure 2.7: Online session structure
Figure 2.8: CI block
Figure 3.1: Proposed interface
Figure 3.2: Timing of events during a trial
Figure 3.3: ERP responses
Figure 3.4: Topographic map of ERP response
Figure 4.1: LDA score distribution for the spatiotemporal feature set
Figure 4.2: LDA score distribution for the concatenation feature set
Figure 4.5: ERP classification accuracy versus stimulus repetition
Figure 4.6: ERP classification accuracy versus the threshold value
Figure 4.7: Comparing average character accuracy of CI and CD blocks
Figure 4.8: Comparing average ITR using CI and CD predictive spellers
Figure 4.9: Comparing average word accuracy using CD and CI predictive spellers
Figure 4.10: Comparing average MI using CD and CI predictive spellers
Figure 4.11: Comparing average completion time using CD and CI predictive spellers
Figure 4.12: Comparing average number of selections using CD and CI predictive spellers
List of Acronyms

ACC    accuracy
ALS    amyotrophic lateral sclerosis
BCI    brain-computer interface
CB     checkerboard
CD     context dependent
CI     context independent
CP     cerebral palsy
CPM    characters per minute
DSLM   dynamic stopping language model
EEG    electroencephalography
EMCP   eye movement correction procedure
EOG    electrooculogram
ERD    event-related desynchronisation
ERP    event-related potential
ErrP   error-related potential
ERS    event-related synchronisation
FF     familiar face
GIBS   gaze-independent block speller
GFF    green familiar face
GUI    graphical user interface
HMM    hidden Markov model
ISI    inter-stimulus interval
ITR    information transfer rate
LDA    linear discriminant analysis
LSC    lateral single-character speller
MI     motor imagery
MI     mutual information
MMG    mechanomyography
MRI    magnetic resonance imaging
NIRS   near-infrared spectroscopy
NLP    natural language processing
OCM    output characters per minute
OOV    out-of-vocabulary words
PBR    practical bit rate
PF     particle filtering
RB     region based
RC     row/column
SC     single character
SCP    slow cortical potentials
SNR    signal-to-noise ratio
SR     sensorimotor rhythms
SSVEP  steady-state visually evoked potential
SVM    support vector machine
TRCA   task-related component analysis
TVEP   transient visual evoked potential
WSR    word symbol rate
WPM    words per minute
VEP    visual evoked potential
Introduction
1.1 Motivation
Expressive communication entails the transmission of one’s needs and emotions to a
communication partner through body gestures, hand movements, speech or facial expressions.
However, many individuals living with severe disabilities are unable to communicate
through these channels [1].
Some technologies provide an alternative communication pathway for those individuals. For
instance, opening the mouth can be detected with infrared cameras [2], small muscle vibrations
can be measured using mechanomyography (MMG) sensors [3], or tongue protrusion can be
detected by computer vision [4]. Figure 1.1 conceptually depicts the components of such access
solutions. However, these technologies still require some physical movement and are
therefore not suitable for individuals who have severe motor impairments due to cerebral
palsy, degenerative neuromuscular conditions, or acquired brain injuries.
A brain-computer interface (BCI) is a technology which makes communication feasible
through neural activity, eliminating the need for body movement [5].
Figure 1.1: Visualisation of access solutions [1]
There are a number of methods to measure functional brain activities. Electroencephalography
(EEG), magnetic resonance imaging (MRI) and near-infrared spectroscopy (NIRS) are the
most common measurement modalities [6]. EEG signals can reflect electrocortical activity
before, during or after sensory, motor or cognitive events, known as event-related potentials
(ERP) [7].
Different types of brain signals have been used in BCI. Examples include visual evoked
potentials, slow cortical potentials and sensorimotor rhythms [8]. Each has been deployed in
different BCI applications. For instance, the visually evoked P300 potential has been used
as a control signal for spellers since 1988, when it was proposed by Farwell and Donchin [9].
Their P300 speller consisted of a 6 × 6 matrix of characters where each row and column flashed at random.
Users were asked to focus on the character they intended to spell and count the number of times
the row or column containing that character flashed. Each flash of the desired row/column
elicited a P300 brain signal; therefore, by signal detection the intended character could be
identified. Although this research yielded promising results, the BCI was very slow (2
characters/min). Since then, much research has been conducted to improve communication
rates and accuracy. Different interfaces [10]–[12], stimuli [13]–[15] and control signals [5]
have been proposed. The inclusion of language models [7], [16]–[22] has also been suggested.
Despite numerous efforts to enhance the performance of P300 spellers, further improvement
of the information transfer rate (ITR) remains an elusive challenge.
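Communication rate in this literature is typically quantified with Wolpaw's ITR formula, which combines the number of selectable targets, the selection accuracy and the selection speed. The sketch below (function name and the example figures are ours, not taken from this thesis) shows how the original 36-character speller at roughly 2 selections per minute yields fewer than 10 bits/min:

```python
import math

def wolpaw_itr(n_choices: int, accuracy: float, selections_per_min: float) -> float:
    """Information transfer rate in bits/min (Wolpaw formula).

    n_choices: number of selectable targets (e.g. 36 for a 6x6 matrix)
    accuracy: probability of a correct selection, 0 < accuracy <= 1
    selections_per_min: completed selections per minute
    """
    n, p = n_choices, accuracy
    bits = math.log2(n)  # information per selection at perfect accuracy
    if 0.0 < p < 1.0:
        # penalty for errors, assuming errors are uniform over the other targets
        bits += p * math.log2(p) + (1 - p) * math.log2((1 - p) / (n - 1))
    return bits * selections_per_min

# Hypothetical figures: 36 targets, 2 selections/min, 95% accuracy.
print(round(wolpaw_itr(36, 0.95, 2.0), 2))  # -> 9.25
```

Note how strongly accuracy matters: at 50% accuracy the same speller transmits only about 3.2 bits/min, which is why much P300 research targets accuracy as well as raw speed.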
The overall goal of this study was to investigate the potential merit of endowing a BCI with
natural language processing (NLP) capabilities. The most intuitive approach is to accelerate
communication by generating potential answers to a given question through NLP. However,
restricting a user’s response to machine-generated phrases may limit the spontaneity and
variability of natural conversation. An additional feature that provides more conversational
flexibility is to use NLP to generate context-relevant words while typing. The specific aim of
this research was thus to design, implement and evaluate an NLP-BCI communication interface
in terms of communication rate and user satisfaction.
In this chapter we will discuss the basics of BCI, application of NLP in BCI and literature on
BCI spellers.
1.2 BCI system
A BCI system is a communication pathway that does not require any muscular activity but
rather is dependent exclusively on neural activities [23]. As such, a BCI may be a suitable
alternative access pathway for people with severe motor impairments, due to conditions such
as amyotrophic lateral sclerosis (ALS), brain stroke, cervical spinal cord injury, cerebral palsy
(CP), or muscular dystrophies [7].
1.3 BCI cycle
The first step in a BCI cycle is to measure brain signals, for example through EEG or NIRS.
The choice of modality depends on the BCI application and the mental task used for control.
EEG measures the summation of electrical activity at the scalp, caused primarily by synaptic
activity in the upper layers of the cortex [24], whereas NIRS is an optical spectroscopy method
that measures the hemodynamic response during neural activity by irradiating near-infrared
light through the skull [25]. The next step is pre-processing, which is necessary due to the low
signal-to-noise ratio (SNR) of the brain signals. The SNR is low because the signals cross
various skull layers and are contaminated by background noise from inside the brain and
externally over the scalp [26], [27]. This step will maximise the probability of detecting task-
related brain activity. After pre-processing, discriminative features must be extracted, which is
very challenging as there are many irrelevant and confounding brain activities [8]. Feature
engineering is critical for avoiding the curse of dimensionality, that is, for creating a lower-
dimensional feature vector without loss of relevant information [28]. The implementation of the
classification step depends on the application and data; here, one algorithmically categorises
the mental state of the user on the basis of the extracted features. The detected mental state is
subsequently used to control an external device, such as a wheelchair, a speller, etc. The BCI
cycle concludes with the user’s perception of the output. Figure 1.2 provides a schematic
summary of the BCI cycle.
Figure 1.2: The BCI cycle [25]
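The stages of the cycle can be sketched end-to-end in code. The following is a minimal, self-contained illustration on synthetic data (the sampling rate, filter band, bump shape and all other parameters are invented for the example, not taken from this thesis): a simulated single-channel epoch, band-pass filtering as pre-processing, downsampling as a crude feature-extraction step, and a two-class Fisher discriminant, the core of LDA, trained from scratch.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

rng = np.random.default_rng(0)
FS = 250  # Hz, hypothetical sampling rate

def make_epoch(target: bool) -> np.ndarray:
    """Simulate an 800 ms single-channel epoch; targets get a P300-like bump."""
    t = np.arange(0, 0.8, 1 / FS)                 # 200 samples
    x = rng.normal(0.0, 1.0, t.size)              # background "EEG" noise
    if target:
        x += 2.0 * np.exp(-((t - 0.3) ** 2) / (2 * 0.05 ** 2))  # bump near 300 ms
    return x

SOS = butter(4, [1.0, 12.0], btype="band", fs=FS, output="sos")

def preprocess(x: np.ndarray) -> np.ndarray:
    """Band-pass filter to raise the SNR of the slow ERP component."""
    return sosfiltfilt(SOS, x)

def features(x: np.ndarray) -> np.ndarray:
    """Average every 10 samples: a low-dimensional feature vector."""
    return x.reshape(-1, 10).mean(axis=1)         # 200 samples -> 20 features

# "Classification": fit a two-class Fisher discriminant on labelled epochs.
X = np.array([features(preprocess(make_epoch(i % 2 == 0))) for i in range(200)])
y = np.array([i % 2 == 0 for i in range(200)])
mu_t, mu_n = X[y].mean(axis=0), X[~y].mean(axis=0)
cov = np.cov(X.T) + 1e-6 * np.eye(X.shape[1])     # feature covariance, regularised
w = np.linalg.solve(cov, mu_t - mu_n)             # discriminant direction
threshold = w @ (mu_t + mu_n) / 2

def classify(epoch: np.ndarray) -> bool:
    """True if the epoch is judged to contain the target response."""
    return bool(w @ features(preprocess(epoch)) > threshold)

hits = sum(classify(make_epoch(True)) for _ in range(50))
misses = sum(classify(make_epoch(False)) for _ in range(50))
print(f"targets detected: {hits}/50, false positives: {misses}/50")
```

In a real BCI the classifier output would then drive the device (e.g. select a character), closing the cycle with the user's perception of that output.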
1.4 BCI Types and Control Signals
BCIs can be either invasive or non-invasive. Although invasive BCIs offer substantially higher
signal-to-noise ratios and spatial resolution, their clinical translation to date is very limited. We
will therefore focus exclusively on non-invasive BCIs from here on.
Non-invasive EEG BCIs can be divided into two categories based on the brain signals used for
control: active (endogenous) BCIs and reactive (exogenous) BCIs. The former require the user
to actively engage in a cognitive activity; examples include BCIs using Slow Cortical Potentials
(SCP) [29] and Sensorimotor Rhythms (SR) [30]. Exogenous BCIs, on the other hand, rely on
the brain activity associated with the user's natural reaction to an external stimulus [31].
Examples of exogenous BCIs include those based on Visual Evoked Potentials (VEP), the P300
and Steady State Visual Evoked Potentials (SSVEP).
Compared to reactive BCIs, active BCIs give the user more control over the system [26].
However, they require extensive training before the user gains sufficient command of the BCI
[32]. Common examples of active control tasks are mental arithmetic and mental singing, which
activate the prefrontal cortex. These tasks are usually unintuitive, however, since they bear no
relation to the output command [33]. Motor Imagery (MI) is another commonly used active
control signal, which involves imagining the movement of body parts without overt execution
[34]. This cognitive task produces amplitude modulations in the sensorimotor rhythms, known
as Event Related Desynchronisation (ERD) and Event Related Synchronisation (ERS), which
can be used to infer which body part the user imagined moving and to translate that to a specific
output command [35]. During MI training, it is crucial to emphasise kinaesthetic experience
rather than visual imagery of movement; kinaesthetic imagery can be challenging for
individuals who have experienced a stroke or lost control of their limbs [36].
One of the common reactive BCI control signals is the SSVEP. The SSVEP is a type of VEP,
which are fluctuations in visual cortex activity in response to a visual stimulus [37]. Depending
on the frequency of the stimulus, VEPs are separated into two groups: Transient VEPs (TVEP)
for frequencies below 6 Hz and SSVEPs for higher frequencies [8]. VEPs are usually elicited
by LEDs flashing at different frequencies. This stimulus requires the user to visually fixate on
a flashing light source, eliciting a brain response at the same frequency and its harmonics [38].
SSVEP BCIs can be used for spatial navigation [39] and achieve relatively high information
transfer rates (ITR); however, they pose the risk of inducing seizures [40].
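The fixed-frequency structure of the SSVEP makes decoding conceptually simple: compare spectral power at each candidate flicker frequency (and its harmonics) and pick the largest. A minimal sketch on synthetic data follows; the sampling rate, candidate frequencies and amplitudes are invented for illustration, and real systems use more robust detectors such as canonical correlation analysis.

```python
import numpy as np

FS = 250                                # Hz, hypothetical sampling rate
STIM_FREQS = [8.0, 10.0, 12.5, 15.0]    # one flicker frequency per target

def detect_ssvep(signal: np.ndarray) -> float:
    """Return the candidate frequency with the most spectral power,
    summing power at the fundamental and its second harmonic."""
    spectrum = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(signal.size, d=1 / FS)

    def power_at(f: float) -> float:
        return spectrum[np.argmin(np.abs(freqs - f))]   # nearest FFT bin

    return max(STIM_FREQS, key=lambda f: power_at(f) + power_at(2 * f))

# Simulate 2 s of "EEG" while the user fixates the 12.5 Hz target:
# a response at the stimulus frequency, a weaker harmonic, plus noise.
rng = np.random.default_rng(3)
t = np.arange(0, 2, 1 / FS)
eeg = (np.sin(2 * np.pi * 12.5 * t)
       + 0.3 * np.sin(2 * np.pi * 25.0 * t)
       + rng.normal(0.0, 1.0, t.size))
print(detect_ssvep(eeg))  # -> 12.5
```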
Table 1.1 summarises the differences between active and reactive BCIs.
Table 1.1: Difference between active and reactive BCI [28]

Active BCI
  Brain signals: Slow Cortical Potentials (SCPs); sensorimotor rhythms
  Advantages: does not require computer stimuli; can be operated freely at will; can be
    used by individuals with sensory impairments; suitable for cursor control applications
  Drawbacks: time-consuming training; not all users are able to obtain control;
    multichannel EEG recordings required for good performance; lower bit rate
    (20-30 bits/min)

Reactive BCI
  Brain signals: P300; Steady State Visual Evoked Potentials (SSVEP)
  Advantages: minimal training required; control signal set up easily and quickly; high
    bit rate (60 bits/min); only one EEG channel required
  Drawbacks: requires sustained attention to external stimuli; may cause visual and
    mental fatigue
1.5 P300 Response
Another widely used reactive control signal is the P300 response [41], an ERP elicited by an
infrequent target stimulus within a series of frequent stimuli, known as the oddball paradigm
[7]. This ERP manifests as a positive deflection in the EEG signal. P300 potentials can be
separated into two groups, P3a and P3b, which differ in latency and scalp topography. The P3a
originates from the frontal area as a result of attention mechanisms during task processing, with
a latency of 250-280 ms, while the P3b originates from the parietal lobe and is associated with
attention and subsequent memory processing, with a latency of 250-500 ms [42].
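In the matrix-speller form of the oddball paradigm, the rarity that elicits the P300 comes from the flashing scheme itself: each repetition flashes all six rows and six columns once in random order, so only 2 of every 12 flashes contain the attended character. A small sketch of such a stimulus schedule (layout and names are ours):

```python
import random

ROWS, COLS = 6, 6  # the classic 6 x 6 character matrix

def flash_sequence(n_repetitions: int, seed: int = 0):
    """Each repetition flashes all 6 rows and 6 columns once, in random order."""
    rng = random.Random(seed)
    flashes = []
    for _ in range(n_repetitions):
        block = [("row", i) for i in range(ROWS)] + [("col", j) for j in range(COLS)]
        rng.shuffle(block)
        flashes.extend(block)
    return flashes

# Suppose the user attends the character at row 2, column 4: exactly two of
# the twelve flashes in every repetition are "targets" for that character.
target = {("row", 2), ("col", 4)}
seq = flash_sequence(10)
rarity = sum(f in target for f in seq) / len(seq)
print(rarity)  # -> 2/12, i.e. about 0.167
```

Because the target flashes are both rare and unpredictable in order, each one should elicit a P300, which is exactly the property the classifier exploits.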
This phenomenon was first characterised in 1964 by Chapman and Bragdon [43]. Later, in 1965,
Sutton et al. further explored this positive deflection by presenting participants with a series of
stimuli, either a flashing light or a sound [44]. First, a sound or flash stimulus was presented as
a cue; then, after three to five seconds (randomly selected), a test stimulus followed. Some of
the test cases were predetermined, i.e. the participant was certain of the type of test stimulus
(flash or sound). In other test cases, the participant was uncertain about the modality of the test
stimulus and was asked to predict the type of the upcoming test stimulus in the interval between
cueing and testing. The study found that in the uncertain case a peak in the EEG waveform
occurred approximately 200 ms after the stimulus, and that the amplitude of this peak was
modulated by the probability of the stimulus; that is, a less probable stimulus resulted in a larger
peak [44]. As previously mentioned, the P300 response is known to be associated with attention
and memory processes [42]. According to the "context updating theory" [45], the P300 response
is generated by the updating of working memory when the current event differs from the
previous one. The less frequent stimulus is usually referred to as the target stimulus, while the
more expected stimulus is known as the non-target. Some studies have focused on the effect of
target probability on the P300 response, showing that the response is enhanced when the target
is infrequent and therefore less expected by the user [46], [47]. Another factor that can modulate
the P300 response is the order of stimuli, i.e. whether a target occurs immediately after a
previous target; it has also been suggested that P300 responses can occur even when target and
non-target stimuli are equally frequent [45], [48].
Different stimulus modalities can be used to elicit P300 responses, as the early studies have
shown [44]. The stimulus can be visual, where users are shown a series of n items flashing
sequentially in random order and are asked to focus on one specific item [9]. The detected ERP
can then be translated to control an external device, such as a robotic arm [49] or a cursor on
the screen [50]. To eliminate the need for functional vision, alternative stimuli (e.g., tactile [51]
and auditory [52]) have also been considered. However, these alternative stimulus modalities
have elicited weaker ERP responses and achieved lower classification rates compared to visual
stimuli [53]–[55]. A combination of modalities has been suggested as a solution to this problem;
a hybrid auditory-tactile BCI study in [56] demonstrated improvement in transfer rates by
exploiting multiple brain responses from SSVEP and P300 modalities.
In the rest of this chapter, we will focus on previous studies of BCI spellers and delve into
research conducted on P300 spellers as they form the basis of our proposed NLP-BCI system.
1.6 BCI speller
One application of brain computer interfaces is BCI speller. A BCI speller is a communication
device that enables individuals with motor and speech difficulties to communicate through a
graphical user interface (GUI). Through brain signal recordings and analysis, the user selects
his/her desired characters from the screen [7].
BCI spellers are similar to typical keyboards with the main difference being the method of
typing. While with regular keyboards, users press each button to produce the corresponding
letter on the screen, in a BCI speller, users simply select characters through cognitive activity.
Three control signals studied to interact with a BCI speller are P300, SSVEP and Motor
Imagery (MI). P300 spellers, as we will discuss further in the next section, consist of a series of
stimuli where the user has to focus on a specific cue (target). The occurrence of the target
stimulus in a random manner manifests as a positive deflection in the EEG signal that can be
classified using machine learning to determine the user’s desired character [57].
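The target/non-target averaging that underlies P300 detection can be sketched as follows. This is an illustration only: the signal values, onset times and function name are synthetic and not taken from any cited study.

```python
def average_erp(eeg, onsets, n_samples):
    """Average fixed-length epochs extracted at the given stimulus onsets.

    eeg       : list of single-channel EEG samples (e.g. from electrode Pz)
    onsets    : sample indices of stimulus onsets
    n_samples : epoch length in samples after each onset
    """
    epochs = [eeg[t:t + n_samples] for t in onsets if t + n_samples <= len(eeg)]
    return [sum(col) / len(epochs) for col in zip(*epochs)]

# Toy demonstration at 1000 Hz: a synthetic positive deflection roughly
# 300 ms after each target flash, and none after non-target flashes.
eeg = [0.0] * 20000
targets, nontargets = [1000, 5000, 9000], [3000, 7000, 11000]
for t in targets:
    for i in range(t + 280, t + 320):
        eeg[i] += 5.0  # simulated P300 peak

target_avg = average_erp(eeg, targets, 800)
nontarget_avg = average_erp(eeg, nontargets, 800)
# target_avg shows a peak near sample 300 (~300 ms); nontarget_avg stays flat.
```

In practice, averaging over repeated flashes suppresses background EEG activity so the small P300 deflection becomes separable by a classifier.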
1.6.1 SSVEP Spellers
SSVEP spellers are controlled by gazing at light sources that flicker at different frequencies. One
of the early SSVEP spellers was the Bremen-BCI speller [58], which consisted of a 32-character
diamond-shaped grid with five command buttons: four arrows and one select button. Each of
these five control buttons flickered at a different frequency. The cursor was at the middle of
the screen by default. In order to make a selection, the user had to gaze at the arrow buttons to
move the cursor in their desired direction followed by fixating on the select button. Figure 1.3
shows the Bremen speller. Later, Volosyak et al. used a built-in dictionary to accelerate the
prediction process and boosted the ITR from 25.67 bits/minute to 32.71 bits/minute [59]. This
was the first SSVEP speller with predictive spelling. After further improvements in the signal
processing phase, the transfer rate was increased to an average of 61.70 bits/minute in a test
with seven participants [58]. Many other studies have been conducted on the effect of the
number of stimuli and different GUIs on the performance of SSVEP spellers such as Wang et
al. [60] who increased the number of target stimuli to sixteen and gained on average 75.4
bits/minute and 97.2% accuracy. Similarly, the increased number of stimuli and the use of
spatial filters to remove background noise have led to higher bit rates [61]. For more
information on SSVEP spellers please refer to [7].
Figure 1.3: GUI of the Bremen speller [58].
Some studies have tested hybrid BCI spellers and exploited ERPs elicited by different stimulus
modalities to boost BCI performance [5], [62].
A limitation of the discussed spellers so far is gaze-dependency; the user must have control of
his or her gaze in order to interact with such systems [9]. This is known as overt attention.
Some studies have invoked covert attention with BCI spellers. Such interfaces do not require
fixation of gaze, thereby minimising ocular movement by using alternative features in colour
and shape to localise stimuli in a single, central location [63]–[67]. These techniques however,
still require functional sight. Auditory and tactile stimuli are two complete gaze independent
alternatives [68]–[70]. However, these BCIs are typically characterized by much lower ITRs
than their visual counterparts simply because of longer stimulus presentation times. An
alternative solution to the gaze dependency issue could be MI spellers.
1.6.2 MI Spellers
MI spellers are controlled by imagining movement of different body parts and are therefore
considered as active gaze-independent BCI spellers. One of the early MI spellers was the
Hex-O-Speller by Blankertz et al. [71], which consisted of a two-step process. As depicted in Figure
1.4, six hexagons each with five characters were arranged on the screen with an arrow that was
used as a region selector. The user had to imagine right hand or foot movement to select one
of the hexagons. After a region was selected, the five characters were spread each in one
hexagon and the same process continued for character selection. Although this paradigm offered a
good solution for gaze independence, its disadvantages included extended user training, mental
fatigue and slower transfer rates [7].
For more information on MI spellers refer to [7].
Figure 1.4: Hex-O-Speller. Each region was selected by imagining right-hand or foot movement and moving the
pointer [7].
1.6.3 P300 Spellers
As mentioned earlier, to maximise detectability of the P300, there should be a notable signal
difference between target and non-target event-related potentials. Usually, the user interacts
with a visual interface on a computer screen. The most well-known is the Row/Column (RC)
paradigm introduced by Farwell and Donchin in 1988 [9], as depicted in Figure 1.5.A.
This paradigm consisted of six rows and six columns including twenty-six alphabet letters and
ten digits. In this typical interface, each row and column flashed at random while the user
fixated on the desired character, counting the number of times it flashed. Each time the
corresponding row or column flashed, a peak in the user’s brain signal occurred, whereas,
flashing of the non-target rows/columns ideally did not elicit such changes in the brain signal.
This signal difference makes it feasible to detect the desired row and column and therefore, the
desired character. This study achieved a maximum accuracy of 95% and transfer rate of 12
bits/minute with four typically developed participants [9]. One advantage of this system was
that no user training was required. However, there were several issues that limited its clinical
utility.
The attention span, levels of fatigue and motivation, and mental state of the participant directly
affect BCI performance. Käthner et al. [72] argue that high workload conditions attenuate the
P300 amplitude, underscoring the need for careful selection of stimuli and Inter-Stimulus
Intervals (ISI) [73]. Further, calibration is typically required as each user has slightly different
evoked brain response patterns. Repetition blindness, habituation and artefacts, can also
diminish real-time accuracies. To overcome these limitations, Treder et al. evaluated three
different variants of fast-paced, gaze independent visual spellers [74]. Participants could use
covert spatial attention, non-spatial feature attention (i.e., attention to colour and form) in two
paradigms, and overt attention in the third paradigm. Mean symbol selection accuracies of 85–
90% were achieved with thirty symbols, suggesting that overt attention is not necessary for
highly accurate responses. Other studies have investigated the effect of different matrix sizes
and concluded that performance decreases as symbol size is reduced [75]. Salvaris et al.
showed that a green and blue chromatic flicker matrix offers better performance than a black
and grey one [76].
Farwell and Donchin’s RC speller became the basis of most subsequent P300 spellers, which were
developed to improve the system’s speed, classification accuracy and user-friendliness. Below
we discuss alternative paradigms as depicted in Figure 1.5 that have addressed some of the
shortcomings of the initial proposed interface.
Single character (SC)
As an alternative to the regular RC interface, Guger et al. suggested flashing one character at a
time (Figure 1.5.B) [77]. Although this interface has the advantage of captivating user attention
during the experiment and therefore eliciting higher P300 amplitudes, it is slower compared to
the RC paradigm. SC flashing inevitably lengthens the time required to detect the target
character. To be more specific, it was shown in [77] that with a 60 ms flash and a 40 ms ISI
period, 54 seconds are needed to flash each character fifteen times with a 6 × 6 matrix. On the
other hand, with a 100 ms flash and a 60 ms ISI, the RC interface requires 28.8 seconds to
present thirty flashes of each character. The SC paradigm is therefore roughly twice as slow as the
RC paradigm. With nineteen participants, mean accuracies were 85.3% for RC and 77.9%
for SC.
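The timing figures quoted from [77] follow directly from the flash and ISI durations; a short sketch reproduces the arithmetic (the function name is ours):

```python
def selection_time_s(flashes_per_repetition, repetitions, flash_ms, isi_ms):
    """Total stimulus-presentation time for one character selection, in seconds."""
    return flashes_per_repetition * repetitions * (flash_ms + isi_ms) / 1000.0

# SC paradigm: all 36 cells flash individually, 15 repetitions,
# 60 ms flash + 40 ms ISI.
sc = selection_time_s(36, 15, 60, 40)    # 54.0 s
# RC paradigm: 6 rows + 6 columns flash, 15 repetitions (so each character
# is intensified 30 times), 100 ms flash + 60 ms ISI.
rc = selection_time_s(12, 15, 100, 60)   # 28.8 s
```

The RC paradigm's advantage comes from each flash covering six characters at once, so far fewer flashes are needed per selection.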
Checkerboard (CB)
One issue with the RC interface is that adjacent cells flash simultaneously. This is a source of
distraction as the non-target responses may appear as target responses. It has been discovered
that when such distraction occurs in the RC paradigm, the majority of incorrect selections lie
in the same row or column as that of the desired character [77]. An alternative approach to
mitigate such drawbacks is the checkerboard (CB) as depicted in Figure 1.5.C & D. The rows
and columns of the matrix are disassociated in the CB paradigm and Townsend et al. [78]
demonstrated that this disassociation enhances the performance by reducing distraction. As
depicted, the CB paradigm was an 8 × 9 matrix superimposed on a checkerboard. The items
were randomly placed in the white and black squares. Since these matrices were disassociated,
adjacent flashes did not occur. After populating the matrix at random with the items, they
flashed sequentially in the following order: white rows, black rows, white columns, and finally
black columns. After the first sequence ended, the matrices were repopulated at random again
and the next sequence occurred. Another advantage of this paradigm over the RC is its capacity
for more on-screen squares (72 vs. 36), which decreases the probability of target character
occurrence and therefore increases the amplitude of the elicited P300 during the oddball
paradigm. This paradigm was tested on eighteen participants, yielding a mean accuracy of 92%
and a mean bit rate of 23 bits/minute.
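The flash-group construction described above can be sketched as follows. The virtual 6 × 6 arrangement of each colour's cells follows the published design of Townsend et al. [78]; the function name and item encoding are our own illustration.

```python
import random

def checkerboard_flash_groups(items, seed=None):
    """Construct CB flash groups for a 72-item (8 x 9) grid.

    Items are scattered at random over the 36 white and 36 black squares;
    each colour is then treated as a virtual 6 x 6 matrix whose rows and
    columns define the flash groups, presented in the order: white rows,
    black rows, white columns, black columns.  Squares of one checkerboard
    colour are never adjacent on screen, so no group flashes neighbouring cells.
    """
    items = list(items)
    assert len(items) == 72
    random.Random(seed).shuffle(items)
    white, black = items[:36], items[36:]
    rows = [m[i * 6:(i + 1) * 6] for m in (white, black) for i in range(6)]
    cols = [m[j::6] for m in (white, black) for j in range(6)]
    return rows + cols  # 24 groups of six items each

groups = checkerboard_flash_groups(range(72), seed=1)
# Every item appears in exactly two groups: one virtual row and one virtual column.
```

Repopulating the matrix between sequences amounts to calling the function again with a different seed, which is what makes the flash order unpredictable to the user.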
Improved Checkerboard design
A number of studies have attempted to improve the baseline checkerboard. Lakey et al. [79]
studied the effects of attentional resources and demonstrated that mindfulness induction
significantly improved classification accuracy over a non-induction control group in the RC
and CB paradigms. Another study showed that the CB paradigm could be further improved by
suppressing and not flashing the items surrounding the attended item during calibration. Online
results showed that this suppression calibration method leads to enhanced performance
compared to the standard CB paradigm.
Region-Based
Fazel-Rezai et al. suggested a region-based (RB), two-level paradigm, as depicted in Figure
1.5 panels E and F, where all the characters were divided into several regions [80]. In the first
level, the user focused on the desired character while all the regions flashed. After several
flashes of each group, the selected group was detected. Afterwards, in the second level, each
character in the group flashed until the selected character was identified. It was shown in [80]
and [81] that this paradigm significantly decreased, human error and the adjacency problem. It
was found that the overall spelling accuracies averaged for the same set of subjects, trials, and
characters for RC, SC, and two variations of RB paradigms were 85%, 72.2%, 90.6%, and
86.1%, respectively [81].
The RB and CB paradigms were new directions in P300 BCI research that produced superior
performance over the traditional RC approach.
A number of other interfaces that deviate from the standard RC P300 speller have been
suggested to mitigate the adjacency problem and gaze dependency. These include the Chroma
speller [64], Geospell [65], Gaze Independent Block Speller (GIBS) [67], Lateral Single
Character Speller (LSC) [82] and T9 [17].
The Chroma speller for instance, was designed to eliminate gaze dependency by having each
row in a distinctive colour. A letter was selected based on a two-step process. The user needed
to focus on the colour of the desired row rather than the specific letter. After a row had been
selected, the corresponding letters were spread in each colourful row and a single letter was
selected in a similar manner. This two-stage paradigm is suitable for ALS patients, as they may
have limited oculomotor control. However, this system has not yet been tested on ALS
individuals [64]. Figure 1.6 illustrates this two-stage selection speller.
Some of these paradigms were later integrated with a predictive speller to improve efficiency,
as detailed in the next section.
Figure 1.5: A: Rows and columns are flashed. B: A single character is flashed. C&D: Checkerboard paradigm.
E&F: Region-based, two level selection, one region is expanded at the second level selection. [55]
Figure 1.6: Chroma Speller. At the first stage, a row is selected by focusing on its colour. After a row has been
selected, the blue row for instance, letters of that row spread among the different rows and the same process
occurs to make a single letter selection [7].
Alternative stimuli
Although flashing stimuli are the most typical and common in P300 spellers, there have been
studies suggesting alternative stimuli. For instance, Guo et al. studied the performance of a
virtual keyboard that deployed a moving vertical bar as a stimulus instead of flashing [83]. In
this interface, a vertical bar appeared below each key and moved leftward at random intervals.
This study showed that moving stimuli can elicit strong P300 ERPs for offline studies. Another
study compared the performance of the flash stimuli against that of moving stimuli on the
typical 6 × 6 matrix [84] in an offline paradigm. This work concluded that the moving stimuli
elicit a stronger P300 signal than the flashing stimuli. An online comparison of these two
stimuli was presented in [85], where twelve participants interacted with an online P300
interface subsequent to individual offline calibration (selecting among six letters). A noticeably
high average transfer rate was achieved (42.1 bits/min) with motion-onset visually evoked
potentials.
In another study, Jin et al. compared three types of stimuli: the typical flash, the vertically
moving stimulus and a combination of the two. Ten individuals participated in this study and all
had better performance with the hybrid stimuli compared to flashing or moving stimuli alone
[86].
An alternative stimulus suggested by Kaufmann et al. was familiar faces [87]. In this variation
of the P300 speller, familiar faces were transparently superimposed on the letters of the P300
matrix (see Figure 1.7 panel A). This type of stimulus elicited other ERPs such as N200 and
N400f (“f” for face) which were negative peaks roughly 200 and 400 milliseconds following
stimulus presentation. The latter negative peak originates from the inferior temporal gyrus
which is associated with visual stimuli processing, object recognition and face perception
[88]. The appearance of additional ERPs facilitated detection and therefore increased the
transfer rate.
In another study, Kaufmann et al. implemented two simultaneous stimuli where some regions
were illuminated with a familiar face and others with a symbol [89]. This two-stimulus
paradigm achieved noticeably higher transfer rates (~80 bits/minute), but at reduced accuracy
(81.25%). Nonetheless, this finding suggests that there is still potential to enhance the speed of
P300 BCI spellers.
Studies have shown colour sensitivities in the parietal, occipital and temporal areas [90].
Further work has studied the influence of chromatic properties on the familiar-face stimulus,
demonstrating that a green-coloured face stimulus elicited higher-amplitude P300 ERPs [15].
Figure 1.7 panel B shows this stimulus.
Figure 1.7: A) Familiar-face stimulus. B) Green familiar-face stimulus [15].
In summary, recent studies show that alternative stimuli which elicit stronger ERPs yield better
performance and can be considered substitutes for the canonical flashing stimulus.
1.7 Combination of NLP and P300 Spellers
Natural language has been studied for many years in the domains of linguistics [36],
machine translation [6] and speech recognition [3]. However, language models have only
recently been integrated into the BCI domain [39]. The most common use of NLP in the field of BCI
is in P300 spellers. Language models can be exploited for word completion [38], signal
classification, and error correction [39], ultimately increasing communication rate [20].
1.7.1 NLP for word completion
Donchin et al. mentioned “substantial sequential dependencies in English” which could be
leveraged in classification [91]. Including known patterns and structure of language in a BCI
communication system can effectively improve the spelling rate, accuracy, and error
correction. Ryan et al. [21] added a spelling checker to the standard CB paradigm consisting
of an 8 × 9 matrix and reported increased typing speed. They used the output of the P300
speller as input to an assistive word completion software, WordQ2 (version 2.5, Quillsoft, Ltd,
Toronto, ON), which references a dictionary for potential word suggestions. The top
suggestions were sent back to the user for selection according to their number in the suggestion
list. This application is similar to the word completion suggestions on a smart phone.
Interestingly, accuracy decreased as the task and interface became increasingly complicated.
However, the spelling speed increased since complete words could be typed with fewer
selections. The following year, Kaufmann et al. combined a similar dictionary lookup method
with the P300 speller for German words [22]. This approach mitigated the workload issue of
Ryan et al. as it included the suggested words in the same matrix as that for selecting characters.
In this study, word suggestions were listed by looking up German webpages sorted by the
number of repetitions. The algorithm searched this list after a few letters were selected by the
user and presented the top six matches. These top matches were then presented in a column in
the P300 speller. A delete button was also included in the interface in case none of the
suggestions were correct and the user wanted to go back to typing mode. Later, Akram et al.
suggested an interface similar to that proposed by Ryan et al. However, instead of selecting the
number of the desired word from the 8 × 9 matrix, the interface switched to a 3 × 3 matrix of
numbers, each corresponding to one of the nine suggested words [17]. This interface was later
embedded into one single T9 paradigm [18]. This integration may have reduced the complexity
caused by switching between interfaces; however, the two-step selection of letters and then
words can be confusing for users, especially now that smartphones no longer use the T9 interface.
Another issue was that each selection corresponded to at least three letters and as more
selections were made, the combination of possible target letters increased. Also, since the
suggestions were only shown when the number of words retrieved from the dictionary was
nine or fewer, the system’s transfer rate was limited. Figure 1.8 demonstrates this interface.
A recent study by Guy et al. used smiley faces as stimuli in a matrix speller and implemented
a word prediction dictionary with the Presage library to present the top ten suggested words on
the right side of the keyboard [19]. This interface was tested on twenty ALS participants and
65% of them gained above 95% accuracy with 5.04 correct symbols/minute.
Clearly, integration of predictive spellers can reduce frustration, mental demand and selection
time. These systems make predictions by simply checking selected letters against dictionary
entries. When the letters do not match any sequences in the dictionary, the system will change
the letter sequence to match one in the dictionary. However, an issue of this model is that it
cannot manage Out Of Vocabulary (OOV) words. Also, none of the studies mentioned in this
section have included prior knowledge of natural language in their classifiers; that is, in the
classification step, an equal probability for all the cells was assumed. However, based on the
letters already selected, some prior assumptions can be made; e.g., the probability of selecting
the letter “u” after “q” is higher than that of any other letter.
In the next section we focus on previous studies that have included this prior knowledge in
their classification process.
Figure 1.8: The T9 P300 speller. A) At the first step the user had to focus on the numbers associated with the
desired letter. The predictive speller searched for words starting with the selected letters. B) When these
suggestions numbered nine or fewer, they were presented on the screen and indexed numerically. C, D) The user
focused on the target number.
1.7.2 NLP for language models in classification
Language models attempt to model character patterns based on corpora of existing text.
These models provide a probability distribution for target characters based on previous
selections, which can be used as a prior probability for future selections. The simplest of such
models captures patterns by finding the relative frequency of n-grams, i.e. sequences of $n$
consecutive characters. These models are created by parsing through a corpus of text and
counting the number of occurrences of these sequences. The conditional probability of a
character, $x_t$, given the previous $n-1$ characters can then be computed as [92]:

$$p(x_t \mid x_{t-1}, \ldots, x_{t-n+1}) = \frac{c(x_t, x_{t-1}, \ldots, x_{t-n+1})}{c(x_{t-1}, \ldots, x_{t-n+1})} \tag{1.1}$$

where $c(x_t, \ldots, x_{t-n+1})$ is the number of occurrences of the character sequence
$x_t x_{t-1} \ldots x_{t-n+1}$. The number of n-grams is exponential in $n$; therefore, the algorithm will be
slow in real-time classification. Many studies have used n-grams, specifically bigrams [92],
[93], and trigrams [92], [94], [95] as language models to improve the classification of ERPs
using naïve Bayes and the Hidden Markov Model (HMM). The latter is a representation of
processes that cannot be directly observed, but which can be predicted by state-dependent
output. The objective of an HMM is to determine the optimal sequence of states that may have
produced a certain outcome [96]. A typed word is interpreted as a sequence of states of the
process 𝒙 = (𝑥0, … , 𝑥𝑛) that can only be indirectly observed through the recorded EEG data.
The goal is to determine 𝒙 by observing the EEG data [73], [97], [98].
More recently, Speier et al. used Particle Filtering (PF) to compute prior probabilities and
correct errors through statistical modelling [20]. This classifier computes the probability
distribution over possible outputs by sampling a batch of possible realisations, i.e. a batch of
potential output strings typed by the user. Each of these particles moves through the model
independently based on the transition probabilities [20]. This model is useful when the
estimation of the probability distribution over all possible strings in real-time is
computationally intractable.
1.7.3 Performance metrics
The most commonly used metrics are accuracy, which is the number of correct selections over
the total number of selections, and ITR, which is the number of error-free bits per minute [26].
ITR is computed from the average information conveyed by each selection, B, defined as the
mutual information between the selection y and the target character x, divided by the selection time [99].
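As an illustration, the Wolpaw approximation of the bits per selection is commonly used in BCI studies; it assumes all N targets are equally likely a priori and that errors are spread uniformly over the N − 1 wrong targets. This is a standard simplification and not necessarily the exact ITR formula used in this thesis (which is given in the next chapter).

```python
import math

def bits_per_selection(n_targets, accuracy):
    """Wolpaw estimate of the information (in bits) carried by one selection.

    Assumes all N targets are equally likely a priori and that errors are
    distributed uniformly over the N - 1 incorrect targets.
    """
    n, p = n_targets, accuracy
    if p == 1.0:
        return math.log2(n)
    return (math.log2(n)
            + p * math.log2(p)
            + (1 - p) * math.log2((1 - p) / (n - 1)))

def itr_bits_per_minute(n_targets, accuracy, seconds_per_selection):
    """Information transfer rate in bits/minute."""
    return bits_per_selection(n_targets, accuracy) * 60.0 / seconds_per_selection
```

For example, a 36-target speller at 95% accuracy with one selection every 20 s yields roughly 13.9 bits/minute under these assumptions.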
Written Symbol Rate (WSR)
Another metric used is written symbol rate [100]. First, the Symbol Rate (SR) is computed as
the bits per trial scaled by its maximum possible value, $\log_2 N$, where $N$ is the number of
possible targets. SR is considered the probability of a correct selection. This metric is not
suitable for cases when multiple decisions are required for a correct selection. The average
number of selections necessary to choose one character is then found by determining the
number of additional selections required for correcting errors. WSR becomes zero if the
number of errors exceeds the number of correct selections, i.e. $SR \le 0.5$ [101]:

$$WSR = \begin{cases} \dfrac{2\,SR - 1}{T} & SR > 0.5 \\[4pt] 0 & SR \le 0.5 \end{cases} \tag{1.2}$$

where $T$ is the time per selection.
Practical Bit Rate (PBR)
Practical bit rate simulates error correction and uses accuracy ($P$) instead of SR. It divides
the bits of information in a single correct selection (assuming all characters have equal
probability) by the average number of selections needed to make one correct selection [78], [101]:

$$PBR = \begin{cases} \dfrac{(2P - 1)\log_2 N}{T} & P > 0.5 \\[4pt] 0 & P \le 0.5 \end{cases} \tag{1.3}$$
Characters per Minute (CPM)
Characters per minute is similar to PBR, with the difference that it does not consider the size of
the grid [101]:

$$CPM = \begin{cases} \dfrac{2P - 1}{T} & P > 0.5 \\[4pt] 0 & P \le 0.5 \end{cases} \tag{1.4}$$
Output Character per Minute (OCM)
This metric is only suitable for cases that require the user to correct all errors. It is computed
by dividing the total number of characters by the time required to type them [101].
Mutual Information (MI)
This is a similar metric to ITR with the difference that it does not assume that all selections are
equally likely and considers the accuracy at a word level, eliminating the issue of longer words
transferring more information [101]. The formula for ITR and MI can be found in section five
of the next chapter.
1.8 Project Overview
Based on previous studies, P300 spellers seem to be a promising solution to communication
challenges. The implementation of various visual features in the interface, such as colour, stimulus
type and selection process, has improved ERP responses and therefore performance. Later
studies have investigated the effects of integrating predictive spellers and language models.
However, to the best of our knowledge, none of these studies has implemented a context-aware
P300 speller to further facilitate communication. To this end, this thesis focused on the
development of a context-dependent P300 speller in a question and answer context.
1.9 Research Questions and Objectives
This research aimed to answer the following question:
What magnitude of change, if any, in BCI classification accuracy and bit rate, can be achieved
through the combination of a P300 BCI, context relevant predictive speller and an answer
generation engine in a single adjacency pair conversation?
An adjacency pair is an organizational unit of conversation, consisting of two utterances in
succession, by two conversation partners. A question posed by one speaker followed by an
answer from the other is an example of an adjacency pair, i.e. a type of conversational turn-
taking.
The objectives of this study were threefold:
1- Design an offline NLP-BCI interface for a question and answer context with an accuracy of
at least 70%
2- Implement this interface online with a minimum accuracy of 70%
3- Contrast the performance of the proposed system against that of previous relevant research
This study consisted of two main technical components: natural language processing and the
BCI speller. To some extent, there are similarities between this research and that conducted
in [16], [50]; however, neither previous study included language models. In this study, we
validated that using such prior knowledge (i.e., a language model) improves communication
performance.
Based on these questions and objectives, we hypothesise that the combination of NLP and a
P300 speller in the context mentioned will improve the BCI performance.
Methodology
2.1 Participants
This study was approved by the Holland Bloorview Kids Rehabilitation Hospital and the
University of Toronto ethics review board. Ten typically-developed adults aged 20-40, with no
verbal, motor or neurological conditions and normal/corrected vision were recruited through
Holland Bloorview Kids Rehabilitation Hospital and the University of Toronto. Participants
gave informed consent prior to their participation. The study consisted of one offline and three
online sessions, each of an hour duration. Data were collected from each participant on four
different days.
2.2 Instrumentation
EEG data were collected from eight channels, namely Fz, Cz, Pz, P3, P4, PO7, PO8 and Oz
[102], using the BrainAmp DC amplifier (Brain Products GmbH, Germany). All signals were
sampled at a rate of 1000 Hz and the impedance of each active electrode was maintained below
10 kΩ for the duration of all sessions. As depicted in Figure 2.1, the electrodes were grounded
to AFz and referenced to FCz.
Figure 2.1: Electrode configuration [102].
2.3 Experimental Protocol
Participants were seated comfortably in a chair located approximately 80 cm from a 22” LED
computer monitor with a resolution of 1680 × 1080 pixels. Our design consisted of a speech-
to-text tool that converted the question asked by the conversation partner, who in this study
was the researcher, into text. The text of the question was displayed on the screen for the
participant. We used Google’s API for the speech-to-text conversion. In the next step, this text
was sent to an NLP engine to classify the intent of the question. We used MITIE open source
library [103] for detecting the context of the question. The detected intent was then used to
generate six potential answers to the question. The potential responses were displayed in the
6th column (the suggestion column) of a 6 × 6 speller. The initial suggestions were
predetermined based on frequency and popularity and tagged with the relevant context to
facilitate retrieval by the answer generation engine on the basis of detected intent. Participants
were given five seconds to locate their target cell. These suggestions were retrieved from a
context-based dictionary with twenty categories and 3302 words. The dictionary was designed
specifically for this study, as we did not find any off-the-shelf context-based corpora. The
number of categories was determined based on the design of the experiment, i.e. how many
questions fit in an hour-long session. We implemented the familiar-face stimuli with a green hue
[15] and pseudo-random flashes [104] with a stimulus onset of 100 ms and an inter-stimulus interval
of 200 ms. A five-second gap followed each selection, allowing the participant to check the
current word suggestions and to navigate through the grid to the target letter or word. Figures
2.2 and 2.3 illustrate the general flow of a session and the interface, respectively.
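The intent-to-suggestions step described above can be sketched as follows. This is a hypothetical illustration: the dictionary contents, intent labels and function name are our own, not the actual twenty-category, 3302-word dictionary used in the study, and the MITIE and speech-to-text calls are not reproduced.

```python
# The intent label returned by the NLP engine selects a category in the
# context-based dictionary; the six most frequent entries populate the
# suggestion column.  (Illustrative data only.)
CONTEXT_DICTIONARY = {
    "fruit": [("apple", 120), ("banana", 95), ("orange", 90),
              ("grape", 60), ("mango", 40), ("kiwi", 25), ("papaya", 10)],
    "drink": [("water", 200), ("tea", 150), ("coffee", 140),
              ("juice", 80), ("milk", 70), ("soda", 30)],
}

def initial_suggestions(intent, k=6):
    """Return the k most frequent words tagged with the detected intent."""
    entries = CONTEXT_DICTIONARY.get(intent, [])
    return [word for word, _ in sorted(entries, key=lambda e: -e[1])[:k]]

suggestions = initial_suggestions("fruit")  # six words, most frequent first
```

Pre-tagging each dictionary entry with a category is what allows retrieval to run in the short gap between the question and the first flash sequence.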
Figure 2.2: The experimental paradigm. The researcher asked a question verbally, which was detected through
speech recognition and converted to text. The question was sent to the NLP engine for entity recognition and
retrieval of potential answers. The grid with letters and suggested words was then presented to the participant
and the brain signals were subjected to pattern recognition.
Figure 2.3: After the question was asked verbally by the researcher, the corresponding text was shown on the
screen and suggested answers populated the last column.
Figure 2.2: Timing of events during a trial. The question was only asked verbally in the last two blocks of the
online sessions. The timing of classification varied among sessions (offline or online) and individuals.
2.3.1 Offline
The offline session consisted of five blocks. Each block had six trials (questions). In each trial,
a question, the designated answer (shown underneath the question) and a grid of letters flanked
by a suggestion column (as in Figure 2.2) were shown on the screen for six seconds, giving the
participant the time to read and prepare. The cell (letter or word) within the grid that the
participant was to select, herein referred to as the target, appeared in red highlight (Figure 2.4a)
and subsequently flashed for three seconds.
Three out of the six trials in each block were randomly selected as iterative selection trials,
where the participant was asked to start by typing the first letter of the designated answer (not
among the answers in the suggestion column). As the participant started to type the characters
of the answer one letter at a time, the suggestions were updated accordingly. An example is
depicted in Figure 2.4. The question was “What fruit do you want to eat?” and the designated
answer “orange” appeared under the question with the letter “o” highlighted in red as a cue
for the participant (Figure 2.4A). The fact that only the first letter was highlighted indicated
that the answer was not in the suggestion column. The participant focused on “o” in the grid
and after fourteen flashes of all the cells, feedback was given to the participant showing that
letter “o” had been selected (Figure 2.4B). The number fourteen was determined based on
literature [20]–[22], [93] and confirmed via pilot sessions. Our context-relevant predictive speller then searched the category of fruits for words with the smallest Levenshtein distance from the selected letter and repopulated the suggestion column with the top six such words (Figure 2.4B). The Levenshtein distance [105] is defined as the minimum number of edits, namely insertions, deletions and substitutions, needed to transform string a into string b.
Subsequent to the updates to the suggestion column, the next target was highlighted in red,
directing the participant’s focus accordingly. In the current example, the next target was the word “orange”, which now appeared in the suggestion column.
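The suggestion-update step described above can be sketched as follows. This is a minimal illustration rather than the thesis code: the word list and the prefix-comparison convention are assumptions, while the Levenshtein recurrence itself follows the definition given in the text.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]

def top_suggestions(typed, category_words, k=6):
    """Rank the category words by Levenshtein distance between the typed
    characters and each word's prefix of the same length; return the top k."""
    return sorted(category_words,
                  key=lambda w: levenshtein(typed.lower(), w.lower()[:len(typed)]))[:k]

fruits = ["orange", "apple", "banana", "olive", "grape", "peach", "melon", "okra"]
print(top_suggestions("o", fruits))  # words starting with "o" rank first
```

With "o" typed, the three o-words tie at distance zero and fill the head of the suggestion column, matching the behaviour described for the "orange" trial.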
The other three trials of a block entailed selecting among the answers provided in the
suggestion column (single selections), as shown in Figure 2.4C.
(A)
(B)
(C)
Figure 2.4: An iterative selection trial (A), where the designated answer was not among the suggestions and the participant had to type letter by letter until the target appeared in the suggestion column (B). A single selection trial, where the target was among the suggestions from the start (C).
The structure of the offline session is summarized in Figure 2.5.
Figure 2.5: Offline session structure. Each block consisted of six questions. Three of the six were single selection trials, where the answer was in the suggestion column; the rest were iterative selection trials, where the designated answer was not initially among the suggestions.
2.3.2 Online sessions
Participants completed three online sessions. For the first four blocks of each online session,
the distribution of trials resembled that of the offline session (i.e., 3 single selection, 3 iterative
selection trials; participants prompted with target selection).
First Online Block
The first block was offline and used as same-day training data. The number of flashes was fixed at ten instead of fourteen to reduce the risk of fatigue.
Blocks Two to Four: Constrained Selection Blocks
For blocks two to four, the answer was provided to the participant in the same manner as in the
offline trials (red highlight) with a slight difference. In these blocks, the participants had to
navigate through the grid to find the target on their own, prior to the stimulus flashes. This was
to prepare the participants for a more realistic interaction with the system in the last two,
“unconstrained selection blocks”. The structure of the online sessions is summarised in Figure
2.6.
Figure 2.6: Online session structure. Blocks 1-4 were structured in the same way as blocks in the offline session, except that in blocks 2-4 the feedback was the result of online classification. For blocks 5 and 6, questions were asked verbally and the participant decided how to respond. The classification model was retrained after each block.
The number of flashes in the online blocks varied from two to eight. After each flash sequence
of the grid, probabilities of each of the thirty-six cells being the target were updated and if any
cell had a probability higher than 80%, it was determined to be the participant’s intended
character. We will discuss how we decided on our threshold level later. The participants were
asked not to correct their mistakes. Allowing for backspace poses complex modelling challenges and would not allow for fully exploiting the information from the language model [19], [20], [106], [107].
The offline blocks and constrained selection blocks (blocks 2-4) had designated answers which
varied from one session to another but were consistent among different participants.
Knowledge of the ground truth in the online sessions helped us retrain our model with the
additional data accumulated after each block.
Last two online blocks: Unconstrained Selection Blocks
In order to determine if our proposed paradigm could outperform previously studied P300-NLP
spellers in terms of communication rate and to also test our system in a more realistic manner,
we included two unconstrained selection blocks at the end of each online session. The
participant was given the freedom to respond with a word at their discretion. For one
unconstrained selection block, the BCI ignored the context of the question (context-
independent block), while for the other, the BCI invoked the context-dependent answer
generation engine (context-dependent block). The presentation of the last two blocks was
pseudo-randomized to minimize any potential order effects.
In these unconstrained selection blocks, a standard set of five questions was asked verbally by
the researcher. Through speech-to-text, the transcript of the question was displayed on the
screen and the participant had five seconds to think of how they would like to respond. These
questions were different for every session but standardised across participants.
For the context-independent (CI) block, we used a corpus of the most commonly used English
words as our source for the suggestion column. As in previous studies, given the absence of
context, the suggestion column was initially empty, forcing the participant to type the first letter
of their response (Figure 2.7). As the system detected the participant’s desired letter(s), the last
column was (re)populated with the most frequent words having the closest distance to the
letters typed so far. However, these words may have been irrelevant to the context of the
question asked.
On the other hand, in the context-dependent (CD) block, after each question was asked, the
transcript of the question was subjected to the NLP engine for intent recognition. Based on the
detected category, the last column was populated with context relevant suggestions that the
participant may potentially have had in mind (Figure 2.2). The participant could decide to either
select among those suggestions or type letters until their intended response appeared in the
suggestion column.
In both blocks, similar to the other online blocks, participants were asked not to correct their
mistakes. If the target word was not among any of the updated suggestions as they were typing
the word letter by letter, they had to select DONE to proceed to the next trial. In order to avoid
misclassification of DONE or any of the other command buttons, we designed our classifier to
only act on these commands after they were selected twice in succession, i.e. selecting the
command cells once had no consequence [22].
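The two-selections-in-succession rule for command cells can be sketched as below. DONE comes from the text; the other command labels are hypothetical, as the thesis does not name them here.

```python
COMMAND_CELLS = {"DONE", "DEL", "SPACE", "HOME"}  # only DONE is named in the text

class CommandGate:
    """Act on a command cell only when it is selected twice in succession;
    a single selection of a command has no consequence (cf. [22])."""
    def __init__(self):
        self._pending = None

    def select(self, cell):
        if cell in COMMAND_CELLS:
            if cell == self._pending:
                self._pending = None
                return ("execute", cell)   # second consecutive selection: act
            self._pending = cell
            return ("armed", cell)         # first selection: wait for confirmation
        self._pending = None               # any other selection resets the gate
        return ("typed", cell)

gate = CommandGate()
print(gate.select("DONE"))   # ('armed', 'DONE')
print(gate.select("DONE"))   # ('execute', 'DONE')
```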
Participants completed a survey, the NASA Task Load Index (NASA-TLX) at the conclusion
of each session to capture their experience with our proposed communication system [108].
They also comparatively rated and commented upon the context dependent and independent
systems.
2.4 Data Analysis
2.4.1 Offline Session
Preprocessing
EEG signals were resampled to 50 Hz and band-pass filtered between 1 Hz and 25 Hz with a finite impulse response (FIR) filter. A notch filter at 60 Hz was applied to suppress power line artefacts. Next, trials were epoched from 200 ms prior to stimulus onset to 800 ms post-stimulus. The average of the 200 ms of pre-stimulus data was subtracted from each epoch to cancel the baseline amplitude offset.
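One possible implementation of this preprocessing chain is sketched below, assuming the filters are applied at the original 1000 Hz rate before downsampling (the text does not specify the order, and a 60 Hz notch is only meaningful before resampling to 50 Hz); the synthetic data stand in for real EEG.

```python
import numpy as np
from scipy.signal import firwin, filtfilt, iirnotch, resample_poly

FS_RAW, FS_NEW = 1000, 50        # acquisition and analysis sampling rates

def preprocess(eeg, stim_samples):
    """eeg: (n_channels, n_samples) at FS_RAW; stim_samples: stimulus onsets.
    Returns epochs (n_stim, n_channels, n_points) at FS_NEW, baseline-corrected
    with the mean of the 200 ms pre-stimulus window."""
    b_n, a_n = iirnotch(w0=60.0, Q=30.0, fs=FS_RAW)      # suppress line noise
    eeg = filtfilt(b_n, a_n, eeg, axis=-1)
    bp = firwin(numtaps=251, cutoff=[1.0, 25.0], pass_zero=False, fs=FS_RAW)
    eeg = filtfilt(bp, [1.0], eeg, axis=-1)              # 1-25 Hz FIR band-pass
    eeg = resample_poly(eeg, FS_NEW, FS_RAW, axis=-1)    # downsample to 50 Hz
    pre, post = int(0.2 * FS_NEW), int(0.8 * FS_NEW)     # -200 ms .. +800 ms
    epochs = []
    for s in stim_samples:
        s = s * FS_NEW // FS_RAW                         # onset in the new rate
        ep = eeg[:, s - pre:s + post]
        epochs.append(ep - ep[:, :pre].mean(axis=1, keepdims=True))  # baseline
    return np.stack(epochs)

rng = np.random.default_rng(0)
epochs = preprocess(rng.standard_normal((8, 5000)), stim_samples=[1000, 2000, 3000])
print(epochs.shape)  # (3, 8, 50)
```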
Feature Extraction and Selection
The most commonly used feature extraction method for the oddball paradigm is trial averaging, which, along with several alternatives, is described below.
Trial Averaging Method
Recall that the P300 is an event-related potential appearing as a reaction to an infrequent target stimulus in a series of frequent non-target stimuli. Owing to the noise inherent in EEG measurement, this ERP is not visible after a single target stimulus presentation. In order to amplify this peak and reduce the noise, multiple epochs corresponding to the target stimulus are typically averaged. This method is not suitable for cases where the latency of the P300 varies among sessions.

Figure 2.7: A question asked in the context-independent, unconstrained selection block. The suggestion column was initially empty, as the system did not consider the context of the question and only provided suggestions after the user started to type.
Temporal, spectral and frequency features
Different characteristics of the EEG waveforms can be considered as features, including the following [109]: 1) ERP latency, 2) maximum signal amplitude, 3) latency/amplitude ratio, 4) absolute maximum amplitude, 5) absolute latency/amplitude ratio, 6) positive area, 7) negative area, 8) sum of positive and negative areas, 9) absolute value of the sum of positive and negative areas, 10) sum of absolute positive and absolute negative areas, 11) average absolute signal slope, 12) peak-to-peak amplitude, 13) peak-to-peak time window, 14) peak-to-peak slope, 15) number of zero crossings in the peak-to-peak time window, 16) zero crossings per time unit in the peak-to-peak time window, 17) slope sign alterations, 18) mode frequency, 19) median frequency, 20) mean frequency, and 21) wavelet coefficients.
Concatenation Method
In this method, the epochs of all $N$ channels are concatenated to create one feature vector of length $N \times D$, where $D$ is the number of data points in each epoch after downsampling. In our case, we had eight channels, each with fifty samples, leading to a feature vector of length 400.
In the next chapter, we justify the concatenation method as the preferred approach in this study
for distinguishing target from the non-target classes.
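The concatenation step amounts to flattening the channel-by-sample epoch matrix; a minimal sketch:

```python
import numpy as np

def concatenate_features(epoch):
    """epoch: (n_channels, n_points) array for a single flash at 50 Hz.
    Concatenates the channels into one feature vector of length N x D."""
    return epoch.reshape(-1)   # row-major: channel 1 samples, then channel 2, ...

epoch = np.zeros((8, 50))                 # eight channels, fifty samples each
print(concatenate_features(epoch).shape)  # (400,)
```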
Classification
We tested the selected features with a number of classifiers, namely, Support Vector Machines
(SVM), Random Forest, Linear Discriminant Analysis (LDA), and Naïve Bayes. We
conducted a 10-fold cross validation on the offline data and compared the performance of
different classifiers. The best results were obtained with a Naïve Bayes classifier.
The Bayes Theorem computes the probability of hypothesis $H$ given some data $D$, i.e. $P(H \mid D)$:

$$P(H \mid D) = \frac{P(D \mid H)\,P(H)}{P(D)} \qquad (2.1)$$
where
- $P(H \mid D)$ is the probability of hypothesis $H$ given the data $D$; formally, this term is the posterior.
- $P(D \mid H)$, known as the likelihood, is the probability of the data $D$ given that the hypothesis is correct.
- $P(H)$ is the probability of $H$ irrespective of the data, known as the prior probability of the hypothesis.
- $P(D)$ is the probability of the data regardless of the hypothesis.
In the case of a P300 speller, the probabilities in the Bayes Theorem can be rewritten as below [92]:

$$P(x_t \mid \mathbf{y}_t, x_{t-1}, \ldots, x_0) = \frac{P(x_t \mid x_{t-1}, \ldots, x_0)\,P(\mathbf{y}_t \mid x_t, \ldots, x_0)}{P(\mathbf{y}_t \mid x_{t-1}, \ldots, x_0)} = \frac{1}{Z}\,P(x_t \mid x_{t-1}, \ldots, x_0)\prod_i f(y_t^i \mid x_t) \qquad (2.2)$$
where
- $P(x_t \mid \mathbf{y}_t, x_{t-1}, \ldots, x_0)$ is the probability of typing character $x_t$ given the score of that character flashing and the characters typed so far.
- $P(x_t \mid x_{t-1}, \ldots, x_0)$ is the prior probability of having character $x_t$ after $x_{t-1}, \ldots, x_0$. This is computed through a language model.
- $Z$ is the normalising constant.
- $P(\mathbf{y}_t \mid x_t, \ldots, x_0)$ is the likelihood and reflects the distribution of scores during stimulation. Based on [5], [92], [93], consecutive flashes are assumed to be drawn independently from a Gaussian distribution. The probability density function for the likelihood can be computed as,
$$f(y_t^i \mid x_t) = \begin{cases} \dfrac{1}{\sqrt{2\pi\sigma_a^2}}\, e^{-\frac{(y_t^i - \mu_a)^2}{2\sigma_a^2}} & \text{if } x_t \in \mathbf{A}_t^i \\[2ex] \dfrac{1}{\sqrt{2\pi\sigma_n^2}}\, e^{-\frac{(y_t^i - \mu_n)^2}{2\sigma_n^2}} & \text{if } x_t \notin \mathbf{A}_t^i \end{cases} \qquad (2.3)$$
where
- $y_t^i$ is the score for character $x_t$ for the $i$th flash.
- $\mathbf{A}_t^i$ is the set of characters illuminated during the $i$th flash for character $x_t$ in the sequence.
- $\mu_a, \sigma_a$ and $\mu_n, \sigma_n$ are the means and standard deviations of the distributions for the attended (i.e., target) and non-attended flashes, respectively. These values are computed from the offline data and updated between online blocks as mentioned earlier.
The class which maximises the posterior probability will be the output of the classifier.
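Equations 2.2 and 2.3 amount to a per-flash Bayesian update over the 36 grid cells with a stopping threshold; a sketch under assumed score distributions ($\mu$ and $\sigma$ would be fitted to the offline data in the real system) is:

```python
import numpy as np

MU_A, SIG_A = 1.0, 1.0   # attended-score distribution (placeholder parameters)
MU_N, SIG_N = 0.0, 1.0   # non-attended-score distribution (placeholder parameters)

def gauss(y, mu, sig):
    return np.exp(-0.5 * ((y - mu) / sig) ** 2) / (sig * np.sqrt(2 * np.pi))

def update_posterior(prior, flashes, threshold=0.8):
    """prior: language-model prior over the 36 grid cells.
    flashes: (flashed_cell_indices, classifier_score) pairs, one per flash.
    Applies Eq. 2.2/2.3 after each flash, stopping once a cell exceeds threshold."""
    post = np.asarray(prior, dtype=float)
    cells = np.arange(post.size)
    for flashed, y in flashes:
        lik = np.where(np.isin(cells, flashed),
                       gauss(y, MU_A, SIG_A),   # x_t in the flashed set A_t^i
                       gauss(y, MU_N, SIG_N))   # x_t not flashed
        post = post * lik
        post /= post.sum()                      # the 1/Z normalisation
        if post.max() > threshold:
            return post, int(post.argmax())
    return post, None                           # undecided: keep flashing

prior = np.full(36, 1 / 36)
row0, col0 = list(range(6)), list(range(0, 36, 6))
post, cell = update_posterior(prior, [(row0, 1.2), (col0, 1.1)] * 6)
print(cell)  # 0: the cell at the intersection of the flashed row and column
```

Evidence accumulates at the row/column intersection until the 0.8 threshold is crossed, which is how the online blocks could stop after as few as two flash sequences.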
Language Model and Prior Probabilities
Early studies of P300 spellers considered the prior distribution to be uniform, i.e. $\frac{1}{N}$, where $N$ is the number of cells in the grid. In other words, a constant prior probability of 1/36 was assigned to all cells in all trials for a 6 × 6 grid. This naïve approach does not take into account the differential frequency of letter occurrence given the previously typed letters; e.g., after the letter q, the letter u is the most likely to occur.
More recently, studies have taken this prior language knowledge into account. Speier et al. [92] suggested a trigram model using the second-order Markov assumption. Under this model, the probability of a character $x_t$ being typed given the last two characters is:

$$P(x_t \mid x_{t-1}, \ldots, x_0) = \frac{c(x_{t-2}, x_{t-1}, x_t)}{c(x_{t-2}, x_{t-1})} \qquad (2.4)$$
where $c(x_{t-2}, x_{t-1}, x_t)$ is the number of occurrences of the string ‘$x_{t-2}x_{t-1}x_t$’ in the corpus. For the first two characters of a string, i.e., when $x_{t-2}$ and $x_{t-1}$ are not defined, the prior probability can be computed as:

$$P(x_t \mid \cdot) = \begin{cases} \dfrac{c(\mathrm{start}, x_t)}{c(\mathrm{start})} & \text{if } t = 0 \\[2ex] \dfrac{c(\mathrm{start}, x_{t-1}, x_t)}{c(\mathrm{start}, x_{t-1})} & \text{if } t = 1 \end{cases} \qquad (2.5)$$
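The trigram counting of Equations 2.4 and 2.5 can be illustrated on a toy corpus; the toy word list below is our own, and the zero-count behaviour it exposes is exactly the failure mode the text goes on to discuss.

```python
from collections import Counter

def trigram_model(corpus_words):
    """Builds the counts of Eq. 2.4/2.5; 'start' pads the beginning of each word."""
    tri, bi = Counter(), Counter()
    for w in corpus_words:
        chars = ["start", "start"] + list(w.lower())
        for a, b, c in zip(chars, chars[1:], chars[2:]):
            tri[(a, b, c)] += 1
            bi[(a, b)] += 1

    def prior(x_t, x_prev2, x_prev1):
        """P(x_t | x_{t-2}, x_{t-1}) = c(x_{t-2}, x_{t-1}, x_t) / c(x_{t-2}, x_{t-1})."""
        denom = bi[(x_prev2, x_prev1)]
        return tri[(x_prev2, x_prev1, x_t)] / denom if denom else 0.0

    return prior

prior = trigram_model(["queen", "quiet", "quick", "tree"])
print(prior("u", "start", "q"))  # 1.0: every initial q is followed by u here
print(prior("u", "n", "n"))      # 0.0: the zero-count case that blocks error recovery
```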
We decided not to adopt this model for the following reason. The main assumption in the trigram model is that the last two characters $x_{t-2}, x_{t-1}$ have been correctly classified and thus the subsequent search would be for ‘$x_{t-2}x_{t-1}x_t$’ in the corpus. This scheme is problematic if at least one of $x_{t-2}$ or $x_{t-1}$ had been misclassified, as the subsequent search would then be for an incorrect string. For instance, for the target word ‘Interesting’, if the first character was classified as N instead of I, the probability of classifying N as the next character would be zero, as the count of words that start with NN is zero. This makes it impossible for the system to recover from a mistake.
Kindermans et al. regularised the n-gram model by applying Witten-Bell smoothing [110], which assigns small non-zero probabilities to n-grams that do not exist in the corpus [107]. However, we designed our system differently to account for out-of-corpus strings. For each letter, the probability of it being the target is computed as follows:
$$0.475 \times \left( 0.85 \times \frac{c(x_{t-1}, x_t)}{c(x_{t-1})} + 0.15 \times \frac{c(x_t)}{c(*)} \right) \qquad (2.6)$$
This method is philosophically akin to the smoothing algorithm used by [107] and very similar
to the approach taken in [93]. There are a few things to note:
- We assumed that almost half of the time, the participant would not select among the suggestions. This is the justification for the 0.475 weight on selecting a character.
- In order for the system to recover from a mistake, we split the letter probability into two terms. The first term assumes that the previous character $x_{t-1}$ was correctly classified, with an 85% confidence. The second term ignores what has been typed so far and counts the number of words that have $x_t$ in position $t$ regardless of the other $t-1$ characters; $c(*)$ denotes the total number of words in the corpus with a minimum length of $t$. Similar to [93], the weights were set based on offline analysis.
- If no word can be found to match the desired sequence, we set the count to one. This mitigates the issue of out-of-vocabulary (OOV) words.
The reason for using a method different from Witten-Bell smoothing was to allow the limited corpus to be adaptively extended as the participant interacts with the system. In other words, the non-zero value that smoothing methods assign to a zero-occurrence sequence cannot differentiate between OOV words and misclassifications. Separating these two cases, as in our method, allows for the automatic addition of new words to the context-based corpus. This matter is discussed in further detail in chapter 6.
The probability of selecting a full word $w$ from the suggestion column is as follows:

$$0.475 \times \begin{cases} \dfrac{1}{6} & \text{if no letter has been selected} \\[1ex] c(w) \times (\text{penalty\_value})^{\text{distance}} & \text{otherwise} \end{cases} \qquad (2.7)$$
When the participant has not typed anything, all six suggestions have the same probability of being selected. As the participant types letters, the probability of each word becomes proportional to its frequency in the corpus (this count is one for all words in the context-relevant case), weighted by a penalty value raised to the power of the word’s Levenshtein distance from the typed sequence. The penalty value was set to 0.8 empirically.
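Equations 2.6 and 2.7 leave some counting conventions implicit; the sketch below adopts one consistent reading (the toy corpus, the $t=0$ fallback, and the position-based counts are our assumptions) to show how the zero-count-to-one OOV rule and the distance penalty interact.

```python
CORPUS = ["orange", "olive", "okra", "apple", "grape", "melon"]  # toy category corpus

def letter_probability(typed, x_t, corpus=CORPUS):
    """Eq. 2.6 under one reading of the counts: the first term counts words whose
    letter at position t follows the previously typed letter; the second counts
    words with x_t at position t regardless of earlier letters.
    Zero counts are raised to one so the speller can recover from mistakes."""
    t = len(typed)
    if t == 0:  # assumed fallback: no previous letter, use first-letter counts
        return 0.475 * max(sum(1 for w in corpus if w[0] == x_t), 1) / len(corpus)
    prev = typed[-1]
    c_pair = sum(1 for w in corpus if len(w) > t and w[t - 1] == prev and w[t] == x_t)
    c_prev = sum(1 for w in corpus if len(w) > t - 1 and w[t - 1] == prev)
    c_pos = sum(1 for w in corpus if len(w) > t and w[t] == x_t)
    c_star = sum(1 for w in corpus if len(w) >= t + 1)           # c(*)
    return 0.475 * (0.85 * max(c_pair, 1) / max(c_prev, 1)
                    + 0.15 * max(c_pos, 1) / max(c_star, 1))

def word_probability(typed, word, distance):
    """Eq. 2.7: six equiprobable suggestions before typing; afterwards the word
    count (one per context word) is decayed by 0.8 per unit of edit distance."""
    if not typed:
        return 0.475 / 6
    return 0.475 * 1 * 0.8 ** distance(typed, word)

print(round(letter_probability("", "o"), 4))  # 0.2375: three of six words start with o
```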
Threshold Determination
In previous studies, the threshold probability that maximised the bit rate was chosen per subject [92], [98]. We instead chose a fixed empirical value for all participants that was higher than the average ITR-optimal threshold. Based on multiple pilots with different participants, we set this value to 0.8. More details on this empirical decision can be found in chapter 5.
2.4.2 Online Session
Online signal processing was similar to that invoked offline. The features were extracted by
concatenation and classified using the method described above. Between every session and
every block within a session, the distribution parameters of the LDA scores were updated for
each participant.
2.5 Assessment Metrics
Conventional performance metrics for BCI systems are accuracy and ITR. However, ITR is not appropriate for the proposed paradigm because it rests on three assumptions that do not hold here: 1) selections are independent of one another, 2) marginal probabilities are uniform over the characters in the grid, and 3) errors are uniform over the non-target characters.
ITR is computed as follows:

$$BR = \log_2 N + ACC_c \log_2 ACC_c + (1 - ACC_c)\log_2 \frac{1 - ACC_c}{N - 1}, \qquad ITR = BR \times CPM \qquad (2.8)$$

where $BR$ denotes bit rate, $N$ is the number of cells in the grid, $ACC_c = \frac{1}{n}\sum_{t=1}^{n} \delta_{x_t z_t}$ is the proportion of correctly classified characters over the total number of characters selected, $\delta_{x_t z_t}$ is the indicator function, which assumes a value of one when the classifier output $z_t$ equals the intended target character $x_t$ and zero otherwise, and $CPM$ is the number of characters per minute.
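As a worked example, Equation 2.8 can be evaluated directly (the accuracy and speed values below are illustrative, not results from this study):

```python
import math

def bits_per_selection(n_cells, acc):
    """BR of Eq. 2.8 (logs base 2); acc is the character accuracy ACC_c.
    The 0*log(0) limits are handled explicitly."""
    term_correct = acc * math.log2(acc) if acc > 0 else 0.0
    term_error = (1 - acc) * math.log2((1 - acc) / (n_cells - 1)) if acc < 1 else 0.0
    return math.log2(n_cells) + term_correct + term_error

def itr(n_cells, acc, chars_per_min):
    """ITR = BR x CPM."""
    return bits_per_selection(n_cells, acc) * chars_per_min

print(round(itr(36, 0.9, 5.0), 2))  # 20.94 bits/min for a 6x6 grid at 90% accuracy
```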
Also, ITR largely depends on the length of the word and assigns high values to incorrect strings
that have many letters in common with the target.
Speier et al. suggested an alternative metric, mutual information (MI), to overcome these shortcomings [101]. MI is computed as follows:

$$BR = \sum_{z} p(z)\left( ACC_w \log_2 \frac{ACC_w}{p(z)} + (1 - ACC_w)\log_2 \frac{1 - ACC_w}{1 - p(z)} \right), \qquad MI = BR \times WPM \qquad (2.9)$$

The summation is over all the words in the corpus and $p(z)$ is the probability of word $z$ occurring. $ACC_w = \frac{1}{n}\sum_t \delta_{x_t z_t}$ is the proportion of correctly classified words over the total number of selected words, and $WPM$ is the number of words per minute. The MI computation had to be slightly altered for our system because, unlike in traditional predictive spellers, not all the words in the corpus were considered for each selection. For each selection in our system, only a subset of words belonging to a specific, context-relevant category was considered. Therefore, to estimate the bit rate, we considered the relevant subset of the corpus for each selection and averaged over all selections. For the sake of comparison with previous work, we have nonetheless reported both ITR and MI in our results. Also, in order to compare the study and
control blocks (unconstrained selection blocks) in terms of communication speed, we measured
the completion time and the number of selections for both blocks. NASA-TLX forms were completed at the end of each session to ascertain the factors that contributed to the system’s task load.
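Equation 2.9 can likewise be evaluated numerically; the uniform word distribution below is a toy stand-in for corpus statistics, not data from this study.

```python
import math

def mutual_information_rate(word_probs, acc_w, wpm):
    """BR of Eq. 2.9 summed over words z with occurrence probability p(z);
    in our variant the sum would run over the context-relevant subset per
    selection, averaged over selections."""
    def plog(q, r):   # q * log2(r), with the 0 * log2(0) = 0 convention
        return q * math.log2(r) if q > 0 else 0.0
    br = sum(p * (plog(acc_w, acc_w / p) + plog(1 - acc_w, (1 - acc_w) / (1 - p)))
             for p in word_probs if 0 < p < 1)
    return br * wpm   # MI = BR x WPM

# Toy example: a uniform 100-word corpus, 90% word accuracy, 2 words per minute
print(round(mutual_information_rate([1 / 100] * 100, 0.9, 2.0), 2))  # 11.02 bits/min
```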
A Novel Combination of Natural Language Processing and Brain Computer Interfaces in a Question and Answer Context
The following section is a journal article written based on the work completed in this thesis. The material presented here can be found in greater detail in the other chapters.
3.1 Abstract
A P300 speller is a brain computer interface that can be used as a communication device for
individuals with speech and language impairments. Recent studies have incorporated natural language processing to further improve the performance of these systems, by allowing multiple characters to be selected simultaneously and/or by computing prior probability
distributions based on previously selected characters. In this study, we exploited natural
language processing to endow a P300 speller with awareness of conversational context in a
single adjacency pair conversation (i.e., question and answer). Context awareness of the system
was manifested as the generation of appropriate suggestions based both on the question posed
by the communication partner and the characters typed by the user. The proposed paradigm
was tested with ten typically developed adults and compared with previous context independent
systems. The integration of a context relevant predictive speller and answer generation engine
with a P300 brain-based speller led to increases in typing speed (by 42.84%) as well as
character and word accuracies on average across participants when compared to a context
independent P300 speller. Participant satisfaction was also higher with the context dependent
speller. The introduction of conversational context has potential to enhance the function and
user experience of a P300 speller for responding to questions.
Keywords: Brain-computer interface, electroencephalography, P300, natural language
processing, context aware, context independent, answer generation, context dependent.
3.2 Introduction
Some form of communication is necessary for expressing one’s needs and emotions, whether through body gestures, hand movements, speech or facial expressions. However, many individuals living with severe disabilities are incapable of communicating through these channels [1]. A brain-computer interface (BCI) such as the P300 speller is a technology that makes communication feasible through neural activity, eliminating the need for body movement [5].
A typical P300 speller interface involves a grid of letters and special characters; each row and column flashes in a pseudo-random sequence while the user fixates on the desired character and counts the number of times that character flashes. Each time the corresponding row or column flashes, a peak occurs in the user’s brain signal, whereas flashing of the non-target rows/columns ideally should not elicit such changes. This difference in the brain signal makes it feasible to detect the desired row and column and therefore identify the desired character. The main challenge with this BCI system is slow speed, as multiple repetitions are required to increase the signal-to-noise ratio (SNR). Studies have attempted to
improve the communication speed of P300 spellers by optimising system parameters [68], [73],
interface design [17], [65], [75], [77], [79], [80], [82], [111], [112], stimulus hue and pattern
[76], signal processing techniques and classifiers [57], [113]–[115].
The field of natural language processing has been studied for many years in the domains of linguistics [36], machine translation [6] and speech recognition [3]. However, language models have only recently been integrated into the BCI domain [39]. The most common use of NLP in the field of BCI
is in P300 spellers. Language models can be exploited for word completion [38], signal
classification, and error correction [39], ultimately increasing communication rate [20].
Predictive spellers increase typing speed by allowing multiple characters to be chosen through
one selection. One of the first studies to present a predictive P300 speller deployed the Quillsoft
WordQ2 (version 2.5, Quillsoft, Ltd, Toronto, ON) assistive software to generate suggestions
as the user typed; the suggestions in turn, could be selected by focusing on their corresponding
numerical index in the original grid [111]. Although this two-step interface enhanced typing
speed, workload was also increased and accuracy was reduced. Later, Kaufmann et al.
integrated the suggestions into the original grid mitigating the additional cognitive load [22].
Later studies attempted to further improve this system by modifying the interface design and
stimulus [17], [19]. An additional approach to improve the performance of the P300 speller is
to incorporate a language model into the classification stage, i.e. to compute the weights of
each cell in the grid. Each letter has a likelihood of being selected next based on some
probability distribution conditioned on the previous selections. The simplest of such
probabilistic models is the naïve Bayes or hidden Markov model, which captures the relative frequency of n-grams, sequences of $n$ consecutive characters [92]–[95]. These models are
created by parsing through a corpus of text and counting the number of occurrences of these
sequences. The conditional probability of a character given the previous 𝑛 − 1 characters can
then be computed [92]. More recently, Speier et al. used particle filtering (PF) to compute prior
probabilities and correct errors through statistical modelling [20]. This classifier computed the
probability distribution over possible outputs by sampling a batch of possible realisations, i.e.
a batch of potential output strings typed by the user. Each of these particles moved through the
model independently based on the transition probabilities [20]. This model is useful when it is
impractical to compute the probability distribution over all possible strings in real-time.
The goal of the present study was to further enhance the communication rate of P300 spellers
in a single adjacency conversation pair. Specifically, we combined a context aware predictive
speller and an answer generation engine that comprehends the question being asked of the
participant, to efficiently present potential conversational responses. The participant could
either type a response or select from suggested answers. If the participant started typing, the
cells containing suggestions were repopulated with context-relevant words matching the
participant’s typed characters, thereby reducing typing time. With ten typically developed
adults, we investigated whether the incorporation of context awareness and answer generation
yields improvements in online communication rate over a generic P300 speller with predictive
spelling and a language model.
3.3 Methods
3.3.1 Participants
This study was approved by the research ethics boards of Holland Bloorview Kids
Rehabilitation Hospital and the University of Toronto. Ten typically-developed adults aged 20-
40, with no verbal, motor or neurological conditions and normal/corrected vision were
recruited through Holland Bloorview Kids Rehabilitation Hospital and the University of
Toronto. Participants provided informed consent. The study consisted of one offline and three
online sessions, each an hour in duration. Data were collected from each participant on four
different days.
3.3.2 Experimental design
Our design consisted of a speech-to-text tool that converted the question asked by the conversation partner, who in this study was the researcher, into text. The text of the question was displayed on the screen for the participant. We used Google’s API for the speech-to-text
conversion. In the next step, this text was sent to an NLP engine to classify the intent of the question. We used the MITIE open-source library [103] to detect the context of the question.
The detected intent was then used to generate six potential answers to the question. The
potential responses were displayed in the 6th column (the suggestion column) of a 6 × 6 speller.
The other 30 cells consisted of letters A-Z and four command cells. The initial suggestions
were predetermined based on frequency and popularity and tagged with the relevant context to
facilitate retrieval on the basis of the detected intent, by the answer generation engine.
Participants were given five seconds to locate their target cell. Suggestions were retrieved from
a context-based dictionary with twenty categories and 3302 words. The dictionary was
designed specifically for this study as we did not find any off-the-shelf context-based corpora.
The number of categories was determined based on the design of the experiment, i.e. how many
questions could be accommodated in an hour-long session. We implemented the famous-faces
stimuli with green hue [15] and pseudo-random flashes [104] with stimulus onset of 100 ms
and inter-stimulus interval of 200 ms. A five-second gap followed each selection, allowing the
participant to check the current word suggestions and to navigate through the grid to the target
letter or word. Figures 3.1 and 3.2 illustrate the interface and the general flow of a session,
respectively.
Figure 3.1: After the question was asked verbally by the researcher, its text was shown on the screen and suggested answers populated the last column.

Figure 3.2: Timing of events during a trial. The question was only asked verbally in the last two blocks of the online sessions. The timing of classification varied among sessions (offline or online) and individuals.

Participants attended one offline session consisting of five blocks, each with six questions (trials). For each trial, a question and answer pair were shown on the screen for a second, followed by the presentation of the grid flanked by a suggestion column populated with
context-relevant suggestions. For each selection, the target letter/word was highlighted in red.
In three out of six trials, the designated answer was not found in the suggestion column and the
participant was guided to focus on the answer’s first letter. After fourteen flashes, feedback
was provided and the suggestions were updated. Based on the updates, further selections took
place. These trials are herein referred to as iterative selection trials. The other three trials in the
block were single selection trials, meaning the designated answer was found among the
suggestions from the beginning of the trial. The reason for this split between iterative and single
selection trials was the assumption that, half of the time, the participant would not find their answer among the generated answers.
Participants attended three online sessions. The arrangement of trials in the first four blocks of
the online sessions resembled that of the offline session. The first block was offline and used
as same day data. The number of flashes were fixed to ten instead of fourteen to reduce the risk
of fatigue. For blocks 2 to 4 of the online sessions, namely the constrained selection blocks,
the answer was provided to the participant in the same manner as in the offline trials (red
highlight); however the number of flashes varied based on the confidence level of the classifier.
The question and answer pairs varied between sessions but were consistent among different
participants. After each of these online blocks, the classifier was retrained. The participants were asked not to correct any potential misclassifications, as allowing for backspace poses complex modelling challenges; that is, at each selection, the possibility of earlier incorrect selections affects the computation of prior probabilities and would not allow for the full exploitation of information from the language model [19], [20], [106], [107].
In order to determine if our proposed paradigm could outperform previously studied P300-NLP
spellers in terms of communication rate and to also test our system in a more realistic manner,
we included two unconstrained selection blocks at the end of each online session. In these
blocks the questions were asked verbally and converted to text. The participant was given the
freedom to respond with a word at their discretion. For one unconstrained selection block, the
BCI ignored the context of the question (context-independent block), while for the other, the
BCI invoked the context-dependent answer generation engine (context-dependent block). The
presentation of the last two blocks was pseudo-randomized to mitigate any order effect.
Participants completed the NASA Task Load Index at the conclusion of each session to capture
their experience with our proposed communication system [108]. They also comparatively
rated and commented on the context-dependent and context-independent systems.
3.3.3 Data collection
All data were collected using eight active EEG electrodes, namely Fz, Cz, Pz, P3, P4, PO7,
and PO8, mounted in an electrode cap connected to a BrainAmp DC amplifier (Brain Products
GmbH, Germany), sampled at 1000 Hz, grounded at AFz and referenced to FCz [102].
Conductive gel was applied to each electrode, with impedances maintained below 10 kΩ.
EEG signals were resampled to 50 Hz and bandpass filtered between 1 Hz and 25 Hz with a
finite impulse response (FIR) filter. A notch filter at 60 Hz was applied to suppress power-line
artefacts. Next, trials were epoched from 200 ms prior to stimulus onset to 800 ms post-
stimulus. The average of the 200 ms pre-stimulus data was subtracted from each epoch to
cancel the baseline amplitude offset. Features were extracted according to [57], where the
epochs across the eight channels were concatenated to obtain a feature vector.
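The preprocessing chain above can be sketched as follows. This is an illustrative reconstruction rather than the thesis code: the function and variable names are our own, the filter orders are arbitrary, and the 60 Hz notch is applied before downsampling because at 50 Hz the Nyquist limit is 25 Hz, so a 60 Hz notch is only realisable at the original rate.

```python
import numpy as np
from scipy.signal import firwin, filtfilt, iirnotch, resample_poly

FS_RAW, FS_NEW = 1000, 50  # amplifier rate and analysis rate (Hz)

def preprocess(eeg, stim_onsets_s):
    """eeg: (n_channels, n_samples) array at 1000 Hz.
    stim_onsets_s: stimulus onset times in seconds.
    Returns one concatenated-channel feature vector per stimulus."""
    # 60 Hz notch for power-line artefacts, applied at the raw rate.
    b_n, a_n = iirnotch(w0=60.0, Q=30.0, fs=FS_RAW)
    eeg = filtfilt(b_n, a_n, eeg, axis=1)
    # 1-25 Hz FIR bandpass, then resample 1000 Hz -> 50 Hz.
    b_bp = firwin(numtaps=501, cutoff=[1.0, 25.0], pass_zero=False, fs=FS_RAW)
    eeg = filtfilt(b_bp, [1.0], eeg, axis=1)
    eeg = resample_poly(eeg, FS_NEW, FS_RAW, axis=1)
    # Epoch from -200 ms to +800 ms around each stimulus and subtract the
    # mean of the 200 ms pre-stimulus window to cancel the baseline offset.
    pre, post = int(0.2 * FS_NEW), int(0.8 * FS_NEW)
    features = []
    for t in stim_onsets_s:
        i = int(round(t * FS_NEW))
        epoch = eeg[:, i - pre:i + post]                        # (channels, 50)
        epoch = epoch - epoch[:, :pre].mean(axis=1, keepdims=True)
        features.append(epoch.reshape(-1))                      # concatenate channels
    return np.array(features)
```

Each 50-sample, eight-channel epoch yields a 400-dimensional feature vector, matching the channel-concatenation approach of [57].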
Classification
No explicit artefact removal was implemented; the discrimination between valid ERPs and
other signals, i.e., artefacts, was made by the classifier [20]. We used a method similar to the
Bayesian Dynamic Stopping Language Model (DSLM) [93], which consists of an offline and
an online portion. During the offline session, the probability density functions of target and
non-target signals were computed. These were used in the online sessions to compute the
likelihood of an epoch belonging to one of the two classes. The posterior probability was also
proportional to the prior probability, which depended on the language model. A bigram model
was used for the prior probability of letters A-Z, as formulated in Equation 3.1. The prior
probabilities were weighted by the expected frequency at which cells would be selected: a 0.05
weight for the four command cells and 0.95 for the other cells, i.e., letter cells and suggested
words. Similar to [93], the probability of selecting a letter was split into two terms to account
for previous misclassifications.
P(x_t) = 0.475 × (0.85 × c(x_{t−1}, x_t) / c(x_{t−1}) + 0.15 × c(x_t) / c(*))        (3.1)
where c(x_{t−1}, x_t) is the number of occurrences of the string sequence x_{t−1}x_t in the corpus,
while c(*) is the total number of words in the corpus with a minimum length of t. The first
term represents the prediction based on the bigram language model, i.e., the conditional
probability of typing x_t assuming x_{t−1} was correctly predicted, whereas the second term
ignores the language model, i.e., the probability that x_t occurs in position t of the word,
regardless of what had been previously typed. An empirical confidence of 85% was assigned
to the bigram search, thereby accommodating instances where x_{t−1} may have been incorrectly
predicted. The constant 0.475 was chosen to reflect that approximately half of the time (0.95/2),
the user would not choose among the suggested words and would elect to type. The prior
probabilities for the words were computed as in Equation 3.2. At the beginning of each trial,
all six words had the same probability. As letters were typed, the word probability became
proportional to the corresponding word count, c(w), in the corpus, penalised by a value
dependent on the Levenshtein distance between the typed string and the suggested word. The
penalty value was set to 0.80 empirically.
P(w) = 0.475 × { 1/6                                if no letter has been selected
               { c(w) × (penalty_value)^distance    otherwise                        (3.2)
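Equations 3.1 and 3.2 can be illustrated with a toy implementation. This is a sketch under stated assumptions: we treat c(x_{t−1}, x_t) and c(x_t) as position-specific counts over the corpus, compare the typed string against an equal-length prefix of each suggested word for the Levenshtein penalty, and leave the priors unnormalised (in the running system they would be normalised over all thirty-six cells).

```python
def levenshtein(a, b):
    """Edit distance via the classic dynamic-programming recurrence."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def letter_prior(words, typed, letter):
    """Eq. 3.1 for position t = len(typed) (t >= 1): a bigram term weighted
    0.85 plus a position-frequency term weighted 0.15, scaled by 0.475."""
    t, prev_letter = len(typed), typed[-1]
    big = sum(1 for w in words if len(w) > t and w[t - 1] == prev_letter and w[t] == letter)
    ctx = sum(1 for w in words if len(w) > t - 1 and w[t - 1] == prev_letter)
    pos = sum(1 for w in words if len(w) > t and w[t] == letter)
    tot = sum(1 for w in words if len(w) > t)   # c(*): words long enough for position t
    bigram_term = big / ctx if ctx else 0.0
    position_term = pos / tot if tot else 0.0
    return 0.475 * (0.85 * bigram_term + 0.15 * position_term)

def word_prior(word_counts, typed, word, penalty=0.80):
    """Eq. 3.2: uniform over the six suggestions before any letter is typed,
    then proportional to the corpus count penalised by edit distance."""
    if not typed:
        return 0.475 / 6
    distance = levenshtein(typed, word[:len(typed)])
    return 0.475 * word_counts.get(word, 0) * penalty ** distance
```

For example, after typing "c" in a corpus where every "c"-initial word has "a" next, the bigram term is 1 and the prior is dominated by the 0.85 weight.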
The number of flashes before classification was determined by the posterior probabilities of
the cells. A decision was made once the cell with the maximum probability exceeded a
threshold of 80%.
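The dynamic-stopping rule can be sketched as follows; the evidence stream and the flash budget are illustrative assumptions (in the actual system the per-flash likelihoods come from the offline target/non-target density estimates, and the prior from Equations 3.1-3.2).

```python
import numpy as np

def dynamic_stop(flash_logliks, prior, threshold=0.80, max_flashes=48):
    """Accumulate per-flash log-likelihood evidence over the grid cells and
    classify once the posterior of any cell exceeds `threshold`, or once the
    flash budget is exhausted.

    flash_logliks: iterable of length-N arrays (one per flash) holding each
    cell's log-likelihood contribution; prior: length-N prior over cells."""
    log_post = np.log(np.asarray(prior, dtype=float))
    post = np.exp(log_post - log_post.max())
    post /= post.sum()
    n_flashes = 0
    for loglik in flash_logliks:
        n_flashes += 1
        log_post = log_post + np.asarray(loglik, dtype=float)
        post = np.exp(log_post - log_post.max())   # numerically stable normalisation
        post /= post.sum()
        if post.max() >= threshold or n_flashes >= max_flashes:
            break
    return int(post.argmax()), float(post.max()), n_flashes
```

Because evidence accumulates multiplicatively, strongly discriminable responses cross the 80% threshold after only a few flashes, which is what makes the variable flash count pay off.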
3.3.4 Evaluation metrics
BCI systems are usually assessed based on their accuracy and speed. Most commonly,
information transfer rate (ITR) is computed as a measure of speed; however, it is not a suitable
metric for this system since it assumes an equal probability for all cells and a uniform
distribution of errors across the grid. Another issue introduced by a predictive speller is that
incorrectly selected words may differ in length from the target. Therefore, considering
accuracy at the character level is not informative [20]. To mitigate this issue, Speier et al.
proposed word-level accuracies and speed estimates through mutual information (MI) [20]. In
this study, the MI computation had to be slightly altered for our system since, unlike traditional
predictive spellers, not all words in the corpus were considered for each selection. For each
selection in our system, only a subset of words belonging to a specific, context-relevant
category was considered. Therefore, to estimate the bit rate, we considered the subset of the
corpus for each selection and took an average over all selections. For the sake of comparison
with previous work, we have nonetheless reported both ITR and MI.
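For reference, the classical (Wolpaw) ITR rests on exactly the assumptions the paragraph above flags as violated here: equiprobable cells and errors spread uniformly over the wrong cells. A minimal sketch:

```python
import math

def wolpaw_itr(n_cells, accuracy, selections_per_minute):
    """Bits per minute under the Wolpaw model: N equally likely targets and
    errors distributed uniformly over the N - 1 incorrect cells."""
    n, p = n_cells, accuracy
    bits_per_selection = math.log2(n)
    if 0.0 < p < 1.0:
        bits_per_selection += p * math.log2(p) + (1 - p) * math.log2((1 - p) / (n - 1))
    return bits_per_selection * selections_per_minute
```

At 100% accuracy on a 36-cell grid each selection carries log2(36) ≈ 5.17 bits, so speed differences translate directly into ITR differences; the MI measure of Speier et al. [20] instead replaces the uniform assumptions with language-model probabilities.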
3.4 Results
3.4.1 ERP response
As expected, all participants exhibited a negative peak in their EEG response around 200 ms
and a positive peak around 300 ms after stimulus presentation (Figure 3.3). This corroborates
the waveforms reported in previous works using familiar face stimuli [15], [104].
Topographic scalp maps were generated using EEG data from participant two in the target and
non-target conditions (Figure 3.4). These maps provide some insight into the regions of the
brain involved in the selective attention task.
Figure 3.3: Average and standard deviation of stimulus response for participant 2 for target (blue) and non-target
(orange) stimuli. Signals were averaged across channels PO7 and PO8. The first arrow indicates the N200 peak,
a negative peak induced by the familiar face stimulus and the second arrow points to the P300 occurrence.
Figure 3.4: Topographic map of ERP response in participant 2. This figure shows the expected negative
inflection at 200 ms followed by a positive inflection.
3.4.2 Online performance
Participants achieved an accuracy of at least 95.97%, with an average ITR of 43.33 bits/minute.
These accuracies significantly exceeded the chance level of 66.67% (p < 0.05). Unsurprisingly,
misclassifications tended to involve cells proximal to the target, i.e., they were attributable to
flashes of cells in the neighbourhood of the target. Considering the word-level metrics, a
minimum accuracy of 96.29% was obtained with an average MI of 10.67 bits/minute. Table 3.1
summarises the performance of all participants for the constrained blocks in the online sessions.
Table 3.1: Average character (ACC_c) and word (ACC_w) accuracies, information transfer rate (ITR) and
mutual information (MI) for the constrained selection blocks in online sessions.
Participant   ACC_c (%)   ITR (bits/minute)   ACC_w (%)   MI (bits/minute)
1 95.97 27.3 98.14 7.21
2 100 49.37 100 11.8
3 97.56 52.34 98.14 12.74
4 97.42 39.17 96.29 10.09
5 100 49.56 100 10.99
6 100 44.29 100 10.81
7 97.52 37.59 98.14 9.42
8 99.43 49.43 100 11.28
9 99.39 46.61 100 12.88
10 97.11 37.65 98.15 9.64
Average 98.44 43.33 98.89 10.67
STD 1.48 7.78 1.3 1.61
Unconstrained selection blocks
In the unconstrained blocks, all participants achieved higher than chance-level accuracy in both
blocks; chance levels were 82% and 71% for the CD and CI blocks, respectively (p < 0.05).
With the CI predictive speller, a minimum accuracy of 90.97%, an average ITR of 18.65
bits/minute and 3.65 CPM were achieved. By incorporating context awareness, all participants
achieved significantly higher accuracy, with a minimum of 97.85% (p = 0.01), an average ITR
of 42.64 bits/minute (p << 10^−5) and 8.38 CPM (p = 0.005).
Considering word accuracy and MI, all participants performed better with the CD speller
(Table 3.3). With the CI predictive speller, participants selected on average 0.67 words/minute
with 94% accuracy, resulting in an average mutual information rate of 6.35 bits/minute. When
using the answer generation engine and CD speller, participants achieved significant
improvements, with an average of 1.49 words/minute (p = 0.005), 11.11 bits/minute (p = 0.005)
and an accuracy of 98.66% (p = 0.009).
Table 3.2: Character selection rates, accuracies and information transfer rates for all participants using the
context independent and context dependent predictive spellers
Participant   CPM (characters/minute)   ACC_c (%)   ITR (bits/minute)
              CI       CD                CI    CD    CI       CD
1 2.91 8.08 90.97 100 12.28 38.6
2 3.7 9.93 96.67 97.85 17.94 50.4
3 3.63 7.79 97.23 98.55 16.37 39.45
4 3.31 5.37 98.92 100 16.7 27.78
5 3.93 11.25 96 100 18.36 58.18
6 3.49 7.82 100 100 18.04 40.43
7 2.71 5.72 100 100 13.99 29.59
8 4.88 10 97.22 100 32.7 51.69
9 4.21 10.64 100 100 21.76 54.99
10 3.77 7.17 97.3 100 18.4 35.33
Average 3.65 8.38 97.43 99.64 18.65 42.64
STD 0.59 1.92 2.72 0.78 5.57 10.6
Table 3.3: Word selection rates, accuracies and mutual information for all participants using the context
independent and context dependent predictive spellers
Participant   WPM (words/minute)   ACC_w (%)   MI (bits/minute)
              CI      CD            CI    CD    CI      CD
1 0.65 1.77 86.67 100 5.22 13.34
2 0.71 1.9 93.33 93.33 6.35 13.48
3 0.65 1.41 86.67 93.33 6.09 10.73
4 0.56 0.9 93.33 100 5.13 6.82
5 0.67 1.85 93.33 100 5.81 14.06
6 0.55 1.23 100 100 5.53 9.29
7 0.56 1.19 100 100 7.03 9.12
8 0.94 1.47 93.33 100 8.83 10.39
9 0.83 2.1 100 100 8.35 16.03
10 0.56 1.04 93.33 100 5.15 7.88
Average 0.67 1.49 94 98.66 6.35 11.11
STD 0.12 0.38 4.92 2.81 1.26 2.83
3.4.3 Surveys
NASA TLX surveys were collected and analysed after each session. From the offline to the
online sessions, there seemed to be a decrease in the levels of mental demand, effort and
frustration. This reduction was expected, as the number of flashes was fixed at fourteen for the
offline session but variable and capped at four sequences for the online sessions. As such, the
stimulus intervals, and hence the period of required attention (effort and mental demand), were
shortened, likely inducing less frustration among the participants. A decrease in temporal
demand and effort was seen in 60% of the participants across the online sessions.
Comparing the weights of all six factors, mental demand had the highest rank with an average
of 3.73 ± 1.44, which was not surprising given that the BCI task required attention. The overall
task load was below 28.57/100 for all participants, and for 60% of them a decrease in overall
task load was seen between the first and last online sessions.
All participants preferred the CD block, stating that it was easier, afforded more flexibility in
expressing their answers, reduced mental demand and fatigue, and converged to their desired
answers faster. The comments on the CI block were that the irrelevant suggestions were
distracting and at times caused frustration, as more selections were necessary to arrive at the
desired answer.
3.5 Discussion
Incorporating the answer generation engine and context-dependent predictive speller increased
ITR on average by 128% and character accuracy by 2.3%. Likewise, MI increased by 75%
while word accuracy rose by 5%. These significant improvements were due to the ability to
select an appropriate word with fewer selections, if not immediately at the beginning of a trial.
For this study, we built a context-based corpus consisting of twenty different categories and
3302 words. This corpus was created manually and had fewer words compared to standard
corpora such as the Brown corpus [116]. The size of the corpus impacts the system's
performance in two ways: one being the diversity of word suggestions and the other being the
mutual information rate. It is important to have a broad enough corpus to be able to predict any
word the participant might think of. For a small number of questions in the unconstrained
selection blocks, the word that some participants had in mind did not exist in the corpus
(participant 2 for two questions, participants 4 and 10 for one question each). In such trials, the
participants' only recourse was to type out the entire word, leading to increased completion
times and numbers of selections. However, since this occurred for at most two questions for
any participant, the context-dependent paradigm remained advantageous in terms of the
selected metrics. We updated our corpus after each of these sessions. The size of the corpus also affects
the mutual information rate. Recall that the word bit rate is the amount of information that is
conveyed in a single word selection [101]. The more words in a corpus, the lower the
probability of each word. The summation of such small word probabilities over the entire
corpus leads to a higher number of bits per selection. To compute the information rate over
time, this bit rate is multiplied by the average number of words per minute. Although we had
fewer words in our corpus, its context awareness led to a significant difference in WPM
between the CD and CI blocks (p < 0.005), resulting in a higher MI rate compared to the
larger CI corpus.
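The corpus-size argument above is the Shannon entropy of the word-selection distribution; a small illustrative sketch (ignoring, for simplicity, the error handling in the full MI computation of [20], [101]):

```python
import math

def bits_per_selection(word_probs):
    """Expected information (in bits) conveyed by one word selection, i.e.
    the Shannon entropy of the distribution over candidate words."""
    return -sum(p * math.log2(p) for p in word_probs if p > 0)

# A uniform choice among more candidates carries more bits per selection:
small_corpus = bits_per_selection([1 / 8] * 8)     # 3 bits per selection
large_corpus = bits_per_selection([1 / 64] * 64)   # 6 bits per selection
```

A context-restricted subset therefore carries fewer bits per selection, but the much higher words-per-minute it enables is what gave the CD speller the higher overall MI rate.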
It is important to note that performance metrics, e.g., ITR and MI, are highly dependent on the
design and timing of the paradigm, the length of the words the participant decides on, and the
software and hardware utilised. Some studies have created their own software [17], [98], while
others have deployed products available on the market [19], [21]. Also, different studies utilise
different machines, bioamplifiers, caps and other instrumentation. Therefore, it is not possible
to conduct an objective comparison between studies. The focus of this study was to investigate
the effect of combining a P300 BCI, a context-relevant predictive speller and an answer
generation engine in a single adjacency pair conversation. Therefore, a core component of our
paradigm was asking and/or displaying a question on the screen for a few seconds, giving the
participant time to process what they had been asked. No previous study has examined a BCI
speller in the context of a conversation, and earlier paradigms thus had shorter time gaps
between selections. To enable a comparison with previous studies, we conducted the CI and
CD blocks at the end of each online session and measured the performance. Although the
additional time allocated to the beginning of each trial clearly reduces the measured ITR and
MI in general, when comparing context-independent and context-dependent systems in a
question and answer context, our findings point to distinct speed and accuracy advantages of
the latter.
This paper verified the potential improvements achievable in a P300 speller by integrating a
context relevant predictive speller and answer generation engine in a single adjacency pair
conversation. However, the proposed paradigm was tested exclusively with typically
developed adults. In the spirit of previous studies that have reported promising use of P300
spellers by clinical populations [19], [50], [91], [117], further investigation is necessary to
confirm the usefulness of this system with individuals with complex communication
challenges, e.g. individuals with ALS or CP.
3.5.1 Limitations and future directions
The manually constructed context-based corpus was limited compared to standard corpora. In
order to reduce the chance of out-of-vocabulary (OOV) words during interaction with the
system, adaptive and automatic addition of new words to the corpus would be beneficial. As
discussed in chapter 2, we split the probability of selecting a letter into two terms to account
for both correct and incorrect previous selections. This computation can be used to flag whether
the participant is trying to select a word outside the corpus and to automatically add that word
to the appropriate corpus category.
Another way of expanding the corpus could be to algorithmically screen internet articles and
webpages, preprocess the text, detect the category of each word and automatically add them to
a context-based dictionary.
Extension of the language model will be necessary to account for typing phrases and sentences.
This will possibly require modifications to the language model to transition from word to space
while maintaining the context. The proposed interface supported a unidirectional conversation
led by the researcher. However, a more realistic system should allow for a bidirectional
conversation, affording more control to the participant. It is therefore important to
accommodate both conversational response and initiation. From an implementation
perspective, one possible approach to this challenge is a command button that switches
between response and initiation modes, where, for example, the latter would allow the
participant to pose a question to their conversation partner. This would lead to a cumulative
context that needs to be tracked by the NLP engine to make appropriate suggestions as the
dialogue evolves.
Allowing for interaction between this system and other software, such as games and web
browsers, is another direction that would lead to more realistic use cases of BCI applications.
Also, many aspects of the BCI hardware itself require further simplification and improvement
to allow daily usage by individuals with communication challenges, e.g., comfortable gel-free
electrodes on wireless caps and quicker set-up times.
It will be useful to have a GUI that allows for customisation. Some participants may prefer
speed over accuracy and therefore be willing to decrease the decision threshold. This is
understandable as in many cases the intended answer can be comprehended regardless of some
misclassifications. Another useful feature could be to allow the participant or caregiver to
predefine the initial suggestions based on common words preferred by the participant, e.g.
favourite food, games, etc.
3.6 Conclusion
In this work a communication system was designed with the ultimate objective of improving
the conversational function of a P300 speller. Our findings suggest that machine awareness of
conversational context, as realized through a combination of a context sensitive predictive
speller and an answer generation engine, can significantly improve classification speed and
accuracy in the P300 speller in single adjacency pair conversations. Subjective workload is
also reduced in the context-dependent paradigm. Collectively, these findings support future
incorporation of natural language processing, predictive spelling and language models in brain-
controlled communication devices.
Results
4.1 Overview
In this chapter, we expand upon the results of our study and compare them with previous
studies. Since the accuracies and bit rates were not normally distributed, the Wilcoxon
signed-rank test was used. Note that some results in section 4.5.2 are replicated from the
previous chapter; here, however, we discuss them in further detail.
4.2 Feature Extraction
We computed the temporal and spectral features as described in chapter 2. This feature set
performed poorly in distinguishing between the target and non-target groups, as seen in
Figure 4.1.
We then concatenated the EEG signals from the eight channels and, as depicted in Figure 4.2,
the distributions were distinctly separable. The shapes of the distributions were similar for all
participants, with slight differences in the separation of the target and non-target groups.
Figure 4.1: LDA score distribution of target and non-target signals for participant 8 using the spatiotemporal
features as described in chapter 2. This set of features was not able to differentiate between target and non-target
signals.
Figure 4.2: LDA score distribution for target and non-target for participant 8 using the concatenation method.
This set of features was able to differentiate between target and non-target signals.
4.3 ERP responses
As expected, all participants exhibited a negative peak in their EEG response around 200 ms
and a positive peak around 300 ms after stimulus presentation (Figure 4.3). This corroborates
the waveforms reported in previous works using familiar face stimuli [15], [104].
Topographic scalp maps were generated using EEG data from participant two in the target and
non-target conditions (Figure 4.4). These maps provide some insight into the regions of the
brain involved in the selective attention task.
4.4 Participant-specific offline classification results
As the number of flashes increased, the accuracy also increased due to the accumulation of
P300 scores (Figure 4.5); however, there was a trade-off between accuracy and speed. We
therefore had our system dynamically decide whether to classify a selection or continue
flashing. This decision was determined by two factors.
1- Threshold, i.e., whether any of the cells had a probability higher than a certain value.
2- Maximum number of flashes, which was empirically set to four for the online sessions
after inspecting the offline data. Note that the number of flashes is per row/column, i.e.,
four flashes per row means each cell flashes eight times at maximum.
We investigated the effect of the threshold on performance in Figure 4.6. As expected, the
higher the threshold, the more information was gathered and the better the performance. The
threshold value was determined through multiple pilot runs with different individuals. After
each session, we inspected the highest probability among the thirty-six cells after each flash
sequence. We noticed that in almost all selections the highest probability attained by a
non-target cell was less than 80%, i.e., when a cell exceeded an 80% chance of being the target,
it was most likely the true target. We therefore empirically set the threshold value to 80% for
all participants. The average character and word accuracies for the offline session with a fixed
threshold of 80% were 96.49 ± 4.76% and 96.34 ± 5.66%, respectively. These values were the
result of averaging over a 10-fold cross-validation. All participants achieved above-chance
accuracy, i.e., >64% (p < 0.05), which was estimated using the binomial distribution and the
number of selections, as the dataset was relatively small [118].
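The binomial estimate of the chance level can be sketched as below. The per-selection guessing probability is left as a free parameter here, since the effective number of candidate cells in this paradigm (and hence the reported chance levels of 64-66.7%) depends on the experimental design rather than a uniform 1/36 guess; the machinery is the same either way.

```python
from math import comb

def chance_level(n_selections, p_guess, alpha=0.05):
    """Smallest proportion of correct selections at which a random guesser
    would be rejected at level alpha: the (1 - alpha) quantile of a
    Binomial(n_selections, p_guess), expressed as an accuracy."""
    cdf = 0.0
    for k in range(n_selections + 1):
        cdf += comb(n_selections, k) * p_guess ** k * (1 - p_guess) ** (n_selections - k)
        if cdf >= 1 - alpha:
            return k / n_selections
    return 1.0
```

With few selections the significance threshold sits well above p_guess and shrinks as selections accumulate, which is why a small dataset calls for a binomial rather than an asymptotic estimate [118].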
The misclassifications occurred at cells adjacent to the target. This was expected, as the
illumination of cells close to the target is a source of distraction and, to some degree, inevitable
even with pseudo-random flashes. It took on average two sequences, i.e., four flashes, to
achieve this accuracy. While only four out of fourteen flashes were taken into account to obtain
this high accuracy, we included the additional data to ensure that we trained a generalizable
model. Table 4.1 summarises the offline performance for all participants.
Figure 4.5: ERP classification accuracy in 10-fold cross-validation versus the number of stimulus repetitions.
Figure 4.6: ERP classification accuracy in 10-fold cross-validation versus the threshold value.
Table 4.1: Offline performance
Participant   ACC_c (%)   ACC_w (%)   Average # Repetitions
1 100 100 2.4
2 96.89 96.67 1.78
3 100 100 1.66
4 100 100 2.44
5 96.89 96.67 1.28
6 96.27 96.67 2.08
7 96.27 96.67 2.53
8 95.34 96.67 1.63
9 100 100 1.62
10 83.23 80 2.78
Average 96.49 96.34 2.02
STD 4.76 5.66 0.47
4.5 Participant-specific classification online results
4.5.1 Constrained Blocks (Blocks 2-4)
All participants achieved above-chance accuracy (chance: 66.67%; p < 0.05). In many cases,
mistakes occurred while the participant was distracted by neighbouring cells, as in the offline
session. Table 4.2 summarises the performance of all participants for blocks 2-4 of the online
sessions.
Table 4.2: Average accuracies, information transfer rate and mutual information for constrained blocks in
online sessions.
Participant   ACC_c (%)   ITR (bits/minute)   ACC_w (%)   MI (bits/minute)
1 95.97 27.3 98.14 7.21
2 100 49.37 100 11.8
3 97.56 52.34 98.14 12.74
4 97.42 39.17 96.29 10.09
5 100 49.56 100 10.99
6 100 44.29 100 10.81
7 97.52 37.59 98.14 9.42
8 99.43 49.43 100 11.28
9 99.39 46.61 100 12.88
10 97.11 37.65 98.15 9.64
Average 98.44 43.33 98.89 10.67
STD 1.48 7.78 1.3 1.61
4.5.2 Unconstrained selection blocks
The performance of the context-dependent (CD) and context-independent (CI) unconstrained
selection blocks was compared in terms of accuracy, ITR, MI, number of selections and
completion time. All participants achieved higher than chance-level accuracy in both blocks;
chance levels were 82% and 71% for the CD and CI blocks, respectively (p < 0.05). All
participants achieved a higher information transfer rate when using the CD predictive speller.
When using the CI predictive speller, participants selected on average 3.65 characters/minute
with 97.43% accuracy, resulting in an average bit rate of 18.65 bits/minute. When using the
CD speller, participants achieved significant speed improvements, with an average CPM of
8.38 characters/minute (p = 0.005), an average bit rate of 42.64 bits/minute (p = 1.06 × 10^−6)
and an accuracy of 99.64% (p = 0.01). For more details, refer to Table 4.3. Figures 4.7 and 4.8
graphically depict character-level performance differences between the CI and CD blocks for
each participant, specifically in terms of character accuracy and information transfer rate,
respectively.
The high standard deviation of participant 2 (CD block) in Figure 4.8 arose because two of the
five target words in the participant's first online session did not exist in the context-based
corpus, so the participant had to type the entire words. We discuss this matter in further detail
in section 5.2.
Table 4.3: Selection rates, character accuracies and information transfer rates for all participants using the
context independent and context dependent predictive spellers
Participant   CPM (characters/minute)   ACC_c (%)   ITR (bits/minute)
              CI       CD                CI    CD    CI       CD
1 2.91 8.08 90.97 100 12.28 38.6
2 3.7 9.93 96.67 97.85 17.94 50.4
3 3.63 7.79 97.23 98.55 16.37 39.45
4 3.31 5.37 98.92 100 16.7 27.78
5 3.93 11.25 96 100 18.36 58.18
6 3.49 7.82 100 100 18.04 40.43
7 2.71 5.72 100 100 13.99 29.59
8 4.88 10 97.22 100 32.7 51.69
9 4.21 10.64 100 100 21.76 54.99
10 3.77 7.17 97.3 100 18.4 35.33
Average 3.65 8.38 97.43 99.64 18.65 42.64
STD 0.59 1.92 2.72 0.78 5.57 10.6
Figure 4.7: Comparing average character accuracy using the context independent (green) and context dependent
(red) predictive spellers for all participants.
Figure 4.8: Comparing average information transfer rate using the context independent and context dependent
predictive spellers for all participants.
When using word-level metrics, all participants achieved a higher bit rate with the CD
predictive speller, as reported in Table 4.4. With the CI predictive speller, participants selected
on average 0.67 words/minute with 94% accuracy, resulting in an average mutual information
rate of 6.35 bits/minute. When using the CD speller, participants roughly doubled their speed,
with an average WPM of 1.49 words/minute (p = 0.005), an average bit rate of 11.11
bits/minute (p = 0.005) and an accuracy of 98.66% (p = 0.009).
Figures 4.9 and 4.10 graphically contrast the CI and CD blocks for each participant on a word
level.
Table 4.4: Selection rates, word accuracies and mutual information for all participants using the context
independent and context dependent predictive spellers
Participant   WPM (words/minute)   ACC_w (%)   MI (bits/minute)
              CI      CD            CI    CD    CI      CD
1 0.65 1.77 86.67 100 5.22 13.34
2 0.71 1.9 93.33 93.33 6.35 13.48
3 0.65 1.41 86.67 93.33 6.09 10.73
4 0.56 0.9 93.33 100 5.13 6.82
5 0.67 1.85 93.33 100 5.81 14.06
6 0.55 1.23 100 100 5.53 9.29
7 0.56 1.19 100 100 7.03 9.12
8 0.94 1.47 93.33 100 8.83 10.39
9 0.83 2.1 100 100 8.35 16.03
10 0.56 1.04 93.33 100 5.15 7.88
Average 0.67 1.49 94 98.66 6.35 11.11
STD 0.12 0.38 4.92 2.81 1.26 2.83
These results were expected, as the CD predictive speller produced relevant suggestions as
soon as the question was asked, decreasing the number of selections, the chance of error and
the completion time, and thereby minimising user fatigue.
All participants completed the unconstrained selection block in less time and with fewer
selections when using the CD speller (Table 4.5). With the CI predictive speller, participants
answered the five questions in 9.07 minutes on average, with an average of 24.33 selections.
When using the CD speller, participants' completion times improved by 57.55%, with an
average completion time of 3.85 minutes (p = 0.005) and an average of 10.53 selections (p =
0.005). Figures 4.11 and 4.12 graphically compare the completion time and number of
selections between the CI and CD blocks for each participant.
Figure 4.9: Comparing average word accuracy using the context independent (green) and context dependent (red)
predictive spellers for all participants.
Figure 4.10: Comparing average mutual information using the context dependent and context independent
predictive spellers for all participants.
Table 4.5: Completion time and number of selections for all participants using the context independent and
context dependent speller.
Participant   Completion Time (minutes)   Number of Selections
              CI       CD                  CI       CD
1 8.15 2.98 17.33 8.33
2 7.37 3.9 22 10.67
3 8.18 3.76 27.67 11.33
4 8.97 5.61 23.67 14
5 8.54 2.8 26 6.67
6 9.18 4.28 29.67 12.67
7 9.07 4.28 23.67 9.67
8 7.37 3.44 28 11.33
9 6.19 2.46 20.33 6.67
10 8.95 5 25 14
Average 9.07 3.85 24.33 10.53
STD 0.92 0.94 3.57 2.56
Figure 4.11: Comparing average completion time using the context dependent and context independent predictive
spellers for all participants.
Figure 4.12: Comparing average number of selections using the context dependent and context independent
predictive spellers for all participants.
4.6 Surveys
At the end of each session, the NASA TLX survey was administered to the participants (see
Appendix A.1) to gauge their experience in terms of: 1) the overall workload of the different
tasks and 2) the main sources of workload [50]. Task load is defined as a "hypothetical
construct that represents the cost incurred by a human operator to achieve a particular level of
performance" [108]. In this standard survey, participants were asked to rate six factors in terms
of difficulty from 0 to 100: mental demand, physical demand, temporal demand, performance,
effort and frustration. The participants then weighed each of these factors against the others in
pairwise comparisons. These ratings and weights were then used to compute the overall task
load of interacting with the system.
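The standard weighted NASA-TLX score combines the 0-100 ratings with weights from the 15 pairwise comparisons of the six factors. A sketch with invented responses for one hypothetical participant:

```python
def tlx_overall(ratings, weights):
    """Weighted NASA-TLX workload score (0-100). `weights[f]` is the number
    of the 15 pairwise comparisons that factor f won, so weights sum to 15."""
    assert sum(weights.values()) == 15
    return sum(ratings[f] * weights[f] for f in ratings) / 15.0

# Illustrative (made-up) responses; the factor names follow the survey.
ratings = {"mental": 40, "physical": 5, "temporal": 20,
           "performance": 15, "effort": 30, "frustration": 10}
weights = {"mental": 5, "physical": 0, "temporal": 3,
           "performance": 2, "effort": 4, "frustration": 1}
overall = tlx_overall(ratings, weights)  # (200 + 0 + 60 + 30 + 120 + 10) / 15
```

The weighting step is what lets a factor such as mental demand, which won the most pairwise comparisons here, dominate the overall score even when other ratings are comparable.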
From the offline to the online sessions, there seemed to be a decrease in the levels of mental
demand, effort and frustration. This was expected, as the number of flashes was fixed at
fourteen for the offline session whereas for the online blocks it was at most six (the maximum
was set to eight in our design, but for all participants and all sessions at most six flashes were
needed before deciding on a selection). This decrease shortened the stimulus intervals, leading
to a shorter period of required attention (effort and mental demand) and less frustration among
the participants.
Mental demand ranks were generally consistent for each individual but varied between
participants.
A decrease in the temporal demand and effort was seen among 60% of the participants across
online sessions. These individuals mentioned in the additional survey that as they attended
more sessions, they became more accustomed to the timing of the trials.
There seems to be a negative correlation between effort and frustration. This is plausible, as
the more effort one invests in attending to the stimuli, the faster and more accurate the system
becomes, which can lead to less frustration.
Comparing the weights of all six factors, mental demand had the highest rank with an average
of 3.73 ± 1.44, which was not surprising since the BCI task necessitated visual attention.
The overall task load was less than 28.57/100 for all participants, and for 60% of them a
decrease in overall task load was seen between the first and last online sessions.
For the online sessions, we provided an additional survey (see Appendix A.2) asking
specifically about the participants' preference regarding the unconstrained selection blocks and
the reasoning behind their choice. All participants preferred the CD block, stating that it was
easier and more flexible, reduced mental demand and fatigue, and converged to their desired
answers faster. The comments on the CI block were that the irrelevant suggestions were
distracting and at times caused frustration, as more selections were required to reach the
desired answer.
Discussion
5.1 Overview
While the results presented in the previous chapter were in line with our expectations, there remain a number of challenges that must be addressed to further improve this system. In this
chapter, we will discuss different aspects of the designed NLP-BCI system that require further
study and inspection. This chapter is an extended version of the discussion section (3.5) from
Chapter 3.
5.2 Context-based corpus
Recall that for this study we built a context-based corpus consisting of twenty different
categories and 3302 words. This corpus was created manually and thus was modest in size
compared to standard corpora such as the Brown corpus [116]. The size of the corpus affects
the system’s performance in two ways.
One is the variety of suggestions made; it is important to have a broad enough corpus such that
any word of which the participant thinks could be predicted. For a small number of questions
in the unconstrained selection blocks, the answers that some participants had in mind did not
exist in the corpus (participant 2 for two questions, participant 4 and participant 10 for one
question). In such trials, the participants had to type out the entire word, leading to an increased
completion time and number of selections. However, since this occurred for at most two questions for any participant, the overall findings were unaffected. We updated our corpus after each of these sessions.
The size of the corpus also affects the mutual information rate. As mentioned earlier, word bit
rate is the amount of information that is conveyed in a single word selection [101]. The more
words in a corpus, the lower the probability of each word. The summation of such small word
probabilities over the entire corpus leads to a higher bit rate per selection. To compute the
information rate over time, this bit rate is multiplied by the average number of words per
minute. Although we had fewer words in our corpus, its context awareness led to a significant
difference in the WPM between the CD and CI blocks (p < 0.005) resulting in a higher MI rate
compared to the larger CI corpus.
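The relationship between corpus size and bits per selection can be made concrete with Shannon entropy. The sketch below is illustrative only: it assumes, for simplicity, a uniform distribution over corpus words, whereas the actual language model assigns context-dependent probabilities, and the large-corpus size is a stand-in value.

```python
import math

def word_entropy_bits(word_probs):
    # Average information (bits) conveyed by one word selection: the more
    # words share the probability mass, the higher this entropy.
    return -sum(p * math.log2(p) for p in word_probs if p > 0)

def mutual_information_rate(word_probs, words_per_minute):
    # Bits per minute: bits per selection multiplied by selections per minute.
    return word_entropy_bits(word_probs) * words_per_minute

# Uniform toy distributions; 3302 is this corpus's size, 1_000_000 a
# hypothetical stand-in for a large context-independent corpus.
print(round(word_entropy_bits([1 / 3302] * 3302), 2))            # ~11.69 bits/selection
print(round(word_entropy_bits([1 / 1_000_000] * 1_000_000), 2))  # ~19.93 bits/selection
```

Under this uniform assumption the smaller corpus conveys fewer bits per selection, so the higher MI rate observed for the CD block comes from its significantly higher words-per-minute factor.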
5.3 Design parameters
It is important to note that performance metrics such as the ITR and MI are highly dependent on the design and timing of the paradigm, the length of the words the user decides on, and the software and hardware utilised. Some studies have created their own software [17], [98], while others have used products available on the market [19], [21]. Different studies also utilise different machines, bioamplifiers, caps, etc. Therefore, an objective comparison between studies is not possible.
The focus of this study was to investigate the effect of combining a P300 BCI, context relevant
predictive speller and an answer generation engine in a single adjacency pair conversation.
Therefore, a core component of our paradigm was asking and/or displaying a question on the
screen for a few seconds giving the participant time to process what they have been asked.
None of the previous studies examined a BCI speller in the context of a conversation and thus had shorter time gaps between selections. In order to compare previous
studies with ours, we conducted the CI and CD block at the end of each online session and
measured the performance. Although the additional time allocated at the beginning of each trial reduces the measured ITR and MI in general, when comparing a context-independent and a context-dependent system in a question-and-answer context, our results validate that the context-dependent system outperforms the context-independent one.
There were a number of other design decisions made in this study, such as dividing the trials into single and iterative selections. This division was based on the assumption that approximately half of the time, the user would not find their answer among the suggestions and would start typing letters until the word was suggested. The results from the unconstrained context-dependent selection blocks validated this assumption: participants found their desired response among the initial suggestions 56% of the time on average. It is worth noting that the order of the two unconstrained blocks was pseudo-randomised to minimise order effects.
Another design parameter was the decision threshold and number of stimuli repetitions.
Different approaches have been taken regarding this parameter in previous studies. Earlier
studies used a fixed number of repetitions [21], [22] based on offline accuracy. This does not
seem to be appropriate given the inter and intra-participant variability [93]. More recent studies
have investigated a dynamic (early) stopping criterion [92], [93]. Speier et al. optimised the
threshold per participant based on offline bit rates [92], but used a constant value of 95% in
their later studies [20], [119]. Kindermans et al. and Mainsah et al. empirically allocated fixed
thresholds of 99% and 90% for all participants [93], [107]. A similar approach to [93] was
taken in this design; based on pilot results of multiple participants, we decided on the value of
80%.
The maximum number of repetitions also varied among different studies. Speier et al. and
Kindermans et al. set a maximum of fifteen sequences before making a decision, i.e. if no cell
exceeded the threshold and every cell had been illuminated thirty times, a decision had to be
made [20], [92], [107], [119]. Mainsah et al. set the maximum number of stimulus sequences to seven (fourteen flashes of each cell) in an earlier study, and later to ten sequences (twenty flashes of each cell) [93], [94]. None of these studies set a minimum number of flashes; after the first sequence, the maximum cell score or probability was compared with the threshold value and a classification decision was made accordingly. However, Kaufmann et al.
set a minimum of four sequences (eight flashes) to prevent high error rates in long spelling
sessions and a maximum based on the offline results of each participant. If the offline accuracy
exceeded 75% for a participant, the maximum was set to two more than the number of
repetitions that yielded the highest offline accuracy. In the case that a participant did not
achieve a minimum of 75% accuracy offline, a maximum of fifteen sequences was set for their
online sessions [22]. Based on these studies and our own pilot sessions, we set the maximum
repetition to four sequences. The collected data validated this choice as the average number of
flash sequences prior to a decision was 1.99 ± 0.39. Previous studies have not reported this
average, precluding direct comparison with past literature.
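The dynamic stopping rule adopted in this design (an 80% threshold with a maximum of four sequences) can be sketched as follows. The cell-posterior interface and the toy probability trajectory are hypothetical; only the threshold and sequence cap follow the design described above.

```python
def classify_with_early_stopping(sequence_posteriors, threshold=0.80, max_sequences=4):
    """After each flash sequence, select the best-scoring cell if its
    cumulative posterior exceeds the threshold; otherwise continue, up to
    a fixed maximum number of sequences.

    sequence_posteriors yields, per sequence, a dict of cell -> posterior.
    Returns (selected_cell, sequences_used).
    """
    best, n = None, 0
    for n, posteriors in enumerate(sequence_posteriors, start=1):
        best = max(posteriors, key=posteriors.get)
        if posteriors[best] >= threshold or n >= max_sequences:
            break
    return best, n

# Hypothetical posterior trajectory for a three-cell toy grid: the decision
# is reached after the second sequence, once cell "A" exceeds 0.80.
trajectory = [{"A": 0.45, "B": 0.30, "C": 0.25},
              {"A": 0.85, "B": 0.10, "C": 0.05}]
print(classify_with_early_stopping(trajectory))  # ('A', 2)
```

When no cell reaches the threshold, the rule falls back to the best-scoring cell after the maximum number of sequences, mirroring the fixed-cap behaviour described above.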
Artefact detection and removal is another important design factor. Studies have taken different
approaches regarding ocular artefact removal. Some studies have band pass filtered the data in
bands higher than the range of ocular and blink artefacts (>3 Hz) [5]. Some have used other
methods such as Eye Movement Correction Procedure (EMCP), which estimates a propagation
factor describing the relationship between electro-oculogram (EOG) and EEG records [120].
Yet other studies have shown that the classifier is able to detect valid ERPs in an epoch without
artefact rejection [20], [22]. We further investigated this statement by conducting a pilot session
where the participant was asked to blink abnormally during the stimulus presentation. No
changes were noticed in the system’s performance. We therefore decided not to take any
additional step in ocular artefact removal.
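For reference, the filtering approach of [5], i.e. band-pass filtering above the ocular artefact band rather than explicit artefact rejection, can be sketched as below. The sampling rate and upper cut-off are hypothetical placeholders, not the parameters used in this study.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def bandpass_eeg(eeg, fs=256.0, low=3.0, high=30.0, order=4):
    # Zero-phase Butterworth band-pass; a low cut-off above ~3 Hz attenuates
    # slow ocular/blink components without a separate rejection step.
    b, a = butter(order, [low, high], btype="band", fs=fs)
    return filtfilt(b, a, eeg, axis=-1)

# Hypothetical use: 8 channels, 2 s of EEG at 256 Hz.
eeg = np.random.randn(8, 512)
print(bandpass_eeg(eeg).shape)  # (8, 512)
```

Zero-phase filtering (forward-backward `filtfilt`) avoids shifting the P300 latency, which a causal filter would introduce.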
5.4 Interface modifications
The designed interface validated our research hypothesis; however, there remain questions as
to whether the arrangement of the grid was the most appropriate. Past studies with predictive
spellers have proposed different designs, some of which were described in Chapter 1. Ryan et al. and Akram et al. presented participants with an additional window listing the top retrieved suggestions from the dictionary [17], [21], and participants had to focus on the number in the original grid corresponding to the desired word in the suggestion list. This was subsequently shown to increase the cognitive workload for the user, resulting in a decrease in accuracy [22]. A later study presented the suggested words in an additional column, positioned to the left of the 6 × 6 letter grid, further from the grid than the intercolumn distance within the grid [22]. Another study designed a 6 × 6 grid that included the positive single-digit numbers. After the user typed a letter, the top six suggestions replaced the letters in the first row and the rows were shifted one row downwards, keeping only digits eight and nine [20]. It appears that in this experimental design, the user was never asked to select a number, and therefore replacing the numbers with suggestions did not affect the user's target cell. Guy et al. designed an asymmetric grid with 43 cells and the suggestions in the rightmost column [19].
mentioned the reasoning behind their design. Additional research must be completed to
determine the optimal BCI interface from a human factors perspective. During the initial online
sessions (CD block), some participants missed their desired word among the suggestions for a variety of reasons. Some felt rushed to fixate on a target cell, while others confused the procedure with the CI block. Conceivably, the context-independent experience of typing on their smartphones, where suggestions arise only upon character entry, contributed to this confusion. Although participants became more familiar with the system and its timing over subsequent sessions, it is unclear whether these mistakes could have been avoided with an alternative positioning of the suggested words.
5.5 Error correction
Studies have taken different approaches to the correction of typing mistakes based on whether
a language model was used. Those without language models provided backspace or delete command buttons and asked users to correct mistakes during online trials [18], [19], [21], [22]. In these studies, all grid cells had the same chance of occurrence, as no prior language probability was taken into account. Other studies have integrated error-related potentials (ErrP)
to correct errors automatically [121]–[123]. The ErrP signal is typically generated 50-100 ms
after an error is detected by the user [7]. However, studies including language models did not
allow for correction by the users and relied strictly on the language model [20], [92]–[94],
[106], [107], [119]. The reasoning behind this decision was that selecting backspace in a history-based language model complicates inference. Allowing for correction means that at each selection two cases are possible: the currently classified character is either correct, or incorrect, in which case the user has to select backspace. This scheme weakens the inference approach, as no history is considered and the language model is not used to its fullest extent [107]. We therefore decided not to allow corrections by the user and relied on the language model in that regard.
5.6 Alternative modalities
This study verified that integrating a context-aware predictive speller with a BCI speller in a single adjacency pair conversation improves performance (i.e., speed and accuracy). However,
further research must be conducted to verify whether modifications in the P300 stimulus, using
other BCI modalities or combining two modalities could further enhance this performance. As
mentioned in chapter 1, SSVEP paradigms tend to gain higher bit rates as they do not require
a minimum of two flashes [58]–[61]. Some studies have focused on combining modalities such
as SSVEP with P300 [5], [124] and eye tracking with SSVEP [125]. An eye tracker is a high
speed, commercially available device that could be used if certain conditions are met. Limitations of eye trackers include a high dependency on ambient lighting, poor performance with light-coloured eyes, and the need for strict user positioning within the camera's field of view. Stawicki et al. leveraged the high speed of eye trackers and the high
classification accuracies of SSVEP-based BCIs in a speller and showed that this combination
improved performance compared to a stand-alone eye-tracking system or SSVEP speller.
Additionally, the two simultaneous stimuli proposed by Kaufmann et al. in [89] could also be
considered as an alternative to increase bit rate.
5.7 BCI target population
This thesis verified the potential improvements in a P300 speller by integrating a context
relevant predictive speller and answer generation engine in a single adjacency pair
conversation. However, the proposed paradigm was tested on typically developed adults.
Further investigation is necessary to confirm the usefulness of this system with individuals with
complex communication needs, e.g. individuals with advanced ALS or severe CP. Studies with
clinical populations using regular P300 spellers, i.e. with no language model and predictive
speller, have reported promising results. Donchin et al. tested a 6 × 6 speller with four
participants with paraplegia and gained an 80% accuracy level with 5.9 items/minute [91].
Sellers et al. achieved above 75% accuracy for nine out of fifteen participants with ALS [117].
Zickler et al. studied the feasibility of performing four tasks, namely, copy spelling, free
spelling of a sentence, typing a word and selecting appropriate commands to send an email,
and using a web browser on four participants with severe disabilities in their homes. They
reported above 70% accuracy with an average ITR of 8.56 bits/minute [50]. Although the
number of selections were predefined and the different tasks were not integrated into one
system, this study showed the capacity of P300 BCIs for daily usage. More recently, Guy et al.
conducted an experiment with twenty individuals with ALS using a P300 BCI integrated with
a predictive speller and the familiar face stimulus in a semi-realistic environment, where an
occupational therapist with no prior BCI experience set up the BCI in a regular office space in
a hospital. Impressively, 65% of participants gained up to 95% accuracy with 5.04 correct
symbols/minute [19]. More recent studies tested P300 spellers integrated with language models
on non-typically developed adults. Speier et al. tested a P300 speller that used a language model
with six participants with ALS in their homes. All participants gained above 84% accuracy and
a minimum of 6 CPM. Mainsah et al. conducted a study with a similar experimental set-up, feature set and classifier involving ten participants with ALS and reported 76.39% accuracy.
P300 studies have also been conducted with paediatric participants with conditions such as CP, with promising results indicating little difference in the latency and amplitude of the P300 between children with mild CP and typically developing children [126], [127].
For participants facing challenges in controlling gaze dependent P300 spellers, alternative
interfaces can be used as discussed in chapter 1.
Conclusion
6.1 Overview
This work demonstrated the design and implementation of an NLP-P300 speller in a single
adjacency pair conversation context. The paradigm consisted of a speech-to-text tool that
converted the question asked by the conversation partner into text. The text of the question was
displayed on the screen for the user. In the next step, this text was sent to an NLP engine which
generated six potential answers to the question. The potential responses were displayed in the
6th column (the suggestion column) of a 6 × 6 speller. The initial suggestions were
predetermined based on frequency and popularity and tagged with the relevant context so that the answer generation engine retrieved words based on the detected intent. We compared our
proposed system with previous studies by designing two unconstrained selection blocks at the
end of each online session. In the context-independent (CI) unconstrained block, five questions were asked of the participant and a context-independent predictive speller populated the
suggestion column based on what had been typed; therefore, the user was forced to type at least
one letter for suggestions to be retrieved. On the other hand, in the context dependent (CD)
unconstrained block, the same five questions were asked and potential answers populated the
suggestion column allowing the user to select a word from the beginning. Users were asked to
give the same answers they gave in the CI block, assuming the CI block occurred first. If the
user did not find their answer among the suggestions, other context relevant words were
suggested as they started typing letters. This system was tested on 10 typically developed
adults. All participants gained above-chance accuracy and achieved a higher information transfer rate with the CD predictive speller, averaging 8.24 characters/minute (p = 0.005), an average bit rate of 42.01 bits/minute (p = 1.06 × 10⁻⁶) and an accuracy of 99.55% (p = 0.01).
6.2 Future work
6.2.1 Adaptively expanding the corpus
As mentioned previously, the context-based corpus was built manually and is therefore limited compared to standard corpora. In order to reduce the chance of OOV words during
interaction with the system, adaptive and automatic addition of new words to the corpus will
be beneficial. As discussed in Chapter 2, we split the probability of selecting a letter into two terms to account for correct and incorrect previous classifications. The result of this computation can be used to flag whether the user is trying to select a word that is not in the corpus and, if flagged as a new word, automatically add it to the corpus under its appropriate category.
Another way of expanding the corpus could be by writing a script to screen internet articles
and webpages, preprocess the text, detect the category of each word and automatically add the
words to a context-based dictionary.
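Either expansion pathway reduces to the same bookkeeping step: detect that a completed word is outside the corpus, then file it under the detected category. A minimal sketch of that step follows; the `corpus` structure and the function interface are hypothetical, not part of the implemented system.

```python
def add_if_new(corpus, word, category, oov_log):
    # corpus: category -> set of words. If the completed word is not yet in
    # the corpus, flag it as out-of-vocabulary and add it under the detected
    # category so it is available as a suggestion in later sessions.
    words = corpus.setdefault(category, set())
    if word not in words:
        oov_log.append((word, category))
        words.add(word)
    return corpus

corpus = {"food": {"pizza", "pasta"}}
oov_log = []
add_if_new(corpus, "sushi", "food", oov_log)
print(oov_log)                    # [('sushi', 'food')]
print("sushi" in corpus["food"])  # True
```

Keeping an explicit OOV log would also allow a caregiver to review automatically added words before they become permanent corpus entries.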
6.2.2 Expressive communication: Taking turns
Extension of the language model will be necessary to account for typing phrases and sentences. This will likely require modifying the language model to handle transitions from words to spaces while maintaining the context.
The proposed interface was studied via a unidirectional conversation held by the researcher.
However, a more realistic system should allow for a bidirectional conversation flow allowing
greater control by the user. Therefore, it is important to consider an expressive communication
pathway as well as a receptive one. From an implementation perspective, one possible approach could be a command button that switches between receptive and expressive modes, allowing the user to type a question to their conversation partner. This will potentially lead to a cumulative context that must be tracked by the NLP engine for appropriate suggestions.
6.2.3 Optimisation
Allowing for interaction between this system and other software, such as games and web browsers, is another area that would lead to more realistic use cases for BCI applications. In addition, many aspects of the BCI hardware itself require further simplification and improvement to allow for daily use by individuals with communication challenges, e.g., comfortable gel-free electrodes on wireless caps and reduced setup time.
6.2.4 Customising the interface
It will be useful to have a GUI that allows for customisation. Some users may prefer speed over
accuracy and therefore be willing to decrease the decision threshold. This is understandable as
in many cases the intended answer can be comprehended regardless of some misclassifications.
Another useful feature could be to allow the user or caregiver to predefine the initial
suggestions based on the common words used personally by the user, e.g. favourite food,
games, etc.
Bibliography
[1] K. Tai, S. Blain, and T. Chau, “A review of emerging access technologies for individuals
with severe motor impairments,” Assist. Technol., vol. 20, no. 4, pp. 204–221, 2008.
[2] N. Alves and T. Chau, “Uncovering patterns of forearm muscle activity using multi-
channel mechanomyography,” J. Electromyogr. Kinesiol., vol. 20, no. 5, pp. 777–786,
2010.
[3] N. Memarian, A. N. Venetsanopoulos, and T. Chau, “Infrared thermography as an access
pathway for individuals with severe motor impairments,” J. Neuroeng. Rehabil., vol. 6,
no. 1, 2009.
[4] B. Leung and T. Chau, “A multiple camera tongue switch for a child with severe spastic
quadriplegic cerebral palsy.,” Disabil. Rehabil. Assist. Technol., vol. 5, no. 1, pp. 58–
68, 2010.
[5] E. Yin, T. Zeyl, R. Saab, T. Chau, D. Hu, and Z. Zhou, “A Hybrid Brain-Computer
Interface Based on the Fusion of P300 and SSVEP Scores,” IEEE Trans. Neural Syst.
Rehabil. Eng., vol. 23, no. 4, pp. 693–701, 2015.
[6] K. Cho et al., “Learning Phrase Representations using RNN Encoder-Decoder for
Statistical Machine Translation,” 2014.
[7] A. Rezeika, M. Benda, P. Stawicki, F. Gembler, A. Saboor, and I. Volosyak, “Brain–
Computer Interface Spellers: A Review,” Brain Sci., vol. 8, no. 4, p. 57, 2018.
[8] L. F. Nicolas-Alonso and J. Gomez-Gil, “Brain computer interfaces, a review,” Sensors,
vol. 12, no. 2, pp. 1211–1279, 2012.
[9] L. A. Farwell and E. Donchin, “Talking off the top of your head: toward a mental
prosthesis utilizing event-related brain potentials,” Electroencephalogr. Clin.
Neurophysiol., vol. 70, no. 6, pp. 510–523, 1988.
[10] Y. Li, C. S. Nam, B. B. Shadden, and S. L. Johnson, “A P300-based brain–computer
interface: Effects of interface type and screen size,” Intl. J. Human–Computer Interact.,
vol. 27, no. 1, pp. 52–68, 2010.
[11] Y. Sakai and T. Yagi, “Alphabet matrix layout in P300 speller may alter its
performance,” in Biomedical Engineering International Conference (BMEiCON), 2011,
2012, pp. 89–92.
[12] J. Jin, E. W. Sellers, and X. Wang, “Targeting an efficient target-to-target interval for
P300 speller brain-computer interfaces,” Med. Biol. Eng. Comput., vol. 50, no. 3, pp.
289–296, 2012.
[13] Y. Liu, Z. Zhou, and D. Hu, “Comparison of stimulus types in visual P300 speller of
brain-computer interfaces,” in Cognitive Informatics (ICCI), 2010 9th IEEE
International Conference on, 2010, pp. 273–279.
[14] I. Käthner, A. Kübler, and S. Halder, “Rapid P300 brain-computer interface
communication with a head-mounted display,” Front. Neurosci., vol. 9, p. 207, 2015.
[15] Q. Li, S. Liu, J. Li, and O. Bai, “Use of a green familiar faces paradigm improves p300-
speller brain-computer interface performance,” PLoS One, vol. 10, no. 6, p. e0130325,
2015.
[16] J. Jarmolowska, M. M. Turconi, P. Busan, J. Mei, and P. P. Battaglini, “A multimenu
system based on the p300 component as a time saving procedure for communication
with a brain-computer interface,” Front. Neurosci., vol. 7, no. 7 MAR, pp. 1–10, 2013.
[17] F. Akram, S. M. Han, and T. S. Kim, “An efficient word typing P300-BCI system using
a modified T9 interface and random forest classifier,” Comput. Biol. Med., vol. 56, pp.
30–36, 2015.
[18] F. Akram, M. K. Metwally, H. S. Han, H. J. Jeon, and T. S. Kim, “A novel P300-based
BCI system for words typing,” 2013 Int. Winter Work. Brain-Computer Interface, BCI
2013, no. February, pp. 24–25, 2013.
[19] V. Guy, M. H. Soriani, M. Bruno, T. Papadopoulo, C. Desnuelle, and M. Clerc, “Brain
computer interface with the P300 speller: Usability for disabled people with
amyotrophic lateral sclerosis,” Ann. Phys. Rehabil. Med., vol. 61, no. 1, pp. 5–11, 2018.
[20] W. Speier, C. Arnold, N. Chandravadia, D. Roberts, S. Pendekanti, and N. Pouratian,
“Improving P300 spelling rate using language models and predictive spelling,” Brain-
Computer Interfaces, vol. 2621, pp. 1–10, 2017.
[21] D. B. Ryan et al., “Predictive spelling with a P300-based brain-computer interface:
Increasing the rate of communication,” Int. J. Hum. Comput. Interact., vol. 27, no. 1,
pp. 69–84, 2011.
[22] T. Kaufmann, S. Völker, L. Gunesch, and A. Kübler, “Spelling is just a click away –
A user-centered brain-computer interface including auto-calibration and predictive text
entry,” Front. Neurosci., vol. 6, no. MAY, pp. 1–10, 2012.
[23] A. Kübler, B. Kotchoubey, J. Kaiser, J. R. Wolpaw, and N. Birbaumer, “Brain–computer
communication: Unlocking the locked in.,” Psychol. Bull., vol. 127, no. 3, p. 358, 2001.
[24] E. J. Speckman, C. E. Elger, and A. Gorji, “Neurophysiologic Basis of EEG and DC
Potentials,” pp. 1–16.
[25] S. M. Coyle, T. E. Ward, and C. M. Markham, “Brain-computer interface using a
simplified functional near-infrared spectroscopy system.,” J. Neural Eng., vol. 4, no. 3,
pp. 219–226, 2007.
[26] J. R. Wolpaw et al., “Brain-computer interface technology: a review of the first
international meeting,” IEEE Trans. Rehabil. Eng., vol. 8, no. 2, pp. 164–173, 2000.
[27] S. Moghimi, A. Kushki, A. Marie Guerguerian, and T. Chau, “A review of EEG-Based
brain-computer interfaces as access pathways for individuals with severe disabilities,”
Assist. Technol., vol. 25, no. 2, pp. 99–110, 2013.
[28] J. Wang, G. Xu, L. Wang, and H. Zhang, “Feature extraction of brain-computer interface
based on improved multivariate adaptive autoregressive models,” in 2010 3rd
International Conference on Biomedical Engineering and Informatics, 2010, vol. 2, pp.
895–898.
[29] N. N. Birbaumer et al., “A spelling device for the paralysed.,” Nature, vol. 398, no.
6725, pp. 297–298, 1999.
[30] E. Pasqualotto, S. Federici, and M. O. Belardinelli, “Toward functioning and usable
brain-computer interfaces (BCIs): A literature review,” Disabil. Rehabil. Assist.
Technol., vol. 7, no. 2, pp. 89–103, 2012.
[31] T. O. Zander and C. Kothe, “Towards passive brain-computer interfaces: Applying
brain-computer interface technology to human-machine systems in general,” J. Neural
Eng., vol. 8, no. 2, 2011.
[32] K. Cassady, A. You, A. Doud, and B. He, “The impact of mind-body awareness training
on the early learning of a brain-computer interface,” Technology, vol. 2, no. 03, pp. 254–
260, 2014.
[33] S. D. Power, A. Kushki, and T. Chau, “Towards a system-paced near-infrared
spectroscopy brain–computer interface: differentiating prefrontal activity due to mental
arithmetic and mental singing from the no-control state,” J. Neural Eng., vol. 8, no. 6,
p. 66004, 2011.
[34] J. Milton, S. L. Small, and A. Solodkin, “Imaging motor imagery: methodological issues
related to expertise,” Methods, vol. 45, no. 4, pp. 336–341, Aug. 2008.
[35] G. Pfurtscheller and C. Neuper, “Motor imagery and direct brain-computer
communication,” Proc. IEEE, vol. 89, no. 7, pp. 1123–1134, 2001.
[36] S. Bajaj, A. J. Butler, D. Drake, and M. Dhamala, “Brain effective connectivity during
motor-imagery and execution following stroke and rehabilitation,” NeuroImage. Clin.,
vol. 8, pp. 572–582, Jun. 2015.
[37] J. V. Odom et al., “Visual evoked potentials standard (2004),” Doc. Ophthalmol., vol.
108, no. 2, pp. 115–123, 2004.
[38] M. Wang et al., “A new hybrid BCI paradigm based on P300 and SSVEP,” J. Neurosci.
Methods, vol. 244, pp. 16–25, 2015.
[39] J. Chen, D. Zhang, A. K. Engel, Q. Gong, and A. Maye, “Application of a single-flicker
online SSVEP BCI for spatial navigation,” PLoS One, vol. 12, no. 5, pp. 1–13, 2017.
[40] Y. Y. Chien et al., “Polychromatic SSVEP stimuli with subtle flickering adapted to
brain-display interactions,” J. Neural Eng., vol. 14, no. 1, 2017.
[41] S. Sur and V. K. Sinha, “Event-related potential: An overview,” Ind. Psychiatry J., vol.
18, no. 1, p. 70, 2009.
[42] J. Polich, “Updating P300: An integrative theory of P3a and P3b,” Clin. Neurophysiol.,
vol. 118, no. 10, pp. 2128–2148, 2007.
[43] R. M. Chapman and H. R. Bragdon, “Evoked responses to numerical and non-numerical
visual stimuli while problem solving,” Nature, vol. 203, no. 4950, p. 1155, 1964.
[44] S. Sutton, M. Braren, J. Zubin, and E. R. John, “Evoked-potential correlates of stimulus
uncertainty,” Science, vol. 150, no. 3700, pp. 1187–1188, 1965.
[45] J. Polich, “Neuropsychology of P300,” Oxford Handb. event-related potential
components, vol. 159, p. 88, 2012.
[46] C. C. Duncan‐Johnson and E. Donchin, “On quantifying surprise: The variation of
event‐related potentials with subjective probability,” Psychophysiology, vol. 14, no. 5,
pp. 456–467, 1977.
[47] J. Polich and C. Margala, “P300 and probability: comparison of oddball and single-
stimulus paradigms,” Int. J. Psychophysiol., vol. 25, no. 2, pp. 169–176, 1997.
[48] E. Donchin, M. Kubovy, M. Kutas, R. Johnson, and R. I. Tterning, “Graded changes in
evoked response (P300) amplitude as a function of cognitive activity,” Percept.
Psychophys., vol. 14, no. 2, pp. 319–324, 1973.
[49] M. Palankar et al., “Control of a 9-DoF wheelchair-mounted robotic arm system using
a P300 brain computer interface: Initial experiments,” in 2008 IEEE International
Conference on Robotics and Biomimetics, 2009, pp. 348–353.
[50] C. Zickler et al., “A brain-computer interface as input channel for a standard assistive
technology software,” Clin. EEG Neurosci., vol. 42, no. 4, pp. 236–244, 2011.
[51] A.-M. Brouwer and J. B. F. Van Erp, “A tactile P300 brain-computer interface,” Front.
Neurosci., vol. 4, p. 19, 2010.
[52] R. W. McCarley et al., “Auditory P300 abnormalities and left posterior superior
temporal gyrus volume reduction in schizophrenia,” Arch. Gen. Psychiatry, vol. 50, no.
3, pp. 190–197, 1993.
[53] X. An, J. Höhne, D. Ming, and B. Blankertz, “Exploring combinations of auditory and
visual stimuli for gaze-independent brain-computer interfaces,” PLoS One, vol. 9, no.
10, p. e111070, 2014.
[54] J. Polich, P. C. Ellerson, and J. Cohen, “P300, stimulus intensity, modality, and
probability,” Int. J. Psychophysiol., vol. 23, no. 1–2, pp. 55–62, 1996.
[55] F. Aloise et al., “Multimodal stimulation for a P300-based BCI,” Int. J. Bioelectromagn,
vol. 9, no. 3, pp. 128–130, 2007.
[56] E. Yin, T. Zeyl, R. Saab, D. Hu, Z. Zhou, and T. Chau, “An Auditory-Tactile Visual
Saccade-Independent P300 Brain–Computer Interface,” Int. J. Neural Syst., vol. 26, no.
01, p. 1650001, 2016.
[57] D. J. Krusienski et al., “A comparison of classification techniques for the P300 Speller,”
J. Neural Eng., vol. 3, no. 4, p. 299, 2006.
[58] I. Volosyak, “SSVEP-based Bremen–BCI interface—boosting information transfer
rates,” J. Neural Eng., vol. 8, no. 3, p. 36020, 2011.
[59] I. Volosyak, A. Moor, and A. Gräser, “A dictionary-driven SSVEP speller with a
modified graphical user interface,” in International Work-Conference on Artificial
Neural Networks, 2011, pp. 353–361.
[60] Y. Wang and T. Jung, “Visual stimulus design for high-rate SSVEP BCI,” Electron.
Lett., vol. 46, no. 15, pp. 1057–1058, 2010.
[61] M. Nakanishi, Y. Wang, X. Chen, Y. Wang, X. Gao, and T. Jung, “Enhancing Detection
of SSVEPs for a High-Speed Brain Speller Using Task-Related Component Analysis,”
IEEE Trans. Biomed. Eng., vol. 65, no. 1, pp. 104–112, 2018.
[62] E. Yin, Z. Zhou, J. Jiang, F. Chen, Y. Liu, and D. Hu, “A speedy hybrid BCI spelling
approach combining P300 and SSVEP,” IEEE Trans. Biomed. Eng., vol. 61, no. 2, pp.
473–483, 2014.
[63] Z. Lin, C. Zhang, Y. Zeng, L. Tong, and B. Yan, “A novel P300 BCI speller based on
the Triple RSVP paradigm,” Sci. Rep., vol. 8, no. 1, p. 3350, 2018.
[64] L. Acqualagna, M. S. Treder, and B. Blankertz, “Chroma Speller: Isotropic visual
stimuli for truly gaze-independent spelling,” in Neural Engineering (NER), 2013 6th
International IEEE/EMBS Conference on, 2013, pp. 1041–1044.
[65] F. Aloise et al., “A covert attention P300-based brain–computer interface: Geospell,”
Ergonomics, vol. 55, no. 5, pp. 538–551, 2012.
[66] W. Speier, C. Arnold, and N. Pouratian, “Integrating language models into classifiers
for BCI communication: A review,” J. Neural Eng., vol. 13, no. 3, pp. 1–13, 2016.
[67] G. Pires, U. Nunes, and M. Castelo-Branco, “GIBS block speller: toward a gaze-
independent P300-based BCI,” in Engineering in Medicine and Biology Society, EMBC,
2011 Annual International Conference of the IEEE, 2011, pp. 6360–6364.
[68] M. S. Treder, H. Purwins, D. Miklody, I. Sturm, and B. Blankertz, “Decoding auditory
attention to instruments in polyphonic music using single-trial EEG classification,” J.
Neural Eng., vol. 11, no. 2, 2014.
[69] I. Käthner, C. A. Ruf, E. Pasqualotto, C. Braun, N. Birbaumer, and S. Halder, “A
portable auditory P300 brain-computer interface with directional cues,” Clin.
Neurophysiol., vol. 124, no. 2, pp. 327–338, 2013.
[70] A. Onishi, K. Takano, T. Kawase, H. Ora, and K. Kansaku, “Affective stimuli for an
auditory P300 brain-computer interface,” Front. Neurosci., vol. 11, no. SEP, pp. 1–9,
2017.
[71] B. Blankertz et al., “The Berlin Brain-Computer Interface presents the novel mental
typewriter Hex-o-Spell.,” 2006.
[72] I. Käthner, S. C. Wriessnegger, G. R. Müller-Putz, A. Kübler, and S. Halder, “Effects
of mental workload and fatigue on the P300, alpha and theta band power during
operation of an ERP (P300) brain-computer interface,” Biol. Psychol., vol. 102, no. 1,
pp. 118–129, 2014.
[73] D. E. Thompson et al., “Performance measurement for brain-computer or brain-machine
interfaces: A tutorial,” J. Neural Eng., vol. 11, no. 3, 2014.
[74] M. S. Treder and B. Blankertz, “(C)overt attention and visual speller design in an ERP-
based brain-computer interface,” Behav. Brain Funct., vol. 6, pp. 1–13, 2010.
[75] B. Z. Allison and J. A. Pineda, “Effects of SOA and flash pattern manipulations on
ERPs, performance, and preference: Implications for a BCI system,” Int. J.
Psychophysiol., vol. 59, no. 2, pp. 127–140, 2006.
[76] M. Salvaris and F. Sepulveda, “Visual modifications on the P300 speller BCI paradigm,”
J. Neural Eng., vol. 6, no. 4, 2009.
[77] C. Guger et al., “How many people are able to control a P300-based brain-computer
interface (BCI)?,” Neurosci. Lett., vol. 462, no. 1, pp. 94–98, 2009.
[78] G. Townsend et al., “A novel P300-based brain-computer interface stimulus
presentation paradigm: Moving beyond rows and columns,” Clin. Neurophysiol., vol.
121, no. 7, pp. 1109–1120, 2010.
[79] C. E. Lakey, D. R. Berry, and E. W. Sellers, “Manipulating attention via mindfulness
induction improves P300-based brain-computer interface performance,” J. Neural Eng.,
vol. 8, no. 2, 2011.
[80] R. Fazel-Rezai and K. Abhari, “A region-based P300 speller for brain-computer
interface,” Can. J. Electr. Comput. Eng., vol. 34, no. 3, pp. 81–85, 2009.
[81] R. Fazel-Rezai and W. Ahmad, “P300-based brain-computer interface paradigm
design,” in Recent advances in brain-computer interface systems, InTech, 2011.
[82] G. Pires, U. Nunes, and M. Castelo-Branco, “Comparison of a row-column speller vs. a
novel lateral single-character speller: assessment of BCI for severe motor disabled
patients,” Clin. Neurophysiol., vol. 123, no. 6, pp. 1168–1181, 2012.
[83] F. Guo, B. Hong, X. Gao, and S. Gao, “A brain–computer interface using motion-onset
visual evoked potential,” J. Neural Eng., vol. 5, no. 4, p. 477, 2008.
[84] B. Hong, F. Guo, T. Liu, X. Gao, and S. Gao, “N200-speller using motion-onset visual
response,” Clin. Neurophysiol., vol. 120, no. 9, pp. 1658–1666, 2009.
[85] T. Liu, L. Goldberg, S. Gao, and B. Hong, “An online brain–computer interface using
non-flashing visual evoked potentials,” J. Neural Eng., vol. 7, no. 3, p. 36003, 2010.
[86] J. Jin, B. Z. Allison, X. Wang, and C. Neuper, “A combined brain–computer interface
based on P300 potentials and motion-onset visual evoked potentials,” J. Neurosci.
Methods, vol. 205, no. 2, pp. 265–276, 2012.
[87] T. Kaufmann, S. M. Schulz, C. Grünzinger, and A. Kübler, “Flashing characters with
famous faces improves ERP-based brain–computer interface performance,” J. Neural
Eng., vol. 8, no. 5, p. 56016, 2011.
[88] S. H. Patel and P. N. Azzam, “Characterization of N200 and P300: selected studies of
the event-related potential,” Int. J. Med. Sci., vol. 2, no. 4, pp. 147–154, 2005.
[89] T. Kaufmann and A. Kübler, “Beyond maximum speed – a novel two-stimulus paradigm
for brain–computer interfaces based on event-related potentials (P300-BCI),” J. Neural
Eng., vol. 11, no. 5, 2014.
[90] B. R. Conway, S. Moeller, and D. Y. Tsao, “Specialized Color Modules in Macaque
Extrastriate Cortex,” Neuron, vol. 56, no. 3, pp. 560–573, 2007.
[91] E. Donchin, K. M. Spencer, and R. Wijesinghe, “The mental prosthesis: Assessing the
speed of a P300-based brain- computer interface,” IEEE Trans. Rehabil. Eng., vol. 8,
no. 2, pp. 174–179, 2000.
[92] W. Speier, C. Arnold, J. Lu, R. K. Taira, and N. Pouratian, “Natural language processing
with dynamic classification improves P300 speller accuracy and bit rate,” J. Neural
Eng., vol. 9, no. 1, 2012.
[93] B. O. Mainsah et al., “Increasing BCI communication rates with dynamic stopping
towards more practical use: An ALS study,” J. Neural Eng., vol. 12, no. 1, p. 16013,
2015.
[94] B. O. Mainsah, K. A. Colwell, L. M. Collins, and C. S. Throckmorton, “Utilizing a
Language Model to Improve Online Dynamic Data Collection in P300 Spellers,” IEEE
Trans. Neural Syst. Rehabil. Eng., vol. 22, no. 4, pp. 837–846, 2014.
[95] U. Orhan et al., “Fusion with Language Models Improves Spelling Accuracy for ERP-
based Brain Computer Interface Spellers,” 2011 Annu. Int. Conf. IEEE Eng. Med. Biol.
Soc., pp. 5774–5777, 2011.
[96] S. Fine, Y. Singer, and N. Tishby, “The hierarchical hidden Markov model: Analysis
and applications,” Mach. Learn., vol. 32, no. 1, pp. 41–62, 1998.
[97] U. Orhan, H. Nezamfar, M. Akcakaya, D. Erdogmus, and M. Higger, “Probabilistic
simulation framework for EEG-based BCI design,” Brain-Computer Interfaces, vol. 3,
no. 4, pp. 1–15, 2016.
[98] W. Speier, C. Arnold, J. Lu, A. Deshpande, and N. Pouratian, “Integrating language
information with a hidden markov model to improve communication rate in the P300
speller,” IEEE Trans. Neural Syst. Rehabil. Eng., vol. 22, no. 3, pp. 678–684, 2014.
[99] T. Schreiber, “Measuring information transfer,” Phys. Rev. Lett., vol. 85, no. 2, p. 461,
2000.
[100] A. Furdea et al., “An auditory oddball (P300) spelling system for brain-computer
interfaces,” Psychophysiology, vol. 46, no. 3, pp. 617–625, 2009.
[101] W. Speier, C. Arnold, and N. Pouratian, “Evaluating True BCI Communication Rate
through Mutual Information and Language Models,” PLoS One, vol. 8, no. 10, 2013.
[102] K. Takano, T. Komatsu, N. Hata, Y. Nakajima, and K. Kansaku, “Visual stimuli for the
P300 brain-computer interface: A comparison of white/gray and green/blue flicker
matrices,” Clin. Neurophysiol., vol. 120, no. 8, pp. 1562–1566, 2009.
[103] D. King, “MITIE: MIT Information Extraction.”
[104] S. K. Yeom, S. Fazli, K. R. Müller, and S. W. Lee, “An efficient ERP-based brain-
computer interface using random set presentation and face familiarity,” PLoS One, vol.
9, no. 11, 2014.
[105] V. I. Levenshtein, “Binary codes capable of correcting deletions, insertions, and
reversals,” Sov. Phys. Dokl., vol. 10, no. 8, pp. 707–710, 1966.
[106] S. Dudy, S. Xu, S. Bedrick, and D. Smith, “A Multi-Context Character Prediction Model
for a Brain-Computer Interface,” pp. 72–77, 2018.
[107] P. J. Kindermans, M. Tangermann, K. R. Müller, and B. Schrauwen, “Integrating
dynamic stopping, transfer learning and language models in an adaptive zero-training
ERP speller,” J. Neural Eng., vol. 11, no. 3, 2014.
[108] S. G. Hart and L. E. Staveland, “Development of NASA-TLX (Task Load Index):
Results of empirical and theoretical research,” in Advances in psychology, vol. 52,
Elsevier, 1988, pp. 139–183.
[109] V. Abootalebi, M. H. Moradi, and M. A. Khalilzadeh, “A new approach for EEG feature
extraction in P300-based lie detection,” Comput. Methods Programs Biomed., vol. 94,
no. 1, pp. 48–57, 2009.
[110] S. F. Chen and J. Goodman, “Empirical study of smoothing techniques for language
modeling,” Comput. Speech Lang., vol. 13, no. 4, pp. 359–394, 1999.
[111] D. B. Ryan, G. E. Frye, G. Townsend, D. R. Berry, N. A. Gates, and E. W. Sellers,
“Predictive Spelling With a P300-Based Brain–Computer Interface: Increasing the
Rate of Communication,” Int. J. Hum. Comput. Interact., vol. 27, no. 1, pp. 69–84, 2011.
[112] R. Fazel-Rezai, B. Z. Allison, C. Guger, E. W. Sellers, S. C. Kleih, and A. Kübler, “P300
brain computer interface: current challenges and emerging trends,” Front. Neuroeng.,
vol. 5, no. July, pp. 1–14, 2012.
[113] M. Kaper, P. Meinicke, U. Grossekathoefer, T. Lingner, and H. Ritter, “BCI competition
2003-data set IIb: support vector machines for the P300 speller paradigm,” IEEE Trans.
Biomed. Eng., vol. 51, no. 6, pp. 1073–1076, 2004.
[114] N. Xu, X. Gao, B. Hong, X. Miao, S. Gao, and F. Yang, “BCI competition 2003-data
set IIb: enhancing P300 wave detection using ICA-based subspace projections for BCI
applications,” IEEE Trans. Biomed. Eng., vol. 51, no. 6, pp. 1067–1072, 2004.
[115] H. Serby, E. Yom-Tov, and G. F. Inbar, “An improved P300-based brain-computer
interface,” IEEE Trans. neural Syst. Rehabil. Eng., vol. 13, no. 1, pp. 89–98, 2005.
[116] W. Francis and H. Kucera, “Brown Corpus Manual.” 1979.
[117] E. W. Sellers, A. Kübler, and E. Donchin, “Brain-computer interface research at the
University of South Florida cognitive psychophysiology laboratory: The P300 speller,”
IEEE Trans. Neural Syst. Rehabil. Eng., vol. 14, no. 2, pp. 221–224, 2006.
[118] E. Combrisson and K. Jerbi, “Exceeding chance level by chance: The caveat of
theoretical chance levels in brain signal classification and statistical assessment of
decoding accuracy,” J. Neurosci. Methods, vol. 250, pp. 126–136, 2015.
[119] W. Speier, C. W. Arnold, A. Deshpande, J. Knall, and N. Pouratian, “Incorporating
advanced language models into the P300 speller using particle filtering,” J. Neural Eng.,
vol. 12, no. 4, p. 46018, 2015.
[120] G. Gratton, M. G. H. Coles, and E. Donchin, “A new method for off-line removal of
ocular artifact,” Electroencephalogr. Clin. Neurophysiol., vol. 55, no. 4, pp. 468–484,
1983.
[121] N. M. Schmidt, B. Blankertz, and M. S. Treder, “Online detection of error-related
potentials boosts the performance of mental typewriters,” BMC Neurosci., vol. 13, no.
1, p. 19, 2012.
[122] M. Spüler, M. Bensch, S. Kleih, W. Rosenstiel, M. Bogdan, and A. Kübler, “Online use
of error-related potentials in healthy users and people with severe motor impairment
increases performance of a P300-BCI,” Clin. Neurophysiol., vol. 123, no. 7, pp. 1328–
1337, 2012.
[123] B. Dal Seno, M. Matteucci, and L. Mainardi, “Online detection of P300 and error
potentials in a BCI speller,” Comput. Intell. Neurosci., vol. 2010, p. 11, 2010.
[124] E. Yin, Z. Zhou, J. Jiang, F. Chen, Y. Liu, and D. Hu, “A novel hybrid BCI speller based
on the incorporation of SSVEP into the P300 paradigm,” J. Neural Eng., vol. 10, no. 2,
2013.
[125] P. Stawicki, F. Gembler, A. Rezeika, and I. Volosyak, “A novel hybrid mental spelling
application based on eye tracking and SSVEP-based BCI,” Brain Sci., vol. 7, no. 4,
2017.
[126] C. Morales et al., “Single trial P300 detection in children using expert knowledge and
SOM,” 2014 36th Annu. Int. Conf. IEEE Eng. Med. Biol. Soc. EMBC 2014, pp. 3801–
3804, 2014.
[127] E. Hakkarainen, S. Pirilä, J. Kaartinen, K. Eriksson, and J. J. Van Der Meere, “Visual
attention study in youth with spastic cerebral palsy using the event-related potential
method,” J. Child Neurol., vol. 26, no. 12, pp. 1525–1528, 2011.
Appendices
Appendix A1
NASA Task Load Index
Hart and Staveland’s NASA Task Load Index (TLX) method
assesses workload on six 7-point scales. Increments of high,
medium and low estimates for each point result in 21
gradations on the scales.
Participant Number
Date
Session #
Mental Demand How mentally demanding was the task?
Very Low Very High
Physical Demand How physically demanding was the task?
Very Low Very High
Temporal Demand How hurried or rushed was the pace of the task?
Very Low Very High
Performance How successful were you in accomplishing what
you were asked to do?
Perfect Failure
Effort How hard did you have to work to accomplish
your level of performance?
Very Low Very High
Frustration How insecure, discouraged, irritated, stressed,
and annoyed were you?
Very Low Very High
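The 21-gradation scoring described above can be illustrated with a short sketch. This is a hypothetical helper (the function name `raw_tlx` and the 0–100 rescaling convention are illustrative assumptions, not part of the thesis software): the unweighted “raw TLX” variant simply averages the six subscale ratings.

```python
def raw_tlx(ratings):
    """Compute an unweighted (raw) NASA-TLX workload score.

    `ratings` maps each of the six subscales to a value on the
    21-gradation scale (0-20). The mean is rescaled to 0-100,
    a common reporting convention for the raw (unweighted) variant.
    """
    subscales = {"mental", "physical", "temporal",
                 "performance", "effort", "frustration"}
    if set(ratings) != subscales:
        raise ValueError("expected exactly the six NASA-TLX subscales")
    if not all(0 <= v <= 20 for v in ratings.values()):
        raise ValueError("each rating must lie on the 0-20 gradation scale")
    # Unweighted mean of the six ratings, rescaled from 0-20 to 0-100.
    return sum(ratings.values()) / 6 * 5

# Hypothetical ratings for one session:
example = {"mental": 14, "physical": 2, "temporal": 10,
           "performance": 6, "effort": 12, "frustration": 4}
print(raw_tlx(example))  # -> 40.0
```

The original NASA-TLX procedure additionally weights each subscale by pairwise-comparison counts; the raw (unweighted) mean shown here is a widely used simplification.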
Appendix A2
Post-session questionnaire
During the last block, you may have noticed that, after each question was asked, the
suggestions presented in the last column were either relevant to the question or independent
of the context. Please answer the questions below regarding your experience interacting with
the interface during the last block.
1- Using the scales below (middle being neutral), please indicate your preference for
either of the suggestion methods for expressing your answers during the last
block.
2- Please tell us your opinion of the two suggestion methods and why you prefer
one over the other for expressing your thoughts (if you have a
preference):
Context-dependent Context-independent
Context-independent Context-dependent