
EEG Signal Decoding and Classification

Julius Hülsmann
huelsmann[AT]campus.tu-berlin.de∗

Michal Jirku
jirku[AT]campus.tu-berlin.de∗

Alexander Dyck
alexander.dyck[AT]campus.tu-berlin.de∗

Abstract

The classification of electroencephalogram (EEG) data is a non-trivial task, because EEG signals are high-dimensional multivariate time series, often distorted by artifacts emerging from various sources. In this paper, we propose a newly designed classification pipeline for two-class EEG data. The approach is based on data expansion using a sliding window. For the training and testing phase we use a small number of samples and avoid sophisticated artifact removal techniques. In this setup, the new classification pipeline outperforms commonly used alternatives.
We first evaluate the performance of different state-of-the-art classification pipelines on two-class EEG data. Due to the high-dimensional nature of EEG signals, the choice of an appropriate dimensionality reduction algorithm is crucial to the classification pipeline's success. We compare commonly used algorithms, focusing in particular on the performance of Common Spatial Patterns (CSP). Since an important property of a classifier used for EEG data is its time consumption, we evaluate two different approaches for reducing it: omitting the training phase and using only a segment of the signal, with cross-validated length, for the classification.
We then evaluate the performance of the newly created pipeline and investigate whether it is possible to omit the training phase for the new approach.

1 Introduction

In recent years, Brain-Computer Interfacing (BCI) has become a highly active research topic in neuroscience, signal processing and engineering. Several techniques for acquiring brain activity have been developed. There are four main parameters in which the techniques differ [4]:

• Scale - how many neurons can be recorded simultaneously.
• Temporal resolution - how often the brain activity is acquired.
• Spatial resolution - how closely the activity of a single neuron is measured.
• Invasiveness - whether surgery is needed, and to what extent.

We focus on EEG-based BCI, which is in general preferred over other techniques for analyzing brain functions. EEG-based BCI uses electrodes placed on the scalp over the brain region. It has a high temporal resolution, typically on the millisecond scale, allowing rapid estimates of the user's mental state. Furthermore, it is a compact system that is economically affordable and portable. The main advantage is its non-invasiveness: no surgery is required. [4, 5]

∗Please replace [AT] by @ in each email address.

PJ:NI 2017, TU Berlin

The downside of EEG-based BCI lies in its very low spatial resolution, making it difficult to perform precise spatial analysis. Moreover, the resulting EEG signal may contain external artifacts, including mains-electrical interference and DC level drift. There are also internal sources of artifacts, caused directly by the subject, such as blinks, swallowing, eye movements and subject movements. The last class of potential error is the inaccuracy of electrode placement. Furthermore, research in this area is limited by the scarce neurophysiological knowledge about the brain mechanisms generating the outgoing signal. [5]

Using the tools of signal processing and machine learning, BCI systems allow a person to control devices using the electrical activity of the brain. There are two main types of EEG-based BCI systems. The first type uses brain activity generated in response to specific visual or auditory stimuli. The second type, which we focus on, uses activity spontaneously generated by the user. In this system the user is asked to imagine one task from a limited set of mental tasks; in our case it is to imagine opening and closing both fists or both feet. Using this type of BCI system, the recording can be used to control a cursor or to provide an alternative interface for controlling a virtual keyboard. The advantage is that the interface is more immediate and flexible to operate. The system may in principle be used to directly recognize the mental state of the user. The disadvantage lies in inconsistencies in the user's mental state, motivation, fatigue and other psychological and physiological factors. These factors make the correspondence between mental state and electrode activity more difficult to achieve. [5]

In order to classify whether spontaneously generated activity corresponds to one of the mental tasks, different methods of signal processing and classification are used. Our goal is to define and cross-validate data-related parameters such as the number of components and the size of the time window. The next step is to analyze different approaches proposed in the literature [9, 10, 11] and compare them. We further try to expand the given data to increase the classification performance. As we have a broad dataset containing many different subjects, we also address the question of whether the classifier can be generalized to be subject-independent.

Our motivation is to make the classification as precise and generalized as possible in order to enable physically impaired people with limited muscle control but intact brain capabilities to communicate with the outside world. We also want to achieve high precision in order to play games like Pong more accurately using only brain activity.

2 Methods

In this section we describe different methods used for signal preprocessing and classification.

2.1 Power Spectral Density Analysis

As a first preprocessing step we band-pass filter the EEG signal. To find the right cutoff frequencies, we performed a Power Spectral Density (PSD) analysis to determine in which frequency bands the two classes are most differentiable (using the PSD as the metric).

S_xx(f) = lim_{T→∞} (1/2T) | ∫_{−T}^{T} x(t) e^{−2πift} dt |²

The Power Spectral Density is the Fourier transform of the autocorrelation function of a time series x(t). It describes the strength of variations in a signal as a function of frequency.
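In practice the PSD is estimated from a finite recording, commonly with Welch's method. A minimal sketch using SciPy on a synthetic single-channel signal (the signal parameters here are illustrative, not taken from our experiments):

```python
import numpy as np
from scipy import signal

rng = np.random.default_rng(0)
fs = 121.0                      # sampling rate of the dataset used here (Hz)
t = np.arange(0, 5.0, 1.0 / fs)
# toy single-channel "EEG": a 10 Hz mu-band oscillation buried in noise
x = np.sin(2 * np.pi * 10 * t) + 0.5 * rng.standard_normal(t.size)

# Welch's method: average periodograms over overlapping segments
freqs, psd = signal.welch(x, fs=fs, nperseg=256)

# the spectral peak should sit at the 10 Hz component
peak = freqs[np.argmax(psd)]
```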

2.2 Data Preprocessing

Careful selection of preprocessing steps is crucial to the success of any classification scheme. Here we describe three main preprocessing techniques used in EEG-based BCI and in our further evaluation.

2.2.1 Principal Component Analysis

Principal Component Analysis (PCA) creates a k-dimensional orthonormal basis of the subspace with maximized variance along its dimensions, in descending order. The equation

T = X ·W


describes the full decomposition, where X is the original data matrix and W holds as columns the eigenvectors of X's covariance matrix, i.e. (X − m)ᵀ(X − m) for m := mean(X). The last components have low variance and thus explain very few differences in the data. Dropping them is therefore an effective way of reducing dimensionality that does not take into account the classes assigned to the data.
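The decomposition above can be sketched directly in NumPy; the toy data and the choice k = 2 are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
# toy data: 200 samples in 5 dimensions, variance concentrated in the first two
X = rng.standard_normal((200, 5)) * np.array([5.0, 3.0, 0.1, 0.1, 0.1])

m = X.mean(axis=0)
Xc = X - m
# eigendecomposition of the covariance matrix (X - m)^T (X - m)
cov = Xc.T @ Xc / (len(X) - 1)
eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending order
order = np.argsort(eigvals)[::-1]        # sort by descending variance
W = eigvecs[:, order[:2]]                # keep k = 2 components, drop the rest

T = Xc @ W   # projected data, as in T = X · W above
```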

2.2.2 Independent Component Analysis

Independent Component Analysis (ICA) is a method to separate a multivariate signal into additive subcomponents. It is assumed that the components forming the sum signal are statistically independent and non-Gaussian. This consideration leads to the equation

x = A · s

where x are the observations, i.e. the measured sum signal, and A is a mixing matrix which describes the composition of the individual source signals s. ICA then finds the unmixing matrix W from the rearranged equation

s = W · x

where s are the reconstructed source signals. To reliably find a good W, at least as many observations of the sum signal as there are additive components are needed. Using W we can then separate the recorded signal into its individual components, as shown in Figure 1.

(a) Decomposition of a mixed signal using ICA. (b) Two sets of samples drawn from two Gaussian distributions, marked as red crosses and blue circles. Two ellipses show the estimated covariances and dashed lines show the directions of the CSP projections.

Figure 1: Depiction of the application of ICA and CSP [10].
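A minimal sketch of this source separation, using scikit-learn's FastICA on two synthetic non-Gaussian sources (the sources and mixing matrix are made up for illustration):

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
t = np.linspace(0, 8, 2000)
# two independent, non-Gaussian sources s
s = np.c_[np.sign(np.sin(3 * t)), np.sin(5 * t) ** 3]
A = np.array([[1.0, 0.5],
              [0.3, 1.0]])               # mixing matrix
x = s @ A.T                              # observations x = A · s

ica = FastICA(n_components=2, random_state=0, max_iter=1000)
s_hat = ica.fit_transform(x)             # reconstructed sources, s = W · x

# each true source should correlate strongly with one recovered component
corr = np.abs(np.corrcoef(s.T, s_hat.T))[:2, 2:]
```

Note that ICA recovers the sources only up to permutation, sign and scale, which is why the check uses absolute correlations.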

2.2.3 Common Spatial Patterns

CSP separates a multivariate signal into additive subcomponents. The general idea behind CSP is, for each subcomponent, to maximize its variance for one class while minimizing it for the other. The decomposition is described by the decomposition matrix W, which projects the data into the surrogate sensor space x_csp.

x_csp = W · x

Since we are looking to maximize variance, and the variance of a band-pass filtered signal is equal to its band power, CSP analysis is applied to band-pass filtered signals in order to obtain an effective discrimination of mental states.

Given two windows X1 and X2 with dimensions C × n, where C is the number of channels and n the number of samples, the component w of the decomposition matrix W can be obtained by calculating:

w = argmax_w ‖wX1‖² / ‖wX2‖²

The equation can be solved by first calculating the covariance matrices C1 and C2. Then we find the eigenvalue decomposition:

C2⁻¹C1 = PDP⁻¹

with P being the matrix of eigenvectors and D the diagonal matrix of eigenvalues. Then wᵀ corresponds to the first column of P.
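The eigenvalue-based solution above can be sketched as follows; the two toy "trials" with class-dependent channel variances are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
# toy band-passed trials (4 channels): class 1 has high variance in channel 0,
# class 2 in channel 1
X1 = rng.standard_normal((4, n)) * np.array([[3.0], [1.0], [1.0], [1.0]])
X2 = rng.standard_normal((4, n)) * np.array([[1.0], [3.0], [1.0], [1.0]])

C1 = X1 @ X1.T / n
C2 = X2 @ X2.T / n

# solve C2^-1 C1 = P D P^-1; the eigenvector with the largest eigenvalue
# maximizes var(w X1) / var(w X2)
eigvals, P = np.linalg.eig(np.linalg.inv(C2) @ C1)
w = P[:, np.argmax(eigvals)].real

ratio = np.var(w @ X1) / np.var(w @ X2)
```

The variance ratio along w is large for class 1 and small for class 2, which is exactly the discriminative property the projection is meant to provide.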


2.3 Classifiers

In this work we used two classifiers, namely Linear Discriminant Analysis (LDA) and the Support Vector Classifier (SVC). LDA was chosen because of its popularity in EEG pattern classification. We chose the SVC as the second classifier because of the low number of data points and its familiarity. In this section we briefly explain the theoretical concepts of both classifiers.

2.3.1 Linear Discriminant Analysis

LDA classifies samples by projecting them onto a vector and thus obtaining a scalar value y.

y = wᵀ · x

The goal is therefore to find a vector w which maximizes the separability of the projections of a given training set X. This is achieved by maximizing the difference between the projected means, normalized by the within-class scatter.
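For the two-class case this criterion has the closed-form solution w ∝ S_w⁻¹(m1 − m2). A minimal sketch on two synthetic Gaussian classes (the data is illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
# two Gaussian classes in two dimensions, separated along the first axis
X1 = rng.standard_normal((100, 2)) + np.array([2.0, 0.0])
X2 = rng.standard_normal((100, 2)) + np.array([-2.0, 0.0])

m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
# within-class scatter matrix
Sw = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)
# Fisher direction: difference of projected means over within-class scatter
w = np.linalg.solve(Sw, m1 - m2)

# classify y = w^T x by thresholding at the midpoint of the projected means
thresh = w @ (m1 + m2) / 2
acc = ((X1 @ w > thresh).mean() + (X2 @ w < thresh).mean()) / 2
```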

2.3.2 Support Vector Classifier

An SVC classifies samples by constructing a hyperplane separating both classes with a maximized margin on either side. In this regard it is similar to a linear classifier. However, the hyperplane is constructed in a high- or infinite-dimensional space. This is achieved by mapping the original data with a kernel function

k(x_i, x_j) = φ(x_i) · φ(x_j)
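A brief sketch of an RBF-kernel SVC on data that no linear classifier can separate, using scikit-learn's SVC (the data and the C and γ values are illustrative):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# inner disc vs. outer ring: not separable by any hyperplane in input space
r = np.r_[rng.uniform(0, 1, 200), rng.uniform(2, 3, 200)]
phi = rng.uniform(0, 2 * np.pi, 400)
X = np.c_[r * np.cos(phi), r * np.sin(phi)]
y = np.r_[np.zeros(200), np.ones(200)]

# the RBF kernel k(xi, xj) = exp(-gamma * ||xi - xj||^2) implicitly maps the
# data into a space where the two rings become linearly separable
clf = SVC(kernel="rbf", C=10.0, gamma=0.1).fit(X, y)
acc = clf.score(X, y)
```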

3 Implementation

The implementation of our experiments is based on NUMPY, SCIPY and MNE. The latter is community-driven software for processing time-resolved neural signals, including EEG. NUMPY and SCIPY are general-purpose Python packages, offering libraries and tools for scientific computation in general and for machine learning techniques in particular.

Figure 2: Placement of the 64 electrodes

For training and evaluation we chose the publicly available EEG Motor Movement/Imagery Dataset [7, 8]. The data was obtained using the BCI2000 instrumentation system [6] and is distributed by PhysioNet. The whole dataset contains trial recordings of 109 different subjects. In this work we used only one of four different trial variants: the subject watches a screen on which a target eventually appears at the top or the bottom. Depending on the location, the subject then imagines opening and closing both hands or both feet until the target disappears. We only used the EEG part of the data, which is recorded by 64 electrodes placed on the scalp of the subject. Fig. 2 shows the placement of the electrodes. This gives 64 channels with a temporal resolution of 121 Hz. The data is available in a five-second window, from one second before the onset until 4 seconds after it. Therefore each data point (i.e. trial) has a dimensionality of 38,720.


4 Experimental setup

After initially evaluating the performance of different classification pipelines, only the pipeline consisting of a CSP spatial filter and an LDA classifier is considered, a decision emerging from the fact that this pipeline achieves the best and most robust results in the single-subject experiments². The subsequent investigations aim to evaluate relevant properties of the selected classifier and to further enhance the classification performance.

4.1 Pipeline evaluation

Firstly, we evaluate the performance of the following pipelines:

1. CSP combined with a LDA classifier,

2. CSP combined with an SVC classifier using an RBF kernel,

3. CSP combined with a dummy classifier, executed to yield information on the variance of our pipeline,

4. PCA combined with a LDA classifier,

5. Fast ICA combined with a LDA classifier,

each being composed of a dimensionality reduction algorithm and a classifier. To this end, we conduct nested cross-validation for the hyperparameter search, training and testing on samples of the same subject. This procedure is repeated for each pipeline, each of which is optimized with respect to the selected number of components³ n and the considered time window T of the response. They are varied in the following ranges:

n ∈ {1, …, 16},

T ∈ { [tB, tE] | 0 ≤ tB < tE ≤ 3 s ∧ ∃λ ∈ ℕ : tE − tB = (λ/4) s }.   (1)

The time window (equation (1)) is hence varied among all possible intervals in [0, 3], having as diameter a multiple of 250 ms. The second pipeline, which involves the SVC classifier, additionally tunes the classifier's hyperparameters

C ∈ {0.1, 10, 100} and γ ∈ {10⁻³, 10⁻⁴}.

Since both the signal's quality and its characteristics vary among different subjects, we also provide detailed experiments on the applicability of the best-performing pipeline to different subjects. We therefore repeat the entire training and testing procedure for other subjects, too.

Except for band-pass filtering the signal used for classification, the original response signal is not preprocessed. The accepted frequency range is determined by a visual analysis of the signal's PSD, as described in section 2.1. The determination of the optimal band-pass filter is not included in the training pipeline because it aims to reveal properties that are not specific to a subject but generally inherent to EEG recordings.
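The nesting of the hyperparameter search can be sketched with scikit-learn; as a stand-in for the CSP step we use PCA on synthetic feature vectors, since the point here is the nesting itself, not the spatial filter (the data, shapes and grid values are illustrative):

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import GridSearchCV, cross_val_score

rng = np.random.default_rng(0)
# stand-in features: 45 trials, 64 "channels", class-dependent mean shift
y = np.r_[np.zeros(23), np.ones(22)]
X = rng.standard_normal((45, 64)) + 2.0 * np.outer(y, np.r_[np.ones(4), np.zeros(60)])

pipe = Pipeline([("reduce", PCA()), ("clf", LinearDiscriminantAnalysis())])
grid = {"reduce__n_components": list(range(1, 17))}   # n in {1, ..., 16}

# the inner loop tunes n on the training folds only; the outer loop estimates
# generalization, so no test data leaks into the model selection
inner = GridSearchCV(pipe, grid, cv=3)
scores = cross_val_score(inner, X, y, cv=5)
```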

4.2 Time consumption

One advantage of EEG data is that it is easy to acquire, cheap and, most importantly for some use cases, that it can be gathered and processed in real time (cf. section 1).

Correct classifications per hour An important property of a classifier is thus its time consumption, a quantity mainly depending on the time interval extracted from the signal and utilized for the classification. In a real-world use case, the consumed time is to be traded off against a minimum acceptable classification performance. For that reason, in the second subsection, we present both an evaluation of the classification performance over time and the number of correct classifications per hour. Here

²The evaluation of the different classifiers can be found in section 5.1.
³Respectively "spatial filters" if the pipeline involves CSP.


we use the optimal time-window length per subject, emerging from the cross-validation in 4.1, and vary the selected interval's start. A gap of 1 s between the experiments is kept so as not to distort the response. We neglect the time used for training and the time consumption of the classification itself; the latter is on the order of milliseconds and can be performed during the pause between experiments. We apply this evaluation to the signals of 5 subjects provided by the dataset.

Subject-independent training When time is an important factor, it is a major achievement to be able to omit the training phase while still achieving adequate results. For that reason, we consider the question

Does training on a subset of subjects generalize to the remaining ones? (2)

for the selected pipeline. Since EEG data is noisy and different hyperparameters may emerge from the cross-validation, we consider the possibility that, even though it is unlikely that each subject generalizes to every other subject, there might be clusters consisting of subjects with similar responses. We denote the fact that evaluating samples from subject si with the classifier trained on subject sj yields acceptable results (significantly above chance) as

si ← sj .

To exploit this for omitting the training phase for new subjects, it is beneficial for the clusters to form an equivalence relation, hence to satisfy the following three properties for arbitrary valid indices i, j, k ∈ ℕ₀:

1. Reflexivity: the property si ← si is guaranteed in case the classification pipeline yields sufficient results for single-subject training for each of the selected subjects.

2. Transitivity, defined as the implication

si ← sj ← sk ⇒ si ← sk,

hence si can be classified using the classifier trained on subject sk.

3. Symmetry, defined as

si ← sj ⇒ sj ← si,

thus the sets of subjects used for training and testing can be swapped without major influence on the percentage of subjects yielding acceptable results.

To that end, we first evaluate whether those properties hold by training on one of the subjects and evaluating on the others. We provide a brief extract of the relationship diagram and investigate whether the above-mentioned properties hold. Secondly, we analyze the performance of training on a subset of subjects and evaluating on the samples of a different set.

4.3 Improvement of the classification performance

In the last experiment, we determine the performance of a newly created pipeline, designed to achieve a better classification performance and to offer a notion of uncertainty in the classification result.

The pipeline is constructed on the basis of the CSP-LDA pipeline, augmented by a first data-preprocessing step that increases the sample-to-dimensionality ratio. In the previous experiments, there are only 45 samples at our disposal, a number that is reduced even further by the separation into training data and cross-validation splits. The raw signal of each sample, on the other hand, has a dimensionality of 38,720. This set of 45 samples is further referred to as the original set.

We increase the number of usable samples by treating each sliding window, with the fixed diameter taken from the cross-validation in section 4.1, as a new sample. We call this set the enhanced set⁴. These enhanced samples are highly overlapping, which violates the assumption that the data is drawn independently from a fixed probability distribution. To avoid data snooping, the data is split into training set, test set and cross-validation splits before the just-described data generation process. For the evaluation on the test set, we infer the label of an original sample from a majority vote over the corresponding enhanced samples. The percentage of differently assigned enhanced samples yields information on the uncertainty of the prediction.

⁴Note that the considered sliding windows are those evaluated separately in section 4.2.
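The expansion and the majority vote can be sketched as follows (window width and step are illustrative, not the cross-validated values):

```python
import numpy as np

def expand(trial, width, step):
    """Slide a fixed-width window over one trial (channels x samples)
    and return every window position as a new sample."""
    _, n_samples = trial.shape
    starts = range(0, n_samples - width + 1, step)
    return np.stack([trial[:, s:s + width] for s in starts])

def majority_vote(window_labels):
    """Infer the original trial's label from its windows' predicted labels;
    the dissenting fraction serves as an uncertainty estimate."""
    votes = np.bincount(window_labels, minlength=2)
    label = int(np.argmax(votes))
    uncertainty = 1.0 - votes[label] / votes.sum()
    return label, uncertainty

trial = np.zeros((64, 605))                  # one 5 s trial at 121 Hz
windows = expand(trial, width=121, step=30)  # 1 s windows, ~0.25 s step
label, unc = majority_vote(np.array([1, 1, 0, 1]))
```

Splitting into training and test sets must happen before `expand` is applied, exactly as described above, since windows of the same trial are strongly correlated.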


5 Results

In the following sections we present the results of the experiments described in section 4.

5.1 Pipeline evaluation

PSD analysis We observe a significant mean PSD difference between the two classes up to 30 Hz, as depicted in Figure 3a. Normalizing the amplitudes by the corresponding frequency's

(a) Average difference in PSD with respect to channels and subjects.

(b) Average difference normalized by the occurrence of the frequency in the signal.

Figure 3: Difference in Power Spectral Density.

fraction of occurrence yields an entirely different distribution, depicted in Figure 3b. However, frequencies higher than 30 Hz barely occur and are hence of no use for the classification task. Since the occurrence of frequencies below 7 Hz is unstable, a band-pass filter is applied to the data, filtering out all frequencies outside the range of 7 to 30 Hz.
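Such a 7-30 Hz band-pass can be sketched with a zero-phase Butterworth filter in SciPy (the filter order and the test signal are illustrative):

```python
import numpy as np
from scipy import signal

fs = 121.0
# 4th-order Butterworth band-pass keeping the 7-30 Hz band
sos = signal.butter(4, [7.0, 30.0], btype="bandpass", fs=fs, output="sos")

t = np.arange(0, 5.0, 1.0 / fs)
# 2 Hz drift + 10 Hz mu rhythm + 50 Hz mains interference
x = np.sin(2*np.pi*2*t) + np.sin(2*np.pi*10*t) + np.sin(2*np.pi*50*t)

# forward-backward filtering: zero phase shift, so the response is not delayed
x_f = signal.sosfiltfilt(sos, x)

# spectral amplitudes at the three component frequencies after filtering
F = np.abs(np.fft.rfft(x_f))
freqs = np.fft.rfftfreq(x_f.size, d=1.0 / fs)
a2, a10, a50 = (F[np.argmin(np.abs(freqs - f))] for f in (2.0, 10.0, 50.0))
```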

Pipeline on subject 1 Considering the classifiers' cross-validation and test-set performance on subject 1 in Figures 4a and 4b respectively, two main results can be observed. Firstly, the classifiers

(a) Cross-validation performance and standard deviation. The best cross-validation performance is achieved by the CSP-LDA pipeline for n = 4.

(b) Test-set performance: the CSP-involving pipelines yield the best results.

Figure 4: Performance of the classifiers trained on subject 1.

involving CSP prefer a lower number of spatial filters: the cross-validation performance decreases almost monotonically with n. Secondly, both the test-set and the cross-validation performance of the two pipelines involving CSP surpass those of the others.

Surprisingly, the Fast ICA pipeline does not perform significantly better than the PCA approach. The cross-validation standard deviation is lowest for the two CSP classifiers. The best results are achieved with n ≈ 2.

More detailed information on the misclassified samples is contained in the confusion matrices in Figure 5.


Figure 5: Confusion matrices for each pipeline, training on subject 1.

Since the combination of CSP with LDA tends to yield the best performance, the following experiments investigating the pipeline's performance are conducted using the CSP-LDA pipeline.

Pipeline on multiple subjects The selected pipeline is also applicable to other subjects, as can be seen in Figure 6. Even though there is a large variation in the selected n, the preference for a lower n holds for the majority of the tested subjects. Furthermore, the mean test-set performance shrinks with a larger cross-validated n.

(a) A histogram containing the test-set performance.

(b) Alongside a histogram showing the occurrence of the optimal n, the mean test-set performance of the corresponding subjects is shown.

Figure 6: Test-set performance of 50 subjects, including information on the selected n and the corresponding mean test-set performance for the CSP-LDA pipeline.

5.2 Time consumption

Correct classifications per hour Figure 7a displays the test-set performance for the pipeline, cross-validated with respect to the optimal number of components. This cross-validation is performed for each starting point in time of the selected interval on the x-axis, and for each of the 7 subjects. The blue curve, achieving a test-set performance of 100%, does not yield the largest number of correct classifications per hour. In case wrong classifications do not affect the system, it is often preferable to choose a classifier using a small window over the best-performing classifier, such as the classifier of the pink subject: it uses a diameter of 0.5 seconds, compared to 2.5 seconds for the blue classifier.

Subject-independent classification Figure 8a shows the test-set performance for training on subject 1 and evaluating on all available subjects, as an example of the conducted inter-subject classification experiments. The properties listed below also hold for training on other subjects. The first thing to notice is the classifier's chance-level performance for all subjects except subjects 1 and 93. As expected, the relation's reflexivity property holds. The extract of the relationship map in Figure 8 reveals that this is not the case for symmetry and transitivity. While


Figure 7: Performance and correct classifications per hour, respectively. The x-axis contains the starting point of the sliding window. Each subject, depicted with a different color, possesses its custom sliding-window diameter, cross-validated beforehand. For both graphs, the output has been smoothed, depicting only the best classification result in a range of 3 discrete points in time and interpolating with a cubic spline between two accepted discrete points in time⁵.

subject 93 can be classified using the pipeline of subject 1, this relation cannot be reversed. Furthermore, even though subject 34 can be classified using the classifier of subject 94, the performance of subject 1's classifier on it is around 50%. These example results generalize: the overall percentage of symmetric and two-step transitive relationships is at chance level. Thus it is not possible to omit the training

(a) Performance of the classifier trained on subject 1, evaluated on all available subjects.

(b) An extract of the relationship map. The edges are defined as described in section 4.2. The evaluation has been performed for the subjects identified by the numbers inside the nodes.

Figure 8: Evaluation of inter-subject classification.

phase for the presented classifier using the optimal parameters for one subject. The inter-subject classification results in Figure 9, achieved using multiple randomly selected subjects for training and a random set of the remaining subjects for testing, emphasize this observation: for subject-independent classification we only achieve a test-set performance of approximately 60%.

5.3 Improvement of the classification performance

Figure 10 depicts the test-set results for the enhanced classifier described in section 4.3. Further investigation shows that the newly created classifier achieves slightly better results than the CSP-LDA pipeline on the first 10 subjects. Furthermore, it offers a notion of uncertainty, as sketched in Figure 10b. The overall uncertainty tends to be larger for wrongly classified samples (with a mean of ≈ 30% compared to ≈ 10% for the first 5 subjects).

A major drawback is an increase in training time, since the cross-validation involves the diameter of the extracted windows. As far as subject-independent classification is concerned, the previously used classifier resembles the newly created one. Not only do the symmetry and transitivity properties

⁵This procedure does not change the output; it only improves readability.


Figure 9: The test set performance for subject-independent training and testing.

(a) Confusion matrices of the enhanced set and of the original set respectively, the latter being inferred by a majority vote over the corresponding enhanced samples. All original samples have been correctly classified. In the enhanced set, the test-set performance is also acceptable.

(b) Histogram containing the uncertainty of the original samples' predictions. Two of the correctly classified samples have a rather large uncertainty.

Figure 10: Test-set performance and uncertainty: the newly created classifier applied to subject 1.

not hold for either of the two classifiers, but there is also a large proximity in the related subjects, as can be seen by comparing Figure 8 to Figure 11. The results indicate that the new pipeline behaves slightly better for inter-subject classification.

(a) Performance of the classifier trained on subject 1, evaluated on all available subjects.

(b) An extract of the relationship map. The edges are defined as described in section 4.2. The evaluation has been performed for the subjects identified by the numbers inside the nodes.

Figure 11: Evaluation of intra-subject classification for the new pipeline.


6 Discussion

CSP is currently the state-of-the-art method for EEG-based BCIs specifically designed to discriminate motor imagery signal classes from each other. Combined with simple LDA, it offers a reasonable trade-off between accuracy and speed of computation. For our experiments, this pipeline achieved better results than any of the other tested state-of-the-art pipelines. It is applicable to all tested subjects in our dataset. We reproduced the classification performance of this pipeline as presented in other papers [14, 15].

We further conducted experiments for optimizing the time consumption for data acquisition and classification. Using a pause of 1 second between experiments, the best value achieved is 2,000 correct classifications per hour.

It turned out that it is not possible to use the pipeline in a subject-independent fashion, i.e. to omit the training phase for new subjects. This is due to high inter-subject and inter-session variability. A few research papers [11, 12] have been published that try to generalize signal preprocessing and classifiers; they report that the highest accuracy for subject-independent BCI systems is around 77%.

To the best of our knowledge, there is no research paper that deals with improving classification performance by data expansion. Our sliding-window approach for data expansion turned out to be successful and to slightly outperform the CSP-LDA pipeline in our setup. This advantage in performance comes with a major drawback: the computation time for training increases significantly. Also with this approach, we did not achieve good results for subject-independent classification.


References


[4] Tim Urban (2017) Neuralink and the Brain’s Magical Future, https://waitbutwhy.com/2017/04/neuralink.html#part3

[5] Grosse-Wentrup M, Buss M (2008) Multiclass Common Spatial Patterns and Information Theoretic FeatureExtraction. In IEEE Transactions on Biomedical Engineering 55(8):1991 - 2000

[6] Schalk, G., McFarland, D.J., Hinterberger, T., Birbaumer, N., Wolpaw, J.R. BCI2000: A General-PurposeBrain-Computer Interface (BCI) System. IEEE Transactions on Biomedical Engineering 51(6):1034-1043, 2004.[In 2008, this paper received the Best Paper Award from IEEE TBME.]

[7] www.bci2000.org

[8] Goldberger AL, Amaral LAN, Glass L, Hausdorff JM, Ivanov PCh, Mark RG, Mietus JE, Moody GB, PengC-K, Stanley HE. PhysioBank, PhysioToolkit, and PhysioNet: Components of a New Research Resource forComplex Physiologic Signals. Circulation 101(23):e215-e220 [Circulation Electronic Pages; http://circ.ahajournals.org/cgi/content/full/101/23/e215]; 2000 (June 13).

[9] Ahn M., Hong J.H., Jun S.C. (2010) Source Space Based Brain Computer Interface. In: Supek S., SušacA. (eds) 17th International Conference on Biomagnetism Advances in Biomagnetism – Biomag2010. IFMBEProceedings, vol 28. Springer, Berlin, Heidelberg

[10] Blankertz, B. et al. (2008) Optimizing Spatial Filters for Robust EEG Single-Trial Analysis. IEEE Signal Processing Magazine 25(1):41-56

[11] Hoang, T.T. (2014) Multivariate Features for Multi-class Brain Computer Interface Sys-tems. University of Canberra. http://www.canberra.edu.au/researchrepository/file/b4366e1c-d2eb-41fc-b8cf-4fcdd077887b/1/full_text.pdf

[12] Lotte, F., Guan, C., and Ang, K. K. (2009). Comparison of designs towards a subject-independent brain-computer interface based on motor imagery. In Engineering in Medicine and Biology Society, 2009. EMBC2009. Annual International Conference of the IEEE, pages 4543–4546.

[13] Reuderink, B., Farquhar, J., Poel, M., and Nijholt, A. (2011). A subject-independent brain-computerinterface based on smoothed, second-order baselining. In 33rd Annual IEEE Conference on Engineering inMedicine and Biology, EMBC 2011, pages 4600–4604

[14] Pfurtscheller G, Brunner C, Schlgl A, Lopes da Silva FH (2006) Mu rhythm (de)synchronization and EEGsingle-trial classification of different motor imagery tasks. NeuroImage 31: 153–159. pmid:16443377

[15] Graimann B, Allison BZ, Pfurtscheller G (2010) Brain-Computer Interfaces: Revolutionizing Human-Computer Interaction. Springer Science & Business Media. 00066.
