
Personalizing EEG-based Affective Models with Transfer Learning

Wei-Long Zheng1 and Bao-Liang Lu1,2,3,∗

1 Center for Brain-like Computing and Machine Intelligence, Department of Computer Science and Engineering
2 Key Laboratory of Shanghai Education Commission for Intelligent Interaction and Cognitive Engineering
3 Brain Science and Technology Research Center
Shanghai Jiao Tong University, Shanghai, China

weilong, [email protected]

Abstract

Individual differences across subjects and the non-stationary characteristics of electroencephalography (EEG) limit the generalization of affective brain-computer interfaces in real-world applications. On the other hand, it is very time-consuming and expensive to acquire a large amount of subject-specific labeled data for learning subject-specific models. In this paper, we propose to build personalized EEG-based affective models without labeled target data using transfer learning techniques. We mainly explore two types of subject-to-subject transfer approaches. One is to exploit the shared structure underlying the source domain (source subjects) and the target domain (target subject). The other is to train multiple individual classifiers on source subjects and transfer knowledge about classifier parameters to target subjects; its aim is to learn a regression function that maps feature distributions to classifier parameters. We compare the performance of five different approaches on an EEG dataset for constructing an affective model with three affective states: positive, neutral, and negative. The experimental results demonstrate that our proposed subject transfer framework achieves a mean accuracy of 76.31%, compared with 56.73% on average for a conventional generic classifier.

1 Introduction

Affective brain-computer interfaces (aBCIs) [Muhl et al., 2014a] introduce affective factors into conventional brain-computer interfaces [Chung et al., 2011]. aBCIs provide relevant context information about users' affective states in brain-computer interfaces (BCIs) [Zander and Jatzev, 2012], which can help BCI systems react adaptively according to users' affective states rather than in a rule-based fashion. This is an efficient way to enhance current BCI systems by increasing the information flow without additional cost. Therefore, aBCIs have attracted increasing interest in both the research and industry communities, and various studies have demonstrated their efficiency and feasibility [Muhl et al., 2014a; 2014b; Jenke et al., 2014; Eaton et al., 2015]. Although considerable progress has been made in the development of aBCIs in recent years, challenges remain, such as adaptation to changing environments and individual differences.

∗Corresponding author

Until now, most studies have emphasized the choice of features and classifiers [Singh et al., 2007; Jenke et al., 2014; Abraham et al., 2014] and neglected individual differences in target persons. They focus on training subject-specific affective models. However, these approaches are practically infeasible in real-world scenarios since they require collecting a large amount of labeled data. In addition, the calibration phase is time-consuming and annoying. An intuitive and straightforward way of dealing with this problem is to train a generic classifier on data collected from a group of subjects and then make inferences on unseen data from a new subject. However, existing studies indicate that the performance of such generic classifiers is dramatically degraded due to the structural and functional variability between subjects as well as the non-stationary nature of EEG signals [Samek et al., 2013; Morioka et al., 2015]. Technically, this issue is known as the covariate-shift challenge [Sugiyama et al., 2007]. An alternative way of dealing with this problem is to personalize a generic classifier for target subjects in an unsupervised fashion with knowledge transfer from the existing labeled data in hand.

The problem mentioned above has motivated many researchers from different fields to develop transfer learning and domain adaptation algorithms [Duan et al., 2009; Pan and Yang, 2010; Chu et al., 2013]. Transfer learning methods try to transfer knowledge from a source domain to a target domain with few or no labeled samples available from the subjects of interest, which correspond to the inductive and transductive setups, respectively. Figure 1 illustrates the covariate-shift challenges of constructing EEG-based affective models. Traditional machine learning methods assume that training data and test data are independently and identically distributed (i.i.d.). However, due to the variability from subject to subject, this assumption cannot always be satisfied in aBCIs.

[Figure 1: three-dimensional scatter plot with axes Component 1, Component 2, and Component 3; legend: Negative, Neutral, and Positive for Subject 1 and Subject 2.]

Figure 1: Illustration of the covariate-shift challenges of constructing EEG-based affective models. Here, two sample subjects (subjects 1 and 2) each have three classes of emotions, and the EEG features differ in conditional probability distribution across subjects.

In this work, we adopt transfer learning algorithms in a transductive setup (without any labeled samples from target subjects) to tackle the subject-to-subject variability in building EEG-based affective models. Let $X \in \mathcal{X}$ be the EEG recording of a sample $(X, y)$, where $y \in \mathcal{Y}$ represents the corresponding emotion label. In this case, $\mathcal{X} = \mathbb{R}^{C \times d}$, where $C$ is the number of channels and $d$ is the number of time series samples. Let $P(X)$ be the marginal probability distribution of $X$. According to [Pan and Yang, 2010], $\mathcal{D} = \{\mathcal{X}, P(X)\}$ is a domain, which in our case is a given subject from whom we record the EEG signals. The source and target domains in this paper share the same feature space, $\mathcal{X}_S = \mathcal{X}_T$, but the respective marginal probability distributions are different, that is, $P(X_S) \neq P(X_T)$. The key assumption in most domain adaptation methods is that $P(Y_S | X_S) = P(Y_T | X_T)$.

Recently, Jayaram and colleagues published a timely survey of current transfer learning techniques for BCIs [Jayaram et al., 2016]. Morioka et al. proposed to learn a common dictionary shared by multiple subjects and used the resting-state activity of a previously unseen target subject, rather than task sessions, as calibration data to compensate for individual differences [Morioka et al., 2015]. Krauledat and colleagues proposed a zero-training framework for extracting prototypical spatial filters that have better generalization properties [Krauledat et al., 2008]. Although most of these methods are based on variants of common spatial patterns (CSP) for motor-imagery paradigms, some studies have focused on passive BCIs [Wu et al., 2013] and EEG-based emotion recognition [Zheng et al., 2015].

In this paper, we propose to personalize EEG-based affective models by adopting two kinds of domain adaptation approaches in an unsupervised manner. One is to find a shared common feature space between the source and target domains. We apply the Transfer Component Analysis (TCA) and Kernel Principal Component Analysis (KPCA) based methods proposed in [Pan et al., 2011]. These methods learn a set of common transfer components underlying both the source domain and the target domain. When the data are projected onto this subspace, the difference between the feature distributions of the two domains can be reduced. The other is to construct individual classifiers and learn a regression function that maps data distributions to classifier parameters, which is referred to as Transductive Parameter Transfer (TPT) [Sangineto et al., 2014]. We evaluate the performance of these approaches on an EEG dataset, SEED¹, to personalize EEG-based affective models.

2 Methods

2.1 TCA-based Subject Transfer

Transfer component analysis (TCA), proposed by Pan et al., learns a set of common transfer components between the source domain and the target domain [Pan et al., 2011] and finds a low-dimensional feature subspace across the two domains in which the difference between their feature distributions is reduced. The aim of this transfer learning algorithm is to find a transformation $\phi(\cdot)$ such that $P(\phi(X_S)) \approx P(\phi(X_T))$ and $P(Y_S | \phi(X_S)) \approx P(Y_T | \phi(X_T))$ without any labeled data in the target domain (target subjects). An intuitive approach to finding the mapping $\phi(\cdot)$ is to minimize the Maximum Mean Discrepancy (MMD) [Gretton et al., 2006] between the empirical means of the two domains,

$$\mathrm{MMD}(X'_S, X'_T) = \left\| \frac{1}{n_1} \sum_{i=1}^{n_1} \phi(x^s_i) - \frac{1}{n_2} \sum_{i=1}^{n_2} \phi(x^t_i) \right\|^2_{\mathcal{H}}, \tag{1}$$

where $n_1$ and $n_2$ represent the numbers of samples in the source domain and the target domain, respectively. However, $\phi(\cdot)$ is usually highly nonlinear, and a direct optimization with respect to $\phi(\cdot)$ can easily get stuck in poor local minima [Pan et al., 2011].

TCA is a dimensionality-reduction-based domain adaptation method. It embeds both the source and target domain data into a shared low-dimensional latent space using a mapping $\phi$. Specifically, let the Gram matrices defined on the source domain, target domain, and cross-domain data in the embedded space be $K_{S,S}$, $K_{T,T}$, and $K_{S,T}$, respectively. The kernel matrix $K$ is defined on all the data as

$$K = \begin{bmatrix} K_{S,S} & K_{S,T} \\ K_{T,S} & K_{T,T} \end{bmatrix} \in \mathbb{R}^{(n_1+n_2) \times (n_1+n_2)}. \tag{2}$$

By virtue of the kernel trick, the MMD distance can be rewritten as $\mathrm{tr}(KL)$, where $K = [\phi(x_i)^\top \phi(x_j)]$, and $L_{ij} = 1/n_1^2$ if $x_i, x_j \in X_S$, $L_{ij} = 1/n_2^2$ if $x_i, x_j \in X_T$, and $L_{ij} = -1/(n_1 n_2)$ otherwise. A matrix $\widetilde{W} \in \mathbb{R}^{(n_1+n_2) \times m}$ transforms the empirical kernel map $K$ to an $m$-dimensional space (where $m \ll n_1 + n_2$). The resultant kernel matrix is

$$\widetilde{K} = (K K^{-1/2} \widetilde{W})(\widetilde{W}^\top K^{-1/2} K) = K W W^\top K, \tag{3}$$

where $W = K^{-1/2} \widetilde{W}$. With the definition of $\widetilde{K}$ in Eq. (3), the MMD distance between the empirical means of the two domains $X'_S$ and $X'_T$ can be rewritten as

$$\mathrm{Dist}(X'_S, X'_T) = \mathrm{tr}((K W W^\top K) L) = \mathrm{tr}(W^\top K L K W). \tag{4}$$

A regularization term $\mathrm{tr}(W^\top W)$ is usually added to control the complexity of $W$ while minimizing Eq. (4).

¹ http://bcmi.sjtu.edu.cn/~seed/index.html
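The identity behind $\mathrm{tr}(KL)$ can be checked numerically. The following sketch is not part of the original implementation: it assumes a linear kernel (so that $\phi$ is the identity map), uses made-up toy data, and the helper names `build_L`, `linear_gram`, `trace_of_product`, and `squared_mean_distance` are hypothetical. Under these assumptions, $\mathrm{tr}(KL)$ should coincide with the squared distance between the source and target means.

```python
def build_L(n1, n2):
    """MMD coefficient matrix L for n1 source and n2 target samples."""
    n = n1 + n2
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i < n1 and j < n1:          # both samples in the source domain
                L[i][j] = 1.0 / (n1 * n1)
            elif i >= n1 and j >= n1:      # both samples in the target domain
                L[i][j] = 1.0 / (n2 * n2)
            else:                          # cross-domain pair
                L[i][j] = -1.0 / (n1 * n2)
    return L


def linear_gram(samples):
    """Gram matrix of a linear kernel: K[i][j] = <x_i, x_j>."""
    return [[sum(a * b for a, b in zip(x, y)) for y in samples] for x in samples]


def trace_of_product(A, B):
    """tr(AB) for two square matrices of equal size."""
    n = len(A)
    return sum(A[i][j] * B[j][i] for i in range(n) for j in range(n))


def squared_mean_distance(source, target):
    """Squared Euclidean distance between the two empirical means."""
    mu_s = [sum(col) / len(source) for col in zip(*source)]
    mu_t = [sum(col) / len(target) for col in zip(*target)]
    return sum((a - b) ** 2 for a, b in zip(mu_s, mu_t))


# Toy data (illustrative only): 2 source samples and 3 target samples.
source = [[0.0, 1.0], [2.0, 3.0]]
target = [[1.0, 1.0], [3.0, 5.0], [2.0, 3.0]]

K = linear_gram(source + target)            # source samples first, then target
L = build_L(len(source), len(target))
mmd_trace = trace_of_product(K, L)          # tr(KL)
mmd_direct = squared_mean_distance(source, target)
```

With a nonlinear kernel the direct computation is no longer available, which is why the trace form is the one used in the derivation.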

Besides reducing the difference between the two distributions, $\phi$ should also preserve the data variance that is related to the target learning task. From Eq. (3), the variance of the projected samples is $W^\top K H K W$, where $H = I_{n_1+n_2} - (1/(n_1+n_2)) \mathbf{1}\mathbf{1}^\top$ is the centering matrix, $\mathbf{1} \in \mathbb{R}^{n_1+n_2}$ is the column vector of all ones, and $I_{n_1+n_2} \in \mathbb{R}^{(n_1+n_2) \times (n_1+n_2)}$ is the identity matrix.

Therefore, the objective function of TCA is

$$\min_{W} \ \mathrm{tr}(W^\top K L K W) + \mu\, \mathrm{tr}(W^\top W) \quad \text{s.t.} \quad W^\top K H K W = I_m, \tag{5}$$

where $\mu > 0$ is a regularization parameter and $I_m \in \mathbb{R}^{m \times m}$ is the identity matrix. According to [Pan et al., 2011], the solutions $W$ are the $m$ leading eigenvectors of $(KLK + \mu I)^{-1} KHK$, where $m \le n_1 + n_2 - 1$. The algorithm of TCA for subject transfer is summarized in Algorithm 1. We refer readers to [Pan et al., 2011] for a detailed description of TCA. After obtaining the transformation matrix $W$, standard machine learning methods can be used in this feature subspace.

Algorithm 1 TCA-based Subject Transfer
input: Source domain data set $D_S = \{(x^s_i, y^s_i)\}_{i=1}^{n_1}$ and target domain data set $D_T = \{x^t_j\}_{j=1}^{n_2}$.
output: Transformation matrix $W$.
1: Compute the kernel matrix $K$ from $\{x^s_i\}_{i=1}^{n_1}$ and $\{x^t_j\}_{j=1}^{n_2}$, the matrix $L$, and the centering matrix $H$.
2: Eigendecompose the matrix $(KLK + \mu I)^{-1} KHK$ and select the $m$ leading eigenvectors to construct the transformation matrix $W$.
3: Return the transformation matrix $W$.
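The steps of Algorithm 1 can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' MATLAB implementation: the function name `tca_fit`, the random toy data, and the choice of `numpy.linalg.eig` are ours, and the eigenvectors are left at unit norm rather than rescaled to satisfy the constraint in Eq. (5). A linear kernel is used, as in the evaluation described later.

```python
import numpy as np


def tca_fit(Xs, Xt, m, mu=1.0):
    """Sketch of TCA with a linear kernel.

    Returns (W, K): the (n1+n2) x m transformation matrix and the kernel
    matrix built from the stacked source and target samples.
    """
    X = np.vstack([Xs, Xt])
    n1, n2 = Xs.shape[0], Xt.shape[0]
    n = n1 + n2

    K = X @ X.T                                  # linear kernel matrix
    # L = e e^T reproduces 1/n1^2, 1/n2^2 and -1/(n1*n2) blockwise.
    e = np.concatenate([np.full(n1, 1.0 / n1), np.full(n2, -1.0 / n2)])
    L = np.outer(e, e)
    H = np.eye(n) - np.ones((n, n)) / n          # centering matrix

    # Leading eigenvectors of (KLK + mu*I)^{-1} KHK.
    M = np.linalg.solve(K @ L @ K + mu * np.eye(n), K @ H @ K)
    eigvals, eigvecs = np.linalg.eig(M)
    order = np.argsort(-eigvals.real)[:m]        # indices of the m largest
    W = eigvecs[:, order].real
    return W, K


# Toy data (illustrative only): 30 source and 20 target samples, 5 features.
rng = np.random.default_rng(0)
Xs = rng.normal(0.0, 1.0, size=(30, 5))
Xt = rng.normal(0.5, 1.0, size=(20, 5))

W, K = tca_fit(Xs, Xt, m=3)
Z = K @ W                                        # embedded samples
Zs, Zt = Z[:30], Z[30:]                          # source / target embeddings
```

A classifier would then be trained on `Zs` with the source labels and applied to `Zt`. Because the kernel matrix grows quadratically with the number of samples, this direct form is only practical for a subsample of the data.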

2.2 KPCA-based Subject Transfer

Kernel PCA [Scholkopf et al., 1998] projects the original $D$-dimensional feature space into an $M$-dimensional feature space with a nonlinear transformation $\phi(x)$, where $M \ll D$. For KPCA-based subject transfer, we concatenate the source and target data as the training data and construct the kernel matrix. The kernel principal components are then computed using singular value decomposition. Each sample $x_i$ is projected to a point $\phi(x_i)$ in a lower-dimensional subspace. The algorithm of KPCA-based subject transfer is summarized in Algorithm 2.

Algorithm 2 KPCA-based Subject Transfer
input: Source domain data set $D_S = \{(x^s_i, y^s_i)\}_{i=1}^{n_1}$ and target domain data set $D_T = \{x^t_j\}_{j=1}^{n_2}$.
output: The kernel principal components $p_k$.
1: Concatenate the source and target domain data sets as the training data set, $\{x_i\}_{i=1}^{n_1+n_2} = [\{x^s_i\}_{i=1}^{n_1}; \{x^t_j\}_{j=1}^{n_2}]$.
2: Construct the kernel matrix from the training data set $\{x_i\}_{i=1}^{n_1+n_2}$.
3: Compute the vectors $a_k$.
4: Compute the kernel principal components $p_k$.
5: Return the kernel principal components $p_k$.
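Algorithm 2 can be illustrated with the following NumPy sketch. It makes several assumptions that are not stated in the text: a linear kernel, explicit centering of the kernel matrix, and a symmetric eigendecomposition (`numpy.linalg.eigh`) in place of the singular value decomposition mentioned above (the two coincide for a symmetric positive semi-definite matrix). The function name `kpca_transfer` and the toy data are hypothetical.

```python
import numpy as np


def kpca_transfer(Xs, Xt, n_components):
    """Sketch of KPCA on the concatenated source and target samples.

    Returns the projections of the source and target samples onto the
    leading kernel principal components.
    """
    X = np.vstack([Xs, Xt])                  # step 1: concatenate the data
    n1, n = Xs.shape[0], X.shape[0]

    K = X @ X.T                              # step 2: linear kernel matrix
    H = np.eye(n) - np.ones((n, n)) / n
    Kc = H @ K @ H                           # center the data in feature space

    # Step 3: coefficient vectors a_k from the leading eigenpairs.
    eigvals, eigvecs = np.linalg.eigh(Kc)    # ascending eigenvalue order
    order = np.argsort(eigvals)[::-1][:n_components]
    lam = eigvals[order]                     # assumed strictly positive
    A = eigvecs[:, order] / np.sqrt(lam)     # normalize the components

    P = Kc @ A                               # step 4: kernel principal components
    return P[:n1], P[n1:]


# Toy data (illustrative only).
rng = np.random.default_rng(0)
Xs = rng.normal(0.0, 1.0, size=(30, 5))
Xt = rng.normal(0.5, 1.0, size=(20, 5))

Zs, Zt = kpca_transfer(Xs, Xt, n_components=3)
```

Unlike TCA, this projection only maximizes variance on the pooled data; nothing in it explicitly reduces the distance between the source and target distributions.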

2.3 Transductive Parameter Transfer

The transductive parameter transfer (TPT) approach was first proposed by Sangineto and colleagues for action unit detection and spontaneous pain recognition [Sangineto et al., 2014]. The TPT approach consists of three main steps, as illustrated in Figure 2. First, individual classifiers are learned on each training dataset $D^s_i$. Second, a regression function is trained to learn the relation between the data distributions and the classifiers' parameter vectors. Finally, the target classifier is obtained using the target feature distribution and the distribution-to-classifier mapping.

[Figure 2: block diagram with the labels Labeled Source Data ($D_1, D_2, \ldots, D_N$), Individual Classifier ($\theta_1, \theta_2, \ldots, \theta_N$), Feature Distributions, Parameter Regression Model, Parameter Transfer, Unlabeled Target Data ($X_t$), Personalized Affective Model ($\theta_t$), Training Phase, and Testing Phase.]

Figure 2: The framework of the transductive parameter transfer (TPT) approach adopted in this work.

We adopt the TPT approach to personalize EEG-based affective models in this paper. In the first phase, we train individual classifiers on each source dataset $D^s_i$. Here, we use a linear support vector machine (SVM) as the classifier, and $\theta_i = [w'_i, b_i]$ defines the hyperplane in the feature space. The objective function is as follows,

$$\min_{w, b} \ \frac{1}{2} \|w\|^2 + \lambda_L \sum_{j=1}^{n^s_i} l(w' x^s_j + b,\ y^s_j), \tag{6}$$

where $l(\cdot)$ is the hinge loss.

In the second step, a regression function $f$ is learned for the mapping $\mathcal{D} \to \Theta$ from the source data distributions. Since the data distributions and the corresponding optimal hyperplanes are related, learning this mapping allows us to predict the hyperplane and construct the classifier for the target data without any label information. To quantify the similarity between pairs of datasets $X_i$ and $X_j$, a kernel function $\kappa(X_i, X_j)$ is adopted. In this implementation, we use the density estimation kernel [Blanchard et al., 2011], which is defined as follows,

$$\kappa(X_i, X_j) = \frac{1}{nm} \sum_{p=1}^{n} \sum_{q=1}^{m} \kappa_{\mathcal{X}}(x_p, x_q), \tag{7}$$

where $n$ and $m$ are the cardinalities of $X_i$ and $X_j$, respectively, and $\kappa_{\mathcal{X}}(\cdot)$ is a Gaussian kernel. After computing the kernel matrix, the mapping function $f$ can be learned using the Multioutput Support Vector Regression framework [Tuia et al., 2011].
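Eq. (7) averages a sample-level Gaussian kernel over all pairs drawn from two datasets. A minimal pure-Python sketch is shown below; the Gaussian kernel form $\exp(-\gamma \|x - y\|^2)$, the bandwidth parameter `gamma`, the function names, and the three toy "subjects" are assumptions made for illustration and are not taken from the paper.

```python
import math


def gaussian_kernel(x, y, gamma=1.0):
    """Sample-level Gaussian kernel exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def density_estimation_kernel(Xi, Xj, gamma=1.0):
    """Dataset-level kernel of Eq. (7): mean sample kernel over all pairs."""
    total = sum(gaussian_kernel(xp, xq, gamma) for xp in Xi for xq in Xj)
    return total / (len(Xi) * len(Xj))


# Toy feature sets (illustrative only): a and b overlap, c lies far away.
subject_a = [[0.0, 0.0], [0.2, 0.1]]
subject_b = [[0.1, 0.0], [0.3, 0.2], [0.2, 0.2]]
subject_c = [[3.0, 3.0], [3.2, 2.9]]

k_ab = density_estimation_kernel(subject_a, subject_b)
k_ac = density_estimation_kernel(subject_a, subject_c)
```

In this toy setting `k_ab` is larger than `k_ac`, matching the intuition that subjects with overlapping feature distributions are treated as more similar. The datasets may have different sizes, since the kernel normalizes by $nm$.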

Finally, the parameter vector of the target classifier can be predicted by $\theta_t = f(X_t)$ without any label information from target subjects. Given the target features $x$ and the classifier parameters $\theta_t$, the label can be predicted by the decision function $y = \mathrm{sign}(w'_t x + b_t)$. The algorithm of TPT-based subject transfer is summarized in Algorithm 3. For more details about the TPT algorithm, we refer readers to [Sangineto et al., 2014].

Algorithm 3 TPT-based Subject Transfer
input: Source domain data sets $D^s_1, \ldots, D^s_N$, target domain data set $X_t$, and regularization parameters for the SVMs.
output: The parameter vector of the target classifier: $(w_t, b_t)$.
1: Construct individual classifiers: $\{\theta_i = (w_i, b_i)\}_{i=1}^{N}$.
2: Create a training set $T = \{X^s_i, \theta_i\}_{i=1}^{N}$.
3: Compute the kernel matrix $K$, $K_{ij} = \kappa(X^s_i, X^s_j)$.
4: Given $K$ and $T$, learn $f(\cdot)$ using multioutput support vector regression.
5: Compute $(w_t, b_t) = f(X_t)$.
6: Return $(w_t, b_t)$.
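Steps 3 to 5 of Algorithm 3 can be sketched as follows. Two substitutions are made here and should be read as assumptions: kernel ridge regression stands in for multioutput support vector regression, and the source classifier parameters `thetas` are hand-picked placeholders rather than trained linear SVMs. The function names, the ridge and bandwidth values, and the toy data are likewise hypothetical, and only a binary decision function is shown.

```python
import numpy as np


def dataset_kernel(Xi, Xj, gamma=0.5):
    """Density estimation kernel of Eq. (7) with a Gaussian sample kernel."""
    sq_dist = ((Xi[:, None, :] - Xj[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * sq_dist).mean()


def fit_parameter_regressor(source_sets, thetas, ridge=1e-3, gamma=0.5):
    """Kernel ridge regression from datasets to parameters (stand-in for M-SVR)."""
    N = len(source_sets)
    K = np.array([[dataset_kernel(source_sets[i], source_sets[j], gamma)
                   for j in range(N)] for i in range(N)])
    # Dual coefficients: one row per source subject, one column per parameter.
    return np.linalg.solve(K + ridge * np.eye(N), np.asarray(thetas))


def predict_parameters(alpha, source_sets, Xt, gamma=0.5):
    """theta_t = f(X_t): kernel-weighted combination of the dual coefficients."""
    k_t = np.array([dataset_kernel(Xs, Xt, gamma) for Xs in source_sets])
    return k_t @ alpha


def predict_labels(theta, X):
    """Decision function y = sign(w'x + b), with theta = [w, b]."""
    w, b = theta[:-1], theta[-1]
    return np.sign(X @ w + b)


# Toy data (illustrative only): three source "subjects" in a 2-D feature space.
rng = np.random.default_rng(0)
centers = [(0.0, 0.0), (3.0, 0.0), (0.0, 3.0)]
source_sets = [rng.normal(c, 1.0, size=(20, 2)) for c in centers]
# Placeholder classifier parameters [w1, w2, b], one row per source subject.
thetas = np.array([[1.0, 0.0, -0.5], [0.0, 1.0, 0.2], [-1.0, 1.0, 0.0]])

alpha = fit_parameter_regressor(source_sets, thetas)
Xt = rng.normal((0.2, 0.1), 1.0, size=(15, 2))      # unlabeled target data
theta_t = predict_parameters(alpha, source_sets, Xt)
labels = predict_labels(theta_t, Xt)
```

Adding a new source subject only requires one more row and column of the dataset-level kernel matrix and a refit of the small regression problem, which is the incremental property discussed in the results.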

3 Experiment Setup

3.1 EEG Dataset for Constructing Affective Models

We adopt film clips as stimuli to elicit emotions in the laboratory environment, since film clips contain both visual and auditory stimuli. A preliminary study is conducted using scores (1-5) to select a pool of film clips to elicit three emotions: positive, neutral, and negative. Each clip is edited to provide a coherent context for the corresponding emotion and has a length of about four minutes. Finally, 15 emotional film clips with high ratings are chosen from the materials. For the experimental protocol, each experiment consists of 15 sessions with 15 different film clips. For each session, there is a 5 s hint before each clip, followed after the clip by a 45 s self-assessment and a 15 s rest. In total, 15 subjects (8 females; mean age: 23.27, std: 2.37) participated in our experiments. They are all informed about the experiments and given instructions before the experiments. They are required to elicit their own corresponding emotions while watching the film clips. Only the data for which the intended emotions were elicited are used in further analysis. EEG data are recorded simultaneously with a 62-electrode cap placed according to the international 10-20 system, using the ESI NeuroScan System. The original sampling rate is 1000 Hz. The impedance of each electrode is lower than 5 kΩ.

3.2 Data Preprocessing and Feature Extraction

For data preprocessing, since EEG data are often contaminated by electromyography (EMG) signals from facial expressions and electrooculogram (EOG) signals from eye movements [Fatourechi et al., 2007], the raw EEG data are processed with a bandpass filter between 1 Hz and 75 Hz, and the data with serious noise and artifacts are discarded. The 62-channel EEG signals are further down-sampled to 200 Hz, and EEG features are extracted from each segment of the preprocessed EEG data with a non-overlapping 1 s time window.

For feature extraction, we employ differential entropy (DE) features [Duan et al., 2013; Zheng and Lu, 2015], which show superior performance to conventional power spectral density (PSD) features. According to [Zheng and Lu, 2015], for a fixed-length EEG sequence, the DE feature is equivalent to the logarithm of the PSD in a certain frequency band. Therefore, the DE features can be calculated using the short-time Fourier transform in five frequency bands that are widely used in EEG studies (δ: 1-3 Hz, θ: 4-7 Hz, α: 8-13 Hz, β: 14-30 Hz, and γ: 31-50 Hz). The total feature dimension of a 62-channel EEG segment is therefore 62 × 5 = 310.
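As a rough illustration of the DE feature, the sketch below uses the standard closed form for the differential entropy of a Gaussian variable, $\frac{1}{2}\log(2\pi e \sigma^2)$. It assumes that a band-filtered EEG segment is approximately Gaussian and takes the segment as already filtered; the band-pass filtering itself is omitted, and the function and constant names are hypothetical.

```python
import math

# Frequency bands (Hz) as listed in the text.
BANDS = {
    "delta": (1, 3),
    "theta": (4, 7),
    "alpha": (8, 13),
    "beta": (14, 30),
    "gamma": (31, 50),
}
N_CHANNELS = 62
SAMPLING_RATE_HZ = 200
WINDOW_SECONDS = 1


def differential_entropy(segment):
    """DE of a band-filtered segment under a Gaussian assumption."""
    n = len(segment)
    mean = sum(segment) / n
    variance = sum((x - mean) ** 2 for x in segment) / n
    return 0.5 * math.log(2.0 * math.pi * math.e * variance)


samples_per_window = SAMPLING_RATE_HZ * WINDOW_SECONDS   # 200 samples
feature_dim = N_CHANNELS * len(BANDS)                     # 62 * 5 = 310

# Toy segment (illustrative only) with zero mean and unit variance.
toy_segment = [-1.0, 1.0] * (samples_per_window // 2)
de_value = differential_entropy(toy_segment)
```

Because DE depends on the logarithm of the variance, doubling the amplitude of a segment raises its DE by exactly $\log 2$, which mirrors the stated equivalence with the logarithm of band power.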

3.3 Evaluation Details

We adopt a leave-one-subject-out cross-validation method for the evaluation. Each time, we take the data from one subject as the target domain and the data from the remaining 14 subjects as the source domain. For the baseline method, we concatenate the data from all available source subjects as training data and train a generic classifier with a linear SVM. A variant of SVM called Transductive SVM (T-SVM), which learns a decision boundary that maximizes the margin using unlabeled data [Collobert et al., 2006], is also evaluated. We use the implementation of T-SVM in SVMlight [Joachims, 1999].
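The leave-one-subject-out protocol can be written as a small generator. This is only a sketch of the splitting logic; the function name and the use of integer subject identifiers 1 to 15 are assumptions made for illustration.

```python
def leave_one_subject_out(subject_ids):
    """Yield (target, sources) pairs, holding out one subject at a time."""
    subject_ids = list(subject_ids)
    for target in subject_ids:
        sources = [s for s in subject_ids if s != target]
        yield target, sources


# 15 subjects: each one serves once as the target domain, and the
# remaining 14 subjects form the source domain.
splits = list(leave_one_subject_out(range(1, 16)))
first_target, first_sources = splits[0]
```

Reported accuracies are then means and standard deviations over the 15 held-out subjects.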

For TCA and KPCA, it is practically infeasible to include all the data from the available subjects due to the memory and time costs of singular value decomposition. Therefore, we randomly select a subset of 5000 samples from the 14 source subjects as training data. For kernel functions, we employ linear kernels. We evaluate how the performance varies with the dimensionality of the new feature space. The regularization parameter µ is set to 1, the same as in [Pan et al., 2011]. After TCA and KPCA project the original features into a low-dimensional subspace using transfer components, standard linear SVMs are trained on the newly extracted features. The value of the regularization parameter for TPT is 0.1. To deal with the multi-class classification task, we adopt the one-vs-one strategy to avoid the label imbalance problem. All the algorithms are implemented in MATLAB.
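For three emotion classes, the one-vs-one strategy trains one binary classifier per pair of classes (three in total) and combines them by majority vote. The sketch below illustrates only the voting step; the threshold rules on a scalar feature are hypothetical stand-ins for trained linear SVMs, and the function name is ours.

```python
from collections import Counter
from itertools import combinations

CLASSES = ("negative", "neutral", "positive")
PAIRS = list(combinations(CLASSES, 2))      # 3 pairs for 3 classes


def one_vs_one_predict(x, pairwise_classifiers):
    """Majority vote over the pairwise classifiers' predictions."""
    votes = Counter(clf(x) for clf in pairwise_classifiers.values())
    return votes.most_common(1)[0][0]


# Hypothetical pairwise decision rules on a scalar feature x.
pairwise_classifiers = {
    ("negative", "neutral"): lambda x: "negative" if x < -0.5 else "neutral",
    ("negative", "positive"): lambda x: "negative" if x < 0.0 else "positive",
    ("neutral", "positive"): lambda x: "neutral" if x < 0.5 else "positive",
}

predictions = [one_vs_one_predict(x, pairwise_classifiers) for x in (-1.0, 0.0, 1.0)]
```

Each pairwise problem involves only two classes, which is why this scheme avoids the imbalance that a one-vs-rest split would introduce.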

4 Experiment Results

In this section, we carry out experiments to demonstrate the effectiveness of the proposed methods for personalizing EEG-based affective models. All results are obtained using a leave-one-subject-out evaluation scheme. First, we evaluate how the performance of the KPCA and TCA approaches varies with the dimensionality of the low-dimensional feature subspace and choose the best dimensions. Figure 3 depicts the accuracy curves with respect to varying dimensions. TCA achieves better performance than KPCA in lower dimensions (less than 30) and reaches its peak accuracy of 63.64% in the 30-dimensional subspace. In contrast, the accuracy of KPCA increases approximately monotonically with the number of dimensions and saturates at about 35 dimensions. In terms of peak accuracy, TCA outperforms KPCA slightly (63.64% vs 61.28%).

[Figure 3: line plot of accuracy (y-axis, 0.35 to 0.75) against the number of dimensions (x-axis, 0 to 50) for KPCA and TCA.]

Figure 3: Comparison of the KPCA and TCA approaches for different dimensionalities of the subspace.

We compare the performance of the different subject transfer methods. Figure 4 shows the accuracies of the different methods (generic classifiers, KPCA, TCA, T-SVM, and TPT) for all 15 subjects, and Table 1 presents the mean accuracies and standard deviations. The generic classifiers perform poorly, with a mean accuracy of only 56.73%, because this method directly includes all source samples in training. Individual differences across subjects and training samples from irrelevant subjects may bias the optimal hyperplane and dramatically degrade the performance of subject transfer, a phenomenon known as negative transfer. TCA and KPCA learn a set of transfer components underlying both the source and target distributions. The feature distributions of the two domains are similar in the new low-dimensional subspace. Both TCA and KPCA outperform the generic classifiers, indicating efficient knowledge transfer through feature reduction. T-SVM achieves comparable accuracies among these methods, with a mean accuracy of 72.53% and a standard deviation of 14.00%. T-SVM learns the decision boundary in a semi-supervised manner and weights all the training instances equally, which may still introduce some irrelevant source data during training.

The TPT method outperforms the other four approaches with the highest mean accuracy of 76.31% and achieves a significant improvement in comparison with the generic classifiers (one-way analysis of variance, p < 0.01). The TPT method can measure the similarity between pairs of data distributions from different subjects and learn the mapping function from data distributions to classifier parameters. Therefore, it can extract the relevant information to determine the decision function and bypass the bias caused by irrelevant information. Besides accuracy, we compare the generalization performance of the KPCA, TCA, and TPT approaches. KPCA and TCA need to construct the kernel matrix using the source and target domains and find the manifold subspaces where the differences between them can be reduced. Their limitation is the high memory and time cost of training, so they cannot continually benefit from an increasing number of samples in the source domain. In contrast, TPT keeps individual classifiers for the different source subjects. The new regression function can be trained on the kernel matrix alone, so the model can be updated incrementally when data from a new training subject become available. Therefore, the TPT approach is more feasible in practice with regard to both its performance and its incremental learning property.

Stats.   Generic   KPCA     TCA      T-SVM    TPT
Mean     0.5673    0.6128   0.6364   0.7253   0.7631
Std.     0.1629    0.1462   0.1488   0.1400   0.1589

Table 1: The mean accuracies and standard deviations of the five different approaches.

[Figure 4: bar chart of accuracy (y-axis, 0 to 1) against subject index (x-axis, 1 to 15 and Mean) for Generic, KPCA, TCA, T-SVM, and TPT.]

Figure 4: The accuracies of the five different methods (generic classifiers, KPCA, TCA, T-SVM, and TPT) for each subject.

Figure 5: The confusion matrices of the five methods.

The confusion matrices of the five approaches are shown in Figure 5. Each row of a confusion matrix represents the target class and each column represents the predicted class. The element (i, j) is the percentage of samples in class i that are classified as class j. For the generic method, the accuracies for the three emotions are similar. TCA and KPCA improve the classification of positive emotions, with accuracies of 75.88% and 66.23%, respectively. Besides the improved performance in recognizing positive emotions, T-SVM shows a significant increase in performance for recognizing neutral emotions (71.52%). TPT has the highest accuracies for classifying all three emotions among these approaches. Comparing the accuracies of the different emotions, we find that positive emotions can be recognized more easily using EEG, with a comparatively high accuracy of 85.01%. Negative emotions are often confused with neutral emotions (25.76%) and vice versa (10.24%). These results indicate that the neural patterns of negative and neutral emotions are similar. In summary, the experimental results demonstrate the efficiency of the TPT approach for constructing personalized affective models from EEG.
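The row-normalized confusion matrix described above can be computed as in the following sketch. The function name and the toy label lists are made up for illustration and do not correspond to the reported results.

```python
def confusion_matrix_percent(y_true, y_pred, classes):
    """Element (i, j): percentage of class-i samples predicted as class j."""
    index = {label: k for k, label in enumerate(classes)}
    counts = [[0] * len(classes) for _ in classes]
    for truth, prediction in zip(y_true, y_pred):
        counts[index[truth]][index[prediction]] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # Leave an all-zero row when a class has no samples.
        matrix.append([100.0 * c / total if total else 0.0 for c in row])
    return matrix


CLASSES = ("negative", "neutral", "positive")
# Toy labels (illustrative only).
y_true = ["negative", "negative", "neutral", "positive"]
y_pred = ["negative", "neutral", "neutral", "positive"]

cm = confusion_matrix_percent(y_true, y_pred, CLASSES)
```

Each non-empty row sums to 100%, so the diagonal entries are the per-class accuracies and the off-diagonal entries show which emotions are confused with each other.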

5 Conclusion and Future Work

In this paper, we have proposed a novel method for personalizing EEG-based affective models with transfer learning techniques. The affective models are personalized and constructed for a new target subject without any label information. We have compared the performance of five different methods: TPT, T-SVM, TCA, KPCA, and conventional generic classifiers. The experimental results have demonstrated that the transductive parameter transfer approach significantly outperforms the other approaches in terms of accuracy, achieving an increase of 19.58 percentage points in recognition accuracy over the generic classifiers (from 56.73% to 76.31%). TPT can capture the similarity between data distributions by taking advantage of kernel functions and learn the mapping from data distributions to classifier parameters with the regression framework. In future work, we will evaluate the performance of our proposed approaches on more categories of emotions as well as on publicly available EEG datasets.

Acknowledgments

This work was supported in part by grants from the National Natural Science Foundation of China (Grant No. 61272248), the National Basic Research Program of China (Grant No. 2013CB329401), and the Major Basic Research Program of Shanghai Science and Technology Committee (15JC1400103).

References

[Abraham et al., 2014] Alexandre Abraham, Fabian Pedregosa, Michael Eickenberg, Philippe Gervais, Andreas Mueller, Jean Kossaifi, Alexandre Gramfort, Bertrand Thirion, and Gael Varoquaux. Machine learning for neuroimaging with scikit-learn. Frontiers in Neuroinformatics, 8, 2014.

[Blanchard et al., 2011] Gilles Blanchard, Gyemin Lee, and Clayton Scott. Generalizing from several related classification tasks to a new unlabeled sample. In Advances in Neural Information Processing Systems, pages 2178–2186, 2011.

[Chu et al., 2013] Wen-Sheng Chu, Fernando De la Torre, and Jeffrey F Cohn. Selective transfer machine for personalized facial action unit detection. In IEEE Conference on Computer Vision and Pattern Recognition, pages 3515–3522. IEEE, 2013.

[Chung et al., 2011] Mike Chung, Willy Cheung, Reinhold Scherer, and Rajesh PN Rao. A hierarchical architecture for adaptive brain-computer interfacing. In IJCAI'11, volume 22, page 1647, 2011.

[Collobert et al., 2006] Ronan Collobert, Fabian Sinz, Jason Weston, and Leon Bottou. Large scale transductive SVMs. The Journal of Machine Learning Research, 7:1687–1712, 2006.

[Duan et al., 2009] Lixin Duan, Ivor W Tsang, Dong Xu, and Tat-Seng Chua. Domain adaptation from multiple sources via auxiliary classifiers. In 26th Annual International Conference on Machine Learning, pages 289–296. ACM, 2009.

[Duan et al., 2013] Ruo-Nan Duan, Jia-Yi Zhu, and Bao-Liang Lu. Differential entropy feature for EEG-based emotion classification. In 6th International IEEE/EMBS Conference on Neural Engineering, pages 81–84. IEEE, 2013.

[Eaton et al., 2015] Joel Eaton, Duncan Williams, and Eduardo Miranda. The space between us: Evaluating a multi-user affective brain-computer music interface. Brain-Computer Interfaces, 2(2-3):103–116, 2015.

[Fatourechi et al., 2007] Mehrdad Fatourechi, Ali Bashashati, Rabab K Ward, and Gary E Birch. EMG and EOG artifacts in brain computer interface systems: A survey. Clinical Neurophysiology, 118(3):480–494, 2007.

[Gretton et al., 2006] Arthur Gretton, Karsten M Borgwardt, Malte Rasch, Bernhard Scholkopf, and Alex J Smola. A kernel method for the two-sample-problem. In Advances in Neural Information Processing Systems, pages 513–520, 2006.

[Jayaram et al., 2016] Vinay Jayaram, Morteza Alamgir, Yasemin Altun, Bernhard Scholkopf, and Moritz Grosse-Wentrup. Transfer learning in brain-computer interfaces. IEEE Computational Intelligence Magazine, 11(1):20–31, 2016.

[Jenke et al., 2014] Robert Jenke, Angelika Peer, and Martin Buss. Feature extraction and selection for emotion recognition from EEG. IEEE Transactions on Affective Computing, 5(3):327–339, 2014.

[Joachims, 1999] Thorsten Joachims. Making large scale SVM learning practical. Technical report, Universität Dortmund, 1999.

[Krauledat et al., 2008] Matthias Krauledat, Michael Tangermann, Benjamin Blankertz, and Klaus-Robert Müller. Towards zero training for brain-computer interfacing. PloS One, 3:e2967, 2008.

[Morioka et al., 2015] Hiroshi Morioka, Atsunori Kanemura, Jun-ichiro Hirayama, Manabu Shikauchi, Takeshi Ogawa, Shigeyuki Ikeda, Motoaki Kawanabe, and Shin Ishii. Learning a common dictionary for subject-transfer decoding with resting calibration. NeuroImage, 111:167–178, 2015.

[Muhl et al., 2014a] Christian Muhl, Brendan Allison, Anton Nijholt, and Guillaume Chanel. A survey of affective brain computer interfaces: principles, state-of-the-art, and challenges. Brain-Computer Interfaces, 1(2):66–84, 2014.

[Muhl et al., 2014b] Christian Muhl, Camille Jeunet, and Fabien Lotte. EEG-based workload estimation across affective contexts. Frontiers in Neuroscience, 8, 2014.

[Pan and Yang, 2010] Sinno Jialin Pan and Qiang Yang. A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering, 22(10):1345–1359, 2010.

[Pan et al., 2011] Sinno Jialin Pan, Ivor W Tsang, James T Kwok, and Qiang Yang. Domain adaptation via transfer component analysis. IEEE Transactions on Neural Networks, 22(2):199–210, 2011.

[Samek et al., 2013] Wojciech Samek, Frank C Meinecke, and Klaus-Robert Müller. Transferring subspaces between subjects in brain–computer interfacing. IEEE Transactions on Biomedical Engineering, 60(8):2289–2298, 2013.

[Sangineto et al., 2014] Enver Sangineto, Gloria Zen, Elisa Ricci, and Nicu Sebe. We are not all equal: Personalizing models for facial expression analysis with transductive parameter transfer. In ACM International Conference on Multimedia, pages 357–366. ACM, 2014.

[Scholkopf et al., 1998] Bernhard Scholkopf, Alexander Smola, and Klaus-Robert Müller. Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation, 10(5):1299–1319, 1998.

[Singh et al., 2007] Vishwajeet Singh, Krishna P Miyapuram, and Raju S Bapi. Detection of cognitive states from fMRI data using machine learning techniques. In IJCAI'07, pages 587–592, 2007.

[Sugiyama et al., 2007] Masashi Sugiyama, Matthias Krauledat, and Klaus-Robert Müller. Covariate shift adaptation by importance weighted cross validation. The Journal of Machine Learning Research, 8:985–1005, 2007.

[Tuia et al., 2011] Devis Tuia, Jochem Verrelst, Luis Alonso, Fernando Perez-Cruz, and Gustavo Camps-Valls. Multioutput support vector regression for remote sensing biophysical parameter estimation. IEEE Geoscience and Remote Sensing Letters, 8(4):804–808, 2011.

[Wu et al., 2013] Dongrui Wu, Brent J Lance, and Thomas D Parsons. Collaborative filtering for brain-computer interaction using transfer learning and active class selection. PloS One, 8(2), 2013.

[Zander and Jatzev, 2012] Thorsten O Zander and S Jatzev. Context-aware brain–computer interfaces: exploring the information space of user, technical system and environment. Journal of Neural Engineering, 9(1):016003, 2012.

[Zheng and Lu, 2015] Wei-Long Zheng and Bao-Liang Lu. Investigating critical frequency bands and channels for EEG-based emotion recognition with deep neural networks. IEEE Transactions on Autonomous Mental Development, 7(3):162–175, 2015.

[Zheng et al., 2015] Wei-Long Zheng, Yong-Qi Zhang, Jia-Yi Zhu, and Bao-Liang Lu. Transfer components between subjects for EEG-based emotion recognition. In International Conference on Affective Computing and Intelligent Interaction, pages 917–922. IEEE, 2015.
