


Activity recognition with android phone using mixture-of-experts co-trained with labeled and unlabeled data

Young-Seol Lee, Sung-Bae Cho

Department of Computer Science, Yonsei University, 50 Yonsei-ro, Seodaemun-gu, Seoul 120-749, South Korea

Article info

Article history: Received 27 February 2012; Received in revised form 20 March 2013; Accepted 30 May 2013.

Keywords: Mixture-of-experts; Co-training; Activity recognition; Android phone


Abstract

As the number of smartphone users has grown recently, many context-aware services have been studied and launched. Activity recognition becomes one of the important issues for user adaptive services on mobile phones. Even though many researchers have attempted to recognize a user's activities on a mobile device, it is still difficult to infer human activities from uncertain, incomplete and insufficient mobile sensor data. We present a method to recognize a person's activities from sensors in a mobile phone using a mixture-of-experts (ME) model. In order to train the ME model, we have applied the global–local co-training (GLCT) algorithm with both labeled and unlabeled data to improve the performance. GLCT is a variation of co-training that uses a global model and a local model together. To evaluate the usefulness of the proposed method, we have conducted experiments using real datasets collected from Google Android smartphones. This paper is a revised and extended version of a paper that was presented at HAIS 2011.

© 2013 Published by Elsevier B.V.

1. Introduction

Smart phones, such as the Google Android phone and Apple iPhone, have recently incorporated diverse and powerful sensors. The sensors include GPS receivers, microphones, cameras, light sensors, temperature sensors, digital compasses, magnetometers and accelerometers. Because of the small size and the superior computing power of the smartphones, they can become powerful sensing devices that can monitor a user's context in real time. Especially, Android-based smart phones are suitable platforms to collect sensing information because the Android operating system is free, open-source, easy to program, and expected to become a dominant smartphone platform in the marketplace.

Many context-aware services have been introduced to provide a user with convenience on a mobile phone. For instance, there are social networking applications such as Facebook [1] and MySpace [2]. Foursquare provides a location-based social networking service. It ranks the users by the frequency of visiting a specific location and encourages them to check in at the place. The Loopt service can recommend some locations for the users, and the Whoshere service helps users to share friends' locations. Davis et al. tried to use temporal, spatial, and social contexts to manage multimedia content with a camera phone [3]. Until now, most of the commercial services use only raw data like GPS coordinates.

In the mobile environment, inferring context information becomes an important issue for mobile cooperative services. By capturing more meaningful context in real time, we can develop applications that are more adaptive to the changing environment and user preferences. Activity recognition technology is one of the core technologies to provide user adaptive services. Many researchers have attempted to infer high-level semantic information from raw data collected on a mobile device. Bellotti et al. developed the Magitti system, which predicted a user's activities (eating, shopping, etc.) and provided suitable services on the Windows Mobile platform [4]. Kwapisz et al. studied a method to recognize a user's behavior using cell phone accelerometers [5]. Chen proposed an intelligent location-based mobile news service as a kind of location-based service [6]. Other researchers have studied context-aware systems considering efficient energy management. Paek et al. controlled the GPS receiver to reduce energy consumption on a smartphone [7]. Ravi et al. predicted the battery charge cycle and recommended when to charge [8]. Other researchers tried to stop unnecessary functionalities depending on the context. It is difficult to recognize a user's situation because of the uncertainty and incompleteness of context in the mobile environment.

Most of the research used various statistical analysis and machine learning techniques such as probabilistic models, fuzzy logic, and case-based reasoning. However, it is still difficult to apply them to mobile devices because of two problems. First, although machine learning techniques are appropriate to deal with vagueness and uncertainty in the mobile environment, it is not easy to classify a user's activities using only one classifier from incomplete mobile data in a complex environment. The other problem is that many supervised machine learning techniques need hand-labeled data samples for good classification performance. This requires users to annotate the sensor data. In many cases, unlabeled data are significantly easier to come by than labeled ones [9,10]. Labeled data are fairly expensive to obtain because they require human effort [11]. Therefore, we would like our learning algorithm to be able to take as much advantage of the unlabeled data as possible.

In this paper, we present an activity recognition system using mixture of experts (ME) [12] on the Android phone. ME is one of the divide-and-conquer models, which are effective solutions to overcome the limitations of the mobile environment. The ME is based on several local experts that solve smaller problems and a gating network that combines the solutions from the separate experts, each of which is a statistical model for a part of the overall problem space. In order to train the ME with labeled and unlabeled data, the global–local co-training (GLCT) method is used. GLCT is one of the variations of co-training [13], and enables us to improve the performance of the ME model by minimizing errors which occur due to the unlabeled data. To evaluate the usefulness of the proposed system, we conducted experiments using real datasets collected from Google Android smart phones.

2. Related works

2.1. Activity recognition using mobile sensors

There have been many attempts to recognize a user's actions and behaviors with mobile sensors. Among these mobile sensors, the accelerometer is the one most commonly used for activity recognition. Android phones contain tri-axial accelerometers that measure acceleration in all three spatial dimensions. The accelerometers are also capable of detecting the orientation of the device (helped by the fact that they can detect the direction of Earth's gravity), which can provide useful information for activity recognition. Accelerometers were initially included in these devices to support advanced game play and to enable automatic screen rotation, but they clearly have many other applications. In fact, there are many useful applications that can be built if accelerometers can be used to recognize a user's activity. Therefore, using accelerometer-embedded mobile phones to recognize people's physical activities becomes the primary choice among all the solutions. Győrbíró et al. presented a system to recognize a person's activities using the MotionBand device. They used a feed-forward neural network to classify activities from acceleration data [14]. Berchtold et al. proposed an architecture to recognize a user's activities for services. The system utilizes classifiers based on fuzzy inference systems which use real-time annotation for personalization [15]. Zhao et al. developed a smartphone based activity reporting system using a three-axial accelerometer. The system used a decision tree and the k-means clustering algorithm to recognize a user's activity [16]. In order to monitor elderly people's health, Zhao et al. presented an activity recognition method based on ELM (a variation of neural network) using the accelerometer in a mobile phone [17]. Kwapisz et al. developed an Android phone based activity recognition method using an accelerometer [5]. Table 1 summarizes the studies to recognize human activities using accelerometers.

Table 1. Activity recognition with accelerometers in a mobile device.

Author | Year | Classifier | Contribution
Frank et al. [18] | 2011 | Geometric template matching (GTM) | Demonstration for activity and gait recognition system
Sun et al. [19] | 2010 | Support vector machine (SVM) | Activity recognition with an accelerometer considering people's pocket position
Brezmes et al. [20] | 2009 | K-nearest neighbor | Activity recognition with no server processing data for a reasonable low cost
Bieber et al. [21] | 2009 | Heuristics (rule) | Identification of physical activity in everyday life on a standard mobile phone
Yang et al. [22] | 2009 | Decision tree, k-means clustering, HMM | Using decision tree to classify physical activities and HMM to consider activity sequences
Ravi et al. [23] | 2005 | Decision tables, C4.5 decision trees, k-nearest neighbor, SVM, naïve Bayes | Combining classifiers using Plurality Voting for activity recognition from a single accelerometer
Bao et al. [24] | 2004 | Decision tables, instance based learning, C4.5 decision trees, naïve Bayes | Recognizing a variety of household activities using acceleration for context-aware computing

Several researchers have investigated activity recognition using various sensors as well as accelerometers, as shown in Table 2. However, most of the research focuses on accurate activity recognition without considering modularization and unlabeled data for improved performance.

Table 2. Activity recognition with various sensors in a mobile device.

Author | Year | Sensors | Classifier | Contribution
Hong et al. [25] | 2010 | Accelerometer, RFID tags | Decision tree | Hierarchical approach for robust and accurate recognition
Wang et al. [26] | 2009 | Accelerometer, microphone, GPS, WiFi, Bluetooth | Threshold based heuristics | Sensor management for mobile devices to monitor user state
Choudhury et al. [27] | 2008 | Accelerometer, barometer, compass, humidity, temperature | Probabilistic model | Using Mobile Sensing Platform (MSP) for activity recognition
Consolvo et al. [28] | 2008 | Accelerometer, barometer, microphone, humidity, etc. | Boosted decision stump classifiers | System for on-body sensing, and activity inference, and three-week field trial

2.2. Mixture-of-experts using unlabeled data

ME models [12] consist of a set of local experts and a gating network which combines the decisions from the experts. The algorithm breaks down a complex problem into a set of simple sub-problems in a recursive manner. The solutions to the sub-problems are combined to make a solution to the complex problem. In the ME model, it is assumed that there are separate sub-problems in a more complex underlying problem. According to the divide-and-conquer principle, the ME model should work well for problems composed of smaller unconnected ideas.

The experts can be modeled to give a solution to these smaller sub-problems, while the combination of the solutions from the experts is given by the gating network. Each expert deals with different features from a different perspective, thereby resolving the small separable problems. Through these features of the model, it has been applied to traditional complicated recognition problems [29,30]. One of the considerations when implementing ME models is the way to separate the input data space and generate experts. In order to divide the input data space into several subgroups, clustering techniques can be applied as shown in [31]. Some clustering techniques such as k-means clustering and hierarchical clustering have been widely used.

In supervised learning, local experts in the ME model can be trained easily by using conventional machine learning techniques, for example, MLP [32], support vector machines (SVM) [33] or probabilistic models [34]. However, semi-supervised learning requires an extra learning method to deal with unlabeled data.

The simplest way to train the ME model with both labeled and unlabeled data is to apply previous semi-supervised learning methods such as individual-training or co-training directly to the model. However, it is already known that local experts are sensitive to the number of training instances and can be biased, or their variance can increase, when there are only a small number of instances [35]. Especially in semi-supervised learning, which uses only a small amount of labeled data, the initial ME model may not be accurate enough to evaluate unlabeled data. In order to overcome this limitation, a semi-supervised learning method specialized to the ME model is required.

The EM algorithm to train the ME model in a semi-supervised approach was proposed for this reason. Nevertheless, it has a limitation caused by the model assumption of the EM algorithm mentioned in the previous section, and another enhanced method is still required.

In this paper, we use the GLCT method to train the ME model. The method is specialized to train the ME model since it overcomes the imperfect model problem related to the lack of training instances by using the global model together.

3. Activity recognition using ME

The proposed activity recognition system consists of three steps: collecting sensor data, preprocessing the data, and recognizing a user's activity. First, after sensor data are collected from sensors on a smartphone, the data are transferred to preprocessing units for extracting features such as mean and standard deviation. The following Eqs. (1)–(3) denote the features, such as the difference between previous acceleration and current acceleration, average acceleration, and standard deviation of acceleration, that are extracted from the datasets.

$$ \mathrm{sum\_X} = \sum_{i=1}^{N} \sqrt{(x_i - x_{i-1})^2} \qquad (1) $$

$$ \mathrm{mean\_X} = \frac{\sum_{i=1}^{N} \sqrt{(x_{i+1} - x_i)^2}}{N} \qquad (2) $$

$$ \mathrm{std\_X} = \sqrt{\sum \left( \sqrt{(x_{i+1} - x_i)^2} - \mathrm{mean\_X} \right)^2} \qquad (3) $$

After the preprocessing step, the ME model is used to recognize a user's activities from the feature set. Fig. 1 shows the process of the entire system.

Fig. 1. Activity recognition system overview.
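As an illustration of the preprocessing step, the following Python sketch computes the three features of Eqs. (1)–(3) for a single accelerometer axis. The window of samples x and the use of NumPy are assumptions, and the formulas are transcribed as printed above (in particular, Eq. (3) is left without a division by N).

import numpy as np

def axis_features(x):
    """Sum, mean and standard deviation of successive acceleration
    differences for one axis, following Eqs. (1)-(3) as printed."""
    x = np.asarray(x, dtype=float)
    diffs = np.abs(np.diff(x))              # sqrt((x_i - x_{i-1})^2) = |x_i - x_{i-1}|
    sum_x = diffs.sum()                     # Eq. (1)
    mean_x = diffs.sum() / len(x)           # Eq. (2): divided by the window length N
    std_x = np.sqrt(((diffs - mean_x) ** 2).sum())   # Eq. (3) as written in the paper
    return sum_x, mean_x, std_x

The same function would be applied to each axis (and each sensor) of a window to build the feature vector used by the ME model.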

3.1. Mixture-of-experts construction

Prior to applying the ME model to activity recognition, local experts must be built with separate datasets. Each expert deals with only a part of the dataset. When dividing the entire data into several subgroups, each subgroup should guarantee similarities between the members in it. This is required to give specialties to local experts in distinguishing some similar patterns even though they see only a part of the entire data, as shown in Fig. 2.

The clustering technique is used to define boundaries to estimate similarities between data. The k-means clustering approach is a method of cluster analysis which aims to assign n data items into k sets in which each item belongs to the set with the nearest mean value. It is a simple method which is easy to implement and shows good performance on common datasets. The distance from the centroids to the data items can be calculated with various measures such as Euclidean distance and the Pearson correlation coefficient. K-means clustering consists of the following five steps; a small sketch of how this partitioning is used for the local experts is given after the list.

(1) Determine k, the number of clusters.
(2) Calculate k centroids.
(3) Calculate distances of data items to centroids.
(4) Assign data items into k sets in terms of the nearest centroid.
(5) Iterate (2)–(4) until there is no change of the k sets.
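As referenced above, a minimal sketch of this partitioning step is given below, using scikit-learn's KMeans as a stand-in implementation; the library choice, k = 5, the random seed, and the NumPy array inputs are assumptions, not part of the original description.

import numpy as np
from sklearn.cluster import KMeans

def partition_for_experts(features, labels, k=5, seed=0):
    """Cluster the labeled feature vectors and return one (X, y) subset
    per cluster; each subset is then used to train one local expert."""
    km = KMeans(n_clusters=k, random_state=seed, n_init=10).fit(features)
    subsets = []
    for c in range(k):
        idx = np.where(km.labels_ == c)[0]
        subsets.append((features[idx], labels[idx]))
    return km, subsets

The fitted clustering object would also be kept so that, at recognition time, an input can be related to the sub-region each expert was trained on.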

In order to implement the global and the local models, including the local experts and the gating network, MLP was used for all the models. For the global model, a conventional MLP was applied directly. The ME model was designed to include five local experts in total. Here, MLP is employed as a base classifier for each expert in the mixture of local experts. Fig. 3 depicts the structure of the ME model used. To train the ME model, the learning technique for the MLP-based ME model [32] is used. f_L(X) in Eqs. (4) and (5) denotes the output of the mixture-of-experts model, where π_i is the ith output of the gating network, f_i(X) is the output of the ith local expert, and X is an input vector with multivariate attributes.

Fig. 2. Training mixture-of-experts model.
Fig. 3. Mixture-of-experts model.
Fig. 4. Global–local co-training algorithm.

$$ f_L(X) = \sum_{i=1}^{k} \pi_i f_i(X), \quad X = \{x_1, \dots, x_n\} \qquad (4) $$

$$ \sum_{i=1}^{k} \pi_i = 1, \quad \pi_i \ge 0 \ (i = 1, \dots, k) \qquad (5) $$

Each neural network in the model used the sigmoid function of Eq. (6) as the activation function for the hidden layer and the output layer.

$$ a(x) = \frac{1}{1 + e^{-x}} \qquad (6) $$
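To make Eqs. (4)–(6) concrete, the following NumPy sketch computes the mixture output for a single input vector. The weight matrices, the single-hidden-layer shapes, and the explicit renormalization of the gate outputs (so that they satisfy Eq. (5)) are assumptions based on the MLP description above, not the authors' exact implementation.

import numpy as np

def sigmoid(z):                              # Eq. (6)
    return 1.0 / (1.0 + np.exp(-z))

def mlp_forward(x, w_hidden, w_out):
    """One-hidden-layer MLP with sigmoid units, as assumed for experts and gate."""
    h = sigmoid(w_hidden @ x)
    return sigmoid(w_out @ h)

def mixture_output(x, experts, gate):
    """Eq. (4): f_L(x) = sum_i pi_i * f_i(x); experts is a list of
    (w_hidden, w_out) pairs, gate is one such pair with k outputs."""
    pi = mlp_forward(x, *gate)
    pi = pi / pi.sum()                       # enforce Eq. (5): sum_i pi_i = 1
    outputs = [mlp_forward(x, wh, wo) for wh, wo in experts]
    return sum(p * o for p, o in zip(pi, outputs))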

3.2. GLCT algorithm

The GLCT algorithm is summarized in Fig. 4. Given a set L of labeled training instances, we can obtain the global model M_G and the ME model M_L, which consists of N experts μ_1, μ_2, …, μ_N and the gating network g. The algorithm iteratively extracts the instances which are going to be labeled based on two criteria: preventing degradations and expecting improvements.

Since the models can be imperfect and show poorer performance than the opponent under some conditions, the models should predict labels carefully, especially when there are not sufficient training data for the local experts to generalize the entire problem. In order to prevent degradations due to model imperfections, the algorithm compares the confidences from M_G and M_L. The models can provide labeled instances for their opponents only if each model is more confident than the other one. The instances satisfying the first criterion are added to the candidate sets U′_L or U′_G.

After obtaining the candidate sets, the sets are sorted by confidence degree in descending order. At the next step, at most N instances for each class with a confidence degree higher than a certain threshold e are chosen gradually from the top of the set to be labeled. By choosing instances with sufficient confidence, it is expected that the models will be improved at the next iteration.

The chosen instances are added to the training dataset of each model, L_G and L_L. The models are then trained with the newly extended labeled dataset. If L_G and L_L do not change, the training stops; otherwise, the algorithm iterates the training procedure until no additional data samples for L_L and L_G remain.
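The overall procedure can be summarized by the following Python sketch. It is only a paraphrase of the description above and of Fig. 4, under several assumptions: the model objects expose a fit() method over (instance, label) pairs, conf(model, x) returns a (confidence, predicted label) pair, and the per-class selection of at most N instances is simplified to a single cap n_max per iteration.

def glct(L, U, global_model, me_model, conf, n_max, threshold):
    """Global-local co-training sketch: the two views exchange confidently
    pseudo-labeled instances until neither adds new data (cf. Eq. (12) below)."""
    L_G, L_L = list(L), list(L)
    unlabeled = list(U)
    while True:
        global_model.fit(L_G)
        me_model.fit(L_L)
        cand_G, cand_L = [], []                       # (confidence, index, label)
        for i, x in enumerate(unlabeled):
            cg, yg = conf(global_model, x)
            cl, yl = conf(me_model, x)
            if cg > cl:                               # global view labels for the ME model
                cand_L.append((cg, i, yg))
            elif cl > cg:                             # ME view labels for the global model
                cand_G.append((cl, i, yl))
        def pick(cand):
            confident = [c for c in cand if c[0] >= threshold]
            return sorted(confident, reverse=True)[:n_max]    # highest confidence first
        new_G, new_L = pick(cand_G), pick(cand_L)
        if not new_G and not new_L:                   # no view adds data: stop
            break
        L_G += [(unlabeled[i], y) for _, i, y in new_G]
        L_L += [(unlabeled[i], y) for _, i, y in new_L]
        used = {i for _, i, _y in new_G + new_L}
        unlabeled = [x for i, x in enumerate(unlabeled) if i not in used]
    return global_model, me_model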

In the co-training algorithm, the local and the global models are based on feed-forward neural networks trained with back-propagation learning. The neural networks are repeatedly trained by gradient descent with the error function on the training set given by the other model. Data newly labeled by the global model are used to train the local model in the next step, and vice versa. In Eqs. (7) and (8), E_L and E_G denote the error functions on the additional training set for the local model and the global model, respectively.

$$ E_L = \left\| O_G - \sum_i \pi_i f_{Li}(x) \right\| \qquad (7) $$

$$ E_G = \left\| O_L - f_G(x) \right\| \qquad (8) $$

In Eq. (7), the error function for the local model, O_G is the output vector of the global model, π_i is the gating value of the ith local expert, and f_{Li}(x) is the output vector of the ith local expert. In Eq. (8), O_L is the output vector of the local model and f_G(x) is the output vector of the global model. In detail, O_G in Eq. (7) can be replaced with the output function of the global model at the previous step, f'_G(x), and the output function of the local model at the previous step, ∑ π'_i f'_{Li}(x), can be substituted for O_L in Eq. (8), as shown in Eqs. (9) and (10).

$$ E_L = \left\| f'_G(x) - \sum_i \pi_i f_{Li}(x) \right\| \qquad (9) $$

$$ E_G = \left\| \sum_i \pi'_i f'_{Li}(x) - f_G(x) \right\| \qquad (10) $$

Ideally, the co-training algorithm is stopped when there is no difference between the current error and the previous error, as illustrated in Eq. (11).

$$ \text{co-training is stopped if } \frac{\partial E_L}{\partial x} = 0 \text{ and } \frac{\partial E_G}{\partial x} = 0 \qquad (11) $$

However, it is practically difficult to satisfy the condition, because it is satisfied only when the two different views have the same error value or show no change of error values from the additional training set. In this paper, we determine when the co-training is stopped by the number of additional training sets N for the global model and the local model. The numbers of additional training sets N(L_L) and N(L_G) are decided by the confidence degree, introduced in the next section.

$$ \text{co-training is stopped when } N(L_L) = 0 \text{ and } N(L_G) = 0 \qquad (12) $$
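As a concrete reading of Eqs. (9) and (10), the two error terms are simply norms between the current output of one view and the previous-step output of the other; a short NumPy sketch follows. The class-score vectors (NumPy arrays) and the lists of gate values and expert outputs are assumptions used only for illustration.

import numpy as np

def glct_errors(f_g_prev, f_g_curr, pi_prev, experts_prev, pi_curr, experts_curr):
    """E_L (Eq. (9)) and E_G (Eq. (10)) as norms between one view's current
    output and the other view's output at the previous step."""
    me_curr = sum(p * f for p, f in zip(pi_curr, experts_curr))   # current ME output
    me_prev = sum(p * f for p, f in zip(pi_prev, experts_prev))   # previous ME output
    e_l = np.linalg.norm(f_g_prev - me_curr)    # error driving the local (ME) model
    e_g = np.linalg.norm(me_prev - f_g_curr)    # error driving the global model
    return e_l, e_g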

3.3. Measuring confidence degrees

In order to obtain the confidence degrees of examples, we design a measure for the confidences. Let conf(w, x) be the original confidence degree for input instance x defined by the corresponding model w. conf(w, x) can change with the type of model used to build the global or the local model. In this work, we implemented both the global and the local models using MLP. The output of the MLP can be treated as a confidence degree, which can be simply defined as follows:

$$ \mathrm{conf}(w, x) = \{\, w(x) \mid w(x) \text{ is the output of } w \text{ for } x \,\} \qquad (13) $$

Since the original problem space is divided into several sub-spaces to generate local experts, the number of training data for each expert is rather smaller than for the global model. This can make the ME model have a large bias or variance, because there are chances that some of the training instances for a certain class are missing in some sub-regions even though the original space contains all classes. Since some experts have not learned about certain classes, they can generate unexpected values as outputs for the missing classes, according to the type of models used as local experts. To prevent this problem, the existence of training instances of a certain class should be checked prior to computing confidence degrees. Eqs. (14) and (15) show the modified confidence degree for class C of the global model and the ME model, respectively.

$$ \mathrm{conf}(M_G, x, c) = \begin{cases} M_{G,C}(x), & \text{if } N_{M_G,C} > 0 \\ 0, & \text{otherwise} \end{cases} \qquad (14) $$

$$ \mathrm{conf}(M_L, x, c) = \sum_{i=1}^{N} g_i \, \mathrm{conf}(\mu_i, x, c), \qquad \mathrm{conf}(\mu_i, x, c) = \begin{cases} \mu_{i,C}(x), & \text{if } N_{\mu_i,C} > 0 \\ 0, & \text{otherwise} \end{cases} \qquad (15) $$

Here, μ_i is the ith local expert, and g is the gating network. M_{G,C}(x) and μ_{i,C}(x) represent the outputs of M_G and μ_i for class C, respectively. g_i is the gating output for the ith local expert, and N_{M_G,C} and N_{μ_i,C} are the numbers of training instances for class C used to train M_G and μ_i, respectively.
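To make Eqs. (14) and (15) concrete, a short sketch is given below; the per-class training counts (N_{M_G,C} and N_{μ_i,C}) are assumed to be recorded while the models are trained, and the class-score vectors are plain sequences indexed by class.

def conf_global(mg_output, class_counts_global, c):
    """Eq. (14): the global model's output for class c, forced to 0 when the
    model never saw a training instance of that class."""
    return mg_output[c] if class_counts_global[c] > 0 else 0.0

def conf_mixture(gate, expert_outputs, class_counts_experts, c):
    """Eq. (15): gate-weighted sum of per-expert confidences, where an expert
    that never saw class c contributes 0."""
    total = 0.0
    for g_i, out_i, counts_i in zip(gate, expert_outputs, class_counts_experts):
        total += g_i * (out_i[c] if counts_i[c] > 0 else 0.0)
    return total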

4. Experiments

This section presents the experiments conducted to evaluate the usefulness of the proposed method on the pattern classification problems. The number of hidden nodes for each neural network is fixed to ten and the learning rate is 0.3. There are two objectives of the experiments. Firstly, it is to prove the usefulness of co-training based on unlabeled data. Secondly, it is to show that the proposed method yields better performance than semi-supervised self-training as in [36].

As shown in Fig. 2, we divide the dataset into groups to learn the local expert models in our mixture of experts. In the experiments, we use the k-means clustering approach to learn an ME model with five experts. In the initial stage, the labeled data are used to learn the model. After the first learning, the unlabeled data are labeled using the previously learned models.

4.1. Android phone datasets

For the experiments, the evaluation of the proposed method is conducted on two Android phone datasets, as summarized in Table 3. The two types of datasets consist of acceleration, orientation, and magnetic fields to classify a user's transportation mode, and were collected from Android OS smart phones using the interface shown in Fig. 5(b). The datasets can be downloaded and tested from the authors' homepage (http://sclab.yonsei.ac.kr/movingstatus.csv and http://sclab.yonsei.ac.kr/movingstatus_2.csv).

Table 3. A summary of the datasets used.

Dataset | #Features | #Classes | #Samples
Acceleration only | 6 | 3 | 1107
Acceleration, orientation, magnetic field, etc. | 22 | 5 | 661

Fig. 5. (a) Class and datasets, and (b) Android phone based data collection interface.

For all datasets, data instances were partially labeled depending on the ratio of instances labeled. Labeled data instances were chosen randomly, ignoring class distributions. The class labels are still, walk, and run in dataset 1, and still, walk, vehicle, run, and subway in dataset 2, as shown in Fig. 5(a).
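For instance, the datasets could be loaded and split into labeled and unlabeled parts along the lines of the following sketch. Only the URLs are taken from the text above; the column layout of the CSV files (features first, activity label last) and the use of pandas are assumptions.

import numpy as np
import pandas as pd

URL_1 = "http://sclab.yonsei.ac.kr/movingstatus.csv"
URL_2 = "http://sclab.yonsei.ac.kr/movingstatus_2.csv"

def load_and_split(url, label_ratio=0.05, seed=0):
    """Load a dataset and randomly mark a fraction of instances as labeled,
    ignoring the class distribution as described above."""
    df = pd.read_csv(url)
    X = df.iloc[:, :-1].to_numpy()          # assumed: feature columns first
    y = df.iloc[:, -1].to_numpy()           # assumed: activity label last
    rng = np.random.default_rng(seed)
    labeled = rng.random(len(df)) < label_ratio
    return X[labeled], y[labeled], X[~labeled]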

4.2. Experimental results with android phone datasets

Firstly, we conduct experiments to show the improvement of performance obtained by using unlabeled data. For each experiment, the performances of models trained under three different conditions (with all labeled data, with only partially labeled data, and with both labeled and unlabeled data) were evaluated by 5-folds and 10-folds cross validation. All experiments were performed ten times to obtain the average accuracy.

First of all, we conducted the learning process for the two datasets to show the feasibility of the proposed method. Especially, in order to show that GLCT performs well regardless of the ratio of initial labeled data, we changed the ratio of labeled instances and observed the performance according to the amount of initial labeled data. Tables 4 and 5 and Fig. 6 show the results for dataset 1, and Tables 6 and 7 and Fig. 7 show the results for dataset 2. As the results show, the model trained with both labeled and unlabeled data achieved better performance than the model with only labeled data. There is no big difference between 5-folds and 10-folds cross validation. The standard NN and ME models were trained well for all datasets by the proposed GLCT method. This result implies that GLCT can be used as the training method for the ME model in a semi-supervised approach.

Table 4. Accuracy comparison for each algorithm on dataset 1 (5-folds).

Algorithm | No. of labeled data | No. of unlabeled data | Accuracy (%)
Mixture of local experts | 1107 | 0 | 91.70 (±7.61)
Mixture of local experts | 55 | 1052 | 75.74 (±14.56)
Co-training based ME | 55 | 1052 | 86.34 (±11.11)
Mixture of local experts | 110 | 997 | 80.87 (±11.05)
Co-training based ME | 110 | 997 | 90.17 (±5.42)
Mixture of local experts | 166 | 941 | 83.04 (±9.12)
Co-training based ME | 166 | 941 | 90.81 (±2.96)

Table 5. Accuracy comparison for each algorithm on dataset 1 (10-folds).

Algorithm | No. of labeled data | No. of unlabeled data | Accuracy (%)
Mixture of local experts | 1107 | 0 | 92.56 (±1.05)
Mixture of local experts | 55 | 1052 | 76.28 (±7.39)
Co-training based ME | 55 | 1052 | 87.27 (±9.92)
Mixture of local experts | 110 | 997 | 81.07 (±5.59)
Co-training based ME | 110 | 997 | 90.25 (±5.82)
Mixture of local experts | 166 | 941 | 83.48 (±5.10)
Co-training based ME | 166 | 941 | 91.78 (±4.72)

Fig. 6. Performances according to the ratio of labeled instances for dataset 1.

The performance of the model was improved in every case, from 7% to 10% for dataset 1 and from 7% to 8% for dataset 2. With only 5% initial labeled data, the model showed the most significant performance improvement for dataset 1. However, the results for dataset 2 did not show a big difference in performance improvement.

Table 6. Accuracy comparison for each algorithm on dataset 2 (5-folds).

Algorithm | No. of labeled data | No. of unlabeled data | Accuracy (%)
Mixture of local experts | 661 | 0 | 71.56 (±5.85)
Mixture of local experts | 33 | 628 | 60.94 (±8.28)
Co-training based ME | 33 | 628 | 68.09 (±3.69)
Mixture of local experts | 66 | 595 | 54.87 (±3.90)
Co-training based ME | 66 | 595 | 61.85 (±2.94)
Mixture of local experts | 99 | 562 | 60.75 (±6.61)
Co-training based ME | 99 | 562 | 68.10 (±2.63)

Table 7. Accuracy comparison for each algorithm on dataset 2 (10-folds).

Algorithm | No. of labeled data | No. of unlabeled data | Accuracy (%)
Mixture of local experts | 661 | 0 | 73.30 (±1.78)
Mixture of local experts | 33 | 628 | 61.03 (±3.95)
Co-training based ME | 33 | 628 | 68.10 (±5.57)
Mixture of local experts | 66 | 595 | 55.12 (±1.62)
Co-training based ME | 66 | 595 | 63.61 (±7.59)
Mixture of local experts | 99 | 562 | 61.03 (±2.82)
Co-training based ME | 99 | 562 | 68.98 (±6.10)

Fig. 7. The model performance comparison result for dataset 2.

Table 8 shows the results of paired t-tests examining the significance of the difference between the ME model trained with only partially labeled data and that trained with labeled and unlabeled data. The t-value is greater than 4.78 for all tests with 9 degrees of freedom, and the probability is less than 0.001, indicating that the two models are significantly different in their accuracy with 99.9% confidence.

Table 8. Paired t-test between ME_L and ME_UL.

 | 5% Labeled | 10% Labeled | 15% Labeled
Dataset 1 | 5.130445806 | 7.532020147 | 5.623460193
Dataset 2 | 5.861555853 | 13.29366642 | 7.361158544

On the other hand, Table 9 summarizes the results of F-tests evaluating the statistical difference between the ME model trained with only partially labeled data and that trained with labeled and unlabeled data. The values in Table 9 are greater than the threshold for all tests, which implies that the two models are significantly different with 99.9% confidence.

Table 9. F-test between ME_L and ME_UL.

 | 5% Labeled | 10% Labeled | 15% Labeled
Dataset 1 | 21.45512662 | 42.58100057 | 37.26350818
Dataset 2 | 30.75440238 | 30.05156706 | 32.78501936

Finally, we compared GLCT with an alternative way to train the ME model to show the superiority of GLCT. In this experiment, the individual-training algorithm [36,37] was used as the alternative technique to train the ME. As mentioned in Section 2, it is one of the conventional semi-supervised learning techniques to train the ME. Fig. 8 shows the individual-training algorithm.

Input:
  Labeled training data L
  Unlabeled training data U
  Classifier A
Output:
  The predicted labels for unlabeled examples
  One trained classifier: A
Repeat until U is empty:
  Build classifier A by L
  Use A to label examples in U. For each class C, pick the unlabeled examples with higher confidence degree, and add them into L

Fig. 8. Individual-training algorithm.
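The individual-training procedure of Fig. 8 corresponds to ordinary self-training; a minimal Python sketch is given below, assuming a classifier object with a fit() method over (instance, label) pairs and a conf(classifier, x) helper that returns a (confidence, predicted label) pair. Picking a single example per class per round is a simplification of the description in Fig. 8.

def individual_training(L, U, classifier, conf, classes, threshold):
    """Self-training as in Fig. 8: train on L, then for each class move the most
    confidently predicted unlabeled example into L, until U is empty."""
    L, U = list(L), list(U)
    while U:
        classifier.fit(L)
        scored = [(*conf(classifier, x), i) for i, x in enumerate(U)]   # (confidence, label, index)
        moved = []
        for c in classes:
            best = max((s for s in scored if s[1] == c and s[0] >= threshold),
                       default=None)
            if best is not None:
                moved.append(best)
        if not moved:                        # nothing confident enough for any class
            break
        for _, y, i in sorted(moved, key=lambda s: s[2], reverse=True):
            L.append((U.pop(i), y))          # pop larger indices first so indices stay valid
    return classifier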

For this experiment, datasets 1 and 2 with 5%, 10%, and 15% labeled instances were used, as shown in Table 10. GLCT produces significantly better performance than the individual-training algorithm, as shown in Figs. 9 and 10. Figs. 11 and 12 show the change of accuracies over the iterations of the co-training and individual-training methods, respectively. The initial accuracies of the two methods were similar, but as the iterations increased, GLCT yielded greater improvement in performance than the individual-training. In many cases, the performance of individual-training tends to get worse as the iterations go by.

Table 10. Performance comparison between GLCT and individual-training.

 | Label ratio (%) | GLCT | Individual-training
Dataset 1 | 5 | 86.34 (±11.11) | 52.16 (±5.91)
Dataset 2 | 5 | 68.09 (±3.69) | 45.33 (±6.27)
Dataset 1 | 10 | 90.17 (±5.42) | 61.12 (±3.75)
Dataset 2 | 10 | 61.85 (±2.94) | 49.81 (±3.90)
Dataset 1 | 15 | 90.81 (±2.96) | 66.08 (±5.38)
Dataset 2 | 15 | 68.10 (±2.63) | 57.04 (±3.85)

Fig. 9. Comparison between co-training and individual-training for dataset 1.
Fig. 10. Comparison between co-training and individual-training for dataset 2.
Fig. 11. Change of accuracies by iterating two methods for dataset 1.
Fig. 12. Change of accuracies by iterating two methods for dataset 2.

5. Concluding remarks

In this paper, we have proposed an activity recognition system using the mixture-of-experts (ME) model with both labeled and unlabeled data. ME is a variant of the divide-and-conquer paradigm, and it is suitable for the complex problem of recognizing a user's activity from uncertain and incomplete mobile sensor data. In order to overcome the limitation of local experts trained with only a small amount of training data, we employed the GLCT method to train and improve the ME model. GLCT is a method to build a model in which unlabeled data can be used to augment labeled data, based on having two views (the global model and the mixture of experts). The two views are redundant but not completely correlated. It is assumed that unlabeled instances can potentially be used to increase the accuracy of the model. The contribution lies in applying the ME model and the GLCT method to activity recognition on a mobile phone.

To show the usefulness of the proposed system, we conducted experiments using datasets collected from Google Android smart phones. The results of the experiments showed that GLCT could train the ME model well by predicting unlabeled instances accurately, and that it outperformed the individual-training algorithm for the ME model. We can apply co-training to the large volume of unlabeled acceleration data to improve the accuracy of activity recognition. It can also be applied to activity recognition using multiple sensors such as orientation, magnetic field, and so on. The important point is that a large volume of unlabeled instances can easily be collected in the real world, whereas labeled instances are very expensive. GLCT can be a method to use such unlabeled data for context recognition.

Even though the performance of GLCT was promising in the experiments, the method should be justified with theoretical analysis. In the future, we plan to prove theoretically that the ME model is learnable by GLCT. We also intend to apply this method to other domains which have data with missing features, and to analyze its effectiveness in order to maximize performance.

Acknowledgment

This research was supported by the Industrial Strategic Technology Development Program (10044828) funded by the Ministry of Trade, Industry and Energy (MI, Korea), and the ICT R&D Program 2013 funded by the MSIP (Ministry of Science, ICT & Future Planning, Korea).

References

[1] Facebook ⟨http://www.facebook.com⟩.
[2] MySpace ⟨http://www.myspace.com⟩.
[3] M. Davis, J. Canny, V.N. House, N. Good, S. King, R. Nair, C. Burgener, R. Strickland, G. Campbell, S. Fisher, N. Reid, MMM2: Mobile Media Metadata for Media Sharing, ACM MM 2005, 2005, pp. 267–268.
[4] V. Bellotti, B. Begole, H.E. Chi, N. Ducheneaut, J. Fang, E. Isaacs, T. King, W.M. Newman, K. Partridge, B. Price, P. Rasmussen, M. Roberts, J.D. Schiano, A. Walendowski, Activity-based serendipitous recommendations with the Magitti mobile leisure guide, in: CHI 2008 Proceedings, 2008, pp. 1157–1166.
[5] J.R. Kwapisz, G.M. Weiss, S.A. Moore, Activity recognition using cell phone accelerometers, in: 4th ACM SIGKDD International Workshop on Knowledge Discovery from Sensor Data, 2010.
[6] M. Böhmer, G. Bauer, A. Krüger, Exploring the design space of context-aware recommender systems that suggest mobile applications, in: 2nd Workshop on Context-Aware Recommender Systems, 2010.
[7] J. Paek, J. Kim, R. Govindan, Energy-efficient rate-adaptive GPS-based positioning for smartphones, in: Proceedings of the 8th International Conference on Mobile Systems, Applications, and Services, 2010, pp. 299–314.
[8] N. Ravi, J. Scott, L. Han, L. Iftode, Context-aware battery management for mobile phones, in: Proceedings of the Sixth Annual IEEE International Conference on Pervasive Computing and Communications, 2008, pp. 224–233.
[9] R.O. Duda, P.E. Hart, Pattern Classification and Scene Analysis, Wiley, NJ, USA, 1973.
[10] D. Yarowsky, Unsupervised word sense disambiguation rivaling supervised methods, in: Proceedings of the 33rd Annual Meeting of the Association for Computational Linguistics, 1995, pp. 189–196.
[11] A. Blum, T. Mitchell, Combining labeled and unlabeled data with co-training, in: Proceedings of the Eleventh Annual Conference on Computational Learning Theory, 1998, pp. 92–100.
[12] R. Jacobs, M. Jordan, S. Nowlan, G. Hinton, Adaptive mixtures of local experts, Neural Computation 3 (4) (1991) 79–87.
[13] W. Wang, Z.-H. Zhou, Analyzing co-training style algorithms, Lecture Notes in Computer Science 4701 (2007) 454–465.
[14] N. Győrbíró, Á. Fábián, G. Hományi, An activity recognition system for mobile phones, Mobile Networks and Applications 14 (1) (2009) 82–91.
[15] M. Berchtold, M. Budde, D. Gordon, H. Schmidtke, M. Beigl, ActiServ: activity recognition service for mobile phones, in: Proceedings of the 2010 International Symposium on Wearable Computers, 2010, pp. 1–8.
[16] Z. Zhao, Y. Chen, J. Liu, Z. Shen, M. Liu, Cross-people Mobile-phone Based Activity Recognition, IJCAI 2011, 2011, pp. 2545–2550.
[17] Z. Zhao, Y. Chen, J. Liu, M. Liu, Cross-mobile ELM based activity recognition, International Journal of Engineering and Industries 1 (1) (2010) 30–40.
[18] J. Frank, S. Mannor, D. Precup, Activity recognition with mobile phones, in: Proceedings of the European Conference on Machine Learning 2011 Demo Session, 2011.
[19] L. Sun, D. Zhang, B. Li, B. Guo, S. Li, Activity recognition on an accelerometer embedded mobile phone with varying positions and orientations, in: Proceedings of the 7th International Conference on Ubiquitous Intelligence and Computing, 2010, pp. 548–562.
[20] T. Brezmes, J. Gorricho, J. Cotrina, Activity recognition from accelerometer data on a mobile phone, in: Proceedings of the 10th International Work-Conference on Artificial Neural Networks: Part II: Distributed Computing, Artificial Intelligence, Bioinformatics, Soft Computing, and Ambient Assisted Living, 2009, pp. 796–799.
[21] G. Bieber, J. Voskamp, B. Urban, Activity recognition for everyday life on mobile phones, in: Proceedings of the 5th International Conference on Universal Access in Human-Computer Interaction, Part II: Intelligent and Ubiquitous Interaction Environments, 2009, pp. 289–296.
[22] J. Yang, Toward physical activity diary: motion recognition using simple acceleration features with mobile phones, in: Proceedings of the 1st International Workshop on Interactive Multimedia for Consumer Electronics, 2009, pp. 1–9.
[23] N. Ravi, N. Dandekar, P. Mysore, M.L. Littman, Activity recognition from accelerometer data, in: Proceedings of the 17th Conference on Innovative Applications of Artificial Intelligence, 2005, pp. 1541–1546.
[24] L. Bao, S.S. Intille, Activity recognition from user-annotated acceleration data, Pervasive 2004 (2004) 1–17.
[25] Y.-J. Hong, I.-J. Kim, S.C. Ahn, H.-G. Kim, Mobile health monitoring system based on activity recognition using accelerometer, Simulation Modelling Practice and Theory 18 (4) (2010) 446–455.
[26] Y. Wang, J. Lin, M. Annavaram, Q.A. Jacobson, J. Hong, B. Krishnamachari, N. Sadeh, A framework of energy efficient mobile sensing for automatic user state recognition, in: Proceedings of the 7th International Conference on Mobile Systems, Applications, and Services, 2009, pp. 179–192.
[27] T. Choudhury, S. Consolvo, B. Harrison, J. Hightower, A. LaMarca, L. LeGrand, A. Rahimi, A. Rea, G. Borriello, B. Hemingway, P. Klasnja, K. Koscher, J.A. Landay, J. Lester, D. Wyatt, D. Haehnel, The mobile sensing platform: an embedded activity recognition system, IEEE Pervasive Computing 7 (2) (2008) 32–41.
[28] S. Consolvo, D.W. McDonald, T. Toscos, M.Y. Chen, J. Froehlich, B. Harrison, P. Klasnja, A. LaMarca, L. LeGrand, R. Libby, I. Smith, J.A. Landay, Activity sensing in the wild: a field trial of ubifit garden, in: Proceedings of the Twenty-sixth Annual SIGCHI Conference on Human Factors in Computing Systems, 2008, pp. 1797–1806.
[29] E.D. Ubeyli, Wavelet/mixture of experts network structure for EEG signals classification, Expert Systems with Applications 34 (3) (2008) 1954–1962.
[30] R. Ebrahimpour, E. Kabir, H. Esteky, M.R. Yousefi, View-independent face recognition with mixture of experts, Neurocomputing 71 (4–6) (2008) 1103–1107.

[31] L.I. Kuncheva, Clustering-and-selection model for classifier combination, in: Proceedings of the Fourth International Conference on Knowledge-Based Intelligent Engineering Systems and Allied Technologies, vol. 1, 2000, pp. 185–188.
[32] R. Ebrahimpour, E. Kabir, M.R. Yousefi, Face detection using mixture of MLP experts, Neural Processing Letters 26 (1) (2007) 69–82.
[33] C.A.M. Lima, A.L.V. Coelho, F.J.V. Zuben, Hybridizing mixtures of experts with support vector machines: investigation into nonlinear dynamic systems identification, Information Sciences 177 (10) (2007) 2049–2074.
[34] C.M. Bishop, M. Svensén, Bayesian hierarchical mixtures of experts, in: Proceedings of the Nineteenth Conference on Uncertainty in Artificial Intelligence, 2003, pp. 57–64.
[35] A.H.R. Ko, R. Sabourin, A.S. Britto Jr., From dynamic classifier selection to dynamic ensemble selection, Pattern Recognition 41 (5) (2008) 1718–1731.
[36] C. Rosenberg, M. Hebert, H. Schneiderman, Semi-supervised self-training of object detection models, in: IEEE 7th Workshops on Application of Computer Vision, vol. 1, 2005, pp. 29–36.
[37] K. Nigam, R. Ghani, Analyzing the effectiveness and applicability of co-training, in: Proceedings of the 9th International Conference on Information and Knowledge Management, 2000, pp. 86–93.

Young-Seol Lee is a Ph.D. candidate in computer science at Yonsei University. His research interests include Bayesian networks and evolutionary algorithms for context-aware computing and intelligent agents. He received his M.S. degree in computer science from Yonsei University.


Sung-Bae Cho received the B.S. degree in computer science from Yonsei University, Seoul, Korea, and the M.S. and Ph.D. degrees in computer science from the Korea Advanced Institute of Science and Technology (KAIST), Taejeon, Korea. He was an Invited Researcher of Human Information Processing Research Laboratories at the Advanced Telecommunications Research (ATR) Institute, Kyoto, Japan from 1993 to 1995, and a Visiting Scholar at the University of New South Wales, Canberra, Australia in 1998. He was also a Visiting Professor at the University of British Columbia, Vancouver, Canada from 2005 to 2006. Since 1995, he has been a Professor in the Department of Computer Science, Yonsei University. His research interests include neural networks, pattern recognition, intelligent man–machine interfaces, evolutionary computation, and artificial life. Dr. Cho was awarded outstanding paper prizes from the IEEE Korea Section in 1989 and 1992, and another one from the Korea Information Science Society in 1990. He was also the recipient of the Richard E. Merwin prize from the IEEE Computer Society in 1993. He was listed in Who's Who in Pattern Recognition from the International Association for Pattern Recognition in 1994, and received the best paper awards at the International Conference on Soft Computing in 1996 and 1998. He also received the best paper award at the World Automation Congress in 1998, and was listed in Marquis Who's Who in Science and Engineering in 2000 and in Marquis Who's Who in the World in 2001. He is a Senior Member of IEEE and a Member of the Korea Information Science Society, INNS, the IEEE Computational Intelligence Society, and the IEEE Systems, Man, and Cybernetics Society.
