8
Inferring and Predicting Context of Mobile Users Erik Meeuwissen, Paul Reinold, and Cynthia Liem User context characterizes the user’s environment and situation. Knowledge of user context helps to improve how people communicate and enables new context-aware applications. This letter concerns inferring and predicting mobile users’ context from time-stamped sequences of location identifiers. Results are validated using data from operational Global System for Mobile Communications* (GSM*) networks. © 2007 Alcatel-Lucent. Bell Labs Technical Journal 12(2), 79–86 (2007) © 2007 Alcatel-Lucent. Published by Wiley Periodicals, Inc. Published online in Wiley InterScience (www.interscience.wiley.com). • 10.1002/bltj.20237 Introduction Within a range of research themes, Bell Labs Europe–Netherlands is researching distributed serv- ice infrastructures to support context awareness in next-generation mobile and converged networks. We adopt the widely accepted definitions of context and context awareness from [6]. In short, user context characterizes the user’s environment and situation. Knowledge of user context helps to improve the way people communicate and enables new context-aware applications. As an example, the availability and pres- entation of online status information via instant mes- saging help to improve communication. A context model [7] gives an approximate description of the user context for a class of applica- tions. Raw information for this model is obtained via sensors, such as a mobile phone or Global Positioning System (GPS) device, or from networks serving the user. Higher level context information may be inferred from raw information. Place, for example, railway station, may be inferred from geographical position (latitude, longitude) in combination with geographic information system (GIS) data and is typi- cally the basis for location-based services (LBS). In addi- tion, information derived from user patterns may lead to a more complete description of the user’s context. For example, inferring that the user is traveling, along with predicting that the user will arrive at the rail- way station in 10 minutes and wait there for a certain time, will give a restaurant on the railway platform the opportunity to offer a coffee for a reduced price. In this letter, we focus on inferring and predicting context of mobile users from time-stamped sequences of location identifiers, for example, inferring the user’s traveling state or predicting the user’s destination and arrival time. Context derived from location information over time enables service offerings beyond LBS. In the remainder of this letter, the problem statement is given, followed by the solutions architecture, including the research approach and contribution. Next, results based on Global System for Mobile Communications* (GSM*) network data are presented and a conclusion is reached. Context Inference and Prediction We aim to derive as much context information as possible from time-stamped sequences of location iden- tifiers. Location identifiers represent geographical areas and can be obtained in various ways, e.g., from: 1. Access point identifiers of wireless technologies, such as medium access control (MAC) addresses of wireless fidelity (Wi-Fi*) access points, BLTJ122_1202_06_20237.qxd 7/23/07 2:00 PM Page 79

Inferring and predicting context of mobile users

Embed Size (px)

Citation preview

◆ Inferring and Predicting Context of Mobile UsersErik Meeuwissen, Paul Reinold, and Cynthia Liem

User context characterizes the user’s environment and situation. Knowledgeof user context helps to improve how people communicate and enables newcontext-aware applications. This letter concerns inferring and predictingmobile users’ context from time-stamped sequences of location identifiers.Results are validated using data from operational Global System for MobileCommunications* (GSM*) networks. © 2007 Alcatel-Lucent.

Bell Labs Technical Journal 12(2), 79–86 (2007) © 2007 Alcatel-Lucent. Published by Wiley Periodicals, Inc. Publishedonline in Wiley InterScience (www.interscience.wiley.com). • 10.1002/bltj.20237

IntroductionWithin a range of research themes, Bell Labs

Europe–Netherlands is researching distributed serv-

ice infrastructures to support context awareness in

next-generation mobile and converged networks. We

adopt the widely accepted definitions of context and

context awareness from [6]. In short, user context

characterizes the user’s environment and situation.

Knowledge of user context helps to improve the way

people communicate and enables new context-aware

applications. As an example, the availability and pres-

entation of online status information via instant mes-

saging help to improve communication.

A context model [7] gives an approximate

description of the user context for a class of applica-

tions. Raw information for this model is obtained via

sensors, such as a mobile phone or Global Positioning

System (GPS) device, or from networks serving the

user. Higher level context information may be

inferred from raw information. Place, for example,

railway station, may be inferred from geographical

position (latitude, longitude) in combination with

geographic information system (GIS) data and is typi-

cally the basis for location-based services (LBS). In addi-

tion, information derived from user patterns may lead

to a more complete description of the user’s context.

For example, inferring that the user is traveling, along

with predicting that the user will arrive at the rail-

way station in 10 minutes and wait there for a certain

time, will give a restaurant on the railway platform

the opportunity to offer a coffee for a reduced price.

In this letter, we focus on inferring and predicting

context of mobile users from time-stamped sequences of

location identifiers, for example, inferring the user’s

traveling state or predicting the user’s destination and

arrival time. Context derived from location information

over time enables service offerings beyond LBS. In the

remainder of this letter, the problem statement is given,

followed by the solutions architecture, including the

research approach and contribution. Next, results based

on Global System for Mobile Communications* (GSM*)

network data are presented and a conclusion is reached.

Context Inference and PredictionWe aim to derive as much context information as

possible from time-stamped sequences of location iden-

tifiers. Location identifiers represent geographical areas

and can be obtained in various ways, e.g., from:

1. Access point identifiers of wireless technologies,

such as medium access control (MAC) addresses

of wireless fidelity (Wi-Fi*) access points,

BLTJ122_1202_06_20237.qxd 7/23/07 2:00 PM Page 79

80 Bell Labs Technical Journal DOI: 10.1002/bltj

2. Cellular network identifiers, e.g., GSM location

area codes, base station (BS), or cell identifiers,

3. Radio frequency identification (RFID) technology,

and

4. Mapping geographical coordinates obtained via

GPS to regions on a map.

Each technology has its own characteristics and

issues (e.g., battery usage, availability, accuracy, indoor-

outdoor coverage, and overlapping geographical areas)

that have to be considered. This letter focuses on

location identifiers from GSM networks.

Problem StatementThe problem addressed in this letter is, given an

input sequence of time-stamped location identifiers for

an individual mobile user over a period of time, develop

efficient algorithms (1) to infer user context informa-

tion at the best possible quality level [4] and (2) to learn

and predict the user’s traveling patterns. Real data is

needed to assess the quality level of inferred and pre-

dicted information in order to investigate how much

context information can be derived from the input

sequence, and which additional input data may be ben-

eficial. A central problem is to determine the start and

end points of journeys as well as intermediate stops.

In addition, handling oscillations caused by overlapping

geographical areas and selecting journeys for which

prediction is possible and relevant are also important.

Solutions ArchitectureFigure 1 shows our solutions architecture for the

problem outlined. The solution consists of both an

inference module (IM) and a learning and prediction

Panel 1. Abbreviations, Acronyms, and Terms

BS—Base stationGIS—Geographic information systemGPS—Global Positioning SystemGSM—Global System for Mobile

CommunicationsID—IdentifierIM—Inference moduleLBS—Location-based servicesLPM—Learning and prediction moduleMAC—Medium access controlPPM—Prediction by partial matchRFID—Radio frequency identificationWi-Fi*—Wireless fidelity

Context-awareapplication

“Send instant message when useris arriving home in 25 minutes”

Time-stampedlocation identifiers

IM

Previousjourneys

Currentjourney

Predicting routes,destinations and arrival

times

LPM

IM—Inference moduleLPM—Learning and prediction module

Method oftransportation

Favoriteplaces

Journeyfilter

Travelingstate

Oscillationsfilter

Figure 1.Solutions architecture.

BLTJ122_1202_06_20237.qxd 7/23/07 2:00 PM Page 80

DOI: 10.1002/bltj Bell Labs Technical Journal 81

module (LPM). The LPM predicts the user’s context

on the basis of journeys in the user’s history. A journey

is described as a part of the input sequence, that is, a

time-stamped sequence of location identifiers in

which the first and last identifiers, respectively, denote

the starting point and destination. The IM selects pre-

vious journeys to be learned by the LPM and triggers

prediction for the current journey by the LPM. The IM

contains three interconnected building blocks needed

to interact with the LPM. These include the:

• Oscillations filter, which detects the start and end

of oscillations between two or more location iden-

tifiers. Oscillations typically occur at overlapping

location identifier areas, even when users are sta-

tionary. In GSM networks, a significant fraction of

location identifier changes may be due to such

oscillations.

• Traveling state indicator, which determines whether

the user is traveling or not by taking into account

the output of the oscillations filter as well as time-

related parameters and identifies the traveling

state with a YES or NO.

• Journey filter, which first extracts all journeys made

by the mobile user by considering the traveling

state and distinguishing final destinations from

intermediate stops (e.g., traffic jams, transfer

points, or gas stops). Next, it filters out journeys

for which prediction is possible and useful by clas-

sifying extracted journeys as either local move-

ments (e.g., shopping in a city center) or targeted

trips (e.g., driving by car from work to home).

Finally, it selects all or a subset of the targeted trips

(e.g., home-bound journeys only) to be learned

by the LPM.

The IM may also determine:

• Favorite places, by maintaining lists of location

identifiers for intermediate stops and destinations

and relating those identifiers to descriptions that

are meaningful for the user, for instance, home,

work, school, railway stations, and shops.

• Method of transportation, by inferring the method(s)

of transportation during extracted journeys, such

as walking, bike, or car.

The LPM uses previous journeys to model the

user’s traveling patterns statistically, and dynamically

predicts, using location identifier changes, (1) all

possible destinations and their probabilities, (2) all

routes to a particular destination and their probabili-

ties, and (3) arrival times.

Approach. The IM derives various parameters from

the input sequence, such as the time spent per location

identifier as well as the visit distance, defined as

the number of location identifier changes between the

current and previous “visit” to a location identifier.

The inference algorithms in the IM use such parame-

ters as input and they may partly rely on machine

learning techniques [9].

The LPM relies on data modeling components of

universal data compaction algorithms that also can

be used for prediction [2]. For example, the Lempel-

Ziv78 dictionary algorithm [10] is based on the idea

that data contain many repeated strings, and pre-

diction by partial match (PPM) [5] is based on the

idea that the previous k data symbols can be used to

predict the next symbol, where k ≤ D and D denotes

the maximal number of previous symbols to be con-

sidered. Since users often have repeating traveling

patterns, exploiting the structure in their sequences of

location identifiers by adopting data compaction algo-

rithms seems a natural approach. For our current

LPM prototype, shown in Figure 2, we adopted PPM

and investigated its benefit to our problem as well as

which adaptations are needed to predict routes, des-

tinations, and arrival times of individual mobile users.

In subsequent work, we will investigate the best

choice for the value of D in relation to aspects such as

prediction accuracy, performance, and complexity.

Contribution. Our system automatically filters out

the journeys to be learned, incrementally models the

user’s traveling patterns, and dynamically predicts

routes, destinations, and arrival times. We extend

beyond the state of the art in the following ways. In

[1], a second-order Markov model is constructed offline

using clustered GPS data to predict the probability that

the user will move between different clusters. However,

this model is not incrementally updated, only predicts

final destinations, and does not enable dynamic pre-

diction. Moreover, we can easily take more history into

account by configuring the value of D > 2. In [3],

Lempel-Ziv78 is used to predict the next location area in

a cellular network. In their approach, this may be the

same as the current location area. However, no routes,

BLTJ122_1202_06_20237.qxd 7/23/07 2:00 PM Page 81

82 Bell Labs Technical Journal DOI: 10.1002/bltj

destinations, or arrival times are determined. In [8], the

prediction of future locations based on GSM network

data is addressed, but no data compaction techniques

are adopted for predicting locations and arrival times.

ResultsFor the validation of our approach, we use GSM

network data. GSM is widely deployed on most

continents and is usable both indoors and outdoors.

Further, location identifiers, such as GSM cell identi-

fiers (IDs), are already available in the terminal and

the network, as they are fundamental for the tech-

nology to function properly.

The coverage area of a GSM base station (BS) is

typically subdivided into three sectors, each having a

unique cell ID. A mobile device in idle mode typically

tries to select the cell with the highest signal quality.

However, as signal quality varies and coverage areas

partially overlap, cell ID changes do not always indi-

cate that users are physically moving. Further, fol-

lowing the same physical route at different times

might not produce cell sequence IDs of an exact

match. Figure 3 shows a BS arrangement with real-

istic cell ID sequences.

An experimental prototype based on the architec-

ture shown in Figure 1 is developed. The IM starts

with running an online algorithm that first determines

the possible occurrence of an oscillation and then

determines the user’s traveling state. This algorithm

detects oscillation starts by checking whether the visit

distance is smaller than or equal to the preconfigured

maximal oscillation set size M, where the oscillation

set OS contains the cell IDs that occur in the current

oscillation. On new cell IDs not in OS, if |OS| � M,

the oscillation ends, else if it is decided by looking

ahead that the user is not yet traveling, the cell ID is

added to OS, else the oscillation ends. When the oscil-

lation ends, the oscillation set is reset to the empty

set. In parallel, the algorithm decides on the user’s

traveling state by considering the previous traveling

state, oscillation-related information such as the aver-

age oscillation time, and time-related thresholds such as

the time in cell threshold, which is used to determine

YES-NO traveling state transitions. An online

approach is chosen as this can trigger prediction

for the current journey and could enable context-

aware services based on the current traveling state.

However, we note that traveling state transitions

Prediction tree nodes may also contain time-related information,e.g., for predicting arrival times and improving prediction accuracy.

-

3,2 4,2

3,12,1 4,1

1,1 2,1

2,21,2

2,1 3,11,1

4,13,1

Journey 1 = 1-2-3-4 parsed as 12, 23, 34, 4Journey 2 = 4-3-2-1 parsed as 43, 32, 21, 1

Nodes contain (location identifier,count)

During new journey, after 1-2, predict 3 (probability 1) and then, assuming 2-3,predict 4 as destination (probability 1)

Learning

Prediction

D � 2

PPM—Prediction by partial match

Figure 2.Slimple PPM prediction tree example of depth D � 1 � 3.

BLTJ122_1202_06_20237.qxd 7/23/07 2:00 PM Page 82

DOI: 10.1002/bltj Bell Labs Technical Journal 83

are detected with a variable time delay. As an exten-

sion, the maximal oscillation set size is made flexible

with a default value of 3.

To extract learnable journeys, we have developed

a journey filter that filters out targeted trips to be

passed to the LPM. For prediction purposes, it is cru-

cial that the cell IDs corresponding to the start and

end points of the journeys to be learned are exactly

determined. Obtaining these points directly from trav-

eling state transitions is not possible because of oscil-

lations and intermediate stops. In fact, the entire

portion of the input sequence corresponding to

the NO traveling state is used to find the cell in

which the most time was spent during the period of

not traveling. This cell most likely indicates either an

intermediate stop or a journey starting or end point,

depending on the total duration of the NO state. Thus,

the cell ID indicating an ending point coincides in

principle with the cell ID indicating the starting point

of the next journey, but journey extraction can only

be completed with some delay, that is, at the start of

the next journey. In this way, we improve upon

extracting journeys by simply isolating those parts

from the input sequence that are suggested by the

NO-YES and YES-NO traveling state transitions.

The journey filter in our current IM prototype

only uses a static time threshold on the duration of

the NO traveling state to distinguish between inter-

mediate stops and journey end points. This may be

improved by taking visit distance patterns into

account. For example, some journey ends could be

corrected to intermediate stops when the visit dis-

tance pattern, after the end of a NO period, continues

to show a linear increase of 2, indicating that the

route has been taken before in opposite direction.

The test data to validate the algorithms consist

of cell ID histories collected by several test partici-

pants during their everyday activities, containing a

818

819820

534

535536

1

BS 1 BS 2

BS 3

2

3

cell ID=104

105106

BS—Base stationGSM†—Global System for Mobile Communications†

ID—Identifier

†Registered trademarks of the GSM Association.

User stationary, connected to cell 5361

2 User stationary, oscillation between cells 105 and 820

3a User traveling on day 1

3b User traveling on day 2

Figure 3.Example cell ID sequences in a GSM network.

BLTJ122_1202_06_20237.qxd 7/23/07 2:00 PM Page 83

84 Bell Labs Technical Journal DOI: 10.1002/bltj

time-stamped log entry for every time a new cell ID

was selected by their devices. The testers manually

add notes on changes in their traveling state and

method of transportation. For now, we have validated

the algorithms in the IM on the basis of the data from

two test persons with different traveling profiles.

Details can be found in Panel 2. Most mismatches

in traveling state occur at journey boundaries, and

for local movements, short intermediate stops,

and transport switching. However, the journey filter

compensates for these errors as local movements

can be distinguished from targeted trips by putting

a threshold on the average time in cell during

journeys.

In our current LPM prototype, previous journeys

are learned one by one by using the PPM approach.

Consequently, journey end points can be distinguished

from intermediate points in the prediction tree.

Predicting routes and destinations deviates much more

from standard PPM as we need to predict beyond the

next cell ID. In fact, we predict routes and destina-

tions by, given the previous 1 ≤ k ≤ D cell IDs of the

current journey, repeatedly assuming that predicted

cell IDs are the next cell IDs until all possible destina-

tions are found, as shown in Figure 2. For journeys

that are made at least once, or for journeys that have

their final leg in common with previous journeys to

the same destination, prediction can be done. For

example, by setting D � 6, we see that in our data set

of test person 1, after a small training period, most

journeys home can be predicted well in advance of

the user’s arriving there without using any time-

related information in the tree. However, such infor-

mation is needed to estimate arrival times and

enhance prediction further.

ConclusionsDespite GSM network characteristics such as oscil-

lations and varying cell ID patterns, context inference

can be accomplished. To learn and predict traveling

patterns of mobile users, adopting data compaction

algorithms is shown to be a promising approach.

It is possible to derive a significant amount of indi-

vidual mobile users’ context information from time-

stamped sequences of GSM cell IDs only. This may

enable service providers to introduce personalized and

context-aware services by using location identifiers

from operational GSM networks.

AcknowledgmentsThe authors would like to acknowledge Harold

Batteram for his contribution to the LPM and Willem

Romijn for his contribution to the prototyping and

Test person 1 lives in an urban areasurrounded by several national parks andtravels mostly by bike or car. Going from hometo work is a 30-minute bike or 20-minute cartrip. The test person logged GSM cell ID dataduring a 3-month period, yielding a data set of6,123 measurements. Key data points includethe following:• Of the logged cell changes 38 percent

occurred while the user was not traveling.• After running the online traveling detection

algorithm, 93 percent of the traveling statesfound by the algorithm match the statesindicated by the user.

• After running the journey filter, for 97percent of the 171 journeys found, a correct decision was made on passing thejourney to the LPM, resulting in 155 journeyspassed.

Test person 2 lives in a highly urban area andtravels mostly by train or car. Going from hometo work is a 2-hour train trip. The test personlogged GSM cell ID data during a 1-monthperiod, yielding a data set of 4,916measurements. Key data points include thefollowing:• Of the logged cell changes 19 percent

occurred while the user was not traveling.• After running the online traveling detection

algorithm, 92 percent of the traveling statesfound by the algorithm match with the statesindicated by the user.

• After running the journey filter, for 98percent of the 61 journeys found, a correctdecision was made on passing the journey tothe LPM, resulting in 54 journeys passed.

Panel 2. Test Data: Characteristics and Results for the Inference Module.

BLTJ122_1202_06_20237.qxd 7/23/07 2:00 PM Page 84

DOI: 10.1002/bltj Bell Labs Technical Journal 85

data gathering. The authors are also grateful to Mortaza

Bargh for carefully reviewing an earlier version of this

manuscript. Some of this work is part of the Freeband

Awareness project (http://awareness.freeband.nl).

Freeband is sponsored by the Dutch government

under contract BSIK 03025.

*TrademarksGSM and Global System for Mobile Communications

are registered trademarks of the GSM Association.

Wi-Fi is a registered trademark of the Wireless Ethernet

Compatibility Alliance, Inc.

References[1] D. Ashbrook and T. Starner, “Learning

Significant Locations and Predicting UserMovement with GPS,” Proc. 6th Internat.Symposium on Wearable Computers (ISWC‘02) (Seattle, 2002), pp. 101–108.

[2] R. Begleiter, R. El-Yaniv, and G. Yona, “OnPrediction Using Variable Order MarkovModels,” J. Artificial Intelligence Research(JAIR), 22 (2004), 385–421.

[3] A. Bhattacharya and S. K. Das, “LeZi-Update:An Information-Theoretic Approach to TrackMobile Users in PCS Networks,” Proc. 5thAnnual ACM/IEEE Internat. Conf. on MobileComputing and Networking (MobiCom ‘99)(Seattle, 1999), pp. 1–12.

[4] T. Buchholz, A. Küpper, and M. Schiffers,“Quality of Context: What It Is and Why WeNeed It,” Proc. Workshop of HP OpenView Univ.Assoc. (HPOVUA ‘03) (Geneva, Switz., 2003).

[5] J. G. Cleary and I. H. Witten, “DataCompression Using Adaptive Coding and PartialString Matching,” IEEE Trans. Commun., 32:4(1984), 396–402.

[6] A. K. Dey and G. D. Abowd, “Towards a BetterUnderstanding of Context and Context-Awareness,” Georgia Institute of Technology,College of Computing, GVU Tech. Report GIT-GVU-99-22, 1999, <ftp://ftp.cc.gatech.edu/pub/gvu/tr/1999/99-22.pdf>.

[7] K. Henricksen, J. Indulska, and A.Rakotonirainy, “Modeling Context Informationin Pervasive Computing Systems,” Proc. 1stInternat. Conf. on Pervasive Computing(Pervasive ‘02) (Zürich, Switz., 2002), LNCSvol. 2414, pp. 167–180.

[8] K. Laasonen, M. Raento, and H. Toivonen,“Adaptive On-Device Location Recognition,”Proc. 2nd Internat. Conf. on Pervasive

Computing (Pervasive ‘04) (Vienna, Aus.,2004), LNCS vol. 3001, pp. 287–304.

[9] I. H. Witten and E. Frank, Data Mining:Practical Machine Learning Tools andTechniques, 2nd ed., Morgan Kaufmann, SanFrancisco, 2005.

[10] J. Ziv and A. Lempel, “Compression ofIndividual Sequences via Variable-RateCoding,” IEEE Trans. Inform. Theory, 24:5(1978), 530–536.

(Manuscript approved March 2007)

ERIK MEEUWISSEN is a senior member of technicalstaff at Bell Labs Europe in Hilversum, theNetherlands. He received M.S. and Ph.D.degrees in electrical engineering fromEindhoven University of Technology in theNetherlands. His research interests include

theoretical and applied aspects of communicationnetworks and services. He presently focuses on qualityof service and context awareness. He served as overallproject leader of the Dutch End-to-End Quality ofService in Next-Generation Networks (EQUANET)consortium. Dr. Meeuwissen is a member of theInstitute of Electrical and Electronics Engineers (IEEE)and the Dutch Electronics and Radio Society (NERG). Heis a corecipient of the 2004 Bell Labs President’s GoldAward for his work on the Parlay Proxy Manager.

PAUL REINOLD is a research manager at Bell LabsEurope in Hilversum, the Netherlands. In his career at Alcatel-Lucent, he held manypositions in R&D product verification,product development, and systemsengineering for the 5ESS® Switch in the

area of applications and call handling services. He later joined Bell Labs to lead a team of researchers on applications and middleware for wireline andconverged networks. He is a member of the steeringboard of Freeband Communication, the Dutch research program on intelligent communication, and member of the European Information andCommunications Technology Industry Association(EICTA) Technical and Regulatory Policy Group onpublic funding and conditions for innovative R&D. Mr.Reinold has a B.S. degree in electrical and computerengineering from the Hogere Technische School inHilversum, the Netherlands. He is a corecipient of the2004 Bell Labs President’s Gold Award for hiscontribution to the Parlay Proxy Manager.

BLTJ122_1202_06_20237.qxd 7/23/07 2:00 PM Page 85

86 Bell Labs Technical Journal DOI: 10.1002/bltj

CYNTHIA LIEM worked during the summer of 2006 asintern at Bell Labs Europe in Hilversum, theNetherlands. During her summer internship,she participated in research on theinference of higher-level contextinformation and the realization of

experimental prototypes. She is currently pursuingundergraduate degrees in computer science at DelftUniversity of Technology and in music at the RoyalConservatoire in The Hague, both in the Netherlands.She is recipient of the Lucent Global Science ScholarAward 2005 and the Young Talent Incentive Prize forComputer Science 2005 of the Royal Holland Society ofSciences and Humanities. ◆

BLTJ122_1202_06_20237.qxd 7/23/07 2:00 PM Page 86