Upload
erik-meeuwissen
View
213
Download
1
Embed Size (px)
Citation preview
◆ Inferring and Predicting Context of Mobile UsersErik Meeuwissen, Paul Reinold, and Cynthia Liem
User context characterizes the user’s environment and situation. Knowledgeof user context helps to improve how people communicate and enables newcontext-aware applications. This letter concerns inferring and predictingmobile users’ context from time-stamped sequences of location identifiers.Results are validated using data from operational Global System for MobileCommunications* (GSM*) networks. © 2007 Alcatel-Lucent.
Bell Labs Technical Journal 12(2), 79–86 (2007) © 2007 Alcatel-Lucent. Published by Wiley Periodicals, Inc. Publishedonline in Wiley InterScience (www.interscience.wiley.com). • 10.1002/bltj.20237
IntroductionWithin a range of research themes, Bell Labs
Europe–Netherlands is researching distributed serv-
ice infrastructures to support context awareness in
next-generation mobile and converged networks. We
adopt the widely accepted definitions of context and
context awareness from [6]. In short, user context
characterizes the user’s environment and situation.
Knowledge of user context helps to improve the way
people communicate and enables new context-aware
applications. As an example, the availability and pres-
entation of online status information via instant mes-
saging help to improve communication.
A context model [7] gives an approximate
description of the user context for a class of applica-
tions. Raw information for this model is obtained via
sensors, such as a mobile phone or Global Positioning
System (GPS) device, or from networks serving the
user. Higher level context information may be
inferred from raw information. Place, for example,
railway station, may be inferred from geographical
position (latitude, longitude) in combination with
geographic information system (GIS) data and is typi-
cally the basis for location-based services (LBS). In addi-
tion, information derived from user patterns may lead
to a more complete description of the user’s context.
For example, inferring that the user is traveling, along
with predicting that the user will arrive at the rail-
way station in 10 minutes and wait there for a certain
time, will give a restaurant on the railway platform
the opportunity to offer a coffee for a reduced price.
In this letter, we focus on inferring and predicting
context of mobile users from time-stamped sequences of
location identifiers, for example, inferring the user’s
traveling state or predicting the user’s destination and
arrival time. Context derived from location information
over time enables service offerings beyond LBS. In the
remainder of this letter, the problem statement is given,
followed by the solutions architecture, including the
research approach and contribution. Next, results based
on Global System for Mobile Communications* (GSM*)
network data are presented and a conclusion is reached.
Context Inference and PredictionWe aim to derive as much context information as
possible from time-stamped sequences of location iden-
tifiers. Location identifiers represent geographical areas
and can be obtained in various ways, e.g., from:
1. Access point identifiers of wireless technologies,
such as medium access control (MAC) addresses
of wireless fidelity (Wi-Fi*) access points,
BLTJ122_1202_06_20237.qxd 7/23/07 2:00 PM Page 79
80 Bell Labs Technical Journal DOI: 10.1002/bltj
2. Cellular network identifiers, e.g., GSM location
area codes, base station (BS), or cell identifiers,
3. Radio frequency identification (RFID) technology,
and
4. Mapping geographical coordinates obtained via
GPS to regions on a map.
Each technology has its own characteristics and
issues (e.g., battery usage, availability, accuracy, indoor-
outdoor coverage, and overlapping geographical areas)
that have to be considered. This letter focuses on
location identifiers from GSM networks.
Problem StatementThe problem addressed in this letter is, given an
input sequence of time-stamped location identifiers for
an individual mobile user over a period of time, develop
efficient algorithms (1) to infer user context informa-
tion at the best possible quality level [4] and (2) to learn
and predict the user’s traveling patterns. Real data is
needed to assess the quality level of inferred and pre-
dicted information in order to investigate how much
context information can be derived from the input
sequence, and which additional input data may be ben-
eficial. A central problem is to determine the start and
end points of journeys as well as intermediate stops.
In addition, handling oscillations caused by overlapping
geographical areas and selecting journeys for which
prediction is possible and relevant are also important.
Solutions ArchitectureFigure 1 shows our solutions architecture for the
problem outlined. The solution consists of both an
inference module (IM) and a learning and prediction
Panel 1. Abbreviations, Acronyms, and Terms
BS—Base stationGIS—Geographic information systemGPS—Global Positioning SystemGSM—Global System for Mobile
CommunicationsID—IdentifierIM—Inference moduleLBS—Location-based servicesLPM—Learning and prediction moduleMAC—Medium access controlPPM—Prediction by partial matchRFID—Radio frequency identificationWi-Fi*—Wireless fidelity
Context-awareapplication
“Send instant message when useris arriving home in 25 minutes”
Time-stampedlocation identifiers
IM
Previousjourneys
Currentjourney
Predicting routes,destinations and arrival
times
LPM
IM—Inference moduleLPM—Learning and prediction module
Method oftransportation
Favoriteplaces
Journeyfilter
Travelingstate
Oscillationsfilter
Figure 1.Solutions architecture.
BLTJ122_1202_06_20237.qxd 7/23/07 2:00 PM Page 80
DOI: 10.1002/bltj Bell Labs Technical Journal 81
module (LPM). The LPM predicts the user’s context
on the basis of journeys in the user’s history. A journey
is described as a part of the input sequence, that is, a
time-stamped sequence of location identifiers in
which the first and last identifiers, respectively, denote
the starting point and destination. The IM selects pre-
vious journeys to be learned by the LPM and triggers
prediction for the current journey by the LPM. The IM
contains three interconnected building blocks needed
to interact with the LPM. These include the:
• Oscillations filter, which detects the start and end
of oscillations between two or more location iden-
tifiers. Oscillations typically occur at overlapping
location identifier areas, even when users are sta-
tionary. In GSM networks, a significant fraction of
location identifier changes may be due to such
oscillations.
• Traveling state indicator, which determines whether
the user is traveling or not by taking into account
the output of the oscillations filter as well as time-
related parameters and identifies the traveling
state with a YES or NO.
• Journey filter, which first extracts all journeys made
by the mobile user by considering the traveling
state and distinguishing final destinations from
intermediate stops (e.g., traffic jams, transfer
points, or gas stops). Next, it filters out journeys
for which prediction is possible and useful by clas-
sifying extracted journeys as either local move-
ments (e.g., shopping in a city center) or targeted
trips (e.g., driving by car from work to home).
Finally, it selects all or a subset of the targeted trips
(e.g., home-bound journeys only) to be learned
by the LPM.
The IM may also determine:
• Favorite places, by maintaining lists of location
identifiers for intermediate stops and destinations
and relating those identifiers to descriptions that
are meaningful for the user, for instance, home,
work, school, railway stations, and shops.
• Method of transportation, by inferring the method(s)
of transportation during extracted journeys, such
as walking, bike, or car.
The LPM uses previous journeys to model the
user’s traveling patterns statistically, and dynamically
predicts, using location identifier changes, (1) all
possible destinations and their probabilities, (2) all
routes to a particular destination and their probabili-
ties, and (3) arrival times.
Approach. The IM derives various parameters from
the input sequence, such as the time spent per location
identifier as well as the visit distance, defined as
the number of location identifier changes between the
current and previous “visit” to a location identifier.
The inference algorithms in the IM use such parame-
ters as input and they may partly rely on machine
learning techniques [9].
The LPM relies on data modeling components of
universal data compaction algorithms that also can
be used for prediction [2]. For example, the Lempel-
Ziv78 dictionary algorithm [10] is based on the idea
that data contain many repeated strings, and pre-
diction by partial match (PPM) [5] is based on the
idea that the previous k data symbols can be used to
predict the next symbol, where k ≤ D and D denotes
the maximal number of previous symbols to be con-
sidered. Since users often have repeating traveling
patterns, exploiting the structure in their sequences of
location identifiers by adopting data compaction algo-
rithms seems a natural approach. For our current
LPM prototype, shown in Figure 2, we adopted PPM
and investigated its benefit to our problem as well as
which adaptations are needed to predict routes, des-
tinations, and arrival times of individual mobile users.
In subsequent work, we will investigate the best
choice for the value of D in relation to aspects such as
prediction accuracy, performance, and complexity.
Contribution. Our system automatically filters out
the journeys to be learned, incrementally models the
user’s traveling patterns, and dynamically predicts
routes, destinations, and arrival times. We extend
beyond the state of the art in the following ways. In
[1], a second-order Markov model is constructed offline
using clustered GPS data to predict the probability that
the user will move between different clusters. However,
this model is not incrementally updated, only predicts
final destinations, and does not enable dynamic pre-
diction. Moreover, we can easily take more history into
account by configuring the value of D > 2. In [3],
Lempel-Ziv78 is used to predict the next location area in
a cellular network. In their approach, this may be the
same as the current location area. However, no routes,
BLTJ122_1202_06_20237.qxd 7/23/07 2:00 PM Page 81
82 Bell Labs Technical Journal DOI: 10.1002/bltj
destinations, or arrival times are determined. In [8], the
prediction of future locations based on GSM network
data is addressed, but no data compaction techniques
are adopted for predicting locations and arrival times.
ResultsFor the validation of our approach, we use GSM
network data. GSM is widely deployed on most
continents and is usable both indoors and outdoors.
Further, location identifiers, such as GSM cell identi-
fiers (IDs), are already available in the terminal and
the network, as they are fundamental for the tech-
nology to function properly.
The coverage area of a GSM base station (BS) is
typically subdivided into three sectors, each having a
unique cell ID. A mobile device in idle mode typically
tries to select the cell with the highest signal quality.
However, as signal quality varies and coverage areas
partially overlap, cell ID changes do not always indi-
cate that users are physically moving. Further, fol-
lowing the same physical route at different times
might not produce cell sequence IDs of an exact
match. Figure 3 shows a BS arrangement with real-
istic cell ID sequences.
An experimental prototype based on the architec-
ture shown in Figure 1 is developed. The IM starts
with running an online algorithm that first determines
the possible occurrence of an oscillation and then
determines the user’s traveling state. This algorithm
detects oscillation starts by checking whether the visit
distance is smaller than or equal to the preconfigured
maximal oscillation set size M, where the oscillation
set OS contains the cell IDs that occur in the current
oscillation. On new cell IDs not in OS, if |OS| � M,
the oscillation ends, else if it is decided by looking
ahead that the user is not yet traveling, the cell ID is
added to OS, else the oscillation ends. When the oscil-
lation ends, the oscillation set is reset to the empty
set. In parallel, the algorithm decides on the user’s
traveling state by considering the previous traveling
state, oscillation-related information such as the aver-
age oscillation time, and time-related thresholds such as
the time in cell threshold, which is used to determine
YES-NO traveling state transitions. An online
approach is chosen as this can trigger prediction
for the current journey and could enable context-
aware services based on the current traveling state.
However, we note that traveling state transitions
Prediction tree nodes may also contain time-related information,e.g., for predicting arrival times and improving prediction accuracy.
-
3,2 4,2
3,12,1 4,1
1,1 2,1
2,21,2
2,1 3,11,1
4,13,1
Journey 1 = 1-2-3-4 parsed as 12, 23, 34, 4Journey 2 = 4-3-2-1 parsed as 43, 32, 21, 1
Nodes contain (location identifier,count)
During new journey, after 1-2, predict 3 (probability 1) and then, assuming 2-3,predict 4 as destination (probability 1)
Learning
Prediction
D � 2
PPM—Prediction by partial match
Figure 2.Slimple PPM prediction tree example of depth D � 1 � 3.
BLTJ122_1202_06_20237.qxd 7/23/07 2:00 PM Page 82
DOI: 10.1002/bltj Bell Labs Technical Journal 83
are detected with a variable time delay. As an exten-
sion, the maximal oscillation set size is made flexible
with a default value of 3.
To extract learnable journeys, we have developed
a journey filter that filters out targeted trips to be
passed to the LPM. For prediction purposes, it is cru-
cial that the cell IDs corresponding to the start and
end points of the journeys to be learned are exactly
determined. Obtaining these points directly from trav-
eling state transitions is not possible because of oscil-
lations and intermediate stops. In fact, the entire
portion of the input sequence corresponding to
the NO traveling state is used to find the cell in
which the most time was spent during the period of
not traveling. This cell most likely indicates either an
intermediate stop or a journey starting or end point,
depending on the total duration of the NO state. Thus,
the cell ID indicating an ending point coincides in
principle with the cell ID indicating the starting point
of the next journey, but journey extraction can only
be completed with some delay, that is, at the start of
the next journey. In this way, we improve upon
extracting journeys by simply isolating those parts
from the input sequence that are suggested by the
NO-YES and YES-NO traveling state transitions.
The journey filter in our current IM prototype
only uses a static time threshold on the duration of
the NO traveling state to distinguish between inter-
mediate stops and journey end points. This may be
improved by taking visit distance patterns into
account. For example, some journey ends could be
corrected to intermediate stops when the visit dis-
tance pattern, after the end of a NO period, continues
to show a linear increase of 2, indicating that the
route has been taken before in opposite direction.
The test data to validate the algorithms consist
of cell ID histories collected by several test partici-
pants during their everyday activities, containing a
818
819820
534
535536
…
1
BS 1 BS 2
BS 3
2
3
cell ID=104
105106
BS—Base stationGSM†—Global System for Mobile Communications†
ID—Identifier
†Registered trademarks of the GSM Association.
User stationary, connected to cell 5361
2 User stationary, oscillation between cells 105 and 820
3a User traveling on day 1
3b User traveling on day 2
Figure 3.Example cell ID sequences in a GSM network.
BLTJ122_1202_06_20237.qxd 7/23/07 2:00 PM Page 83
84 Bell Labs Technical Journal DOI: 10.1002/bltj
time-stamped log entry for every time a new cell ID
was selected by their devices. The testers manually
add notes on changes in their traveling state and
method of transportation. For now, we have validated
the algorithms in the IM on the basis of the data from
two test persons with different traveling profiles.
Details can be found in Panel 2. Most mismatches
in traveling state occur at journey boundaries, and
for local movements, short intermediate stops,
and transport switching. However, the journey filter
compensates for these errors as local movements
can be distinguished from targeted trips by putting
a threshold on the average time in cell during
journeys.
In our current LPM prototype, previous journeys
are learned one by one by using the PPM approach.
Consequently, journey end points can be distinguished
from intermediate points in the prediction tree.
Predicting routes and destinations deviates much more
from standard PPM as we need to predict beyond the
next cell ID. In fact, we predict routes and destina-
tions by, given the previous 1 ≤ k ≤ D cell IDs of the
current journey, repeatedly assuming that predicted
cell IDs are the next cell IDs until all possible destina-
tions are found, as shown in Figure 2. For journeys
that are made at least once, or for journeys that have
their final leg in common with previous journeys to
the same destination, prediction can be done. For
example, by setting D � 6, we see that in our data set
of test person 1, after a small training period, most
journeys home can be predicted well in advance of
the user’s arriving there without using any time-
related information in the tree. However, such infor-
mation is needed to estimate arrival times and
enhance prediction further.
ConclusionsDespite GSM network characteristics such as oscil-
lations and varying cell ID patterns, context inference
can be accomplished. To learn and predict traveling
patterns of mobile users, adopting data compaction
algorithms is shown to be a promising approach.
It is possible to derive a significant amount of indi-
vidual mobile users’ context information from time-
stamped sequences of GSM cell IDs only. This may
enable service providers to introduce personalized and
context-aware services by using location identifiers
from operational GSM networks.
AcknowledgmentsThe authors would like to acknowledge Harold
Batteram for his contribution to the LPM and Willem
Romijn for his contribution to the prototyping and
Test person 1 lives in an urban areasurrounded by several national parks andtravels mostly by bike or car. Going from hometo work is a 30-minute bike or 20-minute cartrip. The test person logged GSM cell ID dataduring a 3-month period, yielding a data set of6,123 measurements. Key data points includethe following:• Of the logged cell changes 38 percent
occurred while the user was not traveling.• After running the online traveling detection
algorithm, 93 percent of the traveling statesfound by the algorithm match the statesindicated by the user.
• After running the journey filter, for 97percent of the 171 journeys found, a correct decision was made on passing thejourney to the LPM, resulting in 155 journeyspassed.
Test person 2 lives in a highly urban area andtravels mostly by train or car. Going from hometo work is a 2-hour train trip. The test personlogged GSM cell ID data during a 1-monthperiod, yielding a data set of 4,916measurements. Key data points include thefollowing:• Of the logged cell changes 19 percent
occurred while the user was not traveling.• After running the online traveling detection
algorithm, 92 percent of the traveling statesfound by the algorithm match with the statesindicated by the user.
• After running the journey filter, for 98percent of the 61 journeys found, a correctdecision was made on passing the journey tothe LPM, resulting in 54 journeys passed.
Panel 2. Test Data: Characteristics and Results for the Inference Module.
BLTJ122_1202_06_20237.qxd 7/23/07 2:00 PM Page 84
DOI: 10.1002/bltj Bell Labs Technical Journal 85
data gathering. The authors are also grateful to Mortaza
Bargh for carefully reviewing an earlier version of this
manuscript. Some of this work is part of the Freeband
Awareness project (http://awareness.freeband.nl).
Freeband is sponsored by the Dutch government
under contract BSIK 03025.
*TrademarksGSM and Global System for Mobile Communications
are registered trademarks of the GSM Association.
Wi-Fi is a registered trademark of the Wireless Ethernet
Compatibility Alliance, Inc.
References[1] D. Ashbrook and T. Starner, “Learning
Significant Locations and Predicting UserMovement with GPS,” Proc. 6th Internat.Symposium on Wearable Computers (ISWC‘02) (Seattle, 2002), pp. 101–108.
[2] R. Begleiter, R. El-Yaniv, and G. Yona, “OnPrediction Using Variable Order MarkovModels,” J. Artificial Intelligence Research(JAIR), 22 (2004), 385–421.
[3] A. Bhattacharya and S. K. Das, “LeZi-Update:An Information-Theoretic Approach to TrackMobile Users in PCS Networks,” Proc. 5thAnnual ACM/IEEE Internat. Conf. on MobileComputing and Networking (MobiCom ‘99)(Seattle, 1999), pp. 1–12.
[4] T. Buchholz, A. Küpper, and M. Schiffers,“Quality of Context: What It Is and Why WeNeed It,” Proc. Workshop of HP OpenView Univ.Assoc. (HPOVUA ‘03) (Geneva, Switz., 2003).
[5] J. G. Cleary and I. H. Witten, “DataCompression Using Adaptive Coding and PartialString Matching,” IEEE Trans. Commun., 32:4(1984), 396–402.
[6] A. K. Dey and G. D. Abowd, “Towards a BetterUnderstanding of Context and Context-Awareness,” Georgia Institute of Technology,College of Computing, GVU Tech. Report GIT-GVU-99-22, 1999, <ftp://ftp.cc.gatech.edu/pub/gvu/tr/1999/99-22.pdf>.
[7] K. Henricksen, J. Indulska, and A.Rakotonirainy, “Modeling Context Informationin Pervasive Computing Systems,” Proc. 1stInternat. Conf. on Pervasive Computing(Pervasive ‘02) (Zürich, Switz., 2002), LNCSvol. 2414, pp. 167–180.
[8] K. Laasonen, M. Raento, and H. Toivonen,“Adaptive On-Device Location Recognition,”Proc. 2nd Internat. Conf. on Pervasive
Computing (Pervasive ‘04) (Vienna, Aus.,2004), LNCS vol. 3001, pp. 287–304.
[9] I. H. Witten and E. Frank, Data Mining:Practical Machine Learning Tools andTechniques, 2nd ed., Morgan Kaufmann, SanFrancisco, 2005.
[10] J. Ziv and A. Lempel, “Compression ofIndividual Sequences via Variable-RateCoding,” IEEE Trans. Inform. Theory, 24:5(1978), 530–536.
(Manuscript approved March 2007)
ERIK MEEUWISSEN is a senior member of technicalstaff at Bell Labs Europe in Hilversum, theNetherlands. He received M.S. and Ph.D.degrees in electrical engineering fromEindhoven University of Technology in theNetherlands. His research interests include
theoretical and applied aspects of communicationnetworks and services. He presently focuses on qualityof service and context awareness. He served as overallproject leader of the Dutch End-to-End Quality ofService in Next-Generation Networks (EQUANET)consortium. Dr. Meeuwissen is a member of theInstitute of Electrical and Electronics Engineers (IEEE)and the Dutch Electronics and Radio Society (NERG). Heis a corecipient of the 2004 Bell Labs President’s GoldAward for his work on the Parlay Proxy Manager.
PAUL REINOLD is a research manager at Bell LabsEurope in Hilversum, the Netherlands. In his career at Alcatel-Lucent, he held manypositions in R&D product verification,product development, and systemsengineering for the 5ESS® Switch in the
area of applications and call handling services. He later joined Bell Labs to lead a team of researchers on applications and middleware for wireline andconverged networks. He is a member of the steeringboard of Freeband Communication, the Dutch research program on intelligent communication, and member of the European Information andCommunications Technology Industry Association(EICTA) Technical and Regulatory Policy Group onpublic funding and conditions for innovative R&D. Mr.Reinold has a B.S. degree in electrical and computerengineering from the Hogere Technische School inHilversum, the Netherlands. He is a corecipient of the2004 Bell Labs President’s Gold Award for hiscontribution to the Parlay Proxy Manager.
BLTJ122_1202_06_20237.qxd 7/23/07 2:00 PM Page 85
86 Bell Labs Technical Journal DOI: 10.1002/bltj
CYNTHIA LIEM worked during the summer of 2006 asintern at Bell Labs Europe in Hilversum, theNetherlands. During her summer internship,she participated in research on theinference of higher-level contextinformation and the realization of
experimental prototypes. She is currently pursuingundergraduate degrees in computer science at DelftUniversity of Technology and in music at the RoyalConservatoire in The Hague, both in the Netherlands.She is recipient of the Lucent Global Science ScholarAward 2005 and the Young Talent Incentive Prize forComputer Science 2005 of the Royal Holland Society ofSciences and Humanities. ◆
BLTJ122_1202_06_20237.qxd 7/23/07 2:00 PM Page 86