10
© 2007 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE. For more information, please see www.ieee.org/web/publications/rights/index.html. MOBILE AND UBIQUITOUS SYSTEMS www.computer.org/pervasive Cellular Census: Explorations in Urban Data Collection Jonathan Reades, Francesco Calabrese, Andres Sevtsuk, and Carlo Ratti Vol. 6, No. 3 July–September 2007 This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder.

Cellular Census: Explorations in Urban Data Collectionucftb48/Cellular_Census.pdf · specificity of GPS, it nonetheless remains attractive for urban research at scales where this

Embed Size (px)

Citation preview

© 2007 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or

for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be

obtained from the IEEE.

For more information, please see www.ieee.org/web/publications/rights/index.html.

MOBILE AND UBIQUITOUS SYSTEMS www.computer.org/pervasive

Cellular Census: Explorations in Urban Data Collection

Jonathan Reades, Francesco Calabrese, Andres Sevtsuk, and Carlo Ratti

Vol. 6, No. 3

July–September 2007

This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works

may not be reposted without the explicit permission of the copyright holder.

30 PERVASIVEcomputing Published by the IEEE Computer Society ■ 1536-1268/07/$25.00 © 2007 IEEE

Cellular Census:Explorations in UrbanData Collection

Much of our understanding of ur-ban systems comes from tradi-tional data collection methodssuch as surveys by person orphone. These approaches can

provide detailed information about urban behav-iors, but they’re hard to update and might limitresults to “snapshots in time.”

In the past few years, some innovative ap-proaches have sought to use mobile devices to col-lect spatiotemporal data (see the sidebar, “UrbanAnalysis Using Mobile-Device Data”). But littleresearch has been done to develop and analyze themuch larger samples of existing data generated dailyby mobile networks.

The most common explanation for this is thatthe challenge of data-sharing with the telecom-

munications industry has ham-pered data access. However, inearly 2006, a collaboration be-tween Telecom Italia, whichserves 40 percent of the Romanmarket, and MIT’s SENSEableCity Laboratory (http://senseable.mit.edu) allowed unprecedentedaccess to aggregate mobile phonedata from Rome. Here, we ex-

plore how researchers might be able to use datafor an entire metropolitan region to analyzeurban dynamics.

The Real Time Rome platformThe TI and MIT collaboration, developed

under the Real Time Rome label, was shown atthe 2006 Venice Biennale. The installation incor-porated both real-time and historical visualiza-

tions of mobile phone usage levels in centralRome during autumn 2006. The system archi-tecture, including data collection, transfer, andprocessing, has been detailed elsewhere.1

TI supplied several different types of data, firstand foremost of which was the Erlang, a mea-sure of network bandwidth usage typically col-lected at the antenna level. Additionally, TI usedits innovative Lochness platform to supply aggre-gate location and trajectory data on callers usingthe system for more than three minutes at a time.Two transportation companies—Atac-Rome (apublic bus company) and Samarcanda (a privatetaxi company)—also provided supplemental GPSdata to MIT for further processing. However,here we focus on the Erlang data collected overfour months in late 2006 and covering a regionof 47 km2, considering how it can help us betterunderstand urban dynamics.

An Erlang is one person-hour of phone use, so1 Erlang could represent one person talking for anhour, two people talking for a half hour each, 30people speaking for two minutes each, and so on.Consequently, Erlang data is both aggregate andanonymous, and deducing individual identitiesfrom the data collected and stored in the system isimpossible. Additionally, because Erlang data is astandard measure used by most network opera-tors, it’s an accessible source for the analysis of typ-ical GSM (Global System for Mobile Communi-cation) networks. You can collect Erlang datawithout installing new applications or upgradingthe base station controllers, both of which incurcosts and operational risks for the networks.

Although Erlang data can’t be linked to an indi-vidual subscriber and doesn’t offer the locational

Analysis of cell phone use can provide an important new way of lookingat the city as a holistic, dynamic system.

U R B A N C O M P U T I N G

Jonathan Reades University College London

Francesco Calabrese, AndresSevtsuk, and Carlo Ratti Massachusetts Institute of Technology

specificity of GPS, it nonetheless remainsattractive for urban research at scaleswhere this level of resolution is unnec-essary. In effect, Erlang data providesboth a view of urban space as seenthrough network bandwidth consump-tion and, indirectly, insight into urbanlife’s spatial and temporal dynamics.This aspect makes it an excellent jump-ing-off point for research supportingpublic-transport planning, health andsafety, advertising, and other types ofgroup-directed activity.

First visualizations and hypothesis

Figure 1 shows one of the simplest vi-sualizations of Erlang data: a 3D plotof telecommunications activity duringMadonna’s controversial 6 August2006 performance, when more than70,000 people converged on the StadioOlimpico for a concert condemned bythe Pope. Generic Erlang maps such asthis, which was presented at the Bien-nale, are graphically appealing andintuitively easy to grasp. However,they’re actually quite difficult to inter-pret rigorously, and they provide littleinsight into local-area dynamics with-out additional processing.

We hypothesize that by employing var-

ious statistical techniques, we can use dif-ferences in Erlang data over time to deriveclues to the types of activity in the imme-diate area of the mast. (A mast can carrymultiple antennas, oriented in differentdirections or serving different frequencies.)This analysis is conceptually related to theidea of a chronotype (see the “Chrono-types and Space-Time Typologies” side-

bar), except that we’re characterizingspaces by their mobile-bandwidth use overtime. By analyzing the bandwidth “signa-ture” of each antenna, we try to envisionhow it might correlate with urban activi-ties in the geographical vicinity.

Because Erlang data is an antenna-level measure, we needed an algorithmto spread the point data values across the

JULY–SEPTEMBER 2007 PERVASIVEcomputing 31

T he Massachusetts Institute of Technology’s Reality Mining project

(http://reality.media.mit.edu) successfully abstracted common

behavioral patterns from the activities of 70 students and faculty issued

with Nokia phones carrying specially designed logging software.1,2

Rein Ahas and Ülar Mark tracked the mobile phones of 300 users

for a “social positioning method” analysis.3 By combining spatio-

temporal data from phones with demographic and attitudinal data

from surveys, they created a map of social spaces in Estonia.

In the UK, the Cityware research group has taken a more readily

scalable approach. They supplement the pedestrian flow data typi-

cally gathered as part of a space syntax analysis with data on Blue-

tooth devices passing through pedestrian survey “gates.”4

However, approaches such as these can suffer from important limi-

tations: they rely on the deployment of ad hoc infrastructure or re-

quire user consent. Consequently, sample sizes are necessarily more

modest and might be limited in terms of the research’s spatial extent.

REFERENCES

1. N. Eagle and A. Pentland, “Eigenbehaviors: Identifying Structure in Rou-tine,” 2006; http://vismod.media.mit.edu//tech-reports/TR-601.pdf.

2. N. Eagle and A. Pentland, “Reality Mining: Sensing Complex Social Sys-tems,” Personal and Ubiquitous Computing, vol. 10, no. 4, 2006, pp.255–268.

3. R. Ahas and Ü. Mark, “Location Based Services—New Challenges for Plan-ning and Public Administration?” Futures, vol. 37, no. 6, 2005, pp. 547–561.

4. E. O’Neill et al., “Instrumenting the City: Developing Methods forObserving and Understanding the Digital Cityscape,” UbiComp 2006:Ubiquitous Computing, LNCS 4206, Springer, 2006, pp. 315–332.

Urban Analysis Using Mobile-Device Data

Figure 1. A 3D plot of telecommunications activity during a Madonna concertin Rome.

area served, accounting for distancedecay in signal coverage and multipleantennas on a single mast. Carlo Rattiand his colleagues took a center-of-gravity approach,2 but to interpolatevalues for the entire metropolitan region,an alternative algorithm3 was used todivide Rome into “pixels” measuring1,600 m2. We used an exponential dis-tribution function to derive an Erlangpoint value based on a composite signalfrom the surrounding masts.

We use this mathematical notation:

• Loc is the set of 1,600 m2 pixels.• T96 is the set of times when we made

observations each day of the week.Because we took measurements every15 minutes, one day comprises 96observations.

• Day is the set of {Weekday, Friday, Sat-urday, Sunday} (we discuss this inmore detail later).

• erlang(�, �, �) defines the Erlang valueat location � � Loc, at time � � T96,and � � Day.

• indicates the mean of the

values ai, i � I.

(To preserve confidentiality, TI used ascaling factor to adjust the Erlang values

transmitted to the SENSEable City Labo-ratory. This means that while the rela-tive difference between any two observa-tions is scaled consistently, the actualErlang value at that point in time is un-known. So, it’s helpful to focus on therelationships between points over timeand space rather than the specific valueat any one point in time.)

Using prior knowledge of the city, wearbitrarily selected eight locations thatwe expected to have markedly differentsignatures. Following an initial vi-sualization exercise, we selected six foranalysis:

• Termini, Rome’s main passenger railstation and busiest subway station;

• Trastevere, a mixed-use area popularwith Romans and tourists for its barsand restaurants;

• the Piazza Bologna, a residential areaeast of the city center;

• the area in front of the Pantheon (oneof Rome’s premier tourist attractions),which also contains many bars andrestaurants;

• the Stadio Olimpico, a sports and ma-jor concert venue northwest of centralRome; and

• Tiburtina, a smaller rail and subwayinterchange.

Figure 2 shows the pixels for these sixlocations.

To minimize the impact of specialevents on the data set, we calculated anaverage Erlang value for each pixel ateach 15-minute interval, using a 90-dayperiod. So, for example, the data pointfor 9 a.m. Monday is an average of every9 a.m. Monday value between 1 Sep-tember and 30 November 2006. Weexcluded civic holidays from the calcu-lation on the basis that they would intro-duce unnecessary noise.

Erlang data by day of the weekBeginning with a minimal level of

processing, figure 3 shows how Erlangdata changes over time at each of thesix selected pixels. As the graphs in-dicate, Monday through Friday arebroadly similar, except for a more rapiddecrease in activity on Friday after-noon, suggesting a transition to theweekend. Even more strikingly, Satur-day and Sunday values often drop be-low 50 percent of the typical weekdayload, but the drop’s magnitude variesdramatically from site to site. This find-ing indicates that weekday and week-end data should be treated separatelyin our analysis.

Intriguingly, areas more closely identi-

mean ai I i∈

{ }

32 PERVASIVEcomputing www.computer.org/pervasive

U R B A N C O M P U T I N G

T o help conceptualize a city’s complex hourly, daily, weekly,

monthly, and annual rhythms, Luca Bertolini and Martin Dijst

put forward Roberta Bonfiglioli’s concept of the chronotype.1 The

chronotype is a useful conceptual handle for thinking about how dif-

ferent groups occupy the same space depending on the time of day.

Bertolini and Dijst offer the example of a mixed-use area inhabited by

young couples without children and by families. The young couples

will likely work in another part of the city, returning perhaps only in

the evening to socialize in bars and restaurants. In contrast, family

members will go shopping and use other services during the day in

this area. The same space can thus have two or more distinct uses and

populations.

As another way to conceptualize these rhythms, Robbert Zand-

vliet and Dijst offer space-time typologies.2 That is, they propose “a

typology of urban, suburban, and rural municipalities … based on

diurnal weekday variations in visitor populations”2 as a way to un-

derstand how place works in Manuel Castells’ “network society.”3

REFERENCES

1. L. Bertolini and M. Dijst, “Mobility Environments and Network Cities,” J.Urban Design, vol. 8, no. 1, 2003, pp. 27–43.

2. R. Zandvliet and M. Dijst, “Short-Term Dynamics in the Use of Places: ASpace-Time Typology of Visitor Populations in the Netherlands,” UrbanStudies, vol. 43, no. 7, 2006.

3. M. Castells, “Grassrooting the Space of Flows,” Cities in the Telecommu-nications Age, J. Wheeler, Y. Ayoma, and B. Warf, eds., Routledge, 2000,pp. 18–27.

Chronotypes and Space-Time Typologies

fied with Roman residents, such as thePiazza Bologna and Tiburtina, displaylower levels of day-to-day Erlang variancethan those associated with more transientpopulations of commuters or tourists,such as Termini and the Pantheon. Thispeculiarity suggests that the greater theflux of people—or, possibly, nonresi-dents—through a site, the greater the vari-ance in the signal. Conversely, predomi-nantly “local” areas seem to have higherlevels of routine or habitual activity andthus less variation between days.

However, in spite of the differencesbetween weekdays, Fridays, and theweekend, figure 3 shows that all six loca-tions demonstrate a broadly compara-ble rhythm—a rapid ramping-up oftelecommunications activity between 6and 10 a.m. on weekdays and a slowerpace on weekends. Apart from the Sta-dio Olimpico, where the rhythms of con-certs and football matches clearly showon the graph, patterns are quite uniform:a clear double peak and varying ratiosof weekday-to-weekend activity. So, howcan we make differences in signaturesbetween different sites more evident?

NormalizationThe magnitude of the differences be-

tween sites in figure 3 makes it hard tocompare them in a more detailed way,so some type of data normalization isnecessary. In figure 4, we plot the ratio oftelecommunications intensity at onepixel against the average of every pixel inthe system at that point in time (nor-malization over space). We then comparethat to the daily pixel average (normal-ization over time). Using this approach,we can identify otherwise hidden shiftsin the relative intensity of activity acrossRome.

We employed these normalization steps:

1. For each location � and day �, wecalculate

2. We then normalize the signatureover space:

3. We then normalize the signature overtime:

The differences in figure 4 are quitevisible, and the radically different signa-ture at the Stadio Olimpico indicates thatyou can readily recognize certain classesof urban activity by the unusual distri-

bution of telecommunications activity.Such events will likely place a corre-spondingly high load on urban infra-structure and resources, and a similarspike in Erlang data is also likely duringemergencies. This suggests that the real-time recognition of unusual concentra-tions of telecommunications activitymight have relevance for public safetyplanning and transport scheduling.

On weekends, the higher levels ofbandwidth use between midnight and 2a.m. near the Pantheon and in Traste-vere relative to work-oriented and resi-dentially oriented pixels strongly suggestleisure activity. This feature suggests thatwe can also identify cultural and leisureareas on the basis of their telecommuni-cations signature. Another feature withimplications for the understanding ofurban dynamics is the high level of activ-ity at the transit hubs on weekday morn-ings, compared to residential sites.

Figure 5 shows a more diffuse patternof spatial activity on weekends. This is

τ ∈ → =T erlang

erlangnorm space time

norm spac

96 _ _

_ ee

T norm spacemean erlang

δ λ τ

δ λ ττ

, ,

, ,_

( )( ){ }

∈96

erlang

erlang

mean erla

norm space

Loc

_

, ,

=

( )∈

δ λ τ

λnng δ λ τ, ,( ){ }

mean erlangLocλ

δ λ τ∈

( ){ }, ,

JULY–SEPTEMBER 2007 PERVASIVEcomputing 33

0 Selected pixels for analysis625 Subway stations1,250 2,500 meters

Figure 2. A map of Rome indicating the six locations (“pixels”) selected foranalysis of mobile phone usage.

consistent with the idea that althoughweekday telecommunications activity ateach site exhibits a more dynamic tempo-ral pattern, weekend activity exhibits morespatial dispersal. From an urban-planningstandpoint, this strongly suggests largecommuter flows into the central businessdistrict during the week and more resi-dentially oriented activity on weekends.Of course, planners are well aware of thisspatial relationship, but spatial and tem-poral visualization of these features at thisscale hasn’t been possible before.

One caveat: the levels of activity be-tween 3 and 6 a.m. throughout the weekmean that any analysis using that periodwould be rooted in extremely low Erlangvalues. So, such a comparison might erro-neously indicate excessive shifts in activ-ity from site to site. Nonetheless, from thisinitial analysis, it seems that through nor-malized signatures we can reconstructsome of the functioning of the city usingthe invisible fingerprints of mobile phoneinfrastructure.

Cluster analysisSo far, we’ve focused largely on indi-

vidual pixels, and we’ve identified someinteresting features at a fairly detailedspatial level. Our preliminary analysisindicates that residential areas, com-muter hubs, nighttime hot spots, andeven special-event venues demonstratefeatures consistent with our contextual,anecdotal knowledge of Rome. How-ever, validating our hypotheses requiresa more rigorously quantitative study.The ultimate goal is to take the derivedsignatures, group them by degree of sim-ilarity, and map them to urban spa-tiotemporal structures.

As a proof of concept, we created asimplified vector—required for compu-tational manageability—to feed pixeldata for each of Rome’s 262,144 pixelsto a clustering algorithm. An examina-tion of our six selected pixels suggestedthat six times in the daily cycle of Erlangactivity are particularly significant: 1a.m., 7 a.m., 11 a.m., 2 p.m., 5 p.m., and

9 p.m. Each of these points lies towardthe middle of a period of rapid changeor significant variation between sites—the early morning rise in activity, latemorning peak period, early afternoonlull, afternoon peak, and evening drop.The six normalized Erlang values thusmake up the coordinates of a vector thatdescribes, in a limited way, each pixel’ssignature.

We could use many clustering tech-niques to create segmentations based onthe affinity between vectors. We chose aK-Means approach, such that everyobservation in a cluster is as much likeother members of that cluster and as dif-ferent as possible from members of anyother cluster. With six coordinates fromeach day, and separate sets of coordi-nates for Monday through Thursday(one set of averaged observations), Fri-day, Saturday, and Sunday, the K-Meansalgorithm used a 24-dimensional space.

We employed two clustering steps.First, for each pixel, we calculate fea-

34 PERVASIVEcomputing www.computer.org/pervasive

U R B A N C O M P U T I N G

Stadio Olimpico Pantheon Piazza Bologna

Termini Tiburtina Trastevere

12 a.m. 5 a.m. 10 a.m. 3 p.m. 8 p.m. 12 a.m. 5 a.m. 10 a.m. 3 p.m. 8 p.m. 12 a.m. 5 a.m. 10 a.m. 3 p.m. 8 p.m.

Monday Tuesday Wednesday Thursday Friday Saturday Sunday

Erla

ng (n

atur

al s

cale

)Er

lang

(nat

ural

sca

le)

Figure 3. Erlang data for the six locations by day of week.

ture(loc) = {erlang(�, �, j)}, j = 1 a.m., 7a.m., 11 a.m., 2 p.m., 5 p.m., and 9 p.m.

Second, the K-Means clustering algo-rithm partitions the pixels into mutuallyexclusive clusters. Each cluster is charac-terized by its centroid, and the algorithmaims to minimize the error function:

where clusterk is the set of objects relatedto the cluster k, and centroidk is the meanof all the points in clusterk. We calculatedthe distance between pixels using thesquared Euclidean distance:

As a result of the clustering process, wecan group all pixels in the city into any arbi-trary number of groups based on the affin-ity of their composite Erlang signature. Inour tests, we found a mix of clusters thatsuggest a complex set of relationshipsbetween signatures. Given the sheer com-

plexity of cities, this is hardly surprising.However, the existence of several smallclusters with much stronger levels of affil-iation or differentiation indicates that theoverall data set includes some quite distinctsignatures. These signatures will likely mapto distinct types of urban activity.

For this initial research, we worked

with eight clusters as a compromisebetween simplicity and specificity. Doingthis gave us a fair cophenetic correlationvalue of 0.7704. Cophenetic correlationis one way to gauge the clusters’ fit to theoriginal data set—values approaching1.0 suggest a good fit—by comparingpairwise linkages between observations.

distance loc loc

feature loc feature li

1 2

1

,( ) =

( ) − ooci2

2

1

241

2

( )⎛

⎝⎜⎜

⎠⎟⎟

=∑τ

distance loc centroidj kloc Clusterk

n

j k

,( )∈=∑

2

1

CClusters

JULY–SEPTEMBER 2007 PERVASIVEcomputing 35

0

1

2

3

4

12 a.m. 5 a.m. 10 a.m. 3 p.m. 8 p.m.

12 a.m. 5 a.m. 10 a.m. 3 p.m. 8 p.m.

0

1

2

3

4

Termini

Piazza Bologna

Trastevere

Tiburtina

Pantheon

Stadio Olimpico

(b)

(a)No

rmal

ized

Erla

ngNo

rmal

ized

Erla

ng

Figure 4. Erlang data normalized overspace and time by site: (a) Mondaythrough Thursday and (b) Saturday.

Satu

rday

Mon

day

to T

hurs

day

11 a.m.7 a.m.1 a.m. 2 p.m. 5 p.m. 9 p.m.

Figure 5. Erlang data for Rome normalized over space and time. Intensities range from low (blue) to high (red).

Projecting these clusters onto a mapof Rome (see figure 6) naturally indi-cates that they’re closely linked to thenormalized Erlang signatures. Theedges of Rome’s urban core are clearlyvisible, as are the hot spots of urbanactivity straddling the Tiber River. Themap suggests an overall structure to thecity, with a correspondence betweenlevels of telecommunications activityand types of human activity. At thispoint, however, we can’t verifiably con-nect cellular signatures to specific typesof human activity.

We then adjusted the metric to favorthe two most distinctive types of useseen in the normalized graphs: earlymorning use suggestive of commutingbehavior and late evening use suggestiveof nighttime leisure activities. For theseclusters we obtained cophenetic corre-lations of 0.7630 and 0.8508, indicat-ing that the clustering approach has sub-stantial promise.

The red nighttime-leisure cluster in fig-ure 7a shows two discrete spatial group-ings that map anecdotally to knownareas of evening activity: Trastevere andthe area ranging to the west and southof the Piazza Navona, and the vicinityof the Piazza Spagna. The red commuterclusters in figure 7b quite astonishingly

map to the most important points ofentry to the city by car and train: Ter-mini station, Tiburtina, the end of theCorso d’Italia, the Porta Maggiore, andthe Porta San Giovanni.

DiscussionOur preliminary findings suggest that

signature analysis can provide an impor-tant new way of looking at the city as aholistic, dynamic system. In particular, themobile phone network lets us develop areal-time representation of those dynam-ics at the city and city-region scale. Thisapproach can complement traditional col-lection techniques, which are often out-dated by the time they’re available to pol-icy makers and the general public. Ofcourse, because our hypotheses so far arebased on anecdotal evidence, our findingswill require additional validation, whichwe outline below.

What’s most promising about thisearly research is the extent to which ourfindings seem to parallel those of otherEuropean researchers4,5 as well as moreconceptual research into telecommuni-cations’ impacts on urban behaviors.6,7

In particular, we can characterize areason the basis of flows and dynamicsrather than on the basis of comparativelystatic physical or demographic features.

Moreover, we’ve recently receiveddata from Pagine Gialle (the Italian Yel-low Pages) with which we intend to val-idate our initial findings by linking thesignatures to spatial data on businesstypes and densities. In so doing, we canbuild on the processing requirements wediscussed earlier in this article:

1. Antenna and pixel values must benormalized over both space andtime to provide a measure of rela-tive telecommunications intensity.

2. The substantial differences betweenweekdays and weekends requiretreating them separately in a classi-fication algorithm.

3. The key time periods intimated inthis initial analysis appear to be 12to 2 a.m., 5 to 8 a.m., 10 a.m. to 12p.m., 2 to 6 p.m., and 8 to 11 p.m.However, as our initial cluster analy-sis makes clear, these aren’t the onlyfactors.

We expect several other analyticalapproaches to yield insights into net-work usage patterns. One of the mostpromising approaches is Eigenbehavioranalysis.8 Because we can easily map thesignature to a vector representation ofthe sort already used in the cluster analy-

36 PERVASIVEcomputing www.computer.org/pervasive

U R B A N C O M P U T I N G

Cluster 1 Cluster 2 Cluster 3 Cluster 4 Cluster 5 Cluster 6 Cluster 7 Cluster 8

(b) (c)(a)

Figure 6. Analysis of eight clusters of Erlang data: (a) clusters 1–4; (b) a satellite view of Rome, for comparison; (c) clusters 5–8.

sis, deriving the Eigenvectors should bequite straightforward. Applying otheranalytical techniques such as Fourier andwavelet transform plots might revealnew, distinct characteristics. Ideally, eachof these analyses will eventually feed intoa single categorization process that candiscriminate between discrete types ofbehavior at the antenna and pixel levels.

Limitations of researchAs we mentioned before, TI’s masking

function meant that we weren’t able towork with true Erlang values. An addi-tional constraint is that owing to opera-tional requirements and planning re-strictions, GSM masts are irregularlydistributed and oriented. To manage thecomputational and mathematical com-plexity of calculating point values for262,144 pixels over a three-month per-iod, our algorithm spreads Erlang datathrough all 360 degrees, producing a pos-sible skew in the overall distribution.

Finally, not all masts handle both the900- and 1,800-Hz bands used inEurope. So, some network activity mightgravitate toward more physically remotebase stations with the hardware toprocess calls in a particular band. Wedon’t have data that would let us com-pensate for these possible biases. So,without adopting an entirely differentapproach to data collection—one thatthe network operator would have beenreluctant to support at this developmentstage—localizing phones more accu-rately is impossible.

Although the data to which we cur-rently have access has clear, substantiallimitations, we believe our approach rep-resents an appropriate trade-off betweenlocational specificity and implementa-tional feasibility. Fortunately, analysis atthe city and city-regional scale doesn’tdepend on the high level of accuracy that

location-based services typically require.So, we feel that there’s plenty of oppor-tunity to gain valuable insights intourban dynamics using Erlang data atsmaller scales, and we intend to moveforward with it.

It would be exciting to compare thesignatures collected from Romewith similar data from other majorEuropean cities such as London,

Paris, or Frankfurt. For instance, it’s rea-sonable to expect that cities with moredistinct spatial patterns of human activ-ity might display correspondingly moredistinct patterns of network use andmore readily classifiable signatures.Unfortunately, at this time commercialconsiderations appear to preclude usingdata from other network operators.

This issue highlights the extent towhich research using cellular networksmust take nonscientific factors intoaccount. First, a policy framework at thenational or European level that encour-ages networks to share nonidentifiabledata with planning and policy researcherswould be immensely helpful. Clearly,there are important considerations fromthe standpoint of commercial confiden-tiality, personal privacy, and possiblyeven national security. However, in theabsence of clear regulatory guidance, fur-

ther research using cell phones—the mostwidely deployed device with locationalcapabilities—won’t be possible at the cityor city-region scale.

Without encouragement, far moredetailed data sets held by the networkswill never see the light of day. For in-stance, paging data—generated bypolling the phones in a cell to obtain alist of IMEI (International Mobile Equip-ment Identity) numbers at the mastlevel—could provide unmatched detailon travel origins and destinations, andon population densities. By scramblinghandset identifiers with changing encryp-tion schemes, reporting only partial tra-jectories, and never reporting on cells orpaths containing fewer than an agreedminimum number of users, you wouldbe able to perform this kind of researchwithout compromising personal privacy.

This data would also assist enormouslyin understanding how individual andgroup behavior changes over time andspace. This would not only shed furtherlight on the rhythms of urban life but alsoaddress the fact that you can’t derive met-rics on activity and population densitiesfrom Erlang data alone.

The challenge is that as the databecomes more useful, it also becomesmore sensitive to both operators and endusers. An all-or-nothing approach to pri-vacy has hampered this discussion.

JULY–SEPTEMBER 2007 PERVASIVEcomputing 37

Nighttime-leisure-weighted clusters Transit-weighted clusters

Cluster 1 Cluster 2 Cluster 3 Cluster 4 Cluster 5

(b)(a)

Figure 7. Analysis of the five clusters covering Erlang data for the two mostdistinctive types of cell phone use: (a) nighttime leisure, (b) early morningcommuting.

38 PERVASIVEcomputing www.computer.org/pervasive

U R B A N C O M P U T I N G

It would be helpful to move toward amore nuanced understanding of how topreserve reasonable expectations of pri-vacy by the network user while creatingmechanisms to permit future research.We need to establish the extent to whichcertain types of data and analysis createeither the perception of a privacy inva-sion9 or the real risk of trail reidentifi-cation,10 and to set out the trade-offs forpublic review and discussion.

ACKNOWLEDGMENTSThe Real Time Rome exhibit at the 10th Interna-tional Architecture Exhibition at Venice, Italy was

sponsored by Telecom Italia, with technical part-ners Biennale di Venezia, the City of Rome, Google,Atac-Rome, Samarcanda, and Microtek. We grate-fully acknowledge the significant contributions ofBurak Arikan, Assaf Biderman, Filippo Dal Fiore, Sa-ba Ghole, Daniel Gutierrez, Sonya Huang, SriramKrishnan, Justin Moe, Francisca Rojas, and NajibMarc Terazi.

We also thank Matthew Jull, Tim Kindberg, and ouranonymous reviewers for their valuable feedback;Giovanni Celentano of the University of Naples andLucio Pinto of the Silvio Tronchetti Provera Founda-tion for their generous suggestions; and Peter Halland Michael Batty of University College London andLarry Vale of the Massachusetts Institute of Techno-logy for fostering a collaborative research environ-ment. And a final thank-you to the Balzan Founda-tion, whose support for young researchers throughthe Balzan Prize helped enable this transatlanticcollaboration.

REFERENCES1. F. Calabrese and C. Ratti, “Real Time

Rome,” Networks and CommunicationStudies, vol. 20, nos. 3 & 4, 2006, pp.247–258.

2. C. Ratti et al., “Mobile Landscapes: UsingLocation Data from Cell-Phones for UrbanAnalysis,” Environment and Planning B:Planning and Design, vol. 33, no. 5, 2006,pp. 727–748.

3. F. Calabrese et al., “Real-Time Urban Monitoring Using Cellphones: A Case-Study in Rome,” SENSEable Working Paper, SENSEable City Laboratory, Mass-achusetts Inst. of Technology, 2007; http://senseable.mit.edu/papers/pdf/CalabreseRatti2007SCLWorkingPaper.pdf.

4. L. Bertolini and M. Dijst, “Mobility Envi-ronments and Network Cities,” J. UrbanDesign, vol. 8, no. 1, 2003, pp. 27–43.

5. R. Zandvliet and M. Dijst, “Short-TermDynamics in the Use of Places: A Space-Time Typology of Visitor Populations in theNetherlands,” Urban Studies, vol. 43, no. 7,2006.

6. S. Graham, “Cities in the Real-Time Age:The Paradigm Challenge of Telecommuni-cations to the Conception and Planning ofUrban Space,” Environment and PlanningA, vol. 29, no. 1, 1997, pp. 105–127.

7. M. Sheller and J. Urry, “The New Mobili-ties Paradigm,” Environment and PlanningA, vol. 38, no. 2, 2006, pp. 207–226.

8. N. Eagle and A. Pentland, “Eigenbehaviors:Identifying Structure in Routine,” 2006;http://vismod.media.mit.edu/tech-reports/TR-601.pdf.

9. M. Langheinrich, “Privacy Invasions inUbiquitous Computing,” 2002; http://guir.berkeley.edu/pubs/ubicomp2002/privacyworkshop/papers/uc2002-pws.pdf.

10. B. Malin, “Betrayed by My Shadow: Learn-ing Data Identity via Trail Matching,” J.Privacy Technology, 2005, pp. 1–22; www.jopt.org/publications/20050609001_malin_abstract.html.

For more information on this or any other comput-ing topic, please visit our Digital Library at www.computer.org/publications/dlib.

the AUTHORSJonathan Reades is an MPhil and a PhD candidate at the Bartlett School of Planningat University College London. His research interests are the application of mobilephone data to topics in urban planning such as business clustering and communica-tion, and the spatiotemporal structures of European cities. He previously spent eightyears in data analytics, helping telecom firms use their data for targeted marketing.His first degree was in comparative literature at Princeton University; he’s a studentmember of both the Royal Town Planning Institute and the Town and Country Plan-ning Association. Contact him at the Planning Dept., 4th fl., The Bartlett School,Wates House, 22 Gordon St., London WC1H 0QB, UK; [email protected].

Francesco Calabrese is pursuing a PhD in informatics and automatics engineeringfrom the University of Naples Federico II and is a visiting doctoral student at the Mass-achusetts Institute of Technology’s SENSEable City Laboratory. His research interestsinclude hybrid control systems, embedded control systems, computer–numerical-control machines, finite-time stability, and real-time monitoring and analysis of urbandynamics. He received his Laurea in informatics engineering from the University ofNaples Federico II. He’s a member of the IEEE and the IEEE Control Systems Society.Contact him at the SENSEable City Laboratory, MIT 10-485, 77 Massachusetts Ave.,Cambridge, MA, 02139; [email protected].

Andres Sevtsuk is a PhD candidate in city design and development and urban infor-mation systems at the Massachusetts Institute of Technology. His research interestsinclude accuracy and reliability in using mobile phone data for estimating the distri-bution of people in cities, the relationship between urban form and movement, andthe study of new possibilities in street parking and one-way vehicle rentals using dis-tributed computation and sensing. He received his MS in architecture studies fromMIT. Contact him at 7 Dodge St., Cambridge, MA 02139; [email protected].

Carlo Ratti is an associate professor of the practice of urban technologies at the Mass-achusetts Institute of Technology, where he directs the SENSEable City Laboratory. He’salso founding partner and director of carlorattiassociati, an architectural firm. He re-ceived his PhD in architecture from the University of Cambridge. He’s a member of theOrdine degli Ingegneri di Torino and the Association des Anciens Elèves de l’École Na-tionale des Ponts et Chaussées, and is a UK Registered Architect. Contact him at theSENSEable City Laboratory, MIT 10-485, 77 Massachusetts Ave., Cambridge, MA 02139;[email protected].