Cellular Data Network Infrastructure Characterization and Implication on Mobile Content Placement
Qiang Xu, University of Michigan
Junxian Huang, University of Michigan
Zhaoguang Wang, University of Michigan
Feng Qian, University of Michigan
Alexandre Gerber, AT&T Labs Research
Z. Morley Mao, University of Michigan
ABSTRACT
Despite the tremendous growth in cellular data network usage due to the popularity of smartphones, so far there is rather limited understanding of the network infrastructure of various cellular carriers. Understanding infrastructure characteristics such as the network topology, routing design, address allocation, and DNS service configuration is essential for predicting, diagnosing, and improving cellular network services, as well as for delivering content to the growing population of mobile wireless users. In this work, we propose a novel approach for discovering cellular infrastructure by intelligently combining several data sources, i.e., server logs from a popular location search application, active measurement results collected from smartphone users, DNS request logs from an authoritative DNS server, and publicly available routing updates. We perform the first comprehensive analysis to characterize the cellular data network infrastructure of four major cellular carriers within the U.S.
We conclude, among other previously little-known results, that the current routing of cellular data traffic is quite restricted, as it must traverse a rather limited number (i.e., 4-6) of infrastructure locations (i.e., GGSNs), which is in sharp contrast to wireline Internet traffic. We demonstrate how such findings have direct implications for important decisions such as mobile content placement and content server selection. We observe that although the local DNS server is a coarse-grained approximation of the user's network location, for some carriers, choosing content servers based on the local DNS server is accurate enough due to the restricted routing in cellular networks. Placing content servers close to GGSNs can potentially reduce the end-to-end latency by more than 50%, excluding the variability from the air interface.
Categories and Subject Descriptors
C.2.1 [Network Architecture and Design]: Wireless communication; C.2.3 [Network Operations]: Network monitoring; C.4 [Performance of Systems]: Measurement techniques; C.4 [Performance of Systems]: Reliability, availability, and serviceability
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
SIGMETRICS'11, June 7-11, 2011, San Jose, California, USA.
Copyright 2011 ACM 978-1-4503-0262-3/11/06 ...$10.00.
General Terms
Experimentation, Measurement, Performance
Keywords
Cellular network architecture, GGSN placement, Mobile content delivery
1. INTRODUCTION
On the Internet, IP addresses indicate to some degree the identity and location of end-hosts. IP-based geolocation is widely used in different types of network applications such as content customization and server selection. Using IP addresses to geolocate wireline end-hosts is known to work reasonably well despite the prevalence of NAT, since most NAT boxes serve only a few hosts. However, one recent study exposed very different characteristics of IP addresses in cellular networks, i.e., cellular IP addresses can be shared across geographically disjoint regions within a short time duration. This observation suggests that cellular IP addresses do not carry geographic information at a sufficiently high fidelity. Moreover, it implies that only a few IP gateways may exist for cellular data networks, and that IP address management is much more centralized than in wireline networks, for which tens to hundreds of Points of Presence (PoPs) are spread out at geographically distinct locations.
There is a growing need to improve mobile content delivery, e.g., via a content distribution network (CDN) service, given the rapidly increasing mobile traffic volume and the fact that the performance perceived by mobile users is still much worse than that of DSL/Cable wireline services. For mobile content, the radio access network, the cellular backbone, and the wireline Internet all have an impact and leave room for further improvement [2, 1, 30]. A first necessary step is to understand the cellular network structure.
The lack of geographic information in cellular IP addresses brings new challenges for mobile service providers, who attempt to deliver content from servers close to users. First, it is unclear where to place the content servers. As shown later, cellular data networks have very few IP gateways. Therefore, it is critical to first identify those IP gateways to help decide where to place content servers. Second, unlike in wireline networks, cellular IP addresses themselves often cannot accurately convey a user's location, which is critical information needed by the CDN service to determine the closest server. In this work, we show how these challenges can be addressed by leveraging knowledge of the cellular network infrastructure.
Despite their growing popularity, cellular data networks have not been explored much by the research community to explain the dynamics of cellular IP addresses. The impact of the cellular architecture on the performance of a diverse set of smartphone network applications and on cellular users has been largely overlooked. In this study, we perform the first comprehensive characterization study of the cellular data network infrastructure to explain the diverse geographic distribution of cellular IP addresses, and to highlight the key importance of the network infrastructure design decisions that affect the performance, manageability, and evolvability of the network architecture. Understanding the current architecture of cellular data networks is critical for future improvement.
Since the diversity in the geographic distribution of cellular IP addresses observed in the previous study indicates that there may exist very few cellular IP data network gateways, identifying the location of these gateways becomes the key to cellular infrastructure characterization in our study. The major challenge is exacerbated by the lack of openness of such networks. We are unable to infer topological information using existing probing tools. For example, merely sending traceroute probes from cellular devices to Internet IP addresses exposes mostly private IP addresses along the path within the UMTS architecture. In the reverse direction, only some of the IP hops outside the cellular networks respond to traceroute probes.
To tackle these challenges, instead of relying on those cellular IP hops, we use the geographic coverage of cellular IP addresses to infer the placement of IP gateways, following the intuition that cellular IP addresses with the same geographic coverage are likely to have the same IP allocation policy, i.e., they are managed by the same set of gateways. To obtain the geographic coverage, we use two distinct data sources and devise a systematic approach for processing the data and reconciling potential conflicts, combined with other data obtained via simple probing and passive data analysis. Our approach of deploying a lightweight measurement tool on smartphones provides network information from the perspective of cellular users. Combining this data source with a location search service of a cellular content provider further enhances our visibility into the cellular network infrastructure.
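The intuition above can be illustrated with a minimal sketch. It is not the clustering procedure used in this study: it assumes that each observation is a hypothetical (address block, region) pair, and it swaps in a simple greedy grouping on Jaccard similarity of region sets. The prefixes, region labels, function names, and the 0.5 threshold are all made up for illustration.

```python
from collections import defaultdict

def coverage_by_prefix(records):
    """Map each address block to the set of regions where it was observed."""
    coverage = defaultdict(set)
    for prefix, region in records:
        coverage[prefix].add(region)
    return dict(coverage)

def jaccard(a, b):
    """Jaccard similarity of two region sets (1.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def cluster_prefixes(coverage, threshold=0.5):
    """Greedy grouping: a prefix joins the first cluster whose representative
    coverage is similar enough, otherwise it starts a new cluster.
    The threshold is an illustrative assumption, not a value from the paper."""
    clusters = []  # list of (representative_regions, member_prefixes)
    for prefix in sorted(coverage):
        regions = coverage[prefix]
        for representative, members in clusters:
            if jaccard(representative, regions) >= threshold:
                members.append(prefix)
                break
        else:
            clusters.append((set(regions), [prefix]))
    return [members for _, members in clusters]

# Hypothetical observations: (address block, region where a user was seen).
records = [
    ("10.1.0.0/24", "MI"), ("10.1.0.0/24", "OH"), ("10.1.0.0/24", "IL"),
    ("10.1.1.0/24", "MI"), ("10.1.1.0/24", "OH"),
    ("10.2.0.0/24", "CA"), ("10.2.0.0/24", "NV"),
    ("10.2.1.0/24", "CA"), ("10.2.1.0/24", "NV"), ("10.2.1.0/24", "AZ"),
]
clusters = cluster_prefixes(coverage_by_prefix(records))
print(clusters)
```

In this toy input, the two blocks seen mostly in the Midwest end up in one group and the two seen in the Southwest in another. Under the intuition stated above, each group would be a candidate set of addresses managed by the same gateways.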
One key contribution of our work is the measurement methodology for characterizing the cellular network infrastructure, which requires finding the relevant address blocks, locating them, and clustering them based on their geographic coverage. This enables the identification of the IP gateways within cellular data networks, corresponding to the first several outbound IP hops used to reach the rest of the Internet. Our work parallels many past studies of Internet topology characterization, such as the Rocketfuel project, which characterized ISP topologies, but our problem poses additional challenges due to the lack of publicly available information and the difficulties in collecting relevant measurement data. We enumerate our key findings and major contributions below.
• We designed and evaluated a general technique for distinguishing cellular users from WiFi users using smartphones and further differentiating network carriers based on cellular IP addresses. Other heuristics query the whois database for IP addresses and distinguish cellular carriers based on keywords such as "mobility" and "wireless" in the organization name. In contrast, our technique collects the ground truth observed by smartphone devices by deploying a lightweight measurement tool for popular smartphone OSes. Distributed as a free application on major smartphone application markets, it can identify the carrier name for 99.97% of the records of a popular location search application, which has 20,000 times more records than our application.
• We comprehensively characterized the cellular network infrastructure for four major U.S. carriers, including both UMTS and EVDO networks, by clustering their IP addresses based on their geographic coverage. Our technique relies on the device-side IP behavior easily collected through our lightweigh