
Towards Street-Level Client-Independent IP Geolocation

Yong Wang, UESTC/Northwestern

Daniel Burgener, Northwestern

Marcel Flores, Northwestern

Aleksandar Kuzmanovic, Northwestern

Cheng Huang, Microsoft Research

http://networks.cs.northwestern.edu

Aleksandar Kuzmanovic Towards Street-Level Client-Independent IP Geolocation

Problem and Motivation

How to accurately locate IP addresses on the Internet?

Host-dependent solutions:
– GPS
– WiFi (e.g., Google My Location, Skyhook)

Host-independent solutions:
– The server cannot always expect clients’ cooperation
– Applications:
  • Security / access restrictions
  • Online service access analytics
  • Location-based online advertising


A Scenario of Street-Level Online Advertising


[Figure: map showing the user’s location and nearby local businesses]


Prior Work

Constraint-Based Geolocation (CBG) [ToN 06]
– Median error distance = 228 km
– Measures delays from active vantage points

Topology-Based Geolocation [IMC 06]
– Median error distance = 67 km
– CBG + network topology information

Octant [NSDI 07]
– Median error distance = 35.2 km
– CBG + routers’ locations, geographic and demographic information


Methodology Highlights

Our methodology is based on two insights:

– Websites often provide the actual geographical location of associated entities
  • E.g., universities, businesses, government offices
  • We develop methods to determine whether Web or e-mail servers reside at the corresponding locations

– Relative network delays highly correlate with geographical distances
  • Absolute network delay measurements are fundamentally limited in their ability to achieve fine-grained geolocation results
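One way to make “relative delays correlate with distance” concrete is a rank correlation, which scores only whether the ordering of delays matches the ordering of distances and ignores their absolute values. The sketch below computes Spearman's rho under the simplifying assumption that there are no tied values; the function names and the sample delays and distances are hypothetical, not taken from the authors' system.

```python
def ranks(values):
    """Return 1-based ranks of values (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_rho(xs, ys):
    """Spearman rank correlation for tie-free samples of equal length."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two equal-length samples with at least 2 items")
    rx, ry = ranks(xs), ranks(ys)
    d_squared = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Hypothetical delays (ms) to four landmarks and their distances (km) to a target.
delays_ms = [0.4, 1.2, 3.0, 2.1]
distances_km = [0.1, 2.5, 9.8, 10.6]
rho = spearman_rho(delays_ms, distances_km)
```

Here rho is 0.8: the orderings agree except for one swap between the two farthest landmarks. A score near 1 means the nearest-by-delay landmark tends to be the nearest on the map, even if the absolute delay-to-distance mapping is poor.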


Institutional Network Example

[Figure: an institutional network in which a router connects the IP subnet, mail server, and web server to the external network; the institution’s postal address (550 South Hill Street Suite 890, Los Angeles, CA 90013) is obtained via Web cloud-sourcing]
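Tying a server to the postal address published on the institution's own website starts with pulling that address out of page text. The toy sketch below handles only the US “City, ST 12345” tail of an address; the regular expression and function name are hypothetical and are not the authors' landmark-mining pipeline, which would need far more robust parsing.

```python
import re

# Hypothetical, deliberately simple pattern for a US "City, ST 12345" tail:
# a capitalized city name, a comma, a two-letter state code, and a ZIP(+4) code.
US_CITY_STATE_ZIP = re.compile(
    r"\b([A-Z][A-Za-z .'-]*[A-Za-z]),\s*([A-Z]{2})\s+(\d{5})(?:-\d{4})?\b"
)


def extract_city_state_zip(page_text):
    """Return (city, state, zip) tuples found in a page's text."""
    return US_CITY_STATE_ZIP.findall(page_text)


page = "Contact us: 550 South Hill Street Suite 890, Los Angeles, CA 90013. Tel: ..."
found = extract_city_state_zip(page)
```

For the sample page, found is [("Los Angeles", "CA", "90013")]. The ZIP code alone already narrows the landmark to a small area; the street part would be needed for the finer placement the slides target.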


The Role of Relative Network Delays

[Figure: comparison of measured delays]


A Case Study

Target IP address: 38.100.25.196

Target postal address: 1850 K Street NW, Washington, DC 20006


Three-Tier Geolocation System


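Tier 1

Goal: Find the coarse-grained region for the targeted IP

– Measure delays from vantage points to the target
– Convert the measured delays into geographical distances
– Create the intersection of the resulting regions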
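A minimal sketch of Tier 1 in the spirit of CBG, assuming a fixed conversion of 100 km per millisecond of RTT (signals travel at roughly two-thirds of the speed of light in fiber, about 200 km/ms, halved for the round trip) rather than a calibrated delay-to-distance mapping. The intersection is approximated by testing a list of candidate points; the vantage points, RTTs, and candidates are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0
# Assumption: ~200 km/ms one-way propagation, so RTT (ms) * 100 km is a loose
# upper bound on the great-circle distance between vantage point and target.
KM_PER_MS_RTT = 100.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def feasible_region(vantage_points, candidates):
    """Keep candidate points that lie inside every vantage point's delay circle.

    vantage_points: list of (lat, lon, rtt_ms) tuples.
    candidates: list of (lat, lon) points to test, e.g. a coarse grid.
    """
    region = []
    for lat, lon in candidates:
        if all(haversine_km(vlat, vlon, lat, lon) <= rtt * KM_PER_MS_RTT
               for vlat, vlon, rtt in vantage_points):
            region.append((lat, lon))
    return region


vantage_points = [(40.71, -74.01, 4.0), (41.88, -87.63, 12.0)]  # New York, Chicago
candidates = [(38.90, -77.04), (34.05, -118.24)]  # Washington DC, Los Angeles
region = feasible_region(vantage_points, candidates)
```

Only the Washington DC candidate survives: it is about 330 km from New York and about 960 km from Chicago, inside the assumed bounds of 400 km and 1,200 km, while Los Angeles is thousands of kilometres outside the New York circle.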


Three-Tier Geolocation System


Tier 2

Goal: Use passive landmarks to determine a finer-grained region for the targeted IP

– Estimate the delay between landmarks and the target
– Create a new intersection
– Populate the intersection with landmarks

D1 + D2 < D3 + D4
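The inequality D1 + D2 < D3 + D4 compares two estimated landmark-to-target delays. A sketch of one way to form such an estimate, assuming each sum is the delay from the last router shared by traceroutes to the target and to a landmark (both from the same vantage point) down to each endpoint. The path format, hop names, and helper names are hypothetical.

```python
from typing import List, Optional, Tuple

Hop = Tuple[str, float]  # (router or host identifier, RTT in ms from the vantage point)


def last_common_hop(path_a: List[Hop], path_b: List[Hop]) -> Optional[Tuple[str, float, float]]:
    """Last hop on path_a that also appears on path_b, with its RTT on each path."""
    rtt_on_b = {hop: rtt for hop, rtt in path_b}
    common = None
    for hop, rtt in path_a:
        if hop in rtt_on_b:
            common = (hop, rtt, rtt_on_b[hop])
    return common


def relative_delay(to_target: List[Hop], to_landmark: List[Hop]) -> Optional[float]:
    """Estimate delay(router -> target) + delay(router -> landmark) via the shared router."""
    common = last_common_hop(to_target, to_landmark)
    if common is None:
        return None
    _, rtt_router_t, rtt_router_l = common
    d_target = to_target[-1][1] - rtt_router_t
    d_landmark = to_landmark[-1][1] - rtt_router_l
    # Clamp at zero: RTT jitter can make a later hop look faster than an earlier one.
    return max(d_target, 0.0) + max(d_landmark, 0.0)


# Hypothetical traceroutes from one vantage point.
to_target = [("r1", 1.0), ("r2", 5.0), ("r-metro", 8.0), ("target", 9.5)]
to_lm_a = [("r1", 1.0), ("r2", 5.0), ("r-metro", 8.2), ("landmark-a", 9.0)]
to_lm_b = [("r1", 1.1), ("r2", 5.1), ("r-other", 12.0), ("landmark-b", 14.0)]
closer = "A" if relative_delay(to_target, to_lm_a) < relative_delay(to_target, to_lm_b) else "B"
```

Landmark A shares the metro router with the target, so its estimate (about 2.3 ms) is far below landmark B's (about 13.4 ms), whose paths diverge much earlier.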


Three-Tier Geolocation System


Tier 3

Goal: Geolocate the target IP using passive landmarks

– Select the landmark with the minimum delay to the target, and associate the target’s location with it
– Measured distance ∝ Geographical distance

[Figure: example comparison, 10.6 km vs. 0.103 km]
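Tier 3's rule reduces to an argmin over the landmarks that survived Tier 2. A sketch assuming each landmark is a (name, lat, lon, estimated_delay_ms) tuple, with None marking landmarks whose delay could not be estimated; the names, coordinates, and delays are hypothetical.

```python
from typing import List, Optional, Tuple

Landmark = Tuple[str, float, float, Optional[float]]


def geolocate(landmarks: List[Landmark]) -> Optional[Tuple[str, float, float]]:
    """Return the name and coordinates of the landmark with the smallest estimated delay."""
    usable = [lm for lm in landmarks if lm[3] is not None]
    if not usable:
        return None  # no landmark delay available; the caller must fall back
    name, lat, lon, _ = min(usable, key=lambda lm: lm[3])
    return name, lat, lon


landmarks = [
    ("university-web", 38.90, -77.05, 2.3),
    ("city-hall-mail", 38.89, -77.03, 0.9),
    ("bank-web", 38.91, -77.04, None),
]
estimate = geolocate(landmarks)
```

Only the ordering of delays matters here, which is why the method depends on relative rather than absolute delay.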


Remaining Issues

Verifying landmarks
– We sweep out most of the erroneous landmarks
– Errors are still possible!

Resilience to errors
– The larger the error, the more resilient our method is
– We prove that the likelihood that an erroneous landmark will affect the accuracy is small
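The slides do not detail the verification procedure. As an illustration only (not the authors' method), one physically grounded check discards a landmark when some measured RTT is too small for its server to sit at the claimed address. The sketch reuses the assumed bound of 100 km per millisecond of RTT; the coordinates and RTTs are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0
KM_PER_MS_RTT = 100.0  # assumed: ~2/3 of c in fiber, halved for the round trip


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def plausible_landmark(claimed_lat, claimed_lon, probes):
    """False if any probe's RTT is too small for the server to be at the claimed address.

    probes: list of (vantage_lat, vantage_lon, rtt_ms) measured to the landmark's server.
    """
    return all(haversine_km(vlat, vlon, claimed_lat, claimed_lon) <= rtt * KM_PER_MS_RTT
               for vlat, vlon, rtt in probes)


# A 2 ms RTT from New York allows at most ~200 km, so a Los Angeles address is rejected.
la_ok = plausible_landmark(34.05, -118.24, [(40.71, -74.01, 2.0)])
nyc_ok = plausible_landmark(40.75, -73.99, [(40.71, -74.01, 2.0)])
```

A check like this only catches gross errors, such as a Web site hosted in another city, which matches the slide's point that large errors are the easiest to withstand.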


Evaluation

Three datasets
– Planetlab dataset (academic)
– Collected dataset (residential)
– Online Maps dataset (in the wild)

Factors that impact the accuracy
– Landmark density
– Population density
– Access networks


Dataset Characteristics


The three datasets cover both urban areas and rural areas.

[Figure: dataset targets in urban areas and rural areas]


Baseline Results


Error distance (km)   Planetlab   Residential   Online Maps   Best previous result
Median                0.69        2.25          2.11          35.2
Maximum               5.24        8.1           13.2          276.8
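Error distance is the distance between a target's estimated and true locations. A sketch of how the median and maximum could be computed, assuming great-circle (haversine) distance; the second function only reproduces the arithmetic behind the “order of magnitude” claim, using the medians on this slide and 35.2 km for the best previous result. Function names are hypothetical.

```python
import math
import statistics

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def error_stats(pairs):
    """pairs: ((est_lat, est_lon), (true_lat, true_lon)) tuples -> (median_km, max_km)."""
    errors = [haversine_km(*est, *true) for est, true in pairs]
    return statistics.median(errors), max(errors)


def improvement_factors(medians_km, best_previous_km=35.2):
    """Ratio of the best previous median error to each dataset's median error."""
    return {name: best_previous_km / m for name, m in medians_km.items()}


factors = improvement_factors({"Planetlab": 0.69, "Residential": 2.25, "Online Maps": 2.11})
```

The factors come out at roughly 51, 15.6, and 16.7, all above 10.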


Landmark Density


The more landmarks we can discover in the vicinity of a target, the higher the probability that we can geolocate the targeted IP accurately.

Landmark density: Planetlab > Residential > Online Maps


The Role of Population Density


The error distance is smallest in densely populated areas, and it grows as the population density decreases.

[Figure annotation: Middle of “nowhere”]


The Role of Access Networks


Error distance (km)   AT&T   Comcast   Verizon
Median                1.68   2.38      1.48

[Figure annotations: 2 km, 700 meters]

Cable access networks (Comcast) have a much larger latency variance than DSL networks (AT&T and Verizon)


Conclusions

A geolocation system able to geolocate IP addresses with more than an order of magnitude better precision than the best previous method

Our methodology consists of two components:
– Mining landmarks from the Web and using Web or e-mail servers as landmarks
– Using relative network distances as opposed to absolute network distances


Thank You

http://networks.cs.northwestern.edu