Sándor Laki (C) Geolocation by IP address

Geolocation by IP address

Locating Internet hosts

Sándor Laki [email protected]

http://lakis.web.elte.hu

Outline

• RADAR: a wireless solution
• IP2Geo: on the Internet
• Constraint-Based Geolocation (GeoLim)
• Octant framework
• Topology-Based Geolocation

RADAR

RADAR, a wireless approach

• Focus on the indoor environment
  • GPS does not work indoors
  • Dedicated technologies
• Goals
  • Leverage existing infrastructure
  • Use wireless LAN
  • Software solution
  • Better scalability and lower cost than dedicated technology

RADAR

• Key idea: signal strength matching
• Offline calibration
  • Construct a radio map of <location, SStr> tuples
• Real-time location and tracking
  • Extract the signal strength (SStr) from beacons
  • Find the table entry that best matches the measured SStr

RADAR: Determine location

• Find the nearest neighbor in signal space (NNSS)
• 1st solution
  • The physical position of the NNSS gives the user's location
• 2nd solution: k-NNSS
  • Averaging the coordinates of the k nearest neighbors gives the estimated position
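Both solutions can be sketched in a few lines. This is only an illustration: the radio map layout, the function name `k_nnss`, and the use of Euclidean distance in signal space are assumptions made for the example, not details taken from the slides.

```python
import math

def k_nnss(radio_map, observed, k=1):
    """Estimate a position from a radio map.

    radio_map: list of ((x, y), signal_vector) entries from offline calibration.
    observed:  signal strength vector measured in real time.
    k:         number of nearest neighbors in signal space to average.
    """
    # Rank calibration points by Euclidean distance in signal space.
    ranked = sorted(radio_map, key=lambda entry: math.dist(entry[1], observed))
    nearest = ranked[:k]
    # k = 1 returns the NNSS position; k > 1 averages the k positions.
    x = sum(entry[0][0] for entry in nearest) / len(nearest)
    y = sum(entry[0][1] for entry in nearest) / len(nearest)
    return (x, y)

# Hypothetical radio map: three base stations, four calibration points.
radio_map = [
    ((0.0, 0.0), (-40, -70, -80)),
    ((10.0, 0.0), (-60, -50, -75)),
    ((0.0, 10.0), (-65, -72, -45)),
    ((10.0, 10.0), (-70, -55, -50)),
]
observed = (-58, -52, -74)
nnss_position = k_nnss(radio_map, observed, k=1)
knnss_position = k_nnss(radio_map, observed, k=2)
```

With k = 1 the estimate snaps to a calibration point; a larger k trades that for a position between calibration points.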

Correlation between physical location and signal strength

• Base system: INFOCOM 2000 paper
• Enhanced system: Microsoft Technical Report MSR-TR-2000-12

IP2Geo

Single-point localization

IP2Geo - Motivation

• Much focus on location-aware services in wireless and mobile contexts
• Such services are relevant in the Internet context too
  • targeted advertising
  • event notification
  • territorial rights management
  • network diagnostics
• It is a challenging problem
  • An IP address does not inherently contain an indication of location

IP2Geo

• Multi-pronged approach that exploits various “properties” of the Internet
  • DNS names of router interfaces often indicate location
  • network delay tends to correlate with geographic distance
  • hosts that are aggregated for the purposes of Internet routing also tend to be clustered geographically
• GeoTrack
  • determine the location of the closest router with a recognizable DNS name
• GeoPing
  • use delay measurements to estimate location
• GeoCluster
  • extrapolate partial (and possibly inaccurate) IP-to-location mapping information using BGP prefix clusters

GeoTrack – main idea

• Extract geographical information from DNS names of routers on the path
• Localize the target to the last router whose position is known
• Example
  • ngcore1-serial8-0-0-0.Seattle.cw.net => Seattle
  • 184.atm6-0.xr2.ewr1.alter.net => New York
  • dnvr-scrm.abilene.ucaid.edu => Denver

GeoTrack

• GeoTrack operation
  • do a traceroute to the target IP address
  • determine the location of the last recognizable router along the path
• Key ideas in GeoTrack
  • partitioned city code database to minimize the chance of a false match
  • ISP-specific parsing rules
  • delay-based correction
• Limitations
  • routers may not respond to traceroute
  • the DNS name may not contain location information, or the lookup may fail
  • the target host may be behind a proxy or a firewall
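A minimal sketch of the GeoTrack operation, using the three example router names from the previous slide. The code table `CITY_CODES`, the tokenizing rule, and the function names are hypothetical; the real system relies on a partitioned city code database and ISP-specific parsing rules that are not reproduced here.

```python
import re

# Hypothetical, tiny code table for illustration only.
CITY_CODES = {"seattle": "Seattle", "ewr": "New York", "dnvr": "Denver"}

def locate_router(hostname):
    """Return a city for a router DNS name, or None if no token matches."""
    for token in re.split(r"[.\-]", hostname.lower()):
        token = token.rstrip("0123456789")  # "ewr1" -> "ewr"
        if token in CITY_CODES:
            return CITY_CODES[token]
    return None

def geotrack(path_hostnames):
    """Location of the last recognizable router on a traceroute path.

    path_hostnames: DNS names of the hops in order; None marks a hop that
    did not answer or has no reverse DNS entry.
    """
    location = None
    for name in path_hostnames:
        if name is None:
            continue
        found = locate_router(name)
        if found is not None:
            location = found  # later hops override earlier ones
    return location

path = [
    "ngcore1-serial8-0-0-0.Seattle.cw.net",
    None,
    "dnvr-scrm.abilene.ucaid.edu",
    "host.example.org",
]
estimate = geotrack(path)
```

The estimate is only as close as the last router whose name can be parsed, which is why an unresponsive or unnamed tail of the path degrades accuracy.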

GeoPing - Delay-based localization

• Delay-based triangulation is conceptually simple
  • map delay to distance
  • distances from 3 or more non-collinear points => target location
• But there are practical difficulties
  • the network path may be circuitous
  • transmission and queuing delays may corrupt the delay estimate
  • one-way delay (OWD) is hard to measure
  • OWD ≠ RTT/2 because of routing asymmetry

GeoPing - details

• Measure the network delay to the target host from several geographically distributed probes
  • typically more than 3 probes are used
  • round-trip delay is measured using the ping utility
  • small-sized packets => transmission delay is negligible
  • pick the minimum among several delay samples
• Nearest Neighbor in Delay Space (NNDS)
  • akin to Nearest Neighbor in Signal Space (NNSS) in RADAR
  • construct a delay map containing (delay vector, location) tuples
  • given a vector of delay measurements, search through the delay map for the NNDS
  • the location of the NNDS is our estimate for the location of the target host
• More robust than directly trying to map from delay to distance
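The NNDS lookup can be sketched as follows. The delay map contents, the probe count, and the choice of Euclidean distance in delay space are assumptions made for the example.

```python
import math

def delay_vector(samples_per_probe):
    """Keep the minimum RTT among several samples for each probe."""
    return tuple(min(samples) for samples in samples_per_probe)

def nnds(delay_map, measured):
    """Return the location whose delay vector is nearest to `measured`.

    delay_map: list of (delay_vector, location) tuples for hosts whose
    location is already known.
    """
    best = min(delay_map, key=lambda entry: math.dist(entry[0], measured))
    return best[1]

# Hypothetical delay map: RTTs in ms from three probes to known hosts.
delay_map = [
    ((10, 80, 60), "Seattle"),
    ((70, 15, 40), "New York"),
    ((50, 45, 12), "Denver"),
]
samples = [[72, 75, 90], [18, 16, 30], [41, 44, 39]]
measured = delay_vector(samples)
estimate = nnds(delay_map, measured)
```

Taking the minimum per probe filters out samples inflated by queuing, which matches the "pick the minimum among several delay samples" step.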

GeoPing – Delay tends to increase with geographic distance

GeoPing – Estimation error

GeoCluster

A passive method

GeoCluster

• A passive technique, unlike GeoTrack and GeoPing
• Basic idea:
  • break the IP address space into clusters
  • assign a geographical location to each cluster based on third-party IP-to-location databases
  • given a target IP address, first find the matching cluster using longest-prefix match
  • the location of the matching cluster is our estimate of the host location

GeoCluster

• Example:
  • consider the cluster 128.95.0.0/16 (containing 65536 IP addresses)
  • suppose we know that the location corresponding to a few IP addresses in this cluster is Seattle
  • then given a new address, say 128.95.4.5, we deduce that it is likely to be in Seattle too
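The longest-prefix match in this example can be sketched with Python's standard `ipaddress` module. The cluster table below is hypothetical apart from the 128.95.0.0/16 entry taken from the slide; the coarser 128.0.0.0/8 entry is added only to show that the more specific prefix wins.

```python
import ipaddress

# Hypothetical cluster table: prefix -> location assigned to the cluster.
CLUSTERS = {
    ipaddress.ip_network("128.95.0.0/16"): "Seattle",
    ipaddress.ip_network("128.0.0.0/8"): "Elsewhere",
}

def geocluster_lookup(address, clusters=CLUSTERS):
    """Return the location of the longest matching prefix, or None."""
    addr = ipaddress.ip_address(address)
    matches = [net for net in clusters if addr in net]
    if not matches:
        return None
    best = max(matches, key=lambda net: net.prefixlen)
    return clusters[best]

estimate = geocluster_lookup("128.95.4.5")
```

A linear scan is enough for a sketch; with about 100,000 clusters a trie keyed on prefix bits would be the usual choice.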

GeoCluster – Clustering IP addresses

• Exploit the hierarchical nature of Internet routing
  • inter-domain routing in the Internet uses the Border Gateway Protocol (BGP)
  • BGP operates on address aggregates
  • we treat these aggregates as clusters
  • in all, we had about 100,000 clusters of different sizes

IP-to-location mapping

• Data sources
  • e-mail services, business web-hosting companies, etc.
  • requires a large, fine-grained and fresh database!
• Information
  • partial information (i.e., only for a small subset of addresses)
  • possibly inaccurate (e.g., manual input from the user)

Extrapolating IP-to-location mapping

• Determine the location most likely to correspond to a cluster
  • majority polling
  • “average” location
  • dispersion is an indicator of our confidence in the location estimate
• What if there is a large geographic spread in locations?
  • some clusters correspond to large ISPs, and the internal subdivisions are not visible at the BGP level
    • sub-clustering algorithm: keep sub-dividing clusters until there is sufficient consensus in the individual sub-clusters
  • some clients connect via proxies or firewalls (e.g., AOL clients)
    • sub-clustering may help if there are local or regional proxies
    • otherwise large dispersion => no location estimate made
    • many tools fail in this regard
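One way to picture majority polling with sub-clustering is the sketch below. The slide does not say how clusters are split or what counts as sufficient consensus, so halving the prefix, the 70% vote share (`min_share`), and the /24 stopping point (`max_prefixlen`) are all assumptions chosen for illustration.

```python
import ipaddress
from collections import Counter

def assign_locations(network, samples, min_share=0.7, max_prefixlen=24):
    """Map (sub-)clusters to a city by majority polling.

    samples: list of (ip_string, city) pairs from a partial database.
    Returns {network: city}; dispersed clusters yield no estimate.
    """
    net = ipaddress.ip_network(str(network))
    inside = [(ip, city) for ip, city in samples
              if ipaddress.ip_address(ip) in net]
    if not inside:
        return {}
    city, votes = Counter(c for _, c in inside).most_common(1)[0]
    if votes / len(inside) >= min_share:
        return {net: city}  # sufficient consensus
    if net.prefixlen >= max_prefixlen:
        return {}           # still dispersed: make no estimate
    result = {}
    for half in net.subnets(prefixlen_diff=1):
        result.update(assign_locations(half, inside, min_share, max_prefixlen))
    return result

samples = [
    ("10.0.1.1", "Seattle"), ("10.0.2.2", "Seattle"),
    ("10.0.130.1", "Boston"), ("10.0.140.9", "Boston"),
]
clusters = assign_locations("10.0.0.0/16", samples)
```

Here the /16 has no majority, so it is split into two /17 sub-clusters that each reach consensus.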

Performance of GeoCluster

Median errors:

GeoCluster ~30km

GeoPing ~300km

GeoTrack ~100km

Other database-oriented applications

• NetGeo and IP2LL
  • based on the WHOIS database
  • not closely regulated
  • the address information often indicates the head office of the owner, which may be far from the actual target
• Quova
  • commercial service with their own database
• GTrace
  • uses DNS LOC entries

Octant framework

A very impressive solution

Octant overview

• Combines very different techniques
  • Active and passive
• Constraint-based
  • Weighted positive and negative constraints
  • Constraint => region
• Uses Bézier regions
  • Efficient implementations of clipping and union operations are available

Octant - Notations

• $\beta_i$: the region in which the target node i is located
• $\gamma_j$: a constraint
  • It is a region where the node might reside
  • Associated with a weight
• Set of nodes
  • Landmarks: nodes whose physical locations are at least partially known ($L_j$)
  • Every $L_j$ has an „estimated” location $\beta_{L_j}$

Octant – Landmarks and constraints

• Primary landmark
  • GPS, street address
  • Low error
• Secondary landmark
  • Position computed by Octant itself
• Positive constraints (set: $\Omega$)
  • Node A is within d miles of $L_k$
  • $\gamma = \bigcup_{(x,y) \in \beta_k} c(x,y,d)$, where c(x,y,d) is a disc of radius d centered at (x,y).
• Negative constraints (set: $\Phi$)
  • Node A is further than d miles from $L_k$
  • $\gamma = \bigcap_{(x,y) \in \beta_k} c(x,y,d)$

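Estimated location

$\beta_i = \bigcap_{X_i \in \Omega} X_i \setminus \bigcup_{X_i \in \Phi} X_i$

where $\Omega$ is the set of positive constraints and $\Phi$ is the set of negative constraints on node i.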
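The set expression can be approximated on a grid: keep the points that lie inside every positive constraint and outside every negative one. This is a simplified sketch that assumes point landmarks on a flat plane, ignores constraint weights, and uses sampled grid points in place of Octant's Bézier regions; the function name and data are hypothetical.

```python
import math

def estimate_region(positive, negative, grid):
    """Grid points satisfying all positive and no negative constraints.

    positive, negative: lists of discs (x, y, d) around landmark positions.
    grid: iterable of candidate (x, y) points.
    """
    def inside(point, disc):
        return math.dist(point, disc[:2]) <= disc[2]

    return [p for p in grid
            if all(inside(p, c) for c in positive)
            and not any(inside(p, c) for c in negative)]

grid = [(x, y) for x in range(0, 11) for y in range(0, 11)]
positive = [(0, 0, 8), (10, 0, 8)]   # within 8 units of two landmarks
negative = [(5, 0, 3)]               # further than 3 units from a third
region = estimate_region(positive, negative, grid)
```

The surviving points form the lens where the two large discs overlap, with the small disc around (5, 0) carved out.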

Mapping latencies to distances

• Latency between a target and a landmark bounds their maximum distance
  • Calculate with the speed of light: distance <= delay * 2/3 * c
  • Low precision
• Octant's way
  • Dynamic calibration
  • For each landmark L, compute two bounds $R_L(d)$ and $r_L(d)$, where d is the ping time of node i
  • $r_L(d) \le \lVert loc(L) - loc(i) \rVert \le R_L(d)$
  • When queuing delays are dominant, $r_L(d) = 0$.
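As a worked example of the speed-of-light bound, the sketch below converts a round-trip time into a maximum distance using the 2/3 c factor from the slide. The function name and the choice of kilometers and milliseconds are assumptions for the example.

```python
C_KM_PER_MS = 299.792458  # speed of light in vacuum, km per millisecond

def max_distance_km(rtt_ms, fraction_of_c=2 / 3):
    """Upper bound on landmark-to-target distance from a round-trip time.

    The faster direction of the path takes at most rtt_ms / 2, so halving
    the RTT still gives a valid upper bound under asymmetric routing.
    """
    one_way_ms = rtt_ms / 2
    return one_way_ms * fraction_of_c * C_KM_PER_MS

bound = max_distance_km(30.0)  # a 30 ms ping
```

A 30 ms round trip therefore bounds the target to within roughly 3,000 km of the landmark, which illustrates why the slide calls this bound low precision.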

Mapping latencies to distances

• Each landmark periodically pings all other landmarks => creating a correlation table
• Determines the convex hull around the points => R(d) and r(d)
• It is sufficient when the target has a direct and congestion-free path to the landmark
• Octant introduces a cutoff at latency p
  • a tunable percentage of landmarks lie to the left of p
  • discard the others (z is a fictitious data point, placed far away)

Mapping latencies to distances

Last hop delays

• Mapping is further complicated by queuing and transmission delays associated with the last hop
  • Cable and DSL connections
  • Overloaded PlanetLab nodes
• Goal: isolate the delay components which artificially inflate latencies
  • Detailed maps of the underlying physical network, as in network tomography (not in Octant)
  • Octant introduces a simple metric called height

Last hop delays in Octant

• Based on pair-wise latency measurements between landmarks
• Primary landmarks: a, b, c
• Measure their latencies (RTT): [a,b], [a,c], [b,c]
• The positions of primary landmarks are known -> we can estimate the transmission delays: (a,b), (a,c), (b,c)
• Last-hop delay(a,b) = [a,b] - (a,b)
• Landmark coordinates: (a_lon, a_lat), …

Last hop delays in Octant

• How much of the delay can be attributed to each landmark?
  • Denoted by a', b' and c' (the heights)
• Similarly, for a target t, we can compute t' as an estimation…
• We can solve for t', t_lon, t_lat
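For three primary landmarks the heights can be solved in closed form, under one modeling assumption that the slides do not spell out: the last-hop delay of a pair splits additively into the heights of its two endpoints, so that [a,b] - (a,b) = a' + b', and likewise for the other pairs. The function name is hypothetical.

```python
def heights(excess_ab, excess_ac, excess_bc):
    """Solve a' + b' = excess_ab, a' + c' = excess_ac, b' + c' = excess_bc.

    Each excess_xy is the measured latency [x,y] minus the estimated
    transmission delay (x,y) between two primary landmarks.
    """
    a = (excess_ab + excess_ac - excess_bc) / 2
    b = (excess_ab + excess_bc - excess_ac) / 2
    c = (excess_ac + excess_bc - excess_ab) / 2
    return a, b, c

# Hypothetical excess delays in milliseconds.
a_height, b_height, c_height = heights(5.0, 7.0, 8.0)
```

With more than three landmarks the system becomes overdetermined, and a least-squares fit would be the natural replacement for the closed form.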

Last hop delays

• t_lon and t_lat have relatively high error
  • not used in the later stages
• Given the target and landmark heights, each landmark can shift its
  • R_L up if t' < heights of the other landmarks
  • r_L down if t' > heights of the other landmarks

Indirect routes

• The preceding assumption
  • Route lengths are proportional to great-circle distances
  • not the case in practice, due to policy routing
• Example: a subscriber in Ithaca, NY -> Cornell Univ. (Ithaca)
  • Syracuse, NY -> Brockport, IL -> New York City -> Cornell Univ.
  • ~1 mile physical distance vs. ~800 miles path length

Indirect routes discovery

• A landmark's height can indicate…
• Localizing routers on the network path
  • Secondary landmarks
  • Localization by latencies
  • Extract location from router names
    • Reverse DNS lookup – undns tool
  • Using ZIP codes to determine geographical location

Handling uncertainty

• Filter out erroneous constraints
• Latency-based constraints
  • Weight system that decreases exponentially with increasing latency
• Weight threshold
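A minimal sketch of such a weight system follows. The slide gives no formula, so the exponential form exp(-latency / decay), the 50 ms decay constant, and the 0.1 threshold are illustrative assumptions rather than Octant's actual parameters.

```python
import math

def constraint_weight(latency_ms, decay_ms=50.0):
    """Weight that decreases exponentially with increasing latency."""
    return math.exp(-latency_ms / decay_ms)

def filter_constraints(constraints, threshold=0.1, decay_ms=50.0):
    """Keep (name, weight) for constraints whose weight reaches the threshold.

    constraints: list of (name, latency_ms) pairs.
    """
    kept = []
    for name, latency_ms in constraints:
        weight = constraint_weight(latency_ms, decay_ms)
        if weight >= threshold:
            kept.append((name, weight))
    return kept

# Hypothetical landmarks: one nearby, one far away.
kept = filter_constraints([("L1", 10.0), ("L2", 200.0)])
```

High-latency measurements carry the least location information, so they are the first to fall below the threshold.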

Iterative refinement

• Two phases:
  • First, we use accurate and mostly conservative constraints
  • Second, less accurate and more aggressive constraints to obtain a better estimate (inside the initial estimated region)

Results

Topology-based Geolocation

Motivations

• Problems with Constraint-Based Geolocation (CBG)
  • Uses distance constraints based on speeds lower than the speed of light
  • Risk of underestimates
  • When an underestimate occurs, the final region does not contain the true location
• Topology-based geolocation
  • uses the speed of light to generate constraints
  • inspired by sensor network localization

Summary of techniques

• Traceroute from landmarks -> map topology
• Estimate hop latency -> improve accuracy
• Cluster network interfaces -> increase structuring
• Validate location hints -> incorporate location hints
• Constraint optimization -> geolocate targets

Estimate hop latencies

• Using the traceroute tool to infer link latency
• Estimate hop latency from the difference in RTT to adjacent routers
  • Accurate only if the link is traversed in both directions (symmetric routing)
  • How can we discover this property?
• Three different techniques
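The basic estimate can be written down directly: the latency of a link is half the difference between the RTTs to its two endpoints. The sketch assumes symmetric routing, as the slide requires, and the function name is hypothetical.

```python
def hop_latencies(rtts_ms):
    """One-way link latency estimates from RTTs to consecutive routers.

    rtts_ms: RTTs (ms) to the routers along one traceroute path, in order.
    A negative value signals an unreliable estimate, for example because
    the return paths from the two routers differ.
    """
    return [(far - near) / 2 for near, far in zip(rtts_ms, rtts_ms[1:])]

# Hypothetical traceroute RTTs to four consecutive routers.
links = hop_latencies([2.0, 10.0, 11.0, 31.0])
```

Dividing by two attributes half of the extra round-trip time to each direction of the link, which is only justified when both directions actually cross it.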

Estimate hop latencies

• First, observing the reverse TTL values
  • Most routers initialize the TTL values of their packets from a small set: 30, 32, 64, 128, 150, 255
  • If the TTL value changes significantly from one node to the next => discard the link estimate
• Second, measuring paths in both directions between pairs of landmarks
  • If both paths traverse a particular link => take the difference of the measurements to the two endpoints
  • This estimate has high confidence
• Third, increasing the number of vantage points from which we probe a certain link
  • For every link on a path from a landmark, we probe both endpoints from all other landmarks…
  • If these probes pass over the link => estimate for the link
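The first technique can be sketched as follows: infer the reverse hop count from the TTL of a reply, then compare adjacent routers. The set of initial TTLs comes from the slide; picking the smallest candidate that is not below the received TTL and the `max_jump` threshold of 1 are assumptions for the example.

```python
COMMON_INITIAL_TTLS = (30, 32, 64, 128, 150, 255)

def reverse_hops(received_ttl):
    """Estimate how many hops a reply travelled on the reverse path.

    Assumes the sender used the smallest common initial TTL that is not
    below the received value (ambiguous for values near 30 and 32).
    """
    initial = min(t for t in COMMON_INITIAL_TTLS if t >= received_ttl)
    return initial - received_ttl

def link_looks_symmetric(ttl_near, ttl_far, max_jump=1):
    """Keep a link estimate only if the reverse hop count barely changes."""
    jump = abs(reverse_hops(ttl_far) - reverse_hops(ttl_near))
    return jump <= max_jump

keep = link_looks_symmetric(250, 249)     # reverse path grows by one hop
discard = link_looks_symmetric(250, 240)  # reverse path jumps by ten hops
```

A large jump suggests that replies from the far router return over a different route, so the RTT difference no longer isolates the link.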

Clustering interfaces

• Clustering interfaces that belong to the same router (IP aliases)

Clustering interfaces

• Two IP-alias techniques
• Mercator
  • UDP probes are sent to high-numbered ports on a set of interfaces
  • Routers send back a port-unreachable ICMP message with the source address
  • If two different interfaces reply with the same source address => aliases
• Ally
  • Used on pairs of interfaces
  • Sends probes to the two interfaces
  • Examines the IP-ID
  • Most routers generate the IP-ID using a single counter that is incremented after each packet has been created

Validating location hints

• DNS names -> locations
  • Some names are incorrect…
  • Misnaming, reconfiguration, reassignment of IP addresses
• Topology constraints can be used to verify location hints
  • RTT measurements -> upper bounds…
  • Clustering -> aliases
  • Hop latencies

Constraint optimization

• Targets: $X = \{x_i\}$
• Landmarks: $L = \{l_i\}$
• Distance between i and j: $d(i,j)$
• Hard delay constraints
  • $d(l_i, x_j) \le c_{ij}$, where $c_{ij}$ is the speed-of-light bound
  • Set of hard delay constraints: $C_d$
• Soft link latency constraints
  • Hop latency between i and j: $h_{ij}$
  • $d(x_i, x_j) = h_{ij} + e_{ij}$, where $e_{ij}$ is some error
  • Set of soft link latency constraints: $C_l$

Constraint optimization

• Minimize: $\sum_{(i,j) \in C_l} |e_{ij}|$
• Subject to: $C_d$, $C_l$
• Not a convex optimization problem
• But we can recast it as a semidefinite program
• Using fast solvers
  • SeDuMi
  • Vivaldi
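To make the objective concrete, the sketch below evaluates it for one candidate placement: it checks the hard constraints and sums the absolute link errors. It does not solve the optimization or the semidefinite relaxation; planar coordinates, distance units shared by all constraints, and every name in the code are assumptions for the example.

```python
import math

def evaluate(positions, hard, soft):
    """Score a candidate placement of landmarks and targets.

    positions: {node: (x, y)} in planar distance units.
    hard: list of (landmark, target, c_ij) upper bounds on distance.
    soft: list of (i, j, h_ij) link lengths derived from hop latencies.
    Returns (feasible, total_error), where total_error is the sum of
    |e_ij| with e_ij = d(x_i, x_j) - h_ij.
    """
    def d(i, j):
        return math.dist(positions[i], positions[j])

    feasible = all(d(l, x) <= c for l, x, c in hard)
    total_error = sum(abs(d(i, j) - h) for i, j, h in soft)
    return feasible, total_error

# Hypothetical placement: one landmark, two targets.
positions = {"l1": (0.0, 0.0), "x1": (3.0, 4.0), "x2": (6.0, 8.0)}
hard = [("l1", "x1", 6.0)]
soft = [("x1", "x2", 4.0)]
feasible, total_error = evaluate(positions, hard, soft)
```

A solver searches over the target positions for the feasible placement with the smallest total error; this function only scores a single guess.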

Results

References

• RADAR and IP2Geo: http://eris.prakinf.tu-ilmenau.de/res/papers/coopStreaming/padmanabhan01Locating.pdf
• Bernard Wong, Ivan Stoyanov and Emin Gün Sirer. Octant: A Comprehensive Framework for the Geolocalization of Internet Hosts. In Proceedings of the Symposium on Networked System Design and Implementation, Cambridge, Massachusetts, April 2007.
• Katz-Bassett, E., John, J. P., Krishnamurthy, A., Wetherall, D., Anderson, T., and Chawathe, Y. Towards IP geolocation using delay and topology measurements. In Proceedings of the 6th ACM SIGCOMM on Internet Measurement (Rio de Janeiro, Brazil, October 25 - 27, 2006). IMC '06. ACM Press, New York, NY, 71-84.