Transcript
Page 1: IP Geolocation with Two-Tier Neural Networkpeople.clarkson.edu/~jmatthew/publications/IEEE_GI2016_FINALSLIDE… · IP Geolocation with Two-Tier Neural Network ... ping/traceroute

IP Geolocation with Two-Tier Neural Network

Hao Jiang1 Yaoqing Liu2 Jeanna N. Matthews2

1Department of Computer ScienceThe University of Chicago

2Department of Computer ScienceClarkson University

Page 2: IP Geolocation with Two-Tier Neural Networkpeople.clarkson.edu/~jmatthew/publications/IEEE_GI2016_FINALSLIDE… · IP Geolocation with Two-Tier Neural Network ... ping/traceroute

Outline

Background

System Design

Data Collection

Evaluation

Conclusion

Page 3: IP Geolocation with Two-Tier Neural Networkpeople.clarkson.edu/~jmatthew/publications/IEEE_GI2016_FINALSLIDE… · IP Geolocation with Two-Tier Neural Network ... ping/traceroute

What is IP Geolocation?

128.135.100.110

41.79N87.60W

Page 4: IP Geolocation with Two-Tier Neural Networkpeople.clarkson.edu/~jmatthew/publications/IEEE_GI2016_FINALSLIDE… · IP Geolocation with Two-Tier Neural Network ... ping/traceroute

What can we do with IP Geolocation?

Credit card Fraud

CDN Online Ads

Page 5: IP Geolocation with Two-Tier Neural Networkpeople.clarkson.edu/~jmatthew/publications/IEEE_GI2016_FINALSLIDE… · IP Geolocation with Two-Tier Neural Network ... ping/traceroute

Previous work - IPGeo Database

Pros

I Easy to use

Cons

I Less accurate (City level)

I Not up to date (Periodic update)

Page 6: IP Geolocation with Two-Tier Neural Networkpeople.clarkson.edu/~jmatthew/publications/IEEE_GI2016_FINALSLIDE… · IP Geolocation with Two-Tier Neural Network ... ping/traceroute

Previous work - Measuring network delay

Observers

Pos 1

Pos 2

Pos ?

ping/traceroute

Latency 1

Latency 2

Latency 3

θ = train(Latency1,2,Pos1,2)

Pos3 = predict(Latency3,θ)

Page 7: IP Geolocation with Two-Tier Neural Networkpeople.clarkson.edu/~jmatthew/publications/IEEE_GI2016_FINALSLIDE… · IP Geolocation with Two-Tier Neural Network ... ping/traceroute

Previous work - Build a model

ob1 ob2

min dist

max dist

Possible locations

Design a simple model (mostly based on triangulation) andcalculate the parameters. [GZCF06, KBJK+06, WSS07, DPCS12]Accuracy: ∼10km median error

Such a model requires a lot of assumptions, which are notnecessarily true. (E.g., is there a linear relationship betweenlatency and geographic distance?)

Page 8: IP Geolocation with Two-Tier Neural Networkpeople.clarkson.edu/~jmatthew/publications/IEEE_GI2016_FINALSLIDE… · IP Geolocation with Two-Tier Neural Network ... ping/traceroute

Previous work - Find nearby landmarks

observer 1 observer 2

landmark 1 (1,1.5)landmark 2 (2.5,1.5)

target (1.2,1.7)

Find the landmark that has the most similar observationresults with the target. [WBF+11]Accuracy: ∼1km median errorAccuracy is greatly relied on the density of the landmarks.Hard to maintain a large group of landmarks.

Page 9: IP Geolocation with Two-Tier Neural Networkpeople.clarkson.edu/~jmatthew/publications/IEEE_GI2016_FINALSLIDE… · IP Geolocation with Two-Tier Neural Network ... ping/traceroute

Previous work - What do we learn

I Physically adjacent nodes have similarmeasurements

I Network topology is simpler in a local area thanin a larger area

Page 10: IP Geolocation with Two-Tier Neural Networkpeople.clarkson.edu/~jmatthew/publications/IEEE_GI2016_FINALSLIDE… · IP Geolocation with Two-Tier Neural Network ... ping/traceroute

Outline

Background

System Design

Data Collection

Evaluation

Conclusion

Page 11: IP Geolocation with Two-Tier Neural Networkpeople.clarkson.edu/~jmatthew/publications/IEEE_GI2016_FINALSLIDE… · IP Geolocation with Two-Tier Neural Network ... ping/traceroute

Design Idea

Observers

Landmarks

measureModel

Target

measure

Train Predict

Our method employs machine learning technique to solve theproblem. Instead of “choose” a model, we collect latency datafrom landmarks with known locations and train a model, thenuse this model to predict the location of unknown targets.

Page 12: IP Geolocation with Two-Tier Neural Networkpeople.clarkson.edu/~jmatthew/publications/IEEE_GI2016_FINALSLIDE… · IP Geolocation with Two-Tier Neural Network ... ping/traceroute

Two-Tier Neural Network

Region estimation Location estimation

Intuition:Measurement from adjacent landmarks can yield a betterestimation result.

Make a rough estimation with all landmarks, locate the region thetarget resides in. Then use only the landmarks in that region to doa more accurate prediction.

Page 13: IP Geolocation with Two-Tier Neural Networkpeople.clarkson.edu/~jmatthew/publications/IEEE_GI2016_FINALSLIDE… · IP Geolocation with Two-Tier Neural Network ... ping/traceroute

Outline

Background

System Design

Data Collection

Evaluation

Conclusion

Page 14: IP Geolocation with Two-Tier Neural Networkpeople.clarkson.edu/~jmatthew/publications/IEEE_GI2016_FINALSLIDE… · IP Geolocation with Two-Tier Neural Network ... ping/traceroute

Data Collection - Observers

Ripe Atlas Probes

Ripe Atlas Anchors

We choose 14 anchors from Ripe Atlas Network as observers.These observers covers most area of US continental evenly. Theyare able to send ping/traceroute requests to arbitrary IP addresses.

Page 15: IP Geolocation with Two-Tier Neural Networkpeople.clarkson.edu/~jmatthew/publications/IEEE_GI2016_FINALSLIDE… · IP Geolocation with Two-Tier Neural Network ... ping/traceroute

Data Collection - Datasets

A large enough landmark dataset is crucial to the accuracy of ourmethod. Our dataset consists of landmarks from three datasources

I Ripe Atlas Probes

I University Webservers

I City Government Webservers

Page 16: IP Geolocation with Two-Tier Neural Networkpeople.clarkson.edu/~jmatthew/publications/IEEE_GI2016_FINALSLIDE… · IP Geolocation with Two-Tier Neural Network ... ping/traceroute

Data Collection

University Dataset

I Get a U.S. university list from Wikipedia

I Use Google search API to obtain the geographic location andits website

I Use host command to obtain corresponding IP address

City Dataset

I Get a U.S. city and population list from government website

I Choose the top 50 cities of each state ordered by populationin descending order

I Use Google search API to obtain the geographic location andits website

I Use host command to obtain corresponding IP address

Page 17: IP Geolocation with Two-Tier Neural Networkpeople.clarkson.edu/~jmatthew/publications/IEEE_GI2016_FINALSLIDE… · IP Geolocation with Two-Tier Neural Network ... ping/traceroute

Data Collection - Filtering

We filter out invalid data using various methods

I Look for popular virtual host providers (Amazon, GoDaddy,Rackspace, etc.)

I Look for owners that own multiple IP addresses (throughwhois)

I Cross-validation using GeoIP database

Page 18: IP Geolocation with Two-Tier Neural Networkpeople.clarkson.edu/~jmatthew/publications/IEEE_GI2016_FINALSLIDE… · IP Geolocation with Two-Tier Neural Network ... ping/traceroute

Data Collection - Result

Category Raw Valid Reachable

Ripe Atlas Probes 637 637 429

UniversityWebsites

2170 1858 826

City GovernmentWebsites

2880 740 292

Total 5687 3235 1547

Table: Landmark Detail (Raw: All landmark candidates. Valid:Landmarks after filtering and cross-validation. Reachable: Landmarksthat respond to ping)

Page 19: IP Geolocation with Two-Tier Neural Networkpeople.clarkson.edu/~jmatthew/publications/IEEE_GI2016_FINALSLIDE… · IP Geolocation with Two-Tier Neural Network ... ping/traceroute

Outline

Background

System Design

Data Collection

Evaluation

Conclusion

Page 20: IP Geolocation with Two-Tier Neural Networkpeople.clarkson.edu/~jmatthew/publications/IEEE_GI2016_FINALSLIDE… · IP Geolocation with Two-Tier Neural Network ... ping/traceroute

Evaluation - Error Distribution

;Error Distribution of the estimation result

We compare the performance of two popular neural network types:Multi-Layer Perceptron (MLP) and Radial-Basis Function (RBF)

Accuracy:

I Over 80% estimations have a error within 10km

I MLP has a overall better performance than RBF

Page 21: IP Geolocation with Two-Tier Neural Networkpeople.clarkson.edu/~jmatthew/publications/IEEE_GI2016_FINALSLIDE… · IP Geolocation with Two-Tier Neural Network ... ping/traceroute

Evaluation - Accuracy related to number of landmarks

;MLP Error related to Landmark Density

I 3.7km in regions with > 100 landmarks

I 6km in regions with < 50landmarks

I Error decreases when landmark density increases

Page 22: IP Geolocation with Two-Tier Neural Networkpeople.clarkson.edu/~jmatthew/publications/IEEE_GI2016_FINALSLIDE… · IP Geolocation with Two-Tier Neural Network ... ping/traceroute

Outline

Background

System Design

Data Collection

Evaluation

Conclusion

Page 23: IP Geolocation with Two-Tier Neural Networkpeople.clarkson.edu/~jmatthew/publications/IEEE_GI2016_FINALSLIDE… · IP Geolocation with Two-Tier Neural Network ... ping/traceroute

Conclusion

Our Contribution:

I A novel method for IP Geolocation

I Achieved similar accuracy with state-of-the artwith a fixed amount of landmarks

Page 24: IP Geolocation with Two-Tier Neural Networkpeople.clarkson.edu/~jmatthew/publications/IEEE_GI2016_FINALSLIDE… · IP Geolocation with Two-Tier Neural Network ... ping/traceroute

Future Work

I Mobile clientIn this research, our data source contains only wirednetwork nodes. Mobile network, especially cellular networkclients may have different properties that is notrepresented in our dataset.

Contribution: High Complexity: High

I RegionOur method assumes two geographically adjacent IPaddresses will be adjacent on network topology. Whilethis has been justified by our research result on U.S.territory, we are interested in expanding the testing inregions such as Europe and Asia.

Contribution: High Complexity: Medium

Page 25: IP Geolocation with Two-Tier Neural Networkpeople.clarkson.edu/~jmatthew/publications/IEEE_GI2016_FINALSLIDE… · IP Geolocation with Two-Tier Neural Network ... ping/traceroute

Question?

Page 26: IP Geolocation with Two-Tier Neural Networkpeople.clarkson.edu/~jmatthew/publications/IEEE_GI2016_FINALSLIDE… · IP Geolocation with Two-Tier Neural Network ... ping/traceroute

References I

Ziqian Dong, Rohan D.W. Perera, Rajarathnam Chandramouli,and K.P. Subbalakshmi.Network measurement based modeling and optimization for{IP} geolocation.Computer Networks, 56(1):85 – 98, 2012.

Bamba Gueye, Artur Ziviani, Mark Crovella, and Serge Fdida.Constraint-based Geolocation of Internet Hosts.IEEE/ACM Trans. Netw., 14(6):1219–1232, December 2006.

Ethan Katz-Bassett, John P. John, Arvind Krishnamurthy,David Wetherall, Thomas Anderson, and Yatin Chawathe.Towards IP Geolocation Using Delay and TopologyMeasurements.In IMC ’06, pages 71–84, New York, NY, USA, 2006. ACM.

Page 27: IP Geolocation with Two-Tier Neural Networkpeople.clarkson.edu/~jmatthew/publications/IEEE_GI2016_FINALSLIDE… · IP Geolocation with Two-Tier Neural Network ... ping/traceroute

References II

Yong Wang, Daniel Burgener, Marcel Flores, AleksandarKuzmanovic, and Cheng Huang.Towards Street-level Client-independent IP Geolocation.In NSDI ’11, pages 27–27, Berkeley, CA, USA, 2011. USENIXAssociation.

Bernard Wong, Ivan Stoyanov, and Emin Gun Sirer.Octant: A Comprehensive Framework for the Geolocalizationof Internet Hosts.In NSDI ’07, pages 23–23, Berkeley, CA, USA, 2007. USENIXAssociation.