
Source: conferences2.sigcomm.org/co-next/2018/slides/poster...


Operating a DNS-based Active Internet Observatory

Jens Hiller, Oliver Hohlfeld, Jan Rüth, Torsten Zimmermann

www.netray.io

The NetRay Internet Observatory

Motivation

● Study Internet evolution
  ● Internet: an entirely man-made system yet not fully understood
  ● Optimizing the Internet requires understanding its properties

● Longitudinal and multi-protocol studies rare
  ● Often only a single protocol is measured for a short duration

● Domain name probes provide new perspective
  ● IPv4 space probed regularly → doesn't account for virtualization (SNI)

Goal

● Regular, multi-protocol probes of IPv4 & >50% of domain name space
  ● Regular probes: daily or weekly
  ● Probe more than one protocol
  ● Probe large portion of the domain name space

Architecture

● Target lists
  ● ZMap-based IPv4 address space scan
  ● DNS zone files for multiple TLDs (e.g., .com, .net, .org)
  ● Complete zone files for few TLDs
  ● Passive DNS feed + CT logs to reconstruct other TLDs

● DNS resolution
  ● Perform DNS resolution for every domain for multiple RRs
  ● DNS resolution by cluster of machines
  ● Output annotated: e.g., CDN, ASN, Cloud, …
  ● Output written to RabbitMQ message bus

● Protocol probing
  ● Protocol worker for every protocol
  ● Can run on multiple servers
  ● Subscribe to workload via RabbitMQ message bus
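The decoupling between DNS resolution and protocol probing described above can be sketched as follows. This is a minimal illustration, not the NetRay implementation: the `MessageBus` class is an in-memory stand-in for RabbitMQ, `FAKE_DNS` replaces real DNS lookups with a static table, and the topic names (`A-www`, `MX`) and worker functions are hypothetical names loosely based on the labels in the architecture figure.

```python
from collections import defaultdict


class MessageBus:
    """In-memory stand-in for a topic-based message bus (not RabbitMQ itself)."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, topic, handler):
        # A protocol worker registers interest in one kind of DNS result.
        self._handlers[topic].append(handler)

    def publish(self, topic, message):
        # Deliver the message to every worker subscribed to this topic.
        for handler in self._handlers[topic]:
            handler(message)


# Hypothetical static answers used in place of real DNS lookups.
FAKE_DNS = {
    ("www.example.org", "A"): ["192.0.2.10"],
    ("example.org", "MX"): ["mail.example.org"],
}


def resolve_and_publish(bus, domain):
    """Resolution stage: look up records for a domain and publish them by topic."""
    for (name, rrtype), answers in FAKE_DNS.items():
        if name == domain or name == "www." + domain:
            is_www_a = rrtype == "A" and name.startswith("www.")
            topic = "A-www" if is_www_a else rrtype
            bus.publish(topic, {"name": name, "type": rrtype, "answers": answers})


results = []


def http2_worker(message):
    # A real worker would open an HTTP/2 connection here; this one only records the job.
    results.append(("http2", message["name"], message["answers"]))


def mx_worker(message):
    results.append(("mx", message["name"], message["answers"]))


bus = MessageBus()
bus.subscribe("A-www", http2_worker)
bus.subscribe("MX", mx_worker)
resolve_and_publish(bus, "example.org")
```

Because workers only see messages on the topics they subscribed to, a new protocol prober can be added without touching the resolution stage, which is the property the poster attributes to the message bus design.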

[Figure: architecture diagram. Labels: Input: Target Lists (Zone Files, DNS Crawler, Passive DNS, CT Logs, zmap; domain names, entire IPv4 space; blacklist); DNS Resolution (Classifiers; Results: A/AAAA, NS/MX); RabbitMQ Message Bus (A-www, A-www + IPs, MX); Protocol Probing (protocol probers HTTP2, QUIC, Protocol X; Results; time axis). Stages are marked 1, 2, 3.]

Example Studies

Data Sets

Domain Name System
● Complete zone files (daily) for
  ● .com, .net, .org, .fi, .se, .nu, .gov, fed.us, .name, + >1000 new gTLDs (e.g., .london)
● Incomplete zone files for
  ● 80 ccTLDs (e.g., .de)
  ● Source: passive DNS & CT log
● Probed RRs: ANY, SOA, CAA, LOC
  ● A & AAAA: (www.)domain.tld
  ● A & AAAA for every NS/MX

HTTP/2
● Probe selected TLDs for HTTP2
  ● Full connection establishment
  ● Regular scans: daily/weekly
● HTTP2 Server Push adoption
  ● Monitor which sites push content on their landing page

QUIC
● Probe all TLDs / IPv4 for QUIC support
  ● Perform connection establishment
  ● Google & IETF QUIC
  ● Regular scans: daily/weekly
● Server fingerprinting etc.

TCP Initial Window
● Assessment of global TCP Initial Window distribution
● Probe 1% random subsample of the IPv4 space & Alexa Top 1M domains
  ● Few full scans available
  ● 1% random sample sufficiently approximates overall distribution → Reduce scan footprint
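One way to draw a reproducible 1% subsample of an address range is to hash each address with a fixed seed and keep those that fall into the lowest bucket. The sketch below illustrates that idea only; the poster does not say how the sample was drawn, and the function name, seed value, and example prefix are assumptions.

```python
import hashlib
import ipaddress


def in_sample(address, percent=1, seed=b"iw-scan"):
    """Return True if the address belongs to a deterministic ~percent% sample.

    The seed is a hypothetical value; changing it yields a different sample.
    """
    packed = ipaddress.IPv4Address(address).packed
    digest = hashlib.sha256(seed + packed).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # bucket in 0..99
    return bucket < percent


# Example: select sampled targets from a documentation prefix (RFC 5737).
block = ipaddress.ip_network("198.51.100.0/24")
selected = [str(ip) for ip in block if in_sample(str(ip))]
```

A hash-based membership test needs no stored target list and gives the same answer on every run, so repeated scans can revisit the same hosts.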

Web Security
● TLS connections with all TLDs
  ● Establish connection
  ● Retrieve certificates & 256kB payload
  ● Regular scans: daily/weekly
● Certification Authority Authorization
  ● CAA goal: limit cert mis-issuance
  ● Probe all TLDs for CAA RRs
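To make the CAA goal concrete, the following sketch shows how a CA could decide whether it may issue for a name, given probed CAA records. It follows the tree-climbing lookup of RFC 8659 in simplified form: it ignores `issuewild`, the critical flag, and property parameters, and the record table and function names are hypothetical.

```python
def relevant_caa(domain, caa_records):
    """Climb from the domain towards the TLD and return the first non-empty CAA set."""
    labels = domain.rstrip(".").split(".")
    for i in range(len(labels)):
        name = ".".join(labels[i:])
        rrset = caa_records.get(name)
        if rrset:
            return rrset
    return []


def may_issue(ca_domain, domain, caa_records):
    """Simplified check: no 'issue' property means issuance is unrestricted."""
    issue_values = [value for tag, value in relevant_caa(domain, caa_records)
                    if tag == "issue"]
    if not issue_values:
        return True
    # Drop any parameters after ';'. A bare ';' value authorizes no CA at all.
    return any(value.split(";")[0].strip() == ca_domain for value in issue_values)


# Hypothetical probe results: name -> list of (tag, value) pairs.
caa_records = {
    "example.org": [("issue", "ca.example.net")],
    "locked.example": [("issue", ";")],
}
```

With these records, `may_issue("ca.example.net", "www.example.org", caa_records)` is true because the lookup climbs from `www.example.org` to `example.org`, while any other CA is refused for that name.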

Interested in data? Contact us: [email protected]

http2.netray.io quic.netray.io iw.netray.io

Acknowledgements We would like to thank Jens Hektor and Bernd Kohler (RWTH Aachen IT Center) for enabling and supporting our work. Funded by the Excellence Initiative of the German federal and state governments and by the DFG as part of the CRC 1053 MAKI.

Panels: The Rise of HTTP2; The Rise of QUIC; The Rise of Crypto Miners in Web Pages

[Figure: HTTP2 Growth [%] (axis 0 to 400) per TLD (alexa, com, net, nu, org, se), monthly from 2017-01 to 2018-06.]

[Figure: NoCoin Detection Share (axis 0.00 to 1.00) by Scan Date for Alexa, .com, .net, .org; legend: coinhive, authedmine, wp-monero, cryptoloot, cpmstar, other. Scan dates: 11.01.18, 11.03.18; 02.03.18, 11.05.18; 27.02.18, 08.05.18; 28.02.18, 09.05.18. # Potential Mining Domains: 710, 621, 6676, 5744, 618, 553, 473, 399.]

● Methodology (crypto miners)
  ● Pattern matching on HTML payload of TLS scans
  ● Match JavaScript object names against NoCoin list
● Result
  ● Crypto mining in web pages exists
  ● Coinhive most prevalent framework / operator
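The pattern-matching step for miner detection can be sketched as below. The signature table is illustrative only: its entries are placeholders chosen for this example and are not copied from the NoCoin list, and `detect_miners` is a hypothetical helper name.

```python
import re

# Illustrative signatures keyed by framework name (assumed, not the NoCoin list).
SIGNATURES = {
    "coinhive": re.compile(r"\bCoinHive\.(Anonymous|User)\b"),
    "cryptoloot": re.compile(r"\bCryptoLoot\.Anonymous\b"),
}


def detect_miners(html):
    """Return the sorted names of all frameworks whose signature occurs in the payload."""
    return sorted(name for name, pattern in SIGNATURES.items()
                  if pattern.search(html))


page = '<script>var m = new CoinHive.Anonymous("site-key"); m.start();</script>'
print(detect_miners(page))  # prints ['coinhive']
```

Matching on object names in the retrieved payload only flags potential mining domains, which is consistent with the "# Potential Mining Domains" label in the figure; it does not show that mining code actually executes.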

Summary

● New active measurement infrastructure to study Internet evolution with large-scale, DNS-based multi-protocol measurements
  ● Ambitious goal to cover a large domain name space with longitudinal measurements
  ● Web page showing current statistics and further information about our studies: netray.io

● Methodology (QUIC)
  ● Establish QUIC connections with all IPv4 hosts
  ● Retrieve QUIC version supported by server
● Result
  ● QUIC is on the rise, driven by Google & Akamai
  ● New protocol versions (color shades) come and go quickly
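As an illustration of how supported versions can be read from a server's reply, the sketch below parses a Version Negotiation packet in the version-independent layout later standardized in RFC 8999. That choice is an assumption: the Google QUIC versions shown in the figure used a different header format, and the poster does not describe the scanner's parsing code.

```python
import struct


def parse_version_negotiation(packet):
    """Return the list of 32-bit versions offered in a Version Negotiation packet."""
    if len(packet) < 7 or not packet[0] & 0x80:
        raise ValueError("not a long-header packet")
    (version,) = struct.unpack_from("!I", packet, 1)
    if version != 0:
        raise ValueError("not a Version Negotiation packet")
    offset = 5
    offset += 1 + packet[offset]          # skip DCID length byte and DCID
    if offset >= len(packet):
        raise ValueError("truncated packet")
    offset += 1 + packet[offset]          # skip SCID length byte and SCID
    blob = packet[offset:]
    if len(blob) % 4 != 0:
        raise ValueError("version list is not a multiple of 4 bytes")
    return [struct.unpack_from("!I", blob, i)[0] for i in range(0, len(blob), 4)]


# Hand-built example packet: 4-byte DCID, empty SCID, two offered versions.
example = (bytes([0x80]) + b"\x00\x00\x00\x00"
           + bytes([4]) + b"abcd" + bytes([0])
           + struct.pack("!II", 0x00000001, 0xFF00001D))
versions = parse_version_negotiation(example)
```

Recording the full version list per host, rather than a single yes/no flag, is what allows grouping hosts by version set as in the poster's host-count figure.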

● Methodology (HTTP2)
  ● Establish H2 connections with domains in selected TLDs
  ● Analyze Server Push usage
● Result
  ● HTTP2 adoption is on the rise
  ● Server Push adoption orders of magnitude lower (not shown)
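A first step of such a probe can be sketched with Python's standard `ssl` module by offering `h2` via ALPN and checking what the server selects. This is only an assumption about how one might start: the poster describes full connection establishment and Server Push analysis, which additionally require speaking HTTP/2 after the handshake and are not covered here. The function names are hypothetical.

```python
import socket
import ssl


def build_h2_context():
    """TLS client context that offers HTTP/2 first and HTTP/1.1 as fallback."""
    ctx = ssl.create_default_context()
    ctx.set_alpn_protocols(["h2", "http/1.1"])
    return ctx


def negotiates_h2(host, port=443, timeout=5.0):
    """Return True if the server selects 'h2' during the TLS handshake."""
    ctx = build_h2_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        # server_hostname sets SNI, which matters for virtual hosting.
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.selected_alpn_protocol() == "h2"
```

Calling `negotiates_h2("example.org")` performs one handshake against the given host; passing the domain name as `server_hostname` is what distinguishes a domain-based probe from a plain IPv4 scan of the same address.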

Panels: Influence of Internet Top Lists; Cloud & CDN Adoption

[Figure: CDN Share [%] (axis 0 to 8) per TLD (alexa, com, net, nu, org, se), monthly from 2017-04 to 2018-04.]

[Figure: Cloud Hosted Domains [%] (axis 0 to 40) per TLD (alexa, com, net, nu, org, se), monthly from 2017-04 to 2018-04.]

● Methodology (Cloud & CDN)
  ● Cloud usage: match www. A records to cloud prefixes
  ● (Full-site) CDN usage: match www. CNAME to CDN pattern
● Result
  ● CDN adoption higher on Alexa Top 1M than complete TLDs
  ● Cloud usage higher than full-site CDN hosting
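Both classifiers reduce to simple lookups on the DNS results, as the sketch below illustrates. The provider names, the prefix, and the CNAME suffix are placeholders (documentation address space and example domains); real prefix lists and CDN patterns would have to come from the operators or other sources.

```python
import ipaddress

# Hypothetical classifier inputs; not actual provider data.
CLOUD_PREFIXES = {
    "example-cloud": [ipaddress.ip_network("203.0.113.0/24")],
}
CDN_SUFFIXES = {
    "example-cdn": ".cdn.example.net",
}


def classify_cloud(a_records):
    """Return the provider whose prefix contains one of the www. A records."""
    for address in a_records:
        ip = ipaddress.ip_address(address)
        for provider, networks in CLOUD_PREFIXES.items():
            if any(ip in network for network in networks):
                return provider
    return None


def classify_cdn(cname):
    """Return the CDN whose pattern matches the www. CNAME target, if any."""
    if cname is None:
        return None
    target = cname.rstrip(".").lower()
    for provider, suffix in CDN_SUFFIXES.items():
        if target.endswith(suffix):
            return provider
    return None
```

For example, `classify_cloud(["203.0.113.7"])` returns `"example-cloud"` and `classify_cdn("site-1234.cdn.example.net.")` returns `"example-cdn"`. Matching only the `www.` CNAME captures full-site CDN use, so sites that serve just some embedded resources from a CDN would not be counted by this approach.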

[Figure: Share [%] (axis 0 to 60) from 2018-04-11 to 2018-05-08 for Alexa 1M, Alexa 1k, Umbrella 1M, Umbrella 1k, Majestic 1M, Majestic 1k, c/n/o.]

● Methodology (top lists)
  ● Partition HTTP2 adoption measurement data by top lists
  ● Alexa, Umbrella & Majestic Top 1M
● Result
  ● Results differ by list and rank (e.g., top 1k vs top 1M)
  ● Umbrella has high number of NXDOMAIN entries
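Partitioning by list and rank amounts to computing the adoption share over the first N entries of each list. The sketch below uses made-up domains and a tiny list to show why the top-1k and top-1M views can disagree; the function and variable names are hypothetical.

```python
def adoption_share(ranked_domains, h2_domains, top_n):
    """Percentage of the first top_n list entries that were measured as HTTP2-enabled."""
    head = ranked_domains[:top_n]
    if not head:
        return 0.0
    return 100.0 * sum(domain in h2_domains for domain in head) / len(head)


# Made-up ranking and measurement results for illustration.
ranked = ["a.example", "b.example", "c.example", "d.example"]
h2_enabled = {"a.example", "b.example", "c.example"}

share_top2 = adoption_share(ranked, h2_enabled, 2)   # head of the list
share_top4 = adoption_share(ranked, h2_enabled, 4)   # whole list
```

Here the head of the list shows 100% adoption while the whole list shows 75%, which mirrors the poster's observation that results depend on both the list and the rank cutoff.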

tls.netray.io

Certification Authority Authorization (CAA)

caastudy.github.io toplists.github.io

dns.netray.io

[Figure: # Hosts (axis 0.0 to 4M) per scan date, 20 Aug 2016 to 27 Jul 2018; legend (version sets): 37..35; 38..35; 39..35; 39..37,35; 39..37,35,41; 40..37,35; 41,41,39,35; 41,41..37,35; 43..41,39,35; 44..43,39,35; Other.]

netray.io