TOWARDS UNDERSTANDING DEVELOPING WORLD TRAFFIC Sunghwan Ihm (Princeton) KyoungSoo Park (KAIST) Vivek S. Pai (Princeton)


IMPROVING NETWORK ACCESS IN THE DEVELOPING WORLD

Internet access is a scarce commodity in the developing world: expensive / slow

Our focus: improving performance of connected network access

Non-focus: providing/extending connectivity (e.g., DTN, WiLDNet)

Sunghwan Ihm, Princeton University


POSSIBLE OPTIONS

Web proxy caching
  Whole objects
  Single endpoint (local)
  Designated cacheable traffic only

WAN acceleration
  Packet-level caching
  Mostly for enterprise
  Two (or more) endpoints, coordinated

Effective in first world

DEVELOPING WORLD QUESTIONS

How effective are these approaches?
  Systems designed for first-world use
  Most traffic studies small, first-world focused
  How similar is developing region traffic?

Any new opportunities to exploit?
  Differences in traffic
  Differences in cost/tradeoffs
  System design issues

UNDERSTANDING DEVELOPING WORLD TRAFFIC

Goal

Shape system design by better understanding the traffic optimization opportunities

Requirements

Large-scale, content-focused analysis


PRIOR TRAFFIC ANALYSIS WORK

Large-scale traffic analysis
  Internet Study 2007, 2008/2009 by ipoque
  One million users
  High-level characteristics via DPI
  First-world focus

Developing world traffic analysis
  Du et al. WWW’06, Johnson et al. NSDR’10
  Proxy-level analysis from kiosks, Internet cafes, and community centers

OUR APPROACH

Combine best features
  Large-scale and content-focused
  First world and developing world

Use traffic from CoDeeN content distribution network (CDN)
  Global proxy (500+ PlanetLab nodes)
  Running since 2003
  30+ million requests per day

WHAT TO ANALYZE?

1. Traffic profile

2. Caching opportunities

3. User behavior


DATA COLLECTION

[Diagram labels: Origin Web Server, Local Proxy Cache, User Browser Cache, CoDeeN Cache, WAN]

• Assume local proxy caches
• Focus on cache misses only
• Capture full content

DATA SET

Duration: 1 week (March 25-31, 2010)

# Requests: 157 Million

Volume: 3 TeraBytes

# Clients (unique IPs): 348 K

# Countries/Regions: 190

/8 networks coverage: 61.3%

/16 networks coverage: 24.1%
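The two coverage figures can be read as the share of /8 and /16 prefixes that contain at least one client. The sketch below shows that calculation under two assumptions the slides do not confirm: clients are IPv4 addresses, and the denominator is every possible prefix (2^8 or 2^16) rather than only allocated ones. The function name `prefix_coverage` and the sample addresses are illustrative.

```python
import ipaddress

def prefix_coverage(client_ips, prefix_len):
    """Fraction of all possible IPv4 prefixes of the given length
    that contain at least one client address.

    Assumption: the denominator is every possible prefix (2**prefix_len),
    not only allocated or routable ones.
    """
    seen = set()
    for ip in client_ips:
        addr = int(ipaddress.IPv4Address(ip))
        seen.add(addr >> (32 - prefix_len))  # keep the top prefix_len bits
    return len(seen) / (2 ** prefix_len)

# Illustrative addresses from documentation ranges, not real trace data.
sample = ["192.0.2.1", "192.0.2.77", "198.51.100.5", "203.0.113.9"]
coverage_8 = prefix_coverage(sample, 8)    # distinct /8s: 192, 198, 203
coverage_16 = prefix_coverage(sample, 16)  # distinct /16s: 192.0, 198.51, 203.0
```

Under this reading, 61.3% of the 256 possible /8 prefixes is about 157 prefixes with at least one client.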


TOP COUNTRIES

[Figure: share of requests, bytes, and clients by country. Legend: DE (Germany), US (United States), RU (Russian Federation), AE (United Arab Emirates), PL (Poland), CN (China), SA (Saudi Arabia), Etc. (185 Countries)]

OECD VS. DEVREG

OECD: the first world
  27 high-income economies from OECD member countries
  25% of total traffic

DevReg: the developing world
  The remaining 163 countries and 3 OECD members: Mexico, Poland, and Turkey
  75% of total traffic
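The split above amounts to a simple rule on a client's country. The sketch below assumes two-letter ISO 3166 country codes and takes the OECD membership set as an input rather than hard-coding it; only the three exceptions named on the slide are fixed. `classify_region` and the sample membership set are hypothetical names for illustration.

```python
# The three OECD members the slide places in DevReg (ISO 3166-1 alpha-2 codes).
OECD_EXCEPTIONS = {"MX", "PL", "TR"}

def classify_region(country_code, oecd_members):
    """Return "OECD" or "DevReg" for a two-letter country code.

    oecd_members is supplied by the caller; this sketch does not
    encode the full membership list.
    """
    code = country_code.upper()
    if code in oecd_members and code not in OECD_EXCEPTIONS:
        return "OECD"
    return "DevReg"

# Partial, illustrative membership set (not the full membership list).
sample_members = {"US", "DE", "PL", "MX", "TR"}
labels = {c: classify_region(c, sample_members) for c in ["US", "PL", "CN"]}
```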


ANALYSIS #1: TRAFFIC PROFILE

Conjecture: DevReg users visit low-bandwidth Web pages (small objects and text-heavy)

We often hear a variant of “Offline Wikipedia content suffices for developing world users”

OBJECT SIZE

[Figure: object size plot; the only surviving label is 16KB]

Small: median 3KB vs. 5KB

Large: similar demand/profile

TEXT AND IMAGES

DevReg has a higher fraction of images

Exact opposite of bandwidth conjecture

VIDEO AND AUDIO

DevReg: higher fraction of video & audio
  Music videos and MP3 songs

APPLICATION (FLASH)

DevReg has a higher fraction of application traffic

Median near 7%


ANALYSIS #1 SUMMARY

Some evidence that DevReg-visited sites have smaller objects, but

DevReg users visit large pages as well, and

DevReg users seek a higher fraction of rich content than OECD users


ANALYSIS #2: CACHING OPPORTUNITY

Conjecture: little gain from larger caches
  Some analysis suggests 1GB sufficient
  Typical cache size < 20GB
  Object-based caching

CONTENT-BASED CHUNK CACHING

Split content into chunks
  Name chunks by content (SHA-1 hash)
  Cache chunks instead of objects

Fetch content, send only modified chunks
  Two endpoints needed
  Applies to “uncacheable” content

[Diagram: content split into chunks A B C D E]
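A minimal sketch of the idea, assuming fixed-size chunks for brevity. The slides do not say how chunk boundaries are chosen, and WAN accelerators often use content-defined boundaries instead. The sender tracks which chunk names the receiver already holds and sends only names for those. `ChunkCache`, `encode`, `decode`, and `CHUNK_SIZE` are illustrative names, not part of the presented system.

```python
import hashlib

CHUNK_SIZE = 1024  # assumed chunk size in bytes, for illustration only

def split_chunks(data, size=CHUNK_SIZE):
    """Split bytes into fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def chunk_name(chunk):
    """Name a chunk by its content: the hex SHA-1 digest."""
    return hashlib.sha1(chunk).hexdigest()

class ChunkCache:
    """Receiver-side store mapping chunk names to chunk bytes."""
    def __init__(self):
        self.store = {}

def encode(data, receiver_names):
    """Sender side: emit (name, payload) pairs, with payload None for
    chunks the receiver is known to hold already."""
    message = []
    for chunk in split_chunks(data):
        name = chunk_name(chunk)
        if name in receiver_names:
            message.append((name, None))    # send the name only
        else:
            message.append((name, chunk))   # send name plus bytes
            receiver_names.add(name)
    return message

def decode(message, cache):
    """Receiver side: rebuild the content, filling the cache as it goes."""
    parts = []
    for name, payload in message:
        if payload is not None:
            cache.store[name] = payload
        parts.append(cache.store[name])
    return b"".join(parts)

# Two versions of a page that differ only in the middle chunk.
v1 = b"A" * 1024 + b"B" * 1024 + b"C" * 1024
v2 = b"A" * 1024 + b"X" * 1024 + b"C" * 1024

cache = ChunkCache()
known = set()                    # sender's view of the receiver's cache
first = encode(v1, known)
assert decode(first, cache) == v1
second = encode(v2, known)
assert decode(second, cache) == v2
sent_bytes = sum(len(p) for _, p in second if p is not None)
```

In this toy run the second transfer carries payload only for the changed middle chunk, even though the object as a whole is different and an object cache would refetch all of it.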


OVERALL REDUNDANCY

40% @ 64 KB: objects or parts of large objects

60% @ 1 KB: parts of text pages

65% @ 128 bytes: paragraphs or sentences
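One way to read these numbers is as the share of bytes that fall in chunks already seen earlier in the trace. The sketch below measures that share at several chunk sizes on a toy input, assuming fixed-size chunks; the `redundancy` function and the toy objects are illustrative. The percentages on the slide come from the real trace, not from this code.

```python
import hashlib

def redundancy(objects, chunk_size):
    """Fraction of bytes belonging to chunks whose SHA-1 name was
    already seen earlier in the stream of objects."""
    seen = set()
    total = 0
    redundant = 0
    for data in objects:
        for i in range(0, len(data), chunk_size):
            chunk = data[i:i + chunk_size]
            name = hashlib.sha1(chunk).digest()
            total += len(chunk)
            if name in seen:
                redundant += len(chunk)
            else:
                seen.add(name)
    return redundant / total if total else 0.0

def block(value):
    """A 128-byte block filled with one byte value (toy content)."""
    return bytes([value]) * 128

# Two 1 KB toy objects that differ only in their last 128 bytes.
obj1 = b"".join(block(v) for v in (1, 2, 3, 4, 5, 6, 7, 8))
obj2 = b"".join(block(v) for v in (1, 2, 3, 4, 5, 6, 7, 9))

results = {size: redundancy([obj1, obj2], size) for size in (1024, 512, 128)}
```

On this toy input the measured redundancy rises as chunks get smaller, because a small change invalidates less of the surrounding content. This matches the direction of the trend on the slide.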


CACHE BEHAVIOR SIMULATION

Simulate one week’s traffic
  Cache misses only
  LRU cache replacement policy

Determine size for near-ideal hit rate
  Calculate byte hit ratio (BHR)
  Vary storage size (from 10MB to max)

Results for US, China, and Brazil
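The simulation described above can be sketched as a replay of the request trace through an LRU cache of a given byte capacity, repeated for several capacities. This is a minimal sketch, not the authors' simulator: it assumes each request is a (name, size) pair, where the name could be an object URL or a chunk hash, and that a given name always has the same size. `simulate_lru` and the toy trace are illustrative.

```python
from collections import OrderedDict

def simulate_lru(requests, capacity_bytes):
    """Replay (key, size) requests through an LRU cache and return the
    byte hit ratio: bytes served from cache / total bytes requested."""
    cache = OrderedDict()   # key -> size, least recently used first
    used = 0
    hit_bytes = 0
    total_bytes = 0
    for key, size in requests:
        total_bytes += size
        if key in cache:
            hit_bytes += size
            cache.move_to_end(key)          # mark as most recently used
            continue
        if size > capacity_bytes:
            continue                        # too large to cache at all
        while used + size > capacity_bytes:
            _, evicted = cache.popitem(last=False)  # evict the LRU entry
            used -= evicted
        cache[key] = size
        used += size
    return hit_bytes / total_bytes if total_bytes else 0.0

# Toy trace of (name, size in bytes); not real data.
trace = [("a", 100), ("b", 100), ("a", 100),
         ("c", 100), ("b", 100), ("a", 100)]
bhr_by_size = {cap: simulate_lru(trace, cap) for cap in (100, 200, 300)}
```

Sweeping the capacity and watching where the byte hit ratio stops improving gives the kind of "size for near-ideal hit rate" estimate the slide refers to.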


US – 213 GB

CHINA – 559 GB

BRAZIL – 44 GB

ANALYSIS #2 SUMMARY

Chunk caching useful
  Reduces WAN (cache miss) traffic
  Complements existing Web proxies

Larger caches useful
  Useful reduction in miss rate
  Cheap compared to bandwidth costs

ANALYSIS #3: USER BEHAVIOR

Conjecture: as first-world Web pages get larger, DevReg users suffer delays

Mechanism: observe aborted transfers
  Intentional termination
  Automatic when browsing away

Abort = users bored or downloads slow

CANCELLED OBJECT SIZE C-CDF

Cancelled objects larger than normal (red)

Complete objects (green) much larger than actual download (blue)

Most downloads less than 10MB

CANCELLED TRANSFER VOLUME

17% of transfers are terminated early

Because of the early termination, they account for only 25% of actual traffic

If fully downloaded, they would have been 80% of all bytes
  Overall traffic increase of 375%
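The figures on this slide fit together as follows, taking the rounded percentages at face value. If cancelled transfers make up 25% of observed bytes, completed transfers make up the other 75%. For the cancelled objects to reach 80% of all bytes when fully downloaded, their full size has to be about 3 times the observed total. This is a reconstruction from the slide's numbers, not the authors' stated calculation, and the variable names are illustrative.

```python
observed_total = 1.0        # normalize observed traffic to 1
cancelled_observed = 0.25   # bytes actually sent for cancelled transfers
completed = observed_total - cancelled_observed   # bytes of completed transfers

share_if_full = 0.80        # cancelled share of bytes if fully downloaded
# Solve full / (completed + full) = share_if_full for full.
cancelled_full = share_if_full * completed / (1 - share_if_full)
hypothetical_total = completed + cancelled_full
ratio = hypothetical_total / observed_total
```

Under these assumptions the hypothetical total is about 3.75 times the observed volume, which appears to be where the 375% figure comes from.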


CANCELLED CONTENT TYPES

Most cancelled responses were text

Most bytes from video/audio/application

% CANCELLED REQUESTS CDF

OECD users cancel more often than DevReg users
  Median almost double

ANALYSIS #3 SUMMARY

Many transactions aborted
  Previewing video files
  Content-based caching is effective

OECD users less patient than DevReg
  Cheap bandwidth = more sampling?

CONCLUSIONS

First glimpse at CoDeeN traffic
  Large-scale, content-focused analysis
  OECD and developing world

Many DevReg assumptions are false
  In fact, strong desire for rich content, and
  Patient despite slow connections

Systems implications
  Chunk caching worth more exploration
  Larger caches very useful


http://www.cs.princeton.edu/~sihm/