
Next Steps in Internet Content Delivery

Peter B Danzig

Danzig@danzigthomas.com

Understanding WAN traffic

HOW MUCH WEB TRAFFIC CROSSES THE INTERNET?

How much WAN HTTP traffic?

Assumptions:

250 million internet users

Average 10 kbits/s per user when online

Average 10% online

Yields:

Bandwidth = 250M * 10kb/s * 0.10 = 250 Gbits/s
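A quick sketch of the same back-of-envelope arithmetic in Python (the figures are the slide's assumptions, not measurements):

    # Back-of-envelope WAN HTTP bandwidth, using the slide's assumptions.
    users = 250e6            # 250 million internet users
    rate_bps = 10e3          # average 10 kbits/s per user while online
    fraction_online = 0.10   # average 10% of users online at any moment

    bandwidth_bps = users * rate_bps * fraction_online
    print(f"{bandwidth_bps / 1e9:.0f} Gbits/s")   # -> 250 Gbits/s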

How much WAN HTTP traffic? Observations:

Doubleclick.com is 1% of the web by bytes

Geocities.com is 1% of the web by bytes

Download.microsoft.com is 1% of the web by bytes

Learn their aggregate bandwidth purchases… and estimate internet bandwidth:

250 Gbits/s

Internet Web Sites

(Figure: percent of Internet traffic versus number of web sites, for roughly the top 20, 100, 300, and 1000 sites)

More than 6,000 ISPs

(Figure: percent of Internet traffic versus number of ISPs, from 1 to 10,000 on a log scale)

Internet CDN Market Sizing

• Market Size = $2M/Gb * 250 Gb * 3yr / Sqrt(2)yr

• Revenue Potential = Market Size * Mkt Fraction

• Market size grows by 4.5x every two years

• Today: Mkt = (250 Gb) (Multiplex Factor) + Streaming Mkt

• Streaming Mkt = Yahoo Broadcast + IBeam + Akamai + Real Broadcast Network < 10Gb/s
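The same market-sizing arithmetic, sketched in Python; the $2M/Gb price, the 3-year horizon, the sqrt(2) factor, and the 4.5x growth rate are the slide's figures, while the 10% market fraction is a made-up example:

    import math

    # Market sizing, following the slide's formula.
    price_per_gbps = 2e6       # $2M per Gbit/s (slide's assumption)
    wan_http_gbps = 250        # from the earlier back-of-envelope estimate
    horizon_years = 3

    market_size = price_per_gbps * wan_http_gbps * horizon_years / math.sqrt(2)
    print(f"market size ~ ${market_size / 1e9:.2f}B")

    # Revenue potential for a hypothetical 10% market share.
    mkt_fraction = 0.10        # made-up example value
    print(f"revenue potential ~ ${market_size * mkt_fraction / 1e9:.2f}B")

    # Growth of 4.5x every two years, i.e. scale by 4.5 ** (t / 2).
    for t in (2, 4):
        print(f"after {t} years: ~ ${market_size * 4.5 ** (t / 2) / 1e9:.1f}B")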

Internet CDN Strategy

• Except for the top 1000 sites, the world’s 199,000+ web sites serve minimal bandwidth

• This determines the business strategy

• Account provisioning must be cheap & easy

• Need indirect sales

• Need a bigger, more expensive product bundle

• Customer care must be inexpensive

• Make money from streaming media?

Internet CDN Strategy, cont.

• Live Streaming Media

• Lights, camera, action

• Event connectivity: ISDN or Satellite truck role

• Production and encoding

• Yucky, dirty, icky, labor-intensive, non-cerebral, labor-of-love, crafty stuff

• Work more reminiscent of WebVan than Cisco

Akamai ‘GIF’ Delivery

(Figure: without Akamai, the entire web page is delivered by CNN; when “Akamaized”, the HTML is delivered by CNN and the embedded objects are delivered by Akamai)

KeyNote System Measurements

(Figure: KeyNote measurements with Akamai versus without Akamai)

KeyNote Systems & its Wannabes

Deploys a “footprint” of monitoring agents, a provisioning interface, global log collection, and reports

Agents: emulate URL and page downloads; emulate broadband and dialup access rates

Wannabe competitors: Mercury Interactive, Service Metrics, StreamCheck, etc.

KeyNote: Operational Issues

Where’s the bottleneck: the agent or the agent’s network connection?

Where’s the agent’s DNS resolver?

How to excise mistaken data points from the database?

How can a CDN beat a Keynote benchmark?

How does Keynote’s TCP stack affect its results?

End-to-End CDN Measurements?

Contrast methodology between Johnson et al. and Keynote Systems

Server log analysis (e.g. Web Trends): server logs don’t record page arrival times, as the bytes stay queued in TCP or OS buffers

Client-side reporting (e.g. WebSideStory): place JavaScript on the web page that reports the client experience to an aggregator

HTML Delivery

Consider the web traffic breakdown:

GIFs and JPEGs: 55%
HTML: 25%
J. Random Gunk: 20%

HTML is half of the delivery market, but is HTML 1/3 static and 2/3 dynamic?

HTML Delivery

Delivering static HTML from caches is fast

How can we make dynamic HTML faster?

Compress it or delta-encode it (a rough sketch follows below)

Black magic: transfer it over a TCP tunnel or L2TP (Little’s Law almost always surprises laymen)

Construct or “assemble” it within the CDN via proprietary language extensions
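As a rough illustration of the compression and delta-encoding options above (not any CDN's actual implementation; the page content is made up):

    import difflib
    import zlib

    # A cached copy of the page and a freshly generated, mostly identical
    # dynamic version.
    stories = [f"<p>story {i}</p>" for i in range(50)]
    old_html = "<html><body>\n" + "\n".join(stories) + "\n</body></html>\n"
    new_html = old_html.replace("<p>story 7</p>", "<p>story 7 (updated)</p>")

    # Option 1: compress the whole page.
    compressed = zlib.compress(new_html.encode())

    # Option 2: delta-encode against the cached copy and ship only the diff.
    delta = "".join(difflib.unified_diff(
        old_html.splitlines(keepends=True),
        new_html.splitlines(keepends=True),
    ))

    print(f"full page:  {len(new_html)} bytes")
    print(f"compressed: {len(compressed)} bytes")
    print(f"delta:      {len(delta.encode())} bytes")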

Future of HTML Delivery

Profiling: detect client location and link speed

Interpret XML style sheets at the edge (see Oracle/Akamai)

Insert ads

Compress at the source, decompress in the browser or edge network

Components of a CDN

Distributed server load balancing, e.g. “Internet Mapping”

DNS redirection, hashing, and fault tolerance

Distributed system monitoring

Distributed software configuration management

Components of a CDN (cont.)

Live stream distribution and entry points

Log collection, reporting, and performance monitoring

Client provisioning mechanism

Content management and replication

Network Mapping

Network mapping chooses reasonable data centers to satisfy a client request. We could devote an entire day to mapping.

Briefly, what factors help predict good mapping?

Contracted data center bandwidth

Path characteristics: RTT, bottleneck bandwidth, “experience”, autonomous systems crossed, hop count, observed loss rates, etc.

How do you measure these factors? Mapping is an art (a toy scoring sketch follows below).
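To make "mapping is an art" concrete, a toy scoring sketch; the weights and the candidate measurements are invented for illustration and are not how any real mapper is tuned:

    # Score candidate data centers for a client request by a weighted
    # combination of the path factors listed above (all numbers invented).
    CANDIDATES = {
        "dc-east": {"rtt_ms": 18, "loss": 0.002, "hops": 9,  "spare_gbps": 4.0},
        "dc-west": {"rtt_ms": 65, "loss": 0.001, "hops": 14, "spare_gbps": 9.0},
        "dc-eu":   {"rtt_ms": 95, "loss": 0.010, "hops": 17, "spare_gbps": 2.0},
    }

    def score(m: dict) -> float:
        """Lower is better: penalize RTT, loss, and hop count; reward spare
        contracted bandwidth so a close but full data center is avoided."""
        return (1.0 * m["rtt_ms"]
                + 2000.0 * m["loss"]
                + 2.0 * m["hops"]
                - 5.0 * m["spare_gbps"])

    best = min(CANDIDATES, key=lambda name: score(CANDIDATES[name]))
    print("redirect client to:", best)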

Black Art of Network Mapping

Cisco Boomerang: synchronized DNS servers

Radware’s DSLB box: linear combination of hop count and RTT

F5’s 3DNS: ICMP ping

Alteon, Foundry, Resonate, and others…

Live Stream Distribution

Ubiquitous IP multicast hasn’t emerged

Alternative: IP multicast plus FEC

Yahoo Broadcast’s approach:

Private network links to principal ISPs

Support multicast where available

Otherwise, just blast it by unicast and hope

Live Stream Distribution

Some CDNs attempt to route independent live streams via multiple paths

Encode with simple error correction codes; a better code would increase delay (a toy parity sketch follows below)

Makes client provisioning more challenging: need to get the encoded signal to multiple entry points
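A toy illustration of the simple error-correction idea: one XOR parity packet per group lets the receiver rebuild any single lost packet without a retransmit. Real streaming FEC is stronger; the packet contents here are made up:

    def xor_bytes(a: bytes, b: bytes) -> bytes:
        return bytes(x ^ y for x, y in zip(a, b))

    def parity(packets: list[bytes]) -> bytes:
        """XOR of a group of equal-length packets."""
        out = packets[0]
        for p in packets[1:]:
            out = xor_bytes(out, p)
        return out

    packets = [b"pkt0....", b"pkt1....", b"pkt2....", b"pkt3...."]
    p = parity(packets)

    # Suppose packet 2 is lost in transit: rebuild it from the rest + parity.
    received = [packets[0], packets[1], packets[3]]
    rebuilt = parity(received + [p])
    assert rebuilt == packets[2]
    print("recovered:", rebuilt)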

Live Stream Distribution

Splitter-combiner network burns bandwidth

Subscription and teardown are expensive, given the low median subscriber count, according to Yahoo Broadcast

Mean subscribers? Average subscribers?

Splitter/combiner masks failures too successfully, until hell breaks loose

DNS Redirection, Hashing, and Fault Tolerance

Top-level DNS: uses IP anycast to a dozen DNS servers (or more)

Second-level DNS servers: redirect the client to a reasonable region

Low-level DNS servers: implement something akin to consistent hashing (a minimal two-stage sketch follows below)

Hot spare address takeover to mask machine failures
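A minimal sketch of how such a two-stage DNS decision might be structured; the region table, addresses, and mapping rule are invented, and the real low-level step uses consistent hashing (sketched later):

    import hashlib

    # Second-level step (hypothetical): pick a reasonable region for the client.
    REGIONS = {"us-east": ["10.0.1.1", "10.0.1.2"],
               "eu-west": ["10.0.2.1", "10.0.2.2"]}

    def pick_region(client_ip: str) -> str:
        # Stand-in for real mapping logic.
        return "us-east" if client_ip.startswith("10.0.") else "eu-west"

    # Low-level step (hypothetical): hash the hostname to choose a server
    # within the region.
    def pick_server(hostname: str, region: str) -> str:
        servers = REGIONS[region]
        bucket = int(hashlib.md5(hostname.encode()).hexdigest(), 16)
        return servers[bucket % len(servers)]

    region = pick_region("10.0.42.7")
    print(pick_server("a32.g.akamaitech.net", region))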

Distributed system monitoring

Problem: export monitoring information across thousands of machines running in hundreds of regions

Design principles: aggregation, scalability, extensible data types, fault tolerance, timely delivery, expressible queries

Distributed software configuration management

Manage software and OS on thousands of remote machines

Stage system software pushes

Detect incompatibilities before hell breaks loose

Log collection, reporting, and performance monitoring

Collect and create a database of 10-100 billion log lines per day

Allow customers to see their logs and performance

How would you do this in real time?
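One hypothetical way to approach the real-time question: aggregate per customer as log lines stream in and flush summaries periodically, rather than batch-loading raw lines first. A minimal sketch, with an invented log-line format:

    import time
    from collections import defaultdict

    FLUSH_INTERVAL_S = 60

    def aggregate(lines, flush):
        """Roll raw log lines up into per-customer counters and flush a
        summary every interval (and once at the end)."""
        totals = defaultdict(lambda: {"requests": 0, "bytes": 0})
        last_flush = time.time()
        for line in lines:
            customer, _url, nbytes = line.split()   # e.g. "cust42 /x.gif 1234"
            totals[customer]["requests"] += 1
            totals[customer]["bytes"] += int(nbytes)
            if time.time() - last_flush >= FLUSH_INTERVAL_S:
                flush(dict(totals))                 # ship summaries upstream
                totals.clear()
                last_flush = time.time()
        flush(dict(totals))

    aggregate(["cust42 /logo.gif 1234", "cust42 /page.html 5678", "cust7 /a.jpg 99"],
              flush=print)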

Content management and replication

Reliably update replicated hosting

Mask storage volume boundaries

Enable billing and reclaiming lost space

Consistent Hashing

Cute algorithm for splitting load across multiple servers

Create a permutation of servers for each hash bucket

Add and subtract servers for a given bucket (i.e. its permutation) in the same order (a minimal sketch follows below)
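A minimal sketch of the per-bucket permutation idea, implemented here with rendezvous-style hashing; this is not Akamai's actual construction, and the server names are made up:

    import hashlib

    def _rank(bucket: int, server: str) -> int:
        """Deterministic pseudo-random score for a (bucket, server) pair."""
        return int(hashlib.md5(f"{bucket}:{server}".encode()).hexdigest(), 16)

    def permutation(bucket: int, servers: list[str]) -> list[str]:
        """Each bucket sees the same fixed ordering of servers, regardless of
        which servers are currently alive."""
        return sorted(servers, key=lambda s: _rank(bucket, s))

    def assign(bucket: int, servers: list[str], alive: set[str]) -> str:
        """Map a bucket to the first live server in its permutation, so
        removing a server only moves the buckets that ranked it first."""
        for server in permutation(bucket, servers):
            if server in alive:
                return server
        raise RuntimeError("no servers alive")

    servers = [f"srv{i}" for i in range(8)]
    alive = set(servers)
    before = {b: assign(b, servers, alive) for b in range(1000)}
    alive.discard("srv3")   # take one server down
    after = {b: assign(b, servers, alive) for b in range(1000)}
    moved = sum(1 for b in before if before[b] != after[b])
    print(f"buckets remapped after losing 1 of 8 servers: {moved}/1000")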

Consistent Hashing

http://a32.g.akamaitech.net/

Would a less elegant algorithm suffice? Yes, hit rates are 98-99% anyway; any hash algorithm suffices.

The 2nd level of Akamai DNS servers slightly degrades performance, since DNS TTLs are short

What are the next steps?

Got to address HTTP and compression/delta encoding

What about peer-to-peer for GIFs and video?

How about PVRs (e.g. TiVo) and peer-to-peer?

What about live stream distribution?
