1 EE 122: The World Wide Web Ion Stoica TAs: Junda Liu, DK Moon, David Zats http://inst.eecs.berkeley.edu/~ee122/ (Materials with thanks to Vern Paxson, Jennifer Rexford, and colleagues at UC Berkeley)


EE 122: The World Wide Web

Ion Stoica

TAs: Junda Liu, DK Moon, David Zats
http://inst.eecs.berkeley.edu/~ee122/

(Materials with thanks to Vern Paxson, Jennifer Rexford, and colleagues at UC Berkeley)


Goals of Today’s Lecture

• Main ingredients of the Web
  – URIs, HTML, HTTP
• Key properties of HTTP
  – Request-response, stateless, and resource meta-data
• Performance of HTTP
  – Parallel connections, persistent connections, pipelining
• Web components
  – Clients, proxies, and servers
• Caching vs. replication


The Web – History (I)

1945: Vannevar Bush, Memex:

"a device in which an individual stores all his books, records, and communications, and which is mechanized so that it may be consulted with exceeding speed and flexibility"

Vannevar Bush (1890-1974)

Memex

(See http://www.iath.virginia.edu/elab/hfl0051.html)


The Web – History (II)

1967, Ted Nelson, Xanadu:
• A world-wide publishing network that would allow information to be stored not as separate files but as connected literature
• Owners of documents would be automatically paid via electronic means for the virtual copying of their documents
• Coined the term “Hypertext”

Ted Nelson


The Web – History (III)

World Wide Web (WWW): a distributed database of “pages” linked through the HyperText Transfer Protocol (HTTP)
• First HTTP implementation (1990)
  – Tim Berners-Lee at CERN
• HTTP/0.9 (1991)
  – Simple GET command for the Web
• HTTP/1.0 (1992)
  – Client/Server information, simple caching
• HTTP/1.1 (1996)

Tim Berners-Lee


Web Components

• Content
  – Objects
• Clients
  – Send requests / Receive responses
• Servers
  – Receive requests / Send responses
  – Store or generate the responses
• Proxies
  – Placed between clients and servers
    – Act as a server for the client, and a client to the server
  – Provide extra functions
    – Caching, anonymization, logging, transcoding, filtering access
  – Explicit or transparent (“interception”)


HTML Content: How?

• A Web page has several components
  – Base HTML file
  – Referenced objects (e.g., images)
• HyperText Markup Language (HTML)
  – Representation of hypertext documents in ASCII format
  – Web browsers interpret HTML when rendering a page
  – Several functions: format text, reference images, embed hyperlinks (HREF)
• Straightforward to learn
  – Syntax easy to understand
  – Authoring programs can auto-generate HTML
  – Source almost always available


URI Content: How?

• Uniform Resource Identifier (URI)
• Uniform Resource Locator (URL)
  – Provides a means to get the resource
  – http://www.ietf.org/rfc/rfc3986.txt
• Uniform Resource Name (URN)
  – Names a resource independent of how to get it
  – urn:ietf:rfc:3986 is a standard URN for RFC 3986


URL Syntax Content: How?

protocol://hostname[:port]/directorypath/resource
(e.g., http://inst.eecs.berkeley.edu/~ee122/fa08/index.html)

• protocol
  – http, ftp, https, smtp, rtsp, etc.
• hostname
  – Fully Qualified Domain Name (FQDN), IP address
• port
  – Defaults to protocol’s standard port, e.g., http: 80/tcp, https: 443/tcp
• directory path
  – Hierarchical, often reflecting file system
• resource
  – Identifies the desired resource
  – Can also extend to program executions:

http://us.f413.mail.yahoo.com/ym/ShowLetter?box=%40B%40Bulk&MsgId=2604_1744106_29699_1123_1261_0_28917_3552_1289957100&Search=&Nhead=f&YY=31454&order=down&sort=date&pos=0&view=a&head=b
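The URL parts listed above can be pulled apart with Python's standard `urllib.parse` module. This is a minimal sketch: the `split_url` helper and the `DEFAULT_PORTS` table are illustrative names, not part of the lecture, and only the two default ports named on the slide are filled in.

```python
from urllib.parse import urlsplit, parse_qs

# Standard ports from the slide; other schemes are left out of this sketch.
DEFAULT_PORTS = {"http": 80, "https": 443}


def split_url(url):
    """Break a URL into the parts named on the slide."""
    parts = urlsplit(url)
    # Everything up to the last "/" is the directory path; the rest is the resource.
    directory, _, resource = parts.path.rpartition("/")
    return {
        "protocol": parts.scheme,
        "hostname": parts.hostname,
        # urlsplit reports None when no port is given, so fall back to the default.
        "port": parts.port or DEFAULT_PORTS.get(parts.scheme),
        "directory_path": directory + "/",
        "resource": resource,
        # Arguments for a program execution arrive as the query string.
        "query": parse_qs(parts.query),
    }


info = split_url("http://inst.eecs.berkeley.edu/~ee122/fa08/index.html")
print(info)
```

For a program-execution URL such as the mail example, the `query` entry holds the decoded arguments (for instance `%40B%40Bulk` decodes to `@B@Bulk`).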


HTTP Client-Server: How?

• HyperText Transfer Protocol (HTTP)
  – Client-server protocol for transferring resources
• Important properties:
  – Request-response protocol
  – Resource metadata
  – Stateless
  – ASCII format

% telnet www.cs.berkeley.edu 80
GET /istoica/ HTTP/1.0
<blank line, i.e., CRLF>


HTTP Big Picture

[Figure: timeline between Client and Server. Request image 1, Transfer image 1; Request image 2, Transfer image 2; Request text, Transfer text; Finish display page.]


Client-to-Server Communication

HTTP Request Message
• Request line: method, resource, and protocol version
• Request headers: provide information or modify request
• Body: optional data (e.g., to “POST” data to the server)

GET /somedir/page.html HTTP/1.1    (request line)
Host: www.someschool.edu           (header lines)
User-agent: Mozilla/4.0
Connection: close
Accept-language: fr
(blank line)                       (carriage return, line feed: indicates end of message; not optional)
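The request layout above (request line, header lines, terminating blank line) can be sketched as a small serializer. The `build_request` function is a hypothetical helper for illustration; the header values are the ones shown on the slide.

```python
def build_request(method, resource, headers, version="HTTP/1.1", body=b""):
    """Serialize an HTTP request: request line, header lines, blank line, body."""
    lines = [f"{method} {resource} {version}"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # Each line ends in CRLF; the extra CRLF is the blank line that ends the headers.
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + body


request = build_request(
    "GET",
    "/somedir/page.html",
    {
        "Host": "www.someschool.edu",
        "User-agent": "Mozilla/4.0",
        "Connection": "close",
        "Accept-language": "fr",
    },
)
print(request.decode("ascii"))
```

Leaving out the final CRLF pair would leave the server waiting for more header lines, which is why the blank line is mandatory.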


Client-to-Server Communication

HTTP Request Message
• Request line: method, resource, and protocol version
• Request headers: provide information or modify request
• Body: optional data (e.g., to “POST” data to the server)

Request methods include:
• GET: Return current value of resource, run program, …
• HEAD: Return the meta-data associated with a resource
• POST: Update resource, provide input to a program, …

Headers include:
• Useful info for the server (e.g., desired language)


Server-to-Client Communication

HTTP Response Message
• Status line: protocol version, status code, status phrase
• Response headers: provide information
• Body: optional data

HTTP/1.1 200 OK                        (status line: protocol, status code, status phrase)
Connection: close                      (header lines)
Date: Thu, 06 Aug 2006 12:00:15 GMT
Server: Apache/1.3.0 (Unix)
Last-Modified: Mon, 22 Jun 2006 ...
Content-Length: 6821
Content-Type: text/html
(blank line)
data data data data data ...           (data, e.g., requested HTML file)
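The response layout above can be taken apart in the same way. The following is a rough sketch, not a complete HTTP parser: `parse_response` is a hypothetical helper, and the sample message uses a made-up 11-byte body so that `Content-Length` can be checked.

```python
def parse_response(raw):
    """Split a raw HTTP response into status line fields, headers, and body."""
    # The first blank line (CRLF CRLF) separates the headers from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("ascii").split("\r\n")
    version, code, phrase = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so normalize them for lookup.
        headers[name.strip().lower()] = value.strip()
    return version, int(code), phrase, headers, body


raw = (
    b"HTTP/1.1 200 OK\r\n"
    b"Connection: close\r\n"
    b"Content-Length: 11\r\n"
    b"Content-Type: text/html\r\n"
    b"\r\n"
    b"<p>hi</p>\r\n"
)
version, code, phrase, headers, body = parse_response(raw)
print(version, code, phrase, headers["content-type"], len(body))
```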


Server-to-Client Communication

HTTP Response Message
• Status line: protocol version, status code, status phrase
• Response headers: provide information
• Body: optional data

Response code classes
• Similar to other ASCII application protocols like SMTP

Code   Class           Example
1xx    Informational   100 Continue
2xx    Success         200 OK
3xx    Redirection     304 Not Modified
4xx    Client error    404 Not Found
5xx    Server error    503 Service Unavailable
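Because the class is carried by the first digit of the code, a client can classify a response without knowing every individual code. A minimal sketch (the `status_class` name is illustrative):

```python
STATUS_CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client error",
    5: "Server error",
}


def status_class(code):
    """Return the class name for a three-digit HTTP status code."""
    if not 100 <= code <= 599:
        raise ValueError(f"not an HTTP status code: {code}")
    # Integer division by 100 keeps only the leading digit.
    return STATUS_CLASSES[code // 100]


for code in (100, 200, 304, 404, 503):
    print(code, status_class(code))
```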


Web Server: Generating a Response

• Return a file
  – URL matches a file (e.g., /www/index.html)
  – Server returns file as the response
  – Server generates appropriate response header
• Generate response dynamically
  – URL triggers a program on the server
  – Server runs program and sends output to client
• Return meta-data with no body


HTTP Resource Meta-Data

• Meta-data
  – Info about a resource
  – A separate entity
• Examples:
  – Size of a resource
  – Last modification time
  – Type of the content
    – Data format classification, e.g., Content-Type: text/html
    – Enables browser to automatically launch an appropriate viewer
    – Borrowed from e-mail’s Multipurpose Internet Mail Extensions (MIME)
• Usage example: Conditional GET Request
  – Client requests object “If-modified-since”
  – If object hasn’t changed, server returns “HTTP/1.1 304 Not Modified”
  – No body in the server’s response, only a header


HTTP is Stateless Client-Server: How?

• Stateless protocol
  – Each request-response exchange treated independently
  – Servers not required to retain state
• This is good
  – Improves scalability on the server-side
    – Don’t have to retain info across requests
    – Can handle higher rate of requests
    – Order of requests doesn’t matter
• This is bad
  – Some applications need persistent state
    – Need to uniquely identify user or store temporary info
    – e.g., Shopping cart, user preferences and profiles, usage tracking, …


State in a Stateless Protocol: Cookies

• Client-side state maintenance
  – Client stores small(?) state on behalf of server
  – Client sends state in future requests to the server
• Can provide authentication

[Figure: client sends Request; server sends Response with “Set-Cookie: XYZ”; client’s next Request carries “Cookie: XYZ”.]
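The cookie exchange can be simulated without any network code. This is a simplified sketch under stated assumptions: `SESSIONS` and `handle_request` are hypothetical names, the cookie is a bare opaque value as on the slide (real `Set-Cookie` headers carry `name=value` pairs plus attributes), and the shopping cart is just a Python list.

```python
import uuid

# Hypothetical server-side table mapping a cookie value to per-user state.
SESSIONS = {}


def handle_request(request_headers):
    """Return (response_headers, cart) for one request, using a cookie to find state."""
    cookie = request_headers.get("Cookie")
    response_headers = {}
    if cookie not in SESSIONS:
        # First visit (or unknown cookie): create state and ask the client to store its key.
        cookie = uuid.uuid4().hex
        SESSIONS[cookie] = []
        response_headers["Set-Cookie"] = cookie
    return response_headers, SESSIONS[cookie]


# First request carries no cookie, so the server issues one.
first_headers, cart = handle_request({})
cart.append("textbook")

# The client echoes the value back, and the server finds the same cart.
second_headers, same_cart = handle_request({"Cookie": first_headers["Set-Cookie"]})
print(same_cart)
```

HTTP itself stays stateless here: each call to `handle_request` sees only the headers of that one request, and continuity comes entirely from the value the client sends back.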


Putting All Together Client Server: How?

• Client-Server
  – Request-Response
  – HTTP
  – Stateless
    – Get state with cookies
• Content
  – URI/URL
  – HTML
  – Meta-data


Web Browser

• Is the client
• Generates HTTP requests
  – User types URL, clicks a hyperlink or bookmark, clicks “reload” or “submit”
  – Automatically downloads embedded images
• Submits the requests (fetches content)
  – Via one or more HTTP connections
• Presents the response
  – Parses HTML and renders the Web page
  – Invokes helper applications (e.g., Acrobat, MediaPlayer)
• Maintains cache
  – Stores recently-viewed objects and ensures freshness


Web Browser History

1990, WorldWideWeb, Tim Berners-Lee, NeXT computer

1993, Mosaic, Marc Andreessen and Eric Bina

1994, Netscape

1995, Internet Explorer

….


Web Server

• Handle client request:
  1. Accept a TCP connection
  2. Read and parse the HTTP request message
  3. Translate the URI to a resource
  4. Determine whether the request is authorized
  5. Generate and transmit the response
• Web site vs. Web server
  – Web site: one or more Web pages and objects united to provide the user an experience of a coherent collection
  – Web server: program that satisfies client requests for Web resources
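Steps 2 through 5 of the request-handling sequence can be sketched as a pure function over the bytes read from an already accepted connection (step 1 is left out so the example runs without a network). Everything here is illustrative: `DOCUMENTS` stands in for a file system, `RESTRICTED` for an access-control list, and `handle` and `respond` are hypothetical names.

```python
# Hypothetical document root and access list, used only for illustration.
DOCUMENTS = {"/index.html": b"<html>hello</html>", "/private.html": b"secret"}
RESTRICTED = {"/private.html"}


def handle(raw_request):
    """Steps 2-5 for one request already read from an accepted connection."""
    # Step 2: read and parse the request line.
    request_line = raw_request.split(b"\r\n", 1)[0].decode("ascii")
    try:
        method, uri, version = request_line.split(" ")
    except ValueError:
        return respond(400, "Bad Request")
    # Step 3: translate the URI to a resource.
    body = DOCUMENTS.get(uri)
    if body is None:
        return respond(404, "Not Found")
    # Step 4: determine whether the request is authorized.
    if uri in RESTRICTED:
        return respond(403, "Forbidden")
    # Step 5: generate the response (HEAD gets the headers only).
    return respond(200, "OK", body, include_body=(method != "HEAD"))


def respond(code, phrase, body=b"", include_body=True):
    """Build a status line, a Content-Length header, a blank line, and the body."""
    head = f"HTTP/1.0 {code} {phrase}\r\nContent-Length: {len(body)}\r\n\r\n"
    return head.encode("ascii") + (body if include_body else b"")


print(handle(b"GET /index.html HTTP/1.0\r\n\r\n").decode("ascii"))
```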


5 Minute Break

Questions Before We Proceed?


HTTP Performance

• Most Web pages have multiple objects (“items”)
  – e.g., HTML file and a bunch of embedded images
• How do you retrieve those objects?
  – One item at a time
• What transport behavior does this remind you of?


Fetch HTTP Items: Stop & Wait

[Figure: timeline between Client and Server, with a Time axis. Start fetching page; Request item 1, Transfer item 1; Request item 2, Transfer item 2; Request item 3, Transfer item 3; Finish; display page. Annotation: ≥2 RTTs per object.]


Improving HTTP Performance: Concurrent Requests & Responses

• Use multiple connections in parallel
• Does not necessarily maintain order of responses
• Is this fair?
  – N parallel connections use bandwidth N times more aggressively than just one
  – What’s a reasonable/fair limit as traffic competes with that of other users?
• Client = Why?
• Server = Why?
• Network = Why?

[Figure: requests R1, R2, R3 and transfers T1, T2, T3 on parallel connections.]


Improving HTTP Performance: Pipelined Requests & Responses

• Batch requests and responses
  – Reduce connection overhead
  – Multiple requests sent in a single batch
  – Small items (common) can also share segments
• Maintains order of responses
  – Item 1 always arrives before item 2
• How is this different from concurrent requests/responses?

[Figure: timeline between Client and Server. Request 1, Request 2, Request 3 sent together; Transfer 1, Transfer 2, Transfer 3 returned in order.]


Improving HTTP Performance: Persistent Connections

• Enables multiple transfers per connection
  – Maintain TCP connection across multiple requests
  – Including transfers subsequent to current page
  – Client or server can tear down connection
• Performance advantages:
  – Avoid overhead of connection set-up and tear-down
  – Allow TCP to learn more accurate RTT estimate
  – Allow TCP congestion window to increase
  – i.e., leverage previously discovered bandwidth
• Default in HTTP/1.1
• Can use this to batch requests on a single connection

Example: 5 objects, RTT = 50 ms
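The 5-object, 50 ms example can be worked through with back-of-the-envelope arithmetic. The sketch below rests on assumptions the slide does not state: all objects are known up front, transmission time and TCP slow start are ignored, a new connection costs one RTT of set-up, and the client may open as many parallel connections as it likes. The `fetch_times_ms` name is illustrative.

```python
def fetch_times_ms(num_objects, rtt_ms):
    """Rough page-fetch latency under four strategies, ignoring transmission time."""
    return {
        # New connection per object, one after another: 1 RTT setup + 1 RTT request/response.
        "sequential, non-persistent": num_objects * 2 * rtt_ms,
        # All connections opened at once: every object pays its 2 RTTs concurrently.
        "concurrent connections": 2 * rtt_ms,
        # One connection: 1 RTT setup, then 1 RTT per request/response.
        "persistent, no pipelining": (1 + num_objects) * rtt_ms,
        # One connection: 1 RTT setup, then all requests sent back to back.
        "persistent, pipelined": 2 * rtt_ms,
    }


times = fetch_times_ms(num_objects=5, rtt_ms=50)
for strategy, ms in times.items():
    print(f"{strategy}: {ms} ms")
```

Under these assumptions the stop-and-wait fetch takes 500 ms, a persistent connection without pipelining takes 300 ms, and both concurrent connections and pipelining bring the total down to 100 ms.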


Improving HTTP Performance

• Many clients transfer same information
  – Generates redundant server and network load
  – Clients experience unnecessary latency

[Figure: network diagram labeled Server, Clients, Backbone ISP, ISP-1, ISP-2.]


Improving HTTP Performance: Caching

• How? Modifier to GET requests:
  – If-modified-since: returns “not modified” if resource not modified since specified time
• Response header:
  – Expires: how long it’s safe to cache the resource
  – No-cache: ignore all caches; always get resource directly from server
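A cache can use these response headers to decide whether a stored copy may be reused without contacting the server. The following is a minimal sketch, assuming the "no-cache" directive arrives in a `Cache-Control` header (its HTTP/1.1 form) and that a response without `Expires` is always revalidated; real caches apply more rules. The `can_serve_from_cache` name is hypothetical.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def can_serve_from_cache(response_headers, now):
    """Decide whether a cached response may be reused without contacting the server."""
    if "no-cache" in response_headers.get("Cache-Control", "").lower():
        return False
    expires = response_headers.get("Expires")
    if expires is None:
        # No explicit lifetime: this sketch plays safe and revalidates.
        return False
    # HTTP dates use the e-mail date format, so the e-mail parser can read them.
    return now < parsedate_to_datetime(expires)


cached = {"Expires": "Sun, 27 Aug 2006 22:25:50 GMT"}
print(can_serve_from_cache(cached, datetime(2006, 8, 27, 12, 0, tzinfo=timezone.utc)))
```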


Improving HTTP Performance: Caching on the Client

• Example: Conditional GET Request
  – Return resource only if it has changed at the server
  – Save server resources!
• How?
  – Client specifies “if-modified-since” time in request
  – Server compares this against “last modified” time of desired resource
  – Server returns “304 Not Modified” if resource has not changed
  – …. or a “200 OK” with the latest version otherwise

Request from client to server:

GET /~ee122/fa08/ HTTP/1.1
Host: inst.eecs.berkeley.edu
User-Agent: Mozilla/4.03
If-Modified-Since: Sun, 27 Aug 2006 22:25:50 GMT
<CRLF>
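The server side of a conditional GET comes down to one timestamp comparison. A minimal sketch, assuming the resource's last-modified time is available as a timezone-aware UTC `datetime`; `conditional_get` is a hypothetical helper, not the API of any particular server.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def conditional_get(request_headers, last_modified, body):
    """Return (status, headers, body) for a GET that may carry If-Modified-Since."""
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}
    since = request_headers.get("If-Modified-Since")
    if since is not None and last_modified <= parsedate_to_datetime(since):
        # Unchanged since the client's copy: header only, no body.
        return 304, headers, b""
    return 200, headers, body


request_headers = {"If-Modified-Since": "Sun, 27 Aug 2006 22:25:50 GMT"}
unchanged = datetime(2006, 8, 20, tzinfo=timezone.utc)
status, headers, body = conditional_get(request_headers, unchanged, b"<html>page</html>")
print(status, headers, body)
```

In the 304 case the client reuses its cached copy, so the body is never retransmitted.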


Improving HTTP Performance: Caching with Reverse Proxies

• Cache documents close to server
  – Decrease server load
• Typically done by content providers
• Only works for static content

[Figure: network diagram labeled Clients, Backbone ISP, ISP-1, ISP-2, Server, Reverse proxies.]


Improving HTTP Performance: Caching with Forward Proxies

• Cache documents close to clients
  – Reduce network traffic and decrease latency
• Typically done by ISPs or corporate LANs

[Figure: network diagram labeled Clients, Backbone ISP, ISP-1, ISP-2, Server, Reverse proxies, Forward proxies.]


Improving HTTP Performance: Caching w/ Content Distribution Networks

• Integrate forward and reverse caching functionality
  – One overlay network (usually) administered by one entity
  – e.g., Akamai
• Provide document caching
  – Pull: Direct result of clients’ requests
  – Push: Expectation of high access rate
• Also do some processing
  – Handle dynamic web pages
  – Transcoding


Improving HTTP Performance: Caching with CDNs (cont.)

[Figure: network diagram labeled Clients, ISP-1, ISP-2, Backbone ISP, Server, Forward proxies, CDN.]


Example: Akamai

• Akamai creates new domain names for each client content provider
  – e.g., a128.g.akamai.net
• The CDN’s DNS servers are authoritative for the new domains
• The client content provider modifies its content so that embedded URLs reference the new domains
  – “Akamaize” content, e.g.: http://www.cnn.com/image-of-the-day.gif becomes http://a128.g.akamai.net/image-of-the-day.gif
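The URL rewriting step can be illustrated with a few lines of standard-library Python. This is only a sketch of the idea, not Akamai's actual tooling: `akamaize` is a hypothetical function, and the host names are the ones from the slide.

```python
from urllib.parse import urlsplit, urlunsplit


def akamaize(url, origin_host, cdn_host):
    """Point a URL at the CDN's domain if it refers to the origin host."""
    parts = urlsplit(url)
    if parts.hostname != origin_host:
        return url  # Leave third-party URLs alone.
    # Swap only the host; scheme, path, and query stay as they were.
    return urlunsplit(parts._replace(netloc=cdn_host))


rewritten = akamaize(
    "http://www.cnn.com/image-of-the-day.gif", "www.cnn.com", "a128.g.akamai.net"
)
print(rewritten)
```

After the rewrite, the browser resolves the new host name through the CDN's DNS servers, which lets the CDN choose which of its servers answers the request.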


Example: Akamai

[Figure: Akamai example with steps labeled a, b, c. Labeled elements: “get http://www.cnn.com”; DNS server for www.cnn.com; local DNS server; akamai.net DNS servers; “lookup a128.g.akamai.net”.]

• www.cnn.com “Akamaizes” its content.
• “Akamaized” response object has inline URLs for secondary content at a128.g.akamai.net and other Akamai-managed DNS names.
• Akamai servers store/cache secondary content for “Akamaized” services.


Improving HTTP Performance: Caching vs. Replication

• Why move content closer to users?
  – Reduce latency for the user
  – Reduce load on the network and the server
• How?
  – Caching
    – Replicate content “on demand” after a request
    – Store the response message locally for future use
    – Challenges:
      – May need to verify if the response has changed
      – … and some responses are not cacheable
  – Replication
    – Planned replication of content in multiple locations
    – Update of resources handled outside of HTTP
    – Can replicate scripts that create dynamic responses


Conclusions

• Key ideas underlying the Web
  – Uniform Resource Identifier (URI), Locator (URL)
  – HyperText Markup Language (HTML)
  – HyperText Transfer Protocol (HTTP)
  – Browser helper applications based on content type
• Performance implications
  – Concurrent connections, pipelining, persistent conns.
• Main Web components
  – Clients, servers, proxies, CDNs
• Next lecture: drilling down to the link layer
  – K & R 5.1-5.3, 5.4.1