04/08/23 HTTP, CGI and Cookies 1
Foundations of the Web: HTTP, CGI and Cookies
Ethan Cerami, New York University

Road Map

- HTTP Overview
- Example HTTP Session
- HTTP 1.0 v. 1.1
- Structure of Client Requests/Server Responses
- CGI Overview
- Cookies Overview

HTTP Overview

- HTTP: HyperText Transfer Protocol
- Developed by Tim Berners-Lee, 1990
- Enables web clients to request documents from web servers
- Stateless protocol:
  - Each HTTP request is completely independent.
  - Web servers do not retain any memory of related requests.
  - (Cookies are actually used to maintain state, but more on this later…)

HTTP Client/Server

- Client/Server Architecture
- Client: web browser that requests a document.
  - Examples: Microsoft Internet Explorer, Netscape Navigator
- Server: web server that returns a document.
  - Examples: Apache Web Server, Microsoft IIS, etc.

HTTP Client/Server

- Client (web browser) to web server: Give me /index.html
- Web server to client: Here you go...

HTTP via Telnet

- You can run HTTP via the UNIX telnet command.
- Instructions:
  - Log into your UNIX account
  - telnet www.yahoo.com 80
  - GET /
- Good method to learn the details of HTTP

Sample Telnet Session

    bash-2.04$ telnet www.yahoo.com 80
    Trying 216.32.74.50...
    Connected to www.yahoo.akadns.net.
    Escape character is '^]'.
    GET /
    HTTP/1.0 200 OK
    Content-Length: 15582
    Content-Type: text/html

    <html><head><title>Yahoo!</title><base href=http://www.yahoo.com/><meta http-equiv="PICS-Label" content='(PICS-1.1 "http://www.rsac.org/ratingsv01.html" l gen true for "http://www.yahoo.com" r (n 0 s 0 v 0 l))'></head><body><center><form action=http://search.yahoo.com/bin/search><map name=m><area coords="0,0,52,52" href=r/a1><area coords="53,0,121,52" href=r/p1><area coords="122,0,191,52" href=r/m1><area
    ...
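
The same exchange can be driven from a program instead of telnet. The following is a minimal Python sketch, not part of the original slides: the helper names `build_request` and `fetch` are hypothetical, and it assumes the host still answers plain HTTP on port 80. It sends a full HTTP/1.0 request line with a Host header rather than the bare `GET /` typed above, since many servers expect that.

```python
import socket


def build_request(host: str, path: str = "/") -> bytes:
    """Build the raw bytes of a minimal HTTP/1.0 GET request."""
    lines = [
        f"GET {path} HTTP/1.0",  # request line: method, document, version
        f"Host: {host}",         # optional in HTTP 1.0, but widely expected
        "",                      # empty line marks the end of the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def fetch(host: str, path: str = "/", port: int = 80) -> bytes:
    """Send the request over a plain TCP socket and return the raw reply."""
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(build_request(host, path))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closed the connection: reply is complete
                break
            chunks.append(data)
    return b"".join(chunks)
```

Calling `fetch("www.yahoo.com")` and printing the first few hundred bytes should show a status line and headers in the same shape as the session above, although a present-day server may well answer with a redirect rather than 200 OK.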

Example HTTP Session

- Client requests the following URL: http://hypothetical.ora.com:80/
- Anatomy of the request:
  - http:// HyperText Transfer Protocol; other options: ftp, mailto, etc.
  - hypothetical.ora.com: host name
  - :80: port number. 80 is reserved for HTTP. Ports can range from 1 to 65,535.
  - /: root document
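
The breakdown above can be checked mechanically. This is a small Python sketch using the standard library's `urllib.parse.urlsplit`; the function name `url_anatomy` is hypothetical, and falling back to port 80 and to `/` when the URL omits them is an assumption made for illustration.

```python
from urllib.parse import urlsplit


def url_anatomy(url: str) -> dict:
    """Split a URL into the parts named on the slide."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,      # e.g. "http"
        "host": parts.hostname,      # e.g. "hypothetical.ora.com"
        "port": parts.port or 80,    # assumption: default to HTTP's port 80
        "path": parts.path or "/",   # assumption: empty path means the root
    }
```

For the slide's URL this yields the scheme `http`, the host `hypothetical.ora.com`, port 80 and the path `/`.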

The Client Request

Actual browser request:

    GET / HTTP/1.1
    Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*
    Accept-Language: en-us
    Accept-Encoding: gzip, deflate
    User-Agent: Mozilla/4.0 (compatible; MSIE 5.01; Windows NT)
    Host: hypothetical.ora.com
    Connection: Keep-Alive

Anatomy of the Client Request

- GET / HTTP/1.1
  - Requests the root / document.
  - Specifies HTTP version 1.1.
  - HTTP versions: 1.0 and 1.1 (more on this later…)
- Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*
  - Indicates what types of media the browser will accept.
- Accept-Language: en-us
  - Browser’s preferred language
- Accept-Encoding: gzip, deflate
  - Accepts compressed data (speeds download times).
- User-Agent: Mozilla/4.0 (compatible; MSIE 5.01; Windows NT)
  - Indicates the browser type.
- Host: hypothetical.ora.com
  - Required for HTTP 1.1
  - Optional for HTTP 1.0
  - A server may host multiple hostnames. Hence, the browser indicates the host name here.
- Connection: Keep-Alive
  - Enables “persistent connections”.
  - Faster performance (more later…)
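
To make the structure concrete, here is a minimal Python sketch that splits a raw request like the one above into its request line and headers. The names `parse_request` and `SAMPLE_REQUEST` are hypothetical, and the parser is deliberately naive (no header folding, no repeated headers); it is an illustration, not how any particular server does it.

```python
def parse_request(raw: str):
    """Split a raw HTTP request into (method, target, version, headers, body)."""
    head, _, body = raw.partition("\r\n\r\n")   # blank line ends the headers
    lines = head.split("\r\n")
    method, target, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # names are case-insensitive
    return method, target, version, headers, body


# The browser request shown on the slides, with explicit CRLF line endings.
SAMPLE_REQUEST = (
    "GET / HTTP/1.1\r\n"
    "Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*\r\n"
    "Accept-Language: en-us\r\n"
    "Accept-Encoding: gzip, deflate\r\n"
    "User-Agent: Mozilla/4.0 (compatible; MSIE 5.01; Windows NT)\r\n"
    "Host: hypothetical.ora.com\r\n"
    "Connection: Keep-Alive\r\n"
    "\r\n"
)
```

Parsing `SAMPLE_REQUEST` gives the method `GET`, the target `/`, the version `HTTP/1.1`, and a header dictionary in which `host` maps to `hypothetical.ora.com`.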

Server Response

    HTTP/1.1 200 OK
    Date: Mon, 24 Sept 2003 20:54:26 GMT
    Server: Apache/1.3.6 (Unix)
    Last-Modified: Mon, 24 Sept 2003 14:06:11 GMT
    Content-length: 327
    Connection: close
    Content-type: text/html

    <title>Sample Homepage</title>
    <img src="/images/oreilly_mast.gif">
    <h1>Welcome</h1>Hi there, this is a simple web page.
    Granted, it may not be as elegant as some other web pages you've seen on the net, but there are some common qualities...

Anatomy of Server Response

- HTTP/1.1 200 OK
  - Server status code
  - Code 200: document was found
  - We will examine other status codes shortly.
- Date: Mon, 24 Sept 2003 20:54:26 GMT
  - Date on the server, in GMT (Greenwich Mean Time)
- Last-Modified: Mon, 24 Sept 2003 14:06:11 GMT
  - Indicates the time when the document was last modified.
  - Very useful for browser caching.
  - If a browser already has the page in its cache, it may not need to request the whole document again (more later…)
- Content-length: 327
  - Number of bytes in the body of the response (the document itself, not counting the headers).
- Connection: close
  - Indicates that the server will close the connection.
  - If the client wants to send another request, it will need to open another connection to the server.
- Content-type: text/html
  - Indicates the MIME type of the returned document.
  - MIME: Multipurpose Internet Mail Extensions
  - Enables web servers to return binary or text files.
  - Other MIME categories: audio, video, images, xml
  - Full list of MIME types available online at: http://www.iana.org/assignments/media-types/
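
As an illustration of how a server might pick the Content-type value, here is a short Python sketch using the standard library's `mimetypes` module. The function name `content_type_for` is hypothetical, and the fallback to `application/octet-stream` for unknown files is an assumption, not something the slides specify.

```python
import mimetypes


def content_type_for(filename: str) -> str:
    """Guess the MIME type to send for a file, based on its extension."""
    guessed, _encoding = mimetypes.guess_type(filename)
    # Assumption: treat anything unrecognized as generic binary data.
    return guessed or "application/octet-stream"
```

With this sketch, `index.html` maps to `text/html` and `logo.gif` maps to `image/gif`.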

Anatomy of Server Response

The actual HTML document:

    <title>Sample Homepage</title>
    <img src="/images/oreilly_mast.gif">
    <h1>Welcome</h1>Hi there, this is a simple web page. Granted, it may not
    be as elegant as some other web pages you've seen on the net, but there are
    some common qualities…

HTTP 1.0 v. 1.1

Getting Images

- Once a browser receives an HTML page, it makes separate connections to retrieve the images.
- Client (web browser) to web server: Give me /index.html
- Web server to client: Here you go...
- Client to web server: Now, give me logo.gif
- Web server to client: Here you go...

HTTP 1.0 v. 1.1

- HTTP 1.0:
  - For each request, you must open a new connection with the server.
- HTTP 1.1:
  - For each request, the default action is to maintain an open connection with the server.
  - Faster, persistent connections
  - Supported by most browsers and servers.

Example: HTTP 1.0 v. 1.1

- HTTP 1.0: Get HTML page plus images
  - Open connection: GET /index.html
  - Open connection: GET /logo.gif
  - Open connection: GET /button.gif
- HTTP 1.1: Get HTML page plus images
  - Open persistent connection:
    - GET /index.html
    - GET /logo.gif
    - GET /button.gif
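
The HTTP 1.1 case can be sketched in Python with the standard library's `http.client`, which keeps one connection open across several requests when the server allows it. The function name `fetch_all` is hypothetical, and the sketch assumes the server speaks HTTP 1.1 and does not close the connection between requests.

```python
import http.client


def fetch_all(host: str, port: int, paths: list) -> list:
    """Fetch several documents over a single persistent connection.

    Returns a list of (path, status code, body length) tuples.
    """
    conn = http.client.HTTPConnection(host, port, timeout=10)
    results = []
    try:
        for path in paths:
            conn.request("GET", path)      # reuses the already-open connection
            response = conn.getresponse()
            body = response.read()         # drain the body before the next request
            results.append((path, response.status, len(body)))
    finally:
        conn.close()
    return results
```

The HTTP 1.0 pattern on the slide would instead create a fresh `HTTPConnection` inside the loop, paying the connection setup cost once per document.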

Structure of Client Requests

Client Requests

- Every client request includes three parts:
  - Method: Used to indicate the type of request, the HTTP version, and the name of the requested document.
  - Header information: Used to specify browser version, language, etc.
  - Entity body: Used to specify form data for POST requests.

Client Methods

- GET:
  - This is the same GET that we discussed for HTML forms.
- POST:
  - This is the same POST method that we discussed for HTML forms.
  - Data is sent in the entity portion of the HTTP request.

One More Client Method

- HEAD:
  - Similar to GET, except that the method requests only the header information.
  - Server will return the Last-Modified date, but will not return the data portion of the requested document.
  - Useful for browser caching. For example:
    - If the browser contains a cached version of a page, it issues a HEAD request.
    - If the document has not been modified since it was cached, the browser uses the cached version.
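
The caching decision described above comes down to comparing two dates. This is a minimal Python sketch of that comparison only; it assumes the two Last-Modified values have already been obtained (one stored with the cached page, one from a HEAD response), and the function name `cached_copy_is_current` is hypothetical.

```python
from email.utils import parsedate_to_datetime


def cached_copy_is_current(cached_last_modified: str, head_last_modified: str) -> bool:
    """Return True if the document has not changed since it was cached.

    Both arguments are Last-Modified header values, for example
    "Wed, 24 Sep 2003 14:06:11 GMT".
    """
    cached = parsedate_to_datetime(cached_last_modified)
    current = parsedate_to_datetime(head_last_modified)
    return current <= cached
```

If the function returns True the browser can show its cached copy; otherwise it follows up with a normal GET.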

Structure of Server Responses

Server Responses

- Every server response includes three parts:
  - Response line: HTTP version number, three-digit status code, and status message.
  - Header: Information about the server
  - Entity body: The actual data.

Server Status Codes

- 100-199: Informational
- 200-299: Client Request Successful
- 300-399: Client Request Redirected
- 400-499: Client Request Incomplete
- 500-599: Server Errors

Some Important Status Codes

- 200: OK
  - Request was successful.
- 301: Moved Permanently
  - Server redirects client to a new URL.
- 404: Not Found
  - Document does not exist.
- 500: Internal Server Error
  - Error within the web server.
- All other status codes are available online at: http://www.w3.org/Protocols/HTTP/HTRESP.html
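
The ranges and the individual codes can be combined in a few lines of Python. In this sketch the category labels are copied from the slides, the standard phrases come from the standard library's `http.HTTPStatus`, and the names `STATUS_CLASSES` and `describe_status` are hypothetical.

```python
from http import HTTPStatus

# Category labels as given on the slides, keyed by the first digit of the code.
STATUS_CLASSES = {
    1: "Informational",
    2: "Client Request Successful",
    3: "Client Request Redirected",
    4: "Client Request Incomplete",
    5: "Server Error",
}


def describe_status(code: int) -> str:
    """Return a one-line description such as '404 Not Found (...)'."""
    category = STATUS_CLASSES.get(code // 100, "Unknown")
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:  # a code the standard library does not know about
        phrase = "Unrecognized"
    return f"{code} {phrase} ({category})"
```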

Common Gateway Interface: CGI Overview

Common Gateway Interface

- What is CGI?
  - A general framework for creating server-side web applications.
  - Instead of returning a static web document, the web server returns the results of a program.
  - For example:
    - Browser sends the parameter: name=Ethan.
    - Web server passes the request to a Perl program.
    - Perl program returns HTML that says, Hello, Ethan!

CGI Overview

- Web browser to web server: Name=Ethan
- Web server to C/Perl program: Name=Ethan
- C/Perl program to web server: Hello, Ethan!
- Web server to web browser: Hello, Ethan!

Notes on CGI

- The first mechanism for creating dynamic web sites.
- What languages can you create CGI programs in?
  - Just about any language: C/C++, Perl, Java, etc.

CGI Environment Variables

- CGI includes a number of environment variables.
  - REMOTE_ADDR: Address of the client browser
  - SERVER_NAME: The server host name or IP address
  - SERVER_SOFTWARE: Name and version of the server software.
  - QUERY_STRING: The string of form variables that follows the ? in the URL (GET requests). POST form data arrives on the program's standard input instead.

Hello, World CGI

    #!/usr/bin/perl
    # Header line, followed by a blank line that separates headers from the body.
    print "Content-type: text/html\n\n";
    # The body of the response.
    print "Hello, World!\n";
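
Combining the environment variables with the Hello, World script gives the "Hello, Ethan!" example from the overview. The following is a sketch in Python rather than the Perl used on the slides; the function name `render` is hypothetical, and defaulting the name to "World" when no parameter is given is an assumption.

```python
import html
from urllib.parse import parse_qs


def render(environ) -> str:
    """Build a CGI response from the environment passed in by the web server."""
    # QUERY_STRING holds the part of the URL after '?', e.g. "name=Ethan".
    query = parse_qs(environ.get("QUERY_STRING", ""))
    name = query.get("name", ["World"])[0]  # assumption: default greeting
    # Escape the value so that user input cannot inject markup into the page.
    body = f"<html><body>Hello, {html.escape(name)}!</body></html>"
    # As in the Perl script: header line, blank line, then the document.
    return "Content-type: text/html\n\n" + body
```

Run as an actual CGI program, the script would finish by printing `render(os.environ)` to standard output, which the web server relays to the browser.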

From CGI to Servlets…

- That’s all you are going to cover on CGI?
  - Yes. CGI still represents a good way to create dynamic web applications.
  - Nonetheless, Servlets represent a more powerful architecture…
- If you want to get more information on CGI, check out: CGI Programming with Perl (O’Reilly Press).

Cookies Overview

What is a Cookie?

- Small piece of data generated by a web server, stored on the client’s hard drive.
- Serves as an add-on to the HTTP specification (remember, HTTP by itself is stateless).
- Still somewhat controversial, as it enables web sites to track web users and their habits…

Example Cookie Use

- Web site Acme.com wants to track the number of unique visitors who access its site.
- If Acme.com checks the HTTP server logs, it can determine the number of “hits”, but cannot determine the number of unique visitors.*
- That’s because HTTP is stateless. It retains no memory regarding individual users.
- Cookies provide a mechanism to solve this problem.

* Actually, you could check the log files for IP addresses, but you would still have the problem of Internet proxies.

Tracking Unique Visitors

- Step 1: Person A requests the home page for acme.com.
- Step 2: Acme.com web server generates a new unique ID.
- Step 3: Server returns the home page plus a cookie set to the unique ID.
- Step 4: Each time Person A returns to acme.com, the browser automatically sends the cookie along with the GET request.
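
The four steps above can be sketched from the server's side in Python. Everything named here is hypothetical (the cookie name `visitor_id`, the function `identify_visitor`, and the in-memory set of seen IDs); a real site would also set an expiry date and store the IDs somewhere durable.

```python
import uuid
from http.cookies import SimpleCookie

COOKIE_NAME = "visitor_id"  # hypothetical cookie name


def identify_visitor(request_headers: dict, seen: set):
    """Return (visitor id, Set-Cookie header line or None) for one request."""
    cookie = SimpleCookie(request_headers.get("Cookie", ""))
    if COOKIE_NAME in cookie:
        # Step 4: a returning visitor; the browser sent the cookie back.
        visitor_id = cookie[COOKIE_NAME].value
        seen.add(visitor_id)
        return visitor_id, None
    # Steps 2 and 3: a new visitor; generate an ID and hand it out as a cookie.
    visitor_id = uuid.uuid4().hex
    seen.add(visitor_id)
    return visitor_id, f"Set-Cookie: {COOKIE_NAME}={visitor_id}; path=/"
```

The number of unique visitors is then the size of the `seen` set, regardless of how many hits each visitor generates.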

Cookie Conversation

- Browser: Give me the home page!
- Server: Here’s the home page plus a cookie.
- Browser: Now, give me the news page (cookie is sent automatically)
- Server: I’ve seen you before… Here’s the news page.

Cookie Notes

- Created in 1994 for Netscape 1.1
- Cookies cannot be larger than 4K
- No domain (e.g. netscape.com, microsoft.com) can have more than 20 cookies.
- Cookies stay on your machine until:
  - they automatically expire
  - they are explicitly deleted
- Cookies work the same on all browsers. No cross-browser problems here!

Magical Cookies

- The term cookie comes from an old programming hack, called Magical Cookies.
- If a programmer couldn’t make two parts of a program communicate, he would create a “magical cookie”, a small text file containing data to transfer between program parts.

Cookie Standards

- Version 0 (Netscape):
  - The original cookie specification
  - Implemented by all browsers and servers
  - We will focus on this version
- Version 1:
  - A proposed standard of the Internet Engineering Task Force (IETF)
  - Request for Comments (RFC) 2109
  - Unfortunately, not very widely used (hence, we will stick to Version 0).

Why use Cookies?

- Tracking unique visitors
- Creating personalized web sites
- Shopping carts
- Tracking users across your site:
  - e.g. do users that visit your sports news page also visit your sports store?

Cookie Anatomy

- Version 0 specifies six cookie parts:
  - Name
  - Value
  - Domain
  - Path
  - Expires
  - Secure

Cookie Parts: Name/Value

- Name
  - Name of your cookie (required)
  - Cannot contain white space, semicolons or commas.
- Value
  - Value of your cookie (required)
  - Cannot contain white space, semicolons or commas.

Cookie Parts: Domain

- Only pages from the domain which created a cookie are allowed to read the cookie.
  - For example, amazon.com cannot read yahoo.com’s cookies (imagine the security flaws if this were otherwise!)
- By default, the domain is set to the full domain of the web server that served the web page.
  - For example, myserver.mydomain.com would automatically set the domain to .myserver.mydomain.com
- Note that domains are always prepended with a dot.
  - This is a security precaution: all domains must have at least two periods.
- You can, however, set a higher-level domain.
  - For example, myserver.mydomain.com can set the domain to .mydomain.com. This way hisserver.mydomain.com and herserver.mydomain.com can both access the same cookies.
- No matter what, you cannot set a domain other than your own.
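
The domain rules above amount to a suffix check plus the two-period requirement. This Python sketch is an illustration of those rules as the slides state them, not the exact algorithm any browser uses; the function name `domain_matches` is hypothetical.

```python
def domain_matches(request_host: str, cookie_domain: str) -> bool:
    """Decide whether a host falls inside a cookie's domain."""
    host = request_host.lower()
    domain = cookie_domain.lower()
    # Security precaution from the slide: at least two periods,
    # so ".mydomain.com" is acceptable but ".com" is not.
    if domain.count(".") < 2:
        return False
    # The host must end with the dotted domain, or be the domain itself.
    return host.endswith(domain) or "." + host == domain
```

The same check covers both directions described on the slides: whether a browser should send a cookie to a host, and whether a server is allowed to set a given domain (it may only name a domain that its own host name falls inside).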

Cookie Parts: Path

- Restricts cookie usage within the site.
- By default, the path is set to the path of the page that created the cookie.
- Example: user requests a page from mymall.com/storea. By default, the cookie will only be returned to pages for or under /storea.
- If you set the path to /, the cookie will be returned to all pages (a common practice).
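
Path matching can be sketched as a simple prefix test. The Python below assumes plain prefix matching, which is a simplification, and the names `path_matches` and `cookies_to_send` as well as the cookie names in the usage note are hypothetical.

```python
def path_matches(request_path: str, cookie_path: str) -> bool:
    """A cookie applies to any page at or under its path."""
    return request_path.startswith(cookie_path)


def cookies_to_send(request_path: str, cookie_paths: dict) -> list:
    """Given {cookie name: cookie path}, list the cookies to return."""
    return [
        name
        for name, cookie_path in cookie_paths.items()
        if path_matches(request_path, cookie_path)
    ]
```

For example, with a `cart` cookie set for `/storea` and a `theme` cookie set for `/`, a request for `/storea/shoes.html` returns both cookies, while a request for `/storeb/index.html` returns only `theme`.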

Cookie Parts: Expires

- Specifies when the cookie will expire.
- Specified in Greenwich Mean Time (GMT):
  - Wdy, DD-Mon-YYYY HH:MM:SS GMT
- If you leave this value blank, the browser will delete the cookie when the user exits the browser.
  - This is known as a session cookie, as opposed to a persistent cookie.

Cookie Parts: Secure

- The secure flag is designed to keep cookies encrypted while in transit.
- A secure cookie will only be sent over a secure connection (such as SSL).
- In other words, if a cookie is set to secure and you connect only via a non-secure connection, the cookie will not be sent.
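
Putting the six parts together, the following Python sketch assembles a Version 0 style Set-Cookie header by hand. The function names `format_expires` and `build_set_cookie` and the sample values in the usage note are hypothetical; the standard library's `http.cookies` module offers a fuller implementation.

```python
from datetime import datetime, timezone
from typing import Optional


def format_expires(when: datetime) -> str:
    """Format a timezone-aware datetime as Wdy, DD-Mon-YYYY HH:MM:SS GMT."""
    return when.astimezone(timezone.utc).strftime("%a, %d-%b-%Y %H:%M:%S GMT")


def build_set_cookie(
    name: str,
    value: str,
    domain: Optional[str] = None,
    path: Optional[str] = None,
    expires: Optional[datetime] = None,
    secure: bool = False,
) -> str:
    """Assemble a Set-Cookie header line from the six cookie parts."""
    for part in (name, value):
        # Names and values cannot contain white space, semicolons or commas.
        if any(ch.isspace() or ch in ";," for ch in part):
            raise ValueError(f"illegal character in {part!r}")
    pieces = [f"{name}={value}"]
    if expires is not None:        # omitted: the cookie is a session cookie
        pieces.append("expires=" + format_expires(expires))
    if path:
        pieces.append(f"path={path}")
    if domain:
        pieces.append(f"domain={domain}")
    if secure:                     # only send over a secure connection
        pieces.append("secure")
    return "Set-Cookie: " + "; ".join(pieces)
```

Calling `build_set_cookie("visitor", "abc123", domain=".acme.com", path="/")` produces `Set-Cookie: visitor=abc123; path=/; domain=.acme.com`, a session cookie because no expiry date is given.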

Example Cookie from Google

    HTTP/1.1 200 OK
    Cache-control: private
    Content-Type: text/html
    Set-Cookie: PREF=ID=11cebd117082ef7a:TM=1074966051:LM=1074966051:S=CgHQLEJ57-U9oRXn; expires=Sun, 17-Jan-2038 19:14:07 GMT; path=/; domain=.google.com
    Content-Encoding: gzip
    Server: GWS/2.1
    Content-length: 1216
    Date: Sat, 24 Jan 2004 17:40:51 GMT

Example from Amazon.com

    HTTP/1.1 302
    Date: Sat, 24 Jan 2004 17:58:29 GMT
    Server: Stronghold/2.4.2 Apache/1.3.6 C2NetEU/2412 (Unix) amarewrite/0.1 mod_fastcgi/2.2.12
    Set-Cookie: session-id-time=1075536000; path=/; domain=.amazon.com; expires=Saturday, 31-Jan-2004 08:00:00 GMT
    Set-Cookie: session-id=103-0070896-9210277; path=/; domain=.amazon.com; expires=Saturday, 31-Jan-2004 08:00:00 GMT
    Transfer-Encoding: chunked
    Content-Type: text/html
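
To see the cookie parts in headers like these, a Set-Cookie value can be split on semicolons. This is a minimal Python sketch with a hypothetical function name, `parse_set_cookie`; it only splits the text and does not validate dates or domains.

```python
def parse_set_cookie(header_value: str):
    """Split a Set-Cookie value into (name, value, attributes)."""
    first, *rest = [piece.strip() for piece in header_value.split(";")]
    # Split on the first '=' only: the value itself may contain '='.
    name, _, value = first.partition("=")
    attributes = {}
    for piece in rest:
        if not piece:
            continue
        key, _, val = piece.partition("=")
        attributes[key.strip().lower()] = val.strip()  # e.g. "path" -> "/"
    return name, value, attributes
```

Applied to the second Amazon header, this returns the name `session-id`, the value `103-0070896-9210277`, and the attributes path `/`, domain `.amazon.com` and the expiry date. Applied to the Google header, the name is `PREF` and everything up to the first semicolon after it is the value.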