NBA 518: Enterprise Data Design and Analysis

CS330 Enterprise Architectures

2005-09-14-CS330-ThreeTier · • The server design is still tightly coupled and can be optimized by ignoring presentation issues • Still relatively easy to manage and control from




The Big Picture

[Figure: The big picture, showing a WWW site visitor and the Web, a public Web server, a business transaction server, a main-memory cache, a DBMS, a data warehouse application server, an intranet/VPN, an internal user, and an internal Web server.]


Overview

• Enterprise architectures
• Internet concepts
  • URIs
  • The HTTP Protocol
• The presentation layer
  • HTML, HTML Forms
  • Cookies
  • JavaScript
  • Style Sheets


Layers and Tiers

Client is any user or program that wants to perform an operation on the system. Clients interact with the system through a presentation layer.

The application logic determines what the system actually does. It takes care of enforcing the business rules and establishing the business processes. The application logic can take many forms: programs, constraints, business processes, etc.

The resource manager deals with the organization (storage, indexing, and retrieval) of the data necessary to support the application logic. This is typically a database but it can also be a text retrieval system or any other data management system providing querying capabilities and persistence.

[Figure: the client interacts with the presentation layer; the application logic (business rules, business objects, business processes) runs on the server; the resource manager provides persistent storage in the database.]
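A minimal Python sketch of the three layers, assuming a toy order system: the resource manager is an in-memory SQLite table, the application logic enforces one business rule (a hypothetical credit limit), and the presentation layer turns client input into calls and results into text. All class and function names are illustrative.

```python
import sqlite3


class ResourceManager:
    """Resource manager: storage, indexing, and retrieval of the data."""

    def __init__(self):
        self.db = sqlite3.connect(":memory:")
        self.db.execute("CREATE TABLE orders (customer TEXT, amount REAL)")

    def insert_order(self, customer, amount):
        self.db.execute("INSERT INTO orders VALUES (?, ?)", (customer, amount))

    def total_for(self, customer):
        row = self.db.execute(
            "SELECT COALESCE(SUM(amount), 0) FROM orders WHERE customer = ?",
            (customer,),
        ).fetchone()
        return row[0]


class OrderService:
    """Application logic: enforces the business rules."""

    CREDIT_LIMIT = 1000.0  # hypothetical business rule

    def __init__(self, resources):
        self.resources = resources

    def place_order(self, customer, amount):
        if amount <= 0:
            raise ValueError("amount must be positive")
        if self.resources.total_for(customer) + amount > self.CREDIT_LIMIT:
            raise ValueError("credit limit exceeded")
        self.resources.insert_order(customer, amount)
        return self.resources.total_for(customer)


def handle_client_request(service, customer, amount_text):
    """Presentation layer: converts client input and formats the result."""
    try:
        total = service.place_order(customer, float(amount_text))
    except ValueError as err:
        return f"Order rejected: {err}"
    return f"Order accepted. {customer} has ordered {total:.2f} in total."


service = OrderService(ResourceManager())
print(handle_client_request(service, "alice", "400"))
print(handle_client_request(service, "alice", "700"))
```

Only the presentation function knows about text input and output, and only the resource manager knows about SQL, so either can be replaced without touching the business rule.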


A Game of Boxes and Arrows

• Each box represents a part of the system.

• Each arrow represents a connection between two parts of the system.

• The more boxes, the more modular the system: more opportunities for distribution and parallelism. This allows encapsulation, component-based design, and reuse.

• The more boxes, the more arrows: more sessions (connections) need to be maintained, more coordination is necessary. The system becomes more complex to monitor and manage.

• The more boxes, the greater the number of context switches and intermediate steps to go through before one gets to the data. Performance suffers considerably.

• System designers try to balance the flexibility of modular design with the performance demands of real applications. Once a layer is established, it tends to migrate down and merge with lower layers.

There is no problem in system design that cannot be solved by adding a level of indirection.

There is no performance problem that cannot be solved by removing a level of indirection.


Top-Down Design

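[Figure: top-down design and the resulting top-down architecture. Presentation-layer components (PL-A, PL-B, PL-C) are defined first, then application-logic components (AL-A, AL-B, AL-C, AL-D), then resource managers (RM-1, RM-2).]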


Top-Down Design

[Figure: the client and the information system's presentation layer, application logic layer, and resource management layer, designed from the top down.]

1. Define access channels and client platforms.
2. Define presentation formats and protocols for the selected clients and protocols.
3. Define the functionality necessary to deliver the contents and formats needed at the presentation layer.
4. Define the data sources and data organization needed to implement the application logic.


Bottom-Up Design

• In a bottom-up design, many of the basic components already exist. These are stand-alone systems which need to be integrated into new systems.

• The components do not necessarily cease to work as stand-alone components. Often old applications continue running at the same time as new applications.

• This approach has a wide application because the underlying systems already exist and cannot be easily replaced.

• Much of the work and products in this area are related to middleware, the intermediate layer used to provide a common interface, bridge heterogeneity, and cope with distribution.

[Figure: legacy systems supporting both a new application and a legacy application.]


Bottom-Up Design

[Figure: bottom-up design and the resulting bottom-up architecture. Presentation-layer components (PL-A, PL-B, PL-C) and application-logic components (AL-A, AL-B, AL-C, AL-D) are built on wrappers around existing legacy applications and legacy systems.]


Bottom-Up Design

[Figure: the client and the information system's presentation layer, application logic layer, and resource management layer, designed from the bottom up.]

1. Define access channels and client platforms.
2. Examine existing resources and the functionality they offer.
3. Wrap existing resources and integrate their functionality into a consistent interface.
4. Adapt the output of the application logic so that it can be used with the required access channels and client protocols.
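Steps 2 and 3 of bottom-up design can be sketched in Python. The two legacy systems, their methods, and the `PartCatalog` interface are hypothetical; the point is that the wrapper, not the new application logic, absorbs the differences between the existing interfaces.

```python
from typing import Protocol


class LegacyInventory:
    """Hypothetical legacy system: returns quantities as zero-padded strings."""

    def read_qty(self, part_no: int) -> str:
        return {101: "0007", 102: "0000"}.get(part_no, "0000")


class LegacyPricing:
    """Hypothetical legacy system: different key format, prices in cents."""

    def lookup(self, sku: str) -> dict:
        return {"P-101": {"cents": 1999}, "P-102": {"cents": 450}}[sku]


class PartCatalog(Protocol):
    """The consistent interface the new application logic is written against."""

    def in_stock(self, part: int) -> int: ...
    def price(self, part: int) -> float: ...


class CatalogWrapper:
    """Wrapper: integrates both legacy systems behind PartCatalog."""

    def __init__(self, inventory: LegacyInventory, pricing: LegacyPricing):
        self.inventory = inventory
        self.pricing = pricing

    def in_stock(self, part: int) -> int:
        return int(self.inventory.read_qty(part))

    def price(self, part: int) -> float:
        return self.pricing.lookup(f"P-{part}")["cents"] / 100


def quote(catalog: PartCatalog, part: int, quantity: int) -> str:
    # New application logic: knows nothing about the legacy formats.
    stock = catalog.in_stock(part)
    if stock < quantity:
        return f"part {part}: only {stock} in stock"
    unit = catalog.price(part)
    return f"part {part}: {quantity} x {unit:.2f} = {quantity * unit:.2f}"


catalog = CatalogWrapper(LegacyInventory(), LegacyPricing())
print(quote(catalog, 101, 3))  # part 101: 3 x 19.99 = 59.97
print(quote(catalog, 102, 1))  # part 102: only 0 in stock
```

The legacy classes keep working unchanged, which matches the point that old applications often keep running next to the new ones.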


One Tier: Fully Centralized

• The presentation layer, application logic and resource manager are built as a monolithic entity.

• Access through dumb terminals

• This was the typical architecture of mainframes, offering several advantages:

• no forced context switches in the control flow (everything happens within the system),

• everything is centralized, so managing and controlling resources is easier,

• the design can be highly optimized by blurring the separation between layers.



Two Tier: Client/Server

• As computers became more powerful, it was possible to move the presentation layer to the client. This has several advantages:
  • Clients are independent.
  • The computing power at the clients can be used.
  • It introduces the concept of API (Application Program Interface): an interface to invoke the system from the outside. It also allows designers to think about federating the systems into a single system.
  • The resource manager only sees one client: the application logic. This greatly helps with performance since there are no client connections/sessions to maintain.



APIs in Client/Server

• Introduced notion of a service

• Introduced notion of an interface (how the client can invoke a given service)

• Many standardization efforts due to need for common APIs

[Figure: the server's API. On top of the resource management layer, the server offers several services, each reached through its own service interface.]
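One concrete way to publish a server's API as a set of named services is remote procedure calls. This sketch uses Python's standard `xmlrpc` modules over a local socket; the two services and the account data are hypothetical. The client knows only the interface (service names and parameters), not how the server implements them.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer

ACCOUNTS = {"alice": 120, "bob": 75}  # hypothetical data behind the server


def get_balance(customer):
    return ACCOUNTS[customer]


def deposit(customer, amount):
    ACCOUNTS[customer] += amount
    return ACCOUNTS[customer]


# Server side: register two services; together they form the server's API.
server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
server.register_function(get_balance, "get_balance")
server.register_function(deposit, "deposit")
port = server.server_address[1]  # port 0 above asks the OS for a free port
threading.Thread(target=server.serve_forever, daemon=True).start()

# Client side: invokes the services through their interface only.
with ServerProxy(f"http://127.0.0.1:{port}/") as api:
    print(api.get_balance("alice"))  # 120
    print(api.deposit("bob", 25))    # 100

server.shutdown()
server.server_close()
```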


Technical Aspects Of Two Tier

• Advantages over a single tier:
  • Take advantage of client capacity to off-load work to the clients.
  • Work within the server takes place within one scope (almost as in one tier).
  • The server design is still tightly coupled and can be optimized by ignoring presentation issues.
  • Still relatively easy to manage and control from a software engineering point of view.
• Disadvantages:
  • Connection management
  • Clients are “tied” to the system (no standard presentation layer). To connect to two systems, a client needs two presentation layers.
  • No failure or load encapsulation. If the server fails, nobody can work.
  • The load created by one client will directly affect the work of others since they are all competing for the same resources.


The Main Limitation of Client/Server

• The responsibility of dealing with heterogeneous systems is shifted to the client.
• The client becomes responsible for knowing where things are, how to get to them, and how to ensure consistency.
• Very inefficient (software design, portability, code reuse, performance since the client capacity is limited, etc.).
• These issues cannot be solved with a two-tier architecture.
• Accessing multiple servers (e.g., Server A and Server B):
  • The underlying systems don’t know about each other.
  • There is no common business logic.
  • The client is the point of integration (increasingly fat clients).


Three Tier: Middleware

• The three layers are fully separated.

• The layers are also typically distributed, taking advantage of the complete modularity of the design.


Middleware

• Middleware is just a level of indirection between clients and other layers of the system.

• Introduces an additional layer of business logic encompassing all underlying systems.

• By doing this, a middleware system:
  • simplifies the design of the clients by reducing the number of interfaces,
  • provides transparent access to the underlying systems,
  • acts as the platform for inter-system functionality and high-level application logic, and
  • takes care of locating resources, accessing them, and gathering results.

[Figure: clients call the middleware (global application logic), which in turn uses Server A and Server B, each with its own local application logic and local resource managers.]
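A minimal sketch of the middleware's role, assuming two hypothetical servers with different interfaces: the client calls one method on the middleware, which locates the resources, accesses both servers, and gathers the results. All names are illustrative.

```python
class ServerA:
    """Hypothetical customer system with its own local application logic."""

    def find_customer(self, customer_id):
        return {"c1": "Alice", "c2": "Bob"}.get(customer_id)


class ServerB:
    """Hypothetical order system with a different interface."""

    def orders(self, cust):
        return {"c1": [40, 60], "c2": [15]}.get(cust, [])


class Middleware:
    """Single entry point for clients; global application logic lives here."""

    def __init__(self, customers, orders):
        # Locating resources: the middleware, not the client, knows where data lives.
        self.customers = customers
        self.orders = orders

    def customer_summary(self, customer_id):
        name = self.customers.find_customer(customer_id)  # access Server A
        if name is None:
            raise KeyError(customer_id)
        amounts = self.orders.orders(customer_id)          # access Server B
        # Gather and combine the results before returning them to the client.
        return {"name": name, "orders": len(amounts), "total": sum(amounts)}


middleware = Middleware(ServerA(), ServerB())
# The client sees one interface instead of two.
print(middleware.customer_summary("c1"))  # {'name': 'Alice', 'orders': 2, 'total': 100}
```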


Technical Aspects of Middleware

• The introduction of a middleware layer helps in that:
  • the number of necessary interfaces is greatly reduced:
    • clients see only one system (the middleware),
    • local applications see only one system (the middleware),
  • it centralizes control (middleware systems themselves are usually two-tier),
  • it makes necessary functionality widely available to all clients,
  • it makes it possible to implement functionality that would otherwise be very difficult to provide, and
  • it is a first step towards dealing with (some forms of) application heterogeneity.
• The middleware layer does not help in that:
  • it is another level of indirection,
  • it is complex software,
  • it is a development platform, not a complete system.


A Three-Tier, Middleware-Based System

[Figure: a three-tier, middleware-based system. External and internal clients connect to a middleware system (connecting logic, control, user logic), which uses wrappers to integrate existing two-tier systems and their resource managers.]


N-Tier Architectures

• N-tier architectures result from connecting several three tier systems to each other

• The addition of the Web layer led to the notion of “application servers”, which was used to refer to middleware platforms supporting access through the Web

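[Figure: N-tier architecture. The client's Web browser talks to a Web server and an HTML filter, which form the presentation layer; behind them, middleware provides the application logic layer on top of the resource management layer of the information system.]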


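N-Tier in Reality

[Figure: a realistic N-tier deployment. Traffic from the Internet passes a firewall and a LAN to a Web server cluster; further LANs and gateways connect it, together with internal clients, to middleware application logic, a resource management layer (database server), additional middleware application logic, additional resource management layers, and wrappers and gateways to a file server and other applications.]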


Blocking or Synchronous Interaction

• Traditionally, information systems use blocking calls. Synchronous interaction requires both parties to be “on-line”: the caller makes a request, the receiver gets the request, processes it, and sends a response, and the caller receives the response.

• The caller must wait until the response comes back, and the interaction requires both client and server to be “alive” at the same time.

[Figure: client/server timeline of a blocking call (call, receive, response, answer); the client is idle while it waits.]

• Disadvantages due to synchronization:
  • Connection overhead
  • Higher probability of failures
  • Difficult to identify and react to failures
  • It is not really practical for complex interactions


Overhead of Synchronism

• Need to maintain a session between the caller and the receiver.

• Maintaining sessions is expensive. There is also a limit on how many sessions can be active at the same time

• For this reason, client/server systems often resort to connection pooling to optimize resource utilization:
  • Have a pool of open connections.
  • Allocate connections as needed.
• Synchronous interaction requires a context for each call and a context management system for all incoming calls.

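[Figure: two request/response timelines. In the first, a session lasts from request() through receive, process, and return until the caller can “do with answer”. In the second, the context is lost partway through and the interaction needs to be restarted.]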
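Connection pooling as described above can be sketched in Python with a thread-safe `queue.Queue` of already-open connections. SQLite in-memory connections stand in for real database or server sessions (an assumption made so the example is self-contained), and `ConnectionPool` is a hypothetical class, not a library API.

```python
import queue
import sqlite3
from contextlib import contextmanager


class ConnectionPool:
    """Keeps a fixed number of open connections and hands them out on demand."""

    def __init__(self, size, connect):
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            # Pay the connection set-up cost once, up front.
            self._idle.put(connect())

    @contextmanager
    def connection(self, timeout=None):
        # Blocks (up to `timeout` seconds) if every connection is in use;
        # raises queue.Empty if the timeout expires.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            # Return the connection to the pool instead of closing it.
            self._idle.put(conn)

    def idle_count(self):
        return self._idle.qsize()


pool = ConnectionPool(size=2, connect=lambda: sqlite3.connect(":memory:"))

with pool.connection() as conn:
    print(conn.execute("SELECT 1 + 1").fetchone()[0])  # 2
    print(pool.idle_count())  # 1
print(pool.idle_count())  # 2
```

The fixed pool size is also the limit on how many sessions can be active at once; callers beyond that limit wait or time out.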


Failures In Synchronous Calls

• If the client or the server fails, the context is lost.
  • If the failure occurs before 1, nothing has happened.
  • If the failure occurs after 1 but before 2 (receiver crashes), the request is lost.
  • If the failure happens after 2 but before 3, side effects may cause inconsistencies.
  • If the failure occurs after 3 but before 4, the response is lost but the action has been performed (do it again?).
• Who is responsible for finding out what happened?
• Finding out when the failure took place may not be easy. If there is a chain of invocations, the failure can occur anywhere along the chain.

[Figure: left, a synchronous call with numbered points 1 to 4 along request(), receive, process/return, and “do with answer”. Right, the caller times out and tries again, so the receiver receives, processes, and returns the same request a second time (2′, 3′).]
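The “do it again?” question can be made concrete with a small simulation. This sketch swaps in a technique the slide does not name: the client tags each request with a unique ID and reuses it on every retry, and the server remembers completed IDs, so a retry after a lost response (failure after 3 but before 4) does not repeat the side effect. `TransferServer`, `LossyChannel`, and `call_with_retry` are hypothetical names.

```python
import uuid


class TransferServer:
    """Hypothetical server whose operation has a side effect."""

    def __init__(self):
        self.balance = 100
        self._completed = {}  # request_id -> response already produced

    def withdraw(self, request_id, amount):
        # A request seen before is answered from memory, not executed again.
        if request_id in self._completed:
            return self._completed[request_id]
        self.balance -= amount
        response = f"withdrew {amount}, balance {self.balance}"
        self._completed[request_id] = response
        return response


class LossyChannel:
    """Simulates a network that loses the first `drops` responses."""

    def __init__(self, server, drops):
        self.server = server
        self.drops = drops

    def call(self, request_id, amount):
        response = self.server.withdraw(request_id, amount)  # work is done...
        if self.drops > 0:
            self.drops -= 1
            raise TimeoutError("response lost")  # ...but the caller never sees it
        return response


def call_with_retry(channel, amount, attempts=3):
    request_id = str(uuid.uuid4())  # the same ID is sent on every retry
    for _ in range(attempts):
        try:
            return channel.call(request_id, amount)
        except TimeoutError:
            continue  # timeout: try again
    raise TimeoutError(f"no response after {attempts} attempts")


server = TransferServer()
print(call_with_retry(LossyChannel(server, drops=1), 30))  # withdrew 30, balance 70
print(server.balance)  # 70, not 40
```

Without the stored request IDs, the retry would withdraw twice, which is exactly the inconsistency the slide warns about.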


Two Solutions

ENHANCED SUPPORT

• Client/Server systems and middleware platforms provide a number of mechanisms to deal with the problems created by synchronous interaction:
  • Transactional interaction
  • Service replication and load balancing

ASYNCHRONOUS INTERACTION

• Using asynchronous interaction, the caller sends a message that gets stored somewhere until the receiver reads it and sends a response. The response is sent in a similar manner.
• Asynchronous interaction can take place in two forms:
  • Non-blocking invocation
  • Persistent queues


Message Queuing

• Reliable queuing is an excellent complement to synchronous interactions:
  • Suitable for modular design: the code for making a request can be in a different module (even a different machine!) than the code for dealing with the response.
  • Easier to design sophisticated distribution modes, and it also helps to handle communication sessions in a more abstract way.
  • More natural way to implement complex interactions between heterogeneous systems.

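[Figure: callers issue request() into a queue; the server receives, processes, and returns results into a second queue, from which each caller picks up its answer (“do with answer”).]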
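The two-queue pattern can be sketched with Python's `queue.Queue` and a worker thread. Note the assumption: `queue.Queue` lives in memory, so unlike the reliable, persistent queues on the slide it loses messages if the process dies; it only illustrates how request, processing, and response are decoupled.

```python
import queue
import threading

requests = queue.Queue()   # stands in for a persistent request queue
responses = queue.Queue()  # stands in for the response queue


def server():
    # Receive, process, return: runs independently of the callers.
    while True:
        msg = requests.get()
        if msg is None:  # shutdown signal
            break
        request_id, payload = msg
        responses.put((request_id, payload.upper()))


worker = threading.Thread(target=server)
worker.start()

# The caller enqueues its requests and is free to do other work meanwhile.
for i, word in enumerate(["alpha", "beta", "gamma"]):
    requests.put((i, word))
requests.put(None)

# Later (possibly in a different module), the answers are collected.
worker.join()
answers = dict(responses.get() for _ in range(3))
print(answers)  # {0: 'ALPHA', 1: 'BETA', 2: 'GAMMA'}
```

Tagging each message with a request ID lets the code that handles responses match them to requests even though it never saw the original call.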




Internet Concepts

• URIs
• The HTTP Protocol
  • HTTP Overview
  • Example HTTP Session
  • HTTP 1.0 v. 1.1
  • Live Demo via HTTP Tracer Plus
  • Structure of Client Requests/Server Responses


Uniform Resource Identifiers

• Uniform naming schema to identify resources on the Internet
• A resource can be anything:
  • index.html
  • mysong.mp3
  • picture.jpg
• Example URIs:
  • http://www.cs.wisc.edu/~dbbook/index.html
  • mailto:[email protected]


Structure of URIs

http://www.cs.wisc.edu/~dbbook/index.html

• URI has three parts:
  • Naming schema (http)
  • Name of the host computer (www.cs.wisc.edu)
  • Name of the resource (~dbbook/index.html)
• URLs are a subset of URIs.
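The three parts can be extracted with Python's standard `urllib.parse.urlparse`. The `uri_parts` helper and the mailto address are hypothetical. Two details differ slightly from the list above: the parsed resource name keeps its leading slash, and a mailto URI has no host part at all.

```python
from urllib.parse import urlparse


def uri_parts(uri):
    """Split a URI into (naming schema, host, resource name)."""
    parts = urlparse(uri)
    return parts.scheme, parts.netloc, parts.path


print(uri_parts("http://www.cs.wisc.edu/~dbbook/index.html"))
# ('http', 'www.cs.wisc.edu', '/~dbbook/index.html')
print(uri_parts("mailto:someone@example.com"))  # hypothetical address
# ('mailto', '', 'someone@example.com')
```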


HTTP Overview

• HTTP: HyperText Transfer Protocol
• Developed by Tim Berners-Lee, 1990
• Client/server architecture:
  • Client requests a document
    • Example clients: IE, Netscape, etc.
  • Server returns the document
    • Example servers: Apache, IIS


Watch HTTP

• Telnet:
  • telnet www.yahoo.com 80
  • GET /
• See your requests:
  • http://www.schroepl.net/cgi-bin/http_trace.pl
• Trace your HTTP traffic:
  • http://www.sstinc.com/


Example HTTP Session

• Client sends request, server sends response.

• Client requests the following URL: http://www.cs.cornell.edu:80/

• Anatomy of the request:
  • http:// : HyperText Transfer Protocol; other options: ftp, mailto.
  • www.cs.cornell.edu : host name
  • :80 : port number. 80 is the default port for HTTP. Port numbers range from 1 to 65,535.
  • / : root document
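The request anatomy can be checked with Python's standard `urllib.parse.urlsplit`. The `connection_target` helper and its table of default ports are assumptions for illustration: the helper falls back to the scheme's default port when the URL omits one, and `urlsplit` raises `ValueError` when asked for a port number that is out of range.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21}  # well-known defaults


def connection_target(url):
    """Return (host name, port, document) for a URL."""
    parts = urlsplit(url)
    # .port is None when the URL omits it; it raises ValueError if out of range.
    port = parts.port if parts.port is not None else DEFAULT_PORTS[parts.scheme]
    return parts.hostname, port, parts.path or "/"


print(connection_target("http://www.cs.cornell.edu:80/"))
# ('www.cs.cornell.edu', 80, '/')
print(connection_target("http://www.cs.cornell.edu"))
# ('www.cs.cornell.edu', 80, '/')
```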


The Client Request

Actual Browser Request

GET / HTTP/1.1
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*
Accept-Language: en-us
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 5.01; Windows NT)
Host: www.cs.cornell.edu
Connection: Keep-Alive
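To see the structure of this request programmatically, the following Python sketch splits it into the request line and a dictionary of headers. It assumes the `\r\n` line endings HTTP uses and a request without a body; `parse_request` is a hypothetical helper, not a library function.

```python
RAW_REQUEST = (
    "GET / HTTP/1.1\r\n"
    "Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*\r\n"
    "Accept-Language: en-us\r\n"
    "Accept-Encoding: gzip, deflate\r\n"
    "User-Agent: Mozilla/4.0 (compatible; MSIE 5.01; Windows NT)\r\n"
    "Host: www.cs.cornell.edu\r\n"
    "Connection: Keep-Alive\r\n"
    "\r\n"  # an empty line ends the header section
)


def parse_request(raw):
    head, _, body = raw.partition("\r\n\r\n")
    request_line, *header_lines = head.split("\r\n")
    method, target, version = request_line.split(" ")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so normalize them.
        headers[name.strip().lower()] = value.strip()
    return method, target, version, headers, body


method, target, version, headers, body = parse_request(RAW_REQUEST)
print(method, target, version)  # GET / HTTP/1.1
print(headers["host"])          # www.cs.cornell.edu
```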


Anatomy of the Client Request

• GET / HTTP/1.1
  • Requests the root / document.
  • Specifies HTTP version 1.1.
  • HTTP versions: 1.0 and 1.1 (more on this later…)

• Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*
  • Indicates what types of media the browser will accept.

• Accept-Language: en-us
  • Browser’s preferred language

• Accept-Encoding: gzip, deflate
  • Accepts compressed data (speeds up downloads).


Anatomy of the Client Request

• User-Agent: Mozilla/4.0 (compatible; MSIE 5.01; Windows NT)
  • Indicates the browser type.
• Host: www.cs.cornell.edu
  • Required for HTTP 1.1
  • Optional for HTTP 1.0
  • A server may host multiple host names, so the browser indicates here which one it wants.
• Connection: Keep-Alive
  • Enables “persistent connections”, which are faster (more later…)


Server Response

HTTP/1.1 200 OK
Date: Mon, 24 Sep 2001 20:54:26 GMT
Server: Apache/1.3.6 (Unix)
Last-Modified: Mon, 24 Sep 2001 14:06:11 GMT
Content-length: 327
Connection: close
Content-type: text/html

<title>Sample Homepage</title>
<img src="/images/oreilly_mast.gif">
<h1>Welcome</h1>This is the webpage of ...


Anatomy of Server Response

• HTTP/1.1 200 OK
  • Server status code
  • Code 200: Document was found
  • We will examine other status codes shortly.
• Date: Mon, 24 Sep 2001 20:54:26 GMT
  • Date on the server.
  • GMT (Greenwich Mean Time)
• Last-Modified: Mon, 24 Sep 2001 14:06:11 GMT
  • Indicates the time when the document was last modified.
  • Very useful for browser caching.
  • If a browser already has the page in its cache, it may not need to request the whole document again (more later…)


Anatomy of Server Response

• Content-length: 327
  • Number of bytes in the response body (the document itself).
• Connection: close
  • Indicates that the server will close the connection.
  • If the client wants to send another request, it will need to open another connection to the server.
• Content-type: text/html
  • Indicates the MIME type of the returned document.
  • MIME: Multipurpose Internet Mail Extensions
  • Enables web servers to return binary or text files.
  • Other MIME categories: audio, video, images, xml
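The response line and headers described above can be pulled apart with a few string operations. This Python sketch is not a full HTTP parser (it ignores chunked encoding and repeated headers, for instance). Because the sample body is truncated, it rebuilds the sample response with a short, complete body and computes Content-Length from that body; `parse_response` is a hypothetical helper.

```python
BODY = "<title>Sample Homepage</title>\n<h1>Welcome</h1>This is the webpage of ...\n"

RAW_RESPONSE = (
    "HTTP/1.1 200 OK\r\n"
    "Date: Mon, 24 Sep 2001 20:54:26 GMT\r\n"
    "Server: Apache/1.3.6 (Unix)\r\n"
    "Last-Modified: Mon, 24 Sep 2001 14:06:11 GMT\r\n"
    f"Content-Length: {len(BODY.encode('ascii'))}\r\n"
    "Connection: close\r\n"
    "Content-Type: text/html\r\n"
    "\r\n" + BODY
)


def parse_response(raw):
    head, _, body = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    # Response line: HTTP version, three-digit status code, status message.
    version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    # Content-Length counts the bytes of the body, not its characters.
    length = int(headers["content-length"])
    return version, int(code), reason, headers, body.encode("ascii")[:length]


version, code, reason, headers, body = parse_response(RAW_RESPONSE)
print(version, code, reason)    # HTTP/1.1 200 OK
print(headers["content-type"])  # text/html
```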


Anatomy of Server Response

The actual HTML document:

<title>Sample Homepage</title>
<img src="/images/oreilly_mast.gif">
<h1>Welcome</h1>This is the web page of ...


HTTP 1.0 v. 1.1: Getting Objects

Once a browser receives an HTML page, it makes separate requests to retrieve the objects embedded in the page, such as images.

[Figure: a client Web browser and a Web server exchanging messages:
Browser: Give me /index.html
Server: Here you go...
Browser: Now, give me logo.gif
Server: Here you go...]


HTTP 1.0 v. 1.1

• HTTP 1.0:
  • For each request, you must open a new connection with the server.
• HTTP 1.1:
  • By default, the connection stays open after a request, so further requests can reuse it.
  • Faster, persistent connections
  • Supported by most browsers and servers.


Example: HTTP 1.0 v. 1.1

• HTTP 1.0: Get HTML page plus images
  • Open connection: GET /index.html
  • Open connection: GET /logo.gif
  • Open connection: GET /button.gif
• HTTP 1.1: Get HTML page plus images
  • Open persistent connection: GET /index.html
  • GET /logo.gif
  • GET /button.gif
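The difference can be observed with Python's standard `http.server` and `http.client` modules on the local machine. This sketch serves the three paths from the example, fetches them once over a single persistent connection and once with a new connection per request (emulating HTTP 1.0 behavior; the requests themselves still say HTTP/1.1), and records how many requests each server-side connection carried.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

requests_per_connection = []  # one entry per TCP connection the server accepted


class Handler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # keep the connection open after each response

    def setup(self):
        super().setup()
        requests_per_connection.append(0)  # a new connection has been opened

    def do_GET(self):
        requests_per_connection[-1] += 1
        body = f"contents of {self.path}".encode("ascii")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        # A persistent connection needs Content-Length so the client knows
        # where this response ends and the next one begins.
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):  # keep the demo quiet
        pass


server = HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()
host, port = server.server_address
paths = ["/index.html", "/logo.gif", "/button.gif"]

# HTTP 1.1 style: one persistent connection carries all three requests.
conn = http.client.HTTPConnection(host, port)
for path in paths:
    conn.request("GET", path)
    conn.getresponse().read()  # finish reading before sending the next request
conn.close()

# HTTP 1.0 style (emulated): a fresh connection for every request.
for path in paths:
    one_shot = http.client.HTTPConnection(host, port)
    one_shot.request("GET", path)
    one_shot.getresponse().read()
    one_shot.close()

server.shutdown()
server.server_close()
print(requests_per_connection)  # [3, 1, 1, 1]
```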


Client Requests

• Every client request includes three parts:
  • Method: Used to indicate type of request, HTTP version and name of requested document.
  • Header Information: Used to specify browser version, language, etc.
  • Entity Body: Used to specify form data for POST requests.


Client Methods

• GET and POST: We will see them later when we discuss HTML forms.

• HEAD:
  • Similar to GET, except that the method requests only the header information.
  • The server will return the Last-Modified date, but will not return the data portion of the requested document.
  • Useful for browser caching. For example:
    • If the browser contains a cached version of a page, it issues a HEAD request.
    • If the document has not been modified since then, the browser uses the cached version.
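The caching decision can be sketched as a comparison of two HTTP dates, parsed with `email.utils.parsedate_to_datetime` from Python's standard library. The function name is hypothetical, and the sketch assumes the browser stored the `Last-Modified` value it received together with the cached page. (HTTP 1.1 also lets the browser fold this check into the request itself with an `If-Modified-Since` header, to which the server can answer `304 Not Modified`.)

```python
from email.utils import parsedate_to_datetime


def cached_copy_is_fresh(cached_last_modified, head_last_modified):
    """Return True if the document has not changed since it was cached.

    Both arguments are HTTP date strings, e.g. "Mon, 24 Sep 2001 14:06:11 GMT".
    """
    cached = parsedate_to_datetime(cached_last_modified)
    current = parsedate_to_datetime(head_last_modified)
    return current <= cached


# Last-Modified stored with the cached copy vs. the value a HEAD request returned.
print(cached_copy_is_fresh("Mon, 24 Sep 2001 14:06:11 GMT",
                           "Mon, 24 Sep 2001 14:06:11 GMT"))  # True: use the cache
print(cached_copy_is_fresh("Mon, 24 Sep 2001 14:06:11 GMT",
                           "Tue, 25 Sep 2001 09:00:00 GMT"))  # False: GET it again
```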


Server Responses

• Every server response includes three parts:
  • Response line: HTTP version number, three-digit status code, and status message.
  • Header: Information about the server and the object being served
  • Entity Body: The actual data.


Server Status Codes

• 100-199 Informational

• 200-299 Client Request Successful

• 300-399 Client Request Redirected

• 400-499 Client Request Incomplete

• 500-599 Server Errors


Some Important Status Codes

• 200: OK
  • Request was successful.
• 301: Moved Permanently
  • Server redirects client to a new URL.
• 404: Not Found
  • Document does not exist.
• 500: Internal Server Error
  • Error within the Web server.
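The status classes above can be encoded directly. This sketch uses the category names from the slide and looks up the standard reason phrases in Python's `http.HTTPStatus` enum; `describe` is a hypothetical helper.

```python
from http import HTTPStatus

# Category names as given on the "Server Status Codes" slide.
CATEGORIES = {
    1: "Informational",
    2: "Client Request Successful",
    3: "Client Request Redirected",
    4: "Client Request Incomplete",
    5: "Server Errors",
}


def describe(code):
    category = CATEGORIES.get(code // 100)  # the first digit selects the class
    if category is None:
        raise ValueError(f"not an HTTP status code: {code}")
    try:
        phrase = HTTPStatus(code).phrase  # standard reason phrase
    except ValueError:
        phrase = "Unknown"  # valid class, but not a registered code
    return f"{code} {phrase} ({category})"


for code in (200, 301, 404, 500):
    print(describe(code))
# 200 OK (Client Request Successful)
# 301 Moved Permanently (Client Request Redirected)
# 404 Not Found (Client Request Incomplete)
# 500 Internal Server Error (Server Errors)
```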


HTTP Is Stateless

• What does this mean?

• No “sessions”

• Every message is completely self-contained

• No previous interaction is “remembered” by the protocol

• Trade-off between ease of implementation and ease of application development: the protocol is simple to implement, but any other functionality has to be built on top of it

• Implications for applications:

• Any state information (shopping carts, user login information) needs to be encoded in every HTTP request and response!

• Popular methods to maintain state:

• Cookies (later this lecture)

• Dynamically generated unique URLs at the server level (later this lecture)
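The second method, generating unique URLs, can be sketched in a few lines: the server keeps the state itself and writes a session ID into every link it generates, so each request carries the ID back. This is a minimal illustration under stated assumptions: the in-memory `SESSIONS` store, the `sid` parameter name, the `/cart/add` path, and the helper functions are all hypothetical.

```python
import secrets
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical server-side session store: session ID -> shopping cart items.
SESSIONS = {}

def new_session() -> str:
    """Create an empty cart and return a hard-to-guess session ID for it."""
    sid = secrets.token_hex(8)
    SESSIONS[sid] = []
    return sid

def rewrite_link(path: str, sid: str) -> str:
    """Embed the session ID in a link that the server puts into a generated page."""
    return f"{path}?{urlencode({'sid': sid})}"

def handle_add(url: str, item: str) -> list:
    """Handle one request: the protocol remembers nothing, so the URL carries the ID."""
    sid = parse_qs(urlparse(url).query)["sid"][0]
    SESSIONS[sid].append(item)
    return SESSIONS[sid]

sid = new_session()
link = rewrite_link("/cart/add", sid)   # e.g. /cart/add?sid=<16 hex digits>
handle_add(link, "The English Teacher")
print(handle_add(link, "Waiting for the Mahatma"))
# ['The English Teacher', 'Waiting for the Mahatma']
```

The state lives on the server; the URL only carries the key. If the user types a URL by hand or follows a link without the ID, the session is lost.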

Overview

• Enterprise architectures

• Internet concepts

• The presentation tier

• HTML, HTML Forms

• Cookies

• JavaScript

• Style Sheets

• The middle tier

Web Data Formats

• HTML

• The presentation language for the Internet

• XML

• A self-describing, hierarchical data model

• We will cover XML and associated query and transformation languages (XPath, XSLT) later.
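As a small preview of what "self-describing, hierarchical" means, the sketch below encodes one book from the bookstore inventory as XML and reads it with Python's standard `xml.etree.ElementTree`. The element and attribute names (`bookstore`, `category`, `book`, `format`, …) are invented for this illustration; XML lets each application choose its own.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML version of one entry from the bookstore inventory.
doc = """<bookstore>
  <category name="Fiction">
    <book format="Paperback">
      <title>The English Teacher</title>
      <author>R.K. Narayan</author>
      <published>1980</published>
    </book>
  </category>
</bookstore>"""

root = ET.fromstring(doc)
for book in root.iter("book"):
    # Tags name the data, so fields are found by meaning, not by position or font.
    print(book.findtext("title"), "|", book.findtext("author"), "|", book.get("format"))
# The English Teacher | R.K. Narayan | Paperback
```

Compare this with HTML, where the same facts are marked up only by how they should look (a bold title, a bulleted list).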

HTML: An Example

<HTML>
<HEAD></HEAD>
<BODY>
<h1>Barns and Nobble Internet Bookstore</h1>
Our inventory:
<h3>Science</h3>
<b>The Character of Physical Law</b>
<UL>
  <LI>Author: Richard Feynman</LI>
  <LI>Published 1980</LI>
  <LI>Hardcover</LI>
</UL>
<h3>Fiction</h3>
<b>Waiting for the Mahatma</b>
<UL>
  <LI>Author: R.K. Narayan</LI>
  <LI>Published 1981</LI>
</UL>
<b>The English Teacher</b>
<UL>
  <LI>Author: R.K. Narayan</LI>
  <LI>Published 1980</LI>
  <LI>Paperback</LI>
</UL>
</BODY>
</HTML>

HTML: A Short Introduction

• HTML is a markup language

• Commands are tags:

• Start tag and end tag

• Examples:

• <HTML> … </HTML>

• <UL> … </UL>

• Many editors automatically generate HTML directly from your document (e.g., Microsoft Word has a “Save as HTML” facility)
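A browser (or any program) reads HTML by recognizing these start and end tags. The sketch below uses Python's standard `html.parser.HTMLParser` to list the tags in a fragment of the bookstore page, in document order; the class name `TagLister` is hypothetical. Note that the parser reports tag names in lowercase, since HTML tag names are case-insensitive.

```python
from html.parser import HTMLParser

class TagLister(HTMLParser):
    """Record every start and end tag, in the order they appear."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(f"<{tag}>")

    def handle_endtag(self, tag):
        self.events.append(f"</{tag}>")

parser = TagLister()
parser.feed("<UL><LI>Author: Richard Feynman</LI><LI>Hardcover</LI></UL>")
parser.close()
print(" ".join(parser.events))
# <ul> <li> </li> <li> </li> </ul>
```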

HTML: Sample Commands

• <HTML>: encloses the entire document

• <UL>: unordered list

• <LI>: list entry

• <h1>: largest heading

• <h2>: second-level heading, <h3>, <h4> analogous

• <B>Title</B>: Bold
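In a three-tier system, pages like the bookstore example are usually generated by a program from data rather than written by hand. A minimal sketch of that idea, using only the tags listed above; the function `render_book` is hypothetical, and `html.escape` (standard library) guards against data that happens to contain `<` or `&`.

```python
from html import escape

def render_book(title: str, details: list) -> str:
    """Render one inventory entry: a bold title followed by an unordered list."""
    items = "\n".join(f"  <LI>{escape(d)}</LI>" for d in details)
    return f"<B>{escape(title)}</B>\n<UL>\n{items}\n</UL>"

print(render_book("The English Teacher",
                  ["Author: R.K. Narayan", "Published 1980", "Paperback"]))
# <B>The English Teacher</B>
# <UL>
#   <LI>Author: R.K. Narayan</LI>
#   <LI>Published 1980</LI>
#   <LI>Paperback</LI>
# </UL>
```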

Overview

• Internet concepts

• The presentation tier

• HTML, HTML Forms

• Cookies

• JavaScript

• Style Sheets

• The middle tier

Sites that know you...

• Just a few common examples:

• my.yahoo.com

• www.amazon.com

• Each time I return to these sites, they remember who I am.

• Yahoo remembers my news, bookmarks, etc.

• Amazon.com remembers what books I have browsed and makes recommendations.

• How do they do that?

What is a Cookie?

• Small piece of data generated by a web server, stored on the client’s hard drive.

• Serves as an add-on to the HTTP specification (remember, HTTP by itself is stateless).

• Controversial, as it enables web sites to track web users and their habits (more later…)
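On the wire, a cookie is just two HTTP headers: the server sends `Set-Cookie` in a response, and the browser returns the pair in a `Cookie` header on later requests. Python's standard `http.cookies.SimpleCookie` can produce and parse both; the cookie name `uid` and its value are made-up examples.

```python
from http.cookies import SimpleCookie

# Server side, first response: build the Set-Cookie header.
outgoing = SimpleCookie()
outgoing["uid"] = "a1b2c3"
print(outgoing.output())        # Set-Cookie: uid=a1b2c3

# Server side, a later request: parse the Cookie header the browser sent back.
incoming = SimpleCookie()
incoming.load("uid=a1b2c3")
print(incoming["uid"].value)    # a1b2c3
```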

Example Cookie Use

• Web Site Acme.com wants to track the number of unique visitors who access its site.

• If Acme.com checks the HTTP Server logs, it can determine the number of “hits”, but cannot determine the number of unique visitors.*

• That’s because HTTP is stateless. It retains no memory regarding individual users.

• Cookies provide a mechanism to solve this problem.

* Actually, you could check the log files for IP addresses, but Internet proxies and NAT are a problem: many users can appear behind a single IP address.

Tracking Unique Visitors

• Step 1: Person A requests the home page of acme.com

• Step 2: Acme.com Web Server generates a new unique ID.

• Step 3: Server returns home page plus a cookie set to the unique ID.

• Step 4: Each time Person A returns to acme.com, the browser automatically sends the cookie along with the GET request.
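The four steps can be sketched as a server-side handler that counts every hit but only counts a visitor once per ID cookie. This is a simplified illustration, assuming an in-memory set of issued IDs and a cookie named `uid`; `VisitorTracker` and its method are hypothetical names, and the ID comes from Python's `uuid.uuid4()`.

```python
import uuid
from http.cookies import SimpleCookie
from typing import Optional

class VisitorTracker:
    """Count hits and unique visitors with a persistent ID cookie."""

    def __init__(self):
        self.hits = 0
        self.visitors = set()

    def handle_request(self, cookie_header: Optional[str]) -> Optional[str]:
        """Return a Set-Cookie header for a new visitor, or None for a returning one."""
        self.hits += 1                               # every request is a "hit"
        cookies = SimpleCookie(cookie_header or "")
        if "uid" in cookies:                         # Step 4: the browser sent the cookie back
            self.visitors.add(cookies["uid"].value)
            return None
        uid = uuid.uuid4().hex                       # Step 2: generate a new unique ID
        self.visitors.add(uid)
        reply = SimpleCookie()
        reply["uid"] = uid
        return reply.output()                        # Step 3: send it along with the page

tracker = VisitorTracker()
set_cookie = tracker.handle_request(None)            # Person A, first visit
returned = set_cookie.split(": ", 1)[1]              # what A's browser stores and sends back
tracker.handle_request(returned)                     # Person A returns
tracker.handle_request(None)                         # Person B, first visit
print(tracker.hits, len(tracker.visitors))           # 3 2
```

Strictly speaking, this counts cookies, not people, which is exactly where the weaknesses discussed later (shared machines, multiple machines, deleted cookies) come in.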

Cookie Conversation

Browser: Give me the home page!

Server: Here’s the home page plus a cookie.

Browser: Now, give me the news page. (The cookie is sent automatically.)

Server: I’ve seen you before… Here’s the news page.

Cookie Notes

• Created in 1994 for Netscape 1.1

• Cookies cannot be larger than 4K

• No domain (netscape.com, microsoft.com) can have more than 20 cookies.

• Cookies stay on your machine until:

• they automatically expire

• they are explicitly deleted

• Cookies work the same on all browsers. No cross-browser problems here!

Magic Cookies

• The term cookie comes from an old programming hack, called Magic Cookies.

• If a programmer needed to make two programs communicate, he would create a “magic cookie”, a small file containing data to transfer between program parts.

Cookie Standards

• Version 0 (Netscape):

• The original cookie specification

• Implemented by all browsers and servers

• We will focus on this Version

• Version 1

• A proposed Internet Engineering Task Force (IETF) standard - RFC 2109

• Compatible with V0, but with some extensions

• We will stick to Version 0.

Why use Cookies?

• Tracking unique visitors

• Creating personalized web sites

• Shopping Carts

• Tracking users across your site:

• e.g. do users who visit your sports news page also visit your sports store?

Cookie Anatomy

• Version 0 specifies six cookie parts:

• Name

• Value

• Domain

• Path

• Expires

• Secure
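All six parts map onto attributes of a `Morsel` in Python's standard `http.cookies` module, which makes a convenient way to see them in one `Set-Cookie` header. The cookie name, value, domain, and date below are made-up examples; the exact order in which the attributes are printed is up to the library.

```python
from http.cookies import SimpleCookie

cookie = SimpleCookie()
cookie["uid"] = "a1b2c3"                                    # Name and Value
cookie["uid"]["domain"] = ".mydomain.com"                   # Domain
cookie["uid"]["path"] = "/"                                 # Path
cookie["uid"]["expires"] = "Fri, 31-Dec-2010 23:59:59 GMT"  # Expires (a preformatted date string)
cookie["uid"]["secure"] = True                              # Secure (a flag with no value)

print(cookie.output())
```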

Cookie Parts: Name/Value

• Name

• Name of your cookie (Required)

• Cannot contain whitespace, semicolons, or commas.

• Value

• Value of your cookie (Required)

• Cannot contain whitespace, semicolons, or commas.
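A quick way to respect this rule is to check names and values before setting them, and to percent-encode values that may contain forbidden characters. The sketch below checks only the three character classes listed above; the helper names are hypothetical, and percent-encoding with `urllib.parse.quote` is a common convention, not something the cookie specification requires.

```python
import re
from urllib.parse import quote, unquote

# Whitespace, semicolons, and commas: the characters the rule above forbids.
_FORBIDDEN = re.compile(r"[\s;,]")

def is_valid_v0(text: str) -> bool:
    """True if text is non-empty and has no whitespace, semicolons, or commas."""
    return bool(text) and _FORBIDDEN.search(text) is None

def encode_value(raw: str) -> str:
    """Percent-encode a value so that it passes the check (decode with unquote)."""
    return quote(raw, safe="")

value = "Fiction; Paperback, 1980"
print(is_valid_v0(value))          # False
encoded = encode_value(value)
print(encoded)                     # Fiction%3B%20Paperback%2C%201980
print(unquote(encoded) == value)   # True
```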

Cookie Parts: Domain

• Only pages from the domain which created a cookie are allowed to read the cookie.

• For example, amazon.com cannot read yahoo.com’s cookies (imagine the security flaws if this were otherwise!)

• By default, the domain is set to the full domain of the web server that served the web page.

• For example, a page served by myserver.mydomain.com would automatically set the domain to .myserver.mydomain.com

Cookie Parts: Domain

• Note that domains are always prepended with a dot.

• This is a security precaution: all domains must have at least two periods, so no server can set a cookie for an entire top-level domain such as .com.

• You can, however, set a higher-level domain.

• For example, myserver.mydomain.com can set the domain to .mydomain.com. This way hisserver.mydomain.com and herserver.mydomain.com can all access the same cookies.

• No matter what, you cannot set a domain other than your own.
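The domain rules can be sketched as two checks: one the browser applies when deciding whether a host may set a cookie for a domain, and one it applies when deciding whether to send a cookie to a host. This is a simplified sketch under the conventions on these slides (leading dot, at least two periods); the function names are hypothetical, and real browsers use more detailed rules (for example, lists of public suffixes such as .co.uk).

```python
def domain_matches(host: str, cookie_domain: str) -> bool:
    """Would a browser send a cookie with this Domain attribute to this host?"""
    host, cookie_domain = host.lower(), cookie_domain.lower()
    if cookie_domain.startswith("."):
        # ".mydomain.com" covers mydomain.com itself and every host ending in ".mydomain.com".
        return host == cookie_domain[1:] or host.endswith(cookie_domain)
    return host == cookie_domain

def may_set_domain(host: str, cookie_domain: str) -> bool:
    """May this host set a cookie for cookie_domain? (two-period rule, own domain only)"""
    return cookie_domain.count(".") >= 2 and domain_matches(host, cookie_domain)

print(may_set_domain("myserver.mydomain.com", ".mydomain.com"))    # True
print(may_set_domain("myserver.mydomain.com", ".com"))             # False: too few periods
print(may_set_domain("www.amazon.com", ".yahoo.com"))              # False: not its own domain
print(domain_matches("herserver.mydomain.com", ".mydomain.com"))   # True: cookie is shared
```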

Cookie Parts: Path

• Restricts cookie usage within the site.

• By default, the path is set to the path of the page that created the cookie.

• Example: user requests a page from mymall.com/storea. By default, the cookie will only be returned with requests for pages at or under /storea.

• If you set the path to /, the cookie will be returned for all pages (a common practice).
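Under Version 0, the path check is a simple string-prefix test on the requested path. The sketch below applies it to a small, hypothetical cookie jar for the mall example (the cookie names are made up), showing which cookies the browser would attach to each request.

```python
def path_matches(request_path: str, cookie_path: str) -> bool:
    """Version 0 rule: send the cookie if its path is a prefix of the requested path."""
    return request_path.startswith(cookie_path)

def cookies_to_send(jar: dict, request_path: str) -> list:
    """jar maps cookie name -> cookie path; return the names sent with this request."""
    return sorted(name for name, path in jar.items() if path_matches(request_path, path))

# Hypothetical cookies: one per store in the mall, plus one site-wide cookie.
jar = {"cart_a": "/storea", "cart_b": "/storeb", "uid": "/"}
print(cookies_to_send(jar, "/storea/shoes.html"))   # ['cart_a', 'uid']
print(cookies_to_send(jar, "/index.html"))          # ['uid']
```

Because this is a plain prefix test, a path of /storea would also match a sibling such as /storeab; ending the path with a slash (/storea/) avoids that.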

Cookie Parts: Expires

• Specifies when the cookie will expire.

• Specified in Greenwich Mean Time (GMT):

• Wdy, DD-Mon-YYYY HH:MM:SS GMT

• If you omit this attribute, the browser will delete the cookie when the user exits the browser.

• This is known as a session cookie, as opposed to a persistent cookie.
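Producing this date format is a one-liner with Python's `time.strftime`, following the Netscape specification's layout (weekday, comma, dashes between day, month, and year). The helper name `v0_expires` and the 30-day lifetime are illustrative assumptions; `%a` and `%b` give English names under Python's default C locale.

```python
import time

def v0_expires(epoch_seconds: float) -> str:
    """Format a Unix timestamp as 'Wdy, DD-Mon-YYYY HH:MM:SS GMT'."""
    # gmtime() converts to GMT; %a and %b are English abbreviations in the default C locale.
    return time.strftime("%a, %d-%b-%Y %H:%M:%S GMT", time.gmtime(epoch_seconds))

print(v0_expires(0))   # Thu, 01-Jan-1970 00:00:00 GMT

# A persistent cookie that lasts 30 days; leaving out expires makes it a session cookie.
thirty_days = 30 * 24 * 60 * 60
print("Set-Cookie: uid=a1b2c3; expires=" + v0_expires(time.time() + thirty_days) + "; path=/")
```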

Cookie Parts: Secure

• The secure flag does not encrypt the cookie itself; it protects the cookie in transit by restricting when it is sent.

• A secure cookie will only be sent over a secure connection (such as SSL).

• In other words, if a cookie is set to secure, and you connect using a non-secure connection, the cookie will not be sent.
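The browser's side of this rule fits in one function: before attaching a cookie, it checks whether the request goes over a secure connection. This sketch treats `https` as the only secure scheme; the function name `should_send` and the example URLs are hypothetical.

```python
from urllib.parse import urlparse

def should_send(cookie_is_secure: bool, request_url: str) -> bool:
    """A secure cookie goes only over HTTPS; an ordinary cookie goes over either."""
    is_https = urlparse(request_url).scheme.lower() == "https"
    return is_https or not cookie_is_secure

print(should_send(True, "https://acme.com/checkout"))   # True
print(should_send(True, "http://acme.com/checkout"))    # False: not sent at all
print(should_send(False, "http://acme.com/news"))       # True
```

Note that the cookie is simply withheld on an insecure connection; the flag says nothing about how the cookie is stored on the client.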

Weaknesses of Cookies

• People share machines

• per-user cookie files solve this

• People use multiple machines

• I have different cookies on different machines. Is this a bug or a feature?

• Cookies can be erased from the client machine’s hard drive

• Cookies can be copied

• This has security implications for eCommerce sites

Cookie Abuse - I

• Conventional catalog stores would sell information about customers

• name/address/purchases

• eCommerce sites can gather and sell much more detailed information

• all the way down to clickstreams!

• But that’s only for a single site

Cookie Abuse - II

• Ad servers and/or the “1-pixel gif”

• Simple form:

• bookstore.com page p17 has

• <img src="x... adsvr.com/stat?page=...p17">

• adsvr.com sets a persistent UID cookie in the usual way

• This gets around the cookie domain restriction: the cookie belongs to adsvr.com, no matter which site's page contains the image

• So adsvr.com can maintain user page visit statistics across multiple sites.
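A toy simulation shows why this works: the browser sends adsvr.com's cookie with every request for the tiny image, whichever site's page embeds it, so the ad server sees one ID across many sites. Everything here is hypothetical (the `AdServer` class, the `u1`-style IDs, and the page names other than bookstore.com's p17); it only models the bookkeeping, not real HTTP.

```python
from collections import defaultdict

class AdServer:
    """Toy adsvr.com: logs which pages each cookie ID was seen on."""

    def __init__(self):
        self.visits = defaultdict(list)
        self.next_id = 0

    def serve_pixel(self, uid_cookie, page):
        """Handle one image request; the page name comes from the query string."""
        if uid_cookie is None:                 # first contact: issue a persistent ID
            self.next_id += 1
            uid_cookie = f"u{self.next_id}"
        self.visits[uid_cookie].append(page)
        return uid_cookie                      # the browser stores this as adsvr.com's cookie

ad = AdServer()
uid = None   # the browser's cookie for adsvr.com, initially absent
for page in ["bookstore.com/p17", "news.com/sports", "bookstore.com/p42"]:
    uid = ad.serve_pixel(uid, page)            # the browser loads the pixel on each page

print(ad.visits[uid])
# ['bookstore.com/p17', 'news.com/sports', 'bookstore.com/p42']
```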

• It gets much more elaborate!

Legal Abuse

• Amazon.com has been granted a patent on some aspects of storing structured data in cookies for eCommerce

• All you need is a unique ID if you are willing to keep the structured data in a database

• So this is a technique for avoiding database accesses

• Probably many sites are infringing

• Amazon hasn’t sued anybody (yet)

Cookie Blocking Software

• Cookie Central has pointers to lots of cookie blocking software.

• Cookie Pal

• Cookie Crusher

• Cookie Cruncher

• etc.

• But many (most) sites don’t work if you disable cookies these days ...