IBM Labs in HaifaCopyright 2000-2003
IBM Corporation
Advanced Web Applications Development
Technion CS 236606 Spring 2003, Class 2
Eliezer Dekel
March 2003
Material is based on an original by Dr. Alfred Spector & Dr. Jeffrey Eppinger; updated by Eliezer Dekel.
Table of Contents
Module A-1: Introduction
Module A-2: Multi-tier Architectures
Module A-3: Application Taxonomy
Module A-4: Requirements of Web Applications
Module A-5: Techniques for Scaling
Module A-6: Caching and Replication
Module A-7: An Example of Replication: Weighted Voting
Module A-8: Load Balancing
Module A-9: Failure Detection
Module A-10: Achieving Availability with Malleability
Module A-1: Introduction
Complex heterogeneous infrastructures are a reality!
[Diagram: a typical e-business infrastructure: Internet firewalls, DNS server, web servers, load balancers, data caches, director and security services, a web application server, a data server with business data, a storage area network, existing applications and data, and business partners / external services]
Dozens of systems and applications
Hundreds of components
Thousands of tuning parameters
One of the Data Centers (500 servers)
[Network diagram: Canyon Park Data Center, the Microsoft.com network. Redundant Cisco 7000 routers, Catalyst 5000/2926 switches, and ASX-1000 ATM switches front racks of IIS web servers (www, search, support, msdn, download, ftp, windowsmedia, and other *.microsoft.com sites), live/backup/consolidator SQL Servers, stagers, build servers, PPTP/terminal servers, and monitoring servers. Drawn by Matt Groshong; last updated April 12, 2000. IP addresses removed by Jim Gray to protect security.]
Microsoft.com Server Count:
FTP: 6
Build Servers: 32
IIS: 210
Application: 2
Exchange: 24
Network/Monitoring: 12
SQL: 120
Search: 2
NetShow: 3
NNTP: 16
SMTP: 6
Stagers: 26
Total: 459
Module A-2: Multi-Tier Architectures
Where it All Takes Place
Recall: 2-tier vs. n-tier Architecture
[Diagram: a 2-tier architecture, where the client (browser) talks through tier-2 logic directly to a database, beside an n-tier architecture, where the client goes through tier-2 and tier-3 logic to reach the data]
Why 2-tier?
(Often called "client-server", which is a bad name because it is too general)
- Simple
- Better for dynamic queries
- Potentially more efficient (probably not in reality)
- Perhaps more processing off-loaded to the client (for better or worse)
- Global data modeling is not practical
Examples of Two-, Three-, and Four-Tiered Infrastructures
Why n-tier?
- Modularity via objects, not an enterprise-wide data model
- "Thin" clients, since "fat" clients are infeasible
- Security
- Easier replication of business logic
- Flexibility
- Performance (due to flexibility)
- Manageability
- All data need not be in one data model
- All data need not be in one database brand
- Etc.
Even with n-tier, Databases Crucial
Databases need to have all the functions required in 2-tier, and more:
- Data model support
- Concurrency control
- Security
- Integrity
- Performance
- Manageability
- Support for heterogeneity
Databases in a Heterogeneous World
There needs to be semantic consistency while using multiple databases:
- Atomicity
- Consistency
- Isolation
- Durability
(Transactions will be covered later.)
It is desirable that applications interoperate with multiple databases:
- Same API to access multiple databases
- Ability to access multiple databases
Hence the motivation for JDBC and ODBC.
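The "same API" point can be illustrated with a small sketch: the application code below is written once against a common interface and runs unchanged over two database "brands". The interface is a hand-rolled stand-in for JDBC, and all class names are illustrative.

```java
// A hand-rolled stand-in for a common database API such as JDBC.
public class SameApiDemo {
    interface Database {
        String query(String sql);
    }

    // Two database "brands" behind the same interface.
    static class BrandA implements Database {
        public String query(String sql) { return "A:" + sql; }
    }
    static class BrandB implements Database {
        public String query(String sql) { return "B:" + sql; }
    }

    // Application code is written once, against the common API.
    static String report(Database db) {
        return db.query("SELECT COUNT(*) FROM orders");
    }

    public static void main(String[] args) {
        System.out.println(report(new BrandA())); // A:SELECT COUNT(*) FROM orders
        System.out.println(report(new BrandB())); // B:SELECT COUNT(*) FROM orders
    }
}
```

With JDBC the role of `Database` is played by `java.sql.Connection`, and the brand is selected by the driver rather than by application code.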
Module A-3: Application Taxonomy
Characterizing Web Applications
Application Taxonomy
- Applications are typically made up of many interactions with a client
- How the application must be built depends on the type of interactions that comprise it
- This seems trivial, but it is where all architecture starts
- All interactions are, to varying degrees, asynchronous or synchronous
- Influencing all interactions are requirements for concurrency, throughput, latency, ...
- Interactions are sometimes called "transactions," though no specific semantic properties are applied to the word transaction when used in this way
Workload Characteristics
Application Functionality
- Types of interaction: inquiry (static and dynamic) vs. transactions
- Volume of transactions
- Volume of user-specific responses (personalization)
- Amount of cross-session info
- Transaction complexity
- Data volatility
- Integration with legacy systems
Usage Patterns
- Number of unique items
- Number of page views
- Volume of dynamic searches
- Transaction volume swings
Infrastructure Constraints
- % secure pages (privacy)
- Security: authentication, integrity, non-repudiation, regulations
Types of Web Applications
Publish and Subscribe
- Web portals such as yahoo.com, excite.com
- Media sites such as www.nfc.co.il, zdnet.com
- Events such as www.usopen.org, www.wimbeldon.org
Shopping
- Exact inventory sites: Victoriassecret.com, Abercrombie.com
- Inexact inventory sites: buy.com, dvdexpress.com
Customer Self Service
- Home banking: bankone.com, wingspanbank.com
- Travel sites: Travelocity
- Insurance: amica.com
Trading
- Online brokerages: schwab.com, fidelity.com, etrade.com
- Auction sites: ebay.com, priceline.com
- Games: interactive group game servers
Workload Characteristics of Web Applications
[Table: system workload characteristics (transaction volumes, dynamic content and dynamic searches, user-specific responses/personalization, cross-session information, legacy integration, data volatility, transaction volume swings, number of content publishers/sources, number of unique items per page, page content volatility, number of page views, security/authentication, percentage of secure pages, and transaction complexity), each rated Low/Medium/High for Publish & Subscribe, Shopping, Customer Self Service, and Trading sites]
Application Taxonomy: Read Transactions
Read-only transactions:
- Highly static: X-ray, corporate information, entertainment video, 1990 Census
- Nearly static: train schedule, catalog without quantities
- Dynamic: weather forecast, catalog with quantities
- Dynamic with high consistency requirements: account balance, catalog with quantities
- Dynamic data with high consistency and rapid update: rock concert sales with assigned seating
Application Taxonomy: Update Transactions
- Update w/ modest integrity: Amazon book comment
- Update w/ high integrity: billing record
- Update w/ asynchronous processing: stock trade
- Update w/ loosely coupled processing: buying a physical product over the net, or ordering/provisioning a new ISDN line
Issues
It is the type of application along the read-only and update dimensions that greatly impacts:
- How applications are architected
- What system support is needed
For each of the previous examples, it is worth considering the implications.
Module A-4: Requirements of Web Applications
Requirements - Summary
- Availability
- Scalability
- Security
- Performance
- Integrity
- Manageability
- Malleability/Longevity
- Integration
- Cost
Availability
Defined as a measurement of perceived uptime by a user.
There are 86,400 seconds in a day (~100,000) and 31,536,000 seconds in a year (~30 million).
99% uptime represents 1% downtime, which is:
- 864 seconds/day, or 14.4 minutes/day
- 315,360 seconds/year, or 5,256 minutes/year, or 88 hours/year
Percentage Uptime      Downtime
99.99%                 53 minutes/year (0.14 minutes/day)
99.999%                5 minutes/year
99.9999%               30 seconds/year
99.99999% (7 nines)    3 seconds/year
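The downtime figures above follow from straightforward arithmetic. A minimal sketch (class and method names are illustrative):

```java
// Convert an uptime percentage into downtime, as in the table above.
public class Availability {
    static final double SECONDS_PER_YEAR = 31_536_000.0; // 365 days
    static final double SECONDS_PER_DAY = 86_400.0;

    // Downtime in seconds per year for a given uptime percentage.
    static double downtimeSecondsPerYear(double uptimePercent) {
        return SECONDS_PER_YEAR * (100.0 - uptimePercent) / 100.0;
    }

    // Downtime in seconds per day for a given uptime percentage.
    static double downtimeSecondsPerDay(double uptimePercent) {
        return SECONDS_PER_DAY * (100.0 - uptimePercent) / 100.0;
    }

    public static void main(String[] args) {
        System.out.println(downtimeSecondsPerYear(99.0));   // 315360.0 (~88 hours)
        System.out.println(downtimeSecondsPerDay(99.0));    // 864.0 (14.4 minutes)
        System.out.println(downtimeSecondsPerYear(99.999)); // ~315 seconds (about 5 minutes)
    }
}
```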
Availability - Discussion
What do you see on the web? Why? What will be required in the future?
In the News
Source: Gartner Group
Downtime Costs (per Hour)
- Brokerage operations: $6,450,000
- Credit card authorization: $2,600,000
- Ebay (one 22-hour outage): $225,000
- Amazon.com: $180,000
- Package shipping services: $150,000
- Home shopping channel: $113,000
- Catalog sales center: $90,000
- Airline reservation center: $89,000
- Cellular service activation: $41,000
- On-line network fees: $25,000
- ATM service fees: $14,000
Sources: InternetWeek 4/3/2000; Fibre Channel: A Comprehensive Introduction, R. Kembel, 2000, p. 8 ("...based on a survey done by Contingency Planning Research")
September 11, 2001
- Only 15% of the companies in the World Trade Center had a working business continuity plan
- One law firm did not have a backup outside of the building; it went out of business
- One trading firm was able to transition immediately to a backup site across the river, with absolutely no interruption to its customers
- An investment bank had only a tape backup; it took them four days to recover
Scalability
The capability of a system to adapt readily to a greater or lesser intensity of use, volume, or demand while still meeting its business objectives (acceptable levels of performance, availability, manageability, etc.)
How resource utilization grows with load:
- Ideal: the system degrades gracefully as load increases (seldom happens)
- Good situation: utilization increases linearly with load
- Typical: utilization increases faster than the load
- Bad situation: everything seems OK until load increases (poor design)
[Figure: resource utilization vs. load for each of the four cases]
Security
Privacy Authentication Authorization Audit Non-repudiation
Performance
How long does it take to get a response to a request from the system?
Top-level metrics:
- Latency
- Throughput: how many transactions can be completed in a unit of time (capacity)
Subsidiary metrics:
- CPU
- Network bandwidth
- I/O of various types
- ...
Integrity
- Data correctness
- Data permanence
- Disaster recovery
- Data currency
Manageability
Consider the number of elements in a web application:
- Consistency
- Security
- Modifications
- Performance
- Configuration
- Training level required of operators
Malleability/Longevity
- Continuous availability (despite update and failure)
- Time period of use of the program
Integration
Note: millions of person-years are spent on applications every year. This represents a total multi-trillion dollar investment. Hence, integration is a necessity.
Integration approaches:
- Application to application
- Data sharing by multiple applications
- Process (complex application integration)
For some applications, integration cost is 7x the cost of the system, yet this is less than recreating existing applications or losing the benefits of integrated systems.
Cost
- Initial implementation
- Modification
- Installation
- Management (management cost is greater than development cost – usually at least double)
Total Cost of Ownership
[Pie chart: Backup/Restore 30%, Downtime 20%, Purchase 20%, Environmental 14%, Administration 13%, HW management 3%]
- Administration: all people time
- Backup/Restore: devices, media, and people time
- Environmental: floor space, power, air conditioning
Cause of System Crashes
[Bar chart: causes of system crashes in 1985, 1993, and 2001 (est.): hardware failure, operating system failure, system management (actions + N/problem), and other (application, power, network failure). Hardware and OS failures shrink over time while system management becomes the dominant cause.]
Current State of the Art:
- Failures due to people are up, but hard to measure
- VAX crashes '85, '93 [Murp95]; extrapolated to '01
- HW/OS went from 70% in '85 to 28% in '93; in '01, 10%?
- How do you get an administrator to admit a mistake? (Heisenberg?)
(Based on the lecture "Recovery Oriented Computing" by Dave Patterson, Berkeley)
Module A-5: Techniques for Scaling
Techniques for achieving the requirements
Motivation
Defined: data is stored without overlap across multiple sites, and each site processes its data the same way.
This is the architecture of the web (order of magnitude circa 10^12 hits/day).
Back-of-the-envelope thought exercise:
- 10^12 hits/day is roughly 10^7 hits/sec
- Assume a server can handle an average number of hits ranging from 10^1/sec to 10^4/sec
- Then there must be 10^3 – 10^6 web sites to meet the load…
Examples (data partitioning – segmented workload):
- 1999 data on one site, 1998 on another…
- a's on one site, b's on another…
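The back-of-the-envelope estimate can be made explicit; all figures are the slide's order-of-magnitude assumptions, not measurements:

```java
// How many sites are needed to serve ~10^12 hits/day?
public class Envelope {
    // Sites needed if each one sustains hitsPerSecPerSite hits per second.
    static double sitesNeeded(double hitsPerDay, double hitsPerSecPerSite) {
        double hitsPerSec = hitsPerDay / 86_400; // ~10^7 for 10^12 hits/day
        return Math.ceil(hitsPerSec / hitsPerSecPerSite);
    }

    public static void main(String[] args) {
        System.out.println(sitesNeeded(1e12, 1e4)); // on the order of 10^3
        System.out.println(sitesNeeded(1e12, 1e1)); // on the order of 10^6
    }
}
```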
Some typical Web site loads over a 24-hour period
Example Response Time Budget
- Client request: 5%
- Request network latency: 5%
- Server time: 55%
- Response network latency: 20%
- Client response processing: 15%
How Latency Varies Based on Workload Pattern and Tier
Achieving the Requirements
- Faster machines (vertical growth)
- Replicated machines (horizontal growth)
- Specialized machines
- Segmented workloads
- Request batching
- User data aggregation
- Connection management and caching
It is important to note that a detailed understanding of the application is key to a successful implementation.
Faster Machines - Vertical Growth
Scalability can be achieved through the use of faster machines. This technique can include:
- Moving to hardware that is bigger than the current environment, for example moving a web server from a PC-based server running NT to a UNIX-based server
- Using machines with more CPUs to leverage the operating system's multitasking and multiprocessing capabilities
- Using machines that leverage other computing paradigms, such as parallel computing
- Using better software that is optimized for the CPU
- Using faster hardware components such as memory, cache, disk, I/O devices, etc.
Replicated Machines - Clusters
Adding more machines of the same type and load balancing requests across these machines. To implement this technique we have to add components to the architecture, such as:
- A dispatcher node that can monitor and load-balance processing requests across the replicated machines
- A synchronization node that synchronizes the content and data across the machines
- A mechanism for managing sessions across replicated machines
Specialized Machines
Individual components of the architecture can be scaled by using specialized machines that perform a certain function much faster. This technique is typically used in architectures to facilitate:
- Intelligent routing of traffic and data across replicated machines
- Dynamic caching, used extensively by event sites and other media sites to speed up access to frequently accessed content
- Security and encryption, used by high-volume sites to speed up SSL encryption and decryption
Segmented Workload
This technique is typically used in conjunction with replicated machines. It involves partitioning the workload of an application to achieve optimum performance. There are several ways of implementing it:
- URL references: the most simplistic form, segmenting the workload by analyzing the URL and directing the requests to the appropriate servers
- Functional partitioning: analyze the application and partition the workload through custom programming
- Data partitioning: place segments of the data on different machines
[Diagram: requests routed to separate servers for Function 1, Function 2, and Function 3]
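URL-based segmentation, the simplest variant above, can be sketched as a routing function; the path prefixes and server-group names are illustrative:

```java
// Route each request to a server group based on its URL path prefix.
public class UrlRouter {
    static String route(String path) {
        if (path.startsWith("/search"))   return "search-servers";
        if (path.startsWith("/download")) return "download-servers";
        return "www-servers"; // default group for everything else
    }

    public static void main(String[] args) {
        System.out.println(route("/search?q=scaling")); // search-servers
        System.out.println(route("/download/sp1.exe")); // download-servers
        System.out.println(route("/index.html"));       // www-servers
    }
}
```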
Request Batching
Multi-tier communication places a large computational load on both the client tier (requester) and the server tier. It also introduces considerable latency. Furthermore, the overhead costs of virtually all cross-tier requests are roughly equal; therefore it is much better to make fewer, but larger, requests.
The goal of this technique is to reduce the number of requests that are sent between requesters and responders (such as between tiers or processes) by allowing the requester to define new requests that combine multiple requests.
[Diagram: many separate client-server round trips, versus a single combined "Command" request covering them all]
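A minimal sketch of the batching idea: several queries travel in one combined command, so the fixed per-call overhead is paid once. The in-process "server" and all names are illustrative:

```java
import java.util.ArrayList;
import java.util.List;

// One combined command carries several requests in a single round trip.
public class Batching {
    static int roundTrips = 0; // counts cross-tier calls actually made

    // The "server": one call that answers a whole batch of queries.
    static List<String> execute(List<String> queries) {
        roundTrips++;
        List<String> results = new ArrayList<>();
        for (String q : queries) results.add("result-of-" + q);
        return results;
    }

    public static void main(String[] args) {
        // Three requests, one round trip.
        List<String> out = execute(List.of("price", "stock", "reviews"));
        System.out.println(out.size() + " results, " + roundTrips + " round trip");
    }
}
```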
User Data Aggregation
This technique aggregates the most commonly accessed data from multiple backend systems to speed up the overall performance of the architecture. It is typically implemented using:
- Custom programming
- Intelligent middleware
- Data replication
[Diagram: a client reaching many backend servers directly, versus reaching them through a single aggregating server]
Connection Management
This technique aims to achieve scalability by reducing the most expensive operations within an application's workflows. This includes connections to legacy systems, databases, and other servers.
[Diagram: clients send requests to a Web Application Server (WAS); a Servlet/App obtains connections to a resource from a Connection Manager backed by a connection pool]
1. The WAS passes a user request to a Servlet/App
2. The Servlet requests a connection from the Manager
3. The Manager gets a connection from the pool and gives the Servlet/App a connection
4. The Servlet uses the connection to the resource
5. The resource returns data back
6. The Servlet returns the connection to the Manager, and the connection is returned to the pool
7. The Servlet/App sends the response back
If a connection is not available:
A. The Connection Manager requests a new connection
B. It adds the connection to the pool
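The numbered steps can be sketched as a minimal in-memory connection manager; failure handling, pool limits, and real resource handles are omitted, and all names are illustrative:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Connections to an expensive resource are pooled and reused, not opened
// per request. "Connection" stands in for a real database or legacy handle.
public class ConnectionManager {
    static class Connection { /* would wrap a costly resource handle */ }

    private final Deque<Connection> pool = new ArrayDeque<>();
    int opened = 0; // how many physical connections were ever created

    // Steps 2-3 (and A/B): hand out a pooled connection, or open a new one.
    synchronized Connection acquire() {
        if (pool.isEmpty()) { opened++; return new Connection(); }
        return pool.pop();
    }

    // Step 6: the servlet hands the connection back; it returns to the pool.
    synchronized void release(Connection c) { pool.push(c); }

    public static void main(String[] args) {
        ConnectionManager mgr = new ConnectionManager();
        Connection c1 = mgr.acquire(); // pool empty: opens a connection
        mgr.release(c1);
        Connection c2 = mgr.acquire(); // reuses the pooled connection
        System.out.println(mgr.opened); // 1
    }
}
```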
Caching
Defined: storage of, and reference to, data in a location that can be accessed faster and/or with higher aggregate bandwidth.
Done at every level of a system:
- Processor/memory
- Computer/disk
- Browser
- Web
Simplest when there is only one, infrequent writer of the data. Issues:
- Write-through caches
- Cache invalidation
Caching (continued)
More complex when there are multiple writers and/or higher-frequency updates: this is the distributed cache consistency problem. It happens in:
- Computer architecture
- Multi-computer architectures
- Distributed systems of all types, including the web
Examples: browser cache, DNS, mirror sites, etc.
Techniques Applied to Web Tiers
Dimensions of the Scaling Techniques
Scaling Technique     | Increase Power | Improve Efficiency | Shift / Reduce Load
Faster Machine        | X              |                    |
Replicate Machines    | X              |                    |
Specialized Machines  | X              | X                  |
Segmented Workload    |                | X                  | X
Request Batching      |                | X                  |
User Data Aggregation |                | X                  |
Connection Management |                | X                  |
Caching               |                | X                  | X
Module A-6: Caching and Replication
The Technology Behind the Techniques
Cache Consistency Techniques
Fuzzy:
- Use a time-out and hope for the best
- Setting the time-out is very tricky and error-prone
Consistent caching:
- Use distributed cache consistency algorithms
- There are trade-offs between availability and consistency
- Algorithms are very tricky, but can be gotten right
- The typical approach is the concept of token management:
  - Read token
  - Write token
  - Usually more tokens are required to make things really work
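The "fuzzy" time-out approach can be sketched as a TTL cache; the clock is passed in explicitly so the behavior is easy to test, and all names are illustrative:

```java
import java.util.HashMap;
import java.util.Map;

// "Fuzzy" consistency: cache entries expire after a fixed time-out (TTL).
public class TtlCache {
    static class Entry {
        final String value; final long expiresAt;
        Entry(String value, long expiresAt) { this.value = value; this.expiresAt = expiresAt; }
    }

    private final Map<String, Entry> cache = new HashMap<>();
    private final long ttlMillis;

    TtlCache(long ttlMillis) { this.ttlMillis = ttlMillis; }

    // Returns the cached value, or null if absent or timed out
    // (on null, the caller re-fetches from the backing store).
    String get(String key, long now) {
        Entry e = cache.get(key);
        return (e == null || now >= e.expiresAt) ? null : e.value;
    }

    void put(String key, String value, long now) {
        cache.put(key, new Entry(value, now + ttlMillis));
    }

    public static void main(String[] args) {
        TtlCache cache = new TtlCache(1000); // 1-second time-out
        cache.put("forecast", "rain", 0);
        System.out.println(cache.get("forecast", 500));  // rain (still fresh)
        System.out.println(cache.get("forecast", 1500)); // null (timed out)
    }
}
```

The trickiness the slide mentions is exactly the choice of `ttlMillis`: too short and the backing store is hammered, too long and readers see stale data.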
Replication
Definition: explicit creation, maintenance, and access of multiple copies of some resource:
- Processors
- Bandwidth
- Data
- Etc.
Why replicate?
- Throughput
- Bandwidth
- Availability
- Integrity
Replication vs. Data Partitioning
Replication:
- Same or overlapping data stored at multiple locations
Partitioning:
- Data is non-overlapping
- Typically, only one "home" for any data element
Replication vs. Caching
Difference between caching and replication:
- Caching: there is a fundamental difference between a cached copy and the real "backing" data. Loss of the cache is not a failure, except from the perspective of performance.
- Replication: all replicas are of the same type, albeit not necessarily identical. Loss of a replica is a failure and could result in a higher likelihood of lost data.
Semantics of Replication
Consistency/fuzzy replication: the same issue as in caching, above.
What does consistency mean?
- Ticket sales (OK not to show all the seats)
- Latest score in a basketball game (can lag by up to n seconds)
- Weather forecast (variable lag, depending on the severity of change)
- Prices for certain goods (perhaps they need to be exact, as differentials would cause customer dissatisfaction)
Replication Algorithms Abound
Unanimous Update:
- Always update all copies; read from any copy
- Excellent read throughput and excellent read availability
- Very poor write throughput and write availability
Unanimous Read:
- Always read all copies; update any copy
- Excellent write throughput and availability
- Very poor read throughput and availability
Additional Replication Algorithms
Primary Copy:
- Must update the primary copy; the primary copy ensures all other copies get updated; read from any copy
- Excellent read throughput and availability
- Poor write availability
- Significant complexity in ensuring the primary copy updates all other replicas
Voting:
- Assume n copies; read from any r; write to n-r+1
Replication Conclusions
All algorithms are quite difficult to implement, but replication has compelling benefits:
- Best long-term approach for high data availability (software update or data reorganization, disaster recovery)
- Obvious performance benefits as well, at least for data that is either read or written infrequently (often, one of these is true)
Systems support for replication is required if implementation is to be feasible: atomic transactions in particular.
Module A-7: An Example of Replication: Weighted Voting
This algorithm is due to Dr. David Gifford and was published by the ACM in 1979, …
Replication Algorithms Abound
Unanimous Update, Unanimous Read, Primary Copy, Weighted Voting
Weighted Voting:
- Assume n copies
- Read from r (r ≤ n), the "read set"
- Write to n-r+1, the "write set"
- The concept is that there is overlap between the read and write sets, ensuring an up-to-date copy is seen.
Weighted Voting in More Detail
Each replica is assigned a "weight"; each replica stores a {version #, value} pair.
Read algorithm:
- Read from r copies
- Choose the value associated with the highest version #
Write algorithm:
- Read from r copies to obtain version_number_i
- Update n-r+1 copies using 1 + max(version_number_i)
The invariant is that there are always n-r+1 copies of the data, and each of these has the same, highest version number.
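The read and write algorithms above can be sketched in a few lines. This in-memory version uses fixed prefix quorums (so the read and write sets always overlap) and omits weights, failures, and locking; all names are illustrative:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// In-memory weighted-voting sketch: n replicas, read quorum r,
// write quorum n - r + 1, each replica holding a {version, value} pair.
public class WeightedVoting {
    static class Replica { int version = 0; String value = null; }

    private final List<Replica> replicas = new ArrayList<>();
    private final int n, r;

    WeightedVoting(int n, int r) {
        this.n = n; this.r = r;
        for (int i = 0; i < n; i++) replicas.add(new Replica());
    }

    // Read r copies; choose the value with the highest version number.
    String read() {
        return replicas.subList(0, r).stream()
                .max(Comparator.comparingInt(x -> x.version)).get().value;
    }

    // Read r copies to learn max(version_i), then write n - r + 1 copies
    // with version 1 + max(version_i).
    void write(String value) {
        int maxVersion = replicas.subList(0, r).stream()
                .mapToInt(x -> x.version).max().orElse(0);
        for (Replica rep : replicas.subList(0, n - r + 1)) {
            rep.version = maxVersion + 1;
            rep.value = value;
        }
    }

    public static void main(String[] args) {
        WeightedVoting wv = new WeightedVoting(3, 2); // n = 3, r = 2
        wv.write("a"); // version 1 lands on 2 of the 3 replicas
        wv.write("b"); // version 2 overwrites the same quorum
        System.out.println(wv.read()); // b (the quorums overlap)
    }
}
```

Since r + (n - r + 1) = n + 1 > n, any read set must intersect any write set, which is exactly the overlap the slide relies on.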
Weighted Voting Example
[Worked example: three replicas A, B, and C, each shown with its weight, version number, and value, stepped through the read and write quorums]
Weighted Voting Pragmatics
When read set is small, read availability and throughput is high When write set is small, write availability and throughput is high Or is it?
Writes require reads in the version-number-based algorithm… Solution involves monotonically increasing timestamps:
Time clocks are typically not used by themselves Sequence numbers get passed on every message and continually
updated
Problems
What happens if a replica is down? Answer: self-healing; replica eventually restored
What happens if there are concurrent updates? What happens if reads occur during an update? What happens if there are failures during writes?
The algorithm fails The invariant gets violated The algorithm produces inconsistent results
The Solution
The atomic transaction Distributed updates and reads are done within the scope of a
transaction ACID Properties automatically maintained by system
Atomicity Consistency Isolation Durability
These properties make it possible to maintain invariants on distributed objects; e.g., the replicas
The Atomic Transaction
In a few weeks, we will discuss the concept fully Usage Implementation (major purpose of the book by Bernstein)
It will play an important role in the course because: The Web is a distributed structure There need to be invariants maintained across data Doing this by hand (if one worries about failures) is very tedious
Load Balancing Module A-8
Load Balancing
Definition: Load Balancing refers to a technique that uses a load balancing algorithm (LBA) to choose a replica
Definition: An LBA is an algorithm (typically distributed) that permits a client to select a replica that meets performance & availability goals
Participants in the algorithm include clients and commonly replicas and other intermediaries
May want priority for certain requests
Load Balancing In Use - Examples
Direct a data read or write to: An unloaded replica A nearby replica A replica that will not charge much for its service …
Direct a processing request to: A replica that will complete the request with minimum latency A node that has been used for similar processing, so its cache is
primed …
Many Approaches to Load Balancing
Maintain a replicated directory service Client can consult an instance of it to gain an address of a replica Approaches
Directory can return set of replicas and client can use algorithm to determine proper replica
Or, Directory service can apply algorithm and return proper replica
Can use a replicated intermediary that is a forwarding service
Algorithms for Directing Load
Randomization Round-robin Dynamic: Based on recent replica performance Locality-based (recent usage) Content-based Geography or Topology-based Negotiation-based (Request for Proposal -- direction to lowest bidder)
Randomization
Simple Excellent if
Locality effects are not important
Reasonable distribution of requests, in both timing and duration
No need for priority-based execution
Willingness to accept stochastically good performance
Round-Robin
What is meant by Round-Robin Intra-client round robin? Inter-client round robin?
Simple Excellent if
Locality effects are unimportant (or non-existent) Requests have similar duration
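Both selection policies can be sketched in a few lines; the replica names are hypothetical, and the round-robin function models a single client's view (intra-client round robin):

```python
import itertools
import random

replicas = ["replica-a", "replica-b", "replica-c"]  # hypothetical names

def pick_random():
    """Randomization: stochastically even spread, no state kept anywhere."""
    return random.choice(replicas)

_rr = itertools.cycle(replicas)
def pick_round_robin():
    """Intra-client round robin: this client cycles through the replicas."""
    return next(_rr)

assignments = [pick_round_robin() for _ in range(6)]
print(assignments)  # each replica receives exactly 2 of the 6 requests
```

Randomization needs no coordination at all; round robin gives a perfectly even spread per client, but only if requests have similar duration, as the slide notes.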
Add’l Topics for Randomization & RR
Algorithms should take into account: Differential capacity of replicas Differential capacity of networks Ownership of resources Security issues
Dynamic Load Balancing
Can track in one or more places: Actual performance by replica Metrics of replica loading Results of probes
That information can be used to determine the best replica Complex Advantages
Can provide excellent results in situations where randomized or round-robin load balancing does not
Can be customized to provide priority, etc.
A Strawman LBA
Assumptions below… Clients 1..n, Datagatherer, & Replicas A & B DataGatherer
Probes replicas every 60 seconds, (Time = 0, 60, …) Chooses least loaded replica & reports it for 60 secs
Clients Issue requests to replicas based upon consulting the DataGatherer Service time for requests is ~10 secs w/ low variance
What’s the Result?
A metastable system: all load oscillates between Replica A and Replica B
Problem: reported load not tracking actual load Solutions
More frequent probes: probes should happen more frequently than 1/average(service time)
LBA should be less definitive in nature; e.g., somewhat stochastic In any case, designing good load balancing algorithms is hard without
knowing lots of information about the load
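The oscillation can be reproduced with a toy discrete-time simulation. The one-request-per-second aggregate arrival rate and the simple queue-drain model are simplifying assumptions, not from the slides; only the 10 s service time and 60 s probe interval are:

```python
# Discrete-time sketch of the strawman LBA: one request per second from the
# client population, ~10 s service time, replica loads probed every 60 s.
SERVICE_TIME = 10
PROBE_INTERVAL = 60

load = {"A": 0.0, "B": 0.0}   # queued work, in seconds, per replica
target = "A"
choices = []

for t in range(240):
    if t % PROBE_INTERVAL == 0:        # probe: report the least-loaded replica
        target = min(load, key=load.get)
    load[target] += SERVICE_TIME       # all traffic goes to the reported replica
    for name in load:                  # one second of service drains each queue
        load[name] = max(0.0, load[name] - 1.0)
    choices.append(target)

reported = [choices[i] for i in range(0, 240, PROBE_INTERVAL)]
print(reported)  # ['A', 'B', 'A', 'B']
```

The reported "best" replica flips at every probe, exactly because the 60 s report lags the load it creates: whichever replica is named immediately becomes the more loaded one.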
Locality-based
Premise is that a replica that has serviced a certain type of request recently should do so again
Why? Efficiency due to already available resources
E.g., open files or databases Efficiency due to security
E.g., secure communication sessions Complexity: how to combine with other techniques, as locality alone may not be enough
Content-based
As in data partitioning, assume certain types of data can best be handled by certain sites Site A stores “aa…az” in random access memory Site B does the same for “ba..bz” Therefore, “a” requests should generally go to Site A.
This is actually an approach for achieving locality
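A sketch of this partitioning, with hypothetical site names; routing is a pure function of the request's content (here, the first letter of the key):

```python
# Content-based LBA: each site holds one key-range partition in memory.
SITES = {"a": "site-A", "b": "site-B"}  # hypothetical site names

def route(key):
    """Send the request to the site that already holds the matching data."""
    return SITES[key[0].lower()]

print(route("apple"), route("banana"))  # site-A site-B
```

Because the same key always routes to the same site, the site's memory-resident partition acts as a permanently primed cache, which is why the slide calls this an approach for achieving locality.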
Geography or Topology-based
Based on co-location of client and replica May be an indicator of
Higher bandwidth Shorter latency Increased reliability Better security
Domain names are now registered with geographical coordinates
Negotiation-based
Virtual capitalism in action: Issue an RFP Evaluate the proposals Ship work as appropriate
Cost of load-balancing overhead must be less than benefit This approach can get very interesting quickly:
Contractual commitments and compensation if unmet A way to do Pareto optimal scheduling
Useful to implement for real load balancing in business-to-business e-commerce
Role of Caching
Cache results of LBA for performance and availability The usual problem of cache correctness
How long until cache refresh Time-outs too short -> load balancing algorithm places too much load Time-outs too long -> data is insufficiently fresh
What happens when cache sends you to a failed site If faulty cached-data, go back and refetch This leads to the definition of a Hint
A cached entry which is right with high probability, but can be and always is checked for validity prior to use
The issue of time-out appears again
Example: Load Balancing to HTTP Server
User specifies http://www.xxx.com Request should actually be handled by one of many HTTP servers to
provide higher throughput One approach: do request re-direction (a type of forwarder)
See the HTTP protocol definition as in the assigned reading The forwarder is a potential bottleneck
Approach 1 – Round Robin DNS
DNS entries allow 32 server addresses per record. DNS (name) servers will cycle through the entries, thereby providing
round-robin load balancing Advantages
Cheap Easy
Round Robin DNS - Problems
Addresses of unavailable servers will remain until an administrator removes the entries
It takes hours or days for the DNS database to replicate So, the system hands out addresses of down servers for a long time Addresses of recently added servers take a while to become visible
All servers treated equally New servers will likely be faster than the old ones and
could handle more load Some servers may handle multiple workloads and should get fewer
requests
Cisco Local and Distributed Director
See:http://www.cisco.com/warp/public/cc/pd/cxsr/400/tech/scale_wp.htm
Session redirection accomplished by rewriting IP header using a mapping table
Intelligent load balancing to servers within a cluster Takes into account status of servers Uses only a single DNS entry for entire server complex
Simplifies administration Hot standby feasible
Fancier load balancing of this type Routes requests based on topological distance Routing decisions can be based on hop counts, network usage, &
round-trip latency.
IBM Secureway Network Dispatcher
http://www-4.ibm.com/software/network/dispatcher/about/features/keyfeatures.html Network dispatcher
Doesn't modify packets (vs. LocalDirector which does) Only inspects inbound requests (LocalDirector looks at both)
So, responses go back directly to the requester (greater efficiency) Background processes check servers to ensure that they are up
"advisors" support HTTP, SSL, FTP, NNTP, POP3, SMTP, Telnet This way requests don't go to down servers.
Balances load across servers of different sizes: Servers send CPU, Disk, I/O metrics to dispatcher
Supports hot standby for high availability of dispatcher Uses a "sticky" port option to route client requests to same server to
ensure state preserved across requests: recall locality topic
Failure Detection Module A-9
Failure Detection
Explicit – a clear indication that failure has occurred Timely Semantics clean, … as far as they go Voting
Implicit – timeout Requester does not receive a response after waiting a while Unclean: does not necessarily mean the remote system failed
Timeout often used in very many places/levels Communication Naming, … And, ultimately, End-to-end
Some have argued that only end-to-end timeouts are valuable, but this is incorrect
Timeout In More Depth
Problems with timeouts Semantics Specification of timeout length
Particularly difficult when requests take variable amounts of time And the requester cannot dynamically set the time-out interval Long intervals lead to poor customer satisfaction – imagine an
ATM that made you wait 10 minutes before failing and giving you your card back
Therefore, timeouts are used at multiple system levels Lower levels have more predictable performance so can trigger
timely failures better Higher levels are required for ultimate correctness
The Role of the Sequence Number
Sequence number in communication protocol Failure detection Duplicate detection Flow control
Sequence number in replication algorithms As discussed previously
Sequence number in site crash detection Sites increment a number after each failure Therefore it is possible to tell if a site has crashed This is important to avoid missing work that was supposed to be done on a site
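A sketch of crash detection via such an incarnation (sequence) number; in a real system the counter would be persisted across restarts, which this toy omits:

```python
class Site:
    """A site that increments an incarnation number each time it restarts,
    so peers can tell that a crash (and loss of volatile state) occurred."""
    def __init__(self):
        self.incarnation = 0

    def restart(self):
        self.incarnation += 1   # persisted across crashes in a real system

    def probe(self):
        return self.incarnation

site = Site()
seen = site.probe()             # client remembers the incarnation it talked to
site.restart()                  # site crashes and comes back
crashed = site.probe() != seen  # mismatch -> in-flight work may have been lost
print(crashed)  # True
```

The client cannot distinguish a slow site from a crashed-and-restarted one by liveness alone; comparing incarnation numbers makes the crash explicit, so submitted work is not silently assumed done.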
Voting
Discussed wrt: Weighted Voting Algorithm Used to determine most up-to-date copies
What if voting is used to detect incorrect data? N-way computation
Structure: N inputs: vote on them and determine the most typical input N computations on the most typical input Vote on the result N outputs which go into the next stage of computation Or go to some device which itself votes
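The per-stage vote can be sketched as a strict-majority voter over the n module outputs (the threshold rule here is one simple choice; real voters may also compare approximately):

```python
from collections import Counter

def vote(outputs):
    """N-way voting: run the computation on n modules and accept the
    majority result, masking a minority of faulty outputs."""
    value, count = Counter(outputs).most_common(1)[0]
    if count <= len(outputs) // 2:
        raise RuntimeError("no majority: too many faulty modules")
    return value

print(vote([42, 42, 7]))  # the faulty module's 7 is outvoted -> 42
```

With n = 3 modules this masks any single faulty output; if no value reaches a majority, the voter reports failure rather than guess.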
Yahoo Denial of Service Attack
Mostly unavailable 10:20AM – 12:00PM PST 2/7/00 Reported cause (NYT, 2/8/00)
50 computers “flooded” Yahoo site 1 gigabyte/second or 20 mbytes/computer/second “Clogging” Yahoo’s site and routers Difficult to trace due to use of hijacked computers
Solutions Audit, Filter, Legal System
Typical Yahoo availability: 99.3%, according to Keynote Systems Corresponds to being down 61 hours/year And, Yahoo is a good site
Achieving Availability with MalleabilityModule A-10
The Goal: Malleability
How do you change the system without taking it down? The application The operating system Perhaps, even a change to the hardware
This has proven very hard
An Approach
Ensure a service is replicated Stop a copy Augment its interfaces Restart it And repetitively do the same to the other copies Eventually, all replicas will have the new capabilities Note: it is very hard to reduce the scope of interfaces; augmentation is
much easier.
An Example
Assume you want to modify the function of a replicated directory while it is online
Assume there are: Multiple instances of the replicated directory itself, called
CtrReplicaGrps Multiple instances of the individual replicas, called CtrReplicas As in the weighted voting algorithm discussed earlier
Technique (1)
[Part 1 to be discussed at the end] Part 2: one by one,
Stop a CtrReplica (hope a failure doesn’t occur simultaneously) Start a new version Do for all CtrReplicas The CtrReplicaGrps should not mind this gradual change. (They
don’t use the new methods… yet…) Also, they can tolerate the failure of a CtrReplica
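Part 2 can be sketched as a loop; `CtrReplica` here is a toy stand-in for a real replica process, and handling of a simultaneous failure is omitted:

```python
class CtrReplica:
    """Toy replica: tracks only whether it is running and which version it runs."""
    def __init__(self, name):
        self.name, self.running, self.version = name, True, 1
    def stop(self):
        self.running = False
    def start(self):
        self.running = True

def rolling_upgrade(replicas, new_version):
    """One by one: stop a replica, install the new version, restart it.
    At most one replica is down at any moment, which a quorum-based group
    (e.g. weighted voting) already tolerates."""
    for rep in replicas:
        rep.stop()
        rep.version = new_version   # augment the interface / install new code
        rep.start()

group = [CtrReplica(n) for n in "ABC"]
rolling_upgrade(group, 2)
print(all(r.running and r.version == 2 for r in group))  # True
```

The group stays available throughout because each stop looks to the rest of the group like an ordinary, tolerated replica failure; the same loop is then repeated over the CtrReplicaGrps in Part 3.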
Technique (2)
Part 3: Now, one by one: Stop a CtrReplicaGrp Start the new version Do for all CtrReplicaGrps
Now, there is a new function available. Finally, do Part 1: test what we have so carefully installed, so we haven’t
just (methodically) inserted a bug into the entire, supposedly fault-tolerant, system
Issues
Issues: Too many steps for a human being to get right
So, need automation via console May not handle a simultaneous failure during upgrading:
So, more replicas may be needed Cost of availability: The shape of this curve is right, though the calibration is
unknown and undoubtedly flattens as experience grows
Window of Vulnerability
If transactions used, there is a potential availability problem during the “Window of Vulnerability”
The only solution is that transaction coordinators must be rather reliable and be guaranteed to recover quickly after a crash
Availability
So, considerable thought is required to achieve high availability in malleable systems
Better when it is not needed However, when high availability is required
Every level of the system needs to be studied and addressed
The Architecture As We’re Studying It
[Architecture diagram]
Client → Servlet/JSP → EJB → DB, MS
Tooling: Integrated Dev't Environment; Modeling and Other Software Eng. Tools; Reusable Components
Runtime: Java Runtime Environment; Security/Directory (X509, LDAP, Kerberos)
Operating systems: Linux, NT, AIX, Solaris, Sys/390
Cross-cutting: Systems Mgmt; Reliable Messaging; Workflow Management