IBM Labs in HaifaCopyright 2000-2003
IBM Corporation
Advanced Web Applications Development
Technion CS 236606 Spring 2003, Class 2
Eliezer Dekel
March 2003
Material is based on an original by Dr. Alfred Spector & Dr. Jeffrey Eppinger; updated by Eliezer Dekel.
Table of Contents
Module A-1: Introduction
Module A-2: Multi-tier Architectures
Module A-3: Application Taxonomy
Module A-4: Requirements of Web Applications
Module A-5: Techniques for Scaling
Module A-6: Caching and Replication
Module A-7: An Example of Replication: Weighted Voting
Module A-8: Load Balancing
Module A-9: Failure Detection
Module A-10: Achieving Availability with Malleability
Module A-1: Introduction
Complex heterogeneous infrastructures are a reality!
[Diagram: a typical e-business infrastructure: Internet firewalls, DNS server, web servers, load balancers, data caches, director and security services, a web application server, a data server with business data, a storage area network, existing applications and data, and business partners / external services]
Dozens of systems and applications
Hundreds of components
Thousands of tuning parameters
One of the Data Centers (500 servers)
[Network diagram: Canyon Park Data Center, the Microsoft.com network. Redundant Cisco 7000 routers, Catalyst 5000/2926 switches, and ASX-1000 ATM switches front racks of IIS web servers (www, search, support, msdn, download, ftp, windowsmedia, and other *.microsoft.com sites), live/backup/consolidator SQL Servers, stagers, build servers, PPTP/terminal servers, and monitoring servers. Drawn by Matt Groshong; last updated April 12, 2000. IP addresses removed by Jim Gray to protect security.]
Microsoft.com Server Count:
FTP: 6
Build Servers: 32
IIS: 210
Application: 2
Exchange: 24
Network/Monitoring: 12
SQL: 120
Search: 2
NetShow: 3
NNTP: 16
SMTP: 6
Stagers: 26
Total: 459
Module A-2: Multi-Tier Architectures
Where it All Takes Place
Recall: 2-tier vs. n-tier Architecture
[Diagram: a 2-tier architecture, where the client (browser) talks through tier-2 logic directly to a database, beside an n-tier architecture, where the client goes through tier-2 and tier-3 logic to reach the data]
Why 2-tier?
(Often called "client-server", which is a bad name because it is too general)
- Simple
- Better for dynamic queries
- Potentially more efficient (probably not in reality)
- Perhaps more processing off-loaded to the client (for better or worse)
- Global data modeling is not practical
Examples of Two-, Three-, and Four-Tiered Infrastructures
Why n-tier?
- Modularity via objects, not an enterprise-wide data model
- "Thin" clients, since "fat" clients are infeasible
- Security
- Easier replication of business logic
- Flexibility
- Performance (due to flexibility)
- Manageability
- All data need not be in one data model
- All data need not be in one database brand
- Etc.
Even with n-tier, Databases Crucial
Databases need to have all the functions required in 2-tier, and more:
- Data model support
- Concurrency control
- Security
- Integrity
- Performance
- Manageability
- Support for heterogeneity
Databases in a Heterogeneous World
There needs to be semantic consistency while using multiple databases:
- Atomicity
- Consistency
- Isolation
- Durability
(Transactions will be covered later.)
It is desirable that applications interoperate with multiple databases:
- Same API to access multiple databases
- Ability to access multiple databases
Hence the motivation for JDBC and ODBC.
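The "same API" point can be illustrated with a small sketch: the application code below is written once against a common interface and runs unchanged over two database "brands". The interface is a hand-rolled stand-in for JDBC, and all class names are illustrative.

```java
// A hand-rolled stand-in for a common database API such as JDBC.
public class SameApiDemo {
    interface Database {
        String query(String sql);
    }

    // Two database "brands" behind the same interface.
    static class BrandA implements Database {
        public String query(String sql) { return "A:" + sql; }
    }
    static class BrandB implements Database {
        public String query(String sql) { return "B:" + sql; }
    }

    // Application code is written once, against the common API.
    static String report(Database db) {
        return db.query("SELECT COUNT(*) FROM orders");
    }

    public static void main(String[] args) {
        System.out.println(report(new BrandA())); // A:SELECT COUNT(*) FROM orders
        System.out.println(report(new BrandB())); // B:SELECT COUNT(*) FROM orders
    }
}
```

With JDBC the role of `Database` is played by `java.sql.Connection`, and the brand is selected by the driver rather than by application code.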
Module A-3: Application Taxonomy
Characterizing Web Applications
Application Taxonomy
- Applications are typically made up of many interactions with a client
- How the application must be built depends on the type of interactions that comprise it
- This seems trivial, but it is where all architecture starts
- All interactions are, to varying degrees, asynchronous or synchronous
- Influencing all interactions are requirements for concurrency, throughput, latency, ...
- Interactions are sometimes called "transactions," though no specific semantic properties are applied to the word transaction when used in this way
Workload Characteristics
Application Functionality
- Types of interaction: inquiry (static and dynamic) vs. transactions
- Volume of transactions
- Volume of user-specific responses (personalization)
- Amount of cross-session info
- Transaction complexity
- Data volatility
- Integration with legacy systems
Usage Patterns
- Number of unique items
- Number of page views
- Volume of dynamic searches
- Transaction volume swings
Infrastructure Constraints
- % secure pages (privacy)
- Security: authentication, integrity, non-repudiation, regulations
Types of Web Applications
Publish and Subscribe
- Web portals such as yahoo.com, excite.com
- Media sites such as www.nfc.co.il, zdnet.com
- Events such as www.usopen.org, www.wimbeldon.org
Shopping
- Exact inventory sites: Victoriassecret.com, Abercrombie.com
- Inexact inventory sites: buy.com, dvdexpress.com
Customer Self Service
- Home banking: bankone.com, wingspanbank.com
- Travel sites: Travelocity
- Insurance: amica.com
Trading
- Online brokerages: schwab.com, fidelity.com, etrade.com
- Auction sites: ebay.com, priceline.com
- Games: interactive group game servers
Workload Characteristics of Web Applications
[Table: system workload characteristics (transaction volumes, dynamic content and dynamic searches, user-specific responses/personalization, cross-session information, legacy integration, data volatility, transaction volume swings, number of content publishers/sources, number of unique items per page, page content volatility, number of page views, security/authentication, percentage of secure pages, and transaction complexity), each rated Low/Medium/High for Publish & Subscribe, Shopping, Customer Self Service, and Trading sites]
Application Taxonomy: Read Transactions
Read-only transactions:
- Highly static: X-ray, corporate information, entertainment video, 1990 Census
- Nearly static: train schedule, catalog without quantities
- Dynamic: weather forecast, catalog with quantities
- Dynamic with high consistency requirements: account balance, catalog with quantities
- Dynamic data with high consistency and rapid update: rock concert sales with assigned seating
Application Taxonomy: Update Transactions
- Update w/ modest integrity: Amazon book comment
- Update w/ high integrity: billing record
- Update w/ asynchronous processing: stock trade
- Update w/ loosely coupled processing: buying a physical product over the net, or ordering/provisioning a new ISDN line
Issues
It is the type of application along the read-only and update dimensions that greatly impacts:
- How applications are architected
- What system support is needed
For each of the previous examples, it is worth considering the implications.
Module A-4: Requirements of Web Applications
Requirements - Summary
- Availability
- Scalability
- Security
- Performance
- Integrity
- Manageability
- Malleability/Longevity
- Integration
- Cost
Availability
Defined as a measurement of perceived uptime by a user.
There are 86,400 seconds in a day (~100,000) and 31,536,000 seconds in a year (~30 million).
99% uptime represents 1% downtime, which is:
- 864 seconds/day, or 14.4 minutes/day
- 315,360 seconds/year, or 5,256 minutes/year, or 88 hours/year
Percentage Uptime      Downtime
99.99%                 53 minutes/year (0.14 minutes/day)
99.999%                5 minutes/year
99.9999%               30 seconds/year
99.99999% (7 nines)    3 seconds/year
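The downtime figures above follow from straightforward arithmetic. A minimal sketch (class and method names are illustrative):

```java
// Convert an uptime percentage into downtime, as in the table above.
public class Availability {
    static final double SECONDS_PER_YEAR = 31_536_000.0; // 365 days
    static final double SECONDS_PER_DAY = 86_400.0;

    // Downtime in seconds per year for a given uptime percentage.
    static double downtimeSecondsPerYear(double uptimePercent) {
        return SECONDS_PER_YEAR * (100.0 - uptimePercent) / 100.0;
    }

    // Downtime in seconds per day for a given uptime percentage.
    static double downtimeSecondsPerDay(double uptimePercent) {
        return SECONDS_PER_DAY * (100.0 - uptimePercent) / 100.0;
    }

    public static void main(String[] args) {
        System.out.println(downtimeSecondsPerYear(99.0));   // 315360.0 (~88 hours)
        System.out.println(downtimeSecondsPerDay(99.0));    // 864.0 (14.4 minutes)
        System.out.println(downtimeSecondsPerYear(99.999)); // ~315 seconds (about 5 minutes)
    }
}
```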
Availability - Discussion
What do you see on the web? Why? What will be required in the future?
In the News
Source: Gartner Group
Downtime Costs (per Hour)
- Brokerage operations: $6,450,000
- Credit card authorization: $2,600,000
- Ebay (one 22-hour outage): $225,000
- Amazon.com: $180,000
- Package shipping services: $150,000
- Home shopping channel: $113,000
- Catalog sales center: $90,000
- Airline reservation center: $89,000
- Cellular service activation: $41,000
- On-line network fees: $25,000
- ATM service fees: $14,000
Sources: InternetWeek 4/3/2000; Fibre Channel: A Comprehensive Introduction, R. Kembel, 2000, p. 8 ("...based on a survey done by Contingency Planning Research")
September 11, 2001
- Only 15% of the companies in the World Trade Center had a working business continuity plan
- One law firm did not have a backup outside of the building; it went out of business
- One trading firm was able to transition immediately to a backup site across the river, with absolutely no interruption to its customers
- An investment bank had only a tape backup; it took them four days to recover
Scalability
The capability of a system to adapt readily to a greater or lesser intensity of use, volume, or demand while still meeting its business objectives (acceptable levels of performance, availability, manageability, etc.)
How resource utilization grows with load:
- Ideal: the system degrades gracefully as load increases (seldom happens)
- Good situation: utilization increases linearly with load
- Typical: utilization increases faster than the load
- Bad situation: everything seems OK until load increases (poor design)
[Figure: resource utilization vs. load for each of the four cases]
Security
Privacy Authentication Authorization Audit Non-repudiation
Performance
How long does it take to get a response to a request from the system?
Top-level metrics:
- Latency
- Throughput: how many transactions can be completed in a unit of time (capacity)
Subsidiary metrics:
- CPU
- Network bandwidth
- I/O of various types
- ...
Integrity
- Data correctness
- Data permanence
- Disaster recovery
- Data currency
Manageability
Consider the number of elements in a web application:
- Consistency
- Security
- Modifications
- Performance
- Configuration
- Training level required of operators
Malleability/Longevity
- Continuous availability (despite update and failure)
- Time period of use of the program
Integration
Note: millions of person-years are spent on applications every year. This represents a total multi-trillion dollar investment. Hence, integration is a necessity.
Integration approaches:
- Application to application
- Data sharing by multiple applications
- Process (complex application integration)
For some applications, integration cost is 7x the cost of the system, yet this is less than recreating existing applications or losing the benefits of integrated systems.
Cost
- Initial implementation
- Modification
- Installation
- Management (management cost is greater than development cost – usually at least double)
Total Cost of Ownership
[Pie chart: Backup/Restore 30%, Downtime 20%, Purchase 20%, Environmental 14%, Administration 13%, HW management 3%]
- Administration: all people time
- Backup/Restore: devices, media, and people time
- Environmental: floor space, power, air conditioning
Cause of System Crashes
[Bar chart: causes of system crashes in 1985, 1993, and 2001 (est.): hardware failure, operating system failure, system management (actions + N/problem), and other (application, power, network failure). Hardware and OS failures shrink over time while system management becomes the dominant cause.]
Current State of the Art:
- Failures due to people are up, but hard to measure
- VAX crashes '85, '93 [Murp95]; extrapolated to '01
- HW/OS went from 70% in '85 to 28% in '93; in '01, 10%?
- How do you get an administrator to admit a mistake? (Heisenberg?)
(Based on the lecture "Recovery Oriented Computing" by Dave Patterson, Berkeley)
Module A-5: Techniques for Scaling
Techniques for achieving the requirements
Motivation
Defined: data is stored without overlap across multiple sites, and each site processes its data the same way.
This is the architecture of the web (order of magnitude circa 10^12 hits/day).
Back-of-the-envelope thought exercise:
- 10^12 hits/day is roughly 10^7 hits/sec
- Assume a server can handle an average number of hits ranging from 10^1/sec to 10^4/sec
- Then there must be 10^3 – 10^6 web sites to meet the load…
Examples (data partitioning – segmented workload):
- 1999 data on one site, 1998 on another…
- a's on one site, b's on another…
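The back-of-the-envelope estimate can be made explicit; all figures are the slide's order-of-magnitude assumptions, not measurements:

```java
// How many sites are needed to serve ~10^12 hits/day?
public class Envelope {
    // Sites needed if each one sustains hitsPerSecPerSite hits per second.
    static double sitesNeeded(double hitsPerDay, double hitsPerSecPerSite) {
        double hitsPerSec = hitsPerDay / 86_400; // ~10^7 for 10^12 hits/day
        return Math.ceil(hitsPerSec / hitsPerSecPerSite);
    }

    public static void main(String[] args) {
        System.out.println(sitesNeeded(1e12, 1e4)); // on the order of 10^3
        System.out.println(sitesNeeded(1e12, 1e1)); // on the order of 10^6
    }
}
```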
Some typical Web site loads over a 24-hour period
Example Response Time Budget
- Client request: 5%
- Request network latency: 5%
- Server time: 55%
- Response network latency: 20%
- Client response processing: 15%
How Latency Varies Based on Workload Pattern and Tier
Achieving the Requirements
- Faster machines (vertical growth)
- Replicated machines (horizontal growth)
- Specialized machines
- Segmented workloads
- Request batching
- User data aggregation
- Connection management and caching
It is important to note that a detailed understanding of the application is key to a successful implementation.
Faster Machines - Vertical Growth
Scalability can be achieved through the use of faster machines. This technique can include:
- Moving to hardware that is bigger than the current environment, for example moving a web server from a PC-based server running NT to a UNIX-based server
- Using machines with more CPUs to leverage the operating system's multitasking and multiprocessing capabilities
- Using machines that leverage other computing paradigms, such as parallel computing
- Using better software that is optimized for the CPU
- Using faster hardware components such as memory, cache, disk, I/O devices, etc.
Replicated Machines - Clusters
Adding more machines of the same type and load balancing requests across these machines. To implement this technique we have to add components to the architecture, such as:
- A dispatcher node that can monitor and load-balance processing requests across the replicated machines
- A synchronization node that synchronizes the content and data across the machines
- A mechanism for managing sessions across replicated machines
Specialized Machines
Individual components of the architecture can be scaled by using specialized machines that perform a certain function much faster. This technique is typically used in architectures to facilitate:
- Intelligent routing of traffic and data across replicated machines
- Dynamic caching, used extensively by event sites and other media sites to speed up access to frequently accessed content
- Security and encryption, used by high-volume sites to speed up SSL encryption and decryption
Segmented Workload
This technique is typically used in conjunction with replicated machines. It involves partitioning the workload of an application to achieve optimum performance. There are several ways of implementing it:
- URL references: the most simplistic form, segmenting the workload by analyzing the URL and directing the requests to the appropriate servers
- Functional partitioning: analyze the application and partition the workload through custom programming
- Data partitioning: place segments of the data on different machines
[Diagram: requests routed to separate servers for Function 1, Function 2, and Function 3]
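URL-based segmentation, the simplest variant above, can be sketched as a routing function; the path prefixes and server-group names are illustrative:

```java
// Route each request to a server group based on its URL path prefix.
public class UrlRouter {
    static String route(String path) {
        if (path.startsWith("/search"))   return "search-servers";
        if (path.startsWith("/download")) return "download-servers";
        return "www-servers"; // default group for everything else
    }

    public static void main(String[] args) {
        System.out.println(route("/search?q=scaling")); // search-servers
        System.out.println(route("/download/sp1.exe")); // download-servers
        System.out.println(route("/index.html"));       // www-servers
    }
}
```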
Request Batching
Multi-tier communication places a large computational load on both the client tier (requester) and the server tier. It also introduces considerable latency. Furthermore, the overhead costs of virtually all cross-tier requests are roughly equal; therefore it is much better to make fewer, but larger, requests.
The goal of this technique is to reduce the number of requests that are sent between requesters and responders (such as between tiers or processes) by allowing the requester to define new requests that combine multiple requests.
[Diagram: many separate client-server round trips, versus a single combined "Command" request covering them all]
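A minimal sketch of the batching idea: several queries travel in one combined command, so the fixed per-call overhead is paid once. The in-process "server" and all names are illustrative:

```java
import java.util.ArrayList;
import java.util.List;

// One combined command carries several requests in a single round trip.
public class Batching {
    static int roundTrips = 0; // counts cross-tier calls actually made

    // The "server": one call that answers a whole batch of queries.
    static List<String> execute(List<String> queries) {
        roundTrips++;
        List<String> results = new ArrayList<>();
        for (String q : queries) results.add("result-of-" + q);
        return results;
    }

    public static void main(String[] args) {
        // Three requests, one round trip.
        List<String> out = execute(List.of("price", "stock", "reviews"));
        System.out.println(out.size() + " results, " + roundTrips + " round trip");
    }
}
```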
User Data Aggregation
This technique aggregates the most commonly accessed data from multiple backend systems to speed up the overall performance of the architecture. It is typically implemented using:
- Custom programming
- Intelligent middleware
- Data replication
[Diagram: a client reaching many backend servers directly, versus reaching them through a single aggregating server]
Connection Management
This technique aims to achieve scalability by reducing the most expensive operations within an application's workflows. This includes connections to legacy systems, databases, and other servers.
[Diagram: clients send requests to a Web Application Server (WAS); a Servlet/App obtains connections to a resource from a Connection Manager backed by a connection pool]
1. The WAS passes a user request to a Servlet/App
2. The Servlet requests a connection from the Manager
3. The Manager gets a connection from the pool and gives the Servlet/App a connection
4. The Servlet uses the connection to the resource
5. The resource returns data back
6. The Servlet returns the connection to the Manager, and the connection is returned to the pool
7. The Servlet/App sends the response back
If a connection is not available:
A. The Connection Manager requests a new connection
B. It adds the connection to the pool
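The numbered steps can be sketched as a minimal in-memory connection manager; failure handling, pool limits, and real resource handles are omitted, and all names are illustrative:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Connections to an expensive resource are pooled and reused, not opened
// per request. "Connection" stands in for a real database or legacy handle.
public class ConnectionManager {
    static class Connection { /* would wrap a costly resource handle */ }

    private final Deque<Connection> pool = new ArrayDeque<>();
    int opened = 0; // how many physical connections were ever created

    // Steps 2-3 (and A/B): hand out a pooled connection, or open a new one.
    synchronized Connection acquire() {
        if (pool.isEmpty()) { opened++; return new Connection(); }
        return pool.pop();
    }

    // Step 6: the servlet hands the connection back; it returns to the pool.
    synchronized void release(Connection c) { pool.push(c); }

    public static void main(String[] args) {
        ConnectionManager mgr = new ConnectionManager();
        Connection c1 = mgr.acquire(); // pool empty: opens a connection
        mgr.release(c1);
        Connection c2 = mgr.acquire(); // reuses the pooled connection
        System.out.println(mgr.opened); // 1
    }
}
```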
Caching
Defined: storage of, and reference to, data in a location that can be accessed faster and/or with higher aggregate bandwidth.
Done at every level of a system:
- Processor/memory
- Computer/disk
- Browser
- Web
Simplest when there is only one, infrequent writer of the data. Issues:
- Write-through caches
- Cache invalidation
Caching (continued)
More complex when there are multiple writers and/or higher-frequency updates: this is the distributed cache consistency problem. It happens in:
- Computer architecture
- Multi-computer architectures
- Distributed systems of all types, including the web
Examples: browser cache, DNS, mirror sites, etc.
Techniques Applied to Web Tiers
Dimensions of the Scaling Techniques
Scaling Technique     | Increase Power | Improve Efficiency | Shift / Reduce Load
Faster Machine        | X              |                    |
Replicate Machines    | X              |                    |
Specialized Machines  | X              | X                  |
Segmented Workload    |                | X                  | X
Request Batching      |                | X                  |
User Data Aggregation |                | X                  |
Connection Management |                | X                  |
Caching               |                | X                  | X
Module A-6: Caching and Replication
The Technology Behind the Techniques
Cache Consistency Techniques
Fuzzy:
- Use a time-out and hope for the best
- Setting the time-out is very tricky and error-prone
Consistent caching:
- Use distributed cache consistency algorithms
- There are trade-offs between availability and consistency
- Algorithms are very tricky, but can be gotten right
- The typical approach is the concept of token management:
  - Read token
  - Write token
  - Usually more tokens are required to make things really work
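The "fuzzy" time-out approach can be sketched as a TTL cache; the clock is passed in explicitly so the behavior is easy to test, and all names are illustrative:

```java
import java.util.HashMap;
import java.util.Map;

// "Fuzzy" consistency: cache entries expire after a fixed time-out (TTL).
public class TtlCache {
    static class Entry {
        final String value; final long expiresAt;
        Entry(String value, long expiresAt) { this.value = value; this.expiresAt = expiresAt; }
    }

    private final Map<String, Entry> cache = new HashMap<>();
    private final long ttlMillis;

    TtlCache(long ttlMillis) { this.ttlMillis = ttlMillis; }

    // Returns the cached value, or null if absent or timed out
    // (on null, the caller re-fetches from the backing store).
    String get(String key, long now) {
        Entry e = cache.get(key);
        return (e == null || now >= e.expiresAt) ? null : e.value;
    }

    void put(String key, String value, long now) {
        cache.put(key, new Entry(value, now + ttlMillis));
    }

    public static void main(String[] args) {
        TtlCache cache = new TtlCache(1000); // 1-second time-out
        cache.put("forecast", "rain", 0);
        System.out.println(cache.get("forecast", 500));  // rain (still fresh)
        System.out.println(cache.get("forecast", 1500)); // null (timed out)
    }
}
```

The trickiness the slide mentions is exactly the choice of `ttlMillis`: too short and the backing store is hammered, too long and readers see stale data.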
Replication
Definition: explicit creation, maintenance, and access of multiple copies of some resource:
- Processors
- Bandwidth
- Data
- Etc.
Why replicate?
- Throughput
- Bandwidth
- Availability
- Integrity
Replication vs. Data Partitioning
Replication:
- Same or overlapping data stored at multiple locations
Partitioning:
- Data is non-overlapping
- Typically, only one "home" for any data element
Replication vs. Caching
Difference between caching and replication:
- Caching: there is a fundamental difference between a cached copy and the real "backing" data. Loss of the cache is not a failure, except from the perspective of performance.
- Replication: all replicas are of the same type, albeit not necessarily identical. Loss of a replica is a failure and could result in a higher likelihood of lost data.
Semantics of Replication
Consistency/fuzzy replication: the same issue as in caching, above.
What does consistency mean?
- Ticket sales (OK not to show all the seats)
- Latest score in a basketball game (can lag by up to n seconds)
- Weather forecast (variable lag, depending on the severity of change)
- Prices for certain goods (perhaps they need to be exact, as differentials would cause customer dissatisfaction)
Replication Algorithms Abound
Unanimous Update:
- Always update all copies; read from any copy
- Excellent read throughput and excellent read availability
- Very poor write throughput and write availability
Unanimous Read:
- Always read all copies; update any copy
- Excellent write throughput and availability
- Very poor read throughput and availability
Additional Replication Algorithms
Primary Copy:
- Must update the primary copy; the primary copy ensures all other copies get updated; read from any copy
- Excellent read throughput and availability
- Poor write availability
- Significant complexity in ensuring the primary copy updates all other replicas
Voting:
- Assume n copies; read from any r; write to n-r+1
Replication Conclusions
All algorithms are quite difficult to implement, but replication has compelling benefits:
- Best long-term approach for high data availability (software update or data reorganization, disaster recovery)
- Obvious performance benefits as well, at least for data that is either read or written infrequently (often, one of these is true)
Systems support for replication is required if implementation is to be feasible: atomic transactions in particular.
Module A-7: An Example of Replication: Weighted Voting
This algorithm is due to Dr. David Gifford and was published by the ACM in 1979, …
Replication Algorithms Abound
Unanimous Update, Unanimous Read, Primary Copy, Weighted Voting
Weighted Voting:
- Assume n copies
- Read from r (r ≤ n), the "read set"
- Write to n-r+1, the "write set"
- The concept is that there is overlap between the read and write sets, ensuring an up-to-date copy is seen.
Weighted Voting in More Detail
Each replica is assigned a "weight"; each replica stores a {version #, value} pair.
Read algorithm:
- Read from r copies
- Choose the value associated with the highest version #
Write algorithm:
- Read from r copies to obtain version_number_i
- Update n-r+1 copies using 1 + max(version_number_i)
The invariant is that there are always n-r+1 copies of the data, and each of these has the same, highest version number.
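The read and write algorithms above can be sketched in a few lines. This in-memory version uses fixed prefix quorums (so the read and write sets always overlap) and omits weights, failures, and locking; all names are illustrative:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// In-memory weighted-voting sketch: n replicas, read quorum r,
// write quorum n - r + 1, each replica holding a {version, value} pair.
public class WeightedVoting {
    static class Replica { int version = 0; String value = null; }

    private final List<Replica> replicas = new ArrayList<>();
    private final int n, r;

    WeightedVoting(int n, int r) {
        this.n = n; this.r = r;
        for (int i = 0; i < n; i++) replicas.add(new Replica());
    }

    // Read r copies; choose the value with the highest version number.
    String read() {
        return replicas.subList(0, r).stream()
                .max(Comparator.comparingInt(x -> x.version)).get().value;
    }

    // Read r copies to learn max(version_i), then write n - r + 1 copies
    // with version 1 + max(version_i).
    void write(String value) {
        int maxVersion = replicas.subList(0, r).stream()
                .mapToInt(x -> x.version).max().orElse(0);
        for (Replica rep : replicas.subList(0, n - r + 1)) {
            rep.version = maxVersion + 1;
            rep.value = value;
        }
    }

    public static void main(String[] args) {
        WeightedVoting wv = new WeightedVoting(3, 2); // n = 3, r = 2
        wv.write("a"); // version 1 lands on 2 of the 3 replicas
        wv.write("b"); // version 2 overwrites the same quorum
        System.out.println(wv.read()); // b (the quorums overlap)
    }
}
```

Since r + (n - r + 1) = n + 1 > n, any read set must intersect any write set, which is exactly the overlap the slide relies on.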
Weighted Voting Example
[Worked example: three replicas A, B, and C, each shown with its weight, version number, and value, stepped through the read and write quorums]
Weighted Voting Pragmatics
When read set is small, read availability and throughput is high When write set is small, write availability and throughput is high Or is it?
Writes require reads in the version-number-based algorithm… Solution involves monotonically increasing timestamps:
Time clocks are typically not used by themselves Sequence numbers get passed on every message and continually
updated
Problems
What happens if a replica is down? Answer: self-healing; replica eventually restored
What happens if there are concurrent updates? What happens if reads occur during an update? What happens if there are failures during writes?
The algorithm fails The invariant gets violated The algorithm produces inconsistent results
The Solution
The atomic transaction Distributed updates and reads are done within the scope of a
transaction ACID Properties automatically maintained by system
Atomicity Consistency Isolation Durability
These properties make it possible to maintain invariants on distributed objects; e.g., the replicas
The Atomic Transaction
In a few weeks, we will discuss the concept fully Usage Implementation (major purpose of the book by Bernstein)
It will play an important role in the course because: The Web is a distributed structure There need to be invariants maintained across data Doing this by hand (if one worries about failures) is very tedious
Load Balancing Module A-8
Load Balancing
Definition: Load Balancing refers to a technique that uses a load balancing algorithm (LBA) to choose a replica
Definition: An LBA is an algorithm (typically distributed) that permits a client to select a replica that meets performance & availability goals
Participants in the algorithm include clients and commonly replicas and other intermediaries
May want priority for certain requests
Load Balancing In Use - Examples
Direct a data read or write to: An unloaded replica A nearby replica A replica that will not charge much for its service …
Direct a processing request to: A replica that will complete the request with minimum latency A node that has been used for similar processing, so its cache is
primed …
Many Approaches to Load Balancing
Maintain a replicated directory service Client can consult an instance of it to gain an address of a replica Approaches
Directory can return set of replicas and client can use algorithm to determine proper replica
Or, Directory service can apply algorithm and return proper replica
Can use a replicated intermediary that is a forwarding service
Algorithms for Directing Load
Randomization Round-robin Dynamic: Based on recent replica performance Locality-based (recent usage) Content-based Geography or Topology-based Negotiation-based (Request for Proposal -- direction to lowest bidder)
Randomization
Simple Excellent if
Locality effects are not important
Reasonable distribution of requests, in both timing and duration
No need for priority-based execution
Willingness to accept stochastically good performance
Round-Robin
What is meant by Round-Robin Intra-client round robin? Inter-client round robin?
Simple Excellent if
Locality effects are unimportant (or non-existent) Requests have similar duration
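Both selection policies can be sketched in a few lines; the replica names are hypothetical, and the round-robin function models a single client's view (intra-client round robin):

```python
import itertools
import random

replicas = ["replica-a", "replica-b", "replica-c"]  # hypothetical names

def pick_random():
    """Randomization: stochastically even spread, no state kept anywhere."""
    return random.choice(replicas)

_rr = itertools.cycle(replicas)
def pick_round_robin():
    """Intra-client round robin: this client cycles through the replicas."""
    return next(_rr)

assignments = [pick_round_robin() for _ in range(6)]
print(assignments)  # each replica receives exactly 2 of the 6 requests
```

Randomization needs no coordination at all; round robin gives a perfectly even spread per client, but only if requests have similar duration, as the slide notes.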
Add’l Topics for Randomization & RR
Algorithms should take into account: Differential capacity of replicas Differential capacity of networks Ownership of resources Security issues
Dynamic Load Balancing
Can track in one or more places: Actual performance by replica Metrics of replica loading Results of probes
That information can be used to determine the best replica Complex Advantages
Can provide excellent results in situations where randomized or round-robin load balancing does not
Can be customized to provide priority, etc.
A Strawman LBA
Assumptions below… Clients 1..n, Datagatherer, & Replicas A & B DataGatherer
Probes replicas every 60 seconds, (Time = 0, 60, …) Chooses least loaded replica & reports it for 60 secs
Clients Issue requests to replicas based upon consulting the DataGatherer Service time for requests is ~10 secs w/ low variance
What’s the Result?
A metastable system: all load oscillates between Replica A and Replica B
Problem: reported load not tracking actual load Solutions
More frequent probes: probes should happen more frequently than 1/average(service time)
LBA should be less definitive in nature; e.g., somewhat stochastic In any case, designing good load balancing algorithms is hard without
knowing lots of information about the load
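The oscillation can be reproduced with a toy discrete-time simulation. The one-request-per-second aggregate arrival rate and the simple queue-drain model are simplifying assumptions, not from the slides; only the 10 s service time and 60 s probe interval are:

```python
# Discrete-time sketch of the strawman LBA: one request per second from the
# client population, ~10 s service time, replica loads probed every 60 s.
SERVICE_TIME = 10
PROBE_INTERVAL = 60

load = {"A": 0.0, "B": 0.0}   # queued work, in seconds, per replica
target = "A"
choices = []

for t in range(240):
    if t % PROBE_INTERVAL == 0:        # probe: report the least-loaded replica
        target = min(load, key=load.get)
    load[target] += SERVICE_TIME       # all traffic goes to the reported replica
    for name in load:                  # one second of service drains each queue
        load[name] = max(0.0, load[name] - 1.0)
    choices.append(target)

reported = [choices[i] for i in range(0, 240, PROBE_INTERVAL)]
print(reported)  # ['A', 'B', 'A', 'B']
```

The reported "best" replica flips at every probe, exactly because the 60 s report lags the load it creates: whichever replica is named immediately becomes the more loaded one.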
Locality-based
Premise is that a replica that has serviced a certain type of request recently should do so again
Why? Efficiency due to already available resources
E.g., open files or databases Efficiency due to security
E.g., secure communication sessions Complexity: how to combine with other techniques, as locality alone may not be enough
Content-based
As in data partitioning, assume certain types of data can best be handled by certain sites Site A stores “aa…az” in random access memory Site B does the same for “ba..bz” Therefore, “a” requests should generally go to Site A.
This is actually an approach for achieving locality
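A sketch of this partitioning, with hypothetical site names; routing is a pure function of the request's content (here, the first letter of the key):

```python
# Content-based LBA: each site holds one key-range partition in memory.
SITES = {"a": "site-A", "b": "site-B"}  # hypothetical site names

def route(key):
    """Send the request to the site that already holds the matching data."""
    return SITES[key[0].lower()]

print(route("apple"), route("banana"))  # site-A site-B
```

Because the same key always routes to the same site, the site's memory-resident partition acts as a permanently primed cache, which is why the slide calls this an approach for achieving locality.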
Geography or Topology-based
Based on co-location of client and replica May be an indicator of
Higher bandwidth Shorter latency Increased reliability Better security
Domain names are now registered with geographical coordinates
Negotiation-based
Virtual capitalism in action: Issue an RFP Evaluate the proposals Ship work as appropriate
Cost of load-balancing overhead must be less than benefit This approach can get very interesting quickly:
Contractual commitments and compensation if unmet A way to do Pareto optimal scheduling
Useful to implement for real load balancing in business-to-business e-commerce
Role of Caching
Cache results of LBA for performance and availability The usual problem of cache correctness
How long until cache refresh Time-outs too short -> load balancing algorithm places too much load Time-outs too long -> data is insufficiently fresh
What happens when cache sends you to a failed site If faulty cached-data, go back and refetch This leads to the definition of a Hint
A cached entry which is right with high probability, but can be and always is checked for validity prior to use
The issue of time-out appears again
Example: Load Balancing to HTTP Server
User specifies http://www.xxx.com Request should actually be handled by one of many HTTP servers to
provide higher throughput One approach: do request re-direction (a type of forwarder)
See the HTTP protocol definition as in the assigned reading The forwarder is a potential bottleneck
Approach 1 – Round Robin DNS
DNS entries allow 32 server addresses per record. DNS (name) servers will cycle through the entries, thereby providing
round-robin load balancing Advantages
Cheap Easy
Round Robin DNS - Problems
Addresses of unavailable servers will remain until an administrator removes the entries
It takes hours or days for the DNS database to replicate So, the system hands out addresses of down servers for a long time Addresses of recently added servers take a while to become visible
All servers treated equally New servers will likely be faster than the old ones and
could handle more load Some servers may handle multiple workloads and should get fewer
requests
Cisco Local and Distributed Director
See:http://www.cisco.com/warp/public/cc/pd/cxsr/400/tech/scale_wp.htm
Session redirection accomplished by rewriting IP header using a mapping table
Intelligent load balancing to servers within a cluster Takes into account status of servers Uses only a single DNS entry for entire server complex
Simplifies administration Hot standby feasible
Fancier load balancing of this type Routes requests based on topological distance Routing decisions can be based on hop counts, network usage, &
round-trip latency.
IBM Secureway Network Dispatcher
http://www-4.ibm.com/software/network/dispatcher/about/features/keyfeatures.html Network dispatcher
Doesn't modify packets (vs. LocalDirector which does) Only inspects inbound requests (LocalDirector looks at both)
So, responses go back directly to the requester (greater efficiency) Background processes check servers to ensure that they are up
"advisors" support HTTP, SSL, FTP, NNTP, POP3, SMTP, Telnet This way requests don't go to down servers.
Balances load across servers of different sizes: Servers send CPU, Disk, I/O metrics to dispatcher
Supports hot standby for high availability of dispatcher Uses a "sticky" port option to route client requests to same server to
ensure state preserved across requests: recall locality topic
Failure Detection Module A-9
Failure Detection
Explicit – a clear indication that failure has occurred Timely Semantics clean, … as far as they go Voting
Implicit – timeout Requester does not receive a response after waiting a while Unclean: does not necessarily mean the remote system failed
Timeout often used in very many places/levels Communication Naming, … And, ultimately, End-to-end
Some have argued that only end-to-end timeouts are valuable, but this is incorrect
Timeout In More Depth
Problems with timeouts Semantics Specification of timeout length
Particularly difficult when requests take variable amounts of time And the requester cannot dynamically set the time-out interval Long intervals lead to poor customer satisfaction – imagine an
ATM that made you wait 10 minutes before failing and giving you your card back
Therefore, timeouts are used at multiple system levels Lower levels have more predictable performance so can trigger
timely failures better Higher levels are required for ultimate correctness
The Role of the Sequence Number
Sequence number in communication protocol Failure detection Duplicate detection Flow control
Sequence number in replication algorithms As discussed previously
Sequence number in site crash detection Sites increment a number after each failure Therefore it is possible to tell if a site has crashed This is important to avoid missing work that was supposed to be done on a site
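A sketch of crash detection via such an incarnation (sequence) number; in a real system the counter would be persisted across restarts, which this toy omits:

```python
class Site:
    """A site that increments an incarnation number each time it restarts,
    so peers can tell that a crash (and loss of volatile state) occurred."""
    def __init__(self):
        self.incarnation = 0

    def restart(self):
        self.incarnation += 1   # persisted across crashes in a real system

    def probe(self):
        return self.incarnation

site = Site()
seen = site.probe()             # client remembers the incarnation it talked to
site.restart()                  # site crashes and comes back
crashed = site.probe() != seen  # mismatch -> in-flight work may have been lost
print(crashed)  # True
```

The client cannot distinguish a slow site from a crashed-and-restarted one by liveness alone; comparing incarnation numbers makes the crash explicit, so submitted work is not silently assumed done.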
Voting
Discussed wrt: Weighted Voting Algorithm Used to determine most up-to-date copies
What if voting is used to detect incorrect data? N-way computation
Structure: N inputs: vote on them and determine the most typical input N computations on the most typical input Vote on the result N outputs which go into the next stage of computation Or go to some device which itself votes
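The per-stage vote can be sketched as a strict-majority voter over the n module outputs (the threshold rule here is one simple choice; real voters may also compare approximately):

```python
from collections import Counter

def vote(outputs):
    """N-way voting: run the computation on n modules and accept the
    majority result, masking a minority of faulty outputs."""
    value, count = Counter(outputs).most_common(1)[0]
    if count <= len(outputs) // 2:
        raise RuntimeError("no majority: too many faulty modules")
    return value

print(vote([42, 42, 7]))  # the faulty module's 7 is outvoted -> 42
```

With n = 3 modules this masks any single faulty output; if no value reaches a majority, the voter reports failure rather than guess.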
Yahoo Denial of Service Attack
Mostly unavailable 10:20AM – 12:00PM PST 2/7/00 Reported cause (NYT, 2/8/00)
50 computers “flooded” Yahoo site 1 gigabyte/second or 20 mbytes/computer/second “Clogging” Yahoo’s site and routers Difficult to trace due to use of hijacked computers
Solutions Audit, Filter, Legal System
Typical Yahoo availability: 99.3%, according to Keynote Systems Corresponds to being down 61 hours/year And, Yahoo is a good site
Achieving Availability with MalleabilityModule A-10
The Goal: Malleability
How do you change the system without taking it down? The application The operating system Perhaps, even a change to the hardware
This has proven very hard
An Approach
Ensure a service is replicated Stop a copy Augment its interfaces Restart it And repetitively do the same to the other copies Eventually, all replicas will have the new capabilities Note: it is very hard to reduce the scope of interfaces; augmentation is
much easier.
An Example
Assume you want to modify the function of a replicated directory while it is online
Assume there are: Multiple instances of the replicated directory itself, called
CtrReplicaGrps Multiple instances of the individual replicas, called CtrReplicas As in the weighted voting algorithm discussed earlier
Technique (1)
[Part 1 to be discussed at the end] Part 2: one by one,
Stop a CtrReplica (hope a failure doesn’t occur simultaneously) Start a new version Do for all CtrReplicas The CtrReplicaGrps should not mind this gradual change. (They
don’t use the new methods… yet…) Also, they can tolerate the failure of a CtrReplica
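Part 2 can be sketched as a loop; `CtrReplica` here is a toy stand-in for a real replica process, and handling of a simultaneous failure is omitted:

```python
class CtrReplica:
    """Toy replica: tracks only whether it is running and which version it runs."""
    def __init__(self, name):
        self.name, self.running, self.version = name, True, 1
    def stop(self):
        self.running = False
    def start(self):
        self.running = True

def rolling_upgrade(replicas, new_version):
    """One by one: stop a replica, install the new version, restart it.
    At most one replica is down at any moment, which a quorum-based group
    (e.g. weighted voting) already tolerates."""
    for rep in replicas:
        rep.stop()
        rep.version = new_version   # augment the interface / install new code
        rep.start()

group = [CtrReplica(n) for n in "ABC"]
rolling_upgrade(group, 2)
print(all(r.running and r.version == 2 for r in group))  # True
```

The group stays available throughout because each stop looks to the rest of the group like an ordinary, tolerated replica failure; the same loop is then repeated over the CtrReplicaGrps in Part 3.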
Technique (2)
Part 3: Now, one by one: Stop a CtrReplicaGrp Start the new version Do for all CtrReplicaGrps
Now, there is a new function available. Finally, do Part 1: test what we have so carefully installed, so we haven’t
just (methodically) inserted a bug into the entire, supposedly fault-tolerant, system
Issues
Issues: Too many steps for a human being to get right
So, need automation via console May not handle a simultaneous failure during upgrading:
So, more replicas may be needed Cost of availability: The shape of this curve is right, though the calibration is
unknown and undoubtedly flattens as experience grows
Window of Vulnerability
If transactions used, there is a potential availability problem during the “Window of Vulnerability”
The only solution is that transaction coordinators must be rather reliable and be guaranteed to recover quickly after a crash
Availability
So, considerable thought is required to achieve high availability in malleable systems
Better when it is not needed However, when high availability is required
Every level of the system needs to be studied and addressed
The Architecture As We’re Studying It
[Architecture diagram]
Client → Servlet/JSP → EJB → DB, MS
Tooling: Integrated Dev't Environment; Modeling and Other Software Eng. Tools; Reusable Components
Runtime: Java Runtime Environment; Security/Directory (X509, LDAP, Kerberos)
Operating systems: Linux, NT, AIX, Solaris, Sys/390
Cross-cutting: Systems Mgmt; Reliable Messaging; Workflow Management