DYNAMIC LOAD BALANCING IN
WEBSERVERS & PARALLEL COMPUTERS
By
Vidhya Balasubramanian
• Dynamic Load Balancing on Highly Parallel Computers: dynamic balancing schemes that seek to minimize the total execution time of a single application running in parallel on a multiprocessor system
1. Sender Initiated Diffusion (SID)
2. Receiver Initiated Diffusion (RID)
3. Hierarchical Balancing Method (HBM)
4. Gradient Model (GM)
5. Dimension Exchange Method (DEM)
• Dynamic Load Balancing on Web Servers: dynamic load balancing techniques in distributed web-server architectures, which schedule client requests among multiple nodes in a transparent way
1. Client-based approach
2. DNS-based approach
3. Dispatcher-based approach
4. Server-based approach
Load balancing on Highly Parallel computers
• load balancing is needed to solve non-uniform problems on multiprocessor systems
• the goal of load balancing is to minimize the total execution time of a single application running in parallel on a multicomputer system
• the general model for dynamic load balancing includes four phases
* process load evaluation
* load balancing profitability determination
* task migration strategy
* task selection strategy
• the 1st and 4th phases are application dependent and hence can be done independently
• load balancing overhead includes:
- communication costs of acquiring load information
- informing processors of load migration decisions
- processing costs of evaluating load information to determine task transfers
Issues in DLB Strategies
1. Sender or receiver initiation of balancing
2. Size and type of balancing domains
3. Degree of knowledge used in the decision process
4. Overhead, distribution and complexity
General DLB Model
• Assumption: each task is estimated to require equal computation time
• process load evaluation: a count of the number of tasks pending execution
• task selection is simple: no distinction is made between tasks
• inaccurate estimates of task requirements lead to unbalanced load distributions
• imbalance is detected in phase 2, and an appropriate migration strategy is devised in phase 3
• centralized vs. distributed approach:
* centralized: more accurate, high degree of knowledge, but requires synchronization, which incurs overhead and delay
* distributed: less accurate, less overhead
Load Balancing Terminology
Load Imbalance Factor $\phi(t)$: a measure of the potential speedup obtainable through load balancing at time t. It is defined as the difference between the maximum processor loads before and after load balancing, $L_{max}$ and $L_{bal}$ respectively:
$\phi(t) = L_{max} - L_{bal}$
Profitability: load balancing is profitable if the savings are greater than the load balancing overhead $L_{overhead}$, i.e.,
$\phi(t) > L_{overhead}$
Simplifying assumption: once a processor's load drops below a preset threshold, $K_{overhead}$, any balancing will improve system performance
Balancing Domains: the system is partitioned into individual groups of processors. Larger domains allow more accurate migration strategies; smaller domains reduce complexity
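A minimal sketch of the profitability test, assuming equal-sized, indivisible tasks so that the best achievable maximum load after balancing is the ceiling of the average load. The function names and the sample loads are illustrative and not part of the original slides.

```python
def load_imbalance_factor(loads):
    """Return phi(t) = L_max - L_bal for a list of per-processor task counts."""
    l_max = max(loads)
    # Assumption: tasks are indivisible, so the balanced maximum load
    # is the average load rounded up.
    l_bal = -(-sum(loads) // len(loads))
    return l_max - l_bal


def is_profitable(loads, l_overhead):
    """Balancing pays off only if the potential saving exceeds the overhead."""
    return load_imbalance_factor(loads) > l_overhead


loads = [12, 3, 5, 4]                   # hypothetical task counts
print(load_imbalance_factor(loads))     # 12 - 6 = 6
print(is_profitable(loads, l_overhead=2))
```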
Gradient Model
• Underloaded processors inform the other processors in the system of their state, and overloaded processors respond by sending a portion of their load to the nearest lightly loaded processor
• Threshold parameters: Low-Water-Mark (LWM) and High-Water-Mark (HWM)
• A processor's state is light if its load is less than LWM, and high if it is greater than HWM
• Proximity of a processor: the shortest distance from itself to the nearest lightly loaded node in the system
• $w_{max}$: the initial proximity, equal to the diameter of the system
• The proximity of a processor is 0 if its state becomes light
• Proximity of p with neighbors $n_i$ is computed as:
$proximity(p) = \min_i(proximity(n_i)) + 1$
• Load balancing is profitable if:
$L_p - L_q > HWM - LWM$
• Complexity:
1. May perform inefficiently when too much or too little work is sent to an underloaded processor
2. In the worst case an update would require N log N messages (dependent on network topology)
3. Since the ultimate destination of migrating tasks is not explicitly known, intermediate processors must be interrupted to do the migration
4. The proximity map might change during a task's migration, altering its destination
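[Figure: proximity map of a processor network in the Gradient Model. Each node is labeled with its proximity value; legend: Overloaded, Moderately Overloaded, Underloaded]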
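The proximity rule of the Gradient Model can be sketched as a breadth-first search that starts from every lightly loaded processor at once. This is a centralized illustration only; in the actual scheme each processor updates its own value from its neighbors' messages. The topology, loads and names below are hypothetical.

```python
from collections import deque


def proximity_map(neighbors, loads, lwm, w_max):
    """Compute proximity(p) = min_i(proximity(n_i)) + 1, capped at w_max.

    neighbors: dict mapping a processor to the list of its neighbors
    loads:     dict mapping a processor to its current load
    """
    prox = {p: w_max for p in neighbors}   # initial proximity is the diameter
    queue = deque()
    for p, load in loads.items():
        if load < lwm:                     # light processors have proximity 0
            prox[p] = 0
            queue.append(p)
    while queue:
        p = queue.popleft()
        for q in neighbors[p]:
            if prox[p] + 1 < prox[q]:
                prox[q] = prox[p] + 1
                queue.append(q)
    return prox


# Hypothetical linear array of 5 processors; only processor 3 is light.
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
loads = {0: 9, 1: 7, 2: 5, 3: 1, 4: 8}
print(proximity_map(neighbors, loads, lwm=2, w_max=4))
```

An overloaded processor would then forward work to the neighbor with the smallest proximity value, which is why intermediate processors take part in every migration.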
Sender Initiated Diffusion
• Local, near-neighbor diffusion approach that employs overlapping balancing domains to achieve global balancing
• Balancing is performed when a processor receives a load update message from a neighbor indicating that the neighbor's load $l_i < L_{low}$, where $L_{low}$ is a preset threshold
• Average load in the domain of processor p with K neighbors:
$\bar{L}_p = \frac{1}{K+1} \left( l_p + \sum_{k=1}^{K} l_k \right)$
• Profitability: balancing is profitable if
$l_p - \bar{L}_p > L_{threshold}$
• Each neighbor k is assigned a weight $h_k$ depending on its load
• The weights $h_k$ are summed to find the local deficiency $H_p$
• The portion of processor p's excess load that is apportioned to neighbor k is
$\delta_k = (l_p - \bar{L}_p) \, h_k / H_p$
• Complexity:
1. Number of messages for update = KN
2. Overhead incurred by each processor = K messages
3. Communication overhead for migration = (N/2) K transfers
[Figure: SID example of a processor and its neighbors' loads. Average load L = 10, domain deficiency H = 20, surplus load S = 21]
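The SID apportioning step can be sketched as follows. The slides do not spell out how the weight of a neighbor is derived from its load; this sketch assumes a neighbor's weight is its shortfall below the domain average (zero if it is at or above the average). The function name and the sample loads are illustrative.

```python
def sid_apportion(l_p, neighbor_loads, l_threshold):
    """Return the load processor p sends to each of its K neighbors."""
    k = len(neighbor_loads)
    avg = (l_p + sum(neighbor_loads)) / (k + 1)      # domain average
    surplus = l_p - avg
    if surplus <= l_threshold:                        # not profitable
        return [0.0] * k
    # Assumed weight: how far each neighbor sits below the domain average.
    weights = [max(avg - l_k, 0.0) for l_k in neighbor_loads]
    deficiency = sum(weights)                         # local deficiency H_p
    if deficiency == 0:
        return [0.0] * k
    return [surplus * h / deficiency for h in weights]


# Hypothetical domain: p holds 30 tasks, its four neighbors 2, 6, 10 and 12.
print(sid_apportion(30, [2, 6, 10, 12], l_threshold=5))
```

With these numbers the domain average is 12, the surplus is 18, and the shares are 10, 6, 2 and 0, so the most lightly loaded neighbor receives the largest portion.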
Receiver Initiated Diffusion
• Underloaded processors request load from overloaded processors
• Initiated by any processor whose load drops below a prespecified threshold $L_{low}$
• A processor will fulfill a request only up to half of its current load
• Underloaded processors take on the majority of the load balancing overhead
• $\delta_k = (l_p - \bar{L}_p) \, h_k / H_p$, the same as in SID, except that it is the amount of load requested
• Balancing is activated when the load drops below the threshold and there are no outstanding requests
• Complexity:
1. Number of messages for update = KN
2. Communication overhead for task migration = NK messages + (N/2) K transfers (the extra messages are due to requests)
• As in SID, the number of iterations needed to achieve global balancing depends on the topology and the application
Hierarchical Balancing Method
• Processors in charge of the balancing process at level $l_i$ receive load information from both lower-level $l_{i-1}$ domains
• The size of the balancing domains doubles from one level to the next
• Subtree load information is computed at intermediate nodes and propagated to the root
• The absolute value of the difference between the loads of the left domain $L_L$ and the right domain $L_R$ is compared to $L_{threshold}$:
$| L_L - L_R | > L_{threshold}$
• Processors within the overloaded subtree send a designated amount of load to their matching neighbor in the corresponding subtree
• Complexity:
1. Load transfer request messages = N/2
2. Total messages required = N(log N + 1)
3. Average cost per processor = log N + 1 sends and receives
4. Cost at leaves = 1 send + log N receives
5. Cost at root = log N receives + N-1 sends + log N receives
Dimension Exchange Method
• Small domains are balanced first, then the entire system is balanced
• Synchronized approach
• In an N-processor hypercube, balancing is performed iteratively in each of the log N dimensions
• Balancing is initiated by a processor whose load drops below a threshold
• Complexity:
1. Total communication overhead = 3N log N messages
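One balancing sweep of the dimension exchange idea can be sketched as below: in each of the log N dimensions, every processor pairs with the neighbor whose index differs in that bit, and the pair equalizes its load. This sketch assumes load is arbitrarily divisible and simulates all processors in one loop; the function name and sample loads are illustrative.

```python
def dimension_exchange(loads):
    """Balance loads on a hypercube whose processors are indexed 0..N-1."""
    n = len(loads)
    if n == 0 or n & (n - 1):
        raise ValueError("a hypercube needs a power-of-two processor count")
    loads = list(loads)
    dim = 0
    while (1 << dim) < n:
        for p in range(n):
            q = p ^ (1 << dim)          # neighbor across dimension `dim`
            if p < q:                   # handle each pair once
                # Assumption: load is divisible, so the pair can split evenly.
                loads[p] = loads[q] = (loads[p] + loads[q]) / 2
        dim += 1
    return loads


# Hypothetical 8-processor hypercube (3 dimensions).
print(dimension_exchange([8, 0, 4, 4, 2, 6, 0, 0]))
```

After the third dimension every processor holds the global average of 3, which illustrates why small domains (pairs) are balanced first and the whole system last.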
Summary of Comparison Analysis
Category | GM | SID | DEM | HBM | RID
Initiation | Receiver | Sender | Designated | Designated | Receiver
Balancing Domain | Variable | Overlapped | Variable | Variable | Overlapped
Knowledge | Global | Local | Global | Global | Local
Aging Period | O(diameter(N)) | F(u,K) | Constant | F(u,N) | F(u,K)
Overhead Distribution | Uniform | Uniform | Uniform | Non-uniform | Uniform
u = load update factor: if u = ½, a processor must send update messages whenever its load has doubled or halved since the last update
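The load update factor u can be read as a simple trigger for update messages. The sketch below reproduces the u = ½ case from the slides; extending the same rule to other values of u (update when the load has grown by a factor of 1/u or shrunk by a factor of u) is an assumption made here for illustration, as is the function name.

```python
def needs_update(current_load, last_reported_load, u=0.5):
    """Decide whether a processor should broadcast a new load value."""
    if last_reported_load == 0:
        # Any work arriving at a processor that reported "empty" is news.
        return current_load > 0
    grown = current_load >= last_reported_load / u    # doubled when u = 1/2
    shrunk = current_load <= last_reported_load * u   # halved when u = 1/2
    return grown or shrunk


print(needs_update(20, 10))   # load doubled
print(needs_update(15, 10))   # change too small to report
```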
Performance Analysis Graphs
Speedup Vs Number of Processors
Dynamic Load Balancing on Web Servers
• load balancing is required to route requests among distributed web server nodes in a transparent way
• this helps improve throughput and provides high scalability and availability
• user: one who accesses the information
• client: a program, typically a web browser
• the client obtains the IP address of a web server node through an address mapping request to the DNS server
• there are intermediate name servers, local gateways and browsers that can cache the address mapping for some time
Requirements of the web server:
• transparency
• scalability
• load balancing
• availability
• applicability to existing Web standards (backward compatibility)
• geographic scalability (i.e., solutions applicable to both LAN and WAN distributed systems)
Client-Based Approach
• In this approach it is the client side itself that routes the request to one of the servers in the cluster. This can be done by the Web browser or by the client-side proxy server.
1. Web Clients
• assume web clients know of the existence of the replicated servers of the web server system
• based on protocol centered description
• the web client selects a node of the cluster, resolves the address and submits requests to the selected node
• Examples:
1. Netscape
* picks random server i
* not scalable
2. Smart Clients
* a Java applet monitors node states and network delays
* scalable, but generates large network traffic
2. Client Side Proxies
• combined caching and server replication
• Web Location and Information service can keep track of replicated URL addresses and route client requests appropriately
Advantages and Disadvantages:
-Scalable and high availability
-Limited applicability
-Lack of portability on the client side
DNS-Based Approach
• cluster DNS: routes requests to the corresponding server
• transparency at URL level
• through the translation process from the symbolic name to an IP address, it can select any node of the cluster
• with each address mapping, the DNS also specifies a validity period known as Time-to-Live (TTL)
• after expiration of the TTL, the address mapping request is forwarded to the cluster DNS
• factors limiting DNS control
* TTL does not work on browser caching
* non-cooperative intermediate name servers
* can become a potential bottleneck
• Two classes of DNS-based algorithms
* Constant TTL algorithms
* Adaptive TTL algorithms
[Figure: A DNS-based Web server cluster]
Constant TTL Algorithms
• classified based on the system state information used, with a constant TTL value
System Stateless Algorithms:
- Round Robin DNS by NCSA
- load distribution not very balanced; server nodes can become overloaded
- ignores server capacity and availability
Server State Based Algorithms:
- simple feedback alarm mechanism
- selects the server with the lightest load
- limited applicability
Client State Based Algorithms:
- typical load that can come from each connected domain
- Hidden Load: a measure of the average number of data requests sent from each domain to a Web site during the TTL caching period
- geographical location of the client
- Cisco DistributedDirector: takes into account relative client-to-server topological proximity and client-to-server link latency
- Internet2 Distributed Storage Infrastructure: uses round-trip delays
Server and Client State Based Algorithms:
- DistributedDirector DNS: uses both server availability and client proximity
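Two of the constant-TTL policies above can be sketched in a few lines: a stateless round-robin resolver and a server-state-based resolver that skips servers which have raised an alarm. The class and function names, the TTL value and the addresses (taken from a documentation-only range) are illustrative and not part of any real DNS implementation.

```python
import itertools


class RoundRobinDNS:
    """Stateless policy: hand out server addresses in strict rotation."""

    def __init__(self, addresses, ttl=240):
        self._cycle = itertools.cycle(addresses)
        self.ttl = ttl                      # the same TTL for every answer

    def resolve(self):
        return next(self._cycle), self.ttl


def resolve_lightest(loads, alarmed=frozenset()):
    """Server-state policy: pick the least loaded server without an alarm."""
    candidates = {ip: load for ip, load in loads.items() if ip not in alarmed}
    if not candidates:                      # every server alarmed: use them all
        candidates = loads
    return min(candidates, key=candidates.get)


dns = RoundRobinDNS(["192.0.2.1", "192.0.2.2", "192.0.2.3"])
print([dns.resolve()[0] for _ in range(4)])
print(resolve_lightest({"192.0.2.1": 0.7, "192.0.2.2": 0.4, "192.0.2.3": 0.1},
                       alarmed={"192.0.2.3"}))
```

The round-robin resolver never looks at server load, which is why it can leave some nodes overloaded; the second function trades that simplicity for a dependency on load reports.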
Adaptive TTL Algorithm
- uses dynamic information from servers and/or clients to assign different TTL values
- Two-step process
* DNS selects a server node, similar to the hidden load weight algorithms
* DNS chooses an appropriate value for the TTL period
- TTL values are inversely proportional to the domain request rate
- popular domains have shorter TTL intervals
- scalable from LAN to WAN distributed Web server systems
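The second step of the adaptive scheme, choosing a TTL that is inversely proportional to the domain request rate, can be sketched as below. Scaling so that the least active domain receives a chosen base TTL is an assumption made for this example; the domain names, rates and the function name are hypothetical.

```python
def adaptive_ttls(request_rates, base_ttl=240.0):
    """Map each client domain to a TTL inversely proportional to its rate.

    request_rates: dict of domain -> requests per second (must be positive)
    base_ttl:      TTL in seconds given to the least active domain (assumed)
    """
    if any(rate <= 0 for rate in request_rates.values()):
        raise ValueError("request rates must be positive")
    slowest = min(request_rates.values())
    return {domain: base_ttl * slowest / rate
            for domain, rate in request_rates.items()}


rates = {"small.example": 10, "medium.example": 50, "busy.example": 100}
print(adaptive_ttls(rates))
```

Here the busiest domain gets a TTL of 24 seconds against 240 for the quietest one, so a mapping that attracts a lot of hidden load is revisited by the cluster DNS sooner.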
Dispatcher Based Approach
• provides full control over client requests and masks the request routing among multiple servers
• the cluster has only one virtual IP address, the IP address of the dispatcher
• the dispatcher identifies the servers through unique private IP addresses
• Classes of routing
1. Packet single-rewriting by the dispatcher
2. Packet double-rewriting by the dispatcher
3. Packet forwarding by the dispatcher
4. HTTP redirection
Packet Single Rewriting
- dispatcher reroutes client-to-server packets by rewriting their IP address
- requires modification of the kernel code of the servers, since IP address substitution occurs at the TCP/IP level
- provides high system availability
Packet Double Rewriting
- modification of all IP addresses, including those in the response packets, is carried out by the dispatcher
- two architectures are based on this:
* Magicrouter (fast packet interposing, where a user-level process acting as a switchboard intercepts client-to-server and server-to-client packets and modifies them)
* LocalDirector (modifies the IP address of client-server packets according to a dynamic mapping table)
Packet Forwarding
• forwards client packets to servers instead of rewriting IP addresses
* Network Dispatcher
- uses MAC addresses
- dispatcher and servers share the same IP-SVA address
- for WAN, a two-level dispatcher (first level uses packet rewriting)
- transparent to both the client and the server
* ONE-IP address
- publicizes the same secondary IP address of all Web-server nodes as the IP-SVA of the Web-server cluster
- routing based dispatching: destination server selected based on a hash function
- broadcast based dispatching: router broadcasts the packets to every server in the cluster
- using a hash function restricts dynamic load balancing
- does not account for server heterogeneity
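Routing-based dispatching with a hash function can be sketched as follows. The slides do not say which hash is used; taking the client address modulo the number of servers is an assumption for illustration, and the server names and function name are hypothetical.

```python
import ipaddress


def select_server(client_ip, servers):
    """Pick a server from the client's IP address alone."""
    # Assumed hash: the numeric value of the address modulo the server count.
    key = int(ipaddress.ip_address(client_ip))
    return servers[key % len(servers)]


servers = ["node-a", "node-b", "node-c"]
for ip in ("198.51.100.7", "198.51.100.8", "198.51.100.7"):
    print(ip, "->", select_server(ip, servers))
```

The same client always lands on the same node regardless of how busy that node is, and every node is treated as equally capable, which is the limitation noted in the last two bullets above.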
HTTP Redirection
• Distributes requests among web servers through the HTTP redirection mechanism
• redirection is transparent to the user
• Server state based dispatching: each server periodically reports both the number of processes in its run queue and the number of received requests per second
• Location based dispatching
• can be applied to both LAN and WAN distributed Web server systems
• doubles the number of necessary TCP connections
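Server state based dispatching through HTTP redirection can be sketched as a function that picks a target from the servers' periodic reports and answers with a 302 response. How the two reported values are combined is not stated in the slides; comparing run-queue length first and request rate second is an assumption, and the host names and function name are hypothetical.

```python
def redirect_response(path, reports):
    """Build an HTTP 302 response pointing at the least busy server.

    reports: dict of host -> (run_queue_length, requests_per_second)
    """
    # Assumed ordering: tuples compare run-queue length first, then rate.
    target = min(reports, key=lambda host: reports[host])
    return (
        "HTTP/1.1 302 Found\r\n"
        f"Location: http://{target}{path}\r\n"
        "Content-Length: 0\r\n"
        "\r\n"
    )


reports = {
    "www1.example.com": (12, 310.0),
    "www2.example.com": (3, 120.0),
    "www3.example.com": (3, 450.0),
}
print(redirect_response("/index.html", reports))
```

The client has to open a second connection to the host named in the `Location` header, which is where the extra TCP connection per request comes from.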
Server Based Approach
- uses a two-level dispatching mechanism
- cluster DNS assigns requests to a server
- a server may redirect a request to another server in the cluster
- allows all servers to participate in load balancing (distributed)
- Redirection is done in two ways
* HTTP redirection
* Packet redirection by packet rewriting
HTTP Redirection by the Server
Packet Redirection
- transparent to the client
- Two balancing algorithms
* use RR-DNS to schedule requests (static routing)
* periodic communication among servers about their current load
Main Pros and Cons
Approach | Scheduling | Pros | Cons
Client-Based | Client-side, distributed | No server overhead; LAN & WAN solution | Limited applicability; medium-coarse-grained balancing
DNS-Based | Cluster-side, centralized | No bottleneck; LAN & WAN solution | Partial control; coarse-grained balancing
Dispatcher-Based | Cluster-side, centralized | Fine-grained balancing; full control | Dispatcher bottleneck; LAN solution; packet rewriting overhead
Server-Based | Cluster-side, distributed | Distributed control; fine-grained balancing; LAN & WAN solution | Latency time increase (HTTP); packet rewriting overhead (DPR)
Performance of various distributed architectures
1. Exponential distribution model
2. Heavy-tailed distribution model
Conclusions
-consider performance constraints due to network bandwidth than server node capacity- account for network load as well as client proximity