DYNAMIC LOAD BALANCING

CHAPTER 1

INTRODUCTION

1.1 Overview

1.1.1 Distributed System

The computing power of any distributed system can be realized by allowing its constituent

computational elements (CEs), or nodes, to work cooperatively so that large loads are allocated

among them in a fair and effective manner. Any strategy for load distribution among CEs is called

load balancing (LB). An effective LB policy ensures optimal use of the distributed resources

whereby no CE remains in an idle state while any other CE is being utilized. In many of today’s

distributed-computing environments, the CEs are linked by a delay-limited and bandwidth-limited

communication medium that inherently inflicts tangible delays on internode communications and

load exchange. Examples include distributed systems over wireless local-area networks (WLANs) as

well as clusters of geographically distant CEs connected over the Internet, such as PlanetLab.

Although the majority of LB policies developed heretofore take account of such time delays, they

are predicated on the assumption that delays are deterministic.

In actuality, delays are random in such communication media, especially in the case of

WLANs. This is attributable to uncertainties associated with the amount of traffic, congestion,

and other unpredictable factors within the network. Furthermore, unknown characteristics (e.g.,

type of application and load size) of the incoming loads cause the CEs to exhibit fluctuations in

runtime processing speeds. Earlier work by our group has shown that LB policies that do not

account for the delay randomness may perform poorly in practical distributed computing settings

where random delays are present. For example, if nodes have outdated, inaccurate information about

the state of other nodes, due to random communication delays between nodes, then this could

result in unnecessary periodic exchange of loads among them. Consequently, certain nodes may

become idle while loads are in transit, a condition that would result in prolonging the total

completion time of a load. Generally, the performance of LB in delay-infested environments

depends upon the selection of balancing instants as well as the level of load-exchange allowed


between nodes. For example, if the network delay is negligible within the context of a certain

application, the best performance is achieved by allowing every node to send its entire excess

load (e.g., relative to the average load per node in the system) to less-occupied nodes. On the

other hand, in the extreme case for which the network delays are excessively large, it would be

more prudent to reduce the amount of load exchange so as to avoid time wasted while loads are

in transit. Clearly, in a practical delay-limited distributed-computing setting, the amount of load

to be exchanged lies between these two extremes and the amount of load-transfer has to be

carefully chosen. A commonly used parameter that serves to control the intensity of load

balancing is the LB gain.
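To make the role of the LB gain concrete, the following Python sketch shows one simple policy in which a gain between 0 and 1 scales the excess load a node sends out. The function name and the excess-load rule are illustrative assumptions, not a policy taken from the literature.

def load_to_transfer(local_load, avg_load, gain):
    """Scale the node's excess load by the LB gain (assumed rule).

    gain = 1.0 sends the full excess (sensible when delays are negligible);
    gain near 0 sends almost nothing (prudent when delays are large).
    """
    excess = max(local_load - avg_load, 0.0)
    return gain * excess

# Example: a node holding 120 load units when the system average is 80.
print(load_to_transfer(120, 80, gain=1.0))   # 40.0 -- full excess
print(load_to_transfer(120, 80, gain=0.25))  # 10.0 -- damped for large delays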

1.2 Objective:

The novelty of this implementation (i.e., its dynamic nature) lies in the fact that it makes the

system able to balance the load not only in terms of active mobile nodes or number of serving

contexts, but also based on the actual traffic characteristics of active mobile terminals.

1.3 Scope of Study:

1.3.1 Load Balancing

Load balancing is defined as the allocation of the work of a single application to

processors at run-time so that the execution time of the application is minimized. Since the

speed at which a NOW-based (network of workstations) parallel application can be completed depends on the

computation time of the slowest workstation, efficient load balancing can clearly provide

major performance benefits. The two major categories for load-balancing algorithms are static

and dynamic.

1.3.2 Static Load Balancing

Static load balancing algorithms allocate the tasks of a parallel program to workstations

based on either the load at the time nodes are allocated to some task, or based on an average

load of the workstation cluster. The advantage of this sort of algorithm is its simplicity in

terms of both implementation as well as overhead, since there is no need to constantly monitor

the workstations for performance statistics. However, static algorithms only work well when

there is not much variation in the load on the workstations. Clearly, static load balancing


algorithms aren’t well suited to a NOW environment, where loads may vary significantly at

various times in the day, based on the issues discussed earlier.

1.3.3 Dynamic Load Balancing

Dynamic load balancing algorithms make changes to the distribution of work among

workstations at run-time; they use current or recent load information when making distribution

decisions. As a result, dynamic load balancing algorithms can provide a significant improvement in

performance over static algorithms. However, this comes at the additional cost of collecting and

maintaining load information, so it is important to keep these overheads within reasonable limits.

The remainder of this report will focus on such dynamic load balancing algorithms.


CHAPTER 2

LITERATURE SURVEY

2.1 Existing System:

Many efforts have already been made to provide a more efficient GGSN selection and

anchor relocation in 3G/UMTS architectures. In this section we give a short overview of the

different approaches, in particular the ones which try to present a practically applicable dynamic

solution for GGSN load balancing and load sharing.

The method called “GPRS-Subscriber selection of multiple internet service providers”

was developed by Ericsson in order to allow mobile users to connect to multiple different Packet

Data Networks (PDNs). The selection of the PDN is based on the transmission of a

specific Network Indication Parameter (NIP). This parameter is sent to the Serving GPRS Support

Node (SGSN) as a parameter in the PDP context activation procedure. The PDP type parameter (a

two bytes long field) is used to set up the connection to the chosen PDN. Based on this method,

mobile operators can distribute traffic load towards gateways of different networks, and can also

assign the GGSN with the lowest load to newly entering MNs, or apply other selection policies.

“Distributed IP-pool in GPRS” or “Dynamically distributed IP-pool in GPRS” is another

development made by Ericsson. The aim of these patents is to distribute IP addresses to the users

efficiently in a multi-GGSN system when the users get the address from the GPRS/UMTS network.

Each GGSN has its own variable-size pool from which it can assign addresses, and a central pool is

given, which holds available addresses. When the pool of a GGSN is nearly empty it can get a group

of addresses from the central pool. On the other hand, when a GGSN has a surplus of addresses, these

addresses can be sent back to the central pool to be used elsewhere. This scheme could help the

3G/UMTS network to balance the load of GGSNs in terms of the number of served MNs.
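As an illustration of the scheme, the following Python sketch models a per-GGSN address pool backed by a shared central pool; the block size and the refill/return thresholds are our own assumptions for the example, not values from the patents.

class CentralPool:
    """Shared pool holding the currently unassigned IP addresses."""
    def __init__(self, addresses):
        self.free = list(addresses)

    def take_block(self, n):
        block, self.free = self.free[:n], self.free[n:]
        return block

    def give_back(self, addresses):
        self.free.extend(addresses)

class GgsnPool:
    """Variable-size local pool of one GGSN (thresholds are illustrative)."""
    LOW, HIGH, BLOCK = 2, 10, 5

    def __init__(self, central):
        self.central = central
        self.free = central.take_block(self.BLOCK)

    def allocate(self):
        if len(self.free) <= self.LOW:                 # nearly empty: refill
            self.free += self.central.take_block(self.BLOCK)
        return self.free.pop()

    def release(self, addr):
        self.free.append(addr)
        if len(self.free) > self.HIGH:                 # surplus: hand back a block
            self.central.give_back([self.free.pop() for _ in range(self.BLOCK)])

central = CentralPool("10.0.0.%d" % i for i in range(1, 21))
ggsn = GgsnPool(central)
ip = ggsn.allocate()    # served from the local pool; refilled centrally when low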

Cisco’s “GGSN GTP Load Balancing” is based on the Cisco Server Load Balancing (SLB)

feature designed to provide IP server load balancing between Cisco devices. GGSN GTP Load

Balancing is a method aiming to provide increased GGSN reliability and availability when applying


multiple Cisco or non-Cisco GGSNs in a GPRS/UMTS network. Using this feature the operator can

define a virtual server that represents a group or cluster of real server implementations (i.e., server

farm). In such an environment potential clients connect to the IP address of the virtual server. When

a client starts a connection to the virtual server, the Cisco SLB feature will choose a real server for

the connection from the server farm. The server selection is based on a configured load-balancing

algorithm implemented in the virtual server.
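The general idea can be illustrated with the minimal Python sketch below, which picks the real server with the fewest active connections; this least-connections metric is only one of several algorithms such a feature can be configured with, and all names here are illustrative.

class VirtualServer:
    """One virtual IP fronting a farm of real servers (illustrative sketch)."""
    def __init__(self, farm):
        self.active = {server: 0 for server in farm}

    def connect(self):
        # Pick the real server with the fewest active connections.
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server   # the client's traffic is then forwarded here

vs = VirtualServer(["ggsn-1", "ggsn-2", "ggsn-3"])
print([vs.connect() for _ in range(6)])   # connections spread across the farm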

The methods introduced above enable operators to distribute load to different GGSN entities and to assign the GGSN currently carrying the lowest load to newly arriving MNs, or to apply any other GGSN selection policy. However, the selected GGSN remains fixed during the whole period between the attachment and detachment of a particular MN. Therefore, in such a

system there is no way to perform dynamic load balancing between GGSN entities if MNs are

currently in service (i.e., comprise active PDP contexts). However, in bursty packet data

communication it is very likely that a load balancer assigns the same load (e.g., the same number of

MNs) to the GGSNs in a GGSN server farm or pool, but the GGSNs still suffer from unequally

balanced load. In current 3GPP standards it is impossible to provide features such as dynamic load

balancing or load adjustment between GGSN nodes.

2.2 Proposed System:

This was the motivation of Shiao-Li Tsao, who presented the basics of a dynamic GGSN load

balancing and load sharing scheme. The author introduces a new device called the GGSN

Controller, which is responsible for monitoring the load of the GGSNs and adjusting the GGSN load

dynamically by transferring PDP contexts between GGSNs. The initiation and the process of the

PDP transfers are monitored and controlled by the centralized GGSN controller. The GGSNs

send their load information periodically to the GGSN Controller, and the Controller decides

whether a PDP context transfer is necessary. Shiao-Li Tsao also defines novel protocol messages

that make the PDP context transfer available, and gives a detailed description of various cases that can

occur during the context transfers, but unfortunately no performance evaluation or any kind of

analysis was provided in this work.

To the best of our knowledge, the only evaluation ever conducted in the area of dynamic

anchor load balancing is presented by Cheng Xue et al., who introduced and simulated the inter-GW


load balancing approach of LTE/EPC networks. Based on various simulation tests they showed that

with load balancing, the performance of the system can be enhanced dramatically.

As introduced above, several schemes have already been proposed aiming to provide

intelligent GGSN selection and anchor load balancing in 3G/UMTS and beyond architectures.

However, only very few of them were evaluated and even the available evaluation results are based

only on simulations. In our work we rely on the insights of this prior work as we refined the basic idea in order to

implement and integrate the solution in a 3G/UMTS testbed for comprehensive analysis and

evaluation of the scheme with extensive measurements. According to our surveys, this is the first

freely available work analyzing dynamic GGSN load balancing and load sharing in a real-life

wireless environment.

2.3 Comparison:

In order to evaluate our implementation and to analyze the performance characteristics of the

concept of dynamic GGSN load balancing in our testbed, we decided to execute measurements with

various numbers of active PDP contexts. We examined the average latency of the packets, which

passed through the GGSNs in order to benchmark the throughput of the anchoring gateway node(s)

in the 3G/UMTS network. The tests we performed were based on artificial UDP streams synthesized

per PDP context by our GTP traffic generator, applying a packet size of 125 bytes in each stream and using packet sending frequencies between 2400 and 4000 packet/s. The maximum number of

transferred contexts was limited to 10. The results are shown in Fig. 1. As can be seen, the average packet latencies were significantly lower when dynamic PDP context transfer was enabled. Our results prove that transferring PDP contexts, thereby achieving dynamic load balancing between GGSNs, can be implemented in real-life environments with substantial gains.

With dynamic load balancing, lower packet latencies at gateway anchor nodes are achievable, thus

the overall throughput of the network can be increased significantly.


Fig.1 Comparison results based on average packet latency measurements
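Our GTP traffic generator itself is not reproduced here; the following Python sketch merely illustrates the kind of fixed-size, fixed-rate UDP stream used per PDP context in these tests. The destination address, the coarse sleep-based pacing and the duration are assumptions for the example.

import socket
import time

def send_stream(dst=("192.0.2.1", 5000), size=125, pps=2400, seconds=1.0):
    """Send pps UDP packets per second of size bytes each (illustrative)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    payload = b"\x00" * size
    interval = 1.0 / pps
    deadline = time.monotonic() + seconds
    while time.monotonic() < deadline:
        sock.sendto(payload, dst)
        time.sleep(interval)   # coarse pacing; a real generator times this finely

send_stream()   # one synthetic stream: 125-byte packets at 2400 packet/s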


CHAPTER 3

ARCHITECTURE AND IMPLEMENTATION

3.1 Test Bed Evaluation of Dynamic GGSN Load Balancing for High Bitrate

3G/UMTS Networks:

Thanks to the appearance of novel, extremely practical smartphones, portable computers with

easy-to-use 3G USB modems and attractive business models, mobile Internet has started to become

a reality. Network operators offer flat-rate mobile data connection packages and although these

packages usually have a traffic limit, such limits are expected to disappear soon. Mobile users tend to replicate the usage patterns of wired broadband Internet subscriptions, with long connection times and massive

multimedia data transmission. The growing traffic load of single users and the growing amount of

mobile subscribers will result in an immense traffic explosion in the packet-switched domain of mobile networks up to the year 2020. The increase in mobile Internet traffic will be higher than that of

fixed Internet traffic in the forthcoming years, most dramatically due to new entrant, data-hungry

mobile entertainment services like mobile TV, video and music, and new application types, such as M2M (machine-to-machine) communications

including e-health services, vehicle communications, remote control and monitoring services. In

order to handle the anticipated traffic demands and maintain profitability at the same time, it is important to eliminate bottlenecks from the network. One of these bottlenecks in today’s 3G/UMTS

architectures is the Gateway GPRS Support Node (GGSN).

In UMTS networks, the GGSN is the gateway towards the Internet. According to the current

3GPP specifications, a Mobile Node (MN) requesting packet switched (PS) services should attach to

the network first. When an attached MN wants to access packet switched services, it needs to

activate a PDP (Packet Data Protocol) context that enables the MN to access the service based on

the information stored in the Home Location Register (HLR). Once the MN has successfully activated the PDP context, it can start the packet delivery/reception procedures. During this activation procedure a suitable GGSN is selected in the 3G/UMTS network in order to serve the MN. The selected GGSN will be responsible for forwarding every user data packet towards the outside


network domains and for routing packets back to the MN based on the GTP tunnels linked to the MN’s

PDP contexts. In this way the GGSN becomes a user plane anchor for its MNs. Considering the

ever-growing traffic demand for packet services in 3G/UMTS and beyond (e.g., LTE/EPC)

systems, it is obvious that serious scalability issues of such anchor nodes will need to be handled very soon. In order to cope with these issues, operators tend to deploy a set of GGSNs.

Unfortunately, a GGSN anchor is permanent during the whole period of a PDP context

activation/deactivation interval, meaning that the selected GGSN node cannot be changed until the

MN deactivates the session of a particular PDP context. To put it the other way around, we can say

that a standard 3GPP 3G/UMTS architecture cannot perform on-the-fly or dynamic load balancing

of GGSNs for MNs which own currently activated contexts. However, packet data services are often

generating traffic which is quite bursty in nature, thus creating the need for a more scalable solution

like dynamic load balancing between anchor points without breaking ongoing MN connections.

The concept of the protocol is based on Shiao-Li Tsao's work. However, we discard the

GGSN Controller unit from the scheme, and our method focuses on the implementation questions

and on the way the contexts are deleted.

Fig 2. The basic scheme of the implemented load balancing protocol


3.2 Dynamic GGSN Load Balancing Scheme:

An important starting assumption during our implementation efforts was that the GGSNs are

connected via a high-bandwidth and reliable internal network, and know each other’s IP addresses.

We also assume that the majority of the traffic handled by the GGSNs is “best-effort” in nature, and therefore does not require special QoS provisioning services and mechanisms. This “best-effort” traffic should tolerate the extra delay and possible packet loss that can occur during PDP

transfer.

Our protocol was designed not to change the size or contents of the IP address pools maintained in individual GGSNs after transferring and deleting PDP contexts. With every PDP context transfer, the

IP address of the context moves to another GGSN only temporarily: right after the deletion of a

formerly transferred PDP context, the IP address of the context is placed back into the pool of the

originating GGSN. With this method the GGSNs can be prevented from running out of dynamic IP

addresses. Considering two GGSNs in the GGSN cluster, the operation of our implemented dynamic

GGSN load balancing scheme is the following.

GGSN-1 continuously monitors and records the bandwidth usage of each PDP context, and marks the contexts with the highest load. GGSN-1 also measures the latency of each passing packet (the resulting transfer protocol is described step by step in Section 3.3). Our implementation builds on the OpenGGSN software package, which implements the GTPv0 and GTPv1 protocols. The main part of our implementation work was extending OpenGGSN’s GTPLib

to enable management of the newly defined GTP messages related to PDP context transfer (e.g.,

Transfer PDP Context Request/Reply). The GGSN component of the software package was also

modified in order to implement the mechanisms for PDP context transfer and packet delay

measurements. We extended the PDP context structure too: the PDP contexts are now able to store

the address of the GGSN where they were originally created. A special logic for continuous load

analysis of active PDP contexts is implemented with the addition of two variables to the context

structure: a timer and a counter. Both the timer and the counter are reset periodically in order to

reflect the actual usage, thus providing capabilities for dynamic decisions. The counter can be

based on different load metrics; our implementation uses current bandwidth information for

differentiation between load characteristics of PDP context traffic. Although the idea of this protocol is based on Shiao-Li Tsao’s method, the concept of the GGSN Controller, which collects load information from GGSNs and centrally controls the PDP context transfers, was set aside: in our implemented scheme the control of dynamic PDP context transfers is managed by the decision logic of


individual GGSNs in a distributed way. Compared to Shiao-Li Tsao’s protocol, we focused more on

the issues of what happens after a transferred PDP context is deleted. We designed our protocol so that PDP context transfers should not change the size of the address pools of the GGSNs. After deleting a transferred PDP context, its IP address returns to the originating GGSN. A significant improvement over Shiao-Li Tsao’s work is that our protocol was implemented, tested and evaluated in a real-life 3G/UMTS testing environment, introduced in the next section.
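A minimal Python sketch of this per-context bookkeeping is given below. The window length, the reset rule and all names are illustrative assumptions, not the actual OpenGGSN code.

import time

WINDOW = 1.0   # assumed seconds between periodic resets

class PdpContext:
    def __init__(self, teid, origin_ggsn):
        self.teid = teid
        self.origin_ggsn = origin_ggsn    # GGSN where the context was created
        self.counter = 0                  # bytes seen in the current window
        self.timer = time.monotonic()

    def account(self, nbytes):
        now = time.monotonic()
        if now - self.timer >= WINDOW:    # periodic reset keeps decisions fresh
            self.counter, self.timer = 0, now
        self.counter += nbytes

def mark_transferable(contexts, top_n=1):
    """Mark the top_n highest-bandwidth contexts as transfer candidates."""
    return sorted(contexts, key=lambda c: c.counter, reverse=True)[:top_n]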

3.3 Test bed Architecture:

In order to provide a test bed for gateway scalability research and for analyzing dynamic

GGSN load balancing in next generation multimedia-centric and high bitrate communication

systems, we designed and implemented a UMTS/IMS architecture based on the existing hardware

elements of the Mobile Innovation Centre (MIK) located in Budapest, Hungary. In order to make the

system able to use synthetic traffic of a variable number of users, we applied a software GGSN

implementation called OpenGGSN as a basis of our work. Our GPL licensed and publicly available

OpenGGSN modification uses the same architecture as the original version 0.84, but extends the GTP

library with routines and components for dynamically setting up, maintaining, and tearing down PDP

contexts inside a GGSN pool for efficient load balancing. Besides packets originating from and heading towards real 3G User Equipment, synthetic IP traffic of virtual users can also be used for

evaluation purposes. This is achieved by extended SGSNemu instances (note that the original

SGSNemu is also a part of the OpenGGSN 0.84 pack). As shown in Fig 5, our testbed has a special

GGSN pool consisting of multiple GGSNs, and several SGSNs can be connected to that pool. To

test the scalability of the system, several IP traffic generators were applied. These generators

are connected to the SGSNs and emulate the Gb interface, thus creating an effective and flexible GTP

traffic generator. The receiver side of the traffic generator is connected to the Gi interface of the

GGSN Pool. SGSNemu instances map the incoming traffic to different PDP contexts, and then

forward the data towards the chosen GGSN. With the development of this extended

SGSNemu version, our testbed became able to create and delete contexts, and generate traffic to the

created contexts dynamically. Using this complex and flexible testbed architecture we evaluated the

throughput of the GGSN with and without dynamic load balancing capabilities, measured the real-life effects of the dynamic load balancing scheme, and showed how this approach can improve the

performance of the overall 3G/UMTS system.


Fig 5. 3G/UMTS testbed architecture used for evaluation

1) GGSN-1 continuously measures the latency of each passing packet. When this latency exceeds a certain level, GGSN-1 sends a Transfer PDP Context Request message to GGSN-2. The request contains one of the marked transferable PDP contexts.

2) Then GGSN-2 calculates whether it has enough resources to serve the context. If the answer

is yes, it creates the context, saves the IP address of the context to its local address pool, and

sends a routing update message to the inbound routers in order to indicate that the packets

belonging to that context should be forwarded to GGSN-2. The transferred context then gets

a new TEID (Tunnel Endpoint IDentifier) value from GGSN-2.


3) GGSN-2 sends a Transfer PDP Context Response message to GGSN-1. The message

contains the new and the old TEID of the transferred PDP context. GGSN-1 releases the

resources occupied by the transferred context, except the IP address belonging to the PDP

context under transfer: this address should not be reallocated during the whole life of the

context.

4) As a next step, GGSN-1 sends a Change GGSN Request message to the SGSN. This message

contains the received identifiers (i.e., the old and new TEID of the transferred context) from

GGSN-2.

5) The SGSN responds with a Change GGSN Response message in order to confirm the

request, and updates its copy of the transferred PDP context. From now on, the packets which

belong to the transferred PDP context will be forwarded to GGSN-2, meaning that the

context transfer is completed.

6) In case of a UE-initiated removal of the previously transferred context, the SGSN receives a Delete PDP Context Request message. The SGSN has no information that the PDP context “to be deleted” has been transferred before, so it sends a Delete PDP Context

Request message directly to GGSN-2.

7) GGSN-2 stores information about whether the context is its own, or has been transferred

under its authority. So when the deletion request arrives, GGSN-2 knows that a transferred

context is to be deleted. In that case GGSN-2 sends a Delete Parent PDP Context message to

GGSN-1, from which the context was originally transferred.

8) GGSN-2 sends a routing update message to the inbound routers, indicating that the routing entry for the address is deleted and that packets destined to that address are to be directed towards GGSN-1 in the future.

9) and 10) GGSN-1 releases the address of the PDP context, and then sends a Delete Parent

PDP Context Response message to GGSN-2 indicating the successful release of the address.

GGSN-2 removes the PDP context, and then sends a Delete PDP Context Response message

as an acknowledgment to the SGSN.
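The following compressed Python sketch walks through the essence of steps 1) to 10) as plain method calls. It models only the bookkeeping (TEID assignment, the transferred-from marker, and the return of the IP address to the originating GGSN); no real GTP messages or routing updates are exchanged, and every name is illustrative.

class Ggsn:
    def __init__(self, name):
        self.name, self.contexts, self.next_teid = name, {}, 1

    def transfer_request(self, ctx):
        # Steps 2)-3): accept the context, assign a new TEID, remember origin.
        teid, self.next_teid = self.next_teid, self.next_teid + 1
        self.contexts[teid] = dict(ctx, transferred_from=ctx["origin"])
        return teid   # stands in for the Transfer PDP Context Response

    def delete(self, teid):
        # Steps 7)-10): on deletion, hand the IP back to the originating GGSN.
        ctx = self.contexts.pop(teid)
        if ctx.get("transferred_from") is not None:
            ctx["transferred_from"].release_address(ctx["ip"])

    def release_address(self, ip):
        print("%s: %s returned to the originating pool" % (self.name, ip))

ggsn1, ggsn2 = Ggsn("GGSN-1"), Ggsn("GGSN-2")
ctx = {"ip": "10.0.0.7", "origin": ggsn1}
teid = ggsn2.transfer_request(ctx)   # steps 1)-5): context now served by GGSN-2
ggsn2.delete(teid)                   # steps 6)-10): the IP goes back to GGSN-1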


The implementation of the above protocol is based on the OpenGGSN open source software package. This package has three components:

1) a fully functional GGSN module,

2) an SGSN emulator named SGSNemu, which can be used to test the GGSN (it only includes

basic functionality for SGSN-GGSN communication), and

3) the GTPLib.


CHAPTER 4

METHODOLOGY

4.1 Load Balancing Algorithm

Most load balancing algorithms are designed based on the performance requirements of

some specific application domain. For example, applications that exhibit lengthy parallel jobs

usually benefit in the presence of a job migration system. However, applications with shorter tasks

usually don’t warrant the expense of job migration and thus are better handled with clever loop

scheduling algorithms where the task granularity changes dynamically, as defined in prior work. As a result,

only one algorithm will be described here in order to provide an overview of the various issues that a

typical algorithm must take into consideration. However, all algorithms closely follow the four basic

load balancing steps.

4.1.1 Single Program Multiple Data Computation Model

The Single Program Multiple Data (SPMD) paradigm implies that all the workstations run the same code, but operate on different sets of data. The motivation for using SPMD programs is that

they can be designed and implemented easily, and they can be applied to a wide range of

applications such as numerical optimization problems and solving coupled partial differential

equations.

The SPMD computation model is depicted in Fig 4. Each task is divided into operations

or iterations. Workstations execute the same operation asynchronously, using data available in the

workstation’s own local memory. This is followed by a data exchange phase where information can

be exchanged between workstations (if required), after which all workstations wait for

synchronization. Thus each “lock-step” of an SPMD program contains.

15

Page 16: Dynamic load balancing

Three phases:

1. Calculation Phase: each task will do the required computation. There is no communication

between workstations at this point.

2. Data Distribution Phase: each task will distribute the relevant data to other tasks that need it for

the next lock-step.

3. Synchronization Phase: this phase ensures that all tasks have completed the same lock-step.

Otherwise there will be problems with tasks using the wrong data.
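These three phases can be sketched in Python with a barrier primitive, as below. This is an illustrative toy (the data split and the computation are assumptions), not an SPMD runtime.

import multiprocessing as mp

def worker(rank, chunks, results, barrier):
    # 1. Calculation phase: same code on every task, different data, no communication.
    partial = sum(x * x for x in chunks[rank])
    # 2. Data distribution phase: publish the result for the other tasks.
    results[rank] = partial
    # 3. Synchronization phase: nobody proceeds until every task has finished.
    barrier.wait()

if __name__ == "__main__":
    P = 4
    chunks = [list(range(r, 100, P)) for r in range(P)]   # different data per task
    results = mp.Manager().dict()
    barrier = mp.Barrier(P)
    procs = [mp.Process(target=worker, args=(r, chunks, results, barrier))
             for r in range(P)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    print(sum(results.values()))   # combined result after one lock-step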

Fig 4. SPMD Computation Model [LEE95]

If an SPMD program were to be executed on a homogeneous multiprocessor system, the

workload would be balanced for the entire computation (assuming that all tasks were initially evenly

distributed). However, in a NOW, there are various other factors that can affect the load and thus

contribute to load imbalances.


Thus, within the SPMD paradigm in a VPM (virtual parallel machine), we would like to reduce the execution time of

the program by dynamically shifting/migrating tasks from one workstation to another at the end of

each lock-step, if required. There are two things that need to be considered:

1. Determine if there is a need to rebalance the load

2. Find the best distribution of tasks

SPMD Load Balancer

[LEE95] developed a global, centralized, sender-initiated load balancing algorithm for large, computation-heavy SPMD programs using the following parameters:

Tcomputei – the interval between the time at which the first task on workstation i starts execution and the time at which the last task on the same workstation completes the computation and waits for the synchronization. This value is thus the dynamic workload index for the algorithm, since there is a direct relation between Tcomputei and a workstation’s load. Assuming a program can be decomposed into N tasks and there are P workstations, then we have N = n1 + n2 + … + nP, where ni is the number of tasks on workstation i.


Ttaski – the average computation time for each task on a workstation, defined

as:

Ttaski = Tcomputei / ni (equation 1)

Thigh – the maximum of Tcompute over all workstations, defined as:

Thigh = max { Tcomputei } (1 <= i <= P)

Tlow – the minimum of Tcompute over all workstations, defined as:

Tlow = min { Tcomputei } (1 <= i <= P)

A common approach taken for dynamic load balancing on a NOW is to predict


future performance based on past information [ZAKI96]. In the SPMD algorithm, Ttask can be used

to update Tcompute. Thus if m tasks are moved to workstation i, we can solve equation 1 to give us

an estimation of Tcomputei: Ttaski x (ni + m)

The estimation is based on the current workload of the workstation, and it is valid because all

tasks in an SPMD program are executing the same code. Tcompute will be recalculated after each

task reallocation, with Thigh and Tlow updated accordingly.
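The definitions above translate directly into code. The following Python sketch, with assumed example numbers, computes Ttaski, Thigh, Tlow and the estimate of Tcomputei after m tasks are moved.

def ttask(tcompute, n):
    """Average per-task time on a workstation (equation 1)."""
    return tcompute / n

def estimate_tcompute(tcompute, n, m):
    """Predicted Tcompute after m tasks are added (m may be negative)."""
    return ttask(tcompute, n) * (n + m)

# Assumed example: three workstations with measured Tcompute and task counts.
tcompute = [12.0, 6.0, 3.0]
n = [6, 6, 6]
thigh, tlow = max(tcompute), min(tcompute)
print(thigh, tlow)                                # 12.0 3.0
print(estimate_tcompute(tcompute[2], n[2], m=2))  # 0.5 x (6 + 2) = 4.0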

4.2 Meeting the Rebalancing Criteria:

Therefore, in order to balance the load, tasks from workstations that have a longer Tcompute will be

moved to the workstations with a shorter Tcompute. However, the algorithm must also take into

account the rebalancing criteria discussed earlier.

For the first rebalancing criterion, assume that we have a workstation k that has the current

highest Tcompute. Further assume that L represents the number of lock steps remaining, and mi

represents the number of tasks workstation i transmits or receives. Therefore, in order to guarantee

that moving a task from workstation k to a new workstation doesn’t cause the new workstation’s

Tcompute to be greater than k’s, we check the following condition:

Ttaskk x nk > min {Ttaski x (ni + 1)} (1 <= i <= P, i != k)

If this is true, then moving one task from workstation k to another workstation will

not cause an oscillating effect, where tasks bounce back and forth between workstations. For the second rebalancing criterion

(where we need to verify that attempting to balance the load provides some performance gain), we

can compute the following:

OldThigh – represents the previous Thigh value before the current load balancing decision

NewThigh – represents the new Thigh value that is computed after meeting criteria 1 (which is

guaranteed to be lower than OldThigh)


Therefore the gain associated with performing load balancing would be:


Gain = (OldThigh – NewThigh) x L

Assuming that our load balancer knows Toverhead, the cost of performing job migration of

the SPMD tasks, we can now check criteria 2 using the following:

Gain >= Toverhead x max {mi} (1 <= i <= P)

If this is true, then we can conclude that it is worthwhile to perform the load balancing and job

migration.

4.3 Iterative Algorithm

The final iterative algorithm, as outlined by [LEE95], can be summarized as follows:

OldThigh = max {Tcomputei} (1 <= i <= P)
WHILE criteria 1 and 2 are TRUE DO
    FOR i = 1 to P
        Tcompute’i = (ni + 1) x Ttaski
    ENDFOR
    // move a task from the workstation with Thigh to the one with the smallest Tcompute’i
    // update Tcompute of the workstations involved in the task migration
    NewThigh = max {Tcomputei} (1 <= i <= P)
ENDWHILE

Each iteration of the while loop attempts to move one task from the most heavily loaded

workstation to the most lightly loaded node, as long as the rebalancing criteria are being met. After

some movement, the load monitoring variables are recomputed and the while loop repeats. This

continues until the algorithm detects that there are no more tasks that can be redistributed without

degrading performance. In other words, the while loop iterates until the system is as balanced as

possible given the current timing information.
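Putting the two rebalancing criteria and the loop together, the following self-contained Python sketch implements the iterative algorithm. The example input, the tie-breaking and the undoing of a final unprofitable move are our own illustrative choices.

def rebalance(tcompute, n, L, toverhead):
    """Iteratively move tasks until criteria 1 and 2 no longer hold."""
    P = len(n)
    ttask = [tcompute[i] / n[i] for i in range(P)]   # fixed during rebalancing
    moves = [0] * P                                  # mi: tasks sent or received
    old_thigh = max(tcompute)
    while True:
        k = max(range(P), key=lambda i: tcompute[i])          # workstation with Thigh
        j = min((i for i in range(P) if i != k),              # best receiver:
                key=lambda i: ttask[i] * (n[i] + 1))          #   smallest Tcompute'
        # Criterion 1: the move must not simply relocate the bottleneck.
        if not ttask[k] * n[k] > ttask[j] * (n[j] + 1):
            break
        n[k] -= 1; n[j] += 1                                  # move one task k -> j
        tcompute[k] = ttask[k] * n[k]
        tcompute[j] = ttask[j] * n[j]
        moves[k] += 1; moves[j] += 1
        # Criterion 2: the predicted gain must outweigh the migration cost.
        gain = (old_thigh - max(tcompute)) * L
        if gain < toverhead * max(moves):
            n[k] += 1; n[j] -= 1                              # undo unprofitable move
            tcompute[k] = ttask[k] * n[k]
            tcompute[j] = ttask[j] * n[j]
            break
    return n

print(rebalance(tcompute=[12.0, 6.0, 3.0], n=[6, 6, 6], L=10, toverhead=0.5))

With this assumed input the loop moves five tasks and then stops when criterion 1 fails, reducing the maximum Tcompute from 12.0 to 5.5.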


CHAPTER 5

ADVANTAGES AND DISADVANTAGES

5.1 Global vs. Local Strategies

Global or local policies answer the question of what information will be used to make a

load balancing decision. In global policies, the load balancer uses the performance profiles of all

available workstations. In local policies, workstations are partitioned into different groups. In a

heterogeneous NOW, the partitioning is usually done such that each group has nearly equal

aggregate computational power. The benefit of a local scheme is that performance profile

information is only exchanged within the group.

The choice of a global or local policy depends on the behavior an application will exhibit.

For global schemes, balanced load convergence is faster compared to a local scheme since all

workstations are considered at the same time. However, this requires additional communication

and synchronization between the various workstations; the local schemes minimize this extra

overhead. But the reduced synchronization between workstations is also a downfall of the local

schemes if the various groups exhibit major differences in performance. It has been noted that if one group

has processors with poor performance (high load), and another group has very fast processors

(little or no load), the latter will finish quite early while the former group is overloaded.


5.2 Centralized vs. Distributed Strategies

A load balancer is categorized as either centralized or distributed, both of which define

where load balancing decisions are made. In a centralized scheme, the load balancer is located

on one master workstation node and all decisions are made there. In a distributed scheme, the

load balancer is replicated on all workstations.

Once again, there are tradeoffs associated with choosing one location scheme over the

other. For centralized schemes, the reliance on one central point of balancing control could limit

future scalability. Additionally, the central scheme also requires an “all-to-one” exchange of

profile information from workstations to the balancer as well as a “one-to-all” exchange of

distribution instructions from the balancer to the workstations. The distributed scheme helps

solve the scalability problems, but at the expense of an “all-to-all” broadcast of profile

information between workstations. However, the distributed scheme avoids the “one-to-all”

distribution exchange since the distribution decisions are made on each workstation.


CONCLUSION

Load balancing is an important issue in a virtual parallel machine built using a low-cost

network of workstations. The most difficult aspect of load balancing in a network of workstations

involves deciding on which algorithm to use. Hundreds of algorithms have been proposed,

and each one has its own specific motivations and design decisions that result in trade-offs that

aren’t always suited to every imaginable task.

This report described many of the design issues that are commonly considered when

deciding on a load balancing algorithm (such as global/local, centralized/distributed, etc.), as well as

the tradeoffs associated with the various parameters and strategies. Additionally, this report outlined

the details of an algorithm targeted towards SPMD style programs in order to present a concrete

example of the details associated with an actual load balancing implementation. Finally, the load

balancing features for two cluster management software

packages (LoadLeveler and Condor) were described briefly.


FUTURE SCOPE

While great progress has been made in dynamic load balancing for parallel, unstructured

and/or adaptive applications, research continues to address issues arising due to application and

architecture requirements. Existing algorithms, such as the geometric algorithms RCB and

HSFC, are being augmented to support special needs of complex applications. New models using

hypergraphs are being developed to more accurately represent highly connected, non-symmetric,

and/or rectangular systems arising in density functional theory, circuit simulations, and integer

programming. On heterogeneous computer architectures, software such as DRUM dynamically

detects the available computing, memory and network resources, and provides the resource

information to both existing partitioning algorithms and new hierarchical partitioning strategies.

Software toolkits such as Zoltan deliver these capabilities to applications, enable comparisons of methods within applications, and serve as test beds for further research and development. While

we present some solutions to these issues, our work represents only a small sample of continuing research into load balancing. For adaptive finite element methods, data movement from an old

decomposition to a new one can consume orders of magnitude more time than the actual

computation of a new decomposition; highly incremental partitioning strategies that minimize

data movement are important for high performance of adaptive simulations. In overlapping Schwarz preconditioning, the work to be balanced depends on data in both the processor’s

subdomain and the overlap region, while the size of the overlap region depends on the

subdomain generated by the partitioner. In such cases, standard partitioning models that assume

work per processor is the total weight of objects assigned to the processor are insufficient;

strategies that treat workloads as a function of the subdomain are needed. Very large-scale

semantic networks place additional demands on practitioners, due to both their high connectivity

and irregular structure; highly selective partitioning techniques for these networks are in their

infancy. These examples of research in partitioning, while still not exhaustive, demonstrate that,

indeed, the load-balancing problem is not yet solved.


REFERENCES

[1]. Szabolcs Kustos, László Bokor, Gábor Jeney, “Testbed Evaluation of Dynamic GGSN Load Balancing for High Bitrate 3G/UMTS Networks”, Budapest University of Technology and Economics (BME), Department of Telecommunications (HT), Mobile Communication and Computing Laboratory (MC2L) – Mobile Innovation Centre (MIK), Magyar Tudósok krt. 2, H-1117 Budapest, Hungary, 2011.

[2]. Van Albada, G.D., Clinckemaillie, J., “Dynamite-blasting Obstacles to Parallel Cluster

Computing”, Technical Report, Department of Computer Science, University of

Amsterdam, The Netherlands, 1996.

[3]. Baker, M., Fox, G., Yau, H., “Review of Cluster Management Software”, NHSE Review, 1996 Volume, First Issue, July 1996.

[4]. Dandamudi, S., Piotrowski, A., “A Comparative Study of Load Sharing on Networks of

Workstations”, Proceedings of the International Conference on Parallel and Distributed

Computing Systems, New Orleans, October 1997.

[5]. Dandamudi, S., “Sensitivity Evaluation of Dynamic Load Sharing in Distributed Systems”,

Technical Report TR 97-12, Carleton University, Ottawa, Canada.
