Page 1: A Grid Parallel Application Framework

A Grid Parallel Application Framework

Jeremy Villalobos

PhD student, Department of Computer Science

University of North Carolina Charlotte

Page 2: A Grid Parallel Application Framework

Overview

Parallel Applications on the Grid

Latency Hiding by Redundant Processing (LHRP)

PGAFramework

Related work

Conclusion

Page 3: A Grid Parallel Application Framework

Parallel Applications on the Grid

Advantages
  • Access to more resources
  • Lower costs
  • Future profits from Grid Economy?

Challenges
  • IO problem
  • Need for easy-to-use interface
  • Heterogeneous hardware

Page 4: A Grid Parallel Application Framework

Latency Hiding by Redundant Processing

  • Latency Hiding problem
  • LHRP Algorithm
      • CPU type
      • CPU task assigned to each CPU type
      • Versioning system
  • Mathematical model to describe LHRP
  • Results

Page 5: A Grid Parallel Application Framework

LHRP

[Figure: Latency Hiding compared with Latency Hiding by Redundantly Processing]

Page 6: A Grid Parallel Application Framework

LHRP Algorithm

  • Internal: Only communicates with LAN CPUs.
  • Border: Communicates with LAN CPUs and one Buffer CPU.
  • Buffer: Communicates with LAN Border CPU and receives data from WAN Border CPU.
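The three node roles can be written down as a small type that records which peers each role exchanges data with. This is only an illustrative sketch: the names `NodeRole`, `Peer`, and `talksTo` are assumptions made for this example and are not taken from the framework itself.

```java
import java.util.EnumSet;
import java.util.Set;

public class LhrpNodeRoles {
    /** Kinds of peers a node may exchange data with (illustrative names). */
    enum Peer { LAN_CPU, BUFFER_CPU, LAN_BORDER_CPU, WAN_BORDER_CPU }

    /** The three LHRP node roles and the peers each one communicates with. */
    enum NodeRole {
        INTERNAL(EnumSet.of(Peer.LAN_CPU)),
        BORDER(EnumSet.of(Peer.LAN_CPU, Peer.BUFFER_CPU)),
        BUFFER(EnumSet.of(Peer.LAN_BORDER_CPU, Peer.WAN_BORDER_CPU));

        private final Set<Peer> peers;

        NodeRole(Set<Peer> peers) {
            this.peers = peers;
        }

        boolean talksTo(Peer peer) {
            return peers.contains(peer);
        }
    }

    public static void main(String[] args) {
        // Print, for every role, whether it exchanges data with a WAN border CPU.
        for (NodeRole role : NodeRole.values()) {
            System.out.println(role + " receives WAN data: " + role.talksTo(Peer.WAN_BORDER_CPU));
        }
    }
}
```

Only the Buffer role is marked as receiving WAN data, which mirrors the role descriptions on this slide.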

Page 7: A Grid Parallel Application Framework

Computation and Communication Stages

Internal:
  • Computes borders
  • Transfers borders (non-blocking)
  • Computes core matrix
  • Waits for transfer ACK
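The ordering of the Internal node's stages (borders first, non-blocking transfer, core computation, then wait for the ACK) can be sketched in Java as follows. This is a minimal sketch under stated assumptions: the 1-D averaging stencil, the method names `transferBordersAsync` and `iterate`, and the use of `Thread.sleep` to simulate network latency are all hypothetical stand-ins and are not the presenter's implementation.

```java
import java.util.Arrays;
import java.util.concurrent.CompletableFuture;

public class InternalNodeStep {
    /** Hypothetical non-blocking border transfer; the future completes when the ACK arrives. */
    static CompletableFuture<Void> transferBordersAsync(double[] borders, long latencyMillis) {
        return CompletableFuture.runAsync(() -> {
            try {
                Thread.sleep(latencyMillis); // simulated network latency
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
    }

    /** One iteration over this node's strip of cells (assumes at least two cells). */
    static double[] iterate(double[] cells, long latencyMillis) {
        int n = cells.length;
        double[] next = new double[n];
        // 1. Compute the borders (first and last cell of the strip).
        next[0] = 0.5 * (cells[0] + cells[1]);
        next[n - 1] = 0.5 * (cells[n - 2] + cells[n - 1]);
        // 2. Start the non-blocking transfer of the borders.
        CompletableFuture<Void> ack =
                transferBordersAsync(new double[] { next[0], next[n - 1] }, latencyMillis);
        // 3. Compute the core while the transfer is in flight.
        for (int i = 1; i < n - 1; i++) {
            next[i] = (cells[i - 1] + cells[i] + cells[i + 1]) / 3.0;
        }
        // 4. Wait for the transfer ACK before the next iteration may start.
        ack.join();
        return next;
    }

    public static void main(String[] args) {
        double[] next = iterate(new double[] { 0, 0, 9, 0, 0 }, 20);
        System.out.println(Arrays.toString(next));
    }
}
```

The latency is hidden only to the extent that the core computation in step 3 takes at least as long as the transfer started in step 2.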

Page 8: A Grid Parallel Application Framework

Computation and Communication Stages

Border:
  • Computes borders
  • Transfers borders (non-blocking)
  • Sends far border
  • Computes core matrix
  • Waits for transfer ACK
  • Checks on far border transfer ACK (if it is the last iteration, waits)

Page 9: A Grid Parallel Application Framework

Computation and Communication Stages

Buffer:
  • Computes borders
  • Transfers borders (non-blocking)
  • Receives far border
  • Computes core matrix
  • Waits for transfer ACK
  • Checks on far border transfer ACK (if it is the last iteration, waits)

Page 10: A Grid Parallel Application Framework

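Buffer Node Versioning Algorithm

Grid to Local Latency Ratio = 3

[Figure: column versions per time step. Horizontal axis: column coordinates 1 to 13, grouped under Node 2, Node 3, Node 4, Node 5. Vertical axis label: row coordinates.]

Time step   Column versions
 1          1 1 1 1 0 0 1 1 1 1

Time steps 2 to 12 show only the first column so far: 2 3 3 3 4 5 6 6 6 7 8.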

Page 11: A Grid Parallel Application Framework

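Buffer Node Versioning Algorithm

Grid to Local Latency Ratio = 3

[Figure: column versions per time step. Horizontal axis: column coordinates 1 to 13, grouped under Node 2, Node 3, Node 4, Node 5. Vertical axis label: row coordinates.]

Time step   Column versions
 1          1 1 1 1 0 0 1 1 1 1
 2          2 2 2 1 0 0 1 2 2 2

Time steps 3 to 14 show only the first column so far: 3 3 3 4 5 6 6 6 7 8 9 9.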

Page 12: A Grid Parallel Application Framework

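Buffer Node Versioning Algorithm

Grid to Local Latency Ratio = 3

[Figure: column versions per time step. Horizontal axis: column coordinates 1 to 13, grouped under Node 2, Node 3, Node 4, Node 5. Vertical axis label: row coordinates.]

Time step   Column versions
 1          1 1 1 1 0 0 1 1 1 1
 2          2 2 2 1 0 0 1 2 2 2
 3          3 3 2 1 1 1 1 2 3 3

Time steps 4 to 14 show only the first column so far: 3 3 4 5 6 6 6 7 8 9 9.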

Page 13: A Grid Parallel Application Framework

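Buffer Node Versioning Algorithm

Grid to Local Latency Ratio = 3

[Figure: column versions per time step. Horizontal axis: column coordinates 1 to 13, grouped under Node 2, Node 3, Node 4, Node 5. Vertical axis label: row coordinates.]

Time step   Column versions
 1          1 1 1 1 0 0 1 1 1 1
 2          2 2 2 1 0 0 1 2 2 2
 3          3 3 2 1 1 1 1 2 3 3
 4          3 3 2 2 2 2 2 2 3 3

Time steps 5 to 14 show only the first column so far: 3 4 5 6 6 6 7 8 9 9.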

Page 14: A Grid Parallel Application Framework

Buffer Node Versioning Algorithm

Grid to Local Latency Ratio = 3

[Figure: column versions per time step. Horizontal axis: column coordinates 1 to 13, grouped under Node 2, Node 3, Node 4, Node 5. Vertical axis label: row coordinates.]

Time step   Column versions
 1          1 1 1 1 0 0 1 1 1 1
 2          2 2 2 1 0 0 1 2 2 2
 3          3 3 2 1 1 1 1 2 3 3
 4          3 3 2 2 2 2 2 2 3 3
 5          3 3 3 3 3 3 3 3 3 3

Time steps 6 to 14 show only the first column so far: 4 5 6 6 6 7 8 9 9.

Page 15: A Grid Parallel Application Framework

Buffer Node Versioning Algorithm

Grid to Local Latency Ratio = 3

[Figure: column versions per time step. Horizontal axis: column coordinates 1 to 13, grouped under Node 2, Node 3, Node 4, Node 5. Vertical axis label: row coordinates.]

Time step   Column versions
 1          1 1 1 1 0 0 1 1 1 1
 2          2 2 2 1 0 0 1 2 2 2
 3          3 3 2 1 1 1 1 2 3 3
 4          3 3 2 2 2 2 2 2 3 3
 5          3 3 3 3 3 3 3 3 3 3
 6          4 4 4 4 3 3 4 4 4 4

Time steps 7 to 14 show only the first column so far: 5 6 6 6 7 8 9 9.

Page 16: A Grid Parallel Application Framework

Buffer Node Versioning Algorithm

Grid to Local Latency Ratio = 3

[Figure: column versions per time step. Horizontal axis: column coordinates 1 to 13, grouped under Node 2, Node 3, Node 4, Node 5. Vertical axis label: row coordinates.]

Time step   Column versions
 1          1 1 1 1 0 0 1 1 1 1
 2          2 2 2 1 0 0 1 2 2 2
 3          3 3 2 1 1 1 1 2 3 3
 4          3 3 2 2 2 2 2 2 3 3
 5          3 3 3 3 3 3 3 3 3 3
 6          4 4 4 4 3 3 4 4 4 4
 7          5 5 5 4 3 3 4 5 5 5
 8          6 6 5 4 4 4 4 5 6 6
 9          6 6 5 5 5 5 5 5 6 6
10          6 6 6 6 6 6 6 6 6 6
11          7 7 7 7 6 6 7 7 7 7
12          8 8 8 7 6 6 7 8 8 8
13          9 9 8 7 7 7 7 8 9 9
14          9 9 8 8 8 8 8 8 9 9

Page 17: A Grid Parallel Application Framework

LHRP Algorithm Review

  • Node types: Internal, Border, Buffer
  • Far Border transfer
  • Buffer Node Versioning system

Page 18: A Grid Parallel Application Framework

Estimated Algorithm Performance

  • G: Grid latency
  • I: Internal latency
  • B: Number of data tuples used by the Buffer Node
  • W: Total amount of work for all CPUs
  • C: Number of CPUs doing non-redundant work

Page 19: A Grid Parallel Application Framework

Estimated Algorithm Performance

[Figure: LH vs LHRP. Horizontal axis: compute time in nanoseconds per subcell. Vertical axis: estimated total time to compute one cycle (scale 0 to 400). Series: LHRP, LH.]

Page 20: A Grid Parallel Application Framework

Estimated Algorithm Performance

[Figure: LH vs LHRP. Horizontal axis: Grid Latency. Vertical axis: Calculated Total Time (scale 0 to 180). Series: LHRP, LH.]

Page 21: A Grid Parallel Application Framework

Experimental Result: Memory Footprint

Process   LH      LHRP
0         35724   41856
1         7408    8228
2         7408    10932
3         7408    10928
4         7412    8528

21% increase in memory use over the conventional form of Latency Hiding.

Causes:
  • Extra matrix in the Buffer Node to store old column versions
  • Extra far border buffers

Page 22: A Grid Parallel Application Framework

Experimental Results: Performance

[Figure: LH vs LHRP Grid Latency. Horizontal axis: Grid Latency (ms). Vertical axis: Total Compute Time (sec), scale 0 to 800. Series: LH Average, LH Min, LH Max, LHRP Average, LHRP Min, LHRP Max.]

Page 23: A Grid Parallel Application Framework

Experimental Results: Performance

[Figure: LHRP vs LH Compute Time. Horizontal axis: Compute Time in ns per Subcell (scale 0 to 200). Vertical axis: Total Compute Time (scale 0 to 2500). Series: LH Average, LH Min, LH Max, LHRP Average, LHRP Min, LHRP Max.]

Page 24: A Grid Parallel Application Framework

PGAFramework
  • Objective
  • Design Requirements
  • Implementation technology choices
  • API Design
  • API Workpool Example
  • Other API features
      • Synchronization option
      • Recursive option

Page 25: A Grid Parallel Application Framework

PGAFramework

Objective: To create an efficient parallel application framework for the grid that allows a user programmer easy interaction with the Grid resources.

Page 26: A Grid Parallel Application Framework

Design Requirements
  • Platform independence
  • Self deployment
  • Easy-to-use interface
  • Provide the following services without requiring extra effort on the part of the user programmer:
      • Load balancing
      • Scheduling
      • Fault tolerance
      • Latency/bandwidth tolerance

Page 27: A Grid Parallel Application Framework

Design

[Diagram: layered architecture. Labels: User's Applications; GPAFramework, comprising API (Interface), Load Balancing, Scheduling, Fault Tolerance, and Latency/Bandwidth Tolerance; Globus; Job Scheduler (Condor); Hardware Resources.]

Page 28: A Grid Parallel Application Framework

Deployment

[Diagram labels: GridWay?; Job Submit Node; Scheduling Service; Resource Discovery; three Globus instances over Condor, PBS, and SGE; resources: Desktop PCs node, Cluster computer node, Super computer.]

Page 29: A Grid Parallel Application Framework

Implementation
  • Java
      • Platform independence
  • JXTA (JXSE)
      • Peer-to-peer API
      • Provides tools to work around NATs and firewalls
      • Provides library and module runtime loading API

Page 30: A Grid Parallel Application Framework

Motivation for API Design

  • Video codecs
      • Codecs follow an interface
      • What happens inside the codec does not matter
      • The input and output for the codec need to be specified

[Diagram: a Video Player (display a GUI, load file..., output video to screen) uses interchangeable codecs (mpeg, ogg, h.264); an MPEG-encoded stream goes in and raw video data comes out.]
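The codec analogy can be made concrete with a short Java sketch: the player depends only on an interface that fixes the input and output, and any module that honors it can be plugged in. Every name here (`VideoCodec`, `VideoPlayer`, `decode`, `play`) is hypothetical, and the toy codecs do not implement the real formats.

```java
import java.util.Map;

public class CodecAnalogy {
    /** Hypothetical codec contract: encoded bytes in, raw video bytes out. */
    interface VideoCodec {
        byte[] decode(byte[] encodedStream);
    }

    /** The player relies only on the interface, never on a codec's internals. */
    static class VideoPlayer {
        private final Map<String, VideoCodec> codecs;

        VideoPlayer(Map<String, VideoCodec> codecs) {
            this.codecs = codecs;
        }

        /** Decodes the stream with the codec registered for the format; returns the raw size. */
        int play(String format, byte[] stream) {
            VideoCodec codec = codecs.get(format);
            if (codec == null) {
                throw new IllegalArgumentException("No codec for " + format);
            }
            byte[] raw = codec.decode(stream);
            return raw.length; // a real player would output the raw video to the screen
        }
    }

    public static void main(String[] args) {
        // Toy codecs for illustration only.
        VideoCodec doubling = in -> new byte[in.length * 2];
        VideoCodec identity = in -> in.clone();
        VideoPlayer player = new VideoPlayer(Map.of("mpeg", doubling, "ogg", identity));
        System.out.println(player.play("mpeg", new byte[4])); // prints 8
        System.out.println(player.play("ogg", new byte[4]));  // prints 4
    }
}
```

The framework API applies the same idea: the user's module plays the role of the codec, and the framework plays the role of the player.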

Page 31: A Grid Parallel Application Framework

PGAFramework API
  • There may be multiple "template" APIs
  • Each API has interfaces that the user implements
  • The user "inserts" his module into the framework

[Diagram: division of work across the API]

User module:
  • Get data from framework
  • Compute on data
  • Return processed data
  • Request sync (optional)
  • Give data to framework
  • Get data from framework
  • Store or pipe data

Framework:
  • Schedule processes on resource
  • Load user data
  • Create network
  • Determine topology and net behavior
  • Send user process to compute nodes
  • Get data from user class
  • Send to master node
  • Repeat process in loop until done

Page 32: A Grid Parallel Application Framework

API Sample Code

public interface GridAppTemplate {
    public Data Compute(Data input);
    public Data DiffuseData(long segment);
    public void GatherData(long segment, Data dat);
    public long getDataCount();
}
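To show how the four interface methods might fit together, here is a sequential, single-machine stand-in for the framework's loop. It is a sketch under assumptions: the `Data` marker interface, the `runLocally` driver, and the `SumOfSquares` toy module are hypothetical and only illustrate the call order (diffuse, compute, gather); the real framework distributes these calls across nodes.

```java
public class LocalWorkpoolDriver {
    /** Marker type for user data; assumed here, since the slides do not show its definition. */
    interface Data {}

    /** The template interface as shown on the slide. */
    interface GridAppTemplate {
        Data Compute(Data input);
        Data DiffuseData(long segment);
        void GatherData(long segment, Data dat);
        long getDataCount();
    }

    /** Hypothetical sequential stand-in for the framework's distributed loop. */
    static void runLocally(GridAppTemplate app) {
        long count = app.getDataCount();
        for (long segment = 0; segment < count; segment++) {
            Data input = app.DiffuseData(segment); // user module hands out one work item
            Data output = app.Compute(input);      // would run on a remote compute node
            app.GatherData(segment, output);       // result comes back to the master node
        }
    }

    /** Toy payload carrying a single number. */
    static class Num implements Data {
        final long value;
        Num(long value) { this.value = value; }
    }

    /** Toy module: sums the squares of the segment indices 0..4. */
    static class SumOfSquares implements GridAppTemplate {
        long total = 0;
        public Data Compute(Data input) { long v = ((Num) input).value; return new Num(v * v); }
        public Data DiffuseData(long segment) { return new Num(segment); }
        public void GatherData(long segment, Data dat) { total += ((Num) dat).value; }
        public long getDataCount() { return 5; }
    }

    public static void main(String[] args) {
        SumOfSquares app = new SumOfSquares();
        runLocally(app);
        System.out.println(app.total); // 0 + 1 + 4 + 9 + 16 = 30
    }
}
```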

Page 33: A Grid Parallel Application Framework

API Sample Code

public class myModule implements GridAppTemplate {

    double x1, x2, y1, y2;
    double total;
    long random_samples;
    final long data_count = 100;

    public myModule(double x1_arg, double x2_arg,
                    double y1, double y2, long rad_smp) {
        // store the sampling bounds and the number of samples
        x1 = x1_arg;
        x2 = x2_arg;
        this.y1 = y1;
        this.y2 = y2;
        total = 0;
        random_samples = rad_smp;
    }

    @Override
    public Data Compute(Data data) {
        // convert generic object to my object
        MyData dat = (MyData) data;
        MyOutputData output = new MyOutputData();
        output.inside = 0;
        // compute: does the random point fall inside the unit circle?
        double dist = Math.sqrt(dat.random_x * dat.random_x
                + dat.random_y * dat.random_y);
        if (dist < 1.0) {
            output.inside = 1L;
        }
        return output;
    }

    @Override
    public Data DiffuseData(long segment) {
        // create one random point inside the sampling rectangle
        MyData d = new MyData();
        d.random_x = Math.random() * (x2 - x1) + x1;
        d.random_y = Math.random() * (y2 - y1) + y1;
        return d;
    }

    @Override
    public void GatherData(long segment, Data dat) {
        // accumulate the number of points that fell inside the circle
        MyOutputData data = (MyOutputData) dat;
        total += data.inside;
    }

    @Override
    public long getDataCount() {
        return random_samples;
    }

    public double getPi() {
        double pi = (total / random_samples) * 4;
        return pi;
    }
}


Page 34: A Grid Parallel Application Framework

API Sample Code

public class UserApplication {
    public static void main(String[] args) throws UserModNotSetException {
        // Instantiate the custom module that follows
        // the GridAppTemplate interface.
        // The user can use the constructor, or some other way, to get the
        // parameters set, such as file paths and options to tweak an algorithm.
        myModule mod = new myModule(0.0, 1.0, 0.0, 1.0, 10000);

        // submit the module to the network
        NetworkDeployer deployer = new NetworkDeployer(mod);

        // start the network
        deployer.startNetwork();

        double pi = mod.getPi();
        System.out.println("PI is: " + pi);
    }
}

Page 35: A Grid Parallel Application Framework

Synchronization option

  • RemoteHandler provides an interface to synchronize data
  • Data is synchronized in a non-blocking manner
  • User creates blocking procedures if needed

public class myRemoteHandler implements RemoteEventHandler {

    mySyncData chunk;

    public myRemoteHandler(Data chunk) {
        this.chunk = (mySyncData) chunk;
    }

    @Override
    public void SyncDone(SyncData piece) {
        chunk.setPiece(piece);
    }
}

Page 36: A Grid Parallel Application Framework

Recursive Feature

Allows multiple levels of parallelization (granularity)

[Diagram: Decode Video; Cut Raw Video Into Pictures; Blur pictures; Blur portion of picture. Parallel patterns shown: Pipeline, Work pool, Synchronous.]

Page 37: A Grid Parallel Application Framework

Related Work
  • MPI implementations for the Grid
      • MPICH-G2
      • GridMPI
      • MPICH-V2 (MPICH-V1)
  • Peer-to-peer parallel frameworks
      • P2PMPI (for cluster computing)
      • P3 (for cluster computing)
  • Self-deploying frameworks
      • Jojo

Page 38: A Grid Parallel Application Framework

Conclusions

Parallel Applications on the Grid

Latency Hiding by Redundant Processing (LHRP)

PGAFramework

Related work