A Grid Parallel Application Framework


A Grid Parallel Application Framework

Jeremy Villalobos

PhD student, Department of Computer Science

University of North Carolina Charlotte

Overview

Parallel Applications on the Grid

Latency Hiding by Redundant Processing (LHRP)

PGAFramework

Related work

Conclusion

Parallel Applications on the Grid

Advantages: access to more resources, lower costs, and possibly future profits from a Grid economy.

Challenges: the I/O problem, the need for an easy-to-use interface, and heterogeneous hardware.

Latency Hiding by Redundant Processing

The latency-hiding problem

The LHRP algorithm: CPU types, the task assigned to each CPU type, and the versioning system

A mathematical model to describe LHRP, and results

Terminology: LH = Latency Hiding; LHRP = Latency Hiding by Redundant Processing.

LHRP Algorithm

Internal: communicates only with LAN CPUs.

Border: communicates with LAN CPUs and one Buffer CPU.

Buffer: communicates with the LAN Border CPU and receives data from the WAN Border CPU.

Computation and Communication Stages

Internal: computes borders, transfers borders (non-blocking), computes the core matrix, waits for the transfer ACK.

Computation and Communication Stages

Border: computes borders, transfers borders (non-blocking), sends the far border, computes the core matrix, waits for the transfer ACK, then checks on the far-border transfer ACK (on the last iteration it waits for it).

Computation and Communication Stages

Buffer: computes borders, transfers borders (non-blocking), receives the far border, computes the core matrix, waits for the transfer ACK, then checks on the far-border transfer ACK (on the last iteration it waits for it). A sketch of all three loops follows.
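The three stage lists above describe the same loop with small per-type differences. Purely as a reading aid, here is a minimal Java sketch of that per-iteration structure; every identifier (Matrix, NodeComm, and so on) is invented for illustration and is not the framework's actual API.

// Hypothetical sketch of one LHRP iteration for each node type.
public class LhrpIteration {

    enum NodeType { INTERNAL, BORDER, BUFFER }

    interface Matrix {
        void computeBorders();          // compute the cells neighbors need first
        void computeCore();             // bulk of the work, overlapped with I/O
    }

    interface NodeComm {
        void sendBordersNonBlocking();  // LAN border exchange, non-blocking
        void sendFarBorder();           // Border node -> remote Buffer node (WAN)
        void receiveFarBorder();        // Buffer node <- remote Border node (WAN)
        void waitForTransferAck();      // block until the LAN transfer completes
        boolean farBorderAckArrived();  // poll the slow WAN transfer
    }

    static void iterate(NodeType type, Matrix m, NodeComm comm, boolean lastIteration) {
        m.computeBorders();
        comm.sendBordersNonBlocking();
        if (type == NodeType.BORDER) comm.sendFarBorder();
        if (type == NodeType.BUFFER) comm.receiveFarBorder();
        m.computeCore();                // WAN/LAN latency is hidden behind this work
        comm.waitForTransferAck();
        if (type != NodeType.INTERNAL) {
            boolean done = comm.farBorderAckArrived();  // normally just a check...
            while (lastIteration && !done) {            // ...but wait on the last pass
                done = comm.farBorderAckArrived();
            }
        }
    }
}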

Buffer Node Versioning Algorithm

Grid-to-local latency ratio = 3. (The original slides build this table up one time step per animation frame; only the final frame is reproduced here.) Each row shows one time step; the ten columns are the column coordinates of the data held across Nodes 2 through 5. The middle columns, which depend on the far border carried over the WAN, hold older versions and catch up whenever a far-border transfer arrives.

Time step   Data version at each column coordinate
    1       1 1 1 1 0 0 1 1 1 1
    2       2 2 2 1 0 0 1 2 2 2
    3       3 3 2 1 1 1 1 2 3 3
    4       3 3 2 2 2 2 2 2 3 3
    5       3 3 3 3 3 3 3 3 3 3
    6       4 4 4 4 3 3 4 4 4 4
    7       5 5 5 4 3 3 4 5 5 5
    8       6 6 5 4 4 4 4 5 6 6
    9       6 6 5 5 5 5 5 5 6 6
   10       6 6 6 6 6 6 6 6 6 6
   11       7 7 7 7 6 6 7 7 7 7
   12       8 8 8 7 6 6 7 8 8 8
   13       9 9 8 7 7 7 7 8 9 9
   14       9 9 8 8 8 8 8 8 9 9
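The transcript does not preserve the versioning pseudocode, so the following Java fragment is only a guess at the dependency rule the table suggests: a column may advance from version v to v+1 only when every input it reads already holds version v or newer, and the middle columns additionally read a far border that arrives several steps late. All identifiers and the exact update rule are assumptions; the printed output will not reproduce the slide's table exactly.

import java.util.Arrays;

// Illustrative-only sketch of version bookkeeping, not the slides' algorithm.
public class VersioningSketch {
    static final int COLS = 10;
    static final int RATIO = 3;               // grid-to-local latency ratio
    static int[] version = new int[COLS];     // version currently held per column
    static int farBorderVersion = 0;          // newest far-border version received

    static boolean canAdvance(int c) {
        int v = version[c];
        if (c > 0 && version[c - 1] < v) return false;        // left input stale
        if (c < COLS - 1 && version[c + 1] < v) return false; // right input stale
        boolean readsFarBorder = (c == COLS / 2 - 1 || c == COLS / 2);
        return !readsFarBorder || farBorderVersion >= v;      // WAN input stale?
    }

    public static void main(String[] args) {
        for (int t = 1; t <= 14; t++) {
            for (int c = 0; c < COLS; c++) {
                if (canAdvance(c)) version[c]++;
            }
            // Model the WAN: the far border sent RATIO steps ago arrives now.
            if (t > RATIO) farBorderVersion = t - RATIO;
            System.out.println("t=" + t + "  " + Arrays.toString(version));
        }
    }
}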

LHRP Algorithm Review

Node types: Internal, Border, Buffer.

Far-border transfer; Buffer Node versioning system.

Estimated Algorithm Performance

G: grid latency. I: internal (LAN) latency. B: number of data tuples used by the Buffer node. W: total amount of work for all CPUs. C: number of CPUs doing non-redundant work.

Estimated Algorithm Performance

[Chart: estimated total time to compute one cycle vs. compute time in nanoseconds per subcell, comparing LH and LHRP.]

[Chart: calculated total time vs. grid latency, comparing LH and LHRP.]

Estimated Algorithm Performance

Process   LH      LHRP
0         35724   41856
1         7408    8228
2         7408    10932
3         7408    10928
4         7412    8528

Experimental Results: Memory Footprint

A 21% increase in memory use over the conventional form of latency hiding.

Causes: the extra matrix in the Buffer node that stores old column versions, and the extra far-border buffers.

Experimental Results: Performance

[Chart: total compute time in seconds vs. grid latency in ms (0 to 200), comparing LH and LHRP average, min, and max.]

[Chart: total compute time vs. compute time in ns per subcell, comparing LH and LHRP average, min, and max.]

PGAFramework: objective, design requirements, implementation technology choices, API design, an API work-pool example, and other API features (the synchronization option and the recursive option).

PGAFramework

Objective: to create an efficient parallel application framework for the Grid that lets the user programmer interact easily with Grid resources.

Design requirements: platform independence; self-deployment; an easy-to-use interface; and the following services provided without extra effort from the user programmer: load balancing, scheduling, fault tolerance, and latency/bandwidth tolerance.

Design: PGAFramework

[Architecture diagram: the user's applications sit on top of the API (interface); beneath it the framework provides load balancing, scheduling, fault tolerance, and latency/bandwidth tolerance; the framework runs on Globus and a job scheduler such as Condor, which manage the hardware resources.]

Deployment

[Deployment diagram: a job-submit node running the scheduling service and resource discovery dispatches work, possibly through GridWay, to Globus gateways; behind each gateway a local job scheduler (Condor, PBS, or SGE) manages desktop PCs, cluster computer nodes, or supercomputers.]

Implementation: Java, for platform independence; and JXTA (JXSE), a peer-to-peer API that provides tools to work around NATs and firewalls and supports library and module loading at runtime.
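For context on that peer-to-peer layer: joining the JXTA network in JXSE conventionally goes through NetworkManager, roughly as below. The peer name is a placeholder, and the exact API should be checked against the JXSE version actually used.

import net.jxta.platform.NetworkManager;
import net.jxta.peergroup.PeerGroup;

// Minimal JXSE start-up sketch: create an edge peer and join the net peer group.
public class JxtaStartup {
    public static void main(String[] args) throws Exception {
        NetworkManager manager =
                new NetworkManager(NetworkManager.ConfigMode.EDGE, "PGAFrameworkPeer");
        PeerGroup netPeerGroup = manager.startNetwork();  // joins the JXTA network
        System.out.println("Joined peer group: " + netPeerGroup.getPeerGroupName());
        manager.stopNetwork();                            // leave the network cleanly
    }
}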

API

Motivation for API Design

Video codecs: a codec follows an interface; what happens inside the codec does not matter, but its input and output must be specified. See the sketch after this slide.

[Diagram: a video player displays a GUI, loads a file, and outputs video to the screen; interchangeable codec modules (MPEG, Ogg, H.264) take an encoded stream, e.g. MPEG, and return raw video data.]
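To make the analogy concrete, here is a tiny, invented codec interface of the sort the slide implies; the player would depend only on this contract, never on a particular codec, just as the framework depends only on its template interfaces.

// Hypothetical codec contract: the player sees only this interface.
public interface VideoCodec {
    // Decode one encoded chunk (e.g. from an MPEG stream) into raw frame data.
    byte[] decodeChunk(byte[] encodedChunk);
}

// Any codec (MPEG, Ogg, H.264) can be dropped in, exactly as a user module
// is dropped into the PGAFramework.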

PGAFramework API: there may be multiple "template" APIs; each API consists of interfaces that the user implements; the user then "inserts" his module into the framework.

API

[Diagram: the user module's loop is: get data from the framework, compute on the data, return the processed data, and optionally request a sync. The data source gives data to the framework; the data sink gets data from the framework and stores or pipes it. The framework itself schedules processes on the resources, loads user data, creates the network, determines topology and network behavior, sends the user process to the compute nodes, gets data back from the user class, sends it to the master node, and repeats the process in a loop until done.]

API Sample Code

public interface GridAppTemplate {
    // Called on compute nodes: process one unit of data.
    public Data Compute(Data input);
    // Called on the master: produce the data unit for the given segment.
    public Data DiffuseData(long segment);
    // Called on the master: collect one processed result.
    public void GatherData(long segment, Data dat);
    // Total number of data units to diffuse.
    public long getDataCount();
}

API Sample Code

public class myModule implements GridAppTemplate {
    double x1, x2, y1, y2;        // bounding box to sample random points from
    double total;                 // running count of points inside the circle
    long random_samples;
    final long data_count = 100;

    public myModule(double x1_arg, double x2_arg,
                    double y1_arg, double y2_arg, long rad_smp) {
        x1 = x1_arg;
        x2 = x2_arg;
        y1 = y1_arg;
        y2 = y2_arg;
        total = 0;
        random_samples = rad_smp;
    }

    @Override
    public Data Compute(Data data) {
        // Convert the generic object to my object.
        MyData dat = (MyData) data;
        MyOutputData output = new MyOutputData();
        output.inside = 0;
        // Compute: is the random point inside the unit circle?
        double dist = Math.sqrt(dat.random_x * dat.random_x
                              + dat.random_y * dat.random_y);
        if (dist < 1.0) {
            output.inside = 1L;
        }
        return output;
    }

    @Override
    public Data DiffuseData(long segment) {
        MyData d = new MyData();
        d.random_x = Math.random() * (x2 - x1) + x1;
        d.random_y = Math.random() * (y2 - y1) + y1;
        return d;
    }

    @Override
    public void GatherData(long segment, Data dat) {
        MyOutputData data = (MyOutputData) dat;
        total += data.inside;
    }

    @Override
    public long getDataCount() {
        return random_samples;
    }

    public double getPi() {
        // Fraction of samples inside the quarter circle, times 4.
        double pi = (total / random_samples) * 4;
        return pi;
    }
}


API Sample Code

public class UserApplication {
    public static void main(String[] args) throws UserModNotSetException {
        // Instantiate the custom module that follows the GridAppTemplate
        // interface. The user can use the constructor, or some other means,
        // to set parameters such as file paths and options that tweak an
        // algorithm.
        myModule mod = new myModule(0.0, 1.0, 0.0, 1.0, 10000);

        // Submit the module to the network.
        NetworkDeployer deployer = new NetworkDeployer(mod);

        // Start the network.
        deployer.startNetwork();

        double pi = mod.getPi();
        System.out.println("PI is: " + pi);
    }
}

Synchronization option

RemoteHandler provides an interface for synchronizing data.

Data is synced non-blocking; the user creates blocking procedures if needed (one way to do that is sketched after the example).

public class myRemoteHandler implements RemoteEventHandler {
    mySyncData chunk;

    public myRemoteHandler(Data chunk) {
        this.chunk = (mySyncData) chunk;
    }

    @Override
    public void SyncDone(SyncData piece) {
        // Callback fired by the framework when a piece has been synchronized.
        chunk.setPiece(piece);
    }
}
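Because SyncDone is a non-blocking callback, a user who needs to block until a piece arrives must build the wait himself. A minimal sketch using Java's built-in wait/notify, reusing the same (hypothetical) framework types as the sample above:

// Illustrative only: block a consumer thread until the framework's
// SyncDone callback delivers the piece.
public class BlockingSyncExample implements RemoteEventHandler {
    private SyncData received;          // set by the callback thread

    @Override
    public synchronized void SyncDone(SyncData piece) {
        received = piece;
        notifyAll();                    // wake any thread blocked in awaitPiece()
    }

    public synchronized SyncData awaitPiece() throws InterruptedException {
        while (received == null) {
            wait();                     // releases the lock until SyncDone fires
        }
        return received;
    }
}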

Recursive Feature

Allows multiple levels of parallelization (granularity). A hypothetical sketch follows the diagram summary.

[Diagram: a pipeline decodes video, cuts the raw video into pictures, and blurs the pictures; the blur stage is itself parallelized, blurring portions of each picture with a work-pool or synchronous pattern.]
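The slides show no code for this feature, so the following is only a hypothetical illustration of the idea, reusing the GridAppTemplate and NetworkDeployer names from the earlier samples; BlurPortionModule stands in for an inner, finer-grained module analogous to myModule.

// Hypothetical illustration of recursive parallelism: an outer pipeline
// stage whose Compute fans work out to an inner, finer-grained module.
public class BlurStage implements GridAppTemplate {
    @Override
    public Data Compute(Data picture) {
        try {
            // Inner level: an imagined work pool over portions of this picture.
            GridAppTemplate innerPool = new BlurPortionModule(picture);
            NetworkDeployer innerDeployer = new NetworkDeployer(innerPool);
            innerDeployer.startNetwork();   // framework distributes the portions
            return ((BlurPortionModule) innerPool).assembledPicture();
        } catch (Exception e) {
            throw new RuntimeException(e);  // sketch: real code would handle this
        }
    }

    @Override public Data DiffuseData(long segment) { return null; /* fed by pipeline */ }
    @Override public void GatherData(long segment, Data dat) { /* fed to next stage */ }
    @Override public long getDataCount() { return 0; }
}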

Related Work

MPI implementations for the Grid: MPICH-G2, GridMPI, MPICH-V2 (and MPICH-V1).

Peer-to-peer parallel frameworks: P2PMPI (for cluster computing), P3 (for cluster computing).

Self-deploying frameworks: Jojo.

Conclusions

Parallel Applications on the Grid

Latency Hiding by Redundant Processing (LHRP)

PGAFramework

Related work
