Kento Aida, Tokyo Institute of Technology Grid Challenge - programming competition on the Grid - Kento Aida Tokyo Institute of Technology 22nd APAN Meeting in Singapore


Grid Challenge - programming competition on the Grid -

Kento Aida

Tokyo Institute of Technology

22nd APAN Meeting in Singapore


What is Grid Challenge?

- programming competition to develop high-performance programs on the Grid
  - The organizer operates a Grid testbed.
  - Participants develop/run programs on the testbed.
- a special event in the Annual Symposium on Advanced Computing Systems and Infrastructures (SACSIS)
- history
  - 1st Grid Challenge in SACSIS 2005
  - 2nd Grid Challenge in SACSIS 2006


Category

- compulsory
  - programming competition on the Grid testbed
  - solving the problem provided by the organizer
    - Graph Partitioning Problem
  - students (university and high school)
- free
  - giving opportunities to perform experiments on the Grid
  - presentations during the conference
  - students, engineers and researchers


Compulsory

- Graph Partitioning Problem
  - For a given undirected graph G(V, E) with |V| = 2n, L and R are disjoint partitions generated by equally dividing G, where |L| = |R|.
  - Find the partition that minimizes the number of edges with one endpoint in L and the other in R.

[Figure: example graph with vertices 1 to 6 divided into partitions L and R]
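
To make the objective above concrete, the following is a minimal sketch that counts the edges crossing between L and R and finds the best equal split by exhaustive search. It is only an illustration: the edge list is a hypothetical six-vertex graph (the edges of the slide's figure are not recoverable), the function names are invented here, and exhaustive search is feasible only for tiny graphs, far below the competition's problem sizes.

```python
from itertools import combinations

def cut_size(edges, left):
    """Count edges with one endpoint in `left` and the other outside it."""
    left = set(left)
    return sum((u in left) != (v in left) for u, v in edges)

def min_bisection(vertices, edges):
    """Try every equal-size split and return (cut, L, R) with the smallest cut."""
    vertices = sorted(vertices)
    half = len(vertices) // 2
    first = vertices[0]  # pin one vertex in L so mirrored splits are not repeated
    best_cut, best_left = None, None
    for rest in combinations(vertices[1:], half - 1):
        left = {first, *rest}
        c = cut_size(edges, left)
        if best_cut is None or c < best_cut:
            best_cut, best_left = c, left
    return best_cut, best_left, set(vertices) - best_left

# Hypothetical graph: two triangles (1-2-3 and 4-5-6) joined by the edge 3-4.
EDGES = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
cut, L, R = min_bisection(range(1, 7), EDGES)
print(cut, sorted(L), sorted(R))  # 1 [1, 2, 3] [4, 5, 6]
```

For this graph the only edge crossing the best split is 3-4, so the minimum cut is 1.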


Compulsory (cont’d)

- qualifying runs (3 weeks): Solve early!
  - to find a solution within a given threshold
  - shared resources
  - problem size: |V| = 500 - 1500
- final runs (2 weeks): Solve fast!
  - dedicated time slots for finalists (2.5 h per team)
  - to find a solution within a given period (10 min)
  - A finalist with the best solution will be a winner!
  - problem size: |V| = 30000 - 35000
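
The final-run rule (return the best solution found within a fixed period) suggests an "anytime" search that always holds a valid partition and stops at a deadline. The sketch below is one such approach under stated assumptions: a random swap-based local search chosen for illustration, not a method used by any participant, with hypothetical function names and a tiny hypothetical graph.

```python
import random
import time

def cut_size(edges, left):
    """Count edges that cross between `left` and the remaining vertices."""
    return sum((u in left) != (v in left) for u, v in edges)

def timed_local_search(vertices, edges, budget_s, seed=0):
    """Improve a random equal split by pair swaps until the time budget runs out."""
    rng = random.Random(seed)
    order = list(vertices)
    rng.shuffle(order)
    half = len(order) // 2
    left, right = order[:half], order[half:]
    best = cut_size(edges, set(left))
    deadline = time.monotonic() + budget_s
    while best > 0 and time.monotonic() < deadline:
        i, j = rng.randrange(len(left)), rng.randrange(len(right))
        left[i], right[j] = right[j], left[i]      # try swapping one pair
        trial = cut_size(edges, set(left))
        if trial < best:
            best = trial                           # keep the improvement
        else:
            left[i], right[j] = right[j], left[i]  # undo the swap
    return best, set(left), set(right)

# Hypothetical graph: two triangles joined by one edge.
EDGES = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
best, L, R = timed_local_search(range(1, 7), EDGES, budget_s=0.05)
print(best, sorted(L), sorted(R))
```

Because each step swaps one vertex from each side, |L| = |R| holds at every moment, so whatever partition is held when the deadline arrives is a valid answer.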


Free

- experiments of research projects (1 month)
  - shared resources
- projects
  - tools
    - a monitoring tool, a message passing system, a programming tool, volunteer computing
  - applications
    - physics simulation, bioinformatics, simulation of diesel engine, optimization problems


Participants

[Figure: two pie charts showing the number of participants per group in each category]

- compulsory: D 2, M 12, U 6, H 1
- free: D 2, M 5, U 1


Testbed

- Grid Challenge Federation
  - AIST
  - Tokyo Institute of Technology
  - The University of Tokyo
  - Doshisha University

more than 1,200 CPUs


Resources

- collection of PC clusters
- spec of a PC cluster
  - a gateway node
    - gateway, compiling
  - computing nodes
    - computation
  - global IP address/private IP address
  - NFS
    - “/home” is shared among nodes


Resources (cont’d)

| name | site | computing node | # computing nodes (# CPUs) |
|---|---|---|---|
| F32 | AIST (Tsukuba) | Xeon 3GHz x2, 4GB mem., 1000BASE-T | 128 (256) |
| SAKURA | AIST (Tsukuba) | Opteron 1.8GHz x2, 3GB mem., 1000BASE-T | 16 (32) |
| DIS | TITECH (Yokohama) | Athlon MP 2000+ 1.6GHz x2, 512MB mem., 100BASE-TX | 50 (100) |
| PrestoIII | TITECH (Tokyo) | Opteron 246/242 2/1.6GHz x2, 4/3/2GB mem., 1000BASE-T | 103 (206) |
| Tau | U. Tokyo (Tokyo) | Xeon 2.4/2.8GHz x2, 2GB mem., 1000BASE-T | 175 (350) |
| Chikayama | U. Tokyo (Chiba) | Xeon 2.4GHz x2, 2GB mem., 1000BASE-T | 64 (128) |
| Xenia | Doshisha U. (Kyoto) | Xeon 2.4GHz x2, 1GB mem., 100BASE-TX | 63 (126) |


Internet Connection

[Figure: network diagram of the clusters F32, SAKURA, PrestoIII, Chikayama, Tau, DIS and Xenia and the networks TsukubaWAN, SINET and WIDE]


Software

- Grid middleware
  - Globus Toolkit 2.4
- batch queueing system
  - Sun Grid Engine, PBS
- remote process invocation
  - SSH, GXP
- monitoring
  - Ganglia
- programming
  - MPICH 1.2.7, Ninf-G 2.4


GXP

- shell for distributed multi-cluster environment
  - fast simultaneous command submissions
  - parallel job pipes
  - interactive selection of nodes to execute commands
- no cumbersome per-node operations!
  - installation and deployment
  - invocation of parallel processes
  - monitoring, trouble diagnosis, debugging
  - dead processes clean-up

http://www.logos.ic.i.u-tokyo.ac.jp/phoenix/gxp_quick_man.shtml
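
The idea of submitting one command to many nodes at once can be sketched in a few lines. This is not GXP's implementation or command syntax; it is a generic illustration that assumes the standard `ssh` client for real use, and the helper names (`run_everywhere`, `ssh_argv`, `local_argv`) and host names are hypothetical. A local stand-in replaces `ssh` so the sketch runs without a cluster.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

def ssh_argv(host, command):
    """Build the argv that runs `command` on `host` through the ssh client."""
    return ["ssh", host, command]

def run_everywhere(hosts, command, make_argv=ssh_argv, max_workers=32):
    """Run one command on every host concurrently; return {host: (exit_code, stdout)}."""
    def run_one(host):
        proc = subprocess.run(make_argv(host, command),
                              capture_output=True, text=True)
        return host, (proc.returncode, proc.stdout.strip())
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(run_one, hosts))

def local_argv(host, command):
    """Stand-in for ssh (assumption for the demo): print the request locally."""
    return [sys.executable, "-c", f"print({host!r} + ': ' + {command!r})"]

results = run_everywhere(["node01", "node02"], "uptime", make_argv=local_argv)
print(results)
```

Passing `make_argv=ssh_argv` (the default) would issue the same command over SSH to each host, which presumes key-based login is already set up.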


Ninf-G

- reference implementation of GridRPC
  - GridRPC: a simple RPC-based programming model for the Grid
  - Client invokes remote libraries installed on remote servers on the Grid.
  - utilizing task parallelism

http://ninf.apgrid.org/

[Figure: a client program calls grpc_call(…) to send data to libraries in server programs on remote servers and receives the results]
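
The task-parallel pattern described above (a client issues independent calls to several servers and collects the results) can be mimicked locally. The sketch below is an analogy only: it does not use the GridRPC or Ninf-G API, the "remote library" is an ordinary local function, and all names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def remote_library(server, data):
    """Stand-in for a library routine installed on `server`; it runs locally here."""
    return server, sum(x * x for x in data)

def call_all(servers, chunks):
    """Issue one call per server concurrently, then wait for every result."""
    with ThreadPoolExecutor(max_workers=len(servers)) as pool:
        futures = [pool.submit(remote_library, s, c)
                   for s, c in zip(servers, chunks)]
        return [f.result() for f in futures]

results = call_all(["serverA", "serverB"], [[1, 2, 3], [4, 5]])
print(results)  # [('serverA', 14), ('serverB', 41)]
```

Each call is independent of the others, which is what lets the client overlap them instead of waiting for one server at a time.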


Ganglia

- a distributed monitoring tool for high-performance computing systems such as PC clusters and Grids
  - CPU load
  - memory usage
  - network traffic

http://ganglia.sourceforge.net/


Operation

- The testbed is operated by volunteers!
  - researchers/technical staff/students
- What we need to do
  - installation and its training for students
  - user management
  - job management


User Management

- local account
  - the same UID and login name for a user on all sites
  - remote login via ssh
    - public key
- Globus account
  - temporary CA for the Grid Challenge
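
Keeping the same UID and login name on all sites is easy to verify mechanically. The following is a minimal sketch, assuming each site's accounts have already been collected into a login-to-UID table; the site names, logins, UIDs and the function name are all hypothetical.

```python
from collections import defaultdict

def find_uid_mismatches(site_accounts):
    """Return {login: {site: uid}} for logins missing at a site or with differing UIDs."""
    by_login = defaultdict(dict)
    for site, accounts in site_accounts.items():
        for login, uid in accounts.items():
            by_login[login][site] = uid
    sites = set(site_accounts)
    return {
        login: uids
        for login, uids in by_login.items()
        if set(uids) != sites or len(set(uids.values())) > 1
    }

ACCOUNTS = {  # hypothetical login -> UID tables, one per site
    "siteA": {"team01": 5001, "team02": 5002},
    "siteB": {"team01": 5001, "team02": 6002},
    "siteC": {"team01": 5001},
}
mismatches = find_uid_mismatches(ACCOUNTS)
print(mismatches)  # {'team02': {'siteA': 5002, 'siteB': 6002}}
```

Here `team02` is reported because its UID differs between two sites and it is missing from the third, while `team01` is consistent everywhere.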


Job Management

- interactive or batch
  - All sites provide both environments for job execution.
- dedicated slot
  - Finalists are assigned dedicated slots for their application runs.
  - the gentlemen’s agreement


Troubles …

- computing nodes
  - OS hang-ups, trouble with hard disk drives
- power supply
  - failure of balancing power supply
- servers
  - trouble with NFS and batch queueing systems
- monitoring
  - trouble collecting monitoring data with Ganglia


Troubles … (cont’d)

- jobs being out of control
  - waste of CPU/memory resources by jobs being out of control
- dedicated slots
  - jobs running beyond their slots


Operational Issue

- trouble on computing nodes
  - monitoring tools to identify the computing nodes in trouble
- power supply
  - critical problem for small groups, e.g., a lab in a university
  - tools for power monitoring
  - low-power processor
- servers
  - redundancy


Operational Issue (cont’d)

- user/process management
  - tools to control user processes
    - monitoring user processes
    - detecting unusual behavior
    - suspending/killing jobs being out of control
  - tools for reservation
    - reserving dedicated slots for users
    - controlling user jobs
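
As one small illustration of controlling jobs that run beyond their slot, the sketch below launches a job and kills it when its time slot ends. It is a single-machine example built on Python's standard `subprocess` module, not a tool used on the testbed; the function name is hypothetical, and it only terminates the direct child process, not any processes that child may have spawned.

```python
import subprocess
import sys

def run_within_slot(argv, slot_seconds):
    """Run a job; kill it if it is still running when its slot ends."""
    try:
        proc = subprocess.run(argv, timeout=slot_seconds)
        return ("finished", proc.returncode)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child and waits for it before raising.
        return ("killed", None)

# A job that would overrun a 0.5 s slot, and one that finishes in time.
overrun = run_within_slot([sys.executable, "-c", "import time; time.sleep(5)"], 0.5)
on_time = run_within_slot([sys.executable, "-c", "pass"], 30)
print(overrun, on_time)  # ('killed', None) ('finished', 0)
```

A production tool would also need to track jobs across nodes and clean up whole process groups, which this sketch leaves out.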


Snapshots

[Photos: qualifying runs; final runs]


Snapshots (cont’d)


Conclusions

- Grid Challenge is a programming competition to develop high-performance programs on the Grid.
  - compulsory and free categories
- Grid testbed for Grid Challenge
  - 6 sites, 7 PC clusters, >1200 CPUs
  - Globus, SGE, PBS, GXP, Ganglia, Ninf-G, MPICH, …
- discussion about operational issues
  - tools for monitoring, power supply, user/process management


Acknowledgements

- Information Processing Society of Japan
- Sun Microsystems
- Soum Corporation
- Grid Consortium Japan


Thank you.