
MiG Projects

DM75

Runtime Environment DB

• Problem:
  – Many Grid jobs assume that the binaries and libraries are already available
  – Even if this is the case, where are those files placed?

• Solution:
  – Build a database for maintaining runtime environments
  – Define rules for environment settings
  – Allow automatic testing for correctness of those settings

Runtime Environment DB

• Example: POV-Ray
  – Define POV_RAY_EXE_PATH
    • /usr/local/bin/pov34/bin
  – Define POV_RAY_LIB_PATH
    • /usr/local/bin/pov34/include
  – Test correctness
    • which povray34 = /usr/local/bin/povray34/bin/povray34?
    • find $POV_RAY_LIB_PATH colors.inc = lib/colors.inc
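
The correctness tests on this slide could be automated roughly as follows. This is only a sketch: the `RuntimeEnvironment` record and the `check_environment` function are hypothetical names invented for illustration and are not part of MiG. It assumes each environment entry names a variable, the directory it points to, and a program or file that must be found there.

```python
import shutil
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class RuntimeEnvironment:
    """Hypothetical record for one runtime environment entry (not a MiG format)."""
    name: str
    variables: dict                                   # variable name -> directory
    executables: dict = field(default_factory=dict)   # variable -> program expected in that directory
    files: dict = field(default_factory=dict)         # variable -> file expected somewhere below it


def check_environment(env: RuntimeEnvironment) -> list:
    """Return a list of problems; an empty list means every test passed."""
    problems = []
    for var, program in env.executables.items():
        directory = env.variables.get(var)
        if directory is None:
            problems.append(f"{var} is not defined")
        elif shutil.which(program, path=directory) is None:
            # Same idea as the `which` test on the slide, restricted to one directory.
            problems.append(f"{program} not found in {directory}")
    for var, filename in env.files.items():
        directory = env.variables.get(var)
        if directory is None:
            problems.append(f"{var} is not defined")
        elif not any(Path(directory).rglob(filename)):
            # Same idea as the `find` test on the slide.
            problems.append(f"{filename} not found under {directory}")
    return problems
```

A resource would pass when `check_environment` returns an empty list. Returning the list of problems rather than a boolean lets the database report which rule failed.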

Remote File Access Proxy

• Problem
  – Not all systems have HTTPS access from the nodes

• Solution
  – Assuming that the nodes and the front-end can communicate, place a proxy server on the front-end

Remote File Access Proxy

[Figure: direct access compared with proxy access]

Remote File Access Proxy

• Issues
  – Security token handling
  – Performance

• Potential
  – Caching
  – Prefetching without CPU interference

Resource specification detection

• Problem
  – A resource is defined by a large set of parameters:
    • Architecture, memory, disk space, …
    • Access rights, user-id, node-access, queue-system
    • Runtime environments

• Solutions
  – Have the sysadm add all the information automatically
  – Run a program that identifies as many components as possible

Resource specification detection

• Examples
  – OS = `uname`
  – if OS == 'Linux'
    • cat /proc/cpuinfo | grep CPU | awk '{print $4}'
    • tempdrive = `mount | grep /tmp`
    • if tempdrive == '' tempdrive = '/'
    • space = `df $tempdrive`
    • gcc_ver = `gcc -v`
    • if gcc_ver != '' gcc_env = gcc_ver
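
The detection steps above might look like this as a small program. It is a sketch under assumptions: the function name `detect_resource` and the dictionary keys are made up for illustration, and it uses `gcc -dumpversion` instead of the `gcc -v` on the slide because its output is a single version string.

```python
import os
import platform
import shutil
import subprocess


def detect_resource(temp_dir: str = "/tmp") -> dict:
    """Collect a few resource parameters; the keys are illustrative, not a MiG format."""
    spec = {
        "os": platform.system(),             # same information as `uname`
        "architecture": platform.machine(),
        "cpu_count": os.cpu_count(),
    }

    # Fall back to the filesystem root if the temp directory does not exist.
    if not os.path.isdir(temp_dir):
        temp_dir = os.path.abspath(os.sep)
    spec["temp_dir"] = temp_dir
    spec["temp_free_bytes"] = shutil.disk_usage(temp_dir).free

    # Record a gcc runtime environment only if gcc is installed and answers.
    spec["gcc_version"] = None
    gcc = shutil.which("gcc")
    if gcc is not None:
        result = subprocess.run([gcc, "-dumpversion"], capture_output=True, text=True)
        if result.returncode == 0:
            spec["gcc_version"] = result.stdout.strip()
    return spec
```

Calling `detect_resource()` on a node returns a dictionary that a sysadm could review before it is registered. Access rights and queue-system settings are not covered here, since they cannot be probed as easily.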

Monitor

• We would like to make nice presentations of the state of MiG
  – # users
  – # jobs
  – # resources
    • ID of resources that are not anonymous
  – Estimated time to start execution

• All sorted, filtered and presented as the user requests

Accounting

• We need to do reliable accounting

• When a job is submitted to a queue the server must ask a bank to deposit credits corresponding to the maximum use

• After execution the server must ask to be given the credits corresponding to the resources that were actually used

Accounting

[Figure: messages between Server and Bank for a job requesting 10h and 1GB mem. Labels: Reserve (10,1), Confirm, Run Job, Actual use (<= (10,1)), Debit (x,y), Confirm]
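
The reserve-then-debit exchange can be sketched with a toy in-memory bank. Everything here is an assumption made for illustration: the `Bank` class, its method names and the `credits` pricing rule (one credit per hour per GB of memory) are not taken from MiG, and the sketch does not address persistence, authentication or crash recovery.

```python
class Bank:
    """Toy in-memory bank; a real one would need persistence and authentication."""

    def __init__(self, balances):
        self.balances = dict(balances)   # user -> available credits
        self.reserved = {}               # job_id -> (user, reserved credits)

    def reserve(self, job_id, user, max_credits):
        """Set aside credits for the maximum use; refuse if the user cannot cover it."""
        if self.balances.get(user, 0) < max_credits:
            return False
        self.balances[user] -= max_credits
        self.reserved[job_id] = (user, max_credits)
        return True

    def debit(self, job_id, used_credits):
        """Charge the actual use and hand back the unused part of the reservation."""
        user, max_credits = self.reserved.pop(job_id)
        charged = min(used_credits, max_credits)   # never charge more than was reserved
        self.balances[user] += max_credits - charged
        return charged


def credits(hours, gigabytes):
    """Hypothetical price: one credit per hour per GB of memory."""
    return hours * gigabytes
```

With a starting balance of 25 credits, `reserve("job-1", "alice", credits(10, 1))` sets aside 10 credits before the job runs. If the job then uses 6 hours, `debit("job-1", credits(6, 1))` charges 6 and returns the remaining 4 to the user.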

Accounting

• Secure

• Reliable

• What happens if
  – The job crashes?
  – The server crashes?
  – The bank crashes?

Grid Units

• We need to be able to define the performance of a system
  – Processing speed
  – IO performance
  – Networking performance

• Units:
  – Generic single CPU
    • Balanced CPU speed and IO
  – Generic MPP
    • Balance of all 3
  – Each of the 3 individually

Grid Units

• The definition of a system should be determined automatically by a program

• A user should be able to run an application and get an idea of the Grid units it uses
  – time a.out
    • Tells us disk need and CPU need
    • Determining network dependency is harder!!!
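
A measurement in the spirit of `time a.out` could be taken as below. This is a sketch for Unix-like systems only, since it relies on Python's `resource` module; the function name `measure` and the returned keys are illustrative. The block I/O counters indicate disk activity rather than disk space, and converting any of these numbers into Grid units is left open, as on the slide.

```python
import resource
import subprocess
import sys
import time


def measure(command):
    """Run a command and report its wall time, CPU time and block I/O counts."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.perf_counter()
    completed = subprocess.run(command)
    wall = time.perf_counter() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    return {
        "exit_code": completed.returncode,
        "wall_seconds": wall,
        # User plus system CPU time consumed by the child process.
        "cpu_seconds": (after.ru_utime - before.ru_utime)
                       + (after.ru_stime - before.ru_stime),
        # Block input/output operations, as counted by the operating system.
        "blocks_read": after.ru_inblock - before.ru_inblock,
        "blocks_written": after.ru_oublock - before.ru_oublock,
    }
```

For example, `measure(["./a.out"])` would report the figures for one run of the user's program. Network use is not captured, which matches the remark that network dependency is harder to determine.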

Applications

• Dalton

• POV-Ray

• BLAST

• Others…

Dalton

• Very important application in chemistry

• Fairly small input files

• Fairly small output files

• Huge runtime

• Local expertise

• Very well suited for a Web-portal!!!

Dalton Portal

POV-Ray

• Popular

• Simple

• Can be parallelized using Grid

• Fairly small input

• Medium to small output

• Very well suited for a Web-portal!!!

POV-Ray

BLAST

• Very important
  – Right now these guys eat a lot of the time on Horseshoe

• National expertise

• Large input files

• Small output files

• Should be scriptable

• But portals are also interesting

Bio-BLAST

[Figure: E. coli vs. Human; labels: 1GB, 512 MB]

Shared data-structures for Grid

• There are many scenarios where Grid jobs could communicate through shared data-structures

• Examples
  – Single variables
  – Bounded buffers
  – Arrays
  – Objects

• All access must be secure!!!

Interfacing with other Grid Implementations

• It is interesting for MiG to accept other Grids as
  – Users
  – Resources

• Examples:
  – NorduGrid
  – gLite
  – Gridbus
  – Unicore
  – OfficeGrid

Supporting more Queuing systems

• Different resources use different queuing systems

• Examples
  – PBS/Torque
  – LSF
  – LoadLeveler
  – OfficeGRID

Programmers API

• It is interesting for programmers to be able to Grid-enable their applications directly
  – Access Grid files
  – Submit jobs
  – Retrieve results

• For this a library with these features must be designed and implemented

Statistics

• Just like monitoring, it is interesting to obtain statistics on Grid

• Examples
  – Usage
  – # users
  – Turn-over time
  – Activation time
  – # resources
  – etc…

Many others

• Including your own proposals!