2
Synopsis

Two classes of talks and posters
➟ Computer hardware
  ▓ Dominated by cooling / power consumption
  ▓ Mostly in the plenary sessions
➟ Software
  ▓ Grid job / workload management systems
    Job submission by the experiments
    Site job handling, monitoring
    Grid operations (Monte Carlo production, glexec, interoperability, …)
    Data integrity checking
    …
  ▓ Storage systems
    Primarily concerning dCache and DPM
    Distributed storage systems

Parallel session: Grid middleware and tools
3
Computing hardware
Power requirements of LHC computing
➟ Important for running costs
  ▓ ~330 W must be provisioned for every 100 W of electronics
➟ Some sites running with air- or water-cooled racks
Power budget per 100 W of electronics:

  Electronics              100 W
  Server fans               13 W
  Voltage regulation        22 W
  Case power supply         48 W
  Room power distribution    4 W
  UPS                       18 W
  Room cooling             125 W
  Total                    330 W
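The ~330 W figure is just the sum of the power path above; a quick sketch re-deriving it (values taken from the table):

```python
# Power drawn at each stage of the path, per 100 W of useful electronics,
# taken from the table above.
overheads_w = {
    "Electronics":             100,
    "Server fans":              13,
    "Voltage regulation":       22,
    "Case power supply":        48,
    "Room power distribution":   4,
    "UPS":                      18,
    "Room cooling":            125,
}

total_w = sum(overheads_w.values())      # grid power to provision
useful_w = overheads_w["Electronics"]    # power reaching the chips

print(total_w)                           # 330
print(round(total_w / useful_w, 2))      # 3.3 W provisioned per useful W
```

The 3.3 ratio is effectively the room's power usage effectiveness including in-server losses, which is why cooling dominates the plenary discussions.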
4
High performance and multi-core computing

Core frequencies ~2-4 GHz, will not change significantly

Power
➟ 1,000,000 cores at 25 W / core = 25 MW
  ▓ Just for the CPUs
➟ Core power has to come down by multiple orders of magnitude
  ▓ Reduces chip frequency, complexity, capability

Memory bandwidth
➟ As we add cores to a chip, it is increasingly difficult to provide sufficient memory bandwidth
➟ Application tuning to manage memory bandwidth becomes critical

Network and I/O bandwidth, data integrity, reliability
➟ A petascale computer will have petabytes of memory
➟ Current single file servers achieve 2-4 GB/s
  ▓ 70+ hours to checkpoint 1 petabyte
➟ I/O management is a major challenge

Memory cost
➟ Can't expect to maintain current memory-per-core figures at petascale
  ▓ 2 GB/core for ATLAS / CMS
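The "70+ hours" checkpoint figure follows directly from the quoted server throughput; a sketch of the arithmetic, taking the optimistic 4 GB/s end of the 2-4 GB/s range:

```python
# Time to checkpoint 1 PB of memory through a single file server,
# at the best case of the quoted 2-4 GB/s range.
PB = 1e15        # bytes in a petabyte (decimal)
rate = 4e9       # bytes/s for one file server, optimistic end

seconds = PB / rate          # 250,000 s
hours = seconds / 3600       # ~69.4 h; at 2 GB/s this doubles to ~139 h
print(round(hours, 1))
```

Even in the best case a full-memory checkpoint takes roughly three days, which is why the slide flags I/O management as a major challenge.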
5
Grid job submission

Most new developments were on pilot-agent-based grid systems
➟ Implement job scheduling based on a "pull" scheduling paradigm
➟ The only method for grid job submission in LHCb
  ▓ DIRAC (> 3 years experience)
  ▓ Ganga is the user analysis front end
➟ Also used in ALICE (and PanDA and MAGIC)
  ▓ AliEn since 2001
➟ Used for production, user analysis and data management in LHCb & ALICE
➟ New developments for others
  ▓ PanDA: ATLAS, CHARMM; central server based on Apache
  ▓ glideIn: ATLAS, CMS, CDF; based on Condor
  ▓ Used for production and analysis
➟ Very successful implementations
  ▓ Real-time view of the local environment
  ▓ Pilot agents can have some intelligence built into the system
    Useful for heterogeneous computing environments
  ▓ Recently PanDA chosen for all ATLAS production

One talk on distributed batch systems
6
Pilot agents

Pilot agents submitted on demand
➟ Reserve the resource for immediate use
  ▓ Allows checking of the environment before job scheduling
  ▓ Only bidirectional network traffic
  ▓ Unidirectional connectivity
➟ Terminates gracefully if no work is available
➟ Also called GlideIns

LCG jobs are essentially pilot jobs for the experiment
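The pull behaviour described above can be sketched as a simple loop: land on the worker node, verify the environment, then repeatedly ask a central task queue for matching work, exiting gracefully when none arrives. This is an illustrative sketch only; `check_environment` and the queue are hypothetical stand-ins, not any real DIRAC/PanDA/glideIn API.

```python
import time

def check_environment():
    """Verify disk space, software releases, etc. before pulling any work."""
    return True

def run_pilot(task_queue, max_idle_polls=5, poll_interval=60):
    """Pull-scheduling pilot loop: fetch and run jobs until the queue stays empty."""
    if not check_environment():
        return                          # never pull a job into a broken slot
    idle_polls = 0
    while idle_polls < max_idle_polls:
        # Outbound request only: the pilot contacts the queue, never vice versa.
        job = task_queue.pop(0) if task_queue else None
        if job is None:
            time.sleep(poll_interval)   # no matching work yet; poll again
            idle_polls += 1
            continue
        idle_polls = 0
        job()                           # execute the payload on this slot
```

Because the pilot inspects the slot before binding a payload, scheduling decisions are made with a real-time view of the local environment, which is the main advantage claimed for these systems.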
11
Glexec

A thin layer to change Unix domain credentials based on grid identity and attribute information

Different modes of operation
➟ With or without setuid
  ▓ Ability to change the user id of the final job

Enables the VO to
➟ Internally manage job scheduling and prioritisation
➟ Late-bind user jobs to pilots

In production at Fermilab
➟ Code ready and tested, awaiting full audit
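The core of the credential change is a lookup from grid identity (certificate DN plus VOMS attribute) to a local Unix account. A minimal sketch of that mapping step, with a hypothetical gridmap table; in the real system the mapping is resolved by the site's authorisation plugins and glexec then performs the optional setuid switch:

```python
# Hypothetical gridmap: (certificate DN, VOMS attribute) -> local pool account.
# Both entries below are invented for illustration.
GRIDMAP = {
    ("/DC=ch/DC=cern/CN=Jane Doe", "/atlas/Role=production"): "atlprd01",
    ("/DC=ch/DC=cern/CN=Jane Doe", "/atlas"):                 "atlas042",
}

def map_credentials(dn, voms_attr):
    """Return the Unix account a payload with this grid identity should run as."""
    try:
        return GRIDMAP[(dn, voms_attr)]
    except KeyError:
        raise PermissionError(f"no mapping for {dn} with {voms_attr}")

# In setuid mode glexec would switch to this account's uid before exec-ing
# the payload; without setuid the mapping is only logged for traceability.
```

Late binding works because the pilot runs this mapping only when it receives the user's job, not when the pilot itself was submitted.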
12
LSF universus

[Diagram: a web portal / job scheduler feeding an LSF MultiCluster scheduler, which dispatches to LSF, PBS, SGE and CCE clusters and desktops, each behind its own local LSF scheduler]
13
LSF universus

Commercial extension of LSF
➟ Interface to multiple clusters
➟ Centralised scheduler, but sites retain local control
➟ LSF daemons installed on head nodes of remote clusters
➟ Kerberos for user, host and service authentication
➟ scp for file transfer

Currently deployed in
➟ Sandia National Labs, to link OpenPBS, PBS Pro and LSF clusters
➟ Singapore national grid, to link PBS Pro, LSF and N1GE clusters
➟ Distributed European Infrastructure for Supercomputing Applications (DEISA)
14
Grid interoperability

Many different grids
➟ WLCG, NorduGrid, TeraGrid, …
➟ Experiments span the various grids

Short-term solutions have to be ad hoc
➟ Maintain parallel infrastructures, by the user, the site or both

For the medium term, set up adaptors and translators

In the long term, adopt common standards and interfaces
➟ Important in security, information, CE, SE
➟ Most grids use the X.509 standard
➟ Multiple "common" standards …
➟ GIN (Grid Interoperability Now) group working on some of this
                            OSG         EGEE        ARC
  Security                  GSI/VOMS    GSI/VOMS    GSI/VOMS
  Storage Transfer Protocol GridFTP     GridFTP     GridFTP
  Storage Control Protocol  SRM         SRM         SRM
  Schema                    GLUE v1     GLUE v1.2   ARC
  Service Discovery         LDAP/GIIS   LDAP/BDII   LDAP/GIIS
  Job Submission            GRAM        GRAM        GridFTP
15
Distributed storage
GridPP is organised into 4 regional Tier-2s in the UK

Currently a job follows data to a site
➟ Consider disk at one site as "close" to CPU at another site
  ▓ e.g. disk at Edinburgh vs CPU at Glasgow
➟ Pool resources for efficiency and ease of use
➟ Jobs need to access storage directly from the worker node
16
RTT between Glasgow and Edinburgh ~ 12 ms

Custom RFIO client
➟ Normal: one call per read
➟ Readbuf: fills an internal buffer to service the request
➟ Readahead: reads until EOF
➟ Streaming: separate streams for control & data

Tests using a single DPM server
ATLAS expects ~10 MiB/s per job
Better performance with a dedicated light path
Ultimately a single DPM instance to span the Glasgow and Edinburgh sites
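A quick estimate shows why one synchronous call per read collapses over a WAN link and why the read-ahead variants recover. This assumes the quoted round trip is ~12 ms and a 64 KiB application read size (the read size is an assumption for illustration):

```python
# Latency-limited throughput of a remote-read client over a ~12 ms WAN RTT.
rtt = 0.012                  # s, Glasgow-Edinburgh round trip (assumed 12 ms)
read_size = 64 * 1024        # bytes per application read (assumed)
MiB = 1024 * 1024

# Normal mode: every read pays a full round trip.
naive = read_size / rtt / MiB            # ~5.2 MiB/s, below the ATLAS target

# Readahead: one round trip amortised over a large buffer fill (8 MiB here);
# in practice the link bandwidth, not latency, then becomes the limit.
readahead = (8 * MiB) / rtt / MiB

print(f"naive: {naive:.1f} MiB/s, readahead limit: {readahead:.0f} MiB/s")
```

With one call per read a job cannot reach the ~10 MiB/s ATLAS expects, which motivates the readbuf/readahead/streaming clients tested in the talk.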
17
Data integrity

Large number of components performing data management in an experiment

Two approaches to checking data integrity
➟ Automatic agents continuously performing checks
➟ Checks in response to special events

Different catalogues in LHCb: Bookkeeping, LFC, SE

Issues seen:
➟ zero size files
➟ missing replica information
➟ wrong SAPath
➟ wrong SE host
➟ wrong protocol
  ▓ sfn, rfio, bbftp, …
➟ mistakes in file registration
  ▓ blank spaces in the SURL path
  ▓ carriage returns
  ▓ presence of a port number in the SURL path
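Most of the issues listed above are mechanically detectable, which is what makes the "automatic agent" approach workable. A sketch of such a per-record check; the record layout, field names and the set of accepted protocols are hypothetical, not the actual LHCb catalogue schema:

```python
import re

# Hypothetical set of protocols a replica may legitimately be registered with.
KNOWN_PROTOCOLS = {"sfn", "rfio", "bbftp", "srm"}

def check_replica(rec):
    """Return the list of integrity problems found in one catalogue record."""
    problems = []
    if rec.get("size") == 0:
        problems.append("zero size file")
    if not rec.get("replicas"):
        problems.append("missing replica information")
    if rec.get("protocol") not in KNOWN_PROTOCOLS:
        problems.append("wrong protocol")
    surl = rec.get("surl", "")
    if " " in surl or "\r" in surl:
        problems.append("whitespace in SURL path")
    if re.search(r"^[a-z]+://[^/]+:\d+", surl):
        problems.append("port number in SURL")
    return problems

bad = {"size": 0, "replicas": [],
       "surl": "srm://se.example.org:8443/lhcb/f.dst", "protocol": "srm"}
print(check_replica(bad))
```

An agent would run checks like these continuously over the Bookkeeping, LFC and SE views and flag any disagreement between them.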
18
Summary

Many experiments have embraced the grid

Many interesting challenges ahead
➟ Hardware
  ▓ Reduce the power consumed by CPUs
  ▓ Applications need to manage with less RAM
➟ Software
  ▓ Grid interoperability
  ▓ Security with generic pilots / glexec
  ▓ Distributed grid network

And many opportunities
➟ To test solutions to the above issues
➟ Stress test the grid infrastructure
  ▓ Get ready for data taking
  ▓ Implement lessons in other fields
    Biomed, …
➟ Note: 1 fully digitised film = 4 PB and needs 1.25 GB/s to play