
Building a Regional Centre

A few ideas & a personal view

CHEP 2000 – Padova

10 February 2000

Les Robertson

CERN/IT


Summary

LHC regional computing centre topology
Some capacity and performance parameters
From components to computing fabrics
Remarks about regional centres
Policies & sociology
Conclusions


Why Regional Centres?

Bring computing facilities closer to home – final analysis on a compact cluster in the physics department

Exploit established computing expertise & infrastructure

Reduce dependence on links to CERN – full ESD available nearby, through a fat, fast, reliable network link

Tap funding sources not otherwise available to HEP

Devolve control over resource allocation – national interests? regional interests? at the expense of physics interests?


The MONARC RC Topology

MONARC report: http://home.cern.ch/~barone/monarc/RCArchitecture.html

[Diagram: CERN – Tier 0 at the top; Tier 1 centres (FNAL, RAL, IN2P3) linked to CERN at 622 Mbps – 2.5 Gbps; Tier 2 centres (Lab a, Uni b, Lab c, … Uni n) linked at 155 Mbps; below them, Department and Desktop systems]

Tier 0 – CERN
  Data recording, reconstruction, 20% analysis
  Full data sets on permanent mass storage – raw, ESD, simulated data
  Hefty WAN capability
  Range of export-import media
  24 X 7 availability

Tier 1 – established data centre or new facility hosted by a lab
  Major subset of data – all/most of the ESD, selected raw data
  Mass storage, managed data operation
  ESD analysis, AOD generation, major analysis capacity
  Fat pipe to CERN
  High availability
  User consultancy – Library & Collaboration Software support

Tier 2 – smaller labs, smaller countries, probably hosted by an existing data centre
  Mainly AOD analysis
  Data cached from Tier 1, Tier 0 centres
  No mass storage management
  Minimal staffing costs

University physics department
  Final analysis
  Dedicated to local users
  Limited data capacity – cached only via the network
  Zero administration costs (fully automated)
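To make the hierarchy concrete, here is a minimal sketch of the topology above as a plain data structure. The class and field names are illustrative only, not part of any MONARC software; the link speeds are the ones shown on the diagram.

```python
# Minimal sketch of the MONARC tier hierarchy described above.
# Class/field names are illustrative; link speeds are those on the diagram.
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Centre:
    name: str
    tier: str
    uplink_mbps: Optional[int] = None               # link towards the next tier up
    children: List["Centre"] = field(default_factory=list)


cern = Centre("CERN", tier="Tier 0")
for name in ("FNAL", "RAL", "IN2P3"):
    t1 = Centre(name, tier="Tier 1", uplink_mbps=2500)   # 622 Mbps - 2.5 Gbps to CERN
    t1.children.append(Centre("Lab a / Uni b / ...", tier="Tier 2", uplink_mbps=155))
    cern.children.append(t1)


def show(centre: Centre, depth: int = 0) -> None:
    link = f"  ({centre.uplink_mbps} Mbps uplink)" if centre.uplink_mbps else ""
    print("  " * depth + f"{centre.tier}: {centre.name}{link}")
    for child in centre.children:
        show(child, depth + 1)


show(cern)
```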


The MONARC RC Topology

[The same topology diagram as on the previous slide: CERN – Tier 0, Tier 1 centres (FNAL, RAL, IN2P3) at 622 Mbps – 2.5 Gbps, Tier 2 centres at 155 Mbps, Department, Desktop]

MONARC report: http://home.cern.ch/~barone/monarc/RCArchitecture.html


More realistically – a Grid Topology

[Diagram: the same nodes – CERN Tier 0, Tier 1 centres (FNAL, RAL, IN2P3), Tier 2 centres, Departments, Desktops – and link speeds, now cross-connected as a grid rather than a strict hierarchy, with DHL added for physical shipment of media]


Capacity / Performance

Based on CMS/Monarc estimates (early 1999), rounded, extended and adapted by LMR

                                      CERN (CMS or ATLAS)        Tier 1, 1 expt.   Tier 1, 2 expts.
                                      Capacity     Annual        Capacity          Capacity
                                      in 2006      increase      in 2006           in 2006
  CPU (K SPECint95)**                    600          200           120               240
  Disk (TB)                              550          200           110               220
  Tape (PB) (incl. copies at CERN)       3.4            2           0.4                <1
  I/O rates   disk (GB/sec)               50                         10                20
              tape (MB/sec)              400                         50               100
  WAN bandwidth (Gbps)                   2.5                                          2.5

  A Tier 1 for one experiment is roughly 20% of CERN.
  For comparison, all of CERN today: ~15K SI95, ~25 TB, ~100 MB/sec.

  ** 1 SPECint95 = 10 CERN units = 40 MIPS


Capacity / Performance

Based on CMS/Monarc estimates (early 1999), rounded, extended and adapted by LMR

  Tier 1, 2 expts. – capacity in 2006

  CPU (K SPECint95)                  240    ~1,200 cpus / ~600 boxes
                                            (roughly the number of farm PCs at CERN today)
  Disk (TB)                          220    at least 2,400 disks at ~100 GB/disk (only!)
  Tape (PB) (incl. copies at CERN)    <1
  I/O rates   disk (GB/sec)           20    40 MB/sec/cpu, 20 MB/sec/disk
                                            – roughly the effective throughput of the LAN backbone
              tape (MB/sec)          100
  WAN bandwidth (Gbps)               2.5    ~300 MB/sec – about 1.5% of the LAN

  May not find disks as small as that! But we need a high disk count for access,
  performance, RAID/mirroring, etc. We probably have to buy more disks, larger disks,
  & use the disks that come with the PCs – much more disk space.
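A quick back-of-envelope check of these figures (a sketch; the only per-unit assumptions – 200 SI95 per cpu and ~100 GB per disk – are the figures quoted in the deck):

```python
# Back-of-envelope check of the Tier 1 (2 experiments) figures above.
# 200 SI95/cpu and ~100 GB/disk are the per-unit figures quoted in the deck.

CPU_REQUIREMENT_SI95 = 240_000        # 240K SPECint95
SI95_PER_CPU = 200                    # "dual 200 SI95 cpus" (2005 building block)
DISK_REQUIREMENT_TB = 220
GB_PER_DISK = 100                     # "~100 GB/disk (only!)"

cpus = CPU_REQUIREMENT_SI95 / SI95_PER_CPU        # 1,200 cpus -> ~600 dual-cpu boxes
disks = DISK_REQUIREMENT_TB * 1000 / GB_PER_DISK  # 2,200 disks ("at least 2,400" with headroom)

WAN_GBPS = 2.5
wan_mb_s = WAN_GBPS * 1000 / 8                    # ~310 MB/sec (~300 MB/sec on the slide)
LAN_GB_S = 20                                     # disk I/O rate taken as the LAN backbone throughput
wan_vs_lan = wan_mb_s / (LAN_GB_S * 1000)         # just over 1.5% (the slide quotes 1.5%)

print(f"{cpus:.0f} cpus (~{cpus / 2:.0f} boxes), {disks:.0f} disks, "
      f"WAN {wan_mb_s:.0f} MB/s = {wan_vs_lan:.1%} of LAN")
```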


Building a Regional Centre

Commodity components are just fine for HEP

Masses of experience with inexpensive farms
LAN technology is going the right way
  Inexpensive high performance PC attachments
  Compatible with hefty backbone switches

Good ideas for improving automated operation and management


Evolution of today’s analysis farms

Computing & Storage Fabric built up from commodity components
  Simple PCs
  Inexpensive network-attached disk
  Standard network interface (whatever Ethernet happens to be in 2006)
with a minimum of high(er)-end components
  LAN backbone
  WAN connection


Standard components



HEP’s not special, just more cost conscious



Limit the role of high end equipment



Components – building blocks

2000 – standard office equipment
  36 dual cpus – ~900 SI95
  120 x 72 GB disks – ~9 TB

2005 – standard, cost-optimised, Internet warehouse equipment
  36 dual 200 SI95 cpus = 14K SI95 – ~$100K
  224 x 3.5” disks, 25-100 TB – $50K - $200K

For capacity & cost estimates see the 1999 Pasta Report: http://nicewww.cern.ch/~les/pasta/welcome.html
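A quick check of the per-rack arithmetic quoted above (a sketch; the per-unit figures are the ones on this slide):

```python
# Check of the per-rack figures above (2000 vs. 2005 building blocks).

# 2000 - standard office equipment
cpus_2000 = 36 * 2                         # 36 dual-cpu boxes = 72 cpus (~900 SI95 total)
disk_tb_2000 = 120 * 72 / 1000             # 120 x 72 GB disks = ~8.6 TB (the "~9 TB" above)

# 2005 - standard, cost-optimised Internet warehouse equipment
rack_si95_2005 = 36 * 2 * 200              # 36 dual 200-SI95 cpus = 14,400 SI95 ("14K SI95")
gb_per_disk_2005 = (25_000 / 224, 100_000 / 224)   # 224 disks, 25-100 TB => ~110-450 GB per disk

print(f"2000 rack: {cpus_2000} cpus, ~{disk_tb_2000:.1f} TB of disk")
print(f"2005 rack: {rack_si95_2005} SI95, "
      f"{gb_per_disk_2005[0]:.0f}-{gb_per_disk_2005[1]:.0f} GB per disk")
```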


The Physics Department System

Two 19” racks & $200K
  CPU – 14K SI95 (10% of a Tier 1 centre)
  Disk – 50 TB (50% of a Tier 1 centre)

Rather comfortable analysis machine

Small Regional Centres are not going to be competitive
Need to rethink the storage capacity at the Tier 1 centres
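The fractions quoted above are consistent with the single-experiment Tier 1 column of the earlier capacity table (a sketch; taking "a Tier 1 centre" to mean the one-experiment case is my assumption):

```python
# Physics-department system (two racks, ~$200K) vs. a one-experiment Tier 1,
# using the capacity-table figures (120K SI95, 110 TB).  Treating "a Tier 1
# centre" as the one-experiment case is an assumption.

DEPT_SI95, DEPT_DISK_TB = 14_000, 50
TIER1_SI95, TIER1_DISK_TB = 120_000, 110

print(f"CPU:  {DEPT_SI95 / TIER1_SI95:.0%} of a Tier 1")        # ~12% (quoted as 10%)
print(f"Disk: {DEPT_DISK_TB / TIER1_DISK_TB:.0%} of a Tier 1")  # ~45% (quoted as 50%)
```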


Tier 1, Tier 2 RCs, CERN

A few general remarks:

A major motivation for the RCs is that we are hard pressed to finance the scale of computing needed for LHC
We need to start now to work together towards minimising costs
Standardisation among experiments, regional centres, CERN – so that we can use the same tools and practices to …

Automate everything
  Operation & monitoring
  Disk & data management
  Work scheduling
  Data export/import (prefer the network to mail)

… in order to …
  Minimise operation, staffing
  Trade off mass storage for disk + network bandwidth
  Acquire contingency capacity rather than fighting bottlenecks
  Outsource what you can (at a sensible price)
  …….

Keep it simple

Work together


The middleware

The issues are: integration of this amorphous collection of Regional Centres
  Data
  Workload
  Network performance
  Application monitoring
  Quality of data analysis service

Leverage the “Grid” developments
  Extending Meta-computing to Mass-computing
  Emphasis on data management & caching
  … and production reliability & quality

Keep it simple

Work together


A 2-experiment Tier 1 Centre

Requirement: 240K SI95, 220 TB

Processors: 20 “standard” racks = 1,440 cpus, 280K SI95
Disks: 12 “standard” racks = 2,688 disks, 300 TB (with low capacity disks)
Basic equipment (cpus/disks): ~$3M

[Floor plan: ~200 m² machine room with areas for cpu/disk racks, tape/DVD, and network equipment]
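The rack counts follow from the 2005 building-block figures quoted earlier (a sketch; the ~110 GB/disk value is an assumption chosen to reproduce the quoted totals):

```python
# Arithmetic behind the 2-experiment Tier 1 sizing above, using the 2005
# building-block figures from the earlier slide (72 cpus of 200 SI95 and
# 224 disks per rack).  ~110 GB/disk is an assumption.

CPU_RACKS, CPUS_PER_RACK, SI95_PER_CPU = 20, 72, 200
DISK_RACKS, DISKS_PER_RACK, GB_PER_DISK = 12, 224, 110

cpus = CPU_RACKS * CPUS_PER_RACK           # 1,440 cpus
cpu_si95 = cpus * SI95_PER_CPU             # 288,000 SI95 (quoted as ~280K, vs. 240K required)
disks = DISK_RACKS * DISKS_PER_RACK        # 2,688 disks
disk_tb = disks * GB_PER_DISK / 1000       # ~296 TB (quoted as ~300 TB, vs. 220 TB required)

print(f"{cpus} cpus = {cpu_si95 / 1000:.0f}K SI95, {disks} disks = {disk_tb:.0f} TB")
```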


The full costs?

Space
Power, cooling
Software

LAN
Replacement/Expansion – 30% per year (see the sketch after this list)

Mass storage

People
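A minimal sketch of what the 30% per year replacement/expansion line alone implies, assuming the ~$3M basic-equipment figure from the previous slide (the other cost lines are not quantified in the deck, so they are left out):

```python
# Rolling replacement/expansion budget for the hardware line only, assuming the
# ~$3M basic-equipment figure from the previous slide.  Space, power, software,
# LAN, mass storage and people are not quantified in the deck and are left out.

BASIC_EQUIPMENT = 3.0e6     # ~$3M cpus/disks
ANNUAL_RATE = 0.30          # replacement/expansion, 30% per year

years = 5
annual = BASIC_EQUIPMENT * ANNUAL_RATE
print(f"~${annual / 1e6:.1f}M per year; ~${years * annual / 1e6:.1f}M over {years} years "
      f"(vs. ~${BASIC_EQUIPMENT / 1e6:.0f}M initial equipment)")
```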


Mass storage?

Do all Tier 1 centres really need a full mass storage operation?

Tapes, robots, storage management software?

Need support for export/import media
But think hard before getting into mass storage
Rather:
  more disks, bigger disks, mirrored disks
  cache data across the network from another centre (that is willing to tolerate the stresses of mass storage management)

Mass storage is person-power intensive – long term costs


Consider outsourcing

Massive growth in co-location centres, ISP warehouses, ASPs, storage renters, etc.

Level 3, Intel, Hot Office, Network Storage Inc, PSI, ….

There will probably be one near you
Check it out – compare costs & prices

Maybe personnel savings can be made


Policies & sociology

Access policy?
  Collaboration-wide? or restricted access (regional, national, …)?
  A rich source of unnecessary complexity

Data distribution policies

Analysis models
  Monarc work will help to plan the centres
  But the real analysis models will evolve when the data arrives

Keep everything flexible – simple architecture, simple policies, minimal politics


Concluding remarks I

Lots of experience with farms of inexpensive components

We need to scale them up – lots of work but we think we understand it

But we have to learn how to integrate distributed farms into a coherent analysis facility

Leverage other developments
But we need to learn through practice and experience

Retain a healthy scepticism for scalability theories
Check it all out on a realistically sized testbed


Concluding remarks II

Don’t get hung up on optimising component costs
Do be very careful with head-count

Personnel costs will probably dominate

Define clear objectives for the centre – efficiency, capacity, quality
Think hard if you really need mass storage
Discourage empires & egos
Encourage collaboration & out-sourcing

In fact – maybe we can just buy all this as an Internet service