Hosting Large-scale e-Infrastructure Resources
Mark Leese, mark.leese@stfc.ac.uk


Page 1

Hosting Large-scale e-Infrastructure Resources

Mark Leese

mark.leese@stfc.ac.uk

Page 2

Contents

• Speed dating introduction to STFC

• Idyllic life, pre-e-Infrastructure

• Sample STFC hosted e-Infrastructure projects

• RAL network re-design

• Other issues to consider

Page 3

STFC

• One of seven publicly funded UK Research Councils
• Formed from the 2007 merger of CCLRC and PPARC
• STFC does a lot, including…
  – awarding research, project & PhD grants
  – providing access to international science facilities through its funded membership of bodies like CERN
  – sharing its expertise in areas such as materials and space science with academic and industrial communities
• …but it is mainly recognised for hosting large-scale scientific facilities, inc. High Performance Computing (HPC) resources

Page 4

Harwell Oxford Campus

- STFC major shareholder in Diamond Light Source

- Electron beam accelerated to near light speed within ring

- Resulting light (X-Ray, UV or IR) interacts with samples being studied

- ISIS: a ‘super-microscope’ employing neutron beams to study materials at the atomic level

Page 5

Harwell Oxford Campus

- STFC’s Rutherford Appleton Lab is part of the Harwell Oxford Science and Innovation Campus, with UKAEA and a commercial campus management company

- Co-locate hi-tech start-ups and multi-national organisations alongside established scientific and technical expertise

- Similar arrangement at Daresbury in Cheshire
- Both within George Osborne Enterprise Zones:
  - Reduced business rates
  - Government support for roll-out of super-fast broadband

Page 6

Previous Experiences

Page 7

[Diagram: the 16.5-mile LHC ring, with its four experiments: ATLAS, CMS, ALICE and LHCb]

Large Hadron Collider

• LHC at CERN
• Search for the elementary but hypothetical Higgs boson particle
• Two proton (hadron) beams
• Four experiments (particle detectors)
• Detector electronics generate data during collisions

Page 8

LHC and Tier-1

• After initial processing, the four experiments generated 13 PetaBytes of data in 2010 (> 15m GB, or 3.3m single-layer DVDs)
• In the last 12 months, the Tier-1 received ≈ 6 PB from CERN and other Tier-1s
• GridPP contributes the equivalent of 20,000 PCs

Page 9

[Diagram: the UK Tier-1 at RAL. LHC data from the Tier-0 and other Tier-1s arrives over primary and backup 10 Gbps lightpaths on the CERN LHC OPN (Optical Private Network), entering via the UKLight router and bypassing the site firewall (PetaBytes?!?). “Normal” data enters through the front door: the Site Access Router, then the firewall (security). Router A provides internal distribution, including Tier-1 to Tier-2 (universities) traffic.]

• Individual Tier-1 hosts route data to Router A or the UKLight router as appropriate
• Config pushed out with the Quattor Grid/cluster management tool
• Access Control Lists of IP addresses on the SAR, UKLight router and/or hosts replace firewall security (illustrated below)
• As Tier-2 (university) network capabilities increase, so must RAL’s (10 → 20 → 30 Gbps)

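A minimal sketch of the ACL idea in Python rather than real router config (the deck does not give the actual entries, so the prefixes below are illustrative RFC 5737 documentation ranges, not real OPN addresses): traffic is forwarded only if its source address falls inside an explicitly permitted prefix, which is what stands in for the firewall on this path.

```python
from ipaddress import ip_address, ip_network

# Illustrative allow-list standing in for the real ACL entries
# (documentation prefixes only, not actual Tier-0/Tier-1 addresses).
PERMITTED_SOURCES = [
    ip_network("192.0.2.0/24"),     # e.g. a Tier-0 range (hypothetical)
    ip_network("198.51.100.0/24"),  # e.g. another Tier-1 (hypothetical)
]

def acl_permits(src_ip: str) -> bool:
    """Forward a packet only if its source matches a permitted prefix."""
    addr = ip_address(src_ip)
    return any(addr in net for net in PERMITTED_SOURCES)

print(acl_permits("192.0.2.17"))   # True  -> forwarded
print(acl_permits("203.0.113.5"))  # False -> dropped, as a firewall would
```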

Page 10

LOFAR

- LOw Frequency Array
- World's largest and most sensitive radio telescope
- Thousands of simple dipole antennas, 38 European arrays
- 1st UK array opened at Chilbolton, Sept 2010
- 7 PetaBytes a year of raw data generated (> 1.5m DVDs)
- Data transmitted in real-time to an IBM BlueGene/P supercomputer at Uni of Groningen
- Data processed & combined in software to produce images of the radio sky

Page 11

LOFAR

- 10 Gbps Janet Lightpath
- Janet → GÉANT → SURFnet
- Big leap from FedEx'ing data tapes or drives (see the arithmetic sketch below)
- 2011 RCUK e-IAG: "Southampton and UCL make specific reference ... quicker to courier 1TB of data on a portable drive"
- Funded by LOFAR-UK
- cf. LHC: centralised, not distributed, processing
- Expected to pioneer the approach for other projects, e.g. Square Kilometre Array
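A quick back-of-the-envelope check (a sketch, assuming decimal units and a fully utilised link) of why couriering a 1 TB drive can beat the network at rates like the 30 Mbps seen later in this deck:

```python
def transfer_hours(size_tb: float, rate_mbps: float) -> float:
    """Hours to move size_tb (decimal TB) at rate_mbps (megabits/s)."""
    bits = size_tb * 1e12 * 8
    return bits / (rate_mbps * 1e6) / 3600

# At ~30 Mbps, 1 TB takes roughly three days; a 10 Gbps lightpath
# moves the same data in under 15 minutes.
print(f"{transfer_hours(1, 30):6.1f} h at 30 Mbps")      # ~74.1 h
print(f"{transfer_hours(1, 10_000):6.2f} h at 10 Gbps")  # ~0.22 h
```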

Page 12

Sample STFC e-Infrastructure Projects

Page 13

ICE-CSE

• International Centre of Excellence for Computational Science and Engineering
• Was going to be Hartree Centre, now DFSC
• STFC Daresbury Laboratory, Cheshire
• Partnership with IBM
• Mission to provide HPC resources and develop software
• DL previously hosted HPCx, the big academic HPC before HECToR
• IBM BlueGene/Q supercomputer
• 114,688 processor cores, 1.4 Petaflops peak performance
• Partner IBM's tests were the first time a Petaflop application has been run in the UK (one thousand trillion calculations per second)
• 13th in this year's TOP500 worldwide list
• Rest of Europe appears five times in the Top 10
• DiRAC and HECToR (Edinburgh) 20th and 32nd

Page 14

ICE-CSE

• DL network upgraded to support up to 8 × 10 Gbps lightpaths to the current regional Janet deliverer, Net North West, in Liverpool and Manchester
• Same optical fibres, different colours of light:
  1. 10G JANET IP service (primary)
  2. 10G JANET IP service (secondary)
  3. 10G DEISA (consortium of European supercomputers)
  4. 10G HECToR (Edinburgh)
  5. 10G ISIC (STFC-RAL)
  More expected as part of the IBM-STFC collaboration
• Feasible because NNW rents its own dark (unlit) fibre network
• NNW 'simply' changes the optical equipment on each end of the dark fibre
• Key aim is for the machine and expertise to be available to commercial companies
  • How? Over Janet?

• A Strategic Vision for UK e-Infrastructure estimates that 1,400 companies could make use of HPC, with 300 quite likely to do so

• So even if some instead go for the commercial “cloud” option...

Page 15

JASMIN & CEMS

• Joint Analysis System Meeting Infrastructure Needs

• JASMIN and CEMS funded by BIS: through NERC, and through UKSA and ISIC, respectively

• Compute and storage cluster for the climate and earth system modelling community

Page 16

[Diagram: a big compute and storage cluster with 4.6 PetaBytes of fast disc storage. JASMIN will talk internally to other STFC resources; to its satellite systems (compute + 500 TB, and two 150 TB systems); and to the Netherlands, the Met Office and Edinburgh over UKLight.]

Page 17

CEMS in the ISIC

• Climate and Environmental Monitoring from Space
• Essentially JASMIN for commercial users
• Promote use of 'space' data and technology within new market sectors
• Four consortia have already won funding from the publicly funded 'Space for Growth' competition (run by UKSA, TSB and SEEDA)
• Hosted in the International Space Innovation Centre
  – A 'not-for-profit' formed by industrials, academia and government
  – Part of the UK's Space Innovation and Growth Strategy to grow the sector's £ turnover
• ISIC is an STFC 'Partner Organisation' in terms of the Janet Eligibility Policy
• So... Janet-BCE (Business and Community Engagement) for network access related to academic and ISIC partners
• Commercial ISP for network access related to commercial customers
• As the industrial collaboration agenda is pushed, this needs to be controlled and applicable elsewhere in STFC

Page 18

[Diagram: the JASMIN router has two 10 Gbps fibres: one carries Janet & Janet-BCE traffic to Janet (Janet-BCE VLAN; no CEMS traffic permitted), the other carries commercial traffic to BT (commercial customers' VLAN). The router also connects to the RAL infrastructure.]

• JASMIN and CEMS connected at 10 Gbps…
• …but no Janet access for CEMS via JASMIN
• Keeping Janet 'permitted' traffic as a separate BCE VLAN allows tighter control
• Customers will access CEMS on different IP addresses depending on who they are (academia, partners, commercials)
• This could be enforced (sketched below)
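One way the per-audience addressing could be enforced, sketched in Python with hypothetical service-class names and RFC 5737 documentation ranges (the deck does not specify the mechanism): each CEMS-facing address class accepts clients only from the matching source networks.

```python
from ipaddress import ip_address, ip_network

# Hypothetical policy: which client networks may reach which class of
# CEMS service address (documentation ranges only, not real allocations).
POLICY = {
    "cems-academic":   [ip_network("192.0.2.0/25")],     # Janet users
    "cems-partner":    [ip_network("192.0.2.128/25")],   # ISIC partners
    "cems-commercial": [ip_network("198.51.100.0/24")],  # via commercial ISP
}

def may_connect(service_class: str, client_ip: str) -> bool:
    """True if this client address may reach this service address class."""
    client = ip_address(client_ip)
    return any(client in net for net in POLICY[service_class])

print(may_connect("cems-academic", "192.0.2.10"))     # True
print(may_connect("cems-academic", "198.51.100.10"))  # False: wrong class
```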

Page 19

RAL Network Re-Design & Other Issues

Page 20

RAL Network Re-Design

Two main aims:
1. Resilience: reduce serial paths and single points of failure.
2. Scalability and flexibility: remove the need for special cases. Make adding bandwidth and adding 'clouds' (e.g. Tier-1 or tenants) a repeatable process with known costs.

[Diagram: the RAL site and the outside world. The RAL PoP links the site to Janet ("normal" data) and to the CERN LHC OPN (LHC data). On site: the Site Access Router and firewall, the UKLight router, and Router A for internal distribution, serving clouds such as the Tier-1, ISIS, Admin and JASMIN.]

Page 21

[Diagram: the re-designed site in layers. Site external connectivity: primary and backup links to Janet, a primary link to the CERN LHC OPN, plus a commercial ISP, campus, tenants and visitors. RAL PoP (campus access & distribution): Rtr 1, Rtr 2, Sw 1 and Sw 2. Site access & distribution plus security: Rtr A and a virtual firewall serving the rest of the RAL site. Internal site distribution: routers per project, facility or department. An implicit trust relationship = bypass the firewall.]

Page 22

Rtr 1 & 2, Sw 1 & 2

Front: 48 ports 1/10 GbE (SFP+)
Back: 4 ports 40 GbE (QSFP+)

• Lots of 10 Gigs:
  – clouds and new providers can be readily added
  – bandwidth readily added to existing clouds
  – clouds can be dual connected

Page 23

RAL Site Resilience

[Map of the RAL site (scale: 500 ft / 100 m), showing the primary fibre route to Reading and the backup route to London.]

Page 24

User Education

• The belief that you can plug a node or cluster into "the network" and immediately be firing lots of data all over the world is a fallacy
• Over-provisioning is not a complete solution
• Having invested £m's elsewhere, most network problems that do arise are within the last mile: campus network → individual devices → applications
• On the end systems...
  – Network Interface Card
  – Hard disc
  – TCP configuration
  – Poor cabling
  – Does your application use parallel TCP streams? (sketched below)
  – What protocols does your application use for data transfer (GridFTP, HTTP...)?
• Know what to do on your end systems
• Know what questions to ask of others
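A minimal sketch of the parallel-streams idea in Python, with a hypothetical host and port (real tools such as GridFTP offer this built in, and far more robustly): split the payload across several TCP connections, each with an enlarged send buffer, so a single slow stream does not cap the whole transfer.

```python
import socket
from concurrent.futures import ThreadPoolExecutor

HOST, PORT = "data.example.org", 2811   # hypothetical endpoint
SNDBUF = 16 * 1024 * 1024               # request 16 MB buffer (OS may cap it)

def send_chunk(chunk: bytes) -> int:
    """Push one chunk of the payload over its own TCP connection."""
    with socket.create_connection((HOST, PORT)) as s:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, SNDBUF)
        s.sendall(chunk)
    return len(chunk)

def parallel_send(data: bytes, streams: int = 4) -> int:
    """Split data across `streams` concurrent connections."""
    step = -(-len(data) // streams)  # ceiling division
    chunks = [data[i:i + step] for i in range(0, len(data), step)]
    with ThreadPoolExecutor(max_workers=streams) as pool:
        return sum(pool.map(send_chunk, chunks))
```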

Page 25

User Support

• 2010 example: CMIP5 - RAL Space sharing environmental data with Lawrence Livermore (West coast US) and DKRZ (Germany)
  – ESNet, California → GÉANT, London: 800 Mbps
  – ESNet, California → RAL Space: 30 Mbps
  – RAL Space → DKRZ, Germany: 40 Mbps
  – So RAL is the problem, right? Not necessarily...
  – DKRZ, Germany → RAL Space: up to 700 Mbps

• Involved six distinct parties: RAL Space, STFC Networking, Janet, DANTE, ESNet, LLNL

• Difficult, although the experiences probably fed into the aforementioned JASMIN

• Tildesley’s Strategic Vision for UK e-Infrastructure talks of “the additional effort to provide the skills and training needed for advice and guidance on matching end-systems to high-capacity networks”

Page 26

I'll do anything for a free lunch

• Access Control and Identity Management
  – During DTI's e-Science programme, access to resources was often controlled using personal X.509 certificates (see the sketch below)
  – Is that scalable?
  – Will you run or pay for a PKI?
  – Resource providers may want to try Moonshot
    • extension of eduroam technology
    • users of e-Infrastructure resources authenticated with user credentials held by their employer

• Will the Janet Brokerage be applicable to HPC e-Infrastructure resources?
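For context on the X.509 point above, a minimal sketch of what such a personal certificate carries, using recent versions of the Python `cryptography` package and assuming a hypothetical `usercert.pem` on disk; the per-user expiry date hints at the renewal overhead behind the scalability question.

```python
from cryptography import x509

# Hypothetical file name; each e-Science user held one of these.
with open("usercert.pem", "rb") as f:
    cert = x509.load_pem_x509_certificate(f.read())

print("Subject:", cert.subject.rfc4514_string())  # who the CA vouches for
print("Issuer: ", cert.issuer.rfc4514_string())   # which CA signed it
print("Expires:", cert.not_valid_after)           # every user cert must be
# renewed before this date - part of the PKI running cost
```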

Page 27

Conclusions

From the STFC networking perspective:
• Adding bandwidth should be a repeatable process with known costs
• Networking is now a core utility, just like electricity: plan for resilience on many levels
• Plan for commercial interaction
• In all the excitement, don't forget security
• e-Infrastructure funding is paying for capital investments - be aware of the recurrent costs