connect • communicate • collaborate
LHCONE – Linking Tier 1 & Tier 2 Sites
Background and Requirements
Richard Hughes-Jones
DANTE Delivery of Advanced Network Technology to Europe
LHCONE Planning Meeting, RENATER Paris, 5 April 2011
Introduction:
Describe some of the changes in the computing model of the LHC experiments.
Demonstrate the importance and usage of the network.
Show the relation between LHCONE and LHCOPN.
Bring together and present the user requirements for future LHC physics analysis.
Provide the information to facilitate the presentations on the Architecture and the Implementation of LHCONE.
A Little History
Requirements paper from K. Bos (ATLAS) and I. Fisk (CMS) in autumn 2010.
Experiments had devised new compute and data models for LHC data evaluation, basically assuming a high-speed network connecting the T2s worldwide.
Ideas & proposals were discussed at a workshop held at CERN in Jan 2011, which gave input from the networking community.
An "LHCONE Architecture" doc was finalised in Lyon in Feb 2011. Here K. Bos proposed to start with a prototype based on the commonly agreed architecture.
K. Bos and I. Fisk produced a "Use Case" note with a list of sites for the prototype.
In Rome in late Feb 2011, some NRENs & DANTE formed ideas for the "LHCONE prototype planning" doc.
LHC: Changing Data Models (1)
The LHC computing model, based on MONARC, has served well for more than 10 years.
ATLAS strictly hierarchical; CMS less so.
The successful operation of the LHC accelerator & the start of data analysis brought a re-evaluation of the computing and data models.
Flatter hierarchy: Any site might in the future pull data from any other site hosting it.
LHCOPN
Artur Barczyk
LHC: Changing Data Models (2)
Data caching: A bit like web caching. Analysis sites will pull datasets from other sites “on demand”, including from Tier2s in other regions, then make them available for others.
Possible strategic pre-placement of datasets.
Datasets put close to physicists studying that data / suitable CPU power.
Use of continental replicas.
Remote data access: jobs executing locally, using data cached at a remote site in quasi-real time.
Traffic patterns are changing – more direct inter-country data transfers
ATLAS Data Transfers
Between all Tier levels
Average: ~ 2.3 GB/s (daily average)
Peak: ~ 7 GB/s (daily average)
Data available on site within a few hours.
70 Gbit/s on LHCOPN during ATLAS reprocessing
Daniele Bonacorsi
Data Flow EU – US ATLAS Tier 2s
Example above is from US Tier 2 sites.
Exponential rise in April and May, after LHC start.
Changed data distribution model at end of June – caching ESD and DESD.
Much slower rise since July, even as luminosity grows rapidly.
Kors Bos
LHC: Evolving Traffic Patterns
One example of data coming from the US
4 Gbit/s for ~ 1.5 days (11 Jan 11)
Transatlantic link
GÉANT Backbone
NREN Access Link
Not an isolated case
Often made up of many data flows
Users getting good at running gridftp
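To give a sense of scale, the sustained flow quoted above (4 Gbit/s for about 1.5 days) can be converted into a data volume. This is a minimal sketch, assuming decimal units (1 Gbit = 10^9 bits, 1 TByte = 10^12 bytes) and a constant rate over the whole period; the function name is illustrative only.

```python
def volume_tbyte(rate_gbit_s: float, days: float) -> float:
    """Data volume in TByte moved by a flow held at rate_gbit_s for `days`.

    Assumes a constant rate and decimal units; both are simplifications.
    """
    seconds = days * 86_400            # seconds in the period
    bits = rate_gbit_s * 1e9 * seconds  # total bits transferred
    return bits / 8 / 1e12              # bits -> bytes -> TByte


# The transatlantic example: 4 Gbit/s for ~1.5 days.
print(f"{volume_tbyte(4, 1.5):.1f} TByte")
```

Under these assumptions the single example amounts to roughly 65 TByte crossing the transatlantic link, the GÉANT backbone and the NREN access link.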
Data Transfers over RENATER
Peak rates are a substantial fraction of 10 Gbit/s, often for hours.
Several LHC involved.
Demand variable depending on user work.
Francois-Xavier Andreu
Data Transfers over DFN
Peak rates saturate one of the 10 Gigabit DFN-GÉANT links.
Demand variable depending on user work.
Christian Grimm
Two different weeks from GÉANT to Aachen
Data Transfers from GARR - CNAF
T0-T1 + T1-T1 + T1-T2
Peak rates 14-18 Gigabit/s.
Traffic shows diurnal demand & is variable depending on user work.
Sustained growth over last year
Marco Marletta
CMS Data Transfers
Data Placement for Physics Analysis
Once data is on the WLCG, it must be made accessible to analysis applications.
Largest fraction of analysis computing at LHC is at the Tier2s.
New flexibility reduces latency for end users.
Daniele Bonacorsi
T1‐T2 dominates
T2‐T2 emerges
Data Transfer Performance: Site or Network?
Test NorthGrid to GÉANT PoP London
UDP throughput from SE 990 Mbit/s.
75% packet loss.
Data transmitted by SE at 3.8 Gbit/s over 4 × 1 Gigabit interfaces.
TCP transmits in bursts at 3.8 Gbit/s; packet loss & re-tries mean low throughput.
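The measured loss is close to what simple arithmetic predicts when a sender offers more traffic than a bottleneck can carry. The sketch below assumes a steady, unresponsive (UDP-like) stream and takes the 990 Mbit/s delivered rate as the effective bottleneck capacity; the function name is hypothetical and the model ignores buffering and burstiness.

```python
def expected_loss(offered_gbit_s: float, bottleneck_gbit_s: float) -> float:
    """Fraction of traffic dropped when a steady stream exceeds a bottleneck.

    Simplified model: everything above the bottleneck capacity is lost.
    """
    if offered_gbit_s <= bottleneck_gbit_s:
        return 0.0
    return 1.0 - bottleneck_gbit_s / offered_gbit_s


# SE sends at 3.8 Gbit/s; about 0.99 Gbit/s gets through the 1 Gbit receiver.
loss = expected_loss(3.8, 0.99)
print(f"expected loss: {loss:.0%}")
```

With these inputs the model gives a loss of about 74%, in line with the roughly 75% observed, which points to the end host rather than the network as the limiting factor.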
1 Gbit Bottleneck at receiver
Classic packet loss from bottleneck
Even more data with end-hosts fixed.
LHCOPN linking Tier 0 to Tier 1s
LHCONE for Tier 1s and Tier 2s
LHCONE
Other regions
T2s in a country
LHCONE prototype in Europe.
T1 are connected but not LHCOPN
Requirements for LHCONE
LHCOPN provides infrastructure to move data T0-T1 and T1-T1.
New infrastructure required to improve transfers T1-T2 & T2-T2:
Analysis is mainly done in Tier 2, so data is required from any T1 or any T2. T2-T2 is very important.
Work done at a Tier 2: Simulations & Physics Analysis (50:50)
Network BW needs of a T2 include:
Re-processing efforts: 400 TByte refresh in a week = 5 Gbit/s
Data bursts from user analysis: 25 TByte in a day = 2.5 Gbit/s
Feeding a 1000 core farm with LHC events: ~ 1 Gbit/s
Note this implies timely delivery of data not just average rates!
Access link “available bandwidth” for Tier 2 sizes:
Large 10 Gbit/s; Medium 5 Gbit/s; Small 1 Gbit/s
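The bandwidth figures above follow from dividing a data volume by the time allowed to move it. The sketch below reproduces that arithmetic and checks each result against the three Tier 2 access link sizes. It assumes decimal units, reads the link sizes as Gbit/s, and ignores protocol overhead and competing traffic; all names are illustrative.

```python
def required_gbit_s(volume_tbyte: float, days: float) -> float:
    """Average rate in Gbit/s needed to move volume_tbyte within `days`."""
    bits = volume_tbyte * 1e12 * 8
    return bits / (days * 86_400) / 1e9


# Access link "available bandwidth" per Tier 2 size (assumed to be Gbit/s).
ACCESS_LINK_GBIT_S = {"Large": 10, "Medium": 5, "Small": 1}

workloads = [
    ("Re-processing refresh", 400, 7),  # 400 TByte in a week
    ("User analysis burst", 25, 1),     # 25 TByte in a day
]

for label, tbyte, days in workloads:
    rate = required_gbit_s(tbyte, days)
    fits = [size for size, cap in ACCESS_LINK_GBIT_S.items() if cap >= rate]
    print(f"{label}: {rate:.2f} Gbit/s average, fits: {', '.join(fits) or 'none'}")
```

Under these assumptions the averages come out at roughly 5.3 and 2.3 Gbit/s, the same order as the rounded figures quoted above. These are averages only, so delivering the data on time in bursts would need headroom beyond them.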
Requirements for LHCONE
Sites are free to choose the way they wish to connect.
Flexibility & extensibility required:
T2s change
Analysis usage pattern is more chaotic – Dynamic Networks of interest
World-wide connectivity required for LHC sites.
There is concern about LHC traffic swamping other disciplines.
Monitoring & fault-finding support should be built in.
Cost-effective solution required – may influence the Architecture.
No isolation of sites must occur.
No interruption of the data-taking or physics analysis.
A prototype is needed.
Requirements
Fitting in with LHC 2011 data taking
Machine development & Technical Stops provide pauses in the data taking.
This does not mean there is plenty of time.
LHCONE prototype might grow in phases.
ANY QUESTIONS ?