
Data Handling for LHC: Plans and Reality


Page 1: Data Handling for LHC: Plans and Reality

Data Handling for LHC: Plans and Reality

Tony Cass, Leader, Database Services Group

Information Technology Department

11th July 2012

Page 2: Data Handling for LHC: Plans and Reality

Outline

• HEP, CERN, LHC and LHC Experiments
• LHC Computing Challenge
• The Technique
  – In outline
  – In more detail
• Towards the Future
• Summary


Page 4: Data Handling for LHC: Plans and Reality


Page 5: Data Handling for LHC: Plans and Reality

We are looking for rare events!

number of events = Luminosity × Cross section
2010 Luminosity: 45 pb⁻¹

• Higgs (mH = 120 GeV): 17 pb → ~750 events
• Total: 70 billion pb → ~3 trillion events!** (**N.B. only a very small fraction saved!)

e.g. potentially ~1 Higgs in every 300 billion interactions!

~250x more events to date

(Slide courtesy of Emily Nurse, ATLAS)
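Spelling out the slide's arithmetic (a worked instance of the event-count formula, using the 2010 luminosity quoted above):

```latex
N = \mathcal{L}\,\sigma:\quad
N_{\mathrm{Higgs}} \approx 45\,\mathrm{pb}^{-1} \times 17\,\mathrm{pb} \approx 750~\text{events},\quad
N_{\mathrm{total}} \approx 45\,\mathrm{pb}^{-1} \times 7\times 10^{10}\,\mathrm{pb} \approx 3\times 10^{12}~\text{events}.
```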

Page 6: Data Handling for LHC: Plans and Reality


So the four LHC Experiments…

Page 7: Data Handling for LHC: Plans and Reality


… generate lots of data …

The accelerator generates 40 million particle collisions (events) every second at the centre of each of the four experiments’ detectors

Page 8: Data Handling for LHC: Plans and Reality

… generate lots of data …
reduced by online computers to a few hundred “good” events per second,
which are recorded on disk and magnetic tape at 100-1,000 MegaBytes/sec
~15 PetaBytes per year for all four experiments
• Current forecast ~23-25 PB / year, 100-120M files / year
  – ~20-25K 1 TB tapes / year
• Archive will need to store 0.1 EB in 2014, ~1 billion files in 2015

[Chart: CASTOR data written, 01/01/2010 to 29/6/2012, in PB, by experiment: ALICE, AMS, ATLAS, CMS, COMPASS, LHCB, NA48, NA61, NTOF, USER]

[Image: ATLAS Z→μμ event from 2012 data with 25 reconstructed vertices]
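A back-of-the-envelope check of these rates. The ~300 Hz recorded rate and ~1 MB/event size are assumptions chosen to sit in the quoted ranges, and 10⁷ live seconds is a typical LHC year; none of these are figures from the slide:

```python
collision_rate_hz = 40e6            # collisions/s at each detector
recorded_rate_hz = 300              # "good" events/s kept by online computers
event_size_mb = 1.0                 # assumed average event size

rejection = collision_rate_hz / recorded_rate_hz
rate_mb_s = recorded_rate_hz * event_size_mb
volume_pb = rate_mb_s * 1e7 / 1e9   # MB over a year's live time -> PB

print(f"online selection: 1 event in {rejection:,.0f} kept")
print(f"recording rate:  {rate_mb_s:.0f} MB/s per experiment")
print(f"annual volume:   ~{volume_pb:.0f} PB per experiment")
```

With these assumptions each experiment records a few PB per year, consistent with the ~15 PB/year quoted for all four experiments together.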

Page 9: Data Handling for LHC: Plans and Reality

Outline

• HEP, CERN, LHC and LHC Experiments
• LHC Computing Challenge
• The Technique
  – In outline
  – In more detail
• Towards the Future
• Summary

Page 10: Data Handling for LHC: Plans and Reality

What is the technique?
Break up a Massive Data Set …

Page 11: Data Handling for LHC: Plans and Reality

What is the technique?
… into lots of small pieces and distribute them around the world …

Page 12: Data Handling for LHC: Plans and Reality

What is the technique?
… analyse in parallel …

Page 13: Data Handling for LHC: Plans and Reality

What is the technique?
… gather the results …

Page 14: Data Handling for LHC: Plans and Reality

What is the technique?
… and discover the Higgs boson:

Nice result, but… is it novel?

Page 15: Data Handling for LHC: Plans and Reality

Is it Novel?
Maybe not novel as such, but the implementation is Terrascale computing that is widely appreciated!

Page 16: Data Handling for LHC: Plans and Reality

Outline

• HEP, CERN, LHC and LHC Experiments
• LHC Computing Challenge
• The Technique
  – In outline
  – In more detail
• Towards the Future
• Summary

Page 17: Data Handling for LHC: Plans and Reality

The Grid
• Timely Technology!
• The WLCG project deployed to meet LHC computing needs.
• The EDG and EGEE projects organised development in Europe. (OSG and others in the US.)

Page 18: Data Handling for LHC: Plans and Reality

Grid Middleware Basics
• Compute Element
  – Standard interface to local workload management systems (batch scheduler)
• Storage Element
  – Standard interface to local mass storage systems
• Resource Broker
  – Tool to analyse user job requests (input data sets, CPU time, data output requirements) and route these to sites according to data and CPU time availability (a toy matching sketch follows below).

Many implementations of the basic principles: Globus, VDT, EDG/EGEE, NorduGrid, OSG
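To make the Resource Broker's matching step concrete, here is a toy sketch. The site names, numbers and ranking rule are invented for the example; real brokers (the EDG/gLite workload management system, for instance) are far richer:

```python
# Match a job's requirements (input dataset, CPU time) against site state
# and pick a destination site.
from __future__ import annotations
from dataclasses import dataclass

@dataclass
class Site:
    name: str
    datasets: set[str]       # datasets held by the site's Storage Element
    free_slots: int          # free batch slots behind the Compute Element
    max_cpu_hours: float     # longest job the local queues accept

@dataclass
class Job:
    input_dataset: str
    cpu_hours: float

def broker(job: Job, sites: list[Site]) -> Site | None:
    """Route the job to a site holding its input data with enough CPU."""
    candidates = [s for s in sites
                  if job.input_dataset in s.datasets
                  and s.max_cpu_hours >= job.cpu_hours]
    # Crude "rank" expression: prefer the site with the most free slots.
    return max(candidates, key=lambda s: s.free_slots, default=None)

sites = [Site("CERN", {"raw-2012A"}, 120, 48.0),
         Site("FNAL", {"raw-2012A", "mc-2012"}, 400, 24.0)]
print(broker(Job("raw-2012A", 12.0), sites).name)   # -> FNAL
```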

Page 19: Data Handling for LHC: Plans and Reality

Job Scheduling in Practice
• Issue
  – Grid sites generally want to maintain a high average CPU utilisation; easiest to do this if there is a local queue of work to select from when another job ends.
  – Users are generally interested in turnround times as well as job throughput. Turnround is reduced if jobs are held centrally until a processing slot is known to be free at a target site.
• Solution: Pilot job frameworks.
  – Per-experiment code submits a job which chooses a work unit to run from a per-experiment queue when it is allocated an execution slot at a site (see the sketch after this slide).
• Pilot job frameworks separate out
  – site responsibility for allocating CPU resources from
  – experiment responsibility for allocating priority between different research sub-groups.

… But note: Pilot job frameworks talk directly to the CEs and we have moved away from a generic solution to one that has a specific framework per VO (although these can be shared in principle).
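A minimal sketch of the pilot pattern described above, with invented task names and a stubbed environment check (the real per-VO frameworks include PanDA, DIRAC, AliEn and GlideinWMS):

```python
# The site's batch system runs the pilot; the actual work unit is bound
# late, pulled from the experiment's central queue only once the pilot
# holds an execution slot.
import queue

central_queue = queue.Queue()            # the per-experiment task queue
for task in ("reco-run195", "mc-gen-Zmumu", "user-ana-0042"):
    central_queue.put(task)

def environment_ok() -> bool:
    return True                          # e.g. software release, scratch space

def pilot() -> None:
    """Runs on a worker node once the site scheduler allocates a slot."""
    if not environment_ok():
        return                           # hand the slot back; no payload wasted
    while True:
        try:
            work_unit = central_queue.get_nowait()   # late binding of work
        except queue.Empty:
            return                       # queue drained: exit, free the slot
        print(f"running {work_unit}")    # stand-in for the real payload

pilot()
```

This is exactly the separation the slide describes: the site only sees the pilot and allocates it CPU, while the ordering of the central queue stays under the experiment's control.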

Page 20: Data Handling for LHC: Plans and Reality

Data Issues
• Reception and long-term storage
• Delivery for processing and export
• Distribution
• Metadata distribution

[Data-flow diagram with rates: 1430 MB/s, 700 MB/s, 2600 MB/s, 700 MB/s, 420 MB/s, (3600 MB/s), (>4000 MB/s)]

Scheduled work only – and we need ability to support 2x for recovery!

Page 21: Data Handling for LHC: Plans and Reality

(Mass) Storage Systems
• After evaluation of commercial alternatives in the late 1990s, two tape-capable mass storage systems have been developed for HEP:
  – CASTOR: an integrated mass storage system
  – dCache: a disk pool manager that interfaces to multiple tape archives (Enstore @ FNAL, IBM’s TSM)
• dCache is also used as a basic disk storage manager at Tier2s, along with the simpler DPM.

Page 22: Data Handling for LHC: Plans and Reality

A Word About Tape
• Our data set may be massive, but… it is made up of many small files…

[Histogram: CERN Archive file size distribution in %, binned from <10K to >2G]
~195MB average, only increasing slowly after LHC startup!

…which is bad for tape speeds:

[Chart: Drive write performance vs file size (MB), CASTOR tape format (ANSI AUL), for IBM AUL and SUN AUL drives, write speed in KB/s]
Average write drive speed: < 40MB/s (cf. native drive speeds: 120-160MB/s). Small increases with new drive generations.

Page 23: Data Handling for LHC: Plans and Reality

Tape Drive Efficiency
So we have to change tape writing policy…

[Chart: Drive write performance (MB/s) vs file size (MB), buffered vs non-buffered tape marks: CASTOR present (3 syncs/file), CASTOR new (1 sync/file), CASTOR future (1 sync / 4GB)]

[Chart: Average drive performance (MB/s) for CERN Archive files under the three policies: 3 syncs/file, 1 sync/file, 1 sync / 4GB]
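The effect is easy to see in a toy model: each file costs its streaming time plus a fixed pause per tape-mark sync, so the average ~195MB file is dominated by sync overhead unless syncs are amortised over many files. A sketch under assumed numbers (the 140 MB/s native speed and 3 s sync cost are illustrative, not CASTOR measurements):

```python
def effective_speed(file_mb: float, syncs_per_file: float,
                    native_mb_s: float = 140.0,
                    sync_cost_s: float = 3.0) -> float:
    # streaming time + per-file sync pauses -> achieved MB/s
    return file_mb / (file_mb / native_mb_s + syncs_per_file * sync_cost_s)

avg_file_mb = 195    # average CERN Archive file size, from the previous slide
for label, syncs in [("3 syncs/file", 3.0),
                     ("1 sync/file", 1.0),
                     ("1 sync / 4GB", avg_file_mb / 4096)]:
    print(f"{label:>13}: {effective_speed(avg_file_mb, syncs):6.1f} MB/s")
```

With these assumed numbers the three policies come out at roughly 19, 44 and 127 MB/s, reproducing the shape of the measurements above.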

Page 24: Data Handling for LHC: Plans and Reality

Storage vs Recall Efficiency
• Efficient data acceptance:
  – Have lots of input streams, spread across a number of storage servers,
  – wait until the storage servers are ~full, and
  – write the data from each storage server to tape.
  – Result: data recorded at the same time is scattered over many tapes.
• How is the data read back?
  – Generally, files grouped by time of creation.
  – How to optimise for this? Group files on to a small number of tapes.
• Ooops…

Page 25: Data Handling for LHC: Plans and Reality

Keep users away from tape

Page 26: Data Handling for LHC: Plans and Reality


CASTOR & EOS

Page 27: Data Handling for LHC: Plans and Reality

Data Distribution
• The LHC experiments need to distribute millions of files between the different sites.
• The File Transfer System automates this
  – handling failures of the underlying distribution technology (gridftp)
  – ensuring effective use of the bandwidth with multiple streams, and
  – managing the bandwidth use
    • ensuring ATLAS, say, is guaranteed 50% of the available bandwidth between two sites if there is data to transfer (a toy share-allocation sketch follows below).
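A toy sketch of that share management: divide the transfer slots on a site-to-site channel between VOs by configured share, re-normalising over VOs that actually have work queued. Shares, slot counts and VO names are invented; this illustrates the idea, not the algorithm FTS actually implements:

```python
def allocate_slots(total_slots: int, shares: dict[str, float],
                   queued: dict[str, int]) -> dict[str, int]:
    # Only VOs with queued transfers compete for slots.
    active = {vo: s for vo, s in shares.items() if queued.get(vo, 0) > 0}
    norm = sum(active.values())
    return {vo: min(round(total_slots * share / norm), queued[vo])
            for vo, share in active.items()}

shares = {"atlas": 0.5, "cms": 0.3, "lhcb": 0.2}
print(allocate_slots(20, shares, {"atlas": 100, "cms": 4}))
# -> {'atlas': 12, 'cms': 4}: with LHCb idle its share is redistributed,
#    and CMS cannot use more slots than it has queued transfers.
```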

Page 28: Data Handling for LHC: Plans and Reality

Data Distribution
• FTS uses the Storage Resource Manager as an abstract interface to the different storage systems
  – A Good Idea™ but this is not (IMHO) a complete storage abstraction layer and anyway cannot hide fundamental differences in approaches to MSS design
    • Lots of interest in the Amazon S3 interface these days; this doesn’t try to do as much as SRM, but HEP should try to adopt de facto standards.
• Once you have distributed the data, a file catalogue is needed to record which files are available where.
  – LFC, the LCG File Catalogue, was designed for this role as a distributed catalogue to avoid a single point of failure, but other solutions are also used
    • And as many other services rely on CERN, the need for a distributed catalogue is no longer (seen as…) so important.

Page 29: Data Handling for LHC: Plans and Reality

Looking more widely — I
• Only a small subset of the data distributed is actually used
• Experiments don’t know a priori which dataset will be popular
  – CMS has 8 orders of magnitude in access between most and least popular
⇒ Dynamic data replication: create copies of popular datasets at multiple sites (sketched below).
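A minimal sketch of popularity-driven replication: count accesses and add a replica each time a dataset crosses another popularity threshold. The threshold, replica cap, dataset and site names are all invented for the example:

```python
from collections import Counter

accesses = Counter()
replicas = {"higgs-cand-2012": {"CERN"}}
sites = ["CERN", "FNAL", "RAL", "KIT"]
POPULAR = 1000        # accesses per existing replica before adding another
MAX_REPLICAS = 3

def record_access(dataset: str) -> None:
    accesses[dataset] += 1
    held = replicas.setdefault(dataset, set())
    if len(held) < MAX_REPLICAS and accesses[dataset] >= POPULAR * len(held):
        target = next(s for s in sites if s not in held)
        held.add(target)              # stand-in for triggering a real transfer
        print(f"replicating {dataset} to {target}")

for _ in range(2500):
    record_access("higgs-cand-2012")
print(replicas["higgs-cand-2012"])    # -> replicas at CERN, FNAL and RAL
```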

Page 30: Data Handling for LHC: Plans and Reality

Looking more widely — II

[Diagram: the MONARC (2000) hierarchical model: Desktops, University centres (n.10⁶ MIPS, m TByte robot), FNAL (4.10⁷ MIPS, 110 TByte robot) and CERN (n.10⁷ MIPS, m PByte robot), interconnected by 622 Mbit/s links]

• Network capacity is readily available…
• … and it is reliable: a fibre cut during tests in 2009 reduced capacity, but alternative links took over.
• So let’s simply copy data from another site if it is not available locally
  – rather than recalling from tape or failing the job (see the sketch below).
• Inter-connectedness is increasing with the design of LHCOne to deliver (multi-) 10Gb links between Tier2s.
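A sketch of that read fallback, with invented helper stubs standing in for local reads, WAN copies and tape recall:

```python
def read_local_disk(lfn: str) -> bytes:
    raise FileNotFoundError            # stub: pretend there is no local copy

def copy_from_remote(lfn: str, site: str) -> bytes:
    return b"event data"               # stub: WAN copy, e.g. over gridftp

def recall_from_tape(lfn: str) -> bytes:
    return b"event data"               # stub: slow, scheduled tape recall

def read_file(lfn: str, replica_sites: list[str]) -> bytes:
    try:
        return read_local_disk(lfn)    # cheapest: local replica
    except FileNotFoundError:
        pass
    for site in replica_sites:         # next: each remote disk replica in turn
        try:
            return copy_from_remote(lfn, site)
        except ConnectionError:
            continue
    return recall_from_tape(lfn)       # last resort rather than failing the job

print(len(read_file("/atlas/raw/run195/evts.root", ["FNAL", "RAL"])))
```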

Page 31: Data Handling for LHC: Plans and Reality

Metadata Distribution
• Conditions data is needed to make sense of the raw data from the experiments
  – Data on items such as temperatures, detector voltages and gas compositions is needed to turn the ~100M pixel image of the event into a meaningful description in terms of particles, tracks and momenta.
• This data is in an RDBMS, Oracle at CERN, and presents interesting distribution challenges
  – One cannot tightly couple databases across the loosely coupled WLCG sites, for example…
  – Oracle Streams technology improved to deliver the necessary performance, and HTTP caching systems developed to address the need for cross-DBMS distribution.

Average Streams throughput (LCR/s):

                            row size = 100B   row size = 500B   row size = 1000B
Oracle 10g                             4600              2800               1700
Oracle 11gR2                          37000             30000              25000
Oracle 11gR2 (optimized)              40000             40000              34000
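The HTTP caching approach can be sketched as follows. This is a simplified sketch in the spirit of such systems: read-only conditions queries travel as HTTP requests, so identical queries can be answered by a cache near the site instead of the central Oracle service. The URL and query encoding are invented for the example:

```python
import urllib.parse

class ConditionsProxy:
    def __init__(self, origin_url: str):
        self.origin_url = origin_url
        self.cache: dict[str, bytes] = {}
        self.origin_hits = 0

    def query(self, sql: str) -> bytes:
        key = urllib.parse.quote(sql)           # encoded query is the cache key
        if key not in self.cache:
            self.cache[key] = self._fetch(key)  # only a miss reaches the DB
        return self.cache[key]

    def _fetch(self, key: str) -> bytes:
        self.origin_hits += 1                   # stand-in for an HTTP GET of
        return b"conditions payload"            # f"{self.origin_url}?q={key}"

proxy = ConditionsProxy("http://conditions.example.org/data")
proxy.query("SELECT v FROM hv WHERE run = 195847")
proxy.query("SELECT v FROM hv WHERE run = 195847")
print(proxy.origin_hits)                        # -> 1: second query was cached
```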

Page 32: Data Handling for LHC: Plans and Reality

Job Execution Environment
• Jobs submitted to sites depend on large, rapidly changing libraries of experiment-specific code
  – Major problems ensue if updated code is not distributed to every server across the grid (remember, there are x0,000 servers…)
  – Shared filesystems can become a bottleneck if used as a distribution mechanism within a site.
• Approaches
  – Pilot job framework can check to see if the execution host has the correct environment…
  – A global caching file system: CernVM-FS (sketched below).

[Chart, 2011: ATLAS today: 22/1.8M files; 921/115GB]
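A minimal sketch of the global caching filesystem idea behind CernVM-FS: a catalogue maps paths to content hashes, and content is fetched once (over HTTP in reality) and then served from a local cache, so tens of thousands of worker nodes don't each hammer a shared server. The catalogue layout and paths are simplified assumptions, not the real CernVM-FS formats:

```python
import hashlib

def publish(files: dict[str, bytes]):
    """Release-manager side: store content by hash, map each path to a hash."""
    store, catalogue = {}, {}
    for path, data in files.items():
        digest = hashlib.sha1(data).hexdigest()
        store[digest] = data
        catalogue[path] = digest
    return store, catalogue

STORE, CATALOGUE = publish(
    {"/cvmfs/atlas.example/sw/17.2/lib/libCore.so": b"\x7fELF..."})
LOCAL_CACHE: dict[str, bytes] = {}

def open_file(path: str) -> bytes:
    digest = CATALOGUE[path]                  # metadata: path -> content hash
    if digest not in LOCAL_CACHE:             # fetch once per content object...
        LOCAL_CACHE[digest] = STORE[digest]   # ...stand-in for an HTTP GET
    return LOCAL_CACHE[digest]                # ...then every job hits the cache

print(open_file("/cvmfs/atlas.example/sw/17.2/lib/libCore.so")[:4])
```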

Page 33: Data Handling for LHC: Plans and Reality

Outline

• HEP, CERN, LHC and LHC Experiments
• LHC Computing Challenge
• The Technique
  – In outline
  – In more detail
• Towards the Future
• Summary

Page 34: Data Handling for LHC: Plans and Reality

Towards the Future
• Learning from our mistakes
  – We have just completed a review of WLCG operations and services based on 2+ years of operations, with the aim to simplify and harmonise during the forthcoming long shutdown.
  – Key areas to improve are data management & access and exploiting many/multi-core architectures, especially with use of virtualisation.
• Clouds
• Identity Management


Page 37: Data Handling for LHC: Plans and Reality

Integrating With The Cloud?

[Diagram: a Central Task Queue serving Sites A, B and C and, via cloud bursting, a commercial cloud; a User/VO service makes instance requests, running instances pull payloads, and an image maintainer keeps a Shared Image Repository (VMIC)]

(Slide courtesy of Ulrich Schwickerath)


Page 40: Data Handling for LHC: Plans and Reality

Grid Middleware Basics
• Compute Element
  – Standard interface to local workload management systems (batch scheduler)
• Storage Element
  – Standard interface to local mass storage systems
• Resource Broker
  – Tool to analyse user job requests (input data sets, CPU time, data output requirements) and route these to sites according to data and CPU time availability.

Many implementations of the basic principles: Globus, VDT, EDG/EGEE, NorduGrid, OSG

None of this works without…

Page 41: Data Handling for LHC: Plans and Reality


Trust!

Page 42: Data Handling for LHC: Plans and Reality


One step beyond?

Page 43: Data Handling for LHC: Plans and Reality

Outline

• HEP, CERN, LHC and LHC Experiments
• LHC Computing Challenge
• The Technique
  – In outline
  – In more detail
• Towards the Future
• Summary

Page 44: Data Handling for LHC: Plans and Reality

Summary
• WLCG has delivered the capability to manage and distribute the large volumes of data generated by the LHC experiments
  – and the excellent WLCG performance has enabled physicists to deliver results rapidly.
• HEP datasets may not be the most complex or (any longer) massive, but in addressing the LHC computing challenges, the community has delivered
  – the world’s largest computing Grid,
  – practical solutions to requirements for large-scale data storage, distribution and access, and
  – a global trust federation enabling world-wide collaboration.

Page 45: Data Handling for LHC: Plans and Reality


Thank You!

And thanks to Vlado Bahyl, German Cancio, Ian Bird, Jakob Blomer, Eva Dafonte Perez, Fabiola Gianotti, Frédéric Hemmer, Jan Iven, Alberto Pace and Romain Wartel of CERN, Elisa Lanciotti of PIC and K. De, T. Maeno, and S. Panitkin of ATLAS for various unattributed graphics and slides.