European Organization for Nuclear Research
Organisation Européenne pour la Recherche Nucléaire
High-Energy Physics Data
Delivering Data in Science
ICSTI Winter Workshop
Tim Smith – CERN/IT Department
Tim Smith @ ICSTI Workshop, Mar 2012 <2>
Delivering Data in HEP
• Data Storage and Stewardship
• Distribution and Access
• Interpretability, Reusability and Citability
LHC and the Data Deluge
150 million sensors
40 million times/sec
22 PB in 2012
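To put the first two numbers together, here is a back-of-the-envelope product. It is a sketch that assumes every sensor is read out at every one of the 40 million crossings per second, which the slide implies but does not state explicitly.

```python
# Figures quoted on the slide; the multiplication assumes every sensor
# is read at every crossing.
SENSORS = 150_000_000           # "150 million sensors"
READOUTS_PER_SEC = 40_000_000   # "40 million times/sec"

def sensor_readings_per_second(sensors: int, readouts_per_sec: int) -> int:
    """Number of individual sensor readings produced each second."""
    return sensors * readouts_per_sec

rate = sensor_readings_per_second(SENSORS, READOUTS_PER_SEC)
print(f"{rate:.1e} sensor readings per second")  # prints 6.0e+15 ...
```

Numbers of this size are why nothing close to the full readout can be stored, which motivates the selection step on the next slide.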
Just a Drop in the Ocean! …Selection
Protons/bunch: 10¹¹
Crossing rate: 40 million/sec
Collision rate: 1 billion/sec
Filter to 200/sec
[Figure: zooming from bunch → proton → parton (quark, gluon) → particle; example signals: SUSY…, Higgs → Z⁰ Z⁰ → e⁺ e⁻ e⁺ e⁻]
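The rates on this slide can be turned into two useful ratios: the average number of collisions per bunch crossing, and how strongly the filter must reject events. This is a sketch of plain rate arithmetic. It assumes the "200/sec" refers to collision events kept out of the 1 billion/sec produced.

```python
CROSSING_RATE = 40_000_000       # bunch crossings per second
COLLISION_RATE = 1_000_000_000   # proton-proton collisions per second
RECORDED_RATE = 200              # events per second kept after filtering

# Average number of collisions overlapping in a single crossing.
collisions_per_crossing = COLLISION_RATE / CROSSING_RATE

# How many collisions are discarded for every one that is kept.
rejection_factor = COLLISION_RATE / RECORDED_RATE

# Fraction of collisions that survive the filter.
fraction_kept = RECORDED_RATE / COLLISION_RATE

print(collisions_per_crossing, rejection_factor, fraction_kept)
```

Roughly 25 collisions share each crossing, and only about one collision in five million is kept. That is the sense in which the stored data is "just a drop in the ocean".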
Data Storage
6 GB/s
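To give 6 GB/s a sense of scale, the sketch below converts it into a daily volume and into the number of days of uninterrupted writing that would accumulate 22 PB. It assumes decimal units (1 GB = 10⁹ bytes) and a constant rate. Real LHC running is not continuous, so the day count is only an order-of-magnitude illustration and not an operational figure.

```python
# Assumes decimal units: 1 GB = 10**9 B, 1 TB = 10**12 B, 1 PB = 10**15 B.
RATE_BYTES_PER_SEC = 6 * 10**9   # "6 GB/s"
SECONDS_PER_DAY = 86_400

def volume_bytes(rate_bytes_per_sec: float, seconds: float) -> float:
    """Data volume written at a constant rate over a time interval."""
    return rate_bytes_per_sec * seconds

per_day_tb = volume_bytes(RATE_BYTES_PER_SEC, SECONDS_PER_DAY) / 10**12

# Hypothetical: days of non-stop writing at 6 GB/s needed to reach 22 PB.
days_for_22pb = 22 * 10**15 / volume_bytes(RATE_BYTES_PER_SEC, SECONDS_PER_DAY)

print(f"{per_day_tb:.1f} TB/day; 22 PB in about {days_for_22pb:.0f} days")
```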
Data Stewardship: Migration
LHC era 60 PB
LEP era 100 TB
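In storage stewardship, migration usually means copying the archive onto each new generation of media without losing a single bit. The slide contrasts 100 TB from the LEP era with 60 PB in the LHC era. The sketch below shows the core copy-then-verify step. It rests on several assumptions: local files stand in for old and new media, SHA-256 serves as the integrity check, and the function names `checksum` and `migrate` are hypothetical. This is not CERN's actual tape tooling.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path

def checksum(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file, read in chunks so large files fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def migrate(src: Path, dst_dir: Path) -> Path:
    """Copy src into dst_dir and only accept the copy if checksums match."""
    dst_dir.mkdir(parents=True, exist_ok=True)
    dst = dst_dir / src.name
    expected = checksum(src)
    shutil.copy2(src, dst)
    if checksum(dst) != expected:
        dst.unlink()  # never keep an unverified copy
        raise OSError(f"checksum mismatch migrating {src}")
    return dst

# Usage with temporary directories standing in for "old" and "new" media.
with tempfile.TemporaryDirectory() as tmp:
    old_media = Path(tmp) / "old_media"
    old_media.mkdir()
    original = old_media / "run001.dat"
    original.write_bytes(b"example payload")
    copied = migrate(original, Path(tmp) / "new_media")
    migrated_ok = copied.read_bytes() == original.read_bytes()
```

Computing the source checksum before copying means corruption introduced during the copy is caught before the old medium is retired.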
Data Distribution & Access
11 Tier-1 (T1) sites, 140 Tier-2 (T2) sites, Tier-3 (T3) sites
Worldwide LHC Computing Grid
Data Access
[Figure: data pyramid, top to base: Publication data; Derived physics data; Analysis Object Data; Reconstructed Data; Raw Data / Simulated Data. Replication labels: ×N, ×tens, ×few; 22 PB at T0, distributed through T1, T2 and T3; 70 PB worldwide]
Data Access ≠ Data Usability
Data Reuse: Raw/Processed Data
• Reuse of the Reconstructed & Analysis Object Data
  – Calibrations, configurations
  – Conditions DBs: tens of TBs
  – Reconstruction and identification algorithms
  – Detector response parameterizations
  – Software: millions of lines of code
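Conditions data matter for reuse because an event must be reprocessed with the calibration that was valid when it was recorded. The sketch below is a toy version of that lookup, using an interval of validity keyed by run number. The class `ConditionsFolder`, its layout and the gain values are all hypothetical. It is not any experiment's conditions-database API.

```python
import bisect
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class ConditionsFolder:
    """Toy store: each payload is valid from its first run until the next entry starts."""
    starts: List[int] = field(default_factory=list)
    payloads: List[Dict[str, float]] = field(default_factory=list)

    def add(self, first_run: int, payload: Dict[str, float]) -> None:
        # Keep entries sorted by the run at which they become valid.
        i = bisect.bisect_left(self.starts, first_run)
        self.starts.insert(i, first_run)
        self.payloads.insert(i, payload)

    def lookup(self, run: int) -> Dict[str, float]:
        # Latest entry whose start is <= run.
        i = bisect.bisect_right(self.starts, run) - 1
        if i < 0:
            raise KeyError(f"no conditions valid for run {run}")
        return self.payloads[i]

# Hypothetical calibration constants, for illustration only.
calib = ConditionsFolder()
calib.add(100, {"gain": 1.00})
calib.add(250, {"gain": 1.03})
```

With these toy entries, `calib.lookup(249)` returns the gain valid from run 100, and `calib.lookup(250)` returns the one valid from run 250. Without the stored conditions, the same raw data would be reconstructed differently.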
Data Reuse: Publication Data
• Published observables
  – Model-independent measurements
  – Distributions and cross-sections
  – HEPData: tabular
  – DOIs
• Rivet routines
  – Parameterize analysis acceptance
  – Compare simulated & measured data
  – http://rivet.hepforge.org/
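Comparing simulated and measured data usually starts as a bin-by-bin test against a published table. The sketch below is plain Python, not Rivet's or HEPData's API. It assumes a hypothetical row layout `(bin_low, bin_high, value, uncertainty)` and uses toy numbers rather than real measurements. It also treats bins as uncorrelated, a simplification that the next slide's pitfalls warn about.

```python
from typing import List, Tuple

# Hypothetical tabular layout: (bin_low, bin_high, value, uncertainty).
Row = Tuple[float, float, float, float]

def chi2(measured: List[Row], simulated: List[float]) -> float:
    """Bin-by-bin chi-square, treating bins as uncorrelated."""
    if len(measured) != len(simulated):
        raise ValueError("measured and simulated must have the same binning")
    total = 0.0
    for (_, _, value, err), pred in zip(measured, simulated):
        total += ((value - pred) / err) ** 2
    return total

# Toy numbers for illustration only; not real measurements.
measured = [(0.0, 10.0, 10.0, 1.0), (10.0, 20.0, 20.0, 2.0), (20.0, 30.0, 15.0, 1.5)]
simulated = [11.0, 18.0, 15.0]
chi2_value = chi2(measured, simulated)
ndf = len(measured)
print(f"chi2/ndf = {chi2_value:.1f}/{ndf}")  # chi2/ndf = 2.0/3
```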
Data Reuse: Derived Physics Data
• Access, ability to reinterpret
  – Reanalysis with new QCD calculations
  – Combination with data from future colliders
  – …serendipitous discovery
• Pitfalls: large investment of effort required
  – Correlations, efficiencies, systematic uncertainties
  – Backgrounds estimated from data-driven techniques
    • Intertwined with event selection criteria
    • Searches…
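Correlations are a pitfall because ignoring them changes the answer. The sketch below uses the standard best linear unbiased estimate (BLUE) for two measurements of the same quantity with correlation coefficient ρ. It is chosen here as an illustration and is not taken from the talk. The inputs are toy values, and real combinations use full covariance matrices over many uncertainty sources.

```python
import math

def blue_two(x1: float, s1: float, x2: float, s2: float, rho: float):
    """Combine two measurements (x1 ± s1, x2 ± s2) with correlation rho.

    Returns (combined value, combined uncertainty) using the two-measurement
    BLUE weights w1 = (s2^2 - rho*s1*s2) / (s1^2 + s2^2 - 2*rho*s1*s2).
    """
    denom = s1**2 + s2**2 - 2 * rho * s1 * s2
    if denom <= 0:
        raise ValueError("degenerate covariance: cannot combine")
    w1 = (s2**2 - rho * s1 * s2) / denom
    w2 = 1.0 - w1
    value = w1 * x1 + w2 * x2
    variance = (s1**2 * s2**2 * (1 - rho**2)) / denom
    return value, math.sqrt(variance)

# Toy inputs: the same pair of measurements, combined with and without correlation.
uncorrelated = blue_two(10.0, 1.0, 12.0, 1.0, rho=0.0)
correlated = blue_two(10.0, 1.0, 12.0, 1.0, rho=0.5)
print(uncorrelated, correlated)
```

In both cases the central value is 11.0. Assuming no correlation gives an uncertainty of about 0.71, while ρ = 0.5 gives about 0.87. A reinterpretation that drops the correlation would overstate its precision.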
Data Reuse: Derived Physics Data
• RECAST
  – Limits of an existing search for an alternative hypothesis
  – Brokering service
  – Collaboration
    • Archives the analysis code
    • Provides authority
• Digital Preservation in HEP
  – http://www.dphep.org/
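The RECAST roles listed above can be pictured as a small request workflow. A requester asks for an existing search to be rerun on an alternative hypothesis. The broker routes the request, and the collaboration, which holds the archived code and the authority, decides whether it runs. Every class, method and value below is hypothetical and is not the real RECAST interface. The "limit" returned by the toy search is a placeholder number, not a physics result.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

@dataclass
class RecastRequest:
    analysis_id: str           # which archived search to reuse
    model: str                 # alternative hypothesis to test
    status: str = "submitted"
    result: Optional[float] = None

class Broker:
    """Hypothetical broker routing requests to archived analysis code."""

    def __init__(self) -> None:
        self.archive: Dict[str, Callable[[str], float]] = {}
        self.queue: List[RecastRequest] = []

    def archive_analysis(self, analysis_id: str, run: Callable[[str], float]) -> None:
        # The collaboration deposits its analysis; the broker only keeps a handle.
        self.archive[analysis_id] = run

    def submit(self, analysis_id: str, model: str) -> RecastRequest:
        if analysis_id not in self.archive:
            raise KeyError(f"no archived analysis {analysis_id!r}")
        request = RecastRequest(analysis_id, model)
        self.queue.append(request)
        return request

    def process(self, approve: Callable[[RecastRequest], bool]) -> None:
        # The collaboration keeps authority: nothing runs without its approval.
        for request in self.queue:
            if request.status != "submitted":
                continue
            if approve(request):
                request.result = self.archive[request.analysis_id](request.model)
                request.status = "done"
            else:
                request.status = "declined"

# Toy stand-in for an archived search; returns a placeholder number.
def toy_search(model: str) -> float:
    return 0.5 if model == "model-A" else 2.0

broker = Broker()
broker.archive_analysis("search-1", toy_search)
req_a = broker.submit("search-1", "model-A")
req_b = broker.submit("search-1", "model-B")
broker.process(approve=lambda r: r.model == "model-A")
```

Keeping the approval step separate from the broker reflects the slide's split: the service brokers requests, while the collaboration retains the code and the authority over results.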
WLCG
Delivering HEP data to scientists around the world
Questions?