ATLAS Data Preservation and Access Roger Jones

Data Preservation & Access

Opening data access
– Preparatory discussions with “management”, CB chair, authorship and Pubcom chairs
– Has clear implications for authorship/membership rules

Needs CB-level discussion

Past experience says these topics provoke long discussion in the CB!

Common principles proposed in the straw man from the LHC experiments' Data Policy Harmonization Group

This has been reviewed by the SIPB and taken to CERN Council to become a “policy suggestion”

A draft policy is with the management for discussion & has been seen by the ICB

ATLAS DMP Organization

Data Preservation now included as part of the upgrade activity planning
– May increase the funding options – some evidence already
– Data Management Planning is now required by some funders for upgrade grants
– Looking at the cost/benefit of various strategies
– Resource tensioning with other upgrade activities

Principles for preservation & access

General agreement that RAW data is preserved for the experiment and for the future; open access to it is not usually possible even for collaboration members (level 4 data) and is not proposed for general use

Full reconstruction outputs for analysis might be made available after an embargo period, length to be decided but clearly several years. The resource implications of making this useful are high. (Level 3 data)

We support limited access to samples in simple formats for outreach and teaching (level 2 data), but these are best integrated with our presenter tools

Techniques like Recast may make data (information) usefully available, although this does not meet all the open access criteria for levels 2 & 3

We already make data from papers and supporting information available through HepData/INSPIRE (Level 1 data)

Data Preservation Policies

Data Preservation

There are DP policies implied in the Computing TDRs
– Conserve all raw data during the lifetime of the experiment
– All formats & code used for paper analyses to be archived
– Tier 0/1s responsible for the physical preservation

Some tacit belief that older sets may be ‘retired’
– Retired data no longer to be on disk or under active analysis
– This may need to be revised, e.g. if external access is then granted
– Obvious resource implications

First priority is to preserve data for active use by the collaboration

ATLAS DP Practical Steps

Making sure raw data can be reprocessed long-term (Level 4)
– Identifying key datasets for ‘unique data’ preservation
– Setting up regular reprocessing and validation
– This has been underway as a test case for the 2009 data, but progress is slow
– Forward/backward compatibility issues illustrated in John Chapman’s talk on simulation release plans, 14/3/13

Ensure the capability to run old trigger selections offline

AODfixing will help (reprocessing at analysis format level)
– This means level 4 operations can be applied to level 3 AOD format

Digesting validation results

Must display the results of the validation in a comprehensible way: web based interface

The test must determine the nature of the results
– Could be simple yes/no, plots, ROOT files, text files with keywords or length, ...

Need for semi-automated, detailed physics validation

David South is on ATLAS and was central to the DESY SP and DPHEP activities
– Identify the useful common components
– Identify the ATLAS-specific elements
– Set up CERN-based instance for ATLAS (and others?)
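As an illustration of the "text files with keywords or length" case, the sketch below checks a text output against required keywords and a minimum length, then collects the yes/no results into a summary that a web page could display. All names (`ValidationResult`, `check_text_output`, `summarise`) and the example log contents are hypothetical; this is not the DESY or ATLAS validation framework.

```python
from dataclasses import dataclass


@dataclass
class ValidationResult:
    """Outcome of one validation test (hypothetical structure)."""
    test_name: str
    passed: bool
    detail: str


def check_text_output(test_name, text, required_keywords, min_lines=1):
    """Pass if the text has at least min_lines lines and contains every keyword."""
    lines = text.splitlines()
    if len(lines) < min_lines:
        return ValidationResult(
            test_name, False,
            f"only {len(lines)} lines, expected at least {min_lines}")
    missing = [k for k in required_keywords if k not in text]
    if missing:
        return ValidationResult(
            test_name, False, "missing keywords: " + ", ".join(missing))
    return ValidationResult(test_name, True, "ok")


def summarise(results):
    """Render one line per test plus an overall pass count."""
    rows = [f"{'PASS' if r.passed else 'FAIL'}  {r.test_name}: {r.detail}"
            for r in results]
    n_pass = sum(r.passed for r in results)
    rows.append(f"{n_pass}/{len(results)} tests passed")
    return "\n".join(rows)


# Illustrative inputs only; real tests would read job outputs from disk.
good = check_text_output(
    "reco_log", "Run finished\nevents processed: 100\n",
    ["finished", "events processed"], min_lines=2)
bad = check_text_output("sim_log", "crash", ["finished"])
report = summarise([good, bad])
```

Plots and ROOT files would need their own comparison functions; only the summary step would be shared.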

Existing open datasets

The CB has authorized various datasets in (level-2) outreach formats for open use in education/outreach
– Event displays for interactive analysis (MINERVA/HYPATIA/LPPP/CAMELIA)
– JIVE-XML, ROOT format data
– Absolutely not intended for any serious analysis, but illustrative

ATLAS Zpath

Master the invariant mass technique
– to study and measure the (Z, J/ψ, Υ) decaying to l+l-
– to search for new physics (Z’)
– and the Higgs boson in γγ and l+l-l+l-

• HYPATIA using the ATLANTIS event display
• Data from 2011
– 13000 events, ~2.5 GB (password protected, 100 open)
– 13 data groups/directories, 20 subgroups (A-T), and 50 events/mixed sample/2 students
– 50% Z, 30% γγ, 10% (J/ψ, Υ), 5% Z', 5% l+l-l+l-
• Higgs candidate events
– 1 fb-1 and cuts according to ATLAS publication
– 125 GeV Higgs MC signals ready to upload (1 fb-1, 10 fb-1, 25 fb-1)

M(γγ)=125 GeV

M(eeμμ)=123 GeV
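The invariant mass technique used throughout the Z-path exercise can be sketched as follows: build a four-vector for each lepton or photon from (pT, η, φ), sum them, and take m² = E² − |p|². The function names and the example kinematics are illustrative assumptions, not part of HYPATIA or ATLANTIS.

```python
import math


def four_vector(pt, eta, phi, mass=0.0):
    """Return (E, px, py, pz) in GeV from transverse momentum, pseudorapidity and azimuth."""
    px = pt * math.cos(phi)
    py = pt * math.sin(phi)
    pz = pt * math.sinh(eta)
    energy = math.sqrt(px * px + py * py + pz * pz + mass * mass)
    return (energy, px, py, pz)


def invariant_mass(particles):
    """Invariant mass of a list of (E, px, py, pz) four-vectors."""
    energy = sum(p[0] for p in particles)
    px = sum(p[1] for p in particles)
    py = sum(p[2] for p in particles)
    pz = sum(p[3] for p in particles)
    m2 = energy * energy - (px * px + py * py + pz * pz)
    # Guard against tiny negative values from floating-point rounding.
    return math.sqrt(max(m2, 0.0))


# Made-up example: two massless leptons, back to back in the transverse plane.
lepton1 = four_vector(45.6, 0.0, 0.0)
lepton2 = four_vector(45.6, 0.0, math.pi)
m_ll = invariant_mass([lepton1, lepton2])
```

The same `invariant_mass` call takes two photons for M(γγ) or four leptons for M(llll), since it simply sums whatever four-vectors it is given.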

ATLAS Zpath tests

OPloT: Mll and/or Mγγ and/or Mllll to be discussed locally

Moderator: 1 slide with 3 invariant masses; invariant mass as a tool to identify particles, to discover new particles, and to search for exotic particles

Web pages updated and measurement ready
http://www.physicsmasterclasses.org/exercises/ATLAS-2013/en/zpath.htm
– Introduced Higgs
– Described new measurements
– Prepared material for instructors, moderators, for discussions, …

OPloT Tests 2013

Higgs comments
– 4l provided without requiring 2l from Z, with lower cut on other pair
– γγ: provide MC with 125 GeV Higgs and background
– Upload 125 GeV Higgs MC ((1) & 10 & 25 fb-1)

Measurements
– W→lν
– W+/W- ratio
– Angular distribution between leptons in WW events

MINERVA program using the ATLANTIS event display

2011 real data: 693 WW/Higgs candidates (from released 1fb-1) mixed with 5307 W and other background events

Histogram tool

spreadsheet and histogram websites connected with database

New measurement tested

ATLAS W-path with real WW (+H) events

ATLAS W-path

• Data from 2011, 1.1 fb-1
• 350 should be WW (w/o Higgs)
• 160 should be ttbar or single top
• 120 should be Z+Jets
• 50 should be W+Jets
• 15 should be from H→WW
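For discussion with students, the expected composition can be turned into fractions with a few lines. The counts are copied from the list above; the dictionary labels and variable names are illustrative.

```python
# Expected event counts in the W-path candidate sample, as listed on the slide.
expected_counts = {
    "WW (w/o Higgs)": 350,
    "ttbar or single top": 160,
    "Z+Jets": 120,
    "W+Jets": 50,
    "H->WW": 15,
}

total = sum(expected_counts.values())  # 695 expected events in all

# Fraction of the sample attributed to each process.
fractions = {name: count / total for name, count in expected_counts.items()}

for name, frac in fractions.items():
    print(f"{name:22s} {100 * frac:5.1f}%")
```

The total of 695 is close to the 693 WW/Higgs candidates quoted for the 2011 sample, and the H→WW contribution is only about 2% of it.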

γγ or e+e-?

Left: pT > 1 GeV; right: pT > 5 GeV
2 apparent tracks pointing to 2 calorimeter objects

Zoom reveals 2 pairs of e+e-

• The conclusion is that the 2 calorimeter objects correspond to 2 photons, which have converted and led to 4 tracks; the tracks from one pair had fewer than 3 pixel hits

• So, to be classified and entered as γγ

Level-2 observations

The applications are all trying to illustrate the analyses and physics in the true context of a detector

They use ATLANTIS as a presenter in most cases, which defines the natural common format
– Other formats would require an additional interface, to what benefit?
– Use case and resource justification for a common format not clear

ATLAS Analysis Roger Jones

The Generic Analysis Flow

Level 3

ATLAS has no approved level-3 formats for external use, and any release would require such approval

We are concerned that anything released should be useful and should not consume large amounts of collaboration effort (both in production and in response)

As such, tools like Recast are more attractive
– The information incorporates the efficiency, acceptances and corrections – so is robust
– It also helps meet the internal requirement of full documentation of analyses

Analysis Practical steps - RECAST

Framework developed to extend impact of existing analyses

Candidate for within-experiment and long-term analysis archival, encapsulating the full trigger & event selection, data, backgrounds, systematics

arXiv:1010.2506

Allow an existing analysis to be reinterpreted under an alternate model hypothesis
– Complete information from original analysis, including the tacit information, contained in the data
– Not optimized for the new model, but more reliable than a naïve reanalysis?

Recast seen as a very promising solution for preserving analyses and useful, cost effective preservation of information – addresses levels ~1-~3
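The idea of reinterpretation can be illustrated with a toy cut-and-count case. If the original analysis set an upper limit on the number of signal events, and the archived selection gives an acceptance × efficiency for the new model, the limit translates to a cross-section limit σ = N_limit / (L × A×ε). This is a deliberately simplified sketch, not the RECAST framework's interface; the function name and all numbers are hypothetical.

```python
def recast_cross_section_limit(n_signal_limit, luminosity_fb,
                               acceptance_times_efficiency):
    """Toy reinterpretation: upper limit on cross-section (fb) for a new model.

    n_signal_limit: upper limit on signal events from the original analysis.
    luminosity_fb: integrated luminosity of the original dataset, in fb-1.
    acceptance_times_efficiency: fraction of new-model events passing the
        original (archived) selection, between 0 and 1.
    """
    if luminosity_fb <= 0:
        raise ValueError("luminosity must be positive")
    if not 0 < acceptance_times_efficiency <= 1:
        raise ValueError("acceptance x efficiency must be in (0, 1]")
    return n_signal_limit / (luminosity_fb * acceptance_times_efficiency)


# Hypothetical numbers: 10 events excluded in 1 fb-1, new model passes 5% of the time.
sigma_limit_fb = recast_cross_section_limit(10.0, 1.0, 0.05)
```

The selection is held fixed and only the signal model changes, which is why the result is not optimized for the new model but reuses the original backgrounds and systematics.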