
ATLAS: Database Strategy (and Happy Thanksgiving)

Elizabeth Gallas (Oxford)

Distributed Database Operations Workshop CERN: November 26-27, 2009

26-Nov-2009 Elizabeth Gallas 2

Outline
Thanks!
Overview of Oracle usage in ATLAS
Highlight and describe schemas which are distributed:
  AMI (limited distribution: Lyon → CERN)
  Trigger Database (summary and usage)
  Conditions Database (many systems under one roof)
'Conditions' in the ATLAS Conditions Database
Database distribution technologies – why Frontier?
TAG Database: architecture, services, resource planning
Distributed database access at Tier-1: accounts and profiles, the ATLAS COOL Pilot, monitoring

Summary

26-Nov-2009 Elizabeth Gallas 3

Supporting the databases is critical! Special thanks to:

CERN Physics Database Services – support Oracle-based services at CERN and coordinate Distributed Database Operations (WLCG 3D)

Tier-1 (and Tier-2) DBAs and system managers – essential component for ATLAS success

The majority of ATLAS processing happens on "The Grid"

ATLAS DBAs: Florbela Viegas, Gancho Dimitrov
- Schedule/apply Oracle interventions
- Advise us on application development
- Coordinate database monitoring (experts, shifters)

- Helping to develop, maintain and distribute this critical data takes specialized knowledge and considerable effort which is frequently underestimated.

Thanks to Oracle Operations Support

26-Nov-2009 Elizabeth Gallas 4

Oracle is used extensively at every stage of data taking & analysis:

Configuration
Detector Description – Geometry
OKS – Configuration databases for the TDAQ
Trigger – Trigger configuration (online and simulation)
PVSS – Detector Control System (DCS) configuration & monitoring

File and job management
T0 – Tier-0 processing
DQ2/DDM – distributed file and dataset management
Dashboard – monitor jobs and data movement on the ATLAS grid
PanDA – workload management: production & distributed analysis

Dataset selection
AMI – dataset selection catalogue

Conditions data (non-event data for offline analysis): Conditions Database in Oracle [plus POOL files in DDM, referenced from the Conditions DB]

Event summary – event-level metadata: TAGs ease selection of and navigation to events of interest

Overview – Oracle usage in ATLAS
[Table on the original slide: each database listed against its distribution to grid sites; visible entries include "Lyon → CERN" and "volunteer hosts"]

26-Nov-2009 Elizabeth Gallas 5

ATLAS 3D replication snapshot (Oracle Streams, CERN → Tier-1 sites)

Conditions DB – data needed for offline analysis
Trigger DB

Muon conditions from the muon calibration centres (Munich, Michigan, Rome) → CERN

AMI (Lyon → CERN)

ATLAS TAG DB (customized distribution, not using Oracle Streams)
"Upload" mechanism described later; only to volunteer host sites:
CERN, TRIUMF, RAL, DESY, BNL?

26-Nov-2009 Elizabeth Gallas 6

AMI (Metadata Catalogue) replication
AMI DB: master at Lyon; read-only replica at CERN via Oracle Streams

AMI web services are more robust as a result; other Tier-1s need not worry (a read-with-failover sketch follows the diagram below).

[Diagram: AMI catalogue replication – Oracle master at CC-IN2P3 (servers ccami01/ccami02, ami.in2p3.fr), read-only Oracle replica at CERN (ami.cern.ch) via 3D replication, plus a legacy MySQL instance at LPSC. Picture from Solveig Albrand/AMI team]
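The master/replica layout above suggests a simple read pattern: prefer the nearby read-only replica and fall back to the master. The sketch below is only an illustration of that idea, assuming Python with the cx_Oracle client; the DSN aliases, password handling and table names are placeholders, not the real AMI schema.

# Illustrative only: read with failover from the CERN read-only AMI replica
# to the Lyon master. DSNs and the table name are hypothetical placeholders.
import cx_Oracle

REPLICAS = [
    ("AMI_READER", "secret", "cern_ami_replica"),   # read-only copy (Oracle Streams)
    ("AMI_READER", "secret", "lyon_ami_master"),    # master at CC-IN2P3
]

def query_ami(sql, params=()):
    """Try each service in order; return rows from the first one that answers."""
    last_error = None
    for user, password, dsn in REPLICAS:
        try:
            conn = cx_Oracle.connect(user, password, dsn)
            try:
                cur = conn.cursor()
                cur.execute(sql, params)
                return cur.fetchall()
            finally:
                conn.close()
        except cx_Oracle.DatabaseError as exc:
            last_error = exc          # remember the error and try the next service
    raise RuntimeError("no AMI service reachable: %s" % last_error)

# Example call (hypothetical table/column names):
# rows = query_ami("SELECT dataset_name FROM ami_datasets WHERE rownum <= :n", (5,))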

26-Nov-2009 Elizabeth Gallas 7

Trigger Database: Usage and Replication
The Trigger DB lies at the heart of the Trigger Configuration system (the red components in the diagram). Paul Bell gave a summary in March 2009:

http://indicobeta.cern.ch/conferenceDisplay.py?confId=43390

A subset of Trigger DB information is stored in the Conditions DB and used by many standard ATLAS analyses.

But more information is needed from the Trigger DB for specialized tasks

Therefore:
The online Trigger DB on ATONR is replicated to ATLR via Oracle Streams, and replicated to Tier-1 via Oracle Streams – this allows (re-)running the HLT at Tier-1.

The simulation Trigger DB on ATLR is replicated to Tier-1 via Oracle Streams – this allows the configuration of MC jobs.

Storage in Oracle (some data previously in XML) ensures safe archival of all menus used and easy browsing using web access tools.

Replication to Tier-2 for both the online and simulation Trigger DBs is via the "DB Release": SQLite files shipped with the DB Release (see the sketch below).
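Since Tier-2s receive the Trigger DB as SQLite files inside a DB Release, a quick way to see what a shipped file contains is to list its tables. A minimal sketch, assuming Python's built-in sqlite3 module; the file name is a placeholder, not the actual DB Release layout.

# Minimal sketch: inspect a Trigger DB SQLite file shipped with a DB Release.
# "TriggerMenu.db" is a placeholder name chosen for illustration.
import sqlite3

def list_tables(sqlite_file):
    """Return the table names found in an SQLite replica."""
    conn = sqlite3.connect(sqlite_file)
    try:
        cur = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")
        return [row[0] for row in cur.fetchall()]
    finally:
        conn.close()

if __name__ == "__main__":
    for table in list_tables("TriggerMenu.db"):
        print(table)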

[Diagram of the Trigger configuration system: configuration uploading via TriggerTool and TriggerPanel into the TriggerDB; online operation (Run Control) using CORAL "loader classes" for the L1 H/W configuration and HLT S/W configuration; offline access via Conf2COOL into the COOL Conditions Database]

26-Nov-2009 Elizabeth Gallas 8

"Conditions" in the Conditions DB

"Conditions" – a general term for information which is not "event-wise", reflecting the conditions or states of a system; conditions are valid for an interval ranging from very short to infinity.

Any conditions data needed for offline processing and/or analysis must be stored in the ATLAS Conditions Database (aka COOL) or in its referenced POOL files (DDM).

[Diagram: the ATLAS Conditions Database (any non-event-wise data needed for offline processing/analysis) gathers many systems under one roof: ZDC, DCS, TDAQ, OKS, LHC, Data Quality, Configuration, …]

26-Nov-2009 Elizabeth Gallas 9

Conditions DB in ATLAS

Used at many stages of data taking and analysis: from online calibrations, alignment and monitoring, to offline … processing … more calibrations … further alignment … reprocessing … analysis … to luminosity and data quality.

An IOV ('Interval of Validity') based database: all data are indexed by interval (in time or run/LB). This makes it possible to extract subsets of the data in distinct intervals across subsystems – the basis of the Conditions part of the "DB Release" (the idea is illustrated in the sketch below).
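To make the IOV idea concrete, the toy sketch below indexes payloads by a half-open validity interval and looks up the payload valid at a given (run, LB). It is purely illustrative and is not the COOL implementation; the folder class, keys and payloads are invented for the example.

# Toy illustration of Interval-of-Validity (IOV) indexing: each payload is
# valid for [since, until).  Keys are (run, lumiblock) tuples.  Not COOL code.
import bisect

class IovFolder(object):
    def __init__(self):
        self._since = []     # sorted list of interval start keys
        self._records = []   # parallel list of (since, until, payload)

    def store(self, since, until, payload):
        pos = bisect.bisect_left(self._since, since)
        self._since.insert(pos, since)
        self._records.insert(pos, (since, until, payload))

    def retrieve(self, when):
        """Return the payload whose interval contains 'when', or None."""
        pos = bisect.bisect_right(self._since, when) - 1
        if pos >= 0:
            since, until, payload = self._records[pos]
            if since <= when < until:
                return payload
        return None

folder = IovFolder()
folder.store((152166, 1), (152166, 100), {"hv": 1500.0})     # invented example values
folder.store((152166, 100), (152166, 400), {"hv": 1520.0})
print(folder.retrieve((152166, 150)))                         # -> {'hv': 1520.0}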

Athena (the ATLAS software framework) allows jobs to access any Conditions. Considerable improvements in Release 15:
TWiki: DeliverablesForRelease15 – more efficient use of COOL connections, enabling of Frontier/Squid connections to Oracle, IOVDbSvc refinements … and much more.

Continued refinement by subsystem, as we chip away at inefficient usage.

Relies on considerable infrastructure: COOL, CORAL, Athena (developed by ATLAS and CERN IT) – a generic schema design which can store, accommodate and deliver a large amount of data for a diverse set of subsystems.

26-Nov-2009 Elizabeth Gallas 10

Conditions Distribution Issues
Oracle stores a huge amount of essential data "at our fingertips".

But ATLAS has many… many… many… fingers, which may be looking for anything from the oldest to the newest data.

Conditions in Oracle: master copy at Tier-0, selected replication to Tier-1. Running jobs at Oracle sites (direct access) performs well.

It is important to continue testing and to optimize the RACs. But for direct Oracle access on the grid from a remote site over the Wide Area Network:

Even after tuning efforts, direct access requires many back/forth communications on the network – excessive RTT (Round Trip Time)… SLOW (illustrated below).
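A rough back-of-the-envelope number shows why RTT dominates over the WAN; the query count and RTT values below are assumed for illustration only, not measurements.

# Illustrative only: latency cost of a chatty database connection over the WAN.
n_round_trips = 2000          # assumed number of back/forth exchanges per job
rtt_lan = 0.0005              # ~0.5 ms within the computer centre (assumed)
rtt_wan = 0.300               # ~300 ms from a far remote site (assumed)

print("LAN network wait: %.1f s" % (n_round_trips * rtt_lan))   # ~1 s
print("WAN network wait: %.1f s" % (n_round_trips * rtt_wan))   # ~600 s, i.e. 10 min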

Cascade effect: jobs hold connections for longer … which prevents new jobs from starting …

Use alternative technologies, especially over WAN: “caching” Conditions from Oracle when possible

[Diagram: online Conditions DB and offline master Conditions DB, Tier-1 replicas, the Tier-0 farm, calibration updates, and the isolation/cut between the computer centre and the outside world]

26-Nov-2009 Elizabeth Gallas 11

Alternatives to direct Oracle access

"DB Release": make a system of files containing the data needed. Used in reprocessing campaigns. Includes:
SQLite replicas: "mini" Conditions DBs with specific folders, IOV range and COOL tag (a "slice" – a small subset of all rows in the Oracle tables), plus the associated POOL files and POOL File Catalogue (PFC). A simplified illustration of such a slice follows below.
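A hypothetical, self-contained illustration of what such a "slice" looks like at the file level: a small SQLite database holding only the folders, IOV range and tag a job needs. The schema, folder names and tag below are deliberately simplified inventions, not the real COOL layout.

# Simplified illustration of a Conditions "slice" in SQLite (NOT the COOL schema):
# only selected folders, a limited IOV range and one tag are copied into a file.
import sqlite3

conn = sqlite3.connect(":memory:")          # a DB Release would ship a real file
conn.execute("""CREATE TABLE conditions (
                    folder    TEXT,
                    channel   INTEGER,
                    iov_since INTEGER,
                    iov_until INTEGER,
                    tag       TEXT,
                    payload   TEXT)""")

# Pretend these rows were extracted from Oracle for one reprocessing campaign.
rows = [
    ("/TRT/Calib/RT", 0, 152000, 153000, "EXAMPLE-TAG-01", "rt-table-v3"),
    ("/TRT/Calib/T0", 0, 152000, 153000, "EXAMPLE-TAG-01", "t0-table-v3"),
]
conn.executemany("INSERT INTO conditions VALUES (?,?,?,?,?,?)", rows)

# A job then asks only for what is valid for its run, folder and tag.
run = 152166
cur = conn.execute(
    "SELECT payload FROM conditions "
    "WHERE folder = ? AND tag = ? AND iov_since <= ? AND ? < iov_until",
    ("/TRT/Calib/RT", "EXAMPLE-TAG-01", run, run))
print(cur.fetchall())        # -> [('rt-table-v3',)]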

"Frontier": store the results of Oracle queries in a web cache. Developed by Fermilab and used by CDF; adopted and further refined for CMS. Thanks to Dave Dykstra/CMS for a lot of advice!

Basic idea: Frontier/Squid servers located at/near the Oracle RAC negotiate transactions between grid jobs and the Oracle DB (load levelling). Caching the results of repeated queries reduces the load on Oracle and reduces the latency observed when connecting to Oracle over the WAN (conceptual sketch below).
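The essence of the idea is that a read-only query becomes an HTTP GET, so a standard web cache (Squid) can serve repeated identical requests without touching Oracle. The sketch below only mimics that flow in-process; the servlet URL and the query encoding shown are illustrative, not the real Frontier protocol.

# Conceptual sketch of Frontier-style caching: identical queries map to the
# same URL, so a cache (here a dict standing in for Squid) can answer repeats.
import base64

FRONTIER_SERVLET = "http://frontier.example.org:8000/atlas/Frontier"  # hypothetical

def query_to_url(sql):
    """Encode a read-only query into a cacheable URL (illustrative encoding)."""
    encoded = base64.urlsafe_b64encode(sql.encode()).decode()
    return "%s?type=frontier_request&p1=%s" % (FRONTIER_SERVLET, encoded)

squid_cache = {}            # URL -> cached result
oracle_hits = 0

def run_query(sql):
    global oracle_hits
    url = query_to_url(sql)
    if url not in squid_cache:              # cache miss: go to Oracle (simulated)
        oracle_hits += 1
        squid_cache[url] = "result-of:" + sql
    return squid_cache[url]

for _ in range(1000):                        # 1000 jobs asking the same question
    run_query("SELECT * FROM conditions WHERE run = 152166")
print("Oracle was contacted %d time(s) for 1000 identical requests" % oracle_hits)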

Additional Squid servers at remote sites help even more. Douglas Smith will talk later today about Frontier/Squid deployment.

“DB Release” works well for reprocessing, but not well for many other use cases…

26-Nov-2009 Elizabeth Gallas 12

ATLAS TAGs in the ATLAS Computing model

Stages of ATLAS reconstruction:
RAW data file
ESD (Event Summary Data) ~ 500 kB/event
AOD (Analysis Object Data) ~ 100 kB/event
TAG (not an acronym) ~ 1 kB/event (stable) – see the illustrative arithmetic below
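Those per-event sizes are what make event-level selection via TAGs attractive; a quick illustrative calculation is shown below, with the event count assumed purely for the example.

# Illustrative only: storage implied by the per-event sizes quoted above.
n_events = 1.0e9                      # assumed number of events
sizes_kb = {"ESD": 500.0, "AOD": 100.0, "TAG": 1.0}

for sample, kb_per_event in sizes_kb.items():
    total_tb = n_events * kb_per_event * 1e3 / 1e12   # kB -> bytes -> TB
    print("%s: %.0f TB for %.0e events" % (sample, total_tb, n_events))
# TAG: ~1 TB, small enough to host in Oracle; ESD: ~500 TB stays in files.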

TAGs are produced in reconstruction in 2 formats:

File-based, in AthenaAwareNTuple (AANT) format – TAG files are distributed to all Tier-1 sites.

Oracle database – the event TAG DB is populated from the files in an "upload" process, can be re-produced in reprocessing, and is available globally through a network connection.

In addition:
"Run metadata" at temporal, fill, run and LB levels; file- and dataset-related metadata.
The TAG Browser (ELSSI) uses the combined event, run, file … metadata.

[Diagram: reconstruction chain RAW → ESD → AOD → TAG]

26-Nov-2009 Elizabeth Gallas 13

TAG Database Architecture
The TAG DB services/architecture model has evolved:
From: everything deployed at all voluntary sites
To: specific aspects deployed to optimize resources

Decoupling of services is underway – it increases the flexibility of the system to deploy resources depending on the evolution of usage.

TAG DB (experts: Florbela Viegas, Elisabeth Vinek) is populated via an "upload" process from TAG files at Tier-0:
For initial reconstruction at Tier-0, upload is automated.
For reprocessing, upload is manually triggered (site specific).
All uploads are controlled by Tier-0 with detailed monitoring, integrated with AMI and DQ2 tools where appropriate.

Balanced for read and write operations. Very Stable. TAG DB Voluntary Hosts include:

CERN, TRIUMF, DESY, RAL and BNL (?)

26-Nov-2009 Elizabeth Gallas 14

Tier-1 ATLAS DB Accounts and Profiles
Access to the distributed Oracle schemas is regulated via applications which access the database using accounts you have set up at the Tier-1s (a minimal connection sketch follows the list below).

Tier-1s should refer to this ATLAS TWiki for more details:
https://twiki.cern.ch/twiki/bin/view/Atlas/OracleProcessesSessionsUsersProfiles

Tier-1 DB account overview:

Conditions Database
ATLAS_COOL_READER_U – used by Athena to read the Conditions DB via direct Oracle access. You have recently disabled ATLAS_COOL_READER, which had a weak password (and was used by old ATLAS releases).

Trigger Database
ATLAS_CONF_TRIGGER_V2_R

TAGs (for volunteer sites hosting the ATLAS TAG DB)
ATLAS_TAGS_READER – reader account of the TAGS application
ATLAS_TAGS_WRITER – writer account of the TAGS application

Special accounts for Frontier (at sites with Frontier)
ATLAS_FRONTIER_READER – used by Athena to read COOL and CONF data via the Frontier servlet
ATLAS_FRONTIER_TRACKMOD – used by the Frontier cache-consistency mechanism to get the last table modification time
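A minimal connection sketch under assumptions: Python with cx_Oracle, a locally configured TNS alias (here the placeholder "ATLR_TIER1"), and a site-managed password for the reader account. It only checks that the reader account can reach the service; it is not part of the ATLAS software.

# Minimal sketch: verify that a read-only account can reach the Tier-1 service.
# The TNS alias and password handling are placeholders chosen for illustration.
import os
import cx_Oracle

def check_reader(user="ATLAS_COOL_READER_U", dsn="ATLR_TIER1"):
    password = os.environ["COOL_READER_PW"]      # assumed to be set by the site
    conn = cx_Oracle.connect(user, password, dsn)
    try:
        cur = conn.cursor()
        cur.execute("SELECT 1 FROM dual")        # trivial round trip
        return cur.fetchone() == (1,)
    finally:
        conn.close()

if __name__ == "__main__":
    print("reader account OK" if check_reader() else "unexpected reply")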

26-Nov-2009 Elizabeth Gallas 15

The ATLAS COOL Pilot
A "message board" style queue to throttle COOL jobs based on load. Described in detail here (Database Operations Meeting, 4 Nov 2009):

http://indicobeta.cern.ch/getFile.py/access?subContId=2&contribId=4&resId=1&materialId=slides&confId=72342

Tested at IN2P3 and installed at several sites; not active now, but it can be set up quickly if needed, as the code (pilot.py) is in the ATLAS software.

Monitoring tools allow the behaviour to be controlled via a web-based interface. Excerpt from pilot.py (elided portions marked with "..."):

# File: pilot.py (excerpt)
import sys
import random
from time import sleep, strftime, gmtime   # imports implied by the excerpt
...
while True:
    ...
    # Ask the database how loaded the COOL service currently is.
    sensor.execute("select atlas_cool_pilot.load_status from dual")
    ...
    if status == 'GO':
        # RW: avoid start of many jobs released at once
        sleep(attempt * random.randint(1, 300))
        # exit(134)
        break
    else:
        print strftime("  %a, %d %b %Y %H:%M:%S ", gmtime()), " avoiding load of", load[3], "at", sessions, "concurrent COOL sessions"
        if attempt < 5:
            attempt = attempt + 1
            print attempt * sessions
            sleep(attempt * sessions)
    ...
    if attempt == 5:
        print "  FATAL: Killing job to avoid Oracle overload"
        sys.exit(134)
...

26-Nov-2009 Elizabeth Gallas 16

Database monitoring at Tier-1
The ATLAS Computing Model puts many types of analysis on the grid, so there is a wide diversity of jobs accessing the databases.

Conditions DB access via Athena is especially variable. DBAs and Conditions DB experts are looking into specific queries.

We are asking Tier-1s to send top-usage queries back to us (a sketch of one way to extract them is shown below). The Frontier evaluation has been useful in disclosing some issues (Savannah bug reports → fixes to CORAL, COOL, Frontier).
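One way a Tier-1 DBA could assemble such a report is to pull the most expensive statements from Oracle's cursor cache. The sketch below assumes Python with cx_Oracle and a monitoring account with read access to V$SQLAREA; it is only an example of the kind of query meant, not an agreed ATLAS procedure.

# Sketch: list the top statements by elapsed time from Oracle's cursor cache.
# Assumes a monitoring account with SELECT privilege on V$SQLAREA.
import cx_Oracle

TOP_SQL = """
    SELECT * FROM (
        SELECT sql_id, executions, elapsed_time, substr(sql_text, 1, 200)
          FROM v$sqlarea
         ORDER BY elapsed_time DESC)
     WHERE rownum <= :n
"""

def top_queries(dsn, user, password, n=10):
    conn = cx_Oracle.connect(user, password, dsn)
    try:
        cur = conn.cursor()
        cur.execute(TOP_SQL, {"n": n})
        return cur.fetchall()
    finally:
        conn.close()

# Example call (DSN and credentials are placeholders):
# for row in top_queries("ATLR_TIER1", "MONITOR", "secret"):
#     print(row)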

Access will vary: with time and with increasing collision data, and by site (different groups have different responsibilities).

Experts at CERN and the Tier-1s use the resources and levers at their disposal to explore issues and find solutions. Example: a recent talk by Carlos Gamboa (BNL):
http://indicobeta.cern.ch/conferenceDisplay.py?confId=72342

A new ATLAS TWiki logs issues, their evaluation, resolution and follow-up:
https://twiki.cern.ch/twiki/bin/view/Atlas/AtlasDBIssues

Access patterns at the Tier-1s will help us proactively find issues. Weekly (or other regular) reports might help. Example from Tier-0:
http://atlas.web.cern.ch/Atlas/GROUPS/DATABASE/rac_report/

26-Nov-2009 Elizabeth Gallas 17

Summary
Schemas are stable. Access is controlled. Monitoring is in place. Depth of experience. Some redundancy, and tools in our back pockets (COOL Pilot, Frontier).

It's November… and the data is coming… intensive and diverse analysis will follow…

Are we ready? YES!

26-Nov-2009 Elizabeth Gallas 18

BACKUP

26-Nov-2009 Elizabeth Gallas 19

Why Frontier? TadaAki Isobe (DB access from Japan):

http://www.icepp.s.u-tokyo.ac.jp/~isobe/rc/conddb/DBforUserAnalysisAtTokyo.pdf

ReadReal.py script, Athena 15.4.0; 3 methods compared:

SQLite: access files on NFS

Oracle at Lyon: ~290 ms RTT to CC-IN2P3

Frontier: ~200 ms RTT to BNL, zip-level 5
With zip-level 0 (i.e. no compression) it takes ~15% longer.

Work is ongoing to understand these kinds of tests.

26-Nov-2009 Elizabeth Gallas 20

What does a remote collaborator (or grid job) need?

Use case: TRT monitoring (one of MANY examples…)
Needs: the latest Conditions (Oracle + POOL files)
From: a Tier-3 site
Explored: all 3 access methods

Talked: in hallways … then at meetings (BNL Frontier, ATLAS DB) … with experts and many other users facing similar issues…

In this process, many talented people from around the globe have been involved – impressive collaboration!

Collective realization that use cases continue to grow for distributed processing … calibration … alignment … analysis … Expect a sustained surge in all use cases with collision data.

Frontier technology seems to satisfy the needs of most use cases in a reasonable time.

Now it is a matter of final testing to refine configurations and go global … for all sites wanting to run jobs with the latest data …

Fred Luehring, Indiana U

26-Nov-2009 Elizabeth Gallas 21

TAG Database Services Components:

TAG Database(s) – at CERN and voluntary Tier-1s and Tier-2s

ELSSI – Event Level Selection Service Interface; TAG usage is covered in every software tutorial

Web services – for ELSSI and file-based TAG usage

Extract – dependencies: ATLAS software, AFS maxidisk to hold ROOT files from Extract

Skim – ATLAS software, DQ2, Ganga, …; a surge in effort is helping to make TAG jobs grid-enabled

Response times vary: O(sec) for interactive queries (event selection, histograms…), O(min) for Extract, O(hr) for Skim

26-Nov-2009 Elizabeth Gallas 22

TAGs by Oracle site (size in GB)

CERN ATLR (total 404 GB):
  ATLAS_TAGS_CSC_FDR    22
  ATLAS_TAGS_COMM_2009  142
  ATLAS_TAGS_USERMIX    18
  ATLAS_TAGS_COMM       200
  ATLAS_TAGS_ROME       22

BNL (total 383 GB):
  ATLAS_TAGS_CSC_FDR    18
  ATLAS_TAGS_COMM_2009  105
  ATLAS_TAGS_COMM       260

TRIUMF (total 206 GB):
  ATLAS_TAGS_CSC_FDR    16
  ATLAS_TAGS_COMM       190

DESY:
  ATLAS_TAGS_MC         231

RAL – gearing up …

A respectable level of TAG deployment – it should serve a wide variety of users (commissioning and physics analysis).

TAG upload now routine for commissioning … will be automatic for collisions

Intensive work on the deployment of TAG services is making them increasingly accessible to users (ELSSI).