The LCG Distributed Database Infrastructure
Dirk Düllmann, CERN & LCG 3D
DESY Computing Seminar, 21 May 2007
CERN IT Department, CH-1211 Genève 23, Switzerland, www.cern.ch/it
Outline of the Talk
• Why databases and why distributed?
• LCG Distributed Database Project (LCG 3D)
• Building blocks for a scalable database service: database clusters
• Distribution techniques: Streams replication and distributed caching
• Database and replication monitoring and optimisation
• Experiment models and schedules
• Status and next steps
[Diagram: conditions data flow. Configurations, conditions and DCS data from the subdetectors (Pixel, Strips, ECAL, HCAL, RPC, DT, ES, CSC, Trigger, DAQ) feed the ONline subset of the Conditions DB and the Online Master Data Storage; after conditions formatting, the master copy of the OFfline subset at the CERN Computer Center (with production, validation and development instances) serves offline reconstruction; LHC data, logbook, DDD, EMDB and calibration inputs are combined to create the reconstruction conditions DB data set at Tier 0.]
Distributed Deployment of Databases (=3D)
• LCG initially provided an infrastructure for distributed access to file-based data and for file replication
• Physics applications (and grid services) depend crucially on relational databases for metadata and require similar services for databases
  – e.g. configuration, conditions, calibration, event tags, file and collection catalogues, production/transfer workflow
  – LCG sites already have experience in providing RDBMS services
• Goals for a common database project as part of LCG
  – increase the availability and scalability of LCG and experiment database components
  – allow applications to access databases in a consistent, location-independent way
  – connect database services via data replication mechanisms
  – shared deployment and administration of this infrastructure during 24x7 operation
• Scope set by the LCG PEB: databases online, offline and at LCG Tier sites
LCG 3D Service Architecture
[Diagram legend: O = Oracle database (DB) cluster, M = MySQL/SQLite DB file, F = FroNTier, S = Squid.]
• T0 – autonomous reliable service (database cluster)
• T1 – DB backbone: all data replicated, reliable service
• T2 – local DB cache: subset of the data, only local service
• Online DB – autonomous reliable service
• Read-only access at Tier-1/2 (at least initially)
• Distribution via Oracle Streams, http caching (Squid), cross-DB copies and MySQL/SQLite files
Database Services for Physics at CERN in 2002
• 24x7 service based on Oracle 9i / Solaris
• Most experiment databases hosted on
  – a 2-node (4 CPU) cluster on Solaris with 6 GB of RAM
  – 18 disks of 32 GB each
  – Veritas Volume Manager
• But also increasingly many Linux disk servers for LHC and non-LHC data (COMPASS, HARP)
  – locally attached disks
  – spread all over the CERN computing centre
  – several different versions of Linux and Oracle
How to build a highly available and scalable DB service?
• Scaling up
• Scaling out -> clustering, with shared storage on a Storage Area Network
Chosen Architecture: Database Cluster on Linux
• Oracle Real Application Cluster (RAC) on commodity hardware
  – redundancy at all levels (CPU, storage, networking)
  – Oracle 10gR2 and RedHat ES 4
  – Oracle ASM as volume manager
[Diagram: one cluster (RAC1) at CERN with four server nodes (lxs5030, lxs5033, lxs5037, lxs5038) attached to the CERN LAN and a GB ethernet interconnect switch, with Infortrend storage behind a SANbox 5200 2 GB fibre channel switch.]
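As an illustration beyond the slides: on such a RAC the participating nodes can be listed from any instance via the GV$ views, which aggregate the per-instance V$ views across the cluster. A minimal sketch (instance and host names depend on the installation):

  -- List the instances forming the RAC, the hosts they run on, and their status.
  SELECT inst_id,
         instance_name,
         host_name,
         status
  FROM   gv$instance
  ORDER  BY inst_id;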
ASGC (slide: Hurng-Chun, ASGC)
Hardware configuration
• Four servers
  – CPU: Intel Pentium-D 830, 3.0 GHz
  – Memory: 2 GB (ECC)
  – Local disk: S-ATA2, 80 GB, 7200 rpm
  – Fibre Channel: LSI 7102XP-LC, PCI-X x1
• SAN switch: Silkworm 3850, 16 ports
• Backend RAID subsystem: StorageTek B280, dual channel & redundant controller
• Each RAC group shares 1.7 TB exported from the SAN
• One RAC group for 3D, one RAC group for other LCG services
Beginning of 2007
• ~220 database CPUs
• ~440 GB of RAM (shared DB cache)
• ~1100 disks
• Some 15 DB clusters deployed
  – one production cluster per LHC experiment for offline applications
  – ATLAS online and COMPASS clusters
  – number of nodes varying from 4 to 8
• Several validation and test clusters
  – 1 or 2 per experiment (typically 2-node clusters)
  – some hardware allocated for internal use/tests
Application Validation and Optimisation
• Database clusters can amplify the effects of poor application design - scalability tests are needed during the application release cycle
  Development DB service -> Validation DB service -> Production DB service
• This is a significant fraction of the DB administrators' work and needs close collaboration with the application developers
• Focus on applications with large resource consumption:
  – file transfer systems (FTS, PhEDEx)
  – grid catalogues (LFC)
  – experiment dashboards
  – conditions data (COOL)
  – event collections (TAGS)
Evolving Database Hardware
• Need to continuously replace database h/w with next-generation CPUs and storage
  – while maintaining as few s/w configurations as possible: only a single Linux and Oracle version
• CPU side - continued performance increase via multi-core
  – recently tested dual quad-core CPUs & 16 GB of RAM: performance similar to a 5-node RAC built with the hardware currently used
  – multi-core works well for database servers, but implies a move to 64-bit Linux & Oracle
  – may run into memory bandwidth limitations with many cores
• Storage side - slower performance increase
  – sizing for I/O operations per second rather than just volume increase -> disk numbers increase
  – investigating higher-performance disks (e.g. Raptor) and other storage technologies (solid state disks)
Database Replication and Distributed Caching
FroNTier "Launchpad" software (slide: CMS FroNTier Report, 13 Sept 2006)
• Squid caching proxy
  – load shared with Round-Robin DNS
  – configured in "accelerator mode"
  – peer-to-peer caching
  – "wide open frontier"*
• Tomcat - standard
• FroNTier servlet
  – distributed as a "war" file: unpack in the Tomcat webapps dir, change 2 files if the name is different
  – one xml file describes the DB connection
[Diagram: Round-Robin DNS in front of three servers (server1, server2, server3), each running a Squid and a Tomcat with the FroNTier servlet, all backed by a single DB.]
*In the past, we required registration so we could add the IP/mask to our Access Control List (ACL) at CERN. We recently decided to run in "wide-open" mode so installations can be tested without registration.
How to keep Databases up-to-date? Asynchronous Replication via Streams
[Diagram: a DML statement executed at CERN, e.g. insert into emp values (03, "Joan", ...), is captured from the redo logs as Logical Change Records (LCRs), propagated over the network, and applied at the Tier 1 destination databases (BNL, CNAF, FNAL, IN2P3, RAL, Sinica).]
Slide: Eva Dafonte Perez
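Not from the original slides: a minimal sketch of how such one-way Streams replication is typically configured with Oracle's DBMS_STREAMS_ADM package in 10gR2. The schema, table, queue and database names (cond.emp, strmadmin.streams_queue, t1db.example.org) are illustrative only, and the production 3D setup used downstream capture and per-site rules rather than this simple form.

  -- Sketch only: capture, propagate and apply rules for one table,
  -- run as the Streams administrator. All names are made up.

  -- 1. Capture changes to cond.emp from the redo logs into the local queue.
  BEGIN
    DBMS_STREAMS_ADM.ADD_TABLE_RULES(
      table_name   => 'cond.emp',
      streams_type => 'capture',
      streams_name => 'capture_cond',
      queue_name   => 'strmadmin.streams_queue',
      include_dml  => TRUE,
      include_ddl  => FALSE);
  END;
  /

  -- 2. Propagate the captured LCRs to the queue at the Tier 1 database.
  BEGIN
    DBMS_STREAMS_ADM.ADD_TABLE_PROPAGATION_RULES(
      table_name             => 'cond.emp',
      streams_name           => 'prop_to_t1',
      source_queue_name      => 'strmadmin.streams_queue',
      destination_queue_name => 'strmadmin.streams_queue@t1db.example.org',
      include_dml            => TRUE,
      include_ddl            => FALSE);
  END;
  /

  -- 3. At the destination database, apply the LCRs arriving in the local queue.
  BEGIN
    DBMS_STREAMS_ADM.ADD_TABLE_RULES(
      table_name   => 'cond.emp',
      streams_type => 'apply',
      streams_name => 'apply_cond',
      queue_name   => 'strmadmin.streams_queue',
      include_dml  => TRUE,
      include_ddl  => FALSE);
  END;
  /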
Database s/w Licenses and Support
• Tier 1 licenses acquired
  – experiment and grid service requests collected for all T1 sites
  – license payments and signed agreement forms received
• An LCG support ID has been created with Oracle
• Accounts for Oracle support (MetaLink) created for all T1 database contacts
  – access to Oracle s/w, patches, security upgrades and the problem database
  – allows filing problem reports directly with Oracle
Tier 0 setup and operational procedures
• The main Streams operations have been included in the Tier 0 DBA operations manual
• Automated alerts for database or Streams problems from the 3D monitoring
  – integrated with GGUS (grid user support)
  – so far: handling of Streams problems during working hours only
• A downstream capture setup has been installed as part of the planned Tier 0 service extension
  – the log-mining step runs on a separate box, offloading the source database
  – same h/w and s/w setup as the DB server nodes
Further Decoupling between Databases
[Diagram: redo log files are copied from the source database (CERN RAC) to a separate downstream database at CERN, which runs one capture process and queue per destination site; propagation jobs then ship the changes to the destination sites (e.g. CNAF, FNAL).]
• Objectives
  – remove the impact of capture from the Tier 0 database
  – isolate the destination sites from each other
• One capture process + queue pair per target site
  – requires a big Streams pool size
  – redundant events (multiplied by the number of queues)
Slide: Eva Dafonte Perez
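A hedged sketch, again not from the slides, of how one such downstream capture process is typically declared with DBMS_CAPTURE_ADM, assuming redo transport from the Tier 0 source (global name t0db.example.org, illustrative) to the downstream box is already configured; in the scheme above one capture/queue pair like this would exist per destination site.

  -- Sketch: create a capture process on the downstream database that mines
  -- the redo logs shipped from the Tier 0 source database (names made up).
  BEGIN
    DBMS_CAPTURE_ADM.CREATE_CAPTURE(
      queue_name         => 'strmadmin.streams_queue_cnaf',
      capture_name       => 'capture_for_cnaf',
      source_database    => 't0db.example.org',
      use_database_link  => FALSE,
      logfile_assignment => 'implicit'); -- mine log files registered via redo transport
  END;
  /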
Database Backup / Recovery with Streams
• Collected DB / Streams recovery scenarios
  – recovery after T1 data loss - OK
    • RAL recovered and re-synchronised
    • replication from CERN to CNAF continued unaffected
  – recovery after T0 data loss - OK
  – coordinated point-in-time recovery - OK
• Service procedures documented and validated with two Tier 1 sites and CERN
• Full recovery exercise with all sites scheduled for the 3D workshop at CNAF, June 12-13
  – recover a Tier 1 database from tape
  – resynchronise the replication streams
  – while the T0 database is being populated
Streams Performance Tuning
• Recent focus: Wide Area Network
  – remote sites are significantly affected by latency, e.g. ASGC with a 300 ms round-trip time
• Studied the TCP-level data flow between CERN and Taiwan
• Resulting optimisations
  – increased TCP buffer size (OS level)
  – decreased frequency of acknowledgements between source and destination DB
• Total improvement of a factor 10
  – now: 4000 logical change records / sec
  – a checklist for Tier 1 sites has been prepared
• Focus has now moved to LAN setup optimisation
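To relate the quoted LCR rate to something measurable, a hedged sketch of reading capture-side throughput from the Streams dynamic performance views (view and column names as documented for Oracle 10gR2; the actual 3D monitoring used its own tooling):

  -- Sketch: approximate capture throughput in LCRs per second since the
  -- capture process started (assumes one capture process per source).
  SELECT capture_name,
         total_messages_captured,
         total_messages_enqueued,
         ROUND(total_messages_enqueued /
               GREATEST((SYSDATE - startup_time) * 86400, 1)) AS avg_lcr_per_sec
  FROM   v$streams_capture;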
Experiment Database Activities & Plans
LHCb – COOL service model
(Slide: Andrea Valassi, Conditions Databases, Rimini, 7 May 2007; after Marco Clemencic, 3D workshop, 13 Sep 2006)
• Two COOL (Oracle) servers at CERN - essentially one for online and one for offline
  – replication to the Tier 1s (GRIDKA, RAL, IN2P3, CNAF, SARA, PIC) from the online database is a two-step replication
  – online server at the pit (managed by LHCb); offline server in the CC
LFC Replication Testbed
[Diagram: population clients use the LFC read-write servers (lxb0716.cern.ch, lxb0717.cern.ch) in front of the master Oracle database at CERN (rls1r1.cern.ch); Oracle Streams replicates over the WAN to the replica Oracle database and read-only LFC server at CNAF (lfc-streams.cr.cnaf.infn.it, lfc-replica.cr.cnaf.infn.it), which serve read-only clients.]
File Catalogue replication between CERN and CNAF
• LFC replication via Streams between CERN and CNAF has been in production since last November
  – requested by LHCb to provide read-only catalogue replicas
• Stable operation without major problems
  – several site interventions at CNAF have been performed
  – site restart and resynchronisation worked
  – the rate is low compared to conditions data
• In contact with LHCb about adding the remaining LHCb Tier 1 sites
ATLAS – COOL service model
(Slide: Andrea Valassi, Conditions Databases, Rimini, 7 May 2007; after Sasha Vaniachine and Richard Hawkings, 3D workshop, 14 Sep 2006)
• COOL Oracle services at Tier 0 and ten Tier 1s (GRIDKA, TAIWAN, RAL, IN2P3, CNAF, SARA, BNL, TRIUMF, PIC, Nordugrid)
  – two COOL servers at CERN for online/offline (similar to LHCb)
  – the online database sits within the ATLAS pit network (ATCN), but physically in the CC
  – in addition: Oracle at three 'muon calibration centre' Tier 2s
[Diagram: the online Oracle DB, fed by the Online/PVSS/HLT farm and reached from the pit network via a gateway, is replicated to the offline master conditions DB on the CERN public network; Streams replication feeds the Tier 1 replicas, SQLite replicas (1 file/run) are produced for Tier 0, calibration updates flow back in, and the ATLAS pit and the computer centre are connected by a dedicated 10 Gbit link.]
ATLAS Muon Calibration Data Flow
(Picture: H. von der Schmitt; not reproduced here.)
3D Database Resource Request and Current Predictions
(Request unchanged wrt GDB Nov '05; predictions to be reviewed next after the initial CDC phase, e.g. in May)
• Conditions Challenges (April - Jul): ATLAS 3 dual-CPU DB nodes and 0.3 TB usable storage (COOL + TAGs); LHCb 2 nodes and 0.1 TB (COOL + LFC r/o replica)
• Dress Rehearsals (Jul - Nov): ATLAS 3 nodes and 0.3 TB (4 GB on 64-bit DB server); LHCb 2 nodes and 0.1 TB (2 LFC r/o servers in place); expected resource upgrade: double storage and CPU review
• LHC Startup (from Nov): ATLAS 3+x nodes and 1.0 TB; LHCb 2+y nodes and 0.3 TB
• ATLAS COOL + TAGs storage estimates (tbc by ATLAS): 0.2 + 1.4 TB from June 2008, 0.5 + 3.7 TB in 2009, 0.8 + 6.0 TB in a nominal year
CMS – conditions data at CERN
(Slide: Andrea Valassi, Conditions Databases, Rimini, 7 May 2007; after Vincenzo Innocente, CMS Software Week, April 2007)
• ORCON and ORCOFF conditions data are in POOL_ORA format
• ORCON-ORCOFF Oracle Streams prototype set up (integration RAC); production setup later in 2007
FroNTier Launchpad Setup (Slide: L. Lueking, 11 Oct 2006)
• 3 servers running FroNTier & Squid (worker nodes)
• Backend: Oracle Database 10gR2 (4-node RAC)
[Diagram: the T0 farm and WAN clients at CERN reach the launchpad through round-robin DNS, which provides load balancing and failover.]
CMS Request Update
Rough estimates of CMS DB resources, March 2007 through August 2007 (26 Jan 2007, CMS Req. Update and Plans); no major change expected until August.
• Online P5: disk maximum 500 GB; 20 concurrent users (peak); 10 Hz transactions (peak)
• Offline Conditions, Tier-0 (CMSR): disk 500 GB (DB) plus 100 GB per Squid (2-3 Squids); 10 concurrent users (DB)* and 10 per Squid; 10 Hz transactions (DB)* and 10 per Squid (* incl. online-to-offline transfer)
• Offline Conditions, Tier-1 (each site): disk 100 GB per Squid (2-3 Squids/site); 10 concurrent users per Squid, >100 over all sites; 10 Hz transactions per Squid, >100 over all sites
• Offline DBS, Tier-0 (CMSR): disk 20 GB; 10 concurrent users (currently ~2); 10 Hz transactions (currently ~5)
Deployment Status and Next Steps
• CMS tested the FroNTier/Squid setup during CSA '06
  – now some 30 Tier 1 and Tier 2 sites are connected
• ATLAS and LHCb moved to production mode in April
• All ten Tier 1 database sites are integrated into a single distributed database infrastructure
  – ASGC, BNL, CNAF, GridKA, IN2P3, SARA/NIKHEF, NDGF, PIC, RAL, TRIUMF
  – one of the largest distributed database setups worldwide
• Preparing for a Tier 1 replica scaling test with O(100) client nodes using the ATLAS offline framework ATHENA
• In summer: participate in the WLCG "Dress Rehearsal" tests
More information at
• WLCG 3D Project
  – http://lcg3d.cern.ch or
  – http://twiki.cern.ch/twiki/bin/view/PSSGroup/LCG3DWiki
• CERN Physics Database Service
  – http://phydb.web.cern.ch/phydb/
• WLCG Persistency Framework
  – http://pool.cern.ch
  – http://pool.cern.ch/coral
  – http://cool.cern.ch