TAG Catalog Replication Using Streams
Florbela Viegas, CERN ADP
20/04/23 1
Overview of the TAG Information System
The TAG Information System consists of several databases distributed across Europe and North America, plus a suite of services used to access the data.
The event data is organized as POOL Collections; each run in a collection maps directly to a merged TAG dataset produced at Tier-0 and the Tier-1s.
The TAG catalog records which data is available at which site. It is accessed by the ELSSI Suite services at each application site, currently CERN, BNL and TRIUMF.
The TAG catalog is replicated from CERN to the other sites using materialized views. It currently resides in the TAG databases, alongside the data.
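The catalog's job, recording which site holds the merged TAG dataset for each run of a POOL Collection, can be pictured with a minimal sketch. The `TagCatalog` class, its methods, and the collection names and run numbers are all hypothetical illustrations, not the real catalog schema.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple


@dataclass
class TagCatalog:
    """Hypothetical, simplified TAG catalog: (collection, run) -> sites holding it."""
    locations: Dict[Tuple[str, int], Set[str]] = field(default_factory=dict)

    def register(self, collection: str, run: int, site: str) -> None:
        """Record that the merged TAG dataset for this run is available at `site`."""
        self.locations.setdefault((collection, run), set()).add(site)

    def sites_for(self, collection: str, run: int) -> Set[str]:
        """Return a copy of the set of sites holding this run (empty if unknown)."""
        return set(self.locations.get((collection, run), set()))


# Illustrative entries; collection names and run numbers are made up.
catalog = TagCatalog()
catalog.register("collection_A", 1001, "CERN")
catalog.register("collection_A", 1001, "TRIUMF")
catalog.register("collection_B", 1001, "BNL")

print(sorted(catalog.sites_for("collection_A", 1001)))  # ['CERN', 'TRIUMF']
print(sorted(catalog.sites_for("collection_B", 2002)))  # []
```

An ELSSI service at any site would answer "where is this run?" with a lookup of this kind, which is why every application site needs a reasonably current copy of the catalog.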
TAG Data Architecture
CERN: COMA DB, TASK DB (all data except Monte Carlo)
DESY: COMA DB (Monte Carlo and other recent data)
PIC: COMA DB (most recent data, no MC)
BNL: COMA DB, TASK DB (September reprocessing)
TRIUMF: COMA DB, TASK DB (all 2010 data except Monte Carlo)
RAL: COMA DB (December reprocessing)
ELSSI Suite: CERN, BNL, TRIUMF
Present weaknesses of the catalog
The TAG catalog master is installed in the CERN ATLARC database. In the event of a failure, recovery can, and will, take days.
A failure at CERN means that Tier-0 upload must stop completely, and a backlog quickly builds up during busy periods.
Conceptually, there is no reason for the catalog to live in the same database as the data. In fact, the catalog should remain writable if any one of the data sites goes down, including CERN.
To address this, I propose, as a first step, moving the TAG catalog master to ATLR.
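The failure-isolation argument can be checked with a toy model: catalog writes succeed only while the database holding the catalog master is up. The function names and the simplification "one database fails at a time" are assumptions made for illustration; the database labels follow the slides.

```python
from typing import Dict, Iterable, Set


def catalog_writable(catalog_master: str, down: Set[str]) -> bool:
    """Writes (e.g. Tier-0 registrations) succeed only if the catalog master is up."""
    return catalog_master not in down


def writable_after_single_failure(catalog_master: str, databases: Iterable[str]) -> Dict[str, bool]:
    """For each database, report whether the catalog stays writable if it alone fails."""
    return {db: catalog_writable(catalog_master, {db}) for db in databases}


DATA_DATABASES = ["CERN ATLARC", "BNL", "TRIUMF"]

# Current layout: catalog master co-located with the CERN data.
print(writable_after_single_failure("CERN ATLARC", DATA_DATABASES))
# {'CERN ATLARC': False, 'BNL': True, 'TRIUMF': True}

# Proposed layout: catalog master in ATLR, separate from all data databases.
print(writable_after_single_failure("CERN ATLR", DATA_DATABASES))
# {'CERN ATLARC': True, 'BNL': True, 'TRIUMF': True}
```

The model deliberately ignores a failure of ATLR itself, which would still block writes; the point is only that a failure of a data database, CERN's included, no longer does.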
Replication of the catalog
The TAG catalog will very soon include a « Service catalog » that keeps the state of all services in the TAG Information System up to date.
This will make it possible to take failover and load-balancing decisions at run time. That requires lower latency between the replicas than simple materialized views offer.
I propose replicating the TAG catalog from ATLR to the Tier-1 3D databases using Streams.
I do not foresee any caveats in this case, since the transaction volume is very small and the catalog is only 38 MB. I would welcome your input on any issues that might arise from this decision.
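Why latency matters for run-time failover can be shown with a simplified timing model. It assumes a materialized view refreshed on a fixed schedule (refresh duration ignored) versus a change stream that applies each committed change after a short delay. Both parameter values, the service names and `pick_service` are hypothetical, chosen only to illustrate the effect, not measured or taken from the real Service catalog.

```python
import math
from typing import Dict, List, Optional

# Hypothetical parameters, for illustration only (not measured values).
MV_REFRESH_INTERVAL_S = 3600.0  # scheduled materialized-view refresh period
STREAMS_APPLY_DELAY_S = 5.0     # assumed capture/propagate/apply delay


def visible_at_mv(change_time: float, interval: float = MV_REFRESH_INTERVAL_S) -> float:
    """Time a change appears on a replica refreshed at t = 0, interval, 2*interval, ..."""
    # A change committed exactly at a refresh instant is assumed to miss it.
    return (math.floor(change_time / interval) + 1) * interval


def visible_with_streams(change_time: float, delay: float = STREAMS_APPLY_DELAY_S) -> float:
    """Time a change appears when each committed change is streamed and applied."""
    return change_time + delay


def pick_service(replica_state: Dict[str, str], preference: List[str]) -> Optional[str]:
    """Failover: return the first preferred service the replica reports as 'up'."""
    for name in preference:
        if replica_state.get(name) == "up":
            return name
    return None


# Hypothetical scenario: ELSSI at CERN goes down at t = 100 s; a client asks at t = 200 s.
change_t, query_t = 100.0, 200.0
before = {"ELSSI@CERN": "up", "ELSSI@BNL": "up"}
truth = {"ELSSI@CERN": "down", "ELSSI@BNL": "up"}
prefs = ["ELSSI@CERN", "ELSSI@BNL"]

for label, seen_at in [("materialized view", visible_at_mv(change_t)),
                       ("streams", visible_with_streams(change_t))]:
    replica = truth if query_t >= seen_at else before
    print(f"{label}: change visible at t={seen_at:.0f}s, routed to {pick_service(replica, prefs)}")
# materialized view: change visible at t=3600s, routed to ELSSI@CERN
# streams: change visible at t=105s, routed to ELSSI@BNL
```

With a scheduled refresh, the worst-case staleness of the replica is the whole refresh interval, so a router can keep sending clients to a service that is already down; with per-change propagation, staleness is bounded by the apply delay instead.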
TAG Data Architecture (after move)
CERN ATLARC: COMA DB (all data except Monte Carlo)
BNL TAGS: COMA DB (reprocessed data, no MC)
TRIUMF TAGS: COMA DB (all data except Monte Carlo)
CERN ATLR: COMA DB, TASK DB
BNL 3D: TASK DB
TRIUMF 3D: TASK DB
Streams: CERN ATLR to BNL 3D and TRIUMF 3D
ELSSI Suite: CERN, BNL, TRIUMF
Tier-0
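The one-to-many Streams topology (master at ATLR, copies at the Tier-1 3D databases) rests on log-based replication: changes committed on the source are captured, propagated and applied in commit order at each destination. The sketch below models only that idea with in-memory Python objects; it does not use or imitate the Oracle Streams API, and all class names and catalog keys are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Change:
    """One committed change on the master, in commit order (stand-in for a captured record)."""
    seq: int
    key: str
    value: Optional[str]  # None means the row was deleted


class Master:
    """Catalog master (CERN ATLR in the proposal); every commit is appended to a change log."""
    def __init__(self) -> None:
        self.rows: Dict[str, str] = {}
        self.log: List[Change] = []

    def commit(self, key: str, value: Optional[str]) -> None:
        if value is None:
            self.rows.pop(key, None)
        else:
            self.rows[key] = value
        self.log.append(Change(len(self.log) + 1, key, value))


class Replica:
    """Destination database (e.g. a Tier-1 3D database) applying changes in order."""
    def __init__(self, name: str) -> None:
        self.name = name
        self.rows: Dict[str, str] = {}
        self.applied_seq = 0  # highest change already applied

    def apply(self, change: Change) -> None:
        if change.seq <= self.applied_seq:
            return  # already applied; makes re-delivery harmless
        if change.value is None:
            self.rows.pop(change.key, None)
        else:
            self.rows[change.key] = change.value
        self.applied_seq = change.seq


def propagate(master: Master, replicas: List[Replica]) -> None:
    """Send each replica every change it has not yet applied, in commit order."""
    for replica in replicas:
        for change in master.log[replica.applied_seq:]:
            replica.apply(change)


master = Master()
bnl, triumf = Replica("BNL 3D"), Replica("TRIUMF 3D")
master.commit("service:ELSSI@CERN", "up")
master.commit("service:ELSSI@BNL", "up")
propagate(master, [bnl, triumf])
master.commit("service:ELSSI@CERN", "down")
master.commit("service:ELSSI@BNL", None)
propagate(master, [bnl, triumf])
propagate(master, [bnl, triumf])  # re-running sends nothing new
print(bnl.rows == triumf.rows == master.rows)  # True
print(master.rows)  # {'service:ELSSI@CERN': 'down'}
```

Because only changes travel, the replication cost tracks the transaction volume rather than the catalog size, which fits the small, low-traffic catalog described above.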