TAG Catalog Replication Using Streams

Florbela Viegas, CERN ADP


20/04/23


Overview of the TAG Information System

The TAG Information System is composed of several databases distributed across Europe and America, together with a suite of services used to access the data.

The event data is stored as POOL Collections, where each run in a collection maps directly to a merged TAG dataset produced at the Tier-0 and Tier-1s.

The TAG catalog records which data is available at which site. It is accessed by the ELSSI Suite services at each application site, presently CERN, BNL and TRIUMF.

The TAG catalog is replicated from CERN to the other sites using materialized views. It currently resides in the TAG databases, alongside the data.
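To make the catalog's role concrete, the sketch below models "which data is available at which site" as a lookup from a (collection, run) pair to the set of sites holding the merged TAG dataset. This is a toy in-memory stand-in, not the real catalog schema: the class name `TagCatalog`, its methods, and the collection name and run numbers in the usage lines are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TagCatalog:
    """Toy in-memory stand-in for the TAG catalog (hypothetical schema)."""

    # (collection name, run number) -> sites holding the merged TAG dataset
    locations: dict[tuple[str, int], set[str]] = field(default_factory=dict)

    def register(self, collection: str, run: int, site: str) -> None:
        """Record that `site` holds the merged TAG dataset for this run."""
        self.locations.setdefault((collection, run), set()).add(site)

    def sites_for(self, collection: str, run: int) -> list[str]:
        """Return the sites holding this run, sorted for a stable order."""
        return sorted(self.locations.get((collection, run), set()))


# Usage with made-up names: an ELSSI-like client asks where a run lives.
catalog = TagCatalog()
catalog.register("example_collection", 1001, "CERN")
catalog.register("example_collection", 1001, "BNL")
print(catalog.sites_for("example_collection", 1001))  # ['BNL', 'CERN']
print(catalog.sites_for("example_collection", 2002))  # []
```

Because every application site needs this lookup, the catalog has to be readable everywhere even though the data itself is split across sites, which is why it is replicated rather than queried remotely.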


TAG Data Architecture

[Figure: COMA DB instances (with a TASK DB at some sites) at CERN, DESY, PIC, BNL, TRIUMF and RAL, each annotated with its data holdings (e.g. all data except Monte Carlo, Monte Carlo and other recent data, September and December reprocessing), with the ELSSI Suite deployed at three sites]


Present weaknesses of the catalog

The TAG catalog master is installed in the CERN ATLARC database. If that database fails, recovery will take days.

A failure at CERN means that Tier-0 upload must stop completely, and during busy periods a backlog builds up quickly.

Conceptually, there is no reason why the catalog must reside in the same database as the data. In fact, the catalog should remain available for writing even if one of the data sites, including CERN, goes down.

To address this, I propose, as a first step, to move the TAG catalog master to ATLR.
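A rough way to see why a multi-day outage hurts is a simple fluid model of the upload queue: while upload is stopped, datasets accumulate at the arrival rate; after recovery the backlog only shrinks at the difference between upload capacity and arrival rate. The sketch below assumes constant rates, and the function names and the numbers in the usage lines are illustrative, not measurements from Tier-0.

```python
def backlog_after_outage(arrival_rate: float, outage_hours: float) -> float:
    """Datasets waiting for upload once an outage of `outage_hours` ends,
    assuming they keep arriving at `arrival_rate` per hour and none are uploaded."""
    return arrival_rate * outage_hours


def hours_to_drain(arrival_rate: float, upload_rate: float, outage_hours: float) -> float:
    """Hours after recovery until the backlog is cleared, assuming uploads resume
    at `upload_rate` per hour while new datasets still arrive at `arrival_rate`."""
    if upload_rate <= arrival_rate:
        return float("inf")  # the backlog never shrinks
    return backlog_after_outage(arrival_rate, outage_hours) / (upload_rate - arrival_rate)


# Illustrative numbers only: 20 datasets/h arriving,
# 30 datasets/h upload capacity, and a two-day catalog outage.
print(backlog_after_outage(20, 48))  # 960
print(hours_to_drain(20, 30, 48))    # 96.0
```

Under these assumed rates a two-day outage costs four more days of catch-up, because only the spare capacity (10 datasets/h here) works on the backlog; the busier the period, the smaller that margin.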


Replication of the catalog

The TAG catalog will very soon include a « Service catalog », which will keep track of the current state of all services in the TAG Information System.

This will make it possible to take failover and load-balancing decisions at run time. For this, the latency between replicas must be lower than simple materialized views can offer.

I propose to replicate the TAG catalog from ATLR to the Tier-1 3D databases using Streams.

I do not foresee any caveats in this case, as the transaction volume is very small and the catalog is only 38 MB. I would welcome your input on any issues that might arise from this decision.
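The latency argument can be illustrated with a minimal sketch of a run-time failover decision. It assumes, hypothetically, that each service periodically writes a heartbeat row (status, load, timestamp) at the master, and that a client reading a replica only trusts rows younger than some `max_staleness`. The `ServiceEntry` fields, the `"up"`/`"down"` values, the load metric and `choose_site` are all assumptions for illustration, not the planned Service catalog design.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ServiceEntry:
    """One row of a hypothetical Service catalog, as seen at a replica."""

    site: str          # e.g. "CERN", "BNL", "TRIUMF"
    status: str        # assumed values: "up" or "down"
    load: float        # assumed load metric; lower is better
    updated_at: float  # seconds; time of the last heartbeat written at the master


def choose_site(entries: list[ServiceEntry], now: float, max_staleness: float) -> str | None:
    """Return the least-loaded site reported as up, ignoring entries whose
    last heartbeat is older than `max_staleness` seconds at this replica."""
    fresh = [
        e for e in entries
        if e.status == "up" and now - e.updated_at <= max_staleness
    ]
    if not fresh:
        return None  # nothing trustworthy; caller must fall back to a default
    return min(fresh, key=lambda e: e.load).site


entries = [
    ServiceEntry("CERN", "up", 0.7, updated_at=1000.0),
    ServiceEntry("BNL", "up", 0.2, updated_at=1000.0),
    ServiceEntry("TRIUMF", "down", 0.1, updated_at=1000.0),
]
# Replica lagging by seconds (Streams-like propagation): BNL is chosen.
print(choose_site(entries, now=1030.0, max_staleness=60.0))  # BNL
# Hourly refresh, just before the next refresh: every visible heartbeat is an hour old.
print(choose_site(entries, now=4600.0, max_staleness=60.0))  # None
```

With a periodic refresh, the newest state a replica can show is up to one refresh interval old, so any freshness threshold shorter than that interval leaves the client with no usable information; change propagation with a lag of seconds keeps such a threshold meaningful.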

TAG Data Architecture (after the move)

[Figure: CERN ATLR (Tier-0) hosting COMA DB and TASK DB, replicated with Streams to TASK DB instances in the BNL 3D and TRIUMF 3D databases; COMA DB instances remain at CERN ATLARC, BNLTAGS and TRIUMFTAGS, each used by the ELSSI Suite]
