Database Monitoring Requirements

Salvatore Di Guida (CERN)
On behalf of the CMS DB group

Outline

• CMS Database infrastructure and data flow.
• Data access patterns.
• Requirements coming from the hardware and software infrastructure:
  – DB safety and security;
  – DB monitoring for Conditions.
• Requirements to be fulfilled by front-end applications (web):
  – 3-tier architecture;
  – Authorization.

Monitoring Workshop

CMS Database Infrastructure

• CMS has two production Oracle Real Application Clusters (RAC):
  – CMSONR, a 6-node Oracle RAC located in the CMS experimental area:
    • only visible from the CMS online network;
    • hosting two databases:
      – OMDS stores data for sub-detectors, trigger, conditions (slow control, configuration, detector status), luminosity and monitoring;
      – ORCON stores conditions (detector status data and calibration data);
  – CMSR, a 4-node Oracle RAC located at CERN IT:
    • only visible within the GPN;
    • hosting one database (ORCOFF), storing conditions, luminosity and workflow management data (file transfer, data bookkeeping, job processing, authentication and authorization).

CMS Database Data Flow

[Figure: CMS database data flow between the online network at IP5 (CMSONR: OMDS, Online Master Database System, and ORCON, Offline Reconstruction Condition Database Online System) and the GPN (CMSR: ORCOFF, Offline Reconstruction Condition Database Offline System), showing PopCon, Oracle Streaming and the Condition Drop-box.]

• OMDS stores all online conditions coming from the different sub-detectors.
• A subset (summary) of the condition data is read from OMDS, reformatted so that it can be retrieved as a C++ object (payload), and stored in ORCON:
  – using the Object Relational Access (ORA) design pattern;
  – performed by applications based on a common API integrated in CMSSW (PopCon).
• Oracle Streams populate ORCOFF with data from OMDS and ORCON.
• The Condition Drop-box automatically exports data processed offline into ORCON.
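The OMDS-to-ORCON step (read a subset, summarize it, store it as a payload) can be pictured with the hypothetical sketch below. It is not PopCon's API. The assumptions are:
- the `(channel, value)` row layout;
- the per-channel mean used as the "summary";
- a plain list standing in for an ORCON tag.

Real payloads are C++ objects persisted through ORA.

```python
from statistics import mean


def summarize(rows):
    """Reduce raw OMDS-like rows (channel, value) to one summary value per channel."""
    per_channel = {}
    for channel, value in rows:
        per_channel.setdefault(channel, []).append(value)
    return {channel: mean(values) for channel, values in per_channel.items()}


def transfer(rows, destination, since):
    """Summarize one batch of online rows and append the resulting payload, with its
    start of validity, to `destination` (a list standing in for an ORCON tag)."""
    payload = summarize(rows)
    destination.append({"since": since, "payload": payload})
    return payload


# Usage: three raw readings become a two-channel payload valid from run 100.
orcon_tag = []
print(transfer([("ch1", 1.0), ("ch1", 3.0), ("ch2", 5.0)], orcon_tag, since=100))
# prints {'ch1': 2.0, 'ch2': 5.0}
```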

Database monitoring tools

• Many tools are already available thanks to the IT DB services:
  – for developers, they allow checking the status of all services and the usage of DB resources:
    • main page: https://phydb.web.cern.ch/phydb/cms
      – SLS monitoring for all deployed services;
      – Lemon for all the hardware involved;
      – session monitoring for each DB service and for each schema;
  – for experts, they allow in-depth monitoring of each component of the system:
    • Streams availability;
    • DB resource usage (plenty of history plots);
  – automatic alarm notifications for:
    • service failures (invalid objects, Streams failures);
    • high loads on nodes (high CPU load, high network traffic…).
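The two kinds of automatic alarms (service failures and high loads) amount to checking each node's status against a set of rules. The sketch below is a hypothetical model of one monitoring cycle. The thresholds, field names and message texts are assumptions, not the IT DB services' actual configuration.

```python
from dataclasses import dataclass

# Hypothetical thresholds; the real limits are not given in the slides.
CPU_LOAD_LIMIT = 0.90        # fraction of CPU capacity
NETWORK_LIMIT_MBPS = 800.0   # network traffic per node


@dataclass
class NodeStatus:
    name: str
    cpu_load: float          # 0.0 .. 1.0
    network_mbps: float
    invalid_objects: int
    streams_ok: bool


def alarms_for(node: NodeStatus) -> list:
    """Return the alarm messages one monitoring cycle would raise for a node."""
    alarms = []
    # Service failures
    if node.invalid_objects > 0:
        alarms.append(f"{node.name}: {node.invalid_objects} invalid objects")
    if not node.streams_ok:
        alarms.append(f"{node.name}: Streams failure")
    # High loads on nodes
    if node.cpu_load > CPU_LOAD_LIMIT:
        alarms.append(f"{node.name}: high CPU load ({node.cpu_load:.0%})")
    if node.network_mbps > NETWORK_LIMIT_MBPS:
        alarms.append(f"{node.name}: high network traffic ({node.network_mbps:.0f} Mb/s)")
    return alarms
```

In practice, each non-empty result would be forwarded to the notification channel (e-mail, SMS) used by the service managers.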

Monitoring requirements for hardware and software

• Database safety and security.
• Hardware and service monitoring across different networks:
  – complying with the security policy of the different clusters;
  – with different levels of monitoring and a corresponding alarm system.

Data access patterns

• In general, access patterns depend on how an application exploits the data stored in its backend:
  – transactional data should be accessed by the application itself in update mode, and not be visible to other users;
  – bookkeeping and authentication information is static and read-only, but can be huge;
  – conditions are of two kinds:
    • static and read-only (construction, equipment);
    • varying with time and requiring frequent lookups (conditions, calibrations).

Access patterns for conditions

• Condition data produced by the CMS detector are essential for running the HLT, DQM and the offline reconstruction chain:
  – managed by several groups within the collaboration;
  – wide range of update frequency and data volume.
• The stability and availability of the infrastructure must be ensured, and its performance must not be degraded in either write or read access. This requires limiting the access patterns for these data:
  – establishing a strict policy:
    • NO DELETE, NO UPDATE, INSERT ONLY (data are appended to time-based sequences of validity ranges, i.e. Intervals Of Validity, IOVs);
  – promoting the use of a reduced number of applications for both data insertion and data retrieval:
    • PopCon and the Condition Drop-box;
    • framework modules reading conditions (grouped consistently via a Global Tag);
  – reducing the read load on the servers with a caching mechanism (FroNTier).
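The INSERT ONLY policy and the time-based lookup can be illustrated with a minimal in-memory IOV sequence. This is an illustrative sketch, not the CMSSW condition API. The assumptions are:
- the class name `IOVSequence`;
- an integer time axis (e.g. run numbers);
- string payload identifiers.

```python
import bisect


class IOVSequence:
    """Append-only sequence of Intervals Of Validity (hypothetical sketch).

    Entry i is (since_i, payload_i): payload_i is valid from since_i up to, but not
    including, since_{i+1}. There is deliberately no update or delete method.
    """

    def __init__(self):
        self._since = []
        self._payloads = []

    def append(self, since: int, payload_id: str) -> None:
        # INSERT ONLY: a new interval may only start after the last one.
        if self._since and since <= self._since[-1]:
            raise ValueError("new IOV must start after the last one")
        self._since.append(since)
        self._payloads.append(payload_id)

    def lookup(self, time: int) -> str:
        """Return the payload valid at `time` (binary search on the start times)."""
        i = bisect.bisect_right(self._since, time) - 1
        if i < 0:
            raise KeyError(f"no IOV covers time {time}")
        return self._payloads[i]
```

Because entries are only ever appended in increasing order, the start times stay sorted, and each lookup is a logarithmic-time binary search.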

Retrieving conditions

• The HLT (running at P5), DQM (running at P5 and Tier0/CAF), offline reconstruction jobs (running at Tier0/Tier1s, ~20000 per day) and a subset of analysis jobs (running at Tier2s, ~50000 per day) can create a massive load when retrieving data (conditions, luminosity) from ORCON/ORCOFF.
• Frontier caches minimize direct read-only access to Oracle:
  – 2 services are deployed: at P5 on ORCON for HLT and DQM, and at CERN on ORCOFF for Tier0/1/2:
    • dedicated instances for Tier0 express/prompt reconstruction, luminosity workflows and Monte Carlo simulation;
  – the cache refresh policy can introduce some latency in retrieving data.
• The system is reliable with respect to the current workflows, but a change to the current infrastructure could lead to severe loads on one or more nodes/services: scalability is an issue.
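The trade-off above (fewer direct Oracle reads in exchange for possible latency in seeing new data) is the general behaviour of an expiring cache. The sketch below is a generic time-to-live (TTL) cache, not FroNTier's or Squid's implementation, whose expiration rules are more elaborate. The `fetch` hook and the TTL value are assumptions.

```python
import time


class TTLCache:
    """Read-through cache with a time-to-live refresh policy (generic sketch)."""

    def __init__(self, fetch, ttl: float, clock=time.monotonic):
        self._fetch = fetch        # hypothetical function that reads from the database
        self._ttl = ttl            # seconds an entry is served without re-reading
        self._clock = clock
        self._entries = {}         # key -> (value, time it was fetched)
        self.backend_hits = 0      # number of reads that reached the database

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[1] < self._ttl:
            return entry[0]        # served from cache; may be up to `ttl` seconds old
        value = self._fetch(key)   # miss or expired entry: go to the database
        self.backend_hits += 1
        self._entries[key] = (value, now)
        return value
```

With a 60-second TTL, a payload updated in the database just after being cached is only seen by clients once the entry expires. This is the latency mentioned above, and it is the price for serving all other requests without touching Oracle.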

FroNTier Architecture

FroNTier monitoring

• Each of the FroNTier services is monitored:
  – availability of the CERN launchpads and all squids;
  – HTTP requests for the CERN launchpads and all squids;
  – network traffic of the CERN launchpads and all squids;
  – objects stored in cache for the CERN launchpads and all squids (object = payload of a FroNTier request).

Database safety and security

• Definition of a clearer account policy, and improvement of how user privileges are granted:
  – based on application and user roles:
    • see the Oracle® Database Security Guide.
• This policy is beneficial for monitoring access to all DB schemas:
  – each account can easily be associated with an application or a group of developers:
    • transactions can easily be tracked;
  – access with schema owner privileges is reduced:
    • accesses trying to perform unauthorized actions (e.g. creating or inserting values into a read-only table) can be identified quickly.
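In Oracle, this policy is expressed with roles and GRANT statements. The Python sketch below only models the resulting checks:
- every account maps to one application, so an access can be traced;
- only the role's privileges are allowed, so a reader inserting into a read-only table is caught.

The account names, applications and role sets are hypothetical.

```python
# Hypothetical account policy: every account belongs to one application and one role.
ACCOUNTS = {
    "cms_cond_reader": {"application": "reco", "role": "reader"},
    "cms_cond_writer": {"application": "popcon", "role": "writer"},
}

# Privileges per role; writers get INSERT but no UPDATE/DELETE (INSERT ONLY policy),
# and only the schema owner may create objects.
ROLE_PRIVILEGES = {
    "reader": {"SELECT"},
    "writer": {"SELECT", "INSERT"},
    "owner": {"SELECT", "INSERT", "CREATE"},
}


def check_access(account: str, action: str):
    """Return (allowed, application) so every attempt can be traced to an application."""
    info = ACCOUNTS.get(account)
    if info is None:
        return False, None  # unknown account: deny, and there is no application to blame
    allowed = action.upper() in ROLE_PRIVILEGES[info["role"]]
    return allowed, info["application"]
```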

Monitoring access to Conditions

• PopCon monitors all payload transfers to the production database:
  – using a DB account where the status of all transfers is logged in relational tables;
  – exposing the logs to developers, managers and users via a web-based application:
    • see Antonio’s presentation this afternoon.
• From the DB point of view, all transactions against production schemas performing DML statements are logged.
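A transfer log of this kind is an insert-per-transfer table plus the queries a web front-end runs on it. In the sketch below:
- SQLite stands in for the Oracle logging account, so it runs anywhere;
- the table name `transfer_log` and its columns are assumptions, not PopCon's actual schema.

```python
import sqlite3
from datetime import datetime, timezone


def open_log(path: str = ":memory:") -> sqlite3.Connection:
    """Create the hypothetical transfer log table in a SQLite stand-in database."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS transfer_log (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               tag TEXT NOT NULL,
               since INTEGER NOT NULL,
               status TEXT NOT NULL,
               logged_at TEXT NOT NULL)"""
    )
    return conn


def log_transfer(conn, tag: str, since: int, status: str) -> None:
    """Record the outcome of one payload transfer (status such as 'OK' or 'FAILED')."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO transfer_log (tag, since, status, logged_at) VALUES (?, ?, ?, ?)",
            (tag, since, status, datetime.now(timezone.utc).isoformat()),
        )


def failed_transfers(conn):
    """The kind of query a web front-end could run to show failed transfers."""
    return conn.execute(
        "SELECT tag, since FROM transfer_log WHERE status = 'FAILED' ORDER BY id"
    ).fetchall()
```

Bind parameters (`?`) keep values out of the SQL text. The same practice applies with Oracle bind variables.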

Monitoring access to Conditions

• The creation/modification of ORA schemas (i.e. schemas where a mapping between tables and C++ data members is defined) is not yet monitored:
  – from the DB point of view, this means also logging DDL statements, together with the DML statements storing the mapping in the dedicated tables.
• This new monitoring instance will help to quickly identify:
  – access to production schemas with wrong privileges;
  – users/applications trying to perform illegal actions;
  – corrupted schemas, helping experts with troubleshooting.
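Once DDL as well as DML statements are logged, flagging illegal actions is a post-processing step over the audit trail. This sketch works on hypothetical `(account, schema, statement)` tuples and classifies statements by their leading keyword. That simplification ignores comments and PL/SQL blocks. It flags two cases:
- DDL on a production schema by an account other than its owner;
- UPDATE/DELETE, which violate the INSERT ONLY policy.

```python
# Leading keywords used to classify SQL statements (simplified).
DDL_KEYWORDS = {"CREATE", "ALTER", "DROP", "TRUNCATE", "RENAME"}
DML_KEYWORDS = {"INSERT", "UPDATE", "DELETE", "MERGE"}


def classify(statement: str):
    """Return (kind, keyword), where kind is 'DDL', 'DML' or 'OTHER'."""
    words = statement.split()
    keyword = words[0].upper() if words else ""
    if keyword in DDL_KEYWORDS:
        return "DDL", keyword
    if keyword in DML_KEYWORDS:
        return "DML", keyword
    return "OTHER", keyword


def suspicious(audit_records, schema_owners):
    """Flag audited statements that change production schemas illegally.

    audit_records: iterable of (account, schema, statement) tuples.
    schema_owners: mapping schema -> the only account allowed to run DDL on it.
    """
    flagged = []
    for account, schema, statement in audit_records:
        kind, keyword = classify(statement)
        if kind == "DDL" and schema_owners.get(schema) != account:
            flagged.append((account, schema, "DDL by non-owner"))
        elif kind == "DML" and keyword in {"UPDATE", "DELETE"}:
            flagged.append((account, schema, "violates INSERT ONLY policy"))
    return flagged
```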

Plans for conditions in CMSSW

• The new account policy and the schema modification monitoring will be put into the Condition Core software package.
• All actions will be performed with the help of the IT DBAs:
  – validation of code and procedures;
  – testing.

Hardware & service configuration

• The hardware involved in DB operations is split across two networks:
  – CERN GPN:
    • only applications approved by the CERN Security Team can be visible from the outside network!
  – the CMS online network at IP5 has a very strict security policy and a very constrained data transfer design:
    • files cannot be copied from the GPN to the CMS network; they must be pulled into the online cluster from the offline network;
    • files must be pushed by the online network to the offline network;
    • transferring data from the GPN to the CMS network is not envisaged in the online network design.
• Some services are deployed in one network, but others (e.g. the condition drop-box) use resources in both networks:
  – the communication between networks must be monitored too!
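One way to read the transfer rules above is that every copy crossing the network boundary must be started from the online side, either pulling from the GPN or pushing to it. The sketch below encodes that reading, which is an assumption, as a check a cross-network monitor could apply to observed transfers. The `Transfer` record and the network labels are hypothetical.

```python
from typing import NamedTuple


class Transfer(NamedTuple):
    initiator: str    # network of the host starting the copy: "online" or "gpn"
    source: str       # network where the file currently lives
    destination: str  # network receiving the file


def is_allowed(t: Transfer) -> bool:
    """True if the transfer respects the (assumed) online network policy."""
    if t.source == t.destination:
        return True  # no network boundary crossed
    # Crossing the boundary: only the online side may start the copy,
    # either pulling from the GPN or pushing to the GPN.
    return t.initiator == "online"


def audit(transfers):
    """Return the transfers a cross-network monitor should report."""
    return [t for t in transfers if not is_allowed(t)]
```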

Front-end applications

• The different monitoring instances for database tools have a front-end application:
  – retrieving monitoring data;
  – aggregating them according to metrics based on different use-case models;
  – publishing them.
• The DB group focuses on web-based front-end applications:
  – the monitoring data are read directly from Oracle:
    • small data volume;
    • latency reduced as much as possible;
    • Oracle availability is checked (if Oracle fails, the application fails and an alarm is raised).
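Reading directly from Oracle makes the front-end itself an availability probe: a failed read both stops the page and triggers an alarm. The sketch below captures that behaviour. `query` (for example a function running a cx_Oracle cursor) and `raise_alarm` are hypothetical hooks injected by the caller.

```python
import logging

logger = logging.getLogger("db_monitoring_frontend")


class DatabaseUnavailable(Exception):
    """Raised when the monitoring data cannot be read from the database."""


def read_monitoring_data(query, raise_alarm):
    """Read monitoring data directly from the database, with no intermediate cache.

    query: zero-argument callable performing the database read (hypothetical hook).
    raise_alarm: callable notifying the operators (hypothetical hook).
    """
    try:
        return query()
    except Exception as exc:  # any database error is treated as "Oracle unavailable"
        logger.error("database read failed: %s", exc)
        raise_alarm(f"Oracle unavailable: {exc}")
        # Let the application fail visibly instead of showing stale data.
        raise DatabaseUnavailable(str(exc)) from exc
```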

Multi-tier architecture

• The monitoring system is based on a three-tier architecture:
  – the presentation tier (frontend server) supports user interaction and data presentation;
  – the logic tier (backend server) handles the information exchange between the database and the user interface;
  – the data tier supports access to the data stored in the database.
• This architecture has many advantages from the CMS DB monitoring point of view:
  – it encapsulates the processing related to user interaction in an application separate from the client application;
  – it makes it possible to program connection strategies efficiently (such as client request queuing and database access control);
  – all the code related to the database connection can be totally separated from the client application: no queries are issued by the client (users!);
  – it enforces the security of the backend and the DB through the GPN firewall protection.

[Figure: three-tier architecture. Data tier: ORCOFF Database; logic tier: backend server (WSGI, cx_Oracle); presentation tier: frontend server and web interface (POST, XMLHTTP).]
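The logic tier in the figure can be sketched as a small WSGI application. It accepts POST requests from the web interface and maps a query name to a fixed, server-side SQL statement, so that no query text ever comes from the client. The following are assumptions:
- the query names and SQL;
- the JSON request format;
- the injected `run_query` data-tier hook (which in production would execute the SQL through cx_Oracle).

The sketch uses only the standard library.

```python
import io
import json

# Whitelisted, named queries: clients choose a name, never send SQL (hypothetical).
QUERIES = {
    "service_status": "SELECT service, status FROM monitoring_status",
}


def _reply(start_response, status, body, content_type="text/plain"):
    start_response(status, [("Content-Type", content_type),
                            ("Content-Length", str(len(body)))])
    return [body]


def make_app(run_query):
    """Build the logic-tier WSGI application around a data-tier hook run_query(sql)."""
    def app(environ, start_response):
        if environ.get("REQUEST_METHOD") != "POST":
            return _reply(start_response, "405 Method Not Allowed", b"POST only")
        length = int(environ.get("CONTENT_LENGTH") or 0)
        try:
            request = json.loads(environ["wsgi.input"].read(length) or b"{}")
        except ValueError:  # includes JSONDecodeError and bad UTF-8
            return _reply(start_response, "400 Bad Request", b"invalid JSON")
        name = request.get("query") if isinstance(request, dict) else None
        sql = QUERIES.get(name) if isinstance(name, str) else None
        if sql is None:
            return _reply(start_response, "400 Bad Request", b"unknown query")
        body = json.dumps({"rows": run_query(sql)}).encode("utf-8")
        return _reply(start_response, "200 OK", body, "application/json")

    return app


if __name__ == "__main__":
    # Simulate a browser POST with a stub data tier, without starting a server.
    app = make_app(lambda sql: [["ORCOFF", "OK"]])
    data = json.dumps({"query": "service_status"}).encode()
    environ = {"REQUEST_METHOD": "POST", "CONTENT_LENGTH": str(len(data)),
               "wsgi.input": io.BytesIO(data)}
    print(b"".join(app(environ, lambda status, headers: print(status))))
```

For local testing, the same `app` can be served with `wsgiref.simple_server.make_server("", 8000, app).serve_forever()`. In production it would sit behind the frontend server.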

Authorization

• Database activity must be controlled.
• Not all monitoring data should be visible worldwide:
  – DB service names;
  – account names.
• An authentication mechanism for all web-based applications, deployed in the frontend servers:
  – for the Drop-box: access to the machine where the service is deployed;
  – for web browsing: SSO, e-groups.
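Once a user is authenticated, hiding DB service names and account names from non-privileged users is a filter applied before publishing. In this sketch:
- the e-group name and field names are hypothetical;
- the user's e-groups are assumed to come from the SSO session, whose integration is not shown.

```python
# Fields that must not be visible worldwide: DB service names and account names.
SENSITIVE_FIELDS = {"service_name", "account_name"}
# Hypothetical e-group whose members may see the full monitoring data.
PRIVILEGED_EGROUP = "cms-db-monitoring-experts"


def visible_record(record: dict, user_egroups: set) -> dict:
    """Return the monitoring record as it may be shown to this user.

    user_egroups: e-groups of the authenticated user (e.g. taken from the SSO session).
    """
    if PRIVILEGED_EGROUP in user_egroups:
        return dict(record)
    return {key: ("<hidden>" if key in SENSITIVE_FIELDS else value)
            for key, value in record.items()}
```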

Technology

• There are many technologies available on the market and within the Open Source community.
• DB web-based monitoring must be visible not only on desktops and laptops, but also on modern mobile devices:
  – see Antonio’s slides, where this item is discussed in detail.
