Database Monitoring Requirements
Salvatore Di Guida (CERN), on behalf of the CMS DB group
Outline
• CMS Database infrastructure and data flow.
• Data access patterns.
• Requirements coming from the hardware and software infrastructure:
  – DB safety and security;
  – DB monitoring for Conditions.
• Requirements to be fulfilled by front-end applications (web):
  – 3-tier architecture;
  – Authorization.
Monitoring Workshop
CMS Database Infrastructure
• CMS has two production Oracle Real Application Clusters (RAC):
  – CMSONR, a 6-node Oracle RAC located in the CMS experimental area:
    • Only visible from the CMS online network,
    • Hosting two databases:
      – OMDS stores data for sub-detectors, trigger, conditions (slow control, configuration, detector status), luminosity, monitoring,
      – ORCON stores conditions (detector status data and calibration data);
  – CMSR, a 4-node Oracle RAC located at CERN IT:
    • Only visible within the GPN,
    • Hosting one database (ORCOFF), storing conditions, luminosity, workflow management data (file transfer, data bookkeeping, job processing, authentication and authorization).
CMS Database Data Flow
[Figure: data flow diagram. In the online network at IP5, the CMSONR cluster hosts OMDS (Online Master Database System) and ORCON (Offline Reconstruction Condition Database Online System). In the GPN, the CMSR cluster hosts ORCOFF (Offline Reconstruction Condition Database Offline System). Other labelled elements: CMS (Compact Muon Solenoid), PopCon, Streaming, Condition Drop-box.]
• OMDS stores all online conditions coming from the different sub-detectors.
• A subset (summary) of condition data is read from OMDS, reformatted so that it can be retrieved as a C++ object (payload), and stored in ORCON:
  – Using the Object Relational Access (ORA) design pattern;
  – Performed by applications based on a common API integrated in CMSSW (PopCon).
• Oracle Streams populate ORCOFF with data from OMDS and ORCON.
• The Condition Dropbox automatically exports data processed offline into ORCON.
Database monitoring tools
• Many tools are already available thanks to the IT DB services:
  – For developers, they allow checking the status of all services and the usage of DB resources:
    • Main page: https://phydb.web.cern.ch/phydb/cms
      – SLS monitoring for all services deployed,
      – Lemon for all hardware involved,
      – Session monitoring for each DB service and for each schema;
  – For experts, they allow in-depth monitoring of each component of the system:
    • Streams availability,
    • DB resource usage (plenty of history plots);
  – Automatic alarm notifications:
    • service failures (invalid objects, streams failure),
    • high loads on nodes (high CPU load, high network traffic…).
Monitoring requirements for hardware and software
• Database safety and security.
• Hardware and service monitoring across different networks:
  – Complying with the security policy of the different clusters;
  – With different levels of monitoring and a corresponding alarm system.
Data access patterns
• In general, access patterns depend on how an application exploits data stored in its backend:
  – Transactional data should be accessed by the application itself in update mode, and should not be visible to other users;
  – Bookkeeping and authentication information is static and read-only, but can be huge;
  – Conditions are of two kinds:
    • static and read-only (construction, equipment),
    • varying with time and requiring frequent lookups (conditions, calibrations).
Access patterns for conditions
• Condition data produced by the CMS detector are essential for running HLT, DQM and the offline reconstruction chain:
  – Managed by several groups within the collaboration;
  – Wide range of update frequency and data volume.
• The stability and availability of the infrastructure must be ensured, and its performance must not be degraded in either write or read access. This requires limiting the access patterns for these data:
  – establishing a strict policy:
    • NO DELETE, NO UPDATE, INSERT ONLY (append data to time-based sequences of validity ranges, i.e. IOVs),
  – promoting the usage of a reduced number of applications, for both data insertion and data retrieval:
    • PopCon and Condition DropBox,
    • Framework modules reading conditions (grouped consistently via Global Tag);
  – reducing the read load on the servers with a caching mechanism (FroNTier).
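The insert-only policy can be illustrated with a minimal Python sketch. The `IOVSequence` class, its method names and the payload labels are hypothetical and are not the CMSSW conditions API; the sketch only models a sequence of validity ranges in which each payload is valid from its `since` value until the next entry begins.

```python
import bisect


class IOVSequence:
    """Hypothetical append-only sequence of validity ranges (IOVs)."""

    def __init__(self):
        self._since = []     # sorted start-of-validity values (e.g. run numbers)
        self._payloads = []  # payload valid from the matching 'since' value

    def append(self, since, payload):
        # INSERT ONLY: a new IOV must start after the last one.
        if self._since and since <= self._since[-1]:
            raise ValueError("IOVs can only be appended, never rewritten")
        self._since.append(since)
        self._payloads.append(payload)

    def lookup(self, time):
        # Find the last IOV whose 'since' is <= time.
        index = bisect.bisect_right(self._since, time) - 1
        if index < 0:
            raise KeyError(f"no conditions valid at {time}")
        return self._payloads[index]


tag = IOVSequence()
tag.append(1, "pedestals_v1")
tag.append(100, "pedestals_v2")
print(tag.lookup(42))   # pedestals_v1
print(tag.lookup(100))  # pedestals_v2
```

Because entries are never updated or deleted, a lookup for a past time always returns the same payload, which is what makes the data safe to cache.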
Retrieving conditions
• The HLT (running at P5), DQM (running at P5 and Tier0/CAF), offline reconstruction jobs (running at Tier0/Tier1s, ~20000 per day) and a subset of analysis jobs (running at Tier2s, ~50000 per day) can create a massive load when retrieving data (conditions, luminosity) from ORCON/ORCOFF.
• Frontier caches minimize direct read-only access to Oracle:
  – 2 services implemented: at P5 on ORCON for HLT and DQM, and at CERN on ORCOFF for Tier0/1/2,
    • Dedicated instances for Tier0 express/prompt reconstruction, luminosity workflows, Monte Carlo simulation,
  – The cache refreshing policy can imply some latency in retrieving data.
• The system is reliable w.r.t. the current workflows, but a change to the current infrastructure may lead to severe loads on one or more nodes/services. Scalability is an issue.
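The trade-off between reduced database load and refresh latency can be sketched with a small read-through cache. This is not FroNTier's implementation: `TTLCache`, `fetch_from_db`, the 300-second expiry and the injected clock are all assumptions made for illustration.

```python
class TTLCache:
    """Hypothetical read-through cache; entries expire after ttl seconds."""

    def __init__(self, fetch, ttl, clock):
        self._fetch = fetch   # function querying the backend (stand-in for Oracle)
        self._ttl = ttl
        self._clock = clock   # callable returning the current time in seconds
        self._entries = {}    # key -> (expiry time, value)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now < entry[0]:
            return entry[1]           # cache hit: no backend access
        value = self._fetch(key)      # miss or expired entry: query the backend
        self._entries[key] = (now + self._ttl, value)
        return value


backend = {"pedestals": "v1"}
queries = []


def fetch_from_db(key):
    queries.append(key)
    return backend[key]


now = [0.0]
cache = TTLCache(fetch_from_db, ttl=300, clock=lambda: now[0])

first = cache.get("pedestals")   # backend queried
backend["pedestals"] = "v2"      # new payload written to the database
stale = cache.get("pedestals")   # still served from the cache
now[0] = 301.0                   # the cached entry has expired
fresh = cache.get("pedestals")   # backend queried again
print(first, stale, fresh, len(queries))  # v1 v1 v2 2
```

Three reads cost two backend queries, and the second read returns the old payload until the entry expires: this is the latency implied by the refresh policy.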
FroNTier Architecture
FroNTier monitoring
• Each of the FroNTier services is monitored:
  – Availability of CERN launchpads and all squids;
  – HTTP requests for CERN launchpads and all squids;
  – Network traffic of CERN launchpads and all squids;
  – Objects stored in cache for CERN launchpads and all squids (object = payload of a FroNTier request).
Database safety and security
• Definition of a clearer account policy, and improved granting of user privileges:
  – Based on application and user roles,
    • See the Oracle® Database Security Guide
• This policy is beneficial for monitoring the access to all DB schemas:
  – Each account can be easily associated with an application or a group of developers:
    • Transactions can be easily tracked
  – Reduce access with schema owner privileges:
    • Quickly identify accesses trying to perform unauthorized actions (e.g. creating or inserting values in a read-only table).
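A role-based account policy of this kind can be sketched as a lookup from account to role to allowed statement types. The role names, account names and privilege sets below are hypothetical; a real deployment would rely on Oracle roles and grants rather than application-side tables.

```python
# Hypothetical roles and the statement types each one may issue.
ROLE_PRIVILEGES = {
    "reader": {"SELECT"},
    "writer": {"SELECT", "INSERT"},
    "owner": {"SELECT", "INSERT", "CREATE", "ALTER", "DROP"},
}

# Hypothetical accounts, each tied to one application or developer group.
ACCOUNTS = {
    "popcon_writer": {"role": "writer", "used_by": "PopCon"},
    "web_monitor_reader": {"role": "reader", "used_by": "web monitoring"},
}


def check_access(account, statement_type):
    """Return (allowed, used_by) for a statement issued by an account."""
    info = ACCOUNTS[account]
    allowed = statement_type in ROLE_PRIVILEGES[info["role"]]
    return allowed, info["used_by"]


print(check_access("popcon_writer", "INSERT"))       # (True, 'PopCon')
print(check_access("web_monitor_reader", "INSERT"))  # (False, 'web monitoring')
```

Since each account maps to exactly one application or group, a rejected statement can be traced back to its origin immediately.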
Monitoring access to Conditions
• PopCon monitors all payload transfers to the production database:
  – Using a DB account where the status of all transfers is logged in relational tables;
  – Exposing the logs to developers, managers, users via a web-based application;
    • See Antonio’s presentation this afternoon.
• From the DB point of view, all transactions against production schemas performing DML statements are logged.
Monitoring access to Conditions
• The creation/modification of ORA schemas (i.e. schemas where a mapping between tables and C++ data members is defined) is not yet monitored:
  – From the DB point of view, this means also logging DDL statements, together with the DML statements storing the mapping in the dedicated tables.
• This new monitoring instance will help to quickly identify:
  – Access to production schemas with wrong privileges;
  – Users/applications trying to perform illegal actions;
  – Corrupted schemas, providing help to experts for troubleshooting.
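The distinction between DDL and DML logging can be sketched as follows. This is a simplified illustration, not the planned monitoring code: the classification looks only at the first keyword, and the account and table names are hypothetical.

```python
DDL_KEYWORDS = {"CREATE", "ALTER", "DROP", "TRUNCATE", "RENAME"}
DML_KEYWORDS = {"INSERT", "UPDATE", "DELETE", "MERGE"}


def classify_statement(sql):
    """Classify a SQL statement by its first keyword (simplified)."""
    stripped = sql.strip()
    keyword = stripped.split(None, 1)[0].upper() if stripped else ""
    if keyword in DDL_KEYWORDS:
        return "DDL"
    if keyword in DML_KEYWORDS:
        return "DML"
    return "OTHER"


def audit(statements):
    """Return one log record per DDL or DML statement."""
    records = []
    for account, sql in statements:
        kind = classify_statement(sql)
        if kind != "OTHER":
            records.append({"account": account, "kind": kind, "sql": sql})
    return records


# Hypothetical accounts and tables, for illustration only.
log = audit([
    ("schema_owner", "CREATE TABLE pedestals (id NUMBER)"),
    ("popcon_writer", "insert into mapping_table values (1, 'pedestals')"),
    ("web_monitor_reader", "SELECT * FROM pedestals"),
])
print([record["kind"] for record in log])  # ['DDL', 'DML']
```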
Plans for conditions in CMSSW
• The new account policy and the schema modification monitoring are going to be put in the Condition Core software package.
• All actions will be performed with the help of IT DBAs:
  – Validation of code and procedures;
  – Testing.
Hardware & service configuration
• The hardware involved in DB operations is split across two networks:
  – CERN GPN:
    • Only applications approved by the CERN Security Team can be visible from the outside network!
  – The CMS online network at IP5 has a very strict security policy and a very constrained data transfer design:
    • Files cannot be copied from the GPN to the CMS network; they must be pulled into the online cluster from the offline network,
    • Files must be pushed by the online network to the offline network,
    • Transferring data from the GPN to the CMS network is not envisaged in the online network design.
• Some services are deployed in one network, but others (e.g. condition drop-box) use resources in both networks:
  – The communication between networks must be monitored too!
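One way to read the transfer rules above is that every cross-network transfer has to be initiated from the online side, whether it is a pull or a push. The sketch below encodes that interpretation; the network labels and the `transfer_allowed` function are hypothetical and do not describe any existing tool.

```python
ONLINE = "cms-online"  # hypothetical label for the CMS online network at IP5
GPN = "gpn"            # hypothetical label for the CERN GPN


def transfer_allowed(initiator, source, destination):
    """Cross-network transfers must be initiated from the online network."""
    if source == destination:
        return True  # transfer within one network: not restricted in this sketch
    return initiator == ONLINE


attempts = [
    (ONLINE, GPN, ONLINE),  # online cluster pulls a file from the GPN
    (ONLINE, ONLINE, GPN),  # online cluster pushes a file to the GPN
    (GPN, GPN, ONLINE),     # GPN host copies a file into the online network
]
blocked = [attempt for attempt in attempts if not transfer_allowed(*attempt)]
print(blocked)  # [('gpn', 'gpn', 'cms-online')]
```

A monitor for the communication between networks could count such blocked attempts and raise an alarm when they appear.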
Front-end applications
• The different monitoring instances for Database tools have a front-end application:
  – Retrieving monitoring data;
  – Aggregating them according to metrics based on different use-case models;
  – Publishing them.
• The DB group focuses on web-based front-end applications:
  – The monitoring data are read directly from Oracle:
    • Small data volume,
    • Reduce latency as much as possible,
    • Checking Oracle availability (if Oracle fails, the application fails and an alarm is raised).
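The retrieve, aggregate and publish steps, together with the "application fails and an alarm is raised" behaviour, can be sketched as below. An in-memory SQLite database stands in for Oracle, and the `session_log` table, the schema names and the `run_monitor` function are assumptions made for this illustration.

```python
import sqlite3


def retrieve(connection):
    # Hypothetical monitoring table: one row per sampled session count.
    return connection.execute(
        "SELECT schema_name, sessions FROM session_log"
    ).fetchall()


def aggregate(rows):
    # Example metric: total number of sessions per schema.
    totals = {}
    for schema, sessions in rows:
        totals[schema] = totals.get(schema, 0) + sessions
    return totals


def run_monitor(connection, alarms):
    """Return the data to publish; record an alarm and fail if the DB fails."""
    try:
        return aggregate(retrieve(connection))
    except sqlite3.Error as error:
        alarms.append(f"database unavailable: {error}")
        raise


connection = sqlite3.connect(":memory:")  # stand-in for the Oracle database
connection.execute("CREATE TABLE session_log (schema_name TEXT, sessions INTEGER)")
connection.executemany(
    "INSERT INTO session_log VALUES (?, ?)",
    [("SCHEMA_A", 3), ("SCHEMA_A", 2), ("SCHEMA_B", 1)],
)
alarms = []
published = run_monitor(connection, alarms)
print(published)  # {'SCHEMA_A': 5, 'SCHEMA_B': 1}

connection.close()  # simulate a database outage
try:
    run_monitor(connection, alarms)
except sqlite3.Error:
    print(len(alarms))  # 1
```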
Multi-tier architecture
• The monitoring system is based on a three-tier architecture:
  – The presentation tier (frontend server) supports user interaction and data presentation;
  – The logic tier (backend server) handles information exchange between the database and the user interface;
  – The data tier supports the access to the data stored in the database.
• This architecture has many advantages from the CMS DB monitoring point of view:
  – it encapsulates the processing related to user interaction in an application separated from the client application;
  – it makes it possible to program connection strategies efficiently (such as client request queuing, database access control);
  – all the code related to the database connection can be totally separated from the client application, with no queries issued by the client (users!);
  – it enforces the security of the backend and the DB using the firewall protection of the GPN.
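The separation between tiers can be sketched with a small WSGI application. This is an illustration under stated assumptions, not the CMS backend: SQLite stands in for Oracle (the real system would go through cx_Oracle), and the `transfer_log` table, the `/transfers` report name and the XML layout are hypothetical. The point it shows is that the client only names a report, while all SQL stays in the logic tier.

```python
import sqlite3
from xml.sax.saxutils import quoteattr

# Data tier: an in-memory SQLite database stands in for Oracle.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE transfer_log (tag TEXT, status TEXT)")
db.execute("INSERT INTO transfer_log VALUES ('pedestals', 'OK')")

# Logic tier: the only SQL in the system lives here, keyed by report name.
REPORTS = {"/transfers": "SELECT tag, status FROM transfer_log ORDER BY tag"}


def application(environ, start_response):
    """WSGI entry point of the hypothetical backend server."""
    query = REPORTS.get(environ.get("PATH_INFO", ""))
    if query is None:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"unknown report"]
    rows = db.execute(query).fetchall()
    items = "".join(
        f"<transfer tag={quoteattr(tag)} status={quoteattr(status)}/>"
        for tag, status in rows
    )
    body = f"<transfers>{items}</transfers>".encode("utf-8")
    start_response("200 OK", [("Content-Type", "application/xml")])
    return [body]


def call(path):
    """Simulate a request from the presentation tier without a real server."""
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    body = b"".join(application({"PATH_INFO": path}, start_response))
    return captured["status"], body


print(call("/transfers"))
print(call("/anything-else")[0])  # 404 Not Found
```

The same `application` callable could be served with the standard library's `wsgiref.simple_server.make_server` for local testing.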
[Figure: three-tier architecture diagram. Data tier: ORCOFF Database. Logic tier: Backend server. Presentation tier: Frontend server, Web Interface. Other labels: cx_Oracle, WSGI, POST, XML, HTTP.]
Authorization
• Database activity must be controlled.
• Not all monitoring data should be visible worldwide:
  – DB service names;
  – Account names.
• Authentication mechanism for all web-based applications deployed on the frontend servers:
  – For Drop-box: access to the machine where the service is deployed;
  – For Web browsing: SSO, e-groups.
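Restricting the visibility of service and account names can be sketched as a filter applied before monitoring records are published. The field names, group names and the `visible_record` function are hypothetical; in practice group membership would come from SSO and e-groups rather than a hard-coded set.

```python
# Hypothetical names of the fields that must not be visible worldwide.
SENSITIVE_FIELDS = {"service_name", "account_name"}


def visible_record(record, user_groups, allowed_groups):
    """Return the record, hiding sensitive fields from unauthorized users."""
    if user_groups & allowed_groups:
        return dict(record)
    return {
        key: value
        for key, value in record.items()
        if key not in SENSITIVE_FIELDS
    }


record = {"service_name": "svc1", "account_name": "acct1", "sessions": 12}
allowed = {"db-experts"}  # hypothetical e-group name

print(visible_record(record, {"db-experts"}, allowed))
print(visible_record(record, {"guests"}, allowed))  # {'sessions': 12}
```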
Technology
• There are many technologies available on the market and within the Open Source community.
• Web-based DB monitoring must be visible not only on desktops and laptops, but also on modern mobile devices:
  – See Antonio’s slides where this item is discussed in detail.