Peter Chochula and Svetozár Kapusta, ALICE DCS Workshop, October 6, 2005
DCS Databases
Outline
- DCS data flow
- DCS configuration
  - Framework devices
  - FERO configuration
- DCS archive
  - Implementation of the DCS archive and present status
  - DCS and the conditions data
  - AMANDA
  - Additional developments of the DCS archive interface
The DCS Data Flow
[Diagram: the DCS data flow. PVSSII sits at the centre, connected to the configuration database, the archive, and the controlled devices (including FERO); it exchanges data with external systems (ECS, LHC, DAQ, TRG, HLT) via DIM/DIP and supervises services such as electricity, ventilation, cooling, gas, magnets, safety and access control.]
The DCS Configuration
- FW tools provide methods for creating and using the configuration database for FW devices
- The ALICE add-on is the FERO configuration
- Ways to organize the FERO configuration database were presented several times (for example, see the slides from the Utrecht Workshop, or contact Peter or Sveťo)
FERO Configuration - remarks
- There is a common misunderstanding concerning the FERO configuration: FERO configuration data and FED configuration are two different things (a minimal sketch contrasting them follows below)
- FERO configuration data is loaded directly by the FED into the front-end electronics (e.g. thresholds, mask matrices, etc.)
  - It should be organized as tables or BLOBs, as explained in previous meetings
- FED configuration concerns the monitoring and operation of the FED itself (alert limits, etc.)
  - It should be based on the FW concepts and components: the PVSSII FED interface should follow the FW device definition
  - The FW configuration database tools can be used to create and manipulate the configuration records; tools for defining the FW device are available
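A minimal sketch of the distinction, in C++; all type and field names here are illustrative assumptions, not part of the actual FED or FW interfaces:

```cpp
// Illustrative only: FERO configuration data is an opaque payload that the
// FED ships into the front-end electronics, while the FED configuration
// describes the FED device itself in FW terms.
#include <cstdint>
#include <string>
#include <vector>

// FERO configuration data: raw electronics settings, naturally stored
// as a table row or a BLOB in the configuration database.
struct FeroChipConfig {
    std::vector<uint16_t> thresholds;   // DAC/threshold settings per chip
    std::vector<uint8_t>  maskMatrix;   // packed pixel-mask bits
};

// FED configuration: FW-device-style monitoring and operation settings.
struct FedDeviceConfig {
    std::string dpName;    // PVSS datapoint implementing the FW device
    double alertLimitLow;  // alert limits handled by the FW alert scheme
    double alertLimitHigh;
};

int main() { return 0; }  // nothing to run; the structures are the point
```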
The DCS Archive implementation
The concept has been presented several times (for example, see the slides from the Utrecht Workshop)
The DCS Archive and Conditions Data
- The DCS archive contains all measured values (tagged for archival)
  - It is used by trending tools and allows access to historical values, e.g. for system debugging
  - It is too big and too complex for offline data analysis (it contains information which is not directly related to physics data, such as safety, services, etc.)
- The main priority of the DCS is to assure reliable archival of its data
- The conditions database contains a subset of the measured values, stripped from the archive
DCS Archive implementation
- Standard PVSS archival:
  - Archives are stored in local files (one set of files per PVSSII system)
  - Offline archives are stored on backup media (tools for backup and retrieval are part of the PVSSII)
  - External access to the archive files is problematic due to the proprietary PVSS format
- The PVSS RDB archival:
  - Is replacing the previous method based on local files
  - An Oracle database server is required
  - The architecture resembles the previous file-based concept: an ORACLE tablespace replaces the file, and a library handles the management of data on the Oracle server (create/close tables, backup tables, ...)
DCS Archival status
- The mechanism based on ORACLE was delivered by ETM and tested by the experiments
- All PVSS-based tools are compatible with the new approach (we can profit from the PVSS trending etc.)
- Problems were discovered; ETM is working on an improved version
  - ALICE's main concern is the performance (~100 inserts/s per PVSS system), which is not enough to handle an alert avalanche in a reasonable time
  - Architectural changes were requested, e.g. the archive-switching process should be replaced by ORACLE partitioning, etc.
- The DCS will provide file-based archival during the pre-installation phase and replace it with ORACLE as soon as a new version qualifies for deployment
  - Tools for later conversion from the files to the RDB will be provided
AMANDA
- AMANDA is a PVSS-II manager which uses the PVSS API to access the archives
- The manager and a Win C++ client were developed by the DCS team (Vlado Fekete)
- A Root client and the interface to the OFFLINE were developed by Boyko Yordanov
- A powerful communication protocol handles the data transfer and error reporting
- The archive architecture is transparent to AMANDA, so AMANDA can be used as an interface between the PVSS archive and non-PVSS clients
- AMANDA returns data for a requested time period
AMANDA – Alice Manager for DCS Archives
[Diagram: a PVSS-II system with its standard managers (user interface, driver, event, API, control, data and archive managers); the AMANDA server sits alongside them as a PVSS manager and serves external AMANDA clients with data from the archive(s).]
Operation of AMANDA
1. After receiving a connection request, AMANDA creates a thread which will handle the client request
2. Due to present limitations of the PVSS DM, requests are served sequentially as they arrive (the DM is not multithreaded)
3. AMANDA checks the existence of the requested data and returns an error if it is not available
4. AMANDA retrieves the data from the archive and sends it back to the client in formatted blocks (a minimal sketch of this flow follows below)
AMANDA adds additional load to the running PVSS system!
Final qualification of AMANDA for use in the production system depends on the user requirements
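A minimal C++ sketch of the request handling described above; every name here is hypothetical, and a stub stands in for the PVSS API call:

```cpp
// Hypothetical sketch of AMANDA's request handling: one thread per client,
// but archive access serialized because the PVSS data manager (DM) can only
// serve one request at a time.
#include <mutex>
#include <optional>
#include <string>
#include <thread>
#include <vector>

struct Request { std::string dpName; long t0, t1; };  // datapoint + time period
struct Block   { std::vector<char> bytes; };          // formatted data block

std::mutex dmMutex;  // serializes requests to the single-threaded DM

// Stub standing in for the PVSS API call that reads archived values.
std::optional<std::vector<Block>> queryArchive(const Request&) {
    return std::nullopt;  // the real manager asks the DM here
}

void handleClient(Request r) {
    std::lock_guard<std::mutex> lock(dmMutex);  // served sequentially
    auto data = queryArchive(r);                // existence check + retrieval
    if (!data) { /* report the error to the client */ return; }
    for (const Block& b : *data) { (void)b; /* send block to the client */ }
}

int main() {
    // One thread is spawned per incoming connection request.
    std::thread t(handleClient, Request{"hv/channel01", 0, 3600});
    t.join();
    return 0;
}
```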
AMANDA in distributed system
- PVSS can directly access only the file-based data archives stored by its own data manager
- In a distributed system, data produced by other PVSS systems can also be accessed if a connection via the DIST manager exists
- In the case of file-based archival, the DM of the remote system is always involved in the data transfer
- In the case of RDB archival, the DM can retrieve any data produced by other PVSS systems within the same distributed system without bothering the other DMs
- It is foreseen that each PVSS system will run its own AMANDA; there will be at least one AMANDA per detector
- Using more AMANDA servers overcomes some API limitations: some requests can be parallelized (clients need to know which server has direct access to the requested data)
AMANDA in the distributed environment (archiving to files)
[Diagram: three PVSS-II systems, each with its own managers (UI, DM, DRV, EM, API, CTR) and its own AMANDA server (AMD), interconnected via DIST (DIS) managers; with file-based archiving, each AMANDA reads only the archives of its own system's data manager.]
AMANDA in the distributed environment (archiving to ORACLE)
[Diagram: the same three interconnected PVSS-II systems, now archiving to a common ORACLE database; each AMANDA server can retrieve any data of the distributed system directly from ORACLE.]
Additional CondDB developments
- Development of a new mechanism for retrieving data from the RDB archive has been launched (Jim Cook, ATLAS)
- A separate process will access the RDB directly, without involving the PVSS
  - This approach will overcome the PVSS API limitations
  - The data-gathering process can run outside of the online DCS network
- The conditions data will be described in the same database (which datapoints should be retrieved, what processing should be applied, etc.)
- Configuration of the conditions will be done via PVSS panels by the DCS users: a unified interface for all detectors
- Data will be written to the desired destination (Root files in the case of ALICE; COOL for ATLAS, CMS and LHCb), as sketched below
- Parts of the AMANDA client could be re-used
- A first prototype is available
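A minimal sketch of the ALICE destination side of such a standalone process, writing retrieved values to a Root file with the standard ROOT classes TFile and TTree; the data source is a stub, and the datapoint name is an assumption:

```cpp
// Sketch: a standalone conditions writer producing a Root file (ALICE case).
// The RDB query is stubbed out; a real implementation would read the archive
// tables directly, bypassing PVSS entirely.
#include <string>
#include <vector>
#include "TFile.h"
#include "TTree.h"

struct Sample { double time; double value; };

// Stub standing in for the direct RDB archive query (hypothetical).
std::vector<Sample> fetchFromRdb(const std::string& dpName) {
    (void)dpName;
    return { {0.0, 1.5}, {60.0, 1.6} };  // placeholder data
}

int main() {
    TFile file("conditions.root", "RECREATE");
    TTree tree("cond", "DCS conditions data");
    double t = 0, v = 0;
    tree.Branch("time", &t, "time/D");
    tree.Branch("value", &v, "value/D");
    for (const Sample& s : fetchFromRdb("hv/channel01")) {
        t = s.time;
        v = s.value;
        tree.Fill();
    }
    tree.Write();
    file.Close();
    return 0;
}
```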
Performance issues – Database Performance
- The performance of the DB server was studied
- The main question was: is the DB server's performance a bottleneck?
- Several methods of data insertion were tested
Data Insertion Rate to the DB Server
- Data was inserted into a 2-column table (number(5), varchar2(128), no index)
- The following results were obtained:
  - OCCI autocommit: ~500 inserts/s
  - PL/SQL (bind variables): ~10 000 inserts/s
  - PL/SQL (varrays): over ~50 000 inserts/s
- Comparing these results with the estimated performance of the PVSS systems, we can conclude that a possible limitation could be the interface implementation, but not the server (a sketch of the client-side difference follows below)
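A minimal OCCI sketch of the two client-side extremes behind those numbers: row-by-row autocommit inserts versus batched statement iterations. The batching shown is OCCI's analogue of the bind/varray tests, not the PL/SQL path itself; the table name and credentials are made up, and error handling is omitted:

```cpp
// Sketch: row-by-row autocommit vs. batched inserts with Oracle OCCI,
// against the test table (number(5), varchar2(128)). Names hypothetical.
#include <occi.h>
using namespace oracle::occi;

int main() {
    Environment* env = Environment::createEnvironment();
    Connection* conn = env->createConnection("user", "pass", "dcsdb");

    // Slow path: one round trip and one commit per row (~500 inserts/s).
    Statement* s1 = conn->createStatement(
        "INSERT INTO perf_test VALUES (:1, :2)");
    s1->setAutoCommit(true);
    for (int i = 0; i < 1000; ++i) {
        s1->setInt(1, i % 100000);
        s1->setString(2, "value");
        s1->executeUpdate();
    }
    conn->terminateStatement(s1);

    // Faster path: many rows per round trip using statement iterations.
    Statement* s2 = conn->createStatement(
        "INSERT INTO perf_test VALUES (:1, :2)");
    s2->setMaxIterations(1000);
    s2->setMaxParamSize(2, 128);          // needed for varchar2 binds in batches
    for (int i = 0; i < 1000; ++i) {
        s2->setInt(1, i % 100000);
        s2->setString(2, "value");
        if (i < 999) s2->addIteration();  // last row executed by executeUpdate
    }
    s2->executeUpdate();
    conn->commit();
    conn->terminateStatement(s2);

    env->terminateConnection(conn);
    Environment::terminateEnvironment(env);
    return 0;
}
```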
Data Download from Oracle Server (BLOBs)
- 150 MB of configuration data, 3 concurrent clients, DCS private network (1 Gbit/s; 100 Mbit/s client connections, 1 Gbit/s server connection):
  - Bfile: ~10.59 MB/s
  - Blob: ~10.40 MB/s
  - Blob, stream: ~10.93 MB/s
  - Blob, ADO.NET: ~10.10 MB/s
- 150 MB of configuration data, 1 client, CERN 10 Mbit/s network:
  - Bfile: ~0.81 MB/s
  - Blob: ~0.78 MB/s
  - Blob, stream: ~0.81 MB/s
  - Blob, ADO.NET: ~0.77 MB/s
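For illustration, a minimal OCCI sketch of the plain Blob download path measured above; the schema (fero_config, id, cfg) is an assumption and error handling is omitted:

```cpp
// Sketch: downloading one configuration BLOB with OCCI.
#include <occi.h>
#include <vector>
using namespace oracle::occi;

int main() {
    Environment* env = Environment::createEnvironment();
    Connection* conn = env->createConnection("user", "pass", "dcsdb");
    Statement* stmt = conn->createStatement(
        "SELECT cfg FROM fero_config WHERE id = :1");
    stmt->setInt(1, 42);
    ResultSet* rs = stmt->executeQuery();
    std::vector<unsigned char> buf;
    if (rs->next()) {
        Blob blob = rs->getBlob(1);
        blob.open(OCCI_LOB_READONLY);
        unsigned int len = blob.length();
        buf.resize(len);
        blob.read(len, buf.data(), len, 1);  // LOB offsets are 1-based
        blob.close();
    }
    stmt->closeResultSet(rs);
    conn->terminateStatement(stmt);
    env->terminateConnection(conn);
    Environment::terminateEnvironment(env);
    return 0;
}
```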
Data Download Results (BLOBs)
- The obtained results are comparable with the raw network throughput (direct copy between two computers)
- The DB server does not add significant overhead, not even for concurrent client sessions
- The main bottleneck could be the network
  - Non-blocking switches will be used on the DCS network
  - Further network optimization is possible according to the DB usage pattern (the DCS switches are expandable to more performant backbone technologies)
- If needed, additional DB servers could be installed
- Some applications could profit from local data caching
Data download from tables
- Performance was studied using a realistic example: the SPD configuration
- The FERO data of one SPD readout chip consists of 44 DAC settings and a mask matrix for its 8192 pixels; in total, 1200 chips are used in the SPD, plus 15 x 120 front-end registers
- The full SPD configuration can be loaded within ~3 s (a rough size estimate follows below)
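A back-of-the-envelope estimate of the data volume implied by those numbers; only the counts come from the slides, the byte widths are assumptions:

```cpp
// Rough size estimate for the full SPD FERO configuration. Only the counts
// (1200 chips, 44 DACs, 8192 pixels, 15*120 registers) come from the slides;
// the widths (1 byte per DAC/register, 1 mask bit per pixel) are assumed.
#include <cstdio>

int main() {
    constexpr long chips         = 1200;
    constexpr long dacsPerChip   = 44;
    constexpr long pixelsPerChip = 8192;
    constexpr long feRegisters   = 15 * 120;

    constexpr long dacBytes  = chips * dacsPerChip;        // 52 800 B
    constexpr long maskBytes = chips * pixelsPerChip / 8;  // 1 228 800 B
    constexpr long total     = dacBytes + maskBytes + feRegisters;

    std::printf("DACs %ld B + masks %ld B + registers %ld B = ~%.2f MB\n",
                dacBytes, maskBytes, feRegisters, total / 1.0e6);
    return 0;  // on the order of 1.3 MB, loaded within ~3 s in the test
}
```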
FW Configuration Database Tests (1)
- Development and tests were performed by Piotr Golonka (the results are still preliminary, but already convincing)
- The test configuration consists of a CAEN SY1527 crate with 16 A1511 boards (12 channels each)
- In the first test, all device properties were written to and retrieved from the database
- In the second test, only the typical properties were used
  - Values: i0, i1, v0, v1 setpoints, switch on/off, ramp speeds, trip time
  - Alerts and readings: overcurrent, overvoltage, trip, undervoltage, hw alert, current, channel on, status, voltage
FW Configuration Database Tests (2)
- Test 1 results (all properties):
  - 28 s to load the recipe from the database, then 7 s to apply the recipe to PVSS
  - 27 s to store the recipe in the cache
  - 24 s to apply the recipe from the cache (17 s to extract the data from the cache, 7 s to apply the recipe to PVSS)
FW Configuration Database Tests (3)
- Test 2 results (selected properties):
  - 14 s to load the recipe from the database, then 5 s to apply the recipe to PVSS
  - 10 s to store the recipe in the cache
  - 10.5 s to apply the recipe from the cache (5 s to extract the data from the cache, 5.5 s to apply the recipe to PVSS)
- Tests performed with 26 systems showed that the download times are not affected by concurrent sessions
The FW configuration DB - results
- The total time needed to download a configuration for a CAEN crate (using the caching mechanism) is ~10 s
- This is a significant speed improvement: it should be compared with the previous time of ~180 s
- Further performance improvements are expected (using a prefetching mechanism, etc.)
- Incremental recipes will bring both performance and functionality improvements
  - An incremental recipe is loaded on top of an existing recipe (a minimal sketch of the idea follows below)
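A minimal sketch of the incremental-recipe idea; the Recipe type and property names are illustrative assumptions:

```cpp
// Sketch: an incremental recipe is a small delta overlaid on a full base
// recipe, so only the changed settings need to be fetched and applied.
// All names are hypothetical.
#include <map>
#include <string>

using Recipe = std::map<std::string, double>;  // property path -> setting

// Load an incremental recipe on top of an existing one.
void applyIncrement(Recipe& base, const Recipe& delta) {
    for (const auto& [property, value] : delta)
        base[property] = value;  // changed or added properties win
}

int main() {
    Recipe physics = { {"hv/ch00/v0", 1500.0}, {"hv/ch01/v0", 1500.0} };
    Recipe calibration = { {"hv/ch01/v0", 1200.0} };  // only the difference
    applyIncrement(physics, calibration);             // ch01 now at 1200 V
    return 0;
}
```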
Present status and Conclusions
- Configuration DB:
  - The FW ConfDB concept was redesigned: significant speed improvement, and new functionality is being added (e.g. recipe switching)
  - The FED configuration should follow the FW ConfDB concepts
  - The FERO configuration performance tests did not reveal problems
- Archival DB:
  - The present RDB problems are being solved
  - File-based archival will be used in the early stages of the pre-installation
  - AMANDA provides an interface to external clients, independent of the archive technology
  - The present ATLAS and future FW developments could provide an efficient method for accessing the RDB archive