Control in ATLAS TDAQ
Dietrich Liko on behalf of
the ATLAS TDAQ Group
CHEP04 - Interlaken Control of the ATLAS TDAQ system 2
Overview
The ATLAS TDAQ System: Dataflow & HLT
Control Subsystem of the Online Software Architecture; TDAQ Wide Run Control Group
Technology Choice: CLIPS
Design & Implementation: Expert System Framework; Run Control, Supervision & Verification
Testing & Verification: Test beam; Scalability Tests
The ATLAS TDAQ System
[Diagram: Dataflow (ROD, ROS, Event Building), LVL1, HLT (LVL2, Event Filter), Online System (Operation), DCS (Detector control)]
Test beam: see [331]
Performance: see [217]
Control Aspects
Dataflow: fixed configuration; synchronization, classical Run Control; error handling
High Level Triggers: flexible configuration; synchronization; error handling
ATLAS Online Software
Component architecture: object oriented, C++ and Java; distributed system (CORBA); XML for configuration
Specialized services for a TDAQ system: information sharing, message reporting, configuration
Iterative development model: prototype already in use in laboratories, at the test beam and in scalability tests; evolution into the initial ATLAS system
Online Software Architecture
In the context of the iterative development cycle and the Technical Design Review: re-evaluation of requirements and architecture; several high-level packages and corresponding subsystems
Control: Supervision, Verification
Databases (see [130]): Configuration, Conditions
Information Sharing (see [166]): Information Service, Message Service, Monitoring
Control Subsystem
Only the Supervision subsystem is discussed in the following.
Supervision
Initialization and Shutdown is responsible for: initialization of TDAQ hardware and software components; re-initialization of part of the TDAQ partition when necessary; shutting the TDAQ partition down gracefully; TDAQ process supervision.
Run Control is responsible for controlling the run by accepting commands from the user and sending commands to TDAQ sub-systems; analyzing the status of controlled sub-systems and presenting the status of the whole TDAQ to the operator.
Error Handling is concerned with analyzing run-time error messages coming from TDAQ sub-systems; diagnosing problems, proposing recovery actions to the operator, or performing automatic recovery if requested.
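The error-handling duty can be pictured as a mapping from an incoming error message to a diagnosis and a recovery action, which is either proposed to the operator or, if automatic recovery was requested, performed directly. This is a minimal sketch under stated assumptions: the error codes, `RECOVERY_TABLE` and `handle_error` are hypothetical, and a plain lookup table stands in for a rule base.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class ErrorMessage:
    source: str  # hypothetical: name of the reporting TDAQ sub-system
    code: str    # hypothetical error identifier


# Hypothetical diagnosis table: error code -> (diagnosis, recovery action).
RECOVERY_TABLE: Dict[str, Tuple[str, str]] = {
    "PROCESS_DIED": ("process terminated unexpectedly", "restart process"),
    "TIMEOUT": ("sub-system did not answer in time", "re-initialize sub-system"),
}


def handle_error(msg: ErrorMessage, auto_recovery: bool) -> str:
    """Diagnose an error, then perform or propose a recovery action."""
    diagnosis, action = RECOVERY_TABLE.get(
        msg.code, ("unknown problem", "notify operator"))
    # Only known problems are recovered automatically; anything else
    # is handed to the operator.
    if auto_recovery and msg.code in RECOVERY_TABLE:
        return f"performed: {action} on {msg.source} ({diagnosis})"
    return f"proposed to operator: {action} on {msg.source} ({diagnosis})"


print(handle_error(ErrorMessage("ROS-1", "TIMEOUT"), auto_recovery=False))
```

Restricting automatic recovery to known error codes keeps unexpected situations in the operator's hands.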
TDAQ-wide Run Control Group
Examines the requirements from the subsystem side: Dataflow, HLT
Hierarchical concept: follows the overall organization of the TDAQ system
Controller as central element: all control functionality in a combined controller; state machine concept for synchronization; flexibility in error handling; user customization
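The state machine concept for synchronization in a hierarchical controller tree can be sketched as follows: a command is forwarded down the tree, and a parent only moves to the new state once all of its children have done so. This is a minimal illustration, assuming the child-controller states idle, configured and running used in this talk plus hypothetical command names (`configure`, `start`, `stop`, `unconfigure`); `Controller` and `TRANSITIONS` are hypothetical too.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical transition table: (current state, command) -> next state.
TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("idle", "configure"): "configured",
    ("configured", "start"): "running",
    ("running", "stop"): "configured",
    ("configured", "unconfigure"): "idle",
}


class Controller:
    def __init__(self, name: str, children: Iterable["Controller"] = ()):
        self.name = name
        self.state = "idle"
        self.children = list(children)

    def command(self, cmd: str) -> bool:
        """Apply cmd to this subtree; True if every node made the transition."""
        target = TRANSITIONS.get((self.state, cmd))
        if target is None:
            return False  # command not allowed in the current state
        # Forward to every child (a list, not a generator, so no child is
        # skipped) and change our own state only if all children succeeded.
        ok = all([child.command(cmd) for child in self.children])
        if ok:
            self.state = target
        return ok


leaves = [Controller(f"leaf{i}") for i in range(3)]
root = Controller("root", [Controller("segment", leaves)])
root.command("configure")
```

If a child refuses a command, its parent stays in the old state while siblings may already have moved on; resolving such mixed states is where the flexibility in error handling listed above is needed.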
Initial Design & Technology Choice
The Run Control implementation is based on a state machine model and uses the state machine compiler CHSM as its underlying technology (P.J. Lucas, An Object-Oriented Language System for Implementing Concurrent, Hierarchical, Finite State Machines, MS Thesis, University of Illinois, 1993).
The Supervisor is mainly concerned with process management. It has been built using the open-source expert system CLIPS (CLIPS, A Tool for Building Expert Systems, http://www.ghg.net/clips/CLIPS.html).
The Verification System (DVS) performs tests and provides diagnosis. It is also based on CLIPS.
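Diagnosis in a verification system of this kind can be sketched as running tests that depend on prerequisite tests, and reporting as root causes only those failures whose prerequisites all passed. This is a minimal sketch, not DVS itself: `VerificationTest`, `diagnose` and the test names are hypothetical, the dependency graph is assumed to be acyclic, and the logic is plain Python rather than CLIPS rules.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class VerificationTest:
    name: str
    check: Callable[[], bool]                          # hypothetical test body
    requires: List[str] = field(default_factory=list)  # prerequisite test names


def diagnose(tests: Dict[str, VerificationTest]) -> List[str]:
    """Run all tests; return failed tests whose prerequisites all passed."""
    results: Dict[str, bool] = {}

    def run(name: str) -> bool:
        if name not in results:
            test = tests[name]
            # Evaluate every prerequisite (list, not generator: no short-circuit)
            # and skip this test's own check if any prerequisite failed.
            prereqs_ok = all([run(r) for r in test.requires])
            results[name] = prereqs_ok and test.check()
        return results[name]

    return [name for name, test in tests.items()
            if not run(name) and all(results[r] for r in test.requires)]


tests = {
    "host_up": VerificationTest("host_up", lambda: True),
    "pmg_agent": VerificationTest("pmg_agent", lambda: False, ["host_up"]),
    "controller": VerificationTest("controller", lambda: True, ["pmg_agent"]),
}
print(diagnose(tests))
```

Here the controller test is never blamed: its failure is explained by the failing prerequisite, which is reported instead.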
Experiences
Plus: the scalability test in 2002 demonstrated that a system of the size of the ATLAS TDAQ system can be controlled
Minus: lack of flexibility (CHSM)
Technologies
CLIPS: production system, standard open-source expert system; the so-called Rete algorithm drives the evaluation of rules on a set of facts; in-house experience; general-purpose scripting language with OO features; C language bindings
Alternatives: Jess (Java based, very similar to CLIPS); Eclipse (commercial evolution of CLIPS)
SMI++: state machine; no general-purpose scripting language; difficult to integrate in our environment
Python: excellent scripting language; no expert system
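The production-system idea, rules evaluated against a set of facts until nothing new can be derived, can be illustrated with a naive forward-chaining loop. This is explicitly not the Rete algorithm: Rete avoids re-matching every rule against every fact on each cycle by caching partial matches, and CLIPS fires one activation at a time from an agenda, whereas this sketch re-evaluates all conditions each cycle and fires every matching rule. The fact layout, rule names and `forward_chain` are hypothetical.

```python
from typing import Callable, List, Set, Tuple

Fact = Tuple[str, ...]
# A rule is (name, condition on the fact set, action returning facts to assert).
Rule = Tuple[str, Callable[[Set[Fact]], bool], Callable[[Set[Fact]], Set[Fact]]]


def forward_chain(facts: Set[Fact], rules: List[Rule],
                  max_cycles: int = 100) -> Set[Fact]:
    """Fire matching rules until no rule asserts a new fact (quiescence)."""
    facts = set(facts)
    for _ in range(max_cycles):
        new: Set[Fact] = set()
        for _name, condition, action in rules:
            if condition(facts):
                new |= action(facts) - facts
        if not new:
            break
        facts |= new
    return facts


rules: List[Rule] = [
    ("all-children-running",
     lambda f: all(("state", c, "running") in f for c in ("A", "B")),
     lambda f: {("state", "parent", "running")}),
    ("parent-running-means-ready",
     lambda f: ("state", "parent", "running") in f,
     lambda f: {("ready",)}),
]
result = forward_chain({("state", "A", "running"), ("state", "B", "running")}, rules)
```

The second rule only fires on the cycle after the first has asserted its fact, which is how chains of inferences unfold in a production system.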
Design & Implementation
General framework: CLIPS embedded in a CORBA server; periodic evaluation of the knowledge base; extension mechanism
Online Software components embedded as plug-ins
Control functionality fully described by CLIPS rules
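The framework pattern, a knowledge base evaluated periodically and extended with plug-in functions that the rules can call, is sketched below. It is a minimal illustration under stated assumptions: `Framework`, `register_plugin` and the `send` plug-in are hypothetical names, rules are plain Python callables instead of CLIPS rules, and the CORBA server that would receive requests between evaluations is omitted.

```python
import time
from typing import Any, Callable, Dict, List


class Framework:
    """Hypothetical sketch: periodically evaluated knowledge base plus plug-ins."""

    def __init__(self) -> None:
        self.facts: Dict[str, Any] = {}
        self.plugins: Dict[str, Callable[..., Any]] = {}
        self.rules: List[Callable[["Framework"], None]] = []

    def register_plugin(self, name: str, func: Callable[..., Any]) -> None:
        # Extension mechanism: rules reach external components only via plug-ins.
        self.plugins[name] = func

    def call(self, name: str, *args: Any) -> Any:
        return self.plugins[name](*args)

    def evaluate(self) -> None:
        # One pass over the knowledge base: every rule sees the current facts.
        for rule in self.rules:
            rule(self)

    def run(self, period_s: float, cycles: int) -> None:
        for _ in range(cycles):
            self.evaluate()
            time.sleep(period_s)


sent: List[tuple] = []
fw = Framework()
fw.register_plugin("send", lambda target, cmd: sent.append((target, cmd)))


def start_when_configured(fw: Framework) -> None:
    # Hypothetical rule: once configured, send 'start' to a child and move on.
    if fw.facts.get("state") == "configured":
        fw.call("send", "child1", "start")
        fw.facts["state"] = "running"


fw.rules.append(start_when_configured)
fw.facts["state"] = "configured"
fw.run(period_s=0.0, cycles=3)
```

Because the rule updates the fact it matched on, later evaluation passes do not send the command again.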
Proxy Objects
Represent external entities: other controllers, processes, etc.
Member attributes are exposed to the expert system as facts
Member functions implement functionality in terms of Online Software components
Example: proxy objects represent child controllers; the state of the object corresponds to the state of the child (idle, configured, running); commands are forwarded to the child controllers
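A proxy object for a child controller, following the description above, might look like this. It is a minimal sketch under stated assumptions: the class name, the fact layout and the `send` callable (standing in for a remote call to the real child) are hypothetical.

```python
from typing import Callable, List, Tuple


class ChildControllerProxy:
    """Hypothetical local stand-in for a remote child controller."""

    def __init__(self, name: str, send: Callable[[str, str], str]) -> None:
        self.name = name
        self.state = "idle"
        self._send = send  # placeholder for the remote call to the child

    def as_facts(self) -> List[Tuple[str, ...]]:
        # Member attributes exposed to the expert system as facts.
        return [("controller", self.name, "state", self.state)]

    def forward(self, command: str) -> None:
        # Forward the command and mirror the state the child reports back.
        self.state = self._send(self.name, command)


def fake_child(name: str, command: str) -> str:
    # Hypothetical child that always succeeds.
    return {"configure": "configured", "start": "running",
            "stop": "configured", "unconfigure": "idle"}[command]


proxy = ChildControllerProxy("child-1", fake_child)
print(proxy.as_facts())
proxy.forward("configure")
print(proxy.as_facts())
```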
Controller
[Diagram: Controller with Proxy Objects, connected to Other Controllers and External processes]
Rules drive interactions between objects
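How rules drive the interaction between a controller and its proxy objects can be sketched with two hypothetical rules for a configure transition: one sends the command to children that have not received it yet, the other completes the transition once every child reports configured. The `configuring` intermediate state, the classes and the rules are assumptions for illustration; since children answer asynchronously, the rules are simply re-evaluated on each pass.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Proxy:
    name: str
    state: str = "idle"
    sent: List[str] = field(default_factory=list)

    def send(self, command: str) -> None:
        self.sent.append(command)  # a real proxy would forward this remotely


@dataclass
class Controller:
    state: str
    children: List[Proxy]


def evaluate_rules(ctrl: Controller) -> None:
    """One evaluation pass over two hypothetical rules."""
    # Rule 1: while configuring, tell every idle child that was not told yet.
    if ctrl.state == "configuring":
        for child in ctrl.children:
            if child.state == "idle" and "configure" not in child.sent:
                child.send("configure")
    # Rule 2: once every child reports 'configured', finish the transition.
    if ctrl.state == "configuring" and all(
            c.state == "configured" for c in ctrl.children):
        ctrl.state = "configured"


kids = [Proxy("a"), Proxy("b")]
ctrl = Controller("configuring", kids)
evaluate_rules(ctrl)          # commands go out, children still idle
after_first_pass = ctrl.state
for kid in kids:              # children answer asynchronously
    kid.state = "configured"
evaluate_rules(ctrl)          # now the transition completes
```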
Status
Supervisor: uses the framework
Run Control: uses the framework
Verification system: CLIPS based
The choice of a common technology paves the way to a unified control system based on Controllers
Scalability Test 2004
Test bed: up to 330 PCs of CERN IT LXSHARE; 600 to 800 MHz to 2.4 GHz; dual Pentium III; 256 to 512 MB memory; Red Hat Linux 7.3
Only control aspects verified: no Dataflow network
Various configurations: servers on standard machines; servers on dedicated high-end machines
Supervisor – Process Management
[Diagram: one Supervisor and PMG Agents managing the processes (P)]
Startup limited by initialization of processes
Enhanced recovery procedures
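Process management through per-host agents, including a simple recovery step that restarts processes found dead, can be sketched as follows. Assumptions: `PmgAgent` and `Supervisor` are hypothetical classes, process starts are simulated with flags rather than real fork/exec calls, and the recovery policy (restart everything that is not alive) is an illustration, not the enhanced recovery procedures mentioned above.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PmgAgent:
    """Hypothetical per-host agent; processes are simulated by flags."""
    host: str
    alive: Dict[str, bool] = field(default_factory=dict)

    def start(self, process: str) -> None:
        self.alive[process] = True

    def is_alive(self, process: str) -> bool:
        return self.alive.get(process, False)


class Supervisor:
    def __init__(self, layout: Dict[str, List[str]]) -> None:
        # layout: host -> processes to run there (hypothetical configuration)
        self.layout = layout
        self.agents = {host: PmgAgent(host) for host in layout}

    def startup(self) -> None:
        for host, processes in self.layout.items():
            for p in processes:
                self.agents[host].start(p)

    def recover(self) -> List[str]:
        """Restart every configured process that is not alive."""
        restarted = []
        for host, processes in self.layout.items():
            agent = self.agents[host]
            for p in processes:
                if not agent.is_alive(p):
                    agent.start(p)
                    restarted.append(f"{host}:{p}")
        return restarted


sup = Supervisor({"pc01": ["controller", "ros"], "pc02": ["ef"]})
sup.startup()
sup.agents["pc01"].alive["ros"] = False  # simulate a crashed process
restarted = sup.recover()
print(restarted)
```

Keeping one supervisor and delegating the per-host work to agents means the supervisor only deals with a list of hosts, not with every process directly.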
Startup of 1000 controllers and 3000 processes in 40 to 100 seconds
Several configurations: mon_standard has two additional processes per controller
Run Control
Usual RC tree: 10 controllers on the lowest level; the number of intermediate nodes is varied
Some central infrastructure: Name Service (IPC), Information Sharing
Transitions
7 internal phases; with 1000 controllers a transition takes 2 to 6 seconds; no “real life” actions
Again: more flexible error handling
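A transition split into internal phases, where each phase must complete on every controller before the next one starts, can be sketched as below. The phase names, the `step` callback and the stop-at-first-failure policy are hypothetical; the talk only states that a transition has 7 internal phases.

```python
from typing import Callable, List


def run_transition(phases: List[str], controllers: List[str],
                   step: Callable[[str, str], bool]) -> List[str]:
    """Run phases in order with a barrier between them.

    Every controller executes a phase before the next phase begins.
    Returns the completed phases; stops at the first phase that fails
    on any controller.
    """
    completed = []
    for phase in phases:
        results = [step(ctrl, phase) for ctrl in controllers]
        if not all(results):
            break
        completed.append(phase)
    return completed


phases = [f"phase{i}" for i in range(1, 8)]      # 7 phases, names hypothetical
controllers = [f"ctrl{i}" for i in range(1000)]
print(run_transition(phases, controllers, lambda c, p: True))
```

The barrier is what makes the transition time scale with the slowest controller in each phase rather than with the average.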
Combined Testbeam 2004
Stable operation from the start – Advantage of the component model
Conclusions
New assessment of requirements
Overall architecture: Controller studied in detail
CLIPS confirmed as technology choice
Design and implementation of a new framework
First tests of the new systems: test beam, scalability test
We can control a system of the size of the ATLAS TDAQ system, and the new system is much more flexible
Common technology in various control components: unified controllers in the future