Gaetano Maron, ANW, Halifax, November 2004

GRIDCC

A realtime interactive GRID to integrate instruments, computational and information resources widely spread on a fast WAN

Gaetano Maron

Istituto Nazionale di Fisica Nucleare, Laboratori Nazionali di Legnaro

Legnaro, Italy

Outline

• Brief introduction to the project

• Main Pilot Applications

• The GRIDCC Services

• Technologies and performances

• Conclusions

GRIDCC main goals

• ... the GRIDCC project extends the state of the art of computing Grid technologies by introducing the handling of real-time constraints and interactive response into the existing Grid middleware

• Our goal is to build a widely distributed system that is able to remotely control and monitor complex instrumentation … These new applications introduce requirements for real-time and highly interactive operation of GRID resources.

• One of the main objectives of the project is to verify the feasibility of a Grid-based remote control of systems requiring real-time response with real applications running on existing Grid test beds over both national and international network infrastructures (e.g. GEANT).

• GRIDCC integrates a “grid of instrumentation” into existing Grid infrastructures that provide the computational power and storage needed for the applications ….

Participants

Participant name Country

Istituto Nazionale di Fisica Nucleare Italy

Institute Of Accelerating Systems and Applications Greece

Brunel University UK

Consorzio Interuniversitario per Telecomunicazioni Italy

Sincrotrone Trieste S.C.P.A Italy

IBM (Haifa Research Lab) Israel

Imperial College of Science, Technology & Medicine UK

Istituto di Metodologie per l’Analisi Ambientale – Consiglio Nazionale delle Ricerche Italy

Universita degli Studi di Udine Italy

Greek Research and Technology Network S.A. Greece

The Origin of the Project

The control of the CMS Experiment

The CMS Data Acquisition

• O(10^4) distributed objects to
  – control
  – configure
  – monitor

• On-line diagnostics and problem solving capability

• Highly interactive system (human reaction time - fraction of second)

• World Wide distributed monitor and control

[Figure labels: 2 × 10^7 electronics channels; 40 MHz; 100 Hz]

From the CMS Control and Monitor System ....

Virtual Control Room

Supporting Services

Diagnostic Tools

Interface to the “Instrumentation”

Standard comm. protocols

“Instrumentation”

.... to the GRIDCC project

[Diagram labels: Supporting Services; Virtual Control Room (shown twice); Diagnostics; Instrument 1, Instrument 2, Instrument 3; Processing Farm (shown twice); Data Storage]

Use of Grid technology, as an extension of Web Service technologies, to develop a widely distributed control system with access to Grid-enabled computing and data storage facilities

GRIDCC Layout

[Figure: GRIDCC layout. Labels recovered from the diagram:
• Virtual Control Room (shown twice), each with a User Interface
• Cooperative Environment: Video Conf. & Chat Service, Telepresence
• Virtual Instrument Grid Services, each with several instruments (Instr.)
• Supporting Services: Security & Login Service, Resource Service, Information Service (Monitor)
• Knowledge-based Services: Problem Solver Service, Data Mining Tool
• Diagnostic Service
• Farm Services: Job Control, Work Flow Engine Service; Farm
• Storage Services; DBs
• Existing GRID facilities
• Test Bed Grid Infrastructure (Web-Service Infr.)]

Work packages shown on the diagram:
• WP2: Real-time and Interactive Web Services
• WP3: Grid Enable Instrumentation
• WP4: RT Access to existing Grid services
• WP5: Cooperative Environment

Partner labels: BRUNEL, INFN, Imperial, INFN, Elettra

Application Fields

• Experimental Sciences
  – Take control of an experiment from a distance (remote operation and control, data taking and data analysis):
    • High Energy, Nuclear and Solid State Physics
    • Electron Microscopes
    • Telescopes

• Monitoring and analysis of the territory (e.g. disaster analysis)
  – Meteorology
  – Geophysics

• Biomedicine
  – Integration of remote operation, data taking, data analysis and data storage of sophisticated instruments like:
    • Mammography
    • PET, CT, NMR etc.

• Industrial Applications
  – widely distributed controls
    • Electrical power grid
    • Public transportation
    • ……

Pilot Application I: Power Grid

– In electrical utility networks (or power grids), the introduction of very large numbers of ‘embedded’ power generators, often using renewable energy sources, creates a severe challenge for utility companies.

– Existing computer software technology for monitoring and control is not scalable and cannot provide a solution for the many thousands of generators that are anticipated.

– GridCC technology would allow the generators to participate in a VO, and consequently to be monitored and scheduled in a cost-effective manner

– The embedded power generator is still in the development phase, but the GridCC project has access to its full computer emulation. The Power Grid application will therefore consist of a network of O(100) emulated embedded power generators, together with their full control and monitoring.

Pilot Application II: (Far) Remote Operation of an Accelerator Facility

– Far remote operation of an accelerator facility (i.e. the Elettra Control Room in Italy) involves planning accelerator operations, maintaining and troubleshooting the accelerator, repairing delicate equipment, understanding and pushing performance limitations, performing studies, commissioning and set-ups, and running routine operations.

– All these activities are based on large amounts of information, which are at present accessible only at the accelerator site. Remote control of an accelerator facility has the potential of revolutionising the mode of operation and the degree of exploitation of large experimental physics facilities.

– This pilot application will combine elements of immersive communication and cooperation technology (i.e. technology that provides the feeling of being present at the remote location). This includes video and audio presence, simultaneous operation of the same instruments, access to the same accelerator controls and the relevant data, the ability to meet easily and spontaneously, and full awareness of the presence of the collaborators.

Pilot Application III: Control and monitor of high energy experiments

– This application involves the use of the Grid in a real-time environment to control and monitor remote large-scale detectors.

– This application will make use of a High-Energy Physics (HEP) experiment, the CMS detector, which is currently under construction at the future LHC collider at CERN. Data taking is foreseen by 2007, but several pre-production activities are planned.

– (See the previous slides for more details about the CMS detector.)

– This application will be developed alongside the CMS on-line software and will have the same time schedule and delivery terms.

The other GridCC pilot applications

• Meteorology (Ensemble Limited Area Forecasting)

• Analysis of neuro-physiological data (treatment of migraine attacks)

• Device Farm for the Support of Cooperative Distributed Measurements in Telecommunications and Networking Laboratories

• Geo-hazards: Remote Operation of Geophysical Monitoring Network

The GRIDCC Services

– Supporting Services
  • Security Service
    – login and user account management; security issues
  • Resource Service (RS)
    – handling of GRIDCC resources (including instrumentation controller nodes) and their partitioning; configuration of GRIDCC resources
  • Information and Monitor Service (IMS)
    – Collects messages and monitor data from the GRIDCC resources; distributes them to the subscribers
  • Job Control
    – Starts, monitors and stops the software elements of GRIDCC, including the Instrument components
  • Problem Solver
    – Uses information from the RS and IMS to identify malfunctions and attempts to provide automatic recovery procedures where applicable

– Virtual Instrument Controllers (VIGS)
  • Instrument controllers, hierarchy of controllers
  • Transform requests from the UI into the proper actions to be sent to the instrumentation

General Requirements for the GridCC Services

• About 10^4 nodes/instruments to be controlled and monitored (for the most demanding application)
  – The nodes/instruments are controlled by VIGSs.
  – The round trip time to reach all the nodes must be of the order of human reaction time. A hierarchy of VIGSs reduces this time.
  – Concurrent partitions should be possible (Resource Service)
  – Information from all the nodes must be collected at a “single point” of storage. Collection must be fast enough to allow monitoring, error detection, alarms, etc. Aggregate throughput of the order of 10^4 messages/s (IMS)
  – On-line diagnostics and problem solving fast enough to be useful (from seconds to minutes) (Problem Solver)

• Real-time requirement
  – This requirement affects both network and Web Service QoS definitions, including parameters for:
    • Delivery certainty
    • Response to a request within a given amount of time
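
The two QoS parameters listed for the real-time requirement can be pictured as a small record attached to a request. The Python sketch below is only an illustration: the names QoS, delivery_certainty, deadline_s, meets_deadline and meets_delivery are assumptions made for this example and are not part of GRIDCC or of any Web Service specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QoS:
    """Hypothetical QoS record; the field names are assumptions."""
    delivery_certainty: float  # required fraction of messages delivered (0..1)
    deadline_s: float          # maximum time allowed to answer a request, in seconds


def meets_deadline(qos: QoS, sent_at: float, answered_at: float) -> bool:
    """True if the response arrived within the agreed amount of time."""
    return (answered_at - sent_at) <= qos.deadline_s


def meets_delivery(qos: QoS, sent: int, delivered: int) -> bool:
    """True if the observed delivery ratio satisfies the required certainty."""
    if sent == 0:
        return True  # nothing was sent, so nothing was lost
    return delivered / sent >= qos.delivery_certainty
```

A monitoring component could evaluate both checks per request or per time window and raise an alarm when either fails.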

The GRIDCC Services: Resource Service

[Diagram labels: DAQ description; Instrument configuration; Instruments description]

• The Resource Service (RS) handles all the GRID resources and manages their partition (if any).

• A resource can be any hardware or software component involved in the GRID.

• Resources can be discovered, allocated and queried.

• Partitions can only use available resources.

• It is the responsibility of the RS to check resource availability and contention with other active partitions when a resource is allocated for use.

• A periodic scan of the registered resources will keep the configuration database up to date.
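
The allocation rule described above (a partition may only take resources that are available, and the RS checks for contention with other active partitions) can be sketched as follows. This is a minimal in-memory illustration under assumed names (ResourceService, register, allocate, release, available); it is not the GRIDCC implementation, and the all-or-nothing allocation policy is an assumption.

```python
class ResourceService:
    """Hypothetical in-memory registry of resources and their partitions."""

    def __init__(self):
        self._owner = {}  # resource name -> owning partition, or None if free

    def register(self, resource):
        """Make a resource known to the service; it starts out free."""
        self._owner.setdefault(resource, None)

    def available(self):
        """Return the names of all free resources, sorted."""
        return sorted(r for r, p in self._owner.items() if p is None)

    def allocate(self, partition, resources):
        """Give all requested resources to a partition, or none of them."""
        resources = list(resources)
        # First pass: check availability and contention without changing state.
        for r in resources:
            if r not in self._owner:
                raise KeyError(f"unknown resource: {r}")
            if self._owner[r] not in (None, partition):
                raise RuntimeError(f"{r} is in use by partition {self._owner[r]}")
        # Second pass: every check succeeded, so record the allocation.
        for r in resources:
            self._owner[r] = partition

    def release(self, partition):
        """Free every resource held by a partition."""
        for r, p in self._owner.items():
            if p == partition:
                self._owner[r] = None
```

Checking everything before changing anything keeps a failed request from leaving a partition half allocated.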

[Diagram: three VIGS, each with its instruments]

Information and Monitor System

[Diagram labels: PUBLISHERS (instrument nodes); SUBSCRIBERS; message types: Errors, Log info, Monitor, State; three VIGS, each with its instruments]

• The Information and Monitor Service (IMS) collects messages and monitor data coming from GRID resources and supporting services and stores them in a database. There are several types of messages collected from the sub-systems. The messages are catalogued according to their type, severity level and timestamp. Data can be provided in numeric formats, histograms, tables and other forms.

• The IMS collects and organizes the incoming information in a database and publishes it to subscribers. These subscribers can register for specific messages categorized by a number of selection criteria, such as timestamp, information source and severity level.
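
The publish/subscribe behaviour described above, with subscribers registering for messages by source, type, severity and timestamp, can be sketched in a few lines of Python. Everything here is illustrative: the class and field names are assumptions, a plain list stands in for the database, and the convention that a larger severity number is more severe is also an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    source: str       # which resource published the message
    kind: str         # e.g. "error", "log", "monitor", "state"
    severity: int     # larger means more severe (assumed convention)
    timestamp: float


class InfoMonitorService:
    """Hypothetical stand-in for the IMS: stores messages and forwards matches."""

    def __init__(self):
        self.store = []         # stands in for the database
        self._subscribers = []  # (predicate, callback) pairs

    def subscribe(self, callback, *, source=None, kind=None,
                  min_severity=0, since=0.0):
        """Register a callback for messages matching the selection criteria."""
        def matches(m):
            return ((source is None or m.source == source)
                    and (kind is None or m.kind == kind)
                    and m.severity >= min_severity
                    and m.timestamp >= since)
        self._subscribers.append((matches, callback))

    def publish(self, message):
        """Store the message, then deliver it to every matching subscriber."""
        self.store.append(message)
        for matches, callback in self._subscribers:
            if matches(message):
                callback(message)
```

For example, a subscriber created with kind="error" and min_severity=3 receives only severe error messages, while every published message is still stored.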

Problem Solver

This service identifies malfunctions of the GRIDCC system and determines possible recovery procedures. It subscribes to the IMS to receive the information it is interested in. The information is processed by a correlation engine, and the result is used either to determine a potential automatic recovery action or to inform the user, providing any analysis results it may have obtained.

• PS1: Rule-based expert system (rules are known in advance). Defines the control action representation.
  – Inputs: the system behaviour
  – Outputs: the system control based on the generated rules

• PS2: Pattern recognition system
  – Inputs: the system behaviour
  – Outputs: the most common patterns of the errors and warnings that appeared

• PS3: Rule generator (rule learning tool)
  – Inputs: the system behaviour
  – Outputs: the identification of the control behaviour of the system

[Diagram labels: Control action representation; Step 1; Step 2; Step 3]
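
As an illustration of the rule-based case (rules known in advance), the sketch below matches incoming messages against a fixed rule list and either returns a recovery action or falls back to informing the user. The function name solve, the message keys, the rule conditions and the action names are all hypothetical and chosen only for this example.

```python
def solve(messages, rules):
    """Return (action, evidence) for the first rule matched by any message.

    messages: list of dicts with the hypothetical keys "source" and "kind".
    rules: list of (condition, action) pairs that are known in advance.
    When no rule matches, the user is informed and given all the messages.
    """
    for condition, action in rules:
        evidence = [m for m in messages if condition(m)]
        if evidence:
            return action, evidence
    return "inform_user", list(messages)


# Hypothetical rule set: each condition inspects a single message.
RULES = [
    (lambda m: m["kind"] == "timeout", "restart_node"),
    (lambda m: m["kind"] == "disk_full", "rotate_logs"),
]
```

A real correlation engine would combine several messages over time; this sketch only shows the split between automatic recovery and informing the user.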

Virtual Instrument Grid Service (VIGS)

[Diagram: Virtual Instrument Grid Service. Components: Control Gateway, Instrument Manager (shown twice), InfoServ Proxy, Data Mover. Other labels: Virtual Control Room; Controls, Status; IMS; Errors, Log Info, Monitor, State; Real Instruments or Set of Instruments; To Grid Farms, Data Storage, Visualization, etc.]

VIGS is a set of services that enables the remote control, monitoring and overall operation, via GRID protocols, of a set of real instruments
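
A toy model may help picture how a VIGS turns a high-level request into actions on its instruments and reports problems. The Python sketch below is an assumption-laden illustration: the Instrument and VIGS classes, the ACTIONS table and the request names are invented for this example, and a plain callable stands in for the proxy towards the IMS.

```python
class Instrument:
    """Stand-in for a real instrument; it records the commands it receives."""

    def __init__(self, name):
        self.name = name
        self.log = []

    def execute(self, command):
        self.log.append(command)
        return "ok"


class VIGS:
    """Hypothetical controller that maps requests onto instrument commands."""

    # Assumed mapping from a high-level request to low-level commands.
    ACTIONS = {
        "start": ["configure", "enable"],
        "stop": ["disable"],
    }

    def __init__(self, instruments, report):
        self.instruments = list(instruments)
        self.report = report  # callable standing in for the IMS proxy

    def handle(self, request):
        """Run the commands for a request on every instrument; return their status."""
        commands = self.ACTIONS.get(request)
        if commands is None:
            self.report(("error", f"unknown request: {request}"))
            return {}
        status = {}
        for instrument in self.instruments:
            for command in commands:
                status[instrument.name] = instrument.execute(command)
        return status
```

Because a VIGS exposes the same request interface that it consumes, the objects in the instruments list could themselves be controllers, which is one way to build a hierarchy.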

Hierarchy of VIGSs

[Diagram labels: Virtual Control Room; two rows of VIGS boxes; IMS; CE/SE; Real Instruments; flows: Control Flow, Errors Flow, Data Flow]
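
A rough model shows why a hierarchy of VIGSs shortens the time needed to reach every node. The sketch below is an illustration, not a GRIDCC measurement. It assumes a balanced tree, that each VIGS contacts its children one after another at a fixed routing rate, and that the levels are traversed in sequence. The function names are hypothetical; the figures of 10^4 nodes and 10^2 msg/s are those quoted elsewhere in these slides.

```python
def levels_needed(nodes, fanout):
    """Smallest number of levels such that fanout**levels >= nodes."""
    if nodes < 1 or fanout < 2:
        raise ValueError("need nodes >= 1 and fanout >= 2")
    levels, reach = 0, 1
    while reach < nodes:
        reach *= fanout
        levels += 1
    return levels


def fanout_time(nodes, fanout, rate):
    """Rough one-way time in seconds to reach every node.

    Each level costs fanout / rate seconds, because a controller sends
    one message per child at `rate` messages per second (an assumption).
    """
    return levels_needed(nodes, fanout) * fanout / rate
```

Under these assumptions, levels_needed(10**4, 10) gives 4 levels and fanout_time(10**4, 10, 100) gives about 0.4 s, whereas a single controller contacting all 10^4 nodes in turn (fan-out 10^4) would need about 100 s.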

Project Timing

[Chart: time axis in years (1, 2, 3)]

Project Time Schedule

Preliminary Service Prototypes

– Resource Service, IMS and a reduced version of VIGS exist as preliminary study prototypes. The aim is to gain experience with the technologies and provide a preliminary test bed for our applications.

– The following technologies have been used:
  • Tomcat based (Java servlet container)
  • SOAP/XML (JAXM)
  • Castor
  • MySQL and Oracle DB, JDBC
  • Sun Message Queue (JMS) for IMS

– An integrated version of the above-mentioned services and VIGS is now in operation to control a system of 128 nodes (instruments)

Performance Issues of the prototype: Round Trip Time

[Diagram: VIGSs exchanging SOAP/XML messages]

Performance Issues of the prototype: Info Pub/Sub Performance

[Diagram labels: PUBLISHERS; Msg BROKER; Msg Filter; Persistency; SUBSCRIBERS]

[Plot: Tomcat-based broker, SOAP/XML messages, maximum throughput. x-axis: Number of Publishers (0 to 72); y-axis: Total number of messages/s (0 to 400); series: No Persistency, File Storage, MySQL DB]

[Plot: Message Queue JMS-based broker, JMS messages, Dual Xeon 1.8 GHz. x-axis: number of publishers (5, 100, 450, 600); y-axis: Msg/sec (0 to 3000); label: 100 bytes]

Comments on the performance requirements and estimates for the GridCC technologies

• Single VIGS routing capability should be of the order of 10^2 msg/s. Given the hierarchical topology of the application, this number is a reasonable compromise. The prototype (Tomcat + SOAP) meets this figure, and a Web Service based VIGS should meet it easily.

• IMS message broker capability. Given the nature of the messages collected by this service (asynchronous error messages, state changes, monitor information, etc.), we require at least 1000 msg/s per broker. The prototype shows some limitations with the Tomcat + SOAP scheme. The JMS approach behaves properly.
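
The per-broker figure can be related to the aggregate IMS throughput quoted in the general requirements (of the order of 10^4 messages/s) with a short calculation. The sketch below assumes, purely for illustration, that the load can be spread evenly over independent brokers; the function name is hypothetical.

```python
def brokers_needed(aggregate_rate, per_broker_rate):
    """Minimum number of brokers for an aggregate message rate.

    Both rates are whole numbers of messages per second. The load is
    assumed to be spread evenly over the brokers.
    """
    if per_broker_rate <= 0:
        raise ValueError("per_broker_rate must be positive")
    return -(-aggregate_rate // per_broker_rate)  # ceiling division on integers
```

With an aggregate rate of 10 000 msg/s and 1000 msg/s per broker, this gives 10 brokers.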

Web Service Performance

[Plot: Web Service Invocation. x-axis: Clients (1, 5, 10, 15); y-axis: Inv/sec (0 to 900)]

• Sun J2EE AS
• Dual Xeon 1.8 GHz
• 10 tag XML doc
• remote method invocation
• only ack as answer

• Glue Web Service
• Dual Xeon 1.8 GHz
• 10 tag XML doc
• remote method invocation
• only ack as answer

Grid Technologies for the project

• Specifications we are looking at:
  – WS-Agreement
    • to define the QoS affecting the real-time behaviour (as defined at the beginning) of the web service
  – WS-Resource Framework
    • Stateful web services
  – WS-Addressing
  – WS-Notification
  – WS-Federation

Technology review in progress

• Web/Grid Service
  – WS-I and/or WS-I+ based platforms
    • OMII
    • Java Sun ?
    • WebSphere ?
    • JBoss ?
  – IBM emerging toolkit
    • WS-RF, WS-ResourceProperties, WS-ResourceLifetime, WS-Notification, WS-ServiceGroup, WS-BaseFaults
  – Globus Toolkit 4
  – EGEE gLite

• Message pub/sub systems
  – NaradaBrokering
  – Sun Message Queue 3.5
  – WebSphere MQ Series

Conclusions