Architectural Software Support for Processing Clusters
Johannes Gutleber, Luciano Orsini
European Organization for Nuclear Research
Div. EP/CMD, The CMS Collaboration
CERN, 1211 Geneva 23, Switzerland



The Issue

1988: The biggest problem with creating distributed computing systems is devising a method of intercomputer communication that is reliable, fast and simple.

J.E. Tomayko, NASA CR-182505, p.228, Mar 1988

2000: High-speed networks […] can obtain communication speeds close to those of supercomputers, but realizing this potential is a challenging problem.

H. Bal, ACM Op Sys Rev, p. 79, Oct 2000


The Approach

Do not…
• invest in alternative communication paradigms
• optimise communication libraries

Provide architectural software support
• Lightweight framework for homogeneous communication
• Configure with low-level communication libraries
• Plug-in application components
• Homogeneous subsystem interface design support


Architectural Software Support

• Architecture support comprises
  – a processing model
  – subsystem addressing
  – configuration and control
  – Application Programming Interface requirements

Everything that is needed to build and operate a distributed application


Motivation

• In large-scale data acquisition systems we have to cope with
  – Long operational lifetimes (10-15 yrs)
  – Modifications due to generation jumps (networking, processing)
  – Deployment of one application in many different environments
  – Bridging of hardware/software performance gaps

• From this special case we can extrapolate to general cluster-based systems
  – Search engines, document retrieval systems
  – Plant control systems
  – Medical imaging networks in hospitals

Available tools do not match these requirements.


Architecture Basis: I2O

• A specification for a hardware- and operating-system-independent device driver framework

• Targeted at collaboration between...

[Figure: I2O architecture; labels: Messaging Layer, Host and intelligent devices, Intelligent device intercommunication, PCI bus, UNIX OSM, Windows OSM, HDM/FPGA, HDM/IOP]


I2O IOP Environment

• Inbound/outbound queues (pass frame pointers, zero copy)
• Homogeneous frame format
• Event-driven processing
• Uniform hardware access API

[Figure: IOP environment; labels: Network, IRQ, inbound/outbound queues, HDM framework, handler functions foo( ) and bar( )]
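Passing frame pointers through the queues is what makes the IOP environment zero copy: only an address moves, never the frame body. A minimal single-threaded C++ sketch of that idea follows; `Frame`, `FrameQueue` and `QueuePair` are hypothetical names, and the fixed frame size and queue depth are assumptions, not I2O or XDAQ definitions.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>

// Hypothetical fixed-size frame buffer; real I2O frames are variable length.
struct Frame {
    std::array<std::uint32_t, 64> words{};
};

// Bounded FIFO of frame pointers: post/fetch move a pointer, never the frame contents.
// Not thread-safe; a real IOP queue would be a hardware or lock-protected FIFO.
template <std::size_t Capacity>
class FrameQueue {
public:
    bool post(Frame* f) {
        assert(f != nullptr);
        if (count_ == Capacity) return false;  // queue full
        slots_[(head_ + count_) % Capacity] = f;
        ++count_;
        return true;
    }

    std::optional<Frame*> fetch() {
        if (count_ == 0) return std::nullopt;  // queue empty
        Frame* f = slots_[head_];
        head_ = (head_ + 1) % Capacity;
        --count_;
        return f;
    }

private:
    std::array<Frame*, Capacity> slots_{};
    std::size_t head_ = 0;
    std::size_t count_ = 0;
};

// Hypothetical queue pair: requests arrive on the inbound queue, replies leave outbound.
struct QueuePair {
    FrameQueue<16> inbound;
    FrameQueue<16> outbound;
};
```

The consumer receives the very object the producer filled, so the cost of a hand-off is independent of the frame size.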


I2O Message Frame

Used to implement an active message model

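Frame layout (32-bit words, bit 31 is the most significant):

• Standard frame
  – Word 0: MessageSize (bits 31-16) | MessageFlags (bits 15-8) | VersionOffset (bits 7-0)
  – Word 1: Function (bits 31-24, = FFh for private frames) | InitiatorAddress (bits 23-12) | TargetAddress (bits 11-0)
  – Word 2: InitiatorContext, assigned by the message layer and used for routing the reply back
• Private frame extension
  – Word 3: TransactionContext, assigned by the application and returned in the reply (cookie)
  – Word 4: OrganizationID (bits 31-16) | XFunctionCode (bits 15-0)
  – Word 5 onwards: PrivatePayload = function parameters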
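The first two header words are packed bit fields. A C++ sketch of packing and unpacking them with shifts and masks follows, assuming the standard I2O header layout (message size counted in 32-bit words, 12-bit target and initiator TIDs, 8-bit function code). The helper names are hypothetical and this is not XDAQ's own code.

```cpp
#include <cassert>
#include <cstdint>

// Function code FFh marks a private (application-specific) frame.
constexpr std::uint8_t kPrivateFunction = 0xFF;

// Word 0: MessageSize (bits 31-16, in 32-bit words) | MessageFlags (15-8) | VersionOffset (7-0)
constexpr std::uint32_t packWord0(std::uint16_t sizeWords, std::uint8_t flags,
                                  std::uint8_t versionOffset) {
    return (std::uint32_t{sizeWords} << 16) | (std::uint32_t{flags} << 8) |
           std::uint32_t{versionOffset};
}

constexpr std::uint16_t messageSizeOf(std::uint32_t w0) {
    return static_cast<std::uint16_t>(w0 >> 16);
}

// Word 1: Function (bits 31-24) | InitiatorAddress (23-12) | TargetAddress (11-0).
// TIDs are 12 bits wide, so larger values are masked off.
constexpr std::uint32_t packWord1(std::uint8_t function, std::uint16_t initiatorTid,
                                  std::uint16_t targetTid) {
    return (std::uint32_t{function} << 24) |
           ((std::uint32_t{initiatorTid} & 0xFFFu) << 12) |
           (std::uint32_t{targetTid} & 0xFFFu);
}

constexpr std::uint8_t functionOf(std::uint32_t w1) {
    return static_cast<std::uint8_t>(w1 >> 24);
}
constexpr std::uint16_t initiatorOf(std::uint32_t w1) {
    return static_cast<std::uint16_t>((w1 >> 12) & 0xFFFu);
}
constexpr std::uint16_t targetOf(std::uint32_t w1) {
    return static_cast<std::uint16_t>(w1 & 0xFFFu);
}
```

A receiver only needs `functionOf` to tell predefined I2O messages from private ones, and `targetOf` to pick the destination.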


I2O Messaging

• A message frame contains two addresses
  – initiatorTid = where the message comes from
  – destinationTid = the DDM/ISM it is addressed to

• A message is associated with a handler function
  – Predefined functions for I2O messages
  – Private frame extension for application-specific messages

• Message length is limited to 256 KB (MessageSize is a 16-bit count of 32-bit words), so a frame should only contain control information
  – Message data should go into scatter-gather lists

• I2O frame byte order is little-endian
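The size limit follows from the 16-bit MessageSize field counting 32-bit words: 65 535 × 4 = 262 140 bytes, just under 256 KB. Because frames are little-endian, a portable sender can write each word byte by byte rather than copying host memory, which would be wrong on big-endian hosts such as the PowerPC boards. A sketch; `putLE32`, `getLE32` and `kMaxFrameBytes` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Largest frame expressible in the 16-bit MessageSize field (units of 32-bit words).
constexpr std::size_t kMaxFrameBytes = 0xFFFFu * sizeof(std::uint32_t);  // 262140 bytes

// Append a 32-bit word in little-endian order, independent of host byte order.
void putLE32(std::vector<std::uint8_t>& out, std::uint32_t w) {
    for (int i = 0; i < 4; ++i) {
        out.push_back(static_cast<std::uint8_t>(w >> (8 * i)));
    }
}

// Read a little-endian 32-bit word starting at byte offset `pos`.
std::uint32_t getLE32(const std::vector<std::uint8_t>& in, std::size_t pos) {
    assert(pos + 4 <= in.size());
    std::uint32_t w = 0;
    for (int i = 0; i < 4; ++i) {
        w |= std::uint32_t{in[pos + i]} << (8 * i);
    }
    return w;
}
```

On a little-endian host a compiler typically reduces these loops to plain loads and stores, so the portable form costs little.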


Peer and Peer2Peer Operations

• Peer operation uses the queue pair on one PCI segment
• Peer-to-peer commands for network communication

[Figure: peer and peer-to-peer operation; labels: Executive, Messaging Layer and Peer Transport Agent on each node, PeerTransport, Device Driver Module (DDM), I2O message frames, non-I2O messages]


I2O Peer Operation for Clusters

• Application component ↔ device
• Processing node ↔ IOP
• Controller node ↔ host

• Homogeneous communication
  – frameSend for local, remote and host targets
  – single addressing scheme (Tid)

• Application framework

[Figure: applications on two nodes exchanging I2O message frames; labels: Application, Executive, Messaging Layer, Peer Transport Agent, PeerTransport]
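With a single Tid addressing scheme the sender does not need to know where its target lives; the executive decides between direct local delivery and a peer transport. A minimal C++ sketch under that assumption follows. `Executive`, `addLocal`, `addRemote`, this `frameSend` signature and the transport callback are hypothetical illustrations, not the XDAQ API.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

using Tid = std::uint16_t;
using Frame = std::vector<std::uint32_t>;

class Executive {
public:
    using Handler = std::function<void(const Frame&)>;
    using Transport = std::function<void(const std::string& node, Tid, const Frame&)>;

    explicit Executive(Transport pt) : peerTransport_(std::move(pt)) {}

    // Register an application hosted by this executive.
    void addLocal(Tid tid, Handler h) { local_[tid] = std::move(h); }

    // Record that `tid` lives on another node reachable through the peer transport.
    void addRemote(Tid tid, std::string node) { remote_[tid] = std::move(node); }

    // Same call for local and remote targets; returns false for unknown Tids.
    bool frameSend(Tid target, const Frame& f) {
        if (auto it = local_.find(target); it != local_.end()) {
            it->second(f);  // local: direct dispatch, no network involved
            return true;
        }
        if (auto it = remote_.find(target); it != remote_.end()) {
            peerTransport_(it->second, target, f);  // remote: hand over to the peer transport
            return true;
        }
        return false;
    }

private:
    std::map<Tid, Handler> local_;
    std::map<Tid, std::string> remote_;
    Transport peerTransport_;
};
```

Moving an application to another node then changes only the executive's tables, not the application code that calls `frameSend`.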


Applications are I2O Classes

• In XDAQ they are equivalent to C++ classes
• Each class exposes an interface that is implemented by the application

[Figure: frame dispatch; labels: TargetAddr, ClassId, Instance, Dispatcher, Listener, DDmAdapter, UtilAdapter, UserAdapter, Application]
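The idea that an application is a C++ class implementing an interface can be sketched as an abstract listener that a dispatcher calls for each incoming private frame. `Listener`, `onPrivateFrame`, `Dispatcher` and `ReadoutUnit` are hypothetical; the interfaces of XDAQ's DDmAdapter, UtilAdapter and UserAdapter are not reproduced here.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

using Tid = std::uint16_t;

// Hypothetical listener interface: one callback per private (Function = FFh) frame.
class Listener {
public:
    virtual ~Listener() = default;
    virtual void onPrivateFrame(std::uint16_t xFunctionCode,
                                const std::vector<std::uint32_t>& payload) = 0;
};

// Routes frames to application instances by target Tid.
class Dispatcher {
public:
    void bind(Tid tid, Listener* app) { instances_[tid] = app; }

    bool deliver(Tid target, std::uint16_t xFunctionCode,
                 const std::vector<std::uint32_t>& payload) {
        auto it = instances_.find(target);
        if (it == instances_.end()) return false;  // no instance bound to this Tid
        it->second->onPrivateFrame(xFunctionCode, payload);
        return true;
    }

private:
    std::map<Tid, Listener*> instances_;  // non-owning
};

// Example application: a hypothetical readout unit counting data requests.
class ReadoutUnit : public Listener {
public:
    static constexpr std::uint16_t kRequestData = 0x0001;  // hypothetical XFunctionCode
    int requests = 0;

    void onPrivateFrame(std::uint16_t xFunctionCode,
                        const std::vector<std::uint32_t>&) override {
        if (xFunctionCode == kRequestData) ++requests;
    }
};
```

Several instances of the same class can be bound to different Tids, which matches the ClassId/Instance distinction in the figure labels.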


Peer Transport Configurations

Polling Peer Transport Agent
+ low OS service overhead
- executive uses CPU continuously
- no blocking PTs

Thread per Peer Transport
- higher OS service overhead
+ no CPU monopolisation
+ allows integration with other software

[Figure: in each configuration a Peer Transport Agent (PTA) serves the TCP, Myrinet, DLPI and FIFO peer transports]
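The two configurations differ only in who drives the peer transports. A C++ sketch assuming a hypothetical `PeerTransport` interface with a non-blocking `poll()`: the polling agent sweeps all transports from one loop, while the alternative starts one thread per transport. The stub stands in for real TCP, Myrinet, DLPI or FIFO transports.

```cpp
#include <atomic>
#include <cassert>
#include <memory>
#include <thread>
#include <vector>

// Hypothetical transport interface: poll() checks for one frame and must not block.
class PeerTransport {
public:
    virtual ~PeerTransport() = default;
    virtual bool poll() = 0;  // true if a frame was received and dispatched
};

using Transports = std::vector<std::unique_ptr<PeerTransport>>;

// One sweep of the polling agent over every configured transport.
void pollOnce(Transports& pts) {
    for (auto& pt : pts) pt->poll();
}

// Polling configuration: a single executive loop occupies the CPU until stopped.
// A transport that blocked here would stall all the others, hence "no blocking PTs".
void pollingAgent(Transports& pts, const std::atomic<bool>& stop) {
    while (!stop.load()) pollOnce(pts);
}

// Thread-per-transport configuration: each transport is serviced by its own thread.
// A real transport thread would call a blocking receive instead of poll(), which is
// what frees the CPU; with the stub below this loop still spins.
std::vector<std::thread> threadPerTransport(Transports& pts, const std::atomic<bool>& stop) {
    std::vector<std::thread> threads;
    for (auto& pt : pts) {
        PeerTransport* raw = pt.get();
        threads.emplace_back([raw, &stop] {
            while (!stop.load()) raw->poll();
        });
    }
    return threads;
}

// Stub transport that only counts how often it was serviced.
class CountingTransport : public PeerTransport {
public:
    std::atomic<int> calls{0};
    bool poll() override {
        ++calls;
        return false;
    }
};
```

The trade-off on the slide shows up directly: the polling loop needs no OS scheduling but never yields, while the threaded version hands scheduling to the OS and can coexist with other software.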


I2O for Cluster Configuration

Executive tasks run on:
• RUIO (IOP480): VxWorks
• PPC (MVME2306): VxWorks, Linux
• Workstation: Intel Linux, Sparc Solaris


Boot

• Executives on each node in the cluster wait for I2O configuration messages

• Configuration and control can be done through
  – native I2O messages
  – an XML/HTTP mapping

Parameter set/get is also done through I2O/XML.


I2O Configuration Commands

• Where (e.g. IOP 34): ExecSysTabSet
• How (e.g. TCP, DLPI, Myrinet)
• Who (e.g. RU1 – Tid 10, RU2 – Tid 20): ExecDeviceAssign

[Figure: CMS DAQ components; labels: Detector Frontend, Level 1 Trigger, Readout Systems, Event Manager, Builder Networks, Filter Systems, Computing Services, Run Control]
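The where, how and who of configuration can be held as plain data from which a controller derives its ExecSysTabSet and ExecDeviceAssign commands. A C++ sketch with hypothetical types and a hypothetical textual command format; only the example values (IOP 34, Myrinet, RU1 = Tid 10, RU2 = Tid 20) come from the slide, and the output is not the I2O message encoding.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

enum class Transport { TCP, DLPI, Myrinet };

const char* transportName(Transport t) {
    switch (t) {
        case Transport::TCP: return "TCP";
        case Transport::DLPI: return "DLPI";
        case Transport::Myrinet: return "Myrinet";
    }
    return "?";
}

// Where and how: one system-table entry per node.
struct SysTabEntry {
    int iopId;            // e.g. IOP 34
    Transport transport;  // e.g. Myrinet
};

// Who: which application instance receives which Tid on which node.
struct DeviceAssignment {
    std::string name;   // e.g. "RU1"
    std::uint16_t tid;  // e.g. 10
    int iopId;          // node hosting the instance
};

// Hypothetical human-readable command list, in the order the slide lists the questions.
std::vector<std::string> configurationCommands(const std::vector<SysTabEntry>& sysTab,
                                               const std::vector<DeviceAssignment>& devices) {
    std::vector<std::string> cmds;
    for (const auto& e : sysTab) {
        cmds.push_back("ExecSysTabSet iop=" + std::to_string(e.iopId) +
                       " via=" + transportName(e.transport));
    }
    for (const auto& d : devices) {
        cmds.push_back("ExecDeviceAssign iop=" + std::to_string(d.iopId) + " " + d.name +
                       " tid=" + std::to_string(d.tid));
    }
    return cmds;
}
```

Keeping the configuration as data makes it easy to serialise to XML for the HTTP mapping mentioned on the Boot slide.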


Ready

• What (e.g. libRU.so, libEVM.so): ExecSwDownload

[Figure: applications loaded on local and remote nodes; labels: LocalApp2, LocalApp3, RemoteApp1, RemoteApp2, RemoteApp3]


Operational

[Figure: App1, App2 and App3 exchanging messages with frameSend (...); label: DdmSystemChange]


Efficiency Evolution

• Round-trip test, reporting the half-round-trip time
• Calculate the difference from bare-bones use of the Myrinet GM library

[Figure: overhead in µs over time, June to November; series: original efficiency (paper), 450 MHz, PCI 32/33; on-demand buffer-pool allocation, 450 MHz, PCI 32/33; 750 MHz, PCI 32/33]
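The measurement reduces to two subtractions: the one-way latency is half the measured round trip, and the middleware cost is the difference between the XDAQ and bare GM one-way latencies for the same message size. A C++ sketch; the function names are hypothetical, and timing many ping-pongs as one block is an assumed method, not necessarily the one used for the chart.

```cpp
#include <cassert>
#include <chrono>

using Micros = std::chrono::duration<double, std::micro>;

// Average round trip when `iterations` ping-pongs are timed as one block.
Micros averageRoundtrip(Micros total, int iterations) {
    assert(iterations > 0);
    return total / static_cast<double>(iterations);
}

// One-way latency estimated as half of a ping-pong round trip.
Micros halfRoundtrip(Micros roundtrip) { return roundtrip / 2.0; }

// Overhead of the framework relative to the bare transport at the same message size.
Micros overhead(Micros xdaqRoundtrip, Micros gmRoundtrip) {
    return halfRoundtrip(xdaqRoundtrip) - halfRoundtrip(gmRoundtrip);
}
```

Comparing against bare GM on the same hardware cancels the network and NIC costs, so the chart tracks only what the software layer adds.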


Point To Point Efficiency

[Figure: GM/XDAQ latencies in microseconds (0 to 120) versus bytes transferred (0 to 4096) for XDAQ and GM 1.2.3; trend line y = -0.0000x + 2.1289]
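A trend line with slope close to 0 and intercept of about 2.13 µs means the extra latency does not grow with message size. An ordinary least-squares fit producing such a line can be sketched as follows; that the slide's line was fitted to the per-size difference between the XDAQ and GM curves is an assumption, and `linearFit` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Ordinary least squares for y = slope * x + intercept; returns {slope, intercept}.
std::pair<double, double> linearFit(const std::vector<double>& x, const std::vector<double>& y) {
    assert(x.size() == y.size() && x.size() >= 2);
    const double n = static_cast<double>(x.size());
    double sx = 0.0, sy = 0.0, sxx = 0.0, sxy = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        sx += x[i];
        sy += y[i];
        sxx += x[i] * x[i];
        sxy += x[i] * y[i];
    }
    const double denom = n * sxx - sx * sx;  // zero only if all x values are equal
    assert(denom != 0.0);
    const double slope = (n * sxy - sx * sy) / denom;
    const double intercept = (sy - slope * sx) / n;
    return {slope, intercept};
}
```

With x in bytes and y in microseconds, the intercept is the constant per-message cost and the slope is the additional cost per byte.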


CMS Data Acquisition System

[Figure: labels: Custom readout; 100 kHz input @ 2 KB per node; I2O: O(500) real-time systems; O(500) builder units; O(2000) physics analysis nodes; Gigabit Ethernet, Myrinet, Infiniband; SOAP, XML, Java]

Prototype cluster 2000: 32 x 32 PCs, 2.5 Gbps Myrinet 2000, Gigabit Ethernet


Summary

• Lightweight middleware
  – Abstraction from hardware
  – Ease of adaptability and extensibility
  is feasible: 2.1 µs overhead per remote function invocation (50 000 calls/s on GM)

• Need architectural support
  – to efficiently integrate layers
  – to be able to keep pace with technology evolution without a need for change
  – to construct homogeneous applications for heterogeneous processing clusters

[Figure: layered architecture; labels: Processing, Sensor readout, Util/DDM, XDAQ, HTTP, TCP, Ethernet, Myrinet, PCI, OS and Device Drivers]


Information

http://cern.ch/xdaq
