
Development of the UK-NEES test middleware, early tests, web services approach, network usage, security, and usability issues in distributed hybrid testing

Mobin Ojaghi

OUEL Departmental Report no 2322/11

University of Oxford Department of Engineering Science

2011


Abstract

Development of the UK-NEES test middleware, early tests, web services approach, network usage, security, and usability issues in distributed hybrid testing

This document describes work conducted during the development of UK-NEES (Network for Earthquake Engineering Simulation). It is written to supplement other work, referred to here, describing the development of real-time distributed hybrid testing. It first describes the middleware development process and some of the early problems encountered, which guided later middleware designs. It also describes the workings of DHT.exe v0.3.1.1, the most successful program of the DHT.exe series. Following this, the other aspects of distributed testing identified and considered in the development of the UK-NEES real-time distributed hybrid testing system are described: the web services approach, network usage and security, and social issues including support systems and data ownership.


Acknowledgements

The UK-NEES project was an EPSRC funded project, supported by grant EP/D079101/1 (Oxford) in collaboration with EP/D080088/1 (Bristol) and EP/D079691/1 (Cambridge).

I am grateful to the EPSRC (departmental studentship), New College (travel award and writing up grant), Bristol University (EPSRC fund for travel expenses) and the IMechE (travel award) for the various funds they provided towards the completion of this project.

I would like to thank my supervisors, Professors Martin Williams and Tony Blakeborough, for their support and encouragement throughout the course of this project. They played an important role in ensuring the feasibility of the UK-NEES project.

I would also like to thank the other members of the UK-NEES team who supported the work of the project, in particular with regards to IT matters.

Initial ideas and collaborative work in the development of the pre-DHT.exe series programs were with Dr. Javier Para Fuente. The most extensive set of development programs, leading to the first stable real-time DHT, was the DHT.exe series, developed with Mr. Ignacio Lamata Martínez. I am grateful also to Dr. Matt Dietz for his tireless support of this project during the many hours of experiments at Bristol.

I would also like to acknowledge the support of Mr. Kashif Saleem for his introduction of grid computing concepts and assistance in setting up UKLight, and the other members of the team: Mr. Jonathon Evans, Mr. Arshad Ali, Dr. Gopal Madabhushi, Dr. Stuart Haigh, Professor Colin Taylor and Dr. Adam Crewe.

I would like to thank the IT support staff at Oxford, in particular Mr. Tony Gilham but also Mr. Chris Flux, Mr. Kevin Corbett and Mr. Harry Fearnley, and at Bristol, Mr. Chris Hawkins.

I am also grateful to Dr. Andreas Schellenberg and Prof. Stephen Mahin for their support during my visit to UC Berkeley.

Finally, I would like to thank everyone else who supported me during the completion of this project.


Table of Contents

1.0 Introduction

2.0 Middleware development process
2.1 Inter-controller communication
2.2 Early controller board testing
2.3 Board to board sine testing
2.4 DHT.exe series program

3.0 Web services based communication within UK-NEES

4.0 Network usage policies and security

5.0 Social aspects of a distributed hybrid testing system
5.1 Support systems
5.1.1 Remote access to testing machines
5.1.2 Telephone contact
5.1.3 Instant messaging - when speaking is not possible or there are multiple sites
5.1.4 Live editing of shared documents via Google Docs
5.1.5 Tele-presence
5.1.6 Video-conferencing
5.2 Access rights and intellectual property in a distributed testing environment

6.0 Summary

References


1.0 Introduction

This report supplements work discussed and referred to in Ojaghi (2010) on the development of real-time distributed hybrid testing (DHT). It expands on some of the work presented there and also describes additional aspects of distributed testing explored during the development of the UK-NEES real-time distributed hybrid testing system. A distributed hybrid testing system introduces many new challenges; most of the technical and social issues have been covered in detail in Ojaghi (2010). This report focuses initially on the middleware development process. The development of the test middleware, responsible for making direct connections between site actuation systems, led to a better understanding of the issues involved in this new testing environment, both technical and from an administrative and usability perspective. Selected early test results are shown, describing some early findings that led to the development of DHT.exe v0.3.1.1, the most successful DHT.exe series program used for real-time DHT within UK-NEES and capable of conducting stable tests. It should be noted that the extensive hybrid test results which were generated are not shown; an example describing the issues faced is found in Ojaghi (2010). DHT.exe in turn would be superseded by the IC-DHT.exe series programs described in Ojaghi (2010), which were used to conduct what are believed to be the first known stable and accurate real-time DHT. That program was used together with the new large delay compensation algorithms and data handling controllers developed for this purpose.

The web services approach is discussed next. The connections between sites were ultimately made using sockets; however, web services were originally proposed. Applying web services has significant advantages, but its disadvantages in terms of communication latency and computational overhead meant that a socket based approach would be more appropriate for real-time continuous DHT. To benefit from the potential of web services without limiting test performance, it is proposed that web services be used for auxiliary tasks related to test usability while the critical communication required for control would continue to use sockets through a separate dedicated PC. In the current implementation of the UK-NEES test software, all communication uses sockets. Integrating all distributed test systems with web services, including calling sockets for transmitting control signals via an independent processor/PC, is intended for a future stage of development.

DHT presents a new use of the Internet, and an important consideration is network usage and security in DHT. This is discussed in section four. Finally, working in a distributed testing environment not only poses technical challenges from the point of view of robust control; there are also significant challenges to overcome in order to facilitate work between distributed personnel and to improve the user experience. A usability strategy has been implemented within UK-NEES which aims to simplify test procedure in order to minimise human error and reduce the burden on test operators in an already busy testing environment. To support work in the distributed testing environment a range of additional tools were used; these are described in section five. Also discussed there is the issue of data ownership, which arises as data is generated jointly between research institutions.

2.0 Middleware development process

The test middleware (Ojaghi, 2010) provides the interface between the software layers communicating between the controller cards, which directly control local actuators, and the network interface cards, which enable communication between geographically distributed controller cards across the Internet or a similar network. It is therefore used to enable distributed inter-controller communication.

It quickly emerged that conducting robust communication between distributed sites would be a very difficult prospect, since data loss, inter-controller latency and data arrival time variations were quite high; robust communication could not be achieved. To achieve DHT and, later, as the focus shifted predominantly to achieving real-time continuous DHT, a deep understanding of the existing hardware and software environment had to be developed. Since existing hardware systems had to be used as part of the requirements of the project and, to ensure that the developed system could be easily applied to other earthquake engineering labs, work focussed on maximising the potential of available systems. The primary aim was to explore the possibilities for achieving communication without data loss, with minimal variation in data arrival time, and with the lowest possible communication latency. Achieving robust, minimal latency, low jitter inter-controller communication would be essential both in the development and in the application of control strategies for real-time distributed hybrid testing.

This process would be facilitated through the development of the test middleware and extensive testing from 2006 until mid 2009, when the final test architecture and associated control systems presented in Ojaghi (2010) were developed.

The middleware was developed in three stages. The early stage work, referred to as pre-DHT.exe (named after the communication program), focussed on achieving host PC to controller board communication between the hardware systems at Oxford and Bristol. Later this was extended to the testing of pseudo-physical distributed models, where purely numerical simulations of a structural system hosted on a controller board would be connected to single and multiple distributed numerical substructures modelling a nonlinear fluid viscous damper. These dampers would be placed on the local host machine, on a machine on the local network, and distributed first across the Internet to a machine in Glasgow (the author's home, accessed using SSH tunnelling to provide an encrypted communication channel and overcome institutional firewalls) and later to PCs in Bristol and Cambridge.

At this stage many administrative hurdles concerning institutional policies on network usage and security had to be overcome. In order to minimise latency it was decided that inter-controller communication should be achieved directly through host PC communication and not through a proxy (the UK-NEES point of presence machines located in the DMZ, or demilitarised zone, of the local network). Since a socket architecture was pursued, ports between host PCs at the UK-NEES sites and further afield, including at Auckland University, had to be opened on each institution's firewall. Further details of network usage policies, security, related issues and how they were tackled may be found in section 4.0.

In the second and most extensive development stage, the DHT.exe series of programs was created, building on the earlier programs. These programs were the first to enable distributed inter-controller communication within UK-NEES, and with that the first full assessment of the testing environment could be made. They were designed with maximum efficiency in coding in mind, to make best use of computational resources both locally and on the network. The primary role of these programs was to integrate directly with the dSpace controllers and the Windows operating system for network data transfer. The DHT.exe programs were also designed to integrate separately with shared file/memory interfaces to allow communication between the Cambridge host PC and the programs used to link the host PC and the Cambridge local controller, further details of which may be found in Ojaghi et al. (2010). At this stage a deeper understanding of the workings of the testing system was gained, and many new control strategies and improvements to the middleware were developed in response to testing. The most important middleware developments were: discovering the limitations of single-process synchronous socket communication; finding that latency and jitter with TCP/IP communication would be too high for real-time control purposes, particularly due to its flow control and error control functions, and that modifying flow control characteristics would be difficult and led to saturation of local and remote PCs; and implementing the ability to directly control time-step generation at the Bristol and Oxford controller boards, which would be essential for enabling synchronisation between distributed controller boards. At this stage the importance of coordinating distributed personnel and equipment in conducting DHT was realised, which led to the development of a usability strategy that was incorporated into the DHT.exe series of programs. This usability strategy and the pair programming of the DHT.exe series of programs were studied to assess their wider impact on this and related eScience projects (de la Flor et al., 2009; 2010).

In the final stage of development the IC-DHT.exe program was developed. The program used as the current UK-NEES test middleware builds extensively on the earlier programs and was developed to address many of the limitations encountered in them. The program is optimised for high performance messaging (low latency and jitter), implementing a single-process asynchronous version of UDP/IP with careful consideration of buffer sizes and, like the DHT.exe programs, written with maximum computational efficiency in mind. The program, currently tailored for the controllers used at Bristol and Oxford (dSpace), can implement all of the control strategies presented in Ojaghi (2010) provided it is supplied with the correct initialisation and control data, achieving synchronisation of local and remote time-steps through the use of a hardware based high resolution timer located on the host PC's Intel processor and through the ability to control the start of time-step generation, and hence the outer loop controller program, on local and distributed controllers. The IC-DHT.exe program implements the developed usability strategy, has the capability to capture data including both program and controller board test variables, operates with soft real-time priority and runs on a dedicated processor core. By running in a dedicated soft real-time environment the program has a greater chance of completing its processes in time, and because it can capture data itself, the secondary data capture provided via dSpace software on the host PC is no longer required, further reducing the computational load on the host PC and controller board. This program setup was found to limit or eliminate data loss and further reduce latency and jitter, enabling robust real-time DHT between Oxford and Bristol; extensive testing was conducted using the program.
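As an illustration of the soft real-time environment described above, the sketch below shows how a Windows program of this kind might be confined to a single processor core, raised above normal process priority and timed using the hardware high resolution counter. This is an illustrative sketch only: the Win32 calls are standard, but the chosen core mask, priority class and loop contents are assumptions made here, not the actual IC-DHT.exe source.

```cpp
// Sketch: dedicated-core, elevated-priority timing loop (illustrative, not UK-NEES source).
#include <windows.h>
#include <cstdio>

int main()
{
    // Confine the process to one core (mask 0x2 = second core; the value is an assumption).
    SetProcessAffinityMask(GetCurrentProcess(), 0x2);

    // Raise priority above normal Windows processes; HIGH_PRIORITY_CLASS is still
    // pre-emptible by the kernel, so this remains "soft" real-time.
    SetPriorityClass(GetCurrentProcess(), HIGH_PRIORITY_CLASS);

    // Hardware high resolution counter on the host PC processor.
    LARGE_INTEGER freq, t0, t1;
    QueryPerformanceFrequency(&freq);
    QueryPerformanceCounter(&t0);

    // ... one communication/control cycle (read board, send, receive, write board) ...

    QueryPerformanceCounter(&t1);
    double elapsed_ms = 1000.0 * double(t1.QuadPart - t0.QuadPart) / double(freq.QuadPart);
    printf("cycle time: %.3f ms\n", elapsed_ms);
    return 0;
}
```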

The extensive testing and development process meant that a deep understanding of the hardware and software environment of the UK-NEES real-time DHT system was gained, that is, of the functioning and interaction of the controller boards, host PCs and the networks used. The first and most obvious issue that comes to mind with conducting robust real-time distributed control is that of attempting to pass control signals across the Internet, since this is for the most part beyond the control of the testing sites; data is transmitted according to communication protocols and has to compete with other network traffic. However, it was found that network communication between the testing sites is actually quite robust. It is true that the network could fail, but this is an extremely unlikely event and down-time is usually advertised beforehand. Robust communication across the network is facilitated by the implementation of the test communication protocols through IC-DHT.exe and also because the service between institutional networks is highly redundant, has a high bandwidth, and routing is provided through high capacity multi-processor routers capable of handling multiple packets simultaneously. In fact it became clear that the biggest barrier to robust low latency, low jitter communication was not the network but the local nodes themselves. This led to an optimised hardware and software environment set up to minimise data loss, latency and jitter.

The development and functioning of the hardware and software systems that enable robust distributed communication, and the learning outcomes that guided the development process, are discussed in more detail in Ojaghi (2010). In the remainder of this section inter-controller communication between the three UK-NEES sites is discussed, and brief results from early tests are shown, including distributed sine tests which were important in understanding the issues involved with testing in this new environment. Finally, the workings of the DHT.exe series program v0.3.1.1 are shown. The most successful DHT.exe series program, it was the first to integrate background tasks related to data communication and foreground tasks designed to assist usability.

2.1 Inter-controller communication

In the first stage of development of the UK-NEES distributed hybrid testing system the initial client-server architecture was applied to connect the local testing systems at the three test sites. This is shown in Fig. 2.1, depicting an experiment described by Ojaghi et al. (2010). Here the central site, Oxford, is the client connecting to the two servers at Bristol and Cambridge. A high level view of the control layer architecture connecting the hardware systems at the three sites is also shown. The existing hardware systems used to conduct local testing at each of the sites are used with a multilayer control architecture to enable communications between sites.

In this case the client connects to two very different hardware systems. Oxford and Bristol share very similar testing environments. Both use dSpace hard real-time processor boards. This allows numerical models and control software (the outer loop) to be run onboard, with a high resolution hardware clock ensuring accurate and consistent time-steps. The boards, hosted on a Windows PC (Win XP Pro sp2/sp3), directly command the actuator inner loop controller, and control signals are fed back to them. On the host PC, the network connection to the Internet is used, and the role of the distributed control layer is to enable communication between the controller boards over the network. Network communication with the boards may be achieved using the dSpace Clib and Windows Winsock APIs (Application Programming Interfaces). In testing, both Oxford and Bristol use dynamic hydraulic actuators and have the capability to run real-time hybrid experiments. At Cambridge the testing environment is quite different. As real-time testing is not a priority, Cambridge uses high load capacity electrical motors with gearing that fulfil the power requirements; while these have significant velocity restrictions they are relatively compact, as required for use on the centrifuge basket. The Cambridge systems run LabView on a Windows XP Pro (multi-process) environment to allow communication with a ComputerBoards A/D board, regulating time-steps using a software based timer. While LabView is used to interface with the A/D board, direct access to the memory registers of the board is possible via a ComputerBoards software library.

Fig. 2.1. UK-NEES three-site distributed testing system - high level control layer view

Due to the nature of the testing environments presented, two different solutions were found for the design of the test middleware. The first describes the basis of middleware development for real-time DHT and is tailored in this case specifically to connect to the dSpace boards at Bristol and Oxford. The program runs on the host PC and uses the available APIs to allow read/write access to the controller board variables and to the network. The different middleware programs use this same approach and are distinguished by how the communications protocol is implemented and how control variables are processed. The second solution enables communication in a more general way between the hard real-time controller board at Oxford and the soft real-time control afforded by the LabView program running on a host PC and providing the interface to the Cambridge ComputerBoards A/D card. A LabView based program interfaces via common read and write files (memory or disk based) with the DHT.exe program, receiving commands to pass to the actuator controller and transferring feedback control signals to the DHT.exe program. This general connection shares the same test protocol with the other sites in order to transmit messages and was developed so that any hardware platform could be connected. However, the extra file layer makes it slower than the approach used to transfer messages directly between PC and controller board using the Clib API, as used in Oxford and Bristol.
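A minimal sketch of the host PC middleware cycle described above is given below. The helper functions read_board() and write_board() are hypothetical placeholders standing in for the platform specific layer (the dSpace Clib calls at Oxford and Bristol, or the shared file interface to LabView at Cambridge); the variable names and the blocking exchange are assumptions for illustration, not the UK-NEES implementation itself.

```cpp
// Sketch of the per-step middleware cycle (illustrative only).
#include <winsock2.h>
#pragma comment(lib, "ws2_32.lib")

// Hypothetical stand-ins for controller board access (dSpace Clib or file interface).
static double read_board(const char* /*variable*/)                    { return 0.0; }
static void   write_board(const char* /*variable*/, double /*value*/) { }

void middleware_loop(SOCKET remote)
{
    for (;;)
    {
        // 1. Read the command computed by the outer loop on the local controller board.
        double command = read_board("displacement_command");

        // 2. Send it to the remote site over the already-connected socket.
        send(remote, reinterpret_cast<const char*>(&command), sizeof(command), 0);

        // 3. Wait for the remote feedback (blocking here; later programs moved to
        //    asynchronous UDP to avoid stalling on late data).
        double feedback = 0.0;
        recv(remote, reinterpret_cast<char*>(&feedback), sizeof(feedback), 0);

        // 4. Write the feedback back onto the controller board for the next time-step.
        write_board("measured_displacement", feedback);
    }
}
```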

The tests conducted using this setup were among many that highlighted the problems of conducting robust communication within a software controlled multi-process environment. Accurately controlling software timers at Cambridge was a significant issue, and major performance gains could be made in testing by using a more powerful host PC in Cambridge. Testing at a slower rate reduced computational load and data loss events could largely be avoided. However, this did not mean that testing at a slower rate alone would make robust communication possible. In testing between Oxford and Bristol the computational load on the controller board and host PC, and the competing processes on the host PC, presented a major limiting factor in conducting robust communication in real-time.

2.2 Early controller board testing

The first serious attempt at conducting fast DHT was made using the pre-DHT.exe programs, and they provided the first indications of the issues involved with the testing environment. The most significant development of the pre-DHT.exe program was to connect a numerical model running on the Oxford controller board to a pseudo-physical model which was part of the server pre-DHT.exe program. This is shown in Fig. 2.2.

Fig. 2.2. Pre-DHT.exe program flow connecting controller board model to pseudo-physical model on PC

The pseudo-physical model would be run either on the host PC itself (using the localhost IP), on a PC on the local network, or on a PC running on an external network. The program was used to develop the connections, gain a better understanding of the testing environment and gauge the possibilities for fast testing. The program implements the TCP/IP (Winsock) communication protocol using a single-process synchronous or blocking mode of communication, which is the most common approach for designing a sockets application. Synchronous communication means that the program will progress only when a new value (the expected in-order packet) has been received (with delivery all but guaranteed by the protocol).
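The following sketch indicates what such a single-process blocking TCP exchange looks like with the Winsock API. The address, port and eight-byte payload are placeholders for illustration, not values used in the project; the TCP_NODELAY option is included simply to show one of the few flow control characteristics that can be adjusted from the application.

```cpp
// Sketch: blocking (synchronous) Winsock TCP client exchanging one value (illustrative only).
#include <winsock2.h>
#include <cstdio>
#pragma comment(lib, "ws2_32.lib")

int main()
{
    WSADATA wsa;
    if (WSAStartup(MAKEWORD(2, 2), &wsa) != 0) return 1;

    SOCKET s = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);

    // Disable Nagle buffering so small control packets are not coalesced.
    BOOL noDelay = TRUE;
    setsockopt(s, IPPROTO_TCP, TCP_NODELAY,
               reinterpret_cast<const char*>(&noDelay), sizeof(noDelay));

    sockaddr_in server = {};
    server.sin_family      = AF_INET;
    server.sin_port        = htons(50000);              // placeholder port
    server.sin_addr.s_addr = inet_addr("127.0.0.1");    // placeholder address (localhost)

    if (connect(s, reinterpret_cast<sockaddr*>(&server), sizeof(server)) == 0)
    {
        double command = 0.0, feedback = 0.0;
        // Blocking exchange: execution cannot proceed until the expected
        // in-order reply has arrived (delivery all but guaranteed by TCP).
        send(s, reinterpret_cast<const char*>(&command), sizeof(command), 0);
        recv(s, reinterpret_cast<char*>(&feedback), sizeof(feedback), 0);
        printf("feedback: %f\n", feedback);
    }

    closesocket(s);
    WSACleanup();
    return 0;
}
```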


Initial network testing (in 2007) between the test machine and a local network machine (in the office above the test lab) showed a fairly consistent round trip latency of <1ms (measured using ping on Windows XP Pro with a 32-byte packet) and 6ms to Bristol. The latter seemed to bode well for enabling real-time DHT within UK-NEES, as this, taken as an additional constant delay, was not excessive and was within the order of magnitude compensated for using existing local techniques. However, since reported DHT tests conducted up to then had not even approached real-time, this was taken with some caution. The network connection was idealised as equivalent to a long coaxial cable such as is used to provide local actuation commands and feedback sensor data. Since a remote actuator was eventually to be controlled, to enable rapid communications between the sites the communications program would attempt to read off the controller board and send the control data as fast as possible to the remote site, to directly actuate and return the measured sensor response as soon as possible to be applied to the main numerical model calculations on the controller board. All communication going well, the client site numerical model would start the test and, on connecting, the remote site actuator/pseudo-physical model would receive a constant stream of delayed commands to process and feed back.

To gauge reading and writing possibilities on and off the board, a series of initial tests was conducted using the pre-DHT program as described in Fig. 2.2, with the network delay measured using the time onboard and the time kept value written to the board. A representative set of network delay results from one day of testing is shown in Fig. 2.3. A few interesting observations can be made about these results.

Firstly, concentrating on the local tests (lt1-9): these tests are not conducted using the network; rather they use the localhost IP address (127.0.0.1). This has two consequences. One, the test is not affected by fluctuations in network usage, though it may be affected by unwanted traffic from or to the testing machine as a consequence of the operating system or other processes running on the machine. Two, since the server and the client are running on the same PC, the host machine is working much harder. In tests lt1-4 and lt8 the test is conducted with the same test conditions as far as could be controlled, except that in lt1 12 variables of data are captured off the board at 0.1ms, while in the other tests only two variables are captured. This data capture is with Control Desk (the dSpace host PC data capture program) and the pre-DHT.exe program operates its loop as fast as possible. The board is running at 0.1ms time-steps (as was being used at the time in local testing). The first thing to notice is that the minimum delay is quite small, 0.2ms. If all other variables that can affect the test are constant, it seems that capturing more variables does not significantly affect the test, since the delays and delay variations in lt1 are consistent with the other tests (lt2-4, 8). While the minimum delay is very small, this has very little influence on the actual overall average delay. A few variations in response can be observed: the delay is consistently at least 50ms, though several fluctuations occur that lead to higher delays of around 100-500ms.

In some tests, particularly lt1, after an initial period of fairly constant delay variation, the delay variation increases linearly through the test. While this behaviour can happen at any time, it tends to happen towards the end of a test (a 60-80s period). Also, and lt1 is a good example, there can be a sudden spike in the delay several times bigger than the delays occurring consistently through the test. Sometimes (lt2) the spikes occur more frequently and vary significantly.

Fig. 2.3 Network delay from pre-DHT.exe tests.

In tests lt6, lt7 and lt9, the board is running at 1ms time-steps and data capture is also at 1ms time-steps. The maximum delays are much lower; for large periods the maximum delay is constant at around 5ms, though there are large variations as before, with delays up to 30-50ms. lt9 is interesting, as the test begins with a linear increase in delay as the test proceeds, drops to a much lower constant variation in delay for a short period, then returns to the previous behaviour with the maximum delay progressing as if the drop in delay had not occurred. In test lt7 the board is running at 10ms time-steps with data capture at 10ms time-steps. While there is a 10ms time-step delay as expected, there are no other observable fluctuations in delay and, though this is a large delay, the consistent behaviour is what would be required for a distributed test.

Secondly, in tests e1-e4, conducted with the server running on a local network machine, the minimum delay increases to 0.3ms for tests at 0.1ms and is 1ms for tests running at 1ms. Similar variations in delay are observed as with the localhost tests. There seems to be no other real observable difference between tests running on the local host PC and those with the server distributed to the local network PC. In test e2, 12 variables are captured at 1ms time-steps with the controller board running at 0.1ms. However, the delays are not significantly lower than in test e1 or much different from lt1.

Finally, in tests distributed to the dSpace host PC in Bristol but not running on the server board (b1-b3), tests are conducted at 1ms time-steps, since with 1ms time-steps satisfactory dynamic control of an actuator can be achieved and the delay performance is much better. The minimum delay is much higher (the measured trace-route round trip time) and consequently, in the periods which show fairly constant delay variation behaviour (e.g. b1, between the first two 25ms spikes), the delay of 10-15ms is higher than in similar periods of an equivalent test (e.g. lt5 or e4); nevertheless, similar behaviour in terms of delay fluctuations to that observed in the local tests is found. In b1 and b3 it is difficult to ascertain whether there is a pattern linking the spike variations observed in tests with Bristol and in local tests. In test b2 linear increases in maximum delay as the test progresses are observed (as with lt1, lt2, lt9 and e1-e3).

These tests, though not statistically conclusive, do indicate that while minimum delays to Bristol of around 6ms may be manageable with delay compensation schemes, the larger and significant fluctuations in delay make fast continuous control of a distributed actuator very difficult under these conditions. The average delay is reduced when the controller board is running at 1ms time-steps as compared to 0.1ms, and delay fluctuations are not observed in the test running at 10ms with data capture at 10ms time-steps.

The results suggest that the computational load on the controller board and, by virtue of data capture (as the test progresses), on the host PC will significantly influence communication performance. The larger the controller board time-step, the lower the computational load on both controller and host PC and the better the communication performance, both in terms of maximum delay and in the elimination of delay fluctuations, especially at larger time-steps. However, at larger time-steps the board updates its calculations less often and therefore minimum delays are large. The results suggest that a balance between computational load and controller board time-step should be struck to minimise delays caused by communication and to eliminate fluctuations. If tests are to progress at smaller time-steps, the significant fluctuations in delay will be problematic for testing. No clear pattern emerges to explain these delay fluctuations. There are various possibilities, from variations in how data is captured to how processes are scheduled on the host PC, and perhaps the network itself. There seems at times to be saturation of computational capacity as tests progress, which leads to increasing delays, but this behaviour cannot be predicted, as with the random spikes in delay.

In order to further investigate reading and writing performance on and off the board, a variety of additional tests was conducted, reading and writing off the board without using sockets. These tests were conducted on a variety of board/host machine configurations (including at UC Berkeley, using a dSpace DS1104 card hosted on a PC with the following specifications: Intel Xeon 1.5GHz, 1GB RAM, Windows XP Pro sp2). In all cases reading and writing speeds would deteriorate slightly to a fairly consistent mean as tests were repeated, until the PC was restarted. In some cases, the second time a test was run (after being compiled and run on the board), the average reading and writing speed (as measured using an interpolated operating system tick count in the pre-DHT.exe program, which would run as fast as possible) would increase and consequently more read/write operations could be completed during the test. The behaviour of the dSpace board is expected to be fairly consistent, but there seemed to be a relationship between what the host PC was doing and the performance of the test. To ensure better tests, a host PC restart would usually be required. The reason read/write speeds deteriorated with time was unclear, other than that it is known that as the PC runs for longer, idle tasks/threads from previously running programs remain in memory and can often degrade PC performance (this memory may be cleared by a ProcessIdleTasks call in Windows). By increasing the computational load placed on the PC, these could increase delays as performance reduced. The different machines showed slight variations in read/write performance and, though machines with greater processing power seemed to perform slightly better, the differences could not be determined conclusively with the tests conducted. At this stage, testing strongly indicated that the host PC, particularly by virtue of the multi-process nature of the operating system used, played an important role in determining test performance.

After these tests, the host PC software used for programming the control system was updated: Matlab was updated to version 2006b at both Oxford and Bristol and the dSpace software was upgraded to release 5.3, so that the latest software based performance enhancements would be available but, more importantly, so that during development there were no compatibility issues with the developed code (testing with future and previous releases of dSpace software installed is possible, and tests do not rely on the same versions being installed at all sites). This was the software used for the remainder of testing.

Since the Oxford dSpace machine was coming to the end of its life and there were significant performance issues with it, the test machine was upgraded. The upgraded machine was chosen carefully (and specifically customised), as it was known that its performance could affect the test. In addition, an unrelated hardware fault on the Bristol test machine meant that this was also upgraded. The upgrades were consistent with the budgets available and the machine specifications are presented in Ojaghi (2010).

2.3 Board to board sine testing

The tests presented in the previous section gave an indication of the communication issues faced in controller board to PC to network communication. These tests were extended to attempt distributed control of an actuator with a series of open loop (outer loop command) sine tests. This was achieved by enabling inter-controller communication between Oxford and Bristol. These tests were also an opportunity to explore what control signals would be required to enable distributed hybrid testing. Two alternative communication models are presented in Fig. 2.4.

Fig. 2.4 Inter-controller control signals: minimal (top) and with additional control variables (bottom).

To ensure that the computational load involved in inter-controller communication would be an absolute minimum, and so encourage robust communication, the top model of Fig. 2.4 shows only one command variable being transmitted, with only one variable returned to provide feedback to a potential numerical model. The program also reads and writes the board time variable to compute the network delay at the client. In the lower model two variables are transmitted to control a remote actuator (a timestamp and a displacement command). Three variables are returned: a time-stamp (this could be the local time read off the server controller board or, as shown here, the client time received, written to the controller board on arrival, read and sent back), and the displacement and force achieved. The client also writes the time kept value back to the board when writing data that has arrived. The second model is more computationally intensive but also provides more flexible control, since the displacement variables returned can be used for variable delay compensation at the client and the time signals can be used as control variables to detect delays and data loss. This functionality proved essential as more complex control was attempted, and most DHT.exe series programs applied it.
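The two message layouts of Fig. 2.4 can be pictured as fixed size binary records, as sketched below. The field names are descriptive inventions for illustration, not identifiers from the UK-NEES source; the packing pragma simply keeps both ends agreeing on the byte layout when such records are sent over a socket.

```cpp
// Sketch of the control messages of Fig. 2.4 as packed binary records (illustrative only).
#pragma pack(push, 1)

// Top model: one command out, one feedback value back.
struct MinimalCommand  { double displacement_command;  };
struct MinimalFeedback { double displacement_feedback; };

// Bottom model: timestamped command out; timestamp, displacement and force back.
struct Command
{
    double client_time;            // time read off the client controller board
    double displacement_command;
};

struct Feedback
{
    double echoed_client_time;     // client time written to, then read back off, the server board
    double displacement_achieved;
    double force_achieved;
};

#pragma pack(pop)

// On receipt of a Feedback record the client can estimate the round-trip delay as
//   delay = current_board_time - fb.echoed_client_time;
// and treat missing or out-of-order timestamps as evidence of data loss.
```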

A series of open loop, constant amplitude and frequency sine tests was conducted, first using the top model and then the bottom model of Fig. 2.4, attempting to control from the controller board in Oxford a single distributed actuator on the one storey test rig described in Ojaghi (2010) and hosted in Bristol. The control loops described in Fig. 2.4 would be applied and the DHT.exe series program would run as fast as possible, applying single-process synchronous TCP/IP.

The tests shown all attempt to command the actuator at 5mm, 1Hz. Due to the mode of communication a stepped signal is received at Bristol, which is commanded directly to the actuator. The signal is updated around every 7ms or so and the actuator for the most part responds well. However, various issues are presented. Firstly, the top communication model of Fig. 2.4 is considered, with sample results shown in Fig. 2.5. At times when data does not arrive for one of these 7ms steps, it can arrive slightly later or earlier or not at all. With data not arriving or arriving late, the rate of loading at the actuator reduces, as shown in the left zoomed view of Fig. 2.5. As the test progresses, in some cases there is significant saturation in communication performance, leading to increasingly larger delay spikes. This causes large data loss events where data is not written to the server controller board. This is likely a client or server host PC or controller board saturation, where either the host PC drops the data or the controller board blocks read/write access. The consequence, shown in the middle view of Fig. 2.5, is that the desired waveform cannot be reliably produced.

The final important observation to be made in these tests is that user interrupts significantly affect test performance. Mouse clicking, accessing menus and so on, in a random way at client or server, can cause large data loss events, as reading and writing is stopped to process the user interrupt. This has severe consequences for the test, since the actuator will display holding behaviour while the command is not updated and will then jump to the new command, which is often quite some distance away, as communication resumes (Fig. 2.5 right, client and server time not in sync). This would be a likely source of instability in a real-time hybrid test, not to mention incorrectly imparting severe dynamic loading to a physical substructure. Due to the way Windows process scheduling is conducted, user interrupts have higher priority than standard programs. Another example is if the data variables are printed to screen by the DHT.exe program shell as they are read off or written to the board: though the computational load on the PC increases, since visual tasks are given priority over competing Windows processes, communication performance can improve as DHT.exe is given more priority.

Fig. 2.5 Distributed sine tests using top model of Fig. 2.4.

In Fig. 2.6 the tests are repeated using the bottom model shown in Fig. 2.4. It is clear that, though for the most part the sine waveform can be reproduced and the actuator correctly loaded, variable spikes in delay again lead to small loading issues, while user interrupts can cause large data loss events, causing the actuator to hold and then ramp to a new target position after the data loss stops.

Fig. 2.6 Distributed sine tests using bottom model of Fig. 2.4.

2.4 DHT.exe series program

In Fig. 2.7 the workings of the DHT.exe program v0.3.1.1 are shown. The most successful DHT.exe series program, it was used for extensive testing within UK-NEES. Using a single-process synchronous mode of TCP/IP, it is shown here connecting two sites using the client-server architecture applied by UK-NEES. The sites in this case use dSpace control hardware, and the program is specifically optimised for connecting to this hardware platform. Both background tasks, relating to the passing of critical control data between sites, and foreground tasks, relating to usability, are shown. This program, superseded by IC-DHT.exe, exhibits many of the features used to conduct stable and accurate real-time DHT within UK-NEES. In particular it applies a socket architecture to minimise latency and computational overhead.

In the remaining parts of this report the other aspects of distributed testing considered by UK-NEES are discussed. The next section covers the use of web services, followed by a discussion of security and network usage, and then the social aspects of distributed testing. While a usability strategy is built into the operation of the test middleware and associated programs to coordinate and inform distributed test personnel of activities and to provide live test updates, support systems can play a vital role in distributed testing. Distributed testing also introduces issues of data ownership, which are discussed. Finally a brief summary is given.

Fig. 2.7 The workings of DHT.exe v0.3.1.1 connecting two sites using dSpace controllers and showing both foreground and background tasks.


3.0 Web services based communication within UK-NEES

The grid services concepts originally proposed for UK-NEES (Saleem et al. 2008) sought to introduce a web services approach to enable operation of the UK-NEES Grid. Web services would be used to integrate the provision of robust distributed inter-controller communication as required for DHT with the provision of support services necessary to fulfil social tasks to assist work in the distributed environment. These services included tele-presence, to enable remote viewing of tests (including test data); tele-participation, to enable remote viewing of and participation in tests; tele-operation, remote operation of tests (including controlling scientific equipment by enabling locally generated commands to be selected by remote participants); and provision of storage and online access to data.

An online presence would be created to serve as the front end external access point to all of the facilities at each local site. To enable DHT, web services would be developed to plan and prepare experiments between sites beforehand and, during actual testing, to manage human-to-human and controller-to-controller interaction. The web services layer would sit over existing legacy systems, providing a seamless link between the different hardware platforms used at each site. Different web services, for example controller web services or physical web services, would be developed to be applied to the different aspects of a test. To support work in the distributed environment, existing open source tools provided by it.NEES.org would be used or adapted to enable tele-presence and tele-participation, integrating the acquired tele-presence cameras and videoconferencing equipment. In addition, online data access would be given to locally stored data. Data repositories would be developed for each site to organise and give access rights to data that has previously not been stored in an accessible format.

While a sockets based approach is the more conventional form of Internet communication, a web services approach to network communication was considered since it can overcome problems encountered when using sockets. A grid architecture based on web services solves communication restrictions introduced by each site's network security restrictions, network configurations or firewall policies. In a sockets based approach these multiple limitations to communication must be solved in a static way, and customised solutions developed for each problem encountered. However, since web services use standard HTTP communication (as used for web browsing) over a standard transport protocol, these problems can be avoided.

In addition, a web services approach aims to avoid platform dependencies and to allow high scalability in the test architecture. It seeks to enable new sites and components to be added to the repository of resources available to the distributed testing system in a relatively straightforward way. Since each site involved in a distributed test may use different hardware and software systems, distributed communication can be limited to sites using the same systems unless some form of standard communication is developed between them.

By applying a web services approach, distributed test modules hosted on different testing software environments can interact in a standard way, using a shared distributed test protocol over the web services communication protocol (SOAP, Simple Object Access Protocol) to allow interoperability between the different site systems. They operate as web based APIs (Application Programming Interfaces). By adding a web service interface in front of existing testing systems, these may be made accessible to any other component in the testing system independent of platform, programming language or paradigm.

This approach, using many of the existing technologies developed for grid based computing and collaboration, is an attractive solution to the communication requirements of the UK-NEES distributed testing system, offering an easy to use yet powerful way of orchestrating distributed experiments. It shares many of the benefits and features developed by NEES to enable distributed testing using NTCP (the NEES tele-operation control protocol), which applies web services based communication with a robust protocol for test communication. This protocol has been designed with the potential for network delays in mind (Pearlman et al. 2003). It holds actuators if delays are encountered and can recover from network delays and dropped connections. Though further work attempted to ensure continuous motion during testing (Mosqueda et al. 2006), the network delays encountered limited continuous actuation with this approach, causing stress relaxation in the specimens tested. All tests conducted using this approach so far have been at large timescales (taking place over hours).

Within UK-NEES a web services implementation for DHT did not prove to be the best option. While the approach can work well in the provision of support services, it does not satisfactorily address the needs of DHT, particularly for real-time testing, nor does it necessarily offer a significant advantage over using a more conventional, sockets based approach to enabling controller-to-controller communication for DHT.

To implement web services for DHT it is still necessary to solve the communication problems between a web service and each site's legacy hardware controllers and software systems. This requires customised solutions for each controller or local testing system which is to be connected. The same would be required using a conventional sockets based approach, though standard communication between clients and servers using sockets would have to accommodate the different platforms the clients and server may be hosted on.

In addition, passing control signals using the web services protocol imposes significant computational overhead and is not the fastest (lowest latency) method of network based communication. To enable DHT at rates up to real-time, low latency communication is critical; the additional complications and overhead imposed by transmitting and processing web services based control data with SOAP (utilising XML (Extensible Markup Language) based messaging transmitted over the HTTP application layer) make it a poor choice for real-time distributed control. Though a sockets based approach has to contend with additional communication restrictions (institutional firewalls), it is also the fastest practical technique for network data transfer.
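The overhead argument can be made concrete with a rough size comparison: the same two control values sent as a raw binary record occupy 16 bytes, whereas a representative SOAP 1.1 envelope carrying them (element names invented here for illustration) runs to several hundred bytes before HTTP headers are added, and must be built and parsed as XML at both ends.

```cpp
// Rough illustration of binary versus SOAP payload size (illustrative only).
#include <cstdio>
#include <cstring>

struct Command { double time; double displacement; };   // 16 bytes on the wire

int main()
{
    const char* soap =
        "<?xml version=\"1.0\"?>"
        "<soap:Envelope xmlns:soap=\"http://schemas.xmlsoap.org/soap/envelope/\">"
        "<soap:Body><cmd><time>12.345678</time>"
        "<displacement>0.00500</displacement></cmd></soap:Body>"
        "</soap:Envelope>";

    printf("binary payload: %u bytes\n", (unsigned)sizeof(Command));
    printf("SOAP payload:   %u bytes, plus HTTP headers and XML parsing at both ends\n",
           (unsigned)strlen(soap));
    return 0;
}
```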

With these in mind, an alternative approach for managing communications within UK-NEES was pursued. Outwith this initial project a web services based approach would still be favoured to provide the front end presence for each of the nodes and to integrate support services, while an optimised (fast) sockets based implementation would be developed within this project to connect partner sites for DHT. This inter-controller communication program would be developed to provide the initial connections between sites, in order to first understand the unknown problems of conducting DHT, particularly in real-time; to develop the protocols to manage the test; and, perhaps most importantly, to find solutions to ensure robust communication in the fastest possible way. The eventual aim is to integrate this socket based DHT implementation within the web services front end, to be called from but run in parallel to the web services serving other tasks. This mixed approach would be necessary to enable DHT but would also ensure the support services provision would retain the advantages of a web services approach.

In this project, sockets based communication programs were developed to enable DHT and real-time DHT between the UK-NEES sites. These would establish a generic testing protocol for DHT and, while the solutions were tailored for the test environments at each site, the same functions and approach could be applied to other testing environments in a relatively straightforward way. Since web services would not be applied at this stage of the project, additional support services were integrated within this program or used separately, without integrating them with web services, to allow robust DHT experiments to be conducted. The use of support services, for example tele-presence, would also be evaluated to establish whether they would interfere with the test (since they use the same network connection). This would provide evidence for whether it would be practical to apply a web services approach for support services, as was the aim within UK-NEES, when conducting real-time DHT.


4.0 Network usage policies and security

One of the first barriers to making client-server connections using a sockets based approach is that of institutional policies on network usage and network security. Network security is obviously important, since malicious network usage (e.g. hacking), virus infection and other security incidents that are made possible by virtue of Internet usage can not just disrupt a test but can cause damage to data and even hardware at each of the testing sites. This is a concern, since DHT is made possible by opening network ports on testing machines and by passing control data between them across a network which is outwith the control of any of the testing sites and gives shared access to anyone connected to the Internet. By virtue of allowing Internet communication between sites, a channel is opened up to probes of testing machines from malicious and non-malicious Internet users not involved in tests. The testing system, and consequently all computers on the local network at each site, may be affected by these network probes.

Network security, due to the potential threats posed by the Internet, is taken very seriously by all institutions. Each institution has multiple layers of security and specific policies to protect its users from malicious computer attacks. These are entrusted to network professionals who are in charge of maintaining a very high quality of service for all users of the local network, ensuring that the network is fully operational for most of the time and that the shared network capacity is used fairly within the local network. The networks are very robust and highly redundant, with downtime warnings for planned maintenance given in advance. Network communication and security responsibilities are also shared with the national institutional network provider (ja.net within the UK) and, through them, via international agreements, with other national institutional network providers. Naturally these responsibilities do not carry across the general Internet connections the network provides access to. Within Ja.net a service level agreement (SLA) of 99.7% uptime is quoted for all IP traffic between institutions (ja.net, 2010), though service is typically better.

Typically, all incoming connections originating from outside the local network are blocked by the institutional firewall. Usually outgoing connections are permitted, and the reply to the IP (Internet Protocol) address and port used to initiate a connection is also permitted from the server that has been connected to. To connect to the testing machines, and hence the hardware controllers, at remote sites, the appropriate IP addresses and ports at each institution must be opened to allow network traffic through the firewall. While this may seem a straightforward task, it can be an administrative challenge. To achieve this it is vital that good communications and rapport be kept with the networking personnel at all institutions involved. Since they are in charge of the network provision not just for the DHT testing system but for all other network use, they rightly have concerns whenever a new and unknown request for network usage is made.

There are many misconceptions about the networking needs of a DHT system. Initial information from NEES quoted minimum network bandwidth requirements of 100Mbps, with 1Gbps recommended, for use of their tele-presence and related tools. Network administrators are therefore alarmed at the potential need for very high bandwidth network usage, potentially limiting service to other users. In reality, the network usage for tele-presence is no higher than that of a typical user streaming a film online, is perhaps less, and is limited in duration. Network administrators also have a duty to enable the research needs of their network users to be met. A bigger concern is security: by opening ports, they are agreeing to transfer some responsibility for security to the DHT test operators.

A typical solution (as agreed for the UK-NEES point of presence servers, which provide the outside link for the UK-NEES sites) is to place DHT testing system servers inside the demilitarised zone (DMZ), a sub-network of the local network. Here they are fully able to make incoming and outgoing network connections without affecting the rest of the network and are in charge of maintaining their own security. They are often physically placed in a secure room with other DMZ servers and, in case of a security incident, a firewall protects the rest of the network. They can be used to facilitate connections to testing machines inside the institutional firewall (i.e. the controller board host PC), since they are on the same local network but provide an extra layer of security.

While this is the favoured solution on the part of the network administrator and is attractive, as full management of remote sites may be given to the DHT operator at each site, this was not the approach taken for enabling DHT within UK-NEES. Since reducing network latency to a minimum was the preferred option, in order to maximise the possibilities for real-time DHT it was decided to connect testing machines together directly through the university network connection rather than introduce an extra hop (and consequently extra latency, however small) to the test. Negotiations with network administrators at all sites involved, conducted directly by the author, other members of the UK-NEES team and partners at the other sites, ensured that network administrators were made fully aware of the network access needs of DHT.

Ports were opened on the departmental firewall between the Oxford testing machine and

testing machines at Bristol, Cambridge and Auckland universities each hosting test controller

boards, the later for initial UK-NEES, NZ-NEES testing. Each testing machine has fixed IP

address(es). At Oxford, TCP (transmission control protocol) and UDP (user datagram protocol)

traffic would be permitted between two ports, for each IP address of the testing machine


connected to at each site. The ports chosen were not well-known ports (reserved for specific applications, e.g. HTTP browsing) but unassigned registered ports.

This solution provides a good level of security since only traffic using these protocols

(the most commonly used on the Internet) is permitted between sites (Internet Control Message

Protocol (ICMP) attacks are often attempted) and only data connecting to specific ports from

specific IP addresses will be accepted by each site (return traffic from servers to clients is allowed as usual). All other traffic to a test machine is blocked by the institutional firewall at each site. While it is possible, though not trivial, for malicious users to spoof IP addresses, an attacker would have to know both which IP address to impersonate and which port to connect to.
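As a rough illustration of this address- and port-based restriction, the sketch below shows how a server-side testing machine might additionally check the source address of incoming connections at the application level. The port number and peer address are hypothetical placeholders; in UK-NEES the filtering described above was performed by the institutional firewalls rather than by DHT.exe.

    import socket

    # Hypothetical values for illustration only; real tests used fixed
    # institutional IP addresses and unassigned registered-range ports.
    LISTEN_PORT = 48000                 # illustrative registered-range port
    ALLOWED_PEERS = {"192.0.2.10"}      # documentation-range address standing in for a remote testing machine

    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind(("", LISTEN_PORT))
    server.listen(1)

    while True:
        conn, (peer_ip, peer_port) = server.accept()
        if peer_ip not in ALLOWED_PEERS:
            # Drop anything not originating from a known testing machine.
            conn.close()
            continue
        # ... exchange control data with the trusted peer here ...
        conn.close()

Such an application-level check would only be a supplement to, not a replacement for, the firewall rules described above.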

Additional security measures have been implemented for testing (between Oxford and

Bristol). Local machine (software) firewalls are used to block external network traffic outside of

testing periods, minimising the time the testing machines are exposed to external traffic, though

local firewalls are turned off during testing. Since random security incidents via port scanning are unlikely, with most Internet-based incidents arising from web browsing (inadvertent visits to compromised websites or other sites hosting malicious code), Internet browsing is not permitted (though possible) on these machines. Strict usage guidelines are in place limiting the installation of software and the use of these machines. They are regularly updated and antivirus software is used. In addition, the inner loop controller and its associated testing machine for adjusting inner loop parameters are kept off the network.

As part of the social tasks of the distributed testing system it is important to enable remote

access to the testing machines of each site. Access to the desktops and file transfer between

various testing machines can be achieved through remote desktop (Windows) or other programs

such as VNC. This has proved particularly important within UK-NEES as it allows the lead test

site operator (at Oxford) to directly access remote testing machines for troubleshooting in

development of the technique. Allowing such access has severe security implications, and each

institution has given different responses to such requests. Access to the Cambridge test machine

was not directly permitted by network administrators, while Auckland University permitted

unencrypted access using the same IP and port security measure as allowed for DHT. The most

appropriate solution to protect the computers and ensure user and data privacy at each site is the

solution provided by Bristol University to access their testing machine. Remote desktop

connections are enabled through secure tunnelling (encrypted connection) to a proxy server in

Bristol that connects to the testing machine. This also allows distributed testing in which the controller board and testing machine at Bristol act as a client hosting only numerical substructures, connected to physical substructures in Oxford, without any operator being present in Bristol.
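One common way of realising such a tunnel is SSH local port forwarding; whether the Bristol proxy used SSH or another mechanism is not specified here, so the sketch below is purely illustrative, with hypothetical host names and local port (3389 is the standard Remote Desktop port):

    import subprocess

    # Forward local port 13389, via an encrypted SSH connection to the proxy,
    # to the Remote Desktop port (3389) of the testing machine behind it.
    # Host names are placeholders; authentication details are omitted.
    subprocess.run([
        "ssh", "-N",
        "-L", "13389:testing-machine.example:3389",
        "operator@proxy.example.ac.uk",
    ])
    # The Remote Desktop client is then pointed at localhost:13389.

In this arrangement only the proxy is exposed to the remote operator, and all remote-desktop traffic between sites is encrypted in transit.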

Another security point to consider is the actual transmission of data and the software

vulnerabilities of the DHT program. In the first DHT tests, which were conducted from the

Oxford controller board to a pseudo-physical model hosted on a PC in Glasgow, outside the Ja.net network, a VPN (virtual private network) was used to bypass firewalls and to encrypt

transmitted data. Secure transmission is an option provided by NTCP and is regarded as an

important feature within NEES for DHT (Mosqueda et al. 2006). However, this approach was

soon abandoned as it was felt that secure data transmission was not a requirement for a robust

DHT system and was contrary to the minimum latency philosophy of DHT.exe.

Security is important to prevent malicious users from disrupting a test, either by compromising a computer beforehand or by interrupting or intercepting data transmission during a test. This can have severe consequences in a hybrid test, since unique or expensive test specimens may be inadvertently yielded or destroyed. Worse still, there could be a dangerous collapse of the physical substructure, though this can be prevented by the local test safety limits (which cannot be compromised via the network). In any case, a targeted and sophisticated attack is unlikely.

It is important to consider the needs of data delivery for DHT and the consequences if

secure delivery or encryption techniques are used. Firstly, what is being transmitted during DHT

is not private data, but control data. The numbers being transmitted have, in themselves, no real value to anyone other than those involved in testing (unlike, say, online banking details) and are therefore not attractive to malicious network users. Secondly, secure data delivery does not mean guaranteed delivery; malicious network users can listen in to, or block (in software or physically), secure data communication just like any other data. Guaranteed delivery is often a function of the underlying communication protocol, and delays can result as a consequence. Most importantly, encryption and subsequent decryption add significant computational overhead, and hence latency, to a test, which makes them inappropriate for real-time testing.
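The scale of this overhead can be gauged with a simple micro-benchmark of the kind sketched below, which times symmetric encryption and decryption of a small control packet. This is purely illustrative and assumes the third-party Python cryptography package; no such benchmark formed part of DHT.exe, and the packet layout is hypothetical.

    import struct
    import time
    from cryptography.fernet import Fernet  # third-party package, assumed available

    # A small control packet: a time-step number and two double-precision values,
    # representative of the command/feedback data exchanged each step.
    packet = struct.pack("!Idd", 1, 0.00123, -45.6)

    cipher = Fernet(Fernet.generate_key())
    N = 10000

    t0 = time.perf_counter()
    for _ in range(N):
        cipher.decrypt(cipher.encrypt(packet))
    t1 = time.perf_counter()

    print(f"mean encrypt + decrypt time: {1e6 * (t1 - t0) / N:.1f} microseconds per packet")

At a real-time step rate of, for example, 1 kHz, each step allows only 1 ms in total, so even a modest per-packet cryptographic cost, added to transmission and control computation, erodes the step budget.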

The fastest and simplest approach for network communication is applying socket

technology using the TCP/IP or UDP/IP communication protocols. If adequate local security measures are taken, as described above, the likelihood of a malicious incident on a testing machine is almost insignificant, though it is important to mention because of its potential to disrupt a test and the perceived importance given to it by network professionals who may not appreciate the needs of DHT. Good design of the test middleware means that protection against software vulnerabilities, such as buffer overflow attacks, should be implemented. A security incident occurring at a testing machine is unlikely to influence a test beyond causing it to stop, provided the control system has been designed with appropriate care. There have not been any known security incidents during the entire development period of UK-NEES.
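As a minimal illustration of the socket-based approach described above, the sketch below uses UDP to exchange a fixed packet of one time-step number and one command value per step. The address, port and packet layout are hypothetical placeholders rather than the actual DHT.exe format.

    import socket
    import struct

    # Hypothetical endpoint of a remote server-site testing machine.
    SERVER_ADDR = ("192.0.2.20", 48001)   # documentation-range IP, registered-range port

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(0.01)  # fail fast rather than stall the control loop indefinitely

    def exchange(step, command):
        """Send one command value for this time step and wait for the returned feedback."""
        sock.sendto(struct.pack("!Id", step, command), SERVER_ADDR)
        data, _ = sock.recvfrom(64)
        return struct.unpack("!Id", data)   # (step number, feedback value)

    if __name__ == "__main__":
        # A pure software check of the round trip for a single dummy step.
        try:
            print(exchange(0, 0.0))
        except socket.timeout:
            print("no reply within 10 ms")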

A final security measure to consider is the use of a private network connection or dedicated

line. Such a connection has been installed within the UK-NEES network. By providing a direct

connection between the UK-NEES testing labs, firewalls are no longer an administrative issue to consider, and since the connection is not physically shared with other users, security incidents are less likely. However, in practice network performance using this connection was not optimal. While it was possible to conduct good tests with it, it is believed that, because the link is software-limited to 100 Mbps, data transfer delays can be higher, particularly at the start of data transmission. This network is further discussed in Ojaghi (2010).

5.0 Social aspects of a distributed hybrid testing system

The test middleware serves two roles required by the developed DHT test protocol: the background and foreground tasks. The background tasks are responsible for ensuring that the technical requirements of the test are met; that is, they set up the test according to the client-server test architecture topology, synchronise time-steps and transmit control data between client and server(s) during the test according to the chosen test protocol. The foreground tasks, on the other hand, ensure that the critical social tasks of the distributed testing system are met during a test; they manage the coordination of human-to-human and human-to-controller interaction in the moments immediately before a test, during a test and at the end of the test.
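To make the background role concrete, the sketch below shows a highly simplified client-side loop of the kind described: at each time step a command value is sent to a server, and the loop advances only once the matching feedback has been received (TCP with Nagle's algorithm disabled to reduce latency). The function, packet format, address and port are illustrative assumptions, not the DHT.exe implementation.

    import socket
    import struct

    PACKET = struct.Struct("!Id")   # (time-step number, value), network byte order

    def recv_exact(sock, n):
        """Read exactly n bytes from a TCP socket."""
        buf = b""
        while len(buf) < n:
            chunk = sock.recv(n - len(buf))
            if not chunk:
                raise ConnectionError("connection closed mid-test")
            buf += chunk
        return buf

    def run_client(server_ip, port, commands):
        """Drive one server site with one command per time step, advancing only
        when the response carrying the same step number has been received."""
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)  # minimise latency
        sock.connect((server_ip, port))
        feedback = []
        for step, command in enumerate(commands):
            sock.sendall(PACKET.pack(step, command))
            rx_step, value = PACKET.unpack(recv_exact(sock, PACKET.size))
            if rx_step != step:
                raise RuntimeError(f"step mismatch: sent {step}, received {rx_step}")
            feedback.append(value)
        sock.close()
        return feedback

    # Example call (hypothetical address and port):
    # run_client("192.0.2.20", 48001, [0.0, 0.001, 0.002])

This lock-step pattern is only one possible synchronisation scheme; the protocols actually used in the DHT.exe series are described elsewhere in this report.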

Though the features implemented within the test middleware, linked to those implemented within the Control Desk application, enable the essential foreground tasks to be completed without requiring audio/visual contact between sites, support systems that enhance communication between distributed sites and enable audio/visual/textual contact and access to distributed testing controllers have also been applied. These, and the issues they raise, are briefly discussed in this section. Finally, distributed collaboration raises new data ownership, access and intellectual property issues; these are also briefly discussed.


5.1 Support systems

During DHT experiments several support systems were used and trialled to fulfil many of the

social tasks of the distributed testing system. The aim was both to support testing and to assess the impact, limitations and potential issues of using these technologies. These support systems all aim to enhance the user experience of the distributed testing environment. Since no single operator has full access to the entire experiment in the distributed environment, these technologies aim to give remote users as much access to the local testing environment as possible: by giving access to remote test controller host PCs when required, by increasing awareness of testing at remote sites through audio, visual and textual contact between test operators during testing, and through audio-visual contact with remote test rigs.

These support systems, with the exception of the telephone, require access to a computer and an Internet connection, though in some cases a dedicated network connection between sites is adequate (here UK Lightpath). To limit the impact of these computer-based tools on host PC performance during testing, they were generally accessed using an additional computer set up for that purpose (unless testing demanded the host PC be used).

For simplicity, these are described in turn below.

5.1.1 Remote access to testing machines

Remote access to host PC testing machines proved to be one of the most useful support systems

that was applied. This gives essentially the same level of access to a remote user that a local user

would have to host PC testing machines. Primarily, Windows Remote Desktop was used (though non-Windows software such as VNC has also been used; the network security considerations have been described above).

This was useful first to allow the test developer (the author while in Oxford) to help set up the remote testing machine in Bristol for real-time DHT by installing software and optimising the host PC operating environment. Second, it was crucial in allowing the test developer to access

remote testing machines to transfer control system files and programs before a testing session

and to check the functioning of these programs before testing. Also, since many programs and control systems were under development, remote access was used to troubleshoot them via pure software tests (including network tests) and to investigate control system/software phenomena that would occur during joint testing with the test collaborator in Bristol. Pure

software tests involving the network connecting to the controller cards in both Bristol and

Oxford were also conducted outside of normal lab test hours to check connections and


performance. These were often conducted from a third machine running two instances of Remote Desktop. In testing it proved important to have one person oversee all aspects of a test and be responsible for ensuring that all control software was correctly programmed and connected. Since a distributed control system is inextricably linked to multiple control systems across sites, one person, ideally the control system designer, is best placed to ensure that the control system is properly designed and correctly connected. However, due to the distributed nature of the experiment and its complexity, this role has to be supported by other members of the distributed testing team.

In some cases where the client was located at Bristol and hosted only the main numerical model, remote desktop was used from Oxford to access the Bristol test machine, without a local test operator being present at Bristol, to conduct real-time DHT with a physical substructure hosted in Oxford.

Finally, it was also useful to enable files to be shared between client and server sites after

testing.

5.1.2 Telephone contact

The telephone proved a useful tool during testing. It was used primarily as it provided an

audio link between test sites without using local computing resources. In two site testing it was

essential to guide remote test operators who were not familiar with standard operating procedure

or new functions. It was also very useful when conducting repeated tests, to pass instructions to

reset the test controller for the next test or to adjust variables as specified by the test operator (the

author, at Oxford). Telephone conversation benefits from the use of a headset, as it frees the operator's hands to access the multiple hardware interfaces involved in the testing system (inner

loop and outer loop controllers). In three site tests telephone conversation was limited to two site

conversations. In initial tests, one site (Cambridge) was unfamiliar with the testing systems and most conversation was between the main test operator (the client at Oxford) and the operators at Cambridge. Since only one line was available, the test operator at Bristol (who was experienced in using the testing system) was updated periodically. While the conversation would have benefited from three-way calling, it is important that the client site coordinates conversation with each site in turn. It is administratively difficult to manage multiple conversations, particularly if the one main test operator, the client, also has to manage local testing systems.

5.1.3 Instant messaging - when speaking is not possible or there are multiple sites

As an alternative to telephone conversation, instant messaging was used in some tests to allow

textual messages to be transmitted between sites during tests. This was particularly useful in


three site testing where only two sites (Oxford and Cambridge) were predominantly in telephone

contact. Simple messages could be sent to update the test operator in Bristol (by the client at

Oxford). The messages were kept two-way so as not to complicate the sending and

acknowledgement of instructions between sites. They were predominantly used by the client

operator at Oxford to inform the operator in Bristol to reset for the next test (confirmation was

only done if there was a problem). More complicated messages proved distracting and would

begin to overburden the client operator who also had to manage the other server site and local

test systems. More work was planned to implement text messaging as part of a web services

based support system. There, pre-programmed messages would be used to limit the time required for typing, and messaging could be implemented so that messages can be addressed to an individual site or broadcast to all sites with a global message, as sketched below. MSN Messenger was used for the testing conducted.
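A rough, purely hypothetical sketch of how such pre-programmed, addressable messages might be structured is given below; nothing of this kind was implemented for the tests described here, and the message codes and site names are illustrative only.

    from dataclasses import dataclass
    from typing import Optional

    # A small catalogue of pre-programmed messages, so operators need not type during a test.
    PRESET_MESSAGES = {
        "RESET": "Reset the controller and arm for the next test.",
        "HOLD": "Hold at the current position; problem at the client site.",
        "DONE": "Test complete; data logging may be stopped.",
    }

    @dataclass
    class SupportMessage:
        sender: str                    # site issuing the message, e.g. "Oxford"
        code: str                      # key into PRESET_MESSAGES
        target: Optional[str] = None   # a specific site, or None to broadcast to all sites

        def render(self) -> str:
            scope = self.target if self.target else "ALL SITES"
            return f"[{self.sender} -> {scope}] {PRESET_MESSAGES[self.code]}"

    # Example: instruct Bristol only to reset, without disturbing the other server site.
    print(SupportMessage(sender="Oxford", code="RESET", target="Bristol").render())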

5.1.4 Live editing of shared documents via Google Docs

In order to document multiple experiments in a particular testing session, shared documents were

used in two site testing between Oxford and Bristol. To reduce the burden on the main test

operator (the client) during a testing session, the server test operator would be charged with updating the document as tests progressed, though only when they could, since the client test operator would often lead experiments. Since the document could be shared live, it was a very useful way for both operators to record particular observations relevant to their test site as each test progressed.

5.1.5 Tele-presence

One of the main issues with working in a distributed testing environment when conducting DHT is

that remote operators/test participants no longer have audio visual access to all physical

substructures. The role of tele-presence is to attempt to give test participants/operators some

access to physical substructures at other sites. Tele-presence cameras were installed at each of

the three UK-NEES sites. These cameras gave local and remote operators views of the

physical substructures (and in some cases test operators). The cameras had full pan, tilt and zoom

capabilities (high levels of optical zoom are possible). Streaming over the Internet/network was

possible at various frame rates depending on desired quality and with regard to bandwidth usage

issues. However, these streams were not encrypted. The video could be recorded, though while

the cameras had microphones they were not best suited for capturing sound in the lab, nor could

sound be streamed or recorded with the software available.


Since sound can play an important role in testing (for example, capturing the sound of cracking), recording sound and video can be useful both for local testing, so that experiments may be reviewed, and in a distributed test, since these could not otherwise be sensed by remote test participants.

It is important to consider how audio visual data is used in the distributed test environment.

If too many streams are transmitted, a test operator cannot focus on events at any one particular site without missing what may be happening at another site; in addition, sound from multiple sites can be distracting. Testing labs can be noisy places, and the sound captured may be of actuation equipment and test rig connections, not just of the test specimens. This may or may not be useful. For example, in two site testing between Oxford and Bristol, sound transmitted via the telephone played an important role in allowing the client operator at Oxford to determine the quality of the experiment. Although live graphical data from the server, as observed at the client, was an indicator of test quality and distributed control, in tests where the control system was on the verge of instability, or where there were significant data loss events, the rattling sound that was heard indicated control issues, whereas when the test was well controlled no noise from the test rig was audible over the telephone. Audio played a more useful role than video in this respect.

In the testing conducted, tele-presence did not play a particularly useful role, especially since audio contact via the telephone was available. This was for a number of reasons. The physical

substructures were well known by test operators beforehand; in the case of testing with Bristol

the physical substructure remained elastic and as movement was small there was not much to

observe; and in the case of testing with Cambridge it was difficult to see the foundation pad

within the centrifuge basket as it was being moved according to commands both locally at

Cambridge and during DHT (see Ojaghi et al. 2010 for a description of the experiment).

Placement of cameras can be a difficult task, and video does not exactly replace seeing the real thing. It would also be useful if audio and video recording could be triggered at test

start.

The use of tele-presence cameras also raises two further issues. The first is related to test

privacy. The cameras may be controlled and test labs viewed by test operators at remote sites and

also interested test participants worldwide (though access is password protected). Since the

cameras can give access to other views of the testing lab and not just the systems being tested

there are privacy issues to consider. Individuals or sensitive (commercially or otherwise)

technologies under test which are not related to the DHT test may be viewed remotely. This will

require other users of local labs to consent to the use of such cameras or to ensure that such

cameras do not infringe on privacy. In the case of Bristol a strict policy regarding the use of


cameras is in place, such that cameras face the walls when not in use and other users of the lab

are warned beforehand when testing means that they might also be filmed. No such policy is in

place at Oxford and, to the best of the author's knowledge, neither is there one at Cambridge.

The second issue to consider is the impact that tele-presence can have on the quality of

testing. Tele-presence can be bandwidth intensive, and since it may share the same local Ethernet

connection to the Internet, high bandwidth usage locally may impede or clash with the critical

control signals required to conduct real-time DHT. It was therefore important to find out whether this would occur or whether the developed testing system was robust enough not to be affected by it. Testing with and without tele-presence suggested that tele-presence did not affect the running of the test, even when the host PC and tele-presence machine used the same local network connection. Additionally, it would be possible within UK-NEES to run tele-presence and DHT communication on different networks. Real-time DHT could be conducted on the UK-lightpath

communication on different networks. Real-time DHT could be conducted on the UK-lightpath

dedicated network connection while tele-presence data streamed via the Ja.net production

network, the shared Internet connection.

5.1.6 Video-conferencing

Video conferencing was originally to be used to enable operator to operator communication

during testing, though this application was never fully implemented and was largely replaced by

the other means of communication discussed. However, video conferencing did play an

important role in preliminary planning of DHT goals. Video conferences enabled high-level meetings to take place between members of the UK-NEES team and, on some occasions, participants from partner sites worldwide. These meetings proved most important in breaking the ice and in allowing key decision makers and all members of the UK-NEES team to be involved in determining the goals of DHT experiments, making agreements as to what equipment could be made available and updating progress. These meetings were often followed by personal telephone/email contact and, less often, site visits to set up and go over the finer details of proposed work one to one.

5.2 Access rights and intellectual property in a distributed testing environment

One of the social issues that emerges with working in a distributed testing environment and

especially while conducting distributed hybrid testing is that of access rights, including network

and computer access rights, data access rights, data ownership and intellectual property rights

amongst distributed test participants and sites. While this is a legal or policy issue it is worth


mentioning here as DHT testing possibly introduces new elements that have not been

encountered before.

In a distributed test multiple sites and test operators are involved in a test, and each site

may store test data generated locally and data fed back or received from remote sites. The

technology that enables the test may be predominantly developed by one test site or an individual

or it may be developed by many across the sites involved, or by an individual outside the network.

In conducting work on distributed computers, providing access to computing facilities to remote personnel outside an institution could break local computer usage policies. With UK-NEES testing, the main test developer (the author) was given formal permission by the local network administrator to gain full access to the host PC in Bristol. This was granted since the

majority of UK-NEES tests were between Oxford and Bristol and access to the testing machine

would be important to the success of the test.

Data ownership, data access rights and intellectual property are in some ways linked.

When the test is conducted across sites, who owns the data? Who should have permission to

access it?

The issue can be simple or complex. In the simple view taken within UK-NEES, data is generally shared between the sites by the participants, and those involved in the development or

generation of testing control systems, software, test rigs and data are duly acknowledged.

In a more complex view, each institution typically claims any data generated on its premises by its staff, unless otherwise agreed, as belonging to that institution; see, for example, Statute XVI: Property, Contracts, and Trusts of the statutes and regulations of Oxford University (2010). Of course, research students, particularly if they are not paid under a research contract, are not automatically legally bound by such regulations (UKBI, 2010). A further complication arises from the use of online tools such as Google Docs. Since the data is not stored at any one test site, it is not clear who owns it. A clear policy for sharing data and technology in a collaborative

framework may need to be agreed beforehand.

6.0 Summary

In this brief report a number of considerations taken in the development of the UK-NEES

distributed testing system are discussed. Firstly, the middleware development process is outlined.

Brief results are presented from selected early tests, and the operation of DHT.exe v0.3.1.1, the most successful DHT.exe series program, is described. The development of the test middleware and the extensive testing involved led to a better understanding of the testing environment, both for conducting test control and for highlighting usability and administrative issues in distributed testing. The issues encountered and how they were approached are detailed in this report. The web services approach is discussed and an alternative hybrid web services/socket approach is proposed, in which sockets would be used for critical communications and web services for integrating all support systems and for calling the software required for socket communication.

Security and network usage issues are also discussed. Good communication between

networking staff and test operators is crucial; while security is important, its importance should not be exaggerated at the expense of increased latency.

Finally, social aspects of distributed testing are described. Although a usability strategy has been developed, the report focuses on the additional systems used to support work in the distributed testing environment. Future developments of the network should seek to integrate these within web services. The issue of data ownership is also raised; since data is generated across sites, an agreement for data ownership and sharing should be made.

References

De la Flor G., Ojaghi M., Lamata Martínez I., Jirotka M., Blakeborough A., Williams M.S. (2009). Reconfiguring Practice: The Interdependence of Experimental Procedure and Computing Infrastructure in Distributed Earthquake Engineering. All Hands Meeting, Oxford.

De la Flor G., Ojaghi M., Lamata Martínez I., Jirotka M., Williams M.S., Blakeborough A. (2010). Reconfiguring Practice: The Interdependence of Experimental Procedure and Computing Infrastructure in Distributed Earthquake Engineering. Philosophical Transactions of the Royal Society A, 368, 4073-4088.

ja.net (2010). JANET Service Description Version 4, 1 August 2010 to 31 July 2011 (available on www.ja.net).

Mosqueda G., Stojadinovic B., Hanley J., Sivaselvan M., Reinhorn A. (2006). Fast Hybrid Simulation with Geographically Distributed Substructures. 17th Analysis and Computation Specialty Conference.

Ojaghi, M. (Makhzan Ojaghi, S. M.) (2010). The Development of Real-Time Distributed Hybrid Testing for Earthquake Engineering. DPhil thesis, University of Oxford.

Ojaghi, M., Lamata Martínez, I., Dietz, M., Williams, M.S., Blakeborough, A., Crewe, A., Taylor, C., Madabhushi, G., Haigh, S., Ali, A. (2010). UK-NEES - Distributed Hybrid Testing Between Bristol, Cambridge and Oxford Universities: Connecting Structural Dynamics Labs to a Geotechnical Centrifuge. 9th U.S. National and 10th Canadian Conference on Earthquake Engineering, Paper 1024.

Pearlman L., D'Arcy M., Johnson E., Kesselman C., Plaszczak P. (2003). NEESgrid Teleoperation Control Protocol. Technical Rep. NEESgrid-2003-07, NEESgrid.

Statutes and regulations of Oxford University (2010). http://www.admin.ox.ac.uk/statutes/790-121.shtml

UKBI (2010). Managing Intellectual Property: The Guide. A Guide to Strategic Decision-Making in Universities. UK Business Incubation, www.ukbi.co.uk.