Development of the UK-NEES test middleware, early tests, web services
approach, network usage, security, and usability issues in distributed hybrid testing
Mobin Ojaghi
OUEL Departmental Report no 2322/11
University of Oxford Department of Engineering Science
2011
Abstract
Development of the UK-NEES test middleware, early tests, web services approach, network usage, security, and usability issues in distributed hybrid testing
This document describes work conducted during the development of UK-NEES (the UK Network for Earthquake Engineering Simulation). It is written to supplement other work, referred to here, describing the development of real-time distributed hybrid testing. It first describes the middleware development process and some of the early problems encountered which guided later middleware designs. It also describes the workings of the DHT.exe series program v0.3.1.1, the most successful DHT.exe program. Following this, the other aspects of distributed testing identified and considered in the development of the UK-NEES real-time distributed hybrid testing system are described: the web services approach, network usage and security, and social issues including support systems and data ownership.
Acknowledgements
The UK-NEES project was an EPSRC funded project, supported by grants EP/D079101/1 (Oxford), in collaboration with EP/D080088/1 (Bristol) and EP/D079691/1 (Cambridge).
I am grateful to the EPSRC (departmental studentship), New College (travel award and writing up grant), Bristol University (EPSRC fund for travel expenses) and the IMechE (travel award) for the various funds they have provided me in the completion of this project.
I would like to thank my supervisors, Professors Martin Williams and Tony Blakeborough, for their support and encouragement throughout the course of this project. They played an important role in ensuring the feasibility of the UK-NEES project.
I would also like to thank the other members of the UK-NEES team who supported the work of the project, in particular with regards to IT matters.
Initial ideation and collaborative work in the development of the pre-DHT.exe series programs was with Dr. Javier Para Fuente. The most extensive set of development programs, leading to the first stable real-time DHT, was the DHT.exe series, developed with Mr. Ignacio Lamata Martínez. I am grateful also to Dr. Matt Dietz for his tireless support of this project during the many hours of experiments at Bristol.
I would also like to acknowledge the support of Mr. Kashif Saleem for his introduction of grid computing concepts and assistance in setting up UKLight, and the other members of the team: Mr. Jonathon Evans, Mr. Arshad Ali, Dr. Gopal Madabhushi, Dr. Stuart Haigh, Professor Colin Taylor and Dr. Adam Crewe.
I would like to thank the IT support staff at Oxford, in particular Mr. Tony Gilham but also Mr. Chris Flux, Mr. Kevin Corbett and Mr. Harry Fearnley.
At Bristol, I thank Mr. Chris Hawkins.
I am also grateful to Dr. Andreas Schellenberg and Prof. Stephen Mahin for their support during my visit to UC Berkeley.
Finally, I would like to thank everyone else who supported me during the completion of this project.
Table of Contents
1.0 Introduction 1
2.0 Middleware development process 2
2.1 Inter-controller communication 5
2.2 Early controller board testing 7
2.3 Board to board sine testing 12
2.4 DHT.exe series program 15
3.0 Web services based communication within UK-NEES 17
4.0 Network usage policies and security 20
5.0 Social aspects of a distributed hybrid testing system 24
5.1 Support systems 25
5.1.1 Remote access to testing machines 25
5.1.2 Telephone contact 26
5.1.3 Instant messaging - when speaking is not possible or there are multiple sites 26
5.1.4 Live editing of shared documents via Google Docs 27
5.1.5 Tele-presence 27
5.1.6 Video-conferencing 29
5.2 Access rights and intellectual property in a distributed testing environment 29
6.0 Summary 30
References 31
1.0 Introduction
This report supplements work discussed and referred to in Ojaghi (2010) on the development of
real-time distributed hybrid testing (DHT). It expands on some of the work presented there and
also describes additional aspects of distributed testing explored within the development process
of the UK-NEES real-time distributed hybrid testing system. A distributed hybrid testing system
introduces many new challenges to enable testing; most of the technical and social issues have
been covered in detail in Ojaghi (2010). This report focuses initially on the middleware
development process. The development of the test middleware, responsible for making direct
connections between site actuation systems, led to a better understanding of the issues involved
in this new testing environment, both technical and from an administrative and usability
perspective. Selected early test results are shown, describing findings that led to the
development of DHT.exe v0.3.1.1, the most successful DHT.exe series program used for
real-time DHT within UK-NEES and capable of conducting stable tests. It should be noted that
the extensive hybrid test results which were generated are not shown; an example describing the
issues faced is found in Ojaghi (2010). DHT.exe would in turn be superseded by the IC-DHT.exe
series programs described in Ojaghi (2010), which were used to conduct what are believed to be
the first known stable and accurate real-time DHT. That program was used together with the new
large delay compensation algorithms and data handling controllers developed for this purpose.
The web services approach is next discussed. The approach taken to make the connections
between sites was ultimately using sockets. However, originally web services were proposed.
Applying web services has significant advantages, but its disadvantages in terms of
communication latency and computational overhead meant that a socket based approach would
be more appropriate for real-time continuous DHT. To benefit from the potential of web services
while not limiting test performance it is proposed that web services be used for auxiliary tasks
related to test usability while critical communication as required for control would continue to
use sockets through a separate dedicated PC. In the current implementation of the UK-NEES test
software, all communication uses sockets. Integrating all distributed test systems with web
services, including calling sockets for transmitting control signals via an independent
processor/PC, is intended for a future stage of development.
DHT presents a new use of the Internet. An important consideration is network usage and
security in DHT. This is discussed in section 4.0. Finally, working in a distributed testing
environment not only poses technical challenges from the point of view of robust control; there
are also significant challenges to overcome to facilitate work between distributed personnel and in order
to improve the user experience. A usability strategy has been implemented within UK-NEES
which aims to simplify test procedure in order to minimise human error and reduce the burden
on test operators in an already busy testing environment. To support work in the distributed
testing environment, a range of additional tools were used; these are described in section 5.0.
Also discussed there is the potential issue of data ownership which arises as data is generated
jointly between research institutions.
2.0 Middleware development process
The test middleware (Ojaghi, 2010) provides the interface between software layers
communicating between the controller cards directly controlling local actuators and network
interface cards which enable communication between geographically distributed controller cards
across the Internet or a similar network. It is therefore used to enable distributed inter-controller
communication.
It quickly emerged that conducting robust communication between distributed sites would
be a very difficult prospect since data loss, inter-controller latency and data arrival time
variations were quite high, and robust communication could not initially be achieved. To achieve
DHT, and later, as focus shifted predominantly to real-time continuous DHT, a deep
understanding of the existing hardware and software environment had to be developed. Since
existing hardware systems had to be used as part of the requirements of the project, and to ensure
that the developed system could be easily applied to other earthquake engineering labs, work
focussed on maximising the potential of available systems. The primary aim was to explore the
possibilities for achieving communication without data loss, with minimal variation in data
arrival time and to minimise as much as possible the communication latency. Achieving robust,
minimal latency, low jitter inter-controller communication would be essential both in the
development of and in the application of control strategies for real-time distributed hybrid
testing.
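The round-trip latency and jitter figures discussed throughout this report can be measured with a simple echo test. The following Python sketch is illustrative only (the UK-NEES middleware itself was written against the Winsock API); it times UDP round trips over localhost and summarises the minimum, mean and jitter, here taken as the standard deviation of the round-trip times:

```python
import socket
import statistics
import threading
import time

def echo_server(sock, n):
    """Echo n datagrams straight back to their sender."""
    for _ in range(n):
        data, addr = sock.recvfrom(64)
        sock.sendto(data, addr)

def measure_rtt(n=200):
    """Return (min, mean, jitter) of n UDP round trips over localhost, in ms."""
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.bind(("127.0.0.1", 0))
    server.settimeout(5.0)
    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    client.settimeout(5.0)
    worker = threading.Thread(target=echo_server, args=(server, n))
    worker.start()
    rtts = []
    for _ in range(n):
        t0 = time.perf_counter()            # high-resolution monotonic timer
        client.sendto(b"ping", server.getsockname())
        client.recvfrom(64)
        rtts.append((time.perf_counter() - t0) * 1000.0)
    worker.join()
    server.close()
    client.close()
    # jitter taken here as the standard deviation of the round-trip times
    return min(rtts), statistics.mean(rtts), statistics.stdev(rtts)

if __name__ == "__main__":
    lo, mean, jitter = measure_rtt()
    print(f"min {lo:.3f} ms, mean {mean:.3f} ms, jitter {jitter:.3f} ms")
```

The same statistics, gathered against a remote host rather than localhost, give the sort of baseline figures quoted later in this report.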
This process would be facilitated through the development of the test middleware and with
extensive testing from 2006 until mid 2009 when the final test architecture and associated
control systems presented in Ojaghi (2010) were developed.
The middleware was developed in three stages. The early stage work referred to as pre-
DHT.exe (named after the communication program) focussed on achieving host PC to controller
board communication between the hardware systems at Oxford and Bristol. Later this was
extended to the testing of pseudo-physical distributed models, where purely numerical
simulations of a structural system hosted on a controller board would be connected to single and
multiple distributed numerical substructures modelling a nonlinear fluid viscous damper. These
dampers would be placed on the local host machine, a machine on the local network and
distributed first across the Internet to a machine in Glasgow (the author's home, accessed
using SSH tunnelling to provide an encrypted communication channel to overcome institutional
firewalls) and later to PCs in Bristol and Cambridge.
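As an illustration, such a tunnel can be set up with OpenSSH's local port forwarding; the hostname, username and port numbers below are hypothetical:

```shell
# Forward local port 5000 over an encrypted SSH tunnel to port 5000 on the
# remote host PC; hostname, username and port numbers are illustrative only.
ssh -N -L 5000:localhost:5000 user@remote-host.example.ac.uk
```

The middleware then connects its socket to localhost:5000, and SSH carries the traffic through the firewall over the normally open SSH port (22).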
At this stage many administrative hurdles regarding institutional policies on network
usage and security had to be overcome. In order to minimise latency, it was decided that inter-
controller communication should be achieved directly through host PC communication and not
through a proxy, i.e. the UK-NEES point of presence machines located in the DMZ (demilitarized zone)
of the local network. Since a socket architecture was pursued, ports between host PCs at the
UK-NEES sites and further afield, including at Auckland University, had to be opened on each
institution’s firewall. Further details of network usage policies, security, related issues and how
they were tackled may be found in section 4.0.
In the second and most extensive development stage, the DHT.exe series of programs was
created, building on the earlier programs. These programs were the first to enable distributed
inter-controller communication within UK-NEES, and with that the first full assessment of the
testing environment could be made. They were coded with maximum efficiency in mind, to
make best use of computational resources both locally and on the network. The primary
role of these programs was to integrate directly with the dSpace controllers and Windows
operating system for network data transfer. The DHT.exe programs were also designed to
separately integrate with shared file/memory interfaces to allow communication between the
Cambridge host PC and the programs used to link the host PC and the Cambridge local
controller, further details of which may be found in Ojaghi et al. (2010). At this stage a deeper
understanding of the workings of the testing system was gained, and many new control
strategies and improvements to the middleware were developed in response to testing. The most
important middleware developments were: discovering the limitations of single process
synchronous socket communication; finding that latency and jitter with TCP/IP communication
would be too high for real-time control purposes, particularly due to its flow control and error
control functions; finding that modifying flow control characteristics would be difficult and
could lead to saturation of local and remote PCs; and implementing the ability to directly control
time-step generation at the Bristol and Oxford controller boards, which would be essential for
enabling synchronisation between distributed controller boards. At this stage the importance of
coordinating distributed
personnel and equipment in conducting DHT was realized and led to the development of a
usability strategy which was incorporated into the DHT.exe series of programs. This usability
strategy and the pair programming of the DHT.exe series of programs was studied to assess its
wider impact on this and related eScience projects (de la Flor et al., 2009; 2010).
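One widely available mitigation of TCP's sender-side buffering, disabling Nagle's algorithm via the TCP_NODELAY option, illustrates how limited the scope for modifying TCP's behaviour is. The sketch below is in Python for brevity (the middleware itself used Winsock, which exposes the same option): it removes send coalescing, but the protocol's retransmission and window-based flow control remain active and can still stall a real-time stream.

```python
import socket

# Disable Nagle's algorithm so that small control packets are sent immediately
# rather than coalesced. This removes one source of sender-side latency, but
# TCP's retransmission and window-based flow control cannot be switched off.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
print(sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY))
```

This irreducible reliability machinery is one reason the later IC-DHT.exe work moved to UDP, where the application can decide how to handle loss.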
In the final stage of development the IC-DHT.exe program was developed. The program
used as the current UK-NEES test middleware builds extensively on the earlier programs and
was developed to address many of the limitations encountered in those programs. The program is
optimised for high performance messaging (low latency and jitter), implementing a single
process asynchronous version of UDP/IP with careful consideration of buffer sizes and,
similarly to the DHT.exe programs, written with maximum computational efficiency in mind. The
program, currently tailored for the controllers used at Bristol and Oxford (dSpace), can
implement all of the control strategies presented in Ojaghi (2010) provided it is supplied with the
correct initialisation and control data, achieving synchronisation of local and remote time-steps
through the use of a hardware based high resolution timer located on the host PC’s Intel
processor and by the ability to control the start of time-step generation and hence the outer loop
controller program on local and distributed controllers. The IC-DHT.exe program implements
the developed usability strategy, has the capability to capture data including both program and
controller board test variables, operates with soft real-time priority and runs on a dedicated
processor core. By running in a dedicated soft real-time environment, the program has a greater
chance of completing its processes in time and, by providing data capture itself, the secondary
data capture provided via dSpace software on the host PC is no longer required, so the
computational load on the host PC and controller board is further reduced. This program setup
was found to limit or eliminate data loss and further reduce latency and jitter, enabling robust
real-time DHT between Oxford and Bristol; extensive testing was conducted using the program.
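The single process asynchronous UDP pattern described above can be sketched as follows. This is an illustrative Python fragment, not the IC-DHT.exe source: a non-blocking UDP socket with an enlarged receive buffer is polled from one loop, so waiting for a datagram never stalls the control path.

```python
import socket
import time

# Non-blocking UDP socket with an enlarged receive buffer, polled from a single
# loop. SO_RCVBUF sizing and the poll stand in for the "careful consideration
# of buffer sizes" and single process asynchronous UDP/IP described above.
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 1 << 20)  # request ~1 MiB
sock.setblocking(False)                 # recvfrom never stalls the control loop
sock.bind(("127.0.0.1", 0))

peer = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
peer.sendto(b"feedback", sock.getsockname())

data = None
deadline = time.perf_counter() + 1.0    # high-resolution timer, as on the host PC
while time.perf_counter() < deadline:
    try:
        data, _ = sock.recvfrom(2048)
        break
    except BlockingIOError:
        pass                            # nothing yet; control work would continue here
print(data)
```

Because the receive call cannot block, the application decides for itself what to do when a packet is late or lost, which is exactly the freedom a real-time control loop needs.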
The extensive testing and development process meant that a deep understanding of the
hardware and software environment of the UK-NEES real-time DHT system was gained, that is,
of the functioning and interaction of the controller boards, host PCs and the networks used. The
first and most obvious issue that comes to mind with conducting robust real-time distributed
control is with attempting to pass control signals across the Internet, since this is for the most
part beyond the control of the testing sites; data is transmitted according to communication
protocols and has to compete with other network traffic. However, it is found that network
communication between the testing sites is actually quite robust. It is true that the network could
fail but this is an extremely unlikely event and down-time is usually advertised beforehand.
Robust communication across the network is facilitated with the implementation of the test
communication protocols through IC-DHT.exe and also because the service between institutional
networks is highly redundant, has a high bandwidth and routing is provided through high
computational capacity multi-processor routers capable of handling multiple packets
concurrently. In fact, it became clear that the biggest barrier to robust, low latency, low jitter
communication was not the network but the local nodes themselves. This led to an optimised
hardware and software environment set up to minimise data loss, latency and jitter.
The development and functioning of the hardware and software systems that enable robust
distributed communication and the learning outcomes that guided the development process are
discussed in Ojaghi (2010) in more detail. In the remainder of this section, inter-controller
communication between the three UK-NEES sites is discussed and brief results from early tests
are shown, including distributed sine tests which were important in understanding the issues
involved with this new testing environment. Finally, the workings of the DHT.exe series
program v0.3.1.1 are shown. The most successful DHT.exe series program, it was the first to
integrate background tasks related to data communication with foreground tasks designed to
assist usability.
2.1 Inter-controller communication
In the first stage of development of the UK-NEES distributed hybrid testing system the
initial client-server architecture was applied to connect the local testing systems at the three test
sites. This is shown in Fig. 2.1 depicting an experiment described by Ojaghi et al. (2010). Here,
the central site, Oxford, is the client connecting to the two servers at Bristol and Cambridge. A high
level view of the control layer architecture connecting the hardware systems at the three sites is
also shown. The existing hardware systems used to conduct local testing at each of the sites are
used with a multilayer control architecture to enable communications between sites.
In this case the client connects to two very different hardware systems. Oxford and Bristol
share very similar testing environments. They both use dSpace hard real-time processor boards.
This allows numerical models and control software (the outer loop) to be run onboard with a
high resolution hardware clock ensuring accurate and consistent time-steps. The boards, hosted
on a Windows PC (Win XP Pro sp2/sp3), directly command the actuator inner loop controller
and control signals are fed back to them. On the host PC, the network connection to the Internet
is used and the role of the distributed control layer is to enable communication between the
controller boards over the network. Network communication may be achieved with the boards by
using the dSpace Clib and Windows Winsock APIs (Application Programming Interfaces). In
testing, both Oxford and Bristol use dynamic hydraulic actuators and have the capability to run
real-time hybrid experiments. At Cambridge the testing environment is quite different. As real-time
testing is not a priority, Cambridge uses high load capacity, geared electrical motors that fulfil the
power requirements and, while subject to significant velocity restrictions, are relatively compact,
as required for use on the centrifuge basket. The Cambridge systems run
LabView on a Windows XP Pro (multi-process) environment to allow communication with a
ComputerBoards A/D board, regulating time-steps using a software based timer. While LabView
is used to interface with the A/D board, direct access to the memory registers of the board is
possible via a ComputerBoards software library.
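The difference between hardware and software timing can be illustrated with a software-timed fixed time-step loop of the kind a multi-process operating system must rely on. This Python sketch is illustrative only: sleeping until an absolute deadline limits drift, but the scheduler can still delay wake-ups, which is precisely the timing problem noted above for the Cambridge setup.

```python
import time

def run_steps(n, dt):
    """Run n software-timed steps of period dt seconds; return measured step times."""
    times = []
    next_tick = time.perf_counter() + dt
    for _ in range(n):
        t0 = time.perf_counter()
        # ... time-step work would go here (read sensors, update model, write commands) ...
        while time.perf_counter() < next_tick:
            time.sleep(0)          # yield to the scheduler; a real loop might sleep longer
        times.append(time.perf_counter() - t0)
        next_tick += dt            # absolute deadlines, so timing errors do not accumulate
    return times

steps = run_steps(20, 0.005)       # twenty 5 ms time-steps
print(max(steps))
```

A hardware clock on a dedicated board, as on the dSpace systems at Oxford and Bristol, generates each tick regardless of what the host operating system is doing, which is why those sites can sustain hard real-time time-steps.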
Fig. 2.1. UK-NEES three-site distributed testing system - high level control layer view
Due to the nature of the testing environments presented, two different solutions were found
for the design of the test middleware. The first forms the basis of middleware development
for real-time DHT and is tailored in this case specifically to connect to the dSpace boards at
Bristol and Oxford. The program runs on the host PC and uses the available APIs to allow
read/write access to controller board variables and to the network. The different middleware programs
use this same approach and are distinguished by how the communications protocol is
implemented and how control variables are processed. The second solution enables
communication in a more general way between the hard-real time controller board at Oxford and
the soft real-time control afforded by the LabView program running on a host PC and providing
the interface to the Cambridge ComputerBoards A/D card. A LabView based program interfaces
via common read and writes files (memory or disk based) with the DHT.exe program, receiving
commands to pass to the actuator controller and transferring feedback control signals to the
DHT.exe program. This general connection shares the same test protocol with the other sites for
transmitting messages and was developed so that any hardware platform could be connected.
However, the extra file layer makes it slower than the direct transfer of messages between PC
and controller board through the Clib API, as used at Oxford and Bristol.
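The shared-file exchange can be sketched as below. This Python fragment is purely illustrative: the file names and the "disp"/"force" message format are assumptions, and in the real system one side of the exchange is the LabView program rather than Python.

```python
import tempfile
from pathlib import Path

# Both sides exchange messages through common files. A hypothetical text
# format ("disp <value>" / "force <value>") is used purely for illustration.
tmp = Path(tempfile.mkdtemp())
cmd_file, fbk_file = tmp / "command.txt", tmp / "feedback.txt"

# Network side: write the latest command for the actuator control loop.
cmd_file.write_text("disp 0.25\n")

# LabView side (simulated here in Python): read the command, apply it to the
# actuator, and write back a feedback value.
command = cmd_file.read_text().split()
fbk_file.write_text(f"force {float(command[1]) * 100.0}\n")

# Network side: read the feedback to return to the remote numerical model.
feedback = fbk_file.read_text().strip()
print(feedback)
```

Every exchange costs two file writes and two file reads on top of the network transfer, which is where the extra latency of this general interface comes from.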
The tests conducted using this setup were among many that highlighted the problems of
conducting robust communication within a software controlled multi-process environment.
Accurately controlling software timers at Cambridge was a significant issue, and major
performance gains could be made in testing by using a more powerful host PC in Cambridge.
Testing at a slower rate reduced computational load and data loss events could largely be
avoided. However, this did not mean that testing at a slower rate alone would make robust
communication possible. In testing between Oxford and Bristol, the computational load
on the controller board and host PC and the competing processes on the host PC presented a
major limiting factor in conducting robust communication in real-time.
2.2 Early controller board testing
The first serious attempt at conducting fast DHT was made using the pre-DHT.exe programs,
and they provided the first indications of the issues involved with the testing environment. The
most significant development of the pre-DHT.exe program was to connect a numerical model
running on the Oxford controller board to a pseudo-physical model which was part of the server
pre-DHT.exe program. This is shown in Fig. 2.2.
Fig. 2.2. Pre-DHT.exe program flow connecting controller board model to pseudo-physical model on PC
The pseudo-physical model would be run either on the host PC itself (using the localhost
IP), a PC on the local network or a PC running on an external network. The program was used to
develop the connections, gain a better understanding of the testing environment and gauge the
possibilities for fast testing. The program implements the TCP/IP (Winsock) communication
protocol using a single process synchronous or blocking mode of communication which is the
most common approach for designing a sockets application. Synchronous communication means
that the program will progress only when a new value (the expected in-order packet) has been
received (with delivery all but guaranteed by the protocol).
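A minimal sketch of this blocking pattern, in Python rather than the Winsock C code actually used, is given below; each side blocks on its receive call and progresses only once the expected message arrives.

```python
import socket
import threading

def server(listener):
    """Accept one connection and echo each command back as feedback."""
    conn, _ = listener.accept()
    while True:
        data = conn.recv(64)           # blocks until the next command arrives
        if not data:
            break
        conn.sendall(b"fbk:" + data)
    conn.close()

listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
threading.Thread(target=server, args=(listener,), daemon=True).start()

client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(listener.getsockname())
reply = b""
for step in range(3):
    client.sendall(f"cmd{step}".encode())
    reply = client.recv(64)            # client blocks here until feedback returns
    print(reply.decode())
client.close()
```

The lockstep is exactly what makes the pattern simple and exactly what makes it fragile for real-time work: any delay on one side stalls both.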
Initial network testing (in 2007) between the test machine and a local network machine (in
the office above the test lab) showed a fairly consistent round trip latency of <1ms (measured
using ping on Windows XP Pro with 32 byte packets), and 6ms to Bristol. The latter seemed to
bode well for enabling real-time DHT within UK-NEES, as this, taken as an additional constant
delay, was not excessive and was within the order of magnitude compensated for using existing
local techniques.
However, since reported DHT tests conducted up to then had not even approached real-time this
was taken with some caution. The network connection was idealised as equivalent to a long
coaxial cable such as used to provide local actuation commands and feedback sensor data. Since
a remote actuator was eventually to be controlled, to enable rapid communications between the
sites the communications program would attempt to read off the controller board and send the
control data as fast as possible to the remote site, to directly actuate and return the measured
sensor response as soon as possible to be applied to the main numerical model calculations on
the control board. All communication going well, the client site numerical model would start the
test and on connecting, the remote site actuator/pseudo-physical model would receive a constant
stream of delayed commands to process and feedback.
To gauge reading and writing possibilities on and off the board, a series of initial tests
was conducted using the pre-DHT.exe program as described in Fig. 2.2, with the network delay
measured by comparing the onboard time against the timekeeping value written to the board. A
representative set of network delay results from one day of testing is shown in Fig. 2.3. A few
interesting observations can be made about these results.
Firstly, consider the local tests (lt1-9). These tests are not conducted using the
network; rather, they use the localhost IP address (127.0.0.1). This has two consequences. One,
the test is not affected by fluctuations in network usage, though it may be affected by unwanted
traffic from or to the testing machine as a consequence of the operating system or other
processes running on it. Two, since the server and the client are running on the same PC, the
host machine is working much harder. In tests lt1-4 and lt8 the tests are conducted under the
same conditions as far as could be controlled, except that in lt1 twelve variables of data are
captured off the board at 0.1ms, while in the other tests only two variables are captured.
with Control Desk (the dSpace host PC data capture program) and the pre-DHT.exe program
operates its loop as fast as possible. The board is running at 0.1ms time-steps (as was being used
at the time in local testing). The first thing to notice is that the minimum delay is quite small,
0.2ms. If all other variables that can affect the test are constant, it seems that capturing more
variables does not significantly affect the test, since the delays and delay variations in lt1 are
consistent with the other tests (lt2-4, lt8). While the minimum delay is very small, it has little
influence on the overall average delay. A few variations in response can be observed: the delay
is consistently at least 50ms, though several fluctuations occur that lead to higher delays of
around 100-500ms.
In some tests, particularly lt1, after an initial period of fairly constant delay variation, the
delay variation increases linearly through the test. While this behaviour can happen at any time,
it tends to occur towards the end of a test (a 60-80s period). Also, and lt1 is a good example,
there can be a sudden spike in the delay several times bigger than the delays occurring
consistently through the test. Sometimes (lt2) the spikes occur more frequently and have
significant variation.
Fig. 2.3 Network delay from pre-DHT.exe tests.
In tests lt5, lt6 and lt9, the board is running at 1ms time-steps and data capture is also at
1ms time-steps. The maximum delays are much lower; for large periods the maximum delay is
constant at around 5ms, though there are large variations as before, with delays up to 30-50ms.
lt9 is interesting, as the test begins with a linear increase in delay as the test proceeds, drops to a
much lower constant delay variation for a short period, then returns to the previous behaviour,
with the maximum delay progressing as if the drop in delay had not occurred. In test lt7 the
board is running at 10ms time-steps with data capture at 10ms time-steps. While there is a 10ms
time-step delay, as expected, there are no other observable fluctuations in delay, and though this
is a large delay, the consistent behaviour is what would be required for a distributed test.
Secondly, in tests e1-e4 conducted with the server running on a local network machine,
the minimum delay increases to 0.3ms for tests at 0.1ms and is 1ms for tests running at 1ms.
Similar variations in delay are observed as with the localhost tests. There seems to be no other
real observable difference between tests running on the local host PC and those with the
server distributed to the local network PC. In test e2, 12 variables are captured at 1ms time-steps
with the controller board running at 0.1ms. However, the delays are not significantly lower than
test e1 or much different from lt1.
Finally, in tests distributed to the dSpace host PC in Bristol but not running on the server
board (b1-b3), tests are conducted at 1ms time-steps, since with 1ms time-steps satisfactory
dynamic control of an actuator can be achieved and the delay performance is much better. The
minimum delay is much higher (the measured trace-route round trip time), and consequently, in
the periods which have fairly constant delay variation behaviour (e.g. b1, between the first two
25ms spikes), the delay of 10-15ms is higher than in similar periods of an equivalent test, e.g. lt5
or e4. Nevertheless, similar behaviour in terms of delay fluctuations to that observed in local
tests is found. In b1 and b3 it is difficult to ascertain whether there is a pattern between the spike
variations observed in tests with Bristol and in local tests. In test b2 linear increases in
maximum delay as the test progresses are observed (as with lt1, lt2, lt9, e1-e3).
These tests, though not statistically conclusive, do indicate that while minimum
delays to Bristol of around 6ms may be manageable with delay compensation schemes, the large
and significant fluctuations in delay make fast continuous control of a distributed actuator very
difficult under these conditions. The average delay is reduced when the controller board is
running at 1ms time-steps as compared to 0.1ms, and delay fluctuations are not observed in the
test running at 10ms with data capture at 10ms time-steps.
The results suggest that the computational load on the controller board, and, by virtue of
data capture (as the test progresses), on the host PC, significantly influences communication
performance. The larger the controller board time-step, the lower the computational load on
both controller and host PC, and the better the communication performance, both in terms of
maximum delay and in the elimination of delay fluctuations. However, at larger time-steps the
board updates its calculations less often and therefore minimum delays are large. The results
suggest a balance should be struck between computational load and controller board time-step
to minimise delays caused by communication and to eliminate fluctuations. If tests are to
progress at lower time-steps, the significant fluctuations in delay will be problematic for testing.
No clear pattern emerges to explain these delay
fluctuations. There are various possibilities, from variations in how data is captured to how
processes are scheduled on the host PC, and perhaps the network itself. At times there appears to
be a saturation of computational capacity as tests progress, which leads to increasing delays, but
this behaviour cannot be predicted, any more than the random spikes in delay.
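The kind of summary used in this discussion, minimum delay, average delay and spike flagging, can be sketched as follows; the trace values and the spike threshold are illustrative, not measured data or the criteria used in the tests.

```python
import statistics

def summarise_delays(delays_ms, spike_factor=5.0):
    """Return (min, mean, median, spikes) for a delay trace, flagging samples
    more than spike_factor times the median as spikes."""
    med = statistics.median(delays_ms)
    spikes = [d for d in delays_ms if d > spike_factor * med]
    return min(delays_ms), statistics.mean(delays_ms), med, spikes

# Synthetic trace: a steady ~5 ms delay with two spikes, loosely mimicking the
# behaviour of the 1 ms time-step tests; the numbers are not measured data.
trace = [5.0, 5.1, 4.9, 5.0, 50.0, 5.2, 5.0, 30.0, 5.0]
lo, mean, med, spikes = summarise_delays(trace)
print(lo, med, spikes)
```

Comparing the median against the spike list makes the distinction drawn above explicit: a compensable constant delay versus unpredictable fluctuations several times its size.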
In order to further investigate reading and writing performance on and off the board, a
variety of additional tests were conducted, reading and writing off the board without using
sockets. These tests were conducted on a variety of board / host machine configurations
(including at UC Berkeley, using a dSpace DS1104 card hosted on a PC with the following
specifications: Intel Xeon 1.5GHz, 1GB RAM, Windows XP Pro sp2). In all cases reading and
writing speeds would deteriorate slightly to a fairly consistent mean as tests were repeated, until
the PC was restarted. In some cases, the second time a test was run (after being compiled and run
on the board), the average reading and writing speed (as measured using an interpolated operating
system tick count on the pre-DHT.exe program which would run as fast as possible) would
increase and consequently more read write operations could be completed during the test. The
behaviour of the dSpace board is expected to be fairly consistent but there seemed to be a
relationship between what the host PC was doing and the performance of the test. To ensure
reliable tests, a host PC restart was usually required. The reason read/write speeds
deteriorated with time is unclear, though it is known that the longer a PC runs, the more idle
tasks/threads from previously run programs remain in memory, which can often
degrade PC performance (this memory may be cleared by a ProcessIdleTasks call in
Windows). By increasing the computational load placed on the PC, these could increase delays as
performance reduced. The different machines showed slight variations in read/write
performance, and though machines with greater processing power seemed to perform slightly
better, the differences could not be determined conclusively with the tests conducted. At this
stage, testing strongly indicated that the host PC, particularly because of the multi-process
nature of its operating system, played an important role in determining test performance.
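The rate measurement described above (an interpolated operating-system tick count, with the program running as fast as possible) can be sketched as follows. This is a minimal Python illustration, not the original pre-DHT.exe code; `rw_op` is a hypothetical stand-in for a single board read/write operation.

```python
import time

def measure_rw_rate(rw_op, duration_s=0.1):
    """Run rw_op back-to-back for duration_s seconds and return
    the achieved rate in operations per second."""
    count = 0
    start = time.perf_counter()  # high-resolution monotonic timer
    while time.perf_counter() - start < duration_s:
        rw_op()
        count += 1
    elapsed = time.perf_counter() - start
    return count / elapsed
```

Comparing the rate returned before and after a host PC restart (or a ProcessIdleTasks call) reproduces the kind of comparison made in these tests.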
After these tests the host PC software used for programming the control system was updated:
Matlab to version 2006b at both Oxford and Bristol, and the dSpace software to release 5.3, so
that the latest software-based performance enhancements would be available and, more
importantly, so that in development there were no compatibility issues with the developed code
(testing with future and previous releases of the dSpace software installed is possible, and tests
do not rely on the same versions being installed at all sites). This was the software used for the
remainder of testing.
Since the Oxford dSpace machine was coming to the end of its life and had significant
performance issues, the test machine was upgraded. The replacement machine was chosen
carefully (and specifically customized), as it was known that its performance could affect the
test. In addition, an unrelated hardware fault on the Bristol test machine meant that it was also
upgraded. The upgrades were consistent with the budgets available and the machine
specifications are presented in Ojaghi (2010).
2.3 Board to board sine testing
The tests presented in the previous section gave an indication of the communication issues faced
in controller board to PC to network communication. These tests were extended to attempt
distributed control of an actuator with a series of open loop (outer loop command) sine tests.
This was achieved by enabling inter-controller communication between Oxford and Bristol.
These tests were also an opportunity to explore what control signals would be required to enable
distributed hybrid testing. Two alternative communication models are presented in Fig. 2.4.
Fig. 2.4 Inter-controller control signals: minimal (top), and with additional control variables (bottom).
To keep the computational load involved in inter-controller communication to an
absolute minimum, and so encourage robust communication, the top model of Fig. 2.4
transmits only one command variable, with only one variable returned to provide feedback to a
potential numerical model. The program also reads and writes the board time variable to
compute the network delay at the client. In the lower model two variables are transmitted to
control a remote actuator (a time-stamp and a displacement command). Three variables are
returned: a time-stamp (this could be the local time read off the server controller board or, as
shown here, the client time received, written to the controller board on arrival, read and sent
back), the displacement achieved and the force achieved. The client also writes the time-kept
value back to the board when writing data that has arrived. The second model is more
computationally intensive but also provides more flexible control, since the displacement
variables returned can be used for client variable delay compensation and the time signals can
be used as control variables to detect delays and data loss. This functionality proved essential as
more complex control was attempted, and most DHT.exe series programs applied it.
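The lower model of Fig. 2.4 can be sketched as a wire format. The following is an illustrative Python sketch under assumed conventions (little-endian doubles; the actual DHT.exe layout may differ):

```python
import struct

# Client -> server: time-stamp, displacement command (assumed layout)
CMD_FMT = "<2d"
# Server -> client: echoed time-stamp, displacement achieved, force achieved
FBK_FMT = "<3d"

def pack_command(t_client, disp_cmd):
    return struct.pack(CMD_FMT, t_client, disp_cmd)

def unpack_feedback(payload):
    return struct.unpack(FBK_FMT, payload)  # (t_echo, disp_achieved, force)

def network_delay(t_now, t_echo):
    # The echoed client time, written to the board on arrival and sent back,
    # lets the client compute the round-trip delay as a control variable
    return t_now - t_echo
```

The echoed time-stamp is what allows delays and data loss to be detected at the client, as described above.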
A series of open loop constant amplitude and frequency sine tests were conducted, first
using the top model (Fig. 2.4) and then the bottom model (Fig. 2.4), attempting to control a
single distributed actuator from the one storey test rig described in Ojaghi (2010), hosted in
Bristol, from the controller board in Oxford. The control loops described in Fig. 2.4 would be
applied and the DHT.exe series program would run as fast as possible, applying single-process
synchronous TCP/IP.
The tests shown all attempt to command the actuator at 5 mm, 1 Hz. Due to the mode of
communication a stepped signal is received at Bristol, and this is commanded directly to the
actuator. The signal is updated around every 7 ms or so and the actuator for the most part
responds well. However, various issues arise. Firstly, the top communication model of Fig. 2.4
is considered, with sample results shown in Fig. 2.5. When data does not arrive for one of these
7 ms steps, it can arrive slightly later, slightly earlier, or not at all. With data not arriving or
arriving late, the rate of loading at the actuator reduces, as shown in the left zoomed view of
Fig. 2.5. As the test progresses, in some cases there is significant saturation in communication
performance, leading to increasingly large delay spikes. This causes large data loss events where
data is not written to the server controller board. This is likely client or server host PC or
controller board saturation, where either the host PC drops the data or the controller board
blocks read/write access. The consequence, shown in the middle view of Fig. 2.5, is that the desired
waveform cannot be reliably produced. The final important observation from these tests is that
user interrupts significantly affect test performance. Mouse clicking, accessing menus, etc. in a
random way at client or server can cause large data loss events as reading and writing are
stopped to process the user interrupt. This has severe consequences for the test: the actuator
will exhibit holding behaviour, since the command is not updated for some time, and will then
jump to the new command, often quite some distance away, as communication resumes (Fig.
2.5 right; client and server time not in sync). This would be a likely source of instability in a
real-time hybrid test, not to mention incorrectly imparting severe dynamic loading to a physical
substructure. Due to the way Windows process scheduling is conducted, user interrupts have
higher priority than standard programs. Another example is when the data variables are printed
to screen by the DHT.exe program shell as they are read off or written to the board. Though the
computational load on the PC increases, since visual tasks are given priority over competing
Windows processes, communication performance can improve as DHT.exe is given more
priority.
Fig. 2.5 Distributed sine tests using top model of Fig. 2.4.
In Fig. 2.6 the tests are repeated using the bottom model shown in Fig. 2.4. It is clear that,
though for the most part the sine waveform can be reproduced and the actuator correctly loaded,
variable spikes in delay again lead to small loading issues, while user interrupts can cause large
data loss events, causing the actuator to hold and then ramp to a new target position once data
loss stops.
Fig. 2.6 Distributed sine tests using bottom model of Fig. 2.4.
2.4 DHT.exe series program
In Fig. 2.7 the workings of the DHT.exe program v0.3.1.1 are shown. The most successful
DHT.exe series program, it was used for extensive testing within UK-NEES. Using a single-
process synchronous mode of TCP/IP, it is shown here connecting two sites using the client-
server architecture applied by UK-NEES; the sites in this case host dSpace control hardware,
and the program is specifically optimised for connecting to this hardware platform. Both the
background tasks, relating to the passing of critical control data between sites, and the
foreground tasks, relating to usability, are shown. This program, later superseded by
IC-DHT.exe, exhibits many of the features used to conduct stable and accurate real-time DHT
within UK-NEES. In particular, it applies a socket architecture to minimise latency and
computational overhead.
In the remaining parts of this report, other aspects of distributed testing considered by UK-
NEES are discussed. In the next section the use of web services is discussed; this is followed by
a discussion of security and network usage. Social aspects of distributed testing are then
considered: while a usability strategy is built into the operation of the test middleware and
associated programs, to coordinate and inform distributed test personnel of activities and to
provide live test updates, support systems can play a vital role in distributed testing.
Distributed testing also introduces issues of data ownership, which are discussed. Finally a brief
summary is made.
Fig. 2.7 The workings of DHT.exe v0.3.1.1 connecting two sites using dSpace controllers and showing both
foreground and background tasks.
3.0 Web services based communication within UK-NEES
The grid services concepts originally proposed for UK-NEES (Saleem et al. 2008) sought to
introduce a web services approach to enable operation of the UK-NEES Grid. Web services
would be used to integrate the provision of robust distributed inter-controller communication as
required for DHT with provision of support services necessary to fulfil social tasks to assist work
in the distributed environment. These services included tele-presence, to enable remote viewing
of tests (including test data); tele-participation, to enable remote viewing and participation in
tests; tele-operation, remote operation of tests (including controlling scientific equipment by
enabling locally generated commands to be selected by remote participants); and provision of
storage and online access to data.
An online presence would be created to serve as the front-end external access point to all of
the facilities at each local site. To enable DHT, web services would be developed to plan and
prepare experiments between sites beforehand and, during actual testing, to manage human-to-
human and controller-to-controller interaction. The web services layer would sit over existing
legacy systems, providing a seamless link between the different hardware platforms used at each
site. Different web services (for example, controller web services or physical web services)
would be developed for the different aspects of a test. To support work in the distributed
environment, existing open source tools provided by it.NEES.org would be used or adapted to
enable tele-presence and tele-participation, integrating acquired tele-presence cameras and
videoconferencing equipment. In addition, online access would be given to locally stored
data. Data repositories would be developed for each site to organise, and give access rights to,
data that had previously not been stored in an accessible format.
While a sockets-based approach is the more conventional form of Internet communication,
a web services approach to network communication was considered since it can overcome
problems encountered when using sockets. A grid architecture based on web services resolves
communication restrictions introduced by each site's network security restrictions, network
configurations or firewall policies. In a sockets-based approach these multiple limitations to
communication must be solved in a static way, with customized solutions developed for each
problem encountered. However, since web services use standard HTTP communication (as used
for web browsing) over a standard transport protocol, these problems can be avoided.
In addition a web services approach aims to avoid platform dependencies and to allow high
scalability in the test architecture. It seeks to enable new sites and components to be added to the
repository of resources available to the distributed testing system in a relatively straightforward
way. Since each site involved in a distributed test may use different hardware and software
systems, distributed communication can be limited to sites using the same systems unless some
form of standard communication is developed between them.
By applying a web services approach, distributed test modules hosted on different testing
software environments can interact in a standard way, using a shared distributed test protocol
over the web services communication protocol (SOAP, Simple Object Access Protocol), to
allow interoperability between the different site systems. They operate as web-based APIs
(Application Programming Interfaces). By adding a web service interface in front of existing
testing systems, these become accessible to any other component in the testing system,
independent of platform, programming language or paradigm.
This approach, using many of the existing technologies developed for grid-based
computing and collaboration, is an attractive solution to the communication requirements of the
UK-NEES distributed testing system, offering an easy to use yet powerful way of orchestrating
distributed experiments. It shares many of the benefits and features developed by NEES to
enable distributed testing using NTCP (NEES tele-operation control protocol), applying web
services based communication with a robust protocol for test communication. This protocol has
been designed with the potential for network delays in mind (Pearlman et al. 2003): it holds
actuators if delays are encountered and can recover from network delays and dropped
connections. Though further work attempted to ensure continuous motion during testing
(Mosqueda et al. 2006), the network delays encountered limited continuous actuation with this
approach, causing stress relaxation in the specimens tested. All tests conducted using this
approach so far have been at large timescales (taking place over hours).
Within UK-NEES a web services implementation for DHT did not prove to be the best
option. While the approach can work well for the provision of support services, it does not
satisfactorily address the needs of DHT, particularly for real-time testing, nor does it necessarily
offer a significant advantage over a more conventional sockets-based approach to enabling
controller-to-controller communication for DHT.
To implement web services for DHT it is still necessary to solve the communication
problems between a web service and each site's legacy hardware controllers and software
systems. This requires customized solutions for each controller or local testing system to be
connected. The same would be required using a conventional sockets-based approach, though
standard communication between clients and servers using sockets would have to accommodate
the different platforms the clients and servers may be hosted on.
In addition, passing control signals using the web services protocol imposes significant
computational overhead and is not the fastest (lowest latency) method of network-based
communication. To enable DHT at rates up to real-time, low-latency communication is critical;
the additional complications and overhead imposed by transmitting and processing web
services based control data with SOAP (utilising XML (Extensible Markup Language) based
messaging transmitted over the HTTP application layer) make it a poor choice for real-time
distributed control. Though a sockets-based approach has to contend with additional
communication restrictions (institutional firewalls), it is also the fastest practical technique for
network data transfer.
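The overhead argument can be made concrete by comparing a SOAP envelope carrying one command with its raw binary equivalent. The envelope below is a hypothetical illustration (the element names are invented, not an NTCP or UK-NEES message format):

```python
import struct

# Hypothetical SOAP envelope carrying one time-stamped displacement command
soap_msg = (
    '<?xml version="1.0"?>'
    '<soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope">'
    '<soap:Body><SetDisplacement><time>1.234</time>'
    '<command>5.0</command></SetDisplacement></soap:Body>'
    '</soap:Envelope>'
).encode()

# The same two values sent as raw little-endian doubles over a socket
binary_msg = struct.pack("<2d", 1.234, 5.0)

print(len(soap_msg), len(binary_msg))
```

The XML message is more than ten times larger, before HTTP headers or the cost of parsing are even considered, which is why SOAP is unattractive at millisecond time-steps.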
With these considerations in mind, an alternative approach for managing communications
within UK-NEES was pursued. Outside this initial project, a web services based approach would
still be favoured to provide the front-end presence for each of the nodes and to integrate support
services, while an optimised (fast) sockets-based implementation would be developed within
this project to connect the partner sites for DHT. This inter-controller communication program
would be developed to provide the initial connections between sites, in order first to understand
the unknown problems of conducting DHT, particularly in real-time; to develop the protocols to
manage the test; and, perhaps most importantly, to find solutions that ensure robust
communication in the fastest possible way. The eventual aim was to integrate this socket-based
DHT implementation within the web services front end, to be called from, but run in parallel
with, the web services serving other tasks. This mixed approach would be necessary to enable
DHT but would also ensure that the support services provision retained the advantages of a web
services approach.
In this project, sockets-based communication programs were developed to enable DHT and
real-time DHT between the UK-NEES sites. These would establish a generic testing protocol for
DHT and, while the solutions were tailored to the test environments at each site, the same
functions and approach could be applied to other testing environments in a relatively
straightforward way. Since web services would not be applied at this stage of the project,
additional support services were integrated within this program, or used separately without
integrating them with web services, to allow robust DHT experiments to be conducted. The use
of support services, for example tele-presence, would also be evaluated to establish whether they
would interfere with the test (since they use the same network connection). This would provide
evidence of whether it would be practical to apply a web services approach for support services,
as was the aim within UK-NEES, when conducting real-time DHT.
4.0 Network usage policies and security
One of the first barriers to making client-server connections using a sockets based approach is
that of institutional policies on network usage and network security. Network security is
obviously important, since malicious network usage (e.g. hacking), virus infection and other
security incidents made possible by Internet usage can not only disrupt a test but can also
damage data, and even hardware, at each of the testing sites. This is a concern, since DHT is
made possible by opening network ports on testing machines and by passing control data
between them across a network which is outside the control of any of the testing sites and is
shared with anyone connected to the Internet. By allowing Internet communication between
sites, a channel is opened to probes of the testing machines from malicious and non-malicious
Internet users not involved in tests. The testing system, and consequently all computers on the
local network at each site, may be affected by these probes. Network security, given the
potential threats posed by the Internet, is taken very
seriously by all institutions. Each institution has multiple layers of security and specific policies
to protect its users from malicious computer attacks. These are entrusted to network
professionals who are in charge of maintaining a very high quality of service for all users of the
local network, ensuring that the network is fully operational for most of the time and that the
shared network capacity is used fairly within the local network. The networks are very robust
and highly redundant, with downtime warnings for planned maintenance given in advance.
Network communication and security responsibilities are also shared with the national
institutional network provider (JANET within the UK) and, through them, via international
agreements, with other national institutional network providers. Naturally these responsibilities
do not carry across the general Internet connections the network provides access to. Within
JANET a service level agreement (SLA) of 99.7% uptime is quoted for all IP traffic between
institutions (ja.net, 2010), though service is typically better.
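For context, the quoted SLA translates into an allowable downtime that can be computed directly:

```python
# Maximum yearly downtime implied by a 99.7% uptime service level agreement
hours_per_year = 365 * 24
max_downtime_hours = hours_per_year * (1 - 0.997)
print(round(max_downtime_hours, 1))  # about 26.3 hours per year
```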
Typically, all incoming connections, originating from outside the local network are
blocked by the institutional firewall. Usually outgoing connections are permitted and the reply to
the IP (Internet Protocol) address and port used to initiate a connection is also permitted from the
server that has been connected to. To connect to the testing machines and hence hardware
controllers at remote sites the appropriate IP addresses and ports at each institution must be
opened to allow network traffic through the firewall. While this may seem a straightforward
task, it can be an administrative challenge. To achieve this it is vital that good communication
and rapport be maintained with the networking personnel at all institutions involved. Since they
are in charge of network provision not just for the DHT testing system but for all other network
use, they rightly have concerns whenever a new and unknown request for network usage is
made.
There are many misconceptions about the networking needs of a DHT system. Initial
information from NEES quoted a minimum network bandwidth requirement of 100 Mbps, with
1 Gbps recommended, for use of their tele-presence and related tools. Network administrators
are therefore alarmed at the potential for very high bandwidth usage, potentially limiting service
to other users. In reality, the network usage for tele-presence is no higher than that of a typical
user streaming a film online (perhaps less) and is limited in duration. Network administrators
also have a duty to enable the research needs of their network users to be met. A bigger concern
is security: by opening ports, they are agreeing to transfer some responsibility for security to the
DHT test operators.
A typical solution (as agreed for the UK-NEES point-of-presence servers, which provide
the outside link for the UK-NEES sites) is to place DHT testing system servers inside a
demilitarised zone (DMZ), a sub-network of the local network. Here they are fully able to make
incoming and outgoing network connections without affecting the rest of the network, and are
responsible for maintaining their own security. Often physically placed in a secure room with
other DMZ servers, they are separated from the rest of the network by a firewall in case of a
security incident. They can be used to facilitate connections to testing machines inside the
institutional firewall (i.e. the controller board host PC), since they are on the same local
network, while providing an extra layer of security.
While this is the favoured solution on the part of the network administrators, and is
attractive since full management of remote sites may be given to the DHT operator at each site,
this was not the approach taken for enabling DHT within UK-NEES. Since reducing network
latency to a minimum was the preferred option, in order to maximise the possibilities for real-
time DHT, it was decided to connect the testing machines together directly through the
university network connection rather than introduce an extra hop (and hence extra latency,
however small) into the test. Negotiations with the network administrators at all the sites
involved, conducted directly by the author, by other members of the UK-NEES team and by
partners at the other sites, ensured that the network administrators were made fully aware of the
network access needs of DHT.
Ports were opened on the departmental firewall between the Oxford testing machine and
testing machines at Bristol, Cambridge and Auckland universities, each hosting test controller
boards, the latter for initial UK-NEES/NZ-NEES testing. Each testing machine has fixed IP
address(es). At Oxford, TCP (Transmission Control Protocol) and UDP (User Datagram
Protocol) traffic would be permitted between two ports for each IP address of the testing
machine connected to at each site. The ports chosen were not well-known ports (reserved for
specific applications, e.g. HTTP browsing) but unassigned registered ports.
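As an illustration, the kind of rule described might look as follows on a Windows testing machine's own firewall (the port numbers and remote address here are invented examples, not the actual UK-NEES values; the departmental firewall rules themselves were configured by the network administrators):

```
rem Allow DHT traffic only from the partner site's fixed IP, on chosen ports
netsh advfirewall firewall add rule name="DHT TCP (partner site)" dir=in action=allow protocol=TCP localport=48100 remoteip=203.0.113.10
netsh advfirewall firewall add rule name="DHT UDP (partner site)" dir=in action=allow protocol=UDP localport=48101 remoteip=203.0.113.10
```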
This solution provides a good level of security, since only traffic using these protocols
(the most commonly used on the Internet) is permitted between sites (Internet Control Message
Protocol (ICMP) attacks, for example, are often attempted) and only data connecting to specific
ports from specific IP addresses will be accepted by each site (return traffic from servers to
clients is allowed as usual). All other traffic to a test machine is blocked by the institutional
firewall at each site. While it is possible, though perhaps not trivial, for malicious users to
impersonate IP addresses, such an attack would require knowing which IP address to
impersonate and which port to connect to.
Additional security measures have been implemented for testing (between Oxford and
Bristol). Local machine (software) firewalls are used to block external network traffic outside of
testing periods, minimising the time the testing machines are exposed to external traffic, though
the local firewalls are turned off during testing. Since random security incidents via port
scanning are unlikely, with most Internet-based incidents occurring due to Internet browsing
(inadvertent visits to compromised websites or other sites hosting malicious code), Internet
browsing is not permitted (though possible) on these machines. Strict usage guidelines are in
place limiting the installation of software and the use of these machines. They are regularly
updated and antivirus software is used. In addition, the inner loop controller and its associated
testing machine for adjusting inner loop parameters are kept off the network.
As part of the social tasks of the distributed testing system it is important to enable remote
access to the testing machines of each site. Access to the desktops and file transfer between
various testing machines can be achieved through remote desktop (Windows) or other programs
such as VNC. This has proved particularly important within UK-NEES as it allows the lead test
site operator (at Oxford) to directly access remote testing machines for troubleshooting in
development of the technique. Allowing such access has severe security implications, and each
institution has responded differently to such requests. Access to the Cambridge test machine
was not directly permitted by network administrators, while Auckland University permitted
unencrypted access using the same IP and port security measures as allowed for DHT. The most
appropriate solution for protecting the computers and ensuring user and data privacy at each site
is that provided by Bristol University for access to their testing machine: remote desktop
connections are enabled through secure tunnelling (an encrypted connection) to a proxy server
in Bristol that connects to the testing machine. This also allows testing in which the distributed
controller board and testing machine at Bristol act as a client hosting only numerical
substructures to physical substructures in Oxford, without any operator being present in Bristol.
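A tunnel of the kind used at Bristol can be sketched as follows (the hostnames are invented placeholders):

```
# Forward local port 3389 through an encrypted SSH tunnel to the proxy,
# which relays to the testing machine's Remote Desktop port
ssh -L 3389:test-machine.internal:3389 operator@proxy.bris.example
# A Remote Desktop client pointed at localhost:3389 then reaches the
# testing machine over the encrypted connection
```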
Another security point to consider is the actual transmission of data and the software
vulnerabilities of the DHT program. In the first DHT tests, which were conducted from the
Oxford controller board to a pseudo-physical model hosted on a PC in Glasgow, outside the
JANET network, a VPN (virtual private network) was used to bypass firewalls and to encrypt
transmitted data. Secure transmission is an option provided by NTCP and is regarded as an
important feature within NEES for DHT (Mosqueda et al. 2006). However, this approach was
soon abandoned, as it was felt that secure data transmission was not a requirement for a robust
DHT system and was contrary to the minimum-latency philosophy of DHT.exe.
Security is important to prevent malicious users from disrupting a test, either by
compromising a computer beforehand or by interrupting or intercepting data transmission
during a test. This can have severe consequences in a hybrid test, since unique or expensive test
specimens may be inadvertently yielded or destroyed; worse still, there could be a dangerous
collapse of the physical substructure, though this can be prevented by the local test safety limits
(which cannot be compromised via the network). However, a targeted and sophisticated attack is
unlikely.
It is important to consider the needs of data delivery for DHT and the consequences if
secure delivery or encryption techniques are used. Firstly, what is being transmitted during
DHT is not private data but control data. The numbers being transmitted have, in and of
themselves, no real value to anyone other than those involved in testing (unlike the transmission
of online banking details) and are therefore not attractive to malicious network users. Secondly,
secure data delivery does not mean guaranteed delivery: malicious network users can listen in
to, or block (with software or physically), secure data communication just like any other data.
Guaranteed delivery is a function of the underlying communication protocol, and delays can
result as a consequence. Most importantly, encryption and subsequent decryption add
significant computational overhead, and hence latency, to a test, which makes them
inappropriate for real-time testing.
The fastest and simplest approach to network communication is to apply socket
technology using the TCP/IP or UDP/IP communication protocols. If adequate local security
measures are taken, as described, the likelihood of a malicious incident on a testing machine is
almost insignificant, though it is important to mention because of its potential to disrupt a test
and the perceived importance given to it by network professionals, who may not appreciate the
needs of DHT. Good design of the test middleware means that protection against software
vulnerabilities, such as buffer overflow attacks, should be implemented. A security incident
occurring at a testing machine is unlikely to influence the test other than by causing it to stop, if
the control system has been designed with appropriate care. There have not been any known
security incidents during the entire development period of UK-NEES.
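A minimal sketch of such a single-process synchronous exchange is given below in Python (illustrative only: a local socket pair stands in for the inter-site TCP link, and the two-variable command / three-variable feedback layout of Fig. 2.4 is assumed):

```python
import socket
import struct

def serve_step(conn):
    """Server side: block for one command, pass it to the (notional) controller
    board, then return the achieved values; here the command is simply echoed."""
    t_client, disp_cmd = struct.unpack("<2d", conn.recv(16))
    force_achieved = 0.0  # placeholder for the measured force feedback
    conn.sendall(struct.pack("<3d", t_client, disp_cmd, force_achieved))

# One synchronous step, with a local socket pair standing in for the link
client, server = socket.socketpair()
client.sendall(struct.pack("<2d", 0.007, 5.0))  # time-stamp, displacement
serve_step(server)
t_echo, disp_achieved, force = struct.unpack("<3d", client.recv(24))
```

Over a real TCP connection, setting the TCP_NODELAY socket option (disabling Nagle's algorithm) is a common further step to minimise the latency of such small, frequent messages.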
A final security measure to consider is the use of a private network connection or dedicated
line. Such a connection has been installed within the UK-NEES network. By providing a direct
connection between the UK-NEES testing labs, firewalls are no longer an administrative issue
and, since the connection is not physically shared with other users, security incidents are less
likely. However, in practice network performance over this network was not optimal. While it
was possible to conduct good tests with it, it is believed that, as it is software-limited to
100 Mbps data transfer, delays can be higher, particularly at the start of data transmission. This
network is further discussed in Ojaghi (2010).
5.0 Social aspects of a distributed hybrid testing system
The test middleware serves two roles required by the developed DHT test protocol: the
background and foreground tasks. The background tasks are responsible for ensuring the
technical requirements of the test are met; that is, they set up the test according to the client-
server test architecture topology, synchronise time-steps, and transmit control data between
client and server(s) during the test according to the type of test protocol chosen. The
foreground tasks, on the other hand, ensure the critical social tasks of the distributed testing
system are met during a test; they manage the coordination of human-to-human and human-to-
controller interaction in the moments immediately before a test, during a test and at the end of
the test.
Though the features implemented within the test middleware, linked to those implemented
within the ControlDesk application, enable the essential foreground tasks to be completed
without requiring audio/visual contact between sites, support systems that enhance
communication between distributed sites and enable audio/visual/textual contact, and access to
distributed testing controllers, have been applied. These, and the issues they raise, are briefly
discussed in this section. Finally, distributed collaboration raises new data ownership, access
and intellectual property issues; these are also briefly discussed.
5.1 Support systems
During DHT experiments several support systems were used and trialled to fulfil many of the
social tasks of the distributed testing system. The aim was both to support testing and to assess
the impact, limitations and potential issues of using these technologies. These support
systems all aim to enhance the user experience of the distributed testing environment. Since in
the distributed environment no single operator has full access to the entire experiment, these
technologies all aim to give remote users as much access to the local testing environment as
possible: by giving access to remote test controller host PCs when required, by increasing
awareness of testing at remote sites through audio, visual and textual contact between test
operators during testing, and through audio/visual contact with remote test rigs.
These support systems, with the exception of the telephone, require access to a computer
and an Internet connection, though in some cases a dedicated network connection between sites
is adequate (here UK Lightpath). To limit the impact on host PC performance during testing of
the many tools that rely on the use of a computer, these support systems were generally accessed
using an additional computer set up for that purpose (unless testing demanded the host PC be
used).
For simplicity, these are listed below:
5.1.1 Remote access to testing machines
Remote access to host PC testing machines proved to be one of the most useful support systems
applied. It gives a remote user essentially the same level of access to host PC testing machines
as a local user would have. Primarily, Windows Remote Desktop was used (though
non-Windows software such as VNC has also been used; the network security considerations
have been described).
This was useful first in allowing the test developer (the author, while in Oxford) to help set up
the remote testing machine in Bristol for real-time DHT by installing software and optimising
the host PC operating environment. Second, it was crucial in allowing the test developer to access
remote testing machines to transfer control system files and programs before a testing session
and to check the functioning of these programs before testing. Third, since many programs and
control systems were under development, it allowed these to be troubleshot via pure software
tests (including network tests), and control system/software phenomena that would arise during
joint testing with the test collaborator in Bristol to be investigated. Pure
software tests involving the network connection to the controller cards in both Bristol and
Oxford were also conducted outside of normal lab test hours to check connections and
performance. These were often conducted by a third machine running two instances of Remote
Desktop. In testing it proved important to have one person oversee all aspects of a test and
be responsible for ensuring that all control software was correctly programmed and connected.
Since a distributed control system is inextricably linked to multiple control systems across sites,
one person, ideally the control system designer, is best placed to ensure that the control system
is properly designed and correctly connected. However, due to the distributed nature of the
experiment and its complexity, this role has to be supported by other members of the distributed
testing team.
In some cases where the client, hosting only the main numerical model, was located at
Bristol, Remote Desktop was used from Oxford to access the Bristol test machine, allowing
real-time DHT to be conducted with a physical substructure hosted in Oxford without a local
test operator present at Bristol.
Finally, remote access was also useful in enabling files to be shared between client and server
sites after testing.
5.1.2 Telephone contact
The telephone proved a useful tool during testing. It was used primarily because it provided an
audio link between test sites without using local computing resources. In two-site testing it was
essential for guiding remote test operators who were not familiar with standard operating procedure
or new functions. It was also very useful when conducting repeated tests, to pass instructions to
reset the test controller for the next test or to adjust variables as specified by the test operator (the
author, at Oxford). Telephone conversation benefits from the use of a headset, as it frees
operators' hands to access the multiple hardware interfaces involved in the testing system (inner
loop and outer loop controllers). In three-site tests, telephone conversation was limited to two-site
conversations. In initial tests, one site (Cambridge) was unfamiliar with the testing systems and
most conversation was between the main test operator (the client at Oxford) and the
operators at Cambridge. Since only one line was available, the test operator at Bristol (who was
experienced in using the testing system) was updated periodically. While the conversation would
have benefited from three-way calling, it is important that the client site coordinates conversation
between each site in turn: it is administratively difficult to manage multiple conversations,
particularly if the main test operator (the client) also has to manage local testing systems.
5.1.3 Instant messaging - when speaking is not possible or there are multiple sites
As an alternative to telephone conversation, instant messaging was used in some tests to allow
textual messages to be transmitted between sites during tests. This was particularly useful in
three-site testing, where only two sites (Oxford and Cambridge) were predominantly in telephone
contact. Simple messages could be sent to update the test operator in Bristol (by the client at
Oxford). The messages were kept two-way so as not to complicate the sending and
acknowledgement of instructions between sites. They were predominantly used by the client
operator at Oxford to inform the operator in Bristol to reset for the next test (confirmation was
only sent if there was a problem). More complicated messages proved distracting and would
begin to overburden the client operator, who also had to manage the other server site and local
test systems. More work was planned to implement text messaging as part of a web services
based support system. There, pre-programmed messages would be used to limit the time required
for typing, and messaging could be implemented so that messages could be addressed only to the
sites they are intended for, or sent to all sites as a global message. MSN Messenger was used
for the testing conducted.
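The planned pre-programmed messaging could take a form like the following sketch, where canned messages are addressed to a single site or broadcast globally. The message codes and site names here are hypothetical illustrations of the idea, not an implemented UK-NEES protocol:

```python
# Hypothetical sketch of pre-programmed messaging between test sites.
# Message codes and site names are illustrative assumptions.
PREDEFINED = {
    "RESET": "Reset controller for next test",
    "HOLD":  "Hold position, problem at client",
    "DONE":  "Test complete, data saved",
}

def build_message(code, sender, to=None):
    """Address a canned message to one site, or broadcast when `to` is None."""
    if code not in PREDEFINED:
        raise ValueError(f"unknown message code: {code}")
    return {"from": sender, "to": to or "ALL", "text": PREDEFINED[code]}

def deliver(message, sites):
    """Return the list of sites that should display the message."""
    if message["to"] == "ALL":
        return [s for s in sites if s != message["from"]]
    return [message["to"]]

# Example: the client at Oxford tells Bristol alone to reset
msg = build_message("RESET", "Oxford", to="Bristol")
print(deliver(msg, ["Oxford", "Bristol", "Cambridge"]))  # ['Bristol']
```

Restricting the vocabulary to a handful of codes is what limits typing time during a test; the per-site addressing avoids distracting operators with messages not intended for them.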
5.1.4 Live editing of shared documents via Google Docs
In order to document multiple experiments in a particular testing session, shared documents were
used in two-site testing between Oxford and Bristol. To reduce the burden on the main test
operator (the client) during a testing session, the server test operator would be charged with
updating the document as tests progressed, where possible, since the client test operator
would often lead experiments. Since the document could be shared live, it was a very useful way
for both operators to record observations relevant to their test site for each test as it
progressed.
5.1.5 Tele-presence
One of the main issues with working in a distributed testing environment when conducting DHT is
that remote operators/test participants no longer have audio/visual access to all physical
substructures. The role of tele-presence is to attempt to give test participants/operators some
access to physical substructures at other sites. Tele-presence cameras were installed at each of
the three UK-NEES sites. These cameras gave local and remote operators views of the
physical substructures (and in some cases test operators). The cameras had full pan, tilt and zoom
capabilities (high levels of optical zoom are possible). Streaming over the Internet/network was
possible at various frame rates depending on the desired quality and with regard to bandwidth usage
issues. However, these streams were not encrypted. The video could be recorded, though while
the cameras had microphones they were not well suited to capturing sound in the lab, nor could
sound be streamed or recorded with the software available.
Since sound can play an important role in testing, for example to capture the sound of
cracking, capturing sound and video can be useful both for local testing, so that experiments may
be reviewed, and in a distributed test, since these cues could not otherwise be sensed by remote
test participants.
It is important to consider how audio/visual data is used in the distributed test environment.
If too many streams are transmitted, a test operator cannot focus on events at any one particular
site without missing what may be happening at another, and sound from
multiple sites can be distracting. Testing labs can be noisy places, and the sound captured may be of
actuation equipment and test rig connections, not just of the test specimens. This may or may not
be useful. For example, in two-site testing between Oxford and Bristol, sound transmitted via
the telephone played an important role in allowing the client operator at Oxford to determine the
quality of the experiment. Although live graphical data from the server was an indicator of test
quality and distributed control, as observed at the client, in tests where the control system was on
the verge of instability, or where there were significant data loss events, the rattling sound that was
heard indicated control issues, whereas when the test was well controlled no noise from the test rig
was audible over the telephone. Audio played a more useful role than video in this respect.
In the testing conducted, tele-presence did not play a particularly useful role, especially since
audio contact via the telephone was available. This was for a number of reasons. The physical
substructures were well known by test operators beforehand; in the case of testing with Bristol,
the physical substructure remained elastic and, as movement was small, there was not much to
observe; and in the case of testing with Cambridge, it was difficult to see the foundation pad
within the centrifuge basket as it was being moved according to commands both locally at
Cambridge and during DHT (see Ojaghi et al. 2010 for a description of the experiment).
Placement of cameras can be a difficult task, and video does not exactly replace seeing the real
thing. It would also be useful if audio and video recording could be triggered at test start.
The use of tele-presence cameras also raises two further issues. The first is related to test
privacy. The cameras may be controlled, and test labs viewed, by test operators at remote sites and
also by interested test participants worldwide (though access is password protected). Since the
cameras can give access to other views of the testing lab, not just the systems being tested,
there are privacy issues to consider. Individuals, or sensitive (commercially or otherwise)
technologies under test that are not related to the DHT test, may be viewed remotely. This will
require other users of local labs to consent to the use of such cameras, or assurance that such
cameras do not infringe on privacy. In the case of Bristol a strict policy regarding the use of
cameras is in place: cameras face the walls when not in use, and other users of the lab
are warned beforehand when testing means they might also be filmed. No such policy is in
place at Oxford and, to the best of the author's knowledge, neither is there one at Cambridge.
The second issue to consider is the impact that tele-presence can have on the quality of
testing. Tele-presence can be bandwidth intensive, and since it may share the same local Ethernet
connection to the Internet, high bandwidth usage locally may impede or clash with the critical
control signals required to conduct real-time DHT. It was important therefore to find out whether
this would occur, or whether the developed testing system was robust enough not to be affected
by it. Testing with and without tele-presence suggested that tele-presence did not affect the
running of the test even when the host PC and tele-presence machine used the same local network
connection. Additionally, it would be possible within UK-NEES to carry tele-presence and DHT
communication on different networks: real-time DHT could be conducted on the UK Lightpath
dedicated network connection while tele-presence data streamed via the Ja.net production
network, the shared Internet connection.
5.1.6 Video-conferencing
Video conferencing was originally to be used to enable operator-to-operator communication
during testing, though this application was never fully implemented and was largely replaced by
the other means of communication discussed. However, video conferencing did play an
important role in the preliminary planning of DHT goals. It enabled high-level meetings to
take place between members of the UK-NEES team and, on some occasions, participants from
partner sites worldwide. These meetings proved most important in breaking the ice and in allowing
key decision makers and all members of the UK-NEES team to be involved in determining the
goals of DHT experiments, making agreements as to what equipment would be available, and
updating progress. These meetings were often followed by personal telephone/email contact and,
less often, site visits to set up and go over the finer details of proposed work one to one.
5.2 Access rights and intellectual property in a distributed testing
environment
One of the social issues that emerges when working in a distributed testing environment, and
especially while conducting distributed hybrid testing, is that of access rights: network
and computer access rights, data access rights, data ownership and intellectual property rights
amongst distributed test participants and sites. While this is a legal or policy issue, it is worth
mentioning here as DHT testing possibly introduces new elements that have not been
encountered before.
In a distributed test, multiple sites and test operators are involved, and each site
may store test data generated locally as well as data fed back or received from remote sites. The
technology that enables the test may be developed predominantly by one test site or an individual,
or it may be developed by many across the sites involved, or by an individual outside the
network.
In conducting work on distributed computers, providing access to computing facilities to
remote personnel from outside an institution could breach local computer usage policies. With UK-
NEES testing, the main test developer (the author) was given formal permission by the local
network administrator to gain full access to the host PC in Bristol. This was granted since the
majority of UK-NEES tests were between Oxford and Bristol and access to the testing machine
would be important to the success of the test.
Data ownership, data access rights and intellectual property are in some ways linked.
When the test is conducted across sites, who owns the data? Who should have permission to
access it?
The issue can be simple or complex. In the simple view taken within UK-NEES data is
generally shared between the sites by the participants, and those involved in development or
generation of testing control systems, software, test rigs and data are duly acknowledged.
In a more complex view, each institution typically claims any data generated on its
premises by its staff as belonging to that institution, unless otherwise agreed; see, for example,
Statute XVI: Property, Contracts, and Trusts of the statutes and regulations of Oxford University
(2010). Of course, research students, particularly if they are not paid under a research contract,
are not automatically legally bound by such regulations (UKBI, 2010). A further complication
arises from the use of online tools such as Google Docs: since the data is stored by no single test
site, it is not clear who owns it. A clear policy for sharing data and technology in a collaborative
framework may need to be agreed beforehand.
6.0 Summary
In this brief report a number of considerations taken in the development of the UK-NEES
distributed testing system are discussed. Firstly, the middleware development process is outlined.
Brief results are presented from selected early tests, and the operation of DHT.exe v0.3.1.1, the
most successful DHT.exe series program, is described. The development of the test middleware and
the extensive testing involved led to a better understanding of the testing environment, both for
conducting test control and for highlighting usability and administrative issues in distributed
testing. The issues encountered, and how they were approached, are detailed in this report.
The web services approach is discussed and an alternative hybrid web services/socket approach
is proposed, in which sockets would be used for critical communications and web services for
integrating all support systems and for calling the software required for socket communication.
Security and network usage issues are also discussed. Good communication between
networking staff and test operators is crucial; while security is important, its importance should
not be exaggerated at the expense of increased latency.
Finally, social aspects of distributed testing are described. Although a usability strategy has
been developed, the report focuses on the additional systems used to support work in the
distributed testing environment. Future developments of the network should seek to integrate
these within web services. The issue of data ownership is also raised: since data is generated
across sites, an agreement for data ownership and sharing should be made.
References
De la Flor G., Ojaghi M., Lamata Martínez I., Jirotka M., Blakeborough A., Williams M.S. (2009). Reconfiguring Practice: The Interdependence of Experimental Procedure and Computing Infrastructure in Distributed Earthquake Engineering. All Hands Meeting, Oxford.
De la Flor G., Ojaghi M., Lamata Martínez I., Jirotka M., Williams M.S., Blakeborough A. (2010). Reconfiguring Practice: The Interdependence of Experimental Procedure and Computing Infrastructure in Distributed Earthquake Engineering. Philosophical Transactions of the Royal Society A, 368, 4073-4088.
ja.net (2010). JANET Service Description Version 4, 1 August 2010 to 31 July 2011 (available on www.ja.net).
Mosqueda G., Stojadinovic B., Hanley J., Sivaselvan M., Reinhorn A. (2006). Fast Hybrid Simulation with Geographically Distributed Substructures. 17th Analysis and Computation Specialty Conference.
Ojaghi, M. (Makhzan Ojaghi, S. M.) (2010). The Development of Real-Time Distributed Hybrid Testing for Earthquake Engineering. DPhil thesis, University of Oxford.
Ojaghi, M., Lamata Martínez, I., Dietz, M., Williams, M.S., Blakeborough, A., Crewe, A., Taylor, C., Madabhushi, G., Haigh, S., Ali, A. (2010). UK-NEES - Distributed Hybrid Testing Between Bristol, Cambridge and Oxford Universities: Connecting Structural Dynamics Labs to a Geotechnical Centrifuge. 9th U.S. National and 10th Canadian Conference on Earthquake Engineering, Paper 1024.
Pearlman L., D'Arcy M., Johnson E., Kesselman C., Plaszczak P. (2003). NEESgrid Teleoperation Control Protocol. Technical Report NEESgrid-2003-07, NEESgrid.
Statutes and regulations of Oxford University (2010). http://www.admin.ox.ac.uk/statutes/790-121.shtml
UKBI (2010). Managing Intellectual Property: The Guide. A Guide to Strategic Decision-Making in Universities. www.ukbi.co.uk, UK Business Incubation.