

Abstractions, Architecture, Mechanisms, and a Middleware for Networked Control

Scott Graham, Girish Baliga, and P. R. Kumar

Abstract— We focus on the mechanism half of the policy-mechanism divide for networked control systems, and address the issue of what are the appropriate abstractions and architecture to facilitate their development and deployment. We propose an abstraction of “virtual collocation” and its realization by the software infrastructure of middleware. Control applications are to be developed as a collection of software components that communicate with each other through the middleware, called Etherware. The middleware handles the complexities of network operation, such as addressing, start-up, configuration and interfaces, by encapsulating application components in “Shells” which mediate component interactions with the rest of the system. The middleware also provides mechanisms to alleviate the effects of uncertain delays and packet losses over wireless channels, component failures, and distributed clocks. This is done through externalization of component state, with primitives to capture and reuse it for component restarts, upgrades, and migration, and through services such as clock synchronization. We further propose an accompanying use of local temporal autonomy for reliability, and describe the implementation as well as some experimental results over a traffic control testbed.

Index Terms— Abstractions, Architecture, Middleware, Mechanisms, Networked Control, Third Generation Control, Networked Embedded Control Systems.

I. INTRODUCTION

A. A historical perspective

The first generation of platforms for control systems in the electronic age that began with the vacuum tube was based on analog computing.¹ This created a need for theories to use the technology of operational amplifiers. The challenge was met by Black, Bode, Nyquist and others, who developed frequency domain based design methods. A second era of digital control systems commenced around 1960, based on the technology of digital computers. It too created a need for new theories for exploiting the capabilities of digital computing, which was met by the explosion of work on state-space based design by Kalman, Pontryagin, and others.

This material is based upon work partially supported by USARO Contracts W911NF-08-1-0238 and W-911-NF-0710287, NSF Contracts ECCS-0701604, CNS-07-21992, CNS-0626584, CNS-05-19535, and CCR-0325716, DARPA/AFOSR Contract F49620-02-1-0325, DARPA Contract N00014-0-1-1-0576, and AFOSR Contract F49620-02-1-0217.

Please address correspondence to [email protected]. AFIT, USA. [email protected]. Google, Mountain View, CA 94043. [email protected]. CSL and Dept. of ECE, Univ. of Illinois, 1308 West Main St., Urbana, IL 61801, USA. Email: [email protected].

¹ Mindell [1] provides a detailed account of how control, communication and computation were intertwined in the first half of the twentieth century.

Now, about fifty years later, we may be on the threshold of a third technological revolution in control platforms. This is driven by advances in VLSI hardware, wireline and wireless communication networking, and advanced software such as middleware. There has been tremendous growth in embedded computers, with 98% of microprocessors now sold being embedded [2]. In communication, besides the Internet, we may be on the cusp of a wireless revolution with Wi-Fi (IEEE 802.11x) experiencing double-digit growth since 2000 [3].

Less recognized is that software engineering has also evolved significantly in the last two decades. Although many basic ideas such as object oriented programming and distributed computing were mooted early on, they have been brought to fruition only recently due to better understanding of software management and advances in hardware. In particular, experience in design of large and complex software programs has been well codified and widely reused through design patterns such as Strategy and Memento [4], software frameworks such as Model-View-Controller (MVC) [4], and software development processes such as eXtreme Programming (XP) and Rational Unified Process (RUP) [5]. The development and widespread use of software libraries and infrastructure such as middleware has drastically reduced development cycle times, and resulted in better software. A good indicator of advances made is that the level of abstractions, particularly in programming languages, is much higher today. Software design primitives have evolved from registers and variables to libraries, components, messages, and remote procedure calls. A typical mid-range automobile has about 45 micro-controllers connected by Controller Area Networks (CAN) and uses complex software [6]. This has thrust software engineering into a central role [7] since the development and testing of complex software is expensive and fraught with numerous problems. There is much impetus to reuse previously developed and tested software, since it not only reduces the effort and expenses involved, but also enables sharing of valuable experience. About 40% of the software for the Boeing 777 airplane was of the commercial-off-the-shelf (COTS) variety [8].

These developments present new opportunities for control, as well as several challenges. Properly harnessed, they can potentially revolutionize control system platforms and lead to a new era of system building.


B. The importance of abstractions and architecture

In order for networked control systems to proliferate, they will need to be made easy to design and easy to deploy. This is the objective in this paper. The goal addressed is how to reduce the design and development time. This also requires alleviation of the burden on the control system designer to be an expert in issues which lie outside the domain of control, by minimizing the attention needing to be paid to extraneous details. An analogy is general purpose computing, where a computational platform is easily usable by multiple users with different applications. Another analogy is general purpose networking where computers can be easily interconnected to communicate with each other regardless of the envisaged application.

These objectives require the identification of well designed abstractions, the design of a matching architecture, and the development of a supporting middleware. It is also important to provide services that render design easy and quick, as well as design principles that enhance reliability. This paper can be regarded as focusing on the “mechanism” half of the mechanism-policy divide. The development of a holistic and systematic theory, that has been the grand project of control systems research since the 1950s, can be regarded as the other “policy” half of the mechanism-policy divide.

An analogy with the developments leading to modern computation and computer science is appropriate. The work of Alonzo Church and Alan Turing laid the theoretical foundations of sequential computing. They independently developed formal systems to define and model the notion of computable functions [9]. However, it was John von Neumann’s subsequent ideas about organizing these concepts into an architecture that is the basis of today’s sequential computers. It led to the practical realization of the theoretical ideas. In particular, the stored program concept [10] was a significant breakthrough. It has led to the development of simple and uniform mechanisms to write, test, and maintain complex programs today. The von Neumann architecture had a wide influence on the first generation of digital computer engineers in the 1950s. Till the mid 1960s, commercial computers produced by IBM, Burroughs, UNIVAC, and others, were still custom built machines. However, the IBM 360, first shipped in June 1966, changed all that. It was the first commercially successful general purpose computer, with the name 360 (degrees) intended to market its “all round” general purpose capabilities. The 360/370 series introduced the notion of compatibility in computers [11]. All computer hardware was built to a common set of abstractions such as instruction sets, memory models, etc. Consequently, the same software could run on all the different machines in this series. This standardization tremendously reduced software development costs. More importantly, it solved problems due to changing customer requirements by supporting seamless upgrades of hardware and software. This interface between hardware and software is what today allows hardware and software designers to develop their products separately, resulting in the proliferation of serial computation. In contrast, parallel computation lacks such a “von Neumann bridge” [12], which, arguably, has stifled its proliferation.

Fig. 1. Networked control system: Software component architecture

The development of appropriate abstractions and matching architecture has been fundamental to the proliferation of other successful technologies too. The reason for the success of networked communication is, arguably, fundamentally its architecture, and, only secondarily, the algorithms involved, though of course they too are very important. In the Open Systems Interconnection (OSI) architecture [13], each layer provides a service to the layer above it, and in turn can be oblivious to the details of layers beneath it. These capabilities make networks robust and evolvable, and give longevity to the basic design. For example, over the course of the years, different versions of the transport layer protocol, TCP, have been proposed and deployed, without necessitating a change in other layers. The overall design allows heterogeneous systems to be composed in a plug and play fashion, making them amenable to massive proliferation, and has resulted in the Internet. The massive proliferation has led to reduced cost per unit over the long run, and resulted in minimizing deployment costs for new networks, further stimulating proliferation. A similar role has been played in digital communication by Shannon’s source/channel separation, one of the rare architectural principles resulting from mathematical theory. Source coding is often performed in software by algorithms such as JPEG, while channel coding techniques such as QPSK are performed in hardware by network interface cards [14]. Closer home, in control, it is a standard abstraction to view the plant separately from its controller. This is however obvious only in retrospect, and, even now, is not routine in other fields; for example, it is difficult to routinely simulate new policies in manufacturing system simulation software where plant and controller are intermixed. Another architectural result, also one with a mathematical foundation, is the separation of estimation and control, which considerably simplifies design of control systems.

The situation in control systems today is similar to that of computers in the early 1960s, with the critical present requirement being a well designed organizing architecture that will make them amenable to massive proliferation. Complex systems design can involve a basic trade-off between architecture and performance. A system designed with a more extensible architecture has redundancies to improve design flexibility, while a system optimized for performance has tighter coupling between sub-systems at the cost of lower design flexibility. While the latter approach is better in the short term, the former is better in the long term. Improved design flexibility implies easier system extensibility, and modular design. Additional functionality to address changing requirements can be incorporated much more easily into systems which have an architecture that allows for evolution. Modular design allows different sub-systems to be developed separately and then integrated into operational systems. Further, these sub-systems can be changed, or even reused in other systems, with minimal impact on the rest of the system. All this results in systems that are more reliable and have a longer useful lifetime. Moreover, more general architecture allows recovery or even enhancement of system performance by facilitating sophisticated self-optimization capabilities. As an example, we will show how one may deploy mechanisms such as automated software migration to optimize the use of resources and loop delays at run-time, by designing a sufficiently rich software infrastructure that supports such mechanisms.

There are several key challenges. Distributed operation introduces problems due to concurrency and non-determinism in program execution. The network complicates the system further with configuration, operational, and maintenance problems. While present networking technology provides some network abstractions, there remain numerous configuration details, such as determining network addresses and protocols, which need to be resolved in each application. These details occupy substantial portions of design time and effort, motivating the need for even higher-level abstractions. Another issue is that wireless links are prone to unpredictable delays and high losses. Hence, support for methodologies to facilitate control system design in the presence of such characteristics will have to be provided. What is required is an infrastructure that supports reusable solutions for common problems.

One caveat is in order. Just as general purpose computing is not targeted at high performance problems such as scientific computing, so also “general purpose” control may be inappropriate for situations where only a highly coupled solution, highly customized in hardware and software, can meet stringent requirements. However, even then, in an extensible architecture, one could embed lower level custom code in key subsystems to enhance performance.

Fig. 2. Networked Control System: Traffic Control Testbed

C. Summary of contributions

Rather than implementing a classical control system as simply a software program involving the control laws and plant interfaces, it is preferable to use a component architecture [15], [16], [17], which consists of decomposing it into components, which are autonomous software modules with well-defined functionalities. As an example, StateEstimator, Supervisor, Controller, Sensor, and Actuator, are all possible components in Figure 1. A component architecture has several benefits. It allows individual components to be developed separately and integrated later, which is important for development of large systems. Since components are well-defined, they can be replaced without affecting the rest of the system. For instance, a zero-order hold filter can be dynamically replaced by a Kalman filter without having to change the remaining software or restarting an operational system. Software reuse is promoted, since a component such as a control algorithm tested on one system can be transplanted into others. Mechanisms that support component interface specification, including sequence and types of component interactions and format and contents of messages exchanged, and explicit specification of design assumptions, can simplify integration problems while reusing components.

Component based design is based on primitives such as components and messages, which are at a higher level of abstraction than sockets and processes supported by operating systems. These abstractions can be created by software infrastructure such as middleware, which has emerged as a preeminent framework to develop large distributed applications. Since systems integration and testing constitute the most expensive development activities in the engineering of complex software [5], a lot of effort has been directed at the development of software tools and architectures. Many successful applications have been developed using a CORBA based architecture [18]. Currently, IBM [19], Microsoft [20], and SUN [21] promote their middleware based technologies as platforms for software engineering. We anticipate that the next generation of control applications will also follow this path, and present such a proposal.

The middleware we have developed is called Etherware. It requires that the control system designer only write the logic of the control law in components, without being burdened with how it will be executed. The information flow between components is routed through the middleware, without a component needing to be aware of where a destination component is executing, thus providing location independence of components. The middleware also supports semantic addressing. It further supports facilities such as migration, whereby a Kalman Filter Component can migrate to a more computationally powerful node or a node with lesser latency between it and the sensor, or between it and the actuator. The middleware also provides features to enhance reliability, a key performance requirement for control systems. It allows automatic restart upon failure of a component, through mechanisms such as checkpointing of the state.
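As a rough illustration of location independence, the following Python sketch routes messages by component name through a registry, so that a sender never sees the node on which the destination executes. The class `Middleware` and its methods `register`, `migrate`, and `send` are hypothetical names chosen for this sketch; they are not the Etherware API, and delivery over a real network is replaced here by a direct function call.

```python
from typing import Callable, Dict, Tuple

Handler = Callable[[dict], None]


class Middleware:
    """Toy stand-in for a middleware that routes messages by component name.

    Hypothetical sketch only: names and behaviour are assumptions, not Etherware.
    """

    def __init__(self) -> None:
        # component name -> (node name, message handler)
        # The node name is bookkeeping that senders never see.
        self._registry: Dict[str, Tuple[str, Handler]] = {}

    def register(self, name: str, node: str, handler: Handler) -> None:
        """Record that component `name` is executing on `node`."""
        self._registry[name] = (node, handler)

    def migrate(self, name: str, new_node: str) -> None:
        """Move a component to another node; its name stays the same."""
        _, handler = self._registry[name]
        self._registry[name] = (new_node, handler)

    def node_of(self, name: str) -> str:
        """Return the current node of a component (for inspection only)."""
        return self._registry[name][0]

    def send(self, dest: str, message: dict) -> bool:
        """Deliver `message` to component `dest`, wherever it executes.

        Returns False if no component with that name is registered.
        """
        entry = self._registry.get(dest)
        if entry is None:
            return False
        _, handler = entry
        handler(message)
        return True
```

Under these assumptions, a sender that calls `send("StateEstimator", ...)` is unchanged when the estimator migrates from one node to another; only the registry entry is updated.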

The middleware employs several Design Patterns [4] that codify good design solutions, capturing best practices, thus leading to much better reuse of design. We have incorporated the design patterns of Strategy, Memento, and Facade in Etherware. The Strategy pattern has led to replaceable control laws, Memento has resulted in component state check-pointing mechanisms, and Facade has allowed a simple and uniform middleware interface.
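To make the roles of Strategy and Memento concrete, here is a minimal Python sketch under stated assumptions. The control laws `ProportionalLaw` and `SaturatedLaw`, the `Checkpoint` record, and the `Controller` methods `save` and `restore` are all hypothetical and chosen only for illustration; the sketch shows the general patterns, not how Etherware implements them.

```python
from dataclasses import dataclass
from typing import Protocol


class ControlLaw(Protocol):
    """Strategy interface: any object with a compute() method qualifies."""

    def compute(self, error: float) -> float: ...


@dataclass
class ProportionalLaw:
    gain: float

    def compute(self, error: float) -> float:
        return self.gain * error


@dataclass
class SaturatedLaw:
    gain: float
    limit: float

    def compute(self, error: float) -> float:
        # Proportional action clipped to [-limit, +limit].
        return max(-self.limit, min(self.limit, self.gain * error))


@dataclass(frozen=True)
class Checkpoint:
    """Memento: an externalized, immutable snapshot of controller state."""

    setpoint: float
    last_output: float


class Controller:
    def __init__(self, law: ControlLaw, setpoint: float) -> None:
        self.law = law  # replaceable strategy
        self.setpoint = setpoint
        self.last_output = 0.0

    def step(self, measurement: float) -> float:
        self.last_output = self.law.compute(self.setpoint - measurement)
        return self.last_output

    def save(self) -> Checkpoint:
        return Checkpoint(self.setpoint, self.last_output)

    def restore(self, checkpoint: Checkpoint) -> None:
        self.setpoint = checkpoint.setpoint
        self.last_output = checkpoint.last_output
```

In this sketch, a restarted or upgraded controller can be built with a different law and then call `restore` on a previously saved `Checkpoint`, so that the state survives while the control law changes.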

The major abstraction that we propose and which the middleware realizes is a Virtually Collocated System, which allows control engineers to design control laws for networked systems using the same techniques as for centralized systems, i.e., as though the distributed application were executing on a single computer. The control system designer need not concern herself with extraneous details of a networked system, and needs to focus only on the details relevant to the control law. It can potentially significantly reduce design time for networked control.

Architecturally, we propose that this abstraction be regarded as a Virtually Collocated System Layer (VCL) that resides above the Transport Layer. It can be regarded as a replacement for the Presentation and Session Layers, which are anyway absent in the TCP-IP architecture.

We propose an accompanying principle of Local Temporal Autonomy for design of reliable networked control systems, where the goal is to design components so that they can function for a limited period of time even when neighboring components fail, thus allowing them to migrate, or for a neighbor to restart. This facilitates robustness, reliable system evolution, development and debugging. We show how Kalman Filters, Receding Horizon Control, and Actuator Buffers, can all be used to provide Local Temporal Autonomy.

We identify services that are useful across a range of control systems and show how they can be supported. One example is distributed time. This is useful since networked control systems employ time-driven computing in addition to the usual event-driven computing. We show how such services are supported through a middleware feature called “Filters.”

We have implemented the middleware on a laboratory testbed² featuring a networked system of cars, vision sensors, wireless and wired networks, and the full range of control hierarchy including regulation, tracking, trajectory generation, planning and scheduling, along with a discrete event system for liveness and safety.

This paper does not address the issue of theories needed for networked control. It only focuses on the mechanism half of the mechanism-policy division of research. Continuing theoretical effort, conducted on the policy front, is needed to fully realize the potential of networked control. The current situation is that technology has overtaken theory in some respects, in that we do not know how to optimally utilize the capabilities already supported in the middleware. We conclude by providing some examples of theoretical challenges for future research.

II. D

While describing the middleware, how the networked control system is to be implemented, and what functionalities the middleware can provide, it is helpful to refer to specific illustrative examples. Hence we provide a brief account of the networked control testbed, see Figure 2, on which all of the above has been implemented. It consists of a set of remote controlled cars with no on-board computational capabilities. Each car is controlled by radio transmitter over a dedicated radio frequency channel connected to the serial port of a laptop through a micro-controller, which converts discrete commands from the laptop into analog controls specifying the speed and steering angle in discrete steps. Commands can be sent at a rate of up to 50 Hz, i.e., one command every 20 ms. Each car has a chassis top with uniquely coded color patches that are used to identify its position and orientation. The cars are monitored using a pair of ceiling mounted cameras. The video feed from each camera is processed by an image processing algorithm executing on a dedicated desktop computer. This feedback is available at the rate of 10 Hz, i.e., one update every 100 ms. All computers in the testbed are connected optionally by a wired Ethernet or by an ad hoc wireless network with IEEE 802.11 [23] PCMCIA cards.

The architecture of the control application³ is illustrated in Figure 1 where the control software components in the testbed are shown. The camera feed is processed in VisionSensor components. A data fusion component, called the VisionServer, combines the sensor data from both vision systems and distributes the position and orientation information for each car via the communication network. Individual cars are operated by corresponding Controllers that implement Receding Horizon Model Predictive Control. Incoming sensor data is passed to a StateEstimator module which implements a Kalman Filter. Its prediction of future states is fed to the Controller, thus buffering it against delays and jitter of the communication system. A software module, called the Actuator, feeds commands to the radio-control system. The Actuator is designed to buffer a small horizon of future commands, so that it can tolerate brief control outages, and it stops the car if the Controller is inoperative for too long. A Supervisor resides above the Controllers and oversees the operation of the testbed. The Controllers need to be given trajectories to operate the respective cars. They are generated by the Supervisor, which also ensures global properties such as safety, the avoidance of car collisions, and liveness, the elimination of traffic gridlocks. The Supervisor generates trajectories along a pre-specified network of roads, using algorithms described in [?]. Briefly, blocks of the road network are modeled as bins in a corresponding graph. The planning problem is formulated as finding shortest paths in a graph, and the scheduling problem is modeled as assigning cars to bins while avoiding collisions and gridlocks. Finally, the Supervisor also receives feedback from the VisionServer, forming a higher-level control loop. Due to hardware constraints, the Actuator component for each car, and the VisionSensor component for each camera, must be executed on respective computers. All other components can execute on any computer in the testbed.

² Movies are available at [22].

³ It should be specifically noted that this is distinct from the primary focus of this paper, the architecture of the middleware, the infrastructure over which the application is run.
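The buffering behaviour of the Actuator can be sketched as follows, as a minimal illustration rather than the testbed code. The class `BufferedActuator`, its methods `update_plan` and `tick`, the `(speed, steering)` command tuple, and the `STOP` command are hypothetical names introduced for this sketch. It assumes that `tick` is called once per command period (20 ms at the 50 Hz command rate), so a horizon of five commands would span one 100 ms vision update period.

```python
from collections import deque
from typing import Deque, Iterable, Tuple

Command = Tuple[float, float]  # (speed, steering angle), units left abstract
STOP: Command = (0.0, 0.0)     # assumed safe fallback command


class BufferedActuator:
    """Holds a short horizon of future commands from the Controller.

    Hypothetical sketch: if the Controller stops sending plans, the buffer
    is played out and the actuator then falls back to STOP.
    """

    def __init__(self, horizon: int) -> None:
        if horizon < 1:
            raise ValueError("horizon must be at least 1")
        self._horizon = horizon
        self._buffer: Deque[Command] = deque()

    def update_plan(self, commands: Iterable[Command]) -> None:
        """Replace the buffered plan with the first `horizon` new commands."""
        self._buffer = deque(list(commands)[: self._horizon])

    def tick(self) -> Command:
        """Return the command to apply during the current period."""
        if self._buffer:
            return self._buffer.popleft()
        # Controller has been silent longer than the buffered horizon.
        return STOP

    def pending(self) -> int:
        """Number of buffered commands not yet applied."""
        return len(self._buffer)
```

The design choice illustrated here is that a fresh plan overwrites the old one, so the actuator always follows the most recent intent of the Controller while still having something to execute during a brief outage.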

We have tested the system with up to eight cars operating simultaneously, and closely following pre-specified trajectories. Examples of traffic scenarios implemented are pursuit–evasion where a set of software controlled cars follow a manually controlled car, and automatic collision avoidance; see the videos at [22].

III. N

We begin by identifying requirements for common functionalities needed for general networked control applications that should be part of a control-specific middleware. They may be grouped into Operational Requirements, Non-functional Requirements, and Management Requirements.

A. Operational Requirements

Distributed operation. Connecting diverse components executing on different nodes on a network leads to numerous problems, including the need to locate and connect related components across a network, and support the exchange of messages between connected components. Examples of the problems that arise are initializing the components, connecting them, developing protocols for their interaction, and synchronizing their operation to resolve conflicts. In the traffic control testbed, the Supervisor and the various Controllers execute on different computers.

Location independence. This is an abstraction that provides an addressing scheme for components that is independent of their actual location in the network, important for design of distributed operations. It allows components to communicate without distinguishing between local and remote components. It also allows integrating components uniformly in different network configurations. For example, in the traffic testbed, the ability to easily switch between wired and wireless networks, which use different addressing schemes, requires such details to be abstracted away from the components. The abstraction also supports the future use of other networking technologies such as Bluetooth [25], without having to update component code.

Service description. Connecting components in a network involves determining if appropriate components are executing in the system. If several such components are available, then the most suitable component has to be selected. This requires mechanisms to specify and discover services provided by components. A car controller that knows the geographic location of its car should be able to connect directly to a sensor covering that location, without having to know about sensor locations in the network.

Interface compatibility. Integrating independently developed software components can lead to interface incompatibilities; e.g., the set of functions defined in the interface of a sensor module may not be compatible with the functions required by a controller component. This can be eliminated if standard interfaces are specified and supported for compatibility.

Semantics. Incongruent assumptions in the implementation of components can lead to problems. The interaction between different software components is usually based on an implicit finite state machine (see Figure 6), with exchanged messages assumed to be in specific formats. However, current interface description languages such as the CORBA IDL [26] do not provide a mechanism to specify these assumptions properly. For example, suppose the Controller in Figure 1 is implemented so that it checks for an update at the VisionServer before computing controls. This assumes that VisionServer responds immediately, returning an update if available, and none otherwise. However, if VisionServer is implemented so that it checks with VisionSensors for updates before responding, the additional delay may lead to failure as Controller is also waiting for an update. Hence, Controller and VisionServer may have consistent interfaces, but the interaction semantics may still be incompatible. Consequently, there must be additional provision for specifying interaction semantics in interface descriptions.

Distributed time. Components executing on different computers do not share common clocks, and need a mechanism to correctly translate time. For example, time-stamps on remote sensor updates must be properly translated to local time to avoid collisions.

B. Non-functional Requirements

These are features that are not necessarily required for the correct operation of the system, but support for which would significantly improve system performance.⁴

Robustness. Robustness being a fundamental requirement for the viability of networked control, component failures must be contained and their effect on the overall system minimized. For instance, the failure of a faulty sensor module should not immediately cause a connected controller to fail as well. This requires dependencies between components to be eliminated or replaced by "use-only if available" relationships as much as possible.

Delay-reliability trade-off. Reliable delivery of data over a network introduces additional delay due to retransmission of lost or re-ordered packets. However, some components may not require reliable delivery, and should therefore be able to trade off reliability for lower average delay. For example, time-stamped sensor updates are more useful when delays are smaller, even though a few updates may be lost over the network.

Other requirements. Algorithms should have good scalability. Security is also a key requirement and the system must be protected from misbehaving or malignant components.

C. Management Requirements

These are required for starting, managing, and updating components in an operational system.

Startup. Programmable interfaces to startup procedures that allow the specification of dependencies between components help ensure correct overall system initialization. For instance, a controller component may need to be started up before a related actuator component.

System evolution. It is necessary to be able to migrate or update components at run-time. Component migration is required to optimize the configuration of software components to balance or reduce communication and computational loads. For example, a Controller in the testbed can be migrated to another computer, where it is closer to the corresponding Actuator or less loaded computationally, thus reducing loop delay. Supporting such run time optimization allows changing controllers to address evolving plant goals, without having to restart the whole system.
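The startup requirement above amounts to ordering components so that each one starts only after the components it depends on. The following is a minimal sketch of that idea using a depth-first topological sort. The class StartupPlanner and its methods are hypothetical names introduced for illustration; they are not part of Etherware.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper: orders components so each starts after its prerequisites. */
final class StartupPlanner {
    // component name -> names of components that must be started first
    private final Map<String, List<String>> dependsOn = new LinkedHashMap<>();

    /** Declares that `component` must be started after every listed prerequisite. */
    void declare(String component, String... prerequisites) {
        dependsOn.computeIfAbsent(component, k -> new ArrayList<>())
                 .addAll(Arrays.asList(prerequisites));
        for (String p : prerequisites) {
            dependsOn.computeIfAbsent(p, k -> new ArrayList<>());
        }
    }

    /** Returns a valid start order, or throws if the dependencies are cyclic. */
    List<String> startOrder() {
        List<String> order = new ArrayList<>();
        Set<String> done = new HashSet<>();
        Set<String> inProgress = new HashSet<>();
        for (String c : dependsOn.keySet()) {
            visit(c, done, inProgress, order);
        }
        return order;
    }

    private void visit(String c, Set<String> done, Set<String> inProgress, List<String> order) {
        if (done.contains(c)) {
            return;
        }
        if (!inProgress.add(c)) {
            throw new IllegalStateException("Cyclic startup dependency at " + c);
        }
        for (String p : dependsOn.get(c)) {
            visit(p, done, inProgress, order);   // prerequisites are emitted first
        }
        inProgress.remove(c);
        done.add(c);
        order.add(c);
    }
}
```

Declaring that the Actuator depends on the Controller, as in the example in the text, yields an order in which the Controller precedes the Actuator.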

IV. ETHERWARE

We now present Etherware, a middleware for networked control. We begin with the design considerations, followed by the usage model and architecture.

Application design. Control loops have to be closed over communication channels that are unreliable, especially in the case of a wireless network. We can mitigate this by employing various application design strategies, particularly the principle of local temporal autonomy. As an example, noisy feedback to the controller is filtered using State Estimator, which implements a Kalman Filter. However, what is of more interest to us here is that, architecturally, the interposed State Estimator can continue to provide position estimates of the cars, even if sensor measurement updates are lost or delayed, thus enhancing the local temporal autonomy of the downstream consumer of sensor information, Controller, from the producer, Sensor.

⁴ Strange as the terminology may seem, these may actually be more important than functional requirements.
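The way a State Estimator keeps producing estimates when measurements are lost can be illustrated with a scalar Kalman filter. This is only a sketch under simplifying assumptions: a one-dimensional position, a known commanded velocity, and constant noise variances. The class ScalarStateEstimator and its parameters are hypothetical and do not describe the estimator used in the testbed.

```java
/** Minimal scalar Kalman filter sketch (hypothetical; not the testbed's State Estimator). */
final class ScalarStateEstimator {
    private double x;        // position estimate
    private double p;        // variance of the estimate
    private final double q;  // process noise variance added per second (assumed constant)
    private final double r;  // measurement noise variance (assumed constant)

    ScalarStateEstimator(double x0, double p0, double q, double r) {
        this.x = x0;
        this.p = p0;
        this.q = q;
        this.r = r;
    }

    /** Time update: runs every period, whether or not a measurement arrived. */
    void predict(double velocity, double dtSeconds) {
        x += velocity * dtSeconds;
        p += q * dtSeconds;      // uncertainty grows while no measurement is used
    }

    /** Measurement update: runs only when a sensor update is actually received. */
    void update(double measuredPosition) {
        double gain = p / (p + r);
        x += gain * (measuredPosition - x);
        p *= (1.0 - gain);       // uncertainty shrinks after a measurement
    }

    double estimate() { return x; }

    double variance() { return p; }
}
```

If several sensor updates are lost, the consumer simply keeps calling predict and reading estimate; the growing variance records how stale the estimate has become.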

We further enhance reliability by using receding horizon control where the controller sends a sequence of future controls that can be used to control the car for a longer period, rather than sending just one control input [27]. Future controls are stored in an Actuator Buffer at the actuator, and used in case updates are delayed or lost. A similar buffer at the controller holds future way-points from Supervisor.
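A minimal sketch of such an actuator-side buffer follows. The class name ActuatorBuffer, the representation of a control as a double, and the fall-back value used once the horizon is exhausted are all assumptions made for illustration; the text does not specify what the testbed does when the buffer runs empty.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

/** Hypothetical actuator-side buffer for a receding horizon of future controls. */
final class ActuatorBuffer {
    private final Deque<Double> pending = new ArrayDeque<>();
    private final double safeDefault;   // assumed fall-back once the horizon is exhausted

    ActuatorBuffer(double safeDefault) {
        this.safeDefault = safeDefault;
    }

    /** A fresh controller update replaces whatever remained of the previous horizon. */
    void onControlSequence(List<Double> futureControls) {
        pending.clear();
        pending.addAll(futureControls);
    }

    /** Called once per actuation period; uses buffered controls if no update arrived. */
    double nextControl() {
        Double u = pending.pollFirst();
        return (u != null) ? u : safeDefault;
    }
}
```

When updates arrive on time, only the first element of each sequence is ever applied; the remaining elements are used only when later updates are delayed or lost.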

By providing support for such strategies in the middleware, the cost of application design complexity can be reduced. Since critical components can now endure some delays, we have been able to implement the entire system using only soft real time control⁵.

Stability considerations. Robustness requires the ability to restart application components efficiently to maintain system stability. For example, if Controller in Figure 1 fails due to a software error, then restarting it should not require reestablishing communication with VisionSensor, reinitializing Actuator, or re-establishing Controller state, as these delays could cause system instabilities and result in car collisions.

This motivates a simple and uniform design to externalize component state, so that it can be reinitialized with this state, and restarted at the same node or even migrated to a different location and restarted there. Etherware enforces component state externalization as a basic architectural precept. Each component is instantiated with an initial state, and is required to support a state check-pointing mechanism. On a check-point request, it has to return a state object that can be used to reinitialize it upon restart, update, or migration.
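The check-point contract just described can be sketched as a small Java interface. All names here (Checkpointable, ControllerState, SketchController) and the choice of state fields are hypothetical; the actual Etherware interfaces are not given in the text. The state includes a channel identifier only to illustrate that connection-related information can travel with the check-pointed state.

```java
import java.io.Serializable;

/** Hypothetical contract: a component externalizes its state and can be rebuilt from it. */
interface Checkpointable<S extends Serializable> {
    void initialize(S state);   // called on first start, restart, upgrade, or migration
    S checkpoint();             // returns a state object sufficient for re-initialization
}

/** Immutable snapshot; the fields are illustrative assumptions. */
final class ControllerState implements Serializable {
    private static final long serialVersionUID = 1L;
    final double waypointX;
    final double waypointY;
    final long channelId;       // identifier of an already established channel

    ControllerState(double waypointX, double waypointY, long channelId) {
        this.waypointX = waypointX;
        this.waypointY = waypointY;
        this.channelId = channelId;
    }
}

final class SketchController implements Checkpointable<ControllerState> {
    private double waypointX;
    private double waypointY;
    private long channelId;

    @Override
    public void initialize(ControllerState s) {
        waypointX = s.waypointX;
        waypointY = s.waypointY;
        channelId = s.channelId;
    }

    @Override
    public ControllerState checkpoint() {
        return new ControllerState(waypointX, waypointY, channelId);
    }

    void onWaypoint(double x, double y) {
        waypointX = x;
        waypointY = y;
    }

    double waypointX() { return waypointX; }

    long channelId() { return channelId; }
}
```

A restart then consists of calling checkpoint on the old instance (or using its last saved snapshot) and initialize on the new one, which may live on a different node since the snapshot is serializable.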

Maintenance of communication channels across such changes is also supported in Etherware. Identifiers for communication channels can be saved as part of the check-pointed state, allowing restarted or upgraded components to continue using previously established channels. It also provides communication continuity to other components during such changes.

Message based communication. Components in networked control systems have to respond to changes as soon as possible. Upon detecting a safety violation, Controller may not be able to wait for an acknowledgment from Supervisor before it takes action. Over a wireless channel in particular, delays can be large due to deep fading and queued packets. Another important consideration is the presence of dependencies in push-based communication channels. For example, the controller cannot wait for the updates from the sensor before sending controls to Actuator since updates may be delayed or lost. Conventional middlewares for transaction based systems such as CORBA [26] use synchronous communication where a component sending a message is blocked until it receives a reply. This leads to complex design with each component requiring multiple threads of control, since a separate receiving thread is required for blocking on each channel. On the other hand, asynchronous operation eliminates this source of complexity since a component is not blocked when sending a message.

⁵ We are presently incorporating real-time support into Etherware [28].

Based on these considerations, Etherware has been developed as a message-oriented middleware. Message based communication requires a specification of message formats. Support for interface and semantic compatibility during changes and component reuse requires this specification to be flexible, extensible, and backward compatible. Flexibility is the ability to easily incorporate changes in the interface and semantics of a component, and extensibility supports ease of adding specifications for new functionality, while still honoring the original specifications used by older components to communicate with it. Based on these requirements, we have used XML [29] as the language for messages, with all communication in Etherware through well-formed XML documents with appropriately defined and extensible formats. For platform independence, and due to availability of support for XML, Etherware has been implemented using the Java programming language [30]. While these choices incur some additional processing overhead, we believe that the use of open standards and advances in computer hardware compensate for this.

Architecture. An important architecture design decision was to choose an execution model for active components, such as sensors, which need their own threads of control. We decided to go with thread-per-active-component instead of process-per-active-component model for the following reasons. First, support for Thread management is much better than support for Process management in Java. For instance, support for thread prioritization and thread interruption is available in a standardized and platform-independent interface. Second, since Threads are much lighter-weight entities than Processes, their creation and update time is much smaller. This is well illustrated in Section VIII-A where we see that a Process restart is on the order of seconds, while a Thread restart is on the order of tens of milliseconds. Finally, having one Etherware process per computer makes it easier to define and use resources such as published network ports for remote communication. However, individual active components can create and manage their own processes if necessary. For instance, due to programming language and platform constraints, the VisionSensor components for cars in the testbed create and manage processes to collect data from the frame grabbers connected to the camera outputs.

Services provided by the middleware need to be easily restartable and upgradeable. Supporting this allows basic versions of services to be deployed initially for resource savings, and subsequently updated based on changing application requirements, without having to restart the system. Hence, invariant aspects of the middleware, which cannot be changed dynamically, have to be minimized to maximize flexibility.

These considerations have motivated us to adopt a micro-kernel [31] based design. This consists of using a simple and efficient kernel, which has only the most basic functionality needed for minimal operation, and which will not need to be changed during system operation. This consists of functionality to deliver messages between components, and primitives for creating new components. All other middleware functionality is implemented as components, just as the application is; hence these can be changed without having to restart the system. This philosophy of maximizing flexibility has also resulted in the development of a bare minimum functional interface for components to interact with the middleware. Flexibility is increased since the above design minimizes the number of aspects in the functional interface that need to be changed or upgraded during system operation. For uniformity, application components interact with middleware services with messages, just as though they were other application components.

A. Etherware usage model

We next turn to the programming or usage model for Etherware, that is, the abstractions provided to control engineers and software programmers using Etherware.

A component in Etherware can participate in control hierarchies. For example, Controller in Figure 1 takes goals from the higher-level Supervisor and sends commands to the lower-level Actuator. The VisionServer of Figure 1 takes data inputs from VisionSensors and provides data to Controllers.

Etherware is an asynchronous message-based middleware. Components communicate by exchanging messages, which are well formed XML documents [29]. XML documents can be directly manipulated as large strings. However, Etherware also provides a hierarchy of Java classes which provide various primitives to manipulate the underlying XML document across a Java based interface. This hierarchy can be extended to support additional messages with user-defined XML formats, in addition to the predefined messages with specific XML formats that are encoded in these classes.

Two basic problems attending message delivery in distributed systems are discovery and identification of destination components. The identification problem is solved in Etherware by associating a globally unique id, called a Binding, to each component. The discovery problem is solved by associating semantic profiles to addressable components. Each component that needs to be addressed registers one or more profiles with Etherware. A profile describes the set of services that a component provides. For example, a profile for a vision sensor could specify the type of camera and the geographic region covered by it, and a car controller could use this information to connect to a relevant vision sensor.

[Figure 3: a MessageStream from Sensor to Controller, with a Kalman Filter interposed as a Filter.]

Fig. 3. Filters for MessageStreams
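Profile-based discovery as described above can be sketched as attribute matching: a component registers a set of attributes under its Binding, and a lookup returns a Binding whose profile contains all requested attributes. The class SketchProfileRegistry, the use of strings for Bindings, and the flat key/value profile format are assumptions made for illustration only; Etherware's profiles are XML documents whose format is not given in the text.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical registry mapping Bindings to attribute-based service profiles. */
final class SketchProfileRegistry {
    // binding (globally unique id) -> profile attributes
    private final Map<String, Map<String, String>> profiles = new LinkedHashMap<>();

    void register(String binding, Map<String, String> profile) {
        profiles.put(binding, new HashMap<>(profile));
    }

    /** Returns the first registered binding whose profile contains every wanted attribute. */
    Optional<String> lookup(Map<String, String> wanted) {
        for (Map.Entry<String, Map<String, String>> e : profiles.entrySet()) {
            if (e.getValue().entrySet().containsAll(wanted.entrySet())) {
                return Optional.of(e.getKey());
            }
        }
        return Optional.empty();
    }
}
```

A car controller would look up, for example, a vision sensor covering its region, without knowing on which node that sensor runs. An empty result corresponds to the case where no matching component is registered.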

All messages have the following three XML tags or constituents:

Recipient Profile. This identifies the recipient of the message. A profile can be a service description as above, or the globally unique Binding of a component.

Content. This represents the contents of the message including application specific information.

Time-stamp. Each message has a time-stamp associated with it which is automatically translated to the local clock on that computer as a message moves from one computer to another.
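As an illustration of the three constituents listed above, the following sketch builds a small XML document with the standard Java DOM API. The element names (message, recipientProfile, content, timeStamp) and the class SketchMessage are assumptions; the actual Etherware message schema and class hierarchy are not given in the text.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

/** Hypothetical message builder; the tag names are illustrative assumptions. */
final class SketchMessage {
    private SketchMessage() { }

    static Document build(String recipientProfile, String content, long timeStampMillis) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element root = doc.createElement("message");
            doc.appendChild(root);
            append(doc, root, "recipientProfile", recipientProfile);
            append(doc, root, "content", content);
            append(doc, root, "timeStamp", Long.toString(timeStampMillis));
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("No XML parser available", e);
        }
    }

    private static void append(Document doc, Element parent, String tag, String text) {
        Element child = doc.createElement(tag);
        child.setTextContent(text);
        parent.appendChild(child);
    }

    /** Reads back the text of the first element with the given tag name. */
    static String field(Document doc, String tag) {
        return doc.getElementsByTagName(tag).item(0).getTextContent();
    }
}
```

Because the document is ordinary XML, a newer component could add further elements without breaking an older receiver that reads only the three fields it knows about.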

By default, messages are delivered reliably and in order. However, there may be streams of messages that need to be delivered using other specifications as well. To identify and manipulate a stream of messages as a separate entity, Etherware supports the notion of a MessageStream. A component can open a MessageStream to another component and send messages through it. MessageStreams have settings that can be used to specify how messages are delivered through them. For example, Controller can tolerate a few lost sensor updates for lower delay, but has no use for old updates. Accordingly, VisionSensor could open a MessageStream to Controller requesting unreliable in-order delivery, as shown in Figure 3. This means that messages along this pipe will not be retransmitted if lost, and messages arriving late will be discarded. In addition, Etherware provides MulticastStreams that support efficient multi-source multi-cast of messages.
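The receiving side of an unreliable in-order stream can be sketched as follows. The sketch assumes that each message carries a monotonically increasing sequence number, which is an assumption of this example; the text does not say how Etherware detects late messages. The class name InOrderUnreliableReceiver is likewise hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical receiver for unreliable in-order delivery: gaps are accepted, late messages dropped. */
final class InOrderUnreliableReceiver {
    private long lastDelivered = -1;
    private final List<String> delivered = new ArrayList<>();

    /**
     * Returns true if the message was delivered. A message older than (or equal to)
     * the last delivered one is discarded; missing messages are never requested again.
     */
    boolean onArrival(long sequenceNumber, String payload) {
        if (sequenceNumber <= lastDelivered) {
            return false;
        }
        lastDelivered = sequenceNumber;
        delivered.add(payload);
        return true;
    }

    List<String> delivered() {
        return delivered;
    }
}
```

With arrivals numbered 0, 1, 3, 2, 4, the message numbered 2 is dropped because a newer update has already been delivered.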

Etherware provides the capability to add, at run time, Filters that can intercept all messages sent to, or received by, a component, since it may be necessary to modify messages in a MessageStream in response to changes in operating conditions. For example, updates from VisionSensor could get noisy due to bad lighting conditions, and it is desirable to have the capability to filter out this noise without having to change Sensor or the Controller. Figure 3 shows the effective configuration after a Kalman filter has been added to the message pipe between VisionSensor and Controller.
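The idea of interposing a Filter on a stream at run time, without touching sender or receiver, can be sketched as below. For brevity the messages are plain numeric samples and the filter is a simple exponential smoother standing in for the Kalman filter of Figure 3; FilteredStream and SmoothingFilter are hypothetical names, not Etherware classes.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.DoubleConsumer;
import java.util.function.DoubleUnaryOperator;

/** Hypothetical stream whose filters can be added while messages are flowing. */
final class FilteredStream {
    private final List<DoubleUnaryOperator> filters = new CopyOnWriteArrayList<>();
    private final DoubleConsumer receiver;

    FilteredStream(DoubleConsumer receiver) {
        this.receiver = receiver;
    }

    /** Adds a filter at run time; neither sender nor receiver code changes. */
    void addFilter(DoubleUnaryOperator filter) {
        filters.add(filter);
    }

    /** Passes the sample through every filter in order, then delivers it. */
    void send(double sample) {
        double v = sample;
        for (DoubleUnaryOperator f : filters) {
            v = f.applyAsDouble(v);
        }
        receiver.accept(v);
    }
}

/** Exponential smoothing, used here only as a stand-in for a Kalman filter. */
final class SmoothingFilter implements DoubleUnaryOperator {
    private final double alpha;
    private boolean primed;
    private double state;

    SmoothingFilter(double alpha) {
        this.alpha = alpha;
    }

    @Override
    public double applyAsDouble(double sample) {
        state = primed ? state + alpha * (sample - state) : sample;
        primed = true;
        return state;
    }
}
```

The sender keeps calling send and the receiver keeps its callback; only the stream configuration changes when lighting conditions degrade.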

The design of a generic Etherware component is shown in Figure 4. It is based on several well known "design patterns" [4]. In software development, many design problems have a common recurring theme, and design patterns represent solutions that exploit the theme. We now consider the various design problems for Etherware components, and describe how these are addressed using appropriate design patterns.

Memento. Support for restarts and upgrades requires the ability to externalize and capture application state. This is solved by the Memento pattern, wherein component state can be check-pointed and restored on re-initialization.

Strategy. System stability is enhanced by the ability to replace components without disrupting service. In particular, if the functional (syntactic) interface used to communicate with the component is invariant, then components can be replaced dynamically. In this case, the Strategy pattern is used in conjunction with the Memento pattern.

Facade. Interaction with various services in the middleware usually requires a component to send messages using function calls. If the component has to directly call functions on the various sub-systems, the structure of these subsystems needs to be programmed in it. This introduces unnecessary dependencies and makes the component and the middleware hard to evolve, as changing the middleware architecture will require the software of all the components to be updated as well. This is eliminated by using the Facade pattern to provide a uniform functional interface that is independent of the actual middleware architecture.

[Figure 4: a Component, consisting of a Control Law (Strategy) and State (Memento), enclosed in a Shell (Facade). Through Messages it exchanges control in and feedback with a Higher Component, control out and feedback with a Lower Component, data in with an Input Component, and data out with an Output Component.]

Fig. 4. Programming model for a Component in Etherware

Components can be active or passive. Passive components do not have any active threads of control. They only respond to incoming messages by processing them appropriately and generating resulting messages if any. Active components have one or more active threads of control. They can generate messages based on activities in their individual threads of control. For example, VisionSensor can be implemented as an active component with a separate thread to block on sensor hardware updates, and generate appropriate messages when new sensor data is available.

B. Etherware architecture

The architecture of Etherware is based on the micro-kernel concept shown in Figure 5. As motivated in Section IV, the Kernel represents the minimum invariant in Etherware. All other services are implemented as components. The Kernel manages all components on a given computer. The function of the Kernel is to create new components, and deliver messages between its (local) components. We allow one Etherware process per node in the current implementation.
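The two Kernel functions named above, creating components and delivering messages between local components, can be sketched in a few lines. This is an illustration only: SketchKernel, the string-valued addresses and messages, and the single-threaded dispatch loop are assumptions, and the sketch merely counts messages that have no local recipient rather than forwarding them to a remote node.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

/** Hypothetical micro-kernel: creates components and delivers messages between them. */
final class SketchKernel {
    private final Map<String, Consumer<String>> components = new HashMap<>();
    private final Deque<String[]> queue = new ArrayDeque<>();   // {recipient, message}

    /** Creates (registers) a local component under the given address. */
    void create(String address, Consumer<String> handler) {
        components.put(address, handler);
    }

    /** Asynchronous send: the message is queued and the sender is not blocked. */
    void send(String recipient, String message) {
        queue.addLast(new String[] {recipient, message});
    }

    /** Delivers all queued messages; returns how many had no local recipient. */
    int dispatchAll() {
        int undeliverable = 0;
        String[] m;
        while ((m = queue.pollFirst()) != null) {
            Consumer<String> handler = components.get(m[0]);
            if (handler != null) {
                handler.accept(m[1]);
            } else {
                undeliverable++;
            }
        }
        return undeliverable;
    }
}
```

Everything else, including scheduling policy and remote delivery, would live outside this core, which is what keeps the invariant part small.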

[Figure 5: the Kernel, which contains the Scheduler, exchanges Messages with Shells that encapsulate a Component, an ActiveComponent, and a ServiceComponent.]

Fig. 5. Architecture of Etherware

The Kernel has a Scheduler that is responsible for scheduling all messages and threads. The Scheduler can be replaced dynamically if more complex scheduling algorithms are necessary as the application evolves. In addition, the Scheduler also provides a notification service, whereby a component can request to be notified with alarms after a given delay. This allows components to wait, without having to create separate threads of control for this purpose. Components can also register to receive periodic notifications or ticks. This has been used to implement all soft real time control in the testbed. The car Controller operates at 10 Hz, and has been implemented as a passive component. For periodic activation, it registers with Scheduler to receive periodic tick messages every 100 ms.
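The tick registration described above can be illustrated with a scheduler driven by simulated time, which keeps the example deterministic. SketchTickScheduler and its methods are hypothetical names; a real implementation could instead build on java.util.concurrent.ScheduledExecutorService, and nothing here is taken from the Etherware Scheduler itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongConsumer;

/** Hypothetical tick service driven by simulated time (milliseconds). */
final class SketchTickScheduler {
    private static final class Registration {
        final long periodMs;
        final LongConsumer listener;
        long nextDueMs;

        Registration(long periodMs, long nextDueMs, LongConsumer listener) {
            this.periodMs = periodMs;
            this.nextDueMs = nextDueMs;
            this.listener = listener;
        }
    }

    private final List<Registration> registrations = new ArrayList<>();
    private long nowMs = 0;

    /** Registers a passive component to be activated every periodMs milliseconds. */
    void registerTick(long periodMs, LongConsumer listener) {
        if (periodMs <= 0) {
            throw new IllegalArgumentException("period must be positive");
        }
        registrations.add(new Registration(periodMs, nowMs + periodMs, listener));
    }

    /** Advances simulated time and delivers every tick that has fallen due. */
    void advanceTo(long timeMs) {
        for (Registration r : registrations) {
            while (r.nextDueMs <= timeMs) {
                r.listener.accept(r.nextDueMs);   // the tick carries its due time
                r.nextDueMs += r.periodMs;
            }
        }
        nowMs = Math.max(nowMs, timeMs);
    }
}
```

A component registered with a 100 ms period receives ten ticks per simulated second, matching the 10 Hz controller in the text, and needs no thread of its own. Note that this sketch delivers ticks registration by registration rather than in global time order.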

Each component is encapsulated in its own Shell as shown in Figure 5. A Shell presents a facade to the component, and provides a uniform interface for it to interact with the rest of the system. Shells also encapsulate component specific information such as configurations of MessageStreams. Activities involved in component restart, upgrade, and migration have also been implemented in Shells.

All other functionality in Etherware is provided by service components. An instance of each service component executes on each computer. The following basic services are used:

ProfileRegistry. The ProfileRegistry is used to register and look up profiles of components. Profiles of newly created components are registered and recipients of messages addressed using profiles are determined by the registry. Each node has a Local ProfileRegistry for components on that node. A network also has a Global ProfileRegistry for components on all nodes in the network.

NetworkMessenger. This sends and receives remote messages, i.e., messages between a local component and a remote component operating on another computer. Hence, it encapsulates all communication with remote nodes over the network, including details such as IP addresses, ports, and transport layer protocols. By default, all messages addressed to remote components are forwarded by the Kernel to this service. The NetworkMessenger is an active component with separate threads to receive messages from remote nodes.

NetworkTimeService. This service translates time-stamps of messages as they are transmitted from one node to another. To implement this, a NetworkTimeService component is added as a Filter to the NetworkMessenger, so that it intercepts all messages that are sent to and received from other nodes. (It should be noted how easy it is to time-stamp information once we have the notion of Filters.) Time translation is based on computing clock offsets using the Control Time Protocol [32].

[Figure 6: an automaton with states Start, Receive, and Operate, and transitions labeled RegisterProfileEvent, RegisterTickEvent, TickEvent, ControlEvent, and OperateCar.]

Fig. 6. Finite state automata for Actuator semantics
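Time-stamp translation between two nodes reduces to applying a clock offset. The sketch below uses the textbook two-way exchange estimate, which assumes symmetric network delay; it is a generic illustration and does not reproduce the Control Time Protocol of [32]. The class and method names are hypothetical.

```java
/** Hypothetical clock-offset helper; a generic estimate, not the Control Time Protocol. */
final class ClockOffsetSketch {
    private ClockOffsetSketch() { }

    /**
     * Estimates (remote clock - local clock) from one request/reply exchange,
     * assuming equal delay in both directions.
     * t1: local send time, t2: remote receive time,
     * t3: remote reply time, t4: local receive time.
     */
    static long estimateRemoteMinusLocal(long t1, long t2, long t3, long t4) {
        return ((t2 - t1) + (t3 - t4)) / 2;
    }

    /** Translates a time-stamp taken on the remote clock into local time. */
    static long toLocalTime(long remoteStamp, long remoteMinusLocal) {
        return remoteStamp - remoteMinusLocal;
    }
}
```

As a worked case, suppose the remote clock runs 500 ms ahead and the one-way delay is 20 ms. A request sent at local time 1000 arrives at remote time 1520, the reply leaves at remote time 1525 and arrives at local time 1045, so the estimate is ((1520 - 1000) + (1525 - 1045)) / 2 = 500, and the remote stamp 1525 translates to local time 1025.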

C. Etherware capabilities

We now describe how the requirements listed in Section III are addressed in Etherware.

Operational Requirements. These requirements are addressed primarily by the Etherware programming model.

Distributed operation. Components executing on different computers can communicate with each other using the same primitives as used for local interactions. The following mechanisms collectively implement location independence in Etherware. Each component is assigned a globally unique-id called a Binding. This addressing scheme is independent of network location and topology. Each Binding is mapped into a network specific address, such as an IP address, and a component id on a given computer. This mapping can change as components migrate. Local routing of messages on a single computer is performed by the Kernel. Routing of messages between computers is done by the NetworkMessenger.

Service description. This is a consequence of addressable components being able to register service profiles. A component that wishes to access a given service can just address messages using the appropriate profile. The profile is then matched with registered profiles by the ProfileRegistry as explained in Section IV-B. If a match is found, then the message is directly forwarded to the appropriate component. If not, an appropriate exception message is returned.

Interface compatibility. This is primarily achieved by use of XML documents for communication between components. Besides, components use a simple and uniform functional interface provided by the Etherware Shell, and as suggested by the Facade pattern in Section IV-A.

Semantics. This requirement can be properly addressed by an interface description language (IDL), which is a language to describe the functional and interaction interface exposed by a component. The Etherware programming model mandates message based interactions between components. This requires components to incorporate interaction semantics that can be formally specified using a finite state machine [33]. It includes specification of the messages that a component can send or receive, and conditions under which it can send or receive specific messages. Figure 6 represents the FSA that specifies Actuator semantics. The labels on the arrows represent messages and actions, with overbars indicating received messages. Etherware component semantics can be formally specified in an executable fashion in the rewriting logic based formal language Maude [34], as below:

mod TESTBED is

  pr RAT .
  pr QID .
  inc CONFIGURATION .

  --- Quoted identifiers serve as object identifiers.
  subsort Qid < Oid .

  sorts ActuatorState State .
  subsort ActuatorState < State .
  sorts Control Controls .
  subsort Control < Controls < Attribute .

  --- The two actuator states used by the rules below.
  ops startA receiveA : -> ActuatorState .
  op state :_ : State -> Attribute .
  op car :_ : String -> Attribute .
  op Actuator : -> Cid .

  --- Messages sent or received by the actuator.
  op registerMsg(_,_) : Oid Qid -> Msg .
  op controlMsg(_,_,_) : Oid Int Rat -> Msg .
  op actionMsg(_,_) : Int Rat -> Msg .
  op tickMsg(_) : Oid -> Msg .

  --- Buffered controls form a list with identity nil.
  op control(_,_) : Int Rat -> Control .
  op nil : -> Controls .
  op _ _ : Controls Controls -> Controls [assoc id: nil] .
  op newActuator(_,_) : Qid String -> Object .

  var A : Oid .
  var S : String .
  var I : Int .
  var R : Rat .
  var AS : AttributeSet .
  var C : Control .
  var CS : Controls .

  eq newActuator(A,S) = < A : Actuator | state : startA , car : S, nil > .

  --- Leave the start state and emit a registration message.
  rl [a-init] : < A : Actuator | state : startA , AS >
    => < A : Actuator | state : receiveA , AS > registerMsg(A,A) .

  --- Append a received control to the buffered controls.
  rl [a-recv] : < A : Actuator | state : receiveA , AS , CS > controlMsg(A,I,R)
    => < A : Actuator | state : receiveA , AS , CS control(I,R) > .

  --- On a tick, consume the first buffered control and emit it as an action.
  rl [a-send] : < A : Actuator | AS , control(I,R) CS > tickMsg(A)
    => < A : Actuator | AS , CS > actionMsg(I,R) .
endm

Components and systems specified in this fashion can be verified for validity and correctness using theorem proving tools in Maude. XML provides the representation for strongly typed messages, where the type and format of all possible messages can be defined as part of the interface.
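For readers more familiar with Java than Maude, the three rewrite rules of the Actuator specification (a-init, a-recv, a-send) can be mirrored by a small state machine. This is only an informal rendering under assumptions: the class ActuatorAutomaton is hypothetical, the rational value is approximated by a double, and the field names i and r simply follow the Maude variables I and R, whose physical meaning the specification does not state.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Optional;

/** Hypothetical Java rendering of the Actuator rewrite rules. */
final class ActuatorAutomaton {
    enum State { START, RECEIVE }

    /** Mirrors control(I,R); the Rat component is approximated by a double. */
    static final class Control {
        final int i;
        final double r;

        Control(int i, double r) {
            this.i = i;
            this.r = r;
        }
    }

    private State state = State.START;
    private final Deque<Control> controls = new ArrayDeque<>();

    /** Rule a-init: leave START, enter RECEIVE, and emit a registration message. */
    String init() {
        if (state != State.START) {
            throw new IllegalStateException("already initialized");
        }
        state = State.RECEIVE;
        return "registerMsg";
    }

    /** Rule a-recv: in RECEIVE, append the received control to the buffered sequence. */
    void onControl(int i, double r) {
        if (state != State.RECEIVE) {
            throw new IllegalStateException("not ready to receive controls");
        }
        controls.addLast(new Control(i, r));
    }

    /** Rule a-send: on a tick, consume the oldest buffered control, if any, as the action. */
    Optional<Control> onTick() {
        return Optional.ofNullable(controls.pollFirst());
    }

    State state() {
        return state;
    }
}
```

Unlike the Maude module, this rendering cannot be model checked; it only makes explicit which messages are accepted in which state, which is the interaction semantics the IDL is meant to capture.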

Note that the IDL provided by Etherware is only a language for specifying component interface and interaction semantics. The actual semantics for a given component IDL specification are highly dependent on its domain of operation. For instance, the IDL specification of a VisionSensor component could specify the refresh rate using an XML tag called "RefreshRate," but for other components to correctly use its sensory output, they must understand that the "RefreshRate" tag identifies the specification of the refresh rate of the vision sensor, i.e., they must have a common shared vocabulary with consistent meanings.

In addition, an Etherware proxy service has also been implemented, which allows formal component specifications in Maude [34] to interact with regular components in Etherware. This enables rapid prototyping, where formal specifications can be checked for both logical correctness and operational stability. For instance, the Actuator module specified above can be plugged into the actual testbed, and could emulate a real car in the system.

Distributed time. This is provided by the NetworkTimeService.

Non-functional requirements. These requirements have been addressed by various architectural features and services in Etherware.

Robustness. This is facilitated primarily by the use of the Memento pattern. It allows the effect of component failures to be contained by efficient check-point and restart mechanisms in Etherware. The efficiency of these techniques is considered in Section VIII.

Delay-reliability trade-off. MessageStreams support trading off reliability for lower delays. They also provide a simple mechanism to incorporate support for other QoS requirements.

Other requirements. The current algorithms for various services scale well for the testbed requirements. We have tested them by operating up to eight cars at a time, which represents the scenario with the maximum load in the system. Further, component Shells provide a uniform location to potentially incorporate security features.

The Etherware architecture essentially trades off system performance (verbose protocols, Java VMs, soft real-time operation, etc.) to support complex functionality and interactions between components. However, high performance critical control loops can still be implemented using a locally optimized language and platform, and these can then be "wrapped" with an Etherware component to plug them into a larger system. In this design, Etherware acts as an integration platform for complex networked control systems, with higher-level supervisory control implemented for soft real-time operation, connected to lower-level critical control loops implemented on native platforms for hard real-time operation.

Management Requirements. These are addressed using configuration support and design:

Startup. Start-up configurations and dependencies are specified to Etherware using a configuration file for each computer. Etherware ensures that components are initialized in the correct order. Currently, the absolute order in which components need to be initialized on a given node is specified, but more complicated dependencies could easily be supported using a richer dependency evaluation system such as Apache Ant [35].

System evolution. System evolution through component update and migration is supported by the use of the Memento pattern. Application state is maintained across both operations using Mementos. Connections using MessageStreams are maintained across component updates. As noted in Section IV-A, MessageStreams and Filters also provide a mechanism for system evolution without having to change any existing connections between components.

V. SERVICES

Beyond the Etherware infrastructure itself, many control applications share common sub-problems whose solutions could be reused. We facilitate such reuse by packaging these solutions as services bundled with the middleware. Some of the services that we have considered are:

Distributed time management. We have implemented a clock synchronization algorithm.

Plant delay estimation. A dynamic delay estimation service is of interest for closing loops since delays may change with time.

Component repositories. Software component repositories, from where new control software can be downloaded and installed dynamically into operational systems, would be useful in promoting use of new theories and algorithms.

Dynamic system optimization. This is a useful service to facilitate automatic determination of optimal system configuration at run-time. For instance, whether to process all the pixels from the camera at a certain node, thus stressing its computational resource, or to transfer them to another node, thus stressing the communication network, is a low level decision that depends on the power of the processor at a node as well as the current communication traffic load, and should be automatically done. The control designer's role can thus move to the higher plane of specifying high level rules or algorithms for such migration.

[Figure 7: the OSI stack (Physical, Datalink, Network, Transport, Session, Presentation, and Application layers) shown beside the proposed stack, which consists of Network Infrastructure (including IP and TCP, UDP), Virtual Collocation (Middleware), and Control software.]

Fig. 7. Mapping from OSI stack to proposed architecture

VI. A L N C

The Open Systems Interconnection (OSI) [13] architecture is a standard reference model for networked systems. Our proposed architecture is based on this model, and the mapping between the different layers is shown in Figure 7. Central to it is the abstraction of a virtually collocated system, where information exchange between application components is supported at the level of messages exchanged between seemingly local components. This abstraction is provided by the middleware, which hides the details of a networked system such as IP addresses, network protocols, data formats, and computational resources. Components can be identified using application specific content instead of network addresses and ports. They can exchange information on a regular basis, as information push whenever updates are available, or as pull on demand by an information consumer.

Fig. 8. A State Estimator separating Sensor and Controller can reduce execution and timing dependencies between them.

Hardware or software can be added or removed while the system is running, and the designer need not deal with issues like starting up computers in an appropriate order. Also, clocks are aligned and all sensor and other information is automatically time-stamped. These features correspond to the Session and Presentation layers shown in Figure 7. Utilizing the abstraction of a virtually collocated system provided by middleware, a system designer can focus on algorithmic code for planning, scheduling, control, adaptation, estimation, or identification.
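To illustrate what virtual collocation looks like from a component's point of view, here is a minimal in-process sketch. It is not the Etherware API: `VirtualBus`, its method names, and the topic strings are hypothetical, and a real middleware would route these calls across the network. The sketch shows content-based naming, push and pull delivery, and automatic time-stamping.

```python
import time
from collections import defaultdict
from typing import Any, Callable, Dict, List, Optional, Tuple

Handler = Callable[[float, Any], None]


class VirtualBus:
    """Toy stand-in for the middleware: routes by topic, never by address."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self._clock = clock
        self._subscribers: Dict[str, List[Handler]] = defaultdict(list)
        self._latest: Dict[str, Tuple[float, Any]] = {}

    def subscribe(self, topic: str, handler: Handler) -> None:
        """Register for push delivery of every update on `topic`."""
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, value: Any) -> float:
        """Time-stamp `value`, remember it, and push it to subscribers."""
        stamp = self._clock()
        self._latest[topic] = (stamp, value)
        for handler in list(self._subscribers[topic]):
            handler(stamp, value)
        return stamp

    def pull(self, topic: str) -> Optional[Tuple[float, Any]]:
        """Return the most recent (stamp, value) on demand, or None."""
        return self._latest.get(topic)


bus = VirtualBus()
bus.subscribe("car1/position", lambda t, v: print("pushed", v))
bus.publish("car1/position", (0.5, 0.1))   # the sensor never names the controller
print("pulled", bus.pull("car1/position"))
```

The publisher and the subscriber refer only to the content name `car1/position`; neither holds an IP address or port for the other, which is the property the abstraction is meant to provide.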

VII. DESIGN FOR LOCAL TEMPORAL AUTONOMY

Networked control systems will be subject to communication and node failures as well as component failures. We suggest the local temporal autonomy design principle: Design components to tolerate, with graceful degradation, failures in connections to, and operation of, other connected components. We provide three illustrative examples.

State Estimator: To match the unpredictable characteristics of the communication network with the predictability required by control, we insert a buffer, a StateEstimator, before controllers in the system; see Figure 8. It removes the execution and timing dependencies of Controller on the inherently unreliable communication channel, since it can be used by Controller to interpolate between lost sensor updates. Such a design can take aperiodic sensor data as input, and provide continuous or periodic outputs to the Controller, the format preferred by digital control design methodology. This improves local temporal autonomy of Controller, since it buffers it from failures in the sensor and the communication channel.

Actuator Buffer: Similarly, at the actuator end, instead of providing just one current control input at each update to an actuator, by using receding horizon control, a Controller can provide a window of control inputs up to a time horizon. This again improves the local temporal autonomy of Actuator as it now has fall-back controls in case subsequent controller updates do not arrive.
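The two buffers described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the implementation used on the testbed: the class interfaces are hypothetical, the estimator uses plain linear extrapolation from the last two samples of a scalar position, and the actuator falls back to a fixed safe control once its window is exhausted.

```python
from typing import List, Optional, Tuple


class StateEstimator:
    """Turns aperiodic, time-stamped samples into estimates on demand."""

    def __init__(self) -> None:
        self._prev: Optional[Tuple[float, float]] = None
        self._last: Optional[Tuple[float, float]] = None

    def update(self, stamp: float, position: float) -> None:
        if self._last is not None and stamp <= self._last[0]:
            return  # ignore stale or duplicate samples
        self._prev, self._last = self._last, (stamp, position)

    def estimate(self, now: float) -> Optional[float]:
        if self._last is None:
            return None                # nothing received yet
        if self._prev is None:
            return self._last[1]       # hold the only sample
        (t0, x0), (t1, x1) = self._prev, self._last
        velocity = (x1 - x0) / (t1 - t0)
        return x1 + velocity * (now - t1)  # bridge the gap left by lost updates


class ActuatorBuffer:
    """Holds a window of future controls from a receding horizon controller."""

    def __init__(self, period: float, safe_control: float = 0.0) -> None:
        self._period = period
        self._safe = safe_control
        self._start = 0.0
        self._plan: List[float] = []

    def load(self, start: float, plan: List[float]) -> None:
        """Replace the window with controls valid from time `start`."""
        self._start, self._plan = start, list(plan)

    def control_at(self, now: float) -> float:
        index = int((now - self._start) // self._period)
        if 0 <= index < len(self._plan):
            return self._plan[index]   # fall-back control from the last window
        return self._safe              # window exhausted or not yet valid


estimator = StateEstimator()
estimator.update(0.0, 0.0)
estimator.update(1.0, 2.0)
print(estimator.estimate(1.5))

buffer = ActuatorBuffer(period=0.5)
buffer.load(start=10.0, plan=[1.0, 0.8, 0.6])
print(buffer.control_at(10.2), buffer.control_at(12.0))
```

In both classes the consumer queries by time rather than waiting for a message, so a late or lost packet changes the quality of the answer but not whether an answer is available.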

By weakening the dependence between components, these approaches also facilitate dynamic component upgrade and migration, in addition to simplifying design and testing.

Migration for self-optimization and reliability: Components in the networked control system can execute at any location. Their optimal physical locations depend upon timing constraints, communication delays, computational loads, etc., which may vary during operation. Hence it would be desirable for the execution of a component to be able to migrate to a better location within the system. To accomplish migration, the middleware incorporates several supporting functions, such as enabling the components to continue to communicate after the move. This involves updating the communication information at each of the nodes or components which communicate with the migrating component, as well as updating the communication information at the component itself. To help an application optimize decisions on when and where to migrate a component, the middleware must be able to estimate the loads on the physical resources which will exist before and after the migration; this estimation can be provided as a middleware service.

VIII. EXPERIMENTS

This section presents three experiments evaluating the performance of Etherware mechanisms.

A. Controller Restart

Fig. 9. Performances of controller restart, upgrade and migration. Each panel plots the deviation from trajectory (mm) against time (seconds). (a) Error in car trajectory due to controller restarts; markers show the times of controller down and controller up, annotated with the elapsed time to recover (45, 83, 39, 3219, 46, 42, and 128 ms). (b) Error in car trajectory due to controller upgrade; markers show old controller down and new controller up, with an elapsed time to recover of 47 ms. (c) Error in car trajectory due to controller migration; markers show old controller down and migrated controller up, with a time to migrate of 491 ms.

In the first experiment, Controller was restarted several times. Faults were injected at random by performing an illegal operation (divide by zero) in Controller, which caused Controller to raise exceptions and be restarted by Etherware. The deviation of the actual car positions from the desired trajectory, as a function of time, is shown in Figure 9(a). Restarts are indicated by pointers, and the accompanying numbers indicate, in milliseconds, the time for each restart. These are time-stamps at Observer, and include communication and synchronization times between the restarted Controller and Observer. The first restart occurred at about 70 seconds into the experiment, and was followed by two other restarts in the next 20 seconds. The last three faults were also handled by the restart mechanisms. We see that the error in the car position during these restarts was within the system error bounds during normal operation prior to restart.

Two of the Etherware mechanisms described in Section IV contributed to the quick recoveries. First, the Shell intercepted exceptions due to the Controller faults, and restarted it without affecting the MessageStream connections to the other components. Second, before termination, the Controller state was check-pointed according to the Memento pattern, and then used for re-initialization. To illustrate the performance of these two mechanisms, we also tested the alternative mechanism of restarting the Etherware process managing the Controller at about 100 seconds after the start. We see that the restart of Etherware and the Controller took about three seconds, during which the car position accumulated a large error of about 0.8 meters. This illustrates the necessity for efficient restarts. Furthermore, even though the Controller restarted after three seconds, additional error was accumulated before recovery. This was because the Controller had to reconnect to the other components, rebuild the state of the car, and bring it back on track. These avoidable delays are eliminated through efficient restart mechanisms in Etherware.
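The interplay of the two mechanisms can be sketched in a few lines. This is an illustrative toy, not Etherware code: the `Shell` and `Controller` classes, the `save_memento` method, and the fault trigger are all hypothetical, and the sketch assumes the fault leaves the component state consistent so that a check-point taken just before termination is usable.

```python
from typing import Callable, Dict, Optional

Memento = Dict[str, float]


class Controller:
    """Toy integrating controller whose only state is the error integral."""

    def __init__(self, memento: Optional[Memento] = None) -> None:
        self.integral = memento["integral"] if memento else 0.0

    def save_memento(self) -> Memento:
        return {"integral": self.integral}

    def step(self, error: float) -> float:
        # Injected fault for the demo: very large errors trigger a divide by zero.
        gain = 1.0 / (0.0 if error > 100.0 else 2.0)
        self.integral += error
        return gain * error + 0.1 * self.integral


class Shell:
    """Wraps a component, intercepts its faults, and restarts it in place."""

    def __init__(self, factory: Callable[[Optional[Memento]], Controller]) -> None:
        self._factory = factory
        self._component = factory(None)
        self.restarts = 0

    def deliver(self, error: float) -> Optional[float]:
        try:
            return self._component.step(error)
        except ZeroDivisionError:
            memento = self._component.save_memento()   # check-point before termination
            self._component = self._factory(memento)   # restart with the saved state
            self.restarts += 1
            return None  # callers keep their reference to the Shell throughout


shell = Shell(Controller)
print(shell.deliver(1.0))
print(shell.deliver(500.0))   # fault: the Shell restarts the Controller
print(shell.deliver(1.0), shell.restarts)
```

Because other components hold a reference to the Shell rather than to the Controller, the restart does not disturb their connections, and the restored integral means the new instance does not have to rebuild its state from scratch.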

B. Controller Upgrade

In the second experiment, we tested the performance of software upgrade mechanisms in Etherware. The car is initially controlled by a coarse Controller that operates myopically. Etherware is then commanded, at about 90 seconds, to upgrade the coarse Controller to a better model predictive Controller. From Figure 9(b), we see the improvement in the car operation after update. The involved transients are within the system error bounds as well. This functionality is due to three key Etherware mechanisms from Section IV. First, the Strategy pattern allows one Controller to be replaced by another without any changes to the rest of the system. Second, the Shell is able to upgrade the Controller without affecting the connections to the other components. Finally, the Memento pattern allows the coarse Controller to check-point its state before termination. This is then used to initialize the new Controller. The first mechanism allows for simple upgrades, while the other two mechanisms minimize the impact of the upgrade on other components and the car operation.
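A minimal sketch of an upgrade that combines the Strategy and Memento patterns is given below. The class names, the `upgrade` method, and both control laws are hypothetical stand-ins chosen for brevity; in particular, `PredictiveController` is a simple trend-based law and not a model predictive controller.

```python
from typing import Callable, Dict

Memento = Dict[str, float]


class CoarseController:
    """Myopic bang-bang law: reacts only to the sign of the current error."""

    def __init__(self) -> None:
        self.last_error = 0.0

    def compute(self, error: float) -> float:
        self.last_error = error
        return 1.0 if error > 0 else (-1.0 if error < 0 else 0.0)

    def save_memento(self) -> Memento:
        return {"last_error": self.last_error}


class PredictiveController:
    """Stand-in for a better controller; starts from the old controller's state."""

    def __init__(self, memento: Memento) -> None:
        self.last_error = memento.get("last_error", 0.0)

    def compute(self, error: float) -> float:
        trend = error - self.last_error
        self.last_error = error
        return 0.6 * error + 0.3 * trend

    def save_memento(self) -> Memento:
        return {"last_error": self.last_error}


class ControllerSlot:
    """Fixed point of contact for the rest of the system (Strategy context)."""

    def __init__(self, strategy) -> None:
        self._strategy = strategy

    def compute(self, error: float) -> float:
        return self._strategy.compute(error)

    def upgrade(self, make_new: Callable[[Memento], object]) -> None:
        # Check-point the old strategy and hand its state to the replacement.
        self._strategy = make_new(self._strategy.save_memento())


slot = ControllerSlot(CoarseController())
print(slot.compute(2.0))
slot.upgrade(PredictiveController)
print(slot.compute(1.0))
```

Callers only ever see `ControllerSlot.compute`, so the swap requires no change elsewhere, and the new controller's first output already reflects the error history recorded by the old one.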

C. Component Migration

In the third experiment, the support for migration in Etherware was tested. As before, the error in the car trajectory is shown in Figure 9(c). The large spike at the beginning of the graph was the error due to the car trying to catch up with its trajectory. This is achieved at about 10 seconds into the experiment, after which the car follows the trajectory within an error of 50 mm. In particular, the error introduced due to migration is well within the operational error of the car. Two Etherware mechanisms enable migration. First, the Memento pattern allows the current state of Controller to be captured upon its termination. Second, the primitives in the Kernel on the remote computer allow a new controller to be started there with the check-pointed state of Controller.

IX. RELATED WORK

This section presents an overview of related earlier work in this area. CORBA [26] is probably the most popular middleware and has been used in a variety of different domains. It was primarily intended for transaction-based business and enterprise systems. Hence, the trade-offs incorporated in CORBA are not all compatible with requirements for control systems. For example, the specification mandates the use of TCP [36] for reliable delivery. This can be a serious limitation if lower delay is more important than reliability, as such a trade-off cannot be supported. Popular versions of middleware for control applications are based on various flavors of CORBA, such as Real-Time CORBA [37] and Minimum CORBA [38]. For example, OCP [39] is based on Real-Time CORBA and has been used to control unmanned aerial vehicles. ROFES [40] implements Real-Time CORBA and Minimum CORBA, and is targeted for real-time applications in embedded systems. Fault-tolerant CORBA (FT-CORBA) [26] is the primary OMG specification that addresses fault tolerance in distributed systems. A key problem in using CORBA based middleware is that the CORBA interface description language (IDL) is not descriptive enough to specify key assumptions in component design. In particular, issues involved in semantic compatibility, such as description of data validity and interaction semantics, cannot be specified. This leads to numerous problems while integrating independently developed, yet functionally compatible components. Other interesting approaches include Giotto [41], a real-time framework [42] for robotics and automation, and OSACA [43] for automation. A good overview of research and technology that has been developed for implementing reusable, distributed control systems is provided in [44].

X. THEORY

Exploiting the capabilities of Etherware poses theoretical challenges. At the tactical level, we need to be able to optimally exploit middleware capabilities already present, such as migration, restart, and the services provided. Two examples of problems that arise are the following. Since the middleware provides to the controller the actual delay that has already been incurred by a packet containing a measurement, i.e., a random but known delay, how should one design a controller that is robust to plant uncertainties? Another is how to optimize the “migration controller” as empirical delay profiles between various origin-destination node pairs over a networked system change.

At the strategic level, there is a need for a cross-domain theory that is holistic and includes real-time scheduling, plant uncertainties, and hybrid systems. We present an illustrative example. Consider the car control loop model shown in Figure 10. The Controller and the Actuator are collocated, see [45], and there is negligible delay between a Controller issuing a control, and the Actuator effecting it in the car. Suppose the feedback channel from the Sensor to the Controller is scheduled according to the renewal task model [46], where a task is a sequence of jobs that execute on a specified resource, and where each job has a fixed computation time C and a relative deadline D, with the renewal referring to the fact that a new job is released immediately after a previous job is completed. A renewal task is characterized by its task density ρ = C/D. In our setting, each job is a computation of car controls, speed and orientation, to be applied for a specific time interval.

Fig. 10. Control Loop Model (blocks: Controller, Sensor, Plant, Actuator; annotations: renewal guarantees, co-located components, zero delay).

We model a car itself as an oriented point in a lane, with state (d, θ), where d is the absolute distance of the car from the center of the lane, and θ is the angle of the car to the median of the lane. As a convention, the direction of traversal of the lane is assumed to be from left to right in what follows. We assume that a car has a single non-zero speed s, and four steering controls: two in anti-clockwise direction, and two in clockwise direction, both allowing motion along circles of radii R and R with R < R. Based on the renewal task model for sensory feedback, a car can move a bounded distance before it is observed again. Since the car moves at a constant speed s, this corresponds to a deadline of Rα/s in the renewal task model. Finally, a control is sent immediately after the car has been observed.
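As a worked example of the two quantities just introduced, the deadline D = Rα/s and the task density ρ = C/D can be evaluated numerically. The function names and all numerical values below are hypothetical and chosen only for illustration; the `radius` argument stands for the turning radius that appears in the deadline expression.

```python
import math


def renewal_deadline(radius: float, alpha: float, speed: float) -> float:
    """Relative deadline D = R * alpha / s, in seconds."""
    if speed <= 0:
        raise ValueError("speed must be positive")
    return radius * alpha / speed


def task_density(computation_time: float, deadline: float) -> float:
    """Task density rho = C / D of a renewal task."""
    return computation_time / deadline


# Illustrative values: 0.5 m radius, alpha = pi/3 rad, 0.25 m/s, 0.1 s per job.
D = renewal_deadline(radius=0.5, alpha=math.pi / 3, speed=0.25)
rho = task_density(computation_time=0.1, deadline=D)
print(f"D = {D:.3f} s, rho = {rho:.4f}")
```

For these values the car must be observed again within about 2.09 s, and a control computation taking 0.1 s gives a task density of roughly 0.048, so the feedback channel would be occupied by this car for under 5% of the time.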

Theorem 10.1: A set of cars can be driven along pre-specified routes without collisions (safety guarantee) or gridlocks (liveness guarantee), if the road network has straight single-lane roads of length at least $L = \frac{2\gamma R R}{R - R}$, and lane-width at least $2(W + R(1 - \cos\gamma))$ with $W = R(1 - \cos\beta(2\cos\alpha - 1))$, $\alpha \le \pi/3$, and $\beta = \cos^{-1}(2\cos\alpha - 1)$.

Proof. We provide a brief outline; see [47] for details. First, it can be proved that a car can be controlled so that it stays within a road of width $W = R(1 - \cos\beta(2\cos\alpha - 1))$. Then it can be proved that the length of the road is sufficient to keep the car within its lane when moving from one lane to another at an intersection. Third, it can be shown that a set of cars in a road network with single lane roads can be cleared by a supervisory algorithm described in [?]. This algorithm models the traffic system as a discrete graph, where a lane is modeled as a sequence of bins. The algorithm then moves cars between bins in discrete time. This establishes the liveness guarantee.

Next, it can be shown that the cars can be scheduled without collisions by ensuring that at most one car can be in a bin at a given time. This is enforced during bin transitions by keeping the bin in front of each car empty. This, in turn, is accomplished by associating an additional “virtual” car always in front of each real car. This final step establishes the safety guarantee for the system. □
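The bin-based argument can be illustrated on a single lane. The sketch below is not the supervisory algorithm cited in the proof; it is a toy rule written for this illustration, under the assumptions that the lane is a straight line of bins, that cars start at least two bins apart, and that a car leaving the last bin exits the lane.

```python
from typing import List


def is_safe(positions: List[int]) -> bool:
    """True if every car has an empty bin in front of it (gaps of at least 2)."""
    ordered = sorted(positions)
    return all(b - a >= 2 for a, b in zip(ordered, ordered[1:]))


def step(positions: List[int], num_bins: int) -> List[int]:
    """Advance all cars by one discrete time step.

    A car in bin p moves to bin p + 1 only if bins p + 1 and p + 2 are both
    free in the current snapshot, so the bin in front of it (its "virtual"
    car) is still empty after the move.
    """
    occupied = set(positions)
    moved = []
    for p in sorted(positions):
        if p + 1 in occupied or p + 2 in occupied:
            moved.append(p)          # blocked: wait in place
        elif p + 1 < num_bins:
            moved.append(p + 1)      # advance one bin
        # otherwise the car leaves the lane
    return moved


def clear_lane(positions: List[int], num_bins: int, max_steps: int = 1000) -> int:
    """Run until the lane is empty, checking safety at every step."""
    steps = 0
    while positions:
        positions = step(positions, num_bins)
        assert is_safe(positions), "two cars ended up too close"
        steps += 1
        if steps > max_steps:
            raise RuntimeError("lane did not clear: possible gridlock")
    return steps


print(clear_lane([0, 2, 4], num_bins=6))
```

The safety check never fires because a car that moves had at least two free bins ahead of it, and the lane always clears because the leading car is never blocked; these mirror, in miniature, the safety and liveness steps of the outline above.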

As we build more complex systems, a challenge is to be able to automate such proofs of overall system correctness.

XI. CONCLUSION

This paper has focused on the mechanism half of the policy-mechanism divide, and proposed an abstraction and a middleware based approach to support the design, development, and deployment of networked control systems. The Etherware design supports the development of applications as collections of components. The support for standardized interfaces and seamless integration enables the reuse of components in other applications, which allows the building of a library of components embodying various control algorithms and designs that can be assembled into working applications in deployed systems. Application development consists of specifying the interconnection of components and their individual configurations. Using Etherware primitives, system optimization can be supported by developing algorithms to automatically determine optimal component interconnection configurations, and migrate components accordingly.

ACKNOWLEDGMENT

We are grateful to Lui Sha for enthusiastic discussions and much wisdom over several years.

REFERENCES

[1] D. A. Mindell, Between Human and Machine: Feedback, Control, and Computing Before Cybernetics. JHU Press, 2004.
[2] J. Stankovic, “Vest: A toolset for constructing and analyzing component based operating systems for embedded and real-time systems,” Univ. of Virginia, Tech. Rep. TRCS-2000-19, 2000.
[3] A. L. K. Carter and N. McNeil, “Unlicensed and unshackled: A joint OSP-OET white paper on unlicensed devices and their regulatory issues,” http://hraunfoss.fcc.gov/edocs_public/attachmatch/DOC-234741A1.pdf, May 2003.
[4] E. Gamma, R. Helm, R. Johnson, and J. Vlissides, Design Patterns, ser. Professional Computing Series. Addison-Wesley, 1995.
[5] I. Sommerville, Software Engineering. Addison-Wesley, 2000.
[6] R. Bannatyne, “Microcontrollers for the automobile,” Micro Control Journal, 2003, http://www.mcjournal.com/articles/arc105/arc105.htm.
[7] R. Sanz and K.-E. Arzen, “Trends in software and control,” IEEE Control Systems Magazine, June 2003.
[8] R. J. Pehrson, “Software development for the Boeing 777,” The Boeing Company, Tech. Rep., 1996.
[9] H. P. Barendregt and E. Barendsen, “Introduction to lambda calculus,” in Aspenæs Workshop on Implementation of Functional Languages, Goteborg, 1988.
[10] J. von Neumann, “First draft of a report on the EDVAC,” June 1945.
[11] C. H. Ferguson and C. R. Morris, Computer Wars - The Fall of IBM and the Future of Global Technology. Three Rivers Press, 1994.

[12] L. G. Valiant, “A bridging model for parallel computation,” Communications of the ACM, vol. 33, no. 8, August 1990.
[13] W. R. Stevens, TCP/IP Illustrated.
[14] V. Kawadia and P. R. Kumar, “A cautionary perspective on cross layer design,” IEEE Wireless Communication Magazine, vol. 12, no. 1, pp. 3–11, Feb 2005.
[15] CORBA Components, Version 3.0, Object Management Group, 2002.
[16] Microsoft .NET, Microsoft Inc, http://www.microsoft.com/net/.
[17] V. Matena and B. Stearns, Applying Enterprise JavaBeans: Component-Based Development for the J2EE Platform, ser. Java Series. Sun Microsystems Press, Jan 2001.
[18] CORBA Success Stories, OMG Inc, http://www.corba.org/success.htm.
[19] Middleware is Everywhere, IBM, http://www-306.ibm.com/software/.
[20] Microsoft .NET, Microsoft Inc, http://www.microsoft.com/net/.
[21] Java 2 Platform, Enterprise Edition (J2EE), Sun Microsystems, http://java.sun.com/j2ee/.
[22] “IT convergence lab,” http://decision.csl.uiuc.edu/~testbed/.
[23] IEEE 802 LAN/MAN Standards Committee, “Wireless LAN medium access control (MAC) and physical layer (PHY) specifications,” IEEE Standard 802.11, 1999 edition, 1999.

[24] A. Giridhar and P. R. Kumar, “Scheduling Traffic on a Network of Roads,” IEEE Trans on Veh Tech, vol. 55, pp. 1467–1474, Sep 2006.
[25] “The Official Bluetooth Website,” http://www.bluetooth.com/.
[26] CORBA: Core Specification, Object Management Group, Dec 2002.
[27] S. Kowshik, G. Baliga, S. Graham, and L. Sha, “Co-design based approach to improve robustness in networked control systems,” in Proceedings of the International Conference on Dependable Systems and Networks, Yokohama, Japan, June 2005.
[28] K.-D. Kim and P. R. Kumar, “Architecture and mechanism design for real-time and fault-tolerant etherware for networked control,” in Proc 17th IFAC World Congress, Seoul, July 2008, pp. 9421–9426.
[29] XML, W3C - World Wide Web Consortium, Oct 2000.
[30] “Java 2 platform (j2se),” http://java.sun.com/j2se/.
[31] Applied Operating System Concepts, John Wiley, 2000.
[32] S. Graham and P. R. Kumar, “Time in general-purpose control systems: The control time protocol and an experimental evaluation,” in Proc 43rd IEEE CDC, pp. 4004–4009, Dec 2004.
[33] A. V. Aho and J. D. Ullman, Foundations of Computer Science. W. H. Freeman and Co, 1995.
[34] M. Clavel, F. Duran, S. Eker, P. Lincoln, N. Marti-Oliet, J. Meseguer, and C. Talcott, Maude Manual (Version 2.1), http://maude.cs.uiuc.edu/maude2-manual/, Mar 2004.
[35] “Apache ant,” http://ant.apache.org.
[36] T. Socolofsky and C. Kale, RFC 1180 - TCP/IP Tutorial.
[37] Real-Time CORBA Specification Version 2.0, OMG, Inc, Nov 2003.
[38] Minimum CORBA Specification, OMG, Inc, Aug 2002.
[39] L. Wills, S. Sander, S. Kannan, A. Kahn, J. V. R. Prasad, and D. Schrage, “An open control platform for reconfigurable, distributed, hierarchical control systems,” in Proceedings of the Digital Avionics Systems Conference, Oct 2000.
[40] “Rofes: Real-time corba for embedded systems,” http://www.lfbs.rwth-aachen.de/users/stefan/rofes/.

[41] T. Henzinger, B. Horowitz, and C. Kirsch, “Giotto: A time-triggered language for embedded programming,” in Proc. 1st Int. Workshop Embedded Software (EMSOFT ’01), Tahoe City, CA, 2001.
[42] A. Traub and R. Schraft, “An object-oriented realtime framework for distributed control systems,” in Proc. IEEE Conference on Robotics and Automation, Detroit, MI, May 1999.
[43] P. Lutz, W. Sperling, D. Fichtner, and R. Mackay, “Osaca - the vendor neutral control architecture,” in Proc. European Conference on Integration in Manufacturing, Dresden, Germany, Sept 1997, pp. 247–256.
[44] B. S. Heck, L. M. Wills, and G. J. Vachtsevanos, “Software technology for implementing reusable, distributed control systems,” IEEE Control Systems Magazine, vol. 23, no. 1, February 2003.
[45] C. Robinson and P. R. Kumar, “Optimizing controller location in networked control systems with packet drops,” To appear in IEEE Journal on Selected Areas in Communications, Special Issue on Control and Communications, 2008.
[46] C.-C. Han, K.-J. Lin, and C.-J. Hou, “Distance-constrained scheduling and its applications to real-time systems,” IEEE Transactions on Computers, vol. 45, no. 7, pp. 814–826, 1996.
[47] G. Baliga, “A Middleware Framework for Networked Control Systems,” Ph.D. dissertation, Univ. of Illinois at Urbana-Champaign, 2005.