
RICE UNIVERSITY

Distributed Multi-Scale Data Processing for Sensor

Networks

by

Raymond S. Wagner

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE

Doctor of Philosophy

Approved, Thesis Committee:

Richard G. Baraniuk, Chair, Victor E. Cameron Professor, Electrical and Computer Engineering

David B. Johnson, Associate Professor, Computer Science

T. S. Eugene Ng, Assistant Professor, Computer Science

Albert Cohen, Professor, Laboratoire Jacques-Louis Lions, Université Pierre et Marie Curie

Houston, Texas

APRIL 2007


Abstract

Distributed Multi-Scale Data Processing for Sensor Networks

by

Raymond S. Wagner

Wireless sensor networks provide a challenging application area for signal processing. Sensor networks are collections of small, battery-operated devices called sensor nodes, each of which is capable of sensing data, processing data with an onboard microprocessor, and sharing data with other nodes by forming a wireless, multi-hop network. Since communication power consumption in nodes typically dominates over sensing and processing power consumption by orders of magnitude, it is often more efficient to pose questions on measured data in a distributed fashion within the network than it is to collect data at a single location for centralized processing. Under this model, nodes collaborate with each other in some neighborhood using localized communications and in-network processing to compute answers to users' questions, which are then sent over more costly, long-haul links to a data sink.

In this thesis, our contributions to distributed data processing in sensor networks fall into two main categories. First, we develop a new class of multi-scale distributed data processing algorithms based on distributed wavelet analysis. Specifically, we formulate and analyze a novel, distributed wavelet transform (WT) suited to the irregular-grid data samples expected in real-world sensor network deployments. The WT replaces node measurements with a set of wavelet coefficients that are more sparse than the original data and enable subsequent distributed processing. We then develop and analyze protocols for wavelet-based processing, including distributed, lossy compression and distributed de-noising of node measurements.

Our second main contribution is the development of a network application programming interface (API) for distributed data processing in sensor networks. Guided by our experience in implementing the distributed WT in a real sensor network, we realize that a fundamental set of communication patterns underlies the bulk of distributed algorithms. Expanding our scope past the distributed WT, we survey all such algorithms proposed in the proceedings of the Information Processing in Sensor Networks (IPSN) conference to extract the communication patterns. Using the survey results, we design a network API composed of four main families of calls. Its implementation, in ongoing work, will enable easy and invaluable prototyping of distributed processing algorithms on real sensor network hardware.


Acknowledgements

This thesis represents years of growth, both personal and professional, which would not have been possible without the support of many colleagues and friends. In attempting to name and thank them all, I will no doubt commit grave errors of omission — I beg in advance the forgiveness of any whom I forget.

To begin with, I must thank all the many colleagues at Rice and elsewhere whose collaboration directly contributed to this thesis — Marco Duarte, Shu Du, Shriram Sarvotham, and Ryan Stinnett, to name a few. Véronique Delouille, in particular, has been extremely helpful. Many of the results here extend and apply those in her Ph.D. thesis, and her involvement has been a key factor in this work's success. I must also thank my thesis committee — Albert Cohen, David B. Johnson, and T. S. Eugene Ng — who served not only as mentors but also as collaborators in all the work presented here. And last, but certainly not least, I thank my advisor, Richard Baraniuk. Rich has made my graduate experience something truly remarkable, pushing me to be not only a good researcher but also a good leader, organizer, entrepreneur, and salesman. I would also like to dedicate this thesis to the memory of Hyeokho Choi, who guided and inspired all of us at Rice, and who left us far too early.

Nothing has gotten me through the good and bad times of graduate study like my friends, with whom I have laughed, cried, and imbibed a goodly amount of spirits. Their fellowship has been the best part of my graduate years. Many thanks and much love go out to Juliet Bauer, Meredith Borders, Todd Graves, Georgeann and Jesse Groh, Sanjiv Manghnani, Josh Katz, Phuc Luu, Sarah Pitre, Matt Schlabach, Chris Steeger, and Ben and Zach Summers, to name a few.

Finally, I am entirely indebted to my parents, Linda and Edwin Wagner, for their love and support. Their wisdom, strength, and faith in me have sustained me when mine have failed. I cannot begin to thank them enough.


Contents

1 Introduction
  1.1 Wireless Sensor Networks
  1.2 Distributed Data Processing
  1.3 Overview and Contributions

2 Distributed Data Representations for Sensor Networks
  2.1 Related Work
    2.1.1 Parametric Distributed Estimation
    2.1.2 Distributed Source Coding
    2.1.3 Distributed Compressed Sensing
    2.1.4 Distributed Regression
  2.2 Wavelet Representation Background
  2.3 Wavelet Theory and Sensor Networks

3 Multi-Scale Transform Design and Properties
  3.1 Related Work on Multi-Scale Transforms for Sensor Networks
  3.2 Multiscale Description of Scattered Points
  3.3 Multiscale Transforms
  3.4 Decay Properties of Wavelet Coefficients
  3.5 Numerical Stability Study
  3.6 Transform Protocol, Synchronization, and Robustness
  3.7 Spatio-Temporal Wavelet Analysis

4 Distributed Wavelet Compression
  4.1 Basic Compression Protocol
  4.2 Spatial Compression Performance Study
  4.3 Transform Communication Cost
    4.3.1 Break-Even Analysis of Distributed Wavelet Processing
    4.3.2 Distortion/Energy Analysis of Distributed Compression
  4.4 Modifications of Basic Compression Protocol
    4.4.1 Multiple-Threshold Queries
    4.4.2 Successive-Approximation Quantization
  4.5 Spatio-Temporal Compression Performance Study
  4.6 Distributed Wavelet Analysis Applicability Summary

5 Distributed Wavelet De-Noising
  5.1 Distributed De-Noising Methods
    5.1.1 Universal Thresholding
    5.1.2 Bayesian Shrinkage
  5.2 Spatial De-noising Performance Study
  5.3 Spatio-Temporal De-noising Performance Study

6 Distributed Data Processing Application Programming Interface
  6.1 Survey of Application Requirements
  6.2 Design Decisions
    6.2.1 Preliminaries
    6.2.2 Supporting Multiple Addressing Modes
    6.2.3 Supporting Multiple Receive Modes
    6.2.4 Giving the Application the Ability to Control Transmission Effort
    6.2.5 Providing a Packet Fragmentation and Reassembly Service
    6.2.6 Providing Flexible Memory Allocation and Management for Variable Sized Data
    6.2.7 Supporting Self-Organized Device Hierarchies
  6.3 API Description
    6.3.1 Address-Based Sending
    6.3.2 Region-Based Sending
    6.3.3 Device Hierarchy Sending
    6.3.4 Receiving
  6.4 Application Examples
    6.4.1 Distributed Wavelet Compression
    6.4.2 TinyDB
    6.4.3 Distributed Multi-Target Tracking
    6.4.4 Fractional Cascading

7 Conclusions and Future Work
  7.1 Conclusions
  7.2 Future Work

A Appendix: Proof of Theorem 3.3.1


List of Figures

2.1 Cameraman image and wavelet transform
2.2 Diagram of Intel-Berkeley sensor network deployment

3.1 Histograms of cameraman wavelet coefficient magnitudes (2-D and 1-D discrete wavelet transforms)
3.2 A cascaded pair of lifting stages
3.3 Example of proposed thinning algorithm applied to an irregular sampling grid
3.4 Numerical study of predict, update coefficient bounds
3.5 Transform communication patterns

4.1 Example measurement fields
4.2 Spatial transform compression study
4.3 Wavelet processing break-even study
4.4 WT energy consumption histograms
4.5 Distortion/Energy analysis example measurement fields
4.6 Distortion versus energy compression study
4.7 Energy expenditures of querying protocols
4.8 Spatio-temporal transform compression study (smooth fields, fixed number of sensors, varying time series length)
4.9 Spatio-temporal transform compression study (smooth fields, fixed time series length, varying number of sensors)
4.10 Spatio-temporal transform compression study (piecewise-smooth fields, fixed number of sensors, varying time series length)
4.11 Spatio-temporal transform compression study (piecewise-smooth fields, fixed time series length, varying number of sensors)
4.12 Spatio-temporal compression study with intermediate zerotree coding

5.1 Spatial de-noising with compression study
5.2 In-place spatio-temporal de-noising study (low noise, smooth fields)
5.3 In-place spatio-temporal de-noising study (high noise, smooth fields)
5.4 In-place spatio-temporal de-noising study (high noise, piecewise-smooth fields)
5.5 In-place spatio-temporal de-noising study (low noise, piecewise-smooth fields)

List of Tables

3.1 Transform condition number numerical study

5.1 In-place spatial de-noising study

Chapter 1

Introduction

1.1 Wireless Sensor Networks

In recent years, wireless sensor networks have emerged as an important new measurement tool whose true potential is still being realized. These dense collections of small, inexpensive data-gathering devices with self-contained power supplies and radios for wireless data transmission afford the opportunity to sample broad phenomena of interest with unprecedented detail in space and time. Some researchers, in fact, have taken to describing sensor networks as "macroscopes" to reflect their potential to blend the fine-grained detail of microscopes with the broad field of view of telescopes [1].

The possible application areas for sensor networks are myriad, ranging from those in which sensor networks can supplant existing methods at lower cost to ones unthinkable with current technology. Wireless sensor networks, for instance, can offer a huge advantage over existing wired networks of sensors, requiring less time and manpower to instrument industrial lines for process control and civilian structures such as buildings and bridges for vibration analysis. But it is the completely novel applications that make the most compelling case for sensor networks. The study of microclimates — localized climates that exhibit large variability over a small spatial area — provides a flagship example. In these environments, commonly found in forests, quantities such as temperature and humidity vary greatly from the ground level to the tree canopy. While this can be verified by hauling a common weather station up a tree and periodically taking measurements, such devices are too expensive and bulky to deploy in a fashion that densely captures a picture of the microclimate at a given instant in time. Instead, researchers can instrument trees with a network of small, inexpensive sensor nodes capable of coordinating their measurements to gather snapshots of the microclimate. Such data open up entirely new fields of biology unthinkable with previous technology [1].

Of course, the utility of sensor networks is not limited to the natural sciences. With their onboard power supplies and wireless communication capacity, sensor networks can be deployed in an ad-hoc fashion that enables a wealth of potential applications. In a military setting, sensor networks can be quickly emplaced from a mobile platform, such as a helicopter, to monitor a battlefield where no prior sensing infrastructure exists. In the civilian world, such rapid deployments can aid emergency responders in the event of a disaster by monitoring, for example, the spread of a toxic substance released in an industrial accident or terrorism event [2]. Clearly, these applications only scratch the surface of the potential utility of sensor networks; as the technology progresses and matures, we can expect to see networked sensors permeate many aspects of life.

Each node in the sensor network is capable of three main tasks: collecting data with its sensor suite, processing data with an onboard microprocessor, and sharing data with neighboring nodes in the network using its radio. Due to the constraints that nodes be relatively inexpensive and conserve their on-board power supplies, these radios are typically low-power components with limited transmission ranges. Thus, the set of radio neighbors for each node is also limited, and a source node must relay a message through intermediaries when the destination node is beyond its radio transmission range. In doing so, nodes form what is known as a multi-hop wireless network.

In addition to the collection of inexpensive sensor nodes, the sensor network usually features one or more data sinks — points where a user collects data gathered by the network. A sink can, for example, take the form of an internet gateway, where remote users can access the data gathered by the sensor network via the internet and issue commands for further sensing. Data sinks are typically considered to be sophisticated, powered devices not subject to the component and power limitations of sensor nodes. Devices of intermediate sophistication may also be present in the sensor network, possessing greater processing capabilities, more powerful radios, and greater power supplies than node-class devices. When this is the case, they are typically found in a lower concentration than the nodes and are often responsible for managing data gathering from some local neighborhood of nodes.

Since the appearance of the first sensor node prototypes in the SmartDust project at UC Berkeley in the late 1990s [3], several commercial sensor networking platforms have been introduced. Most common among them are Crossbow, Inc.'s lower-tier Mica nodes and upper-tier Stargate devices [4]. With these early sensor networks as a promise of things to come, an active research community has arisen to address the many issues involved in building robust, practical sensor networks with which to instrument the world. This work continues to be a truly multi-disciplinary effort, uniting the hardware designers creating the sensor nodes with the computer scientists supporting node networking and the application developers creating methods to bring this new observational tool to bear on the physical world.

1.2 Distributed Data Processing

The user of a sensor network wishes to answer questions about the environment monitored by the network using the data gathered by the network. For example, the user may wish to track the dispersal of some chemical agent over time, monitor a surveillance network for intruders and identify their location and path, or visualize the set of measurements gathered at a given time to some level of detail. The naive way to implement these activities in the sensor network setting involves collecting all the data gathered in the network at a central location and analyzing that data using traditional signal processing methods. This approach, however, is often ill-suited to the sensor networking environment.

Given the low power of each node's radio, only a small fraction of nodes in a dense network will find themselves within transmission range of a data sink. Thus, for any given node in the network to send its data to the sink, that data message must be relayed through intermediate nodes on the path to the sink using multi-hop routing through the network of nodes. Considering that each node in the network must initiate such a message to enable collection of all measured data, the amount of relay traffic in the network becomes quite significant, especially for the set of nodes within a few radio hops of the data sink. These nodes will find themselves relaying measurements from the entire network, and this burden will quickly exhaust their limited power supplies as well as congest the radio bandwidth near the sink. And once these nodes have depleted their power and died, the sink will effectively be cut off from the remainder of the network — though it may be able to send messages to nodes using its own higher-powered radio, there will be no remaining nodes within the more limited node transmission range of the sink. Thus, the network will effectively be useless until the sink can be relocated or a new sink installed. Clearly, the sensor network will not be capable of tolerating many such centralized processing operations.

Recall, though, that the network user wishes to pose a question to the data gathered by the network and in many cases does not require access to the actual data itself. This suggests posing the question to the network itself and allowing nodes to collaboratively process their measurements in such a fashion that the answer is found within the network and then sent, instead of the original data set, via potentially long-distance routes to the data sink. In typical sensor node hardware, the power required to process a bit is orders of magnitude less than the power required to transmit a bit on the node's radio [5], so trading local communication with neighboring nodes and local data processing for long-haul, wholesale transmission of data to the sink for centralized processing can provide significant savings in nodes' power supplies, especially in those nodes near the sink.
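This trade-off can be made concrete with a back-of-the-envelope calculation. The energy figures and network parameters below are illustrative assumptions, not measurements from this thesis; the point is only that a large transmit/compute cost gap lets in-network processing pay for itself once it shrinks the long-haul traffic.

```python
# Break-even sketch: centralized raw-data collection vs. in-network processing.
# All constants are assumed for illustration.
E_TX_BIT = 1.0e-6   # joules to transmit one bit (assumed)
E_CPU_BIT = 1.0e-9  # joules to process one bit (orders of magnitude cheaper)

def centralized_cost(n_nodes, bits_per_node, avg_hops_to_sink):
    """Every node forwards its raw measurement to the sink over multiple hops."""
    return n_nodes * bits_per_node * avg_hops_to_sink * E_TX_BIT

def distributed_cost(n_nodes, bits_per_node, local_exchanges,
                     compression_ratio, avg_hops_to_sink):
    """Nodes exchange data with 1-hop neighbors, process it, and ship a
    compressed answer over the long-haul route."""
    local = n_nodes * bits_per_node * local_exchanges * E_TX_BIT
    cpu = n_nodes * bits_per_node * local_exchanges * E_CPU_BIT
    longhaul = (n_nodes * bits_per_node * compression_ratio
                * avg_hops_to_sink * E_TX_BIT)
    return local + cpu + longhaul

c = centralized_cost(100, 32, 10)
d = distributed_cost(100, 32, 3, 0.1, 10)
print(c, d)  # distributed wins when compression offsets the local overhead
```

Under these assumed numbers the distributed scheme uses well under half the energy of raw collection; with a smaller compression gain or more local exchanges the balance can tip the other way, which is exactly what the break-even analysis of Chapter 4 quantifies.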

1.3 Overview and Contributions

Our goal in this thesis is the development of a new class of distributed, multi-scale processing algorithms suitable for efficiently representing measured data in sensor networks. Throughout the work, we remain mindful of the need for such algorithms to be practically relevant in real sensor networks. On one hand, this requires designing algorithms with relatively simple node computation requirements and fast interpretation of results at the data sink. On another, this necessitates accommodating a broad class of potential measurement signals captured by the network. And finally, to ensure that the procedure conserves more node power than it spends, this requires that inter-node communications operate efficiently in terms of the network routing infrastructure.

The utility of the proposed algorithms is verified through extensive simulations, but in the course of doing so, we have come to realize the following truth: the intimate interplay between algorithm communication requirements and network routing economics in real sensor networks cannot be fully captured through simulations. Algorithm designers must be able to easily implement their designs in sensor network hardware to verify their true practicality. Unfortunately, the current state of the art in sensor network programming tools makes fast algorithm prototyping impossible. Thus, to lay the groundwork for building the necessary tools, we expand our scope in the latter portion of this thesis past multi-scale algorithms to conduct an extensive survey of a much broader spectrum of distributed algorithms proposed for sensor networks. The results of this study enable us to design the requirements for the next generation of sensor network programming environments, currently under development.

Our novel contributions include:

• a new multi-scale transform, based on the theory of wavelet lifting, that easily distributes in a sensor network, tolerates irregular-grid node placement, and efficiently represents measured data;

• analysis of the transform's numerical stability;

• application of the transform to distributed measurement compression and distributed measurement noise removal;

• an extensive survey of the communication requirements in a broad range of proposed sensor network applications;

• development of a network application programming interface to support fast implementation of these proposed algorithms.

We now outline these contributions chapter by chapter. We begin in Chapter 2 with a discussion of distributed data representations for sensor networks. We first overview several alternative representation methods, discussing their relative strengths and weaknesses. We then provide an introduction to the theory of wavelet analysis and conclude with a discussion of the difficulties of applying traditional wavelet theory to the sensor network setting.

In Chapter 3, we detail the design of a wavelet transform (WT) suitable for deployment in sensor networks, where nodes must compute the transform in a distributed fashion and tolerate node placement that deviates from the regular, square-grid paradigm assumed by traditional wavelet analysis. We prove that the resultant transform inherits some key properties of traditional wavelet transforms, and we demonstrate its stability through a number of numerical experiments.
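The transform of Chapter 3 is built on wavelet lifting. As a rough, centralized illustration of the predict/update idea on irregularly spaced samples — a 1-D toy, not the thesis's distributed 2-D transform; the interpolation weights and the 0.5 update factor are illustrative choices — a single lifting stage looks like this:

```python
import numpy as np

def lifting_stage(x_pos, x_val):
    """One predict/update lifting stage on irregularly spaced 1-D samples.
    Illustrative sketch only: evens play the 'coarse' role, odds are
    predicted from even neighbors, and details are prediction residuals."""
    even_pos, even_val = x_pos[0::2], x_val[0::2].copy()
    odd_pos, odd_val = x_pos[1::2], x_val[1::2]
    details = np.empty_like(odd_val)
    for i, (p, v) in enumerate(zip(odd_pos, odd_val)):
        j = min(i + 1, len(even_pos) - 1)
        l, r = even_pos[i], even_pos[j]
        if r == l:
            pred = even_val[i]
        else:
            w = (p - l) / (r - l)                 # distance-based weight
            pred = (1 - w) * even_val[i] + w * even_val[j]
        details[i] = v - pred                     # detail (wavelet) coefficient
    for i, d in enumerate(details):               # update: smooth the evens
        even_val[i] += 0.5 * d
    return even_val, details

pos = np.array([0.0, 0.7, 1.5, 2.1, 3.0])         # irregular sample locations
val = 2.0 * pos + 1.0                             # a linear field
coarse, det = lifting_stage(pos, val)
print(det)  # linear prediction is exact on a linear field, so details ~ 0
```

Because the predict step is an interpolation between spatial neighbors, it maps naturally onto local radio exchanges between nearby nodes, which is the property the distributed transform exploits.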

We apply the coefficients generated by the distributed WT to the problem of distributed measurement compression in Chapter 4. We develop a protocol to realize distributed, wavelet-based compression, and we demonstrate via numerical studies that the transform data is stable under such processing by compressing a variety of sample fields captured by the network at a single instant in time. We extend this investigation by means of ns-2 networking simulations to demonstrate that the collaborative overhead required by the WT does not prove prohibitive when considering sensor network routing economics. We then consider compression of time series of measurements captured by each node in the network through further numerical analysis.
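The compress-by-thresholding idea behind such protocols can be sketched in a few lines. The example below uses a plain 1-D Haar transform on a regular grid as a stand-in for the distributed WT; the threshold value and the sample field are arbitrary illustrative choices.

```python
import numpy as np

def haar_1d(x):
    """Orthonormal 1-D Haar analysis: list of detail arrays + final scaling."""
    coeffs = []
    while len(x) > 1:
        avg = (x[0::2] + x[1::2]) / np.sqrt(2.0)
        det = (x[0::2] - x[1::2]) / np.sqrt(2.0)
        coeffs.append(det)
        x = avg
    coeffs.append(x)
    return coeffs

def inv_haar_1d(coeffs):
    """Inverse of haar_1d."""
    x = coeffs[-1]
    for det in reversed(coeffs[:-1]):
        out = np.empty(2 * len(x))
        out[0::2] = (x + det) / np.sqrt(2.0)
        out[1::2] = (x - det) / np.sqrt(2.0)
        x = out
    return x

t = np.linspace(0.0, 1.0, 256)
field = np.sin(2 * np.pi * t)                      # smooth "measurement field"
coeffs = haar_1d(field)
flat = np.concatenate(coeffs)
thresh = 0.05
kept = np.where(np.abs(flat) >= thresh, flat, 0.0) # discard small coefficients
rebuilt, i = [], 0                                 # restore per-level shapes
for c in coeffs:
    rebuilt.append(kept[i:i + len(c)]); i += len(c)
recon = inv_haar_1d(rebuilt)
n_kept = np.count_nonzero(kept)
err = np.max(np.abs(recon - field))
print(n_kept, err)
```

On a smooth field, most of the 256 coefficients fall below the threshold, so far fewer values need to travel to the sink, while the reconstruction error stays small — the same sparsity effect the distributed protocol exploits, field by field.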

In Chapter 5, we consider the application of wavelet-based noise removal to sensor measurements corrupted by additive noise. We develop two families of protocols, of varying costs and efficacies, for wavelet-based de-noising, and we numerically evaluate the applicability of the transform data to such processing for both scalar node measurements and time series.
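As a point of reference for the universal thresholding of Section 5.1.1, the classical rule sets the threshold to sigma * sqrt(2 ln N) and soft-thresholds the detail coefficients. The centralized toy below shows only the rule itself; the thesis's contribution is carrying it out distributively, which this sketch does not capture, and the signal here is invented.

```python
import numpy as np

def soft_threshold(w, t):
    """Shrink coefficients toward zero by t (soft thresholding)."""
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

def universal_denoise(coeffs, sigma):
    """Apply the universal threshold t = sigma * sqrt(2 ln N).
    sigma (the noise level) is assumed known here."""
    t = sigma * np.sqrt(2.0 * np.log(coeffs.size))
    return soft_threshold(coeffs, t)

rng = np.random.default_rng(0)
true = np.zeros(256)
true[:8] = 10.0                               # a few large "signal" coefficients
noisy = true + rng.normal(0.0, 1.0, size=256)
den = universal_denoise(noisy, 1.0)
# nearly all pure-noise coefficients are zeroed; large ones survive (shrunk)
print(np.count_nonzero(den[:8]), np.count_nonzero(den[8:]))
```

The rule works because, for N i.i.d. Gaussian noise samples, the maximum magnitude concentrates just below sigma * sqrt(2 ln N), so the threshold clips noise with high probability while leaving genuinely large coefficients standing.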

We expand our scope past multi-scale processing in Chapter 6 to study the communication requirements of a broader class of distributed data processing algorithms proposed for the sensor network setting. Surveying more than 100 papers from the premier conference for distributed processing in sensor networks, we develop a taxonomy of communication requirements. Inspired by our own difficulties in implementing the proposed WT on a real sensor network testbed, we design a network application programming interface (API) to give programmers easy access to the common collaboration patterns found in distributed data processing algorithms. The implementation of this API will allow fast prototyping by researchers wishing to assess the true practicality of their proposed algorithms in real sensor networking environments.
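To make the four call families named in the Contents (address-based sending, region-based sending, device-hierarchy sending, and receiving) concrete, here is a toy in-memory sketch. Every class name and signature below is invented for illustration — the thesis's actual API, which targets real node hardware, may look quite different.

```python
# Hypothetical illustration of the four call families of Chapter 6.
# An in-memory "network" stands in for the radio; all names are invented.
class Network:
    def __init__(self):
        self.nodes, self.parent = {}, {}

class Node:
    def __init__(self, node_id, pos, network):
        self.id, self.pos, self.net = node_id, pos, network
        self.inbox = []
        network.nodes[node_id] = self

    def send_to_address(self, dest_id, payload):
        """Address-based sending: deliver to one named node."""
        self.net.nodes[dest_id].inbox.append((self.id, payload))

    def send_to_region(self, center, radius, payload):
        """Region-based sending: deliver to all nodes in a geographic disc."""
        for n in self.net.nodes.values():
            dx, dy = n.pos[0] - center[0], n.pos[1] - center[1]
            if n.id != self.id and (dx * dx + dy * dy) ** 0.5 <= radius:
                n.inbox.append((self.id, payload))

    def send_up_hierarchy(self, payload):
        """Hierarchy sending: deliver to this node's parent (e.g. cluster head)."""
        parent = self.net.parent.get(self.id)
        if parent is not None:
            self.net.nodes[parent].inbox.append((self.id, payload))

    def receive(self):
        """Receiving: pop the oldest queued (sender, payload), or None."""
        return self.inbox.pop(0) if self.inbox else None

net = Network()
a, b, c = Node(1, (0, 0), net), Node(2, (1, 0), net), Node(3, (5, 5), net)
net.parent[1] = 2                        # node 2 acts as node 1's cluster head
a.send_to_region((0, 0), 2.0, "hello")   # reaches b only; c is too far away
a.send_up_hierarchy("report")
m1, m2, m3 = b.receive(), b.receive(), c.receive()
print(m1, m2, m3)
```

The value of grouping calls into such families is that an algorithm like the distributed WT can be written against the pattern it needs (here, region-based neighbor exchange) without re-implementing routing for each prototype.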

Finally, we conclude with a summary of our contributions and a discussion ofongoing work in Chapter 7.

Research in the field of sensor networks is by its nature highly interdisciplinary. Thus, this work is the result of extensive collaboration with researchers from signal processing, mathematics, statistics, and computer science. As such, the first page of each chapter will identify as necessary the collaborators who share credit for the work described therein.


Chapter 2

Distributed Data Representations for Sensor Networks

The true power of sensor networks lies in their ability to densely sample phenomena of interest both in space, by populating a given area with a large number of inexpensive sensor nodes, and in time, by collecting as long a time series as node memory resources allow. While the possible applications of this spatio-temporal data are myriad, accessing the data recorded by the nodes at the network's data sink is perhaps one of the most fundamental. With such a tool, network users can visualize the trends of measured data in space and time, diagnose faulty sensors with abnormal readings, store data for future study, and so on.

The large size of this data set, however, complicates aggregation at the sink. As mentioned in Section 1.2, node power and bandwidth constraints make it impractical for each node to send its recorded measurements straight to the sink via multi-hop routing through the network. Fortunately, the data captured by a sensor network typically exhibit high correlation that can be leveraged to facilitate data collection. Specifically, measurements can show correlation both in time (between measurements taken close together in an individual node's time series) and in space (between measurements taken at the same instant at nodes located near each other). Intelligently designed data representations can reduce the size of the description of these correlated measurements by identifying and encoding common components a single time, rather than allowing each node to independently encode the shared information. This reduces the total amount of data flowing to the sink through the network, easing the power and bandwidth requirements on the nodes.
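A toy calculation illustrates the payoff of encoding a shared component once. The readings, bit widths, and coding scheme below are invented for illustration — real gains depend on the actual correlation structure and quantizer design.

```python
import numpy as np

# Spatially correlated readings share a large common component; encoding
# it once plus small per-node residuals takes far fewer bits than encoding
# each reading independently at full precision. Numbers are assumed.
readings = np.array([21.34, 21.41, 21.29, 21.38, 21.36])  # nearby nodes, deg C

def bits_independent(x, bits_per_value=16):
    """Each node encodes its full reading on its own."""
    return x.size * bits_per_value

def bits_joint(x, mean_bits=16, residual_bits=4):
    """One full-precision common component + a coarse residual per node."""
    return mean_bits + x.size * residual_bits

print(bits_independent(readings), bits_joint(readings))  # 80 vs 36
```

The wavelet representations developed in this thesis generalize the same idea: coarse-scale coefficients capture what neighboring nodes have in common, and fine-scale coefficients carry only the (typically small) local deviations.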

In this chapter, we set the stage for developing a distributed, multi-scale data representation that efficiently represents sensor data. In Section 2.1, we briefly review several alternative data representations proposed for sensor networks, discussing their relative strengths and weaknesses, to establish a design space in which to develop our distributed multi-scale transform. In Section 2.2, we review multi-scale data analysis, focusing on the wavelet theory that underlies our design. Finally, in Section 2.3, we discuss the problems associated with adapting traditional wavelet transforms to the sensor network setting and motivate the need for an entirely new wavelet analysis tool for sensor networks.


2.1 Related Work

There are several fronts in the research effort to create an efficient and easily distributable representation for sensor network data, and each approach has its own advantages and disadvantages. Some require no communication between nodes encoding their measurements, while others allow a modicum of in-network collaboration. Some place strict assumptions on the signal classes to which measured data must belong, while others have very loose requirements. We now review four main families of techniques to elucidate their design trade-offs. While this list of related work is by no means exhaustive, it provides a thorough summary of some of the most promising techniques in order to contextualize the development of a distributed, multi-scale data representation for sensor networks.

2.1.1 Parametric Distributed Estimation

In one family of distributed data representations, reviewed in [6], each node encodes its value (typically, a scalar) as a heavily quantized version of the original sensor measurement. The measured field is assumed to arise from a parametrized process corrupted by additive noise independently distributed at each sensor, and the goal of the data encoding is recovery of the field parameters at the sink using the quantized values. The encoding process effectively compresses the field measurements, and representing the field as a parametric model helps to remove noise effects in the reconstruction.

Specifically, the measurement at each of K sensors is described as x_k = φ_k(θ) + w_k, k = 1, . . . , K, where θ is the parameter vector, φ_k(·) describes the function of θ giving the noiseless measurement at sensor k, and w_k describes the independent random noise variable at sensor k. Each node independently quantizes its measurement and transmits the quantized value to the sink. As described in [6], transmissions are direct to a data sink and mutually non-interfering, though the approach is easily extensible to the scenario where all nodes are not within radio range of the sink and a multi-hop routing scheme must be used for message delivery.

When the noise process at each sensor is completely known, or at least known in form up to unknown parameters, estimating the parameter θ governing the field process is easily accomplished for scalar θ using maximum-likelihood estimation. Estimating a vector-valued θ is more difficult, but it is possible when the noise densities are log-concave and each φ_k is a linear function of θ. Results are highly dependent on quantization choices at each sensor and can benefit from feedback of the prior estimate by the fusion center when iterating multiple rounds of estimation. The approach can also be extended to accommodate vector-valued x_k, corresponding to time-series of measurements collected at each sensor. When the noise process is completely unknown, the optimal approaches described above no longer work. Instead, sub-optimal, universal estimation techniques must be applied.
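To make the quantized maximum-likelihood recovery concrete, the following toy sketch (all parameters hypothetical) takes φ_k to be the identity, so each sensor observes the scalar θ in Gaussian noise, quantizes its measurement to one of six bins, and transmits only the bin index; the sink then grid-searches the likelihood of the received indices.

```python
import numpy as np
from math import erf, sqrt

# Hypothetical instance of the parametric scheme: x_k = theta + w_k (phi_k is
# the identity), each node sends only its quantization-bin index, and the sink
# computes a maximum-likelihood estimate of theta from the bin indices alone.
rng = np.random.default_rng(1)
theta_true, sigma, K = 2.3, 1.0, 200
edges = np.array([-np.inf, 0.0, 1.0, 2.0, 3.0, 4.0, np.inf])  # 6-bin quantizer

x = theta_true + sigma * rng.standard_normal(K)  # noisy sensor measurements
bins = np.searchsorted(edges, x) - 1             # index each node transmits

def gauss_cdf(z):
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def log_lik(theta):
    # Probability that a measurement falls in each reported bin, given theta.
    lo, hi = edges[bins], edges[bins + 1]
    p = np.array([gauss_cdf((h - theta) / sigma) - gauss_cdf((l - theta) / sigma)
                  for l, h in zip(lo, hi)])
    return np.sum(np.log(np.maximum(p, 1e-300)))

grid = np.linspace(0.0, 5.0, 501)                # simple grid search over theta
theta_hat = grid[np.argmax([log_lik(t) for t in grid])]
```

Even with coarse six-level quantization, the pooled likelihood over many sensors localizes the scalar parameter well, which is the essence of the approach.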

The approach is, in general, well-suited to deployment in sensor networks since it requires no inter-sensor collaboration during encoding, though it may require feedback from the sink to be practically relevant. However, the restrictions imposed on the reconstruction of measured data at the sink by the measurement model can easily be problematic. As estimation of vector-valued θ can prove computationally difficult and requires a linear relationship between θ and each φ_k, the approach is arguably best suited to simpler field models described by a single parameter. And, in either case, the quality of the reconstruction using this approach is only as good as the model selected to represent the field. Such simple models are not likely to capture in general the complex measurement fields we expect to see in real sensor networks due to the physics of the processes they are measuring. Thus, when fidelity to the true field is required at the data sink, this method is not likely to exhibit desirable performance for a wide range of measurement field classes.

2.1.2 Distributed Source Coding

Distributed source coding (DSC) provides another proposed data representation for sensor networks. This technique takes an information-theoretic view of encoding data from correlated sources for transmission to a data sink and subsequent decoding. An overview of the history and recent progress of the field can be found in [7, 8].

Consider two correlated random variables, X and Y, that we wish to observe, encode, and then jointly reconstruct at a decoder. The canonical source coding theory of Shannon states that the pair can be jointly encoded at a rate of H(X, Y), the joint entropy of X and Y, should both sources be known at the encoder. In a landmark 1973 paper, Slepian and Wolf showed that, in theory, a similar rate can be achieved by encoding each source in isolation — that is, the optimal coding rate is theoretically possible with no collaboration among the individual coders for each source [9]. Clearly, such a result is quite attractive in the context of sensor networks, with the implication that nodes can encode their data with zero in-network collaboration cost and still achieve an optimal coding rate.

The Slepian-Wolf result applies to lossless encoding, where the data are recovered perfectly at the decoder. A similar result was proven a short time later by Wyner and Ziv for lossy encoding, whereby the coder can spend an increasing bit budget to buy progressively greater fidelity in the reconstruction at the decoder, approaching lossless quality [10].

The Slepian-Wolf and Wyner-Ziv theorems, however, only proved that such codes exist. Real codes approaching the results promised by the theorems were not seen for many years; in fact, only with the advent of interest in sensor networks have practical encoding schemes been derived. These new techniques have borrowed heavily from results in channel coding (Slepian-Wolf) and joint source/channel coding (Wyner-Ziv). That is, the correlation between variables is modelled as a communication channel, and well-known solutions such as low-density parity check codes or turbo codes derived for the chosen channel model are applied.
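The channel-coding view can be seen in miniature with a standard textbook-style toy (an illustrative sketch of the idea, not one of the practical codes cited above): X is a uniform 3-bit word, and the decoder's side information Y differs from X in at most one bit position. Rather than H(X) = 3 bits, the encoder sends only the 2-bit syndrome of X under the length-3 repetition code, and the decoder selects the member of that syndrome's coset closest to Y.

```python
from itertools import product

# Toy Slepian-Wolf-style coding: the encoder for X never sees Y, yet sends only
# 2 bits; the decoder exploits the correlation (Hamming distance <= 1) to
# recover X exactly from the syndrome plus side information Y.

def syndrome(x):                       # x is a tuple of 3 bits
    return (x[0] ^ x[1], x[1] ^ x[2])  # syndrome w.r.t. the repetition code

def decode(s, y):
    # The coset with syndrome s contains exactly two words, which differ in
    # all three positions; pick the one closest to the side information y.
    coset = [x for x in product((0, 1), repeat=3) if syndrome(x) == s]
    return min(coset, key=lambda x: sum(a ^ b for a, b in zip(x, y)))

# Exhaustive check: decoding succeeds for every X and every Y within
# Hamming distance 1 of X.
ok = all(decode(syndrome(x), y) == x
         for x in product((0, 1), repeat=3)
         for y in [x] + [tuple(b ^ (i == j) for j, b in enumerate(x))
                         for i in range(3)])
```

The correlation model here plays exactly the role of a binary symmetric channel between X and Y, which is why channel-code machinery transfers so directly.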

The results, however, are still relatively nascent. Many codes assume a binary symmetric channel model describing the relationship between the two sources X and Y; some have been extended to a jointly Gaussian model. This is due in large part to the fact that work is still ongoing in channel coding theory for more sophisticated channels. As is the case with the parametric techniques, these models do not describe well the correlation we expect to see in general among data in a sensor network and are likely to find only niche applicability. More problematic, however, is the difficulty of extending the methods to joint encoding of more than two sources. With the typical dense sampling expected in sensor networks, any one measurement will exhibit significant correlation with many other nodes' values. Exploiting this commonality only on a pair-wise basis will be inherently inefficient.

2.1.3 Distributed Compressed Sensing

The new theory of compressed sensing (CS) shows much promise for application in sensor networks. CS states that any signal that can be sparsely represented by K elements of some basis can also be recovered by encoding the signal as a small number of projections onto a basis incoherent with the first — i.e., one in which the signal has no sparse representation. Two facts make CS theory especially powerful. First, the number of required projections is typically cK, a small, constant multiple of the terms required by an expansion in the K-sparse basis. Second, bases formed by independent, identically distributed random vectors (such as samples of Gaussian or Bernoulli random variables) are with high probability incoherent with any given basis [11].

In the sensor network setting, these results allow the complexity of efficiently representing data to be pushed out of the encoder at each node and into the decoder at the data sink collecting the encoded measurements. Two applications of CS to data compression in sensor networks have recently appeared in the literature. The first concerns encoding the spatio-temporal set of data captured by a network of nodes collecting time-series of measurements. In this approach, each node computes random projections of its time series onto incoherent basis vectors in isolation and does not communicate with neighbors during the encoding process [11]. Instead of exploiting spatial correlations among node measurements during the encoding process, these correlations are leveraged at the data sink to hasten decoding of the random projections gathered from each node. The second technique aims to compress a snapshot captured by the network at a single instant in time, computing random projections of the spatial set of scalar node measurements rather than node time-series [12]. To compute the sum required in projecting the set onto incoherent basis vectors, nodes are allowed to repeatedly converse with their radio neighbors in a process called "gossiping". Given enough gossiping rounds, each node in the network has access to the set of projections, allowing any node to potentially serve as an access point for a data sink wishing to reconstruct the original measurement field values from the projections.
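A small numerical sketch (all parameters hypothetical) shows this division of labor: encoding is just a matrix of random projections, while the sink runs an iterative decoder — here orthogonal matching pursuit, a simple stand-in for the optimization-based decoders referenced below, not the specific decoder of [11] or [12].

```python
import numpy as np

# CS sketch: a length-N signal, K-sparse in the identity basis, is encoded as
# M random Gaussian projections (computable with little or no collaboration);
# the sink bears the computational cost of recovery.
rng = np.random.default_rng(7)
N, K, M = 64, 3, 32
x = np.zeros(N)
x[rng.choice(N, K, replace=False)] = [3.0, -2.0, 1.5]   # K-sparse field
Phi = rng.standard_normal((M, N)) / np.sqrt(M)          # incoherent projections
y = Phi @ x                                             # encoded measurements

# Orthogonal matching pursuit at the sink: greedily build the support.
support, resid = [], y.copy()
for _ in range(K):
    support.append(int(np.argmax(np.abs(Phi.T @ resid))))  # best new atom
    coef, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
    resid = y - Phi[:, support] @ coef                     # update residual
x_hat = np.zeros(N)
x_hat[support] = coef
```

With M roughly a small constant multiple of K, the greedy decoder recovers the sparse field from far fewer values than the N original samples, at the price of decoder-side computation.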

Encoding data using the random projections of CS is a very attractive solution in the context of sensor networks. The encoding process itself requires very little complexity in sensor hardware, and the projections can be accomplished using either local inter-sensor communication or none at all. Moreover, the class of field measurements that can be efficiently represented using this approach is quite large. Any signal that is sparse in some basis can be efficiently encoded using random projections. Even signals that are not sparse but compressible in some basis (i.e., their basis coefficient magnitudes decay rapidly) can be encoded within a constant factor of error of the best signal approximation [11]. These signal classes cover a variety of real-world data sets [11, 12], rendering CS a truly viable alternative for efficient data representation in sensor networks.

Recall, though, that CS pushes computational complexity from the encoder to the decoder. As a result, decoding the random projections to recover the original measurement fields becomes a computationally intensive task with a significant processing time [13, 14]. Thus, given the current state of the art in decoding, the CS representation is best suited to data recovery where results are not needed for some time, e.g., when the user is not interested in rapid interpretation of data for such activities as measurement-based node actuation.

2.1.4 Distributed Regression

Another approach, sharing a similarity with the distributed wavelet transform (WT) we develop in Chapter 3, is found in the work of [15, 16], which formulates a method for distributed regression of data in sensor networks. Under this framework, the physical space measured by the network is divided into overlapping regions. A set of basis functions is assigned to each region, and weighting coefficients for a linear combination of those functions are found using a linear regression through the measured data. The approximations in overlapping regions are tied together so that the global field approximation, consisting of a weighted combination of each region's basis functions (supported over that region), forms a consistent estimate in the areas of overlap.

Specifically, to each of ℓ regions a set of basis functions H_j = {h_i^j(x, t)} is assigned, where x indexes spatial coordinates over the region and t is a time variable. Also, to each region j a kernel function K_j(x) is assigned. The kernel function has support over only that region and, in general, will tend to decrease to zero at the region's boundary. The goal is determining a set of weighting coefficients w_i^j such that the approximation f̂(x, t) of the field value f(x, t) at location x and time t is given by

f̂(x, t) = Σ_{j=1}^{ℓ} [ K_j(x) / Σ_{v=1}^{ℓ} K_v(x) ] Σ_{h_i^j ∈ H_j} w_i^j h_i^j(x, t).

To do so, each node must compute the coefficients for the basis functions of every kernel that has nonzero support at that node's location. This is accomplished by building a distributed data structure known as a junction tree and repeatedly passing messages over the links of that tree until the estimates of each node's coefficients converge. A data sink can then collect the full set of coefficients by querying individual nodes and constructing a view of the measured field by evaluating the weighted combinations of basis functions at each node location.

This approach has several attractive features. First, communication traffic is confined to nodes' immediate radio neighbors (an artifact of junction tree formation), so the scope of inter-node communication is limited. Second, due to the iterative nature of the message-passing solution, the algorithm is robust to time-varying radio link quality between nodes. If a link between two collaborating nodes were to periodically fail, the nodes would eventually be able to arrive at a solution given enough iterations, provided that the link did not fail permanently. The approach is not without its drawbacks, however. Kernels must be custom-designed for each network deployment to ensure an adequate amount of overlap to smooth the overall regression solution. Moreover, as with parametric distributed estimation and DSC, the fidelity of the approximation is limited by the basis functions chosen for each kernel region. Only signals that can be represented as a linear combination of the chosen functions will be well-represented by the approximation. When this is not the case, potentially important signal features may be lost in the reconstruction.
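A centralized toy of this approximation model may help fix ideas. The junction-tree message passing is replaced here by a direct per-region least-squares fit, and the 1-D domain, hat-shaped kernels, and linear basis {1, x} are all assumptions of the sketch rather than choices made in [15, 16].

```python
import numpy as np

# Kernel-blended regression toy: two overlapping 1-D regions, each with a hat
# kernel that decays to zero at the region boundary and a linear basis {1, x}.
# Per-region weights are fit by least squares; the field estimate blends the
# regional fits with normalized kernel weights, as in the display above.
rng = np.random.default_rng(3)
nodes = np.sort(rng.uniform(0.0, 1.0, 40))      # irregular node locations
field = 1.0 + 2.0 * nodes                       # a linear test field
regions = [(0.0, 0.6), (0.4, 1.0)]              # overlapping regions

def kernel(j, x):
    a, b = regions[j]                           # hat kernel: 1 at the region
    return np.clip(1.0 - np.abs(2 * (x - a) / (b - a) - 1), 0.0, None)  # center

weights = []
for a, b in regions:
    m = (nodes >= a) & (nodes <= b)             # nodes inside this region
    H = np.column_stack([np.ones(m.sum()), nodes[m]])   # basis {1, x}
    w, *_ = np.linalg.lstsq(H, field[m], rcond=None)
    weights.append(w)

def f_hat(x):
    ks = np.array([kernel(j, x) for j in range(2)])
    vals = np.array([w[0] + w[1] * x for w in weights])
    return np.sum(ks * vals) / np.sum(ks)       # normalized kernel blend
```

Because the test field lies in the span of each region's basis, the blended estimate reproduces it exactly in the interior; a field outside that span would be approximated only as well as the chosen basis allows, which is precisely the limitation noted above.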

2.2 Wavelet Representation Background

Multi-scale analysis, in the form of a distributed WT, provides an alternate distributed data representation for sensor networks combining (i) efficient representation for a wide range of measurement fields, (ii) frugal inter-node communication requirements, and (iii) fast decoding of encoded data, as we will demonstrate in the course of this thesis. Multi-scale techniques have proven their utility in a variety of fields over the years, including image compression (e.g., the JPEG 2000 coding standard [17]) and numerical analysis of partial differential equations (e.g., multi-grid methods [18]).

Before we explore the application of wavelet analysis to sensor networks, we first give a brief review of the basics of wavelet theory [19, 20], drawing from the explanation in [21]. For now, consider the signal of interest to be sampled at regular intervals on a one-dimensional (1-D) domain; we discuss extension to two-dimensional (2-D) domains below.

In this regular-grid, 1-D setting, classical or so-called first-generation wavelet theory is developed as follows. First, we wish to define a sequence of spaces V_j, V_{j+1}, . . . such that projection of a function f ∈ L²(R) onto each subspace will provide successively finer approximations to f as j ∈ Z increases. These subspaces · · · ⊂ V_{−1} ⊂ V_0 ⊂ V_1 ⊂ · · · are designed to give a first-generation multi-resolution analysis (MRA) such that V_j ⊂ L²(R), j ∈ Z, with ⋃_{j∈Z} V_j dense in L²(R) and ⋂_{j∈Z} V_j = {0}. Each space V_j must be a scaled version of a central space V_0 — that is, if f(x) ∈ V_0, then f(2^j x) ∈ V_j. V_0 must be invariant under translation, with a ϕ ∈ V_0 such that {ϕ(x − k); k ∈ Z} is a Riesz basis for V_0.

For a given scale j of the MRA, the set

ϕ_{j,k}(x) = 2^{j/2} ϕ(2^j x − k); k ∈ Z (2.1)

(scaled, translated versions of ϕ) forms a Riesz basis for V_j. We refer to these as the scaling functions for V_j. Using the fact that ϕ ∈ V_0 ⊂ V_1, we can relate ϕ ∈ V_0 to the ϕ_{1,k} ∈ V_1 through a so-called refinement equation

ϕ(x) = Σ_{k∈Z} h_k ϕ_{1,k}(x). (2.2)

Iterative application of this relation allows us to express the scaling functions for any scale as a weighted linear combination of those in the next-finer scale.

We could encode our signal as a series of projections onto the basis functions of each subspace V_j, but wavelet analysis adopts a more efficient approach. Instead, we merely encode the difference between the signal's projection into one subspace V_{j+1} and into the next coarser subspace V_j — the details not contained in the projection onto V_j. To do so, we define the space W_j as the complement (typically, orthogonal) of V_j in V_{j+1}. This gives us V_j ⊕ W_j = V_{j+1} (where ⊕ denotes the direct sum of two subspaces). Iterative application of this approach gives us that L²(R) = V_{j0} ⊕ ⊕_{j=j0}^{∞} W_j.

We equip the difference space W_0 with a Riesz basis {ψ(x − k); k ∈ Z} — translates of a "mother" wavelet function ψ. As in the case of the scaling functions for V_0, we can formulate a refinement equation for the wavelet functions spanning W_0, since W_0 ⊂ V_1, relating the wavelet function basis of W_0 to the scaling function basis of V_1 as

ψ(x) = Σ_{k∈Z} g_k ϕ_{1,k}(x). (2.3)

Thus, with these bases for W_j and V_j, any function in L²(R) can be expressed as

f = Σ_k ⟨f, ϕ_{j0,k}⟩ ϕ_{j0,k} + Σ_{j=j0}^{∞} Σ_k ⟨f, ψ_{j,k}⟩ ψ_{j,k}.

We denote the projections onto the scaling space V_{j0} as c_{j,k} = ⟨f, ϕ_{j,k}⟩ and refer to them as scaling coefficients; projections onto the basis of each wavelet space W_j are denoted as d_{j,k} = ⟨f, ψ_{j,k}⟩ and referred to as wavelet coefficients.

We can use the filters from the refinement equations to relate the scaling coefficients at scale j + 1 to the scaling and wavelet coefficients at coarser scale j as

c_{j,k} = ⟨f, Σ_l h_l ϕ_{j+1,2k+l}⟩ = Σ_l h_{l−2k} c_{j+1,l} (2.4)

and

d_{j,k} = Σ_l g_{l−2k} c_{j+1,l}. (2.5)


We can approximate each scaling coefficient c_{J,k} at some starting scale J as the sample of the original function f at 2^{−J}k, appropriately weighted to approximate the inner product of f with ϕ_{J,k}. These relations then allow us to iterate the transform from J down to some final scale j_0.

Specifically, the filter in (2.4) acts as a low-pass filter, generating a coarser summary of f than the one encoded in the scale-(j + 1) scaling coefficients. The filter in (2.5) acts as a high-pass filter, preserving the detail elements from the scale-(j + 1) summary that are no longer contained in the scale-j summary. These operations are typically represented using cascading series of filters, referred to as "filterbanks". The high-pass filter (H) in the bank computes the wavelet coefficients for each scale, and the low-pass filter (L) computes the new scaling coefficients, smoothing the overall representation, for input to the next stage. Filterbank operations are usually implemented using fast Fourier transforms.
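As a concrete instance of these relations, the following sketch computes one analysis stage of (2.4) and (2.5) using the Haar filters — the simplest possible choice — and direct inner products rather than an FFT-based implementation.

```python
import numpy as np

# One filterbank analysis stage per (2.4)-(2.5) with the orthonormal Haar
# filters h = (1, 1)/sqrt(2) (low-pass, L) and g = (1, -1)/sqrt(2)
# (high-pass, H).
h = np.array([1.0, 1.0]) / np.sqrt(2.0)   # low-pass: scaling coefficients
g = np.array([1.0, -1.0]) / np.sqrt(2.0)  # high-pass: wavelet coefficients

def analysis_stage(c_fine):
    """Map scale-(j+1) scaling coefficients to scale-j scaling and wavelet
    coefficients (input length must be even)."""
    pairs = c_fine.reshape(-1, 2)
    c_coarse = pairs @ h          # c_{j,k} = sum_l h_l c_{j+1, 2k+l}
    d = pairs @ g                 # d_{j,k} = sum_l g_l c_{j+1, 2k+l}
    return c_coarse, d

# Iterating from scale J down to j0 yields (c_{j0}, d_{j0}, ..., d_{J-1}).
c = np.array([2.0, 2.0, 2.0, 2.0, 5.0, 1.0, 4.0, 4.0])
c1, d1 = analysis_stage(c)        # wavelet coeffs vanish where c is flat
```

Note that the only nonzero wavelet coefficient lands where the input jumps from 5 to 1, previewing the sparsity argument of the next section, and that the orthonormal filters preserve signal energy across the stage.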

The 1-D transform as described thus far is easily extensible to signals whose domain covers two dimensions, such as the pixels of a still image captured by the sensing elements of a CCD array in a digital camera. As an example, consider the classic cameraman test image in Figure 2.1(a). Essentially, the signal is first filtered in one spatial dimension, and the results are then filtered in the other. This sequential filtering combines the L, H results of the first stage with the L, H results of the second, generating four bands: H/H, H/L, L/H, and L/L. The first three encode some high-pass features of the signal and give the wavelet coefficients. The fourth encodes all low-pass information and yields the summary information that participates in the next stage of the transform. A two-scale discrete wavelet transform (DWT) of the cameraman image appears in Figure 2.1(b). The tiny image in the top-left corner encodes the final L/L scaling band, and the remainder of the blocks encode the H/H, H/L, and L/H wavelet bands at each of the two scales, with increasing spatial resolution to the right and down from the top-left corner.

2.3 Wavelet Theory and Sensor Networks

Wavelet analysis provides an efficient tool for representing a wide variety of real-world signals, since signals that are locally smooth have a sparse representation in the wavelet domain. That is, when signals vary smoothly in a region of the signal domain, scaling coefficients efficiently represent the largely low-frequency local behavior, and the wavelet coefficients for that region will tend to be small in magnitude. As an example, consider again the DWT of the cameraman image in Figure 2.1(b), where lighter (darker) pixels represent higher-magnitude (lower-magnitude) wavelet coefficients in the wavelet bands. In smooth regions such as the sky or the photographer's overcoat, the image is relatively smooth, and the corresponding wavelet coefficients are small in magnitude. The majority of larger-magnitude wavelet coefficients cluster around edges in the image, such as the transitions between the overcoat and the sky or between the legs of the camera tripod and the background. In each of these areas, the scaling coefficients describe a region of transition, so non-negligible differences from the average are required to represent the field.

Figure 2.1: (a) Standard cameraman test image and (b) its 2-D discrete wavelet transform using CDF-2,2 wavelets.

Sparsity in the wavelet domain allows for easy compression of images, by setting small-magnitude coefficients to zero, and easy removal of image noise, since small-magnitude coefficients at fine scales tend to encode only measurement noise. Intelligent modification of these terms removes noise energy from the reconstructed signal without removing signal energy. A number of image coders (e.g., [22–24]) and de-noising algorithms (e.g., [25–27]) based on wavelet analysis have been proposed in the signal processing literature, and the approach is widely accepted as a computationally efficient and powerful solution.

Given its success in such fields as image processing, wavelet analysis clearly holds much promise for the sensor network setting. In fact, the set of sensor nodes, distributed in space and collecting measurements, effectively "images" the environment the network monitors. When all sensors coordinate to sample a scalar at the same time, they capture a snapshot of the environment. Likewise, when they coordinate to capture time-series of measurements, they effectively capture a "video" sequence. An important difference exists, however — unlike the pixels in a digital camera's charge-coupled device (CCD) array, which are regularly spaced in both dimensions, the nodes of a sensor network are very likely to form an irregular sampling grid in space. Consider, at one extreme, the case of nodes deployed quickly from a mobile platform in a crisis situation. The sampling grid formed by these nodes is likely to be highly irregular. And even when nodes can be deployed at leisure in a more controlled environment, such as a manufacturing facility or research lab, the grid is still likely to deviate from the regular, square-grid format of most digital images. As an example, consider the Intel-Berkeley research lab deployment [28] depicted in Figure 2.2. Node placement is heavily influenced by the structure of the lab, which is not convenient to a regular-grid deployment.

Figure 2.2: Diagram of the Intel-Berkeley sensor network deployment. Note the irregularly-spaced node locations (denoted as hexagons).

This irregularity in the spatial sampling grid is a problem likely to vex all sensor network deployments, as observed in [29], and in fact it complicates the development of distributed wavelet analysis techniques for sensor networks. From (2.1), it follows from the definition of an MRA in the first-generation setting that wavelet analysis as presented not only requires regular spacing of sample points in the signal domain but also requires that the size of finite signals be a power of two. While this could be overcome by interpolating the irregularly sampled measurements onto a regular square grid, such an approach will introduce unacceptable interpolation error and will not distribute easily among the nodes. It is better, instead, to design a new class of WT from the ground up that is suited to irregular sampling grids.

This new transform motivates a new definition of an MRA in the second-generation sense [21]. We now define an MRA in an L² space as a strictly increasing, dense sequence V := {V_j}_{j≥j0} of closed subspaces in L² such that V_j ⊂ V_{j+1} and clos(⋃_{j=j0}^{∞} V_j) = L² for any j_0 ∈ N.

As before, we construct complement subspaces such that V_j ⊕ W_j = V_{j+1}, allowing us to write V_j = V_{j0} ⊕ ⊕_{i=j0}^{j−1} W_i. We again specify basis functions for each scale-j scaling space and wavelet space:

V_j = clos_{L²} span Φ_j,  Φ_j := {ϕ_{j,k} | k ∈ K_j}

and

W_j = clos_{L²} span Ψ_j,  Ψ_j := {ψ_{j,m} | m ∈ M_j},

where K_j and M_j are some index sets.

We can relate the scaling/wavelet functions at each scale j to the scaling functions at the next finer scale j + 1 through a new set of refinement relations,

ϕ_{j,k} = Σ_{l∈K_{j+1}} h_{j,l,k} ϕ_{j+1,l} (2.6)

and

ψ_{j,m} = Σ_{l∈K_{j+1}} g_{j,l,m} ϕ_{j+1,l} (2.7)

using refinement coefficients h_{j,l,k} and g_{j,l,m}.

The key difference from (2.2) and (2.3) is that the refinement coefficients are now dependent not only on the index of the scaling/wavelet function in question but also on the scale of the transform. That is, we must design unique filters, adapted to each location in our irregular grid, to compute the scaling and wavelet coefficients at each scale. In Chapter 3, which follows, we detail how to design a transform tailored to the structure of an irregular sampling grid.


Chapter 3

Multi-Scale Transform Design and Properties

In this Chapter¹, we design a multi-scale wavelet transform (WT) suited to deployment in sensor networks. As we discuss in Section 2.3, we cannot look to work on traditional WTs to guide our development here, as such transforms are intimately tied to regularly spaced, square sampling grids. Instead, we must design a new transform that is tolerant of the irregular-grid sampling structure we expect to see in real sensor network deployments. Such transforms are known to be feasible based either on the lifting scheme of Sweldens [34] or on the discrete multiscale framework of Harten [35]. In both approaches (which have much in common), the sensor grid Γ is thought of as the finest resolution level of a multiscale hierarchy of grids

Γ_{j0} ⊂ Γ_{j0+1} ⊂ · · · ⊂ Γ_J = Γ. (3.1)

Given the sampled values c_{J,γ} := f(γ) of the field f on the nodes γ ∈ Γ_J, the transform computes approximation (or scaling) coefficients c_j := (c_{j,γ})_{γ∈Γ_j} and detail (or wavelet) coefficients d_j := (d_{j,λ})_{λ∈Δ_j}, where

Δ_j := Γ_{j+1} \ Γ_j. (3.2)

The multiscale decomposition consisting of the sequences (c_{j0}, d_{j0}, d_{j0+1}, · · · , d_{J−1}) is algebraically equivalent to the data of the initial sequence c_J.
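A minimal 1-D sketch in the lifting style of [34] illustrates how one fine-to-coarse stage of such a decomposition can be computed; the even/odd split, linear-interpolation predict, and omission of an update stage are simplifications for illustration, not the design developed in this chapter.

```python
import numpy as np

# One fine-to-coarse lifting stage on an irregular 1-D grid. Gamma_j keeps
# every other node, Delta_j = Gamma_{j+1} \ Gamma_j holds the removed nodes,
# and each removed node's detail coefficient is its value minus a linear
# interpolation from its surviving neighbors — a purely local computation.
def lifting_stage(sites, values):
    keep, drop = sites[::2], sites[1::2]     # coarse grid and removed nodes
    c_keep = values[::2]                     # scaling coefficients c_j
    pred = np.interp(drop, keep, c_keep)     # predict from coarse neighbors
    detail = values[1::2] - pred             # wavelet coefficients d_j
    return keep, c_keep, detail

rng = np.random.default_rng(5)
sites = np.sort(rng.uniform(0.0, 1.0, 17))   # irregular sampling grid
values = 0.5 + 3.0 * sites                   # locally smooth (linear) field
_, _, detail = lifting_stage(sites, values)
```

Because the linear predictor reproduces linear fields exactly, the detail coefficients vanish here regardless of the grid irregularity — a toy instance of the smoothness-driven decay of wavelet coefficients discussed below.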

In most approaches to scattered data encoding, the non-uniform grids are organized into a triangulated mesh — see, for example, [36] or [37]. This approach allows in particular the use of techniques such as finite element approximation. In fact, we present a mesh-based technique for distributed, wavelet-based compression in the sensor network setting in [31], based on the transform designed for de-noising of data in a centralized setting in [21]. However, in order to reduce communication overhead and increase robustness in the presence of time-varying wireless channels, we desire that the transform not require the construction and maintenance of a mesh. Additionally, meshes are harder to construct when the sampling grid occupies a volume rather than a plane — a distinct possibility in the sensor network setting, where the deployment geography may require us to represent nodes' positions in space rather than on a flat surface. A meshless multiscale protocol for sensor networks that inherits the attractive compression properties of [31] is proposed in [32], based loosely on the framework of the former. In this chapter, we will re-develop the transform of [32] with slight modification to facilitate a rigorous mathematical analysis of the approach. This work also appears in [33].

¹This chapter represents joint work with Richard Baraniuk, Hyeokho Choi, Albert Cohen, Veronique Delouille, Shu Du, David B. Johnson, and Shriram Sarvotham [30–33].

From both theoretical and practical viewpoints, it is important to ensure that the multiscale transform retains the analytical properties of classical wavelet decompositions that ensure their sparsity — namely, that the magnitudes of the detail coefficients are governed by the local smoothness of the underlying function. The realization of such properties strongly depends on the specific design of the multiscale decomposition, and more precisely on:

1. The definition of the hierarchy (3.1) through a proper fine-to-coarse decimation procedure.

2. The definition of the rules that relate the scaling and wavelet coefficients from one scale to the next.

In the remaining sections, we will carefully develop these definitions in the context of a network whose nodes coordinate to gather scalar measurements at a single instant in time. We refer to the transform for such a measurement domain as a spatial or 2-D transform, assuming without loss of generality that nodes lie in a plane. We shall first devise in Section 3.2 a thinning procedure with the property that the resulting grids Γ_j are in some sense similar to uniform hierarchical grids of spacing 2^{−j}. Section 3.3 follows with a discussion of the interpolatory predict and smoothing update stages of the multiscale transform. Section 3.4 establishes the stability of the predict and update stages, resulting in the expected decay estimates of wavelet coefficients given locally smooth input functions. Section 3.5 demonstrates application of the thinning algorithm and verifies the numerical stability of the update stage. Section 3.6 follows with a description of the protocol to implement such a transform in a sensor network, along with a discussion of the robustness of the protocol to time-varying network connectivity and node failures. Section 3.7 discusses extending the spatial transform to a setting where nodes coordinate to collect time-series of measurements, developing for this extended domain a spatio-temporal or three-dimensional (3-D) WT.

Before we delve into the details of the development, however, we first present a brief review of related techniques in distributed, multi-scale analysis for sensor networks.

3.1 Related Work on Multi-Scale Transforms for Sensor Networks

Among related approaches to distributed multi-scale data processing in sensor networks, DIMENSIONS [38] provides a notable early example. This work proposes the use of a distributed WT to facilitate efficient routing of queries in a sensor network and enable distributed storage whose resolution, and, hence, memory requirements, degrade gracefully over time. While DIMENSIONS suggests both temporal and spatial processing of time-series of node measurements using wavelet techniques, it offers no suggestions on how to implement distributed spatial wavelet processing. Experiments conducted with node hardware rely solely on temporal wavelet processing. Moreover, a regular, dyadic spatial sampling grid is assumed throughout the work. The authors acknowledge the restrictions of such an assumption, demonstrating that even interpolation of irregularly sampled data onto a regular grid prior to wavelet-based distributed query processing fails to deliver the desired results.

A distributed WT for sensor networks is proposed in [39], but the approach taken by the authors is very different from the one described in this thesis. They assume that the network routing structure consists of a number of (possibly merging) individual paths originating at the periphery of the network and directed toward the network sink. The proposed WT is 1-D in nature, considering each path to be the domain of a new signal to be transformed. To account for irregularity in the distance between nodes in a path, a 1-D wavelet lifting scheme is suggested. Each node in the path can decide to retain or reduce the level of detail at which to encode its wavelet coefficient to optimize the transform with respect to inter-node communications. As paths merge, some attempt is made to reduce the amount of data required to describe multiple sets of coefficients at each node, but the transform still generates an overcomplete representation, with more wavelet coefficients than original measurements. Moreover, the fundamentally one-dimensional nature of the transform limits its overall efficiency for distributed compression. As an illustration, we consider histograms of wavelet coefficient magnitudes for the cameraman example of Figure 2.1. In Figure 3.1(a) we depict the histogram of coefficient magnitudes from a fully 2-D transform, and in Figure 3.1(b) we show the histogram of magnitudes from a transform where 1-D wavelet analysis is applied along one dimension of the image but no processing is applied along the other (mimicking the 1-D path approach of [39]). We see that the 2-D transform represents the overall signal more sparsely, generating more small-magnitude coefficients than the 1-D transform, which in turn leads to more efficient compression of the cameraman image. Thus, such a path-based approach cannot be expected to exploit the spatial dependencies in sensor measurements as effectively as a fully two-dimensional approach. Moreover, the transform parameters are intimately tied to the network routing structure, so that as aggregation paths to the sink change, an entirely new set of transform meta-data must be sent to the sink so that it can correctly compute the inverse transform of the wavelet coefficients. In a time-varying networking environment, there must be some mechanism for repairing the transform in a more limited fashion.
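This comparison is easy to reproduce in miniature. The sketch below applies a one-level Haar transform (a simple stand-in for the wavelet filters used in the thesis) to a synthetic piecewise-smooth image, since the cameraman image itself is not reproduced here, and counts small-magnitude coefficients; all names and thresholds are illustrative choices:

```python
import numpy as np

def haar1d(x):
    """One-level orthonormal 1-D Haar analysis along the last axis."""
    s = (x[..., 0::2] + x[..., 1::2]) / np.sqrt(2)   # scaling (average) part
    d = (x[..., 0::2] - x[..., 1::2]) / np.sqrt(2)   # detail (difference) part
    return np.concatenate([s, d], axis=-1)

def haar2d(x):
    """Separable one-level 2-D Haar analysis: rows, then columns."""
    return haar1d(haar1d(x).T).T

# Piecewise-smooth test image: smooth ramp plus a step edge.
n = 128
u, v = np.meshgrid(np.linspace(0, 1, n), np.linspace(0, 1, n))
img = np.sin(3 * u) + (v > 0.5) * 1.0

c2 = haar2d(img)   # fully 2-D analysis
c1 = haar1d(img)   # 1-D analysis along one dimension only

small = lambda c: np.mean(np.abs(c) < 0.01)   # fraction of small coefficients
print(small(c2), small(c1))
```

On this image the 2-D transform leaves a clearly larger fraction of near-zero coefficients than the row-wise 1-D transform, mirroring the histogram comparison of Figure 3.1.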

A multi-scale technique for sensor networks unrelated to distributed wavelet analysis can be found in the work of [40]. Known as Fractional Cascading, the proposed approach calls for each node in the network to have a view of the entire scalar field


Figure 3.1: Histograms of wavelet coefficient magnitudes from the cameraman test image yielded by (a) fully 2-D wavelet analysis and (b) 1-D wavelet analysis along each column.

measured by the network, with the resolution of that view decreasing as the locations of the measured values become more distant from the node in question. The data structure enables efficient responses of the network to so-called range queries — requests for all sensors in a given region whose measurements lie in a given range. Fractional Cascading provides each node with its multi-resolution view by imposing a so-called virtual quadtree structure on the measured field. That is, the 2-D spatial domain in which the sensors lie is recursively split into dyadic partitions in each dimension. This virtual quadtree does not, however, assume any regularity in the placement of the nodes. Overall, the technique performs well for the application of range querying, and allows queries to be efficiently answered when injected at any point in the network. The data structure, however, does not extend to a broader set of applications such as compression or de-noising of nodes' measurements. Fractional Cascading is discussed in further detail in Section 6.4.4.

The authors of [41] propose another scheme suited to answering queries on a scalar measurement field sampled by a sensor network. In this scheme, each node randomly selects a hop-neighborhood size h and, using data from all neighbors within h radio hops, computes a summary coefficient and a detail coefficient describing the h-hop neighborhood. The goal of such analysis is that, with high probability, a user should obtain a large-scale summary by querying just a few nodes anywhere in the network or a small-scale detail by querying a handful of nodes in a region of interest. The approach is in general well suited to limited local querying of coefficients within the network, makes efficient use of local communications, and is robust in the presence of unreliable wireless links. It does not, however, extend well to efficient approximation of the entire measurement field with a handful of transform coefficients, due to the random nature of scale selection and the simple design of the averaging and differencing filters.

3.2 Multiscale Description of Scattered Points

We now proceed to detail the design of our proposed distributed WT. We restrict ourselves for now to designing a spatial transform, where nodes are distributed on an irregular grid collecting scalar measurements. We extend our results to construct a spatio-temporal transform in Section 3.7.

Let Ω be a bounded domain in IR^d and let Γ be a discrete finite set of points all included in Ω. For such a set of points, we introduce two natural quantities: the maximal density defined as

δ(Γ) := max_{x∈Ω} min_{γ∈Γ} |x − γ| (3.3)

and the minimal spacing defined as

µ(Γ) := min_{γ,γ′∈Γ, γ≠γ′} |γ − γ′|. (3.4)

Note that if Ω is convex, or if the segment [γ, γ′] between the minimizing pair in the above expression is contained in Ω, then we clearly have that

µ(Γ) ≤ 2δ(Γ). (3.5)

When Γ is a uniform grid of points, the two quantities δ(Γ) and µ(Γ) are comparable. For instance, µ(Γ) = √2 δ(Γ) in the case of a square grid.
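This relation is easy to verify numerically (a sketch; the dense sampling of Ω below only approximates the maximum in (3.3)):

```python
import numpy as np

def max_density(points, omega_samples):
    """delta(Gamma): max over x in Omega of the min distance from x to Gamma (3.3)."""
    d = np.linalg.norm(omega_samples[:, None, :] - points[None, :, :], axis=2)
    return d.min(axis=1).max()

def min_spacing(points):
    """mu(Gamma): smallest pairwise distance between distinct points of Gamma (3.4)."""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)
    return d.min()

# Square grid of spacing h = 0.25 on Omega = [0,1]^2.
h = 0.25
ax = np.arange(0.0, 1.0 + 1e-9, h)
grid = np.array([(x, y) for x in ax for y in ax])

# Dense sampling of Omega to approximate the max over x in (3.3).
s = np.linspace(0.0, 1.0, 201)
omega = np.array([(x, y) for x in s for y in s])

delta, mu = max_density(grid, omega), min_spacing(grid)
print(mu / delta)   # approximately sqrt(2): the farthest point of Omega from the
                    # grid is a cell center, at distance h*sqrt(2)/2, while mu = h
```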

More generally, we shall be interested in sets that are close to a uniform grid according to the following definition.

Definition 3.2.1 The set Γ is called “quasi-uniform” if and only if

δ(Γ) < 2µ(Γ). (3.6)

This property will play an important role when further analyzing the stability properties of the multiscale transform.

We next describe our thinning algorithm. Given the set Γ (which is not in general assumed to be quasi-uniform), we first define the finest resolution level J as the unique integer such that

2^{−J} < µ(Γ) ≤ 2^{−J+1}. (3.7)

Assuming that Γj has been defined for some integer j, the coarser grid Γj−1 is obtained by the following iterative procedure:

1. Define Γj−1 = Γj and ∆j−1 as the empty set.

21

Page 29: Distributed Multi-Scale Data Processing for Sensor Networksrwagner/docs/wagnerPHDThesis.pdf · Wireless sensor networks provide a challenging application area for signal process-ing

2. Pick γ ∈ Γj−1. For all µ ∈ Γj−1 such that µ ≠ γ and |µ − γ| ≤ 2^{−j+1}, remove µ from Γj−1 and add it to ∆j−1.

3. Maintain γ in Γj−1 and return to Step 2 by picking a new γ that is still in the updated Γj−1.

4. Continue Steps 2 and 3 until all possible γ ∈ Γj−1 have been visited.
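The four steps above can be sketched directly in code (a minimal centralized illustration, not the distributed protocol of Section 3.6; the level j is fixed by hand here rather than computed from (3.7)):

```python
import numpy as np

def thin(gamma_j, j):
    """One fine-to-coarse thinning step: split Gamma_j into (Gamma_{j-1}, Delta_{j-1}).

    Each retained point gamma removes all other surviving points within
    distance 2^{-j+1} of it (Step 2); retained points are never removed (Step 3).
    """
    gamma = list(range(len(gamma_j)))   # indices still in Gamma_{j-1}
    delta = []                          # indices moved to Delta_{j-1}
    r = 2.0 ** (-j + 1)
    visited = set()
    while True:
        pick = next((i for i in gamma if i not in visited), None)  # Steps 3-4
        if pick is None:
            break                       # all surviving points visited
        visited.add(pick)
        keep = []
        for i in gamma:
            if i != pick and np.linalg.norm(gamma_j[i] - gamma_j[pick]) <= r:
                delta.append(i)         # Step 2: move close neighbor to Delta
            else:
                keep.append(i)
        gamma = keep
    return gamma_j[gamma], gamma_j[delta]

rng = np.random.default_rng(0)
pts = rng.random((200, 2))
j = 5                                   # illustrative scale index
gam, dlt = thin(pts, j)
print(len(gam), len(dlt))               # a partition of the 200 points
```

By construction the output satisfies the two properties stated next: retained points are mutually separated by more than 2^{−j+1}, and every removed point has a retained point within that distance.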

Two properties of the grids Γj and ∆j immediately follow from the definition of the thinning procedure:

µ(Γj) > 2^{−j}, (3.8)

and

for all µ ∈ ∆j, there exists γ ∈ Γj such that |µ − γ| ≤ 2^{−j}. (3.9)

From the second property (3.9), we can derive that for all x ∈ Ω

min_{γ∈Γj−1} |x − γ| ≤ 2^{−j} + min_{γ∈Γj} |x − γ|, (3.10)

and therefore, taking the maximum over x ∈ Ω on both sides,

δ(Γj−1) ≤ 2^{−j} + δ(Γj), (3.11)

which by iteration leads to

δ(Γj) ≤ ∑_{j<l≤J} 2^{−l} + δ(Γ) < 2^{−j} + δ(Γ). (3.12)

This last inequality implies that the thinning algorithm eventually generates quasi-uniform sets, even if the initial set Γ is not quasi-uniform. This is expressed by the following result, the proof of which is an immediate consequence of (3.12).

Theorem 3.2.1 Let L be the integer such that 2^{−L} < δ(Γ) ≤ 2^{−L+1}. Then for j ≤ L, we have

δ(Γj) < 2^{−j+1}, (3.13)

and therefore δ(Γj) < 2µ(Γj), i.e., the sets Γj are quasi-uniform.

Note that, as a consequence of the above theorem, if the initial set satisfies δ(Γ) ≤ 2^{−J+1}, then the entire hierarchy consists of quasi-uniform sets.

3.3 Multiscale Transforms

Assume that we are given the values of a function f on the finest grid ΓJ — we call these values the field, as described above. We denote by cJ the vector consisting of these values, i.e.,

cJ = (cJ,γ)γ∈ΓJ, cJ,γ := f(γ). (3.14)


Our multiscale transforms are based on three linear interscale operators. The first operator is simply the restriction operator that maps a vector (uγ)γ∈Γj+1 to the smaller vector (uγ)γ∈Γj. Note that the iterative application of this operator on the vector cJ produces the values of the function f on the grids Γj for j = J − 1, J − 2, · · ·.

The second operator is a prediction operator that maps a vector (uγ)γ∈Γj to a vector (ûλ)λ∈Γj+1. If uγ represents the value of a function u at the point γ, then the value ûλ should be thought of as an approximation of u at the point λ, obtained by interpolation of its known values on Γj. More precisely, the value ûλ will be given by

ûλ = ∑_{γ∈Nj(λ)} aλ,γ uγ, (3.15)

where Nj(λ) is a neighborhood of λ consisting of those γ ∈ Γj that are in some ball centered around λ and of radius CL 2^{−j}, where CL ≥ 1 is a fixed constant, and the aλ,γ are fixed coefficients such that

∑_{γ∈Nj(λ)} aλ,γ = 1. (3.16)

Note that according to (3.9), the neighborhood Nj(λ) always contains at least one element.

Before specifying in more detail the neighborhood Nj(λ) and the coefficients aλ,γ, let us define a first interpolatory multiscale transform in which the scaling coefficients are simply the function values and the wavelet coefficients are the prediction errors. The decomposition algorithm reads as follows: for j = J − 1, · · · , j0, do

1. Restrict: cj,γ := cj+1,γ for all γ ∈ Γj.

2. Predict: ĉj+1,λ := ∑_{γ∈Nj(λ)} aλ,γ cj,γ for all λ ∈ ∆j.

3. Compute details: dj,λ := cj+1,λ − ĉj+1,λ for all λ ∈ ∆j.

The reconstruction algorithm reads as follows: for j = j0, · · · , J − 1, do

1. Extend: cj+1,γ := cj,γ for all γ ∈ Γj.

2. Predict: ĉj+1,λ := ∑_{γ∈Nj(λ)} aλ,γ cj,γ for all λ ∈ ∆j.

3. Correct: cj+1,λ := ĉj+1,λ + dj,λ for all λ ∈ ∆j.
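A minimal numerical sketch of the decomposition and reconstruction above, using the order m = 0 nearest-neighbor predict of (3.18) and, for simplicity, an arbitrary even/odd index split in place of the thinning hierarchy:

```python
import numpy as np

def decompose(c_fine, fine_pts, coarse_idx, detail_idx):
    """One interpolatory lifting stage with the m = 0 predict of (3.18):
    each detail point is predicted from its nearest coarse neighbor
    (a single weight a = 1, so (3.16) holds trivially)."""
    c_coarse = c_fine[coarse_idx]                      # 1. Restrict
    details = np.empty(len(detail_idx))
    for k, lam in enumerate(detail_idx):
        dists = np.linalg.norm(fine_pts[coarse_idx] - fine_pts[lam], axis=1)
        pred = c_coarse[np.argmin(dists)]              # 2. Predict
        details[k] = c_fine[lam] - pred                # 3. Compute details
    return c_coarse, details

def reconstruct(c_coarse, details, fine_pts, coarse_idx, detail_idx):
    c_fine = np.empty(len(coarse_idx) + len(detail_idx))
    c_fine[coarse_idx] = c_coarse                      # 1. Extend
    for k, lam in enumerate(detail_idx):
        dists = np.linalg.norm(fine_pts[coarse_idx] - fine_pts[lam], axis=1)
        pred = c_coarse[np.argmin(dists)]              # 2. Predict
        c_fine[lam] = pred + details[k]                # 3. Correct
    return c_fine

rng = np.random.default_rng(1)
pts = rng.random((40, 2))
f = np.sin(2 * np.pi * pts[:, 0]) + pts[:, 1]          # sample field
coarse, detail = np.arange(0, 40, 2), np.arange(1, 40, 2)
c, d = decompose(f, pts, coarse, detail)
assert np.allclose(reconstruct(c, d, pts, coarse, detail), f)  # perfect reconstruction
```

Note that a constant field produces identically zero details, the m = 0 case of the polynomial exactness property (P1) introduced below.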

The main defect of this interpolatory WT is that the scaling coefficients are exactly the sub-sampled values of the original function on the grid Γj, instead of being local averages of the function values at the resolution 2^{−j}. In the wavelet basis framework, this is reflected by an inherent lack of L2-stability of the interpolatory basis. One way to correct this defect is to introduce an updating operator that maps two vectors (uγ)γ∈Γj and (dλ)λ∈∆j to a vector (uγ)γ∈Γj defined by

uγ := uγ + ∑_{λ∈Mj(γ)} bγ,λ dλ, (3.17)


Figure 3.2: A cascaded pair of lifting stages using predict (P) and update (U) operators.

where Mj(γ) is a neighborhood of γ consisting of those λ ∈ ∆j such that γ ∈ Nj(λ), i.e., those λ at which the prediction is influenced by the value at γ, and the bγ,λ are fixed coefficients. Note that the neighborhood Mj(γ) might be empty in the case where γ does not contribute to the prediction operator. In such a case, there is no update at this point.

Before specifying in more detail the coefficients bγ,λ, we can define a second multiscale transform by incorporating the update stage in the interpolatory multiscale transform. The decomposition algorithm reads as follows: for j = J − 1, · · · , j0, do

1. Restrict: cj,γ := cj+1,γ for all γ ∈ Γj.

2. Predict: ĉj+1,λ := ∑_{γ∈Nj(λ)} aλ,γ cj,γ for all λ ∈ ∆j.

3. Compute details: dj,λ := cj+1,λ − ĉj+1,λ for all λ ∈ ∆j.

4. Update: cj,γ := cj,γ + ∑_{λ∈Mj(γ)} bγ,λ dj,λ for all γ ∈ Γj.

The reconstruction algorithm reads as follows: for j = j0, · · · , J − 1, do

1. Update: cj,γ := cj,γ − ∑_{λ∈Mj(γ)} bγ,λ dj,λ for all γ ∈ Γj.

2. Extend: cj+1,γ := cj,γ for all γ ∈ Γj.

3. Predict: ĉj+1,λ := ∑_{γ∈Nj(λ)} aλ,γ cj,γ for all λ ∈ ∆j.

4. Correct: cj+1,λ := ĉj+1,λ + dj,λ for all λ ∈ ∆j.

Figure 3.2 depicts a cascaded pair of lifting stages using both the predict and update operators.

We now make the choice of the interscale operators more precise. We will first discuss the prediction operator. In order to ensure the correct rate of decay of the wavelet coefficients when the function has a given amount of smoothness, the following properties will be essential:

(P1) Polynomial exactness of some order m: if cj,γ = p(γ) for all γ ∈ Γj and for some p ∈ Πm, then cj+1,γ = p(γ) for all γ ∈ Γj+1.

(P2) Locality: Nj(λ) is contained in the ball |x − λ| ≤ CL 2^{−j}, with CL a uniform constant independent of j and λ.

(P3) Stability: ∑_{γ∈Nj(λ)} |aλ,γ| ≤ CA, with CA a uniform constant independent of j and λ.

In the case where m = 0 (exactness for constant functions), one can easily jointly fulfill these three properties by a simple choice: for all λ ∈ ∆j, we denote by γ(λ) a point of Γj such that |λ − γ(λ)| ≤ 2^{−j} (such a point always exists according to (3.9)) and set

ûλ := uγ(λ). (3.18)

This choice obviously satisfies the above properties with m = 0, CL = 1, and CA = 1.

If we want to raise the order of accuracy to some m > 0, then the most natural approach is to reconstruct a polynomial pλ ∈ Πm from the data uγ for γ ∈ Nj(λ). In the case of regular grids, this is an easy task that can be addressed by building the interpolating polynomial from a well-chosen subset of points. For non-regular grids, the choice of such a subset that ensures the well-posedness of the interpolation problem is not an easy task, and we shall instead rely on a least-squares strategy: we look for the pλ ∈ Πm solution of the problem

min_{p∈Πm} ∑_{γ∈Nj(λ)} |p(γ) − uγ|² (3.19)

and then define

ûλ = pλ(λ). (3.20)

The least-squares problem can be solved by introducing a basis of Πm locally adapted to the neighborhood Nj(λ), for example

qα(x1, · · · , xd) = ∏_{i=1}^{d} (2^j CL^{−1} (xi − λi))^{αi}, |α| = α1 + · · · + αd ≤ m, (3.21)

and by defining pλ = ∑_{|α|≤m} xα qα, where the vector x = (xα)^T is the solution of the normal equation

Gx = y, (3.22)

with the matrix G := (Gα,β)_{|α|,|β|≤m} and right-hand side y := (yα)^T defined by

Gα,β := ∑_{γ∈Nj(λ)} qα(γ) qβ(γ), (3.23)


and

yα := ∑_{γ∈Nj(λ)} qα(γ) uγ. (3.24)

Note that a necessary (but not sufficient) condition for the invertibility of the matrix G is that the number of points in Nj(λ) is at least the dimension of Πm. Note also that, with the above choice of basis for Πm, the expression of the prediction simplifies according to

ûλ = ∑_{|α|≤m} xα qα(λ) = x0, (3.25)

and since x0 depends linearly on y, which itself depends linearly on the uγ, we can write ûλ = ∑_{γ∈Nj(λ)} aλ,γ uγ, where the coefficients aλ,γ only depend on the choice of Nj(λ). Since x is obtained by solving (3.22), the stability of the prediction in the sense of (P3) is related to the invertibility and conditioning properties of G.
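The construction (3.21)-(3.25) can be checked numerically. The sketch below builds the weights aλ,γ by solving the normal equations for m = 1 and verifies that an affine field is predicted exactly, i.e., property (P1); the specific neighborhood, field, and constants are arbitrary choices for illustration:

```python
import numpy as np
from itertools import product

def predict_coeffs(neighbors, lam, m=1, CL=4.0, j=0):
    """Least-squares prediction weights a_{lam,gamma} via the normal
    equations (3.22)-(3.24), in the locally adapted basis (3.21)."""
    d = neighbors.shape[1]
    alphas = [a for a in product(range(m + 1), repeat=d) if sum(a) <= m]
    scale = 2.0 ** j / CL
    # Q[gamma, alpha] = q_alpha(gamma); its first column is the constant q_0 = 1
    Q = np.array([[np.prod((scale * (g - lam)) ** np.array(a)) for a in alphas]
                  for g in neighbors])
    G = Q.T @ Q                                  # Gram matrix (3.23)
    # Since q_alpha(lam) = 0 for |alpha| > 0, u_hat(lam) = x_0  (3.25), so the
    # weights are the first row of G^{-1} Q^T applied to the data y = Q^T u.
    return np.linalg.solve(G, Q.T)[0]

rng = np.random.default_rng(2)
nbrs = rng.random((6, 2))                        # an irregular neighborhood N_j(lambda)
lam = np.array([0.5, 0.5])
a = predict_coeffs(nbrs, lam, m=1)

p = lambda x: 3.0 - 2.0 * x[..., 0] + 0.5 * x[..., 1]   # an affine polynomial
pred = a @ p(nbrs)                               # prediction of p at lambda
```

Since affine functions lie in Π1, the least-squares fit reproduces them exactly, so `pred` equals p(λ) and the weights sum to one, as required by (3.16).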

Our main result in this section is the following theorem, which states that (P1), (P2), (P3) can always be jointly ensured for quasi-uniform meshes in the sense of Definition 3.2.1.

Theorem 3.3.1 Assume that Γj is quasi-uniform. Then for all m > 0, there exist CL and CA depending only on m and d such that, with the choice Nj(λ) := {γ ∈ Γj ; |γ − λ| ≤ CL 2^{−j}}, the following properties hold: G is invertible with

‖G^{−1}‖ := sup_{‖y‖ℓ2 = 1} ‖G^{−1}y‖ℓ2 ≤ CG, (3.26)

where CG only depends on m and d, and

∑_{γ∈Nj(λ)} |aλ,γ| ≤ CA. (3.27)

The proof of this theorem is technical and postponed to the Appendix. Based on this result, we can formulate the following strategy for defining the prediction operator, given the sets Γj and the degree of polynomial exactness m: for a given λ ∈ ∆j,

1. Consider the points γ1, · · · , γn of Γj that are at distance less than CL 2^{−j} from λ.

2. Consider all subsets Nj,ℓ(λ) for ℓ = 1, · · · , 2^n, sorted in order of increasing total squared distance from λ (or some other metric reflecting the transmission energy cost in a sensor network).

3. For each ℓ, build the matrix G associated to the neighborhood Nj,ℓ(λ), and when G is non-singular compute the quantity CA,ℓ = ∑_{γ∈Nj,ℓ(λ)} |aλ,γ|.

4. In the case where CA,ℓ ≤ CA for some ℓ, take Nj(λ) := Nj,ℓ*(λ), where ℓ* is the smallest ℓ such that CA,ℓ* ≤ CA, and stop.


5. In the case where, for all ℓ, either CA,ℓ > CA or G is singular, abandon polynomial exactness of degree m and run the same procedure for polynomial exactness of degree m − 1, m − 2, · · · until the stability criterion is met.

According to Theorem 3.3.1, we are ensured that we never go to Step 5 in the case of quasi-uniform grids. On the other hand, it is possible that we go to this step and are led to lower the order of polynomial exactness for the prediction at λ down to m = 0 in the case of a grid that does not have this property. Intuitively, this corresponds to the situation where λ has only one point γ(λ) ∈ Γj at distance less than 2^{−j} while all the other points of Γj are much further. In such a case, the algorithm decides to use a low-order prediction based on the close point γ(λ), rather than a high-order prediction based on very distant points.
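A sketch of this selection strategy for m = 1 in d = 2, using CA = 2 (the constant derived in the remark below); the rank test standing in for "G is singular" and the brute-force subset enumeration are illustrative simplifications:

```python
import numpy as np
from itertools import combinations

def ls_weights(nbrs, lam):
    """m = 1 least-squares prediction weights for one candidate neighborhood.
    Uses the basis (1, x1 - lam1, x2 - lam2); the fitted value at lam, and
    hence the weights, do not depend on the basis normalization."""
    Q = np.hstack([np.ones((len(nbrs), 1)), nbrs - lam])
    G = Q.T @ Q
    if np.linalg.matrix_rank(G) < G.shape[0]:
        return None                          # treat rank-deficient G as singular
    return np.linalg.solve(G, Q.T)[0]        # u_hat(lam) = x_0

def choose_neighborhood(candidates, lam, CA=2.0):
    """Steps 1-4 above: scan subsets in order of increasing total squared
    distance from lam; accept the first whose weights satisfy (P3)."""
    n = len(candidates)
    subsets = [s for k in range(1, n + 1) for s in combinations(range(n), k)]
    cost = lambda s: sum(np.sum((candidates[i] - lam) ** 2) for i in s)
    for s in sorted(subsets, key=cost):
        a = ls_weights(candidates[list(s)], lam)
        if a is not None and np.abs(a).sum() <= CA:
            return list(s), a
    return None, None                        # Step 5: lower the order instead

rng = np.random.default_rng(3)
cand = rng.random((6, 2))                    # points of Gamma_j near lambda
lam = np.array([0.5, 0.5])
sel, a = choose_neighborhood(cand, lam)
```

For random candidate sets the accepted neighborhood is typically a small, nearby, well-spread subset, and the resulting weights still reproduce affine fields exactly.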

Remark 3.3.1 In the case m = 1, corresponding to the affine polynomials that are used in the numerical tests of Section 3.5, one can easily estimate CL and CA more sharply than with the method proposed in the general proof of Theorem 3.3.1, which is rather pessimistic on the size of these constants. For the two-dimensional case d = 2, elementary yet tedious computations lead to CL = 4 and CA = 2.

We now turn our attention to the design of the update operator. The goal of this operator is to improve the overall stability of the transform by smoothing the sub-sampled values (cj,γ)γ∈Γj so that the scaling coefficients (cj,γ)γ∈Γj have the same average behavior as the scaling coefficients (cj+1,γ)γ∈Γj+1 at the next finer scale.

Before we can specify the update operator coefficients, we must first return to the second-generation MRA introduced in Section 2.3 for a bit of background, again drawing from the review presented in [21]. For succinctness, we can express the relations from (2.6) and (2.7) in terms of matrix operations as

Φj = Φj+1 Hj (3.28)

Ψj = Φj+1 Gj, (3.29)

respectively, where Φj (Ψj) represents a row vector of all the scale-j scaling (wavelet) functions.

The lifting construction presented here gives us a biorthogonal system of wavelet functions. Under this paradigm, one set of basis functions, called the dual basis and denoted Φ̃j and Ψ̃j, is used for analysis. Another set, called the primal basis and denoted Φj and Ψj, is used for synthesis. The refinement operators relating the primal basis functions are the same as described above, and analogous operators for the dual basis are specified as Φ̃j = Φ̃j+1 H̃j and Ψ̃j = Φ̃j+1 G̃j.

Using this dual basis for analysis now gives us cj = H̃j* cj+1 and dj = G̃j* cj+1, and application of the primal basis for synthesis gives cj+1 = Hj cj + Gj dj.

The lifting construction of our biorthogonal transform begins with two initial filter pairs, (H0j, G0j) and (H̃0j, G̃0j), that serve as indicator matrices that merely segregate Γj+1 into Γj and ∆j. We collect the predict and update operator coefficients into two matrices, P and U, respectively. The lifting process then gives us the filter pair

Hj = H0j + G0j Pj (3.30)

Gj = G0j − Hj Uj (3.31)

for the primal basis. Similarly, for the dual basis, we have the pair G̃j = G0j − H0j Pj* and H̃j = H0j + G̃j Uj*.
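These four filter relations yield perfect reconstruction for any choice of predict and update matrices; a small NumPy check with random P and U (dimensions chosen arbitrarily) makes the biorthogonality explicit:

```python
import numpy as np

rng = np.random.default_rng(4)
ns, nd = 5, 3                       # |Gamma_j| and |Delta_j|
n = ns + nd                         # |Gamma_{j+1}|

H0 = np.eye(n)[:, :ns]              # indicator: selects Gamma_j from Gamma_{j+1}
G0 = np.eye(n)[:, ns:]              # indicator: selects Delta_j
P = rng.standard_normal((nd, ns))   # arbitrary predict coefficients
U = rng.standard_normal((ns, nd))   # arbitrary update coefficients

H = H0 + G0 @ P                     # (3.30)
G = G0 - H @ U                      # (3.31)
Gd = G0 - H0 @ P.T                  # dual filter G~_j
Hd = H0 + Gd @ U.T                  # dual filter H~_j

# Analysis with the dual filters followed by synthesis with the primal
# filters is the identity on scale j+1.
c1 = rng.standard_normal(n)
c, d = Hd.T @ c1, Gd.T @ c1
assert np.allclose(H @ c + G @ d, c1)
```

The underlying identities are H̃jᵀHj = I, G̃jᵀGj = I, and H̃jᵀGj = G̃jᵀHj = 0, which follow by expanding the lifted filters against the indicator pair.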

Using these relations, we begin by considering each sequence (f(γ))γ∈ΓJ as a combination of Dirac sequences centered at the points of ΓJ. We call these Dirac sequences ϕJ,γ, and they serve as the “discrete scaling function” primal basis for reconstructing (f(γ))γ∈ΓJ from the cJ,γ. Using (3.28) and (3.30), we can define each discrete scaling function at the subsequent scale recursively as

ϕj,γ = ϕj+1,γ + ∑_{λ∈Mj(γ)} aλ,γ ϕj+1,λ (3.32)

(specified for each element of Φj). Similarly, we assign to each detail dj,λ, λ ∈ ∆j, a discrete wavelet ψj,λ. Using (3.29) and (3.31), we can define each of these as a mixture of basis functions from scales j and j + 1 according to

ψj,λ = ϕj+1,λ − ∑_{γ∈Nj(λ)} bγ,λ ϕj,γ (3.33)

(specified for each element of Ψj). Denote the discrete integral of ϕj,γ as Ij,γ. The goal of maintaining a constant average value across scales amounts to keeping ∑_{γ∈Γj} cj,γ Ij,γ constant for each scale j. Due to the relationships between functions in the dual and primal biorthogonal bases, this is equivalent to giving each ψj,λ, λ ∈ ∆j, a zero integral, i.e., a single vanishing moment [21].

Since the basis functions of the finest-scale grid are Dirac sequences, we have that IJ,γ = 1 for each γ ∈ ΓJ. From (3.32), all subsequent basis integrals for scales j < J are found as

Ij,γ = Ij+1,γ + ∑_{λ∈Mj(γ)} aλ,γ Ij+1,λ. (3.34)

With the scale-j and scale-(j + 1) integrals in hand, giving each ψj,λ a zero integral amounts to choosing the bγ,λ for γ ∈ Nj(λ) to satisfy

Ij+1,λ = ∑_{γ∈Nj(λ)} bγ,λ Ij,γ. (3.35)

Any number of choices of bγ,λ will satisfy (3.35). For example, in the sensor network scenario, updating only the point γ* = arg min_{γ∈Nj(λ)} |λ − γ| is an intuitive and low-cost solution. Under this method, the update rule is

bγ,λ = Ij+1,λ / Ij,γ, γ = γ*, (3.36)

with bλ,γ = 0 otherwise. Unfortunately, this update rule does not provide the desiredstability. In fact, in many cases, it reduces the overall stability, as we will demonstratein Section 3.5. Instead, we turn to the least-squares solution of Jansen et al. [42] andDelouille [21], that gives update coefficients of minimum norm:

bλ,γ =Ij,γIj+1,λ∑η∈Nj(λ) I2

j,η

. (3.37)

This choice updates all neighbors in Nj(λ) and gives the desired transform stability, as Section 3.5 will illustrate.
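The integral recursion (3.34), the vanishing-moment condition (3.35), and the least-squares update (3.37) can be exercised together in a small sketch; the predict neighborhoods and weights below are randomly generated stand-ins for those produced by the actual transform:

```python
import numpy as np

rng = np.random.default_rng(5)
ns, nd = 6, 4                          # |Gamma_j| and |Delta_j|

# Hypothetical predict stage: each detail point lambda is predicted from two
# coarse neighbors with weights summing to one, as in (3.16).
Nj = [rng.choice(ns, size=2, replace=False) for _ in range(nd)]
A = np.zeros((nd, ns))
for lam, nb in enumerate(Nj):
    w = rng.random(2)
    A[lam, nb] = w / w.sum()

I_fine = np.ones(ns + nd)                     # finest scale: Dirac integrals = 1
I_coarse = I_fine[:ns] + A.T @ I_fine[ns:]    # integral recursion (3.34)

# Least-squares update (3.37): b_{gamma,lam} = I_{j,gamma} I_{j+1,lam} / sum I^2.
B = np.zeros((ns, nd))
for lam, nb in enumerate(Nj):
    B[nb, lam] = I_coarse[nb] * I_fine[ns + lam] / np.sum(I_coarse[nb] ** 2)

# The zero-integral condition (3.35) holds for every wavelet...
assert np.allclose(B.T @ I_coarse, I_fine[ns:])

# ...and the weighted average sum_gamma c_gamma I_gamma is preserved across
# one decomposition stage (restrict, predict, detail, update).
c1 = rng.standard_normal(ns + nd)
c = c1[:ns]                                   # restrict
d = c1[ns:] - A @ c                           # details
c = c + B @ d                                 # update
assert np.isclose(c @ I_coarse, c1 @ I_fine)
```

The second assertion is exactly the constant-average property the update stage is designed to enforce; it follows from (3.34) and (3.35) alone, independent of the particular field values.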

3.4 Decay Properties of Wavelet Coefficients

In this section we shall prove that when the properties (P1), (P2), (P3) are fulfilled, the decay in scale of the wavelet coefficients is governed by the local smoothness of the function f, similar to the case of standard wavelet bases.

Recall that for s > 0, the function f is Cs at the point λ if there exists a polynomial p of degree m < s and some Cλ such that for all y

|f(y) − p(y)| ≤ Cλ |y − λ|^s. (3.38)

Our first result deals with the coefficients of the interpolatory multiscale transform.

Theorem 3.4.1 Assume that (P1), (P2), (P3) are fulfilled and that f is Cs at λ ∈ ∆j for some s ≤ m + 1. Then, we have the estimate

|dj,λ| ≤ Kλ 2^{−sj}, (3.39)

with Kλ = CA Cλ CL^s.

Proof: According to (3.38), we can write f = p + r, where p ∈ Πm is such that

p(λ) = f(λ), (3.40)

and r is such that for all y

|r(y)| ≤ Cλ |y − λ|^s. (3.41)

Therefore, when γ ∈ Nj(λ), we have

cj,γ = p(γ) + r(γ), (3.42)


where, by (P2) and (3.41), we have

|r(γ)| ≤ Cλ CL^s 2^{−sj}. (3.43)

We now write

ĉj+1,λ = ∑_{γ∈Nj(λ)} aλ,γ p(γ) + ∑_{γ∈Nj(λ)} aλ,γ r(γ). (3.44)

From (P1) we obtain that

∑_{γ∈Nj(λ)} aλ,γ p(γ) = p(λ) = f(λ) = cj+1,λ, (3.45)

so that

|dj,λ| = |cj+1,λ − ĉj+1,λ| = |∑_{γ∈Nj(λ)} aλ,γ r(γ)|. (3.46)

From (P3) and (3.43), we get

|∑_{γ∈Nj(λ)} aλ,γ r(γ)| ≤ CA Cλ CL^s 2^{−sj}, (3.47)

which concludes the proof. □

Our next result deals with the coefficients of the multiscale transform with the additional update stage. We would like to show that these coefficients decay at a similar rate as those of the interpolatory transform when the function f is smooth. Our strategy for proving this property will be to consider the update stage as a perturbation of the interpolatory transform that is very small when f is smooth. We will write

cj,γ = f(γ) + gj,γ, (3.48)

and prove that gj,γ has the same order of magnitude as the details dj,λ. In order to implement this idea, we shall need additional assumptions that involve the coefficients bγ,λ in the update stage:

(P4) Update stability: ∑_{λ∈Mj(γ)} |bγ,λ| ≤ CB, with CB independent of γ and j.

(P5) Combined stability: ∑_{µ} |δγ,µ + ∑_{λ∈Mj(γ)} bγ,λ (δλ,µ − aλ,µ)| ≤ CAB, with CAB independent of γ and j. Here δα,β = 1 if α = β and 0 otherwise.

It is easily seen that (P3) and (P4) imply (P5) with CAB estimated by CAB ≤ 1 + CB(1 + CA). However, (P5) might be valid with a smaller value of CAB. The following theorem shows that the optimal order of decay is achieved for the lifted transform provided that CAB is small enough.

Theorem 3.4.2 Assume that (P1), (P2), (P3), (P4), (P5) are fulfilled with the constant CAB < 2^s in (P5) and that f is Cs for some s ≤ m + 1. Then, there exist constants C1 and C2 such that we have the estimates

|gj,γ| ≤ C1 2^{−sj} (3.49)

and

|dj,λ| ≤ C2 2^{−sj}. (3.50)

Proof: We proceed by induction. Assume that both estimates hold at scale j + 1. We first remark that for λ ∈ ∆j,

dj,λ = d̃j,λ + ej,λ, (3.51)

where

d̃j,λ := f(λ) − ∑_{γ∈Nj(λ)} aλ,γ f(γ) (3.52)

is the detail of the interpolatory transform, and where

ej,λ := gj+1,λ − ∑_{γ∈Nj(λ)} aλ,γ gj+1,γ. (3.53)

Using Theorem 3.4.1, we know that

|d̃j,λ| ≤ C0 2^{−sj}. (3.54)

On the other hand, from (3.49) at scale j + 1 and (P3), we obtain that

|ej,λ| ≤ (1 + CA) C1 2^{−s(j+1)}. (3.55)

Therefore, we derive that (3.50) holds at scale j with C2 = C0 + 2^{−s}(1 + CA)C1. It remains to show that (3.49) holds with the same constant C1 at scale j. For this we write

cj,γ = cj+1,γ + ∑_{λ∈Mj(γ)} bγ,λ dj,λ = f(γ) + gj+1,γ + ∑_{λ∈Mj(γ)} bγ,λ dj,λ = f(γ) + A + B,

with

A = ∑_{λ∈Mj(γ)} bγ,λ d̃j,λ, (3.56)

and

B = gj+1,γ + ∑_{λ∈Mj(γ)} bγ,λ ej,λ. (3.57)


Using (P4) and (3.54) we obtain

|A| ≤ CB C0 2^{−sj}. (3.58)

On the other hand, we can write

B = gj+1,γ + ∑_{λ∈Mj(γ)} bγ,λ (gj+1,λ − ∑_{µ∈Nj(λ)} aλ,µ gj+1,µ) = ∑_{µ} (δγ,µ + ∑_{λ∈Mj(γ)} bγ,λ (δλ,µ − aλ,µ)) gj+1,µ,

so that using (P5) and (3.49) at scale j + 1, we obtain

|B| ≤ CAB C1 2^{−s(j+1)}. (3.59)

It follows that

|gj,γ| ≤ (CB C0 + 2^{−s} CAB C1) 2^{−sj}. (3.60)

We therefore want that CB C0 + 2^{−s} CAB C1 ≤ C1. Since we have assumed CAB < 2^s, it suffices to take C1 = CB C0 / (1 − 2^{−s} CAB). □

3.5 Numerical Stability Study

We now illustrate numerically several of the properties of the multiscale transform. First, we present in Figure 3.3 the results of the thinning procedure applied to a grid of 500 points in the plane. The original grid appears in Figure 3.3(a), and the grids at scales J − 1 = 8 through j0 = 1 are shown in Figures 3.3(b) through 3.3(i), respectively. At each scale j, nodes in Γj are marked as • and those in ∆j are marked as ■. Scale j0 is chosen in this and subsequent examples to ensure that the quasi-uniformity property persists throughout all transform scales including and below scale L from Theorem 3.2.1. Thinning past j0 encounters edge effects in the finite grids of these experiments, leaving too few remaining grid points for stable prediction in the sense of (P3).

We next examine briefly the stabilizing effect of the update stage, comparing an order m = 1 predict-only transform with such a transform followed by the closest-point (3.36) and least-squares (LS) (3.37) update schemes discussed in Section 3.3. To do so, we inspect the condition numbers of each linear transform matrix, averaged over 100 instances of a 250-point grid, with grid locations drawn from a random, uniform distribution on the unit square. The results are shown in Table 3.1. Since the condition number gauges the stability of the transform under coefficient-modifying operations such as thresholding, we see that the least-squares update does indeed improve upon the predict-only transform on average. The closest-point update, while attractive from a logistical standpoint in sensor networks, does not perform well at all. Occasional grids ill-suited to this update technique drive the average condition number orders of magnitude higher than the others. And even considering this number's


Figure 3.3: Thinning algorithm example, showing the original grid (a) and the scale-8 through scale-1 grids (b)-(i). Nodes in Γj are marked as • while those in ∆j are marked as ■.

median value of 45.66, we see that the closest-point update technique is not suited to stabilizing the overall transform.

We next turn our attention to the constants CA, CB, and CAB associated with (P3), (P4), and (P5). While the predict stage is designed so that order-m prediction only takes place when a suitable CA is found (hence, (P3) always applies), there are no such guarantees on CAB in the design of the update stage, which instead guarantees a constant average value across scales. Thus, we must verify numerically that CAB ≤ 2^s for some s ≤ m + 1, so that the assumptions in Theorem 3.4.2 are satisfied for the least-squares update scheme. To do so, we study grids of size 100k points with k = 1, 2, · · · , 15. At each grid size, we generate 100 instances of the grid, drawing

33

Page 41: Distributed Multi-Scale Data Processing for Sensor Networksrwagner/docs/wagnerPHDThesis.pdf · Wireless sensor networks provide a challenging application area for signal process-ing

none closest LS25.42 1305.67 18.13

Table 3.1: Condition numbers for the predict only (no update), closest update, and LSupdate transforms, averaged over 100 trials using 250 randomly-generated grid points. Themedian value for the closest update transform is 45.66.

200 400 600 800 1000 1200 14001

1.5

2

2.5

3

number of nodes

Maximum CA, C

B, and C

AB (averaged over 100 trials)

CA

CB

CAB

200 400 600 800 1000 1200 14001.5

2

2.5

3

3.5

4

4.5

number of nodes

Maximum CA, C

B, and C

AB (max over 100 trials)

CA

CB

CAB

(a) (b)

Figure 3.4: Maximum CA, CB, and CAB: (a) average over 100 trials and (b) maximumover 100 trials (using grid sizes from 100 to 1500 nodes).

point locations from a random, uniform distribution on the unit square. We computethe maximum CA, CB, and CAB for each grid instance, and the results are shown inFigure 3.4. In Figure 3.4(a), the average value of the maximum CA, CB, and CAB

over all 100 instances is plotted versus grid size. Figure 3.4(b) similarly depicts themaximum over all 100 instances of the maximum CA, CB, and CAB at each instance.We see that, indeed, the maximum CAB on average never rises above 3, and that itslargest instance typically never rises above the required constant 22 = 4 for orderm = 1 prediction. In fact, for only two grid sizes (600 and 1400 points) does themaximum CAB ever rise above 4, and in both it barely does so, reaching 4.01 for the600-point grid and 4.04 for the 1400-point grid. And in each case, only a single pointof a single instance of the 100 trials for that grid size produces a value above 4. Wetherefore observe that the stability assumptions of Theorem 3.4.2 apply in practice tothe multiscale transform with order m = 1 predict stage followed by a least-squaresupdate stage.

3.6 Transform Protocol, Synchronization, and Robustness

Now that we have shown decay properties of the wavelet coefficients, under both the predict-only and predict-update paradigms, we turn our attention to the mechanics of distributing this transform within a sensor network. To begin with, we assume that the position of each node in the network is known — a reasonable assumption, since a data sink would have no way to make sense of the measured field without associating nodes' measurements with their locations. A great deal of research has been devoted to the problem of node self-localization in sensor networks, and we refer the reader to [43] for an overview of the field. Additionally, we assume that nodes are synchronized in some manner so that they can compute a snapshot of the measured field at a given time. Again, this problem has received much attention in the research community — see, for example, [44].

The predict and update coefficients used in the transform must be agreed upon by both the network and the data sink, so that the sink can collect coefficients generated by the network's distributed transform and centrally compute an inverse transform to recover the measured field. Scale-j prediction at a node λ ∈ ∆j involves regressing a polynomial through the values at neighboring nodes Nj(λ). Thus, the coefficients {aλ,γ}γ∈Nj(λ) depend only on the locations of the nodes in Nj(λ), through the solution to (3.22) with a local polynomial basis. Similarly, each coefficient bλ,γ for an updated node γ ∈ Γj with λ ∈ Mj(γ) relies only on the set of integrals {Ij,η}η∈Nj(λ), through the solution to (3.35). These integrals arise from Dirac functions at the original scale J and are easily computed at all subsequent scales using (3.34).
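For the order m = 1 case, the dependence of the predict weights on neighbor locations alone can be sketched as a plane-fit regression. This is a hedged illustration of the idea: the thesis's exact system (3.22) uses a local polynomial basis and may weight the regression differently.

```python
import numpy as np

def predict_weights(neighbor_xy, target_xy):
    """Order-1 (plane-fit) predict weights: the predicted value at the
    target is sum_i a[i] * f(neighbor_i), with a obtained from least-squares
    regression of the basis {1, x, y} through the neighbor values."""
    V = np.column_stack([np.ones(len(neighbor_xy)),
                         neighbor_xy[:, 0], neighbor_xy[:, 1]])
    v = np.array([1.0, target_xy[0], target_xy[1]])
    # prediction = v @ c with c = pinv(V) @ f, so the weights are pinv(V).T @ v
    return np.linalg.pinv(V).T @ v

# the weights depend only on locations and reproduce any plane exactly
nbrs = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
a = predict_weights(nbrs, np.array([0.5, 0.5]))
f = 2.0 + 3.0 * nbrs[:, 0] - 1.0 * nbrs[:, 1]   # f(x, y) = 2 + 3x - y
print(a @ f)  # ≈ f(0.5, 0.5) = 3.0
```

Because the weights reproduce order-1 polynomials, a smooth field yields small wavelet coefficients d = f(λ) − Σ a·f(γ).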

Thus, knowing only node locations and the fact that IJ,γ = 1 for each γ ∈ ΓJ, we can easily compute the set of predict and update coefficients for each node at each scale. Though this could be done within the network, with each predicted node gathering location and integral information from its updated neighbors and solving (3.22) and (3.35), we favor a solution where the sink performs the necessary computations. In the former case, the network determines the coefficients with high collaborative overhead and then transmits its decisions to the sink. In the latter, the sink computes coefficients using already-known position information for each node at no overhead cost and then informs the network. Clearly, the latter procedure is far more efficient.

To compute the transform in the network, each node must know: (i) the scale at which it is predicted, (ii) the neighbors it uses for its prediction at that scale, and (iii) the neighbors it helps predict at all finer scales. The transform begins at scale J − 1 with sensors in ΓJ−1 sending their scale-J scaling coefficients (raw measurements) to predicted neighbors in ∆J−1. Once a node in ∆J−1 has heard from each neighbor used in its prediction, it can compute a scale-(J − 1) wavelet value and send that value back to each updated neighbor. When a node in ΓJ−1 receives wavelet values from each predicted neighbor, it can compute its scale-(J − 1) scaling value. An updated node can then participate in scale-(J − 2) of the transform, contacting predicted neighbors or waiting to hear from neighbors used in its own prediction, as specified by the scale-(J − 2) split of sensors in ΓJ−1. Since a node does not attempt to compute a scale-j coefficient until all relevant neighbor data is received, there is no danger of synchronization errors due to slower nodes. Figure 3.5 illustrates the communications

Figure 3.5: Communication flow at scale j: (a) first, each predicted node n1, n9 ∈ ∆j (marked as •) receives a scale-(j + 1) scaling coefficient from each neighbor in its predict neighborhood N(n1), N(n9) ⊂ Γj; (b) then n1 and n9 each transmit their scale-j wavelet coefficients to each updated neighbor in N(n1), N(n9).

traffic among a set of predicted and updated nodes at a scale j of the transform.

On occasion, a pair of nodes may not be able to find any multihop path to share

their coefficients, and so the transform must locally adapt to guarantee predict stability. If a predicted node n at scale j is unable to hear from a neighbor m ∈ N(n), it must recompute its predict weights with the neighborhood N(n) \ m, which may result in ill-conditioning of the matrix G. In such a case, it can begin looking for new predict neighbors, repeating the process from Section 3.3 until the new neighborhood N(n) yields a well-conditioned G. Provided the grid missing m is, from the standpoint of n, quasi-uniform, the spatial scope of this search is restricted per (P2). Given this new neighborhood, node n can re-compute its set of predict weights. If the network is using the somewhat less stable predict-only transform, then no further in-network repair is required, and the repairing node must only inform the data sink of its new neighbor set and transform weights. If the transform uses both predict and update stages for greater stability, however, the repairing node must compute the new update weights required by its neighbors in N(n). Note that, to maintain proper integral bookkeeping, nodes in N(n) must re-compute their scale-j integrals to account for n's new set of predict coefficients, and the disconnected node m must remove n's contribution to its scale-j integral once it realizes n has not responded with a scale-j wavelet coefficient. This implies that m must keep track of n's contribution to m's integral at scale j. Additionally, this will trigger re-computation of all update filters at coarser scales that descend from the scaling functions of m and N(n) at scale (j − 1). Clearly, this in-network link repair technique is suited only to networks with occasional connectivity losses, as the repair overhead becomes prohibitive as the frequency of failing network links increases. To control the degree of repair needed, the terminal scale of the transform can be set at some finer scale j > j0. Finer terminal scales lead to less sparsity in the wavelet coefficients but limit the spatial scope of repair traffic.
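The wait-for-all-neighbors rule that keeps this protocol free of synchronization errors can be sketched as simple event-driven state at a predicted node. The class and method names are hypothetical; a real implementation would also handle the repair path just described.

```python
class PredictedNode:
    """Scale-j state at a predicted node: the wavelet coefficient is
    computed only after every predict neighbor's scaling coefficient
    has arrived, so slower nodes cannot cause synchronization errors."""

    def __init__(self, measurement, predict_weights):
        self.measurement = measurement          # scale-(j+1) scaling value
        self.weights = dict(predict_weights)    # {neighbor_id: a_weight}
        self.received = {}

    def on_scaling_coeff(self, neighbor_id, value):
        """Handle one incoming scaling coefficient; return the wavelet
        coefficient once complete, or None while still waiting."""
        self.received[neighbor_id] = value
        if set(self.received) != set(self.weights):
            return None                         # still waiting on neighbors
        prediction = sum(self.weights[n] * v for n, v in self.received.items())
        return self.measurement - prediction    # d = measurement - prediction

# usage: three predict neighbors with equal weights
node = PredictedNode(measurement=10.0, predict_weights={1: 1/3, 2: 1/3, 3: 1/3})
r1 = node.on_scaling_coeff(1, 9.0)    # None: still waiting
r2 = node.on_scaling_coeff(2, 9.0)    # None: still waiting
d = node.on_scaling_coeff(3, 12.0)    # prediction = 10.0, so d = 0.0
```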

Thus, there is a tradeoff between using the less-stable but easier-to-repair predict-only transform and the more-stable update transform, whose repair can incur significant cost. We examine the relative performance of these two transform classes under the application of distributed compression in Section 4.2.

3.7 Spatio-Temporal Wavelet Analysis

The transform as designed thus far in this chapter has been directed at spatial fields captured by the nodes — that is, at measurements arising from a snapshot of the observed phenomenon at a single instant in time over a 2-D domain. It is quite likely that, even with onboard memory constraints, nodes will also be capable of storing limited time series of measurements, forming instead a spatio-temporal measurement field. Measurements from such a 3-D sampling domain will exhibit spatial correlations among measurements at the same time index but also temporal correlations among the set of data gathered at each node. Thus, by developing a fully 3-D transform, we can perhaps more effectively exploit these correlations to increase sparsity in the wavelet-domain representation. Fortunately, such an extension is straightforward. In doing so, we assume as in the spatial case that nodes coordinate to align the first sample of their measurement series in time, and we also assume that nodes sample in time at uniform, regular intervals.

Recall that, in Section 2.2, the regular-grid 2-D WT is composed through separable application of regular-grid 1-D transforms, first in one dimension and then in the other. We can easily form a spatio-temporal 3-D transform for the sensor network setting using a similar process. First, each node computes a 1-D, regular-grid WT on its time series. This step is carried out in isolation at each node, as it involves access only to a given node's time series. Then, the 2-D transform is repeatedly applied across the network for each index of the 1-D wavelet coefficient series, using the same-index element at each node as an input to the spatial transform. Such a process is analogous to repeated application of the 2-D transform on each element of the time series but instead operates on coefficients from the higher-sparsity 1-D wavelet series at each node, increasing the sparsity of the final wavelet coefficients.
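The separable ordering can be sketched as follows, using a one-level Haar transform as a stand-in for whichever 1-D WT the nodes run; the spatial stage is left as a stub, since it is the distributed transform of Chapter 3.

```python
import numpy as np

def haar_1d(x):
    """One level of the orthonormal 1-D Haar transform."""
    x = np.asarray(x, dtype=float)
    s = (x[0::2] + x[1::2]) / np.sqrt(2.0)   # scaling (average) half
    d = (x[0::2] - x[1::2]) / np.sqrt(2.0)   # wavelet (difference) half
    return np.concatenate([s, d])

rng = np.random.default_rng(2)
series = rng.standard_normal((6, 8))     # rows: nodes, cols: time samples

# step 1: each node transforms its own time series in isolation
temporal = np.array([haar_1d(row) for row in series])

# step 2: the distributed 2-D spatial transform is then run once per
# temporal-coefficient index, on the column temporal[:, k] across nodes
spatial_inputs = [temporal[:, k] for k in range(temporal.shape[1])]
```

Because the temporal stage is orthonormal here, it preserves signal energy while concentrating it into fewer coefficients, which is what the spatial stage then exploits.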

As described thus far, nodes share full-bitrate wavelet coefficient vectors in the 2-D portion of the 3-D encoding process. Recall that, for smooth or piecewise-smooth time series, these coefficient vectors are likely to exhibit a good deal of coefficient sparsity resulting from application of the 1-D WT at each node. We can leverage this sparsity to reduce the description size of the 1-D wavelet coefficient series shared between nodes in the 2-D encoding phase. Specifically, we can apply lossy coding techniques such as the zerotree coder of Shapiro [23] to reduce the number of bits required to describe each 1-D wavelet coefficient series at only a slight loss in description fidelity.
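The zerotree coder itself is beyond a short sketch, but the effect relied on here — spending bits only on the few significant 1-D coefficients and letting the rest decode to zero — can be illustrated with crude threshold-based coding (a stand-in, not Shapiro's algorithm):

```python
import numpy as np

def encode_sparse(coeffs, thresh):
    """Keep only coefficients at or above thresh in magnitude; everything
    else is implicitly zero at the decoder."""
    idx = np.flatnonzero(np.abs(coeffs) >= thresh)
    return idx, coeffs[idx]

def decode_sparse(idx, vals, n):
    out = np.zeros(n)
    out[idx] = vals
    return out

c = np.array([5.0, 0.01, -3.2, 0.0, 0.02, 1.5])
idx, vals = encode_sparse(c, 1.0)        # only 3 of 6 values are sent
c_hat = decode_sparse(idx, vals, len(c))
err = np.linalg.norm(c - c_hat)          # slight loss in description fidelity
```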

As the numerical study in Section 4.5 will subsequently show, the 3-D transform is robust to the slight intermediate errors induced by compression of the 1-D wavelet coefficient series. We now proceed to illustrate this fact, and the general applicability of the 2-D and 3-D transforms to the task of distributed compression of node measurements, through a variety of numerical simulations in the following chapter.


Chapter 4

Distributed Wavelet Compression

Wavelet analysis de-correlates data, yielding a wavelet representation that is much sparser than the original, given a well-designed transform. Said otherwise, signal energy is concentrated in relatively few wavelet coefficients, so that there are fewer important values in the wavelet-domain representation than there are in the original data set, where all measurements must be treated a priori as equally important.

Thus, in-network wavelet analysis naturally lends itself to distributed compression of measurement fields in the sensor network setting. Following the distributed wavelet transform (WT), only a few sensors have wavelet coefficients with significant magnitudes, so by harvesting these coefficients at the data sink, setting non-harvested coefficients to zero, and computing an inverse WT, we can approximate the measured field values at a fraction of the cost of collecting all measurements at the sink. For smooth and piecewise-smooth fields, the majority of wavelet coefficients will be small in magnitude, so the savings over the bulk-collection strategy can be significant. This especially holds true in the immediate neighborhood of the sink, where bandwidth and power requirements for relaying nodes would otherwise be quite substantial.

Applying the WT to distributed compression affords us the opportunity to examine in greater detail the numerical behavior of the transform introduced and analyzed in the previous chapter. Specifically, while we prove that wavelet coefficients from the irregular-grid WT inherit the decay properties of those in the regular-grid setting, we have not yet demonstrated the transform's stability under applications that modify a subset of coefficients in the wavelet domain. Parseval's relation applied to orthogonal WTs gives us that the N largest-magnitude wavelet coefficients give the best N-term nonlinear approximation of the original measurements in terms of minimizing reconstruction error. For the biorthogonal class of WT developed here, the basis functions do not form a complete, orthonormal basis, nor are they guaranteed to form a tight frame. Thus, no variation on Parseval's relation is guaranteed to hold, and we are no longer guaranteed that small-magnitude wavelet coefficients contribute proportionally small energy to the spatial- (or spatio-temporal-) domain reconstruction [21]. Therefore, we must simulate the compression properties of the proposed transform to understand how it behaves under magnitude-based approximation.

Additionally, we have not yet addressed the cost of computing the in-network WT. Recall that the utility of distributed data processing as described in Section 1.2 depends on successfully trading local collaboration and computation for wholesale, long-distance transmission of data to the sink. The transform proposed in Chapter 3 requires a good deal of inter-node collaboration at spatial scopes that become larger as the number of transform scales increases. Thus, we must still verify that the collaborative cost of in-network wavelet analysis does not exceed the energy cost of the naive method of sending all nodes' measurements directly to the data sink. Distributed compression provides an excellent framework in which to conduct this analysis.

In this chapter,1 we examine the compression properties of the WT through extensive numerical simulations. We first introduce a basic distributed compression protocol to follow the distributed WT protocol of Section 3.6. We then begin our simulations with a study of spatial measurement field compression in Section 4.2. In Section 4.3 we map the network traffic associated with computing the spatial transform and harvesting wavelet coefficients to energy costs by means of network traffic simulations, demonstrating the energy savings that are possible with wavelet-based compression in sensor networks. We discuss extensions to the basic compression protocol in Section 4.4. Finally, we extend our investigation to compression of spatio-temporal fields in Section 4.5.

4.1 Basic Compression Protocol

The goal of distributed compression is collection of the n wavelet coefficients that encode the best n-term approximation to the measured field, as n ranges from some small number to the total number of network measurements. Assuming our WT is well designed (which we will verify subsequently), this amounts to collecting the n largest wavelet coefficients. In the centralized setting of traditional image compression, the user has access to all wavelet coefficients and can easily find the n largest through a simple magnitude sort, typically increasing n until some desired accuracy in the reconstruction is met. In the sensor network setting the goal is to avoid collecting all of these coefficients in one place, so we must turn instead to an alternative solution.

The sink adopts a thresholding approach to harvesting significant wavelet coefficients. It initially broadcasts to the network a starting threshold above which coefficient magnitudes are deemed significant, and all nodes with significant coefficients respond with their wavelet coefficient value (spatial) or values (spatio-temporal) and node identification number. Note that each node with a scaling coefficient at the final scale must always transmit its scale-j0 scaling coefficient to the sink. The sink then lowers and re-broadcasts the threshold, and all newly significant nodes (i.e., those that have not replied in an earlier round) respond with their data. The process continues until the difference between the signal energy of the previous approximation and the new one falls below some desired tolerance, indicating that the majority of the signal energy has been harvested.
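The sink-side loop can be sketched as follows. The starting threshold, shrink factor, and stopping tolerance are illustrative parameters; in the real protocol each round also costs broadcast and reply traffic.

```python
import numpy as np

def harvest(coeffs, start_thresh, energy_tol, shrink=0.5):
    """Iteratively lower the significance threshold, collecting newly
    significant coefficients, until a round adds less than energy_tol
    of signal energy."""
    harvested = np.zeros_like(coeffs)
    replied = np.zeros(len(coeffs), dtype=bool)
    thresh, prev_energy = start_thresh, 0.0
    while True:
        newly = ~replied & (np.abs(coeffs) >= thresh)
        harvested[newly] = coeffs[newly]        # newly significant nodes reply
        replied |= newly
        energy = float(np.sum(harvested ** 2))
        if energy - prev_energy < energy_tol and energy > 0.0:
            return harvested, thresh
        prev_energy, thresh = energy, thresh * shrink

coeffs = np.array([8.0, 0.1, -4.0, 0.05, 2.0, -0.2])
approx, final_t = harvest(coeffs, start_thresh=8.0, energy_tol=0.5)
```

On this toy input the loop stops after collecting the three large coefficients, leaving the small ones (and their transmission cost) behind.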

1This chapter represents joint work with Richard Baraniuk, Hyeokho Choi, Albert Cohen, Veronique Delouille, Shu Du, David B. Johnson, and Shriram Sarvotham [30–33].


4.2 Spatial Compression Performance Study

With this threshold-querying model in mind, we can design experiments to verify the stability of the proposed WT under successive, threshold-based approximation. We examine the transform's performance using both globally smooth and piecewise-smooth measurement fields. The globally smooth case reflects the smoothness assumptions underlying the proof of wavelet coefficient decay properties in Section 3.4 and thus provides an important benchmark for analysis. From experience in the regular-grid setting, we expect that the significant wavelet coefficients of such signals will tend to occur at coarse spatial scales, and the fine-scale wavelet coefficients will typically be small in magnitude. This is not the case, however, for piecewise-smooth measurement fields exhibiting edge features such as jump discontinuities. For these fields, experience suggests that significant wavelet coefficients will be greater in number and tend to cluster around the edges, persisting from coarse to fine spatial scales. Verifying that the transform continues to perform well when given these less ideal fields is crucial to assessing the transform's adaptability to a variety of signal classes.

To make the notion of good performance more specific, we desire that the approximation in the smooth case have high fidelity with just a few wavelet coefficients and that it decay quickly to zero error as more coefficients are added — in other words, we want very few of the wavelet coefficients to contain the majority of the signal energy. In the piecewise-smooth case, a greater number of wavelet coefficients are likely to be significant, and we expect the decay to be more gradual than in the globally smooth case. Nevertheless, we still desire that the approximation decay fairly quickly to zero error while incorporating only a fraction of the largest wavelet coefficients.

We examine the spatial transform in this section, where nodes populate a 2-D domain and capture a single snapshot of the measured field. To evaluate how this transform responds under such threshold-based querying, we conduct largest-n-term approximation experiments using 100 randomly generated instances of irregularly-spaced grids of N = 250 points. We choose local plane-fitting as the prediction model, and commensurate with order m = 1 polynomial approximation we choose C^s test functions, where s ≤ m + 1. Specifically, we randomly generate and sample C^2 functions at the grid locations, realizing each function as an order-k polynomial, with k chosen uniformly at random from [m + 1, ..., 10] and polynomial coefficients drawn from a random, uniform distribution on the unit interval. Figure 4.1 depicts sample instances of the smooth and piecewise-smooth fields that are then sampled on irregular grids to yield the measurements input to the transform.
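The field-generation step can be sketched directly. The restriction to total order k is our reading of "order-k polynomial"; the thesis does not spell out the exact term structure.

```python
import numpy as np

rng = np.random.default_rng(3)

def random_poly_field(m=1, max_order=10):
    """Random smooth test field: a 2-D polynomial of total order k, with k
    uniform on {m+1, ..., max_order} and coefficients uniform on [0, 1]."""
    k = int(rng.integers(m + 1, max_order + 1))
    terms = [(i, j, rng.uniform()) for i in range(k + 1)
             for j in range(k + 1) if i + j <= k]
    return lambda x, y: sum(c * x**i * y**j for i, j, c in terms)

grid = rng.uniform(size=(250, 2))            # irregular 250-point grid
f = random_poly_field()
samples = f(grid[:, 0], grid[:, 1])          # measurements fed to the WT
```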

We examine the transform using both the predict-only formulation and the predict, least-squares update formulation. Recall from Section 3.5 that we expect the predict/update transform to have greater stability than the predict-only transform and thus more desirable compression performance. However, the cost to repair the predict/update transform in response to network communication failures is substantially greater, so the following experiments help to better characterize this tradeoff.

Reconstruction quality is measured as the average of the squares of the differences


Figure 4.1: Example measurement fields: (a) globally smooth; (b) piecewise smooth across a jump discontinuity.

between approximated and actual sensor values. To match this mean-squared error (MSE) metric, we implement ℓ2 thresholding of wavelet coefficient magnitudes: a scale-j coefficient d_{j,n} is compared against the uniform threshold as 2^{−j}|d_{j,n}|. This matches the ℓ2 normalization applied to regular-grid WTs as discussed in [45]. Starting with the final set of scaling coefficients and no wavelet coefficients, we approximate the field using successively more of the largest wavelet coefficients. The mean-squared error between the approximated and original fields is computed in the spatial domain, averaged over all 100 trials, and plotted in Figure 4.2(a), where the dotted line corresponds to the predict-only transform and the solid line to the least-squares update transform. The transform data perform well under successive approximation, with a smooth decay in the error as more terms are added. Also, as expected, the stabilized least-squares update transform provides a better approximation with fewer coefficients.
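The normalization and error metric reduce to a few lines (a sketch of the bookkeeping only; the inverse transform itself is the sink-side computation described earlier):

```python
import numpy as np

def l2_ranking(coeffs, scales):
    """Indices of wavelet coefficients sorted by the l2-normalized
    magnitude 2^{-j} |d_{j,n}| used for thresholding, largest first."""
    weight = 2.0 ** (-np.asarray(scales, dtype=float)) * np.abs(coeffs)
    return np.argsort(-weight)

def mse(approx, actual):
    """Average of squared differences between approximated and actual values."""
    approx, actual = np.asarray(approx), np.asarray(actual)
    return float(np.mean((approx - actual) ** 2))

# a coarse-scale (j = 1) coefficient outranks a larger fine-scale (j = 4) one
order = l2_ranking(coeffs=np.array([0.5, 2.0]), scales=np.array([1, 4]))
```

The scale weighting is what lets a raw-magnitude comparison at the sink approximate an energy-based ordering across scales.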

To examine the ability of the transform to adapt to data that are only piecewise smooth, we repeat the experiment above, but we add a discontinuity to the randomly-generated C^2 fields along the line x = y. The results are depicted in Figure 4.2(b). While the decay for each technique is not as rapid as in the case of the globally-C^2 field, we see that, as before, the least-squares update transform continues to represent the field more efficiently than the predict-only transform. Thus, in situations where the prospect of frequent in-network repair is likely, we can safely resort to the predict-only transform, which shows stable decay under successive approximation. Additionally, the transform cost is halved under this model, since no update traffic is required. When the network connectivity is not as volatile and the greater transform cost can be paid, however, the better performance of the predict/update transform justifies its use in favor of the simpler predict-only transform. We use the

Figure 4.2: Reconstruction error (average MSE versus number of coefficients), averaged over 100 trials of randomly generated (a) smooth and (b) piecewise-smooth fields using randomly generated 250-point grids. The dotted line traces error for the predict-only (no update) transform and the solid line traces error for the predict transform followed by an LS update stage.

Figure 4.3: ns-2 simulation of the energy cost (in Joules, versus number of sensors) of computing the WT (solid) and dumping all raw measurements to the data sink (dashed). Energy is computed in the bottleneck metric for (a) 10 and (b) 20 radio neighbors per sensor and in the network-average metric for (c) 10 and (d) 20 radio neighbors per sensor.

predict/update formulation in the remainder of the numerical experiments in thisthesis.

4.3 Transform Communication Cost

Computing the WT in a distributed fashion requires a nontrivial amount of communication between nodes in the sensor network. And while this communication is considered local within scales, the spatial area covered by local neighborhoods increases as scale coarsens. Thus, the communication overhead for obtaining the transform data may become substantial and, in some cases, dominant. While this may be the case for certain network configurations, we now use the example of distributed compression to demonstrate that there exist break-even points in both network size

44

Page 52: Distributed Multi-Scale Data Processing for Sensor Networksrwagner/docs/wagnerPHDThesis.pdf · Wireless sensor networks provide a challenging application area for signal process-ing

and communication density beyond which the cost of multiscale analysis is acceptable. Wavelet compression allows the network to trade reconstruction quality for communication energy and bandwidth usage, so we must show that there exists a point where energy savings are no longer offset by the overhead cost of computing the wavelet coefficients.

We note briefly that results in this section arise from [32], which proposes a distributed WT on which the transform described in Chapter 3 is closely based. Wavelet prediction according to [32] tends to use slightly more neighbors on average than the Chapter 3 transform, so results depicted in this section arise from a more costly approach and serve to upper-bound those expected from the Chapter 3 transform.

4.3.1 Break-Even Analysis of Distributed Wavelet Processing

For a given network configuration (say, fixed communication density), there will be a network size below which wavelet-based compression is less efficient than straightforward forwarding of the set of raw measurements to the data sink. In other words, the overhead cost of the WT, prior to sending any significant wavelet coefficients to the sink, will dominate the cost to offload all measurements. The same logic applies to a fixed network size with varying size of node radio neighborhoods.

For a relatively small number of sensors with large transmission ranges, each sensor has a large fraction of the network as its radio neighbors. The collaborative overhead of the WT in this case is wasted — a single direct communication with the sink is likely to be possible and is less expensive than communication with a set of neighbors. For a network of many sensors with few radio neighbors, however, hop count begins to approach geographic distance, and the localized nature of WT communication will require paths on average much shorter than a node's expected distance to the sink. This allows the WT overhead to easily win out, with plenty of energy left over for streaming significant coefficients to the sink.

To explore this tradeoff, we simulate the energy cost of WT network traffic and the traffic to dump all measurements to the data sink. The average number of radio neighbors for a network is fixed, and networks of varying size are generated with uniform random node placement. Two energy metrics are used. The first is the average energy consumed at the one-hop neighbors of the sink, which we refer to as bottleneck energy. This is a critical metric, since it gives a direct measure of network lifetime. When the one-hop neighbors of the sink deplete their power supplies, the sink will be cut off from the network, rendering it useless. The second metric is the average energy consumed by all nodes in the network.
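Given per-node energy totals from a simulation run, the two metrics reduce to simple means (a hypothetical helper; `one_hop_ids` stands for the indices of the sink's radio neighbors):

```python
import numpy as np

def energy_metrics(per_node_energy, one_hop_ids):
    """Bottleneck metric (mean energy over the sink's one-hop neighbors)
    and network-average metric (mean over all nodes)."""
    e = np.asarray(per_node_energy, dtype=float)
    bottleneck = e[np.asarray(one_hop_ids)].mean()
    return bottleneck, e.mean()

# nodes 0-2 are one-hop neighbors of the sink and carry the relayed traffic
b, avg = energy_metrics([0.30, 0.28, 0.32, 0.05, 0.04, 0.01], [0, 1, 2])
```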

Energy expenditure is simulated using version 2.1b8a of ns-2 with the Monarch wireless and mobile extensions [46]. These extensions model an IEEE 802.11-based network with a wireless physical rate of 2 Mbps, a nominal wireless transmission range of 250 m, and a carrier sensing range of 550 m. The RTS/CTS exchanges of 802.11 are turned off during our simulations, so that all unicast traffic uses DATA/ACK frames in the MAC layer. For the node energy model, we use the default energy model provided by ns-2, which defines a transmission power of 0.6 W and a receiving power of 0.3 W; the network area is scaled so that node radios achieve a desired coverage (average number of radio neighbors per node). All of the nodes are initialized with enough power so that their supplies are not depleted during the simulations. We assume an omniscient routing protocol has already been deployed to provide the shortest path between any two nodes before the WT process starts, so that there is no additional routing traffic during the process of wavelet transformation or querying and harvesting. In all cases, the sink is located at the center of the measurement field, and packets are 24 bytes in length, allowing each to carry a single coefficient as an 8-byte double-precision floating-point number with 16 bytes left over for potential header information. Results for each network size are averaged over 5 instantiations using randomly, uniformly distributed node locations.
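Under this radio model the per-packet energy is easy to tabulate by hand — a back-of-envelope sketch that ignores MAC/ACK overhead and idle listening:

```python
PACKET_BITS = 24 * 8                  # 24-byte packets
RATE_BPS = 2e6                        # 2 Mbps physical rate
T_AIR = PACKET_BITS / RATE_BPS        # 96 microseconds on the air

E_TX = 0.6 * T_AIR                    # energy per transmission (J)
E_RX = 0.3 * T_AIR                    # energy per reception (J)

def relay_energy(n_hops):
    """Energy to move one coefficient packet over n_hops hops, counting
    one transmit and one receive per hop."""
    return n_hops * (E_TX + E_RX)
```

At these rates a single packet costs tens of microjoules per hop, so the totals in Figure 4.3 are dominated by how many packets each strategy pushes through the network, and through the sink's neighborhood in particular.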

Figures 4.3(a) and 4.3(b) show results for the bottleneck energy metric, plotting energy consumed versus the number of sensors in the network. Figure 4.3(a) sets the network to have an average of 10 radio neighbors per sensor, while Figure 4.3(b) permits an average of 20 per sensor. The crossover point is seen for a relatively small number of sensors in each case — around 200 for the 10-neighbor case and around 300 for the 20-neighbor case. This result is not surprising. The bottleneck cost of the network dump scales with the number of sensors, so we expect to see a roughly linear increase in cost. For the WT, each node requires on average 3.5 neighbors to predict its wavelet coefficient. As the network size increases, this cost remains fixed for one-hop neighbors of the sink, and the remainder of their energy expenditure depends on (i) the number of sensors they help predict, and (ii) the amount of network traffic routed through them. The net effect of these two costs will tend to increase slowly, but it is dependent on random node placement, and the trend is not as visible with the bottleneck energy metric's average over a small number of nodes.

Figures 4.3(c) and 4.3(d) track energy expenditures averaged among all nodes in the network, again for 10 and 20 radio neighbors per sensor, respectively. Here the cost of the WT is averaged among a larger number of sensors, and we can see more clearly its gradual growth with increasing network size. Not surprisingly, a larger number of nodes is required to achieve a crossover in the WT energy cost, since average energy does not concentrate on the sink the way bottleneck energy does. For the 10-neighbor case, a crossover occurs around 1300 nodes, and for the 20-neighbor case the WT curve trends toward a break-even point near 1500 nodes.

An alternate look at global energy expenditure provides a bit of additional intuition into the benefits of the WT. Consider Figure 4.4, which presents histograms of node energy consumption for the 1000-node, 10-neighbor network. Figure 4.4(a) gives the energy distribution for the network-wide dump, while Figure 4.4(b) shows the distribution of WT overhead energy. Clearly, though the mean of the WT expenditure is a bit higher, its variance is much lower than that of the dump, indicating that the WT spreads the energy cost more uniformly across the network, avoiding concentrating it in regions such as the bottleneck. Thus, even for networks where the


Figure 4.4: Histograms of relative energy usage for (a) a dump of all measurements to the sink and (b) the WT calculation overhead for a network of 1000 nodes with an average of 10 radio neighbors per node. Note the much smaller variance of WT energy consumption.

Figure 4.5: Distortion/energy analysis example measurement fields: (a) a noisy, discontinuous quadratic field, and (b) random Gaussian bumps populating a smoothly-varying quadratic field.

global expenditure is greater, the WT can help spread the cost of data gathering over all nodes in the network, prolonging the lifetime of the bottleneck at the expense of higher power output at non-critical sensors.

4.3.2 Distortion/Energy Analysis of Distributed Compression

Given that the cost of the WT is non-prohibitive for networks of a certain size, we can evaluate its utility for the distributed compression application. Wavelet compression enables trading reconstruction error for energy spent transporting wavelet coefficients to the sink, tracing a decreasing distortion curve along an energy axis (E/D curve), an extension of the distortion versus coefficient count experiments of Section 4.2. The first point on the curve begins at the WT's energy overhead, where no wavelet coefficients (but all coarsest-scale scaling coefficients) are sent to the sink, yielding maximum distortion. The curve effectively ends at the energy cost of transporting all raw measurements to the sink. With good energy compaction properties, the reconstruction should have nearly zero error using much less energy than a network-wide measurement dump.

For all the query traffic, we use the threshold broadcast scheme described in Section 4.1. The sink node broadcasts the query to its one-hop neighbors, and a node forwards the query to its neighbors upon first receipt. Reply traffic follows a unicast model, where the source node and all the intermediate forwarding nodes deliver packets along the shortest path provided by the omnipotent routing protocol.
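The flood-and-unicast pattern just described can be sketched as a breadth-first traversal: every node re-broadcasts the query exactly once, on first receipt, and the hop counts it produces are the shortest-path distances that reply unicasts would follow. The adjacency-map representation and function name below are illustrative, not from the thesis.

```python
from collections import deque

def flood(adjacency, sink):
    """Simulate the threshold-query flood: the sink broadcasts to its
    one-hop neighbors, and every node re-broadcasts the query exactly
    once, on first receipt. Returns the total number of broadcasts and
    each node's hop distance from the sink (the path length a unicast
    reply would traverse). `adjacency` maps node id -> radio neighbors."""
    hops = {sink: 0}
    queue = deque([sink])
    broadcasts = 0
    while queue:
        node = queue.popleft()
        broadcasts += 1                      # each node transmits once
        for nbr in adjacency[node]:
            if nbr not in hops:              # forward on first receipt only
                hops[nbr] = hops[node] + 1
                queue.append(nbr)
    return broadcasts, hops
```

Note that the flood costs one broadcast per reachable node regardless of topology, which is why reducing the *number* of floods (as in Section 4.4.1) is the main lever for savings.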

We consider 1000-node samples of a different pair of sample functions than considered in Section 4.2. These fields are illustrated in Figure 4.5 and further demonstrate the applicability of the proposed WT to a wide range of field classes. Figure 4.5(a) shows a noisy, inverted quadratic bowl with a discontinuity along the line x = y. The noise further enhances the difficulty in compressing the field measurements already posed by the discontinuity.2 Figure 4.5(b) gives a set of randomly located Gaussian bumps of random height populating a slowly-varying quadratic field. Both fields exhibit super-planar features well beyond the first vanishing moment of the WT. Mean squared error is again used to measure the quality of the reconstruction.

Distortion versus bottleneck energy is plotted for the discontinuous quadratic field in Figures 4.6(a) and (b) for 10 and 20 radio neighbors on average per node, respectively. In both cases, the dashed vertical line marks the energy consumed by dumping all node measurements to the sink. The distortion drops substantially in both instances when only a small fraction of coefficients are sent to the sink, and the network dump energy lies well within the effectively zero-distortion regime of the curves. Small reconstruction error is realizable using upwards of 30% of the dump energy in the 10-neighbor case and 50% of the dump energy in the 20-neighbor case. Results are even more impressive for the smooth Gaussian bump field, shown in Figures 4.6(c) and (d) for 10 and 20 average radio neighbors. To achieve nearly zero reconstruction error using wavelet compression, we require only 30% of the dump energy in the 10-neighbor case and 50% of the dump energy for the 20-neighbor case. Note that these results apply to the more costly predict/update transform. For the predict-only transform, the overhead cost will be half of that depicted in Figure 4.6, though the decay of the approximation curves will be a bit more gradual, as in Figure 4.2, due to loss of the stabilizing effect of the update stage.

2 In this chapter, we do not attempt to remove the effects of the noise. This is left for Chapter 5.


Figure 4.6: Distortion versus energy curves for 1000-node samplings of the discontinuous quadratic field with (a) 10 radio neighbors per node and (b) 20 neighbors per node on average, and the Gaussian bump field with (c) 10 and (d) 20 average radio neighbors. In all cases, the vertical dotted line marks the energy required to send all 1000 original measurements to the sink.

4.4 Modifications of Basic Compression Protocol

We now briefly discuss potential modifications to the basic threshold-query compression protocol presented in Section 4.1.

4.4.1 Multiple-Threshold Queries

As mentioned in Section 4.1, achieving a desired reconstruction fidelity involves choosing a proper threshold for judging coefficients' significance. Picking the threshold to achieve a desired MSE from the outset requires not only access to all coefficient values but also truth data against which to measure the error, neither of which is available at the sink. Thus, in practice we must pick a reasonable starting threshold, query the network for significant coefficients, and then repeat the process with successively lower thresholds until the signal energy difference between successive approximations lies below some target value. Intelligent choices for threshold values must be gleaned from network history, since the user no longer has the luxury of accessing all coefficients at no cost and sorting them by magnitude to determine the best n-term approximation. Thus, compression in the sensor network setting is fundamentally different from standard image compression, since the user must pay a price to harvest data. It therefore behooves us to reduce this cost as much as possible.

We must flood the network with at minimum two threshold queries to evaluate the stop criterion, but in general we may need to issue multiple requests. Rather than issuing multiple network-wide query floods, we propose a more efficient approach. The first query flood contains a set of decreasing threshold values to sweep along the E/D curve from high- to low-distortion regimes. Nodes then respond to the sink in a time-delayed fashion. Those above the first threshold respond immediately. Those between the first and second thresholds, in the second threshold "band," respond after some delay. Those in the third band reply after a greater delay, and so on. As coefficients stream in, the sink periodically measures the signal energy differences between successive approximations and issues a "stop" flood when some target has been reached. The savings from issuing two rather than arbitrarily many queries can be substantial, provided that the threshold bands can be contained in a single flood packet. Such a feat can be accomplished either by describing thresholds at less than full double floating-point precision or by storing a finite number of candidate thresholds at each node and referring to these with integer indices in query packets.
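A minimal sketch of the banded-reply scheduling: each node locates its coefficient in the decreasing threshold list carried by the single query flood and delays its reply in proportion to its band index. The per-band delay `slot` is a hypothetical protocol parameter, not one specified in the text.

```python
def reply_schedule(coeff, thresholds, slot):
    """Given the decreasing threshold list carried by one banded query,
    return (band, delay): the index of the first threshold the
    coefficient magnitude meets and the reply delay band * slot
    seconds, or None if the coefficient is below every threshold.
    `slot`, the assumed per-band delay, is a protocol parameter."""
    mag = abs(coeff)
    for band, t in enumerate(thresholds):
        if mag >= t:
            return band, band * slot
    return None     # insignificant at every queried threshold

# Thresholds swept high -> low; e.g. a coefficient of magnitude 0.7
# against bands [1.0, 0.5, 0.25] falls in the second band:
# reply_schedule(0.7, [1.0, 0.5, 0.25], 2.0) -> (1, 2.0)
```

The sink can thus interpret arrival time as approximate coefficient magnitude and cut the sweep short with a single "stop" flood once successive approximations converge.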

To illustrate the potential savings, we compare the cost of issuing multiple queries with that of the banded-query approach using the network simulation methodology of Section 4.3. A series of 10 approximation rounds is conducted on coefficients from the 1000-node discontinuous quadratic field. The bar plot of Figure 4.7(a) depicts the energy expended per round for repeated querying, and Figure 4.7(b) gives the energy for issuing a single banded query followed by a stop flood. Each technique issues a query and expends the same energy in the first round, but the banded-query technique consumes approximately half the energy in subsequent rounds by not issuing a query. Even the final "stop" query it issues after the 10th round does not substantially raise its total energy cost, which is about 60% of the repeated querying approach's cost of 0.07 J.

4.4.2 Successive-Approximation Quantization

Recall that, under the threshold querying model proposed in Section 4.1, when a node first deems its coefficient significant relative to a threshold sent from the sink, it sends that coefficient in its entirety back to the sink. We have assumed thus far that wireless network packets have sufficient data payload to contain all bits of the coefficient, the identification of the sending node, and any meta-data required for routing.

Figure 4.7: Energy expenditure per approximation round for 10 rounds of (a) repeated querying and (b) a single banded query with a stop message after round 10.

The packet size, and thus the data capacity, is limited by the particular radio used in each sensor node, so a packet may not always be able to contain a wavelet coefficient in the entirety of its precision. One potential solution involves allowing the routing protocol to control data fragmentation into multiple packets and re-assembly of the datum at the receiver. We discuss this procedure in more detail in Section 6.2.5.

Such a mechanism may not be provided by the networking layer, however. As an alternative, we can consider adapting a component of the wavelet zerotree coder of Shapiro [23]. Though we consider direct application of this coder to the temporal portion of a spatio-temporal WT in Section 3.7 and in Section 4.5 below, for now we focus on adapting the successive-approximation quantizer used by the coder to the situation where packet fragmentation is not an option.

The zerotree coder uses a series of dyadically decreasing thresholds to determine coefficient significance and define quantization intervals. Specifically, an initial threshold T_0 is first set so that the magnitude of any wavelet coefficient is less than 2T_0, and each subsequent threshold is defined as T_i = T_{i-1}/2. Coefficients significant relative to T_0 are given a bit to describe that they lie in the upper half of the interval [0, 2T_0]. This defines the first coefficient quantizer precision. For thresholding rounds i > 0, a coefficient is deemed newly significant when its magnitude lies in the interval [T_i, T_{i-1}), and it is added to a list of significant coefficients. Following this addition, each coefficient in the list is given a bit of precision, which narrows its quantizer bin width to T_i.

This technique allows not only the addition of newly significant coefficients but also refinement, one bit at a time, of the precision at which we describe coefficients already on the list. When raw coefficient precision exceeds the payload capacity of a network packet, we may use such an approach to divvy up the data among multiple packets. This amounts to user-driven, application-specific fragmentation and re-assembly inspired by proven approaches in traditional wavelet processing. Note, though, that this requires enough prior knowledge of coefficient magnitudes to set a starting threshold T_0. Also, since packets will likely carry more than a single bit of data, we will probably favor an approach that divides the threshold by more than a single power of two each time, giving more than a single bit of refinement.
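The successive-approximation refinement just described can be sketched as follows, under the stated assumption that coefficient magnitudes are below 2T_0. Emitting an explicit 0 bit in rounds before a coefficient becomes significant stands in here for the coder's separate significance pass; function and variable names are illustrative.

```python
def successive_quantize(coeff, T0, rounds):
    """Successively refine the description of coeff one bit per round:
    round i uses threshold T_i = T0 / 2**i, a coefficient becomes
    significant when its magnitude first reaches T_i, and each later
    round emits one bit halving its quantizer bin (width T_i after
    round i). Returns the emitted bits and the reconstruction (bin
    midpoint, with sign). Assumes abs(coeff) < 2 * T0."""
    mag, sign = abs(coeff), (1 if coeff >= 0 else -1)
    low, high = 0.0, 2.0 * T0      # current uncertainty interval for mag
    bits = []
    T = T0
    significant = False
    for _ in range(rounds):
        if not significant:
            if mag >= T:           # newly significant: upper half of [0, 2T)
                significant = True
                low, high = T, 2.0 * T
                bits.append(1)
            else:                  # still insignificant at this threshold
                high = T
                bits.append(0)
        else:                      # refinement bit: halve the bin
            mid = (low + high) / 2.0
            if mag >= mid:
                low = mid
                bits.append(1)
            else:
                high = mid
                bits.append(0)
        T /= 2.0
    return bits, sign * (low + high) / 2.0
```

Sending a few bits per packet rather than one amounts to dividing the threshold by a larger power of two per round, exactly as suggested above.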

4.5 Spatio-Temporal Compression Performance Study

As mentioned in Section 3.7, we can construct a spatio-temporal transform by first computing an isolated temporal transform of each node's time series (1-D domain) and then repeating the spatial transform on each plane of the wavelet coefficient series (2-D domain). This yields a 3-D set of wavelet coefficients that we can then subject to threshold-based querying for the purpose of distributed compression. In this section, we investigate the advantages of applying such a 3-D transform to spatio-temporal measurement fields captured by a sensor network. Specifically, we are interested in comparing the performance of repeated 1-D compression at each node, repeated 2-D compression across the network at each instance in time, and fully 3-D compression. Under the 1-D methodology, a node merely computes a 1-D WT on its time series and responds to repeated queries from the data sink with its 1-D wavelet coefficients. The 2-D methodology proceeds as described in the spatial compression case, with the set of nodes repeating the 2-D transform for each snapshot comprising the spatio-temporal data set. We are particularly interested in gaining an understanding of how the relative performances change as the number N of sensor nodes and the length T of the recorded time series change. The 1-D comparison is especially crucial, since isolated temporal processing at each sensor requires no inter-sensor collaboration and therefore no communication overhead to compute wavelet coefficients prior to collection at the sink. For spatio-temporal wavelet analysis and compression to be feasible in the sensor network setting, there must be an operating regime in which substantial gains over 1-D compression can be demonstrated.
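The separable construction can be illustrated on a regular grid of nodes, with a single-level orthonormal Haar transform standing in for the irregular-grid lifting transform developed in Chapter 3. A sketch, with all function names illustrative:

```python
import numpy as np

def haar_1d(x):
    """Single-level orthonormal Haar analysis of an even-length vector:
    first half scaling (coarse) coefficients, second half details."""
    x = np.asarray(x, dtype=float)
    s = (x[0::2] + x[1::2]) / np.sqrt(2.0)   # scaling coefficients
    d = (x[0::2] - x[1::2]) / np.sqrt(2.0)   # wavelet coefficients
    return np.concatenate([s, d])

def spatio_temporal_transform(field):
    """Sketch of the separable 3-D construction: `field` is an
    (nx, ny, T) array sampled on a regular grid of nodes (a stand-in
    for the thesis's irregular node placements). First a 1-D temporal
    transform at each node, then a 2-D transform (rows, then columns)
    on each plane of the temporal coefficient series."""
    out = np.apply_along_axis(haar_1d, 2, field)       # temporal step
    for t in range(out.shape[2]):                      # spatial step, per plane
        plane = out[:, :, t]
        plane = np.apply_along_axis(haar_1d, 0, plane)
        plane = np.apply_along_axis(haar_1d, 1, plane)
        out[:, :, t] = plane
    return out
```

Because each 1-D step is orthonormal, the composite transform preserves signal energy; compression then amounts to keeping only the largest-magnitude entries of the 3-D coefficient array.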

We consider time-varying versions of the two field classes in Figure 4.1, where each field moves across the spatial distribution of nodes with time. We begin by examining the case of a globally smooth, time-varying field. We fix N at 500 nodes and allow T to vary from 128 to 512. Both the 1-D encoder and the temporal portion of the 3-D encoder use a Daubechies-8 WT. Compression results for a representative, randomly generated instance of each (N, T) pair are depicted in Figure 4.8, plotting MSE versus number of returned coefficients as in previous experiments. We see that, in Figure 4.8(a) with T = 128, the 3-D compression clearly outperforms 1-D compression and fares very well against 2-D compression at all but the lowest coefficient count, where performance is nearly identical. As T increases to 256 in Figure 4.8(b) and 512 in Figure 4.8(c), the performance of the 1-D transform steadily improves, as the greater sampling density leads to greater sparsity in the 1-D wavelet coefficient series. The 3-D transform leverages this increased sparsity in its temporal component, remaining competitive against the 2-D transform. To examine the behavior as the time-series length remains fixed and the number of nodes scales, we repeat the experiment with T = 512 and let N = 250, 500, 750. Results are shown in Figure 4.9. We see that, with N = 250 in Figure 4.9(a), repeated 2-D compression is not competitive against repeated 1-D compression, but the 3-D transform combines the sparsity of both to produce a superior encoding. As N increases to 500 in Figure 4.9(b) and 750 in Figure 4.9(c), we see the relative performance of 2-D compression improving, while the 3-D transform still maintains its dominance.

Thus, from the results of Figures 4.8 and 4.9, we see that the 3-D transform performs well for time-varying, smooth fields, combining the best features of both the 1-D and 2-D components. When the ratio between the number of nodes N and the number of time samples T each node retains is high, 3-D compression outperforms 1-D compression by a significant margin, indicating that the greater overhead in computing the 3-D transform can be justified. As this ratio decreases, the 3-D transform still exhibits superior performance, but the margin narrows enough that overhead considerations may preclude its effective use. In dense networks of simple, inexpensive sensors, we expect that node memory resources will be limited, so a high N/T ratio should represent a feasible design point.

We now repeat these experiments with piecewise-smooth, discontinuous fields to examine the behavior in a more challenging environment. Recall that, while WTs are optimal for smooth fields, they cannot optimally encode single edges, which tend to manifest as significant wavelet coefficients persisting from coarse to fine transform scales. Thus, we expect a more complex interplay between the 1-D, 2-D, and 3-D transforms.

We begin by fixing N at 500 nodes and examining T = 128, 256, 512. Results for a representative, randomly generated instance of each (N, T) pair are given in Figure 4.10. In Figure 4.10(a), we can see that for a relatively low T/N ratio, both the 2-D and 3-D transforms outperform the 1-D transform. As we double T, we see in Figure 4.10(b) that the 1-D transform begins to take advantage of the higher sparsity offered by its increased sampling density, demonstrating an improvement over the repeated 2-D transform. This benefit is reflected in the 3-D transform, which as a result widens its improvement over the 2-D transform. Finally, in Figure 4.10(c), we see that for T = 512 the 1-D transform has essentially overtaken both the 2-D and 3-D transforms in compression performance. At this point, the 1-D transform concentrates signal energy so efficiently, in spite of the signal edge, that subsequent application of the 2-D component of the 3-D transform tends to re-spread the energy, resulting in a coefficient set with less sparsity than the 1-D coefficients.

We repeat the experiment, fixing T = 64 and allowing N to vary from 250 to 750, again considering a representative, randomly generated instance of each (N, T) pair. Results are depicted in Figure 4.11. We see in Figure 4.11(a) that, for a relatively low N/T ratio, both the 2-D and 3-D transforms outperform the 1-D transform, with the 3-D transform showing a distinct margin of improvement. As we increase N to 500 in Figure 4.11(b), however, the margin begins to narrow. The 2-D transform now benefits from its greater sampling density and shows increased coefficient sparsity, but the effects of the 1-D component in the 3-D composite transform are beginning to reduce this sparsity. Finally, in Figure 4.11(c), we see that the N/T ratio has risen high enough that the 2-D transform now outperforms both the 1-D and 3-D transforms.

Clearly, for fields with edge features, there is an interesting interplay between the number of sensors populating a given area and the length of the time series each one records, in terms of the most efficient wavelet-based compression algorithm to choose. When the T/N ratio is very high, corresponding to very dense temporal sampling but sparser spatial sampling, repeated 1-D wavelet-based compression, with its greatly reduced collaborative overhead, is clearly the best solution. For a very high N/T ratio, however, repeated 2-D compression is preferred, since the higher cost of the 2-D transform is justified by the increase in performance. For intermediate network sizes, the 3-D composite transform tends to blend the performance of the 1-D and 2-D transforms nicely. Recall, though, that the practical utility of the 2-D and 3-D transforms again depends on the cost of their overhead communications being offset by the increase in performance over 1-D compression. Thus, these methods are again most useful in the case of dense networks of resource-constrained sensors, as in the case of globally smooth measurement fields.

This overhead can be reduced, however, by efficiently encoding the wavelet coefficient series shared in the spatial component of the 3-D transform. As we mentioned in Section 3.7, the zerotree wavelet coder [23] provides an extremely efficient means of encoding wavelet coefficients in the centralized setting. The innovation of the zerotree coder comes in efficiently representing coefficients' positions in the transform vector (1-D) or matrix (2-D) by exploiting the tendency of significant coefficients at finer scales to descend from those at coarser scales. By using zerotree encoding in the 3-D transform to represent the 1-D wavelet coefficient series shared among nodes in each step of the 2-D transform component, we can significantly reduce the amount of data passed between collaborating nodes. However efficient, this encoding technique is not lossless, so we must verify that the intermediate errors it induces in the 3-D transform do not degrade the compression performance of the resulting 3-D wavelet coefficients.
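A simplified 1-D sketch of the zerotree idea: against a given threshold, each coefficient is classified as significant (positive or negative), an isolated zero whose subtree still holds a significant descendant, or a zerotree root whose entire subtree is insignificant and so can be summarized by one symbol. The heap-style parent/child indexing below stands in for the true subband parent-child relation, and unlike the full coder, descendants of zerotree roots are still emitted here.

```python
import numpy as np

def zerotree_symbols(coeffs, threshold):
    """Classify each coefficient as significant positive/negative
    ('P'/'N'), isolated zero ('Z'), or zerotree root ('T') against a
    threshold. Simplified layout: the parent of coefficient k > 0 is
    (k - 1) // 2, standing in for the coarse-to-fine subband relation."""
    coeffs = np.asarray(coeffs, dtype=float)
    n = len(coeffs)
    significant = np.abs(coeffs) >= threshold
    # Propagate significance bottom-up so subtree_sig[k] is True iff
    # k or any descendant of k is significant.
    subtree_sig = significant.copy()
    for k in range(n - 1, 0, -1):
        subtree_sig[(k - 1) // 2] |= subtree_sig[k]
    symbols = []
    for k in range(n):
        if significant[k]:
            symbols.append('P' if coeffs[k] >= 0 else 'N')
        elif subtree_sig[k]:
            symbols.append('Z')   # insignificant, but a descendant is not
        else:
            symbols.append('T')   # whole subtree insignificant: one symbol
    return symbols
```

The compression gain comes from the 'T' symbols: a single zerotree root stands in for an entire insignificant subtree, which for smooth fields covers most fine-scale coefficients.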

To do so, we consider a representative, random instance of the time-varying, piecewise-continuous field as before, with N = 150 and T = 64. We compute the 3-D transform with no intermediate compression, and then we re-compute the transform, compressing the 1-D wavelet coefficient series shared at each spatial scale of the transform to approximately 1/4 of their original size. The results are depicted in Figure 4.12. We see that the coefficients from the transform with intermediate zerotree compression closely track the performance of those from the full-bitrate transform, indicating that the intermediate errors do not significantly impact the performance of the 3-D transform. This demonstrates the 2-D encoder's robustness and, moreover, is a testament to the zerotree coder's high efficiency.


4.6 Distributed Wavelet Analysis Applicability Summary

We now briefly summarize the intuition gleaned from our studies of the utility of distributed wavelet analysis under the application of distributed compression, with the goal of identifying the network configurations in which distributed wavelet analysis is most applicable. To begin with, we see from the experiments of Section 4.3.1 that there exist break-even points in network size (for fixed radio neighbor density) beyond which the collaborative cost of distributed wavelet analysis is less than the cost to ship raw measurements back to the data sink for collection, both in average energy across the entire network and in average energy among the one-hop neighbors of the sink. As expected, we see that the network-wide average requires a greater node density to break even than the one-hop sink neighbor average, since the nodes in the network bottleneck around the sink bear a disproportionate amount of the routing burden when collecting the entire set of raw measurements. In general, however, as the number of nodes increases and radio transmission range scales to give each node a constant average number of radio neighbors, distributed wavelet analysis shows an increasing gain. Thus, the technique is best matched to networks with a large number of resource-constrained nodes, a good pairing for the sensor network setting.

We remark that these results apply to the situation where information must be routed to a single point in the network to reach the sink. When multiple locations can act as gateways to the sink, we expect the crossover to require a greater number of nodes, in general. Clearly, as the number of sink node locations increases relative to a given number of data-generating nodes, we expect the utility of distributed wavelet analysis to drop relative to a raw measurement dump. This will be reflected by a narrower margin between the beginning of the energy-distortion curves and the vertical line marking the network-wide measurement dump in plots such as those of Figure 4.6. Thus, distributed wavelet analysis is best suited to a deployment where the ratio of data-collection nodes to sink nodes is high.

Additionally, it is worth noting that these experiments fix the number of radio neighbors per node as the number of nodes populating a given area increases. As the transmission radius of each node increases relative to the number of nodes in the network for a fixed deployment area, we expect that the overall utility of distributed wavelet analysis will decrease. In the extreme case, each node will be able to contact the sink (or sinks) directly, and the collaborative cost of the WT will be completely wasted.

Finally, considering the case where nodes collect time-series rather than scalar measurements, as in Section 4.5, we note that the additional sparsity afforded by the spatio-temporal WT will be offset by the collaborative cost of the spatial component of the transform. As the sparsity afforded by temporal wavelet processing at each node increases, corresponding to nodes having the capacity to retain longer time-series records, the extra performance of spatio-temporal processing will likely not be worth the inter-node collaborative cost. Thus, we expect spatio-temporal processing to succeed when the ratio N/T between the number of nodes and the length of the time-series stored at each node is high.


Figure 4.8: Reconstruction error versus coefficient count for repeated 1-D (dotted), repeated 2-D (dashed), and 3-D (solid) compression of smooth measurement fields. Representative results are shown for N = 500 nodes with measurement time-series lengths of (a) T = 128, (b) T = 256, and (c) T = 512.


Figure 4.9: Reconstruction error versus coefficient count for repeated 1-D (dotted), repeated 2-D (dashed), and 3-D (solid) compression of smooth measurement fields. Representative results are shown for measurement time-series lengths of T = 512 with node counts of (a) N = 250, (b) N = 500, and (c) N = 750.


Figure 4.10: Reconstruction error versus coefficient count for repeated 1-D (dotted), repeated 2-D (dashed), and 3-D (solid) compression of piecewise-smooth measurement fields. Representative results are shown for N = 500 nodes with measurement time-series lengths of (a) T = 128, (b) T = 256, and (c) T = 512.


Figure 4.11: Reconstruction error versus coefficient count for repeated 1-D (dotted), repeated 2-D (dashed), and 3-D (solid) compression of piecewise-smooth measurement fields. Representative results are shown for measurement time-series lengths of T = 64 with node counts of (a) N = 250, (b) N = 500, and (c) N = 750.


Figure 4.12: Reconstruction error versus coefficient count for 3-D (solid) and 3-D with intermediate zerotree coding (cross) compression methods.


Chapter 5

Distributed Wavelet De-Noising

To achieve dense spatial sampling at a reasonable deployment cost, each node in a sensor network must be fabricated from relatively inexpensive components, including the sensors the node uses to measure its environment. As a result, we can expect some level of error to manifest itself in the data gathered by the nodes in the form of measurement noise. This motivates developing a distributed solution that can remove the effects of this measurement noise, either as a prelude to further distributed processing or to increase the efficiency of distributed compression. Fortunately, the distributed wavelet transform (WT) developed in Chapter 3 can greatly facilitate this task.

Suppose that the sensor network is sampling some field function f and that the measurements are corrupted by homoscedastic Gaussian noise. That is, the measurement c_{J,γ} of the field value f(γ) at each sensor γ ∈ Γ_J can be expressed as

c_{J,\gamma} = f(\gamma) + \varepsilon_\gamma,

where the ε_γ are independent, identically distributed N(0, σ²) random variables of fixed but unknown variance σ². As demonstrated for the application of distributed compression in Chapter 4, applying the distributed WT to smooth or piecewise-smooth noiseless signals concentrates the signal energy in relatively few wavelet coefficients. The energy of pure noise signals, on the other hand, tends to populate all wavelet coefficients uniformly. Thus, by retaining signal energy in those coefficients that contain the majority of the signal and disregarding noise energy in the remainder, we can remove a good deal of the noise component from the signal in the wavelet domain. As with compression, this process can be distributed within the sensor network following a distributed WT so that each node can replace its measurement with a de-noised wavelet coefficient. Such a procedure can apply to both spatial and spatio-temporal measurement sets. For ease of explanation, we restrict the details of our discussion to spatial measurement de-noising here, but we detail the extension to spatio-temporal measurement data in subsequent sections.

Once wavelet coefficients have been modified to remove measurement noise, we can use the de-noised set of wavelet coefficients {d̂_{j,λ}}_{λ∈∆_j, j∈{j_0,...,J−1}} to recover a de-noised version of the original measurements in either of two ways. The first involves reconstructing this information within the sensor network to give each node a de-noised measurement. This choice is preferred when nodes in the network require less noisy measurements as input to another distributed data processing task. To do this,


we iterate the 2-D lifting transform in reverse. Starting at scale j = j_0 and leaving the coarsest scaling coefficients {c_{j_0,γ}}_{γ∈Γ_{j_0}} unchanged, we find c_{j+1,γ} for each γ ∈ Γ_j using {d̂_{j,λ}}_{λ∈∆_j} as

c_{j+1,\gamma} = c_{j,\gamma} - \sum_{\lambda \in M_j(\gamma)} b_{\gamma,\lambda} \, \hat{d}_{j,\lambda}. \qquad (5.1)

Using these {c_{j+1,γ}}_{γ∈Γ_j}, we then find c_{j+1,λ} for each λ ∈ ∆_j as

c_{j+1,\lambda} = \hat{d}_{j,\lambda} + \sum_{\gamma \in N_j(\lambda)} a_{\lambda,\gamma} \, c_{j+1,\gamma}. \qquad (5.2)

The process iterates up to scale j = J, at which point the scaling coefficients {c_{J,γ}}_{γ∈Γ_J} give the de-noised signal values that we desire. In the case of 3-D de-noising, the inversion of the 3-D transform is completed by computing an inverse 1-D WT at each node.
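The two inverse lifting steps above can be sketched in code. The following is a minimal, centralized Python illustration; the neighbor maps and lifting weights are hypothetical stand-ins for those produced by the distributed transform of Chapter 3, and the dictionaries simply centralize bookkeeping that, in a deployment, each node would hold for itself and exchange over one-hop links:

```python
def inverse_lifting_step(c_coarse, d, update_nbrs, predict_nbrs, a, b):
    """One scale of the inverse 2-D lifting transform, Eqs. (5.1)-(5.2).

    c_coarse:      {gamma: c_{j,gamma}} scaling coefficients at scale j
    d:             {lam: d_{j,lam}} (de-noised) wavelet coefficients
    update_nbrs:   gamma -> M_j(gamma), wavelet neighbors of gamma
    predict_nbrs:  lam -> N_j(lam), scaling neighbors of lam
    a, b:          lifting weights a[(lam, gamma)] and b[(gamma, lam)]
    """
    # Undo the update step (5.1): recover scale-(j+1) scaling coefficients
    c_next = {g: c_coarse[g] - sum(b[(g, l)] * d[l] for l in update_nbrs[g])
              for g in c_coarse}
    # Undo the prediction step (5.2): recover the remaining samples
    for l, nbrs in predict_nbrs.items():
        c_next[l] = d[l] + sum(a[(l, g)] * c_next[g] for g in nbrs)
    return c_next
```

Iterating this step from j = j_0 up to j = J − 1 recovers the de-noised measurements at every node.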

The second method involves recovering the de-noised signal at a central data sink following compression of the measurement field within the sensor network. As discussed in the compression protocol presented in Section 4.1, this typically entails setting some coefficient magnitude threshold above which nodes transmit their information to the sink and below which nodes send nothing. This is realized using queries from the data sink of successively smaller magnitudes until it has retrieved all the signal information it requires. Under this paradigm, compression with de-noising simply involves using the threshold rules generated by the distributed de-noising process as a stopping rule for coefficient magnitude queries from the sink. Reconstruction proceeds at the sink using a centralized version of (5.1) and (5.2) with thresholded coefficient values set to zero.

In this chapter¹, we evaluate the utility of the proposed WT for the task of measurement de-noising, similar to our study of its applicability to measurement compression in Chapter 4. In Section 5.1 we review two families of wavelet-based de-noising techniques from the literature and approaches to distributing them, at varying costs, in the sensor network setting. In Section 5.2 we explore via numerical simulations the efficacy of the proposed WT for de-noising spatial data, and we extend the study to spatio-temporal data in Section 5.3.

5.1 Distributed De-Noising Methods

We now describe two families of wavelet-based de-noising techniques from the literature that can be adapted to the distributed sensor network setting with differing complexity and efficacy. The first technique, known as universal thresholding, attempts to find a threshold to segregate signal-bearing coefficients from noise-bearing

¹This chapter represents joint work with Richard Baraniuk and Véronique Delouille [47].


coefficients. It is easy to distribute and requires very little collaboration among sensors to compute. The second, known as Bayesian shrinkage, accounts for the fact that coefficients primarily encoding signal energy still contain a noise component. It attempts to achieve greater performance by treating these noise terms in addition to those populating the wavelet coefficients that bear little signal energy. The overhead for the Bayesian family of techniques is significantly greater than that for universal thresholding. In this section, we overview each of these approaches and discuss the mechanics of implementing them in the sensor network setting. We primarily describe their implementation in the context of a 2-D (spatial) transform but discuss the extension of each to the 3-D (spatio-temporal) setting.

5.1.1 Universal Thresholding

The canonical and most straightforward wavelet de-noising technique, called universal thresholding, is found in the work of Donoho and Johnstone [25]. In this chapter, we consider hard thresholding, which simply selects a threshold for wavelet coefficient magnitudes below which we consider coefficients to contain only noise terms. The cutoff point, known as the universal threshold, is simply given by

\sqrt{2 \log(N)},

where N gives the number of nodes in the network. This value, however, applies to the case of unit-variance noise, so we must first account for the true noise variance σ² and the effects of the transform, which is not orthonormal and thus normalizes each wavelet coefficient differently. Call the transform matrix W, and let WW^T(λ) denote the diagonal element of WW^T corresponding to node λ. We re-normalize each wavelet coefficient d_{j,λ} (λ ∈ ∆_j) to unit variance, giving

d'_{j,\lambda} = \frac{d_{j,\lambda}}{\sigma \sqrt{WW^T(\lambda)}}.

Universal thresholding for the 2-D transform then modifies each wavelet coefficient as

\hat{d}_{j,\lambda} = \begin{cases} d_{j,\lambda}, & |d'_{j,\lambda}| \ge \sqrt{2 \log(N)}, \\ 0, & \text{otherwise.} \end{cases}

For the case of a 3-D WT, we merely replace N with NT, where T gives the total number of time samples at each sensor. Additional re-normalization is not required, since we can use an orthonormal 1-D WT in the composite 3-D transform.
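The per-node thresholding rule can be sketched as follows. This is an illustrative, centralized Python version; diag_WWT is a hypothetical lookup for the diagonal entries of WW^T, which in practice each node would know for itself:

```python
import math

def universal_hard_threshold(d, diag_WWT, sigma, N):
    """Universal hard thresholding of 2-D wavelet coefficients.

    d:         dict of wavelet coefficients {lam: d_{j,lam}}
    diag_WWT:  dict of diagonal entries of W W^T (hypothetical lookup)
    sigma:     noise standard deviation (known or estimated)
    N:         number of nodes in the network (use N*T for the 3-D WT)
    """
    t = math.sqrt(2.0 * math.log(N))  # universal threshold, unit variance
    out = {}
    for lam, val in d.items():
        # Re-normalize the coefficient to unit noise variance ...
        d_norm = val / (sigma * math.sqrt(diag_WWT[lam]))
        # ... and keep the original value only if it clears the threshold
        out[lam] = val if abs(d_norm) >= t else 0.0
    return out
```

Note that the comparison uses the normalized coefficient but the retained value is the original, un-normalized one, matching the rule above.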

Distributing Universal Thresholding

Distributing this technique is straightforward. Each node λ can be assumed to already know its WW^T(λ), so we must only estimate σ within the network to implement distributed universal thresholding. And this quantity need not be re-estimated each time de-noising is performed; indeed, estimation of σ need only be commensurate with the stationarity properties of the noise process. When estimation is necessary, we employ the wavelet coefficients themselves through the median absolute deviation (MAD) approach described in [26]. Set d'_fine = {d'_{j,λ}}_{λ ∈ ∆_{J−1} ∪ ··· ∪ ∆_{j_max}}, i.e., the finest-scale wavelet coefficients up to some maximum scale j_max. The estimate σ̂ of the noise deviation is then given by

\hat{\sigma} = \mathrm{med}\left( |d'_{\text{fine}} - \mathrm{med}(d'_{\text{fine}})| \right) / 0.6745,

where med() denotes the median operator.

There are several options for computing the median of a set of points within the network, and the best choice will likely vary from network to network, depending on features such as the network size, the routing energy economics, etc. We can centralize this computation within the network at a node convenient to all the fine-scale nodes, aggregating d'_fine, computing the median in a single operation, and broadcasting the result. We can alternatively employ a distributed median protocol as suggested in [48], issuing a set of commands from a central point that eventually return the median to that point for distribution to the network via a broadcast. Finally, when time series are available at each sensor, we can entirely avoid estimating σ from 2-D or 3-D transform data, instead allowing each node γ to form its own estimate σ̂_γ using MAD on its 1-D WT. We can then average these estimates over the entire network using a gossiping algorithm such as that in [49], whereby nodes converge to a global average

\hat{\sigma} = \frac{1}{N} \sum_{\gamma \in \Gamma_J} \hat{\sigma}_\gamma

using repeated, local communications.
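The MAD estimate itself is a one-liner once the fine-scale coefficients have been gathered. A centralized sketch, with d_fine as a plain list of normalized fine-scale coefficients:

```python
import statistics

def mad_sigma(d_fine):
    """Median-absolute-deviation estimate of the noise standard deviation
    from the normalized fine-scale wavelet coefficients (the MAD rule of
    [26]); d_fine is a plain Python list in this sketch."""
    m = statistics.median(d_fine)
    return statistics.median([abs(x - m) for x in d_fine]) / 0.6745
```

The constant 0.6745 is the MAD of a standard normal distribution, so for pure unit-variance Gaussian coefficients the estimate is close to 1.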

5.1.2 Bayesian Shrinkage

Universal thresholding treats all wavelet coefficients below the threshold as noise and all coefficients above as signal components, but in reality noise energy populates all wavelet coefficients, even those with strong signal components. The Bayesian de-noising techniques of Johnstone and Silverman [27] address this shortcoming and can provide superior noise reduction.

In this Bayesian setting (in 2-D, for now), we consider the set of normalized wavelet coefficients at each scale j as observations {d'_{j,λ}}_{λ∈∆_j}, and we express each as d'_{j,λ} = µ_λ + ε_λ, where ε_λ ∼ N(0, 1) and the mean µ_λ has a prior given by the mixture

f_{\text{prior}}(\mu) = (1 - w)\,\delta_0(\mu) + w\,\rho(\mu),

where δ_0 represents a probability mass at 0, ρ(µ) is a unimodal symmetric density, and w controls the mixing between the two densities.

To estimate each µ_λ, we must first estimate the mixing parameter w. Defining the quantity g = ρ ⋆ φ, where ⋆ denotes convolution and φ denotes the standard normal density, we can express the marginal density of d'_{j,λ} as (1 − w)φ(d) + w g(d). The marginal maximum likelihood estimator ŵ of w is the maximizer of the marginal log-likelihood function

l(w) = \sum_{\lambda \in \Delta_j} \log\left( (1 - w)\,\varphi(d'_{j,\lambda}) + w\,g(d'_{j,\lambda}) \right). \qquad (5.3)

65

Page 73: Distributed Multi-Scale Data Processing for Sensor Networksrwagner/docs/wagnerPHDThesis.pdf · Wireless sensor networks provide a challenging application area for signal process-ing

We can easily solve this optimization by defining the score function S(w) = (d/dw) l(w) as

S(w) = \sum_{\lambda \in \Delta_j} \beta(d'_{j,\lambda}, w), \qquad (5.4)

with

\beta(d, w) = \frac{g(d) - \varphi(d)}{(1 - w)\,\varphi(d) + w\,g(d)}.

Let n give the number of elements in ∆_j, and let w_n be the weight that gives the universal threshold t(w_n) = √(2 log(n)). To optimize l(w), we merely find ŵ in the range [w_n, 1] such that S(ŵ) = 0. This can be easily implemented through a binary search algorithm due to the smoothness and monotonicity of S(w). Selection of w_n depends on the choice of the symmetric, unimodal distribution ρ(µ) and will be dealt with shortly.
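The binary search can be sketched as follows. For concreteness this sketch assumes the quasi-Cauchy choice of ρ introduced later in this section, whose marginal g(d) has a simple closed form; the helper names and the stopping tolerance are illustrative:

```python
import math

SQ2PI = math.sqrt(2.0 * math.pi)

def phi(d):
    """Standard normal density."""
    return math.exp(-0.5 * d * d) / SQ2PI

def g_qc(d):
    """Marginal density g = rho * phi for the quasi-Cauchy prior [27]."""
    if d == 0.0:
        return 0.5 / SQ2PI  # limit of (1 - exp(-d^2/2)) / d^2 as d -> 0
    return (1.0 - math.exp(-0.5 * d * d)) / (SQ2PI * d * d)

def score(w, coeffs):
    """Score function S(w) of Eq. (5.4) over normalized coefficients."""
    return sum((g_qc(x) - phi(x)) / ((1.0 - w) * phi(x) + w * g_qc(x))
               for x in coeffs)

def estimate_w(coeffs, w_lo, tol=1e-8):
    """Bisection for the root of S(w) on [w_lo, 1]; since each beta term
    is decreasing in w, S(w) is monotone and the search is well posed."""
    if score(w_lo, coeffs) <= 0.0:
        return w_lo   # root lies at or below the lower bound w_n
    if score(1.0, coeffs) >= 0.0:
        return 1.0    # all mass assigned to the nonzero component
    lo, hi = w_lo, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if score(mid, coeffs) > 0.0 else (lo, mid)
    return 0.5 * (lo + hi)
```

In the distributed setting, the sum in score() is exactly the quantity that must be accumulated across the nodes of scale j at each bisection step.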

Given our estimate ŵ of w, we can then form a posterior density for each µ_λ given each d'_{j,λ}. Defining the posterior probability w_post(d) = P(µ ≠ 0 | D = d) as

w_{\text{post}}(d) = \frac{w\,g(d)}{w\,g(d) + (1 - w)\,\varphi(d)} \qquad (5.5)

and defining f_1(µ | D = d) = f(µ | D = d, µ ≠ 0) gives us the posterior density

f_{\text{post}}(\mu \mid D = d) = (1 - w_{\text{post}})\,\delta_0(\mu) + w_{\text{post}}\,f_1(\mu \mid d).

Posterior Mean

Using this posterior density, we can estimate µ_λ in one of two ways. The first estimator, which is the easier of the two to compute, is the posterior mean µ̂(d; ŵ). Defining µ_1(d) as the mean of f_1(· | d), we have that

\hat{\mu}(d; \hat{w}) = w_{\text{post}}(d)\,\mu_1(d).

To detail the computation of µ̂(d; ŵ), we now address the selection of the density ρ(µ). We choose the so-called "quasi-Cauchy" density of [27],

\rho(u) = (2\pi)^{-1/2}\left( 1 - |u|\,\tilde{\Phi}(|u|)/\varphi(u) \right),

where Φ denotes the standard normal cumulative distribution function and Φ̃ = 1 − Φ. For this density, the authors of [27] detail the relevant quantities needed to compute µ̂(d; ŵ):

g(d) = (2\pi)^{-1/2}\, d^{-2}\left( 1 - e^{-d^2/2} \right)

and

\mu_1(d) = d\left( 1 - e^{-d^2/2} \right)^{-1} - 2d^{-1}.


Using these formulae and (5.5), µ̂(d; ŵ) can be found through simple substitution, given the mixing estimator ŵ. The final wavelet coefficient estimate is given by

\hat{d}_{j,\lambda} = \hat{\mu}(d'_{j,\lambda}; \hat{w})\,\sigma\sqrt{WW^T(\lambda)}.

Posterior Median

The posterior mean estimator shrinks all wavelet coefficients toward the underlying signal values, regardless of their size. It does not provide a thresholding rule below which small-magnitude coefficients are set to zero. When such behavior is desirable, we can instead turn to the posterior median estimator µ̃(d; ŵ). This is a bit more difficult to compute. To do so, we must first define the quantity

\tilde{F}_1(\mu \mid d) = \int_{\mu}^{\infty} f_1(u \mid d)\, du.

For d > 0, µ̃(d; ŵ) can be found using

\tilde{\mu}(d; \hat{w}) = 0, \quad w_{\text{post}}(d)\,\tilde{F}_1(0 \mid d) \le \tfrac{1}{2}; \qquad
\tilde{F}_1\left( \tilde{\mu}(d; \hat{w}) \mid d \right) = \left( 2\,w_{\text{post}}(d) \right)^{-1}, \quad \text{otherwise.} \qquad (5.6)

For d < 0, µ̃(d; ŵ) can be found through the antisymmetry property µ̃(−d; ŵ) = −µ̃(d; ŵ). For the quasi-Cauchy prior density, the authors of [27] calculate F̃_1(µ | d) to be

\tilde{F}_1(\mu \mid d) = \left( 1 - e^{-d^2/2} \right)^{-1}\left( \tilde{\Phi}(\mu - d) - d\,\varphi(\mu - d) + (\mu d - 1)\, e^{\mu d - d^2/2}\, \tilde{\Phi}(\mu) \right).

The first line of (5.6) provides the thresholding rule and also guides selection of the lower bound w_n on the search interval for the zero of (5.4). That is, w_n is given by the w satisfying w_post(d) F̃_1(0 | d) = 1/2 for d = √(2 log(n)). When the thresholding rule does not apply, µ̃(d; ŵ) must be found using a numerical solver on the second line of (5.6). The final wavelet coefficient estimate is given by

\hat{d}_{j,\lambda} = \tilde{\mu}(d'_{j,\lambda}; \hat{w})\,\sigma\sqrt{WW^T(\lambda)}.

Extending this approach to 3-D is not difficult. Rather than grouping all coefficients of the same spatial scale for estimation, we instead group coefficients that have both the same temporal scale and the same spatial scale in the 1-D and 2-D transforms that form the composite 3-D transform.

Distributing Bayesian Shrinkage

Implementing these Bayesian techniques in a sensor network requires distribution of two steps: re-normalization of wavelet coefficients, as in the universal thresholding case, and computation of ŵ at each scale j given the observations {d'_{j,λ}}_{λ∈∆_j}. The latter involves optimizing (5.3), a procedure for which a distributed, iterative solution is found in [50]. Alternatively, and according to routing economics, the optimization


          L,S     H,S     L,D     H,D
mean     23.30   15.85   20.63   13.72
median   23.24   15.85   19.99   13.48
hard     23.17   15.82   18.77   13.24
orig     16.89    9.51   16.88    9.54

Table 5.1: Results of in-place spatial de-noising experiments for smooth (S) and piecewise-smooth discontinuous (D) fields subjected to low (L) and high (H) measurement noise using Bayesian posterior mean (mean), Bayesian posterior median (median), and universal hard thresholding (hard) techniques. Compare against the error of the original data (orig). All results in PSNR in dB, averaged over 50 random trials for each field type using N = 1000 randomly-placed node field samples.

may be solved using (5.4) by collecting all the observations at a single, convenient node.

5.2 Spatial De-noising Performance Study

We now illustrate the de-noising performance of the described techniques. We consider two types of measurement fields and two classes of additive noise. The first field type represents the globally smooth class depicted in Figure 4.1(a). We denote this field S. The second type is the piecewise-smooth class of field with a jump discontinuity, depicted in Figure 4.1(b). We denote this field D. These are the same field types as used in the spatial compression study of Section 4.2. To these fields we add IID Gaussian noise of either low (L) or high (H) magnitude, using the definitions from [27]. For a low noise level, the ratio between the standard deviation of the noise and the standard deviation of the signal values is 1/7; for a high noise level, the ratio is 1/3.

We begin by exploring with the 2-D transform the tradeoffs between universal thresholding and the Bayesian techniques. The results for the three techniques are compared against the original, noisy data in Table 5.1 for a network of N = 1000 randomly placed nodes, averaged over 50 trials for each of the four field types. De-noising methods are applied to the finest 3/4 of wavelet scales, and estimate quality is measured as PSNR² in dB compared against noiseless truth data. We see that for the smooth fields, the techniques perform roughly the same, with the posterior mean Bayesian technique (marked mean) performing slightly better than the posterior median technique (marked median) and with both Bayesian techniques out-performing universal hard thresholding (marked hard). This difference in performance between the Bayesian techniques is consistent with the results of [27], since the WT used

²PSNR is defined here as 10 log₁₀(var(true)/var(est − true)), where true and est represent the true and estimated signals, and var() gives the variance of a signal.
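For reference, this PSNR figure of merit is trivial to compute; a small illustrative helper:

```python
import math

def psnr_db(true, est):
    """PSNR in dB: 10 log10(var(true) / var(est - true)), with the
    population variance taken over the node samples (lists here)."""
    def var(x):
        m = sum(x) / len(x)
        return sum((v - m) ** 2 for v in x) / len(x)
    err = [e - t for e, t in zip(est, true)]
    return 10.0 * math.log10(var(true) / var(err))
```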


[Figure 5.1, panels (a) L,S; (b) H,S; (c) L,D; (d) H,D: PSNR versus coefficient count for curves marked median, hard, and orig.]

Figure 5.1: Coefficient count versus distortion in PSNR using posterior-median Bayesian de-noising (solid) and hard universal thresholding (dashed) for (a) low-noise and (b) high-noise globally smooth fields and (c) low-noise and (d) high-noise piecewise-smooth fields with a jump discontinuity. Compare to the distortion using the original, noisy data (dotted). Results are averaged over 50 trials using randomly generated fields sampled by randomly-placed networks of N = 500 nodes.

here is not translation-invariant. The disparity is magnified when the field contains a discontinuity. Recall that such a feature manifests itself in the wavelet domain as relatively large-magnitude wavelet coefficients at very fine scales, scales that tend to contain only noise energy when fields are globally smooth. The strong signal component complicates noise removal, allowing the Bayesian techniques more opportunity to showcase their superior performance.

These results give rise to a first rule of thumb for wavelet de-noising in sensor networks. For the application of in-network de-noising, when S fields are expected, universal thresholding is the preferred solution. The performance is not substantially


different from that of the Bayesian techniques, and it requires no additional communication overhead beyond noise variance estimation. When D fields are more likely and additional PSNR is required, the posterior mean Bayesian estimator may be used. The posterior median estimator is less preferable, since it requires the same communication overhead and gives poorer results than the posterior mean technique.

For the application of de-noising with compression, however, the decision is clearly between universal thresholding and the posterior median technique, since the posterior mean technique provides no thresholding rule. To explore this, we conduct compression experiments whose results are depicted in Figures 5.1(a)-(d). For each of the four field types sampled by N = 500 randomly placed sensors, we simulate the results of successive queries from the sink by computing the reconstruction quality in PSNR using the n largest-magnitude coefficients as n tends to N. Results are averaged over 50 trials with each field. The dotted curve in each figure shows the results obtained using the original, noisy data. Each curve has a clear maximum, after which the addition of wavelet coefficients to the approximation only adds noise energy. Without an oracle to identify this maximum, we must instead turn to the threshold rules of our distributed de-noising techniques. Figures 5.1(a) and (b) depict the L,S and H,S fields, and we see that in both cases universal thresholding (dashed line) and the posterior median estimate (solid line) flatten out near the optimal number of coefficients. And for both techniques, the approximation prior to this cutoff is superior to that given by the original, noisy coefficients. As expected, universal thresholding shows near-identical performance to the Bayesian approach, making it the preferred solution when the fields are expected to be globally smooth. Figures 5.1(c) and (d) explore the case of the L,D and H,D fields. Both curves again flatten out around the optimal number of coefficients, but we see in this case the clear advantage of the posterior-median estimator's shrinkage of the non-thresholded coefficients. Not only does the Bayesian technique give a higher cutoff PSNR, but it also yields a better overall approximation in the range of queries prior to the cutoff, where universal thresholding is slightly sub-optimal to querying noisy coefficients (albeit, with an oracle). When fields are expected to contain such discontinuities, the Bayesian technique is clearly preferable when the additional communication resources it requires are available.

5.3 Spatio-Temporal De-noising Performance Study

We now turn our attention to networks gathering time series of measurements. As discussed, we can leverage the temporal correlations in these measurements through a composite 3-D transform, but an alternative presents itself: rather than employing the communication-heavy 3-D transform, we can simply de-noise the time series at each sensor in isolation with a 1-D transform. As is the case for the compression study in Section 4.5, this approach ignores the spatial correlations among neighboring sensors' measurements but requires no extra expenditure of communication energy. We must verify that 3-D de-noising can give added performance for the increased cost.


To do so, we examine the relative performances of 1-D (repeated at each sensor) and 3-D de-noising using the hard universal threshold and Bayesian posterior mean techniques over a spatio-temporal measurement field. This amounts to in-place spatio-temporal de-noising. We do not examine 1-D universal thresholding, since the superior Bayesian techniques can be used at no extra cost in the centralized setting of 1-D de-noising. We also include results for 2-D de-noising (repeated on each element of the time series) to help better understand the tradeoffs. We consider the time-varying smooth and piecewise-smooth discontinuous fields used in the spatio-temporal compression study of Section 4.5. The finest 1/2 of wavelet scales are subjected to noise removal in the 1-D case, and for the 3-D case we de-noise coefficients whose scale in either the 1-D or 2-D transforms of the 3-D composite transform would receive 1-D or 2-D de-noising treatment.

We begin by considering the low-noise, globally smooth (L,S) field, varying both the number of sensors N and the time-series length T. Results are depicted in Figure 5.2(a) for N = 500 sensors, Figure 5.2(b) for N = 1000 sensors, and Figure 5.2(c) for N = 1500 sensors. In each figure, we consider T = 64, 128, 256, 512, and each data point represents an average over 50 realizations of the (N, T) pair for the L,S field type. We show the results of de-noising the field using the following techniques: 1-D Bayesian posterior mean, marked 1D; 3-D Bayesian posterior mean, marked 3D; 3-D universal hard thresholding, marked 3D (hard); 2-D Bayesian posterior mean, marked 2D; and 2-D universal hard thresholding, marked 2D (hard).

We see in Figure 5.2 that, for the L,S field, the difference between the Bayesian and universal thresholding techniques is somewhat larger in the 3-D case than the 2-D case but is in general not substantial, matching our observation from Section 5.2. A more pronounced difference is found between the 3-D and 1-D approaches. The 3-D techniques out-perform the 1-D technique by a considerable margin, with the gap closing in general as T increases for fixed N and the efficacy of 1-D de-noising increases. As N increases for fixed T, however, the disparity between 3-D and 1-D de-noising in general widens in favor of the 3-D approach. Thus, the 3-D approach does a good job of blending the strengths of the 1-D and 2-D approaches. We can conclude that, for smooth fields in low noise, both of the 3-D techniques perform favorably against 1-D de-noising, especially for higher N/T ratios. This margin of improvement helps justify the added collaborative cost of 3-D processing, and the success of the universal thresholding approach helps to mitigate the effects of this added cost with its relatively lower overhead. Taken together, these observations dovetail nicely with those of the spatio-temporal compression study for smooth measurement fields in Section 4.5. Similar results can be seen for the case of smooth fields in high noise, as depicted in Figure 5.3 with the same experimental setup as the L,S case.

De-noising results for the high-noise, piecewise-smooth case, depicted in Figure 5.4, are more interesting, however. To begin with, we see a much clearer disparity between the Bayesian posterior mean and universal thresholding approaches, in accordance with our observations in Section 5.2. As the performance of 1-D de-noising


improves with increasing T, both the 3-D Bayesian and 3-D universal thresholding techniques similarly improve, but now 3-D universal thresholding is not able to keep pace with the efficiency of the 1-D Bayesian approach. Though this improves somewhat with increasing N, in general the universal thresholding approach seems ill-suited to de-noising the discontinuous measurement field in high noise. The 3-D Bayesian approach, however, still generally shows desirable performance, especially for higher N/T ratios, as before. We can see, however, that the 3-D transform is trending toward a crossover with the 1-D transform when the N/T ratio is at its lowest.

This tendency is more pronounced when we consider the low-noise, piecewise-smooth case in Figure 5.5. This set of experiments by far exhibits the most interesting behavior. We see that, for a low N/T ratio, the 1-D technique clearly dominates. Similarly, for a high N/T ratio, the 2-D Bayesian technique dominates. Between these extremes, the 3-D Bayesian approach does not in general show a substantial gain over either isolated 1-D or repeated 2-D de-noising, with the notable exception of (N, T) = (1000, 512). This recalls the results for compression of piecewise-smooth spatio-temporal data in Section 4.5. Depending on the spatial and temporal sampling density, either the 2-D or the 1-D transform seems in general to be doing a better job of sparsifying the wavelet coefficients than is the composite 3-D transform. Since the discontinuity in this class of measurement field induces wavelet coefficients of significant magnitude from coarse to fine scales, and since wavelet-based de-noising algorithms work better when signal energy is more highly concentrated in a few wavelet coefficients, this difference in sparsity becomes profound. It is likely most evident in the low-noise case, since the de-noising procedure must differentiate between small signal components and small noise components; any loss in sparsity can lead to signal being misinterpreted as noise. These subtle differences are less of an issue in the high-noise case when compared against the higher energy of the noise terms, suggesting why the 3-D technique manages to remain more competitive in Figure 5.4. The exact nature of this discrepancy bears further investigation.

Thus, to summarize, the spatio-temporal WT exhibits desirable de-noising performance for globally smooth data. In such a case, the similarity between the Bayesian and universal-thresholding results suggests use of the simpler universal threshold with its lower communication overhead. In the case of piecewise-smooth fields in high noise, the posterior-mean Bayesian technique manages to show superior performance when compared against isolated 1-D de-noising, but this improvement must be critical to subsequent applications using the de-noised data to justify its increased overhead cost. Finally, for the piecewise-smooth fields in low noise, it appears that 3-D de-noising does not provide a clear benefit over de-noising using either temporal or spatial WTs alone. Thus, as a general rule of thumb for such measurement fields, it seems that we should avoid 3-D de-noising altogether, instead using 2-D de-noising up to some T for fixed N and then switching to isolated, 1-D de-noising thereafter (and vice versa for fixed T). This optimal switch point will depend heavily on network routing economics and must be investigated further.


[Figure 5.2, panels (a) N = 500, (b) N = 1000, (c) N = 1500: de-noising quality versus T for curves marked 2D, 2D (hard), 1D, 3D, 3D (hard), and orig.]

Figure 5.2: Results of in-place, spatio-temporal de-noising using 1-D (Bayesian posterior mean), 2-D (Bayesian posterior mean and universal hard thresholding), and 3-D (Bayesian posterior mean and universal hard thresholding) techniques for the low-noise, globally smooth (L,S) class of measurement field. Compare against the original, noisy measurements. Network sizes range over (a) N = 500, (b) N = 1000, and (c) N = 1500 sensors capturing T = 64, 128, 256, 512 time samples each.


[Figure 5.3, panels (a) N = 500, (b) N = 1000, (c) N = 1500: de-noising quality versus T for curves marked 2D, 2D (hard), 1D, 3D, 3D (hard), and orig.]

Figure 5.3: Results of in-place, spatio-temporal de-noising using 1-D (Bayesian posterior mean), 2-D (Bayesian posterior mean and universal hard thresholding), and 3-D (Bayesian posterior mean and universal hard thresholding) techniques for the high-noise, globally smooth (H,S) class of measurement field. Compare against the original, noisy measurements. Network sizes range over (a) N = 500, (b) N = 1000, and (c) N = 1500 sensors capturing T = 64, 128, 256, 512 time samples each.


Figure 5.4: Results of in-place, spatio-temporal de-noising using 1-D (Bayesian posterior mean), 2-D (Bayesian posterior mean and universal hard thresholding), and 3-D (Bayesian posterior mean and universal hard thresholding) techniques for the high-noise, piecewise-smooth discontinuous (H,D) class of measurement field. Compare against the original, noisy measurements. Network sizes are (a) N = 500, (b) N = 1000, and (c) N = 1500 sensors, each capturing T = 64, 128, 256, 512 time samples.


Figure 5.5: Results of in-place, spatio-temporal de-noising using 1-D (Bayesian posterior mean), 2-D (Bayesian posterior mean and universal hard thresholding), and 3-D (Bayesian posterior mean and universal hard thresholding) techniques for the low-noise, piecewise-smooth discontinuous (L,D) class of measurement field. Compare against the original, noisy measurements. Network sizes are (a) N = 500, (b) N = 1000, and (c) N = 1500 sensors, each capturing T = 64, 128, 256, 512 time samples.


Chapter 6

Distributed Data Processing Application Programming Interface

In the preceding chapters, we have consistently emphasized that, for distributed processing algorithms in sensor networks to be practical, their collaborative costs must not outweigh their benefits in terms of communication savings. Since these costs are governed by complex network routing economics, it is necessary to evaluate proposed algorithms using real routing frameworks, as we do for the application of compression in Section 4.3. Simulation results, however, are only as good as the assumptions made by the simulator, which will not always accurately duplicate the dynamic and sometimes unpredictable nature of multi-hop wireless communication. It is imperative, then, that designers of distributed algorithms be able to quickly prototype their algorithms on real sensor network hardware running real routing protocols to determine their designs' true practicality.

While the communication patterns and on-board computations required for computing a distributed wavelet transform (WT) are straightforward, our own experience implementing the WT on a small scale in a sensor network testbed has shown us that the task is anything but simple. The in-network collaborations required by both the WT and other distributed algorithms proposed in the sensor networking literature induce a variety of network communication patterns, many of which are not supported by standard sensor network programming tools such as TinyOS [51]. This forces algorithm designers who wish to prototype their designs to re-implement common communication modules. Unfortunately, since a good number of these researchers do not have the backgrounds necessary to develop the network communication protocols they need using TinyOS building blocks, many designs go untested in practice in real sensor network environments. Thus, many algorithm developers are unable to ascertain the true utility of their designs and to identify and correct possible flaws exposed by the complex operating environments of real-world deployments.

What these researchers lack is a network programming abstraction that allows them to implement their designs without concerning themselves with the underlying network services needed to support the communications they require.

In response to this tension, we expand our scope past distributed multiscale processing to elucidate the communication requirements of a broader class of distributed algorithms. In this second main contribution of the thesis,1 we design a network application programming interface (API) that provides access to a suitable set of abstract network services for data processing in sensor networks. An API is not a specific implementation of these abstract services. Rather, an API provides a portal through which an application accesses the abstract services. Defining the API is a crucial intellectual exercise, since the API is the part of the system that is the most difficult to change over time; any change to the API requires application programs to be re-written. In contrast, the implementation of an abstract service is relatively changeable. For example, the implementation of a multi-hop datagram service can change from using link-state routing to distance-vector routing transparently.

1 This chapter represents joint work with Richard Baraniuk, Marco Duarte, David B. Johnson, T. S. Eugene Ng, and J. Ryan Stinnett [52].
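This separation between the API and its interchangeable implementations can be sketched as follows. The class and method names here (RoutingStrategy, DatagramService, next_hop) are our own illustration, not part of the thesis API; the point is only that application code calls send() and never sees which routing protocol answers.

```python
from abc import ABC, abstractmethod

class RoutingStrategy(ABC):
    """Abstract multi-hop routing service hidden behind the API."""
    @abstractmethod
    def next_hop(self, dest: int) -> int:
        ...

class DistanceVectorRouting(RoutingStrategy):
    """One interchangeable implementation: a distance-vector table.
    A link-state implementation could be substituted transparently."""
    def __init__(self, table):
        self.table = table            # destination -> next-hop neighbor
    def next_hop(self, dest):
        return self.table[dest]

class DatagramService:
    """The API surface applications program against. Swapping the
    routing implementation requires no change to application code."""
    def __init__(self, routing: RoutingStrategy):
        self.routing = routing
    def send(self, dest, payload):
        # Hand the packet to the link layer via the chosen route (stubbed).
        return (self.routing.next_hop(dest), payload)
```

An application constructed with `DatagramService(DistanceVectorRouting(...))` keeps working unchanged if the routing class behind it is replaced.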

Designing a suitable network API requires solid knowledge of the applications' communication needs. Fortunately, the plethora of sensor network applications and algorithms proposed to date provides strong guidance. Moreover, ad hoc wireless network algorithms and protocols have matured in recent years, so the basic technologies necessary for implementing the abstract services are available. Starting with these observations, this chapter addresses the problem of designing a suitable network API for sensor networking. Our primary goals are to expose the technical trade-offs between different API design choices and to propose a specific API that is sufficiently powerful, convenient to use, compact, and realizable with existing technologies. We intentionally leave out implementation issues such as software architecture and system performance evaluation. We expect future work to explore and evaluate different implementation choices for realizing the proposed API.

Our contributions in this chapter are two-fold. First, we conduct a survey of the distributed data processing algorithms proposed in the proceedings of Information Processing in Sensor Networks (IPSN) to extract a family of key communication patterns that recur across the algorithms. In Section 6.1 we present the results of this survey. Second, we design a network API to cover these classes of communication. We begin by carefully discussing the design of the API in Section 6.2, and we present the API calls and a brief discussion of possible implementation directions for each in Section 6.3. We then provide a detailed treatment of four of the surveyed algorithms in Section 6.4, indicating the API calls necessary to implement the communications required by each.

6.1 Survey of Application Requirements

To understand the communication requirements of typical sensor network applications, we conducted an extensive review of the papers proposing data processing algorithms in the proceedings of IPSN to date. We surveyed over 100 papers in all, but due to space limitations, we present the survey results for a set of 30 of them here, chosen to best represent the diversity of application classes and communication patterns. In categorizing each paper, we carefully looked to the authors' description of the assumed networking environment, making as few assumptions as possible on our own part in order to accurately capture the authors' original intent.

The proposed applications span a wide variety of topics. Algorithms for data analysis include standard signal processing applications such as measurement compression [32, 53, 54], target tracking [55-59], and parameter estimation [16, 60, 61], as well as those more specific to sensor networks, such as query servicing [40, 41, 62-64] and aggregation [65, 66]. Network maintenance algorithms form another common application class, with protocols to enable node self-localization [67, 68], guide node placement [69], schedule node sleep cycles while maintaining sensing coverage [70-72], detect network faults [73], and provide navigation assistance for mobile agents traversing a sensor network [74]. Finally, a number of papers extend their algorithm proposals to real-world implementations, monitoring environmental and structural phenomena [75-78].

Despite great diversity in the kinds of applications, we find significant commonality in their node communication requirements. Some notion of address-based sending, either to a single destination or a set of destinations, is found in a majority of the proposed algorithms [16, 32, 40, 41, 53-56, 58, 60, 62-67, 72-76]. These addresses typically take the form of unique node identifiers, though a subset of applications address multicast groups to which nodes may subscribe [56, 62].

Region-based sending also emerges as a common feature. Broadcast of a message is common, either to all of a node's immediate neighbors within wireless transmission range [32, 41, 58, 61, 64-70, 74, 75] or to all neighbors within a larger number of radio hops [62, 68, 74]. This notion is extended in several applications to sending a message to all nodes within a geographic radius of the sender [32, 62, 70, 71]. Finally, a number of applications wish to send a message to nodes within an arbitrary region of space not centered around the sender [40, 53, 56, 57, 62].

Communication based on hierarchies of more- and less-powerful devices is also common. In its simplest form, this consists of sending to one or more central data sinks in an otherwise homogeneous network of nodes [32, 53, 54, 62, 64, 69, 70, 73, 75]. For networks with multiple device classes below the level of the sink, this notion is extended to sending to a more powerful parent device and less powerful child devices [55, 59, 77, 78].

We note that most applications at least implicitly require some form of multi-hop communication, with only a handful specifically relying solely on single-hop transmissions [16, 60, 61, 65, 66]. Similarly, most applications imply at least some level of reliability in packet delivery, with a very few claiming complete robustness to unreliable links [16, 61, 65, 66].

Finally, while most applications concern themselves with reception of packets only at the intended destination, a few leverage the ability of nodes to eavesdrop on packets passing through their vicinity on the way to a different destination [16, 59, 62].


6.2 Design Decisions

The survey results provide a starting point for specifying an API to cover common communication patterns; a number of issues, however, influence the shape of the final API. We motivate and detail key design decisions in this section, beginning with a brief overview of the state of the art in sensor network programming APIs.

6.2.1 Preliminaries

Over the past several years, a variety of programming models and support frameworks have been developed for sensor networks to overcome the unique challenges faced when trying to implement data processing applications of varying complexities on highly constrained hardware platforms. Of the available architectures, TinyOS has garnered the most research and attention. Its event-driven model is attractive because it allows for a direct translation of hardware interrupts from physical devices into handlers that run in response to these interrupts.

TinyOS also allows developers to program in nesC, an event-based extension of C created specifically for TinyOS, rather than forcing users to adapt to an entirely new language. TinyOS 2.0 contains a large selection of components to simplify typical tasks an application might want to perform, such as collecting sensor readings, managing overall device power, and communicating with other nearby devices [79].

While TinyOS has greatly reduced the work required of the application developer, its networking components are focused on supporting two basic communication types: single-hop unicast to a single node and single-hop broadcast to all nodes within radio range of the sender [80]. This provides a basis upon which more complicated transmission schemes can be built. For example, TinyOS 2.0 extends the single-hop support to include a tree collection protocol [81]. The communication classes directly supported by TinyOS, however, do not cover the bulk of the patterns enumerated in Section 6.1.

Moreover, while TinyOS's basic send and receive system is well designed for simple networking systems, it lacks several features useful to application developers. For example, to expend extra effort sending a certain packet in order to increase its chances of reaching the receiver, a developer may take a number of platform-dependent actions, such as enabling automatic radio acknowledgements (ACKs) or increasing the radio's transmission power. There is, however, currently no transparent way for an application to specify increased reliability. Developers must directly access these low-level controls, which can differ from one platform to another. As another example, to send data larger than the packet structure's fixed payload length, an application designer must custom-build a fragmentation and reassembly scheme. Implementing such a system can itself become quite complex and distracting for designers wishing to focus their efforts on novel applications.


6.2.2 Supporting Multiple Addressing Modes

In addition to the most basic form of unicast addressing for identifying the recipient of a message, the proposed API must also support a multicast group addressing mode as well as addressing modes based on physical regions, either centered around the sending node or around an arbitrary point in space. Efficiently supporting these families of send functions in the proposed API requires the following decisions:

Provide a dedicated API call for each addressing mode. Instead of having a generic send() API call and using a parameter to specify the intended addressing mode, we dedicate a separate API call to each addressing mode. This design allows the underlying execution environment, such as TinyOS, to selectively load only the code corresponding to the addressing mode(s) required by the application onto the possibly resource-constrained nodes. This design also removes the need to unnaturally force very different addressing modes, such as unicast and region-based addressing, to fit into a single API call mold.

Use a separate address space for multicast addressing. In today's IP network, unicast and multicast addresses co-exist within a 32-bit address space, where multicast addresses are identified by the prefix bit sequence 1110. While a similar strategy may be used for a sensor network API, sensor node addresses are typically drawn from a 16-bit address space [82]. To statically reserve a significant portion of those 2^16 addresses for multicast addressing may be wasteful. Instead, the proposed API uses a separate 16-bit address space for multicast addressing. An address is evaluated as either unicast or multicast depending on the API call naming it as a destination.

6.2.3 Supporting Multiple Receive Modes

To enable an application running on a node to eavesdrop on passing messages for which the node is not the intended recipient, the proposed API supports overhearing receive modes in addition to the basic mode where the receiving node is the intended destination. The multi-hop nature of network traffic enables two very different kinds of overhearing receivers. The first merely observes traffic in the node's vicinity, while the second allows a node to intervene and modify passing messages. Support for these two modes is qualified as follows:

Support an eavesdropping receive mode. A node is allowed to passively listen to all overheard messages for which the node is not the ultimate destination. This includes messages sent by neighbors for which the node is not the next-hop destination as well as those for which it is the next-hop destination. This is the multi-hop analogue to TinyOS's radio-channel eavesdropping.

Support an intervening receive mode. A node is notified of messages it is forwarding to another multi-hop destination and allowed to modify those messages before performing the forward (including optionally cancelling the forward). This new capability can allow for novel implementations of distributed algorithms. Consider, for example, the algorithms proposed in [54, 64], both of which perform cluster-based aggregation of data where the node serving as the cluster head can change with time. Both algorithms expend effort to build and maintain a routing tree rooted at the head node, and child data is aggregated by parents in the tree as it flows to the root. Instead, each node in the cluster could address the cluster head directly, with all nodes utilizing the intervening-receive mode. Nodes would wait an amount of time inversely proportional to their hop count to the head before aggregating all intercepted data with their own reading and addressing the result to the cluster head. Such a solution avoids re-building the routing tree when the cluster head membership changes, and avoids detecting and repairing links in the tree that have been broken due to changes in the wireless communication environment.
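A minimal sketch of such an intervening-receive aggregation handler follows. The packet layout (a dictionary with "sum" and "count" fields) and the function name are our own illustration of the idea, not the thesis API; a real handler would also implement the hop-count-proportional hold time, which is omitted here.

```python
def intervening_receive(node, packet):
    """Hypothetical intervening-receive handler: a relay folds its own
    reading into an aggregation packet bound for the cluster head and
    lets the modified packet continue toward its destination.
    (Returning the packet unmodified forwards it as-is; a real API
    would also allow cancelling the forward.)"""
    if packet.get("type") != "AGGREGATE":
        return packet                 # not an aggregation packet; pass through
    packet["sum"] += node["reading"]  # aggregate in place, no routing tree
    packet["count"] += 1
    return packet

# A leaf's reading crosses two relays on its way to the cluster head;
# each relay aggregates its own reading into the passing packet.
pkt = {"type": "AGGREGATE", "sum": 4.0, "count": 1}
for relay in ({"reading": 2.0}, {"reading": 6.0}):
    pkt = intervening_receive(relay, pkt)
```

Because aggregation happens wherever the packet happens to travel, no tree must be rebuilt when the cluster head changes.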

6.2.4 Giving the Application the Ability to Control Transmission Effort

Transmission effort is an important issue to consider in designing a network API for sensor networks, as it impacts both the reliability of packet delivery and the amount of energy consumed in delivering a packet.

Among existing network APIs that provide enhanced transmission effort, the TCP socket API is the best known. It provides an end-to-end, reliable, in-order byte-stream delivery service to the application, and data is retransmitted until an end-to-end acknowledgement is received. And while the TCP socket API serves the common case in the communication- and content-oriented Internet, it does not serve the common case in data-processing-oriented sensor networks, where data tend to be much more time-sensitive; consider, for example, the sensor measurements that drive the target tracking applications proposed in [55-59]. A TCP socket-like API may retransmit a packet that the application no longer considers useful, wasting energy. Moreover, the in-order nature of the service causes the application to lose control over the timing of packet transmissions. When a packet is retransmitted past its deadline, the subsequent packet may be delayed sufficiently to miss its deadline as well.

Providing no transmission effort enhancement, however, does not sufficiently serve sensor networks: the chance of a packet being delivered successfully over a multi-hop wireless sensor network can, in some cases, become too small to be practical. These considerations led us to the following design decisions:

Give control to applications. The proposed API is designed to give applications control over the level of transmission effort required. Wireless sensor networks' reliability characteristics will depend on the physical environments in which they are deployed. Moreover, sensor network applications can have widely different tolerances to packet loss and sensitivity to the timeliness of data delivery. Thus, only applications themselves can decide the level of transmission effort sufficient to achieve the desired performance.

Allow per-packet control. The proposed API allows the transmission effort to be controlled on a per-packet basis. The API provides a datagram-style service rather than a byte-stream-style service like TCP. This allows applications to maintain finer control over the timing of packet transmissions. Packets of different importance to an application can be transmitted at different effort levels to realize different levels of reliability. The application can also adapt the transmission effort level at run time to find the operational sweet spot.

Provide an energy-based abstraction. The proposed API expresses transmission effort abstractly in terms of an energy factor relative to the amount of energy required for regular transmission. Thus, an energy factor of 1 implies no extra effort is needed, while a factor of 2 allows for up to twice as much energy to be used for transmission of the packet. Compared to an alternative where the transmission effort is expressed in terms of lower-level notions, such as the allowed number of re-transmissions, the energy-based abstraction has a number of advantages. First, applications on resource-constrained nodes can relate to energy consumption most meaningfully; with the energy-based abstraction, the application can pick a desired balance between reliability and energy consumption. Second, using energy as an abstraction allows a variety of techniques, such as increased transmission power, decreased transmission rate, or acknowledgements and retransmissions, to be used transparently by the lower-layer software and hardware depending on the situation.
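One possible lower-layer policy for converting the abstract energy factor into concrete actions can be sketched as below. The split between retransmissions and a transmit-power boost, and all names here, are our own illustrative assumptions; the API itself deliberately leaves this choice to the lower-layer software and hardware.

```python
def effort_budget(energy_factor: float, base_tx_energy_mj: float) -> dict:
    """Illustrative translation of the API's abstract energy factor
    into lower-layer choices. A factor of 1.0 means a normal send;
    a factor of 2.0 permits up to twice the energy for this packet."""
    extra = (energy_factor - 1.0) * base_tx_energy_mj
    # One simple policy: spend the surplus on whole retransmissions
    # first, then on raising transmit power for the final attempt.
    retries = int(extra // base_tx_energy_mj)
    leftover = extra - retries * base_tx_energy_mj
    return {"max_retries": retries,
            "power_boost_mj": round(leftover, 6)}
```

For example, a factor of 2.5 on a 0.5 mJ base transmission would allow one full retransmission plus a 0.25 mJ power boost under this policy.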

Allow applications to manage congestion. Increasing transmission effort may potentially exacerbate congestion in the network; the proposed API explicitly leaves the management of network congestion to the applications. Sensor network applications should be designed correctly to avoid overloading the network. When necessary, an application can use a variety of available techniques, such as rate-based and credit-based flow control [83], to help avoid transient congestion.

6.2.5 Providing a Packet Fragmentation and Reassembly Service

Operational experience from the IP Internet suggests that packet fragmentation is a liability for network performance and software complexity that should be avoided when possible. However, sensor network radios in general allow only very small data payloads in packets. For instance, Chipcon's CC2420 radio currently supports a raw data size of 128 bytes [84], and TinyOS 2.0 uses a message structure with a default payload data size of just 28 bytes [82]. It is easy to find applications that send data units larger than such a small size.

Consider, for example, the Fractional Cascading query servicing algorithm [40], which returns the identifiers of all sensors in a region whose measurements fall in a given range: a set whose size can be arbitrarily large and require the payload of multiple packets. Consider also the distributed wavelet compression algorithm described in this thesis and [32], where transform data describing a node's roles at each scale of a multiscale transform must be sent by the sink before the transform can begin. The number of transform scales, and hence the size of the transform data, depends on the number of nodes in the network and cannot be guaranteed to fit in a single packet.


Our proposed API, therefore, provides a message fragmentation and reassembly service to reduce the burden on application programmers. Note, however, that applications should avoid packet fragmentation as much as possible.

The message fragmentation support of our API reflects the following design decisions:

Require hop-by-hop fragment reassembly. As mentioned in Section 6.1, some applications leverage packet eavesdropping [16, 59, 62], and others will likely benefit from intervening and modifying in-transit packets. To support such applications, fragment reassembly must be performed at each intermediate hop.

Require in-order fragment delivery at each hop. The primary implementation complexity for fragmentation lies in the reassembly of packet fragments when they may arrive out of order. The lower-layer software should therefore provide an in-order fragment delivery service. A simple stop-and-wait protocol [83] is ideal for hop-by-hop, in-order fragment transmissions. Although in many cases a stop-and-wait protocol would sacrifice performance when compared to a sliding-window protocol (e.g., as used in TCP), this is not the case here, since there is no opportunity for pipelining of packets over the single wireless link before fragment reassembly at each hop.
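The fragmentation and stop-and-wait delivery described above can be sketched as follows. The fragment header fields and the link-callback interface are our own illustrative assumptions (the 28-byte payload matches the TinyOS 2.0 default cited earlier); because fragments arrive in order, reassembly reduces to concatenation.

```python
def fragment(message: bytes, payload_size: int):
    """Split an application message into numbered fragments."""
    chunks = [message[i:i + payload_size]
              for i in range(0, len(message), payload_size)]
    return [{"seq": n, "total": len(chunks), "data": d}
            for n, d in enumerate(chunks)]

def stop_and_wait_deliver(frags, link):
    """Send each fragment and block for its per-hop ACK before sending
    the next; in-order arrival makes reassembly a simple join."""
    received = []
    for f in frags:
        while not link(f):        # retransmit until the hop ACKs
            pass
        received.append(f["data"])
    return b"".join(received)

# A lossy single-hop link that ACKs only every second transmission.
state = {"attempts": 0}
def flaky_link(frag):
    state["attempts"] += 1
    return state["attempts"] % 2 == 0
```

Delivering a 70-byte message over 28-byte payloads yields three fragments, each retransmitted until acknowledged.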

6.2.6 Providing Flexible Memory Allocation and Management for Variable-Sized Data

Both the application and the implementation of our sensor network API will in general need to deal with data objects for which the total size is not known in advance. For example, the message fragmentation and reassembly service described in Section 6.2.5 requires a node to collect the data from a variable number of packets (fragments) to reassemble the original application-level message; only once completely reassembled can the API pass the message to the application for processing. As another example, in a sensor network with nodes organized into hierarchical levels (Section 6.2.7), the application on some node may need a list of its immediate children nodes one level below it in the hierarchy; the number of such children included in the list may be a dynamic function of the network topology, making the expected list size difficult to know in advance.

Our proposed API, therefore, provides a flexible memory allocation and management mechanism for such variable-sized data. This mechanism entails the following design decisions:

Define a buffer chain data structure. Variable-sized data are stored in a buffer chain data structure. A buffer chain is a linked list of memory chunks. Each memory chunk contains application data as well as metadata that facilitate the manipulation of the buffer chain. Multiple buffer chains containing different application messages can also be linked together to form a message queue. The idea is similar to the FreeBSD mbuf chain data structure [85]. The primary advantage of this design is that memory management is greatly simplified, since a pool of fixed-size buffers can be pre-allocated by the system and used dynamically to store variable-sized application data without heavyweight memory allocation and de-allocation operations.

Provide a buffer allocation service. The application can request fixed-size buffers from the pool of buffers maintained by the system to create a buffer chain for storing its variable-sized data. When a buffer is no longer needed by the application, it is returned to the pool for future use.
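A minimal mbuf-style sketch of the pool and chain may make this concrete. The class and function names, and the (buffer, bytes-used) tuple standing in for per-chunk metadata, are our own illustration, not the API's specification.

```python
class BufferPool:
    """Pre-allocated pool of fixed-size buffers, avoiding heavyweight
    per-message allocation on a resource-constrained node."""
    def __init__(self, nbufs: int, bufsize: int):
        self.bufsize = bufsize
        self.free = [bytearray(bufsize) for _ in range(nbufs)]
    def alloc(self):
        return self.free.pop()        # raises IndexError when exhausted
    def release(self, buf):
        self.free.append(buf)

def store_chain(pool, data: bytes):
    """Spread variable-sized data across a chain of pooled buffers;
    each link carries the buffer and its bytes-used metadata."""
    chain = []
    for i in range(0, len(data), pool.bufsize):
        chunk = data[i:i + pool.bufsize]
        buf = pool.alloc()
        buf[:len(chunk)] = chunk
        chain.append((buf, len(chunk)))
    return chain

def read_chain(chain) -> bytes:
    """Reassemble the stored data by walking the chain."""
    return b"".join(bytes(buf[:used]) for buf, used in chain)
```

Releasing each buffer back to the pool after use keeps the fixed memory footprint intact across messages.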

6.2.7 Supporting Self-Organized Device Hierarchies

The proposed API is designed to enable self-organization of the sensor network nodes into network hierarchies. Providing such network hierarchies is an important service, since many applications rely on more powerful devices managing the data from less powerful devices; see, for example, the target tracking applications of [55, 59] and the environmental monitoring applications of [77, 78].

Nodes in the network can be heterogeneous in many dimensions. Physically, nodes can have batteries with varying capacities, or they may even be connected to a power grid. They can have varying computation capabilities, data storage resources, and communication bandwidth. In addition, nodes can also have different logical roles in a sensor network. For example, a sink node has the special logical role of a gateway between the sensor network and the outside world and is usually placed at the root of a network hierarchy. The proposed API is based on the following design decisions:

Do not assign static roles to device classes. We have explicitly decided against statically mapping different classes of devices (e.g., the gateway-class Stargates and node-class Micas of Crossbow, Inc.'s product line [4]) to different fixed roles in a hierarchy, since the application should dictate how these resources are used. Suppose, for example, that some Mica nodes are connected to an external power source. With the proposed API, the application will have the flexibility of giving them a special role in the hierarchy to take advantage of their additional resources.

Support a hierarchy level-based abstraction. The proposed API provides applications with the means to assign a node a hierarchy level number at run time. For example, the application can choose specific nodes to be the sinks and set them at level 1, set less powerful Stargate devices at level 2, and finally set the least powerful Mica devices at level 3. The lower-layer software then organizes the nodes to form efficient logical network hierarchies rooted at the level-1 nodes. Each node at level K is associated with a parent node at level K-1 whenever possible. Each node is also associated with one of the nodes at level 1 to enable it to send messages directly to the data sink. This approach allows applications to make very flexible decisions based on both the physical characteristics and the logical roles of network nodes.
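The parent-association rule above can be sketched as follows. The deterministic "pick the smallest name" parent choice is purely our own placeholder; a real lower layer would optimize the logical overlay for performance.

```python
def assign_parents(levels: dict) -> dict:
    """Associate each node at level K with a parent at level K-1
    whenever one exists (None marks roots, e.g. the level-1 sinks).
    Overlay links are logical, so any upper-level node may serve."""
    by_level = {}
    for node, lvl in levels.items():
        by_level.setdefault(lvl, []).append(node)
    parents = {}
    for node, lvl in levels.items():
        uppers = by_level.get(lvl - 1)
        # Placeholder policy: deterministic pick among candidates.
        parents[node] = min(uppers) if uppers else None
    return parents
```

For instance, with a level-1 sink, level-2 Stargates, and level-3 Micas, each Mica attaches to a Stargate and each Stargate to the sink.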

Maintain self-organized logical overlays. It is important to note that the hierarchies constructed are logical overlays, so that a parent and its children need not be within physical radio range. This gives the lower-layer software the flexibility to optimize the overlay structures to achieve good performance.

6.3 API Description

We present in this section the definition of our API for data processing algorithms in sensor networks, from the point of view of the application developers who may use it, and we also provide suggestions on how the underlying network protocols supporting the API can be implemented efficiently in a real system. Our presentation is divided between the API calls for sending messages and those for receiving messages, and, for send calls, among our three different addressing modes: address-based sending, geographic region-based sending, and sending relative to a device hierarchy. For each proposed API call, we also provide citations to representative example data processing algorithms that can directly use the call, based on our survey of application requirements presented in Section 6.1.

6.3.1 Address-Based Sending

There are three principal destination types for address-based sending. The first sends a message to a single node address. The second sends a message to a single address that is a multicast address to which several nodes may be subscribed. The third sends a message to each of a list of node addresses. The calls for each of these send types are specified as follows:

sendSingle(data, address, effort, hopLimit). The parameter data is a pointer to the buffer chain containing the message data. address is the single-address destination for the message, drawn from the physical node address space. effort is an integer specifying the transmission effort level to use at each hop (1 to MAXLEVEL). hopLimit is an integer specifying the maximum number of hops over which the message may be forwarded on its way to the destination address (1 to MAXHOPS). [16,32,40,41,53–56,58,60,62–67,72,74–76]

sendMulti(data, address, effort, hopLimit). The parameters here are as in sendSingle() above, except that address is drawn from the multicast-group addressing space. [56,62]

sendList(data, addrList, effort, hopLimit). The parameters here are as in sendSingle() above, except that addrList is a pointer to a buffer chain containing the list of destination addresses drawn from the physical node address space; the number of addresses in the list can be determined from the length of the data (in bytes) in the addrList buffer chain. [16,32,41,53,63,72,73,76]

Routing for sendSingle() can be done using any existing multihop wireless unicast routing protocol; this problem has been well studied in the literature. Likewise, for sendMulti(), routing may be done using any existing multihop wireless multicast routing protocol; although this problem has received less attention in sensor networks, it has been well studied in the multihop wireless ad hoc networking community.

Routing for sendList() can be done by leveraging the unicast routing protocol used for sendSingle(). For example, the sending node can determine the first hop toward each of the destinations listed in addrList. For all destinations with a common first hop, the sending node can forward a single copy of the packet to that first-hop node; this process is repeated for each distinct first-hop node needed for the routes to all destinations listed in addrList. In the header of the packet sent to each unique first-hop node in this way, the sending node includes a list of all destinations from addrList reachable through that specific first-hop node. Each of these nodes, upon receiving the packet, then repeats this process with the remaining address list.
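
The first-hop grouping above can be sketched as follows. This is a minimal illustration, not part of the proposed API: `first_hop` stands in for the underlying unicast routing protocol's next-hop lookup, and `send_to_neighbor` for the transmission of one packet copy to a neighbor.

```python
from collections import defaultdict

def forward_send_list(data, addr_list, first_hop, send_to_neighbor):
    """One forwarding step of sendList(): group the remaining destinations
    by their first hop, then send one packet copy per distinct first-hop
    node, with the sub-list of destinations carried in that copy's header."""
    groups = defaultdict(list)
    for dest in addr_list:
        groups[first_hop(dest)].append(dest)   # destinations sharing a next hop
    for hop, dests in groups.items():
        send_to_neighbor(hop, dests, data)     # receiver repeats with `dests`
```

Each receiving node would invoke the same routine on the destination list found in its copy's header, so the packet fans out along the unicast routes without duplicate transmissions on shared path prefixes.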

Multi-hop transmissions are only allowed to propagate a limited number of hops (hopLimit) to allow applications to control the scope of their packets and as a safeguard against routing loops in the underlying routing protocols.

6.3.2 Region-Based Sending

Applications may often want to address all nodes within certain geographic constraints. This may include all nodes within a certain number of hops of a given node, all nodes within a certain radius of a given node, or all nodes within an arbitrary region of space. The API calls to support this are specified as follows:

sendHopRad(data, hopRad, effort, hopLimit). The parameters here are as in sendSingle() above, except that hopRad is an integer specifying a hop-count from the sending node within which all neighboring nodes are intended to receive the message. hopRad can take a value ranging from 1, corresponding to immediate radio neighbors, to MAXHOPS, corresponding to a network-wide flood. [32,41,58,61,62,64–70,74,75]

sendGeoRad(data, geoRad, outHops, effort, hopLimit). The parameters here are as in sendSingle() above, except that geoRad is a floating point number specifying a geographic distance (in standardized units) from the sending node within which all neighboring nodes are intended to receive the message, and outHops specifies the maximum number of hops that packets are allowed to propagate outside the specified region in order to route around voids inside the region, attempting to reach all intended nodes. [32,62,70,71]

sendCircle(data, centerX, centerY, radius, single, outHops, effort, hopLimit). The parameters here are as in sendSingle() above, except that centerX and centerY are floating point numbers that define the coordinates of the center of a circle, and radius is a floating point number specifying the radius from that point within which all nodes are intended to receive the message (all in standardized units). single is a boolean flag indicating how many sensors in the area must be reached: single = 1 specifies that only one sensor in the area must receive the message [56], whereas single = 0 specifies that all sensors in the area are intended to receive the message. [56,57,62]


sendPolygon(data, vertCount, vertices, single, outHops, effort, hopLimit). The parameters here are as in sendSingle() above, except that vertCount is an integer specifying a number of polygon vertices (1 to some number MAXVERTS), and vertices is a pointer to an array of floating point numbers representing the spatial coordinate pairs of the vertices (in standardized units). single = 1 specifies that only one sensor within the convex hull formed by the vertex list must receive the message [40,56], whereas single = 0 specifies that all sensors in this area are intended to receive the message. [40,53,56,62]

The sendHopRad() API call can be implemented by any form of a flooding protocol (or using a spanning tree protocol); as a special case, if hopRad = 1, sendHopRad() can be implemented as a single link-layer broadcast transmission of the packet. The sendGeoRad() API call is similar, except that nodes forward the flood if they are still inside the specified geoRad radius around the originating node, or if they are within outHops beyond the first node encountered outside this radius. The use of outHops increases the chance of reaching all nodes inside the radius (at the expense of increased overhead, as controlled by the application), despite the presence of voids inside the circle.
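
A per-node forwarding rule for the sendGeoRad() flood might look like the following sketch; it assumes (our assumption, not specified by the API) that each packet carries the originator's coordinates, geoRad, outHops, and a count of consecutive hops taken outside the radius.

```python
import math

def should_forward_geo_rad(my_pos, origin, geo_rad, out_hops, hops_outside):
    """Decide whether a node re-broadcasts a sendGeoRad() flood packet.

    Forward if this node is still inside the geo_rad circle around the
    originator, or if the packet has traveled fewer than out_hops hops
    beyond the last node inside the circle (to route around voids).
    Returns (forward, hops_outside value to stamp on the forwarded copy).
    """
    if math.dist(my_pos, origin) <= geo_rad:
        return True, 0                      # inside: reset the outside count
    if hops_outside < out_hops:
        return True, hops_outside + 1       # outside, but still within budget
    return False, hops_outside              # outside and budget exhausted
```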

For the sendCircle() call, routing can be done by adapting any existing geographic routing protocol; the problem of geographic routing has been well studied in the literature. The sending node routes the packet to geographic coordinates that are the center of the circle (centerX, centerY). However, once the packet is received by (or overheard by) the first node that is inside this circle (the node need not be at the center coordinates), geographic forwarding of the packet terminates, and this node instead initiates a form of the protocol used for sendGeoRad(), giving centerX, centerY, radius, and outHops to define the flood of the packet. Most flooding protocols use a unique identifier for multiple packets that are part of the same flood, to ensure that the flood expands efficiently in a well controlled manner; by assigning the unique identifier at the original sending node (the node initiating the sendCircle() call), the resulting flood will be well controlled, even if the packet under geographic forwarding is overheard by multiple nodes that all initiate copies of the flood — the different copies of the flood will in effect merge into a single flood.

For the sendPolygon() API call, we proceed similarly to the sendCircle() call, with the region boundary evaluated for flooding purposes as the convex hull of the vertex list.

6.3.3 Device Hierarchy Sending

In any sensor network, there will typically be a hierarchy induced by a central data sink and the nodes of the network. In a sensor network with multiple classes of non-sink devices (e.g., low power sensor nodes and higher power intermediate nodes), we support extending this device hierarchy to reflect these additional device classes. The API calls for sending in the device hierarchy are specified as follows:


setLevel(level). The parameter level is an integer specifying the level for this node in the hierarchy. We assume that each node, in an application-specific manner, has access to the level of the hierarchy it should occupy.

sendSink(data, effort, hopLimit). The parameters here are as in sendSingle() above. The message is sent to the “best” available sink node in the network, where the choice of sink node is determined for the application by the API. [32,53,54,59,62,64,69,70,73,75,77,78]

sendParent(data, effort, hopLimit). The parameters here are as in sendSingle() above. The message is sent to the node’s parent in the device hierarchy. [55,59,77,78]

sendChildren(data, effort, hopLimit). The parameters here are as in sendSingle() above. The message is sent to each of the node’s children in the device hierarchy. [59,77,78]

parent = getParent() returns the address of the node’s current parent in the device hierarchy.

childList = getChildren() returns a pointer to a buffer chain containing a list of the addresses of the node’s current children in the device hierarchy; the number of children in the list can be determined from the length of the data (in bytes) in the childList buffer chain. This API call supports, for example, a node sending to a subset of its children by calling sendList() using a subset of the returned child list.

newParent(parent). This API call is invoked by the API implementation as an event into the application when the node’s parent in the device hierarchy has changed. The parameter parent gives the address of the node’s new parent.

newChildren(childList). This API call is invoked by the API implementation as an event into the application when one or more of the node’s children in the device hierarchy has changed. The parameter childList is a pointer to a buffer chain containing a list of the node’s current children; the number of children in the list can be determined from the length of the data (in bytes) in the childList buffer chain.

In order to form the device hierarchy, the API implementation can cause each level-n node (at all but the nodes with the largest level number) to broadcast a message announcing itself as a level-n device. Any level-(n+1) node hearing this message may consider the announcing node a potential parent. The level-(n+1) node then chooses its parent from among the announcing level-n nodes and contacts that parent to inform it of its status as a level-(n+1) child. The lowest level of nodes (typically the lowest power sensor nodes) never advertise themselves and only look for messages from potential parents. The highest level (sink) node(s) never look for potential parents.
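
The handshake can be illustrated with a centralized simulation over a known radio-neighbor graph; the function and its tie-breaking rule are illustrative assumptions, since the API leaves the parent-choice metric to the implementation.

```python
def form_hierarchy(levels, neighbors):
    """Simulate parent selection: each node hears announcements from its
    radio neighbors one level above it in the hierarchy (level number one
    smaller) and picks one of them as its parent.

    levels: {node: level}, with 1 for sinks and larger numbers for less
    powerful devices; neighbors: {node: set of nodes in radio range}.
    Returns {node: parent or None}.
    """
    parents = {}
    for node, lvl in levels.items():
        if lvl == 1:
            parents[node] = None        # sinks never look for a parent
            continue
        candidates = [nb for nb in neighbors[node] if levels[nb] == lvl - 1]
        # deterministic tie-break stands in for a real link-quality metric
        parents[node] = min(candidates) if candidates else None
    return parents
```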

The API implementation maintaining this process can periodically update parent assignment to cope with changing network conditions. When a node’s parent changes, the API implementation invokes the newParent() event in the application if the application had earlier requested knowledge of its parent through a getParent() call. Likewise, when a node’s list of children changes (a new node becomes a child, or an old child is no longer associated with this node), the API implementation invokes the newChildren() event in the application if the application had earlier requested knowledge of its children through a getChildren() call.

We considered providing a sendLevel() call in our proposed API, as a more general form of the sendParent() and sendChildren() calls. Such a sendLevel() call, given a level number in the device hierarchy, would send to the respective node(s) at that level. For example, for a node at level n (n > 1), a sendLevel() to level n−1 would be equivalent to a sendParent() call, and to level n+1 would be equivalent to a sendChildren() call; a sendLevel() call need not be limited, however, to only sending to levels n−1 and n+1, creating an easy way to send to “grandchildren” (level n+2) and “great-grandparents” (level n−3), for example. We do not include such a call in the API, though, since we do not find the need for this generality in our survey of application requirements, as presented in Section 6.1.

6.3.4 Receiving

The three main receive functions correspond to the three receiving modes. A node can (1) receive a message for which it is the target destination, (2) intercept and potentially modify a message for which the node is a forwarder on the path to a different target destination, or (3) passively eavesdrop (without modification) on all messages overheard by the node’s radio receiver. The API calls for receiving messages are specified as follows:

receiveTarget(data, metadata). This API call is invoked by the API implementation as an event into the application when a message has been received. The parameter data is a pointer to a buffer chain holding the message data, and metadata is a pointer to a buffer chain holding the header information appropriate for the message packet type.

receiveForward(data, metadata). The parameters are the same as in receiveTarget(). This API call is invoked by the API implementation as an event into the application when a message has been received. If the application returns a zero result value in response to this call (the function return value), then the message will not be forwarded (forwarding stops at this node). If instead the application returns a nonzero result value, the message (possibly modified by the application) will be forwarded along to the next hop to the destination(s).

receiveOverhear(data, metadata). The parameters are the same as in receiveTarget(). This API call is invoked by the API implementation as an event into the application when a message has been received. Any message overheard by this node’s radio will be received, regardless of the addressing of the message. [16,59,62]

Within any of these receive events within the application, the application handler for that event can then parse the metadata for any packet-specific fields it cares to extract. All packet types will support the following two calls:


type = getPacketType(metadata)
sender = getSender(metadata)

where type is an integer corresponding with one of the packet classes, and sender specifies the address of the sender in the physical address space. With this type information in hand, the application layer can then use similar calls to extract packet fields such as the destination address for sendSingle() or the (remaining) list of destinations for sendList(), the hop radius for sendHopRad() or the geographic radius for sendGeoRad(), or the center and radius for sendCircle() or the vertices for sendPolygon().
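
In a Python-flavored sketch (the API does not fix an implementation language), a receive handler built on these accessors might look like the following; the metadata field names and packet-class constants are illustrative assumptions.

```python
# Illustrative packet-class constants; the API text only requires that
# getPacketType() return an integer identifying the packet class.
PKT_SINGLE, PKT_LIST, PKT_HOP_RAD, PKT_CIRCLE = range(4)

def get_packet_type(metadata):
    return metadata["type"]

def get_sender(metadata):
    return metadata["sender"]

def receive_target(data, metadata):
    """Example receiveTarget() handler dispatching on the packet class."""
    ptype = get_packet_type(metadata)
    if ptype == PKT_LIST:
        # remaining destination list, as left by sendList() forwarding
        return ("list", get_sender(metadata), metadata["addr_list"])
    if ptype == PKT_CIRCLE:
        return ("circle", metadata["center"], metadata["radius"])
    return ("other", get_sender(metadata))
```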

6.4 Application Examples

We now present detailed treatments of a selection of the surveyed papers [32,40,56,62], chosen for the range of communication patterns they exhibit. We briefly describe the objective of each proposed algorithm and show how it can be implemented using the API calls outlined in the previous section.

6.4.1 Distributed Wavelet Compression

First, we consider the distributed wavelet compression application discussed in detail earlier in this thesis and presented in [32]. Recall that this entails first computing a distributed WT of data within the network and then selectively streaming wavelet coefficients from nodes to the sink. First, however, the sink must learn of each node’s self-localized position, sent using a sendSink() call by each node. Using this information, the sink computes a set of transform data for each node and sends it to the node using a sendSingle() call.

Given this transform data, the multi-scale WT, based on the theory of wavelet lifting, proceeds in the network as follows. A subset of nodes at each scale are designated to compute WT values. Each node in this set computes its value — called a wavelet coefficient — using values from neighbors not generating wavelet coefficients. Each such neighbor knows which nodes will require its value — called a scaling coefficient — and transmits this value to those nodes, using an instance of the sendList() call. Each node that computes a wavelet coefficient collects these neighboring values, computes its coefficient, and sends the coefficient value back to the neighbors, again using an instance of the sendList() call. Each neighbor collects the list of wavelet coefficients sent to it and uses the information to compute a new coarser-scale scaling coefficient for itself. The process then repeats at the next scale on the remaining scaling coefficient nodes, with a subset giving rise to wavelet coefficients at the new scale and the remainder participating as scaling coefficients in further scales of the transform. The ease of coding this procedure using the proposed API calls is illustrated by the pseudocode implementation calcWT(n,j), shown in Algorithm 1, running on node n at scale j.


Algorithm 1 calcWT(n, j)

1: if n ∈ ∆j then
2:   repeat
3:     do nothing
4:   until receive cj+1,γ from each γ ∈ Nj,n
5:   predict dj,n using {cj+1,γ}γ∈Nj,n
6:   sendList(dj,n, Nj,n, effort, hopLimit)
7: else if n ∈ Γj then
8:   sendList(cj+1,n, Mj,n, effort, hopLimit)
9:   repeat
10:    do nothing
11:  until receive dj,λ from each λ ∈ Mj,n
12:  update cj,n using {dj,λ}λ∈Mj,n
13: end if

Once the transform has iterated to a final, coarsest scale, each node has a transform coefficient replacing its original measurement. This set of coefficients is much more sparse than the original measurement set — in other words, the energy of the measured signal is concentrated at far fewer nodes. To harvest a lossily compressed version of the measurement field, the sink broadcasts a threshold to all sensors using the sendHopRad() call with hopRad = MAXHOPS. Upon receiving this threshold query, each node with a coefficient whose magnitude is above the threshold sends its coefficient to the sink using the sendSink() call. The process may repeat with subsequent threshold broadcasts from the sink and node replies until the sink has harvested enough coefficients to reconstruct the field to some desired fidelity.
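
The sink's side of this exchange can be sketched as a simulation; the stopping rule (a target coefficient count) is a stand-in for whatever fidelity criterion an application would actually use.

```python
def harvest_coefficients(coeffs, thresholds, target_count):
    """Simulate iterative threshold harvesting at the sink.

    coeffs: {node: transform coefficient} left in the network by the WT.
    thresholds: decreasing thresholds broadcast in turn (via sendHopRad()).
    target_count: stop once this many coefficients have been collected.
    Returns the {node: coefficient} map received via sendSink() replies.
    """
    harvested = {}
    for t in thresholds:
        for node, c in coeffs.items():
            if node not in harvested and abs(c) >= t:
                harvested[node] = c      # node replies with its coefficient
        if len(harvested) >= target_count:
            break                        # desired fidelity reached
    return harvested
```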

To provide robustness to occasional node and routing failures, [32] also proposes a mechanism for nodes to repair transform data in a distributed fashion. If a wavelet coefficient-generating node cannot hear from a required neighbor, it can begin to search for new neighbors outward in an expanding radial neighborhood. Such a request is implemented using the sendGeoRad() call, and replies from potential neighbors return using the sendSingle() call. New and remaining neighbors must be informed of this change using further sendList() and sendSingle() calls.

6.4.2 TinyDB

In [62], the authors describe the TinyDB query engine for sensor networks. Basic aggregate queries — such as minima, maxima, averages, counts, and sums — are posed at the network’s data sink. The sink broadcasts the query to the network with an incrementing hop counter that allows nodes to form an application-specific hierarchy to service the query. Each node receiving the query picks a parent one hop closer to the sink than itself and re-broadcasts the query to potential children in its 1-hop neighborhood. This query dissemination is handled as a network-wide flood at the sink using the sendHopRad() call with hopRad = MAXHOPS. As the query forwards, each node can extract its minimum distance to the sink from the decrementing hopLimit field.

Once the query has reached the entire network, each leaf node (a node that has no children) evaluates the query using its data and forwards the result to its parent. Each parent aggregates messages from its children with its own datum and forwards the result to its parent until the final aggregate is computed at the sink. Each child-to-parent message is sent using the sendSingle() API call.
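
The leaf-to-sink aggregation can be sketched over a known query tree; the parent map and the MAX aggregate below are illustrative assumptions.

```python
from collections import defaultdict

def aggregate_up(parents, readings, combine=max):
    """Compute the aggregate that arrives at the sink of a TinyDB-style
    query tree: each parent folds its children's partial results into its
    own datum and forwards one value upward (one sendSingle() per edge).

    parents: {node: parent}, with the sink mapped to None.
    readings: {node: local datum}; combine: two-argument aggregate operator.
    """
    children = defaultdict(list)
    for node, par in parents.items():
        if par is not None:
            children[par].append(node)
    sink = next(n for n, p in parents.items() if p is None)

    def result_at(node):
        value = readings[node]
        for child in children[node]:
            value = combine(value, result_at(child))
        return value

    return result_at(sink)
```

Other basic aggregates drop in via `combine` (e.g. a two-argument sum), which is what makes this in-network scheme general.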

TinyDB extends basic sink query servicing to support more advanced event-driven queries issued from nodes in the network. On detection of an event (say, a bird entering a nest monitored by a node), the node can query all neighbors within a specified radius for data (such as light and temperature) using the sendGeoRad() call. Data from the neighboring nodes returns to the querying node using sendSingle() calls, and a report is sent to the sink using sendSink() addressing. Alternatively, the report can be sent to a storage point within the network that is addressable by all nodes but does not occupy a fixed location. In this case, the querying node sends its report to a multicast group of storage nodes using the sendMulti() call.

As an example of the utility of TinyDB, the authors consider the problem of tracking a vehicle using magnetometer readings at each sensor. Nodes in the network initially start out in a low power mode, and those in the vicinity of the target power up when it first enters the sensor network, using dedicated wake-up circuitry. Each active node then monitors the running average of its magnetometer and, when a threshold is exceeded, inserts its measurement and node identifier into a storage point (again using the sendMulti() call). The storage point estimates the target’s location using the location of the node with the strongest reading. Nodes can also eavesdrop on their neighbors’ messages to the storage point using the receiveOverhear() call and suppress their own transmissions when neighbors have a stronger reading. Note that use of the receiveForward() receiver would also allow nodes with stronger readings to suppress forwarding messages from nodes with weaker readings. Finally, when a sensing node detects that the target is moving out of range, it can wake up nodes in the next area to be traversed by the target using a region-based send such as sendCircle() or sendPolygon().

6.4.3 Distributed Multi-Target Tracking

The authors of [56] tackle the problem of tracking multiple maneuvering targets in a decentralized fashion. Maintaining a complete record of the joint state space of targets in any one component of the sensor network is impractical; it is far more efficient to keep state information for each target in tracking agents occupying nearby nodes. When targets’ paths cross, however, their locations must be estimated jointly, and target identity upon track divergence becomes uncertain. Agents separately tracking each target following the split must maintain a dialogue to determine which of the targets they are actually tracking.

Before a merge, each target is tracked by an agent occupying a node designated as the track leader. To localize its target, the node collects measurements from other nodes in the vicinity of the target — a so-called “geographically constrained group” (GCG) defined as a circle or polygon. This process involves sendCircle() or sendPolygon() API calls, and each measurement returns to the track leader via a sendSingle() call. As the target moves, the track leader duty is passed to a new node in the target’s path using sendCircle() or sendPolygon() with the single-node-only option. When two targets come within a threshold range of each other, the two GCGs are merged and serviced by a single track leader. As the two targets subsequently diverge, two new GCGs, each with its own track leader, are created. To disambiguate target identities, nodes serving as track leaders must maintain a dialogue for a time, so they form an “acquaintance group” (AG). Since tracking agents move from node to node to follow their targets, we represent the AG as a multicast group to which each leader subscribes. Messages passed between leaders to sort out target identities are then sent using the sendMulti() call.

6.4.4 Fractional Cascading

In [40], the authors present a framework called Fractional Cascading for answering range queries injected from anywhere in the network (as discussed in Section 3.1). Each node in the network has a global view of the measurement field that decays in resolution proportional to distance from the node — that is, a node knows much more about measurements from nearby nodes than from those far away. This allows queries posed at arbitrary locations, counting or enumerating all sensors in a rectangular area whose measurements lie in some range, to be efficiently routed to regions with relevant information.

The structure that enables efficient query processing is a virtual quadtree partition of the bounding box containing the sensor field. This square box is partitioned into four sub-squares, and each of these is recursively partitioned into four smaller sub-squares, and so on, until a minimum square size is reached. At each scale, the larger square giving rise to the four smaller squares is declared a parent and the smaller squares its children. To guide query routing, the maximum measurement value of any sensor node in a given quadtree square is stored in the nodes in all children of that square’s parent — i.e., its sibling squares. This structure is built in a bottom-up fashion. One node in a square (that by induction has access to the maximum value in each of that square’s children) computes the square’s maximum value and sends the value to all nodes in the square’s siblings, implemented using a sendPolygon() call.

The answer to each range query is compiled in three steps. First, the query is directed toward a sensor in the region of interest, which requires a sendPolygon() API call with the single-node-only option. Once in the region, the query sequentially visits each sub-region designated as a “canonical piece” — that is, a quadtree square that is completely contained inside the query region but whose parent square extends outside the region. This again necessitates a sendPolygon() API (single-node-only) call. Once the query has arrived at any sensor in the canonical region, it can exploit the distributed information structure to efficiently traverse the sub-tree for that region, recursively visiting each child square in a fashion appropriate to the query, again using a sendPolygon() API (single-node-only) call. For example, if the query wishes to compile a list of sensors with measurements above a threshold value, the query need not recurse on any children or siblings of a square with a maximum value below the threshold. Since any sensor in each square records the maximum value in that square and its sibling squares, the query can efficiently traverse the squares of each canonical region’s subtree.

Finally, the aggregated query data return to the querying node using a sendSingle() API call.
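
The pruned subtree traversal can be illustrated with a centralized sketch over the stored per-square maxima; the dictionary layout of the quadtree is our assumption.

```python
def threshold_query(square, threshold):
    """Enumerate sensors with measurements above `threshold`, skipping any
    quadtree square whose stored maximum rules out all of its sensors.

    square: {"max": float, "children": [subsquares], "sensors": [(node, value)]},
    where "sensors" is populated only at leaf squares.
    """
    if square["max"] <= threshold:
        return []                        # pruned: nothing here can qualify
    if not square["children"]:
        return [n for n, v in square["sensors"] if v > threshold]
    hits = []
    for child in square["children"]:
        hits += threshold_query(child, threshold)
    return hits
```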


Chapter 7

Conclusions and Future Work

We conclude the thesis with a brief summary of our contributions and a discussion of ongoing work.

7.1 Conclusions

In this thesis, we have developed a novel method of distributed, multi-scale analysis of sensor network data. The proposed technique is extremely easy to distribute and compute in the sensor network setting, tolerates a wide variety of measurement classes, and allows for swift decoding of harvested data by the network user. Based on the wavelet theory of lifting, the proposed transform is completely tolerant of the irregular spatial placement of sensor nodes expected in real-world sensor network deployments. We have proven that transform coefficients inherit the decay properties of coefficients arising from traditional, regular-grid wavelet transforms (WTs). We have demonstrated through numerical simulations the applicability of transform data to the tasks of distributed compression of and distributed noise removal from measured data. We have shown through ns-2 networking simulations that the collaborative cost of distributed wavelet analysis is more than offset by the energy savings gained in subsequent applications such as distributed compression.

We have then expanded our scope past distributed wavelet analysis to consider the communication requirements of the broad variety of distributed processing techniques proposed in the sensor networking literature. Surveying more than 100 papers from the proceedings of IPSN, we have constructed a taxonomy of common collaborations between nodes in sensor network applications. Based on this specification, we have gone on to design a network API to enable easy programming of sensor network applications by their designers. The implementation of this API will allow for a new level of practicality in future distributed algorithm designs.

7.2 Future Work

The work presented here can be extended in a variety of ways. First, the distributed wavelet representation likely lends itself to a variety of uses past distributed compression and de-noising of measured data. As one example, the interpolation of neighboring scaling coefficients used by nodes in the predict stage of lifting to compute their wavelet coefficients can easily be leveraged to reconstruct measurements at varying spatial resolutions from nodes that have failed to record data due to temporary or permanent failure. As another example, DIMENSIONS [38] proposes use of the multi-resolution hierarchy of distributed wavelet analysis to efficiently guide routing of queries in the network. Queries are initially processed on the coarse scale summary of data held in, say, the coarsest set of scaling coefficients. The queries are then only forwarded to regions described by those coefficients satisfying the criteria. Recall that DIMENSIONS assumes that the sensor network forms a regularly-spaced, square sampling grid. Thus, scaling coefficients form a distributed quad-tree hierarchy — one that is dyadic in each dimension — and routing to regions satisfying the query is relatively straightforward. The hierarchical structure governing the scaling coefficients from irregular-grid wavelet analysis is more complicated, and more study will be required to understand the structure and determine how to leverage such a structure for efficient query routing.

Implementing the proposed transform in a medium- to large-scale sensor network testbed is crucial to developing a greater understanding of its practical applicability. Though we have deployed the algorithm with success on a small scale, we have not been able to characterize the transform’s performance in the dynamic networking environment that results from a large scale deployment. Issues such as communication latency will no doubt have an interesting impact on the transform’s performance. Such an evaluation will also allow us to explore in greater detail the tradeoff between collaborative spatio-temporal wavelet processing across the network and isolated, non-collaborative temporal processing at each node. As demonstrated in Sections 4.5 and 5.3 for the applications of compression and de-noising, respectively, there exist a wide variety of measurement types and network configurations in which spatio-temporal analysis provides greater performance than simple temporal analysis. It remains to be seen, however, how great these benefits are when the cost of the spatial collaborations is taken into account. Thus, implementation and study of the transform in a real networking environment is crucial to better understanding this tradeoff.

This need for easy prototyping leads to the most crucial element of ongoing work: implementing the API calls outlined in Chapter 6. Algorithm designers must be given an easy way to quickly evaluate their designs on real sensor network hardware to assess the practicality of their work. Thus, the API design must result in code artifacts that can be easily re-used by developers. Section 6.3 suggests possible implementations for each family of API calls, including off-the-shelf multi-hop and geographic routing protocols for address-based sending and region-based sending, respectively. Simply aggregating these off-the-shelf solutions to support the API calls, however, is not a viable approach. As Section 6.4 illustrates, many algorithms are likely to have a number of communication patterns operating in tandem. Implementing each solution in isolation will likely lead to protocols competing for network resources when they could otherwise be cooperating and saving overhead through re-use of common components.
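The resource-sharing point can be made concrete with a small sketch. The class and method names below are hypothetical and do not reflect the actual Chapter 6 API; the sketch only shows two send families re-using a single neighbor-state component rather than each maintaining its own.

```python
# Hypothetical sketch: if each routing primitive keeps its own neighbor state,
# resources are duplicated; sharing one component across the API families
# avoids that. Names are illustrative, not the Chapter 6 API.

class NeighborTable:
    """Common component: one copy of per-neighbor state shared by all senders."""
    def __init__(self):
        self.neighbors = {}          # node id -> (x, y) position

    def update(self, node_id, pos):
        self.neighbors[node_id] = pos

class AddressSender:
    """Address-based family: routes by node identifier."""
    def __init__(self, table):
        self.table = table           # re-used, not duplicated

    def known(self, node_id):
        return node_id in self.table.neighbors

class RegionSender:
    """Region-based family: routes by geographic region."""
    def __init__(self, table):
        self.table = table           # same shared table

    def in_circle(self, center, radius):
        cx, cy = center
        return [n for n, (x, y) in self.table.neighbors.items()
                if (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2]

table = NeighborTable()
table.update(1, (0.0, 0.0))
table.update(2, (3.0, 4.0))
addr = AddressSender(table)
region = RegionSender(table)
print(addr.known(2), region.in_circle((0.0, 0.0), 5.0))  # -> True [1, 2]
```

In an actual node implementation the shared component would also include the beacon machinery that keeps this state fresh, which is exactly the overhead worth paying only once.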

Instead, we must jointly develop routing solutions for each family of communication patterns so that they operate well with each other. For the case of multi-hop communication, the COMPASS architecture [86] provides an excellent starting point. Based in part on the Safari routing mechanism [87], the COMPASS architecture forms a probabilistic, self-organizing clustering of nodes that enables multi-resolution communication and scales to a very large number of sensors with diverse capabilities. Hierarchical addressing of nodes, with addresses assigned based on each level of cluster membership, provides the routing layer the mechanism it needs to find a routing path between any two nodes, even in the presence of node-free routing voids on the shortest path between the pair. The COMPASS architecture also has the benefit of statelessness, in that nodes need not maintain per-destination routing state. COMPASS does not, however, tend to find the shortest route between two points. To do so, we require a geographic routing protocol, where destinations are specified using their geographic coordinates and each next hop is chosen as the neighboring node closest to that destination. Such a mechanism is also useful for routing to regions specified by the sendCircle() and sendPolygon() region-based routing calls, where the first step involves routing toward the central point of the specified region. Unfortunately, these approaches cannot cope well with the routing voids around which COMPASS paths manage to navigate. Thus, the multi-scale and geographic protocols will both prove useful in address-based and region-based routing, suggesting an integrated solution. Additionally, both approaches require some notion of periodically sending meta-data, such as node location or hierarchical COMPASS address, in beacon packets that propagate a certain number of radio hops from the sender. Including the data required by each protocol in the payload of a single beacon packet can significantly reduce the communication overhead of building and maintaining the routing system.
Finally, both geographic routing protocols and COMPASS require some lookup service to map node identities to geographic coordinates or hierarchical COMPASS addresses. Intelligently exploiting the additional resources of the more powerful nodes assumed in the device hierarchy API calls can greatly facilitate implementing this lookup service in an efficient manner.
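A minimal sketch of the greedy geographic forwarding rule just described, on a hypothetical four-node topology, shows both its strength (it heads straight for the destination) and the routing-void failure mode that motivates integrating it with COMPASS:

```python
# Hypothetical sketch of greedy geographic forwarding: each hop hands the
# packet to the neighbor closest to the destination. The route stalls at a
# node with no closer neighbor (a routing void), which COMPASS-style
# hierarchical routing can navigate around.

import math

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def greedy_route(positions, links, src, dst):
    """Forward greedily toward dst; return (path, reached_flag)."""
    path, cur = [src], src
    while cur != dst:
        closer = [n for n in links[cur]
                  if dist(positions[n], positions[dst]) < dist(positions[cur], positions[dst])]
        if not closer:                      # routing void: greedy progress stalls
            return path, False
        cur = min(closer, key=lambda n: dist(positions[n], positions[dst]))
        path.append(cur)
    return path, True

positions = {"A": (0, 0), "B": (1, 0), "C": (2, 0), "D": (4, 0)}
links = {"A": ["B"], "B": ["A", "C"], "C": ["B"], "D": []}  # no link spans the C-D gap
print(greedy_route(positions, links, "A", "D"))  # -> (['A', 'B', 'C'], False)
print(greedy_route(positions, links, "A", "C"))  # -> (['A', 'B', 'C'], True)
```

The positions consumed by `dist()` are exactly what the lookup service described above must supply, which is why the two mechanisms must be designed together.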


Appendix A: Proof of Theorem 3.3.1

As a first step, we prove that (3.26) implies (3.27), assuming that $N_j(\lambda)$ is contained in the ball $|x - \lambda| \le C_L 2^{-j}$ with $C_L = C_L(m,d)$ some fixed constant. Denoting by $e = (e_\alpha)_{|\alpha| \le m}$ the first line of $G^{-1}$, we have according to (3.25) that
$$ |u_\lambda| = |x_0| = |\langle e, y \rangle| \le \|e\|_{\ell^2} \|y\|_{\ell^2} \le C_G \|y\|_{\ell^2}, \qquad (A.1) $$
with
$$ y_\alpha := \sum_{\gamma \in N_j(\lambda)} q_\alpha(\gamma)\, u_\gamma. \qquad (A.2) $$
According to the definition of the polynomials $q_\alpha$, we have
$$ |q_\alpha(y)| \le 1, \qquad y \in N_j(\lambda). \qquad (A.3) $$
Also, since $\mu(\Gamma_j) > 2^{-j}$, we have
$$ \#(N_j(\lambda)) \le P, \qquad (A.4) $$
where $P = P(C_L + 1/2, 1/2, d)$ is the maximal number of balls of radius $1/2$ that are disjointly contained in the ball of radius $C_L + 1/2$. It follows that
$$ |y_\alpha| \le P \sup_{\gamma \in N_j(\lambda)} |u_\gamma|, \qquad (A.5) $$
so that using (A.1) we obtain
$$ |u_\lambda| \le C_A \sup_{\gamma \in N_j(\lambda)} |u_\gamma|, \qquad (A.6) $$
with $C_A := C_G P (\dim(\Pi_m))^{1/2} = C_A(m,d)$. Since this is valid for all choices of $(u_\gamma)$, this is equivalent to (3.27).

We now turn to the proof of (3.26). A first remark concerns the properties of $G$. Since $G$ is the Grammian matrix of the system $(q_\alpha)_{|\alpha| \le m}$ for the $\ell^2(N_j(\lambda))$ inner product, it is clear that $G$ is symmetric and positive, but not necessarily positive definite. We next remark that if $N_j^0(\lambda)$ is a subset of points in $N_j(\lambda)$ and $N_j^1(\lambda)$ is its complement, then we can decompose $G$ according to
$$ G = G_0 + G_1, \qquad (A.7) $$
where $G_0$ and $G_1$ are the Grammian matrices for the $\ell^2(N_j^0(\lambda))$ and $\ell^2(N_j^1(\lambda))$ inner products, respectively. It follows that the invertibility of $G_0$ implies the invertibility of $G$ with
$$ \|G^{-1}\| \le \|G_0^{-1}\|. \qquad (A.8) $$
It is therefore sufficient to prove (3.26) when $N_j(\lambda)$ is replaced by an appropriate subset $N_j^0(\lambda)$. We now explain how such a subset can be constructed. Consider the set of points in the half unit simplex
$$ \mathcal{N} := \left\{ z_\alpha = \left( \frac{\alpha_1}{2m}, \cdots, \frac{\alpha_d}{2m} \right) \; : \; \alpha_i \ge 0, \; \alpha_1 + \cdots + \alpha_d \le m \right\}. \qquad (A.9) $$
It is well known from classical finite element theory that this set of points is $\Pi_m$-unisolvent: for any vector $(u_\gamma)_{\gamma \in \mathcal{N}}$, there exists a unique polynomial $p$ such that $p(\gamma) = u_\gamma$ for all $\gamma \in \mathcal{N}$. Introducing the basis
$$ r_\alpha(x_1, \cdots, x_d) = \prod_{i=1}^{d} x_i^{\alpha_i}, \qquad |\alpha| = \alpha_1 + \cdots + \alpha_d \le m, \qquad (A.10) $$
it follows that the Grammian matrix $H$ of $(r_\alpha)_{|\alpha| \le m}$ for the $\ell^2(\mathcal{N})$ inner product is non-singular.

Consider now a more general set $\tilde{\mathcal{N}} = (\tilde{z}_\alpha)_{|\alpha| \le m}$ and the corresponding Grammian matrix $\tilde{H}$ of the same system $(r_\alpha)_{|\alpha| \le m}$ for the $\ell^2(\tilde{\mathcal{N}})$ inner product. By continuity of the entries of the matrix with respect to the points $\tilde{z}_\alpha$, there exists some $0 < \eta < \frac{1}{2m}$ such that if for all $\alpha$
$$ |\tilde{z}_\alpha - z_\alpha| \le \eta, \qquad (A.11) $$
we have
$$ \|\tilde{H}^{-1}\| \le C_G, \qquad C_G := 2\|H^{-1}\|. \qquad (A.12) $$
Note that the set $\mathcal{N}$ is contained in the unit ball. We define
$$ C_L := \frac{2}{\eta} = C_L(m,d). \qquad (A.13) $$
We now use the fact that the Grammian matrix is invariant when, for some $a > 0$ and $\lambda \in \mathbb{R}^d$, we apply the change of variable
$$ z \mapsto \lambda + a z \qquad \text{and} \qquad r_\alpha \mapsto r_\alpha(a^{-1}(x - \lambda)). \qquad (A.14) $$
Here, we take $a := 2^{-j} C_L$ so that the change of variable produces the basis $(q_\alpha)$ of (3.21). We therefore obtain that if $N_j^0(\lambda) = (\gamma_\alpha)_{|\alpha| \le m}$ is a set of points such that for all $\alpha$,
$$ |\gamma_\alpha - (\lambda + a z_\alpha)| \le 2^{-j+1}, \qquad (A.15) $$
then the Grammian matrix $G_0$ of the system $(q_\alpha)_{|\alpha| \le m}$ for the $\ell^2(N_j^0(\lambda))$ inner product satisfies
$$ \|G_0^{-1}\| \le C_G. \qquad (A.16) $$
It remains to remark that since the balls $B_\alpha := \{ |x - (\lambda + a z_\alpha)| \le 2^{-j+1} \}$ are disjoint and since $\Gamma_j$ is quasi-uniform, there exists a different point $\gamma_\alpha \in \Gamma_j$ in each of these balls. Since these balls are also contained in the larger ball of center $\lambda$ and radius $2^{-j} C_L$, we can therefore extract the appropriate subset $N_j^0(\lambda)$ out of $N_j(\lambda)$. The proof of the theorem is complete. $\Box$


Bibliography

[1] G. Tolle, J. Polastre, R. Szewczyk, D. Culler, N. Turner, K. Tu, S. Burgess, T. Dawson, P. Buonadonna, D. Gay, and W. Hong, "A macroscope in the Redwoods," in Proc. International Conference on Embedded Networked Sensor Systems (SenSys), 2005, pp. 51–63.

[2] C.-Y. Chong and S. P. Kumar, "Sensor networks: Evolution, opportunities, and challenges," Proc. of the IEEE, vol. 91, no. 8, pp. 1247–1256, 2003.

[3] S. Hollar, "COTS dust," M.S. thesis, U.C. Berkeley, 2000.

[4] Crossbow Technology Incorporated website, http://www.xbow.com/.

[5] M. Srivastava, "Energy-aware wireless sensor and actuator networks," http://www.cs.usyd.edu.au/wsnw/Srivastava-2006 02 eii MS v3.pdf, Feb. 2006, talk at Workshop on Wireless Sensor Networks for Information Infrastructure.

[6] J.-J. Xiao, A. Ribeiro, Z.-Q. Luo, and G. B. Giannakis, "Distributed compression-estimation using wireless sensor networks," IEEE Signal Processing Mag., vol. 23, no. 4, pp. 27–41, Jul. 2006.

[7] S. S. Pradhan, J. Kusuma, and K. Ramchandran, "Distributed compression in a dense microsensor network," IEEE Signal Processing Mag., vol. 19, no. 2, pp. 51–60, Mar. 2002.

[8] Z. Xiong, A. D. Liveris, and S. Cheng, "Distributed source coding for sensor networks," IEEE Signal Processing Mag., vol. 21, no. 5, pp. 80–94, 2004.

[9] D. Slepian and J. Wolf, "Noiseless coding of correlated information sources," IEEE Trans. on Information Theory, vol. 19, no. 4, pp. 471–480, Jul. 1973.

[10] A. D. Wyner and J. Ziv, "The rate-distortion function for source coding with side information at the decoder," IEEE Trans. on Information Theory, vol. 22, pp. 1–10, Jan. 1976.

[11] M. F. Duarte, M. B. Wakin, D. Baron, and R. G. Baraniuk, "Universal distributed sensing via random projections," in Proc. Information Processing in Sensor Networks (IPSN), 2006, pp. 177–185.

[12] M. Rabbat, J. Haupt, A. Singh, and R. Nowak, "Decentralized compression and predistribution via randomized gossiping," in Proc. Information Processing in Sensor Networks (IPSN), 2006, pp. 51–59.

[13] Personal communication with Marco Duarte, February 21, 2007.

[14] Personal communication with Joel Tropp, March 21, 2007.

[15] C. Guestrin, P. Bodik, R. Thibaux, M. Paskin, and S. Madden, "Distributed regression: an efficient framework for modeling sensor network data," in Proc. Information Processing in Sensor Networks (IPSN), 2004, pp. 1–10.

[16] M. Paskin, C. Guestrin, and J. McFadden, "A robust architecture for distributed inference in sensor networks," in Proc. Information Processing in Sensor Networks (IPSN), 2005, pp. 55–62.

[17] C. Christopoulos, A. Skodras, and T. Ebrahimi, "The JPEG2000 still image coding system: An overview," IEEE Trans. on Consumer Electronics, vol. 46, no. 4, pp. 1103–1127, 2000.

[18] A. Brandt, "Multi-level adaptive solutions to boundary-value problems," Mathematics of Computation, vol. 31, pp. 333–390, 1977.

[19] I. Daubechies, Ten Lectures on Wavelets, Society for Industrial and Applied Mathematics, 1992.

[20] S. Mallat, A Wavelet Tour of Signal Processing, Academic Press, 1998.

[21] V. Delouille, Nonparametric Stochastic Regression Using Design-Adapted Wavelets, Ph.D. thesis, Université Catholique de Louvain, 2002.

[22] R. A. DeVore, B. Jawerth, and B. J. Lucier, "Image compression through wavelet transform coding," IEEE Trans. on Information Theory, vol. 38, pp. 719–746, Mar. 1992.

[23] J. M. Shapiro, "Embedded image coding using zerotrees of wavelet coefficients," IEEE Trans. on Signal Processing, vol. 41, pp. 3445–3462, Dec. 1993.

[24] S. LoPresto, K. Ramchandran, and M. T. Orchard, "Image coding based on mixture modeling of wavelet coefficients and a fast estimation-quantization framework," in Proc. Data Compression Conf., 1997, pp. 221–230.

[25] D. L. Donoho, "De-noising via soft-thresholding," IEEE Trans. on Information Theory, vol. 41, pp. 613–627, 1995.

[26] D. L. Donoho and I. M. Johnstone, "Adapting to unknown smoothness via wavelet shrinkage," J. American Statistical Assoc., vol. 90, pp. 1200–1224, 1995.

[27] I. M. Johnstone and B. W. Silverman, "Empirical Bayes selection of wavelet thresholds," The Annals of Statistics, vol. 33, no. 4, pp. 1700–1752, 2005.

[28] Intel-Berkeley lab data, http://db.csail.mit.edu/labdata/labdata.html.

[29] D. Ganesan, S. Ratnasamy, H. Wang, and D. Estrin, "Coping with irregular spatio-temporal sampling in sensor networks," SIGCOMM Comput. Commun. Rev., vol. 34, no. 1, pp. 125–130, 2004.

[30] R. Wagner, S. Sarvotham, and R. G. Baraniuk, "A multiscale data representation for distributed sensor networks," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Mar. 2005.

[31] R. Wagner, H. Choi, R. Baraniuk, and V. Delouille, "Distributed wavelet transform for irregular sensor network grids," in Proc. IEEE Statistical Signal Processing Workshop (SSP), Jul. 2005.

[32] R. S. Wagner, R. G. Baraniuk, S. Du, D. B. Johnson, and A. Cohen, "An architecture for distributed wavelet analysis and processing in sensor networks," in Proc. Information Processing in Sensor Networks (IPSN), 2006, pp. 243–250.

[33] R. G. Baraniuk, A. Cohen, and R. Wagner, "Approximation and compression of scattered data by meshless multiscale decompositions," J. of Applied and Computational Harmonic Analysis, submitted.

[34] W. Sweldens, "The lifting scheme: A construction of second generation wavelets," SIAM J. of Mathematical Analysis, vol. 29, no. 2, pp. 511–546, Mar. 1998.

[35] A. Harten, "Discrete multi-resolution analysis and generalized wavelets," Applied Numerical Mathematics, vol. 12, pp. 153–192, 1993.

[36] I. Daubechies, I. Guskov, P. Schroder, and W. Sweldens, "Wavelets on irregular point sets," Philosophical Trans. of the Royal Society of London, vol. 357, pp. 2397–2413, 1999.

[37] N. Dyn, M. S. Floater, and A. Iske, "Adaptive thinning for bivariate scattered data," J. Computational and Applied Mathematics, vol. 145, no. 2, pp. 505–517, 2002.

[38] D. Ganesan, B. Greenstein, D. Estrin, J. Heidemann, and R. Govindan, "Multi-resolution storage and search in sensor networks," ACM Trans. on Storage, vol. V, no. N, Apr. 2005.

[39] A. Ciancio, S. Pattem, A. Ortega, and B. Krishnamachari, "Energy-efficient data representation and routing for wireless sensor networks based on a distributed wavelet compression algorithm," in Proc. Information Processing in Sensor Networks (IPSN), 2006, pp. 309–316.

[40] J. Gao, L. J. Guibas, J. Hershberger, and L. Zhang, "Fractionally cascaded information in a sensor network," in Proc. Information Processing in Sensor Networks (IPSN), 2004, pp. 311–319.

Page 112: Distributed Multi-Scale Data Processing for Sensor Networksrwagner/docs/wagnerPHDThesis.pdf · Wireless sensor networks provide a challenging application area for signal process-ing

[41] W. Wang and K. Ramchandran, "Random distributed multiresolution representations with significance querying," in Proc. Information Processing in Sensor Networks (IPSN), 2006, pp. 102–108.

[42] M. Jansen, G. Nason, and B. Silverman, "Scattered data smoothing by empirical Bayesian shrinkage of second generation wavelet coefficients," in Wavelets: Applications in Signal and Image Processing IX, Proc. of SPIE, 2001, vol. 4478, pp. 87–97.

[43] L. Hu and D. Evans, "Localization for mobile sensor networks," in Proc. International Conference on Mobile Computing and Networking (MobiCom), 2004, pp. 45–57.

[44] S. PalChaudhuri, A. K. Saha, and D. B. Johnson, "Adaptive clock synchronization in sensor networks," in Proc. Information Processing in Sensor Networks (IPSN), 2004, pp. 340–348.

[45] S. Amat, F. Arandiga, A. Cohen, R. Donat, G. Garcia, and M. von Oehsen, "Data compression with ENO schemes: A case study," Applied and Computational Harmonic Analysis, vol. 11, pp. 273–288, 2001.

[46] J. Broch, D. A. Maltz, D. B. Johnson, Y. Hu, and J. G. Jetcheva, "A performance comparison of multi-hop wireless ad hoc network routing protocols," in Proc. International Conference on Mobile Computing and Networking (MobiCom), 1998, pp. 85–97.

[47] R. Wagner, V. Delouille, and R. G. Baraniuk, "Distributed wavelet de-noising for sensor networks," in Proc. IEEE International Conference on Decision and Control (CDC), Dec. 2006.

[48] B. Patt-Shamir, "A note on efficient aggregate queries in sensor networks," in Proc. ACM Symposium on Distributed Computing, 2004, pp. 283–289.

[49] A. G. Dimakis, A. D. Sarwate, and M. J. Wainwright, "Geographic gossip: Efficient aggregation for sensor networks," in Proc. Information Processing in Sensor Networks (IPSN), 2006, pp. 69–76.

[50] M. Rabbat and R. Nowak, "Distributed optimization in sensor networks," in Proc. Information Processing in Sensor Networks (IPSN), 2006, pp. 20–27.

[51] TinyOS Community Forum, http://www.tinyos.net/.

[52] R. Wagner, J. R. Stinnett, M. Duarte, R. G. Baraniuk, D. B. Johnson, and T. S. E. Ng, "A network application programming interface for data processing in sensor networks," Tech. Rep. TREE0705, Rice University, Houston, Texas.

[53] R. Willett, A. Martin, and R. Nowak, "Backcasting: Adaptive sampling for sensor networks," in Proc. Information Processing in Sensor Networks (IPSN), 2004, pp. 124–133.

[54] S. Pattem, B. Krishnamachari, and R. Govindan, "The impact of spatial correlation on routing with compression in wireless sensor networks," in Proc. Information Processing in Sensor Networks (IPSN), 2004, pp. 28–35.

[55] M. Coates, "Distributed particle filters for sensor networks," in Proc. Information Processing in Sensor Networks (IPSN), 2004, pp. 99–107.

[56] J. Liu, M. Chu, J. Liu, J. Reich, and F. Zhao, "Distributed state representation for tracking problems in sensor networks," in Proc. Information Processing in Sensor Networks (IPSN), 2004, pp. 234–242.

[57] S. Pattem, S. Poduri, and B. Krishnamachari, "Energy-quality tradeoffs for target tracking in wireless sensor networks," in Proc. Information Processing in Sensor Networks (IPSN), 2003, pp. 32–46.

[58] P. W. Boettcher and G. A. Shaw, "Energy-constrained collaborative processing for target detection, tracking, and geolocation," in Proc. Information Processing in Sensor Networks (IPSN), 2003, pp. 254–268.

[59] Q. Wang, W. Chen, R. Zheng, K. Lee, and L. Sha, "Acoustic target tracking using tiny wireless sensor devices," in Proc. Information Processing in Sensor Networks (IPSN), 2003, pp. 642–657.

[60] M. Rabbat and R. Nowak, "Distributed optimization in sensor networks," in Proc. Information Processing in Sensor Networks (IPSN), 2004, pp. 20–27.

[61] L. Xiao, S. Boyd, and S. Lall, "A scheme for robust distributed sensor fusion based on average consensus," in Proc. Information Processing in Sensor Networks (IPSN), 2005, pp. 63–70.

[62] J. M. Hellerstein, W. Hong, S. Madden, and K. Stanek, "Beyond average: Toward sophisticated sensing with queries," in Proc. Information Processing in Sensor Networks (IPSN), 2003, pp. 63–79.

[63] A. G. Dimakis, V. Prabhakaran, and K. Ramchandran, "Ubiquitous access to distributed data in large-scale sensor networks through decentralized erasure codes," in Proc. Information Processing in Sensor Networks (IPSN), 2005, pp. 111–117.

[64] B. Krishnamachari and S. S. Iyengar, "Efficient and fault-tolerant feature extraction in wireless sensor networks," in Proc. Information Processing in Sensor Networks (IPSN), 2003, pp. 488–501.

[65] A. G. Dimakis, A. D. Sarwate, and M. J. Wainwright, "Geographic gossip: Efficient aggregation for sensor networks," in Proc. Information Processing in Sensor Networks (IPSN), 2006, pp. 69–76.

[66] J.-Y. Chen, G. Pandurangan, and D. Xu, "Robust computation of aggregates in wireless sensor networks: Distributed randomized algorithms and analysis," in Proc. Information Processing in Sensor Networks (IPSN), 2005, pp. 348–355.

[67] A. T. Ihler, J. W. Fisher III, and R. L. Moses, "Nonparametric belief propagation for self-calibration in sensor networks," in Proc. Information Processing in Sensor Networks (IPSN), 2004, pp. 225–233.

[68] A. Savvides, W. Garber, S. Adlakha, R. Moses, and M. B. Srivastava, "On the error characteristics of multihop node localization in ad-hoc sensor networks," in Proc. Information Processing in Sensor Networks (IPSN), 2003, pp. 317–332.

[69] A. Krause, C. Guestrin, A. Gupta, and J. Kleinberg, "Near-optimal sensor placements: Maximizing information while minimizing communication cost," in Proc. Information Processing in Sensor Networks (IPSN), 2006, pp. 2–10.

[70] Q. Cao, T. Abdelzaher, T. He, and J. Stankovic, "Towards optimal sleep scheduling in sensor networks for rare-event detection," in Proc. Information Processing in Sensor Networks (IPSN), 2005, pp. 20–27.

[71] Z. Abrams, A. Goel, and S. Plotkin, "Set K-cover algorithms for energy efficient monitoring in wireless sensor networks," in Proc. Information Processing in Sensor Networks (IPSN), 2004, pp. 424–432.

[72] G. Xing, C. Lu, R. Pless, and J. A. O'Sullivan, "Co-Grid: an efficient coverage maintenance protocol for distributed sensor networks," in Proc. Information Processing in Sensor Networks (IPSN), 2004, pp. 414–423.

[73] N. Shrivastava, S. Suri, and C. D. Toth, "Detecting cuts in sensor networks," in Proc. Information Processing in Sensor Networks (IPSN), 2005, pp. 210–217.

[74] M. A. Batalin and G. S. Sukhatme, "Coverage, exploration, and deployment by a mobile robot and communication network," in Proc. Information Processing in Sensor Networks (IPSN), 2003, pp. 376–391.

[75] A. Terzis, A. Anandarajah, K. Moore, and I.-J. Wang, "Slip surface localization in wireless sensor networks for landslide prediction," in Proc. Information Processing in Sensor Networks (IPSN), 2006, pp. 109–116.

[76] K. Chintalapudi, J. Paek, O. Gnawali, T. S. Fu, K. Dantu, J. Caffey, and R. Govindan, "Structural damage detection and localization using NETSHM," in Proc. Information Processing in Sensor Networks (IPSN), 2006, pp. 475–482.

[77] W. Hu, V. N. Tran, N. Bulusu, C. T. Chou, S. Jha, and A. Taylor, "The design and evaluation of a hybrid sensor network for cane-toad monitoring," in Proc. Information Processing in Sensor Networks (IPSN), 2005, pp. 503–508.

[78] P. Dutta, J. Hui, J. Jeong, S. Kim, C. Sharp, J. Taneja, G. Tolle, K. Whitehouse, and D. Culler, "Trio: Enabling sustainable and scalable outdoor wireless sensor network deployments," in Proc. Information Processing in Sensor Networks (IPSN), 2006, pp. 407–415.

[79] P. Levis, "TinyOS Programming," http://csl.stanford.edu/~pal/pubs/tinyos-programming.pdf.

[80] P. Levis, "Packet Protocols," http://www.tinyos.net/tinyos-2.x/doc/html/tep116.html.

[81] R. Fonseca, O. Gnawali, K. Jamieson, and P. Levis, "Collection," http://www.tinyos.net/tinyos-2.x/doc/html/tep119.html.

[82] P. Levis, "message_t," http://www.tinyos.net/tinyos-2.x/doc/html/tep111.html.

[83] S. Keshav, An Engineering Approach to Computer Networking: ATM Networks, the Internet, and the Telephone Network, Addison Wesley, 1997.

[84] Chipcon, "CC2420 Data Sheet," http://www.chipcon.com/files/CC2420 Data Sheet 1 4.pdf.

[85] M. K. McKusick and G. V. Neville-Neil, The Design and Implementation of the FreeBSD Operating System, Addison Wesley, 2005.

[86] COMPASS project web page, http://compass.cs.rice.edu/.

[87] S. Du, M. Khan, S. PalChaudhuri, A. Post, A. Saha, P. Druschel, D. B. Johnson, and R. Riedi, "Self-organizing hierarchical routing for scalable ad hoc networking," Tech. Rep. TR04-433, Department of Computer Science, Rice University, March 2004.