
EURASIP Journal on Applied Signal Processing

Sensor Networks

Guest Editors: Kung Yao, Deborah Estrin, and Yu Hen Hu


Copyright © 2003 Hindawi Publishing Corporation. All rights reserved.

This is a special issue published in volume 2003 of “EURASIP Journal on Applied Signal Processing.” All articles are open access articles distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


Editor-in-Chief
Marc Moonen, Belgium

Senior Advisory Editor
K. J. Ray Liu, College Park, USA

Associate Editors
Kiyoharu Aizawa, Japan
Gonzalo Arce, USA
Jaakko Astola, Finland
Kenneth Barner, USA
Mauro Barni, Italy
Sankar Basu, USA
Shih-Fu Chang, USA
Jie Chen, USA
Tsuhan Chen, USA
M. Reha Civanlar, Turkey
Tony Constantinides, UK
Luciano Costa, Brazil
Zhi Ding, USA
Peter M. Djurić, USA
Jean-Luc Dugelay, France
Pierre Duhamel, France
Tariq Durrani, UK
Touradj Ebrahimi, Switzerland
Sadaoki Furui, Japan
Moncef Gabbouj, Finland
Ulrich Heute, Germany
Yu Hen Hu, USA
Jiri Jan, Czech Republic
Søren Holdt Jensen, Denmark
Ton Kalker, The Netherlands
Mos Kaveh, USA
Bastiaan Kleijn, Sweden
Ut-Va Koc, USA
Aggelos Katsaggelos, USA
C. C. Jay Kuo, USA
S. Y. Kung, USA
Chin-Hui Lee, USA
Kyoung Mu Lee, Korea
Sang Uk Lee, Korea
Y. Geoffrey Li, USA
Heinrich Meyr, Germany
Ferran Marqués, Spain
José M. F. Moura, USA
King N. Ngan, Singapore
Takao Nishitani, Japan
Naohisa Ohta, Japan
Antonio Ortega, USA
Bjorn Ottersten, Sweden
Mukund Padmanabhan, USA
Ioannis Pitas, Greece
Phillip Regalia, France
Hideaki Sakai, Japan
Wan-Chi Siu, Hong Kong
Dirk Slock, France
Piet Sommen, The Netherlands
John Sorensen, Denmark
Michael G. Strintzis, Greece
Tomohiko Taniguchi, Japan
Sergios Theodoridis, Greece
Xiaodong Wang, USA
An-Yen (Andy) Wu, Taiwan
Xiang-Gen Xia, USA
Kung Yao, USA


Contents

Editorial, Kung Yao, Deborah Estrin, and Yu Hen Hu
Volume 2003 (2003), Issue 4, Pages 319-320

Energy-Based Collaborative Source Localization Using Acoustic Microsensor Array, Dan Li and Yu Hen Hu
Volume 2003 (2003), Issue 4, Pages 321-337

The Fusion of Distributed Microphone Arrays for Sound Localization, Parham Aarabi
Volume 2003 (2003), Issue 4, Pages 338-347

A Self-Localization Method for Wireless Sensor Networks, Randolph L. Moses, Dushyanth Krishnamurthy, and Robert M. Patterson
Volume 2003 (2003), Issue 4, Pages 348-358

Acoustic Source Localization and Beamforming: Theory and Practice, Joe C. Chen, Kung Yao, and Ralph E. Hudson
Volume 2003 (2003), Issue 4, Pages 359-370

Dynamic Agent Classification and Tracking Using an Ad Hoc Mobile Acoustic Sensor Network, David Friedlander, Christopher Griffin, Noah Jacobson, Shashi Phoha, and Richard R. Brooks
Volume 2003 (2003), Issue 4, Pages 371-377

Collaborative In-Network Processing for Target Tracking, Juan Liu, James Reich, and Feng Zhao
Volume 2003 (2003), Issue 4, Pages 378-391

Preprocessing in a Tiered Sensor Network for Habitat Monitoring, Hanbiao Wang, Deborah Estrin, and Lewis Girod
Volume 2003 (2003), Issue 4, Pages 392-401


EURASIP Journal on Applied Signal Processing 2003:4, 319–320
© 2003 Hindawi Publishing Corporation

Editorial

Kung Yao
Department of Electrical Engineering, University of California, Los Angeles (UCLA), Los Angeles, CA 90095-1594, USA
Email: [email protected]

Deborah Estrin
Department of Computer Science, University of California, Los Angeles (UCLA), Los Angeles, CA 90095-1596, USA
Email: [email protected]

Yu Hen Hu
Department of Electrical and Computer Engineering, University of Wisconsin-Madison, 1415 Engineering Drive, Madison, WI 53706-1691, USA
Email: [email protected]

Advances in low-cost and low-power wireless communication, microsensor, and microprocessor hardware, as well as progress in ad hoc networking routing and protocols, distributed signal and array processing, pervasive computing, and embedded systems have all made sensor networking a topic of active interest. In recent years, the Internet has been able to provide a large number of users with the ability to move diverse forms of information readily and has thus revolutionized business, industry, defense, science, education, research, and human interactions. Sensor networking may, in the long run, be equally significant by providing measurement of the physical phenomena around us, leading to their understanding and ultimately the utilization of this information for a wide range of applications. Potential applications of sensor networking include environmental monitoring, health care monitoring, battlefield surveillance and reconnaissance, modern highways, modern manufacturing, condition-based maintenance of complex systems, and so forth.

In order to understand and build sensor networks, diverse technologies and technical disciplines are involved. However, in this special issue we deal only with various signal processing aspects of sensor networking. Of the seven papers, four deal with source localization, two with tracking, and one with sensor network decomposition and organization. Energy-Based Collaborative Source Localization Using Acoustic Microsensor Array, by D. Li and Y. H. Hu, uses acoustic energy measurements to perform source localization. This approach assumes the acoustic source energy decays inversely with the square of the distance. By comparing acoustic sensor energy measurements around the source, the source location can be estimated as the intersection of multiple hyperspheres. The Fusion of Distributed Microphone Arrays for Sound Localization, by P. Aarabi, also deals with acoustic source localization. The author proposes to use the spatial observability function (SOF), which gives an indication of how well a microphone array perceives events at different spatial positions. Each microphone array also has a spatial likelihood function (SLF), which reports the likelihood of a source at each spatial location. The SOF and SLF approaches are used together for sound localization. In A Self-Localization Method for Wireless Sensor Networks, by R. L. Moses, D. Krishnamurthy, and R. Patterson, the authors consider the problem of locating and orienting a network of unattended sensors by using a number of known source signals for calibration purposes. Maximum-likelihood (ML) estimation and Cramer-Rao bound (CRB) techniques are used. Acoustic Source Localization and Beamforming: Theory and Practice, by J. C. Chen, K. Yao, and R. E. Hudson, again uses the ML method for direct localization of a wideband acoustic source in the near field and uses the cross bearing of the directions-of-arrival (DOA) for localization in the far field. For multiple sources, an alternating projection procedure is used. CRB analysis provides various insights into the localization problem.

Dynamic Agent Classification and Tracking Using an Ad Hoc Mobile Acoustic Sensor Network, by D. Friedlander, C. Griffin, N. Jacobson, S. Phoha, and R. R. Brooks, presents methods for dynamic distributed signal processing using an ad hoc mobile network of sensors to detect, identify, and track targets. Forming dynamic clusters around events of interest allows for processing multiple events in parallel over different geographic areas along the trajectory of the targets. In Collaborative In-Network Processing for Target Tracking, J. Liu, J. Reich, and F. Zhao consider collaborative signal processing using acoustic-amplitude sensors for target distance estimation and DOA sensors for bearing estimation. The information-driven sensor querying framework selectively activates sensors based on their utility and cost. Issues of distributed processing for tracking and energy efficiency of the network are addressed. Preprocessing in a Tiered Sensor Network for Habitat Monitoring, by H. Wang, D. Estrin, and L. Girod, considers some common principles for task decomposition and collaboration in tiered sensor networks. The system has a few powerful macronodes in the first tier and many less-powerful nodes in the second tier. Each macronode combines data collected by many micronodes for target classification and localization. Application is made to habitat monitoring and the classification and localization of birds. All seven of these papers use simulations and measured data to verify the proposed methods. In the coming years, it is expected that sensor networking will become ever more important in both research and industry and that hardware and software availability will enable significant data collection and field experimentation.

Kung Yao received the B.S.E. (highest honors), M.A., and Ph.D. degrees in electrical engineering, all from Princeton University, Princeton, NJ. He was a NAS-NRC Postdoctoral Research Fellow at the University of California, Berkeley. Presently, he is a Professor in the Electrical Engineering Department at UCLA. In 1969, he was a Visiting Assistant Professor at the Massachusetts Institute of Technology. In 1985–1988, he served as an Assistant Dean of the School of Engineering and Applied Science at UCLA. His research and professional interests include sensor array systems, digital communication theory and systems, smart antenna and wireless radio systems, chaos communications and system theory, digital signal and array processing, systolic and VLSI algorithms, architectures and systems, radar systems, and simulation. He has published over 250 journal and conference papers. Dr. Yao received the IEEE Signal Processing Society's 1993 Senior Award in VLSI Signal Processing. He was the coeditor of a two-volume IEEE reprint book series on High Performance VLSI Signal Processing, IEEE Press, 1997. In 1991–1993, he was the Associate Editor for VLSI Signal Processing of the IEEE Trans. on Circuits and Systems. Since 1999, he has been an Associate Editor of the IEEE Communications Letters. He is a member of the Editorial Boards of the Journal of VLSI Signal Processing and of Integration: the VLSI Journal. He was also a Guest Editor of a Special Issue on Applications of Chaos in Modern Communication Systems of the IEEE Trans. on Circuits and Systems—Part I in 2001. He is a Fellow of IEEE.

Deborah Estrin is a Professor of computer science at UCLA and Director of the Center for Embedded Networked Sensing (CENS), a newly awarded National Science Foundation Science and Technology Center. She received her Ph.D. degree in computer science from MIT (1985) and was on the faculty of Computer Science at USC from 1986 through mid-2000, where she received the National Science Foundation Presidential Young Investigator Award for her research in network interconnection and security (1987). During the subsequent 10 years, her research focused on the design of network and routing protocols for very large, global networks. Estrin has been instrumental in defining the national research agenda for wireless sensor networks, first chairing a 1998 DARPA ISAT study and then a 2001 NRC study; the latter culminated in an NRC publication, Embedded Everywhere: A Research Agenda for Networked Systems of Embedded Computers. Estrin's research group develops algorithms and systems to support rapidly deployable and robustly operating networks of many thousands of physically embedded devices. She is particularly interested in applications to environmental monitoring. Estrin has served on numerous program committees and editorial boards, including SIGCOMM, Mobicom, SOSP, and ACM/IEEE Transactions on Networking. She is a Fellow of the ACM and the AAAS.

Yu Hen Hu received the B.S.E.E. degree from National Taiwan University, Taiwan, in 1976. He received the M.S. and Ph.D. degrees, both in electrical engineering, from the University of Southern California, Los Angeles, CA, in 1980 and 1982, respectively. Currently, he is a Professor in the Electrical and Computer Engineering Department of the University of Wisconsin-Madison, WI, USA. Previously, he was with the Electrical Engineering Department of Southern Methodist University, Dallas, TX, USA. Dr. Hu's research interests include multimedia signal processing, design methodology and implementation of signal processing algorithms and systems, sensor network and distributive signal processing algorithms, and neural network signal processing. He has published more than 200 journal and conference papers and edited two books: Programmable Digital Signal Processors and Handbook of Neural Network Signal Processing. Dr. Hu is a Fellow of IEEE. He has served as Associate Editor for the IEEE Transactions on Signal Processing, IEEE Signal Processing Letters, Journal of VLSI Signal Processing, and EURASIP Journal on Applied Signal Processing. He has also served as Secretary of the IEEE Signal Processing Society, on the Board of Governors of the IEEE Neural Networks Council, and as Chair of the IEEE Signal Processing Society Neural Network Signal Processing Technical Committee.


EURASIP Journal on Applied Signal Processing 2003:4, 321–337
© 2003 Hindawi Publishing Corporation

Energy-Based Collaborative Source Localization Using Acoustic Microsensor Array

Dan Li
Department of Electrical and Computer Engineering, University of Wisconsin-Madison, 1415 Engineering Drive, Madison, WI 53706-1691, USA
Email: [email protected]

Yu Hen Hu
Department of Electrical and Computer Engineering, University of Wisconsin-Madison, 1415 Engineering Drive, Madison, WI 53706-1691, USA
Email: [email protected]

Received 9 January 2002 and in revised form 13 October 2002

A novel sensor network source localization method based on acoustic energy measurements is presented. This method makes use of the characteristic that acoustic energy decays inversely with respect to the square of the distance from the source. By comparing energy readings measured at surrounding acoustic sensors, the source location during a given time interval can be accurately estimated as the intersection of multiple hyperspheres. Theoretical bounds on the number of sensors required to yield a unique solution are derived. Extensive simulations have been conducted to characterize the performance of this method under various parameter perturbations and noise conditions. Potential advantages of this approach include a low intersensor communication requirement, robustness with respect to parameter perturbations and measurement noise, and low-complexity implementation.

Keywords and phrases: target localization, source localization, acoustic sensors, collaborative signal processing, energy-based, sensor network.

1. INTRODUCTION

Distributed networks of low-cost microsensors with signal processing and wireless communication capabilities have a variety of applications [1, 2]. Examples include underwater acoustics, battlefield surveillance, electronic warfare, geophysics, seismic remote sensing, and environmental monitoring. Such sensor networks are often designed to perform tasks such as detection, classification, localization, and tracking of one or more targets in the sensor field. These sensors are typically battery-powered and have limited wireless communication bandwidth. Therefore, efficient collaborative signal processing algorithms that consume less energy for computation and communication are needed.

An important collaborative signal processing task is source localization. The objective is to estimate the position of a moving target within a sensor field that is monitored by a sensor network. This may be accomplished by (a) measuring the acoustic, seismic, or thermal signatures emitted from the source (the moving target) at each sensor node of the network; and then (b) analyzing the collected source signatures collaboratively among different sensor modalities and different sensor nodes. In this paper, our focus will be on collaborative source localization based on acoustic signatures.

Source localization based on acoustic signatures has broad applications: in sonar signal processing, the focus is on locating underwater acoustic sources using an array of hydrophones [3, 4]. Microphone arrays have been used to localize and track human speakers in an indoor room environment for the purpose of video conferencing [5, 6, 7, 8]. When a sensor network is deployed in an open field, the sound emitted from a moving vehicle can be used to track the locations of the vehicle [9, 10].

To enable acoustic source localization, two approaches have been developed: for a coherent, narrowband source, the phase difference measured at receiving sensors can be exploited to estimate the bearing direction of the source [11]. For broadband sources, time-delay estimation has been quite popular [6, 9, 10, 12, 13, 14].

In this paper, we present a novel approach that estimates the acoustic source location based on the acoustic energy measured at individual sensors. It is known that in free space, acoustic energy decays at a rate that is inversely proportional to the square of the distance from the source [15]. Given simultaneous measurements of the acoustic energy of an omnidirectional point source at known sensor locations, our goal is to infer the source location based on these readings.


While the basic principle of this proposed approach is simple, to achieve reasonable performance in an outdoor wireless sensor network environment, the following practical challenges must be overcome.

(i) In an indoor environment, sound propagation may be affected by room reverberation [16] and echoes. Similar effects may also occur in an outdoor environment when man-made walls or natural rocky hills are present within the sensor field.

(ii) In an outdoor environment, sound propagation may be affected by wind direction [17, 18] and the presence of dense vegetation [19].

(iii) The sensor locations may not be accurately measured.

(iv) The acoustic energy emission may be directional. For example, the engine sound of a vehicle may be stronger on the side where the engine is located. The physical size of the acoustic source may also be too large to be adequately modeled as a point source.

(v) In an outdoor environment, strong background noise, including wind gusts, may be encountered during operation. In addition, the gains of individual microphones will need to be calibrated to yield consistent acoustic energy readings.

(vi) If there are two or more closely spaced acoustic sources, their corresponding acoustic signals may interfere with each other, rendering the energy decay model infeasible.

In this paper, we first propose a simple, yet powerful acoustic energy decay model. A simple field experiment result is reported to justify the feasibility of this model for the sensor network application. A maximum-likelihood estimation problem is formulated to solve for the location of a single acoustic source within the sensor field. This is solved by finding the intersection of a set of hyperspheres. Each hypersphere specifies the likelihood of the source location based on the acoustic energy readings of a pair of sensors. Intersecting many hyperspheres formed by a group of sensors within the sensor field will yield the source location. This is formulated as a nonlinear optimization problem for which fast optimization search algorithms are available.

This proposed energy-based localization (EBL) method will potentially give accurate results at regular time intervals, and will be robust with respect to parameter perturbations. It requires relatively few computations and consumes little communication bandwidth, and is therefore suitable for low-power distributed wireless sensor network applications.

This paper is organized as follows. In Section 2, we review several existing source localization algorithms. In Section 3, an energy decay model of sensor signal readings is provided. An outdoor experiment to validate this model is also outlined. The development of the EBL algorithm is specified in Section 4, where we also elaborate the notion of the target location circles/spheres and some properties associated with them. A variety of search algorithms for optimizing the cost function are also proposed in this section. In Section 5, simulation is performed with the aim of studying the effect of different factors on the accuracy and precision of the location estimate. A comparison of different search algorithms applied to our energy-based localizer is also reported.

Figure 1: Illustration of CPA-based localization: (a) 1D CPA localization; (b) 2D CPA localization.

2. EXISTING SOURCE LOCALIZATION METHODS

In a sensor network, a number of methods can be used to locate and track a particular moving target. Some existing methods are reviewed in this section.

2.1. CPA-based localization method

In its original definition, a CPA (closest point of approach) point refers to the positions of two dynamically moving objects at which they reach their closest possible distance (see http://www.geometryalgorithms.com/Archive/algorithm 0106/algorithm 0106.htm). In a sensor network application, a CPA position is a point on the trajectory of a moving target that is closest to a stationary sensor node. Referring to Figure 1, using the CPA point to estimate the target location can be accomplished in two different ways.

(i) One-dimensional CPA localization: if a target is moving along a road with known coordinates, the CPA point with respect to a given sensor node is a coordinate on this road that is closest to this observing sensor. Given the sensor coordinates and the road coordinates, this CPA point can be precomputed prior to operation. Assuming that the signal intensity reaches its maximum when the target is in the closest position, the time instant when the target is on the CPA point can be estimated from the time series observed at the sensor. Alternatively, 1D CPA detection can be realized using a trip-wire style sensing modality, such as a passive infrared (PIR) sensor.


(ii) Two-dimensional localization: in a two-dimensional sensor field, if the coordinates of the target trajectory are not known in advance, the target position cannot be precomputed. However, if the signal intensity measured at neighboring sensors during the same time interval can be compared, the sensor which measures the highest acoustic signal intensity should be the one that is closest to the target. The location of that sensor may then be used as an estimate of the target location. This is equivalent to partitioning the sensor field into N Voronoi regions, where N is the number of sensors. If the target is in the ith region, the corresponding ith sensor's location will be used as the estimated location of the target, as sketched in the code below.

To use the CPA-style localization method, it is desirable that a sufficient number of sensors be deployed within a given sensor field. Otherwise, the localization results may be too coarse.
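To make the 2D CPA estimate above concrete, here is a minimal Python sketch; the function name and array layout are illustrative assumptions, not code from the paper:

```python
import numpy as np

def cpa_estimate_2d(sensor_pos, energies):
    """2D CPA-style localization: the target estimate is the location
    of the sensor reporting the highest signal intensity, i.e., the
    owner of the Voronoi region containing the target.

    sensor_pos : (N, 2) array of sensor coordinates
    energies   : (N,) array of intensity readings for one time interval
    """
    return sensor_pos[int(np.argmax(energies))]
```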

2.2. Target localization based on time delay estimation

A sound signal travels at a finite speed. The same signal reaches sensors at different locations with different amounts of delay. Denote v to be the source signal propagation speed, rs and ri, respectively, to be the target location and the ith sensor's location, and ti to be the time lag experienced at the ith sensor. Then the time delay between the source and the received signal at sensor i is ti = |rs − ri|/v + ni, where ni models random noise due to measurement error. While the absolute value of ti cannot be measured without knowing the source location rs, the relative time delay with respect to a reference sensor r0 can be measured as

\[ v(t_i - t_0) = |r_s - r_i| - |r_s - r_0| + n_i. \tag{1} \]

Given N + 1 sensors, N equations like (1) can be formulated. Then, one may estimate the unknown parameters v and rs using maximum likelihood estimation [6, 10, 14, 20].

Alternatively, (1) can be expressed as

\[ \left[ -2(r_i - r_0)^T \;\; |r_i - r_0|^2 \;\; -2(t_i - t_0) \right] x = a_i^T x = (t_i - t_0)^2 = b_i, \quad 1 \le i \le N, \tag{2} \]

where x = [(r_s − r_0)^T, 1/|v|², |r_s − r_0|/v]^T is a (d + 2) × 1 vector, with d being the dimension of the sensor and target location vectors. Note that the elements of x are interdependent. With N + 1 (> d + 2) sensors, the target location can be found by solving a constrained quadratic optimization problem: find x to minimize C = ‖Ax − b‖² subject to

\[ x_{d+1} \cdot \left( \sum_{i=1}^{d} x_i^2 \right) = x_{d+2}^2, \tag{3} \]

where A = [a_1 a_2 · · · a_N]^T and b = [b_1 b_2 · · · b_N]^T. The constraint described by (3) is due to the interdependent relations between the elements of the x vector. The target location can be estimated as r_s = r_0 + [x_1 · · · x_d]^T, and the propagation speed can be solved simultaneously as v = 1/√x_{d+1}. If constraint (3) is ignored, one need only solve an overdetermined linear system Ax = b using the least squares method [9]. This method has also been refined using an iterative improvement method, and the Cramer-Rao bound of the parameter estimation error has been derived [20]. Time-delay-estimation-based source localization methods require accurate estimation of the time delays between different sensor nodes. To measure the relative time delay, acoustic signatures extracted from individual sensor nodes must be compared. In the extreme case, this requires the transmission of the raw time series data, which may consume too much wireless channel bandwidth. Alternative approaches include the cross-spectrum [8] and range difference methods [21].

3. ENERGY-BASED COLLABORATIVE SOURCE LOCALIZATION ALGORITHM

Energy-based source localization is motivated by a simple observation: the sound level decreases as the distance between the sound source and the listener becomes large. By modeling the relation between sound level (energy) and distance from the sound source, one may estimate the source location using multiple energy readings at different known sensor locations.

3.1. An energy decay model of sensor signal readings

When sound propagates through the air, it is known [15] that the acoustic energy emitted omnidirectionally from a sound source attenuates at a rate that is inversely proportional to the square of the distance. To verify whether this relation holds in a wireless sensor network system with sound generated by an engine, we conducted a field experiment. In the absence of the adverse conditions laid out in the introduction above, the experiment data confirm that such an energy decay model is adequate. Details of this experiment are reported in Section 3.2.

Let there be N sensors deployed in a sensor field in which a target emits omnidirectional acoustic signals from a point source. The signal energy measured on the ith sensor over a time interval t, denoted by yi(t), can be expressed as follows:

\[ y_i(t) = g_i \cdot \frac{s(t - t_i)}{|r(t - t_i) - r_i|^{\alpha}} + \varepsilon_i(t). \tag{4} \]

In (4), t_i is the time delay for the sound signal to propagate from the target (acoustic source) to the ith sensor; s(t) is a scalar denoting the energy emitted by the target during time interval t; r(t) is a d × 1 vector denoting the coordinates of the target during time interval t; r_i is a d × 1 vector denoting the Cartesian coordinates of the ith stationary sensor; g_i is the gain factor of the ith acoustic sensor; α (≈ 2) is an energy decay factor; and ε_i(t) is the cumulative effect of the modeling errors of the parameters g_i, r_i, and α and the additive observation noise of y_i(t). In general, during each time interval t, many time samples are used to calculate one energy reading y_i(t) for sensor i. Based on the central limit theorem, ε_i(t) can be approximated well by a normal distribution with a positive mean value, denoted by, say, µ_i (> 0), that is no less than the standard deviation (STD) of the background measurement noise during that time interval. The STD of ε_i(t) may also be empirically determined.
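A minimal sketch of model (4) as a data generator may help fix ideas; propagation delays are neglected (t_i = 0, as assumed later in Section 3.3), and the noise parameters below are illustrative placeholders, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def energy_readings(sensor_pos, gains, src_pos, src_energy,
                    alpha=2.0, noise_mean=0.05, noise_std=0.02):
    """Simulate one frame of energy readings y_i(t) per model (4).

    sensor_pos : (N, d) sensor coordinates r_i
    gains      : (N,) sensor gain factors g_i
    src_pos    : (d,) source coordinates r(t)
    src_energy : scalar source energy s(t)
    """
    dist = np.linalg.norm(sensor_pos - src_pos, axis=1)
    # epsilon_i(t): normal with a positive mean mu_i, per Section 3.1
    eps = noise_mean + noise_std * rng.standard_normal(sensor_pos.shape[0])
    return gains * src_energy / dist**alpha + eps
```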

3.2. Experiment that validates the acoustic energy decay model

To validate the model described in (4), we conducted a field experiment. We used a lawn mower at a stationary position as our acoustic energy source. Two sensor nodes with the acoustic microphone used in a DARPA SensIT project were placed at different distances (5 m, 10 m, 15 m, 20 m, 25 m, 30 m) away from the energy source. The microphones were placed about 50 cm above the ground, facing the energy source. The weather was clear with a gentle breeze, and the temperature was about 24 ◦C.

The time series of both acoustic sensors were recorded at a sampling rate of 4960.32 Hz. The energy readings were then computed offline as the moving average (over a 0.5-second sliding window) of the squared magnitude of the time series. These energy readings were then fitted to an exponential curve to determine the decay exponent α, as shown in Figure 2.

Figure 2: Acoustic energy decay profile of the lawn mower and the exponential curve fitting (sensor 1: exponent 2.1147; sensor 2: exponent 2.0818).

For both acoustic sensors, within the 30-meter range, the acoustic energy decay exponents are α = 2.1147 (with mean square error 0.054374) and α = 2.0818 (with mean square error 0.016167), respectively. This validates the hypothesis that the acoustic energy decreases approximately as the inverse of the square of the source-sensor distance.

We assume α to be constant here, which is valid if sound reverberation can be ignored and the propagation medium (air) is roughly homogeneous (i.e., no gusty wind) during the experiment.
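The exponential curve fit used to estimate α can be reproduced with a nonlinear least-squares fit of y = c/d^α. The sketch below uses synthetic placeholder energies, since the measured data are not tabulated in the paper:

```python
import numpy as np
from scipy.optimize import curve_fit

def decay_model(d, c, alpha):
    """Received energy as a function of distance: y = c / d**alpha."""
    return c / d**alpha

# Microphone distances from Section 3.2; the energies are synthetic.
d = np.array([5.0, 10.0, 15.0, 20.0, 25.0, 30.0])
y = decay_model(d, 40.0, 2.1) * (1.0 + 0.05 * np.random.randn(d.size))

(c_hat, alpha_hat), _ = curve_fit(decay_model, d, y, p0=(1.0, 2.0))
print(f"fitted decay exponent alpha = {alpha_hat:.4f}")
```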

3.3. Maximum likelihood parameter estimation

Assume that the ε_i(t) in (4) are independent, identically distributed (i.i.d.) normal random variables with known means µ_i (> 0) and known variances σ_i². Then each y_i(t) will be an i.i.d. conditional normal random variable with probability density function N(g_i s(t)/|r(t) − r_i|^α + µ_i, σ_i²). We also assume that the time delay discrepancies among sensors are negligible, that is, t_i = 0. Then, the likelihood function or, equivalently, the conditional joint probability can be expressed as follows:

\[ \ell\big(s(t), r(t)\big) = f\big(y_0(t), \ldots, y_{N-1}(t) \mid \sigma^2, \{s(t), r(t)\}\big) \propto \exp\left( -\frac{1}{2} \sum_{i=0}^{N-1} \frac{\big[ y_i(t) - \mu_i - g_i s(t) / |r(t) - r_i|^{\alpha} \big]^2}{\sigma_i^2} \right). \tag{5} \]

The objective of the maximum likelihood estimation is to find the source energy and the source location {s(t), r(t)} that maximize the likelihood function. Since we assume that the mean µ_i and the variance σ_i² of ε_i(t) are known, this is equivalent to minimizing the following log-likelihood function:

\[ L\big(s(t), r(t)\big) \propto \sum_{i=0}^{N-1} \frac{\big[ y_i(t) - \mu_i - g_i s(t) / |r(t) - r_i|^{\alpha} \big]^2}{\sigma_i^2}. \tag{6} \]

Given {y_i(t), g_i, r_i, µ_i, σ_i²; 0 ≤ i ≤ N − 1} and α, the goal is to find s(t) and r(t) to minimize L in (6). This can be accomplished using a standard nonlinear optimization method such as the Nelder-Mead simplex (direct search) method implemented in the Matlab optimization package.
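The paper does this in Matlab; an equivalent minimization of (6) with SciPy's Nelder-Mead implementation can be sketched as follows. A 2D field is assumed, and packing the unknowns {r(t), s(t)} into one parameter vector is an implementation choice of this sketch:

```python
import numpy as np
from scipy.optimize import minimize

def log_likelihood_cost(theta, y, g, sensors, mu, sigma2, alpha=2.0):
    """Cost L of (6); theta packs the source location r(t) (first two
    entries, 2D assumed) and the source energy s(t) (last entry)."""
    r, s = theta[:2], theta[2]
    dist = np.linalg.norm(sensors - r, axis=1)
    resid = y - mu - g * s / dist**alpha
    return np.sum(resid**2 / sigma2)

def ml_localize(y, g, sensors, mu, sigma2, r0, s0, alpha=2.0):
    """Minimize (6) with the Nelder-Mead simplex method (the paper
    uses Matlab's implementation; SciPy's plays the same role)."""
    res = minimize(log_likelihood_cost, np.r_[r0, s0],
                   args=(y, g, sensors, mu, sigma2, alpha),
                   method="Nelder-Mead")
    return res.x[:2], res.x[2]   # estimated r(t) and s(t)
```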

3.4. Energy ratio and target location hypersphere

In the above formulation, we solve for both the source location r(t) and the source energy s(t). In this section, we present an alternative approach that is independent of the source energy s(t). This is accomplished by taking the ratio of the energy readings of a pair of sensors in the noise-free case to cancel out s(t). We refer to this approach as the energy ratio formulation.

Figure 3: Two sensors are located at (−1, 0) and (1, 0). Four κ values are used: 0.4, 0.6, 0.7, and 0.8. The corresponding target location circles and their centers are also shown.

Approximating the additive noise term ε_i(t) in (4) by its mean value µ_i, we can compute the energy ratio κ_ij of the ith and jth sensors as follows:

\[ \kappa_{ij} := \left( \frac{\big(y_i(t) - \mu_i\big) / \big(y_j(t) - \mu_j\big)}{g_i / g_j} \right)^{-1/\alpha} = \frac{|r(t) - r_i|}{|r(t) - r_j|}. \tag{7} \]

Note that for 0 < κ_ij ≠ 1, all the possible source coordinates r(t) that satisfy (7) must reside on a d-dimensional hypersphere described by the equation

\[ |r(t) - c_{ij}|^2 = \rho_{ij}^2, \tag{8} \]

where the center c_ij and the radius ρ_ij of the hypersphere associated with sensors i and j are given by

\[ c_{ij} = \frac{r_i - \kappa_{ij}^2 r_j}{1 - \kappa_{ij}^2}, \qquad \rho_{ij} = \frac{\kappa_{ij} |r_i - r_j|}{1 - \kappa_{ij}^2}. \tag{9} \]

For convenience, we will call this hypersphere a target location hypersphere. When d = 2, such a hypersphere is a circle. When d = 3, it is a sphere. In Figure 3, several examples corresponding to d = 2 and κ_ij < 1 are illustrated. As κ_ij increases, that is, as y_j(t)/g_j → y_i(t)/g_i, the center of the circle moves away from the sensors, and the radius increases.

In the limiting case when κ_ij → 1, the solutions of (7) form a hyperplane between r_i and r_j:

\[ r(t) \cdot (r_i - r_j) = \frac{|r_i|^2 - |r_j|^2}{2}, \quad \text{or equivalently,} \quad r(t) \cdot \gamma_{ij} = \xi_{ij}, \tag{10} \]

where

\[ \gamma_{ij} = r_i - r_j, \qquad \xi_{ij} = \frac{|r_i|^2 - |r_j|^2}{2}. \tag{11} \]

So far, we have established that using the ratio of the energy readings at a pair of sensors, the potential target location can be restricted to a hypersphere whose center and radius are functions of the energy ratio and the two sensor locations. If more sensors are used, more hyperspheres can be determined. If all the sensors that receive the signal from the same target are used, the corresponding target location hyperspheres must intersect at a particular point that corresponds to the source location. This is the basic idea of energy-based source localization. Note that since the source energy is cancelled during the energy ratio computation, this method is unaffected even if the source energy level varies dramatically between successive energy integration time intervals.
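Equations (7)-(9) translate directly into code. The following minimal sketch (illustrative names; constant scalar gains assumed) computes an energy ratio and the corresponding target location hypersphere:

```python
import numpy as np

def energy_ratio(yi, yj, mui, muj, gi, gj, alpha=2.0):
    """Energy ratio kappa_ij of (7): equals |r - r_i| / |r - r_j|."""
    return (((yi - mui) / (yj - muj)) / (gi / gj)) ** (-1.0 / alpha)

def target_location_hypersphere(ri, rj, kappa):
    """Center c_ij and radius rho_ij of the hypersphere (8)-(9)
    for a sensor pair (i, j); requires kappa != 1.  The absolute
    value keeps the radius positive when kappa > 1."""
    k2 = kappa**2
    center = (ri - k2 * rj) / (1.0 - k2)
    radius = kappa * np.linalg.norm(ri - rj) / abs(1.0 - k2)
    return center, radius
```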

3.5. Single target localization using multiple energy ratios and multiple sensors

Suppose that N acoustic sensors detect the source signal emitted from a target during the same time interval; then N(N − 1)/2 pairs of energy ratios can be computed. Based on M (≤ N(N − 1)/2) of these energy ratios, our objective is to estimate the target location r(t) during that time interval. Using a least squares criterion, this leads to a nonlinear least squares optimization problem where the cost function is defined as

\[ J(r) = \sum_{m=1}^{M_1} \Big| \| r - c_m \| - \rho_m \Big|^2 + \sum_{n=1}^{M_2} \big| \gamma_n^T r - \xi_n \big|^2, \qquad M_1 + M_2 = M, \tag{12} \]

where m and n are indices of the energy ratios computed between different pairs of sensor energy readings, M_1 is the number of hyperspheres, and M_2 is the number of hyperplanes. In practice, when |1 − κ_ij²| becomes too small, it may cause numerical problems when evaluating c_ij and ρ_ij using (9). In this case, the hyperplane equation (10) should be used instead. In our simulation, a value of 10⁻³ was set as the threshold to switch between these two types of error terms.
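A straightforward evaluation of (12), including the hypersphere-to-hyperplane switch just described, might look like the sketch below; the pair-based data representation is an assumption of this sketch:

```python
import numpy as np

EPS = 1e-3  # threshold on |1 - kappa^2| for switching to a hyperplane

def cost_J(r, pairs):
    """Evaluate the energy-ratio cost J(r) of (12).

    pairs : list of (r_i, r_j, kappa_ij) tuples for the energy ratios
    used; hypersphere terms apply when |1 - kappa^2| is large enough,
    hyperplane terms (10)-(11) otherwise.
    """
    J = 0.0
    for ri, rj, kappa in pairs:
        k2 = kappa**2
        if abs(1.0 - k2) > EPS:
            center = (ri - k2 * rj) / (1.0 - k2)           # (9)
            rho = kappa * np.linalg.norm(ri - rj) / abs(1.0 - k2)
            J += (np.linalg.norm(r - center) - rho) ** 2   # hypersphere term
        else:
            gamma = ri - rj                                # (11)
            xi = (ri @ ri - rj @ rj) / 2.0
            J += (gamma @ r - xi) ** 2                     # hyperplane term
    return J
```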

Note that if two sensors are both close to the target, their energy readings have higher SNRs. Therefore, the energy ratio κ_ij computed from these energy readings will be more reliable than one computed from a pair of sensors far away from the target. Using the energy decay model, we may use the relative magnitudes of the energy readings as an indication of the target-sensor distance. As such, the error terms in (12) that correspond to sensors with higher energy readings should be given more weight than those of sensors with lower energy readings.

Statistically, to employ the least squares formulation in (12), one must assume that both the hypersphere estimation error ‖r − c_m‖ − ρ_m and the hyperplane estimation error γ_n^T r − ξ_n are linear, independent Gaussian random variables with zero mean and identical variance. Obviously, such an assumption may not hold in practice and hence may cause some performance degradation.

The cost function in (12) is nonlinear with respect to the source location vector r. In this work, we experimented with three nonlinear optimization methods to solve for r.


(a) Exhaustive search over grid points within a predefined search region in the sensor field. This approach is the most time consuming, yet the simplest to implement. The grid size determines the accuracy of the results.

(b) Multiresolution search. First, a coarse-grained exhaustive search is conducted to identify likely source locations. Then a fine-grained search is performed to refine the localization estimate.

(c) Gradient-based steepest descent search. Based on an initial source location (perhaps the previously estimated position from the last time interval), say r(0), perform the following iteration:

\[ r(k + 1) = r(k) - \mu \nabla_r J\big(r(k)\big). \tag{13} \]

The gradient of J(r) can be expressed as

\[ \nabla J(r) = 2 \sum_{m=1}^{M_1} \frac{r - c_m}{\| r - c_m \|} \big( \| r - c_m \| - \rho_m \big) + 2 \sum_{n=1}^{M_2} \gamma_n \big( \gamma_n^T r - \xi_n \big). \tag{14} \]

In addition to the above methods, other standard optimization algorithms, such as the quasi-Newton method, the conjugate gradient search algorithm, and many others can be used. For comparison purposes, in the simulation we also apply the Nelder-Mead (direct search) method implemented in the Matlab optimization toolbox to minimize J(r).
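The gradient descent option (c) follows directly from (13)-(14). A minimal sketch, using the step size µ = 0.05 and the 200-step cap from the simulations in Section 5 as illustrative defaults, is:

```python
import numpy as np

def grad_J(r, spheres, planes):
    """Gradient (14) of the cost (12); spheres is a list of
    (center, radius) pairs and planes a list of (gamma, xi) pairs."""
    g = np.zeros_like(r)
    for c, rho in spheres:
        diff = r - c
        dist = np.linalg.norm(diff)
        g += 2.0 * (diff / dist) * (dist - rho)
    for gamma, xi in planes:
        g += 2.0 * gamma * (gamma @ r - xi)
    return g

def gd_localize(r0, spheres, planes, mu=0.05, steps=200):
    """Iterate r(k+1) = r(k) - mu * grad J(r(k)) as in (13)."""
    r = np.asarray(r0, dtype=float)
    for _ in range(steps):
        r = r - mu * grad_J(r, spheres, planes)
    return r
```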

In summary, there are two different methods to solve the energy-based (single) source localization problem.

(1) Direct minimization of the nonlinear log-likelihood function L in (6). With a number of acoustic energy measurements, this method is capable of simultaneously estimating the source location r(t) as well as the source energy s(t) and the energy decay parameter α.

(2) Direct minimization of the cost function defined in (12). A potential advantage of this method is that N(N − 1)/2 pairs of energy ratios can be used for localization, rather than the N energy readings used when minimizing the likelihood function.

3.6. Unconstrained least square formulation

Consider two hyperspheres based on (8),

\[ |r(t) - c_{i0}|^2 = \rho_{i0}^2, \qquad |r(t) - c_{j0}|^2 = \rho_{j0}^2, \tag{15} \]

formed from the sensor pairs (i, 0) and (j, 0). Subtracting one from the other and cancelling the term |r(t)|², we obtain a hyperplane equation

\[ 2\big(c_{i0} - c_{j0}\big)^T r(t) = \big( |c_{i0}|^2 - \rho_{i0}^2 \big) - \big( |c_{j0}|^2 - \rho_{j0}^2 \big). \tag{16} \]

Substituting the definitions in (9), the above equation simplifies to

\[ u_{ij}^T r(t) = \theta_{ij}, \tag{17} \]

which is a linear hyperplane equation with

\[ u_{ij} = \frac{2 r_i}{1 - \kappa_i^2} - \frac{2 r_j}{1 - \kappa_j^2}, \qquad \theta_{ij} = \frac{|r_i|^2}{1 - \kappa_i^2} - \frac{|r_j|^2}{1 - \kappa_j^2}, \tag{18} \]

where κ_i is shorthand for κ_i0.

Then, the cost function in (12) can be replaced by a linear least squares cost function

\[ J_{\text{linear}}(r) = \sum_{m=1}^{M_1} \big| u_m^T r - \theta_m \big|^2 + \sum_{n=1}^{M_2} \big| \gamma_n^T r - \xi_n \big|^2. \tag{19} \]

Note that no constraint is imposed in (19). Given the coefficients, a solution for r can be found in closed form.
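In code, the closed-form solution of (19) is a stacked linear least squares solve. The sketch below assumes the hyperplane coefficients have already been computed per (10)-(11) and (17)-(18):

```python
import numpy as np

def linear_localize(U, theta, Gamma, xi):
    """Closed-form minimizer of the linear cost (19).

    U, theta  : (M1, d) and (M1,) hyperplane coefficients from (17)-(18)
    Gamma, xi : (M2, d) and (M2,) coefficients from (10)-(11)
    """
    A = np.vstack([U, Gamma])           # each row is one linear equation
    b = np.concatenate([theta, xi])
    r_hat, *_ = np.linalg.lstsq(A, b, rcond=None)
    return r_hat
```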

4. IMPLEMENTATION CONSIDERATIONS

4.1. Preprocessing: node and region energy detection

In a microsensor network, multiple acoustic sensors are deployed in a sensor field. Sensors within the same geographical region form a group. One sensor node in a group is designated as a manager node, where the collaborative energy-based source localization is performed.

During operation, individual sensor nodes run an energy-based target detection algorithm. For example, a constant false alarm rate (CFAR) detection algorithm [22, 23] can be applied. Pattern classifiers may also be used to identify the type of a detected target based on its acoustic or seismic signatures.

Upon detection of a potential target, the sensor node reports the finding to the manager node of the region. If the number of detections reported by sensors within the region exceeds a predefined threshold, the manager node decides that a target is indeed detected by the region. This implements a simple voting-based detection fusion within the region. Only after a region-wide detection is confirmed does the manager node proceed to perform energy-based source localization. Since the energy is computed on individual nodes, there is no need to recompute the acoustic energy readings at the manager node.
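The voting-based fusion rule is simple enough to state in a few lines; this sketch is illustrative, as the paper does not prescribe a data structure:

```python
def region_detection(node_detections, vote_threshold):
    """Voting-based detection fusion at the manager node: declare a
    region-wide detection when enough member nodes report a local
    (e.g., CFAR) detection.

    node_detections : dict mapping node id -> bool local decision
    """
    votes = sum(bool(v) for v in node_detections.values())
    return votes >= vote_threshold
```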

4.2. Minimum number of collaborating sensors and number of energy ratios used

In general, given N sensors, at most N(N − 1)/2 pairs of energy ratios can be computed, and an equal number of target location hyperspheres (including some hyperplanes) can be determined accordingly. The target location is the unique intersection of all these target location hyperspheres if the energy readings do not contain any measurement noise.

However, many of these relationships are actually redundant. In order to uniquely identify a single target location, in this section we determine (i) the constraint on the sensor location configuration, and (ii) the minimum number of sensors required in theory to arrive at a unique source location estimate. Regarding the sensor location configuration, we have the following results.

Lemma 1. Denote d to be the dimensionality of the sensor coordinates r_i. If all N sensors lie in a subspace of dimension d′ < d, then the centers of all target location hyperspheres must lie within the same subspace.

Proof. From (9), since c_ij is a linear combination of the sensor coordinates r_i and r_j, it must lie within the same subspace as r_i and r_j. Hence this lemma is proved.

Specifically, in a 2D (d = 2) sensor field, if all sensors lie on a straight line, then all the centers of the corresponding target location circles must lie on the same straight line. Since circles with centers on the same straight line cannot have a single point as their intersection (either no intersection, or two or more points in the intersection), it is impossible to uniquely determine the target location. The exception is when the target location is also on the same straight line. In a 3D (d = 3) sensor field, if all sensors lie on the same plane, then all the centers of the corresponding target location spheres must lie on the same plane as well. Since spheres with centers on the same plane cannot in general intersect at just a single point, the target location cannot be uniquely determined. Similarly, the exception is when the target lies on the same plane. These observations lead to the theorem below, which is stated without proof.

Theorem 1. In order to estimate a unique target location, the sensors should not all be placed in a subspace whose dimension is smaller than that of the sensor field, unless the target location is restricted to the same subspace as well.

Next, we consider the question of the minimum number of sensors needed to locate a single target.

Lemma 2. Given three arbitrarily placed sensors (say, 1, 2, and 3) in a 2D sensor field, the centers of the target location circles, c_12, c_23, and c_31, must lie on the same straight line. Moreover, the three corresponding target location circles intersect at two points if the target does not lie on that straight line, or at exactly one point if it does.

Proof. Taking a linear combination of c_12 and c_23 so as to eliminate r_2, and using the relation κ_12 κ_23 κ_31 = 1, one has

\[ \frac{1 - \kappa_{12}^2}{\kappa_{12}^2} c_{12} + \big(1 - \kappa_{23}^2\big) c_{23} = \frac{1 - \kappa_{12}^2}{\kappa_{12}^2} \left[ \frac{r_1 - \kappa_{12}^2 r_2}{1 - \kappa_{12}^2} \right] + \big(1 - \kappa_{23}^2\big) \left[ \frac{r_2 - \kappa_{23}^2 r_3}{1 - \kappa_{23}^2} \right] = \frac{r_1}{\kappa_{12}^2} - \kappa_{23}^2 r_3 = \kappa_{23}^2 \kappa_{31}^2 r_1 - \kappa_{23}^2 r_3 = -\kappa_{23}^2 \big(1 - \kappa_{31}^2\big) \left[ \frac{r_3 - \kappa_{31}^2 r_1}{1 - \kappa_{31}^2} \right] = -\kappa_{23}^2 \big(1 - \kappa_{31}^2\big) c_{31} = \frac{1 - \kappa_{12}^2 \kappa_{23}^2}{\kappa_{12}^2} c_{31}. \tag{20} \]

But

\[ \frac{1 - \kappa_{12}^2}{\kappa_{12}^2} + \big(1 - \kappa_{23}^2\big) = \frac{1 - \kappa_{12}^2 \kappa_{23}^2}{\kappa_{12}^2}. \tag{21} \]

Hence, by (20) and (21), c_31 = β c_12 + (1 − β) c_23 for some scalar β, so c_12, c_23, and c_31 must lie on the same straight line. Next, note that the true target location must be a point on each of the three corresponding target location circles. In addition, three circles with their centers located on the same straight line can intersect in at most two points, or not intersect at all. Hence, these three circles must intersect at exactly two points. When the target lies on the same straight line as the centers of these circles, the two intersection points collide into a single point. Hence, this lemma is proved.

Lemma 2 implies that, even when the three sensors are not on the same straight line, the centers of the corresponding target location circles (or spheres) still lie on the same straight line. By the argument leading to Theorem 1, three sensors are clearly insufficient to estimate a unique target location in a 2D sensor field; at least four sensor energy readings will be needed.

Lemma 2 addresses the 2D sensor field case. It can easily be generalized to the 3D sensor field case.

Lemma 3. Given four arbitrarily placed sensors in a 3D sensor field, the centers of all target location spheres must lie on the same plane. Moreover, the six corresponding target location spheres intersect at two points if the target does not lie on that plane, or at exactly one point if it does.

Proof. Label these four sensors 1 to 4. With four sensor energy readings, six energy ratios can be computed. Using Lemma 2, we conclude that

(i) c12, c13, and c23 must reside on the straight line La;

(ii) c12, c14, and c24 must reside on the straight line Lb;

(iii) c13, c14, and c34 must reside on the straight line Lc.

Lines La and Lb share the point c12; hence, they must lie on the same plane. Line Lc shares one point with each of La (namely c13) and Lb (namely c14); therefore, Lc must lie on the same plane as La and Lb. The intersections between spheres with centers on La, Lb, and Lc are circles. Three circles in 3D space intersect in at most two points. If the target also lies on this plane, then these two points collide into one.

Lemma 2 also reveals the redundancy among different energy ratios. This critical observation can be stated as a corollary as follows.

Corollary 1. Given energy ratios κ_1i and κ_1j, the energy ratio κ_ij is redundant and can be removed without affecting the solution of the target location.


Proof. Since κ_1i κ_ij κ_j1 = 1, by Lemma 2 the intersection of the target location circle (sphere) corresponding to κ_ij with either of the other two circles (spheres) is identical to the intersection of the circles (spheres) corresponding to κ_1i and κ_j1. Hence, including the target location circle (sphere) of κ_ij contributes no new information to refine the solution space. Therefore, it is redundant.

Corollary 1 naturally leads to an important result in this section.

Lemma 4. Given K sensors in a sensor field, at most K − 1 energy ratios are independent, in the sense that the target location circles (or spheres) corresponding to the remaining energy ratios do not further reduce the intersection region formed by the K − 1 target location circles (or spheres) of the independent energy ratios.

Proof. Denote sensor #1 as a reference sensor. Then take {κ_1i; 2 ≤ i ≤ K} as the set of K − 1 independent energy ratios. Any other energy ratio κ_jk, 2 ≤ j, k ≤ K, j ≠ k, will be redundant according to Corollary 1. Thus, this lemma is proved. Note that the set of K − 1 independent energy ratios is not unique and can be chosen differently.

Theorem 2. Using the energy-based target localization method, at least four sensors not all lying on the same straight line are required to locate a single target in a 2D sensor field; and at least five sensors not all lying on the same plane are required to locate a single target in a 3D sensor field.

Proof. In a 2D sensor field, at least 3 (= K − 1) circles are needed to form a single-point intersection. Thus, at least four sensor energy readings are needed. In a 3D sensor field, the intersection of two spheres is a circle. The intersection between a sphere and a circle consists of at least two points (if the intersection exists). Therefore, at least 4 (= K − 1) spheres are needed to yield a single-point intersection. Thus the minimum number of sensor energy readings needed in a 3D sensor field is five.

Figure 4 shows a simulation of target localization in a 2D sensor field using four sensors and three energy ratios.

4.3. Nonlinear optimization search parameters

In developing nonlinear optimization methods to minimize the cost function, a few parameters must be set properly to ensure the performance of this proposed algorithm.

4.3.1 Search area

The region of the potential target location can often be determined in advance, based on prior information about the target, the region to be monitored, and the sensor locations. Since acoustic energy decays exponentially with respect to distance, the receptive field of an acoustic sensor (microphone) is limited. This range can be estimated based on the maximum acoustic energy the target of interest may emit and the average background noise level due to wind and other natural or man-made sound. Furthermore, due to the need for collaborative region detection, a target is not considered detected unless a certain number of sensors vote positive detection. Hence the area in which a target may be detected should be the intersection of a minimum number of sensor receptive fields.

Figure 4: Localization of the target (star) at the (1, 1) position using four sensors (triangles). The centers of the circles are marked by small circles. Three circles corresponding to three independent equations are generated; they intersect at the target position as predicted. Parameters used: s(t) = 1, g_i = 1, and α = 2.

If a target's movement is restricted, such as along a road, then the search area can be further restricted to those areas where the target is allowed to move. These additional restrictions will enhance the accuracy of the source localization process.

4.3.2 Search accuracy

Depending on the size of the potential target and its speed, the required localization accuracy may vary. For example, for a target with a dimension (say, the length of a truck) larger than 5 meters, it would be meaningless to try to locate the target within a 1-meter grid. In addition, if the target is moving at more than 10 m/s (about 20 mph) and the time duration used to compute one energy reading is 0.5 second, then the ambiguity regarding the actual location of the target during this time period will be at least 5 meters. In this situation, any attempt to locate the target to within 5 meters will not be meaningful. Therefore, in a practical implementation, one should choose an appropriate accuracy measure.


4.3.3 Initial search location

For gradient-based and other greedy search algorithms, the initial search position is important. One way to select the initial target location estimate is to use the location of the sensor whose energy reading is the maximum among all sensors. The heuristic is that if a sensor receives higher energy, then the true target location will be closer to that sensor. In a localize-and-track scenario, the future target location can be predicted from the target's trajectory. In that case, the most likely position of the target during the present time window may be chosen as the initial search position.

4.4. Distributive implementation

This proposed EBL algorithm requires at least four sensor readings in order to yield a unique target location. Therefore, when implemented in a distributed sensor network, the acoustic energy readings would have to be reported to a centralized location to facilitate the localization processing. For deployment in a distributed wireless sensor network, it is desirable that a decentralized implementation of this proposed algorithm be devised. By “decentralized,” we mean a computation scheme such that

(i) not all the energy readings need to be reported to a centralized fusion center;

(ii) not all the computation required to evaluate the cost function (12) needs to be carried out at a centralized processing center.

This can be accomplished by noting that the cost function in (12) is a summation of independent squared error terms. Given a potential target location r, each squared error term can be evaluated within a sensor node as soon as it computes the κ value after receiving the acoustic energy reading from a neighboring sensor node. Hence, instead of transmitting the raw energy reading to the fusion center, the partially computed cost function can be transmitted instead. This way, the computation can be evenly distributed over individual sensors. This scheme, however, may increase the amount of internode wireless communication due to the need to pass around the partially computed cost function for each search grid point.
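As a sketch of this decentralized scheme (reusing cost_J from the earlier sketch; the grid and pair bookkeeping are assumptions of this illustration), each node forwards partial sums rather than raw energies, and the manager adds them up:

```python
import numpy as np

def node_partial_cost(grid, local_pairs):
    """Partial cost this node contributes for each candidate grid
    point; only these sums, not raw energy readings, are forwarded.

    grid        : (G, d) array of candidate target locations
    local_pairs : (r_i, r_j, kappa) tuples this node can form from its
                  own reading and those received from its neighbors
    """
    return np.array([cost_J(r, local_pairs) for r in grid])

def manager_localize(grid, partial_costs):
    """Sum the per-node partial costs and pick the best grid point."""
    total = np.sum(partial_costs, axis=0)   # one row per node
    return grid[int(np.argmin(total))]
```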

5. PERFORMANCE ANALYSIS

A number of factors may affect the performance of the energy-based target localization algorithm. Due to the nonlinear nature and the complexity of the model, an analytical expression is difficult to obtain and may not reveal the respective impacts of individual factors on the overall performance. In this section, extensive simulations are conducted to compare the effectiveness of different optimization algorithms as well as the sensitivity of the location estimates with respect to perturbations of various model parameters.

5.1. Comparison of different search algorithms

In this simulation, we compare four different optimization algorithms for a single-target acoustic source localization problem. For this purpose, 20 sensors are uniformly randomly distributed in a 50-meter by 50-meter sensor field. The location of the target is assumed to be within this sensor field.

The objective function is the energy ratio cost function shown in (12). Two different modes are used to implement the cost function: in mode 0, N − 1 independent energy ratios (N: number of sensors) are used to form the cost function; in mode 1, all possible N(N − 1)/2 energy ratios (with many redundant measurements) are used. The hypothesis is that, with redundant measurements included in the cost function, it may better withstand parameter perturbations.
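A small sketch of the two modes, with sensors indexed 0 to N − 1; the chained pairing used for mode 0 is an assumption, since the paper only states that N − 1 independent ratios are formed:

```python
from itertools import combinations

def ratio_pairs(n_sensors, mode):
    """Sensor index pairs whose energy ratios enter the cost function:
    mode 0 uses N-1 independent ratios, mode 1 all N(N-1)/2 ratios."""
    if mode == 0:
        return [(i, i + 1) for i in range(n_sensors - 1)]
    return list(combinations(range(n_sensors), 2))

print(len(ratio_pairs(20, 0)))  # 19, as in the (20, 19) mode used below
print(len(ratio_pairs(20, 1)))  # 190, as in the (20, 190) mode
```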

The following four search algorithms are implemented.

(1) Nelder-Mead (simplex) direct search (DS) algorithm: the initial source location is obtained by an exhaustive search at a grid size of 5 meters by 5 meters. For each new target location, this initial search evaluates the cost function 11 × 11 = 121 times, and the DS search requires additional cost function evaluations.

(2) Grid-based exhaustive search (ES) with a single grid size of 1 m × 1 m. To estimate a target location, the ES method evaluates the cost function 51 × 51 = 2601 times.

(3) Multiresolution (MR) search with three levels of resolution (grid sizes) at 5 meters (5×), 2 meters (2×), and 1 meter (1×), respectively (sketched below). The number of cost function evaluations for each new target location equals 11 × 11 + 6 × 6 + 3 × 3 = 166.

(4) Gradient descent (GD) search algorithm using the gradient expression shown in (13). The initial location is determined by ES at a grid size of 5 meters by 5 meters, with step size µ = 0.05 and at most 200 steps. The number of cost function evaluations for each new target location is 11 × 11 = 121 plus the number of gradient search steps.
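A minimal sketch of the multiresolution search in item (3), treating the energy-ratio cost function cost(r) as a black box; the (step, half-width) schedule is chosen to reproduce the 11 × 11 + 6 × 6 + 3 × 3 evaluation count for the 50-meter field and is otherwise an assumption:

```python
import numpy as np

def mr_search(cost, center=(25.0, 25.0),
              levels=((5.0, 25.0), (2.0, 5.0), (1.0, 1.0))):
    """Coarse-to-fine grid search: each (step, half-width) level scans a
    square window around the current best point (11x11, 6x6, then 3x3)."""
    best = np.asarray(center, dtype=float)
    for step, half in levels:
        xs = np.arange(best[0] - half, best[0] + half + 1e-9, step)
        ys = np.arange(best[1] - half, best[1] + half + 1e-9, step)
        best = min((np.array([x, y]) for x in xs for y in ys), key=cost)
    return best
```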

Provided that the local search using either the DS or the GD method converges within about 50 steps, the three search algorithms DS, MR, and GD require approximately the same number of cost function evaluations (∼170). On the other hand, the ES method requires roughly 15 times more cost function evaluations.

Four experiment configurations are designed to compare these search methods. In each configuration, a known fixed energy is emitted from the source. At each sensor, the received energy is computed according to the exponential energy decay model described in (4) with K = 1 and εi = 0 (SNR = ∞). Three parameters in this model are perturbed in configurations #2 to #4, respectively, as shown in Table 1; configuration #1 is the control experiment with no parameter perturbation. In configuration #2, the energy decay constant α is sampled from a uniform distribution [2 − ∆α, 2 + ∆α] with ∆α = 0.5. In configuration #3, each sensor's location r is subject to a random perturbation of magnitude ±∆r (= ±1 m) in both the x and y coordinates.



[Figure 5: Mean and standard deviation (STD) of target location estimation bias using different search algorithms. Four panels plot the mean and the STD of the x and y estimation errors for DS, ES, and MR (modes 0 and 1) and GD (mode 0) under the ctrl, dα, dr, and dg conditions.]

Table 1: Parameter settings for different configurations to compare four optimization search algorithms.

Configuration #    ∆α     ∆r     ∆g
1                  0      0      0
2                  0.5    0      0
3                  0      1      0
4                  0      0      0.5

In configuration #4, the sensor gain g is perturbed to vary between [1 − ∆g, 1 + ∆g] with ∆g = 0.5.

Each experiment is repeated 500 times using a cost function evaluated with the mode 0 setting and another 500 times with a cost function evaluated using the mode 1 setting. The mean and the STD of the estimation error along the x- and y-axes are summarized in Figure 5.

Averaged over the four different parameter settings listed in Table 1, the mean and variance of each method in both the x and y directions are listed in Table 2. Using a T-test, it is found that the differences among the mean position estimation errors of the four search methods are statistically insignificant. Hence, despite its large number of cost function evaluations, the ES method does not offer a significant benefit in terms of improving source localization accuracy. Of course, this conclusion is conditioned on the practice, implemented in this experiment, of conducting an initial coarse-grained ES (at 5-meter resolution) before commencing the three local search algorithms, namely, MR, DS, and GD.

Table 2: Mean and variance of four different optimization methods, averaged over four test conditions.

       Mean-x      Var-x       Mean-y      Var-y
ES     0.093925    5.939293    −0.042100   6.466883
MR     0.082425    6.242463    −0.030850   6.671392
DS     0.086488    8.287125    −0.053988   8.492783
GD     0.074825    3.145920    0.029475    3.343724

Without this initial ES, these local search methods may be trapped in a local minimum that yields a much larger position estimation error.

The simulation results can also be used to compare the effectiveness of evaluating the cost function using the mode #0 (minimum number of N − 1 energy ratios) versus the mode #1 (maximum number of N(N − 1)/2 energy ratios) configurations. The results are listed in Table 3.

When the gain variation results are included, mode #1 performs worse than mode #0. This is because an erroneous energy reading is used to compute N − 1 energy ratios in the mode #1 configuration but only one energy ratio in the mode #0 configuration; hence, the same amount of error on a single sensor reading has a bigger impact in mode #1 than in mode #0. However, excluding the gain variation factor, mode #1 in general performs much better than mode #0. This result indicates that gain calibration of the microphones is essential to the success of the energy-based source localization method presented in this paper. This point is also clearly illustrated in Figure 5.



Table 3: Comparison between mode #0 and mode #1 results, averaged over all four different methods, with different parameter variations.

Include dg    Mean-x      Var-x       Mean-y      Var-y
Mode-0        0.081675    4.087787    0.0216      4.115982
Mode-1        0.091267    9.244179    −0.10443    10.0473

Exclude dg    Mean-x      Var-x       Mean-y      Var-y
Mode-0        0.044933    2.959709    0.048542    2.694365
Mode-1        0.001889    0.573453    −0.00394    0.107678

5.2. Sensitivity analysis to parameter perturbations

In the previous section, we compared the performance of four different search methods. In this section, we investigate how the accuracy of the energy-based source localization method is affected by inaccurate measurements of parameters or by the presence of noise.

5.2.1 Factors affecting localization accuracy

(a) Energy decay exponent α. Although we have conducted a preliminary experiment and determined that the acoustic energy decay exponent α is approximately 2, this result was obtained using a point, omnidirectional sound source in a favorable environment where the breeze was gentle and the temperature mild. It is likely that this parameter will vary in different situations. Thus, it is important to understand how sensitive the localization result is with respect to an inaccurate estimate of the value of α.

(b) Sensor coordinate measurement ri. Sensor coordinates can be obtained using onboard global positioning system (GPS) readings if such a device is available. However, highly accurate sensor location measurements would require long-term averaging of GPS readings and may consume extensive battery power. It is necessary to study the impact of sensor location inaccuracy on the accuracy of energy-based target localization.

(c) Acoustic sensor gain measurement gi. Not all acoustic sensors are identical; different sensors may exhibit different gain characteristics. Thus, it is crucial to calibrate the gain factor of individual acoustic sensors. It is also important to gauge the effect of gain calibration error on the target localization accuracy.

(d) Acoustic energy measurement—signal-to-noise ratio (SNR). As discussed earlier, the acoustic energy is usually averaged over a predefined time window as the sum of squares of the acoustic time series data (with the mean subtracted). Energy readings estimated this way may contain the energy of the background noise. If the noise time series is modeled as a white Gaussian random process, its energy has a χ2 distribution. However, if the number of time samples within each time window is sufficiently large, then by the central limit theorem the noise energy can be modeled with an equivalent Gaussian random process. Note that although the noise energy level is likely to be the same over neighboring sensor nodes, the source energy measured at different sensor nodes differs according to the energy decay model.

Table 4: Parameter settings for the experiments to examine the sensitivity of localization to parameter perturbations.

Configuration #    Grid size    ∆α     ∆r     ∆g     SNR (dB)
1                  1 × 1        0      0      0      ∞
2                  5 × 5        0      0      0      ∞
3                  10 × 10      0      0      0      ∞
4                  1 × 1        0.2    0      0      ∞
5                  1 × 1        0.5    0      0      ∞
6                  1 × 1        1      0      0      ∞
7                  1 × 1        0      0.5    0      ∞
8                  1 × 1        0      1      0      ∞
9                  1 × 1        0      5      0      ∞
10                 1 × 1        0      0      0.2    ∞
11                 1 × 1        0      0      0.5    ∞
12                 1 × 1        0      0      1      ∞
13                 1 × 1        0      0      0      20
14                 1 × 1        0      0      0      10
15                 1 × 1        0      0      0      0

In fact, due to energy decay, the SNR is reduced by 10α log10 ‖r − ri‖ dB relative to the SNR at 1 meter, provided that the background noise energy levels at all sensors are the same. If α ≈ 2, this means a 20 dB SNR reduction for every tenfold increase in distance; the SNR measured at a sensor 50 meters away from the source will therefore be about 34 dB less than the SNR measured at 1 meter from the same source.

5.2.2 Simulation method

In this experiment, 20 randomly located sensors are used to locate a randomly placed target; both are located within a predefined sensor field. We use the ES algorithm to minimize the cost function. As listed in Table 4, 15 configurations are designed for this experiment. The first three configurations are designed to compare the effect of different grid sizes for the ES; three grid resolutions of 1 meter, 5 meters, and 10 meters are used. The purpose of configurations #4 to #6 is to examine the algorithm's sensitivity with respect to variations of the exponential decay factor α; the actual value of α is randomly drawn from the interval [α − ∆α, α + ∆α] with ∆α = 0.2, 0.5, and 1. Configurations #7 to #9 are designed to examine the effect of inaccurate sensor location measurements. Each sensor location vector r is randomly perturbed as r + ∆r, where ∆r = [∆x, ∆y] and ∆x, ∆y are both random variables uniformly distributed over an interval (in meters) of [−0.5, 0.5], [−1, 1], or [−5, 5]. In configurations #10 to #12, we examine the impact of inaccuracy in the acoustic sensor gain; the actual sensor gain is drawn randomly from a uniform distribution [1 − ∆g, 1 + ∆g]. Our aim in designing configurations #13 to #15 is to examine the effects of different SNRs. The energy variations in these configurations, specified in dB, are measured at 1 meter away from the source. As discussed earlier, the actual SNR at each sensor varies, depending on the relative distance to the source.



Table 5: Mean (bias) and STD of simulation results using different grid sizes.

Bias            x-coordinate                        y-coordinate
(N, M)     1 × 1     5 × 5      10 × 10     1 × 1     5 × 5      10 × 10
(20, 19)   0.041     −0.0434    0.0649      0.0576    0.1755     0.1615
(10, 9)    0.009     0.0216     −0.0451     0.0396    0.2005     0.0915
(5, 4)     0.01      0.0516     0.1149      0.0316    0.0755     0.0115
(20, 190)  −0.007    0.0216     −0.1151     0.0176    −0.0045    −0.0285
(10, 45)   −0.006    −0.0134    −0.1151     0.0166    −0.0395    0.0015
(5, 10)    −0.01     0.0366     −0.0151     0.0146    0.0305     −0.1285

STD             x-coordinate                        y-coordinate
(N, M)     1 × 1     5 × 5      10 × 10     1 × 1     5 × 5      10 × 10
(20, 19)   0.7792    2.7691     4.4097      0.7147    2.7548     4.2629
(10, 9)    0.673     2.4047     4.0726      0.6453    2.418      4.0569
(5, 4)     0.83      2.5052     4.0247      0.721     2.71       4.2094
(20, 190)  0.3036    1.4664     3.0664      0.296     1.4916     2.9337
(10, 45)   0.3054    1.5007     3.111       0.2978    1.5019     2.9642
(5, 10)    0.3307    1.5853     3.3359      0.3151    1.5612     3.2578

SNR = ∞ implies that there is no noise, that is, ε = 0; SNR = 0 means that the noise energy is equal to the source energy. The perturbations on r, g, and SNR are applied to all individual sensors.

As in the previous experiment, different numbers of sensors and numbers of energy ratios may affect the localization accuracy. To better understand their impact, we devised six different modes, denoting each combination by a vector (N, M), where N is the number of sensors used and M is the number of energy ratios used. These modes are (20, 19), (10, 9), (5, 4), (20, 190), (10, 45), and (5, 10). In the first three modes, M = N − 1; in the last three, M = N(N − 1)/2. For each configuration and each mode, 1000 independent simulations are performed, and the mean and STD of the results in both the x and y directions are computed for further analysis.

5.2.3 Results and discussions

(a) Different grid sizes (search resolution). The simulation results corresponding to configurations #1 to #3 are listed in Table 5.

The following two observations are worth noting.

(i) Bias—the energy-based source localization method yields an unbiased estimate at each of the three grid sizes.

(ii) Variance—suppose that the target location is uniformly and randomly distributed within a grid cell of side δ; then the expected STD of the position estimation error is δ/√12 ≈ 0.2887δ in each of the x- and y-directions. From Table 5, it is clear that when the maximum number of energy ratios is used, that is, M = N(N − 1)/2, the position estimation error approaches this lower bound. On the other hand, when M = N − 1, the variances are uniformly larger; this is more prominent when the grid size is small. Our conjecture is that the cost function formed using N − 1 energy ratios does not have the same global minimum as the cost function formed using N(N − 1)/2 energy ratios.
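This lower bound is simply the standard deviation of a uniform quantization error: for a grid cell of side δ, the per-coordinate error u is uniform on [−δ/2, δ/2], so

```latex
\operatorname{Var}(u) = \int_{-\delta/2}^{\delta/2} \frac{u^2}{\delta}\, du
                      = \frac{\delta^2}{12},
\qquad
\operatorname{STD}(u) = \frac{\delta}{\sqrt{12}} \approx 0.2887\,\delta .
```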

(b) Variation on α—the results corresponding to configurations #1, 4, 5, and 6 are listed in Table 6.

Again, we make two observations on this table.

(i) Bias—the variation of the energy decay exponent α has little effect on the bias of the estimation error.

(ii) STD—the variation of α did impact the results when M = N − 1: the more sensors are used, the larger the STD. On the other hand, when M = N(N − 1)/2, a variation of α as large as 1 (i.e., α varying between 1 and 3) has little effect on the STD of the location estimation error. This is important evidence justifying the use of a nominal value of α = 2, provided the maximum number of energy ratios is included in the cost function definition.

(c) Variations on sensor position error r—the results are summarized in Table 7.

As in the previous cases, the sensor location errors do not impose any bias on the location estimates. What differs from the previous cases is that the STD of the estimation errors is similar whether M = N − 1 or M = N(N − 1)/2 energy ratios are used.

(d) Variations on sensor gain g—the results are summarized in Table 8.

Consistent with the results obtained in the previous experiment, the energy-based source localization algorithm is quite sensitive to errors in gain calibration. In particular, in terms of STD, two important trends can be observed from Table 8.




Table 6: Mean and STD of position estimate errors due to variation of α.

Bias            x-coordinate                                  y-coordinate
(N, M)     ∆α = 0    ∆α = 0.2   ∆α = 0.5   ∆α = 1     ∆α = 0    ∆α = 0.2   ∆α = 0.5   ∆α = 1
(20, 19)   0.041     −0.07      −0.0347    −0.2275    0.0576    0.0279     −0.0236    −0.0062
(10, 9)    0.009     −0.086     −0.0047    −0.0875    0.0396    0.0189     −0.0116    −0.0082
(5, 4)     0.01      −0.055     0.0593     −0.0395    0.0316    0.0229     0.0104     −0.0252
(20, 190)  −0.007    0.004      −0.0177    0.0075     0.0176    0.0039     −0.0026    −0.0072
(10, 45)   −0.006    0.003      −0.0097    0.0035     0.0166    0.0099     −0.0046    −0.0002
(5, 10)    −0.01     0.004      −0.0027    0.0145     0.0146    0.0129     −0.0076    −0.0002

STD             x-coordinate                                  y-coordinate
(N, M)     ∆α = 0    ∆α = 0.2   ∆α = 0.5   ∆α = 1     ∆α = 0    ∆α = 0.2   ∆α = 0.5   ∆α = 1
(20, 19)   0.7792    0.8585     1.4971     2.9646     0.7147    0.9253     1.4706     2.9502
(10, 9)    0.673     0.7729     1.1727     2.183      0.6453    0.7842     1.205      2.1486
(5, 4)     0.83      0.7788     1.2808     2.1823     0.721     0.8247     1.4049     1.993
(20, 190)  0.3036    0.3007     0.2841     0.2935     0.296     0.2993     0.2962     0.291
(10, 45)   0.3054    0.3037     0.2833     0.2979     0.2978    0.2996     0.3007     0.2898
(5, 10)    0.3307    0.3225     0.3273     0.3549     0.3151    0.3259     0.3276     0.3286

Table 7: Mean and STD of source location estimation error due to different sensor location errors.

Bias            x-coordinate                                       y-coordinate
(N, M)     d(r) = 0   d(r) = 0.5   d(r) = 1   d(r) = 5    d(r) = 0   d(r) = 0.5   d(r) = 1   d(r) = 5
(20, 19)   0.041      0.0245       −0.0088    −0.0186     0.0576     0.0136       −0.0121    0.1262
(10, 9)    0.009      0.0195       −0.0848    0.0074      0.0396     0.0586       0.0149     0.1362
(5, 4)     0.01       0.0505       −0.0538    0.0274      0.0316     0.0546       −0.0031    −0.0088
(20, 190)  −0.007     0.0005       0.0232     −0.0876     0.0176     −0.0064      0.0109     0.0852
(10, 45)   −0.006     0.0185       0.0132     0.0154      0.0166     −0.0074      0.0829     0.0092
(5, 10)    −0.01      0.0235       −0.0378    0.1484      0.0146     0.0176       −0.0211    −0.0978

STD             x-coordinate                                       y-coordinate
(N, M)     d(r) = 0   d(r) = 0.5   d(r) = 1   d(r) = 5    d(r) = 0   d(r) = 0.5   d(r) = 1   d(r) = 5
(20, 19)   0.7792     0.873        1.0054     3.5845      0.7147     0.842        1.0074     3.5751
(10, 9)    0.673      0.8418       1.0797     3.7525      0.6453     0.8391       1.0405     3.594
(5, 4)     0.83       1.061        1.6229     4.093       0.721      1.1529       1.6245     4.4003
(20, 190)  0.3036     0.3672       0.5774     4.8538      0.296      0.3664       0.5541     4.9591
(10, 45)   0.3054     0.4718       0.8653     4.0243      0.2978     0.4458       0.8016     3.8904
(5, 10)    0.3307     0.8271       1.5717     3.9941      0.3151     0.8392       1.7502     4.2185

(i) More sensors give worse results. Apparently, more sensors with erroneous gain factors significantly distort the shape of the cost function and therefore the location of its minimum.

(ii) Using M = N − 1 or M = N(N − 1)/2 yields approximately the same quality of results. The balance is slightly tilted toward the former; however, the difference is not statistically significant.

The key lesson learned from these three configurations is that sensor gain calibration is crucial to the success of this algorithm. Hence, each sensor should be calibrated before deployment in the field.
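One conceivable calibration procedure, sketched under the assumption that the received energy follows a decay model of the form E = g · S/d^α as in (4); the helper and its interface are illustrative, not the paper's procedure:

```python
def calibrate_gain(measured_energy, source_energy, distance_m, alpha=2.0):
    """Estimate a sensor's gain from a reference source of known energy
    S placed at a known distance d, assuming E = g * S / d**alpha."""
    return measured_energy * distance_m ** alpha / source_energy
```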

(e) Variations on SNR—the results are summarized in Table 9.

The effect of additive background noise is similar to that of sensor gain perturbation: both affect the accuracy of the energy measurements at each sensor. From Table 9, one observes that

(i) the more sensors are used, the larger the STD. Apparently, the energy estimation errors do not cancel each other when more sensor readings are used.

(ii) other than at SNR = ∞, the two modes M = N − 1 and M = N(N − 1)/2 yield approximately the same standard deviation. The differences increase when more sensors are used.




Table 8: Mean and STD of localization error for different sensor gain values.

Mean            x-coordinate                                       y-coordinate
(N, M)     d(g) = 0   d(g) = 0.2   d(g) = 0.5   d(g) = 1    d(g) = 0   d(g) = 0.2   d(g) = 0.5   d(g) = 1
(20, 19)   0.041      0.0869       0.0152       0.0727      0.0576     0.0148       0.0143       0.3178
(10, 9)    0.009      0.0369       −0.0058      0.0977      0.0396     0.0268       0.0043       0.1988
(5, 4)     0.01       0.0619       0.0512       −0.0553     0.0316     −0.0582      0.0253       0.1958
(20, 190)  −0.007     0.1429       0.1452       0.0217      0.0176     0.0168       0.2463       0.5738
(10, 45)   −0.006     0.0519       0.1922       0.1137      0.0166     0.0808       0.1133       0.2618
(5, 10)    −0.01      0.0489       0.0292       0.0067      0.0146     −0.0022      0.0233       0.2008

STD             x-coordinate                                       y-coordinate
(N, M)     d(g) = 0   d(g) = 0.2   d(g) = 0.5   d(g) = 1    d(g) = 0   d(g) = 0.2   d(g) = 0.5   d(g) = 1
(20, 19)   0.7792     1.3112       0.852        9.0153      0.7147     1.2086       3.5631       9.1049
(10, 9)    0.673      1.5515       3.2465       6.5942      0.6453     1.5616       3.1859       6.3942
(5, 4)     0.83       2.0879       3.213        4.9774      0.721      1.9148       3.2588       4.992
(20, 190)  0.3036     2.9657       7.4987       10.2708     0.296      2.7917       7.0885       10.0506
(10, 45)   0.3054     2.3371       4.2964       6.0754      0.2978     2.1831       4.1489       5.9161
(5, 10)    0.3307     2.2247       3.3303       4.526       0.3151     2.1157       3.3239       4.3258

Table 9: Mean and STD of position estimation error due to background noise.

Mean            x-coordinate                                     y-coordinate
(N, M)     SNR = ∞    20 dB      10 dB      0 dB       SNR = ∞    20 dB      10 dB      0 dB
(20, 19)   0.041      −0.1776    0.1094     −0.082     0.0576     0.2093     0.0268     0.5168
(10, 9)    0.009      −0.2166    0.1604     0.08       0.0396     0.1833     0.0228     0.2688
(5, 4)     0.01       −0.1146    0.1784     −0.035     0.0316     0.0243     −0.1152    0.1878
(20, 190)  −0.007     −0.6146    0.4334     −0.202     0.0176     0.2893     −0.0312    0.7268
(10, 45)   −0.006     −0.1546    0.3234     −0.111     0.0166     0.3033     −0.0822    0.3788
(5, 10)    −0.01      −0.0896    0.1784     −0.103     0.0146     −0.0397    −0.0312    0.1738

STD             x-coordinate                                     y-coordinate
(N, M)     SNR = ∞    20 dB      10 dB      0 dB       SNR = ∞    20 dB      10 dB      0 dB
(20, 19)   0.7792     5.6954     8.1004     9.837      0.7147     5.5849     7.5979     9.4387
(10, 9)    0.673      3.8446     5.6287     6.7225     0.6453     3.9481     5.2083     6.4093
(5, 4)     0.83       2.8921     4.4332     5.2167     0.721      3.1086     3.9147     5.2041
(20, 190)  0.3036     10.9375    12.6961    13.5552    0.296      10.9602    12.4986    13.3184
(10, 45)   0.3054     5.1254     6.5798     7.2908     0.2978     5.2017     6.5202     7.1348
(5, 10)    0.3307     3.0126     4.5768     5.0284     0.3151     3.1965     4.1028     4.9295

We must note that for a practical vehicle target, the SNR at the source is often much higher than 40 dB. A condition of 0 dB or worse may occur when strong wind blows directly into a microphone without wind-damper protection, or when the microphone is hit by blowing debris or similar interference.

5.2.4 Discussion

Based on the above two experiments, one may deduce the following guidelines for the proper implementation of the energy-based acoustic source localization algorithm:

(i) properly define the sensor field within which the potential target will lie;

(ii) carefully calibrate the sensor gain factors;

(iii) use one of the fast search algorithms (MR, GD, or simplex DS) after first conducting a coarse-grained ES within the sensor field;

(iv) prefer a few reliable energy readings from a few sensors to many unreliable energy readings from more sensors; if the accuracy of individual energy readings can be assessed, unreliable sensor readings can be pruned to enhance the overall localization accuracy;

(v) using more energy ratios (i.e., M = N(N − 1)/2) often yields more reliable results.



[Figure 6: Comparison between EBL and CPA localization methods. Top row: distribution of errors for EBL and CPA (scatter over ±300 m); bottom row: EBL and 2D CPA position error histograms.]

5.3. Comparison with other acoustic localization methods

The energy-based single acoustic source localization method presented above differs from other existing methods in a number of important aspects, as follows.

(1) Target positions are estimated at a constant time interval—with the CPA-based approach, a new target location is obtained only when the moving target passes another sensor. If the target stops and remains stationary for a period of time, no additional CPA detection will be made. With the energy-based source localization method, as long as the target continues to emit acoustic energy, its location will be estimated at a regular time interval, even when the target vehicle is idling and remains stationary. This significantly simplifies the task of the tracking algorithm.

(2) The energy-based method reduces communication requirements over wireless channels and hence conserves power—energy is a scalar quantity computed over a number of data samples. How often an energy reading is computed can easily be adjusted to meet the performance requirement as well as the communication bandwidth and energy consumption constraints. Time delay-based localization methods require accurate estimates of the relative time delays (or phase differences in the frequency domain) between different sensors; hence, they may require more raw data samples or corresponding frequency components to be exchanged between sensor nodes.

We conducted a preliminary experiment comparing the proposed EBL algorithm with the 2D CPA algorithm. A sensor field of 300 meters by 300 meters is deployed with eight acoustic sensors at random locations. The target location is also randomly chosen within the same sensor field; both sensor locations and target locations are drawn from a uniform distribution. The measured sensor locations, however, are assumed to suffer a measurement error that is uniformly distributed over [−0.5, 0.5] meters. The acoustic sensor gain g is assumed to vary between 0.6 and 1.2 relative to a calibrated value of 1. Each sensor is also subject to additive zero-mean Gaussian noise at an SNR of 20 dB.



Table 10: Mean and STD of the estimation error.

              EBL (x)      EBL (y)      CPA (x)      CPA (y)
Mean value    −0.14873     −0.60246     0.41733      −0.72433
STD           49.0514      46.5717      48.292       53.8862

The source energy level is fixed at a value of 1000.

For the 2D CPA method, the measured location of the sensor receiving the maximum acoustic energy is used as the estimate of the target location. For the EBL method, a search grid of 10 meters on each side is used to enable an ES. The experiment contains 1000 independent trials. In each trial, the sensor locations, the target location, the perturbations on the sensor location measurements, the sensor gain variations, and the additive noise are generated according to the specified distributions.

The mean and STDs of the target position estimation errors of these two methods are listed in Table 10.

The results are summarized in Figure 6. The ellipses in the top row depict the covariance matrices of these errors, with each grey dot representing the error incurred in a particular trial. The histograms of the magnitudes of the position estimation errors are depicted in the bottom row.

6. DISCUSSION AND CONCLUSION

In this paper, we have presented the energy-based source localization algorithm and derived theoretical results on the number of sensors required to yield a unique location estimate. We have also conducted extensive simulations to compare different search algorithms and to study the parameter sensitivity characteristics of the proposed algorithm.

An implicit advantage of the proposed algorithm is its simplicity: only the acoustic energy measured during a specific period is needed. However, this simplicity also implies many practical difficulties that need to be mitigated. In particular, we note that microphone gain calibration and SNR estimation are two key factors that affect the accuracy of the proposed algorithm.

Currently, we are working to apply this algorithm to real data obtained at the test ground. We are also studying potential extensions of this algorithm to localize more than a single target within the sensor field.

ACKNOWLEDGEMENTS

This work was partly supported by DARPA under Grant no. F 30602-00-2-0555. The authors would like to thank Professors A. Sayeed, P. Ramanathan, and K. Saluja at UW-Madison, Prof. K. Yao, and Dr. R. E. Hudson at the University of California, Los Angeles, for helpful discussions. The acoustic energy decay profile experiment was conducted with the assistance of the UW-Madison SensIT team graduate students K. C. Wang, T. Chin, T. Clouquerur, V. Phipatanasuphcorn, A. Ashraf,

A. D’Costa, and M. Duarte. The authors would also like to extend their gratitude to the anonymous reviewers for their very constructive and helpful comments. In particular, Sections 3.6, 4.4, and 5.3 were added upon their suggestions.


Dan Li received the B.S.E.E. degree from Tsinghua University, Beijing, China, in 1996; the M.S. degree in biomedical engineering from the University of Kentucky in 1999; and the M.S.E.E. degree from the University of Wisconsin-Madison in 2001. He is currently a Research and Development Engineer at the Guidant Corporation, St. Paul, Minn, USA. His research interests are in algorithm development and applications of DSP and statistical signal processing.

Yu Hen Hu received the B.S.E.E. degree from National Taiwan University, Taiwan, in 1976, and the M.S. and Ph.D. degrees, both in electrical engineering, from the University of Southern California, Los Angeles, Calif, in 1980 and 1982, respectively. Currently, he is a Professor in the Electrical and Computer Engineering Department of the University of Wisconsin-Madison, Wis, USA. Previously, he was with the Electrical Engineering Department of Southern Methodist University, Dallas, Tex, USA. Dr. Hu's research interests include multimedia signal processing, design methodology and implementation of signal processing algorithms and systems, sensor networks and distributive signal processing algorithms, and neural network signal processing. He has published more than 200 journal and conference papers and edited two books: Programmable Digital Signal Processors and Handbook of Neural Network Signal Processing. Dr. Hu is a Fellow of the IEEE. He has served as Associate Editor for the IEEE Transactions on Signal Processing, IEEE Signal Processing Letters, Journal of VLSI Signal Processing, and EURASIP Journal on Applied Signal Processing. He has also served as Secretary of the IEEE Signal Processing Society, on the Board of Governors of the IEEE Neural Networks Council, and as Chair of the IEEE Signal Processing Society Neural Network Signal Processing Technical Committee.


EURASIP Journal on Applied Signal Processing 2003:4, 338–347
© 2003 Hindawi Publishing Corporation

The Fusion of Distributed Microphone Arrays for Sound Localization

Parham Aarabi
Department of Electrical and Computer Engineering, University of Toronto, Toronto, Ontario, Canada M5S 3G4
Email: [email protected]

Received 1 November 2001 and in revised form 2 October 2002

This paper presents a general method for the integration of distributed microphone arrays for the localization of a sound source. The recently proposed sound localization technique known as SRP-PHAT is shown to be a special case of the more general microphone array integration mechanism presented here. The proposed technique utilizes spatial likelihood functions (SLFs) produced by each microphone array and integrates them using a weighted addition of the individual SLFs. This integration strategy accounts for the different levels of access that a microphone array has to different spatial positions, resulting in an intelligent integration strategy that weighs the results of reliable microphone arrays more significantly. Experimental results using ten 2-element microphone arrays show a reduction in the sound localization error from 0.9 m to 0.08 m at a signal-to-noise ratio of 0 dB. The proposed technique also has the advantage of being applicable to multimodal sensor networks.

Keywords and phrases: microphone arrays, sound localization, sensor integration, information fusion, sensor fusion.

1. INTRODUCTION

The localization of sound sources using microphone arrays has been extensively explored in the past [1, 2, 3, 4, 5, 6, 7]. Its applications include, among others, intelligent environments and automatic teleconferencing [8, 9, 10, 11]. In all of these applications, a single microphone array of various sizes and geometries has been used to localize the sound sources using a variety of techniques.

In certain environments, however, multiple microphone arrays may be operating [9, 11, 12, 13]. Integrating the results of these arrays may result in a more robust sound localization system than that obtained with a single array. Furthermore, in large environments such as airports, multiple arrays are required to cover the entire space of interest. In these situations, there will be regions in which multiple arrays overlap in the localization of the sound sources, and integrating the results of the multiple arrays may yield a more accurate localization than that obtained by the individual arrays.

Another matter that needs to be taken into consideration for large environments is the level of access of each array to different spatial positions. It is clear that as a speaker moves farther away from a microphone array, the array becomes less effective in localizing the speaker due to the attenuation of the sound waves [14]. The manner in which the localization errors increase depends on the background signal-to-noise ratio (SNR) of the environment and the array geometry. Hence, given the same background SNR and geometry for two different arrays, the array closer to the speaker will, on average, yield more accurate location estimates than the array that is farther away. Consequently, a symmetrical combination of the results of the two arrays may not yield the lowest error, since more significance should be placed on the results of the array closer to the speaker. Two questions arise at this point. First, how do we estimate, or even define, the different levels of access that a microphone array may have to different spatial positions? Second, if we do have a quantitative level-of-access definition, how do we integrate the results of multiple arrays while accounting for the different levels of access?

In order to accommodate variations in the spatial observability of each sensor, this paper proposes the spatial observability function (SOF), which gives a quantitative indication of how well a microphone array (or a sensor in general) perceives events at different spatial positions. Also, each microphone array has a spatial likelihood function (SLF), which reports the likelihood of a sound source at each spatial position based on the readings of that microphone array [8, 13, 15]. It is then shown, using simulations and experimental results, that the SOFs and SLFs of different microphone arrays can be combined to produce a robust sound localization system utilizing multiple microphone arrays. The proposed microphone array integration strategy is shown to be equivalent, in the case that all arrays have equal access, to the array integration strategies previously proposed [7, 12].



2. BASIC SOUND LOCALIZATION

Sound localization is accomplished by using differences in the sound signals received at different observation points to estimate the direction, and eventually the actual location, of the sound source. For example, the human ears, acting as two different sound observation points, enable humans to estimate the direction of arrival of a sound source. Assuming that the sound source is modeled as a point source, two different clues can be utilized in sound localization. The first clue is the interaural level difference (ILD). Emanated sound waves have a loudness that gradually decays as the observation point moves farther away from the source [6]; this decay is proportional to the square of the distance between the observation point and the source location.

Knowledge about the ILD at two different observation points can be used to estimate the ratio of the distances between each observation point and the sound source location. Knowing this ratio, as well as the locations of the observation points, allows us to constrain the sound source location [6]. Another clue that can be utilized for sound localization is the interaural time difference (ITD), more commonly referred to as the time difference of arrival (TDOA). Assuming that the distance between each observation point and the sound source is different, the sound waves produced by the source arrive at the observation points at different times due to the finite speed of sound.

Knowledge about the TDOA at the different observation points and the velocity of sound in air can be used to estimate the difference in the distances of the observation points to the sound source location. This difference in distances constrains the sound source location to a hyperbola in two dimensions, or a hyperboloid in three dimensions [8].

By having several sets of observation point pairs, it becomes possible to use both the ILD and the TDOA results in order to accurately localize sound sources. In reality, for speech localization, TDOA-based location estimates are much more accurate and robust than ILD-based location estimates, which are mainly effective for signals with higher-frequency components rather than signals with components at lower frequencies [16]. As a result, most state-of-the-art sound localization systems rely mainly on TDOA results [1, 3, 4, 8, 17].

There are many different algorithms that attempt to estimate the most likely TDOA between a pair of observers [1, 3, 18]. Usually, these algorithms have a heuristic measure that estimates the likelihood of every possible TDOA and selects the most likely value. There are generally three classes of TDOA estimators: the generalized cross-correlation (GCC) approach, the maximum likelihood (ML) approach, and the phase transform (PHAT) or frequency whitening approach [3]. All these approaches attempt to filter the cross correlation in an optimal or suboptimal manner and then select the time index of the peak of the result as the TDOA estimate. A simple model of the signals received by two microphones is [3]

x1(t) = h1(t) ∗ s(t) + n1(t),
x2(t) = h2(t) ∗ s(t − τ) + n2(t).    (1)

The two microphones receive a time-delayed version of the source signal s(t), each through a channel with a possibly different impulse response, h1(t) or h2(t), as well as a microphone-dependent noise signal, n1(t) or n2(t). The main problem is to estimate τ given the microphone signals x1(t) and x2(t). Assuming X1(ω) and X2(ω) are the Fourier transforms of x1(t) and x2(t), respectively, a common solution to this problem is the GCC shown below [3, 7]:

τ̂ = arg max_β ∫_{−∞}^{∞} W(ω) X1(ω) X̄2(ω) e^{jωβ} dω,    (2)

where τ̂ is an estimate of the original source signal delay between the two microphones and X̄2(ω) denotes the complex conjugate of X2(ω). The choice of the weighting function W(ω) has been studied at length for general sound and speech sources; three different choices, the ML [3, 19], the PHAT [3, 17], and the simple cross correlation [6], are shown below:

WML(ω) = |X1(ω)| |X2(ω)| / (|N1(ω)|² |X2(ω)|² + |N2(ω)|² |X1(ω)|²),
WPHAT(ω) = 1 / |X1(ω) · X2(ω)|,
WUCC(ω) = 1,    (3)

where N1(ω) and N2(ω) are the estimated noise spectra for the first and second microphones, respectively.

The ML weights require knowledge of the spectra of the microphone-dependent noises. The PHAT does not require this knowledge and hence has been employed more often due to its simplicity. The unfiltered cross correlation (UCC) does not utilize any weighting function.
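As an illustration, a minimal GCC-PHAT TDOA estimator in the spirit of (2) and (3) might look as follows (a sketch: windowing, peak interpolation, and the noise-spectrum estimates needed for the ML weights are omitted):

```python
import numpy as np

def gcc_phat_tdoa(x1, x2, fs):
    """PHAT-weighted GCC estimate of the delay of x2 relative to x1
    (tau in the signal model (1)), in seconds."""
    n = len(x1) + len(x2)                      # zero-pad against wrap-around
    X1, X2 = np.fft.rfft(x1, n), np.fft.rfft(x2, n)
    cross = np.conj(X1) * X2
    cross /= np.maximum(np.abs(cross), 1e-12)  # PHAT whitening, cf. (3)
    cc = np.fft.irfft(cross, n)
    lags = np.concatenate((np.arange(0, n // 2 + 1),
                           np.arange(-(n - n // 2 - 1), 0)))
    return lags[np.argmax(np.abs(cc))] / fs
```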

3. SPATIAL LIKELIHOOD FUNCTIONS

Often, it is beneficial to record not only the most likely TDOA but also the likelihoods of other TDOAs [1, 15] in order to contrast the likelihood of a speaker at different spatial positions. Producing an array of likelihood parameters that correspond either to the direction or to the position of the sound source can be interpreted as generating an SLF [12, 14, 20]. Each microphone array, consisting of as little as 2 microphones, can produce an SLF for its environment.

An SLF is essentially an approximate (or noisy) measurement of the posterior likelihood P(φ(x)|X), where X is a matrix of all the signal samples in a 10–20-ms time segment obtained from a set of microphones and φ(x) is the event that there is a speaker at position x. Often, the direct computation of P(φ(x)|X) is not possible (or tractable), and as a result, a variety of methods have been proposed to efficiently measure

e(x) = ψ(P(φ(x) | X)),    (4)



[Figure 1: SLF over the spatial x- and y-axes, with the dark regions corresponding to a higher likelihood and the light regions corresponding to a lower likelihood.]

where ψ(t) is a monotonically nondecreasing function of t. The reason for requiring a monotonically nondecreasing function is that we only care about the relative values (at different spatial locations) of the posterior likelihood, and hence any monotonically nondecreasing function of it suffices for this comparison.

In this paper, whenever we define or refer to an SLF, it is inherently assumed that the SLF is related to the posterior estimate of a speaker at position x, as defined by (4).

The simplest SLF generation method is to use the unfiltered cross correlation between two microphones, as shown in Figure 1. Assuming that τ(x) is the TDOA between the two microphones for a sound source at position x, we can define the cross-correlation-based SLF as

e(x) = ∫_{−∞}^{∞} X1(ω) X̄2(ω) e^{jωτ(x)} dω.    (5)

The use of the cross correlation for the posterior likelihood estimate merits further discussion. The cross correlation is essentially an observational estimate of P(X|φ(x)), which is related to the posterior estimate as follows:

P(φ(x) | X) = P(X | φ(x)) P(φ(x)) / P(X).    (6)

The probability P(φ(x)) is the prior probability of a speaker at position x, which we define as ρx. When using the cross correlation (or any other observational estimate) to estimate the posterior probability, we must take into account the "masking" of different positions caused by ρx. Note that the P(X) term is not a function of x and hence can be neglected, since, for a given signal matrix, it does not change the relative value of the SLF at different positions. In cases where all spatial positions have an equal probability of a speaker (i.e., ρx is constant over x), the masking effect is just a constant scaling of the observational estimate, and only in such a case do we get the posterior estimate of (5).

SLF generation using the unfiltered cross correlation is often referred to as a delay-and-sum beamformer-based energy scan or as steered response power (SRP). Using a simple or filtered cross correlation to obtain the likelihoods of different TDOAs and using them as the basis of the SLFs is not the only method for generating SLFs. In fact, for multiple speakers, using a simple cross correlation is one of the least accurate and least robust approaches [4]. Many other methods have been employed in multisensor-array SLF generation, including the multiple signal classification (MUSIC) algorithm [21], the ML algorithm [22, 23, 24], SRP-PHAT [7], and the iterative spatial probability (ISP) algorithm [1, 15]. There are also several methods developed for wideband source localization, including [25, 26, 27]; most of these can be classified as wideband extensions of the MUSIC or ML approaches.

The works [1, 15] describe the procedure of obtaining an SLF using TDOA distribution analysis. Basically, for the ith microphone pair, the probability density function (PDF) of the TDOA is estimated from the histogram consisting of the peaks of cross correlations performed on multiple speech segments. Here, it is assumed that the speech source (and hence the TDOA) remains stationary for the duration of time over which all speech segments are recorded. Then, each spatial position is assigned a likelihood that is proportional to the probability of its corresponding TDOA. This SLF is scaled so that its maximum value is 1 and its minimum value is 0; higher values correspond to a higher likelihood of a speaker at those locations.

In [7], SLFs (called SRP-PHATs) are produced for microphone pairs similarly to [1, 8, 15]. The difference is that, instead of using TDOA distributions, actual filtered cross correlations (using the PHAT cross-correlation filter) are used to produce TDOA likelihoods, which are then mapped to an SLF as shown below:

e(x) = Σk Σl ∫_{−∞}^{∞} Xk(ω) X̄l(ω) e^{jωτkl(x)} / (|Xk(ω)| |Xl(ω)|) dω,    (7)

where e(x) is the SLF, Xi(ω) is the Fourier transform of the signal received by the ith microphone, and τkl(x) is the array steering delay corresponding to the position x and the kth and lth microphones.
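A direct (unoptimized) sketch of (7), assuming signals is an array of per-microphone waveforms, mics the microphone coordinates, grid the candidate source positions, and c the speed of sound; the framing and grid construction are illustrative assumptions:

```python
import numpy as np

def srp_phat_slf(signals, mics, grid, fs, c=343.0):
    """Evaluate the SRP-PHAT SLF e(x) of (7) on a grid of candidate
    positions by summing PHAT-whitened cross-spectra over all pairs."""
    n = signals.shape[1] * 2                    # zero-padded FFT length
    spectra = np.fft.rfft(signals, n, axis=1)
    omega = 2.0 * np.pi * np.fft.rfftfreq(n, d=1.0 / fs)
    slf = np.zeros(len(grid))
    for k in range(len(mics)):
        for l in range(k + 1, len(mics)):
            cross = spectra[k] * np.conj(spectra[l])
            cross /= np.maximum(np.abs(cross), 1e-12)   # PHAT whitening
            for i, x in enumerate(grid):
                # steering delay tau_kl(x) for a source at position x
                tau = (np.linalg.norm(x - mics[k])
                       - np.linalg.norm(x - mics[l])) / c
                slf[i] += np.real(np.sum(cross * np.exp(1j * omega * tau)))
    return slf
```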

In the noiseless situation and in the absence of reverberation, an SLF from a single microphone array will be representative of the number and the spatial locations of the sound sources in an environment. When there is noise and/or reverberation, the SLF of a single microphone array will be degraded [3, 7, 28]. As a result, in practical situations, it is often necessary to combine the SLFs of multiple microphone arrays to obtain a more representative overall SLF. Note that in all of the work in [1, 7, 8, 15], SLFs are produced from 2-element microphone arrays and are simply added to produce the overall SLF, which, as will be shown, is a special case of the more robust integration mechanism proposed here.

In this paper, we use the notation ei(x) for the SLF of the ith microphone array over the environment x, which can be a 2D or a 3D variable.



[Figure 2: Relationship between sensor position and its observability: observability plotted against the x-distance to the source in m (y-distance fixed at 3.5 m).]

In the case of 2-element microphone arrays, we also use the notation ekl(x) for the SLF of the microphone pair formed by the kth and lth microphones, also over the environment x.

4. SPATIAL OBSERVABILITY FUNCTIONS

Under normal circumstances, an SLF would be entirely sufficient to locate all spatial objects and events. However, in some situations, a sensor is not able to make inferences about a specific spatial location (e.g., a blocked microphone array) because the sensing function provides incorrect information or no information about that position. As a result, the SOF is used as an indication of the accuracy of the SLF. Although several different methods of defining the SOF exist [29, 30], in this paper, the mean square difference between the SLF and the actual probability of an object at a position is used as the indicator of the SOF.

The spatial observability of the ith microphone array corresponding to the position x can thus be expressed as

oi(x) = E[(ei(x) − a(x))²],    (8)

where oi(x) is the SOF, ei(x) is the SLF, and a(x) is the actual probability of an object at position x, which can only take a value of 0 or 1. We can relate a(x) to φ(x) as follows:

a(x) = 1 if φ(x), and a(x) = 0 otherwise.    (9)

The actual probability a(x) is a Bernoulli random variable with parameter ρx, the prior probability of an object at position x. This prior probability can be obtained from the nature and geometry of the environment. For example, at spatial locations where an object or a wall prevents the presence of a speaker, ρx will be 0, and at other "allowed" spatial regions, ρx will take on a constant positive value.

[Figure 3: A directly estimated SOF for a 2-element microphone array. The darker regions correspond to a lower SOF and the lighter regions correspond to a higher SOF. The location of the array is depicted by the crosshairs.]


In order to analyze the effect of the spatial position of the sound source on the observability of the microphone array, an experiment was conducted with a 2-element microphone array placed at a fixed distance of 3.5 m parallel to the spatial y-axis and a varying x-axis distance to a sound source. The SLF values of the sensor corresponding to the source position were used in conjunction with prior knowledge about the status of the source (i.e., the location of the source was known) in order to estimate the relationship between the observability of the sensor and the x-axis position of the source. The results of this experiment, shown in Figure 2, suggest that as the distance of the sensor to the source increases, so does the observability.

In practice, the SOF can be directly measured by placing stationary sound sources at known locations in space and comparing them with the array SLF, or by modeling the environment and the microphone arrays with a presumed SOF [14]. The modeled SOFs are typically smaller close to the microphone array (more accurate localization) and larger farther away from the array (less accurate localization) [14]. Clearly, the SOF values also depend upon the overall noise in the environment: more noise increases the values of the SOF (higher localization errors), while less noise results in lower SOF values (lower localization errors). However, for a given environment with roughly equal noise at most locations, the relative values of the SOF remain the same regardless of the noise level. As a result, in practice, we often obtain a distance-to-array-dependent SOF, as shown in Figure 3.

5. INTEGRATION OF DISTRIBUTED SENSORS

We will now utilize knowledge about the SLFs and SOFs in order to integrate our microphone arrays. The approach here is analogous to other sensor fusion techniques [12, 14, 20, 31].




Our goal is to find the minimum mean square error (MMSE) estimate of a(x), which can be derived as follows.

Assuming that our estimate is â(x), we can define our mean square error as

m(x) = (â(x) − a(x))².    (10)

From estimation theory [32], the estimate am(x) that minimizes the above mean square error is

am(x) = Ea[a(x) | e0(x), e1(x), . . .].    (11)

Now, if we assume that the SLF has a Gaussian distribution with mean equal to the actual object probability a(x) [14, 20], we can rewrite the MMSE estimate as follows:

am(x) = 1 · P(a(x) = 1 | e0(x), . . .) + 0 · P(a(x) = 0 | e0(x), . . .) = P(a(x) = 1 | e0(x), . . .),    (12)

which is exactly equal to (using the assumption that, for a given a(x), all SLFs are independent Gaussians)

am(x) = 1 / (1 + ((1 − ρx)/ρx) · exp(Σi (1 − 2ei(x)) / (2oi(x)))),    (13)

where ρx is the prior sound source probability at the location x. It is used to account for known environmental facts, such as the locations of walls or desks, where a speaker is less likely to be located. Note that although the Gaussian model for the SLF works well in practice [14], it is neither the only model nor necessarily the best one; other models have been introduced and analyzed [14, 20].

At this point, it is useful to define the discriminant function Vx as follows:

Vx = Σi (1 − 2ei(x)) / (2oi(x)),    (14)

and the overall object probability function can be expressed as

am(x) = 1 / (1 + ((1 − ρx)/ρx) · exp(Vx)).    (15)

Hence, similar to the approach of [1, 8, 13], additive terms dependent on the individual sensors can be summed to produce the overall discriminant. The discriminant is a spatial function indicative of the likelihood of a speaker at different spatial positions, with lower values corresponding to higher probabilities and higher values corresponding to lower probabilities. The discriminant does not take the prior sound source probabilities into account directly, and hence a relative comparison of discriminants is only valid for positions with equal prior probabilities.
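A small sketch of the fusion rule in (14) and (15), assuming slfs and sofs are arrays of shape (num_arrays, num_grid_points) holding each array's SLF and SOF sampled on a common spatial grid, and rho the prior ρx on that grid (names and shapes are illustrative):

```python
import numpy as np

def fuse_slfs(slfs, sofs, rho):
    """Weighted fusion of SLFs: arrays with a small SOF (low mean square
    error, i.e., reliable) dominate the discriminant of eq. (14)."""
    v = np.sum((1.0 - 2.0 * slfs) / (2.0 * sofs), axis=0)   # eq. (14)
    return 1.0 / (1.0 + (1.0 - rho) / rho * np.exp(v))      # eq. (15)

# The estimated source position is the grid point maximizing a_m(x).
```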

This decomposition greatly simplifies the integration of the results of multiple sensors. Also, the inclusion of the spatial observabilities allows for a more accurate model of the behavior of the sensors, thereby resulting in greater object localization accuracy. The integration strategy proposed here has been shown to be equivalent to a neural-network-based SLF fusion strategy [31]. Using neural networks often has advantages such as direct influence estimation (obtained from the neural weights) and the existence of strategies for training the network [33].
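The fusion rule of (13)-(15) is compact enough to state directly in code. A minimal sketch, assuming the per-array SLF and SOF values at a candidate position x are already available (the numbers below are illustrative):

```python
import numpy as np

def fused_object_probability(e, o, rho_x):
    """MMSE object probability of (13)-(15): combine per-array SLF
    values e_i(x) with observabilities o_i(x) under a prior rho_x."""
    V_x = np.sum((1.0 - 2.0 * e) / (2.0 * o))                  # discriminant (14)
    return 1.0 / (1.0 + (1.0 - rho_x) * np.exp(V_x) / rho_x)   # probability (15)

# Two arrays: the nearer array (small SOF) dominates the fused estimate.
e = np.array([0.9, 0.4])   # SLF values at x
o = np.array([0.1, 1.0])   # SOF values at x (small = more observable)
print(fused_object_probability(e, o, rho_x=0.3))
```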

5.1. Application to multimedia sensory integration

The sensor integration strategy here, while focusing on microphone arrays, can be adapted to a wide variety of sensors including cameras and microphones. This work has been explored in [12]. Although observabilities were not used in that work, resulting in a possibly nonideal integration of the microphone arrays and cameras, the overall result was impressive: an approximately 50% reduction in sound localization error was obtained at all SNRs by the audiovisual sound localization system compared to the stand-alone acoustic sound localization system. Here, the acoustic sound localization system consisted of a 3-element microphone array and the visual object localization system consisted of a pair of cameras.

5.2. Equivalence to SRP-PHAT

In the case when pairs of microphones are integrated without taking the spatial observabilities into account, using SLFs obtained with the PHAT technique, the proposed sensor fusion algorithm is equivalent to the SRP-PHAT approach.

Assuming that the SLFs are obtained using the PHAT technique, the SLF for the kth and lth microphones can be written as

\[
e_{kl}(x) = \int_{-\infty}^{\infty} \frac{X_k(\omega)X_l^*(\omega)e^{j\omega\tau_{kl}(x)}}{\bigl|X_k(\omega)\bigr|\bigl|X_l(\omega)\bigr|}\,d\omega, \tag{16}
\]

where X_k(ω) is the Fourier transform of the signal obtained by the kth microphone, X_l^*(ω) is the complex conjugate of the Fourier transform of the signal obtained by the lth microphone, and τ_kl(x) is the array steering delay corresponding to the position x and the microphones k and l.

In most applications, we care about the relative likelihoods of objects at different spatial positions. Hence, it suffices to only consider the discriminant function of (14) here. Assuming that the spatial observability of all microphone pairs for all spatial regions is equal, we obtain the following discriminant function:

\[
V_x = C_1 - C_2 \sum_i e_i(x), \tag{17}
\]

where C_1 and C_2 are positive constants. Since we care only about the relative values of the discriminant, we can reduce (17) to

\[
V'_x = \sum_i e_i(x), \tag{18}
\]

Figure 4: The integration of multiple sensors into a single “super”-sensor (a distributed network of microphone arrays reduced to a single equivalent microphone array).

and we note that while in (14) and (17) higher values of the discriminant were indicative of a lower likelihood of an object, in (18) higher values of the discriminant are now indicative of a higher likelihood of an object. The summation over i is across all the microphone arrays. If we use only microphone pairs and use all available microphones, then we have

\[
V'_x = \sum_k \sum_l e_{kl}(x). \tag{19}
\]

Utilizing (16), this becomes

\[
V'_x = \sum_k \sum_l \int_{-\infty}^{\infty} \frac{X_k(\omega)X_l^*(\omega)e^{j\omega\tau_{kl}(x)}}{\bigl|X_k(\omega)\bigr|\bigl|X_l(\omega)\bigr|}\,d\omega, \tag{20}
\]

which is exactly equal to the SRP-PHAT equation [7].
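For concreteness, a discrete-frequency sketch of this SRP-PHAT discriminant, assuming sampled signals and precomputed steering delays (function and variable names are illustrative; a small constant guards against division by zero):

```python
import numpy as np

def phat_slf(xk, xl, tau, fs, eps=1e-12):
    """Discrete approximation of the PHAT SLF of (16) for one pair:
    whitened cross-spectrum steered by the delay tau (seconds)."""
    Xk, Xl = np.fft.rfft(xk), np.fft.rfft(xl)
    w = 2.0 * np.pi * np.fft.rfftfreq(len(xk), d=1.0 / fs)
    cross = Xk * np.conj(Xl) * np.exp(1j * w * tau)
    return float(np.real(np.sum(cross / (np.abs(Xk) * np.abs(Xl) + eps))))

def srp_phat(signals, tau, fs):
    """Discriminant of (19)-(20): sum PHAT SLFs over all distinct
    microphone pairs; tau[k][l] is the steering delay for position x
    and the pair (k, l)."""
    n = len(signals)
    return sum(phat_slf(signals[k], signals[l], tau[k][l], fs)
               for k in range(n) for l in range(k + 1, n))
```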

6. EFFECTIVE SLF AND SOF

After the results of multiple sensors have been integrated, it is useful to get an estimate of the cumulative observability obtained as a result of the integration. This problem is equivalent to finding the SLF and SOF of a single sensor that results in the same overall object probability as that obtained by multiple sensors, as shown in Figure 4.

This can be stated as

\[
P\bigl(a(x) = 1 \mid e_0(x), o_0(x), \ldots\bigr) = P\bigl(a(x) = 1 \mid e(x), o(x)\bigr), \tag{21}
\]

where e(x) is the effective SLF and o(x) is the effective SOF of the combined sensors. According to (13), this problem reduces to finding equivalent discriminant functions, one corresponding to the multiple sensors and one corresponding to the effective single sensor. According to (14), this becomes (using the constraint that the effective SLF will also be a Gaussian)

\[
\sum_i \frac{1 - 2e_i(x)}{2o_i(x)} = \frac{1 - 2e(x)}{2o(x)}. \tag{22}
\]

Now, we let the effective SOF be the variance of the effective SLF; in other words, we let the effective SOF be the observability of the effective sensor. We first evaluate the variance of the effective SLF as follows:

\[
E\bigl(e(x) - E e(x)\bigr)^2 = o(x)^2\, E\biggl(\sum_i \frac{e_i(x) - a(x)}{o_i(x)}\biggr)^2. \tag{23}
\]

The random process e_i(x) − a(x) is a zero-mean Gaussian random process, and the expectation of the square of a sum of an independent set of these random processes is equal to the sum of the expectations of the squares of each of these processes [34], as shown below:

\[
E\bigl(e(x) - E e(x)\bigr)^2 = o(x)^2 \sum_i E\biggl(\frac{e_i(x) - a(x)}{o_i(x)}\biggr)^2. \tag{24}
\]

This is because all the cross-covariances equal zero due to the independence of the sensors and the zero means of the random processes. Equation (24) can be simplified to produce

\[
E\bigl(e(x) - E e(x)\bigr)^2 = o(x)^2 \sum_i \frac{E\bigl(e_i(x)^2 - a(x)^2\bigr)}{o_i(x)^2}. \tag{25}
\]

Now, by setting (25) equal to the effective observability, we obtain

\[
o(x) = \frac{1}{\sum_i \bigl(1/o_i(x)^2\bigr) E\bigl(e_i(x)^2 - a(x)^2\bigr)}. \tag{26}
\]

Finally, noting that E(e_i(x)² − a(x)²) = o_i(x) according to (8), we obtain

\[
\sum_i \frac{1}{o_i(x)} = \frac{1}{o(x)}, \tag{27}
\]

and the effective SLF then becomes

\[
e(x) = \frac{1}{2} - o(x)\sum_i \frac{1 - 2e_i(x)}{2o_i(x)} = o(x)\sum_i \frac{e_i(x)}{o_i(x)}. \tag{28}
\]
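A short sketch of (27) and (28); note that the effective SOF combines like parallel resistances, so the fused sensor is never less observable than its best constituent:

```python
import numpy as np

def effective_sof(o):
    """Effective SOF of (27): reciprocals of the observabilities add."""
    return 1.0 / np.sum(1.0 / o)

def effective_slf(e, o):
    """Effective SLF of (28): an observability-weighted combination."""
    return effective_sof(o) * np.sum(e / o)

o = np.array([0.2, 0.5])
e = np.array([0.8, 0.3])
print(effective_sof(o), effective_slf(e, o))   # fused SOF < min(o)
```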

7. SIMULATED AND EXPERIMENTAL RESULTS

Simulations were performed in order to understand the relationship between SNR, sound localization error, and the number of microphone pairs used. Figure 5 illustrates the results of the simulations. The definition of noise in these simulations corresponds to the second speaker (i.e., the interference signal). Hence, SNR in this context really corresponds to the signal-to-interference ratio (SIR).

The results illustrated in Figure 5 were obtained by simulating the presence of a sound source and a noise source at random locations in the environment and observing the sound signals with a pair of microphones. The microphone pair always has an intermicrophone distance of 15 cm but a random location. In order to average over all speaker, noise, and array locations, the simulation was repeated a total of 1000 times.

Figure 5: Relationship between SNR, simulated sound localization accuracy, and number of binary microphone arrays without taking spatial observabilities into consideration. (Average localization error in meters versus number of 2-element microphone arrays; curves for 1, 3, 5, 7, and 9 dB SNR.)

Figure 6: The location of the 10 2-element microphone arrays in the test environment.

Figure 5 seems to suggest that accurate and robust sound localization is not possible, because the localization error at low SNRs does not seem to improve when more microphone

arrays are added to the environment. On the other hand, at high SNRs, extra microphone arrays do have an impact on the localization error. It should be noted that the results of Figure 5 correspond to an array integration mechanism where all arrays are assumed to have the same observability over all spatial locations. In reality, differences resulting from the spatial orientation of the environment and the attenuation of the source signals usually result in one array being more observable of a spatial position than another.

Figure 7: Relationship between experimental localization accuracy (at 0 dB) and number of binary microphone arrays, both with and without taking spatial observabilities into consideration.

An experiment was conducted with 2-element microphone arrays at 10 different spatial positions as shown in Figure 6. Two uncorrelated speakers were placed at random positions in the environment, both with approximately equal vocal intensity, resulting in an overall SNR of 0 dB. The two main peaks of the overall speaker probability estimate were used as speaker location estimates, and for each trial the average localization error in two dimensions was calculated. The trials were repeated approximately 150 times, with the

first 50 times used to train the observabilities of each of the microphone arrays by using knowledge about the estimated speaker locations and the actual speaker locations. The localization errors of the remaining 100 trials were averaged to produce the results shown in Figure 7. The localization errors were computed based on the two speaker location estimates and the true locations of the speakers. Also, for each trial, the locations of the two speech sources were randomly varied in the environment.

As shown in Figure 7, the experimental localization error approximately matches the simulated localization error at 0 dB for the case in which all microphone arrays are assumed to observe the environment equally. The error in this case remains close to 1 m even as more microphone arrays are used. Figure 7 also shows the localization error for the case in which the observabilities obtained from the first 50 trials are used. In this case, the addition of extra arrays significantly reduces the localization error. When the entire set of 10 arrays is integrated, the average localization error for the experimental system is reduced to 8 cm.

The same experiment was conducted with delay-and-sum beamformer-based SLFs (SRPs with no cross-correlation filtering) instead of the ISP-based SLF generation method. The results are shown in Figure 8.

The localization error of the delay-and-sum beamformer-based SLF generator is reduced by a factor of 2 when observability is taken into account. However, the errors are far greater than those of the sound localization system that uses the ISP-based SLF generator. When all 10 microphone pairs are taken into account, the localization error is approximately 0.5 m.

Figure 8: Relationship between experimental localization accuracy (at 0 dB) using delay-and-sum beamformer-based SLFs and number of binary microphone arrays, both with and without taking spatial observabilities into consideration. (Curves: delay-and-sum localization without observabilities, with observabilities, and using all 20 microphones as a single array.)

Figure 9: The location of 3 speakers in the environment.

Now, we consider an example of the localization of 3 speakers, all speaking with equal vocal intensities. Figure 9

illustrates the location of the speakers in a two-dimensional environment. Note that the axis labels of Figures 9, 10, and 11 correspond to 0.31-m steps.

The ISP-based SLF generator, without taking the observability of each microphone pair into account, produces the overall SLF shown in Figure 10.

In Figure 10, it is difficult to determine the true positions of the speakers. There is also a third peak that does not correspond to any speaker. Using the same sound signals, an SLF was produced, shown in Figure 11, this time taking observabilities into account.

This time, the location of the speakers can be clearly determined. Each of the three peaks corresponds to the correct location of its corresponding speaker.

Figure 10: Localization of 3 speakers without using observabilities.

Figure 11: Localization of 3 speakers with observabilities.

For the experiments in Figures 10 and 11, the prior probability ρ_x for all spatial positions was assumed to be a constant of 0.3. Furthermore, the SOFs were obtained by experimentally evaluating the SOF function of (8) at several different points (for each microphone pair) and then interpolating the results to obtain an SOF for the entire space. An example of this SOF generation mechanism is the SOF of Figure 3.

The large difference between the results of Figures 10 and 11 merits further discussion. Basically, the main reason for the improvement in Figure 11 is that, for locations that are farther away from a microphone pair, the estimates made by that pair are weighted less significantly than those of microphone pairs that are closer. On the other hand, in Figure 10, the results of all microphone pairs are combined with equal weights. As a result, even if, for every location, there are a few microphone pairs with correct estimates, the integration with the noisy estimates of the other microphone pairs taints the resulting integrated estimate.

8. CONCLUSIONS

This paper introduced the concept of multisensor object localization using different sensor observabilities in order to account for different levels of access to each spatial position. This definition led to the derivation of the minimum mean square error object localization estimates that correspond to the probability of a speaker at a spatial location given the results of all available sensors. Experimental results using this approach indicate that the average localization error is reduced to 8 cm in a prototype environment with 10 2-element microphone arrays at 0 dB. With prior approaches, the localization error using the exact same network is approximately 0.95 m at 0 dB.

The reason that the proposed approach outperforms its previous counterparts is that, by taking into account which microphone array has better access to each speaker, the effective SNR is increased. Hence, the behaviour and performance of the proposed approach at 0 dB is comparable to that of prior approaches at SNRs greater than 7–10 dB.

Apart from improved performance, the proposed algorithm for the integration of distributed microphone arrays has the advantage of requiring less bandwidth and fewer computational resources. Less bandwidth is required since each array only reports its SLF, which usually involves far less information than transmitting multiple channels of audio signals. Fewer computational resources are required since computing an SLF for a single array and then combining the results of multiple microphone arrays by weighted SLF addition (as proposed in this paper) is computationally simpler than producing a single SLF directly from the audio signals of all arrays [14].

One drawback of the proposed technique is the measurement of the SOFs for the arrays. A fruitful direction for future work would be to model the SOF instead of experimentally measuring it, which is a very tedious process. Another area of potential future work is a better model for the speakers in the environment. The proposed model, which assumes that the actual speaker probability is independent of different spatial positions, could be made more realistic by accounting for the spatial dependencies that often exist in practice.

ACKNOWLEDGMENT

Some of the simulation and experimental results presented here have been presented in a less developed manner in [20, 31].

REFERENCES

[1] P. Aarabi and S. Zaky, “Iterative spatial probability based sound localization,” in Proc. 4th World Multi-Conference on Circuits, Systems, Computers, and Communications, Athens, Greece, July 2000.
[2] P. Aarabi, “The application of spatial likelihood functions to multi-camera object localization,” in Proc. Sensor Fusion: Architectures, Algorithms, and Applications V, vol. 4385 of SPIE Proceedings, pp. 255–265, Orlando, Fla, USA, April 2001.
[3] M. S. Brandstein and H. Silverman, “A robust method for speech signal time-delay estimation in reverberant rooms,” in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing, pp. 375–378, Munich, Germany, April 1997.
[4] M. S. Brandstein, A framework for speech source localization using sensor arrays, Ph.D. thesis, Brown University, Providence, RI, USA, 1995.
[5] J. Flanagan, J. Johnston, R. Zahn, and G. Elko, “Computer-steered microphone arrays for sound transduction in large rooms,” Journal of the Acoustical Society of America, vol. 78, pp. 1508–1518, November 1985.
[6] K. Guentchev and J. Weng, “Learning-based three dimensional sound localization using a compact non-coplanar array of microphones,” in Proc. AAAI Spring Symposium on Intelligent Environments, Stanford, Calif, USA, March 1998.
[7] J. DiBiase, H. Silverman, and M. S. Brandstein, “Robust localization in reverberant rooms,” in Microphone Arrays: Signal Processing Techniques and Applications, M. S. Brandstein and D. B. Ward, Eds., pp. 131–154, Springer Verlag, New York, USA, September 2001.
[8] P. Aarabi, “Multi-sense artificial awareness,” M.A.Sc. thesis, Department of Electrical and Computer Engineering, University of Toronto, Toronto, Ontario, Canada, 1998.
[9] M. Coen, “Design principles for intelligent environments,” in Proc. 15th National Conference on Artificial Intelligence, pp. 547–554, Madison, Wis, USA, July 1998.
[10] R. A. Brooks, M. Coen, D. Dang, et al., “The intelligent room project,” in Proc. 2nd International Conference on Cognitive Technology, Aizu, Japan, August 1997.
[11] A. Pentland, “Smart rooms,” Scientific American, vol. 274, no. 4, pp. 68–76, 1996.
[12] P. Aarabi and S. Zaky, “Robust sound localization using multi-source audiovisual information fusion,” Information Fusion, vol. 3, no. 2, pp. 209–223, 2001.
[13] P. Aarabi and S. Zaky, “Integrated vision and sound localization,” in Proc. 3rd International Conference on Information Fusion, Paris, France, July 2000.
[14] P. Aarabi, The integration and localization of distributed sensor arrays, Ph.D. thesis, Stanford University, Stanford, Calif, USA, 2001.
[15] P. Aarabi, “Robust multi-source sound localization using temporal power fusion,” in Proc. Sensor Fusion: Architectures, Algorithms, and Applications V, vol. 4385 of SPIE Proceedings, Orlando, Fla, USA, April 2001.
[16] F. L. Wightman and D. Kistler, “The dominant role of low-frequency interaural time differences in sound localization,” Journal of the Acoustical Society of America, vol. 91, no. 3, pp. 1648–1661, 1992.
[17] D. Rabinkin, R. J. Ranomeron, A. Dahl, J. French, J. L. Flanagan, and M. H. Bianchi, “A DSP implementation of source location using microphone arrays,” in Proc. 131st Meeting of the Acoustical Society of America, Indianapolis, Ind, USA, May 1996.
[18] M. S. Brandstein, J. Adcock, and H. Silverman, “A practical time-delay estimator for localizing speech sources with a microphone array,” Computer Speech & Language, vol. 9, no. 2, pp. 153–169, 1995.
[19] C. H. Knapp and G. Carter, “The generalized correlation method for estimation of time delay,” IEEE Trans. Acoustics, Speech, and Signal Processing, vol. 24, no. 4, pp. 320–327, 1976.
[20] P. Aarabi, “The integration of distributed microphone arrays,” in Proc. 4th International Conference on Information Fusion, Montreal, Canada, July 2001.
[21] R. O. Schmidt, “Multiple emitter location and signal parameter estimation,” IEEE Transactions on Antennas and Propagation, vol. 34, no. 3, pp. 276–280, 1986.
[22] H. Watanabe, M. Suzuki, N. Nagai, and N. Miki, “A method for maximum likelihood bearing estimation without nonlinear maximization,” Transactions of the Institute of Electronics, Information and Communication Engineers A, vol. J72A, no. 8, pp. 303–308, 1989.
[23] H. Watanabe, M. Suzuki, N. Nagai, and N. Miki, “Maximum likelihood bearing estimation by quasi-Newton method using a uniform linear array,” in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing, pp. 3325–3328, Toronto, Ontario, Canada, April 1991.
[24] I. Ziskind and M. Wax, “Maximum likelihood localization of multiple sources by alternating projection,” IEEE Trans. Acoustics, Speech, and Signal Processing, vol. 36, no. 10, pp. 1553–1560, 1988.
[25] H. Wang and M. Kaveh, “Coherent signal-subspace processing for the detection and estimation of angles of arrival of multiple wide-band sources,” IEEE Trans. Acoustics, Speech, and Signal Processing, vol. 33, no. 4, pp. 823–831, 1985.
[26] S. Valaee and P. Kabal, “Wide-band array processing using a two-sided correlation transformation,” IEEE Trans. Signal Processing, vol. 43, no. 1, pp. 160–172, 1995.
[27] B. Friedlander and A. J. Weiss, “Direction finding for wide-band signals using an interpolated array,” IEEE Trans. Signal Processing, vol. 41, no. 4, pp. 1618–1634, 1993.
[28] P. Aarabi and A. Mahdavi, “The relation between speech segment selectivity and time-delay estimation accuracy,” in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing, Orlando, Fla, USA, May 2002.
[29] S. S. Iyengar and D. Thomas, “A distributed sensor network structure with fault tolerant facilities,” in Intelligent Control and Adaptive Systems, vol. 1196 of SPIE Proceedings, Philadelphia, Pa, USA, November 1989.
[30] R. R. Brooks and S. S. Iyengar, Multi-Sensor Fusion: Fundamentals and Applications with Software, Prentice Hall, Upper Saddle River, NJ, USA, 1998.
[31] P. Aarabi, “The equivalence of Bayesian multi-sensor information fusion and neural networks,” in Proc. Sensor Fusion: Architectures, Algorithms, and Applications V, vol. 4385 of SPIE Proceedings, Orlando, Fla, USA, April 2001.
[32] A. Leon-Garcia, Probability and Random Processes for Electrical Engineering, Addison-Wesley, Reading, Mass, USA, 2nd edition, 1994.
[33] B. Widrow and S. D. Stearns, Adaptive Signal Processing, Prentice-Hall, Englewood Cliffs, NJ, USA, 1985.
[34] A. Papoulis, Probability, Random Variables and Stochastic Processes, McGraw-Hill, New York, NY, USA, 2nd edition, 1984.

Parham Aarabi is a Canada Research Chair in Multi-Sensor Information Systems, an Assistant Professor in the Edward S. Rogers Sr. Department of Electrical and Computer Engineering at the University of Toronto, and the Founder and Director of the Artificial Perception Laboratory. Professor Aarabi received his B.A.Sc. degree in engineering science (electrical option) in 1998 and his M.A.Sc. degree in electrical and computer engineering in 1999, both from the University of Toronto, and his Ph.D. degree in electrical engineering from Stanford University. In November 2002, he was selected as the Best Computer Engineering Professor of the 2002 fall session. Prior to joining the University of Toronto in June 2001, Professor Aarabi was a Co-instructor at Stanford University as well as a Consultant to various Silicon Valley companies. His current research interests include sound localization, microphone arrays, speech enhancement, audiovisual signal processing, human-computer interactions, and VLSI implementation of speech processing applications.


EURASIP Journal on Applied Signal Processing 2003:4, 348–358
© 2003 Hindawi Publishing Corporation

A Self-Localization Method for Wireless Sensor Networks

Randolph L. Moses
Department of Electrical Engineering, The Ohio State University, 2015 Neil Avenue, Columbus, OH 43210, USA
Email: [email protected]

Dushyanth Krishnamurthy
Department of Electrical Engineering, The Ohio State University, 2015 Neil Avenue, Columbus, OH 43210, USA

Robert M. Patterson
Department of Electrical Engineering, The Ohio State University, 2015 Neil Avenue, Columbus, OH 43210, USA
Email: [email protected]

Received 30 November 2001 and in revised form 9 October 2002

We consider the problem of locating and orienting a network of unattended sensor nodes that have been deployed in a scene at unknown locations and orientation angles. This self-calibration problem is solved by placing a number of source signals, also with unknown locations, in the scene. Each source in turn emits a calibration signal, and a subset of sensor nodes in the network measures the time of arrival and direction of arrival (with respect to the sensor node's local orientation coordinates) of the signal emitted from that source. From these measurements we compute the sensor node locations and orientations, along with any unknown source locations and emission times. We develop necessary conditions for solving the self-calibration problem and provide a maximum likelihood solution and corresponding location error estimate. We also compute the Cramer-Rao bound of the sensor node location and orientation estimates, which provides a lower bound on calibration accuracy. Results using both synthetic data and field measurements are presented.

Keywords and phrases: sensor networks, localization, location uncertainty, Cramer-Rao bound.

1. INTRODUCTION

Unattended sensor networks are becoming increasingly important in a large number of military and civil applications [1, 2, 3, 4]. The basic concept is to deploy a large number of low-cost self-powered sensor nodes that acquire and process data. The sensor nodes may include one or more acoustic microphones as well as seismic, magnetic, or imaging sensors. A typical sensor network objective is to detect, track, and classify objects or events in the neighborhood of the network.

We consider a sensor deployment architecture as shown in Figure 1. A number of low-cost sensor nodes, each equipped with a processor, a low-power communication transceiver, and one or more sensing capabilities, are set out in a planar region. Each sensor node monitors its environment to detect, track, and characterize signatures. The sensed data is processed locally, and the result is transmitted to a local central information processor (CIP) through a low-power communication network. The CIP fuses sensor information and transmits the processed information to a higher-level processing center.

Figure 1: Sensor network architecture. A number of low-cost sensor nodes are deployed in a region. Each sensor node communicates to a local CIP, which relays information to a more distant command center.

Figure 2: Sensor self-localization scenario.

Many sensor network signal-processing tasks assume that the locations and orientations of the sensor nodes are known [4]. However, accurate knowledge of sensor node locations and orientations is often not available. Sensor nodes are often placed in the field by persons, by an air drop, or by artillery

launch. For careful hand placement, accurate location and orientation of the sensor nodes can be assumed; however, for most other sensor deployment methods, it is difficult or impossible to know accurately the location and orientation of each sensor node. One could equip every sensor node with a GPS and compass to obtain location and orientation information, but this adds to the expense and power requirements of the sensor node and may increase susceptibility to jamming. Thus, there is interest in developing methods to self-localize the sensor network with a minimum of additional hardware or communication.

Self-localization in sensor networks is an active area of current research (see, e.g., [1, 5, 6, 7, 8] and the references therein). Iterative multilateration-based techniques are considered in [7], and Bulusu et al. [5, 9] consider low-cost localization methods. These approaches assume the availability of beacon signals at known locations. Sensor localization, coupled with near-field source localization, is considered in [10, 11]. Cevher and McClellan consider sensor network self-calibration using a single acoustic source that travels along a straight line [12]. The self-localization problem is also related to the calibration of element locations in sensor arrays [13, 14, 15, 16, 17, 18]. In the element calibration problem, we assume knowledge of the nominal sensor locations and assume high (or perfect) signal coherence between the sensors; these assumptions may not be satisfied for many sensor network applications, however.

In this paper, we consider an approach to sensor network self-calibration using sources at unknown locations in the field. Thus, we relax the assumption that beacon signals at known locations are available. The approach entails placing a number of signal sources in the same region as the sensor nodes (see Figure 2). Each source in turn generates a known signal that is detected by a subset of the sensor nodes; each sensor node that detects the signal measures the time of arrival (TOA) of the source with respect to an established network time base [19, 20] and also measures the direction of arrival (DOA) of the source signal with respect to a local (to the sensor node) frame of reference. The set of TOA and DOA measurements are collected together and form the data used to estimate the unknown locations and orientations of the sensor nodes.

In general, neither the source locations nor their signal emission times are assumed to be known. If the source signal emission times are unknown, then the time of arrival at any one sensor node provides no information for self-localization; rather, the time difference of arrival (TDOA) between sensor nodes carries information for localization. If partial information is available, it can be incorporated into the estimation procedure to improve the accuracy of the calibration. For example, [21] considers the case in which source emission times are known; such would be the case if the sources were electronically triggered at known times.

We show that if neither the source locations nor their signal emission times are known and if at least three sensor nodes and two sources are used, the relative locations and orientations of all sensor nodes, as well as the locations and signal emission times of all sources, can be estimated. The calibration is computed except for an unknown translation and rotation of the entire sensor-source scene, which cannot be estimated unless additional information is available. With additional location or orientation information for one or two sources, absolute location and orientation estimates can be obtained.

We consider optimal signal processing of the measured self-localization data. We derive the Cramer-Rao bound (CRB) on localization accuracy. The CRB provides a lower bound on any unbiased localization estimator and is useful to determine the best-case localization accuracy for a given problem and to provide a baseline standard against which suboptimal localization methods can be measured. We also develop a maximum likelihood (ML) estimation procedure and show that it achieves the CRB for reasonable TOA and DOA measurement errors.

There is a great deal of flexibility in the type of signal sources to be used. We require only that the times of arrival of the signals can be estimated by the sensor nodes. This can be accomplished by matched filtering or generalized cross-correlation of the measured signal with a stored waveform or set of waveforms [22, 23]. Examples of source signals are short transients, FM chirp waveforms, PN-coded or direct-sequence waveforms, or pulsed signals. If the sensor nodes can also estimate signal arrival directions (as is the case with vector pressure sensors or arrays of microphones), these estimates can be used to improve the calibration solution.

An outline of the paper is as follows. Section 2 presents a statement of the problem and of the assumptions made. In Section 3, we first consider necessary conditions for a self-calibration solution and present methods for solving the self-calibration problem with a minimum number of sensor nodes and sources. These methods provide initial estimates for an iterative descent computation needed to obtain the ML calibration parameter estimates derived in Section 4. Bounds on the calibration uncertainty are also derived. Section 5 presents numerical examples to illustrate the approach, and Section 6 presents conclusions.


2. PROBLEM STATEMENT AND NOTATION

Assume we have a set of A sensor nodes in a plane, each with unknown location {a_i = (x_i, y_i)}, i = 1, ..., A, and unknown orientation angle θ_i with respect to a reference direction (e.g., North). We consider the two-dimensional problem in which the sensor nodes lie in a plane and the unknown reference direction is azimuth; an extension to the three-dimensional case is possible using similar techniques. A sensor node may consist of one or more sensing elements; for example, it could be a single sensor, a vector sensor [24], or an array of sensors in a fixed known geometry. If the sensor node does not measure the DOA, then its orientation angle θ_i is not estimated.

In the sensor field are also placed S point sources at locations {s_j = (x_j, y_j)}, j = 1, ..., S. The source locations are in general unknown. Each source emits a known finite-length signal that begins at time t_j; the emission times are also in general unknown.

Each source emits a signal in turn. Every sensor node attempts to detect the signal, and if detected, the sensor node estimates the TOA of the signal with respect to a sensor network time base, and a DOA with respect to the sensor node's local reference direction. The time base can be established either by using the electronic communication network linking the sensor nodes [19, 20] or by synchronizing the sensor node processor clocks before deployment. The time base needs to be accurate to a number on the order of the time-of-arrival measurement uncertainty (1 ms in the examples considered in Section 5). The DOA measurements are made with respect to a local (to the sensor node) frame of reference. The absolute directions of arrival are not available because the orientation angle of each sensor node is unknown (and is estimated in the calibration procedure). Both the TOA and DOA measurements are assumed to contain estimation errors. We denote the measured TOA at sensor node i of source j as t_ij and the measured DOA as θ_ij.

We initially assume every sensor node detects every source signal; partial measurements are considered in Section 4.4. If so, a total of 2AS measurements are obtained. The 2AS measurements are gathered in a vector

\[
X = \begin{bmatrix} \operatorname{vec}(T) \\ \operatorname{vec}(\Theta) \end{bmatrix} \quad (2AS \times 1), \tag{1}
\]

where vec(M) stacks the elements of a matrix M columnwise and where

\[
T = \begin{bmatrix}
t_{11} & t_{12} & \cdots & t_{1S} \\
t_{21} & t_{22} & \cdots & t_{2S} \\
\vdots & \vdots & \ddots & \vdots \\
t_{A1} & t_{A2} & \cdots & t_{AS}
\end{bmatrix}, \qquad
\Theta = \begin{bmatrix}
\theta_{11} & \theta_{12} & \cdots & \theta_{1S} \\
\theta_{21} & \theta_{22} & \cdots & \theta_{2S} \\
\vdots & \vdots & \ddots & \vdots \\
\theta_{A1} & \theta_{A2} & \cdots & \theta_{AS}
\end{bmatrix}. \tag{2}
\]

Each sensor node transmits its 2S TOA and DOA measurements to a CIP, and these 2AS measurements form the data with which the CIP computes the sensor calibration. Note that the communication cost to the CIP is low, and the calibration processing is performed by the CIP.

The above formulation implicitly assumes that sensor node measurements can be correctly associated with the corresponding source. That is, each sensor node TOA and DOA measurement corresponding to source j can be correctly attributed to that source. There are several ways in which this association can be realized. One method is to time-multiplex the source signals so that they do not overlap. If the source firing times are separated, then any sensor node detection within a certain time interval can be attributed to a unique source. Alternately, each source can emit a unique identifying tag, encoded, for example, in its transmitted signal. In either case, failed detections can be identified at the CIP by the absence of a report from sensor node i about source j. Finally, we can relax the assumption of perfect association by including a data association step in the self-localization algorithm, using, for example, the methods in [25, 26].

Define the parameter vectors

\[
\begin{aligned}
\beta &= \bigl[x_1, y_1, \theta_1, \ldots, x_A, y_A, \theta_A\bigr]^T &\quad (3A \times 1),\\
\gamma &= \bigl[x_1, y_1, t_1, \ldots, x_S, y_S, t_S\bigr]^T &\quad (3S \times 1),\\
\alpha &= \bigl[\beta^T, \gamma^T\bigr]^T &\quad \bigl(3(A+S) \times 1\bigr).
\end{aligned} \tag{3}
\]

Note that β contains the sensor node unknowns and γ contains the source signal unknowns. We denote the true TOA and DOA of source signal j at sensor node i as τ_ij(α) and φ_ij(α), respectively, and include their dependence on the parameter vector α; they are given by

\[
\tau_{ij}(\alpha) = t_j + \frac{\bigl\|a_i - s_j\bigr\|}{c}, \qquad
\phi_{ij}(\alpha) = \theta_i + \angle\bigl(a_i, s_j\bigr), \tag{4}
\]

where a_i = [x_i, y_i]^T, s_j = [x_j, y_j]^T, ‖·‖ is the Euclidean norm, ∠(ξ, η) is the angle between the points ξ, η ∈ ℝ², and c is the signal propagation velocity.
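A direct transcription of the measurement model (4), assuming planar coordinates; the acoustic propagation speed used below is an assumption for illustration:

```python
import numpy as np

C = 345.0  # assumed speed of sound in m/s (illustrative value)

def toa(a_i, s_j, t_j, c=C):
    """Noiseless TOA tau_ij of (4): emission time plus range over c."""
    return t_j + np.linalg.norm(np.subtract(a_i, s_j)) / c

def doa(a_i, s_j, theta_i):
    """Noiseless DOA phi_ij of (4): the bearing angle from node i to
    source j, offset by the node orientation theta_i."""
    dx, dy = np.subtract(s_j, a_i)
    return theta_i + np.arctan2(dy, dx)

print(toa((0.0, 0.0), (300.0, 400.0), t_j=1.0))   # 1 s + 500 m / c
print(doa((0.0, 0.0), (300.0, 400.0), theta_i=0.0))
```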

Each element of X has measurement uncertainty; we model the uncertainty as

\[
X = \mu(\alpha) + E, \tag{5}
\]

where μ(α) is the noiseless measurement vector whose elements are given by (4) for values of i and j that correspond to the vector stacking operation in (1), and where E is a random vector with known probability density function.

The self-calibration problem then is: given the measurement X, estimate β. The parameters in γ are in general unknown and are nuisance parameters that must also be estimated. If some parameters in γ are known, the complexity of the self-calibration problem is reduced, and the resulting accuracy of the β estimate is improved.


Table 1: Minimal solutions for sensor self-localization.

Case                             | # Unknowns    | Minimum A, S                 | Comments
Known locations, known times     | 3A            | A = 1, S = 2                 | Closed-form solution
Known locations, unknown times   | 3A + S        | A = 1, S = 3                 | Closed-form solution
Known locations, unknown times   | 3A + S        | A = 2, S = 2                 | 1D iterative solution
Unknown locations, known times   | 3(A − 1) + 2S | A = 2, S = 2                 | Closed-form solution
Unknown locations, unknown times | 3(A + S − 1)  | A = 2, S = 3 or A = 3, S = 2 | 2D iterative solution

3. EXISTENCE AND UNIQUENESS OF SOLUTIONS

In this section, we address the existence and uniqueness of solutions to the self-calibration problem and establish the minimum number of sensor nodes and sources needed to obtain a solution. We assume that every sensor node detects every source and measures both TOA and DOA. In addition, we assume that the TOA and DOA measurements are noiseless and correspond to a planar sensor-source scenario; that is, we assume they are solutions to (4) for some vector α ∈ ℝ^{3(A+S)}. We establish the minimum number of sources and sensor nodes needed to compute a unique calibration solution and give algorithms for finding the self-calibration solution in the minimal cases. These algorithms provide initial estimates for an iterative descent algorithm for the practical case of nonminimal noisy measurements presented in Section 4.

The four cases below make different assumptions about what is known about the source signal locations and emission times. Of primary interest is the case where no source parameters are known; however, the solution for this case is based on solutions for cases in which partial information is available, so it is instructive to consider all four cases. In all four cases, the number of measurements is 2AS, and determination of β involves solving a nonlinear set of equations for its 3A unknowns. Depending on the case considered, we may also need to estimate the unknown nuisance parameters in γ. The result in each case is summarized in Table 1.

Case 1 (known source locations and emission times). A unique solution for β can be found for any number of sensor nodes as long as there are S ≥ 2 sources. In fact, the location and orientation of each sensor node can be computed independently of other sensor node measurements. The location of the ith sensor node a_i is found from the intersection of two circles with centers at the source locations and with radii c(t_i1 − t_1) and c(t_i2 − t_2), which follow from (4). The intersection is in general two points; the correct location can be found using the sign of θ_i2 − θ_i1. We note that the two circle intersections can be computed in closed form. Finally, from the known source and sensor node locations and the DOA measurements, the sensor node orientation θ_i can be uniquely found.
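The two-circle intersection in Case 1 can be written in closed form. A sketch (the disambiguation by the sign of θ_i2 − θ_i1 is left to the caller):

```python
import numpy as np

def circle_intersections(c1, r1, c2, r2):
    """Intersect the two range circles of Case 1 (centers at the known
    source locations, radii c*(t_i1 - t_1) and c*(t_i2 - t_2)).
    Returns the two candidate sensor node locations."""
    c1, c2 = np.asarray(c1, float), np.asarray(c2, float)
    d = np.linalg.norm(c2 - c1)
    a = (r1**2 - r2**2 + d**2) / (2.0 * d)   # distance to the chord
    h = np.sqrt(max(r1**2 - a**2, 0.0))      # half-chord length
    u = (c2 - c1) / d                        # unit vector between centers
    p = c1 + a * u                           # foot of the chord
    perp = np.array([-u[1], u[0]])           # perpendicular direction
    return p + h * perp, p - h * perp

print(circle_intersections((0.0, 0.0), 5.0, (8.0, 0.0), 5.0))  # (4, ±3)
```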

Figure 3: A circular arc is the locus of possible sensor node locations whose angle between two known points is constant.

Case 2 (known source locations and unknown emission times). For S ≥ 3 sources, the location and orientation of each sensor node can be computed in closed form independently of other sensor nodes. A solution procedure is as follows. Consider the pair of sources (s_1, s_2). Sensor node i knows the angle θ_i2 − θ_i1 between these two sources. The set of all possible locations for sensor node i is an arc of a circle whose center and radius can be computed from the source locations (see Figure 3). Similarly, a second circular arc is obtained from the source pair (s_1, s_3). The intersection of these two arcs is a unique point and can be computed in closed form. Once the sensor node location is known, its orientation θ_i is readily computed from one of the three DOA measurements.

A solution for Case 2 can also be found using S = 2 sources and A = 2 sensor nodes. The solution requires a one-dimensional search of a parameter over a finite interval. The known locations of s_1 and s_2 and the known angle θ_11 − θ_12 mean that sensor node 1 must lie on a known circular arc as in Figure 3. Each location along the arc determines the source emission times t_1 and t_2. These emission times are consistent with the measurements from the second sensor node for exactly one position a_1 along the arc.

Case 3 (unknown source locations and known emission times). In this case and in Case 4 below, the calibration problem can only be solved to within an unknown translation and rotation of the entire sensor-source scene because any translation or rotation of the entire scene does not change the t_ij and θ_ij measurements. To eliminate this ambiguity, we assume that the location and orientation of the first sensor node are known; without loss of generality, we set x_1 = y_1 = θ_1 = 0. We solve for the remaining 3(A − 1) parameters in β.

For the case of unknown source locations, a unique solution for β is computable in closed form for S = 2 and any A ≥ 2 (the case A = 1 is trivial). The range to each source from sensor node 1 can be computed from r_j = c(t_1j − t_j), and its bearing is known, so the locations of the two sources can be found. The locations and orientations of the remaining sensor nodes are then computed using the method of Case 1.

Case 4 (unknown source locations and emission times). For this case, it can be shown that an infinite number of calibration solutions exist for A = S = 2,¹ but a unique solution exists in almost all cases for either A = 2 and S = 3 or A = 3 and S = 2. In some degenerate cases, not all of the γ parameters can be uniquely determined, although we do not know of a case for which the β parameters cannot be uniquely found.

¹Note that for A = S = 2, there are 8 measurements and 9 unknown parameters. The set of possible solutions in general lies on a one-dimensional manifold in the 9-dimensional parameter space.

Closed-form calibration solutions are not known for this case, but solutions that require a two-dimensional search can be found. We outline one such solution that works for either A = 2 and S ≥ 3 or S = 2 and A ≥ 3. Assume as before that sensor node 1 is at location (x_1, y_1) = (0, 0) with orientation θ_1 = 0. If we know the two source emission times t_1 and t_2, we can find the locations of sources s_1 and s_2 as in Case 3. From the two known source locations, all remaining sensor node locations and orientations can be found using the procedure in Case 1, and then all remaining source locations can be found using triangulation from the known arrival angles and known sensor node locations. These solutions will be inconsistent except for the correct values of t_1 and t_2. The calibration procedure, then, is to iteratively adjust t_1 and t_2 to minimize the error between computed and measured time delays and arrival angles.

4. MAXIMUM LIKELIHOOD SELF-CALIBRATION

In this section, we derive the ML estimator for the unknown sensor node location and orientation parameters.

The ML algorithm involves the solution of a set of nonlinear equations for the unknown parameters, including the unknown nuisance parameters in γ. The solution is found by iterative minimization of a cost function; we use the methods in Section 3 to initialize the iterative descent. In addition, we derive the CRB for the variance of the unknown parameters in α; the CRB also gives the parameter variance of the ML parameter estimates for high signal-to-noise ratio (SNR).

The ML estimator is derived from a known parametric form for the measurement uncertainty in X. In this paper, we adopt a Gaussian uncertainty model. The justification is as follows. First, for sufficiently high SNR, TOA estimates obtained by generalized cross-correlation are Gaussian distributed with negligible bias [23]. The variance of the Gaussian TOA error can be computed from the signal spectral characteristics [23]. For broadband signals with flat spectra, the TOA error standard deviation is roughly inversely proportional to the signal bandwidth [21]. Furthermore, most DOA estimates are also Gaussian with negligible bias for sufficiently high SNR [27]. For single sources, the DOA standard deviation is proportional to the array beamwidth [28]. Thus, a Gaussian TOA and DOA measurement uncertainty model is a reasonable assumption for sufficiently high SNR.

4.1. The maximum likelihood estimate

Under the assumption that the measurement uncertainty E in (5) is Gaussian with zero mean and known covariance Σ, the likelihood function is

\[
f(X; \alpha) = \frac{1}{(2\pi)^{AS}|\Sigma|^{1/2}} \exp\Bigl\{-\frac{1}{2}Q(X; \alpha)\Bigr\}, \tag{6}
\]

\[
Q(X; \alpha) = \bigl[X - \mu(\alpha)\bigr]^T \Sigma^{-1} \bigl[X - \mu(\alpha)\bigr]. \tag{7}
\]

A special case is when the measurement errors are uncorrelated and the TOA and DOA measurement errors have variances σ_t² and σ_θ², respectively; (7) then becomes

\[
Q(X; \alpha) = \sum_{i=1}^{A}\sum_{j=1}^{S}\Biggl[\frac{\bigl(t_{ij} - \tau_{ij}(\alpha)\bigr)^2}{\sigma_t^2} + \frac{\bigl(\theta_{ij} - \phi_{ij}(\alpha)\bigr)^2}{\sigma_\theta^2}\Biggr]. \tag{8}
\]

Depending on the particular knowledge about the source signal parameters, none, some, or all of the parameters in α may be known. We let α_1 denote the vector of unknown elements of α and let α_2 denote the vector of known elements of α. Using this notation along with (6), the ML estimate of α_1 is

\[
\alpha_{1,\mathrm{ML}} = \arg\max_{\alpha_1} f\bigl(X, \alpha_2; \alpha\bigr) = \arg\min_{\alpha_1} Q(X; \alpha). \tag{9}
\]

4.2. Nonlinear least squares solution

Equation (9) involves solving a nonlinear least squares problem. A standard iterative descent procedure can be used, initialized using one of the solutions in Section 3. In our implementation, we used the Matlab function lsqnonlin.

The straightforward nonlinear least squares solution we adopted converged quickly (in several seconds for all examples tested) and displayed no symptoms of numerical instability. In addition, the nonlinear least squares solution converged to the global minimum in all cases we considered. We note, however, that alternative methods for solving (9) may reduce computation. For example, we can divide the parameter set and iterate first on the sensor node location parameters and second on the remaining parameters. Although the sensor node orientations and source parameters depend nonlinearly on the sensor node locations, computationally efficient approximations exist (see, e.g., [29]), so the computational savings of lower-dimensional searches may exceed the added computational cost of iterations nested in iterations if the methods are tuned appropriately. Similarly, one can view the source parameters as nuisance parameters and employ estimate-maximize (EM) algorithms to obtain the ML solution [30].
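As a sketch of this step: the paper used Matlab's lsqnonlin, and scipy.optimize.least_squares plays the same role here. The function mu_model below is a hypothetical placeholder for the stacked noiseless measurement map of (4):

```python
import numpy as np
from scipy.optimize import least_squares

def residuals(alpha1, t_meas, theta_meas, mu_model, sigma_t, sigma_theta):
    """Whitened TOA/DOA residuals; their squared norm is Q(X; alpha)
    of (8). mu_model(alpha1) -> (tau, phi) evaluates (4) for the
    assembled parameter vector and is problem-specific."""
    tau, phi = mu_model(alpha1)
    return np.concatenate([(t_meas - tau) / sigma_t,
                           (theta_meas - phi) / sigma_theta])

# Initialized with a minimal-case solution from Section 3, e.g.:
# fit = least_squares(residuals, alpha1_init,
#                     args=(t_meas, theta_meas, mu_model,
#                           1e-3, np.deg2rad(3.0)))
```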

4.3. Estimation accuracy

The CRB gives a lower bound on the covariance of any unbiased estimate of α_1. It is a tight bound in the sense that α_1,ML has parameter uncertainty given by the CRB for high SNR, that is, as max_i Σ_ii → 0. Thus, the CRB is a useful tool for analyzing calibration uncertainty.

The CRB can be computed from the Fisher information matrix of α_1. The Fisher information matrix is given by [22]

\[
I_{\alpha_1} = E\Bigl\{\bigl[\nabla_{\alpha_1}\ln f(T, \Theta; \alpha)\bigr]\bigl[\nabla_{\alpha_1}\ln f(T, \Theta; \alpha)\bigr]^T\Bigr\}. \tag{10}
\]

The partial derivatives are readily computed from (6) and (4); we find that

\[
I_{\alpha_1} = \bigl[G'(\alpha_1)\bigr]^T \Sigma^{-1} \bigl[G'(\alpha_1)\bigr], \tag{11}
\]

where G'(α_1) is the 2AS × dim(α_1) matrix whose ij-th element is ∂μ_i(α_1)/∂(α_1)_j.

For Cases 3 and 4, the Fisher information matrix is rank deficient due to the translational and rotational ambiguity in the self-calibration solution. In order to obtain an invertible Fisher information matrix, some of the sensor node or source parameters must be known. It suffices to know the location and orientation of a single sensor node, or to know the locations of two sensor nodes or sources. These assumptions might be realized by equipping one sensor node with a GPS and a compass, or by equipping two sensor nodes or sources with GPSs. Let ᾱ_1 denote the vector obtained by removing these assumed known parameters from α_1. To compute the CRB matrix for ᾱ_1 in this case, we first remove all rows and columns in I_{α_1} that correspond to the assumed known parameters and then invert the remaining matrix [22]:

\[
C_{\bar{\alpha}_1} = \bigl[I_{\bar{\alpha}_1}\bigr]^{-1}. \tag{12}
\]
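A numerical sketch of (10)-(12), substituting a forward-difference Jacobian for the analytic derivatives; mu is again a placeholder for the noiseless measurement map, with the ambiguity-fixing parameters assumed already removed:

```python
import numpy as np

def crb(mu, alpha1, Sigma_inv, eps=1e-6):
    """CRB of (11)-(12): build G' by finite differences of mu(alpha1),
    form the Fisher information, and invert. Assumes the known
    parameters have been removed so the information matrix is
    invertible."""
    m0 = mu(alpha1)
    G = np.empty((m0.size, alpha1.size))
    for j in range(alpha1.size):
        a = alpha1.copy()
        a[j] += eps
        G[:, j] = (mu(a) - m0) / eps
    fim = G.T @ Sigma_inv @ G          # Fisher information, (11)
    return np.linalg.inv(fim)          # CRB covariance, (12)
```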

4.4. Partial measurements

Figure 4: Example scene showing ten sensor nodes (stars) and eleven sources (squares). Also shown are the 2σ location uncertainty ellipses of the sensor nodes and sources; these are on average less than 1 m in radius and show as small dots. The locations of sensor nodes A1 and A2 are assumed to be known.

So far we have assumed that every sensor node detects and measures both the TOA and DOA from every source signal. In this section, we relax that assumption. We assume that each emitted source signal is detected by only a subset of the sensor nodes in the field and that a sensor node that detects a source may measure the TOA and/or the DOA for that source, depending on its capabilities. We denote the availability of a measurement using two indicator functions I^t_ij and I^θ_ij, where

\[
I^t_{ij}, I^\theta_{ij} \in \{0, 1\}. \tag{13}
\]

If sensor node i measures the TOA (DOA) for source j, then I^t_ij = 1 (I^θ_ij = 1); otherwise, the indicator function is set to zero. Furthermore, let L denote the 2AS × 1 vector whose kth element is 1 if X_k is measured and 0 if X_k is not measured; L is thus obtained by forming A × S matrices I^t and I^θ and

stacking their columns into a vector as in (1). Finally, define X̄ to be the vector formed from the elements of X for which measurements are available, so X_k is in X̄ if L_k = 1.

The ML estimator for the partial measurement case is similar to (9) but uses only those elements of X for which the corresponding element of L is one. Thus,

\[
\alpha_{1,\mathrm{ML}} = \arg\min_{\alpha_1} Q\bigl(\bar{X}; \alpha\bigr), \tag{14}
\]

where (assuming uncorrelated measurement errors as in (8))

\[
Q\bigl(\bar{X}; \alpha\bigr) = \sum_{i=1}^{A}\sum_{j=1}^{S}\Biggl[\frac{\bigl(t_{ij} - \tau_{ij}(\alpha)\bigr)^2}{\sigma_t^2}\, I^t_{ij} + \frac{\bigl(\theta_{ij} - \phi_{ij}(\alpha)\bigr)^2}{\sigma_\theta^2}\, I^\theta_{ij}\Biggr]. \tag{15}
\]

The Fisher information matrix for this case is similar to (11) but includes only information from available measurements; thus

\[
I_{\alpha_1} = \bigl[G'(\alpha_1)\bigr]^T \Sigma^{-1} \bigl[G'(\alpha_1)\bigr], \tag{16}
\]

where

\[
\bigl[G'(\alpha_1)\bigr]_{ij} = L_i \cdot \frac{\partial \mu_i(\alpha_1)}{\partial (\alpha_1)_j}. \tag{17}
\]

Figure 5: Two standard deviation location uncertainty ellipses for sensor nodes A3 and A9 from Figure 4.

The above expression readily extends to the case when the probability of sensor node i detecting source j is neither zero nor one. If Σ is diagonal, the FIM for this case is given by

\[
I_{\alpha_1} = \bigl[G'(\alpha_1)\bigr]^T \Sigma^{-1} P_D \bigl[G'(\alpha_1)\bigr], \tag{18}
\]

where P_D is a diagonal matrix whose kth diagonal element is the probability that measurement X_k is available.
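A sketch of (18) for diagonal Σ, with the detection probabilities supplied as a per-measurement vector (names are illustrative):

```python
import numpy as np

def fim_partial(G, sigma2, p_detect):
    """FIM of (18): each measurement's information is scaled by its
    availability probability. G is the 2AS x dim(alpha1) Jacobian,
    sigma2 the per-measurement variances, p_detect the P_D diagonal."""
    w = p_detect / sigma2              # diagonal of Sigma^{-1} P_D
    return G.T @ (w[:, None] * G)
```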

We note that when partial measurements are available, the ML calibration may not be unique. For example, if only TOA measurements are available, a scene calibration solution and its mirror image have the same likelihoods. A complete understanding of the uniqueness properties of solutions in the partial measurement case is a topic of current research.

5. NUMERICAL RESULTS

This section presents numerical examples of the self-calibration procedure. First, we present a synthetically generated example consisting of ten sensor nodes and 2–11 sources placed randomly in a 2 km × 2 km region. Second, we present results from field measurements using four acoustic sensor nodes and four acoustic sources.

5.1. Synthetic data example

We consider a case in which ten sensor nodes are randomly placed in a 2 km × 2 km region. In addition, between two and 11 sources are randomly placed in the same region. The sensor node orientations and source emission times are randomly chosen. Figure 4 shows the locations of the sensor nodes and sources. We initially assume that every sensor node detects each source emission and measures the TOA and DOA of the source. The measurement uncertainties are Gaussian with standard deviations of σ_t = 1 ms for the TOAs and σ_θ = 3° for the DOAs. Neither the locations nor the emission times of the sources are assumed to be known. In order to eliminate the translation and rotation uncertainty in the scene, we assume that either two sensor nodes have known locations or one sensor node has known location and orientation.

Figure 4 also shows the two standard deviation (2σ) location uncertainty ellipses for both the sources and sensor nodes, assuming that the locations of sensor nodes A1 and A2 are known. The ellipses are obtained from the 2 × 2 covariance submatrices of the CRB in (12) that correspond to the location parameters of each sensor node or source. These ellipses appear as small dots in the figure; an enlarged view for two sensor nodes is shown in Figure 5.

The results of the ML estimation procedure are also shown in Figure 5. The “×” marks show the ML location estimates from 100 Monte-Carlo experiments in which noisy DOA and TOA measurements were randomly generated. The DOA and TOA measurement errors were drawn from Gaussian distributions with zero mean and standard deviations of σ_t = 1 ms and σ_θ = 3°, respectively. The solid ellipse shows the two standard deviation (2σ) uncertainty region as predicted from the CRB. We find good agreement between the CRB uncertainty predictions and the Monte-Carlo experiments, which demonstrates the statistical efficiency of the ML estimator for this level of measurement uncertainty.

Figure 6 shows an uncertainty plot similar to Figure 4, but in this case we assume that the location and orientation of sensor node A1 is known. In comparison with Figure 4, we see much larger uncertainty ellipses for the sensor nodes, especially in the direction tangent to circles with center at sensor node A1. The high tangential uncertainty is primarily due to the DOA measurement uncertainty with respect to the known orientation of sensor node A1.


Figure 6: The 2σ location uncertainty ellipses for the scene in Figure 4 when the location and orientation of sensor node A1 is assumed to be known.

By comparing Figures 4 and 6, we see that it is more desirable to know the locations of two sensor nodes than to know the location and orientation of a single sensor node; thus, equipping two sensor nodes with GPS systems results in lower uncertainty than equipping one sensor node with a GPS and a compass. In the example shown, we arbitrarily chose sensor nodes A1 and A2 to have known locations, and in this realization they happened to be relatively close to each other; however, choosing the two sensor nodes with known locations to be well separated tends to result in lower location uncertainties for the remaining sensor nodes.

We use as a quantitative measure of performance the 2σ uncertainty radius, defined as the radius of a circle whose area is the same as the area of the 2σ location uncertainty ellipse. The 2σ uncertainty radius for each sensor node or source is computed as the geometric mean of the major and minor axis lengths of the 2σ uncertainty ellipse. We find that the average 2σ uncertainty radius for all ten sensor nodes is 0.80 m for the example in Figure 4 and 3.28 m for the example in Figure 6.
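The computation of this radius from a 2 × 2 CRB covariance block is straightforward; the sketch below is illustrative (the function name is ours, not the paper's) and assumes the covariance submatrix has already been extracted from the inverse FIM.

```python
import numpy as np

def two_sigma_radius(cov2x2):
    # 2sigma uncertainty radius: the radius of the circle whose area
    # equals that of the 2sigma ellipse, i.e., the geometric mean of
    # the ellipse's 2sigma semi-axis lengths.
    sig2 = np.linalg.eigvalsh(cov2x2)       # variances along principal axes
    axes = 2.0 * np.sqrt(sig2)              # 2sigma semi-axis lengths
    return float(np.sqrt(axes[0] * axes[1]))
```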

Figure 7 shows the effect of increasing the number of sources on the average 2σ uncertainty radius. We plot the average of the ten sensor node 2σ uncertainty radii, computed from the CRB, using from 2 through 11 sources, starting initially with sources S1 and S2 in Figure 4 and adding sources S3, S4, …, S11 at each step. The solid line gives the average 2σ uncertainty radius values when sensor nodes A1 and A2 have known locations, and the dotted line corresponds to the case that A1 has known location and orientation. The uncertainty reduces dramatically when the number of sources increases from 2 to 3 and then decreases more gradually as more sources are added.

Figure 7: Average 2σ location uncertainty radius for the scenes in Figures 4 and 6 as a function of the number of source signals used.

Figure 8: Detection probability of a source a distance r from a sensor node, for three values of r0.

Partial measurements

Next, we consider the case when not all sensor nodes detect all sources. For a sensor node that is a distance r from a source, we model the detection probability as

P_D(r) = exp(−(r/r0)^2),  (19)

where r0 is a constant that adjusts the decay rate of the detection probability (r0 is the range in meters at which P_D = e^{-1}). We assume that when a sensor node detects a source, it measures both the DOA and TOA of that source.
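In a simulation of this partial-measurement scenario, each source-sensor pair can be assigned an independent Bernoulli detection outcome with the success probability of (19); the following sketch illustrates one way to do this (the function names and the use of numpy's default generator are our choices).

```python
import numpy as np

rng = np.random.default_rng(0)

def detection_prob(r, r0):
    # Detection probability of eq. (19); r0 is the range at which P_D = 1/e.
    return np.exp(-(np.asarray(r) / r0) ** 2)

def simulate_detections(ranges, r0):
    # One Bernoulli draw per source-sensor range.
    pd = detection_prob(ranges, r0)
    return rng.random(pd.shape) < pd
```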

Three detection probability profiles are considered, as shown in Figure 8, corresponding to r0 = 800 m, r0 = 2000 m, and r0 = ∞. Figure 9 shows the average 2σ uncertainty radius values, computed from the inverse of the Fisher information matrix in (18), for each of these choices of r0.


Figure 9: (a) Average 2σ location uncertainty for sensor nodes in Figure 4 for three detection probability profiles. (b) Average number of sources detected by each sensor node in each case.

In this experiment, we assume that the locations of sensor nodes A1 and A2 are known. The average number of sources detected by each sensor node is also shown. For r0 = 2000 m, we see only a slight uncertainty increase over the case where all sensor nodes detect all sources. When r0 = 800 m, the average location uncertainty is substantially larger, because the effective number of sources seen by each sensor node is small. This behavior is consistent with the average number of sources detected by each sensor node, shown in the figure. For a denser set of sensor nodes or sources, the uncertainty reduces to a value much closer to the case of full signal detection; for example, with 30 sensor nodes and 30 sources in this region, the average uncertainty is less than 1 m even when r0 = 800 m.

5.2. Field test results

We present the results of applying the self-calibration procedure to an acoustic source calibration data collection conducted during the DUNES test at Spesutie Island, Aberdeen Proving Ground, Maryland, in September 1999. In this test, four acoustic sensors are placed at known locations 60–100 m apart, as shown in Figure 10. Four acoustic source signals are also used; while exact ground truth locations of the sources are not known, it was recorded that each source was within approximately 1 m of a sensor. Each source signal is a series of bursts in the 40–160 Hz frequency band. Time-aligned samples of the sensor microphone signals are acquired at a sampling rate of 1057 Hz. Times of arrival are estimated by cross-correlating the measured microphone signals with the known source waveform and finding the peak of the correlation function. Only a single microphone signal is available at each sensor node, so while TOA measurements are obtained, no DOA measurements are available.
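The TOA estimation step described here (cross-correlate each microphone signal with the known source waveform and pick the correlation peak) can be sketched as follows; the function and argument names are illustrative, and sub-sample peak interpolation is omitted.

```python
import numpy as np

def estimate_toa(mic_signal, source_waveform, fs):
    # Cross-correlate the measured signal with the known waveform and
    # take the lag of the correlation peak as the time of arrival.
    corr = np.correlate(mic_signal, source_waveform, mode="full")
    lag = np.argmax(corr) - (len(source_waveform) - 1)   # lag in samples
    return lag / fs                                      # TOA in seconds
```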

Figure 10: Actual and estimated sensor node locations, and estimated source locations, using field test data. Sensor node A1 is assumed to have known location and orientation.

Figure 10 shows the ML estimates of sensor node and source locations, assuming that sensor node A1 has known location and orientation but assuming no information about the source locations or emission times. Since no DOA estimates are available, the location, but not the orientation, of each sensor node is estimated. The estimate shown in Figure 10 and its mirror image have identical likelihoods;


we have shown only the “correct” estimate in the figure. The location errors of sensor nodes A2, A3, and A4 are 0.09 m, 0.19 m, and 0.75 m, respectively, for an average error of 0.35 m. In addition, the source location estimates are within 1 m of the sensor node locations, consistent with our ground truth records.

Finally, we note that the calibration procedure requires little sensor node communication and has reasonable computational cost. The algorithms require low communication overhead, as each sensor node needs to communicate only 2 scalar values to the CIP for each source signal it detects. Computation of the calibration solution takes place at the CIP. For the synthetic examples presented, the calibration computation takes on the order of 10 seconds using Matlab on a standard personal computer. For the field test data, computation time was less than 1 second.

6. CONCLUSIONS

We have presented a procedure for calibrating the locations and orientations of a network of sensor nodes. The calibration procedure uses source signals that are placed in the scene and computes sensor node and source unknowns from TOA and/or DOA estimates obtained for each source-sensor node pair. We present ML solutions to four variations on this problem, depending on whether the source locations and signal emission times are known or unknown. We also discuss the existence and uniqueness of solutions and algorithms for initializing the nonlinear minimization step in the ML estimation. An ML calibration algorithm for the case of partial calibration measurements was also developed.

An analytical expression for the Cramer-Rao lower bound on the sensor node location and orientation error covariance matrix is also presented. The CRB is a useful tool for investigating the effects of sensor node density and source detection ranges on the self-localization uncertainty.

ACKNOWLEDGMENTS

This material is based in part upon work supported by the U.S. Army Research Office under Grant no. DAAH-96-C-0086 and Battelle Memorial Institute under Task Control no. 01092, and in part through collaborative participation in the Advanced Sensors Consortium sponsored by the U.S. Army Research Laboratory under the Federated Laboratory Program, Cooperative Agreement DAAL01-96-2-0001. Any opinions, findings, and conclusions or recommendations expressed in this publication are those of the authors and do not necessarily reflect the views of the U.S. Army Research Office, the Army Research Laboratory, or the U.S. government.

REFERENCES

[1] D. Estrin, L. Girod, G. Pottie, and M. Srivastava, “Instrumenting the world with wireless sensor networks,” in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing, vol. 4, pp. 2033–2036, Salt Lake City, Utah, USA, May 2001.

[2] G. Pottie and W. Kaiser, “Wireless integrated network sensors,” Communications of the ACM, vol. 43, no. 5, pp. 51–58, 2000.

[3] N. Srour, “Unattended ground sensors: a prospective for operational needs and requirements,” Tech. Rep., Army Research Laboratory, Adelphi, Md, USA, October 1999.

[4] S. Kumar, D. Shepherd, and F. Zhao, Eds., “Collaborative signal and information processing in microsensor networks,” IEEE Signal Processing Magazine, vol. 19, no. 2, pp. 13–14, 2002.

[5] N. Bulusu, J. Heidemann, and D. Estrin, “GPS-less low cost outdoor localization for very small devices,” IEEE Personal Communications Magazine, vol. 7, no. 5, pp. 28–34, 2000.

[6] C. Savarese, J. Rabaey, and J. Beutel, “Locationing in distributed ad-hoc wireless sensor networks,” in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing, vol. 4, pp. 2037–2040, Salt Lake City, Utah, USA, May 2001.

[7] A. Savvides, C.-C. Han, and M. B. Strivastava, “Dynamic fine-grained localization in ad-hoc networks of sensors,” in Proc. 7th Annual International Conference on Mobile Computing and Networking, pp. 166–179, Rome, Italy, July 2001.

[8] L. Girod, V. Bychkovskiy, J. Elson, and D. Estrin, “Locating tiny sensors in time and space: a case study,” in Proc. International Conference on Computer Design, Freiburg, Germany, September 2002.

[9] N. Bulusu, D. Estrin, L. Girod, and J. Heidemann, “Scalable coordination for wireless sensor networks: self-configuring localization systems,” in Proc. 6th International Symposium on Communication Theory and Applications, Ambleside, Lake District, UK, July 2001.

[10] C. Reed, R. E. Hudson, and K. Yao, “Direct joint source localization and propagation speed estimation,” in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing, vol. 3, pp. 1169–1172, Phoenix, Ariz, USA, March 1999.

[11] J. C. Chen, R. E. Hudson, and K. Yao, “Maximum-likelihood source localization and unknown sensor location estimation for wideband signals in the near field,” IEEE Trans. Signal Processing, vol. 50, pp. 1843–1854, August 2002.

[12] V. Cevher and J. H. McClellan, “Sensor array calibration via tracking with the extended Kalman filter,” in Proc. Fifth Annual Federated Laboratory Symposium on Advanced Sensors, pp. 51–56, College Park, Md, USA, March 2001.

[13] B. Friedlander and A. J. Weiss, “Direction finding in the presence of mutual coupling,” IEEE Trans. Antennas and Propagation, vol. 39, no. 3, pp. 273–284, 1991.

[14] N. Fistas and A. Manikas, “A new general global array calibration method,” IEEE Trans. Acoustics, Speech, and Signal Processing, vol. 4, pp. 553–556, 1994.

[15] B. C. Ng and C. M. S. See, “Sensor array calibration using a maximum likelihood approach,” IEEE Trans. Antennas and Propagation, vol. 44, no. 6, pp. 827–835, 1996.

[16] J. Pierre and M. Kaveh, “Experimental performance of calibration and direction-finding algorithms,” in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing, pp. 1365–1368, Toronto, Ont., Canada, 1991.

[17] B. Flanagan and K. Bell, “Improved array self calibration with large sensor position errors for closely spaced sources,” in Proc. 1st IEEE Sensor Array and Multichannel Signal Processing Workshop, pp. 484–488, Cambridge, Mass, USA, March 2000.

[18] Y. Rockah and P. M. Schultheiss, “Array shape calibration using sources in unknown locations. Part II: Near-field sources and estimator implementation,” IEEE Trans. Acoustics, Speech, and Signal Processing, vol. 35, no. 6, pp. 724–735, 1987.

[19] J. Elson and K. Romer, “Wireless sensor networks: a new regime for time synchronization,” in Proc. 1st Workshop on Hot Topics in Networks (HotNets-I), Princeton, NJ, USA, October 2002.

[20] J. Elson, L. Girod, and D. Estrin, “Fine-grained network time synchronization using reference broadcasts,” Tech. Rep. UCLA-CS-020008, University of California, Los Angeles, Los Angeles, Calif, USA, May 2002.

[21] D. Krishnamurthy, “Self-calibration techniques for acoustic sensor arrays,” M.S. thesis, The Ohio State University, Columbus, Ohio, USA, January 2002.

[22] H. L. Van Trees, Detection, Estimation, and Modulation Theory, Part I, John Wiley, New York, NY, USA, 1968.

[23] C. Knapp and G. C. Carter, “The generalized correlation method for estimation of time delay,” IEEE Trans. Acoustics, Speech, and Signal Processing, vol. 24, no. 4, pp. 320–327, 1976.

[24] A. Nehorai and M. Hawkes, “Performance bounds for estimating vector systems,” IEEE Trans. Signal Processing, vol. 48, pp. 1737–1749, June 2000.

[25] P. B. van Wamelen, Z. Li, and S. S. Iyengar, “A fast expected time algorithm for the 2-D point pattern matching problem,” submitted to Computational Geometry, August 2002.

[26] H.-C. Chiang, R. L. Moses, and L. C. Potter, “Model-based classification of radar images,” IEEE Transactions on Information Theory, vol. 46, no. 5, pp. 1842–1854, 2000.

[27] P. Stoica and R. L. Moses, Introduction to Spectral Analysis, Prentice-Hall, Upper Saddle River, NJ, USA, 1997.

[28] U. Baysal and R. L. Moses, “On the geometry of isotropic wideband arrays,” in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing, vol. 3, pp. 3045–3048, Orlando, Fla, USA, May 2002.

[29] J. Chaffee and J. Abel, “On the exact solutions of pseudorange equations,” IEEE Trans. Aerospace and Electronic Systems, vol. 30, pp. 1021–1030, October 1994.

[30] G. J. McLachlan and T. Krishnan, The EM Algorithm and Extensions, Wiley, New York, NY, USA, 1997.

Randolph L. Moses received the B.S., M.S., and Ph.D. degrees in electrical engineering from Virginia Polytechnic Institute and State University in 1979, 1980, and 1984, respectively. During the summer of 1983, he was a SCEEE Summer Faculty Research Fellow at Rome Air Development Center, Rome, NY. From 1984 to 1985, he was with the Eindhoven University of Technology, Eindhoven, the Netherlands, as a NATO Postdoctoral Fellow. Since 1985, he has been with the Department of Electrical Engineering, The Ohio State University, and is currently a Professor there. During 1994–1995, he was on sabbatical leave as a Visiting Researcher at the System and Control Group at Uppsala University in Sweden. His research interests are in digital signal processing and include parametric time series analysis, radar signal processing, sensor array processing, and sensor networks. Dr. Moses is an Associate Editor for the IEEE Transactions on Signal Processing, and served on the Technical Committee on Statistical Signal and Array Processing of the IEEE Signal Processing Society from 1991–1994. He is a coauthor, with P. Stoica, of Introduction to Spectral Analysis (Prentice Hall, 1997). He is a member of Eta Kappa Nu, Tau Beta Pi, Phi Kappa Phi, and Sigma Xi.

Dushyanth Krishnamurthy was born in Madras, India, on June 17, 1977. He received the Bachelor of Engineering degree in electronics and communication engineering from the University of Madras, Madras, in 1999 and the M.S. degree in electrical engineering from The Ohio State University, Columbus, Ohio, in 2002. Since 2002, he has been with the research and development team of B.A.S.P., Dallas, Tex. His research interests include sensor-array signal processing, image segmentation, and statistical data mining.

Robert M. Patterson received his B.S. degree in electrical engineering from Lafayette College, Easton, Pa, and his M.S. degree in electrical engineering from The Ohio State University, Columbus, in 2000 and 2002, respectively. He is currently an employee at The Johns Hopkins University Applied Physics Laboratory. His research interests are in signal and image processing.


EURASIP Journal on Applied Signal Processing 2003:4, 359–370
© 2003 Hindawi Publishing Corporation

Acoustic Source Localization and Beamforming: Theory and Practice

Joe C. Chen
Electrical Engineering Department, University of California, Los Angeles (UCLA), Los Angeles, CA 90095-1594, USA
Email: [email protected]

Kung Yao
Electrical Engineering Department, University of California, Los Angeles (UCLA), Los Angeles, CA 90095-1594, USA
Email: [email protected]

Ralph E. Hudson
Electrical Engineering Department, University of California, Los Angeles (UCLA), Los Angeles, CA 90095-1594, USA
Email: [email protected]

Received 17 February 2002 and in revised form 21 September 2002

We consider the theoretical and practical aspects of locating acoustic sources using an array of microphones. A maximum-likelihood (ML) direct localization is obtained when the sound source is near the array, while in the far-field case, we demonstrate the localization via the cross bearing from several widely separated arrays. In the case of multiple sources, an alternating projection procedure is applied to determine the ML estimate of the DOAs from the observed data. The ML estimator is shown to be effective in locating sound sources of various types, for example, vehicle, music, and even white noise. From the theoretical Cramer-Rao bound analysis, we find that better source location estimates can be obtained for high-frequency signals than for low-frequency signals. In addition, a large range estimation error results when the source signal is unknown, but this unknown parameter does not have much impact on angle estimation. Extensive experimentally measured acoustic data was used to verify the proposed algorithms.

Keywords and phrases: source localization, ML estimation, Cramer-Rao bound, beamforming.

1. INTRODUCTION

Acoustic source localization has been an active research area for many years. Applications include unattended ground sensor (UGS) networks for military surveillance and reconnaissance, or intrusion detection around the perimeter of a plant [1]. Many variations of algorithms using a microphone array for source localization in the near field, as well as direction-of-arrival (DOA) estimation in the far field, have been proposed [2]. Many of these techniques involve a relative time-delay-estimation step that is followed by a least squares (LS) fit to the source DOA or, in the near-field case, an LS fit to the source location [3, 4, 5, 6, 7].

In our previous paper [8], we derived the “optimal” parametric maximum likelihood (ML) solution to locate acoustic sources in the near field and provided computer simulations to show its superiority in performance over other methods. This paper is an extension of [8], where both the far- and the near-field cases are considered, and the theoretical analysis is provided by the Cramer-Rao bound (CRB), which is useful for both performance comparison and basic understanding purposes. In addition, several experiments have been conducted to verify the usefulness of the proposed algorithm. These experiments include both indoor and outdoor scenarios with half a dozen microphones to locate one or two acoustic sources (sound generated by computer speaker(s)).

One major advantage that the proposed ML approach has is that it avoids the intermediate relative time-delay estimation. This is made possible by transforming the wideband data to the frequency domain, where the signal spectrum can be represented by the narrowband model for each frequency bin. This allows a direct optimization for the source location(s) under the assumption of Gaussian noise instead of the two-step optimization that involves the relative time-delay estimation. The difficulty in obtaining relative time delays in the case of multiple sources is well known, and by avoiding this step, the proposed approach can then estimate multiple source locations.


However, in practice, when we apply the discrete Fourier transform (DFT), several artifacts can result due to the finite length of the data frame (see Section 2.1.1). As a result, there does not exist an exact ML solution for data of finite length. Instead, we ignore these finite-length effects and derive the solution, which we refer to as the approximate ML (AML) solution. Note that a similar solution has been derived independently in [9] for the far-field case.

In practice, the number of sources may be determined independently of or together with the localization algorithm, but here we assume that it is known for the purpose of this paper. For the single-source case, we have shown in [8] that the AML formulation is equivalent to maximizing the sum of the weighted cross-correlation functions between time-shifted sensor data. The optimization using all sensor pairs mitigates the ambiguity problem that often arises in the relative time-delay estimation between two widely separated sensors for the two-step LS methods. In the case of multiple sources, we apply an efficient alternating projection (AP) procedure, which avoids the multidimensional search by sequentially estimating the location of one source while fixing the estimates of the other source locations from the previous iteration. In this paper, we demonstrate the localization results of applying the AML method to the measured data, both in the near-field and far-field cases, and for various types of sound sources, for example, vehicle, music, and even white noise. The AML approach is shown to outperform the LS-type algorithms in the single-source case, and by applying AP, the proposed algorithm is able to locate two sound sources from the observed data.

The paper is organized as follows. In Section 2, the theoretical performances of DOA estimation and source localization with the CRB analysis are given. Then, we derive the AML solution for DOA estimation and source localization in Section 3. In Section 4, simulation examples and experimental results are given to demonstrate the usefulness of the proposed method. Finally, we give our conclusions.

2. THEORETICAL PERFORMANCE AND ANALYSIS

In this section, the theoretical performances of DOA estimation for the far-field case and of source localization for the near-field case are analyzed. First, we define the signal models for the far- and near-field cases. Then, the CRBs are derived and analyzed. The CRB is most often used as a theoretical lower bound for any unbiased estimator [10]. Most of the derivations of the CRB for wideband source localization found in the literature are in terms of relative time-delay estimation error. In the following, we derive a more general CRB directly from the signal model. By developing a theoretical lower bound in terms of signal characteristics and array geometry, we not only bypass the involvement of the intermediate time-delay estimator but also offer useful insights into the physical properties of the problem.

The DOA and source localization variances both depend on two separate parts, one that only depends on the signal and another that only depends on the array geometry. This suggests separate performance dependence on the signal and the geometry.

Figure 1: Far-field example with randomly distributed sensors.

Thus, for any given signal, the CRB can provide the theoretical performance of a particular geometry and helps the design of an array configuration for a particular scenario of interest. The signal-dependence part shows that theoretically the DOA and source location root mean square (RMS) errors are linearly proportional to the noise level and the speed of propagation, and inversely proportional to the source spectrum and frequency. Thus, better DOA and source location estimates can be obtained for high-frequency signals than for low-frequency signals. In further sensitivity analysis, a large range estimation error is found when the source signal is unknown, but this unknown parameter does not affect the angle estimation.

The CRB analysis also shows that the uniformly spaced circular array provides an attractive geometry for good overall performance. When a circular array is used, the DOA variance bound is independent of the source direction, and it also does not degrade when the speed of propagation is unknown. An effective beamwidth for DOA estimation can also be given by the CRB. The beamwidth provides a measure of how densely the angles should be sampled for the AML metric evaluation, thus preventing unneeded iterations using numerical techniques.

Throughout this paper, we denote superscript T as the transpose, H as the complex conjugate transpose, and ∗ as the complex conjugate operation.

2.1. Signal model of the far- and near-field cases

2.1.1 The far-field case

When the source is in the far field of the array, the wave front is assumed to be planar and only the angle information can be estimated. In this case, we use the array centroid as the reference point and define a signal model based on the relative time delays from this position. For simplicity, we assume a randomly distributed planar (2D) array of R sensors, each at position r_p = [x_p, y_p]^T, as depicted in Figure 1. The centroid position is given by r_c = (1/R) Σ_{p=1}^{R} r_p = [x_c, y_c]^T. The sensors are assumed to be omnidirectional and have identical responses. On the same plane as the array, we assume that there are M sources (M < R), each at an angle φ_s^{(m)}


from the array, for m = 1, …, M. The angle convention is such that north is 0 degrees and east is 90 degrees. The relative time delay of the mth source is given by

t_cp^{(m)} = t_c^{(m)} − t_p^{(m)} = [(x_c − x_p) sin φ_s^{(m)} + (y_c − y_p) cos φ_s^{(m)}]/v,

where t_c^{(m)} and t_p^{(m)} are the absolute time delays from the mth source to the centroid and the pth sensor, respectively, and v is the speed of propagation in length units per sample. The data collected by the pth sensor at time n can be given by

x_p(n) = Σ_{m=1}^{M} s_c^{(m)}(n − t_cp^{(m)}) + w_p(n),  (1)

for n = 0, …, L − 1, p = 1, …, R, and m = 1, …, M, where s_c^{(m)} is the source signal arriving at the array centroid position, t_cp^{(m)} is allowed to be any real-valued number, and w_p is the zero-mean white Gaussian noise with variance σ².

For ease of derivation and analysis, the wideband signal model should be given in the frequency domain, where a narrowband model can be given for each frequency bin. A block of L samples of each sensor's data can be transformed to the frequency domain by a DFT of length N. It is well known that the DFT creates a circular time shift when a linear phase shift is applied in the frequency domain. However, the time delay in the array data corresponds to a linear time shift, thus creating a mismatch in the signal model, which we refer to as an edge effect. When N = L, a severe edge effect results for small L, but the model becomes a good approximation for large L. We can apply zero padding for small L to remove this edge effect, that is, N ≥ L + τ, where τ is the maximum relative time delay among all sensor pairs. However, the zero padding removes the orthogonality of the noise component across frequency. In practice, the size of L is limited due to the nonstationarity of the source location. In the following, we assume that either L is large enough or the noise is almost uncorrelated across frequency. Note that the CRB derived based on this frequency-domain model is idealistic and does not take this edge effect into account.

In the frequency domain, the array signal model is given by

X(k) = D(k)S_c(k) + η(k),  (2)

for k = 0, …, N − 1, where the array data spectrum is given by X(k) = [X_1(k), …, X_R(k)]^T, the steering matrix is given by D(k) = [d^{(1)}(k), …, d^{(M)}(k)], the steering vector is given by d^{(m)}(k) = [d_1^{(m)}(k), …, d_R^{(m)}(k)]^T with d_p^{(m)}(k) = e^{−j2πk t_cp^{(m)}/N}, and the source spectrum is given by S_c(k) = [S_c^{(1)}(k), …, S_c^{(M)}(k)]^T. The noise spectrum vector η(k) is zero-mean complex white Gaussian, distributed with variance Lσ². Note that, due to the transformation to the frequency domain, η(k) asymptotically approaches a Gaussian distribution by the central limit theorem even if the actual time-domain noise has an arbitrary i.i.d. distribution (with bounded variance) other than Gaussian. This asymptotic property in the frequency domain provides a more reliable noise model than the time-domain model in some practical cases. For convenience of notation, we define S(k) =

D(k)S_c(k). By stacking up the N/2 positive frequency bins (the zero frequency bin is not important and the negative frequency bins are merely mirror images) of the signal model in (2) into a single column, we can rewrite the sensor data as an NR/2 × 1 space-temporal frequency vector X = G(Θ) + ξ, where G(Θ) = [S(1)^T, …, S(N/2)^T]^T and R_ξ = E[ξξ^H] = Lσ² I_{NR/2}.
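To make the model concrete, the following sketch builds the far-field steering vector d(k) of (2) for one frequency bin; the function name is ours, under the paper's definition of t_cp (north is 0 degrees, east is 90 degrees, and v is in length units per sample).

```python
import numpy as np

def farfield_steering(k, N, sensor_xy, phi_s, v):
    # d_p(k) = exp(-j 2 pi k t_cp / N), with
    # t_cp = [(x_c - x_p) sin(phi_s) + (y_c - y_p) cos(phi_s)] / v.
    xc, yc = sensor_xy.mean(axis=0)                   # array centroid
    t_cp = ((xc - sensor_xy[:, 0]) * np.sin(phi_s)
            + (yc - sensor_xy[:, 1]) * np.cos(phi_s)) / v
    return np.exp(-1j * 2 * np.pi * k * t_cp / N)
```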

2.1.2 The near-field case

In the near-field case, the range information can also be estimated in addition to the DOA. Denote r_sm as the location of the mth source; in this case, we use this as the reference point instead of the array centroid. Since we consider near-field sources, the signal strength at each sensor can be different due to nonuniform spatial loss in the near-field geometry. The sensors are again assumed to be omnidirectional and have identical responses. In this case, the data collected by the pth sensor at time n can be given by

x_p(n) = Σ_{m=1}^{M} a_p^{(m)} s_0^{(m)}(n − t_p^{(m)}) + w_p(n),  (3)

for n = 0, …, L − 1, p = 1, …, R, and m = 1, …, M, where a_p^{(m)} is the signal-gain level of the mth source at the pth sensor (assumed to be constant within the block of data), s_0^{(m)} is the source signal, and t_p^{(m)} is allowed to be any real-valued number. The time delay is defined by t_p^{(m)} = ‖r_sm − r_p‖/v, and the relative time delay between the pth and the qth sensors is defined by t_pq^{(m)} = t_p^{(m)} − t_q^{(m)} = (‖r_sm − r_p‖ − ‖r_sm − r_q‖)/v.

With the same edge-effect problem mentioned above, the frequency-domain model for the near-field case is given by

X(k) = D(k)S_0(k) + η(k),  (4)

for k = 0, …, N − 1, where each element of the steering vector now becomes d_p^{(m)}(k) = a_p^{(m)} e^{−j2πk t_p^{(m)}/N}, and the source spectrum is given by S_0(k) = [S_0^{(1)}(k), …, S_0^{(M)}(k)]^T.
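A matching sketch for the near-field steering vector of (4), again with illustrative names; the gains a_p are supplied by the caller, since the spatial-loss model is scenario dependent.

```python
import numpy as np

def nearfield_steering(k, N, sensor_xy, src_xy, gains, v):
    # d_p(k) = a_p * exp(-j 2 pi k t_p / N), with t_p = ||r_s - r_p|| / v.
    t_p = np.linalg.norm(src_xy - sensor_xy, axis=1) / v
    return gains * np.exp(-1j * 2 * np.pi * k * t_p / N)
```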

2.2. Cramer-Rao bound for DOA estimation

In the following CRB derivation, we consider the single-source case (M = 1) under three conditions: known signal and known speed of propagation, known signal but unknown speed of propagation, and known speed of propagation but unknown signal. The comparison of the three conditions provides a sensitivity analysis of the different parameters. Only the single-source case is considered since valuable analysis can be obtained using a single source, while the analytic expression for the multiple-sources case becomes much more complicated. The far-field frequency-domain signal model for the single-source case is given by

X(k) = S_c(k)d(k) + η(k),  (5)

for k = 0, …, N − 1, where d(k) = [d_1(k), …, d_R(k)]^T, d_p(k) = e^{−j2πk t_cp/N}, and S_c(k) is the source spectrum of this source.


After considering all the positive frequency bins, we can construct the Fisher information matrix [10] by

F = 2 Re[H^H R_ξ^{-1} H] = (2/(Lσ²)) Re[H^H H],  (6)

where H = ∂G/∂φ_s for the case of known signal and known speed of propagation. In this case, the Fisher information matrix is indeed a scalar F_φs = ζα, where ζ = (2/(Lσ²v²)) Σ_{k=1}^{N/2} (2πk|S_c(k)|/N)² is the scale factor that is proportional to the total power in the derivative of the source signal, and α = Σ_{p=1}^{R} b_p² is the geometry factor that depends on the array and the source direction, where

b_p = (x_c − x_p) cos φ_s − (y_c − y_p) sin φ_s.  (7)

Hence, for any arbitrary array, the RMS error bound for DOA estimation is given by σ_φs ≥ 1/√(ζα). The geometry factor α provides a measure of the geometric relations between the source and the sensor array. Poor array geometry may lead to a small α, which results in a large estimation variance. It is clear from the scale factor ζ that the performance does not solely depend on the SNR but also on the signal bandwidth and spectral density. Thus, source localization performance is better for signals with more energy in the high frequencies.

In the case of an unknown source signal, the matrix H = [∂G/∂φ_s, ∂G/∂|S_c|^T, ∂G/∂Φ_c^T], where S_c = [S_c(1), …, S_c(N/2)]^T, and |S_c| and Φ_c are the magnitude and phase parts of S_c, respectively. The resulting bound after applying the well-known block matrix inversion lemma (see [11, Appendix]) on F_{φs,Sc} is given by σ_φs ≥ 1/√(ζ(α − z_Sc)), where z_Sc = (1/R)[Σ_{p=1}^{R} b_p]² is the penalty term due to the unknown source signal. It is known that the DOA performance does not degrade when the source signal is unknown; thus, we can show that z_Sc is indeed zero, that is, Σ_{p=1}^{R} b_p = cos φ_s Σ_{p=1}^{R} (x_c − x_p) − sin φ_s Σ_{p=1}^{R} (y_c − y_p) = 0, since Σ_{p=1}^{R} (x_c − x_p) = R x_c − Σ_{p=1}^{R} x_p = 0 and Σ_{p=1}^{R} (y_c − y_p) = 0. Note that the above analysis is valid for any arbitrary array. When the speed of propagation is unknown, the matrix H = [∂G/∂φ_s, ∂G/∂v], and the resulting bound after applying the matrix inversion lemma on F_{φs,v} is given by σ_φs ≥ 1/√(ζ(α − z_v)), where z_v = (1/Σ_{p=1}^{R} t_cp²)[Σ_{p=1}^{R} b_p t_cp]² is the penalty term due to the unknown speed of propagation. This penalty term is not necessarily zero for an arbitrary array, but it becomes zero for a uniformly spaced circular array.

2.2.1 The circular-array case

In the following, we show the CRB for a uniformly spaced circular array. Not only can a simple analytic form be given, but this is also the optimal geometry for DOA estimation. The variance of the DOA estimation is independent of the source direction, and it also does not degrade when the speed of propagation is unknown. Without loss of generality, we pick the array centroid as the origin, that is, r_c = [0, 0]^T. The location of the pth sensor is given by r_p = [ρ sin φ_p, ρ cos φ_p]^T, where ρ is the radius of the circular array, φ_p = 2πp/R + φ_0 is the angle of the pth sensor with respect to north, and φ_0 is the angle that defines the orientation of the array. Then, α = ρ²R/2. The DOA variance bound is given by σ_φs² (circular array) ≥ 2/(ζρ²R), which is independent of the source direction. It is useful to define the following terms for a better interpretation of the CRB. Define the normalized root weighted mean squared (nrwms) source frequency by

k_nrwms ≡ (2/N) √( Σ_{k=1}^{N/2} k² |S_c(k)|² / Σ_{k=1}^{N/2} |S_c(k)|² ),  (8)

and the effective beamwidth by

φ_BW ≡ v/(πρ k_nrwms).  (9)

Then, the RMS error bound for DOA estimation can be given by

σ_φs (circular array) ≥ φ_BW/√(SNR_array),  (10)

where the effective SNR ≜ Σ_{k=1}^{N/2} |S_c(k)|²/(Lσ²) and SNR_array = R · SNR.

to the speed and propagation and inversely proportional tothe circular array radius and the nrwms source frequency.For example, take v = 345/1000 = 0.345 m/sample, N =256, ρ = 0.1 m, knrwms = 0.78, and φBW = 2.8 degree. If weuse a larger circular array where ρ = 0.5 m, φBW = 0.6 degree.The effective beamwidth is useful to determine the angularsampling for the AML maximization. This avoids excessivesampling in the angular space and also prevents further it-erations on the AML maximization. Based on the angularsampling by the effective beamwidth, a quadratic polynomialinterpolation (concave function) of three points can yieldthe DOA estimate easily (see Appendix A). The explicit an-alytical form of the CRB for the circular array is also appli-cable to a randomly distributed 2D array. For instance, wecan compute the RMS distance of the sensors from its cen-troid and use that as the radius ρ in the circular array for-mula to obtain the effective beamwidth to estimate the per-formance of a randomly distributed 2D array. For instance,for a randomly distributed array of 5 sensors at positions{(1, 1), (2, 0.8), (3, 1.4), (1.5, 3), (1, 2.5)}, the RMS distance ofthe array to its centroid is 1.14. Since we cannot obtain anexplicit analytical form for this random array, we can simplyuse the circular array formula for ρ = 1.14 to obtain the effec-tive beamwidth φBW. For some random arrays, the DOA vari-ance depends highly on the source direction, and an ellipticalmodel is better than the circular one (see Appendix B).

2.3. CRB for source localization

For the near-field case, we also consider the CRB for a single source under three different conditions. The source signal S_c and the steering vector of the far-field case are replaced by S_0 and by the steering vector with signal-gain levels a_p in the signal component G, respectively.


For the first case, we can construct the Fisher information matrix by (6), where H = ∂G/∂r_s^T, assuming that r_s is the only unknown. In this case, F_rs = ζA, where

A = Σ_{p=1}^{R} a_p² u_p u_p^T  (11)

is the array matrix and u_p = (r_s − r_p)/‖r_s − r_p‖. The A matrix provides a measure of the geometric relations between the source and the sensor array. Poor array geometry may lead to degeneration in the rank of the matrix A. Note that the near-field CRB has the same dependence ζ on the signal as the far-field case.

When the speed of propagation is also unknown, that is, Θ = [r_s^T, v]^T, the H matrix is given by H = [∂G/∂r_s^T, ∂G/∂v]. The Fisher information block matrix for this case is given by

F_{rs,v} = ζ [ A, −U A_a t ; −t^T A_a U^T, t^T A_a t ],  (12)

where U = [u_1, …, u_R], A_a = diag([a_1², …, a_R²]), and t = [t_1, …, t_R]^T. By applying the block matrix inversion lemma, the leading D × D submatrix of the inverse Fisher information block matrix can be given by

[F_{rs,v}^{-1}]_{11:DD} = (1/ζ)(A − Z_v)^{-1},  (13)

where the penalty matrix due to the unknown speed of propagation is defined by Z_v = (1/(t^T A_a t)) U A_a t t^T A_a U^T. The matrix Z_v is nonnegative definite; therefore, the source localization error of the unknown speed of propagation case is always larger than that of the known case.

When the source signal is also unknown, that is, Θ = [r_s^T, |S_0|^T, Φ_0^T]^T, the H matrix is given by H = [∂G/∂r_s^T, ∂G/∂|S_0|^T, ∂G/∂Φ_0^T], where S_0 = [S_0(1), …, S_0(N/2)]^T, and |S_0| and Φ_0 are the magnitude and phase parts of S_0, respectively. The Fisher information matrix can then be explicitly given by

F_{rs,S0} = [ ζA, B ; B^T, D ],  (14)

where B and D are not explicitly given since they are not needed in the final expression. By applying the block matrix inversion lemma, the leading D × D submatrix of the inverse Fisher information block matrix can be given by

[F_{rs,S0}^{-1}]_{11:DD} = (1/ζ)(A − Z_S0)^{-1},  (15)

where the penalty matrix due to the unknown source signal is defined by

Z_S0 = (1/Σ_{p=1}^{R} a_p²) (Σ_{p=1}^{R} a_p² u_p)(Σ_{p=1}^{R} a_p² u_p)^T.  (16)

The CRB with the unknown source signal is always larger than that with the known source signal; this can be easily shown since the penalty matrix Z_S0 is nonnegative definite. The Z_S0 matrix acts as a penalty term since it is the average of the square of the weighted u_p vectors. The estimation variance is larger when the source is far away since the u_p vectors are similar in direction and generate a larger penalty matrix, that is, the u_p vectors add up. When the source is inside the convex hull of the sensor array, the estimation variance is smaller since Z_S0 approaches zero, that is, the u_p vectors cancel each other. For the 2D case, the CRB for the distance error of the estimated location [x̂_s, ŷ_s]^T from the true source location can be given by

σ_d² = σ_x̂s² + σ_ŷs² ≥ [F_{rs,S0}^{-1}]_{11} + [F_{rs,S0}^{-1}]_{22},  (17)

where d² = (x̂_s − x_s)² + (ŷ_s − y_s)². By further expanding the parameter space, the CRB for multiple-source localization can also be derived, but its analytical expression is much more complicated and will not be considered here. The case of the unknown signal and the unknown speed of propagation is also not shown due to its complicated form but numerical similarity to the unknown signal case. Note that when both the source signal and the sensor gains are unknown, it is not possible to uniquely determine the values of the source signal and the sensor gains (they can only be estimated up to a scaled constant).
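The quantities in (11) and (15)-(17) are easy to assemble numerically; the sketch below is illustrative (the function name is ours, 2D case) and assumes the scale factor ζ has been computed from the source spectrum as in the far-field case.

```python
import numpy as np

def nearfield_crb_distance(zeta, gains, sensor_xy, src_xy):
    # CRB block of (15): (1/zeta) * (A - Z_S0)^{-1}; distance bound of (17).
    diff = src_xy - sensor_xy                       # rows r_s - r_p
    u = diff / np.linalg.norm(diff, axis=1, keepdims=True)
    a2 = gains ** 2
    A = (a2[:, None, None] * u[:, :, None] * u[:, None, :]).sum(axis=0)
    w = (a2[:, None] * u).sum(axis=0)               # sum_p a_p^2 u_p
    Z = np.outer(w, w) / a2.sum()                   # penalty matrix (16)
    crb = np.linalg.inv(A - Z) / zeta
    return float(np.sqrt(crb[0, 0] + crb[1, 1]))    # bound on RMS distance error
```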

2.3.1 The circular-array case

In the following, we again consider the uniformly spaced circular array with radius ρ for the near-field CRB. Assume that the source is at a distance r_s from the array centroid that is large enough so that the signal-gain levels are uniform, that is, a_p = a. Consider the 2D case of unknown source signal and, without loss of generality, let the line of sight (LOS) be the X-axis and the cross line of sight (CLOS) be the Y-axis. Then, the error covariance matrix is given by

[F_{rs,S0}^{-1}]_{11:22} (circular array) = (1/ζ)(A − Z_S0)^{-1} = [ σ_LOS², 0 ; 0, σ_CLOS² ] ≈ (2r_s²/(ζRa²ρ²)) [ O(r_s/ρ), 0 ; 0, 1 ].  (18)

The intermediate approximations are given in Appendix C. The above result shows that as r_s increases, the LOS error increases much faster than the CLOS error. For any arbitrary source location, the LOS error is always uncorrelated with the CLOS error. The variance of the DOA estimation is given by σ_φs² = σ_CLOS²/r_s² ≈ 2/(ζRa²ρ²), which is the same as the far-field case for a = 1. The ratio of the CLOS and LOS errors can provide a quantitative measure to differentiate the far field from the near field. For example, define the far field as the case when the ratio r_s/ρ > γ. Then, for a given circular array, we can define the far field as the case when the source range exceeds the array radius γ times.


The explicit analytical form of the circular-array CRB in the near-field case is again useful for a randomly distributed 2D array. In the near-field case, the location error bound can be represented by an ellipse, where its major axis represents the LOS error and its minor axis represents the CLOS error.

3. ML SOURCE LOCALIZATION AND DOA ESTIMATION

3.1. Derivation of the ML solution

The derivation of the AML solution for real-valued signals generated by wideband sources is an extension of the classical ML DOA estimator for narrowband signals. Due to the wideband nature of the signal, the AML metric results in a combination of each subband. In the following derivation, the near-field signal model is used for source localization, and the DOA estimation formulation is merely the result of a trivial substitution.

We assume initially that the unknown parameter space is Θ = [r_s^T, S_0^{(1)T}, …, S_0^{(M)T}]^T, where the source locations are denoted by r_s = [r_s1^T, …, r_sM^T]^T and the source signal spectrum is denoted by S_0^{(m)} = [S_0^{(m)}(1), …, S_0^{(m)}(N/2)]^T. By stacking up the N/2 positive frequency bins of the signal model in (4) into a single column, we can rewrite the sensor data as an NR/2 × 1 space-temporal frequency vector X = G(Θ) + ξ, where G(Θ) = [S(1)^T, …, S(N/2)^T]^T, S(k) = D(k)S_0(k), and R_ξ = E[ξξ^H] = Lσ² I_{NR/2}. The log-likelihood function of the complex Gaussian noise vector ξ, after ignoring irrelevant constant terms, is given by ℓ(Θ) = −‖X − G(Θ)‖². The ML estimation of the source locations and source signals is given by the following optimization criterion:

max_Θ ℓ(Θ) = min_Θ Σ_{k=1}^{N/2} ‖X(k) − D(k)S_0(k)‖²,  (19)

which is equivalent to finding min_{rs, S0(k)} f(k) for all k bins, where

f(k) = ‖X(k) − D(k)S_0(k)‖².  (20)

The minima of f(k), with respect to the source signal vector S_0(k), must satisfy ∂f(k)/∂S_0^H(k) = 0; hence, the estimate of the source signal vector which yields the minimum residual at any source location is given by

Ŝ_0(k) = D†(k)X(k),  (21)

where D†(k) = (D(k)^H D(k))^{-1} D(k)^H is the pseudoinverse of the steering matrix D(k). Define the orthogonal projection P(k, r_s) = D(k)D†(k) and the complement orthogonal projection P⊥(k, r_s) = I − P(k, r_s). By substituting (21) into (20), the minimization function becomes f(k) = ‖P⊥(k, r_s)X(k)‖². After substituting the estimate of S_0(k), the AML source location estimates can be obtained by solving the following maximization problem:

max_{rs} J(r_s) = max_{rs} Σ_{k=1}^{N/2} ‖P(k, r_s)X(k)‖².  (22)

Note that the AML metric J(r_s) has an implicit form for the estimation of S_0(k), whereas the metric ℓ(Θ) shows the explicit form. Once the AML estimate of r_s is obtained, the AML estimate of the source signals can be given by (21). Similarly, in the far-field case, the unknown parameter vector contains only the DOAs, that is, φ_s = [φ_s^{(1)}, …, φ_s^{(M)}]^T. Thus, the AML DOA estimate can be obtained by arg max_{φs} Σ_{k=1}^{N/2} ‖P(k, φ_s)X(k)‖². It is interesting that, when zero padding is applied, the covariance matrix R_ξ is no longer diagonal and is indeed singular; thus, an exact ML solution cannot be derived without the inverse of R_ξ. In the above formulation, we derive the AML solution using only a single block. A different AML solution using multiple blocks could also be formed, with some possible computational advantages. When the speed of propagation is unknown, as in the case of seismic media, we may expand the unknown parameter space to include it, that is, Θ = [r_s^T, v]^T.

3.2. Single-source case

In the single-source case, the AML metric in (22) becomes J(r_s) = Σ_{k=1}^{N/2} |B(k, r_s)|², where B(k, r_s) = d̄(k, r_s)^H X(k) is the beam-steered beamformer output in the frequency domain [12], d̄ = d/√(Σ_{p=1}^{R} a_p²) is the normalized steering vector, and ā_p = a_p/√(Σ_{p=1}^{R} a_p²) is the normalized signal-gain level at the pth sensor. It is interesting to note that in the near-field case, the AML beamformer output is the result of forming a focused spot (or area) on the source location rather than a beam, since the range is also considered. In the far-field case, the AML metric becomes J(φ_s). In [8], the AML criterion is shown to be equivalent to maximizing the weighted cross correlations between sensor data, which is commonly used for estimating relative time delays.

The source location can be estimated based on where J(r_s) is maximized over a given set of locations. Define the normalized metric

J_N(r_s) ≡ Σ_{k=1}^{N/2} |B(k, r_s)|² / J_max ≤ 1,  (23)

where J_max = Σ_{k=1}^{N/2} [Σ_{p=1}^{R} ā_p |X_p(k)|]², which is useful to verify estimated peak values. Without any prior information on the possible region of the source location, the AML metric should be evaluated on a set of grid points. A nonuniform grid is suggested to reduce the number of grid points. For the 2D case, polar coordinates with nonuniform sampling of the range and uniform sampling of the angle can be transformed to Cartesian coordinates that are dense near the array and sparse away from the array. When a crude estimate of the source location is obtained from the grid-point search, iterative methods can be applied to reach the global maximum (without running into local maxima, given an appropriate choice of grid points). In some cases, the grid-point search is not necessary since a good initial location estimate is available from, for example, the estimate of the previous data frame for a slowly moving source. In this paper, we consider the Nelder-Mead direct search method [13] for the purpose of performance evaluation.


3.3. Multiple-sources case

For the multiple-sources case, the parameter estimation is a challenging task. Although iterative multidimensional parameter search methods such as the Nelder-Mead direct search method can be applied to avoid an exhaustive multidimensional grid search, finding the initial source location estimates is not trivial. Since iterative solutions for the single-source case are more robust and the initial estimate is easier to find, we extend the AP method in [14] to the near-field problem. The AP approach breaks the multidimensional parameter search into a sequence of single-source parameter searches and yields a fast convergence rate. The following describes the AP algorithm for the two-sources case, but it can be easily extended to the case of M sources. Let Θ = [Θ_1^T, Θ_2^T]^T be either the source locations in the near-field case or the DOAs in the far-field case.

AP algorithm 1.

Step 1. Estimate the location/DOA of the stronger source on a single-source grid:

Θ_1^{(0)} = arg max_{Θ1} J(Θ_1).  (24)

Step 2. Estimate the location/DOA of the weaker source on a single-source grid under the assumption of a two-source model, while keeping the first source location estimate from Step 1 constant:

Θ_2^{(0)} = arg max_{Θ2} J([Θ_1^{(0)T}, Θ_2^T]^T).  (25)

Step 3. Iterative AML parameter search (direct or gradient search) for the location/DOA of the first source, while keeping the estimate of the second source location from the previous iteration constant:

Θ_1^{(i)} = arg max_{Θ1} J([Θ_1^T, Θ_2^{(i−1)T}]^T).  (26)

Step 4. Iterative AML parameter search (direct or gradient search) for the location/DOA of the second source, while keeping the estimate of the first source location from Step 3 constant:

Θ_2^{(i)} = arg max_{Θ2} J([Θ_1^{(i)T}, Θ_2^T]^T).  (27)

Repeat Steps 3 and 4 for i = 1, 2, … until convergence.
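In pseudocode form, the two-source AP iteration reduces to a pair of coordinate-wise maximizations; the sketch below uses a fixed iteration count as a stand-in for a convergence test, and J is assumed to evaluate the AML metric for a list of one or two per-source parameters.

```python
def alternating_projection(J, grid, n_iter=10):
    # Steps 1-2: greedy single-source initializations on the grid.
    theta1 = max(grid, key=lambda t: J([t]))
    theta2 = max(grid, key=lambda t: J([theta1, t]))
    # Steps 3-4: alternate single-source refinements; a fixed iteration
    # count is used here in place of a convergence check.
    for _ in range(n_iter):
        theta1 = max(grid, key=lambda t: J([t, theta2]))
        theta2 = max(grid, key=lambda t: J([theta1, t]))
    return theta1, theta2
```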

4. SIMULATION EXAMPLES AND EXPERIMENTAL RESULTS

4.1. Cramer-Rao bound example

In the following simulation examples, we consider a prerecorded tracked vehicle signal with significant spectral content of about 50 Hz bandwidth centered about a dominant frequency at 100 Hz.

Figure 2: Single-traveling-source scenario. Uniformly spaced circular array of 7 elements.

The sampling frequency is set to 1 kHz and the speed of propagation is 345 m/s. The data length is L = 200 (which corresponds to 0.2 second), the DFT size is N = 256 (zero padding), and all positive frequency bins are considered. We consider a single-traveling-source scenario for a circular array of seven elements (uniformly spaced on the circumference), as depicted in Figure 2. In this case, we consider the spatial loss that is a function of the distance from the source location to each sensor location; thus the gains a_p are not uniform. To compare the theoretical performance of source localization under different conditions, we compare the CRB for the known source signal and speed of propagation, for the unknown speed of propagation, and for the unknown source signal cases for this single-traveling-source scenario. As depicted in Figure 3, the unknown source signal is shown to be a much more significant parameter factor than the unknown speed of propagation in source location estimation. However, these parameters are not significant in the DOA estimation.

4.2. Single-source experimental results

Several acoustic experiments were conducted at Xerox PARC, Palo Alto, Calif, USA. The experimental data was collected indoors as well as outdoors by half a dozen to a dozen omnidirectional microphones. A semianechoic chamber with sound-absorbing foams attached to the walls and ceiling (shown to have a few dominant reflections) was used for the indoor data collection. An omnidirectional loud speaker was used as the sound source. In one indoor experiment, the source is placed in the middle of the rectangular room of dimension 3 × 5 m surrounded by six microphones (convex hull configuration), as depicted in Figure 4. The sound of a moving light-wheeled vehicle is played through the speaker and collected by the microphone array.


Figure 3: CRB comparison for the traveling-source scenario (R = 7): (a) localization bound, and (b) DOA bound.

Under 12 dB SNR, the speaker location can be accurately estimated (for every 0.2 second of data) with an RMS error of 73 cm using the near-field AML source localization algorithm. An RMS error of 127 cm is reported for the same data using the two-step LS method. This shows that both methods are capable of locating the source despite some minor reverberation effects.

In the outdoor experiment (next to the Xerox PARC building), three widely separated linear subarrays, each with four microphones (1 ft interelement spacing), are used. A stationary noise source (possibly air conditioning) is observed from an adjacent building. To demonstrate the effectiveness of the algorithms in handling wideband signals, a white Gaussian signal is played through the loud speaker placed at the two locations (from two independent runs) shown in Figure 5. In this case, each subarray estimates the DOA of the source independently using the AML method, and the bearing crossing (see Appendix D) from the three subarrays (labeled as A, B, and C in the figures) provides an estimate of the source location. The estimation is again performed for every 0.2 second of data. An RMS error of 32 cm is reported for the first location, and an RMS error of 97 cm is reported for the second location. Then, we apply the two-step LS DOA estimation to the same data, which involves relative time-delay estimation among the Gaussian signals.

Figure 4: AML source localization of a vehicle sound in a semianechoic chamber.

[Figure 5: Source localization of a white Gaussian signal using AML DOA cross bearing in an outdoor environment. Two independent runs; subarrays A, B, and C; sensor locations, actual source location, and source location estimates in the X-Y plane (m).]

reported for the second location. This shows that when the source signal is truly wideband, the time-delay-based techniques can yield very poor results. In other outdoor runs, the AML method was also shown to yield good results for music signals.

Then, a moving source experiment is conducted by placing the loudspeaker on a cart that moves on a straight line from the top to the bottom of Figure 7. The vehicle sound is again played through the speaker while the cart is moving. We assume that the source location is stationary within each



[Figure 6: Source localization of a white Gaussian signal using LS DOA cross bearing in an outdoor environment. Two independent runs; subarrays A, B, and C; sensor locations, actual source location, and source location estimates in the X-Y plane (m).]

[Figure 7: Source localization of a moving speaker (vehicle sound) using AML DOA cross bearing in an outdoor environment. Subarrays A, B, and C; sensor locations, source location estimates, and actual traveled path in the X-Y plane (m).]

data frame of about 0.1 second, and the DOA is estimated for each frame using the AML method. The source location is again estimated by the cross bearing of the three DOAs. As shown in Figure 7, the estimated source locations lie very close to the actual traveled path. The results using the LS method (not shown) are much worse when the source is far away.

[Figure 8: Two-source localization using AML DOA cross bearing with AP in an outdoor environment. Subarrays A and C; sources 1 and 2; sensor locations, actual source locations, and source location estimates in the X-Y plane (m).]

4.3. Two-source experimental results

In a different outdoor configuration, two linear subarrays (labeled as A and C), each consisting of four microphones, are placed at opposite sides of the road, and two omnidirectional loudspeakers are placed between them, as depicted in Figure 8. The two loudspeakers play two independent prerecorded sounds of light-wheeled vehicles of different kinds. By using the AP steps on the AML metric, the DOAs of the two sources are jointly estimated for each array under 11 dB SNR (with respect to the bottom array). Then, the cross bearing yields the location estimates of the two sources. The estimation is performed for every 0.2 second of data. An RMS error of 37 cm is observed for source 1, and an RMS error of 45 cm is observed for source 2. Note that the range estimate of the second source is slightly worse than that of the first source because the bearings from the two arrays are close to being collinear for the second source.

Another two-source localization experiment was also conducted inside the semianechoic chamber. In this setup, twelve microphones are placed in a linear manner near one of the walls. Two speakers are placed inside the room, as depicted in Figure 9. The microphones are then divided into three nonoverlapping groups (subarrays, labeled as A, B, and C), each with four elements. Each subarray performs the AML DOA estimation using AP. The cross bearing of the DOAs again provides the location estimates of the two sources. The estimation is again performed for every 0.2 second of data. An RMS error of 154 cm is observed for the first source, and an RMS error of 35 cm is observed for the second source. Since the bearing angles are not too different across the three subarrays, the source range estimate becomes poor, especially for source 1. This again suggests that



[Figure 9: Two-source localization using AML DOA cross bearing with AP in a semianechoic chamber. Subarrays A, B, and C; sources 1 and 2; sensor locations, actual source locations, and source location estimates in the X-Y plane (m).]

the geometry of the subarrays used in this experiment was far from ideal, and widely separated subarrays would have yielded better triangulation (cross bearing) results.

5. CONCLUSION

In this paper, the theoretical CRBs for source localization and DOA estimation are analyzed, and the AML source localization and DOA estimation methods are shown to be effective as applied to measured data. For the single-source case, the AML performance is shown to be superior to that of the two-step LS method for various types of signals, especially truly wideband ones. The AML algorithm is also shown to be effective in locating two sources using AP. The CRB analysis suggests the uniformly spaced circular array as the preferred array geometry for most scenarios. When a circular array is used, the DOA variance bound is independent of the source direction, and it also does not degrade when the speed of propagation is unknown. The CRB also confirms the physical observation that signals with high energy in their higher-frequency components are favorable. The sensitivity of source localization to different unknown parameters has also been analyzed. It has been shown that an unknown source signal results in a much larger range error than an unknown speed of propagation, but neither parameter is significant in DOA estimation.

APPENDICES

A. DOA ESTIMATION USING INTERPOLATION

Denote the three data points {(x1, y1), (x2, y2), (x3, y3)} as the angular samples and their corresponding AML function values, where y2 is the overall maximum and the other two are the adjacent samples. By the Lagrange interpolation polynomial formula [15], we can obtain a quadratic polynomial that interpolates the three data points. The angle (or the DOA estimate) that yields the maximum value of the quadratic polynomial is given by

x = [c1(x2 + x3) + c2(x1 + x3) + c3(x1 + x2)] / [2(c1 + c2 + c3)], (A.1)

where c1 = y1/[(x1 − x2)(x1 − x3)], c2 = y2/[(x2 − x1)(x2 − x3)], and c3 = y3/[(x3 − x1)(x3 − x2)]. The interpolation step avoids further iterations on the AML maximization.
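As an illustration, here is a minimal Python sketch of (A.1); the function name and the sample values are ours, not part of the paper.

def quadratic_peak(xs, ys):
    """Peak abscissa of the Lagrange quadratic through three angular samples
    (x1, y1), (x2, y2), (x3, y3), where y2 is the maximum of the AML metric;
    implements (A.1)."""
    (x1, x2, x3), (y1, y2, y3) = xs, ys
    c1 = y1 / ((x1 - x2) * (x1 - x3))
    c2 = y2 / ((x2 - x1) * (x2 - x3))
    c3 = y3 / ((x3 - x1) * (x3 - x2))
    return (c1 * (x2 + x3) + c2 * (x1 + x3) + c3 * (x1 + x2)) / (2 * (c1 + c2 + c3))

# Example: AML metric sampled on a 5-degree grid, peaking near 44 degrees
print(quadratic_peak((40.0, 45.0, 50.0), (0.80, 0.95, 0.70)))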

B. THE ELLIPTICAL MODEL OF DOA VARIANCE

In Section 2.2.1, we show that we can conveniently define an effective beamwidth for a uniformly spaced circular array. This gives us one measure of the beamwidth that is independent of the source direction. When we have randomly distributed arrays, the circular CRB may be a reasonable approximation if the sensors are distributed uniformly in both the X and Y directions. However, in some cases, the sensors may span more in one direction than the other. In that case, we may model the effective beamwidth using an ellipse. The direction of the major axis indicates the best DOA performance, where a small beamwidth can be defined. The direction of the minor axis indicates the poorest DOA performance, and a large beamwidth is defined in that direction. This suggests the use of a variable beamwidth as a function of angle, which is useful for the AML metric evaluation.

First, we need to determine the orientation of the ellipse for an arbitrary 2D array. Without loss of generality, we define the origin at the array centroid rc = [xc, yc]T = [0, 0]T. Let there be a total of R sensors. The location of the pth sensor is denoted as rp = [xp, yp]T in this coordinate system. Our objective is to find a rotation angle ψ from the X-axis such that the cross terms of the new sensor locations sum to zero. The major and minor axes will be the new X- and Y-axes. Denote [x′p, y′p]T as the new coordinate of the pth sensor in the rotated coordinate system. The new coordinates are related to the old coordinates by

x′p = xp cos ψ + yp sin ψ,
y′p = −xp sin ψ + yp cos ψ. (B.1)

The sum of the cross terms is then given by

∑_{p=1}^{R} x′p y′p = c1 cos ψ sin ψ + c2 (1 − 2 sin² ψ), (B.2)

where c1 = ∑_{p=1}^{R} (y²p − x²p) and c2 = ∑_{p=1}^{R} xp yp. After double-angle substitutions and some algebraic manipulation to equate the above to zero, we obtain the solution

ψ = −(1/2) tan⁻¹(2c2/c1) + (π/2) ℓ, (B.3)



for ℓ = 0 and 1; that is, two solutions exist that differ by 90 degrees.

We have shown that, for a circular array, the DOA variance bound is given by 1/(ζα), where α = ρ²R/2. For an ellipse with center at the origin, the corresponding α = ∑_{p=1}^{R} b²p = cos² φs ∑_{p=1}^{R} (x′p)² + sin² φs ∑_{p=1}^{R} (y′p)². Note that the cross terms become zero in this case. To put the above in a form similar to that of the circular array, we can write α = R[Vx cos² φs + Vy sin² φs], where Vx = (1/R) ∑_{p=1}^{R} (x′p)² and Vy = (1/R) ∑_{p=1}^{R} (y′p)². Note that at the major or the minor axis, the source angles are either 0 or 90 degrees. This means that α = RVx or RVy, depending on which axis is the major or minor axis. Define ρx = √(2Vx) and ρy = √(2Vy). These two values can be used to determine the largest and the smallest beamwidth for the ellipse, that is, φBW,x ≡ v/(πρx knrwms) and φBW,y ≡ v/(πρy knrwms).
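A minimal Python sketch of this construction, assuming sensor coordinates are given relative to the array centroid; the function name, and the use of a quadrant-aware arctangent in place of tan⁻¹ to handle c1 = 0, are our choices.

import numpy as np

def ellipse_axes(xy):
    """Orientation psi (B.3) and axis scales rho_x, rho_y of the elliptical
    beamwidth model, from (R, 2) sensor coordinates relative to the centroid."""
    x, y = xy[:, 0], xy[:, 1]
    c1 = np.sum(y**2 - x**2)
    c2 = np.sum(x * y)
    psi = -0.5 * np.arctan2(2 * c2, c1)       # one of the two solutions of (B.3)
    xr = x * np.cos(psi) + y * np.sin(psi)    # rotated coordinates (B.1)
    yr = -x * np.sin(psi) + y * np.cos(psi)
    Vx, Vy = np.mean(xr**2), np.mean(yr**2)
    return psi, np.sqrt(2 * Vx), np.sqrt(2 * Vy)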

C. CIRCULAR ARRAY CRB APPROXIMATIONS

The approximations used in the near-field circular array CRB involve several steps, including the approximations for A and Z_{S0}. The array matrix A defined in (11) can be given explicitly by

A ≈ a² ∑_{p=1}^{R} up upT
  = (Ra²/r²s) [ (1 − ρ²/r²s + O(ρ³/r³s)) rs rsT + (ρ²/2)(1 + 2ρ²/r²s + O(ρ³/r³s)) I ], (C.1)

where uniform gain a is assumed, and a power series expansion for R > 3, preserving only the second order, is used to obtain the final expression. Similarly, the penalty matrix Z_{S0} can be approximated by

Z_{S0} ≈ (Ra²/r²s) (1 − ρ²/(2r²s) + O(ρ³/r³s)) rs rsT, (C.2)

which also uses the power series expansion preserving only the second order. After some simplifications, the difference matrix can be given by

A − Z_{S0} ≈ (Ra²/r²s) ( (ρ²/2) I − (ρ²/(2r²s)) rs rsT + O(ρ³/r³s) rs rsT )
           = (Ra²ρ²/(2r²s)) [ O(ρ/rs)  0 ;  0  1 ], (C.3)

where rs rsT / r²s = [ 1  0 ;  0  0 ] for this coordinate system. Hence, the final approximation for the inverse Fisher information matrix is given by

(A − Z_{S0})⁻¹ ≈ (2r²s/(ζRa²ρ²)) [ O(rs/ρ)  0 ;  0  1 ]. (C.4)

D. SOURCE LOCALIZATION VIA BEARING CROSSING

When two or more subarrays simultaneously detect the same source, the crossing of the bearing lines can be used to estimate the source location. This step is often called triangulation. Without loss of generality, let the centroid of the first subarray be the origin of the coordinate system. Denote rck = [xck, yck]T as the centroid position of the kth subarray, for k = 1, ..., K. Denote φk as the DOA estimate (with respect to north) of the kth subarray. Then, the following system of linear equations yields the bearing crossing solution:

[ cos(φ1)  −sin(φ1) ;  ⋮  ;  cos(φK)  −sin(φK) ] [ xs ; ys ]
  = [ xc1 cos(φ1) − yc1 sin(φ1) ;  ⋮  ;  xcK cos(φK) − ycK sin(φK) ]. (D.1)

Note that the source location [xs, ys]T is defined in the coordinate system with respect to the centroid of the first subarray.
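A minimal Python sketch of this least-squares bearing crossing, assuming DOAs measured clockwise from north as in (D.1); the function name and the test geometry are ours.

import numpy as np

def bearing_crossing(centroids, bearings_deg):
    """Least-squares triangulation from subarray DOAs, solving (D.1).
    centroids: (K, 2) array of subarray centroid positions [x_ck, y_ck].
    bearings_deg: length-K DOA estimates, measured clockwise from north."""
    phi = np.radians(np.asarray(bearings_deg))
    A = np.column_stack([np.cos(phi), -np.sin(phi)])
    b = centroids[:, 0] * np.cos(phi) - centroids[:, 1] * np.sin(phi)
    xs, *_ = np.linalg.lstsq(A, b, rcond=None)
    return xs

# Example with three hypothetical subarrays observing a source at (5, 5)
c = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
truth = np.array([5.0, 5.0])
phi = np.degrees(np.arctan2(truth[0] - c[:, 0], truth[1] - c[:, 1]))  # from north
print(bearing_crossing(c, phi))  # approximately [5. 5.]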

ACKNOWLEDGMENTS

This work was partially supported by DARPA-ITO under Contract N66001-00-1-8937. The authors wish to thank J. Reich, P. Cheung, and F. Zhao of Xerox PARC for planning and conducting the experiments presented in this paper.

REFERENCES

[1] J. C. Chen, K. Yao, and R. E. Hudson, "Source localization and beamforming," IEEE Signal Processing Magazine, vol. 19, no. 2, pp. 30–39, 2002.

[2] M. S. Brandstein and D. Ward, Microphone Arrays: Techniques and Applications, Springer-Verlag, Berlin, Germany, September 2001.

[3] J. O. Smith and J. S. Abel, "Closed-form least-squares source location estimation from range-difference measurements," IEEE Trans. Acoustics, Speech, and Signal Processing, vol. 35, no. 12, pp. 1661–1669, 1987.

[4] H. C. Schau and A. Z. Robinson, "Passive source localization employing intersecting spherical surfaces from time-of-arrival differences," IEEE Trans. Acoustics, Speech, and Signal Processing, vol. 35, no. 8, pp. 1223–1225, 1987.

[5] Y. T. Chan and K. C. Ho, "A simple and efficient estimator for hyperbolic location," IEEE Trans. Signal Processing, vol. 42, no. 8, pp. 1905–1915, 1994.

[6] M. S. Brandstein, J. E. Adcock, and H. F. Silverman, "A closed-form location estimator for use with room environment microphone arrays," IEEE Trans. Speech and Audio Processing, vol. 5, no. 1, pp. 45–50, 1997.

[7] K. Yao, R. E. Hudson, C. W. Reed, D. Chen, and F. Lorenzelli, "Blind beamforming on a randomly distributed sensor array system," IEEE Journal on Selected Areas in Communications, vol. 16, no. 8, pp. 1555–1567, 1998.

[8] J. C. Chen, R. E. Hudson, and K. Yao, "Maximum-likelihood source localization and unknown sensor location estimation for wideband signals in the near-field," IEEE Trans. Signal Processing, vol. 50, no. 8, pp. 1843–1854, 2002.

[9] P. J. Chung, M. L. Jost, and J. F. Böhme, "Estimation of seismic-wave parameters and signal detection using maximum-likelihood methods," Computers and Geosciences, vol. 27, no. 2, pp. 147–156, 2001.

[10] S. M. Kay, Fundamentals of Statistical Signal Processing: Estimation Theory, Vol. 1, Prentice-Hall, NJ, USA, 1993.

[11] T. Kailath, A. H. Sayed, and B. Hassibi, Linear Estimation, Prentice-Hall, NJ, USA, 2000.

[12] D. H. Johnson and D. E. Dudgeon, Array Signal Processing, Prentice-Hall, NJ, USA, 1993.

[13] J. A. Nelder and R. Mead, "A simplex method for function minimization," Computer Journal, vol. 7, pp. 308–313, 1965.

[14] I. Ziskind and M. Wax, "Maximum likelihood localization of multiple sources by alternating projection," IEEE Trans. Acoustics, Speech, and Signal Processing, vol. 36, no. 10, pp. 1553–1560, 1988.

[15] R. L. Burden and J. D. Faires, Numerical Analysis, PWS Publishing, Boston, Mass, USA, 5th edition, 1993.

Joe C. Chen was born in Taipei, Taiwan, in 1975. He received the B.S. (with honors), M.S., and Ph.D. degrees in electrical engineering from the University of California, Los Angeles (UCLA), in 1997, 1998, and 2002, respectively. From 1997 to 2002, he was with the Sensors and Electronics Systems group of Raytheon Systems Company (formerly Hughes Aircraft), El Segundo, Calif. From 1998 to 2002, he was a Research Assistant at UCLA, and from 2001 to 2002, he was a Teaching Assistant at UCLA. In 2002, he joined TRW Space & Electronics, Redondo Beach, Calif, as a Senior Member of the Technical Staff. His research interests include estimation theory and statistical signal processing as applied to sensor array systems, communication systems, and radar. Dr. Chen is a member of the Tau Beta Pi and Eta Kappa Nu honor societies and the IEEE.

Kung Yao received the B.S.E., M.A., and Ph.D. degrees in electrical engineering from Princeton University, Princeton, NJ. He has worked at the Princeton-Penn Accelerator, the Brookhaven National Lab, and the Bell Telephone Labs, Murray Hill, NJ. He was a NAS-NRC Postdoctoral Research Fellow at the University of California, Berkeley. He was a Visiting Assistant Professor at the Massachusetts Institute of Technology and a Visiting Associate Professor at the Eindhoven Technical University. In 1985–1988, he was an Assistant Dean of the School of Engineering and Applied Science at UCLA. Presently, he is a Professor in the Electrical Engineering Department at UCLA. His research interests include sensor array systems, digital communication theory and systems, wireless radio systems, chaos communications and system theory, and digital and array signal processing. He has published more than 250 papers. He received the IEEE Signal Processing Society's 1993 Senior Award in VLSI Signal Processing. He is the coeditor of High Performance VLSI Signal Processing (IEEE Press, 1997). He was on the IEEE Information Theory Society's Board of Governors and is a member of the Signal Processing Systems Technical Committee of the IEEE Signal Processing Society. He has been on the editorial boards of various IEEE Transactions, most recently IEEE Communications Letters. He is a Fellow of the IEEE.

Ralph E. Hudson received his B.S. degree in electrical engineering from the University of California at Berkeley in 1960 and the Ph.D. degree from the US Naval Postgraduate School, Monterey, Calif, in 1969. In the US Navy, he attained the rank of Lieutenant Commander and served with the Office of Naval Research and the Naval Air Systems Command. From 1973 to 1993, he was with Hughes Aircraft Company, and since then he has been a Research Associate in the Electrical Engineering Department at the University of California at Los Angeles. His research interests include signal, acoustic, and seismic array processing, wireless radio, and radar systems. He received the Legion of Merit and Air Medal, and the Hyland Patent Award in 1992.


EURASIP Journal on Applied Signal Processing 2003:4, 371–377
© 2003 Hindawi Publishing Corporation

Dynamic Agent Classification and Tracking Using an Ad Hoc Mobile Acoustic Sensor Network

David Friedlander
Applied Research Laboratory, The Pennsylvania State University, P.O. Box 30, State College, PA 16801-0030, USA
Email: [email protected]

Christopher Griffin
Applied Research Laboratory, The Pennsylvania State University, P.O. Box 30, State College, PA 16801-0030, USA
Email: [email protected]

Noah Jacobson
Applied Research Laboratory, The Pennsylvania State University, P.O. Box 30, State College, PA 16801-0030, USA
Email: [email protected]

Shashi Phoha
Applied Research Laboratory, The Pennsylvania State University, P.O. Box 30, State College, PA 16801-0030, USA
Email: [email protected]

Richard R. Brooks
Applied Research Laboratory, The Pennsylvania State University, P.O. Box 30, State College, PA 16801-0030, USA
Email: [email protected]

Received 12 December 2001 and in revised form 5 October 2002

Autonomous networks of sensor platforms can be designed to interact in dynamic and noisy environments to determine the occurrence of specified transient events that define the dynamic process of interest. For example, a sensor network may be used for battlefield surveillance with the purpose of detecting, identifying, and tracking enemy activity. When the number of nodes is large, human oversight and control of low-level operations is not feasible. Coordination and self-organization of multiple autonomous nodes is necessary to maintain connectivity and sensor coverage and to combine information for better understanding the dynamics of the environment. Resource conservation requires adaptive clustering in the vicinity of the event. This paper presents methods for dynamic distributed signal processing using an ad hoc mobile network of microsensors to detect, identify, and track targets in noisy environments. They seamlessly integrate data from fixed and mobile platforms and dynamically organize platforms into clusters to process local data along the trajectory of the targets. Local analysis of sensor data is used to determine a set of target attribute values and classify the target. Sensor data from a field test at the Marine base at Twentynine Palms, Calif, was analyzed using the techniques described in this paper. The results were compared to "ground truth" data obtained from GPS receivers on the vehicles.

Keywords and phrases: sensor networks, distributed computing, target tracking, target identification, self-organizing systems.

1. INTRODUCTION

Distributed sensing systems combine observations from a large-area network of sensors, creating the need for platform self-organization and the sharing of sensor information between platforms. It is difficult to integrate the data from each sensor into a single context for the entire network. Instead, groups of sensors in local areas collaborate to produce useful information for the end user.

Our objective is to create a distributed wireless network of sensors covering large areas to obtain an accurate representation of dynamic processes occurring within the region. Such networks are subject to severe bandwidth limitations and power constraints. Additionally, we need to integrate data from heterogeneous sensors.

Our goals are met through algorithms that determine the characteristics of the target from local sensor data. They dynamically cluster platforms into space-time neighborhoods



and exchange target information within neighborhoods to determine target class and track characteristics. This differs from other methods of decentralized detection such as [1, 2], where the dimensionality of the sensor data vectors is reduced to the distinct number of target attributes. Once organized into clusters, sensors can combine their local knowledge to construct a representation of the world around them. This information can be used to construct a history of the dynamic process as it occurs in the sensor field [3].

Our analysis is based on the concepts of a space-time neighborhood, a dynamic window, and an event. A space-time neighborhood centered on the space-time point (x0, t0) is the set of space-time points

N(x0, t0) ≡ {(x, t) : |x − x0| ≤ ∆x, |t − t0| ≤ ∆t}. (1)

The quantities ∆x and ∆t define the size of the neighborhood. The space-time window contains all the data that was measured within a distance ∆x around x0 and within the time interval t0 ± ∆t.

We can define a dynamic window around a moving point g(t) as

ω(t0) = {(x, t) : |x − g(t0)| ≤ ∆x, |t − t0| ≤ ∆t}. (2)

Ideally, if g(t) were the trajectory of the target, we would analyze time-series data from sensors in the window Ne = ω(te) to determine information about the target at time te.
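As a small illustration, a membership test for these windows could be coded as follows (our sketch of (1) and (2), reading |·| as the Euclidean norm):

import numpy as np

def in_window(x, t, x0, t0, dx, dt):
    """True when the space-time point (x, t) lies in the neighborhood (1)
    of (x0, t0); with x0 = g(t0), this is the dynamic window (2)."""
    return np.linalg.norm(np.asarray(x) - np.asarray(x0)) <= dx and abs(t - t0) <= dt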

The target trajectory g(t) is unknown. It is, in fact, what we want to determine. We therefore look at closest-point-of-approach (CPA) events that occur within a single space-time neighborhood. A CPA event ei j is defined for platform i occurring at the CPA time t j. The space-time coordinates of the event are (xi(t j), t j), where xi(t) is the trajectory of platform i.

We make the assumption that sensor energy increases as distance from the source decreases. This is a reasonable assumption for acoustic and seismic sensors. The CPA event is therefore assumed to occur when there is a peak in sensor energy. The amplitude of the event ai j is defined as the amplitude of the corresponding peak. In order to filter out noise, reflection, or other spurious features, we count only peaks above a threshold and do not allow two events on a single platform within the same space-time window. If data from multiple sensors are available, they must be integrated to determine a single peak time for the event.
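A minimal Python sketch of such a peak detector on a sampled energy series; the threshold and the minimum inter-event interval dt are our parameter names, not the paper's.

def detect_cpa_events(energy, times, threshold, dt):
    """Single-platform CPA detection: local maxima of the sensor energy
    series above 'threshold', with at most one event per interval dt."""
    events = []
    for i in range(1, len(energy) - 1):
        if energy[i] >= energy[i - 1] and energy[i] >= energy[i + 1] \
                and energy[i] > threshold:
            if not events or times[i] - events[-1][0] > dt:
                events.append((times[i], energy[i]))  # (CPA time, amplitude)
    return events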

For an event ei j, we analyze data from platforms in the neighborhood N(xi(t j), t j). We define the set of platforms that contain events in this space-time neighborhood as the cluster of platforms associated with event ei j. These definitions apply to both stationary and moving platforms and seamlessly integrate both types. They can be used to determine target velocity as long as the platform trajectories are known and the platform speed is small compared to the propagation speed of the energy field measured by the sensors. Platform locations can be determined by GPS and, for stationary platforms, additional accuracy can be achieved by integrating GPS signals over time.

[Figure 1: System overview. Sensor data feeds a CPA detector; detected CPA events are stored in a local CPA buffer and broadcast to neighbors, while received neighboring CPA events fill a second buffer. The form clusters routine combines both buffers into CPA event clusters, and the process clusters routine turns each cluster into a target event.]

The sets of parameters needed to identify targets are called target events. They include xi: the target position; ti: the time; vi: the target velocity; and {a1, ..., an}: a set of target attributes for target classification, which can be determined from the sensor data in a region around the space-time point (xi, ti). A CPA event is detected by a platform when the target reaches its CPA to the platform. Each CPA will correspond to a peak in the readings of our acoustic sensors. We have developed an algorithm that limits data processing to the platforms closest to the trajectory of the target rather than processing each CPA event. It evenly spreads the processing out over the space-time range of the target trajectory. All the platforms within the neighborhood of an event are assumed to be capable of communicating with each other.

The remainder of this paper is divided as follows. Section 2 discusses the algorithm for platform clustering. Section 3 discusses our velocity and position estimation algorithm. Section 4 discusses our approach to target identification. Section 5 provides both simulated and real-world experimental data showing that our approach produces promising results for velocity approximation and target recognition. Finally, Section 6 discusses our conclusions.

2. ALGORITHM FOR EVENT CLUSTERING

Nodes located within a given space-time window can form a cluster. Both the time and spatial extent of the window are currently held constant. The maximum possible spatial size of the window is constrained by the transmission range of the sensors. Each node contains a buffer for its own CPA events and a buffer for CPA events transmitted by its neighbors. Figure 1 shows a simple diagram depicting the system running in parallel on each platform.

The CPA detector looks for peaks in sensor energy as described in Section 1. When it finds one, it stores the amplitude, time, and platform position in a buffer, and broadcasts the same information to its neighbors. When it receives neighboring CPA events, it stores them in another buffer. The form clusters routine looks at both CPA event buffers and forms event clusters as shown in Figure 1. The process



For each local CPA event ki j = k(xi, t j)
    For each neighboring CPA event nkl = n(xl, tk)
        If nkl is in the neighborhood Ni j = N(xi, t j)
            Add nkl to the event set M
    If the local peak amplitude a(ki j) ≥ a(nkl) ∀ nkl ∈ M
        Emit CPA event cluster F ≡ ki j ∪ M

Algorithm 1: Form clusters pseudocode.
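A runnable Python rendering of this routine, assuming a rectangular space-time window as in (1); the event record type and field names are ours.

from dataclasses import dataclass

@dataclass
class CPAEvent:
    x: float      # platform position at CPA time
    y: float
    t: float      # CPA time
    amp: float    # peak amplitude

def form_clusters(local_events, neighbor_events, dx, dt):
    """Each platform emits a cluster only for local events whose peak
    amplitude dominates all neighboring events in the window, so exactly
    one platform processes each target event."""
    clusters = []
    for k in local_events:
        M = [n for n in neighbor_events
             if abs(n.x - k.x) <= dx and abs(n.y - k.y) <= dx
             and abs(n.t - k.t) <= dt]
        if all(k.amp >= n.amp for n in M):
            clusters.append([k] + M)
    return clusters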

clusters routine determines the target position and velocity as described in Section 3 and the target attributes as described in Section 4.

3. VELOCITY AND POSITION ESTIMATION ALGORITHM

Models of human perception of motion may be based on the spatio-temporal distribution of energy detected through vision [4, 5]. Similarly, the network detects motion through the spatio-temporal distribution of sensor energy.

We extend techniques found in [6] and adapt them to find accurate vehicle velocity estimates from acoustic sensor signals. The definitions shown below are for time and two spatial dimensions x = (x, y); however, their extension to three spatial dimensions is straightforward.

The platform location data from the CPA event cluster can be organized into the following sets of observations:

(x0, 0), (x1, t1), ..., (xn, tn),
(y0, 0), (y1, t1), ..., (yn, tn), (3)

where (x0, y0) is the location of event ki j (see Figure 1), which contains the largest-amplitude CPA peak in the cluster. We redefine the times in the observations so that t0 = 0, where t0 is the time of CPA event ki j.

We weighted the observations based on the CPA peak amplitudes, on the assumption that CPA times are more accurate when the target passes closer to the sensor, to give

(x0, t0, w0), (x1, t1, w1), ..., (xn, tn, wn),
(y0, t0, w0), (y1, t1, w1), ..., (yn, tn, wn), (4)

where wi is the weight of the ith event in the cluster. This greatly improved the quality of the predicted velocities. We defined the spatial extent of the neighborhoods so that the nodes of a cluster span no more than a few square meters, over which vehicle velocities are approximately linear [6]. Under these assumptions, we can apply least-squares linear regression to obtain the following equations [7]:

x(t) = vxt + c1, y(t) = vyt + c2, (5)

Input: Time-sorted event cluster F of CPA values.
Output: Estimated velocity components vx and vy.
While |F| ≥ 5 {
    Compute vx and vy using event cluster F;
    Compute rx and ry, the vx and vy velocity correlation coefficients for F;
    If rx > Rx || ry > Ry {
        Rx = rx; Ry = ry;
        vx_stored = vx; vy_stored = vy;
    }
    PopBack(F);
}

Algorithm 2

where

vx = [(∑i wi ti)(∑i wi xi) − (∑i wi)(∑i wi xi ti)] / [(∑i wi ti)² − (∑i wi)(∑i wi ti²)],
vy = [(∑i wi ti)(∑i wi yi) − (∑i wi)(∑i wi yi ti)] / [(∑i wi ti)² − (∑i wi)(∑i wi ti²)], (6)

and the position is x(t0) = (c1, c2). The space-time coordinates of the target for this event are (x(t0), t0).

This simple technique can be augmented to ensure that changes in the vehicle trajectory do not degrade the quality of the estimated track. The correlation coefficients for the velocities in each spatial dimension (rx, ry) can be used to identify large changes in vehicle direction and thus limit the CPA event cluster to include only those nodes that will best estimate local velocity. Assume that the observations are sorted as follows:

Oi < Oj ⟶ |ti − t0| < |t j − t0|, (7)

where Oi is an observation containing a time, location, and weight, and t0 is the time of the event ki j. The velocity elements are computed once with the entire event set. After this, the final elements of the list are removed and the velocity is recomputed. This process is repeated while at least five CPAs are present in the set, and subsequently the event subset with the highest velocity correlation is used to determine velocity. Fewer than five CPA points could severely bias the computed velocity and thus render our approximation useless. Algorithm 2 summarizes our technique.
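The following Python sketch combines (6) with the pruning loop of Algorithm 2. Reading the sums in (6) as fully weighted least squares is our interpretation of the weighting described above, and the plain Pearson correlation coefficient stands in for the paper's rx and ry.

import numpy as np

def weighted_velocity(t, x, w):
    """Weighted least-squares slope of x against t; one component of (6)."""
    St, Sx, Sw = np.sum(w * t), np.sum(w * x), np.sum(w)
    Sxt, Stt = np.sum(w * x * t), np.sum(w * t * t)
    return (St * Sx - Sw * Sxt) / (St**2 - Sw * Stt)

def estimate_velocity(events, min_points=5):
    """Pruning loop of Algorithm 2. 'events' is a list of (t, x, y, w)
    tuples sorted by |t - t0| as in (7)."""
    F = list(events)
    best, Rx, Ry = None, -np.inf, -np.inf
    while len(F) >= min_points:
        t, x, y, w = (np.array(c, dtype=float) for c in zip(*F))
        vx, vy = weighted_velocity(t, x, w), weighted_velocity(t, y, w)
        rx, ry = abs(np.corrcoef(t, x)[0, 1]), abs(np.corrcoef(t, y)[0, 1])
        if rx > Rx or ry > Ry:
            Rx, Ry, best = rx, ry, (vx, vy)
        F.pop()  # PopBack: drop the observation farthest in time
    return best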

4. TARGET CLASSIFICATION

The sounds a vehicle produces are a combination of the acoustic features of its components: its acoustic "fingerprints." We have developed an algorithm to identify the presence or absence of given features in a target vehicle traveling through a sensor network. Once the vehicle type is



[Figure 2: Time series window. A typical acoustic sample; two vertical lines mark the boundaries of the window around the CPA peak.]

determined, it is combined with velocity and position data and broadcast over the network as a target event. This requires much less bandwidth than transmitting the original time series data.

The singular value decomposition (SVD) [8] is a matrix decomposition that can be used to find relationships within sets of data. When used to construct relationships between words and documents, this technique is called latent semantic analysis (LSA). There is significant evidence that LSA can be used to allow machines to learn words at a rate comparable to that of school children [9]. LSA accomplishes this by using the SVD to infer relationships among members of a data set. We believe that this concept can be applied to vehicle identification.

Our identification algorithm combines latent semantic analysis [9] with principal component analysis [10, 11] to fuse semantic attributes and sensor data for target classification. There are two algorithms: data processing and data classification. CPA event data are divided into training and test sets. The training data are used with the data processing algorithm, and the test data are used with the data classification algorithm to evaluate the accuracy of the method.

The training set is further divided into databases for each possible value of each target attribute being used in the classification. Target attribute values can be used to construct feature vectors for use in pattern classification. Alternatively, we can define "vehicle type" as a single attribute and identify the target directly.

A 4- to 5-second window is selected around the peak of each sample. All data outside the window is discarded. This ensures that noise bias is reduced. The two long vertical lines in Figure 2 show what the boundaries of the window would be on a typical sample.

The window corresponds to the period of time when a vehicle was closest to the platform. The data are divided into consecutive frames. A frame is 512 data points sampled at 5 kHz (0.5 seconds in length) and has a 12.5% overlap (0.07 second) with each of its neighbors. The power spectral density of each frame is found and stored as a column vector of 513 data points (grouped by originating sample) with data

[Figure 3: Isolating qualities in the feature space. An unknown vector is projected onto the subspace spanned by a database's features; the residual is its distance from that subspace.]

Table 1: Quality of estimation.

Computed versus true velocity    Percent
Within 1 m/s                     81%
Within 2 m/s                     91%
Within 5 degrees                 64%
Within 11 degrees                80%
Within 17 degrees                86%

points corresponding to frequencies from 0 to 512 Hz.

Target identification combines techniques from [11] and makes use of an eigenvalue analysis to give an indication of the distance of an unknown sample vector from the feature space of each database. This indication is called a residual. These residuals can be interpreted as "a measurement of the likelihood" that the frame being tested belongs to the class of vehicles represented by the database [11]. The databases are grouped by attribute, and the residuals of each frame within each group are compared. The attribute value corresponding to the smallest total of the residuals within each group is assigned to the frame. Figure 3 illustrates this process.
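A minimal Python sketch of this residual classifier, assuming the per-database feature subspace is obtained by PCA (SVD of mean-centered PSD frames); the rank parameter and the function names are ours.

import numpy as np

def train_subspace(frames, rank):
    """PCA basis for one attribute-value database. 'frames' is an
    (n_frames, n_bins) array of PSD vectors; returns the mean frame and an
    orthonormal basis of the dominant 'rank'-dimensional subspace."""
    mean = frames.mean(axis=0)
    _, _, vt = np.linalg.svd(frames - mean, full_matrices=False)
    return mean, vt[:rank].T

def residual(psd, mean, basis):
    """Distance of an unknown PSD frame from a database's feature subspace."""
    d = psd - mean
    return np.linalg.norm(d - basis @ (basis.T @ d))

def classify(frames, databases):
    """Assign the attribute value whose database yields the smallest total
    residual over the frames of the test sample."""
    totals = {label: sum(residual(f, mean, basis) for f in frames)
              for label, (mean, basis) in databases.items()}
    return min(totals, key=totals.get)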

5. EXPERIMENTAL RESULTS

We present two sets of results. Each demonstrates the quality of our techniques for estimating vehicle velocity in a distributed sensor field and identifying target characteristics. The results come from data collected at Twentynine Palms Marine Base during a field test and from ideal data constructed in the lab for testing the velocity estimation algorithm.

5.1. Velocity estimation

We present a verification of our clustering and velocity estimation algorithms using data gathered at Twentynine Palms Marine Base, located in California. A sensor grid was tested there in August 2000.

We have analyzed the quality of our velocity estimation algorithm using our field data, and these results appear in Table 1.



Table 2: Classification.

Actual vehicle    AAV    DW    HV    Percent correctly classified
AAV               117      4     7   94%
DW                  0    106     2   98%
HV                  0      7   117   94%

[Figure 4: Computed speed versus true speed (field test). Computed speed values (m/s) plotted against real speed values (m/s).]

Figures 4 and 5 display the quality of the estimates.

We have also generated a simulated data set for testing our velocity algorithm. The data set was generated using a parabolic vehicle motion. Figure 6 shows activated sensors as the simulated vehicle passed through a dense grid of pseudorandomly distributed sensor platforms. Figure 7 displays the results of our algorithm for vehicle speed.

The calculated vehicle speeds yielded a correlation of 0.99 against the line y = 0.99x, where y is the calculated speed and x is the simulated speed. The angle match is also extremely close.

5.2. Target identification verification

ARL evaluated its classification algorithms against the data collected during the field test. Data are shown for three types of military vehicles, labeled AAV, DW, and HV. The CPA peaks were selected by hand rather than automatically detected by the software, and there was only a single vehicle present in the network at a time. Environmental noise due to wind was significant. The data show that classification of military vehicles in the field can be accurate under noisy conditions, as shown in Table 2.

6. CONCLUSIONS

We have derived algorithms for target analysis that can identify target attributes using time-series data from platform sensors.

We have described an effective algorithm for computing target velocity. This velocity is critical for track formation

[Figure 5: Computed angle versus true angle (field test). Computed angle error (radians) against measured angle (radians); 89% correct within 7 degrees.]

[Figure 6: Simulated sensor node layout. Activated sensor positions in the X-Y plane (arbitrary units).]

[Figure 7: Computed speed versus true speed (simulation), in arbitrary units.]

algorithms like those proposed in [3]. We have described an algorithm for accurate classification of military vehicles in the field.

We have also provided experimental verification of our procedures against field data using military vehicles and acoustic sensors. We have determined quantitative measures of the accuracy of the procedures.

Dense sensor networks over large areas contain massive amounts of computing power in total, but may be restricted



in bandwidth and power consumption at individual nodes. Forming dynamic clusters around events of interest allows processing multiple events in parallel over different local geographic areas. We have shown how networks can coordinate platforms around tracks and provide relevant processing with a minimum of bandwidth and power consumption related to interplatform communications. This procedure is scalable and takes full advantage of the parallelism in the network. The same algorithms run in parallel on each platform, making the procedure robust with respect to the loss of individual platforms. In addition, our method allows seamless integration of fixed and mobile heterogeneous platforms.

ACKNOWLEDGMENTS

This material is based upon work supported by the US Army Robert Morris Acquisition under Award No. DAAD19-01-1-0504. Any opinions, findings, and conclusions or recommendations expressed in this paper are those of the authors and do not necessarily reflect the views of the Army.

REFERENCES

[1] B. Picinbono and M. P. Boyer, "A new approach of decentralized detection," in Proc. International Conference on Acoustics, Speech, and Signal Processing, vol. 2, pp. 1329–1332, 1991.

[2] R. R. Tenney and N. R. Sandell Jr., "Detection with distributed sensors," IEEE Trans. Aerospace and Electronic Systems, vol. 17, pp. 501–510, July 1981.

[3] R. Brooks, C. Griffin, and D. S. Friedlander, "Self-organized distributed sensor network entity tracking," International Journal of High Performance Computing Applications, vol. 16, no. 3, pp. 207–219, 2002, Special Issue on Sensor Networks.

[4] E. H. Adelson and J. R. Bergen, "Spatiotemporal energy models for the perception of motion," Journal of the Optical Society of America A, vol. 2, no. 2, pp. 284–299, 1985.

[5] E. H. Adelson, "Mechanisms for motion perception," Optics and Photonics News, vol. 2, no. 8, pp. 24–30, 1991.

[6] M. Hellebrant, R. Mathar, and M. Scheibenbogen, "Estimating position and velocity of mobiles in a cellular radio network," IEEE Trans. Vehicular Technology, vol. 46, no. 1, pp. 65–71, 1997.

[7] W. H. Press, S. Teukolsky, W. Vetterling, and B. Flannery, Numerical Recipes in C, Cambridge University Press, Cambridge, UK, 1992.

[8] I. T. Jolliffe, Principal Component Analysis, Springer-Verlag, New York, NY, USA, 1986.

[9] T. K. Landauer and S. T. Dumais, "A solution to Plato's problem: the latent semantic analysis theory of acquisition, induction, and representation of knowledge," Psychological Review, vol. 104, no. 2, pp. 211–240, 1997.

[10] V. Bhatnagar, A. Shaw, and R. Williams, "Improved automatic target recognition using singular value decomposition," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing, Seattle, Wash, USA, 1998.

[11] H. Wu, M. Siegel, and P. Khosla, "Vehicle sound signature recognition by frequency vector principal component analysis," in Proc. IEEE Instrumentation and Measurement Technology Conference, St. Paul, Minn, USA, May 1998.

David Friedlander is a Senior Research Engineer and Head of the Informatics Department of the Information Science and Technology Division of the Applied Research Laboratory at the Pennsylvania State University. His research includes formal languages, discrete-event control applied to command and control of military operations, and logistics for major industrial operations. He played a key role in developing and analyzing discrete-event control systems for the command and control of air campaigns, including the development of methods for analyzing the formal languages associated with finite state machines. He coauthored The Scheduling of Rail at Union Pacific Railroad, which won the Innovative Applications in Artificial Intelligence Award at the American Association for Artificial Intelligence in 1997. He researched methods for automating the development of lexical knowledge bases, including the use of latent semantic indexing (LSI) for automatically indexing an email corpus, and the use of hierarchical clustering of LSI indices for discovering conceptual relationships between the intents of the email messages. He received the B.A. degree in physics and mathematics from New York University and the M.A. degree in physics from Harvard University.

Christopher Griffin graduated with high distinction from the Pennsylvania State University in December of 2000 with a B.S. degree in mathematics. He is currently employed as an Assistant Research Engineer at the Pennsylvania State Applied Research Laboratory, where his areas of research include high-level logical control, automated control systems, and systems modeling. Mr. Griffin is currently pursuing his master's degree in mathematics at the Pennsylvania State University.

Noah Jacobson is an undergraduate at the Pennsylvania State University, working towards majors in mathematics and computer engineering. He is doing research on acoustic sensor networks for vehicle tracking at the Information Science and Technology Division of the Pennsylvania State Applied Research Laboratory. After receiving his B.S. degree, Mr. Jacobson plans to go on to graduate school, where he intends to earn a Ph.D. in computer vision.

Shashi Phoha is Professor of electrical engineering and Director of the Information Science and Technology Division of the Applied Research Laboratory at the Pennsylvania State University. She has led multiorganizational advanced research programs and laboratories in major US industrial and academic institutions. She pioneered the use of formal methods for the scientific analysis of distributed information for decision support, multistage coordination, and intelligent control of complex dynamic systems. She formulated the concept of information-based fault prognosis and maintenance planning over the National Information Infrastructure derived from online physics-based analysis of emerging damage. She has established



in situ analysis of correlated time-series data collected by a self-organizing sensor network of undersea robotic vehicles. She is the Principal Investigator for the Surveillance Sensor Networks MURI funded by DARPA, and the Project Director of the Complex Systems Failures MURI funded by the ARO. Dr. Phoha received her M.S. degree in 1973 from Cornell University and her Ph.D. degree in 1976 from Michigan State University. She is an Associate Editor of IEEE Transactions on Systems, Man, and Cybernetics. Dr. Phoha chaired the Springer-Verlag Technical Advisory Board for the Dictionary of Internet Security, published in May 2002.

Richard R. Brooks is the Head of the Distributed Systems Department of the Applied Research Laboratory, the Pennsylvania State University. His areas of research expertise include sensor networks, critical infrastructure protection, mobile code, and emergent behaviors. He has a B.A. degree in mathematical sciences from the Johns Hopkins University and performed graduate studies in computer science and operations research at the Conservatoire National des Arts et Métiers in Paris, France. Dr. Brooks received his Ph.D. degree in computer science from Louisiana State University in 1996. His work experience includes being Manager of Systems and Applications Programming for Radio Free Europe/Radio Liberty in Munich, Germany. The consulting tasks Dr. Brooks has performed include the implementation of a stock trading network for the French stock exchange authority and the expansion of the World Bank's internal computer network to Africa and the former Soviet Union.


EURASIP Journal on Applied Signal Processing 2003:4, 378–391
© 2003 Hindawi Publishing Corporation

Collaborative In-Network Processing for Target Tracking

Juan Liu
Palo Alto Research Center, 3333 Coyote Hill Road, Palo Alto, CA 94304, USA
Email: [email protected]

James Reich
Palo Alto Research Center, 3333 Coyote Hill Road, Palo Alto, CA 94304, USA
Email: [email protected]

Feng Zhao
Palo Alto Research Center, 3333 Coyote Hill Road, Palo Alto, CA 94304, USA
Email: [email protected]

Received 21 December 2001 and in revised form 4 October 2002

This paper presents a class of signal processing techniques for collaborative signal processing in ad hoc sensor networks, focusing on a vehicle tracking application. In particular, we study two types of commonly used sensors, acoustic-amplitude sensors for target distance estimation and direction-of-arrival sensors for bearing estimation, and investigate how networks of such sensors can collaborate to extract useful information with minimal resource usage. The information-driven sensor collaboration has several advantages: tracking is distributed, and the network is energy-efficient, activated only on a when-needed basis. We demonstrate the effectiveness of the approach to target tracking using both simulation and field data.

Keywords and phrases: sensor network, target tracking, distributed processing, Bayesian filtering, beamforming, mutual information.

1. INTRODUCTION

Sensors of various types have already become ubiquitous in modern life, from infrared motion detectors in our light switches to silicon accelerometers in the bumpers of our cars. As the cost of the sensors comes down rapidly due to advances in MEMS fabrication, and because these sensors increasingly acquire networking and local processing capabilities, new types of software applications become possible, distributed among these everyday devices and performing functions previously impossible for any of the devices independently. Enabling such functionality without overtaxing the resources of the existing devices, especially when these devices are untethered and running on batteries, may require us to rethink some important aspects of how sensing systems are designed.

1.1. Advantages of distributed sensor networks

There are a number of reasons why networked sensors have a significant edge over existing, more centralized sensing platforms.

An ad hoc sensor network can be flexibly deployed in an area where there is no a priori sensing infrastructure. Coverage of a large area is important for tracking events of a significant spatial extent, as in tracking a large number of events simultaneously, or for tracking dynamic events traversing the sensing ranges of many individual sensors, as in tracking a moving vehicle.

In cases of tracking low-observable phenomena, such as a person walking in an obstacle field in an urban environment or a stealthy military vehicle, the signal-to-noise ratio (SNR) of data collected from a central location may be unacceptable. As sensor density increases, the mean distance from the nearest sensor to a target decreases and the SNR received at the nearest sensor improves.

A large-area sensor network may also activate different parts of the network to process different queries from users, supporting a multimode, multiuser operation.

1.2. Sensor network challenges

The design of signal processing applications for sensor networks involves a number of significant challenges. The primary concern is the limited energy reserve at each node. Combining information from spatially distributed sensor nodes requires processing and communicating sensor data and hence consumes energy. Second, the network must be able to scale to large numbers of nodes and to track many events.

To address these challenges, the sensor network must blend the sensing application with network routing so that the



communication is informed by the application needs. Data source selection is key to conserving network resources, managing network traffic, and achieving scalability.

1.3. Collaborative signal processing

Traditional signal processing approaches have focused on optimizing estimation quality for a set of available resources. However, for power-limited and multiuser decentralized systems, it becomes critical to carefully select the embedded sensor nodes that participate in the sensor collaboration, balancing the information contribution of each against its resource consumption or potential utility for other users.

This approach is especially important in dense networks, where many measurements may be highly redundant. The data required to choose the appropriate information sources may be dynamic and may exist solely on sensors already participating in the collaboration. We use the term collaborative signal processing to refer to signal processing problems dominated by this issue of selecting embedded sensors to participate in estimation.

There already exist a number of approaches to collaborative signal processing. Brooks et al. [1] described a prediction-based sensor collaboration that uses estimation of target velocities to activate regions of sensors. Our approach builds on an information-driven approach to tracking that exploits information from both the residual uncertainties of the estimation and the vehicle dynamics (e.g., dynamics as in the Brooks et al. approach). Estrin et al. [2] developed the directed diffusion approach to move sensor data in a network, which seeks to minimize the communication distance between data sources and data sinks. Their approach has been successfully demonstrated in experiments. Our algorithms build on directed diffusion so that the network routing is further informed by application-level knowledge about where to send information and where to get useful information.

1.4. Organization of this paper

In this paper, we describe a particular approach to collaborative signal processing. A class of signal-processing algorithms will be presented to support the so-called information-driven sensor collaboration. The application of tracking a maneuvering vehicle will be used as a primary example throughout the discussion. Finally, experimental results from simulation and field data will be presented to validate the approach.

2. SENSOR NETWORK AND TARGET TRACKING

The ability to track a target is essential in many commercial and military applications. For example, battlefield situational awareness requires an accurate and timely determination of vehicle locations for targeting purposes. Other applications include facility security and highway traffic monitoring. Networked sensors are often ideally suited for target tracking because of their spatial coverage and multiplicity in sensing aspect and modality. Each sensor acquires local, partial, and relatively crude information from its immediate

environment; by exploiting the spatial and sensing diversity of a multitude of sensors, the network can arrive at a global estimate by suitably combining the information from the distributed sources.

In this paper, we follow the information-driven sensor querying (IDSQ) framework, in which sensors are selectively activated based on their utility and cost [3]. The application focus will be on tracking a moving vehicle through a two-dimensional sensor field. Because of the constraints on sensing range, computation, communication bandwidth, and energy consumption, we consider a leader-based tracking scheme, where at any time instant there is only one sensor active, namely, the leader sensor, while the rest of the network is idle. The leader applies a measurement to its prediction of vehicle position and produces a posterior belief about the target location. The updated belief is then passed on to one of the neighboring sensors. The original leader goes back to sleep, the sensor which receives the belief becomes the new leader, and the process of sensing, estimation, and leader selection repeats. The leader-based scheme has several advantages: selective sensor activation and communication make the network energy-efficient, give it a lower probability of detection, and leave it capable of supporting multiuser operation or multitarget tracking.

We use the following notations throughout this paper:

(i) superscript t denotes time; we consider discrete time t ∈ Z+;

(ii) subscript k ∈ {1, ..., K} (where applicable) denotes the sensor index, where K is the total number of sensors in the network;

(iii) the target state at time t is denoted x(t); without loss of generality, we consider the tracking application, where x(t) is the location of the moving object in a two-dimensional plane;

(iv) the sensor measurement at time t is denoted z(t);

(v) the measurement history up to time t is denoted z(t) = {z(0), z(1), ..., z(t)};

(vi) the collection of all sensor measurements at time t is denoted z(t) = {z(t)1, z(t)2, ..., z(t)K}; this is used only in Section 2.2, where our distributed tracking system is compared to a fully centralized system.

2.1. Distributed Bayesian estimation

The goal of tracking is to obtain a good estimate of the target location x(t) from the measurement history z(t). For this problem, we use the classic Bayesian approach. We would like our estimate x̂(z(t)) to be, on average, as close to the true value x(t) as possible according to some measure. That is, the estimate should minimize the average cost

L = E[d(x̂(z(t)), x(t))], (1)

where d(·, ·) is a loss function that measures the estimator performance. For example, d(x̂, x) = ‖x̂ − x‖² measures the square of the l2 distance between the estimate and its true value.


Notation: x^(t): target position at time t; z^(t): sensor measurement at time t; v_max: upper bound on target speed; 𝒩: neighbor list.

Step 0. Sleep until receiving the handoff package (t, p(x^(t)|z^(t))).
Step 1. Diffuse the belief using the vehicle dynamics:
p(x^(t+1)|z^(t)) = ∫ p(x^(t+1)|x^(t)) · p(x^(t)|z^(t)) dx^(t).
Step 2. Do sensing: acquire z^(t+1); compute p(z^(t+1)|x^(t+1)).
Step 3. Compute p(x^(t+1)|z^(t+1)) ∝ p(z^(t+1)|x^(t+1)) p(x^(t+1)|z^(t)).
Step 4. For each sensor k ∈ 𝒩, compute the information utility I_k = I(X^(t+2); Z_k^(t+2) | z^(t+1)); select k_next = arg max_k I_k.
Step 5. Hand off (t + 1, p(x^(t+1)|z^(t+1))) to k_next. Go back to Step 0.

Algorithm 1: IDSQ tracker at each node.

For this loss function, the estimate is

$$ \hat{x}^{(t)}_{\mathrm{MMSE}} = E\big[ x^{(t)} \,\big|\, z^{(t)} \big] = \int x^{(t)}\, p\big(x^{(t)} \,\big|\, z^{(t)}\big)\, dx^{(t)}. \tag{2} $$

This estimator is known as the minimum mean-squared error (MMSE) estimator [4]. We informally refer to the current a posteriori distribution p(x^(t)|z^(t)) as the belief. The key issue is how to compute the belief efficiently.

As briefly explained earlier, we use a leader-based tracker to minimize computation and power consumption. At time t, the leader receives a belief state p(x^(t)|z^(t)) from the previous leader and takes a new measurement z^(t+1). We assume that the following conditional independence assumptions hold:

(i) conditioned on x^(t+1), the new measurement z^(t+1) is independent of the past measurement history z^(t);
(ii) conditioned on x^(t), the new position x^(t+1) is independent of z^(t).

These are standard assumptions in dynamics and fairly mild in practice. Under these assumptions, based on the new measurement, the leader computes the new belief p(x^(t+1)|z^(t+1)) using sequential Bayesian filtering:

$$ p\big(x^{(t+1)} \,\big|\, z^{(t+1)}\big) \propto p\big(z^{(t+1)} \,\big|\, x^{(t+1)}\big) \int p\big(x^{(t+1)} \,\big|\, x^{(t)}\big)\, p\big(x^{(t)} \,\big|\, z^{(t)}\big)\, dx^{(t)}. \tag{3} $$

Sequential Bayesian filtering includes traditional Kalman filtering [5] as a special case. While the latter is restricted to linear systems and explicitly assumes Gaussian belief states and error models, the former is suitable for more general discrete-time dynamic systems. This is useful in multisensor target tracking, where the sensor models and vehicle dynamics are often non-Gaussian and/or nonlinear, as will be discussed in Section 3.

In (3), p(x^(t)|z^(t)) is the belief inherited from the previous step; p(z^(t+1)|x^(t+1)) is the likelihood of the observation given the target location; p(x^(t+1)|x^(t)) encodes the vehicle dynamics. For example, if the vehicle is moving at a known velocity v, then p(x^(t+1)|x^(t)) is simply δ(x^(t+1) − x^(t) − v). In practice, however, the exact vehicle velocity is rarely known. We assume that the vehicle speed (i.e., distance traveled per sample interval) is uniformly distributed in [0, v_max] and that the vehicle heading is uniform in [0, 2π). Therefore, p(x^(t+1)|x^(t)) is uniform over a disk centered at x^(t) with radius v_max. Under this model, the predicted belief p(x^(t+1)|z^(t)) (the integral in (3)) is obtained by convolving the old belief p(x^(t)|z^(t)) with the uniform circular disk kernel. The convolution reflects the dilated uncertainty about the target location due to motion.
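To make the grid-based update concrete, the following is a minimal sketch of one filtering step (the prediction-by-convolution and the pointwise Bayes update of (3)), assuming the belief is stored as a normalized 2-D NumPy array over a regular grid with spacing `cell` in meters; the function names and parameters are ours, not from the paper.

```python
import numpy as np
from scipy.signal import fftconvolve

def disk_kernel(vmax, cell):
    """Uniform pdf over a disk of radius vmax (meters), sampled on the grid."""
    r = int(np.ceil(vmax / cell))
    y, x = np.mgrid[-r:r + 1, -r:r + 1]
    k = ((x * cell) ** 2 + (y * cell) ** 2 <= vmax ** 2).astype(float)
    return k / k.sum()

def bayes_update(belief, likelihood, vmax, cell):
    """One step of (3): diffuse with the vehicle dynamics, then weight by the likelihood."""
    predicted = fftconvolve(belief, disk_kernel(vmax, cell), mode="same")
    posterior = predicted * likelihood          # pointwise product on the grid
    return posterior / posterior.sum()          # renormalize
```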

Once the updated belief p(x^(t+1)|z^(t+1)) is computed, the current leader hands it off to one of its neighboring sensors and goes back to sleep. An information-driven sensor selection criterion, described in Section 4, is used to decide which neighboring sensor to hand the belief to, based on the expected contribution from that sensor. The most "informative" sensor is selected and becomes the leader for the next time step t + 1. The IDSQ tracking algorithm is summarized in Algorithm 1.

Note that minimal assumptions are made in this formulation. We do not require any knowledge of the road configuration, and we do not make the exclusive assumption that the vehicle travels only on roads. In particular, the vehicle dynamics model is rather crude: the vehicle can accelerate, decelerate, turn, or stop. We assume that the vehicle velocity sequence, sampled at the tracking interval, is statistically independent. Only v_max has to be known or assumed. These minimal assumptions allow the algorithm to achieve simplicity, flexibility, and wide applicability. On the other hand, more accurate prior knowledge can be incorporated to further improve performance. For example, adding road constraints could further improve the tracking accuracy and decrease the computational load.


Table 1: Single-step cost for centralized and distributed Bayesian tracking. In the second row, 𝒩 is the leader's neighbor list. The last row is the power needed to communicate reliably through radio. We assume a particular model in which the communication power is adjustable and proportional to the communication distance raised to the power of the RF attenuation exponent α (α ≥ 2). RF overhead and the power consumption of sensing are neglected.

                           Centralized                                   Distributed
Computation                O(K · |belief|) if (4) holds;                 O(|𝒩| · |belief|)
                           O(K² · |belief|) otherwise
Bits to be communicated    O(K · |belief|)                               O(|belief|)
Wireless comm. power       O(|belief| · Σ_k ‖ζ_k − ζ_center‖^α)          O(|belief| · ‖ζ_next leader − ζ_leader‖^α)

We are currently exploring more realistic vehicle dynamics models, which take into account higher-order dynamics and other complex characteristics of vehicle trajectories. However, for the purposes of this paper, we have opted for the lower computational complexity of this simple model.

This algorithm is fully distributed. There is no single central node in the network. Note that the sensor nodes do not have global knowledge of the network, such as its topology; each node is only aware of its immediate neighbors and their specifications. Communication is exclusively neighbor-to-neighbor. Only local computation is involved in computing the measurement z^(t) and updating the belief state. As discussed in Section 1, such fully decentralized characteristics are often desirable to ensure the reliability, survivability, and scalability of the sensor network.

2.2. Comparisons to centralized Bayesian estimation

It is interesting to see how this distributed sensor network compares to a fully centralized one. Consider a centralized sensor network consisting of K sensors. At any time instant t, each sensor k (k = 1, 2, ..., K) reports its measurement z_k^(t) to the central processing unit. The central processing unit updates the belief state using the same sequential Bayesian filtering technique as in (3), with the difference that instead of the single sensor measurement z^(t) (a scalar), it uses the measurement vector z̄^(t) = {z_1^(t), z_2^(t), ..., z_K^(t)}. If the sensor measurements are mutually independent conditioned on the target location, then

$$ p\big(\bar{z}^{(t)} \,\big|\, x^{(t)}\big) = \prod_{k=1}^{K} p\big(z_k^{(t)} \,\big|\, x^{(t)}\big). \tag{4} $$

Compared to the centralized tracking algorithm, which utilizes all K measurements at every time step, our distributed algorithm incorporates only one out of |𝒩| measurements each step, where |𝒩| is the size of the leader node's local neighborhood and, in general, |𝒩| ≪ K. Hence the distributed algorithm suffers some loss of tracking accuracy but scales much better in computation and communication as the network grows. Table 1 summarizes the cost of each tracking step in the centralized and distributed schemes. Unlike the centralized algorithm, whose complexities grow linearly or superlinearly with K, the distributed algorithm has complexities independent of K.

Figure 1: Nonparametric representation of belief state.

2.3. Nonparametric belief representation

As we will see in Section 3, the observation model is nonlinear and the likelihood p(z^(t)|x^(t)) is non-Gaussian, as is the posterior belief p(x^(t)|z^(t)). In view of these characteristics, we use a nonparametric representation for probability distributions (see Figure 1). The distributions are represented on a discrete grid in the two-dimensional plane. The grey level depicts the probability density function (pdf) evaluated at each grid location: the lighter the grid square, the higher the pdf value.

This nonparametric representation of the likelihood and posterior belief admits efficient computation. The MMSE estimate (2) is simply the average of the grid locations in the belief cloud, weighted by the belief values. The predicted belief p(x^(t+1)|z^(t)) (the integral in (3)) is a weighted sum of the vehicle dynamics pdf conditioned on each grid point in the original belief cloud.

The resolution of the grid representation is constrained by the computational capacity, storage space, and communication bandwidth of the leader node. For our choice of sensors, as will be detailed in Section 3, the likelihood functions are relatively smooth. This smoothness allows a low-resolution representation without much loss in performance. Furthermore, in our experiments, we store only the grid points whose likelihood value is above a fixed threshold; grid points below this threshold are neglected.
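As an illustration of this representation, the sketch below keeps only the above-threshold grid points and computes the MMSE estimate (2) as their weighted centroid; the grid layout (`cell` spacing, `origin` corner) and the threshold value are our assumptions, not values from the paper.

```python
import numpy as np

def sparse_belief(belief, cell, origin, threshold=1e-6):
    """Keep only grid points whose probability exceeds the threshold."""
    iy, ix = np.nonzero(belief > threshold)
    w = belief[iy, ix]
    pts = np.asarray(origin) + cell * np.column_stack([ix, iy]).astype(float)
    return pts, w / w.sum()

def mmse_estimate(pts, w):
    """MMSE estimate (2): probability-weighted centroid of the belief cloud."""
    return (w[:, None] * pts).sum(axis=0)
```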

3. SENSOR MODELS

We use two types of sensors for tracking: acoustic amplitude sensors and direction-of-arrival (DOA) sensors.


Figure 2: Likelihood function p(z|x) for acoustic amplitude sensors. The circle is the sensor location ζ and the cross is the true target location.

The acoustic amplitude sensors calculate the sound amplitude measured at each microphone and estimate the distance of the vehicle based on the physics of sound attenuation. The DOA sensors are small microphone arrays; using beamforming techniques, they determine the direction from which the sound comes, that is, the bearing of the vehicle.

The nonparametric Bayesian approach we are using poses few restrictions on sensor type and allows the network to easily fuse data from multiple sensor types. Relatively low-cost sensors such as microphones are attractive because of affordability as well as computational simplicity compared to imagers. However, there are no barriers to adding other sensor types, including imaging, motion, or magnetic sensors.

3.1. Acoustic amplitude sensors

Assuming that the sound source is a point source and that sound propagation is lossless and isotropic, a root-mean-squared (rms) amplitude measurement z is related to the sound source position x as

$$ z = \frac{a}{\|x - \zeta\|} + w, \tag{5} $$

where a is the rms amplitude of the sound source, ζ is the location of the sensor, and w is the rms measurement noise [6]. For simplicity, we model w as Gaussian with zero mean and variance σ². The sound source amplitude a is also modeled as a random quantity. Assuming that a is uniformly distributed on the interval [a_lo, a_hi], the likelihood has the closed-form expression

$$ p(z \mid x) = \int_{a_{\mathrm{lo}}}^{a_{\mathrm{hi}}} p(z \mid x, a)\, p(a)\, da = \frac{1}{\Delta a} \int_{a_{\mathrm{lo}}}^{a_{\mathrm{hi}}} \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(z - a/r)^2 / 2\sigma^2}\, da = \frac{r}{\Delta a} \left[ \Phi\!\left(\frac{a_{\mathrm{hi}} - rz}{r\sigma}\right) - \Phi\!\left(\frac{a_{\mathrm{lo}} - rz}{r\sigma}\right) \right], \tag{6} $$


Figure 3: The cross section of the likelihood (plotted in Figure 2) along the horizontal line through the sensor location.

where Δa = a_hi − a_lo, r = ‖x − ζ‖ is the distance between the sound source and the sensor, and Φ(·) is the standard Gaussian cumulative distribution function (closely related to the error function). For the details of the derivation, the reader is referred to [3].
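For reference, the closed form (6) is straightforward to evaluate; the minimal sketch below expresses Φ through the standard error function available in Python's math module, and assumes r > 0.

```python
import math

def amplitude_likelihood(z, r, a_lo, a_hi, sigma):
    """Closed-form likelihood (6) of an rms amplitude reading z at range r."""
    Phi = lambda t: 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))  # Gaussian CDF
    return (r / (a_hi - a_lo)) * (Phi((a_hi - r * z) / (r * sigma))
                                  - Phi((a_lo - r * z) / (r * sigma)))
```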

Figure 2 shows an example of the likelihood function p(z|x): a crater-shaped function centered at the sensor location. The thickness of the crater (outer radius minus inner radius) is determined by a_lo, a_hi, and σ². Fixing the first two, the thickness increases with σ², as the target location becomes more uncertain. Fixing σ², the thickness increases as a_lo decreases or as a_hi increases. In Cartesian space, this likelihood is clearly non-Gaussian and is difficult to approximate as a sum of Gaussians. The cross section of the likelihood function along the radial direction, plotted in Figure 3, is quite smooth and amenable to approximation by sampling.

The uniform, stationary model of the source amplitude is computationally lightweight. To accommodate quiet vehicles and vehicles in the idle state, a_lo is set to zero in our experiments and a_hi is set via calibration. In practice, the uniform assumption is simplistic; as part of our ongoing work, we are developing new models of source amplitude to better capture vehicle engine sound characteristics, exploiting the correlation of sound energy over time.

3.2. DOA sensors

Amplitude sensing provides a range estimate. This estimate is often not very compact and is limited in accuracy by the crude uniform sound source amplitude model. These limitations make the addition of a target bearing estimator very attractive.

For estimating the bearing of the sound source, we use the maximum likelihood (ML) DOA estimation algorithm proposed by Chen et al. [7]. Here we only outline the formulation of the estimation problem; interested readers may refer to their paper for more details.

Assume that we have a microphone array composed of M identical omnidirectional microphones and that the sound source is sufficiently far away from the microphone array so


that the wave received at the array is a planar wave. In this case, the data collected at the mth microphone at time n is

$$ g_m(n) = s_0\big(n - t_m\big) + w_m(n), \tag{7} $$

where s_0 is the source signal, w_m is the noise (assumed white Gaussian), and t_m is the time delay, which is a function of the target direction θ. Now consider the equivalent problem in the Fourier frequency domain (via a DFT of length L, L ≫ M). We have

$$ G(l) = D(l)\, S_0(l) + W(l), \tag{8} $$

for l = 0, 1, ..., L − 1, where

(i) G(l) = [G_1(l), G_2(l), ..., G_M(l)]^T is the frequency component of the received signal at frequency l;
(ii) S_0(l) is the signal component;
(iii) W(l) = [W_1(l), W_2(l), ..., W_M(l)]^T is the noise component;
(iv) the steering matrix takes the form D(l) = [D_1(l), D_2(l), ..., D_M(l)]^T, with D_m(l) = e^{−j2πl t_m/L}.

The ML estimator seeks an estimate

$$ \big(\hat{\theta}, \hat{S}_0\big) = \arg\min_{\theta,\, S_0} \sum_{l=0}^{L-1} \big\| G(l) - D(l)\, S_0(l) \big\|^2. \tag{9} $$

Given the arrival angle θ, the signal spectrum estimate is

$$ \hat{S}_0(l) = D^{\dagger}(l)\, G(l), \tag{10} $$

where D†(l) is the pseudoinverse of the steering matrix D(l). Plugging (10) into (9) reduces the problem to a one-dimensional optimization over θ ∈ [0, 2π], which can be solved using simple search techniques. This DOA algorithm works well for wideband acoustic signals and does not require the microphone array to be linear or uniformly spaced.
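A minimal sketch of this one-dimensional search is given below. The microphone geometry handling, the assumed speed of sound (343 m/s), and the angular grid resolution are our choices; note that, because the steering entries have unit modulus, the pseudoinverse in (10) reduces to D^H/M.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, an assumed value

def ml_doa(signals, mic_xy, fs, n_grid=360):
    """Grid search for the ML bearing per (9)-(10).

    signals: (M, N) array of microphone samples; mic_xy: (M, 2) positions
    relative to the array centroid (meters); fs: sampling rate (Hz).
    """
    M, N = signals.shape
    G = np.fft.rfft(signals, axis=1)                 # (M, L) spectra
    freqs = np.fft.rfftfreq(N, d=1.0 / fs)           # Hz
    best_theta, best_cost = None, np.inf
    for theta in np.linspace(0.0, 2 * np.pi, n_grid, endpoint=False):
        # Plane-wave delays; theta measured from the vertical axis.
        u = np.array([np.sin(theta), np.cos(theta)])
        tau = -(mic_xy @ u) / SPEED_OF_SOUND         # seconds
        D = np.exp(-2j * np.pi * freqs[None, :] * tau[:, None])  # steering
        S0 = (D.conj() * G).sum(axis=0) / M          # least-squares S0(l)
        cost = (np.abs(G - D * S0[None, :]) ** 2).sum()  # residual of (9)
        if cost < best_cost:
            best_theta, best_cost = theta, cost
    return best_theta
```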

In our experiments, we use a microphone array with four microphones, as shown in Figure 4. The centroid's location is defined as ζ_0 = (0, 0); the arrival angle θ is defined as the angle with respect to the vertical grid axis. This convention is used throughout the paper.

Due to the presence of noise, the DOA estimate of the angle is often imperfect. The measurement z (i.e., θ̂) is close to the true underlying angle θ with some perturbation. To characterize the likelihood function p(z|x), we tested the DOA algorithm using recorded vehicle sounds from an AAV, running the DOA algorithm on an actual sensor node (see Section 5.2 for information on the node and vehicle). The test took place under reasonable noise conditions, including air-handling units at a nearby building and some street and aircraft traffic. We performed tests at r = 50, 150, 200, and 500 feet, and θ = 0, π/8, π/4, and 3π/8.

Since the microphone array is symmetric in the four quadrants, we only have to examine the first quadrant. At each combination, we ran the DOA algorithm 100 times and computed the histogram of the DOA estimates {z_exp,1, z_exp,2, ..., z_exp,100}.


Figure 4: DOA sensor arrangement and angle convention.

Figure 5 shows the histograms for r = 50 feet. The histograms suggest that the distribution of z is unimodal and approximately centered at θ. Hence, for the angle measurement z, a Gaussian model with zero-mean error is appropriate. The likelihood takes the form

$$ p(z \mid x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(z-\theta)^2 / 2\sigma^2}, \tag{11} $$

where θ is calculated from the geometry of the sound source position x and the sensor position ζ. This model ignores the periodicity of angles around 2π and is accurate when the variance σ² is small.

In our experiments, we have observed that the DOA estimates are reliable in some middle distance range and less reliable when the sound source is too near or too far away from the microphone array. This is to be expected: in the near field, the planar wave assumption is violated; in the far field, the SNR is low, and the DOA algorithm may be strongly influenced by noise and fail to obtain the correct angle. To account for these factors, we developed a simplified likelihood model that is qualitatively reasonable and empirically behaves well. The model varies the standard deviation σ with distance, as illustrated in Figure 6. We first specify the range [r_near, r_far] in which the DOA algorithm performs reliably; in this range, the DOA estimate has a fixed standard deviation. For the near-field range [0, r_near), the standard deviation increases linearly as the distance decreases, to account for the increasing uncertainty of DOA readings. Likewise, in the far-field range r > r_far, the standard deviation increases with r.
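The following sketch combines the bearing likelihood (11) with this distance-dependent standard deviation. The values r_near = 20 m, r_far = 100 m, and σ = 10° follow Figure 6, while the linear growth rate outside the reliable range is an illustrative parameter of ours, not a value from the paper.

```python
import math

def doa_sigma(r, r_near=20.0, r_far=100.0, sigma0=math.radians(10.0), slope=0.01):
    """Bearing standard deviation vs. range (cf. Figure 6): constant in
    [r_near, r_far], growing linearly outside; `slope` is illustrative."""
    if r < r_near:
        return sigma0 + slope * (r_near - r)
    if r > r_far:
        return sigma0 + slope * (r - r_far)
    return sigma0

def bearing_likelihood(z, target_xy, sensor_xy):
    """Gaussian bearing likelihood (11); theta measured from the vertical axis."""
    dx, dy = target_xy[0] - sensor_xy[0], target_xy[1] - sensor_xy[1]
    theta = math.atan2(dx, dy)                  # angle w.r.t. vertical grid axis
    s = doa_sigma(math.hypot(dx, dy))
    return math.exp(-(z - theta) ** 2 / (2 * s ** 2)) / math.sqrt(2 * math.pi * s ** 2)
```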

Under this model, the likelihood function p(z|x) is plotted in Figure 7. It has a cone shape in the working range and fans out in the near and far ranges. If the target is located at either end of the range, the DOA estimates are unreliable, and thus the angle measurement does not provide much evidence about the target's location. Note that the likelihood in the two-dimensional Cartesian plane is not compact. The sequential Bayesian filtering approach (as described in Section 2.1) has the flexibility to accommodate such noncompactness, while a standard Kalman filtering approach may have difficulty here.



Figure 5: Histograms of the DOA estimate z at r = 50 feet and θ = 0, π/8, π/4, and 3π/8 in (a), (b), (c), and (d), respectively.


4. INFORMATION-DRIVEN SENSOR SELECTION

Sensor selection is essential for the correct operation of the IDSQ tracking algorithm. The selection criterion is based on information content: it seeks to maximize the predicted information that a sensor's measurement will bring. The selection is performed using only currently available information: the current leader's belief state and its knowledge of the neighboring sensors' locations and sensing capabilities. No querying of neighboring sensors is necessary.

To measure information content, we use mutual information, a measure rooted in information theory and commonly used to characterize the performance of data compression, classification, and estimation algorithms.

4.1. Mutual information

Let U ∈ 𝒰 and V ∈ 𝒱 be two random variables (or vectors) with joint pdf p(u, v).¹ The mutual information between U and V is defined as

$$ I(U;V) \triangleq E_{p(u,v)}\!\left[\log \frac{p(u,v)}{p(u)\,p(v)}\right] \tag{12} $$
$$ = \int_{\mathcal{U}}\!\int_{\mathcal{V}} p(u,v)\,\log \frac{p(u,v)}{p(u)\,p(v)}\,du\,dv \tag{13} $$
$$ = D\big(p(u,v)\,\big\|\,p(u)\,p(v)\big), \tag{14} $$

where D(·‖·) is the relative entropy between two distributions, also known as the Kullback-Leibler divergence [8].

¹In this section, we use standard notation, with uppercase symbols denoting random variables and lowercase symbols denoting a particular realization.



Figure 6: Standard deviation σ in bearing estimation versus range. Here r_near = 20 meters, r_far = 100 meters, and σ = 10° in [r_near, r_far].


Figure 7: Likelihood function p(z|x). The circle is the sensor location and the cross is the target location. The standard deviation of the Gaussian distribution is as in Figure 6.

Similarly, the mutual information conditioned on a random variable W = w is defined as

$$ I(U;V \mid W=w) \triangleq E_{p(u,v|w)}\!\left[\log \frac{p(u,v \mid w)}{p(u \mid w)\, p(v \mid w)}\right]. \tag{15} $$

We use base-2 logarithms; hence I(U;V) is measured in bits. Mutual information is symmetric in U and V, nonnegative, and equal to zero if and only if U and V are independent.

The mutual information I(U;V) indicates how much information V conveys about U. From a data compression perspective, it measures the savings, in bits, of encoding U when V is already known. In classification and estimation problems, mutual information can be used to establish performance bounds: the higher I(U;V) is, the easier it is to estimate (or classify) U given V, or vice versa [9, 10].

4.2. Sensor selection criterion

In our target tracking problem, we formulate the sensor selection criterion as follows. The leader, with belief state p(x^(t)|z^(t)), must decide which sensor in its neighborhood to hand the belief to. IDSQ suggests selecting the sensor

$$ k_{\mathrm{IDSQ}} = \arg\max_{k\in\mathcal{N}}\, I\big(X^{(t+1)}; Z_k^{(t+1)} \,\big|\, Z^{(t)} = z^{(t)}\big), \tag{16} $$

where 𝒩 is the collection of sensors that the current leader can talk to, namely, the leader's neighborhood. Essentially, this criterion seeks the sensor whose measurement z_k^(t+1), combined with the current measurement history z^(t), would provide the greatest amount of information about the target location x^(t+1). Intuitively, k_IDSQ is the most "informative" sensor in the neighborhood 𝒩.

From the definition of mutual information (12), the information content of sensor k is

$$ I\big(X^{(t+1)}; Z_k^{(t+1)} \,\big|\, Z^{(t)} = z^{(t)}\big) = E_{p(x^{(t+1)},\, z_k^{(t+1)} \mid z^{(t)})}\!\left[\log \frac{p\big(x^{(t+1)}, z_k^{(t+1)} \,\big|\, z^{(t)}\big)}{p\big(x^{(t+1)} \,\big|\, z^{(t)}\big)\, p\big(z_k^{(t+1)} \,\big|\, z^{(t)}\big)}\right]. \tag{17} $$

The computation of mutual information is illustrated in Table 2.

We can also take a different view of mutual information, interpreting it as a measure of the difference between two densities. From (17), one can easily show that

$$ I\big(X^{(t+1)}; Z_k^{(t+1)} \,\big|\, Z^{(t)}\big) = E_{p(z_k^{(t+1)} \mid z^{(t)})}\, E_{p(x^{(t+1)} \mid z_k^{(t+1)})}\!\left[\log \frac{p\big(x^{(t+1)} \,\big|\, z_k^{(t+1)}\big)}{p\big(x^{(t+1)} \,\big|\, z^{(t)}\big)}\right] = E_{p(z_k^{(t+1)} \mid z^{(t)})}\, D\Big(p\big(x^{(t+1)} \,\big|\, z_k^{(t+1)}\big)\,\Big\|\,p\big(x^{(t+1)} \,\big|\, z^{(t)}\big)\Big). \tag{18} $$

The Kullback-Leibler divergence term measures how different the updated belief, after incorporating the new measurement z_k^(t+1), would be from the current belief. Therefore, the IDSQ criterion favors the sensor that would, on average, produce the greatest change in the current belief.

4.3. Reduction in dimensionality

Using the discrete representation of the belief, the complexity of computing mutual information grows exponentially in the dimension of the joint pdf. The random variable X^(t+1) is a two-dimensional vector for the target tracking problem over a two-dimensional plane; thus, we would need to compute mutual information from the three-dimensional joint density p(x^(t+1), z_k^(t+1)|z^(t)). This may be computationally intensive, given the limited capabilities of the sensor nodes.


Table 2: Computation of mutual information I(X^(t+1); Z_k^(t+1) | Z^(t)).

Initialization: p(x^(t)|z^(t)) is known.
Step 1. Compute p(x^(t+1)|z^(t)) by diffusing p(x^(t)|z^(t)) with the vehicle dynamics (see Section 2.1).
Step 2. Set 𝒳 as the set of nontrivial grid points of p(x^(t+1)|z^(t)); set 𝒵 ⊂ R as the grid points for z_k^(t+1).
Step 3. For z_k^(t+1) ∈ 𝒵 and x^(t+1) ∈ 𝒳, evaluate p(x^(t+1), z_k^(t+1)|z^(t)) = p(z_k^(t+1)|x^(t+1)) · p(x^(t+1)|z^(t)).
Step 4. For z_k^(t+1) ∈ 𝒵, compute p(z_k^(t+1)|z^(t)) = Σ_𝒳 p(x^(t+1), z_k^(t+1)|z^(t)).
Step 5. For x^(t+1) ∈ 𝒳 and z_k^(t+1) ∈ 𝒵, compute
  D_xz = log [ p(x^(t+1), z_k^(t+1)|z^(t)) / ( p(x^(t+1)|z^(t)) · p(z_k^(t+1)|z^(t)) ) ],
  I_k = Σ_{𝒳,𝒵} D_xz · p(x^(t+1), z_k^(t+1)|z^(t)).
Return I_k.
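A compact sketch of the steps in Table 2, treating the belief and measurement grids as discrete distributions (cell volumes are omitted for brevity); the `likelihood_fn` interface and the argument layout are our assumptions.

```python
import numpy as np

def information_utility(grid_pts, belief, likelihood_fn, z_grid):
    """Mutual information I(X^{t+2}; Z_k^{t+2} | z^{t+1}) on the grid (Table 2).

    grid_pts: nontrivial grid locations; belief: their normalized predicted
    probabilities; likelihood_fn(z, x): sensor model p(z|x); z_grid:
    discretized candidate measurement values.
    """
    px = np.asarray(belief, dtype=float)
    pzx = np.array([[likelihood_fn(z, x) for z in z_grid] for x in grid_pts])
    pzx /= pzx.sum(axis=1, keepdims=True)       # discretize p(z|x) over z_grid
    joint = px[:, None] * pzx                   # Step 3
    pz = joint.sum(axis=0)                      # Step 4
    mask = joint > 0
    log_ratio = np.log2(joint[mask] / (px[:, None] * pz[None, :])[mask])
    return float((joint[mask] * log_ratio).sum())   # Step 5, in bits
```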

For acoustic amplitude sensors, we note that the observation model (5) indicates that, at any given time instant, the observation Z_k is related to the position X only through R_k = ‖X − ζ_k‖, the distance from the target to the sensor positioned at ζ_k. Equivalently, we have p(z_k|r_k, x) = p(z_k|r_k). In this case, R_k is a sufficient statistic of X. From the definition of mutual information, it is easy to show that

$$ I\big(X; Z_k\big) = I\big(R_k; Z_k\big). \tag{19} $$

This implies that instead of computing the mutual information from the three-dimensional density p(x, z_k), we can compute it from the two-dimensional density p(r_k, z_k).

Likewise, for DOA sensors, the observation model (11) shows that Z_k is related to X only through the angle θ(X, ζ_k). Hence

$$ I\big(X; Z_k\big) = I\big(\theta; Z_k\big). \tag{20} $$

Again, the computation of mutual information reduces to a two-dimensional computation.

5. EXPERIMENTAL RESULTS

To validate and characterize the performance of the tracking algorithm, we carried out both simulations and experiments using real data collected in the field.

5.1. Simulation

In the simulation, the vehicle produces a stationary sound with constant rms amplitude A = 40 and travels at a constant speed v = 7 m/s along a straight line (south to north) in the middle of a field. The field is 150 × 250 m² and is covered by K randomly placed sensors (K = 24, 28, ..., 64). The sensor positions are simulated as follows:

(i) first, place the sensors on a uniform rectangular grid with K/4 rows and four columns, evenly covering the field;
(ii) then, add Gaussian noise with distribution N(0, 25) to the horizontal and vertical coordinates.

A realization of this sensor-position simulation is pictured in Figure 8. Acoustic sensor measurements are simulated as A/‖x − ζ_k‖ + N(0, 0.05²). DOA sensor measurements are Gaussian random variables centered at the line through the sensor and the target, with σ = 3°.

Without precise knowledge about the target vehicle, the tracking algorithm allows a maximum speed of 15 m/s. The acoustic amplitude sensors assume a_lo = 0, a_hi = 80, and σ_k = 0.1 (twice the actual noise contamination, to accommodate outliers). The DOA sensors assume r_near = 20 meters and r_far = 100 meters. The standard deviation of the DOA estimates is as plotted in Figure 6. The tracker updates the belief every 0.5 second.

The tracking algorithm begins with an initial belief that is uniform over the entire field. The acoustic amplitude sensor with the highest amplitude reading at time t = 0 is initialized as the leader. The connectivity between sensors is determined as follows: each sensor can talk to sensors within a 40-meter range; if there are fewer than two sensors in range, it can talk to its two nearest sensors. The leader selects the next leader from among these sensors based on their information content according to the IDSQ rule; a sketch of this connectivity rule follows.
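A small sketch of the connectivity rule just described, assuming sensor positions are given as an array of 2-D coordinates:

```python
import numpy as np

def neighbor_lists(positions, radius=40.0):
    """All sensors within `radius` meters; if fewer than two are in range,
    fall back to the two nearest sensors."""
    positions = np.asarray(positions, dtype=float)
    d = np.linalg.norm(positions[:, None, :] - positions[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)                 # a sensor is not its own neighbor
    neighbors = []
    for k in range(len(positions)):
        in_range = np.flatnonzero(d[k] <= radius)
        if len(in_range) < 2:
            in_range = np.argsort(d[k])[:2]     # two nearest sensors
        neighbors.append(in_range.tolist())
    return neighbors
```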

To further enforce sensor diversity, we implement a "triplet" rule: the previous leader is not included in the neighborhood of the current leader. That is, if A hands off to B in one step, B is prohibited from selecting A as the next leader. This rule prevents the leadership from constantly oscillating between two sensors. It is important in ensuring that biases due to the modeling error of one particular sensor are not overweighted in the overall result. Ideally, we would like to task the sensors more or less evenly, to exploit their spatial and sensing-modality diversity, expecting that the modeling errors across different sensors will tend to balance each other.

Figure 8 shows three snapshots of a simulation run with 40 sensors, 30% of which are DOA sensors. The belief is shown using the grid-based nonparametric representation described in Section 2.3. The grid size is 5 meters in each direction (approximately the size of a tank). The belief follows the target fairly closely. In general, the belief cloud is more compact in sensor-dense regions (e.g., Figure 8b) than in sensor-sparse regions (e.g., Figures 8a and 8c).


Figure 8: Snapshots of a simulation run with 40 sensors, 30% DOA. The target is marked with a red "+." The yellow diamonds are acoustic amplitude sensors; the cyan diamonds are DOA sensors. The active leader is the sensor circled with a magenta square. Its neighbors (after applying the triplet rule) are circled with green squares.

This is as expected, since the SNR is higher in sensor-dense regions and the leader has more neighbors to choose from; the collaboration between sensors is thus more effective.

Table 3 summarizes the statistics of the simulation results for different values of K. For each value of K, twenty runs are simulated. We use x to denote the target's true location, x̃ to denote the grid blocks in the belief state, and x̂_MMSE to denote the MMSE estimate of the target location. From (2), we know that x̂_MMSE is simply the centroid of the belief state, computed as the average of the locations of all blocks in the belief, weighted by the posterior p(x^(t)|z^(t)). To analyze tracking performance, we use the distance ‖x̂_MMSE − x‖ to measure how far the MMSE estimate is from the true target position, and the variance ‖x̃ − x̂_MMSE‖² to measure the spread (thus the uncertainty) of the belief cloud. Other quantities of interest include the belief cloud size (directly related to communication throughput) and the neighborhood size. From Table 3, we can see that the tracking performance (mean error, variance, and belief size) improves as more sensors are deployed. Figure 9 plots the average error ‖x̂_MMSE − x‖. Outliers (the tracker losing the target) occur occasionally, especially for small K. The improvement in average tracking performance (the blue curve) with increasing K is quite prominent.

We have also experimented with varying the percentage of DOA sensors; the results are summarized in Table 4. The improvement from 0 to 10% DOA is significant; thus it is desirable to use a few DOA sensors in the sensor network even though they are computationally more expensive than acoustic amplitude sensors. The all-DOA network gives better results than the all-amplitude sensor network. This may be due to the fact that the acoustic amplitude sensors use the very crude uniform distribution to model the sound source amplitude.


Figure 9: Average error versus the number of sensors in the field. The points marked with a circle are the errors averaged over the tracking steps. The points marked with "∗" and linked by a blue line are the average over 20 runs.


5.2. Experiments on field data

5.2.1 Experimental setup

Data for our experiments was collected during a field experiment at the Marine Corps Air-Ground Combat Center (MCAGCC) in Twentynine Palms, California. The test vehicle was an AAVP7A1 tracked, armored assault amphibious vehicle. This vehicle, shown in Figure 10, is a 4.1-meter-long, tracked, diesel-powered vehicle, weighing 21 tons (unloaded) and capable of speeds up to 72 km/h.


Table 3: Tracking performance averaged over all tracking steps and twenty runs.

K     Avg. ‖x̂_MMSE − x‖   Avg. ‖x̃ − x̂_MMSE‖²   Avg. belief size   Avg. neigh. size
24    24.13                457.81                 188.6              2.0
28    23.25                452.97                 182.0              2.2
32    19.29                400.68                 155.2              2.4
36    16.68                340.41                 141.7              2.9
40    16.00                361.88                 145.0              3.0
44    14.62                352.32                 129.5              3.5
48    14.29                322.09                 134.6              4.0
52    13.82                334.57                 123.3              4.6
56    12.85                332.52                 120.5              5.1
60    12.81                304.56                 116.4              5.5
64    12.42                284.99                 109.1              6.0

Table 4: Tracking performance versus percentage of DOA sensors.

Percentage of DOA   Avg. ‖x̂_MMSE − x‖   Avg. ‖x̂_MMSE − x‖²   Avg. belief size
0                   18.48                510.96                 208.9
10%                 16.84                416.17                 168.2
20%                 16.45                379.03                 154.0
30%                 15.48                345.68                 138.3
40%                 14.01                331.55                 122.2
50%                 14.05                373.88                 122.6
60%                 12.87                375.45                 112.2
70%                 11.89                327.94                 104.4
80%                 11.67                349.60                 102.1
90%                 11.31                366.50                 99.7
100%                10.75                351.02                 95.4

Figure 10: AAVP7A1 tracked, armored assault vehicle.

The sensor network consisted of a total of 70 WINS 2.0 nodes from Sensoria Corp. A picture of a node is shown in Figure 11 along with a DOA array. The node specifications are provided in Table 5. Among the 70 nodes, 20–25 were available for our use. They were randomly positioned at the intersection of two dirt roads. The sensor layout and the roads are shown in Figure 12.

Figure 11: WINS 2.0 node with DOA-sensing microphone array.

Vehicle maneuvers were confined to the roads shown although, by choice, the algorithms of this paper do not assume this knowledge.

The DOA sensing arrays were identical square arrays, 1 foot (0.305 m) on a side, configured as shown in Figure 11 and mounted on legs 8 inches off the ground.


Table 5: Specifications of the Sensoria nodes.

Manufacturer             Sensoria Corp.
Processor                Hitachi SH4 7751
Performance              300 MIPS core, 1.1 GFLOPS floating point
CPU power consumption    400 mW
Memory                   16 MB RAM, 16 MB Flash
Operating system         Redhat Linux 7
Data acquisition         4 channels, 16 bits @ 20 kHz (this experiment sampled at 5 kHz throughout)
Communications           2 RF modems @ 2.4 GHz; power: 10 or 100 mW; range: 25–100 m


Figure 12: Road intersection with node layout at MCAGCC. The thick lines are the roads. The amplitude sensors are marked with small circles, and the two DOA sensors (at (150, 150) and (140, 164)) are marked with small squares. The thin lines indicate the connectivity between sensors.

The zero-degree bearing is aligned to magnetic north, as shown in Figure 4. Microphones were electret, flat to frequencies above 16 kHz, omnidirectional to within 2.5 dB, and field-calibrated to ±2.5 dB.

5.2.2 Data acquisition and processing

Filtered acoustic amplitude and estimated DOA were computed and stored on all nodes at time intervals of 0.5 second, using the node signal processing routines described above. Additional data was collected for two DOA nodes using a portable data acquisition PC, sampled at a rate and resolution identical to the node's data acquisition. In the experiments shown here, only these two "virtual" nodes are used for DOA measurements.

The IDSQ tracker, implemented in Matlab, then postprocessed this data to simulate the node-to-node handoff and data source selection. Due to node dropout, the original network topology, as laid out in the field, had to be slightly augmented to prevent network segmentation; the resulting topology is shown in Figure 12.

5.2.3 Tracking results

Figure 13 shows a few snapshots of an AAV run on the north-east road. The vehicle is traveling at a roughly constant speed of 15 mph (6.7 m/s). The tracker is designed for v_max = 40 mph (17.9 m/s) and updates its belief every 0.5 second. For the acoustic amplitude sensors, the parameters are set as a_lo = 0, a_hi = 80, and σ_k = 0.05 based on calibration data. For the DOA sensors, the variance is set as in Figure 6. At time t = 0, the acoustic amplitude sensor with the highest reading is initialized as the leader. The initial belief is a large uniform square, centered at the leader and extending 50 meters (10 grid cells) to each side, as plotted in Figure 13a. The first few tracking steps can be considered a "discovery" phase, in which the tracker begins with very little knowledge and uses sensor measurements to improve the belief. The green trail depicts the trajectory of the tracker estimates; in the discovery phase, it is gradually pulled over to the road (the red curve). Figures 13b, 13c, and 13d show the progress of tracking. Although our tracker does not know the road configuration, it produces estimates that follow the road fairly closely.

In our experiments, we observed that including two DOA sensors in the sensor network improves tracking accuracy. This is consistent with our results in Section 5.1, and we suspect that including a few more DOA sensors may bring further improvement. Moreover, since DOA sensors essentially use beam crossings for localization, placing the DOA sensors evenly across the sensor field to avoid collinearity may be advantageous.


Figure 13: Snapshots of an AAV run on the north-east road. Plotting convention is the same as in Figure 8. The red curves are the roads.The green curve is the estimated track.

6. DISCUSSION AND FUTURE WORK

In our experiments, we have glossed over the issue of automatically initializing a track when the vehicle enters the sensor field and have instead hardwired a uniform initial probability distribution around the entry point. In the future, we plan to deploy a small number of "watchdog" sensors along the boundary of the sensor field. While most of the network remains idle, the watchdog sensors look out for events of interest. When a high-confidence detection occurs, sensors in that neighborhood wake up and elect a leader, causing the IDSQ tracker to initialize. When the target moves outside the sensor field, the network goes back to sleep mode, with only the watchdogs turned on.

Bandwidth is at a premium in a sensor network. To further reduce the bandwidth requirement during sensor collaboration, we can consider switching between nonparametric and parametric belief representations, depending on the nature of the distribution. The parametric form is more compact when it is feasible, since only the parameters of a distribution need to be communicated from the current leader to the next. Even in the nonparametric representation, we may be able to encode the distribution using image compression techniques to significantly reduce the number of bits that need to be transmitted. Another improvement to the tracker could come from using more realistic dynamics models for sound source amplitudes and vehicles during the Bayesian filtering. We used a crude form of the DOA likelihood function; a more accurate, experimentally validated characterization could also help improve the tracking.

Reliability is an important issue in the sensor network. A single-leader tracking protocol, while conceptually simple, may suffer from node failure or degradation. A multithread IDSQ tracker could partially alleviate such problems by providing some degree of redundancy. The challenge here is to maintain consistency among, and fuse information from, multiple threads of the tracker. In addition to the problem of node dropout, the tracker must also accommodate sensor measurement outliers, perhaps through a local voting mechanism.

Extending the approach to tracking multiple targets is a significant next step. When multiple targets are far apart from each other, the network can partition into subnetworks and initialize independent trackers in parallel. The difficulty arises when some targets are in proximity to each other. Source separation and data association are two major technical hurdles that must be overcome. The data association module will need to exploit classification knowledge of targets in order to better disambiguate between multiple targets. The information criterion in the IDSQ tracker, in this case, must be extended to account for both state estimation and classification.

7. CONCLUSION

This paper has presented a principled approach to sensor selection and a class of signal processing algorithms for distributed sensor networks, based on a mutual information measure, models of acoustic amplitude and bearing sensing, and a computationally efficient implementation of the IDSQ tracking algorithm. The tracker is distributed, energy-efficient, and scalable. The approach has been demonstrated both in simulation and on field data of a moving vehicle.

ACKNOWLEDGMENTS

This work was partially supported by the DARPA Sensor Information Technology Program under Contract F30602-00-C-0139. We acknowledge the significant contributions of Patrick Cheung, Jaewon Shin, and Dan Larner. We are also indebted to Professor Kung Yao and Joe Chen of UCLA for their advice on using bearing estimation for collaborative signal processing.


REFERENCES

[1] R. Brooks, C. Griffin, and D. Friedlander, "Self-organized distributed sensor network entity tracking," International Journal of High Performance Computing Applications, vol. 16, no. 3, pp. 207–220, 2002.

[2] D. Estrin, R. Govindan, J. Heidemann, and S. Kumar, "Next century challenges: scalable coordination in sensor networks," in Proc. 5th Annual International Conference on Mobile Computing and Networks, pp. 263–270, Seattle, Wash, USA, August 1999.

[3] M. Chu, H. Haussecker, and F. Zhao, "Scalable information-driven sensor querying and routing for ad hoc heterogeneous sensor networks," International Journal of High Performance Computing Applications, vol. 16, no. 3, pp. 293–314, 2002.

[4] H. V. Poor, An Introduction to Signal Detection and Estimation, Springer-Verlag, New York, NY, USA, 2nd edition, 1994.

[5] S. Haykin, Adaptive Filter Theory, Prentice Hall, Upper Saddle River, NJ, USA, 3rd edition, 1996.

[6] L. E. Kinsler, A. R. Frey, A. B. Coppens, and J. V. Sanders, Fundamentals of Acoustics, John Wiley and Sons, New York, NY, USA, 1999.

[7] J. C. Chen, R. E. Hudson, and K. Yao, "Joint maximum-likelihood source localization and unknown sensor location estimation for near-field wideband signals," in Advanced Signal Processing Algorithms, Architectures, and Implementations XI, vol. 4474 of SPIE Proceedings, San Diego, Calif, USA, July 2001.

[8] T. M. Cover and J. A. Thomas, Elements of Information Theory, John Wiley and Sons, New York, NY, USA, 1991.

[9] J. Ziv and M. Zakai, "On functionals satisfying a data-processing theorem," IEEE Transactions on Information Theory, vol. 19, pp. 275–283, May 1973.

[10] A. O. Hero, "On the problem of granulometry for a degraded Boolean image model," in Proc. IEEE International Conference on Image Processing, vol. II, pp. 16–20, Kobe, Japan, October 1999.

Juan Liu received her B.E. degree in electronic engineering from Tsinghua University, China, in 1995, and the M.S. and Ph.D. degrees in electrical engineering from the University of Illinois at Urbana-Champaign, USA, in 1998 and 2001, respectively. In September 2001, she joined the Palo Alto Research Center as a Research Scientist, working in the Embedded Collaborative Computing Area. Her research interests include signal processing, statistical modeling, detection and estimation, network routing, and their applications to distributed sensor network problems.

James Reich is a Researcher in the Embedded Collaborative Computing Area of the Palo Alto Research Center (PARC). He received an undergraduate degree in aeronautical and astronautical engineering from MIT and an M.S. degree in electrical and computer engineering from Carnegie Mellon. His technical focus is on sensing and control using ad hoc networks of intelligent devices. His previous work ranges from refurbishing Von Braun-era rockets to control system design and integration of PARC's first active surface, the ISS airjet paper mover.

Feng Zhao is a Principal Scientist at the Palo Alto Research Center (PARC), where he directs the Embedded Collaborative Computing Area in the Systems and Practices Laboratory. He is also a Consulting Associate Professor of computer science at Stanford. His research interests include distributed sensor data analysis, diagnostics, qualitative reasoning, and control of dynamical systems. Dr. Zhao received his Ph.D. degree in electrical engineering and computer science from MIT in 1992. From 1992 to 1999, he was an Assistant and then Associate Professor of computer and information science at Ohio State University. He received the ONR Young Investigator Award and the NSF Young Investigator Award, and was an Alfred P. Sloan Research Fellow in computer science. He has authored or coauthored over 50 peer-reviewed technical papers in the areas of networked embedded systems, artificial intelligence, nonlinear control, and programming tools, and is a coinventor of two US patents and three pending patent applications.


EURASIP Journal on Applied Signal Processing 2003:4, 392–401
© 2003 Hindawi Publishing Corporation

Preprocessing in a Tiered Sensor Network for Habitat Monitoring

Hanbiao Wang
Computer Science Department, University of California, Los Angeles (UCLA), Los Angeles, CA 90095-1596, USA
Email: [email protected]

Deborah Estrin
Computer Science Department, University of California, Los Angeles (UCLA), Los Angeles, CA 90095-1596, USA
Email: [email protected]

Lewis Girod
Computer Science Department, University of California, Los Angeles (UCLA), Los Angeles, CA 90095-1596, USA
Email: [email protected]

Received 1 February 2002 and in revised form 6 October 2002

We investigate task decomposition and collaboration in a two-tiered sensor network for habitat monitoring. The system recognizes and localizes a specified type of birdcall. The system has a few powerful macronodes in the first tier and many less powerful micronodes in the second tier. Each macronode combines data collected by multiple micronodes for target classification and localization. We describe two types of lightweight preprocessing which significantly reduce data transmission from micronodes to macronodes. Micronodes classify events according to their cross-zero rates and discard irrelevant events. Data about events of interest is reduced and compressed before being transmitted to macronodes for target localization. Preliminary experiments illustrate the effectiveness of event filtering and data reduction at micronodes.

Keywords and phrases: sensor network, collaborative signal processing, tiered architecture, classification, data reduction, data compression.

1. INTRODUCTION

Recent advances in wireless networking, low-power circuit design, and microelectromechanical systems (MEMS) will enable pervasive sensing and will revolutionize the way in which we understand the physical world [1]. Extensive work has been done to address many aspects of wireless sensor network design, including low-power schemes [2, 3, 4], self-configuration [5], localization [6, 7, 8, 9, 10, 11], time synchronization [12, 13], data dissemination [14, 15, 16], and query processing [17]. This paper builds upon earlier work to address task decomposition and collaboration among nodes.

Although hardware for sensor network nodes will become smaller, cheaper, more powerful, and more energy-efficient, technological advances will never obviate the need to make trade-offs. Cerpa et al. [18] described a tiered hardware platform for habitat monitoring applications. Smaller, less capable nodes are used to exploit spatial diversity, while more powerful nodes combine and process the micronode sensing data.

Although the details of task decomposition and collaboration clearly depend on the specific characteristics of applications, we hope to identify some common principles that can be applied to tiered sensor networks across various applications. We use birdcall recognition and localization as a case study of task decomposition and collaboration. In this context, we demonstrate two types of micronode preprocessing. Distributed detection algorithms and beamforming algorithms will not be discussed in detail in this paper, although they are fundamental building blocks for our application.

The rest of the paper is organized as follows. Section 2 presents a two-tiered sensor network for habitat monitoring and the task decomposition and collaboration between tiers. Sections 3 and 4 illustrate two types of micronode preprocessing. Section 5 presents the preliminary results of data reduction and compression experiments. Section 6 is a brief description of related work. Section 7 concludes the paper.


2. TASK DECOMPOSITION AND COLLABORATION IN A TIERED SENSOR NETWORK FOR HABITAT MONITORING

2.1. Tiered sensor network for habitat monitoring

Our example application is the recognition and localization of a known acoustic source (e.g., a bird). The system first recognizes birdcalls of interest and then determines their locations.

Our two-tiered wireless sensor network is illustrated in Figure 1. It has two types of nodes: macronodes in the first tier and micronodes in the second tier. Micronodes are less expensive but more resource-constrained than macronodes. We chose commercial off-the-shelf (COTS) PC104 products as our macronodes (http://www.pc104.org/consortium/). PC104 is a well-supported standard. The boards are physically small but available with CPUs ranging from i386 to Pentium II, memory up to 64 MB, and a full spectrum of peripheral devices including digital I/O, sensors, and actuators. We chose the motes developed by UC Berkeley [19] and manufactured by Crossbow, Inc. as our micronodes (http://www.xbow.com). The latest motes have 128-KB program memory, 4-KB data memory, 512-KB secondary storage, 50-Kb/s radio bandwidth, and 6 ADC channels. Both PC104s and motes can be equipped with acoustic sensors. Motes and PC104s can communicate with one another through a wireless network. Micronodes can be densely distributed because of their low cost and small form factor. High density increases the probability that some micronodes detect a stimulus close to its origin. Physical proximity to a stimulus yields higher SNR and improves the opportunity for line of sight. Macronodes are sparsely distributed because of their higher power consumption. Nodes form a clustered wireless network by self-assembly [20]. Macronodes serve as cluster heads because they have more processing power and more capabilities than micronodes. GPS on macronodes can provide location and time references to the rest of the system. Locations of other nodes can be determined iteratively, given a group of reference nodes' locations [6, 7, 10, 11]. Other nodes can also be synchronized to the reference nodes [12, 13]. Figure 1 illustrates two clusters in a tiered sensor network.

2.2. Task decomposition and collaboration

The task of our case study system is to recognize a specified type of birdcall and determine the call locations. First, we need to specify the birdcalls of interest as input to the system. A convenient input format for biologists is the birdcall waveform. Biologists typically have recorded birdcall waveforms for the particular type of bird being studied. These waveforms can be input into the system at macronodes. The macronodes convert the waveforms into the internal formats used by the birdcall recognition algorithms.

In particular, spectrograms are complete descriptions of the bioacoustic characteristics of birdcalls. They are widely used by biologists for animal call classification. Macronodes have enough computational resources to use spectrograms internally to classify acoustic signals. However, micronodes are too resource-constrained to use spectrograms.

Figure 1: Two-tiered sensor network for bird monitoring. Macronodes are PC104s. Micronodes are Berkeley motes [19]. Dotted lines and dashed lines represent inner-cluster and intercluster wireless communication links, respectively.

We propose using a cross-zero rate representation for micronodes. The cross-zero rate is the rate at which a waveform changes sign. Consequently, this representation is always two times the most significant frequency and is thus a summary of the most significant characteristics of a waveform. Figure 2 (in Section 3) illustrates the relationship between spectrograms and cross-zero rates. Cross-zero rates are easy to compute and easy to use. Classification using cross-zero rates will be discussed in detail in Section 3; a small sketch of the computation follows.
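A minimal sketch of the cross-zero rate computation, using the 20-ms window of Figure 2; counting sign changes per second yields twice the dominant frequency for a pure tone, matching the description above. The function name and argument layout are ours.

```python
import numpy as np

def cross_zero_rate(waveform, fs, window_s=0.020):
    """Cross-zero rate (Hz) of an audio waveform in fixed-length windows."""
    n = int(window_s * fs)
    rates = []
    for start in range(0, len(waveform) - n + 1, n):
        s = np.signbit(waveform[start:start + n]).astype(np.int8)
        crossings = np.count_nonzero(np.diff(s))   # number of sign changes
        rates.append(crossings / window_s)         # changes per second
    return np.array(rates)
```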

The target recognition task can be divided into two steps. All nodes first independently determine whether their acoustic signals are of the specified type of birdcall. Then, macronodes can fuse all individual decisions into a more reliable system-level decision using distributed detection algorithms [21]. We will not discuss the details of decision fusion in this paper. We will describe how individual decisions are made in detail in Section 3.

The target localization task can also be divided into two steps. First, waveforms are recorded at nodes distributed at different locations. Second, all those data are accumulated at one macronode, and beamforming is applied to determine the target location. The beamforming procedure estimates the target location using the time difference of arrival (TDOA) from a set of distributed sensors whose locations are known [22, 23, 24]. The time lag of the cross-correlation maximum between waveforms of the same target from two different sensors indicates the TDOA between those two sensors, as sketched below.
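The TDOA extraction just described amounts to locating the cross-correlation peak; a minimal sketch follows, with the sampling rate `fs` assumed.

```python
import numpy as np

def tdoa(wave_a, wave_b, fs):
    """TDOA (seconds) between two recordings of the same source: the lag of
    the cross-correlation maximum. Positive means wave_a lags wave_b."""
    corr = np.correlate(wave_a, wave_b, mode="full")
    lag = np.argmax(corr) - (len(wave_b) - 1)   # lag in samples; 0 = aligned
    return lag / fs
```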

So far, we have decomposed the tasks and distributed them to appropriate nodes in order to optimize cost effectiveness. Micronodes are densely distributed for sensing, while macronodes are sparsely distributed for time-space reference and information fusion. Such optimization is one of the fundamental goals of task decomposition and collaboration in a tiered sensor network. However, there are also secondary goals that can significantly contribute to a longer system lifetime. For example, communication among nodes should be minimized, because it is the primary energy consumer.



Figure 2: Waveforms, spectrograms, and cross-zero rates of birdcalls A, B, and C. Birdcalls A and B are of the same type, while birdcall C is different. Spectrograms are only shown in a limited frequency band. The cross-zero rates are calculated in a time window of 20 ms.

nodes should be minimized because it is the primary energy consumer. Pottie and Kaiser pointed out in [25] that each bit transmitted over the air brings the node battery one step closer to its death. In the rest of this paper, we discuss in detail two types of preprocessing at micronodes that significantly reduce the data transmission overhead.

The first type of preprocessing is to recognize events of interest and filter out irrelevant events at the micronodes. When waveforms of a specific type of birdcall are input to the system at a macronode, the macronode computes their spectrogram and cross-zero rate and sends both to all other macronodes. All macronodes broadcast the cross-zero rate to all micronodes in their respective clusters. Micronodes use the cross-zero rate to determine whether a detected signal is of the specified type of birdcall. If it is not, it is discarded without being sent to its cluster head for data fusion. Assuming events of interest occur sparsely over the long lifetime of a sensor network, local filtering at micronodes significantly reduces the amount of data that must be transmitted to macronodes.

The second type of preprocessing is data reduction/compression at the sensor nodes before data is transmitted to the macronode for combination. Data reduction shrinks data size by discarding irrelevant information in the data.¹

In our example sensor network, source location estimation needs arrival-time information of acoustic signals at multiple sensor nodes. We use an audio reduction/compression technique that retains most time information in audio waveforms while discarding the details of amplitude changes. The cross correlation between two waveforms of the same stimulus recorded at two different locations indicates the TDOA between those two locations, and the cross correlation of two reduced/compressed waveforms indicates the same TDOA as the cross correlation of their respective raw waveforms.

These two components have the potential to greatly reduce the amount of wireless communication and the energy cost in the sensor network, thereby extending the system lifetime. The remainder of this paper describes specific techniques to implement these two types of processing at micronodes.

3. EVENT FILTERING AT MICRONODES

We now describe the first type of preprocessing at micronodes: a lightweight event recognition scheme that identifies events of interest while discarding irrelevant events. In our case study of a bird-monitoring application, motes are exposed to acoustic signals from all kinds of events such as wind, rain, traffic, and other animal calls.

¹The semantics of irrelevant information is determined by the characteristics of the application. For example, MP3 compression uses psychoacoustic selection to eliminate sound signals that we are unable to hear while retaining human perception. Sounds below the minimum audition threshold and sounds masked by stronger sounds are therefore irrelevant information.



We use micronodes to determine the event type locally and to discard signals of irrelevant events.

Traditional birdcall classification is based on bioacoustics. Spectrograms completely describe the bioacoustic characteristics of each type of birdcall. Once the spectrogram of an observed acoustic signal is computed, any standard detection method for two-dimensional signals can be applied to determine whether the spectrogram is of the birdcall type of interest. One straightforward classification method uses the cross-correlation coefficient between the measured spectrogram and the reference spectrogram. Figure 2 shows three birdcalls. Birdcalls A and B are of the same type, and their cross-correlation coefficient is about 97%. Birdcalls A and C are of different types, and their cross-correlation coefficient is 0%. We can therefore choose a threshold for cross-correlation coefficients; any coefficient above the threshold indicates that two birdcalls are of the same type.

Computation of spectrograms and cross-correlation coefficients demands substantial CPU time and memory. For example, it takes our macronode, with a 266 MHz CPU and 64 MB RAM, more than 300 ms to complete one classification using the cross-correlation coefficient between the measured and reference spectrograms. As described earlier, we therefore use the cross-zero rate of the detected signal to determine its event type. As signal samples stream into the micronode, the cross-zero rate can be computed by simply counting zero crossings, which demands far less computational resource than the spectrogram. One straightforward classification method using cross-zero rates is based on the average difference of two cross-zero rate curves. In Figure 2, the same-type birdcalls A and B have an average cross-zero rate difference of 84 Hz, while the different-type birdcalls A and C have an average cross-zero rate difference of 5416 Hz. Computing the average difference between two cross-zero rate curves also costs much less than computing the cross-correlation coefficient between two spectrograms. We choose a threshold for the average difference between two cross-zero rate curves; an average difference below the threshold indicates that the two birdcalls are of the same type.
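A hedged sketch of this classifier follows; the 500 Hz threshold is a placeholder to be calibrated per deployment (the paper reports average differences of 84 Hz for same-type and 5416 Hz for different-type calls, so any threshold well between the two separates these cases):

    import numpy as np

    def same_call_type(rate_curve, ref_curve, threshold_hz=500.0):
        # Compare the measured and reference cross-zero rate curves over
        # their common length and accept when the average absolute
        # difference falls below the threshold.
        n = min(len(rate_curve), len(ref_curve))
        avg_diff = np.mean(np.abs(rate_curve[:n] - ref_curve[:n]))
        return avg_diff < threshold_hz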

The advantage of the cross-zero rate comes from its low computational demands. However, the cross-zero rate loses some of the information in the spectrogram. When noise is so strong that the most significant frequency comes from noise instead of the birdcall, the cross-zero rate is distorted: the distorted curve represents the characteristics of the noise, not of the birdcall. When noise is not strong enough to change the most significant frequency in the data, it has no effect on the cross-zero rate at all, because the cross-zero rate is determined only by the most significant frequency. Fortunately, birdcalls usually have a narrow bandwidth, so we can filter out noise that is not in the band of the birdcall to be monitored. For example, the noise caused by wind in an outdoor environment usually has a much lower frequency than typical birdcalls, so wind can easily be filtered out. Filtering

is the first stage of processing after signals are sampled at micronodes. The computational cost of simple bandpass filtering is low enough for micronodes to handle. However, when noise occupies the same band as the birdcalls to be monitored, filtering does not help. For example, a birdcall of interest could be so severely polluted by other animal calls that the measured cross-zero rate curve does not match the reference curve; in that scenario, birdcalls of the specified type could indeed be discarded as irrelevant calls. In rare cases, two different types of acoustic signals may have similar cross-zero rates even though their spectrograms are different.

4. DATA REDUCTION/COMPRESSION AT MICRONODES

In this section, we describe the second type of preprocessing at micronodes: a data reduction scheme that retains most time information of acoustic signals for beamforming using TDOA. We also present the S-code, which compactly encodes reduced acoustic signals. After reduction and compression, data is sent to the macronodes.

4.1. Data reduction

In the example sensor network for bird monitoring, source location estimation requires beamforming of signals detected by multiple micronodes. The simplest design is for all micronodes to send all waveforms to a macronode for beamforming. However, the required bandwidth and energy consumption are far beyond the capability of the system. A sampling rate of 22 kHz with a sample size of 8 bits generates data at more than three times the rate a micronode's 50 kbps radio can transmit, and the energy consumption would greatly shorten the system lifetime. Instead, micronodes must reduce/compress raw data locally before sending it to the macronode.

Data reduction based on application characteristics is not a new concept. In estimation theory, a minimum sufficient statistic is a function of a set of samples [26]: it contains no less information about the parameter to be estimated than the original set of samples while having a much smaller data size. This concept can be generalized to signal processing in sensor networks. The following describes the specific data reduction scheme used in our case study. It transforms raw waveforms into a coarse format of smaller size while keeping most of the time information contained in the raw waveforms. Specifically, the cross correlation of reduced waveforms indicates the same TDOA as that of the raw waveforms, so TDOA-based beamforming can use reduced waveforms instead of raw waveforms to determine the target location. TDOA-based beamforming has been discussed in detail in many papers [22, 23, 24].

A typical digitized raw signal waveform is a sequence of real-valued signal samples, where indices indicate time:

    {a_i | i = 0, ..., n - 1}.                                (1)

We define a segment as a consecutive subsequence of the waveform within which all samples have the same sign,



but the samples immediately before and immediately after it have different signs. For any physical signal sampled at a proper rate, {a_i} is a sequence of alternating positive-signed and negative-signed segments. Our data reduction scheme for a waveform is based on the following important observation.² Most of the time information of the waveform is contained in the moments when the transitions between positive-signed and negative-signed segments occur; the signal variation details within a segment can be discarded with little loss of time information. The following coarse waveform {b_i} contains most of the time information of the raw waveform {a_i}:

    {b_i | i = 0, ..., n - 1},                                (2)

where

    b_i = +1  if a_i >= 0,
    b_i = -1  if a_i < 0.                                     (3)

Therefore, {b_i} can replace {a_i} without much loss of time information.

After a micronode reduces the raw waveform {a_i} to the coarse waveform {b_i}, there are two options. The first is to code {b_i} into a binary string (+1 encoded as 1 and -1 encoded as 0) before sending it to the macronodes. When the raw waveform has a sample size of k bits, the total size of the reduced waveform is only 1/k of the total size of the raw waveform. The second option is to view the coarse waveform {b_i} as a sequence of segments, which is completely represented by the sign of the first segment, the starting time of the first segment, and a sequence of segment lengths (SSL). The SSL representation can be further encoded into a more compact format. In either case, data reduction significantly reduces data transmission by reducing raw waveforms to coarse waveforms. Motivated by the larger compression gains, we discuss the second option in detail in the following paragraphs.
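The reduction step and the SSL view can be sketched as follows; the helper names are ours, and itertools.groupby stands in for whatever run-length pass a mote would actually run:

    import numpy as np
    from itertools import groupby

    def reduce_waveform(a):
        # Coarse waveform {b_i}: +1 where a_i >= 0, -1 where a_i < 0, as in (3).
        return np.where(np.asarray(a) >= 0, 1, -1)

    def to_ssl(b):
        # SSL view of the coarse waveform: the sign of the first segment
        # plus the lengths of the alternating-sign runs.
        lengths = [sum(1 for _ in run) for _, run in groupby(b)]
        return int(b[0]), lengths

    # sign0, ssl = to_ssl(reduce_waveform(raw_samples))

(The starting time of the first segment would be carried alongside in a real packet format.)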

We discussed the effects of noise on the cross-zero rate in Section 3. When noise is strong enough to alter the most significant frequency component of the data to be classified, it must be filtered out before computing the cross-zero rate; otherwise, the cross-zero rate represents the characteristics of the noise instead of the birdcall to be classified. Likewise, strong noise must be filtered out before data reduction; otherwise, the coarse waveform represents the time information of the noise arriving at the sensors. Fortunately, noise is low enough in birdcalls that have already been classified as the type of interest using the cross-zero rate; otherwise, the classification would have discarded the birdcall as an irrelevant event. Thus, data reduction applied after cross-zero-rate classification is safe from noise corruption and retains the correct time information of the signals.

²We were inspired by personal communication with Dr. Ralph Hudson and Dr. Kung Yao, who suggested that the cross correlation between waveforms sampled at the extreme sample size of 1 bit still indicates the correct TDOA.

Table 1: Base-16 S-code.

    Number range    Base-16 S-code
    1 to 15         0x1 to 0xF
    16 to 255       0x0 10 to 0x0 FF
    256 to 4095     0x00 100 to 0x00 FFF

Therefore, filtering is critical to both cross-zero-rate classification and data reduction when noise is strong. To make both valid, the first step of preprocessing immediately after sampling should be noise filtering.

4.2. Data encoding

The sign and starting time of the first segment can be efficiently encoded in a constant amount of space. However, depending on the segment length distribution, it takes a variable amount of space to encode an SSL. For convenience, we do not differentiate between the whole encoding task and the encoding of its SSL.

An SSL is a sequence of natural numbers in which most segments have a few samples while a few segments may have many. Encoding an SSL is thus a problem of variable-length coding of natural numbers. Many variable-length codes for integers have been proposed [27, 28, 29, 30, 31, 32]. However, there is no single best encoding scheme, because encoding efficiency always depends on the probability distribution of the integers to be encoded. Many encoding schemes could encode an SSL with high efficiency; for convenience, we propose the S-code. The S-code is an extension of the Elias γ′-code [28, 29]. The Elias γ′-code consists of two parts, flag bits and data bits: the flag bits tell how many data bits are used for the number, producing shorter codes for small integers and longer codes for large integers. Unlike the Elias γ′-code, which is a binary code, the S-code is a base-2^N code. Like the Elias γ′-code, it is the concatenation of flag digits and data digits, where the flag digits indicate the coding length of the integer. The Elias γ′-code has no flag bit for 1; likewise, the S-code has no flag digits for natural numbers smaller than 2^N. The data digits are simply the direct unsigned representation of the natural number. When N = 1, the S-code reduces to the Elias γ′-code. Table 1 shows the base-2^4 (hexadecimal) S-code.

Because the sampling rate is often several times the cutoff frequency of the signal, the shortest segment has several samples. Because birdcalls are usually limited to a narrow band from tens of Hz to several kHz, the longest segment is no more than about 100 times the length of the shortest. Each type of birdcall has a characteristic segment length distribution for a given sampling rate. Given the segment length distribution and the base 2^N used for the S-code, the size of the S-coded SSL can be predicted analytically. To maximize the compression efficiency of the S-code, N should be chosen such that most segment lengths are between 2^(N-1) and 2^N. Because the encoding size can be predicted when the event type of interest is specified to the sensor network, we can fix the optimal value of N before sensor nodes start data compression.
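A minimal sketch of base-2^N S-coding as just described, emitting a list of base-2^N digits in which zero digits act as the flags (bit packing is omitted, and the function names are ours):

    def s_encode(values, N=4):
        # A value below 2**N is a single nonzero data digit with no flag;
        # a value needing k digits is preceded by k - 1 zero flag digits.
        base = 1 << N
        digits = []
        for v in values:
            assert v >= 1
            data = []
            while v:
                data.append(v % base)
                v //= base
            data.reverse()                    # most significant digit first
            digits += [0] * (len(data) - 1) + data
        return digits

    def s_decode(digits, N=4):
        base, out, i = 1 << N, [], 0
        while i < len(digits):
            flags = 0
            while digits[i] == 0:             # zero digits announce the length
                flags += 1
                i += 1
            v = 0
            for _ in range(flags + 1):        # read flags + 1 data digits
                v = v * base + digits[i]
                i += 1
            out.append(v)
        return out

For N = 4 this reproduces Table 1: s_encode([15]) is [0xF], s_encode([16]) is [0x0, 0x1, 0x0], and s_encode([256]) is [0x0, 0x0, 0x1, 0x0, 0x0]. Because the leading data digit is always nonzero, the zero flag digits are unambiguous.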



After an SSL is S-coded, general-purpose compression such as zip can be applied in addition. Our preliminary experiments show that both encoding methods yield significant compression gains.
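As an illustration of that pipeline, the digits from the s_encode sketch above can be packed two to a byte (for N = 4) and passed through a general-purpose compressor; here zlib's deflate stands in for WinZip, and a real format would record the digit count so the pad nibble can be stripped on decode:

    import zlib

    def pack_nibbles(digits):
        # Pack base-16 digits two per byte, padding the last byte with 0.
        if len(digits) % 2:
            digits = digits + [0]
        return bytes((hi << 4) | lo for hi, lo in zip(digits[0::2], digits[1::2]))

    # s_coded = s_encode(ssl, N=4)          # from the sketch in Section 4.2
    # compressed = zlib.compress(pack_nibbles(s_coded), 9)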

5. EXPERIMENTS

The purpose of our experiments is to explore the validity and efficiency of the proposed data reduction and compression schemes. In our experiments, a birdcall is recorded with two synchronized microphones; the cross correlation between the waveforms of the two channels indicates the TDOA between the microphones. We apply our data reduction/compression to the raw waveforms as in (1) and then decode them into coarse waveforms as in (2). The cross correlation between coarse waveforms indicates almost the same TDOA as that between the corresponding raw waveforms, with an error within one sample interval. Therefore, the data reduction scheme appears to retain most of the time information in raw waveforms. When data reduction, S-coding, and zipping are applied to raw waveforms in order, the overall compression ratio is 69.6 on average.

5.1. Experimental method

The experiments were done in an outdoor environment with traffic and ventilation noise. Temperature, humidity, and wind speed were 55°F, 49%, and 12 mph, respectively. The estimated sound speed was approximately 339.5 m/s, based on the algorithm in [33]. The birdcall was played back from a standard computer speaker driven by a Compaq iPAQ H3760 pocket PC. Sound was recorded with a pair of synchronized microphones connected to a laptop. The sampling rate was 32 kHz and the sample size 16 bits. Both the speaker and the microphones were mounted 6 feet above the ground and in one straight line. The two microphones were separated by approximately 9 feet.

There were two groups of recording experiments. In the first group, the speaker was placed at four different positions, as Figure 3 shows, at the same volume. In the second group, the speaker was set to four different volumes at the same position, S1 in Figure 3.

5.2. Recorded waveforms

Figure 4 shows the waveforms recorded in the first group of experiments, and Figure 5 those recorded in the second group. S1 and V1 denote the same recording experiment; it appears in both groups for comparison.

5.3. Validity of data reduction/compression

We applied data reduction/compression to the recorded waveforms and then restored coarse waveforms from the encoding. TDOA was computed using the cross correlation between the two coarse waveforms. For comparison, we also computed TDOA using the cross correlation of the raw waveforms. The TDOAs between the L and R channels are listed in Table 2 in units of sample intervals (1/32000 s).

Figure 3: Microphone and speaker positions (coordinates in feet). Microphones are located at the triangles and speakers at the circles. L and R are the left and right channels of the synchronized microphone pair. S1, S2, S3, and S4 are the four speaker positions.

The TDOA computed from raw waveforms is 261 sample intervals. Given the 32 kHz sampling rate and the sound speed estimate of 339.5 m/s, this TDOA corresponds to 261/32000 × 339.5 m/s = 2.769 m, which is consistent with the distance between the two microphones. The TDOAs computed from coarse waveforms are within ±1 sample interval of the TDOA indicated by the raw waveforms. Our data reduction essentially keeps all positions of zero crossings in the recorded raw waveform; because the resolution of a zero-crossing position is one sample interval, an error of ±1 sample interval in the TDOA indicated by coarse waveforms is to be expected. Therefore, our data reduction appears to retain almost all time information in the raw waveforms. Figure 6 shows the cross correlation between the L and R coarse waveforms of S1.
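This check can be reproduced end to end on synthetic data with the earlier sketches; the chirp below merely stands in for a birdcall, and the 261-sample offset mirrors the measured TDOA:

    import numpy as np

    def tdoa_samples(x, y):
        # peak lag of the full cross correlation (see the sketch in Section 2)
        xc = np.correlate(x, y, mode="full")
        return int(np.argmax(xc)) - (len(y) - 1)

    rng = np.random.default_rng(0)
    t = np.arange(16000) / 32000.0                        # 0.5 s at 32 kHz
    burst = np.sin(2 * np.pi * (2000 * t + 4000 * t**2))  # 2-6 kHz chirp
    burst *= np.exp(-((t - 0.25) ** 2) / 0.005)           # confine to mid-record
    left = burst + 0.01 * rng.standard_normal(t.size)
    right = np.roll(burst, 261) + 0.01 * rng.standard_normal(t.size)

    raw_lag = tdoa_samples(left, right)
    coarse_lag = tdoa_samples(np.where(left >= 0, 1.0, -1.0),
                              np.where(right >= 0, 1.0, -1.0))
    assert abs(coarse_lag - raw_lag) <= 1   # within one sample interval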

5.4. Efficiency of data reduction/compression

Table 3 shows the data sizes of the waveforms and of their reduced/S-coded/zipped formats. The data size of each raw waveform is 16,000 × 16 = 256,000 bits. Data reduction turns a raw waveform as in (1) into a coarse waveform as in (2). A coarse waveform is completely represented by the sign and starting time of its first segment plus the SSL. Because the SSL takes more than 99% of the space of the coarse waveform representation, we do not differentiate between the SSL and the coarse waveform representation for the purpose of compression-ratio analysis. No segment has more than 65,535 samples, so each segment length can be represented by a 16-bit natural number in the SSL. Reduction efficiency is the ratio of the raw waveform size to the SSL size; the average reduction efficiency is about 11.4.

S-coding encodes the SSL into a compact format. Base-16 S-coding is chosen because most segment lengths are between 8 and 16; a typical probability distribution of segment lengths is shown in Figure 7. The efficiency of S-coding is the ratio of the SSL size to the size of the S-coded SSL; the average S-coding efficiency is about 3.3.



Figure 4: Waveforms recorded by a pair of synchronized microphones. Microphone and speaker positions are shown in Figure 3. In these four recording experiments (S1 to S4), the speaker volume is the same while the distances from the speaker to the microphone pair differ.

Figure 5: Waveforms recorded by a pair of synchronized microphones. The microphone geometry is shown in Figure 3, with the speaker located at S1. In these four recording experiments, the speaker volumes decrease from V1 to V4 while the distance from the speaker to the microphone pair is the same.

To compare the performance of S-coding with that of general-purpose compression algorithms, we also compressed the SSL with WinZip 8.0. Zipping efficiency is the ratio of the SSL size to the size of the zipped SSL; the average zipping efficiency is about 2.7.

We also examined the efficiency of S-coding followed by zipping, that is, the ratio of the SSL size to the size of the zipped S-coded SSL. The average efficiency of the concatenation of S-coding and zipping is about 6.1, significantly larger than that of

Table 2: TDOA indicated by cross correlation of raw waveforms and of coarse waveforms.

    Record    TDOA from raw waveforms    TDOA from coarse waveforms
              (sample intervals)         (sample intervals)

    S1/V1     261                        261
    S2        261                        261
    S3        261                        261
    S4        261                        261
    V2        261                        262
    V3        261                        260
    V4        261                        260

Figure 6: The cross-correlation coefficient between the coarse waveforms of S1 indicates a TDOA of 262 sample intervals. The dashed line represents a TDOA of 0; the star marks the peak of the cross-correlation coefficient, which is offset 262 sample intervals from the dashed line.

S-coding or zipping applied individually. This indicates that S-coding and zipping are somewhat orthogonal: they exploit different redundancies in the SSL. It is therefore possible to design a more sophisticated compression algorithm that combines the strengths of both. However, S-coding is quite simple and well suited to low-end micronodes such as motes. When the sensor nodes have enough processing capability to run a more sophisticated compression algorithm than S-coding, we may simply apply S-coding followed by zipping.

When data reduction, S-coding, and zipping are applied in order, the ratio of the raw waveform size to the size of the zipped S-coded SSL is 69.6, much larger than the ratios achieved by existing data compression schemes for audio data.

6. RELATED WORK AND DISCUSSION

Pottie [25, 34] pointed out that subnetworks should be formed in a large wireless sensor network.



Table 3: Data size of reduced/S-coded/zipped waveforms (all raw waveforms have 256,000 bits).

    Record       SSL (bits)   Zipped SSL (bits)   S-coded SSL (bits)   Zipped S-coded SSL (bits)

    S1/V1 (L)    24,048       7,544               6,944                3,064
    S1/V1 (R)    23,120       8,664               6,784                3,832
    S2 (L)       23,696       7,992               6,928                3,200
    S2 (R)       23,088       8,440               6,832                2,032
    S3 (L)       23,040       8,424               6,780                3,464
    S3 (R)       22,400       8,616               6,628                3,832
    S4 (L)       21,472       8,816               6,852                4,560
    S4 (R)       22,016       8,984               6,900                4,008
    V2 (L)       23,728       7,048               6,904                2,784
    V2 (R)       23,968       8,168               7,020                3,904
    V3 (L)       23,248       7,664               6,872                3,192
    V3 (R)       21,618       8,728               6,424                3,328
    V4 (L)       21,472       8,536               6,816                3,632
    V4 (R)       17,504       8,296               5,804                4,648

Figure 7: Distribution of segment lengths for S1 (number of segments versus segment length in samples). Because most segment lengths are between 8 (2^3) and 16 (2^4), base-16 S-coding has the maximum compression gain.

The subnetwork organization enables coordinated internal communication by a master so that some internal nodes can be powered down. Many possible trade-offs related to the architecture of wireless sensor networks were also discussed extensively in [34]. Pottie concluded that the high cost of wireless communication relative to data processing leads to a trade-off regime different from that of traditional ad hoc wireless networks. The trade-off between homogeneous and heterogeneous nodes was briefly discussed. However, there was no detailed discussion of task decomposition and collaboration in a tiered architecture, especially of preprocessing at micronodes.

Van Dyck and Miller [35] proposed a cluster-based architecture for sensor networks motivated by the performance of

distributed detection algorithms. However, there is a significant difference between their focus and ours. They focus on the scenario of distributed sensing and detection, where binary decisions are made at local sensing nodes and there is no need to transmit raw signals. We focus on coherent signal processing scenarios, which place much higher demands on bandwidth than distributed detection. We choose the hierarchical organization of sensor networks in order to reduce wireless communication, and thus energy consumption, by distributing signal processing to local micronodes and clusters. For coherent signal processing, either the raw signal or its reduced format must be collected at a central node for information fusion. We propose a data reduction scheme at micronodes for acoustic signals; no such scheme is needed in the distributed detection scenario of [35].

Tiered sensor network hardware platforms were proposed by Cerpa et al. [18] for habitat monitoring applications. They pointed out that larger, faster, and more expensive hardware can be used effectively together with small form-factor nodes, because the latter can be densely distributed. However, the software architecture and the task decomposition and collaboration mechanisms for in-network signal processing were not addressed for the tiered architecture in [18].

Mainwaring et al. [36] also describe a tiered sensor network for habitat monitoring on Great Duck Island (GDI). Their application monitors environmental conditions such as light, temperature, barometric pressure, humidity, and infrared. They use a tiered architecture solely for communication. The lowest level consists of sensor nodes deployed in dense patches that may be widely separated. In each sensor patch, a gateway node transmits data from the patch to a base station that serves the collection of patches, and the base station transmits all data to a central database through the Internet.



In contrast, we propose a tiered architecture for the purpose of collaborative signal and information processing inside the sensor network. We deploy a hierarchy of nodes to accommodate demanding data processing tasks that cannot be handled by the smaller sensor nodes. The GDI system does not require collaborative data processing inside the sensor network: all data is transmitted back to a central database for off-line data mining and analysis, which is feasible at their relatively low sampling rates. In our application context, however, transmitting all the data back is infeasible because of the higher sampling rate. For a network of 1000 sensor nodes sampling acoustic signals at 20 kHz with a sample size of 16 bits, the data generation rate is 320 Mbps, which is beyond existing wireless network technology on nodes of small form factor and constrained energy. We propose in-network processing of birdcalls to generate high-level descriptions such as birdcall type, calling time, and location; the high-level description, being much smaller, can then be transmitted back for further analysis by biologists. In summary, the Mainwaring et al. system and the birdcall recognition and localization system described here are largely complementary.

7. CONCLUSION

Minimization of communication is a principal goal of task decomposition and collaboration in tiered sensor networks because of energy constraints. We describe local event filtering and data reduction as two types of preprocessing at micronodes that significantly reduce data transmission to macronodes. This paper presents preliminary experimental evidence that both data reduction and event filtering using the cross-zero rate are valid and effective. Future work must include the construction and evaluation of a complete system.

ACKNOWLEDGMENTS

The authors wish to acknowledge inspiring personal communication with Dr. Ralph Hudson and Dr. Kung Yao. This work is sponsored by the NSF CENS.

REFERENCES

[1] I. F. Akyildiz, W. Su, Y. Sankarasubramaniam, and E. Cayirci, "Wireless sensor networks: a survey," Computer Networks, vol. 38, no. 4, pp. 393–422, 2002.

[2] Y. Xu, J. Heidemann, and D. Estrin, "Geography-informed energy conservation for ad hoc routing," in Proc. 7th Annual ACM/IEEE International Conference on Mobile Computing and Networking (MobiCom '01), pp. 70–84, Rome, Italy, July 2001.

[3] W. Ye, J. Heidemann, and D. Estrin, "An energy-efficient MAC protocol for wireless sensor networks," Tech. Rep. ISI-TR-543, USC/Information Sciences Institute, University of Southern California, Los Angeles, Calif, USA, September 2001.

[4] J. M. Rabaey, M. J. Ammer, J. L. Da Silva Jr., D. Patel, and S. Roundy, "PicoRadio supports ad hoc ultra-low power wireless networking," IEEE Computer Magazine, vol. 33, no. 7, pp. 42–48, 2000.

[5] A. Cerpa and D. Estrin, "ASCENT: adaptive self-configuring sensor network topologies," in Proc. 21st Annual Joint Conference of the IEEE Computer and Communications Societies (INFOCOM '02), New York, NY, USA, June 2002.

[6] N. Bulusu, D. Estrin, L. Girod, and J. Heidemann, "Scalable coordination for wireless sensor networks: self-configuring localization systems," in Proc. 6th International Symposium on Communication Theory and Applications (ISCTA '01), Ambleside, Lake District, UK, July 2001.

[7] L. Girod and D. Estrin, "Robust range estimation using acoustic and multimodal sensing," in Proc. IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2001), Maui, Hawaii, USA, October 2001.

[8] N. B. Priyantha, A. Chakraborty, and H. Balakrishnan, "The Cricket location-support system," in Proc. 6th Annual ACM/IEEE International Conference on Mobile Computing and Networking (MobiCom '00), pp. 32–43, Boston, Mass, USA, August 2000.

[9] N. B. Priyantha, A. Miu, H. Balakrishnan, and S. Teller, "The Cricket compass for context-aware mobile applications," in Proc. 7th Annual ACM/IEEE International Conference on Mobile Computing and Networking (MobiCom '01), pp. 1–14, Rome, Italy, July 2001.

[10] L. Girod, V. Bychkovskiy, J. Elson, and D. Estrin, "Locating tiny sensors in time and space: a case study," in Proc. IEEE International Conference on Computer Design, Freiburg, Germany, September 2002.

[11] A. Savvides, C.-C. Han, and M. B. Srivastava, "Dynamic fine-grained localization in ad-hoc networks of sensors," in Proc. 7th Annual ACM/IEEE International Conference on Mobile Computing and Networking (MobiCom '01), pp. 166–179, Rome, Italy, July 2001.

[12] J. Elson and D. Estrin, "Time synchronization for wireless sensor networks," in Proc. 2001 International Parallel and Distributed Processing Symposium (IPDPS), Workshop on Parallel and Distributed Computing Issues in Wireless and Mobile Computing, p. 186, San Francisco, Calif, USA, April 2001.

[13] J. Elson, L. Girod, and D. Estrin, "Fine-grained network time synchronization using reference broadcasts," in Proc. 5th Symposium on Operating Systems Design and Implementation (OSDI 2002), Boston, Mass, USA, December 2002.

[14] C. Intanagonwiwat, R. Govindan, and D. Estrin, "Directed diffusion: a scalable and robust communication paradigm for sensor networks," in Proc. 6th Annual ACM/IEEE International Conference on Mobile Computing and Networking (MobiCom '00), pp. 56–67, Boston, Mass, USA, August 2000.

[15] M. Chu, H. Haussecker, and F. Zhao, "Scalable information-driven sensor querying and routing for ad hoc heterogeneous sensor networks," International Journal on High Performance Computing Applications, vol. 16, no. 3, pp. 293–314, 2002.

[16] W. Heinzelman, J. Kulik, and H. Balakrishnan, "Adaptive protocols for information dissemination in wireless sensor networks," in Proc. 5th Annual ACM/IEEE International Conference on Mobile Computing and Networking (MobiCom '99), pp. 174–185, Seattle, Wash, USA, August 1999.

[17] P. Bonnet, J. E. Gehrke, and P. Seshadri, "Querying the physical world," IEEE Personal Communications, vol. 7, no. 5, pp. 10–15, 2000, Special Issue on Smart Spaces and Environments.

[18] A. Cerpa, J. Elson, D. Estrin, L. Girod, M. Hamilton, and J. Zhao, "Habitat monitoring: application driver for wireless communications technology," in Proc. ACM SIGCOMM Workshop on Data Communications in Latin America and the Caribbean, Costa Rica, April 2001.

[19] B. Warneke, M. Last, B. Liebowitz, and K. S. J. Pister, "Smart Dust: communicating with a cubic-millimeter computer," IEEE Computer Magazine, vol. 34, no. 1, pp. 44–51, 2001.



[20] K. Sohrabi, W. Merrill, J. Elson, L. Girod, F. Newberg, and W. Kaiser, "Scaleable self-assembly for ad hoc wireless sensor networks," in Proc. IEEE CAS Workshop on Wireless Communications and Networking, Pasadena, Calif, USA, September 2002.

[21] R. Viswanathan and P. K. Varshney, "Distributed detection with multiple sensors: Part I, fundamentals," Proceedings of the IEEE, vol. 85, no. 1, pp. 54–63, 1997.

[22] C. W. Reed, R. Hudson, and K. Yao, "Direct joint source localization and propagation speed estimation," in Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing, vol. 3, pp. 1169–1172, Phoenix, Ariz, USA, March 1999.

[23] T. L. Tung, K. Yao, C. W. Reed, R. E. Hudson, D. Chen, and J. C. Chen, "Source localization and time delay estimation using constrained least squares and best path smoothing," in Advanced Signal Processing Algorithms, Architectures, and Implementations IX, vol. 3807 of SPIE Proceedings, pp. 220–233, Los Angeles, Calif, USA, July 1999.

[24] H. Wang, L. Yip, D. Maniezzo, et al., "A wireless time-synchronized COTS sensor platform: applications to beamforming," in Proc. IEEE CAS Workshop on Wireless Communications and Networking, Pasadena, Calif, USA, September 2002.

[25] G. J. Pottie and W. J. Kaiser, "Wireless integrated network sensors," Communications of the ACM, vol. 43, no. 5, pp. 51–58, 2000.

[26] R. N. McDonough and A. D. Whalen, Detection of Signals in Noise, Academic Press, Orlando, Fla, USA, 1995.

[27] S. W. Golomb, "Run-length encodings," IEEE Transactions on Information Theory, vol. 12, no. 3, pp. 399–401, 1966.

[28] V. E. Levenstein, "On the redundancy and delay of separable codes for the natural numbers," Problems of Cybernetics, vol. 20, pp. 173–179, 1968.

[29] P. Elias, "Universal codeword sets and representations of the integers," IEEE Transactions on Information Theory, vol. IT-21, no. 2, pp. 194–203, 1975.

[30] R. F. Rice, Some Practical Universal Noiseless Coding Techniques, vol. 79-22 of JPL Publication, Jet Propulsion Laboratory, Pasadena, Calif, USA, 1979.

[31] E. R. Fiala and D. H. Greene, "Data compression with finite windows," Communications of the ACM, vol. 32, no. 4, pp. 490–505, 1989.

[32] P. Fenwick, "Punctured Elias codes for variable-length coding of the integers," Tech. Rep. 137, Department of Computer Science, The University of Auckland, Auckland, New Zealand, December 1996.

[33] O. Cramer, "The variation of the specific heat ratio and the speed of sound in air with temperature, pressure, humidity, and CO2 concentration," Journal of the Acoustical Society of America, vol. 93, pp. 2510–2516, May 1993.

[34] G. J. Pottie, "Wireless sensor networks," in Proc. IEEE Information Theory Workshop, pp. 139–140, Killarney, Ireland, June 1998.

[35] R. E. Van Dyck and L. E. Miller, "Distributed sensor processing over an ad hoc wireless network: simulation framework and performance criteria," in Proc. MILCOM, Washington, DC, USA, October 2001.

[36] A. Mainwaring, J. Polastre, R. Szewczyk, D. Culler, and J. Anderson, "Wireless sensor networks for habitat monitoring," in Proc. 1st ACM International Workshop on Wireless Sensor Networks and Applications (WSNA 2002), Atlanta, Ga, USA, September 2002.

Hanbiao Wang is a third-year Ph.D. student in computer science at UCLA. He is currently working on collaborative information and signal processing in sensor networks, and is particularly interested in designing energy- and bandwidth-efficient sensor networks by intertwining the tasks of networking and information processing. He received his B.S. degree in geophysics from the University of Science and Technology of China, and an M.S. degree in geophysics and space physics and an M.S. degree in computer science from UCLA. He is a member of the ACM and the IEEE.

Deborah Estrin is a Professor of computer science at UCLA and Director of the Center for Embedded Networked Sensing (CENS), a newly awarded National Science Foundation Science and Technology Center. She received her Ph.D. degree in computer science from MIT (1985) and was on the Computer Science faculty at USC from 1986 until mid-2000, where she received the National Science Foundation Presidential Young Investigator Award for her research in network interconnection and security (1987). During the subsequent 10 years, her research focused on the design of network and routing protocols for very large global networks. Estrin has been instrumental in defining the national research agenda for wireless sensor networks, first chairing a 1998 DARPA ISAT study and then a 2001 NRC study; the latter culminated in the NRC publication Embedded Everywhere: A Research Agenda for Networked Systems of Embedded Computers. Estrin's research group develops algorithms and systems to support rapidly deployable and robustly operating networks of many thousands of physically embedded devices. She is particularly interested in applications to environmental monitoring. Estrin has served on numerous program committees and editorial boards, including SIGCOMM, MobiCom, SOSP, and ACM/IEEE Transactions on Networking. She is a Fellow of the ACM and AAAS.

Lewis Girod received his B.S. and M.E. degrees in computer science from MIT in 1995. After working at LCS for two years in the area of Internet naming infrastructure, he joined Deborah Estrin's group as a Ph.D. student in 1998. He is currently a Ph.D. candidate at UCLA. His research focus is the development of robust networked sensor systems, specifically physical localization systems that use multiple sensor modalities to operate independently of environment and deployment.