
Clock skew and offset estimation from relative measurements in mobile networks with Markovian switching topology [Technical Report]

Chenda Liao, Prabir Barooah

Department of Mechanical and Aerospace Engineering, University of Florida, Gainesville, FL 32611, USA

Abstract

We propose a distributed algorithm for estimation of clock skew and offset of the nodes of a mobile network. The problem is cast as an estimation from noisy relative measurements. The time variation of the network is modeled as a Markov chain. The estimates are shown to be mean square convergent under fairly weak assumptions on the Markov chain, as long as the union of the graphs is connected. Expressions for the asymptotic mean and correlation are also provided. The Markovian switching topology model of mobile networks is justified for certain node mobility models through empirically estimated conditional entropy measures.

Keywords: sensor networks, mobile networks, time synchronization, distributed estimation

1. Introduction

We consider the problem of estimation of variables in a network of mobile nodes in which pairs of communicating nodes can obtain a noisy measurement of the difference between the variables associated with them. Specifically, suppose the $u$-th node of a network has an associated node variable $x_u \in \mathbb{R}$. If nodes $u$ and $v$ are neighbors at

This work has been supported by the National Science Foundation through Grants CNS-0931885 and ECCS-0955023. Author email addresses: cdliao,[email protected].

Preprint submitted to Elsevier October 1, 2011

discrete time index $k$, then they can obtain a measurement $\zeta_{u,v}(k)$, where

$$\zeta_{u,v}(k) = x_u - x_v + \varepsilon_{u,v}(k). \tag{1}$$

Here $\varepsilon_{u,v}(k)$ denotes the measurement noise. The problem is for each node to estimate its node variable from the relative measurements it collects over time, without requiring any centralized information processing or coordination. We assume that at least one node knows its variable; otherwise the problem is indeterminate up to a constant. A node that knows its node variable is called a reference node. All nodes are allowed to be mobile, so that their neighbors may change with time.

The problem of time synchronization (also called clock synchronization) through clock skew and offset estimation falls into this category, and provides the main motivation for the study. The relationship between the local clock time $\tau_u(t)$ of node $u$ and the global time $t$ is usually modeled as

$$\tau_u(t) = \alpha_u t + \beta_u, \tag{2}$$

where the scalars $\alpha_u$, $\beta_u$ are called its skew and offset, respectively. A node can determine the global time $t$ from its local clock time by using the relationship $t = (\tau_u(t) - \beta_u)/\alpha_u$ as long as it knows the skew and offset of its local clock. Hence the problem of clock synchronization in a network can be alternatively posed as the problem of nodes estimating their skews and offsets. It is not possible for a node to measure its skew and offset directly. However, it is possible for a pair of neighbors to measure the difference between their offsets and between the logarithms of their skews by exchanging a number of time-stamped messages. Existing protocols to perform so-called pairwise synchronization, such as [1, 2, 3, 4], can be used to obtain such relative measurements. The details are described in Section 2. The problem of clock offset and skew estimation can therefore be cast as a special case of the estimation from relative measurements described above. If an algorithm is available to solve the scalar node variable estimation problem, nodes can execute two copies of this algorithm to estimate both skew and offset. Therefore we only consider the scalar case. In the context of time synchronization, the existence of a reference node means that at least one node has access to the global time $t$. This is the case when either at least one node is equipped with a GPS receiver or one node is elected to be a “root” so that its local clock time is considered the global time to which every other node has to synchronize.
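As a small illustration of the clock model (2), the following Python sketch maps global time to local time and back. The function names `local_time` and `global_time` and all numerical values are illustrative assumptions and do not come from the report.

```python
def local_time(t, alpha, beta):
    """Local clock reading tau_u(t) = alpha_u * t + beta_u, as in (2)."""
    return alpha * t + beta


def global_time(tau, alpha, beta):
    """Invert (2): t = (tau_u(t) - beta_u) / alpha_u."""
    return (tau - beta) / alpha


# Hypothetical clock: runs 50 ppm fast and starts 0.25 s ahead.
alpha_u, beta_u = 1.00005, 0.25
t = 120.0
tau = local_time(t, alpha_u, beta_u)
t_recovered = global_time(tau, alpha_u, beta_u)
```

A node that knows its own skew and offset can therefore recover the global time exactly; the estimation problem arises because these two parameters are not directly measurable.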

Time synchronization in ad-hoc networks, especially in wireless sensor networks, has been a topic of intense study in recent years. The utility of data collected and transmitted by sensor nodes depends directly on the accuracy of the time-stamps. In TDMA based communication schemes, accurate time synchronization is required for the sensors to communicate with other sensors. Operation on a pre-scheduled sleep-wake cycle for energy conservation and lifetime maximization also requires accurate knowledge of the global time. We refer the interested reader to the review papers [5, 6] and to the more recent works [2, 7, 8, 9] for more details on time synchronization.

Most of the time synchronization protocols developed for sensor networks are primarily targeted to networks of static or quasi-static nodes. A common approach is to first elect a root node and construct a spanning tree of the network with the root node being the “level 0” node. Every node thereafter synchronizes itself to a node of lower level (higher up in the hierarchy). This synchronization is done through message passing between the pair of nodes to estimate their relative skew and/or offset. Precise definitions of relative skew and relative offset will be given in Section 2. Examples of such spanning-tree based protocols include the widely used Timing-Sync Protocol for Sensor Networks (TPSN) [10] and Flooding Time Synchronization Protocol (FTSP) [11]. Change in the network topology due to node mobility or node failure requires recomputing the spanning tree and sometimes even re-election of the root node. This adds considerable communication overhead. The situation gets worse if nodes move rapidly.

A number of papers have proposed fully distributed algorithms that do not rely on establishing a hierarchy among the nodes. Distributed protocols for estimation of clock skews and/or offsets from measurements of relative skews and offsets between pairs of nodes have been proposed in [12, 13, 14, 9]. These algorithms were also targeted to networks of static nodes. The analysis of their correctness and performance, including robustness to node and link failures, reported in [13, 15, 16, 9] has also been limited to static networks.

In this paper we focus on distributed estimation of skews and offsets in mobile networks in which the network changes with time due to the motion of the nodes. With some changes, the algorithms proposed in [12, 13, 14, 9] can be adapted to mobile networks. However, little is known about how such an algorithm will perform in a mobile network. We propose such a modification and analyze the convergence of the algorithm. The algorithm we propose can be used by the nodes of a mobile network to estimate their clock skews and offsets and thus perform time synchronization. The network topology is continually changing due to the motion of the nodes, as well as random communication failure. We model the resulting time-varying topology of the network as the state of a Markov chain. Techniques for the analysis of jump linear systems from [17] are used to study convergence of the algorithm. We show that under fairly weak assumptions on the Markov chain, the proposed algorithm is mean square convergent if and only if the union of the graphs that occur is connected. Mean square convergence means that the expected value and the variance of the estimates obtained by each node converge to fixed values that do not depend on the initial conditions. When the relative measurements are unbiased, the limiting mean is the same as the true value of the variable, meaning the estimates obtained are asymptotically unbiased. Formulas for the limiting mean and variance are obtained by utilizing results from jump linear systems.

Another contribution of the paper is to provide justification for the Markovian switching topology for mobile networks. The Markovian switching model has also been used extensively in studying consensus protocols in networks with dynamic topologies [18, 19, 20, 21, 22]. For a network of static nodes with link drops, the Markovian switching model arises naturally from a Markovian link drop model. In mobile networks, though, the only case where we can prove that a mobile network evolves according to a Markov chain is when nodes move according to the so-called random walk mobility model [23]. Although the Markovian switching assumption facilitates analysis, this assumption requires justification. We use a technique from [24] to check whether the graph switching with the so-called Random Waypoint Mobility (RWP) model is Markovian. The RWP model is one of the most widely used mobility models for ad-hoc mobile networks [23]. We show that the resulting graph switching process can indeed be approximated well by a (first order) Markovian switching model.

The error dynamics of the estimation algorithm turns out to be a consensus-type algorithm with leader(s), where the leader states, corresponding to the errors of the reference nodes, are always 0. In Remark 1 in Section 4, we comment on the similarities and differences between the scenario examined here and prior work on consensus with Markovian switching topology in [21, 22].

The rest of the paper is organized as follows. Section 2 describes the connection between the problem of estimation from relative measurements and the problem of skew/offset estimation. Section 3 states the problem precisely, describes the proposed algorithm, and states the main result (Theorem 1). It also discusses the relevance of the Markovian switching topology model. Section 4 is devoted to the proof of the theorem. Simulation studies are presented in Section 5.

2. Relative measurements of skew and offset

We describe here how skew and offset estimation can be posed in the form of the general problem of estimation from relative measurements. A number of methods are available in the literature that allow pairwise synchronization between a node pair from time-stamped messages [1, 3, 25]. Specifically, in pairwise synchronization node $u$ estimates the parameters $\alpha_{u,v}$ and $\beta_{u,v}$ that allow it to determine its local time in terms of the local time at node $v$ at the same instant, where

$$\tau_u(t) = \alpha_{u,v}\,\tau_v(t) + \beta_{u,v}. \tag{3}$$

The parameters $\alpha_{u,v}$ and $\beta_{u,v}$ are referred to as the skew and offset of node $u$ with respect to node $v$ [4], and also as the relative skew and relative offset between nodes $u$ and $v$ [2]. In a mobile network, nodes $u$ and $v$ can perform pairwise synchronization if they are in each other's communication range long enough to exchange a number of time-stamped messages. The minimum number of messages that have to be exchanged is 4 in the protocol proposed in [1]. With more packets exchanged, the accuracy of the estimates improves.

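To understand the relationship between $\alpha_{u,v}$, $\beta_{u,v}$ and $\alpha_u$, $\beta_u$, $\alpha_v$, $\beta_v$, we express the local time $\tau_u(t)$ of node $u$ at global time $t$ in terms of the local time $\tau_v(t)$ at node $v$ at the same time $t$ by using (2):

$$\tau_u(t) = \alpha_u\,\frac{\tau_v(t) - \beta_v}{\alpha_v} + \beta_u = \frac{\alpha_u}{\alpha_v}\,\tau_v(t) + \beta_u - \beta_v\,\frac{\alpha_u}{\alpha_v}.$$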

From this and (3) we obtain the relationship

$$\alpha_{u,v} := \frac{\alpha_u}{\alpha_v}, \qquad \beta_{u,v} := \beta_u - \beta_v\,\frac{\alpha_u}{\alpha_v}. \tag{4}$$
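The relationship (4) can be checked numerically. The sketch below computes the relative skew and offset from hypothetical clock parameters and confirms that they reproduce (3) at an arbitrary global time; the helper name `relative_params` and the numbers are assumptions made for illustration.

```python
def relative_params(alpha_u, beta_u, alpha_v, beta_v):
    """Relative skew and offset of node u with respect to node v, as in (4)."""
    alpha_uv = alpha_u / alpha_v
    beta_uv = beta_u - beta_v * alpha_uv
    return alpha_uv, beta_uv


# Hypothetical clock parameters for two nodes.
alpha_u, beta_u = 1.0002, 0.30
alpha_v, beta_v = 0.9997, -0.10
alpha_uv, beta_uv = relative_params(alpha_u, beta_u, alpha_v, beta_v)

# Local clock readings of both nodes at the same global time t, from (2).
t = 42.0
tau_u = alpha_u * t + beta_u
tau_v = alpha_v * t + beta_v

# Equation (3) predicts tau_u from tau_v; the residual should vanish.
residual = tau_u - (alpha_uv * tau_v + beta_uv)
```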

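Suppose a node $u$ obtains noisy estimates $\hat{\alpha}_{u,v}$, $\hat{\beta}_{u,v}$ of the parameters $\alpha_{u,v}$, $\beta_{u,v}$ by using a pairwise synchronization protocol. We model the noisy estimate of $\alpha_{u,v}$ as

$$\hat{\alpha}_{u,v} = \alpha_{u,v}\exp(\varepsilon^s_{u,v}), \tag{5}$$

where $\varepsilon^s_{u,v}$ is a random variable that captures the error between the estimate and the true value of $\alpha_{u,v}$. If the estimation error is small, then $\varepsilon^s_{u,v}$ is close to 0. Taking the logarithm of both sides of (5) and using (4), we obtain

$$\log\hat{\alpha}_{u,v} = \log\alpha_{u,v} + \varepsilon^s_{u,v} = \log\alpha_u - \log\alpha_v + \varepsilon^s_{u,v}. \tag{6}$$

With the definitions

$$\zeta^s_{u,v} := \log\hat{\alpha}_{u,v}, \qquad x^s_u := \log\alpha_u,$$

we see that (6) can be rewritten as

$$\zeta^s_{u,v} = x^s_u - x^s_v + \varepsilon^s_{u,v}, \tag{7}$$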


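which makes $\zeta^s_{u,v}$ a noisy relative measurement of the node variables $x^s_u$ and $x^s_v$; cf. (1). It is important to notice that $\zeta^s_{u,v}$ is a measured quantity, since $\hat{\alpha}_{u,v}$ is measured, while the variables $x^s_u$, $x^s_v$ are unknown. Similarly, the noisy estimate $\hat{\beta}_{u,v}$ of $\beta_{u,v}$ with random estimation error $e^o_{u,v}$ can be written as

$$\hat{\beta}_{u,v} = \beta_{u,v} + e^o_{u,v} = \beta_u - \frac{\alpha_u}{\alpha_v}\,\beta_v + e^o_{u,v} = \beta_u - \beta_v + \varepsilon^o_{u,v}, \tag{8}$$

where

$$\varepsilon^o_{u,v} := \beta_v\left(1 - \frac{\alpha_u}{\alpha_v}\right) + e^o_{u,v}.$$

With the definitions

$$\zeta^o_{u,v} := \hat{\beta}_{u,v}, \qquad x^o_u := \beta_u,$$

we see that (8) can be rewritten as

$$\zeta^o_{u,v} = x^o_u - x^o_v + \varepsilon^o_{u,v}, \tag{9}$$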

which again falls under the category of relative measurements of node variables, i.e., has the form (1). In this case the node variables are the clock offsets $\beta_u$. The noise $\varepsilon^o_{u,v}$ in the offset measurement is in general biased even if the estimate of the parameter $\beta_{u,v}$ is unbiased, because it contains the deterministic term $\beta_v(1 - \alpha_u/\alpha_v)$.

This discussion shows that the problem of estimating the skews and offsets of all the clocks in a network can be posed as an estimation from relative measurements problem, where relative measurements are of the form (1). Once node $u$ obtains estimates $\hat{x}^s_u$ and $\hat{x}^o_u$ of its node variables $x^s_u$ and $x^o_u$, estimates of its skew and offset can be obtained as $\hat{\alpha}_u := \exp(\hat{x}^s_u)$ and $\hat{\beta}_u := \hat{x}^o_u$.
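The conversion from pairwise synchronization outputs to relative measurements can be sketched as follows. The example uses noise-free pairwise estimates so that the skew measurement is exact while the offset measurement still carries the deterministic bias term; the function names and numerical values are hypothetical.

```python
import math


def skew_measurement(alpha_hat_uv):
    """zeta^s_{u,v} := log(alpha_hat_{u,v}), a measurement of x^s_u - x^s_v."""
    return math.log(alpha_hat_uv)


def offset_measurement(beta_hat_uv):
    """zeta^o_{u,v} := beta_hat_{u,v}, a measurement of x^o_u - x^o_v."""
    return beta_hat_uv


def offset_noise(alpha_u, alpha_v, beta_v, e_o):
    """eps^o_{u,v} = beta_v * (1 - alpha_u / alpha_v) + e^o_{u,v}."""
    return beta_v * (1.0 - alpha_u / alpha_v) + e_o


# Hypothetical true clock parameters.
alpha_u, beta_u = 1.0002, 0.30
alpha_v, beta_v = 0.9997, -0.10

# Noise-free pairwise estimates (eps^s = 0 and e^o = 0), from (4).
alpha_hat_uv = alpha_u / alpha_v
beta_hat_uv = beta_u - beta_v * alpha_u / alpha_v

zeta_s = skew_measurement(alpha_hat_uv)
zeta_o = offset_measurement(beta_hat_uv)
bias = offset_noise(alpha_u, alpha_v, beta_v, e_o=0.0)

# Skew and offset estimates recovered from node-variable estimates.
x_s_hat, x_o_hat = math.log(alpha_u), beta_u   # pretend estimates are exact
alpha_u_hat, beta_u_hat = math.exp(x_s_hat), x_o_hat
```

Even with a perfect pairwise estimate, `bias` is non-zero whenever the two skews differ and the offset of node $v$ is non-zero, which is the bias noted in the text.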

3. Problem statement, proposed algorithm and results

Since both relative offset and skew ratio measurements are special cases of noisy relative measurements of scalar variables, we only consider the problem of estimating the scalar node variables $x_u$, $u = 1, \dots, n$, where $n$ is the number of nodes in the network that do not know their node variables. We assume that there are $n_r$ additional nodes that know their node variables; $n_r$ has to be at least one to make the problem well-posed. The total number of sensor nodes in the network is therefore $n_{\mathrm{total}} = n + n_r$. These define a node set $\mathcal{V} = \{1, \dots, n_{\mathrm{total}}\}$. Time is measured by a discrete time index $k = 0, 1, \dots$. The mobile nodes define a time-varying undirected measurement graph $\mathcal{G}(k) = (\mathcal{V}, \mathcal{E}(k))$, where $(u, v) \in \mathcal{E}(k)$ if and only if $u$ and $v$ can obtain a relative measurement of the form (1) during the time interval between the time indices $k$ and $k+1$. Specifically, for each $(u, v) \in \mathcal{E}(k)$, there is a measurement $\zeta_{u,v}(k) = x_u - x_v + \varepsilon_{u,v}(k)$ that is available to both $u$ and $v$ at time $k$. In practice, such a measurement is usually obtained by one of the two nodes, though the other node takes part in the process by returning time-stamped messages in response to messages sent by the first node. We assume that if $u$ is the first (between $u$ and $v$) to get all the time stamps necessary to obtain a measurement of $x_u - x_v$, it computes the measurement $\zeta_{u,v}$ and then sends this measurement to $v$ so that $v$ also has access to the same measurement. We follow the convention that the relative measurement between $u$ and $v$ that is used by node $u$ is always of $x_u - x_v$ while that used by $v$ is always of $x_v - x_u$. Since the same measurement is shared by a pair of neighboring nodes, if $v$ receives the measurement $\zeta_{u,v}$ from $u$, then it converts the measurement to $\zeta_{v,u}$ by assigning $\zeta_{v,u}(k) := -\zeta_{u,v}(k)$.

The set of neighbors of $u$ at $k$, denoted by $\mathcal{N}_u(k)$, is the set of nodes with which $u$ has an edge in the measurement graph $\mathcal{G}(k)$. Since multiple packet transmissions have to occur to obtain a measurement, if $(u, v) \in \mathcal{E}(k)$, then $u$ and $v$ can also communicate and exchange information at time $k$. Therefore, if one prefers to think of a communication graph, it is the same as the measurement graph.

The problem is to estimate the node variables $x_u$ for $u = 1, \dots, n$ by using the relative measurements $\zeta_{u,v}(k)$, $(u, v) \in \mathcal{E}(k)$, that become available over time $k = 0, 1, \dots$. In addition, the algorithm has to be distributed in the sense that each node has to estimate its own variable, and at every time $k$, a node $u$ can only exchange information with its neighbors $\mathcal{N}_u(k)$.

3.1. Distributed estimation from relative measurement

In the proposed algorithm, each node $u$ maintains in its local memory an estimate $\hat{x}_u(k) \in \mathbb{R}$ of its node variable $x_u$. Every node, except the reference nodes, iteratively updates its estimate as described next. The estimates can be initialized to arbitrary values. In executing the algorithm at iteration $k$, node $u$ communicates with its current neighbors to obtain their current estimates $\hat{x}_v(k)$, $v \in \mathcal{N}_u(k)$. It also collects the measurements $\zeta_{u,v}(k)$ for each $v \in \mathcal{N}_u(k)$ during this time. Since obtaining measurements requires exchanging time-stamped messages, the current estimates can be easily exchanged during the process of obtaining new measurements. Node $u$ then updates its estimate according to

$$\hat{x}_u(k+1) = \frac{w_{uu}(k)\,\hat{x}_u(k) + \sum_{v \in \mathcal{N}_u(k)} w_{vu}(k)\left(\hat{x}_v(k) + \zeta_{u,v}(k)\right)}{w_{uu}(k) + \sum_{v \in \mathcal{N}_u(k)} w_{vu}(k)}, \tag{10}$$


where the weights $w_{vu}(k)$ are arbitrary positive numbers. Nodes continue this iterative update indefinitely unless they see little change in their local estimates, at which point they can stop updating. If $u$ is a reference node then $\hat{x}_u(k) = x_u$ for all $k$. The update law is well-defined even at times when $u$ has no neighbors, in which case it reduces to $\hat{x}_u(k+1) = \hat{x}_u(k)$.

The rationale behind the update law (10) is that

$$\hat{x}_v(k) + \zeta_{u,v}(k) = x_u + (\hat{x}_v(k) - x_v) + \varepsilon_{u,v}(k), \tag{11}$$

so that each term inside the summation on the right hand side of (10) is an estimate of $x_u$, with $(\hat{x}_v(k) - x_v) + \varepsilon_{u,v}(k)$ being the estimation error. The right hand side of (10) is therefore a weighted average of multiple noisy estimates of $x_u$. If the measurement noise $\varepsilon_{u,v}(k)$ is zero-mean and the initial estimates are unbiased, i.e., $\mathbb{E}[\hat{x}_u(0)] = x_u$ for each $u$, then the right hand side of (11) is an unbiased estimate of $x_u$ for all $k$.
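A single step of the update law (10) for one non-reference node can be sketched in a few lines of Python. The function name `update_estimate` and the tuple layout of `neighbor_data` are assumptions made for this illustration; they are not prescribed by the report.

```python
def update_estimate(x_hat_u, w_uu, neighbor_data):
    """One step of update law (10) for a non-reference node u.

    x_hat_u       : current estimate of node u
    w_uu          : positive self-weight w_uu(k)
    neighbor_data : list of (w_vu, x_hat_v, zeta_uv) tuples, one per
                    current neighbor v; zeta_uv measures x_u - x_v
    """
    numerator = w_uu * x_hat_u
    denominator = w_uu
    for w_vu, x_hat_v, zeta_uv in neighbor_data:
        # x_hat_v + zeta_uv is neighbor v's implied estimate of x_u, cf. (11).
        numerator += w_vu * (x_hat_v + zeta_uv)
        denominator += w_vu
    return numerator / denominator


# With no neighbors the estimate is left unchanged.
unchanged = update_estimate(5.0, 1.0, [])

# One neighbor (a reference node with value 0) and a noise-free
# measurement zeta = 2.0 of x_u - x_v.
averaged = update_estimate(5.0, 1.0, [(1.0, 0.0, 2.0)])

# Scaling all weights by the same factor does not change the result.
rescaled = update_estimate(5.0, 10.0, [(10.0, 0.0, 2.0)])
```

The last call illustrates that only the ratios among the weights affect the update.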

Each node $u$ is allowed to vary its local weights $w_{u,v}(k)$ with time and use distinct weights for distinct neighbors to account for the heterogeneity in measurement quality. Between two neighbors $p$, $q$ of $u$ at time $k$, the relative measurement $\zeta_{u,p}(k)$ between $u$ and $p$ may have lower measurement error than the relative measurement $\zeta_{u,q}(k)$ between $u$ and $q$. This occurs, for example, if $u$ and $p$ were able to exchange more time-stamped messages than $u$ and $q$ before computing the relative measurements [3, 4]. In that case, node $u$ should choose its local weights at $k$ so that $w_{u,p}(k) > w_{u,q}(k)$. Due to the denominator in (10), it is only the ratios among the weights that matter, not their absolute values.

The update law (10) has similarities to the update laws in [9, 12, 13, 14] that were proposed for static networks. The differences are the robustness of the update law to the case when there are no neighbors, and the inclusion of time-varying and heterogeneous weights.

3.1.1. Asynchronous implementation

The description so far is in terms of a common global iteration index $k$. In practice, nodes do not have access to such a global index. Instead, each node $u$ keeps a local iteration index $k_u$. After every increment of the local index, the node tries to collect a new set of relative measurements with respect to one or more of its neighbors within a pre-specified time interval. At the end of the time interval, whether it is able to get new measurements or not, it updates its estimate according to the update law (10), with $k$ replaced by $k_u$, and increments its local iteration counter. The process then repeats. It follows from (10) that if a node is unable to gather new measurements from any neighbors, then its updated estimate is precisely the previous estimate.


The global iteration index is useful to describe the algorithm from the point of view of an omniscient spectator. Let $T$ be the time interval, say, in seconds, between two successive increments of the global index $k$. The parameter $T$ is arbitrary, as long as it is small enough so that no node updates its local estimate more than once within the time interval $T$. In that case, exactly one of two events is possible for an arbitrary node $u$ at the end of the time interval when the global counter is increased from $k$ to $k+1$: (i) $u$ increases its local index by one, or (ii) $u$ does not increase its local index. If a node increases its local index, both the local and global indices increase by one. A node does not increase its local iteration index if it is not able to gather new measurements. In the omniscient spectator's view, the node's neighbor set is empty at this time index; so according to (10), the next estimate of the node's variable is the same as the previous one. Thus, a node's local asynchronous state update can be described in terms of the synchronous algorithm (10), the latter being more convenient for the purpose of exposition. In the remainder of the paper we consider only the synchronous version.

3.2. Convergence with Markovian switching

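In this paper we model the sequence of measurement/communication graphs $\{\mathcal{G}(k)\}_{k=0}^{\infty}$ that appear as time progresses as the realization of a (first order) Markov chain, whose state space $\mathbb{G} = \{\mathcal{G}_1, \dots, \mathcal{G}_N\}$ is the set of graphs that can occur over time. In general, a stochastic process $X(k)$ is called $\ell$-th order Markov if $\mathbb{P}(X(k+1) \mid X(k), X(k-1), \dots) = \mathbb{P}(X(k+1) \mid X(k), X(k-1), \dots, X(k-\ell+1))$, where $\mathbb{P}(\cdot)$ denotes probability. The Markovian switching assumption on the graphs means that $\mathbb{P}(\mathcal{G}(k+1) = \mathcal{G}_i \mid \mathcal{G}(k) = \mathcal{G}_j) = \mathbb{P}(\mathcal{G}(k+1) = \mathcal{G}_i \mid \mathcal{G}(k) = \mathcal{G}_j, \mathcal{G}(k-1) = \mathcal{G}_\ell, \dots, \mathcal{G}(0) = \mathcal{G}_p)$, where $\mathcal{G}_i, \mathcal{G}_j, \mathcal{G}_\ell, \dots, \mathcal{G}_p \in \mathbb{G}$.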

We assume that the Markov chain is homogeneous, and denote the transition probability matrix of the chain by $P$, with $p_{ij}$ being the $(i, j)$-th entry of $P$. Further discussion on Markov modeling of graphs is postponed until Section 3.3.

The main result of the paper, on the mean square convergence of the algorithm (10), is stated below as a theorem. In the statement of the theorem, $e(k) := \hat{x}(k) - x$ is the estimation error, where $x := [x_1, \dots, x_n]^T$ and $\hat{x}(k) := [\hat{x}_1(k), \dots, \hat{x}_n(k)]^T$ are the vectors of all the node variables and their estimates. Moreover, $\mu(k) := \mathbb{E}[e(k)]$ is the mean and $Q(k) := \mathbb{E}[e(k)e(k)^T]$ is the correlation matrix of the estimation error vector, where $\mathbb{E}[\cdot]$ denotes expectation. We say that a stochastic process $y(k)$ is mean square convergent if $\mathbb{E}[y(k)]$ and $\mathbb{E}[y(k)y^T(k)]$ converge as $k \to \infty$ for every initial condition. The union graph $\hat{\mathcal{G}}$ is defined as follows:

$$\hat{\mathcal{G}} := \bigcup_{i=1}^{N} \mathcal{G}_i = \Big(\mathcal{V}, \bigcup_{i=1}^{N} \mathcal{E}_i\Big). \tag{12}$$
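Connectivity of the union graph in (12) is straightforward to test computationally. The sketch below forms the union of several undirected edge sets and runs a depth-first search; the function name `union_graph_connected` and the edge-set representation are assumptions for illustration.

```python
def union_graph_connected(nodes, edge_sets):
    """Return True if the union of the given undirected graphs is connected.

    nodes     : iterable of node labels (the common node set V)
    edge_sets : iterable of edge collections, one per graph G_i,
                each edge given as a pair (u, v)
    """
    adjacency = {u: set() for u in nodes}
    for edges in edge_sets:
        for u, v in edges:
            adjacency[u].add(v)
            adjacency[v].add(u)
    # Depth-first search from an arbitrary node of the union graph.
    start = next(iter(adjacency))
    seen = {start}
    stack = [start]
    while stack:
        u = stack.pop()
        for v in adjacency[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return len(seen) == len(adjacency)


# Neither graph alone is connected on {1, 2, 3}, but their union is.
connected = union_graph_connected([1, 2, 3], [{(1, 2)}, {(2, 3)}])
disconnected = union_graph_connected([1, 2, 3], [{(1, 2)}])
```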


We assume that the measurement noise $\varepsilon_{u,v}(k)$ affecting the measurements on the edge $(u, v)$ is a wide sense stationary process with mean $\mathbb{E}[\varepsilon_{u,v}(k)] = \mu_{u,v}$ and variance $\mathrm{Cov}(\varepsilon_{u,v}(k), \varepsilon_{u,v}(k)) = \sigma^2_{u,v}$. We also assume that the measurement noise sequence $\varepsilon(k)$ and the initial condition $\hat{x}(0)$ are independent of the Markov chain that governs the time variation of the graphs.

For technical reasons, we make the additional assumption that there exists a time $k_0$ after which the weights between two nodes do not change. That is, if $i$ and $j$ become neighbors at two distinct times $k_1$ and $k_2$ (both greater than $k_0$), then $w_{ij}(k_1) = w_{ij}(k_2)$ as well as $w_{ji}(k_1) = w_{ji}(k_2)$. There are various ways nodes can achieve this. One simple way is for every node to use a constant weight $w$ for all its neighbors. Another possibility is for every node $u$ to store in its local memory the weight $w_{u,v}(k_v)$ that is used for the neighbor $v$ at $k_v$, where $k_v$ is the last time it encountered $v$. After a certain time, each node stops changing the weights and uses the weight it used the previous time it encountered the same neighbor. In this case, a node has to store at most $n_{\mathrm{total}}$ numbers, one for each potential neighbor. The choice of weights during the “transient period” (up to $k_0$) will affect the initial reduction of the estimation errors but will not change the asymptotic behavior.

Theorem 1. Assume that the temporal evolution of the communication graph $\mathcal{G}(k)$ is governed by an $N$-state homogeneous Markov chain that is ergodic, and $p_{ii} > 0$ for $i = 1, \dots, N$. The estimation error $e(k)$ is mean square convergent if and only if the union graph $\hat{\mathcal{G}}$ defined in (12) is connected. If additionally all the measurements are unbiased, then the estimates are asymptotically unbiased: $\lim_{k \to \infty} \mu(k) = 0$.

The implication of the theorem is that as long as nodes are connected in a “time-average” sense characterized by the union graph $\hat{\mathcal{G}}$ being connected, the nodes have estimates whose mean and variance converge as time progresses irrespective of the initial conditions. Thus, after a sufficiently long time, the nodes can turn off the synchronization updates without much loss of accuracy. The assumption of ergodicity of the Markov chain ensures that there is a unique steady state distribution and that the steady state probability of each state is non-zero [17]. This means every graph in the state space of the chain occurs infinitely often. Since their union graph is connected, ergodicity implies that information from the reference node(s) will flow to each of the nodes over time. None of the graphs that ever occur is required to be a connected graph. The assumption $p_{ii} > 0$ means there is a non-zero probability that the graph stays the same from one iteration index to the next, which is equivalent to the following: if $u$ and $v$ are able to collect a relative measurement at $k$, they are able to do so again at $k+1$ with a non-zero probability. This can be assured if the nodes move slowly enough. Therefore the assumption $p_{ii} > 0$ is an assumption on the rapidity of node motion.

Though the theorem states that the limiting mean and variance exist, it does not specify what they are. A formula for the limiting mean and variance is provided later in Lemma 3 in Section 4.
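The behavior described by Theorem 1 can be explored with a small simulation. The sketch below runs the update law (10) with unit weights and noise-free measurements on a two-state Markov chain of graphs, neither of which is connected on its own. The three-node setup, the transition matrix, the convention that `P[i][j]` is the probability of moving from state `i` to state `j`, and the function name `simulate` are all assumptions chosen for illustration; with noise-free measurements the errors are expected to vanish rather than merely converge in mean square.

```python
import random


def simulate(x_true, reference, graphs, P, steps, seed=0):
    """Run update law (10) with unit weights on a Markov-switching topology.

    x_true    : list of true node variables
    reference : set of node indices that know their variable
    graphs    : list of edge lists; graphs[i] is the i-th chain state
    P         : row-stochastic matrix, P[i][j] = Prob(next=j | current=i)
    """
    rng = random.Random(seed)
    n = len(x_true)
    # Arbitrary initial estimates; reference nodes hold their true values.
    x_hat = [x_true[u] if u in reference else 10.0 for u in range(n)]
    state = 0
    for _ in range(steps):
        new = list(x_hat)
        for u in range(n):
            if u in reference:
                continue
            num, den = x_hat[u], 1.0                 # w_uu = 1
            for a, b in graphs[state]:
                if u in (a, b):
                    v = b if u == a else a
                    zeta_uv = x_true[u] - x_true[v]  # noise-free measurement
                    num += x_hat[v] + zeta_uv        # w_vu = 1
                    den += 1.0
            new[u] = num / den
        x_hat = new
        # Draw the next graph according to the current row of P.
        state = rng.choices(range(len(P)), weights=P[state])[0]
    return x_hat


x_true = [0.0, 1.5, -2.0]
graphs = [[(0, 1)], [(1, 2)]]      # neither graph is connected
P = [[0.5, 0.5], [0.5, 0.5]]       # ergodic, with p_ii > 0
x_hat = simulate(x_true, reference={0}, graphs=graphs, P=P, steps=2000)
max_error = max(abs(a - b) for a, b in zip(x_hat, x_true))
```

Node 2 never shares an edge with the reference node 0, yet its estimate still converges because information flows through node 1 over time.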

3.3. Markovian model of topology change

Here we examine the question of the applicability of the Markovian model of graph switching. An example in which the time variation of the graphs satisfies the homogeneous Markov model is a network of mobile agents whose motion is modeled with first order dynamics with range-determined communication. In the ad-hoc networks literature this is referred to as the random walk mobility model [23]. Specifically, suppose the position of node $u$ at time $k$, denoted by $p_u(k)$, is restricted to lie on the unit sphere $S^2 = \{x \in \mathbb{R}^3 : \|x\| = 1\}$, and suppose the position evolution obeys $p_u(k+1) = f(p_u(k) + \Delta_u(k))$, where $\Delta_u(k)$ is a stationary zero-mean white noise sequence for every $u$, and $\mathbb{E}[\Delta_u(k)\Delta_v(k)^T] = 0$ unless $u = v$. The function $f(\cdot) : \mathbb{R}^3 \to S^2$ is a projection function onto the unit sphere. In addition, $(u, v) \in \mathcal{E}(k)$ if and only if the geodesic distance between them is less than or equal to some predetermined value. In this case, the graph $\mathcal{G}(k)$ is uniquely determined by the node positions at time $k$, and the prediction of $\mathcal{G}(k+1)$ given $\mathcal{G}(k)$ cannot be improved by the knowledge of the graphs observed prior to $k$: $\mathcal{G}(k-1), \dots, \mathcal{G}(0)$. Hence the evolution of the graph sequence satisfies the Markovian property. If in addition random communication failure leads to two nodes not being able to communicate even when they are in range, the Markovian property is retained if the communication failure is i.i.d.
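The random walk mobility model on the sphere can be sketched as follows. The report only states that $f$ is a projection and that the noise is zero-mean and white; the radial normalization used for $f$, the Gaussian noise, and all function names below are assumptions for illustration.

```python
import math
import random


def project_to_sphere(p):
    """Assumed form of f: radial projection of a nonzero vector onto S^2."""
    norm = math.sqrt(sum(c * c for c in p))
    return tuple(c / norm for c in p)


def random_walk_step(p, sigma, rng):
    """p(k+1) = f(p(k) + Delta(k)) with Gaussian Delta (an assumption)."""
    moved = tuple(c + rng.gauss(0.0, sigma) for c in p)
    return project_to_sphere(moved)


def geodesic_distance(p, q):
    """Great-circle distance between two points on the unit sphere."""
    dot = sum(a * b for a, b in zip(p, q))
    return math.acos(max(-1.0, min(1.0, dot)))


def graph_at(positions, radius):
    """Edge set determined by the positions: (u, v) iff distance <= radius."""
    n = len(positions)
    return frozenset(
        (u, v)
        for u in range(n)
        for v in range(u + 1, n)
        if geodesic_distance(positions[u], positions[v]) <= radius
    )


fixed = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (-1.0, 0.0, 0.0)]
edges = graph_at(fixed, radius=1.6)

rng = random.Random(1)
moved = [random_walk_step(p, 0.05, rng) for p in fixed]
norms = [math.sqrt(sum(c * c for c in p)) for p in moved]
```

In the fixed configuration, nodes 0 and 2 are antipodal and therefore out of range, while node 1 is within range of both.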

Nevertheless, for some mobility models, it is not straightforward to check whether the sequence of graphs generated by the model satisfies the Markovian property. A general method of checking Markovian switching of graphs is therefore needed. We borrow a method proposed in [24] to check whether a stochastic process is Markov from observations of the process. We first introduce some standard notation from information theory. Let $X$ be a discrete random variable with sample space $\Omega = \{1, \dots, N\}$ and probability mass function $p(x) = \mathbb{P}(X = x)$, where $x \in \Omega$. The entropy of $X$ is defined by

$$H(X) = -\sum_{x \in \Omega} p(x) \log p(x). \tag{13}$$

Figure 1: The standard shapes of estimated conditional entropy for three different cases: (a) independence, (b) first-order dependence and (c) second-order dependence. If a process is a first order Markov chain, empirically computed conditional entropy will show a trend similar to the one in (b). (Panels: (a) Independent, (b) First-order, (c) Second-order; vertical axis $H$, horizontal axis $i$.)

The definition of entropy is extended to a pair of random variables $X$, $Y$, where $X, Y \in \Omega$, as follows

$$H(X, Y) := -\sum_{x, y \in \Omega} p(x, y) \log p(x, y). \tag{14}$$

The conditional entropy $H(Y \mid X)$ is defined as

$$H(Y \mid X) := -\sum_{x, y \in \Omega} p(x, y) \log p(y \mid x) = H(X, Y) - H(X). \tag{15}$$

The conditional entropy measures the uncertainty about one event given another event. Consider a stochastic process $X_1, X_2, \dots$. Assuming the process is stationary, we denote $H(\text{single}) := H(X_k)$ and $H(\text{double}) := H(X_k, X_{k+1})$ for all $k = 1, 2, \dots$. It is straightforward to show that $H(\text{double}) = 2H(\text{single})$ if the successive random variables $X_k$ are i.i.d. In this case the random process is a zero-order Markov process. If the random variables $X_k$ are not independent, $H(\text{single}) \le H(\text{double}) < 2H(\text{single})$. To address the question of whether the process is second-order or higher order Markov, we extend the entropy definition to multivariate random variables such that $H(\text{triple}) := H(X_k, X_{k+1}, X_{k+2})$, etc. Now, the sequence $H_0 = \log N$, $H_1 = H(\text{single})$, $H_2 = H(\text{double}) - H(\text{single})$, $H_3 = H(\text{triple}) - H(\text{double})$, etc., measures the conditional uncertainty for each order of dependence; by (15), $H_2 = H(X_{k+1} \mid X_k)$. A graphical approach is given in [24] to determine the order of dependence of a random process by plotting the estimates of each $H_i$, $i = 0, 1, 2, \dots$, and examining the shape of the curve. The estimate of each $H_i$ can be calculated from observations. Figure 1 shows the standard shapes for independence, first-order dependence, and second-order dependence. If the process is independent, then knowing the value of $X_k$ will not help in predicting $X_{k+1}$, which is seen in the flat shape of the entropy function in Figure 1(a). In contrast, the sharp drop from $H_1$ to $H_2$ in Figure 1(b) indicates that knowing the value of $X_{k-1}$ will dramatically decrease the uncertainty in the prediction of $X_k$, while the values of $X_{k-2}, X_{k-3}, \dots$ will not help much. This accords with the dependence property of a first-order Markov chain. Similarly, Figure 1(c) indicates that the previous two variables $X_{k-2}$, $X_{k-1}$ are both important to predict $X_k$. In this case the process is better modeled as a second order Markov chain.
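The sequence $H_0, H_1, H_2, \dots$ can be estimated from an observed symbol sequence by counting blocks of consecutive symbols. The sketch below is one plausible implementation of that idea, not the procedure of [24] itself; it uses the natural logarithm (an assumption, since the report does not state the base) and treats unobserved blocks as contributing zero, in line with the convention $0 \log 0 = 0$. The function names are hypothetical.

```python
import math
from collections import Counter


def block_entropy(seq, m):
    """Empirical joint entropy of m consecutive symbols (0 for m = 0)."""
    if m == 0:
        return 0.0
    blocks = [tuple(seq[i:i + m]) for i in range(len(seq) - m + 1)]
    counts = Counter(blocks)
    total = len(blocks)
    # Only observed blocks appear in `counts`, so 0 log 0 terms are skipped.
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def conditional_entropy_profile(seq, num_states, max_order):
    """Return [H_0, ..., H_max_order], with H_0 = log N and
    H_i = H(i-block) - H((i-1)-block) for i >= 1."""
    profile = [math.log(num_states)]
    for i in range(1, max_order + 1):
        profile.append(block_entropy(seq, i) - block_entropy(seq, i - 1))
    return profile


# A deterministic alternating sequence is first-order Markov: each symbol
# is fully determined by its predecessor, so H_2 and H_3 are close to 0.
periodic = [0, 1] * 500
profile = conditional_entropy_profile(periodic, num_states=2, max_order=3)
```

For this sequence the profile drops sharply from $H_1 = \log 2$ to $H_2 \approx 0$, the first-order shape described for Figure 1(b).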

In order to conclude whether the evolution of graphs is governed by a first-order Markov chain, we adopt the method discussed above as follows. For a particular mobility model, we conduct a simulation and collect observations of the graph sequence. Since the underlying sample space $\mathbb{G}$ of the stochastic process $\mathcal{G}(k)$ is finite, the method described above is applicable. We then check whether the plot of $H_i$ estimated from the collected observations is closer to that in Figure 1(b) than to those in Figure 1(a) or Figure 1(c). If so, we declare that it is reasonable to model the graph switching process as Markovian.

As an illustrative example, we consider the widely used random waypoint (RWP) mobility model [23]. In the RWP model, each node is initialized to stay in its initial position for a certain period of time (the so-called pause time $t_p$). Then, the node picks a random destination within the region in which it is allowed to move and a speed that is uniformly distributed in $[v_{\min}, v_{\max}]$, and travels to the destination at that speed. Once the node reaches the new destination, it pauses again for $t_p$ before starting over. We conduct a simulation of the RWP model with 3 nodes, where $v_{\min}$, $v_{\max}$, $t_p$ are chosen as 10 m/s, 50 m/s and 0.1 s. The nodes are allowed to move in a 10 m × 10 m region. Nodes' positions are initialized randomly. The sample space consists of 8 graphs, since each of the 3 possible edges is either present or absent. By performing the simulation for a long time ($10^4$ s), we obtain a large number of observations of the process $\mathcal{G}(k)$. The probability mass function is empirically estimated from the observations. For estimating conditional entropies, certain conditional probabilities, especially those of the type $\mathbb{P}(\mathcal{G}(k) = \mathcal{G}_1 \mid \mathcal{G}(k-1) = \mathcal{G}_2, \mathcal{G}(k-2) = \mathcal{G}_3)$, are problematic since the relevant events may not be observed even in a very long sequence of observations. In this case we set the corresponding probabilities to 0 and use $0 \log 0 = 0$. The empirically estimated conditional entropies $H_i$ are shown in Figure 2. Clearly, the shape of the curve is similar to that in Figure 1(b). Therefore, we conclude that the graph switching process in RWP mobility can be reasonably modeled as a (first order) Markov chain.
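A compact simulation of the RWP model that records the resulting graph sequence might look as follows. The speed range, pause time, region size, and number of nodes follow the text; the communication range `comm_range`, the time step `dt`, and the function name `simulate_rwp` are assumptions, since the report does not state them. The sketch assumes a positive pause time.

```python
import math
import random


def simulate_rwp(num_nodes=3, side=10.0, v_min=10.0, v_max=50.0, pause=0.1,
                 comm_range=5.0, dt=0.01, steps=2000, seed=0):
    """Simulate random waypoint mobility and return the observed edge sets."""
    rng = random.Random(seed)
    pos = [(rng.uniform(0, side), rng.uniform(0, side))
           for _ in range(num_nodes)]
    dest = list(pos)
    speed = [0.0] * num_nodes
    wait = [pause] * num_nodes          # every node starts by pausing
    graphs = []
    for _ in range(steps):
        for u in range(num_nodes):
            if wait[u] > 0.0:
                wait[u] -= dt
                if wait[u] <= 0.0:      # pause over: pick waypoint and speed
                    dest[u] = (rng.uniform(0, side), rng.uniform(0, side))
                    speed[u] = rng.uniform(v_min, v_max)
                continue
            dx = dest[u][0] - pos[u][0]
            dy = dest[u][1] - pos[u][1]
            dist = math.hypot(dx, dy)
            travel = speed[u] * dt
            if travel >= dist:          # waypoint reached: stop and pause
                pos[u] = dest[u]
                wait[u] = pause
            else:
                pos[u] = (pos[u][0] + travel * dx / dist,
                          pos[u][1] + travel * dy / dist)
        # The graph is determined by which pairs are within range.
        edges = frozenset(
            (u, v)
            for u in range(num_nodes)
            for v in range(u + 1, num_nodes)
            if math.hypot(pos[u][0] - pos[v][0],
                          pos[u][1] - pos[v][1]) <= comm_range
        )
        graphs.append(edges)
    return graphs


observed = simulate_rwp(steps=500)
distinct = set(observed)
```

Mapping each distinct edge set to an integer label gives a symbol sequence to which an empirical conditional entropy estimate can be applied.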

Figure 2: Empirically estimated conditional entropy for the graph process $\mathcal{G}(k)$ with three nodes moving according to the random waypoint mobility model. (Vertical axis $H$, horizontal axis $i$.)

Note that in RWP mobility, prediction of the future node locations (and therefore the graph) based on knowledge of past and present may be more accurate than prediction based on only the present. Therefore it is quite possible that the graph switching is not exactly first-order Markov. However, the results of the test above show that a first-order Markov model captures the graph switching process with RWP mobility quite accurately.

4. Proof of Theorem 1

We first introduce some standard and non-standard graph-theoretic terminology.

Consider a directed graph $\vec{\mathcal{G}} = (\mathcal{V}, \vec{\mathcal{E}})$ with $n_{\mathrm{total}}$ nodes, $m$ edges, and positive weights assigned to every edge. The weight matrix $W$ is a diagonal matrix of the edge weights: $W := \mathrm{diag}(w_1, \dots, w_m)$, where the $w_i$'s are the weights in the set $\{w_{u,v} \mid (u,v) \in \vec{\mathcal{E}}\}$ indexed from 1 to $m$. The pair $(\vec{\mathcal{G}}, W)$ is called a directed weight graph. See Figure 3 for an example of an undirected measurement graph and an associated directed weight graph. The node-edge incidence matrix $A$ of the directed graph $\vec{\mathcal{G}}$ is an $n_{\mathrm{total}} \times m$ matrix whose entries are defined as follows: $A_{u,e} = 0$ if the edge with index $e$ is not incident on node $u$; $A_{u,e} = +1$ if $e$ is directed away from $u$; and $A_{u,e} = -1$ if $e$ is directed toward $u$. The weighted in-directed Laplacian matrix $L^{\mathrm{in}}$ of the directed weight graph $(\vec{\mathcal{G}}, W)$ is an $n_{\mathrm{total}} \times n_{\mathrm{total}}$ matrix defined as $L^{\mathrm{in}}_{u,v} = -w_{v,u}$ for $u \neq v$ and $L^{\mathrm{in}}_{u,u} = -\sum_{v \neq u} L^{\mathrm{in}}_{u,v}$. In other words, the off-diagonal entries on the $u$-th row are the negatives of the weights on the edges directed toward the node $u$, and the diagonal entries are positive in such a way that each row sum is 0. We also define the $n_{\mathrm{total}} \times m$ in-incidence matrix $A^{\mathrm{in}}$ as the matrix $A$ with the $+1$ entries removed (that is, set to 0). In the in-incidence matrix, information about the tail nodes of edges is lost; only information on the heads of edges is retained. The following is straightforward to verify:

$$L^{\mathrm{in}} = A^{\mathrm{in}} W A^T. \tag{16}$$
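The relation (16) can be checked numerically on a small example. The sketch below is an illustration only: the three-node graph, its weights, and all function names are hypothetical, and each edge is stored as a (tail, head, weight) triple so that the weight sits on the edge directed toward `head`.

```python
def incidence_matrices(n, edges):
    """Return (A, A_in) for a list of (tail, head, weight) edges."""
    m = len(edges)
    A = [[0] * m for _ in range(n)]
    A_in = [[0] * m for _ in range(n)]
    for e, (tail, head, _) in enumerate(edges):
        A[tail][e] = 1      # edge e is directed away from its tail
        A[head][e] = -1     # edge e is directed toward its head
        A_in[head][e] = -1  # in-incidence matrix keeps only the -1 entries
    return A, A_in

def in_laplacian_direct(n, edges):
    """Build L_in entry by entry from its definition."""
    L = [[0.0] * n for _ in range(n)]
    for tail, head, w in edges:
        L[head][tail] -= w  # negative weight of the edge directed toward `head`
        L[head][head] += w  # diagonal chosen so that each row sums to 0
    return L

def in_laplacian_from_incidence(n, edges):
    """Build L_in as the product A_in W A^T with W = diag(weights)."""
    A, A_in = incidence_matrices(n, edges)
    L = [[0.0] * n for _ in range(n)]
    for e, (_, _, w) in enumerate(edges):
        for u in range(n):
            for v in range(n):
                L[u][v] += A_in[u][e] * w * A[v][e]
    return L

# Hypothetical 3-node example with different weights in the two directions.
edges = [(0, 1, 0.5), (1, 0, 2.0), (1, 2, 1.0), (2, 1, 4.0)]
L_direct = in_laplacian_direct(3, edges)
L_product = in_laplacian_from_incidence(3, edges)
```

Both constructions give the same matrix, and the matrix is generally not symmetric because the two directions of an edge may carry different weights.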

The in-incidence matrix and the relation (16) appeared earlier in [26] under different names. Given a subset of $r$ special nodes among the $n_{\mathrm{total}}$ nodes, so that $n = n_{\mathrm{total}} - r$, the basis incidence matrix $A_b$ is the $n \times m$ sub-matrix of $A$ that is obtained by removing the rows corresponding to the $r$ special nodes [27]. Similarly, we define the basis in-incidence matrix $A^{\mathrm{in}}_b$ as the $n \times m$ sub-matrix of $A^{\mathrm{in}}$ that is obtained by removing the rows corresponding to the $r$ special nodes. Finally, we define the basis in-Laplacian matrix $L^{\mathrm{in}}_b$ as the $n \times n$ sub-matrix of $L^{\mathrm{in}}$ that is obtained by removing the rows and columns corresponding to the special nodes. The sub-matrices of $A$, $A^{\mathrm{in}}$ that are removed to obtain the basis matrices are denoted by $A_r$, $A^{\mathrm{in}}_r$, respectively, so that with appropriate indexing,

$$A = \begin{bmatrix} A_r \\ A_b \end{bmatrix}, \qquad A^{\mathrm{in}} = \begin{bmatrix} A^{\mathrm{in}}_r \\ A^{\mathrm{in}}_b \end{bmatrix}. \tag{17}$$

Figure 3: A measurement graph $\mathcal{G}$ and the corresponding weight graph $(\vec{\mathcal{G}}, W)$, with 1 being the reference node. Note that edges pointing toward the reference nodes do not affect the algorithm since reference nodes do not update their estimates.

The square non-negative matrices $M$, $N$ are defined as follows: $M$ is the diagonal matrix made up of the diagonal entries of $L^{\mathrm{in}}$, and $N := M - L^{\mathrm{in}}$. Similarly, define $M_b$, $N_b$ for $L^{\mathrm{in}}_b$. In the language of iterative computation, $M_b$ and $N_b$ define a splitting of $L^{\mathrm{in}}_b$ [28]:

$$L^{\mathrm{in}}_b = M_b - N_b. \tag{18}$$

The update law (10) can be expressed compactly in terms of the undirected communication graph $\mathcal{G}(k) = (\mathcal{V}, \mathcal{E}(k))$ and a directed weight graph $(\vec{\mathcal{G}}(k) = (\mathcal{V}, \vec{\mathcal{E}}(k)), W(k))$, defined over the same node set but with different edge sets even at the same time $k$. In particular, if $u$ and $v$ have an undirected edge in $\mathcal{G}(k)$, then we have two directed edges $(u, v)$ and $(v, u)$ in $\vec{\mathcal{G}}(k)$. Thus, given a measurement graph $\mathcal{G}(k)$, the edges of the weight graph are uniquely defined. An edge $(u, v)$ of $\vec{\mathcal{G}}(k)$ has a positive weight $w_{u,v}(k)$ that is chosen by the node $u$; these weights are combined into a weight matrix $W(k) \in \mathbb{R}^{\vec{m}(k) \times \vec{m}(k)}$, where $\vec{m}(k)$ is the number of edges in $\vec{\mathcal{E}}(k)$.

The algorithm (10) also uses positive weights $w_{u,u}(k)$, $u \in \mathcal{V}$, that are not part of the weight matrix $W(k)$. The $n_{\mathrm{total}} \times n_{\mathrm{total}}$ diagonal matrix $D(k)$ is defined with these weights: $D_{u,u}(k) = w_{u,u}(k)$ for every $u \in \mathcal{V}$. In this paper the special nodes used to define the basis incidence/in-incidence/in-Laplacian matrices will be the reference nodes. The $n \times n$ diagonal matrix $D_b(k)$ is the sub-matrix of $D(k)$ obtained by removing the rows and columns corresponding to the reference nodes. Recall the convention established earlier: if $(u, v) \in \vec{\mathcal{E}}(k)$ so that $\zeta_{u,v}(k)$ is available to $u$, then the measurement $\zeta_{v,u}(k)$ (which is simply $-\zeta_{u,v}(k)$) is available to $v$. If we count the measurements available to every node as distinct measurements, then the number of measurements at time $k$ is $\vec{m}(k)$, the number of edges in the weight graph. These $\vec{m}(k)$ measurements are now indexed from 1 to $\vec{m}(k)$ and placed in a tall vector $\zeta(k) := [\zeta_1(k), \dots, \zeta_{\vec{m}(k)}(k)]^T$. The measurements available to reference nodes, and the weights on edges pointing toward reference nodes, are not used since the reference nodes do not update their estimates.

With the definitions introduced so far, the update law (10) can be compactly expressed as

$$(M_b(k) + D_b(k))\,x(k+1) = (N_b(k) + D_b(k))\,x(k) - A^{\mathrm{in}}_b(k) W(k) A_r^T(k)\,x_r + A^{\mathrm{in}}_b(k) W(k)\,\zeta(k). \tag{19}$$

Recall that $x(k) = [x_1(k), \dots, x_n(k)]^T$. To obtain the dynamics for the estimation error $e(k) = x(k) - x$, we first note that if we define a vector of all node variables, known and unknown, as $x_{\mathrm{all}} = [x_r^T, x^T]^T \in \mathbb{R}^{n_{\mathrm{total}}}$, then we have the following relationship

$$\zeta(k) = A^T(k)\,x_{\mathrm{all}} + \varepsilon(k) = A_r^T(k)\,x_r + A_b^T(k)\,x + \varepsilon(k), \tag{20}$$

where $\varepsilon(k) := [\varepsilon_1(k), \dots, \varepsilon_{\vec{m}(k)}(k)]^T$ is a measurement noise vector. Expanding the right hand side of (19) using (20), we obtain

$$\begin{aligned}
(M_b(k) + D_b(k))\,x(k+1) &= (N_b(k) + D_b(k))\,x(k) + A^{\mathrm{in}}_b(k) W(k) A_b^T(k)\,x + A^{\mathrm{in}}_b(k) W(k)\,\varepsilon(k) \\
&= (N_b(k) + D_b(k))\,x(k) + (M_b(k) - N_b(k))\,x + A^{\mathrm{in}}_b(k) W(k)\,\varepsilon(k),
\end{aligned} \tag{21}$$

where the second equality follows from $A^{\mathrm{in}}_b W A_b^T = L^{\mathrm{in}}_b = M_b - N_b$, which follows from (16) and (18). We now subtract the equality

$$(M_b(k) + D_b(k))\,x = (N_b(k) + D_b(k))\,x + (M_b(k) - N_b(k))\,x$$

from (21) and use $e(k) = x(k) - x$ to obtain

$$(M_b(k) + D_b(k))\,e(k+1) = (N_b(k) + D_b(k))\,e(k) + A^{\mathrm{in}}_b(k) W(k)\,\varepsilon(k).$$

This is rewritten as

$$e(k+1) = J_b(k)\,e(k) + B_b(k)\,\varepsilon(k), \tag{22}$$

where

$$J_b(k) := (M_b(k) + D_b(k))^{-1}(N_b(k) + D_b(k)), \qquad B_b(k) := (M_b(k) + D_b(k))^{-1} A^{\mathrm{in}}_b(k) W(k). \tag{23}$$
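To make the error recursion (22) concrete, the sketch below builds $J_b$ from a basis in-Laplacian and a vector of self-weights, then propagates the noise-free error on a fixed graph. The three-node path with node 0 as reference, the unit weights, and the function names are assumptions made for illustration; they are not taken from the paper's simulations.

```python
def iteration_matrix(L_b, d_b):
    """J_b = (M_b + D_b)^{-1} (N_b + D_b) for the splitting L_b = M_b - N_b.

    L_b is the basis in-Laplacian (list of rows); d_b holds the diagonal
    entries of D_b. M_b + D_b is diagonal, so its inverse is a row scaling.
    """
    n = len(L_b)
    J = [[0.0] * n for _ in range(n)]
    for u in range(n):
        scale = L_b[u][u] + d_b[u]  # diagonal entry of M_b + D_b
        for v in range(n):
            if u == v:
                J[u][v] = d_b[u] / scale      # N_b has a zero diagonal
            else:
                J[u][v] = -L_b[u][v] / scale  # N_b = M_b - L_b off the diagonal
    return J

def propagate(J, e, steps):
    """Apply e <- J e repeatedly (the recursion (22) with zero noise)."""
    for _ in range(steps):
        e = [sum(J[u][v] * e[v] for v in range(len(e))) for u in range(len(e))]
    return e

# Hypothetical path 0 - 1 - 2 with node 0 as reference and unit weights.
# Rows and columns of node 0 are already removed from the in-Laplacian.
L_b = [[2.0, -1.0],
       [-1.0, 1.0]]
d_b = [1.0, 1.0]
J_b = iteration_matrix(L_b, d_b)
e_final = propagate(J_b, [1.0, 2.0], 200)
```

In this example the row of $J_b$ belonging to the node adjacent to the reference sums to 2/3 rather than 1, and that deficit is what drives the error to zero on a connected graph.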

In general, the weight matrix $W(k)$ at time $k$ is not completely specified by the graph $\mathcal{G}(k)$ at that time if nodes are allowed to vary the weights arbitrarily over time. However, recall that an additional constraint was imposed on choosing weights: there exists a time $k_0$ after which the weight between any two nodes does not change. The result of imposing this constraint is that after the transient period, the graph $\mathcal{G}(k)$ uniquely determines the weight matrix $W(k)$. Since there are $N$ distinct graphs in $\mathbb{G}$, we can define a set $\mathbb{W} := \{W_1, \dots, W_N\}$, with $W_i$ associated with $\mathcal{G}_i$. As a result, for $k \geq k_0$, if $\mathcal{G}(k) = \mathcal{G}_i$ then $W(k) = W_i$.

The dimensions of the quantities $B_b(k)$, $\varepsilon(k)$ depend on the state of the Markov chain at $k$. Now we introduce some "extended" quantities to rectify this issue. Let $m^x$ be the number of edges in the union graph $\cup_i \mathcal{G}_i$. For every graph $\mathcal{G}_i \in \mathbb{G}$, define an extended incidence matrix $A^x_i$ by introducing additional columns with all entries equal to 0 so that $A^x_i$ is an $n_{\mathrm{total}} \times m^x$ matrix. Let $A^x_{bi}$ be the corresponding extended basis incidence matrix. Let $M_i, D_i, W_i$ (and $M_{bi}, D_{bi}, W_{bi}$) be the matrices $M, D, W$ (and $M_b, D_b, W_b$) defined for the graph $\mathcal{G}_i$. For every $W_i$, define $W^x_i \in \mathbb{R}^{m^x \times m^x}$ by introducing additional 0 entries on the diagonal for all edges that are not in $\mathcal{G}_i$. Now define $B^x_{bi} := (M_{bi} + D_{bi})^{-1} A^x_{bi} W^x_i \in \mathbb{R}^{n \times m^x}$. At every time $k$, we also define an extended noise vector $\varepsilon^x(k) \in \mathbb{R}^{m^x}$. The entries of $\varepsilon^x(k)$ corresponding to the edges that do not exist in $\mathcal{G}(k)$ are set to 0, as are their means and variances. As a result, if $\mathcal{G}(k) = \mathcal{G}_i$, then

$$A^x_b(k) W^x(k)\,\varepsilon^x(k) = A^x_{bi} W^x_i\,\varepsilon^x(k) = A_{bi} W_i\,\varepsilon(k) = A_b(k) W(k)\,\varepsilon(k).$$

With these choices, the state of the following system is identical to that of (22) for the same initial conditions:

$$e(k+1) = J_{b\,\theta(k)}\,e(k) + B^x_{b\,\theta(k)}\,\varepsilon^x(k), \qquad k \geq k_0, \tag{24}$$

where $\theta : \mathbb{Z}^+ \to (\mathbb{G}, \mathbb{W})$ is the switching process that is governed by the underlying Markov chain. The reason for the qualifier $k \geq k_0$ is that weights are not limited to the set $\mathbb{W}$ before $k_0$, so technically the matrices $J_b$, $B^x_b$ are uniquely determined by the Markov chain only for $k \geq k_0$. The error dynamics (24) is a Markov jump linear system (MJLS) [17]. The following terminology will be used:

$$\gamma := E[\varepsilon^x(k)], \qquad \Gamma := E[\varepsilon^x(k)\,\varepsilon^{xT}(k)] \geq 0, \tag{25}$$

$$\mu(k) := E[e(k)], \qquad Q(k) := E[e(k)\,e^T(k)]. \tag{26}$$

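To proceed with the analysis of the mean square convergence of (24), we need additional terminology. For a set of square matrices $X_i \in \mathbb{R}^{\ell \times \ell}$, $X_{ij} \in \mathbb{R}^{\ell \times \ell}$, $i, j = 1, \dots, N$, we define

$$\mathrm{diag}[X_i] := \begin{bmatrix} X_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & X_N \end{bmatrix}, \qquad [X_{ij}] := \begin{bmatrix} X_{11} & X_{12} & \cdots & X_{1N} \\ X_{21} & X_{22} & \cdots & X_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ X_{N1} & X_{N2} & \cdots & X_{NN} \end{bmatrix}.$$

Both $\mathrm{diag}[X_i]$ and $[X_{ij}]$ are $\ell N \times \ell N$ matrices. We now define the matrices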

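$$\begin{aligned}
J_i &:= (M_i + D_i)^{-1}(N_i + D_i) \in \mathbb{R}^{n_{\mathrm{total}} \times n_{\mathrm{total}}}, & F_i &:= J_i \otimes J_i \in \mathbb{R}^{n_{\mathrm{total}}^2 \times n_{\mathrm{total}}^2}, \\
J_{bi} &:= (M_{bi} + D_{bi})^{-1}(N_{bi} + D_{bi}) \in \mathbb{R}^{n \times n}, & F_{bi} &:= J_{bi} \otimes J_{bi} \in \mathbb{R}^{n^2 \times n^2},
\end{aligned} \tag{27}$$

where $\otimes$ denotes the Kronecker product, $M_i$, $N_i$ are the matrices forming the splitting (see (18)) of the in-Laplacian $L_i$ of the graph $\mathcal{G}_i$, the diagonal entries of the diagonal matrix $D_i$ are the weights on the self-loops of $\mathcal{G}_i$, and $M_{bi}$, $N_{bi}$, $D_{bi}$ are their corresponding basis matrices with the rows and columns associated with the reference nodes taken out. Furthermore, define the matrices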

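$$\mathcal{C}_b := (\mathcal{P}^T \otimes I)\,\mathrm{diag}[J_{bi}] = [p_{ji} J_{bj}] \in \mathbb{R}^{Nn \times Nn}, \qquad \mathcal{D}_b := (\mathcal{P}^T \otimes I)\,\mathrm{diag}[F_{bi}] = [p_{ji} F_{bj}] \in \mathbb{R}^{Nn^2 \times Nn^2}, \tag{28}$$

$$\mathcal{D} := (\mathcal{P}^T \otimes I)\,\mathrm{diag}[F_i] = [p_{ji} F_j] \in \mathbb{R}^{N n_{\mathrm{total}}^2 \times N n_{\mathrm{total}}^2}, \tag{29}$$

where $I$ is an identity matrix of appropriate dimension. Recall that $\mathcal{P}$ is the transition probability matrix of the Markov chain.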
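The block matrix in (28) is easy to assemble for a toy chain, which makes the spectral radius condition used later tangible. The sketch below is hypothetical throughout: it assumes one reference node and one non-reference node (so $n = 1$), two graphs (edge present, edge absent), unit weights, and a transition matrix with all entries 0.5. The spectral radius is estimated with Gelfand's formula by repeated squaring, a generic numerical device rather than anything used in the paper.

```python
from math import exp, log

def kron(A, B):
    """Kronecker product of two matrices given as lists of rows."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]

def block_matrix(P, F):
    """Return [p_ji F_j]: block (i, j) equals P[j][i] * F[j]."""
    N, size = len(F), len(F[0])
    D = [[0.0] * (N * size) for _ in range(N * size)]
    for i in range(N):
        for j in range(N):
            for r in range(size):
                for c in range(size):
                    D[i * size + r][j * size + c] = P[j][i] * F[j][r][c]
    return D

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def inf_norm(A):
    return max(sum(abs(x) for x in row) for row in A)

def spectral_radius(A, squarings=30):
    """Estimate rho(A) as ||A^k||^(1/k) with k = 2**squarings."""
    log_scale = 0.0  # log of the factor divided out of A^k so far
    B = A
    for _ in range(squarings):
        nrm = inf_norm(B)
        if nrm == 0.0:
            return 0.0
        B = [[x / nrm for x in row] for row in B]
        B = matmul(B, B)
        log_scale = 2.0 * (log_scale + log(nrm))
    nrm = inf_norm(B)
    if nrm == 0.0:
        return 0.0
    return exp((log_scale + log(nrm)) / 2 ** squarings)

P = [[0.5, 0.5], [0.5, 0.5]]
# Edge present: J_b = 1/2; edge absent: J_b = 1 (the estimate is held).
J_connected = [[[0.5]], [[1.0]]]
F_connected = [kron(J, J) for J in J_connected]
rho_connected = spectral_radius(block_matrix(P, F_connected))

# If neither graph has the edge, the union graph is disconnected.
F_disconnected = [kron([[1.0]], [[1.0]]), kron([[1.0]], [[1.0]])]
rho_disconnected = spectral_radius(block_matrix(P, F_disconnected))
```

In this toy case the estimate is below one when the union graph is connected and equal to one when it is not.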

The key to establishing the main result of the paper, Theorem 1, is the following technical result.

Lemma 1. When the temporal evolution of the graph $\mathcal{G}(k)$ is governed by a homogeneous ergodic Markov chain whose transition probability matrix $\mathcal{P}$ has the property that its diagonal entries are strictly positive, then $\rho(\mathcal{D}_b) < 1$ if and only if the union graph defined in (12) is connected, where $\mathcal{D}_b$ is defined in (28) and $\rho(\cdot)$ denotes the spectral radius.

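The proof of Lemma 1 requires additional technical results that are presented next. Recall that a non-negative matrix is called stochastic if each row sum is 1. If $X$ is a stochastic matrix, then $\rho(X) = 1$ [29].

Proposition 1. 1. If $X$ is a stochastic matrix, $X \otimes X$ is also a stochastic matrix.

2. The matrices $J_i$ and $F_i$, $i = 1, \dots, N$, defined in (27) are stochastic matrices.

3. Let

$$K := \begin{bmatrix} F_1 & F_2 & \cdots & F_N \\ F_1 & F_2 & \cdots & F_N \\ \vdots & \vdots & \ddots & \vdots \\ F_1 & F_2 & \cdots & F_N \end{bmatrix} \in \mathbb{R}^{N n_{\mathrm{total}}^2 \times N n_{\mathrm{total}}^2}, \qquad K_b := \begin{bmatrix} F_{b1} & F_{b2} & \cdots & F_{bN} \\ F_{b1} & F_{b2} & \cdots & F_{bN} \\ \vdots & \vdots & \ddots & \vdots \\ F_{b1} & F_{b2} & \cdots & F_{bN} \end{bmatrix} \in \mathbb{R}^{N n^2 \times N n^2}.$$

There exists a permutation matrix $X$ so that $K_b$ is a principal sub-matrix of $X^T K X$.

4. For the matrix $\mathcal{D} \in \mathbb{R}^{N n_{\mathrm{total}}^2 \times N n_{\mathrm{total}}^2}$ defined in (29), we have $\rho(\mathcal{D}) \leq 1$.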

Proof of Proposition 1. The first two statements are straightforward to establish. The third statement follows from the fact that $J_{bi}$ is a principal sub-matrix of $J_i$. We therefore prove only the fourth statement. From (29),

$$\rho(\mathcal{D}) = \rho([p_{ji} F_j]).$$

Since $p_{ji} F_j$ is a non-negative square matrix, it follows from [30, Theorem 3.2] that $\rho([p_{ji} F_j]) \leq \rho([\|p_{ji} F_j\|_\infty])$. Moreover, $\|p_{ji} F_j\|_\infty = p_{ji} \|F_j\|_\infty$, and since $F_j$ is a stochastic matrix, $\|F_j\|_\infty = 1$. We therefore have

$$\rho(\mathcal{D}) \leq \rho([p_{ji} \|F_j\|_\infty]) = \rho([p_{ji}]) = \rho(\mathcal{P}^T) = \rho(\mathcal{P}) = 1.$$
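The first two statements of Proposition 1 are declared straightforward; for completeness, one way to verify them is sketched here. The double-index notation for the rows and columns of a Kronecker product is introduced only for this sketch.

```latex
% Statement 1: row (i,j) of X \otimes X has entries X_{ik} X_{jl}, so its row sum factors.
\sum_{k}\sum_{l} (X \otimes X)_{(i,j),(k,l)}
  = \sum_{k}\sum_{l} X_{ik}\, X_{jl}
  = \Big(\sum_{k} X_{ik}\Big)\Big(\sum_{l} X_{jl}\Big)
  = 1 \cdot 1 = 1 .

% Statement 2: M_i is the diagonal of the in-Laplacian, whose rows sum to zero,
% so (M_i)_{uu} = \sum_{v \neq u} (N_i)_{uv}. Row u of J_i therefore sums to
\sum_{v} (J_i)_{uv}
  = \frac{\sum_{v \neq u} (N_i)_{uv} + (D_i)_{uu}}{(M_i)_{uu} + (D_i)_{uu}}
  = 1 .
```

All entries involved are non-negative, so $J_i$ is stochastic, and $F_i = J_i \otimes J_i$ is stochastic by the first computation.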

The proof of the next result is provided in the Appendix since it requires introduction of considerable new terminology.

Lemma 2. Let $\mathcal{P}$ be the transition probability matrix of an $N$-state ergodic Markov chain whose diagonal entries are positive. The matrix $\mathcal{D}$ defined in (29) is irreducible if and only if the union graph defined in (12) is connected.

Now we can prove Lemma 1.

Proof of Lemma 1. Since the union graph is connected, it follows from Lemma 2 that $\mathcal{D}$ is irreducible. From the third statement of Proposition 1, there exists a permutation matrix $X$ such that $\mathcal{D}_b$ is a principal sub-matrix of $X^T \mathcal{D} X$. The spectral radius of an irreducible matrix is strictly greater than the spectral radius of any of its principal sub-matrices, which follows from Theorem 5.1 in [29]. Therefore we have

$$\rho(\mathcal{D}_b) < \rho(X^T \mathcal{D} X).$$

From the fourth statement in Proposition 1 and the fact that permutation does not change eigenvalues, it follows that

$$\rho(X^T \mathcal{D} X) = \rho(\mathcal{D}) \leq 1.$$

Combining these two inequalities, we get that if the union graph is connected then $\rho(\mathcal{D}_b) < 1$. To prove necessity, we construct a counterexample, in particular, a trivial Markov chain with a single state: $\mathbb{G} = \{\mathcal{G}_1\}$ (so that $\mathcal{P} = 1$), where $\mathcal{G}_1$ is an $n$-node graph without a single edge. Then $\mathcal{D}_b = F_{b1} = J_{b1} \otimes J_{b1} = I$, which has a spectral radius of unity. This completes the proof of the lemma.

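The following definitions and terminology from [17] will be needed in the sequel. Let $\mathbb{R}^{m \times n}$ be the space of $m \times n$ real matrices. Let $\mathbb{H}^{m \times n}$ be the set of all $N$-sequences of real $m \times n$ matrices, so that $V \in \mathbb{H}^{m \times n}$ means $V = (V_1, V_2, \dots, V_N)$ where $V_i \in \mathbb{R}^{m \times n}$ for $i = 1, \dots, N$. The operators $\varphi$ and $\hat{\varphi}$ are defined to create a tall vector by stacking together columns from these matrices, as follows: let $(V_i)_j \in \mathbb{R}^m$ be the $j$-th column of $V_i \in \mathbb{R}^{m \times n}$, then

$$\varphi(V_i) := \begin{bmatrix} (V_i)_1 \\ \vdots \\ (V_i)_n \end{bmatrix} \in \mathbb{R}^{mn}, \qquad \hat{\varphi}(V) := \begin{bmatrix} \varphi(V_1) \\ \vdots \\ \varphi(V_N) \end{bmatrix} \in \mathbb{R}^{Nmn}. \tag{30}$$

Similarly, the inverse function $\hat{\varphi}^{-1} : \mathbb{R}^{Nmn} \to \mathbb{H}^{m \times n}$ is defined so that it produces an element of $\mathbb{H}^{m \times n}$ given a vector in $\mathbb{R}^{Nmn}$.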
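The column-stacking operators are what bring the Kronecker products $J \otimes J$ into the analysis, through the standard identity $\varphi(J X J^T) = (J \otimes J)\,\varphi(X)$. The sketch below implements the operators on plain lists and checks the identity on a small integer example; the function names and the test matrices are assumptions made for illustration.

```python
def phi(V):
    """Stack the columns of V (a list of rows) into one flat list."""
    rows, cols = len(V), len(V[0])
    return [V[r][c] for c in range(cols) for r in range(rows)]

def phi_hat(Vs):
    """Stack phi(V_1), ..., phi(V_N) for an N-sequence of matrices."""
    stacked = []
    for V in Vs:
        stacked.extend(phi(V))
    return stacked

def phi_hat_inverse(vector, N, m, n):
    """Rebuild the N-sequence of m x n matrices from the stacked vector."""
    Vs = []
    for i in range(N):
        chunk = vector[i * m * n:(i + 1) * m * n]
        Vs.append([[chunk[c * m + r] for c in range(n)] for r in range(m)])
    return Vs

def kron(A, B):
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(col) for col in zip(*A)]

J = [[1, 2], [3, 4]]
X = [[5, 6], [7, 8]]
lhs = phi(matmul(matmul(J, X), transpose(J)))          # phi(J X J^T)
JJ = kron(J, J)
rhs = [sum(JJ[r][c] * x for c, x in enumerate(phi(X))) for r in range(len(JJ))]
round_trip = phi_hat_inverse(phi_hat([J, X]), 2, 2, 2)
```

The two sides agree, and applying `phi_hat_inverse` to the output of `phi_hat` returns the original sequence of matrices.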

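Lemma 3. Consider the jump linear system (24) with an underlying homogeneous and ergodic Markov chain. The state vector $e(k)$ of the system (24) converges in the mean square sense if and only if $\rho(\mathcal{D}_b) < 1$, where $\mathcal{D}_b$ is defined in (28). When mean square convergence occurs, $\mu(k) \to \mu$ and $Q(k) \to Q$, where

$$\mu := \sum_{i=1}^{N} q_i, \qquad Q := \sum_{i=1}^{N} Q_i, \tag{31}$$

with

$$[q_1^T, \dots, q_N^T]^T = q := (I - \mathcal{C}_b)^{-1} \psi \in \mathbb{R}^{Nn},$$

$$(Q_1, \dots, Q_N) = \mathbf{Q} := \hat{\varphi}^{-1}\big((I - \mathcal{D}_b)^{-1}\,\hat{\varphi}(R(q))\big) \in \mathbb{H}^{n \times n},$$

and $\psi$, $R(q)$ are given by

$$\psi := [\psi_1^T, \dots, \psi_N^T]^T \in \mathbb{R}^{Nn}, \qquad \psi_j := \sum_{i=1}^{N} p_{ij} B^x_{bi}\,\gamma\,\pi_i \in \mathbb{R}^n,$$

$$R(q) := (R_1(q), \dots, R_N(q)) \in \mathbb{H}^{n \times n},$$

$$R_j(q) := \sum_{i=1}^{N} p_{ij}\big(B^x_{bi}\,\Gamma\,B^{xT}_{bi}\,\pi_i + J_{bi}\,q_i\,\gamma^T B^{xT}_{bi} + B^x_{bi}\,\gamma\,q_i^T J_{bi}^T\big) \in \mathbb{R}^{n \times n}.$$

Moreover, $Q$ is positive semi-definite.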

Proof. It follows from Theorem 3.33 and Theorem 3.9 of [17], as well as Remark 3.5 [17, pg. 35], that mean square convergence of (24) is equivalent to $\rho(\mathcal{D}_b) < 1$ (see footnote 1). The expressions for the mean and correlation, as well as the fact that $Q \geq 0$, also follow from [17, Propositions 3.37 and 3.38]. The existence of the steady state distribution $\pi$ follows from the ergodicity of the Markov chain, so the expressions provided are well defined.
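As a sanity check on the expressions in Lemma 3, the sketch below iterates the second-moment recursion that the formula for the limiting correlation solves, specialized to scalar modes and zero-mean noise (so the terms involving $\gamma$ vanish). The two-mode chain, its numbers, and the function name are hypothetical; for this particular chain the modes are independent across time, so the limit can also be computed by hand as $\sigma^2/3$.

```python
def second_moment_limit(P, J, B, noise_var, pi, steps=500):
    """Iterate Q_j <- sum_i p_ij (J_i^2 Q_i + B_i^2 * noise_var * pi_i).

    Scalar modes and zero-mean noise are assumed, so each Q_i is a number.
    The fixed point is the scalar version of (I - D_b)^{-1} applied to R.
    """
    N = len(P)
    Q = [0.0] * N
    for _ in range(steps):
        Q = [sum(P[i][j] * (J[i] ** 2 * Q[i] + B[i] ** 2 * noise_var * pi[i])
                 for i in range(N)) for j in range(N)]
    return Q

# Hypothetical two-mode chain for one non-reference node next to a reference:
# mode 0 has the edge (J = 1/2, B = -1/2), mode 1 has no edge (J = 1, B = 0).
P = [[0.5, 0.5], [0.5, 0.5]]
pi = [0.5, 0.5]            # stationary distribution of P
sigma2 = 1e-4              # measurement noise variance
Q_modes = second_moment_limit(P, [0.5, 1.0], [-0.5, 0.0], sigma2, pi)
Q_limit = sum(Q_modes)     # limiting second moment, as in (31)
```

Hand calculation for this chain: $E[e^2]$ obeys $Q = 0.625\,Q + 0.125\,\sigma^2$, giving $Q = \sigma^2/3$, which the iteration reproduces.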

Now we are ready to prove Theorem 1.

Proof of Theorem 1. (Sufficiency): It follows from Lemma 1 that we have $\rho(\mathcal{D}_b) < 1$. It then follows from Lemma 3 that the state converges in the mean square sense. The statement about the asymptotic mean being unbiased if the measurement noise is zero mean follows immediately from the expression for the limiting mean in Lemma 3 by plugging in $\gamma = 0$. (Necessity): If the union of the graphs is not connected, then from the proof of Lemma 1, $\rho(\mathcal{D}_b) = 1$. This shows that (due to Lemma 3) convergence will not occur.

Remark 1. The equation (24) can be interpreted as a leader-following consensus problem, where the consensus state for node $u$ is the estimation error $e_u(k)$ for $u \in \mathcal{V} \setminus \mathcal{V}_r$, while the leader states are $e_r(k) \equiv 0$ for $r \in \mathcal{V}_r$, where $\mathcal{V}_r$ is the set of reference nodes. Consensus problems are more interesting when there is no leader. Results from leaderless consensus can be used to prove convergence of consensus with leaders, as remarked in [21]. The papers [21, 22] have analyzed leaderless consensus with Markovian switching graphs. Even though the scenarios considered in [21, 22] are close to ours, there are important differences that prevent us from using their results to prove mean square convergence. In [21], communication among nodes is modeled as a directed graph and the noise sequence is modeled as a martingale difference. Mean square convergence is achieved under the condition that a certain union of graphs is strongly connected and the weighted digraphs are balanced, which means the weighted adjacency matrix of each graph is symmetric. The results in [22] are similar. The most significant difference between the formulation in [21, 22] and in this paper is that we allow the weights on the edges $(u, v)$ and $(v, u)$ to be different. As a result, in our formulation the weighted adjacency matrix, which is simply $N$, is not a symmetric matrix. The results in [21, 22] therefore cannot be directly applied to analyze mean square convergence of (24).

Footnote 1: For the interested reader, the matrix $\mathcal{D}_b$ is referred to as $\mathcal{A}_1$ in [17].

5. Simulation studies

As discussed in Section 2, skew and offset estimation are special cases of the problem of estimation of scalar node variables from relative measurements. Therefore simulations are conducted only for scalar node variable estimation. In all simulations, node variables are chosen arbitrarily, a single reference node is present, and the value of its node variable is 0. The noise on each measurement is a normally distributed random variable. All the edge weights are assigned a value of unity at every time.

We conduct simulations for two scenarios: (i) a network with 4 nodes and (ii) a network with 100 nodes. In the first scenario we impose the Markovian switching topology. The limiting means and variances of the estimates are computed from the predictions of Lemma 3, as well as estimated from Monte-Carlo simulations. The predictions of the theory are verified by comparing the two. In the second scenario, we simulate node motion according to the Random Waypoint (RWP) mobility model. The edges for a given set of node locations are determined based on a communication range. As discussed in Section 3.3, in this case Markovian switching is a very good approximation of the graph evolution process even though it may not be exactly Markovian. Even if the Markovian property holds exactly, the transition probability matrix is not known and the large state space makes it infeasible to compute the theoretical predictions of the limiting mean and variances that are given in Lemma 3. One purpose of these simulations is therefore to test the performance of the algorithm when theoretical predictions are not available.

Figure 4: The three graphs $\mathcal{G}_1, \mathcal{G}_2, \mathcal{G}_3$ that comprise $\mathbb{G}$, each on nodes 1 to 4. Node 1 is the reference.

5.1. Four-node network with Markovian switching topology

In this scenario the nodes move in such a way that the graph $\mathcal{G}(k)$ can be one of only 3 graphs shown in Figure 4. The graphs change according to a Markov chain whose transition probability matrix is

$$\mathcal{P} = \begin{bmatrix} 0.3 & 0 & 0.7 \\ 0.1 & 0.5 & 0.4 \\ 0 & 0.5 & 0.5 \end{bmatrix}. \tag{32}$$

Notice that the union of the graphs in $\mathbb{G}$ is connected, though none of the graphs is a connected graph, and $\mathcal{P}$ is ergodic. The mean and variance of measurement noise on every edge are chosen as 0 and $10^{-4}$, respectively. Monte-Carlo experiments are conducted to empirically estimate the mean and variance of the estimation error, by averaging over 1000 sample runs. Figure 5(a) and Figure 5(b) show the empirically estimated mean and variance of node 3's estimate of its node variable. As predicted by Theorem 1, the mean of the estimate converges to the true value, since the measurement noise is zero mean. The variance also converges to the theoretical steady state variance as predicted by Lemma 3.
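The properties of the transition matrix (32) that the theory relies on can be checked directly. The following sketch, with a hypothetical function name, verifies that the rows sum to one and the diagonal is positive, and approximates the stationary distribution $\pi$ (used in the expressions of Lemma 3) by repeatedly applying the transition matrix to a uniform initial distribution.

```python
P = [[0.3, 0.0, 0.7],
     [0.1, 0.5, 0.4],
     [0.0, 0.5, 0.5]]

def stationary_distribution(P, iterations=500):
    """Approximate pi satisfying pi = pi P by power iteration."""
    N = len(P)
    pi = [1.0 / N] * N
    for _ in range(iterations):
        pi = [sum(pi[i] * P[i][j] for i in range(N)) for j in range(N)]
    return pi

rows_sum_to_one = all(abs(sum(row) - 1.0) < 1e-12 for row in P)
diagonal_positive = all(P[i][i] > 0 for i in range(len(P)))
pi = stationary_distribution(P)
```

Solving $\pi = \pi \mathcal{P}$ by hand gives $\pi = (1/15, 7/15, 7/15)$, so the chain spends most of its time in the second and third graphs.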

5.2. A 100-node network with RWP mobility model

Here 100 nodes move in a 1000 m × 1000 m square according to the Random Waypoint (RWP) mobility model that was described in Section 3.3. The parameters $v_{\min}$, $v_{\max}$, and $t_p$ for this simulation are the same as the values stated in Section 3.3. The communication range is chosen as 100 m, and a link failure probability of 0.1 is used. The mean and variance of the measurement noise are chosen as 0 and $10^{-4}$. Figure 6 shows two snapshots of the network during one of the simulations. Figure 7 shows the time trace of the estimates of two nodes in one of the simulations.
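A rough sketch of how such a graph sequence might be generated is given below. It is not the simulator used for the reported results: the time step `dt`, the function name, and the choice to sample the graph once per step are assumptions, the pause time is assumed to be positive, and link failures are not modeled.

```python
import random
from math import hypot

def simulate_rwp(num_nodes, side, v_min, v_max, pause, dt, steps, comm_range, seed=0):
    """Random waypoint motion in a side x side square.

    Returns the final positions and one undirected edge set per time step.
    """
    rng = random.Random(seed)
    pos = [(rng.uniform(0, side), rng.uniform(0, side)) for _ in range(num_nodes)]
    dest = list(pos)              # every node starts paused at its initial position
    speed = [0.0] * num_nodes
    wait = [pause] * num_nodes    # remaining pause time
    graphs = []
    for _ in range(steps):
        for u in range(num_nodes):
            if wait[u] > 0:       # pausing; pick a new waypoint when the pause ends
                wait[u] -= dt
                if wait[u] <= 0:
                    dest[u] = (rng.uniform(0, side), rng.uniform(0, side))
                    speed[u] = rng.uniform(v_min, v_max)
                continue
            dx, dy = dest[u][0] - pos[u][0], dest[u][1] - pos[u][1]
            dist = hypot(dx, dy)
            step = speed[u] * dt
            if dist <= step:      # waypoint reached: stop there and pause again
                pos[u] = dest[u]
                wait[u] = pause
            else:
                pos[u] = (pos[u][0] + step * dx / dist, pos[u][1] + step * dy / dist)
        edges = frozenset(
            (u, v) for u in range(num_nodes) for v in range(u + 1, num_nodes)
            if hypot(pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]) <= comm_range)
        graphs.append(edges)
    return pos, graphs

final_pos, graph_sequence = simulate_rwp(
    num_nodes=5, side=1000.0, v_min=10.0, v_max=50.0,
    pause=0.1, dt=0.1, steps=200, comm_range=100.0)
```

Each element of `graph_sequence` is a hashable edge set, so it can serve directly as a label of the graph at that time step.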

The mean and variance of the estimation error were empirically computed from 1000 Monte Carlo simulations. Figure 8 shows the mean and variance of the estimation error for two nodes. The figure suggests that the estimates of the node variables converge in the mean square sense.

Figure 5: Mean and variance of the estimate of node 3's node variable as a function of time (iteration index): (a) mean, showing the empirical estimate and the true value; (b) variance, showing the empirical estimate and the steady state value. The empirical estimates of the mean and variance are computed from 1000 Monte Carlo experiments. In (b), the "steady state" corresponds to the limiting variance predicted by Lemma 3.

Figure 6: Two graphs that occur during a simulation with 100 nodes moving according to the random waypoint mobility model.

Figure 7: The estimates of two nodes (node 2 and node 70), plotted against the iteration index, in one of the numerical experiments involving the 100-node mobile network.

Figure 8: Empirically estimated mean and variance of the estimation error, plotted against the iteration index, for two of the nodes (node 2 and node 70) in the 100-node mobile network.

6. Summary

We proposed a distributed algorithm for estimation of clock skew and offset of the nodes of a mobile network and examined its convergence properties. The algorithm allows nodes to put different weights on estimates received from distinct neighbors, depending on the accuracy of the corresponding relative measurements. The time variation of the network was modeled as a Markov chain, which makes the algorithm a jump linear system. Under the assumptions that the Markov chain is ergodic and the diagonal entries of its transition probability matrix are positive, the estimates were shown to be mean square convergent as long as the union of the graphs over time is connected.

Expressions for the asymptotic mean and correlation are also provided by using results on jump linear systems from [17]. Evaluating these expressions requires summation of $N$ terms, where $N$ is the number of distinct graphs that can occur. In general $N$ is a very large number, so the utility of these expressions is limited in the general setting. For instance, if no restriction is placed on the motion of the nodes or edge formation, $N$ is the number of distinct graphs possible with $n_{\mathrm{total}}$ nodes, which is $2^{\frac{1}{2} n_{\mathrm{total}}(n_{\mathrm{total}} - 1)}$. Clearly, this is a very large number unless $n_{\mathrm{total}}$ is extremely small. However, in special situations $N$ can be smaller, e.g., if certain nodes are restricted to move only within certain geographic areas.
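The growth of this count is easy to tabulate. The snippet below, with a hypothetical function name, evaluates the formula for the network sizes that appear in the paper.

```python
def num_graphs(n_total):
    """Number of distinct undirected graphs on n_total labeled nodes."""
    return 2 ** (n_total * (n_total - 1) // 2)

three_nodes = num_graphs(3)      # the 3-node RWP example of Section 3.3
four_nodes = num_graphs(4)       # upper bound for the 4-node scenario
hundred_nodes_digits = len(str(num_graphs(100)))  # decimal digits for 100 nodes
```

With 3 nodes the count is 8, matching the sample space of the entropy experiment, while for 100 nodes the count has well over a thousand decimal digits, which is why the closed-form expressions are impractical there.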

In time-varying systems, the rate of change is an important parameter. The assumption that the Markov chain satisfies $p_{ii} > 0$ provides an upper bound on how fast nodes can move and the network can change (compared to the time required to obtain relative measurements and current estimates). This assumption was used to prove Theorem 1. However, it is possible that mean square convergence can be proved with weaker constraints on the speed of topology change.

We have not examined the question of convergence rate. It is likely that the transition probabilities of the chain will play a role in the convergence rate. However, precisely characterizing the convergence rate of the algorithm remains an open problem. The time to reach acceptable estimation accuracy can however be reduced by a more careful choice of the initial condition, e.g., using the flagged initialization scheme proposed in [31].

References

[1] K.-L. Noh, Q. M. Chaudhari, E. Serpedin, B. W. Suter, Novel clock phase offset and skew estimation using two-way timing message exchanges for wireless sensor networks, IEEE Transactions on Communications 55 (4) (2007) 766–777.

[2] S. Yoon, C. Veerarittiphan, M. L. Sichitiu, Tiny-sync: Tight time synchronization for wireless sensor networks, ACM Transactions on Sensor Networks 3 (2) (2007) 1–34.

[3] M. Leng, Y.-C. Wu, On clock synchronization algorithms for wireless sensor networks under unknown delay, IEEE Transactions on Vehicular Technology 59 (1) (2010) 182–190.

[4] Y.-C. Wu, Q. Chaudhari, E. Serpedin, Clock synchronization of wireless sensor networks, IEEE Signal Processing Magazine 28 (1) (2011) 124–138. doi:10.1109/MSP.2010.938757.

[5] F. Sivrikaya, B. Yener, Time synchronization in sensor networks: a survey, IEEE Network 18 (4) (2004) 45–50.

[6] B. M. Sadler, A. Swami, Synchronization in sensor networks: an overview, in: IEEE MILCOM, 2006, pp. 1–6.

[7] R. Carli, A. Chiuso, L. Schenato, S. Zampieri, A PI consensus controller for networked clocks synchronization, in: IFAC World Congress on Automatic Control, 2008.

[8] C. Lenzen, P. Sommer, R. Wattenhofer, Optimal clock synchronization in networks, in: ACM Conference on Embedded Networked Sensor Systems (SenSys), 2009, pp. 1–14.

[9] L. Schenato, F. Fiorentin, Average timesynch: A consensus-based protocol for clock synchronization in wireless sensor networks, Automatica, in press. doi:10.1016/j.automatica.2011.06.012.

[10] S. Ganeriwal, R. Kumar, M. B. Srivastava, Timing-sync protocol for sensor networks, in: ACM Conference on Embedded Networked Sensor Systems (SenSys), 2003.

[11] M. Maroti, B. Kusy, G. Simon, A. Ledeczi, The flooding time synchronization protocol, in: ACM Conference on Embedded Networked Sensor Systems (SenSys), 2004.

[12] R. Karp, J. Elson, D. Estrin, S. Shenker, Optimal and global time synchronization in sensornets, Tech. rep., Center for Embedded Networked Sensing, Univ. of California, Los Angeles (2003).

[13] P. Barooah, J. P. Hespanha, Distributed optimal estimation from relative measurements, in: Proceedings of the 3rd International Conference on Intelligent Sensing and Information Processing (ICISIP), 2005, pp. 226–231.

[14] R. Solis, V. S. Borkar, P. R. Kumar, A new distributed time synchronization protocol for multihop wireless networks, in: Proc. of the 45th IEEE Conference on Decision and Control, 2006, pp. 2734–2739.

[15] A. Giridhar, P. R. Kumar, Distributed clock synchronization in wireless networks: Algorithms and analysis (I), in: 45th IEEE Conference on Decision and Control, 2006, pp. 4915–4920.

[16] P. Barooah, J. P. Hespanha, Error scaling laws for optimal estimation from relative measurements, IEEE Transactions on Information Theory 55 (2009) 5661–5673.

[17] O. Costa, M. Fragoso, R. Marques, Discrete-Time Markov Jump Linear Systems, Probability and its Applications, Springer, 2004.

[18] V. Gupta, B. Hassibi, R. M. Murray, Stability analysis of stochastically varying formations of dynamic agents, in: Proceedings of the 42nd IEEE Conference on Decision and Control, Vol. 1, 2003, pp. 504–509. doi:10.1109/CDC.2003.1272613.

[19] Y. Zhang, Y.-P. Tian, Consentability and protocol design of multi-agent systems with stochastic switching topology, Automatica 45 (5) (2009) 1195–1201. doi:10.1016/j.automatica.2008.11.005.

[20] S. Kar, J. M. F. Moura, Distributed consensus algorithms in sensor networks: Quantized data and random link failures, IEEE Transactions on Signal Processing 58 (3) (2010) 1383–1400.

[21] M. Huang, S. Dey, G. N. Nair, J. H. Manton, Stochastic consensus over noisy networks with Markovian and arbitrary switches, Automatica 46 (10) (2010) 1571–1583.

[22] T. Li, J. Zhang, Consensus conditions of multi-agent systems with time-varying topologies and stochastic communication noises, IEEE Transactions on Automatic Control 55 (9) (2010) 2043–2057.

[23] T. Camp, J. Boleng, V. Davies, A survey of mobility models for ad hoc network research, Wireless Communications and Mobile Computing 2 (5) (2002) 483–502.

[24] C. Chatfield, Statistical inference regarding Markov chain models, Applied Statistics 22 (1) (1973) 7–20.

[25] K.-L. Noh, E. Serpedin, K. Qaraqe, A new approach for time synchronization in wireless sensor networks: Pairwise broadcast synchronization, IEEE Transactions on Wireless Communications 7 (9) (2008) 3318–3322.

[26] P. Barooah, J. P. Hespanha, A. Swami, On the effect of asymmetric communication on distributed time synchronization, in: 46th IEEE Conference on Decision and Control, 2007, pp. 5465–5471.

[27] W.-K. Chen, Applied Graph Theory, North Holland Publishing Company, 1971.

[28] G. H. Golub, C. F. Van Loan, Matrix Computations, Johns Hopkins University Press, 1996.

[29] H. Minc, Nonnegative Matrices, Wiley-Interscience, 1988.

[30] M.-Q. Chen, X. Li, An estimation of the spectral radius of a product of block matrices, Linear Algebra and its Applications.

[31] P. Barooah, N. M. da Silva, J. P. Hespanha, Distributed optimal estimation from relative measurements for localization and time synchronization, in: P. B. Gibbons, T. Abdelzaher, J. Aspnes, R. Rao (Eds.), Distributed Computing in Sensor Systems (DCOSS), Vol. 4026 of LNCS, Springer, 2006, pp. 266–281.

[32] C. D. Meyer, Matrix Analysis and Applied Linear Algebra, SIAM: Society for Industrial and Applied Mathematics, 2001.

[33] F. Harary, C. A. Trauth Jr., Connectedness of products of two directed graphs, SIAM Journal on Applied Mathematics 14 (2) (1966) 250–254.

[34] B. Yackley, E. Corona, T. Lane, Bayesian network score approximation using a metagraph kernel, in: D. Koller, D. Schuurmans, Y. Bengio, L. Bottou (Eds.), Advances in Neural Information Processing Systems 21, 2009, pp. 1833–1840.

[35] P. M. Weichsel, The Kronecker product of graphs, Proceedings of the American Mathematical Society 13 (1) (1962) 47–52.

Appendix A. Proof of Lemma 2

All matrices are non-negative hereafter, so we will explicitly say "non-negative" only when we have to stress it. For matrices $X_1$, $X_2$ of the same dimension, we say $X_1$ and $X_2$ are congruent, and write $X_1 \cong X_2$, if the following holds: $(X_1)_{\imath\jmath} \neq 0$ if and only if $(X_2)_{\imath\jmath} \neq 0$. We also write $X_1 \succeq X_2$ if the following condition is satisfied: $(X_1)_{\imath\jmath} \neq 0$ if $(X_2)_{\imath\jmath} \neq 0$. The directed graph $\vec{\mathcal{G}}(X) = (\mathcal{V}(X), \vec{\mathcal{E}}(X))$ corresponding to a square matrix $X \in \mathbb{R}^{n \times n}$ is a graph defined on $n$ nodes in which $(u, v) \in \vec{\mathcal{E}}(X)$ if and only if $X_{u,v} \neq 0$. A directed graph $\vec{\mathcal{G}}$ is called strongly connected if for each pair of nodes $u$ and $v$, there is a sequence of directed edges in $\vec{\mathcal{E}}$ leading from $u$ to $v$ [32]. If $\vec{\mathcal{G}}_1$ is a subgraph of $\vec{\mathcal{G}}_2$, meaning that $\vec{\mathcal{G}}_2$ contains all the nodes and edges of $\vec{\mathcal{G}}_1$, we write $\vec{\mathcal{G}}_1 \subseteq \vec{\mathcal{G}}_2$ or $\vec{\mathcal{G}}_2 \supseteq \vec{\mathcal{G}}_1$. Two directed graphs $\vec{\mathcal{G}}_1$ and $\vec{\mathcal{G}}_2$ are called congruent if their adjacency matrices are congruent. We denote by $\mathrm{Adj}(\vec{\mathcal{G}})$ the adjacency matrix of the graph $\vec{\mathcal{G}}$. For an $n \times n$ square matrix $X$, we write $\mathrm{Adj}(X)$ to denote $\mathrm{Adj}(\vec{\mathcal{G}}(X))$, which is an $n \times n$ matrix with $(\imath, \jmath)$-th entry equal to 1 if and only if $X_{\imath\jmath} > 0$, and 0 otherwise. Essentially, the matrix $\mathrm{Adj}(X)$ replaces the positive entries of $X$ by 1 and leaves the 0 entries untouched.

The following statements for non-negative matrices can be verified in a straightforward manner. All the matrices are of the same dimension.

Proposition 2. 1. $X \cong \mathrm{Adj}(X)$.

2. $\vec{\mathcal{G}}(X_1) \cong \vec{\mathcal{G}}(X_2)$ if and only if $X_1 \cong X_2$.

3. $\vec{\mathcal{G}}(X_1) \supseteq \vec{\mathcal{G}}(X_2)$ if $X_1 \succeq X_2$.

4. $\vec{\mathcal{G}}\big(\sum_{i=1}^{\ell} X_i\big) \cong \cup_{i=1}^{\ell} \vec{\mathcal{G}}(X_i)$.

Proposition 3. The graph $\vec{\mathcal{G}}(X)$ is strongly connected if and only if $X$ is irreducible. If $\vec{\mathcal{G}}(X)$ is strongly connected, then $\vec{\mathcal{G}}(X \otimes X)$ is also strongly connected and thus $X \otimes X$ is irreducible.

The first statement of the proposition is well-known [32, p. 671]. The second statement follows from the first in a straightforward manner.

Now we define the Cartesian product of two directed graphs $\vec{G}_1 = (V_1, \vec{E}_1)$ and $\vec{G}_2 = (V_2, \vec{E}_2)$, which is denoted by $\vec{G}_1 \square \vec{G}_2$. The Cartesian product has the vertex set $V_1 \times V_2$, so that nodes in the product are denoted by pairs $(u, v)$ with $u \in V_1$ and $v \in V_2$; such a pair is not to be confused with an edge. To prevent confusion, in the sequel we denote an edge from $u$ to $v$ by $u \to v$. The edge set of the Cartesian product is characterized by the following property: there is an edge $(u_1, v_1) \to (u_2, v_2)$ in $\vec{G}_1 \square \vec{G}_2$ if either $u_1 = u_2$ and $v_1 \to v_2 \in \vec{E}_2$, or $v_1 = v_2$ and $u_1 \to u_2 \in \vec{E}_1$. Cartesian products of undirected graphs $G_1$ and $G_2$ are defined similarly, except that the resulting product graph is also undirected. The following properties will be useful later.

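Proposition 4. 1. If $\vec{G}_1$ and $\vec{G}_2$ are strongly connected, so is $\vec{G}_1 \square \vec{G}_2$.

2. If $\mathrm{Adj}(X_1)$ and $\mathrm{Adj}(X_2)$ are symmetric, then

$$\mathrm{Adj}\big(\vec{G}(X_1) \square \vec{G}(X_2)\big) = \mathrm{Adj}(X_1) \otimes I + I \otimes \mathrm{Adj}(X_2). \tag{A.1}$$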

Proof of Proposition 4. The first statement is from [33, Table 2]. To prove the second statement, we introduce the notation $G(X)$, which is the undirected graph corresponding to a symmetric matrix $X$. It follows that $\mathrm{Adj}\big(\vec{G}(X_1) \square \vec{G}(X_2)\big) = \mathrm{Adj}\big(G(\mathrm{Adj}(X_1)) \square G(\mathrm{Adj}(X_2))\big)$. From [34, Section 2.3], $\mathrm{Adj}\big(G(\mathrm{Adj}(X_1)) \square G(\mathrm{Adj}(X_2))\big) = \mathrm{Adj}(X_1) \otimes I + I \otimes \mathrm{Adj}(X_2)$. This proves (A.1).
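The identity (A.1) can be verified on a small example by building the Cartesian product directly from its edge definition and comparing with the Kronecker-sum formula. This is a sketch under stated assumptions: the function name `cartesian_adj` and the row-major node index `u * n2 + v` for the pair $(u, v)$ are illustrative choices, and both factor graphs are symmetric with no self-loops.

```python
import numpy as np
from itertools import product

def cartesian_adj(A1, A2):
    """Adjacency matrix of the Cartesian product, built from the edge rule:
    (u1, v1) -> (u2, v2) iff (u1 == u2 and v1 -> v2) or (v1 == v2 and u1 -> u2).
    Node (u, v) is stored at index u * n2 + v."""
    n1, n2 = A1.shape[0], A2.shape[0]
    C = np.zeros((n1 * n2, n1 * n2), dtype=int)
    for u1, v1, u2, v2 in product(range(n1), range(n2), range(n1), range(n2)):
        if (u1 == u2 and A2[v1, v2]) or (v1 == v2 and A1[u1, u2]):
            C[u1 * n2 + v1, u2 * n2 + v2] = 1
    return C

A1 = np.array([[0, 1, 0],      # undirected path 0 - 1 - 2
               [1, 0, 1],
               [0, 1, 0]])
A2 = np.array([[0, 1],         # single undirected edge 0 - 1
               [1, 0]])

lhs = cartesian_adj(A1, A2)
rhs = np.kron(A1, np.eye(2, dtype=int)) + np.kron(np.eye(3, dtype=int), A2)
print(np.array_equal(lhs, rhs))
```

With this index convention, `np.kron(A1, I)` moves the first coordinate and `np.kron(I, A2)` moves the second, which matches the two cases of the edge rule.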

The following results will be useful in the proof of Lemma 2.

Proposition 5. If the union graph $G = \bigcup_{i=1}^{N} G_i$ is connected, then $\bigcup_{i=1}^{N} \vec{G}(F_i)$ is strongly connected.


Proof of Proposition 5. Recall that $F_i = J_i \otimes J_i$, where $J_i = (M_i + D_i)^{-1}(N_i + D_i)$. Since $M_i + D_i$, and therefore its inverse, is a diagonal matrix with positive entries, $J_i \cong N_i + D_i$. By properties 4 and 2 of Proposition 2,

$$\bigcup_{i=1}^{N} \vec{G}(F_i) \cong \vec{G}\left(\sum_{i=1}^{N} F_i\right) \cong \vec{G}\left(\sum_{i=1}^{N} (N_i + D_i) \otimes (N_i + D_i)\right). \tag{A.2}$$

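We also have

$$(N_i + D_i) \otimes (N_i + D_i) \succeq D_i \otimes N_i + N_i \otimes D_i \tag{A.3}$$

by dropping two terms in the expansion using their non-negativity. Since $D_i \cong I$, we get

$$(N_i + D_i) \otimes (N_i + D_i) \succeq I \otimes N_i + N_i \otimes I$$

$$\Rightarrow \sum_{i=1}^{N} \big((N_i + D_i) \otimes (N_i + D_i)\big) \succeq I \otimes \left(\sum_{i=1}^{N} N_i\right) + \left(\sum_{i=1}^{N} N_i\right) \otimes I.$$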

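Using property 3 in Proposition 2, we have

$$\bigcup_{i=1}^{N} \vec{G}(F_i) \supseteq \vec{G}\left(\left(\sum_{i=1}^{N} N_i\right) \otimes I + I \otimes \left(\sum_{i=1}^{N} N_i\right)\right) \cong \vec{G}\left(\sum_{i=1}^{N} N_i\right) \square \vec{G}\left(\sum_{i=1}^{N} N_i\right), \tag{A.4}$$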

where the congruence follows from the second statement in Proposition 4. Recall that $L_i$ is the in-Laplacian matrix of $\vec{G}_i$ with weight matrix $W_i$, and $N_i$ contains the off-diagonal entries of $L_i$ with a change of sign. Because of the way the weighted graph $(\vec{G}_i, W_i)$ is constructed from $G_i$, every pair of nodes $u$ and $v$ that has an undirected edge between them in $G_i$ has two directed edges $u \to v$ and $v \to u$ between them in $\vec{G}_i$. This ensures that (i) $\mathrm{Adj}(G_i) = \mathrm{Adj}(N_i)$, and (ii) $\mathrm{Adj}(L_i)$, and therefore $\mathrm{Adj}(N_i)$, is symmetric. As a result, $\mathrm{Adj}\big(\sum_{i=1}^{N} N_i\big)$ is also symmetric. Since $G = \bigcup_{i=1}^{N} G_i$ is a connected undirected graph, its adjacency matrix is irreducible, which means that $\mathrm{Adj}\big(\bigcup_{i=1}^{N} G_i\big) \cong \sum_{i=1}^{N} \mathrm{Adj}(G_i) = \sum_{i=1}^{N} \mathrm{Adj}(N_i)$ is irreducible. By Proposition 3, $\vec{G}\big(\sum_{i=1}^{N} N_i\big)$ is therefore strongly connected, and by the first statement in Proposition 4, so is $\vec{G}\big(\sum_{i=1}^{N} N_i\big) \square \vec{G}\big(\sum_{i=1}^{N} N_i\big)$. The result of this proposition now follows from (A.4).
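Proposition 5 can be illustrated on a three-node example in which neither graph is connected on its own but their union is a path. The matrices $M_i$ and $D_i$ are defined in the main text, not in this appendix, so the sketch makes two labeled assumptions: $D_i = I$, and $M_i$ is the diagonal matrix of row sums of $N_i$. Only the properties used in the proof (diagonal, positive $M_i + D_i$; $D_i \cong I$) matter for the outcome. The helper names are hypothetical.

```python
import numpy as np

def is_strongly_connected(X):
    """Boolean reachability test on the digraph with edges where X != 0."""
    B = np.asarray(X) != 0
    n = B.shape[0]
    R = np.eye(n, dtype=bool) | B
    for _ in range(n):
        R = R | ((R.astype(int) @ R.astype(int)) > 0)
    return bool(R.all())

def J_matrix(N_i):
    """J_i = (M_i + D_i)^{-1} (N_i + D_i).
    Assumptions for illustration: D_i = I and M_i = diag(row sums of N_i),
    so the inverse of the diagonal matrix is a row-wise division."""
    n = N_i.shape[0]
    diag = N_i.sum(axis=1) + 1.0
    return (N_i + np.eye(n)) / diag[:, None]

# G_1 has only the edge {0, 1}; G_2 has only the edge {1, 2}.
N1 = np.array([[0.0, 1.0, 0.0],
               [1.0, 0.0, 0.0],
               [0.0, 0.0, 0.0]])
N2 = np.array([[0.0, 0.0, 0.0],
               [0.0, 0.0, 1.0],
               [0.0, 1.0, 0.0]])

F = [np.kron(J, J) for J in (J_matrix(N1), J_matrix(N2))]
union_pattern = sum((Fi != 0).astype(int) for Fi in F)

print(is_strongly_connected(N1 + N2))               # union graph G
print([is_strongly_connected(Fi) for Fi in F])      # each G(F_i) alone
print(is_strongly_connected(union_pattern))         # union of the G(F_i)
```

In this example each $\vec{G}(F_i)$ fails to be strongly connected by itself, while their union on the $n^2 = 9$ product nodes is, as the proposition states.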

The Kronecker product of two graphs $\vec{G}_1 = (V_1, \vec{E}_1)$ and $\vec{G}_2 = (V_2, \vec{E}_2)$, denoted by $\vec{G}_1 \otimes \vec{G}_2$, has the vertex set $V_1 \times V_2$ and an edge set that is characterized by the following property: there is an edge $(u_1, v_1) \to (u_2, v_2)$ in $\vec{G}_1 \otimes \vec{G}_2$ if and only if $u_1 \to u_2 \in \vec{E}_1$ and $v_1 \to v_2 \in \vec{E}_2$ [33]. Note that the Cartesian and Kronecker products $\vec{G}_1 \square \vec{G}_2$ and $\vec{G}_1 \otimes \vec{G}_2$ have the same vertex set but distinct edge sets. We have the following property of the Kronecker product of graphs from [35]:

$$\mathrm{Adj}(\vec{G}_1 \otimes \vec{G}_2) = \mathrm{Adj}(\vec{G}_1) \otimes \mathrm{Adj}(\vec{G}_2). \tag{A.5}$$

The adjacency matrices of both the Cartesian and the Kronecker product of two graphs are related to the adjacency matrices of the individual graphs through the matrix Kronecker product; cf. (A.1) and (A.5).
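Property (A.5) can be checked in the same way as (A.1): build the Kronecker product graph from its edge rule and compare with `numpy.kron`. The function name `kronecker_graph_adj` and the node index `u * n2 + v` for the pair $(u, v)$ are illustrative assumptions.

```python
import numpy as np
from itertools import product

def kronecker_graph_adj(A1, A2):
    """Adjacency matrix of the Kronecker product graph, from the edge rule:
    (u1, v1) -> (u2, v2) iff u1 -> u2 in G1 and v1 -> v2 in G2.
    Node (u, v) is stored at index u * n2 + v."""
    n1, n2 = A1.shape[0], A2.shape[0]
    K = np.zeros((n1 * n2, n1 * n2), dtype=int)
    for u1, v1, u2, v2 in product(range(n1), range(n2), range(n1), range(n2)):
        if A1[u1, u2] and A2[v1, v2]:
            K[u1 * n2 + v1, u2 * n2 + v2] = 1
    return K

A1 = np.array([[0, 1, 0],      # directed 3-cycle 0 -> 1 -> 2 -> 0
               [0, 0, 1],
               [1, 0, 0]])
A2 = np.array([[1, 1],         # two nodes with self-loops and the edge 0 -> 1
               [0, 1]])

print(np.array_equal(kronecker_graph_adj(A1, A2), np.kron(A1, A2)))
```

Unlike the Cartesian product, an edge here requires a simultaneous move in both coordinates, which is why self-loops in $\vec{G}(I)$ are needed in the proof of Lemma 2 to keep the second coordinate fixed.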

We are now ready to prove Lemma 2.

Proof of Lemma 2. (Connectivity ⇒ irreducibility): Here we have to prove that if the union graph $G$ is connected, then the matrix $D$ is irreducible. We prove this by showing that the directed graph $\vec{G}(D)$ is strongly connected. Let $Z_j$ and $S_j$ be the diagonal and off-diagonal parts of $F_j$. Since $Z_j$ is a non-negative matrix with positive diagonal, we get

$$D \cong [p_{ji} Z_j] + [p_{ji} S_j] \cong [p_{ji} I_{n^2}] + [p_{ji} S_j] \succeq P^T \otimes I + \mathrm{diag}[p_{ii} S_i], \tag{A.6}$$

where we have used the fact that $[p_{ji} I_{n^2}] = P^T \otimes I$ and dropped the off-diagonal blocks of $[p_{ji} S_j]$. Therefore

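$$\vec{G}(D) \supseteq \vec{G}(P^T \otimes I) \cup \vec{G}(\mathrm{diag}[S_i]) = \big(\vec{G}(P^T) \otimes \vec{G}(I)\big) \cup \vec{G}(\mathrm{diag}[S_i]),$$

where the equality follows from the property (A.5) of the Kronecker product of graphs. We will now show that the directed graph $\big(\vec{G}(P^T) \otimes \vec{G}(I)\big) \cup \vec{G}(\mathrm{diag}[S_i])$ is strongly connected, which proves that $\vec{G}(D)$ is as well.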

First notice that there are $N n^2$ nodes in the graph $\vec{G}(D)$. For later use, we index these nodes by the pair $(p_i, d_\kappa)$, where $i = 1, \dots, N$ and $\kappa = 1, \dots, n^2$. It is convenient to imagine both the matrices $P^T \otimes I$ and $\mathrm{diag}[S_i]$ as $N \times N$ block matrices, with each block being of dimension $n^2 \times n^2$. The node with index $(p_i, d_\kappa)$ refers to the $\kappa$-th row (among $n^2$) in the $i$-th row block (among $N$). To prove that $\big(\vec{G}(P^T) \otimes \vec{G}(I)\big) \cup \vec{G}(\mathrm{diag}[S_i])$ is strongly connected, we need to show the following:

There is a path from an arbitrary node $(p_i, d_\kappa)$ to another arbitrary node $(p_j, d_\nu)$ in the graph $\big(\vec{G}(P^T) \otimes \vec{G}(I)\big) \cup \vec{G}(\mathrm{diag}[S_i])$. (A.7)


The following properties will be used to construct a proof of (A.7):

1. s1: There exists a path from $(p_i, d_\kappa)$ to $(p_j, d_\kappa)$ in $\vec{G}(P^T \otimes I)$ for all $i, j = 1, \dots, N$ and $\kappa = 1, \dots, n^2$.

2. s2: If $d_\kappa \to d_h$ is an edge in $\vec{G}(S_\ell)$, then $(p_\ell, d_\kappa) \to (p_\ell, d_h)$ is an edge in $\vec{G}(\mathrm{diag}[S_i])$.

The first statement is proved as follows. Since the Markov chain is ergodic, $P$, and therefore $P^T$, is irreducible, which means $\vec{G}(P^T)$ is strongly connected. Thus, given arbitrary nodes $p$ and $q$ in $\vec{G}(P^T)$, there is a path connecting them in $\vec{G}(P^T)$. Call this path $p, u_1, u_2, \dots, u_m, q$. Since the edge $d_\kappa \to d_\kappa \in \vec{G}(I)$ exists for every $\kappa$, it follows from the definition of the Kronecker product of graphs that the path $(p, d_\kappa), (u_1, d_\kappa), (u_2, d_\kappa), \dots, (u_m, d_\kappa), (q, d_\kappa)$ exists in $\vec{G}(P^T) \otimes \vec{G}(I)$ for every $\kappa = 1, \dots, n^2$. The statement s1 is now proved upon replacing $p$ and $q$ by $p_i$ and $p_j$. The statement s2 is true because of the structure of the matrix $\mathrm{diag}[S_i]$ and the node indexing scheme described immediately before (A.7).

From Proposition 5, we have that $\bigcup_{i=1}^{N} \vec{G}(F_i)$ is strongly connected. Since $S_i$ is the off-diagonal part of $F_i$, $\bigcup_{i=1}^{N} \vec{G}(S_i)$ is strongly connected as well, because the diagonal entries only determine self-loops and do not affect connectivity. Therefore, there is a path from an arbitrary node $d_\kappa$ to another arbitrary node $d_\nu$ in $\bigcup_{i=1}^{N} \vec{G}(S_i)$, for all $\kappa, \nu \in \{1, \dots, n^2\}$. To prove the statement (A.7), pick such a path from the node $d_\kappa$ to the node $d_\nu$ in $\bigcup_{i=1}^{N} \vec{G}(S_i)$, where each edge in the path may lie in any of the graphs $\{\vec{G}(S_i)\}_{i=1}^{N}$.

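For the sake of concreteness and compactness, let us consider a path of length two, consisting of the two edges $d_\kappa \to d_h$ and $d_h \to d_\nu$, which belong to the graphs, say, $\vec{G}(S_\ell)$ and $\vec{G}(S_m)$, respectively. From s1, proved above, we know that there is a path from the node $(p_i, d_\kappa)$ to the node $(p_\ell, d_\kappa)$ in the graph $\vec{G}(P^T \otimes I)$; call this path $\mathrm{path}[(p_i, d_\kappa) \leadsto (p_\ell, d_\kappa)]$. From s2, the edge $(p_\ell, d_\kappa) \to (p_\ell, d_h)$ exists in the graph $\vec{G}(\mathrm{diag}[S_i])$ due to the existence of the edge $d_\kappa \to d_h$ in $\vec{G}(S_\ell)$. Thus, we have a path from $(p_i, d_\kappa)$ to $(p_\ell, d_h)$ in the combined graph $\big(\vec{G}(P^T) \otimes \vec{G}(I)\big) \cup \vec{G}(\mathrm{diag}[S_i])$ by joining the path $\mathrm{path}[(p_i, d_\kappa) \leadsto (p_\ell, d_\kappa)]$ with the edge $(p_\ell, d_\kappa) \to (p_\ell, d_h)$. Using this idea repeatedly, we construct a path from $(p_i, d_\kappa)$ to $(p_j, d_\nu)$ in $\big(\vec{G}(P^T) \otimes \vec{G}(I)\big) \cup \vec{G}(\mathrm{diag}[S_i])$ as follows:

$$\begin{aligned}
&\mathrm{path}[(p_i, d_\kappa) \leadsto (p_\ell, d_\kappa)] && \text{in } \vec{G}(P^T) \otimes \vec{G}(I), \\
&(p_\ell, d_\kappa) \to (p_\ell, d_h) && \in \vec{G}(\mathrm{diag}[S_i]), \\
&\mathrm{path}[(p_\ell, d_h) \leadsto (p_m, d_h)] && \text{in } \vec{G}(P^T) \otimes \vec{G}(I), \\
&(p_m, d_h) \to (p_m, d_\nu) && \in \vec{G}(\mathrm{diag}[S_i]), \\
&\mathrm{path}[(p_m, d_\nu) \leadsto (p_j, d_\nu)] && \text{in } \vec{G}(P^T) \otimes \vec{G}(I),
\end{aligned}$$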

where each $\mathrm{path}[\cdot]$ exists due to the property s1 established above, and each edge exists due to the property s2 together with the assumed existence of the edges $d_\kappa \to d_h$ and $d_h \to d_\nu$ in the union graph. This argument can be extended to a path of any length between $d_\kappa$ and $d_\nu$ in the union graph $\bigcup_{i=1}^{N} \vec{G}(S_i)$. Thus, there is a path from $(p_i, d_\kappa)$ to $(p_j, d_\nu)$ in $\big(\vec{G}(P^T) \otimes \vec{G}(I)\big) \cup \vec{G}(\mathrm{diag}[S_i])$, which proves sufficiency.
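The sufficiency argument can be checked numerically on a two-state example. The sketch below is built on assumptions that go beyond this appendix: $D$ is taken to be the $N \times N$ block matrix whose $(i, j)$ block is $p_{ji} F_j$, as suggested by (A.6); $D_i = I$ and $M_i$ is the diagonal matrix of row sums of $N_i$; and the transition matrix `P` holds illustrative values with all entries positive. Helper names are hypothetical.

```python
import numpy as np

def is_strongly_connected(X):
    """Boolean reachability test on the digraph with edges where X != 0."""
    B = np.asarray(X) != 0
    n = B.shape[0]
    R = np.eye(n, dtype=bool) | B
    for _ in range(n):
        R = R | ((R.astype(int) @ R.astype(int)) > 0)
    return bool(R.all())

def F_matrix(N_i):
    """F_i = J_i (x) J_i with J_i = (M_i + D_i)^{-1} (N_i + D_i).
    Assumptions: D_i = I, M_i = diag(row sums of N_i)."""
    n = N_i.shape[0]
    J = (N_i + np.eye(n)) / (N_i.sum(axis=1) + 1.0)[:, None]
    return np.kron(J, J)

# Two graphs on n = 3 nodes whose union is the path 0 - 1 - 2.
N1 = np.array([[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 0.0]])
N2 = np.array([[0.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.0, 1.0, 0.0]])
F = [F_matrix(N1), F_matrix(N2)]

# Ergodic two-state chain (illustrative transition probabilities).
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
N_states = len(F)

# Assumed structure, consistent with (A.6): block (i, j) of D is p_ji * F_j.
D = np.block([[P[j, i] * F[j] for j in range(N_states)]
              for i in range(N_states)])

print(D.shape)                     # N * n^2 = 18 rows and columns
print(is_strongly_connected(D))    # irreducibility of D
```

Replacing `N2` by a copy of `N1` makes the union graph disconnected (node 2 is isolated), and the same check then reports that $D$ is reducible.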

(Not connected ⇒ reducible): A simple counterexample proves necessity. Construct a trivial Markov chain with a single state: $G = G_1$ (so that $P = 1$), where $G_1$ is an $n$-node graph without a single edge. Then $D = F_1 = J_1 \otimes J_1 = I$, which is reducible.
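The counterexample can be reproduced in a few lines. As before, the sketch assumes $D_1 = I$ and $M_1$ equal to the diagonal matrix of row sums of $N_1$ (zero for an edgeless graph), which yields $J_1 = I$ as stated above; the value `n = 3` and the helper name are illustrative.

```python
import numpy as np

def is_strongly_connected(X):
    """Boolean reachability test on the digraph with edges where X != 0."""
    B = np.asarray(X) != 0
    n = B.shape[0]
    R = np.eye(n, dtype=bool) | B
    for _ in range(n):
        R = R | ((R.astype(int) @ R.astype(int)) > 0)
    return bool(R.all())

n = 3
N1 = np.zeros((n, n))                        # G_1 has no edges

# Assumptions: D_1 = I and M_1 = diag(row sums of N_1) = 0.
J1 = (N1 + np.eye(n)) / (N1.sum(axis=1) + 1.0)[:, None]

# Single-state chain, P = [1], so D = F_1 = J_1 (x) J_1.
D = np.kron(J1, J1)

print(np.array_equal(D, np.eye(n * n)))      # D is the identity
print(is_strongly_connected(D))              # identity is reducible for n >= 2
```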
