MULTISCALE DISTRIBUTED ESTIMATION
WITH APPLICATIONS TO GPS AUGMENTATION
AND NETWORK SPECTRA
A DISSERTATION
SUBMITTED TO THE DEPARTMENT OF AERONAUTICS AND
ASTRONAUTICS
AND THE COMMITTEE ON GRADUATE STUDIES
OF STANFORD UNIVERSITY
IN PARTIAL FULFILLMENT OF THE REQUIREMENTS
FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY
Christina Selle
June 2010
http://creativecommons.org/licenses/by-nc/3.0/us/
This dissertation is online at: http://purl.stanford.edu/zg345yw6848
© 2010 by Christina Selle. All Rights Reserved.
Re-distributed by Stanford University under license with the author.
This work is licensed under a Creative Commons Attribution-Noncommercial 3.0 United States License.
I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.
Matthew West, Primary Adviser
I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.
Sanjay Lall, Co-Adviser
I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.
Per Enge
Approved for the Stanford University Committee on Graduate Studies.
Patricia J. Gumport, Vice Provost Graduate Education
This signature page was generated electronically upon submission of this dissertation in electronic format. An original signed hard copy of the signature page is on file in University Archives.
Abstract
Distributed estimation uses a network of sensors to measure a set of variables. The
computation tasks required for finding the optimal estimate can be divided among
the sensor nodes in a way that can be implemented as an iterative process using
nodes with little computational power. Most algorithms for distributed estimation
work for small networks, but convergence rates decrease with network size, making
them impractical for use in large networks. We present a consensus algorithm with a
convergence rate that scales logarithmically with network size by arranging nodes in
a multigrid network structure. The algorithm can adapt to changes in the network
structure and allows for selection of several parameters, representing a trade-off be-
tween performance and robustness of the network. We also describe how the algorithm
is adapted to account for time-varying measurements and measurement weights.
We present two applications of these methods. Our first application is an algo-
rithm that allows us to determine the spectral properties of a state transition matrix
on the network. Since the convergence rate of a consensus algorithm is related to
the spectral properties of the state transition matrix, we can use this information to
evaluate the effects of changes to the network structure.
Our second application is a distributed GPS augmentation system. Traditional
GPS augmentation systems use reference receivers to find a set of error correction
values, which is broadcast to surrounding mobile receivers. Our distributed augmen-
tation system uses only mobile receivers with unknown locations, which are able to
obtain a set of correction values by sharing and processing data in a distributed net-
work. The resulting method can be used to improve GPS point positioning accuracy
in areas where fixed augmentation systems are not available.
Acknowledgments
This work was supported by a William R. and Sara Hart Kimball Stanford Graduate
Fellowship, and I am deeply thankful to the Kimball family for this support.
I would like to thank my adviser, Matt West, for all of the great ideas, guidance,
advice, suggestions, encouragement, LaTeX tips, and math lessons he shared with me
during my time at Stanford.
I also want to express my gratitude to Sanjay Lall, who stepped in as my official
Stanford adviser half-way through my journey. Per Enge and Sheri Sheppard have
been great professors for me to work with as a teaching assistant, and have also
provided valuable advice.
Sigrid Close and Ellen Kuhl provided some helpful feedback during and after my
Ph.D. oral examination.
I could not have made it through graduate school without the support of my
friends and family, including my parents Hartmut and Marie-Luise Mester, my sister
Mareike Mester, my grandparents, my friends Adam Grossman, Fraser Cameron,
Marianne Karplus, Tracy Rubin, the Carlstrom family, and the group of Aeronautics
and Astronautics graduate students who shared this journey with me.
Last but not least, I would like to thank my husband Andrew Selle for all of his
love and support.
Contents
Abstract iv
Acknowledgments v
1 Introduction 1
2 Multiscale consensus algorithms 5
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.2 Construction of a multilevel network . . . . . . . . . . . . . . . . . . 8
2.3 Invariant distribution offset factor determination . . . . . . . . . . . . 12
2.4 Adjusting self-weights for improved performance . . . . . . . . . . . . 14
2.5 Adjusting the network for broken edges and nodes . . . . . . . . . . . 16
2.6 Performance and Robustness trade-offs . . . . . . . . . . . . . . . . . 18
2.7 Two dimensional numerical example . . . . . . . . . . . . . . . . . . 22
2.8 Measurement updates . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.9 Sensor weights . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
2.10 Network spectral properties . . . . . . . . . . . . . . . . . . . . . . . 26
2.11 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
3 Distributed GPS augmentation 34
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
3.2 Position solution for a single receiver . . . . . . . . . . . . . . . . . . 39
3.2.1 Gauss-Newton method . . . . . . . . . . . . . . . . . . . . . . 40
3.2.2 Gradient descent method . . . . . . . . . . . . . . . . . . . . . 41
3.2.3 Newton’s method . . . . . . . . . . . . . . . . . . . . . . . . . 41
3.2.4 Comparison of different methods . . . . . . . . . . . . . . . . 43
3.3 Multiple receivers with delay estimation . . . . . . . . . . . . . . . . 44
3.3.1 Gauss-Newton method . . . . . . . . . . . . . . . . . . . . . . 45
3.3.2 Accuracy and sensitivity to random errors . . . . . . . . . . . 47
3.3.3 Regularized delay estimation . . . . . . . . . . . . . . . . . . . 49
3.4 Distributed delay estimation . . . . . . . . . . . . . . . . . . . . . . . 50
3.4.1 Regularized distributed delay estimation . . . . . . . . . . . . 52
3.4.2 Comparison of the different methods . . . . . . . . . . . . . . 53
3.5 Performance Comparison . . . . . . . . . . . . . . . . . . . . . . . . . 53
3.6 Multigrid methods for distributed delay estimation . . . . . . . . . . 61
3.7 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
4 Distributed spectral methods 65
4.1 Introduction and Assumptions . . . . . . . . . . . . . . . . . . . . . . 66
4.2 Spectral methods for symmetric matrices . . . . . . . . . . . . . . . . 67
4.3 Adapting spectral methods for distributed networks . . . . . . . . . . 69
4.4 Spectral methods for nonsymmetric matrices . . . . . . . . . . . . . . 72
4.5 Distributed concurrent computation of eigenvalues . . . . . . . . . . . 73
4.6 Numerical Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
4.7 Using spectral information for supernode placement . . . . . . . . . . 80
4.8 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
A Distributed spectral algorithms 85
A.1 Power method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
A.2 QR-factorization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
List of Tables
3.1 Typical GPS error budget (RMS values). . . . . . . . . . . . . . . . . 36
List of Figures
2.1 Simple two-level network with five base-level nodes (gray) and two
supernodes (black). The base-level nodes form a ring. . . . . . . . . . 7
2.2 Transition matrix for Theorem 2.2.1 . . . . . . . . . . . . . . . . . . . 11
2.3 Linear system for Theorem 2.2.1. . . . . . . . . . . . . . . . . . . . . 11
2.4 State transition matrix with adjusted supernode self-weights. . . . . . 15
2.5 Comparison of convergence for a ring network with three levels using
Metropolis weights and the multigrid weights described here with and
without supernode self-weight adjustments. . . . . . . . . . . . . . . . 16
2.6 Spectral gap vs. number of nodes in the base level . . . . . . . . . . . 19
2.7 Centralization Robustness vs. Performance trade-off . . . . . . . . . . 20
2.8 Performance vs. Robustness for various α and β values. The red cross
indicates the parameter values chosen for subsequent numerical examples. 21
2.9 Example network layout . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.10 Convergence results for the example network. . . . . . . . . . . . . . 23
2.11 Eigenvalues of network with 400 base level nodes with various numbers
of levels. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
2.12 Selected eigenvectors of a single level ring with 400 nodes. . . . . . . 28
2.13 Convergence times of different ring-shaped networks given the eigen-
vectors of a single level ring as starting value. . . . . . . . . . . . . . 29
2.14 Eigenvalues of network with various numbers of levels. . . . . . . . . 30
2.15 v2 of the base level of the network shown in figure 2.9. . . . . . . . . . 31
2.16 v6 of the base level of the network shown in figure 2.9. . . . . . . . . . 31
2.17 v30 of the base level of the network shown in figure 2.9. . . . . . . . . 32
2.18 Convergence times of different networks given the eigenvectors of a
single level network as starting value. . . . . . . . . . . . . . . . . . . 32
3.1 Convergence for 50 receivers without delay estimation . . . . . . . . . 43
3.2 Effect of including delay estimation on position estimates . . . . . . . 48
3.3 Convergence for 500 receivers without delay estimation . . . . . . . . 54
3.4 Convergence for 50 receivers with delay estimation . . . . . . . . . . . 54
3.5 Convergence for 500 receivers with delay estimation . . . . . . . . . . 55
3.6 Mean positioning error as a function of the number of satellites . . . 55
3.7 Mean objective value function per receiver as a function of the number
of receivers, with and without delay estimation, and for a hypothetical
case where correlated delays are set to zero. . . . . . . . . . . . . . . 56
3.8 Ratio of total position error without and with delay estimation . . . . 57
3.9 Ratio of total position error without and with delay estimation in an
extended network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
3.10 Ratio of total position error without and with delay estimation with
large multipath error . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
3.11 Ratio of total position error without and with delay estimation in an
extended network with large multipath errors . . . . . . . . . . . . . 60
3.12 Example network layout . . . . . . . . . . . . . . . . . . . . . . . . . 62
3.13 Positioning error convergence for the receiver network example. . . . 63
3.14 Objective function value for the receiver network example. . . . . . . 63
4.1 Convergence of the orthogonal basis for the distributed QR method . 77
4.2 Convergence of the orthogonal basis for the distributed power method 77
4.3 Convergence of the eigenvectors for the distributed QR method. . . . 78
4.4 Convergence of the eigenvectors for the distributed power method. . . 78
4.5 Convergence of the eigenvalues for the distributed QR method. . . . . 79
4.6 Convergence of the eigenvalues for the distributed power method. . . 79
4.7 Network from figure 2.9 in v2-v3 space. . . . . . . . . . . . . . . . . . 81
4.8 Final supernode placements in v2-v3 space. . . . . . . . . . . . . . . . 82
4.9 Final supernode placements in x-y space. . . . . . . . . . . . . . . . . 83
Chapter 1
Introduction
CHAPTER 1. INTRODUCTION 2
This thesis describes a distributed multigrid consensus algorithm, as well as ap-
plications of this algorithm to GPS augmentation and graph-spectrum computations.
Distributed estimation algorithms are used to provide optimal estimates of a vari-
able, based on a set of measurements taken by a network of sensors. Distributed
estimation algorithms have several advantages and disadvantages compared to cen-
tralized algorithms. While centralized algorithms require the availability of a single
processor that is capable of running the estimation algorithm, distributed methods
divide the computational tasks into smaller tasks that can be performed by nodes
with lower computational capabilities. For the algorithms described in this thesis, we
assume that the sensors themselves have some computational capabilities and form
the network of nodes that runs the distributed estimation algorithms. Distributed
methods can also be more robust than centralized methods, in many cases making it
possible to obtain good results even if some of the nodes or communication links in
the network fail.
The networks used here for distributed estimation can be modeled as graphs,
where the sensors are the nodes or vertices of the graph, and the communication
links between nodes form the edges. Every node can store some limited amount of
data for later use, and thus is modeled as having a self-loop. We also assume that
all communication links are two-way links, but that weights associated with different
directions of transmission between two nodes do not have to be equal, making the
network a directed graph. At every discrete time step, each node receives data from
adjacent nodes, and updates its stored variables.
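As a minimal illustration of this update model, the sketch below represents such a network as a weighted directed graph with self-loops and applies the synchronous update. The three-node line graph, its weights, and the initial values are assumptions made for this example, not a construction from the thesis.

```python
# Sketch of the synchronous network update described above (illustrative
# assumptions: a three-node line graph, hand-picked weights, arbitrary
# initial values).  Each node keeps a self-loop weight plus weights for its
# incoming edges; at every discrete time step all nodes simultaneously
# replace their value with a weighted average of their neighbors' values.

def synchronous_step(weights, x):
    """weights[i] maps neighbor j (including i itself) to the weight that
    node i gives to j's value; x maps node -> current value."""
    return {i: sum(w * x[j] for j, w in nbrs.items())
            for i, nbrs in weights.items()}

# Three nodes in a line; the weights at each node sum to one.
weights = {
    0: {0: 0.5, 1: 0.5},
    1: {0: 0.25, 1: 0.5, 2: 0.25},
    2: {1: 0.5, 2: 0.5},
}
x = {0: 0.0, 1: 3.0, 2: 6.0}
for _ in range(100):
    x = synchronous_step(weights, x)
# All node values approach a common consensus value.
```

Because the weights two nodes assign to each other need not be equal, the agreed-upon value is in general a weighted average of the initial values, which is exactly why the weight choices studied in chapter 2 matter.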
Chapter 2 describes an algorithm for distributed consensus. While consensus is a
very basic operation for a distributed network to perform, there are many complex
computations that can be reduced to a combination of consensus steps and simple
operations that can be performed by each node in the network individually. The con-
sensus algorithm described in chapter 2 is different from other consensus algorithms
in that it uses a multigrid network structure. Multigrid methods are a tool commonly
used for improving convergence rates of algorithms for solving differential equations
by using several levels of increasing resolution in the discretization. Chapter 2 shows
how a multigrid structure can be created to run a consensus algorithm in a distributed
network. In addition, the performance and robustness trade-offs of this algorithm are
studied, and convergence rates and their dependencies on noise characteristics are
compared to those of single level networks. Chapter 2 also proposes some extensions
of the basic multigrid algorithm for measurement updates and assigning weights to
the node measurements.
Chapter 3 describes how distributed methods, including the multigrid algorithm
from chapter 2, can be used to create a distributed GPS augmentation system. Tradi-
tional GPS augmentation systems use a reference station to create error corrections,
which are broadcast to mobile receivers and used in point positioning. The aug-
mentation system described here does not use a fixed reference receiver, but instead
calculates correction terms based only on the measurements obtained from a network
of mobile receivers. If distributed methods are used, the augmentation system also
does not require the use of a centralized station to compute the corrections, since all
computation is done by the network of receivers.
Chapter 4 describes a distributed eigenvalue method for nonsymmetric matrices.
Most eigenvalue methods are difficult to adapt to distributed systems due to their
dependence on matrix factorization, but the algorithm presented here can be reduced
to a series of consensus processes and simple computations, and can therefore be
run on a distributed network. This is of particular interest since it can be used to
find a worst-case estimate of the convergence rate of a consensus algorithm, and thus
monitor the status of the network if the structure of the network changes over time.
If the right number of levels is selected in the construction of the multigrid net-
works described in this thesis, the convergence rate scales logarithmically with net-
work size, making them practical for use in very large networks. Since microcontrollers
and microprocessors are included in a wide variety of devices, and wireless communi-
cation is becoming more and more ubiquitous, there are many potential application
areas where distributed estimation and control could be applied to large networks.
The main contributions of this thesis can be summarized as follows. A novel
multigrid algorithm for distributed consensus is presented, along with analysis of the
trade-offs between robustness and performance that occur when various parameters
are selected for this algorithm. The convergence of this consensus method is com-
pared to other single-level methods under various noise conditions. Chapter 3 includes
an algorithm for a distributed GPS augmentation system, which differs from exist-
ing augmentation systems in that it requires neither stationary reference receivers
with known positions, nor reference stations for centralized computations. Chapter
4 extends an existing distributed eigenvalue method for symmetric matrices to non-
symmetric matrices, while also describing how the power method can be adapted for
distributed systems. The spectral information of a network is then used for deter-
mining appropriate supernode locations in a network.
A review of the relevant literature is provided in the introduction section of each
of the individual chapters in this thesis. Conclusions and some ideas for future work
are provided at the end of each chapter.
Chapter 2
Multiscale networks for distributed
consensus algorithms
CHAPTER 2. MULTISCALE CONSENSUS ALGORITHMS 6
2.1 Introduction
Distributed consensus algorithms [11][30][45] allow a network of computational nodes
to iteratively exchange information between neighbors in order to compute the global
average of a quantity. They can be used as the basis for many applications, such
as distributed optimization methods [16] or control schemes [6][33]. While typically
less efficient than a centralized algorithm, consensus methods have the advantages of
distributing the work across all nodes in the network and of being robust to node and
connection failure.
The general framework for consensus methods considers each node synchronously
updating its own value to a weighted average of the current values of its neighbors
(as distinct from asynchronous gossip algorithms [3], for example). One of the most
natural questions, therefore, is what graph structure and what weights should be
chosen to give the fastest convergence of the algorithm to the consensus value while
guaranteeing convergence [7].
The choice of optimal weights has been investigated in depth by [2][39][44], who
used convex optimization and semidefinite programming to find the weights that
minimize the magnitude of the second largest eigenvalue of the Markov chain defined
by the consensus update. While such an approach gives the optimal choice of weights,
it requires a centralized scheme for solving the optimization problem for the weights.
An alternative to solving for the optimal weights is to choose a graph structure
that gives fast consensus with some weight choice. This can be done by optimiza-
tion [15], or by using a heuristic such as taking advantage of the fact that small-world
networks [42] have fast consensus [40] and thus trying to add edges or nodes to en-
hance this property. Other possibilities include making the node updates random [19]
or otherwise time-varying [20][29]. Networks can also have time-varying inputs or
topologies due to the nature of the network rather than the consensus algorithm [31].
In this chapter we present an alternate scheme for producing a network to achieve
fast consensus, based on the idea of multiscale networks. A simple example of a
multiscale network with two levels is shown in figure 2.1. Figure 2.9 in section 2.7
shows a more complex network with three levels, which is used for numerical examples
Figure 2.1: Simple two-level network with five base-level nodes (gray) and two su-pernodes (black). The base-level nodes form a ring.
throughout this thesis. We observe that a regular consensus method is similar to
using Jacobi’s method to solve the equation Lx = 0, where L is the graph Laplacian
or a similar matrix. Unfortunately, the convergence rate of Jacobi’s method is poor
and scales badly as system size grows [36]. This is due to the fact that errors that
vary slowly across the network are only slowly driven to zero by the Jacobi iteration,
which uses only nearest-neighbor updates. One standard way of overcoming these
deficiencies is to use multilevel algorithms, such as the multigrid method [41], where
coarsened versions of the base-level graph are used to enhance the decay of slowly
varying components.
We build on this insight and give an algorithm for constructing multilevel networks
for consensus. The basic multilevel network construction is presented in section 2.2,
with a heuristic for adjusting the weights to enhance convergence in section 2.4. An
algorithm for adjusting the edge weights in the presence of node and edge failures
is given in section 2.5 and the trade-off between performance and robustness is in-
vestigated numerically in section 2.6. Section 2.7 presents a numerical example for
a randomly generated graph embedded in 2D. Section 2.8 describes how the algo-
rithm can be used if node measurements are time-varying, and section 2.9 presents
equations for adjusting the algorithm for calculating a weighted average of node mea-
surements. Finally, in section 2.10 we present some examples for the changes to
the network spectral properties due to adding a multigrid structure, and how this
influences correlations between noise spatial frequency and convergence rates.
While the algorithm described in this chapter only finds the mean of a single
variable, it can be used as a basis for performing many more complex computations.
For example, the variance of the node measurements can be found using a sequence
of two consensus operations, the first to find the mean of the measurement values,
and the second to average the squared deviations from the mean. Some applications,
including the distributed GPS augmentation system described in chapter 3 and the
spectral algorithm described in chapter 4 require adding vectors, which can be done
by simply letting the state x of the network be a matrix, where each node stores the
information contained in one row of the state matrix.
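This two-pass pattern can be sketched as follows; the exact-average helper stands in for a converged consensus run, and the measurement values are invented for the example.

```python
# Variance via two consensus operations, as described above.  A converged
# consensus run is idealized here as an exact network-wide average; the
# measurement values are invented for illustration.

def consensus_average(values):
    # Stand-in for the limit of a distributed consensus iteration.
    return sum(values) / len(values)

measurements = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# First consensus: every node learns the mean of the measurement values.
mean = consensus_average(measurements)

# Second consensus: averaging the squared deviations yields the variance.
variance = consensus_average([(m - mean) ** 2 for m in measurements])
```

For these values the first pass yields a mean of 5.0 and the second a (population) variance of 4.0; in the distributed setting each average would instead be the limit of the iterative consensus process.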
2.2 Construction of a multilevel network
By a multilevel network we mean one where nodes are arranged in levels or classes.
All nodes are not equal in their connection structures, but are grouped. In a spatially
embedded network, lower levels contain more nodes and have physically short-range
connections, while higher levels contain fewer nodes that have longer-range connec-
tions. This thus mimics the multiscale structure generated by multilevel algorithms
such as multigrid [41]. We refer to nodes in all upper levels as “supernodes”, to
distinguish them from the nodes in the base level.
A consensus problem starts with a network with a set of nodes N and a set of
edges E connecting these nodes. Each node is given an initial value, and the purpose
of the consensus algorithm is to find the mean of the initial states of all nodes. The
initial values of all nodes are stored in the vector x(0). Starting with the initial
values, at any time step t each node i takes a weighted average of the state values of
its neighboring nodes to compute its own new state value $x_i(t+1)$. This process can
be represented as a multiplication with a state transition matrix $P$:

$$x(t+1) = P^T x(t) \qquad (2.2.1)$$
For a single-level network, Metropolis weights can be used to propagate the state as
described in [45]. With Metropolis weights, the state transition matrix is

$$
P_{ij} =
\begin{cases}
\dfrac{1}{1+\max\{d_i,\, d_j\}} & \text{if } \{i,j\} \in E \\[1ex]
1 - \sum_{\{i,k\} \in E} P_{ik} & \text{if } i = j \\[1ex]
0 & \text{otherwise}
\end{cases}
\qquad (2.2.2)
$$

where $d_i$ denotes the degree of node $i$.
This is equivalent to the evolution of probability distributions in Markov chains; we
assume irreducibility and aperiodicity, so that the state converges to a unique final
state $\pi$, where

$$P^T \pi = \pi \qquad (2.2.3)$$
State transition matrices that result from applying Metropolis weights are symmetric,
and all row- and column-sums are equal to one. The invariant distribution is uniform,
and represents the average of the initial states of the nodes:
$$\pi = \frac{1}{n}\left(\sum_{i=1}^{n} x_i(0)\right)\mathbf{1} \qquad (2.2.4)$$
Metropolis weights can be computed quickly by the distributed network, and can be
efficient for single-level networks. However, they result in inefficiencies when applied
to multilevel networks. In particular, Metropolis weights for connections between
supernodes in upper levels of the network are smaller than they would need to be to
maximize the convergence rate, since Metropolis weights take into account only the
degree of a node, but not other aspects of the geometry of the network, such as the
length of edges in a spatial embedding.
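To make the single-level case concrete, here is a small sketch of the Metropolis construction in (2.2.2) and the iteration $x(t+1) = P^T x(t)$; the 8-node ring and the initial values are assumptions made for this example.

```python
# Metropolis weights (2.2.2) for a single-level network, plus the consensus
# iteration x(t+1) = P^T x(t).  The 8-node ring graph and the initial
# values are illustrative assumptions.

def metropolis_matrix(n, edges):
    deg = [0] * n
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    P = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        w = 1.0 / (1 + max(deg[i], deg[j]))
        P[i][j] = P[j][i] = w
    for i in range(n):
        P[i][i] = 1.0 - sum(P[i])   # self-weight absorbs the remainder
    return P

def consensus(P, x, steps):
    n = len(x)
    for _ in range(steps):
        # Apply P^T; P is symmetric here, so this equals applying P.
        x = [sum(P[j][i] * x[j] for j in range(n)) for i in range(n)]
    return x

n = 8
ring = [(i, (i + 1) % n) for i in range(n)]
x0 = [float(i) for i in range(n)]   # initial node measurements 0..7
xT = consensus(metropolis_matrix(n, ring), x0, 500)
# Because P is symmetric and doubly stochastic, every entry of xT
# approaches the mean of x0, here 3.5.
```

Note how many iterations even this tiny ring needs; the slow decay of smooth error components with network size is precisely the deficiency that the multilevel construction below addresses.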
One method for constructing multilevel networks and finding their state transition
matrices and invariant distributions is to first generate the base level, and then add
the upper levels. Each superior level is generated by making an identical copy of the
next lower level, and merging several nodes into supernodes. The nodes in each level
are connected to their equivalent nodes in the levels directly above and below.
This method can be used for constructing multiscale networks based on an arbi-
trary layout of the base-level network. It does however constrain the construction of
the upper levels and the connections between levels, in that connections between su-
pernodes must mirror the connections in the lowest level. It can therefore be applied
in situations where the geometry of the upper levels of the network can be chosen to
fit these constraints, or when the layers of supernodes are created by selecting some
of the regular nodes in the base level to double as supernodes, and the base-level
connections between nodes are also used to implement supernode edges.
The first step in creating the multilevel network is duplicating the base level to
create upper levels, and connecting each node to its corresponding node in the levels
directly above and below, giving a so-called ladder network. The connections between
different levels initially all have equal weights going up and down. For such a network,
the invariant distribution of each level is equal to the invariant distribution π of the
original base level, so that the overall invariant distribution is
$$\tilde{\pi} = \frac{1}{n}\left[\pi^T, \pi^T, \ldots, \pi^T\right]^T \qquad (2.2.5)$$
Next, weights are added for connections between different levels, so that the values a
node receives from superior levels can be given more weight than those from inferior
levels. Using coefficients α1, α2, . . . , αn to denote weights for connections between
nodes in each level, and β1, β2, . . . , βn−1 for weights of connections between levels, the
new state transition matrix is
$$
P =
\begin{bmatrix}
\alpha_1 P & (1-\alpha_1) I & 0 & \cdots & 0 \\
\beta_1 I & \alpha_2 P & (1-\alpha_2-\beta_1) I & \cdots & 0 \\
0 & \beta_2 I & \alpha_3 P & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & \alpha_n P
\end{bmatrix}
\qquad (2.2.6)
$$
The merging of nodes to form $q$ supernodes from $p$ nodes in a level is described by
the transformation matrix $B_i \in \mathbb{R}^{p \times q}$, where $B_{ij} = 1$ if and only if the original node
$i$ is merged into supernode $j$. $B_i$ thus describes the mapping from the base level of
nodes to level $i$. Note that $B_1 = I$. The transformation from the ladder network to
the final network is described by

$$B = \operatorname{diag}(B_1, B_2, \ldots, B_n) \qquad (2.2.7)$$

$$\bar{P} = \left(B^T B\right)^{-1} B^T P B = B^{\dagger} P B \qquad (2.2.8)$$

$$
\bar{P} =
\begin{bmatrix}
\alpha_1 P & (1-\alpha_1) B_2 & 0 & \cdots & 0 & 0 \\
\beta_1 B_2^{\dagger} & \alpha_2 B_2^{\dagger} P B_2 & (1-\alpha_2-\beta_1) B_2^{\dagger} B_3 & \cdots & 0 & 0 \\
0 & \beta_2 B_3^{\dagger} B_2 & \alpha_3 B_3^{\dagger} P B_3 & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & (1-\alpha_n) B_n^{\dagger} B_{n-1} & \alpha_n B_n^{\dagger} P B_n
\end{bmatrix}
\qquad (2.2.9)
$$

Figure 2.2: Transition matrix for Theorem 2.2.1. $B^{\dagger}$ denotes the pseudoinverse of $B$, i.e. $B^{\dagger} = \left(B^T B\right)^{-1} B^T$.

$$
\begin{bmatrix}
(\alpha_1 - 1) & \beta_1 & 0 & \cdots & 0 & 0 \\
0 & (\alpha_2 + \beta_1 - 1) & \beta_2 & \cdots & 0 & 0 \\
0 & 0 & (\alpha_3 + \beta_2 - 1) & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & (\alpha_{n-1} + \beta_{n-2} - 1) & \beta_{n-1} \\
1 & 1 & 1 & \cdots & 1 & 1
\end{bmatrix}
\begin{bmatrix}
\gamma_1 \\ \gamma_2 \\ \gamma_3 \\ \vdots \\ \gamma_{n-1} \\ \gamma_n
\end{bmatrix}
=
\begin{bmatrix}
0 \\ 0 \\ 0 \\ \vdots \\ 0 \\ 1
\end{bmatrix}
\qquad (2.2.10)
$$

Figure 2.3: Linear system for Theorem 2.2.1.
Theorem 2.2.1. A multilevel network constructed as described above will have the
state transition matrix (2.2.9) in figure 2.2 and invariant distribution $\bar{\pi}$ given by

$$\bar{\pi} = \left[(\gamma_1\pi)^T,\ (\gamma_2 B_2^T\pi)^T,\ (\gamma_3 B_3^T\pi)^T,\ \ldots,\ (\gamma_n B_n^T\pi)^T\right]^T \qquad (2.2.11)$$

where the coefficients $\gamma_1, \gamma_2, \ldots, \gamma_n$ are found by solving the linear system (2.2.10)
shown in figure 2.3.
Proof. Using $P$ from eq. 2.2.6 in eq. 2.2.8 yields the state transition matrix shown in
figure 2.2. Given $\bar{P}$, we can show that $\bar{\pi}$ in equation 2.2.11 is indeed the invariant
distribution:

$$
\bar{P}^T \bar{\pi} =
\begin{bmatrix}
(\alpha_1\gamma_1 + \beta_1\gamma_2)\,\pi \\
\bigl((1-\alpha_1)\gamma_1 + \alpha_2\gamma_2 + \beta_2\gamma_3\bigr)\, B_2^T \pi \\
\vdots \\
\bigl((1-\alpha_{i-1}-\beta_{i-2})\gamma_{i-1} + \alpha_i\gamma_i + \beta_i\gamma_{i+1}\bigr)\, B_i^T \pi \\
\vdots \\
\bigl((1-\alpha_{n-1}-\beta_{n-2})\gamma_{n-1} + \alpha_n\gamma_n\bigr)\, B_n^T \pi
\end{bmatrix}
= \bar{\pi} \qquad (2.2.12)
$$
Since the α and β coefficients are known, this can be written as a system of linear
equations. Omitting the last row, which is redundant since each column of the original
system sums to zero, and adding the condition that the sum of the γ’s has to be one,
we get
$$
\begin{bmatrix}
(\alpha_1 - 1) & \beta_1 & \cdots & 0 \\
(1-\alpha_1) & (\alpha_2 - 1) & \cdots & 0 \\
0 & (1-\alpha_2-\beta_1) & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \beta_{n-1} \\
1 & 1 & \cdots & 1
\end{bmatrix}
\begin{bmatrix}
\gamma_1 \\ \gamma_2 \\ \gamma_3 \\ \vdots \\ \gamma_{n-1} \\ \gamma_n
\end{bmatrix}
=
\begin{bmatrix}
0 \\ 0 \\ 0 \\ \vdots \\ 0 \\ 1
\end{bmatrix}
\qquad (2.2.13)
$$
The linear system in figure 2.3 is constructed by taking the sum of each row except
the last with all rows above it. As long as αi + βi−1 < 1 for all i, there is a unique
solution. Given this solution, each node can determine the consensus value from the
invariant distribution. ∎
The resulting invariant distribution is not uniform, and in order to determine the
consensus value, the state of each node has to be multiplied with a factor that can
be obtained by solving the linear equation above for the invariant distribution.
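To make the coarsening step (2.2.8) concrete, the sketch below forms $B^{\dagger} P B$ for a single level in pure Python; the five-node ring and its two-supernode partition follow the spirit of figure 2.1 but are otherwise illustrative assumptions. Because each base node belongs to exactly one supernode, $B^T B$ is diagonal, so the pseudoinverse reduces to averaging over each cluster and no general pseudoinverse routine is needed.

```python
# One coarsening step B† P B (eq. 2.2.8) for a single level.  B[i][j] = 1
# iff base node i merges into supernode j; since each node belongs to one
# supernode, B^T B is diagonal and B† = (B^T B)^{-1} B^T simply averages
# the rows of each cluster.  Graph and partition are illustrative.

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def coarsen(P, partition):
    """partition[i] = supernode index of base node i; returns B† P B."""
    n, q = len(P), max(partition) + 1
    B = [[1.0 if partition[i] == j else 0.0 for j in range(q)]
         for i in range(n)]
    size = [partition.count(j) for j in range(q)]
    B_pinv = [[B[i][j] / size[j] for i in range(n)] for j in range(q)]
    return matmul(matmul(B_pinv, P), B)

# Five-node ring with Metropolis-style weights (1/3 per edge, 1/3 self),
# merged into two supernodes: nodes 0-2 -> supernode 0, nodes 3-4 -> 1.
third = 1.0 / 3.0
P = [[third if j in ((i - 1) % 5, i, (i + 1) % 5) else 0.0
      for j in range(5)] for i in range(5)]
Pc = coarsen(P, [0, 0, 0, 1, 1])
# Each row of the coarsened matrix still sums to one.
```

The coarsened matrix is again row-stochastic, which is what allows the upper levels to run the same kind of consensus update as the base level.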
2.3 Invariant distribution offset factor determination through consensus
As an alternative to solving the equations presented above for deriving the consen-
sus value from the invariant distribution, the factors can also be found by using the
consensus method itself. This also makes it easy to relax the assumption that ev-
ery node starts out with a measurement value. Until now it was assumed that each
node had access to a unique measurement, but this might not be the case in some
implementations of this method. In a real-life situation, a sensor could malfunction
while the computation and communication capabilities of a node might be working
normally. Another scenario where this occurs is when supernodes are implemented
using the hardware of an already existing network.
Theorem 2.3.1. Let the elements of the vector $\kappa$ be the factors the consensus value
has to be multiplied with to obtain the invariant distribution,

$$\hat{\pi}_i = \kappa_i \sum_{k=1}^{n} x_k(0) \qquad (2.3.1)$$

Let $x_k(0) = 0$ if node $k$ has no measurement available for inclusion in the consensus
process. Also, let $\eta$ be a vector so that $\eta_i = 1$ if node $i$ has a measurement, and
$\eta_i = 0$ otherwise. Then $\kappa$ can be found by applying the consensus method to $\eta$:

$$\kappa = (P^T)^{\infty} \eta \qquad (2.3.2)$$
Proof. Let $m$ be the consensus value, which is the mean of all node measurement
values:

$$m = \frac{\sum_{k=1}^{n} x_k(0)}{\sum_{k=1}^{n} \eta_k} \qquad (2.3.3)$$

The invariant distribution can be expressed in terms of $\kappa$ and $m$ as

$$\hat{\pi}_i = \kappa_i m \qquad (2.3.4)$$

Now, for the consensus process that uses $\eta$ as the initial state vector,

$$\hat{\pi}_i = \kappa_i \frac{\sum_{k=1}^{n} \eta_k}{\sum_{k=1}^{n} \eta_k} = \kappa_i \qquad (2.3.5)$$

∎
In the case where every node has a measurement value, this simplifies to

κ = (P^T)^∞ 1    (2.3.6)
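The recovery of the consensus value from the invariant distribution can be sketched numerically. The following minimal illustration uses a hypothetical row-stochastic weight matrix and made-up measurement values, and assumes the synchronous iteration x(t+1) = P^T x(t) used throughout this chapter:

```python
import numpy as np

# Hypothetical row-stochastic weight matrix for a 3-node chain.
P = np.array([[0.6, 0.4, 0.0],
              [0.2, 0.6, 0.2],
              [0.0, 0.4, 0.6]])

def run_consensus(x0, P, steps=500):
    """Iterate x(t+1) = P^T x(t) until (numerical) convergence."""
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        x = P.T @ x
    return x

y = np.array([3.0, 5.0, 10.0])   # node measurements; their mean is 6.0
eta = np.ones(3)                 # indicator: every node has a measurement
kappa = run_consensus(eta, P)    # kappa = (P^T)^inf 1, as in eq. (2.3.6)
state = run_consensus(y, P)      # converges to kappa_i times the mean
mean_at_each_node = state / kappa
```

Each node recovers the mean 6.0 by dividing its converged state by its own entry of κ, without ever knowing the other nodes' measurements.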
2.4 Adjusting self-weights for improved performance
The method for constructing the propagation matrix has one deficiency: since
supernodes are constructed by merging several nodes in one level into one supernode,
the self-weights of the supernodes are on average significantly higher than the weights
for transmitting states between supernodes in a level. The convergence rate can be
improved by reducing the self-weights of the supernodes, so that they are on average
equal to the weights between nodes. This can be done by taking advantage of the
fact that

((1 + δ)P^T − δI) π = P^T π    (2.4.1)

Such an adjustment is applied to all submatrices that describe the connections between
supernodes in their respective level, i.e. all block matrices on the diagonal of
P̄ with the exception of the first block matrix on the diagonal, which describes the
connections between the base-level nodes. The δ coefficients for each level are chosen
such that the mean weight for connections between nodes is equal to the mean
self-weight.
Theorem 2.4.1. If the multilevel network with state transition matrix P̄ in equation (2.2.9) has weight changes given by

δ_j = min{ (a − b)/(1 − a − b) , min{diag(A_j)}/(1 − min{diag(A_j)}) }    (2.4.2)

a = (1/n) trace(A_j)    (2.4.3)

b = (1/(n² − n)) (‖A_j‖₁ − trace(A_j))    (2.4.4)

A_j = B_j† P B_j    (2.4.5)
then it will have the same invariant distribution as the unmodified network. With
P̄_a =
⎡ α₁P        (1 − α₁)B₂                     0                              ⋯  0                              ⎤
⎢ β₁B₂†      α₂((1 + δ₂)B₂†PB₂ − δ₂I)      (1 − α₂ − β₁)B₂†B₃            ⋯  0                              ⎥
⎢ 0          β₂B₃†B₂                        α₃((1 + δ₃)B₃†PB₃ − δ₃I)      ⋯  0                              ⎥
⎢ ⋮          ⋮                              ⋮                              ⋱  ⋮                              ⎥
⎣ 0          0                              0                              ⋯  αₙ((1 + δₙ)Bₙ†PBₙ − δₙI)      ⎦    (2.4.7)

Figure 2.4: State transition matrix with adjusted supernode self-weights.
these changes, the new state transfer matrix is equation (2.4.7) in figure 2.4.
Proof. Adjusting the blocks on the diagonal of P̄ as described above does not change
the products of those entries with the corresponding parts of the invariant distribution:

((1 + δ_i)B_i†PB_i − δ_iI) γ_i B_iᵀ π = B_i†PB_i γ_i B_iᵀ π    (2.4.6)

Therefore, the invariant distribution π̄ remains the same when the supernode self-weights are adjusted. ∎
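The identity in equation (2.4.1) that makes this adjustment safe can be checked numerically. The sketch below uses a hypothetical row-stochastic matrix with large self-weights; shrinking the diagonal with δ leaves the fixed point of P^T, and hence the invariant distribution, unchanged:

```python
import numpy as np

# Hypothetical row-stochastic matrix with large self-weights.
P = np.array([[0.8, 0.2, 0.0],
              [0.1, 0.8, 0.1],
              [0.0, 0.2, 0.8]])

# Invariant distribution: the fixed point of P^T (left eigenvector of P for eigenvalue 1).
w, V = np.linalg.eig(P.T)
pi = np.real(V[:, np.argmax(np.real(w))])
pi = pi / pi.sum()

delta = 0.5
# (1 + delta) P - delta I shrinks the diagonal and scales up the off-diagonal
# weights; each row still sums to (1 + delta) - delta = 1.
P_adj = (1 + delta) * P - delta * np.eye(3)

# Same invariant distribution: ((1 + delta) P^T - delta I) pi = P^T pi = pi.
print(np.allclose(P_adj.T @ pi, pi))   # True
```

The value of δ is limited only by the requirement that the adjusted diagonal entries stay non-negative, which is what the min terms in equation (2.4.2) enforce.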
While adjusting supernode self-weights does not necessarily result in optimal
values for P̄_a, it is a heuristic that yields significant improvements in the spectral gap
ρ = 1 − λ₂.

Figure 2.5 demonstrates the effect that adjusting supernode self-weights has on
the convergence rate. For a ring-shaped network with three levels of nodes, three
methods were used to construct the state transition matrix: Metropolis weights, and
the method described in the previous section with and without supernode self-weight
adjustments. The computational cost of generating the networks was not taken into
account here, since it is assumed that networks are used for multiple computations.
Using the multigrid method, an initial improvement in the convergence rate compared
to Metropolis weights could be achieved, as averaging of the states of nodes connected
to the same supernode is accelerated. However, since connections between supernodes
are weak, convergence slows down after a few steps. With the improvement of
adjusting supernode self-weights, a significantly higher convergence rate is achieved
even after these initial steps.
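For reference, the Metropolis weights used as the baseline comparison can be built from node degrees alone. This sketch follows the standard Metropolis construction; the 4-node ring adjacency matrix is a made-up example:

```python
import numpy as np

def metropolis_weights(adj):
    """Metropolis weight matrix for an undirected graph.

    adj: symmetric 0/1 adjacency matrix with no self-loops.
    Edge weight 1/(1 + max degree of the endpoints); the remainder
    of each row goes to the self-weight.
    """
    n = adj.shape[0]
    deg = adj.sum(axis=1)
    W = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if adj[i, j]:
                W[i, j] = 1.0 / (1 + max(deg[i], deg[j]))
        W[i, i] = 1.0 - W[i].sum()
    return W

# 4-node ring: every node has degree 2, so all nonzero weights are 1/3.
adj = np.array([[0, 1, 0, 1],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [1, 0, 1, 0]])
W = metropolis_weights(adj)
```

The resulting matrix is symmetric and row-stochastic, so consensus on it converges to the plain average of the initial values.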
Figure 2.5: Comparison of convergence for a ring network with three levels using Metropolis weights and the multigrid weights described here with and without supernode self-weight adjustments.
2.5 Adjusting the network for broken edges and
nodes
In order to be robust, the network should continue to function when one or more of its
edges or nodes stop functioning, as long as the network is still connected. A broken
node is a special case of multiple broken edges, since it is equivalent to breaking all
edges of the affected node and removing it from the network.
One simple method for adjusting for a broken edge is for the adjacent nodes to
modify their self-weights so that the row sums of the weight matrix are again equal
to 1. Affected nodes only need to know the weights of their remaining edges to do
this. When this method is used, the invariant distribution does not change, as long as
the network is still connected. This can be shown by considering the joint probability
matrix W, where
W_ij = P_ij π_i    (2.5.1)
The column sums of W are equal to the invariant distribution:
∑_j W_ji = π_i    (2.5.2)
When the edge between nodes p and q is broken, P is adjusted in the following way:
P′_pq = P′_qp = 0    (2.5.3)

P′_pp = P_pp + P_pq    (2.5.4)

P′_qq = P_qq + P_qp    (2.5.5)
This results in the following adjustments to W:
W′_pq = W′_qp = 0    (2.5.6)

W′_pp / π′_p = W_pp/π_p + W_pq/π_p    (2.5.7)

W′_qq / π′_q = W_qq/π_q + W_qp/π_q    (2.5.8)
These adjustments preserve the symmetry of W. The column sums of W′ are:

∑_j W′_ji = ∑_j W_ji = π_i = π′_i    for i ≠ p, q    (2.5.9a)

∑_j W′_jp = ∑_{j≠p,q} W′_jp + W′_pp + W′_qp = ∑_{j≠p,q} W_jp + (W_pp + W_pq) π′_p/π_p    for i = p    (2.5.9b)

∑_j W′_jq = ∑_{j≠p,q} W′_jq + W′_qq + W′_pq = ∑_{j≠p,q} W_jq + (W_qq + W_qp) π′_q/π_q    for i = q    (2.5.9c)
Theorem 2.5.1. If the multilevel network with state transition matrix P̄_a in equation (2.4.7) has some edges removed but remains connected, then updating the transition matrix by (2.5.3)–(2.5.5) ensures that the invariant distribution remains unchanged.
Proof. Equations (2.5.9b) and (2.5.9c) can be solved for π′_p and π′_q:

π′_p = (π_p − W_pp − W_qp) π_p / (π_p − W_pp − W_qp) = π_p    (2.5.10)

π′_q = (π_q − W_qq − W_pq) π_q / (π_q − W_qq − W_pq) = π_q    (2.5.11)

Therefore, π′_i = π_i for all i. ∎
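A sketch of the broken-edge adjustment in equations (2.5.3)–(2.5.5), using a hypothetical symmetric ring weight matrix so that the invariant distribution is uniform and easy to check:

```python
import numpy as np

def break_edge(P, p, q):
    """Adjust P for a broken edge (p, q): the adjacent nodes fold the lost
    edge weights into their self-weights so that row sums stay equal to 1."""
    P = P.copy()
    P[p, p] += P[p, q]        # eq. (2.5.4)
    P[q, q] += P[q, p]        # eq. (2.5.5)
    P[p, q] = P[q, p] = 0.0   # eq. (2.5.3)
    return P

# Symmetric weights for a 4-node ring (each nonzero weight 1/3): the matrix
# is doubly stochastic, so the invariant distribution is uniform.
P = np.zeros((4, 4))
for i in range(4):
    P[i, i] = 1/3
    P[i, (i + 1) % 4] = 1/3
    P[i, (i - 1) % 4] = 1/3

P2 = break_edge(P, 0, 1)   # a ring minus one edge is a path: still connected
# P2 remains symmetric and row-stochastic, so the uniform invariant
# distribution is preserved, as Theorem 2.5.1 states.
```

Only nodes 0 and 1 change their own weights, matching the claim that affected nodes need to know nothing beyond their remaining edge weights.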
If the network becomes disconnected as a result of broken edges, or if one or more
nodes break, the resulting invariant distribution of the remaining or partial network
is not the same as that of the original network, since information is lost in the process.
However, the method for adjusting the network described above can still be used to
determine the average of the values of the remaining nodes at the time the network
was disconnected.
2.6 Performance and robustness trade-offs
There are many useful measures of performance for consensus algorithms. One such
performance measure is the second largest eigenvalue modulus (SLEM) [34], [24]. The
SLEM measures the worst-case convergence rate, which applies if the initial state
is aligned with the second eigenvector, or which is reached once all differences in
node states along the other eigenvectors of the system have been sufficiently reduced.
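As a sketch (for a generic stochastic transition matrix), the SLEM and the spectral gap can be computed directly from the eigenvalue moduli; the 4-node ring below is a made-up example whose circulant eigenvalues are known in closed form:

```python
import numpy as np

def slem(P):
    """Second largest eigenvalue modulus of a transition matrix P."""
    mags = np.sort(np.abs(np.linalg.eigvals(P)))[::-1]
    return mags[1]

# Example: 4-node ring with uniform weights 1/3 (a circulant matrix), whose
# eigenvalues are 1/3 + (2/3) cos(2 pi k / 4) = {1, 1/3, -1/3, 1/3}.
P = np.array([[1/3, 1/3, 0.0, 1/3],
              [1/3, 1/3, 1/3, 0.0],
              [0.0, 1/3, 1/3, 1/3],
              [1/3, 0.0, 1/3, 1/3]])

rho = 1 - slem(P)   # spectral gap
```

For this example the SLEM is 1/3 and the spectral gap is 2/3; larger networks have SLEM closer to 1 and correspondingly smaller gaps.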
Figure 2.6 shows the spectral gap ρ = (1 − SLEM) for multilevel networks with
various numbers of levels. In these networks, every node is connected to its two
neighboring nodes within its level, so that each level forms a ring. In addition, each
node is connected to one supernode
Figure 2.6: Spectral gap vs. number of nodes n in the base level for networks with various numbers of levels N.
in the level above. Each supernode in one of the upper levels has the same number
of subnodes.
As demonstrated in the figure, the spectral gap for a single-level network is in-
versely proportional to the square of the number of nodes in the network. However,
if the number of levels in the network is allowed to vary and is sufficiently large, it
scales logarithmically instead.
One simple measure of robustness is the connectivity of the network. Additional
measures of robustness are necessary to evaluate how the network convergence rate
is affected by failures of some edges or nodes that do not lead to parts of the network
becoming disconnected. One such measure of robustness is the worst-case spectral
gap of a network with a specific number of broken edges or nodes.
Another measure of performance that can be used is the inverse of the number of
steps tc required for convergence of node values to within a small error margin of the
invariant distribution. Similarly, robustness can be defined as the ratio between the
number of steps required for convergence for the intact network and for a network
with a number of broken edges or nodes.
Figure 2.7: Centralization robustness vs. performance trade-off: single node failure worst-case performance.
In constructing a multilevel network, there are a number of parameters one can
choose that influence the performance and robustness of the network. The extreme
cases are often equivalent to a single level distributed network, which is very robust
but has low performance, or to a network with a single supernode, which has high
performance and low robustness.
The first choices to make are the number of levels and the ratio of nodes per
supernode for each level. The effects of the number of levels on the SLEM for a ring-
shaped network are shown in figure 2.6. Figure 2.7 shows an example of the number
of time steps required for convergence of a ring-shaped network with two levels and
40 base nodes as a function of the number of supernodes n2 in the second level. In
the case where all nodes are functioning, the convergence rate is lower for networks
with more supernodes. However, if any one of the supernodes breaks, the time to
convergence increases dramatically for a network with few supernodes, while networks
with more supernodes are not affected as much. In this case, adding more than six
supernodes to a network does not lead to faster convergence if one of them breaks,
since the effect of lowering the convergence rate is larger than the benefit of added
Figure 2.8: Performance vs. robustness for various α and β values. The red cross indicates the parameter values chosen for subsequent numerical examples.
robustness. However, if several nodes malfunction, having additional supernodes can
be beneficial. While the ideal number of levels depends primarily on the number of
nodes in the network, the best ratio between the number of nodes in different levels
depends on the expected failure rate of nodes and edges, as well as the desired level
of robustness.
Additional parameters that have to be chosen are the α and β coefficients in the
state transition matrix P̄ (figure 2.2). Selecting large values for the coefficients that
govern data exchanges between supernodes and from supernodes to base nodes yields
high performance and lower robustness, while giving base level nodes more weight
increases robustness and lowers performance. Figure 2.8 shows the Pareto frontier of
all possible combinations of these coefficients for a ring-shaped network with three
levels and 64 base-level nodes. The times to convergence for the intact network
and for a network with ten broken edges were used to evaluate performance and
robustness. The majority of possible combinations of the four α and β parameters in
this case are not on the Pareto frontier and should not be selected. Each point on the
Pareto frontier represents a different performance-robustness trade-off, and selection
Figure 2.9: Example network layout (connections between different levels are not shown), where upper levels use larger nodes and thicker edges.
of a specific parameter combination depends on the desired level of performance or
robustness.
2.7 Two-dimensional numerical example
To demonstrate how the algorithm described above might be used in a real network,
a two-dimensional network consisting of 324 randomly positioned nodes was created.
The probability of having an edge between any two nodes in the base level was
inversely proportional to the square of the distance between the nodes. Two supernode
levels were created by dividing the base level layout into a 6 × 6 grid for the second
level and a 2 × 2 grid for the third level, and selecting the node closest to the center
of each grid square to double as a supernode. The layout of this network is shown in
figure 2.9.
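The construction of this example network can be sketched as follows. The proportionality constant for the edge probability is a made-up placeholder, since the text does not state its value:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 324
pos = rng.uniform(0.0, 10.0, size=(n, 2))   # random node positions in a 10 x 10 area

# Edge probability inversely proportional to squared distance; the constant c
# is a hypothetical choice, and the probability is capped at 1.
c = 0.5
adj = np.zeros((n, n), dtype=bool)
for i in range(n):
    for j in range(i + 1, n):
        d2 = np.sum((pos[i] - pos[j]) ** 2)
        if rng.random() < min(1.0, c / d2):
            adj[i, j] = adj[j, i] = True

# Supernodes: the node closest to the center of each cell of a 6 x 6 grid
# (second level); a 2 x 2 grid gives the third level analogously.
centers = np.array([[10 * (i + 0.5) / 6, 10 * (j + 0.5) / 6]
                    for i in range(6) for j in range(6)])
supernodes = [int(np.argmin(np.sum((pos - ctr) ** 2, axis=1))) for ctr in centers]
```

Reusing existing nodes as supernodes, rather than adding new hardware, matches the scenario mentioned in section 2.3 where supernodes are implemented on an existing network.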
Figure 2.10 shows the convergence of the node values to the invariant distribution
Figure 2.10: Convergence results for the example network.
for both the multigrid network and a network consisting of the base level only. As
expected, the multigrid network converges significantly faster. Also plotted is a case
where all edges have a probability of 0.5 of being functional at any time step. While
this decreases the convergence rate, the multigrid network still performs significantly
better than the single level network.
2.8 Measurement updates
The method described in the previous sections is applicable to situations where each
node takes only one measurement. In many potential applications, the value that is
being estimated changes over time, and nodes update their measurements periodically.
One option to handle measurement updates would be to restart the consensus
process with each new set of sensor measurements. However, this can be inefficient,
especially if variations between different sensors are larger than variations of a particular
sensor's values over time, since all progress towards consensus based on the
previous values would be discarded. In addition, it would require all nodes to perform
measurement updates at the same prearranged time, and would not allow for
unscheduled asynchronous updates.
The following theorem describes a way of updating the state of a node to incorporate
a new measurement without restarting the consensus process. It can also be
applied if only some nodes, or just a single node, update their measurements, and unlike
restarting the consensus process, it does not require any synchronized action between
nodes. The only disadvantage of this method is that nodes need to store their previous
measurement values in addition to their current state.
Theorem 2.8.1. Let y be a vector of previous measurement values, and let y′ be a
vector of updated measurement values. Update the state vector as follows:

x′ = x + (y′ − y)    (2.8.1)

Then the new invariant distribution reflects the mean of the new measurement values,
i.e.

π̂′ = (P^T)^∞ y′    (2.8.2)

Proof. If the measurement update is performed at time t, then the node states before
and after the measurement update are

x(t) = (P^T)^t y    (2.8.3)

x′(t) = (P^T)^t y + (y′ − y)    (2.8.4)

The new invariant distribution is

π̂′ = (P^T)^∞ x′ = (P^T)^∞ (P^T)^t y + (P^T)^∞ y′ − (P^T)^∞ y = (P^T)^∞ y′    (2.8.5)

∎
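A sketch of such an update on a hypothetical symmetric ring, where the consensus value is the plain average: one node replaces its measurement mid-run and adds the difference to its own state, and the process still converges to the mean of the new measurements without restarting:

```python
import numpy as np

# Symmetric 4-node ring with uniform weights 1/3: doubly stochastic, so
# consensus converges to the plain average of the initial values.
P = np.array([[1/3, 1/3, 0.0, 1/3],
              [1/3, 1/3, 1/3, 0.0],
              [0.0, 1/3, 1/3, 1/3],
              [1/3, 0.0, 1/3, 1/3]])

y = np.array([2.0, 4.0, 6.0, 8.0])
x = y.copy()
for _ in range(20):          # run part of the consensus process
    x = P.T @ x

y_new = y.copy()
y_new[2] = 12.0              # node 2 takes a new measurement ...
x = x + (y_new - y)          # ... and applies eq. (2.8.1); only its entry changes

for _ in range(500):         # continue without restarting
    x = P.T @ x
# x now holds the mean of the updated measurements, 6.5, at every node
```

No coordination is required: the update touches only node 2's stored measurement and state, and the running iteration absorbs the change.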
2.9 Sensor weights
The methods described above lead to a consensus that reflects the mean of the mea-
surement values of all nodes. In this section, we describe how to adapt the methods
to allow for giving nodes unequal weights, so that nodes that have access to more
accurate measurements can be given higher weights than nodes with less accurate
measurements.
Theorem 2.9.1. Let y_i be a measurement value associated with node i and let φ_i be
the weight assigned to it. Then the weighted average of the measurements of all nodes
in the network can be found by running two separate consensus processes on variables
x and z with initial values as defined below:

x_i(0) = φ_i y_i    (2.9.1)

z_i(0) = φ_i    (2.9.2)

The weighted average of the y_i is obtained at each node after both consensus processes
converge by dividing x_i by z_i:

x_i(∞) / z_i(∞) = ∑_{k=1}^{n} φ_k y_k / ∑_{k=1}^{n} φ_k    (2.9.3)

Proof. Applying equation (2.3.1),

x_i(∞) = κ_i ∑_{k=1}^{n} x_k(0)    (2.9.4)

z_i(∞) = κ_i ∑_{k=1}^{n} z_k(0)    (2.9.5)

The factors κ are the same for both consensus processes. Therefore

x_i(∞) / z_i(∞) = κ_i ∑_{k=1}^{n} φ_k y_k / (κ_i ∑_{k=1}^{n} φ_k) = ∑_{k=1}^{n} φ_k y_k / ∑_{k=1}^{n} φ_k    (2.9.6)

∎
With this method it is even possible to alter sensor weights from φ_i to new values
φ′_i at some time during the consensus process by using the method described in the
previous section and applying equation (2.8.1) to both x and z, i.e.

x′_i = x_i + (φ′_i y_i − φ_i y_i)    (2.9.7)

z′_i = z_i + (φ′_i − φ_i)    (2.9.8)
Note that the sensor weights do not need to sum to one here, since we divide by their
sum z. This is particularly useful, since it means that a node can change its weight
by simply altering its own stored values of xi and zi, and no additional interaction
with other nodes is required.
If the noise in the node measurements is expected to be independent for each
node and normally distributed, setting the sensor weights equal to the inverse of the
variance σ_i² of each node minimizes the overall error:

φ_i = 1/σ_i²    (2.9.9)
In some situations, the nodes might be able to provide an estimate of the accuracy of
their measurements that varies over time. The equations above can then be used to update
the node weights to reflect this change in the estimated accuracy.
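The weighted scheme can be sketched as two parallel consensus runs with inverse-variance weights; the ring network and the noise variances below are hypothetical:

```python
import numpy as np

# Symmetric 4-node ring (uniform weights 1/3), so both runs converge to the
# plain mean of their initial values and kappa cancels in the ratio.
W = np.array([[1/3, 1/3, 0.0, 1/3],
              [1/3, 1/3, 1/3, 0.0],
              [0.0, 1/3, 1/3, 1/3],
              [1/3, 0.0, 1/3, 1/3]])

y = np.array([2.0, 4.0, 6.0, 8.0])       # measurements
sigma2 = np.array([1.0, 4.0, 1.0, 4.0])  # per-node noise variances
phi = 1.0 / sigma2                       # inverse-variance weights, eq. (2.9.9)

x = phi * y          # x_i(0) = phi_i * y_i, eq. (2.9.1)
z = phi.copy()       # z_i(0) = phi_i,       eq. (2.9.2)
for _ in range(500):
    x = W.T @ x
    z = W.T @ z

weighted_avg = x / z   # every node holds sum(phi * y) / sum(phi)
```

With these numbers the noisy nodes 1 and 3 are down-weighted by a factor of four, and every node ends up with the same weighted average.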
2.10 Network spectral properties and convergence
rates
In this section we study how spectral properties of a network are influenced by adding
supernode levels to a network. Results presented in previous sections have shown that
multigrid methods can reduce the second-largest eigenvalue λ2 and thereby increase
the spectral gap ρ = 1−λ2 of a network. To show the effects on additional eigenvalues
and eigenvectors of the network, we start with the example of a ring-shaped network,
where the base level network consists of a simple ring of nodes, and every node has
exactly two neighbors. For such a ring-shaped network with n nodes, the eigenvalues
are given by the following expression, where k takes values from 0 to n/2 for even n,
and from 0 to (n − 1)/2 for odd n:

λ_{2k−1} = 1/3 + (2/3) cos(2πk/n)    (2.10.1)

For k larger than 0 and smaller than n/2, the multiplicity of the eigenvalue is 2. The
eigenvectors have the following forms, where v_{k,i} is the i-th entry of the k-th eigenvector,
and the vectors v′ have to be normalized to obtain the eigenvectors v:

v′_{2k−1,i} = sin(2πki/n)    (2.10.2)

v′_{2k,i} = cos(2πki/n)    (2.10.3)
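These closed forms can be checked against a numerically computed spectrum, assuming the ring places equal weights of 1/3 on itself and on each neighbor, consistent with the 1/3 + (2/3)cos term above:

```python
import numpy as np

n = 8
# Ring transition matrix: weight 1/3 on self and on each of the two neighbors.
W = np.zeros((n, n))
for i in range(n):
    W[i, i] = 1/3
    W[i, (i + 1) % n] = 1/3
    W[i, (i - 1) % n] = 1/3

numeric = np.sort(np.linalg.eigvalsh(W))[::-1]
# Circulant eigenvalues: each value 1/3 + (2/3) cos(2 pi k / n) appears for
# both k and n - k, giving the multiplicity-2 structure described above.
analytic = np.sort([1/3 + (2/3) * np.cos(2 * np.pi * k / n) for k in range(n)])[::-1]
```

Sorting both spectra makes the comparison independent of eigenvalue ordering conventions.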
The eigenvalue moduli for a ring-shaped network of 400 nodes are shown in figure 2.11.
Three of the eigenvectors are shown in figure 2.12. Overall, for a ring-shaped network,
eigenvectors corresponding to eigenvalues with high moduli are low frequency sinu-
soids, and eigenvectors corresponding to low modulus eigenvalues are high frequency
sinusoids. If a consensus process is run on such a network, high-spatial-frequency
noise is therefore averaged out quickly, while low-spatial-frequency noise persists for
a larger number of time steps.
Figure 2.11 also shows the eigenvalues of multigrid networks that use the simple
ring-shaped network as their base layer. The eigenvalues shown are for networks
with three and six layers. Both multigrid networks have the same number of nodes
in the top level, so that the three layer network represents a relatively centralized
network, and the six layer network represents a more robust network. As expected, the
eigenvalues of the multigrid network are lower in magnitude than those of the single
layer network, with the three level network having the overall smallest eigenvalues.
Most importantly, λ2 is significantly lower for the multigrid networks.
To study how the eigenvalues and eigenvectors relate to convergence rates, a con-
sensus algorithm was run on the ring-shaped network with the eigenvectors of the
single level ring as an input. For each of the eigenvectors, the process was started
with each node initialized to the corresponding entry of the eigenvector, and the
Figure 2.11: Eigenvalues of network with 400 base level nodes with various numbers of levels.
Figure 2.12: Selected eigenvectors of a single level ring with 400 nodes.
Figure 2.13: Convergence times of different ring-shaped networks given the eigenvectors of a single level ring as starting value.
number of steps until the invariant distribution (within a tolerance) was reached was
recorded. The results are shown in figure 2.13. For the single level network, the num-
ber of steps to convergence decreases with eigenvector number, as the corresponding
eigenvalue modulus decreases. For the multilevel networks on the other hand, the
convergence rates are nearly constant across all eigenvectors, indicating that the
consensus algorithms for multigrid networks described above eliminate high- and
low-frequency noise in approximately the same number of time steps. Multigrid
networks are therefore particularly useful if low-spatial-frequency noise is present,
while single level networks can be used for eliminating high-spatial-frequency noise.
Figures 2.14 through 2.18 show similar results for the network shown in figure
2.9 instead of the ring-shaped network. Figure 2.14 compares the eigenvalues of the
base level network to the multigrid network, showing significantly lower eigenvalues
for the multigrid network. The next three figures are plots of some selected eigen-
vectors of the base network. Unlike the ring-shaped network, the node layout of this
two dimensional network cannot be simply represented by the node number alone.
Figure 2.14: Eigenvalues of network with various numbers of levels.
Therefore, in the plots of those eigenvectors, the two plot axes represent the loca-
tion of the node, while the color indicates the value of the entry of the eigenvector
corresponding to a particular node. While the eigenvectors for this irregular net-
work are not sinusoids, it is still true that eigenvectors corresponding to the large
eigenvalues usually vary slowly across neighboring nodes, and represent low-spatial-
frequency noise, while eigenvectors corresponding to smaller eigenvalues vary quickly
across neighboring nodes.
Figure 2.18 is the equivalent of figure 2.13, but for the two dimensional network.
Just as for the ring-shaped network, the time until convergence stays almost constant
for all initial conditions.
2.11 Conclusion
In this chapter we introduced a new multilevel and multiscale network construction
that accelerates distributed consensus algorithms run on the network. We also gave
Figure 2.15: v2 of the base level of the network shown in figure 2.9.
Figure 2.16: v6 of the base level of the network shown in figure 2.9.
Figure 2.17: v30 of the base level of the network shown in figure 2.9.
Figure 2.18: Convergence times of different networks given the eigenvectors of a single level network as starting value.
update rules to show how the consensus transition matrix should be adjusted in re-
sponse to node and edge failure to ensure that the invariant distribution is preserved.
Using our multilevel construction we were able to explore the trade-off between heav-
ily weighting the coarsest levels of the network, resulting in high performance but
low robustness to failure, and more heavily weighting the fine base level, giving high
robustness but low performance. The algorithms presented constitute a heuristic
method. A detailed mathematical model for the resulting improvements in conver-
gence rates would be an interesting area for future work, but is beyond the scope here.
The accelerated performance of consensus methods on such multilevel networks was
demonstrated with an example of a random network embedded in 2D. The spectral
properties of this example network, as well as of a ring-shaped network, were studied,
and the times until convergence under different initial noise conditions were compared
between the multigrid networks and their base networks, indicating that the multigrid
methods described here are particularly useful in the presence of noise that varies
slowly across neighboring nodes. Furthermore, we described how time-varying node
measurements can be incorporated, and how the algorithm can be adapted if weights
are introduced for individual node measurements.
Chapter 3
Distributed GPS augmentation
3.1 Introduction
The purpose of this chapter is to describe a method that increases the point position-
ing accuracy of global navigation satellite systems such as GPS by sharing information
about measurement errors between receivers. Unlike other augmentation systems, the
method described here does not require the placement of reference receivers, but in-
stead uses a network of mobile receivers to compute error corrections.
GPS signals are subject to several sources of errors, including differences between
the true ephemeris of the satellite and the values that are broadcast in the navi-
gation message, ionospheric and tropospheric signal delays, multipath errors, and
receiver noise. While models for ionospheric and tropospheric delays are available
and commonly used in receivers, the differences between actual and estimated delays
are significant and make up the majority of the error in pseudorange measurements.
For single frequency receivers using standard models of these delays, the RMS (root-
mean-square) difference between the modeled and actual delays is 5 m for ionospheric
delays, and 1 m for tropospheric delays, while the RMS range error due to ephemeris
errors and spacecraft clock modeling errors is 3 m [23]. Receiver noise and multipath
Ephemeris and satellite clock model errors:  3 m
Tropospheric delay model error:  1 m
Ionospheric delay model error:  5 m
Receiver noise and multipath:  1 m
Table 3.1: Typical GPS error budget (RMS values).
effects are usually between 0.5 m and 1 m each, depending on the quality of the
receiver, the type of antenna used, and the topography of the terrain. Multipath
errors can be significantly larger in some areas, including urban canyons. Unless oth-
erwise noted, we used a 1 m error for combined multipath and receiver noise, which
is appropriate for locations with a relatively clear sky.
A number of methods for reducing these errors are currently being used or will
be available in the future. Ionospheric errors can be mostly eliminated by the use
of dual-frequency receivers, which are currently not commercially sold, but will be
available to users of GPS and Galileo in the future. There are also a variety of
differential GPS (DGPS) methods that are commonly used. DGPS systems typically
consist of one or more reference receivers with a known position. Based on the
positions and measurements that are computed by these reference receivers, a set of
corrections can be computed and transmitted to other receivers in the area. A mobile
receiver in the vicinity of the reference station can then apply those corrections to
its own pseudorange measurements. Corrections can take the form of scalar values
that are estimates of the total errors in the pseudorange for each satellite. The
accuracy of these types of corrections for the mobile receivers naturally depends on
the distance to the reference station. Wide-area DGPS systems on the other hand
broadcast vector corrections, where the exact correction that is applied by the user
depends on the user’s location. These systems can also estimate various types of
errors (such as ephemeris and ionospheric delays) separately. Scalar corrections are
generally sufficient if the distance between the reference station and the user is less
than 100 km [17].
One DGPS system that is widely used today in the United States is the Wide Area
Augmentation System (WAAS) [13], which employs reference receivers in locations
across the United States to provide pseudorange vector corrections that eliminate
most of the error due to ionospheric delays, ephemeris errors, and satellite clock
biases. While WAAS is very effective in improving positioning accuracy, the distance
between a user and the closest reference station might be considerable, which limits
the accuracy that can be achieved. Several other countries have implemented similar
systems, including the European EGNOS [14] and the Japanese MSAS systems. At
some airports in the United States, Local Area Augmentation Systems (LAAS) [12]
are used to provide scalar GPS corrections, which are more accurate near the reference
stations, but are typically available only to aviation users in that particular area.
There are, however, still many areas in the world where GPS augmentation is not
available, creating a problem for users that require high positioning accuracy. Some of
the uses of GPS that require a higher accuracy than can be provided by un-augmented
GPS include agriculture and navigation for vision-impaired people [22].
The augmentation system proposed here does not rely on reference receivers, and
could therefore function in any region of the world. A receiver needs to have access to
signals from four satellites in order to obtain a position fix, but GPS receivers often
receive signals from five or more satellites at a time. Having more information avail-
able than required for a position estimate enables receivers to produce estimates of
the signal delays due to ionospheric, tropospheric, and ephemeris errors. After com-
puting a position estimate, the pseudorange residuals of a receiver form an estimate
of the signal delays. If produced by a single receiver, such an estimate would not
be very accurate due to the presence of receiver noise and possibly multipath effects,
but if a number of receivers collaborate, pseudorange corrections can be computed
from the combined data from all receivers. Currently, most GPS receivers do not
use additional pseudorange data to compute error corrections, while some receivers
use the additional pseudoranges from more than four satellites to perform integrity
checks. Since GPS receivers have become a common feature of cell phones and other
devices, it is conceivable that a network of receivers could be created within a rela-
tively small area, so that all receivers experience similar errors due to tropospheric
and ionospheric delays and ephemeris errors. Receiver noise and multipath errors
differ between receivers even if placed in close proximity to one another.
The augmentation system proposed below would find scalar pseudorange correc-
tions. It is however conceivable that the methods used could be extended to finding
vector corrections, as well as performing integrity checks. We also assume that the
receivers do not use carrier phase methods.
While all examples included here are based on the GPS constellation of satellites,
the same methods apply to receivers of other satellite navigation systems, and could
also be used for receivers that receive signals from multiple systems. These algorithms
might also be useful for systems that combine GPS with other positioning signals,
such as TV transmissions [32].
Methods for using distributed networks with GPS receivers for node localization
are described in [28]. In these types of networks, some nodes either do not have a
GPS receiver, or do not get a clear signal at their location, and ranging between
nodes in the network is used to enable positioning for those nodes. In contrast, for
our work described here we assume that every node is capable of finding a position es-
timate individually, and the interaction between nodes is used to improve positioning
accuracy, without directly measuring ranges between nodes.
The algorithms proposed here could be implemented using a variety of different
means of communication between the nodes. Given the ubiquity of GPS receivers in
smartphones, it is conceivable that wireless communication via WiFi or cell phone
networks could be used. The use of networks of GPS-enabled cell phones is currently
being studied for uses in traffic monitoring [43]. Several existing DGPS systems,
ranging from small local systems such as the one described in [38] to wide area systems
such as NASA’s Global Differential GPS System [25][26] currently use the Internet
to send corrections to users. All of these systems feature fixed reference stations.
3.2 Position solution for a single receiver
This section describes algorithms for finding a position estimate for a single receiver.
These algorithms will form the basis for the development of algorithms for networks
of receivers in later sections of this chapter.
Let ρ be the vector of measured pseudo-ranges to GPS satellites, sj be the position
of satellite j, x be the position of the receiver, b be the clock bias of the receiver,
ε be a vector containing delays (i.e. errors that are experienced by all receivers in a
specific area) associated with the satellite pseudo-ranges, and ν be a random noise
vector. The following equation relates these quantities:
ρj = ‖sj − x‖ + b + εj + νj   (3.2.1)

As a first step, we describe methods for finding a position solution that do not attempt
to find the correlated delays, and assume these to be a part of the noise vector ν, so
that

ρj = ‖sj − x‖ + b + νj   (3.2.2)
We want to find estimates of the position of the receiver x and the receiver clock bias
b, and therefore define a state vector y that includes these variables:
y = [xᵀ, b]ᵀ   (3.2.3)
The least-squares optimal position solution is the estimate of the receiver location
that minimizes the following objective function, where Ns is the number of satellites
for which pseudorange measurements are available to the receiver:

f(y) = (1/2) ∑_{j=1}^{Ns} (‖sj − x‖ + b − ρj)²   (3.2.4)
This function has a unique minimum if more than four satellites are available [1].
There is a large variety of methods that could be used to find the minimum of the
objective function. The Gauss-Newton method is commonly used in practice. Since
our objectives go beyond exploring single-receiver solutions, we also include a discus-
sion of additional algorithms, which are not commonly used for single receiver point
positioning, and compare their performance later in this chapter.
3.2.1 Single receiver Gauss-Newton method
The standard method for solving this non-linear least-squares problem is the Gauss-
Newton method. Starting with an initial guess y(0), we iterate the following until
convergence:
δρj(t) = ‖sj − x(t)‖ + b(t) − ρj   (3.2.5)

δy(t) = −(G(t)ᵀG(t))⁻¹ G(t)ᵀ δρ(t)   (3.2.6)

y(t+ 1) = y(t) + δy(t)   (3.2.7)
The matrix G, also called the observation or navigation matrix, is the Jacobian

G = ∂(δρ)/∂y =
[ −ℓ1ᵀ   1 ]
[ −ℓ2ᵀ   1 ]
[   ⋮     ⋮ ]
[ −ℓNsᵀ  1 ]   (3.2.8)
The line-of-sight vector ℓj is the unit vector pointing from the estimated receiver
position to satellite j, i.e.

ℓj = (sj − x) / ‖sj − x‖   (3.2.9)
Since GPS satellites are arranged in a constellation at a fixed altitude, the line-of-sight
vectors of visible satellites are sufficiently diverse that GᵀG is invertible as
long as at least four satellites are in view.
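As a concrete sketch, the iteration of equations 3.2.5 through 3.2.7 can be written in a few lines. The satellite geometry, receiver location, and noise-free pseudoranges below are made-up illustration values, not data from the simulations in this chapter:

```python
import numpy as np

# Sketch of the single-receiver Gauss-Newton iteration (eqs. 3.2.5-3.2.7).
# Satellite directions, receiver position, and clock bias are made up.
R_ORBIT = 26_600e3  # approximate GPS orbit radius in meters
dirs = np.array([[1.0, 0.0, 1.0], [-1.0, 0.2, 1.0], [0.0, 1.0, 0.6],
                 [0.3, -1.0, 0.8], [1.0, 1.0, 1.5]])
sats = R_ORBIT * dirs / np.linalg.norm(dirs, axis=1, keepdims=True)  # s_j
x_true = np.array([6_371e3, 0.0, 0.0])  # receiver on the Earth's surface
b_true = 300.0                          # clock bias, expressed in meters
rho = np.linalg.norm(sats - x_true, axis=1) + b_true  # noise-free pseudoranges

y = np.zeros(4)  # initial guess [x, b]: the Earth's center, zero bias
for _ in range(15):
    x, b = y[:3], y[3]
    d = np.linalg.norm(sats - x, axis=1)
    delta_rho = d + b - rho                         # residuals, eq. 3.2.5
    ell = (sats - x) / d[:, None]                   # line-of-sight vectors, eq. 3.2.9
    G = np.hstack([-ell, np.ones((len(sats), 1))])  # observation matrix, eq. 3.2.8
    y = y - np.linalg.lstsq(G, delta_rho, rcond=None)[0]  # eqs. 3.2.6-3.2.7

position_error = np.linalg.norm(y[:3] - x_true)
clock_error = abs(y[3] - b_true)
```

Note the minus sign in the update step: since δρ is defined here as predicted minus measured, the Gauss-Newton increment is the negative of the least-squares solution of G δy = δρ.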
3.2.2 Single receiver gradient descent method
For a single receiver, the position solution can also be found with a gradient descent
algorithm. This algorithm uses the gradient ∇f(y(t)) of the objective function shown
in equation 3.2.4. Defining gk to be the kth column of G, the kth element of the
gradient can be expressed as
[∇f(y(t))]k = δρ(t)ᵀ gk(t)   (3.2.10)
The descent direction is equal to the negative gradient:
∆y(t) = −∇f(y(t)) (3.2.11)
Backtracking line search is used to determine the step size as described in [4]:
while f(y(t) + τ∆y(t)) > f(y(t)) + ατ ∇f(y(t))ᵀ∆y(t):   set τ = βτ   (3.2.12)

where α and β are scalar parameters.
The convergence rate can be improved by scaling the clock bias so that its ex-
pected value has the same order of magnitude as the position of the receiver. In the
computation of the gradient the value g4 has to be adjusted accordingly.
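The backtracking scheme can be sketched as follows; the geometry and noise-free pseudoranges are made-up illustration values, and the iteration is started near the solution because, as shown later, gradient descent converges slowly on this problem:

```python
import numpy as np

# Sketch of gradient descent with backtracking line search (eqs. 3.2.10-3.2.12).
# Satellite geometry and noise-free measurements are made up; the start point
# is placed a couple of kilometers from the true position.
dirs = np.array([[1.0, 0.0, 1.0], [-1.0, 0.2, 1.0], [0.0, 1.0, 0.6],
                 [0.3, -1.0, 0.8], [1.0, 1.0, 1.5]])
sats = 26_600e3 * dirs / np.linalg.norm(dirs, axis=1, keepdims=True)
x_true, b_true = np.array([6_371e3, 0.0, 0.0]), 300.0
rho = np.linalg.norm(sats - x_true, axis=1) + b_true

def f(y):  # objective function of eq. 3.2.4
    return 0.5 * np.sum((np.linalg.norm(sats - y[:3], axis=1) + y[3] - rho) ** 2)

def grad(y):  # gradient via eq. 3.2.10: the k-th component is delta_rho . g_k
    d = np.linalg.norm(sats - y[:3], axis=1)
    delta_rho = d + y[3] - rho
    G = np.hstack([-(sats - y[:3]) / d[:, None], np.ones((len(sats), 1))])
    return G.T @ delta_rho

alpha, beta = 0.3, 0.5
y = np.array([6_372e3, 1e3, -1e3, 0.0])
f_start = f(y)
for _ in range(400):
    g = grad(y)
    step = -g                                   # descent direction, eq. 3.2.11
    tau = 1.0
    while f(y + tau * step) > f(y) + alpha * tau * (g @ step):  # eq. 3.2.12
        tau = beta * tau
    y = y + tau * step

f_final = f(y)
```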
3.2.3 Single receiver Newton’s method
Newton’s method is a second order method that is conceptually similar to the gradient
descent method described above, but uses the Hessian in addition to the gradient to
find the step direction:

∆ynt(t) = −∇²f(y(t))⁻¹ ∇f(y(t))   (3.2.13)
The gradient ∇f(y(t)) can be found using the equations presented in the previous
section. The first derivative of the objective function with respect to the clock bias
is:

∂f(y)/∂b = ∑_{j=1}^{Ns} (‖sj − x‖ + b − ρj)   (3.2.14)
Let x(k) for k = 1, …, 3 be the k-th element of x, and let sj(k) be the k-th component
of sj. The derivative of f(y) with respect to x(k) is:

∂f(y)/∂x(k) = ∑_{j=1}^{Ns} [(x(k) − sj(k)) / ‖x − sj‖] (‖x − sj‖ + b − ρj)   (3.2.15)
Let δρj be the j-th component of δρ. The elements of the Hessian are:

∂²f(y)/∂b² = Ns   (3.2.16)

∂²f(y)/∂x(k)∂b = ∑_{j=1}^{Ns} (x(k) − sj(k)) / ‖x − sj‖   (3.2.17)

∂²f(y)/∂x(k)∂x(ℓ) = ∑_{j=1}^{Ns} [(x(k) − sj(k))(x(ℓ) − sj(ℓ)) / ‖x − sj‖²] (1 − δρj/‖x − sj‖)   (3.2.18)

∂²f(y)/∂x(k)∂x(k) = ∑_{j=1}^{Ns} [ ((x(k) − sj(k))² / ‖x − sj‖²) (1 − δρj/‖x − sj‖) + δρj/‖x − sj‖ ]   (3.2.19)

where k ≠ ℓ.
The equations presented here for Newton’s method are more complex than those
for the Gauss-Newton method, and require finding second derivatives. Many of the
individual terms presented above are however similar or even identical to the corre-
sponding terms for the Gauss-Newton method.
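Because second-derivative formulas are easy to get wrong, equations 3.2.14 through 3.2.19 are worth checking numerically. The sketch below compares the analytic gradient and Hessian against central finite differences and takes one Newton step (eq. 3.2.13); the geometry and measurement values are made up for illustration:

```python
import numpy as np

# Check the analytic gradient and Hessian (eqs. 3.2.14-3.2.19) against
# finite differences, then take one Newton step (eq. 3.2.13).
dirs = np.array([[0.6, 0.1, 0.8], [-0.7, 0.2, 0.7], [0.1, 0.8, 0.6],
                 [0.2, -0.7, 0.7], [0.5, 0.5, 0.7]])
sats = 26_600e3 * dirs / np.linalg.norm(dirs, axis=1, keepdims=True)
rho = np.linalg.norm(sats - np.array([6_371e3, 0.0, 0.0]), axis=1) + 300.0

def f(y):
    return 0.5 * np.sum((np.linalg.norm(sats - y[:3], axis=1) + y[3] - rho) ** 2)

def grad_hess(y):
    diff = y[:3] - sats                  # rows are (x - s_j)
    d = np.linalg.norm(diff, axis=1)     # ||x - s_j||
    r = d + y[3] - rho                   # residuals delta_rho_j
    u = diff / d[:, None]                # (x - s_j) / ||x - s_j||
    g = np.empty(4)
    g[:3] = u.T @ r                      # eq. 3.2.15
    g[3] = r.sum()                       # eq. 3.2.14
    H = np.empty((4, 4))
    H[3, 3] = len(sats)                  # eq. 3.2.16
    H[:3, 3] = H[3, :3] = u.sum(axis=0)  # eq. 3.2.17
    w = 1.0 - r / d
    H[:3, :3] = (u * w[:, None]).T @ u + (r / d).sum() * np.eye(3)  # eqs. 3.2.18-3.2.19
    return g, H

y0 = np.array([6_370e3, 2e3, -1e3, 50.0])
g, H = grad_hess(y0)
g_fd = np.array([(f(y0 + h) - f(y0 - h)) / 2.0 for h in np.eye(4)])
H_fd = np.array([(grad_hess(y0 + h)[0] - grad_hess(y0 - h)[0]) / 2.0
                 for h in np.eye(4)])
newton_step = -np.linalg.solve(H, g)     # eq. 3.2.13
```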
Figure 3.1: Convergence of total objective function divided by the number of receivers
for 50 receivers without delay estimation.
3.2.4 Comparison of different single receiver methods
Figure 3.1 shows the convergence of the sum of the objective function values for 50
receivers. For the purpose of this and other simulations, we assumed that the receivers
were located in an area of six minutes in latitude by eight minutes in longitude,
roughly equal to the city of San Francisco in size and geographic location. The actual
ephemeris data of the GPS constellation was used. Pseudoranges for the simulation
were created by adding errors that were generated randomly as Gaussian random
variables with zero mean and standard deviations given in table 3.1 to the sum of the
distances between receivers and satellites and the clock biases. A probability function
was used to simulate the effect of buildings and other objects that might obscure the
field of view of the receiver, making it more likely for high elevation satellites to be
visible.
As expected, the Gauss-Newton method has the best convergence, requiring only a
few steps. Newton’s method also results in fast convergence, but the gradient descent
algorithm is slow and takes a large number of steps to converge.
3.3 Point positioning for multiple receivers with
delay estimation
The errors associated with pseudo-range measurements can be divided into three
different types: (1) the clock bias associated with a particular receiver, which changes
the pseudoranges to all satellites seen by that receiver equally; (2) ionospheric and
tropospheric delays, as well as satellite clock biases, which are associated with each
satellite and are experienced equally by all receivers in a specific location; (3) an
uncorrelated random error consisting of multipath errors and receiver noise. Let ε be
a vector of the correlated delays, consisting of one scalar per satellite. The following
sections describe how a network of GPS receivers can estimate ε and use this estimate
to improve positioning accuracy for its individual receivers. Let εj denote the signal
delay associated with satellite j. If enough information is available, the delays can
be estimated along with the receiver positions in the process of finding a navigation
solution. In this section, we describe how this can be done in a centralized way using
the Gauss-Newton method. We want to minimize the total sum of squares of the
error over all receivers. After including the delays, the new objective function for a
set of Nr receivers which receive signals from the same set of Ns satellites is
f(y) = (1/2) ∑_{i=1}^{Nr} ∑_{j=1}^{Ns} (‖sj − xi‖ + bi + εj − ρi,j)²   (3.3.1)
Here, xi denotes the estimated position of receiver i, bi is the estimated clock bias of
receiver i, ρi,j is the measured pseudorange from receiver i to satellite j, and y is a
vector that contains all xi and bi.
Let ηi,j be 1 if receiver i receives a signal from satellite j, and 0 if the receiver
does not have access to a signal from that satellite. Then a more general form of the
objective function, which does not require all receivers to see the same set of satellites,
is
f(y) = (1/2) ∑_{i=1}^{Nr} ∑_{j=1}^{Ns} ηi,j (‖sj − xi‖ + bi + εj − ρi,j)²   (3.3.2)
3.3.1 Multi-receiver Gauss-Newton method
To solve this least-squares problem, we need to add the delays ε to the vector of
variables to be estimated. We therefore create a vector y, which consists of the
estimated positions and clock biases of all receivers, and the correlated delays. If
yi = [xiᵀ, bi]ᵀ for i = 1, …, Nr, then this vector y is

y = [y1ᵀ, y2ᵀ, …, yNrᵀ, εᵀ]ᵀ   (3.3.3)
(3.3.3)
There is however a problem that arises with estimating all components of y simultane-
ously: There is an ambiguity in estimating all clock biases and correlated delays, since
adding a constant to all of the estimated clock biases while subtracting it from all of
the estimated delays produces no change in the objective function value, i.e.,

f(x, b + c1, ε − c1) = f(x, b, ε)   (3.3.4)
It is possible to resolve this ambiguity by assuming that either the clock biases or
delays have zero mean. With either assumption, the accuracy of the position estimate
is not affected by the validity of the assumption, i.e. the position solution will be
accurate even if the assumption does not apply. The accuracy of the time estimate
is however affected if the clock biases are assumed to be zero-mean, but aren’t, as is
likely to be the case for a receiver network. If, for example, the clock biases have a
mean of b̄ but are assumed to be zero-mean, the delays ε in the final solution would
be offset by b̄ from the value they would have if the clock biases were truly zero-mean;
the estimated position would not be affected, but the estimated receiver time would
also be offset by b̄. In the following algorithms, we take the delays ε to be zero-mean
in order to obtain an accurate time estimate. We therefore remove one of the delays,
εNs from y, and compute it instead from the estimates of the remaining delays, i.e.

εNs = −∑_{k=1}^{Ns−1} εk   (3.3.5)
In order for the least squares problem to have a unique solution, it is necessary that
the total number of pseudorange measurements across all receivers is at least equal
to the number of entries of y, which is 4Nr + Ns − 1.
Let ρi be a vector that contains all of the pseudorange measurements for receiver i.
We then create a vector ρ that contains all pseudorange measurements of all receivers,
i.e.
ρ = [ρ1ᵀ, ρ2ᵀ, …, ρNrᵀ]ᵀ   (3.3.6)
Let Gi be the observation matrix used for solving the single receiver point positioning
problem for receiver i, as described in equation 3.2.8. The observation matrix G for the
multi-receiver problem contains the single-receiver observation matrices along its block diagonal:
G =
[ G1  0   0   …  0    B1  ]
[ 0   G2  0   …  0    B2  ]
[ 0   0   G3  …  0    B3  ]
[ ⋮   ⋮   ⋮   ⋱  ⋮    ⋮   ]
[ 0   0   0   …  GNr  BNr ]   (3.3.7)
If a receiver has access to signals from all satellites that are visible to any other
receiver, then the matrix Bi associated with that receiver is

Bi = ∂ρi/∂ε =
[  1   0   0  …   0 ]
[  0   1   0  …   0 ]
[  0   0   1  …   0 ]
[  ⋮   ⋮   ⋮  ⋱   ⋮ ]
[  0   0   0  …   1 ]
[ −1  −1  −1  …  −1 ]   (3.3.8)
If a receiver does not receive a signal from a specific satellite, then the correspond-
ing row of Bi in equation 3.3.8 is removed, so that the number of rows of Bi is equal
to the number of pseudorange measurements of that receiver.
Given these definitions of ρ, y, and G, we can now write the iterative equations
for solving the multi-receiver least squares problem with delay estimation using the
Gauss-Newton method as follows. At every time step t,

δy(t) = −(G(t)ᵀG(t))⁻¹ G(t)ᵀ δρ(t)   (3.3.9)

y(t+ 1) = y(t) + δy(t)   (3.3.10)
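A minimal centralized sketch of this iteration is shown below, assuming noise-free measurements, full visibility, and made-up geometry; the iteration is warm-started from rough per-receiver position guesses, as single-receiver fixes would provide:

```python
import numpy as np

# Sketch of the centralized multi-receiver Gauss-Newton iteration with delay
# estimation (eqs. 3.3.7-3.3.10). All receivers see all satellites, the
# measurements are noise-free, and the scenario values are made up.
rng = np.random.default_rng(1)
Ns, Nr = 6, 8
dirs = rng.normal(size=(Ns, 3))
dirs[:, 2] = np.abs(dirs[:, 2]) + 0.5        # keep satellites above the horizon
sats = 26_600e3 * dirs / np.linalg.norm(dirs, axis=1, keepdims=True)
X_true = np.array([6_371e3, 0.0, 0.0]) + rng.uniform(-50e3, 50e3, (Nr, 3))
b_true = rng.uniform(-1e3, 1e3, Nr)
eps_true = rng.normal(0.0, 5.0, Ns)
eps_true -= eps_true.mean()                  # zero-mean delays, per eq. 3.3.5
rho = (np.linalg.norm(sats[None] - X_true[:, None], axis=2)
       + b_true[:, None] + eps_true[None])   # noise-free pseudoranges

B = np.vstack([np.eye(Ns - 1), -np.ones(Ns - 1)])  # eq. 3.3.8, full visibility

y = np.zeros(4 * Nr + Ns - 1)                # [y_1 ... y_Nr, eps_1 ... eps_{Ns-1}]
for i in range(Nr):                          # rough warm start
    y[4 * i: 4 * i + 3] = X_true[i] + rng.uniform(-1e3, 1e3, 3)

for _ in range(10):
    eps = np.append(y[4 * Nr:], -y[4 * Nr:].sum())  # eq. 3.3.5
    G = np.zeros((Nr * Ns, 4 * Nr + Ns - 1))
    delta_rho = np.empty(Nr * Ns)
    for i in range(Nr):
        xi, bi = y[4 * i: 4 * i + 3], y[4 * i + 3]
        d = np.linalg.norm(sats - xi, axis=1)
        delta_rho[i * Ns:(i + 1) * Ns] = d + bi + eps - rho[i]
        G[i * Ns:(i + 1) * Ns, 4 * i: 4 * i + 3] = -(sats - xi) / d[:, None]
        G[i * Ns:(i + 1) * Ns, 4 * i + 3] = 1.0   # block-diagonal G_i, eq. 3.3.7
        G[i * Ns:(i + 1) * Ns, 4 * Nr:] = B       # delay columns B_i
    y = y - np.linalg.lstsq(G, delta_rho, rcond=None)[0]  # eqs. 3.3.9-3.3.10

eps_est = np.append(y[4 * Nr:], -y[4 * Nr:].sum())
pos_err = max(np.linalg.norm(y[4 * i: 4 * i + 3] - X_true[i]) for i in range(Nr))
```

With noise-free measurements the iteration recovers the receiver positions and the zero-mean delays essentially exactly; with realistic noise the conditioning issues discussed below come into play.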
3.3.2 Accuracy and sensitivity to random errors
To obtain good accuracy for single receiver point positioning, it is important that the
satellites seen by the receiver are located in different parts of the sky relative to the
receiver. If a large portion of the sky is occluded, and the angle between the satellites
as seen by the receiver is small, then the accuracy of the position solution is low. A
commonly used measure for this is dilution of precision (DOP), where various DOPs
are functions of the diagonal entries of the covariance matrix
W = (GᵀG)⁻¹   (3.3.11)
In estimating the correlated delays of the pseudoranges, a similar issue arises. In order
to get a very accurate delay estimate, it is necessary that the receivers are not located
too close to one another. The inverse of the covariance matrix for the multi-receiver
problem is
W⁻¹ = GᵀG =
[ G1ᵀG1   0       0       …   G1ᵀB1 ]
[ 0       G2ᵀG2   0       …   G2ᵀB2 ]
[ 0       0       G3ᵀG3   …   G3ᵀB3 ]
[ ⋮       ⋮       ⋮       ⋱   ⋮     ]
[ B1ᵀG1   B2ᵀG2   B3ᵀG3   …   ∑_{i=1}^{Nr} BiᵀBi ]   (3.3.12)
If the estimated positions of the receivers are all in the same location, then GᵀG is a
singular matrix, and a position solution cannot be found using the algorithm described
above. If the receivers are not in exactly the same positions but located within short
distances of one another, then GᵀG can be ill-conditioned, leading to large errors.
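The effect of receiver spacing on conditioning can be demonstrated directly by evaluating cond(GᵀG) for a clustered and a spread-out network. The geometry below is made up, and all receivers are assumed to see all satellites:

```python
import numpy as np

# Illustration of the conditioning problem: the normal matrix of the
# multi-receiver problem becomes ill-conditioned as receiver baselines shrink.
rng = np.random.default_rng(2)
Ns, Nr = 6, 5
dirs = rng.normal(size=(Ns, 3))
dirs[:, 2] = np.abs(dirs[:, 2]) + 0.5
sats = 26_600e3 * dirs / np.linalg.norm(dirs, axis=1, keepdims=True)
B = np.vstack([np.eye(Ns - 1), -np.ones(Ns - 1)])   # eq. 3.3.8

def normal_matrix_condition(spread):
    """Condition number of G^T G for receivers spread over +/- spread meters."""
    X = np.array([6_371e3, 0.0, 0.0]) + rng.uniform(-spread, spread, (Nr, 3))
    G = np.zeros((Nr * Ns, 4 * Nr + Ns - 1))
    for i in range(Nr):
        d = np.linalg.norm(sats - X[i], axis=1)
        G[i * Ns:(i + 1) * Ns, 4 * i: 4 * i + 3] = -(sats - X[i]) / d[:, None]
        G[i * Ns:(i + 1) * Ns, 4 * i + 3] = 1.0
        G[i * Ns:(i + 1) * Ns, 4 * Nr:] = B
    return np.linalg.cond(G.T @ G)

cond_close = normal_matrix_condition(10.0)    # receivers within tens of meters
cond_far = normal_matrix_condition(100e3)     # receivers spread over ~200 km
```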
Figure 3.2: Effect of including delay estimation on receiver position estimates: Gray
dots show the actual receiver positions relative to the geographic center of the network,
black dots show the estimated positions that minimize the objective function in 3.3.2.
Another way to look at this issue is to consider the Taylor expansion of the multi-
receiver problem near a point y0. Let y(k) be the k-th entry of y; then

f(y) ≈ f(y0) + ∑_{j=1}^{Ns} ∑_{i=1}^{Nr} ∑_{k=1}^{4Nr+Ns−1} ηi,j (‖sj − xi,0‖ + bi,0 + εj,0 − ρi,j) × [∂/∂y(k) (‖sj − xi,0‖ + bi,0 + εj,0 − ρi,j)] (y(k) − y0(k))   (3.3.13)
The constant term is not affected by any changes in the estimated position. If we
take the point y0 to be the point where the estimated pseudoranges (‖sj − xi‖ + bi + εj)
are equal to the measured pseudoranges ρi,j, then the linear term is also invariant
under changes in the estimated position. The quadratic term in this case would be
non-zero, but relatively small for realistic values of y.
Figure 3.2 demonstrates what can happen to the final position estimate as a
result of the issues described above: The estimated position of the constellation of
receivers appears to be shifted significantly from their true positions. The estimated
delays for this example are very large. While the receivers are moved towards one of
the satellites, the corresponding estimated delay is shortened, resulting in very little
overall change to the objective function. Without the presence of noise, these shifts
would not occur, but even small amounts of noise can have the effect that shifts of
the estimated position cause minor reductions in the objective function value.
Several approaches can be taken to mitigate this problem. Since the shifts in
the estimated position are related to large estimated pseudorange delays, it helps to
penalize large delays in our objective value function. This is described in the following
section.
3.3.3 Regularized delay estimation
If we take the single receiver positioning solution for each receiver and find the value
of the multi-receiver objective function (equation 3.3.1) at those points, and compare
that to the value of the objective function at the true position of the receivers, the
difference is small but significant for scenarios with typical noise. Comparing the
values at the true position to the solution from the multi-receiver positioning algo-
rithm described above, the differences in the objective value function are very small,
even though the differences in the receiver positions and correlated delays are large.
Regularized least-squares methods can provide a solution where large values of ε are
penalized, so that the magnitude of ε can only be significantly larger than zero if
it provides a significant improvement in the objective function. Regularized least-
squares therefore prevents the position solution for the entire network from shifting
as seen in figure 3.2.
With the penalty for ε added with coefficient µ, the new objective function is:

freg(y) = (1/2) ∑_{i=1}^{Nr} ∑_{j=1}^{Ns} ηi,j (‖sj − xi‖ + bi + εj − ρi,j)² + (µ/2)‖ε‖²   (3.3.14)
For finding the regularized least-squares solution, it helps to express ε as a linear
function of y.
‖ε‖ = ‖[0  B] y‖ = ‖F y‖   (3.3.15)
Here B, which has one row for each pseudorange measurement and Ns − 1 columns,
is composed of the matrices Bi described in equation 3.3.8, stacked vertically:

B = [B1ᵀ, B2ᵀ, …, BNrᵀ]ᵀ   (3.3.16)
The update equation for the Gauss-Newton method is

δy(t) = −(G(t)ᵀG(t) + µF(t)ᵀF(t))⁻¹ G(t)ᵀ δρ(t)   (3.3.17)
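In implementation terms, the regularization only adds a ridge term to the normal matrix before solving. The sketch below uses random stand-ins for the actual linearization, and simplifies F so that it just selects the delay entries of y (a special case of eq. 3.3.15):

```python
import numpy as np

# The regularized update (eq. 3.3.17) differs from the plain one only by the
# ridge term mu * F^T F. G, F, and delta_rho are made-up stand-ins for the
# actual linearization; F simply selects the delay entries of y here.
rng = np.random.default_rng(3)
n_meas, n_state, n_delay = 30, 25, 5
G = rng.normal(size=(n_meas, n_state))
delta_rho = rng.normal(size=n_meas)
F = np.hstack([np.zeros((n_delay, n_state - n_delay)), np.eye(n_delay)])
mu = 0.1

def gn_step(mu):
    # eq. 3.3.17 (reduces to the unregularized eq. 3.3.9 when mu = 0)
    return -np.linalg.solve(G.T @ G + mu * F.T @ F, G.T @ delta_rho)

delta_y_plain = gn_step(0.0)
delta_y_reg = gn_step(mu)
delay_norm_plain = np.linalg.norm(F @ delta_y_plain)
delay_norm_reg = np.linalg.norm(F @ delta_y_reg)
```

Increasing µ can only shrink the norm of the delay components of the step, which is exactly the behavior wanted here.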
3.4 Distributed delay estimation
The method described in the previous section works if the number of nodes is small
and the computation is performed in a centralized way. If the number of nodes is
large, it becomes difficult to find the pseudoinverse of G. We would like to perform
the computations in a decentralized way, where each receiver has access to limited
information about the other receivers, and where the amount of computation to be
performed by each receiver is small, so the method described above is not suitable for
our purposes. Any method for finding the positions of the receivers in a decentralized
way should yield the same solution as the centralized method described above.
The previous section explains how the optimization problem can be solved by using
a least-squares method which varies all components of y concurrently to minimize
f(y). To be able to solve the problem in a decentralized way, each step in the
optimization process can instead be broken down into two parts: First, new values
of the position estimates x and the clock bias estimates b are found that reduce the
value of the objective function f(y) while keeping the delays ε constant, and then
the minimizing values of ε are found while keeping the estimated receiver positions
constant.
The first part of the time step is then equivalent to finding the position solution
for each receiver independently, and can be completed using any of the methods for
single receivers described above. The only modification necessary is that the previous
estimate of the delays needs to be included, so that equation 3.2.5 becomes

δρj(t) = ‖sj − x(t)‖ + b(t) + εj(t) − ρj   (3.4.1)
Minimizing the norm of the residuals of the pseudoranges with respect to the
delays while keeping the receiver position estimate constant is relatively simple. The
least squares solution is actually equivalent to setting the delay associated with each
satellite equal to the mean of the residual of the pseudoranges without including
the previous delay estimate. To show that this is indeed the minimum, we take the
derivative of 3.3.2 with respect to ε and set it to zero:

∂f(y)/∂εj = ∑_{i=1}^{Nr} ηi,j (‖sj − xi‖ + bi + εj − ρi,j) = 0   (3.4.2)

This equation is then solved for εj to find the following update equation:

εj(t) = (1 / ∑_{i=1}^{Nr} ηi,j) ∑_{i=1}^{Nr} ηi,j (ρi,j − ‖sj − xi(t)‖ − bi(t))   (3.4.3)
While this equation for updating the delay estimates requires coordination and
sharing of information between nodes, the only distributed operation that is necessary
is computing the sum of a vector across the network, which can be accomplished using a consensus method.
This is fairly straightforward, and a variety of different distributed consensus meth-
ods are available and could be used for the implementation, including the multigrid
method described in chapter 2. Given such a consensus method, there are two ways
in which it could be integrated here: The first possibility would be to run a consensus
method until some convergence criterion is reached every time ε is updated. The
second method would be to perform only a fixed number of steps in the consensus
process, continuing the consensus process for a few steps each time ε is updated,
and using the new values of x and b at each time step to update the variables at each
node that are used as inputs to the consensus process. This would require that the
consensus algorithm that is used can handle dynamic updates of the initial node val-
ues, but also has a few advantages: It is likely to lead to faster overall convergence,
since it does not require completing a full consensus process at each update of x and
b, and it requires less coordination between nodes, since convergence does not have
to be detected at each time step, and the sequence of operations is pre-determined.
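A minimal example of such a consensus method is iterative neighbor averaging with a doubly stochastic weight matrix: every node repeatedly replaces its value with a weighted average of its neighbors' values, and all node values converge to the network-wide mean, which is essentially the averaging needed in equation 3.4.3. The ring topology and values below are made up for illustration:

```python
import numpy as np

# Average consensus on a ring of n nodes: x <- W x with W doubly stochastic
# drives every node's value to the mean of the initial values.
n = 20
values = np.arange(n, dtype=float)          # e.g. one residual per node
W = np.zeros((n, n))                        # each node averages with 2 neighbors
for i in range(n):
    W[i, i] = 1.0 / 3.0
    W[i, (i - 1) % n] = 1.0 / 3.0
    W[i, (i + 1) % n] = 1.0 / 3.0

x = values.copy()
for _ in range(2000):
    x = W @ x                               # one round of neighbor exchanges
```

After enough rounds every entry of `x` holds the network mean; the sum needed in the delay update then follows by multiplying by the (known) number of contributing nodes.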
The two steps of minimizing with respect to the receiver estimated position and
with respect to the delays can be alternated until the solution converges. If the re-
ceiver positions are completely unknown, it can be helpful to perform a few iterations
on the receiver positions only, until the solution gets reasonably close to the true posi-
tion, since the delays are typically small. This is particularly useful for methods that
converge slowly, such as gradient descent, but does not yield significant improvements
in convergence time if the Gauss-Newton method is used. Instead of alternating the
two steps, it is also possible to perform multiple iterations of the position update
before each ε update. The results presented below however use alternating single
iterations of each step.
3.4.1 Regularized distributed delay estimation
For the distributed delay estimation the same accuracy problems arise as for the
centralized method described above. It is still possible for small changes in the random
errors to cause large shifts in the position solutions, and therefore regularized
least-squares methods are still useful for keeping the values of ε low.
Just like in the centralized case, we minimize the regularized least-squares objec-
tive function (equation 3.3.14). In the two-step process of the distributed optimization
method, the first step of minimizing with respect to the positions and clock biases does
not change if the regularization term is added, since the extra term does not contain
those variables. For the minimization with respect to ε, we find the partial derivative
of the new objective function with respect to ε and set it to zero:

∂freg/∂εj = ∑_{i=1}^{Nr} ηi,j (‖sj − xi‖ + bi + εj − ρi,j) + µεj = 0   (3.4.4)

⇒ εj(t) = [∑_{i=1}^{Nr} ηi,j (ρi,j − ‖sj − xi(t)‖ − bi(t))] / [∑_{i=1}^{Nr} ηi,j + µ]   (3.4.5)
3.4.2 Comparison of the different methods
The first part of each time step is equivalent to finding the single-receiver position
solution for each receiver individually. Any of the three methods described in section
3.2 can be used for that step. We previously determined that if we don’t attempt
to estimate the delays, the Gauss-Newton method converges fastest. Figures 3.4
and 3.5 show that this is also true with delay estimation. Simulations were run for
networks of 50 and 500 receivers, using the same receiver locations and pseudorange
errors that were used for the simulation without delay estimation. The regularization
method described in 3.4.1 was implemented with µ = 0.1. For both network sizes,
the Gauss-Newton method performed best, while the convergence of the gradient
descent method was very slow. Based on these results, we use the Gauss-Newton
method for all further simulations presented below.
The figures show the value of the objective function divided by the number of
receivers as a function of time. The final value of f(y) after convergence is lower
for the simulations that included delay estimation. The objective function value per
receiver for the 500 receiver network with delay estimation is lower than for the 50
receiver network, although the difference is small.
3.5 Performance Comparison
Figure 3.6 shows what kind of improvements in positioning accuracy would be achieved
if the correlated delays could be determined exactly. The figure shows the mean po-
sitioning error for a receiver located in San Francisco over the course of a day as
a function of the number of satellites in view, using both clear-sky cases and cases
where some of the satellites above the horizon were blocked out. Errors are plotted
both for regular pseudoranges that are influenced by all of the errors mentioned in
section 3.1, as well as for pseudorange data that did not include any of the correlated
delays.
Figure 3.3: Convergence of total objective function divided by the number of receivers
for 500 receivers without delay estimation.

Figure 3.4: Convergence of total objective function divided by the number of receivers
for 50 receivers with delay estimation.

Figure 3.5: Convergence of total objective function divided by the number of receivers
for 500 receivers with delay estimation.

Figure 3.6: Mean positioning error as a function of the number of satellites from which
signals are received, for pseudorange errors as described in table 3.1. Random errors
include receiver noise and multipath.

Figure 3.7: Mean objective value function per receiver as a function of the number
of receivers, with and without delay estimation, and for a hypothetical case where
correlated delays are set to zero.

As could be expected, the positioning errors are dramatically reduced if the correlated
delays are removed. The results show that in the absence of delay estimation or
the use of some other augmentation method, positioning errors can be fairly large if
the satellite visibility is not good. For four visible satellites, the mean error is larger
than 75 meters, which poses a challenge for many common applications of GPS. For
those types of applications, augmentation can be a necessity.
Figure 3.7 shows the average objective function value on a per receiver basis for
three different cases, including regular single receiver point positioning as described in
section 3.2 with all errors included, single receiver point positioning for pseudorange
measurements with correlated delays removed, and multi-receiver point positioning
with delay estimation as described in section 3.3. The results indicate that multi-
receiver positioning with delay estimation is very effective in reducing the value of
the objective function, almost lowering it to the values that would be achieved in the
absence of correlated delays.
Figure 3.8 shows the accuracy improvement that is achieved by using delay esti-
mation for networks with various numbers of receivers. The accuracy improvement
factor in the figure is the ratio between the mean positioning error without and with
CHAPTER 3. DISTRIBUTED GPS AUGMENTATION 57
0
1
2
3
4
5
Accuracy
improvem
entfactor
5 10 20 50 100 200 500
Number of receivers
Figure 3.8: Ratio of total position error for algorithm without delay estimation toerror with delay estimation as a function of the number of receivers for a networkcovering a space of 7.3 km × 11.7 km × 100 m. Each data point represents the meanposition error per receiver for 100 trials with different noise and satellite visibility.Error bars represent one standard deviation.
Figure 3.9: Ratio of total position error for algorithm without delay estimation to error with delay estimation as a function of the number of receivers for a network covering a space of 73 km × 117 km × 10 000 m.
delay estimation. The results show that for a network of receivers spread across the area described above, the positioning error can be reduced by a factor of two or more if the network consists of 30 or more receivers. Increasing the network size beyond 30 receivers does not yield significant further improvements in positioning accuracy.
The reason for this lies in the conditioning problems described in section 3.3.2, which
are particularly bad for networks with short receiver baselines. If the receivers are
spread out more, especially in altitude, a much better accuracy improvement can be
achieved, and larger network sizes are required to get the full benefit. Figure 3.9
shows the accuracy improvement if the receivers are spread out over 10 000 m in al-
titude, 40 minutes in latitude, and 80 minutes in longitude. Unfortunately this kind
of geometry is not feasible for most geographic areas without the use of airborne re-
ceivers. As mentioned before, spreading the receivers too much will result in reduced
correlation between the errors experienced by the different receivers, which is not
modeled in the simulations presented here. Note that for very small networks, the
accuracy improvement of the more spread-out network is less than for the original
Figure 3.10: Ratio of total position error for algorithm without delay estimation to error with delay estimation as a function of the number of receivers for a network covering a space of 7.3 km × 11.7 km × 100 m with random and multipath errors of 20 m.
network because the receivers are less likely to have the same satellites in view.
One potential remedy for the decreased correlation of pseudorange errors between
receivers in a spread-out network would be to model the correlated delays not as
single values, but as functions of geographic location with a number of parameters,
and to use an extended version of the methods described in the preceding sections
to estimate these parameters.
Figures 3.10 and 3.11 are analogous to figures 3.8 and 3.9, but with random and
multipath errors of 20 m. The plots show no significant change in positioning
accuracy when delay estimation is used, indicating that delay estimation is ineffective
for the network sizes studied here if multipath errors are very high, although it
also does not degrade the point positioning accuracy.
Figure 3.11: Ratio of total position error for algorithm without delay estimation to error with delay estimation as a function of the number of receivers for a network covering a space of 73 km × 117 km × 10 000 m with random and multipath errors of 20 m.
3.6 Multigrid methods for distributed delay estimation
The method for point positioning using distributed networks described above can be
implemented using a variety of different consensus algorithms for finding the delay
estimates. For small networks, a Metropolis algorithm [45] can be used, while the
algorithm described in chapter 2 would be appropriate for large networks. The method
for adjusting node measurement values described in section 2.8 is useful for updating
the residual values for each node after x and b are updated, so the consensus process
does not have to be run to complete convergence at each time step. To demonstrate
this implementation, this section describes the results from a numerical simulation of
point positioning in such a multigrid network.
For this simulation, we used the same network structure as for the two-dimensional
example in chapter 2. The x- and y-coordinates of the nodes in the network were
scaled so that the network would cover the 7.3 km × 11.7 km area around San Fran-
cisco described above. The resulting network, including the latitudes and longitudes
of the nodes, is shown in figure 3.12. In addition, random values of altitude between
0 and 100 m were assigned to the nodes.
Figures 3.13 and 3.14 show the convergence of the positioning error and objective
function for this simulation, comparing the multigrid algorithm with a solution each
receiver obtained individually without delay estimation. For the multigrid simulation,
each time step consists of one iteration with respect to x and b, followed by one
iteration of optimizing with respect to ε with eight steps in the consensus process.
The use of delay estimation reduces the positioning error by 53% in this case, which
is consistent with the results shown in figure 3.8.
3.7 Conclusion
In this chapter we described a method that estimates the portion of GPS pseudorange
measurement errors that is experienced by all receivers located in a specific area, in-
cluding errors due to inaccurate ephemerides, as well as tropospheric and ionospheric
Figure 3.12: Example network layout (connections between different levels not shown).
Figure 3.13: Positioning error convergence for the receiver network example.
Figure 3.14: Objective function value for the receiver network example.
delays. The resulting estimates of correlated delays are used to improve the point
positioning accuracy of the receivers. We found that spreading the receivers far
apart helps to obtain a network geometry that yields a well-conditioned solution,
although doing so also reduces the correlation between the errors experienced by
the different receivers. Because of this trade-off, the achievable accuracy
improvements are not as large as those possible
with augmentation methods that use a network of fixed reference receivers. Our
simulations show that even given these issues, we can still reduce positioning errors
significantly by using distributed augmentation. Since the algorithms described here
do not rely on reference receivers, they could be particularly useful for places where
no augmentation networks exist, which includes most places on the Earth outside
North America and Europe.
We also showed that the correlated delays can be estimated using either centralized
or distributed computation. The multigrid algorithms described in chapter 2 can be
applied here, making distributed estimation feasible even for very large networks of
receivers.
The methods described here model the pseudorange correlated delays as a single
variable, assuming the receivers are located sufficiently close together that they ex-
perience the same delays. For future work, it would be interesting to combine the
algorithm presented here with regression analysis methods to find an estimate of the
pseudorange delays as a function of geographic location. This would make it more
reasonable to include receivers that are located further apart, and it might make it
possible to reduce errors further by including more receivers in the network, resulting
in relatively large networks that could take full advantage of the multigrid methods.
Chapter 4
Spectral methods for distributed
networks
CHAPTER 4. DISTRIBUTED SPECTRAL METHODS 66
4.1 Introduction and Assumptions
In chapter 2 we used the spectral gap of the state transition matrix as a measure of
performance of a distributed network for the purpose of running consensus algorithms.
We also pointed out that the eigenvectors corresponding to the largest eigenvalues of
the state transition matrix are an indicator for the types of noise that result in slow
convergence. In this chapter we describe how the distributed network itself can be
used to find some spectral properties of the state transition matrix, focusing on the
largest eigenvalues and corresponding eigenvectors.
For the types of state transition matrices described in chapter 2, the largest eigen-
value is always one, and the corresponding eigenvector is equivalent to the invariant
distribution of the system. We furthermore assume that all eigenvalues are real,
unique, and well separated. It is possible to construct networks with relatively sym-
metric and regular topologies that have repeated eigenvalues, but we will not consider
these networks here.
There is of course a large variety of eigenvalue methods that are commonly used
for large systems, including the Jacobi method, Lanczos’ method [21], Davidson’s
method [10], Krylov subspace methods, and combinations and variations thereof, such
as in [37]. Most of these algorithms are difficult to adapt for distributed networks,
since they rely on matrix factorizations. Many methods also only work for symmetric
matrices.
Kempe and McSherry developed an algorithm for finding the spectral properties of
a distributed network [18], and we build on that algorithm in this chapter. However,
this method is based on the assumption that the state transition matrix is symmet-
ric, which is a valid assumption for many networks, such as those using Metropolis
weights [45] with a uniform invariant distribution. The multiscale methods we pre-
sented in chapter 2, however, result in nonsymmetric state transition matrices. In the
following sections, we describe how the method in [18] can be extended to nonsym-
metric matrices. We also describe another algorithm, based on the power method,
that can be used as an alternative. In section 4.6 we study the performance of the
two algorithms with a numerical example. The power method is used for finding
the first and second largest eigenvalue and eigenvector of a symmetric matrix with
distributed computation in [5] as well as in [46], which then uses this result to control
connectivity in a network of robots.
In addition to monitoring the worst-case convergence rate of the network, knowl-
edge of the spectral properties can also be used to provide some guidance for supernode
placements in multigrid networks. As shown in [8], [9], and [27], given the
spectral properties of the state transition matrix, a diffusion map of the network can
be created. In section 4.7 we use a map of the largest eigenvectors of a network
as a guide for placing supernodes, and present a heuristic for fast selection of good
placements.
4.2 Spectral methods for symmetric matrices
Many of the most efficient spectral methods that are in common use today are difficult
to adapt for distributed computation, since they require operations such as matrix
factorization that cannot easily be broken down into small parts that can be performed
by individual nodes in a network. The simplest types of operations that can be
performed by a distributed network are consensus computations and multiplication
of a vector by the state transition matrix. We will therefore only consider spectral
methods that can be reduced to a series of operations that are either of these two
types, or that are simple enough for nodes to perform individually without knowing
the states of other nodes. In this section, we present two methods that can be adapted
for distributed systems. Since the fully distributed versions of these algorithms are
rather complex, we first present their centralized forms. Section 4.3 describes which
modifications should be made for applications in distributed networks.
One method that can be used in distributed networks is the power method. In
its most simple form, it can be used to find the eigenvalue with the largest modulus
and its corresponding eigenvector. This is done by multiplying a random vector
q repeatedly by a matrix P, and normalizing the result. The power method can
therefore be expressed as iterating over the following two equations, where q(t)→ v1
as t → ∞.

q(t+1) = Pq(t)    (4.2.1)

q(t) = q(t)/‖q(t)‖    (4.2.2)
If the largest eigenvalue is either very large or close to zero, the normalization should
be done at every time step to keep the size of the vector q(t) within a reasonable
range. For eigenvalues close to one, it can be sufficient to normalize once every few
time steps.
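As a concrete illustration, the iteration in equations 4.2.1 and 4.2.2 can be sketched in a few lines of Python. This is a centralized sketch, not the distributed implementation; the 3 × 3 row-stochastic matrix and the step count are hypothetical choices for demonstration.

```python
import numpy as np

def power_method(P, steps=200, seed=0):
    """Estimate the dominant eigenpair of P by repeated multiplication
    and normalization, as in equations 4.2.1 and 4.2.2."""
    rng = np.random.default_rng(seed)
    q = rng.standard_normal(P.shape[0])
    for _ in range(steps):
        q = P @ q                  # q(t+1) = P q(t)
        q /= np.linalg.norm(q)     # normalize to keep q well scaled
    lam = q @ (P @ q) / (q @ q)    # Rayleigh-quotient eigenvalue estimate
    return lam, q

# Toy row-stochastic transition matrix: the largest eigenvalue is 1 and
# the corresponding (right) eigenvector is the constant vector.
P = np.array([[0.6, 0.4, 0.0],
              [0.3, 0.4, 0.3],
              [0.0, 0.5, 0.5]])
lam, v = power_method(P)
```

For this matrix the iteration converges to the eigenvalue 1 with the constant eigenvector, which mirrors the remark below that the dominant eigenpair of a state transition matrix is already known.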
For the type of distributed system we are interested in here, this method in this
simple form is not very useful, since the largest eigenvalue is already known to be 1,
and the corresponding eigenvector is equal to the invariant distribution.
A shifted version of the power method can be used to find additional eigenvalues
and eigenvectors. Once the first eigenvalue is known, the power method can be run
using the shifted matrix (P − λ1 v1 v1^T), where v1 is the eigenvector associated with
the largest eigenvalue. Thus, the second eigenvalue and associated eigenvector can
be found, which can then be used to find additional eigenvectors and eigenvalues.
Instead of a single vector q, we start the algorithm with a matrix Q ∈ R^{N×n}, where N
is the number of nodes (or the size of P ), and n is the number of eigenvalues we want
to find. We let qj be the j-th column of Q. Using the following equations, qj(t)→ vj
as t→∞ [35].
qj(t+1) = ( P − ∑_{k=1}^{j−1} λk vk(t) vk(t)^T ) qj(t)    (4.2.3)

qj(t) = qj(t)/‖qj(t)‖    (4.2.4)
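For a symmetric P, the shifted iteration can be sketched as follows. This is a centralized illustration with a hypothetical diagonal test matrix whose spectrum is known in advance.

```python
import numpy as np

def deflated_power(P, n, steps=500, seed=1):
    """Find the n largest-magnitude eigenpairs of a symmetric matrix P by
    running the power method on successively deflated matrices, following
    equations 4.2.3 and 4.2.4."""
    rng = np.random.default_rng(seed)
    lams, vecs = [], []
    for j in range(n):
        # Deflate the eigenpairs found so far: P - sum_{k<j} lam_k v_k v_k^T
        D = P - sum(l * np.outer(v, v) for l, v in zip(lams, vecs))
        q = rng.standard_normal(P.shape[0])
        for _ in range(steps):
            q = D @ q
            q /= np.linalg.norm(q)
        lams.append(q @ P @ q)     # Rayleigh quotient of the original P
        vecs.append(q)
    return np.array(lams), np.column_stack(vecs)

P = np.diag([1.0, 0.8, 0.5])   # symmetric toy matrix with known spectrum
lams, V = deflated_power(P, 2)
```

After deflating the first eigenpair, the iteration for the second vector runs on a matrix whose dominant eigenvalue is the original second eigenvalue, so the eigenpairs are recovered in order of decreasing magnitude.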
Another simple shifting method can be used to find the eigenvector associated with
the largest eigenvalue (in absolute value) of the sign opposite to λ1. In the specific
case considered here, since λ1 = 1, this will find the smallest eigenvalue. To find the
associated eigenvector, one can simply run the power method on the matrix P − σI,
where any σ larger than λ1 (and some that are smaller) will work.
An alternative to the power method is the algorithm described in [18]. The main
CHAPTER 4. DISTRIBUTED SPECTRAL METHODS 69
idea behind this algorithm is similar to the power method in that a matrix is alter-
nately multiplied with P and orthonormalized. The method differs from the power
method in the way the orthonormalization is performed.
Theorem 4.2.1. Let Q(0) be a random matrix and P be the symmetric state-transition matrix of a network. At each time step, a matrix W is computed:

W(t) = PQ(t−1)    (4.2.5)

Consider the QR-factorization of W, W = QR. Let K be equal to W^T W, so that

K = W^T W = R^T Q^T Q R = R^T R    (4.2.6)

The matrix R is found by computing the Cholesky factorization of K, and the resulting matrix is inverted and used to compute an updated value for Q:

Q(t) = W(t) R^{−1}(t)    (4.2.7)

As t → ∞, the columns of Q converge to the eigenvectors of P.
Proof. A proof of this theorem is given in [18]. □
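In centralized form, the iteration of theorem 4.2.1 can be sketched as below. The quantity `K = W.T @ W` is the one a network would obtain by consensus summation; the test matrix with eigenvalues 1.0, 0.7, 0.4, 0.1 is hypothetical.

```python
import numpy as np

def cholesky_orthonormal_iteration(P, n, steps=300, seed=2):
    """Orthonormal iteration in which the QR step is realized through the
    Cholesky factor of K = W^T W, as in theorem 4.2.1."""
    rng = np.random.default_rng(seed)
    Q = rng.standard_normal((P.shape[0], n))
    for _ in range(steps):
        W = P @ Q                      # W(t) = P Q(t-1)
        K = W.T @ W                    # obtainable by consensus summation
        R = np.linalg.cholesky(K).T    # K = R^T R with R upper triangular
        Q = W @ np.linalg.inv(R)       # Q(t) = W(t) R^{-1}(t)
    return Q

# Symmetric test matrix with distinct eigenvalues 1.0, 0.7, 0.4, 0.1.
rng = np.random.default_rng(3)
U, _ = np.linalg.qr(rng.standard_normal((4, 4)))
P = U @ np.diag([1.0, 0.7, 0.4, 0.1]) @ U.T
Q = cholesky_orthonormal_iteration(P, 2)
```

Because P here is symmetric with well separated eigenvalues, the columns of Q converge to the two dominant eigenvectors, in line with the theorem.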
While these algorithms only yield the eigenvectors for symmetric matrices P , they
can be run using a nonsymmetric P . The columns of the resulting matrix Q do not
converge to the eigenvectors in this case, but instead form an orthonormal basis,
where the first j columns span the same space as the first j eigenvectors. We take
advantage of this fact when we extend these methods to nonsymmetric matrices in
section 4.4.
4.3 Adapting spectral methods for distributed networks
The two spectral methods, in the way they are presented in the previous section, are
appropriate for centralized systems. This section describes how they can be adapted
CHAPTER 4. DISTRIBUTED SPECTRAL METHODS 70
for distributed networks. We assume that vectors of length N are stored by the
system by storing each scalar component of the vector in the corresponding node.
Adding vectors and multiplication of vectors by scalars in distributed networks can
thus be straightforwardly performed with only node-local computations. Multiplying
a vector by the transition matrix P is also possible in an efficient manner, involving
only nearest-neighbor exchanges of information.
To find the mean of a vector, we multiply the vector repeatedly with P, so that the
invariant distribution ā of the vector a ∈ R^N is found using the following equation,
where T_F is an integer that is large enough for the algorithm to converge.

ā = P^{T_F} a    (4.3.1)

As described in chapter 2, the mean of a is related to the invariant distribution by the
offset factors κ, which can be found either by solving a system of linear equations, or
by simply running the same consensus algorithm on 1:

κ = P^{T_F} 1    (4.3.2)

The mean of the entries of a can then be found by each node i by dividing the
corresponding entry of ā, ā^{(i)}, by the i-th entry of κ:

(1^T a)/N = ā^{(i)}/κ^{(i)}    (4.3.3)
If we need to find the sum of the entries of a vector, we simply multiply the mean by
the number of nodes N .
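A centralized emulation of this mean computation is sketched below. The sketch assumes a mass-preserving (column-stochastic) P, which is one setting in which the offset-factor correction of equation 4.3.3 works out; the 3-node chain is hypothetical and matrix powers stand in for T_F consensus iterations.

```python
import numpy as np

def consensus_mean(P, a, T_F=200):
    """Every node recovers the network mean of a by running the same
    consensus iteration on a and on the all-ones vector (equations
    4.3.1-4.3.3); P is assumed mass-preserving (columns sum to one)."""
    PT = np.linalg.matrix_power(P, T_F)
    a_bar = PT @ a                # invariant distribution of a
    kappa = PT @ np.ones_like(a)  # offset factors kappa
    return a_bar / kappa          # each entry equals mean(a)

# Hypothetical 3-node column-stochastic chain.
P = np.array([[0.6, 0.3, 0.0],
              [0.4, 0.4, 0.5],
              [0.0, 0.3, 0.5]])
a = np.array([3.0, 6.0, 9.0])
m = consensus_mean(P, a)
```

Both consensus runs converge to multiples of the same invariant distribution, so the ratio at every node equals the global mean even though no node ever sees the whole vector.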
For the power method, both equations 4.2.3 and 4.2.4 need to be modified. Equa-
tion 4.2.3 can be written as
qj(t+1) = Pqj(t) − ∑_{k=1}^{j−1} λk vk(t) (vk(t)^T qj(t))    (4.3.4)
The challenge in distributing this computation is finding vk(t)^T qj(t). In order to do
that, we introduce a set of n matrices Gk, where for each k < n,

Gk(t) = diag(vk(t)) Q(t)    (4.3.5)

The matrix Q in the equation above contains all vectors qj, each vector being a column
of Q. The expression vk^T qj is then equal to the sum of the entries in the j-th column
of Gk. The column sums of Gk can be found by running a consensus algorithm on
each column of Gk, and multiplying the resulting average by N to obtain the column
sum.
The second difficulty in using the shifted power method on a distributed network
arises with the normalization of qj. This norm can also be computed by using a
consensus algorithm. Let hj ∈ R^N be the element-wise square of qj, so that for
i = 1, . . . , N, the i-th entry of hj is

hj^{(i)} = (qj^{(i)})^2    (4.3.6)

After running a consensus algorithm on hj, each node can compute the norm of qj
from its entry h̄j^{(i)} of the invariant distribution:

h̄j = P^{T_F} hj    (4.3.7)

‖qj‖ = √( h̄j^{(i)} N / κ^{(i)} )    (4.3.8)
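The same consensus machinery lets each node recover ‖qj‖ from purely local arithmetic. The sketch below again assumes a hypothetical mass-preserving P and emulates T_F consensus iterations with a matrix power.

```python
import numpy as np

def consensus_norm(P, q, T_F=200):
    """Each node squares its own entry of q, the network averages the
    squares by consensus, and each node recovers ||q|| via equation 4.3.8.
    P is assumed mass-preserving (columns sum to one)."""
    N = q.size
    h = q ** 2                            # h^(i) = (q^(i))^2, eq. 4.3.6
    PT = np.linalg.matrix_power(P, T_F)
    h_bar = PT @ h                        # consensus run on h, eq. 4.3.7
    kappa = PT @ np.ones(N)               # offset factors
    return np.sqrt(h_bar * N / kappa)     # one local estimate per node

P = np.array([[0.6, 0.3, 0.0],
              [0.4, 0.4, 0.5],
              [0.0, 0.3, 0.5]])
q = np.array([1.0, 2.0, 2.0])   # true norm is 3
norms = consensus_norm(P, q)
```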
For the QR-factorization method, the only challenge for running the algorithm in a
distributed network is computing W^T W. As described in [18], this can also be found
using a consensus algorithm. If we let w^{(i)} be the i-th row of W, then node i can
compute the n × n matrix (w^{(i)})^T w^{(i)}, and summing these matrices entry-wise
across all nodes using a consensus algorithm yields W^T W.
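In code, the consensus computation of K = W^T W amounts to summing small per-node outer products; in this hypothetical sketch an ordinary Python sum stands in for the consensus sum.

```python
import numpy as np

# Node i holds row i of W (its own entries of the n basis vectors).
rng = np.random.default_rng(4)
W = rng.standard_normal((5, 2))   # 5 nodes, n = 2 basis vectors

# Each node forms the n-by-n outer product of its own row; a consensus
# sum of these small matrices yields W^T W without assembling W anywhere.
local = [np.outer(W[i], W[i]) for i in range(W.shape[0])]
K = sum(local)
```

Each node only ever handles an n × n matrix, so the communication cost is independent of the network size N.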
Whenever one of the algorithms mentioned here requires the use of a consensus
algorithm for a specific step, we have a choice of running the consensus process to
convergence at each iteration of the spectral method, or of performing only a specific
number of iterations of the consensus process. In the numeric examples we present
below, we selected a fixed number of iterations for each consensus process. In addi-
tion, instead of restarting the consensus process anew at each iteration, we used the
equations for updating sensor measurements presented in section 2.8, which improves
convergence of the consensus processes once the spectral algorithm approaches the
limit.
4.4 Adapting spectral methods for nonsymmetric matrices
As described above, several methods for finding the eigenvectors of a symmetric ma-
trix can be used to find an orthonormal basis for the eigenvectors of a nonsymmetric
matrix. This section describes how the actual eigenvectors can be found if such an
orthonormal basis is known. The method below works for nonsymmetric matrices,
but we assume here that all eigenvalues are unique and well separated.
From one of the methods described above we get a matrix Q, where any column
qj of Q is a linear combination of the first j eigenvectors (sorted by magnitude of
the corresponding eigenvalue). For each qj we can then use a method based on the
following theorem to eliminate components of the first j − 1 eigenvectors to obtain
the eigenvector vj.
Theorem 4.4.1. Let qk be a unit vector that is a linear combination of the first k
eigenvectors of a matrix P that has real, well separated eigenvalues. Then

( ∏_{i=1}^{k} (λi I − P) ) qk = 0    (4.4.1)

Proof. The vector qk can be expressed as a linear combination of the first k eigenvectors:

qk = ∑_{j=1}^{k} αj vj    (4.4.2)

Plugging this into the left-hand side of equation 4.4.1 yields

( ∏_{i=1}^{k} (λi I − P) ) qk = ∑_{j=1}^{k} ( ( ∏_{i=1}^{k} (λi I − P) ) αj vj )    (4.4.3)

( ∏_{i=1}^{k} (λi I − P) ) αj vj = ( ∏_{i=1}^{k} (λi − λj) ) αj vj = 0 for all j ≤ k    (4.4.4)

⇒ ∑_{j=1}^{k} ( ( ∏_{i=1}^{k} (λi I − P) ) αj vj ) = 0    (4.4.5)

□
�
A method for finding eigenvector vk (assuming λ1 through λk−1 have already been
determined) is to first eliminate all components of qk that are not parallel to vk by
using the result stated in the theorem above. The resulting vector is then normalized
to obtain the eigenvector.
ṽk = ( ∏_{i=1}^{k−1} (λi I − P) ) qk    (4.4.6)

vk = ṽk / ‖ṽk‖    (4.4.7)

Since the product in equation 4.4.6 includes only the first k−1 terms, the component
of qk along vk remains, and the resulting vector ṽk only has to be normalized to obtain
the k-th eigenvector.
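Equations 4.4.6 and 4.4.7 translate directly into code. The 2 × 2 nonsymmetric example below is hypothetical, constructed so that its eigenvalues (1 and 0.5) are known.

```python
import numpy as np

def extract_eigenvector(P, known_lams, qk):
    """Apply the product (lam_1 I - P)...(lam_{k-1} I - P) to q_k to strip
    the components along the first k-1 eigenvectors (equation 4.4.6),
    then normalize (equation 4.4.7)."""
    v = qk.astype(float).copy()
    for lam in known_lams:
        v = lam * v - P @ v
    return v / np.linalg.norm(v)

P = np.array([[1.0, 0.3],
              [0.0, 0.5]])     # nonsymmetric, eigenvalues 1 and 0.5
q2 = np.array([1.6, 1.0])      # mixes both eigenvectors
q2 = q2 / np.linalg.norm(q2)
v2 = extract_eigenvector(P, [1.0], q2)
```

Applying the single factor (λ1 I − P) removes the component along the first eigenvector, so normalizing the result yields the second eigenvector even though P is not symmetric.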
4.5 Distributed concurrent computation of eigenvalues
The shifted power method described above finds the eigenvalues of a matrix succes-
sively, so that to find the j-th eigenvalue, eigenvalue j−1 needs to be known already,
and the orthonormal basis for all eigenvectors has to be determined before finding
any of the eigenvalues. Since both finding the orthonormal basis and finding individ-
ual eigenvalues are iterative processes, we could use estimates of some of the values
required for the next step in the process, instead of waiting for convergence of one
computational step before starting the next step. For example, instead of waiting
for the computation of λj−1 to converge before starting to compute λj, we use the
current estimate of λj−1 at each time step to find an approximation for λj. This is
particularly useful for distributed networks, where detecting convergence and starting
a computational process in all nodes in a synchronized way are much more difficult
than in a centralized system. This method is also useful if the state-transition matrix
changes slowly over time, so the process does not have to be reinitiated from the
beginning when P changes.
Theorem 4.5.1. Let Q(t) ∈ R^{N×n} be a matrix with columns qi(t), so that for any
k, the first k columns of Q(t) are an estimate for a set of vectors that form an
orthonormal basis for the space spanned by the first k eigenvectors of a matrix P ∈
R^{N×N} with real, well separated eigenvalues, with the eigenvectors sorted in order of
descending modulus. We assume that the estimate Q(t) is updated at each time step,
and Q(t) → Q as t → ∞, where the columns of Q are orthonormal and exactly span
the same spaces as the eigenvectors. For each i ≤ n we define a matrix Fi ∈ R^{N×n},
so that at each time step t,

F1(t) = Q(t)    (4.5.1)

Fi(t) = (L_{i−1}(t−1) − P) F_{i−1}(t−1) for all i > 1    (4.5.2)

For each i = 1, . . . , n, let ṽi be equal to the i-th column of Fi. If ei ∈ R^n has a 1 in
the i-th entry, with all other entries being 0, then

ṽi(t) = Fi(t) ei    (4.5.3)

We also define a set of n matrices Li, where

Li(t) = (diag(P ṽi)) (diag(ṽi))^{−1}    (4.5.4)

Then as t → ∞, we have that ṽi → ai vi, where ai is a scalar value, vi is the i-th
eigenvector of P, and the diagonal entries of Li all converge to λi.

Proof. First, we show that ṽi → ai vi and Li → λi I for i = 1:

F1(∞) = Q(∞) = Q    (4.5.5)

ṽ1(∞) = F1(∞) e1 = q1 = v1    (4.5.6)

L1(∞) = (diag(P v1)) (diag(v1))^{−1} = λ1 I    (4.5.7)

We prove convergence of additional eigenvectors and eigenvalues by induction. Assume
that for some k, Li → λi I for all i < k. Then,

Fk(∞) = (λ_{k−1} I − P) F_{k−1}(∞) = ( ∏_{i=1}^{k−1} (λi I − P) ) Q    (4.5.8)

Next, we use the fact that multiplying a matrix with ek is equivalent to extracting
the k-th column, and we apply the result from equation 4.4.6 to get

ṽk(∞) = Fk(∞) ek = ( ∏_{i=1}^{k−1} (λi I − P) ) Q ek = ( ∏_{i=1}^{k−1} (λi I − P) ) qk = ak vk    (4.5.9)

The eigenvector vk and the scalar ak can be found by normalizing ṽk. Given either
ṽk or vk, we can now show that equation 4.5.4 yields the eigenvalue λk by plugging
in ak vk for ṽk:

Lk(∞) = (diag(ak P vk)) (diag(ak vk))^{−1} = (diag(P vk)) (diag(vk))^{−1} = λk I    (4.5.10)

This shows that λk can be found using the equations above if λi is known for all i < k.
Since we also showed that the equations above yield λ1, by induction and assuming
that Q(t) → Q, we have ṽk → ak vk and Lk → λk I for all k ≤ n. □
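A centralized sketch of the concurrent scheme is given below. One Cholesky-QR refinement step per iteration stands in for the consensus-based update of Q, and the eigenvalue estimates are read off as element-wise ratios of P ṽ_i to ṽ_i, emulating the diagonal of L_i; the test matrix and its spectrum are hypothetical.

```python
import numpy as np

def concurrent_eigenvalues(P, n, steps=400, seed=5):
    """Refine the orthonormal basis Q and the eigenvalue estimates at the
    same time, using the current estimate of lambda_{i-1} when deflating
    for lambda_i instead of waiting for convergence (theorem 4.5.1)."""
    N = P.shape[0]
    rng = np.random.default_rng(seed)
    Q, _ = np.linalg.qr(rng.standard_normal((N, n)))
    F = [Q] * n
    lam = np.ones(n)                    # current eigenvalue estimates
    for _ in range(steps):
        # One Cholesky-QR refinement step of the basis estimate Q(t).
        W = P @ Q
        R = np.linalg.cholesky(W.T @ W).T
        Q = W @ np.linalg.inv(R)
        # Concurrent deflation cascade: F_i = (L_{i-1} - P) F_{i-1}.
        F[0] = Q
        for i in range(1, n):
            F[i] = lam[i - 1] * F[i - 1] - P @ F[i - 1]
        # Per-entry ratios emulate L_i = diag(P v_i) diag(v_i)^{-1}.
        for i in range(n):
            v = F[i][:, i]
            ok = np.abs(v) > 1e-9       # guard against tiny entries
            lam[i] = np.median((P @ v)[ok] / v[ok])
    return lam

rng = np.random.default_rng(6)
U, _ = np.linalg.qr(rng.standard_normal((4, 4)))
P = U @ np.diag([1.0, 0.8, 0.5, 0.2]) @ U.T   # known symmetric spectrum
lam = concurrent_eigenvalues(P, 3)
```

Early in the run the estimates are rough, but because each cascade is recomputed from the current Q at every step, the eigenvalue estimates track the basis and converge together, as the theorem describes.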
The algorithm described in theorem 4.5.1 can easily be performed by a distributed
network. We assume that the process that is used for finding Q stores each row of
Q(t) in the node corresponding to the row number. Similarly, we also store each row
of all Fi in the corresponding node. Each node therefore needs space for n2 values
to store all Fi. To reduce computational time and storage space, it is also worth
noting that for each Fi, only the first i columns are used for finding the eigenvectors
and eigenvalues. The other columns are not needed, and converge to zero after Q
converges.
Since P is the state transition matrix of the network itself, multiplying a matrix
by P is straightforward, as long as each node has access to the row of the matrix that
corresponds to its node number in the network. We can therefore easily find PFi for
all i. Multiplying a matrix with a diagonal matrix is also simple, as long as each node
knows the corresponding entry of the diagonal matrix. Equation 4.5.4 can therefore
be computed in a distributed fashion, as long as each (j, j) entry of Li is stored in
node j for all i.
The complete sets of equations for both the power method and QR-factorization
algorithms are given in appendix A.
4.6 Numerical Example
In this section, we present results from the implementation of the spectral algorithms
presented in this chapter. The network used for our example is the three-level network
described in section 2.7 and shown in figure 2.9, which consists of 364 nodes arranged
in three levels. We ran both the power method and QR-factorization algorithms as
outlined in appendix A on this network, using M = 20 steps for each consensus process
at each iteration. The convergence of the estimates of the first four eigenvectors and
eigenvalues, as well as the estimates of the vectors q forming the orthogonal basis are
shown in figures 4.1 through 4.6.
The figures show that convergence is generally somewhat smoother for the QR-
factorization method, compared to the power method. For both methods, accuracy of
the final converged value decreases for each successively smaller eigenvalue, as could
Figure 4.1: Convergence of the orthogonal vector basis for the distributed QR method.
Figure 4.2: Convergence of the orthogonal vector basis for the distributed power method.
Figure 4.3: Convergence of the eigenvectors for the distributed QR method.
Figure 4.4: Convergence of the eigenvectors for the distributed power method.
Figure 4.5: Convergence of the eigenvalues for the distributed QR method.
Figure 4.6: Convergence of the eigenvalues for the distributed power method.
be expected when considering how larger eigenvalues are used directly or indirectly
in the orthonormalization process. As a result, these methods would not be useful in
practice for finding most or all of the eigenvalues and eigenvectors of a matrix.
4.7 Using spectral information for supernode placement
After determining some of the spectral properties of a distributed network, we can use
this information to monitor the health of the distributed network, since the largest
eigenvalues are an indicator for the worst-case convergence rates of the network.
The spectral information can also be used to adjust the structure of the network
in an attempt to improve convergence rates. The structure of the network could
be adjusted by moving nodes or adding edges to the network. If we assume, as we
did for some of the examples in this thesis, that we start with a given base-level
network, and add a number of supernodes at select locations or regular nodes, then
it would be particularly useful to use spectral methods for determining where in the
network supernodes should be added. In theory, we could test all possible supernode
locations and select the ones that result in the highest spectral gap, but given that the
number of possible configurations grows exponentially with network size, this is likely
not practical. Instead, this section describes a heuristic that, while not necessarily
determining the globally optimal supernode locations, can at least be used to
select locations that result in reasonably good convergence rates.
We showed in section 2.10 that when each node is associated with its corresponding
entry of the eigenvectors, the largest eigenvectors of a network tend to vary
slowly across the network, so that adjacent nodes have similar entries of these vectors.
If we want to find several locations for supernode placements, and we want to place
supernodes in various areas of the network and avoid placing supernodes too close to
one another, we can use the eigenvectors as a guideline for the placements. It would
also be possible to use the physical location of the base level nodes as a criterion, but
in some networks, the physical location of a node might not be known to the node. In
Figure 4.7: Network from figure 2.9 in v2-v3 space.
addition, two nodes that are within close geometric proximity of one another might
not be close to one another in the network if there is not an edge connecting the
two nodes. The previous sections showed that it is possible to find the eigenvectors
of the network even if the physical locations of the nodes and the network topology
are unknown. The method described in this section places supernodes so that the
distance between supernodes is large in a coordinate system that uses the entries of
the second and third eigenvector as coordinates. A plot of the base-level nodes of
our two-dimensional example from section 2.7 is shown in figure 4.7.
One method for finding supernode placements is to start with a set of NS
randomly selected nodes and place supernodes there. Then, in an iterative process,
the supernodes hop from their current placement to an adjacent node in the network,
Figure 4.8: Final supernode placements in v2-v3 space.
at each time step selecting the neighboring node that maximizes the sum of squares
of its distances to the other supernodes in v2-v3 space. After several iterations, the
supernodes are located far from each other in the network, and connections between
supernodes help to connect distant parts of the network.
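The hopping heuristic can be sketched as follows. Everything here is hypothetical illustration: a small path graph stands in for the base level, a single coordinate column stands in for the v2-v3 entries, and `place_supernodes` with its parameters is an invented name.

```python
import numpy as np

def place_supernodes(neighbors, coords, n_super, iters=50, seed=7):
    """Greedy hopping: each supernode repeatedly moves to the adjacent
    node (or stays put) that maximizes the sum of squared distances to
    the other supernodes in eigenvector (v2-v3) coordinates."""
    rng = np.random.default_rng(seed)
    placed = [int(i) for i in rng.choice(len(neighbors), n_super,
                                         replace=False)]
    for _ in range(iters):
        for s in range(n_super):
            others = [placed[t] for t in range(n_super) if t != s]
            def spread(node):
                return sum(float(np.sum((coords[node] - coords[o]) ** 2))
                           for o in others)
            candidates = [placed[s]] + list(neighbors[placed[s]])
            placed[s] = max(candidates, key=spread)
    return placed

# Six-node path graph; one coordinate plays the role of v2.
neighbors = [[1], [0, 2], [1, 3], [2, 4], [3, 5], [4]]
coords = np.linspace(0.0, 1.0, 6).reshape(-1, 1)
placed = place_supernodes(neighbors, coords, 2)  # drifts to the path ends
```

On this toy path, two supernodes hop apart until they occupy the two endpoints, which is the behavior the heuristic aims for: supernodes far from each other in eigenvector space, bridging distant parts of the network.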
The eigenvector values used to find the node locations in v2-v3 space can either be
the eigenvectors of the base level, or the eigenvectors of the entire network with
supernodes added. If the eigenvectors of the entire network are used, it can be
difficult to achieve convergence to a final set of supernode placements, since the node
locations in v2-v3 space change whenever a supernode moves. In addition, having to
recompute the eigenvectors at each time step increases the computational complexity.
The examples below therefore use the eigenvectors of the base level.
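The hopping procedure described above can be sketched as follows. In this sketch the v2-v3 coordinates are computed centrally with numpy purely for illustration; in the setting of this chapter they would come from the distributed eigenvector algorithms. The function names and the use of a random-walk transition matrix as a stand-in for the network's state transition matrix are assumptions of the sketch, not part of the dissertation's implementation.

```python
import numpy as np

def spectral_coords(A):
    """v2-v3 coordinates of the base-level nodes.

    Computed centrally here for illustration; a random-walk transition
    matrix P = D^{-1} A stands in for the state transition matrix."""
    P = A / A.sum(axis=1, keepdims=True)
    w, V = np.linalg.eig(P)
    order = np.argsort(-w.real)          # eigenvalues, largest first
    return V[:, order[1:3]].real         # columns: v2 and v3

def place_supernodes(A, n_super, n_iters=50, seed=0):
    """Greedy hopping: each supernode moves to the neighboring node
    that maximizes the sum of squared v2-v3 distances to the others."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    coords = spectral_coords(A)
    nodes = list(rng.choice(n, size=n_super, replace=False))
    for _ in range(n_iters):
        moved = False
        for s in range(n_super):
            others = nodes[:s] + nodes[s + 1:]
            # candidate positions: stay put, or hop to any neighbor
            candidates = [nodes[s]] + list(np.flatnonzero(A[nodes[s]]))
            scores = [np.sum((coords[others] - coords[c]) ** 2)
                      for c in candidates]
            best = candidates[int(np.argmax(scores))]
            if best != nodes[s]:
                nodes[s], moved = best, True
        if not moved:                    # no supernode hopped: converged
            break
    return nodes

# Example: three supernodes on a 20-node path graph.
A = np.zeros((20, 20))
for i in range(19):
    A[i, i + 1] = A[i + 1, i] = 1
print(place_supernodes(A, 3))
```

On the path graph the placements drift toward well-separated positions, since the second eigenvector orders the nodes roughly along the path.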
Figure 4.9: Final supernode placements in x-y space.
Figure 4.8 shows the final position in v2-v3 space of six supernodes that were added
to the base level network in figure 4.7. Figure 4.9 shows these supernode locations
in the regular x-y space. The figures show that the supernodes are located far from
one another both in v2-v3 space and in the network. The two supernodes at the bottom
of the plot in x-y space are not very far from one another geometrically, but since
the underlying base level nodes are not linked, the distance between them across the
network is considerable.
4.8 Conclusion
Two algorithms for finding eigenvalues and eigenvectors of the state transition
matrices of distributed networks were presented in this chapter. The algorithms can
be run on the distributed network itself, even if the state transition matrices are
nonsymmetric. They perform well for finding the eigenvectors corresponding to the
largest eigenvalues of a matrix, but accuracy decreases and time to convergence
increases with each additional eigenvector to be found. Since the largest eigenvalues
and corresponding eigenvectors are of particular importance in determining the
worst-case convergence rates and corresponding noise types for a distributed network,
these methods can be used to monitor the system health of such a network.
We also showed how the spectral properties of a distributed network can be used
to guide the placement of supernodes in a multilevel network.
Appendix A
Distributed spectral algorithms for
nonsymmetric matrices
The following equations can be used for finding the first n eigenvalues and eigenvectors
of a nonsymmetric matrix P in a distributed network. The steps of finding the
orthonormal basis for the eigenvectors Q and finding the actual eigenvectors
are interleaved, so that a rough estimate for the eigenvalues is available while Q
is being computed. To make it clear how this can be implemented on a distributed
network, the equations are written in a form that emphasizes which vector and matrix
components are available to individual nodes. The superscript (i) indicates the i-th
entry of a vector or i-th row of a matrix, and the superscript (i,j) denotes the (i, j)
entry of a matrix, both of which are stored in node i. As a result, there are two types
of operations: some equations presented below (such as equation A.0.1) depend only
on variables that are all stored within the same node, while other equations (such as
A.0.2) represent a consensus process, where a vector or matrix is multiplied by the
state transition matrix P.
As an initial step before running either one of the algorithms, the offset factors κ
should be determined by running the following process to convergence at each node:
\kappa^{(i)}(0) = 1    (A.0.1)

\kappa^{(i)}(t+1) = P^{(i)} \kappa(t)    (A.0.2)
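To make the role of the offset factors concrete, the following sketch simulates the process centrally. The choice of a random primitive, column-stochastic matrix as a stand-in for the nonsymmetric state transition matrix P is an assumption of the sketch; with it, after the κ iteration converges, each node can recover a network-wide sum from its own entries of a consensus vector.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 8
# Assumed stand-in for a nonsymmetric state transition matrix P:
# a random primitive, column-stochastic matrix.
P = rng.random((n, n))
P /= P.sum(axis=0, keepdims=True)

# Offset factors: run kappa(t+1) = P kappa(t) from kappa(0) = 1
# (equations A.0.1-A.0.2) until convergence.
kappa = np.ones(n)
x = rng.random(n)          # local values whose network-wide sum is wanted
xhat = x.copy()
for _ in range(200):
    kappa = P @ kappa      # offset-factor iteration
    xhat = P @ xhat        # consensus iteration on the local values

# With the kappa correction, every node recovers the global sum of x
# from its own entries alone.
recovered = xhat * n / kappa
print(np.allclose(recovered, x.sum()))
```

This is why factors of the form N/κ^{(i)} appear throughout the equations below: dividing a node's consensus value by its offset factor undoes the bias introduced by a nonsymmetric P.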
A.1 Power method
The equations for using the power method are as follows. To initialize, set the vectors
qj to a set of linearly independent unit vectors, and let
q_j(0) = v_j(0) = \hat{v}_j(0)    (A.1.1)

All other variables are initially set to zero.

G_k^{(i,j)}(t) = v_k^{(i)}(t-1) \, q_j^{(i)}(t-1)    (A.1.2)

\hat{G}_k(t) = P_M \left( \hat{G}_k(t-1) + G_k(t) - G_k(t-1) \right)    (A.1.3)

q_j^{(i)}(t) = P^{(i)} q_j(t-1) - \sum_{k=1}^{j-1} \ell_k^{(i)}(t) \, v_k^{(i)}(t) \, \frac{\hat{G}_k^{(i,j)}(t) \, N}{\kappa^{(i)}}    (A.1.4)

h_j^{(i)}(t) = \left( q_j^{(i)}(t) \right)^2    (A.1.5)

\hat{h}_j(t) = P_M \left( \hat{h}_j(t-1) + h_j(t) - h_j(t-1) \right)    (A.1.6)

q_j^{(i)}(t) = q_j^{(i)}(t) \sqrt{ \left| \frac{\kappa^{(i)}}{\hat{h}_j^{(i)}(t) \, N} \right| }    (A.1.7)

v_j(t) = q_j(t) - \left( \prod_{k=1}^{j-1} \left( \mathrm{diag}(\ell_k(t-1)) - P \right) \right) v_j(t-1)    (A.1.8)

m_j^{(i)}(t) = \left( v_j^{(i)}(t) \right)^2    (A.1.9)

\hat{m}_j(t) = P_M \left( \hat{m}_j(t-1) + m_j(t) - m_j(t-1) \right)    (A.1.10)

v_j^{(i)}(t) = v_j^{(i)}(t) \sqrt{ \left| \frac{\kappa^{(i)}}{\hat{m}_j^{(i)}(t) \, N} \right| }    (A.1.11)

\ell_j^{(i)}(t) = \frac{P^{(i)} v_j(t)}{v_j^{(i)}(t)}    (A.1.12)
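As a sanity check on the structure of these equations, the following sketch simulates the j = 1 case (no deflation terms) centrally. Exact consensus is approximated by many multiplications with P, and the column-stochastic P and the iteration counts are assumptions of the sketch rather than part of the dissertation's algorithm.

```python
import numpy as np

def consensus_sum(x, P, kappa, iters=300):
    """Each node's estimate of the network-wide sum of x, obtained by
    repeated multiplication with P followed by the kappa correction
    (exact consensus is approximated by many iterations here)."""
    y = x.copy()
    for _ in range(iters):
        y = P @ y
    return y * len(x) / kappa

rng = np.random.default_rng(2)
n = 10
P = rng.random((n, n))
P /= P.sum(axis=0, keepdims=True)   # assumed column-stochastic stand-in

kappa = np.ones(n)                  # offset factors, cf. A.0.1-A.0.2
for _ in range(300):
    kappa = P @ kappa

q = rng.random(n)                   # initial vector
for _ in range(100):
    q = P @ q                                # power step, cf. A.1.4
    h = consensus_sum(q ** 2, P, kappa)      # cf. A.1.5-A.1.6
    q = q / np.sqrt(np.abs(h))               # normalization, cf. A.1.7

lam = (P @ q) / q                   # entrywise eigenvalue estimate, cf. A.1.12
# Every entry of lam approximates the dominant eigenvalue of P (1 here,
# since P is column-stochastic), and q approximates the unit eigenvector.
```

Note that each node only ever uses its own entries of q, h, and κ, which is what makes the scheme distributable.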
A.2 QR-factorization
The equations for the method using the QR-factorization are:
W^{(i,j)}(t) = P^{(i)} u_j(t-1)    (A.2.1)

K_{j,k}^{(i)}(t) = W^{(i,j)}(t) \, W^{(i,k)}(t)    (A.2.2)

\hat{K}_{j,k}(t) = P_M \left( \hat{K}_{j,k}(t-1) + K_{j,k}(t) - K_{j,k}(t-1) \right)    (A.2.3)

Each node then individually finds a matrix K_i \in \mathbb{R}^{n \times n} with entries

K_i^{(j,k)}(t) = \frac{\hat{K}_{j,k}^{(i)}(t) \, N}{\kappa^{(i)}}    (A.2.4)

Each node then finds a matrix R_i \in \mathbb{R}^{n \times n}, which is the Cholesky factor of K_i.

\ell_j^{(i)} = R_i^{(j,j)}    (A.2.5)

S_i = R_i^{-1}    (A.2.6)

U^{(i)} = W^{(i)} S_i    (A.2.7)

u_j^{(i)} = U^{(i,j)}    (A.2.8)

The second half of the algorithm is the same as for the power method:

v_j(t) = u_j(t) - \left( \prod_{k=1}^{j-1} \left( \mathrm{diag}(\ell_k(t-1)) - P \right) \right) v_j(t-1)    (A.2.9)

m_j^{(i)}(t) = \left( v_j^{(i)}(t) \right)^2    (A.2.10)

\hat{m}_j(t) = P_M \left( \hat{m}_j(t-1) + m_j(t) - m_j(t-1) \right)    (A.2.11)

v_j^{(i)}(t) = v_j^{(i)}(t) \sqrt{ \left| \frac{\kappa^{(i)}}{\hat{m}_j^{(i)}(t) \, N} \right| }    (A.2.12)

\ell_j^{(i)}(t) = \frac{P^{(i)} v_j(t)}{v_j^{(i)}(t)}    (A.2.13)
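The role of the Cholesky factor here is to orthonormalize the columns of W: since K = WᵀW = RᵀR, the matrix U = WR⁻¹ has orthonormal columns. A minimal centralized check of this identity (the matrix sizes and random data are arbitrary assumptions of the sketch):

```python
import numpy as np

rng = np.random.default_rng(3)
W = rng.random((12, 3))        # stacked rows W^{(i)} across 12 nodes

K = W.T @ W                    # the Gram matrix that A.2.2-A.2.4 estimate
R = np.linalg.cholesky(K).T    # upper-triangular factor, K = R^T R
U = W @ np.linalg.inv(R)       # A.2.6-A.2.7: S = R^{-1}, U = W S

print(np.allclose(U.T @ U, np.eye(3)))   # columns of U are orthonormal
```

In the distributed algorithm each node performs this small n × n factorization locally on its own consensus estimate K_i, so only the Gram matrix, not W itself, has to be agreed upon across the network.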