
Structure in total least squares parameter estimation for electrical networks

Adair Lang ∗ Iman Shames ∗ Michael Cantoni ∗

∗ Department of Electrical and Electronic Engineering, University of Melbourne, Parkville, Victoria 3010, Australia.

Emails: [email protected], {ishames,cantoni}@unimelb.edu.au

Abstract: A total least squares approach to estimating the line admittances of an electrical network for power distribution is explored. It is observed that the Newton iterates for solving the corresponding non-linear Karush-Kuhn-Tucker (KKT) optimality conditions are structured for path and tree networks. In particular, it is revealed that a condensed form of the linear system of equations to be solved at each Newton step has a sparse block arrow-head structure, which can be exploited to improve the scalability of the approach. A numerical example for a path network is presented.

Keywords: Parameter estimation, error in variables, electrical networks, large-scale systems

1. INTRODUCTION

Parameter estimation problems for error-in-variables models, where both the input and output data are noisy, often arise in systems engineering (Söderström, 2007). One approach involves the construction of parameter estimates from the data by solving a deterministic optimization problem. In many cases, this optimization problem can be encoded in the so-called total least squares form (Huffel and Lemmerling, 2002). An example of error-in-variables modeling arises in the study of electrical networks for power distribution (Yu et al., 2017). The modelling problem of interest is to determine the parameters of the flow-drop relationships (i.e., admittances) from node (i.e., bus) measurements, in a way that adheres to Kirchhoff's laws, which reflect the conservation of physical quantities according to the particular network interconnection structure. A related challenge is to exploit the corresponding structure of the parameter estimation problem in order to facilitate the treatment of large-scale networks. First steps in this direction are explored below. It is envisaged that the main ideas extend to networks that are consistent with Kirchhoff conservation laws more generally (Oster and Desoer, 1971).

In this paper, the problem of estimating the line admittance parameters of an electrical network with given topology is formulated as a deterministic optimization problem. The problem data comprises error-prone measurements of the voltage (potential difference to a datum) and injected current (reflecting the corresponding power supply / load) phasors at each node. In Yu et al. (2017), voltage magnitude and power measurements are used for each node, which leads to a non-linear dependence of the model on the parameter. By contrast, the relationship is linear for the choice of measurements considered below, allowing the problem to be formed as a total least squares problem. This structured optimization problem would also arise for a maximum likelihood formulation, in the case

where measurement noises have Gaussian distributions. The structure arising in Newton iterations for solving the corresponding Karush-Kuhn-Tucker (KKT) optimality conditions is explored. In particular, for path and tree networks it is found that these iterations can be expressed in terms of the solution of a condensed system of linear equations that has a block arrow-head structure. Numerical results suggest that working with this condensed form is computationally advantageous.

In Ardakanian et al. (2017), the parameter estimation problem is combined with topology estimation in electrical networks. However, a traditional least squares approach is used, which does not explicitly account for errors that could arise in measurements of the input/dependent variables, and which can therefore give rise to biased estimates. In Marelli et al. (2015), a distributed algorithm is proposed for solving a weighted least squares estimation problem for power networks. This algorithm arises from a linearization originally considered in Schweppe and Rom (1970). By contrast, the developments below relate directly to the non-convex total least squares problem and the structure of the resulting KKT conditions. Total least squares problems with structure are studied in Huffel and Lemmerling (2002), especially the Toeplitz-type matrix structure associated with dynamical systems. Here, the structure in the problem stems from the Kirchhoff laws underlying the system model. In Beck and Eldar (2010), a regularization is applied to the structured total least squares (STLS) problem to arrive at a new problem referred to as structured total maximum likelihood (STML). The STML problem is shown to have desirable properties, such as guaranteed existence of a solution, which do not necessarily hold for the standard STLS problem. Given that the STML problem inherits the same structure as the original problem, it is expected that the techniques exploited here can be extended to this problem. In Cavraro et al. (2015), the related problem of network topology reconstruction is studied in terms of an algorithm

that involves a library of possible topologies and using data to determine the closest match. Efficiency in solving parameter estimation problems for any given topology is required in such an approach, which motivates the issues considered below.

The paper is structured as follows. Section 2 includes the formulation of the total least squares estimation problem and the corresponding KKT conditions. Newton iterations for solving the KKT conditions are investigated in Section 3, where the structure of a condensed system of linear equations to solve at each step is revealed, for path and tree networks in particular. Numerical results are presented in Section 4. Conclusions and future work are given in Section 5.

2. PROBLEM FORMULATION

Consider an electrical network defined by an undirected graph G(V, E), where V = {1, ..., n} denotes the set of nodes in the network, and E denotes the set of edges, comprising unordered two-element subsets of V. Define the set of neighbours for each node l ∈ V as N_l := {m : {l, m} ∈ E}. Associated with each node l is the voltage v_l (relative to a datum), and externally injected current i_l, taken to be positive into the node, to represent power supplied or consumed. In sinusoidal steady-state, these voltages and currents can be represented as complex numbers that satisfy Kirchhoff's current and voltage laws. The admittance associated with the edge {l, m} ∈ E is denoted by the complex number y_lm = y_ml.

Suppose that a collection of M (a positive integer) measurements of the node voltage and injected current,

v_l^{(k)} = v_l + η_l^{(k)},   (1a)
i_l^{(k)} = i_l + γ_l^{(k)},   (1b)

for k ∈ {1, ..., M} and l ∈ V is given, where η_l^{(k)} and γ_l^{(k)} denote the measurement errors associated with the k-th data point and node l.

Problem 1. Given the electrical network with undirected graph G(V, E), and erroneous measurements of voltage and current at each node (1), estimate the admittance parameter y_lm for each edge {l, m} ∈ E.

Combining Ohm's law with Kirchhoff's current law at each node yields

i_l = ∑_{m ∈ N_l} (v_l − v_m) y_lm,   ∀l ∈ V.   (2)

Using (1), this can be written in terms of the measured variables and the associated errors as follows:

i_l^{(k)} − γ_l^{(k)} = ∑_{m ∈ N_l} ((v_l^{(k)} − v_m^{(k)}) + (η_m^{(k)} − η_l^{(k)})) y_lm,   ∀k ∈ {1, ..., M}.   (3)

Let y = [..., y_lm, ...]^T, d^{(k)} = [..., (v_l^{(k)} − v_m^{(k)}), ...]^T and β^{(k)} = [..., −(η_l^{(k)} − η_m^{(k)}), ...]^T be vectors in C^{e×1} with entries ordered according to any chosen ordering of the edges {l, m} ∈ E, where e denotes the cardinality of E.

Furthermore, let i^{(k)} = [i_1^{(k)}, i_2^{(k)}, ..., i_n^{(k)}]^T and γ^{(k)} = [γ_1^{(k)}, γ_2^{(k)}, ..., γ_n^{(k)}]^T be vectors in C^{n×1}. It follows from (3) that

i^{(k)} − γ^{(k)} = H (diag(d^{(k)}) + diag(β^{(k)})) y,   (4)

where H is an n × e matrix of entries from the set {−1, 0, 1}, in accordance with the ordering of the edges chosen above.
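As a concrete illustration of (4) in the noise-free case (β^{(k)} = 0), the following numpy sketch builds an incidence matrix for a hypothetical 3-node path under an assumed edge orientation, and checks that the resulting injected currents obey Kirchhoff's current law. All numerical values are made up for illustration.

```python
import numpy as np

def incidence_matrix(n, edges):
    """Hypothetical helper: n x e matrix H with entries in {-1, 0, 1};
    column j carries +1 at the first node of edge j and -1 at the second,
    fixing one admissible edge orientation."""
    H = np.zeros((n, len(edges)))
    for j, (l, m) in enumerate(edges):
        H[l, j], H[m, j] = 1.0, -1.0
    return H

# 3-node path with edges {0,1} and {1,2}; assumed complex voltages/admittances
edges = [(0, 1), (1, 2)]
H = incidence_matrix(3, edges)
v = np.array([1.0 + 0.10j, 0.8 + 0.05j, 0.5 - 0.02j])  # node voltages
y = np.array([2.0 - 0.50j, 1.5 - 0.30j])               # edge admittances

d = H.T @ v                    # voltage drops (v_l - v_m) per edge
i = H @ (np.diag(d) @ y)       # equation (4) with beta^(k) = 0

# Injections sum to zero (Kirchhoff's current law over the whole network)
assert abs(i.sum()) < 1e-12
assert np.isclose(i[0], (v[0] - v[1]) * y[0])
```

Each row of H picks out the edges incident to a node, so H diag(d) y reproduces the nodal balance (2) edge by edge.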

Define B^{(k)} = diag(β^{(k)}) and D^{(k)} = diag(d^{(k)}). By equating the real and imaginary terms, (4) can be written as an equation involving only real numbers as follows:

[i_R^{(k)} − γ_R^{(k)}; i_I^{(k)} − γ_I^{(k)}] = [V_1, −V_2; V_2, V_1] [y_R; y_I],   ∀k ∈ {1, ..., M},   (5)

where V_1 = H(D_R^{(k)} + B_R^{(k)}) and V_2 = H(D_I^{(k)} + B_I^{(k)}). Here the subscripts R and I denote the real and imaginary parts of the corresponding complex number, respectively. Moreover, let a vector s^{(k)} ∈ R^{2(n+e)} and a matrix A^{(k)} ∈ R^{2n×2e} be defined as

s^{(k)} = [β_R^{(k)}; β_I^{(k)}; γ_R^{(k)}; γ_I^{(k)}],   A^{(k)} = [H D_R^{(k)}, −H D_I^{(k)}; H D_I^{(k)}, H D_R^{(k)}].   (6)

Define the following selector matrices S_1, S_2 ∈ R^{e×(2n+2e)} and S_3, S_4 ∈ R^{n×(2n+2e)}:

S_1 = [I_e, 0_{e×(2n+e)}],   S_2 = [0_e, I_e, 0_{e×2n}],   (7)
S_3 = [0_{n×2e}, I_n, 0_n],   S_4 = [0_{n×(2e+n)}, I_n],   (8)

where I_a is the a × a identity matrix, 0_a is a matrix of zeros in R^{a×a} and 0_{a×b} a matrix of zeros in R^{a×b}. Using these,

the matrices B_R^{(k)} and B_I^{(k)} can be written explicitly in terms of s^{(k)} as follows:

B_R^{(k)} = (1_e (S_1 s^{(k)})^T) ∘ I_e,   (9a)
B_I^{(k)} = (1_e (S_2 s^{(k)})^T) ∘ I_e,   (9b)

where 1_e is a vector of all ones in R^{e×1} and A ∘ B denotes the Hadamard product of two matrices A, B ∈ R^{n×r}, with the result a matrix of the same dimension with elements given by (A ∘ B)_{lm} = (A)_{l,m} (B)_{l,m}. For simpler notation, define
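Definitions (7) and (9) amount to lifting the relevant entries of s^{(k)} onto a diagonal. A small numpy check, with assumed dimensions n = 3, e = 2 and a made-up s^{(k)}:

```python
import numpy as np

n, e = 3, 2
s = np.arange(1.0, 2 * (n + e) + 1)      # assumed stacked error vector s^(k)

# Selector matrices picking out beta_R and beta_I, cf. (7)
S1 = np.hstack([np.eye(e), np.zeros((e, 2 * n + e))])
S2 = np.hstack([np.zeros((e, e)), np.eye(e), np.zeros((e, 2 * n))])

ones_e = np.ones((e, 1))
# (9a)-(9b): outer product with the all-ones vector, masked onto the diagonal
B_R = (ones_e @ (S1 @ s).reshape(1, -1)) * np.eye(e)
B_I = (ones_e @ (S2 @ s).reshape(1, -1)) * np.eye(e)

# The Hadamard mask simply places S1 s (resp. S2 s) on the diagonal
assert np.allclose(B_R, np.diag(S1 @ s))
assert np.allclose(B_I, np.diag(S2 @ s))
```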

a mapping ∆ : R^{2(e+n)} → R^{2n×2e} as

∆(s^{(k)}) = [H B_R^{(k)}, −H B_I^{(k)}; H B_I^{(k)}, H B_R^{(k)}],   (10)

where the dependence on s^{(k)} arises from the dependence of the error matrices defined in (9) on s^{(k)}.

There are potentially many possible choices of y_R, y_I and s^{(k)} that satisfy (5), especially in the case where the number of measurements, M, is greater than one. Even if there is only one measurement, i.e., M = 1, the system described by (5) is overdetermined when n > e, i.e., there are more nodes in the network than edges, which is the case for trees. If the noise vectors s^{(k)} are treated as deterministic unknowns, the problem can be formulated as an equality constrained optimization problem. The penalty considered here is a weighted sum of squares of the error terms s^{(k)}. This weighting can arise from formulating the log-likelihood function in the case that the noises are all Gaussian random variables; the weighting matrix is then the inverse of the covariance matrix of the jointly Gaussian distributed random variables. This leads to the optimization problem

min over {s^{(k)}}_{k=1}^M, y_R, y_I of   (1/2) ∑_{k=1}^M (s^{(k)})^T W s^{(k)}   (11a)

s.t.   [i_R^{(k)} − S_3 s^{(k)}; i_I^{(k)} − S_4 s^{(k)}] = (A^{(k)} + ∆(s^{(k)})) [y_R; y_I],   ∀k ∈ {1, ..., M},   (11b)

where W is a weighting matrix in R^{2(n+e)×2(n+e)} that suitably penalizes the error terms. The nonlinear equality constraint (11b) makes this problem non-convex, and hence there may be many local minima. This type of problem is often called a structured total least squares problem; see Huffel and Lemmerling (2002). Here, a certain structure of the error matrices ∆(s^{(k)}) is enforced by the definition of ∆(·).

Remark 2.1. If ∆(·) did not enforce the structure and was instead allowed to be a full matrix of unknown entries, then (11) would become a standard total least squares problem, which could be solved using singular value decomposition methods. However, the resulting parameters would not necessarily be consistent with the measurements whilst adhering to the network structure.

Remark 2.2. If the voltages v_l could be measured perfectly, i.e., η_l^{(k)} = 0 for all l ∈ V and all k ∈ {1, ..., M}, then (11) would reduce to the standard weighted least squares problem

min_{y_R, y_I} ∑_{k=1}^M ‖ [i_R^{(k)}; i_I^{(k)}] − A^{(k)} [y_R; y_I] ‖_W^2.   (12)

In this case the problem is convex and hence a global solution can be found.
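For illustration, (12) can then be handed to any least squares routine. A minimal numpy sketch under assumed data: a hypothetical 3-node path, random voltage drops standing in for measurements, and uniform weighting W = I.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 3-node path; D^(k) built from assumed (noise-free) voltage drops
H = np.array([[1.0, 0.0], [-1.0, 1.0], [0.0, -1.0]])   # incidence matrix
n, e = H.shape
y_true = np.array([2.0, 1.5, -0.5, -0.3])              # stacked [y_R; y_I]

M = 20
blocks, rhs = [], []
for _ in range(M):
    dR, dI = rng.normal(size=e), rng.normal(size=e)    # assumed measured drops
    A = np.block([[H @ np.diag(dR), -H @ np.diag(dI)],
                  [H @ np.diag(dI),  H @ np.diag(dR)]])
    blocks.append(A)
    rhs.append(A @ y_true + 0.01 * rng.normal(size=2 * n))  # noisy currents

# With W = I the weighted problem (12) is an ordinary least squares stack;
# a non-identity diagonal W would enter by scaling rows with sqrt of its entries
A_all, b_all = np.vstack(blocks), np.concatenate(rhs)
y_hat, *_ = np.linalg.lstsq(A_all, b_all, rcond=None)

assert np.allclose(y_hat, y_true, atol=0.05)
```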

There are many ways to find local solutions of the non-convex problem (11). Augmented Lagrangian or penalty methods could be used to move the constraint into the objective function and make use of efficient gradient-like methods. However, it may be difficult to guarantee feasibility, and the sparsity and structure in the equality constraints is likely lost by employing these methods. Since the aim here is to maintain and exploit the structure, the KKT conditions are instead solved locally using Newton's method.

3. SOLVING THE KKT CONDITIONS

To find local solutions of (11), consider the KKT conditions associated with the problem, as defined below:

[i_R^{(k)} − S_3 s^{(k)}; i_I^{(k)} − S_4 s^{(k)}] − (A^{(k)} + ∆(s^{(k)})) [y_R; y_I] = 0,   (13a)
W s^{(k)} + X(y_R, y_I)^T µ^{(k)} = 0,   (13b)
(A^{(k)} + ∆(s^{(k)}))^T µ^{(k)} = 0,   (13c)

∀k ∈ {1, ..., M}, where µ^{(k)} ∈ R^{2n×1} is a vector of the Lagrange multipliers corresponding to the k-th equality constraint in (11b) and X(y_R, y_I) ∈ R^{2n×(2n+2e)} is given by

X(y_R, y_I) = [−X_1(y_R), X_1(y_I), −I_n, 0_n; −X_1(y_I), −X_1(y_R), 0_n, −I_n],

where X_1(y) = H(1_e y^T ∘ I_e). The equations in (13)

are non-linear. Consider a Newton method which iteratively solves linearized versions of (13) to step towards the solution. The incremental updates (Newton steps) for s^{(k)}, y_R, y_I, µ^{(k)} are denoted δ_{s^{(k)}}, δ_{y_R}, δ_{y_I}, δ_{µ^{(k)}}, respectively. To formulate the Newton update equations, the gradients of (13b) and (13c) need to be defined. To facilitate this, introduce a matrix G^{(k)} ∈ R^{(2n+2e)×2e} as

G^{(k)} = [G_1, G_2; G_2, −G_1; 0_{2n×e}, 0_{2n×e}],

where G_1 = diag(H^T [I_n, 0_n] µ^{(k)}) and G_2 = diag(H^T [0_n, I_n] µ^{(k)}).

The Newton update is then formed by linearizing (13) around the current iterate (y_R, y_I, s^{(k)}, µ^{(k)}). This leads to the following linear set of equations to be solved for each Newton step:

[W, G^{(k)}, X(y_R, y_I)^T; (G^{(k)})^T, 0_{2e}, (A^{(k)} + ∆(s^{(k)}))^T; X(y_R, y_I), A^{(k)} + ∆(s^{(k)}), 0_{2n}] [δ_{s^{(k)}}; δ_{y_R}; δ_{y_I}; δ_{µ^{(k)}}]
= −[W s^{(k)} + X(y_R, y_I)^T µ^{(k)}; (A^{(k)} + ∆(s^{(k)}))^T µ^{(k)}; [i_R^{(k)} − S_3 s^{(k)}; i_I^{(k)} − S_4 s^{(k)}] − (A^{(k)} + ∆(s^{(k)})) [y_R; y_I]],   (14)

∀k ∈ {1, ..., M}.

Note that although the system above is written as a series of sub-equations, each corresponding to a particular constraint k, there is coupling between the equations, since the same δ_{y_R} and δ_{y_I} appear in each sub-system. Consequently, the subsystems cannot be solved independently. Instead, an augmented system is formed to reflect this coupling and give an explicit system for all k.

Define the following augmented matrices:

W = blkdiag(W, W, ..., W),   X = blkdiag(X, X, ..., X)   (M diagonal blocks each),

A = [A^{(1)} + ∆(s^{(1)}); A^{(2)} + ∆(s^{(2)}); ...; A^{(M)} + ∆(s^{(M)})],   G = [G^{(1)}; G^{(2)}; ...; G^{(M)}],

where W ∈ R^{2M(n+e)×2M(n+e)}, X ∈ R^{2Mn×2M(n+e)}, A ∈ R^{2Mn×2e} and G ∈ R^{2M(n+e)×2e}. Let

Fig. 1. Structure of the non-zero entries in the total Newton update (15), for 10 measurements on a 10-node path graph (left, nz = 4818) and a 28-node tree network (right, nz = 13020).

s = [s^{(1)}; s^{(2)}; ...; s^{(M)}],   δ_s = [δ_{s^{(1)}}; δ_{s^{(2)}}; ...; δ_{s^{(M)}}],   µ = [µ^{(1)}; µ^{(2)}; ...; µ^{(M)}],

δ_µ = [δ_{µ^{(1)}}; δ_{µ^{(2)}}; ...; δ_{µ^{(M)}}],   I = [i_R^{(1)} − S_3 s^{(1)}; i_I^{(1)} − S_4 s^{(1)}; i_R^{(2)} − S_3 s^{(2)}; i_I^{(2)} − S_4 s^{(2)}; ...; i_R^{(M)} − S_3 s^{(M)}; i_I^{(M)} − S_4 s^{(M)}].

Then a combined system is written as

[W, G, X^T; G^T, 0_{2e}, A^T; X, A, 0_{2Mn}] [δ_s; δ_{y_R}; δ_{y_I}; δ_µ] = −[W s + X^T µ; A^T µ; I − A [y_R; y_I]].   (15)

Once (15) is solved, the variables s, y_R, y_I, µ are updated at each iteration:

s ← s + δ_s,   y_R ← y_R + δ_{y_R},   y_I ← y_I + δ_{y_I},   µ ← µ + δ_µ.

The system described in (15) is a (4Mn + (2M + 2)e) × (4Mn + (2M + 2)e) system. However, it is structured (see Fig. 1). The following section outlines a way to condense the system (15), and the structure of this condensed system is discussed.

3.1 Condensing using Schur Complement

Here, the possibility of formulating a smaller structured problem that is equivalent to (15) is explored. To do this, consider the top left block of (15), the weighting matrix W. A Schur complement can be efficiently formed for the system if W is easily invertible. The following assumption is made to ensure this condition.

Assumption 3.1. W is an invertible diagonal matrix.

This assumption holds, for instance, when the measurement errors are mutually independent.

Fig. 2. Structure of the smaller condensed block C given by (16a), for 10 measurements on a 10-node path graph (left, nz = 2852) and a 28-node tree network (right, nz = 7708).

Define C as follows, which can be considered the Schur complement of the top left block of the matrix in (15):

C = [0_{2e}, A^T; A, 0_{2Mn}] − [G^T; X] W^{−1} [G, X^T]   (16a)
  = [−G^T W^{−1} G, A^T − G^T W^{−1} X^T; A − X W^{−1} G, −X W^{−1} X^T].   (16b)

In practice, C would be constructed using the expression in (16a), while (16b) gives an indication of the structure of this Schur block. The term G^T W^{−1} G is a 2e × 2e block with a main diagonal and two minor diagonals. The term X W^{−1} X^T is banded block diagonal. The term A^T − G^T W^{−1} X^T is a 2e × 2Mn matrix made up of two blocks of sub-matrices, each with e rows. Each block consists of e × n blocks concatenated next to each other that have the structure of the incidence matrix H. Hence, C has a block arrow structure where the blocks within are also highly structured; two examples of this structure are shown in Fig. 2.

Remark 3.1. The retention of structure and sparsity in the condensed system C is imperative to obtaining a more efficient, scalable algorithm for solving the estimation problem (11). Fig. 3 shows the advantage of this structure and sparsity compared with random structure or a lack of sparsity. If the condensed system C were a dense or unstructured matrix, then the smaller system would likely become slower to solve than the original system, which can be solved by exploiting its structure and sparsity.

Using C, the system (15) can be rewritten as

C [δ_{y_R}; δ_{y_I}; δ_µ] = [−A^T µ; A [y_R; y_I] − I] − [G^T; X] W^{−1} (−W s − X^T µ),   (17a)

δ_s = W^{−1} (−W s − X^T µ) − W^{−1} [G, X^T] [δ_{y_R}; δ_{y_I}; δ_µ],   (17b)

where (17a) is solved first and the solution used to calculate δ_s as per (17b).
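As a sanity check of the algebra in (16a) and (17), the condensed two-step solve can be compared against a direct solve of a system with the same block layout as (15). The sketch below uses random dense blocks of small assumed dimensions (stand-ins for the structured blocks), so it illustrates the reduction itself rather than the exploitable sparsity.

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed small stand-in dimensions for 2M(n+e), 2e and 2Mn respectively
ns, ny, nm = 12, 4, 6
W = np.diag(rng.uniform(1.0, 2.0, ns))   # diagonal and invertible (Assumption 3.1)
G = rng.normal(size=(ns, ny))            # stand-in for the augmented G
X = rng.normal(size=(nm, ns))            # stand-in for the augmented X
A = rng.normal(size=(nm, ny))            # stand-in for the augmented A

# Full saddle-point system with the block layout of (15)
K = np.block([[W, G, X.T],
              [G.T, np.zeros((ny, ny)), A.T],
              [X, A, np.zeros((nm, nm))]])
r = rng.normal(size=ns + ny + nm)
full = np.linalg.solve(K, r)

# Condensed solve: Schur complement of W as in (16a), then back-substitution
Winv = np.diag(1.0 / np.diag(W))         # trivial inverse for diagonal W
B = np.vstack([G.T, X])                  # the [G^T; X] block
C = np.block([[np.zeros((ny, ny)), A.T],
              [A, np.zeros((nm, nm))]]) - B @ Winv @ B.T
tail = np.linalg.solve(C, r[ns:] - B @ Winv @ r[:ns])    # cf. (17a)
ds = Winv @ (r[:ns] - np.hstack([G, X.T]) @ tail)        # cf. (17b)

assert np.allclose(np.concatenate([ds, tail]), full, atol=1e-8)
```

The agreement confirms that eliminating δ_s via the Schur complement of W leaves a smaller system in (δ_{y_R}, δ_{y_I}, δ_µ) with the same solution as the full Newton step.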

Remark 3.2. Since W is diagonal, the multiplication W^{−1} A equates to dividing each element of row m of A by the m-th diagonal entry of W, for all rows m. For this

Fig. 3. Computational time for solving matrices of the same size, using Matlab's ‘\’, but with different structures and sparsity (actual C; random C; same sparsity with random structure; 80% density with random structure). The structure and sparsity of C results in a much faster solve time.

particular size of system, this takes at most 2M(n + e) × (2e + 2Mn + 1) flops, and in practice it is less due to the known sparsity of the right hand side. The main computational burden in the resulting algorithm lies in forming C and in solving (17a). The former requires 2M(e + n) × (2Mn + 2e)^2 flops if structure is not exploited, and the latter O((2Mn + 2e)^3) flops if structure is not exploited. In practice the actual flop counts for this system will be much lower due to the block arrow structure of C. This is much smaller, especially for large M, than the O(((2M + 2)e + 4Mn)^3) flops required if the original system (15) is solved without exploiting structure.

Remark 3.3. It is stated in Huffel and Lemmerling (2002) that the Hessian of the Lagrangian can be approximated by setting G^{(k)} = 0_{(2n+2e)×2e}. Although this appears to work in practice, it is not clear if or when this approximation guarantees convergence. Moreover, it may slow down the convergence rate of the Newton method in terms of the number of iterations; however, this may be countered by the simpler structure of the resulting system, which can be seen by setting G = 0 in (16b).

Conjecture 1. It is hypothesized that sufficient conditions, such as those in Theorem 2.6 of Deuflhard (2011, p. 56), for convergence to a stationary point are satisfied for (15) when using the approximation G = 0. Proving that such conditions hold, and further comparing the structure with and without G^{(k)}, is the subject of ongoing work on this problem.

This approximate method is compared with the exact Newton iteration for a particular simple example in the subsequent numerical results section.

Remark 3.4. Note that the matrix [0_{2e}, A^T; A, 0_{2Mn}] has rank at most 4ne, which for large M is much less than the 2Mn + 2e that would equate to a full rank C. Therefore, it may be possible to achieve further efficiency gains by using the Woodbury matrix identity and constructing an even smaller system to solve. This will be explored in future work.

4. NUMERICAL RESULTS

Here a path graph example network is considered, i.e., node l connects to nodes l − 1 and l + 1, unless it is the first or last node. A set of nominal network conditions

satisfying conservation and Kirchhoff's laws is constructed, and a series of measurements is modeled by adding errors, sampled from set distributions, to the nominal conditions. The admittances are estimated using the Newton method described in the previous section, with and without the proposed modification given by (17).

The network is set up to have a specified current i_1 injected at node 1, and a fraction of this current, i_1/n, injected at the half-way point of the network. The remaining nodes have a current selected from a uniform distribution between −i_1/(2n) and i_1/(2n), except the last node, where the current is set to ensure conservation of power. A nominal admittance is set for each edge and a voltage v_1 for node 1 is set, with the remaining voltages for the nodes selected to match the currents and nominal admittances. The measurement errors η_{lR}^{(k)}, η_{lI}^{(k)}, γ_{lR}^{(k)}, γ_{lI}^{(k)} are selected as samples from independent normal distributions with zero mean and variances σ_{ηlR}, σ_{ηlI}, σ_{γlR}, and σ_{γlI}, respectively. Therefore, β_{lmR}^{(k)} = η_{mR}^{(k)} − η_{lR}^{(k)} is effectively selected from a normal distribution with zero mean and variance σ_{ηlR} + σ_{ηmR}.

The weighting matrix is selected to be

W = diag([σ_{βR}^T, σ_{βI}^T, σ_{γR}^T, σ_{γI}^T]^T),   (18)

where σ_{βR}, σ_{βI} ∈ R^{e×1} and σ_{γR}, σ_{γI} ∈ R^{n×1} are given by

σ_{βR} = [(σ_{η1R} + σ_{η2R})^{−1}; (σ_{η2R} + σ_{η3R})^{−1}; ...; (σ_{η(n−1)R} + σ_{ηnR})^{−1}],
σ_{βI} = [(σ_{η1I} + σ_{η2I})^{−1}; (σ_{η2I} + σ_{η3I})^{−1}; ...; (σ_{η(n−1)I} + σ_{ηnI})^{−1}],
σ_{γR} = [σ_{γ1R}^{−1}, σ_{γ2R}^{−1}, ..., σ_{γnR}^{−1}]^T,   σ_{γI} = [σ_{γ1I}^{−1}, σ_{γ2I}^{−1}, ..., σ_{γnI}^{−1}]^T.

For the case where η_l and γ_l are sampled from independent normal distributions, this particular selection of weighting matrix corresponds to the objective of (11) becoming the negative log-likelihood, so that (11) is equivalent to maximizing the likelihood function subject to the constraints (11b).

For these examples the particular values used were σ_{ηlR} = 0.01|v_{lR}|, σ_{ηlI} = 0.01|v_{lI}|, σ_{γlR} = 0.01|i_{lR}| and σ_{γlI} = 0.01|i_{lI}|. The injected current is i_{1R} = n, i_{1I} = −0.25n, and the nominal admittances are y_{l,l+1} = 1/z_l, where z_{lR} = 0.1 L_l, z_{lI} = 0.01π L_l, and L_l is uniformly selected on the interval (80, 120) for each l ∈ {1, ..., e}. The voltage at node 1 was set as v_1 = 1000. To solve the linear systems, in both cases Matlab's ‘\’ command was used. A stopping criterion of ‖δ‖_2 < 10^{−5} is used for all examples, where δ is the vector of all δ_s, δ_{y_R}, δ_{y_I}, δ_µ. Results were obtained on a Windows laptop with 16 GB RAM and an Intel Core i7-4790K CPU @ 4.00 GHz.
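The nominal-conditions construction described above can be sketched as follows. This is a hypothetical 10-node instance with assumed values (the random interior injections are bounded by |i_1|/(2n) as an interpretation of the complex bound); edge flows follow from cumulative sums of the injections along the path.

```python
import numpy as np

rng = np.random.default_rng(2)

n = 10                                   # hypothetical path: node l joins l-1, l+1
e = n - 1
i1 = complex(n, -0.25 * n)               # current injected at node 1
L = rng.uniform(80.0, 120.0, size=e)
z = 0.1 * L + 1j * 0.01 * np.pi * L      # line impedances
y = 1.0 / z                              # nominal admittances

# Injections: node 1 fixed, half-way node gets i1/n, interior nodes random,
# last node balances so that the total injection is zero (conservation)
i = np.empty(n, dtype=complex)
i[0] = i1
i[1:-1] = rng.uniform(-abs(i1) / (2 * n), abs(i1) / (2 * n), size=n - 2)
i[n // 2] = i1 / n
i[-1] = -i[:-1].sum()

# Voltages consistent with the injections: edge flows by KCL along the path,
# then v_{l+1} = v_l - z_l * flow_l
v = np.empty(n, dtype=complex)
v[0] = 1000.0
flow = np.cumsum(i[:-1])
v[1:] = v[0] - np.cumsum(z * flow)

# Sanity check: the nominal state satisfies (2) at every node
for l in range(n):
    s = 0.0
    if l > 0:
        s += (v[l] - v[l - 1]) * y[l - 1]
    if l < n - 1:
        s += (v[l] - v[l + 1]) * y[l]
    assert abs(i[l] - s) < 1e-8
```

Measurements are then modeled by adding the Gaussian error samples described above to this nominal state.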

The top of Fig. 4 demonstrates the improvement in runtime for increasing network size using the condensed system in (17). However, the bottom of Fig. 4 shows that for a large number of measurements (M > 400) the original method (15) is faster in this setup. This may be due to how Matlab handles memory and how the generic sparse solvers it uses work on this system. Future work will look at creating an algorithm that explicitly exploits the block arrow structure of C in (17). For increasing network size, the approximate Newton method is marginally slower than the exact one. This is less conclusive for increasing

Fig. 4. Runtime vs. number of measurements for a fixed network size n = 100 (top), and runtime vs. number of nodes for a fixed number of measurements M = 50 (bottom), comparing the different Newton-like methods (exact and approximate, original and condensed systems) for solving the KKT conditions.

Fig. 5. Error between the estimated and actual admittances: error in the real admittance estimates with 1 measurement vs. 1000 measurements (left), and norm of the error vs. number of measurements for a fixed network size (right).

number of measurements. Fig. 6 shows, for this example, that the approximation takes a couple more iterations to converge, but this is compensated by a slightly quicker solve time for each iterate due to a simpler, sparser system. Fig. 5 shows how the estimates improve with more measurements, with only marginal improvements after about 400 measurements for this network with 100 nodes.

5. CONCLUSION AND FUTURE WORK

This paper considers the problem of estimating the admittances of an electrical network using measurements of node voltages and injected currents, which are subject to measurement error. By treating the errors as deterministic unknowns, a structured total least squares optimization problem is formulated. The explicit form of the linear system of equations arising from Newton's method applied to the first order optimality conditions of the problem is derived, and from this the structure of the linear system is revealed. A

Fig. 6. Norm of the residual over iterations of the exact Newton and approximate Newton methods, for 10-node and 1000-node networks.

Schur complement is used to construct a smaller system which, crucially, is shown to maintain a favourable structure. A numerical comparison is performed on a line network. Ongoing work is focused on proving convergence properties for this problem, and on extending the application to scalable, potentially distributed, optimization algorithms that could be used in topology reconstruction. Another idea under exploration is to construct the problem in stages, first using empirical techniques on the measurements with the aim of arriving at a smaller convex problem, for which structure can be exploited to solve large problems efficiently with convergence guarantees.

REFERENCES

Ardakanian, O., Wong, V.W., Dobbe, R., Low, S.H., von Meier, A., Tomlin, C., and Yuan, Y. (2017). On identification of distribution grids. arXiv preprint arXiv:1711.01526.

Beck, A. and Eldar, Y.C. (2010). Structured total maximum likelihood: An alternative to structured total least squares. SIAM Journal on Matrix Analysis and Applications, 31(5), 2623–2649. doi:10.1137/090756338.

Cavraro, G., Arghandeh, R., Poolla, K., and von Meier, A. (2015). Data-driven approach for distribution network topology detection. In Power & Energy Society General Meeting, 2015 IEEE, 1–5. IEEE.

Deuflhard, P. (2011). Newton Methods for Nonlinear Problems: Affine Invariance and Adaptive Algorithms, volume 35. Springer Science & Business Media.

Huffel, S. and Lemmerling, P. (2002). Total Least Squares and Errors-in-Variables Modeling: Analysis, Algorithms and Applications. Springer Netherlands, Dordrecht.

Marelli, D., Ninness, B., and Fu, M. (2015). Distributed weighted least-squares estimation for power networks. IFAC-PapersOnLine, 48(28), 562–567.

Oster, G. and Desoer, C. (1971). Tellegen's theorem and thermodynamic inequalities. Journal of Theoretical Biology, 32(2), 219–241.

Schweppe, F.C. and Rom, D.B. (1970). Power system static-state estimation, part II: Approximate model. IEEE Transactions on Power Apparatus and Systems, (1), 125–130.

Söderström, T. (2007). Errors-in-variables methods in system identification. Automatica, 43(6), 939–958.

Yu, J., Weng, Y., and Rajagopal, R. (2017). PaToPa: A data-driven parameter and topology joint estimation framework in distribution grids. IEEE Transactions on Power Systems, 1–1.