
    A Modified Chi-Squares Test for

    Improved Bad Data Detection

Murat Göl, Member, IEEE
EEE Department
Middle East Technical University
Ankara, Turkey
[email protected]

Ali Abur, Fellow, IEEE
ECE Department
Northeastern University
Boston, MA, U.S.A.
[email protected]

Abstract—Current state estimators employ the Weighted Least Squares (WLS) estimator to solve the state estimation problem. Once the state estimates are obtained, the Chi-squares test is commonly used to detect the presence of bad data in the measurement set. Unfortunately, this test is not entirely reliable; bad data present in the measurement set may be missed in certain cases. One reason for this is the approximation used to compute the bad data suspicion threshold, which is set based on an assumed chi-squared distribution for the objective function. In this paper, a modified metric is proposed to improve the bad data detection accuracy of the commonly used Chi-squares test. The bad data detection performance of the proposed test is compared with that of the conventional Chi-squares test.

Index Terms—Bad data detection, state estimation, chi-squared distribution, measurement residuals, weighted least squares.

I. INTRODUCTION

Power system state estimation is one of the key tools of an Energy Management System (EMS) [1]. State estimators provide the best estimates of the system voltage magnitudes and phase angles using the system model and a sufficiently redundant measurement set. Those estimates are used by the economic and control tools of the EMS.

The most common state estimation technique employed in present systems is the weighted least squares (WLS) method [1]. WLS is a well-developed and fast method. When applied to the first-order approximation of the measurement equations, it provides the best linear unbiased estimator (BLUE) given normally distributed measurement errors [2]; in the presence of Gaussian errors, WLS thus provides unbiased state estimates. Unfortunately, the WLS estimator is not robust against bad data, and even a single measurement with gross error may significantly bias the estimation results. Therefore, almost all WLS estimators carry out a post-estimation bad data detection test, which is commonly accomplished by the so-called Chi-squares test [3]-[4]. Although the Chi-squares test is the most common bad data detection method used in several commercial state estimators, it may not always yield correct results. There are cases where the Chi-squares test can be shown to fail to detect existing bad data in the measurement set.

(This work made use of Engineering Research Center Shared Facilities supported by the Engineering Research Center Program of the National Science Foundation and the Department of Energy under NSF Award Number EEC-1041877 and the CURENT Industry Partnership Program.)

Missing a bad measurement that is present in the measurement set has dire consequences, such as biased estimates that will affect the decisions based on them. Therefore, this paper proposes a simple modification that improves the bad data detection capability of existing state estimators. The proposed modification requires calculation of the residual covariance matrix, which uses a subset of the elements in the inverse of the sparse gain matrix. Matrix inversion is a computationally expensive operation and hence is avoided in power system analysis. However, thanks to efficient sparse inverse methods [5]-[7], the computation can be performed with little computational cost. In this paper the proposed method is compared with the conventional Chi-squares method in terms of computational performance and bad measurement detection accuracy.

The rest of the paper is organized as follows. Section II explains the conventional Chi-squares test, while the proposed method is explained in detail in Section III. The simulations and the numerical results are shown in Section IV, and Section V concludes the paper.

II. CONVENTIONAL CHI-SQUARES TEST

Consider a random variable Y which has a chi-squared (χ²) distribution with N degrees of freedom, given by the following expression:

Y = \sum_{i=1}^{N} X_i^2    (1)

where the random variables X_1, X_2, ..., X_N are independent and distributed according to the standard normal distribution.
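This definition is easy to check empirically. The short NumPy sketch below (the sample size and the choice N = 10 are arbitrary, illustrative assumptions) draws samples of Y as in (1) and confirms that their mean and variance approach N and 2N, the known moments of the chi-squared distribution:

```python
import numpy as np

# Empirical check of (1): a sum of N squared independent standard normal
# variables is chi-squared distributed with N degrees of freedom,
# whose mean is N and whose variance is 2N.
rng = np.random.default_rng(0)
N = 10
samples = 200_000

X = rng.standard_normal((samples, N))  # rows of X_1, ..., X_N
Y = np.sum(X**2, axis=1)               # Y from eq. (1), one value per row

print(np.mean(Y))  # close to N = 10
print(np.var(Y))   # close to 2N = 20
```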


In the power system state estimation problem formulation, measurement errors are commonly assumed to have a normal distribution with zero mean and known variance. Under the same assumption, a function f(x) can be defined as given in (2), where f(x) has a chi-squared distribution with at most (m − n) degrees of freedom, m being the number of measurements and n the number of states. Note that in a power system with m measurements and n system states, at most (m − n) errors can be linearly independent, since at least n measurements are required to obtain a solution. Thus the degrees of freedom will be at most (m − n).

f(x) = \sum_{i=1}^{m} \frac{e_i^2}{R_{ii}} = \sum_{i=1}^{m} \left( \frac{e_i}{\sqrt{R_{ii}}} \right)^2 = \sum_{i=1}^{m} \left( e_i^N \right)^2    (2)

In (2), e_i is the measurement error, which has a normal distribution, and R_ii is the variance of the ith measurement error, where R is the diagonal error covariance matrix. e_i^N is the normalized error, which has a standard normal distribution.

Consider the chi-squared probability density function plot given in Fig. 1 [1]. The area below the p.d.f. represents the probability of finding X in the given region, as shown below.

P\{X \ge x_t\} = \int_{x_t}^{\infty} \chi^2(u)\, du    (3)

Eq. (3) represents the probability of X being larger than x_t. This probability decreases as x_t increases, since the tail of the distribution decays. According to Fig. 1, x_t is 25, as shown by the dotted line, for the chosen probability of 0.05.
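In practice the threshold x_t in (3) is obtained from the inverse survival function of the chi-squared distribution. As a sketch (assuming SciPy is available; the value N = 15 is an assumption picked here only because it reproduces a threshold of roughly 25 for p = 0.05):

```python
from scipy.stats import chi2

p = 0.05   # chosen bad data suspicion probability
N = 15     # assumed degrees of freedom (illustrative choice)

# x_t such that P{X >= x_t} = p for a chi-squared variable with N dof
x_t = chi2.ppf(1 - p, df=N)   # equivalently chi2.isf(p, df=N)
print(x_t)                    # approximately 25
```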

Fig. 1. Chi-squared probability density function [1].

x_t represents the largest metric value that will not be flagged as indicating bad data. If the computed value exceeds this threshold, the presence of a bad measurement will be suspected.

In order to detect bad data, most commercial state estimators that employ the WLS estimation method use the following metric:

J(\hat{x}) = \sum_{i=1}^{m} \frac{\left( z_i - h_i(\hat{x}) \right)^2}{\sigma_i^2} = \sum_{i=1}^{m} \frac{r_i^2}{\sigma_i^2}    (4)

where m is the number of measurements, x̂ is the (n×1) estimated state vector, h_i(x̂), z_i and r_i are the estimated value, the measured value and the residual for the ith measurement, respectively, and σ_i² is the corresponding measurement variance, which is the same as R_ii. The conventional Chi-squares test will suspect the existence of bad data if the computed metric J(x̂) is larger than χ²_{(m−n),p}, the bad data suspicion threshold according to a chi-squared distribution for a given probability p and (m − n) degrees of freedom.

Note that a squared normal random variable contributes to a chi-squared sum only if it is normalized by its own variance, as defined in (2). Therefore, (4) is an approximation of f(x) as defined in (2), since the measurement residuals are normalized with respect to the variances of the measurement errors rather than their own variances.
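The conventional test can be sketched end to end on a small linear model. Everything below (the Jacobian H, the covariance R, the true state, and the injected gross error) is an illustrative assumption, not the paper's test system:

```python
import numpy as np
from scipy.stats import chi2

# Illustrative linear model z = Hx + e with m = 4 measurements, n = 2 states.
H = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0],
              [1.0, -1.0]])
R = np.diag([0.01, 0.01, 0.02, 0.02])   # diagonal error covariance
x_true = np.array([1.0, 0.5])

rng = np.random.default_rng(1)
e = rng.multivariate_normal(np.zeros(4), R)
z = H @ x_true + e
z[2] += 40 * np.sqrt(R[2, 2])           # inject bad data: 40-sigma gross error

# WLS estimate: x_hat = (H' R^-1 H)^-1 H' R^-1 z
Rinv = np.linalg.inv(R)
G = H.T @ Rinv @ H                      # gain matrix
x_hat = np.linalg.solve(G, H.T @ Rinv @ z)

# Conventional metric (4) and its chi-squared suspicion threshold
r = z - H @ x_hat
J = np.sum(r**2 / np.diag(R))
threshold = chi2.ppf(0.95, df=H.shape[0] - H.shape[1])  # p = 0.05, (m - n) dof
print(J > threshold)   # is bad data suspected?
```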

    III.  PROPOSED APPROACH

The conventional Chi-squares test assumes that the metric J(x̂) shown in (4) is distributed according to a chi-squared distribution. However, the denominator is not the variance of the corresponding residual appearing in the numerator. This introduces an approximation, which may lead to incorrect results, i.e., existing bad data may not be detected.

According to [2], the key to the analysis of bad data is the residual sensitivity matrix S, which is obtained by linearizing the relation between the measurement vector z, the system state vector x and the measurement error vector e, as follows.

z = Hx + e
\hat{x} = \left( H^T R^{-1} H \right)^{-1} H^T R^{-1} z
r = z - H\hat{x}
r = Hx + e - H \left( H^T R^{-1} H \right)^{-1} H^T R^{-1} (Hx + e)
r = e - H \left( H^T R^{-1} H \right)^{-1} H^T R^{-1} e
r = \left( I - H \left( H^T R^{-1} H \right)^{-1} H^T R^{-1} \right) e    (5)

S = I - H \left( H^T R^{-1} H \right)^{-1} H^T R^{-1}    (6)


S is the residual sensitivity matrix, R is the measurement error covariance matrix, H is the measurement Jacobian matrix and I is the m×m identity matrix, m being the number of measurements [1]. Note that the derivation is based on the linear measurement model. The details of the derivation of S can be found in [1]. The residual sensitivity matrix S has the following properties [1]:

S \cdot S \cdots S = S
S R S^T = S R    (7)
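Both properties in (7) are straightforward to verify numerically for any full-rank measurement model; the small H and R below are illustrative assumptions:

```python
import numpy as np

# Numerical check of the residual sensitivity matrix properties in (7)
# for an illustrative H (m = 4, n = 2) and diagonal R.
H = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0],
              [1.0, -1.0]])
R = np.diag([0.01, 0.01, 0.02, 0.02])

Rinv = np.linalg.inv(R)
G = H.T @ Rinv @ H
S = np.eye(4) - H @ np.linalg.solve(G, H.T @ Rinv)   # eq. (6)

print(np.allclose(S @ S, S))            # S is idempotent: S.S...S = S
print(np.allclose(S @ R @ S.T, S @ R))  # S R S' = S R
```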

Once the linearized measurement model is assumed, the residual sensitivity matrix S represents the relation between the measurement errors and the measurement residuals [1], as shown below:

r = S e    (8)

    where r is the measurement residual vector and e is the

    measurement error vector.

Using (7) and (8), and the known covariance matrix R of the measurement errors, one can easily derive the expected value and the covariance matrix of the measurement residuals, as given below:

E\{r\} = E\{Se\} = S \cdot E\{e\} = 0
\mathrm{Cov}(r) = \Omega = E\{rr^T\} = S \cdot E\{ee^T\} \cdot S^T = S R S^T = S R    (9)

where r = z − h(x̂) and Ω is the residual covariance matrix. Note that, due to the zero-mean Gaussian measurement error assumption, the expected value of the measurement errors is 0.

As seen in (9), Ω differs significantly from R, the measurement error covariance matrix. Therefore, in this paper it is proposed to use a modified bad data detection metric, Ψ_m(x̂), as defined below, where Ω_ii is the variance of the ith measurement residual.

\Psi_m(\hat{x}) = \sum_{i=1}^{m} \frac{\left( z_i - h_i(\hat{x}) \right)^2}{\Omega_{ii}}    (10)

Note that Ω is a rank-deficient matrix and is therefore not invertible. Hence, instead of using the inverse of Ω, the diagonal entries, which are the measurement residual variances, are employed. In this formulation, the off-diagonal entries of Ω, which represent the correlations among the measurement residuals, are neglected and only the diagonal elements are considered. Thus, this metric is still an approximation, albeit a more reliable one than (4), since the residuals are normalized using the square roots of the diagonal entries of the residual covariance matrix, which are the measurement residual standard deviations, instead of those of the measurement errors.
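A minimal sketch contrasting the proposed metric (10) with the conventional metric (4) on an illustrative linear model (H, R, and the injected 40σ gross error are assumptions). Since each Ω_ii is no larger than the corresponding R_ii, normalizing by the residual variances never understates the residuals:

```python
import numpy as np

# Sketch of the proposed metric (10): residuals normalized by the
# residual variances Omega_ii = (S R)_ii instead of the error
# variances R_ii as in (4).  H, R and the injected error are illustrative.
H = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0],
              [1.0, -1.0]])
R = np.diag([0.01, 0.01, 0.02, 0.02])
Rinv = np.linalg.inv(R)
G = H.T @ Rinv @ H
S = np.eye(4) - H @ np.linalg.solve(G, H.T @ Rinv)   # eq. (6)

Omega = S @ R                        # residual covariance, eq. (9)
rng = np.random.default_rng(2)
e = rng.multivariate_normal(np.zeros(4), R)
e[3] += 40 * np.sqrt(R[3, 3])        # gross error on measurement 4
r = S @ e                            # residuals, eq. (8)

J   = np.sum(r**2 / np.diag(R))      # conventional metric (4)
Psi = np.sum(r**2 / np.diag(Omega))  # proposed metric (10)
print(Psi > J)  # Omega_ii <= R_ii, so Psi >= J for any residual vector
```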

The main computational cost of this approach is the computation of Ω, since a matrix inversion must be performed. However, thanks to the extremely sparse structure of the measurement Jacobian H, efficient sparse inverse methods [4]-[7] can be employed, and the computational burden will not be significant even for large-scale systems. Note also that Ω does not strongly depend on the operating point. Therefore, as long as the topology and measurement configuration remain the same, Ω does not have to be updated.
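Because only the diagonal of Ω is needed, Ω_ii = R_ii − (H G⁻¹ Hᵀ)_ii can be obtained from solves with the gain matrix G rather than a full inversion. The dense sketch below (with an illustrative H and R) stands in for the sparse inverse methods cited above:

```python
import numpy as np

# The modified metric needs only diag(Omega).  Since Omega = S R =
# R - H G^-1 H^T (G the gain matrix), its diagonal follows from solves
# with G -- no full inverse is formed.  Production code would use the
# sparse inverse methods of the references; this dense version is a sketch.
H = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0],
              [1.0, -1.0]])
R = np.diag([0.01, 0.01, 0.02, 0.02])
Rinv = np.linalg.inv(R)
G = H.T @ Rinv @ H

T = np.linalg.solve(G, H.T)                      # G^-1 H^T via solves
omega_diag = np.diag(R) - np.einsum('ij,ji->i', H, T)  # diag(R - H G^-1 H^T)

# Cross-check against the full residual sensitivity matrix of eq. (6):
S = np.eye(4) - H @ np.linalg.solve(G, H.T @ Rinv)
print(np.allclose(omega_diag, np.diag(S @ R)))
```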

IV. SIMULATION RESULTS

In this section a real utility system with 265 buses and 340 branches is used to illustrate the benefits of the proposed bad data detection test. The system is monitored by 362 measurements, which provide high enough redundancy to detect the presence of bad data. Simulations are carried out in the MATLAB R2014a environment on a PC with 4 GB of RAM running the Windows operating system.

The first study shows the additional computational burden required for the computation of the residual covariance matrix. The second study compares the bad data detection performance of the proposed modified method with that of the conventional Chi-squares test.

Case 1: In this study the solution time of the WLS estimation is compared with the CPU times required for the proposed bad data detection approach and the conventional one. 1500 Monte Carlo simulations are carried out and the mean value of the results is reported. In these simulations, random Gaussian errors are added to the measurement set, and one randomly selected measurement is intentionally corrupted to emulate bad data by changing its sign. Table I shows the CPU times for the WLS state estimation solution as well as for the modified and conventional Chi-squares tests. The increase in computation time when using the proposed modified test is expected and is primarily caused by the computation of the residual covariance matrix, Ω.

TABLE I. MEAN COMPUTATION TIME (MILLISECONDS)

WLS Estimation | Proposed Modified Chi-Squares | Conventional Chi-Squares
7              | 3.4                           | 0.1

Case 2: The bad data detection performance of the proposed approach is compared to that of the conventional method. Four different single bad data scenarios are studied. Each scenario is repeated 1500 times, each time introducing a randomly selected bad measurement. In these four cases, a certain amount of error, proportional to the standard deviation σ of the considered measurement, is added to the original measurement in order to emulate a bad measurement. The amount of error introduced for each case is given below. In order to make the simulations realistic, Gaussian errors are also added to all measurements.

    •  Case 2.a: No bad measurement.


    •  Case 2.b: 3σ.

    •  Case 2.c: 40σ.

    •  Case 2.d: 100σ.

Table II shows the bad data detection performance of the proposed method and the conventional approach. The values given in Table II are percentages, which also indicate the bad data detection probabilities of the proposed and conventional methods. As evident in Table II, both methods give correct results for very large and very small error values. However, for intermediate error values such as those of Case 2.c, which can still significantly bias the estimation results, the proposed approach can detect bad data that is missed by the conventional Chi-squares test.

TABLE II. BAD DATA DETECTION PERFORMANCE

Case | Proposed Modified Chi-Squares (%) | Conventional Chi-Squares (%) | Bad Data Present
2.a  | 0                                 | 0                            | No
2.b  | 0                                 | 0                            | No
2.c  | 100                               | 68.9                         | Yes
2.d  | 100                               | 100                          | Yes

According to Table II, the estimation results of Case 2.b are unbiased, while the estimation results of Case 2.c are biased. Fig. 2.a presents the difference between the true states and the estimation results of one randomly selected Monte Carlo run for Case 2.b. Similarly, Fig. 2.b presents the difference between the true states and the estimation results of the same randomly selected Monte Carlo run for Case 2.c, such that both figures consider the same measurement but with different errors. As seen in Fig. 2.b, although the estimation results are biased, the conventional method was not capable of identifying the presence of the gross error. On the other hand, the proposed metric successfully detected the presence of the bad measurement.

Fig. 2. Mismatch between estimated and true states, x_true − x_est, plotted against the state index: (a) Case 2.b, (b) Case 2.c.

Finally, it is quite informative to take a look at the covariance values of the errors and residuals. Fig. 3 presents the variation of the Ω_ii and R_ii values. As seen in Fig. 3, compared to the constant R_ii values, the Ω_ii values are in general much smaller. Therefore, the proposed bad data suspicion threshold will always be smaller than that of the conventional Chi-squares test.

Fig. 3. Variation of Ω_ii and R_ii values across the measurement residuals.

V. CONCLUSIONS

In this paper a modified Chi-squares test is proposed to improve bad data detection accuracy when the WLS method is used for state estimation. As seen in the simulations, the proposed metric performs better than the conventional test in detecting the presence of bad data in a given measurement set. Although the proposed test is successful in detecting bad data, identification and removal of the bad measurements will still have to be carried out by methods such as the normalized residuals test [8].


Most commercial programs use the Chi-squares test as a computationally cheap filter to decide whether or not to conduct an identification test. In that sense, this modification may serve a useful purpose in increasing the reliability of this initial filter, so that bad data will not be missed.

REFERENCES

[1] A. Abur and A. Gomez-Exposito, Power System State Estimation: Theory and Implementation. Marcel Dekker, 2004.
[2] A. C. Aitken, "On Least Squares and Linear Combinations of Observations," Proc. Royal Society of Edinburgh, vol. 35, pp. 42-48, 1935.
[3] E. Handschin, F. C. Schweppe, J. Kohlas, and A. Fiechter, "Bad data analysis for power systems state estimation," IEEE Trans. Power App. Syst., vol. 94, pp. 329-337, Mar./Apr. 1975.
[4] A. Monticelli, "Electric Power System State Estimation," Proceedings of the IEEE, vol. 88, no. 2, February 2000.
[5] K. Takahashi, J. Fagan and M. Chen, "Formation of a Sparse Bus Impedance Matrix and Its Application to Short Circuit Study," PICA Proceedings, May 1973, pp. 63-69.
[6] Y. E. Campbell and T. A. Davis, "Computing the Sparse Inverse Subset: An Inverse Multifrontal Approach," University of Florida, Technical Report TR-95-021.
[7] B. Bilir and A. Abur, "Bad Data Processing When Using the Coupled Measurement Model and Takahashi's Sparse Inverse Method," Innovative Smart Grid Technologies Conference - Europe, IEEE, Istanbul, Turkey, 12-15 Oct. 2014.
[8] A. Monticelli and A. Garcia, "Reliable Bad Data Processing for Real-Time State Estimation," IEEE Transactions on Power Apparatus and Systems, vol. PAS-102, no. 5, May 1983, pp. 1126-1139.