
    A Modified Chi-Squares Test for

    Improved Bad Data Detection

Murat Göl, Member, IEEE
EEE Department
Middle East Technical University
Ankara, Turkey
[email protected]

Ali Abur, Fellow, IEEE
ECE Department
Northeastern University
Boston, MA, U.S.A.
[email protected]

Abstract—Current state estimators employ the Weighted Least Squares (WLS) estimator to solve the state estimation problem. Once the state estimates are obtained, the Chi-squares test is commonly used to detect the presence of bad data in the measurement set. Unfortunately, this test is not entirely reliable; bad data present in the measurement set may be missed in certain cases. One reason for this is the approximation used to compute the bad data suspicion threshold, which is set based on an assumed chi-squared distribution for the objective function. In this paper, a modified metric is proposed to improve the bad data detection accuracy of the commonly used Chi-squares test. The bad data detection performance of the proposed test is compared with that of the conventional Chi-squares test.

Index Terms—Bad data detection, state estimation, chi-squared distribution, measurement residuals, weighted least squares.

I. INTRODUCTION

Power system state estimation is one of the key tools of an Energy Management System (EMS) [1]. State estimators provide the best estimates of the system voltage magnitudes and phase angles using the system model and a sufficiently redundant measurement set. Those estimates are used by the economic and control tools of the EMS.

The most common state estimation technique employed in present systems is the weighted least squares (WLS) method [1]. WLS is a well-developed and fast method. When applied to the first-order approximation of the measurement equations, it provides the best linear unbiased estimator (BLUE) given normally distributed measurement errors [2]; in the presence of Gaussian errors, WLS thus provides unbiased state estimates. Unfortunately, the WLS estimator is not robust against bad data, and even a single measurement with gross error may significantly bias the estimation results. Therefore, almost all WLS estimators carry out a post-estimation bad data detection test, which is commonly accomplished by the so-called Chi-squares test [3]-[4]. Although the Chi-squares test is the most common bad data detection method used in several commercial state estimators, it may not always yield correct results. There are cases where the Chi-squares test can be shown to fail to detect existing bad data in the measurement set.

(This work made use of Engineering Research Center Shared Facilities supported by the Engineering Research Center Program of the National Science Foundation and the Department of Energy under NSF Award Number EEC-1041877 and the CURENT Industry Partnership Program.)

Missing a bad measurement that is present in the measurement set has dire consequences, such as biased estimates that will affect the decisions based on them. Therefore, this paper proposes a simple modification that improves the bad data detection capability of existing state estimators. The proposed modification requires calculation of the residual covariance matrix, which uses a subset of the elements in the inverse of the sparse gain matrix. Matrix inversion is a computationally expensive operation and hence is avoided in power system analysis. However, thanks to efficient sparse inverse methods [5]-[7], the computation can be performed with little computational cost. In this paper the proposed method is compared with the conventional Chi-squares method in terms of computational performance and bad measurement detection accuracy.

The rest of the paper is organized as follows. Section II explains the conventional Chi-squares test, while the proposed method is explained in detail in Section III. The simulations and the numerical results are shown in Section IV, and Section V concludes the paper.

II. CONVENTIONAL CHI-SQUARES TEST

Consider a random variable Y which has a chi-squared (χ²) distribution with N degrees of freedom, given by the following expression:

Y = \sum_{i=1}^{N} X_i^2    (1)

where the random variables X_1, X_2, ..., X_N are independent and distributed according to the standard normal distribution.
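This definition is easy to check empirically. The short NumPy sketch below (the sample size and the choice N = 10 are arbitrary, illustrative assumptions) draws samples of Y as in (1) and confirms that their mean and variance approach N and 2N, the known moments of the chi-squared distribution:

```python
import numpy as np

# Empirical check of (1): a sum of N squared independent standard normal
# variables is chi-squared distributed with N degrees of freedom,
# whose mean is N and whose variance is 2N.
rng = np.random.default_rng(0)
N = 10
samples = 200_000

X = rng.standard_normal((samples, N))  # rows of X_1, ..., X_N
Y = np.sum(X**2, axis=1)               # Y from eq. (1), one value per row

print(np.mean(Y))  # close to N = 10
print(np.var(Y))   # close to 2N = 20
```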


In the power system state estimation problem formulation, measurement errors are commonly assumed to have a normal distribution with zero mean and known variance. Under the same assumption, a function f(x) can be defined as given in (2), where f(x) has a chi-squared distribution with at most (m − n) degrees of freedom, m being the number of measurements and n the number of states. Note that in a power system with m measurements and n system states, at most (m − n) errors can be linearly independent, since at least n measurements are required to obtain a solution. Thus the degrees of freedom will be at most (m − n).

f(x) = \sum_{i=1}^{m} \frac{e_i^2}{R_{ii}} = \sum_{i=1}^{m} \left( \frac{e_i}{\sqrt{R_{ii}}} \right)^2 = \sum_{i=1}^{m} \left( e_i^N \right)^2    (2)

In (2), e_i is the measurement error, which has a normal distribution, and R_ii is the variance of the ith measurement error, where R is the diagonal error covariance matrix. e_i^N is the normalized error, which has a standard normal distribution.

Consider the chi-squared probability density function plot given in Fig. 1 [1]. The area below the p.d.f. represents the probability of finding X in the given region, as shown below.

P\{X \ge x_t\} = \int_{x_t}^{\infty} \chi^2(u)\, du    (3)

Eq. (3) represents the probability of X being larger than x_t. This probability decreases as x_t increases, since the tail of the distribution decays. According to Fig. 1, x_t is 25, as shown by the dotted line, for the chosen probability of 0.05.
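In practice the threshold x_t in (3) is obtained from the inverse survival function of the chi-squared distribution. As a sketch (assuming SciPy is available; the value N = 15 is an assumption picked here only because it reproduces a threshold of roughly 25 for p = 0.05):

```python
from scipy.stats import chi2

p = 0.05   # chosen bad data suspicion probability
N = 15     # assumed degrees of freedom (illustrative choice)

# x_t such that P{X >= x_t} = p for a chi-squared variable with N dof
x_t = chi2.ppf(1 - p, df=N)   # equivalently chi2.isf(p, df=N)
print(x_t)                    # approximately 25
```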

Fig. 1. Chi-squared probability density function [1].

x_t represents the largest metric value that will not be flagged as indicating bad data. If the computed value exceeds this threshold, the presence of a bad measurement will be suspected.

In order to detect bad data, most commercial state estimators that employ the WLS estimation method use the following metric:

J(\hat{x}) = \sum_{i=1}^{m} \frac{\left( z_i - h_i(\hat{x}) \right)^2}{\sigma_i^2} = \sum_{i=1}^{m} \frac{r_i^2}{\sigma_i^2}    (4)

where m is the number of measurements, x̂ is the (n×1) estimated state vector, h_i(x̂), z_i and r_i are the estimated value, the measured value and the residual for the ith measurement, respectively, and σ_i² is the corresponding measurement variance, which is the same as R_ii. The conventional Chi-squares test will suspect the existence of bad data if the computed metric J(x̂) is larger than χ²_{(m−n),p}, the bad data suspicion threshold according to a chi-squared distribution for a given probability p and (m − n) degrees of freedom.

Note that a squared normal random variable contributes to a chi-squared sum only if it is normalized by its own variance, as defined in (2). Therefore, (4) is an approximation of f(x) as defined in (2), since the measurement residuals are normalized with respect to the variances of the measurement errors rather than their own variances.
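The conventional test can be sketched end to end on a small linear model. Everything below (the Jacobian H, the covariance R, the true state, and the injected gross error) is an illustrative assumption, not the paper's test system:

```python
import numpy as np
from scipy.stats import chi2

# Illustrative linear model z = Hx + e with m = 4 measurements, n = 2 states.
H = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0],
              [1.0, -1.0]])
R = np.diag([0.01, 0.01, 0.02, 0.02])   # diagonal error covariance
x_true = np.array([1.0, 0.5])

rng = np.random.default_rng(1)
e = rng.multivariate_normal(np.zeros(4), R)
z = H @ x_true + e
z[2] += 40 * np.sqrt(R[2, 2])           # inject bad data: 40-sigma gross error

# WLS estimate: x_hat = (H' R^-1 H)^-1 H' R^-1 z
Rinv = np.linalg.inv(R)
G = H.T @ Rinv @ H                      # gain matrix
x_hat = np.linalg.solve(G, H.T @ Rinv @ z)

# Conventional metric (4) and its chi-squared suspicion threshold
r = z - H @ x_hat
J = np.sum(r**2 / np.diag(R))
threshold = chi2.ppf(0.95, df=H.shape[0] - H.shape[1])  # p = 0.05, (m - n) dof
print(J > threshold)   # is bad data suspected?
```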

    III.  PROPOSED APPROACH

The conventional Chi-squares test assumes that the metric J(x̂) shown in (4) is distributed according to a chi-squared distribution. However, the denominator is not the variance of the corresponding residual appearing in the numerator. This introduces an approximation, which may lead to incorrect results, i.e., existing bad data may not be detected.

According to [2], the key to the analysis of bad data is the residual sensitivity matrix S, which is obtained by linearizing the relation between the measurement vector z, the system state vector x and the measurement error vector e, as follows.

z = Hx + e
\hat{x} = \left( H^T R^{-1} H \right)^{-1} H^T R^{-1} z
r = z - H\hat{x}
r = Hx + e - H \left( H^T R^{-1} H \right)^{-1} H^T R^{-1} (Hx + e)
r = e - H \left( H^T R^{-1} H \right)^{-1} H^T R^{-1} e
r = \left( I - H \left( H^T R^{-1} H \right)^{-1} H^T R^{-1} \right) e    (5)

S = I - H \left( H^T R^{-1} H \right)^{-1} H^T R^{-1}    (6)


S is the residual sensitivity matrix, R is the measurement error covariance matrix, H is the measurement Jacobian matrix and I is the m×m identity matrix, m being the number of measurements [1]. Note that the derivation is based on the linear measurement model. The details of the derivation of S can be found in [1]. The residual sensitivity matrix S has the following properties [1]:

S \cdot S \cdots S = S
S R S^T = S R    (7)
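Both properties in (7) are straightforward to verify numerically for any full-rank measurement model; the small H and R below are illustrative assumptions:

```python
import numpy as np

# Numerical check of the residual sensitivity matrix properties in (7)
# for an illustrative H (m = 4, n = 2) and diagonal R.
H = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0],
              [1.0, -1.0]])
R = np.diag([0.01, 0.01, 0.02, 0.02])

Rinv = np.linalg.inv(R)
G = H.T @ Rinv @ H
S = np.eye(4) - H @ np.linalg.solve(G, H.T @ Rinv)   # eq. (6)

print(np.allclose(S @ S, S))            # S is idempotent: S.S...S = S
print(np.allclose(S @ R @ S.T, S @ R))  # S R S' = S R
```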

Once the linearized measurement model is assumed, the residual sensitivity matrix S represents the relation between the measurement errors and the measurement residuals [1], as shown below:

r = S e    (8)

    where r is the measurement residual vector and e is the

    measurement error vector.

Using (7) and (8), and the known covariance matrix R of the measurement errors, one can easily derive the expected value and the covariance matrix of the measurement residuals, as given below:

E\{r\} = E\{Se\} = S \cdot E\{e\} = 0
\mathrm{Cov}(r) = \Omega = E\{rr^T\} = S \cdot E\{ee^T\} \cdot S^T = S R S^T = S R    (9)

where r = z − h(x̂) and Ω is the residual covariance matrix. Note that, due to the zero-mean Gaussian measurement error assumption, the expected value of the measurement errors is 0.

As seen in (9), Ω differs significantly from R, the measurement error covariance matrix. Therefore, in this paper it is proposed to use a modified bad data detection metric, Ψ_m(x̂), as defined below, where Ω_ii is the variance of the ith measurement residual.

\Psi_m(\hat{x}) = \sum_{i=1}^{m} \frac{\left( z_i - h_i(\hat{x}) \right)^2}{\Omega_{ii}}    (10)

Note that Ω is a rank-deficient matrix and is therefore not invertible. Hence, instead of using the inverse of Ω, the diagonal entries, which are the measurement residual variances, are employed. In this formulation, the off-diagonal entries of Ω, which represent the correlations among the measurement residuals, are neglected and only the diagonal elements are considered. Thus, this metric is still an approximation, albeit a more reliable one than (4), since the residuals are normalized using the square roots of the diagonal entries of the residual covariance matrix, which are the measurement residual standard deviations, instead of those of the measurement errors.
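A minimal sketch contrasting the proposed metric (10) with the conventional metric (4) on an illustrative linear model (H, R, and the injected 40σ gross error are assumptions). Since each Ω_ii is no larger than the corresponding R_ii, normalizing by the residual variances never understates the residuals:

```python
import numpy as np

# Sketch of the proposed metric (10): residuals normalized by the
# residual variances Omega_ii = (S R)_ii instead of the error
# variances R_ii as in (4).  H, R and the injected error are illustrative.
H = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0],
              [1.0, -1.0]])
R = np.diag([0.01, 0.01, 0.02, 0.02])
Rinv = np.linalg.inv(R)
G = H.T @ Rinv @ H
S = np.eye(4) - H @ np.linalg.solve(G, H.T @ Rinv)   # eq. (6)

Omega = S @ R                        # residual covariance, eq. (9)
rng = np.random.default_rng(2)
e = rng.multivariate_normal(np.zeros(4), R)
e[3] += 40 * np.sqrt(R[3, 3])        # gross error on measurement 4
r = S @ e                            # residuals, eq. (8)

J   = np.sum(r**2 / np.diag(R))      # conventional metric (4)
Psi = np.sum(r**2 / np.diag(Omega))  # proposed metric (10)
print(Psi > J)  # Omega_ii <= R_ii, so Psi >= J for any residual vector
```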

The main computational cost of this approach is the computation of Ω, since a matrix inversion must be performed. However, thanks to the extremely sparse structure of the measurement Jacobian H, efficient sparse inverse methods [4]-[7] can be employed, and the computational burden will not be significant even for large-scale systems. Note also that Ω does not strongly depend on the operating point. Therefore, as long as the topology and measurement configuration remain the same, Ω does not have to be updated.
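Because only the diagonal of Ω is needed, Ω_ii = R_ii − (H G⁻¹ Hᵀ)_ii can be obtained from solves with the gain matrix G rather than a full inversion. The dense sketch below (with an illustrative H and R) stands in for the sparse inverse methods cited above:

```python
import numpy as np

# The modified metric needs only diag(Omega).  Since Omega = S R =
# R - H G^-1 H^T (G the gain matrix), its diagonal follows from solves
# with G -- no full inverse is formed.  Production code would use the
# sparse inverse methods of the references; this dense version is a sketch.
H = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0],
              [1.0, -1.0]])
R = np.diag([0.01, 0.01, 0.02, 0.02])
Rinv = np.linalg.inv(R)
G = H.T @ Rinv @ H

T = np.linalg.solve(G, H.T)                      # G^-1 H^T via solves
omega_diag = np.diag(R) - np.einsum('ij,ji->i', H, T)  # diag(R - H G^-1 H^T)

# Cross-check against the full residual sensitivity matrix of eq. (6):
S = np.eye(4) - H @ np.linalg.solve(G, H.T @ Rinv)
print(np.allclose(omega_diag, np.diag(S @ R)))
```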

IV. SIMULATION RESULTS

In this section a real utility system with 265 buses and 340 branches is used to illustrate the benefits of the proposed bad data detection test. The system is monitored by 362 measurements, which provide high enough redundancy to detect the presence of bad data. Simulations are carried out in the MATLAB R2014a environment on a PC with 4 GB of RAM running the Windows operating system.

The first study shows the additional computational burden required for the computation of the residual covariance matrix. The second study compares the bad data detection performance of the proposed modified method with that of the conventional Chi-squares test.

Case 1: In this study the solution time of the WLS estimation is compared with the CPU times required for the proposed bad data detection approach and the conventional one. 1500 Monte Carlo simulations are carried out and the mean value of the results is reported. In these simulations, random Gaussian errors are added to the measurement set, and one randomly selected measurement is intentionally corrupted to emulate bad data by changing its sign. Table I shows the CPU times for the WLS state estimation solution as well as for the modified and conventional Chi-squares tests. The increase in computation time when using the proposed modified test is expected and is primarily caused by the computation of the residual covariance matrix, Ω.

TABLE I. MEAN COMPUTATION TIME (MILLISECONDS)

WLS Estimation | Proposed Modified Chi-Squares | Conventional Chi-Squares
7              | 3.4                           | 0.1

Case 2: The bad data detection performance of the proposed approach is compared to that of the conventional method. Four different single bad data scenarios are studied. Each scenario is repeated 1500 times, each time introducing a randomly selected bad measurement. In these four cases, a certain amount of error, proportional to the standard deviation σ of the considered measurement, is added to the original measurement in order to emulate a bad measurement. The amount of error introduced for each case is given below. In order to make the simulations realistic, Gaussian errors are also added to all measurements.

    •  Case 2.a: No bad measurement.


    •  Case 2.b: 3σ.

    •  Case 2.c: 40σ.

    •  Case 2.d: 100σ.

Table II shows the bad data detection performance of the proposed method and the conventional approach. The values given in Table II are percentages, which also indicate the bad data detection probabilities of the proposed and conventional methods. As evident in Table II, both methods give correct results for very large and very small error values. However, for intermediate error values such as those of Case 2.c, which can still significantly bias the estimation results, the proposed approach can detect bad data that is missed by the conventional Chi-squares test.

TABLE II. BAD DATA DETECTION PERFORMANCE

Case | Proposed Modified Chi-Squares (%) | Conventional Chi-Squares (%) | Bad Data Present
2.a  | 0                                 | 0                            | No
2.b  | 0                                 | 0                            | No
2.c  | 100                               | 68.9                         | Yes
2.d  | 100                               | 100                          | Yes

According to Table II, the estimation results of Case 2.b are unbiased, while the estimation results of Case 2.c are biased. Fig. 2.a presents the difference between the true states and the estimation results of one randomly selected Monte Carlo run for Case 2.b. Similarly, Fig. 2.b presents the difference between the true states and the estimation results of the same randomly selected Monte Carlo run for Case 2.c, such that both figures consider the same measurement but with different errors. As seen in Fig. 2.b, although the estimation results are biased, the conventional method was not capable of identifying the presence of the gross error. On the other hand, the proposed metric successfully detected the presence of the bad measurement.

Fig. 2. Mismatch between estimated and true states, x_true − x_est, plotted against the state index: (a) Case 2.b, (b) Case 2.c.

Finally, it is quite informative to take a look at the covariance values of the errors and residuals. Fig. 3 presents the variation of the Ω_ii and R_ii values. As seen in Fig. 3, compared to the constant R_ii values, the Ω_ii values are in general much smaller. Therefore, the proposed bad data suspicion threshold will always be smaller than that of the conventional Chi-squares test.

Fig. 3. Variation of Ω_ii and R_ii values across the measurement residuals.

V. CONCLUSIONS

In this paper a modified Chi-squares test is proposed to improve bad data detection accuracy when the WLS method is used for state estimation. As seen in the simulations, the proposed metric performs better than the conventional test in detecting the presence of bad data in a given measurement set. Although the proposed test is successful in detecting bad data, identification and removal of the bad measurements will still have to be carried out by methods such as the normalized residuals test [8].


Most commercial programs use the Chi-squares test as a computationally cheap filter to decide whether or not to conduct an identification test. In that sense, this modification may serve a useful purpose in increasing the reliability of this initial filter, so that bad data will not be missed.

REFERENCES

[1] A. Abur and A. Gomez-Exposito, Power System State Estimation: Theory and Implementation. Marcel Dekker, 2004.
[2] A. C. Aitken, "On Least Squares and Linear Combinations of Observations," Proc. Royal Society of Edinburgh, vol. 35, pp. 42-48, 1935.
[3] E. Handschin, F. C. Schweppe, J. Kohlas, and A. Fiechter, "Bad data analysis for power systems state estimation," IEEE Trans. Power App. Syst., vol. 94, pp. 329-337, Mar./Apr. 1975.
[4] A. Monticelli, "Electric Power System State Estimation," Proceedings of the IEEE, vol. 88, no. 2, February 2000.
[5] K. Takahashi, J. Fagan and M. Chen, "Formation of a Sparse Bus Impedance Matrix and Its Application to Short Circuit Study," PICA Proceedings, May 1973, pp. 63-69.
[6] Y. E. Campbell and T. A. Davis, "Computing the Sparse Inverse Subset: An Inverse Multifrontal Approach," University of Florida, Technical Report TR-95-021.
[7] B. Bilir and A. Abur, "Bad Data Processing When Using the Coupled Measurement Model and Takahashi's Sparse Inverse Method," Innovative Smart Grid Technologies Conference - Europe, IEEE, Istanbul, Turkey, 12-15 Oct. 2014.
[8] A. Monticelli and A. Garcia, "Reliable Bad Data Processing for Real-Time State Estimation," IEEE Transactions on Power Apparatus and Systems, vol. PAS-102, no. 5, May 1983, pp. 1126-1139.