13
1 A Big Data Architecture Design for Smart Grids Based on Random Matrix Theory Xing He, Qian Ai, Member, IEEE, Robert C. Qiu, Fellow, IEEE, Wentao Huang, Longjian Piao, Haichun Liu Abstract—Model-based analysis tools, built on assumptions and simplifications, are difficult to handle smart grids with data characterized by volume, velocity, variety, and veracity (i.e. 4Vs data). This paper, using random matrix theory (RMT), motivates data-driven tools to perceive the complex grids in high- dimension; meanwhile, an architecture with detailed procedures is proposed. In algorithm perspective, the architecture performs a high-dimensional analysis, and compares the findings with RMT predictions to conduct anomaly detections. Mean Spectral Radius (MSR), as a statistical indicator, is defined to reflect the correlations of system data in different dimensions. In management mode perspective, a group-work mode is discussed for smart grids operation. This mode breaks through regional limitations for energy flows and data flows, and makes advanced big data analyses possible. For a specific large-scale zone-dividing system with multiple connected utilities, each site, operating under the group-work mode, is able to work out the regional MSR only with its own measured/simulated data. The large-scale interconnected system, in this way, is naturally decoupled from statistical parameters perspective, rather than from engineering models perspective. Furthermore, a comparative analysis of these distributed MSRs, even with imperceptible different raw data, will produce a contour line to detect the event and locate the source. It demonstrates that the architecture is compatible with the block calculation only using the regional small database; beyond that, this architecture, as a data-driven solution, is sensitive to system situation awareness, and practical for real large-scale interconnected systems. Five case studies and their visualizations validate the designed architecture in various fields of power systems. To our best knowledge, this study is the first attempt to apply big data technology into smart grids. Index Terms—big data, smart grid, architecture, random ma- trix, high-dimension, mean spectral radius, large-scale dis- tributed systems, group-work mode I. I NTRODUCTION B IG data technology is a new scientific trend [1, 2]. Driven by data analysis in high-dimension, big data technology works out data correlations (indicated by statistical parameters) to gain insight to the inherent mechanisms. Data- driven results only rely on an unrestrained selection of system raw data (the space can be whole system or only a region, the time can be long or short, and the size can be large or small) and a general statistical procedure (for data processing). On the other side, procedures for traditional model-based analysis, This work was supported by National Key Technology Research and Development Program of Science and Technology (2013BAA01B04). Xing He, Qian Ai, Wentao Huang, Longjian Piao, Haichun Liu are with the Department of Electrical Engineering, Research Center for Big Data Engineering Technology, State Energy Smart Grid R & D Center, Shanghai Jiaotong University, Shanghai 200240, China (e-mail: hexing [email protected]) Robert C. Qiu is also with the Department of Electrical and Computer Engineering, Tennessee Technological University, Cookeville, TN 38505 USA (e-mail: [email protected]) especially decoupling a practical interconnected system, are always based on assumptions and simplifications. Model-based results rely on identified causalities, specific parameters, sam- ple selections, and training processes; imprecise or incomplete formulas/expressions, biased sample selections, and improper training processes will all lead to bad results. The results are often barely satisfied or even unsatisfied as the system size grows and complexity increases. Generally speaking, data- driven analysis tools, rather than model-based ones, are more suitable to complex large-scale interconnected systems with readily accessible data. Data in power systems have increased dramatically, leaving gaps and challenges; data processing is a major concern and its urgency increases with data growth. The 4Vs data (data with features of volume, variety, velocity, and veracity) [3] in smart grids, which can hardly be handled within a tolerable elapsed time or hardware resources by traditional model-based tools, have encouraged the development of an emerging paradigm—big data technology for power systems [4–6]. Big data technology does not conflict with classical analyses or pretreatments. Actually, big data technology has already been successfully applied as a powerful data-driven tool for numerous phenomena, such as quantum systems [7], financial systems [8, 9], biological systems [10], as well as wireless communication networks [11–13]. Major tasks of the architecture for these applications seem similar—1) big data modeling, and 2) big data analysis. It is believed that big data technology will also have a wide applied scope in power systems, and the results will be fruitful. A. Contribution This paper, aiming to apply big data technology into smart grids, proposes a feasible architecture with detailed proce- dures. Firstly, we introduce random matrix theory (RMT) as our mathematical foundations. According to RMT, a standard random matrix is systematically formed to map the system. Then, we conduct the high-dimensional analysis and compare the findings (i.e. Empirical Spectrum Density, and Kernel Density Estimation) with the RMT theoretical predictions (i.e. Marchenko-Pastur Law, and Ring Law) to tell signals from white noises. Within the above mathematical procedure, a high-dimensional statistic—mean spectral radius (MSR)—is proposed to indicate the data correlations. More than that, MSR also clarifies the parameter interchanged among the utilities under the group-work mode for distributed calculation. In addition, power grids in three different periods, involving their features, management modes, and data flows and energy arXiv:1501.07329v4 [stat.ME] 16 Jun 2015

A Big Data Architecture Design for Smart Grids Based … A Big Data Architecture Design for Smart Grids Based on Random Matrix Theory Xing He, Qian Ai, Member, IEEE, Robert C. Qiu,

Embed Size (px)

Citation preview

Page 1: A Big Data Architecture Design for Smart Grids Based … A Big Data Architecture Design for Smart Grids Based on Random Matrix Theory Xing He, Qian Ai, Member, IEEE, Robert C. Qiu,

1

A Big Data Architecture Design for Smart GridsBased on Random Matrix Theory

Xing He, Qian Ai, Member, IEEE, Robert C. Qiu, Fellow, IEEE, Wentao Huang, Longjian Piao, Haichun Liu

Abstract—Model-based analysis tools, built on assumptionsand simplifications, are difficult to handle smart grids withdata characterized by volume, velocity, variety, and veracity(i.e. 4Vs data). This paper, using random matrix theory (RMT),motivates data-driven tools to perceive the complex grids in high-dimension; meanwhile, an architecture with detailed proceduresis proposed. In algorithm perspective, the architecture performsa high-dimensional analysis, and compares the findings withRMT predictions to conduct anomaly detections. Mean SpectralRadius (MSR), as a statistical indicator, is defined to reflectthe correlations of system data in different dimensions. Inmanagement mode perspective, a group-work mode is discussedfor smart grids operation. This mode breaks through regionallimitations for energy flows and data flows, and makes advancedbig data analyses possible. For a specific large-scale zone-dividingsystem with multiple connected utilities, each site, operatingunder the group-work mode, is able to work out the regionalMSR only with its own measured/simulated data. The large-scaleinterconnected system, in this way, is naturally decoupled fromstatistical parameters perspective, rather than from engineeringmodels perspective. Furthermore, a comparative analysis of thesedistributed MSRs, even with imperceptible different raw data,will produce a contour line to detect the event and locate thesource. It demonstrates that the architecture is compatible withthe block calculation only using the regional small database;beyond that, this architecture, as a data-driven solution, issensitive to system situation awareness, and practical for reallarge-scale interconnected systems. Five case studies and theirvisualizations validate the designed architecture in various fieldsof power systems. To our best knowledge, this study is the firstattempt to apply big data technology into smart grids.

Index Terms—big data, smart grid, architecture, random ma-trix, high-dimension, mean spectral radius, large-scale dis-tributed systems, group-work mode

I. INTRODUCTION

B IG data technology is a new scientific trend [1, 2].Driven by data analysis in high-dimension, big data

technology works out data correlations (indicated by statisticalparameters) to gain insight to the inherent mechanisms. Data-driven results only rely on an unrestrained selection of systemraw data (the space can be whole system or only a region, thetime can be long or short, and the size can be large or small)and a general statistical procedure (for data processing). Onthe other side, procedures for traditional model-based analysis,

This work was supported by National Key Technology Research andDevelopment Program of Science and Technology (2013BAA01B04).

Xing He, Qian Ai, Wentao Huang, Longjian Piao, Haichun Liu are withthe Department of Electrical Engineering, Research Center for Big DataEngineering Technology, State Energy Smart Grid R & D Center, ShanghaiJiaotong University, Shanghai 200240, China (e-mail: hexing [email protected])

Robert C. Qiu is also with the Department of Electrical and ComputerEngineering, Tennessee Technological University, Cookeville, TN 38505 USA(e-mail: [email protected])

especially decoupling a practical interconnected system, arealways based on assumptions and simplifications. Model-basedresults rely on identified causalities, specific parameters, sam-ple selections, and training processes; imprecise or incompleteformulas/expressions, biased sample selections, and impropertraining processes will all lead to bad results. The results areoften barely satisfied or even unsatisfied as the system sizegrows and complexity increases. Generally speaking, data-driven analysis tools, rather than model-based ones, are moresuitable to complex large-scale interconnected systems withreadily accessible data.

Data in power systems have increased dramatically, leavinggaps and challenges; data processing is a major concernand its urgency increases with data growth. The 4Vs data(data with features of volume, variety, velocity, and veracity)[3] in smart grids, which can hardly be handled within atolerable elapsed time or hardware resources by traditionalmodel-based tools, have encouraged the development of anemerging paradigm—big data technology for power systems[4–6]. Big data technology does not conflict with classicalanalyses or pretreatments. Actually, big data technology hasalready been successfully applied as a powerful data-driventool for numerous phenomena, such as quantum systems [7],financial systems [8, 9], biological systems [10], as well aswireless communication networks [11–13]. Major tasks of thearchitecture for these applications seem similar—1) big datamodeling, and 2) big data analysis. It is believed that bigdata technology will also have a wide applied scope in powersystems, and the results will be fruitful.

A. Contribution

This paper, aiming to apply big data technology into smartgrids, proposes a feasible architecture with detailed proce-dures. Firstly, we introduce random matrix theory (RMT) asour mathematical foundations. According to RMT, a standardrandom matrix is systematically formed to map the system.Then, we conduct the high-dimensional analysis and comparethe findings (i.e. Empirical Spectrum Density, and KernelDensity Estimation) with the RMT theoretical predictions (i.e.Marchenko-Pastur Law, and Ring Law) to tell signals fromwhite noises. Within the above mathematical procedure, ahigh-dimensional statistic—mean spectral radius (MSR)—isproposed to indicate the data correlations. More than that,MSR also clarifies the parameter interchanged among theutilities under the group-work mode for distributed calculation.In addition, power grids in three different periods, involvingtheir features, management modes, and data flows and energy

arX

iv:1

501.

0732

9v4

[st

at.M

E]

16

Jun

2015

Page 2: A Big Data Architecture Design for Smart Grids Based … A Big Data Architecture Design for Smart Grids Based on Random Matrix Theory Xing He, Qian Ai, Member, IEEE, Robert C. Qiu,

2

flows, are summarized as general backgrounds and foundationsfor applying big data in power systems.

Based on the architecture, we also conduct five case studiesand summarize the most interesting results. 1) The com-parisons between the experimental findings and the RMTpredictions, as well as the proposed indicator MSR, aresensitive to event detections. In addition, data in differentdimensions are correlative under high-dimensional perspec-tive. 2) The MSR is somehow qualitatively correlated withthe quantitative parameters of system performance. 3) Thearchitecture, besides for event detections, can also be usedas a new method to find the critical active power point atany bus node, taking account of probable grid fluctuations. 4)The architecture is compatible with the block calculation onlyusing the regional small database, and practical for real large-scale distributed systems. In addition, the high-dimensionalcomparative analysis is sensitive to situation awareness forgrids operation, even with imperceptible different measureddata. 5) The architecture is suitable not only for the powerflow analysis, but also for the fault detection. To our bestknowledge, our study represents the first such attempt in theliterature on power systems.

B. Related WorkIt is well-established that data, as a vital resource, should be

utilized much more efficiently in power systems. References[14–16] showed the improvement in wide-area monitoring,protection and control (WAMPAC) with utilizing PMUs data.Kanao et al. proposed a practical data utilization methodbased on harmonic state-estimation (HSE) for power systemharmonic analysis [17]. It was a data processing method in aspecific field and only available when the engineering modelis accurate. Alahakoon et al. proposed advanced analyticrefer to a number of techniques in many specific fields: datamining tools, knowledge discovery tools, machine learningtechnologies, and so on [18]. Recently, Xu initiated powerdisturbance data analytics to explore useful aspects of powerquality monitoring data ,and showed a wide applied scopein the future [19]. The mathematical foundations and systemframeworks were missing yet. For data utilization methodsin power systems, although many researches, especially thosemethods based on specific physical models, were done invarious fields, little attention has been paid to the design of auniversal architecture, which is based on solid mathematicalfoundations and statistical procedures.

II. BIG DATA, RANDOM MATRIX THEORY, AND DATAPROCESSING PROCEDURE

The nomenclature is given as Table I.Big data is an emerging paradigm applied to datasets whose

size is beyond the ability of commonly used software toolsto capture, manage, and process the data within a tolerableelapsed time. Various technologies are being discussed tosupport the handling of big data such as massively parallelprocessing databases, scalable storage systems, cloud comput-ing platforms, and MapReduce. In the context of our paper,massively parallel processing databases is relevant to addressthe real-time operation within tolerable elapsed time.

TABLE I: Some Frequently Used Notations in the Theory

Notations MeansX,x ,x,xi,j a matrix, a vector, a single value, an entry of a matrixX,x ,x hat: raw dataX,x ,x ,Z tilde: transformation data, formed by normalizationxi ,xi overline: averageN,T, c the numbers of rows and columns, c=N/T

C|N×T N×T dimensional complex spaceXu the singular value equivalent of the matrix X

S Covariance matrix of X: S= 1NXXH ∈ C|N×N

Z,L L independent matrices product: Z =∏L

i=1Xu,i

λS, λZ the eigenvalue of matrix S, ZλS,i the i-th eigenvalue of matrix Sr the circle radius on the complex plane of eigenvaluesκMSR mean value of radius for all the eigenvalues of Z: rλZ

µ (x ),σ2(x ) mean, variance for x

A. Big Data Definition and its Features

Big data is a data-driven cognitive approach; it perceives theworld through data—it works out the statistical correlations in-dicated by high-dimensional parameters using a non-parametermodel. Currently, there exists no standardized definition forbig data. This paper gives a mathematical definition below asour past work [11–13, 20].• For each sampling time, data of N -dimension are mod-

eled as vectors, say xi ∈ R|N ;• The number of data samples, say T , is large;• A function, f (x

1,x

2,· · · ,xT ), is able to be defined;

One of the big data fundamental characteristics is hugevolume of data represented by heterogeneous and diversedimensions. X ∈ C|N×T is a natural model, formed by the datax

1,x

2,· · · ,xT , to describe a large-scale system or subsystem

[21]; X is a non-parameter model formed almost based on aminimum hypothesis. While the size (rows N for dimensionsnumber; columns T for samples number) increases, so do thecomplexity and the relationship underneath the data. Whereas,when the size are sufficiently large, some unique phenomena,such as concentration of spectral measure [4], will occur.

B. Random Matrix Theory

Random matrix theory (RMT) has emerged as a particularlyuseful framework for many theoretical questions, especially forthose concerning multivariate data. There are two frameworksfor RMT: one assumes the asymptotic convergence, and thedimensions are infinite; the other one is the non-asymptoticsolution, assuming finite matrix size. For the asymptoticresults, in theory, we require the infinite size of the matrix,which is infeasible in practical world. However, the results areremarkably accurate, even for relatively moderate matrix sizessuch as tens. This is the very reason why this random matrixmodel is penetrating so many areas from financial engineeringto wireless network. Our initial motivation for this model wasfrom large-scale wireless network. The new trends for RMTare (1) finite matrix and (2) non-Gaussian matrix entries.

1) Marchenko-Pastur Law (M-P Law):M-P Law describes the asymptotic behavior of sin-

gular values of large rectangular random matrices. Let

Page 3: A Big Data Architecture Design for Smart Grids Based … A Big Data Architecture Design for Smart Grids Based on Random Matrix Theory Xing He, Qian Ai, Member, IEEE, Robert C. Qiu,

3

X = xi,j be a N×T random matrix whose entries, withthe mean µ(x)=0 and the variance σ2(x)<∞, are inde-pendent identically distributed (i.i.d.). As N,T →∞ withthe ratio c = N/T ∈ (0, 1], the empirical spectrum den-sity (ESD) of the corresponding sample covariance matrixS= 1

NXXH ∈ C|N×N converges to the distribution of M-PLaw [20, 22] with density function

fESD(λS) =

1

2πλcσ2

√(b−λ)(λ−a) , a 6 λ 6 b

0 , otherwise(1)

where a=σ2(1−√c)2, b=σ2(1 +

√c)2.

2) Kernel Density Estimation (KDE):A nonparametric estimate [23] of the ESD of the sample

covariance matrix S ∈ C|N×N is used

fESD(λS) = 1Nh

∑Ni=1K(

x− λS,ih ) (2)

where λS,i (i=1, 2, · · · , N ) are the eigenvalues of S, and K(·)is the kernel function for bandwidth parameter h.

3) Ring Law:Ring Law for (large) non-Hermitian matrices is one of

the most remarkable developments in the modern probability[24, 25]. Except for our previous work [4, 11–13], littleresearch has been done to leverage this new tool in the contextof modeling massive datasets. This mathematical structure isgeneral enough to model many unprecedented problems.

Consider the product of L non-Hermitian random matricesZ =

∏Li=1Xu,i, where Xu ∈ C|N×N is the singular value

equivalent [26] of the rectangular non-Hermitian random ma-trix X ∈ C|N×T , whose entries are i.i.d. variables with themean µ(x)=0 and the variance σ2(x)=1. This product Zallows us to study the streaming datasets generated as afunction of both space and time. The basic target of thisstudy is the prospective architecture and its effectiveness. Forsimplification, we set L to one in this paper and need not todiscuss more on L. Beyond that, the product Z, by a transformwhich make the variance to 1/N , can be converted to Z (i.e.σ2(z )=1/N ). Thus, the ESD of Z converges almost surelyto the limit given by

fESD(λZ) =

1

πcL |λ|(2/L−2) , (1− c)L/2 6 |λ| 6 1

0 , otherwise(3)

as N,T →∞ with the ratio N/T = c ∈ (0, 1].On the complex plane of eigenvalues, the inner cir-

cle radius is (1− c)L/2 and the outer circle radius isunity. Especially, S= ZZ

H= 1NYYH (Y =

√N Z ∈ C|N×N ,

σ2(z )=1/N , σ2(y)=σ2(√Nz )=1) is able to be acquired,

and S conforms to M-P Law. Ring Law and M-P Law withL = 1 and L = 8 are shown as Figure 1. Furthermore, wepropose κMSR (defined at the end of this section) to indicatethe eigenvalues distribution of Z and the data correlation asa statistical parameter. In Figure 1, κMSR is depicted in greenline.

−1.5 −1 −0.5 0 0.5 1 1.5−1.5

−1

−0.5

0

0.5

1

1.5

T=1000 , N=400 L=1

Real(Z)

Imag

(Z)

eigenvaluerinner

=0.7746

κMSR

=0.89296

0 0.5 1 1.5 2 2.5 30

0.2

0.4

0.6

0.8

1

PD

F

Tsamp=1000 , N=400, n=1000 ||Ver=54 a=0.13509, b=2.6649, h=0.1

HistogramMarcenko−Pastur Law: fcxKernel Density Estimation: fnx

0 0.5 1 1.5 2 2.5 3−40

−20

0

20

40

Eigenvalue

Pro

babi

lity

Den

sity

Deviation n*h*[ fc(x)−f

n(x) ]

(a) L=1

−1.5 −1 −0.5 0 0.5 1 1.5−1.5

−1

−0.5

0

0.5

1

1.5

T=1000 , N=400 L=8

Real(Z)

Imag

(Z)

eigenvaluerinner

=0.1296

κMSR

=0.4631

0 0.5 1 1.5 2 2.5 30

0.2

0.4

0.6

0.8

1

PD

F

Tsamp=1000 , N=400, n=1000 ||Ver=53 a=0.13509, b=2.6649, h=0.1

HistogramMarcenko−Pastur Law: fcxKernel Density Estimation: fnx

0 0.5 1 1.5 2 2.5 3−150

−100

−50

0

50

Eigenvalue

Pro

babi

lity

Den

sity

Deviation n*h*[ fc(x)−f

n(x) ]

(b) L=8

Fig. 1: Ring Law and M-P Law for Z =∏Li=1Xu,i

C. Big Data Analysis

Data processing in power systems is a typical big datachallenge. For a system, lots of measured data or simulatedones x are readily accessible; for a certain time ti, they arearranged as a vector x

ti. As time goes by, vectors are acquired

one by one and a sheet is naturally formed as dataset to mapthe system. Any section in the dataset, according to our intend,is available as a raw data source Ωx for further analyses. Thus,Ωx consists of sample vectors on a series of times, which aredenoted as x

t1, x

t2, · · · , x

ti, · · ·; at any time ti, the vector

xti

consists of sample data in various dimensions, which aredenoted as x

ti, n1, x

ti, n2, · · · , x

ti, nj, · · · . The upper limit of

dimensions n (i.e. max j) is subject to the variety of the dataat a single sampling time, and the upper limit of time lengtht (i.e. max i) is subject to the volume of the database andgenerally big enough.

For the raw data source Ωx , we can focus on any dataarea (e.g., N rows, T columns) as a split-window; a rawmatrix X ∈ C|N×T is naturally formed. Then, X is convertedto a normalized non-Hermitian matrix X ∈ C|N×T row-by-rowwith following algorithm:

xi,j = (xi,j−xi)×(σ(xi)/σ(xi))+xi , 16 i 6N; 16j6T (4)

where xi =(x i1, x i2, · · · , x iT ) and xi =0, σ2(xi)=1.The matrix Xu ∈ C|N×N is introduced as the singular value

equivalent [4] of X ∈ C|N×T by

Xu =

√XX

HU (5)

where U ∈ C|N×N is a Haar unitary matrix, XuXuH ≡ XX

H.

For L arbitrarily assigned independent non-Hermitian ma-trices Xi (i=1,· · ·, L) in the raw data source Ωx , the matricesproduct Z =

∏Li=1Xu,i ∈ C|N×N is able to be acquired. Then,

Z is calculated row-by-row with following formula:

Page 4: A Big Data Architecture Design for Smart Grids Based … A Big Data Architecture Design for Smart Grids Based on Random Matrix Theory Xing He, Qian Ai, Member, IEEE, Robert C. Qiu,

4

zi = zi/(√Nσ(zi)) , 1 6 i 6 N (6)

where zi =(z i,1, z i,2, · · · , z i,N ) , zi =(z i,1, z i,2, · · · , z i,N ).Furthermore, for the radius of all eigenvalue of Z on

the complex plane, we calculate its mean value and denoteit as κMSR (i.e. κMSR =rλZ

). In general, we conduct high-dimensional analysis, only with its dataset, to reveal theproperties of a power system. Z and its ESD are analyzedbased on the newly developed Ring Law, and the high-dimensional statistic κMSR is calculated and visualized as anindicator. In addition, the corresponding sample covariancematrix S= ZZ

His calculated for the comparison among the

results of its histogram, KDE, and M-P Law.

III. A BIG DATA ARCHITECTURE FOR SMART GRIDS ANDITS ADVANTAGES

A. Big Data Architecture for Smart Grids

The frequently used notations are shown in Table II.

TABLE II: Notations for Architecture and Case Studies

Notations MeansATime time area to focus on the data split-windowANode node area to focus on the data split-windowt time: t0 for current time, ts for sampling timePBus-n power demand of load at bus-nPmax Bus-n critical point on Power–Voltage curve for bus-nγAcc addition factor for load fluctuations of the gridγMul multiplication factor for load fluctuations of the grid

The designed architecture is illustrated as Fig. 2.

Change Load

P=P1,…,Pt,…

Network

Information

Y

Simulation

Real System

Data sets

Grid

PQ NodeP,Q

PV Node

P,V

Vθ Node

V,θ

MatPower

All Node Data

P,Q,V,θ

Data Source

n Node at some time

2n given numbers

2n unknown numbers

2n function

Generate

Selected

Grid

SCADA,EMS

PMU,AMI,PQ

Sample Data

Collect

1 tsAbnormal

Engineering

Procedure

4n numbers at one point

Data S

ource

PV

PQ

Full Data Window WF0 at time t1

Small Window WS0

ATime(1)

Full Data Window WF0 at time t2

AN

ote

ATime

Sectionalized Small

Window WSS0

Abnormal

ATime(end)

t

Mathematical ProcedureATime=ATime(1):ATime(end)

ˆx

Fig. 2: The designed big data architecture for smart grids. Thepart above the dot line illustrates an engineering procedure for bigdata modeling, during which the raw data source Ωx are formedto map the physical system. The other part below the dot lineillustrates a mathematical procedure for big data analysis. It isfully independent of engineering parameters, and during which theanalyses are extracted from data source Ωx .

It consists of two independent procedures to connect smartgrids and big data—big data modeling as an engineeringprocedure, following by big data analysis as a mathematicalprocedure. During the engineering procedure, the raw datasource Ωx is acquired as described in Section II; during themathematical procedure, following steps are conducted:

Steps of Mathematical Procedure1) Set the initial parameters

1a) Set ATime0 and ANode0 to focus on the first data window1b) Set tΩ and k=0 to slide the moving split-window (MSW)

2) Focus on correspond window to form X (ATime =ATime0 + k)3) Calculate X, Xu, Z, Z, S4) Calculate the eigenvalues λZ ,λS

5) Conduct ESD analysis and compare the result according to RMT6) Calculate κMSR with λZ

7) Visualize the results8) Judge as times goes by:

8a) k < tΩ ⇒ k++; back to step 2)8b) k > tΩ ⇒ END)

Especially, in step 2) Focus on the data window, we areable to conduct 1) real-time analysis: focusing on the real-time data window whose last edge of the sampling time areais current time (i.e. ATime(end)= t0); and 2) block-calculationfor decoupling interconnected system: focusing on a smallerwindow consisting of data only in designated dimensions, butnot in all dimensions. Besides, as the split-window, in a fixedsize, slides across the data source Ωx with tΩ and k set in 1b),a series of κMSR is got for further research and visualization.

B. Advantages in Data Processing

1) Algorithm:This architecture analyzes data in high-dimension as illus-

trated by the solid purple lines in Fig. 4. It is a universalprocedure with 4 steps as follows:

Steps of Data Management for G3

1) Form standard random matrices X

2) Acquire Z, S by variables transforming (X → Xu → Z → Z → S)3) Conduct high-dimensional analysis based on RMT4) Conduct engineering interpretations

On the other hand, the procedure of traditional data process-ing algorithms, in most cases, relies highly on specific sim-plifications and assumptions to build models. Taking geneticalgorithm for an example, two steps are essential to achieve theresult. One is to transcode the engineering variables to geneas the input of the gene model. It is a subjective selectionprocedure for the specific roles in engineering system, andonly a few variables can be taken into account in finalmodel. The other one is to perform the genetic algorithmthrough operations of selection, crossover, and mutation. Manyproblems, such as improper settings of the population size, ofthe crossover probabilities, or of the mutation probabilities,will inevitably make the result worse.

Compared to traditional algorithms, big data analysis, drivenby data, enables us to analyze the interrelation and interaction

Page 5: A Big Data Architecture Design for Smart Grids Based … A Big Data Architecture Design for Smart Grids Based on Random Matrix Theory Xing He, Qian Ai, Member, IEEE, Robert C. Qiu,

5

among all the elements in system, seen as correlations indi-cated by high-dimensional parameters. Using a pure mathe-matical procedure without physical models and hypotheses,big data analysis is easier in logic and faster in speed.Moreover, except step 4): Conduct Engineering Interpretation,the whole procedure is objective without introducing or accu-mulating the systematic errors; moreover, the accidental errorscan be eliminated either with the matrix size growing, or byrepletion test and parallel computing due to the independenceof the algorithm.

2) Distributed Calculation for Interconnected Grids:Smart grids operation are featured with autonomous sources

and decentralized controls. Due to the potential transmis-sion cost and privacy concerns, Aggregating distributed datasources to a centralized site for mining is systematicallyprohibitive. On the other hand, although we are able tocarry out model-based mining at each distributed site, thedecoupling procedure for connected sources is high relatedto simplifications and assumptions. Accordingly the result areoften barely satisfied and leads to biased views and decisions.

Large random matrices provide a natural and universal data-driven solution. For a specific zone-dividing interconnectedsystem, each site is able to form a small matrix only with itsown data. In this way, the integrated matrix can be dividedinto blocks for distributed calculation and vice versa. Forthe overall system, we can conduct high-dimensional analysisby integrating the regional matrices, or even by processing afew regional high-dimensional parameters. The mathematicalfoundation is kept invariant as RMT; the scalability, however,depends on our intention. This architecture, decoupling thesystems in the form of statistical matrices or high-dimensionalparameters instead of models, is practical for real large-scaleinterconnected grids. The distributed calculation is a deepresearch, and in this paper we just provide a relative simpleapplied example—case 4 in the next section Five Case Studies.

C. Advantages in Management Mode

Some brief but novel introductions and analyses about thedevelopment of power grids, mainly under the perspective ofdata managing mode and information communication technol-ogy (ICT), are given as the related background for applyingbig data into smart grids. Meanwhile, new situations andchallenges, and advanced management mode for future gridsare further discussed from the perspective. Especially, it isgroup-work mode, in our opinion, that breaks through theregional limitation for energy flows and data flows. As a result,some data-driven functions, e.g., comparative analysis, anddistributed calculation, are able to be carried out.

Generally, the power grids evolutions are summarized asthree generations—G1, G2, and G3 [27]. Their own networkstructures are depicted in Fig. 12 [28]. Meanwhile, theirdata flows and energy flows, as well as corresponding datamanagement systems and work modes, are quite different [29],which are shown in Fig. 3 and Fig. 4, respectively. In thefollowing discussions, we will come to a conclusion that thegroup-work mode is the precondition for data-driven analysis,and the trend for smart grids.

Grid——Power Flow

Global Control Center

Power Component

Generation

Tradition

Power Plant

DG

Consumption Load

Non-Power Component

Human Being

Weather

Fuel

Load Characteristics

DG Performances

Power output Performances

Info

rma

tion

Inte

rch

an

ge

s

Arean

Area1

Area2. . .

Arean

CloudInternet Supergrid

Data flows

Energy flows

G2G1

G3MSR

Unit

Fig. 3: Data flows and energy flows for three generations of powersystems. The single lines, double lines, and triple lines indicate theflows of G1, G2, and G3, respectively.

1) G1: Small-scale Isolated Grids:G1 was developed from around 1900 to 1950, featured by

small-scale isolated grids. For G1, components interchange en-ergy and data within the isolated grid to keep stable. The com-ponents are fully controlled by decentralized control systemand operating under individual-work mode. It means that eachapparatus collects designated data, and makes correspondingdecisions only with its own application, just as shown at theabove part of Fig. 12a. The individual-work mode works withan easy logic and little information communication. Whereas,it means few advanced functions and inefficient utilization forresources. It is only suitable for small grids.

2) G2: Large-scale Interconnected Grids:G2 was developed from about 1960 to 2000, featured

by zone-dividing large-scale interconnected grids. For G2,utilities interchange energy and data within the adjacent ones.The components are dispatched by control center and operat-ing under team-work mode. The regional team leaders, likeslocal dispatching centers, substations, or microgrid controlcenters, aggregate their own team-members (i.e. componentsin the region) into a standard black-box model. These standardmodels will be further aggregated by the global control centerfor the control or prediction purposes. The two aggregationsabove are achieved by four steps, which are data monitoring,data pre-processing, data storage, and data processing, respec-tively. However, lots of engineering technologies or sciencesare essential as the foundations—Cognitive Radio WirelessNetwork [30–32], Specific Communication Service Mapping(SCSM) as Information and Communication Technology (ICT)[33], Cloud Storage, Parallel Computing as Computer Science(CS), and Modelling Building, Parameter Identification asMathematical Modeling [34, 35]. The description above areillustrated by dotted blue lines in Fig. 4. In general, theteam-work mode conducts model-based analysis, and mainlyconcerns with the system stability rather than the individualbenefit; it will not work well for smart grids with 4Vs data asdescribed in Section I.

3) G3: Smart Grids:The development of G3 was launched at the beginning of

Page 6: A Big Data Architecture Design for Smart Grids Based … A Big Data Architecture Design for Smart Grids Based on Random Matrix Theory Xing He, Qian Ai, Member, IEEE, Robert C. Qiu,

6

Acquire

by Variables

Transforming

Pre-Processing

Data Mining

Data Identification

Data Sharing

Data Source

Data Monitoring

Platform

Data Storage

Unified Modeling

SCSM

ACSI

Data Processing

Electrical Model

Cloud Storage

Parallel Computing

Cloud Computing

Applications

Modelling Building

Parameter Identification

Modelling Modification

CSEngineering

Technology

Form Large

Random

Matrix

Do Analysis and

Visualization

Based on the

Eigenvalue

Conduct

Engineering

Interpretation

Pure statistical science with mathematical processes

Universality and Real-time

Relay protection Prediction

Fault diagnosis

State recognition

……

Sample Data A Analysis 1Acquisition 1 Application 1

Sample Data B Analysis 2Acquisition 2 Application 2

Sample Data N Analysis nAcquisition n Application n

Sample Data A

Sample Data A

Sample Data N…

Distributed Control System

G1: Small-scale Isolated Grids

Decentralized Control System

Individual-work mode

G2: Large-scale Connected Grids

Distributed Control System

Team-work mode

G3: Smart Grids

Distributed Control System

Group-work modeCondition

monitoring

ICT

,Z S

Fig. 4: Data management systems and work modes for three generations of power systems. The above, middle, and below parts indicatethe data management systems and the work modes of G1, G2, and G3, respectively. For G1, each grid works independently. For G2, globaland local control centers are operating under the team-work mode. For G3, the group-work mode breaks through the regional limitation forenergy and data flows and has a better performance.

the 21st century; and for China, it is expected to be completedaround 2050 [27]. Fig. 12c shows that the clear-cut partitioningis no longer suitable for G3, as well as the team-work modewhich is based on the regional leader. For G3, the controlforce of the regional center (if still exist) is greatly releasedby individual units. The high performance and self-controlindividuals results in much more flexible flows, for both energyexchange and data communication, to improve utilization bysharing resources among the whole grid [36]. Accordingly,the group-work mode is proposed. Under this mode, theindividuals play a dominant part in the system under theauthority of the global control centers [29]. VPPs [37], MMGs[38], for instance, are typically G3 utilities. These group-workmode utilities provide a relaxed environment to benefit boththe individuals and the grids: the former (i.e. individuals),driven by their own interests and characteristics, are able tocreate or join a relatively free group to benefit mutually fromsharing their respective superior resources; meanwhile, theseutilities are generally big and controllable enough to be goodcustomers or managers to the latter (i.e. the smart grids).

IV. FIVE CASE STUDIES

For the following designed experiments, all the data areobtained in two scenarios: 1) Only white noises (i.e. bench-mark, whose statistical results agree with M-P Law and RingLaw); 2) Signals plus noises. We treat small random loads

fluctuations and sample errors as white noises, and suddenchanges and faults as signals.

Case 1 to Case 4, based on Matpower, belongs to the fieldof power system stability and control. The grid is a standardIEEE 118-bus system with six partitions displayed as Fig. 14[39]. Detailed information about the test bed is referred tothe case118.m in Matpower package and Matpower 4.1 UsersManual [40]. Case 5, based on PSCAD/EMTDC, is about faultdetection.

A. Case 1: Observation from the Split-Window with FullNetwork and 500 Sample Points—N=118, T =500

Case 1 is a simple study to validate that the architectureis able to quickly detect the signals from the noises with thereal-time data flow. Let N=118, T =500, c=N/T =0.236.Thus each split-window has n=NT =59, 000 data. Table IIIshows the series of assumed events, and the accordingly RMTanalysis results visualization at critical time and MSRs on thetime series are depicted as Fig. 5 and 6, respectively.

1) Sampling time ts =550 s, Time Area ATime =51:550 s:There are only small load fluctuations in this split-window,

which means that white-noises play a dominant part. Then,we compare the histogram, KDE and M-P Law. Inspection ofFig. 5a indicates that, in a white-noises dominated system, thekernel density estimation (in red line) matches the histogram

Page 7: A Big Data Architecture Design for Smart Grids Based … A Big Data Architecture Design for Smart Grids Based on Random Matrix Theory Xing He, Qian Ai, Member, IEEE, Robert C. Qiu,

7

−1.5 −1 −0.5 0 0.5 1 1.5−1.5

−1

−0.5

0

0.5

1

1.5

St=550||Ver=328T=500, N=118 ||L=1

Real(Z)

Imag

(Z)

eigenvaluerinner

=0.87407

κMSR

=0.9363

0 0.5 1 1.5 2 2.5 30

0.2

0.4

0.6

0.8

1

St=550, h=0.12599||Ver=328T=500, N=118a=0.2644, b=2.2076

PD

F

HistogramMarcenko−Pastur Law: fcxKernel Density Estimation: fnx

0 0.5 1 1.5 2 2.5 3−30

−20

−10

0

10

20

Eigenvalue

Pro

babi

lity

Den

sity

Deviation n*h*[ fc(x)−f

n(x) ]

(a) ts =550 s

−1.5 −1 −0.5 0 0.5 1 1.5−1.5

−1

−0.5

0

0.5

1

1.5

St=551||Ver=328T=500, N=118 ||L=1

Real(Z)

Imag

(Z)

eigenvaluerinner

=0.87407

κMSR

=0.81082

0 0.5 1 1.5 2 2.5 30

0.2

0.4

0.6

0.8

1

St=551, h=0.12599||Ver=328T=500, N=118a=0.2644, b=2.2076

PD

F

HistogramMarcenko−Pastur Law: fcxKernel Density Estimation: fnx

0 0.5 1 1.5 2 2.5 3−30

−20

−10

0

10

20

Eigenvalue

Pro

babi

lity

Den

sity

Deviation n*h*[ fc(x)−f

n(x) ]

(b) ts =551 s

Fig. 5: Ring Law, and the Comparation among Histogram,KDE, and M-P Law at Critical Time for Case 1

500 1000 1500 2000 25000

0.5

1

MS

R

X: 1049Y: 0.8095

Sample Time(s)

Tlength=500 N=118 ||Ver=328

X: 501Y: 0.9363

X: 550Y: 0.9363

X: 551Y: 0.8108

X: 554Y: 0.742 X: 561

Y: 0.696

X: 1050Y: 0.9311

500 1000 1500 2000 25000

2000

Pow

er D

eman

d(M

W)

X: 551Y: 199.1

X: 550Y: −0.8211

X: 1651Y: 1500Inner

VM

VM+A

load59

load59

Fig. 6: MSRs on the Time Series for Case 1

(in blue bar) very well. Moreover, the histogram curve and theKDE curve agree with M-P Law (in blue line).

2) Sampling time ts =551 s, Time Area ATime =52:551 s:For this split-window, Fig. 5b shows that the eigenvalues of

Ring Law collapse to the circle center, and both the histogramand KDE deviate from M-P Laws. It means that there aresome signals, any deviation of the benchmark (white noisesonly), in the system. Indeed, just at time t=551 s, the PBus-59suddenly changes from 0 MW to 200 MW somehow.

Fig. 6 depicts that the κMSR change dramatically ina short time since t=550 s. As the length of timearea T =500, the step change as the signal occurring attime t=551 s, is included during all the sampling timesfrom ts =551 s to ts =1049 s. It results in the deviation(0.9363, 0.8108, 0.742, · · · ). However, at the sampling timets =1050 s, when the time area ATime =551:1050 s, the step

signal is no longer exist, as well as the deviation of thehistogram and KDE from M-P Law, and κMSR is back to0.9311.

In addition, it is found that κMSR for the data of V (red line)which has definite physical meaning, and of V +iθ (black line)without any physical meaning, have the same trend. It indicatesthat MSR is a high-dimensional statistic, which is independentand robust to physical model in some way. The green lineindicates the inner radius of the ring for the analyzing matrix,whose value only depends on the matrix size as formula 3 insection II.

This case indicates that the presented statistic MSR is sen-sitive to signal. Meanwhile, there are some inherent relationsfor the data in different dimensions. It means that real-timeanalysis for event detection can be carried out with less kindsof data.

B. Case 2: Observation from the Smaller Split-Window withFull Network and 240 Sample Points—N=118, T =240

Case 2 is similar to the previous one except that T decreasesto 240 s, and the events are rearranged. In this case, we tryto find the qualitative relationships between the MSR and thequantitative parameters of system performances. The series ofassumed events and the κMSR –t curve are depicted as Table IVand Figure 7, respectively.

500 1000 1500 2000 25000

0.5

1

X: 1920Y: 0.4196

Sample Time

MS

R

X: 1320Y: 0.6122

X: 1020Y: 0.4689

X: 720Y: 0.6522

X: 420Y: 0.5562

X: 1800Y: 0.6281

X: 1200Y: 0.8352

X: 1500Y: 0.8307

X: 600Y: 0.8568

X: 900Y: 0.856

X: 300Y: 0.8631

X: 240Y: 0.8641

X: 2100Y: 0.6074

X: 1620Y: 0.3695

Tlength=240 N=118 ||Ver=285

500 1000 1500 2000 25000

5000

X: 1Y: -0.3794

Pow

er D

eman

d

X: 301Y: 200.8

X: 601Y: 249.6

X: 901Y: 798.7

X: 1201Y: 851.3

X: 1501Y: 2501

X: 1801Y: 2541

X: 2101Y: 2555

InnerVMVM+Aload59

Fig. 7: MSRs on the Time Series for Case 2

a) During once step-change, there is a negative correlationbetween the min value of MSR (i.e. κmin MSR ) and the step-change value of PBus-59 (i.e. ∆PBus-59):

∆PBus-59 200 50 550 1650 · · ·PBus-59 0→200 200→250 250→800 850→2500 · · ·κmin MSR 0.5562 0.6522 0.4689 0.3695 · · ·

b) When PBus-59 is steady at a higher level, κMSR is steadyat a lower level:

PBus-59 0 200 250 800 · · · 2540κMSR 0.8631 0.8568 0.8560 0.8352 · · · 0.6074

c) When the PBus-59 approaches to the critical active powerpoint Pmax Bus-59, a little step change of PBus-59 will lead to a

Page 8: A Big Data Architecture Design for Smart Grids Based … A Big Data Architecture Design for Smart Grids Based on Random Matrix Theory Xing He, Qian Ai, Member, IEEE, Robert C. Qiu,

8

small value of κmin MSR . The feature b) and c) is available toconduct vulnerable node identification [41] as a new method.And when PBus-59 is beyond Pmax Bus-59 (i.e. PBus-59 > 2555MW), κMSR is no longer steady.

∆PBus-59 50 50 40 15PBus-59 200→250 800→850 2500→2540 2540→2555κmin MSR 0.6522 0.6122 0.4196 unsteady∆κMSR 0.2046 0.2230 0.2085 unsteady

C. Case 3: Critical Power Point Estimation

This case, based on the feature c) of the previous one, isdesigned as a new method to find the critical point Pmax Bus-n,especially that the grid fluctuations is taking into consideration.The grid fluctuations are set by γAcc and γMul

yload nt=yload nt × (1 + γMul×x1) + γAcc×x2 (7)

where x1 and x2 are random numbers from a standardGaussian Distribution. The result are shown in Fig. 8.

In this model, the increase of grid fluctuations meansthe decrease of signal-noise ratio, which will cause a raiseof κmin MSR . Meanwhile, it will also cause the decrease ofPmax Bus-n for a certain node (2555 MW, 2548 MW, 2521MW), which meet our common knowledge and experience.

D. Case 4: Group-work Mode

For the above cases, the anomaly can be detected bytraditional model-based tools. These one validate that thedesigned data-driven architecture is also a solution to anomalydetection. In Case 4, however, we will design a case whichcan hardly be solved by traditional tools.

This case is based on a zone-dividing system with 6 regions(A1 to A6) depicted in Fig. 14. A PQ node far from slackbus, i.e. bus-117 in A1, is chosen as signal source. It is muchmore vulnerable than PV nodes such as bus-59. With the sameprocedures of the former case studies, Pmax Bus-117 and theκMSR–t curve for the overall system are depicted by Table V,Fig. 15, and Fig. 16, respectively. Under group-work mode,each regional center, just like the global one, calculates MSRwith its own data, such as Fig. 9a for A1, and 9b for A2.

When the data split-window is not big, just as A1 whichonly has 11 nodes (i.e. N=11), the signal is still able todetected, but the curve is not smooth. Thus, some regions arecombined to smooth the κMSR –t curve—A1&A2 in Fig. 10a,A3&A5 in 10b, and A4&A6 in 10c. In the appendices, the rawdata of V and their low-dimensional visualization for all PQbuses in A3&A5 around time ts =301 s, when the step-changeof PBus-117 happened, are shown as Fig. 17a, 17b.

Fig. 16 shows that Pmax Bus-117 =272.5 MW at ts =945 s.This point is observed by all the regional centers as shown inFig. 9 and 10. Meanwhile, the signal occurring just at timet=301 s is also able to be detected in the system. Accordingto κMSR of Fig. 10a, 10b and 10c, it is found that to responseto the signal, the distribution of ∆κMSR in distributed regions isjust like the contour line and the A1&A2 is the mountaintop.As a result, we conjecture that the signal is generated at

100 200 300 400 500 600 700 800 900 10000

0.5

1

Sample Time(s)

MS

R

Tlength=400 N=118 ||Ver=319

X: 955 Y: 0.8396

100 200 300 400 500 600 700 800 900 1000

1000

2000

3000

Pow

er D

eman

d(M

W)

920 930 940 950 960 970 980 990 10000.2

0.3

0.4

0.5 X: 954 Y: 0.4157

X: 955Y: 0.3447

InnerVMVM+Aload59

X: 955 Y: 2555

Pmax Bus-59 =2555 MW

(a) γAcc =0, γMul =0

100 200 300 400 500 600 700 800 900 10000

0.5

1

Sample Time(s)

MS

R

Tlength=400 N=118 ||Ver=320

X: 947 Y: 0.8396

100 200 300 400 500 600 700 800 900 1000

1000

2000X: 947 Y: 2548

Pow

er D

eman

d(M

W)

920 930 940 950 960 970 980 990 10000.2

0.3

0.4

0.5

X: 947 Y: 0.3

X: 946 Y: 0.5258

InnerVMVM+Aload59

3000

Pmax Bus-59 =2548 MW

(b) γAcc =1, γMul =0.02

100 200 300 400 500 600 700 800 900 10000

0.5

1

Sample Time(s)

MS

R

Tlength=400 N=118 ||Ver=321

X: 919 Y: 0.8396

100 200 300 400 500 600 700 800 900 1000

1000

2000

3000

X: 919 Y: 2521

Pow

er D

eman

d(M

W)

900 910 920 930 940 950 960 970 9800.3

0.4

0.5

0.6

X: 918 Y:0.6441

X: 919 Y: 0.5133

InnerVMVM+Aload59

Pmax Bus-59 =2526 MW

(c) γAcc =5, γMul =0.1

Fig. 8: Critical Power Point Estimation Taking Account ofGrid Fluctuations

A1&A2 rather than A3&A5 or A4&A6. The events set inTable V validates the conjecture.

Interchanging MSRs among distributed utilities, some use-ful analyses are able to be extracted. It is sensitive to anomalydetection, even with imperceptible different raw data just asshown in Fig. 17—the raw voltage amplitude dataset ΩV ofA3&A5 changes too little to be utilized in low-dimension (e.g.,Mean—one-dimensional statistic, Variance—two-dimensionalstatistic). This case study also validates the architecture iscompatible with block calculation only using regional smalldatabase; in this way, the architecture decouples the intercon-nect systems from statistical parameters perspective naturally,and is practical for real large-scale distributed systems.

Page 9: A Big Data Architecture Design for Smart Grids Based … A Big Data Architecture Design for Smart Grids Based on Random Matrix Theory Xing He, Qian Ai, Member, IEEE, Robert C. Qiu,

9

100 200 300 400 500 600 700 800 900 1000

0.2

0.4

0.6

0.8

1

X: 420Y: 0.5808

Sample Time(s)

MS

R

Tlength=240 N=44 ||Ver=34

X: 944Y: 0.5662

X: 945Y: 0.5275

X: 240Y: 0.9037

InnerVM

(a) ANode=[A1,A2]100 200 300 400 500 600 700 800 900 1000

0.2

0.4

0.6

0.8

1

X: 407Y: 0.7221

Sample Time(s)

MS

R

Tlength=240 N=44 ||Ver=35

X: 240Y: 0.9037

X: 944 Y: 0.7554

X: 945Y: 0.708

InnerVM

(b) ANode=[A3,A5]100 200 300 400 500 600 700 800 900 1000

0.4

0.6

0.8

1

X: 967Y: 0.8296

Sample Time(s)

MS

R

Tlength=240 N=30 ||Ver=36

X: 968Y: 0.5798

240 260 280 300 3200.8

0.85

0.9

InnerVM

(c) ANode=[A4,A6]

Fig. 10: MSR Series of Union Regions

100 200 300 400 500 600 700 800 900 1000

0.2

0.4

0.6

0.8

X: 951 Y: 0.9768

Sample Time(s)

MSR

Tlength=240 N=11 ||Ver=32

X: 420 Y: 0.7961

X: 540 Y: 0.9672

X: 300 Y: 0.9703

X: 944 Y: 0.7202

X: 945Y: 0.6699

100 200 300 400 500 600 700 800 900 1000

0

100

200X: 945Y: 272.5

Pow

er D

eman

d(M

W)

InnerVMload117

(a) ANode=A1

100 200 300 400 500 600 700 800 900 1000

0.2

0.4

0.6

0.8X: 240 Y: 0.9287

Sample Time(s)

MS

R

Tlength=240 N=33 ||Ver=33

X: 944 Y: 0.5812

X: 540 Y: 0.7815

X: 420 Y: 0.5815

X: 300 Y: 0.8011

X: 945 Y: 0.5357

100 200 300 400 500 600 700 800 900 1000

0

100

200

Pow

er D

eman

d(M

W)

InnerVMload117

(b) ANode=A2

Fig. 9: MSR Series of Single Region

E. Case 5: Fault Detection for Active Distribution Network

This case shows the application of the designed architecturein another field of power systems. Due to the integration andvariation of renewable generators and energy storage units,faults and disturbances detection has become increasinglycomplicated in active distribution networks (ADN) [42]. TableVI sets the events series, and Fig. 13 illustrates the fault mode.

Fig. 11 indicates that some signals are detected by theκMSR –t curve. Especially, it is conjectured that the most influ-ent events are happening around t=3, 000 ms and t=13, 000ms, which means that the three-phase and line-to-line shortcircuit have more influence than the single-phase ones. Thiscase shows that the architecture is also compatible with otherfields in power systems.

0 5000 10000 150000.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

X: 3004Y: 0.4537

Sample Time(ms)

MS

R

Tlength=200 N=45

X: 9049Y: 0.4623

X: 3099Y: 0.3202

X: 8186Y: 0.8803

InnerVM

2800 2900 3000 31000.2

0.3

0.4

0.5X: 3004Y: 0.4537

X: 3005Y: 0.433 X: 3099

Y: 0.3202

8600 8800 9000 92000.2

0.3

0.4

0.5

X: 9000Y: 0.4616

X: 9049Y: 0.4623

Fig. 11: MSR Series for Failure Events

V. CONCLUSION

This paper aims to apply big data technology into smartgrids. It proposes an architecture with two independent pro-cedures, as a data-driven solution, to conduct anomaly de-tections. In addition, moving split-window technology wasintroduced for real-time analysis, and a new statistic meanspectral radius was proposed to indicate the data correlations,as well as to clarify the parameter interchanged among theutilities.

The algorithm of this architecture is based on random matrixtheory (RMT); it is a fixed objective procedure, which is easyin logic and fast in speed. The non-asymptotic frameworkof RMT enables us to conduct high-dimensional analysis forreal systems, even with relatively moderate datasets. It alsoprovides us a natural way to decouple the interconnectedsystems from data perspective. The group-work model ofutilities in the systems, meanwhile, makes some data-drivenfunctions possible, such as distributed calculation and com-parative analysis.

Big data technology does not conflict with classical analysisor pretreatment. Instead, combination between block calcula-tion and traditional zone-dividing structure realizes compar-ative analysis—it is sensitive way to detect the event andlocate the source in the grid network, even with impercep-tible different measured/simulated data. Besides, the sparserepresentation of a random vector is carried out with the

Page 10: A Big Data Architecture Design for Smart Grids Based … A Big Data Architecture Design for Smart Grids Based on Random Matrix Theory Xing He, Qian Ai, Member, IEEE, Robert C. Qiu,

10

random matrix theory [43, 44]. An incomplete matrix datarepresentation of data may be considered: 1) random bandmatrices [45]; 2) low rank matrix recovery [11]. All thesetopics are considered through the unified framework of randommatrix theory [11].

It is a long way and big topic to apply big data in powersystems, especially for a specific field in real systems. Somedesigned methods are rough in this paper, e.g., to estimatecritical power point in case 3, to detect fault in power systemsin case 5, as well as some descriptions, e.g., group-workmode mentioned in section III, and new method for vulnerablenode identification mentioned in case 2. But these specialmethods/applications all have one thing in common: they areall based on the proposed architecture, and driven only bydata of voltage amplitude, which are the most basic in thegrids. With work ongoing, including 1) designing 3D Power-map by combining high-dimensional analysis and visualization[46], and 2) conducting event/fault detection for real systems[47, 48], we can say with confidence that the designed archi-tecture is practical for real world; and the keys are the RMTwith non-asymptotic framework, high-dimensional algorithm,block calculation, and augmented matrix. This paper, as aninitial one, just presents the universal architecture; for specialfields, it aims to raises many open questions than actuallyanswers ones. One wonders if this new direction will be far-reaching in years to come toward the age of big data.

APPENDIX ATHE GRID NETWORK STRUCTURES OF G1, G2 AND G3

1008 中 国 电 机 工 程 学 报 第 34卷

域”较小,电网在有限范围内演化生长,形成一个

个小电网。随着负荷不断增长,电网中逐渐出现更

高电压等级的变电站,这些节点一方面是高一级电

压等级电网中的节点,另一方面是所在的低一级电

压等级的“局域小电网”的集散节点,进而逐渐形

成分区级联的电网。

4 算例分析

4.1 初始状态 给定电网生长的地域规模 A[N1, N2],定义网

络节点 i的坐标为 vi(xi, yi),其中 xi[0,N1],yi[0,N2]

且 xi, yi为整数。进一步给定电网初始状态的能源分

布、负荷分布和电网拓扑。初始电网拓扑可以非常

简单,如两点一线的简单链式电网,或尚未修建任

何发电厂及变电站。

本 文 算 例 给 定 电 网 生 长 的 地 域 规 模

A[90,100],单位公里。进一步给定电网初始状态

的能源分布、负荷分布和电网拓扑。电网从无到有

开始生长演化。在以下的电网演化结果示意图中,

横、纵坐标表示地理坐标,按照图形、由大到

小和连线由粗到细的顺序,对应的发电厂、变电站

和输电线路依次属于 500、220、110和 10 kV 四个

电压等级。

4.2 演化结果 1——第一代电网再现 第一代小型电网如图 3所示,演化结果显示,

第一代电网中,发电厂单机容量一般在 100~

200 MW以下,输电和配电电压在 220 kV以下。这

是因为:

1)从需求的角度看,在电网建设初期,负荷

需求较分散且不大,在较小范围内新建小电站即可100

80

60

40

20

0 0 20 40 60 80

/

/

图 3 第一代电网生长演化仿真结果图

Fig. 3 Simulation result of the first generation grid’s evolution

满足供电需求。

2)从能源供给和控制能力发展水平的角度看,

此时电网技术尚未达到大装机容量发电厂和特高

压远距离输电所需的技术水平,因而,发电厂单机

装机容量较小,电能传输距离也较短。

3)从拓扑结构看,第 1 代电网呈现典型的链

式结构,由多个小电网构成,而这些小电网之间联

系较弱,即意味着此时电网尚未形成复杂网络。

综上,在需求驱动和技术制约的相互作用下,

第一代电网发展成为以小机组、低电压为特征的小

型电网,即发电厂及变电站围绕负荷中心建设,形

成一个个电压等级较低的小型电网,各小型电网自

成局部网络,之间有为数不多的输电线路进行相连。

4.3 演化结果 2——第二代电网再现 在第一代电网的基础上进一步演化,生长出第

二代电网,结果如图 4所示。演化结果显示,第二

代电网中,大容量机组出现并逐渐占据主导地位,

单机容量在 300~1 000 MW以上,由大容量机组发

电供给的负荷量占全网负荷总量的比重越来越大;

电网电压等级提高,主干电网的电压等级在 330 kV

及以上,超高压、特高压输电线相继产生并逐渐覆

盖全网。这是因为:

1)随着负荷增长,近距离的小电厂不能满足

供电需求,迫切需要大装机容量发电厂,大容量、

高效能机组不仅可以提供充足电能,还有利于降低

造价,节约能源,加快电力建设速度。

(a) G1

第 7期 梅生伟等:三代电网演化模型及特性分析 1009

100

80

60

40

20

0 0 20 40 60 80

/

/

图 4 第二代电网生长演化仿真结果

Fig. 4 Simulation result of the second generation grid’s evolution

2)此阶段以水利、煤炭为主的电能资源中心

往往与负荷中心相聚较远,为了合理利用能源、加

强环境保护、保障电力工业的可持续发展,迫切需

要远距离输电技术来解决电能输送问题。

3)为了电网安全稳定运行,电网需留有一定

量的备用容量,如果利用各地区用电的非同时性进

行负荷调整,则可减少备用容量和装机容量,提高

电网运行的经济性,这就需要实现电网互联。

4)在安全性和经济性的双重刺激下,大容量发

电机组和超高压特高压输电技术出现并日趋成熟,

从而从控制能力角度为电网互联提供了技术保障。

5)从拓扑结构看,第二代电网属于复杂网络,

呈现典型的网状结构,较低电压等级电网作为模

块,由主干网相连,形成等级网络。计算第二代电

网结构特性统计指标,如表 1所示。结果显示,第

二代电网平均路径长度较小,具有小世界网络属

性,且具有较明显的聚类效应,文献[17]的研究结

果即是第二代电网的一个典型例证。

6)从电网升级换代的角度去看,在第一代电

网时期形成的节点数较少、结构较为简单的小网络

模块,其电能供给与运输能力有限。随着负荷需求

增长,各模块不断扩张,且中心节点日渐显著。为

了保障电网运行的安全性,提高运行的经济性,各

模块通过各自的中心节点进行相连,即形成同时存

在模块性、局部聚类和无标度拓扑特性的等级网

络,即形成第二代分区级联式电网。

综上,电网逐渐演化成大容量、高效能机组作

为主力电源,特高压/超高压输电线路为主干网的第

二代大型分区级联式电网。第二代电网,能够实现

大范围资源优化配置安全可靠性较高,经济性较强。

4.4 演化结果 3——第三代电网再现 第三代电网演化结果如图 5所示。在第二代超

高压互联大电网的基础上,电网发展“舍本逐末”,

在低电压等级侧生长演化出分布式发电构成的微

网,最终形成以特高压电网为主干、下游由各分布

式发电构成的智能电网。从拓扑结构看,第三代电

网虽然也是呈现复杂的网络状态,但已无第二代电

网分层分区之特征。可以毫不夸张地讲,第三代电

网是当今复杂网络领域全新的研究对象,对它的演

化规律及动态行为的研究无疑将成为未来电网技

术以及复杂性科学本身的一个学科生长点。

计算第三代电网结构特性统计指标,如表 1所

示。单看第三代电网结构特性统计指标计算结果,

其仍具有小世界网络属性和聚类效应。对比第二代

电网,第三代电网呈现下述特征:

1)平均度减小。这是因为接入分布式电源后

网络中节点连接的概率减小,在拓扑结构上网络连

接变得松散。

2)平均路径长度增大。接入分布式电源后,电

网中大多数节点与新增的分布式电源节点缺少长程

连接,意味着网络中节点间的分离程度有所增加。

3)聚类系数下降。第三代电网中大多数小容

量的分布式电源仅通过一条线路与电网中某个节

点相连,不形成三角形连接,意味着总体上网络中

节点的聚集程度变得松散了。

第三代电网最终发展成为智能电网形态,绝非

(b) G21010 中 国 电 机 工 程 学 报 第 34卷

100

80

60

40

20

0 0 20 40 60 80

/

/

图 5 第三代电网生长演化仿真结果

Fig. 5 Simulation result of the third generation grid’s evolution

表 1 第二、三代电网结构统计特性指标

Tab. 1 Statistical properties of the second and

third generation grid

平均度数 平均路径长度 聚类系数

第二代电网 2.373 3 10.296 0.041 3

第三代电网 2.291 7 11.816 0.039 9

偶然。经济的腾飞带来的能源和环境危机不容小

觑,化石能源的枯竭始终是悬挂在当今世界上方的

达摩克里斯之剑。人类不得不将目光投射到新能源

上。资源密度较大的新能源可集中开发为大容量电

站,如我国甘肃的风电和青海的光伏发电工程。对

于分散的新能源则可采取分布式发电的利用形式。

分布式能源虽然密度低,但无所不在,因此分布式

电源位置灵活、分散,极好地适应了分散电力需求

和资源分布,延缓了输、配电网升级换代所需的巨

额投资,同时与大电网互为备用也使供电可靠性得

以改善。分布式电源尽管优点突出,但本身存在诸

多问题,例如,分布式电源单机接入成本高、控制

困难等。但在当前技术条件下,主要基于现代电力

电子技术的微电网能够实现分布式电源在中低压

配电网络中的大规模分散接入,充分利用分布式发

电单元,最大程度地发挥分布式发电供能系统效

能。目前,大批国内外电力科技工作者投身到微电

网研究领域,并在微电网的硬件研究、建模研究、

微电网对大电网的影响研究以及微电网的控制策

略等方面取得卓著的研究成果。这些成果的推广应

用使得分布式电源及微电网的广泛接入成为现实。

从复杂网络理论和代际理论的角度分析,第二

代电网具有小世界网络上属性,其中的长程连接在

提高输电效率的同时,使得部分与之相关的节点承

担的输电负荷远远高于其他节点,从而成为小世界

电网中的脆弱环节,易造成故障在小世界电网中迅

速蔓延,不利于电网安全稳定运行。应电网生长演

化的“优胜劣汰”的要求,第三代电网的生长过程则

考虑尽量分摊电网中各节点承担的输电负荷,分布

式发电不失为解决此难题的有效办法之一。第三代

电网在低压配网侧生长演化出分布式电源或微网,

改善电网资源配置,使电网在拓扑结构上紧密度有

所下降,更有助于预防大规模故障在小世界电网中

的迅速蔓延,提高电网的运行可靠性。

5 结语

本文从物理层面出发,分析归纳电力网络演化

生长的驱动因素,紧扣能电能供给、控制能力和负

荷需求三个电网生长演化的驱动因素,提出一种电

网演化生长模型,利用所提模型复现了三代电网演

化生长过程,从而验证了三代电网理论的正确性,

从复杂网络角度为“三代电网理论”提供了数学依

据。该模型用于工程实践,不仅对复杂大电网安全

性的研究具有深远意义,有利于提高我国电网的安

全性和可靠性,减少大面积停电给社会带来的灾难

性损失,而且能够大幅提升我国电网规划的统一优

化及规划水平,有利于实现从第二代电网向第三代

电网的过渡,最终实现大型电源与分布式电源相结

合、骨干电网与微电网结合的智能电网格局。

如前所述,三代电网理论本身蕴含了达尔文

(c) G3

Fig. 12: Simulation of network structures. The above twos areG1—Small-scale isolated grids, and G2—Large-scale interconnectedpower grids; and the below one is G3—Smart grids, which havecomplex network structures without clear-cut partitioning.

APPENDIX BEVENT SERIES FOR FIVE CASE STUDIES

TABLE III: Series of Events for Case 1

t [001:550] [551:1100] [1101:1650]PBus-59 0 200 0t [1651:2200] [2201:2500]PBus-59 1500 0

*t: (s), PBus-59: (MW)

TABLE IV: Series of Events for Case 2

t [001:300] [301:600] [601:900]PBus-59 0 200 250t [901:1200] [1201:1500] [1501:1800]PBus-59 800 850 2000t [1801:2100] [2101:2400] [2401:2500]PBus-59 2540 2555 3500

*t: (s), PBus-59: (MW)TABLE V: Series of Events for Case 4

t [001:300] [301:700] [701:1000]PBus-117 0 150 t/2− 200

*t: (s), PBus-117: (MW)TABLE VI: Series of Events for Case 5

t [1000:1500] [3000:3500] [5000:5500]EVENT A→ G AB→ G B→ Gt [7000:7500] [9000:11000] [13000:13500]EVENT C→ G BCS OPEN ABC→ G

*t: (ms)

DG1 DG2

DG3S1 S2

BCSA G

C GABC G AB G

B G

Fig. 13: Fault Model for Case 5

Page 11: A Big Data Architecture Design for Smart Grids Based … A Big Data Architecture Design for Smart Grids Based on Random Matrix Theory Xing He, Qian Ai, Member, IEEE, Robert C. Qiu,

11

APPENDIX C

1 1 2

3

4 11 12 117

14

15 13

16

6 7 5

G

G

G

G

G G

G

G

G G G

G G

G G

G

G

G

G

G

G

G

G

G

G

G

G

G

G

G G

G G G G G

G G

G

G

G

G G

G

G

G

G

G

G

G G

G G

G

19 17

18 30

113

32

31 29 20

21

35

33

34

36

24

22 27

8

9

10 28

114 115

26

25

23

39

40 41 42 53 54 55 56 59

58

51 57

50 48

49

45

46 47

69

38

43

52

68

67 66

65

62

61 64

60

63

73 71

70 72

74

116

79 81 78

82

75 118 76

77 97 96

88 8985

84 83

86

87

90 91

92

93 94 95

102 101 103

111 110 112

109

108

107 105 104

100

98

80 99 106

44 37

PDF 文件使用 "pdfFactory Pro" 试用版本创建 www.fineprint.com.cn

117

59A1

A2

A2

A2

A3

A4

A5

A6

A6

Fig. 14: Partitioning network for IEEE 118-bus system. Thereare six partitions, i.e. A1, A2, A3, A4, A5, and A6.

0 100 200 300 400 600 700 800 900 1000-50

0

50

100

150

300Load ||Ver=031

500

Pow

er D

eman

d(M

W)

Active Power Demand of Load on Bus 117

Sample Time(s)

Fig. 15: Grid Load Change for Case 4: γAcc =1, γMul =0.02

100 200 300 400 500 600 700 800 900 1000

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

X: 945 Y: 0.5883

Sample Time(s)

MS

R

Tlength=240 N=118 ||Ver=31

X: 420 Y: 0.6187

X: 240 Y: 0.713

X: 540 Y: 0.7302

X: 300Y: 0.7436

X: 944 Y: 0.622

100 200 300 400 500 600 700 800 900 1000

0

50

100

150

200

300

X: 945Y: 272.5 P

ower

Dem

and(

MW

)

InnerVMVM+Aload117

250↑

Fig. 16: Global MSR Series(Pmax Bus-117 =272.5 MW)

Bus time 295 296 297 298 299 300 301 302 303 304 305 306 307 308 30939 0.970005 0.969983 0.969848 0.970096 0.969903 0.969995 0.970145 0.970012 0.969999 0.970005 0.970245 0.969933 0.970099 0.96996 0.97020641 0.966754 0.966685 0.966918 0.967009 0.966976 0.966935 0.966676 0.966833 0.966471 0.96676 0.9666 0.966547 0.966563 0.966763 0.96671344 0.984399 0.98382 0.984807 0.98424 0.984683 0.984323 0.980976 0.981477 0.980984 0.979607 0.981137 0.981585 0.980765 0.980326 0.98083745 0.986159 0.985919 0.986475 0.986035 0.9867 0.986232 0.983618 0.984244 0.983821 0.982742 0.984142 0.984361 0.983372 0.983277 0.98346647 1.017287 1.017029 1.0172 1.017073 1.017246 1.017314 1.016328 1.016143 1.016083 1.01625 1.015837 1.016058 1.016139 1.01585 1.01607948 1.020617 1.020854 1.020498 1.020576 1.020711 1.020752 1.020535 1.020788 1.020864 1.020696 1.020589 1.020722 1.020721 1.020486 1.02046850 1.001555 1.001473 1.001716 1.001359 1.00154 1.001021 1.001008 1.001145 1.000924 1.001311 1.001024 1.000831 1.001528 1.000546 1.00123451 0.966875 0.967021 0.967666 0.966543 0.966493 0.966801 0.966852 0.966935 0.967005 0.96647 0.966767 0.967124 0.966838 0.966583 0.96641452 0.956708 0.956717 0.957301 0.956364 0.956208 0.956561 0.956812 0.956951 0.956984 0.956343 0.956366 0.957502 0.956705 0.956349 0.95614353 0.945825 0.94593 0.946299 0.945711 0.945679 0.945927 0.946001 0.945844 0.945985 0.945577 0.945086 0.946315 0.946051 0.945759 0.94559857 0.9711 0.971013 0.971416 0.970994 0.971002 0.970781 0.970845 0.970836 0.970461 0.970899 0.970459 0.970249 0.970715 0.97007 0.97101858 0.959209 0.95909 0.959548 0.958857 0.958812 0.959062 0.95881 0.959101 0.959202 0.958819 0.959251 0.958807 0.959037 0.95876 0.95866368 1.00324 1.00323 1.003206 1.003232 1.003228 1.003243 1.00331 1.003313 1.00331 1.003309 1.003314 1.003313 1.00331 1.003312 1.00331371 0.986899 0.986883 0.986889 0.986923 0.986839 0.986878 0.98675 0.986715 0.986537 0.986658 0.986604 0.986713 0.986726 0.986554 0.98670175 0.967256 0.967232 0.967471 0.967343 0.967142 0.967213 0.966722 0.966731 0.96682 0.966516 0.966431 0.966914 0.966975 0.966566 0.96663978 1.003414 1.003523 1.003464 1.003526 1.003334 1.003337 1.003373 1.003359 1.003399 1.003556 1.003355 1.003497 1.003392 1.00341 1.00350279 1.009228 1.009288 1.009245 1.009251 1.00911 1.009066 1.009132 1.009142 1.009278 1.009384 1.00912 1.009328 1.009163 1.0092 1.00926381 0.99679 0.996753 0.996715 0.996781 0.996739 0.996784 0.996959 0.996914 0.996917 0.996911 0.996939 0.996942 0.996941 0.996898 0.99694582 0.988579 0.988269 0.988548 0.988544 0.988559 0.98849 0.988129 0.988476 0.98843 0.98825 0.988646 0.9886 0.988418 0.988725 0.98893483 0.984489 0.984081 0.984519 0.984369 0.984482 0.984381 0.983894 0.984484 0.984253 0.984027 0.984471 0.984465 0.98424 0.98456 0.98468984 0.979856 0.979474 0.980017 0.979697 0.979877 0.979412 0.979491 0.979964 0.979838 0.979406 0.97987 0.979662 0.979954 0.979511 0.97990686 0.986346 0.986656 0.986421 0.986177 0.986133 0.986261 0.986869 0.986215 0.986793 0.986812 0.986981 0.986868 0.986432 0.986516 0.98613996 0.992125 0.992003 0.992298 0.992142 0.992337 0.992212 0.991713 0.99215 0.99216 0.99215 0.992362 0.992366 0.991981 0.992263 0.99246997 1.011039 1.010851 1.011261 1.010953 1.011123 1.011065 1.010781 1.011048 1.01108 1.010941 1.011274 1.011171 1.01102 1.011111 1.01126698 1.023595 1.023298 1.023731 1.023517 1.023812 1.02356 1.023673 1.023153 1.023715 1.023551 1.02344 1.023858 1.023239 1.0235 1.023679118 0.949278 0.949199 0.949534 0.949275 0.949086 0.949337 0.948943 0.948904 0.949399 0.948898 0.948805 0.949365 0.949212 0.949062 0.948935

(a) Raw Data V (Orange for A3, and Blue for A5)

296 298 300 302 304 306 308 3100.94

0.95

0.96

0.97

0.98

0.99

1

1.01

1.02

1.03

Sample Time(s)

(b) Visualization of the above V (in low dimension)

Fig. 17: Raw Data V and the Visualization around ts =300 s

REFERENCES

[1] Nature, “Big data (specials),” Sep 2008,http://www.nature.com/news/specials/bigdata/index.html.

[2] Science, “Special online collection: Dealing with data,”Feb 2011, http://www.sciencemag.org/site/special/data/.

[3] IBM, “The four vs of big data.” [EB/OL], http://www.ibmbigdatahub.com/infographic/four-vs-big-data.

[4] R. Qiu and P. Antonik, Smart Grid and Big Data. JohnWiley and Sons, 2014.

[5] IBM, “Managing big data for smart grids and smartmeters,” May 2009.

[6] M. Kezunovic, L. Xie, and S. Grijalva, “The role ofbig data in improving power system operation and pro-tection,” in Bulk Power System Dynamics and Control-IX Optimization, Security and Control of the EmergingPower Grid (IREP), 2013 IREP Symposium. IEEE,2013, pp. 1–9.

[7] T. A. Brody, J. Flores, J. B. French, P. Mello, A. Pandey,and S. S. Wong, “Random-matrix physics: spectrumand strength fluctuations,” Reviews of Modern Physics,vol. 53, no. 3, p. 385, 1981.

[8] L. Laloux, P. Cizeau, M. Potters, and J.-P. Bouchaud,“Random matrix theory and financial correlations,” In-ternational Journal of Theoretical and Applied Finance,vol. 3, no. 03, pp. 391–397, 2000.

[9] H. Chen, R. H. Chiang, and V. C. Storey, “Businessintelligence and analytics: From big data to big impact.”MIS quarterly, vol. 36, no. 4, pp. 1165–1188, 2012.

[10] D. Howe, M. Costanzo, P. Fey, T. Gojobori, L. Hannick,

Page 12: A Big Data Architecture Design for Smart Grids Based … A Big Data Architecture Design for Smart Grids Based on Random Matrix Theory Xing He, Qian Ai, Member, IEEE, Robert C. Qiu,

12

W. Hide, D. P. Hill, R. Kania, M. Schaeffer, S. St Pierreet al., “Big data: The future of biocuration,” Nature, vol.455, no. 7209, pp. 47–50, 2008.

[11] R. Qiu and M. Wicks, Cognitive Networked Sensing andBig Data. Springer, 2013.

[12] C. Zhang and R. C. Qiu, “Data modeling with large ran-dom matrices in a cognitive radio network testbed: Ini-tial experimental demonstrations with 70 nodes,” arXivpreprint arXiv:1404.3788, 2014.

[13] X. Li, F. Lin, and R. C. Qiu, “Modeling massive amountof experimental data with large random matrices in a real-time UWB-MIMO system,” CoRR, vol. abs/1404.4078,2014. [Online]. Available: http://arxiv.org/abs/1404.4078

[14] A. Phadke and R. M. de Moraes, “The wide world ofwide-area measurement,” Power and Energy Magazine,IEEE, vol. 6, no. 5, pp. 52–65, 2008.

[15] V. Terzija, G. Valverde, D. Cai, P. Regulski, V. Madani,J. Fitch, S. Skok, M. M. Begovic, and A. Phadke, “Wide-area monitoring, protection, and control of future electricpower networks,” Proceedings of the IEEE, vol. 99, no. 1,pp. 80–93, 2011.

[16] L. Xie, Y. Chen, and H. Liao, “Distributed online moni-toring of quasi-static voltage collapse in multi-area powersystems,” Power Systems, IEEE Transactions on, vol. 27,no. 4, pp. 2271–2279, 2012.

[17] N. Kanao, M. Yamashita, H. Yanagida, M. Mizukami,Y. Hayashi, and J. Matsuki, “Power system harmonicanalysis using state-estimation method for japanese fielddata,” Power Delivery, IEEE Transactions on, vol. 20,no. 2, pp. 970–977, 2005.

[18] D. Alahakoon and X. Yu, “Advanced analytics for har-nessing the power of smart meter big data,” in IntelligentEnergy Systems (IWIES), 2013 IEEE International Work-shop on. IEEE, 2013, pp. 40–45.

[19] W. Xu and J. Yong, “Power disturbance data analytics–new application of power quality monitoring data,” Pro-ceedings of the CSEE, vol. 19, p. 013, 2013.

[20] R. C. Qiu, Z. Hu, H. Li, and M. C. Wicks, Cognitiveradio communication and networking: Principles andpractice. John Wiley & Sons, 2012.

[21] R. C. Qiu, “The foundation of big data: Experi-ments, formulation, and applications,” arXiv preprintarXiv:1412.6570, 2014.

[22] V. A. Marcenko and L. A. Pastur, “Distribution ofeigenvalues for some sets of random matrices,” Sbornik:Mathematics, vol. 1, no. 4, pp. 457–483, 1967.

[23] G. Pan, Q. Shao, and W. Zhou, “Universality of samplecovariance matrices: Clt of the smoothed empirical spec-tral distribution,” arXiv preprint arXiv:1111.5420, 2011.

[24] A. Guionnet, M. Krishnapur, and O. Zeitouni, “Thesingle ring theorem,” arXiv preprint arXiv:0909.2214,2009.

[25] F. Benaych-Georges and J. Rochet, “Outliers inthe single ring theorem,” Probability Theory andRelated Fields, pp. 1–51, 2015. [Online]. Available:http://dx.doi.org/10.1007/s00440-015-0632-x

[26] J. R. Ipsen and M. Kieburg, “Weak commutation rela-tions and eigenvalue statistics for products of rectangular

random matrices,” Physical Review E, vol. 89, no. 3, p.032106, 2014.

[27] X. Zhou, S. Chen, and Z. Lu, “Review and prospect forpower system development and related technologies: aconcept of three-generation power systems,” Proceedingsof the CSEE, vol. 22, pp. 1–11, 2013.

[28] S. Mei, Y. Gong, and F. LIU, “The evolution modelof three generation power systems and characteristicanalysis,” Proceedings of the CSEE, vol. 7, pp. 1003–1012, 2014.

[29] X. He, Q. Ai, Z. Yu, Y. Xu, and J. Zhang, “Power systemevolution and aggregation theory under the view ofpower ecosystem,” Power System Protection and Control,vol. 22, pp. 100–107, 2014.

[30] R. C. Qiu, Z. Hu, Z. Chen, N. Guo, R. Ranganathan,S. Hou, and G. Zheng, “Cognitive radio network forthe smart grid: experimental system architecture, controlalgorithms, security, and microgrid testbed,” Smart Grid,IEEE Transactions on, vol. 2, no. 4, pp. 724–740, 2011.

[31] H. Li, S. Gong, L. Lai, Z. Han, R. C. Qiu, and D. Yang,“Efficient and secure wireless communications for ad-vanced metering infrastructure in smart grids,” SmartGrid, IEEE Transactions on, vol. 3, no. 3, pp. 1540–1551, 2012.

[32] H. Li, L. Lai, and R. C. Qiu, “Scheduling of wirelessmetering for power market pricing in smart grid,” SmartGrid, IEEE Transactions on, vol. 3, no. 4, pp. 1611–1620, 2012.

[33] D. Baigent, M. Adamiak, and R. Mackiewicz, “Iec 61850communication networks and systems in substations: anoverview for users,” The Protection & Control Journal,vol. 8, pp. 61–68, 2009.

[34] R. Yuan, Q. Ai, and X. He, “Research on dynamic loadmodelling based on power quality monitoring system,”Generation, Transmission & Distribution, IET, vol. 7,no. 1, pp. 46–51, 2013.

[35] Q. Ai, X. Wang, and X. He, “The impact of large-scaledistributed generation on power grid and microgrids,”Renewable Energy, vol. 62, pp. 417–423, 2014.

[36] Z. Hong, D. Zhao, C. Gu, F. Li, and B. Wang, “Economicoptimization of smart distribution networks consideringreal-time pricing,” Journal of Modern Power Systems andClean Energy, vol. 2, no. 4, pp. 350–356, 2014.

[37] Y. Ji, “Multi-agent system based control of virtual powerplant and its application in smart grid,” Master’s thesis,Shanghai Jiaotong University, 2011.

[38] X. He, Q. Ai, P. Yuan, and X. Wang, “The researchon coordinated operation and cluster management formulti-microgrids,” in Sustainable Power Generation andSupply (SUPERGEN 2012), International Conference on.IET, 2012, pp. 1–3.

[39] X. Ni, Q. Ruan, S. Mei, and G. He, “A new networkpartitioning algorithm based on complex network theoryand its application in shanghai power grid,” Power systemtechnology, vol. 9, pp. 6–12, 2007.

[40] R. Zimmerman, C. Murillo-Sanchez, and D. Gan, “Mat-power users manual, version 4.1,” Power Systems Engi-neering Research Center, 2011.

Page 13: A Big Data Architecture Design for Smart Grids Based … A Big Data Architecture Design for Smart Grids Based on Random Matrix Theory Xing He, Qian Ai, Member, IEEE, Robert C. Qiu,

13

[41] Y. Zhao, Y. An, and Q. Ai, “Research on size andlocation of distributed generation with vulnerable nodeidentification in the active distribution network,” IETGeneration, Transmission & Distribution, vol. 8, no. 11,pp. 1801–1809, 2014.

[42] W. Huang, T. Nengling, X. Zheng, C. Fan, X. Yang, andB. J. Kirby, “An impedance protection scheme for feedersof active distribution networks,” Power Delivery, IEEETransactions on, vol. 29, no. 4, pp. 1591–1602, 2014.

[43] J. Xu, G. Yang, Y. Yin, H. Man, and H. He,“Sparse-representation-based classification withstructure-preserving dimension reduction,” CognitiveComputation, vol. 6, no. 3, pp. 608–621, 2014.

[44] G. E. Pfander, H. Rauhut, and J. Tanner, “Identificationof matrices having a sparse representation,” Signal Pro-cessing, IEEE Transactions on, vol. 56, no. 11, pp. 5376–5388, 2008.

[45] I. Jana, K. Saha, and A. Soshnikov, “Fluctuations oflinear eigenvalue statistics of random band matrices,”arXiv preprint arXiv:1412.2445, 2014.

[46] X. He, Q. Ai, J. Ni, L. Piao, Y. Xu, X. Xu et al.,“3d power-map for smart grids—an integration of high-dimensional analysis and visualization,” arXiv preprintarXiv:1503.00463, 2015.

[47] X. He, Q. Ai, R. C. Qiu, W. Huang, and J. Long,“An unsupervised learning method for early eventdetection in smart grid with big data,” arXiv preprintarXiv:1502.00060, 2015. [Online]. Available: http://arxiv.org/pdf/1502.00060.pdf

[48] Y. Cao, L. Cai, C. Qiu, J. Gu, X. He, Q. Ai, andZ. Jin, “A random matrix theoretical approach to earlyevent detection using experimental data,” arXiv preprintarXiv:1503.08445, 2015.