
Big Data Analytics for Modelling the Impact of Wind Power … · 2016-09-19 · Big Data Analytics for Modelling the Impact of Wind Power Generation on Competitive Electricity Market


Big Data Analytics for Modelling the Impact of Wind Power Generation on Competitive Electricity Market Prices

Payam Zamani-Dehkordi, Hamidreza Zareipour, Logan Rakai, and William D. Rosehart

Department of Electrical and Computer Engineering
University of Calgary, Calgary, AB, Canada

[email protected]; [email protected]; [email protected]; [email protected]

Abstract—There is vast potential value stored in the massive amounts of data produced each day in an electricity market. However, most of this value is not realized because of the challenges in efficiently and intelligently processing and analyzing large volumes of data. These are the challenges of Big Data. Modern power systems produce Big Data, and understanding it better can lead to many business advantages. In this paper, a data-driven approach is proposed for analyzing the effect of wind generation on the wholesale electricity price. It is of interest to know how the electricity price is affected by a higher level of wind penetration in a market. A model representing the quantitative effect of wind generation on electricity price would offer useful information to different sectors of the electricity market, from generators to consumers. The method is applied to the Alberta market as a case study. A massive database is built from the publicly available data of the Alberta Electric System Operator (AESO). The impact of each MWh of wind generation on the price of electricity is assessed. Results show that increased wind generation reduces wholesale market prices by a small but economically important amount. This impact is not constant and depends on the operating condition of the electricity market.

Index Terms—Big Data, Electricity Market, Wind Generation.

I. INTRODUCTION

Big Data is characterized by large volumes of data coming from a variety of sources and being produced at high velocities. These characteristics are present in several facets of the power industry, such as the fine-granularity data collected on individual consumer energy use, real-time data collection in smart grids, and data recording generation, load, bids, and other elements of a competitive electricity market. Efficiently processing, analyzing, and gaining new insights from the large amounts of available data is the domain of Big Data analytics. An important category of Big Data analytics is predictive analytics, which utilizes a variety of statistical, modeling, machine learning, and data mining techniques to study historical and recent data. The work described in this paper develops a predictive analytics procedure for understanding the impact of wind power generation in large, competitive electricity markets.

There were more than 200,000 wind turbines operating across the globe, with a total capacity of 369 GW, at the end of 2014 [1]. Worldwide wind power capacity is expected to grow to around 630 GW by the end of 2019 [2]. Growth in wind power generation has not been limited to vertically integrated power utilities, where generation investments are typically driven by government policies. Wind power generation has also grown significantly in competitive electricity markets, where generation expansions are decided by private investors. For example, by the end of 2014, there was more than 1,400 MW of wind power capacity operating in Alberta's competitive electricity market in Canada. In the Texas interconnection in the United States, the total installed capacity of wind power generation was more than 12,000 MW by the end of December 2014. Similarly, in the National Electricity Market of Australia, there were 1,866 operating turbines across 71 wind farms with a total capacity of 3,806 MW. In Europe, there was almost 128.8 GW of installed wind generation by the end of 2014.

The fundamental difference between vertically integrated power systems and competitive electricity markets is that in the former, electricity tariffs are predetermined by governments. In the latter, however, the price of electricity is determined competitively, depending on supply and demand conditions in the market. While understanding the environmental benefits of wind power generation is straightforward, the economic impacts of large-scale wind power integration in a competitive market are more complicated. Adding relatively low-cost wind power to the supply side puts downward pressure on electricity prices [3]–[17]. While lower prices are generally good news for the consumer in the short term, they negatively impact the economics of conventional generators, which in turn would impact the adequacy and reliability of supply in the long term [11].

Understanding and quantifying the impact of large-scale wind generation integration on competitive electricity market prices is of interest to market participants, system operators, and regulators. The impact of privileged wind power generation on electricity market prices is surveyed for the cases of Germany [3], [4], [10], [11], Spain [5], Texas [13]–[15], Belgium [9], and Britain [6]. The impact of wind generation control strategies, penetration level, and installation location on electricity market prices is analyzed for a simple study case in [8]. The value of wind power integration resulting from lower electricity prices in the PJM interconnection [16] and the Spanish electricity market [17] was quantified to help the regulator determine subsidies to wind power generators.

The existing literature on the price impact of large-scale wind power integration is based on either modeling system supply curves [3]–[6], simulating market operation [7]–[11], or correlating price variations with wind power fluctuations through econometric models [12]–[17]. The first and second groups of models require a wide set of information that may not be available in the public domain. The third group, however, relies on market data that are generally available to the public. In all of the previous studies, the objective has been to determine the overall impact of wind power on electricity market prices without focusing on specific market conditions. With the advances in computing and Big Data, it is now possible to leverage the large volume of market data to gain insights into the impact under different market conditions.

In the present paper, a data-driven method is proposed to estimate the relationship between wind generation level and wholesale electricity market prices. Similar to the works presented in [12]–[17], the proposed method is based on publicly available data. A massive dataset is prepared, the volume of which depends on the length of the studied time period, the spacing between instances (the sampling rate), and the number of included features. Dimensionality reduction is a commonly used step in machine learning, especially when dealing with a high-dimensional feature space: the original feature space is mapped onto a new space of reduced dimensionality. Dimensionality reduction is often performed by selecting a subset of the original dimensions [18]. In the second step, a feature selection tool is employed to reduce the volume of the dataset and to accelerate the data mining procedure. In the next step, we cluster market operating conditions into different regimes and build econometric models for each regime individually. The main contribution of this paper is that, by breaking the available data into separate clusters, the price impact of wind power generation is tied to market conditions; i.e., while wind power availability may not necessarily impact prices under certain conditions, it may significantly shift prices under others. In addition, clustering substantially decreases the calculation time and helps to manage the Big Data, since the volume of data and the number of instances in each cluster are lower than in the total dataset and clusters can be processed in parallel [19].

The rest of this paper is organized as follows. The related literature is reviewed in Section II, and the proposed data mining methodology is discussed in Section III. Numerical results are presented in Section IV, where the quantified impact of wind generation on electricity price is estimated. Finally, the concluding remarks are provided in the final section.

II. LITERATURE REVIEW

The previous literature on quantifying the impact of wind power generation on electricity market prices can be categorized into three groups. In the first group, referred to as structural analysis, the price impact of wind generation is quantified based on the Merit-Order Effect [3]–[6], i.e., the reduction in electricity price caused by shifting the supply curve. In these studies, a perfectly competitive market structure is generally assumed, and market supply curves are built based on the marginal costs of conventional generators.

In the second group, the operation of electricity markets is formulated as an optimization problem and wind power is integrated into the operation model [7]–[11]. In a perfectly competitive market environment, the optimization problem is solved to simulate system operation chronologically so as to minimize the total cost of supplying the load. Wind is considered as negative load, and the studies are conducted for a range of wind power penetration levels. In these studies, the strategic behavior of conventional generators is ignored.

In the third group, unlike the previous two, econometric methods are used to relate actual observations of electricity prices to wind power generation. A rigorous statistical theory is typically applied to determine the effect of the independent variable, i.e., wind power, on the dependent variable, i.e., the electricity price. In [12]–[15], a parametric model is postulated and estimated to analyze the impact of wind generation on the wholesale electricity prices of the Pacific Northwest [12] and the Texas market [13]–[15]. The main disadvantage of parametric estimation is that a known mathematical function representing the process must be established in advance. In [16], [17], nonparametric techniques are used to find the impact of wind power on electricity prices. These studies [12]–[17] normally present an overall average impact of wind generation on electricity prices.

None of the existing methods reviewed above considers different operating regimes of the market in its model. In other words, the previous studies do not provide a detailed view of how the price impact depends on market conditions. The contribution of the methodology proposed in the current paper is that it integrates market operating regimes into the price impact assessment of wind power generation.

III. PROPOSED METHODOLOGY

A data-driven method is suggested to quantify the impact of wind generation level on electricity prices. Briefly, the proposed methodology can be presented as the following steps:

1) Data Preparation: Select an initial set of input variables that are considered to influence electricity prices. In other words, these variables are perceived to explain price fluctuations in the market.

2) Feature Selection: Narrow down the initial set of variables into a smaller set that contains the most valuable information for modeling. This step, referred to as feature selection, aims at building optimal models by removing redundant information and reducing the dimensionality of the data set.

3) Model Building: Fit a model to the selected features such that variations in price are related to variations in wind power and other inputs. This model represents a price-formation process in the electricity market based on the selected features.

4) Anomaly Detection: Detect anomalies in the price-wind relationship built in the previous step and remove them from the dataset. The detected anomalies represent the instances for which the electricity price is affected by factors not included in the model, i.e., unknown factors.

5) Clustering: Cluster the dataset, after anomalies are removed, into similar operating regimes. Each regime is defined by the values of the selected features, which in turn represent market conditions.

6) Similarity Detection: For each data point in each cluster, identify the instances for which all explanatory features other than wind are similar. For those instances, determine the difference in wind level versus the difference in price level.

7) Price Impact Assessment: Fit a linear relationship to the changes in price levels in terms of the changes in wind levels, for the instances identified in the previous step.

Each of the above steps is further explained in the following subsections.

A. Data Preparation

In this step, a set of market data that is deemed relevant to explain the fluctuations in electricity market prices needs to be formed. Exogenous explanatory variables such as load, day and hour indexes, temperature, transmission constraints, load levels of neighboring systems [20], [21], variants of reserve margin [20], [22], and fuel prices [12]–[14] have been considered in previous studies to model electricity prices. The data set also needs to include wind power generation data and electricity prices as the two key variables in the study. The initial set of considered features is defined as follows:

x = [x1, x2, ..., xn], (1)

where n is the number of features included in the dataset.

B. Feature Selection

Feature selection aims to select and rank the relevant and informative features. It is also used for general data reduction to improve computational speed and model efficiency. Several feature selection techniques are available, such as Relief [23], Mutual Information [24], and kernels [25]. Correlation-based Feature Selection (CFS) and Mutual Information [24] are used in this study to find the final subset of features for the rest of the data mining tasks.

1) Mutual Information (MI): MI, as an efficient non-linear feature selection algorithm, has been used extensively in the previous literature [26], [27]. As an information metric, MI measures the mutual dependence of two random variables. The information shared by two random variables, defined as the mutual information MI(x, y) between them, is given by:

$$\mathrm{MI}(x, y) = \iint p(x, y)\,\log\!\left(\frac{p(x, y)}{p(x)\,p(y)}\right) dx\,dy \tag{2}$$

where p(x) and p(y) are the probability distributions of variables x and y, and p(x, y) is the joint probability distribution of the two variables. If x and y are independent, then MI(x, y) is zero. When mutual information is used for feature selection, a good candidate feature is expected to have a large mutual information value with the target class but a small mutual information value with the features already selected.
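As an illustration of Eq. (2), the following minimal sketch estimates MI from samples by replacing the densities with histogram frequencies. The equal-width binning, the number of bins, and the function name mutual_information are assumptions made for this example; the paper does not state which estimator was used.

```python
import math
from collections import Counter


def mutual_information(xs, ys, bins=4):
    """Plug-in (histogram) estimate of MI(x, y) in nats.

    Assumption: equal-width bins stand in for the densities of Eq. (2).
    """

    def discretize(values):
        lo, hi = min(values), max(values)
        width = (hi - lo) / bins or 1.0  # avoid zero width for constant input
        return [min(int((v - lo) / width), bins - 1) for v in values]

    bx, by = discretize(xs), discretize(ys)
    n = len(xs)
    joint = Counter(zip(bx, by))       # counts for p(x, y)
    px, py = Counter(bx), Counter(by)  # counts for p(x) and p(y)
    mi = 0.0
    for (i, j), count in joint.items():
        p_xy = count / n
        mi += p_xy * math.log(p_xy / ((px[i] / n) * (py[j] / n)))
    return mi
```

With a feature that is identical to the target, the estimate equals the entropy of the binned variable; with a feature whose bins are spread evenly across the target's bins, it is zero.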

2) Correlation-based Feature Selection (CFS): The CFS method is reported to be fast, effective, and simple to implement [28]. In [29], CFS is compared with other feature selection methods for electricity price classification, and it outperformed the other methods studied. If the inter-correlation between the candidate features and their correlation with the target variable are known, Pearson's correlation coefficient is used as a criterion to measure the correlation between a subset of k features and the target variable:

$$r_{zc} = \frac{k\,r_{zi}}{\sqrt{k + k(k-1)\,r_{ii}}} \tag{3}$$

where r_zi is the average correlation between the features and the target variable, and r_ii is the average inter-correlation between features. Equation (3) is used as the merit for deciding on the final set of features for the rest of the data mining task.
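The merit in Eq. (3) can be evaluated directly from sample correlations. The sketch below is an illustration only: the helper names pearson and cfs_merit are hypothetical, and taking absolute values of the correlations is an assumption (a common convention) that the text does not state.

```python
import math
from itertools import combinations


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def cfs_merit(features, target):
    """Merit r_zc of Eq. (3) for a subset of k feature columns."""
    k = len(features)
    # r_zi: average feature-target correlation (absolute value is an assumption)
    r_zi = sum(abs(pearson(f, target)) for f in features) / k
    # r_ii: average feature-feature correlation, zero when k = 1
    pairs = list(combinations(features, 2))
    r_ii = sum(abs(pearson(f, g)) for f, g in pairs) / len(pairs) if pairs else 0.0
    return k * r_zi / math.sqrt(k + k * (k - 1) * r_ii)
```

The denominator penalizes redundancy: adding a copy of an already selected feature leaves the merit unchanged, and adding a feature that is uncorrelated with the target lowers it.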

The following final subset of features is selected by the feature selection tools:

x = [x1, x2, ..., xm], (4)

where m ≤ n is the number of selected features. If there are N instances in the data set, then the ith instance, i ∈ {1, 2, ..., N}, is denoted by:

di = (xi, yi) (5)

xi = [x1,i, x2,i, ..., xm,i] (6)

where yi is the value of the target variable, which is the electricity price in our case, at instance i.

C. Model Building

The objective of this step is to relate the target variable to the features chosen in the feature selection step. The purpose is to obtain a proper estimate of the conditional expectation of the target variable given the level of the features. Several parametric and non-parametric approaches exist for building a model that relates one target variable to a set of explanatory variables. The main disadvantage of parametric models is that a mathematical formulation representing the relationship between the target variable and the features must be assumed. In the present study, we intend to model the complicated nature of price formation based on the features, so establishing a parametric model with a predefined structure is challenging. We therefore use Robust Locally Weighted Regression (RLWR) as a non-parametric model. This method has been widely used in the literature to infer the conditional expectation of a target variable based on the level of a set of features [30]. An extended, multi-dimensional RLWR method is used in this paper to find the conditional expectation of the target variable when the number of features is more than one. The algorithm can be defined by the following sequence of operations [31]:

Step 1: For the ith instance, fit a polynomial of order d in the neighborhood of xi and find the corresponding vector of coefficients, Φi, that minimizes the weighted sum of squared estimation errors:

$$E_i = \sum_{j=1}^{N} W_{ij}\,\left(y_j - \Phi_i U_j\right)^2 \tag{7}$$

where Uj is a polynomial function of degree d of the input vector at instance j, xj. Here d = 2 was chosen after trials with d ∈ {1, 2, 3}. The weights are calculated as follows:

$$W_{ij} = W\!\left(|x_j - x_i| / h_i\right) \tag{8}$$

where hi is the distance from xi to its rth nearest neighbor, |xj − xi| is the distance from xi to xj, and W(x) is a weight function with the following characteristics:

• W(x) > 0 for |x| < 1

• W(x) = W(−x)

• W(x) is a non-increasing function for x ≥ 0

• W(x) = 0 for |x| ≥ 1

Step 2: Compute the residuals εj = yj − ŷj, where ŷj is the fitted value at instance j, and let s be the median of the |εj|. The robustness weights are defined as follows:

$$\delta_j = B\!\left(\varepsilon_j / 6s\right) \tag{9}$$

where B is the bisquare weight function, defined as follows:

$$B(x) = \begin{cases} (1 - x^2)^2 & \text{if } |x| < 1 \\ 0 & \text{if } |x| \ge 1 \end{cases} \tag{10}$$

Step 3: Fit a polynomial as in Step 1, but with the following weights:

$$\{\delta_j W_{ij}\} \tag{11}$$

Step 4: Repeat Steps 2 and 3 until an acceptable accuracy is reached for the weighted sum of estimation errors in Eq. (7).

The final ŷi are the robust locally weighted regression fitted values.
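Steps 1 to 4 can be sketched in a deliberately reduced setting. The code below is not the multi-dimensional, d = 2 model used in the paper: it assumes a single feature, a local line (d = 1), the tricube function as one admissible choice of W, and a fixed number of robustness iterations instead of an accuracy-based stopping rule. All function names are hypothetical.

```python
import statistics


def tricube(u):
    """One admissible W: positive and even on |u| < 1, zero for |u| >= 1."""
    u = abs(u)
    return (1 - u ** 3) ** 3 if u < 1 else 0.0


def bisquare(u):
    """Bisquare function B of Eq. (10)."""
    return (1 - u * u) ** 2 if abs(u) < 1 else 0.0


def weighted_line_fit(xs, ys, ws, x0):
    """Weighted least-squares line through (xs, ys), evaluated at x0."""
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return my + slope * (x0 - mx)


def rlwr(xs, ys, r=5, iterations=2):
    """1-D robust locally weighted regression; requires len(xs) > r."""
    n = len(xs)
    delta = [1.0] * n  # robustness weights, all 1 on the first pass
    fitted = list(ys)
    for _ in range(iterations + 1):
        for i in range(n):
            # h_i: distance to the r-th nearest neighbour (index 0 is x_i itself)
            h = sorted(abs(x - xs[i]) for x in xs)[r] or 1e-12
            ws = [delta[j] * tricube((xs[j] - xs[i]) / h) for j in range(n)]
            if sum(ws) > 0:  # keep the previous fit if every weight vanished
                fitted[i] = weighted_line_fit(xs, ys, ws, xs[i])
        residuals = [y - f for y, f in zip(ys, fitted)]
        s = statistics.median(abs(e) for e in residuals)
        if s < 1e-12:  # (near) perfect fit: nothing to down-weight
            break
        delta = [bisquare(e / (6 * s)) for e in residuals]  # Eq. (9)
    return fitted
```

The first pass corresponds to Step 1, the residual and bisquare computation to Step 2, and every later pass to Step 3 with the combined weights of Eq. (11). An observation whose residual exceeds six times the median absolute residual receives zero weight in the next pass.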

D. Anomaly Detection

An anomaly detection algorithm searches the dataset for instances that do not follow the patterns captured by a fitted model. The detected instances are those where the target variable is driven by factors other than those included in the model, i.e., factors unknown to the model. An anomaly is defined as an instance where the target variable deviates too much from other observations with similar feature levels. Outlier detection has been extensively studied in the literature, and numerous approaches have been developed [32]. In the previous step, a model was constructed that calculates the conditional expectation of the target variable for a given level of the features. A "normal" range of the target variable for a given level of the features can then be calculated from the conditional variance of the target variable. In this study, we limit the target variable to the mean plus three times the standard deviation, in line with [33].
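The three-standard-deviation rule can be sketched as follows. Two simplifications are assumed here and are not taken from the paper: a single global standard deviation of the residuals replaces the conditional variance, and the band is applied symmetrically so that both a low and a high boundary exist. The function name flag_anomalies is hypothetical.

```python
import statistics


def flag_anomalies(prices, expected, k=3.0):
    """Flag instances whose price lies more than k standard deviations
    away from the model's expected price.

    Assumption: one global residual standard deviation is used instead of
    a variance conditioned on the feature levels.
    """
    residuals = [p - e for p, e in zip(prices, expected)]
    sigma = statistics.pstdev(residuals)
    return [abs(r) > k * sigma for r in residuals]
```

Flagged instances would then be dropped before clustering, for example with `[d for d, bad in zip(data, flags) if not bad]`.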

E. Clustering

Clustering refers to the grouping of records, observations, or instances into classes of similar objects. A cluster is a collection of records that are similar to each other and different from records in other clusters. A clustering task attempts to segment the entire data set into subgroups, or clusters, such that the similarity of records within a cluster is maximized and the similarity to records outside the cluster is minimized. Each cluster in this way represents an operating regime of the studied process. Clustering improves the calculation speed because the number of instances in each cluster is lower than in the total dataset and different clusters can be processed in parallel.

Several methods could be used for clustering, such as Kernel Principal Component Analysis, Support Vector Clustering, and Fuzzy Clustering [34]. In this study, we employ k-means clustering, which is straightforward and widely used [35]. The IEEE International Conference on Data Mining identified k-means clustering as one of the top 10 algorithms in data mining [35]. The k-means algorithm takes the input parameter k and partitions a set of N instances into k clusters. The training procedure for this algorithm is described in [35].
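For orientation, a minimal k-means sketch on standardized features is given below. The deterministic initialization (k points spaced evenly through the data) and the helper names are assumptions made for this example; the procedure actually used is the one described in [35].

```python
import math


def standardize(rows):
    """Scale every column to zero mean and unit (population) variance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm; returns (labels, centers)."""
    step = len(points) // k
    centers = [list(points[i * step]) for i in range(k)]  # assumed initialization
    labels = None
    for _ in range(max_iter):
        # assignment step: nearest center by squared Euclidean distance
        assign = [min(range(k),
                      key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
                  for p in points]
        if assign == labels:  # converged: assignments no longer change
            break
        labels = assign
        # update step: each center becomes the mean of its members
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers
```

Standardizing first keeps a feature with a large numeric range, such as load, from dominating the distance over a feature with a smaller range, such as wind power.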

F. Similarity Detection and Price Impact Assessment

This part aims to analyze the impact of a specific explanatory variable, say xs, on the target variable, say y. The purpose is to see how variation in the selected feature changes the target variable. To do so, we have implemented a method that finds pairs of instances in the data, say instance p and instance q, for which all of the features except xs are at a similar level. It can then be concluded that the difference in the target variable, y, is likely driven by the variable xs. This procedure can be presented as follows:

1) Instances p and q refer to the data for two similar hours if:

$$|x_{h,p} - x_{h,q}| < \delta_h, \quad \forall h \in \{1, 2, \ldots, m\},\ h \neq s \tag{12}$$

where x_{h,p} is the hth feature at the pth data instance.

2) The effect of feature xs on the target variable y for the two similar instances is calculated as follows:

∆xs,pq = xs,p − xs,q (13)

∆ypq = yp − yq (14)

3) The mean effect of the explanatory variable, xs, on the target variable, y, is calculated with the RLWR method for each cluster separately.
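The pairing in steps 1) and 2) can be sketched as below. The function name similar_pair_differences and the brute-force scan over all pairs are assumptions made for illustration; the scan is quadratic in the number of instances, which is one reason to work on each cluster separately.

```python
from itertools import combinations


def similar_pair_differences(X, y, s, tol):
    """Return (delta_x_s, delta_y) for every pair of similar instances.

    X   : list of feature vectors
    y   : list of target values
    s   : index of the feature under study
    tol : per-feature tolerances delta_h of Eq. (12); tol[s] is ignored
    """
    diffs = []
    for p, q in combinations(range(len(X)), 2):
        similar = all(abs(X[p][h] - X[q][h]) < tol[h]
                      for h in range(len(tol)) if h != s)
        if similar:
            diffs.append((X[p][s] - X[q][s], y[p] - y[q]))
    return diffs
```

For example, with load as feature 0, wind as feature 1, and a 5 MW load tolerance, two hours with loads of 8000 and 8003 form a pair, and their wind and price differences are recorded.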

IV. NUMERICAL RESULTS

The proposed data-driven algorithm is followed step by step as explained in Section III. The numerical results for each step are presented in the following subsections.

TABLE I
FEATURES AND TARGET VARIABLE IN THE DATASET

Features                                               Target Variable
x1      x2            x3             x4                y
Load    Wind Power    Temperature    Gas Price         Electricity Price

A. Data Preparation

The dataset is built from the publicly available data in the AESO database for the period of five years from January of 2012 to December of 2014, with instances sampled at a resolution of one minute. The total volume of the dataset is about 1 GB, and the total number of instances is more than 450,000. The features gathered as the initial set of explanatory variables are presented in Table I. These variables are perceived to influence the target variable, the electricity price.

B. Feature Selection

The best values for the CFS and MI algorithms are achieved when the selected subset of exogenous features includes Load and Wind Power. The Temperature, x3, is not included in the final subset of features because of its high correlation with load. The Gas Price, x4, does not show a strong correlation with the electricity price and is therefore not considered for the rest of the data mining task. The final subset of features used for the rest of the data mining task is {x1, x2}.

C. Model Building

RLWR is used to infer an econometric relationship between the target variable, the electricity price y, and the exogenous features of load, x1, and wind power, x2. The output of the RLWR method is an estimate of the conditional expectation of the electricity price at a given level of the features. The fitted surface, which shows the expected electricity price at each level of load and wind power, is presented in Fig. 1. The figure shows that the impact of wind power on the electricity price is not linear and depends on the level of load. For higher levels of load, the impact of wind power on the electricity price is more significant.

D. Anomaly Detection

The low and high boundaries for normal cases are shown in Fig. 2. Cases that fall outside this space are identified as outliers and are removed from the database for the rest of the analysis. Results show that anomalous hours account for only about 1.45% of the whole data set. These are instances where the electricity price is affected by non-modeled factors, such as the unavailability of power plants or transmission lines due to unscheduled outages [36] or the different bidding strategies adopted by generation companies in various situations [37].


Fig. 1. Expectation of electricity price in each level of load and wind power

Fig. 2. Low and high boundaries for normal cases

Fig. 3. Expectation of electricity price in each level of load and wind power after removing anomalies from dataset

The fitted smoothed surface for the dataset after removing the anomalies is presented in Fig. 3.

TABLE II
CLUSTER CENTERS

                        Cluster 1    Cluster 2    Cluster 3
x1, Load                7773.73      9041.18      9529.56
x2, Wind Power          248.95       374.91       163.75
y, Electricity Price    20.76        31.93        120.85
η, Percentage           32.11%       61.84%       6.05%

E. Clustering

Similar instances are put into the same cluster using the k-means clustering algorithm. First, the features are standardized so that their scales are comparable. The number of clusters is determined by running the algorithm for every value from kmin = 2 to kmax = 7. A low number of clusters fails to separate the data set adequately, whereas a high number of clusters leads to very small clusters that closely resemble larger ones and can be merged with them. Based on the Silhouette criterion [38], k = 3 was found to be the optimal number of clusters. Cluster centers are presented in Table II. The last row of the table shows the share of instances in each cluster. As can be seen from Fig. 3, the effect of wind power on the electricity price is non-linear and dependent on the other influential factors. Clustering helps to model these non-linearities.
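The Silhouette criterion used to compare candidate values of k can be computed as in the following sketch. It assumes Euclidean distance, at least two clusters, and the convention that a point alone in its cluster scores zero; the function name silhouette_score is chosen for this example and no library is implied.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette over all points (requires at least two clusters)."""
    n = len(points)
    total = 0.0
    for i in range(n):
        # distances from point i to every other point, grouped by cluster
        by_cluster = {}
        for j in range(n):
            if j != i:
                by_cluster.setdefault(labels[j], []).append(
                    math.dist(points[i], points[j]))
        own = by_cluster.get(labels[i])
        if not own:  # singleton cluster: silhouette taken as 0
            continue
        a = sum(own) / len(own)  # mean distance within the own cluster
        b = min(sum(d) / len(d)  # mean distance to the nearest other cluster
                for c, d in by_cluster.items() if c != labels[i])
        total += (b - a) / max(a, b)
    return total / n
```

In a scan such as the one described above, the labels produced for each k from 2 to 7 would be scored this way and the k with the highest mean silhouette retained.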

Cluster 1 represents the hours when net demand, the difference between load and wind power, is at its lowest level and the electricity price is therefore much lower than average. The second cluster models periods with an intermediate level of net demand. This cluster includes more than 60% of the hours, and its electricity price is close to average. In the last cluster, which contains only about 6% of the hours, net demand and electricity price are at their highest levels. Each cluster represents an operating regime in the electricity market. We quantify the impact of wind generation on the electricity price in each regime in the next section.
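The ordering of the regimes by net demand can be checked directly from the cluster centers in Table II. The short calculation below uses only the tabulated values; the dictionary layout is an arbitrary choice for the example.

```python
# Cluster centers copied from Table II (load, wind power, electricity price).
centers = {
    1: {"load": 7773.73, "wind": 248.95, "price": 20.76},
    2: {"load": 9041.18, "wind": 374.91, "price": 31.93},
    3: {"load": 9529.56, "wind": 163.75, "price": 120.85},
}

# Net demand is the difference between load and wind power.
net_demand = {c: round(v["load"] - v["wind"], 2) for c, v in centers.items()}

# Clusters ranked by net demand and by price.
by_net_demand = sorted(centers, key=lambda c: net_demand[c])
by_price = sorted(centers, key=lambda c: centers[c]["price"])
```

The net demand at the three centers is 7524.78, 8666.27, and 9365.81, so ranking the clusters by net demand gives the same order as ranking them by price.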

F. Similarity Detection and Price Impact Assessment

This part aims to quantify the impact of each MWh of wind generation on the price of electricity. To do so, the similar hours method is applied to each cluster separately. This method is designed to find hours in which all of the features except the level of wind generation are at the same level. Thus, the difference in electricity prices between such hours is likely caused by wind power. The similar hours method is followed step by step as explained in Section III-F. The criterion for similarity of two hours p and q is defined as follows:

|x1,p − x1,q| < 5 MW (15)


Then, the effect of wind power, x2, on the electricity price, y, is calculated for each pair of similar instances found:

∆x2,pq = x2,p − x2,q (16)

∆ypq = yp − yq (17)
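Equations (15) to (17) can be sketched as a pair search within one cluster. The function name and the tuple layout `(load, wind, price)` are assumptions for illustration. Sorting by load first lets the inner loop stop as soon as the load gap reaches the tolerance, which avoids comparing every hour with every other hour.

```python
def similar_hour_pairs(hours, load_tol=5.0):
    """hours: list of (load, wind, price) tuples belonging to one cluster.

    Returns a list of (delta_wind, delta_price) for every pair of hours
    whose loads differ by less than load_tol.
    """
    ordered = sorted(hours, key=lambda h: h[0])
    deltas = []
    for i, p in enumerate(ordered):
        for j in range(i + 1, len(ordered)):
            q = ordered[j]
            if q[0] - p[0] >= load_tol:   # Eq. (15) violated; later hours are farther still
                break
            deltas.append((p[1] - q[1],   # Eq. (16): difference in wind power
                           p[2] - q[2]))  # Eq. (17): difference in price
    return deltas
```

The returned differences are the inputs to the averaging and regression steps that follow.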

The average effect of wind power on the electricity price is calculated using the RLWR method for each cluster separately:

E(∆y|∆x2 = ∆) (18)
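One way to estimate the conditional expectation in (18) is a one-dimensional robust locally weighted regression in the spirit of [31]: local weighted lines with tricube neighbourhood weights, followed by bisquare reweighting of large residuals. The sketch below is an illustration under those assumptions, not the authors' code; the function names, the span `frac`, and the number of robustness passes are hypothetical choices.

```python
import statistics

def tricube(u):
    """Neighbourhood weight: 1 at the target point, 0 at the bandwidth edge."""
    u = abs(u)
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0

def bisquare(u):
    """Robustness weight: shrinks toward 0 as the scaled residual grows."""
    u = abs(u)
    return (1.0 - u ** 2) ** 2 if u < 1.0 else 0.0

def weighted_line_at(xs, ys, ws, x0):
    """Fit a weighted least-squares line and evaluate it at x0."""
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return my + slope * (x0 - mx)

def rlwr(xs, ys, frac=0.3, robust_iters=2):
    """Return the smoothed estimate of E(y | x) at every observed x."""
    n = len(xs)
    k = max(2, min(n, round(frac * n)))      # neighbours used in each local fit
    robust = [1.0] * n
    fitted = [0.0] * n
    for _ in range(robust_iters + 1):
        for i, x0 in enumerate(xs):
            dists = [abs(x - x0) for x in xs]
            h = sorted(dists)[k - 1] or 1.0  # local bandwidth
            local = [tricube(d / h) for d in dists]
            ws = [l * r for l, r in zip(local, robust)]
            if sum(ws) == 0:                 # every neighbour was down-weighted
                ws = local
            fitted[i] = weighted_line_at(xs, ys, ws, x0)
        resid = [y - f for y, f in zip(ys, fitted)]
        scale = statistics.median(abs(r) for r in resid)
        if scale == 0:
            break
        robust = [bisquare(r / (6.0 * scale)) for r in resid]
    return fitted
```

Here `xs` would hold the wind power differences and `ys` the price differences for one cluster. The robustness passes matter because a single extreme price difference would otherwise pull the local average toward itself.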

The conditional expectation of the impact of wind power on the electricity price is calculated for each cluster. A linear regression model is then fitted to the resulting conditional expectations to better characterize the impact of wind power on electricity price:

∆y = a∆x2 + b (19)
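The fit in (19) is an ordinary least squares line with a single regressor, which has a closed form. The following sketch computes the slope a, the intercept b, and R2; the function name is hypothetical, and the inputs would be the wind power differences and their smoothed price differences for one cluster.

```python
def ols_line(xs, ys):
    """Least-squares fit of y = a * x + b; returns (a, b, r_squared)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = my - a * mx
    ss_res = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    r_squared = 1.0 - ss_res / ss_tot
    return a, b, r_squared
```

A negative a means that hours with more wind generation, at otherwise similar load, tend to have lower prices.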

The ANOVA results after applying the Ordinary Least Squares (OLS) algorithm to each cluster are shown in Table III. Each regression’s R2 > 0.75 suggests a credible fit to the data. The corresponding P-values for the coefficients are equal to zero, which indicates a statistically significant correlation between wind generation and electricity price in each cluster. The estimated coefficients all suggest a negative correlation between wind power and electricity price, which confirms that a higher level of wind penetration reduces the price of electricity. As can be seen, the values of the estimated coefficients differ significantly among clusters. In the first cluster, where the level of net demand and electricity price is the lowest, the value of a indicates that a 100 MWh increase in wind generation reduces the market price by 100 × 0.0083 = $0.83. The effect of wind generation increases for the second cluster, which has higher net demand: a 100 MWh increase in wind generation reduces the market price by $1.21. Finally, the last cluster, which represents the highest level of price, shows the largest impact of wind power on the electricity price. An increase of 100 MWh in wind generation reduces the market price by $1.72 for instances belonging to cluster 3. It can be concluded that the effect of wind generation on electricity price is non-linear and dependent on the level of electricity price and net demand.
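The three price reductions quoted above follow directly from the slopes in Table III. The snippet below reproduces the arithmetic; like the text, it uses only the slope a and ignores the intercept b. The names are illustrative.

```python
# Slopes a from Table III, one per cluster.
slopes = {1: -0.0083, 2: -0.0121, 3: -0.0172}

def price_change(cluster, delta_wind_mwh):
    """Estimated change in price for a given change in wind generation."""
    return slopes[cluster] * delta_wind_mwh

# Price reduction for a 100 MWh increase in wind generation, per cluster.
reductions = {c: round(-price_change(c, 100), 2) for c in slopes}
```

The result is 0.83, 1.21, and 1.72 for clusters 1, 2, and 3, so the estimated effect in the highest-price regime is roughly twice that in the lowest-price regime.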

TABLE III
ORDINARY LEAST SQUARES ESTIMATION OF REGRESSION COEFFICIENTS AFTER APPLYING THE SIMILAR HOURS METHOD TO EACH CLUSTER

Cluster    Coefficient    Estimate    R2
1          a              -0.0083     0.8711
           b              0.6085
2          a              -0.0121     0.9943
           b              0.5623
3          a              -0.0172     0.7876
           b              -2.2776

V. CONCLUSION

In this paper, a data-driven method is proposed to estimate the impact of wind generation level on wholesale electricity market prices. The proposed methodology has seven steps. In the first step, relevant market data is processed for potential explanation of price fluctuations. In the second step, a feature selection technique is applied to determine the most informative market variables. In the next step, the conditional expectation of electricity price for each level of the features is found by applying the RLWR algorithm to the dataset. An anomaly detection algorithm is then employed to detect and remove the instances in which the electricity price is affected by non-modeled factors: extreme electricity prices are removed from the dataset because they deviate too much from the expectation. Then, different patterns in the electricity market are identified with the aid of clustering, so that the impact of wind power on electricity price can be investigated for each pattern separately. A similarity detection algorithm is then applied to find instances in the dataset that are similar in all factors except wind power level. Finally, a relationship between price variations and the change in wind power level is estimated.

The proposed method is applied to data from Alberta’s competitive market as a case study. A dataset is built for the three-year period from January 2012 to December 2014. Load and the level of wind power are found to be influential exogenous factors on electricity prices. Three clusters are detected, each representing a different operating regime in the market. The impact of each MWh of wind power on electricity price is calculated using the similar hours method for each cluster separately. The results show that the effect of wind generation on electricity price differs across operating regimes. An increase of 100 MWh in wind generation would reduce the electricity price by between $0.83 and $1.72, depending on the operating regime. This impact is more substantial when the level of net demand is higher.

The data-driven approach proposed in this paper can be applied to any competitive electricity market. However, access to the market data is essential for the data mining task. In other markets, the factors that influence electricity prices may differ depending on the market structure. Thus, different explanatory variables may be considered when building a model for the target variable of electricity price. Moreover, the final calculated price impacts of wind generation could differ significantly from one market to another. The proposed methodology can also be extended to other renewable sources to estimate their impacts on wholesale electricity market prices. However, it should be noted that in some cases, renewable energy resources do not have a significant impact on electricity prices due to their low level of penetration. For instance, currently in Alberta’s market, solar energy generation does not have a notable impact on the price dynamics.

REFERENCES

[1] L. Fried, “Global wind statistics 2014,” Global Wind Energy Council, Tech. Rep., 2014.

[2] A. Pullen and S. Sawyer, “Global wind report, annual market update 2014,” Global Wind Energy Council, Tech. Rep., 2015.

[3] F. Sensfuß, M. Ragwitz, and M. Genoese, “The merit-order effect: A detailed analysis of the price effect of renewable electricity generation on spot market prices in Germany,” Energy Policy, vol. 36, no. 8, pp. 3086–3094, 2008.

[4] S. Bode, “On the impact of renewable energy support schemes on power prices,” HWWI Research Paper, Tech. Rep., 2006.

[5] G. S. de Miera, P. del Río González, and I. Vizcaíno, “Analysing the impact of renewable electricity support schemes on power prices: The case of wind electricity in Spain,” Energy Policy, vol. 36, no. 9, pp. 3345–3359, 2008.

[6] R. Green and N. Vasilakos, “Market behaviour with large amounts of intermittent generation,” Energy Policy, vol. 38, no. 7, pp. 3211–3220, 2010, Large-scale wind power in electricity markets with Regular Papers.

[7] J. Morales, A. Conejo, and J. Perez-Ruiz, “Simulating the impact of wind production on locational marginal prices,” Power Systems, IEEE Transactions on, vol. 26, no. 2, pp. 820–828, May 2011.

[8] T. Elfouly, H. Zeineldin, E. El-Saadany, and M. Salama, “Impact of wind generation control strategies, penetration level and installation location on electricity market prices,” Renewable Power Generation, IET, vol. 2, no. 3, pp. 162–169, September 2008.

[9] E. D. Delarue, P. J. Luickx, and W. D. D’haeseleer, “The actual effect of wind power on overall electricity generation costs and CO2 emissions,” Energy Conversion and Management, vol. 50, no. 6, pp. 1450–1456, 2009.

[10] H. Weigt, “Germany’s wind energy: The potential for fossil capacity replacement and cost saving,” Applied Energy, vol. 86, no. 10, pp. 1857–1863, 2009.

[11] T. Traber and C. Kemfert, “Gone with the wind? Electricity market prices and incentives to invest in thermal power plants under increasing wind energy supply,” Energy Economics, vol. 33, no. 2, pp. 249–256, 2011.

[12] C. Woo, J. Zarnikau, J. Kadish, I. Horowitz, J. Wang, and A. Olson, “The impact of wind generation on wholesale electricity prices in the hydro-rich Pacific Northwest,” Power Systems, IEEE Transactions on, vol. 28, no. 4, pp. 4245–4253, 2013.

[13] C. Woo, I. Horowitz, J. Moore, and A. Pacheco, “The impact of wind generation on the electricity spot-market price level and variance: The Texas experience,” Energy Policy, vol. 39, no. 7, pp. 3939–3944, 2011.

[14] C. Woo, J. Zarnikau, J. Moore, and I. Horowitz, “Wind generation and zonal-market price divergence: Evidence from Texas,” Energy Policy, vol. 39, no. 7, pp. 3928–3938, 2011.

[15] E. Nicholson, J. Rogers, and K. Porter, “The relationship between wind generation and balancing-energy market prices in ERCOT,” National Renewable Energy Laboratory, 2010.

[16] H. Gil and J. Lin, “Wind power and electricity prices at the PJM market,” Power Systems, IEEE Transactions on, vol. 28, no. 4, pp. 3945–3953, Nov 2013.

[17] H. A. Gil, C. Gomez-Quiles, and J. Riquelme, “Large-scale wind power integration and wholesale electricity trading benefits: Estimation via an ex-post approach,” Energy Policy, vol. 41, no. 0, pp. 849–859, 2012.

[18] D. Mladenić, Feature selection for dimensionality reduction. Springer, 2006.

[19] D. Zhang, Z. Zhou, and S. Chen, “Semi-supervised dimensionality reduction,” in SDM. SIAM, 2007, pp. 629–634.

[20] H. Zareipour, C. Canizares, K. Bhattacharya, and J. Thomson, “Application of public-domain market information to forecast Ontario’s wholesale electricity prices,” Power Systems, IEEE Transactions on, vol. 21, no. 4, pp. 1707–1717, Nov 2006.

[21] G. Li, C. Liu, C. Mattson, and J. Lawarree, “Day-ahead electricity price forecasting in a grid environment,” Power Systems, IEEE Transactions on, vol. 22, no. 1, pp. 266–274, Feb 2007.

[22] A. Boogert and D. Dupont, “When supply meets demand: The case of hourly spot electricity prices,” Power Systems, IEEE Transactions on, vol. 23, no. 2, pp. 389–398, May 2008.

[23] I. Kononenko, “Estimating attributes: analysis and extensions of relief,” in Machine Learning: ECML-94. Springer, 1994, pp. 171–182.

[24] H. Liu, J. Sun, L. Liu, and H. Zhang, “Feature selection with dynamic mutual information,” Pattern Recognition, vol. 42, no. 7, pp. 1330–1339, 2009.

[25] G. Baudat and F. Anouar, “Feature vector selection and projection using kernels,” Neurocomputing, vol. 55, no. 1, pp. 21–38, 2003.

[26] N. Amjady and F. Keynia, “Day-ahead price forecasting of electricity markets by mutual information technique and cascaded neuro-evolutionary algorithm,” Power Systems, IEEE Transactions on, vol. 24, no. 1, pp. 306–318, Feb 2009.

[27] D. Huang, H. Zareipour, W. Rosehart, and N. Amjady, “Data mining for electricity price classification and the application to demand-side management,” Smart Grid, IEEE Transactions on, vol. 3, no. 2, pp. 808–817, June 2012.

[28] M. Hall, “Correlation-based feature subset selection for machine learning,” Ph.D. dissertation, University of Waikato, 1998.

[29] D. Huang, H. Zareipour, W. Rosehart, and N. Amjady, “Data mining for electricity price classification and the application to demand-side management,” Smart Grid, IEEE Transactions on, vol. 3, no. 2, pp. 808–817, June 2012.

[30] W. Härdle, Applied nonparametric regression. Cambridge University Press, 1990.

[31] W. S. Cleveland, “Robust locally weighted regression and smoothing scatterplots,” Journal of the American Statistical Association, vol. 74, no. 368, pp. 829–836, 1979.

[32] C. C. Aggarwal, Outlier analysis. Springer Science & Business Media, 2013.

[33] J. S. Racine, Nonparametric econometrics: A primer. Now Publishers Inc, 2008.

[34] A. K. Jain and R. C. Dubes, Algorithms for clustering data. Prentice-Hall, Inc., 1988.

[35] X. Wu and V. Kumar, The top ten algorithms in data mining. CRC Press, 2009.

[36] S. Lotfifard, L. Xie, and M. Kezunovic, “Quantifying the impact of unscheduled line outages on locational marginal prices,” in North American Power Symposium (NAPS), 2010, 2010, pp. 1–6.

[37] A. Conejo, F. Nogales, and J. Arroyo, “Price-taker bidding strategy under price uncertainty,” Power Systems, IEEE Transactions on, vol. 17, no. 4, pp. 1081–1088, Nov 2002.

[38] L. Vendramin, R. J. Campello, and E. R. Hruschka, “On the comparison of relative clustering validity criteria,” in SDM. SIAM, 2009, pp. 733–744.