Univariate and multivariate properties of wind velocity time series

Preview:

Citation preview

Univariate and multivariate properties of wind velocity time series

This content has been downloaded from IOPscience. Please scroll down to see the full text.

Download details:

IP Address: 200.124.242.99

This content was downloaded on 28/09/2013 at 08:27

Please note that terms and conditions apply.

J. Stat. Mech. (2009) P02026

(http://iopscience.iop.org/1742-5468/2009/02/P02026)

View the table of contents for this issue, or go to the journal homepage for more

Home Search Collections Journals About Contact us My IOPscience

J.Stat.M

ech.(2009)

P02026

ournal of Statistical Mechanics:An IOP and SISSA journalJ Theory and Experiment

Univariate and multivariate propertiesof wind velocity time series

S Bivona, G Bonanno, R Burlon, D Gurrera and C Leone

Dipartimento di Fisica e Tecnologie Relative, Universita degli Studi di Palermo,ItalyandCNR-CNISM, Viale delle Scienze, Edificio 18, I-90128 Palermo, ItalyE-mail: bivona@unipa.it, giovanni.bonanno@unipa.it, burlon@unipa.it,davide.gurrera@difter.unipa.it and leone@unipa.it

Received 14 November 2008Accepted 4 December 2008Published 9 February 2009

Online at stacks.iop.org/JSTAT/2009/P02026doi:10.1088/1742-5468/2009/02/P02026

Abstract. We analyze the time series of hourly average wind speeds measuredat 29 different stations located in Sicily, a region with a complex morphology. Theinvestigation, performed from the univariate as well as the multivariate point ofview, evidences that the statistical properties of wind at the single sites havefeatures that are not reproduced by standard models and, thus, require specificmodeling. Moreover, the synchronous evolution of wind velocity presents acluster structure, obtained with different algorithms, that persists in the standarddeviation too.

Keywords: hydrodynamic fluctuations, random graphs, networks

c©2009 IOP Publishing Ltd and SISSA 1742-5468/09/P02026+12$30.00

J.Stat.M

ech.(2009)

P02026

Univariate and multivariate properties of wind velocity time series

Contents

1. Introduction 2

2. Data description 3

3. Univariate analysis 4

4. Multivariate analysis 9

5. Conclusions 10

Acknowledgments 12

References 12

1. Introduction

Atmospheric phenomena play an important regulatory role in the survival of many livingorganisms. These phenomena are also important for many human activities. For example,wind and rain help to clear atmospheric pollution generated in industrial and urbanagglomerates. Moreover, wind is considered as an important renewable energy source.

The dynamics of atmospheric wind is driven by the non-linear interaction of a greatamount of particles present in the air. External contributions to this dynamics are alsoimportant; the interactions with the sea, the mountains and the energy coming from thesun, just to cite a few, strongly influence the wind. So, the environment can be consideredas a complex system, and its properties are worth investigating with the statistical toolsused to deal with complexity [1]–[10].

Recently, different statistical models have been proposed for modeling, for predictionpurposes, and to investigate scaling behavior, observed in the wind velocity, that cannotbe reproduced by means of the theory of turbulence [5]–[8]. Such models include thesuperposition of different wind regimes [6], Markov models [7] and non-extensive statisticalmechanics [8].

The probability density function (PDF) of a stochastic variable gives the firstimportant statistical characterization of a process. Studies of the wind velocity PDF,carried out in [6, 11], have shown that the empirical PDF can be fitted, with good accuracy,with a Weibull distribution. However, when analyzing the time series of quantities that,like wind velocity, are recorded at different geographical positions, it can reasonably beexpected that measurements at neighboring places, due to the physical interactions, willhave some common characteristics that can be investigated by means of multivariateanalysis. In a previous study of hourly average wind speeds (HAWS) we have shown thatthe nearest neighbor/single-linkage (NNSL) clustering algorithm can be used to extractrelevant information from the cross-correlation matrix [12]. The algorithm filters themost correlated couples of sites and introduces the minimal possible distortion in theoriginal data set [15]. Such information was reported in two graphical representations:a dendrogram, which reveals a hierarchical arrangement of clusters that are stable overthe years, and a minimum spanning tree (MST), which is a particular network, whose

doi:10.1088/1742-5468/2009/02/P02026 2

J.Stat.M

ech.(2009)

P02026

Univariate and multivariate properties of wind velocity time series

Figure 1. Geographical distribution of the recording sites present in our database;they are: Acate (1), Agrigento (2), Alia (3), Cammarata (4), Canicattı (5),Castelvetrano (6), Catania (7), Cesaro (8), Corleone (9), Gela (10), Lascari (11),Leni (12), Lentini (13), Licata (14), Maletto (15), Mazara (16), Mazzarrone (17),Monreale (18), Pachino (19), Palazzolo Acreide (20), Palermo (21), Patti (22),Pettineo (23), Piazza Armerina (24), Ragusa (25), Riesi (26), Sciacca (27), Scicli(28), Trapani (29).

nodes are the different sites, that connects the highly correlated couples of nodes. In thispaper we investigate, from the univariate as well as from the multivariate perspective,the statistical properties of wind velocities measured at different sites located in an area,of about 25 × 103 km2, with a complex orography, such as Sicily. The purpose of thisinvestigation is to discover the limits of the existing approaches to wind modeling and,if possible, to provide further validations to such approaches. In doing that, we willproceed as follows: in section 2 we will describe our database; in section 3 we will presenta comparative analysis of the PDF of the HAWS and the corresponding daily standarddeviation for each site in our database; in section 4 we will study the cross-correlation ofwind velocity time series by comparing the results of the cluster analysis performed onthe HAWS [12] with those obtained from the daily standard deviation of the HAWS usingthe NNSL and other clustering algorithms; in section 5 we will draw our conclusions.

2. Data description

In the present work we analyze the time series of the modulus of the HAWS, measuredby Servizio Informativo Agrometeorologico Siciliano (SIAS). The data have been recordedat 29 different sites in Sicily, as shown in the map in figure 1, for the four-year period2003–2006.

The recording stations are placed 10 m above the ground at locations with quitedifferent morphologies. For instance, the town of Palermo is on the north western coast,

doi:10.1088/1742-5468/2009/02/P02026 3

J.Stat.M

ech.(2009)

P02026

Univariate and multivariate properties of wind velocity time series

Figure 2. Empirical PDF (circle), as a function of the bin center, obtained overall the 29 sites for the whole 2003–2006 period. The continuous line is the bestfitting Weibull distribution. The inset shows the same data on a logarithmic PDFscale.

i.e. on the Tyrrhenian Sea, surrounded by mountains just behind the town. The town ofSciacca lies on the south western coast, on a plane zone overlooking the Strait of Sicily,in the Mediterranean Sea. Corleone is on a hill in the hinterland, while Leni is locatedon Salina, a small Eolian island. So, heterogeneous morphology allows us to compare thestatistical properties of wind recorded in plane zones, on mountainous zones and near thesea to evidence common characteristics as well as differences.

The 29 synchronous HAWS time series present in the database are indicated by vi(t)where i is the index of the location and t is the time that can be conveniently expressedin hours. In the following, for the sake of simplicity, we omit i and t and write only v.

3. Univariate analysis

In this section we are going to discuss the empirical univariate statistical properties of theHAWS that emerge from the data. Let us begin by considering the PDFs of v obtainedin different ways. The empirical PDFs are calculated by making histograms from therecorded data. Such histograms have been normalized, to have unit area, by dividing thenumber of observations in each bin for the total number of points and for the bin size. Webegin by presenting the PDF of the HAWS, pertaining to the entire Sicilian area, by usingall the values of v recorded in the whole observation time for all of the 29 sites. This firstresult, being obtained from the largest number of entries available in the database, is alsothe one with highest statistical reliability. Figure 2 shows the empirical PDF as a functionof the bin center. Such figures evidence a distribution which rises to a single maximumand then falls monotonically. Previous works have shown that, among other distributionsthat reproduce the above described behavior, the Weibull distribution matches well tothe empirical data, producing good best fits [11] and wind scaling properties [6]. Theexpression for the Weibull distribution is

f(v) = Nvk−1e−(v/λ)k

. (1)

doi:10.1088/1742-5468/2009/02/P02026 4

J.Stat.M

ech.(2009)

P02026

Univariate and multivariate properties of wind velocity time series

Figure 3. RMSE calculated from the best fit of the empirical PDFs, for singlesites, as a function of the site.

N is a normalization constant, k and λ are two parameters of the distribution. Thecurve in figure 2 is the best fit of our data with the Weibull distribution. A satisfactoryagreement between the empirical and the theoretical PDF emerges in the low velocityregion. To give a quantitative estimate of the accuracy of the fit we have calculatedthe root mean square error (RMSE) which is, in our case, 0.007 m−1 s. To look at thetail of the distribution, we plot in the inset the same PDF on a semilogarithmic scale;the data now show an exponential tail which is severely underestimated by the Weibullfunction. So, the agreement between the empirical and theoretical PDFs seems to lastup to v ≈ 6 m s−1, and after this velocity value there is a breakdown in the estimationof the real PDF values, and the Weibull PDF is no longer appropriate for our data.However, this kind of analysis, that considers the data recorded at all the different sitesas a whole, does not give any insight into the statistical properties of v at each singlesite. We note that, given the heterogeneous morphology of the region, this question playsan important role in an exhaustive characterization of wind velocity time series. For thisreason we performed the same analysis over the whole four-year period, but on a siteby site basis. The RMSE values, obtained from the best fit of the empirical data to theWeibull distribution, are reported in figure 3. The evidently fluctuating behavior showsthat the fit accuracies can be quite different for the 29 sites. In fact, for many sitesthe RMSE assumes values approximately near 0.015 m−1 s; these are the sites where thefitting procedures produce the best results. Such RMSE values are higher than the oneobtained from the fit over the whole data set, but one has to consider that the best fitprocedures are now performed on less empirical points. So, higher RMSE values shouldbe expected.

For other sites however, the RMSE values are significantly higher than 0.015 m−1 s.For example, some of the sites exhibiting RMSE values higher than 0.029 m−1 s are Alia,Palermo, Riesi and Sciacca. An inspection of the map of figure 1 shows that such sitesare located in very different zones. Alia is in the hinterland, Palermo is on the northerncoast surrounded by mountains, Riesi is in the hinterland too, while Sciacca is on thesouthern coast on a plane zone. It seems that the observed ‘misbehavior’, observable at

doi:10.1088/1742-5468/2009/02/P02026 5

J.Stat.M

ech.(2009)

P02026

Univariate and multivariate properties of wind velocity time series

v v

Figure 4. Empirical PDF (circle) of the HAWS, as a function of the bin center,obtained at the sites of (a) Alia, (b) Leni, (c) Palermo and (d) Riesi for the whole2003–2006 period. The continuous line is the best fitting Weibull distribution.

sites with different positions and orographies, cannot be attributed to generic differencesin the geographical localization or morphology.

To gain more insight into these findings, we illustrate in figure 4 the empirical PDFsobtained for four sites chosen from those exhibiting low RMSE values and those exhibitinghigh RMSE values. Specifically, we report the PDFs for the sites of Alia, Leni, Palermoand Riesi. Once again, the curve in each figure is the best fitting Weibull distribution.

It can be immediately observed that for Leni the fit looks as good as that of figure 2.However, for the other three cases the fit accuracy is rather poor. Specifically, for Alia andPalermo two preferential wind velocities are observed, corresponding to the two maximain the empirical PDF. In other words, the empirical distributions are bimodal. The case ofRiesi however is different in that, regardless of the unimodal shape of the empirical PDF,the Weibull fitting curve does not reproduce the empirical observation with sufficientaccuracy. The empirical profile is tighter around the modal velocity than was predictedwith the Weibull PDF. In order to check whether these characteristics are persistent overtime, we repeated the same analysis at every site for each of the four years investigated,separately. In table 1 we report the RMSE values obtained from the fitting performed for6 of the 29 sites investigated. They are the sites of figure 4 and two other sites chosenamong the ones where the fitting accuracy is low: Cammarata and Pettineo. The tablealso contains the average and the standard deviation of the RMSE calculated over the 29sites for each given year. Again we note that the RMSE values obtained are higher thanthose for the whole period; this is to be ascribed to the further reduction in the number of

doi:10.1088/1742-5468/2009/02/P02026 6

J.Stat.M

ech.(2009)

P02026

Univariate and multivariate properties of wind velocity time series

Table 1. RMSE values, in m−1 s, obtained from the best fit of the empiricalPDFs, calculated for the site and year indicated, with the Weibull distribution.

Year Average Std dev. Alia Cammarata Palermo Pettineo Riesi Leni

2003 0.031 0.011 0.036 0.046 0.060 0.044 0.060 0.0282004 0.025 0.009 0.030 0.024 0.049 0.054 0.040 0.0172005 0.027 0.010 0.030 0.055 0.049 0.038 0.045 0.0222006 0.025 0.009 0.029 0.043 0.041 0.046 0.046 0.021

points used in these yearly fitting procedures. It appears that some sites show values thatare persistently higher than the average for the year; this is the case for the RMSE valuesobtained from the best fit of the data from Palermo and Pettineo which are, in manycases, two standard deviations above the average. The other values are not so high asthe previous ones, but they are still higher than those characteristic of Leni where the fityields the best results. In fact, for Leni the fit accuracy is consistently good, with RMSEvalues smaller than the average over all the years. The yearly PDFs of the 29 sites (notshown for brevity), though more noisy, keep showing the fingerprints of bimodal behaviorwhere it has been observed in the PDFs for the whole period.

Before concluding this section, we discuss the PDF of the wind speed standarddeviation σi, which is an indicator of the HAWS variations during each day. In view ofthis, σi is an important quantity for the construction of wind energy conversion systems(WECS) which require the persistence of sufficiently high values of wind speed to work.The time series for wind velocity are non-stationary; hence it is reasonable to consider σi

as a random quantity whose properties are worth investigating. We calculate the valuesof daily standard deviations from the HAWS as follows:

σi(t) =

√√√√

24∑

t′=1

[vi(t′) − vi(t)]2

24

vi(t) =24∑

t′=1

vi(t′)

24

(2)

where i indicates the site, t is the day and t′ is the hour of that day. So we obtain 29 timeseries σi(t) that have been analyzed. The PDFs of ln σi(t) are obtained by constructinghistograms normalized in the same way as the previous ones. Let us discuss the resultsobtained for the same sites as in figure 4. We show in figure 5 the logarithm of the PDFas a function of the bin center for those bins containing at least 10 observations. Giventhe reduced number of data used for this daily analysis, the empirical points presenthigher fluctuations than those of figure 4, but nonetheless it is possible to present someconsiderations. The intervals of variation of the σi values are different for the differentsites. Specifically, for Palermo the standard deviation assumes values that are smallerthan those for the other three cases, and the modal value of the HAWS is also low(see figure 4(c)). This could be caused by the particular orography of the locations.In fact Palermo lies on a coastal zone surrounded by mountains, protecting the townfrom atmospheric turbulence and making it a less windy place; in contrast, the site

doi:10.1088/1742-5468/2009/02/P02026 7

J.Stat.M

ech.(2009)

P02026

Univariate and multivariate properties of wind velocity time series

Figure 5. Empirical PDF (circle), as function of the bin center, obtained atthe sites of (a) Alia, (b) Leni, (c) Palermo and (d) Riesi for the whole 2003–2006period. The continuous line is the best fitting Gaussian distribution whose RMSEvalues are (a) 0.24, (b) 0.33, (c) 0.32 and (d) 0.38.

of Leni, on Salina Island, is characterized by a higher modal value of the HAWS (seefigure 4(b)) and higher values of σi. So, while a persistently low wind speed makes asite less appropriate for WECS, a higher modal value can happen to be associated with ahigher variability, indicating the existence of low wind periods, when the energy conversioncould be inefficient. Finally, to model the shape of the empirical PDFs with a log-normaldistribution we performed parabolic best fits, shown as continuous lines in figure 5. TheRMSE values are higher than the previous ones, but considering the smaller number ofpoints used, the disagreement between the empirical and the fitted distributions is not sobad as to make us reject the log-normal function as a reasonable first guess for furtherinvestigation with more data.

So, on one hand the single-site HAWS distribution may show, in some cases, a bimodalshape that cannot be fitted using the unimodal Weibull PDF. Moreover, such a behaviordisappears when looking at the distribution over all the sites together. The Weibullfunction can reproduce the empirical data even if it seems to fail in reproducing the tailof the distribution. On the other hand, the distribution of daily standard deviations ofthe HAWS assumes different values for different sites and can be approximated by thelog-normal function, even if a definitive conclusion on this last point is still far from beingobtained.

doi:10.1088/1742-5468/2009/02/P02026 8

J.Stat.M

ech.(2009)

P02026

Univariate and multivariate properties of wind velocity time series

Figure 6. The MST (left) and the dendrogram (right) obtained from the timeseries of the HAWS standard deviation.

4. Multivariate analysis

The dynamics of the wind observed at a location is not independent from all the others.Indeed, it can be expected that wind velocity time series recorded at a location sharestatistical similarities with those of neighboring locations. However, this strongly dependson the morphology of the region. For instance, the presence of mountains is able to stopthe wind and protects locations lying in a valley from the turbulence that originates frombehind such mountains. Sicily presents a complex morphology in that there are mountainsand sea, sometimes quite close, while vast plane zones are present too. So, while it is truethat wind time series can exhibit correlated synchronous evolution, they can be influencedby many different factors that are not simple to evaluate qualitatively. Quantitativeinsights, obtained by means of the nearest neighbor/single-linkage (NNSL) clusteringalgorithm, have shown the existence of a meaningful cross-correlation taxonomy [12] thatcan be represented in a dendrogram. We used the correlation-based distance definedby dij =

2(1 − ρij) [13], where ρij is the coefficient of cross-correlation between thesynchronous time series of the HAWS while i and j identify two sites. Besides obtaining theclusters, the NNSL allows us to filter, by means of an iterative procedure, the N − 1 = 28most highly correlated couples of sites producing a minimum spanning tree (MST), whichconnects all the sites minimizing the total length of all its links. Results not shownhave evidenced that the observed taxonomy also persists in the cross-correlation of dailyaverage wind speeds, so a comparison between the results of [12] and those pertaining tothe daily standard deviations is reasonable.

In this section we investigate the persistence of the cluster structure in the time seriesof σi(t) defined by equation (2). We also provide further validation of the clustering bymeans of comparisons with the results from other algorithms.

By calculating ρσij , the coefficient of cross-correlation between the time series σi(t)

and σj(t), it is possible to apply the NNSL and obtain the dendrogram and the MSTassociated with the daily standard deviations. The results of such analysis, reported infigure 6, evidence a cluster structure reproducing the geographical neighborhood. Sucha first result is interesting; in fact a previous study on financial markets has shownthat only a part of the structure of the price return cross-correlation is maintained inthe standard deviation [14], especially in the dendrogram where the reliability of thehierarchical classification has to be checked. So, the persistence of the cross-correlation

doi:10.1088/1742-5468/2009/02/P02026 9

J.Stat.M

ech.(2009)

P02026

Univariate and multivariate properties of wind velocity time series

Table 2. Comparison of the clusters found in [12] with those found in the presentwork. The © in the columns for Aσ and Bσ indicate that the site is present inthe actual cluster too.A Aσ B Bσ

Alia Acate ©Castelvetrano © Agrigento ©Corleone Gela ©Mazara © Licata ©Sciacca © Mazzarrone ©Trapani © Palazzolo Acreide

Piazza Armerina ©

clustering in the passage from a stochastic process to its higher moments is not guaranteedand has to be demonstrated. In this case, by using a threshold equal to the average overthe 28 distances entering the dendrogram, we obtain two main clusters; besides them,other significant connections can be observed in the MST. The composition of the twoclusters is reported in table 2, both for the HAWS [12] and for the standard deviations. Inorder to distinguish the analysis of [12] from the present one, we denote by A and B theclusters of [12], while Aσ and Bσ are those obtained in the present work. We observe thatthe four most stable sites of cluster A belong to cluster Aσ too. They are Castelvetrano,Mazara, Sciacca and Trapani, four coastal locations on a plane zone. The sites of Corleoneand Alia, two hinterland locations, which where observed in cluster A are not in clusterAσ. However, Corleone connects directly to Trapani, though at a higher distance in thedendrogram, while Alia is slightly farther from the rest of sites of cluster Aσ. It is worthnoting that Corleone and Alia were observed in cluster A only for 93% and 67% of the timerespectively, so this discrepancy is not surprising. In contrast, the geographical proximityof these two sites to cluster Aσ is reproduced by the distances observed in the MST. Thecluster Bσ contains all sites of cluster B but one, Palazzolo Acreide, which is connectedto Mazzarrone though at higher distance in the dendrogram. We note that PalazzoloAcreide is farther from all the other sites of cluster B so, despite its absence from clusterBσ, we can conclude that the similarity between B and Bσ is satisfactory.

To further validate such findings, we report in figure 7 the dendrograms obtained withtwo other widely used clustering algorithms: the farthest neighbor/complete linkage andthe average linkage ones. The algorithms differ in the way in which they define the distancebetween two clusters: respectively, the maximum distance and the average distancebetween all the elements composing those clusters. On cutting the four dendrogramsin the same way as before, we note that the locations of Aσ are still present in a welldefined cluster. However, the stations of Bσ can be split into different clusters that joineach other before merging with the sites of Aσ. In particular, these circumstances furtherconfirm the above results in that the velocity correlation structure still persists in thestandard deviation; moreover the plane zone cluster is the most robust.

5. Conclusions

In this work we presented a statistical investigation of the wind velocity time seriesmeasured at 29 different locations in Sicily, that looks at the univariate as well as the

doi:10.1088/1742-5468/2009/02/P02026 10

J.Stat.M

ech.(2009)

P02026

Univariate and multivariate properties of wind velocity time series

Figure 7. Dendrograms obtained from the time series of the HAWS using (a)the complete linkage and (b) the average linkage, and from the daily standarddeviations of the HAWS using (c) the complete linkage and (d) the averagelinkage.

multivariate aspects. The univariate analysis has shown that the Weibull distribution,already used for wind velocity, yields an accurate fit of the empirical data for the wholeregion. However, when looking at the PDFs obtained for the single sites, differentbehaviors are observed. For some sites the Weibull distribution keeps working, while forothers there seems to be a breakdown. In fact, for a few sites the empirical PDF appears tobe bimodal and the RMSE assumes high values. So, on one hand, the Weibull distributionseems to be not so good for some sites; on the other hand, definitive conclusions and,possibly, appropriate stochastic modeling need further investigations with more data, thatare actually in progress. The standard deviation however presents intervals of variationscharacteristic of each site and could be reproduced by a log-normal distribution. Moreover,the correlation-based clustering obtained for the HAWS, that reproduces the geographicaland morphological characteristics of Sicily, in contrast to a previous study [14], is entirelymaintained in the standard deviations and also confirmed by other clustering procedures,evidencing a robust structure in the synchronous evolution of the time series investigated.

doi:10.1088/1742-5468/2009/02/P02026 11

J.Stat.M

ech.(2009)

P02026

Univariate and multivariate properties of wind velocity time series

Acknowledgments

This work was partially supported by MIUR. We acknowledge the SIAS for providing uswith data.

References

[1] Wilks D S, 2006 Statistic Method in the Atmospheric Science 2nd edn (New York: Academic)[2] Caldarelli G, 2007 Scale Free Networks (Oxford: Oxford University Press)[3] Caldarelli G, 2001 Phys. Rev. E 63 021118[4] Caldarelli G, De Los Rios P, Montuori M and Servedio V D P, 2004 Eur. Phys. J. B 38 387[5] Di Sabatino S, Sollazzo E, Paradisi P and Britter R, 2008 Boundary-Layer Meteorol. 127 131[6] Bottcher F, Barth St and Peinke J K, 2007 Stoch. Environ. Res. Ris. Assess. 21 299[7] Kantz H, Holstein D, Ragwitz M and Vitanov N K, 2004 Physica A 342 315[8] Rizzo S and Rapisarda A, 2004 Complexity, Metastability and Nonextensivity vol 246, ed C Beck,

G Benedeck, A Rapisarda and C Tsallis (Singapore: World Scientific)[9] Weber R O and Talkner P, 2001 J. Geophys. Res. 106 20131

[10] Weber R O, Kaufmann P and Talkner P, 1997 Tellus A 49 407[11] Bivona S, Burlon R and Leone C, 2003 Renew. Energy 28 1371[12] Bivona S, Bonanno G, Burlon R, Gurrera D and Leone C, 2008 Physica A 387 5910[13] Gower J C, 1966 Biometrika 53 325[14] Micciche S, Bonanno G, Lillo F and Mantegna R N, 2003 Physica A 324 66[15] Rammal R, Toulouse G and Virasoro M A, 1986 Rev. Mod. Phys. 58 765

doi:10.1088/1742-5468/2009/02/P02026 12