J. Renewable Sustainable Energy 11, 035504 (2019); https://doi.org/10.1063/1.5094059 11, 035504 © 2019 Author(s). Data article: Distributed PV power data for three cities in Australia Cite as: J. Renewable Sustainable Energy 11, 035504 (2019); https://doi.org/10.1063/1.5094059 Submitted: 26 February 2019 . Accepted: 16 April 2019 . Published Online: 28 May 2019 Jamie M. Bright , Sven Killinger , and Nicholas A. Engerer COLLECTIONS This paper was selected as Featured

Arena Report - Publication of PV Power Measurements


Data article: Distributed PV power data for three cities in Australia

Cite as: J. Renewable Sustainable Energy 11, 035504 (2019); doi: 10.1063/1.5094059
Submitted: 26 February 2019. Accepted: 16 April 2019. Published Online: 28 May 2019

Jamie M. Bright,1,a) Sven Killinger,1,2 and Nicholas A. Engerer1

AFFILIATIONS
1 Fenner School of Environment and Society, The Australian National University, Canberra, ACT, Australia
2 Division Smart Grids, Fraunhofer-Institute for Solar Energy Systems ISE, 79110 Freiburg, Germany

a) Electronic addresses: jamie.bright@anu.edu.au and jamiebright1@gmail.com

ABSTRACT

We present a publicly available dataset containing photovoltaic (PV) system power measurements and metadata from 1287 residential installations across three states/territories in Australia, though mainly for the cities of Canberra, Perth, and Adelaide. The data are recorded between September 2016 and March 2017 at 10-min temporal resolution and consist of real inverter-reported power measurements from PV systems that are well distributed throughout each city. The dataset represents a considerably valuable resource, as public access to spatio-temporal PV power data is almost nonexistent; this dataset has already been used in numerous articles by the authors. The PV power data are free to download and are available in their raw, quality controlled (QC), and "tuned" formats. Each PV system is accompanied by individual metadata including geolocation, user reported metadata, and simulated parameterization. Data provenance, download, usage rights, and example usage are detailed within. Researchers are encouraged to leverage this rich spatio-temporal dataset of distributed PV power data in their research.

Published under license by AIP Publishing. https://doi.org/10.1063/1.5094059

I. INTRODUCTION

Globally, the installed capacity of photovoltaic (PV) systems is rapidly increasing. Very few of these systems continuously report their power generation data; fewer still publicly expose the data. Access to PV power generation data from a multitude of systems within a specific area (e.g., a single section of the distribution grid or an entire city) is of scientific research interest for a host of applications, such as PV grid modeling or improved solar forecasting. The crucial feature of interest is the spatial distribution of such PV systems. While access to individual PV profiles from around the world is possible, the cross correlations of PV power fluctuations can have interesting impacts on the electricity grid, which are otherwise ignored (Bright et al., 2017; Munkhammar et al., 2017; and Lave et al., 2015). Very few datasets featuring a rich, spatially diverse dimension exist for public use; of those available, many are used and reviewed by Munkhammar and Widén (2018). Ultimately, there is a dearth of spatio-temporal PV data in the public domain.

In our project funded by the Australian Renewable Energy Agency (ARENA) (Jones et al., 2018), we have collected a significant number of PV system power measurements for three cities in Australia for public release: 393 PV systems in South Australia (SA), 531 PV systems in Western Australia (WA), and 363 PV systems in the Australian Capital Territory (ACT). While data are available from sites across the whole of these states/territories of Australia, the data are predominantly from the three cities of Adelaide, Perth, and Canberra, hence the title of the article. Furthermore, we concurrently provide the resulting power measurements after state-of-the-art quality control (QC) tests and subsequent tuning (Sec. II C) to make the PV systems representative. Following the recommendations to ease the burden of data access and reproducibility in the field of solar energy (Yang, 2018), we are presenting our dataset in the form of a data article to the Journal of Renewable and Sustainable Energy, along with sample code that enables the user to access the data and begin their research in an efficient manner.

The main objective of this paper is to make readers aware of a valuable and freely available dataset of distributed PV data from three cities within Australia. Our aims are to inform the reader of all the implications and nuances of the dataset itself, how to download the data, and the terms of usage, as well as to provide simple code examples of how to begin using the data. A simple overview of the spatial distribution of the PV systems is illustrated in Fig. 1.

The background related to data provenance is discussed in Sec. II. All the information required to access and download the data is detailed in Sec. III. User rights and expectations are stated in Sec. IV. Details of the data structure and format are found in Sec. V. Some example code and scripts of how to access and use the data are provided in Sec. VI. A discussion of the data and its potential uses, as well as closing statements, is made in Sec. VII.

II. DATA PROVENANCE

This section details all the information regarding the dataset, from where the data were first obtained all the way through to their being available for download. This section is, in essence, a methodology of processing raw PV power measurements. It is subcategorized into three main sections: raw PV measurements and simulated clear-sky PV output (Sec. II A), QC of PV measurements (QCPV, Sec. II B), and tuning of PV measurements (Sec. II C).

An overview flowchart of the methodology is included for reference in Fig. 2.

A. Raw PV power measurements and metadata

The original raw PV power measurements were collected at 5-min resolution from http://pvoutput.org, a public website where users are able to report their PV power generation using automated data loggers that are pre-existing in participating inverter models. The Australian National University (ANU) has an active subscription to PVOUTPUT.ORG at the time of publication, and this subscription shall be continued by the data hosts, Solcast (see Sec. III), at the close of the ANU project. For our usage, we transformed the original PV power data to 10-min intervals to match the Himawari-8 satellite scan timing over Australia (00:07, 00:17, 00:27, etc.). We do not state how this was achieved, to protect the original product from PVOUTPUT.ORG; ultimately, the data are not "raw" from PVOUTPUT.ORG but a modified time average that nevertheless represents raw data. The data available span a period from September 2016 through March 2017. While this data release does not span a full year, it includes a vast number of residential PV installations during the Spring and Summer seasons. PV power measurements are normalized by the reported rated total module capacity of the respective system so that all power values Pmeas are reported in W/Wp, which enables an easier comparison between systems. To obtain the original reported power, the user must multiply by the respective capacity.

Each user reports their PV system metadata in terms of tilt (β), azimuth (γ), and the loss factor (LF). A critical concern is the known error in metadata reporting. This issue is widely discussed in Killinger et al. (2018). We provide the original user reported metadata; however, we also provide simulated PV metadata as derived from a parameterization technique, which derives the most likely PV system metadata from the power time series.

FIG. 1. A map of Australia highlighting Perth, WA; Adelaide, SA; and Canberra, ACT. Each red marker represents the location of a PV system within the database. Longitude and latitude are reported in degrees.


The parameterization methodology is well defined in Killinger et al. (2017a, 2017b) (Sec. IV) and so is not fully reproduced here. In a cross-validation study against LiDAR data, these estimated parameterizations were found to be reasonably accurate over the ACT (Lingfors et al., 2018).

Differences from the previously described methodology are due to the choice of input data. In summary, the parameterization calculated clear-sky beam and diffuse irradiance from climatological data and reanalysis datasets in combination with the REST2 clear-sky irradiance model (Gueymard, 2008). An initial estimate of the clear-sky power curve (Psimcs) was made using β, γ, and LF and the quadratic PV model (Killinger et al., 2016a); note that the quadratic PV model was corrected and reintroduced by Killinger et al. (2017b). Psimcs is made available in the dataset, an example of which can be seen in Fig. 4 (top left). It can be seen that some PV system clear-sky power curves have a prominent afternoon peak, indicative of a west-facing azimuth. An initial pass of the QCPV routine on Pmeas (defined next in Sec. II B) was performed so that only valid first-pass measurements (Pqc) were retained. Satellite-derived global horizontal irradiance (Egh), beam normal irradiance (Ebn), and diffuse horizontal irradiance (Edh) for the corresponding PV time steps were taken from Solcast, a solar forecasting and resource assessment company; the derivation of the satellite-derived irradiance is described by Engerer et al. (2017) and Bright et al. (2018). Irradiance was then transposed onto the plane of array using the Reindl transposition model (Reindl et al., 1990) (all future described transpositions to the plane of array use the Reindl model). Satellite-derived irradiance has a temporal resolution of 10 min; hence, all datasets used within this paper (e.g., PV power measurements, ambient temperature, etc.) were aggregated to a corresponding resolution. We report time steps at 00:07, 00:17, etc., to account for the scan timing offset of the Himawari-8 satellite used to obtain Egh. With the initial Pqc, the estimated tilt (β̂), azimuth (γ̂), and LF are then derived through parameterization as per Sec. IVIII of Killinger et al. (2017b).

Figure 4 (top right) plots Pmeas for the ACT on 10 Jan 2017. The figure highlights the need for additional quality control to deal with the observed erratic and intuitively incorrect measurements.

B. Quality controlled PV measurements

The second data type available is PV power that has undergone the rigorous QCPV routine, Pqc. The QCPV routine was initially developed in Engerer (2015) and further developed and described in Killinger et al. (2017b), with the exception of the updated input data described in Sec. II A; additional QCPV tests were developed in Killinger et al. (2017a). The QCPV routine consists of six tests applied directly to Pmeas at each time step and three more systematic tests that use historic statistics and information from different nearby PV systems. All the tests are summarized in Table I.
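As an illustration of what a per-time-step test can look like, the following minimal sketch applies two simplified checks in the spirit of the "Sundown" and "Maximum kpv" tests. It is not the published QCPV implementation: the function names, the zenith limit of 90°, and the kpv ceiling of 1.3 are hypothetical values chosen only for the example.

```python
def sundown_test(p_meas, zenith_deg, zenith_limit_deg=90.0):
    """Force zero output when the zenith angle exceeds a limit.

    The 90 degree default is an illustrative assumption, not the QCPV value.
    """
    if p_meas is None:
        return None
    return 0.0 if zenith_deg > zenith_limit_deg else p_meas


def max_kpv_test(p_meas, p_simcs, kpv_max=1.3):
    """Flag (return None) when kpv = p_meas / p_simcs exceeds kpv_max.

    The 1.3 default is an illustrative assumption, not the QCPV value.
    """
    if p_meas is None or p_simcs is None or p_simcs <= 0:
        return p_meas  # kpv is undefined here, so leave the value untouched
    return None if p_meas / p_simcs > kpv_max else p_meas


# Toy samples: (normalized power, simulated clear-sky power, zenith angle in degrees)
samples = [(0.05, 0.0, 95.0), (0.90, 0.60, 40.0), (0.55, 0.60, 40.0)]
result = [max_kpv_test(sundown_test(p, z), cs) for p, cs, z in samples]
print(result)  # [0.0, None, 0.55]
```

The first sample is set to zero because the sun is below the assumed horizon limit, the second is flagged because its kpv of 1.5 exceeds the assumed ceiling, and the third passes unchanged.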

FIG. 2. Flow diagram of all the inputs (green parallelograms), the methodological steps (blue rectangles), and all the intermediary and final output variables (yellow diamonds). The diagram shows how the input PV power measurements (Pmeas) are transformed from the raw format, through various parameterizations and QCPV steps, to become tuned PV power (Ptuned). The clear-sky power data (Psimcs) and quality controlled power data (Pqc) are also provided in the data download.


Power data that do not pass QCPV are simply replaced by "NA" in the time series; this means that the data for Pqc already have the QCPV flags applied. A direct comparison of the Pmeas and Pqc time series indicates which time steps have been removed (e.g., if data exist in Pmeas but not in Pqc, then the time step was flagged). The QCPV routine overall reduces the amount of data for ACT from 9 157 671 to 5 707 504 (62.3%), for SA from 2 619 416 to 1 829 595 (69.9%), and for WA from 6 763 084 to 4 897 408 (72.4%). These percentages are not representative on a PV system by PV system basis, as certain PV systems report highly trustworthy data, while others can be entirely removed by QCPV. To demonstrate this, Fig. 3 shows the number of PV systems that underwent a certain percentage of data reduction after QCPV.
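The comparison between Pmeas and Pqc described above can be sketched in a few lines. This is a minimal illustration on a toy column for one hypothetical PV system; the helper names are assumptions, and only the "NA" convention is taken from the dataset description.

```python
def parse_power(cell):
    """Convert one CSV cell to float, mapping the dataset's "NA" string to None."""
    cell = cell.strip()
    return None if cell in ("NA", "") else float(cell)


def qc_flags(p_meas, p_qc):
    """True where a value exists in p_meas but was removed (NA) in p_qc."""
    return [m is not None and q is None for m, q in zip(p_meas, p_qc)]


def retained_fraction(p_meas, p_qc):
    """Share of the original measurements that survive quality control."""
    n_meas = sum(m is not None for m in p_meas)
    n_qc = sum(q is not None for q in p_qc)
    return n_qc / n_meas if n_meas else float("nan")


# Toy column for one hypothetical PV system (five time steps)
p_meas = [parse_power(c) for c in ["NA", "0.10", "0.95", "0.40", "0.00"]]
p_qc = [parse_power(c) for c in ["NA", "0.10", "NA", "0.40", "NA"]]

flags = qc_flags(p_meas, p_qc)
print(flags)                            # [False, False, True, False, True]
print(retained_fraction(p_meas, p_qc))  # 0.5
```

A time step that is missing in both series is not counted as flagged, since there was no measurement for QCPV to remove.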

We can observe the impact of QCPV by comparing the top right and bottom left panels of Fig. 4. Pqc clearly visualizes the removal of the intuitively erroneous data observed in Pmeas. A subtle increase in the mean power profile (thick yellow line) can be observed in the first half of the day.

C. Tuned PV measurements

Tuning is the third development arising from the distributed PV power dataset. Tuning corrects systematic biases that impact the PV system. Consider a PV system that is shaded for the first half of the day: every single morning, there will be a systematic reduction in power. This is undesirable in a dataset where representative spatial characteristics of irradiance are desired because the systematic effect on one PV system is not representative of another. To remove these systematic biases, a tuning algorithm was developed. The algorithm was initially developed in Engerer (2015) before being more fully developed in Killinger et al. (2016b) and Killinger et al. (2017a), following on from the QCPV development (Sec. II B). In summary, each PV system is analyzed over a rolling 30-day window, whereby the measurement at, for example, 10:17 am is compared with the 30 most recent previous measurements at 10:17 am. An assumption is made that at least one of the 30 measurements considered was made under clear-sky conditions. By taking the 99th percentile of the 30-measurement distribution, expressed as kpv = Pqc/Psimcs, we would expect to observe a value of 1 under perfectly clear conditions. Should 10:17 am never satisfy the definition of clear conditions, the time step is scaled toward clear-sky with the assumption that a systematic bias occurs at that measurement. This most often captures shading events and rapid soiling events (e.g., snowfall or dust); it also mitigates the effect of the LF.
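The idea behind the tuning step can be sketched as follows. This is a simplified illustration under stated assumptions, not the published algorithm of Killinger et al. (2017a): it assumes that the rescaling is a division by the 99th percentile of kpv from the preceding days at the same time of day, it uses a linear-interpolation percentile, and all function names are hypothetical.

```python
def percentile(values, q):
    """Linear-interpolation percentile (q in [0, 100]) of a non-empty list."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def tune_time_step(p_qc_today, p_qc_history, p_cs_history, q=99.0):
    """Rescale one measurement using kpv from earlier days at the same clock time.

    p_qc_history and p_cs_history hold up to 30 earlier values of Pqc and
    Psimcs for the same time of day; None marks a missing value.
    Dividing by the percentile is an assumption made for this sketch.
    """
    kpv = [p / cs for p, cs in zip(p_qc_history, p_cs_history)
           if p is not None and cs is not None and cs > 0]
    if p_qc_today is None or not kpv:
        return p_qc_today
    k_ref = percentile(kpv, q)
    if k_ref <= 0:
        return p_qc_today
    return p_qc_today / k_ref


# Toy system shaded every morning: even its clearest days only reach kpv = 0.8
p_cs_history = [0.5] * 30
p_qc_history = [0.4] * 10 + [0.2] * 20
tuned = tune_time_step(0.4, p_qc_history, p_cs_history)
print(tuned)
```

In this toy case the reference kpv is 0.8, so a shaded clear-sky reading of 0.4 is rescaled to approximately the simulated clear-sky value of 0.5.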

The tuned PV power, Ptuned, is shown in the bottom right panel of Fig. 4. The differences are subtle, as PV shading is typically minimized by the PV system owner to maximize profitability; however, we can observe significant alterations on a specific PV system time series. At the start and the end of the visualized day, the mean power undergoes a significant trough in Pqc. It is expected that many PV systems register a 99th percentile from the kpv distribution of the preceding 30 days that is substantially below a value of 1, indicative of morning/afternoon shading by trees, buildings, or the roof itself. The tuning algorithm rescales the distribution so that the measured time steps have that shading removed. This trough feature no longer appears in the Ptuned time series; hence, systematic influences have been removed.

TABLE I. A summary of the QCPV criteria (Killinger et al., 2017a; 2017b).

| Test | Criteria |
| --- | --- |
| Upper limit | Comparison of Pmeas against a theoretical upper limit based on extraterrestrial irradiance |
| Lower limit | Comparison of Pmeas against a theoretical lower limit and invariable detection (flat lines) |
| Sundown | Enforcing a "zero-output" on Pmeas above a certain zenith angle |
| Daily energy ratio | Removal of days with an unusually low amount of total energy generation |
| Maximum kpv | Limit of Pmeas against a maximum kpv = Pmeas/Psimcs (Engerer and Mills, 2014) |
| Persistence | Maximum limit on Pmeas variability and prohibition of zero power values for zenith angles <70° |
| Cloud enhancements | Suspected events must be corroborated by nearby PV systems and threshold values |
| Spurious data | Removal of days showing atypical behavior significantly different from normal operation |
| Benchmark violations | Removal of days where energy ratio deviation is significantly different from nearby systems |

FIG. 3. Histogram showing the amount of data removed as a result of QCPV on a PV system by PV system basis. There are 20 bins, and QCPV data removal is rounded to the nearest 5%. For example, nearly 100 PV systems had 100% of their data removed by QCPV.

III. DATA ACCESS

The data are available as a two-part download and are hosted by the ARENA project partner Solcast (https://solcast.com.au/rooftop-solar/publication-of-a-research-grade-solar-pv-power-dataset). Solcast is a global solar data services company that provides a considerable free allowance of access to solar radiation and PV datasets for research purposes. There are two files that must be downloaded and unzipped: (i) metadata and timestamps (https://s3-ap-southeast-2.amazonaws.com/solcast-site-downloads/Metadata_and_Timestamps.zip) and (ii) PV power measurements (https://s3-ap-southeast-2.amazonaws.com/solcast-site-downloads/PvPow+for+WA%2C+SA%2C+ACT.zip). The user can either navigate to the above Solcast link and locate the zipped files, or they can simply click the two hyperlinks to start the download. The metadata and timestamp file is 0.95 MB in size, and the PV power data for all three cities is 539 MB. The dataset is listed at ANU Data Commons (https://dx.doi.org/10.25911/5ca6a0640869a) and mirrored, after an initial embargo period, on Zenodo (https://zenodo.org/record/2635887), both with the same DOI: 10.25911/5ca6a0640869a.

IV. DATA USAGE RIGHTS

The data are released with bespoke terms. We state the crucial elements of these terms here. The dataset is freely provided to researchers as is, with no guarantee of support. The dataset is not for commercial usage but for research only. You are empowered to use this dataset however you wish in your research through direct usage, adaptation, or improvements to the data itself. The data must not be redistributed, and the access point for the data is exclusively through the website as described in Sec. III. Should you make significant changes to the data and wish to redistribute the new data, explicit permission must be obtained from the authors. Finally, appropriate accreditation to the creators must be made in all publications and outputs that arise from using this dataset in any way. To appropriately accredit the creators, we require that this exact data article is referenced alongside its DOI: https://dx.doi.org/10.25911/5ca6a0640869a. Additionally, if using the QC version of the data, we also require a citation for the original papers detailing QCPV (Killinger et al., 2016a; 2017b). Furthermore, if using the tuned PV version of this data, we also require a citation for both the QCPV papers above and the PV tuning papers (Killinger et al., 2016b; 2017a; 2017b) for full visibility of the data provenance. Lastly, the original hosts of this data, PVOUTPUT.ORG, should be recognized for their efforts.

FIG. 4. Examples of the time series of four different PV power data available for download. Plotted for the ACT dataset for 10 Jan 2017, selected to demonstrate both clear-sky and cloudy periods. This figure is produced in the matlab_example.m script provided. (Top left) the simulated clear-sky power curve Psimcs, (top right) original raw Pmeas, (bottom left) output after QCPV Pqc, and (bottom right) output after tuning Ptuned. The thick yellow line represents the mean of all the data.

V. DATA FORMAT

The two downloaded zip files must be unzipped to extract the comma separated value (csv) files. Many free tools can be used to unzip the downloaded files, e.g., 7-zip, PeaZip, WinRAR, etc. The data are delimited by a semicolon and use the string NA for missing or filtered data.
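As an alternative to a desktop unzip tool, the archives can be extracted with a short script. The sketch below uses only the Python standard library; the function name and the target folder are assumptions, and the archive paths should be whatever names the two downloads were saved under.

```python
import zipfile
from pathlib import Path


def extract_archives(archive_paths, target_dir="."):
    """Unzip each archive into target_dir and return the extracted file names."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    extracted = []
    for archive in archive_paths:
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(target)
            extracted.extend(zf.namelist())
    return sorted(extracted)
```

A call such as extract_archives(["Metadata_and_Timestamps.zip"], target_dir="pv_data") would place the csv files in a pv_data folder, where pv_data is a hypothetical name.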

Once extracted, there are 18 files representing 6 types of file per location provided. These files are the metadata for each system, the timestamps, and four levels of PV power data. We provide data for the Australian Capital Territory (ACT), South Australia (SA), and Western Australia (WA); most notably, this covers the cities of Canberra, Adelaide, and Perth, respectively. The full formatting of the files is detailed in Table II.

To summarize, the metadata and timestamp files are essential in order to find the appropriate column and the time reference within the power files. Each column in the power files represents a different PV system, and the header row details the identity number (ID) for that column so that the corresponding metadata can be found in the metadata file. Every row in the power files corresponds to a 10-min timestamp that can be found in the equivalent row within the time file. This means that each row is a single time step and is common between all PV systems found within the power file. The power is reported in kW/kWp, and so to get power in kilowatts, one must multiply the whole column by the capacity found in the metadata file for the corresponding ID. The purpose of this is to normalize the data so that all PV systems can be directly compared to each other. Were one to plot a heatmap of all the sites for a single row, it would be possible to see the probable influence of clouds on the ground for that exact time step (Bright et al., 2018).
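The relationship between the three file types can be sketched with the Python standard library. The example below works on small in-memory stand-ins rather than the real files: the header names, the local-time format, and the toy values are assumptions, while the semicolon delimiter, the NA convention, and the column positions (ID in the 1st column and capacity in the 4th column of the metadata file) follow the description in this section.

```python
import csv
import io
from datetime import datetime, timezone


def read_semicolon_csv(handle):
    """Return (header, rows) from a semicolon-delimited file; "NA" cells become None."""
    reader = csv.reader(handle, delimiter=";")
    header = [h.strip() for h in next(reader)]
    rows = [[None if c.strip() == "NA" else c.strip() for c in row] for row in reader]
    return header, rows


def capacities_by_id(meta_rows):
    """Map system ID (1st column) to installed capacity in kWp (4th column)."""
    return {row[0]: float(row[3]) for row in meta_rows}


def power_in_kw(power_header, power_rows, capacity):
    """Convert normalized power [kW/kWp] to kW, keeping None for missing values."""
    return [[None if cell is None else float(cell) * capacity[sid]
             for sid, cell in zip(power_header, row)]
            for row in power_rows]


# In-memory stand-ins for one region's metadata, power, and timestamp files
META = ("ID;lat;lon;capacity_kwp;installed;az;tilt;az_est;tilt_est;LF\n"
        "101;-35.3;149.1;5.0;2015-01-01;0;20;5;22;0.9\n"
        "102;-35.2;149.2;3.0;2014-06-01;90;15;85;18;0.85\n")
POWER = "101;102\n0.50;NA\n0.75;0.25\n"
TIMES = "unixtime;localtime\n1484006820;2017-01-10 11:07\n1484007420;2017-01-10 11:17\n"

_, meta_rows = read_semicolon_csv(io.StringIO(META))
ids, power_rows = read_semicolon_csv(io.StringIO(POWER))
_, time_rows = read_semicolon_csv(io.StringIO(TIMES))

kw = power_in_kw(ids, power_rows, capacities_by_id(meta_rows))
times = [datetime.fromtimestamp(int(r[0]), tz=timezone.utc) for r in time_rows]

print(kw)        # [[2.5, None], [3.75, 0.75]]
print(times[0])  # 2017-01-10 00:07:00+00:00
```

Row i of the power table and row i of the timestamp table describe the same time step, so the two lists can be paired directly. When reading the real files, the handles would come from open(path, newline="") instead of io.StringIO.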

VI. DATA HANDLING AND EXAMPLES

We provide three accompanying scripts that contain some typical approaches to loading and exploring the data. Three scripting languages are selected due to their (i) popularity, (ii) convention, and (iii) our expertise: Python, R, and Matlab®. While Matlab is proprietary software, it is widely used in many academic and private research institutions. R and Python are freely available. The three scripts are

  • matlab_example.m
  • R_example.R
  • python_example.py

TABLE II. Format of the downloaded data files and a description of the data within.

| Field/file | Variable | Description | Unit |
| --- | --- | --- | --- |
| Metadata | | | |
| metadata_final_ | … | File name, followed by ACT/SA/WA | … |
| 1st column | ID | Identity number from PVOUTPUT.ORG | … |
| 2nd column | Θ | Latitude | [°] |
| 3rd column | Φ | Longitude | [°] |
| 4th column | … | Installed capacity of systems | [kWp] |
| 5th column | … | Installation date | … |
| 6th column | γ | Azimuth angle of modules, reported | [°] |
| 7th column | β | Tilt angle of modules, reported | [°] |
| 8th column | γ̂ | Azimuth angle of modules, estimated | [°] |
| 9th column | β̂ | Tilt angle of modules, estimated | [°] |
| 10th column | LF | Loss factor | … |
| 1st row | … | Header | … |
| Every other row | … | Values for each system | … |
| Timestamps | | | |
| unixtime_final_ | … | File name, followed by ACT/SA/WA | … |
| 1st column | t | Time in Unixtime format (10-min res) | [UTC] |
| 2nd column | t | Time at the locality (10-min res) | [Local time] |
| 1st row | … | Header | … |
| Every other row | … | Corresponding timestamp to row in PV power file | … |
| PV power time series | | | |
| PVpow_trim_ | Pmeas | File with original PV power measurements, followed by ACT/SA/WA | [kW/kWp] |
| PVpow_qc_trim_ | Pqc | File with quality controlled PV power measurements, followed by ACT/SA/WA | [kW/kWp] |
| PVpow_tuned_ | Ptuned | File with tuned PV power measurements, followed by ACT/SA/WA | [kW/kWp] |
| PVpow_cs_trim_ | Psimcs | File with simulated clear-sky PV power, followed by ACT/SA/WA | [kW/kWp] |
| Every column | … | A time series from a different PV system | [kW/kWp] |
| 1st row | ID | Identity number from the corresponding metadata file | … |
| Every other row | … | Normalized PV power time series corresponding to timestamps | [kW/kWp] |

The scripts are designed to be placed within the same folder as the downloaded and extracted data files. Each script is well commented and shows how to open and load the information as well as how to plot some of the information. The intention is to demonstrate simple usability to encourage and benefit those who are less familiar with the languages. As an example, Fig. 4 was produced using matlab_example.m.

The csv files can also be opened directly in Microsoft Excel (not recommended); however, Excel does not recognize the native format. As such, the user must use the "Text to Columns" feature with the "delimiter" and "semicolon" settings activated in order to have all the data in appropriate rows and columns. Furthermore, the data use the convention of the string NA to represent missing data. Excel requires the format to be "#N/A"; therefore, a "find & replace" should be performed, swapping NA for #N/A. It is recommended that this be done in an alternative computer program (e.g., Notepad/Notepad++) for computational efficiency, as Excel is slow in this regard.
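The same conversion can also be scripted. The following is a minimal sketch, assuming that a comma-delimited copy with #N/A markers suits the local Excel settings (some locales expect a different list separator); the function name is hypothetical.

```python
import csv
import io


def to_excel_friendly(src, dst, missing_in="NA", missing_out="#N/A"):
    """Copy a semicolon-delimited table to comma-delimited form, rewriting missing markers."""
    reader = csv.reader(src, delimiter=";")
    writer = csv.writer(dst, delimiter=",", lineterminator="\n")
    for row in reader:
        writer.writerow([missing_out if c.strip() == missing_in else c for c in row])


# Toy power table with two systems and two time steps
src = io.StringIO("101;102\n0.50;NA\nNA;0.25\n")
dst = io.StringIO()
to_excel_friendly(src, dst)
converted = dst.getvalue()
print(converted)
```

Only whole cells equal to NA are rewritten, so an ID or value that merely contains those letters is left untouched.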

VII. DISCUSSION AND SUMMARY

The described dataset has already been an important feature of many published research papers. We also highlight use cases where these data could be applied in future research. In the following, a short overview of typical use cases (previous and potential) is given.

First, we discuss research that has already leveraged the presented dataset. The quality and the state of PV power datasets are often a significant barrier for operational usage; this motivated the development of a parameterization and quality control approach (Sec. II B). To balance systematic influences like shading, a tuning approach was developed on top of that (Sec. II C). These approaches were then compared to LiDAR-based approaches in order to assess the impact of LiDAR spatial resolution (Lingfors et al., 2017) and to corroborate the accuracy of reported metadata vs LiDAR approximated metadata and parameterized metadata (Lingfors et al., 2018). The metadata from this release were used to study the characteristics of PV systems in Australia alongside many other countries (Killinger et al., 2018). The spatially distributed nature of the PV power dataset enabled advanced nowcasting techniques, using the PV power data as feedback to a conventional satellite nowcasting methodology (Bright et al., 2018).

In terms of research potential, there is considerable scope and opportunity that can be derived from the dataset. Exploration into battery storage optimization and grid integration could be performed with the dataset (Coimbra, 2019). Information on the spatio-temporal relationship of solar energy and PV power generation is also facilitated; there have been many recent studies on the collocation of electric vehicles and how the spatial distribution of PV power could influence charging effectiveness (Shepero et al., 2018 and Lingfors et al., 2018). Regional PV power modeling (Saint-Drenan et al., 2018), PV aggregation (Lingfors and Widén, 2016), and upscaling studies (Killinger, 2017) have an interesting potential from such a dataset. There are also interesting opportunities for spatial time series data visualization, PV power forecasting development and testing, as well as a component within hierarchical modeling.

Readers of this paper who are inclined to use this PV power dataset would also benefit from alternative and accompanying global datasets. For example, in Bright et al. (2018), power data were accompanied by Himawari-8 satellite imagery and numerical weather prediction models in order to derive irradiance forecasts. Many useful accompanying datasets are available from the National Computational Infrastructure (NCI), which is supported by the Australian Government (NCI, 2019a; 2019b); this should be the first point of reference for Australian-centric, relevant, and well-suited data, including satellite imagery. Weather variables can be obtained from reanalysis datasets. In the authors' opinion, the best available reanalysis dataset is the Modern-Era Retrospective analysis for Research and Applications, Version 2 by NASA (MERRA-2, https://gmao.gsfc.nasa.gov/reanalysis/MERRA-2); this is because of the high temporal resolution of 1 h, the reasonable spatial resolution of 50 km² latitudinally, and the significant historical coverage from 1980 to the present day.

In conclusion, we hope that this data article proves a useful guide to understanding and easily accessing a dataset of considerable research value.

SUPPLEMENTARY MATERIAL

See supplementary material for scripts in Matlab, R, and Python to demonstrate possible ways of working with the data.

ACKNOWLEDGMENTS

Dr. Jamie M. Bright and Dr. Nicholas A. Engerer were funded by the Australian Renewable Energy Agency (ARENA, Research and Development Program Funding G00854). We would like to extend our gratitude to PVoutput.org for operating a valuable website and for supporting this publication with written consent to republish a similar, yet transformed, dataset derived from their efforts.

Dr. Nicholas A. Engerer is also the CTO of Solcast, which is the host of the provided dataset.

REFERENCESBright J M Babacan O Kleissl J Taylor P G and Crook R ldquoA syntheticspatially decorrelating solar irradiance generator and application to a LV gridmodel with high PV penetrationrdquo Sol Energy 147 83ndash98 (2017)

Bright J M Killinger S Lingfors D and Engerer N A ldquoImproved satellite-derived PV power nowcasting using real-time power data from reference PVsystemsrdquo J Sol Energy 168 118ndash139 (2018)

Coimbra C F M ldquoLooking ahead with the Journal of Renewable andSustainable Energy Volume 11 and beyondrdquo J Renewable Sustainable Energy11 010401 (2019)

Engerer N A ldquoCity-wide simulations of distributed photovoltaic array poweroutputrdquo Doctor of Philosophy (The Australian National University 2015)

Engerer N A and Mills F P ldquoKPV A clear-sky index for photovoltaicsrdquo SolEnergy 105 679ndash693 (2014)

Engerer N A Bright J M and Killinger S ldquoHimawari 8 enabled real-time dis-tributed PV simulations for distribution networksrdquo in PVSC44 (WashingtonDC 2017)

Gueymard C A ldquoREST2 High-performance solar radiation model forcloudless-sky irradiance illuminance and photosynthetically active radiationValidation with a benchmark datasetrdquo Sol Energy 82 272ndash285 (2008)

Jones R Bright J and Engerer N A ldquoSolar forecasting for low voltage networkoperatorsrdquo Technical Report (Oakley Greenwood on behalf of ARENA 2018)

Journal of Renewableand Sustainable Energy ARTICLE scitationorgjournalrse

J Renewable Sustainable Energy 11 035504 (2019) doi 10106315094059 11 035504-7

Published under license by AIP Publishing

Killinger, S., "Evaluating different upscaling approaches to derive the actual power of distributed PV systems," in 44th Photovoltaic Specialist Conference (PVSC44), Washington, DC, USA, 2017.

Killinger, S., Braam, F., Müller, B., Wille-Haussmann, B., and McKenna, R., "Projection of power generation between differently-oriented PV systems," Sol. Energy 136, 153–165 (2016a).

Killinger, S., Müller, B., Saint-Drenan, Y. M., and McKenna, R., "Towards an improved nowcasting method by evaluating power profiles of PV systems to detect apparently atypical behavior," in Conference Record of the IEEE Photovoltaic Specialists (2016b), pp. 980–985.

Killinger, S., Bright, J. M., Lingfors, D., and Engerer, N. A., "A tuning routine to correct systematic influences in reference PV systems' power outputs," Sol. Energy 157, 1082–1094 (2017a).

Killinger, S., Engerer, N., and Müller, B., "QCPV: A quality control algorithm for distributed photovoltaic array power output," Sol. Energy 143, 120–131 (2017b).

Killinger, S., Lingfors, D., Saint-Drenan, Y.-M., Moraitis, P., Sark, W. V., Taylor, J., Engerer, N. A., and Bright, J. M., "On the search for representative characteristics of PV systems: Data collection and analysis of PV system azimuth, tilt, capacity, yield and shading," Sol. Energy 173, 1087–1106 (2018).

Lave, M., Reno, M. J., and Broderick, R. J., "Characterizing local high-frequency solar variability and its impact to distribution studies," Sol. Energy 118, 327–337 (2015).

Lingfors, D. and Widen, J., "Development and validation of a wide-area model of hourly aggregate solar power generation," Energy 102, 559–566 (2016).

Lingfors, D., Bright, J. M., Engerer, N. A., Ahlberg, J., Killinger, S., and Widen, J., "Comparing the capability of low- and high-resolution LiDAR data with application to solar resource assessment, roof type classification and shading analysis," Appl. Energy 205, 1216–1230 (2017).

Lingfors, D., Killinger, S., Engerer, N. A., Widen, J., and Bright, J. M., "Identification of PV system shading using a LiDAR-based solar resource assessment model: An evaluation and cross-validation," Sol. Energy 159, 157–172 (2018).

Lingfors, D., Shepero, M., Good, C., Bright, J., Widen, J., Boström, T., and Munkhammar, J., "Modelling city scale spatio-temporal solar energy generation and electric vehicle charging load," in 8th International Workshop on Integration of Solar into Power Systems, Stockholm, Sweden, 2018, pp. 1–8.

Munkhammar, J. and Widen, J., "An N-state Markov-chain mixture distribution model of the clear-sky index," Sol. Energy 173, 487–495 (2018).

Munkhammar, J., Widen, J., and Hinkelman, L. M., "A copula method for simulating correlated instantaneous solar irradiance in spatial networks," Sol. Energy 143, 10–21 (2017).

NCI, National Computational Infrastructure Data Access (NCI, 2019a).

NCI, NCI Data Portal, NCI Catalogue (NCI, 2019b).

Reindl, D. T., Beckman, W. A., and Duffie, J. A., "Evaluation of hourly tilted surface radiation models," Sol. Energy 45, 9–17 (1990).

Saint-Drenan, Y.-M., Wald, L., Ranchin, T., Dubus, L., and Troccoli, A., "An approach for the estimation of the aggregated photovoltaic power generated in several European countries from meteorological data," Adv. Sci. Res. 15, 51–62 (2018).

Shepero, M., Munkhammar, J., Widen, J., Bishop, J. D., and Boström, T., "Modeling of photovoltaic power generation and electric vehicles charging on city-scale: A review," Renewable Sustainable Energy Rev. 89, 61–71 (2018).

Yang, D., "SolarData: An R package for easy access of publicly available solar datasets," Sol. Energy 171, A3–A12 (2018).


Data article: Distributed PV power data for three cities in Australia

Cite as: J. Renewable Sustainable Energy 11, 035504 (2019); doi: 10.1063/1.5094059
Submitted: 26 February 2019. Accepted: 16 April 2019. Published Online: 28 May 2019.

Jamie M. Bright,1,a) Sven Killinger,1,2 and Nicholas A. Engerer1

AFFILIATIONS
1 Fenner School of Environment and Society, The Australian National University, Canberra, ACT, Australia
2 Division Smart Grids, Fraunhofer-Institute for Solar Energy Systems ISE, 79110 Freiburg, Germany

a) Electronic addresses: jamie.bright@anu.edu.au and jamiebright1@gmail.com

ABSTRACT

We present a publicly available dataset containing photovoltaic (PV) system power measurements and metadata from 1287 residential installations across three states/territories in Australia, though mainly for the cities of Canberra, Perth, and Adelaide. The data are recorded between September 2016 and March 2017 at 10-min temporal resolution and consist of real inverter-reported power measurements from PV systems that are well distributed throughout each city. The dataset represents a considerably valuable resource, as public access to spatio-temporal PV power data is almost nonexistent; this dataset has already been used in numerous articles by the authors. The PV power data are free to download and are available in their raw, quality controlled (QC), and "tuned" formats. Each PV system is accompanied by individual metadata including geolocation, user reported metadata, and simulated parameterization. Data provenance, download, usage rights, and example usage are detailed within. Researchers are encouraged to leverage this rich spatio-temporal dataset of distributed PV power data in their research.

Published under license by AIP Publishing. https://doi.org/10.1063/1.5094059

I. INTRODUCTION

Globally, the installed capacity of photovoltaic (PV) systems is rapidly increasing. Very few of these systems continuously report their power generation data; fewer still publicly expose the data. Access to PV power generation data from a multitude of systems within a specific area (e.g., a single section of the distribution grid or an entire city) is of scientific research interest for a host of applications such as PV grid modeling or improved solar forecasting. The crucial feature of interest is the spatial distribution of such PV systems. While access to individual PV profiles from around the world is possible, the cross correlations of PV power fluctuations can have interesting impacts on the electricity grid, which are otherwise ignored (Bright et al., 2017; Munkhammar et al., 2017; and Lave et al., 2015). Very few datasets featuring a rich, spatially diverse dimension exist for public use; of those available, many are used and reviewed by Munkhammar and Widen (2018). Ultimately, there is a dearth of spatio-temporal PV data in the public domain.

In our project funded by the Australian Renewable Energy Agency (ARENA) (Jones et al., 2018), we have collected a significant number of PV system power measurements for three cities in Australia for public release: 393 PV systems in South Australia (SA), 531 PV systems in Western Australia (WA), and 363 PV systems in the Australian Capital Territory (ACT). While data are available from sites across the whole of these states/territories of Australia, predominantly the data are from the three cities of Adelaide, Perth, and Canberra, hence the title of the article. Furthermore, we concurrently provide the resulting power measurements after state-of-the-art quality control (QC) tests and subsequent tuning (Sec. II C) to make the PV systems representative. Following the recommendations to ease the burden of data access and reproducibility in the field of solar energy (Yang, 2018), we are presenting our dataset in the form of a data article to the Journal of Renewable and Sustainable Energy, along with sample code that enables the user to access the data and begin their research in an efficient manner.

The main objective of this paper is to make readers aware of a valuable and freely available dataset of distributed PV data from three cities within Australia. Our aims are to inform the reader of all the implications and nuances of the dataset itself, how to download the data, and the terms of usage, as well as to provide simple code examples of how to begin using the data. A simple overview of the spatial distribution of the PV systems is illustrated in Fig. 1.

The background related to data provenance is discussed in Sec. II. All the information required to access and download the data is detailed in Sec. III. User rights and expectations are stated in Sec. IV. Details of the data structure and format are found in Sec. V. Some example code and scripts of how to access and use the data are provided in Sec. VI. A discussion of the data and its potential uses, as well as closing statements, is made in Sec. VII.

II. DATA PROVENANCE

This section details all the information regarding the dataset, from where the data were first obtained all the way through to their being available for download. This section is, in essence, a methodology for processing raw PV power measurements. It is subcategorized into three main sections: raw PV measurements and simulated clear-sky PV output (Sec. II A), QC of PV measurements (QCPV, Sec. II B), and tuning of PV measurements (Sec. II C).

An overview flowchart of the methodology is included for reference in Fig. 2.

A. Raw PV power measurements and metadata

The original raw PV power measurements were collected at 5-min resolution from http://pvoutput.org, a public website where users are able to report their PV power generation using automated data loggers that are pre-existing in participating inverter models. The Australian National University (ANU) has an active subscription to PVOUTPUT.ORG at the time of publication, and this subscription shall be continued by the data hosts, Solcast (see Sec. III), at the close of the ANU project. For our usage, we transformed the original PV power data to 10-min intervals to match the Himawari-8 satellite scan timing over Australia (00:07, 00:17, 00:27, etc.). We do not state how this was achieved, in order to protect the original product from PVOUTPUT.ORG; ultimately, the data are not "raw" from PVOUTPUT.ORG but a modified time average that nevertheless represents raw data. The data available span a period from September 2016 through March 2017. While this data release does not span a full year, it includes a vast number of residential PV installations during the Spring and Summer seasons. PV power measurements are normalized by the reported rated total module capacity of the respective system so that all power values Pmeas are reported in W/Wp, which enables an easier comparison between systems. To obtain the original reported power, the user must multiply by the respective capacity.

Each user reports their PV system metadata in terms of tilt (b), azimuth (c), and the loss factor (LF). A critical concern is the known error in metadata reporting. This issue is widely discussed in Killinger et al. (2018). We provide the original user reported metadata; however, we also provide simulated PV metadata as derived from a parameterization technique, which derives the most likely PV system metadata from the power time series.

FIG. 1. A map of Australia highlighting Perth, WA; Adelaide, SA; and Canberra, ACT. Each red marker represents the location of a PV system within the database. Longitude and latitude are reported in degrees.

The parameterization methodology is well defined in Killinger et al. (2017a; 2017b) (Sec. IV) and so is not fully reproduced here. In a cross-validation study against LiDAR data, these estimated parameterizations were found to be reasonably accurate over the ACT (Lingfors et al., 2018).

Differences in the previously described methodology are due to the choice of input data. In summary, the parameterization calculated clear-sky beam and diffuse irradiance from climatological data and reanalyses datasets in combination with the REST2 clear-sky irradiance model (Gueymard, 2008). An initial estimate of the clear-sky power curve (Psimcs) was made using b, c, and LF and the Quadratic PV model (Killinger et al., 2016a); note that the quadratic PV model was corrected and reintroduced by Killinger et al. (2017b). Psimcs is made available in the dataset, an example of which can be seen in Fig. 4 (top left). It can be seen that there are some PV system clear-sky power curves that have a prominent afternoon peak, indicative of a west facing azimuth. An initial pass of the QCPV routine on Pmeas (defined next in Sec. II B) is performed so that only valid first pass measurements, Pqc, were retained. Satellite-derived global horizontal irradiance (Egh), beam normal irradiance (Ebn), and diffuse horizontal irradiance (Edh) for the corresponding PV time steps were taken from Solcast, a solar forecasting and resource assessment company; the derivation of the satellite-derived irradiance is described by Engerer et al. (2017) and Bright et al. (2018). Irradiance was then transposed onto the plane of array using the Reindl transposition model (Reindl et al., 1990) (all future described transpositions to the plane of array use the Reindl model). Satellite-derived irradiance has a temporal resolution of 10 min; hence, all datasets used within this paper (e.g., PV power measurements, ambient temperature, etc.) were aggregated to a corresponding resolution. We report time steps at 00:07, 00:17, etc., to account for the scan timing offset of the Himawari-8 satellite used to obtain Egh. With the initial Pqc, the estimated tilt bQ, azimuth cQ, and LF are then derived through parameterization as per Sec. IVIII of Killinger et al. (2017b).

Figure 4 (top right) plots Pmeas for the ACT on 10 Jan 2017. The figure highlights the need for additional quality control to deal with the observed erratic and intuitively incorrect measurements.

B. Quality controlled PV measurements

The second data type available is PV power that has undergone the rigorous QCPV routine, Pqc. The QCPV routine was initially developed in Engerer (2015) and further developed and described in Killinger et al. (2017b); with the exception of updated input data described in Sec. II A, additional QCPV tests were developed in Killinger et al. (2017a). The QCPV routine consists of six tests directly on Pmeas of each time step and three more systematic tests that use historic statistics and information from different nearby PV systems. All the tests are summarized in Table I.

FIG. 2. Flow diagram of all the inputs (green parallelograms), the methodological steps (blue rectangles), and all the intermediary and final output variables (yellow diamonds). The diagram shows the procedures by which the input PV power measurements (Pmeas) are transformed from the raw format, through various parameterizations and QCPV steps, to become tuned PV power (Ptuned). The clear-sky power data (Psimcs) and quality controlled power data (Pqc) are also provided in the data download.

Power data that do not pass QCPV are simply replaced by "NA" in the time series; this means that the data for Pqc already have the QCPV flags applied. A direct comparison of the Pmeas and Pqc time series would indicate which time steps have been removed (e.g., if data exist in Pmeas but not in Pqc, then it was flagged). The QCPV routine overall reduces the amount of data for ACT from 9 157 671 to 5 707 504 values (62.3% retained), for SA from 2 619 416 to 1 829 595 (69.9% retained), and for WA from 6 763 084 to 4 897 408 (72.4% retained). These percentages are not representative on a PV system by PV system basis, as certain PV systems report data with excellent trust, while others can be entirely removed by QCPV. To demonstrate this, Fig. 3 shows the number of PV systems that underwent a certain percentage of data reduction after QCPV.
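The comparison of Pmeas and Pqc described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two series are taken to be already loaded as row-aligned lists, `None` stands in for the NA string, and the function name `flagged_steps` and the sample values are hypothetical rather than taken from the accompanying scripts.

```python
def flagged_steps(p_meas, p_qc):
    """Return (row, column) pairs where a value exists in Pmeas but is missing in Pqc."""
    return [
        (r, c)
        for r, (meas_row, qc_row) in enumerate(zip(p_meas, p_qc))
        for c, (m, q) in enumerate(zip(meas_row, qc_row))
        if m is not None and q is None
    ]


# Made-up sample: rows are 10-min time steps, columns are PV systems.
p_meas = [[0.10, 0.50], [0.90, None], [0.30, 0.40]]
p_qc = [[0.10, 0.50], [None, None], [0.30, None]]

flags = flagged_steps(p_meas, p_qc)

# Share of raw values that survive QCPV, analogous to the percentages quoted above.
total = sum(v is not None for row in p_meas for v in row)
retained = sum(v is not None for row in p_qc for v in row)
print(flags, f"{100 * retained / total:.1f}% retained")
```

A value that is already missing in Pmeas is not counted as flagged, since QCPV did not remove it.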

We can observe the impact of QCPV by comparing the top right and bottom left panels of Fig. 4. Pqc clearly visualizes the removal of the intuitively erroneous data observed in Pmeas. A subtle increase in the mean power profile (thick yellow line) can be observed in the first half of the day.

C. Tuned PV measurements

Tuning is the third development arising from the distributed PV power dataset. Tuning corrects systematic biases that impact the PV system. Consider a PV system that is shaded for the first half of the day; every single morning, there will be a systematic reduction in power. This is undesirable in a dataset where representative spatial characteristics of irradiance are desired because the systematic effect on one PV system is not representative of another. To remove these systematic biases, a tuning algorithm was developed. These algorithms were initially developed in Engerer (2015) before being more fully developed in Killinger et al. (2016b) and Killinger et al. (2017a), following on from the QCPV development (Sec. II B). In summary, each PV system is analyzed over a rolling 30-day window, whereby, for example, the measurement at 10:17 am is compared with the 30 most recent previous measurements at 10:17 am. An assumption is made that at least one of the 30 measurements considered was made under clear-sky conditions. By taking the 99th percentile of the 30-measurement distribution, expressed as kpv = Pqc/Psimcs, we would expect to observe a value of 1 under perfectly clear conditions. Should 10:17 am never satisfy the definition of clear conditions, the time step is scaled toward clear-sky with the assumption that a systematic bias occurs at that measurement. This most often captures shading events and rapid soiling events (e.g., snowfall or dust); it also mitigates for LF.
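To make the rolling-window idea concrete, the following Python sketch rescales one time step by the 99th percentile of kpv over the preceding same-time-of-day measurements. It is not the published tuning algorithm of Killinger et al. (2017a): the simple division by the reference kpv, the interpolated percentile, and the names `percentile` and `tune_time_step` are all assumptions made purely for illustration.

```python
def percentile(values, q):
    """Percentile (q in [0, 100]) with linear interpolation between closest ranks."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)


def tune_time_step(p_qc_now, history, q=99.0):
    """Rescale one measurement using previous same-time-of-day data.

    history holds (Pqc, Psimcs) pairs, e.g. the 30 previous days at 10:17 am.
    Assumption: dividing by the reference kpv stands in for "scaling toward clear-sky".
    """
    kpv = [p / cs for p, cs in history if p is not None and cs > 0]
    if p_qc_now is None or not kpv:
        return p_qc_now  # nothing to tune, or no usable history
    k_ref = percentile(kpv, q)
    return p_qc_now / k_ref if k_ref > 0 else p_qc_now


# Made-up shaded system: kpv never exceeds about 0.6 at this time of day.
history = [(0.24, 0.8)] * 20 + [(0.48, 0.8)] * 10
tuned = tune_time_step(0.24, history)
print(round(tuned, 3))
```

In this sketch, a system whose reference kpv is close to 1 is left essentially unchanged, whereas a reference value above 1 (for instance, from a misreported loss factor) scales the measurement down.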

The tuned PV power, Ptuned, is shown in the bottom right panel of Fig. 4. The differences are subtle, as PV shading is typically minimized by the PV system owner to maximize profitability; however, we can observe significant alterations on a specific PV system time series. At the start and the end of the visualized day, the mean power undergoes a significant trough in Pqc. It is expected that many PV systems register a 99th percentile from the kpv distribution of the preceding 30 days that is substantially below a value of 1, indicative of morning/afternoon shading by trees, buildings, or from the roof itself. The tuning algorithm rescales the distribution so that the measured time steps have that shading removed. This trough feature no longer appears in the Ptuned time series; hence, systematic influences have been removed.

III. DATA ACCESS

The data are available as a two-part download and are hosted by the ARENA project partner Solcast (https://solcast.com.au/rooftop-solar/publication-of-a-research-grade-solar-pv-power-dataset). Solcast is a global solar data services company that provides a considerable free allowance of access to solar radiation and PV datasets for research purposes. There are two files that must be downloaded and unzipped: (i) metadata and timestamps (https://s3-ap-southeast-2.amazonaws.com/solcast-site-downloads/Metadata_and_Timestamps.zip) and (ii) PV power measurements (https://s3-ap-southeast-2.amazonaws.com/solcast-site-downloads/PvPow+for+WA%2C+SA%2C+ACT.zip). The user can either navigate to the above Solcast link and locate the zipped files, or they can simply click the two hyperlinks to start the download. The metadata and timestamp file is 0.95 MB in size, and the PV power data for all three cities is 539 MB. The dataset is listed at ANU Data Commons (https://dx.doi.org/10.25911/5ca6a0640869a) and mirrored, after an initial embargo period, on Zenodo (https://zenodo.org/record/2635887), both with the same DOI: 10.25911/5ca6a0640869a.

TABLE I. A summary of the QCPV criteria (Killinger et al., 2017a; 2017b).

Test | Criteria
Upper limit | Comparison of Pmeas against a theoretical upper limit based on extraterrestrial irradiance
Lower limit | Comparison of Pmeas against a theoretical lower limit and invariable detection (flat lines)
Sundown | Enforcing a "zero-output" on Pmeas above a certain zenith angle
Daily energy ratio | Removal of days with an unusually low amount of total energy generation
Maximum kpv | Limit of Pmeas against a maximum kpv = Pmeas/Psimcs (Engerer and Mills, 2014)
Persistence | Maximum limit on Pmeas variability and prohibition of zero power values for zenith angles <70°
Cloud enhancements | Suspected events must be corroborated by nearby PV systems and threshold values
Spurious data | Removal of days showing atypical behavior significantly different from normal operation
Benchmark violations | Removal of days where energy ratio deviation is significantly different from nearby systems

FIG. 3. Histogram showing the amount of data removed as a result of QCPV on a PV system by PV system basis. There are 20 bins, and QCPV data removal is rounded to the nearest 5%. For example, nearly 100 PV systems had 100% of their data removed by QCPV.

IV. DATA USAGE RIGHTS

The data are released with bespoke terms. We state the crucial elements of these terms here. The dataset is freely provided to researchers as is, with no guarantee of support. The dataset is not for commercial usage but for research only. You are empowered to use this dataset however you wish in your research through direct usage, adaptation, or improvements to the data itself. The data must not be redistributed, and the access point for the data is exclusively through the website as described in Sec. III. Should you make significant changes to the data and wish to redistribute the new data, explicit permission must be obtained from the authors. Finally, appropriate accreditation to the creators must be made in all publications and outputs that arise from using this dataset in any way. To appropriately accredit the creators, we require that this exact data article is referenced alongside its DOI: https://dx.doi.org/10.25911/5ca6a0640869a. Additionally, if using the QC version of the data, we also require a citation for the original papers detailing QCPV (Killinger et al., 2016a; 2017b). Furthermore, if using the tuned PV version of this data, we also require a citation for both the QCPV papers above and the PV tuning papers (Killinger et al., 2016b; 2017a; 2017b) for full visibility of the data provenance. Lastly, the original hosts of this data, PVOUTPUT.ORG, should be recognized for their efforts.

FIG. 4. Examples of the time series of four different PV power data available for download. Plotted for the ACT dataset for 10 Jan 2017, selected to demonstrate both clear-sky and cloudy periods. This figure is produced in the matlab_example.m script provided. (Top left) The simulated clear-sky power curve, Psimcs; (top right) original raw Pmeas; (bottom left) output after QCPV, Pqc; (bottom right) output after tuning, Ptuned. The thick yellow line represents the mean of all the data.

V. DATA FORMAT

The two downloaded zip files must be unzipped to extract the comma separated value (csv) files. Many free tools can be used to unzip the downloaded files, e.g., 7-zip, PeaZip, WinRAR, etc. The data are delimited by a semicolon and use the string NA for missing or filtered data.
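As a minimal sketch of how the semicolon delimiter and the NA convention might be handled, the Python snippet below parses a power file using only the standard library. The function name `read_power_file` and the in-memory sample are hypothetical, and the snippet assumes unquoted numeric fields with a header row of system IDs; it is not a copy of the provided python_example.py script.

```python
import csv
import io


def read_power_file(handle):
    """Read a semicolon-delimited power file into (ids, rows), mapping NA to None."""
    reader = csv.reader(handle, delimiter=";")
    ids = next(reader)  # header row: one system ID per column
    rows = [
        [None if v.strip() in ("NA", "") else float(v) for v in row]
        for row in reader
    ]
    return ids, rows


# Made-up sample standing in for an extracted csv file.
sample = io.StringIO("101;102;103\n0.00;NA;0.12\n0.25;0.31;NA\n")
ids, rows = read_power_file(sample)
print(ids, rows)
```

For a real file, the same function could be given an open file handle, for example `open(path, newline="")`, where `path` points to one of the extracted power files.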

Once extracted, there are 18 files, representing 6 types of file per location provided. These files are the metadata for each system, the timestamps, and four levels of PV power data. We provide data for the Australian Capital Territory (ACT), South Australia (SA), and Western Australia (WA); most notably, this covers the cities of Canberra, Adelaide, and Perth, respectively. The full formatting of the files is detailed in Table II.

To summarize, the metadata and timestamp files are essential in order to find the appropriate column and the time reference within the power files. Each column in the power files represents a different PV system, and the header row details the identity number (ID) for that column so that the corresponding metadata can be found in the metadata file. Every row in the power files corresponds to a 10-min timestamp that can be found in the equivalent row within the time file. This means that each row is a single time step and is common between all PV systems found within the power file. The power is reported in kW/kWp, and so, to get power in kilowatts, one must multiply the whole column by the capacity listed in the metadata file for the corresponding ID. The purpose of this is to normalize the data so that all PV systems can be directly compared to each other. Were one to plot a heatmap of all the sites for a single row, it would be possible to see the probable influence of clouds on the ground for that exact time step (Bright et al., 2018).
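The column, ID, and timestamp bookkeeping described above can be illustrated with a short Python sketch. The sample IDs, capacities, and Unix times are made up, the helper name `to_kilowatts` is hypothetical, and the Unix times are assumed to be in seconds; none of this is taken from the accompanying scripts.

```python
from datetime import datetime, timezone


def to_kilowatts(ids, rows, capacity_kwp):
    """Multiply each normalized column by the capacity of the system heading that column."""
    return [
        [None if v is None else v * capacity_kwp[sys_id] for sys_id, v in zip(ids, row)]
        for row in rows
    ]


ids = ["101", "102"]                      # header row of a power file
rows = [[0.50, None], [0.25, 0.80]]       # normalized power in kW/kWp, None for NA
capacity_kwp = {"101": 3.0, "102": 5.0}   # installed capacity per ID, from the metadata file
unix_times = [1484006820, 1484007420]     # one entry per row, from the timestamp file

power_kw = to_kilowatts(ids, rows, capacity_kwp)
stamps = [datetime.fromtimestamp(t, tz=timezone.utc) for t in unix_times]

for stamp, values in zip(stamps, power_kw):
    print(stamp.isoformat(), values)
```

Keying the capacities by ID, rather than relying on column order, guards against the metadata rows being sorted differently from the power file columns.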

TABLE II. Format of the downloaded data files and a description of the data within.

Field/file | Variable | Description | Unit
Metadata
metadata_final_ | … | File name followed by ACT/SA/WA | …
1st column | ID | Identity number from PVOUTPUT.ORG | …
2nd column | H | Latitude | [°]
3rd column | U | Longitude | [°]
4th column | … | Installed capacity of systems | [kWp]
5th column | … | Installation date | …
6th column | c | Azimuth angle of modules, reported | [°]
7th column | b | Tilt angle of modules, reported | [°]
8th column | cQ | Azimuth angle of modules, estimated | [°]
9th column | bQ | Tilt angle of modules, estimated | [°]
10th column | LF | Loss factor | …
1st row | … | Header | …
Every other row | … | Values for each system | …
Timestamps
unixtime_final_ | … | File name followed by ACT/SA/WA | …
1st column | t | Time in Unixtime format (10-min res.) | [UTC]
2nd column | t | Time at the locality (10-min res.) | [Local time]
1st row | … | Header | …
Every other row | … | Corresponding timestamp to row in PV power file | …
PV power time series
PVpow_trim_ | Pmeas | File with original PV power measurements, followed by ACT/SA/WA | [kW/kWp]
PVpow_qc_trim_ | Pqc | File with quality controlled PV power measurements, followed by ACT/SA/WA | [kW/kWp]
PVpow_tuned_ | Ptuned | File with tuned PV power measurements, followed by ACT/SA/WA | [kW/kWp]
PVpow_cs_trim_ | Psimcs | File with simulated clear-sky PV power, followed by ACT/SA/WA | [kW/kWp]
Every column | … | A time series from a different PV system | [kW/kWp]
1st row | ID | Identity number from the corresponding metadata file | …
Every other row | … | Normalized PV power time series corresponding to timestamps | [kW/kWp]

VI. DATA HANDLING AND EXAMPLES

We provide three accompanying scripts that contain some typical approaches to loading and exploring the data. Three scripting languages are selected due to their (i) popularity, (ii) convention, and (iii) our expertise: Python, R, and Matlab®. While Matlab is proprietary software, it is widely used in many academic and private research institutions; R and Python are freely available. The three scripts are

  • matlab_example.m
  • R_example.R
  • python_example.py

The scripts are designed to be placed within the same folder as the downloaded and extracted data files. Each script is well commented and shows how to open and load the information, as well as how to plot some of the information. The intention is to demonstrate simple usability to encourage and benefit those who are less familiar with the languages. As an example, Fig. 4 was produced using matlab_example.m.

In addition, the csv files can be opened directly in Microsoft Excel (not recommended); however, Excel does not recognize the native format. As such, the user must use the "Text to Columns" feature with the "delimiter" and "semicolon" settings activated in order to have all the data in appropriate rows and columns. Furthermore, the data use the convention of the string NA to represent missing data. Excel requires the format to be "#N/A"; therefore, a "find & replace" should be performed, swapping NA for #N/A. It is recommended that this be done in an alternative computer program (e.g., Notepad/Notepad++) for computational efficiency, as Excel is slow in this regard.
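For readers who would rather script the NA replacement than do it by hand, a field-wise swap can be sketched in Python as follows. The helper name `na_to_excel` and the sample lines are hypothetical; replacing whole fields, rather than every occurrence of the two letters, is a design choice made here so that no other token containing "NA" is altered.

```python
def na_to_excel(line, delimiter=";"):
    """Replace fields that are exactly NA with Excel's #N/A, leaving other fields untouched."""
    fields = line.rstrip("\n").split(delimiter)
    return delimiter.join("#N/A" if field == "NA" else field for field in fields)


# Made-up lines standing in for rows of a power file.
lines = ["101;102;103", "0.10;NA;0.30", "NA;NA;0.20"]
converted = [na_to_excel(line) for line in lines]
print("\n".join(converted))
```

Applied line by line while streaming a file to a new output file, this avoids loading the whole dataset into memory.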

VII. DISCUSSION AND SUMMARY

The described dataset has already been an important feature of many published research papers. We also highlight use cases where these data could be applied in future research. In the following, a short overview of typical use cases (previous and potential) is given.

First, we discuss research that has already leveraged the presented dataset. The quality and the state of PV power datasets are often a significant barrier for operational usage; this motivated the development of a parameterization and quality control approach (Sec. II B). To balance systematic influences like shading, a tuning approach was developed on top of that (Sec. II C). These approaches were then compared to LiDAR-based approaches in order to assess the impact of LiDAR spatial resolution (Lingfors et al., 2017) and to corroborate the accuracy of reported metadata vs LiDAR approximated metadata and parameterized metadata (Lingfors et al., 2018). The metadata from this release were used to study the characteristics of PV systems in Australia alongside many other countries (Killinger et al., 2018). The spatially distributed nature of the PV power dataset enabled advanced nowcasting techniques in which the PV power data are used as feedback to a conventional satellite nowcasting methodology (Bright et al., 2018).

In terms of research potential, there is considerable scope and opportunity that can be derived from the dataset. Exploration into battery storage optimization and grid integration could be performed with the dataset (Coimbra, 2019). Information on the spatio-temporal relationship of solar energy and PV power generation is also facilitated; there have been many recent studies on the collocation of electric vehicles and how the spatial distribution of PV power could influence charging effectiveness (Shepero et al., 2018 and Lingfors et al., 2018). Regional PV power modeling (Saint-Drenan et al., 2018), PV aggregation (Lingfors and Widen, 2016), and upscaling studies (Killinger, 2017) have an interesting potential from such a dataset. There are also interesting opportunities for spatial time series data visualization, PV power forecasting development and testing, as well as a component within hierarchical modeling.

Readers of this paper who are inclined to use this PV power dataset would also benefit from alternative and accompanying global datasets. For example, in Bright et al. (2018), power data were accompanied by Himawari-8 satellite imagery and numerical weather prediction models in order to derive irradiance forecasts. Many useful accompanying datasets are available from the National Computational Infrastructure (NCI), which is supported by the Australian Government (NCI, 2019a; 2019b); this should be the first point of reference for Australian centric, relevant, and well-suited data including satellite imagery. Weather variables can be obtained from reanalyses datasets. In the authors' opinion, the best available reanalysis dataset is the Modern-Era Retrospective analysis for Research and Applications, Version 2 by NASA (MERRA-2, https://gmao.gsfc.nasa.gov/reanalysis/MERRA-2); this is because of the high temporal resolution of 1 h, the reasonable spatial resolution of roughly 50 km latitudinally, and the significant historical coverage from 1980 to the present day.

In conclusion, we hope that this data article proves a useful guide to understanding and easily accessing a dataset of considerable research value.

SUPPLEMENTARY MATERIAL

See supplementary material for scripts in Matlab, R, and Python to demonstrate possible ways of working with the data.

ACKNOWLEDGMENTS

Dr. Jamie M. Bright and Dr. Nicholas A. Engerer were funded by the Australian Renewable Energy Agency (ARENA, Research and Development Program Funding, G00854). We would like to extend our gratitude to PVoutput.org for operating a valuable website and for supporting this publication with written consent to republish a similar, yet transformed, dataset derived from their efforts.

Dr. Nicholas A. Engerer is also the CTO of Solcast, who is the host of the provided dataset.

REFERENCES

Bright, J. M., Babacan, O., Kleissl, J., Taylor, P. G., and Crook, R., "A synthetic, spatially decorrelating solar irradiance generator and application to a LV grid model with high PV penetration," Sol. Energy 147, 83–98 (2017).

Bright, J. M., Killinger, S., Lingfors, D., and Engerer, N. A., "Improved satellite-derived PV power nowcasting using real-time power data from reference PV systems," J. Sol. Energy 168, 118–139 (2018).

Coimbra, C. F. M., "Looking ahead with the Journal of Renewable and Sustainable Energy: Volume 11 and beyond," J. Renewable Sustainable Energy 11, 010401 (2019).

Engerer, N. A., "City-wide simulations of distributed photovoltaic array power output," Doctor of Philosophy (The Australian National University, 2015).

Engerer, N. A. and Mills, F. P., "KPV: A clear-sky index for photovoltaics," Sol. Energy 105, 679–693 (2014).

Engerer, N. A., Bright, J. M., and Killinger, S., "Himawari 8 enabled real-time distributed PV simulations for distribution networks," in PVSC44 (Washington, DC, 2017).

Gueymard, C. A., "REST2: High-performance solar radiation model for cloudless-sky irradiance, illuminance, and photosynthetically active radiation. Validation with a benchmark dataset," Sol. Energy 82, 272–285 (2008).

Jones, R., Bright, J., and Engerer, N. A., "Solar forecasting for low voltage network operators," Technical Report (Oakley Greenwood on behalf of ARENA, 2018).

Journal of Renewable and Sustainable Energy ARTICLE scitation.org/journal/rse

J. Renewable Sustainable Energy 11, 035504 (2019); doi: 10.1063/1.5094059 11, 035504-7

Published under license by AIP Publishing

Killinger, S., “Evaluating different upscaling approaches to derive the actual power of distributed PV systems,” in 44th Photovoltaic Specialist Conference (PVSC44), Washington, DC, USA, 2017.

Killinger, S., Braam, F., Müller, B., Wille-Haussmann, B., and McKenna, R., “Projection of power generation between differently-oriented PV systems,” Sol. Energy 136, 153–165 (2016a).

Killinger, S., Müller, B., Saint-Drenan, Y. M., and McKenna, R., “Towards an improved nowcasting method by evaluating power profiles of PV systems to detect apparently atypical behavior,” in Conference Record of the IEEE Photovoltaic Specialists (2016b), pp. 980–985.

Killinger, S., Bright, J. M., Lingfors, D., and Engerer, N. A., “A tuning routine to correct systematic influences in reference PV systems' power outputs,” Sol. Energy 157, 1082–1094 (2017a).

Killinger, S., Engerer, N., and Müller, B., “QCPV: A quality control algorithm for distributed photovoltaic array power output,” Sol. Energy 143, 120–131 (2017b).

Killinger, S., Lingfors, D., Saint-Drenan, Y.-M., Moraitis, P., Sark, W. V., Taylor, J., Engerer, N. A., and Bright, J. M., “On the search for representative characteristics of PV systems: Data collection and analysis of PV system azimuth, tilt, capacity, yield and shading,” Sol. Energy 173, 1087–1106 (2018).

Lave, M., Reno, M. J., and Broderick, R. J., “Characterizing local high-frequency solar variability and its impact to distribution studies,” Sol. Energy 118, 327–337 (2015).

Lingfors, D. and Widen, J., “Development and validation of a wide-area model of hourly aggregate solar power generation,” Energy 102, 559–566 (2016).

Lingfors, D., Bright, J. M., Engerer, N. A., Ahlberg, J., Killinger, S., and Widen, J., “Comparing the capability of low- and high-resolution LiDAR data with application to solar resource assessment, roof type classification and shading analysis,” Appl. Energy 205, 1216–1230 (2017).

Lingfors, D., Killinger, S., Engerer, N. A., Widen, J., and Bright, J. M., “Identification of PV system shading using a LiDAR-based solar resource assessment model: An evaluation and cross-validation,” Sol. Energy 159, 157–172 (2018).

Lingfors, D., Shepero, M., Good, C., Bright, J., Widen, J., Boström, T., and Munkhammar, J., “Modelling city scale spatio-temporal solar energy generation and electric vehicle charging load,” in 8th International Workshop on Integration of Solar into Power Systems, Stockholm, Sweden, 2018, pp. 1–8.

Munkhammar, J. and Widen, J., “An N-state Markov-chain mixture distribution model of the clear-sky index,” Sol. Energy 173, 487–495 (2018).

Munkhammar, J., Widen, J., and Hinkelman, L. M., “A copula method for simulating correlated instantaneous solar irradiance in spatial networks,” Sol. Energy 143, 10–21 (2017).

NCI, National Computational Infrastructure Data Access (NCI, 2019a).

NCI, NCI Data Portal, NCI Catalogue (NCI, 2019b).

Reindl, D. T., Beckman, W. A., and Duffie, J. A., “Evaluation of hourly tilted surface radiation models,” Sol. Energy 45, 9–17 (1990).

Saint-Drenan, Y.-M., Wald, L., Ranchin, T., Dubus, L., and Troccoli, A., “An approach for the estimation of the aggregated photovoltaic power generated in several European countries from meteorological data,” Adv. Sci. Res. 15, 51–62 (2018).

Shepero, M., Munkhammar, J., Widen, J., Bishop, J. D., and Boström, T., “Modeling of photovoltaic power generation and electric vehicles charging on city-scale: A review,” Renewable Sustainable Energy Rev. 89, 61–71 (2018).

Yang, D., “SolarData: An R package for easy access of publicly available solar datasets,” Sol. Energy 171, A3–A12 (2018).


Details of the data structure and format are found in Sec. V. Some example code and scripts showing how to access and use the data are provided in Sec. VI. A discussion of the data and its potential uses, as well as closing statements, is made in Sec. VII.

II. DATA PROVENANCE

This section details all the information regarding the dataset, from where the data were first obtained all the way through to it being available for download. This section is, in essence, a methodology for processing raw PV power measurements. It is subcategorized into three main sections: raw PV measurements and simulated clear-sky PV output (Sec. II A), QC of PV measurements (QCPV, Sec. II B), and tuning of PV measurements (Sec. II C).

An overview flowchart of the methodology is included for reference in Fig. 2.

A. Raw PV power measurements and metadata

The original raw PV power measurements were collected at 5-min resolution from http://pvoutput.org, a public website where users are able to report their PV power generation using automated data loggers that are pre-existing in participating inverter models. The Australian National University (ANU) has an active subscription to PVOUTPUT.ORG at the time of publication, and this subscription shall be continued by the data hosts, Solcast (see Sec. III), at the close of the ANU project. For our usage, we transformed the original PV power data to 10-min intervals to match the Himawari-8 satellite scan timing over Australia (00:07, 00:17, 00:27, etc.). We do not state how this was achieved, in order to protect the original product from PVOUTPUT.ORG; ultimately, the data are not “raw” from PVOUTPUT.ORG but a modified time average that nevertheless represents raw data. The data available span a period from September 2016 through March 2017. While this data release does not span a full year, it includes a vast amount of residential PV installations during the Spring and Summer seasons. PV power measurements are normalized by the reported rated total module capacity of the respective system so that all power values Pmeas are reported in W/Wp, which enables an easier comparison between systems. To obtain the original reported power, the user must multiply by the respective capacity.

Each user reports their PV system metadata in terms of tilt (β), azimuth (γ), and the loss factor (LF). A critical concern is the known error in metadata reporting. This issue is widely discussed in Killinger et al. (2018). We provide the original user-reported metadata; however, we also provide simulated PV metadata as derived from a parameterization technique, which derives the most likely PV system metadata from the power time series.

FIG. 1. A map of Australia highlighting Perth (WA), Adelaide (SA), and Canberra (ACT). Each red marker represents the location of a PV system within the database. Longitude and latitude are reported in degrees.


The parameterization methodology is well defined in Killinger et al. (2017a; 2017b) (Sec. IV) and so is not fully reproduced here. In a cross-validation study against LiDAR data, these estimated parameterizations were found to be reasonably accurate over the ACT (Lingfors et al., 2018).

Differences from the previously described methodology are due to the choice of input data. In summary, the parameterization calculated clear-sky beam and diffuse irradiance from climatological data and reanalysis datasets in combination with the REST2 clear-sky irradiance model (Gueymard, 2008). An initial estimate of the clear-sky power curve (Psimcs) was made using β, γ, and LF and the quadratic PV model (Killinger et al., 2016a); note that the quadratic PV model was corrected and reintroduced by Killinger et al. (2017b). Psimcs is made available in the dataset, an example of which can be seen in Fig. 4 (top left). It can be seen that some PV system clear-sky power curves have a prominent afternoon peak, indicative of a west-facing azimuth. An initial pass of the QCPV routine on Pmeas (defined next in Sec. II B) is performed so that only valid first-pass measurements, Pqc, are retained. Satellite-derived global horizontal irradiance (Egh), beam normal irradiance (Ebn), and diffuse horizontal irradiance (Edh) for the corresponding PV time steps were taken from Solcast, a solar forecasting and resource assessment company; the derivation of the satellite-derived irradiance is described by Engerer et al. (2017) and Bright et al. (2018). Irradiance was then transposed onto the plane of array using the Reindl transposition model (Reindl et al., 1990) (all future described transpositions to the plane of array use the Reindl model). Satellite-derived irradiance has a temporal resolution of 10 min; hence, all datasets used within this paper (e.g., PV power measurements, ambient temperature, etc.) were aggregated to a corresponding resolution. We report time steps at 00:07, 00:17, etc., to account for the scan timing offset of the Himawari-8 satellite used to obtain Egh. With the initial Pqc, the estimated tilt β̂, azimuth γ̂, and LF are then derived through parameterization as per Sec. IVIII of Killinger et al. (2017b).

Figure 4 (top right) plots Pmeas for the ACT on 10 Jan 2017. The figure highlights the need for additional quality control to deal with the observed erratic and intuitively incorrect measurements.

B. Quality controlled PV measurements

The second data type available is PV power that has undergone the rigorous QCPV routine, Pqc. The QCPV routine was initially developed in Engerer (2015) and further developed and described in Killinger et al. (2017b), with the exception of updated input data described in Sec. II A. Additional QCPV tests were developed in Killinger et al. (2017a). The QCPV routine consists of six tests applied directly to Pmeas at each time step and three more systematic tests that use historic statistics and information from different nearby PV systems. All the tests are summarized in Table I.

FIG. 2. Flow diagram of all the inputs (green parallelograms), the methodological steps (blue rectangles), and all the intermediary and final output variables (yellow diamonds). The diagram shows the procedures by which the input PV power measurements (Pmeas) are transformed from the raw format, through various parameterizations and QCPV steps, to become tuned PV power (Ptuned). The clear-sky power data (Psimcs) and quality controlled power data (Pqc) are also provided in the data download.


Power data that do not pass QCPV are simply replaced by “NA” in the time series; this means that the data for Pqc already have the QCPV flags applied. A direct comparison of the Pmeas and Pqc time series would indicate which time steps have been removed (e.g., if data exist in Pmeas but not in Pqc, then it was flagged). The QCPV routine overall reduces the amount of data for ACT from 9 157 671 to 5 707 504 (62.3% retained), for SA from 2 619 416 to 1 829 595 (69.9%), and for WA from 6 763 084 to 4 897 408 (72.4%). These percentages are not representative on a PV system by PV system basis, as certain PV systems report data with excellent trust, while others can be entirely removed by QCPV. To demonstrate this, Fig. 3 shows the number of PV systems that underwent a certain percentage of data reduction after QCPV.
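The comparison of Pmeas against Pqc described above can be sketched in a few lines of Python. This is a minimal illustration only, not one of the scripts supplied with the dataset; the function names and the five sample values are hypothetical, and only the semicolon delimiter and the NA convention are taken from the data description.

```python
def parse_value(field):
    """Return a float, or None for the dataset's 'NA' missing-data marker."""
    field = field.strip()
    return None if field == "NA" else float(field)


def qc_flags(p_meas, p_qc):
    """True where a value exists in Pmeas but was removed (set to NA) in Pqc."""
    return [m is not None and q is None for m, q in zip(p_meas, p_qc)]


def retained_fraction(p_meas, p_qc):
    """Share of the reported measurements that survive QCPV."""
    n_meas = sum(m is not None for m in p_meas)
    n_qc = sum(q is not None for q in p_qc)
    return n_qc / n_meas if n_meas else float("nan")


# Invented values for a single system (not taken from the dataset).
p_meas = [parse_value(v) for v in "0.10;0.95;NA;0.40;0.42".split(";")]
p_qc = [parse_value(v) for v in "0.10;NA;NA;0.40;0.42".split(";")]

flags = qc_flags(p_meas, p_qc)               # only the second time step is flagged
fraction = retained_fraction(p_meas, p_qc)   # 3 of the 4 reported values remain
print(flags, fraction)
```

Applied per column, the same ratio gives the system-by-system data reduction that Fig. 3 summarizes.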

We can observe the impact of QCPV by comparing the top right and bottom left panels of Fig. 4. Pqc clearly visualizes the removal of the intuitively erroneous data observed in Pmeas. A subtle increase in the mean power profile (thick yellow line) can be observed in the first half of the day.

C. Tuned PV measurements

Tuning is the third development arising from the distributed PV power dataset. Tuning corrects systematic biases that impact the PV system. Consider a PV system that is shaded for the first half of the day every single morning; there will be a systematic reduction in power. This is undesirable in a dataset where representative spatial characteristics of irradiance are desired, because the systematic effect on one PV system is not representative of another. To remove these systematic biases, a tuning algorithm was developed. These algorithms were initially developed in Engerer (2015) before being more fully developed in Killinger et al. (2016b) and Killinger et al. (2017a), following on from the QCPV development (Sec. II B). In summary, each PV system is analyzed over a rolling 30-day window, whereby the measurement at 10:17 am is compared with the 30 most recent previous measurements at 10:17 am. An assumption is made that at least one of the 30 measurements considered was made under clear-sky conditions. By taking the 99th percentile of the 30-measurement distribution, expressed as kpv = Pqc/Psimcs, we would expect to observe a value of 1 under perfectly clear conditions. Should 10:17 am never satisfy the definition of clear conditions, the time step is scaled toward clear-sky with the assumption that a systematic bias occurs at that measurement. This most often captures shading events and rapid soiling events (e.g., snowfall or dust); it also mitigates for LF.
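To make the rolling-percentile idea concrete, the following Python sketch rescales a single time step from the kpv history at the same time of day. It is not the published tuning routine of Killinger et al. (2017a). Two assumptions are ours: the rescaling divides the measurement by the 99th percentile, and it is applied only when that percentile lies below 1. The function names and sample values are hypothetical.

```python
def kpv(p_qc, p_simcs):
    """Clear-sky index for PV, kpv = Pqc / Psimcs; None when it cannot be formed."""
    if p_qc is None or p_simcs is None or p_simcs <= 0:
        return None
    return p_qc / p_simcs


def percentile(values, q):
    """Linear-interpolation percentile (q in [0, 100]) of a non-empty list."""
    data = sorted(values)
    pos = (len(data) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(data) - 1)
    return data[lo] + (data[hi] - data[lo]) * (pos - lo)


def tune_time_step(p_qc_now, kpv_history, q=99.0):
    """Rescale one measurement using previous kpv values at the same time of day.

    kpv_history holds up to 30 previous values (None marks missing data).
    Assumption: the value is divided by the q-th percentile, and only when
    that percentile is below 1 (i.e., clear-sky output was never reached).
    """
    valid = [k for k in kpv_history if k is not None]
    if p_qc_now is None or not valid:
        return p_qc_now
    ref = percentile(valid, q)
    if ref <= 0.0 or ref >= 1.0:
        return p_qc_now
    return p_qc_now / ref


# Invented case: a system whose 10:17 am kpv never exceeds 0.8 (e.g., shading),
# with one missing day in the 30-day history.
history = [kpv(0.4, 0.5)] * 29 + [None]
tuned = tune_time_step(0.4, history)  # scaled from 0.4 up to about 0.5
print(tuned)
```

A fixed divisor is the simplest reading of “scaled toward clear-sky”; the cited papers should be consulted for the actual scaling rule.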

The tuned PV power, Ptuned, is observed in the bottom right panel of Fig. 4. The differences are subtle, as PV shading is typically minimized by the PV system owner to maximize profitability; however, we can observe significant alterations on a specific PV system time series. At the start and the end of the visualized day, the mean power undergoes a significant trough in Pqc. It is expected that many PV systems register a 99th percentile from the kpv distribution of the preceding 30 days that is substantially below a value of 1, indicative of morning/afternoon shading by trees, buildings, or from the roof itself. The tuning algorithm rescales the distribution so that the measured time steps have that shading removed. This trough feature no longer appears in the Ptuned time series; hence, systematic influences have been removed.

TABLE I. A summary of the QCPV criteria (Killinger et al., 2017a; 2017b).

Test | Criteria
Upper limit | Comparison of Pmeas against a theoretical upper limit based on extraterrestrial irradiance
Lower limit | Comparison of Pmeas against a theoretical lower limit and invariable detection (flat lines)
Sundown | Enforcing a “zero-output” on Pmeas above a certain zenith angle
Daily energy ratio | Removal of days with an unusually low amount of total energy generation
Maximum kpv | Limit of Pmeas against a maximum kpv = Pmeas/Psimcs (Engerer and Mills, 2014)
Persistence | Maximum limit on Pmeas variability and prohibition of zero power values for zenith angles < 70°
Cloud enhancements | Suspected events must be corroborated by nearby PV systems and threshold values
Spurious data | Removal of days showing atypical behavior significantly different from normal operation
Benchmark violations | Removal of days where energy ratio deviation is significantly different from nearby systems

FIG. 3. Histogram showing the amount of data removed as a result of QCPV on a PV system by PV system basis. There are 20 bins, and QCPV data removal is rounded to the nearest 5%. For example, nearly 100 PV systems had 100% of their data removed by QCPV.

III. DATA ACCESS

The data are available as a two-part download and are hosted by the ARENA project partner, Solcast (https://solcast.com.au/rooftop-solar/publication-of-a-research-grade-solar-pv-power-dataset). Solcast is a global solar data services company that provides a considerable free allowance of access to solar radiation and PV datasets for research purposes. There are two files that must be downloaded and unzipped: (i) metadata and timestamps (https://s3-ap-southeast-2.amazonaws.com/solcast-site-downloads/Metadata_and_Timestamps.zip) and (ii) PV power measurements (https://s3-ap-southeast-2.amazonaws.com/solcast-site-downloads/PvPow+for+WA%2C+SA%2C+ACT.zip). The user can either navigate to the above Solcast link and locate the zipped files or simply click the two hyperlinks to start the download. The metadata and timestamp file is 0.95 MB in size, and the PV power data for all three cities is 539 MB. The dataset is listed at ANU Data Commons (https://dx.doi.org/10.25911/5ca6a0640869a) and mirrored, after an initial embargo period, on Zenodo (https://zenodo.org/record/2635887), both with the same DOI: 10.25911/5ca6a0640869a.

IV. DATA USAGE RIGHTS

The data are released with bespoke terms. We state the crucial elements of these terms here. The dataset is freely provided to researchers as is, with no guarantee of support. The dataset is not for commercial usage but for research only. You are empowered to use this dataset however you wish in your research through direct usage, adaptation, or improvements to the data itself. The data must not be redistributed, and the access point for the data is exclusively through the website as described in Sec. III. Should you make significant changes to the data and wish to redistribute the new data, explicit permission must be obtained from the authors. Finally, appropriate accreditation to the creators must be made in all publications and outputs that arise from using this dataset in any way. To appropriately accredit the creators, we require that this exact data article is referenced alongside its DOI: https://dx.doi.org/10.25911/5ca6a0640869a. Additionally, if using the QC version of the data, we also require a citation for the original papers detailing QCPV (Killinger et al., 2016a; 2017b). Furthermore, if using the tuned PV version of this data, we also require a citation for both the QCPV papers above and the PV tuning papers (Killinger et al., 2016b; 2017a,b) for full visibility of the data provenance. Lastly, the original hosts of this data, PVOUTPUT.ORG, should be recognized for their efforts.

FIG. 4. Examples of the time series of four different PV power data available for download. Plotted for the ACT dataset for 10 Jan 2017, selected to demonstrate both clear-sky and cloudy periods. This figure is produced in the matlab_example.m script provided. (Top left) The simulated clear-sky power curve, Psimcs; (top right) original raw Pmeas; (bottom left) output after QCPV, Pqc; (bottom right) output after tuning, Ptuned. The thick yellow line represents the mean of all the data.

V. DATA FORMAT

The two downloaded zip files must be unzipped to extract the comma separated value (csv) files. Many free tools can be used to unzip the downloaded files, e.g., 7-zip, PeaZip, WinRAR, etc. The data are delimited by a semicolon and use the string NA for missing or filtered data.

Once extracted, there are 18 files, representing 6 types of file per location provided. These files are the metadata for each system, the timestamps, and four levels of PV power data. We provide data for the Australian Capital Territory (ACT), South Australia (SA), and Western Australia (WA); most notably, this covers the cities of Canberra, Adelaide, and Perth, respectively. The full formatting of the files is detailed in Table II.

To summarize, the metadata and timestamp files are essential in order to find the appropriate column and the time reference within the power files. Each column in the power files represents a different PV system, and the header row details the identity number (ID) for that column so that the corresponding metadata can be found in the metadata file. Every row in the power files corresponds to a 10-min timestamp that can be found in the equivalent row within the time file. This means that each row is a single time step and is common between all PV systems found within the power file. The power is reported in kW/kWp, and so to get power in kilowatts, one must multiply the whole column by the capacity found in the metadata file for the corresponding ID. The purpose of this is to normalize the data so that all PV systems can be directly compared to each other. Were one to plot a heatmap of all the sites for a single row, it would be possible to see the probable influence of clouds on the ground for that exact time step (Bright et al., 2018).
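As a rough illustration of how the three file types fit together, the Python sketch below parses small in-memory stand-ins for a metadata file, a timestamp file, and a power file, then converts the normalized power back to kilowatts. It is not the supplied python_example.py. The header labels, system IDs, and values are invented; only the semicolon delimiter, the NA marker, and the column positions (ID in the 1st metadata column, capacity in the 4th, Unix time in the 1st timestamp column) follow Table II.

```python
import csv
import io
from datetime import datetime, timezone

# Invented stand-ins for the extracted files; in practice, open the csv files instead.
METADATA = (
    "ID;lat;lon;capacity_kWp;install_date;azimuth;tilt;azimuth_est;tilt_est;LF\n"
    "101;-35.28;149.13;3.0;2015-06-01;0;20;5;22;0.90\n"
    "102;-35.30;149.10;4.0;2014-02-15;90;15;85;17;0.85\n"
)
TIMESTAMPS = "unix;local\n1483920420;2017-01-09 11:07\n1483921020;2017-01-09 11:17\n"
POWER = "101;102\n0.50;NA\n0.25;0.50\n"


def read_rows(text):
    """Split semicolon-delimited text into rows, skipping empty lines."""
    return [row for row in csv.reader(io.StringIO(text), delimiter=";") if row]


def to_float(field):
    """Map the 'NA' marker to None, otherwise parse a float."""
    return None if field.strip() == "NA" else float(field)


def load_power_kw(power_text, metadata_text):
    """Return {system ID: power in kW per time step}, using the 4th metadata column."""
    capacity = {row[0]: float(row[3]) for row in read_rows(metadata_text)[1:]}
    header, *rows = read_rows(power_text)
    series = {sys_id: [] for sys_id in header}
    for row in rows:
        for sys_id, field in zip(header, row):
            p_norm = to_float(field)  # normalized power in kW/kWp
            series[sys_id].append(None if p_norm is None else p_norm * capacity[sys_id])
    return series


def load_times_utc(timestamp_text):
    """Convert the Unix time column (1st column) to timezone-aware UTC datetimes."""
    return [datetime.fromtimestamp(float(row[0]), tz=timezone.utc)
            for row in read_rows(timestamp_text)[1:]]


power_kw = load_power_kw(POWER, METADATA)
times = load_times_utc(TIMESTAMPS)
for i, t in enumerate(times):
    print(t.isoformat(), {sys_id: values[i] for sys_id, values in power_kw.items()})
```

Keeping the system IDs as strings avoids any mismatch between the power file header and the metadata ID column; row i of every power series pairs with row i of the timestamp list.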

TABLE II. Format of the downloaded data files and a description of the data within.

Field/file | Variable | Description | Unit
Metadata
metadata_final_ | … | File name, followed by ACT/SA/WA | …
1st column | ID | Identity number from PVOUTPUT.ORG | …
2nd column | Θ | Latitude | [°]
3rd column | Φ | Longitude | [°]
4th column | … | Installed capacity of systems | [kWp]
5th column | … | Installation date | …
6th column | γ | Azimuth angle of modules, reported | [°]
7th column | β | Tilt angle of modules, reported | [°]
8th column | γ̂ | Azimuth angle of modules, estimated | [°]
9th column | β̂ | Tilt angle of modules, estimated | [°]
10th column | LF | Loss factor | …
1st row | … | Header | …
Every other row | … | Values for each system | …
Timestamps
unixtime_final_ | … | File name, followed by ACT/SA/WA | …
1st column | t | Time in Unix time format (10-min res.) | [UTC]
2nd column | t | Time at the locality (10-min res.) | [Local time]
1st row | … | Header | …
Every other row | … | Corresponding timestamp to row in PV power file | …
PV power time series
PVpow_trim_ | Pmeas | File with original PV power measurements, followed by ACT/SA/WA | [kW/kWp]
PVpow_qc_trim_ | Pqc | File with quality controlled PV power measurements, followed by ACT/SA/WA | [kW/kWp]
PVpow_tuned_ | Ptuned | File with tuned PV power measurements, followed by ACT/SA/WA | [kW/kWp]
PVpow_cs_trim_ | Psimcs | File with simulated clear-sky PV power, followed by ACT/SA/WA | [kW/kWp]
Every column | … | A time series from a different PV system | [kW/kWp]
1st row | ID | Identity number from the corresponding metadata file | …
Every other row | … | Normalized PV power time series corresponding to timestamps | [kW/kWp]

VI. DATA HANDLING AND EXAMPLES

We provide three accompanying scripts that contain some typical approaches to loading and exploring the data. Three scripting languages are selected due to their (i) popularity, (ii) convention, and (iii) our expertise: Python, R, and Matlab®. While Matlab is proprietary software, it is widely used in many academic and private research institutions. R and Python are freely available. The three scripts are

• matlab_example.m
• R_example.R
• python_example.py

The scripts are designed to be placed within the same folder as the downloaded and extracted data files. Each script is well commented and shows how to open and load the information, as well as how to plot some of the information. The intention is to demonstrate simple usability to encourage and benefit those who are less familiar with the languages. As an example, Fig. 4 was produced using matlab_example.m.

In addition, the csv files can be opened directly in Microsoft Excel (not recommended); however, Excel does not recognize the native format. As such, the user must use the “Text to Columns” feature with the “delimiter” and “semicolon” settings activated in order to have all the data in appropriate rows and columns. Furthermore, the data use the convention of the string NA to represent missing data. Excel requires the format to be “#N/A”; therefore, a “find & replace” should be performed, swapping NA for #N/A. It is recommended that this be done in an alternative computer program (e.g., Notepad/Notepad++) for computational efficiency, as Excel is slow in this regard.

VII. DISCUSSION AND SUMMARY

The described dataset has already been an important feature of many published research papers. We also highlight use cases where these data could be applied in future research. In the following, a short overview of typical use cases (previous and potential) is given.

First, we discuss research that has already leveraged the presented dataset. The quality and the state of PV power datasets are often a significant barrier for operational usage; this motivated the development of a parameterization and quality control approach (Sec. II B). To balance systematic influences like shading, a tuning approach was developed on top of that (Sec. II C). These approaches were then compared to LiDAR-based approaches in order to assess the impact of LiDAR spatial resolution (Lingfors et al., 2017) and to corroborate the accuracy of reported metadata vs LiDAR-approximated metadata and parameterized metadata (Lingfors et al., 2018). The metadata from this release were used to study the characteristics of PV systems in Australia alongside many other countries (Killinger et al., 2018). The spatially distributed nature of the PV power dataset enabled advanced nowcasting techniques, using the PV power data as feedback to a conventional satellite nowcasting methodology (Bright et al., 2018).

In terms of research potential, there is considerable scope and opportunity that can be derived from the dataset. Exploration into battery storage optimization and grid integration could be performed with the dataset (Coimbra, 2019). Information on the spatio-temporal relationship of solar energy and PV power generation is also facilitated; there have been many recent studies on the collocation of electric vehicles and how the spatial distribution of PV power could influence charging effectiveness (Shepero et al., 2018 and Lingfors et al., 2018). Regional PV power modeling (Saint-Drenan et al., 2018), PV aggregation (Lingfors and Widen, 2016), and upscaling studies (Killinger, 2017) have an interesting potential from such a dataset. There are also interesting opportunities for spatial time series data visualization, PV power forecasting development and testing, as well as a component within hierarchical modeling.


Yang D ldquoSolarData An R package for easy access of publicly available solardatasetsrdquo Sol Energy 171 A3ndashA12 (2018)

Journal of Renewableand Sustainable Energy ARTICLE scitationorgjournalrse

J Renewable Sustainable Energy 11 035504 (2019) doi 10106315094059 11 035504-8

Published under license by AIP Publishing

  • s1
  • s2
  • s2A
  • f1
  • s2B
  • f2
  • s2C
  • s3
  • t1
  • f3
  • s4
  • f4
  • s5
  • s6
  • t2
  • s7
  • s8
  • c1
  • c2
  • c3
  • c4
  • c5
  • c6
  • c7
  • c8
  • c9
  • c10
  • c11
  • c12
  • c13
  • c14
  • c15
  • c16
  • c17
  • c18
  • c19
  • c20
  • c21
  • c22
  • c23
  • c24
  • c25
  • c26
  • c27

The parameterization methodology is well defined in Killinger et al. (2017a, 2017b) (Sec. IV) and so is not fully reproduced here. In a cross-validation study against LiDAR data, these estimated parameterizations were found to be reasonably accurate over the ACT (Lingfors et al., 2018).

Differences from the previously described methodology are due to the choice of input data. In summary, the parameterization calculated clear-sky beam and diffuse irradiance from climatological data and reanalysis datasets in combination with the REST2 clear-sky irradiance model (Gueymard, 2008). An initial estimate of the clear-sky power curve (Psimcs) was made using the reported tilt β, the reported azimuth γ, and the loss factor LF together with the Quadratic PV model (Killinger et al., 2016a); note that the Quadratic PV model was corrected and reintroduced by Killinger et al. (2017b). Psimcs is made available in the dataset, an example of which can be seen in Fig. 4 (top left). Some PV system clear-sky power curves have a prominent afternoon peak, indicative of a west-facing azimuth.

An initial pass of the QCPV routine on Pmeas (defined next in Sec. II B) is performed so that only measurements that pass this first check are retained as Pqc. Satellite-derived global horizontal irradiance (Egh), beam normal irradiance (Ebn), and diffuse horizontal irradiance (Edh) for the corresponding PV time steps were taken from Solcast, a solar forecasting and resource assessment company; the derivation of the satellite-derived irradiance is described by Engerer et al. (2017) and Bright et al. (2018). Irradiance was then transposed onto the plane of array using the Reindl transposition model (Reindl et al., 1990); all subsequent transpositions to the plane of array also use the Reindl model. Satellite-derived irradiance has a temporal resolution of 10 min; hence, all datasets used within this paper (e.g., PV power measurements, ambient temperature, etc.) were aggregated to a corresponding resolution. We report time steps at 00:07, 00:17, etc. to account for the scan timing offset of the Himawari-8 satellite used to obtain Egh. With the initial Pqc, the estimated tilt, the estimated azimuth, and LF are then derived through parameterization as per Sec IVIII of Killinger et al. (2017b).
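For readers who want to align their own auxiliary data (for instance, a 1-min temperature record) with these time steps, the following is a minimal sketch in Python using pandas. It assumes, purely for illustration, that each 10-min value is the mean of the samples in a window that starts at the labelled time; the article does not state the aggregation convention, and the function name `to_himawari_steps` and the sample data are hypothetical.

```python
import numpy as np
import pandas as pd


def to_himawari_steps(series: pd.Series) -> pd.Series:
    """Average a time-indexed series into 10-min bins labelled hh:07, hh:17, ...

    Assumption (not taken from the article): each bin is labelled by its
    start time and holds the mean of the samples in [start, start + 10 min).
    """
    # offset="7min" shifts the bin edges from hh:00, hh:10, ... to hh:07, hh:17, ...
    return series.resample("10min", offset="7min").mean()


# Invented 1-min example data covering one hour of the day shown in Fig. 4.
index = pd.date_range("2017-01-10 00:00", periods=60, freq="1min")
one_minute = pd.Series(np.arange(60, dtype=float), index=index)
ten_minute = to_himawari_steps(one_minute)
```

If the convention turns out to be window-centred or end-labelled instead, only the `offset` value (or the `label` and `closed` arguments of `resample`) would need to change.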

Figure 4 (top right) plots Pmeas for the ACT on 10 Jan 2017. The figure highlights the need for additional quality control to deal with the observed erratic and intuitively incorrect measurements.

B. Quality controlled PV measurements

The second data type available is PV power that has undergone the rigorous QCPV routine, Pqc. The QCPV routine was initially developed in Engerer (2015) and further developed and described in Killinger et al. (2017b); the version applied here follows that description except for the updated input data described in Sec. II A. Additional QCPV tests were developed in Killinger et al. (2017a). The QCPV routine consists of six tests applied directly to Pmeas at each time step and three more systematic tests that use historic statistics and information from different nearby PV systems. All the tests are summarized in Table I.

FIG. 2. Flow diagram of all the inputs (green parallelograms), the methodological steps (blue rectangles), and all the intermediary and final output variables (yellow diamonds). The diagram shows the procedure by which the input PV power measurements (Pmeas) are transformed from the raw format, through various parameterization and QCPV steps, to become tuned PV power (Ptuned). The clear-sky power data (Psimcs) and quality controlled power data (Pqc) are also provided in the data download.


Power data that do not pass QCPV are simply replaced by “NA” in the time series; this means that the data for Pqc already have the QCPV flags applied. A direct comparison of the Pmeas and Pqc time series indicates which time steps have been removed (e.g., if data exist in Pmeas but not in Pqc, then that time step was flagged). Overall, the QCPV routine reduces the amount of data for ACT from 9 157 671 to 5 707 504 values (62.3% retained), for SA from 2 619 416 to 1 829 595 (69.9%), and for WA from 6 763 084 to 4 897 408 (72.4%). These percentages are not representative on a PV system by PV system basis, as certain PV systems report highly trustworthy data, while others can be entirely removed by QCPV. To demonstrate this, Fig. 3 shows the number of PV systems that underwent a certain percentage of data reduction after QCPV.

We can observe the impact of QCPV by comparing the top right and bottom left panels of Fig. 4. Pqc clearly shows the removal of the intuitively erroneous data observed in Pmeas. A subtle increase in the mean power profile (thick yellow line) can be observed in the first half of the day.
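The comparison of Pmeas and Pqc described in this section can be sketched in a few lines of Python with pandas. This is an illustrative sketch only: the two small semicolon-delimited strings stand in for a raw and a quality controlled power file, the system IDs and values are invented, and the function names `qcpv_flags` and `retained_fraction` are hypothetical.

```python
import io

import pandas as pd

# Invented stand-ins for a raw (Pmeas) and a quality controlled (Pqc) file.
# First row: system IDs; following rows: one 10-min time step each.
PMEAS_CSV = "101;102\n0.10;0.50\n0.90;NA\n0.30;0.20\n"
PQC_CSV = "101;102\n0.10;NA\nNA;NA\n0.30;0.20\n"


def read_power(text: str) -> pd.DataFrame:
    """Read a semicolon-delimited power table in which NA marks missing data."""
    return pd.read_csv(io.StringIO(text), sep=";", na_values=["NA"])


def qcpv_flags(p_meas: pd.DataFrame, p_qc: pd.DataFrame) -> pd.DataFrame:
    """True where a value exists in Pmeas but was removed in Pqc."""
    return p_meas.notna() & p_qc.isna()


def retained_fraction(p_meas: pd.DataFrame, p_qc: pd.DataFrame) -> pd.Series:
    """Per-system share of the reported values that survived QCPV."""
    return p_qc.notna().sum() / p_meas.notna().sum()


p_meas = read_power(PMEAS_CSV)
p_qc = read_power(PQC_CSV)
flags = qcpv_flags(p_meas, p_qc)
per_system = retained_fraction(p_meas, p_qc)
overall = p_qc.notna().sum().sum() / p_meas.notna().sum().sum()
```

Computing the retained share per column rather than over the whole table makes the point of Fig. 3 visible: the overall figure can hide systems that lose most or all of their data.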

C. Tuned PV measurements

Tuning is the third development arising from the distributed PV power dataset. Tuning corrects systematic biases that impact the PV system. Consider a PV system that is shaded for the first half of the day: every single morning, there will be a systematic reduction in power. This is undesirable in a dataset where representative spatial characteristics of irradiance are desired, because the systematic effect on one PV system is not representative of another. To remove these systematic biases, a tuning algorithm was developed. The algorithm was initially developed in Engerer (2015) before being more fully developed in Killinger et al. (2016b) and Killinger et al. (2017a), following on from the QCPV development (Sec. II B). In summary, each PV system is analyzed over a rolling 30-day window, whereby the measurement at a given time of day, say 10:17 am, is compared with the 30 most recent previous measurements at 10:17 am. An assumption is made that at least one of the 30 measurements considered was made under clear-sky conditions. Expressing each measurement as $k_{pv} = P_{qc}/P_{simcs}$ and taking the 99th percentile of the 30-measurement distribution, we would expect to observe a value of 1 under perfectly clear conditions. Should 10:17 am never satisfy the definition of clear conditions, the time step is scaled toward clear-sky, with the assumption that a systematic bias occurs at that time of day. This most often captures shading events and rapid soiling events (e.g., snowfall or dust); it also mitigates for LF.
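To make the rolling-percentile idea concrete, a heavily simplified sketch in Python follows. It is not the published tuning routine of Killinger et al. (2017a): the rescaling rule (dividing by the 99th percentile of $k_{pv}$), the `min_valid` threshold, and the function name `tune_sketch` are assumptions made only for illustration. Input arrays are assumed to hold one system with shape (days, time steps per day).

```python
import numpy as np


def tune_sketch(p_qc, p_simcs, window=30, pct=99.0, min_valid=5):
    """Rescale each time-of-day slot by the rolling percentile of kpv.

    p_qc, p_simcs: arrays of shape (n_days, n_slots) for a single system.
    Assumption: the tuned value is p_qc divided by the pct-th percentile of
    kpv = p_qc / p_simcs over the previous `window` days at the same slot.
    """
    p_qc = np.asarray(p_qc, dtype=float)
    p_simcs = np.asarray(p_simcs, dtype=float)

    # kpv is left as NaN wherever the clear-sky power is zero or missing.
    kpv = np.full(p_qc.shape, np.nan)
    np.divide(p_qc, p_simcs, out=kpv, where=p_simcs > 0)

    tuned = p_qc.copy()
    n_days, n_slots = p_qc.shape
    for day in range(1, n_days):
        history = kpv[max(0, day - window):day]  # previous days only
        for slot in range(n_slots):
            valid = history[:, slot]
            valid = valid[~np.isnan(valid)]
            if valid.size < min_valid:
                continue  # too little history: leave the value untouched
            reference = np.percentile(valid, pct)
            if reference > 0:
                tuned[day, slot] = p_qc[day, slot] / reference
    return tuned


# Invented example: slot 0 is unshaded, slot 1 is always shaded to half power.
p_simcs = np.full((40, 2), 0.8)
p_qc = np.column_stack([np.full(40, 0.8), np.full(40, 0.4)])
p_qc[10, 1] = np.nan  # a value previously removed by QCPV
tuned = tune_sketch(p_qc, p_simcs)
```

In this toy case the shaded slot has a 99th-percentile $k_{pv}$ of 0.5, so its values are scaled from 0.4 to 0.8, while the unshaded slot is unchanged.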

The tuned PV power Ptuned is shown in the bottom right panel of Fig. 4. The differences are subtle in aggregate, as PV shading is typically minimized by the PV system owner to maximize profitability; however, significant alterations can be observed in the time series of a specific PV system. At the start and the end of the visualized day, the mean power undergoes a significant trough in Pqc. It is expected that many PV systems register a 99th percentile from the kpv distribution of the preceding 30 days that is substantially below a value of 1, indicative of morning or afternoon shading by trees, buildings, or the roof itself. The tuning algorithm rescales the distribution so that the measured time steps have that shading removed. This trough no longer appears in the Ptuned time series; hence, systematic influences have been removed.

III. DATA ACCESS

The data are available as a two-part download hosted by the ARENA project partner Solcast (https://solcast.com.au/rooftop-solar/publication-of-a-research-grade-solar-pv-power-dataset). Solcast is a global solar data services company that provides a considerable free allowance of access to solar radiation and PV datasets for research purposes. There are two files that must be downloaded and unzipped: (i) metadata and timestamps (https://s3-ap-southeast-2.amazonaws.com/solcast-site-downloads/Metadata_and_Timestamps.zip) and (ii) PV power measurements (https://s3-ap-southeast-2.amazonaws.com/solcast-site-downloads/PvPow+for+WA%2C+SA%2C+ACT.zip). The user can either navigate to the above Solcast link and locate the zipped files or follow the two direct links to start the download. The metadata and timestamp file is 0.95 MB in size, and the PV power data for all three cities are 539 MB. The dataset is listed at ANU Data Commons (https://dx.doi.org/10.25911/5ca6a0640869a) and mirrored, after an initial embargo period, on Zenodo (https://zenodo.org/record/2635887), both with the same DOI: 10.25911/5ca6a0640869a.

TABLE I. A summary of the QCPV criteria (Killinger et al., 2017a, 2017b).

| Test | Criteria |
|---|---|
| Upper limit | Comparison of Pmeas against a theoretical upper limit based on extraterrestrial irradiance |
| Lower limit | Comparison of Pmeas against a theoretical lower limit, and invariable detection (flat lines) |
| Sundown | Enforcing a “zero-output” on Pmeas above a certain zenith angle |
| Daily energy ratio | Removal of days with an unusually low amount of total energy generation |
| Maximum kpv | Limit of Pmeas against a maximum $k_{pv} = P_{meas}/P_{simcs}$ (Engerer and Mills, 2014) |
| Persistence | Maximum limit on Pmeas variability and prohibition of zero power values for zenith angles < 70° |
| Cloud enhancements | Suspected events must be corroborated by nearby PV systems and threshold values |
| Spurious data | Removal of days showing atypical behavior significantly different from normal operation |
| Benchmark violations | Removal of days where energy ratio deviation is significantly different from nearby systems |

FIG. 3. Histogram showing the amount of data removed as a result of QCPV on a PV system by PV system basis. There are 20 bins, and QCPV data removal is rounded to the nearest 5%. For example, nearly 100 PV systems had 100% of their data removed by QCPV.

IV. DATA USAGE RIGHTS

The data are released with bespoke terms. We state the crucial elements of these terms here. The dataset is freely provided to researchers as is, with no guarantee of support. The dataset is not for commercial usage but for research only. You are empowered to use this dataset however you wish in your research, through direct usage, adaptation, or improvements to the data itself. The data must not be redistributed, and the access point for the data is exclusively through the website as described in Sec. III. Should you make significant changes to the data and wish to redistribute the new data, explicit permission must be obtained from the authors. Finally, appropriate accreditation to the creators must be made in all publications and outputs that arise from using this dataset in any way. To appropriately accredit the creators, we require that this exact data article is referenced alongside its DOI: https://dx.doi.org/10.25911/5ca6a0640869a. Additionally, if using the QC version of the data, we also require a citation for the original papers detailing QCPV (Killinger et al., 2016a, 2017b). Furthermore, if using the tuned PV version of this data, we also require a citation for both the QCPV papers above and the PV tuning papers (Killinger et al., 2016b, 2017a, 2017b) for full visibility of the data provenance. Lastly, the original hosts of this data, PVOUTPUT.ORG, should be recognized for their efforts.

FIG. 4. Examples of the time series of the four different PV power data types available for download, plotted for the ACT dataset for 10 Jan 2017, a day selected to demonstrate both clear-sky and cloudy periods. This figure is produced by the matlab_example.m script provided. (Top left) The simulated clear-sky power curve, Psimcs; (top right) original raw Pmeas; (bottom left) output after QCPV, Pqc; (bottom right) output after tuning, Ptuned. The thick yellow line represents the mean of all the data.

V. DATA FORMAT

The two downloaded zip files must be unzipped to extract the comma separated value (csv) files. Many free tools can be used to unzip the downloaded files, e.g., 7-zip, PeaZip, WinRAR, etc. The data are delimited by a semicolon and use the string NA to denote missing or filtered data.

Once extracted, there are 18 files, representing 6 types of file for each location provided. These files are the metadata for each system, the timestamps, and four levels of PV power data. We provide data for the Australian Capital Territory (ACT), South Australia (SA), and Western Australia (WA); most notably, this covers the cities of Canberra, Adelaide, and Perth, respectively. The full formatting of the files is detailed in Table II.

To summarize, the metadata and timestamp files are essential in order to find the appropriate column and the time reference within the power files. Each column in the power files represents a different PV system, and the header row gives the identity number (ID) for that column so that the corresponding metadata can be found in the metadata file. Every row in the power files corresponds to a 10-min timestamp that can be found in the equivalent row of the time file. This means that each row is a single time step and is common to all PV systems found within the power file. The power is reported in kW/kWp, and so to get power in kilowatts one must multiply the whole column by the installed capacity listed for the corresponding ID in the metadata file. The purpose of this is to normalize the data so that all PV systems can be directly compared to each other. Were one to plot a heatmap of all the sites for a single row, it would be possible to see the probable influence of clouds on the ground for that exact time step (Bright et al., 2018).
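The conversion from normalized power to kilowatts can be sketched as follows in Python with pandas. The sample tables are invented and truncated to four metadata columns; the header names are hypothetical, which is why the sketch addresses the ID and capacity by column position (1st and 4th column, as listed in Table II) rather than by name. The function name `to_kilowatts` is also hypothetical.

```python
import io

import pandas as pd

# Invented samples. The real metadata file has ten columns (see Table II);
# only the first four are mimicked here and the header names are made up.
META_CSV = "id;lat;lon;kwp\n101;-35.28;149.13;3.0\n102;-35.30;149.10;5.0\n"
POWER_CSV = "101;102\n0.50;0.20\nNA;0.40\n"


def to_kilowatts(power: pd.DataFrame, metadata: pd.DataFrame) -> pd.DataFrame:
    """Multiply each kW/kWp column by the capacity of the matching system ID."""
    ids = metadata.iloc[:, 0].astype(str)          # 1st column: system ID
    capacity = pd.Series(metadata.iloc[:, 3].to_numpy(), index=ids)  # 4th: kWp
    # Align capacities to the power-file header, then scale column by column.
    return power.mul(capacity.reindex(power.columns), axis="columns")


metadata = pd.read_csv(io.StringIO(META_CSV), sep=";", na_values=["NA"])
power = pd.read_csv(io.StringIO(POWER_CSV), sep=";", na_values=["NA"])
power_kw = to_kilowatts(power, metadata)
```

Matching on the ID in the header row, rather than on column order, guards against the power and metadata files listing systems in a different order; missing values stay missing after scaling.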

VI. DATA HANDLING AND EXAMPLES

We provide three accompanying scripts that contain some typical approaches to loading and exploring the data. Three scripting languages are selected due to their (i) popularity, (ii) convention, and (iii) our expertise: Python, R, and Matlab®. While Matlab is proprietary software, it is widely used in many academic and private research institutions; R and Python are freely available. The three scripts are

• matlab_example.m
• R_example.R
• python_example.py

The scripts are designed to be placed within the same folder as the downloaded and extracted data files. Each script is well commented and shows how to open and load the information as well as how to plot some of it. The intention is to demonstrate simple usability to encourage and benefit those who are less familiar with the languages. As an example, Fig. 4 was produced using matlab_example.m.

TABLE II. Format of the downloaded data files and a description of the data within.

| Field/file | Variable | Description | Unit |
|---|---|---|---|
| Metadata | | | |
| metadata_final_ | … | File name, followed by ACT/SA/WA | … |
| 1st column | ID | Identity number from PVOUTPUT.ORG | … |
| 2nd column | Θ | Latitude | [°] |
| 3rd column | Φ | Longitude | [°] |
| 4th column | … | Installed capacity of systems | [kWp] |
| 5th column | … | Installation date | … |
| 6th column | γ | Azimuth angle of modules, reported | [°] |
| 7th column | β | Tilt angle of modules, reported | [°] |
| 8th column | γ (estimated) | Azimuth angle of modules, estimated | [°] |
| 9th column | β (estimated) | Tilt angle of modules, estimated | [°] |
| 10th column | LF | Loss factor | … |
| 1st row | … | Header | … |
| Every other row | … | Values for each system | … |
| Timestamps | | | |
| unixtime_final_ | … | File name, followed by ACT/SA/WA | … |
| 1st column | t | Time in Unixtime format (10-min res.) | [UTC] |
| 2nd column | t | Time at the locality (10-min res.) | [Local time] |
| 1st row | … | Header | … |
| Every other row | … | Corresponding timestamp to row in PV power file | … |
| PV power time series | | | |
| PVpow_trim_ | Pmeas | File with original PV power measurements, followed by ACT/SA/WA | [kW/kWp] |
| PVpow_qc_trim_ | Pqc | File with quality controlled PV power measurements, followed by ACT/SA/WA | [kW/kWp] |
| PVpow_tuned_ | Ptuned | File with tuned PV power measurements, followed by ACT/SA/WA | [kW/kWp] |
| PVpow_cs_trim_ | Psimcs | File with simulated clear-sky PV power, followed by ACT/SA/WA | [kW/kWp] |
| Every column | … | A time series from a different PV system | [kW/kWp] |
| 1st row | ID | Identity number from the corresponding metadata file | … |
| Every other row | … | Normalized PV power time series corresponding to timestamps | [kW/kWp] |
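Independently of the three provided scripts, whose contents are not reproduced here, the row-to-timestamp correspondence of Table II can be illustrated with a short Python sketch. It assumes that the Unix time in the first column of the timestamp file is expressed in seconds; the sample values, header names, local-time formatting, and the function name `attach_time_index` are invented for illustration.

```python
import io

import pandas as pd

# Invented samples: two 10-min steps; header names and the formatting of the
# local-time column are hypothetical.
TIME_CSV = (
    "unixtime;localtime\n"
    "1484006820;2017-01-10 11:07\n"
    "1484007420;2017-01-10 11:17\n"
)
POWER_CSV = "101;102\n0.50;0.20\nNA;0.40\n"


def attach_time_index(power: pd.DataFrame, times: pd.DataFrame) -> pd.DataFrame:
    """Index a power table by the UTC time stored in the matching rows."""
    if len(power) != len(times):
        raise ValueError("power and timestamp files must have the same row count")
    # Assumption: the 1st column holds Unix time in seconds (UTC).
    utc = pd.to_datetime(times.iloc[:, 0].to_numpy(), unit="s", utc=True)
    indexed = power.copy()
    indexed.index = utc.rename("time_utc")
    return indexed


times = pd.read_csv(io.StringIO(TIME_CSV), sep=";")
power = pd.read_csv(io.StringIO(POWER_CSV), sep=";", na_values=["NA"])
power_utc = attach_time_index(power, times)

# One row is one time step across all systems, e.g. for a spatial snapshot.
snapshot = power_utc.loc[pd.Timestamp("2017-01-10 00:17", tz="UTC")]
```

The row-count check is a cheap guard against pairing a power file with the timestamp file of a different region.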

The csv files can also be opened directly in Microsoft Excel (not recommended); however, Excel does not recognize the native format. As such, the user must use the “Text to Columns” feature with the “delimiter” and “semicolon” settings activated in order to have all the data in appropriate rows and columns. Furthermore, the data use the convention of the string NA to represent missing data. Excel requires the format to be “#N/A”; therefore, a “find & replace” should be performed, swapping NA for #N/A. It is recommended that this be done in an alternative program (e.g., Notepad or Notepad++) for computational efficiency, as Excel is slow in this regard.

VII. DISCUSSION AND SUMMARY

The described dataset has already been an important feature of many published research papers. We also highlight use cases where these data could be applied in future research. In the following, a short overview of typical use cases (previous and potential) is given.

First, we discuss research that has already leveraged the presented dataset. The quality and the state of PV power datasets are often a significant barrier to operational usage; this motivated the development of a parameterization and quality control approach (Sec. II B). To balance systematic influences like shading, a tuning approach was developed on top of that (Sec. II C). These approaches were then compared to LiDAR-based approaches in order to assess the impact of LiDAR spatial resolution (Lingfors et al., 2017) and to corroborate the accuracy of reported metadata vs LiDAR-approximated metadata and parameterized metadata (Lingfors et al., 2018). The metadata from this release were used to study the characteristics of PV systems in Australia alongside many other countries (Killinger et al., 2018). The spatially distributed nature of the PV power dataset enabled advanced nowcasting techniques in which the PV power data are used as feedback to a conventional satellite nowcasting methodology (Bright et al., 2018).

In terms of research potential, there is considerable scope and opportunity that can be derived from the dataset. Exploration into battery storage optimization and grid integration could be performed with the dataset (Coimbra, 2019). Information on the spatio-temporal relationship of solar energy and PV power generation is also facilitated; there have been many recent studies on the collocation of electric vehicles and how the spatial distribution of PV power could influence charging effectiveness (Shepero et al., 2018 and Lingfors et al., 2018). Regional PV power modeling (Saint-Drenan et al., 2018), PV aggregation (Lingfors and Widén, 2016), and upscaling studies (Killinger, 2017) could all benefit from such a dataset. There are also interesting opportunities for spatial time series data visualization, PV power forecasting development and testing, as well as use as a component within hierarchical modeling.

Readers of this paper who are inclined to use this PV power dataset would also benefit from alternative and accompanying global datasets. For example, in Bright et al. (2018), power data were accompanied by Himawari-8 satellite imagery and numerical weather prediction models in order to derive irradiance forecasts. Many useful accompanying datasets are available from the National Computational Infrastructure (NCI), which is supported by the Australian Government (NCI, 2019a, 2019b); this should be the first point of reference for Australian-centric, relevant, and well-suited data, including satellite imagery. Weather variables can be obtained from reanalysis datasets. In the authors' opinion, the best available reanalysis dataset is the Modern-Era Retrospective analysis for Research and Applications, Version 2, by NASA (MERRA-2, https://gmao.gsfc.nasa.gov/reanalysis/MERRA-2); this is because of the high temporal resolution of 1 h, the reasonable spatial resolution of approximately 50 km latitudinally, and the significant historical coverage from 1980 to the present day.

In conclusion, we hope that this data article proves a useful guide to understanding and easily accessing a dataset of considerable research value.

SUPPLEMENTARY MATERIAL

See supplementary material for scripts in Matlab, R, and Python to demonstrate possible ways of working with the data.

ACKNOWLEDGMENTS

Dr. Jamie M. Bright and Dr. Nicholas A. Engerer were funded by the Australian Renewable Energy Agency (ARENA, Research and Development Program Funding, G00854). We would like to extend our gratitude to PVoutput.org for operating a valuable website and for supporting this publication with written consent to republish a similar, yet transformed, dataset derived from their efforts.

Dr. Nicholas A. Engerer is also the CTO of Solcast, which hosts the provided dataset.

REFERENCES

Bright, J. M., Babacan, O., Kleissl, J., Taylor, P. G., and Crook, R., “A synthetic, spatially decorrelating solar irradiance generator and application to a LV grid model with high PV penetration,” Sol. Energy 147, 83–98 (2017).

Bright, J. M., Killinger, S., Lingfors, D., and Engerer, N. A., “Improved satellite-derived PV power nowcasting using real-time power data from reference PV systems,” Sol. Energy 168, 118–139 (2018).

Coimbra, C. F. M., “Looking ahead with the Journal of Renewable and Sustainable Energy: Volume 11 and beyond,” J. Renewable Sustainable Energy 11, 010401 (2019).

Engerer, N. A., “City-wide simulations of distributed photovoltaic array power output,” Doctor of Philosophy (The Australian National University, 2015).

Engerer, N. A. and Mills, F. P., “KPV: A clear-sky index for photovoltaics,” Sol. Energy 105, 679–693 (2014).

Engerer, N. A., Bright, J. M., and Killinger, S., “Himawari 8 enabled real-time distributed PV simulations for distribution networks,” in PVSC44 (Washington, DC, 2017).

Gueymard, C. A., “REST2: High-performance solar radiation model for cloudless-sky irradiance, illuminance, and photosynthetically active radiation - Validation with a benchmark dataset,” Sol. Energy 82, 272–285 (2008).

Jones, R., Bright, J., and Engerer, N. A., “Solar forecasting for low voltage network operators,” Technical Report (Oakley Greenwood on behalf of ARENA, 2018).

Killinger, S., “Evaluating different upscaling approaches to derive the actual power of distributed PV systems,” in 44th Photovoltaic Specialist Conference (PVSC44), Washington, DC, USA, 2017.

Killinger, S., Braam, F., Müller, B., Wille-Haussmann, B., and McKenna, R., “Projection of power generation between differently-oriented PV systems,” Sol. Energy 136, 153–165 (2016a).

Killinger, S., Müller, B., Saint-Drenan, Y. M., and McKenna, R., “Towards an improved nowcasting method by evaluating power profiles of PV systems to detect apparently atypical behavior,” in Conference Record of the IEEE Photovoltaic Specialists (2016b), pp. 980–985.

Killinger, S., Bright, J. M., Lingfors, D., and Engerer, N. A., “A tuning routine to correct systematic influences in reference PV systems' power outputs,” Sol. Energy 157, 1082–1094 (2017a).

Killinger, S., Engerer, N., and Müller, B., “QCPV: A quality control algorithm for distributed photovoltaic array power output,” Sol. Energy 143, 120–131 (2017b).

Killinger, S., Lingfors, D., Saint-Drenan, Y.-M., Moraitis, P., Sark, W. V., Taylor, J., Engerer, N. A., and Bright, J. M., “On the search for representative characteristics of PV systems: Data collection and analysis of PV system azimuth, tilt, capacity, yield and shading,” Sol. Energy 173, 1087–1106 (2018).

Lave, M., Reno, M. J., and Broderick, R. J., “Characterizing local high-frequency solar variability and its impact to distribution studies,” Sol. Energy 118, 327–337 (2015).

Lingfors, D. and Widén, J., “Development and validation of a wide-area model of hourly aggregate solar power generation,” Energy 102, 559–566 (2016).

Lingfors, D., Bright, J. M., Engerer, N. A., Ahlberg, J., Killinger, S., and Widén, J., “Comparing the capability of low- and high-resolution LiDAR data with application to solar resource assessment, roof type classification and shading analysis,” Appl. Energy 205, 1216–1230 (2017).

Lingfors, D., Killinger, S., Engerer, N. A., Widén, J., and Bright, J. M., “Identification of PV system shading using a LiDAR-based solar resource assessment model: An evaluation and cross-validation,” Sol. Energy 159, 157–172 (2018).

Lingfors, D., Shepero, M., Good, C., Bright, J., Widén, J., Boström, T., and Munkhammar, J., “Modelling city scale spatio-temporal solar energy generation and electric vehicle charging load,” in 8th International Workshop on Integration of Solar into Power Systems, Stockholm, Sweden, 2018, pp. 1–8.

Munkhammar, J. and Widén, J., “An N-state Markov-chain mixture distribution model of the clear-sky index,” Sol. Energy 173, 487–495 (2018).

Munkhammar, J., Widén, J., and Hinkelman, L. M., “A copula method for simulating correlated instantaneous solar irradiance in spatial networks,” Sol. Energy 143, 10–21 (2017).

NCI, National Computational Infrastructure: Data Access (NCI, 2019a).

NCI, NCI Data Portal: NCI Catalogue (NCI, 2019b).

Reindl, D. T., Beckman, W. A., and Duffie, J. A., “Evaluation of hourly tilted surface radiation models,” Sol. Energy 45, 9–17 (1990).

Saint-Drenan, Y.-M., Wald, L., Ranchin, T., Dubus, L., and Troccoli, A., “An approach for the estimation of the aggregated photovoltaic power generated in several European countries from meteorological data,” Adv. Sci. Res. 15, 51–62 (2018).

Shepero, M., Munkhammar, J., Widén, J., Bishop, J. D., and Boström, T., “Modeling of photovoltaic power generation and electric vehicles charging on city-scale: A review,” Renewable Sustainable Energy Rev. 89, 61–71 (2018).

Yang, D., “SolarData: An R package for easy access of publicly available solar datasets,” Sol. Energy 171, A3–A12 (2018).

  • s1
  • s2
  • s2A
  • f1
  • s2B
  • f2
  • s2C
  • s3
  • t1
  • f3
  • s4
  • f4
  • s5
  • s6
  • t2
  • s7
  • s8
  • c1
  • c2
  • c3
  • c4
  • c5
  • c6
  • c7
  • c8
  • c9
  • c10
  • c11
  • c12
  • c13
  • c14
  • c15
  • c16
  • c17
  • c18
  • c19
  • c20
  • c21
  • c22
  • c23
  • c24
  • c25
  • c26
  • c27

Power data that do not pass QCPV are simply replaced by ldquoNArdquofrom the time series this means that the data for Pqc already have theQCPV flags applied A direct comparison of the Pmeas and Pqc timeseries would indicate which time steps have been removed (eg if dataexist in Pmeas but not in Pqc then it was flagged) The QCPV routineoverall reduces the amount of data for ACT from 9157 671 to5 707 504 (623) for SA from 2619 416 to 1 829 595 (699) and forWA from 6763 084 to 4 897 408 (724) These percentages are notrepresentative on a PV system by PV system basis as certain PV sys-tems report data with excellent trust while others can be entirelyremoved by QCPV To demonstrate this Fig 3 shows the number ofPV systems that underwent a certain percentage of data reductionafter QCPV

We can observe the impact of QCPV by comparing the top rightand bottom left panels of Fig 4 Pqc clearly visualizes the removal ofthe intuitively erroneous data observed in Pmeas A subtle increase inthe mean power profile (thick yellow line) can be observed in the firsthalf of the day

C Tuned PV measurements

Tuning is the third development arising from the distributedPV power dataset Tuning corrects systematic biases that impact thePV system Consider a PV system that is shaded for the first half ofthe day every single morning there will be a systematic reduction inpower This is undesirable in a dataset where representative spatialcharacteristics of irradiance are desired because the systematic effecton one PV system is not representative of another To remove thesesystematic biases a tuning algorithm was developed These algo-rithms were initially developed in Engerer (2015) before being morefully developed in Killinger et al (2016b) and Killinger et al (2017a)following on from the QCPV development (Sec II B) In summaryeach PV system is analyzed over a rolling 30-day window wherebythe measurement at 1017 am is compared with the most recent 30previous measurements at 1017 am An assumption is made thatfor at least one of the 30 measurements considered one of the mea-surements was made under clear-sky conditions By taking the 99thpercentile of the 30-measurement distribution expressed askpv frac14 Pqc=Psimcs we would expect to observe a value of 1 under per-fectly clear conditions Should 1017 am never satisfy the definitionof clear conditions the time step is scaled toward clear-sky with theassumption that a systematic bias occurs at that measurement Thismost often captures shading events and rapid soiling events (egsnowfall or dust) it also mitigates for LF

The tuned PV power Ptuned is observed in the bottom right panelof Fig 4 The differences are subtle as PV shading is typically mini-mized by the PV system owner to maximize profitability however wecan observe significant alterations on a specific PV system time seriesAt the start and the end of the visualized day the mean power under-goes a significant trough in Pqc It is expected that many PV systemsregister a 99th percentile from the kpv distribution of the preceding30 days that is substantially below a value of 1mdashindicative of morningafternoon shading by trees buildings or from the roof itself The tun-ing algorithm rescales the distribution so that the measured time stepshave that shading removed This trough feature no longer appears inthe Ptuned time series hence systematic influences have been removed

III DATA ACCESS

The data are available as a two-part download and are hosted by the ARENA project partner, Solcast (https://solcast.com.au/rooftop-solar/publication-of-a-research-grade-solar-pv-power-dataset). Solcast is a global solar data services company that provides a considerable free allowance of access to solar radiation and PV datasets for research purposes. There are two files that must be downloaded and unzipped: (i) metadata and timestamps (https://s3-ap-southeast-2.amazonaws.com/solcast-site-downloads/Metadata_and_Timestamps.zip) and (ii) PV power measurements (https://s3-ap-southeast-2.amazonaws.com/solcast-site-downloads/PvPow+for+WA%2C+SA%2C+ACT.zip). The user can either navigate to the above Solcast link and locate the zipped files or simply click the two hyperlinks to start the download. The metadata and timestamp file is 0.95 MB in size, and the PV power data for all three cities is 539 MB. The dataset is listed at ANU Data Commons (https://dx.doi.org/10.25911/5ca6a0640869a) and mirrored, after an initial embargo period, on Zenodo (https://zenodo.org/record/2635887), both with the same DOI: 10.25911/5ca6a0640869a.

TABLE I. A summary of the QCPV criteria (Killinger et al., 2017a; 2017b).

Test | Criteria
Upper limit | Comparison of Pmeas against a theoretical upper limit based on extraterrestrial irradiance
Lower limit | Comparison of Pmeas against a theoretical lower limit and invariable detection (flat lines)
Sundown | Enforcing a “zero-output” on Pmeas above a certain zenith angle
Daily energy ratio | Removal of days with an unusually low amount of total energy generation
Maximum kpv | Limit of Pmeas against a maximum kpv = Pmeas/Psimcs (Engerer and Mills, 2014)
Persistence | Maximum limit on Pmeas variability and prohibition of zero power values for zenith angles <70°
Cloud enhancements | Suspected events must be corroborated by nearby PV systems and threshold values
Spurious data | Removal of days showing atypical behavior significantly different from normal operation
Benchmark violations | Removal of days where energy ratio deviation is significantly different from nearby systems

FIG. 3. Histogram showing the amount of data removed as a result of QCPV on a PV system by PV system basis. There are 20 bins, and QCPV data removal is rounded to the nearest 5%. For example, nearly 100 PV systems had 100% of their data removed by QCPV.

IV DATA USAGE RIGHTS

The data are released with bespoke terms. We state the crucial elements of these terms here. The dataset is freely provided to researchers as is, with no guarantee of support. The dataset is not for commercial usage, but for research only. You are empowered to use this dataset however you wish in your research through direct usage, adaptation, or improvements to the data itself. The data must not be redistributed, and the access point for the data is exclusively through the website as described in Sec. III. Should you make significant changes to the data and wish to redistribute the new data, explicit permission must be obtained from the authors. Finally, appropriate accreditation to the creators must be made in all publications and outputs that arise from using this dataset in any way. To appropriately accredit the creators, we require that this exact data article is referenced alongside its DOI: https://dx.doi.org/10.25911/5ca6a0640869a. Additionally, if using the QC version of the data, we also require a citation for the original papers detailing QCPV (Killinger et al., 2016a; 2017b). Furthermore, if using the tuned PV version of this data, we also require a citation for both the QCPV papers above and the PV tuning papers (Killinger et al., 2016b; 2017a; 2017b) for full visibility of the data provenance. Lastly, the original hosts of this data, PVOUTPUT.ORG, should be recognized for their efforts.

FIG. 4. Examples of the time series of four different PV power data available for download, plotted for the ACT dataset for 10 Jan 2017, selected to demonstrate both clear-sky and cloudy periods. This figure is produced in the matlab_example.m script provided. (Top left) The simulated clear-sky power curve, Psimcs; (top right) original raw Pmeas; (bottom left) output after QCPV, Pqc; (bottom right) output after tuning, Ptuned. The thick yellow line represents the mean of all the data.

V DATA FORMAT

The two downloaded zip files must be unzipped to extract the comma separated value (csv) files. Many free tools can be used to unzip the downloaded files, e.g., 7-zip, PeaZip, WinRAR, etc. The data are delimited by a semicolon and use the string NA for missing or filtered data.
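A minimal sketch of parsing this layout with Python's standard csv module is given below. The helper name read_power_rows and the inline sample text are hypothetical; only the semicolon delimiter and the NA convention come from the description above.

```python
import csv
import io


def read_power_rows(handle):
    """Parse a semicolon-delimited power file into (header, rows).

    The string "NA" (missing or filtered data) becomes None; every other
    field is converted to float.
    """
    reader = csv.reader(handle, delimiter=";")
    header = next(reader)
    rows = [
        [None if field == "NA" else float(field) for field in record]
        for record in reader
        if record  # skip blank lines
    ]
    return header, rows


# Hypothetical three-system excerpt, for illustration only.
sample = io.StringIO("101;102;103\n0.00;NA;0.12\n0.35;0.40;NA\n")
header, rows = read_power_rows(sample)
print(header)   # ['101', '102', '103']
print(rows[0])  # [0.0, None, 0.12]
```

With a real file, the handle would come from open(path, newline=""). Users of pandas could obtain a similar result with pandas.read_csv(path, sep=";", na_values="NA").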

Once extracted, there are 18 files, representing 6 types of file per location provided. These files are the metadata for each system, the timestamps, and four levels of PV power data. We provide data for the Australian Capital Territory (ACT), South Australia (SA), and Western Australia (WA); most notably, this covers the cities of Canberra, Adelaide, and Perth, respectively. The full formatting of the files is detailed in Table II.
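The 18 files can be enumerated from the six file-name prefixes listed in Table II and the three location codes. The sketch below assumes that the location code is appended directly to the prefix and that the extension is .csv; neither detail is confirmed here, so the names should be checked against the extracted files.

```python
from itertools import product

# Prefixes as listed in Table II; the suffix pattern and ".csv" are assumptions.
PREFIXES = [
    "metadata_final_",
    "unixtime_final_",
    "PVpow_trim_",
    "PVpow_qc_trim_",
    "PVpow_tuned_",
    "PVpow_cs_trim_",
]
LOCATIONS = ["ACT", "SA", "WA"]

expected_files = [f"{prefix}{loc}.csv" for loc, prefix in product(LOCATIONS, PREFIXES)]
print(len(expected_files))  # 18
```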

To summarize, the metadata and timestamp files are essential in order to find the appropriate column and the time reference within the power files. Each column in the power files represents a different PV system, and the header row details the identity number (ID) for that column so that the corresponding metadata can be found in the metadata file. Every row in the power files corresponds to a 10-min timestamp that can be found in the equivalent row within the time file. This means that each row is a single time step and is common between all PV systems found within the power file. The power is reported in kW/kWp, and so to get power in kilowatts, one must multiply the whole column by the capacity listed for the corresponding ID in the metadata file. The purpose of this is to normalize the data so that all PV systems can be directly compared to each other. Were one to plot a heatmap of all the sites for a single row, it would be possible to see the probable influence of clouds on the ground for that exact time step (Bright et al., 2018).
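The bookkeeping described above (matching a power column to its metadata row by ID, scaling by capacity, and pairing each row with its timestamp) might look like the following Python sketch. All values are invented for illustration, and the dictionary and list layout is an assumption rather than the structure used in the provided scripts.

```python
from datetime import datetime, timezone

# Hypothetical metadata: system ID -> installed capacity in kWp.
capacity_kwp = {"101": 3.0, "102": 5.0}

# Hypothetical power file content: header of IDs, then one row per time
# step in kW/kWp (None marks NA).
ids = ["101", "102"]
power_norm = [[0.50, None], [0.60, 0.20]]

# Hypothetical Unix timestamps (seconds, UTC), one per power row, 10 min apart.
unixtime = [1484006400, 1484007000]


def to_kilowatts(row, ids, capacity_kwp):
    """Multiply each normalized value by the capacity of the matching ID."""
    return [
        None if value is None else value * capacity_kwp[system_id]
        for value, system_id in zip(row, ids)
    ]


for stamp, row in zip(unixtime, power_norm):
    when = datetime.fromtimestamp(stamp, tz=timezone.utc)
    print(when.isoformat(), to_kilowatts(row, ids, capacity_kwp))
```

Looking up capacity by ID, rather than by column position, keeps the conversion correct even if the metadata rows are in a different order from the power columns.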

VI DATA HANDLING AND EXAMPLES

We provide three accompanying scripts that contain some typical approaches to loading and exploring the data. Three scripting languages are selected due to their (i) popularity, (ii) convention, and (iii) our expertise: Python, R, and Matlab®. While Matlab is proprietary software, it is widely used in many academic and private research institutions; R and Python are freely available. The three scripts are:

• matlab_example.m
• R_example.R
• python_example.py

The scripts are designed to be placed within the same folder as the downloaded and extracted data files. Each script is well commented and shows how to open and load the information, as well as how to plot some of the information. The intention is to demonstrate simple usability to encourage and benefit those who are less familiar with the languages. As an example, Fig. 4 was produced using matlab_example.m.

TABLE II. Format of the downloaded data files and a description of the data within.

Field/file | Variable | Description | Unit

Metadata
metadata_final_ | … | File name, followed by ACT/SA/WA | …
1st column | ID | Identity number from PVOUTPUT.ORG | …
2nd column | H | Latitude | [°]
3rd column | U | Longitude | [°]
4th column | … | Installed capacity of systems | [kWp]
5th column | … | Installation date | …
6th column | c | Azimuth angle of modules, reported | [°]
7th column | b | Tilt angle of modules, reported | [°]
8th column | cQ | Azimuth angle of modules, estimated | [°]
9th column | bQ | Tilt angle of modules, estimated | [°]
10th column | LF | Loss factor | …
1st row | … | Header | …
Every other row | … | Values for each system | …

Timestamps
unixtime_final_ | … | File name, followed by ACT/SA/WA | …
1st column | t | Time in Unixtime format (10-min res.) | [UTC]
2nd column | t | Time at the locality (10-min res.) | [Local time]
1st row | … | Header | …
Every other row | … | Corresponding timestamp to row in PV power file | …

PV power time series
PVpow_trim_ | Pmeas | File with original PV power measurements, followed by ACT/SA/WA | [kW/kWp]
PVpow_qc_trim_ | Pqc | File with quality controlled PV power measurements, followed by ACT/SA/WA | [kW/kWp]
PVpow_tuned_ | Ptuned | File with tuned PV power measurements, followed by ACT/SA/WA | [kW/kWp]
PVpow_cs_trim_ | Psimcs | File with simulated clear-sky PV power, followed by ACT/SA/WA | [kW/kWp]
Every column | … | A time series from a different PV system | [kW/kWp]
1st row | ID | Identity number from the corresponding metadata file | …
Every other row | … | Normalized PV power time series corresponding to timestamps | [kW/kWp]

Besides, the csv files can be opened directly in Microsoft Excel (not recommended); however, Excel does not recognize the native format. As such, the user must use the “Text to Columns” feature with the “delimiter” and “semicolon” settings activated in order to have all the data in appropriate rows and columns. Furthermore, the data use the convention of the string NA to represent missing data. Excel requires the format to be “#N/A”; therefore, a “find & replace” should be performed, swapping NA for #N/A. It is recommended for this to be done in an alternative computer program (e.g., Notepad/Notepad++) for computational efficiency, as Excel is slow in this regard.
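For readers who prefer to script the replacement rather than use a text editor, a minimal Python sketch follows. It rewrites only fields that are exactly NA, so that other text containing those letters is left alone. The function name and sample data are hypothetical, and the sketch assumes the target string is #N/A, Excel's not-available error value.

```python
import csv
import io


def na_to_excel(src, dst):
    """Copy a semicolon-delimited file, replacing fields equal to NA with #N/A."""
    reader = csv.reader(src, delimiter=";")
    writer = csv.writer(dst, delimiter=";", lineterminator="\n")
    for record in reader:
        writer.writerow(["#N/A" if field == "NA" else field for field in record])


# Hypothetical excerpt, for illustration only.
source = io.StringIO("101;102\n0.25;NA\nNA;0.40\n")
target = io.StringIO()
na_to_excel(source, target)
print(target.getvalue())
```

For real files, the two handles would be opened with open(path, newline="") for reading and open(path, "w", newline="") for writing, so that a separate copy is produced and the downloaded file is left untouched.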

VII DISCUSSION AND SUMMARY

The described dataset has already been an important feature of many published research papers. We also highlight use cases where these data could be applied in future research. In the following, a short overview of typical use cases (previous and potential) is given.

First, we discuss research that has already leveraged the presented dataset. The quality and the state of PV power datasets are often a significant barrier for operational usage; this motivated the development of a parametrization and quality control approach (Sec. II B). To balance systematic influences like shading, a tuning approach was developed on top of that (Sec. II C). These approaches were then compared to LiDAR-based approaches in order to assess the impact of LiDAR spatial resolution (Lingfors et al., 2017) and to corroborate the accuracy of reported metadata vs LiDAR approximated metadata and parameterized metadata (Lingfors et al., 2018). The metadata from this release were used to study the characteristics of PV systems in Australia alongside many other countries (Killinger et al., 2018). The spatially distributed nature of the PV power dataset enabled advanced nowcasting techniques, with the PV power data used as feedback to a conventional satellite nowcasting methodology (Bright et al., 2018).

In terms of research potential, there is considerable scope and opportunity that can be derived from the dataset. Exploration into battery storage optimization and grid integration could be performed with the dataset (Coimbra, 2019). Information on the spatio-temporal relationship of solar energy and PV power generation is also facilitated; there have been many recent studies on the collocation of electric vehicles and how the spatial distribution of PV power could influence charging effectiveness (Shepero et al., 2018 and Lingfors et al., 2018). Regional PV power modeling (Saint-Drenan et al., 2018), PV aggregation (Lingfors and Widén, 2016), and upscaling studies (Killinger, 2017) have an interesting potential from such a dataset. There are also interesting opportunities for spatial time series data visualization, PV power forecasting development and testing, as well as a component within hierarchical modeling.

Readers of this paper who are inclined to use this PV power dataset would also benefit from alternative and accompanying global datasets. For example, in Bright et al. (2018), power data were accompanied by Himawari-8 satellite imagery and numerical weather prediction models in order to derive irradiance forecasts. Many useful accompanying datasets are available from the National Computational Infrastructure (NCI), which is supported by the Australian Government (NCI, 2019a; 2019b); this should be the first point of reference for Australian-centric, relevant, and well-suited data, including satellite imagery. Weather variables can be obtained from reanalysis datasets. In the authors' opinion, the best available reanalysis dataset is the Modern-Era Retrospective analysis for Research and Applications, Version 2, by NASA (MERRA-2, https://gmao.gsfc.nasa.gov/reanalysis/MERRA-2); this is because of the high temporal resolution of 1 h, the reasonable spatial resolution of approximately 50 km latitudinally, and the significant historical coverage from 1980 to the present day.

In conclusion, we hope that this data article proves a useful guide to understanding and easily accessing a dataset of considerable research value.

SUPPLEMENTARY MATERIAL

See supplementary material for scripts in Matlab, R, and Python that demonstrate possible ways of working with the data.

ACKNOWLEDGMENTS

Dr. Jamie M. Bright and Dr. Nicholas A. Engerer were funded by the Australian Renewable Energy Agency (ARENA, Research and Development Program Funding G00854). We would like to extend our gratitude to PVoutput.org for operating a valuable website and for supporting this publication with written consent to republish a similar, yet transformed, dataset derived from their efforts.

Dr. Nicholas A. Engerer is also the CTO of Solcast, which is the host of the provided dataset.

REFERENCES

Bright, J. M., Babacan, O., Kleissl, J., Taylor, P. G., and Crook, R., “A synthetic, spatially decorrelating solar irradiance generator and application to a LV grid model with high PV penetration,” Sol. Energy 147, 83–98 (2017).

Bright, J. M., Killinger, S., Lingfors, D., and Engerer, N. A., “Improved satellite-derived PV power nowcasting using real-time power data from reference PV systems,” Sol. Energy 168, 118–139 (2018).

Coimbra, C. F. M., “Looking ahead with the Journal of Renewable and Sustainable Energy: Volume 11 and beyond,” J. Renewable Sustainable Energy 11, 010401 (2019).

Engerer, N. A., “City-wide simulations of distributed photovoltaic array power output,” Doctor of Philosophy (The Australian National University, 2015).

Engerer, N. A. and Mills, F. P., “KPV: A clear-sky index for photovoltaics,” Sol. Energy 105, 679–693 (2014).

Engerer, N. A., Bright, J. M., and Killinger, S., “Himawari 8 enabled real-time distributed PV simulations for distribution networks,” in PVSC44 (Washington, DC, 2017).

Gueymard, C. A., “REST2: High-performance solar radiation model for cloudless-sky irradiance, illuminance, and photosynthetically active radiation – Validation with a benchmark dataset,” Sol. Energy 82, 272–285 (2008).

Jones, R., Bright, J., and Engerer, N. A., “Solar forecasting for low voltage network operators,” Technical Report (Oakley Greenwood on behalf of ARENA, 2018).

Killinger, S., “Evaluating different upscaling approaches to derive the actual power of distributed PV systems,” in 44th Photovoltaic Specialist Conference (PVSC44), Washington, DC, USA, 2017.

Killinger, S., Braam, F., Müller, B., Wille-Haussmann, B., and McKenna, R., “Projection of power generation between differently-oriented PV systems,” Sol. Energy 136, 153–165 (2016a).

Killinger, S., Müller, B., Saint-Drenan, Y. M., and McKenna, R., “Towards an improved nowcasting method by evaluating power profiles of PV systems to detect apparently atypical behavior,” in Conference Record of the IEEE Photovoltaic Specialists (2016b), pp. 980–985.

Killinger, S., Bright, J. M., Lingfors, D., and Engerer, N. A., “A tuning routine to correct systematic influences in reference PV systems' power outputs,” Sol. Energy 157, 1082–1094 (2017a).

Killinger, S., Engerer, N., and Müller, B., “QCPV: A quality control algorithm for distributed photovoltaic array power output,” Sol. Energy 143, 120–131 (2017b).

Killinger, S., Lingfors, D., Saint-Drenan, Y.-M., Moraitis, P., Sark, W. V., Taylor, J., Engerer, N. A., and Bright, J. M., “On the search for representative characteristics of PV systems: Data collection and analysis of PV system azimuth, tilt, capacity, yield and shading,” Sol. Energy 173, 1087–1106 (2018).

Lave, M., Reno, M. J., and Broderick, R. J., “Characterizing local high-frequency solar variability and its impact to distribution studies,” Sol. Energy 118, 327–337 (2015).

Lingfors, D. and Widén, J., “Development and validation of a wide-area model of hourly aggregate solar power generation,” Energy 102, 559–566 (2016).

Lingfors, D., Bright, J. M., Engerer, N. A., Ahlberg, J., Killinger, S., and Widén, J., “Comparing the capability of low- and high-resolution LiDAR data with application to solar resource assessment, roof type classification and shading analysis,” Appl. Energy 205, 1216–1230 (2017).

Lingfors, D., Killinger, S., Engerer, N. A., Widén, J., and Bright, J. M., “Identification of PV system shading using a LiDAR-based solar resource assessment model: An evaluation and cross-validation,” Sol. Energy 159, 157–172 (2018).

Lingfors, D., Shepero, M., Good, C., Bright, J., Widén, J., Boström, T., and Munkhammar, J., “Modelling city scale spatio-temporal solar energy generation and electric vehicle charging load,” in 8th International Workshop on Integration of Solar into Power Systems, Stockholm, Sweden, 2018, pp. 1–8.

Munkhammar, J. and Widén, J., “An N-state Markov-chain mixture distribution model of the clear-sky index,” Sol. Energy 173, 487–495 (2018).

Munkhammar, J., Widén, J., and Hinkelman, L. M., “A copula method for simulating correlated instantaneous solar irradiance in spatial networks,” Sol. Energy 143, 10–21 (2017).

NCI, National Computational Infrastructure: Data Access (NCI, 2019a).

NCI, NCI Data Portal: NCI Catalogue (NCI, 2019b).

Reindl, D. T., Beckman, W. A., and Duffie, J. A., “Evaluation of hourly tilted surface radiation models,” Sol. Energy 45, 9–17 (1990).

Saint-Drenan, Y.-M., Wald, L., Ranchin, T., Dubus, L., and Troccoli, A., “An approach for the estimation of the aggregated photovoltaic power generated in several European countries from meteorological data,” Adv. Sci. Res. 15, 51–62 (2018).

Shepero, M., Munkhammar, J., Widén, J., Bishop, J. D., and Boström, T., “Modeling of photovoltaic power generation and electric vehicles charging on city-scale: A review,” Renewable Sustainable Energy Rev. 89, 61–71 (2018).

Yang, D., “SolarData: An R package for easy access of publicly available solar datasets,” Sol. Energy 171, A3–A12 (2018).


free allowance of access to solar radiation and PV datasets for researchpurposes There are two files that must be downloaded and unzipped(i) metadata and timestamps (httpss3-ap-southeast-2amazonaws-comsolcast-site-downloadsMetadata_and_Timestampszip) and (ii)PV power measurements (httpss3-ap-southeast-2amazonawscomsolcast-site-downloadsPvPowthornforthornWA2CthornSA2CthornACTzip)The user can either navigate to the above Solcast link and locate thezipped files or they can simply click the two hyperlinks to start thedownload The metadata and timestamp file is 095 MB in sizeand the PV power data for all three cities is 539 MB The dataset islisted at ANU Data Commons (httpsdxdoiorg10259115ca6a0640869a) and mirrored after an initial embargo period onZenodo (httpszenodoorgrecord2635887) both with the sameDOI10259115ca6a0640869a

IV DATA USAGE RIGHTS

The data are released with bespoke terms We state the cru-cial elements of these terms here The dataset is freely provided to

researchers as is with no guarantee of support The dataset is notfor commercial usage but for research only You are empoweredto use this dataset however you wish in your research throughdirect usage adaptation or improvements to the data itself Thedata must not be redistributed and the access point for the data isexclusively through the website as described in Sec III Shouldyou make significant changes to the data and wish to redistributethe new data explicit permission must be obtained from theauthors Finally appropriate accreditation to the creators must bemade in all publications and outputs that arise from using thisdataset in any way To appropriately accredit the creators werequire that this exact data article is referenced alongside its DOIhttpsdxdoiorg10259115ca6a0640869a Additionally if usingthe QC version of the data we also require a citation for the origi-nal papers detailing QCPV (Killinger et al 2016a 2017b)Furthermore if using the tuned PV version of this data we alsorequire a citation for both the QCPV papers above and the PVtuning papers (Killinger et al 2016b 2017ab) for full visibility of

FIG 4 Examples of the time series of four different PV power data available for download Plotted for the ACT dataset for 10 Jan 2017 selected to demonstrate both clear-sky and cloudy periods This figure is produced in the matlab_examplem script provided (Top left) the simulated clear-sky power curve Psimcs (top right) original raw Pmeas(bottom left) output after QCPV Pqc (bottom right) output after tuning Ptuned The thick yellow line represents the mean of all the data

Journal of Renewableand Sustainable Energy ARTICLE scitationorgjournalrse

J Renewable Sustainable Energy 11 035504 (2019) doi 10106315094059 11 035504-5

Published under license by AIP Publishing

the data provenance Lastly the original hosts of this dataPVOUTPUTORG should be recognized for their efforts

V DATA FORMAT

The two downloaded zip files must be unzipped to extract thecomma separated value (csv) files Many free tools can be used tounzip the downloaded files eg 7-zip PeaZip WinRAR etc The dataare delimited by a semicolon and have the convention for the stringNA for missing or filtered data

Once extracted there are 18 files representing 6 types of file perlocation provided These files are the metadata for each system thetimestamps and four levels of PV power data We provide data forthe Australian Capital Territory (ACT) South Australia (SA) andWestern Australia (WA) most notably this covers the cities ofCanberra Adelaide and Perth respectively The full formatting of thefiles is detailed in Table II

To summarize the metadata and timestamp files are essential inorder to find the appropriate column and the time reference withinthe power files Each column in the power files represents a differentPV system and the header row details the identity number (ID) forthat column so that the corresponding metadata can be found in the

metadata file Every row in the power files correspond to a 10-mintimestamp that can be found in the equivalent row within the timefile This means that each row is a single time step and is commonbetween all PV systems found within the power file The power isreported in kWkWp and so to get power in kilowatt one must mul-tiply the whole column by the appropriately ID corresponding capac-ity found in the metadata file The purpose of this is to normalize thedata so that all PV systems can be directly compared to each otherWere one to plot a heatmap of all the sites for a single row it would bepossible to see the probable influence of clouds on the ground for thatexact time step (Bright et al 2018)

VI DATA HANDLING AND EXAMPLES

We provide three accompanying scripts that contain withinsome typical approaches to loading and exploring the data containedThree scripting languages are selected due to their (i) popularity (ii)convention and (iii) our expertise Python R and MatlabVR WhileMatlab is a proprietary software it is widely used in many academicand private research institutions R and Python are freely availableThe three scripts are

TABLE II Format of the downloaded data files and a description of the data within

Fieldfile Variable Description Unit

Metadatametadata_final_ hellip File name followed by ACTSAWA hellip1st column ID Identity number from PVOUTPUTORG hellip2nd column H Latitude frac123rd column U Longitude frac124nd column hellip Installed capacity of systems [kWp]5th column hellip Installation date hellip6th column c Azimuth angle of modules reported frac127th column b Tilt angle of modules reported frac128th column cQ Azimuth angle of modules estimated frac129th column bQ Tilt angle of modules estimated frac1210th column LF Loss factor hellip1st row hellip Header hellipEvery other row hellip Values for each system hellipTimestampsunixtime_final_ hellip File name followed by ACTSAWA hellip1st column t Time in Unixtime format (10-min res) [UTC]2nd column t Time at the locality (10-min res) [Local time]1st row hellip Header hellipEvery other row hellip Corresponding timestamp to row in PV power file hellipPV power time seriesPVpow_trim_ Pmeas File with original PV power measurements followed by ACTSAWA [kWkWp]PVpow_qc_trim_ Pqc File with quality controlled PV power measurements followed by ACTSAWA [kWkWp]PVpow_tuned_ Ptuned File with tuned PV power measurements followed by ACTSAWA [kWkWp]PVpow_cs_trim_ Psimcs File with simulated clear-sky PV power followed by ACTSAWA [kWkWp]Every column hellip A time series from a different PV system [kWkWp]1st row ID Identity number from the corresponding metadata file hellipEvery other row hellip Normalized PV power time series corresponding to timestamps [kWkWp]

Journal of Renewableand Sustainable Energy ARTICLE scitationorgjournalrse

J Renewable Sustainable Energy 11 035504 (2019) doi 10106315094059 11 035504-6

Published under license by AIP Publishing

bull matlab_examplembull R_exampleRbull python_examplepy

The scripts are designed to be placed within the same folder asthe downloaded and extracted data files Each script is well com-mented and shows how to open and load the information as well ashow to plot some of the information The intention is to demonstratesimple usability to encourage and benefit those who are less familiarwith the languages As an example Fig 4 was produced usingmatlab_examplem

Besides the csv files can be opened directly in Microsoft Excel(not recommended) however Excel does not recognize the native for-mat As such the user must use the ldquoText to Columnsrdquo feature withthe ldquodelimiterrdquo and ldquosemicolonrdquo settings activated in order to have allthe data in appropriate rows and columns Furthermore the data usethe convention of the string NA to represent missing data Excelrequires the format to be ldquoNArdquo therefore a ldquofind amp replacerdquo shouldbe performed swapping NA for NA It is recommended for this tobe done in an alternative computer program (eg NotepadNotepadthornthorn) for computational efficiency as Excel is slow in thisregard

VII DISCUSSION AND SUMMARY

The described dataset has already been an important feature formany published research papers We also highlight use cases wherethese data could also be used and applied in future research In the fol-lowing a short overview about typical use cases (previous and poten-tial) is given

First we discuss in terms of research that has already leveragedthe presented dataset The quality and the state of PV power datasetsare often a significant barrier for operational usage this motivated thedevelopment of a parametrization and quality control approach (SecII B) To balance systematic influences like shading a tuning approachwas developed on top of that (Sec II C) These approaches were thencompared to LiDAR-based approaches in order to assess the impact ofLiDAR spatial resolution (Lingfors et al 2017) and to corroborate theaccuracy of reported metadata vs LiDAR approximated metadata andparameterized metadata (Lingfors et al 2018) The metadata fromthis release were used to study the characteristics of PV systems inAustralia alongside many other countries (Killinger et al 2018) Thespatially distributed nature of the PV power dataset enabled advancednowcasting techniques using the PV power data to be used as a feed-back to a conventional satellite nowcasting methodology (Bright et al2018)

In terms of research potential there is considerable scope andopportunity that can be derived from the dataset Exploration into bat-tery storage optimization and grid integration could be performedwith the dataset (Coimbra 2019) Information on the spatio-temporalrelationship of solar energy and PV power generation is also facili-tated there have been many recent studies on the collocation of elec-tric vehicles and how the spatial distribution of PV power couldinfluence charging effectiveness (Shepero et al 2018 and Lingforset al 2018) Regional PV power modeling (Saint-Drenan et al 2018)PV aggregation (Lingfors and Widen 2016) and upscaling studies(Killinger 2017) have an interesting potential from such a datasetThere are also interesting opportunities for spatial time series data

visualization PV power forecasting development and testing as wellas a component within hierarchical modeling

Readers of this paper that are inclined to use this PV power data-set would also benefit from alternative and accompanying global data-sets For example in Bright et al (2018) power data wereaccompanied by Himawari-8 satellite imagery and numerical weatherprediction models in order to derive irradiance forecasts Many usefulaccompanying datasets are available from the National ComputationalInfrastructure (NCI) which is supported by the AustralianGovernment (NCI 2019ab) this should be the first point of referencefor Australian centric relevant and well-suited data including satelliteimagery Weather variables can be obtained from reanalyses datasetsIn the authorrsquos opinion the best available reanalysis dataset is theModern-Era Retrospective analysis for Research and ApplicationsVersion 2 by NASA (MERRA-2 httpsgmaogsfcnasagovreanaly-sisMERRA-2) this is because of the high temporal resolution of 1-hthe reasonable spatial resolution of 50 km2 latitudinally and the signif-icant historical coverage from 1980 to the present day

In conclusion we hope that this data article proves a useful guideto understanding and easily accessing a dataset of considerableresearch value

SUPPLEMENTARY MATERIAL

See supplementary material for scripts in Matlab R and Pythonto demonstrate possible ways of working with the data

ACKNOWLEDGMENTS

Dr Jamie M Bright and Dr Nicholas A Engerer were fundedby the Australian Renewable Energy Agency (ARENA Researchand Development Program Funding G00854) We would like toextend our gratitude to PVoutputorg for operating a valuablewebsite and for supporting this publication with written consent torepublish a similar yet transformed dataset derived from theirefforts

Dr Nicholas A Engerer is also the CTO of Solcast who is thehost of the provided dataset

REFERENCES

Bright, J. M., Babacan, O., Kleissl, J., Taylor, P. G., and Crook, R., “A synthetic, spatially decorrelating solar irradiance generator and application to a LV grid model with high PV penetration,” Sol. Energy 147, 83–98 (2017).

Bright, J. M., Killinger, S., Lingfors, D., and Engerer, N. A., “Improved satellite-derived PV power nowcasting using real-time power data from reference PV systems,” Sol. Energy 168, 118–139 (2018).

Coimbra, C. F. M., “Looking ahead with the Journal of Renewable and Sustainable Energy: Volume 11 and beyond,” J. Renewable Sustainable Energy 11, 010401 (2019).

Engerer, N. A., “City-wide simulations of distributed photovoltaic array power output,” Doctor of Philosophy (The Australian National University, 2015).

Engerer, N. A. and Mills, F. P., “KPV: A clear-sky index for photovoltaics,” Sol. Energy 105, 679–693 (2014).

Engerer, N. A., Bright, J. M., and Killinger, S., “Himawari 8 enabled real-time distributed PV simulations for distribution networks,” in PVSC44 (Washington, DC, 2017).

Gueymard, C. A., “REST2: High-performance solar radiation model for cloudless-sky irradiance, illuminance, and photosynthetically active radiation: Validation with a benchmark dataset,” Sol. Energy 82, 272–285 (2008).

Jones, R., Bright, J., and Engerer, N. A., “Solar forecasting for low voltage network operators,” Technical Report (Oakley Greenwood on behalf of ARENA, 2018).


Killinger, S., “Evaluating different upscaling approaches to derive the actual power of distributed PV systems,” in 44th Photovoltaic Specialist Conference (PVSC44), Washington, DC, USA, 2017.

Killinger, S., Braam, F., Müller, B., Wille-Haussmann, B., and McKenna, R., “Projection of power generation between differently-oriented PV systems,” Sol. Energy 136, 153–165 (2016a).

Killinger, S., Müller, B., Saint-Drenan, Y. M., and McKenna, R., “Towards an improved nowcasting method by evaluating power profiles of PV systems to detect apparently atypical behavior,” in Conference Record of the IEEE Photovoltaic Specialists (2016b), pp. 980–985.

Killinger, S., Bright, J. M., Lingfors, D., and Engerer, N. A., “A tuning routine to correct systematic influences in reference PV systems' power outputs,” Sol. Energy 157, 1082–1094 (2017a).

Killinger, S., Engerer, N., and Müller, B., “QCPV: A quality control algorithm for distributed photovoltaic array power output,” Sol. Energy 143, 120–131 (2017b).

Killinger, S., Lingfors, D., Saint-Drenan, Y.-M., Moraitis, P., Sark, W. V., Taylor, J., Engerer, N. A., and Bright, J. M., “On the search for representative characteristics of PV systems: Data collection and analysis of PV system azimuth, tilt, capacity, yield and shading,” Sol. Energy 173, 1087–1106 (2018).

Lave, M., Reno, M. J., and Broderick, R. J., “Characterizing local high-frequency solar variability and its impact to distribution studies,” Sol. Energy 118, 327–337 (2015).

Lingfors, D. and Widén, J., “Development and validation of a wide-area model of hourly aggregate solar power generation,” Energy 102, 559–566 (2016).

Lingfors, D., Bright, J. M., Engerer, N. A., Ahlberg, J., Killinger, S., and Widén, J., “Comparing the capability of low- and high-resolution LiDAR data with application to solar resource assessment, roof type classification and shading analysis,” Appl. Energy 205, 1216–1230 (2017).

Lingfors, D., Killinger, S., Engerer, N. A., Widén, J., and Bright, J. M., “Identification of PV system shading using a LiDAR-based solar resource assessment model: An evaluation and cross-validation,” Sol. Energy 159, 157–172 (2018).

Lingfors, D., Shepero, M., Good, C., Bright, J., Widén, J., Boström, T., and Munkhammar, J., “Modelling city scale spatio-temporal solar energy generation and electric vehicle charging load,” in 8th International Workshop on Integration of Solar into Power Systems, Stockholm, Sweden, 2018, pp. 1–8.

Munkhammar, J. and Widén, J., “An N-state Markov-chain mixture distribution model of the clear-sky index,” Sol. Energy 173, 487–495 (2018).

Munkhammar, J., Widén, J., and Hinkelman, L. M., “A copula method for simulating correlated instantaneous solar irradiance in spatial networks,” Sol. Energy 143, 10–21 (2017).

NCI, National Computational Infrastructure Data Access (NCI, 2019a).

NCI, NCI Data Portal, NCI Catalogue (NCI, 2019b).

Reindl, D. T., Beckman, W. A., and Duffie, J. A., “Evaluation of hourly tilted surface radiation models,” Sol. Energy 45, 9–17 (1990).

Saint-Drenan, Y.-M., Wald, L., Ranchin, T., Dubus, L., and Troccoli, A., “An approach for the estimation of the aggregated photovoltaic power generated in several European countries from meteorological data,” Adv. Sci. Res. 15, 51–62 (2018).

Shepero, M., Munkhammar, J., Widén, J., Bishop, J. D., and Boström, T., “Modeling of photovoltaic power generation and electric vehicles charging on city-scale: A review,” Renewable Sustainable Energy Rev. 89, 61–71 (2018).

Yang, D., “SolarData: An R package for easy access of publicly available solar datasets,” Sol. Energy 171, A3–A12 (2018).


the data provenance. Lastly, the original hosts of these data, PVOUTPUT.ORG, should be recognized for their efforts.

V. DATA FORMAT

The two downloaded zip files must be unzipped to extract the comma-separated value (csv) files. Many free tools can be used to unzip the downloaded files, e.g., 7-Zip, PeaZip, WinRAR, etc. The data are delimited by a semicolon and use the string NA for missing or filtered data.
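As a rough illustration of the two conventions just described (semicolon delimiter, NA for missing values), the following Python sketch parses a small in-memory sample. The function name read_power_table and the sample values are hypothetical and are not taken from the supplementary scripts; only the delimiter and the NA marker come from the text above.

```python
import csv
import io


def read_power_table(text):
    """Parse semicolon-delimited text into (ids, rows), mapping NA to None."""
    reader = csv.reader(io.StringIO(text), delimiter=";")
    ids = next(reader)  # first row: one system ID per column
    rows = [
        [None if cell == "NA" else float(cell) for cell in row]
        for row in reader
        if row  # skip any blank lines
    ]
    return ids, rows


# Hypothetical sample: three systems, two time steps.
sample = "101;102;103\n0.00;NA;0.12\n0.35;0.40;NA\n"
ids, rows = read_power_table(sample)
print(ids)
print(rows)
```

Mapping NA to None (rather than to zero) keeps filtered samples distinguishable from genuine zero output, e.g., at night.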

Once extracted, there are 18 files representing 6 types of file per location provided. These files are the metadata for each system, the timestamps, and four levels of PV power data. We provide data for the Australian Capital Territory (ACT), South Australia (SA), and Western Australia (WA); most notably, this covers the cities of Canberra, Adelaide, and Perth, respectively. The full formatting of the files is detailed in Table II.
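The count of 18 files follows from combining the six file-name prefixes listed in Table II with the three location codes. A minimal Python sketch of that bookkeeping is given below; the ".csv" extension and the helper name expected_files are assumptions made for illustration, so the names should be checked against the extracted folder.

```python
from itertools import product

REGIONS = ["ACT", "SA", "WA"]
PREFIXES = [
    "metadata_final_",   # system metadata
    "unixtime_final_",   # timestamps
    "PVpow_trim_",       # original measurements
    "PVpow_qc_trim_",    # quality-controlled measurements
    "PVpow_tuned_",      # tuned measurements
    "PVpow_cs_trim_",    # simulated clear-sky power
]
EXTENSION = ".csv"  # assumption: the extracted files carry this extension


def expected_files():
    """List the file names implied by Table II, grouped by region."""
    return [prefix + region + EXTENSION for region, prefix in product(REGIONS, PREFIXES)]


files = expected_files()
print(len(files))
```

Such a list can be compared against the contents of the extraction folder to confirm that nothing was lost while unzipping.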

To summarize, the metadata and timestamp files are essential in order to find the appropriate column and the time reference within the power files. Each column in the power files represents a different PV system, and the header row details the identity number (ID) for that column so that the corresponding metadata can be found in the metadata file. Every row in the power files corresponds to a 10-min timestamp that can be found in the equivalent row within the time file. This means that each row is a single time step and is common between all PV systems found within the power file. The power is reported in kW/kWp, and so to get power in kilowatts, one must multiply the whole column by the installed capacity listed for the corresponding ID in the metadata file. The purpose of this is to normalize the data so that all PV systems can be directly compared to each other. Were one to plot a heatmap of all the sites for a single row, it would be possible to see the probable influence of clouds on the ground for that exact time step (Bright et al., 2018).
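The row and column logic above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the dictionary layout, the helper names to_kilowatts and snapshot, and all numeric values are hypothetical; only the conventions (columns are systems, rows are 10-min time steps, power in kW/kWp, Unix time in UTC) are taken from the text and Table II.

```python
from datetime import datetime, timezone


def to_kilowatts(normalized, capacity_kwp):
    """Scale one system's kW/kWp series by its installed capacity; gaps stay None."""
    return [None if v is None else v * capacity_kwp for v in normalized]


def snapshot(ids, row, metadata):
    """Return (latitude, longitude, value) for every system with data in one time step."""
    return [
        (metadata[i]["lat"], metadata[i]["lon"], v)
        for i, v in zip(ids, row)
        if v is not None
    ]


# Hypothetical metadata keyed by system ID (capacity in kWp).
metadata = {
    "101": {"lat": -35.28, "lon": 149.13, "kwp": 5.0},
    "102": {"lat": -35.30, "lon": 149.10, "kwp": 3.0},
}
ids = ["101", "102"]                   # header row of a power file
unixtime = [1472688000, 1472688600]    # rows of the time file, 10 min apart
power = [[0.50, None], [0.25, 0.75]]   # rows = time steps, columns = systems

col_101 = [row[0] for row in power]
kw_101 = to_kilowatts(col_101, metadata["101"]["kwp"])
times = [datetime.fromtimestamp(t, tz=timezone.utc) for t in unixtime]

for t, kw in zip(times, kw_101):
    print(t.isoformat(), kw)
print(snapshot(ids, power[0], metadata))
```

The output of snapshot is the kind of point set that could be passed to a plotting routine to draw the single-time-step heatmap mentioned above.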

TABLE II. Format of the downloaded data files and a description of the data within.

| Field/file | Variable | Description | Unit |
|---|---|---|---|
| Metadata | | | |
| metadata_final_ | … | File name, followed by ACT/SA/WA | … |
| 1st column | ID | Identity number from PVOUTPUT.ORG | … |
| 2nd column | Θ | Latitude | [°] |
| 3rd column | Φ | Longitude | [°] |
| 4th column | … | Installed capacity of systems | [kWp] |
| 5th column | … | Installation date | … |
| 6th column | γ | Azimuth angle of modules, reported | [°] |
| 7th column | β | Tilt angle of modules, reported | [°] |
| 8th column | γQ | Azimuth angle of modules, estimated | [°] |
| 9th column | βQ | Tilt angle of modules, estimated | [°] |
| 10th column | LF | Loss factor | … |
| 1st row | … | Header | … |
| Every other row | … | Values for each system | … |
| Timestamps | | | |
| unixtime_final_ | … | File name, followed by ACT/SA/WA | … |
| 1st column | t | Time in Unix time format (10-min res.) | [UTC] |
| 2nd column | t | Time at the locality (10-min res.) | [Local time] |
| 1st row | … | Header | … |
| Every other row | … | Corresponding timestamp to row in PV power file | … |
| PV power time series | | | |
| PVpow_trim_ | Pmeas | File with original PV power measurements, followed by ACT/SA/WA | [kW/kWp] |
| PVpow_qc_trim_ | Pqc | File with quality controlled PV power measurements, followed by ACT/SA/WA | [kW/kWp] |
| PVpow_tuned_ | Ptuned | File with tuned PV power measurements, followed by ACT/SA/WA | [kW/kWp] |
| PVpow_cs_trim_ | Psimcs | File with simulated clear-sky PV power, followed by ACT/SA/WA | [kW/kWp] |
| Every column | … | A time series from a different PV system | [kW/kWp] |
| 1st row | ID | Identity number from the corresponding metadata file | … |
| Every other row | … | Normalized PV power time series corresponding to timestamps | [kW/kWp] |

VI. DATA HANDLING AND EXAMPLES

We provide three accompanying scripts that contain some typical approaches to loading and exploring the data. Three scripting languages were selected due to their (i) popularity, (ii) convention, and (iii) our expertise: Python, R, and Matlab®. While Matlab is proprietary software, it is widely used in many academic and private research institutions; R and Python are freely available. The three scripts are:

• matlab_example.m
• R_example.R
• python_example.py

The scripts are designed to be placed within the same folder as the downloaded and extracted data files. Each script is well commented and shows how to open and load the information, as well as how to plot some of the information. The intention is to demonstrate simple usability to encourage and benefit those who are less familiar with the languages. As an example, Fig. 4 was produced using matlab_example.m.

Besides, the csv files can be opened directly in Microsoft Excel (not recommended); however, Excel does not recognize the native format. As such, the user must use the “Text to Columns” feature with the “delimiter” and “semicolon” settings activated in order to have all the data in appropriate rows and columns. Furthermore, the data use the convention of the string NA to represent missing data. Excel requires the format to be “#N/A”; therefore, a “find & replace” should be performed, swapping NA for #N/A. It is recommended that this be done in an alternative computer program (e.g., Notepad/Notepad++) for computational efficiency, as Excel is slow in this regard.
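For readers who prefer to script the replacement rather than use a text editor, a small Python sketch is shown below. It assumes that the intended Excel marker is #N/A (the marker is not legible in every copy of the text) and that NA only ever appears as a whole field; the function names are hypothetical. Replacing whole fields, rather than every occurrence of the two letters, avoids touching any header text that merely contains "NA".

```python
def excel_friendly(line, missing="NA", excel_missing="#N/A", delimiter=";"):
    """Swap whole-field missing markers in one semicolon-delimited line."""
    cells = line.split(delimiter)
    return delimiter.join(excel_missing if cell == missing else cell for cell in cells)


def convert(text):
    """Apply the swap to every line of a file's contents."""
    return "\n".join(excel_friendly(line) for line in text.splitlines())


# Hypothetical two-row sample.
print(convert("0.10;NA;0.30\nNA;0.25;NA"))
```

The converted text can then be written to a new file, leaving the original download untouched.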

VII. DISCUSSION AND SUMMARY

The described dataset has already been an important feature of many published research papers. We also highlight use cases where these data could be applied in future research. In the following, a short overview of typical use cases (previous and potential) is given.

First, we discuss research that has already leveraged the presented dataset. The quality and the state of PV power datasets are often a significant barrier for operational usage; this motivated the development of a parametrization and quality control approach (Sec. II B). To balance systematic influences like shading, a tuning approach was developed on top of that (Sec. II C). These approaches were then compared to LiDAR-based approaches in order to assess the impact of LiDAR spatial resolution (Lingfors et al., 2017) and to corroborate the accuracy of reported metadata vs LiDAR-approximated metadata and parameterized metadata (Lingfors et al., 2018). The metadata from this release were used to study the characteristics of PV systems in Australia alongside many other countries (Killinger et al., 2018). The spatially distributed nature of the PV power dataset enabled advanced nowcasting techniques, using the PV power data as feedback to a conventional satellite nowcasting methodology (Bright et al., 2018).

In terms of research potential, there is considerable scope and opportunity that can be derived from the dataset. Exploration into battery storage optimization and grid integration could be performed with the dataset (Coimbra, 2019). Information on the spatio-temporal relationship of solar energy and PV power generation is also facilitated; there have been many recent studies on the collocation of electric vehicles and how the spatial distribution of PV power could influence charging effectiveness (Shepero et al., 2018 and Lingfors et al., 2018). Regional PV power modeling (Saint-Drenan et al., 2018), PV aggregation (Lingfors and Widén, 2016), and upscaling studies (Killinger, 2017) have an interesting potential from such a dataset. There are also interesting opportunities for spatial time series data visualization, PV power forecasting development and testing, as well as use as a component within hierarchical modeling.




