
STAT

The MARS Crop Yield Forecasting System

METHODOLOGY OF THE MARS CROP YIELD FORECASTING SYSTEM

VOL 4

STATISTICAL DATA COLLECTION, PROCESSING AND ANALYSIS [1]

Giampiero Genovese and Manola Bettio, editors [2]

Contributions from Giampiero Genovese [2], Manola Bettio [2], Stefania Orlandi [2], Hendrik Boogaard [3], Kees Van Diepen [3], Michalis Petrakos [4], Photis Stavropoulos [4], Ioanna Tassoula [4], Maria Glossioti [4]

1 This book is mainly based on the achievements of the following studies:
- METAMP (Methodological assessment of MARS predictions) Study, EC-JRC project, IPSC-AGRIFISH Unit, MARS Stat Action. The project was executed by the external contractor ALTERRA/VITO/Supit Cons., Contract N° 19226-2002-02 F1FED ISP NL, in 2002. The study was updated according to internal JRC advances and to the MARSOP (ITT OJ 2003/S 141-127580 2003) and ASEMARS (OJ/S 157 13/08/2004) technical specifications (JRC-IPSC-AGRIFISH/MARS Stat), issued in 2003 and 2004 respectively.
- QUAMP (Quality assessment of MARS predictions) Study, EC-JRC project, IPSC-AGRIFISH Unit, MARS Stat Action. The project was executed by the external contractor Liaison Systems, Contract N° 20240-2002-12 F1ED ISP GR, in 2003.

2 JRC-EC, IPSC, AGRIFISH/MARS STAT, tp 266, contact [email protected]

3 Droevendaalsesteeg 3, 6700 AA, Wageningen, The Netherlands. Tel: +31 317 474371, Fax: +31 317 419000, Email: [email protected]

4 (Formerly Liaison Systems) Akadimias 77, 10678 Athens, Greece. Tel: +30-2103304315-7, Fax: +30-2103304345, web: www.agilis-sa.gr, email: [email protected]


Methodology of the MCYFS Vol. 4 – statistical data collection processing and analysis 4 - 1

PREFACE

The MARS (Monitoring Agriculture with Remote Sensing) STAT Sector of the Agriculture and Fisheries Unit of the Joint Research Centre (DG-JRC EC) is part of the Institute for the Protection and Security of the Citizen. Its mission is strongly “customer driven” and focuses on supporting the EC-DGs in the implementation of EU policies. Since its conception 14 years ago, the MARS project has gone through several Research Framework Programmes, studying, developing and implementing a number of methodologies and techniques in response to the requests of different EC Directorates-General. The main MARS activities are: anti-fraud verifications and controls - MARS PAC (support to DG-Agriculture and Member States); food security activities - MARS FOOD (support to DG-Development and DG-AidCom); and outlook statistics on crop production - MARS STAT (support to DG-Agriculture).

The need in Europe for early figures on harvests led to the development, within the MARS project, of the MARS-Stat activities (Meyer-Roux, Vossen: The first phase of the MARS project, 1988-1993. Overview, methods and results. In proceedings of the Conference on: “The MARS project, overview and perspectives”, Belgirate, November 1993 [4]). Among these, a crop yield forecasting system was put in place in order to supply the DG-Agriculture Outlook group with early information on the development and growth conditions of crops. The system relies on a pan-European agro-meteorological method of analysis (Vossen, Rijks, third print, 1996. Early crop yield assessment of the EU Countries: the system implemented by the Joint Research Centre [5]). After some years of research in co-operation with Member States and a pre-operational phase, the MARS Unit, following a recent Council/Parliament decision [6], now runs the system in an operational context under the name MARS Crop Yield Forecasting System (MCYFS). The system consists of a set of methodologies and tools that provide early information on crops throughout the agricultural season.

The MARS-STAT group is in charge of R&D, technological improvement, output quality control, agro-meteorological analyses, crop yield forecasts and the publication of the MARS bulletins (http://agrifish.jrc.it/marsstat/Bulletins/2004.htm), while the operational management of the system has been outsourced (see reference Operational Activities for the MARS Crop Yield Forecasting System). The data produced for the analysis and forecasts are distributed through a web site, http://ww.marsop.info, and other elements of the forecasting system, such as interpolated meteorological data, remote sensing indicators, software codes and executables, are available for transfer to external users (refer to http://agrifish.jrc.it/marsstat/default.htm).

The MCYFS is called a system because several elements and independent modules are integrated to achieve a final purpose, namely to monitor crop behaviour and produce crop yield forecasts. The MCYFS is run operationally on an area covering the whole European continent, the Maghreb and Turkey [7]. The following crops are covered by the system's simulation models: wheat, spring barley, grain maize, rape seed, sunflower, potato, sugar beet and field bean. Crops such as rice, soy bean and pastures are presently under study/evaluation. The simulated crop parameters can be extended to other crops or varieties within the same class, such as winter barley, durum wheat, field peas, etc. In due course the system will be able to supply predictors and predictions for more than 11 crops.

The main pillars of the system are:
⇒ Observed meteorological data collection, processing and analysis
⇒ Simulation of agro-meteorological crop growth parameters

4 EUR Publication n° 15599 EN of the Office for Official Publications of the E.C., Luxembourg, Space Applications Institute, J.R.C. Ispra, pp. 33-81.

5 EUR Publication N° 16318 of the Office for Official Publications of the EC, Luxembourg, 182 pp.

6 The European Parliament and the Council adopted on 22.05.2000 Decision n° 1445/2000/EC “on the application of area frame survey and remote sensing techniques to the agricultural statistics for 1999 to 2003”. The Decision was renewed for the period 2004-2007 (Ref. PE/CONS 3661/1/03, OJ L 309 of 26.11.2003). Research actions related to the system currently find their legal basis in the JRC multi-annual working programme (FP6 2003-2006, action 1121 MARS STAT).

7 Within the MARS FOOD Sector, the MARS Stat system is being experimentally extended to 4 other pilot areas: the whole Mediterranean basin, CIS Countries, Eastern Africa (IGAD area) and South America (MERCOSUR area), mainly in support of DG-DEV and DG-AIDCO for food-security policies.


⇒ Low resolution satellite data analysis
⇒ Statistical analysis and forecasts

The output of the system is threefold:
a) Mapped outputs of quality indicators for the agricultural season, e.g. extreme temperature maps at given crop development stages; simulated biomass and grain production; estimated actual soil moisture reserves; state of advancement of the development stage during a given month; and differences from the long-term average, at a given dekad or period within the growing season, for any agro-meteorological indicator.
b) Alarm and risk warning: detection of abnormal weather conditions (during a given month, or cumulated since the start of the season).
c) Calculated yield forecasts.

Meteorological and agro-meteorological maps are produced by the system every ten days and screened by analysts. Data are updated on the web site and published in the MARS bulletin about 6-7 times a year as a complete analysis, and every 15 days as climatic updates during the main crop vegetative period.

As depicted in the cover picture of the publication, the MCYFS can ideally be divided into three levels:
1- In the first level, the meteorological data are collected, quality checked, processed and analysed.
2- In the second level, a simulation engine (crop growth simulation system) is run to link the meteorological data to crop biomass production. The engines used are the Crop Growth Monitoring System (the WOFOST model adapted to the European scale) and LINGRA (for pastures). At this stage, auxiliary information such as soil parameters, crop calendars, crop practices and crop parameters (the last three forming the core of the crop knowledge base) is introduced as a fundamental complement for an acceptable simulation. Many crop-specific indicators/predictors are produced at this stage and transferred to the statistical analysis to support the production of a quantitative yield forecast. The second level of the system also includes the processing of remote sensing data to produce “measured” vegetation indicators, which can be compared with the “estimated” agro-meteorological indicators as well as used as predictors. The satellite sensors are low and medium resolution ones, respectively SPOT-Vegetation/NOAA-AVHRR (about 1 km resolution) and MERIS/MODIS (about 300-500 m resolution).
3- In the third level, the indicators obtained from meteorological and agro-meteorological data and from remote sensing are linked to the time series of the official yields through regression models, and analysed through scenarios. The final results are quantitative yield forecasts, which are published in the MARS bulletins together with the analysis of the previous outputs.

More than three hundred publications exist on the MCYFS; however, none of them gives a comprehensive overview of the current operational system. The four volumes dedicated to the Methodology of the MARS Crop Yield Forecasting System aim to provide a general overview and a better understanding of the MCYFS, describing its main components, databases, methods and results in a concise yet complete manner.

This book, one of a series of four, focuses on the elements of level 3 of the system: “The statistical data collection, analysis and processing”. The first part of the book mainly describes the methodology used to handle the statistical flow of data in the preparation of the forecasts, as derived from the predictors provided by the previous levels (described in Volumes 1 to 3). The following chapters give an overview of the statistical forecast calculations, including the scenario analysis and the module embedded in CGMS that runs the regression analysis automatically. The part describing the CGMS module is based on the METAMP project (Methodological Assessment of MARS predictions, Contract N° 19226-2002-02 F1FED ISP NL, executed by the consortium ALTERRA/VITO/Supit Consulting in 2002). The last part of the book focuses on the forecast error calculations and the forecasting strength of the system, and is based on the results of the QUAMP project (Qualitative Assessment of MARS Predictions, Contract N° 20240-2002-12 F1ED ISP GR, executed by Liaison Systems).

Additional information, databases and related documents can be requested from JRC (contact [email protected]) or through the web site http://agrifish.jrc.it/marsstat/default.htm.

GIAMPIERO GENOVESE


Table of Contents

PREFACE
INTRODUCTION
1 OBJECTIVE OF THE QUANTITATIVE FORECAST LEVEL (3) OF THE MCYFS AND OVERVIEW OF THE APPROACH
    1.1 GOAL AND ASSUMPTIONS
    1.2 THE FORECASTING APPROACH
2 DATA INPUT FOR FORECASTING
    2.1 INPUT DATA
    2.2 NATIONAL STATISTICS
3 MCYFS FORECASTING METHODS
    3.1 BACKGROUND
    3.2 CURRENT SITUATION
    3.3 THE CGMS PREDICTION MODEL
    3.4 TREND ANALYSIS IN EXCEL AND SPSS
    3.5 OTHER PREDICTION MODELS
    3.6 SCENARIO-ANALYSIS IN SPSS
4 A CONTROL BOARD AS A TOOL TO GUIDE THE STATISTICAL DATA ANALYSIS
5 NEW FEATURES IN CGMS RELEASE 8.0
6 QUALITATIVE ASSESSMENT OF MARS PREDICTION (QUAMP) RESULTS
    6.1 INTRODUCTION
    6.2 METHODOLOGY OF FORECAST ERRORS EVALUATION METHODS
    6.3 OVERALL ERROR OF THE MARS CROP YIELD FORECASTING SYSTEM
    6.4 ANALYSIS OF THE SPATIAL DISTRIBUTION OF MCYFS ERROR
        6.4.1 Bias
        6.4.2 Error magnitude
    6.5 ANALYSIS OF THE EVOLUTION OF MCYFS ERROR OVER THE MONTHS
        6.5.1 Bias
        6.5.2 Error magnitude
    6.6 ANALYSIS OF THE EVOLUTION OF MCYFS ERROR OVER THE YEARS
        6.6.1 Bias
        6.6.2 Error magnitude
    6.7 CONCLUSIONS ON THE QUALITY EVALUATION OF THE MCYFS IN THE PERIOD 1996-2002
    6.8 EVALUATION OF THE IMPROVEMENT OF MCYFS FORECASTS
        6.8.1 Forecasting of Soft wheat
        6.8.2 Forecasting of Durum wheat
        6.8.3 Forecasting of Barley
        6.8.4 Forecasting of Grain maize
        6.8.5 Forecasting of Rape seed
        6.8.6 Forecasting of Potato
        6.8.7 Forecasting of Sugar beet
        6.8.8 Forecasting of Sunflower
7 REFERENCES
8 ANNEXES
    8.1 OVERVIEW OF THE SOFTWARE IN THE MCYFS
    8.2 CGMS TABLES FOR THE YIELD FORECAST PROCEDURE
    8.3 FLOW DIAGRAMS OF THE CGMS PROCEDURES
    8.4 MAPS OF THE COUNTRY MPES PER CROP
    8.5 MAPS OF THE COUNTRY MAPES PER CROP


List of Abbreviations

ABBREVIATION  MEANING
AFE           Absolute Forecast Error
APE           Absolute Percentage Error
AVHRR         Advanced Very High Resolution Radiometer, LR-sensor on board the NOAA satellites
CAP           Common Agricultural Policy of the European Union
CCC           Concordance Correlation Coefficient
CRIE          Cumulative Rate of Increase Error
CTE           Cumulative Trend Error
CGMS          Crop Growth Monitoring System, the combination of an agrometeorological crop growth simulation model (WOFOST), a database, and a yield prediction routine
CNDVI         Coverage based NDVI, RS-derived regional crop status indicator (Genovese et al., 1999)
DMP           Dry Matter Productivity (kgDM/ha/day), RS vegetation indicator derived with the Monteith approach
EAGGF         European Agricultural Guidance and Guarantee Fund, an important instrument in the CAP
EC            European Commission
EU            European Union
EUROSTAT      Statistical Office of the European Commission
FE            Forecast Error
GIS           Geographical Information System, software for storage of geographical data, mostly in vector format
GCM           Global Circulation Model
GMRE          Geometric Mean Relative Error
IPSC-JRC      Institute for the Protection and Security of the Citizen of JRC (until 1 Sep 2001 called SAI-JRC)
JRC           Joint Research Centre of the European Union at Ispra, Italy
MAFE          Mean Absolute Forecast Error
MAPE          Mean Absolute Percentage Error
MFE           Mean Forecast Error
MPE           Mean Percentage Error
MRE           Mean Relative Error
MARS          Monitoring Agriculture by Remote Sensing, EU-JRC programme started in 1988
MARS-CAP      MARS actions supporting the Common Agricultural Policy
MARS-FOOD     MARS actions supporting the European Food Aid and Food Security policy
MARSOP        Current project for the operation of part of the MCYFS activities (consortium of Alterra, MeteoConsult and VITO, sponsored by JRC)
MARS-STAT     MARS actions supporting the assessment of Agricultural Statistics
MCYFS         MARS Crop Yield Forecasting System, an important instrument of the MARS-STAT team to assess the crop yields
METAMP        Methodology Assessment of MARS Predictions, a study to document the MCYFS, compare this system with other systems and analyse possible improvements
MVC           Maximum Value Compositing, used with NDVI to create S1/S10/S30 syntheses (minimising clouds)
NDVI          Normalised Difference Vegetation Index, RS indicator for the amount of standing vegetation
NOAA          Series of near-polar satellites monitored by the US National Oceanic and Atmospheric Administration
NUTS          Nomenclature des Unités Territoriales Statistiques, numbering of EU regions used by EUROSTAT
PE            Percentage Error
PET           Potential EvapoTranspiration
QUAMP         Quantitative Assessment of MARS predictions, a MARS study project carried out in 2002-2003 by Agilis
RE            Relative Error
RIE           Rate of Increase Error
RIEP          Rate of Increase Error Percentage
RMSAFE        Root Mean Square Absolute Forecast Error (equivalent to RMSFE)
RMSAPE        Root Mean Square Absolute Percentage Error (equivalent to RMSPE)
RMSFE         Root Mean Square Forecast Error
RMSPE         Root Mean Square Percentage Error
RMSRE         Root Mean Square Relative Error
UAPE          Unbiased Absolute Percentage Error
UPE           Unbiased Percentage Error
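Several of the abbreviations listed here name forecast error measures whose formal definitions are given later in the volume. As a rough illustration only, the Python sketch below computes a few of them under conventional definitions. It assumes, which is not stated at this point of the text, that the forecast error is the forecast minus the observed yield and that percentage errors are expressed relative to the observed yield. The function name and the sample figures are hypothetical.

```python
import math


def forecast_error_summary(forecasts, observed):
    """Summarise paired yield forecasts and observations (hypothetical helper).

    Assumed conventions (not taken from the source text):
      FE = forecast - observed
      PE = 100 * FE / observed
    """
    if len(forecasts) != len(observed) or not forecasts:
        raise ValueError("need two non-empty sequences of equal length")

    fe = [f - o for f, o in zip(forecasts, observed)]      # forecast errors
    pe = [100.0 * e / o for e, o in zip(fe, observed)]     # percentage errors
    n = len(fe)

    return {
        "MFE": sum(fe) / n,                                # mean forecast error (bias)
        "MAFE": sum(abs(e) for e in fe) / n,               # mean absolute forecast error
        "RMSFE": math.sqrt(sum(e * e for e in fe) / n),    # root mean square forecast error
        "MPE": sum(pe) / n,                                # mean percentage error
        "MAPE": sum(abs(p) for p in pe) / n,               # mean absolute percentage error
        "RMSPE": math.sqrt(sum(p * p for p in pe) / n),    # root mean square percentage error
    }


# Hypothetical yields in t/ha for two seasons.
summary = forecast_error_summary([5.5, 4.0], [5.0, 4.0])
print(summary["MPE"], summary["MAPE"])
```

Signed measures such as MFE and MPE indicate bias, because over- and under-estimates cancel out, whereas the absolute and root-mean-square measures indicate error magnitude.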


INTRODUCTION

The MARS project and the MCYFS

Within the EC, DG AGRI is responsible for the implementation of the Common Agricultural Policy (CAP) regulations, for the evaluation of their consequences and for the control of the European Agricultural Guidance and Guarantee Fund (EAGGF). According to De Winne (1994), the collection of national and regional statistics on land use, land use changes and agricultural production is a prerequisite for this evaluation and control. Information on land use, inter-annual land use changes and yields is routinely collected by various national statistical services, which convey this information to the statistical office of the EC, EUROSTAT. However, the collection and compilation of these agricultural statistics is time consuming. In exceptional cases these statistics are available some months after the end of the season; generally, however, it takes one or even two years before this information is available in the EUROSTAT databases (Supit, 1999). Another problem is the heterogeneity of agricultural statistics at EU and country level. This causes difficulties in collecting, comparing and aggregating the data at Community level, which affects timeliness. The problem is accentuated by the EU enlargement process.

To assist DG AGRI and EUROSTAT in executing their tasks (i.e. EAGGF control, evaluation of the CAP effects on agriculture, collection of agricultural statistics), the Council of Ministers of the EU approved on 26th September 1988 a ten-year research and pilot project: the MARS project (Meyer-Roux and Vossen, 1994). Within the MARS Unit, the MARS-STAT activity concentrated on the assessment of crop yields and production volumes of various crops within the EU, based on meteorological analysis, agro-meteorological simulated crop growth indicators, low-resolution satellite data and statistical analysis, developing and using the MCYFS.

In the period 1988-1993, the MCYFS was designed and implemented along two different lines. The first (previously known as action 2) was a Europe-wide system to monitor crop status qualitatively and to provide warnings when abnormal growth conditions were observed, using data derived from the NOAA-AVHRR meteorological satellites. The most frequently used satellite-based indicators were vegetation indices such as the Normalised Difference Vegetation Index (NDVI). These indices could be applied as qualitative indicators of biomass development and consequently of crop yield (Meyer-Roux and Vossen, 1994).

The other line (previously known as action 3) was based on observations at the earth’s surface, using agrometeorological models and ground surveys. The main goal was to monitor weather and crop conditions in order to assess the effect of weather on crop conditions, and to make early yield and production estimates per country and/or large region. To this end, the agrometeorological crop growth simulation model WOFOST was combined with a GIS and a yield prediction routine to form the Crop Growth Monitoring System (CGMS). CGMS was therefore developed in the first stage of the MARS project. In a second stage, CGMS had to be refined and enlarged and then included in a larger concept, the MARS Crop Yield Forecasting System, which uses, amongst others, remote sensing information as input (Meyer-Roux and Vossen, 1994).

The System was declared operational in 1998, and the main operational activities (basically data purchase and acquisition, system running and maintenance of a web site) were outsourced through public tender procedures in 2000. The JRC (MARS STAT Action) is in charge of R&D, technological improvement, main output quality control, the agro-meteorological analyses, crop yield forecasts and the publication of the MARS bulletins.


1 OBJECTIVE OF THE QUANTITATIVE FORECAST LEVEL (3) OF THE MCYFS AND OVERVIEW OF THE APPROACH

1.1 GOAL AND ASSUMPTIONS

The objective of the quantitative forecast level (3) of the MCYF System is to provide the most likely, precise, accurate, scientific, traceable and independent forecasts of the main crops’ yields at EU level, taking into account the effect of the climate during the season, as early as possible during the cropping campaign (and until harvest). The main role of the MCYFS (level 3) is to provide yield statistics for the major crops at EU and national level, as accurately and as promptly as possible, while ensuring independence from all external sources of estimates, including the national statistical systems (Genovese, 1998).

To realise this objective, crop yield forecast procedures are applied which combine all kinds of input, such as historical yield statistics, weather indicators, simulated crop indicators, remote sensing based vegetation indices, additional information sources and expert knowledge. Time series of historical EUROSTAT yield statistics are an important data source in this procedure, which is mainly based on regressions. In this context the MCYFS assumes that the official yield statistics are objective and reflect the real situation. This is sometimes a matter of discussion, as the MCYFS can then be seen as an instrument to forecast “the official yield” rather than the actual crop yield. However, the system is designed so that forecasts can also be issued independently of the crop yield time series.

Vossen and Rijks (1995) stressed that, owing to unknown farming practices and uncertainty in the MCYFS input data, quantitative yield forecasts can never be valid for a specific locality. They can be valid or reliable for (very) large areas such as countries or large regions, provided the information and model outputs are first carefully weighted for the relative importance of soil types, groups of varieties, common farming practices, etc.
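The regression-based procedure outlined above can be illustrated with a minimal Python sketch. It is a two-step simplification written for this illustration and is not the CGMS prediction routine: a linear time trend is first fitted to the historical yields, and the trend residuals are then regressed on a single predictor (for instance a simulated crop indicator). All function names and figures are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def forecast_yield(years, yields, predictor, target_year, target_predictor):
    """Two-step forecast: time trend plus regression of residuals on a predictor.

    This is an illustrative assumption, not the method of the source text.
    """
    # Step 1: technological trend, approximated here by a straight line in time.
    a_trend, b_trend = fit_line(years, yields)
    residuals = [y - (a_trend + b_trend * t) for t, y in zip(years, yields)]

    # Step 2: explain the year-to-year deviations with the predictor.
    a_pred, b_pred = fit_line(predictor, residuals)

    trend_part = a_trend + b_trend * target_year
    weather_part = a_pred + b_pred * target_predictor
    return trend_part + weather_part


# Hypothetical national series: official yields in t/ha and a simulated indicator.
years = [2000, 2001, 2002, 2003, 2004]
indicator = [1.0, -1.0, 0.0, -1.0, 1.0]
yields = [5.0 + 0.1 * (t - 2000) + 0.5 * p for t, p in zip(years, indicator)]

print(forecast_yield(years, yields, indicator, 2005, 2.0))
```

In a sketch of this kind the forecast for the current season is obtained by extrapolating the trend to the target year and adding the deviation implied by the value of the predictor observed so far.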
Vossen and Rijks (1995) also stated that, in view of these limitations, forecasting methods have to be validated per country and/or large region. The main elements of the definition are explained below.

“Produce the most likely, precise, accurate, scientific, traceable and independent forecasts for the main crops”. To achieve this part of the objective, different statistical tools are used: trend analysis, regression analysis and scenario/similarity analysis. At the end of the process, several possible forecasts are available, and often more than one is “statistically” acceptable. The “best-performing result” is then identified and selected according to statistical tests (De Koning et al., 1993) on the models used and on the results of the scenario analysis.

Measurement error (the main cause of bias) is a constant concern for the MCYFS, as it could affect the results of the whole analysis. On the predictor side, this means that R&D is often devoted to improving the processing of the remote sensing signals, or to re-calibrating the CGMS on updated crop parameters, in order to obtain “predictors” that are as highly correlated as possible with the observed crop yield.

The concept of “scientific” includes the identification of a direct link between cause and effect in the models used


and the repeatability of the experiment. In fact, the system is designed so that the same model, run on the same data sets at any time, gives the same result. For that reason, whenever the configuration of the system changes (for instance, when a new parameter calibration is introduced in the model), the whole archive of outputs, such as the time series of “predictors”, is re-generated.

Forecasts in the system can originate from sources of different nature. For that reason the system systematically keeps track of the variables, methods and models on which the final forecast was based.

The system is thus designed to measure the current climate (level 1) and to transform its effects into crop biomass production (level 2). This part is “inductive”, as the initial observations of the climate are measurements taken on a sample of points (meteorological stations) at certain hours of the day. The remote sensing indicators, another component of the system and part of level 2, give a “measure” of the overall effect of the environmental conditions on the vegetation, without specifying which aspect of the climate is actually influencing the portion of vegetation observed. Here the process is “deductive”. The convergence of “evidence” from the two sides (bottom-up “inductive” from model simulation, top-down “deductive” from remote sensing measurements) is often considered, in the framework of crop forecasting, sufficient to draw conclusions on current crop behaviour and to justify the resulting forecasts with much reduced uncertainty. However, in this framework the measurement error is often neglected, which leads to underestimating the uncertainty that should be associated with the forecast. Improving the precision of the “measures” provided by the predictors is a constant effort of the MARS team in charge of the MCYFS.
More efforts are being made in order to control the uncertainty related to unpredictable future climate impact remaining from the moment of issuing the crop yield forecasts and the final harvests. One is the use the “ensembles” probabilistic output from weather circulation models (See Vol. 1) as initial input in the model and define an ensembles-probabilistic final predictors on which base the crop yield forecasts. “EU level” and the geographic dimension. The geographic dimension of the forecasts given by the MCYFS goes theoretically from the EMU (Elementary Mapping Unit) concept dimension (see Vol. 2 of this series) to the Continental dimension passing through the grid cells (50x50 km) (see Vol. 1 of this series), regions (NUTS2,1) and country levels. In theory the “predictors” production can be generated at whatever geographic level according to the layer available (catchments boundaries could also be used). For practical reasons linked to the management of the information and to the constraint of deriving the final forecast mainly re-calibrating the “predictors” on time series of observed yields, the system runs at Country level and then aggregates the results at EU level. In practice, while “predictors” are systematically produced and stored at grid, NUTS2, 1, 0 level, the quantitative final forecasts are given at National level and then aggregated at EU level weighting the results with the most recent crop area data available. The EU and National crop yield forecasts are made available to the DG-AGRI and EUROSTAT and published in the MARS bulletins. Taking into account the effect of climate during the season, as early as possible during the cropping campaign (until harvest). Concept of “predictors”. The main idea steering the MCYF System is that the climate can have a significant effect on crop yield, determining most of the interannual variability. The time lag of the forecast is normally one year ahead. 
It means that the system forecasts the crop yield at harvest during the current agriculture season. In the current conception of the System the climate so far observed will determine the crop yield at harvest. Here, two directions are developed; the first through the classic approach of regression modelling where at time “n” the forecast obtained by the model


selected implies a “normal” effect of climate from time “n” to the harvest; the second through scenarios, where at time “n” the crop yield forecast is determined according to the range given by the ensemble of climatically similar years (developed in section 3.6). However, one of today's R&D directions for improving the system is the introduction of climate forecasts over different time spans. The approach that uses ensemble climate forecast output from circulation models (see Vol. 1 of this series) seems one of the most promising. Current experiments investigate the use of forecasts at 10-day, monthly, seasonal and climatic (decade, century) time scales. The expected result could improve the timeliness of the forecasts and partially reduce the uncertainty of the system by treating its input/output flow in a probabilistic “ensemble” approach: at time “n” the crop yield forecast at harvest will include the effect of the most probable weather from time “n” to harvest as given by the climate forecasts.

The “predictors” are any variables which can be observed annually and related to crop yield at harvest time. A “predictor” can be meteorological (any parameter such as tmax, tmin, rain, radiation levels, PET), possibly regionalised (CMETEO, for instance, see Vol. 1); a crop-specific simulated parameter (for instance CGMS crop soil moisture, see Vol. 2); or derived from remote sensing (SPOT-Vgt NDVI and its regionalised version CNDVI, see Vol. 3). The trend extrapolation is also considered a predictor. The trend function and the period over which to calculate it are open to discussion, as they can heavily influence the final model.

In the process of finalizing a bulletin report the MCYFS analysts go through the following steps:
(A) meteorological impact evaluation (see Vol. 1)
(B) crop status assessment (see Vol. 2, Vol. 3)
(C) crop growth expectations (see all volumes)
(D) yield forecasts (Vol. 4)
The scheme is followed during the analysis and throughout the season. In steps A to C the “predictors” are generated, compared and analysed; in step D they are statistically evaluated. In the modelling process other medium- and long-term non-biological or non-geophysical factors that could influence the crop yield, such as a technological trend, are taken into account, basically by modelling the yield time trend, which includes the main factors of technological improvement. Hence the classic approach in the field of prediction, which holds that the past generates the future, is respected. The forecasts (or early estimates) are therefore always available, updated in near-real time and able to anticipate most other sources of information (Genovese G., 1998).
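The aggregation of national forecasts to EU level, weighted by the most recent crop area data, can be illustrated with a minimal sketch. The function name, the country codes and all figures below are hypothetical and serve only to show the arithmetic; the operational system may handle missing areas or revisions differently.

```python
def aggregate_yield(national):
    """Area-weighted mean yield (t/ha).

    `national` maps a country code to (forecast yield in t/ha, crop area in ha).
    Weighting by area is equivalent to dividing total production by total area.
    """
    total_area = sum(area for _, area in national.values())
    if total_area <= 0:
        raise ValueError("total crop area must be positive")
    production = sum(y * area for y, area in national.values())
    return production / total_area


# Hypothetical national forecasts and areas, for illustration only
forecasts = {
    "A": (7.0, 5_000_000),
    "B": (5.0, 3_000_000),
    "C": (3.0, 2_000_000),
}
eu_yield = aggregate_yield(forecasts)
print(f"aggregated yield: {eu_yield:.2f} t/ha")
```

Because the weights are areas, a country with a large planted area dominates the aggregated figure even when its yield forecast is less certain than that of smaller producers.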

1.2 THE FORECASTING APPROACH

The crop yield forecast procedure produces yield forecasts in ton.ha-1 fresh weight using different methods and software tools. The philosophy of the approach is outlined in the following:

$y = f_1(\text{trend})\ \text{and/or}\ f_2(\text{crop simulation})\ \text{and/or}\ f_3(\text{meteorological data})\ \text{and/or}\ f_4(\text{satellite data})\ \text{and/or}\ f_{\ldots}(\text{other input}) + e$ [1]

where $e$ is the residual error. The forecast is updated until harvest.


The functions f are explained in the other volumes of this series. Of course the linearity is here just a simplification. This approach can generate a large number of candidate forecasts, only some of which perform well statistically and are acceptable, and only a few of which respect a direct logical link between cause (climate impact) and consequence (crop yield). This is why within the MCYFS some priorities are followed in selecting the final model which will give the forecasts.

1) As a first step, yields are predicted by the CGMS statistical sub-system, which uses agricultural yield statistics and simulated crop indicators. This system consists of a linear regression model combining the mean yield, a linear time trend and a linear regression function to explain the residual variation (Vossen, 1990b; 1992). The linear time trend represents the influence on yields of long-term economic and technological dynamics such as increased fertiliser application, improved crop management methods, new high yielding varieties, etc. The residual variation is modelled as a function of crop growth simulation results (potential or water limited dry weight of the simulated biomass or storage organs). As said before, these simulated crop indicators account for the inter-annual yield variation that results from weather variability.

2) When for a certain combination of country and crop the accuracy of the predicted yield is deemed insufficient, the MARS analysts at JRC start to redefine trend periods and functions. This part must not be underestimated, as the trend reference can have a serious effect on the final forecasting model selected.

3) On some occasions the analysts build their own prediction models for certain combinations of crops and countries. These models use other crop indicators such as simulated leaf area index and CNDVI. At the moment this part is not automated and is thus time consuming. In future, these indicators and new indicators such as CMETEO (see Vol. 1 of this series) and Ta/Tp (see Vol. 2 of this series) will be added to the automated procedure of the statistical sub-system of the CGMS.

4) To deal with the uncertainty given by the unknown evolution of the season from the moment the forecast is issued to the moment the crop is harvested, agro-meteorological scenarios can be produced and analysed. The scenarios are currently based on agro-meteorologically similar years as detected by cluster and factorial analyses (Principal Component Analysis (PCA) techniques). Once agro-climatic years similar to the current year are detected, the resulting range of final yields of these years can be attributed to the current season. The extreme maximum and minimum yields obtained in the clusters of similar years are always characterised through the factorial analysis. If a trend exists, the range of final yields is corrected for this trend before the yields are related to the current year. This technique helps to understand how the yield prediction could still change before harvesting.

5) As a last attempt to determine the yield forecast, and with a view to being able to release a forecast at EU level by aggregating national forecasts, a trimmed average is calculated: the yield statistics of the last five years are taken and ranked, and the trimmed average is the average of the three centre years. This trimmed average is used as the yield forecast when all other methods (prediction models such as CGMS level 3, trend analysis, scenario analysis) do not lead to satisfactory results, or when there is a lack of data or a gap in the time series.
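The trimmed average of step 5 is simple enough to show directly. The sketch below follows the description in the text (five most recent yields, ranked, mean of the three central values); the function name and the sample yields are hypothetical.

```python
def trimmed_average(last_five_yields):
    """Mean of the three central values of the last five yearly yields (t/ha)."""
    if len(last_five_yields) != 5:
        raise ValueError("exactly five yearly yields are expected")
    ranked = sorted(last_five_yields)
    centre = ranked[1:4]  # drop the lowest and the highest year
    return sum(centre) / len(centre)


# Hypothetical yields of the last five years, in t/ha
recent = [5.2, 4.1, 6.0, 5.5, 4.9]
print(round(trimmed_average(recent), 2))
```

Dropping the lowest and the highest year makes the fallback forecast insensitive to a single exceptional season, which is the reason for preferring it to a plain five-year mean.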


At the moment human analysis, based on accumulated experience and knowledge of the phenomena involved, carries considerable weight in deciding which model best expresses the final yield. For instance, a low simulated yield scenario (scenario analysis) could be accepted as the forecast if there are clear signs of negative factors which are not modelled in CGMS (e.g. heat stress during final grain maturity, heavy rain at harvest, but also a higher than normal risk of pests and diseases, severe drought effects, etc.). The final choice is made empirically, trying to optimise the pattern of reliability given by all sources. The optimisation of this pattern is an objective function of the MCYFS (Genovese, 1998). Decisions and final choices are documented through a system called COBO (Control Board).

Fig. 1.1 Flow chart of prioritized steps for the choice of the final yield forecast within the MCYFS.

[Flow chart boxes: Crop Indicator Database (NUTS); National yield statistics; Remote sensing (CNDVI); Knowledge base for data interpretation; CGMS prediction model including linear trend; optional: trend analysis (SPSS, Excel); optional: other prediction models (SPSS); optional: scenario analysis (SPSS); optional: trimmed average (Excel); Yield forecast. Each step is followed by an “ok? yes/no” decision.]


The final predicted yields for each country, as established by the MARS analysts, are published in the MARS bulletin. The yield forecasts that are produced automatically by the statistical sub-system of the CGMS are available on the internet (http://www.marsop.info). The site shows how the yield forecasts of the CGMS develop throughout the growing season. According to a previous analysis (Genovese, 1998) the desired level of accuracy is reached when the error is lower than 3% at national level. The level remains appreciable if the error is lower than 6%. If the error is greater than 6% the yield forecast can be considered unacceptable, with one exception: if the error is still lower than that of any other source available. In that case the MCYFS still produces an improvement which could be economically relevant. For winter crops the yield forecast should be accurate before June, for summer crops before August.

A global evaluation of the system performance was made in 1998 using data until 1997 (Genovese, 1998). The RMSE was used as the error indicator, comparing the MCYFS forecasts with the ex-post observed results. The average yield forecast error ranges from about 3% to 5% at EU level for the main crops (8.3% for durum wheat). In absolute terms the average error for wheat ranges between about 2 and 4 quintals ha-1 at both European and national levels (Genovese, 2001). In general, the system gives higher errors at the beginning of the season and lower errors at the end, in line with the cumulative effect of the climate on crop behaviour. The error analysis is repeated each year in order to monitor the performance of the MCYFS carefully, identify sources of error and decide which part of the MCYFS has to be improved. The MARS Unit launched a study called QUAMP (Quality Assessment of MARS Predictions) to calculate the overall performance of the system including the period 1998-2001. The executive summary and results are given in Chap. 6.
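The accuracy thresholds and the RMSE indicator described above can be written down as a short sketch. The function names and class labels are hypothetical; the text does not say how an error of exactly 3% or 6% is classified, so the boundary handling below is an assumption.

```python
import math


def relative_error_pct(forecast, observed):
    """Absolute forecast error as a percentage of the observed yield."""
    return abs(forecast - observed) / observed * 100.0


def accuracy_class(error_pct, best_other_source_error_pct=None):
    """Classify a national forecast error against the 3% and 6% thresholds.

    Assumption: errors of exactly 3% or 6% fall into the worse class.
    """
    if error_pct < 3.0:
        return "desired"
    if error_pct < 6.0:
        return "appreciable"
    # Above 6% the forecast is still useful if it beats every other source.
    if best_other_source_error_pct is not None and error_pct < best_other_source_error_pct:
        return "improvement over other sources"
    return "unacceptable"


def rmse(forecasts, observations):
    """Root mean square error between forecasts and ex-post observed yields."""
    if len(forecasts) != len(observations) or not forecasts:
        raise ValueError("need two non-empty series of equal length")
    squared = [(f - o) ** 2 for f, o in zip(forecasts, observations)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical forecast of 5.2 t/ha against an observed 5.0 t/ha
print(accuracy_class(relative_error_pct(5.2, 5.0)))
```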


2 DATA INPUT FOR FORECASTING

2.1 Input Data

The data for the yield forecast procedure were extensively discussed in the other volumes of this series and comprise the following:
• Weather indicators (see Vol. 1).
• Crop indicators (see Vol. 2).
• Remote sensing based vegetation indices (see Vol. 3).
• National yield statistics (see this chapter).
• Additional sources (see this chapter).

2.2 National Statistics

There are two main sources of agricultural statistics including yield, planted area and production. The first source contains the regional agricultural statistics at NUTS level 2. This database is known as the REGIO database of the Statistical Office of the European Community (EUROSTAT) and starts in 1975. The second source is the CRONOS database of EUROSTAT, which contains the national agricultural statistics, starting in 1955 (Vossen and Rijks, 1995). In the CGMS the REGIO data are stored in the table REGIO and the CRONOS data in the table CRONOS. The contents of these tables are combined in the table EUROSTAT. The CRONOS data of this table, which refer to NUTS level 0, are used in the yield forecast procedure. In the operational services of the MCYFS these CRONOS data are updated frequently. The acreage statistics are taken directly from the tables REGIO, RU_EUROSTAT and KUL_AREA_STATISTICS and are needed for the aggregation of simulated crop indicators from NUTS level 2 to NUTS level 1 or 0 (see Vol. 2 of this series).

[Diagram boxes: National Statistical Services of all EU member states; Updating; REGIO database (NUTS level 2); CRONOS database (NUTS level 0); EUROSTAT database; CRONOS data; CGMS Prediction Model.]


Statistics on planted area, yield and production volume are collected by EUROSTAT from the national statistical services of all EU member states. Within the EU, no single Community system to establish these statistics exists: the methods applied vary from country to country. Through article 3 of CAP regulation 837/90, the Commission attempts to harmonise these methods and to stimulate the use of scientific procedures. This regulation prescribes, amongst other things, that data on planted area, yield and production volume for all significant crops shall be obtained by censuses or representative sample surveys. The heterogeneity of agricultural statistics at EU and country level causes difficulties in collecting, comparing and aggregating the data at Community level, which affects timeliness. Bradbury (1994) investigated the methods applied to establish statistics for cereals in various EU member states. The author concluded that ‘most member states attempt to estimate sampling errors, and usually manage to show that the margins are close enough to those set out in regulation 837/90, but with greater or lesser amount of convincing detail. For judgmental assessment of yield/production (and for Greece, of area as well) no fully satisfactory methods to establish the estimating error are available.’


3 MCYFS FORECASTING METHODS

3.1 Background

Various authors have proposed to subdivide crop yield in three components: mean yield, multi-annual trend and residual variation (e.g. Vossen, 1989; Dagnelie et al., 1983; Dennet et al., 1980; Odumodu and Griffits, 1980). It is assumed that the interacting effects of climate, soil, management, technology, etc. determine the mean yield. Observed national, regional and sub-regional yields show a trend in time. The trend is mainly due to long-term economic and technological dynamics such as increased fertiliser application, improved crop management methods, new high yielding varieties, etc. The third component, the residual variation, is considered to be the variation among years (Dennet et al., 1980). It is exactly this part which should be explained by weather, crop and remote sensing indicators. According to Dennet et al. (1980) and Odumodu and Griffits (1980), the technological time trend should be removed from the crop yield time series, assuming that the residual variation is independent of that trend. This approach can be summarised as (Vossen, 1989):

$Y_T = \bar{Y} + f(T) + e$ (3-1)

$\hat{Y}_T = \bar{Y} + f(T)$ (3-2)

$Y_T - \hat{Y}_T = e$ (3-3)

$e = f(\text{weather})$ (3-4)

where:
$Y_T$ : observed yield in year T [ton.ha-1]
$\bar{Y}$ : mean yield [ton.ha-1]
$f(T)$ : technological trend as a function of time [ton.ha-1]
$e$ : residual, not explained by trend [ton.ha-1]
$\hat{Y}_T$ : estimated yield in year T using a trend function [ton.ha-1]
$f(\text{weather})$ : function of weather variables (e.g. 10-day rainfall etc.) [ton.ha-1]

Palm and Dagnelie (1993) fitted various time trend functions to national yield series (ton.ha-1) of several crops for 9 EU member states. Regressions were executed for the period prior to 1983 and a forecast for 1983 was made. This procedure was repeated for successive years up till 1988. The prediction results were compared with national yield values. Of the tested functions a quadratic function of time performed best. However, differences with a simple linear trend function were small. In a next step, these authors removed the trend from the yield series using the quadratic function. The residuals for the period prior to 1983 were regressed against various meteorological parameters and a prediction for 1983 was made. Again, this procedure was repeated for successive years up till 1988. This was done for 19 Departments in France.


Comparing the predicted and official yield series demonstrated that the applied meteorological variables did not improve the prediction accuracy. Swanson and Nyankori (1979) for corn and soybean production in the USA, Sakamoto (1978) for wheat production in South Australia, and Agrawal and Jain (1982) for rice yields in the Raipur District in India considered the technological time trend to be dependent on the residual variation. According to Winter and Musick (1993), Hough (1990b) and Smith (1975), weather affects farm management practices such as planted area, timing of field operations, application of inputs, etc. Hence, the time trend should be analysed simultaneously with the explanatory variables. This approach can be summarised as (Vossen, 1989):

$Y_T = b_0 + f(T) + f(\text{weather}) + e$ (3-5)

where $b_0$ is the ‘theoretical’ yield in the absence of a trend and weather influences. Swanson and Nyankori (1979) showed that the time trend was underestimated when weather data were not analysed simultaneously with the time trend. Similar results were found for millet in Botswana (Vossen, 1989). Equation 3-5 does not account for the interaction between crop growth and weather variability. Root characteristics and soil physical properties are not accounted for either. Therefore Vossen (1990b, 1992) proposed to use crop growth simulation results to describe year-to-year yield variation. In a crop growth simulation model weather and soil characteristics are summarised and crop characteristics, including yield, form the output, i.e. simulation results quantitatively represent the influence of weather variables on crop growth. The yield can be written as:

$Y_T = b_0 + f(T) + f(\text{simulation}) + e$ (3-6)

where $f(\text{simulation})$ is a function of crop growth simulation results that accounts for the yield variation that results from weather variability.

3.2 Current Situation

In the MCYFS yield forecasts in ton.ha-1 fresh weight are produced using different methods and software tools. Fig. 1.1 shows the various steps in the analysis to derive yield forecasts and forms the reading guide for the following paragraphs. The philosophy of the yield forecasting approach is outlined in the following formula:

$Y_T = f(T\ \text{and/or}\ MET\ \text{and/or}\ SIM\ \text{and/or}\ RS) + e$ (3-7)

where:
$Y_T$ : observed yield in year T [ton.ha-1]
$f(T)$ : technological trend as a function of time [ton.ha-1]
$f(MET)$ : function of weather indicators [ton.ha-1]
$f(SIM)$ : function of simulated crop indicators [ton.ha-1]
$f(RS)$ : function of remotely sensed indicators [ton.ha-1]
$e$ : residual, not explained by trend and functions [ton.ha-1]


This approach relies on the flexibility of adding or excluding, during the year, factors that improve the forecasts. The factors can be complementary in some parts of the year; in other parts they can be considered less certain. Each of the elements represents a self-contained model able to produce a yield forecast, but with a different degree of accuracy or, if this is not measurable, of reliability. The first predictions of production volumes are based on the extrapolation of historical time series of crop yield and planted area. Crop yield predictions are refined in the course of the year, from early indicators based on provisional data to final results. Information provided by various sources is analysed and combined to improve the initial predictions. The GIS, relational databases and statistical software are used intensively to manage all information stored in the MARS project archives. While the forecasts are being established, the connection with knowledge of the structure of the territory (morphology, pedology, crop practices in use, etc.) is maintained. For instance, the crop phenology simulation makes it possible to check how the climatic evolution influences the crop development. In case of a negative influence, probable yield losses should be quantified. Existing literature can indicate the probability of yield losses for the crop investigated, considering the problem at that time and the constraints of the territory (Genovese, 1998).

The priority scheme for the forecasting procedure is as follows:
• use the CGMS based forecast (see 3.3); if the results do not perform well, then:
• evaluate the trend alone (3.4); if it does not perform well:
• try other prediction models (3.5); if they do not perform well:
• evaluate scenarios (3.6); if they do not perform well:
• use a trimmed average (3.7) as a last resort.

The scenario analysis is in fact run systematically, and the extreme values obtained are used to generate the two extreme scenarios, pessimistic and optimistic, of final crop yield. This information is evaluated especially in the early part of the season (see 3.6 for further elements).

3.3 The CGMS prediction model

Official statistics of regional mean yields are predicted by the CGMS using one of the following simulated predictors (see Vol. 2 for a detailed description of the predictors):
• Potential dry weight of the simulated biomass (ton.ha-1).
• Water limited dry weight of the simulated biomass (ton.ha-1).
• Potential dry weight of the simulated storage organs (ton.ha-1).
• Water limited dry weight of the simulated storage organs (ton.ha-1).
Originally, it was intended to predict yields solely using the water limited weight of storage organs in the prediction model. Later on, the other three were added. Water limited yield, for instance, is inappropriate for a region with a lot of irrigation. Furthermore, drought stress can be strongly reduced where groundwater has an influence; this factor is not included in the CGMS. The simulated biomass indicators were added because they are more robust and less sensitive to modelling errors in the distribution of assimilates. Moreover, they also allow yield prediction during the growing season, when grain filling has not yet started or grains are still very small (de Koning et al., 1993). Today, within the new releases of CGMS (release 8), all of the crop parameters can be used as predictors, including Leaf Area Index, Soil Moisture and Development Stage. Other indicators will be implemented soon, such as the ratio between the estimated actual crop transpiration and the potential crop transpiration (Ta/Tp, see Vol. 2 for a description).


The statistical sub system of the CGMS uses a combination of a linear time trend and crop growth simulation results as proposed by Vossen (1990b, 1992). This prediction model can be described as:

$\hat{Y}_T = b_0 + b_1 T + b_2 S_T$ (3-8)

where $\hat{Y}_T$ and $S_T$ are the estimated yield and the simulation result or predictor (ton.ha-1), respectively, in year T, and $b_0$, $b_1$ and $b_2$ are regression constants. Constant $b_0$ represents the average official statistical yield (ton.ha-1) and constant $b_1$ is the yearly increase of the official yield (ton.ha-1). Sub-optimal production circumstances such as drought, low temperatures, etc. are allowed for by the constant $b_2$, which should lie between 0 and 1. Per region, the regression coefficients are established for a moving window of at least 9 years and subsequently used for the yield prediction of the 10th year (‘one-year-ahead’).

The predictor used to forecast the final yield is selected as follows. Each candidate predictor is fitted to the data currently available for the region. Candidates with a negative estimate of $b_2$ are rejected because of the nature of the process. From the remaining ones, the candidate with the lowest jackknife mean square error is selected. Jackknife errors are calculated by simulating that an observation is absent and that the predictor is used to assess its value; this reveals the error in predicting the observation which had been kept out of sight. Obviously, jackknife errors are not entirely relevant in the present situation, where we want to predict the future rather than reconstruct the past. For direct application it is more relevant to investigate the one-year-ahead prediction. Still, the jackknife method is used because the jackknife error-size estimates are less variable, being based on a larger number of predictions. With the same number of observations ‘n’ the jackknife method has ‘n’ error estimates, while the ‘one-year-ahead’ prediction has only ‘n-y’ error estimates, where ‘y’ is the number of years on which the prediction is based. More detailed descriptions are given by de Koning et al. (1993) and Jansen (1995).

A quadratic trend function is also considered in the CGMS. However, based on the results of Palm and Dagnelie (1993) and de Koning et al. (1993), it was concluded that a linear trend sufficiently describes the increasing official yields. A smooth trend of any type over a large number of years assumes a continuity which might be unrealistic (de Koning et al., 1993; Vossen, 1992; 1990a). According to Vossen and Rijks (1995) the predictor should only be based on data from the recent past. The series should nevertheless be long enough to give a sufficient number of degrees of freedom in the regression analysis. A gradual shift in the time trend is allowed for by the shortness of the time series used to derive the predictor.

The statistical sub-system of the CGMS version 2.3 has recently been redesigned in S-PLUS 2000 Professional (Kuyper, 2001) as a separate module. In CGMS version 8.0 the level 3 module has been re-implemented in the Delphi language, extended in functionality and re-included in the system. The yield forecasts are updated every ten days and the results (yield forecasts and regression parameters) are written to the tables FORECASTED_NUTS_YIELD and REG_PARAM. Before the growing season starts, yield forecasts are already produced based on the long term average, corrected for a technological trend. Within the S-PLUS application the MARS analyst can change the length of the time series. This re-defines the trend function and results in different CGMS level 3 forecasts.
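The fit of equation 3-8 and the jackknife-based choice among candidate predictors can be sketched as follows. This is a minimal illustration under stated assumptions, not the CGMS implementation: all function names and data are hypothetical, T is taken as a small year index (1 to 9) rather than a calendar year to keep the normal equations well conditioned, and only the rejection of negative $b_2$ is applied (the upper bound of 1 mentioned in the text is not enforced).

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("regressors are collinear; cannot fit")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit(years, sim, yields):
    """Least-squares estimates (b0, b1, b2) of Y = b0 + b1*T + b2*S."""
    rows = [(1.0, float(t), float(s)) for t, s in zip(years, sim)]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, yields)) for i in range(3)]
    return solve(xtx, xty)


def predict(coef, year, s):
    b0, b1, b2 = coef
    return b0 + b1 * year + b2 * s


def jackknife_mse(years, sim, yields):
    """Leave each year out in turn, refit, and predict the year left out."""
    errors = []
    for k in range(len(years)):
        keep = [i for i in range(len(years)) if i != k]
        coef = fit([years[i] for i in keep], [sim[i] for i in keep],
                   [yields[i] for i in keep])
        errors.append((predict(coef, years[k], sim[k]) - yields[k]) ** 2)
    return sum(errors) / len(errors)


def select_predictor(years, yields, candidates):
    """Return (name, jackknife MSE) of the best candidate with b2 >= 0."""
    best = None
    for name, sim in candidates.items():
        if fit(years, sim, yields)[2] < 0:
            continue  # a negative b2 contradicts the nature of the process
        mse = jackknife_mse(years, sim, yields)
        if best is None or mse < best[1]:
            best = (name, mse)
    return best


# Hypothetical 9-year window; yields are generated from known coefficients
T = list(range(1, 10))
storage = [6.0, 7.5, 5.0, 8.0, 6.5, 9.0, 7.0, 8.5, 6.0]
biomass = [12.0, 13.0, 11.0, 15.0, 12.0, 16.0, 14.0, 15.0, 13.0]
official = [4.0 + 0.1 * t + 0.5 * s for t, s in zip(T, storage)]
choice = select_predictor(T, official, {"storage organs": storage,
                                        "biomass": biomass})
coef = fit(T, storage, official)
print(choice[0], round(predict(coef, 10, 7.0), 2))  # one-year-ahead forecast
```

The last line applies the coefficients of the 9-year window to a hypothetical simulated value for the 10th year, which is the ‘one-year-ahead’ use described in the text.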


The crops for which forecasts are determined are listed in Table 3.1 and the table STAT_CROP. The statistics have a wider range of crops than the ones considered by the CGMS. Therefore yields of some of the ‘statistical’ crops are forecasted using the same ‘CGMS-crop’.

Table 3.1: The crops for which forecasts are determined in the statistical module of the CGMS.

Crop number    Crop name         Crop number    Crop name
(statistics)   (statistics)      (CGMS)         (CGMS)
1              wheat             1              wheat
2              soft wheat        1              wheat
3              durum wheat       1              wheat
4              barley            1              wheat
5              winter barley     1              wheat
6              spring barley     3              barley
7              grain maize       2              grain maize
11             potato            7              potato
12             sugar beets       6              sugar beet
13             oil seed rape     10             oil seed rape
14             turnips (rape)    10             oil seed rape
16             sunflower         11             sunflower

Figure 3.1: An example of the evolution of the yield forecast in the course of the growing season as given by the automatic level 3 of CGMS. These results are systematically available on www.marsop.info and are updated regularly.

3.4 Trend analysis in Excel and SPSS

When for a certain combination of country and crop the accuracy is deemed insufficient, the MARS analysts start to redefine trend periods and functions using Excel. First, trends for a longer period (1975 until the current year) are determined if yield statistics for such a period are available. Next, trends for more recent periods are studied. For Eastern Europe the period after 1990 is taken, to exclude the sharp changes caused by the political changes around 1990. For countries within the European Union the period after 1992 is important because in 1992 the Common Agricultural Policy went through important changes that affected yield and planted areas. Besides the analysis of the trend period, different trend functions are studied in Excel and SPSS. Yield statistics of each country are taken directly from the CRONOS database, which is updated each month. Linear, quadratic and other types of trend are studied. MARS analysts also study the minimum and maximum trend evolution by separating the data set into two groups representing the 50% highest and 50% lowest values.

[Figure 3.2 plot: yield (t/ha, axis from 1.00 to 4.00) against year (1991 to 2000), showing the linear trend together with the maximum and minimum trend lines.]

Figure 3.2: Maximum and minimum trend evolution for wheat in Rumania based on the period 1991-2000.
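The maximum and minimum trend evolution described in this section can be sketched as follows. The text does not say how the 50% highest and 50% lowest values are selected; the sketch assumes they are the observations lying above and below the median residual of an overall linear trend, and fits a separate straight line to each half. Function names are hypothetical, and the analysts' actual work was done in Excel and SPSS.

```python
from statistics import mean, median


def linear_fit(x, y):
    """Ordinary least-squares line y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def min_max_trends(years, yields):
    """Return ((a_max, b_max), (a_min, b_min)) for the upper and lower halves.

    Assumption: a year belongs to the upper half when its residual from the
    overall linear trend is at or above the median residual.
    """
    a, b = linear_fit(years, yields)
    residuals = [y - (a + b * t) for t, y in zip(years, yields)]
    cut = median(residuals)
    upper = [(t, y) for t, y, r in zip(years, yields, residuals) if r >= cut]
    lower = [(t, y) for t, y, r in zip(years, yields, residuals) if r < cut]
    fit_upper = linear_fit([t for t, _ in upper], [y for _, y in upper])
    fit_lower = linear_fit([t for t, _ in lower], [y for _, y in lower])
    return fit_upper, fit_lower
```

Each half needs at least two distinct years for its line to be defined, which limits how short the analysed period can be.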

3.5 Other prediction models

On some occasions the MARS analyst at JRC builds his/her own prediction models for certain combinations of crops and countries. These models use other parameters of the CGMS (e.g. leaf area index), CMETEO (see vol.4 of this series) and CNDVI (see vol.3 of this series), and are built in SPSS. In future these indicators will be included in level 3 of CGMS (release 8.0).

3.6 Scenario-analysis in SPSS

To deal with the residual uncertainty due to the unknown evolution of the season between the moment the forecast is issued and the moment the crop is harvested, agro-meteorological scenarios can be produced and analysed. The scenario analysis consists in finding the most similar agro-meteorological years on the basis of the time series of parameters simulated by the CGMS. The analysis is based on Principal Component Analysis (PCA), Factor Analysis and Cluster Analysis (Hair et al., 1998). The crop indicators of the CGMS for all available years are used as input (see Vol. 3 of this series). It is stressed that the similarities are established on the time series of agro-meteorological parameters: years that are similar in climatology are not necessarily similar in crop response, since small changes in the sequence of meteorological events can have a major effect on crop behaviour. This is why the approach is run directly on the crop parameters. The PCA gives new, independent combinations of the variables (factors). The first factors, explaining up to 90% of the variability, are selected, and pairs of factor axes are analysed by plotting the original variables on them. The units (the yearly observations) are then plotted on the new factors to characterise the years (for instance a dry and hot season).


This is repeated for each country and at crop level (recall that the original variables are the crop growth parameters simulated by the CGMS). The analyst then launches a cluster analysis on the new factors (normally a hierarchical cluster analysis), obtaining groups of homogeneous years according to the factors. Similarity or dissimilarity matrices help to rank the years by similarity. Once the similarity scores and the resulting hierarchy are available, the forecast is obtained as a weighted average of the corresponding yields (de-trended where necessary), with weights given by the similarity indexes. Simple statistics of the cluster of similar years are also used: within the group of similar years, the maximum and minimum yields are used as the optimistic and pessimistic yield scenarios.

The routine used in SPSS is the following:

    FACTOR
      /VARIABLES ds sm wlai plai wb pb twc twr
      /MISSING LISTWISE
      /ANALYSIS ds sm wlai plai wb pb twc twr
      /PRINT UNIVARIATE INITIAL CORRELATION KMO EXTRACTION
      /PLOT EIGEN ROTATION
      /CRITERIA FACTORS(2) ITERATE(25)
      /EXTRACTION PC
      /ROTATION NOROTATE
      /SAVE REG(ALL)
      /METHOD=CORRELATION .
    GRAPH
      /SCATTERPLOT(BIVAR)=fac1_1 WITH fac2_1 BY year (NAME)
      /MISSING=LISTWISE .
    CLUSTER fac1_1 fac2_1
      /METHOD WARD
      /MEASURE= SEUCLID
      /ID=year
      /PRINT NONE
      /PRINT DISTANCE
      /PLOT DENDROGRAM .

In this example ds, sm, wlai, plai, wb, pb, twc and twr stand respectively for development stage, soil moisture, water limited leaf area index, potential leaf area index, water limited biomass, potential biomass, total water consumption and total water requirements, and are the parameters simulated by CGMS. For a fixed dekade, crop and country/region/grid, the initial data are the CGMS simulations per year (the years are the units).

Step 1) In this example we extract the main variables simulated for a given crop and country, and with a FACTOR analysis we reduce them to a few variables explaining about 90% of the variability (2 in the example).
As an alternative to fixing a dekade, the procedure can be run on several dekades, in which case the number of variables can increase substantially.

Step 2) We then obtain (GRAPH) plots of the original units (years) on the new axes. This characterises the current season in terms of its impact on crop growth, i.e. wet and cold; wet and hot; dry and cold; dry and hot.

Step 3) The third step (CLUSTER) is then used to look for the similar years, as the plot of the factor analysis may not be sufficient to find them. The cluster algorithm is here based on Ward's method with the squared Euclidean distance as measure (METHOD WARD and MEASURE=SEUCLID in the routine above). The similar years are determined by looking at the dissimilarity coefficients produced in the distance analysis. These coefficients are used in two ways: first, to detect the ten most similar years (or those whose dissimilarity is below a defined threshold); second, as weights to define a prediction.

Step 4) The fourth step is the prediction derived from the similar years (this is not in the routine above). The pairs (year, yield) belonging to the group of similar years are considered. They determine a range of yields and an average (the minimum and maximum can be used as scenario min and scenario max, and are explained by the characterisation given by the factor analysis). The prediction is then obtained either as the simple average or (better) as a weighted average in which the weights come from the dissimilarity coefficients. If a trend is present, all the steps are in fact run on the deviations from the trend (so the choice of trend model affects all of the results).

Here follow some examples:

1 - Germany soft wheat scenarios in 2003, made during the second dekade of May using as input all dekades of soil moisture and development stage values (years analysed from 1975 to 2003):

The scree plot on the left shows the eigenvalues of the factor analysis run on 20 variables. The first two corresponding axes (the most explanatory) are shown above on the right. One can note that the first quadrant is correlated with development stage (the development stage variables are all concentrated here), a direct expression of the influence of temperature on crops. The y axis is explained by the crop soil moisture in April. Looking at the axes counter-clockwise, the north-east direction in the first quadrant corresponds to the hottest and most humid years (in terms of effect on crops), the north-west direction in the second quadrant to the cold and humid years, the south-west direction in the third quadrant to the cold and dry years, and the south-east direction in the fourth quadrant to the hot and dry years. The graph below shows the position of the years on the new axes obtained (first two):


In this example one can note that 2003 was at that time not far from the origin, appearing as a slightly dry and cold year in May. However, the position of 2003 in the new system of co-ordinates was opposite to that of 2002 (a year characterised by very high precipitation).

2 - Spain soft wheat scenarios in 2000, made during March using as input all dekades of soil moisture and development stage values (years analysed from 1975 to 2000): In this example all the variables are analysed in the same dekade. The variables are Development Stage (DS), Soil Moisture (SM), Potential Biomass (PB), Potential Storage Organs (PS), Water Limited Storage Organs (WS), Potential Leaf Area Index (PLAI) and Water Limited Leaf Area Index (WLAI). The difference between potential and water limited indicators is explained in Vol. 2 of this series.

The factor analysis gave the following results that show that the first two components explain almost 90% of the variability.


and this is the corresponding plot and the contribution of each variable to the final variability:

Below are the corresponding plots of the variables on the first two axes and of the units (years). The year 2000 was placed among the normal years at that time. It should be stressed that the fourth quadrant of the last chart can be read as the area of the dry and hot years (years of drought), which are already well characterised in March.

Total Variance Explained (SPSS output; extraction method: Principal Component Analysis)

| Component | Initial eigenvalues: total | Initial eigenvalues: % of variance | Initial eigenvalues: cumulative % | Extraction sums of squared loadings: total | Extraction: % of variance | Extraction: cumulative % |
|---|---|---|---|---|---|---|
| 1 | 6.190 | 77.373 | 77.373 | 6.190 | 77.373 | 77.373 |
| 2 | .987 | 12.336 | 89.709 | .987 | 12.336 | 89.709 |
| 3 | .702 | 8.774 | 98.483 | | | |
| 4 | 6.297E-02 | .787 | 99.270 | | | |
| 5 | 4.665E-02 | .583 | 99.853 | | | |
| 6 | 9.297E-03 | .116 | 99.969 | | | |
| 7 | 2.371E-03 | 2.963E-02 | 99.999 | | | |
| 8 | 7.247E-05 | 9.059E-04 | 100.000 | | | |

[Scree plot: eigenvalue (axis from 0 to 7) against component number (1 to 8).]

Component Matrix (SPSS output; extraction method: Principal Component Analysis; 2 components extracted)

| Variable | Component 1 | Component 2 |
|---|---|---|
| DS | .905 | -.296 |
| PB | .994 | -7.13E-02 |
| PLAI | .934 | .125 |
| PS | .914 | -.141 |
| SM | -.599 | .730 |
| WB | .980 | .176 |
| WLAI | .796 | .447 |
| WS | .850 | .308 |

[Component plot: position of the variables ds, pb, plai, ps, sm, wb, wlai and ws on component 1 (horizontal axis, -1.0 to 1.0) and component 2 (vertical axis, -1.0 to 1.0).]

[Score plot: REGR factor score 1 for analysis 1 (horizontal axis, -2 to 3) against REGR factor score 2 for analysis 1 (vertical axis, -3 to 3), with one labelled point per year from 1975 to 2000.]
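The figures in the SPSS output for the Spanish example can be cross-checked with a few lines of Python. The sketch below assumes, as the METHOD=CORRELATION setting in the routine suggests, that the analysis is run on the correlation matrix, so that the total variance equals the number of variables (8). It recomputes the percentage of variance from the eigenvalues and verifies that the sum of squared loadings of each extracted component reproduces its eigenvalue. Variable and function names are illustrative.

```python
# Eigenvalues and loadings copied from the SPSS output of the Spanish example.
eigenvalues = [6.190, 0.987, 0.702, 6.297e-02, 4.665e-02,
               9.297e-03, 2.371e-03, 7.247e-05]

loadings = {
    "DS": (0.905, -0.296),
    "PB": (0.994, -7.13e-02),
    "PLAI": (0.934, 0.125),
    "PS": (0.914, -0.141),
    "SM": (-0.599, 0.730),
    "WB": (0.980, 0.176),
    "WLAI": (0.796, 0.447),
    "WS": (0.850, 0.308),
}


def variance_explained(eigs, n_variables):
    """Percentage and cumulative percentage of variance per component."""
    percent = [100.0 * e / n_variables for e in eigs]
    cumulative, running = [], 0.0
    for p in percent:
        running += p
        cumulative.append(running)
    return percent, cumulative


def sum_squared_loadings(table, component):
    """Sum of squared loadings of one component (0-based index)."""
    return sum(values[component] ** 2 for values in table.values())


percent, cumulative = variance_explained(eigenvalues, n_variables=8)
```

Small differences in the last decimals are expected, because the published eigenvalues and loadings are rounded to three digits.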


This technique helps to understand how the yield prediction could still change before harvest. In theory, the further the growing season advances, the lower the number of similar years remaining and hence the lower the uncertainty. Further studies are under way to validate the approach.
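The similar-years logic of steps 3 and 4 can be sketched in Python as follows. This is a simplified stand-in and not the SPSS routine: it skips the factor extraction and the Ward clustering and instead ranks the past years by squared Euclidean distance computed directly on standardised indicators. The text says the weights come from the dissimilarity coefficients without giving the formula, so the inverse of the distance is used here as an assumption. All names and the data layout are hypothetical.

```python
from statistics import mean, pstdev


def similar_year_scenarios(indicators, detrended_yields, current_year, k=10):
    """Rank past years by similarity to current_year and derive scenarios.

    indicators:       {year: [indicator values]}, including current_year
    detrended_yields: {year: yield} for the past years only
    k:                number of most similar years to keep
    """
    years = sorted(indicators)
    n_vars = len(indicators[current_year])
    # Standardise each indicator over all years (z-scores).
    means = [mean(indicators[y][j] for y in years) for j in range(n_vars)]
    sds = [pstdev([indicators[y][j] for y in years]) for j in range(n_vars)]
    z = {y: [(indicators[y][j] - means[j]) / sds[j] for j in range(n_vars)]
         for y in years}
    # Squared Euclidean distance of every past year from the current year.
    distance = {
        y: sum((a - b) ** 2 for a, b in zip(z[y], z[current_year]))
        for y in years
        if y != current_year and y in detrended_yields
    }
    similar = sorted(distance, key=distance.get)[:k]
    # Assumption: weight = inverse dissimilarity (the small constant
    # avoids a division by zero for an identical year).
    weights = [1.0 / (distance[y] + 1e-9) for y in similar]
    yields = [detrended_yields[y] for y in similar]
    forecast = sum(w * v for w, v in zip(weights, yields)) / sum(weights)
    return {
        "similar_years": similar,
        "forecast": forecast,
        "pessimistic": min(yields),
        "optimistic": max(yields),
    }
```

If a trend is present, the yields passed in would be deviations from the trend, and the trend value for the current year would have to be added back to the three results.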


4 A CONTROL BOARD AS A TOOL TO GUIDE THE STATISTICAL DATA ANALYSIS

In order to optimize the added value given by the analysts through further data processing and through the analysis of maps and graphs, the MARS analysts need a tool in a common web framework that is able at the same time to

• visualize the results of MCYFS according to user definition of pages/content (portal concept)

• visualize specific pre-defined process output (warehouse concept)

• enable interactive re-parameterisation and re-launch of processes

• enable interactive geographical data visualisation.

In 2003 such a tool was developed to provide MARS statisticians with an integrated interface that reduces the time needed to access the information and supports the analysis through more automated processing of the data; this tool is referred to here as the Control Board (COBO).

The Control Board links all the heterogeneous environments used by the analysts in the forecast process. It combines newly developed and automated procedures in a web-based environment that guides the user through the different steps of the production of the estimates.

Each user of the CoBo has an account, with a password and a role, which make it possible to limit access to the data. In particular, two types of user are foreseen: the analyst, who performs the analysis and proposes forecasts for selected crops and countries, and the administrator, who is responsible for “opening” the analysis (starting the whole procedure by loading the updated data) and for deciding on the final forecasts to be published.

The statistical part of COBO

The statistical forecasting process can be subdivided into three main phases, all integrated in the COBO tool; parts of the first two phases are run automatically in CGMS Level 3, developed in the S+ environment, while other analyses are carried out in COBO through SPSS.

First phase: data import and inspection

The first phase of the statistical forecasting process consists in updating the reference official time series (Eurostat data). Every 30 days Eurostat sends by email a file extracted from its CRONOS database, i.e. time series at national and regional level of crop area, yield and production; in the following steps these data are used to study and evaluate trends. The CoBo page for data import:


[Screenshot of the CoBo data import page; callouts: selection of files to be used for the import; list of updated values (old and new values) in comparison to the previous analysis; statistics on the last import.]

CoBo allows the administrator to inspect the imported data, both graphically and in table format, and if necessary to change imported values manually:

[Screenshot of the CoBo data inspection page; callouts: selection of the country and crop for which to visualize data; time series of yield and surface for the selected crop and country (CRONOS); table listing all available data in the time series and their source (imported values can be changed manually).]


Second phase: production and proposal of forecasts by the analysts

Once the data loading process has been completed and confirmed by the administrator, each analyst is allowed to analyze and propose forecasts separately for each crop and country, following the pre-defined scheme described above. The forecast production process allows the user to make his own analysis and at the end to propose to the administrator one or more plausible forecasts per country and crop; COBO registers all the values proposed and displays them in a summary page for the administrator, who can then make the final choice. In the following, the steps performed by the analysts are described in the order proposed by the Control Board. This order reflects the prioritization scheme given in section 3.2.

The CGMS trend analysis and prediction model

Official statistics of regional mean yields are predicted automatically by the CGMS using one of its simulated predictors (see section 3.3):

• Potential dry weight of the simulated biomass (ton.ha-1).
• Water limited dry weight of the simulated biomass (ton.ha-1).
• Potential dry weight of the simulated storage organs (ton.ha-1).
• Water limited dry weight of the simulated storage organs (ton.ha-1).

The statistical sub-system of the CGMS uses a combination of a linear time trend and crop growth simulation results. As a first attempt, the analyst considers the CGMS results coming from the default launch of CGMS Level 3, which are displayed automatically in COBO:

[Screenshot of the COBO page with the default CGMS results; callout: list of all automatic CGMS model outputs.]

If the analyst is not satisfied with the automatic results given by the CGMS Level 3 system, he can re-parameterize part of the time series analysis (for instance, changing the time series length) and launch a new CGMS simulation:

[Screenshot of the COBO page for a customized CGMS run; callouts: parameters for the launching of a customized CGMS; results of the CGMS simulation; possibility to save parameters as template for further analysis.]

Trend analysis in COBO

When for a certain combination of country and crop the accuracy of the CGMS prediction is deemed insufficient, the MARS analyst starts to redefine trend periods and functions. First, trends for a longer period (1975 until the current year) are determined, if yield statistics for such a period are available. Next, trends for more recent periods are studied. Besides the analysis of the trend period, different trend functions (linear, quadratic, logarithmic and exponential) are studied.


Scenario-analysis in SPSS

To deal with the residual uncertainty due to the unknown evolution of the season between the moment the forecast is issued and the moment the crop is harvested, agro-meteorological scenarios can be produced and analysed. The scenario analysis consists in finding the most similar agro-meteorological years on the basis of the time series of parameters simulated by the CGMS. The analysis is based on Principal Component Analysis (PCA), Factor Analysis and Cluster Analysis; the crop indicators of the CGMS for all available years are used as input. Further elements are given in section 3.6. Once agro-climatic years similar to the current year are detected, the resulting range of final yields of these years can be attributed to the current season. If a trend exists, the range of final yields is corrected for this trend before the yields are related to the current year. Scenario analyses and the subsequent forecast computations are carried out in COBO by calling the SPSS routines; the MARS analyst can decide which indicators are used for each country and crop. The scenario analyses are always executed, both to produce and constantly update pessimistic and optimistic crop yield scenarios at EU level at any time of the year, and to provide output that can serve as the principal forecast in case the other models do not show good statistical performance. As in the CGMS section, the analyst is shown some “default” scenario analyses carried out automatically every 10 days, or he can create his own new scenario fitted to the specific situation analyzed:

[Screenshot of the COBO scenario page; callouts: choice between automatic or custom scenario results; possibility to create a new custom scenario forecast.]

Summary results of the current analysis are displayed in the COBO page, with a link to the detailed SPSS output.


Trimmed average or custom values

The last options available for the forecast in COBO are the average of the last five years or a trimmed average.
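Both averages are simple to compute. The sketch below assumes that the trimmed average is taken over the same five-year window with the highest and the lowest value dropped; the text does not define the trimming rule used in COBO, so the window length and the amount of trimming are exposed as hypothetical parameters.

```python
def last_years_average(yields, n=5):
    """Plain average of the last n yearly yields."""
    window = yields[-n:]
    return sum(window) / len(window)


def trimmed_average(yields, n=5, trim=1):
    """Average of the last n yields after dropping the `trim` highest
    and `trim` lowest values (assumed trimming rule)."""
    window = sorted(yields[-n:])
    kept = window[trim:len(window) - trim]
    if not kept:
        raise ValueError("window too short for the requested trimming")
    return sum(kept) / len(kept)
```

The trimmed version is less sensitive to a single exceptional season, such as a drought year, inside the window.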

Third phase: the final choice of the forecast

The different elements, such as prediction models, trend analysis, scenario analysis, trimmed average and the analyst's judgement of the results, lead to a yield forecast. Based on their experience and expert knowledge, the MARS analysts propose the yield forecasts during the MARS bulletin meetings. The forecasts proposed are checked by the statistical administrator and discussed collegially, trying to optimise the pattern of reliability given by all sources, i.e. giving priority to, and “preferring”, different predictors according to the moment of the year. Decisions, final choices and all intermediate steps made through the COBO tool are registered and traceable, and the whole forecast process is therefore transparent and repeatable. The following chart summarises the approach:

[Screenshot callouts: parameters for the launching of a custom CGMS simulation; results of the custom analysis; possibility to save parameters as template for further analysis; link to the detailed SPSS output.]


[Flow chart of the forecasting approach. Input boxes: meteorological information; simulated crop indicators; remote sensed vegetation indices; national yield statistics; additional sources. Process boxes: starting point; real-time meteorological analysis; temporal-spatial analysis; yield forecasts; forecasting procedure; qualitative assessments; discussion about the proposed forecasts; approved; rejected; publish in MARS bulletin; validation by advisors, press reviews, field visits; feedback; end point.]


5 NEW FEATURES IN CGMS RELEASE 8.0

CGMS version 8.0 gives the user the following advantages compared to version 2.3:

1. The forecast module includes the possibility to apply the following approaches to yield forecasting:

a) the current CGMS approach extended to all the indicators generated or included in the system:

• simulated crop indicators (biomass-potential productivity level, storage organs biomass-potential productivity level, total biomass-water limited productivity level, storage organs biomass-water limited productivity level, potential leaf area index, water limited leaf area index, soil moisture, development stage, total water requirement as an output of CGMS, as well as outputs from the FAO Water Balance Model);

• meteorological indicators (rain, temperature, climatic water balance, global radiation, potential evapotranspiration);

• remote sensing indicators (greenness indexes, e.g. NDVI, CNDVI);

• different combinations of the above-mentioned meteorological indicators (multivariate approach).

b) the scenario analysis, i.e. defining years-analogues (based on multivariate data analysis) for the dynamics of:

• simulated crop indicators;

• meteorological indicators;

• remote sensing indicators;

• different combinations of the above-mentioned meteorological indicators (multivariate approach).

c) the application of user-specified equations of yield-indicator interrelations. The forecast is made on the basis of regionally established regressions between yield and separate meteorological parameters, calculated for fixed periods. The application must read selected parameters from the data base and use these as input data for the forecast calculation.


6 QUALITATIVE ASSESSMENT OF MARS PREDICTION (QUAMP) RESULTS

6.1 Introduction

The QUAMP study was launched to evaluate the performance of the forecasting system after 10 years of forecast production. The evaluation covers data up to 2002. A previous study was made in 1998, when the decision was taken to use the system in an operational context. In that first global evaluation of the system performance (Genovese, 1998) the RMSE was used as error indicator, comparing the MCYFS forecasts with the ex-post observed results. The average yield forecast error ranged from about 3% to 5% at EU level for the main crops (8.3% for durum wheat). In absolute terms the average error for wheat ranged between about 2 and 4 quintals ha-1 at both European and national level (Genovese, 2001). In general, the system was shown to give higher errors at the beginning of the season and lower errors at the end, in line with the cumulative effect of the climate on crop behaviour. It is stressed that the analysis assumes that the observed yields (official time series) used as reference are free of error, so that the term “MCYFS error” may not be entirely appropriate. Moreover, the project was run in 2003, and at the time of the error calculation the official statistics available for 2002 were not yet stable for some crops and countries. Table 6.1 shows the evolution of the average forecast error in the course of the year. For soft wheat the RMSE is around 5% in April and decreases to about 3.6% in September. This analysis was conducted at national level and for the main crops. The error analysis is repeated each year in order to watch the performance of the MCYFS carefully, identify sources of error and decide which parts of the MCYFS have to be improved.

Table 6.1: The RMSE yield error (%) at EU level for the main cereal crops, calculated over the first 5 years of forecast publications (period 1993-1997).

| Crop | April | May | June | July | August | September | October | Average |
|---|---|---|---|---|---|---|---|---|
| Wheat | 5.06 | 4.11 | 4.63 | 4.74 | 4.06 | 3.13 | 3.18 | 4.1 |
| Soft wheat | 5.08 | 4.42 | 5.11 | 5.33 | 4.56 | 3.57 | 3.59 | 4.5 |
| Durum wheat | 12.04 | 9.49 | 8.46 | 8.56 | 7.90 | 7.41 | 4.21 | 8.3 |
| Barley | 4.30 | 4.59 | 4.79 | 4.39 | 3.58 | 2.78 | 1.98 | 3.8 |
| Maize | 5.95 | 5.73 | 5.31 | 5.13 | 4.66 | 3.77 | 3.72 | 4.9 |

(Genovese, 1998)

In the following sections the error of the MCYFS forecasts is analyzed for the following eight crops of interest: soft wheat, durum wheat, barley, grain maize, rape seed, potato, sugar beet and sunflower. After a short description of the error indicators adopted (section 6.2), the analysis starts in section 6.3 with an examination of the overall error of the MCYFS, i.e. at the EU-15 level. A more detailed analysis of the MCYFS is then reported in the following sections, where the error is analyzed from three different points of view:


• Spatial distribution: averaged error figures referring to combinations of crops and countries (EU member states) are examined. For each combination all monthly forecasts produced by the MCYFS are considered, percentage and absolute percentage errors are computed, and the errors are averaged over months and years (section 6.4). Marginal crop-country combinations with very small cultivated areas (below 25000 hectares) were left out of the comparisons;

• Evolution over months: the bias and average error size of the MCYFS forecasts for EU-15 yields per month, and the way they change, are examined in section 6.5;

• Evolution over years: the bias and average error size of the MCYFS forecasts for EU-15 yields per year, and the way they change, are analyzed in section 6.6.

Finally, the last section of the chapter (section 6.8) examines whether the performance of the MCYFS in the period 1998-2002 improved compared to its performance in the period 1993-1997.

One aspect of the analysis, from each of these points of view, concerns the bias of the MCYFS predictions. As will be seen in the following sections, averaging the error of the MCYFS over months, years or countries shows that bias exists in the predictions. In some cases it is systematically negative (underestimation of the true yield) or systematically positive (overestimation); in other cases the bias is sometimes positive and sometimes negative without a strong pattern. The main aim of the examination of bias is therefore not to establish that bias exists (it exists anyway!) but to identify cases of systematic underestimation or overestimation. Where statistical tests for bias are applied or significant bias is mentioned in the following sections, the concern is therefore with systematic bias. The other aspect of the analysis is error size, i.e. absolute error, the absolute difference between forecast and true yield.

6.2 Methodology of forecast errors evaluation methods

Error indicators for the QUAMP study

The forecasting literature favours three types of error indicators for evaluating forecast performance or for comparing forecasting methods: RMSFE, MAPE and Theil’s U coefficient. The preference for RMSFE seems to stem from its resemblance to the mean square error, a discrepancy measure commonly used in statistics, and from its correspondence to a quadratic loss function; see for example Deschamps and Mehta (1980). It has however received a lot of criticism because it is affected by the magnitude of the forecasted series (and hence does not allow aggregation over multiple series) and because of its lack of reliability. Armstrong and Collopy (1992) found that the rankings of several forecasting methods changed significantly depending on the sample of time series used. Similar comments are made by Makridakis and Hibon (1979). The effect of the series’ magnitude on the indicator is eliminated if RMSPE is used instead of RMSFE.

MAPE is the indicator of choice in many studies of forecasting accuracy (Makridakis and Hibon, 1979, Deschamps and Mehta, 1980, Karamouzis, 1985). It does not depend on the series’ magnitude or unit of measurement, can be averaged across series and can be used for comparing methods. We have mainly used MAPE as an indicator of error size because, as already mentioned, it is an average absolute error while RMSPE is only a proxy for an average absolute error. We have however used RMSPE for specific comparisons. Besides MAPE we have used MPE, which also does not depend on a series’ magnitude or unit of measurement. It complements MAPE by giving the direction and size of forecasting bias. Finally, since Theil’s U is so widespread, we have calculated a similar indicator, namely MRE, where the naïve method uses $k = 1$. We chose MRE because we have a personal preference for indicators measuring absolute instead of squared error, and we chose $k = 1$ because we wanted to compare the forecasting methods with a slightly more complicated naïve method than the random walk.

In the following section the forecast error indicators chosen are presented and analyzed with particular regard to their statistical properties.

Percentage error

It is the difference between the forecast and the true value of the variable of interest, expressed as a proportion of the true value. If $Y_t$ is the true value of the variable at time point $t$ and $\hat{Y}_t$ is a forecast for it, the percentage error (PE) at time point $t$ is given by the formula

$$d_t = \frac{\hat{Y}_t - Y_t}{Y_t} = \frac{e_t}{Y_t},$$

where $e_t = \hat{Y}_t - Y_t$ is the forecast error (FE). The indicator may assume any real number as its value. The further its value is from zero, the larger the forecast error. It can be multiplied by 100 and expressed as a percentage (%). The indicator shows the relative magnitude and the direction of bias in the same way as FE. A disadvantage of the indicator is its lack of symmetry; for example $d_t = 1$ means that the forecast is twice as large as the true value, but $d_t = -0.5$ means that the true value is twice as large as the forecast. This asymmetry also means that overestimation is penalized more (“looks worse”) than underestimation.

The aggregation of percentage error over a given period of time consisting of $T$ points gives the mean percentage error (MPE),

$$\mathrm{MPE} = \frac{1}{T}\sum_{t=1}^{T} d_t,$$

and the root mean square percentage error (RMSPE),

$$\mathrm{RMSPE} = \sqrt{\frac{1}{T}\sum_{t=1}^{T} d_t^{2}}.$$
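The indicators used in this chapter can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the study: the function names and the yield figures are hypothetical. It computes PE per time point and then MPE, RMSPE and MAPE (the mean of the absolute percentage errors), and it shows both the asymmetry of PE and the way alternating errors cancel in MPE.

```python
import math

def percentage_errors(actual, forecast):
    """PE per time point: d_t = (forecast_t - actual_t) / actual_t."""
    return [(f - y) / y for y, f in zip(actual, forecast)]

def mpe(d):
    """Mean percentage error: signed average, shows direction of bias."""
    return sum(d) / len(d)

def rmspe(d):
    """Root mean square percentage error."""
    return math.sqrt(sum(x * x for x in d) / len(d))

def mape(d):
    """Mean absolute percentage error: average error size."""
    return sum(abs(x) for x in d) / len(d)

# Hypothetical yields (t/ha), for illustration only.
actual = [5.0, 6.0, 4.0, 5.0]
forecast = [5.5, 5.4, 4.4, 4.5]   # alternately 10% too high and 10% too low
d = percentage_errors(actual, forecast)

print("MPE   (%):", round(100 * mpe(d), 2))    # errors cancel out
print("MAPE  (%):", round(100 * mape(d), 2))   # error size is still 10%
print("RMSPE (%):", round(100 * rmspe(d), 2))

# Asymmetry of PE: a forecast twice the true value gives d = 1,
# a true value twice the forecast gives d = -0.5.
over = percentage_errors([1.0], [2.0])[0]
under = percentage_errors([2.0], [1.0])[0]
```

Multiplying by 100 expresses the indicators as percentages, which is the form used in the tables of this chapter.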


MPE may take any real value while RMSPE takes only non-negative values. MPE indicates the direction and relative size of the bias of the forecasts, while RMSPE gives an average relative size of forecast error over the given period. Unusually large errors affect both MPE and RMSPE, and so do alternating large positive and negative errors. On the other hand, because the true values appear in the denominator of the indicator, the aggregates are not affected by the magnitude or the unit of measurement of the series being forecasted. This renders them suitable as means for comparing the performance of a forecasting method on several series or the performance of several methods on the same series.

Absolute percentage error

It is the absolute value of the percentage error. Using the same notation as before, the absolute percentage error (APE) at time point $t$ is given by the formula

$$|d_t| = \left|\frac{e_t}{Y_t}\right|.$$

The indicator may assume any non-negative real number as its value. The further its value is from zero, the larger the forecast error. It can be multiplied by 100 and expressed as a percentage (%). A disadvantage of the indicator is its lack of symmetry, just like PE. Aggregating absolute percentage error over a given period of time consisting of $T$ points gives the mean absolute percentage error (MAPE):

$$\mathrm{MAPE} = \frac{1}{T}\sum_{t=1}^{T} |d_t|.$$

The aggregate has the same advantages and disadvantages as the PE-based aggregates and is suitable for comparing the performance of a forecasting method on several series or the performance of several methods on the same series. The fact that it considers absolute error makes it more suitable than PE-based aggregates for evaluating the relative size of error.

Statistical comparisons of crop yield forecasting systems

A statistical comparison has also been carried out where possible; more specifically, Wilcoxon, Friedman and Page tests have been used (Conover, 1998). Wilcoxon tests have been used to examine the statistical significance of the forecasting bias of MCYFS for each crop and in each country. The tests examined whether an observed mean percentage error above or below zero indicated bias or could be due to random fluctuations. Friedman tests have been used to compare the forecast error size of MCYFS between countries and also between crops. They indicated countries and crops in which MCYFS has a significantly different performance (in terms of forecast error). Finally, Page tests have examined whether the average error size of MCYFS decreases as the year advances and also whether it decreases from year to year.

6.3 Overall error of the MARS crop yield forecasting system

In this section a first look is taken at the error made by MCYFS in forecasting EU-15 yields for the eight crops of interest. The figures used are percentage errors and absolute percentage errors averaged over years and months: the errors of all months and years are added, and the sum is divided by the number of errors. The period of reference is 1996-2002, chosen because EU-15 is the reference aggregate from 1996 onwards. For each crop the Wilcoxon test is used to check whether the bias is significantly negative (underestimation) or positive (overestimation). The data for the test are annual averages for the given crop. The p-values of the tests are given alongside the average errors in the table that follows:

Table 6.2: Overall MPE and MAPE of EU-15 yield forecasts for the period 1996-2002 (averages across years and months).

Crop           MPE (%)   p-value   MAPE (%)
Soft wheat      -0.38     0.87      3.48
Durum wheat      2.44     0.40      6.49
Barley          -1.21     0.50      2.71
Grain maize     -2.47     0.31      3.66
Rape seed       -0.22     1.00      5.03
Potato          -3.77     0.06      4.13
Sugar beet      -3.40     0.18      4.91
Sunflower       -3.24     0.12      5.54

Table 6.2 shows that MCYFS underestimates the yield of all crops except Durum wheat. However, the apparent bias is not statistically significant at the 0.05 level for any crop, although Potato comes close (p-value 0.06). Moreover, MAPE is beyond the 3% limit for all crops except Barley; for Durum wheat it is beyond even the 6% limit.
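The bias test can be illustrated with a small pure-Python sketch of the two-sided Wilcoxon signed-rank test. The text does not state which variant of the test was used, so the normal approximation without continuity correction below is an assumption, and the function name is hypothetical. Applied to the annual EU-15 MPEs listed in Table 6.7, it gives p-values that round to the values printed in Table 6.2 for Soft wheat, Grain maize and Potato.

```python
import math

def wilcoxon_signed_rank(values):
    """Two-sided signed-rank test of 'no bias' (median zero).

    Assumption: normal approximation, no continuity or tie correction.
    Returns (W+, p-value), where W+ is the sum of ranks of positive values.
    """
    nonzero = [v for v in values if v != 0]
    n = len(nonzero)
    order = sorted(range(n), key=lambda i: abs(nonzero[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied absolute values and give them the average rank.
        j = i
        while j + 1 < n and abs(nonzero[order[j + 1]]) == abs(nonzero[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, v in zip(ranks, nonzero) if v > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return w_plus, p_value

# Annual EU-15 MPEs (%) for 1996-2002, copied from Table 6.7.
annual_mpe = {
    "Soft wheat":  [-8.00, 3.06, -3.83, 1.70, -0.27, 4.82, 1.12],
    "Grain maize": [-7.65, -6.84, -0.27, -1.31, -0.43, 1.20, 2.04],
    "Potato":      [-10.66, -6.23, 0.55, -3.97, -2.63, -0.34, -0.43],
}
results = {crop: wilcoxon_signed_rank(v) for crop, v in annual_mpe.items()}
for crop, (w_plus, p) in results.items():
    print(f"{crop}: W+ = {w_plus:.0f}, p = {p:.2f}")
```

With only seven annual values per crop the normal approximation is rough; an exact test would give somewhat different p-values.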

6.4 Analysis of the spatial distribution of MCYFS error

In this section the spatial features of the error of MCYFS forecasts are examined. The data used are the MPE and MAPE calculated as follows: for each combination of crop, country the percentage or absolute percentage error have been averaged over all months and years.

6.4.1 Bias

The table of MCYFS MPE across EU member states and for all crops of interest is presented below:


Table 6.3: MPE of MCYFS forecasts for the period 1993-2002.

Columns: Country | Soft wheat | Durum wheat | Barley | Grain maize | Rape seed | Potato | Sugar beet | Sunflower
(Values in %. Empty cells are not shown, so rows with fewer than eight values do not line up with the column headings.)

AT  -0.08  11.03  3.52  -3.56  10.34  -5.57  -3.96  -3.60
BE  -1.29  -3.79  -6.75  -6.99  -4.07
DE  -1.43  -0.04  -0.91  -2.96  -2.90  -2.07  -3.64  5.07
DK  0.20  -1.55  1.40  -2.65  -1.80
ES  -3.05  17.94  -1.03  -4.03  0.45  -9.75  -8.90  7.41
FI  10.86  6.82  18.38  -1.79  -0.04
FR  -0.34  4.12  -0.17  -2.36  0.23  -4.47  -0.86  3.30
GR  3.03  4.38  5.65  -0.11  -2.71  -2.66  2.07
IE  -2.74  0.29  -3.94  1.96
IT  0.77  1.31  1.59  -1.06  36.54  -0.25  0.57  5.31
LU  0.43
NL  3.70  2.43  -12.10  -0.57  1.14
PT  -5.29  -0.71  -5.16  -10.94  1.98  23.29
SE  0.93  -1.72  -3.81  2.53  -2.40
UK  0.16  0.63  0.64  -1.33  -3.15

The marked cells denote combinations for which the Wilcoxon test indicates significant bias at the 0.05 level (only Potato and Sugar beet in ES), at the 0.15 level (only Grain maize in PT) or at the 0.5 level (Soft wheat in IE); all significance levels have been Bonferroni-adjusted. (Note: The averages are not based on the same number of years and months for each Crop, Country combination.)

In Table 6.3, 6 figures are smaller than -6%, 15 are in the interval [-6%, -3%), 29 are in [-3%, 0), 20 are in (0, 3%], 9 are in (3%, 6%] and 9 are greater than 6%. Therefore a large number of crop-country combinations have errors beyond the 3% limit. Moreover, there is a preponderance of negative biases.

A Wilcoxon test has been performed for each crop, country combination to check the existence of systematic bias, giving a total of 88 tests. The data used for each test were the annual MPEs (i.e. averages of each year’s monthly percentage errors) of the respective crop, country combination. To avoid the effects of multiplicity, the nominal 0.05 significance level was adjusted with the Bonferroni adjustment. Table 6.3 indicates with different colours those combinations which exhibit significant bias at the (Bonferroni-adjusted) 0.05, 0.15 and 0.5 levels. Of the 88 tests only 2 found significant biases at the 0.05 level. They refer to Potato and Sugar beet in Spain and they correspond to systematic underestimation.

Another feature worth noting is that Grain maize was underestimated on average in all countries. Potato was underestimated in 12 out of the 14 countries, although only in Spain was the underestimation systematic. Durum wheat shows a tendency for overestimation, negative biases being small; no bias is significant though. Finally, Sunflower yield was overestimated in all countries except Austria, which again might be explained by the fact that yields in Austria are consistently well above the EU average.
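The Bonferroni adjustment mentioned above can be sketched as follows. This is a generic illustration with hypothetical function names, not the study's code: with m tests, either the nominal level is divided by m or, equivalently, each p-value is multiplied by m (capped at 1) and compared with the nominal level.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that keeps the overall level at alpha."""
    return alpha / n_tests

def bonferroni_adjust(p_values):
    """Equivalent form: inflate each p-value by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

# 88 crop-country tests at a nominal 0.05 level, as in this section.
threshold = bonferroni_threshold(0.05, 88)
print(f"per-test level: {threshold:.6f}")

# Hypothetical p-values, for illustration only.
adjusted = bonferroni_adjust([0.01, 0.5])
```

The adjustment is conservative: the more tests are run, the smaller an individual p-value must be before a bias is declared significant.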
Country-wise it is worth noting that the yield in Belgium was underestimated for all crops, while in Germany it was underestimated for 7 out of 8 crops. On the contrary, yields in Italy appear to be overestimated on average in 6 out of 8 crops. The boxplots of each crop’s MPEs across the fifteen countries and of each country’s MPEs across the eight crops are presented below. These graphs show at a glance the overall bias in each crop and in each country.


Figure 6.1: Boxplots of country MPEs per crop.

The most prominent features in Figure 6.1 are the underestimation of the yield of Grain maize, Potato and Sugar beet and the overestimation of the yield of Sunflower, Durum wheat and Rape seed.

Figure 6.2: Boxplots of crop MPEs per country.

In Figure 6.2 the negative biases in Belgium, Germany and Denmark and the positive biases in Finland and Italy can be observed. From both graphs it is evident that positive biases (overestimations) are relatively larger than negative ones. This is to a certain extent due to the indicator used (the percentage error), which takes larger values for positive than for negative differences of the same absolute value. For this reason, when reading the graphs, attention should be focused not so much on the distance of the extremes from the zero line as on the proportion of each “box” above or below the line. In order to give a clearer picture of the MCYFS error concerning each crop, Annex 8.4 presents and comments on maps of the country MPEs and of the interval to which the MPEs belong.


6.4.2 Error magnitude

The table of MCYFS MAPE across EU member states and for all crops of interest is the following:

Table 6.3: MAPE of MCYFS forecasts for the period 1993-2002.

Columns: Country | Soft wheat | Durum wheat | Barley | Grain maize | Rape seed | Potato | Sugar beet | Sunflower | Average rank
(Values in %. Empty cells are not shown, so rows with fewer than nine values do not line up with the column headings; the last value of each country row is its average rank.)

AT  6.59  19.88  9.21  6.93  14.71  7.05  9.46  5.43  6.50
BE  5.61  6.60  11.56  8.66  7.75  7.00
DE  4.42  6.94  5.27  5.83  10.66  7.21  6.86  12.57  4.00
DK  3.87  4.16  12.66  6.83  6.13  3.20
ES  13.20  27.98  15.29  7.84  16.84  9.75  9.05  19.62  8.63
FI  17.18  13.00  22.48  9.98  17.54  11.40
FR  4.76  8.50  4.90  3.61  8.70  5.85  5.44  5.20  2.25
GR  11.05  12.83  14.26  6.08  9.34  6.64  21.35  7.43
IE  7.12  10.12  12.23  15.59  11.75
IT  4.89  6.58  4.96  3.20  50.25  3.25  7.15  8.10  3.75
LU  7.48  9.00
NL  6.94  7.46  20.69  3.74  5.61  6.20
PT  27.01  32.51  27.12  13.03  11.30  35.51  10.67
SE  3.62  7.36  9.90  10.34  5.16  4.60
UK  5.85  5.44  13.40  6.72  9.83  6.40
Average rank  2.21  6.14  3.21  2.44  6.11  3.29  3.62  5.71

Note: The averages are not based on the same number of years and months for each Crop, Country combination.

In Table 6.3 no figure is in the interval [0, 3%], 23 are in (3%, 6%] and 65 are greater than 6%. All crop, country combinations therefore have high errors. This result is more worrisome than the result reported in the previous section because it indicates that small MPEs may be the result of high positive and negative errors cancelling each other. In MAPE figures such averaging out does not take place and the quality of forecasts is more accurately reflected. Annex 8.5 presents and comments on maps (separately for each crop) which show the interval to which the MAPE of each country belongs.

Table 6.3 also presents one column and one row of average ranks. These are calculated as follows:

(a) For the row of average ranks, the MAPEs within each country have been ranked, with rank 1 assigned to the crop with the smallest MAPE. When the ranking has been performed in all countries, the ranks each crop has received are summed and the sum is divided by the number of countries for which forecasts are available. This indicates how large the error of each crop is compared to the other crops.

(b) The column of average ranks has been calculated in a similar way, by ranking countries separately within each crop.

The observed averages show, for example, that on average the largest errors occur in Rape seed, while among countries the largest errors on average are observed in Portugal, Ireland and Finland. The boxplots of each crop’s MAPEs across the fifteen countries and the boxplots of each country’s MAPEs across the eight crops are the following:


Figure 6.3: Boxplots of country MAPEs per crop.

In Figure 6.3 it can be observed that besides Rape seed large errors are observed in Durum wheat and Sunflower. The countries with the largest errors are Portugal, Spain and Finland:

Figure 6.4: Boxplots of crop MAPEs per country.

6.5 Analysis of the evolution of MCYFS error over the months

In this section the temporal evolution of the error of MCYFS forecasts is analyzed. The data used are the MPE and MAPE for EU-15, calculated for each combination of crop and month as the average of the EU-15 percentage or absolute percentage error over the period 1996-2002.


6.5.1 Bias

The table reporting MCYFS MPE across months and for all crops of interest is the following:

Table 6.4: MPE of MCYFS forecasts for EU-15 yield for the period 1996-2002.

Crop          March   April   May     June    July    August  Sept    Oct     Nov
Soft wheat     0.75   -0.13    0.01   -0.24   -2.27   -0.69   -1.40   -0.28   -0.13
Durum wheat    3.76    6.46    2.69    1.50    2.44    0.80    0.61    0.60    1.99
Barley        -0.61   -1.25   -1.27   -1.90   -2.08   -0.95   -1.15   -0.88    0.74
Grain maize    n/a    -3.97   -2.03   -1.71   -4.97   -1.90   -3.96   -2.44    2.77
Rape seed      2.27    2.08    0.09   -0.92   -1.86   -0.88   -3.33   -1.93    5.46
Potato         n/a    -9.19   -2.92   -4.38   -6.03   -3.56   -4.06   -2.41   -0.81
Sugar beet     n/a    -6.19   -1.92   -4.22   -5.51   -4.73   -1.85   -1.61   -2.96
Sunflower      n/a    -3.40   -3.79   -4.72   -8.31   -2.72    1.84   -2.72    n/a

The marked cell denotes the combination for which the Wilcoxon test indicates significant bias at the 0.05 level; all significance levels have been Bonferroni-adjusted. (Note: The averages are not based on the same number of years for each Crop, Month combination.)

In Table 6.4, 4 figures are smaller than -6%, 13 are in the interval [-6%, -3%), 32 are in [-3%, 0), 15 are in (0, 3%], 2 are in (3%, 6%] and 1 is greater than 6%. Almost one in three crop-month combinations therefore has a high error. The preponderance of negative biases already observed in national figures is evident here too.

The Wilcoxon test has been performed for each crop and month combination to check the existence of systematic bias, giving a total of 67 tests. The data used for each test are the percentage errors of the respective crop, month combination for EU-15 in each of the seven years, whose average is the MPE reported in the table. To avoid the effects of multiplicity, the nominal 0.05 significance level was adjusted with the Bonferroni correction. Table 6.4 marks the combinations which exhibit significant bias at the (Bonferroni-adjusted) 0.05 level. Of the 67 tests only 1 found a significant bias at the 0.05 level. It corresponds to a systematic underestimation of the yield of Grain maize in October.

It is worth noting that Potato and Sugar beet were underestimated on average in all months. The yields of Barley, Grain maize and Sunflower were underestimated in all but one month (November, November and September respectively). On the contrary, Durum wheat yield was overestimated on average in all months. The line chart of the behaviour of MPE (separately for each crop) across months is the following:


[Line chart: MPE (%) by month, March to November; one line per crop (Soft wheat, Durum wheat, Barley, Grain maize, Rape seed, Potato, Sugar beet, Sunflower).]

Figure 6.5: EU-15 MPEs of MCYFS forecasts for the yield of several crops (averaging across years).

Figure 6.5 shows that bias decreases on average for all crops as the year advances. November however shows a deterioration in Rape seed and slightly less so in Grain maize and Durum wheat. In other words MCYFS predictions are, as expected, more reliable in the last than in the first months of the year. Other important features of Figure 6.5 are the mostly negative bias of the forecasts for all crops except Durum wheat (positive bias) and the fact that the error for each crop follows roughly the same pattern along the year. Moreover, it can be observed that July stands out from the rest of the months, having a larger bias.

6.5.2 Error magnitude

The table reporting MCYFS MAPE across months and for all crops of interest is the following:

Table 6.5: MAPE of MCYFS forecasts for EU-15 yield for the period 1996-2002.

Crop           Mar     Apr     May     June    July    Aug     Sept    Oct     Nov
Soft wheat      3.55    3.71    4.16    3.91    5.07    2.68    3.29    2.65    0.13
Durum wheat    14.91   10.20    6.44    5.29    3.84    3.59    5.67    3.86    1.99
Barley          3.15    3.47    2.81    2.74    3.80    2.43    2.38    1.62    0.74
Grain maize     n/a     5.90    3.73    3.09    5.36    3.18    3.96    2.44    2.77
Rape seed       5.68    6.68    5.82    4.93    5.50    3.93    3.33    3.67    5.46
Potato          n/a     9.19    3.11    4.75    6.77    4.10    4.38    2.68    0.81
Sugar beet      n/a     6.19    4.61    6.21    5.51    6.71    1.85    3.12    2.96
Sunflower       n/a     3.40    5.86    8.13    8.31    4.10    4.10    4.31    n/a
Average rank    7.00    6.75    5.75    5.38    6.63    3.75    3.50    2.50    1.86


Note: The averages are not based on the same number of years for each Crop, Month combination.

In Table 6.5, 16 figures are in the interval [0, 3%], 40 are in (3%, 6%] and 11 are greater than 6%. Most crop-month combinations therefore have errors above 3%, but the picture is better than that of the country figures in section 6.4.2. This slight improvement is explained by the fact that EU-15 forecasts are weighted averages of country forecasts, and consequently their error is an average of the country errors.

Average ranks for months show a clear reduction of error size as the year advances (with the exception of July). The apparent reduction is however not enough to give a significant result in Page’s test (which tests whether error size decreases on average from month to month): the test gives a non-significant result at the 0.05 level (p-value: 0.56). Since months cannot be considered independent of one another, this result must be viewed as a lack of strong indications that the error size of MCYFS forecasts decreases as the year advances, at least at the EU-15 level. A diagrammatic representation of Table 6.5 and further evidence for the decrease of error size is given by the following line chart:

[Line chart: MAPE (%) by month, March to November; one line per crop (Soft wheat, Durum wheat, Barley, Grain maize, Rape seed, Potato, Sugar beet, Sunflower).]

Figure 6.6: MAPEs of MCYFS forecasts for the EU-15 yield of several crops (averaging across years).
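The "Average rank" row of Table 6.5 can be recomputed with a short sketch. Two points are assumptions rather than statements from the text: the empty cells are taken to be March for Grain maize, Potato and Sugar beet and March and November for Sunflower, and tied values (the two 4.10 entries of Sunflower) both receive the lower rank. With these choices the result agrees with the printed row to within rounding.

```python
MONTHS = ["Mar", "Apr", "May", "June", "July", "Aug", "Sept", "Oct", "Nov"]
NA = None  # no forecast available for that crop and month (assumed placement)

# MAPE (%) values copied from Table 6.5.
mape_by_crop = {
    "Soft wheat":  [3.55, 3.71, 4.16, 3.91, 5.07, 2.68, 3.29, 2.65, 0.13],
    "Durum wheat": [14.91, 10.20, 6.44, 5.29, 3.84, 3.59, 5.67, 3.86, 1.99],
    "Barley":      [3.15, 3.47, 2.81, 2.74, 3.80, 2.43, 2.38, 1.62, 0.74],
    "Grain maize": [NA, 5.90, 3.73, 3.09, 5.36, 3.18, 3.96, 2.44, 2.77],
    "Rape seed":   [5.68, 6.68, 5.82, 4.93, 5.50, 3.93, 3.33, 3.67, 5.46],
    "Potato":      [NA, 9.19, 3.11, 4.75, 6.77, 4.10, 4.38, 2.68, 0.81],
    "Sugar beet":  [NA, 6.19, 4.61, 6.21, 5.51, 6.71, 1.85, 3.12, 2.96],
    "Sunflower":   [NA, 3.40, 5.86, 8.13, 8.31, 4.10, 4.10, 4.31, NA],
}

def ranks_within_row(row):
    """Rank the available values of one crop; rank 1 = smallest MAPE.

    Tied values share the lowest rank of the tie (an assumption).
    """
    present = [v for v in row if v is not None]
    return [None if v is None else 1 + sum(1 for w in present if w < v)
            for v in row]

def average_ranks(table, n_cols):
    """Average, per column, of the within-row ranks over rows with data."""
    totals = [0] * n_cols
    counts = [0] * n_cols
    for row in table.values():
        for j, r in enumerate(ranks_within_row(row)):
            if r is not None:
                totals[j] += r
                counts[j] += 1
    return [t / c for t, c in zip(totals, counts)]

avg = average_ranks(mape_by_crop, len(MONTHS))
for month, value in zip(MONTHS, avg):
    print(f"{month}: {value:.3f}")
```

The same function applied to the transposed table would give a column of average ranks per crop, as described for the country tables.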

The pattern already observed in Figure 6.5, that is the deterioration of forecasts in July, can be detected in Figure 6.6 as well; in fact, if July conformed to the pattern of the other months, the result of Page’s test would be even more favourable to the hypothesis that MCYFS forecasts improve as the year advances.

In an effort to quantify the decrease of MAPE as the year advances, a linear regression of MAPE on month has been fitted separately for each crop. Months have been coded by giving a value of -4 to March, -3 to April, and so on, arriving at 3 for October and 4 for November. Table 6.6 reports the following results of the regressions for each crop: $R^2$, the slope of each regression line and the statistical significance of the slope’s estimate. The slope represents the average change of MAPE per month.


Table 6.6: Regression results for estimation of the relationship between MAPE and month

Crop          Average change of MAPE per month   P-value of estimate of change   R^2
Soft wheat    -0.33                              0.056                           0.428
Durum wheat   -1.23                              0.004                           0.709
Barley        -0.27                              0.010                           0.633
Grain maize   -0.33                              0.075                           0.435
Rape seed     -0.27                              0.061                           0.416
Potato        -0.77                              0.039                           0.537
Sugar beet    -0.5                               0.065                           0.459
Sunflower     -0.17                              0.694                           0.034

For all crops the regressions indicate a reduction in MAPE from month to month. The strength of the relationship between MAPE and month varies across crops, as does the statistical significance, from very strong as in Durum wheat, Barley and Potato to very weak as in Sunflower. The average reduction of MAPE per month is less than 1 MAPE unit (1%) for all crops except Durum wheat, whose MAPE decreases by 1.23 units per month. Differences in average reduction must however be considered in relation to the starting value of each crop’s MAPE. Durum wheat shows the largest reduction because it starts in March with a MAPE which is almost three times that of the other crops and therefore has greater scope for reduction.
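As an illustration of the regression described above, the following sketch fits an ordinary least squares line to the Soft wheat row of Table 6.5 with months coded from -4 (March) to 4 (November). The function name is hypothetical and the p-value of the slope is not computed here. The slope and the coefficient of determination obtained agree with the Soft wheat row of Table 6.6 to the printed precision.

```python
def ols_fit(x, y):
    """Simple linear regression y = intercept + slope * x.

    Returns (slope, intercept, r_squared).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared

# Month codes: -4 = March, ..., 0 = July, ..., 4 = November.
month_codes = list(range(-4, 5))
# Soft wheat MAPE (%) by month, copied from Table 6.5.
soft_wheat_mape = [3.55, 3.71, 4.16, 3.91, 5.07, 2.68, 3.29, 2.65, 0.13]

slope, intercept, r_squared = ols_fit(month_codes, soft_wheat_mape)
print(f"change of MAPE per month: {slope:.2f}")
print(f"R^2: {r_squared:.3f}")
```

Because the month codes are centred on zero for crops with all nine months, the intercept equals the average MAPE over the campaign.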


6.6 Analysis of the evolution of MCYFS error over the years

In this section the evolution of the error of MCYFS forecasts over the last seven years is analyzed. The data used are the EU-15 MPE and MAPE, calculated as follows: for each combination of crop and year (1996-2002) the EU-15 percentage or absolute percentage errors of all months have been averaged.

6.6.1 Bias

MCYFS MPE across years and for all crops of interest are the following:

Table 6.7: Annual MPE of MCYFS forecasts for EU-15 yield for the period 1996-2002.

Crop           1996     1997     1998     1999     2000     2001     2002
Soft wheat     -8.00     3.06    -3.83     1.70    -0.27     4.82     1.12
Durum wheat     5.28     8.87   -10.04     3.68    -0.17    10.74    -0.73
Barley         -6.96     1.52    -1.77    -1.21    -3.11     2.21     1.47
Grain maize    -7.65    -6.84    -0.27    -1.31    -0.43     1.20     2.04
Rape seed      -0.68    -7.14    -2.17    -6.81     7.99     1.70     7.11
Potato        -10.66    -6.23     0.55    -3.97    -2.63    -0.34    -0.43
Sugar beet     -4.09    -4.37    -6.08    -5.64    -3.52     6.04    -5.34
Sunflower      -1.38    -6.65     4.12    -5.51    -8.52    -1.80     n/a

Note: The averages are not based on the same number of months for each Crop, Year combination.

In Table 6.7, 12 figures are smaller than -6%, 9 are in the interval [-6%, -3%), 15 are in [-3%, 0), 9 are in (0, 3%], 5 are in (3%, 6%] and 5 are greater than 6%. More than half of the crop, year combinations have high or unacceptably high errors. The corresponding line chart is the following:

[Line chart: MPE (%) by year, 1996 to 2002; one line per crop (Soft wheat, Durum wheat, Barley, Grain maize, Rape seed, Potato, Sugar beet, Sunflower).]

Figure 6.7: EU-15 annual MPEs of MCYFS forecasts for the yield of several crops (averaging across months).
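The interval counts quoted for Table 6.7 can be checked with a short sketch that bins each annual MPE against the 3% and 6% limits. The function name and the interval labels are illustrative; the values are copied from Table 6.7 and their order within a crop does not matter for the counts.

```python
from collections import Counter

# Annual EU-15 MPEs (%) copied from Table 6.7 (Sunflower has six values).
annual_mpe = {
    "Soft wheat":  [-8.00, 3.06, -3.83, 1.70, -0.27, 4.82, 1.12],
    "Durum wheat": [5.28, 8.87, -10.04, 3.68, -0.17, 10.74, -0.73],
    "Barley":      [-6.96, 1.52, -1.77, -1.21, -3.11, 2.21, 1.47],
    "Grain maize": [-7.65, -6.84, -0.27, -1.31, -0.43, 1.20, 2.04],
    "Rape seed":   [-0.68, -7.14, -2.17, -6.81, 7.99, 1.70, 7.11],
    "Potato":      [-10.66, -6.23, 0.55, -3.97, -2.63, -0.34, -0.43],
    "Sugar beet":  [-4.09, -4.37, -6.08, -5.64, -3.52, 6.04, -5.34],
    "Sunflower":   [-1.38, -6.65, 4.12, -5.51, -8.52, -1.80],
}

def interval_label(value):
    """Assign an MPE (%) to one of the six intervals used in the text."""
    if value < -6:
        return "< -6"
    if value < -3:
        return "[-6, -3)"
    if value < 0:
        return "[-3, 0)"
    if value <= 3:
        return "(0, 3]"   # an exact zero, which does not occur here, would also land here
    if value <= 6:
        return "(3, 6]"
    return "> 6"

counts = Counter(interval_label(v) for row in annual_mpe.values() for v in row)
total = sum(counts.values())
beyond_3 = total - counts["[-3, 0)"] - counts["(0, 3]"]

for label in ["< -6", "[-6, -3)", "[-3, 0)", "(0, 3]", "(3, 6]", "> 6"]:
    print(f"{label:>9}: {counts[label]}")
print(f"beyond the 3% limit: {beyond_3} of {total}")
```

The count of figures beyond the 3% limit is what underlies the statement that more than half of the crop, year combinations have high errors.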


A negative bias is observed in the first years, followed by a drift of the bias towards zero around 1999 and a subsequent increase towards 2002. Durum wheat is the crop that demonstrates the largest errors on average.

Figure 6.8: Boxplots of MCYFS annual MPEs for EU-15, per crop (averaging across months).

Mostly positive biases are observed for Durum Wheat and negative biases for Grain maize, Potato, Sugar beet and Sunflower.

6.6.2 Error magnitude

The MCYFS MAPE across years and for all crops of interest are the following:

Table 6.8: Annual MAPE of MCYFS forecasts for EU-15 yield for the period 1996-2002.

Crop           1996    1997    1998    1999    2000    2001    2002    Average rank
Soft wheat      8.00    3.06    3.83    1.70    0.99    4.82    1.20    3.71
Durum wheat     5.28    8.87   10.04    5.02    2.62   10.74    2.06    5.86
Barley          6.96    1.64    1.77    1.39    3.11    2.21    1.47    3.29
Grain maize     7.65    6.84    2.28    1.31    0.43    1.44    2.04    3.29
Rape seed       2.59    7.14    2.35    6.81    7.99    1.70    7.11    5.29
Potato         10.66    6.23    0.55    3.97    2.63    1.05    1.31    3.43
Sugar beet      4.09    4.37    6.08    5.64    3.52    6.04    5.34    5.43
Sunflower       5.21    6.65    4.92    5.51    8.52    1.80    n/a     5.33
Average rank    5.00    4.75    4.00    3.38    3.50    3.88    3.00

Note: The averages are not based on the same number of months for each Crop, Year combination.


In Table 6.8, 23 figures are in the interval [0, 3%], 15 are in (3%, 6%] and 17 are greater than 6%. Most crop-year combinations therefore have high errors. Page’s test has been applied to evaluate whether average error size decreases from year to year. The result is marginally non-significant at the 0.05 level (p-value: 0.073), so a decrease cannot be confirmed.

A linear regression model has been applied, separately for each crop, of MAPE on years. Years have been codified giving a value of –3 to 1996, -2 to 1997, and so on, arriving at 2 for 2001 and 3 for 2002. The following table reports the results of the regressions:

Table 6.9: Regression results for estimation of the relationship between MAPE and year

Crop          Average change of MAPE per year   P-value of estimate of change   R^2
Soft wheat    -0.70                             0.144                           0.375
Durum wheat   -0.48                             0.525                           0.085
Barley        -0.5                              0.208                           0.294
Grain maize   -1.05                             0.034                           0.626
Rape seed      0.30                             0.613                           0.055
Potato        -1.30                             0.041                           0.599
Sugar beet     0.16                             0.450                           0.118
Sunflower     -0.31                             0.616                           0.069

Results vary with the crop, indicating a slight reduction or a slight increase of MAPE during the last seven years. The strength of the relationship between error size and year, as well as its statistical significance, also varies. The strongest relationships correspond to the largest average reductions per year and occur in Grain maize (MAPE falls by 1.05 units per year) and in Potato (MAPE falls by 1.3 units per year). Sugar beet and Rape seed show a slight (statistically insignificant) increase of their MAPE during the last years. Figure 6.9 below shows the behaviour of MAPE for the different crops across the years; there does not seem to be great resemblance between the crops:

[Line chart: MAPE (%) by year, 1996 to 2002; one line per crop (Soft wheat, Durum wheat, Barley, Grain maize, Rape seed, Potato, Sugar beet, Sunflower).]

Figure 6.9: EU-15 annual MAPEs of MCYFS forecasts for the yield of several crops (averaging across months).


Friedman’s test, applied to the crop ranks in Table 6.8 in order to compare the magnitude of the errors between crops, indicates that there is no statistically significant difference in error size between crops (p-value = 0.168). The smaller error of EU-15 forecasts compared to country forecasts is also evident in the following boxplot:

Figure 6.10: Boxplots of MCYFS annual MAPEs for EU-15, per crop (averaging across months).

6.7 Conclusions on the quality evaluation of the MCYFS in the period 1996-2002

Crops

Small negative biases seem to exist for all crops except Durum wheat, which has a small positive bias. These biases however are not statistically significant. This is one reason why Durum wheat’s yield error was much different from that of the other crops. The size of error (MAPE) is mainly in the 3-6% interval, except for Durum wheat where it is larger than 6%. In general, Soft wheat and Barley have the smallest errors, while Durum wheat, Rape seed and Sunflower have the largest ones.

Countries

Most crops were underestimated in most countries. In two crop, country combinations (Potato and Sugar beet in Spain) this underestimation was found to be significant. Also, Grain maize was underestimated on average in all countries, Potato was underestimated in 12 out of the 14 countries (Luxembourg excluded), and Sunflower yield was overestimated in all countries except Austria. It is also worth noting that the yield in Belgium was underestimated for all crops, while in Germany it was underestimated for 7 out of 8 crops. On the contrary, yields in Italy appear to be overestimated on average in 6 out of 8 crops. As regards the size of error, the largest errors occur in Spain, Portugal and Finland (and to a lesser extent Greece and Ireland). On the other hand, the smallest errors occur in Denmark, France and Belgium. It is worth noting that the former group mostly consists of countries in the periphery of the EU and the latter consists of countries which are geographically close.


Evolution across months

Bias appears to decrease in each consecutive month, with the exception of July, where it actually increases for all crops. This decrease, as expected, leads to more reliable results later in the campaign. The only statistically significant bias occurs in the October prediction of Grain maize. The size of error also decreases for all crops. The average change of MAPE per month for EU-15 appears larger for Durum wheat (-1.23%) and Potato (-0.77%), with Sunflower (-0.14%), Barley (-0.27%) and Rape seed (-0.27%) reporting the smallest changes. The large value of the reduction for Durum wheat appears more reasonable if we consider that it starts with the largest error.

Evolution across years

No significant trend appears to exist throughout the years although in the last years (2000, 2001) mostly negative biases have turned to positive ones. The size of error appears to be falling from year to year but different crops have widely different behaviours.

6.8 Evaluation of the improvement of MCYFS forecasts

The purpose of this section is to examine whether the performance of MCYFS in the period 1998-2002 improved compared to its performance in the period 1993-1997. Performance is assessed in terms of average error size, and more specifically in terms of RMSPE. For each crop separately, monthly RMSPEs have been computed by averaging the MCYFS errors for the given month over each of the two periods in question. Errors refer to EU-total forecasts, where EU-total denotes EU12 until 1995 and EU15 from 1996 onwards. Moreover, a Wilcoxon test on the absolute percentage errors of each month is applied to test the null hypothesis (H0) that absolute percentage errors have not improved (i.e. have not decreased) in the period 1998-2002 relative to the period 1993-1997. Small p-values (less than 0.05 in this document) therefore lead to rejection of the null hypothesis and indicate an improvement (reduction) of errors, and hence better performance of MCYFS.
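The evaluation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the percentage error is taken as 100 x (forecast - observed) / observed, the Wilcoxon test is taken in its two-sample (rank-sum) form with an exact one-sided p-value, and the function names and sample values are hypothetical rather than taken from the MCYFS software.

```python
from itertools import combinations
from math import sqrt


def rmspe(forecasts, observed):
    """Root mean square percentage error, in percent (assumed definition)."""
    pct = [100.0 * (f - o) / o for f, o in zip(forecasts, observed)]
    return sqrt(sum(e * e for e in pct) / len(pct))


def midranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def rank_sum_p_value(early, late):
    """Exact one-sided p-value against H0 'errors have not decreased'.

    A small value means the absolute percentage errors of the later
    period tend to be smaller than those of the earlier period.
    """
    ranks = midranks(list(early) + list(late))
    observed = sum(ranks[len(early):])
    # Enumerate every way the later period could have received its ranks.
    sums = [sum(c) for c in combinations(ranks, len(late))]
    return sum(s <= observed for s in sums) / len(sums)


if __name__ == "__main__":
    # Hypothetical absolute percentage errors for one month and one crop.
    errors_93_97 = [5.1, 4.8, 6.0, 3.9, 5.5]
    errors_98_02 = [3.2, 4.1, 2.7, 4.9, 3.0]
    print(rank_sum_p_value(errors_93_97, errors_98_02))
```

Enumerating all rank assignments is only practical for the very small samples involved here (at most a handful of years per period); for larger samples a normal approximation would normally be used instead.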

6.8.1 Forecasting of Soft wheat

The RMSPEs of MCYFS in the forecasting of Soft wheat and the p-values of the respective Wilcoxon tests are reported in the following table:

Table 6.10: Monthly RMSPEs of MCYFS forecasts for Soft wheat – p-values of Wilcoxon tests

Month  RMSPE 93-02  RMSPE 93-97  RMSPE 98-02  p-value
March  4.16  4.16
April  4.37  5.05  3.73  0.2778
May  4.37  4.82  3.87  0.7262
June  4.19  4.98  3.21  0.5317
July  5.02  5.47  1.37  0.5000
August  3.45  4.52  1.85  0.1111
September  3.39  3.60  2.05  0.3333
October  3.09  3.80  2.18  0.2429
November  0.13  0.13
Overall  4.00  4.66  3.15


The RMSPE in 1998-2002 is smaller than the respective RMSPE of 1993-1997 in all months; the differences however are not statistically significant. The lack of significance is partly due to the sparseness of the data from which these figures are derived. However, there are strong indications that MCYFS has improved in the forecasting of Soft wheat yield after 1998. The contents of Table 6.10 are presented graphically in the following diagram:
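The effect of data sparseness on significance can be made concrete. Assuming the test is an exact two-sample rank-sum test (an assumption, as in the hedged reading of section 6.8), the smallest p-value it can ever return depends only on the two sample sizes. The sketch below uses hypothetical sample sizes and a hypothetical function name.

```python
from math import comb


def smallest_exact_p_value(n_early, n_late):
    """Smallest one-sided p-value of an exact rank-sum test.

    It is reached when every observation of the later period ranks
    below every observation of the earlier period.
    """
    return 1 / comb(n_early + n_late, n_late)


if __name__ == "__main__":
    # Hypothetical sample sizes: five years per period, or a single
    # available forecast in the later period.
    print(smallest_exact_p_value(5, 5))
    print(smallest_exact_p_value(5, 1))
```

With five observations per period the floor is 1/252, so significance at the 0.05 level is attainable. With a single observation in one period the floor is 1/6, so no outcome can be significant at that level, however large the reduction in error.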

Figure 6.10: Monthly RMSPEs for Soft wheat (x-axis: month, March to November; y-axis: RMSPE (%); series: 93-02, 93-97, 98-02).

6.8.2 Forecasting of Durum wheat

The RMSPEs of MCYFS in the forecasting of Durum wheat and the p-values of the respective Wilcoxon tests are the following:

Table 6.11: Monthly RMSPEs of MCYFS forecasts for Durum wheat – p-values of Wilcoxon tests

Month  RMSPE 93-02  RMSPE 93-97  RMSPE 98-02  p-value
March  16.52  16.52
April  26.32  37.66  10.61  0.0556
May  26.27  36.57  6.58  0.0278*
June  23.95  35.41  5.42  0.0317*
July  33.26  36.42  2.09  0.1667
August  25.91  36.41  4.03  0.0278*
September  33.52  36.56  7.60  0.5000
October  25.14  35.28  4.40  0.1000
November  1.99  1.99
Overall  26.59  36.36  8.50

Note: Marked cells (*) indicate significant cases at the 0.05 level.

The RMSPE in 1998-2002 for Durum wheat is smaller than the respective RMSPE of 1993-1997 in all months. The improvement in May, June and August is considered statistically significant. Moreover, the p-values of April, July and October are small. Therefore, there are very strong indications that MCYFS has improved in the forecasting of Durum wheat yield after 1998. The contents of Table 6.11 are displayed in the following graph:

Figure 6.11: Monthly RMSPEs for Durum wheat (x-axis: month, March to November; y-axis: RMSPE (%); series: 93-02, 93-97, 98-02).

6.8.3 Forecasting of Barley

The RMSPEs of MCYFS in the forecasting of Barley and the p-values of the Wilcoxon tests are reported in the following table:

Table 6.12: Monthly RMSPEs of MCYFS forecasts for Barley – p-values of Wilcoxon tests

Month  RMSPE 93-02  RMSPE 93-97  RMSPE 98-02  p-value
March  3.63  3.63
April  3.89  4.46  3.36  0.8571
May  3.53  4.49  2.18  0.3730
June  3.44  4.44  1.98  0.2103
July  3.93  4.29  0.74  0.1667
August  2.74  3.39  1.87  0.4206
September  2.95  3.21  0.83  0.3333
October  1.65  1.68  1.62  0.6571
November  0.74  0.74
Overall  3.24  3.86  2.42

The RMSPE in 1998-2002 is smaller than the respective RMSPE of 1993-1997 in all months; the improvements however are not considered statistically significant. Data sparseness may be partly responsible for this outcome. There are however indications that MCYFS has improved in the forecasting of Barley yield after 1998. The contents of Table 6.12 are presented in the following graph:


Figure 6.12: Monthly RMSPEs for Barley (x-axis: month, March to November; y-axis: RMSPE (%); series: 93-02, 93-97, 98-02).

6.8.4 Forecasting of Grain maize

The RMSPEs of MCYFS in the forecasting of Grain maize and the p-values of the Wilcoxon tests are the following:

Table 6.13: Monthly RMSPEs of MCYFS forecasts for Grain maize – p-values of Wilcoxon tests

Month  RMSPE 93-02  RMSPE 93-97  RMSPE 98-02  p-value
March
April  6.26  7.75  2.80  0.2000
May  4.73  6.76  1.95  0.0556
June  4.18  5.73  1.46  0.0159*
July  5.07  5.55  0.58  0.1667
August  4.04  5.33  2.03  0.0278*
September  4.08  4.43  2.16  0.4000
October  3.43  4.62  1.47  0.0143*
November  2.77  2.77
Overall  4.46  5.74  1.91

Note: Marked cells (*) indicate significant cases at the 0.05 level.

The RMSPE in 1998-2002 is smaller than the respective RMSPE of 1993-1997 in all months. The improvement in June, August and October is considered statistically significant. Moreover, the p-values of April, May and July are small. Therefore, there are very strong indications that MCYFS has improved in the forecasting of Grain maize yield after 1998. The contents of Table 6.13 are presented in the following diagram:


Figure 6.13: Monthly RMSPEs for Grain maize (x-axis: month, April to November; y-axis: RMSPE (%); series: 93-02, 93-97, 98-02).

RMSPEs after 1998 are definitely smaller than those before 1998. The improvement is slightly smaller in the second half of the year than in the first.

6.8.5 Forecasting of Rape seed

The RMSPEs of MCYFS in the forecasting of Rape seed and the p-values of the Wilcoxon tests:

Table 6.14: Monthly RMSPEs of MCYFS forecasts for Rape seed – p-values of Wilcoxon tests

Month  RMSPE 93-02  RMSPE 93-97  RMSPE 98-02  p-value
March  6.77  6.77
April  7.33  6.46  7.95
May  6.86  6.63  7.04
June  5.09  4.83  5.29
July  4.96  4.83  5.46
August  4.99  5.54  4.50  0.6349
September  5.09  5.54  2.64  0.6000
October  4.92  5.54  4.22  0.6571
November  5.46  5.46
Overall  5.86  5.66  6.03

The RMSPE in 1998-2002 is greater than the respective RMSPE of 1993-1997 from April until July, and smaller thereafter. No statistically significant improvement in 1998-2002 is detected.


Figure 6.14: Monthly RMSPEs for Rape seed (x-axis: month, March to November; y-axis: RMSPE (%); series: 93-02, 93-97, 98-02).

Figure 6.14 shows that RMSPEs after 1998 are greater than those before 1998 for all months up to July. From August onwards the RMSPEs after 1998 are smaller, with the largest improvement in September.

6.8.6 Forecasting of Potato

The RMSPEs of MCYFS in the forecasting of Potato and the p-values of the Wilcoxon tests:

Table 6.15: Monthly RMSPEs of MCYFS forecasts for Potato – p-values of Wilcoxon tests

Month  RMSPE 93-02  RMSPE 93-97  RMSPE 98-02  p-value
March
April  6.89  6.89
May  3.75  4.80  3.09  0.1333
June  6.12  8.26  2.57  0.1714
July  6.99  7.80  1.10  0.2000
August  5.36  7.56  2.44  0.0317*
September  4.90  5.47  0.47  0.2000
October  3.77  5.07  1.64  0.0143*
November  0.81  0.81
Overall  5.26  6.80  2.32

Note: Marked cells (*) indicate significant cases at the 0.05 level.

The RMSPE in 1998-2002 is smaller than the respective RMSPE of 1993-1997 in all months. The improvement in August and October is considered statistically significant. Moreover, the p-values of May, June, July and September are small. Therefore, there are very strong indications that MCYFS has improved in the forecasting of Potato yield after 1998.


Figure 6.15: Monthly RMSPEs for Potato (x-axis: month, April to November; y-axis: RMSPE (%); series: 93-02, 93-97, 98-02).

6.8.7 Forecasting of Sugar beet

The RMSPEs of MCYFS in the forecasting of Sugar beet and the p-values of the Wilcoxon tests are the following:

Table 6.16: Monthly RMSPEs of MCYFS forecasts for Sugar beet – p-values of Wilcoxon tests

Month  RMSPE 93-02  RMSPE 93-97  RMSPE 98-02  p-value
March
April  4.56  4.56
May  4.95  4.65  5.17
June  5.83  6.17  5.38  0.4524
July  4.68  4.22  6.18
August  7.87  3.34  10.62
September  1.99  2.13  1.08  0.1667
October  3.35  1.33  4.28
November  2.96  2.96
Overall  5.38  4.09  6.74

The RMSPE in 1998-2002 is greater than the respective RMSPE of 1993-1997 in all months except June and September. No statistically significant improvement in 1998-2002 is detected. The graphical representation of Table 6.16:


Figure 6.16: Monthly RMSPEs for Sugar beet (x-axis: month, April to November; y-axis: RMSPE (%); series: 93-02, 93-97, 98-02).

RMSPEs after 1998 are greater than those before 1998 for all months, except June and September. Especially in August, after 1998 RMSPEs are much worse than before 1998.

6.8.8 Forecasting of Sunflower

The RMSPEs of MCYFS in the forecasting of Sunflower and the p-values of the Wilcoxon tests are the following:

Table 6.17: Monthly RMSPEs of MCYFS forecasts for Sunflower – p-values of Wilcoxon tests

Month  RMSPE 93-02  RMSPE 93-97  RMSPE 98-02  p-value
March
April  14.06  14.06
May  11.29  17.35  3.44  0.1000
June  7.50  6.59  8.30
July  7.15  7.15
August  4.92  3.46  6.04
September  7.13  7.70  4.12  0.4000
October  6.80  7.70  5.77  0.2429
November
Overall  7.91  8.89  6.16

The RMSPE in 1998-2002 is greater than the respective RMSPE of 1993-1997 from June until August. No statistically significant improvement in 1998-2002 is detected. The p-value of May however is quite small.


Figure 6.17: Monthly RMSPEs for Sunflower (x-axis: month, April to October; y-axis: RMSPE (%); series: 93-02, 93-97, 98-02).



8 ANNEXES

8.1 Overview of the software in the MCYFS

WEATHER MONITORING | Thematic description | Technical information | Version | Issue date
MeteoInsert | inserts meteorological station data | Windows | - | 2000
SupitConstants | interpolates regression coefficients to estimate radiation | Windows | - | 2000
CGMS | calculates evapotranspiration and radiation; determines available weather stations; interpolates station data to regular climatic grid; simulates crop growth (WOFOST 6.0) | Unix, ASCII data files | 1.1 | 1993
CGMS | same as CGMS 1.1 (unix) | Unix, ORACLE db | 3.1/4.1/5.1/5.2 | 1994-1996
CGMS | same as CGMS 1.1 (unix) | Windows, ODBC | 1.5a | 2000
CGMS | same as CGMS 1.1 (unix); new functions: introduction of campaign season; possible initialization of water balance; logging simulation run data | Windows, ODBC | 2.0a | 2000
CGMS | same as CGMS 2.0a (Windows); new functions: spatial and temporal variability of initial soil moisture; fixed date to start water balance instead of x-number of days prior emergence; temporal variability of sowing dates; simulated soil moisture rooted zone and potential root zone written to database | Windows, ODBC | 2.1 | 2001
REFERENCEWEATHER | calculates long term average station weather | Windows | - | 2000
Anagrw_pc.sql | extracts data for creation of weather indicator maps | PL-SQL procedure | - | 1993
Map_grid_automatic_2plot.aml / Meteo_2plot.aml | creates weather indicator maps | ArcInfo AML procedure | - | 1998
Anagrwlt_pc.sql | calculates long term average grid weather | PL-SQL procedure | - | 1994


CROP MONITORING | Thematic description | Technical information | Version | Issue date
CGMS | see weather monitoring | - | - | -
Fill_agg_areas_from_KUL.sql | inserts areas from K.U.Leuven data into AGGREGATION_AREAS | PL-SQL procedure | - | 1999
Fill_agg_areas_from_regio.sql | inserts areas from REGIO into AGGREGATION_AREAS | PL-SQL procedure | - | 1999
Fill_agg_areas_from_RU.sql | inserts areas from Russian statistical data into AGGREGATION_AREAS | PL-SQL procedure | - | 1999
Fill_manual_*.sql | corrects area statistics | PL-SQL procedures | - | 1999
Fill_missing_aggregation_areas.sql | calculates missing areas for one NUTS and one YEAR from the sum of its child-NUTS’s where data are complete | PL-SQL procedures | - | 1999
Fill_area_suitable.sql | inserts areas from NUTS_SUITABILITY for NUTS belonging to the same father where other source data are not complete or not available | PL-SQL procedure | - | 1999
Calcul_perc.sql | replaces area, where source is ‘Area Suitable’ & exists at least a complete set of children that have a long term value, with the long term average | PL-SQL procedure | - | 1999
Calnyl_suit.sql | aggregates the crop indicators from EMU to NUTS regions | PL-SQL procedure | - | 1993
Anayldg.sql/anayldg_dvs.sql | extracts data for creation of crop indicator maps | PL-SQL procedure | - | 1993
Map_grid_automatic_2plot.aml / Meteo_2plot.aml | creates crop indicator maps | ArcInfo AML procedure | - | 1998
Anayldglt.sql | calculates long term average crop indicators on grid level | PL-SQL procedure | - | 1994

REMOTE SENSING | Thematic description | Technical information | Version | Issue date
Space-II | automated processing of raw AVHRR-registrations (SHARP1, Level1B, GAC) until the level of daily composites (Level3/S1) | UNIX, C++ | - | ?
SpacePC | PC-version of Space-II with extended facilities | Windows, Visual C++ | - | 2000
SpacePC+ | same as above but partly modified by VITO | Windows, Visual C++ | - | 2002
REG_L1B | renames Level1B’s (BENssYYMMDDhhmm.L1B) to IMAGE and stores it in a newly created directory YYMMDDhhmm for access by SpacePC+ | Windows, IDL | - | 2001
TBUS_PARSE | detects and copies new NOAA14/16 TBUS files to appropriate SpacePC+ TBUS directory | Windows, IDL | - | 2001
GET_L2 | auxiliary tool to archive files parameters.par, matches.txt and geo_results.txt from SpacePC+ Internal_Level2 (per scene) | Windows, IDL | - | 2001


SP_UP | auxiliary tool to unpack all layers in any SpacePC AVHRR-product (Internal_L2 or L3) into separate BYTE-scaled ENVI-images | DOS, MS Quick-Basic | - | 2001
MARSOP_APPLICATION | post-processing of AVHRR/VGT-composites: AVHRR L3→L4: Extraction of 4 daily scenes (NDVI, surface temp., scan angle, Scene ID) and spatial degradation to 4.4 km; AVHRR-NDVI L4→L5: Creation of 10-daily composites (S10); Ingestion of VGT-S10 (similar NDVI-composites from SPOT-VEGETATION); AVHRR/VGT-NDVI L5→L6: Creation of monthly composites (S30); Computation of images with (abs./rel.) NDVI-difference w.r.t. previous year; AVHRR/VGT NDVI L5/S10: Extraction of CNDVI-databases | Windows, Visual C++ | - | 2001
MONTEITH | computation of DMP-images (Dry Matter Productivity) from any NDVI-S10 and corresponding Meteo-images (radiation, temperature) | DOS, ANSI-C | - | 2001
DMP_MON | monthly DMP-images from three 10-daily DMP-S10 | DOS, ANSI-C | - | 2001
DMP_CUM | cumulative DMP-images from series of 10-daily DMP-S10 | DOS, ANSI-C | - | 2001
DMP_MENU | Menu-Shell for application of DMP-progs on 4 MARS-FOOD zones | DOS, MS Quick-Basic | - | 2001
INDEX | computes any simple combination of 2/3 input images. Also used to derive images with (abs./rel.) DMP-difference w.r.t. previous year | DOS, ANSI-C | - | 2002
PNG_MAKE | generates all types of PNG-QuickLooks | Windows, IDL | - | 2001
Cop_*.bat | copies MARSOP products to VITO FTP-site and renames them from IMG/ADJ to BIL/HDR for use in ArcView | DOS, BAT-file | - | 2001
TreeComp | transfer of MARSOP products by FTP | Windows | 3.7 | 2001
PuTTY | update of the JRC CNDVI database by means of an SSH session, using a script for SQLplus | | 0.51 | 2001

QUANTITATIVE CROP YIELD FORECASTS | Thematic description | Technical information | Version | Issue date
Statistical sub-system of the CGMS version 4.2 | calculation of yield forecasts | S-PLUS 2000 Professional | 4.2 | 2001
Caleur_cronos.sql | fills the EUROSTAT table from the CRONOS table | PL-SQL procedure | - | 1999
Caleur_regio.sql | fills the EUROSTAT table from the REGIO table | PL-SQL procedure | - | 1999


8.2 CGMS tables for the yield forecast procedure

Notation:
• Underlined indicates these fields form the primary key for searching and sorting.
• NOT NULL indicates this field is mandatory.
• NUMBER(8,2) means the value must be a number with at most eight digits in total, two of which are decimals.
• VARCHAR2(1) means the value is a character string with one character.
• Each field is listed as: NAME (description): constraint, data type, unit (- if none).

CRONOS (statistical data from EUROSTAT)
  YEAR (calendar year): NOT NULL, NUMBER(4), -
  CROP (character code of crop): NOT NULL, VARCHAR2(8), -
  NUTS_CODE (character code administrative unit): VARCHAR2(8), -
  ACREAGE (acreage cultivated): NUMBER(16,3), 10^3 ha
  YIELD (yield, fresh weight): NUMBER(16,3), 10^3 kg.ha-1
  PRODUCTION (yield production, fresh weight): NUMBER(16,3), 10^6 kg

EUROSTAT (statistical data from REGIO or CRONOS table)
  STAT_CROP_NO (‘statistic’ crop number): NOT NULL, NUMBER(2), -
  NUTS_CODE (character code administrative unit): NOT NULL, VARCHAR2(8), -
  YEAR (calendar year): NOT NULL, NUMBER(4), -
  AREA_CULTIVATED (acreage cultivated): NUMBER(10,3), 10^3 ha
  OFFICIAL_YIELD (yield, fresh weight): NUMBER(6,3), 10^3 kg.ha-1

FORECASTED_NUTS_YIELD (forecasted crop yields at NUTS level)
  STAT_CROP_NO (‘statistic’ crop number): NOT NULL, NUMBER(2), -
  DECADE (decade): NOT NULL, NUMBER(2), -
  YEAR (calendar year): NOT NULL, NUMBER(4), -
  NUTS_CODE (character code administrative unit): NOT NULL, VARCHAR2(8), -
  FORECASTED_YIELD (forecast of crop yield as fresh weight of storage organs): NUMBER(6,3), 10^3 kg.ha-1

REG_PARAM (regression parameters to calculate the yield forecast)
  STAT_CROP_NO (‘statistic’ crop number): NOT NULL, NUMBER(2), -
  NUTS_CODE (character code administrative unit): NOT NULL, VARCHAR2(8), -
  DECADE (decade): NOT NULL, NUMBER(2), -
  TYPE_OF_DATA_SET (data set over years used to derive parameters; W=window, T=total): NOT NULL, VARCHAR2(1), -
  START_YEAR (start of the sequence on which the parameters are based): NOT NULL, NUMBER(4), -
  END_YEAR (end of the sequence on which the parameters are based): NOT NULL, NUMBER(4), -
  NUMBER_OF_OBS_REGRESSION (number of observations (years) for regression; in case of window: window size): NOT NULL, NUMBER(2), -
  NUMBER_OF_OBS_TOTAL (number of total observations (years) to select the predictor): NOT NULL, NUMBER(2), -
  MISSING_YEARS_TOTAL (number of missing years from total data set): NOT NULL, NUMBER(2), -
  MEAN_YEAR (year to be used in forecasting formula): NOT NULL, NUMBER(6,1), -
  MEAN_OFFICIAL_YIELD (mean official yield): NOT NULL, NUMBER(8,3), 10^3 kg.ha-1
  YEARLY_INCREASE (yearly increase in yield): NOT NULL, NUMBER(8,3), 10^3 kg.ha-1.yr-1
  USED_REGRESSION_COEFFICIENT (indicates which (simulated) variable is to be used for forecasting): NOT NULL, VARCHAR2(2), -
  COEFF_POTENTIAL_BIOMASS (multiplier for delta potential biomass): NOT NULL, NUMBER(6,3), -
  COEFF_POTENTIAL_STORAGE (multiplier for delta potential storage organs): NOT NULL, NUMBER(6,3), -
  COEFF_WATER_LIM_BIOMASS (multiplier for delta water limited biomass): NOT NULL, NUMBER(6,3), -
  COEFF_WATER_LIM_STORAGE (multiplier for delta water limited storage organs): NOT NULL, NUMBER(6,3), -
  MEAN_POTENTIAL_BIOMASS (mean potential biomass): NOT NULL, NUMBER(8,3), 10^3 kg.ha-1
  MEAN_POTENTIAL_STORAGE (mean potential storage organs): NOT NULL, NUMBER(8,3), 10^3 kg.ha-1
  MEAN_WATER_LIM_BIOMASS (mean water limited biomass): NOT NULL, NUMBER(8,3), 10^3 kg.ha-1
  MEAN_WATER_LIM_STORAGE (mean water limited storage): NOT NULL, NUMBER(8,3), 10^3 kg.ha-1
  STUDENT_VALUE_P5 (T-value resulting from regression, for model indicator): NOT NULL, NUMBER(6,3), -
  RSQ_P0 (R-square of trend only): NOT NULL, NUMBER(6,3), -
  RSQ_P5 (R-square of selected regression): NOT NULL, NUMBER(6,3), -
  REL_RMSQ_RESIDUAL_ERR_P5 (relative rmsq residual error of selected regression): NUMBER(5,1), %
  REL_RMSQ_JACKKNIFE_ERR (relative rmsq jackknife error of prediction based on all years): NUMBER(5,1), %
  REL_RMSQ_ONE_YEAR_AHEAD_ERR_P5 (relative rmsq one year ahead error of complete prediction rule): NUMBER(5,1), %
  REL_RMSQ_TWO_YEAR_AHEAD_ERR_P5 (relative rmsq two year ahead error of complete prediction rule): NUMBER(5,1), -
  ONE_YEAR_AHEAD_ERR_P0 (one year ahead error of trend only): NUMBER(8,3), -
  ONE_YEAR_AHEAD_ERR_P5 (one year ahead error of the selected regression): NUMBER(8,3), -
  TWO_YEAR_AHEAD_ERR_P0 (two year ahead error of trend only): NUMBER(8,3), -
  TWO_YEAR_AHEAD_ERR_P5 (two year ahead error of selected regression): NUMBER(8,3), -
  STANDARD_DEV_P5 (standard deviation of selected regression): NOT NULL, NUMBER(7,4), -

STAT_CROP (description of ‘statistic’ crops)
  STAT_CROP_NO (‘statistic’ crop number): NOT NULL, NUMBER(2), -
  CROP_NAME (name of crop): NOT NULL, VARCHAR2(40), -
  LAST_USED_SIM_CROP (‘simulated’ crop used for last forecast): NUMBER(2), -
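The field descriptions of REG_PARAM ("year to be used in forecasting formula", "multiplier for delta ...") suggest how a forecast could be assembled from one record. The sketch below is only an illustration under that reading: it assumes the forecast is a linear time trend plus one coefficient multiplied by the deviation of the selected simulated indicator from its mean. The class `RegParam`, the function `forecast_yield` and all numeric values are hypothetical and are not taken from the CGMS code.

```python
from dataclasses import dataclass


@dataclass
class RegParam:
    """Hypothetical subset of one REG_PARAM record."""
    mean_year: float            # MEAN_YEAR
    mean_official_yield: float  # MEAN_OFFICIAL_YIELD, 10^3 kg.ha-1
    yearly_increase: float      # YEARLY_INCREASE, 10^3 kg.ha-1.yr-1
    coeff: float                # coefficient of the selected indicator, e.g. COEFF_WATER_LIM_STORAGE
    mean_simulated: float       # mean of the selected indicator, e.g. MEAN_WATER_LIM_STORAGE


def forecast_yield(p: RegParam, year: int, simulated: float) -> float:
    """Assumed formula: trend term plus coefficient times the indicator's deviation from its mean."""
    trend = p.mean_official_yield + p.yearly_increase * (year - p.mean_year)
    return trend + p.coeff * (simulated - p.mean_simulated)


# Made-up values, for illustration only.
params = RegParam(mean_year=1990.0, mean_official_yield=6.0,
                  yearly_increase=0.1, coeff=0.5, mean_simulated=8.0)
print(round(forecast_yield(params, 2000, 9.0), 3))
```

With these made-up inputs the trend term is 7.0 and the indicator term is 0.5, both in 10^3 kg.ha-1.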

8.3 Flow diagrams of the CGMS procedures

This annex gives flow diagrams of all the CGMS procedures (the dark colour of tables indicates that the procedure updates or adds records to these tables):

Figure 10 Tables used by the statistical sub-system of the CGMS to calculate yield forecasts

[Flow diagram of the procedure "CGMS: Yield Forecast Calculation". Tables shown: NUTS_YIELD, REG_PARAM, FORECASTED_NUTS_YIELD, NUTS, EUROSTAT, STAT_CROP, CROP.]

8.4 Maps of the country MPEs per crop

Soft wheat

[Map: MPE (%) of MCYFS forecasts for soft wheat yield in EU-15. Legend classes: -5.29 to -1.52; -1.52 to 2.29; 2.29 to 6.11; 6.11 to 9.92; 9.92 to 10.86.]

Figure 1: MPE of MCYFS forecasts for the yield of Soft wheat (averaging across all months and years).

The forecasts for Soft wheat show a large positive bias for Finland and negative biases for Spain, Portugal and Ireland, although the latter are not comparable in size with the bias of Finland. The statistical tests found no bias to be significant. A geographic effect is apparent: the largest negative biases (beyond the –3% limit) are found in the west (Iberian Peninsula and Ireland), and neighbouring countries have comparable MPEs. Geographic effects are, however, easier to see in the following map.

[Map: MPE (%) classification of MCYFS forecasts for soft wheat yield in EU-15. Legend classes: (-Inf, -6%); (-6%, -3%); (-3%, 0%); (0%, 3%); (3%, 6%); (6%, Inf).]

Figure 2: MPE classification of MCYFS forecasts for the yield of Soft wheat (averaging across all months and years).

A strong geographic effect is apparent in Figure 2, as neighbouring countries tend to have MPEs in the same class. Notable exceptions are the Netherlands and Finland. The latter has an MPE above the 6% mark, while Spain and Portugal have MPEs between –3% and –6%.
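The MPE values and the six legend classes used in the classification maps can be illustrated with a short sketch. It assumes that the MPE is the mean of 100 × (forecast − official) / official, so that positive values indicate overestimation, which matches the way the maps are read in this annex. The function names, the sample values and the treatment of values lying exactly on a class limit (assigned to the upper class) are assumptions.

```python
import math

# Class limits (in %) as shown in the legends of the classification maps.
CLASS_LIMITS = [-6.0, -3.0, 0.0, 3.0, 6.0]


def mpe(forecasts, officials):
    """Mean percentage error; positive means overestimation (assumed sign convention)."""
    errors = [100.0 * (f - o) / o for f, o in zip(forecasts, officials)]
    return sum(errors) / len(errors)


def mpe_class(value):
    """Return the (lower, upper) limits of the legend class that contains value."""
    limits = [-math.inf] + CLASS_LIMITS + [math.inf]
    for lower, upper in zip(limits, limits[1:]):
        # Assumption: a value equal to a class limit falls into the upper class.
        if lower <= value < upper:
            return (lower, upper)
    raise ValueError("value is NaN or +Inf")


# Made-up forecasts and official yields (10^3 kg.ha-1) for one country.
country_mpe = mpe([5.5, 6.3], [5.0, 6.0])
print(round(country_mpe, 2), mpe_class(country_mpe))
```

The two made-up forecasts are 10% and 5% above the official yields, so the country would fall in the class above the 6% mark.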

Durum wheat

[Map: MPE (%) of MCYFS forecasts for durum wheat yield in EU-15. Legend classes: -0.71 to 2.05; 2.05 to 8.82; 8.82 to 15.59; 15.59 to 17.94.]

Figure 3: MPE of MCYFS forecasts for the yield of Durum wheat (averaging across all months and years).

In the forecasts for Durum wheat, large positive biases are observed in Spain and Austria. No large negative bias is observed. Moreover, no bias is statistically significant. No similarities are noticed between neighbouring countries; indeed, dissimilarities between neighbours are the strongest feature of the map.

[Map: MPE (%) classification of MCYFS forecasts for durum wheat yield in EU-15. Legend classes: (-Inf, -6%); (-6%, -3%); (-3%, 0%); (0%, 3%); (3%, 6%); (6%, Inf).]

Figure 4: MPE classification of MCYFS forecasts for the yield of Durum wheat (averaging across all months and years).

The dissimilarities noted in Figure 3 are also evident here. Of note also is the preponderance of overestimation in the forecasting of Durum wheat yields.

Barley

[Map: MPE (%) of MCYFS forecasts for barley yield in EU-15. Legend classes: -5.16 to -4.36; -4.36 to -1.14; -1.14 to 2.08; 2.08 to 5.29; 5.29 to 6.82.]

Figure 5: MPE of MCYFS forecasts for the yield of Barley (averaging across all months and years).

In the forecasts for Barley, a large positive bias is evident in Finland and Greece, while a large negative bias is evident in Portugal. No bias is statistically significant. France and Germany have similar biases, while the differences between Spain and Portugal, between Sweden and Finland, and between Belgium and its neighbours are notable. Larger biases are also evident in the three corners of the EU.

[Map: MPE (%) classification of MCYFS forecasts for barley yield in EU-15. Legend classes: (-Inf, -6%); (-6%, -3%); (-3%, 0%); (0%, 3%); (3%, 6%); (6%, Inf).]

Figure 6: MPE classification of MCYFS forecasts for the yield of Barley (averaging across all months and years).

In Figure 6 an “axis” of small negative biases connecting Spain with Sweden can be noted. The large biases in the three corners of the map (negative in Portugal, positive in Greece and Finland) and the quite large bias of Belgium can also be seen more clearly.

Grain maize

[Map: MPE (%) of MCYFS forecasts for grain maize yield in EU-15. Legend classes: -12.10 to -11.20; -11.20 to -6.98; -6.98 to -2.77; -2.77 to -0.11.]

Figure 7: MPE of MCYFS forecasts for the yield of Grain maize (averaging across all months and years).

In the forecasts for Grain maize the bias is negative in all countries. The Netherlands and Portugal have the largest negative biases. The bias of Portugal is statistically significant at the nominal 0.01 level. Because of multiplicity concerns, however, it is not considered significant even at the 0.05 level. Germany, France and Spain form a cluster with similar biases. Portugal stands out from Spain, and so does the Netherlands from Germany. Note also the similarity of Italy and Greece.
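The multiplicity argument can be illustrated with a Bonferroni adjustment, the correction this annex names for Potato; that the same correction was applied here is an assumption. The sketch multiplies a nominal p-value by the number of simultaneous tests. The p-value and the number of tests below are hypothetical and are not taken from the study.

```python
def bonferroni_adjust(p_value: float, n_tests: int) -> float:
    """Bonferroni-adjusted p-value: nominal p-value times the number of tests, capped at 1."""
    return min(1.0, p_value * n_tests)


p_nominal = 0.008  # hypothetical p-value, below the nominal 0.01 level
n_tests = 15       # hypothetical number of country tests carried out together
p_adjusted = bonferroni_adjust(p_nominal, n_tests)

# True only if the bias is still significant at the 0.05 level after adjustment.
print(p_adjusted < 0.05)
```

With these hypothetical inputs the adjusted p-value is about 0.12, so a bias that is significant at the nominal 0.01 level is no longer significant at the 0.05 level once the number of tests is taken into account.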

[Map: MPE (%) classification of MCYFS forecasts for grain maize yield in EU-15. Legend classes: (-Inf, -6%); (-6%, -3%); (-3%, 0%); (0%, 3%); (3%, 6%); (6%, Inf).]

Figure 8: MPE classification of MCYFS forecasts for the yield of Grain maize (averaging across all months and years).

Looking at MPE classes, the dominance of underestimation in the forecasts of Grain maize is evident. Portugal, the Netherlands and Belgium, with large underestimation, are very different from their neighbours.

Rape seed

[Map: MPE (%) of MCYFS forecasts for rape seed yield in EU-15. Legend classes: -3.81 to 0.22; 0.22 to 13.39; 13.39 to 26.57; 26.57 to 36.54.]

Figure 9: MPE of MCYFS forecasts for the yield of Rape seed (averaging across all months and years).

In the forecasts for Rape seed, large positive biases are present for Italy, Finland and Austria, while the strongest negative bias is observed in Sweden. No bias is found to be statistically significant. A grouping of the UK, France and Spain is evident on the map, but there are also neighbours with very dissimilar error figures.

[Map: MPE (%) classification of MCYFS forecasts for rape seed yield in EU-15. Legend classes: (-Inf, -6%); (-6%, -3%); (-3%, 0%); (0%, 3%); (3%, 6%); (6%, Inf).]

Figure 10: MPE classification of MCYFS forecasts for the yield of Rape seed (averaging across all months and years).

The most notable feature of Figure 10 is the high overestimation of Rape seed yield along the eastern side of the EU. Overestimation affects the greatest part of the EU in terms of area.

Potato

[Map: MPE (%) of MCYFS forecasts for potato yield in EU-15. Legend classes: -9.75 to -7.68; -7.68 to -4.35; -4.35 to -1.02; -1.02 to 2.31; 2.31 to 2.53.]

Figure 11: MPE of MCYFS forecasts for the yield of Potato (averaging across all months and years).

In the forecasts for Potato, positive biases are not large (the largest is found in Sweden), while large negative biases are observed in Belgium and Spain. The bias of Spain, in fact, is significant at the 0.05 level even after Bonferroni adjustment. Geographic similarities are not very common, apart from the case of Germany and Denmark.

[Map: MPE (%) classification of MCYFS forecasts for potato yield in EU-15. Legend classes: (-Inf, -6%); (-6%, -3%); (-3%, 0%); (0%, 3%); (3%, 6%); (6%, Inf).]

Figure 12: MPE classification of MCYFS forecasts for the yield of Potato (averaging across all months and years).

Looking at error classes, geographic similarities become more apparent. There is a large cluster of countries in the centre-north of the EU with comparably small underestimation of Potato yield. Underestimation is very common and is significantly high in Spain.

Sugar beet

[Map: MPE (%) of MCYFS forecasts for sugar beet yield in EU-15. Legend classes: -8.90 to -6.41; -6.41 to -3.56; -3.56 to -0.71; -0.71 to 1.96.]

Figure 13: MPE of MCYFS forecasts for the yield of Sugar beet (averaging across all months and years).

In the forecasts for Sugar beet, a large negative bias is observed for Spain; it is in fact statistically significant at the 0.05 level. Also notable are the difference between Spain and France and the cluster of similar countries in the north.

[Map: MPE (%) classification of MCYFS forecasts for sugar beet yield in EU-15. Legend classes: (-Inf, -6%); (-6%, -3%); (-3%, 0%); (0%, 3%); (3%, 6%); (6%, Inf).]

Figure 14: MPE classification of MCYFS forecasts for the yield of Sugar beet (averaging across all months and years).

The same geographic clusters appear in Figure 14 as in Figure 13. The high underestimation for Spain is evident here as well.

Sunflower

[Map: MPE (%) of MCYFS forecasts for sunflower yield in EU-15. Legend classes: -3.60 to 1.95; 1.95 to 10.29; 10.29 to 18.63; 18.63 to 23.29.]

Figure 15: MPE of MCYFS forecasts for the yield of Sunflower (averaging across all months and years).

The map shows large overestimation in Portugal and Spain, while the only underestimation (not large in absolute terms) is observed for Austria. No bias, however, is statistically significant.

[Map: MPE (%) classification of MCYFS forecasts for sunflower yield in EU-15. Legend classes: (-Inf, -6%); (-6%, -3%); (-3%, 0%); (0%, 3%); (3%, 6%); (6%, Inf).]

Figure 16: MPE classification of MCYFS forecasts for the yield of Sunflower (averaging across all months and years).

In Figure 16 two groupings of neighbouring countries with comparable bias are observed. Of note is the preponderance of overestimation of Sunflower yield and the high overestimation in Portugal and Spain.

8.5 Maps of the country MAPEs per crop

Soft wheat

[Map: MAPE (%) of MCYFS forecasts for soft wheat yield in EU-15. Legend classes: 3.62 to 5.21; 5.21 to 12.23; 12.23 to 19.26; 19.26 to 26.28; 26.28 to 27.01.]

Figure 1: MAPE of MCYFS forecasts for the yield of Soft wheat (averaging across all months and years).

The largest errors in absolute terms are observed in Portugal, Finland and Spain. Error seems to increase towards the southern and eastern corners of the EU, especially in the direction of Portugal and Finland.
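The difference between the MPE maps of the previous annex and the MAPE maps shown here can be illustrated with a small sketch. It assumes the usual definitions: MPE averages the signed percentage errors, MAPE averages their absolute values. The function names and the sample figures are hypothetical.

```python
def percentage_errors(forecasts, officials):
    """Signed percentage errors, 100 * (forecast - official) / official."""
    return [100.0 * (f - o) / o for f, o in zip(forecasts, officials)]


def mpe(forecasts, officials):
    """Mean percentage error: opposite errors cancel, so it measures bias."""
    errors = percentage_errors(forecasts, officials)
    return sum(errors) / len(errors)


def mape(forecasts, officials):
    """Mean absolute percentage error: errors do not cancel, so it measures size."""
    errors = percentage_errors(forecasts, officials)
    return sum(abs(e) for e in errors) / len(errors)


# Made-up yields (10^3 kg.ha-1): one forecast 10% too high, one 10% too low.
forecasts = [5.5, 5.4]
officials = [5.0, 6.0]
print(round(mpe(forecasts, officials), 2), round(mape(forecasts, officials), 2))
```

In this made-up case the two errors cancel in the MPE, which is close to zero, while the MAPE is about 10%. This is why a country can show little bias on an MPE map and still a large error on a MAPE map.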

Durum wheat

[Map: MAPE (%) of MCYFS forecasts for durum wheat yield in EU-15. Legend classes: 6.58 to 6.74; 6.74 to 13.22; 13.22 to 19.70; 19.70 to 26.18; 26.18 to 32.51.]

Figure 2: MAPE of MCYFS forecasts for the yield of Durum wheat (averaging across all months and years).

As far as Durum wheat is concerned, the largest errors occur in Portugal, Spain and Austria. Error tends to increase towards Portugal. Austria stands out from its neighbours.

Barley

[Map: MAPE (%) of MCYFS forecasts for barley yield in EU-15. Legend classes: 4.16 to 7.66; 7.66 to 11.36; 11.36 to 15.05; 15.05 to 18.74; 18.74 to 22.44; 22.44 to 26.13; 26.13 to 27.12.]

Figure 3: MAPE of MCYFS forecasts for the yield of Barley (averaging across all months and years).

In the forecasts for Barley, Portugal has the largest errors, followed by Spain and Greece. Error again increases towards Portugal, Greece and Finland. This pattern has been identified in all three cereals of the study.

Grain maize

[Map: MAPE (%) of MCYFS forecasts for grain maize yield in EU-15. Legend classes: 3.20 to 3.64; 3.64 to 7.05; 7.05 to 10.46; 10.46 to 13.86; 13.86 to 17.27; 17.27 to 20.68; 20.68 to 20.69.]

Figure 4: MAPE of MCYFS forecasts for the yield of Grain maize (averaging across all months and years).

In the forecasts for Grain maize the pattern changes, and the largest errors now occur in the Netherlands and Belgium. There are clusters of neighbours with similar errors and neighbours with errors in adjacent classes. Portugal is quite different from Spain, but the most striking feature of the map is the difference of Belgium and the Netherlands from their neighbours, as well as the difference between the two.

Rape seed

[Map: MAPE (%) of MCYFS forecasts for rape seed yield in EU-15. Legend classes: 8.70 to 13.77; 13.77 to 21.69; 21.69 to 29.61; 29.61 to 37.54; 37.54 to 45.46; 45.46 to 50.25.]

Figure 5: MAPE of MCYFS forecasts for the yield of Rape seed (averaging across all months and years).

All country forecasts for Rape seed show quite large errors, the largest being those of Italy and Finland. Of note is the similarity of the error figures across northern Europe.

Potato

[Map: MAPE (%) of MCYFS forecasts for potato yield in EU-15. Legend classes: 3.25 to 3.90; 3.90 to 5.55; 5.55 to 7.19; 7.19 to 8.84; 8.84 to 10.49; 10.49 to 12.14; 12.14 to 12.23.]

Figure 6: MAPE of MCYFS forecasts for the yield of Potato (averaging across all months and years).

The forecasts for Potato do not show errors as large as those for Rape seed. The largest errors occur in Ireland, Portugal, the Nordic countries and Greece. Dissimilarities between neighbours are common.

Sugar beet

[Map: MAPE (%) of MCYFS forecasts for sugar beet yield in EU-15. Legend classes: 5.16 to 7.45; 7.45 to 9.81; 9.81 to 12.18; 12.18 to 14.55; 14.55 to 16.92; 16.92 to 17.54.]

Figure 7: MAPE of MCYFS forecasts for the yield of Sugar beet (averaging across all months and years).

In the forecasts for Sugar beet the largest errors are observed in Ireland and Finland. As with cereals, a tendency for larger errors at the corners of the EU can be noted, although this time larger errors are found in the northern corners than in the southern ones.

Sunflower

[Map: MAPE (%) of MCYFS forecasts for sunflower yield in EU-15. Legend classes: 5.20 to 5.28; 5.28 to 12.02; 12.02 to 18.77; 18.77 to 25.52; 25.52 to 32.27; 32.27 to 35.51.]

Figure 8: MAPE of MCYFS forecasts for the yield of Sunflower (averaging across all months and years).

In the forecasts for Sunflower, large errors occur in Portugal, Spain and Greece. The largest errors occur in the corners of the EU; moreover, a certain similarity can be observed between countries in the north of mainland Europe.