Flash estimates on income and poverty indicators ... FLASH ESTIMATE... · Flash estimates for income and poverty indicators The objective of the exercise is to produce flash estimates

1

Doc. FLASH/2-2/16/EN

estat.f.1 (2016)

2nd MEETING OF THE TASK-FORCE

ON FLASH ESTIMATES FOR INCOME AND POVERTY

25th October 2016

At 09.00 am

Eurostat - Luxembourg

BECH B2/404

Flash estimates on income and poverty indicators:

methodological report

EUROPEAN COMMISSION EUROSTAT

Directorate F: Social statistics

Unit F-1: Social indicators; methodology and development; relations with users

2

Acknowledgements:

This report summarises the main outcomes of the first year of the Eurostat project on

developing flash estimates on income inequalities and poverty for the European Union.

The microsimulation results in this report represent a major output from the cooperation

between Eurostat and the Institute for Social and Economic Research (ISER) in University of

Essex. They rely on the use of EUROMOD, the European Union tax-benefit microsimulation

model, managed, maintained and developed by ISER. Eurostat would like to thank the

EUROMOD team in ISER for their fruitful cooperation, in particular, the director of the

EUROMOD programme, Holly Sutherland and Olga Rastrigina, the Senior Researcher

coordinating our cooperation.

The process of extending and updating EUROMOD is financially supported by the

Directorate General for Employment, Social Affairs and Inclusion of the European

Commission [Progress grant no. VS/2011/0445]. The EUROMOD input dataset is based on:

1) Micro-data from the EU Statistics on Incomes and Living Conditions (EU-SILC) for

Belgium, Bulgaria, Czech Republic, Denmark, Germany, Ireland, Croatia, Cyprus, Latvia,

Hungary, Malta, the Netherlands, Portugal, Romania, Slovenia, Finland and Sweden; 2) EU-

SILC data and some variables from national EU-SILC data for Estonia, Lithuania,

Luxembourg, and Poland; and 3) national EU-SILC data for Greece, Spain, France, Italy,

Austria, and Slovakia, made available by respective national statistical offices. The results

based on PQM make use of EU-SILC for all countries.

3

Table of Contents Summary ................................................................................................................................................. 4

1. Introduction ................................................................................................................................. 22

2. Overview target income data in EU-SILC ................................................................................... 24

3. Methodology ................................................................................................................................. 26

3.1. General framework ............................................................................................................... 26

3.2. Microsimulation .................................................................................................................... 27

3.2.1. Changes in population characteristics ........................................................................... 28

3.2.2. Updating non-simulated income sources ...................................................................... 30

3.2.3. Simulating changes in tax-benefit policies .................................................................... 33

3.3. Parametric Quantile Modelling (PQM) .................................................................................. 34

3.4. Benchmark models ................................................................................................................ 37

4. Quality framework ....................................................................................................................... 38

4.1. Quality Assurance .................................................................................................................. 38

4.1.1. Consistency analysis of auxiliary sources ...................................................................... 38

4.1.2. Intermediate quality checks .......................................................................................... 46

4.2. Quality assessment ................................................................................................................ 53

4.2.1. Retrospective performance analysis ............................................................................. 53

4.2.2. Method selection .......................................................................................................... 66

4.2.3. Ex-ante quality assessment for flash estimates in the target year ............................... 76

4.2.4. Conclusions .................................................................................................................... 77

5. Flash estimates 2015 ................................................................................................................... 79

5.1. Production of FE 2015 ........................................................................................................... 79

5.2. Communication of flash estimates 2015: magnitude-direction scales ................................. 82

5.3. Main results ........................................................................................................................... 84

6. References .................................................................................................................................. 101

4

SUMMARY

Objectives

The indicators on poverty and income inequality based on the European Union Statistics on

Income and Living Conditions (EU-SILC) are an important part of the toolkit for the

European Semester. Currently, the indicators on income of year N are only available in the

autumn of year N+2, which comes too late for the EU’s policy agenda. Therefore, an

improvement in this area represents an important priority for monitoring the effectiveness of

social policies at EU level.

The strategy of the European Statistical System (ESS) for providing more timely data on

income is based on two pillars:

− Flash estimates about income N in Summer N+1, that would be available to prepare

and to start the European Semester and provide data to Macroeconomics Imbalance

Procedure (MIP) in autumn N+1

− Final EU-SILC data (or at least data more robust than flash estimates) on income N

during the European Semester (end N+1 / early N+2)

This report presents the methods developed so far as well as their application for the year

2015 (income year1) based on the on-going exercise in Eurostat. It includes the description of

the main methodologies tested and of the quality assessment framework (QAF) put in place in

order to ensure a comparable way to assess results stemming from different methods. In June

2016, Eurostat set up a task force on “Flash estimates on income distribution” with 8 Member

States (BE, IT, PT, LU, DE, FR, SE and UK) to support the on-going work, with several

National Statistical Institutes already developing and releasing flash estimates at national level

(UK2, FR

3). Eurostat and the Institute for Social and Economic Research (ISER) at the

University of Essex are also cooperating in order to further develop and enhance the use of

EUROMOD4 and microsimulation for nowcasting income distribution indicators

5.

The main challenge given the sensitivity of releasing early estimates is to ensure a sufficient

level of confidence in these figures and thus Eurostat efforts focused in this first part of the

project on four main areas: (1) a review of several methods from more simple naïve

forecasting methods to more complex meso-level dynamic factor models and microsimulation

1Years throughout the paper will refer to income year (N) and not the EU-SILC data collection year (N+1). For

Ireland the income reference period is the last twelve months and for the United Kingdom the current income is

annualised and aims to refer the current calendar year, i.e. weekly estimates are multiplied by 52, monthly by

12. 2https://www.ons.gov.uk/peoplepopulationandcommunity/personalandhouseholdfinances/incomeandwealth/bulle

tins/nowcastinghouseholdincomeintheuk/2015to2016 3http://www.insee.fr/fr/themes/document.asp?ref_id=ia23).

4EUROMOD is a tax-benefit microsimulation model for the European Union, for more details see

https://www.euromod.ac.uk/. 5 This cooperation partly builds on the previous work on nowcasting carried out in ISER, see e.g. Rastrigina et

al. (2016).

https://www.ons.gov.uk/peoplepopulationandcommunity/personalandhouseholdfinances/incomeandwealth/bulletins/nowcastinghouseholdincomeintheuk/2015to2016


http://www.insee.fr/fr/themes/document.asp?ref_id=ia23

5

techniques; (2) the set-up of the QAF in order to both select and identify conditions of use of

these methods and assess the quality of final estimates; (3) build a common platform for

exchanging experiences via the formation of the task force and creation of a dedicated

website6 and 4) address communications issues and possible ways to disseminate such

experimental statistics.

Flash estimates for income and poverty indicators

The objective of the exercise is to produce flash estimates for income and poverty indicators

collected through EU-SILC. The starting point is the main income indicators used in the

frame of Europe 2020 and in the scoreboard of key social indicators: at-risk-of poverty rate7

and the income quintile share ratio8 (QSR or S80/S20). However, an analysis of the trends in

EU-SILC (2008-2014) showed that these indicators are rather structural and the observed

yearly changes are often not significant. In terms of early estimates there can be other

indicators which are more sensitive: i.e. important shifts at different points of the income

distributions throghout the crisis are captured by what we call "positional indicators" (i.e. at-

risk-of-poverty threshold (ARPT) and deciles).

Therefore, we included both inequality indicators such as AROP and the evolution of income

at different points of the income distributions in our analysis.

In general terms flash estimates on income distribution and poverty should9:

− Refer to a past yearly reference period (year N). This should differentiate the

flash estimates from a purely forecasting exercise. In this round the main purpose

is to use and develop methods and apply them to test the production of flash

estimates 2015 (therefore collected through EU-SILC 2016) ;

− Refer to a set of distributional indicators for equivalised disposable income. The

set included in the analysis contains at-risk-of-poverty rate (AROP), the income

quintile share ratio and the deciles of the equivalised household disposable

income;

− Be based on an information set that includes the latest income data available from

EU-SILC (income N-1 or N-2 even) that is the reference data source in

establishing EU poverty statistics. This will be enhanced with more timely

auxiliary information from the reference period (year N) such as Labour Force

Survey (LFS), National Accounts, etc.;

− Be based on a set of statistical techniques, such as calibration, modelling,

extrapolation that are not traditionally used in the calculation of social statistics

indicators.

6https://ec.europa.eu/eurostat/cros/content/flash-estimates-income-and-poverty-indicators_en

7 http://ec.europa.eu/eurostat/statistics-explained/index.php/People_at_risk_of_poverty_or_social_exclusion

8http://ec.europa.eu/eurostat/statistics-explained/index.php/Glossary:Income_quintile_share_ratio

9http://unstats.un.org/unsd/nationalaccount/workshops/2009/ottawa/AC188-S31a.PDF

https://ec.europa.eu/eurostat/cros/content/flash-estimates-income-and-poverty-indicators_en

http://ec.europa.eu/eurostat/statistics-explained/index.php/People_at_risk_of_poverty_or_social_exclusion

http://ec.europa.eu/eurostat/statistics-explained/index.php/Glossary:Income_quintile_share_ratio

http://unstats.un.org/unsd/nationalaccount/workshops/2009/ottawa/AC188-S31a.PDF

6

Methodological approaches

Flash estimates have already been developed at EU level in relation to macro-indicators such

as early releases of the GDP growth and inflation rate10

. However, in our case the focus is on

the distributional changes and this implies more complex models that allow the estimation of

the entire distribution and capture the complex interaction of a large number of various past

and present events, such as the effects of economic and monetary policies, the

implementation of social reforms or shifts in macroeconomic circumstances or demographic

changes.

Two main approaches are currently being tested in the frame of the flash estimates for income

and poverty indicators project: (1) Evolution of income components-microsimulation; (2)

Parametric quantile approach. In addition a third approach is based on basic models to be used

as benchmarks in the estimation. A fourth method, based on monthly income, will be

developed when these data will become available for several countries.

The first approach is in line with current practices in different Member States and it aims to

micro-simulate income changes at individual/household level within EU-SILC microdata. The

EU tax-benefit microsimulation model EUROMOD is used for this purpose in combination

with timelier macro-level statistics on changes in demographics, employment characteristics

and income. For the purposes of the nowcasting exercise standard EUROMOD policy

simulation routines are enhanced with additional adjustments to the input data to take into

account changes in the population structure and labour market characteristics. Micro-

simulation models are very powerful in accounting for the effect of changes in the the tax-

benefit policies which leads to a high explanatory power of this family of models.Moreover,

you can disentangle the effects on specific socio-economic groups or differentiate between

policy effects and other factors such as changes in the population structure. However, there

are several assumptions that might affect the abbility of the micro-simulation model to predict

actual changes in observed data. For a limited number of countries there are simulation for tax

evasions and non take-up of benefits but in the large majority of cases EUROMOD doesn't

inlude behavioral effects due to policy changes.

The parametric quantile model (PQM) family relies on the exploitation of macro-level

information, provided by National Account figures such as GDP, consumption, wages and

salaries. It aims at producing flash estimates of a set of distributional parameters, which are

assumed to encode the entire information contained within the income distribution, instead of

income distribution itself. The estimate of the distribution is derived at a second-step of the

estimation procedure through a mapping algorithm which allows reconstituting the

distribution from the set of parameters. The flash estimates of the distributional parameters

are obtained on the basis of dynamic factor models. Hence, PQM models are designed to

capture the impact of changes in macroeconomic circumstances on the income indicators.

They might also capture changes in social and economic policies if and only if these changes

are reflected somehow in the National Account figures.

10

http://ec.europa.eu/eurostat/statistics-

explained/index.php/Inflation_%E2%80%93_methodology_of_the_euro_area_flash_estimate

http://ec.europa.eu/eurostat/statistics-explained/index.php/Inflation_%E2%80%93_methodology_of_the_euro_area_flash_estimate

http://ec.europa.eu/eurostat/statistics-explained/index.php/Inflation_%E2%80%93_methodology_of_the_euro_area_flash_estimate

7

A third group of more “naïve” forecasting techniques were included in the framework

essentially to be used as benchmarks. The benchmark models build flash estimates on the

basis of extrapolations of past income information. Note that the benchmark model do not

contain explanatory variables which means that they only provide predictive information and

hence have no explanatory power in terms fo additional variables that can drive the changes in

the income distribution.

A fourth approach relies on the collection in EU-SILC of additional item(s) on current income

and it will be done in the next stage when further data will become available from countries

which collected this information on voluntary basis. Currently we have collected data from 7

countries and this approach will be included both as a stand-alone line of work or as a

complement for updating information on income components for the next round of flash

estimates (2016).

The aim of considering several approaches was to develop a toolkit of methods (models) that

can be used in all countries and different macro-economic conditions. Ideally we should have

one method that outperforms the other in order to have a common harmonised approach.

However, the four families of methods differ in terms of model complexity (and flexibility),

the way current information is used in the production of the flash estimates and their

explanatory power. The most comprehensive model might not constitute the best choice in all

countries and socio-economic circumstances. In some cases, a complex model, accounting for

all potential influences on the social indicators, can be over-parametrized and lead to a lower

predictive power relative to the simpler models. All the methods considered in the exercise

went through the common quality assessment framework and were scored based on their

historical performance in the test period (2012-201411

). In the future we will add to this

toolkit the current income approach as well as other possible national solutions that can be

then assessed through the common quality framework.

Quality assessment framework

The quality of the flash estimates depends on both the methods and sources used. Therefore,

setting and using a common quality framework becomes a central part of the production

process to select the methods and auxiliary sources that perform best in reproducing EU-SILC

indicators for previous years. This would also allow to benchmark results stemming from

Eurostat exercise, including the estimates based on EUROMOD done in collaboration with

ISER, as well as other national initiatives. In particular, the United Kingdom and France are

also producing their own flash estimates at national level and therefore a common platform

for validating and benchmarking results is essential.

The quality framework is composed of two main parts: (1) Quality assurance, which focuses

on analysing inconsistencies in the input data and includes several intermediate quality checks

11

At the moment of writing this report 2014 income data (EU-SILC 2015) were not available for DE, IE, IT,

CY, LU

8

along the estimation process; (2) Quality assessment, which focuses on the historical

performance of different methods.

(1) Quality assurance

A retrospective assessment of data consistency was performed for the auxiliary information to

be used in the estimation process (LFS and National Accounts). This is a historical

comparison, i.e. between two sets of observed data. Consistency means that relevant macro-

level aspects such as income components and totals as measured by National Accounts or

unemployment as measured by LFS are similar to EU-SILC trends. Source inconsistencies

can steam from differences in concepts, coverage, sampling variance or errors in the data

collection. The lack of such consistency at macro level in the trends showed by EU-SILC and

auxiliary sources would mean that a FE generated by a model that includes the

aforementioned macro data as explanatory variables is not expected to be a good estimation of

the EU-SILC-derived indicator. This analysis showed that overall employment trends for LFS

can be used as a proxy in most of the countries. Data on totals for different income

components from National Accounts and EU-SILC were checked in terms of levels and trends

for the period 2009-2014. The intermediate checks showed that for property and self-

employment income, the two sources are very different. Moreover, these components vary

significantly between the quarterly National Accounts data available for the flash estimates

and later revisions. Therefore, only income from employment, social benefits and taxes from

the National Accounts are used for time series modelling. Further work is ongoing in Eurostat

on the reconciliation of macro-micro data on income and this can feed also the work on flash

estimates12

.

In addition, we also introduced several intermediate checks in order to assess and disentangle

effects introduced in the different steps of the estimation process: e.g. comparing distributions

after calibration, the cumulated effects of calibration and uprating on the different income

components. More details on the quality assurance part can be found in Section 4.1.

(2) Quality assessment

As part of the assessment procedure, we follow three main stages:

2.1. Perform a retrospective analysis of the performance of flash estimates for 2012- 2014

income years.

In particular we look at:

(a) The two quality metrics for comparing point estimates and year-on-year changes for the

key income indicators: accuracy and consistency;

(b) An additional quality measure that takes into account the uncertainty related to the

sampling variance. This has the advantage also of providing an absolute performance

threshold so we can select out the low-performing methods.

12

http://www.iariw.org/dresden/gregorini.pdf

http://www.iariw.org/dresden/gregorini.pdf

9

The performance analysis included 36 methods: 24 from PQM, 9 from microsimulation and

the 3 benchmark models. In terms of performance over the period of analysis there are some

fundamental conclusions that can be drawn at this stage:

First, the performance analysis illustrates that for the two main families of methods tested

(Microsimulation and PQM) the number of well-performing models is at its largest for the

AROP indicator. As far as the positional parameters (ARPT and the deciles) are concerned, in

general the PQM family have a relatively lower performance. In terms of QSR, we have

observed that performances for both families of methods are lower than for AROP.

Second, for all indicators the number of well-performing flash estimates among the family of

micro-simulation models is significantly higher than the corresponding share of PQM models

and this for almost all countries. However, there is a strong complementarity between the two

main approaches across countries and indicators. This raised the issue of method selection.

2.2. Select a strategy for method selection

Three different model selection procedures have been tested for choosing the value of the

flash estimate out of a set of candidate models.

− A best method by country that takes into account national specificities: (1) computing

the average performance of each of the candidate models across indicators at the

country level; (2) selecting the candidate model with the highest average performance.

Microsimulation tends to outperform PQM in this case (16 countries). However, PQM

can outperform microsimulation in some countries (e.g. SI, CY).

− A best method by country and indicator that takes into account the specific traits of

groups of indicators. The results showed that there is a kind of polarisation of methods

according to their performance by two groups of indicators: inequality indicators

(AROP and QSR) and positional indicators. For the latter group the Microsimulation

performs better while for the first group the results between the two main approaches

are more balanced.

− An aggregate of the best methods based on three steps: 1) discard the methods with

low historical performance (described in Section 4.2); (2) discard candidate flash

estimates that diverge in the target year; (3) aggregate those flash estimates that have

passed the filtering step into one single value.

The outcomes were evaluated for the simulation of the flash estimate 2014 for countries for

which 2015 EU-SILC data was available at the moment this report was produced. Therefore

there were three main steps: (1) first single methods within each family are filtered out on the

basis of their historical performance 2012-2013. The uncertainty based quality measure

mentioned in section 4.2.1 is used given that we can identify a precise threshold for selecting

out low performers; (2) the three candidate models for method selection are applied on the

pool of methods that passed the quality criteria; (3) the resulting flash estimates are assessed

against the observed EU-SILC indicators income 2014.

The first important conclusion that can be drawn from this analysis is that both, the aggregate

estimate and the estimate derived on the basis of a country-specific objective function largely

10

outperform the estimate consisting in choosing the best historical estimate by country and

indicator. Moreover, for nearly all countries the aggregate estimate has a better performance

than the country-specific estimate.

Note that the main difference between both estimates is that the best-by-country estimate only

uses the information provided by one of the available candidate models whereas the aggregate

estimate aims at combining the information of the well-performing candidate estimates

leading to a more robust model selection procedure. Moreover, in the aggregation algorithm

the convergent/divergent behaviour of the candidate models within the target year constitutes

an integral part of the performance assessment setting. On the contrary, the best-by-country

estimate does not take into account the relative performance of the candidate models at the

target year and hence is exclusively based on the use of historical performance. However, the

strength of the best-by-country estimate is that the model selection is done at the country-level

which implies that for a given country the same model is used to produce the flash estimates

for all the indicators assuring consistency among the latter. Note that using different methods

for distinct indicators by no means implies the occurrence of an inconsistent set of indicators.

However, using the same model for all the indicators estimates is a sufficient condition for

consistency.

We can thus conclude that aggregating the information provided by several well-performing

methods allows creating more robust flash estimates compared to the other two selection

procedures. However there are two main issues to be addressed:

In contrast to the best-by-country estimate, the aggregation algorithm is not

constrained to the use of one sole model at the country level and hence the set of

participating models might very well vary across indicators. Hence, the aggregation

algorithm does not assure consistency in advance.

The optimisation algorithm of aggregation doesn't take into account the hierarchical

structure of the methods at this point so we can have issues to balance the participation

of the two approaches.

2.3. Ex-ante quality assessment in the target year

Historical performance

A first measure of quality is based on the historical performance based on the period 2012-

2014. The historical performance of the aggregate estimate (𝑯+ ) is specified as the average

of the historical performances of all methods that participate in the construction of the final

(aggregate) estimate that pass the quality criteria (1) and the convergence criteria in the target

year (2). If a single method is used for the final estimate (e.g. best by country) this method is

assessed based only on its historical performance.

Prediction interval and estimated probabilities to be within certain change classes

11

Providing a point estimate for the population parameter does not take into account the

uncertainty of the flash estimate. We therefore computed the standard deviation that includes

both the sampling error and the uncertainty of the flash estimates. However, it doesn't take

into account model bias. Considering that we focus on estimating YoY changes rather than

the actual levels we assume that at least partially we adjust for the systematic bias. The point

estimate for the flash is calculated by adding the YoY change to the last observed value in

EU-SILC.

Flash estimates 2015(income year)

(1) Production FE 2015

A preliminary set of flash estimates 2015 are produced based on the following steps:

Produce the individual values for the FE 2015 for all methods by country and indicator

Apply the QAF to the flash estimates 2015:

o Assess past performance for individual estimates

o Use two distinct method selection procedures: the aggregation algorithm and

the best by country to produce the flash estimates 2015 for each indicator and

country. As far as the inequality indicators are concerned, we produce three

different estimates for each indicator based on three different sets of candidate

models, that is a first one which only considers micro-simulation models, a

second one based exclusively on PQM models and a third one combining both

model families. Benchmark models are used for comparison during the

analysis but are not included in the final aggregate.

o Perform an ex-ante analysis of quality in order to decide between the different

sets of indicators as well as identifying cases in which indicators are still

considered unreliable. This assessment is mainly done on the historical

performance and the convergence in the target year taking into account

uncertainty, including also more qualitative information.

If the aggregate estimate built from micro-simulation models and the PQM-based estimate

have a similar historical performance but provide contradictory or at least discrepant results at

the target year, their joint information is considered to be unreliable. If one approach has a

considerably higher historical performance compared to the other, we report the information

provided by the high-performing estimate. Finally, if both estimates have good historical

performance, without one of them considerably out-performing the other one, we report the

information provided by the combined estimate considering both model families. In the future

this kind of hierarchically-structured decision process can be directly introduced in the

aggregation algorithm.

12

Figure 113

illustrates the participation of each of the available approaches in the construction

of the aggregate estimate for AROP and QSR at target year 2015. It transmits a clear picture

of complementarity.

Figure 1: Participation by approach in the FE 2015 – AROP and QSR

Apart the main income inequality indicators (i.e. AROP and QSR) we produced the flash

estimates 2015 also for the positional indicators, namely deciles and the ARPT.

For the two main inequality indicators the aggregation clearly outperforms the other two

selection procedures (i.e. the best by country and best by country and indicator). Methods,

mainly coming from PQM have a relative higher performance on AROP or QSR than for

positional indicators. Thus the participation number in the aggregate is higher for inequality

13

UK not available for the microsimulation exercise

13

indicators but relatively lower for positional parameters. However, the predictive power of the

aggregate estimate increases with the participation number. Hence, if the participation number

is small the aggregation algorithm cannot fully deploy its potential and hence sees its

performance is reduced to the performance of the by-country-and-indicator estimate which, as

mentioned above, is underperforming when compared to the by-country estimate. This has led

to the decision to use the aggregation algorithm for inequality indicators only and to resort the

by-country estimate for positional parameters.

This still leads to a large number of indicators that have a performance lower than the

threshold for specific deciles: for example D10 seems a particular difficult indicator.

Further work is needed to improve some of the issues raised by the use of different models:

1) Need to better take into account the hierarchical structure of the methods used: by sub-

group, by approach etc.

2) Need to ensure a higher consistency across indicators. The same objective function in the

optimization algorithm by country would ensure this. However, the first results show that

methods which perform better for predicting inequality indicators are not necessarily the same

as for positional indicators.

Table 1 presents an overview of the flash estimate exercise for income year 2015. It highlights

the main stages, conclusions and decisions.

14

Table1: Overview of the flash estimate exercise

15

If for some aggregates or for the best by country results no method passes the performance threshold than the flash estimate is

considered unreliable for the specific country and indicator.

1) Assess past performanceIf for some aggregates or for the best by country results no method passes the historical performance threshold than the flash

estimate is considered unreliable for the specific country and indicator.

2) Two main method selection strategies: aggregation and best by

country. 3 sets of aggregate estimates are produced: for each

approach (microsimulation and PQM) as well as the overall one.

3) Ex-ante assessment for flash estimate 2015

Ex-post quality

assessment

The relative magnitude direction scale(MD6) is a relative grid for communicating the YoY change.

It is country specific and it takes into account the sampling error in order to clasify the change as

minor/moderate or major change.

The absolute magnitude direction scale(MD8)-has the same categories across countries and is more

straigtforward to interpret. However it should be considered together with the statistical significance of

the change.

Review the quality of estimates when EU-SILC data becomes available.

Assess revisions and improve consequently the methodology and the QAF

We have decided that when the historical performance is under the performance threshold these estimates should not be

communicated as they are not reliable. The magnitude direction scales are proposed in order to balance in the communication the

information provided and the associated uncertainty of our estimates.

Production flash

estimates 2015

Set of individual FE 2015 for all methods

Communication main

results

Apply the QAF to FE 2015

A comparative analysis was conducted for the four sets of estimates based on the three aggregates and the best by country. Both

the convergence in the target year and the historical performance were compared.

If both methods have high performance and the results are convergent we are chossing the overall aggregate. If the aggregate for

one of the two approaches clearly outperforms the other only one approach is chosen. If they have similar performance but they

have divergent values in the target year results are considered not reliable.

Due to the low number of methods passing the aggregation for the locational indicators and coherence issues among indicators we

have chosen the best by country method. This should be addressed in the future by using a country specific aggregation which

maximizes the expected quality fo flash estimates at country level for all chosen indicators. If specific indicators such as AROP are

considered more important this could be given a higher weight.

In general results are more robust and they have a higher expected quality based on their historical performance for AROP.

For the deciles there are still several cases where no method passes the threshold and in general also the aggregation performs less well.

FE 2015

To be further developed when EU-SILC data 2016 will become available.

16

(2) Communication of results

Despite its simplicity, communicating a single number gives a false impression of precision.

Alternatively, we could take into account the uncertainty of the estimate by providing a

magnitude-direction scale for the estimated change in the flash estimate. From the overlap of

the prediction interval with the change classes, we can derive the probabilities of each of the

latter.

Based on preliminary consultation with our main users and in order to take onto account the

uncertainty of the flash estimate we propose using for communication to the main public a

magnitude-direction scale with 6 classes (MD6), to which the YoY change will be assigned:

(1) major increase [+++]

(2) moderate increase [+]

(3) (quasi) stable / minor changes [O]

(4) moderate decrease [-]

(5) major decrease [---]

(6) no conclusion (when we get contradictory signals from different methods with good

past performance).

There are two possibilities concerning the choice of thresholds for the MD6:

Relative thresholds which are country-specific and are calculated as multiples of the

standard deviation for each indicator.

Absolute thresholds which are common to all countries and should be based on users'

needs. This can be extended to several classes. In our case, we used eight.

The prediction interval calculated for the flash estimate allows calculating a probability for

each change class, and therefore takes into account the uncertainty of the point estimate.

(2) Overview of the main results

DISCLAIMER

The data and conclusions presented hereafter are to be seen as preliminary results of newly

developed experimental methods. The specific figures for income indicators 2015 are given

for illustrative purposes and should not be considered at this stage as official or experimental

data produced by Eurostat.

Figure 2 illustrates the flash estimates 2015 for AROP based on the two types of magnitude-

direction scales. In most countries, a minor change is the dominant class. A major increase is

predicted for Germany (2013-2015), a moderate decrease for Slovenia and Greece and a

major decrease for Romania. The scale based on absolute thresholds is more granular so it

highlights more changes. Their statistical significance should also be considered: the same

change in terms of percentage points can be significant in one country and not in another.

17

Figure 2: Flash estimates AROP 2015: MD6 scales with relative thresholds (left) and with absolute thresholds (right)

18

Table 2 below summarises the main predicted changes for the complete set of flash estimates

for 2015 (and 2014 in case EU-SILC 2015 was not available yet at the moment when

estimates were produced). We use the MD6 scale based on relative thresholds in order to

decide the type of increase/decrease. The red colours signal a decrease, the green, an increase

and the shades give the magnitude of the change. The white/light blue background signal

minor changes. The bounds for the changes are country specific and depend on the standard

deviation for each indicator from EU-SILC. Tables 3 and 4 contain a time perspective as they

include the observed changes in EU-SILC between 2008-2013/2014 plus the flash estimates

between last observed values (2013/2014) to 2015.

Positional indicators are less stable than AROP; therefore, more changes are expected given

the chosen MD6 classes. Further analysis is available in Section 5.2. Tables 5.1 to 5.7 include

the point estimate, for levels of the indicators and the YoY change as well as the 95%

prediction intervals as well as information on the historical performance measure which is

another important parameter for assessing the expected quality and precision for the flash

estimates.

Table 2: Changes for main income indicators between last observed EU-SILC (2013/2014)-2015

19

Table 3: Changes for main income indicators across years: observed values and FE 2015

Tables 4: Changes for deciles cut-off points across years: observed and FE 2015

20

Way forward

This report provides the first results for the flash estimates on income indicators 2015.

Following this first exercise it is essential not only to further develop the methodological

framework but also to discuss issues of quality assessment and communication that emerge in

the context of this work.

− A consultation with the users is needed at this stage in order to prioritise their needs in

terms of indicators that need to be available as early estimates and their expectations

of users in terms of the explanatory power associated with the communicated changes

in main indicators. A relevant trade-off is between the predictive and explanatory

power of different methods. The different approaches differ also in their ability to

provide details and explanations about the change. Microsimulation allows to link

specific estimated changes to the implementation of certain social policies and to

identify particular groups or income components affected. On the other hand there are

countries for which the historical performance of these models was not good enough

to pass the assessment framework at this moment.

− A consultation with the producers on the methodology and the way of communicating

the results, taking into account the users' needs will also take place.

− For the production of flash estimates, in general the aggregation proved to be more

robust but the algorithm should be further improved to include qualitative information

as well as lessons learned from the ex-post evaluation of this first round of flash

estimates. Four main areas were identified for further work (a) take into account ex-

ante qualitative information related to specific groups of methods and/or intermediary

checks (b) a better consistency of the results across indicators using for example the

same combination of methods for all indicators for a particular country; (c) improve

the convergence of the candidate estimates that enter the aggregate; (d) take into

account the dependence within specific groups of methods in the filtering process.

− In terms of methodology the next step would be to include the current income

approach in the framework and assess its results not only as a standalone line of work

but also as a potential complement for updating/predicting changes in the income

components. Further improvements of the two main approaches are also foreseen

based on their historical performance and these first results for 2015.

− Further development of the quality framework is essential to build a common platform

together with the EU Member States for comparing and assessing results.

− In terms of communication, point estimates can be provided but they are subject to

several sources of uncertainty: model bias and variance, the sampling error in EU-

SILC, inconsistencies between the different data sources entering the estimation. This

21

raises not only a question of quality but also of communication of the results to the

public. A first proposal, at least until the methodology and the quality assessment are

more mature, is to start the communication of the flash estimates with a magnitude-

direction scale that would give a more general message concerning the expected

changes in the income indicators. There are two main types of scales: 1) relative scales

which are country-specific and focused on highlighting significant changes or 2)

absolute scales which are common to all countries and based on user-friendly

thresholds that can provide more information. However, providing more information

implies also a higher degree of uncertainty related to the communicated changes.

− A first future important milestone is to include the flash estimates on income

distribution as a part of the European Semester toolkit so the flash estimates should be

validated through the assessment framework and the consultation with the main users.

− After validation through the assessment framework and with the producers, Eurostat

aims at publishing the first round of flash estimates 2016 during summer 2017.

22

1. Introduction

Policy makers in the European Union (EU) are facing an increasing demand for monitoring

changes in social conditions at national and EU level, especially during periods of rapid

economic changes. The European Union Statistics on Income and Living Conditions (EU-

SILC) is an instrument aimed at collecting comparable multidimensional microdata on

income, poverty, social exclusion and living conditions for EU countries. EU-SILC indicators

on poverty and income inequality are a key part of the toolkit for the European Semester, the

yearly cycle of economic policy coordination among EU Member States. The timeliness of

these indicators is crucial for keeping track of the effectiveness of policies and the impact of

macroeconomic conditions on poverty and income distribution. However, partly due to the

complexity of the data collection process, income data for year N is only available in the

autumn of year N+2, which comes too late for the policy agenda.

The current strategy at the level of the European Statistical System (ESS) for providing more

timely data on income is based on two pillars. The first pillar refers to the production of flash

estimates on income distribution and poverty. It states that flash estimates on income of year

N should be available in time to prepare for the European Semester in the autumn of year

N+1. The second pillar concerns the final EU-SILC microdata, and it states that the microdata

on income of year N should be available during the European Semester (i.e. in the end of year

N+1 or early N+2).

This report aims to provide an overview of the process that Eurostat has put in place for the

development of flash estimates for income distribution at EU level. The flash estimates are

based on nowcasting techniques which estimate the current income distribution using

microsimulation modelling techniques or other appropriate methods based on household

income microdata from a previous period in combination with the most up-to-date statistics.

In this report, there are two main methodological approaches explored, each of them with

several sub-groups of methods being described and assessed.

In the context of development of flash estimates, the quality assessment framework becomes a

central part of the production process in order to select the methods and auxiliary sources that

perform best in reproducing EU-SILC indicators for previous years. The quality framework is

composed of two parts: (1) the quality assurance, which focuses on analysing inconsistencies

in the input data and includes several intermediate quality checks along the process; (2) the

quality assessment, which focuses on the historical performance of different methods and the

uncertainty of the final flash estimates. The quality framework is an essential element for both

the scientific and political acceptance of the flash estimates. Thus, it is important that the flash

estimates are disseminated together with methodological guidelines concerning their

development, interpretation and appropriate use.

The estimation exercise has a set of target indicators: in this round we focused on the main

income indicators such as at-risk-of-poverty rate, and quantile share ratio, but also the income

deciles which are more sensitive to changes in the economic situation and therefore they

reflect changes at different points in the income distribution. These are collected through EU-

SILC which is the reference data source on income and poverty statistics and therefore ideally

23

we should capture overall the relevant changes in the income distribution reflected in EU-

SILC.

In June 2016, Eurostat set up a task force on “Flash estimates on income distribution” with 8

Member States (BE, IT, PT, LU, DE, FR, SE and UK) to support the on-going work, with

several National Statistical Institutes already developing and releasing flash estimates at

national level (UK14

, FR15

). Eurostat and the Institute for Social and Economic Research

(ISER) at the University of Essex are also cooperating in order to further develop and enhance

the use of EUROMOD and microsimulation for nowcasting income distribution indicators16

.

The structure of the report is the following: Section 2 analyses the characteristics of the target

income data in EU-SILC. Section 3 explains the nowcasting methodology. Section 4 presents

the quality framework. Section 5 concludes by summarising the most important findings for

2015 and the next steps for the development of the flash estimate on income distribution.

14

https://www.ons.gov.uk/peoplepopulationandcommunity/personalandhouseholdfinances/incomeandwealth/bull

etins/nowcastinghouseholdincomeintheuk/2015to2016 15

http://www.insee.fr/fr/themes/document.asp?ref_id=ia23 16

https://www.iser.essex.ac.uk/research/publications/working-papers/euromod/em8-16



http://www.insee.fr/fr/themes/document.asp?ref_id=ia23

https://www.iser.essex.ac.uk/research/publications/working-papers/euromod/em8-16

24

2. Overview target income data in EU-SILC

The objective of this section is to summarise the main characteristics of the key income and

poverty indicators based on EU-SILC that we are aiming to estimate.

Figure 2.1. shows the year-on-year (YoY) changes from 2008 until the last available value at

the time of writing this report. The YoY changes are divided by their standard deviation.

Figure 2.1: Standardized YoY change

The majority of the observed YoY changes are non-significant (i.e. below 2 standard

deviations) in the case of AROP (69%) and QSR (82%); the opposite is true for ARPT (20%)

and D10 (35%).

Figure 2.2. focuses on the evolution of the key positional indicators (D10, D30, MEDIAN,

D70 and D90) from 2008 until the last available value.

25

Figure 2.2. Overview of quintiles' evolution

Despite the different dynamics during the examined period, the two noticeable phenomena

are:

an upward shift of the whole distribution (with the exception of Ireland, Greece,

Spain, Portugal and Cyprus);

an increase in inequality driven by both the relative impoverishment of the lower-

income households and the increase of the top-income ones; notable exceptions are

Greece and – to some extent – Ireland and Cyprus (where the quintiles are coming

closer together), and countries like Spain, France, Italy, Netherlands and Portugal

(where the quintiles move in tandem).

26

3. Methodology

3.1. General framework

Changes in the income distribution are the outcome of a complex interaction of a large

number of various past and present events, such as the application of economic and monetary

policies, the implementation of social reforms or shifts in macroeconomic circumstances. The

most general and flexible approach towards flash estimation would be to develop model

frameworks which allow to account for all these different influences acting on the income

distribution and the income inequality indicators.

However, the most general model framework might not constitute the best choice in all kinds

of situations and circumstances. In order to illustrate the potential sub-optimality of highly-

flexible model frameworks we first need to have a look at the concepts of predicitve power

and explanatory power in the context of flash estimation of income indicators. Predictive

power is the ability of a model to produce adequate approximations of the values taken by

income indicators at different time points before they are available. On the other hand the

explanatory power of a model indicates the capacity of that model to effectively explain the

predictions that it comes up with. As mentioned above, changes in income distribution are

linked to different phenomena ranging from macroeconomic circumstances to the

implementation of specific policies. However, there are cases in which the change in income

distribution might be overwhelmingly impacted by one main event. In this situation, the

general model framework, accounting for all potential influences on the social indicators,

reveals to be over-parametrized leading to a lower predictive power relative to the simpler

models which only focus on small numbers of effects and hence are characterized by a much

more parsimonious parametrization. The large number of parameters in the general model

framework also has a negative impact on its explanatory power. Indeed, the presence of a

large number of interacting parameters makes it very hard or even impossible to link a given

change in income indicators to specific circumstances.

Thus, having one single highly-flexible model framework at our disposal might not be the best

strategy. Instead, we sould rather aim at building a model toolbox containing a collection of

different models ranging from very specialized frameworks to more general ones. Note

however that replacing one highly-flexible model framework by a model toolkit, requires a

more sophisticated performance analysis scheme which is essential for picking the most

appropriate model framework at each moment in time.

Model toolkit

The model toolkit at our disposal can be divided into three model families, the micro-

simulation models, the parametric quantile models and a set of benchmark models. Note that

the three model families differ in terms of model complexity (and flexibility), the way current

information is used in the production of the flash estimates and their explanatory power.

Micro-simulation models are very powerful in accounting for changes in taxes and benefits

and their respective impacts on the income indicators. According to the micro-simulation

paradigm, the changes in taxes and benefits are simulated directly at the individual or

27

household level. This direct simulation allows to establish a causality link between the

implemented policies and the changes in income indicators. The existence of causal

relationships between policy rules on the one hand and income indicators on the other hand

leads to a high explanatory power of these model family. Moreover, you can disentangle the

effects on specific socio-economic groups.

The parametric quantile family relies on the exploitation of macro-level information,

provided by National Account figures such as GDP, consumption, wages and salaries. Hence,

these type of models are designed to capture the impact of changes in macroeconomic

circumstances on the income indicators. They might also capture changes in social and

economic policies if and only if these changes are reflected somehow in the National Account

figures. Note however that in contrast to the micro-simulation family the parametric quantile

models do not manage to establish causal links between policy rules and changes in

indicators. Instead the relations between the explanatory variables and the income indicators

are based around their correlations resulting in a lower explanatory power relative to micro-

simulation models.

The benchmark models build flash estimates on the basis of extrapolations of past

information. Hence, they do not constitute nowcasting frameworks as such given that they do

not use any kind of present information. Note that the benchmark model do not contain

explanatory variables which means that they only provide predictive information and hence

have no explanatory power.

3.2. Microsimulation

The approach presented here relies on microsimulation modelling and is being developed by

Eurostat in collaboration with the Institute for Social and Economic Research (ISER) at the

University of Essex.

Microsimulation models have been widely used for assessing the distributional impact of

current and future tax-benefit policy reforms, as well as the impact of the evolution of market

incomes, changes in the labour market and in the demographic structure of the population.17

Using microsimulation techniques based on representative household data enables changes in

the distribution of market income to be distinguished and the effects of the tax-benefit system

to be identified taking into account the complex ways in which these factors interact with each

other (Peichl, 2008; Immervoll et al., 2006). Combined macro-micro modelling has also been

used for analysing the impact of macroeconomic policies and shocks on poverty and income

distribution.18

17 Some examples include Brewer et al. (2013) for the UK, Keane et al. (2013) for Ireland, Brandolini et al.

(2013) for Italy, Matsaganis & Leventi (2014) for Greece and Narayan & Sánchez-Páramo (2012) for

Bangladesh, Mexico, Philippines and Poland.

18 A detailed review is provided in Bourguignon et al. (2008) and Essama-Nssah (2005). See also Figari et al.

(2015) for a discussion.

28

The nowcasting methodology presented in this report is based on microsimulation techniques

used in combination with more up-to-date statistics from National Accounts, LFS and other

national sources. It aims at developing a generic approach that can be applied to all EU

countries in a straightforward, flexible and transparent way. By doing so, it ensures the

comparability and consistency of the methodology both across countries and through time.

Our analysis makes use of EUROMOD, the microsimulation model based on EU-SILC data

which estimates in a comparable way the effects of taxes and benefits on the income

distribution in each of the EU Member States. For the purposes of the nowcasting exercise

standard EUROMOD policy simulation routines are enhanced with additional adjustments to

the input data to take into account different changes in the socio-economic conditions of the

underlying population. In order to produce flash estimates for income indicators, the

microsimulation approach is used to update the structure of a micro dataset to account for

changes to the main components of income variables over time. This is based on the

following stages: (1) adjustment for changes to the demographic structure of the population

and for changes to the presence of income sources determined by labour market

characteristics; (2) uprating the level of market income components; and (3) changes in taxes

and benefits due to policy reforms (O'Donoghue and Loughrey, 2014). For this exercise we

used the last year available for all countries for EUROMOD input datasets, namely EU-SILC

2012 (income 2011). This was available for all countries apart United Kingdom and Spain.

For Greece the 2014 file was used.

The remaining of this section explains each of these stages in detail.

3.2.1. Changes in population characteristics

There are two main approaches to take into account changes in population characteristics:

static and dynamic. The static approach is based on reweighting (or calibration). It consists of

the derivation of a new vector of sample weights that brings the marginal distributions from

the base year for a set of main socio-economic variables (e.g. age, labour, gender) to the level

of the target year. In the dynamic process individual trajectories are modelled and individuals

in the sample undergo transitions. The paper tests both approaches. The main auxiliary source

of information used to obtain the population characteristics in the target year is the Labour

Force Survey (LFS) statistics. LFS micro-level statistics for year N are usually available in

April N+1. This allows the production of flash estimates for year N based on the updated

structure for labour and demographics.

Modelling labour market transitions

The dynamic approach to take into account changes in population characteristics is based on

modelling net employment transitions. It accounts for changes in labour market

characteristics, while other population characteristics (such as demographics) are kept

constant. A detailed discussion of this approach can be found in Navicke et al. (2013) and

Rastrigina et al. (2016). Changes in employment are modelled by explicitly simulating

transitions between labour market states (Figari et al., 2011; Fernandez Salgado et al., 2013;

Avram et al., 2011). Two types of transitions are modelled: (i) from non-employment into

employment and (ii) from employment into short-term/long-term unemployment (or

29

inactivity). Observations are selected for transitions based on their conditional probabilities of

being employed rather than being unemployed or inactive. A logit model is used for

estimating these probabilities for working age (16-64) individuals in the EUROMOD input

data. In order to account for gender differences in the labour market situation, the model is

estimated separately for men and women. Students, working-age individuals with permanent

disability or in retirement and mothers with children aged below 2 are excluded from the

estimation, unless they report employment income in the underlying data. Explanatory

variables include age, marital status, education level, country of birth, employment status of

partner, unemployment spells of other household members, household size, number of

children and their age, home ownership, region of residence and urban (or rural) location. The

specification of the logit model used and the estimated coefficients are reported in Rastrigina

et al. (2016).

The weighted total number of observations that are selected to go through transitions

corresponds to the relative net yearly change in employment rates by age group and gender (a

total of 6 strata) as shown in the LFS statistics. Changes from short-term to long-term

unemployment are modelled based on a similar selection procedure but using LFS figures on

long-term unemployment (with unemployment duration more than one year) as an external

source of information. This transition is critical due to its implications for eligibility and

receipt of unemployment benefits. Transitions to and from inactivity are modelled implicitly

through restricting eligibility for unemployment benefits, according to the country-specific

rules.

Labour market characteristics and sources of income are adjusted for those observations that

are subject to transitions. In particular, employment and self-employment income is set to

zero for individuals moving out of employment. For individuals moving into employment,

earnings are set equal to the mean among those already employed within the same stratum.

Unemployment benefits are simulated for those moving out of employment in case they are

eligible for such benefits according to the country rules. If the rules require assessment of

earnings and number of months in work for several years preceding unemployment, we

assume that these remain unchanged throughout the assessment period and are equal to the

values observed in the income reference period. For those moving into long-term

unemployment the eligibility is adjusted assuming that the duration of unemployment spell is

more than one year. In some countries the long-term unemployed are not eligible for any

unemployment benefits (e.g. Latvia); in other countries they are not eligible for

unemployment insurance but still qualify for unemployment assistance (e.g. Portugal); in

countries with fairly long duration of unemployment insurance (e.g. Finland) we assume that

the long-term unemployed continue to receive unemployment insurance.

Reweighting

The static approach to account for changes in population characteristics is based on

reweighting and consists of the derivation of a new vector of sample weights that brings the

marginal distributions from the base year for a set of main socio-demographic variables to the

level of the target year. This approach allows controlling for a wider range of population

characteristics including retirement and demographic changes. In theory, this can be done also

30

through modelling individual transitions but the process is much more complex. However,

there are limitations related to the use of reweighting in the context of nowcasting: given that

it only adjusts the structure of the population to some marginal distributions it may perform

worse in times of rapid economic changes. For example, reweighting cannot capture if

individuals entering a particular state have characteristics completely different from the

characteristics of the people observed in that state in the base year. Such changes were often

observed during recent unemployment shocks.

The variables that are more likely to impact the income distribution over time should be both

highly related to income and volatile: thus the main relevant variables are related to labour

market information and changes in household composition. Other relevant controlled

characteristics include regional information. The reweighting can be done at household or

individual level. In the end the last option was dropped as results were not satisfactory.

The variables at household level are based on demographic and employment characteristics of

individuals within households: number of members in the household by age group, gender

and activity status; household size and number of dependent children; region and degree of

urbanisation. The target distributions of the relevant variables are obtained from the LFS.

However, the initial distributions observed in EU-SILC and LFS are not always consistent.

This implies that reweighting based on LFS margins can introduce a bias. For example, in

Portugal there is a 10% difference in the number of self-employed in 2011. Nevertheless in

both sources there was a decrease of 7% between 2011 and 2012. In such cases we adjusted

the margins in LFS by adding up the percentage change from LFS to the EU-SILC base year.

Hence, the adjustment reflects the change to a more recent structure while systematic source

inconsistencies are ruled out.

Overall, two alternative reweighting methods were tested in this paper:

a. CAL_H_S2L: calibration at household level based on the marginal distributions (levels)

from LFS for the following variables: household size, number of people by age group and

sex, number of people part/full-time employed/self-employed/retired, region and degree

of urbanisation.

b. CAL_H_ S2L_A: calibration at household level based on the changes in the shares for

the same variables.

The principle of calibration is that it minimizes the distance between the vectors of weights so

that the other distributions are not distorted. Further work should investigate not only the

effect on the income distribution but also possible side effects of calibration such as

distortions of distributions for variables not controlled for in the calibration.

3.2.2. Updating non-simulated income sources

After adjusting the input data for changes in the population characteristics, the next step is to

update non-simulated income beyond the income data reference period. This approach applies

uprating coefficients to market incomes19

and non-simulated social benefits (or taxes). The

19

Market incomes are wages and salaries, self-employment income, property income, income from capital, etc.

31

coefficients are based on more timely data sources from the target year which reflect

indexation rules or the change in the average income per recipient. Two approaches are tested

in the paper.

EUROMOD uprating factors

EUROMOD contains uprating factors based on available administrative or survey statistics.

Country-specific updating factors are derived for each income source, reflecting statutory

rules (such as indexation rules) or the change in the average amount per recipient between the

income data reference period and the target year. The latter is preferred for the nowcasting

exercise, especially for pensions. The evolution of average pensions can capture important

changes in the population of pensioners (e.g. inflow of newly retired pensioners with higher

average pensions). In order to capture differential growth rates in employment income,

updating factors are disaggregated by economic activity and/or by economic sector if such

information is available.

Model-based factors for socio-demographic groups

An alternative way to update income also uses EUROMOD uprating factors as a base but

introduces differential growth rates in the income distribution for some important income

components (such as employment and self-employment income) via a model-based approach.

The modelling approach starts from the premise that the growth rates of income of individuals

belonging to a given socio-demographic category follow the same probability distribution. A

socio-demographic category is defined as one specific outcome of all possible combinations

of a collection of categorical variables. The estimation algorithm is divided into two steps: (i)

the classification step and (ii) the modelling step.

The classification step aims at detecting those socio-demographic categories whose time

series of growth rates display similar patterns and to regroup these categories into larger

classes. We use EU-SILC time series for 2008-2011 to compute the average growth rates of

each of the socio-demographic categories. For wages we use the following variables: gender;

geographical regions; degree of urbanisation (densely populated, intermediate, and thinly

populated area); type of employment (full-time, part-time, other); age groups (16-24, 25-34,

35-44, 45-54, 55+); wage quantile group in the base year (QC1, QC2, QC3, QC4, QC5). At

each node of the decision tree there are several possibilities to build the subclasses using

different categorical variables. In order to choose between these potential subclasses we need

to define an optimality criterion.

To obtain the optimality criterion we estimate different logistic regression models and

compute the associate Wald significance test for the regression parameters. The explanatory

variables of the different models are identical and are given by the growth rates from 2008 to

2011 income years. However, the models differ from each other with respect to dependent

variables. Each model uses a different categorical variable as a dependent variable. Finally,

we choose the categorical variable associated with the lowest p-value for the Wald test of

significance indicating the maximum discriminatory power for this variable. Figure 3.1

illustrates the shape of the decision tree for Austria using EU-SILC 2009- 2012 (2008-2011

income years).

32

The modelling step consists of estimating the underlying statistical properties of the growth

time series of the classes using a dynamic factor modelling approach. In order to produce

estimates of income growth rates, the factor model includes explanatory variables which are

more timely than the EU-SILC data, e.g. non-financial quarterly sector accounts such as gross

domestic product, gross saving rate, compensation of employees and social contributions and

benefits.20

The estimates are obtained through a combined use of the expectation-

maximization algorithm and the Kalman filter/smoother (see Shumway and Stoffer, 1982 and

Banbura et al., 2013).

Figure 3.1: Decision tree for building socio-demographic groups in Austria (EU-SILC 2009-2012)

20

Source: Eurostat, non-financial transactions, code “nasq_10_nf_tr”.

33

3.2.3. Simulating changes in tax-benefit policies

After updating market income and other non-simulated income sources, we simulate tax-

benefit policies for each year from the base year up to the target year.

We use EUROMOD to simulate changes in the income distribution within the period of

analysis. Income elements simulated by the model include universal and targeted cash

benefits, social insurance contributions and personal direct taxes. Income elements that cannot

be simulated mostly concern benefits for which entitlement is based on previous contribution

history (e.g. pensions) or unobserved characteristics (e.g. disability benefits). These are read

from the data and updated according to statutory rules (such as indexation rules) or changes in

their average levels over time (see Section 4.1). Both contributory and non-contributory

unemployment benefits are simulated in the model; but e.g. severance payments are not.

Detailed information on EUROMOD and its applications can be found in Sutherland & Figari

(2013).

All simulations are carried out on the basis of the tax-benefit rules in place on the 30th

June of

the given policy year. The exceptions to this rule are Estonia (in 2013), Greece (in 2011,

2013-2015), and Portugal (in 2012), where policy changes after the 30th

of June were taken

into account to better match the annual income observed in the EU-SILC data. In order to

enhance the credibility of estimates, an effort has been made to address issues such as tax

evasion (e.g. in Bulgaria, Greece, Italy and Romania) and benefit non-take-up (e.g. in

Belgium, Estonia, France, Ireland, Greece, Poland, Portugal, Romania, and Finland)21

.

However, such adjustments are not possible to implement in all countries due to data

limitations.22

The last methodological step involves an attempt to account for differences between

EUROMOD and EU-SILC estimates of household income in the data reference year (here

2011). The main reasons for these discrepancies are related to the precision of simulations

when information in the EU-SILC data is limited, issues of benefit non-take-up and tax

evasion, under-reporting of income components, and small differences in income concepts

and definitions.23

In order to account for these differences, an alignment factor is calculated for each household.

The factor is equal to the absolute difference between the value of equivalised household

disposable income in EU-SILC 2012 and the EUROMOD estimate for the same period and

income concept. For consistency reasons, the same household specific factor is applied to all

later policy years. This is based on the assumption that the discrepancy between EUROMOD

and EU-SILC estimates remains stable over time. Further work will investigate the

plausibility of these assumptions and the effect on the final results.

21

https://www.euromod.ac.uk/sites/default/files/working-papers/em3-16.pdf

22 Detailed information on the scope of simulations, updating factors, non-take-up and tax evasion adjustments is

provided in the EUROMOD Country Reports (see: https://www.iser.essex.ac.uk/euromod/resources-for-

euromod-users/country-reports).

23 For more detailed information on these issues see Figari et al. (2012) and Jara and Leventi (2014).

https://www.iser.essex.ac.uk/euromod/resources-for-euromod-users/country-reports

https://www.iser.essex.ac.uk/euromod/resources-for-euromod-users/country-reports

34

Table 3.1 below indicates the codes used for the different microsimulation approaches which

correspond to the three stages of the process.

Table 3.2: Model codes of the Microsimulation approach

Uprating factors Updating labour &

demographics EU-SILC alignment factor

no= Data-based uprating factors

available in EUROMOD for all

income components

cal_H_S2L= Calibration at

household level with non-adjusted

LFS data

EUROMOD= No adjustment

for base year differences

between EU-SILC and

EUROMOD

cla= Model-based uprating

factors for employment and

self-employment income

cal_H_S2L_A= Calibration at

household level with adjusted LFS

data

emd_ad= Adjustment for base

year differences between EU-

SILC and EUROMOD

lab_I_S2L= Labour transitions at

individual level with LFS data

3.3. Parametric Quantile Modelling (PQM)

The main assumption of the parametric quantile method approach or PQM approach is that

the entire information of the population distribution of disposable income can be summed up

into a relatively small set of parameters. In terms of flash estimation procedure this

assumption implies that the estimation target is not the probability distribution function but

the small set of parameters. Thus, the estimates of the population parameters represent

sufficient statistics in the sense that no additional information on the income distribution can

be derived from the underlying EU-SILC sample than the one contained in the parameter

estimates.

In order for this “parametric assumption” to be valid the income distribution needs to belong

to some family of distributions. The first distribution one might think of is the normal

distribution since it would allow for a very parsimonious representation of the income

distribution using only two parameters: the expected value or the mean and the standard

deviation. However, there are two main features present in income distributions which make

the normal distribution a bad choice for modelling income data. First, income distributions

tend to have a right tail and hence are characterized by a positive skewness. This presence of

asymmetries within the income distribution stems from the fact that the income can

potentially take on very large values but that it hardly takes on strong negative values given

that most of the income components that make up disposable income, such as wages and

salaries for example, cannot be smaller than zero. Secondly, income distributions are heavy

tailed. Mathematically speaking heavy-tailedness means that the tails of the income

distribution are not exponentially bounded. In a less technical definition the presence of heavy

tails indicate that the probability mass in the tails of the income distribution is larger

compared to the normal distribution. This increased probability mass in the tails results in a

kurtosis larger than three. Note that the normal distribution has skewness of zero and a

kurtosis equal to three and hence that the values of these two parameters are fixed in advance.

This implies the family of normal distributions does not have enough free parameters to

capture the inherent complexity of the income distribution. Hence, the use of the normal

family would result in an under-parametrization of the income distribution given that the

35

mean and the standard deviation are not sufficient for summing up the entire information of

the income distribution and hence the presence of an inherent model bias.

Thus, given the stylized facts of income data we need at least four parameters to represent the

income probability distribution. One of the most used distributions for income modelling is

the generalized beta distribution of the second kind or GB2 distributions. The GB2

distribution is a four-parameter distribution which nests many common distributions such as

the lognormal, generalized gamma, Weibull, chi-square, half-normal, half-Student's t,

exponential and the log-logistic and hence offers a high degree of flexibility. The four

parameters of the GB2 distribution are commonly referred to as 𝑎, 𝑏, 𝑝 and 𝑞. Generally

speaking, the 𝑏 parameter encodes the central location of the parameter and hence a change in

the value of the 𝑏 parameter implies a horizontal shift of the income distribution. The

parameter 𝑎 indicates the general shape of the entire distribution, whereas 𝑝 and 𝑞 control the

shape of the left and the right tail respectively. In mathematical terms the GB2 is defined by

the following probability distribution function

𝐺(𝑥; 𝑎, 𝑏, 𝑝, 𝑞) =|𝑎|𝑥𝑎𝑝−1 (1 − (

𝑥𝑏

)𝑎

)𝑞−1

𝑏𝑎𝑝𝐵(𝑝, 𝑞)

where 𝐵(𝑝, 𝑞) is the distribution function of the Beta distribution.

However, in most cases even the four parameter distributions as the GB2 might not be

flexible enough to represent all the underlying characteristics of the income distribution. Note

for instance that the GB2 belongs to the class of unimodal distributions. However, in reality

income distributions might very well show multimodal features. Indeed, there might be strong

differences with respect to certain income components across different socio-demographic

categories. For instance the distribution of wages and salaries might be different for men and

women or vary across age groups. Hence, given these diverging characteristics of the income

distribution across socio-demographic categories the distribution of disposable income at the

population level might be the outcome of a mixture distribution, and hence of a combination

of distribution corresponding to different subpopulations. Note that the GB2 distribution can

be generalized to mixture distribution settings such as to accommodate cases of multimodal

distributions. However, in order to do so we need to have some a priori information on the

composition of the different subpopulations.

An alternative solution to the issue of multimodal income distributions is to avoid making any

assumptions on the distribution family. The drawback of this approach is that the set of

parameters that should be used to sum up the sample information is not pre-determined in

advance. Indeed, by imposing the GB2 distribution in advance the set of sufficient statistics is

known to be equal to the set of distributional parameters of the GB2. Thus, if the distribution

family is not imposed in advance the type and the number of parameters to be used as

sufficient statistics needs to be chosen by the analyst.

We have used the income quantiles as sufficient statistics. Given that the number of quantiles

is not fixed in advance, we have tested four different quantile grids. If we denote as 𝑌𝑡

36

disposable income at time period 𝑡 the quantile corresponding to the probability level 𝛼 is

defined as

𝑄𝛼 = 𝐹−1(𝛼)

where 𝐹−1(𝑦𝑡) is the cumulative distribution function of 𝑌𝑡. The four quantile grids differ from each

other in terms of their degree of sparsity such that the probability levels are specified as

𝐺𝑟𝑖𝑑 1: 𝛼 = 0.1, 0.3, 0.5, 0.7, 0.9

𝐺𝑟𝑖𝑑 2: 𝛼 = 0.1, 0.2, … , 0.9

𝐺𝑟𝑖𝑑 3: 𝛼 = 0.05, 0.1, … , 0.9, 0.95

𝐺𝑟𝑖𝑑 4: 𝛼 = 0.01, 0.02, … , 0.98, 0.99

The flash estimates of the quantiles are obtained on the basis of dynamic factor models which

exploit the more timely information of large quantities of explanatory data series consisting of

quarterly National Account figures. We test two different parametrizations of the dynamic

factor model in which the latter takes the form of either a one-factor model or a two-factor

model.

The nowcasted income indicators are derived from the flash estimates of the quantiles in three

different ways. The first approach consists in a direct computation of the income indicators

from the estimates quantiles. The remaining two approaches consist in an indirect

computation of the indicators since they first require a reconstitution of the income

distribution from the estimated quantiles via a so-called mapping step. The indicators are then

derived from the reconstituted distributions. We have developed two different mapping

algorithms. The first one is based on the assumption that the income distribution can be well-

represented on the basis of the GB2 Given this assumption the mapping algorithm derives the

optimal values of the distributional parameters of the GB2 from the nowcasted quantiles by

minimizing a non-linear least squares function. The second one reconstructs the income

density function from the estimated quantiles using Kernel smoothing. Note that whereas the

direct computation and the Kernel-based mapping approach are appropriate in either unimodal

or multimodal cases, the nonlinear least squares mapping approach only provides valid results

in cases of unimodal income distributions.

Table 3.2 below indicates the three elements of the codes of the different PQM approaches

differing in terms of quantile grids, number of factors and mapping algorithm.

Table 3.2: Model codes of the PQM approach

Mapping Factor Model Quantile Grid

G = GB2 1 = Single factor 1 = Grid 1

K = Kernel 2 = Double factor 2 = Grid 2

R = Raw

3 = Grid 3

4 = Grid 4

37

The model codes are concatenations of each combination of the three elements. Thus, there

are twenty-four possible model parametrizations with the first element of the code indicating

the mapping approach, the second one encoding the number of factors and the third one the

type of quantile grid. Thus, for instance K12 represents the PQM model using the Kernel

mapping approach as well as a two-factor time series model with the components of the

second quantile grid as dependent variables. More details on the methodological framework

for PQM can be found in Annex 1.

3.4. Benchmark models

We consider three different kinds of benchmark models. The first one is based on the

assumption that the unknown population indicator is constant over time and hence that the

best predictor of the latter at a given target year 𝑇 is the unweighted average of historical

SILC indicators. Hence, if we denote as �̃�1, … , �̃�𝑇−𝑙 the available SILC indicators at target

year 𝑇 with 𝑙 > 0, the flash estimate of the constant becnhmark model at target year 𝑇 is

defined as

�̂�𝑇 =1

𝑇 − 𝑙∑ �̃�𝑡

𝑇−𝑙

𝑡=1

.

The assumption that the population are constant over time might be inherently wrong for the

location indicators such as the ARPT of the income deciles which might rather show some

trending behavior especially in times of inflation. Note that the constant model can be written

as a linear regression model with zero slope parameter such that

�̃�𝑇 = 𝛼 + 𝜖𝑇

with 𝜖𝑇 representing the sampling noise. Thus, the linear model nests the constant model

while allowing for the presence of a linear trend with respect to the time index and hence the

flash estimate is formulated as

�̂�𝑇 = 𝛼 + 𝛽𝑇 + 𝜖𝑇 .

Thus, to produce the flash estimate at T we only need to obtain estimated versions of the

regression parameters α and β. We do so by running an ordinary least-squares algorithm on

the sample of available EU-SILC indicators.

The third benchmark model is based on the assumption that the best possible estimate of the

population indicator at 𝑇 is simply the last available EU-SILC indicator. According to this,

the flash estimate at 𝑇 takes on the following form

�̂�𝑇 = �̃�𝑇−𝑙.

38

4. Quality framework

The quality of the flash estimates depends on both the methods and sources used. Therefore,

setting and using a quality framework becomes a central part of the production process to

select the methods and auxiliary sources that perform best in reproducing EU-SILC indicators

for previous years. This would also allow to benchmark results stemming from Eurostat

exercise, including the estimates based on EUROMOD done in collaboration with ISER, as

well as other national initiatives. In particular, the United Kingdom and France are also

performing their own flash estimates at national level and therefore a common platform for

validating and benchmarking results is essential.

The quality framework is composed of two parts: (1) the quality assurance, which focuses on

analysing inconsistencies in the input data and includes several intermediate quality checks

along the process; (2) the quality assessment, which focuses on the historical performance of

different methods.

4.1. Quality Assurance

The quality framework is an essential tool for designing the production process of flash

estimates. Therefore, the quality framework doesn't focus only on the final results but

includes also the following steps:

Consistency analysis of auxiliary data sources: Consistency means that income

components and totals as measured by National Accounts, or unemployment as

measured by LFS and related EU-SILC statistics show similar trends. The lack of

such consistency at macro level in the trends showed by EU-SILC and auxiliary

sources would mean that a flash estimate generated by a model that includes the

aforementioned macro data as explanatory variables is not expected to be a good

estimation of the EU-SILC-derived indicator.

Intermediate checks along the stages of production of flash estimate: The main

purpose is to draw conclusions concerning the performance and the impact on results

of different methodological steps. For microsimulation, there are checks after

calibration, uprating and the simulation of benefits and taxes.

4.1.1. Consistency analysis of auxiliary sources

An important element is the use of auxiliary information in order to estimate changes in

income distribution based on the evolution of related indicators such as income components

from National Accounts, labour market changes etc. Therefore, this estimation relies on the

assumption that the trends observed in EU-SILC data and in auxiliary data sources are similar.

A retrospective assessment of consistency was performed on the auxiliary information used in

the estimation process: LFS and National Accounts. In addition, in the case of

39

microsimulation, the accuracy of income components simulated with EUROMOD and

observed in EU-SILC was tested for the base year.

For this, the original EU-SILC indicators and the estimated ones are compared based on two

main metrics:

(1) Accuracy - measuring the degree to which the point estimate is similar to the

reference value based on the mean absolute percentage error (MAPE):

𝑎𝑐𝑐𝑢𝑟𝑎𝑐𝑦 = 1 − 𝑀𝐴𝑃𝐸 = 1 − 𝑎𝑣𝑒𝑟𝑎𝑔𝑒𝑖=1𝑛 (𝑎𝑏𝑠 (

𝐸𝑆𝑇𝑖

𝑅𝐸𝐹𝑖− 1))

(2) Consistency - measuring the extent to which the year-on-year rates of changes are

similar across the time series:

𝑐𝑜𝑛𝑠𝑖𝑠𝑡𝑒𝑛𝑐𝑦 = 1 − 𝑎𝑣𝑒𝑟𝑎𝑔𝑒𝑖=1𝑛 (𝑎𝑏𝑠 (

𝐸𝑆𝑇𝑖

𝐸𝑆𝑇𝑖−1−

𝑅𝐸𝐹𝑖

𝑅𝐸𝐹𝑖−1))

Both measures involve subtraction from 1 in order to have higher values indicating better

performance. The two performance metrics allow conducting an extensive comparative

analysis across countries, years and methods and to summarise a large amount of information.

In addition for categorical variables in order to assess the similarity of two probabilistic

distributions (',VV ), we use the Hellinger distance (HD):

K

i P

Pi

O

OiK

i N

n

N

niVpiVpVVHD

1

2

1

2''

2

1)()(

2

1),(

where:

K is the total number of cells in the contingency table;

nOi is the frequency of cell i in the first data set (e.g. EU-SILC);

nPi is the frequency of cell i in the second data set (e.g. LFS);

N is the total size of the specific sources.

A HD value of 0% indicates a perfect similarity between the two probabilistic distributions,

whereas a HD value of 100% indicates a total discrepancy.24

24

Usually, in statistical analysis a Hellinger distance smaller than 5% is considered acceptable.

40

Labour Force Survey data

In the first stage, the socio-demographic structure of the input data is updated in line with the

labour and demographic information from the LFS data for the target year. As the main

calibration methods are at household level, the assessment is done for household level

variables which are based on basic demographic and especially employment individual

information.

In general, distributions of demographic variables are well-aligned in EU-SILC and LFS,

whereas higher discrepancies are observed for labour variables. Table 4.1 provides a summary

for the main demographic and labour variables at household level by country.

We can observe that in general the HD values are rather low for the main demographic

variables with average HD by country ranging from 0.46 for Italy to 4.64 for Croatia. For

several variables, there are issue related to the number of people 74+ as these are not included

in the LFS sample (SE, DK and EE). The degree of urbanisation for several countries (IT, LV,

EL, LU, MT, FI and SE) had a high HD so they were not used further on in the estimation.

The HD is higher for labour variables at household level especially for Romania but also for

specific groups for Bulgaria, Cyprus, Croatia, Sweden and Denmark.

Table 4.2 provides an overview on the trends in EU-SILC and LFS for main employment

variables at individual level. We can observe that once we rule out source inconsistencies

which are systematic and visible in the levels, the year on year changes are consistent. Two

performance metrics are used for assessing the similarity between EU-SILC and LFS:

accuracy for the levels and consistency for the trends. Therefore, for calibration, an

adjustment is done so that the update takes into account only the changes in LFS from the

base to the target year rather than the actual values.

A similar approach is taken for modelling employment transitions: the total number of

simulated labour market transitions in EUROMOD input data and their direction are

determined by relative changes in employment rates as shown in the LFS. However, as noted

in Rastrigina et al. (2015), employment changes are not always the same in LFS and EU-

SILC. The evolution of employment rates in LFS and EU-SILC for 2011-2013 follows

different trends in Luxembourg and for specific categories in the Czech Republic, Latvia and

France. There are several reasons for these discrepancies, such as differences in definitions,

imputations, survey methodology, as well as operational differences that may affect the nature

of non-response and sampling errors. A detailed discussion on these issues can be found in

Rastrigina et al. (2015).

41

Table 4.1: Average HD (%) for main variables used in the estimation by country: EU-SILC vs. LFS (2011-2014)

Variables excluded from calibration.Region of household: EE, CY, LV, LT, LU, MT: no regional breakdown for NUTS level 2. EL, HR: LFS and EU-SILC are based on different NUTS versions.

HD >= 5.0

DE, UK: (n/a)

BE BG CZ DK EE IE EL ES FR HR IT CY LV LT LU HU MT NL AT PL PT RO SI SK FI SE

Children 0-5 years 1.11 3.46 1.72 2.97 1.82 1.40 2.35 0.83 0.63 2.43 0.69 1.07 1.48 0.72 0.97 0.59 3.27 0.81 0.40 0.31 0.83 2.82 0.46 1.59 0.84 1.28

Children 6-9 years 1.28 3.48 0.56 2.23 1.97 1.94 0.90 0.48 1.20 3.30 0.22 1.89 1.06 0.89 2.30 0.67 2.17 0.59 0.97 0.64 0.59 3.37 1.94 1.09 1.28 1.55

Children 10-14 years 1.74 3.19 1.18 2.14 2.17 0.92 0.67 0.86 1.45 3.31 0.25 2.71 0.50 1.29 2.63 0.45 2.32 0.36 0.58 2.02 0.56 3.22 2.14 1.39 1.21 1.23

Children 15-19 years 0.96 1.66 0.21 2.34 1.02 2.64 1.08 0.69 0.71 2.46 0.39 3.70 1.01 0.54 1.81 1.30 1.98 0.57 1.75 1.78 0.61 2.33 0.71 2.40 0.83 1.15

Persons 20-24 years 0.69 2.85 0.89 2.48 1.03 1.23 1.09 0.33 0.89 2.37 0.43 5.51 1.10 1.13 2.19 1.87 0.90 1.36 0.87 0.34 0.49 1.92 0.96 2.37 1.53 3.55

Females 25-34 years 0.46 4.66 0.78 0.71 0.19 2.65 1.55 0.58 0.53 3.78 0.66 1.52 1.05 0.26 1.29 1.30 0.69 0.66 0.62 2.49 0.88 4.60 0.43 2.68 0.58 0.52

Males 25-34 years 0.62 4.41 2.33 0.69 0.46 2.01 2.18 0.79 0.65 3.60 0.44 2.43 0.61 0.91 0.89 2.05 2.09 0.59 2.15 2.62 0.62 4.48 0.42 2.44 0.81 0.47

Females 35-49 years 1.31 1.72 0.91 2.59 1.01 1.48 1.60 0.56 0.69 9.06 0.50 2.90 0.67 0.44 0.66 2.12 2.10 0.51 1.05 0.74 0.54 2.42 1.81 1.52 0.59 0.26

Males 35-49 years 0.92 2.10 0.80 1.62 2.08 2.03 2.31 0.89 0.37 7.89 0.55 1.36 0.68 0.72 0.69 1.42 1.35 0.54 1.57 1.51 0.62 2.98 1.05 1.15 0.88 0.87

Females 50-64 years 1.31 1.16 0.80 2.73 1.26 1.31 1.35 0.69 0.58 9.31 0.48 1.97 0.42 0.83 0.98 3.42 2.25 0.40 1.12 0.73 0.59 0.98 1.25 1.26 1.06 0.30

Males 50-64 years 0.82 1.67 0.90 1.92 1.73 2.54 1.35 0.42 0.74 7.45 0.48 1.10 0.68 0.88 0.92 3.25 1.93 0.31 1.35 0.34 0.86 0.80 0.67 1.10 0.72 0.18

Persons 65-69 years 0.57 1.36 0.72 2.89 1.12 0.41 2.90 0.42 0.56 3.14 0.37 0.82 0.62 0.61 0.78 1.18 0.40 0.46 0.77 0.36 0.49 1.34 1.04 1.44 0.56 0.68

Persons 70-74 years 3.38 0.35 0.64 1.45 1.43 0.49 1.39 0.53 0.35 2.52 0.38 1.02 0.78 0.57 0.65 0.50 0.98 0.71 0.80 0.74 0.29 1.49 0.74 1.05 0.39 0.79

Person 75 years and older 2.70 0.78 0.27 14.06 12.10 0.86 1.08 0.65 0.59 4.34 0.67 1.36 0.74 1.51 2.60 0.49 1.25 0.84 0.55 0.34 0.55 1.89 1.57 1.86 1.75 18.51

AVERAGE on main

demographic variables 1.28 2.35 0.91 2.92 2.10 1.56 1.56 0.62 0.71 4.64 0.46 2.10 0.81 0.81 1.38 1.47 1.69 0.62 1.04 1.07 0.61 2.47 1.09 1.67 0.93 2.24

Full-time employed persons

(15-74 years) 1.34 3.80 1.70 4.52 3.20 2.64 1.38 0.90 0.70 1.90 1.60 2.47 0.50 2.10 3.27 0.90 1.85 1.68 1.60 0.70 1.90 5.62 2.29 3.21 1.60 5.64

Part-time employed persons

(15-74 years) 0.74 3.61 2.50 7.06 1.90 2.41 3.62 0.90 0.70 1.70 1.10 0.78 0.80 3.00 1.22 2.60 4.99 2.31 1.30 1.00 1.10 0.85 3.01 3.17 0.60 1.77

Self-employed persons

(15-74 years) 3.95 2.07 0.90 1.62 1.20 2.97 2.39 1.50 0.40 6.14 1.30 5.37 2.50 1.60 1.08 1.80 3.38 2.24 0.70 0.90 2.30 2.90 2.35 2.26 0.30 1.85

Unemployed persons

(15-74 years) 0.63 5.09 3.60 1.29 1.90 1.78 2.06 2.70 1.00 3.51 2.80 1.40 1.30 1.20 4.49 1.00 1.53 3.16 1.30 1.20 1.00 7.53 0.71 0.33 1.20 3.59

Retired persons

(15-74 years) 3.19 2.27 1.00 4.42 1.20 0.74 3.14 0.60 1.60 0.77 1.00 1.84 0.70 1.90 1.43 1.60 1.57 1.69 1.60 1.50 2.40 3.82 1.08 1.66 1.10 1.79

AVERAGE on labour

variables 1.97 3.37 1.96 3.78 1.87 2.11 2.52 1.32 0.87 2.80 1.58 2.37 1.14 1.95 2.29 1.59 2.66 2.52 1.28 1.07 1.75 4.14 1.89 2.13 0.95 2.93

Household size 4.32 8.46 1.99 2.39 1.19 1.86 5.69 0.48 1.07 1.26 0.53 2.89 0.95 2.56 1.18 0.51 4.32 1.12 0.46 3.15 0.46 6.67 2.91 5.86 0.34 7.47

Region of household 1.22 1.22 1.08 1.51 0.10 0.25 5.30 0.08 0.74 0.45 0.21 2.55 0.26 4.11 0.00 0.64 0.27 3.47

Degree of urbanisation of

household 2.76 1.57 1.49 0.49 0.43 3.75 10.02 2.39 2.83 5.50 8.41 2.42 11.61 1.11 6.39 4.48 33.97 1.61 0.47 1.05 1.41 3.53 4.13 5.10 7.94 12.60

1.64 2.78 1.24 2.95 1.93 1.74 2.35 0.85 1.05 4.08 1.06 2.22 1.38 1.20 1.83 1.56 3.46 1.09 1.02 1.22 0.90 3.18 1.44 2.06 1.18 3.11

Calibration variables

Variables refer to current

reference period

Variables refer to income

reference period (IRP)

Number of …. in

household

AVERAGE on all variables

42

Table 4.2: Average accuracy and consistency for employment variables: EU-SILC vs. LFS (2011-

2014)

2012

2011

2013

2012

2014

2013AVERAGE > 0.95 AVERAGE > 0.95

BE Employed persons (15-74 years) LFS(y) 0.28 -0.53 0.69 0.990 0.9900 0.961 0.9610

SILC(y+1) 1.80 -0.86 -0.48

Unemployed persons (15-74 years) LFS(y) 0.30 3.36 -1.65 0.902 0.9020 0.877 0.8770

SILC(y+1) -21.92 7.32 -4.86

Retired persons (older than 64 years) LFS(y) 2.28 1.29 1.80 0.991 0.9910 0.921 0.9210

SILC(y+1) 3.80 2.09 1.50

BG Employed persons (15-74 years) LFS(y) -0.03 -0.37 0.37 0.983 0.9830 0.974 0.9740

SILC(y+1) -2.38 1.72 1.11

Unemployed persons (15-74 years) LFS(y) 4.99 -1.72 -4.18 0.972 0.9720 0.841 0.8410

SILC(y+1) 7.49 -5.16 -6.57

Retired persons (older than 64 years) LFS(y) 1.91 -0.06 -0.86 0.979 0.9790 0.981 0.9810

SILC(y+1) -0.27 2.33 0.83

CZ Employed persons (15-74 years) LFS(y) 0.89 1.02 0.63 0.991 0.9910 0.969 0.9690

SILC(y+1) 1.39 -0.20 1.57


SILC(y+1) 4.96 2.93 -9.76


SILC(y+1) 3.22 3.50 2.82

DK Employed persons (15-74 years) LFS(y) -1.10 0.55 1.08 0.974 0.9740 0.981 0.9810

SILC(y+1) -2.56 3.34 4.60

Unemployed persons (15-74 years) LFS(y) -0.65 -8.06 -1.90 0.825 0.8250 0.915 0.9150

SILC(y+1) 32.03 -22.76 3.23


SILC(y+1) 2.23 -4.58 1.07

DE Employed persons (15-74 years) LFS(y) 0.75 1.38 1.11 0.983 0.9830 0.969 0.9690

SILC(y+1) -2.03 1.98 .


SILC(y+1) -1.81 -2.18 .

Retired persons (older than 64 years) LFS(y) . . .

SILC(y+1) . . .

EE Employed persons (15-74 years) LFS(y) 1.98 0.42 1.25 0.992 0.9920 0.983 0.9830

SILC(y+1) 0.49 0.81 1.65


SILC(y+1) -13.63 -17.56 -15.77


SILC(y+1) -0.85 3.87 1.64

IE Employed persons (15-74 years) LFS(y) -0.87 1.20 1.90 0.987 0.9870 0.944 0.9440

SILC(y+1) -0.26 3.13 .


SILC(y+1) -2.29 -14.17 .


SILC(y+1) 7.73 6.58 .

EL Employed persons (15-74 years) LFS(y) -9.59 -5.51 2.05 0.980 0.9800 0.975 0.9750

SILC(y+1) -9.39 -0.54 1.35


SILC(y+1) 17.64 -0.76 -3.96


SILC(y+1) 3.97 -4.37 1.41

ES Employed persons (15-74 years) LFS(y) -4.77 -3.33 0.72 0.988 0.9880 0.967 0.9670

SILC(y+1) -4.67 -2.72 3.45


SILC(y+1) 9.98 3.80 -5.62


SILC(y+1) 1.16 5.66 2.26

FR Employed persons (15-74 years) LFS(y) 0.30 -0.29 1.91 0.981 0.9810 0.983 0.9830

SILC(y+1) -0.33 -3.96 3.33

Unemployed persons (15-74 years) LFS(y) 4.11 3.55 11.32 0.842 0.8420 0.832 0.8320

SILC(y+1) 6.78 21.00 -15.91


SILC(y+1) 2.24 4.09 1.41

HR Employed persons (15-74 years) LFS(y) -2.12 -1.65 1.58 0.971 0.9710 0.933 0.9330

SILC(y+1) 1.82 1.43 -0.11


SILC(y+1) 2.26 4.40 -9.41


SILC(y+1) 1.02 0.67 6.22

IT Employed persons (15-74 years) LFS(y) -0.03 -1.34 0.87 0.991 0.9910 0.965 0.9650

SILC(y+1) -0.56 -0.04 .


SILC(y+1) 9.95 11.22 .


SILC(y+1) 0.20 4.36 .

CY Employed persons (15-74 years) LFS(y) -1.61 -6.75 -1.62 0.985 0.9850 0.946 0.9460

SILC(y+1) -3.74 -5.88 .


SILC(y+1) 32.79 29.78 .


SILC(y+1) 1.82 -4.32 .

CONSISTENCY ACCURACYCountry Variable

(Number of…)

Source Growth rate in %, YoY

43

2012

2011

2013

2012

2014

2013AVERAGE > 0.95 AVERAGE > 0.95

LV Employed persons (15-74 years) LFS(y) 1.40 2.26 -1.04 0.986 0.9860 0.994 0.9940

SILC(y+1) -0.32 4.09 -1.75


SILC(y+1) -20.63 -12.92 -20.09

Retired persons (older than 64 years) LFS(y) -0.90 -0.83 1.59 0.997 0.9970 0.990 0.9900

SILC(y+1) -0.80 -0.30 1.82

LT Employed persons (15-74 years) LFS(y) 1.21 0.46 1.70 0.986 0.9860 0.991 0.9910

SILC(y+1) -0.07 2.02 0.40


SILC(y+1) -16.45 -11.29 -8.36

Retired persons (older than 64 years) LFS(y) -0.67 0.34 -0.53 0.993 0.9930 0.982 0.9820

SILC(y+1) -0.45 -0.41 -1.58

LU Employed persons (15-74 years) LFS(y) 7.37 1.26 4.94 0.897 0.8970 0.945 0.9450

SILC(y+1) -8.45 6.07 .


SILC(y+1) -3.14 7.53 .


SILC(y+1) -6.89 11.64 .

HU Employed persons (15-74 years) LFS(y) 2.65 2.37 5.46 0.985 0.9850 0.990 0.9900

SILC(y+1) 1.42 2.10 8.47


SILC(y+1) 1.95 3.74 -26.28


SILC(y+1) 3.93 1.68 -0.57

MT Employed persons (15-74 years) LFS(y) 2.48 2.53 2.37 0.990 0.9900 0.984 0.9840

SILC(y+1) 1.91 1.10 3.49


SILC(y+1) 8.38 1.79 -5.23


SILC(y+1) 8.65 3.36 8.67

NL Employed persons (15-74 years) LFS(y) 0.13 -1.76 -2.35 0.987 0.9870 0.921 0.9210

SILC(y+1) -0.58 -2.64 0.05


SILC(y+1) 7.78 6.19 9.98

Retired persons (older than 64 years) LFS(y) 4.45 3.78 -28.49 0.888 0.8880 0.918 0.9180

SILC(y+1) 4.00 2.48 3.27

AT Employed persons (15-74 years) LFS(y) 0.95 0.42 -0.01 0.991 0.9910 0.971 0.9710

SILC(y+1) 0.91 -0.07 2.09


SILC(y+1) 3.68 7.15 -1.79


SILC(y+1) 3.08 2.80 2.15

PL Employed persons (15-74 years) LFS(y) 0.76 0.38 2.09 0.962 0.9620 0.963 0.9630

SILC(y+1) -3.15 1.60 -4.18


SILC(y+1) 4.58 3.52 -14.65


SILC(y+1) 2.59 3.87 1.87

PT Employed persons (15-74 years) LFS(y) -5.51 -2.73 4.67 0.993 0.9930 0.964 0.9640

SILC(y+1) -5.23 -1.87 3.80


SILC(y+1) 23.85 4.65 -11.84


SILC(y+1) -0.65 2.64 1.84

RO Employed persons (15-74 years) LFS(y) 0.66 0.15 1.83 0.932 0.9320 0.886 0.8860

SILC(y+1) 9.05 1.44 -8.86

Unemployed persons (15-74 years) LFS(y) -1.87 3.47 -0.48 0.865 0.8650 -0.137 -0.1370

SILC(y+1) 10.82 -3.44 -21.54

Retired persons (older than 64 years) LFS(y) -0.25 1.24 2.72 0.961 0.9610 0.929 0.9290

SILC(y+1) -9.03 2.99 3.84

SI Employed persons (15-74 years) LFS(y) 0.95 -2.94 -0.78 0.980 0.9800 0.989 0.9890

SILC(y+1) -1.18 -4.36 1.62


SILC(y+1) 17.70 23.14 -7.03


SILC(y+1) 3.63 1.37 5.17

SK Employed persons (15-74 years) LFS(y) 1.41 -0.82 1.63 0.970 0.9700 0.948 0.9480

SILC(y+1) -3.08 3.56 1.46


SILC(y+1) 3.64 0.83 -7.00


SILC(y+1) -2.43 6.86 -4.55

FI Employed persons (15-74 years) LFS(y) -0.13 -0.34 -1.60 0.992 0.9920 0.962 0.9620

SILC(y+1) 1.32 -0.66 -2.12


SILC(y+1) 0.42 11.11 13.45


SILC(y+1) 3.95 3.46 3.77

SE Employed persons (15-74 years) LFS(y) -0.16 1.07 0.04 0.993 0.9930 0.926 0.9260

SILC(y+1) -0.11 0.62 1.64


SILC(y+1) 10.17 4.75 -12.90

Retired persons (older than 64 years) LFS(y) 3.93 6.07 -0.36 0.972 0.9720 0.557 0.5570

SILC(y+1) 2.13 3.04 3.16

ACCURACYCountry Variable

(Number of…)

Source Growth rate in %, YoY CONSISTENCY

44

National Accounts data

Data from National Accounts and EU-SILC were checked in terms of levels and trends for the

period 2008-2014. The intermediate checks showed that for property and self-employment

income, the two sources are very different. Moreover, these components vary significantly

between the quarterly National Accounts data available for the flash estimates and later

revisions. Therefore, only income from employment, social benefits and taxes from the

National Accounts are used for time series modelling. A parallel project25

is ongoing in

Eurostat on the reconciliation of macro-micro data on income and this can also feed the work

on flash estimates in the future

EUROMOD data

EUROMOD input data are based on cross-sectional EU-SILC data. However, these data

undergo some transformations: e.g. missing values are imputed, children born after the

income reference period are dropped from the data, employment characteristics observed in

the data collection period are adjusted to match the income observed in the income reference

period; in some countries gross incomes are recalculated from net incomes using different

(arguably more precise) algorithms than in the original EU-SILC (e.g. in Greece, Italy and

Croatia).. In the case of Luxembourg the discrepancies between the nowcasted estimates and

the Eurostat indicators in the base year are caused by the fact that households with at least one

international civil servant have been excluded from the EUROMOD input data (645

households), as they have a specific tax-benefit system which is different from the national

one.26

Further discrepancies between EU-SILC and EUROMOD output data for the base year may

arise due to precision of simulations when information in the EU-SILC data is limited, issues

of benefit non take-up or tax evasion, under-reporting of income components in the EU-SILC

data, as well as some differences in income concepts and definitions, coverage and reference

period (e.g. EU-SILC collects taxes paid in the income reference paid whereas EUROMOD

simulates the taxes arising earned in the income reference period).

In table 4.3 the average accuracy for the main income components is calculated by comparing

the EU-SILC 2012 (income 2011) with the EUROMOD output file based on the same data. If

there is no difference between them, the accuracy values are equal to 1.

Overall, the non-simulated income components27

are very similar to their original EU-SILC

values. However, there are some country-specific differences due to imputations and other

data transformations: e.g. for self-employment income in Denmark, Spain and Sweden.;

25

http://www.iariw.org/dresden/gregorini.pdf 26

This limitation is going to be addressed in the next update of the Luxembourg model based on EU-SILC 2015

data. 27

The non-simulated components are employment and self-employment income, inter-household transfers

received/paid, property income, private pension and most public pensions.

45

Larger differences are observed for the simulated income components with accuracy being

below 0.90 for most of the countries for taxes. The low accuracy in Hungary and Slovakia are

explained by relatively large level differences in small values in D10 and D20.

Table 4.3: Average accuracy for the main income components in EUROMOD by country in the base

year (2011)

Country Equivalised

disposable

income

Employment Self-

employment

Taxes Social

benefits

Average

country

BE 0.97 1.00 1.00 0.84 0.98 0.96

BG 0.99 1.00 1.00 0.59 0.91 0.90

CZ 1.00 1.00 1.00 0.73 0.99 0.94

DK 0.98 0.99 -10.47 0.93 0.90 -1.33

DE 0.99 0.99 0.95 0.94 0.90 0.95

EE 1.00 1.00 1.00 0.93 0.97 0.98

IE 0.99 1.00 1.00 0.83 0.91 0.95

EL 1.00 0.99 0.93 0.94 0.95 0.96

ES 0.86 0.86 -0.91 0.82 0.95 0.52

FR 0.99 1.00 1.00 0.73 0.99 0.94

HR 0.99 1.00 0.92 0.65 0.98 0.91

IT 0.99 0.97 0.80 0.96 0.77 0.90

CY 0.99 1.00 1.00 0.96 0.92 0.97

LV 0.98 1.00 0.99 0.64 0.98 0.92

LT 0.98 1.00 1.00 0.75 0.98 0.94

LU 0.98 0.98 0.96 0.98 0.96 0.97

HU 0.96 1.00 0.93 0.02 0.96 0.77

MT 0.99 0.99 0.96 0.77 0.91 0.92

NL 0.99 0.94 0.71 0.87 0.87 0.87

AT 0.99 1.00 1.00 0.89 0.98 0.97

PL 0.99 1.00 1.00 0.88 0.97 0.97

PT 0.98 1.00 1.00 0.82 0.96 0.95

RO 0.99 0.99 0.98 0.69 0.94 0.92

SI 0.98 0.69 0.74 0.80 0.96 0.84

SK 0.98 1.00 1.00 -0.45 0.88 0.68

FI 0.99 1.00 1.00 0.90 0.96 0.97

SE 0.99 1.00 -0.12 0.96 0.97 0.76

Average

income 0.98 0.98 0.42 0.75 0.94 0.81

Note: Average for indicators: D10, D20, D30, D40, MEDIAN, D60, D70, D80, D90, SUM

Base year= 2013 for EL; UK (n/a)

Source: EUROMOD 3.22+

In table 4.4, there are the income components with a lower share in the total disposable

income. As the other non-simulated income components, they are on average very similar to

their original EU-SILC values. The performance of property income and private pensions is

below 0.98 due the impact in the average of the lower accuracy of Spain due to revisions in

the data after the EUROMOD input file was created.

46

Table 4.4: Average accuracy for other income components in EUROMOD in the base year (2011)

Other income components Average income

Inter-household transfers paid 0.98

Inter-household transfers received 0.99

Property income 0.97

Private pensions 0.97 Note: Average for indicators: D10, D20, D30, D40, MEDIAN, D60, D70, D80, D90, SUM and all countries except UK, base

year = 2013 for EL


Both the differences in the income components and revisions in the data lead to an equivalised

disposable income in EUROMOD that differs from the original EU-SILC in the base year. In

order to account for these differences, an alignment factor is calculated for each household.

The factor is equal to the absolute difference between the value of equivalised household

disposable income in EU-SILC 2012 and the EUROMOD estimate for the same period and

income concept. For consistency reasons, the same household specific factor is applied to all

later policy years. This is based on the assumption that the discrepancy between EUROMOD

and EU-SILC estimates remains stable over time.

4.1.2. Intermediate quality checks

In the retrospective quality assessment, intermediate quality checks were implemented

throughout the process. This allows for a more precise identification of the problematic issues

in the past years: e.g. identify discrepancies between specific microsimulated income

components and the observed values in EU-SILC.

Demographic and labour characteristics

For the microsimulation approach an essential element is the update of labour information and

demographics. For categorical demographic or labour market variables we measure the

similarity of distributions using the Hellinger distance (HD). We compare variables from the

base year (2011) after calibration to the target year with figures from the actual EU-SILC for

the corresponding year. This is a measure of performance after the first stage of input data

transformations aimed at assessing how close we are to EU-SILC in terms of labour and

demographic characteristics. The results in table 4.5 are in line with the previous section: the

variables with a higher HD are the ones related to the labour information.

This measure of distributions similarity was implemented for two calibration procedures:

- CAL H: calibration at household level based on the marginal distributions (levels) from

LFS in the target year for: household size, number of people by age group and sex,

number of people part/full-time employed/self-employed/retired, region and urban.

- CAL_H_A: calibration at household level based on the changes in the shares for the

same variables.

47

Table 4.5a: Hellinger distances after not adjusted calibration calculated for calibration variables

Calibration variables BE BG CZ DK EE IE EL ES FR HR IT CY LV LT LU HU MT NL AT PL PT RO SI SK FI SE

Children 0-5 years 1.14 3.00 1.93 3.16 1.81 1.30 2.59 0.48 0.42 1.90 0.75 0.75 1.27 0.64 0.94 0.69 3.32 0.72 0.34 0.33 0.73 3.99 0.42 0.75 0.82 1.17

Children 6-9 years 1.19 3.35 0.55 2.32 1.77 1.75 0.98 0.81 1.00 2.77 0.23 1.82 0.89 0.92 2.08 0.71 2.44 0.46 0.89 0.58 0.62 4.42 1.95 0.98 1.44 1.51

Children 10-14 years 1.57 3.01 1.36 2.14 1.86 0.74 0.49 1.04 1.43 3.43 0.25 3.03 0.44 1.37 2.64 0.39 2.41 0.30 0.46 1.92 0.57 4.31 2.39 0.94 1.28 1.10

Children 15-19 years 0.83 1.57 0.26 2.38 0.76 2.65 1.20 0.42 0.77 1.99 0.36 3.39 0.96 0.49 1.59 1.18 1.64 0.59 1.65 1.60 0.62 2.17 0.94 2.35 0.87 1.13

Persons 20-24 years 0.61 2.27 0.98 2.35 1.34 1.07 0.48 0.20 1.25 2.27 0.33 5.30 1.10 1.46 2.25 1.58 0.64 1.08 0.89 0.41 0.47 2.50 0.94 2.19 1.36 3.31

Females 25-34 years 0.48 4.34 0.91 1.00 0.51 2.77 1.50 0.43 0.74 3.16 0.70 1.43 1.18 0.26 0.89 1.15 0.14 0.77 0.63 2.52 1.00 5.40 0.56 2.81 0.50 0.39

Males 25-34 years 0.67 4.04 2.27 0.82 0.54 2.04 1.71 0.52 0.82 3.73 0.39 2.47 0.60 1.04 0.61 2.01 1.14 0.64 2.19 2.68 0.64 5.15 0.57 2.68 0.68 0.47

Females 35-49 years 1.36 1.54 0.91 2.70 0.90 1.41 1.63 0.18 0.71 8.76 0.46 3.04 0.58 0.02 0.50 2.13 2.05 0.53 1.15 0.99 0.59 2.57 1.96 1.90 0.59 0.37

Males 35-49 years 0.90 1.69 0.70 1.66 1.55 1.97 2.42 0.86 0.34 7.35 0.51 1.17 0.64 0.96 0.52 1.55 1.03 0.54 1.61 1.67 0.64 3.50 1.12 1.38 0.88 0.89

Females 50-64 years 1.47 0.92 0.93 2.69 1.18 1.49 1.61 0.30 0.44 1.18 0.31 1.44 0.46 0.39 1.02 3.53 1.90 0.38 1.03 0.48 0.68 1.21 1.31 1.14 1.24 0.34

Males 50-64 years 0.94 1.64 1.02 2.03 1.34 2.76 1.46 0.41 0.87 6.93 0.41 1.11 0.79 0.35 0.79 3.43 1.68 0.24 1.46 0.42 0.73 0.97 0.86 0.85 0.71 0.06

Persons 65-69 years 0.80 1.41 0.75 3.09 0.88 0.20 3.24 0.33 0.67 2.85 0.38 0.96 0.54 0.62 0.63 1.36 0.23 0.50 0.74 0.44 0.44 1.73 0.93 1.35 0.54 0.69

Persons 70-74 years 3.28 0.27 0.53 1.45 1.33 0.58 1.60 0.71 0.45 2.30 0.33 1.22 1.03 0.58 0.71 0.54 0.79 0.85 0.80 0.72 0.35 2.09 0.47 1.13 0.33 0.75

Person 75 years and older 2.54 0.76 0.33 0.64 0.89 0.53 0.58 4.56 0.76 1.53 0.60 1.46 2.58 0.49 1.20 1.08 0.59 0.35 0.58 2.59 1.74 1.83 1.82


(15-74 years) 1.09 4.39 1.69 5.83 3.47 3.13 2.42 1.05 0.67 2.11 1.19 2.49 0.47 2.31 2.69 1.05 2.27 1.18 1.22 0.96 2.17 5.51 1.55 3.18 2.38 5.49


(15-74 years) 1.11 3.09 1.78 1.78 1.78 3.18 2.33 0.31 0.73 2.40 1.55 0.75 0.79 1.99 2.67 3.05 4.69 1.72 0.88 2.11 0.91 0.43 1.49 4.09 1.33 1.72


(15-74 years) 3.86 1.67 0.50 3.00 1.20 1.77 2.11 0.74 1.45 6.16 0.26 5.55 1.45 1.06 0.77 1.88 3.28 0.55 0.35 0.75 2.13 1.59 3.42 2.36 0.25 1.01

Unemployed persons

(15-74 years) 0.31 4.02 3.42 1.04 2.30 1.76 1.13 1.47 1.72 3.83 4.31 1.40 1.63 1.38 4.68 0.57 1.02 2.07 1.71 0.96 1.40 6.55 2.54 1.18 0.71 1.98

Retired persons

(15-74 years) 2.04 2.20 1.03 2.07 0.77 1.29 2.91 1.39 1.89 0.10 0.76 2.20 0.90 2.14 0.81 1.70 1.16 2.68 2.37 0.98 3.04 3.15 1.57 1.72 1.00 2.32

Household size 4.32 8.46 1.99 2.39 1.19 1.99 5.22 0.37 1.07 1.25 0.53 2.89 0.95 2.56 1.18 0.51 4.32 1.12 0.46 3.15 0.46 6.65 3.34 5.86 0.34 7.47

Region of household 1.22 1.22 1.08 1.51 0.03 0.21 5.31 0.08 0.74 0.45 0.21 2.55 0.26 4.08 0.64 0.27 3.47


household 2.76 1.57 1.49 0.49 0.43 3.47 2.91 2.83 5.50 2.42 1.11 4.48 1.61 0.47 1.05 1.41 3.55 4.20 5.10

AVERAGE country 2.56 1.20 2.19 1.35 1.73 1.90 0.71 1.19 3.55 3.55 2.21 0.86 1.10 1.53 1.58 1.87 0.91 1.01 1.25 0.93 3.37 1.63 2.11 0.92 1.78



Number of …. in

household


reference period

48

Table 4.5a: Hellinger distances after adjusted calibration calculated for calibration variables

Variables excluded from calibration.

HD >= 5.0

DE, UK: (n/a). NL: Unemployed perons (15-74 years): LFS based on ILOSTAT.

Calibration variables BE BG CZ DK EE IE EL ES FR HR IT CY LV LT LU HU MT NL AT PL PT RO SI SK FI SE

Children 0-5 years 1.09 2.38 1.36 0.83 1.99 1.68 1.56 0.60 1.07 1.82 0.43 1.07 1.31 0.81 0.58 0.49 5.28 0.95 0.63 0.57 0.55 4.35 0.91 3.90 0.61 0.55

Children 6-9 years 0.47 2.27 0.37 0.69 1.18 1.87 0.70 0.49 0.79 1.60 0.24 0.74 2.31 0.79 1.13 0.42 2.18 0.64 0.53 0.56 0.83 5.30 0.32 1.81 0.81 0.77

Children 10-14 years 1.24 1.36 0.99 0.80 1.30 1.40 0.61 1.23 0.37 0.72 0.22 1.51 0.83 1.00 3.49 0.60 3.53 0.40 0.55 0.49 0.92 3.61 1.01 2.47 0.74 0.80

Children 15-19 years 0.57 1.17 0.27 0.70 0.72 0.73 1.22 1.07 1.02 1.48 0.22 1.01 1.26 0.89 2.40 0.49 2.34 0.53 0.45 0.53 0.48 3.26 0.63 2.37 0.27 0.26

Persons 20-24 years 0.58 2.54 1.04 1.69 1.28 1.77 0.62 0.76 1.30 0.46 0.90 1.33 0.65 1.45 1.84 1.39 0.88 1.10 0.68 0.42 0.16 3.06 0.68 2.07 0.74 1.02

Females 25-34 years 0.55 2.14 0.76 0.84 0.51 1.08 0.79 0.29 0.71 1.35 0.47 1.04 1.03 0.11 1.39 0.81 1.15 0.39 0.62 0.33 0.80 5.85 0.41 1.46 0.81 0.40

Males 25-34 years 1.22 1.91 0.62 0.82 0.29 0.52 0.51 0.29 0.89 1.01 0.19 0.44 0.81 0.53 1.63 0.87 0.52 0.78 1.74 0.25 0.46 5.63 0.64 1.19 0.71 0.12

Females 35-49 years 0.32 1.76 1.07 0.69 0.43 1.02 0.56 0.44 0.66 0.52 0.19 0.79 0.68 0.20 1.09 0.49 0.55 0.37 0.47 0.75 0.45 2.53 0.59 1.29 0.52 0.33

Males 35-49 years 0.55 2.15 0.77 0.73 0.74 0.69 0.64 0.65 0.51 1.10 0.15 1.65 0.69 1.12 0.78 0.71 0.64 0.40 0.93 0.61 0.52 3.90 1.00 0.82 1.14 0.32

Females 50-64 years 0.70 1.30 0.51 0.39 0.19 0.98 0.59 1.54 0.42 0.36 0.54 0.50 0.74 0.60 1.64 1.29 0.82 0.66 0.74 0.67 0.57 0.92 0.50 1.14 0.81 0.55

Males 50-64 years 0.47 1.64 0.59 0.95 0.45 0.64 0.69 0.24 0.72 0.95 0.29 0.71 0.97 0.44 0.58 1.15 0.61 0.56 0.52 0.31 0.78 0.73 0.81 1.31 0.28 0.34

Persons 65-69 years 0.69 1.23 0.26 1.14 1.39 1.17 0.57 0.50 0.56 0.73 0.29 0.73 1.09 0.60 1.41 0.77 0.72 0.49 0.62 0.59 0.94 1.89 1.19 1.03 0.32 0.27

Persons 70-74 years 0.53 0.77 0.81 0.62 0.84 0.84 0.49 0.44 0.59 0.64 0.29 0.99 1.03 0.62 1.16 0.37 0.70 0.59 0.94 0.26 0.34 1.90 0.92 0.84 0.48 0.44

Person 75 years and older 0.91 0.95 0.28 0.98 0.30 0.16 0.34 0.48 0.27 0.92 0.80 0.24 1.01 0.53 0.28 1.06 0.40 0.32 0.78 2.32 0.73 0.64 0.35


(15-74 years) 1.65 2.92 1.05 0.58 1.65 1.71 0.78 1.53 0.58 1.78 0.65 2.14 1.50 1.11 3.34 1.41 1.04 0.86 0.68 0.84 1.99 5.92 1.71 1.47 1.01 0.47


(15-74 years) 0.41 1.70 1.10 1.35 0.65 1.19 1.06 0.96 1.38 0.80 0.30 1.34 0.68 0.87 0.73 2.51 0.63 0.46 0.61 0.62 0.62 0.44 0.99 0.85 1.36 0.48


(15-74 years) 0.57 1.76 0.96 0.61 1.09 1.11 1.32 0.76 0.75 0.64 0.43 0.80 1.21 1.12 2.18 2.01 0.55 0.95 0.69 0.74 0.98 2.21 0.64 0.92 0.36 0.53

Unemployed persons

(15-74 years) 0.61 2.11 1.19 0.62 1.51 0.88 0.59 0.33 0.75 1.34 1.79 3.33 1.85 1.29 2.17 0.90 1.98 1.28 0.96 0.73 2.63 1.66 2.28 1.12 1.59 0.59

Retired persons

(15-74 years) 1.04 0.88 0.72 1.23 0.67 1.36 0.72 0.69 0.86 0.71 0.21 1.52 0.36 2.08 1.03 0.65 0.53 2.19 0.64 0.41 1.02 2.55 0.57 0.78 0.31 0.15

Household size 0.43 4.63 1.10 1.00 1.26 1.45 0.91 0.15 0.39 0.76 0.43 1.17 1.14 1.84 1.39 0.61 0.92 0.22 0.44 0.93 0.58 5.22 1.34 1.36 0.26 1.59

Region of household 0.74 1.05 0.42 1.35 0.03 0.13 1.97 0.09 0.40 0.45 0.22 0.60 0.44 3.02 0.00 0.67 0.35 1.49


household 0.41 1.52 0.94 0.37 0.41 1.59 0.20 0.86 5.00 0.83 0.27 3.28 1.27 0.76 0.30 0.35 3.67 0.56 2.14

AVERAGE country ` 0.72 1.82 0.78 0.86 0.93 1.12 0.76 0.61 0.79 1.15 0.41 1.17 1.05 0.86 1.55 1.01 1.29 0.75 0.67 0.54 0.78 3.18 0.84 1.44 0.66 0.58


reference period



Number of …. in

household

49

Tables 4.5a and b show the HD values for the two calibration methods with particular

problems for Romania, Sweden and Croatia. It can be noticed that in general the calibration

at the household level with adjustment improves the consistency between the 'updated' labour

variables and EU-SILC data from the target year.

Table 4.6 provides a quick overview on the performance of the different methods for updating

the demographic and labour characteristics by country for several indicators: average for the

positional indicators (deciles), quintile share ratio (QSR) and the at-risk-of-poverty rate.

Table 4.6: Average consistency scores

Indicator Positional indicators Indicator QSR Indicator AROP

Country cal_H_

S2L

cal_H_

S2L_A

lab_I_

S2L Total

cal_H_

S2L

cal_H_

S2L_A

lab_I_

S2L Total

cal_H_

S2L

cal_H_

S2L_A

lab_I_

S2L Total

BE 0.980 0.977 : 0.979 0.981 0.983 : 0.982 0.970 0.966 : 0.968

BG 0.964 0.961 0.963 0.962 0.954 0.955 0.946 0.953 0.957 0.953 0.965 0.956

CZ 0.990 0.989 0.988 0.989 0.971 0.967 0.970 0.969 0.937 0.912 0.901 0.922

DK 0.983 0.986 0.991 0.985 0.964 0.964 0.960 0.964 0.976 0.982 0.998 0.981

DE 0.974 0.987 0.987 0.981 0.909 0.904 0.906 0.906 0.948 0.969 0.979 0.961

EE 0.967 0.966 0.967 0.967 0.931 0.918 0.919 0.923 0.926 0.936 0.955 0.936

IE 0.976 0.968 0.984 0.973 0.771 0.737 0.973 0.778 0.816 0.832 0.889 0.831

EL 0.989 0.973 0.962 0.973 0.958 0.934 0.817 0.891 0.984 0.982 0.957 0.972

ES : : 0.979 0.979 : : 0.951 0.951 : : 0.956 0.956

FR 0.981 0.988 0.987 0.985 0.978 0.993 0.984 0.985 0.964 0.981 0.980 0.974

HR 0.958 0.976 0.976 0.976 0.965 0.963 0.961 0.963 0.962 0.957 0.950 0.956

IT 0.990 0.987 0.969 0.986 0.986 0.979 0.876 0.971 0.979 0.977 0.948 0.975

CY 0.949 0.963 0.971 0.958 0.947 0.937 0.962 0.944 0.969 0.952 0.961 0.961

LV 0.978 0.978 0.977 0.978 0.972 0.970 0.989 0.973 0.978 0.980 0.977 0.979

LT 0.969 0.973 0.972 0.971 0.894 0.890 0.880 0.891 0.909 0.906 0.887 0.905

LU 0.970 0.974 0.982 0.973 0.913 0.928 0.931 0.921 0.940 0.936 0.951 0.939

HU 0.984 0.984 0.980 0.983 0.975 0.972 0.962 0.971 0.938 0.940 0.918 0.935

MT 0.953 0.968 0.981 0.963 0.954 0.960 0.969 0.958 0.976 0.978 0.993 0.978

NL 0.992 0.993 0.992 0.992 0.975 0.976 0.978 0.976 0.947 0.954 0.971 0.953

AT 0.975 0.976 0.976 0.975 0.984 0.985 0.979 0.984 0.986 0.986 0.988 0.986

PL 0.991 0.993 0.992 0.992 0.960 0.975 0.972 0.968 0.975 0.991 0.986 0.984

PT 0.986 0.986 0.983 0.986 0.978 0.975 0.985 0.978 0.970 0.962 0.965 0.966

RO 0.952 0.965 0.959 0.964 0.912 0.912 0.932 0.916 0.888 0.954 0.954 0.954

SI 0.969 0.973 : 0.971 0.967 0.961 : 0.964 0.929 0.927 : 0.928

SK 0.979 0.967 0.971 0.973 0.923 0.946 0.939 0.935 0.914 0.953 0.949 0.935

FI 0.990 0.989 0.994 0.990 0.983 0.986 0.987 0.985 0.946 0.943 0.947 0.945

SE 0.969 0.983 0.989 0.984 0.924 0.969 0.972 0.969 0.925 0.942 0.947 0.943

Average

all 0.977 0.978 0.978 0.978 0.951 0.949 0.948 0.950 0.949 0.951 0.954 0.951

This can give a first basis for comparison of different methods and their selection for the

production of flash estimates in the future. We can notice that the results are indicator and

country dependent.. The method based on labour market transitions works better in several

50

countries and for more difficult indicators such as the quintile share ratio (QSR) and AROP.

However, there is a certain complementarity as for example calibration methods work better

in Italy and the Czech Republic. In general, adjusting for source inconsistencies in LFS

improves accuracy and it improves the consistency for countries like Romania, Croatia and

Sweden.

Consistency of the income components

Table 4.7 shows the consistency indicators for the same income components comparing the

evolution of the main indicators based on 9 alternative microsimulation-based nowcasting

methods and the actual evolution in EU-SILC.

Table 4.7: Average consistency of methods by indicator and income component (2011-2014)

Income D10 D30 MEDIAN D70 D90 Average

income

Employment 0.89 0.95 0.97 0.97 0.97 0.95

Self-employment 0.73 0.82 0.84 0.88 0.72 0.8

Taxes 0.65 0.77 0.88 0.92 0.94 0.83

Social benefits 0.89 0.89 0.96 0.97 0.97 0.94

Inter-household transfers paid 0.8 0.84 0.87 0.89 0.89 0.86

Inter-household transfers

received 0.79 0.88 0.91 0.89 0.86 0.87

Property income 0.62 0.69 0.72 0.72 0.76 0.7

Private pensions 0.12 0.49 0.71 0.73 0.7 0.55

Average indicator 0.69 0.79 0.86 0.87 0.85 0.81

Note: Average for all methods, countries (except UK, and DK for D10) and years


The consistency is quite high for the median for employment income (0.97) and social

benefits (0.96), but relatively lower for self-employment (0.84) and, particularly, for property

income (0.72). The latter is highly volatile and difficult to predict. The changes in the left tail

of distribution (D10 and D30) are also more challenging to estimate.

Consistency of the uprating factors: model-based vs data-based

Table 4.8 focuses on the consistency in the mean for employment income comparing its

evolution based on the two alternative uprating factors methods and the actual evolution in

EU-SILC.

51

Table 4.8: Consistency in the mean for employment income by methods and by country (2011-2014)

Country MODEL-BASED DATA-BASED

2012 2013 2014 Average

country 2012 2013 2014

Average

country

BE 0.97 0.99 0.98 0.98 0.96 0.98 1.00 0.98

BG 0.94 0.98 0.88 0.93 0.99 0.96 0.94 0.96

CZ 1.00 0.97 0.96 0.98 0.97 0.97 1.00 0.98

DK 0.98 0.97 0.99 0.98 0.98 0.98 1.00 0.98

DE 0.97 0.98 : 0.98 0.99 0.99 : 0.99

EE 0.82 0.82 : 0.82 0.99 1.00 : 0.99

IE 0.91 0.93 : 0.92 0.96 0.98 : 0.97

EL : : 0.80 0.80 0.95 0.96 0.99 0.97

ES : : 0.94 1.00 1.00 0.98

FR 0.98 0.97 : 0.98 0.99 0.99 : 0.99

HR 0.97 0.99 : 0.98 0.94 0.98 : 0.96

IT 0.99 0.96 : 0.97 0.97 0.97 : 0.97

CY 0.97 0.96 : 0.96 0.96 0.97 : 0.97

LV 0.99 0.98 1.00 0.99 0.96 0.97 0.98 0.97

LT 0.98 0.99 1.00 0.99 0.98 1.00 0.96 0.98

LU 0.98 0.99 : 0.98 0.98 0.96 : 0.97

HU 0.86 0.84 0.95 0.89 1.00 0.97 0.96 0.97

MT 0.97 0.96 : 0.97 0.99 0.97 : 0.98

NL 0.99 0.96 0.99 0.98 0.99 0.95 0.99 0.98

AT 0.98 0.95 0.99 0.97 0.99 0.97 0.99 0.98

PL 0.97 0.99 : 0.98 0.98 0.99 : 0.99

PT 0.94 0.99 : 0.97 0.98 0.97 : 0.98

RO 0.96 0.96 0.89 0.93 1.00 0.99 0.95 0.98

SI 0.87 0.99 : 0.93 0.87 0.99 : 0.93

SK 0.94 0.90 : 0.92 0.91 0.92 : 0.92

FI 0.96 0.97 0.99 0.97 0.99 1.00 1.00 1.00

SE 0.96 0.98 : 0.97 0.99 0.99 : 0.99

Note: Average for the mean

UK (n/a)


The consistency is generally better for the data-based uprating factors, except for Croatia,

Lithuania, Luxembourg and Latvia. For the period analysed, the model-based uprating factors

do not perform well for Greece (0.80), Estonia (0.82), and Hungary (0.89).

Table 4.9 shows the consistency in the mean for self-employment income and also compares

the two alternative uprating factors methods.

The consistency for self-employment income is generally worse for the model-based uprating

factors, in particular, for France (0.16), Denmark (0.47) and Hungary (0.56). Overall, the

52

data-based uprating factors for self-employment income have a lower average performance

than the factors for employment income due its volatility.

Table 4.9: Consistency in the mean for self-employment income by methods and by country (2011-

2014)

Country MODEL-BASED DATA-BASED

2012 2013 2014 Average

country 2012 2013 2014

Average

country

BE 0.97 0.93 0.99 0.96 0.96 0.99 0.94 0.97

BG 0.67 0.66 0.85 0.73 0.61 0.72 0.90 0.74

CZ 0.94 0.79 0.76 0.83 0.97 0.96 0.99 0.97

DK 0.14 0.74 0.52 0.47 0.24 0.68 0.87 0.60

DE 0.92 0.90 : 0.91 0.90 0.93 : 0.91

EE 0.96 0.96 : 0.96 0.96 0.97 : 0.97

IE 0.84 0.84 : 0.84 0.97 0.90 : 0.93

EL : : 0.70 0.70 0.97 0.97 0.98 0.98

ES : : : : 0.99 0.98 0.98 :

FR 0.16 : : : 0.96 0.95 : 0.96

HR 0.75 0.87 : 0.81 0.91 0.92 : 0.92

IT 0.95 0.96 : 0.95 0.93 0.97 : 0.95

CY 0.84 0.88 : 0.86 0.88 0.94 : 0.91

LV 0.87 0.93 0.81 0.87 0.89 0.99 0.99 0.96

LT 0.57 0.99 0.98 0.85 0.53 0.98 0.99 0.83

LU 0.85 0.79 : 0.82 0.87 0.77 : 0.82

HU 0.67 0.72 0.30 0.56 0.92 0.96 0.32 0.74

MT 0.91 0.97 : 0.94 0.97 0.99 : 0.98

NL 0.90 0.91 0.99 0.93 0.98 0.99 0.98 0.98

AT 0.91 0.90 0.95 0.92 0.87 0.94 0.98 0.93

PL 0.94 0.96 : 0.95 0.99 0.95 : 0.97

PT 0.92 0.96 : 0.94 0.95 0.99 : 0.97

RO 0.98 0.97 0.87 0.94 0.86 0.97 0.83 0.89

SI 0.86 0.80 : 0.83 0.84 0.84 : 0.84

SK 0.93 0.92 : 0.92 0.97 0.96 : 0.96

FI 0.99 0.98 0.97 0.98 0.96 0.99 0.96 0.97

SE 0.92 0.92 : 0.92 0.83 0.95 : 0.89

Note: Average for the mean

UK (n/a)


53

4.2. Quality assessment

As part of the assessment procedure, we follow two main stages. We:

1) Perform a retrospective analysis of the performance of flash estimates for 2012 and

2014 income years. In particular, we look at:

(a) The two performance metrics for comparing point estimates and year-on-year

changes for the key income indicators (i.e. accuracy and consistency);

(b) An additional quality measure that takes into account the uncertainty related to the

sampling variance.

2) Define a strategy for providing a flash estimate for the target year based on selected

methods. Three different options are envisaged:

(a) Best method by country;

(b) Best method by indicator that takes into account the specific traits of groups of

indicators.

(c) An aggregate of the best methods selected based on their historical performance

and the convergence of estimates in the target year.

These options are simulated for 2014 flash estimates for the countries for which EU-SILC

data is already available. Section c) provides the methodological background and the

empirical results for the assessment done in order to decide the overall strategy.

4.2.1. Retrospective performance analysis

The objective of performance analysis is to evaluate the capacity of a given model framework

to produce adequate flash estimates of some income indicator of interest. The definition of

what adequate estimation means and the nature of the performance criterion used in the

performance analysis depends upon the underlying objective of the estimation procedure:

prediction vs explanation.

That being said we have to draw a clear distinction between the explanatory power and the

predictive power of a given model framework.

− If the objective of the estimation procedure is to detect and explain certain phenomena

within the data (e.g. correlations between a dependent variable and a set of

explanatory variables, as it is the case in classic multiple regression analysis), we

prefer simpler models over more complex models which might have a better

goodness-of-fit with respect to the underlying data but lack any kind of clear message.

− On the other hand, if the exclusive goal is prediction, the model choice should be

based on the quality of prediction with the explanatory power of the selected model

being of minor importance.

54

We are in the second case as our main objective is to provide an early estimate of the final

data. Therefore, our focus is on performance measures that assess the predictive power of the

model rather than the explanatory one. In practice, even though the primary goal of the

analysis might be prediction, the explanatory power of the chosen model cannot be entirely

neglected and might still be considered as relevant. For example, if we want to provide

explanations for changes in the estimated indicators by linking the changes to macroeconomic

phenomena or specific policy decisions. This is possible with the microsimulation approaches,

but not with PQM.

Methodological topic 1: Out-of-sample performance analysis

Out-of-sample measures are selected over in-sample measures such as the goodness-of-fit to

evaluate the prediction power of the models. The out of sample measures such as accuracy,

consistency and the uncertainty-based performance metric are used to assess the historical

performance by comparing flash estimates and observed values.

The first alternative for evaluating the prediction power would be to simply measure the

goodness-of-fit of the predicted values to the data points. However, this simple approach to

evaluating the prediction performance gives a natural preference to complex models having

large numbers of parameters over more parsimonious models due to the well-known problem

of overfitting. In general, overfitting occurs when a model has a large number of parameters

relative to the number of observations which lead the model to describe random error or noise

instead of the underlying signal component which is the object of interest of any statistical

modelling scheme. One way of getting rid of this inherent bias towards over-complex models

is to add a penalty term onto the goodness-of-fit measure which penalizes each of the models

according to their respective number of parameters. Thus, by introducing this type of penalty

terms if two models have similar goodness-of-fit the simpler model is preferred over the more

complex one. Note that the penalized goodness-of-fit metric can be interpreted as a practical

implementation of the law of parsimony which states that complexity should not be posited

without necessity. Some of the most widely-used penalized goodness-of-fit criteria are

Mallow's Cp or the Akaike information criterion.

Another way of preventing overfitting is to use out-of-sample performance analysis instead of

in-sample analysis. As far as in-sample prediction is concerned the same data is used for

model estimation (training) and for the validation of the prediction results. In contrast to this,

out-of-sample experiments consist in splitting the available data into two separate sets

commonly referred to as the training set and the validation set. To illustrate the out-of-sample

paradigm let us assume that we have 𝑡 EU-SILC datasets at our disposal indexed as 𝑡 =

1, … , 𝑇. For each of these EU-SILC datasets we can compute the income indicator of interest

which leads to a sequence of 𝑇 EU-SILC indicators denoted as �̃�𝑡 with 𝑡 = 1, … , 𝑇. We now

split this sequence of indicators into two sets with the training set containing the indicators �̃�𝑗

up to �̃�𝑙 such that 1 ≤ 𝑗 ≤ 𝑙 < 𝑡 and the validation set consisting of one single observation

that is �̃�𝑡. The out-of-sample experiment then consists in producing the flash estimate for the

target year �̂�𝑡 and to use the validation set for the evaluation of the performance of �̂�𝑡. Of

55

course we can repeat this procedure by running multiple out-of-sample experiments with 𝑡

taking on values on the grid 𝑇, 𝑇 − 1, 𝑇 − 2, …

Note that whereas the length of the validation set is always equal to one, the length of the

training set may differ with the target year and the nowcasting approach to be analysed. As far

as the group of micro-simulation models with data-based uprating factors is concerned, the

training set only consists of one single time point namely 𝑙 = 𝑡 − 1. On the contrary for the

PQM settings and the simulation models with model-based differential growth rates, the

training set is not fixed in advance while being strictly larger than two. Thus, for this kind of

models the sample size of the training is a free parameter. The guiding principle when

defining the starting point of the training set is that in order to produce meaningful flash

estimates for the target year the statistical properties of the EU-SILC indicators and the

explanatory variables comprised in the training set should be similar to the ones in the

validation set. This similarity in the statistical properties can either be derived in advance or

derived on the basis of an optimization procedure. In the former approach we might rely on

economic reasoning allowing picking those time periods which were characterized by the

macroeconomic and political circumstances comparable to the ones observed before and in

the target year. The optimization approach allows picking the optimal starting point, among a

set of candidate starting points, leading to the flash estimate characterized by the best

performance at the target year. However, given the high-dimensional nature of the PQM

models on the one hand and the simulation models using differential uprating factors on the

other hand, we fix the starting point to be equal to the starting point of the sample of available

EU-SILC datasets. Hence, for the out-of-sample nowcast for 2012 we use time points 2008 to

2011 for the model training. For the 2013 nowcast the training set consists of time points

2008 to 2012 and so on. Thus, the number of time point in the training varies along with the

value of the target year in the validation set.

(a) Accuracy and Consistency

As presented in the quality assurance section, these are first two metrics we have used for

assessing the quality of our estimates.

− Accuracy is the absolute percentage difference between the estimated and the

observed values of the indicator, for a given year; it tells how close the estimate is

to the observed value.

− Consistency is the absolute difference between the estimated and the observed

YoY % change; it tells whether the estimate got the magnitude of the change right.

They are expressed as percentage, the higher the value, the higher the desirability of the

situation. One limitation is that there is no clear threshold that discriminates between good

and not-so-good estimates. Therefore, they are not suitable as measures of absolute

performance. Another limitation is that neither accuracy nor consistency are measures that

take into account the uncertainty.

56

They are therefore used as measures of relative performance, to compare methods and to

discard those that do not work. In our quality framework, a value below 90% for either

consistency or accuracy is a signal for us to review the input data and the estimation

algorithm, in order to identify possible sources of error and to improve the results; the method

is discarded only if no improvement is achieved. Table 4.10 shows the average consistency by

method. This highlights two main messages: the performance is country dependent and

microsimulation seems to perform better for positional indicators (deciles).

57

Table 4.10: Consistency by method

58

(b) Uncertainty based performance measure

This section introduces an additional performance measure, being complementary to accuracy

and consistency. We will refer to this performance measure as the uncertainty-based

performance measure and it has two main advantages in comparison with accuracy and

consistency: it takes into account the sampling noise of the EU-SILC estimate when

evaluating the performance of the flash estimate and it provides also an absolute performance

measure as we can derive a threshold for the low and high performance based on a statistical

test.

Hence, in contrast to the previous performance metrics, the uncertainty based metric

establishes a link between the flash estimation output and concepts from probability theory

and statistical inference. The performance metric is based on the concept of distributional

equivalence. In general, a flash estimate at a given target year is said to have good

performance if its probability distribution is identical to the sampling distribution of the

corresponding EU-SILC estimate. The null hypothesis of identical distributions means that

the difference of the flash estimate and the EU-SILC estimate follows a normal distribution

with zero mean and standard deviation of √2 𝜎𝑡 or more formal as:

𝐻0: �̂�𝑡 − �̃�𝑡 ∼ 𝑁(0, √2𝜎𝑡).

Now if we denote as 𝐹𝜎𝑡(𝑥) the cumulative distribution function (CDF) of the 𝑁(0, √2𝜎𝑡)

distribution, the performance metric is given by

𝑫𝒕 = 𝟒 ⋅ 𝑭𝝈𝒕(�̂�𝒕 − �̃�𝒕) ⋅ (𝟏 − 𝑭𝝈𝒕

(�̂�𝒕 − �̃�𝒕)).

𝐷𝑡 is defined on the interval [0,1]. If the observed values of the flash estimate and the EU-

SILC estimate are identical, e.g. if �̂�𝑡 − �̃�𝑡 = 0, we have that 𝐹𝜎𝑡(�̂�𝑡 − �̃�𝑡) = 0.5 and hence

that 𝐷𝑡 = 1. Moreover, as �̂�𝑡 − �̃�𝑡 tends to +∞ or −∞ we have that 𝐹𝜎𝑡(�̂�𝑡 − �̃�𝑡) tends to 1

or 0 respectively implying that 𝐷𝑡 → 0.

The distribution of 𝐷𝑡 can be derived numerically using Monte Carlo simulations. The

availability of the distribution under 𝐻0 allows to compute the p-value associated with the

observed value of 𝐷𝑡 which then allows for probabilistic conclusions regarding the absolute

model performance. Having a measure of absolute performance reveals to be crucial hest

ranking flash estimate which outperforms all other available flash estimates could have a low

absolute performance. The threshold for discriminating between well-performing and low-

performing models can be defined in relation to the p-value of 𝐷𝑡 and be set equal to the

standard confidence levels of statistical test. We have chosen 0.1 as a confidence level. .

Hence, a p-value larger than 0.1 implies that we do not reject the null hypothesis that EU-

SILC and the flash estimates distributions are the same for methods with p-values higher than

0.1. Therefore all methods having a p-value lower than the significance level of 0.1 are

considered to be of low-performance, due either to the presence of some type of bias or to

instability issues reflecting in a large standard deviation, which in consequence leads to their

elimination.

59

The rest of the section contains the theoretical underpinnings of the uncertainty based

performance measure with the main methodological issues and as a last point the empirical

results of the performance analysis. All the methods went through the common assessment

framework and were scored mainly based on their historical performance in the test period

(2012-201428

).

Methodological topic 2: Distributional equivalence

The performance measure relies on the idea of distributional equivalence: a flash estimate at a

given target year is said to have good performance if its probability distribution is identical to

the sampling distribution of the corresponding EU-SILC estimate. This is translated in terms

of the first two moments of the sampling distribution based on the normality assumption.

In general, a flash estimate at a given target year is said to have good performance if its

probability distribution is identical to the sampling distribution of the corresponding EU-SILC

estimate. In order to illustrate why the distributional equivalence indicates a good

performance of the flash estimate, we can reason in terms of the first and the second statistical

moments, e.g. the mean and the variance or the standard deviation. First note that if the

probability distribution of the flash estimate is equal to the sampling distribution all their

statistical moments are identical. Thus, if the EU-SILC indicator is an unbiased estimate of

the unknown population indicator the equality of the two probability distributions implies that

the flash estimate is an unbiased estimate of the population parameter as well. Moreover, as

for any variable of interest in a statistical model framework we can consider the EU-SILC

indicator to consist of two components, a signal component and a noise component, with the

former carrying the information of interest and the latter representing a random error

reflecting sample noise which cannot and should not be captured by the modelling process.

The equivalence of both, the first and the second moment, imply that the flash estimate

captures the entirety of the signal component and that the only source of variation within the

flash estimate is due to the presence of sampling noise in the EU-SILC indicator.

We have conducted a Shapiro-Wilk testing scheme to verify the normality of the sampling

distribution of the EU-SILC indicators. The results of the test clearly indicate that the

normality assumption is a valid one and hence that the quality assessment of the flash

estimation results can be built around the first two moments of the sampling distribution.

Methodological topic 3: Formalization of the performance metric

The performance metric takes the form of a test statistic of a distributional testing framework.

The test statistic offers a measure of relative performance. The p-value of the test statistic

gives a measure of absolute performance.

28

2014 income data (EU-SILC 2015) not available for DE, IE, FR, IT, CY, LU, MT, PL, SI, SK, SE

60

To formalize the performance metric and the associated framework for testing the

convergence of the flash estimate to the EU-SILC estimate we first have to introduce a few

notations which allows for a more schematic representation of the different objects. From this

point on we denote as �̂�𝑡 the flash estimate of a given income indicator at target year 𝑡 and as

�̃�𝑡 the corresponding EU-SILC indicators. Moreover, we denote as 𝜇𝑡 the unknown

population parameter and as 𝜎𝑡 the bootstrap-based estimate of the sample standard deviation

which, as described above, can be considered as a consistent estimate of the unknown sample

standard deviation in the case of using unweighted indicators for the replicated samples.

Given these notations and provided that the EU-SILC indicator is an unbiased estimate of the

population indicator, and hence that 𝜇𝑡 = 𝐸(�̃�𝑡), the null hypothesis of equality of

distribution of the EU-SILC estimate and the flash estimate can be formalized as

𝐻0: 𝐸(�̂�𝑡) = 𝜇𝑇 𝑎𝑛𝑑 𝑆𝐷(�̃�𝑡) = 𝜎𝑡.

In response to the drawbacks of the more classical performance metrics such as the chi-

squared measure, we have developed an alternative metric which has the advantage of being

strictly bounded between bounded between zero and one and reveals to be less sensitive to the

presence of outliers than the chi-squared measure.

To explain the underlying idea of the performance metric we are first going to rewrite the null

hypothesis of identical distributions as the assumption that the difference of the flash estimate

and the EU-SILC estimate follows a normal distribution with zero mean and standard

deviation of √2 𝜎𝑡 or more formal as

𝐻0: �̂�𝑡 − �̃�𝑡 ∼ 𝑁(0, √2𝜎𝑡).

Now if we denote as 𝐹𝜎𝑡(𝑥) the cumulative distribution function of the 𝑁(0, √2𝜎𝑡)

distribution, the performance metric is given by

𝐷𝑡 = 4 ⋅ 𝐹𝜎𝑡(�̂�𝑡 − �̃�𝑡) ⋅ (1 − 𝐹𝜎𝑡

(�̂�𝑡 − �̃�𝑡)).

As mentioned above, 𝐷𝑡 is defined on the interval [0,1]. If the observed values of the flash

estimate and the EU-SILC estimate are identical, e.g. if �̂�𝑡 − �̃�𝑡 = 0, we have that 𝐹𝜎𝑡(�̂�𝑡 −

�̃�𝑡) = 0.5 and hence that 𝐷𝑡 = 1. Moreover, as �̂�𝑡 − �̃�𝑡 tends to +∞ or −∞ we have that

𝐹𝜎𝑡(�̂�𝑡 − �̃�𝑡) tends to 1 or 0 respectively implying that 𝐷𝑡 → 0.

Hence, flash estimates producing larger values of 𝐷𝑡 have higher performance than flash

estimates producing lower values. In this sense, as for accuracy and consistency, the

uncertainty-based performance measure allows establishing a ranking of the flash estimates

based on their relative performance in terms of 𝐷𝑡 . However, as for accuracy and consistency,

these comparative studies of the performances delivered by different models does not allow

drawing any conclusion on their absolute performance. Note that indeed even the highest

ranking flash estimate which outperforms all other available flash estimates could have a low

absolute performance.

61

However, given that the performance metric is part of a statistical testing framework a

measure of absolute performance can be obtained from the p-values associated with the test

statistic𝐷𝑡. The p-value of the test statistics represents the probability to observe a value that

is equal or larger than the available test statistics. Thus, a low p-value indicates that the

available test statistics represents a rare event under the null hypothesis. Hence, what the p-

value offers in contrast to a mere analysis of the test statistics is the interpretation of the latter

as an outcome of a random experiment. For the sake of example assume that the p-value of 𝐷𝑡

was equal to 0.1. This implies that if we drew one hundred realizations from the distribution

of 𝐷𝑡 under the null hypothesis only ten of these one hundred realizations would be equal or

smaller than the observed value of 𝐷𝑡. Thus, the observing a value of 𝐷𝑡 or smaller can be

considered to be a rare event under the null hypothesis which occurs with rather low

frequency. Hence, given the observations of a statistically improbable value of 𝐷𝑡 we conclude

that the performance metric is unlikely to have been produced by a flash estimate which

fulfils 𝐻0 and hence that it has absolute low performance.

Obtaining the distribution of 𝐷𝑡 under 𝐻0 is straightforward in the sense that under the null

distribution 𝐹𝜎𝑡(�̂�𝑡 − �̃�𝑡) follows a uniform distribution on the interval [0,1]. Hence, the null

distribution of 𝐷𝑡 can be obtained numerically by simply generating observations from a 𝑈[0,1]

distribution and plugging the observations into the formula of the performance metric. Figure

4.1 plots the density function of the distribution under the null hypothesis. As far as the first

and second moment of the null distribution are concerned we have that 𝐸(𝐷𝑡) =2

3 and that

𝑆𝐷(𝐷𝑡) = 0.29..

Figure 4.1: Density function of the test statistics under the null hypothesis (single-period)

Note however that the discriminatory power of the distribution under the null is quite weak.

This implies that the capacity of the testing framework for rejecting incorrect null hypotheses

and hence for detecting low-performing flash estimates is not very strong.

62

To increase the discriminatory power of the testing framework, we have to measure the

performance of the flash estimates at more than one time point. There are basically three

elements that we need to adapt for accommodating multi-period performance analysis

settings: (1) the formulation of the null hypothesis, (2) the form of the test statistics and (3)

the distribution under the null hypothesis. This will be used for our analysis.

As in the one-period case, we denote as �̂�𝑡 the flash estimate at period t. However, we now

assume that there are T flash estimates at our disposal such that the set of available estimates

is given as (�̂�1, … , �̂�𝑇). If we denote as (�̃�1, … , �̃�𝑇) the set of the corresponding EU-SILC

indicators on that same time span we can write the null hypothesis as the assumption that the

probability distribution of the flash estimate is identical to the sampling distribution of EU-

SILC at each of the time points located within the period of analysis and hence as

𝐻0: �̂�𝑡 − �̃�𝑡 ∼ 𝑁(0, √2𝜎𝑡) 𝑓𝑜𝑟 𝑎𝑙𝑙 𝑡 = 1, … , 𝑇.

The test statistics for the multi-period testing framework is defined as the average of the test

statistics of the single-period frameworks over the period of analysis, which we denote as

𝐷𝑇 =1

𝑇∑ 𝐷𝑡

𝑇

𝑡=1

.

The distribution of 𝐷𝑇 depends on the length of the period of analysis and hence is dependent

upon the value of 𝑇. As far as the first two moments are concerned and given the assumption

of independent summands in 𝐷𝑇 we obtain that 𝐸(𝐷𝑇) =2

3 and 𝑆𝐷(𝐷𝑇) =

𝑆𝐷(𝐷𝑡)

√𝑇. Moreover,

given the central limit theorem for independent and identically distributed random variables

we can conclude that 𝐷𝑇 converges to a normal distribution as 𝑇 → +∞. Thus, the use of the

numerical approximation of the null distribution on the basis of Monte Carlo simulations gets

less relevant with increasing values of 𝑇 and hence for 𝑇 > 20 the normal distribution can be

used for the computation of the p-values. Figures 4.2 and 4.3 illustrate the density function

and the cdf function of the distribution under the null for different values of 𝑇. Note that given

the vanishing standard deviation the discriminatory power of the null distribution and hence

the power of the test increases along with 𝑇. On top of that, as 𝑇 → +∞ the power of the test

tends to 1. Thus, as 𝑇 increases the probability of not detecting low-performing flash

estimates decreases and asymptotically this probability tends to zero resulting in a consistent

decision rule.

63

Figure 4.2: Density function of the test statistics under the null hypothesis

Figure 4.3: CDF function of the test statistics under the null hypothesis

64

Using changes instead of levels

If the flash estimates of income indicators are characterized by a systematic error (bias), the

estimates of the first-order differences might very well be unbiased. The generalization of the

performance metric to estimates of absolute and relative first-order differences is

straightforward.

In some cases, the assumption of converging flash estimates might be rejected simply because

the concerned model framework commits a systematic error at each time point. One form of

systematic error is the presence of the same type of bias at each time point, whether it is a

multiplicative bias or an additive one. In the case of an additive bias the probability

distribution of the flash estimate is equal to the sampling distribution of the EU-SILC

indicator at each time point except for an identical shift in the location. We can denote this

more formally as a difference in expected values as 𝐸(�̂�𝑡) = 𝐸(�̃�𝑡) + 𝑏 for 𝑡 = 1, … , 𝑇 with

b being some rational number. In case of a multiplicative shift we would a have a discrepancy

in the expected values as 𝐸(�̂�𝑡) = 𝑏 ⋅ 𝐸(�̃�𝑡).

Note that if a systematic bias is present, an unbiased estimate can be defined on the basis of

simple transformation of sequence of flash estimates and by shifting the target of the

nowcasting procedure accordingly. Indeed, in the case of an additive bias the first order

difference of sequence of flash estimates, that is �̂�𝑡 − �̂�𝑡−1, is an unbiased sequence for

�̃�𝑡 − �̃�𝑡−1 given that 𝐸(�̂�𝑡 − �̂�𝑡−1) = 𝐸(�̃�𝑡 − �̃�𝑡−1). In the case of a multiplicative bias a

sequence of unbiased estimates can be constructed on the basis of the relative first-order

difference, e.g. (�̂�𝑡 − �̂�𝑡−1)/�̂�_(𝑡 − 1). All the concepts described throughout this section can

be easily applied when working with the relative or the absolute first-order difference instead

of the level of the indicators.

Note that the standard deviation of the changes under the null hypothesis is not identical to the

standard deviation of the levels. However, the former can be derived from the latter with the

standard deviation of the absolute differences being given by √𝜎𝑡−12 + 𝜎𝑡

2 and the one of the

relative change being well approximated as √𝜎𝑡−1

2 +𝜎𝑡2

�̃�𝑡−1.

Empirical results

As define above, the uncertainty-based performance measure 𝐷𝑡 is defined on the interval

[0,1]. If the observed values of the flash estimate and the EU-SILC estimate are identical we

have 𝐷𝑡 = 1. Moreover, as �̂�𝑡 − �̃�𝑡 tends to +∞ or −∞ we have 𝐷𝑡 → 0.

All the methods went through the common assessment framework and were scored mainly

based on their historical performance in the test period (2012-201429

). Hence, different

methods can be ranked based on their relative performance in terms of 𝐷𝑡 . Moreover, the

29

2014 income data (EU-SILC 2015) not available for DE, IE, FR, IT, CY, LU, MT, PL, SI, SK, SE

65

value added of the uncertainty based measure is that we can define an absolute performance

threshold in order to select out the low-performers on a sound statistical basis. This is

important as even the highest ranking flash estimate which outperforms all other available

flash estimates could have a low absolute performance.

We take the p-value of 𝐷𝑡 equal to 0.1 similar to most statistical tests. This implies that we

cannot reject the null hypothesis that EU-SILC and the flash estimates distributions are the

same for methods with p-values higher than 0.1. Therefore all methods having a p-value

lower than the significance level of 0.1 are considered to be of low-performance, due either to

the presence of some type of bias or to instability issues reflecting in a large standard

deviation, which in consequence leads to their elimination.

After low-performers are discarded we make an analysis of the average performance by

method. Ideally, we should have one method that outperforms the other in order to have a

common harmonised approach that fits all situations. However, as illustrated by the analysis

of accuracy and consistency, also in this case we have different patterns conditional on the

specific country and indicator.

Table 4.11: Average performance by main approach for all indicators

In general the average performance for all indicators among the family of micro-simulation

models is significantly higher than the corresponding share of PQM models with 16 countries.

However PQM has a higher average for AT,.IT, HU, BE, EL, LU, BG, DE, RO and EE.

These patterns are very different according to the indicators. For both model families the

number of well-performing models is at its largest for the AROP indicator. As far as the

positional parameters (ARPT and the deciles) are concerned, in general the PQM family have

a relatively lower performance. For example for ARPT for eight countries only

microsimulation passes the threshold of 0.10. However, in five countries PQM is better for

ARPT.

12.1

4.4 4.56.8 7.3 8.8

2.4

7.8 6.910.2

0.1

5.3 5.7

0.2

8.1

2.4

15.3

20.0

11.7

6.3

21.3

5.4

0.8 1.24.0

6.3

55.1

68.0

0

10

20

30

40

50

60

70

80

90

100

0%

10%

20%

30%

40%

50%

60%

70%

80%

90%

100%

NL PL DK ES FI LV FR SE SK PT CZ MT CY IE SI LT AT IT HU BE EL LU BG DE RO EE HR UK

Microsimulation PQM Absolute difference between microsimulation and PQM in percentage points

Group 3: MS or

PQM only

p.p.

Group 2: Performance PQM > MSGroup 1: Performance MS > PQM

66

Table 4.12: Average performance by main approach-ARPT

Table 4.13: Average performance by main approach: AROP

Therefore, we can conclude that even if microsimulation outperforms PQM in several

countries there is still a strong complementarity between the two main approaches when we

look across countries and indicators. This raised the issue of method selection which will be

discussed in the next chapter.

4.2.2. Method selection

As described earlier, we have at our disposal several model frameworks inspired by different

modelling paradigms. On the one hand there are simulation-based frameworks which operate

at the level of individual units by simulating the effects of tax policies on each individual

income. On the other hand we have the PQM approach based around dynamic factor models

and relying heavily on econometric and statistical estimation techniques. Moreover for each

of the two flash estimation methods there exist different subgroups of models which, in case

of the micro-simulation models for instance can be distinguished with respect to the

calibration algorithm used or the existence of a single uprating coefficient of the income

components across the distribution or multiple uprating coefficient for different socio-

26.9

17.5

30.4

2.9 2.8

10.7

28.8

21.1

4.16.1

14.311.7

0.3

8.7 10.1

60.759.1 58.9

53.8 53.7 52.6

45.8

4.0

57.2

48.2 46.9 46.4

41.0

0

10

20

30

40

50

60

70

80

90

100

0%

10%

20%

30%

40%

50%

60%

70%

80%

90%

100%

PL SE NL DE FI DK IT PT CZ MT IE FR LU EE LV EL RO HU LT AT BG BE HR SK SI UK CY ES


Group 3: MS or PQM only

p.p.

Group 2: Performance PQM > MSGroup 1: Performance MS > PQM

8.54.5 4.7

15.2

26.4

8.5

19.921.9

10.4

3.8

9.0

0.8

23.6

11.0

2.9

13.216.7

62.5

56.453.1

49.4 48.4 47.8

37.0

60.158.2

53.0

40.0

0

10

20

30

40

50

60

70

80

90

100

0%

10%

20%

30%

40%

50%

60%

70%

80%

90%

100%

DK AT PL NL LV BE FR MT CY DE SI CZ IT LU SK HU BG RO FI EE HR PT LT IE UK EL SE ES


Group 3: MS or PQM only

p.p.

Group 2:

Performance PQM > MS

Group 1: Performance MS > PQM

67

demographic groups. As far as the PQM model is concerned the sub-models differ in terms of

number of factors, mapping technique used to build the distribution of disposable income

from the estimated quantiles or type of quantiles used as dependent variables within the

dynamic factor model.

The assessment of the performance of the different methods highlighted also a strong

complementarity between approaches, namely microsimulation and PQM. This raised the

issue of method selection and different strategies were tested within the quality assessment

framework:

(a) A best method by country and indicator that takes into account the specific traits of

groups of indicators. The results showed that there is a kind of polarisation of

methods according to their performance by two groups of indicators: inequality

indicators (AROP and QSR) and positional indicators. In general microsimulation

outperforms PQM for positional indicators.

(b) A best method by country that takes into account national specificities;

microsimulation tends to outperform PQM in this case (19 countries) as it and to

provide in general more stable results overall for the indicators. However, PQM

gives in general better results for AROP and in several countries it provides better

results (e,g, SI, CY).

(c) An aggregate of the best methods selected based on their historical performance

and the convergence of estimates in the target year. This last option proved to be

the more robust and it can address the issue of the high complementarity of

methods.

(a) Choosing the best-performing model by country and indicator

One possible model selection procedure would be to pick the best-performing model among

all the candidate models and this by country and indicator. If we denote as 𝐷𝑇𝑘 the multi-

period (out-of-sample) performance metric (see Section 4.2.1) of the k-th modelling approach

for some given country and indicator with the model index ranging from 1 to 𝐾 and hence an

overall of 𝐾 distinct model frameworks at our disposal, we can express this simple model

selection procedure under the form of an optimization setting in which the selected model is

defined as

𝑘∗ = max𝑘(𝐷𝑇𝑘).

Thus, the chosen flash estimate for the target year 𝑇 + 1 would be the one produced by the

𝑘∗-th model framework that is �̂�𝑇+1(𝑘∗)

. The principle which underlies this model selection

procedure is that given the candidate models at our disposal, picking the model with the

highest historical performance should also provide the best-performing flash estimate at the

target year. However, there are some flaws to this reasoning which could lead to an

underperformance of the chosen flash estimate at the target year.

To begin with, recall for any of the countries the respective training set only contains income

data from 2008 onwards and hence provided that the 2014 data is already available for the

68

2015 nowcasting exercise we have a maximum number of seven data points that can be used

in the out-of-sample experiments. If the objective is to nowcast the levels of the indicators this

allows to generate four out-of-sample flash estimates for 2011, 2012, 2013 and 2014. In the

presence of additive or multiplicative bias and hence a shift of the objective towards the

estimation of the absolute or relative year-on-year change that number is reduced to only

three. Recall from Section 4.2.1 that the power of the statistical testing framework to

distinguish between the null hypothesis and its alternative strongly depends on the number of

out-of-sample observations at our disposal. That being said, given that for each country and

indicator we only have three observations at our disposal, the power of the performance

metric might be quite small which could ultimately result in a performance analysis which

does not reflect the actual performance of the different approaches and hence to a selection of

under-performing models.

Secondly, given that the choice of the model is done by country and indicator there is a

possibility that we end up with different models for each indicator of one specific country.

However, given that the flash estimates of each of the approaches are generated independently

from each other, selecting different models for producing the nowcasts of the set of indicators

of one given country does not take into account the inherent link that exists between the

indicators which ultimately could result in inconsistent outcomes. By inconsistent outcome

we understand a situation in which the produced set of flash estimates for a given country

cannot originate from one single income distribution. However, if no regularity conditions

whatsoever are imposed upon the underlying income distribution, these situations of

inconsistencies might actually never be observed. Thus, we might reformulate the definition

of inconsistent indicator set as being a collection of indicators which cannot be mapped to a

distribution stemming from a given family of probability distributions. A widely used

distribution for modelling income data is the generalized beta distribution of the second kind

(GB2 distribution) in which case a set of nowcasted indicators would be considered to be

inconsistent if there does not exist a vector of GB2 distributional parameters and hence no

GB2 distribution which can be mapped to the set of nowcasted indicators.

Thirdly, in Section 4.2.1 we have seen that the performance analysis is built around a

statistical testing framework whose null hypothesis states that the flash estimate is an

unbiased estimate of the population indicator and that its standard deviation is identical to the

sample standard deviation of the corresponding EU-SILC estimate (hypothesis of equal

distributions). The absolute performance metric, that is the p-value of the testing framework,

can be seen as an indication of the likelihood that the null hypothesis is correct. The more out-

of-sample observations available, the more precise this indication is. Simply picking the

model framework with the highest p-value among all candidate models can lead to sub-

optimal decisions leading to elimination of model frameworks which fulfil the assumption of

the null hypothesis. The smaller the number of out-of-sample observations the more frequent

the occurrence of these sub-optimal decisions.

69

(b) Choosing the best-performing model by country

To strengthen the discriminatory power of the performance metric, we have to increase the

number of out-of-sample observations used for the computation of the performance metric.

Indeed, note that for the model selection procedure consisting in choosing one model

framework by country and indicator, the performance metric is defined as

𝐷𝑇𝑘 =

1

𝑇∑ 𝐷𝑡

𝑘

𝑇

𝑡=1

with 𝑡 ranging from 2012 to 2013 or 2014 depending whether the EU-SILC dataset of 2015 is

already available for that country. However, instead of choosing one method by both, country

and indicator, we can also select one single model framework for each country and use the

latter to the flash estimates of the entire set of indicators. In this case the optimization setting

takes on a different form with a new objective function as

𝑘∗ = max𝑘

1

𝑅 ⋅ 𝑇∑ ∑ 𝐷𝑡,𝑟

𝑘

𝑇

𝑡=1

𝑅

𝑟=1

where 𝐷𝑡,𝑟𝑘 represents the country-specific performance metric of the k-th modelling approach

with at target year 𝑡 and with respect to indicator 𝑟 with 𝑟 = 1, … , 𝑅. Hence, in contrast to the

county and indicator specific optimization scheme, the aggregate performance metric used in

the country-specific setting does not only sum the performance metrics across time points but

across time and indicators. This results in a higher discriminatory power of the performance

metric which should result in an increased performance of the derived flash estimates.

It can be seen that the increased robustness of the country-wise optimization settings leads to

an increase out-of-sample performance of the flash estimates and hence the use of country-

specific objective functions should be given preference over the use of objective functions by

country and indicator.

(c) Aggregating the well-performing estimates into a single estimate

An alternative approach to model selection is to bundle the values of different flash estimates

into a one single aggregate estimate. In contrast to the single-choice approaches, consisting in

selecting the best-performing model framework, model aggregation is not based around

selection but it rather aims at combining the information provided by the entire set of well-

performing estimates.

The use of country-specific objective functions solves two of the issues related to the use of

country-and-indicator-specific objective functions. To start with, by picking one single model

framework for each country the power of the performance metric is increased considerably.

Moreover, the resulting set of flash estimates does not suffer from any inconsistency issues as

70

described above. However, given that in the country-specific optimization scheme we still

only select the highest-performing model for producing the flash estimate at the target year,

we might still miss some relevant information offered by some of the remaining models. In

this part we are going to describe an algorithm which aims at combining the values of the

flash estimates produced by different model frameworks such as to end up with one single

aggregate flash estimate for the target year. This idea of aggregating the information provided

by different estimates is presented as an alternative to picking the best estimate is known in

the literature as model averaging. For more details see for example Claeskens and Hjort

(2008) or Hjort and Claeskens (2003).

The algorithm takes the form of an iterative optimization procedure consisting of three main

steps.

Step 1: Remove low-performing estimates

The first step consists in discarding the low-performing model frameworks that is those with

very low historical performance. Given that the absolute performance metric has an

interpretation as a p-value the filter step takes the form of a statistical testing procedure in the

sense that a model framework having a historical performance lower than a pre-defined cut-

off-values (confidence level) are removed from the initial set of models. We have chosen a

confidence level of 0.1 and hence all the methods having a p-value lower than 0.1 are

considered to be of low-performance, due either to the presence of some type of bias or to

instability issues reflecting in a large standard deviation, which in consequence leads to their

elimination.

Step 2: Compute aggregate estimate from the set of remaining candidate models

The second step consists in aggregating those flash estimates that have passed the filtering

step into one single value. Recall from Section 4.1 that the sampling distribution of the EU-

SILC estimate at the target year is considered to be a normal distribution with mean 𝜇𝑇 and

standard deviation 𝜎𝑇. The objective of the aggregation step is to find the most likely value of

the parameter set (𝜇𝑇 , 𝜎𝑡) and hence the most likely form for the underlying normal

distribution from which the set of flash estimates could have originated from. Put differently,

we consider all the flash estimates to be observations from the same normal distribution and

our objective is to find the parametrization of this distributions. However, recall at this point

that the testing framework is based around the null hypothesis that the flash estimates follow a

probability distribution that is identical to the sample distribution of the EU-SILC estimates.

Thus, we do not only want the flash estimate to be observations of any normal distribution but

of the sampling distribution of the EU-SILC estimate at the target year. This information is

crucial since it allows us to reduce the degree of freedom of the optimization setting. Indeed,

given that the underlying distribution of the flash estimates is given by the sampling

distribution of EU-SILC this implies that the standard deviation 𝜎𝑡 should be equal to the

standard deviation of the EU-SILC estimate. As illustrated in Section 4.1, a consistent

estimate of the sample standard deviation can be found using bootstrap procedures. Given that

at the target there is no corresponding EU-SILC dataset which could be used for the

bootstrapping procedure we set 𝜎𝑇 equal to the standard deviation of the last available EU-

71

SILC indicator. Given this, the aggregate estimate, denoted as 𝜇𝑇∗ can be defined as the

solution to a weighted maximum-likelihood estimation setting as

𝜇𝑇∗ = max𝜇𝑇

∑ 𝐷𝑇−1𝑘 𝐹(�̂�𝑇

𝑘 , 𝜇𝑇 , 𝜎𝑇) ⋅ (1 − 𝐹(�̂�𝑇𝑘 , 𝜇𝑇 , 𝜎𝑇))

𝐾+𝑘=1

where �̂�𝑇𝑘 denotes the flash estimate of the k-th model framework among all the 𝐾+candidate

models that have passed the filtering step. 𝐷𝑇−1𝑘 is the test statistics reflecting the historical

performance of the 𝑘-th candidate model. Hence, the better the historical performance of a

given candidate model the stronger the impact of the latter on the outcome of the optimization

frameworks and hence on the value of the aggregate estimate.

The aggregate estimate obtained at the second step represents the most likely value of the

mean parameter of the sampling distribution conditional on the set of available and remaining

flash estimates after application of the filter.

Step 3: Evaluate performance of aggregate estimate

However, one question which is left unanswered is whether 𝜇𝑇∗ is an overall good estimate in

addition to being the most likely one. At the third step of the aggregation step we aim at

providing an answer to this question. In order to do so we first construct a standardized

version of the objective function used in the maximum-likelihood framework, that is

𝑓(𝜇𝑇∗ ) =

4

𝐾+ ∑ 𝐷𝑇−1

𝑘 ⋅ 𝐹(�̂�𝑇𝑘 , 𝜇𝑇

∗ , 𝜎𝑇) ⋅ (1 − 𝐹(�̂�𝑇𝑘, 𝜇𝑇

∗ , 𝜎𝑇))

𝐾+

𝑘=1

.

Note that the form above is only a weighted version of the test statistics defined in Section

4.1. This implies that the distribution of 𝑓(𝜇𝑇∗ ) under the null hypothesis that all the flash

estimates are observation of the same probability distribution can be obtained numerically on

the basis of Monte Carlo simulations. From the distribution under the null we can derive the

p-value associated with 𝑓(𝜇𝑇∗ ) which provides an indication of the absolute performance of

𝜇𝑇∗ .

As for the historical performance, we conclude that if the p-value is larger than 0.1 the overall

performance of the aggregate estimate at the target year is acceptable in which case 𝜇𝑇∗

constitutes the final output of the aggregation algorithm. On the contrary, if the p-value is

smaller than 0.1 we go back to the second step and try to find a different value of the

aggregate estimate using a reduced set of candidate flash estimates. In order to construct this

reduced set of estimates we proceed according to a leave-one-out scheme by discarding the

candidate flash estimate which has the lowest performance, both historically and at the target

year, that is

𝑘− = min𝑘

𝐷𝑇−1𝑘 ⋅ 𝐹(�̂�𝑇

𝑘, 𝜇𝑇∗ , 𝜎𝑇) ⋅ (1 − 𝐹(�̂�𝑇

𝑘, 𝜇𝑇∗ , 𝜎𝑇)).

Iterations

Step 2 and Step 3 are iterated as long as the p-value associated with 𝑓(𝜇𝑇∗ ) is smaller than 0.1.

72

Comparative studies and conclusions: FE 2014

Recall that we have three different model selection algorithms at our disposal. Hence, using

each of the model selection algorithms, we can compute three different flash estimates for

each combination of country and income at a given target year. In order to decide which

among these three estimates should be picked, we need to have a criterion at our disposal

reflecting the performances of each of the model selection algorithms.

To obtain this measure of performance we conduct out-of-sample experiments in which we

compute the flash estimates of twenty-three countries and eight indicators (AROP, QSR,

ARPT, D10, D30, D50, D70 and D90) using the tree model selection procedures described

above. Based on the indicator reference values at income year 2014 we then evaluate the

performance of the flash estimates by computing the p-value of the uncertainty-based

performance evaluation approach as described in Section 4.2.1. At the time of the analysis

there were a total of twenty-three countries for which income data 2014 was available. Thus

for each of the three model selection procedures we obtain 184 flash estimates in total, one for

each combination of country and indicator.

Table 4.14 reports the mean performance of the three model selection approaches by country

and across indicators. The first important conclusion that can be drawn from the results shown

in Table 4.14 is that in most cases the performance of the best-by-country estimate is higher

than the best-by-country-and-indicator estimate. This is a direct consequence of the increased

discriminatory power of the indicator-unspecific objective function resulting in high degrees

of robustness of the best-by-country estimate and hence to a better out-of-sample performance

compared to the indicator-specific objective function. The same conclusion holds when

comparing the performances of the aggregate estimate and the best-by-country-and-indicator

estimate given that in twenty out of the twenty-three countries the performance of the

aggregate estimate is higher.

However, the information provided by these pairwise performance comparisons across

countries is misleading in a lot of cases. Take for instance the cases of Austria and Bulgaria.

For both countries the aggregate estimate outperforms the best-by-country estimate. However,

the magnitudes of the performances of the estimates are very different in both cases, 0.79

respectively 0.73 in the case of Austria and 0.33 respectively 0.06 for Bulgaria. Hence, even

though both countries would lead to identical conclusions in terms of relative performance

evaluation, namely that the aggregate estimate ranks highest for both countries, they lead to

different conclusions in terms of absolute performance evaluation.

73

Table 4.14: Average performance across indicators of 2014 flash estimates

Country Aggregation By country By country and

indicator

BE 0.119 0.149 0.085

BG 0.335 0.063 0.101

CZ 0.218 0.055 0.034

DK 0.154 0.033 0.067

EE 0.349 0.149 0.001

EL 0.036 0.001 0.000

ES 0.008 0.034 0.000

FR 0.481 0.930 0.606

HR 0.000 0.000 0.000

LV 0.894 0.243 0.010

LT 0.000 0.000 0.000

HU 0.830 0.014 0.666

MT 0.190 0.132 0.249

NL 0.922 0.885 0.651

AT 0.785 0.739 0.003

PL 0.751 0.804 0.433

PT 0.006 0.044 0.001

RO 0.884 0.000 0.000

SI 0.000 0.000 0.000

SK 0.000 0.118 0.000

FI 0.525 0.830 0.850

SE 0.009 0.021 0.006

UK 0.548 0.067 0.003

Indeed, as far as Austria is concerned both estimates have a high absolute performance. For

instance, both estimates have performances beyond the classical thresholds used in statistical

testing settings, that is 0.01, 0.05, and 0.1. On the other hand, in the case of Bulgaria the best-

by-country estimate has a performance below 0.1 whereas the performance of the aggregate

estimate is significantly larger than 0.1. Thus, in the case of Austria both estimates have good

absolute performance and should be treated alike. On the contrary, as far as Bulgaria is

concerned, there should be a clear discrimination between both estimates given the

significantly higher performance of the aggregate estimate.

For the results shown in Table 4.14 we fix confidence levels ranging from 1 to 0.05 and report

the proportion of flash estimates by country having an average performance across indicators

smaller than the respective confidence level.

74

Table 4.14: Proportion of flash estimates below confidence levels

Confidence Level Aggregation By country By country and

indicator

1 1 1 1

0.95 1 1 1

0.9 0.957 0.957 1

0.85 0.870 0.913 1

0.8 0.826 0.826 0.957

0.75 0.739 0.826 0.957

0.7 0.739 0.783 0.957

0.65 0.739 0.783 0.870

0.6 0.739 0.783 0.826

0.55 0.739 0.783 0.826

0.5 0.652 0.783 0.826

0.45 0.609 0.783 0.826

0.4 0.609 0.783 0.783

0.35 0.609 0.783 0.783

0.3 0.522 0.783 0.783

0.25 0.522 0.783 0.783

0.2 0.478 0.739 0.739

0.15 0.391 0.739 0.739

0.1 0.348 0.565 0.696

0.05 0.348 0.435 0.609

Hence, there are 35% of the aggregate estimates having absolute performance lower than 0.1.

However, at the same time 57% of the best-by-country estimates and 70% of the best-by-

country-and-indicator estimate are characterized by a performance lower than 0.1. In general,

for most confidence levels the proportion of aggregate estimates having lower performance

than the confidence levels is significantly smaller than the proportions reported for the other

two model selection methods. Hence, the out-of-sample experiments based on the 2014

income data provide strong evidence for an outperformance of the aggregation approach over

the other two model selection methods.

Figures 4.4 and 4.5 show the results of the flash estimation procedure for AROP and QSR at

target year 2014 in the case of the aggregation-based model selection procedure. The x-axis (

y-axis) contains the absolute year-on-year change of the EU-SILC estimate (flash estimate)

from 2013 to 2014. The dot-shaped points represent the different flash estimates with the ones

highlighted in blue corresponding to the aggregate estimate. The red dots indicate those flash

estimates which have a p-value lower than or equal to 0.1 and hence are eliminated at the at

the filtering step. The orange dots represent those flash estimates whose historical

performance is considered to be good enough to pass the filtering step. However, they are

eliminated while constructing the aggregate estimate. Finally, the green dots stand for the

estimates which pass both, the filtering step and the aggregation step. The corresponding year-

on-year changes of the EU-SILC indicator are represented as through the cross-shaped dots.

75

Figure 4.4: AROP 2014: YoY change EU-SILC versus YoY estimated

Figure 4. 5: QSR 2014: YoY change EU-SILC versus YoY estimated

76

4.2.3. Ex-ante quality assessment for flash estimates in the target year

In order to evaluate the performance of the flash estimates at target year, we have considered

two different quality criteria:

a) Historical performance of the aggregate;

b) Prediction interval that considers both the sampling error and the uncertainty of the model;

The quality criteria provide an indication of the reliability of the information provided by a

given flash estimate 2015.

(a) Historical performance of the aggregate

The historical performance of the aggregate flash estimate, denoted as 𝐻+, takes the form of

the mean performance of its 𝐾+ constituent estimates and hence

𝐻+ =1

𝐾+ ∑ 𝐷𝑇−1

𝑘 .

𝐾+

𝑘=1

Under the assumption that the historical performance of the aggregate estimate provides an

adequate approximation of the performance of the latter at the target year we have that a low

value of 𝐻+corresponds to a low reliability of flash estimate. Hence, in practice we say that

flash estimates having a value of 𝐻+ lower than 0.40 (8%) are of low reliability and hence are

discarded. Moreover, we made a qualitative review for cases where the quality was between

(0.40-0.70). This included several checks: the cross-validation with the results of the best

method by country, further analysis of the estimates between the two main approaches and

individual values composing the aggregate, inconsistencies between indicators due to

selection of different methods. These qualitative checks will need to be formalized in the

future and included in the aggregation framework. In general the aggregation proved to be

more robust but for further work it would be important to ensure (a) a better consistency of

the results across indicators using for example the same combination of methods for all

indicators for a particular country; (b) improve the convergence of the candidate estimates

that enter the aggregate; (c) take into account the dependence within specific groups of

methods in the filtering process.

(b) Standard deviation of the estimate

Note that the historical performance only provides an indication of the average performance

of the participating estimates. Thus, two aggregate estimates with identical average

performance of its constituent estimates but with different values of 𝐾+, that is different

numbers of participating estimates are treated the same. In order to discriminate with respect

to 𝐾+we need to make a conceptual shift by moving from the mere point estimation to interval

estimation.

77

In contrast to point estimates, interval estimates account for the uncertainty of the data

estimation process, resulting in an estimate of both, the mean value and the standard deviation

of the target variable. In order to do this, we first need an estimate of the standard deviation of

the aggregate estimate.

We obtain the latter using a bootstrap approach. Thus,, we draw 𝐾+ independent observations

from a 𝑁(𝜇𝑇∗ , 𝜎𝑇) distribution and plug them into the objective function of Step 2 of the

aggregation algorithm. By maximizing the objective function conditional on the generated set

of observations we obtain a value 𝜇𝑇,𝑚∗ of the aggregate estimate. If we repeat this exercise M

times we get a set of values 𝜇𝑇,𝑚∗ with 𝑚 = 1, … , 𝑀.

Under the assumption that the EU-SILC estimate is an unbiased estimate of the population

indicator 𝜇𝑡, we have that the EU-SILC indicator is distributed according to 𝑁(𝜇𝑇 , 𝜎𝑇). Given

that the flash estimate is an estimate of the population parameter with standard deviation 𝜅𝑇

and under the assumption of independence of the EU-SILC estimate and the flash estimate,

the new estimated distribution can be written as

𝑁 (𝜇𝑇∗ , √𝜅𝑇

2 + 𝜎𝑇2)

where 𝜎𝑇 is the standard deviation derived from the last available EU-SILC dataset.

Based on the estimated distribution of the SILC indicator we can derive a prediction interval

for the flash estimates 2015 which takes into account both the sampling error of EU-SILC

and the uncertainty of the model.

The size of the prediction interval provides an indication of the quality of the aggregate

estimate. Indeed, if there is a large number of estimates participating in the construction of the

aggregate estimate with more or less uniform participation the model-based standard

deviation (𝜅𝑇) will be relatively small indicating an increased robustness of the aggregate

estimate. On the other hand, if only a small number of individual estimates contribute to the

aggregate estimate 𝜅𝑇 will be large reflecting an unstable behaviour of the aggregate estimate.

The same is true if the contribution of one or very few estimates is large compared to the

other participating estimates resulting in a non-uniform weights vector.

4.2.4. Conclusions

The first important conclusion that can be drawn from this analysis is that both, the aggregate

estimate and the estimate derived on the basis of a country-specific objective function largely

outperform the estimate consisting in choosing the best historical estimate by country and

indicator. Moreover, for nearly all countries the aggregate estimate has a better performance

than the country-specific estimate.

Note that the main difference between both estimates is that the best-by-country estimate only

uses the information provided by one of the available candidate models whereas the aggregate

estimate aims at combining the information of the well-performing candidate estimates.

78

leading to a more robust model selection procedure. Moreover, in the aggregation algorithm

the convergent/divergent behaviour of the candidate models within the target year constitutes

an integral part of the performance assessment setting. On the contrary, the best-by-country

estimate does not take into account the relative performance of the candidate models at the

target year and hence is exclusively based on the use of historical performance. However, the

strength of the best-by-country estimate is that the model selection is done at the country-level

which implies that for a given country the same model is used to produce the flash estimates

for all the indicators assuring consistency among the latter. Note that using different methods

for distinct indicators by no means implies the occurrence of an inconsistent set of indicators.

However, using the same model for all the indicators estimates is a sufficient condition for

consistency.

We can thus conclude that aggregating the information provided by several well-performing

methods allows creating more robust flash estimates compared to the other two selection

procedures. However there are two main issues to be addressed:

In contrast to the best-by-country estimate, the aggregation algorithm is not

constrained to the use of one sole model at the country level and hence the set of

participating models might very well vary across indicators. Hence, the aggregation

algorithm does not assure consistency in advance.

The optimisation algorithm of aggregation doesn't take into account the hierarchical

structure of the methods at this point so we can have issues to balance the participation

of the two approaches.

79

5. Flash estimates 2015

5.1. Production of FE 2015

Following the comparative analysis done for 2014 indicating an outperformance of both, the

best-by-country and the best by-country-and-indicator model selection method, by the

aggregation method, the flash estimates 2015 are constructed based on the aggregation

algorithm as follows: (1) discard the low-performing methods with low historical performance

(described in Section 4.2); (2) aggregate those flash estimates that have passed the filtering

step into one single value; (3) discard candidate flash estimates that diverge in the target year.

In practice, we compute three different types of aggregate estimates using the iterative

optimization procedure described above. In each of the three cases the subset of individual

flash estimates and hence the input of the aggregation algorithm is different. Hence, for the

computation of the first aggregate estimate we exclusively take into account those estimates

that were built using micro-simulation methods. As far as the second aggregate estimate is

concerned we only consider individual estimates stemming from the PQM family. Finally, a

third version of the algorithm considers the estimates provided by both model families. Thus,

the set of candidate models considered for the computation of the first two aggregate

estimates are nested within the set of candidate models of the third estimate.

Figures 5.1 to 5.3 illustrate the participation of the different methods in the construction of the

aggregate estimate at target year 2015 in the case where the set of candidate models consists

of both, micro-simulation models and pqm models. The colours provide an indication of the

relative importance of each of the models. To start with, the white cells indicate that a given

model framework has not been available for a specific country. A cell highlighted in red

means that this model is characterized by a historical performance lower than the threshold

value of 0.1 and implying an elimination of that model at the filtering step. The models that

pass the filtering step are either highlighted in green or orange. The orange cells indicate that

this model has been discarded during the iterative aggregation scheme. Hence, only the

models marked in green have had an impact on the construction of the final aggregate

estimate. The darker the shade of green of a given cell, the higher the influence of that model

framework.

We can notice that for most countries the share of micro-simulation models passing the

filtering step for AROP and ARPT is considerably higher than the corresponding share of

PQM models. A similar conclusion can be drawn in terms of the income deciles (D10, D30,

MEDIAN, D70, D90). A slightly different behaviour can be observed for the QSR indicator

for which the performances of both approaches seems to be more balanced. Thus, we can say

at this point that the micro-simulation models tend to outperform the PQM models across

indicators and countries.

Besides the figures clearly show that within both model families the number of models

passing the filtering step is at its largest for the AROP indicator. As far as the ARPT is

concerned, the figure shows that all the members of the PQM family have relatively low

80

performance and that consequently for several countries they do not play any role or any

important role in the construction of the aggregate estimate.

Figure 5.1: Contribution of each of the flash estimates to the aggregate estimate of the 2015 AROP

indicator

Figure 5.2: Contribution of each of the flash estimates to the aggregate estimate of the 2015 ARPT

indicator

81

Figure 5.3: Contribution of each of the flash estimates to the aggregate estimate of the 2015 QSR

indicator

The next step was to perform an ex-ante analysis of quality in order to decide between the

different sets of indicators as well as identifying cases in which indicators are still considered

unreliable. This assessment is mainly done on the historical performance and the convergence

in the target year taking into account uncertainty, including also more qualitative information.

If the aggregate estimate built from micro-simulation models and the PQM-based estimate

have a similar historical performance but provide contradictory or at least discrepant results at

the target year, their joint information is considered to be unreliable. If one approach has a

considerably higher historical performance compared to the other, we report the information

provided by the high-performing estimate. Finally, if both estimates have good historical

performance, without one of them considerably out-performing the other one, we report the

information provided by the combined estimate considering both model families. In the future

this kind of hierarchically-structured decision process can be directly introduced in the

aggregation algorithm

Apart the main income inequality indicators (AROP and QSR) we produced the flash

estimates 2015 also for the so-called positional indicators, namely deciles and the ARPT.

For the two main inequality indicators the aggregation clearly outperforms the other two

selection procedures (the best by country and best by country and indicator). Methods,

mainly coming from microsimulation, which have a relative higher performance on positional

indicators perform less well for AROP or QSR. As mentioned above the participation number

in the aggregate is high for inequality indicators (AROP and QSR) but relatively low for

location parameters (ARPT and the deciles of the income distribution). However, the

predictive power of the aggregate estimate increases with the participation number. Hence, if

the participation number is small the aggregation algorithm cannot fully deploy its potential

and hence sees its performance reduced to the performance of the by-country-and-indicator

estimate which, as mentioned above, is underperforming when compared to the by-country

82

estimate. This has led to the decision to use the aggregation algorithm for inequality indicators

only and to resort the by-country estimate for location parameters.

This leads to sub-optimal solutions in which for the estimation of a specific decile we rely on

a single method which is the best method by country and indicator. Therefore, for this set of

indicators, we decided to opt for the best by country method in order to ensure a better

consistency across the different deciles. This still leads to a large number of indicators that

have a performance lower than the threshold for specific deciles: for example D10 seems a

particular difficult indicator.

Further work needs to improve some of the issues raised by the use of different models: 1)

need to better take into account the hierarchical structure of the methods used: by sub-group,

by approach etc. 2) need to ensure a higher consistency across indicators. The same objective

function in the optimization algorithm by country would ensure this. However, the first results

show that methods which perform better for predicting inequality indicators are not

necessarily the same as for positional indicators.

5.2. Communication of flash estimates 2015: magnitude-direction scales

Providing a point estimate for the population parameter does not take into account the

uncertainty of the flash estimate. Despite its simplicity, communicating a single number gives

a false impression of precision. Alternatively, we could take into account the uncertainty of

the estimate by providing two main elements (a) a prediction interval; and (b) a magnitude-

direction scale for the estimated change in the flash estimate. From the overlap of the

prediction interval with the change classes, we can derive the probabilities of each of the

latter.

Based on preliminary consultation with or main users and in order to take onto account the

uncertainty of the flash estimate we suggest using a magnitude-direction scale with 6 classes

(MD6), to which the YoY change will be assigned:

(1) major increase [+++]

(2) moderate increase [+]

(3) (quasi) stable / minor changes [O]

(4) moderate decrease [-]

(5) major decrease [---]

(6) no conclusion (when we get contradictory signals from different methods with good

past performance).

This would allow associating a probability to each magnitude scale, and therefore taking into

account the uncertainty of the point estimate.

We tend to see favourably such an approach, which takes into account the uncertainty of our

estimates; this is expressed also by the fact that we have included the "no conclusion" class in

83

the scale: this indicates a case when the signal is highly ambiguous, the results (or the

prediction interval) cover equally multiple divergent classes.

Figure 5.4: Illustration MD6 scale

Estimate Moderate increase

Estimate Probability of major decrease: --- moderate decrease: --- stable / minor change: 5% moderate increase: 75% major increase: 20%

The definition of the classes requires choosing a meaningful threshold. There are two main

approaches that can be envisaged:

− Country specific thresholds that are defined as multiples of standard deviation

observed in EU-SILC.

− Pre-defined thresholds based on uniform values for all countries which take into

consideration an user assessment of what is considered a relevant change

One argument in favour of having uniform thresholds is related to communication, namely the

difficulty in explaining why a change of the same magnitude is described as (in an extreme

but nevertheless possible example) "a stable situation" for one country, and "a large decrease"

84

for another. However, depending on the sampling design and the sample size a specific YoY

change can be significant or not for one country.

Therefore, we present both types of scales. For the first option the thresholds that define the

classes are multiples of the standard deviation (SD) as follows:

− major increase / decrease: >4SD

− moderate increase / decrease: 2-4SD

− minor changes [O]: ±2SD

For the second option we choose a more granular scale that depends on the specific indicator

with 7 possible change classes.

Once we have defined the thresholds for the MD6 classes we can derive the estimated

probabilities for our flash estimates to fall within pre-specified intervals based on the standard

deviation as calculated under in point a).

𝑃(𝑎 ≤ �̃�𝑇 < 𝑏) = 𝐹 (𝑏, 𝜇𝑇∗ , √𝜅𝑇

2 + 𝜎𝑇2) − 𝐹 (𝑎, 𝜇𝑇

∗ , √𝜅𝑇2 + 𝜎𝑇

2).

This formula is used to compute the estimated probabilities to be within the different MD6

classes with intervals being fixed as ] − ∞, 4 ⋅ 𝜎𝑇], [−2 ⋅ 𝜎𝑇 , −𝜎𝑇[, [−𝜎𝑇 , 𝜎𝑇[, [𝜎𝑇 , 2 ⋅ 𝜎𝑇[,

[4 ⋅ 𝜎𝑇 , +∞[. Thus for instance the estimated probability for the third MD6 class is computed

as

𝑃(−𝜎𝑡 ≤ �̃�𝑇 < 𝜎𝑡) = 𝐹 (𝜎𝑡, 𝜇𝑇∗ , √𝜅𝑇

2 + 𝜎𝑇2) − 𝐹 (−𝜎𝑡, 𝜇𝑇

∗ , √𝜅𝑇2 + 𝜎𝑇

2).

5.3. Main results

Figures 5.5 to 5.11 show the two proposal for magnitude direction scales for the main

indicators: with absolute thresholds and with relative threshold. The results with either a low

historical performance or a high historical performance but with inconsistencies across the

two approaches in the target year are not presented in the tables as considered unreliable.

Figure 5.5 shows the flash estimates for 2015 for AROP. They are available for all Member

States, except for Ireland, Czech Republic and Spain. In most countries, a minor change is the

dominant class. A major increase is predicted for Germany (2013-2015), a moderate decrease

for Slovenia and Greece and a major decrease for Romania. The scale based on absolute

thresholds is more granular so it highlights more changes.

In figure 5.6, there are the flash estimates for 2015 for QSR. As for AROP, minor changes are

estimated for most countries. Figures 5.7 to 5.11 show the flash estimates for 2015 for the

positional indicators. They are not available for several countries due to inconsistent results

across approaches or low historical performance. The value of absolute thresholds is in table

5.19.

85

Figure 5.5: Flash estimate 2015 AROP

Source: EU-SILC 2009-2014 or 2015 when available; FE 2013/15 or 2014/15 in national currency;

Unreliable: CZ, ES, IE

86

Figure 5.6: Flash estimate 2015 QSR


Unreliable: DE, EE, ES, LT

87

Figure 5.7: Flash estimate 2015 ARPT


Unreliable: EL, ES, HR, LV

88

Figure 5.8: Flash estimate 2015 D10


Unreliable: EE, EL, ES, CY, LT, RO, UK

89



Unreliable: BE, BG, EE, EL, ES, IT, LT, RO, SI, SK, UK

90



Unreliable: BG, EL, HU, MT, PT, SI, UK

91



Unreliable: BG, DK, EE, EL, LT, LU, AT, SK

92

Tables 5.12 to 5.18 show the detailed results for the flash estimates 2015 for the ratio (AROP,

QSR) and positional indicators (ARPT, D10, D30, D70, D90). All figures are shown in these

tables, including flash estimates considered not informative or unreliable which are not to be

disseminated to the users. In all tables, the estimated change of the indicator compared to its

last observed value (either 2013 or 2014) is shown along with its prediction interval. It can be

observed that for some countries the prediction interval is rather large. Moreover results

should be interpreted with caution taking into account the historical performance of the

estimates in order to assess how reliable the estimates are.

93

Table 5.12: Flash estimates 2015 by country:: At-risk-of-poverty rate (AROP)

Source: EU-SILC 2009-2014 or 2015 when available; FE 2013/15 or 2014/15;

Unreliable: CZ, IE, ES

94

Table 5.13: Flash estimates 2015 by country:: QSR

Source: EU-SILC 2009-2014 or 2015 when available; FE 2013/15 or 2014/15; Unreliable: EE, ES, DE, LT

95

Table 5.14: Flash estimates 2015 by country:: At-risk-of-poverty threshold (ARPT)


Unreliable: EL, ES, HR, LV

96

Table 5.15: Flash estimates 2015 by country: D10

Source:

EU-SILC 2009-2014 or 2015 when available; FE 2013/15 or 2014/15 in national currency;

Unreliable: EE,EL, ES, CY, LT, RO, UK

97

Table 5.16: Flash estimates 2015: D30


Unreliable: BE, BG, EE, EL, ES, IT, LT, R, SI, SK, UK

98



Unreliable: BG, EL, HU, MT, PT, SI, UK

99



Unreliable: BG, DK, EE, EL, LT, LU, AT, SK

100

Table 5.19: Bounds for MD6 classes

BE BG CZ DK DE EE IE EL ES FR HR IT CY LV LT LU HU MT NL AT PL PT RO SI SK FI SE UK

[---] / [-] -1.1 -1.1 -0.8 -1.4 -0.8 -1.2 -2.2 -1.1 -0.8 -0.6 -1.1 -1.6 -2.0 -1.1 -2.0 -2.8 -2.0 -1.4 -1.4 -1.4 -1.1 -0.6 -0.6 -0.6 -1.4 -0.8 -0.8 -0.3

[-] / [O] -0.5 -0.6 -0.4 -0.7 -0.4 -0.6 -1.1 -0.6 -0.4 -0.3 -0.6 -0.8 -1.0 -0.6 -1.0 -1.4 -1.0 -0.7 -0.7 -0.7 -0.6 -0.3 -0.3 -0.3 -0.7 -0.4 -0.4 -0.1

[O] / [+] 0.5 0.6 0.4 0.7 0.4 0.6 1.1 0.6 0.4 0.3 0.6 0.8 1.0 0.6 1.0 1.4 1.0 0.7 0.7 0.7 0.6 0.3 0.3 0.3 0.7 0.4 0.4 0.1

[+] / [+++] 1.1 1.1 0.8 1.4 0.8 1.2 2.2 1.1 0.8 0.6 1.1 1.6 2.0 1.1 2.0 2.8 2.0 1.4 1.4 1.4 1.1 0.6 0.6 0.6 1.4 0.8 0.8 0.3

[---] / [-] -2.1 -2.9 -1.2 -1.7 -2.0 -2.8 -3.6 -1.7 -1.1 -1.4 -2.0 -2.4 -3.4 -2.1 -1.1 -3.5 -1.5 -2.9 -1.2 -2.0 -1.2 -1.5 -1.8 -1.2 -1.8 -1.1 -2.1 -1.6

[-] / [O] -1.0 -1.4 -0.6 -0.9 -1.0 -1.4 -1.8 -0.9 -0.6 -0.7 -1.0 -1.2 -1.7 -1.1 -0.6 -1.8 -0.7 -1.4 -0.6 -1.0 -0.6 -0.7 -0.9 -0.6 -0.9 -0.6 -1.0 -0.8

[O] / [+] 1.0 1.4 0.6 0.9 1.0 1.4 1.8 0.9 0.6 0.7 1.0 1.2 1.7 1.1 0.6 1.8 0.7 1.4 0.6 1.0 0.6 0.7 0.9 0.6 0.9 0.6 1.0 0.8

[+] / [+++] 2.1 2.9 1.2 1.7 2.0 2.8 3.6 1.7 1.1 1.4 2.0 2.4 3.4 2.1 1.1 3.5 1.5 2.9 1.2 2.0 1.2 1.5 1.8 1.2 1.8 1.1 2.1 1.6

[---] / [-] -3.1 -6.9 -3.1 -1.6 -4.1 -4.5 -3.3 -4.3 -5.4 -1.5 -4.0 -3.7 -4.3 -4.1 -0.7 -3.7 -3.1 -2.8 -2.3 -3.2 -2.6 -3.8 -6.1 -1.9 -4.0 -1.7 -2.5 -3.1

[-] / [O] -1.6 -3.5 -1.6 -0.8 -2.0 -2.2 -1.7 -2.2 -2.7 -0.8 -2.0 -1.9 -2.1 -2.1 -0.4 -1.8 -1.5 -1.4 -1.1 -1.6 -1.3 -1.9 -3.1 -1.0 -2.0 -0.8 -1.2 -1.6

[O] / [+] 1.6 3.5 1.6 0.8 2.0 2.2 1.7 2.2 2.7 0.8 2.0 1.9 2.1 2.1 0.4 1.8 1.5 1.4 1.1 1.6 1.3 1.9 3.1 1.0 2.0 0.8 1.2 1.6

[+] / [+++] 3.1 6.9 3.1 1.6 4.1 4.5 3.3 4.3 5.4 1.5 4.0 3.7 4.3 4.1 0.7 3.7 3.0 2.8 2.3 3.2 2.6 3.8 6.1 1.9 4.0 1.7 2.5 3.1

[---] / [-] -3.0 -3.4 -1.6 -1.9 -2.3 -1.8 -3.1 -2.1 -1.9 -1.6 -2.7 -2.1 -3.4 -2.5 -0.9 -5.2 -2.8 -2.6 -1.6 -1.6 -1.4 -2.7 -3.2 -1.4 -1.6 -1.3 -2.3 -1.7

[-] / [O] -1.5 -1.7 -0.8 -1.0 -1.2 -0.9 -1.6 -1.0 -1.0 -0.8 -1.4 -1.0 -1.7 -1.3 -0.5 -2.6 -1.4 -1.3 -0.8 -0.8 -0.7 -1.3 -1.6 -0.7 -0.8 -0.7 -1.1 -0.8

[O] / [+] 1.5 1.7 0.8 1.0 1.2 0.9 1.6 1.0 1.0 0.8 1.4 1.0 1.7 1.3 0.5 2.6 1.4 1.3 0.8 0.8 0.7 1.3 1.6 0.7 0.8 0.7 1.1 0.8

[+] / [+++] 3.0 3.4 1.6 1.9 2.3 1.8 3.1 2.1 1.9 1.6 2.7 2.1 3.4 2.5 0.9 5.2 2.8 2.6 1.6 1.6 1.4 2.7 3.2 1.4 1.6 1.3 2.3 1.7

[---] / [-] -2.1 -2.9 -1.2 -1.7 -2.0 -2.8 -3.6 -1.7 -1.1 -1.4 -2.0 -2.4 -3.4 -2.1 -1.1 -3.5 -1.5 -2.9 -1.2 -2.0 -1.2 -1.5 -1.8 -1.2 -1.8 -1.1 -2.1 -1.6

[-] / [O] -1.0 -1.4 -0.6 -0.9 -1.0 -1.4 -1.8 -0.9 -0.6 -0.7 -1.0 -1.2 -1.7 -1.1 -0.6 -1.8 -0.7 -1.4 -0.6 -1.0 -0.6 -0.7 -0.9 -0.6 -0.9 -0.6 -1.0 -0.8

[O] / [+] 1.0 1.4 0.6 0.9 1.0 1.4 1.8 0.9 0.6 0.7 1.0 1.2 1.7 1.1 0.6 1.8 0.7 1.4 0.6 1.0 0.6 0.7 0.9 0.6 0.9 0.6 1.0 0.8

[+] / [+++] 2.1 2.9 1.2 1.7 2.0 2.8 3.6 1.7 1.1 1.4 2.0 2.4 3.4 2.1 1.1 3.5 1.5 2.9 1.2 2.0 1.2 1.5 1.8 1.2 1.8 1.1 2.1 1.6

[---] / [-] -1.9 -2.6 -1.6 -1.6 -2.3 -2.1 -4.2 -1.8 -1.8 -1.6 -2.4 -1.8 -3.6 -2.8 -0.9 -4.7 -1.6 -2.2 -1.6 -2.1 -1.6 -1.7 -2.2 -1.3 -1.9 -1.0 -1.9 -1.8

[-] / [O] -1.0 -1.3 -0.8 -0.8 -1.1 -1.1 -2.1 -0.9 -0.9 -0.8 -1.2 -0.9 -1.8 -1.4 -0.5 -2.3 -0.8 -1.1 -0.8 -1.1 -0.8 -0.9 -1.1 -0.6 -0.9 -0.5 -1.0 -0.9

[O] / [+] 1.0 1.3 0.8 0.8 1.1 1.1 2.1 0.9 0.9 0.8 1.2 0.9 1.8 1.4 0.5 2.3 0.8 1.1 0.8 1.1 0.8 0.9 1.1 0.6 0.9 0.5 1.0 0.9

[+] / [+++] 1.9 2.6 1.6 1.6 2.3 2.1 4.2 1.8 1.8 1.6 2.4 1.8 3.6 2.8 0.9 4.7 1.6 2.2 1.6 2.1 1.6 1.7 2.2 1.3 1.9 1.0 1.9 1.8

[---] / [-] -2.0 -4.8 -2.4 -1.6 -2.8 -3.2 -4.4 -2.1 -2.9 -3.0 -2.3 -2.3 -6.5 -3.3 -1.8 -4.4 -3.4 -3.5 -2.2 -3.0 -2.6 -3.2 -3.6 -1.5 -2.3 -2.1 -2.1 -2.0

[-] / [O] -1.0 -2.4 -1.2 -0.8 -1.4 -1.6 -2.2 -1.0 -1.5 -1.5 -1.1 -1.2 -3.3 -1.6 -0.9 -2.2 -1.7 -1.7 -1.1 -1.5 -1.3 -1.6 -1.8 -0.7 -1.2 -1.1 -1.1 -1.0

[O] / [+] 1.0 2.4 1.2 0.8 1.4 1.6 2.2 1.0 1.5 1.5 1.1 1.2 3.2 1.6 0.9 2.2 1.7 1.7 1.1 1.5 1.3 1.6 1.8 0.7 1.2 1.1 1.1 1.0

[+] / [+++] 2.0 4.8 2.4 1.6 2.8 3.2 4.4 2.1 2.9 3.0 2.3 2.3 6.5 3.3 1.8 4.4 3.4 3.5 2.2 3.0 2.6 3.2 3.6 1.5 2.3 2.1 2.1 2.0

[---] / [-] -0.3 -0.8 -0.6 -0.9 -0.5 -0.4 -0.4 -0.9 -1.7 -2.7 -0.4 -0.5 -0.6 -0.6 -0.9 -0.7 -0.6 -0.3 -0.5 -0.4 -0.3 -0.6 -0.7 -0.2 -0.4 -0.2 -0.2 -0.5

[-] / [O] -0.1 -0.4 -0.3 -0.5 -0.3 -0.2 -0.2 -0.4 -0.8 -1.3 -0.2 -0.3 -0.3 -0.3 -0.4 -0.3 -0.3 -0.1 -0.2 -0.2 -0.2 -0.3 -0.3 -0.1 -0.2 -0.1 -0.1 -0.2

[O] / [+] 0.1 0.4 0.3 0.5 0.3 0.2 0.2 0.4 0.8 1.3 0.2 0.3 0.3 0.3 0.4 0.3 0.3 0.1 0.2 0.2 0.2 0.3 0.3 0.1 0.2 0.1 0.1 0.2

[+] / [+++] 0.3 0.8 0.6 0.9 0.5 0.4 0.4 0.9 1.7 2.7 0.4 0.5 0.6 0.6 0.9 0.7 0.6 0.3 0.5 0.4 0.3 0.6 0.7 0.2 0.4 0.2 0.2 0.5

[---] Major decrease [-] Moderate decrease [O] Minor change [+] Moderate increase [+++] Major increase

D70, in %

D90, in %

QSR, in percentage points

Bounds

AROP, in percentage points

ARPT, in %

D10, in %

D30, in %

MEDIAN, in %

101

6. REFERENCES

Anderson, T. W. and Darling D. A. (1954) A test of goodness-of-fit, Journal of the American

Statistical Association, vol. 49, no. 268, pp. 765-769.

Avram, S., Sutherland, H., Tasseva, I. and Tumino, A. (2011) Income protection and poverty

risk for the unemployed in Europe, Research Note 1/2011 of the European Observatory on

the Social Situation and Demography, European Commission.

Banbura, M., Giannone, D., Modugno, M. and Rechlin, L. (2013) Now-casting and the real-

time data flow, ECB Working Paper Series, no. 1564, pp. 1-53.

Berger, Y.G., and Priam, R. (2016) A simple variance estimator of change for rotating

repeated surveys: an application to the European Union Statistics on Income and Living

Conditions household surveys, Journal of the Royal Statistical Society, Series A Statistics

in Society, vol. 179, issue , January 2016, pp. 251–272

Bourguignon, F., Bussolo, M. and Da Silva, L.P. (2008) The impact of macro-economic

policies on poverty and income distribution Macro-Micro Evaluation Techniques and

Tools, The World Bank and Palgrave-Macmillan, New York.

Brandolini, A., D’Amuri, F. and Faiella, I. (2013) “Country case study – Italy”, Chapter 5 in

Jenkins et al, The Great Recession and the Distribution of Household Income, Oxford:

Oxford University Press.

Brewer, M., Browne, J., Hood, A., Joyce, R., Sibieta, L. (2013) The Short- and Medium-

Term Impacts of the Recession on the UK Income Distribution, Fiscal Studies, 34(2):

179–201.

Claeskens, G. and Hjort, N. L. (2008) Model Selection and Model Averaging, Cambridge

University Press.

Essama-Nssah, B. (2005) The Poverty and Distributional Impact of Macroeconomic Shocks

and Policies: A Review of Modeling Approaches, World Bank Policy Research Working

Paper 3682, Washington, DC.

Fernandez Salgado M., Figari, F., Sutherland, H. and Tumino, A. (2014) Welfare

compensation for unemployment in the Great Recession, The Review of Income and

Wealth, 60(S1): 177–204.

Figari, F., Iacovou, M., Skew, A. and Sutherland, H. (2012) Approximations to the truth:

comparing survey and microsimulation approaches to measuring income for social

indicators, Social Indicators Research, 105(3): 387-407.

Figari, F., A. Paulus and H. Sutherland (2015) “Microsimulation and Policy Analysis” in

A.B. Atkinson and F. Bourguignon (Eds.) Handbook of Income Distribution, Vol 2B.

Amsterdam: Elsevier, pp. 2141-2221. ISBN 978-0-444-59430-3

Figari, F., Salvatori, A. and Sutherland, H. (2011) Economic downturn and stress testing

European welfare systems, Research in Labor Economics, 32: 257-286.

Hjort, N. L and Claeskens, G. (2003) Frequentist model average estimators, Journal of the

American Statistical Association, 98: 879-899.

Immervoll, H., Levy, H., Lietz, C., Mantovani, D. and Sutherland, H. (2006) The sensitivity

of poverty rates in the European Union to macro-level changes, Cambridge Journal of

Economics, 30: 181-199.

Jara, X.H. and Leventi, C. (2014) Baseline results from the EU27 EUROMOD (2009-2013),

102

EUROMOD Working Paper EM18/14, Colchester: ISER, University of Essex.

Kalman, R. E. (1960) A new approach to linear filtering and prediction problems,

Transactions of the ASME-Journal of Basic Engineering, no. 8, pp. 35-45.

Keane C., Callan, T., Savage, M., Walsh, J.R. and Timoney, K. (2013) Identifying Policy

Impacts in the Crisis: Microsimulation Evidence on Tax and Welfare, Journal of the

Statistical and Social Inquiry Society of Ireland, XLII: 1-14.

MacDonald, J. B. and Xu, Y. J. (1995) A generalization of the beta distribution with

applications, Journal of Econometrics, vol. 66, no. 1-2, pp. 133-152.

Massey, J. (1951) The Kolmogorov-Smirnov Test for Goodness of Fit, Journal of the

American Statistical Association, vol. 46, no. 253, pp. 68-78.

Matsaganis, M. and Leventi C. (2014) Poverty and inequality during the Great Recession in

Greece, Political Studies Review, 12: 209 -223.

Narayan, A. and Sánchez-Páramo C. (2012) Knowing, when you don't know: using

microsimulation models to assess the poverty and distributional impacts of

macroeconomic shocks, The World Bank, Washington DC.

Navicke, J., Rastrigina, O. and Sutherland, H. (2013) Nowcasting Indicators of Poverty Risk

in the European Union: A Microsimulation Approach, Social Indicators Research, 119(1):

101-119.

O'Donoghue, C.,Loughrey, J. (2014) Nowcasting in Microsimulation Models: A

Methodological Survey, Journal of Artificial Societies and Social Simulation 17 (4) 12.

Peichl, A. (2008) The Benefits of Linking CGE and Microsimulation Models: Evidence from

a Flat Tax Analysis, IZA Discussion Paper No. 3715.

Rastrigina, O., Leventi, C., Vujackov S. and Sutherland, H. (2016) Nowcasting: estimating

developments in median household income and risk of poverty in 2014 and 2015,

Research Note 1/2015, Social Situation Monitor, European Commission.

Rastrigina, O., Leventi, C. and Sutherland, H. (2015) Nowcasting: estimating developments

in the risk of poverty and income distribution in 2013 and 2014, Research Note 1/2014,

Social Situation Monitor, European Commission.

Shumway, R. H. and Stoffer, D. S. (1982) An approach to time series smoothing and

forecasting using the EM algorithm, Journal of Time Series Analysis, vol. 3, no. 4, pp.

253-264.

Sutherland, H. and Figari, F. (2013) EUROMOD: the European Union tax-benefit

microsimulation model, International Journal of Microsimulation, 6(1): 4-26.

103

Abbreviations

AROP At-risk-of-poverty rate

ARPT At-risk-of-poverty threshold

CDF Cumulative Distribution Function

ESS European Statistical System

EU-

SILC European Union Statistics on Incomes and Living Conditions

FE Flash Estimates

GB2 Generalized beta distribution of the second kind

GDP Gross Domestic Product

ISER

Institute for Social and Economic Research in University of

Essex

LFS Labour Force Survey

MD6 Magnitude-direction scale with 6 classes

MIP Macroeconomics Imbalance Procedure

NA National Accounts

PQM Parametric Quantile Modelling

QAF Quality Assessment Framework

QSR Interquartile share ratio

YoY Year On Year change

MAPE Mean Absolute Percentage Error

HD Hellinger Distance

Country codes

BE Belgium

BG Bulgaria

CZ Czech Republic

DK Denmark

DE Germany

EE Estonia

IE Ireland

EL Greece

ES Spain

FR France

HR Croatia

IT Italy

CY Cyprus

LV Latvia

LT Lithuania

LU Luxembourg

HU Hungary

104

MT Malta

NL Netherlands

AT Austria

PL Poland

PT Portugal

RO Romania

SI Slovenia

SK Slovakia

FI Finland

SE Sweden

UK United Kingdom

Documents

Flash estimates on income and poverty indicators ... FLASH ESTIMATE... · Flash estimates for income and poverty indicators The objective of the exercise is to produce flash estimates