Upload
others
View
12
Download
0
Embed Size (px)
Citation preview
1
Doc. FLASH/2-2/16/EN
estat.f.1 (2016)
2nd MEETING OF THE TASK-FORCE
ON FLASH ESTIMATES FOR INCOME AND POVERTY
25th October 2016
At 09.00 am
Eurostat - Luxembourg
BECH B2/404
Flash estimates on income and poverty indicators:
methodological report
EUROPEAN COMMISSION EUROSTAT
Directorate F: Social statistics
Unit F-1: Social indicators; methodology and development; relations with users
2
Acknowledgements:
This report summarises the main outcomes of the first year of the Eurostat project on
developing flash estimates on income inequalities and poverty for the European Union.
The microsimulation results in this report represent a major output from the cooperation
between Eurostat and the Institute for Social and Economic Research (ISER) in University of
Essex. They rely on the use of EUROMOD, the European Union tax-benefit microsimulation
model, managed, maintained and developed by ISER. Eurostat would like to thank the
EUROMOD team in ISER for their fruitful cooperation, in particular, the director of the
EUROMOD programme, Holly Sutherland and Olga Rastrigina, the Senior Researcher
coordinating our cooperation.
The process of extending and updating EUROMOD is financially supported by the
Directorate General for Employment, Social Affairs and Inclusion of the European
Commission [Progress grant no. VS/2011/0445]. The EUROMOD input dataset is based on:
1) Micro-data from the EU Statistics on Incomes and Living Conditions (EU-SILC) for
Belgium, Bulgaria, Czech Republic, Denmark, Germany, Ireland, Croatia, Cyprus, Latvia,
Hungary, Malta, the Netherlands, Portugal, Romania, Slovenia, Finland and Sweden; 2) EU-
SILC data and some variables from national EU-SILC data for Estonia, Lithuania,
Luxembourg, and Poland; and 3) national EU-SILC data for Greece, Spain, France, Italy,
Austria, and Slovakia, made available by respective national statistical offices. The results
based on PQM make use of EU-SILC for all countries.
3
Table of Contents Summary ................................................................................................................................................. 4
1. Introduction ................................................................................................................................. 22
2. Overview target income data in EU-SILC ................................................................................... 24
3. Methodology ................................................................................................................................. 26
3.1. General framework ............................................................................................................... 26
3.2. Microsimulation .................................................................................................................... 27
3.2.1. Changes in population characteristics ........................................................................... 28
3.2.2. Updating non-simulated income sources ...................................................................... 30
3.2.3. Simulating changes in tax-benefit policies .................................................................... 33
3.3. Parametric Quantile Modelling (PQM) .................................................................................. 34
3.4. Benchmark models ................................................................................................................ 37
4. Quality framework ....................................................................................................................... 38
4.1. Quality Assurance .................................................................................................................. 38
4.1.1. Consistency analysis of auxiliary sources ...................................................................... 38
4.1.2. Intermediate quality checks .......................................................................................... 46
4.2. Quality assessment ................................................................................................................ 53
4.2.1. Retrospective performance analysis ............................................................................. 53
4.2.2. Method selection .......................................................................................................... 66
4.2.3. Ex-ante quality assessment for flash estimates in the target year ............................... 76
4.2.4. Conclusions .................................................................................................................... 77
5. Flash estimates 2015 ................................................................................................................... 79
5.1. Production of FE 2015 ........................................................................................................... 79
5.2. Communication of flash estimates 2015: magnitude-direction scales ................................. 82
5.3. Main results ........................................................................................................................... 84
6. References .................................................................................................................................. 101
4
SUMMARY
Objectives
The indicators on poverty and income inequality based on the European Union Statistics on
Income and Living Conditions (EU-SILC) are an important part of the toolkit for the
European Semester. Currently, the indicators on income of year N are only available in the
autumn of year N+2, which comes too late for the EU’s policy agenda. Therefore, an
improvement in this area represents an important priority for monitoring the effectiveness of
social policies at EU level.
The strategy of the European Statistical System (ESS) for providing more timely data on
income is based on two pillars:
− Flash estimates about income N in Summer N+1, that would be available to prepare
and to start the European Semester and provide data to Macroeconomics Imbalance
Procedure (MIP) in autumn N+1
− Final EU-SILC data (or at least data more robust than flash estimates) on income N
during the European Semester (end N+1 / early N+2)
This report presents the methods developed so far as well as their application for the year
2015 (income year1) based on the on-going exercise in Eurostat. It includes the description of
the main methodologies tested and of the quality assessment framework (QAF) put in place in
order to ensure a comparable way to assess results stemming from different methods. In June
2016, Eurostat set up a task force on “Flash estimates on income distribution” with 8 Member
States (BE, IT, PT, LU, DE, FR, SE and UK) to support the on-going work, with several
National Statistical Institutes already developing and releasing flash estimates at national level
(UK2, FR
3). Eurostat and the Institute for Social and Economic Research (ISER) at the
University of Essex are also cooperating in order to further develop and enhance the use of
EUROMOD4 and microsimulation for nowcasting income distribution indicators
5.
The main challenge given the sensitivity of releasing early estimates is to ensure a sufficient
level of confidence in these figures and thus Eurostat efforts focused in this first part of the
project on four main areas: (1) a review of several methods from more simple naïve
forecasting methods to more complex meso-level dynamic factor models and microsimulation
1Years throughout the paper will refer to income year (N) and not the EU-SILC data collection year (N+1). For
Ireland the income reference period is the last twelve months and for the United Kingdom the current income is
annualised and aims to refer the current calendar year, i.e. weekly estimates are multiplied by 52, monthly by
12. 2https://www.ons.gov.uk/peoplepopulationandcommunity/personalandhouseholdfinances/incomeandwealth/bulle
tins/nowcastinghouseholdincomeintheuk/2015to2016 3http://www.insee.fr/fr/themes/document.asp?ref_id=ia23).
4EUROMOD is a tax-benefit microsimulation model for the European Union, for more details see
https://www.euromod.ac.uk/. 5 This cooperation partly builds on the previous work on nowcasting carried out in ISER, see e.g. Rastrigina et
al. (2016).
5
techniques; (2) the set-up of the QAF in order to both select and identify conditions of use of
these methods and assess the quality of final estimates; (3) build a common platform for
exchanging experiences via the formation of the task force and creation of a dedicated
website6 and 4) address communications issues and possible ways to disseminate such
experimental statistics.
Flash estimates for income and poverty indicators
The objective of the exercise is to produce flash estimates for income and poverty indicators
collected through EU-SILC. The starting point is the main income indicators used in the
frame of Europe 2020 and in the scoreboard of key social indicators: at-risk-of poverty rate7
and the income quintile share ratio8 (QSR or S80/S20). However, an analysis of the trends in
EU-SILC (2008-2014) showed that these indicators are rather structural and the observed
yearly changes are often not significant. In terms of early estimates there can be other
indicators which are more sensitive: i.e. important shifts at different points of the income
distributions throghout the crisis are captured by what we call "positional indicators" (i.e. at-
risk-of-poverty threshold (ARPT) and deciles).
Therefore, we included both inequality indicators such as AROP and the evolution of income
at different points of the income distributions in our analysis.
In general terms flash estimates on income distribution and poverty should9:
− Refer to a past yearly reference period (year N). This should differentiate the
flash estimates from a purely forecasting exercise. In this round the main purpose
is to use and develop methods and apply them to test the production of flash
estimates 2015 (therefore collected through EU-SILC 2016) ;
− Refer to a set of distributional indicators for equivalised disposable income. The
set included in the analysis contains at-risk-of-poverty rate (AROP), the income
quintile share ratio and the deciles of the equivalised household disposable
income;
− Be based on an information set that includes the latest income data available from
EU-SILC (income N-1 or N-2 even) that is the reference data source in
establishing EU poverty statistics. This will be enhanced with more timely
auxiliary information from the reference period (year N) such as Labour Force
Survey (LFS), National Accounts, etc.;
− Be based on a set of statistical techniques, such as calibration, modelling,
extrapolation that are not traditionally used in the calculation of social statistics
indicators.
6https://ec.europa.eu/eurostat/cros/content/flash-estimates-income-and-poverty-indicators_en
7 http://ec.europa.eu/eurostat/statistics-explained/index.php/People_at_risk_of_poverty_or_social_exclusion
8http://ec.europa.eu/eurostat/statistics-explained/index.php/Glossary:Income_quintile_share_ratio
9http://unstats.un.org/unsd/nationalaccount/workshops/2009/ottawa/AC188-S31a.PDF
6
Methodological approaches
Flash estimates have already been developed at EU level in relation to macro-indicators such
as early releases of the GDP growth and inflation rate10
. However, in our case the focus is on
the distributional changes and this implies more complex models that allow the estimation of
the entire distribution and capture the complex interaction of a large number of various past
and present events, such as the effects of economic and monetary policies, the
implementation of social reforms or shifts in macroeconomic circumstances or demographic
changes.
Two main approaches are currently being tested in the frame of the flash estimates for income
and poverty indicators project: (1) Evolution of income components-microsimulation; (2)
Parametric quantile approach. In addition a third approach is based on basic models to be used
as benchmarks in the estimation. A fourth method, based on monthly income, will be
developed when these data will become available for several countries.
The first approach is in line with current practices in different Member States and it aims to
micro-simulate income changes at individual/household level within EU-SILC microdata. The
EU tax-benefit microsimulation model EUROMOD is used for this purpose in combination
with timelier macro-level statistics on changes in demographics, employment characteristics
and income. For the purposes of the nowcasting exercise standard EUROMOD policy
simulation routines are enhanced with additional adjustments to the input data to take into
account changes in the population structure and labour market characteristics. Micro-
simulation models are very powerful in accounting for the effect of changes in the the tax-
benefit policies which leads to a high explanatory power of this family of models.Moreover,
you can disentangle the effects on specific socio-economic groups or differentiate between
policy effects and other factors such as changes in the population structure. However, there
are several assumptions that might affect the abbility of the micro-simulation model to predict
actual changes in observed data. For a limited number of countries there are simulation for tax
evasions and non take-up of benefits but in the large majority of cases EUROMOD doesn't
inlude behavioral effects due to policy changes.
The parametric quantile model (PQM) family relies on the exploitation of macro-level
information, provided by National Account figures such as GDP, consumption, wages and
salaries. It aims at producing flash estimates of a set of distributional parameters, which are
assumed to encode the entire information contained within the income distribution, instead of
income distribution itself. The estimate of the distribution is derived at a second-step of the
estimation procedure through a mapping algorithm which allows reconstituting the
distribution from the set of parameters. The flash estimates of the distributional parameters
are obtained on the basis of dynamic factor models. Hence, PQM models are designed to
capture the impact of changes in macroeconomic circumstances on the income indicators.
They might also capture changes in social and economic policies if and only if these changes
are reflected somehow in the National Account figures.
10
http://ec.europa.eu/eurostat/statistics-
explained/index.php/Inflation_%E2%80%93_methodology_of_the_euro_area_flash_estimate
7
A third group of more “naïve” forecasting techniques were included in the framework
essentially to be used as benchmarks. The benchmark models build flash estimates on the
basis of extrapolations of past income information. Note that the benchmark model do not
contain explanatory variables which means that they only provide predictive information and
hence have no explanatory power in terms fo additional variables that can drive the changes in
the income distribution.
A fourth approach relies on the collection in EU-SILC of additional item(s) on current income
and it will be done in the next stage when further data will become available from countries
which collected this information on voluntary basis. Currently we have collected data from 7
countries and this approach will be included both as a stand-alone line of work or as a
complement for updating information on income components for the next round of flash
estimates (2016).
The aim of considering several approaches was to develop a toolkit of methods (models) that
can be used in all countries and different macro-economic conditions. Ideally we should have
one method that outperforms the other in order to have a common harmonised approach.
However, the four families of methods differ in terms of model complexity (and flexibility),
the way current information is used in the production of the flash estimates and their
explanatory power. The most comprehensive model might not constitute the best choice in all
countries and socio-economic circumstances. In some cases, a complex model, accounting for
all potential influences on the social indicators, can be over-parametrized and lead to a lower
predictive power relative to the simpler models. All the methods considered in the exercise
went through the common quality assessment framework and were scored based on their
historical performance in the test period (2012-201411
). In the future we will add to this
toolkit the current income approach as well as other possible national solutions that can be
then assessed through the common quality framework.
Quality assessment framework
The quality of the flash estimates depends on both the methods and sources used. Therefore,
setting and using a common quality framework becomes a central part of the production
process to select the methods and auxiliary sources that perform best in reproducing EU-SILC
indicators for previous years. This would also allow to benchmark results stemming from
Eurostat exercise, including the estimates based on EUROMOD done in collaboration with
ISER, as well as other national initiatives. In particular, the United Kingdom and France are
also producing their own flash estimates at national level and therefore a common platform
for validating and benchmarking results is essential.
The quality framework is composed of two main parts: (1) Quality assurance, which focuses
on analysing inconsistencies in the input data and includes several intermediate quality checks
11
At the moment of writing this report 2014 income data (EU-SILC 2015) were not available for DE, IE, IT,
CY, LU
8
along the estimation process; (2) Quality assessment, which focuses on the historical
performance of different methods.
(1) Quality assurance
A retrospective assessment of data consistency was performed for the auxiliary information to
be used in the estimation process (LFS and National Accounts). This is a historical
comparison, i.e. between two sets of observed data. Consistency means that relevant macro-
level aspects such as income components and totals as measured by National Accounts or
unemployment as measured by LFS are similar to EU-SILC trends. Source inconsistencies
can steam from differences in concepts, coverage, sampling variance or errors in the data
collection. The lack of such consistency at macro level in the trends showed by EU-SILC and
auxiliary sources would mean that a FE generated by a model that includes the
aforementioned macro data as explanatory variables is not expected to be a good estimation of
the EU-SILC-derived indicator. This analysis showed that overall employment trends for LFS
can be used as a proxy in most of the countries. Data on totals for different income
components from National Accounts and EU-SILC were checked in terms of levels and trends
for the period 2009-2014. The intermediate checks showed that for property and self-
employment income, the two sources are very different. Moreover, these components vary
significantly between the quarterly National Accounts data available for the flash estimates
and later revisions. Therefore, only income from employment, social benefits and taxes from
the National Accounts are used for time series modelling. Further work is ongoing in Eurostat
on the reconciliation of macro-micro data on income and this can feed also the work on flash
estimates12
.
In addition, we also introduced several intermediate checks in order to assess and disentangle
effects introduced in the different steps of the estimation process: e.g. comparing distributions
after calibration, the cumulated effects of calibration and uprating on the different income
components. More details on the quality assurance part can be found in Section 4.1.
(2) Quality assessment
As part of the assessment procedure, we follow three main stages:
2.1. Perform a retrospective analysis of the performance of flash estimates for 2012- 2014
income years.
In particular we look at:
(a) The two quality metrics for comparing point estimates and year-on-year changes for the
key income indicators: accuracy and consistency;
(b) An additional quality measure that takes into account the uncertainty related to the
sampling variance. This has the advantage also of providing an absolute performance
threshold so we can select out the low-performing methods.
12
http://www.iariw.org/dresden/gregorini.pdf
9
The performance analysis included 36 methods: 24 from PQM, 9 from microsimulation and
the 3 benchmark models. In terms of performance over the period of analysis there are some
fundamental conclusions that can be drawn at this stage:
First, the performance analysis illustrates that for the two main families of methods tested
(Microsimulation and PQM) the number of well-performing models is at its largest for the
AROP indicator. As far as the positional parameters (ARPT and the deciles) are concerned, in
general the PQM family have a relatively lower performance. In terms of QSR, we have
observed that performances for both families of methods are lower than for AROP.
Second, for all indicators the number of well-performing flash estimates among the family of
micro-simulation models is significantly higher than the corresponding share of PQM models
and this for almost all countries. However, there is a strong complementarity between the two
main approaches across countries and indicators. This raised the issue of method selection.
2.2. Select a strategy for method selection
Three different model selection procedures have been tested for choosing the value of the
flash estimate out of a set of candidate models.
− A best method by country that takes into account national specificities: (1) computing
the average performance of each of the candidate models across indicators at the
country level; (2) selecting the candidate model with the highest average performance.
Microsimulation tends to outperform PQM in this case (16 countries). However, PQM
can outperform microsimulation in some countries (e.g. SI, CY).
− A best method by country and indicator that takes into account the specific traits of
groups of indicators. The results showed that there is a kind of polarisation of methods
according to their performance by two groups of indicators: inequality indicators
(AROP and QSR) and positional indicators. For the latter group the Microsimulation
performs better while for the first group the results between the two main approaches
are more balanced.
− An aggregate of the best methods based on three steps: 1) discard the methods with
low historical performance (described in Section 4.2); (2) discard candidate flash
estimates that diverge in the target year; (3) aggregate those flash estimates that have
passed the filtering step into one single value.
The outcomes were evaluated for the simulation of the flash estimate 2014 for countries for
which 2015 EU-SILC data was available at the moment this report was produced. Therefore
there were three main steps: (1) first single methods within each family are filtered out on the
basis of their historical performance 2012-2013. The uncertainty based quality measure
mentioned in section 4.2.1 is used given that we can identify a precise threshold for selecting
out low performers; (2) the three candidate models for method selection are applied on the
pool of methods that passed the quality criteria; (3) the resulting flash estimates are assessed
against the observed EU-SILC indicators income 2014.
The first important conclusion that can be drawn from this analysis is that both, the aggregate
estimate and the estimate derived on the basis of a country-specific objective function largely
10
outperform the estimate consisting in choosing the best historical estimate by country and
indicator. Moreover, for nearly all countries the aggregate estimate has a better performance
than the country-specific estimate.
Note that the main difference between both estimates is that the best-by-country estimate only
uses the information provided by one of the available candidate models whereas the aggregate
estimate aims at combining the information of the well-performing candidate estimates
leading to a more robust model selection procedure. Moreover, in the aggregation algorithm
the convergent/divergent behaviour of the candidate models within the target year constitutes
an integral part of the performance assessment setting. On the contrary, the best-by-country
estimate does not take into account the relative performance of the candidate models at the
target year and hence is exclusively based on the use of historical performance. However, the
strength of the best-by-country estimate is that the model selection is done at the country-level
which implies that for a given country the same model is used to produce the flash estimates
for all the indicators assuring consistency among the latter. Note that using different methods
for distinct indicators by no means implies the occurrence of an inconsistent set of indicators.
However, using the same model for all the indicators estimates is a sufficient condition for
consistency.
We can thus conclude that aggregating the information provided by several well-performing
methods allows creating more robust flash estimates compared to the other two selection
procedures. However there are two main issues to be addressed:
In contrast to the best-by-country estimate, the aggregation algorithm is not
constrained to the use of one sole model at the country level and hence the set of
participating models might very well vary across indicators. Hence, the aggregation
algorithm does not assure consistency in advance.
The optimisation algorithm of aggregation doesn't take into account the hierarchical
structure of the methods at this point so we can have issues to balance the participation
of the two approaches.
2.3. Ex-ante quality assessment in the target year
Historical performance
A first measure of quality is based on the historical performance based on the period 2012-
2014. The historical performance of the aggregate estimate (𝑯+ ) is specified as the average
of the historical performances of all methods that participate in the construction of the final
(aggregate) estimate that pass the quality criteria (1) and the convergence criteria in the target
year (2). If a single method is used for the final estimate (e.g. best by country) this method is
assessed based only on its historical performance.
Prediction interval and estimated probabilities to be within certain change classes
11
Providing a point estimate for the population parameter does not take into account the
uncertainty of the flash estimate. We therefore computed the standard deviation that includes
both the sampling error and the uncertainty of the flash estimates. However, it doesn't take
into account model bias. Considering that we focus on estimating YoY changes rather than
the actual levels we assume that at least partially we adjust for the systematic bias. The point
estimate for the flash is calculated by adding the YoY change to the last observed value in
EU-SILC.
Flash estimates 2015(income year)
(1) Production FE 2015
A preliminary set of flash estimates 2015 are produced based on the following steps:
Produce the individual values for the FE 2015 for all methods by country and indicator
Apply the QAF to the flash estimates 2015:
o Assess past performance for individual estimates
o Use two distinct method selection procedures: the aggregation algorithm and
the best by country to produce the flash estimates 2015 for each indicator and
country. As far as the inequality indicators are concerned, we produce three
different estimates for each indicator based on three different sets of candidate
models, that is a first one which only considers micro-simulation models, a
second one based exclusively on PQM models and a third one combining both
model families. Benchmark models are used for comparison during the
analysis but are not included in the final aggregate.
o Perform an ex-ante analysis of quality in order to decide between the different
sets of indicators as well as identifying cases in which indicators are still
considered unreliable. This assessment is mainly done on the historical
performance and the convergence in the target year taking into account
uncertainty, including also more qualitative information.
If the aggregate estimate built from micro-simulation models and the PQM-based estimate
have a similar historical performance but provide contradictory or at least discrepant results at
the target year, their joint information is considered to be unreliable. If one approach has a
considerably higher historical performance compared to the other, we report the information
provided by the high-performing estimate. Finally, if both estimates have good historical
performance, without one of them considerably out-performing the other one, we report the
information provided by the combined estimate considering both model families. In the future
this kind of hierarchically-structured decision process can be directly introduced in the
aggregation algorithm.
12
Figure 113
illustrates the participation of each of the available approaches in the construction
of the aggregate estimate for AROP and QSR at target year 2015. It transmits a clear picture
of complementarity.
Figure 1: Participation by approach in the FE 2015 – AROP and QSR
Apart the main income inequality indicators (i.e. AROP and QSR) we produced the flash
estimates 2015 also for the positional indicators, namely deciles and the ARPT.
For the two main inequality indicators the aggregation clearly outperforms the other two
selection procedures (i.e. the best by country and best by country and indicator). Methods,
mainly coming from PQM have a relative higher performance on AROP or QSR than for
positional indicators. Thus the participation number in the aggregate is higher for inequality
13
UK not available for the microsimulation exercise
13
indicators but relatively lower for positional parameters. However, the predictive power of the
aggregate estimate increases with the participation number. Hence, if the participation number
is small the aggregation algorithm cannot fully deploy its potential and hence sees its
performance is reduced to the performance of the by-country-and-indicator estimate which, as
mentioned above, is underperforming when compared to the by-country estimate. This has led
to the decision to use the aggregation algorithm for inequality indicators only and to resort the
by-country estimate for positional parameters.
This still leads to a large number of indicators that have a performance lower than the
threshold for specific deciles: for example D10 seems a particular difficult indicator.
Further work is needed to improve some of the issues raised by the use of different models:
1) Need to better take into account the hierarchical structure of the methods used: by sub-
group, by approach etc.
2) Need to ensure a higher consistency across indicators. The same objective function in the
optimization algorithm by country would ensure this. However, the first results show that
methods which perform better for predicting inequality indicators are not necessarily the same
as for positional indicators.
Table 1 presents an overview of the flash estimate exercise for income year 2015. It highlights
the main stages, conclusions and decisions.
14
Table1: Overview of the flash estimate exercise
15
If for some aggregates or for the best by country results no method passes the performance threshold than the flash estimate is
considered unreliable for the specific country and indicator.
1) Assess past performanceIf for some aggregates or for the best by country results no method passes the historical performance threshold than the flash
estimate is considered unreliable for the specific country and indicator.
2) Two main method selection strategies: aggregation and best by
country. 3 sets of aggregate estimates are produced: for each
approach (microsimulation and PQM) as well as the overall one.
3) Ex-ante assessment for flash estimate 2015
Ex-post quality
assessment
The relative magnitude direction scale(MD6) is a relative grid for communicating the YoY change.
It is country specific and it takes into account the sampling error in order to clasify the change as
minor/moderate or major change.
The absolute magnitude direction scale(MD8)-has the same categories across countries and is more
straigtforward to interpret. However it should be considered together with the statistical significance of
the change.
Review the quality of estimates when EU-SILC data becomes available.
Assess revisions and improve consequently the methodology and the QAF
We have decided that when the historical performance is under the performance threshold these estimates should not be
communicated as they are not reliable. The magnitude direction scales are proposed in order to balance in the communication the
information provided and the associated uncertainty of our estimates.
Production flash
estimates 2015
Set of individual FE 2015 for all methods
Communication main
results
Apply the QAF to FE 2015
A comparative analysis was conducted for the four sets of estimates based on the three aggregates and the best by country. Both
the convergence in the target year and the historical performance were compared.
If both methods have high performance and the results are convergent we are chossing the overall aggregate. If the aggregate for
one of the two approaches clearly outperforms the other only one approach is chosen. If they have similar performance but they
have divergent values in the target year results are considered not reliable.
Due to the low number of methods passing the aggregation for the locational indicators and coherence issues among indicators we
have chosen the best by country method. This should be addressed in the future by using a country specific aggregation which
maximizes the expected quality fo flash estimates at country level for all chosen indicators. If specific indicators such as AROP are
considered more important this could be given a higher weight.
In general results are more robust and they have a higher expected quality based on their historical performance for AROP.
For the deciles there are still several cases where no method passes the threshold and in general also the aggregation performs less well.
FE 2015
To be further developed when EU-SILC data 2016 will become available.
16
(2) Communication of results
Despite its simplicity, communicating a single number gives a false impression of precision.
Alternatively, we could take into account the uncertainty of the estimate by providing a
magnitude-direction scale for the estimated change in the flash estimate. From the overlap of
the prediction interval with the change classes, we can derive the probabilities of each of the
latter.
Based on preliminary consultation with our main users and in order to take onto account the
uncertainty of the flash estimate we propose using for communication to the main public a
magnitude-direction scale with 6 classes (MD6), to which the YoY change will be assigned:
(1) major increase [+++]
(2) moderate increase [+]
(3) (quasi) stable / minor changes [O]
(4) moderate decrease [-]
(5) major decrease [---]
(6) no conclusion (when we get contradictory signals from different methods with good
past performance).
There are two possibilities concerning the choice of thresholds for the MD6:
Relative thresholds which are country-specific and are calculated as multiples of the
standard deviation for each indicator.
Absolute thresholds which are common to all countries and should be based on users'
needs. This can be extended to several classes. In our case, we used eight.
The prediction interval calculated for the flash estimate allows calculating a probability for
each change class, and therefore takes into account the uncertainty of the point estimate.
(2) Overview of the main results
DISCLAIMER
The data and conclusions presented hereafter are to be seen as preliminary results of newly
developed experimental methods. The specific figures for income indicators 2015 are given
for illustrative purposes and should not be considered at this stage as official or experimental
data produced by Eurostat.
Figure 2 illustrates the flash estimates 2015 for AROP based on the two types of magnitude-
direction scales. In most countries, a minor change is the dominant class. A major increase is
predicted for Germany (2013-2015), a moderate decrease for Slovenia and Greece and a
major decrease for Romania. The scale based on absolute thresholds is more granular so it
highlights more changes. Their statistical significance should also be considered: the same
change in terms of percentage points can be significant in one country and not in another.
17
Figure 2: Flash estimates AROP 2015: MD6 scales with relative thresholds (left) and with absolute thresholds (right)
18
Table 2 below summarises the main predicted changes for the complete set of flash estimates
for 2015 (and 2014 in case EU-SILC 2015 was not available yet at the moment when
estimates were produced). We use the MD6 scale based on relative thresholds in order to
decide the type of increase/decrease. The red colours signal a decrease, the green, an increase
and the shades give the magnitude of the change. The white/light blue background signal
minor changes. The bounds for the changes are country specific and depend on the standard
deviation for each indicator from EU-SILC. Tables 3 and 4 contain a time perspective as they
include the observed changes in EU-SILC between 2008-2013/2014 plus the flash estimates
between last observed values (2013/2014) to 2015.
Positional indicators are less stable than AROP; therefore, more changes are expected given
the chosen MD6 classes. Further analysis is available in Section 5.2. Tables 5.1 to 5.7 include
the point estimate, for levels of the indicators and the YoY change as well as the 95%
prediction intervals as well as information on the historical performance measure which is
another important parameter for assessing the expected quality and precision for the flash
estimates.
Table 2: Changes for main income indicators between last observed EU-SILC (2013/2014)-2015
19
Table 3: Changes for main income indicators across years: observed values and FE 2015
Tables 4: Changes for deciles cut-off points across years: observed and FE 2015
20
Way forward
This report provides the first results for the flash estimates on income indicators 2015.
Following this first exercise it is essential not only to further develop the methodological
framework but also to discuss issues of quality assessment and communication that emerge in
the context of this work.
− A consultation with the users is needed at this stage in order to prioritise their needs in
terms of indicators that need to be available as early estimates and their expectations
of users in terms of the explanatory power associated with the communicated changes
in main indicators. A relevant trade-off is between the predictive and explanatory
power of different methods. The different approaches differ also in their ability to
provide details and explanations about the change. Microsimulation allows to link
specific estimated changes to the implementation of certain social policies and to
identify particular groups or income components affected. On the other hand there are
countries for which the historical performance of these models was not good enough
to pass the assessment framework at this moment.
− A consultation with the producers on the methodology and the way of communicating
the results, taking into account the users' needs will also take place.
− For the production of flash estimates, in general the aggregation proved to be more
robust but the algorithm should be further improved to include qualitative information
as well as lessons learned from the ex-post evaluation of this first round of flash
estimates. Four main areas were identified for further work (a) take into account ex-
ante qualitative information related to specific groups of methods and/or intermediary
checks (b) a better consistency of the results across indicators using for example the
same combination of methods for all indicators for a particular country; (c) improve
the convergence of the candidate estimates that enter the aggregate; (d) take into
account the dependence within specific groups of methods in the filtering process.
− In terms of methodology the next step would be to include the current income
approach in the framework and assess its results not only as a standalone line of work
but also as a potential complement for updating/predicting changes in the income
components. Further improvements of the two main approaches are also foreseen
based on their historical performance and these first results for 2015.
− Further development of the quality framework is essential to build a common platform
together with the EU Member States for comparing and assessing results.
− In terms of communication, point estimates can be provided but they are subject to
several sources of uncertainty: model bias and variance, the sampling error in EU-
SILC, inconsistencies between the different data sources entering the estimation. This
21
raises not only a question of quality but also of communication of the results to the
public. A first proposal, at least until the methodology and the quality assessment are
more mature, is to start the communication of the flash estimates with a magnitude-
direction scale that would give a more general message concerning the expected
changes in the income indicators. There are two main types of scales: 1) relative scales
which are country-specific and focused on highlighting significant changes or 2)
absolute scales which are common to all countries and based on user-friendly
thresholds that can provide more information. However, providing more information
implies also a higher degree of uncertainty related to the communicated changes.
− A first future important milestone is to include the flash estimates on income
distribution as a part of the European Semester toolkit so the flash estimates should be
validated through the assessment framework and the consultation with the main users.
− After validation through the assessment framework and with the producers, Eurostat
aims at publishing the first round of flash estimates 2016 during summer 2017.
22
1. Introduction
Policy makers in the European Union (EU) are facing an increasing demand for monitoring
changes in social conditions at national and EU level, especially during periods of rapid
economic changes. The European Union Statistics on Income and Living Conditions (EU-
SILC) is an instrument aimed at collecting comparable multidimensional microdata on
income, poverty, social exclusion and living conditions for EU countries. EU-SILC indicators
on poverty and income inequality are a key part of the toolkit for the European Semester, the
yearly cycle of economic policy coordination among EU Member States. The timeliness of
these indicators is crucial for keeping track of the effectiveness of policies and the impact of
macroeconomic conditions on poverty and income distribution. However, partly due to the
complexity of the data collection process, income data for year N is only available in the
autumn of year N+2, which comes too late for the policy agenda.
The current strategy at the level of the European Statistical System (ESS) for providing more
timely data on income is based on two pillars. The first pillar refers to the production of flash
estimates on income distribution and poverty. It states that flash estimates on income of year
N should be available in time to prepare for the European Semester in the autumn of year
N+1. The second pillar concerns the final EU-SILC microdata, and it states that the microdata
on income of year N should be available during the European Semester (i.e. in the end of year
N+1 or early N+2).
This report aims to provide an overview of the process that Eurostat has put in place for the
development of flash estimates for income distribution at EU level. The flash estimates are
based on nowcasting techniques which estimate the current income distribution using
microsimulation modelling techniques or other appropriate methods based on household
income microdata from a previous period in combination with the most up-to-date statistics.
In this report, there are two main methodological approaches explored, each of them with
several sub-groups of methods being described and assessed.
In the context of development of flash estimates, the quality assessment framework becomes a
central part of the production process in order to select the methods and auxiliary sources that
perform best in reproducing EU-SILC indicators for previous years. The quality framework is
composed of two parts: (1) the quality assurance, which focuses on analysing inconsistencies
in the input data and includes several intermediate quality checks along the process; (2) the
quality assessment, which focuses on the historical performance of different methods and the
uncertainty of the final flash estimates. The quality framework is an essential element for both
the scientific and political acceptance of the flash estimates. Thus, it is important that the flash
estimates are disseminated together with methodological guidelines concerning their
development, interpretation and appropriate use.
The estimation exercise has a set of target indicators: in this round we focused on the main
income indicators such as at-risk-of-poverty rate, and quantile share ratio, but also the income
deciles which are more sensitive to changes in the economic situation and therefore they
reflect changes at different points in the income distribution. These are collected through EU-
SILC which is the reference data source on income and poverty statistics and therefore ideally
23
we should capture overall the relevant changes in the income distribution reflected in EU-
SILC.
In June 2016, Eurostat set up a task force on “Flash estimates on income distribution” with 8
Member States (BE, IT, PT, LU, DE, FR, SE and UK) to support the on-going work, with
several National Statistical Institutes already developing and releasing flash estimates at
national level (UK14
, FR15
). Eurostat and the Institute for Social and Economic Research
(ISER) at the University of Essex are also cooperating in order to further develop and enhance
the use of EUROMOD and microsimulation for nowcasting income distribution indicators16
.
The structure of the report is the following: Section 2 analyses the characteristics of the target
income data in EU-SILC. Section 3 explains the nowcasting methodology. Section 4 presents
the quality framework. Section 5 concludes by summarising the most important findings for
2015 and the next steps for the development of the flash estimate on income distribution.
14
https://www.ons.gov.uk/peoplepopulationandcommunity/personalandhouseholdfinances/incomeandwealth/bull
etins/nowcastinghouseholdincomeintheuk/2015to2016 15
http://www.insee.fr/fr/themes/document.asp?ref_id=ia23 16
https://www.iser.essex.ac.uk/research/publications/working-papers/euromod/em8-16
24
2. Overview target income data in EU-SILC
The objective of this section is to summarise the main characteristics of the key income and
poverty indicators based on EU-SILC that we are aiming to estimate.
Figure 2.1. shows the year-on-year (YoY) changes from 2008 until the last available value at
the time of writing this report. The YoY changes are divided by their standard deviation.
Figure 2.1: Standardized YoY change
The majority of the observed YoY changes are non-significant (i.e. below 2 standard
deviations) in the case of AROP (69%) and QSR (82%); the opposite is true for ARPT (20%)
and D10 (35%).
Figure 2.2. focuses on the evolution of the key positional indicators (D10, D30, MEDIAN,
D70 and D90) from 2008 until the last available value.
25
Figure 2.2. Overview of quintiles' evolution
Despite the different dynamics during the examined period, the two noticeable phenomena
are:
an upward shift of the whole distribution (with the exception of Ireland, Greece,
Spain, Portugal and Cyprus);
an increase in inequality driven by both the relative impoverishment of the lower-
income households and the increase of the top-income ones; notable exceptions are
Greece and – to some extent – Ireland and Cyprus (where the quintiles are coming
closer together), and countries like Spain, France, Italy, Netherlands and Portugal
(where the quintiles move in tandem).
26
3. Methodology
3.1. General framework
Changes in the income distribution are the outcome of a complex interaction of a large
number of various past and present events, such as the application of economic and monetary
policies, the implementation of social reforms or shifts in macroeconomic circumstances. The
most general and flexible approach towards flash estimation would be to develop model
frameworks which allow to account for all these different influences acting on the income
distribution and the income inequality indicators.
However, the most general model framework might not constitute the best choice in all kinds
of situations and circumstances. In order to illustrate the potential sub-optimality of highly-
flexible model frameworks we first need to have a look at the concepts of predicitve power
and explanatory power in the context of flash estimation of income indicators. Predictive
power is the ability of a model to produce adequate approximations of the values taken by
income indicators at different time points before they are available. On the other hand the
explanatory power of a model indicates the capacity of that model to effectively explain the
predictions that it comes up with. As mentioned above, changes in income distribution are
linked to different phenomena ranging from macroeconomic circumstances to the
implementation of specific policies. However, there are cases in which the change in income
distribution might be overwhelmingly impacted by one main event. In this situation, the
general model framework, accounting for all potential influences on the social indicators,
reveals to be over-parametrized leading to a lower predictive power relative to the simpler
models which only focus on small numbers of effects and hence are characterized by a much
more parsimonious parametrization. The large number of parameters in the general model
framework also has a negative impact on its explanatory power. Indeed, the presence of a
large number of interacting parameters makes it very hard or even impossible to link a given
change in income indicators to specific circumstances.
Thus, having one single highly-flexible model framework at our disposal might not be the best
strategy. Instead, we sould rather aim at building a model toolbox containing a collection of
different models ranging from very specialized frameworks to more general ones. Note
however that replacing one highly-flexible model framework by a model toolkit, requires a
more sophisticated performance analysis scheme which is essential for picking the most
appropriate model framework at each moment in time.
Model toolkit
The model toolkit at our disposal can be divided into three model families, the micro-
simulation models, the parametric quantile models and a set of benchmark models. Note that
the three model families differ in terms of model complexity (and flexibility), the way current
information is used in the production of the flash estimates and their explanatory power.
Micro-simulation models are very powerful in accounting for changes in taxes and benefits
and their respective impacts on the income indicators. According to the micro-simulation
paradigm, the changes in taxes and benefits are simulated directly at the individual or
27
household level. This direct simulation allows to establish a causality link between the
implemented policies and the changes in income indicators. The existence of causal
relationships between policy rules on the one hand and income indicators on the other hand
leads to a high explanatory power of these model family. Moreover, you can disentangle the
effects on specific socio-economic groups.
The parametric quantile family relies on the exploitation of macro-level information,
provided by National Account figures such as GDP, consumption, wages and salaries. Hence,
these type of models are designed to capture the impact of changes in macroeconomic
circumstances on the income indicators. They might also capture changes in social and
economic policies if and only if these changes are reflected somehow in the National Account
figures. Note however that in contrast to the micro-simulation family the parametric quantile
models do not manage to establish causal links between policy rules and changes in
indicators. Instead the relations between the explanatory variables and the income indicators
are based around their correlations resulting in a lower explanatory power relative to micro-
simulation models.
The benchmark models build flash estimates on the basis of extrapolations of past
information. Hence, they do not constitute nowcasting frameworks as such given that they do
not use any kind of present information. Note that the benchmark model do not contain
explanatory variables which means that they only provide predictive information and hence
have no explanatory power.
3.2. Microsimulation
The approach presented here relies on microsimulation modelling and is being developed by
Eurostat in collaboration with the Institute for Social and Economic Research (ISER) at the
University of Essex.
Microsimulation models have been widely used for assessing the distributional impact of
current and future tax-benefit policy reforms, as well as the impact of the evolution of market
incomes, changes in the labour market and in the demographic structure of the population.17
Using microsimulation techniques based on representative household data enables changes in
the distribution of market income to be distinguished and the effects of the tax-benefit system
to be identified taking into account the complex ways in which these factors interact with each
other (Peichl, 2008; Immervoll et al., 2006). Combined macro-micro modelling has also been
used for analysing the impact of macroeconomic policies and shocks on poverty and income
distribution.18
17 Some examples include Brewer et al. (2013) for the UK, Keane et al. (2013) for Ireland, Brandolini et al.
(2013) for Italy, Matsaganis & Leventi (2014) for Greece and Narayan & Sánchez-Páramo (2012) for
Bangladesh, Mexico, Philippines and Poland.
18 A detailed review is provided in Bourguignon et al. (2008) and Essama-Nssah (2005). See also Figari et al.
(2015) for a discussion.
28
The nowcasting methodology presented in this report is based on microsimulation techniques
used in combination with more up-to-date statistics from National Accounts, LFS and other
national sources. It aims at developing a generic approach that can be applied to all EU
countries in a straightforward, flexible and transparent way. By doing so, it ensures the
comparability and consistency of the methodology both across countries and through time.
Our analysis makes use of EUROMOD, the microsimulation model based on EU-SILC data
which estimates in a comparable way the effects of taxes and benefits on the income
distribution in each of the EU Member States. For the purposes of the nowcasting exercise
standard EUROMOD policy simulation routines are enhanced with additional adjustments to
the input data to take into account different changes in the socio-economic conditions of the
underlying population. In order to produce flash estimates for income indicators, the
microsimulation approach is used to update the structure of a micro dataset to account for
changes to the main components of income variables over time. This is based on the
following stages: (1) adjustment for changes to the demographic structure of the population
and for changes to the presence of income sources determined by labour market
characteristics; (2) uprating the level of market income components; and (3) changes in taxes
and benefits due to policy reforms (O'Donoghue and Loughrey, 2014). For this exercise we
used the last year available for all countries for EUROMOD input datasets, namely EU-SILC
2012 (income 2011). This was available for all countries apart United Kingdom and Spain.
For Greece the 2014 file was used.
The remaining of this section explains each of these stages in detail.
3.2.1. Changes in population characteristics
There are two main approaches to take into account changes in population characteristics:
static and dynamic. The static approach is based on reweighting (or calibration). It consists of
the derivation of a new vector of sample weights that brings the marginal distributions from
the base year for a set of main socio-economic variables (e.g. age, labour, gender) to the level
of the target year. In the dynamic process individual trajectories are modelled and individuals
in the sample undergo transitions. The paper tests both approaches. The main auxiliary source
of information used to obtain the population characteristics in the target year is the Labour
Force Survey (LFS) statistics. LFS micro-level statistics for year N are usually available in
April N+1. This allows the production of flash estimates for year N based on the updated
structure for labour and demographics.
Modelling labour market transitions
The dynamic approach to take into account changes in population characteristics is based on
modelling net employment transitions. It accounts for changes in labour market
characteristics, while other population characteristics (such as demographics) are kept
constant. A detailed discussion of this approach can be found in Navicke et al. (2013) and
Rastrigina et al. (2016). Changes in employment are modelled by explicitly simulating
transitions between labour market states (Figari et al., 2011; Fernandez Salgado et al., 2013;
Avram et al., 2011). Two types of transitions are modelled: (i) from non-employment into
employment and (ii) from employment into short-term/long-term unemployment (or
29
inactivity). Observations are selected for transitions based on their conditional probabilities of
being employed rather than being unemployed or inactive. A logit model is used for
estimating these probabilities for working age (16-64) individuals in the EUROMOD input
data. In order to account for gender differences in the labour market situation, the model is
estimated separately for men and women. Students, working-age individuals with permanent
disability or in retirement and mothers with children aged below 2 are excluded from the
estimation, unless they report employment income in the underlying data. Explanatory
variables include age, marital status, education level, country of birth, employment status of
partner, unemployment spells of other household members, household size, number of
children and their age, home ownership, region of residence and urban (or rural) location. The
specification of the logit model used and the estimated coefficients are reported in Rastrigina
et al. (2016).
The weighted total number of observations that are selected to go through transitions
corresponds to the relative net yearly change in employment rates by age group and gender (a
total of 6 strata) as shown in the LFS statistics. Changes from short-term to long-term
unemployment are modelled based on a similar selection procedure but using LFS figures on
long-term unemployment (with unemployment duration more than one year) as an external
source of information. This transition is critical due to its implications for eligibility and
receipt of unemployment benefits. Transitions to and from inactivity are modelled implicitly
through restricting eligibility for unemployment benefits, according to the country-specific
rules.
Labour market characteristics and sources of income are adjusted for those observations that
are subject to transitions. In particular, employment and self-employment income is set to
zero for individuals moving out of employment. For individuals moving into employment,
earnings are set equal to the mean among those already employed within the same stratum.
Unemployment benefits are simulated for those moving out of employment in case they are
eligible for such benefits according to the country rules. If the rules require assessment of
earnings and number of months in work for several years preceding unemployment, we
assume that these remain unchanged throughout the assessment period and are equal to the
values observed in the income reference period. For those moving into long-term
unemployment the eligibility is adjusted assuming that the duration of unemployment spell is
more than one year. In some countries the long-term unemployed are not eligible for any
unemployment benefits (e.g. Latvia); in other countries they are not eligible for
unemployment insurance but still qualify for unemployment assistance (e.g. Portugal); in
countries with fairly long duration of unemployment insurance (e.g. Finland) we assume that
the long-term unemployed continue to receive unemployment insurance.
Reweighting
The static approach to account for changes in population characteristics is based on
reweighting and consists of the derivation of a new vector of sample weights that brings the
marginal distributions from the base year for a set of main socio-demographic variables to the
level of the target year. This approach allows controlling for a wider range of population
characteristics including retirement and demographic changes. In theory, this can be done also
30
through modelling individual transitions but the process is much more complex. However,
there are limitations related to the use of reweighting in the context of nowcasting: given that
it only adjusts the structure of the population to some marginal distributions it may perform
worse in times of rapid economic changes. For example, reweighting cannot capture if
individuals entering a particular state have characteristics completely different from the
characteristics of the people observed in that state in the base year. Such changes were often
observed during recent unemployment shocks.
The variables that are more likely to impact the income distribution over time should be both
highly related to income and volatile: thus the main relevant variables are related to labour
market information and changes in household composition. Other relevant controlled
characteristics include regional information. The reweighting can be done at household or
individual level. In the end the last option was dropped as results were not satisfactory.
The variables at household level are based on demographic and employment characteristics of
individuals within households: number of members in the household by age group, gender
and activity status; household size and number of dependent children; region and degree of
urbanisation. The target distributions of the relevant variables are obtained from the LFS.
However, the initial distributions observed in EU-SILC and LFS are not always consistent.
This implies that reweighting based on LFS margins can introduce a bias. For example, in
Portugal there is a 10% difference in the number of self-employed in 2011. Nevertheless in
both sources there was a decrease of 7% between 2011 and 2012. In such cases we adjusted
the margins in LFS by adding up the percentage change from LFS to the EU-SILC base year.
Hence, the adjustment reflects the change to a more recent structure while systematic source
inconsistencies are ruled out.
Overall, two alternative reweighting methods were tested in this paper:
a. CAL_H_S2L: calibration at household level based on the marginal distributions (levels)
from LFS for the following variables: household size, number of people by age group and
sex, number of people part/full-time employed/self-employed/retired, region and degree
of urbanisation.
b. CAL_H_ S2L_A: calibration at household level based on the changes in the shares for
the same variables.
The principle of calibration is that it minimizes the distance between the vectors of weights so
that the other distributions are not distorted. Further work should investigate not only the
effect on the income distribution but also possible side effects of calibration such as
distortions of distributions for variables not controlled for in the calibration.
3.2.2. Updating non-simulated income sources
After adjusting the input data for changes in the population characteristics, the next step is to
update non-simulated income beyond the income data reference period. This approach applies
uprating coefficients to market incomes19
and non-simulated social benefits (or taxes). The
19
Market incomes are wages and salaries, self-employment income, property income, income from capital, etc.
31
coefficients are based on more timely data sources from the target year which reflect
indexation rules or the change in the average income per recipient. Two approaches are tested
in the paper.
EUROMOD uprating factors
EUROMOD contains uprating factors based on available administrative or survey statistics.
Country-specific updating factors are derived for each income source, reflecting statutory
rules (such as indexation rules) or the change in the average amount per recipient between the
income data reference period and the target year. The latter is preferred for the nowcasting
exercise, especially for pensions. The evolution of average pensions can capture important
changes in the population of pensioners (e.g. inflow of newly retired pensioners with higher
average pensions). In order to capture differential growth rates in employment income,
updating factors are disaggregated by economic activity and/or by economic sector if such
information is available.
Model-based factors for socio-demographic groups
An alternative way to update income also uses EUROMOD uprating factors as a base but
introduces differential growth rates in the income distribution for some important income
components (such as employment and self-employment income) via a model-based approach.
The modelling approach starts from the premise that the growth rates of income of individuals
belonging to a given socio-demographic category follow the same probability distribution. A
socio-demographic category is defined as one specific outcome of all possible combinations
of a collection of categorical variables. The estimation algorithm is divided into two steps: (i)
the classification step and (ii) the modelling step.
The classification step aims at detecting those socio-demographic categories whose time
series of growth rates display similar patterns and to regroup these categories into larger
classes. We use EU-SILC time series for 2008-2011 to compute the average growth rates of
each of the socio-demographic categories. For wages we use the following variables: gender;
geographical regions; degree of urbanisation (densely populated, intermediate, and thinly
populated area); type of employment (full-time, part-time, other); age groups (16-24, 25-34,
35-44, 45-54, 55+); wage quantile group in the base year (QC1, QC2, QC3, QC4, QC5). At
each node of the decision tree there are several possibilities to build the subclasses using
different categorical variables. In order to choose between these potential subclasses we need
to define an optimality criterion.
To obtain the optimality criterion we estimate different logistic regression models and
compute the associate Wald significance test for the regression parameters. The explanatory
variables of the different models are identical and are given by the growth rates from 2008 to
2011 income years. However, the models differ from each other with respect to dependent
variables. Each model uses a different categorical variable as a dependent variable. Finally,
we choose the categorical variable associated with the lowest p-value for the Wald test of
significance indicating the maximum discriminatory power for this variable. Figure 3.1
illustrates the shape of the decision tree for Austria using EU-SILC 2009- 2012 (2008-2011
income years).
32
The modelling step consists of estimating the underlying statistical properties of the growth
time series of the classes using a dynamic factor modelling approach. In order to produce
estimates of income growth rates, the factor model includes explanatory variables which are
more timely than the EU-SILC data, e.g. non-financial quarterly sector accounts such as gross
domestic product, gross saving rate, compensation of employees and social contributions and
benefits.20
The estimates are obtained through a combined use of the expectation-
maximization algorithm and the Kalman filter/smoother (see Shumway and Stoffer, 1982 and
Banbura et al., 2013).
Figure 3.1: Decision tree for building socio-demographic groups in Austria (EU-SILC 2009-2012)
20
Source: Eurostat, non-financial transactions, code “nasq_10_nf_tr”.
33
3.2.3. Simulating changes in tax-benefit policies
After updating market income and other non-simulated income sources, we simulate tax-
benefit policies for each year from the base year up to the target year.
We use EUROMOD to simulate changes in the income distribution within the period of
analysis. Income elements simulated by the model include universal and targeted cash
benefits, social insurance contributions and personal direct taxes. Income elements that cannot
be simulated mostly concern benefits for which entitlement is based on previous contribution
history (e.g. pensions) or unobserved characteristics (e.g. disability benefits). These are read
from the data and updated according to statutory rules (such as indexation rules) or changes in
their average levels over time (see Section 4.1). Both contributory and non-contributory
unemployment benefits are simulated in the model; but e.g. severance payments are not.
Detailed information on EUROMOD and its applications can be found in Sutherland & Figari
(2013).
All simulations are carried out on the basis of the tax-benefit rules in place on the 30th
June of
the given policy year. The exceptions to this rule are Estonia (in 2013), Greece (in 2011,
2013-2015), and Portugal (in 2012), where policy changes after the 30th
of June were taken
into account to better match the annual income observed in the EU-SILC data. In order to
enhance the credibility of estimates, an effort has been made to address issues such as tax
evasion (e.g. in Bulgaria, Greece, Italy and Romania) and benefit non-take-up (e.g. in
Belgium, Estonia, France, Ireland, Greece, Poland, Portugal, Romania, and Finland)21
.
However, such adjustments are not possible to implement in all countries due to data
limitations.22
The last methodological step involves an attempt to account for differences between
EUROMOD and EU-SILC estimates of household income in the data reference year (here
2011). The main reasons for these discrepancies are related to the precision of simulations
when information in the EU-SILC data is limited, issues of benefit non-take-up and tax
evasion, under-reporting of income components, and small differences in income concepts
and definitions.23
In order to account for these differences, an alignment factor is calculated for each household.
The factor is equal to the absolute difference between the value of equivalised household
disposable income in EU-SILC 2012 and the EUROMOD estimate for the same period and
income concept. For consistency reasons, the same household specific factor is applied to all
later policy years. This is based on the assumption that the discrepancy between EUROMOD
and EU-SILC estimates remains stable over time. Further work will investigate the
plausibility of these assumptions and the effect on the final results.
21
https://www.euromod.ac.uk/sites/default/files/working-papers/em3-16.pdf
22 Detailed information on the scope of simulations, updating factors, non-take-up and tax evasion adjustments is
provided in the EUROMOD Country Reports (see: https://www.iser.essex.ac.uk/euromod/resources-for-
euromod-users/country-reports).
23 For more detailed information on these issues see Figari et al. (2012) and Jara and Leventi (2014).
34
Table 3.1 below indicates the codes used for the different microsimulation approaches which
correspond to the three stages of the process.
Table 3.2: Model codes of the Microsimulation approach
Uprating factors Updating labour &
demographics EU-SILC alignment factor
no= Data-based uprating factors
available in EUROMOD for all
income components
cal_H_S2L= Calibration at
household level with non-adjusted
LFS data
EUROMOD= No adjustment
for base year differences
between EU-SILC and
EUROMOD
cla= Model-based uprating
factors for employment and
self-employment income
cal_H_S2L_A= Calibration at
household level with adjusted LFS
data
emd_ad= Adjustment for base
year differences between EU-
SILC and EUROMOD
lab_I_S2L= Labour transitions at
individual level with LFS data
3.3. Parametric Quantile Modelling (PQM)
The main assumption of the parametric quantile method approach or PQM approach is that
the entire information of the population distribution of disposable income can be summed up
into a relatively small set of parameters. In terms of flash estimation procedure this
assumption implies that the estimation target is not the probability distribution function but
the small set of parameters. Thus, the estimates of the population parameters represent
sufficient statistics in the sense that no additional information on the income distribution can
be derived from the underlying EU-SILC sample than the one contained in the parameter
estimates.
In order for this “parametric assumption” to be valid the income distribution needs to belong
to some family of distributions. The first distribution one might think of is the normal
distribution since it would allow for a very parsimonious representation of the income
distribution using only two parameters: the expected value or the mean and the standard
deviation. However, there are two main features present in income distributions which make
the normal distribution a bad choice for modelling income data. First, income distributions
tend to have a right tail and hence are characterized by a positive skewness. This presence of
asymmetries within the income distribution stems from the fact that the income can
potentially take on very large values but that it hardly takes on strong negative values given
that most of the income components that make up disposable income, such as wages and
salaries for example, cannot be smaller than zero. Secondly, income distributions are heavy
tailed. Mathematically speaking heavy-tailedness means that the tails of the income
distribution are not exponentially bounded. In a less technical definition the presence of heavy
tails indicate that the probability mass in the tails of the income distribution is larger
compared to the normal distribution. This increased probability mass in the tails results in a
kurtosis larger than three. Note that the normal distribution has skewness of zero and a
kurtosis equal to three and hence that the values of these two parameters are fixed in advance.
This implies the family of normal distributions does not have enough free parameters to
capture the inherent complexity of the income distribution. Hence, the use of the normal
family would result in an under-parametrization of the income distribution given that the
35
mean and the standard deviation are not sufficient for summing up the entire information of
the income distribution and hence the presence of an inherent model bias.
Thus, given the stylized facts of income data we need at least four parameters to represent the
income probability distribution. One of the most used distributions for income modelling is
the generalized beta distribution of the second kind or GB2 distributions. The GB2
distribution is a four-parameter distribution which nests many common distributions such as
the lognormal, generalized gamma, Weibull, chi-square, half-normal, half-Student's t,
exponential and the log-logistic and hence offers a high degree of flexibility. The four
parameters of the GB2 distribution are commonly referred to as 𝑎, 𝑏, 𝑝 and 𝑞. Generally
speaking, the 𝑏 parameter encodes the central location of the parameter and hence a change in
the value of the 𝑏 parameter implies a horizontal shift of the income distribution. The
parameter 𝑎 indicates the general shape of the entire distribution, whereas 𝑝 and 𝑞 control the
shape of the left and the right tail respectively. In mathematical terms the GB2 is defined by
the following probability distribution function
𝐺(𝑥; 𝑎, 𝑏, 𝑝, 𝑞) =|𝑎|𝑥𝑎𝑝−1 (1 − (
𝑥𝑏
)𝑎
)𝑞−1
𝑏𝑎𝑝𝐵(𝑝, 𝑞)
where 𝐵(𝑝, 𝑞) is the distribution function of the Beta distribution.
However, in most cases even the four parameter distributions as the GB2 might not be
flexible enough to represent all the underlying characteristics of the income distribution. Note
for instance that the GB2 belongs to the class of unimodal distributions. However, in reality
income distributions might very well show multimodal features. Indeed, there might be strong
differences with respect to certain income components across different socio-demographic
categories. For instance the distribution of wages and salaries might be different for men and
women or vary across age groups. Hence, given these diverging characteristics of the income
distribution across socio-demographic categories the distribution of disposable income at the
population level might be the outcome of a mixture distribution, and hence of a combination
of distribution corresponding to different subpopulations. Note that the GB2 distribution can
be generalized to mixture distribution settings such as to accommodate cases of multimodal
distributions. However, in order to do so we need to have some a priori information on the
composition of the different subpopulations.
An alternative solution to the issue of multimodal income distributions is to avoid making any
assumptions on the distribution family. The drawback of this approach is that the set of
parameters that should be used to sum up the sample information is not pre-determined in
advance. Indeed, by imposing the GB2 distribution in advance the set of sufficient statistics is
known to be equal to the set of distributional parameters of the GB2. Thus, if the distribution
family is not imposed in advance the type and the number of parameters to be used as
sufficient statistics needs to be chosen by the analyst.
We have used the income quantiles as sufficient statistics. Given that the number of quantiles
is not fixed in advance, we have tested four different quantile grids. If we denote as 𝑌𝑡
36
disposable income at time period 𝑡 the quantile corresponding to the probability level 𝛼 is
defined as
𝑄𝛼 = 𝐹−1(𝛼)
where 𝐹−1(𝑦𝑡) is the cumulative distribution function of 𝑌𝑡. The four quantile grids differ from each
other in terms of their degree of sparsity such that the probability levels are specified as
𝐺𝑟𝑖𝑑 1: 𝛼 = 0.1, 0.3, 0.5, 0.7, 0.9
𝐺𝑟𝑖𝑑 2: 𝛼 = 0.1, 0.2, … , 0.9
𝐺𝑟𝑖𝑑 3: 𝛼 = 0.05, 0.1, … , 0.9, 0.95
𝐺𝑟𝑖𝑑 4: 𝛼 = 0.01, 0.02, … , 0.98, 0.99
The flash estimates of the quantiles are obtained on the basis of dynamic factor models which
exploit the more timely information of large quantities of explanatory data series consisting of
quarterly National Account figures. We test two different parametrizations of the dynamic
factor model in which the latter takes the form of either a one-factor model or a two-factor
model.
The nowcasted income indicators are derived from the flash estimates of the quantiles in three
different ways. The first approach consists in a direct computation of the income indicators
from the estimates quantiles. The remaining two approaches consist in an indirect
computation of the indicators since they first require a reconstitution of the income
distribution from the estimated quantiles via a so-called mapping step. The indicators are then
derived from the reconstituted distributions. We have developed two different mapping
algorithms. The first one is based on the assumption that the income distribution can be well-
represented on the basis of the GB2 Given this assumption the mapping algorithm derives the
optimal values of the distributional parameters of the GB2 from the nowcasted quantiles by
minimizing a non-linear least squares function. The second one reconstructs the income
density function from the estimated quantiles using Kernel smoothing. Note that whereas the
direct computation and the Kernel-based mapping approach are appropriate in either unimodal
or multimodal cases, the nonlinear least squares mapping approach only provides valid results
in cases of unimodal income distributions.
Table 3.2 below indicates the three elements of the codes of the different PQM approaches
differing in terms of quantile grids, number of factors and mapping algorithm.
Table 3.2: Model codes of the PQM approach
Mapping Factor Model Quantile Grid
G = GB2 1 = Single factor 1 = Grid 1
K = Kernel 2 = Double factor 2 = Grid 2
R = Raw
3 = Grid 3
4 = Grid 4
37
The model codes are concatenations of each combination of the three elements. Thus, there
are twenty-four possible model parametrizations with the first element of the code indicating
the mapping approach, the second one encoding the number of factors and the third one the
type of quantile grid. Thus, for instance K12 represents the PQM model using the Kernel
mapping approach as well as a two-factor time series model with the components of the
second quantile grid as dependent variables. More details on the methodological framework
for PQM can be found in Annex 1.
3.4. Benchmark models
We consider three different kinds of benchmark models. The first one is based on the
assumption that the unknown population indicator is constant over time and hence that the
best predictor of the latter at a given target year 𝑇 is the unweighted average of historical
SILC indicators. Hence, if we denote as �̃�1, … , �̃�𝑇−𝑙 the available SILC indicators at target
year 𝑇 with 𝑙 > 0, the flash estimate of the constant becnhmark model at target year 𝑇 is
defined as
�̂�𝑇 =1
𝑇 − 𝑙∑ �̃�𝑡
𝑇−𝑙
𝑡=1
.
The assumption that the population are constant over time might be inherently wrong for the
location indicators such as the ARPT of the income deciles which might rather show some
trending behavior especially in times of inflation. Note that the constant model can be written
as a linear regression model with zero slope parameter such that
�̃�𝑇 = 𝛼 + 𝜖𝑇
with 𝜖𝑇 representing the sampling noise. Thus, the linear model nests the constant model
while allowing for the presence of a linear trend with respect to the time index and hence the
flash estimate is formulated as
�̂�𝑇 = 𝛼 + 𝛽𝑇 + 𝜖𝑇 .
Thus, to produce the flash estimate at T we only need to obtain estimated versions of the
regression parameters α and β. We do so by running an ordinary least-squares algorithm on
the sample of available EU-SILC indicators.
The third benchmark model is based on the assumption that the best possible estimate of the
population indicator at 𝑇 is simply the last available EU-SILC indicator. According to this,
the flash estimate at 𝑇 takes on the following form
�̂�𝑇 = �̃�𝑇−𝑙.
38
4. Quality framework
The quality of the flash estimates depends on both the methods and sources used. Therefore,
setting and using a quality framework becomes a central part of the production process to
select the methods and auxiliary sources that perform best in reproducing EU-SILC indicators
for previous years. This would also allow to benchmark results stemming from Eurostat
exercise, including the estimates based on EUROMOD done in collaboration with ISER, as
well as other national initiatives. In particular, the United Kingdom and France are also
performing their own flash estimates at national level and therefore a common platform for
validating and benchmarking results is essential.
The quality framework is composed of two parts: (1) the quality assurance, which focuses on
analysing inconsistencies in the input data and includes several intermediate quality checks
along the process; (2) the quality assessment, which focuses on the historical performance of
different methods.
4.1. Quality Assurance
The quality framework is an essential tool for designing the production process of flash
estimates. Therefore, the quality framework doesn't focus only on the final results but
includes also the following steps:
Consistency analysis of auxiliary data sources: Consistency means that income
components and totals as measured by National Accounts, or unemployment as
measured by LFS and related EU-SILC statistics show similar trends. The lack of
such consistency at macro level in the trends showed by EU-SILC and auxiliary
sources would mean that a flash estimate generated by a model that includes the
aforementioned macro data as explanatory variables is not expected to be a good
estimation of the EU-SILC-derived indicator.
Intermediate checks along the stages of production of flash estimate: The main
purpose is to draw conclusions concerning the performance and the impact on results
of different methodological steps. For microsimulation, there are checks after
calibration, uprating and the simulation of benefits and taxes.
4.1.1. Consistency analysis of auxiliary sources
An important element is the use of auxiliary information in order to estimate changes in
income distribution based on the evolution of related indicators such as income components
from National Accounts, labour market changes etc. Therefore, this estimation relies on the
assumption that the trends observed in EU-SILC data and in auxiliary data sources are similar.
A retrospective assessment of consistency was performed on the auxiliary information used in
the estimation process: LFS and National Accounts. In addition, in the case of
39
microsimulation, the accuracy of income components simulated with EUROMOD and
observed in EU-SILC was tested for the base year.
For this, the original EU-SILC indicators and the estimated ones are compared based on two
main metrics:
(1) Accuracy - measuring the degree to which the point estimate is similar to the
reference value based on the mean absolute percentage error (MAPE):
𝑎𝑐𝑐𝑢𝑟𝑎𝑐𝑦 = 1 − 𝑀𝐴𝑃𝐸 = 1 − 𝑎𝑣𝑒𝑟𝑎𝑔𝑒𝑖=1𝑛 (𝑎𝑏𝑠 (
𝐸𝑆𝑇𝑖
𝑅𝐸𝐹𝑖− 1))
(2) Consistency - measuring the extent to which the year-on-year rates of changes are
similar across the time series:
𝑐𝑜𝑛𝑠𝑖𝑠𝑡𝑒𝑛𝑐𝑦 = 1 − 𝑎𝑣𝑒𝑟𝑎𝑔𝑒𝑖=1𝑛 (𝑎𝑏𝑠 (
𝐸𝑆𝑇𝑖
𝐸𝑆𝑇𝑖−1−
𝑅𝐸𝐹𝑖
𝑅𝐸𝐹𝑖−1))
Both measures involve subtraction from 1 in order to have higher values indicating better
performance. The two performance metrics allow conducting an extensive comparative
analysis across countries, years and methods and to summarise a large amount of information.
In addition for categorical variables in order to assess the similarity of two probabilistic
distributions (',VV ), we use the Hellinger distance (HD):
K
i P
Pi
O
OiK
i N
n
N
niVpiVpVVHD
1
2
1
2''
2
1)()(
2
1),(
where:
K is the total number of cells in the contingency table;
nOi is the frequency of cell i in the first data set (e.g. EU-SILC);
nPi is the frequency of cell i in the second data set (e.g. LFS);
N is the total size of the specific sources.
A HD value of 0% indicates a perfect similarity between the two probabilistic distributions,
whereas a HD value of 100% indicates a total discrepancy.24
24
Usually, in statistical analysis a Hellinger distance smaller than 5% is considered acceptable.
40
Labour Force Survey data
In the first stage, the socio-demographic structure of the input data is updated in line with the
labour and demographic information from the LFS data for the target year. As the main
calibration methods are at household level, the assessment is done for household level
variables which are based on basic demographic and especially employment individual
information.
In general, distributions of demographic variables are well-aligned in EU-SILC and LFS,
whereas higher discrepancies are observed for labour variables. Table 4.1 provides a summary
for the main demographic and labour variables at household level by country.
We can observe that in general the HD values are rather low for the main demographic
variables with average HD by country ranging from 0.46 for Italy to 4.64 for Croatia. For
several variables, there are issue related to the number of people 74+ as these are not included
in the LFS sample (SE, DK and EE). The degree of urbanisation for several countries (IT, LV,
EL, LU, MT, FI and SE) had a high HD so they were not used further on in the estimation.
The HD is higher for labour variables at household level especially for Romania but also for
specific groups for Bulgaria, Cyprus, Croatia, Sweden and Denmark.
Table 4.2 provides an overview on the trends in EU-SILC and LFS for main employment
variables at individual level. We can observe that once we rule out source inconsistencies
which are systematic and visible in the levels, the year on year changes are consistent. Two
performance metrics are used for assessing the similarity between EU-SILC and LFS:
accuracy for the levels and consistency for the trends. Therefore, for calibration, an
adjustment is done so that the update takes into account only the changes in LFS from the
base to the target year rather than the actual values.
A similar approach is taken for modelling employment transitions: the total number of
simulated labour market transitions in EUROMOD input data and their direction are
determined by relative changes in employment rates as shown in the LFS. However, as noted
in Rastrigina et al. (2015), employment changes are not always the same in LFS and EU-
SILC. The evolution of employment rates in LFS and EU-SILC for 2011-2013 follows
different trends in Luxembourg and for specific categories in the Czech Republic, Latvia and
France. There are several reasons for these discrepancies, such as differences in definitions,
imputations, survey methodology, as well as operational differences that may affect the nature
of non-response and sampling errors. A detailed discussion on these issues can be found in
Rastrigina et al. (2015).
41
Table 4.1: Average HD (%) for main variables used in the estimation by country: EU-SILC vs. LFS (2011-2014)
Variables excluded from calibration.Region of household: EE, CY, LV, LT, LU, MT: no regional breakdown for NUTS level 2. EL, HR: LFS and EU-SILC are based on different NUTS versions.
HD >= 5.0
DE, UK: (n/a)
BE BG CZ DK EE IE EL ES FR HR IT CY LV LT LU HU MT NL AT PL PT RO SI SK FI SE
Children 0-5 years 1.11 3.46 1.72 2.97 1.82 1.40 2.35 0.83 0.63 2.43 0.69 1.07 1.48 0.72 0.97 0.59 3.27 0.81 0.40 0.31 0.83 2.82 0.46 1.59 0.84 1.28
Children 6-9 years 1.28 3.48 0.56 2.23 1.97 1.94 0.90 0.48 1.20 3.30 0.22 1.89 1.06 0.89 2.30 0.67 2.17 0.59 0.97 0.64 0.59 3.37 1.94 1.09 1.28 1.55
Children 10-14 years 1.74 3.19 1.18 2.14 2.17 0.92 0.67 0.86 1.45 3.31 0.25 2.71 0.50 1.29 2.63 0.45 2.32 0.36 0.58 2.02 0.56 3.22 2.14 1.39 1.21 1.23
Children 15-19 years 0.96 1.66 0.21 2.34 1.02 2.64 1.08 0.69 0.71 2.46 0.39 3.70 1.01 0.54 1.81 1.30 1.98 0.57 1.75 1.78 0.61 2.33 0.71 2.40 0.83 1.15
Persons 20-24 years 0.69 2.85 0.89 2.48 1.03 1.23 1.09 0.33 0.89 2.37 0.43 5.51 1.10 1.13 2.19 1.87 0.90 1.36 0.87 0.34 0.49 1.92 0.96 2.37 1.53 3.55
Females 25-34 years 0.46 4.66 0.78 0.71 0.19 2.65 1.55 0.58 0.53 3.78 0.66 1.52 1.05 0.26 1.29 1.30 0.69 0.66 0.62 2.49 0.88 4.60 0.43 2.68 0.58 0.52
Males 25-34 years 0.62 4.41 2.33 0.69 0.46 2.01 2.18 0.79 0.65 3.60 0.44 2.43 0.61 0.91 0.89 2.05 2.09 0.59 2.15 2.62 0.62 4.48 0.42 2.44 0.81 0.47
Females 35-49 years 1.31 1.72 0.91 2.59 1.01 1.48 1.60 0.56 0.69 9.06 0.50 2.90 0.67 0.44 0.66 2.12 2.10 0.51 1.05 0.74 0.54 2.42 1.81 1.52 0.59 0.26
Males 35-49 years 0.92 2.10 0.80 1.62 2.08 2.03 2.31 0.89 0.37 7.89 0.55 1.36 0.68 0.72 0.69 1.42 1.35 0.54 1.57 1.51 0.62 2.98 1.05 1.15 0.88 0.87
Females 50-64 years 1.31 1.16 0.80 2.73 1.26 1.31 1.35 0.69 0.58 9.31 0.48 1.97 0.42 0.83 0.98 3.42 2.25 0.40 1.12 0.73 0.59 0.98 1.25 1.26 1.06 0.30
Males 50-64 years 0.82 1.67 0.90 1.92 1.73 2.54 1.35 0.42 0.74 7.45 0.48 1.10 0.68 0.88 0.92 3.25 1.93 0.31 1.35 0.34 0.86 0.80 0.67 1.10 0.72 0.18
Persons 65-69 years 0.57 1.36 0.72 2.89 1.12 0.41 2.90 0.42 0.56 3.14 0.37 0.82 0.62 0.61 0.78 1.18 0.40 0.46 0.77 0.36 0.49 1.34 1.04 1.44 0.56 0.68
Persons 70-74 years 3.38 0.35 0.64 1.45 1.43 0.49 1.39 0.53 0.35 2.52 0.38 1.02 0.78 0.57 0.65 0.50 0.98 0.71 0.80 0.74 0.29 1.49 0.74 1.05 0.39 0.79
Person 75 years and older 2.70 0.78 0.27 14.06 12.10 0.86 1.08 0.65 0.59 4.34 0.67 1.36 0.74 1.51 2.60 0.49 1.25 0.84 0.55 0.34 0.55 1.89 1.57 1.86 1.75 18.51
AVERAGE on main
demographic variables 1.28 2.35 0.91 2.92 2.10 1.56 1.56 0.62 0.71 4.64 0.46 2.10 0.81 0.81 1.38 1.47 1.69 0.62 1.04 1.07 0.61 2.47 1.09 1.67 0.93 2.24
Full-time employed persons
(15-74 years) 1.34 3.80 1.70 4.52 3.20 2.64 1.38 0.90 0.70 1.90 1.60 2.47 0.50 2.10 3.27 0.90 1.85 1.68 1.60 0.70 1.90 5.62 2.29 3.21 1.60 5.64
Part-time employed persons
(15-74 years) 0.74 3.61 2.50 7.06 1.90 2.41 3.62 0.90 0.70 1.70 1.10 0.78 0.80 3.00 1.22 2.60 4.99 2.31 1.30 1.00 1.10 0.85 3.01 3.17 0.60 1.77
Self-employed persons
(15-74 years) 3.95 2.07 0.90 1.62 1.20 2.97 2.39 1.50 0.40 6.14 1.30 5.37 2.50 1.60 1.08 1.80 3.38 2.24 0.70 0.90 2.30 2.90 2.35 2.26 0.30 1.85
Unemployed persons
(15-74 years) 0.63 5.09 3.60 1.29 1.90 1.78 2.06 2.70 1.00 3.51 2.80 1.40 1.30 1.20 4.49 1.00 1.53 3.16 1.30 1.20 1.00 7.53 0.71 0.33 1.20 3.59
Retired persons
(15-74 years) 3.19 2.27 1.00 4.42 1.20 0.74 3.14 0.60 1.60 0.77 1.00 1.84 0.70 1.90 1.43 1.60 1.57 1.69 1.60 1.50 2.40 3.82 1.08 1.66 1.10 1.79
AVERAGE on labour
variables 1.97 3.37 1.96 3.78 1.87 2.11 2.52 1.32 0.87 2.80 1.58 2.37 1.14 1.95 2.29 1.59 2.66 2.52 1.28 1.07 1.75 4.14 1.89 2.13 0.95 2.93
Household size 4.32 8.46 1.99 2.39 1.19 1.86 5.69 0.48 1.07 1.26 0.53 2.89 0.95 2.56 1.18 0.51 4.32 1.12 0.46 3.15 0.46 6.67 2.91 5.86 0.34 7.47
Region of household 1.22 1.22 1.08 1.51 0.10 0.25 5.30 0.08 0.74 0.45 0.21 2.55 0.26 4.11 0.00 0.64 0.27 3.47
Degree of urbanisation of
household 2.76 1.57 1.49 0.49 0.43 3.75 10.02 2.39 2.83 5.50 8.41 2.42 11.61 1.11 6.39 4.48 33.97 1.61 0.47 1.05 1.41 3.53 4.13 5.10 7.94 12.60
1.64 2.78 1.24 2.95 1.93 1.74 2.35 0.85 1.05 4.08 1.06 2.22 1.38 1.20 1.83 1.56 3.46 1.09 1.02 1.22 0.90 3.18 1.44 2.06 1.18 3.11
Calibration variables
Variables refer to current
reference period
Variables refer to income
reference period (IRP)
Number of …. in
household
AVERAGE on all variables
42
Table 4.2: Average accuracy and consistency for employment variables: EU-SILC vs. LFS (2011-
2014)
2012
2011
2013
2012
2014
2013AVERAGE > 0.95 AVERAGE > 0.95
BE Employed persons (15-74 years) LFS(y) 0.28 -0.53 0.69 0.990 0.9900 0.961 0.9610
SILC(y+1) 1.80 -0.86 -0.48
Unemployed persons (15-74 years) LFS(y) 0.30 3.36 -1.65 0.902 0.9020 0.877 0.8770
SILC(y+1) -21.92 7.32 -4.86
Retired persons (older than 64 years) LFS(y) 2.28 1.29 1.80 0.991 0.9910 0.921 0.9210
SILC(y+1) 3.80 2.09 1.50
BG Employed persons (15-74 years) LFS(y) -0.03 -0.37 0.37 0.983 0.9830 0.974 0.9740
SILC(y+1) -2.38 1.72 1.11
Unemployed persons (15-74 years) LFS(y) 4.99 -1.72 -4.18 0.972 0.9720 0.841 0.8410
SILC(y+1) 7.49 -5.16 -6.57
Retired persons (older than 64 years) LFS(y) 1.91 -0.06 -0.86 0.979 0.9790 0.981 0.9810
SILC(y+1) -0.27 2.33 0.83
CZ Employed persons (15-74 years) LFS(y) 0.89 1.02 0.63 0.991 0.9910 0.969 0.9690
SILC(y+1) 1.39 -0.20 1.57
Unemployed persons (15-74 years) LFS(y) 3.18 2.47 -9.02 0.990 0.9900 0.729 0.7290
SILC(y+1) 4.96 2.93 -9.76
Retired persons (older than 64 years) LFS(y) 4.46 3.11 3.13 0.994 0.9940 0.960 0.9600
SILC(y+1) 3.22 3.50 2.82
DK Employed persons (15-74 years) LFS(y) -1.10 0.55 1.08 0.974 0.9740 0.981 0.9810
SILC(y+1) -2.56 3.34 4.60
Unemployed persons (15-74 years) LFS(y) -0.65 -8.06 -1.90 0.825 0.8250 0.915 0.9150
SILC(y+1) 32.03 -22.76 3.23
Retired persons (older than 64 years) LFS(y) 8.54 2.72 3.18 0.948 0.9480 0.721 0.7210
SILC(y+1) 2.23 -4.58 1.07
DE Employed persons (15-74 years) LFS(y) 0.75 1.38 1.11 0.983 0.9830 0.969 0.9690
SILC(y+1) -2.03 1.98 .
Unemployed persons (15-74 years) LFS(y) -7.31 -1.83 -4.20 0.971 0.9710 0.712 0.7120
SILC(y+1) -1.81 -2.18 .
Retired persons (older than 64 years) LFS(y) . . .
SILC(y+1) . . .
EE Employed persons (15-74 years) LFS(y) 1.98 0.42 1.25 0.992 0.9920 0.983 0.9830
SILC(y+1) 0.49 0.81 1.65
Unemployed persons (15-74 years) LFS(y) -14.39 -17.06 -15.76 0.996 0.9960 0.901 0.9010
SILC(y+1) -13.63 -17.56 -15.77
Retired persons (older than 64 years) LFS(y) 0.57 1.50 1.16 0.986 0.9860 0.970 0.9700
SILC(y+1) -0.85 3.87 1.64
IE Employed persons (15-74 years) LFS(y) -0.87 1.20 1.90 0.987 0.9870 0.944 0.9440
SILC(y+1) -0.26 3.13 .
Unemployed persons (15-74 years) LFS(y) -0.75 -11.62 -10.69 0.980 0.9800 0.911 0.9110
SILC(y+1) -2.29 -14.17 .
Retired persons (older than 64 years) LFS(y) 7.80 7.35 3.34 0.996 0.9960 0.978 0.9780
SILC(y+1) 7.73 6.58 .
EL Employed persons (15-74 years) LFS(y) -9.59 -5.51 2.05 0.980 0.9800 0.975 0.9750
SILC(y+1) -9.39 -0.54 1.35
Unemployed persons (15-74 years) LFS(y) 32.99 11.61 -2.67 0.903 0.9030 0.913 0.9130
SILC(y+1) 17.64 -0.76 -3.96
Retired persons (older than 64 years) LFS(y) 2.46 1.66 1.26 0.974 0.9740 0.938 0.9380
SILC(y+1) 3.97 -4.37 1.41
ES Employed persons (15-74 years) LFS(y) -4.77 -3.33 0.72 0.988 0.9880 0.967 0.9670
SILC(y+1) -4.67 -2.72 3.45
Unemployed persons (15-74 years) LFS(y) 14.20 3.98 -7.38 0.980 0.9800 0.879 0.8790
SILC(y+1) 9.98 3.80 -5.62
Retired persons (older than 64 years) LFS(y) 2.48 2.74 3.75 0.981 0.9810 0.976 0.9760
SILC(y+1) 1.16 5.66 2.26
FR Employed persons (15-74 years) LFS(y) 0.30 -0.29 1.91 0.981 0.9810 0.983 0.9830
SILC(y+1) -0.33 -3.96 3.33
Unemployed persons (15-74 years) LFS(y) 4.11 3.55 11.32 0.842 0.8420 0.832 0.8320
SILC(y+1) 6.78 21.00 -15.91
Retired persons (older than 64 years) LFS(y) 3.00 1.86 5.37 0.977 0.9770 0.979 0.9790
SILC(y+1) 2.24 4.09 1.41
HR Employed persons (15-74 years) LFS(y) -2.12 -1.65 1.58 0.971 0.9710 0.933 0.9330
SILC(y+1) 1.82 1.43 -0.11
Unemployed persons (15-74 years) LFS(y) 7.75 8.21 -10.75 0.965 0.9650 0.848 0.8480
SILC(y+1) 2.26 4.40 -9.41
Retired persons (older than 64 years) LFS(y) 0.73 1.81 0.42 0.976 0.9760 0.966 0.9660
SILC(y+1) 1.02 0.67 6.22
IT Employed persons (15-74 years) LFS(y) -0.03 -1.34 0.87 0.991 0.9910 0.965 0.9650
SILC(y+1) -0.56 -0.04 .
Unemployed persons (15-74 years) LFS(y) 13.75 13.91 4.84 0.968 0.9680 0.824 0.8240
SILC(y+1) 9.95 11.22 .
Retired persons (older than 64 years) LFS(y) 0.26 2.49 2.25 0.990 0.9900 0.990 0.9900
SILC(y+1) 0.20 4.36 .
CY Employed persons (15-74 years) LFS(y) -1.61 -6.75 -1.62 0.985 0.9850 0.946 0.9460
SILC(y+1) -3.74 -5.88 .
Unemployed persons (15-74 years) LFS(y) 42.81 32.67 2.71 0.935 0.9350 0.908 0.9080
SILC(y+1) 32.79 29.78 .
Retired persons (older than 64 years) LFS(y) 4.01 4.06 0.01 0.947 0.9470 0.920 0.9200
SILC(y+1) 1.82 -4.32 .
CONSISTENCY ACCURACYCountry Variable
(Number of…)
Source Growth rate in %, YoY
43
2012
2011
2013
2012
2014
2013AVERAGE > 0.95 AVERAGE > 0.95
LV Employed persons (15-74 years) LFS(y) 1.40 2.26 -1.04 0.986 0.9860 0.994 0.9940
SILC(y+1) -0.32 4.09 -1.75
Unemployed persons (15-74 years) LFS(y) -7.97 -18.28 -10.13 0.907 0.9070 0.914 0.9140
SILC(y+1) -20.63 -12.92 -20.09
Retired persons (older than 64 years) LFS(y) -0.90 -0.83 1.59 0.997 0.9970 0.990 0.9900
SILC(y+1) -0.80 -0.30 1.82
LT Employed persons (15-74 years) LFS(y) 1.21 0.46 1.70 0.986 0.9860 0.991 0.9910
SILC(y+1) -0.07 2.02 0.40
Unemployed persons (15-74 years) LFS(y) -12.34 -10.29 -9.75 0.978 0.9780 0.951 0.9510
SILC(y+1) -16.45 -11.29 -8.36
Retired persons (older than 64 years) LFS(y) -0.67 0.34 -0.53 0.993 0.9930 0.982 0.9820
SILC(y+1) -0.45 -0.41 -1.58
LU Employed persons (15-74 years) LFS(y) 7.37 1.26 4.94 0.897 0.8970 0.945 0.9450
SILC(y+1) -8.45 6.07 .
Unemployed persons (15-74 years) LFS(y) 12.90 18.15 1.03 0.867 0.8670 0.610 0.6100
SILC(y+1) -3.14 7.53 .
Retired persons (older than 64 years) LFS(y) 2.35 1.37 6.99 0.902 0.9020 0.947 0.9470
SILC(y+1) -6.89 11.64 .
HU Employed persons (15-74 years) LFS(y) 2.65 2.37 5.46 0.985 0.9850 0.990 0.9900
SILC(y+1) 1.42 2.10 8.47
Unemployed persons (15-74 years) LFS(y) 3.68 -4.85 -18.61 0.940 0.9400 0.943 0.9430
SILC(y+1) 1.95 3.74 -26.28
Retired persons (older than 64 years) LFS(y) 3.12 2.81 1.24 0.988 0.9880 0.953 0.9530
SILC(y+1) 3.93 1.68 -0.57
MT Employed persons (15-74 years) LFS(y) 2.48 2.53 2.37 0.990 0.9900 0.984 0.9840
SILC(y+1) 1.91 1.10 3.49
Unemployed persons (15-74 years) LFS(y) 3.90 22.79 6.99 0.874 0.8740 0.852 0.8520
SILC(y+1) 8.38 1.79 -5.23
Retired persons (older than 64 years) LFS(y) 6.51 8.04 5.98 0.968 0.9680 0.954 0.9540
SILC(y+1) 8.65 3.36 8.67
NL Employed persons (15-74 years) LFS(y) 0.13 -1.76 -2.35 0.987 0.9870 0.921 0.9210
SILC(y+1) -0.58 -2.64 0.05
Unemployed persons (15-74 years) LFS(y) 18.76 25.43 1.97 0.873 0.8730 0.708 0.7080
SILC(y+1) 7.78 6.19 9.98
Retired persons (older than 64 years) LFS(y) 4.45 3.78 -28.49 0.888 0.8880 0.918 0.9180
SILC(y+1) 4.00 2.48 3.27
AT Employed persons (15-74 years) LFS(y) 0.95 0.42 -0.01 0.991 0.9910 0.971 0.9710
SILC(y+1) 0.91 -0.07 2.09
Unemployed persons (15-74 years) LFS(y) 3.62 9.71 10.42 0.951 0.9510 0.918 0.9180
SILC(y+1) 3.68 7.15 -1.79
Retired persons (older than 64 years) LFS(y) 2.09 2.41 2.08 0.995 0.9950 0.969 0.9690
SILC(y+1) 3.08 2.80 2.15
PL Employed persons (15-74 years) LFS(y) 0.76 0.38 2.09 0.962 0.9620 0.963 0.9630
SILC(y+1) -3.15 1.60 -4.18
Unemployed persons (15-74 years) LFS(y) 5.75 4.42 -10.03 0.978 0.9780 0.979 0.9790
SILC(y+1) 4.58 3.52 -14.65
Retired persons (older than 64 years) LFS(y) 3.13 4.10 4.15 0.990 0.9900 0.939 0.9390
SILC(y+1) 2.59 3.87 1.87
PT Employed persons (15-74 years) LFS(y) -5.51 -2.73 4.67 0.993 0.9930 0.964 0.9640
SILC(y+1) -5.23 -1.87 3.80
Unemployed persons (15-74 years) LFS(y) 18.23 5.25 -11.80 0.979 0.9790 0.954 0.9540
SILC(y+1) 23.85 4.65 -11.84
Retired persons (older than 64 years) LFS(y) 1.82 1.10 2.83 0.983 0.9830 0.867 0.8670
SILC(y+1) -0.65 2.64 1.84
RO Employed persons (15-74 years) LFS(y) 0.66 0.15 1.83 0.932 0.9320 0.886 0.8860
SILC(y+1) 9.05 1.44 -8.86
Unemployed persons (15-74 years) LFS(y) -1.87 3.47 -0.48 0.865 0.8650 -0.137 -0.1370
SILC(y+1) 10.82 -3.44 -21.54
Retired persons (older than 64 years) LFS(y) -0.25 1.24 2.72 0.961 0.9610 0.929 0.9290
SILC(y+1) -9.03 2.99 3.84
SI Employed persons (15-74 years) LFS(y) 0.95 -2.94 -0.78 0.980 0.9800 0.989 0.9890
SILC(y+1) -1.18 -4.36 1.62
Unemployed persons (15-74 years) LFS(y) 6.06 19.51 0.15 0.925 0.9250 0.906 0.9060
SILC(y+1) 17.70 23.14 -7.03
Retired persons (older than 64 years) LFS(y) 1.94 4.57 1.54 0.972 0.9720 0.954 0.9540
SILC(y+1) 3.63 1.37 5.17
SK Employed persons (15-74 years) LFS(y) 1.41 -0.82 1.63 0.970 0.9700 0.948 0.9480
SILC(y+1) -3.08 3.56 1.46
Unemployed persons (15-74 years) LFS(y) 1.91 5.07 -5.59 0.975 0.9750 0.866 0.8660
SILC(y+1) 3.64 0.83 -7.00
Retired persons (older than 64 years) LFS(y) 1.89 2.86 3.08 0.947 0.9470 0.957 0.9570
SILC(y+1) -2.43 6.86 -4.55
FI Employed persons (15-74 years) LFS(y) -0.13 -0.34 -1.60 0.992 0.9920 0.962 0.9620
SILC(y+1) 1.32 -0.66 -2.12
Unemployed persons (15-74 years) LFS(y) 0.31 10.62 8.90 0.983 0.9830 0.898 0.8980
SILC(y+1) 0.42 11.11 13.45
Retired persons (older than 64 years) LFS(y) 3.85 3.71 3.57 0.998 0.9980 0.996 0.9960
SILC(y+1) 3.95 3.46 3.77
SE Employed persons (15-74 years) LFS(y) -0.16 1.07 0.04 0.993 0.9930 0.926 0.9260
SILC(y+1) -0.11 0.62 1.64
Unemployed persons (15-74 years) LFS(y) 9.77 -3.00 -2.88 0.939 0.9390 0.541 0.5410
SILC(y+1) 10.17 4.75 -12.90
Retired persons (older than 64 years) LFS(y) 3.93 6.07 -0.36 0.972 0.9720 0.557 0.5570
SILC(y+1) 2.13 3.04 3.16
ACCURACYCountry Variable
(Number of…)
Source Growth rate in %, YoY CONSISTENCY
44
National Accounts data
Data from National Accounts and EU-SILC were checked in terms of levels and trends for the
period 2008-2014. The intermediate checks showed that for property and self-employment
income, the two sources are very different. Moreover, these components vary significantly
between the quarterly National Accounts data available for the flash estimates and later
revisions. Therefore, only income from employment, social benefits and taxes from the
National Accounts are used for time series modelling. A parallel project25
is ongoing in
Eurostat on the reconciliation of macro-micro data on income and this can also feed the work
on flash estimates in the future
EUROMOD data
EUROMOD input data are based on cross-sectional EU-SILC data. However, these data
undergo some transformations: e.g. missing values are imputed, children born after the
income reference period are dropped from the data, employment characteristics observed in
the data collection period are adjusted to match the income observed in the income reference
period; in some countries gross incomes are recalculated from net incomes using different
(arguably more precise) algorithms than in the original EU-SILC (e.g. in Greece, Italy and
Croatia).. In the case of Luxembourg the discrepancies between the nowcasted estimates and
the Eurostat indicators in the base year are caused by the fact that households with at least one
international civil servant have been excluded from the EUROMOD input data (645
households), as they have a specific tax-benefit system which is different from the national
one.26
Further discrepancies between EU-SILC and EUROMOD output data for the base year may
arise due to precision of simulations when information in the EU-SILC data is limited, issues
of benefit non take-up or tax evasion, under-reporting of income components in the EU-SILC
data, as well as some differences in income concepts and definitions, coverage and reference
period (e.g. EU-SILC collects taxes paid in the income reference paid whereas EUROMOD
simulates the taxes arising earned in the income reference period).
In table 4.3 the average accuracy for the main income components is calculated by comparing
the EU-SILC 2012 (income 2011) with the EUROMOD output file based on the same data. If
there is no difference between them, the accuracy values are equal to 1.
Overall, the non-simulated income components27
are very similar to their original EU-SILC
values. However, there are some country-specific differences due to imputations and other
data transformations: e.g. for self-employment income in Denmark, Spain and Sweden.;
25
http://www.iariw.org/dresden/gregorini.pdf 26
This limitation is going to be addressed in the next update of the Luxembourg model based on EU-SILC 2015
data. 27
The non-simulated components are employment and self-employment income, inter-household transfers
received/paid, property income, private pension and most public pensions.
45
Larger differences are observed for the simulated income components with accuracy being
below 0.90 for most of the countries for taxes. The low accuracy in Hungary and Slovakia are
explained by relatively large level differences in small values in D10 and D20.
Table 4.3: Average accuracy for the main income components in EUROMOD by country in the base
year (2011)
Country Equivalised
disposable
income
Employment Self-
employment
Taxes Social
benefits
Average
country
BE 0.97 1.00 1.00 0.84 0.98 0.96
BG 0.99 1.00 1.00 0.59 0.91 0.90
CZ 1.00 1.00 1.00 0.73 0.99 0.94
DK 0.98 0.99 -10.47 0.93 0.90 -1.33
DE 0.99 0.99 0.95 0.94 0.90 0.95
EE 1.00 1.00 1.00 0.93 0.97 0.98
IE 0.99 1.00 1.00 0.83 0.91 0.95
EL 1.00 0.99 0.93 0.94 0.95 0.96
ES 0.86 0.86 -0.91 0.82 0.95 0.52
FR 0.99 1.00 1.00 0.73 0.99 0.94
HR 0.99 1.00 0.92 0.65 0.98 0.91
IT 0.99 0.97 0.80 0.96 0.77 0.90
CY 0.99 1.00 1.00 0.96 0.92 0.97
LV 0.98 1.00 0.99 0.64 0.98 0.92
LT 0.98 1.00 1.00 0.75 0.98 0.94
LU 0.98 0.98 0.96 0.98 0.96 0.97
HU 0.96 1.00 0.93 0.02 0.96 0.77
MT 0.99 0.99 0.96 0.77 0.91 0.92
NL 0.99 0.94 0.71 0.87 0.87 0.87
AT 0.99 1.00 1.00 0.89 0.98 0.97
PL 0.99 1.00 1.00 0.88 0.97 0.97
PT 0.98 1.00 1.00 0.82 0.96 0.95
RO 0.99 0.99 0.98 0.69 0.94 0.92
SI 0.98 0.69 0.74 0.80 0.96 0.84
SK 0.98 1.00 1.00 -0.45 0.88 0.68
FI 0.99 1.00 1.00 0.90 0.96 0.97
SE 0.99 1.00 -0.12 0.96 0.97 0.76
Average
income 0.98 0.98 0.42 0.75 0.94 0.81
Note: Average for indicators: D10, D20, D30, D40, MEDIAN, D60, D70, D80, D90, SUM
Base year= 2013 for EL; UK (n/a)
Source: EUROMOD 3.22+
In table 4.4, there are the income components with a lower share in the total disposable
income. As the other non-simulated income components, they are on average very similar to
their original EU-SILC values. The performance of property income and private pensions is
below 0.98 due the impact in the average of the lower accuracy of Spain due to revisions in
the data after the EUROMOD input file was created.
46
Table 4.4: Average accuracy for other income components in EUROMOD in the base year (2011)
Other income components Average income
Inter-household transfers paid 0.98
Inter-household transfers received 0.99
Property income 0.97
Private pensions 0.97 Note: Average for indicators: D10, D20, D30, D40, MEDIAN, D60, D70, D80, D90, SUM and all countries except UK, base
year = 2013 for EL
Source: EUROMOD 3.22+
Both the differences in the income components and revisions in the data lead to an equivalised
disposable income in EUROMOD that differs from the original EU-SILC in the base year. In
order to account for these differences, an alignment factor is calculated for each household.
The factor is equal to the absolute difference between the value of equivalised household
disposable income in EU-SILC 2012 and the EUROMOD estimate for the same period and
income concept. For consistency reasons, the same household specific factor is applied to all
later policy years. This is based on the assumption that the discrepancy between EUROMOD
and EU-SILC estimates remains stable over time.
4.1.2. Intermediate quality checks
In the retrospective quality assessment, intermediate quality checks were implemented
throughout the process. This allows for a more precise identification of the problematic issues
in the past years: e.g. identify discrepancies between specific microsimulated income
components and the observed values in EU-SILC.
Demographic and labour characteristics
For the microsimulation approach an essential element is the update of labour information and
demographics. For categorical demographic or labour market variables we measure the
similarity of distributions using the Hellinger distance (HD). We compare variables from the
base year (2011) after calibration to the target year with figures from the actual EU-SILC for
the corresponding year. This is a measure of performance after the first stage of input data
transformations aimed at assessing how close we are to EU-SILC in terms of labour and
demographic characteristics. The results in table 4.5 are in line with the previous section: the
variables with a higher HD are the ones related to the labour information.
This measure of distributions similarity was implemented for two calibration procedures:
- CAL H: calibration at household level based on the marginal distributions (levels) from
LFS in the target year for: household size, number of people by age group and sex,
number of people part/full-time employed/self-employed/retired, region and urban.
- CAL_H_A: calibration at household level based on the changes in the shares for the
same variables.
47
Table 4.5a: Hellinger distances after not adjusted calibration calculated for calibration variables
Calibration variables BE BG CZ DK EE IE EL ES FR HR IT CY LV LT LU HU MT NL AT PL PT RO SI SK FI SE
Children 0-5 years 1.14 3.00 1.93 3.16 1.81 1.30 2.59 0.48 0.42 1.90 0.75 0.75 1.27 0.64 0.94 0.69 3.32 0.72 0.34 0.33 0.73 3.99 0.42 0.75 0.82 1.17
Children 6-9 years 1.19 3.35 0.55 2.32 1.77 1.75 0.98 0.81 1.00 2.77 0.23 1.82 0.89 0.92 2.08 0.71 2.44 0.46 0.89 0.58 0.62 4.42 1.95 0.98 1.44 1.51
Children 10-14 years 1.57 3.01 1.36 2.14 1.86 0.74 0.49 1.04 1.43 3.43 0.25 3.03 0.44 1.37 2.64 0.39 2.41 0.30 0.46 1.92 0.57 4.31 2.39 0.94 1.28 1.10
Children 15-19 years 0.83 1.57 0.26 2.38 0.76 2.65 1.20 0.42 0.77 1.99 0.36 3.39 0.96 0.49 1.59 1.18 1.64 0.59 1.65 1.60 0.62 2.17 0.94 2.35 0.87 1.13
Persons 20-24 years 0.61 2.27 0.98 2.35 1.34 1.07 0.48 0.20 1.25 2.27 0.33 5.30 1.10 1.46 2.25 1.58 0.64 1.08 0.89 0.41 0.47 2.50 0.94 2.19 1.36 3.31
Females 25-34 years 0.48 4.34 0.91 1.00 0.51 2.77 1.50 0.43 0.74 3.16 0.70 1.43 1.18 0.26 0.89 1.15 0.14 0.77 0.63 2.52 1.00 5.40 0.56 2.81 0.50 0.39
Males 25-34 years 0.67 4.04 2.27 0.82 0.54 2.04 1.71 0.52 0.82 3.73 0.39 2.47 0.60 1.04 0.61 2.01 1.14 0.64 2.19 2.68 0.64 5.15 0.57 2.68 0.68 0.47
Females 35-49 years 1.36 1.54 0.91 2.70 0.90 1.41 1.63 0.18 0.71 8.76 0.46 3.04 0.58 0.02 0.50 2.13 2.05 0.53 1.15 0.99 0.59 2.57 1.96 1.90 0.59 0.37
Males 35-49 years 0.90 1.69 0.70 1.66 1.55 1.97 2.42 0.86 0.34 7.35 0.51 1.17 0.64 0.96 0.52 1.55 1.03 0.54 1.61 1.67 0.64 3.50 1.12 1.38 0.88 0.89
Females 50-64 years 1.47 0.92 0.93 2.69 1.18 1.49 1.61 0.30 0.44 1.18 0.31 1.44 0.46 0.39 1.02 3.53 1.90 0.38 1.03 0.48 0.68 1.21 1.31 1.14 1.24 0.34
Males 50-64 years 0.94 1.64 1.02 2.03 1.34 2.76 1.46 0.41 0.87 6.93 0.41 1.11 0.79 0.35 0.79 3.43 1.68 0.24 1.46 0.42 0.73 0.97 0.86 0.85 0.71 0.06
Persons 65-69 years 0.80 1.41 0.75 3.09 0.88 0.20 3.24 0.33 0.67 2.85 0.38 0.96 0.54 0.62 0.63 1.36 0.23 0.50 0.74 0.44 0.44 1.73 0.93 1.35 0.54 0.69
Persons 70-74 years 3.28 0.27 0.53 1.45 1.33 0.58 1.60 0.71 0.45 2.30 0.33 1.22 1.03 0.58 0.71 0.54 0.79 0.85 0.80 0.72 0.35 2.09 0.47 1.13 0.33 0.75
Person 75 years and older 2.54 0.76 0.33 0.64 0.89 0.53 0.58 4.56 0.76 1.53 0.60 1.46 2.58 0.49 1.20 1.08 0.59 0.35 0.58 2.59 1.74 1.83 1.82
Full-time employed persons
(15-74 years) 1.09 4.39 1.69 5.83 3.47 3.13 2.42 1.05 0.67 2.11 1.19 2.49 0.47 2.31 2.69 1.05 2.27 1.18 1.22 0.96 2.17 5.51 1.55 3.18 2.38 5.49
Part-time employed persons
(15-74 years) 1.11 3.09 1.78 1.78 1.78 3.18 2.33 0.31 0.73 2.40 1.55 0.75 0.79 1.99 2.67 3.05 4.69 1.72 0.88 2.11 0.91 0.43 1.49 4.09 1.33 1.72
Self-employed persons
(15-74 years) 3.86 1.67 0.50 3.00 1.20 1.77 2.11 0.74 1.45 6.16 0.26 5.55 1.45 1.06 0.77 1.88 3.28 0.55 0.35 0.75 2.13 1.59 3.42 2.36 0.25 1.01
Unemployed persons
(15-74 years) 0.31 4.02 3.42 1.04 2.30 1.76 1.13 1.47 1.72 3.83 4.31 1.40 1.63 1.38 4.68 0.57 1.02 2.07 1.71 0.96 1.40 6.55 2.54 1.18 0.71 1.98
Retired persons
(15-74 years) 2.04 2.20 1.03 2.07 0.77 1.29 2.91 1.39 1.89 0.10 0.76 2.20 0.90 2.14 0.81 1.70 1.16 2.68 2.37 0.98 3.04 3.15 1.57 1.72 1.00 2.32
Household size 4.32 8.46 1.99 2.39 1.19 1.99 5.22 0.37 1.07 1.25 0.53 2.89 0.95 2.56 1.18 0.51 4.32 1.12 0.46 3.15 0.46 6.65 3.34 5.86 0.34 7.47
Region of household 1.22 1.22 1.08 1.51 0.03 0.21 5.31 0.08 0.74 0.45 0.21 2.55 0.26 4.08 0.64 0.27 3.47
Degree of urbanisation of
household 2.76 1.57 1.49 0.49 0.43 3.47 2.91 2.83 5.50 2.42 1.11 4.48 1.61 0.47 1.05 1.41 3.55 4.20 5.10
AVERAGE country 2.56 1.20 2.19 1.35 1.73 1.90 0.71 1.19 3.55 3.55 2.21 0.86 1.10 1.53 1.58 1.87 0.91 1.01 1.25 0.93 3.37 1.63 2.11 0.92 1.78
Variables refer to income
reference period (IRP)
Number of …. in
household
Variables refer to current
reference period
48
Table 4.5a: Hellinger distances after adjusted calibration calculated for calibration variables
Variables excluded from calibration.
HD >= 5.0
DE, UK: (n/a). NL: Unemployed perons (15-74 years): LFS based on ILOSTAT.
Calibration variables BE BG CZ DK EE IE EL ES FR HR IT CY LV LT LU HU MT NL AT PL PT RO SI SK FI SE
Children 0-5 years 1.09 2.38 1.36 0.83 1.99 1.68 1.56 0.60 1.07 1.82 0.43 1.07 1.31 0.81 0.58 0.49 5.28 0.95 0.63 0.57 0.55 4.35 0.91 3.90 0.61 0.55
Children 6-9 years 0.47 2.27 0.37 0.69 1.18 1.87 0.70 0.49 0.79 1.60 0.24 0.74 2.31 0.79 1.13 0.42 2.18 0.64 0.53 0.56 0.83 5.30 0.32 1.81 0.81 0.77
Children 10-14 years 1.24 1.36 0.99 0.80 1.30 1.40 0.61 1.23 0.37 0.72 0.22 1.51 0.83 1.00 3.49 0.60 3.53 0.40 0.55 0.49 0.92 3.61 1.01 2.47 0.74 0.80
Children 15-19 years 0.57 1.17 0.27 0.70 0.72 0.73 1.22 1.07 1.02 1.48 0.22 1.01 1.26 0.89 2.40 0.49 2.34 0.53 0.45 0.53 0.48 3.26 0.63 2.37 0.27 0.26
Persons 20-24 years 0.58 2.54 1.04 1.69 1.28 1.77 0.62 0.76 1.30 0.46 0.90 1.33 0.65 1.45 1.84 1.39 0.88 1.10 0.68 0.42 0.16 3.06 0.68 2.07 0.74 1.02
Females 25-34 years 0.55 2.14 0.76 0.84 0.51 1.08 0.79 0.29 0.71 1.35 0.47 1.04 1.03 0.11 1.39 0.81 1.15 0.39 0.62 0.33 0.80 5.85 0.41 1.46 0.81 0.40
Males 25-34 years 1.22 1.91 0.62 0.82 0.29 0.52 0.51 0.29 0.89 1.01 0.19 0.44 0.81 0.53 1.63 0.87 0.52 0.78 1.74 0.25 0.46 5.63 0.64 1.19 0.71 0.12
Females 35-49 years 0.32 1.76 1.07 0.69 0.43 1.02 0.56 0.44 0.66 0.52 0.19 0.79 0.68 0.20 1.09 0.49 0.55 0.37 0.47 0.75 0.45 2.53 0.59 1.29 0.52 0.33
Males 35-49 years 0.55 2.15 0.77 0.73 0.74 0.69 0.64 0.65 0.51 1.10 0.15 1.65 0.69 1.12 0.78 0.71 0.64 0.40 0.93 0.61 0.52 3.90 1.00 0.82 1.14 0.32
Females 50-64 years 0.70 1.30 0.51 0.39 0.19 0.98 0.59 1.54 0.42 0.36 0.54 0.50 0.74 0.60 1.64 1.29 0.82 0.66 0.74 0.67 0.57 0.92 0.50 1.14 0.81 0.55
Males 50-64 years 0.47 1.64 0.59 0.95 0.45 0.64 0.69 0.24 0.72 0.95 0.29 0.71 0.97 0.44 0.58 1.15 0.61 0.56 0.52 0.31 0.78 0.73 0.81 1.31 0.28 0.34
Persons 65-69 years 0.69 1.23 0.26 1.14 1.39 1.17 0.57 0.50 0.56 0.73 0.29 0.73 1.09 0.60 1.41 0.77 0.72 0.49 0.62 0.59 0.94 1.89 1.19 1.03 0.32 0.27
Persons 70-74 years 0.53 0.77 0.81 0.62 0.84 0.84 0.49 0.44 0.59 0.64 0.29 0.99 1.03 0.62 1.16 0.37 0.70 0.59 0.94 0.26 0.34 1.90 0.92 0.84 0.48 0.44
Person 75 years and older 0.91 0.95 0.28 0.98 0.30 0.16 0.34 0.48 0.27 0.92 0.80 0.24 1.01 0.53 0.28 1.06 0.40 0.32 0.78 2.32 0.73 0.64 0.35
Full-time employed persons
(15-74 years) 1.65 2.92 1.05 0.58 1.65 1.71 0.78 1.53 0.58 1.78 0.65 2.14 1.50 1.11 3.34 1.41 1.04 0.86 0.68 0.84 1.99 5.92 1.71 1.47 1.01 0.47
Part-time employed persons
(15-74 years) 0.41 1.70 1.10 1.35 0.65 1.19 1.06 0.96 1.38 0.80 0.30 1.34 0.68 0.87 0.73 2.51 0.63 0.46 0.61 0.62 0.62 0.44 0.99 0.85 1.36 0.48
Self-employed persons
(15-74 years) 0.57 1.76 0.96 0.61 1.09 1.11 1.32 0.76 0.75 0.64 0.43 0.80 1.21 1.12 2.18 2.01 0.55 0.95 0.69 0.74 0.98 2.21 0.64 0.92 0.36 0.53
Unemployed persons
(15-74 years) 0.61 2.11 1.19 0.62 1.51 0.88 0.59 0.33 0.75 1.34 1.79 3.33 1.85 1.29 2.17 0.90 1.98 1.28 0.96 0.73 2.63 1.66 2.28 1.12 1.59 0.59
Retired persons
(15-74 years) 1.04 0.88 0.72 1.23 0.67 1.36 0.72 0.69 0.86 0.71 0.21 1.52 0.36 2.08 1.03 0.65 0.53 2.19 0.64 0.41 1.02 2.55 0.57 0.78 0.31 0.15
Household size 0.43 4.63 1.10 1.00 1.26 1.45 0.91 0.15 0.39 0.76 0.43 1.17 1.14 1.84 1.39 0.61 0.92 0.22 0.44 0.93 0.58 5.22 1.34 1.36 0.26 1.59
Region of household 0.74 1.05 0.42 1.35 0.03 0.13 1.97 0.09 0.40 0.45 0.22 0.60 0.44 3.02 0.00 0.67 0.35 1.49
Degree of urbanisation of
household 0.41 1.52 0.94 0.37 0.41 1.59 0.20 0.86 5.00 0.83 0.27 3.28 1.27 0.76 0.30 0.35 3.67 0.56 2.14
AVERAGE country ` 0.72 1.82 0.78 0.86 0.93 1.12 0.76 0.61 0.79 1.15 0.41 1.17 1.05 0.86 1.55 1.01 1.29 0.75 0.67 0.54 0.78 3.18 0.84 1.44 0.66 0.58
Variables refer to current
reference period
Variables refer to income
reference period (IRP)
Number of …. in
household
49
Tables 4.5a and b show the HD values for the two calibration methods with particular
problems for Romania, Sweden and Croatia. It can be noticed that in general the calibration
at the household level with adjustment improves the consistency between the 'updated' labour
variables and EU-SILC data from the target year.
Table 4.6 provides a quick overview on the performance of the different methods for updating
the demographic and labour characteristics by country for several indicators: average for the
positional indicators (deciles), quintile share ratio (QSR) and the at-risk-of-poverty rate.
Table 4.6: Average consistency scores
Indicator Positional indicators Indicator QSR Indicator AROP
Country cal_H_
S2L
cal_H_
S2L_A
lab_I_
S2L Total
cal_H_
S2L
cal_H_
S2L_A
lab_I_
S2L Total
cal_H_
S2L
cal_H_
S2L_A
lab_I_
S2L Total
BE 0.980 0.977 : 0.979 0.981 0.983 : 0.982 0.970 0.966 : 0.968
BG 0.964 0.961 0.963 0.962 0.954 0.955 0.946 0.953 0.957 0.953 0.965 0.956
CZ 0.990 0.989 0.988 0.989 0.971 0.967 0.970 0.969 0.937 0.912 0.901 0.922
DK 0.983 0.986 0.991 0.985 0.964 0.964 0.960 0.964 0.976 0.982 0.998 0.981
DE 0.974 0.987 0.987 0.981 0.909 0.904 0.906 0.906 0.948 0.969 0.979 0.961
EE 0.967 0.966 0.967 0.967 0.931 0.918 0.919 0.923 0.926 0.936 0.955 0.936
IE 0.976 0.968 0.984 0.973 0.771 0.737 0.973 0.778 0.816 0.832 0.889 0.831
EL 0.989 0.973 0.962 0.973 0.958 0.934 0.817 0.891 0.984 0.982 0.957 0.972
ES : : 0.979 0.979 : : 0.951 0.951 : : 0.956 0.956
FR 0.981 0.988 0.987 0.985 0.978 0.993 0.984 0.985 0.964 0.981 0.980 0.974
HR 0.958 0.976 0.976 0.976 0.965 0.963 0.961 0.963 0.962 0.957 0.950 0.956
IT 0.990 0.987 0.969 0.986 0.986 0.979 0.876 0.971 0.979 0.977 0.948 0.975
CY 0.949 0.963 0.971 0.958 0.947 0.937 0.962 0.944 0.969 0.952 0.961 0.961
LV 0.978 0.978 0.977 0.978 0.972 0.970 0.989 0.973 0.978 0.980 0.977 0.979
LT 0.969 0.973 0.972 0.971 0.894 0.890 0.880 0.891 0.909 0.906 0.887 0.905
LU 0.970 0.974 0.982 0.973 0.913 0.928 0.931 0.921 0.940 0.936 0.951 0.939
HU 0.984 0.984 0.980 0.983 0.975 0.972 0.962 0.971 0.938 0.940 0.918 0.935
MT 0.953 0.968 0.981 0.963 0.954 0.960 0.969 0.958 0.976 0.978 0.993 0.978
NL 0.992 0.993 0.992 0.992 0.975 0.976 0.978 0.976 0.947 0.954 0.971 0.953
AT 0.975 0.976 0.976 0.975 0.984 0.985 0.979 0.984 0.986 0.986 0.988 0.986
PL 0.991 0.993 0.992 0.992 0.960 0.975 0.972 0.968 0.975 0.991 0.986 0.984
PT 0.986 0.986 0.983 0.986 0.978 0.975 0.985 0.978 0.970 0.962 0.965 0.966
RO 0.952 0.965 0.959 0.964 0.912 0.912 0.932 0.916 0.888 0.954 0.954 0.954
SI 0.969 0.973 : 0.971 0.967 0.961 : 0.964 0.929 0.927 : 0.928
SK 0.979 0.967 0.971 0.973 0.923 0.946 0.939 0.935 0.914 0.953 0.949 0.935
FI 0.990 0.989 0.994 0.990 0.983 0.986 0.987 0.985 0.946 0.943 0.947 0.945
SE 0.969 0.983 0.989 0.984 0.924 0.969 0.972 0.969 0.925 0.942 0.947 0.943
Average
all 0.977 0.978 0.978 0.978 0.951 0.949 0.948 0.950 0.949 0.951 0.954 0.951
This can give a first basis for comparison of different methods and their selection for the
production of flash estimates in the future. We can notice that the results are indicator and
country dependent.. The method based on labour market transitions works better in several
50
countries and for more difficult indicators such as the quintile share ratio (QSR) and AROP.
However, there is a certain complementarity as for example calibration methods work better
in Italy and the Czech Republic. In general, adjusting for source inconsistencies in LFS
improves accuracy and it improves the consistency for countries like Romania, Croatia and
Sweden.
Consistency of the income components
Table 4.7 shows the consistency indicators for the same income components comparing the
evolution of the main indicators based on 9 alternative microsimulation-based nowcasting
methods and the actual evolution in EU-SILC.
Table 4.7: Average consistency of methods by indicator and income component (2011-2014)
Income D10 D30 MEDIAN D70 D90 Average
income
Employment 0.89 0.95 0.97 0.97 0.97 0.95
Self-employment 0.73 0.82 0.84 0.88 0.72 0.8
Taxes 0.65 0.77 0.88 0.92 0.94 0.83
Social benefits 0.89 0.89 0.96 0.97 0.97 0.94
Inter-household transfers paid 0.8 0.84 0.87 0.89 0.89 0.86
Inter-household transfers
received 0.79 0.88 0.91 0.89 0.86 0.87
Property income 0.62 0.69 0.72 0.72 0.76 0.7
Private pensions 0.12 0.49 0.71 0.73 0.7 0.55
Average indicator 0.69 0.79 0.86 0.87 0.85 0.81
Note: Average for all methods, countries (except UK, and DK for D10) and years
Source: EUROMOD 3.22+
The consistency is quite high for the median for employment income (0.97) and social
benefits (0.96), but relatively lower for self-employment (0.84) and, particularly, for property
income (0.72). The latter is highly volatile and difficult to predict. The changes in the left tail
of distribution (D10 and D30) are also more challenging to estimate.
Consistency of the uprating factors: model-based vs data-based
Table 4.8 focuses on the consistency in the mean for employment income comparing its
evolution based on the two alternative uprating factors methods and the actual evolution in
EU-SILC.
51
Table 4.8: Consistency in the mean for employment income by methods and by country (2011-2014)
Country MODEL-BASED DATA-BASED
2012 2013 2014 Average
country 2012 2013 2014
Average
country
BE 0.97 0.99 0.98 0.98 0.96 0.98 1.00 0.98
BG 0.94 0.98 0.88 0.93 0.99 0.96 0.94 0.96
CZ 1.00 0.97 0.96 0.98 0.97 0.97 1.00 0.98
DK 0.98 0.97 0.99 0.98 0.98 0.98 1.00 0.98
DE 0.97 0.98 : 0.98 0.99 0.99 : 0.99
EE 0.82 0.82 : 0.82 0.99 1.00 : 0.99
IE 0.91 0.93 : 0.92 0.96 0.98 : 0.97
EL : : 0.80 0.80 0.95 0.96 0.99 0.97
ES : : 0.94 1.00 1.00 0.98
FR 0.98 0.97 : 0.98 0.99 0.99 : 0.99
HR 0.97 0.99 : 0.98 0.94 0.98 : 0.96
IT 0.99 0.96 : 0.97 0.97 0.97 : 0.97
CY 0.97 0.96 : 0.96 0.96 0.97 : 0.97
LV 0.99 0.98 1.00 0.99 0.96 0.97 0.98 0.97
LT 0.98 0.99 1.00 0.99 0.98 1.00 0.96 0.98
LU 0.98 0.99 : 0.98 0.98 0.96 : 0.97
HU 0.86 0.84 0.95 0.89 1.00 0.97 0.96 0.97
MT 0.97 0.96 : 0.97 0.99 0.97 : 0.98
NL 0.99 0.96 0.99 0.98 0.99 0.95 0.99 0.98
AT 0.98 0.95 0.99 0.97 0.99 0.97 0.99 0.98
PL 0.97 0.99 : 0.98 0.98 0.99 : 0.99
PT 0.94 0.99 : 0.97 0.98 0.97 : 0.98
RO 0.96 0.96 0.89 0.93 1.00 0.99 0.95 0.98
SI 0.87 0.99 : 0.93 0.87 0.99 : 0.93
SK 0.94 0.90 : 0.92 0.91 0.92 : 0.92
FI 0.96 0.97 0.99 0.97 0.99 1.00 1.00 1.00
SE 0.96 0.98 : 0.97 0.99 0.99 : 0.99
Note: Average for the mean
UK (n/a)
Source: EUROMOD 3.22+
The consistency is generally better for the data-based uprating factors, except for Croatia,
Lithuania, Luxembourg and Latvia. For the period analysed, the model-based uprating factors
do not perform well for Greece (0.80), Estonia (0.82), and Hungary (0.89).
Table 4.9 shows the consistency in the mean for self-employment income and also compares
the two alternative uprating factors methods.
The consistency for self-employment income is generally worse for the model-based uprating
factors, in particular, for France (0.16), Denmark (0.47) and Hungary (0.56). Overall, the
52
data-based uprating factors for self-employment income have a lower average performance
than the factors for employment income due its volatility.
Table 4.9: Consistency in the mean for self-employment income by methods and by country (2011-
2014)
Country MODEL-BASED DATA-BASED
2012 2013 2014 Average
country 2012 2013 2014
Average
country
BE 0.97 0.93 0.99 0.96 0.96 0.99 0.94 0.97
BG 0.67 0.66 0.85 0.73 0.61 0.72 0.90 0.74
CZ 0.94 0.79 0.76 0.83 0.97 0.96 0.99 0.97
DK 0.14 0.74 0.52 0.47 0.24 0.68 0.87 0.60
DE 0.92 0.90 : 0.91 0.90 0.93 : 0.91
EE 0.96 0.96 : 0.96 0.96 0.97 : 0.97
IE 0.84 0.84 : 0.84 0.97 0.90 : 0.93
EL : : 0.70 0.70 0.97 0.97 0.98 0.98
ES : : : : 0.99 0.98 0.98 :
FR 0.16 : : : 0.96 0.95 : 0.96
HR 0.75 0.87 : 0.81 0.91 0.92 : 0.92
IT 0.95 0.96 : 0.95 0.93 0.97 : 0.95
CY 0.84 0.88 : 0.86 0.88 0.94 : 0.91
LV 0.87 0.93 0.81 0.87 0.89 0.99 0.99 0.96
LT 0.57 0.99 0.98 0.85 0.53 0.98 0.99 0.83
LU 0.85 0.79 : 0.82 0.87 0.77 : 0.82
HU 0.67 0.72 0.30 0.56 0.92 0.96 0.32 0.74
MT 0.91 0.97 : 0.94 0.97 0.99 : 0.98
NL 0.90 0.91 0.99 0.93 0.98 0.99 0.98 0.98
AT 0.91 0.90 0.95 0.92 0.87 0.94 0.98 0.93
PL 0.94 0.96 : 0.95 0.99 0.95 : 0.97
PT 0.92 0.96 : 0.94 0.95 0.99 : 0.97
RO 0.98 0.97 0.87 0.94 0.86 0.97 0.83 0.89
SI 0.86 0.80 : 0.83 0.84 0.84 : 0.84
SK 0.93 0.92 : 0.92 0.97 0.96 : 0.96
FI 0.99 0.98 0.97 0.98 0.96 0.99 0.96 0.97
SE 0.92 0.92 : 0.92 0.83 0.95 : 0.89
Note: Average for the mean
UK (n/a)
Source: EUROMOD 3.22+
53
4.2. Quality assessment
As part of the assessment procedure, we follow two main stages. We:
1) Perform a retrospective analysis of the performance of flash estimates for 2012 and
2014 income years. In particular, we look at:
(a) The two performance metrics for comparing point estimates and year-on-year
changes for the key income indicators (i.e. accuracy and consistency);
(b) An additional quality measure that takes into account the uncertainty related to the
sampling variance.
2) Define a strategy for providing a flash estimate for the target year based on selected
methods. Three different options are envisaged:
(a) Best method by country;
(b) Best method by indicator that takes into account the specific traits of groups of
indicators.
(c) An aggregate of the best methods selected based on their historical performance
and the convergence of estimates in the target year.
These options are simulated for 2014 flash estimates for the countries for which EU-SILC
data is already available. Section c) provides the methodological background and the
empirical results for the assessment done in order to decide the overall strategy.
4.2.1. Retrospective performance analysis
The objective of performance analysis is to evaluate the capacity of a given model framework
to produce adequate flash estimates of some income indicator of interest. The definition of
what adequate estimation means and the nature of the performance criterion used in the
performance analysis depends upon the underlying objective of the estimation procedure:
prediction vs explanation.
That being said we have to draw a clear distinction between the explanatory power and the
predictive power of a given model framework.
− If the objective of the estimation procedure is to detect and explain certain phenomena
within the data (e.g. correlations between a dependent variable and a set of
explanatory variables, as it is the case in classic multiple regression analysis), we
prefer simpler models over more complex models which might have a better
goodness-of-fit with respect to the underlying data but lack any kind of clear message.
− On the other hand, if the exclusive goal is prediction, the model choice should be
based on the quality of prediction with the explanatory power of the selected model
being of minor importance.
54
We are in the second case as our main objective is to provide an early estimate of the final
data. Therefore, our focus is on performance measures that assess the predictive power of the
model rather than the explanatory one. In practice, even though the primary goal of the
analysis might be prediction, the explanatory power of the chosen model cannot be entirely
neglected and might still be considered as relevant. For example, if we want to provide
explanations for changes in the estimated indicators by linking the changes to macroeconomic
phenomena or specific policy decisions. This is possible with the microsimulation approaches,
but not with PQM.
Methodological topic 1: Out-of-sample performance analysis
Out-of-sample measures are selected over in-sample measures such as the goodness-of-fit to
evaluate the prediction power of the models. The out of sample measures such as accuracy,
consistency and the uncertainty-based performance metric are used to assess the historical
performance by comparing flash estimates and observed values.
The first alternative for evaluating the prediction power would be to simply measure the
goodness-of-fit of the predicted values to the data points. However, this simple approach to
evaluating the prediction performance gives a natural preference to complex models having
large numbers of parameters over more parsimonious models due to the well-known problem
of overfitting. In general, overfitting occurs when a model has a large number of parameters
relative to the number of observations which lead the model to describe random error or noise
instead of the underlying signal component which is the object of interest of any statistical
modelling scheme. One way of getting rid of this inherent bias towards over-complex models
is to add a penalty term onto the goodness-of-fit measure which penalizes each of the models
according to their respective number of parameters. Thus, by introducing this type of penalty
terms if two models have similar goodness-of-fit the simpler model is preferred over the more
complex one. Note that the penalized goodness-of-fit metric can be interpreted as a practical
implementation of the law of parsimony which states that complexity should not be posited
without necessity. Some of the most widely-used penalized goodness-of-fit criteria are
Mallow's Cp or the Akaike information criterion.
Another way of preventing overfitting is to use out-of-sample performance analysis instead of
in-sample analysis. As far as in-sample prediction is concerned the same data is used for
model estimation (training) and for the validation of the prediction results. In contrast to this,
out-of-sample experiments consist in splitting the available data into two separate sets
commonly referred to as the training set and the validation set. To illustrate the out-of-sample
paradigm let us assume that we have 𝑡 EU-SILC datasets at our disposal indexed as 𝑡 =
1, … , 𝑇. For each of these EU-SILC datasets we can compute the income indicator of interest
which leads to a sequence of 𝑇 EU-SILC indicators denoted as �̃�𝑡 with 𝑡 = 1, … , 𝑇. We now
split this sequence of indicators into two sets with the training set containing the indicators �̃�𝑗
up to �̃�𝑙 such that 1 ≤ 𝑗 ≤ 𝑙 < 𝑡 and the validation set consisting of one single observation
that is �̃�𝑡. The out-of-sample experiment then consists in producing the flash estimate for the
target year �̂�𝑡 and to use the validation set for the evaluation of the performance of �̂�𝑡. Of
55
course we can repeat this procedure by running multiple out-of-sample experiments with 𝑡
taking on values on the grid 𝑇, 𝑇 − 1, 𝑇 − 2, …
Note that whereas the length of the validation set is always equal to one, the length of the
training set may differ with the target year and the nowcasting approach to be analysed. As far
as the group of micro-simulation models with data-based uprating factors is concerned, the
training set only consists of one single time point namely 𝑙 = 𝑡 − 1. On the contrary for the
PQM settings and the simulation models with model-based differential growth rates, the
training set is not fixed in advance while being strictly larger than two. Thus, for this kind of
models the sample size of the training is a free parameter. The guiding principle when
defining the starting point of the training set is that in order to produce meaningful flash
estimates for the target year the statistical properties of the EU-SILC indicators and the
explanatory variables comprised in the training set should be similar to the ones in the
validation set. This similarity in the statistical properties can either be derived in advance or
derived on the basis of an optimization procedure. In the former approach we might rely on
economic reasoning allowing picking those time periods which were characterized by the
macroeconomic and political circumstances comparable to the ones observed before and in
the target year. The optimization approach allows picking the optimal starting point, among a
set of candidate starting points, leading to the flash estimate characterized by the best
performance at the target year. However, given the high-dimensional nature of the PQM
models on the one hand and the simulation models using differential uprating factors on the
other hand, we fix the starting point to be equal to the starting point of the sample of available
EU-SILC datasets. Hence, for the out-of-sample nowcast for 2012 we use time points 2008 to
2011 for the model training. For the 2013 nowcast the training set consists of time points
2008 to 2012 and so on. Thus, the number of time point in the training varies along with the
value of the target year in the validation set.
(a) Accuracy and Consistency
As presented in the quality assurance section, these are first two metrics we have used for
assessing the quality of our estimates.
− Accuracy is the absolute percentage difference between the estimated and the
observed values of the indicator, for a given year; it tells how close the estimate is
to the observed value.
− Consistency is the absolute difference between the estimated and the observed
YoY % change; it tells whether the estimate got the magnitude of the change right.
They are expressed as percentage, the higher the value, the higher the desirability of the
situation. One limitation is that there is no clear threshold that discriminates between good
and not-so-good estimates. Therefore, they are not suitable as measures of absolute
performance. Another limitation is that neither accuracy nor consistency are measures that
take into account the uncertainty.
56
They are therefore used as measures of relative performance, to compare methods and to
discard those that do not work. In our quality framework, a value below 90% for either
consistency or accuracy is a signal for us to review the input data and the estimation
algorithm, in order to identify possible sources of error and to improve the results; the method
is discarded only if no improvement is achieved. Table 4.10 shows the average consistency by
method. This highlights two main messages: the performance is country dependent and
microsimulation seems to perform better for positional indicators (deciles).
57
Table 4.10: Consistency by method
58
(b) Uncertainty based performance measure
This section introduces an additional performance measure, being complementary to accuracy
and consistency. We will refer to this performance measure as the uncertainty-based
performance measure and it has two main advantages in comparison with accuracy and
consistency: it takes into account the sampling noise of the EU-SILC estimate when
evaluating the performance of the flash estimate and it provides also an absolute performance
measure as we can derive a threshold for the low and high performance based on a statistical
test.
Hence, in contrast to the previous performance metrics, the uncertainty based metric
establishes a link between the flash estimation output and concepts from probability theory
and statistical inference. The performance metric is based on the concept of distributional
equivalence. In general, a flash estimate at a given target year is said to have good
performance if its probability distribution is identical to the sampling distribution of the
corresponding EU-SILC estimate. The null hypothesis of identical distributions means that
the difference of the flash estimate and the EU-SILC estimate follows a normal distribution
with zero mean and standard deviation of √2 𝜎𝑡 or more formal as:
𝐻0: �̂�𝑡 − �̃�𝑡 ∼ 𝑁(0, √2𝜎𝑡).
Now if we denote as 𝐹𝜎𝑡(𝑥) the cumulative distribution function (CDF) of the 𝑁(0, √2𝜎𝑡)
distribution, the performance metric is given by
𝑫𝒕 = 𝟒 ⋅ 𝑭𝝈𝒕(�̂�𝒕 − �̃�𝒕) ⋅ (𝟏 − 𝑭𝝈𝒕
(�̂�𝒕 − �̃�𝒕)).
𝐷𝑡 is defined on the interval [0,1]. If the observed values of the flash estimate and the EU-
SILC estimate are identical, e.g. if �̂�𝑡 − �̃�𝑡 = 0, we have that 𝐹𝜎𝑡(�̂�𝑡 − �̃�𝑡) = 0.5 and hence
that 𝐷𝑡 = 1. Moreover, as �̂�𝑡 − �̃�𝑡 tends to +∞ or −∞ we have that 𝐹𝜎𝑡(�̂�𝑡 − �̃�𝑡) tends to 1
or 0 respectively implying that 𝐷𝑡 → 0.
The distribution of 𝐷𝑡 can be derived numerically using Monte Carlo simulations. The
availability of the distribution under 𝐻0 allows to compute the p-value associated with the
observed value of 𝐷𝑡 which then allows for probabilistic conclusions regarding the absolute
model performance. Having a measure of absolute performance reveals to be crucial hest
ranking flash estimate which outperforms all other available flash estimates could have a low
absolute performance. The threshold for discriminating between well-performing and low-
performing models can be defined in relation to the p-value of 𝐷𝑡 and be set equal to the
standard confidence levels of statistical test. We have chosen 0.1 as a confidence level. .
Hence, a p-value larger than 0.1 implies that we do not reject the null hypothesis that EU-
SILC and the flash estimates distributions are the same for methods with p-values higher than
0.1. Therefore all methods having a p-value lower than the significance level of 0.1 are
considered to be of low-performance, due either to the presence of some type of bias or to
instability issues reflecting in a large standard deviation, which in consequence leads to their
elimination.
59
The rest of the section contains the theoretical underpinnings of the uncertainty based
performance measure with the main methodological issues and as a last point the empirical
results of the performance analysis. All the methods went through the common assessment
framework and were scored mainly based on their historical performance in the test period
(2012-201428
).
Methodological topic 2: Distributional equivalence
The performance measure relies on the idea of distributional equivalence: a flash estimate at a
given target year is said to have good performance if its probability distribution is identical to
the sampling distribution of the corresponding EU-SILC estimate. This is translated in terms
of the first two moments of the sampling distribution based on the normality assumption.
In general, a flash estimate at a given target year is said to have good performance if its
probability distribution is identical to the sampling distribution of the corresponding EU-SILC
estimate. In order to illustrate why the distributional equivalence indicates a good
performance of the flash estimate, we can reason in terms of the first and the second statistical
moments, e.g. the mean and the variance or the standard deviation. First note that if the
probability distribution of the flash estimate is equal to the sampling distribution all their
statistical moments are identical. Thus, if the EU-SILC indicator is an unbiased estimate of
the unknown population indicator the equality of the two probability distributions implies that
the flash estimate is an unbiased estimate of the population parameter as well. Moreover, as
for any variable of interest in a statistical model framework we can consider the EU-SILC
indicator to consist of two components, a signal component and a noise component, with the
former carrying the information of interest and the latter representing a random error
reflecting sample noise which cannot and should not be captured by the modelling process.
The equivalence of both, the first and the second moment, imply that the flash estimate
captures the entirety of the signal component and that the only source of variation within the
flash estimate is due to the presence of sampling noise in the EU-SILC indicator.
We have conducted a Shapiro-Wilk testing scheme to verify the normality of the sampling
distribution of the EU-SILC indicators. The results of the test clearly indicate that the
normality assumption is a valid one and hence that the quality assessment of the flash
estimation results can be built around the first two moments of the sampling distribution.
Methodological topic 3: Formalization of the performance metric
The performance metric takes the form of a test statistic of a distributional testing framework.
The test statistic offers a measure of relative performance. The p-value of the test statistic
gives a measure of absolute performance.
28
2014 income data (EU-SILC 2015) not available for DE, IE, FR, IT, CY, LU, MT, PL, SI, SK, SE
60
To formalize the performance metric and the associated framework for testing the
convergence of the flash estimate to the EU-SILC estimate we first have to introduce a few
notations which allows for a more schematic representation of the different objects. From this
point on we denote as �̂�𝑡 the flash estimate of a given income indicator at target year 𝑡 and as
�̃�𝑡 the corresponding EU-SILC indicators. Moreover, we denote as 𝜇𝑡 the unknown
population parameter and as 𝜎𝑡 the bootstrap-based estimate of the sample standard deviation
which, as described above, can be considered as a consistent estimate of the unknown sample
standard deviation in the case of using unweighted indicators for the replicated samples.
Given these notations and provided that the EU-SILC indicator is an unbiased estimate of the
population indicator, and hence that 𝜇𝑡 = 𝐸(�̃�𝑡), the null hypothesis of equality of
distribution of the EU-SILC estimate and the flash estimate can be formalized as
𝐻0: 𝐸(�̂�𝑡) = 𝜇𝑇 𝑎𝑛𝑑 𝑆𝐷(�̃�𝑡) = 𝜎𝑡.
In response to the drawbacks of the more classical performance metrics such as the chi-
squared measure, we have developed an alternative metric which has the advantage of being
strictly bounded between bounded between zero and one and reveals to be less sensitive to the
presence of outliers than the chi-squared measure.
To explain the underlying idea of the performance metric we are first going to rewrite the null
hypothesis of identical distributions as the assumption that the difference of the flash estimate
and the EU-SILC estimate follows a normal distribution with zero mean and standard
deviation of √2 𝜎𝑡 or more formal as
𝐻0: �̂�𝑡 − �̃�𝑡 ∼ 𝑁(0, √2𝜎𝑡).
Now if we denote as 𝐹𝜎𝑡(𝑥) the cumulative distribution function of the 𝑁(0, √2𝜎𝑡)
distribution, the performance metric is given by
𝐷𝑡 = 4 ⋅ 𝐹𝜎𝑡(�̂�𝑡 − �̃�𝑡) ⋅ (1 − 𝐹𝜎𝑡
(�̂�𝑡 − �̃�𝑡)).
As mentioned above, 𝐷𝑡 is defined on the interval [0,1]. If the observed values of the flash
estimate and the EU-SILC estimate are identical, e.g. if �̂�𝑡 − �̃�𝑡 = 0, we have that 𝐹𝜎𝑡(�̂�𝑡 −
�̃�𝑡) = 0.5 and hence that 𝐷𝑡 = 1. Moreover, as �̂�𝑡 − �̃�𝑡 tends to +∞ or −∞ we have that
𝐹𝜎𝑡(�̂�𝑡 − �̃�𝑡) tends to 1 or 0 respectively implying that 𝐷𝑡 → 0.
Hence, flash estimates producing larger values of 𝐷𝑡 have higher performance than flash
estimates producing lower values. In this sense, as for accuracy and consistency, the
uncertainty-based performance measure allows establishing a ranking of the flash estimates
based on their relative performance in terms of 𝐷𝑡 . However, as for accuracy and consistency,
these comparative studies of the performances delivered by different models does not allow
drawing any conclusion on their absolute performance. Note that indeed even the highest
ranking flash estimate which outperforms all other available flash estimates could have a low
absolute performance.
61
However, given that the performance metric is part of a statistical testing framework a
measure of absolute performance can be obtained from the p-values associated with the test
statistic𝐷𝑡. The p-value of the test statistics represents the probability to observe a value that
is equal or larger than the available test statistics. Thus, a low p-value indicates that the
available test statistics represents a rare event under the null hypothesis. Hence, what the p-
value offers in contrast to a mere analysis of the test statistics is the interpretation of the latter
as an outcome of a random experiment. For the sake of example assume that the p-value of 𝐷𝑡
was equal to 0.1. This implies that if we drew one hundred realizations from the distribution
of 𝐷𝑡 under the null hypothesis only ten of these one hundred realizations would be equal or
smaller than the observed value of 𝐷𝑡. Thus, the observing a value of 𝐷𝑡 or smaller can be
considered to be a rare event under the null hypothesis which occurs with rather low
frequency. Hence, given the observations of a statistically improbable value of 𝐷𝑡 we conclude
that the performance metric is unlikely to have been produced by a flash estimate which
fulfils 𝐻0 and hence that it has absolute low performance.
Obtaining the distribution of 𝐷𝑡 under 𝐻0 is straightforward in the sense that under the null
distribution 𝐹𝜎𝑡(�̂�𝑡 − �̃�𝑡) follows a uniform distribution on the interval [0,1]. Hence, the null
distribution of 𝐷𝑡 can be obtained numerically by simply generating observations from a 𝑈[0,1]
distribution and plugging the observations into the formula of the performance metric. Figure
4.1 plots the density function of the distribution under the null hypothesis. As far as the first
and second moment of the null distribution are concerned we have that 𝐸(𝐷𝑡) =2
3 and that
𝑆𝐷(𝐷𝑡) = 0.29..
Figure 4.1: Density function of the test statistics under the null hypothesis (single-period)
Note however that the discriminatory power of the distribution under the null is quite weak.
This implies that the capacity of the testing framework for rejecting incorrect null hypotheses
and hence for detecting low-performing flash estimates is not very strong.
62
To increase the discriminatory power of the testing framework, we have to measure the
performance of the flash estimates at more than one time point. There are basically three
elements that we need to adapt for accommodating multi-period performance analysis
settings: (1) the formulation of the null hypothesis, (2) the form of the test statistics and (3)
the distribution under the null hypothesis. This will be used for our analysis.
As in the one-period case, we denote as �̂�𝑡 the flash estimate at period t. However, we now
assume that there are T flash estimates at our disposal such that the set of available estimates
is given as (�̂�1, … , �̂�𝑇). If we denote as (�̃�1, … , �̃�𝑇) the set of the corresponding EU-SILC
indicators on that same time span we can write the null hypothesis as the assumption that the
probability distribution of the flash estimate is identical to the sampling distribution of EU-
SILC at each of the time points located within the period of analysis and hence as
𝐻0: �̂�𝑡 − �̃�𝑡 ∼ 𝑁(0, √2𝜎𝑡) 𝑓𝑜𝑟 𝑎𝑙𝑙 𝑡 = 1, … , 𝑇.
The test statistics for the multi-period testing framework is defined as the average of the test
statistics of the single-period frameworks over the period of analysis, which we denote as
𝐷𝑇 =1
𝑇∑ 𝐷𝑡
𝑇
𝑡=1
.
The distribution of 𝐷𝑇 depends on the length of the period of analysis and hence is dependent
upon the value of 𝑇. As far as the first two moments are concerned and given the assumption
of independent summands in 𝐷𝑇 we obtain that 𝐸(𝐷𝑇) =2
3 and 𝑆𝐷(𝐷𝑇) =
𝑆𝐷(𝐷𝑡)
√𝑇. Moreover,
given the central limit theorem for independent and identically distributed random variables
we can conclude that 𝐷𝑇 converges to a normal distribution as 𝑇 → +∞. Thus, the use of the
numerical approximation of the null distribution on the basis of Monte Carlo simulations gets
less relevant with increasing values of 𝑇 and hence for 𝑇 > 20 the normal distribution can be
used for the computation of the p-values. Figures 4.2 and 4.3 illustrate the density function
and the cdf function of the distribution under the null for different values of 𝑇. Note that given
the vanishing standard deviation the discriminatory power of the null distribution and hence
the power of the test increases along with 𝑇. On top of that, as 𝑇 → +∞ the power of the test
tends to 1. Thus, as 𝑇 increases the probability of not detecting low-performing flash
estimates decreases and asymptotically this probability tends to zero resulting in a consistent
decision rule.
63
Figure 4.2: Density function of the test statistics under the null hypothesis
Figure 4.3: CDF function of the test statistics under the null hypothesis
64
Using changes instead of levels
If the flash estimates of income indicators are characterized by a systematic error (bias), the
estimates of the first-order differences might very well be unbiased. The generalization of the
performance metric to estimates of absolute and relative first-order differences is
straightforward.
In some cases, the assumption of converging flash estimates might be rejected simply because
the concerned model framework commits a systematic error at each time point. One form of
systematic error is the presence of the same type of bias at each time point, whether it is a
multiplicative bias or an additive one. In the case of an additive bias the probability
distribution of the flash estimate is equal to the sampling distribution of the EU-SILC
indicator at each time point except for an identical shift in the location. We can denote this
more formally as a difference in expected values as 𝐸(�̂�𝑡) = 𝐸(�̃�𝑡) + 𝑏 for 𝑡 = 1, … , 𝑇 with
b being some rational number. In case of a multiplicative shift we would a have a discrepancy
in the expected values as 𝐸(�̂�𝑡) = 𝑏 ⋅ 𝐸(�̃�𝑡).
Note that if a systematic bias is present, an unbiased estimate can be defined on the basis of
simple transformation of sequence of flash estimates and by shifting the target of the
nowcasting procedure accordingly. Indeed, in the case of an additive bias the first order
difference of sequence of flash estimates, that is �̂�𝑡 − �̂�𝑡−1, is an unbiased sequence for
�̃�𝑡 − �̃�𝑡−1 given that 𝐸(�̂�𝑡 − �̂�𝑡−1) = 𝐸(�̃�𝑡 − �̃�𝑡−1). In the case of a multiplicative bias a
sequence of unbiased estimates can be constructed on the basis of the relative first-order
difference, e.g. (�̂�𝑡 − �̂�𝑡−1)/�̂�_(𝑡 − 1). All the concepts described throughout this section can
be easily applied when working with the relative or the absolute first-order difference instead
of the level of the indicators.
Note that the standard deviation of the changes under the null hypothesis is not identical to the
standard deviation of the levels. However, the former can be derived from the latter with the
standard deviation of the absolute differences being given by √𝜎𝑡−12 + 𝜎𝑡
2 and the one of the
relative change being well approximated as √𝜎𝑡−1
2 +𝜎𝑡2
�̃�𝑡−1.
Empirical results
As define above, the uncertainty-based performance measure 𝐷𝑡 is defined on the interval
[0,1]. If the observed values of the flash estimate and the EU-SILC estimate are identical we
have 𝐷𝑡 = 1. Moreover, as �̂�𝑡 − �̃�𝑡 tends to +∞ or −∞ we have 𝐷𝑡 → 0.
All the methods went through the common assessment framework and were scored mainly
based on their historical performance in the test period (2012-201429
). Hence, different
methods can be ranked based on their relative performance in terms of 𝐷𝑡 . Moreover, the
29
2014 income data (EU-SILC 2015) not available for DE, IE, FR, IT, CY, LU, MT, PL, SI, SK, SE
65
value added of the uncertainty based measure is that we can define an absolute performance
threshold in order to select out the low-performers on a sound statistical basis. This is
important as even the highest ranking flash estimate which outperforms all other available
flash estimates could have a low absolute performance.
We take the p-value of 𝐷𝑡 equal to 0.1 similar to most statistical tests. This implies that we
cannot reject the null hypothesis that EU-SILC and the flash estimates distributions are the
same for methods with p-values higher than 0.1. Therefore all methods having a p-value
lower than the significance level of 0.1 are considered to be of low-performance, due either to
the presence of some type of bias or to instability issues reflecting in a large standard
deviation, which in consequence leads to their elimination.
After low-performers are discarded we make an analysis of the average performance by
method. Ideally, we should have one method that outperforms the other in order to have a
common harmonised approach that fits all situations. However, as illustrated by the analysis
of accuracy and consistency, also in this case we have different patterns conditional on the
specific country and indicator.
Table 4.11: Average performance by main approach for all indicators
In general the average performance for all indicators among the family of micro-simulation
models is significantly higher than the corresponding share of PQM models with 16 countries.
However PQM has a higher average for AT,.IT, HU, BE, EL, LU, BG, DE, RO and EE.
These patterns are very different according to the indicators. For both model families the
number of well-performing models is at its largest for the AROP indicator. As far as the
positional parameters (ARPT and the deciles) are concerned, in general the PQM family have
a relatively lower performance. For example for ARPT for eight countries only
microsimulation passes the threshold of 0.10. However, in five countries PQM is better for
ARPT.
12.1
4.4 4.56.8 7.3 8.8
2.4
7.8 6.910.2
0.1
5.3 5.7
0.2
8.1
2.4
15.3
20.0
11.7
6.3
21.3
5.4
0.8 1.24.0
6.3
55.1
68.0
0
10
20
30
40
50
60
70
80
90
100
0%
10%
20%
30%
40%
50%
60%
70%
80%
90%
100%
NL PL DK ES FI LV FR SE SK PT CZ MT CY IE SI LT AT IT HU BE EL LU BG DE RO EE HR UK
Microsimulation PQM Absolute difference between microsimulation and PQM in percentage points
Group 3: MS or
PQM only
p.p.
Group 2: Performance PQM > MSGroup 1: Performance MS > PQM
66
Table 4.12: Average performance by main approach-ARPT
Table 4.13: Average performance by main approach: AROP
Therefore, we can conclude that even if microsimulation outperforms PQM in several
countries there is still a strong complementarity between the two main approaches when we
look across countries and indicators. This raised the issue of method selection which will be
discussed in the next chapter.
4.2.2. Method selection
As described earlier, we have at our disposal several model frameworks inspired by different
modelling paradigms. On the one hand there are simulation-based frameworks which operate
at the level of individual units by simulating the effects of tax policies on each individual
income. On the other hand we have the PQM approach based around dynamic factor models
and relying heavily on econometric and statistical estimation techniques. Moreover for each
of the two flash estimation methods there exist different subgroups of models which, in case
of the micro-simulation models for instance can be distinguished with respect to the
calibration algorithm used or the existence of a single uprating coefficient of the income
components across the distribution or multiple uprating coefficient for different socio-
26.9
17.5
30.4
2.9 2.8
10.7
28.8
21.1
4.16.1
14.311.7
0.3
8.7 10.1
60.759.1 58.9
53.8 53.7 52.6
45.8
4.0
57.2
48.2 46.9 46.4
41.0
0
10
20
30
40
50
60
70
80
90
100
0%
10%
20%
30%
40%
50%
60%
70%
80%
90%
100%
PL SE NL DE FI DK IT PT CZ MT IE FR LU EE LV EL RO HU LT AT BG BE HR SK SI UK CY ES
Microsimulation PQM Absolute difference between microsimulation and PQM in percentage points
Group 3: MS or PQM only
p.p.
Group 2: Performance PQM > MSGroup 1: Performance MS > PQM
8.54.5 4.7
15.2
26.4
8.5
19.921.9
10.4
3.8
9.0
0.8
23.6
11.0
2.9
13.216.7
62.5
56.453.1
49.4 48.4 47.8
37.0
60.158.2
53.0
40.0
0
10
20
30
40
50
60
70
80
90
100
0%
10%
20%
30%
40%
50%
60%
70%
80%
90%
100%
DK AT PL NL LV BE FR MT CY DE SI CZ IT LU SK HU BG RO FI EE HR PT LT IE UK EL SE ES
Microsimulation PQM Absolute difference between microsimulation and PQM in percentage points
Group 3: MS or PQM only
p.p.
Group 2:
Performance PQM > MS
Group 1: Performance MS > PQM
67
demographic groups. As far as the PQM model is concerned the sub-models differ in terms of
number of factors, mapping technique used to build the distribution of disposable income
from the estimated quantiles or type of quantiles used as dependent variables within the
dynamic factor model.
The assessment of the performance of the different methods highlighted also a strong
complementarity between approaches, namely microsimulation and PQM. This raised the
issue of method selection and different strategies were tested within the quality assessment
framework:
(a) A best method by country and indicator that takes into account the specific traits of
groups of indicators. The results showed that there is a kind of polarisation of
methods according to their performance by two groups of indicators: inequality
indicators (AROP and QSR) and positional indicators. In general microsimulation
outperforms PQM for positional indicators.
(b) A best method by country that takes into account national specificities;
microsimulation tends to outperform PQM in this case (19 countries) as it and to
provide in general more stable results overall for the indicators. However, PQM
gives in general better results for AROP and in several countries it provides better
results (e,g, SI, CY).
(c) An aggregate of the best methods selected based on their historical performance
and the convergence of estimates in the target year. This last option proved to be
the more robust and it can address the issue of the high complementarity of
methods.
(a) Choosing the best-performing model by country and indicator
One possible model selection procedure would be to pick the best-performing model among
all the candidate models and this by country and indicator. If we denote as 𝐷𝑇𝑘 the multi-
period (out-of-sample) performance metric (see Section 4.2.1) of the k-th modelling approach
for some given country and indicator with the model index ranging from 1 to 𝐾 and hence an
overall of 𝐾 distinct model frameworks at our disposal, we can express this simple model
selection procedure under the form of an optimization setting in which the selected model is
defined as
𝑘∗ = max𝑘(𝐷𝑇𝑘).
Thus, the chosen flash estimate for the target year 𝑇 + 1 would be the one produced by the
𝑘∗-th model framework that is �̂�𝑇+1(𝑘∗)
. The principle which underlies this model selection
procedure is that given the candidate models at our disposal, picking the model with the
highest historical performance should also provide the best-performing flash estimate at the
target year. However, there are some flaws to this reasoning which could lead to an
underperformance of the chosen flash estimate at the target year.
To begin with, recall for any of the countries the respective training set only contains income
data from 2008 onwards and hence provided that the 2014 data is already available for the
68
2015 nowcasting exercise we have a maximum number of seven data points that can be used
in the out-of-sample experiments. If the objective is to nowcast the levels of the indicators this
allows to generate four out-of-sample flash estimates for 2011, 2012, 2013 and 2014. In the
presence of additive or multiplicative bias and hence a shift of the objective towards the
estimation of the absolute or relative year-on-year change that number is reduced to only
three. Recall from Section 4.2.1 that the power of the statistical testing framework to
distinguish between the null hypothesis and its alternative strongly depends on the number of
out-of-sample observations at our disposal. That being said, given that for each country and
indicator we only have three observations at our disposal, the power of the performance
metric might be quite small which could ultimately result in a performance analysis which
does not reflect the actual performance of the different approaches and hence to a selection of
under-performing models.
Secondly, given that the choice of the model is done by country and indicator there is a
possibility that we end up with different models for each indicator of one specific country.
However, given that the flash estimates of each of the approaches are generated independently
from each other, selecting different models for producing the nowcasts of the set of indicators
of one given country does not take into account the inherent link that exists between the
indicators which ultimately could result in inconsistent outcomes. By inconsistent outcome
we understand a situation in which the produced set of flash estimates for a given country
cannot originate from one single income distribution. However, if no regularity conditions
whatsoever are imposed upon the underlying income distribution, these situations of
inconsistencies might actually never be observed. Thus, we might reformulate the definition
of inconsistent indicator set as being a collection of indicators which cannot be mapped to a
distribution stemming from a given family of probability distributions. A widely used
distribution for modelling income data is the generalized beta distribution of the second kind
(GB2 distribution) in which case a set of nowcasted indicators would be considered to be
inconsistent if there does not exist a vector of GB2 distributional parameters and hence no
GB2 distribution which can be mapped to the set of nowcasted indicators.
Thirdly, in Section 4.2.1 we have seen that the performance analysis is built around a
statistical testing framework whose null hypothesis states that the flash estimate is an
unbiased estimate of the population indicator and that its standard deviation is identical to the
sample standard deviation of the corresponding EU-SILC estimate (hypothesis of equal
distributions). The absolute performance metric, that is the p-value of the testing framework,
can be seen as an indication of the likelihood that the null hypothesis is correct. The more out-
of-sample observations available, the more precise this indication is. Simply picking the
model framework with the highest p-value among all candidate models can lead to sub-
optimal decisions leading to elimination of model frameworks which fulfil the assumption of
the null hypothesis. The smaller the number of out-of-sample observations the more frequent
the occurrence of these sub-optimal decisions.
69
(b) Choosing the best-performing model by country
To strengthen the discriminatory power of the performance metric, we have to increase the
number of out-of-sample observations used for the computation of the performance metric.
Indeed, note that for the model selection procedure consisting in choosing one model
framework by country and indicator, the performance metric is defined as
𝐷𝑇𝑘 =
1
𝑇∑ 𝐷𝑡
𝑘
𝑇
𝑡=1
with 𝑡 ranging from 2012 to 2013 or 2014 depending whether the EU-SILC dataset of 2015 is
already available for that country. However, instead of choosing one method by both, country
and indicator, we can also select one single model framework for each country and use the
latter to the flash estimates of the entire set of indicators. In this case the optimization setting
takes on a different form with a new objective function as
𝑘∗ = max𝑘
1
𝑅 ⋅ 𝑇∑ ∑ 𝐷𝑡,𝑟
𝑘
𝑇
𝑡=1
𝑅
𝑟=1
where 𝐷𝑡,𝑟𝑘 represents the country-specific performance metric of the k-th modelling approach
with at target year 𝑡 and with respect to indicator 𝑟 with 𝑟 = 1, … , 𝑅. Hence, in contrast to the
county and indicator specific optimization scheme, the aggregate performance metric used in
the country-specific setting does not only sum the performance metrics across time points but
across time and indicators. This results in a higher discriminatory power of the performance
metric which should result in an increased performance of the derived flash estimates.
It can be seen that the increased robustness of the country-wise optimization settings leads to
an increase out-of-sample performance of the flash estimates and hence the use of country-
specific objective functions should be given preference over the use of objective functions by
country and indicator.
(c) Aggregating the well-performing estimates into a single estimate
An alternative approach to model selection is to bundle the values of different flash estimates
into a one single aggregate estimate. In contrast to the single-choice approaches, consisting in
selecting the best-performing model framework, model aggregation is not based around
selection but it rather aims at combining the information provided by the entire set of well-
performing estimates.
The use of country-specific objective functions solves two of the issues related to the use of
country-and-indicator-specific objective functions. To start with, by picking one single model
framework for each country the power of the performance metric is increased considerably.
Moreover, the resulting set of flash estimates does not suffer from any inconsistency issues as
70
described above. However, given that in the country-specific optimization scheme we still
only select the highest-performing model for producing the flash estimate at the target year,
we might still miss some relevant information offered by some of the remaining models. In
this part we are going to describe an algorithm which aims at combining the values of the
flash estimates produced by different model frameworks such as to end up with one single
aggregate flash estimate for the target year. This idea of aggregating the information provided
by different estimates is presented as an alternative to picking the best estimate is known in
the literature as model averaging. For more details see for example Claeskens and Hjort
(2008) or Hjort and Claeskens (2003).
The algorithm takes the form of an iterative optimization procedure consisting of three main
steps.
Step 1: Remove low-performing estimates
The first step consists in discarding the low-performing model frameworks that is those with
very low historical performance. Given that the absolute performance metric has an
interpretation as a p-value the filter step takes the form of a statistical testing procedure in the
sense that a model framework having a historical performance lower than a pre-defined cut-
off-values (confidence level) are removed from the initial set of models. We have chosen a
confidence level of 0.1 and hence all the methods having a p-value lower than 0.1 are
considered to be of low-performance, due either to the presence of some type of bias or to
instability issues reflecting in a large standard deviation, which in consequence leads to their
elimination.
Step 2: Compute aggregate estimate from the set of remaining candidate models
The second step consists in aggregating those flash estimates that have passed the filtering
step into one single value. Recall from Section 4.1 that the sampling distribution of the EU-
SILC estimate at the target year is considered to be a normal distribution with mean 𝜇𝑇 and
standard deviation 𝜎𝑇. The objective of the aggregation step is to find the most likely value of
the parameter set (𝜇𝑇 , 𝜎𝑡) and hence the most likely form for the underlying normal
distribution from which the set of flash estimates could have originated from. Put differently,
we consider all the flash estimates to be observations from the same normal distribution and
our objective is to find the parametrization of this distributions. However, recall at this point
that the testing framework is based around the null hypothesis that the flash estimates follow a
probability distribution that is identical to the sample distribution of the EU-SILC estimates.
Thus, we do not only want the flash estimate to be observations of any normal distribution but
of the sampling distribution of the EU-SILC estimate at the target year. This information is
crucial since it allows us to reduce the degree of freedom of the optimization setting. Indeed,
given that the underlying distribution of the flash estimates is given by the sampling
distribution of EU-SILC this implies that the standard deviation 𝜎𝑡 should be equal to the
standard deviation of the EU-SILC estimate. As illustrated in Section 4.1, a consistent
estimate of the sample standard deviation can be found using bootstrap procedures. Given that
at the target there is no corresponding EU-SILC dataset which could be used for the
bootstrapping procedure we set 𝜎𝑇 equal to the standard deviation of the last available EU-
71
SILC indicator. Given this, the aggregate estimate, denoted as 𝜇𝑇∗ can be defined as the
solution to a weighted maximum-likelihood estimation setting as
𝜇𝑇∗ = max𝜇𝑇
∑ 𝐷𝑇−1𝑘 𝐹(�̂�𝑇
𝑘 , 𝜇𝑇 , 𝜎𝑇) ⋅ (1 − 𝐹(�̂�𝑇𝑘 , 𝜇𝑇 , 𝜎𝑇))
𝐾+𝑘=1
where �̂�𝑇𝑘 denotes the flash estimate of the k-th model framework among all the 𝐾+candidate
models that have passed the filtering step. 𝐷𝑇−1𝑘 is the test statistics reflecting the historical
performance of the 𝑘-th candidate model. Hence, the better the historical performance of a
given candidate model the stronger the impact of the latter on the outcome of the optimization
frameworks and hence on the value of the aggregate estimate.
The aggregate estimate obtained at the second step represents the most likely value of the
mean parameter of the sampling distribution conditional on the set of available and remaining
flash estimates after application of the filter.
Step 3: Evaluate performance of aggregate estimate
However, one question which is left unanswered is whether 𝜇𝑇∗ is an overall good estimate in
addition to being the most likely one. At the third step of the aggregation step we aim at
providing an answer to this question. In order to do so we first construct a standardized
version of the objective function used in the maximum-likelihood framework, that is
𝑓(𝜇𝑇∗ ) =
4
𝐾+ ∑ 𝐷𝑇−1
𝑘 ⋅ 𝐹(�̂�𝑇𝑘 , 𝜇𝑇
∗ , 𝜎𝑇) ⋅ (1 − 𝐹(�̂�𝑇𝑘, 𝜇𝑇
∗ , 𝜎𝑇))
𝐾+
𝑘=1
.
Note that the form above is only a weighted version of the test statistics defined in Section
4.1. This implies that the distribution of 𝑓(𝜇𝑇∗ ) under the null hypothesis that all the flash
estimates are observation of the same probability distribution can be obtained numerically on
the basis of Monte Carlo simulations. From the distribution under the null we can derive the
p-value associated with 𝑓(𝜇𝑇∗ ) which provides an indication of the absolute performance of
𝜇𝑇∗ .
As for the historical performance, we conclude that if the p-value is larger than 0.1 the overall
performance of the aggregate estimate at the target year is acceptable in which case 𝜇𝑇∗
constitutes the final output of the aggregation algorithm. On the contrary, if the p-value is
smaller than 0.1 we go back to the second step and try to find a different value of the
aggregate estimate using a reduced set of candidate flash estimates. In order to construct this
reduced set of estimates we proceed according to a leave-one-out scheme by discarding the
candidate flash estimate which has the lowest performance, both historically and at the target
year, that is
𝑘− = min𝑘
𝐷𝑇−1𝑘 ⋅ 𝐹(�̂�𝑇
𝑘, 𝜇𝑇∗ , 𝜎𝑇) ⋅ (1 − 𝐹(�̂�𝑇
𝑘, 𝜇𝑇∗ , 𝜎𝑇)).
Iterations
Step 2 and Step 3 are iterated as long as the p-value associated with 𝑓(𝜇𝑇∗ ) is smaller than 0.1.
72
Comparative studies and conclusions: FE 2014
Recall that we have three different model selection algorithms at our disposal. Hence, using
each of the model selection algorithms, we can compute three different flash estimates for
each combination of country and income at a given target year. In order to decide which
among these three estimates should be picked, we need to have a criterion at our disposal
reflecting the performances of each of the model selection algorithms.
To obtain this measure of performance we conduct out-of-sample experiments in which we
compute the flash estimates of twenty-three countries and eight indicators (AROP, QSR,
ARPT, D10, D30, D50, D70 and D90) using the tree model selection procedures described
above. Based on the indicator reference values at income year 2014 we then evaluate the
performance of the flash estimates by computing the p-value of the uncertainty-based
performance evaluation approach as described in Section 4.2.1. At the time of the analysis
there were a total of twenty-three countries for which income data 2014 was available. Thus
for each of the three model selection procedures we obtain 184 flash estimates in total, one for
each combination of country and indicator.
Table 4.14 reports the mean performance of the three model selection approaches by country
and across indicators. The first important conclusion that can be drawn from the results shown
in Table 4.14 is that in most cases the performance of the best-by-country estimate is higher
than the best-by-country-and-indicator estimate. This is a direct consequence of the increased
discriminatory power of the indicator-unspecific objective function resulting in high degrees
of robustness of the best-by-country estimate and hence to a better out-of-sample performance
compared to the indicator-specific objective function. The same conclusion holds when
comparing the performances of the aggregate estimate and the best-by-country-and-indicator
estimate given that in twenty out of the twenty-three countries the performance of the
aggregate estimate is higher.
However, the information provided by these pairwise performance comparisons across
countries is misleading in a lot of cases. Take for instance the cases of Austria and Bulgaria.
For both countries the aggregate estimate outperforms the best-by-country estimate. However,
the magnitudes of the performances of the estimates are very different in both cases, 0.79
respectively 0.73 in the case of Austria and 0.33 respectively 0.06 for Bulgaria. Hence, even
though both countries would lead to identical conclusions in terms of relative performance
evaluation, namely that the aggregate estimate ranks highest for both countries, they lead to
different conclusions in terms of absolute performance evaluation.
73
Table 4.14: Average performance across indicators of 2014 flash estimates
Country Aggregation By country By country and
indicator
BE 0.119 0.149 0.085
BG 0.335 0.063 0.101
CZ 0.218 0.055 0.034
DK 0.154 0.033 0.067
EE 0.349 0.149 0.001
EL 0.036 0.001 0.000
ES 0.008 0.034 0.000
FR 0.481 0.930 0.606
HR 0.000 0.000 0.000
LV 0.894 0.243 0.010
LT 0.000 0.000 0.000
HU 0.830 0.014 0.666
MT 0.190 0.132 0.249
NL 0.922 0.885 0.651
AT 0.785 0.739 0.003
PL 0.751 0.804 0.433
PT 0.006 0.044 0.001
RO 0.884 0.000 0.000
SI 0.000 0.000 0.000
SK 0.000 0.118 0.000
FI 0.525 0.830 0.850
SE 0.009 0.021 0.006
UK 0.548 0.067 0.003
Indeed, as far as Austria is concerned both estimates have a high absolute performance. For
instance, both estimates have performances beyond the classical thresholds used in statistical
testing settings, that is 0.01, 0.05, and 0.1. On the other hand, in the case of Bulgaria the best-
by-country estimate has a performance below 0.1 whereas the performance of the aggregate
estimate is significantly larger than 0.1. Thus, in the case of Austria both estimates have good
absolute performance and should be treated alike. On the contrary, as far as Bulgaria is
concerned, there should be a clear discrimination between both estimates given the
significantly higher performance of the aggregate estimate.
For the results shown in Table 4.14 we fix confidence levels ranging from 1 to 0.05 and report
the proportion of flash estimates by country having an average performance across indicators
smaller than the respective confidence level.
74
Table 4.14: Proportion of flash estimates below confidence levels
Confidence Level Aggregation By country By country and
indicator
1 1 1 1
0.95 1 1 1
0.9 0.957 0.957 1
0.85 0.870 0.913 1
0.8 0.826 0.826 0.957
0.75 0.739 0.826 0.957
0.7 0.739 0.783 0.957
0.65 0.739 0.783 0.870
0.6 0.739 0.783 0.826
0.55 0.739 0.783 0.826
0.5 0.652 0.783 0.826
0.45 0.609 0.783 0.826
0.4 0.609 0.783 0.783
0.35 0.609 0.783 0.783
0.3 0.522 0.783 0.783
0.25 0.522 0.783 0.783
0.2 0.478 0.739 0.739
0.15 0.391 0.739 0.739
0.1 0.348 0.565 0.696
0.05 0.348 0.435 0.609
Hence, there are 35% of the aggregate estimates having absolute performance lower than 0.1.
However, at the same time 57% of the best-by-country estimates and 70% of the best-by-
country-and-indicator estimate are characterized by a performance lower than 0.1. In general,
for most confidence levels the proportion of aggregate estimates having lower performance
than the confidence levels is significantly smaller than the proportions reported for the other
two model selection methods. Hence, the out-of-sample experiments based on the 2014
income data provide strong evidence for an outperformance of the aggregation approach over
the other two model selection methods.
Figures 4.4 and 4.5 show the results of the flash estimation procedure for AROP and QSR at
target year 2014 in the case of the aggregation-based model selection procedure. The x-axis (
y-axis) contains the absolute year-on-year change of the EU-SILC estimate (flash estimate)
from 2013 to 2014. The dot-shaped points represent the different flash estimates with the ones
highlighted in blue corresponding to the aggregate estimate. The red dots indicate those flash
estimates which have a p-value lower than or equal to 0.1 and hence are eliminated at the at
the filtering step. The orange dots represent those flash estimates whose historical
performance is considered to be good enough to pass the filtering step. However, they are
eliminated while constructing the aggregate estimate. Finally, the green dots stand for the
estimates which pass both, the filtering step and the aggregation step. The corresponding year-
on-year changes of the EU-SILC indicator are represented as through the cross-shaped dots.
75
Figure 4.4: AROP 2014: YoY change EU-SILC versus YoY estimated
Figure 4. 5: QSR 2014: YoY change EU-SILC versus YoY estimated
76
4.2.3. Ex-ante quality assessment for flash estimates in the target year
In order to evaluate the performance of the flash estimates at target year, we have considered
two different quality criteria:
a) Historical performance of the aggregate;
b) Prediction interval that considers both the sampling error and the uncertainty of the model;
The quality criteria provide an indication of the reliability of the information provided by a
given flash estimate 2015.
(a) Historical performance of the aggregate
The historical performance of the aggregate flash estimate, denoted as 𝐻+, takes the form of
the mean performance of its 𝐾+ constituent estimates and hence
𝐻+ =1
𝐾+ ∑ 𝐷𝑇−1
𝑘 .
𝐾+
𝑘=1
Under the assumption that the historical performance of the aggregate estimate provides an
adequate approximation of the performance of the latter at the target year we have that a low
value of 𝐻+corresponds to a low reliability of flash estimate. Hence, in practice we say that
flash estimates having a value of 𝐻+ lower than 0.40 (8%) are of low reliability and hence are
discarded. Moreover, we made a qualitative review for cases where the quality was between
(0.40-0.70). This included several checks: the cross-validation with the results of the best
method by country, further analysis of the estimates between the two main approaches and
individual values composing the aggregate, inconsistencies between indicators due to
selection of different methods. These qualitative checks will need to be formalized in the
future and included in the aggregation framework. In general the aggregation proved to be
more robust but for further work it would be important to ensure (a) a better consistency of
the results across indicators using for example the same combination of methods for all
indicators for a particular country; (b) improve the convergence of the candidate estimates
that enter the aggregate; (c) take into account the dependence within specific groups of
methods in the filtering process.
(b) Standard deviation of the estimate
Note that the historical performance only provides an indication of the average performance
of the participating estimates. Thus, two aggregate estimates with identical average
performance of its constituent estimates but with different values of 𝐾+, that is different
numbers of participating estimates are treated the same. In order to discriminate with respect
to 𝐾+we need to make a conceptual shift by moving from the mere point estimation to interval
estimation.
77
In contrast to point estimates, interval estimates account for the uncertainty of the data
estimation process, resulting in an estimate of both, the mean value and the standard deviation
of the target variable. In order to do this, we first need an estimate of the standard deviation of
the aggregate estimate.
We obtain the latter using a bootstrap approach. Thus,, we draw 𝐾+ independent observations
from a 𝑁(𝜇𝑇∗ , 𝜎𝑇) distribution and plug them into the objective function of Step 2 of the
aggregation algorithm. By maximizing the objective function conditional on the generated set
of observations we obtain a value 𝜇𝑇,𝑚∗ of the aggregate estimate. If we repeat this exercise M
times we get a set of values 𝜇𝑇,𝑚∗ with 𝑚 = 1, … , 𝑀.
Under the assumption that the EU-SILC estimate is an unbiased estimate of the population
indicator 𝜇𝑡, we have that the EU-SILC indicator is distributed according to 𝑁(𝜇𝑇 , 𝜎𝑇). Given
that the flash estimate is an estimate of the population parameter with standard deviation 𝜅𝑇
and under the assumption of independence of the EU-SILC estimate and the flash estimate,
the new estimated distribution can be written as
𝑁 (𝜇𝑇∗ , √𝜅𝑇
2 + 𝜎𝑇2)
where 𝜎𝑇 is the standard deviation derived from the last available EU-SILC dataset.
Based on the estimated distribution of the SILC indicator we can derive a prediction interval
for the flash estimates 2015 which takes into account both the sampling error of EU-SILC
and the uncertainty of the model.
The size of the prediction interval provides an indication of the quality of the aggregate
estimate. Indeed, if there is a large number of estimates participating in the construction of the
aggregate estimate with more or less uniform participation the model-based standard
deviation (𝜅𝑇) will be relatively small indicating an increased robustness of the aggregate
estimate. On the other hand, if only a small number of individual estimates contribute to the
aggregate estimate 𝜅𝑇 will be large reflecting an unstable behaviour of the aggregate estimate.
The same is true if the contribution of one or very few estimates is large compared to the
other participating estimates resulting in a non-uniform weights vector.
4.2.4. Conclusions
The first important conclusion that can be drawn from this analysis is that both, the aggregate
estimate and the estimate derived on the basis of a country-specific objective function largely
outperform the estimate consisting in choosing the best historical estimate by country and
indicator. Moreover, for nearly all countries the aggregate estimate has a better performance
than the country-specific estimate.
Note that the main difference between both estimates is that the best-by-country estimate only
uses the information provided by one of the available candidate models whereas the aggregate
estimate aims at combining the information of the well-performing candidate estimates.
78
leading to a more robust model selection procedure. Moreover, in the aggregation algorithm
the convergent/divergent behaviour of the candidate models within the target year constitutes
an integral part of the performance assessment setting. On the contrary, the best-by-country
estimate does not take into account the relative performance of the candidate models at the
target year and hence is exclusively based on the use of historical performance. However, the
strength of the best-by-country estimate is that the model selection is done at the country-level
which implies that for a given country the same model is used to produce the flash estimates
for all the indicators assuring consistency among the latter. Note that using different methods
for distinct indicators by no means implies the occurrence of an inconsistent set of indicators.
However, using the same model for all the indicators estimates is a sufficient condition for
consistency.
We can thus conclude that aggregating the information provided by several well-performing
methods allows creating more robust flash estimates compared to the other two selection
procedures. However there are two main issues to be addressed:
In contrast to the best-by-country estimate, the aggregation algorithm is not
constrained to the use of one sole model at the country level and hence the set of
participating models might very well vary across indicators. Hence, the aggregation
algorithm does not assure consistency in advance.
The optimisation algorithm of aggregation doesn't take into account the hierarchical
structure of the methods at this point so we can have issues to balance the participation
of the two approaches.
79
5. Flash estimates 2015
5.1. Production of FE 2015
Following the comparative analysis done for 2014 indicating an outperformance of both, the
best-by-country and the best by-country-and-indicator model selection method, by the
aggregation method, the flash estimates 2015 are constructed based on the aggregation
algorithm as follows: (1) discard the low-performing methods with low historical performance
(described in Section 4.2); (2) aggregate those flash estimates that have passed the filtering
step into one single value; (3) discard candidate flash estimates that diverge in the target year.
In practice, we compute three different types of aggregate estimates using the iterative
optimization procedure described above. In each of the three cases the subset of individual
flash estimates and hence the input of the aggregation algorithm is different. Hence, for the
computation of the first aggregate estimate we exclusively take into account those estimates
that were built using micro-simulation methods. As far as the second aggregate estimate is
concerned we only consider individual estimates stemming from the PQM family. Finally, a
third version of the algorithm considers the estimates provided by both model families. Thus,
the set of candidate models considered for the computation of the first two aggregate
estimates are nested within the set of candidate models of the third estimate.
Figures 5.1 to 5.3 illustrate the participation of the different methods in the construction of the
aggregate estimate at target year 2015 in the case where the set of candidate models consists
of both, micro-simulation models and pqm models. The colours provide an indication of the
relative importance of each of the models. To start with, the white cells indicate that a given
model framework has not been available for a specific country. A cell highlighted in red
means that this model is characterized by a historical performance lower than the threshold
value of 0.1 and implying an elimination of that model at the filtering step. The models that
pass the filtering step are either highlighted in green or orange. The orange cells indicate that
this model has been discarded during the iterative aggregation scheme. Hence, only the
models marked in green have had an impact on the construction of the final aggregate
estimate. The darker the shade of green of a given cell, the higher the influence of that model
framework.
We can notice that for most countries the share of micro-simulation models passing the
filtering step for AROP and ARPT is considerably higher than the corresponding share of
PQM models. A similar conclusion can be drawn in terms of the income deciles (D10, D30,
MEDIAN, D70, D90). A slightly different behaviour can be observed for the QSR indicator
for which the performances of both approaches seems to be more balanced. Thus, we can say
at this point that the micro-simulation models tend to outperform the PQM models across
indicators and countries.
Besides the figures clearly show that within both model families the number of models
passing the filtering step is at its largest for the AROP indicator. As far as the ARPT is
concerned, the figure shows that all the members of the PQM family have relatively low
80
performance and that consequently for several countries they do not play any role or any
important role in the construction of the aggregate estimate.
Figure 5.1: Contribution of each of the flash estimates to the aggregate estimate of the 2015 AROP
indicator
Figure 5.2: Contribution of each of the flash estimates to the aggregate estimate of the 2015 ARPT
indicator
81
Figure 5.3: Contribution of each of the flash estimates to the aggregate estimate of the 2015 QSR
indicator
The next step was to perform an ex-ante analysis of quality in order to decide between the
different sets of indicators as well as identifying cases in which indicators are still considered
unreliable. This assessment is mainly done on the historical performance and the convergence
in the target year taking into account uncertainty, including also more qualitative information.
If the aggregate estimate built from micro-simulation models and the PQM-based estimate
have a similar historical performance but provide contradictory or at least discrepant results at
the target year, their joint information is considered to be unreliable. If one approach has a
considerably higher historical performance compared to the other, we report the information
provided by the high-performing estimate. Finally, if both estimates have good historical
performance, without one of them considerably out-performing the other one, we report the
information provided by the combined estimate considering both model families. In the future
this kind of hierarchically-structured decision process can be directly introduced in the
aggregation algorithm
Apart the main income inequality indicators (AROP and QSR) we produced the flash
estimates 2015 also for the so-called positional indicators, namely deciles and the ARPT.
For the two main inequality indicators the aggregation clearly outperforms the other two
selection procedures (the best by country and best by country and indicator). Methods,
mainly coming from microsimulation, which have a relative higher performance on positional
indicators perform less well for AROP or QSR. As mentioned above the participation number
in the aggregate is high for inequality indicators (AROP and QSR) but relatively low for
location parameters (ARPT and the deciles of the income distribution). However, the
predictive power of the aggregate estimate increases with the participation number. Hence, if
the participation number is small the aggregation algorithm cannot fully deploy its potential
and hence sees its performance reduced to the performance of the by-country-and-indicator
estimate which, as mentioned above, is underperforming when compared to the by-country
82
estimate. This has led to the decision to use the aggregation algorithm for inequality indicators
only and to resort the by-country estimate for location parameters.
This leads to sub-optimal solutions in which for the estimation of a specific decile we rely on
a single method which is the best method by country and indicator. Therefore, for this set of
indicators, we decided to opt for the best by country method in order to ensure a better
consistency across the different deciles. This still leads to a large number of indicators that
have a performance lower than the threshold for specific deciles: for example D10 seems a
particular difficult indicator.
Further work needs to improve some of the issues raised by the use of different models: 1)
need to better take into account the hierarchical structure of the methods used: by sub-group,
by approach etc. 2) need to ensure a higher consistency across indicators. The same objective
function in the optimization algorithm by country would ensure this. However, the first results
show that methods which perform better for predicting inequality indicators are not
necessarily the same as for positional indicators.
5.2. Communication of flash estimates 2015: magnitude-direction scales
Providing a point estimate for the population parameter does not take into account the
uncertainty of the flash estimate. Despite its simplicity, communicating a single number gives
a false impression of precision. Alternatively, we could take into account the uncertainty of
the estimate by providing two main elements (a) a prediction interval; and (b) a magnitude-
direction scale for the estimated change in the flash estimate. From the overlap of the
prediction interval with the change classes, we can derive the probabilities of each of the
latter.
Based on preliminary consultation with or main users and in order to take onto account the
uncertainty of the flash estimate we suggest using a magnitude-direction scale with 6 classes
(MD6), to which the YoY change will be assigned:
(1) major increase [+++]
(2) moderate increase [+]
(3) (quasi) stable / minor changes [O]
(4) moderate decrease [-]
(5) major decrease [---]
(6) no conclusion (when we get contradictory signals from different methods with good
past performance).
This would allow associating a probability to each magnitude scale, and therefore taking into
account the uncertainty of the point estimate.
We tend to see favourably such an approach, which takes into account the uncertainty of our
estimates; this is expressed also by the fact that we have included the "no conclusion" class in
83
the scale: this indicates a case when the signal is highly ambiguous, the results (or the
prediction interval) cover equally multiple divergent classes.
Figure 5.4: Illustration MD6 scale
Estimate Moderate increase
Estimate Probability of major decrease: --- moderate decrease: --- stable / minor change: 5% moderate increase: 75% major increase: 20%
The definition of the classes requires choosing a meaningful threshold. There are two main
approaches that can be envisaged:
− Country specific thresholds that are defined as multiples of standard deviation
observed in EU-SILC.
− Pre-defined thresholds based on uniform values for all countries which take into
consideration an user assessment of what is considered a relevant change
One argument in favour of having uniform thresholds is related to communication, namely the
difficulty in explaining why a change of the same magnitude is described as (in an extreme
but nevertheless possible example) "a stable situation" for one country, and "a large decrease"
84
for another. However, depending on the sampling design and the sample size a specific YoY
change can be significant or not for one country.
Therefore, we present both types of scales. For the first option the thresholds that define the
classes are multiples of the standard deviation (SD) as follows:
− major increase / decrease: >4SD
− moderate increase / decrease: 2-4SD
− minor changes [O]: ±2SD
For the second option we choose a more granular scale that depends on the specific indicator
with 7 possible change classes.
Once we have defined the thresholds for the MD6 classes we can derive the estimated
probabilities for our flash estimates to fall within pre-specified intervals based on the standard
deviation as calculated under in point a).
𝑃(𝑎 ≤ �̃�𝑇 < 𝑏) = 𝐹 (𝑏, 𝜇𝑇∗ , √𝜅𝑇
2 + 𝜎𝑇2) − 𝐹 (𝑎, 𝜇𝑇
∗ , √𝜅𝑇2 + 𝜎𝑇
2).
This formula is used to compute the estimated probabilities to be within the different MD6
classes with intervals being fixed as ] − ∞, 4 ⋅ 𝜎𝑇], [−2 ⋅ 𝜎𝑇 , −𝜎𝑇[, [−𝜎𝑇 , 𝜎𝑇[, [𝜎𝑇 , 2 ⋅ 𝜎𝑇[,
[4 ⋅ 𝜎𝑇 , +∞[. Thus for instance the estimated probability for the third MD6 class is computed
as
𝑃(−𝜎𝑡 ≤ �̃�𝑇 < 𝜎𝑡) = 𝐹 (𝜎𝑡, 𝜇𝑇∗ , √𝜅𝑇
2 + 𝜎𝑇2) − 𝐹 (−𝜎𝑡, 𝜇𝑇
∗ , √𝜅𝑇2 + 𝜎𝑇
2).
5.3. Main results
Figures 5.5 to 5.11 show the two proposal for magnitude direction scales for the main
indicators: with absolute thresholds and with relative threshold. The results with either a low
historical performance or a high historical performance but with inconsistencies across the
two approaches in the target year are not presented in the tables as considered unreliable.
Figure 5.5 shows the flash estimates for 2015 for AROP. They are available for all Member
States, except for Ireland, Czech Republic and Spain. In most countries, a minor change is the
dominant class. A major increase is predicted for Germany (2013-2015), a moderate decrease
for Slovenia and Greece and a major decrease for Romania. The scale based on absolute
thresholds is more granular so it highlights more changes.
In figure 5.6, there are the flash estimates for 2015 for QSR. As for AROP, minor changes are
estimated for most countries. Figures 5.7 to 5.11 show the flash estimates for 2015 for the
positional indicators. They are not available for several countries due to inconsistent results
across approaches or low historical performance. The value of absolute thresholds is in table
5.19.
85
Figure 5.5: Flash estimate 2015 AROP
Source: EU-SILC 2009-2014 or 2015 when available; FE 2013/15 or 2014/15 in national currency;
Unreliable: CZ, ES, IE
86
Figure 5.6: Flash estimate 2015 QSR
Source: EU-SILC 2009-2014 or 2015 when available; FE 2013/15 or 2014/15 in national currency;
Unreliable: DE, EE, ES, LT
87
Figure 5.7: Flash estimate 2015 ARPT
Source: EU-SILC 2009-2014 or 2015 when available; FE 2013/15 or 2014/15 in national currency;
Unreliable: EL, ES, HR, LV
88
Figure 5.8: Flash estimate 2015 D10
Source: EU-SILC 2009-2014 or 2015 when available; FE 2013/15 or 2014/15 in national currency;
Unreliable: EE, EL, ES, CY, LT, RO, UK
89
Figure 5.9: Flash estimate 2015 D30
Source: EU-SILC 2009-2014 or 2015 when available; FE 2013/15 or 2014/15 in national currency;
Unreliable: BE, BG, EE, EL, ES, IT, LT, RO, SI, SK, UK
90
Figure 5.10: Flash estimate 2015 D70
Source: EU-SILC 2009-2014 or 2015 when available; FE 2013/15 or 2014/15 in national currency;
Unreliable: BG, EL, HU, MT, PT, SI, UK
91
Figure 5.11: Flash estimate 2015 D90
Source: EU-SILC 2009-2014 or 2015 when available; FE 2013/15 or 2014/15 in national currency;
Unreliable: BG, DK, EE, EL, LT, LU, AT, SK
92
Tables 5.12 to 5.18 show the detailed results for the flash estimates 2015 for the ratio (AROP,
QSR) and positional indicators (ARPT, D10, D30, D70, D90). All figures are shown in these
tables, including flash estimates considered not informative or unreliable which are not to be
disseminated to the users. In all tables, the estimated change of the indicator compared to its
last observed value (either 2013 or 2014) is shown along with its prediction interval. It can be
observed that for some countries the prediction interval is rather large. Moreover results
should be interpreted with caution taking into account the historical performance of the
estimates in order to assess how reliable the estimates are.
93
Table 5.12: Flash estimates 2015 by country:: At-risk-of-poverty rate (AROP)
Source: EU-SILC 2009-2014 or 2015 when available; FE 2013/15 or 2014/15;
Unreliable: CZ, IE, ES
94
Table 5.13: Flash estimates 2015 by country:: QSR
Source: EU-SILC 2009-2014 or 2015 when available; FE 2013/15 or 2014/15; Unreliable: EE, ES, DE, LT
95
Table 5.14: Flash estimates 2015 by country:: At-risk-of-poverty threshold (ARPT)
Source: EU-SILC 2009-2014 or 2015 when available; FE 2013/15 or 2014/15 in national currency;
Unreliable: EL, ES, HR, LV
96
Table 5.15: Flash estimates 2015 by country: D10
Source:
EU-SILC 2009-2014 or 2015 when available; FE 2013/15 or 2014/15 in national currency;
Unreliable: EE,EL, ES, CY, LT, RO, UK
97
Table 5.16: Flash estimates 2015: D30
Source: EU-SILC 2009-2014 or 2015 when available; FE 2013/15 or 2014/15 in national currency;
Unreliable: BE, BG, EE, EL, ES, IT, LT, R, SI, SK, UK
98
Table 5.17: Flash estimates 2015 by country: D70
Source: EU-SILC 2009-2014 or 2015 when available; FE 2013/15 or 2014/15 in national currency;
Unreliable: BG, EL, HU, MT, PT, SI, UK
99
Table 5.18: Flash estimates 2015 by country: D90
Source: EU-SILC 2009-2014 or 2015 when available; FE 2013/15 or 2014/15 in national currency;
Unreliable: BG, DK, EE, EL, LT, LU, AT, SK
100
Table 5.19: Bounds for MD6 classes
BE BG CZ DK DE EE IE EL ES FR HR IT CY LV LT LU HU MT NL AT PL PT RO SI SK FI SE UK
[---] / [-] -1.1 -1.1 -0.8 -1.4 -0.8 -1.2 -2.2 -1.1 -0.8 -0.6 -1.1 -1.6 -2.0 -1.1 -2.0 -2.8 -2.0 -1.4 -1.4 -1.4 -1.1 -0.6 -0.6 -0.6 -1.4 -0.8 -0.8 -0.3
[-] / [O] -0.5 -0.6 -0.4 -0.7 -0.4 -0.6 -1.1 -0.6 -0.4 -0.3 -0.6 -0.8 -1.0 -0.6 -1.0 -1.4 -1.0 -0.7 -0.7 -0.7 -0.6 -0.3 -0.3 -0.3 -0.7 -0.4 -0.4 -0.1
[O] / [+] 0.5 0.6 0.4 0.7 0.4 0.6 1.1 0.6 0.4 0.3 0.6 0.8 1.0 0.6 1.0 1.4 1.0 0.7 0.7 0.7 0.6 0.3 0.3 0.3 0.7 0.4 0.4 0.1
[+] / [+++] 1.1 1.1 0.8 1.4 0.8 1.2 2.2 1.1 0.8 0.6 1.1 1.6 2.0 1.1 2.0 2.8 2.0 1.4 1.4 1.4 1.1 0.6 0.6 0.6 1.4 0.8 0.8 0.3
[---] / [-] -2.1 -2.9 -1.2 -1.7 -2.0 -2.8 -3.6 -1.7 -1.1 -1.4 -2.0 -2.4 -3.4 -2.1 -1.1 -3.5 -1.5 -2.9 -1.2 -2.0 -1.2 -1.5 -1.8 -1.2 -1.8 -1.1 -2.1 -1.6
[-] / [O] -1.0 -1.4 -0.6 -0.9 -1.0 -1.4 -1.8 -0.9 -0.6 -0.7 -1.0 -1.2 -1.7 -1.1 -0.6 -1.8 -0.7 -1.4 -0.6 -1.0 -0.6 -0.7 -0.9 -0.6 -0.9 -0.6 -1.0 -0.8
[O] / [+] 1.0 1.4 0.6 0.9 1.0 1.4 1.8 0.9 0.6 0.7 1.0 1.2 1.7 1.1 0.6 1.8 0.7 1.4 0.6 1.0 0.6 0.7 0.9 0.6 0.9 0.6 1.0 0.8
[+] / [+++] 2.1 2.9 1.2 1.7 2.0 2.8 3.6 1.7 1.1 1.4 2.0 2.4 3.4 2.1 1.1 3.5 1.5 2.9 1.2 2.0 1.2 1.5 1.8 1.2 1.8 1.1 2.1 1.6
[---] / [-] -3.1 -6.9 -3.1 -1.6 -4.1 -4.5 -3.3 -4.3 -5.4 -1.5 -4.0 -3.7 -4.3 -4.1 -0.7 -3.7 -3.1 -2.8 -2.3 -3.2 -2.6 -3.8 -6.1 -1.9 -4.0 -1.7 -2.5 -3.1
[-] / [O] -1.6 -3.5 -1.6 -0.8 -2.0 -2.2 -1.7 -2.2 -2.7 -0.8 -2.0 -1.9 -2.1 -2.1 -0.4 -1.8 -1.5 -1.4 -1.1 -1.6 -1.3 -1.9 -3.1 -1.0 -2.0 -0.8 -1.2 -1.6
[O] / [+] 1.6 3.5 1.6 0.8 2.0 2.2 1.7 2.2 2.7 0.8 2.0 1.9 2.1 2.1 0.4 1.8 1.5 1.4 1.1 1.6 1.3 1.9 3.1 1.0 2.0 0.8 1.2 1.6
[+] / [+++] 3.1 6.9 3.1 1.6 4.1 4.5 3.3 4.3 5.4 1.5 4.0 3.7 4.3 4.1 0.7 3.7 3.0 2.8 2.3 3.2 2.6 3.8 6.1 1.9 4.0 1.7 2.5 3.1
[---] / [-] -3.0 -3.4 -1.6 -1.9 -2.3 -1.8 -3.1 -2.1 -1.9 -1.6 -2.7 -2.1 -3.4 -2.5 -0.9 -5.2 -2.8 -2.6 -1.6 -1.6 -1.4 -2.7 -3.2 -1.4 -1.6 -1.3 -2.3 -1.7
[-] / [O] -1.5 -1.7 -0.8 -1.0 -1.2 -0.9 -1.6 -1.0 -1.0 -0.8 -1.4 -1.0 -1.7 -1.3 -0.5 -2.6 -1.4 -1.3 -0.8 -0.8 -0.7 -1.3 -1.6 -0.7 -0.8 -0.7 -1.1 -0.8
[O] / [+] 1.5 1.7 0.8 1.0 1.2 0.9 1.6 1.0 1.0 0.8 1.4 1.0 1.7 1.3 0.5 2.6 1.4 1.3 0.8 0.8 0.7 1.3 1.6 0.7 0.8 0.7 1.1 0.8
[+] / [+++] 3.0 3.4 1.6 1.9 2.3 1.8 3.1 2.1 1.9 1.6 2.7 2.1 3.4 2.5 0.9 5.2 2.8 2.6 1.6 1.6 1.4 2.7 3.2 1.4 1.6 1.3 2.3 1.7
[---] / [-] -2.1 -2.9 -1.2 -1.7 -2.0 -2.8 -3.6 -1.7 -1.1 -1.4 -2.0 -2.4 -3.4 -2.1 -1.1 -3.5 -1.5 -2.9 -1.2 -2.0 -1.2 -1.5 -1.8 -1.2 -1.8 -1.1 -2.1 -1.6
[-] / [O] -1.0 -1.4 -0.6 -0.9 -1.0 -1.4 -1.8 -0.9 -0.6 -0.7 -1.0 -1.2 -1.7 -1.1 -0.6 -1.8 -0.7 -1.4 -0.6 -1.0 -0.6 -0.7 -0.9 -0.6 -0.9 -0.6 -1.0 -0.8
[O] / [+] 1.0 1.4 0.6 0.9 1.0 1.4 1.8 0.9 0.6 0.7 1.0 1.2 1.7 1.1 0.6 1.8 0.7 1.4 0.6 1.0 0.6 0.7 0.9 0.6 0.9 0.6 1.0 0.8
[+] / [+++] 2.1 2.9 1.2 1.7 2.0 2.8 3.6 1.7 1.1 1.4 2.0 2.4 3.4 2.1 1.1 3.5 1.5 2.9 1.2 2.0 1.2 1.5 1.8 1.2 1.8 1.1 2.1 1.6
[---] / [-] -1.9 -2.6 -1.6 -1.6 -2.3 -2.1 -4.2 -1.8 -1.8 -1.6 -2.4 -1.8 -3.6 -2.8 -0.9 -4.7 -1.6 -2.2 -1.6 -2.1 -1.6 -1.7 -2.2 -1.3 -1.9 -1.0 -1.9 -1.8
[-] / [O] -1.0 -1.3 -0.8 -0.8 -1.1 -1.1 -2.1 -0.9 -0.9 -0.8 -1.2 -0.9 -1.8 -1.4 -0.5 -2.3 -0.8 -1.1 -0.8 -1.1 -0.8 -0.9 -1.1 -0.6 -0.9 -0.5 -1.0 -0.9
[O] / [+] 1.0 1.3 0.8 0.8 1.1 1.1 2.1 0.9 0.9 0.8 1.2 0.9 1.8 1.4 0.5 2.3 0.8 1.1 0.8 1.1 0.8 0.9 1.1 0.6 0.9 0.5 1.0 0.9
[+] / [+++] 1.9 2.6 1.6 1.6 2.3 2.1 4.2 1.8 1.8 1.6 2.4 1.8 3.6 2.8 0.9 4.7 1.6 2.2 1.6 2.1 1.6 1.7 2.2 1.3 1.9 1.0 1.9 1.8
[---] / [-] -2.0 -4.8 -2.4 -1.6 -2.8 -3.2 -4.4 -2.1 -2.9 -3.0 -2.3 -2.3 -6.5 -3.3 -1.8 -4.4 -3.4 -3.5 -2.2 -3.0 -2.6 -3.2 -3.6 -1.5 -2.3 -2.1 -2.1 -2.0
[-] / [O] -1.0 -2.4 -1.2 -0.8 -1.4 -1.6 -2.2 -1.0 -1.5 -1.5 -1.1 -1.2 -3.3 -1.6 -0.9 -2.2 -1.7 -1.7 -1.1 -1.5 -1.3 -1.6 -1.8 -0.7 -1.2 -1.1 -1.1 -1.0
[O] / [+] 1.0 2.4 1.2 0.8 1.4 1.6 2.2 1.0 1.5 1.5 1.1 1.2 3.2 1.6 0.9 2.2 1.7 1.7 1.1 1.5 1.3 1.6 1.8 0.7 1.2 1.1 1.1 1.0
[+] / [+++] 2.0 4.8 2.4 1.6 2.8 3.2 4.4 2.1 2.9 3.0 2.3 2.3 6.5 3.3 1.8 4.4 3.4 3.5 2.2 3.0 2.6 3.2 3.6 1.5 2.3 2.1 2.1 2.0
[---] / [-] -0.3 -0.8 -0.6 -0.9 -0.5 -0.4 -0.4 -0.9 -1.7 -2.7 -0.4 -0.5 -0.6 -0.6 -0.9 -0.7 -0.6 -0.3 -0.5 -0.4 -0.3 -0.6 -0.7 -0.2 -0.4 -0.2 -0.2 -0.5
[-] / [O] -0.1 -0.4 -0.3 -0.5 -0.3 -0.2 -0.2 -0.4 -0.8 -1.3 -0.2 -0.3 -0.3 -0.3 -0.4 -0.3 -0.3 -0.1 -0.2 -0.2 -0.2 -0.3 -0.3 -0.1 -0.2 -0.1 -0.1 -0.2
[O] / [+] 0.1 0.4 0.3 0.5 0.3 0.2 0.2 0.4 0.8 1.3 0.2 0.3 0.3 0.3 0.4 0.3 0.3 0.1 0.2 0.2 0.2 0.3 0.3 0.1 0.2 0.1 0.1 0.2
[+] / [+++] 0.3 0.8 0.6 0.9 0.5 0.4 0.4 0.9 1.7 2.7 0.4 0.5 0.6 0.6 0.9 0.7 0.6 0.3 0.5 0.4 0.3 0.6 0.7 0.2 0.4 0.2 0.2 0.5
[---] Major decrease [-] Moderate decrease [O] Minor change [+] Moderate increase [+++] Major increase
D70, in %
D90, in %
QSR, in percentage points
Bounds
AROP, in percentage points
ARPT, in %
D10, in %
D30, in %
MEDIAN, in %
101
6. REFERENCES
Anderson, T. W. and Darling D. A. (1954) A test of goodness-of-fit, Journal of the American
Statistical Association, vol. 49, no. 268, pp. 765-769.
Avram, S., Sutherland, H., Tasseva, I. and Tumino, A. (2011) Income protection and poverty
risk for the unemployed in Europe, Research Note 1/2011 of the European Observatory on
the Social Situation and Demography, European Commission.
Banbura, M., Giannone, D., Modugno, M. and Rechlin, L. (2013) Now-casting and the real-
time data flow, ECB Working Paper Series, no. 1564, pp. 1-53.
Berger, Y.G., and Priam, R. (2016) A simple variance estimator of change for rotating
repeated surveys: an application to the European Union Statistics on Income and Living
Conditions household surveys, Journal of the Royal Statistical Society, Series A Statistics
in Society, vol. 179, issue , January 2016, pp. 251–272
Bourguignon, F., Bussolo, M. and Da Silva, L.P. (2008) The impact of macro-economic
policies on poverty and income distribution Macro-Micro Evaluation Techniques and
Tools, The World Bank and Palgrave-Macmillan, New York.
Brandolini, A., D’Amuri, F. and Faiella, I. (2013) “Country case study – Italy”, Chapter 5 in
Jenkins et al, The Great Recession and the Distribution of Household Income, Oxford:
Oxford University Press.
Brewer, M., Browne, J., Hood, A., Joyce, R., Sibieta, L. (2013) The Short- and Medium-
Term Impacts of the Recession on the UK Income Distribution, Fiscal Studies, 34(2):
179–201.
Claeskens, G. and Hjort, N. L. (2008) Model Selection and Model Averaging, Cambridge
University Press.
Essama-Nssah, B. (2005) The Poverty and Distributional Impact of Macroeconomic Shocks
and Policies: A Review of Modeling Approaches, World Bank Policy Research Working
Paper 3682, Washington, DC.
Fernandez Salgado M., Figari, F., Sutherland, H. and Tumino, A. (2014) Welfare
compensation for unemployment in the Great Recession, The Review of Income and
Wealth, 60(S1): 177–204.
Figari, F., Iacovou, M., Skew, A. and Sutherland, H. (2012) Approximations to the truth:
comparing survey and microsimulation approaches to measuring income for social
indicators, Social Indicators Research, 105(3): 387-407.
Figari, F., A. Paulus and H. Sutherland (2015) “Microsimulation and Policy Analysis” in
A.B. Atkinson and F. Bourguignon (Eds.) Handbook of Income Distribution, Vol 2B.
Amsterdam: Elsevier, pp. 2141-2221. ISBN 978-0-444-59430-3
Figari, F., Salvatori, A. and Sutherland, H. (2011) Economic downturn and stress testing
European welfare systems, Research in Labor Economics, 32: 257-286.
Hjort, N. L and Claeskens, G. (2003) Frequentist model average estimators, Journal of the
American Statistical Association, 98: 879-899.
Immervoll, H., Levy, H., Lietz, C., Mantovani, D. and Sutherland, H. (2006) The sensitivity
of poverty rates in the European Union to macro-level changes, Cambridge Journal of
Economics, 30: 181-199.
Jara, X.H. and Leventi, C. (2014) Baseline results from the EU27 EUROMOD (2009-2013),
102
EUROMOD Working Paper EM18/14, Colchester: ISER, University of Essex.
Kalman, R. E. (1960) A new approach to linear filtering and prediction problems,
Transactions of the ASME-Journal of Basic Engineering, no. 8, pp. 35-45.
Keane C., Callan, T., Savage, M., Walsh, J.R. and Timoney, K. (2013) Identifying Policy
Impacts in the Crisis: Microsimulation Evidence on Tax and Welfare, Journal of the
Statistical and Social Inquiry Society of Ireland, XLII: 1-14.
MacDonald, J. B. and Xu, Y. J. (1995) A generalization of the beta distribution with
applications, Journal of Econometrics, vol. 66, no. 1-2, pp. 133-152.
Massey, J. (1951) The Kolmogorov-Smirnov Test for Goodness of Fit, Journal of the
American Statistical Association, vol. 46, no. 253, pp. 68-78.
Matsaganis, M. and Leventi C. (2014) Poverty and inequality during the Great Recession in
Greece, Political Studies Review, 12: 209 -223.
Narayan, A. and Sánchez-Páramo C. (2012) Knowing, when you don't know: using
microsimulation models to assess the poverty and distributional impacts of
macroeconomic shocks, The World Bank, Washington DC.
Navicke, J., Rastrigina, O. and Sutherland, H. (2013) Nowcasting Indicators of Poverty Risk
in the European Union: A Microsimulation Approach, Social Indicators Research, 119(1):
101-119.
O'Donoghue, C.,Loughrey, J. (2014) Nowcasting in Microsimulation Models: A
Methodological Survey, Journal of Artificial Societies and Social Simulation 17 (4) 12.
Peichl, A. (2008) The Benefits of Linking CGE and Microsimulation Models: Evidence from
a Flat Tax Analysis, IZA Discussion Paper No. 3715.
Rastrigina, O., Leventi, C., Vujackov S. and Sutherland, H. (2016) Nowcasting: estimating
developments in median household income and risk of poverty in 2014 and 2015,
Research Note 1/2015, Social Situation Monitor, European Commission.
Rastrigina, O., Leventi, C. and Sutherland, H. (2015) Nowcasting: estimating developments
in the risk of poverty and income distribution in 2013 and 2014, Research Note 1/2014,
Social Situation Monitor, European Commission.
Shumway, R. H. and Stoffer, D. S. (1982) An approach to time series smoothing and
forecasting using the EM algorithm, Journal of Time Series Analysis, vol. 3, no. 4, pp.
253-264.
Sutherland, H. and Figari, F. (2013) EUROMOD: the European Union tax-benefit
microsimulation model, International Journal of Microsimulation, 6(1): 4-26.
103
Abbreviations
AROP At-risk-of-poverty rate
ARPT At-risk-of-poverty threshold
CDF Cumulative Distribution Function
ESS European Statistical System
EU-
SILC European Union Statistics on Incomes and Living Conditions
FE Flash Estimates
GB2 Generalized beta distribution of the second kind
GDP Gross Domestic Product
ISER
Institute for Social and Economic Research in University of
Essex
LFS Labour Force Survey
MD6 Magnitude-direction scale with 6 classes
MIP Macroeconomics Imbalance Procedure
NA National Accounts
PQM Parametric Quantile Modelling
QAF Quality Assessment Framework
QSR Interquartile share ratio
YoY Year On Year change
MAPE Mean Absolute Percentage Error
HD Hellinger Distance
Country codes
BE Belgium
BG Bulgaria
CZ Czech Republic
DK Denmark
DE Germany
EE Estonia
IE Ireland
EL Greece
ES Spain
FR France
HR Croatia
IT Italy
CY Cyprus
LV Latvia
LT Lithuania
LU Luxembourg
HU Hungary
104
MT Malta
NL Netherlands
AT Austria
PL Poland
PT Portugal
RO Romania
SI Slovenia
SK Slovakia
FI Finland
SE Sweden
UK United Kingdom