TAX DATA AS A MEANS FOR THE ESSENTIAL REDUCTION OF THE SHORT-TERM SURVEYS RESPONSE BURDEN

Rudi Seljak, Metka Zaletel
Statistical Office of the Republic of Slovenia



Introduction

In recent years, national statistical institutes have constantly been confronted with two challenges, which are especially prominent in short-term business surveys:
- how to improve the timeliness of the published data;
- how to decrease the response burden and the survey costs.

Lately, one of the most frequently used ways of meeting at least some of these demands has been the appropriate use of various types of administrative data.

In recent years, many offices have been exploring the possibilities of using tax data, originally collected for the monthly settlement of value added tax (VAT), to estimate turnover indices.


Introduction cont’d

The Statistical Office of the Republic of Slovenia (SORS) began the first systematic studies in this area in 2005.

In 2005 a feasibility study was carried out to explore the possibilities of using VAT data to estimate turnover indices in wholesale trade.

The results of this study laid the foundations of the new methodology.

This methodology was then adopted and applied to some other areas.


Main features of the new methodology

One significant change in the new methodology was the move from random sampling to cut-off sampling.

The sampling error is thereby “replaced” by the bias due to omitting part of the population. One goal of the feasibility study was to estimate the size of this bias.
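One way to see what this bias depends on is a standard decomposition of the total index. The symbols are introduced here for illustration and do not come from the slides: let X^c_t be the turnover in month t of the units retained above the cut-off, X^e_t that of the omitted units, I^c_t and I^e_t the corresponding month-to-month ratios, and w the share of the retained units in the previous month's total turnover.

```latex
I_t = \frac{X^{c}_{t} + X^{e}_{t}}{X^{c}_{t-1} + X^{e}_{t-1}}
    = w\, I^{c}_{t} + (1 - w)\, I^{e}_{t},
\qquad
w = \frac{X^{c}_{t-1}}{X^{c}_{t-1} + X^{e}_{t-1}},

\operatorname{bias}\bigl(I^{c}_{t}\bigr) = I^{c}_{t} - I_t = (1 - w)\,\bigl(I^{c}_{t} - I^{e}_{t}\bigr).
```

Under this decomposition the bias stays small when the omitted units account for a small share of turnover (w close to 1) or when they move much like the retained units.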

The new methodology combines two types of data: the classical postal survey is still carried out for a small number of the largest units, while “VAT data” are used for the majority of units. Consequently, statistical data processing has changed significantly.


Feasibility study

In the feasibility study we simulated the data collection process under the new methodology for every month of 2003-2005 and then compared the “new” results with the originally published results.

The turnover levels sometimes differed substantially, but the movements, expressed as indices, were in most cases quite consistent.

As the main indicator of coherence between the index time series we used the correlation coefficient. Except for some smaller domains, the coefficient was around 0.9.

For the “problematic” domains we increased the number of units to be surveyed.
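A minimal sketch of such a coherence check, not SORS code: month-to-month indices are computed from two turnover series and compared with the Pearson correlation coefficient. The function names and the turnover levels are hypothetical, chosen so that the levels differ while the movement is similar.

```python
import math


def mm_indices(levels):
    """Month-to-month indices: 100 * X_t / X_(t-1) for consecutive months."""
    return [100.0 * cur / prev for prev, cur in zip(levels, levels[1:])]


def pearson(x, y):
    """Pearson correlation coefficient of two equally long series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have equal length >= 2")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical turnover levels: different levels, similar movement.
old_levels = [100.0, 110.0, 99.0, 120.0]
new_levels = [80.0, 90.0, 79.0, 95.0]
r = pearson(mm_indices(old_levels), mm_indices(new_levels))
print(f"correlation of M/M-1 indices: {r:.3f}")
```

Correlating the indices rather than the levels matches the point above: the levels may differ while the movement agrees.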


Comparison of time series obtained by two different methodologies

[Figure: Month-to-month indices in Wholesale Trade activity, February 2003 to August 2005; new methodology vs. old methodology.]


Main steps of the process

- Selection of the set of observational units
- Selection of the set of units to be surveyed
- Collection and editing of survey data
- Merging of survey and administrative data
- Detection of outlying values using the Hidiroglou-Berthelot method
- Imputation for non-response
- Aggregation and calculation of processing quality indicators


Selection process

The whole procedure is carried out in two steps. In the first step, the units of the target population are determined; in the second step, the units for which the data will still be obtained by the “classical survey” are selected.

In the first step, a unit is selected if it fulfils at least one of the following criteria:
- its semi-annual turnover is more than 100,000 EUR;
- its semi-annual turnover is more than 50,000 EUR and it has at least 3 employees;
- it has at least 6 employees.
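The three criteria can be expressed as a single predicate. The `Unit` record and its field names are hypothetical; the thresholds are the ones listed above.

```python
from dataclasses import dataclass


@dataclass
class Unit:
    unit_id: str
    semiannual_turnover_eur: float  # hypothetical field: turnover over a half-year
    employees: int


def in_target_population(u: Unit) -> bool:
    """True if the unit meets at least one of the three selection criteria."""
    return (
        u.semiannual_turnover_eur > 100_000
        or (u.semiannual_turnover_eur > 50_000 and u.employees >= 3)
        or u.employees >= 6
    )


units = [
    Unit("A", 120_000, 1),  # meets criterion 1
    Unit("B", 60_000, 3),   # meets criterion 2
    Unit("C", 60_000, 2),   # meets no criterion
    Unit("D", 10_000, 6),   # meets criterion 3
]
target = [u.unit_id for u in units if in_target_population(u)]
print(target)  # ['A', 'B', 'D']
```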


Selection process cont’d

For a smaller part of the units, the data are still obtained by the postal survey. To select the units to be surveyed, the target population is first sorted by descending turnover within each activity group.

Then, within each group, as many of the largest units are selected as are needed for their share of the group's total turnover to exceed the target share. The target share differs slightly between activity groups but is generally between 50% and 60%.

The number of units to be surveyed is approximately 2% of the whole target population.
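A sketch of this cut-off selection, assuming each unit is given as an `(id, activity_group, turnover)` tuple and a target share is supplied per group. The function name, the input layout and the data are hypothetical.

```python
from collections import defaultdict


def select_for_survey(units, target_share):
    """Cut-off selection of the units to be surveyed.

    units: iterable of (unit_id, activity_group, turnover) tuples.
    target_share: dict mapping activity_group -> share in (0, 1).
    Within each group, units are taken in descending order of turnover
    until their cumulative share of the group total exceeds the target.
    """
    groups = defaultdict(list)
    for unit_id, group, turnover in units:
        groups[group].append((unit_id, turnover))

    selected = {}
    for group, members in groups.items():
        members.sort(key=lambda m: m[1], reverse=True)
        total = sum(t for _, t in members)
        chosen, cumulative = [], 0.0
        for unit_id, turnover in members:
            chosen.append(unit_id)
            cumulative += turnover
            if cumulative > target_share[group] * total:
                break  # target share exceeded: stop adding units
        selected[group] = chosen
    return selected


population = [
    ("a1", "G1", 50.0), ("a2", "G1", 30.0), ("a3", "G1", 15.0), ("a4", "G1", 5.0),
    ("b1", "G2", 70.0), ("b2", "G2", 20.0), ("b3", "G2", 10.0),
]
print(select_for_survey(population, {"G1": 0.55, "G2": 0.6}))
# {'G1': ['a1', 'a2'], 'G2': ['b1']}
```

All remaining units of the target population would then be covered by the administrative data.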


Selection process - schematic presentation

[Diagram with the boxes: business register, administrative data, survey data, selected units (target population), sorted data, units to be surveyed, units for the administrative data.]


Merging data from different sources

The data enter the process through two different channels. Each data set is first edited separately using consistency checks.

The data from the different sources are then merged into one table, and each turnover value is assigned an appropriate status.

The status records the data collection method and whether the value was corrected during editing.

The status values are assigned according to the standard 4-digit classification used at SORS.
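A minimal sketch of the merge step, assuming turnover arrives as one dictionary per channel. The two-character status used here (a source digit followed by a corrected/not-corrected digit) is a hypothetical stand-in: the slides do not spell out the SORS 4-digit status classification, so these codes are not the real ones.

```python
# Hypothetical status scheme for illustration only, NOT the SORS classification.
SOURCE_DIGIT = {"survey": "1", "vat": "2"}


def merge_sources(population, survey, vat, edited=frozenset()):
    """Merge survey and VAT turnover into one table keyed by unit id.

    population: ids of the target population.
    survey, vat: dicts unit_id -> turnover, each already edited separately.
    edited: ids whose value was corrected during editing.
    Units with no value from either source get turnover=None and
    status=None, so that they can be imputed later.
    """
    merged = {}
    for unit_id in population:
        if unit_id in survey:  # survey value takes precedence (assumption)
            source, value = "survey", survey[unit_id]
        elif unit_id in vat:
            source, value = "vat", vat[unit_id]
        else:
            merged[unit_id] = {"turnover": None, "status": None}
            continue
        corrected = "1" if unit_id in edited else "0"
        merged[unit_id] = {"turnover": value, "status": SOURCE_DIGIT[source] + corrected}
    return merged


table = merge_sources(population=[1, 2, 3], survey={1: 124_323.0}, vat={2: 572.0}, edited={2})
for unit_id, row in table.items():
    print(unit_id, row["turnover"], row["status"])
```

Since the surveyed units and the units covered by VAT data are selected as disjoint sets, the precedence rule only matters if a unit turns up in both channels.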


Merging data from different sources – schematic presentation

[Diagram: data from the tax data database and from the survey data are merged into one table, for example:]

ID      TURNOVER   STATUS
10010   124323     11.12
10230   572        21.11


Statistical editing

Once the data are merged, we use the Hidiroglou-Berthelot method to detect outliers. The method examines the distribution of month-to-month growth rates to find extreme values. The main goal is to detect “extreme leaps” in the turnover estimated from VAT data. These leaps are usually a consequence of methodological differences between administrative and statistical data.

Such problems occur mostly when an enterprise sells real property. The proceeds of the sale are reported to the tax authorities but should not be included in turnover.
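For reference, the Hidiroglou-Berthelot method in its usual textbook form can be sketched as below: the ratios of current to previous turnover are transformed symmetrically around their median, scaled by unit size through the exponent U, and flagged when they fall outside a quartile-based interval controlled by A and C. This is our own sketch; the parameter values are illustrative defaults, not the settings used at SORS.

```python
import statistics


def hb_outliers(prev, curr, U=0.5, A=0.05, C=7.0):
    """Hidiroglou-Berthelot outlier detection on month-to-month ratios.

    prev, curr: dicts unit_id -> turnover in months t-1 and t.
    U (0..1) controls how much unit size matters, A guards against very
    small quartile distances, C sets the width of the acceptance region.
    Returns the set of unit ids flagged as outliers.
    """
    # Only units with positive turnover in both months can be assessed
    ids = [i for i, x in curr.items()
           if x is not None and x > 0 and (prev.get(i) or 0) > 0]
    ratios = {i: curr[i] / prev[i] for i in ids}
    r_med = statistics.median(ratios.values())

    effects = {}
    for i, r in ratios.items():
        # Symmetric transformation of the ratio around its median
        s = 1 - r_med / r if r < r_med else r / r_med - 1
        # Magnitude effect: larger units get more weight
        effects[i] = s * max(prev[i], curr[i]) ** U

    q1, e_med, q3 = statistics.quantiles(effects.values(), n=4, method="inclusive")
    d_low = max(e_med - q1, abs(A * e_med))
    d_high = max(q3 - e_med, abs(A * e_med))
    return {i for i, e in effects.items()
            if e < e_med - C * d_low or e > e_med + C * d_high}


prev = {i: 1000.0 for i in range(10)}
curr = {i: 1000.0 * (1 + 0.01 * i) for i in range(10)}
prev[99], curr[99] = 1000.0, 50_000.0  # e.g. a property sale reported as turnover
print(hb_outliers(prev, curr))  # {99}
```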


Imputation procedures

In the imputation process we impute missing values as well as the values designated as extreme during statistical editing.

Three different imputation methods are used:
- estimation of monthly data from quarterly data (only at the end of each quarter);
- the Historical Trend method (only for units with data from the previous month);
- the Mean Value method.

For each imputed value, the status records both the reason for imputation and the imputation method used.
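The Historical Trend and Mean Value methods could be sketched as follows. This is a simplified illustration under our own assumptions: the trend is the ratio of summed current to summed previous turnover over units with usable data in both months, the mean is taken over all usable current values, and everything is computed within a single imputation class (in practice classes such as activity groups would presumably be used). The quarterly-to-monthly estimation is omitted because the slides do not describe it.

```python
def impute(prev, curr, flagged=frozenset()):
    """Impute missing or flagged current-month turnover values.

    prev, curr: dicts unit_id -> turnover (None if missing).
    flagged: ids marked as extreme values during statistical editing.
    Returns (values, methods); methods records how each imputed value
    was obtained, mirroring the idea of recording it in the status.
    """
    # Usable current values: present and not flagged as outliers
    usable = {i: x for i, x in curr.items() if x is not None and i not in flagged}
    # Trend from units with usable values in both months
    both = [i for i in usable if prev.get(i) is not None]
    trend = sum(usable[i] for i in both) / sum(prev[i] for i in both)
    mean_value = sum(usable.values()) / len(usable)

    values, methods = dict(usable), {}
    for i in curr:
        if i in usable:
            continue
        if prev.get(i) is not None:
            values[i] = prev[i] * trend  # Historical Trend method
            methods[i] = "historical_trend"
        else:
            values[i] = mean_value  # Mean Value method
            methods[i] = "mean_value"
    return values, methods


prev = {1: 100.0, 2: 200.0, 3: 50.0, 4: None}
curr = {1: 110.0, 2: 220.0, 3: None, 4: None}
values, methods = impute(prev, curr)
print(methods)  # {3: 'historical_trend', 4: 'mean_value'}
```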


Editing and imputation – schematic presentation

[Diagram: the merged data pass through detection of outliers (H-B method, with an OUTLIERYN field) and imputation, producing the imputed data.]

Merged data:
ID      TURNOVER   STATUS
10010   124323     11.12
10230   572        21.11
13213   Null       Null

Imputed data:
ID      TURNOVER   STATUS
10010   19345      12.13
10230   572        21.11
13213   28122      41.14


Quality indicators

A set of quality indicators is calculated automatically from the status values, in which all “process changes” were recorded.

Two types of quality indicators are calculated: micro and macro indicators.

An example of a micro indicator is the imputation rate, defined as the share of values imputed during the process.

An example of a macro indicator is the relative difference between the index calculated from all the data and the index calculated from the non-imputed data only.
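Both example indicators can be computed from unit-level records carrying an "imputed" flag, which in the actual system would be derived from the status values. The record layout is hypothetical, the index is taken as 100 times the ratio of summed current to summed previous turnover, and the imputation rate is unweighted (a turnover-weighted variant is equally possible).

```python
def mm_index(records):
    """M/M-1 index: 100 * sum of current turnover / sum of previous turnover."""
    return 100.0 * sum(r["curr"] for r in records) / sum(r["prev"] for r in records)


def imputation_rate(records):
    """Micro indicator: share of current-month values that were imputed."""
    return sum(1 for r in records if r["imputed"]) / len(records)


def relative_difference(records):
    """Macro indicator: difference between the index from all data and the
    index from non-imputed data, relative to the latter."""
    all_idx = mm_index(records)
    observed_idx = mm_index([r for r in records if not r["imputed"]])
    return (all_idx - observed_idx) / observed_idx


records = [
    {"prev": 100.0, "curr": 110.0, "imputed": False},
    {"prev": 200.0, "curr": 220.0, "imputed": False},
    {"prev": 50.0, "curr": 80.0, "imputed": True},
    {"prev": 100.0, "curr": 100.0, "imputed": True},
]
print(f"imputation rate: {imputation_rate(records):.0%}")  # imputation rate: 50%
print(f"relative difference: {relative_difference(records):+.2%}")
```

A large relative difference signals that the published index depends heavily on imputed values.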


Quality indicators cont’d

All quality indicators are calculated automatically and inserted into an Excel spreadsheet template. The indicators for the last 13 months can also be presented graphically.

Response rate
         MAR06  APR06  MAY06  JUN06  JUL06  AUG06  SEP06  OCT06  NOV06  DEC06  JAN07  FEB07  MAR07
Domain1  97,6%  97,5%  97,4%  97,2%  97,2%  97,3%  97,1%  97,1%  96,8%  93,7%  76,4%  76,1%  73,3%
Domain2  99,5%  99,4%  99,3%  99,3%  98,8%  98,8%  98,5%  98,4%  98,0%  94,7%  92,2%  91,5%  87,8%
Domain3  95,2%  95,1%  94,9%  94,8%  95,4%  95,1%  94,9%  94,8%  94,5%  91,5%  54,6%  54,4%  53,1%
Domain4  99,1%  99,1%  99,1%  99,1%  97,5%  97,5%  97,0%  97,0%  96,6%  94,1%  90,4%  89,1%  87,0%

[Chart: Response rate for Domain4, FEB06 to MAR07.]


Quality indicators cont’d

One of the macro indicators compares the indices calculated from the whole data set with the indices calculated only from the “survey data” and only from the “admin data”.

Indices for Domain1: all data, survey data and VAT data
                FEB06   MAR06   APR06   MAY06   JUN06   JUL06   AUG06   SEP06   OCT06   NOV06   DEC06   JAN07   FEB07   MAR07
All data        99,31  120,81   93,58  109,09  105,18  101,19   93,19  113,05  104,07  102,30  111,01   74,03  101,96  116,77
Survey data     95,28  122,94   93,11  108,22  104,08   98,51   95,24  110,95   99,83  105,64  100,35   83,40   99,89  111,26
VAT data       104,93  118,11   94,20  110,23  106,60  104,54   90,79  115,65  109,10   98,68  123,37   65,20  104,46  123,12

[Chart: Indices M/M-1 (Domain1); series: all data, field data, VAT data.]
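One simple way to read this macro indicator is to look for the months in which the survey-based and VAT-based indices diverge most. The sketch below does this for the Domain1 indices listed above (decimal commas written as points); using the absolute gap as the measure is our choice, not necessarily the one used at SORS.

```python
# Domain1 M/M-1 indices from the slide (decimal commas written as points).
months = ["FEB06", "MAR06", "APR06", "MAY06", "JUN06", "JUL06", "AUG06",
          "SEP06", "OCT06", "NOV06", "DEC06", "JAN07", "FEB07", "MAR07"]
survey = [95.28, 122.94, 93.11, 108.22, 104.08, 98.51, 95.24,
          110.95, 99.83, 105.64, 100.35, 83.40, 99.89, 111.26]
vat = [104.93, 118.11, 94.20, 110.23, 106.60, 104.54, 90.79,
       115.65, 109.10, 98.68, 123.37, 65.20, 104.46, 123.12]

# Absolute gap between the two source-specific indices, per month
gaps = {m: abs(s - v) for m, s, v in zip(months, survey, vat)}
worst = max(gaps, key=gaps.get)
print(worst, round(gaps[worst], 2))  # DEC06 23.02
```

The largest gaps fall at the turn of the year (DEC06 and JAN07), where the two sources move in opposite directions.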


Benefits of the new system

The new methodology represents a radical change in the production of short-term indices. Although the new system has some deficiencies, its benefits far outweigh them. The largest benefit of the new methodology is the substantial reduction of both the response burden and the survey costs.

To quantify the benefits of the new methodology, we estimated the reduction in burden and in costs, both expressed in man-days.

The estimate was made for two areas: “Hotels and restaurants” and “Services”.

The chart compares the burden and costs in 2006, when the old methodology was still in use, with those in 2007, when the new methodology was launched.


Response burden and cost reduction

[Chart: Response burden and costs at SORS in man-days, 2006 compared with 2007, for “Hotels and restaurants” and “Services”.]


Conclusions

SORS started to implement the new methodology for estimating monthly turnover indices in 2006. The new methodology combines two different sources: survey data for a smaller part of the units and administrative data for the larger part.

Although there are differences in the methodological definitions of turnover, all the studies showed that administrative data can be used well for short-term statistics.

The new methodology brings a substantial decrease in costs and in the response burden.

The new methodology is planned to be extended to retail trade in 2008.