
The standard error of EU-SILC estimates

Tim Goedemé

Research Foundation – Flanders (FWO)
Herman Deleeck Centre for Social Policy, University of Antwerp

Workshop EU-SILC / EU-LFS
Manchester, 5 August


1. The problem

• EU-SILC is based on a sample
-> standard errors and confidence intervals are needed (they capture random error only)

-> analytical vs. replication methods (BRR: balanced repeated replication, JRR: jackknife repeated replication, bootstrap)

-> take account of:
- sample design
- weighting
- imputation
- character of the indicator
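As an illustration of the replication route, here is a minimal Python sketch of one bootstrap variant: the Rao-Wu rescaling bootstrap, which draws n_h - 1 PSUs with replacement within each stratum. It assumes a flat list of records with hypothetical keys `stratum`, `psu`, `w` (weight) and `y` (0/1 indicator). It resamples whole PSUs within strata, so it reflects sample design and weighting, but not imputation or weight calibration. It is not the method of any particular EU-SILC tool.

```python
import math
import random
from collections import defaultdict


def weighted_rate(ys, weights):
    """Weighted share of units with y == 1 (y coded 0/1)."""
    return sum(w * y for y, w in zip(ys, weights)) / sum(weights)


def rescaling_bootstrap_se(records, n_reps=500, seed=1):
    """Rao-Wu rescaling bootstrap SE of a weighted rate (m_h = n_h - 1).

    records: dicts with hypothetical keys 'stratum', 'psu', 'w' (weight)
    and 'y' (0/1). PSU codes only need to be unique within their stratum.
    """
    rng = random.Random(seed)
    units = defaultdict(set)
    for r in records:
        units[r["stratum"]].add(r["psu"])
    units = {h: sorted(psus) for h, psus in units.items()}
    for h, psus in units.items():
        if len(psus) < 2:
            raise ValueError(f"stratum {h!r} has fewer than two PSUs")

    ys = [r["y"] for r in records]
    estimates = []
    for _ in range(n_reps):
        factor = {}
        for h, psus in units.items():
            n_h = len(psus)
            # Draw n_h - 1 PSUs with replacement within the stratum.
            draws = [rng.choice(psus) for _ in range(n_h - 1)]
            for psu in psus:
                # Weight multiplier: (times drawn) * n_h / (n_h - 1).
                factor[(h, psu)] = draws.count(psu) * n_h / (n_h - 1)
        weights = [r["w"] * factor[(r["stratum"], r["psu"])] for r in records]
        estimates.append(weighted_rate(ys, weights))

    mean = sum(estimates) / len(estimates)
    var = sum((e - mean) ** 2 for e in estimates) / (len(estimates) - 1)
    return math.sqrt(var)
```

Every replicate re-estimates the indicator with rescaled weights. The spread of the replicate estimates is then the variance estimate. Each stratum needs at least two PSUs, which is the same requirement that linearization imposes.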


1. The problem

-> information is needed on the sample design:
- documentation
- sample design variables
- ultimate cluster model

• But: this information is lacking (documentation, UDB, Eurostat)


1. The problem

• Description of sample design:
- documentation
- sample design variables (UDB, Eurostat)

• Empirical test

• Recommendations


Overview

1. The problem
2. Sample design & variables
3. Setup of tests
4. Results
5. Implementation
6. Discussion and conclusion


2. Sample design & variables

• Sources: Quality reports (Circa) / NSIs

• Large diversity in sample designs:

- Simple random samples – multi-stage samples

- Equal probability of selection – Probability proportional to size

- With / without replacement

- With / without stratification

- Systematic sampling

- Panel structure (mostly a 4-year rotational panel design)

- Fixed / changing PSUs


2. Sample design variables

• Potential candidates

- DB060: Primary Sampling Units (PSUs)

- DB050: Primary strata (not in UDB)

- DB040: region (NUTS1 / NUTS2 / missing)

- DB062: Secondary Sampling Units (SSUs)

- (DB100: degree of urbanisation)

- (DB070: order of selection)


2. Sample design variables

• Problems with region (DB040) as stratification variable:
- sometimes not applicable (e.g. FI, SE)
- very crude compared to DB050
- refers to the current situation rather than the moment of selection

• -> Solution: re-group PSUs that are split across regions


2. Sample design variables

• Problems with PSUs (DB060):
- missing (e.g. Germany)
- fewer PSUs than reported
- self-representative PSUs
- not unique across strata -> problem when working with DB040 (UDB)

• Solution:
- use household ID when DB060 is missing (HU: DB062)
- Eurostat is working on improved variables
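The fallback rule above can be written down as a small data-preparation step. This is a hypothetical Python sketch. Rows are dicts keyed by UDB variable names, and it assumes that DB020 holds the country code and DB030 the household ID (check this against the UDB documentation). Codes are prefixed with the country so that identical codes from different countries are not merged when countries are pooled. The sketch does not re-group PSUs that are split across regions.

```python
# Per the slides: for HU, DB062 is used when DB060 is missing.
FALLBACK_PSU_VAR = {"HU": "DB062"}


def build_design_ids(rows, strata_var="DB040", hh_id_var="DB030"):
    """Return one (stratum, psu) pair per household row.

    rows: dicts with (assumed) UDB names DB020 (country), DB030 (household
    ID), DB040 (region) and DB060 (PSU, None when missing).
    """
    ids = []
    for row in rows:
        country = row["DB020"]
        # A missing region (e.g. FI, SE) simply yields one stratum per country.
        stratum = f"{country}|{row.get(strata_var)}"
        psu = row.get("DB060")
        if psu is None:
            # No PSU code: use the country-specific fallback if there is one,
            # otherwise treat every household as its own PSU.
            fallback = FALLBACK_PSU_VAR.get(country, hh_id_var)
            psu = f"{fallback}={row[fallback]}"
        ids.append((stratum, f"{country}|{psu}"))
    return ids
```

The resulting pairs can be stored as two new variables (for example `strata` and `psu`) and then passed to `svyset`.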


2. Sample design variables

• 2 versions of EU-SILC 2008:
- UDB
- Eurostat data
- + France, Belgium: EU-SILC 2007

• Difference between the two versions:
- primary strata (DB050)


3. Setup of tests

• Aims of comparison UDB – Eurostat data:

- Insight into overall precision of EU-SILC

- Insight into the adequacy of the incomplete sample design variables

• Method:

- Linearization (DASP for Stata)
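For a weighted proportion, which is a ratio of two weighted totals, a standard textbook form of the Taylor-linearized variance under the ultimate cluster approximation is given below. Here h indexes strata, i PSUs within stratum h (n_h of them), and k units within a PSU. This sketch assumes PSUs drawn with replacement and no finite population correction, consistent with the limitations listed below. It is not claimed to be the exact formula implemented in DASP.

```latex
\hat p = \frac{\sum_{h}\sum_{i}\sum_{k} w_{hik}\, y_{hik}}{\sum_{h}\sum_{i}\sum_{k} w_{hik}},
\qquad
z_{hi} = \frac{\sum_{k} w_{hik}\,(y_{hik} - \hat p)}{\sum_{h}\sum_{i}\sum_{k} w_{hik}},
\qquad
\bar z_h = \frac{1}{n_h}\sum_{i=1}^{n_h} z_{hi}
\\[6pt]
\widehat{\mathrm{Var}}(\hat p) = \sum_{h=1}^{H} \frac{n_h}{n_h - 1} \sum_{i=1}^{n_h} \left(z_{hi} - \bar z_h\right)^2
```

Only PSU-level totals enter the formula, which is why the choice of PSU variable (persons, households, DB060) and strata variable (none, DB040, DB050) drives the results.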


3. Setup of tests

• Limitations:

- only first stage of sample design

- finite population correction (IE) ignored

- systematic sampling ignored, treated as simple random sampling instead (EE, IT, LV, NL, NO, SE, SI, UK)

- not possible to disentangle various weighting effects

- imputation is not taken into account


3. Setup of tests

• Three weakly correlated indicators:
- At-risk-of-poverty rate (AROP)
- Rate of severe material deprivation (Matdep)
- Rate of very low work intensity (LWI)

• One relative measure (AROP: its poverty threshold is itself estimated from the sample)

• All defined at the household level
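The AROP is a relative measure because of how it is computed: the poverty threshold (60% of the national median equivalised disposable income) is itself estimated from the same sample. The sketch below assumes that person-level equivalised incomes and weights are already available. It uses a simple weighted-median definition that may differ in detail from Eurostat's interpolation rule.

```python
def weighted_median(values, weights):
    """Smallest value at which the cumulative weight reaches half the total."""
    half = sum(weights) / 2
    cum = 0.0
    for v, w in sorted(zip(values, weights)):
        cum += w
        if cum >= half:
            return v
    raise ValueError("empty input")


def arop_rate(incomes, weights, share=0.6):
    """Poverty threshold and weighted share of persons below it."""
    threshold = share * weighted_median(incomes, weights)
    poor = sum(w for x, w in zip(incomes, weights) if x < threshold)
    return threshold, poor / sum(weights)
```

Scaling all incomes by the same factor moves the threshold along with them and leaves the rate unchanged. Because the threshold is random too, its sampling variability should also be reflected in the standard error.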


3. Setup of tests

• Four scenarios:

- “individuals”: Personal ID

- “households”: Household ID

- “UDB”: PSU + region (DB060 + DB040)

- “Eurostat”: PSU + Primary strata (DB060 + DB050)
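The four scenarios differ only in which variables are treated as strata and PSUs. This minimal Python sketch uses a linearized SE of a weighted proportion (PSUs with replacement, no finite population correction) on hypothetical toy data. It contrasts the "individuals" and "households" scenarios.

```python
import math
from collections import defaultdict


def linearized_se(ys, ws, strata, psus):
    """Weighted proportion and its Taylor-linearized SE.

    PSUs are treated as sampled with replacement within strata (ultimate
    cluster approximation, no finite population correction).
    """
    total_w = sum(ws)
    p = sum(w * y for y, w in zip(ys, ws)) / total_w
    # Linearized score of each unit, summed within its PSU and stratum.
    psu_totals = defaultdict(lambda: defaultdict(float))
    for y, w, h, c in zip(ys, ws, strata, psus):
        psu_totals[h][c] += w * (y - p) / total_w
    var = 0.0
    for h, totals in psu_totals.items():
        n_h = len(totals)
        if n_h < 2:
            raise ValueError(f"stratum {h!r} has a single PSU")
        z = list(totals.values())
        z_bar = sum(z) / n_h
        var += n_h / (n_h - 1) * sum((zi - z_bar) ** 2 for zi in z)
    return p, math.sqrt(var)


# Hypothetical toy data: 4 two-person households, poverty status shared
# by both members of each household, equal weights.
person_ids = [1, 2, 3, 4, 5, 6, 7, 8]
hh_ids = [1, 1, 2, 2, 3, 3, 4, 4]
ys = [1, 1, 1, 1, 0, 0, 0, 0]
ws = [1.0] * 8
one_stratum = ["all"] * 8

scenarios = {
    "individuals": (one_stratum, person_ids),  # every person is a PSU
    "households": (one_stratum, hh_ids),       # every household is a PSU
}
for name, (strata, psus) in scenarios.items():
    p, se = linearized_se(ys, ws, strata, psus)
    print(f"{name:<12} p = {p:.3f}  SE = {se:.4f}")
```

With real data, the "UDB" and "Eurostat" scenarios would pass DB040 or DB050, respectively, as `strata` and DB060 as `psus`. In the toy example the household-clustered SE is larger because both members of each household always share the same status.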


4. Results

[Figure: Percentage of the population confronted with severe material deprivation, EU-SILC 2008, by country: LU, IS, SE, NL, NO, DK, ES, FI, MT, UK, FR07, EE, DE, FR08, IE, BE08, BE07, AT, SI, CZ, IT, CY, PT, GR, SK, LT, PL, HU, LV, BG, RO]


4. Results

1. At the very least, use household ID as the PSU variable


4. Results

2. The UDB scenario leads to larger standard errors than the "households" scenario and is a better proxy for the "Eurostat" scenario

[Figure: Estimated standard errors of Matdep under the "households", "UDB" and "Eurostat" scenarios, for ES, NL, FR07, SI, CZ, UK, PL, FR08, BE07, IT, HU, GR, IE, LV, PT, BG, RO]


5. Implementation

• However, it is easy:

- Downloadable do-file:

http://www.ua.ac.be/tim.goedeme

- More details in: Goedemé, T. (2010), The standard error of estimates based on EU-SILC. An exploration through the Europe 2020 poverty indicators, CSB Working Paper Series, WP 10/09, Antwerp: Herman Deleeck Centre for Social Policy, University of Antwerp, 36 p.

(forthcoming in Social Indicators Research)

- http://doiop.com/SvysetEU-SILC2008


5. Implementation

• In Stata:

- svyset psu [pw=RB050], strata(strata)

- svydescribe [if country == "..."]

- svy: proportion / mean / regress / ...

- estat effects

- lincom ...
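`estat effects` reports design effects. As a rough, hypothetical illustration of what a design effect expresses, the sketch below compares a design-based SE of a proportion with the SE that a simple random sample of the same size would give, p(1 - p)/n. It then derives the effective sample size. This is an approximation for intuition, not the exact estimator Stata uses.

```python
import math


def design_effect(p, se_design, n):
    """Approximate DEFF and effective sample size for a proportion.

    Uses p * (1 - p) / n as the simple-random-sampling variance
    (an approximation, not Stata's exact estat effects formula).
    """
    var_srs = p * (1 - p) / n
    deff = se_design ** 2 / var_srs
    return deff, n / deff


# Hypothetical numbers: p = 0.20, design-based SE = 0.02, n = 1000.
deff, n_eff = design_effect(0.20, 0.02, 1000)
print(f"DEFF = {deff:.2f}, effective n = {n_eff:.0f}, SRS SE = {math.sqrt(0.2 * 0.8 / 1000):.4f}")
```

A DEFF well above 1, as in this example, means that ignoring the sample design would understate the standard error considerably.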


5. Implementation

More information:

• CSB working paper

• Heeringa, S. G., West, B. T., & Berglund, P. A. (2010). Applied Survey Data Analysis. Boca Raton: Chapman & Hall/CRC.

• Handbook on variance estimation for EU-SILC (with Guillaume Osier and Yves Berger), as part of the Net-SILC2 project.


6. Conclusion

• When analysing sample data, standard errors and confidence intervals are a prerequisite

• However: documentation and sample design variables are incomplete

• Controlling for household clustering leads to a good proxy in many cases

• It seems best to use as much information as possible in the UDB


Thanks for your attention!


2. Sample design variables

Number of PSUs

Country   UDB      Eurostat   Reported
BE07      243      243        275
BE08      6,300    6,300      275
BG        506      1,415      1,415
DE        13,312   13,312     n/a
CZ        2,362    2,364      2,362
FR07      9,017    9,017      349
FR08      n/a      349        349
HU        4,875    5,245      4,184
LV        912      912        930
PL        468      5,093      5,912
SI        774      1,672      2,799
UK        1,014    1,014      1,065

Sources: EU-SILC UDB 2008 (FR, BE: 2007); EU-SILC 2008 Eurostat database; comparative quality report; national quality reports; NSIs


2. Sample design variables

Number of Primary Strata

Country   UDB   Eurostat   Reported
BE07      3     11         11
BE08      3     11         11
BG        2     56         56
CZ        8     53         53
DE        1     1          n/a
FR07      22    22         86
FR08      22    87         86
HU        3     526        529
LV        1     4          4
PL        6     211        211
SI        1     6          6
UK        1     31         30

Sources: EU-SILC UDB 2008 (FR, BE: 2007); EU-SILC 2008 Eurostat database; comparative quality report; national quality reports; NSIs


5. Implementation

• Reminder:

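- $\mathrm{VAR}(a-b) = \mathrm{VAR}(a) + \mathrm{VAR}(b) - 2\,\mathrm{COV}(a,b)$

- Dependent samples: COV can be positive or negative!

- Independent samples: COV = 0; however, $\sqrt{\mathrm{VAR}(a-b)} = \sqrt{\mathrm{VAR}(a) + \mathrm{VAR}(b)} < \sqrt{\mathrm{VAR}(a)} + \sqrt{\mathrm{VAR}(b)}$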

⇒ Watch out for comparisons within / across countries / across ‘cross-sections’ based on panel data or with fixed PSUs

⇒ Do not simply compare confidence intervals: overlapping intervals do not imply a non-significant difference!
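The reminder above can be checked numerically. This is a minimal Python sketch with hypothetical estimates: it tests a difference using Var(a - b) = Var(a) + Var(b) - 2 Cov(a, b) and compares the result with a naive check for overlapping 95% confidence intervals.

```python
import math


def compare(est_a, se_a, est_b, se_b, cov_ab=0.0, z_crit=1.959964):
    """Two-tailed z-test of a == b using Var(a - b) = Var(a) + Var(b) - 2 Cov(a, b)."""
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2 - 2 * cov_ab)
    z = (est_a - est_b) / se_diff
    return se_diff, z, abs(z) > z_crit


def cis_overlap(est_a, se_a, est_b, se_b, z_crit=1.959964):
    """True if the two 95% confidence intervals overlap."""
    return (est_a - z_crit * se_a <= est_b + z_crit * se_b
            and est_b - z_crit * se_b <= est_a + z_crit * se_a)


# Hypothetical independent estimates (e.g. poverty rates in %): 15 vs 12, both SE 1.
print(cis_overlap(15.0, 1.0, 12.0, 1.0))  # the intervals overlap
print(compare(15.0, 1.0, 12.0, 1.0))      # yet |z| is about 2.12 > 1.96
```

A positive covariance, for example between two cross-sections that share panel members or fixed PSUs, lowers the SE of the difference. Setting `cov_ab=0.3` in the example gives sqrt(1.4) instead of sqrt(2).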


1. The problem

• In principle in EU-SILC:

- sample design: DB050 (strata), DB060 (PSUs)

- weighting: RB050

- imputation: flag variables

- character of indicator: not dependent on data


4. Results

4. But is it worth the effort? Yes, but…

Out of 378 country-by-country differences ... are not significant:

[Figure: Absolute number of non-significant country-by-country comparisons (95% confidence, two-tailed test) for AROP, Matdep and LWI, under the "individuals", "households", "UDB" and "Eurostat" scenarios]
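For a set of countries, each pair is compared once. With 28 countries, that gives 28 × 27 / 2 = 378 pairs. The count of non-significant pairs can be sketched as below, assuming that the samples of different countries are independent (Cov = 0) and that each country is given as a hypothetical `(estimate, standard_error)` tuple.

```python
import itertools
import math


def count_non_significant(estimates, z_crit=1.959964):
    """Count country pairs whose difference is not significant (two-tailed, 95%).

    estimates: dict country -> (estimate, standard_error); samples of
    different countries are treated as independent, so Cov = 0.
    """
    n_ns = 0
    for (_, (a, se_a)), (_, (b, se_b)) in itertools.combinations(estimates.items(), 2):
        se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
        if abs(a - b) / se_diff <= z_crit:
            n_ns += 1
    return n_ns


print(math.comb(28, 2))  # number of pairwise comparisons among 28 countries
```

Larger standard errors (e.g. "UDB" instead of "individuals") make more of these pairwise differences non-significant.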


1. The problem

[Figure: At-risk-of-poverty rate for children, EU-SILC 2008, by country: DK, NO, SI, IS, FI, SE, NL, CZ, CY, AT, DE, FR07, SK, BE, EE, IE, HU, LU, PL, UK, PT, LT, GR, ES, LV, IT, BG, RO; CY and PL marked]
