A selective editing method considering both suspicion and potential impact, developed and applied to the Swedish foreign trade statistics

Topic (ii), WP 12

Anders Jäder and Anders Norberg, Statistics Sweden




The data

Main variables collected monthly:

Commodity code (8-digit CN codes)

Country of dispatch/arrival

Quantity (weight and supplementary unit)

Invoiced Value

350 000 observations per month


Score function

Computed as a weighted geometric mean of measures of Suspicion and Potential impact

$\text{Score} = \text{Suspicion} \cdot \text{Impact}^{P_{Imp}}$
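A minimal Python sketch of the score function. It assumes the slide's formula reads Score = Suspicion · Impact^P_Imp, with the exponent P_Imp acting as the weight on potential impact; the function name is hypothetical.

```python
def score(suspicion: float, impact: float, p_imp: float) -> float:
    """Hypothetical score: Suspicion * Impact ** p_imp.

    A larger p_imp gives potential impact more weight relative to
    suspicion (assumed reading of the slide's formula).
    """
    if suspicion < 0 or impact < 0:
        raise ValueError("suspicion and impact must be non-negative")
    return suspicion * impact ** p_imp


# Same suspicion, larger impact: the second observation ranks higher.
print(score(2.0, 100.0, 0.5))
print(score(2.0, 10_000.0, 0.5))
```

Raising the score to the power 1/(1 + P_Imp) turns it into a weighted geometric mean with weights 1/(1 + P_Imp) on suspicion and P_Imp/(1 + P_Imp) on impact. Because that is a monotone transformation, both forms rank observations identically.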


Selective editing

The 1,500 observations with the highest scores are flagged
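Flagging the 1,500 highest-scoring records is a partial sort. A sketch, assuming the scores are held in a dict keyed by a hypothetical observation id:

```python
import heapq


def flag_top(scores: dict[str, float], n_flag: int = 1500) -> list[str]:
    """Return the ids of the n_flag observations with the highest scores,
    highest first."""
    top = heapq.nlargest(n_flag, scores.items(), key=lambda item: item[1])
    return [obs_id for obs_id, _ in top]


print(flag_top({"a": 1.0, "b": 5.0, "c": 3.0}, n_flag=2))  # ['b', 'c']
```

heapq.nlargest avoids fully sorting the roughly 350 000 monthly observations and returns the same result as sorted(..., reverse=True)[:n_flag].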


Suspicion

The difference between the unit price (euro/kg) and the lower or upper quartile, divided by the interquartile distance, on a logarithmic scale.
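A sketch of the suspicion measure for one observation. It assumes the quartiles come from a hypothetical reference group of unit prices (euro/kg) and that prices between the quartiles get suspicion 0; the quartile method (the default of Python's statistics.quantiles) is also an assumption.

```python
import math
import statistics


def suspicion(unit_price: float, reference_prices: list[float]) -> float:
    """Distance of log(unit_price) outside the interquartile range of the
    log reference prices, measured in interquartile distances.

    Prices between the quartiles return 0.0 (assumption). The log base does
    not matter: it cancels in the ratio.
    """
    logs = [math.log(p) for p in reference_prices]
    q1, _, q3 = statistics.quantiles(logs, n=4)  # default 'exclusive' method
    iqr = q3 - q1
    if iqr <= 0:
        raise ValueError("reference prices have zero interquartile distance")
    x = math.log(unit_price)
    if x < q1:
        return (q1 - x) / iqr
    if x > q3:
        return (x - q3) / iqr
    return 0.0


refs = [math.exp(k) for k in range(1, 10)]  # log prices 1, 2, ..., 9
print(suspicion(math.exp(12.5), refs))      # about 1.0: one IQR above Q3
```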


Potential impact

The difference between the invoiced value and the median unit price multiplied by the quantity (euro)
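A sketch of the potential-impact measure. It assumes "difference" means the absolute difference and that the median is taken over the unit prices of a hypothetical reference group (for example, other observations of the same commodity code); the slide specifies neither.

```python
import statistics


def potential_impact(invoiced_value: float, quantity: float,
                     reference_prices: list[float]) -> float:
    """Absolute difference between the reported invoiced value and the
    value implied by the median unit price of the reference group."""
    expected_value = statistics.median(reference_prices) * quantity
    return abs(invoiced_value - expected_value)


print(potential_impact(1000.0, 10.0, [5.0, 8.0, 20.0]))  # 920.0
```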

$\text{Score} = \text{Suspicion} \cdot \text{Impact}^{P_{Imp}}$

[Charts not reproduced. Annotations: Hit rate = 30%; Hit rate = 46%, Impact = 65%; Impact = 80%, Hit rate = 30%; Hit rate = 34%, Impact = 81% (best).]


Potential impact

The 8-digit commodity codes can be aggregated to 6-, 4- and 2-digit commodity codes (CN6, CN4, CN2) and to other classifications, e.g. SITC.

Over 10,000 estimates have to be computed.


Potential impact

We have developed a formula that expresses the impact of an error on the statistics, at all aggregation levels and for all sizes of estimates, in one single variable.

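[Formula not recoverable from the extracted slide text. Legible terms include $v = \log_{10}(\text{Invoiced value})$, $k^*$, $g_k(v)$, $Q_i$, the parameter $f$, and Impact as a maximum involving Invoiced value, Quantity and a unit price $UP_i$.]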


Potential impact

Excel demonstration


Potential impact

Relative errors that are judged to have equal impact on the published statistics, by value of the domain of study (rows) and classification variable (columns), f = 4:

Value of domain of study  Total Import/Export   SITC 2    SITC 3    CN6       CN8
10                        -                     1741.6    2612.4    8708.1    13062.1
100                       -                     435.4     653.1     2177.0    3265.5
1 000                     -                     108.9     163.3     544.3     816.4
10 000                    -                     27.2      40.8      136.1     204.1
100 000                   -                     6.8       10.2      34.0      51.0
1 000 000                 -                     1.7       2.6       8.5       12.8
10 000 000                -                     0.4       0.6       2.1       3.2
35 000 000                0.1                   0.2       0.3       1.0       1.5
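Read row by row, the equal-impact relative error is multiplied by f = 4 for each tenfold drop in the value of the domain, starting from the 35 000 000 row. The sketch below reproduces the table under that reading. It is our interpretation of the numbers, not a formula stated on the slides, and the names are hypothetical.

```python
import math

# Relative errors in the 35 000 000 row of the table, read off the slide.
BASE_AT_TOTAL = {"Total import/export": 0.1, "SITC 2": 0.2, "SITC 3": 0.3,
                 "CN6": 1.0, "CN8": 1.5}
TOTAL_VALUE = 35_000_000
F = 4


def equal_impact_relative_error(domain_value: float, level: str,
                                f: float = F) -> float:
    """Relative error judged to have the same impact as BASE_AT_TOTAL[level]
    at TOTAL_VALUE: each tenfold decrease in domain value multiplies it by f."""
    decades_below_total = math.log10(TOTAL_VALUE / domain_value)
    return BASE_AT_TOTAL[level] * f ** decades_below_total


for value in (10, 1_000, 1_000_000):
    print(value, round(equal_impact_relative_error(value, "CN8"), 1))
```

For CN8 this gives 13062.1, 816.4 and 12.8 for domains of 10, 1 000 and 1 000 000, as in the table.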

Strategy

SCB (Statistics Sweden) has saved raw and corrected data for all months since 2000. We analyzed them.

New system with parameters

Produce monthly process data for a continuous search for the best parameter values

Will we be misled when we analyze data that were flagged by the old method?


Study

We need many months of historical data – current data is not enough

Homogeneous groups – modest demand on the number of observations

Computation of median and quartiles weighted by Quantity

Suspicion versus probability of error – transformation of Suspicion
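One way to compute quantity-weighted medians and quartiles of unit prices is to sort the prices and take the first price at which the cumulative quantity reaches the chosen share. The slides do not say which weighted-quantile definition was used; this lower-quantile rule and the function name are assumptions.

```python
def weighted_quantile(values: list[float], weights: list[float], q: float) -> float:
    """Smallest value whose cumulative weight reaches share q (0 < q <= 1)."""
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    pairs = sorted(zip(values, weights))
    total = sum(w for _, w in pairs)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    threshold = q * total
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= threshold:
            return value
    return pairs[-1][0]  # guard against floating-point shortfall


prices = [2.0, 3.0, 50.0]      # unit prices, euro/kg
quantities = [1.0, 1.0, 10.0]  # kg per consignment
print(weighted_quantile(prices, quantities, 0.5))  # 50.0 (unweighted median: 3.0)
```

The example shows why the weighting matters: one large consignment dominates the weighted median, while the unweighted median follows the two small ones.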

Suspicion versus probability of error

[Chart: hit rate (0 to 1) plotted against Suspicion (0 to 17).]

Experiences from production

Hit rate by variable:

[Chart: monthly hit rate in percent (0% to 80%), January 2003 to February 2005, for Total, Quantity 1 (net weight), Quantity 2 (supplementary unit) and Invoiced value.]

Experiences from production

Impact by variable:

[Chart: monthly impact (axis 0 to 300 000), January 2003 to February 2005, for Total, Quantity 1 (net weight), Quantity 2 (supplementary unit) and Invoiced value.]

Experiences from production

Impact on variable invoiced value:

[Chart: monthly impact (axis 0 to 1 400 000), January 2003 to February 2005.]


Thank You!