A selective editing method considering both suspicion and potential impact, developed and applied to the Swedish foreign trade statistics
Topic (ii), WP 12
Anders Jäder and Anders Norberg, Statistics Sweden
The data
Main variables collected monthly:
Commodity code (8-digit CN codes)
Country of dispatch/arrival
Quantity (weight and supplementary unit)
Invoiced Value
350 000 observations per month
Score function
Computed as a weighted geometric mean of measures of Suspicion and Potential impact
\[ \mathrm{Score} = \mathrm{Suspicion} \times \mathrm{PImpact}^{\mathrm{Imp}} \]
Selective editing
The 1,500 observations with the highest scores are flagged
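As a rough illustration of the scoring and flagging step, the sketch below computes a weighted geometric mean of two non-negative measures and keeps the highest-scoring records. The function names, the weights `w_suspicion` and `w_impact`, and the record layout are assumptions made for this example; the slides do not give the production parameter values.

```python
import heapq


def score(suspicion, potential_impact, w_suspicion=1.0, w_impact=1.0):
    """Weighted geometric mean of two non-negative measures.

    The weights are hypothetical; the slides do not state the values used.
    """
    if suspicion <= 0 or potential_impact <= 0:
        # A geometric mean is zero as soon as one factor is zero.
        return 0.0
    total_weight = w_suspicion + w_impact
    product = suspicion ** w_suspicion * potential_impact ** w_impact
    return product ** (1.0 / total_weight)


def flag_top(records, n=1500):
    """Return the n records with the highest 'score' value."""
    return heapq.nlargest(n, records, key=lambda r: r["score"])
```

With about 350 000 observations per month, `heapq.nlargest` avoids sorting the full data set just to pick the 1,500 observations to flag.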
Suspicion
The distance from the Unit price to the lower or upper quartile,
divided by the interquartile range, measured on a logarithmic scale
(Euro/kg)
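A minimal sketch of this measure, assuming that the quartiles come from a homogeneous reference group and that an observation inside the quartile range gets suspicion zero (the slide does not say how such observations are treated). The function and parameter names are hypothetical.

```python
import math


def suspicion(unit_price, q1_price, q3_price):
    """Distance from the log unit price to the nearest quartile,
    in units of the interquartile range on the log scale."""
    x = math.log(unit_price)
    q1 = math.log(q1_price)
    q3 = math.log(q3_price)
    iqr = q3 - q1
    if iqr <= 0:
        raise ValueError("upper quartile must exceed lower quartile")
    if x < q1:
        return (q1 - x) / iqr
    if x > q3:
        return (x - q3) / iqr
    return 0.0  # assumption: no suspicion inside the quartile range
```

Because the measure is a ratio of two log differences, the choice of logarithm base does not affect the result.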
Potential Impact
The difference between Invoiced Value and
the median of Unit price multiplied by Quantity
(Euro)
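The definition above can be sketched in a few lines. Taking the absolute value of the difference is an assumption of this example, as are the function and parameter names.

```python
def potential_impact(invoiced_value, quantity, median_unit_price):
    """Absolute difference (in Euro) between the reported invoiced value
    and the value expected from the median unit price."""
    expected_value = median_unit_price * quantity
    return abs(invoiced_value - expected_value)
```

For example, an invoiced value of 1 000 Euro for 20 kg of a commodity with a median unit price of 10 Euro/kg gives a potential impact of 800 Euro.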
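\[ \mathrm{Score} = \mathrm{Suspicion} \times \mathrm{PImpact}^{\mathrm{Imp}} \]
[Figure: plots comparing parameter choices for the score. Panel labels as extracted: Hit rate = 30%; Hit rate = 46%, Impact = 65%; Impact = 80%; Hit rate = 30%; Hit rate = 34%, Impact = 81% (best)]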
Potential impact
The 8-digit commodity codes can be aggregated to 6-, 4- and 2-digit commodity codes (CN6, CN4, CN2) and to other classifications, e.g. the SITC classification.
Over 10,000 estimates to be computed
Potential impact
We have developed a formula with which the impact of an error on the
statistics on all aggregation levels and sizes of estimates can be expressed in
one single variable.
[Formula not recoverable from the extracted text. Legible elements: Impact_i, a maximum, Invoiced value_i, Quantity_i, UP (unit price), log10 of the domain value v_gk, the parameter f, and a constant O.]
Potential impact
Excel demonstration
Potential impact
Relative errors that are judged to have equal impact on the published statistics
Rows: value of domain of study. Columns: classification variable.

Value of domain   Total            SITC 2    SITC 3       CN6        CN8
of study          Import/Export
            10    -                1741,6    2612,4    8708,1    13062,1
           100    -                 435,4     653,1    2177,0     3265,5
         1 000    -                 108,9     163,3     544,3      816,4
        10 000    -                  27,2      40,8     136,1      204,1
       100 000    -                   6,8      10,2      34,0       51,0
     1 000 000    -                   1,7       2,6       8,5       12,8
    10 000 000    -                   0,4       0,6       2,1        3,2
    35 000 000    0,1                 0,2       0,3       1,0        1,5

f = 4
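The table values follow a regular pattern: each tenfold decrease in the value of the domain of study multiplies the tolerated relative error by f = 4, and each classification level has a fixed multiplier that can be read off the bottom row. The sketch below reproduces the table from that pattern. It is inferred from the printed numbers only and is not the authors' stated formula; the names `LEVEL_WEIGHT`, `TOTAL_VALUE` and `equal_impact_relative_error` are hypothetical.

```python
import math

F = 4.0                   # parameter f from the slide
TOTAL_VALUE = 35_000_000  # largest domain value in the table

# Multipliers read off the bottom row of the table (assumption).
LEVEL_WEIGHT = {
    "Total": 0.1,
    "SITC 2": 0.2,
    "SITC 3": 0.3,
    "CN6": 1.0,
    "CN8": 1.5,
}


def equal_impact_relative_error(domain_value, level, f=F):
    """Relative error judged to have the same impact as the bottom-row
    entry, for a domain of the given value and classification level."""
    decades_below_total = math.log10(TOTAL_VALUE / domain_value)
    return LEVEL_WEIGHT[level] * f ** decades_below_total
```

For instance, `equal_impact_relative_error(10, "CN6")` evaluates to about 8708.1 and `equal_impact_relative_error(100, "SITC 2")` to about 435.4, in line with the table.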
Strategy
SCB (Statistics Sweden) has saved raw and corrected data for all months since 2000. We analyzed these data.
New system with parameters
Produce monthly process data for a continuous search for the best parameter values
Will we be misled when we analyze data that were flagged by the old method?
Study
We need many months of historical data – current data is not enough
Homogeneous groups – modest demand on number of observations
Computation of median and quartiles weighted by Quantity
Suspicion versus probability of error – transformation of Suspicion
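One way to compute a median and quartiles weighted by Quantity is sketched below. It uses the smallest value at which the cumulative weight reaches the requested share of the total weight, which is only one of several common definitions of a weighted quantile; the slides do not say which definition was used. The function name is hypothetical.

```python
def weighted_quantile(values, weights, p):
    """Smallest value whose cumulative weight reaches p * total weight.

    values  -- e.g. unit prices within one homogeneous group
    weights -- e.g. the corresponding quantities
    p       -- 0.25, 0.5 or 0.75 for quartiles and median
    """
    if not values or len(values) != len(weights):
        raise ValueError("values and weights must be non-empty and equal in length")
    pairs = sorted(zip(values, weights))
    total = sum(w for _, w in pairs)
    if total <= 0:
        raise ValueError("total weight must be positive")
    threshold = p * total
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= threshold:
            return value
    return pairs[-1][0]  # guard against floating-point rounding
```

Weighting by Quantity means that a few large consignments can set the reference unit price, so many small consignments with unusual prices do not dominate it.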
Suspicion versus probability of error
[Figure: hit rate (vertical axis, 0 to 1) plotted against Suspicion (horizontal axis, 0 to 17)]
Experiences from production
Hit rate by variable:
[Figure: monthly hit rate in percent (axis 0% to 80%), January 2003 to February 2005, for Total, Quantity 1 (Net weight), Quantity 2 (Sup. unit) and Invoiced value]
Experiences from production
Impact by variable:
[Figure: monthly impact (axis 0 to 300 000), January 2003 to February 2005, for Total, Quantity 1 (Net weight), Quantity 2 (Sup. unit) and Invoiced value]
Experiences from production
Impact on variable invoiced value:
[Figure: monthly impact on invoiced value (axis 0 to 1 400 000), January 2003 to February 2005]
Thank You!