Methodology of Allocating Generic Field to its Details
Jessica Andrews
Nathalie Hamel
François Brisebois
ICESIII - June 19, 2007
Outline
Background Information on Tax Data
Objective
Current Methodology
Other Methodologies Considered
Comparison of the Methodologies
Future Work and Conclusions
Tax Data
Statistics Canada receives annual data from Canada Revenue Agency (CRA) on incorporated (T2) businesses
Tax data:
Balance Sheet
Income Statement
88 different Schedules
Tax Data
About 700 different fields to report
Most companies provide only 30-40 fields
Only 8 fields are actually required by CRA (section totals):
Non-farm revenue
Non-farm expenses
Farm revenue
Farm expenses
Assets
Liabilities
Shareholder Equity
Net Income/Loss
Objective
To impute the missing detail variables
Why?
Tax data users need detailed data (tax replacement project (TRP))
Different concepts and definitions between tax and survey data
A subset of details linked to the same generic can be mapped to different survey variables (Chart of Accounts)
Challenges to meet
Methodology must
Work well for a large number of details
Be capable of dealing with details which are rarely reported and those which are frequently reported
Give good micro results for tax replacement, but also give good macro results when examined at the NAICS or full database level
First attempt to complete Tax Data
Edit rules
Outlier detection within a record
Deterministic edits (to ensure the record balances within section)
Review and manual corrections
Overlap between fiscal periods
Negative values
Consistency edits between tax variables
Outlier detection between records (Hidiroglou-Berthelot)
CORTAX balancing edits
Deterministic imputation of key variables
Inventories
Depreciation
Salaries and wages
GDA Concepts
Corporations can use either generic or detail fields to report their results
                                                       Case 1   Case 2   Case 3
Generic
8810 Office expenses amount                               100                30
Details
8811 Office stationery and supply expense amount                    20
8812 Office utilities expense amount                                30       10
8813 Data processing expense amount                                 50       60
Total                                                     100      100      100
GDA Concepts
Block is defined by a generic and its details
Generic field is not a total
Goal is to impute the most significant detail variables when a generic amount has been reported
GDA: Generic to detail allocation
Current method
Uses imputation classes based on industry codes and size of company
First 2 digits of NAICS (about 25 industries)
Three sizes of revenue (boundaries of 5 and 25 million)
Calculates ratios within imputation classes for each block
Uses all non-zero and non-missing details
Uses only details reported at least 10% of the time (5% for block General Farm Expense)
Assigns ratios to businesses with a generic
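The steps of the current method can be sketched in Python as follows. This is a minimal illustration, not the production implementation: the record layout (`naics`, `revenue`, `details`, `generic`), the function names, the treatment of revenue exactly at 5 or 25 million, the use of a ratio of sums, and the denominator of the 10% reporting rate (all donors in the class) are assumptions made for the example.

```python
from collections import defaultdict

def size_class(revenue_millions):
    # Three revenue sizes with boundaries at 5 and 25 million.
    # Which side the boundary values fall on is an assumption.
    if revenue_millions < 5:
        return "small"
    if revenue_millions < 25:
        return "medium"
    return "large"

def imputation_class(business):
    # First 2 digits of NAICS crossed with the revenue size class.
    return (business["naics"][:2], size_class(business["revenue"]))

def class_ratios(donors, min_report_rate=0.10):
    """Share of each detail within a block, per imputation class."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(lambda: defaultdict(int))
    n = defaultdict(int)
    for b in donors:
        c = imputation_class(b)
        n[c] += 1
        for detail, value in b["details"].items():
            if value:  # keep only non-zero and non-missing details
                sums[c][detail] += value
                counts[c][detail] += 1
    ratios = {}
    for c in sums:
        # Drop details reported less often than the threshold.
        kept = {d: s for d, s in sums[c].items()
                if counts[c][d] / n[c] >= min_report_rate}
        total = sum(kept.values())
        ratios[c] = {d: s / total for d, s in kept.items()} if total else {}
    return ratios

def allocate(business, ratios):
    # Spread the generic amount over the details using the class ratios.
    r = ratios.get(imputation_class(business), {})
    return {d: business["generic"] * share for d, share in r.items()}
```

A business whose imputation class has no donors receives an empty allocation here; how such cases are handled in practice is not covered by the slides.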
Current method
Originally proposed as a solution with good macro (aggregate) results
Now need good micro (business) level results for TRP
Problems
Imputation classes are frequently not homogeneous in terms of distribution
A large number of small imputation classes
Other methods considered
Historic imputation method
Scores method
Cluster method
Historic imputation method
Assumes distributions of details are the same from one year to the next
Problems
A change in business strategies/properties will not be considered this way
Most businesses which report details in the previous year will report them also in the current year, leaving few businesses which could be imputed with this method (~5% on all blocks tested)
Requires use of another method for remaining businesses
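A minimal Python sketch of the historic idea: a business's own detail distribution from the previous year is applied to the generic amount reported in the current year. The function name and the dictionary layout are hypothetical, and returning `None` to signal that another method is needed is an illustrative choice.

```python
def historic_allocation(generic_amount, previous_details):
    """Allocate this year's generic using last year's detail distribution.

    previous_details maps detail field -> amount reported last year
    (hypothetical layout). Returns None when there is no usable history.
    """
    total = sum(v for v in previous_details.values() if v)
    if not total:
        return None  # fall back to another method for this business
    return {d: generic_amount * v / total
            for d, v in previous_details.items() if v}
```

For example, a business that reported 20, 30 and 50 in three details last year and only a generic of 200 this year would be allocated amounts in the same 20:30:50 proportions.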
Scores method
Uses response/non response models for each detail
Groups businesses into imputation classes on the basis of percentiles of response probability
Calculates ratios within imputation classes
Assigns ratios to businesses with a generic
Scores method
Problems
Need to create a model for each detail
Difficult to resolve what to do in the case of blocks with many details (5 or more) which are frequently reported
This method was excluded due to its difficulty in coping with blocks with a moderate to large number of details
Cluster method
Divides businesses into imputation classes on the basis of response patterns to details
Uses clustering or dominant detail method
Uses discriminatory models (parametric or not) to assign businesses with a generic to imputation classes
Calculates ratios within imputation classes
Assigns ratios to businesses with a generic
Cluster method
Problems
For certain blocks it can be difficult to find good variables on which to discriminate
Issue of how often clustering method and models should be reviewed
Comparing the methods
Estimate distributions of known data for year n from ratios calculated for year n-1
Create a benchmark file
Reported details in years n-1 and n
Put all details into generic fields in year n
Calculate ratios from businesses in year n-1 for all methods
Assign ratios to businesses in year n
Compare the results to the reported fields
Comparing the methods
Compare the results at the micro (businesses) and the macro (aggregate) levels
Compare true and estimated distributions
Comparing the methods
Macro statistics
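For the jth detail in the block, where $t_j$ is the reported total and $\hat{t}_j$ its estimate:
$$SSE = \sum_j (\hat{t}_j - t_j)^2$$
$$SSEP = \sum_j \left(\frac{\hat{t}_j}{t_j} - 1\right)^2$$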
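The two macro statistics are the sum of squared errors of the estimated detail totals and the sum of squared relative errors. A small Python sketch follows; the function names and dictionary inputs are hypothetical, and skipping details whose true total is zero in SSEP is an assumption made to avoid division by zero.

```python
def sse(true_totals, est_totals):
    """Sum over details j of (estimated total - true total) squared."""
    return sum((est_totals[j] - true_totals[j]) ** 2 for j in true_totals)

def ssep(true_totals, est_totals):
    """Sum over details j of (estimated / true - 1) squared.

    Details with a true total of zero are skipped (an assumption).
    """
    return sum((est_totals[j] / true_totals[j] - 1) ** 2
               for j in true_totals if true_totals[j] != 0)
```

SSE is dominated by the largest details, while SSEP weights every detail equally regardless of size, which is why the two can rank methods differently.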
Comparing the methods
Micro Statistics
Median Pseudo CV
For the jth detail and ith business in the block:
$$\text{Pseudo CV}_i = \frac{\sqrt{\sum_j (\hat{x}_{ij} - x_{ij})^2}}{\sum_j x_{ij}}$$
Comparing the methods
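Micro Statistics
Median Pearson Contingency Coefficient
For the jth detail and ith business in the block:
$$P = \left(\frac{d^2}{n + d^2}\right)^{1/2}$$
$$d^2 = n \sum_i \sum_j \frac{(f_{ij} - f_{i.} f_{.j})^2}{f_{i.} f_{.j}} = \sum_i \sum_j \frac{\left(n_{ij} - \frac{n_{i.} n_{.j}}{n}\right)^2}{\frac{n_{i.} n_{.j}}{n}}$$
The f values represent the marginal distributions
$d^2$ represents the degree of dependency (depends on n, r and c)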
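The Pearson contingency coefficient is P = (d² / (n + d²))^(1/2), where d² is the usual chi-square statistic of an r × c table with total n. The Python sketch below computes it for any table of counts or amounts given as a list of rows. How the table is built for each business (true versus estimated distribution) is not spelled out in the slides, so the input layout and the function name are assumptions.

```python
import math

def pearson_contingency(table):
    """Pearson contingency coefficient of an r x c table (list of rows)."""
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    d2 = 0.0
    for i, row in enumerate(table):
        for j, n_ij in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n  # n_i. * n_.j / n
            if expected > 0:  # empty rows or columns contribute nothing
                d2 += (n_ij - expected) ** 2 / expected
    return math.sqrt(d2 / (n + d2))
```

The coefficient is 0 when rows and columns are independent and grows toward an upper bound below 1 that depends on r and c.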
Comparing the methods
We show results for Block 8230: Other Revenue
This block has 20 details covering revenue distribution
Important for clients as used in many surveys
The scores method is not shown as it is difficult to implement with this many details
Comparing the methods
OTHER REVENUE FLDS 8230 TO 8250
8230 Other revenue
8231 Foreign exchange gains/losses
8232 Income/loss of subsidiaries/affiliates
8233 Income/loss of other divisions
8234 Income/loss of joint ventures
…
8248 Insurance recoveries
8249 Expense recoveries
8250 Bad debt recoveries
Results
Block 8230            Micro Statistics                                               Macro Statistics
                      Median Pseudo CV   IQR    Median Pearson Cont. Coeff.   IQR    SSE      SSEP
Current Method        1.08               0.43   0.66                          0.14   2.2e20   120
Cluster Method        0.34               1.39   0.36                          0.63   2.8e20   12
Historic + Cluster    0.51               0.99   0.10                          0.7    9.9e19   4.5
Cluster methodology
Most blocks use dominant detail (attractor) x clusters to define the imputation classes
A business i belongs to cluster j of attractor x, where x>50, if
$$\frac{Y_{ij}}{\sum_k Y_{ik}} \geq \frac{x}{100}$$
where $Y_{ij}$ is the total value reported by business i in detail j. If this statement is not true for any detail then the business is assigned to cluster j+1.
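The dominant detail rule can be sketched in Python as below. A business goes to the cluster of the detail that holds at least x% of its reported block total, and to one extra cluster when no detail dominates. The function name, the default value of x, the use of "at least" rather than "more than", and the numbering of the extra cluster as the number of details plus one are assumptions for illustration.

```python
def assign_cluster(details, x=75):
    """Return the 1-based cluster index for one business.

    details: list of amounts Y_ij reported in each detail of the block.
    x: attractor threshold in percent (x > 50, so at most one detail
       can qualify); the default of 75 is illustrative only.
    """
    total = sum(details)
    if total > 0:
        for j, y in enumerate(details, start=1):
            if y / total >= x / 100:
                return j  # detail j dominates the block
    # No dominant detail: catch-all cluster after the detail clusters.
    return len(details) + 1
```

Because x exceeds 50, the loop can return for at most one detail, so the order in which details are scanned does not affect the result.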
Cluster methodology
Distribution ratios to details are calculated for each cluster
Discriminatory models (nonparametric for most blocks) are then created to assign businesses with a generic to clusters
Models use variables on industry (NAICS), location (province), size (revenue, log revenue), details and totals of details in other blocks
Cluster methodology
Generic amounts are assigned to details in the following 3 ways
If generic amount and no details reported then ratios are assigned as calculated
If generic amount and all details with ratio greater than 0% are reported then ratios are assigned as calculated
If generic amount and some details but not all are reported, then ratios are pro-rated and generic is assigned only to details which were not reported
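The three assignment rules above can be sketched in Python as follows. The function name and the dictionary inputs are hypothetical, a missing or zero amount is treated as "not reported", and adding the allocated share to an amount already reported is an assumption about how the result is stored.

```python
def allocate_generic(generic, reported, ratios):
    """Distribute a generic amount over the details of its block.

    reported: detail -> amount already reported (missing or 0 = not reported)
    ratios:   detail -> cluster ratio for that detail (shares summing to 1)
    """
    positive = {d: r for d, r in ratios.items() if r > 0}
    is_reported = {d: bool(reported.get(d)) for d in positive}
    if all(is_reported.values()) or not any(is_reported.values()):
        # Rules 1 and 2: no details, or all details with a positive
        # ratio, are reported: use the ratios as calculated.
        target = positive
    else:
        # Rule 3: only unreported details receive a share, and their
        # ratios are pro-rated so that they sum to 1.
        target = {d: r for d, r in positive.items() if not is_reported[d]}
    scale = sum(target.values())
    result = dict(reported)
    for d, r in target.items():
        result[d] = (reported.get(d) or 0) + generic * r / scale
    return result
```

With ratios of 0.2, 0.3 and 0.5, a generic of 30 and the last two details already reported, the whole generic goes to the first detail, so the block total is preserved.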
Cluster methodology
Gives better micro results
Improved data for tax replacement
Macro results remain similar to current methodology
Micro results are consistent year to year
Future work and conclusions
The cluster methodology will be implemented for reference year 2006 for the Income Statement
Model fitting and implementation for Balance Sheet will follow
Review of models and clustering methods as deemed appropriate
For more information please contact
Visit our web site at www.statcan.ca
Contact Information