Some ACS Data Issues and Statistical Significance (MOEs)
Table Release Rules
Statistical Filtering & Collapsing
Disclosure Review Board
Statistical Significance Testing & Margins of Error (MOEs)
Table Release Rules
February 28, 2007
“B” and “C” Tables
Full Table – PASSED FILTERING
Statistically too Small
Collapsed Table
The Census Bureau Story
Why did we collect all this data if we were not going to
release it?
ACS Data Release Rules
Doug Hillmer
Data Products Area
American Community Survey Office
U.S. Census Bureau
October 11, 2006
Limitation of Disclosure Risk
– The Census Bureau’s Disclosure Review Board (DRB) must clear all data products prior to their release to the public.
Assurance of Statistical Reliability
– Data users need to be able to use ACS estimates as official Census Bureau data. Thus, some rules must be in place to ensure minimum reliability of estimates.
– Statistical reliability is assured by:
• Population size thresholds below which estimates are not released
• Data release testing and collapsing of tables that fail
The Census Bureau Will Not Release All Available Estimates to the Public
The ACS “Identity Crisis” on Reliability
• Ultimately, the 5-year estimates, with no “data release rules”, act as a long-form replacement
• Single-year ACS sample is more like a current demographic survey – although much larger in size
• Question to answer for single-year estimates: Do we accept less detail in our measures of characteristics or do we allow more detail but with data release rules in place? Less detail punishes those areas with the diversity to support the detail.
Choices for displaying estimates in ACS data products
No suppression
1. Publish full detail with no suppression but higher pop threshold (e.g., 500,000)
2. Publish limited set of estimates for all areas with 65,000+ pop
3. Publish more detailed estimates for higher pop threshold and limited set for lower threshold
With suppression or warnings
4. Define a very detailed set of estimates for all geo areas with 65,000+ pop and suppress estimates that fail reliability test
5. Define a very detailed set of estimates for all geo areas with 65,000+ pop and flag estimates that fail reliability test
Filtering <<Data Release Rules>>
• Goal: to identify “weak” tables
• Some tables have many zero or “near zero” cells and relatively large standard errors
• Filtering <<Data Release>> rule used during 2000-2004 ACS: drop tables if…
– Universe is less than 500 (weighted)
– Average cell size is less than 2 cases (unweighted)
• Filtering <<Data Release>> rule used now:
– Accept if median coefficient of variation is less than or equal to 61%
– Otherwise, collapse and review again
Why not just use cell suppression as is done for the Economic products?
Advantages
• Gets rid of the “bad” estimates
• Keeps the “good” estimates (depends on complementary suppression)
Disadvantages
• Creates “holes” in distributions
• Makes new problems for combined estimates (e.g., in derived products, such as data profiles)
• Produces a new set of problems for year-to-year comparisons
Data Release Testing – Step by Step
• Compute coefficients of variation
– Coefficient of variation = standard error / estimate
– Standard error = (upper bound – estimate) / 1.65
– If the estimate = 0, set coefficient of variation = 100%
• Ignore total and sub-total lines in base table
• Sort coefficients of variation in descending order
• Find the middle value (the median)
• If the median is greater than 61% the table FAILS (median > 61% means more than half of the cells have a lower bound of 0; i.e., these cells are not statistically different from 0)
• If the median is 61% or less the table PASSES
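The step-by-step test above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the sample cells are hypothetical, totals and sub-totals are assumed to have been removed already, and `statistics.median` averages the two middle values when the cell count is even, which the slide does not specify.

```python
from statistics import median

Z_90 = 1.65  # multiplier the slide uses to back out the standard error


def cell_cv(estimate, upper_bound):
    """Coefficient of variation for one cell, as a fraction (1.0 = 100%)."""
    if estimate == 0:
        return 1.0  # slide rule: a zero estimate gets a CV of 100%
    standard_error = (upper_bound - estimate) / Z_90
    return standard_error / estimate


def table_passes(cells, threshold=0.61):
    """cells: (estimate, upper_bound) pairs with total/sub-total lines excluded."""
    cvs = [cell_cv(estimate, upper) for estimate, upper in cells]
    return median(cvs) <= threshold


# Hypothetical cells, invented for illustration only.
strong_table = [(1200, 1400), (300, 520), (0, 150), (45, 130), (800, 990)]
weak_table = [(0, 10), (5, 30), (100, 120)]

print(table_passes(strong_table))  # median CV is below 61%
print(table_passes(weak_table))    # median CV is 100%
```

The 61% cutoff is roughly 1 / 1.645: once the standard error exceeds about 61% of the estimate, the 90% lower bound falls to zero or below.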
Collapsing
• Goal: release a simplified version of a base table for a geographic area that otherwise would get nothing
• Decisions on design of collapsed tables are made by subject-matter experts at the Census Bureau
• For operational reasons, only one collapsed version of each base table will be available regardless of geographic area
How the Data Release Rules will Work with Collapsed Versions of Base Tables
More About Collapsing
• Collapsed Tables are designed to assure that derived products (profiles, ranking tables, subject tables,…) can still be sourced from the base tables
• 2005 Tables: if a table passes filtering and a collapsed version exists, publish both the original version and the collapsed version for that geographic area
Problems to fix in the current implementation of the data
release rules
• Collapsed versions missing in some cases
• Collapsed versions that aren’t working
• Poor choices in “sourcing” for derived products (e.g., profiles)
Statistical Significance Testing
Why should I do it?
When should I do it?
How do I do it?
Testing is Important
Statements you might want to make
• Estimate X is bigger than Y
• Estimate X this year is larger than X last year
• Estimate X is smaller than Census 2000 value
• State Z has the highest value
How do I do a significance test?
1. Get the Margin of Error (MOE) from ACS
2. Calculate the Standard Error (SE) [SE = MOE / 1.645]
3. Solve for Z where A and B are the two estimates

$$Z = \frac{A - B}{\sqrt{(SE(A))^2 + (SE(B))^2}}$$

4. If Z < -1.645 or Z > 1.645, the difference is significant at 90% confidence
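The four steps of the significance test can be sketched as follows. The function names and the two sample estimates are hypothetical, chosen only to show the arithmetic; the test assumes the two estimates are independent, which is what the Z formula on the slide implies.

```python
import math

Z_CRIT_90 = 1.645  # critical value for 90% confidence


def moe_to_se(moe):
    """Step 2: convert a published 90% margin of error to a standard error."""
    return moe / Z_CRIT_90


def z_score(est_a, moe_a, est_b, moe_b):
    """Step 3: Z = (A - B) / sqrt(SE(A)^2 + SE(B)^2)."""
    se_a = moe_to_se(moe_a)
    se_b = moe_to_se(moe_b)
    return (est_a - est_b) / math.sqrt(se_a ** 2 + se_b ** 2)


def is_significant(est_a, moe_a, est_b, moe_b):
    """Step 4: the difference is significant if Z falls outside +/-1.645."""
    return abs(z_score(est_a, moe_a, est_b, moe_b)) > Z_CRIT_90


# Hypothetical estimates and MOEs, invented for illustration only.
z = z_score(52_000, 1_500, 49_500, 1_800)
print(round(z, 3), is_significant(52_000, 1_500, 49_500, 1_800))
```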
Obtaining Standard Errors is the Key
• Sum or Difference of Estimates
• Proportions and Percents
• Means and Other Ratios
Simple Formulas

$$SE(A \pm B) = \sqrt{(SE(A))^2 + (SE(B))^2}$$

$$SE(P) = \frac{1}{B}\sqrt{(SE(A))^2 - P^2\,(SE(B))^2}$$

Where…

$$P = \frac{A}{B}$$
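A minimal sketch of these formulas in Python follows. Function names and inputs are hypothetical. Two points are assumptions drawn from published Census Bureau guidance rather than from the slide itself: the proportion formula subtracts the P² term, and when that makes the value under the square root negative, the ratio form (with a plus sign) is used instead.

```python
import math


def se_sum_or_difference(se_a, se_b):
    """SE(A + B) or SE(A - B) for two estimates."""
    return math.sqrt(se_a ** 2 + se_b ** 2)


def se_ratio(a, b, se_a, se_b):
    """SE of a ratio A/B where A is not a subset of B (assumed plus-sign form)."""
    r = a / b
    return math.sqrt(se_a ** 2 + r ** 2 * se_b ** 2) / b


def se_proportion(a, b, se_a, se_b):
    """SE of P = A/B where A is a subset of B."""
    p = a / b
    radicand = se_a ** 2 - p ** 2 * se_b ** 2
    if radicand < 0:
        # Assumed fallback: use the ratio form when the radicand is negative.
        return se_ratio(a, b, se_a, se_b)
    return math.sqrt(radicand) / b


# Hypothetical inputs, invented for illustration only.
print(se_sum_or_difference(3, 4))
print(se_proportion(50, 200, 5, 8))
```

To express the result as a percent, multiply the proportion and its standard error by 100.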
There is HELP off in the wings
But what if I am using 2000 non-ACS Data?
Where are my MOEs?
Let's get to work on the Standard Error
$$SE(Y) = \sqrt{5Y\left(1 - \frac{Y}{N}\right)}$$

N = Size of publication area (population)
Y = Estimate of characteristic
× Survey Design Factor
Survey Design Factor
www.census.gov/prod/cen2000/doc/tablec-xx.pdf (xx=fl)
Mode to Work 1.4 1.2 0.9 0.7
$$SE(Y) = \sqrt{5Y\left(1 - \frac{Y}{N}\right)}$$

N = Size of publication area (population = 362,563)
Y = Estimate of characteristic (= 126,540)
5Y = 5 * 126,540 = 632,700
1 - (Y/N) = 1 - (126,540 / 362,563) = 1 - 0.3490152 = 0.6509848
SE = 641.7772

$$\text{Adjusted } SE(Y) = \sqrt{5Y\left(1 - \frac{Y}{N}\right)} \times \text{Survey Design Factor}$$

SE = 641.777
126,540 / 362,563 = 35%
Survey Design Factor = 0.7
Final Adjusted SE = 450
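The worked example can be checked with a short Python sketch. The function names are hypothetical; the inputs (Y = 126,540, N = 362,563, design factor 0.7) are the ones on the slide. The product comes to about 449.2, which the slide reports as 450.

```python
import math


def census2000_unadjusted_se(y, n):
    """Unadjusted standard error: sqrt(5 * Y * (1 - Y/N))."""
    return math.sqrt(5 * y * (1 - y / n))


def census2000_adjusted_se(y, n, design_factor):
    """Unadjusted SE multiplied by the survey design factor from the state table."""
    return census2000_unadjusted_se(y, n) * design_factor


se = census2000_unadjusted_se(126_540, 362_563)
adjusted = census2000_adjusted_se(126_540, 362_563, 0.7)
print(round(se, 4))        # about 641.78
print(round(adjusted, 1))  # about 449.2
```

The design factor itself is looked up by characteristic and by the estimate's share of the population (35% here), so it is passed in rather than computed.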
Tempting
Green is OK
This is NOT
Want to do an exercise on your own?