C P E SC P E S
NIMH Collaborative Psychiatric Epidemiology Surveys
CPES Sample Designs
and Analysis Weights
Steve Heeringa
C P E SC P E S
NIMH Collaborative Psychiatric Epidemiology Surveys 204/21/23
Sample Design and WeightingDocumentation
• www.icpsr.umich.edu/CPES • Table of contents
– Sample design– Weighting
• Detailed description of sample design and weighting procedures.
• Also, Heeringa et al. (2004). “Sample Designs and Sampling Methods for the Collaborative Psychiatric Epidemiology Studies (CPES).” International Journal of Methods in Psychiatric Research, 13(4), 221- 239.
C P E SC P E S
NIMH Collaborative Psychiatric Epidemiology Surveys 304/21/23
CPES Estimation and Inference
• CPES is based on the integration of three nationally representative multi-stage area probability samples.
• Estimation will require weighted analysis to compensate for disproportionate sampling.
• Inference (design-based) will require the use of variance estimation procedures for complex sample designs.
C P E SC P E S
NIMH Collaborative Psychiatric Epidemiology Surveys 404/21/23
Complex Sample Design
• Probability sample design:– Each population element has a known, non-
zero selection probability
– Properly weighted, sample estimates are unbiased or nearly unbiased for the corresponding population statistic.
– Variance of sample statistics can be estimated from the sample data (measurability)
C P E SC P E S
NIMH Collaborative Psychiatric Epidemiology Surveys 504/21/23
Complex Sample Design
• “Complex sample” :– A probability sample developed using
sampling procedures such as stratification, clustering and weighting designed to improve statistical efficiency, reduce costs or improve precision for subgroup analyses relative to simple random sampling (SRS)
– Unbiased estimates with measurable sampling error are still possible
– Independence of observations, equal probabilities of selection may no longer hold
C P E SC P E S
NIMH Collaborative Psychiatric Epidemiology Surveys 604/21/23
Where are Complex Sample Designs Used?
• Complex sample designs are the rule and not the exception in sample-based studies in the Social Sciences, Epidemiology, Public Health, Agriculture, Natural Resources and many other scientific fields.
• Multi-stage area probability sample designs for household surveys: NCS-R, NSAL, NLAAS, NHIS, NHANES, CPS, etc.
C P E SC P E S
NIMH Collaborative Psychiatric Epidemiology Surveys 704/21/23
Multi-Stage Sample of U.S. Households
C P E SC P E S
NIMH Collaborative Psychiatric Epidemiology Surveys 804/21/23
Consortium in Psychiatric Epidemiology Studies (CPES)
• Sampled persons are interviewed about mental health and mental health-related matters.
• Separate estimates (oversampling) required for domains defined by:– Age-sex groups, African Americans,
Latinos, Asians and All Others.
C P E SC P E S
NIMH Collaborative Psychiatric Epidemiology Surveys 904/21/23
CPES SAMPLE DESIGN(Primary stage)
• The PSUs are mostly single counties or metropolitan statistical areas (MSAs).
• Stratify PSUs by region and MSA status.• Select PSUs by probability proportionate
to estimated size (PPES) sampling (a few selected with certainty).
C P E SC P E S
NIMH Collaborative Psychiatric Epidemiology Surveys 1004/21/23
CPES SAMPLE DESIGN(Second Stage)
• The SSUs are area segments. Area segments are Census blocks or combinations of Census blocks with a minimum number of households, e.g. 50 for NCS-R.
• Area segments are selected with PPES-typically 6-20 area segments per PSU
C P E SC P E S
NIMH Collaborative Psychiatric Epidemiology Surveys 1104/21/23
CPES SAMPLE DESIGN(Third Stage)
• Before selection can occur, field staff list the housing units in the selected area segments.
• For each segment, a subsample of housing units is selected from the prepared list.
C P E SC P E S
NIMH Collaborative Psychiatric Epidemiology Surveys 1204/21/23
CPES SAMPLE DESIGN(Fourth Stage)
Conduct a screening interview to classify persons by domain.
• Subsample persons to give required sampling rates by domain.– Survey weights will reflect this subsampling
step
• Conduct interviews with sample persons (SPs).
C P E SC P E S
NIMH Collaborative Psychiatric Epidemiology Surveys 1304/21/23
Basic Weighting Approach
• Suppose sample element i was selected with probability πi. Then sample element i represents (1 / πi ) elements in the population.
• That is, count the element i in the analysis by giving it a weight of
wi = (1 / πi ).
• For example, a sample element selected with probability 1/10 represents 10 elements in the population.
C P E SC P E S
NIMH Collaborative Psychiatric Epidemiology Surveys 1404/21/23
Weighting
Weighting is used to compensate for• Unequal probabilities of selection• Nonresponse (typically, a unit that fails to
respond)• In poststratification to adjust weighted
sample distributions for certain variables (e.g., age and sex) to make them conform to the known population distribution.
C P E SC P E S
NIMH Collaborative Psychiatric Epidemiology Surveys 1504/21/23
Overall Weight
CPES weights incorporate simultaneously all three components, unequal probabilities of selection, nonresponse, and poststratification:
sel nr pstratW W W W
C P E SC P E S
NIMH Collaborative Psychiatric Epidemiology Surveys 1604/21/23
Weighted Estimation
• In probability samples in which the sample elements have been selected with different probabilities and each case has been assigned a weight, Wi=1/ πi, the unbiased estimator of the population mean is:
1
1
n
i ii
weighted n
ii
W yy
W
C P E SC P E S
NIMH Collaborative Psychiatric Epidemiology Surveys 1704/21/23
Weighted Estimation
• The simple unweighted estimator of the sample mean is a special case of the weighted estimator of the population mean with Wi=1 for all cases:
1 1 1
1 1
1
1
n n n
i i i ii i i
weighted n n
ii i
W y y yy y
nW
C P E SC P E S
NIMH Collaborative Psychiatric Epidemiology Surveys 1804/21/23
Weighted Estimates of Population Statistics
2
2
2 1
1
xy
1,
1
Weighted estimate of the Variance of a Random Variable: S
1
Weighted Estimate of the Covariance of Variables y and x: S
( )
1
weighted
n
i ii
n
ii
n
i i ii
xy weighted n
ii
W y ys
W
W y y x xs
W
C P E SC P E S
NIMH Collaborative Psychiatric Epidemiology Surveys 1904/21/23
Weighted Estimation
1
2
1
Weighted Estimate of the
Simple Linear Regression Coefficient:
( )ˆ
( )
.
n
i i ii
weighted n
i ii
W y y x x
W x x
C P E SC P E S
NIMH Collaborative Psychiatric Epidemiology Surveys 2004/21/23
Weighted Estimation
1
1
Weighted Estimate of
Multiple Linear Regression Coefficients,
( ' ) '
where: W is an n x n diagonal matrix with weight,
W 0 0
W= 0 0
0 0 n
X WX X WY
W
C P E SC P E S
NIMH Collaborative Psychiatric Epidemiology Surveys 2104/21/23
Weighted Estimation
'
'
Pseudo-Maximum Likelihood, Weights are included in the score equations, e.g. for Logistic Model
ln
h i h i h i
ih
h i h i h iih
L BU B
w x y
w x P
C P E SC P E S
NIMH Collaborative Psychiatric Epidemiology Surveys 2204/21/23
Effect of Weights on Variances of Estimates of Means
LW = Relative increase in sampling variances of estimates due to weighting
Generally, oversampling for subgroups:LW,SUB < LW
22 1
2
1
~ ( ) 1
n
ii
n
i
WCV W n
W
C P E SC P E S
NIMH Collaborative Psychiatric Epidemiology Surveys 2304/21/23
Example of Weighting Lossin Disproportionate Sampling
Stratum% Hispanic Population
Oversampling Rate Weight
% of Hispanic Sample
1 19.2% 1:1 4 7%
2 22.8% 2:1 2 7%
3 24.1% 3:1 1.33 26%
4 33.9% 4:1 1 50%
100% 100%
C P E SC P E S
NIMH Collaborative Psychiatric Epidemiology Surveys 2404/21/23
Example of Weighting Lossin Disproportionate Sampling
21
2
1
For 1000:
2759.92,148,569
1
1000 1 .284
ni
Wn
i
n
WL n
W
C P E SC P E S
NIMH Collaborative Psychiatric Epidemiology Surveys 2504/21/23
Scaling of Weights
• Analysis based on:
* **1; ;
: a and b are constants
ii i i i
i
WW W a W W
b
where
will yield the same results (provided the software program treats the weights correctly).
C P E SC P E S
NIMH Collaborative Psychiatric Epidemiology Surveys 2604/21/23
Normalizing Weights
,
1
i,cent
cent
Properties of normalized weights include:
W
W 1.0
i cent i n
ii
nW W
W
n
C P E SC P E S
NIMH Collaborative Psychiatric Epidemiology Surveys 2704/21/23
Example: Distribution of NCS Part 2 Weights (Normalized)
MomentsN 5877 Sum Weights 5877
Mean 1.00000007 Sum Observations
5877.00039
Std Deviation 1.16010775 Variance 1.34584999
Skewness 2.8311973 Kurtosis 7.9882
Uncorrected SS 13785.2153 Corrected SS 7908.21452
Coeff Variation 116.010767 Std Error Mean 0.01513284
C P E SC P E S
NIMH Collaborative Psychiatric Epidemiology Surveys 2804/21/23
CPES Weighting Approach
• Separately, NCS-R, NSAL and NLAAS cases are assigned to 1 of 12 distinct race/ancestry groups– Vietnamese, Filipino, Chinese, All other
Asian, Cuban, Puerto Rican, Mexican, All other Hispanic, Afro-Caribbean, African-American, White, All Other
– Persons with multiple ancestry are assigned to the first applicable discrete category in this list.
C P E SC P E S
NIMH Collaborative Psychiatric Epidemiology Surveys 2904/21/23
CPES Weighting Approach
• Separately, NCS-R, NSAL and NLAAS cases are assigned to 1 of 11 distinct geographic domains defined based on the race/ancestry composition of the Census tract in which they resided at the time of interview.– Hawaii is only represented in NLAAS
C P E SC P E S
NIMH Collaborative Psychiatric Epidemiology Surveys 3004/21/23
CPES Weighting Approach
• The March 2002 Current Population Survey (CPS) was used to estimate the total size of the adult population in each race/ancestry stratum.
• Within each race/ancestry stratum the distribution of the population to the 11 geographic domains was estimated based on the CPES study which could provide the most precise estimates, e.g. NSAL for African Americans.
C P E SC P E S
NIMH Collaborative Psychiatric Epidemiology Surveys 3104/21/23
CPES Weighting Approach
• Separately, NCS-R, NSAL and NLAAS study-specific weights (see chart) were post-stratified to the 12 x 11 grid of population totals for race/ancestry by geographic domain.
C P E SC P E S
NIMH Collaborative Psychiatric Epidemiology Surveys 3204/21/23
CPES Weighting Approach
• CPES weight values for the pooled data were constructed by scaling each individual study weight by a factor equal to the proportion of the total sample in a grid cell that originated with the component study.
• The result is a CPES weight that sums to the CPS 2002 post-stratification controls in the 12 x 11 grid.
C P E SC P E S
NIMH Collaborative Psychiatric Epidemiology Surveys 3304/21/23
CPES Weighting Approach
• Long and Short form weight variables– NCS-R uses a two-phase sampling approach in which only a subsample of cases progress from Part 1 to Part 2.
• Therefore, there are 2 CPES weights:– CPESWTLG – used when one or more
variables of interest are measured only in NCS-R Part 2
– CPESWTSH – used when all variables of interest are measured in NCS-R Part 1
C P E SC P E S
NIMH Collaborative Psychiatric Epidemiology Surveys 3404/21/23
CPES Weighting Approach
• Weights for pooled analysis of data from only two of the three CPES data sets:– NCNLWTLG, NCNLWTSH: NLAAS and NCS-R– NCNSWTLG, NCNSWTSH: NSAL and NCS-R– NSNLWT : NLAAS and NSAL
• Derived as CPES weights but with final scaling step limited to proportion of sample in cell from the two studies.
C P E SC P E S
NIMH Collaborative Psychiatric Epidemiology Surveys 3504/21/23
Sample Sample Weight Mean S.E. CV
NCSR Short NCSRWTSH 1.00 0.0054 52.28
NCSR Long NCSRWTLG 1.00 0.0127 95.82
NLAAS --- NLAASWGT 6333.6 86.77 93.42
NSAL --- NSALWTPN 8022.2 165.84 161.22
CPES Short CPESWTSH 10468.2 75.25 101.69
CPES Long CPESWTLG 12693.4 152.03 153.49
NCSR-NLAAS Short ncnlwtsh 15061.7 100.10 78.44
NCSR-NLAAS Long ncnlwtlg 20290.6 261.12 130.87
NCSR-NSAL Short ncnswtsh 13588.6 106.91 97.52
NCSR-NSAL Long ncnswtlg 17731.9 248.46 152.04
NLAAS-NSAL --- nsnlwt 19553.1 608.91 322.59
Table 1: Descriptive Statistics for CPES Study Weights
C P E SC P E S
NIMH Collaborative Psychiatric Epidemiology Surveys 3604/21/23
Sample Sample size
Un-weighted Prevalence
% (s.e.)
Weighted Prevalence
% (s.e.)
White-Non-Hispanic
NCSRa 6696 17.95(0.47) 17.75(0.55)
CPESa 7567b 18.00(0.44) 17.92(0.54)
Black
NCSRa 1176 11.99(0.95) 11.29(1.04)
NSAL 3433 10.63(0.53) 10.31(0.58)
CPESa 4609b 10.98(0.46) 10.39(0.47)
Hispanic
NLAAS 2554 15.66(0.72) 13.79(0.68)
CPESa 3615 15.60(0.60) 13.71(0.63)
Table 2: Un-weighted and Weighted Prevalence of Major Depressive Disorder
a =short sample, b= excluding missing cases s.e.=standard error
C P E SC P E S
NIMH Collaborative Psychiatric Epidemiology Surveys 3704/21/23
Sample Sample size
Un-weighted Prevalence
% (s.e.)
Weighted Prevalence
% (s.e.)
White-Non-Hispanic
NCSRa 6696 6.45(0.30) 6.16(0.31)
CPESa 7566b 6.25(0.28) 6.08(0.32)
Black
NCSRa 1176 4.34(0.59) 4.14(0.74)
NSAL 3430 3.53(0.31) 3.26(0.36)
CPESa 4606b 3.73(0.28) 3.48(0.34)
Hispanic
NLAAS 2554 3.48(0.36) 2.82(0.40)
CPESa 3615 3.76(0.32) 3.03(0.34)
Table 3: Un-weighted and Weighted Prevalence of Generalized Anxiety Disorder
a =short sample, b= excluding missing cases s.e.=standard error
C P E SC P E S
NIMH Collaborative Psychiatric Epidemiology Surveys 3804/21/23
Sample Sample size
Un-weighted Prevalence
% (s.e.)
Weighted Prevalence
% (s.e.)
White-Non-Hispanic
NCSRa 4180 10.26(0.47) 6.84(0.54)
CPESb 4180b 10.26(0.47) 6.92(0.53)
Black
NCSRa 679 12.37(1.26) 7.4(0.86)
NSAL 3419c 9.15(0.49) 9.10(0.54)
CPESa 4098c 9.69(0.46) 8.76(0.49)
Hispanic
NLAAS 2554 5.29(0.44) 4.42(0.45)
CPESa 3259c 6.17(0.42) 4.74(0.39)
Table 4: Un-weighted and Weighted Prevalence of Post Traumatic Stress Disorder
a =long sample, b= only assessed in NCS-R , c= excluding missing cases s.e.=standard error