Confounding
Lecture 4
Learning Objectives
n In this set of lectures we will:
  - Formally define confounding and give explicit examples of its impact
  - Define adjustment and adjusted estimates conceptually
  - Begin a discussion of the analytics of adjustment
Section A
Confounding: A Formal Definition, and Some Examples
Learning Objectives
n Formally define confounding
n Establish conditions that can result in the confounding of an outcome/exposure relationship
n Demonstrate the potential effects of confounding via examples
Confounding (Lurking Variable)
n Consider results from the following (fictitious) study:
  - This study was done to investigate the association between smoking and a certain disease in male and female adults
  - 210 smokers and 240 non-smokers were recruited for the study
Results for All Subjects
Smokers Non Smokers TOTALS
Disease 52 64 116
No Disease 158 176 334
TOTALS 210 240 450
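$$\widehat{RR} = \frac{\hat{p}_{\text{smokers}}}{\hat{p}_{\text{non-smokers}}} = \frac{52/210}{64/240} \approx 0.93$$

$$\widehat{OR} = \frac{\hat{p}_{\text{smokers}} / (1 - \hat{p}_{\text{smokers}})}{\hat{p}_{\text{non-smokers}} / (1 - \hat{p}_{\text{non-smokers}})} = \frac{52 \times 176}{158 \times 64} \approx 0.91$$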
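The crude relative risk and odds ratio can be checked directly from the 2x2 table. The following is a minimal sketch; the dictionary layout and helper names (`risk`, `odds`) are illustrative choices, not part of the lecture.

```python
# Counts from the "Results for All Subjects" table of the fictitious study.
disease = {"smokers": 52, "non_smokers": 64}
totals = {"smokers": 210, "non_smokers": 240}


def risk(group):
    """Estimated proportion of the group with disease."""
    return disease[group] / totals[group]


def odds(group):
    """Estimated odds of disease in the group: p / (1 - p)."""
    p = risk(group)
    return p / (1 - p)


rr_crude = risk("smokers") / risk("non_smokers")
or_crude = odds("smokers") / odds("non_smokers")

print(f"Crude RR = {rr_crude:.2f}, crude OR = {or_crude:.2f}")
```

Both estimates fall slightly below 1, which is what prompts the question of whether smoking looks "protective" in the pooled data.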
n Smoking is protective against disease?
n Most of the smokers are male, and most of the non-smokers are female
What’s Going On?
All Subjects
Smokers Non Smokers TOTALS
Male 160 40 200
Female 50 200 250
TOTALS 210 240 450
n Smoking is protective against disease?
n Further, most of the persons with disease are female
What’s Going On?
All Subjects
Disease No Disease TOTALS
Male 33 167 200
Female 83 167 250
TOTALS 116 334 450
n A picture?
What’s Going On?
[Diagram: Disease, Smoking and Sex, with Sex related to both Smoking and Disease]
n The comparison of disease risk between smokers and non-smokers is potentially distorted or negated by the disproportionate percentage of males among the smokers
What’s Going On?
n The original outcome of interest is DISEASE
n The original exposure of interest is SMOKING
n In this sample, SEX is related to both the outcome and exposure
  - This relationship is possibly impacting the overall relationship between DISEASE and SMOKING
n How can we look at the relationship between DISEASE and SMOKING, removing any possible “interference” from SEX?
  - One approach: look at the DISEASE and SMOKING relationship separately for males and females
What’s Going On?
n Is smoking related to disease in males?
Example
Results for MALES
Smokers Non Smokers TOTALS
Disease 29 4 33
No Disease 131 36 167
TOTALS 160 40 200
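$$\widehat{RR}_{\text{males}} = \frac{\hat{p}_{\text{male smokers}}}{\hat{p}_{\text{male non-smokers}}} = \frac{29/160}{4/40} \approx 1.8$$

$$\widehat{OR}_{\text{males}} = \frac{\hat{p}_{\text{male smokers}} / (1 - \hat{p}_{\text{male smokers}})}{\hat{p}_{\text{male non-smokers}} / (1 - \hat{p}_{\text{male non-smokers}})} = \frac{29 \times 36}{131 \times 4} \approx 2$$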
n Is smoking related to disease in females?
Example
Results for FEMALES
Smokers Non Smokers TOTALS
Disease 23 60 83
No Disease 27 140 167
TOTALS 50 200 250
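$$\widehat{RR}_{\text{females}} = \frac{\hat{p}_{\text{female smokers}}}{\hat{p}_{\text{female non-smokers}}} = \frac{23/50}{60/200} \approx 1.5$$

$$\widehat{OR}_{\text{females}} = \frac{\hat{p}_{\text{female smokers}} / (1 - \hat{p}_{\text{female smokers}})}{\hat{p}_{\text{female non-smokers}} / (1 - \hat{p}_{\text{female non-smokers}})} = \frac{23 \times 140}{27 \times 60} \approx 2$$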
Smoking, Disease and Sex
n A recap
  - The overall (sometimes called crude, unadjusted) relationship (RR) between smoking and disease was nearly 1 (risk difference nearly 0)
$$\widehat{RR} = 0.93; \quad \hat{p}_{\text{smokers}} - \hat{p}_{\text{non-smokers}} = -0.02$$
  - The sex specific results showed similar positive associations between smoking and disease
MALES: $\widehat{RR} = 1.8; \quad \hat{p}_{\text{male smokers}} - \hat{p}_{\text{male non-smokers}} \approx 0.08$
FEMALES: $\widehat{RR} = 1.5; \quad \hat{p}_{\text{female smokers}} - \hat{p}_{\text{female non-smokers}} \approx 0.16$
(note, for the moment we are not considering statistical significance, just using estimates to illustrate the point)
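The reversal between the crude and the sex-specific estimates can be reproduced from the two stratum tables alone. This is a minimal sketch; the data structure and the helper `rr_and_rd` are hypothetical names used only for illustration.

```python
# (number with disease, group size) for each smoking group within each sex.
strata = {
    "males": {"smokers": (29, 160), "non_smokers": (4, 40)},
    "females": {"smokers": (23, 50), "non_smokers": (60, 200)},
}


def rr_and_rd(smokers, non_smokers):
    """Return (relative risk, risk difference) for smokers vs non-smokers."""
    p1 = smokers[0] / smokers[1]
    p0 = non_smokers[0] / non_smokers[1]
    return p1 / p0, p1 - p0


def pooled(group):
    """Add up cases and group sizes across the sex strata."""
    cases = sum(s[group][0] for s in strata.values())
    size = sum(s[group][1] for s in strata.values())
    return cases, size


crude_rr, crude_rd = rr_and_rd(pooled("smokers"), pooled("non_smokers"))
by_sex = {sex: rr_and_rd(s["smokers"], s["non_smokers"]) for sex, s in strata.items()}

print(f"crude:   RR = {crude_rr:.2f}, RD = {crude_rd:+.2f}")
for sex, (rr, rd) in by_sex.items():
    print(f"{sex}: RR = {rr:.2f}, RD = {rd:+.2f}")
```

Pooling the strata recovers the all-subjects table (52 of 210 smokers, 64 of 240 non-smokers), so the only thing that changes between the two analyses is whether sex is ignored or held fixed.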
Simpson’s Paradox
n The nature of an association can change (and even reverse direction) or disappear when data from several groups are combined to form a single group
n An association between an exposure X and an outcome Y can be confounded by another lurking (hidden) variable Z (or variables Z1, Z2, …)
Confounding (Lurking Variable)
n A confounder Z (or set of confounders Z1…Zp) distorts the true relation between X and Y
n This can happen if Z is related both to X and to Y
[Diagram: X, Y and Z, with Z related to both X and Y]
n A picture
[Diagram: Y, X and Z, with Z related to both X and Y]
What’s Going On?
What is the Solution for Confounding?
n If you DON’T KNOW what the potential confounders are, there’s not much you can do after the study is over
  - Randomization is the best protection
  - Randomization eliminates the potential links between the exposure of interest and potential confounders Z1, Z2, …, Zp
n If you can’t randomize but KNOW what the potential confounders are, there are statistical methods to help control (adjust) for confounders
  - Potential confounders must be measured as part of the study
Randomization Minimizes Threat of Confounding
n How/Why does randomization minimize the threat of confounding?
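One way to see why randomization helps is a small simulation. The sketch below is a thought experiment, not part of the lecture: it takes the sex composition of the fictitious sample (200 males, 250 females) and repeatedly assigns 210 subjects to a hypothetical "exposed" arm at random, then compares the resulting sex imbalance with the imbalance observed between smokers and non-smokers. The seed and the number of repetitions are arbitrary.

```python
import random

random.seed(2024)  # arbitrary seed so the sketch is reproducible

# Sex composition of the fictitious sample.
subjects = ["M"] * 200 + ["F"] * 250


def male_share(group):
    """Proportion of males in a list of 'M'/'F' labels."""
    return group.count("M") / len(group)


# Observed, non-randomized imbalance: 160 of 210 smokers and
# 40 of 240 non-smokers were male.
observed_gap = 160 / 210 - 40 / 240

# Hypothetical randomized assignment, repeated many times.
gaps = []
for _ in range(1000):
    random.shuffle(subjects)
    exposed, unexposed = subjects[:210], subjects[210:]
    gaps.append(abs(male_share(exposed) - male_share(unexposed)))

mean_gap = sum(gaps) / len(gaps)
print(f"observed gap in % male: {observed_gap:.2f}")
print(f"average gap under random assignment: {mean_gap:.2f}")
```

Under random assignment the exposure is, on average, unrelated to sex, so sex can no longer be related to both exposure and outcome. The same argument applies to confounders that were never measured.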
Example 2: Arm Circumference and Height
n An observational study to estimate association between arm circumference and height in Nepali children
  - 150 randomly selected subjects, ages [0, 12) months, had arm circumference, weight and height measured
  - This study is observational: it is not possible to randomize subjects to height groups!
Example 2: Arm Circumference and Height
n The data
  - Arm circumference range: 7.3–15.6 cm
  - Height range: 40.9–73.3 cm
  - Weight range: 1.6–9.9 kg
n Scatterplot: arm circumference by height
Example 2: Arm Circumference and Height
[Figure: Arm Circumference versus Height, with Regression Line; Nepalese Children < 12 Months (n = 150). x-axis: Height (cm); y-axis: Arm Circumference (cm)]
$$\hat{y} = 2.7 + 0.16 x_1, \quad R^2 = 0.45$$
n Notice, perhaps not surprisingly:
Example 2: Arm Circumference and Height
[Figure, two panels; Nepalese Children < 12 Months (n = 150). Panel 1: Height versus Weight (x-axis: Weight (kg); y-axis: Height (cm)). Panel 2: Arm Circumference versus Weight (x-axis: Weight (kg); y-axis: Arm Circumference (cm))]
$$R^2 = 0.70 \qquad R^2 = 0.86$$
n Scatterplot: arm circumference by height, after adjusting for weight
Example 2: Arm Circumference and Height
[Figure: Arm Circumference versus Height; Nepalese Children < 12 Months (n = 150). x-axis: Height (cm); y-axis: Arm Circumference (cm)]
$$\hat{\beta}_1(\text{height}) = -0.16$$
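The sign change in the height slope can be illustrated with simulated data. The sketch below does not use the Nepali data, which are not reproduced in these slides; every number in the data-generating step is an assumption chosen only to mimic the pattern (height and arm circumference both increase with weight). The adjusted slope is obtained by removing the linear effect of weight from both height and arm circumference and regressing the leftovers on each other, which gives the same height coefficient as a regression of arm circumference on height and weight together.

```python
import random


def slope(x, y):
    """Least-squares slope from a simple linear regression of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def residuals(x, y):
    """What is left of y after removing its linear relationship with x."""
    b = slope(x, y)
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return [yi - (my + b * (xi - mx)) for xi, yi in zip(x, y)]


random.seed(1)  # arbitrary seed
n = 2000        # larger than the study's 150, to keep the sketch stable

# Synthetic data (assumed coefficients, not estimates from the study).
weight = [random.uniform(1.6, 9.9) for _ in range(n)]
height = [40 + 3.5 * w + random.gauss(0, 2) for w in weight]
arm = [8 + 1.4 * w - 0.16 * h + random.gauss(0, 0.3)
       for w, h in zip(weight, height)]

unadjusted = slope(height, arm)
adjusted = slope(residuals(weight, height), residuals(weight, arm))

print(f"unadjusted height slope: {unadjusted:.2f}")
print(f"weight-adjusted height slope: {adjusted:.2f}")
```

In the simulated data the unadjusted slope is positive because taller children are also heavier, while the adjusted slope compares children of the same weight and comes out negative.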
Example 3: South African Study
n A longitudinal study from South Africa: birth cohort, followed up five years after birth¹: participation by medical aid status at birth, all baseline participants
¹ Morrell CH. Simpson's Paradox: An Example From a Longitudinal Study in South Africa. Journal of Statistics Education (1999)
All Subjects
                             Medical Aid   No Medical Aid   TOTAL
Follow-Up Participation      46            370              416
No Follow-Up Participation   195           979              1,174
TOTAL                        241           1,349            1,590
$$\widehat{RR}_{\text{follow-up}} = \frac{\hat{p}_{\text{medical aid}}}{\hat{p}_{\text{no medical aid}}} = \frac{46/241}{370/1{,}349} = \frac{0.19}{0.27} \approx 0.7$$
95% CI: 0.53 to 0.92
Example 3: South African Study
n A longitudinal study from South Africa: birth cohort, followed up five years after birth: participation by medical aid status at birth, Black participants
Black Subjects
                             Medical Aid   No Medical Aid   TOTAL
Follow-Up Participation      36            368              404
No Follow-Up Participation   91            957              1,048
TOTAL                        127           1,325            1,452
$$\widehat{RR}_{\text{follow-up, Black}} = \frac{\hat{p}_{\text{medical aid, Black}}}{\hat{p}_{\text{no medical aid, Black}}} = \frac{36/127}{368/1{,}325} = \frac{0.28}{0.28} \approx 1.0$$
95% CI: 0.76 to 1.36
Example 3: South African Study
n A longitudinal study from South Africa: birth cohort, followed up five years after birth: participation by medical aid status at birth, White participants
White Subjects
                             Medical Aid   No Medical Aid   TOTAL
Follow-Up Participation      10            2                12
No Follow-Up Participation   104           22               126
TOTAL                        114           24               138
$$\widehat{RR}_{\text{follow-up, White}} = \frac{\hat{p}_{\text{medical aid, White}}}{\hat{p}_{\text{no medical aid, White}}} = \frac{10/114}{2/24} = \frac{0.088}{0.083} \approx 1.05$$
95% CI: 0.25 to 4.5
Example 3: South African Study
n What's going on?
n Race
  - Majority of sample are Black subjects (91%)
n Race and follow-up participation
  - 28% of Black subjects (404 of 1,452) completed follow-up, compared to 9% of White subjects
n Race and medical aid
  - 9% of Black subjects had medical aid, compared to 83% of White subjects
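The two conditions for confounding (race related to the outcome, race related to the exposure) can be read off the race-specific tables. The following is a minimal sketch that recomputes the marginal percentages and the relative risks from the counts in the three tables; the variable and function names are illustrative.

```python
# (followed up, total) by medical aid status within each race group.
counts = {
    "Black": {"aid": (36, 127), "no_aid": (368, 1325)},
    "White": {"aid": (10, 114), "no_aid": (2, 24)},
}


def relative_risk(aid, no_aid):
    """RR of follow-up participation: medical aid vs no medical aid."""
    return (aid[0] / aid[1]) / (no_aid[0] / no_aid[1])


summary = {}
for race, c in counts.items():
    total = c["aid"][1] + c["no_aid"][1]
    followed = c["aid"][0] + c["no_aid"][0]
    summary[race] = {
        "pct_followup": followed / total,     # race vs outcome
        "pct_aid": c["aid"][1] / total,       # race vs exposure
        "rr": relative_risk(c["aid"], c["no_aid"]),
    }

# Crude table: add the two race groups together.
crude_aid = tuple(sum(c["aid"][i] for c in counts.values()) for i in range(2))
crude_no_aid = tuple(sum(c["no_aid"][i] for c in counts.values()) for i in range(2))
crude_rr = relative_risk(crude_aid, crude_no_aid)

for race, s in summary.items():
    print(f"{race}: follow-up {s['pct_followup']:.0%}, "
          f"medical aid {s['pct_aid']:.0%}, RR = {s['rr']:.2f}")
print(f"All subjects: RR = {crude_rr:.2f}")
```

White subjects were far more likely to have medical aid and far less likely to participate in follow-up, which is enough to pull the crude relative risk below 1 even though both race-specific relative risks are close to 1.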
Example 3: South African Study
n Recap
Example 4: “Batch Effects” In Lab Based Analyses
n Lab-based results can be influenced by the technician, the laboratory used, the time of day, the temperature in the lab, etc.
n If the goal of a study is to ascertain differences in lab measures between groups (for example diseased and non-diseased), and the group is associated with at least some of the above characteristics, then there can be confounding
Summary
n In non-randomized studies, outcome/exposure relationships of interest may be confounded by other variables
n In order to confound an outcome/exposure relationship, a variable must be related to both the outcome and exposure
Section B
Adjusted Estimates: Presentation, Interpretation and Utility for Assessing Confounding
Learning Objectives
n Understand how to interpret estimates of association that have been adjusted to control for a confounder
n Compare/contrast the comparisons being made by unadjusted and adjusted association estimates
Adjustment
n Adjustment is a method for making comparable comparisons between groups in the presence of a confounder/confounding variables
n We will discuss the basics of the mechanics behind adjustment in the next lecture section
Example 1: Fictitious Study
n Consider results from the following (fictitious) study:
  - This study was done to investigate the association between smoking and a certain disease in male and female adults
  - 210 smokers and 240 non-smokers were recruited for the study
Results for All Subjects
Smokers Non Smokers TOTALS
Disease 52 64 116
No Disease 158 176 334
TOTALS 210 240 450
$$\widehat{RR} = \frac{\hat{p}_{\text{smokers}}}{\hat{p}_{\text{non-smokers}}} = \frac{52/210}{64/240} \approx 0.93$$
Example 1: Fictitious Study
n This relative risk is being influenced by the different sex distributions among smokers and non-smokers
n This relative risk compares all smokers to all non-smokers in the sample without taking any other factors into account: this is called the unadjusted or crude estimated association between disease and smoking
Example 1: Fictitious Study
n Adjustment provides a mechanism for estimating an outcome/exposure relationship after removing the potential distortion or negation that comes from a confounder or multiple confounders
n In the fictitious example, the relationship between disease and smoking can be adjusted for sex
n Frequently, the presentation of results from non-randomized studies will include a table of unadjusted and adjusted measures of association
n Example: table of relative risks
Example 1: Fictitious Study
Table 2: Unadjusted and Adjusted Relative Risks of Disease
             Unadjusted          Adjusted¹
Non-Smoker   ref                 ref
Smoker       0.93 (0.68, 1.27)   1.57 (1.12, 2.20)
¹ adjusted for sex
n Unadjusted estimated relative risk, 0.93
n Adjusted estimated relative risk, 1.57
Example 1: Fictitious Study
n Comparing unadjusted and adjusted associations to assess confounding
Example 1: Fictitious Study
Table 2: Unadjusted and Adjusted Relative Risks of Disease
             Unadjusted          Adjusted¹
Non-Smoker   ref                 ref
Smoker       0.93 (0.68, 1.27)   1.57 (1.12, 2.20)
¹ adjusted for sex
Example 2: Arm Circumference and Height
n An observational study to estimate association between arm circumference and height in Nepali children
  - 150 randomly selected subjects, ages [0, 12) months, had arm circumference, weight and height measured
  - This study is observational: it is not possible to randomize subjects to height groups!
Example 2: Arm Circumference and Height
n The data
  - Arm circumference range: 7.3–15.6 cm
  - Height range: 40.9–73.3 cm
  - Weight range: 1.6–9.9 kg
n Frequently, the presentation of results from non-randomized studies will include a table of unadjusted and adjusted measures of association
n Example: table of linear regression slopes
Example 2: Arm Circumference and Height
Table 2: Regression Slopes for Arm Circumference
              Unadjusted          Adjusted
Height (cm)   0.16 (0.13, 0.19)   -0.16 (-0.21, -0.11)
Weight (kg)   0.80 (0.72, 0.89)   1.40 (1.21, 1.60)
n Unadjusted linear regression slope estimate for height: $\hat{\beta}_{\text{height}} = 0.16$
n Adjusted linear regression slope estimate for height: $\hat{\beta}_{\text{height}} = -0.16$
n Comparing unadjusted and adjusted associations to assess confounding
Example 2: Arm Circumference and Height
Table 2: Regression Slopes for Arm Circumference
              Unadjusted          Adjusted
Height (cm)   0.16 (0.13, 0.19)   -0.16 (-0.21, -0.11)
Weight (kg)   0.80 (0.72, 0.89)   1.40 (1.21, 1.60)
Example 3: Academic Physician Salaries¹
n From abstract
¹ Jagsi R, et al. Gender Differences in the Salaries of Physician Researchers. Journal of the American Medical Association (2012); 307(22): 2410-2417.
n Unadjusted linear regression slope estimate for sex (1 = M, 0 = F): $\hat{\beta}_{\text{sex}} = \$32{,}764$
n Adjusted linear regression slope estimate for sex (1 = M, 0 = F), after adjustment for specialty, academic rank, leadership positions, publications, and research time: $\hat{\beta}_{\text{sex}} = \$13{,}399$
Summary
n Adjustment is a method for making comparable comparisons between groups in the presence of a confounder/confounding variables
n The group comparisons made by adjusted associations are more specific than those made by unadjusted (crude) associations
n Contrasting crude and adjusted association estimates is useful for identifying confounding
Section C
Adjusted Estimates: The General Idea Behind the Computations
Learning Objectives
n Gain some insight conceptually as to how adjusted estimates are computed
Example 1: Fictitious Study
n Consider results from the following (fictitious) study:
  - This study was done to investigate the association between smoking and a certain disease in male and female adults
  - 210 smokers and 240 non-smokers were recruited for the study
Results for All Subjects
Smokers Non Smokers TOTALS
Disease 52 64 116
No Disease 158 176 334
TOTALS 210 240 450
$$\widehat{RR} = \frac{\hat{p}_{\text{smokers}}}{\hat{p}_{\text{non-smokers}}} = \frac{52/210}{64/240} \approx 0.93$$
Example 1: Smoking, Disease and Sex
n A recap
  - The overall (sometimes called crude, unadjusted) relationship (RR) between smoking and disease was nearly 1 (risk difference nearly 0)
$$\widehat{RR} = 0.93$$
  - The sex specific results showed similar positive associations between smoking and disease
MALES: $\widehat{RR} = 1.8$
FEMALES: $\widehat{RR} = 1.5$
(note, for the moment we are not considering statistical significance, just using estimates to illustrate the point)
Example 1: How to Adjust for Confounding?
n Stratify when Z is categorical
  - Look at tables separately
  - For our example, separate tables for males and females
  - Take weighted average of stratum specific estimates
Ex: To get a sex adjusted relative risk for the smoking disease relationship we could weight the sex-specific relative risks by numbers of males and females
$$\widehat{RR}_{\text{sex adjusted}} = \frac{n_{\text{males}} \times \widehat{RR}_{\text{males}} + n_{\text{females}} \times \widehat{RR}_{\text{females}}}{n_{\text{males}} + n_{\text{females}}}$$
$$\widehat{RR}_{\text{sex adjusted}} = \frac{200 \times 1.8 + 250 \times 1.5}{200 + 250} \approx 1.6$$
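The sample-size-weighted average above is simple enough to compute by hand, but a short sketch makes the role of rounding visible. The function name `weighted_average` is illustrative; the rounded relative risks are the ones used on the slide, and the unrounded ones are recomputed from the sex-specific tables.

```python
# Stratum sizes and sex-specific relative risks from the fictitious study.
n = {"males": 200, "females": 250}
rr_rounded = {"males": 1.8, "females": 1.5}   # values as shown on the slide
rr_unrounded = {
    "males": (29 / 160) / (4 / 40),
    "females": (23 / 50) / (60 / 200),
}


def weighted_average(rr, sizes):
    """Average the stratum-specific RRs, weighting each by its stratum size."""
    return sum(sizes[s] * rr[s] for s in sizes) / sum(sizes.values())


adj_from_rounded = weighted_average(rr_rounded, n)
adj_from_unrounded = weighted_average(rr_unrounded, n)

print(f"sex-adjusted RR (rounded inputs):   {adj_from_rounded:.2f}")
print(f"sex-adjusted RR (unrounded inputs): {adj_from_unrounded:.2f}")
```

Either way the adjusted estimate sits between the two sex-specific relative risks and well above the crude 0.93. The small gap between the two results comes only from rounding the inputs.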
Example 1: How to Adjust for Confounding?
n There are better ways than this to take such a weighted average (weighting by standard error, for example), but this just illustrates the concept
n Confidence intervals can be computed for these adjusted measures of association
n Multiple regression (in this case, logistic) will be a very useful tool for performing adjustment
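As one illustration of weighting by standard error, the sketch below pools the sex-specific relative risks on the log scale, giving each stratum a weight equal to one over the estimated variance of its log relative risk. The variance formula is the usual large-sample approximation for a log relative risk and is brought in here as an assumption; the slides do not say which method produced the adjusted estimates they report.

```python
import math

# (cases among smokers, smokers, cases among non-smokers, non-smokers) by sex.
strata = {
    "males": (29, 160, 4, 40),
    "females": (23, 50, 60, 200),
}


def log_rr_and_variance(a, n1, c, n0):
    """Log relative risk and its large-sample variance for one stratum."""
    log_rr = math.log((a / n1) / (c / n0))
    variance = 1 / a - 1 / n1 + 1 / c - 1 / n0
    return log_rr, variance


weighted_sum = 0.0
total_weight = 0.0
for a, n1, c, n0 in strata.values():
    log_rr, variance = log_rr_and_variance(a, n1, c, n0)
    weight = 1 / variance          # more precise strata count for more
    weighted_sum += weight * log_rr
    total_weight += weight

pooled_log_rr = weighted_sum / total_weight
pooled_se = math.sqrt(1 / total_weight)

rr_adjusted = math.exp(pooled_log_rr)
ci_low = math.exp(pooled_log_rr - 1.96 * pooled_se)
ci_high = math.exp(pooled_log_rr + 1.96 * pooled_se)

print(f"adjusted RR = {rr_adjusted:.2f} ({ci_low:.2f}, {ci_high:.2f})")
```

Because the female stratum has many more cases among non-smokers, its relative risk is estimated more precisely and dominates the pooled value, which ends up closer to the female estimate than the sample-size-weighted average does.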
n Scatterplot: arm circumference by height
Example 2: Arm Circumference and Height
[Figure: Arm Circumference versus Height, with Regression Line; Nepalese Children < 12 Months (n = 150). x-axis: Height (cm); y-axis: Arm Circumference (cm)]
$$\hat{y} = 2.7 + 0.16 x_1, \quad R^2 = 0.45$$
n IDEA Scatterplots: arm circumference by height, stratified by weight values
Example 2: Arm Circumference and Height
n The adjusted association between Y and X, adjusted for a single potential confounder Z, can be estimated by:
  - Stratifying on Z (hard to operationalize if Z is continuous)
  - Estimating the Y/X relationship for each stratum of Z
  - Taking a weighted average of all the Z-stratum-specific Y/X associations
n This idea can be generalized to estimating the adjusted association between Y and X, adjusted for multiple potential confounders Z1, Z2, …, Zc
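The stratify, estimate, then average recipe can be sketched for a continuous Z by cutting Z into narrow bins. The example below uses simulated data with assumed coefficients (it is not the Nepali data), bins weight into 0.5 kg strata, estimates the arm circumference/height slope within each stratum, and averages the slopes weighted by stratum size. The bin width and the minimum stratum size are arbitrary choices.

```python
import random
from collections import defaultdict


def slope(x, y):
    """Least-squares slope from a simple linear regression of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


random.seed(7)  # arbitrary seed
n = 2000

# Synthetic data (assumed coefficients): Z = weight, X = height, Y = arm.
weight = [random.uniform(1.6, 9.9) for _ in range(n)]
height = [40 + 3.5 * w + random.gauss(0, 2) for w in weight]
arm = [8 + 1.4 * w - 0.16 * h + random.gauss(0, 0.3)
       for w, h in zip(weight, height)]

# Step 1: stratify on Z by binning weight.
BIN_WIDTH = 0.5  # kg
strata = defaultdict(list)
for w, h, a in zip(weight, height, arm):
    strata[int(w / BIN_WIDTH)].append((h, a))

# Steps 2 and 3: estimate the Y/X slope per stratum, then take a
# size-weighted average of the stratum-specific slopes.
weighted_sum = 0.0
total_n = 0
for points in strata.values():
    if len(points) < 3:
        continue  # too few points to estimate a slope
    heights = [p[0] for p in points]
    arms = [p[1] for p in points]
    weighted_sum += len(points) * slope(heights, arms)
    total_n += len(points)

stratified_slope = weighted_sum / total_n
unadjusted_slope = slope(height, arm)

print(f"unadjusted slope: {unadjusted_slope:.2f}")
print(f"stratified (weight-adjusted) slope: {stratified_slope:.2f}")
```

Weight still varies a little within each bin, so some confounding remains and the stratified slope is pulled toward the unadjusted one. Narrower bins reduce this but leave fewer observations per stratum, which is the practical difficulty with continuous confounders that regression methods avoid.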
Summary
n Multiple regression methods will make the adjustment process easy and straightforward