Advanced Data Analysis: Methods to Control for Confounding (Matching and Logistic Regression)

Goals

Understand the issue of confounding in statistical analysis

Learn how to use matching and logistic regression to control for confounding

Confounding

Example: people in a gastrointestinal outbreak

Most were members of the same dinner club, but many club members also went to a city-wide food festival

Food handling practices in the dinner club might be blamed for the outbreak when food eaten at the festival was the cause

Membership in the dinner club could be a confounder of the relationship between attendance at the food festival and illness

Analyzing the data to account for both dinner club membership and food festival attendance could help determine which event was truly associated with the outcome

Confounding

Gastrointestinal outbreak (continued)

Stratification methods could be used to calculate the risk of illness associated with the food festival separately for those in the dinner club and those not in the dinner club

If attending the food festival was a significant risk factor for illness in both groups, then the festival would be implicated because illness occurred whether or not people were members of the dinner club

Confounding

What if there are multiple factors that might be confounding the exposure-disease relationship? Using our previous example, what if we had to stratify by membership in the dinner club and by health status? Or stratify by other potential confounders (age, occupation, income, etc.)?

Trying to stratify by all of these layers becomes difficult. At this point more advanced methods are needed:

Logistic regression – controls for many potential confounders at one time

Matching – when incorporated correctly into the study design, reduces confounding before analysis begins

Confounders

In field epidemiology, we commonly compare two groups by using measures of association:

Risk ratio (RR) in cohort studies

Odds ratio (OR) in case-control studies

We may find multiple exposures significantly associated with disease, or no exposures associated at all

In these cases you need to explore whether a confounder is present, making it appear that exposures are associated with the disease (when they really are not) or that no association exists (when there really is one)

Confounders

A confounder is a variable that distorts the risk ratio or odds ratio for the association between an exposure and an outcome

Confounding is a form of bias that can result in a distortion in the measure of association between an exposure and disease

Confounding must be eliminated for accurate results (1)

Confounding can occur in an observational epidemiologic study whenever two groups are compared to each other

Confounding is a “mixing of effects” when the groups are compared (the apparent exposure-disease relationship can be affected by factors other than the exposure itself)

Common Confounders

Common confounders include age, socioeconomic status and gender.

Example: Children born later in the birth order are more likely to have Down’s syndrome. Does birth order cause Down’s syndrome? No; the relationship is confounded by mother’s age, because older women are more likely to have children with Down’s syndrome

Mother’s age confounds the association between birth order and Down’s syndrome: it appears there is an association when there is not (2)

Common Confounders--Examples

Women’s use of hormone replacement therapy (HRT) and risk of cardiovascular disease

Some studies suggest an association, others do not

Women of higher socioeconomic status (SES) are more likely to be able to afford HRT

Women of lower SES are at higher risk of cardiovascular disease

Differences in SES may thus confound the relationship between HRT and cardiovascular disease

Need to control for SES among study participants (3)

Common Confounders--Examples

Hypothetical outbreak of gastroenteritis at a restaurant

Study shows women were at much greater risk of the disease than men

Association is confounded by eating salad: women were much more likely to order salad than men

Salad was contaminated with the disease-causing agent

Relationship between gender and disease was confounded by salad consumption (which was the true cause of the outbreak)

Characteristics of Confounders

Confounders must have two key characteristics:

A confounder must be associated with the disease being studied

A confounder must be associated with the exposure being studied
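As an illustrative sketch, the two criteria can be checked on the cookies-and-punch case-control data analyzed later in this presentation, treating punch as the candidate confounder of the cookie-illness association. The counts are cross-classified from the stratified tables in that example; the helper name `odds_ratio` is our own, and using the controls to measure the punch-cookies association is an assumption (controls stand in for the source population).

```python
def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Odds ratio (a*d)/(b*c) for a 2x2 table [[a, b], [c, d]]."""
    return (a * d) / (b * c)


# Criterion 1: punch is associated with illness.
# Checked among people who did NOT eat cookies, so cookies cannot explain it.
# Rows: punch / no punch; columns: cases / controls.
or_punch_illness = odds_ratio(5, 3, 8, 26)

# Criterion 2: punch is associated with the exposure being studied (cookies).
# Checked among controls only.
# Rows: cookies / no cookies; columns: punch / no punch.
or_punch_cookies = odds_ratio(17, 4, 3, 26)

print(f"Punch-illness OR (no cookies): {or_punch_illness:.2f}")  # 5.42
print(f"Punch-cookies OR (controls):   {or_punch_cookies:.2f}")  # 36.83
```

Both odds ratios are well above 1, which is consistent with punch meeting both criteria; whether it actually distorts the cookie association is settled by the stratified analysis.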

Controlling for Confounding

To control for confounding you must take the confounding variable out of the picture. There are 3 ways to do this:

Restrict the analysis—analyze the exposure-disease relationship only among those at one level of the confounding variable

Example: look at the relationship between HRT and cardiovascular disease ONLY among women of high SES

Stratify—analyze the exposure-disease relationship separately for all levels of the confounding variable

Example: look at the relationship between HRT and cardiovascular disease separately among women of high SES and low SES

Conduct logistic regression—regression puts all the variables into a mathematical model

Makes it easy to account for multiple confounders that need to be controlled

Controlling for Confounding: Stratification

Stratification can be used to separate the effects of exposures and confounders

Example: tuberculosis (TB) outbreak among homeless men

A homeless shelter and a soup kitchen were both implicated as the place of transmission

Men were likely to spend time in both places

To determine which site is more likely, we could examine the association between the homeless shelter and TB among men who did NOT go to the soup kitchen and among men who DID go to the soup kitchen

Stratification--Example

Outbreak at a reception: cookies and punch have both been implicated

Suspicion that one food item is confounding the other

Cannot tease out the effects without stratifying, because many people consumed both cookies and punch

Stratification--Example

After conducting a case-control study, overall data show the following:

Cookie exposure
                Cases   Controls   Total
Cookies           37       21        58
No cookies        13       29        42
Total             50       50       100

OR = (37 x 29)/(21 x 13) = 3.93; 95% CI, 1.69 – 9.15; p = 0.001*

Stratification--Example

Data continued…

Punch exposure
                Cases   Controls   Total
Punch             40       20        60
No punch          10       30        40
Total             50       50       100

OR = (40 x 30)/(20 x 10) = 6.00; 95% CI, 2.83 – 12.71; p = 0.0004*
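The crude odds ratios above can be reproduced in a few lines. This sketch also computes a 95% confidence interval with Woolf's (logit) method; that choice is our assumption, since the slides do not say which method produced their intervals. Woolf's method reproduces the cookie interval (1.69 – 9.15) but gives a wider interval for punch (about 2.45 – 14.68), so the slides may have used a different method for that one.

```python
import math


def odds_ratio_woolf_ci(a, b, c, d, z=1.96):
    """Crude odds ratio and Woolf (logit) confidence interval for a 2x2 table.

    a = exposed cases,   b = exposed controls,
    c = unexposed cases, d = unexposed controls.
    Assumes no zero cells, because the standard error divides by each count.
    """
    or_ = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_) - z * se_log_or)
    upper = math.exp(math.log(or_) + z * se_log_or)
    return or_, lower, upper


tables = {
    "cookies": (37, 21, 13, 29),
    "punch": (40, 20, 10, 30),
}
results = {name: odds_ratio_woolf_ci(*cells) for name, cells in tables.items()}
for name, (or_, lower, upper) in results.items():
    print(f"{name:8s} OR = {or_:.2f}; 95% CI, {lower:.2f} – {upper:.2f}")
```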

Stratification--Example

Both cookies and punch have a high odds ratio for illness and a confidence interval that does not include 1:

OR (cookies) = 3.93; 95% CI, 1.69 – 9.15; p = 0.001*

OR (punch) = 6.00; 95% CI, 2.83 – 12.71; p = 0.0004*

To stratify by punch exposure, we want to know:

Among those who did not drink punch, what is the odds ratio for the association between cookies and illness?

Among those who did drink punch, what is the odds ratio for the association between cookies and illness?

If cookies are the culprit, there should be an association between cookies and illness regardless of whether anyone drank punch

Stratification--Example

Stratification of the cookie association by punch exposure:

Did have punch
                Cases   Controls   Total
Cookies           35       17        52
No cookies         5        3         8
Total             40       20        60

OR = (35 x 3)/(17 x 5) = 1.24; 95% CI, 0.17 – 7.22; p = 1.0*

Stratification--Example

Stratification of the cookie association by punch exposure:

Did not have punch
                Cases   Controls   Total
Cookies            2        4         6
No cookies         8       26        34
Total             10       30        40

OR = (2 x 26)/(4 x 8) = 1.63; 95% CI, 0.12 – 13.86; p = 0.63*

Stratification--Example

To stratify by cookie exposure, we want to know:

Among those who did not eat cookies, what is the odds ratio for the association between punch and illness?

Among those who did eat cookies, what is the odds ratio for the association between punch and illness?

If punch is the culprit, there should be an association between punch and illness, regardless of whether anyone ate cookies

Stratification--Example

Stratification of the punch association by cookie exposure:

Did have cookies
                Cases   Controls   Total
Punch             35       17        52
No punch           2        4         6
Total             37       21        58

OR = (35 x 4)/(17 x 2) = 4.12; 95% CI, 0.52 – 48.47; p = 0.18*

Stratification--Example

Stratification of the punch association by cookie exposure:

Did not have cookies
                Cases   Controls   Total
Punch              5        3         8
No punch           8       26        34
Total             13       29        42

OR = (5 x 26)/(3 x 8) = 5.42; 95% CI, 0.80 – 40.95; p = 0.08*

Stratification

Stratification allows us to examine two risk factors independently of each other

In our cookies and punch example we can see that cookies were not really a risk factor independent of punch (stratified ORs close to 1, with confidence intervals that include 1)

Punch remained a potential risk factor independent of cookies (stratified ORs of 4.12 and 5.42, although neither reached statistical significance: p = 0.18 and p = 0.08)
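A short sketch that recomputes the four stratum-specific odds ratios from the stratified tables above and checks that each pair of strata adds back up, cell by cell, to the corresponding crude table. The dictionary layout and labels are our own choices.

```python
# Each table: (exposed cases, exposed controls, unexposed cases, unexposed controls)
strata = {
    "cookies | punch": (35, 17, 5, 3),
    "cookies | no punch": (2, 4, 8, 26),
    "punch | cookies": (35, 17, 2, 4),
    "punch | no cookies": (5, 3, 8, 26),
}
crude = {"cookies": (37, 21, 13, 29), "punch": (40, 20, 10, 30)}


def odds_ratio(a, b, c, d):
    """Odds ratio (a*d)/(b*c) for one 2x2 table."""
    return (a * d) / (b * c)


stratum_or = {name: odds_ratio(*cells) for name, cells in strata.items()}
for name, value in stratum_or.items():
    print(f"OR({name}) = {value:.3f}")

# Adding the two strata cell by cell should give back the crude table.
for exposure, crude_cells in crude.items():
    pair = [cells for name, cells in strata.items() if name.startswith(exposure)]
    summed = tuple(sum(cell) for cell in zip(*pair))
    assert summed == crude_cells, (exposure, summed)
```

The cookie odds ratios shrink toward 1 once punch is held constant, while the punch odds ratios stay large within each cookie stratum.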

More on Stratification

Mantel-Haenszel odds ratio

Method of controlling for confounding using stratified analysis

Takes an association, stratifies it by a potential confounder, and then combines the stratum-specific estimates into a single weighted average that is “controlled” for the stratifying variable

Cookies and punch example: 2 stratum-specific estimates of the association between punch and illness (ORs of 4.1 and 5.4)

More convenient to have only one estimate: the two estimates can be averaged into a pooled, or common, odds ratio
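The Mantel-Haenszel pooled odds ratio is computed as the sum over strata of a·d/n divided by the sum over strata of b·c/n, where n is the stratum total. The sketch below applies that formula to both stratified analyses in the cookies and punch example; the pooled values are computed here and are not reported on the slides.

```python
from typing import Iterable, Tuple

# (exposed cases, exposed controls, unexposed cases, unexposed controls)
Table = Tuple[int, int, int, int]


def mantel_haenszel_or(strata: Iterable[Table]) -> float:
    """Mantel-Haenszel common odds ratio across a set of 2x2 strata."""
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        numerator += a * d / n
        denominator += b * c / n
    return numerator / denominator


# Punch vs. illness, stratified by cookies (did eat / did not eat)
mh_punch = mantel_haenszel_or([(35, 17, 2, 4), (5, 3, 8, 26)])
# Cookies vs. illness, stratified by punch (did drink / did not drink)
mh_cookies = mantel_haenszel_or([(35, 17, 5, 3), (2, 4, 8, 26)])

print(f"MH OR, punch adjusted for cookies: {mh_punch:.2f}")    # 4.76
print(f"MH OR, cookies adjusted for punch: {mh_cookies:.2f}")  # 1.38
```

The pooled punch estimate falls between the two stratum-specific values (4.1 and 5.4) and stays well away from 1, while the pooled cookie estimate is close to 1.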

Stratification and Effect Measure Modifiers

Effect measure modification

One stratum shows no association (OR ≈ 1) while another stratum does have an association

In this case the third variable is not acting as a confounder; rather, you need to identify it and present estimates separately for each level or stratum

Example: if gender is an effect measure modifier, you should give 2 odds or risk ratios, 1 for men and 1 for women

You identify effect measure modification by stratification (same technique used to identify confounding) but you are looking for the measure of effect to be different between the 2 or more strata
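One common way to judge whether stratum-specific odds ratios really differ is a test of homogeneity. The sketch below uses Woolf's test: a chi-square on the log odds ratios, each weighted by the inverse of its variance. The choice of test is ours (other software may report a different one, such as the Breslow-Day test), and the function is limited to two strata so that the p-value has a simple closed form.

```python
import math


def woolf_homogeneity(strata):
    """Woolf's chi-square test that the odds ratio is the same in both strata.

    Each stratum is (exposed cases, exposed controls, unexposed cases,
    unexposed controls) with no zero cells. Handles exactly two strata
    (1 degree of freedom).
    """
    if len(strata) != 2:
        raise ValueError("this sketch handles two strata only")
    log_ors, weights = [], []
    for a, b, c, d in strata:
        log_ors.append(math.log((a * d) / (b * c)))
        weights.append(1 / (1 / a + 1 / b + 1 / c + 1 / d))  # 1 / variance of ln(OR)
    pooled = sum(w * x for w, x in zip(weights, log_ors)) / sum(weights)
    chi2 = sum(w * (x - pooled) ** 2 for w, x in zip(weights, log_ors))
    df = len(strata) - 1
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, df, p


# Punch vs. illness among cookie eaters and among non-eaters
chi2, df, p = woolf_homogeneity([(35, 17, 2, 4), (5, 3, 8, 26)])
print(f"Woolf homogeneity chi-square = {chi2:.3f}, df = {df}, p = {p:.2f}")
```

A small chi-square and a large p-value, as here, give no evidence that the punch odds ratio differs between cookie eaters and non-eaters, so pooling is reasonable. With strata this small, though, the test has little power to detect effect measure modification.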

Effect Measure Modifiers--Examples

Among the elderly, gender is an effect modifier of the association between nutritional intake and osteoporosis

Nutritional intake (calcium) is associated with osteoporosis among women

Among men this association is not so strong because men’s bone mineral content is not affected as much by nutritional intake

In developing countries, sanitation is an effect modifier of the association between breastfeeding and infant mortality

In unsanitary conditions, breastfeeding has a strong effect in reducing infant mortality

In cleaner conditions infant mortality is not very different between breastfed and bottle-fed infants

Matching

Matching can reduce confounding

In case-control studies, cases are matched to controls on desired characteristics

In cohort studies, unexposed persons are matched to exposed persons on desired characteristics

You must account for matching when analyzing matched data

Important that the matched variables not be exposures of interest

Matching--Example

Hypothetical study where students in a high school have reported a strange smell and sudden illness

Test the association between smelling an unusual odor and a set of symptoms

Match cases and controls on gender, grade and hallway

There are precedents for ‘outbreaks’ of illness related to unusual odors in buildings, possibly psychogenic (i.e., illness spread by panic rather than by an actual physical cause)

Rationale for the matching variables: gender, because women are more reactive in this situation; grade, because grade level controls for age (different ages may react differently); and hallway, because location controls for the actual odor observed (different locations may produce different odors)

Matching--Example

With matched case-control pairs, a 2x2 table is set up to examine pairs

Table 1: Analysis of matched pairs for a case-control study
                          Controls
Cases            Exposed    Not exposed    Total
Exposed             e            f         e + f
Not exposed         g            h         g + h
Total             e + g        f + h

Cells e and h are concordant cells because the case and the control have the same exposure status

Cells f and g are discordant because the case and the control have different exposure statuses

Only the discordant cells give us useful data to contrast the exposure between cases and controls

Matching--Example

A chi-square for matched data (McNemar’s chi-square) can be calculated using a statistical computing program. The calculation examines the discordant pairs and results in a McNemar chi-square value and p-value

If the p-value <0.05, you can conclude that there is a statistically significant difference in exposure between cases and controls

Matching--Example

A table of discordant pairs can also be used to calculate a measure of association

Table 2: Sample data for sudden illness in a high school. Controls matched to cases on gender, grade, and hallway in the school
                          Controls
Cases            Smell    No smell    Total
Smell              6         12         18
No smell           4          5          9
Total             10         17         27
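McNemar's chi-square uses only the discordant pairs. This minimal sketch applies it to Table 2 (f = 12 pairs with an exposed case and an unexposed control, g = 4 pairs with the reverse) and shows both the uncorrected statistic, (f − g)² / (f + g), and the version with a continuity correction, (|f − g| − 1)² / (f + g). The slides do not say which version a given program reports, so both are shown.

```python
import math


def mcnemar(f: int, g: int, continuity: bool = True):
    """McNemar chi-square (1 df) and p-value from the two discordant-pair counts."""
    diff = abs(f - g)
    if continuity:
        diff = max(diff - 1, 0)
    chi2 = diff ** 2 / (f + g)
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p


f, g = 12, 4  # discordant pairs from Table 2
for corrected in (False, True):
    chi2, p = mcnemar(f, g, continuity=corrected)
    label = "with" if corrected else "without"
    print(f"McNemar {label} continuity correction: chi2 = {chi2:.2f}, p = {p:.3f}")
```

For these data the two versions fall on opposite sides of 0.05 (about 0.046 without the correction and 0.080 with it), so it is worth reporting which one was used.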

Matching--Example

Calculating the odds ratio:

OR = (# pairs with exposed cases and unexposed controls) / (# pairs with unexposed cases and exposed controls) = f / g = 12/4 = 3.0

Interpretation: The odds of having a sudden onset of nausea, vomiting, or fainting if students smelled an unusual odor in the school were 3.0 times the odds of having a sudden onset of these symptoms if students did not smell an unusual odor in the school, controlling for gender, grade, and location in the school.
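The matched-pair odds ratio can be computed directly from the discordant pairs. This sketch also adds an approximate 95% confidence interval based on the standard error of ln(f/g), which is sqrt(1/f + 1/g); that interval is our addition and is not reported on the slide.

```python
import math


def matched_pair_or(f: int, g: int, z: float = 1.96):
    """Matched-pair odds ratio f/g with an approximate (Wald) CI on the log scale.

    f = pairs with an exposed case and an unexposed control
    g = pairs with an unexposed case and an exposed control
    """
    mor = f / g
    se = math.sqrt(1 / f + 1 / g)
    lower = math.exp(math.log(mor) - z * se)
    upper = math.exp(math.log(mor) + z * se)
    return mor, lower, upper


mor, lower, upper = matched_pair_or(12, 4)
print(f"MOR = {mor:.1f}; approximate 95% CI, {lower:.2f} – {upper:.2f}")
```

With only 16 discordant pairs the interval is wide and its lower bound falls just below 1, so the point estimate of 3.0 should be read with some caution.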

Matching

An important note about matching: once you have matched on a variable, you cannot use that variable as a risk factor in your analysis

Cases and controls will have the same values of the matched variables, so those variables cannot be evaluated as risk factors

Do not match on any variable you suspect might be a risk factor

An Introduction to Logistic Regression

Logistic regression is a mathematical process that results in an odds ratio

Logistic regression can control for numerous confounders

The odds ratio produced by logistic regression is known as the “adjusted” odds ratio because its value has been adjusted for the confounders

An Introduction to Logistic Regression

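The outcome variable (sick or not sick) must be dichotomous; in the examples here the exposure variable (exposed or not exposed) is also dichotomous, so its odds ratio compares exposed with unexposed persons

Other variables (the confounders) can be dichotomous, categorical, or continuous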

An Introduction to Logistic Regression

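Logistic regression models the logit (the natural logarithm of the odds) of the outcome as a function of the exposure and the confounders; the odds ratios are calculated from this fitted equation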

Using our earlier punch and cookies example, we suspect one of these food items is confounding the other

Variables would be:

SICK (value is 1 if ill, 0 if not ill)

PUNCH (1 if drank punch, 0 if did not drink punch)

COOKIES (1 if ate cookies, 0 if did not eat cookies)

Logistic Regression--Example

General equation is: Logit (OUTCOME) = EXPOSURE + CONFOUNDER1 + CONFOUNDER2 + CONFOUNDER3 + … (etc.)

For our example:

Outcome = variable SICK

Exposure = variable PUNCH

Confounder = variable COOKIES

Equation is: Logit (SICK) = PUNCH + COOKIES
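Written out in full, the shorthand equation stands for a model with an intercept and one coefficient per variable, fitted on the log-odds scale; exponentiating a coefficient gives the odds ratio for that variable, adjusted for the others in the model. The coefficient symbols β0, β1, β2 are our notation and do not appear in the slides.

```latex
\operatorname{logit}\bigl(P(\mathrm{SICK}=1)\bigr)
  = \ln\frac{P(\mathrm{SICK}=1)}{1-P(\mathrm{SICK}=1)}
  = \beta_0 + \beta_1\,\mathrm{PUNCH} + \beta_2\,\mathrm{COOKIES}

\mathrm{OR}_{\mathrm{PUNCH}} = e^{\beta_1},
\qquad
\mathrm{OR}_{\mathrm{COOKIES}} = e^{\beta_2}
```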

Logistic Regression--Example

The computer uses the math behind logistic regression to give the results as odds ratios

Each variable on the right side will have its own odds ratio

The odds ratio for PUNCH would be the odds of becoming ill if punch was consumed compared to the odds of becoming ill if punch was not consumed, controlling for COOKIES

Odds ratio for COOKIES is the odds of becoming ill if cookies were consumed compared to the odds of becoming ill if cookies were not consumed, controlling for PUNCH
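To show roughly what the computer is doing, here is a self-contained sketch that fits Logit (SICK) = PUNCH + COOKIES by Newton-Raphson maximum likelihood on the cookies and punch data, grouped by covariate pattern. It is written in plain Python for transparency; in practice one of the software packages listed in this presentation would be used. The cell counts are cross-classified from the stratified tables, and the helper names `solve` and `fit_logistic` are our own.

```python
import math

# Covariate patterns: (PUNCH, COOKIES, number of cases, number of controls)
patterns = [
    (1, 1, 35, 17),
    (1, 0, 5, 3),
    (0, 1, 2, 4),
    (0, 0, 8, 26),
]


def solve(matrix, vector):
    """Solve matrix @ x = vector by Gauss-Jordan elimination with partial pivoting."""
    n = len(vector)
    aug = [list(row) + [vector[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit_logistic(patterns, iterations=25):
    """Newton-Raphson fit of logit(P(SICK = 1)) = b0 + b1*PUNCH + b2*COOKIES.

    Returns the coefficients and the information matrix from the last iteration
    (equal to the one at the estimates once the fit has converged).
    """
    beta = [0.0, 0.0, 0.0]
    for _ in range(iterations):
        score = [0.0] * 3
        info = [[0.0] * 3 for _ in range(3)]
        for punch, cookies, cases, controls in patterns:
            x = [1.0, punch, cookies]
            n = cases + controls
            eta = sum(b * xi for b, xi in zip(beta, x))
            p = 1 / (1 + math.exp(-eta))  # fitted probability of being a case
            for i in range(3):
                score[i] += x[i] * (cases - n * p)
                for j in range(3):
                    info[i][j] += n * p * (1 - p) * x[i] * x[j]
        step = solve(info, score)
        beta = [b + s for b, s in zip(beta, step)]
    return beta, info


beta, info = fit_logistic(patterns)
# Wald standard errors: square roots of the diagonal of the inverse information matrix
se = [math.sqrt(solve(info, [1.0 if k == i else 0.0 for k in range(3)])[i]) for i in range(3)]
for name, i in (("PUNCH", 1), ("COOKIES", 2)):
    lower = math.exp(beta[i] - 1.96 * se[i])
    upper = math.exp(beta[i] + 1.96 * se[i])
    print(f"{name}: adjusted OR = {math.exp(beta[i]):.2f}; 95% CI, {lower:.2f} – {upper:.2f}")
```

The adjusted odds ratio for PUNCH stays large while the one for COOKIES falls close to 1, which is the same conclusion reached by stratification.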

Logistic Regression: Important Points

The odds ratio for each variable on the right side of the equation is controlled for all the other variables on the right side of the equation

If you are not sure whether one of several variables is a confounder, you can examine them all at the same time

Two important warnings:

Do not put too many variables in the equation (a loose rule of thumb is one variable for every 25 observations)

You cannot control for confounders you did not measure (Example: if a child’s attendance at a particular daycare was a confounder of the SICK-PUNCH relationship, but you do not have data on children’s daycare attendance, you cannot control for it.)

Logistic Regression & Matching

Logistic regression can also account for matching in the data analysis

Known as conditional logistic regression

The computer calculates odds ratios similar to the matched-pair odds ratio used with McNemar’s test, but the results are “conditioned” on the matching variables

Can be done using Epi Info

Interpretation of matched odds ratios (MORs) using conditional logistic regression is the same as interpretation of MORs calculated from tables
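For 1:1 matched pairs with a single binary exposure, the conditional likelihood depends only on the discordant pairs, which is why conditional logistic regression reproduces the matched odds ratio calculated from the table. The sketch below maximizes that conditional log-likelihood by Newton's method for the high school data in Table 2 (6 concordant exposed pairs, 12 + 4 discordant pairs, 5 concordant unexposed pairs). The function name is our own; a real analysis would use Epi Info or another package.

```python
import math

# One (case_exposed, control_exposed) tuple per matched pair, expanded from Table 2
pairs = [(1, 1)] * 6 + [(1, 0)] * 12 + [(0, 1)] * 4 + [(0, 0)] * 5


def conditional_logit_1to1(pairs, iterations=50):
    """Conditional MLE of beta for 1:1 matched pairs with one binary exposure.

    Each pair contributes log[exp(b*x_case) / (exp(b*x_case) + exp(b*x_control))].
    """
    beta = 0.0
    for _ in range(iterations):
        score = 0.0
        info = 0.0
        for x_case, x_control in pairs:
            d = x_case - x_control  # concordant pairs (d = 0) add nothing below
            s = 1 / (1 + math.exp(-beta * d))  # probability of the observed case/control assignment
            score += d * (1 - s)
            info += d * d * s * (1 - s)
        beta += score / info  # Newton step
    return beta


beta = conditional_logit_1to1(pairs)
print(f"Conditional logistic regression MOR = {math.exp(beta):.2f}")
```

The estimate equals f / g = 12/4 = 3.0 from the matched-pair table; the advantage of the regression form is that further covariates can be added to the model beyond the matching variables.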

Logistic Regression

For many investigations you may not need to use logistic regression

Logistic regression is helpful in managing confounding variables; it is useful with large datasets and in studies designed to establish risk factors for chronic conditions, cancer cluster investigations, or other situations with numerous confounding factors

Many software packages can simplify data analysis using logistic regression

SAS, SPSS, STATA and Epi Info are a few examples

Logistic Regression: Software Packages

Common software packages used for data analysis, including logistic regression*:

SAS – Cary, NC http://www.sas.com/index.html

SPSS – Chicago, IL http://www.spss.com/

STATA – College Station, TX http://www.stata.com

Epi Info – Atlanta, GA http://www.cdc.gov/EpiInfo/

Episheet – Boston, MA http://members.aol.com/krothman/modepi.htm (Episheet cannot do logistic regression but is useful for simpler analyses, e.g., 2x2 tables and stratified analyses.)

*This is not a comprehensive list, and UNC does not specifically endorse any particular software package.

Logistic Regression--Examples

Wedding Reception, 1997 (4)

Guests complained of a diarrheal illness diagnosed as cyclosporiasis

Univariate analysis (using 2x2 tables) showed eating raspberries was the exposure most strongly associated with risk for illness

Multivariate logistic regression showed the same results

Investigators determined raspberries had not been washed

Logistic Regression--Examples

Assessing the relationship between obesity and concern about food security (5)

Washington State Dept. of Health analyzed data from the 1995-99 Behavioral Risk Factor Surveillance System

A variable indicating concern about food security was analyzed using a logistic regression model with income and education as potential confounders

Persons who reported being concerned about food security were more likely to be obese than those who did not report such concerns (adjusted OR = 1.29, 95% CI: 1.04-1.83)

Matching & Conditional Logistic Regression--Examples

Foodborne Salmonella Newport outbreak, 2002 (6)

Affected 47 people from 5 different states

Case-control study carried out; controls matched by age group

Logistic regression conducted to control for confounders

Cases were more likely than controls to have eaten ground beef (MOR = 2.3, 95% CI: 0.9-5.7) and more likely to have eaten raw or undercooked ground beef (MOR = 50.9, 95% CI: 5.3-489.0)

No specific contamination event identified but public health alert issued to remind consumers about safe food-handling practices

Matching & Conditional Logistic Regression--Examples

Outbreak of typhoid fever in Tajikistan, 1996-97 (7)

10,000 people affected in outbreak, case-control study conducted

Cases were culture positive for the organism (Salmonella serotype Typhi)

Using 2x2 tables, illness was associated with:

Drinking unboiled water in the 30 days before onset (MOR = 6.5, 95% CI: 3.0-24.0)

Using drinking water from a tap outside the home (MOR = 9.1, 95% CI: 1.6-82.0)

Eating food from a street vendor (MOR = 2.9, 95% CI: 1.4-7.2)

When all variables were included in conditional logistic regression, only drinking unboiled water (MOR = 9.6, 95% CI: 2.7-334.0) and obtaining water from an outside tap (MOR = 16.7, 95% CI: 2.0-138.0) were significantly associated with illness

Routinely boiling drinking water was protective (MOR = 0.2, 95% CI: 0.05-0.5)

Conclusion

Confounding can be controlled using a matched study design and logistic regression

Although these methods are complicated, with practice they can become as easy to use as 2x2 tables

References

1. Gregg MB. Field Epidemiology. 2nd ed. New York, NY: Oxford University Press; 2002.

2. Hecht CA, Hook EB. Rates of Down syndrome at livebirth by one-year maternal age intervals in studies with apparent close to complete ascertainment in populations of European origin: a proposed revised rate schedule for use in genetic and prenatal screening. Am J Med Genet. 1996;62:376-385.

3. Humphrey LL, Nelson HD, Chan BKS, Nygren P, Allan J, Teutsch S. Relationship between hormone replacement therapy, socioeconomic status, and coronary heart disease. JAMA. 2003;289:45. 

4. Centers for Disease Control and Prevention. Update: Outbreaks of Cyclosporiasis -- United States, 1997. MMWR Morb Mort Wkly Rep. 1997;46:461-462. Available at: http://www.cdc.gov/mmwr/PDF/wk/mm4621.pdf. Accessed December 12, 2006.

5. Centers for Disease Control and Prevention. Self-reported concern about food security associated with obesity --- Washington, 1995–1999. MMWR Morb Mort Wkly Rep. 2003;52:840-842. Available at: http://www.cdc.gov/mmwr/preview/mmwrhtml/mm5235a3.htm. Accessed December 12, 2006.