ACPM Review Course Epidemiology
Objective of session: to review the major terms, definitions and
concepts of epidemiology
Agenda
Introduction
Measures of disease frequency
Descriptive epidemiology
Measures of excess risk
Study Design
» Descriptive studies
» Case-control studies
» Cohort studies
» Clinical trials and quasi-experiments
Agenda (cont.)
Epidemiologic evidence and causal inference
Confounding and effect modification
Disease screening
Infectious disease epidemiology
Measures of Disease Frequency
Counts = number of people with a disease
Rates - account for the denominator, or size of the population, and imply a period of time
Cumulative Incidence (most commonly used as synonymous with "incidence")
= number of new cases of a disease occurring in a specified time period .
number of people initially at risk
Synonymous with
» attack rate
» risk of disease
» probability of getting disease
Incidence density
= number of new cases of a disease occurring in a specified time period .
total amount of "person-time" at risk contributed during the time period
Estimates the instantaneous rate of occurrence of disease per unit of time, relative to the size of the population at risk
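The two incidence measures differ only in their denominators (people initially at risk vs. person-time at risk). A minimal sketch in Python, using hypothetical counts (8 new cases in one year among 1,000 people at risk contributing 950 person-years); the function names are illustrative:

```python
def cumulative_incidence(new_cases, n_at_risk):
    """New cases / people initially at risk: a proportion (risk)."""
    return new_cases / n_at_risk

def incidence_density(new_cases, person_time):
    """New cases / total person-time at risk: a true rate."""
    return new_cases / person_time

ci = cumulative_incidence(8, 1000)   # 0.008 (risk per person over the year)
idr = incidence_density(8, 950)      # ~0.0084 cases per person-year
```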
Prevalence
= number of existing cases of a disease at a specified time .
size of base population at that time
A "snapshot" view of disease frequency in a population at a single point in time, important for planning and allocation of resources
"Point prevalence" refers to the prevalence at a single point in time
"Period prevalence" refers to the prevalence measured for a specific time interval
Mortality
= number of people dying in a specified time period .
average number of people alive during that period of time
Using the number dying of a specific disease in the numerator produces: disease-specific mortality
Case fatality
= number of people who die of a disease total number of people who get the disease
Measures disease prognosis rather than disease frequency
Proportional mortality
= number of deaths due to a disease in a specified time period .
total number of deaths during that time period
Can be misleading if mortality rates for other causes are unusually high or low in a group
Proportional mortality
1960’s study of Hodgkin's disease in teachers:
» 2.5% of deaths in teachers due to Hodgkin's disease
» 1.0% of deaths in general population due to Hodgkin's disease
The authors concluded that teachers were at 2.5 times higher risk for death from Hodgkin's disease
Proportional mortality
In Denver, 10% of deaths in white children under 10 years are due to leukemia, vs. 5% of deaths in black children. Which of the following are true?
» The relative risk for leukemia in white vs. black children is 2.0
» The attributable risk for leukemia in white vs. black children is 5/100
» Neither the attributable risk nor the relative risk may be determined from the data provided
Relationships between disease rates
Prevalence, incidence (density), and duration of disease are related:
prevalence = incidence X average duration of disease .
1 + (incidence X average duration of disease)
= incidence X duration (when prevalence is low, i.e., <10%)
Holds only when incidence and duration are stable over time
Useful in predicting what a change in one variable will cause in another
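A quick numeric check of this relationship, using hypothetical values (incidence 0.01 per person-year, average duration 2 years, both assumed stable over time):

```python
def prevalence(incidence, duration):
    """prevalence = (I x D) / (1 + I x D); ~ I x D when prevalence is low."""
    x = incidence * duration
    return x / (1 + x)

p = prevalence(0.01, 2)  # ~0.0196, close to the simple product 0.01 x 2 = 0.02
```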
Relationships between disease rates
Mortality, incidence, and case fatality:
mortality = incidence X case fatality
(when incidence and case fatality are stable over time)
Relationships between disease rates
Survival rate and case fatality:
1 - case fatality = survival rate
Other useful rates and measures
Birth rate =
Number of live births in a year . Population (in thousands) at midyear
Other useful rates and measures
Fertility rate =
number of live births reported in one year .
number (in thousands) of women age 15-44 years at midyear
Some sources calculate total fertility rate by summing births/women for 5-year age categories
Some sources extend the age range to 10-49 years
Other useful rates and measures
Fetal death rate (stillbirth rate) =
annual number of fetal deaths (gest. age 20 wks/350 g) annual number of fetal deaths plus live births (in thousands)
Other useful rates and measures
Neonatal death rate =
annual number of deaths in the first 28 days of life annual number of live births (in thousands)
Other useful rates and measures
Perinatal death rate =
annual fetal deaths plus deaths in the first 7 or 28 days of life .
annual number of fetal deaths* plus live births (in thousands)
*Defined as either after 20 or 28 weeks gestation
Other useful rates and measures
Infant death rate =
annual number of deaths in the first year of life annual number of live births (in thousands)
Years of Potential Life Lost (YPLL)
A measure of premature mortality, YPLL takes into account not only the cause of death but the age of occurrence
Calculated by multiplying the number of cause-specific deaths in each age group by the difference between the midpoint of the age group and age 75 (or the average age at death), then summing across age groups
Years of Potential Life Lost

Age group                       0-10       11-20      21-30
# deaths                        2          9          4
# years lost between age 75
and midpoint of interval        75-5=70    75-15=60   75-25=50
total # years lost              70x2=140   60x9=540   50x4=200

TOTAL YPLL = 140+540+200 = 840
Life expectancy and life-time risk
Average age at death given live birth
Additional life expectancy changes as you age (and don’t die young)
Age-specific life expectancy comes from life-table (survival) analysis
Be careful of cumulative life-time risk estimates that don’t give you the life expectancy on which they are based
Descriptive epidemiology
Practical uses of descriptive epidemiology data:
» To provide clues to etiology and means of prevention
» To help target screening efforts
» To aid in diagnosis
» To aid in the planning of health services
» To provide baseline data
Sources of numerator data for descriptive epidemiology
Vital records (birth and death certificates)
Disease reports (for example, reportable diseases, tumor registries)
Medical records
Surveys
Numerator data
Problems can arise with numerator data, ranging from difficulties in defining a case to variability in the methods used to identify cases
Sources of denominator data
Census/vital statistics records
Enrollment records (health plans, industry or union records, alumni rosters, etc.)
Framework for reporting descriptive data
What kinds of people get the disease?
» Age: immunity to infectious diseases; slowly developing diseases; diseases with long latency periods (the time between exposure to a causative agent and onset of disease); environmental exposures that vary with age
» Gender: anatomic and physiologic differences which affect susceptibility; many differences in life style, environmental exposures
What kinds of people get the disease?
» Race/ethnicity: genetic differences in susceptibility; frequent association with socio-economic status, life style
» Socio-economic status: nutritional factors; life style; adequacy of medical care
» Occupation
» Marital status
» Other (factors important for some diseases and not others)
Where is the disease common or rare, and what are the characteristics of those places?
Physical environment
Man-made exposures that vary by place
How does the frequency of the disease change over time?
--and, what historical factors appear to correlate with those changes?
Short-term trends; disease outbreaks
Secular (long-term) trends
Cyclical variations
Cohort Effect
Age and time may interact to produce a "cohort effect", a point-in-time, cross-sectional observation that reflects variation in disease rates based on year of birth, and variation in disease rate within each cohort by age
Suspect a cohort effect whenever you see an unexpected decline in disease rates in older age groups
Cohort effect

[Figure: lung cancer mortality plotted against age (30-90 years), with curves labeled by birth cohort (1910-1960); the apparent decline in the cross-sectional curve at older ages reflects lower rates in the earlier birth cohorts rather than a true decline with age]
Measures of excess risk
The need to identify individuals at increased risk for contracting a disease pervades all aspects of medicine
Prevention
Who is at risk and should be targeted for primary and secondary prevention?
Diagnosis
Given the characteristics of (and therefore the constellation of risks for) a given individual presenting with a certain symptom complex, what is the most likely diagnosis?
Management
Once a patient is known to have a given condition, for what is he/she at further risk?
What characteristics put an individual at risk for an adverse reaction to a potential therapeutic intervention?
Two-by-two table for the computation of excess risk
                        Disease
                  Present    Absent
Risk   Present       a          b
Factor Absent        c          d
Relative risk and odds ratio
Relative risk (risk ratio, rate ratio) is the ratio of the incidence of a condition in the group of individuals with a specific characteristic (a "risk factor") to the incidence in the group of individuals without the risk factor
Relative risk
Relative risk (RR) = incidence in the exposed .
incidence in the unexposed
= {a/(a+b)} / {c/(c+d)}
Odds ratio
The relative risk can be estimated by the odds ratio
The odds ratio is used in case-control studies, where individuals with a disease are compared to individuals without the disease for the presence or absence of a risk factor
Odds ratio
Note that, in the computation of the relative risk, if the disease is rare, then a would be small compared to b, c would be small compared to d, and the relative risk could be approximated by the cross-product of the two-by-two table:
Odds ratio (OR) = (a/b) / (c/d) = ad/bc
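A sketch of both measures from the a/b/c/d cells of the two-by-two table, with hypothetical counts (a=30, b=70, c=10, d=90) chosen to show that the OR overstates the RR when the disease is not rare:

```python
def relative_risk(a, b, c, d):
    """{a/(a+b)} / {c/(c+d)} -- requires incidence data (cohort design)."""
    return (a / (a + b)) / (c / (c + d))

def odds_ratio(a, b, c, d):
    """Cross-product ad/bc -- computable from case-control data."""
    return (a * d) / (b * c)

rr = relative_risk(30, 70, 10, 90)  # ~3.0
orr = odds_ratio(30, 70, 10, 90)    # ~3.86 -- larger, since disease isn't rare here
```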
Attributable risk
Attributable risk, (risk difference, rate difference) is the difference between the incidence of the disease in individuals with a risk factor and in those without
Accounts for the baseline incidence of disease and gives the absolute amount of excess risk an individual incurs from the exposure
Requires a prospective study in order to produce incidence rates, and is more difficult to interpret
Attributable risk
Attributable risk (AR) = incidence in exposed - incidence in unexposed
= {a/(a+b)} - {c/(c+d)}
Number needed to treat (NNT)
Represents a more interpretable transformation of the attributable risk
Can be the number needed to screen, number needed to treat, or number needed to harm
NNT = 1/Attributable risk
“You’d have to treat (NNT) people to gain one additional outcome”
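A minimal sketch of the AR-to-NNT transformation, with hypothetical incidences (0.06 in the exposed, 0.02 in the unexposed):

```python
def attributable_risk(inc_exposed, inc_unexposed):
    """Absolute excess risk in the exposed (risk difference)."""
    return inc_exposed - inc_unexposed

def nnt(inc_exposed, inc_unexposed):
    """Number needed to treat = 1 / attributable risk."""
    return 1 / attributable_risk(inc_exposed, inc_unexposed)

ar = attributable_risk(0.06, 0.02)  # ~0.04
n = nnt(0.06, 0.02)                 # ~25 people treated per additional outcome gained
```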
Attributable risk percent
Attributable risk percent (attributable rate percent, attributable proportion, attributable fraction, etiologic fraction) is the proportion of disease among those with a risk factor that is due to the risk factor
Tells you the amount of disease in the exposed group that is due to the exposure
Attributable risk percent
Attributable risk percent (AR%)
= {(incidence in exposed - incidence in unexposed) / incidence in exposed} X 100%
= [{a/(a+b)} - {c/(c+d)}] / {a/(a+b)} X 100%
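With hypothetical incidences of 0.06 in the exposed and 0.02 in the unexposed, the AR% works out to about two-thirds:

```python
def ar_percent(inc_exposed, inc_unexposed):
    """Proportion of disease among the exposed that is due to the exposure."""
    return (inc_exposed - inc_unexposed) / inc_exposed * 100

pct = ar_percent(0.06, 0.02)  # ~66.7% of disease in the exposed is attributable
```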
Population attributable risk
Population attributable risk is the rate of disease in the population that is due to the exposure
Population attributable risk
Population attributable risk (PAR) = total incidence - incidence in unexposed
= {(a+c)/(a+b+c+d)} - {c/(c+d)}
Population attributable risk percent
Population attributable risk percent is the proportion of cases of disease that is due to a given risk factor
The PAR% is the amount of disease that would be prevented if the risk factor could be eliminated from the population
Population attributable risk percent
Population attributable risk percent (PAR%)
= {(total incidence - incidence in unexposed) / total incidence} X 100%
= [{(a+c)/(a+b+c+d)} - {c/(c+d)}] / {(a+c)/(a+b+c+d)} X 100%
Population attributable risk percent
PAR% can also be calculated from case-control studies using the odds ratio:
PAR% = {(OR - 1)/OR} X {a/(a+c)} X 100%
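A sketch of the PAR% computed both ways: from cohort-style 2x2 cells and from a case-control odds ratio. The counts (a=30, b=70, c=10, d=90) and the OR fed to the second function are hypothetical:

```python
def par_percent(a, b, c, d):
    """(total incidence - unexposed incidence) / total incidence x 100%."""
    total_inc = (a + c) / (a + b + c + d)
    unexposed_inc = c / (c + d)
    return (total_inc - unexposed_inc) / total_inc * 100

def par_percent_from_or(odds_ratio, a, c):
    """Case-control version: {(OR-1)/OR} x {a/(a+c)} x 100%."""
    return (odds_ratio - 1) / odds_ratio * (a / (a + c)) * 100

pct = par_percent(30, 70, 10, 90)  # ~50%: half the disease burden is due to exposure
```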
Study Design
Bias
The goal of science is to be accurate in the discovery, description, and measurement of the truth
Bias is a systematic deviation of study measurements, results, or inferences from the truth
The “internal validity” of a study relates to the minimization of bias, so that the study result can most confidently be assigned to the factors under study
Types of bias
Measurement bias
Recall bias
Selection bias
“CBC” of research evaluation
Could the findings be due to Chance (random error)?
Could the findings be due to Bias?
Could the findings be due to Confounding?
Overview diagram of research designs

Descriptive (hypothesis generating)
» Describe something
  - case report
  - case series
» Describe how something varies (by time, place)
  - descriptive epidemiology studies

Analytic (hypothesis testing)
» Observational Studies (non-experimental)
» Intervention Studies
  - Randomized Controlled Trial (true experiment, clinical trial)
  - Quasi-Experiment

Observational Studies
» Before-after Study (pretest-posttest study)
» Correlational Study (ecologic study)
» Cross-sectional Study (prevalence study, survey)
» Case-control Study (Trohoc study, case-referent study, case-compeer study, retrospective study)
» Cohort Study (follow-up study, longitudinal study, incidence study, causal-comparative study, prospective study)
  - Prospective (concurrent, futuristic cohort study)
  - Retrospective (historical, retrospective-prospective cohort study)
Hierarchy of Clinical Study Designs
Descriptive studies
Observational studies
Intervention (experimental) studies
Summary or extrapolation studies
» Meta-analysis
» Decision analysis
Descriptive studies
Usually lack hypothesis in advance
Usually satisfied with establishing non-causal associations
Case reports, case series
Descriptive reports (example: vital statistics reports, hospital discharge data, physician pharmacy profiles)
Analytic Studies
Usually have hypothesis specified in advance
Usually intended to establish causal associations
Research study structure
The variation in an exposure (a drug, environmental exposure, health habit, heritable gene, etc.) is associated (usually causally) with the variation in an outcome (disease, morbidity, mortality, quality of life, health care utilization, etc.)
Simplest format for reviewing research structure is the 2X2 table
Research study structure
                 Outcome
               Bad    Good
Exposure  Yes   a      b
          No    c      d

Designs vary primarily on how the cell subjects are obtained
Observational Studies
Before-after, Ecologic (correlational), Cross-sectional
Case-control (retrospective)
Cohort
» prospective cohort (prospective, follow-up)
» retrospective cohort (historical prospective)
The “lesser” observational studies
Before-after: outcomes for an individual or a population are compared before and after a known exposure or event
Requires few resources
Source of bias: unclear what would have happened without the intervention
The “lesser” observational studies
Ecological (Correlational): rates of exposure and outcomes for different populations are compared
Source of bias: the individuals with the exposure aren’t necessarily the ones with the outcome (“the ecological fallacy”)
The “lesser” observational studies
Cross-sectional: individual exposures and outcomes are determined at the same point in time for a population
Gives prevalence of outcomes and exposures, but statistically inefficient
Source of bias: can be difficult to establish temporal relationship between exposure and outcome
Case-control studies
Subjects with an outcome are compared to those without, comparing their prior exposure histories
The increased risk of having an exposure based on the outcome is interpreted as an increased risk of outcome as a result of exposure
Example: people with and without prostate cancer are compared to see if they had a vasectomy
Case-control studies

THE PAST                            TODAY
Exposure (a) / No exposure (c)  ←  Disease
Exposure (b) / No exposure (d)  ←  No disease
Situations where case-control studies are favored:
Disease under study represents a rare outcome event (only way to study some diseases)
Intent to study multiple potential risk factors for a single outcome
Not much is known about a disease, but there are associational suspicions and hypothesis-generating studies are needed
Expensive to diagnose or detect outcome in study individuals
Long latent period between exposure and outcome
Resources and time are limited
Steps in case-control studies
Define the hypothesis(-es)
Select your cases
Select your controls
Ascertain exposure status, as well as status on important confounders
Case selection
When possible, cases should consist of incident (newly-arising) rather than prevalent (existing) cases
Over-sampling of cases of long duration may tend to bias the results, describing factors that influence prognosis or survival rather than etiologic factors
If disease under study is very rare, you often must use prevalent cases to get enough to study
Sources for case selection
Representative of cases arising in a defined population (either representative or total/near total sample of cases)
Cases not arising from or representative of a defined population. (N.B.--cases must be 'unselected' within source--either choose all cases or select randomly)
Problems in case selection
Misclassification of disease status
Combination of heterogeneous outcomes in defining cases
Selection of controls
"Rule of thumb"--choose control subjects who, if they had gotten the disease under study, would have been eligible for case selection
Sources for controls
If cases represent all cases in a defined population, select controls from the non-diseased members of the same population
If cases are not from a defined population, select controls from individuals receiving care from the same source
Neighbors, friends, or family members
Persons who underwent the same case-finding procedure as cases, but were found to be disease-free (e.g., persons with a negative diagnostic procedure)
Matching in control selection
Matching controls represents an alternative to control of confounding factors in the analysis stage (i.e., by adjustment) by establishing criteria for selection of controls which prevent certain extraneous factors from being considered
The major disadvantage is that you can not evaluate the relationship between the matched variable and the outcome
Ascertainment of exposure
Data sources
Direct from subjects via interviews, questionnaires
Proxy respondents--next of kin, household members--this method is necessary when the disease is rapidly fatal
Pre-existing records--includes vital statistics data tapes, laboratory results, other medical records
Comments on exposure ascertainment
Need comparable ascertainment between cases and controls
Response rates should be high and similar for cases and controls
Similar considerations of ascertainment apply to measurement of exposure to confounding factors
Analysis of case control studies
Relative risk estimation is done by calculating the relative odds (odds ratio)
For rare outcomes (i.e., prevalence less than 10%), the odds ratio will be very close to the relative risk that would have been obtained from a similar cohort study
Sources of potential bias in case-control studies
Selection bias:
» Representativeness of the case group
» Appropriateness of the control group (especially if study is not population-based)
» Detection bias (unmasking bias)--results when the identification of cases varies with exposure status
Selection bias
Am J Epidemiol 1983;117:326-334
Hospital vs. population controls in evaluating the association between artificial sweeteners and bladder cancer
OR with hospital controls: 0.8-0.9
OR with community controls: 1.1-1.2
Sources of potential bias in case-control studies
Information bias
» Misclassification (including heterogeneous outcomes)
» Differential reporting of exposure data (including recall bias)
Confounding (especially if the confounding variable was not anticipated and measured)
» In case-control studies, the factor need not be a risk factor for the disease if it influences selection probability differentially in cases and controls
Misclassification and effects of heterogeneous outcomes: thrombotic stroke and OC use
Assume the “truth” in a study of 200 stroke patients and 200 controls is that 50% of thrombotic stroke patients and 10% of controls use OCs:
        Case    Control
+OC     100     20
-OC     100     180

OR = (100 X 180)/(100 X 20) = 9
Suppose that, as the study was conducted, 20 cases of non-thrombotic stroke (with OC use the same as controls, 10%) were included as cases:
        Case      Control
+OC     100+2     18
-OC     100+18    162

OR = (102 X 162)/(118 X 18) = 7.8
--inclusion of non-related cases dilutes the odds ratio
Suppose there were 20 cases of thrombotic stroke undetected (misclassified as controls) who had OC use the same as the detected cases (50%):
        Case    Control
+OC     90      20+10
-OC     90      180+10

OR = (90 X 190)/(90 X 30) = 6.3
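The three odds ratios in this thrombotic stroke/OC example can be checked directly from the cross-product (a sketch; the cell counts are those shown on the slides):

```python
def odds_ratio(a, b, c, d):
    """Cross-product ad/bc from a case-control 2x2 table."""
    return (a * d) / (b * c)

true_or = odds_ratio(100, 20, 100, 180)  # 9.0 -- the "truth"
diluted = odds_ratio(102, 18, 118, 162)  # ~7.8 -- non-related cases dilute the OR
missed = odds_ratio(90, 30, 90, 190)     # ~6.3 -- cases misclassified as controls
```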
Bias in observational studies

Misclassification                    Effect on OR
Exposed cases as controls            underestimate
Exposed controls as cases            overestimate
Exposed cases as unexposed           underestimate
Exposed controls as unexposed        overestimate
Unexposed cases as controls          overestimate
Unexposed controls as cases          underestimate
Unexposed cases as exposed           overestimate
Unexposed controls as exposed        underestimate
Factors that inhibit finding associations in case-control studies:
More than one causal pathway (allows some cases to not have the exposure)
Other components of causal pathway absent (allows some non-cases to have the exposure)
Insufficient variation in exposure
Too much confounding from other factors
Cohort studies
Subjects with and without the exposure are followed and outcomes are compared
Exposed and unexposed subjects can come from the same group (geographically and temporally) or from different groups
Can be done either prospectively or retrospectively (as long as the study group can be assembled on the basis of exposure status independent of outcome status)
Cohort studies

TODAY                 THE FUTURE (prospective)
THE PAST              TODAY (retrospective)
Exposed        →   Disease (a) / No disease (b)
Not exposed    →   Disease (c) / No disease (d)
Situations where cohort studies are favored:
Risk factor represents a rare event
Intent to study the multiple potential outcomes of a single exposure
Necessary if incidence rates are needed from the study
Necessary if limitations make other designs infeasible
Steps in cohort studies
Define the hypothesis(-es)
Select study population(s) (exposed and comparison groups)
Exclude subjects not at risk
Ascertain exposure (including confounders)
Monitor for and ascertain outcome
Study population selection
Study population is a group with a special exposure, or one that shares a geographic and/or temporal commonality
Study population is a group for which there are special data resources available
Some other available, identifiable population
Exclude subjects not at risk
(Those who have the disease or cannot get the disease)
This prevents significant bias in the results
Should be done at recruitment if follow-up information is expensive or difficult to collect
May be done at analysis if follow-up data are readily available
Ascertainment of exposure
Need to collect data on both main exposure(s) of interest as well as other potential confounding factors
Methods/sources
» Direct from subjects via surveys and/or examinations
» Abstract from available records
» Other, such as environmental exposures
Select groups for comparison
Rule of thumb--if not for the presence or absence of exposure, the groups should look like two random samples from the same "universe"
Comparison group can be internal group from the cohort without the exposure or varying levels of the exposure (most common design)
Select groups for comparison
Select concurrently studied comparison groups, such as workers in a similar occupational category as the exposed group
Comparison can be made with population rates/other published rates--very common in retrospective occupational cohort studies; presents certain problems with validity
Ascertainment of outcome
Methods/sources
» Vital records, such as death certificates
» Other available records, such as hospital discharge data, medical records, disease registries
» Directly from study cohort (follow-up surveys and/or examinations)
If possible, should "blind" observers to exposure status of each subject when ascertaining outcome
Analysis
Computation of the excess risk associated with the exposure involves determining and comparing the incidence or mortality in each comparison group
Outcome measures include relative risk, attributable risk, and population attributable risk percent
Sources of bias in cohort studies
Selection bias can occur in the formation of exposure groups
» Those choosing to be exposed may be different from those who don’t
» There may be other factors related to outcome that determined why one group was exposed and the other wasn't
Completeness of follow-up is a major source of potential bias, especially if loss to follow-up is unequal in exposure groups
Sources of bias in cohort studies
Ascertainment bias can occur, especially if method of determining outcome does not include blinding to exposure status
Confounding
Retrospective cohort studies
Defined as a study in which the cohort is assembled retrospectively, exposure data for this cohort are determined retrospectively, and outcome is determined now
Can retain the same level of rigor as a prospective cohort study
Retrospective cohort studies
Relies on the ability to assemble a cohort based on some common past experience and on the availability of the necessary exposure data collected without bias on all members of the cohort
While much faster and cheaper, loss to follow-up usually represents a formidable problem
Other problems (retrospective cohort)
Sample sizes are usually smaller
Misclassification of either exposure or outcome is potentially a greater problem
Data on exposure status for potential confounders may not be available
Nested studies
Cohort studies provide the opportunity to perform nested case-control studies
Once sufficient outcome endpoints have accrued, diseased individuals can be compared with those free of disease, and exposure status can be determined retrospectively
Most often used when a potential confounder is identified in the analysis as an important determinant of excess disease risk
Intervention studies
Randomized controlled trials (RCTs)
Natural experiments
Group randomization trials (GRTs)
Quasi-experimental studies
» Before-after
» Non-equivalent control group
» Interrupted time series
» Regression discontinuity
RCTs
Essentially the same as a cohort study, except the investigator decides who gets the exposure, using random assignment
Strongest study design of all, maximizing internal validity (usually at the expense of external validity)
Maximizes internal validity by promoting the equal distribution of potential confounders into exposed and unexposed groups
Steps in RCTs
Define the hypothesis
Select study subjects
Randomly allocate subjects to intervention groups
“Blind” subjects whenever possible
“Blind” investigators whenever possible
Follow and ascertain all relevant outcomes; monitor for adverse effects and stopping rules
Select study subjects
Requires informed consent and strict inclusion and exclusion criteria
Pre-randomization visits are used to help ensure successful participation
These design elements likely make the study population non-representative
Allocate subjects to intervention and/or control groups
Assignment must be random
Block randomization can be used to ensure equal distribution of important confounders
Quasi-random techniques must produce assignments that introduce no sources of bias
Blinding
Subjects should be blinded (e.g. with placebo treatment) to which study group (experimental vs. control) they are in whenever possible to guard against placebo effects and cross-over
Allocation concealment is important
Outcome assessment
Those ascertaining outcomes should be blinded to the subject's study group whenever possible to guard against investigator bias (double-blind trial)
Consider all relevant outcomes
Safety monitoring for adverse effects
Stopping rules (the point at which the results become statistically significant and ethically the trial should be ended)
Analysis
RCTs must be analyzed using the "intent-to-treat" approach: subjects must be analyzed as belonging to the group to which they were first randomized, even if they cross over to the other group
Re-assigning cross-overs invalidates the RCT, as those that cross-over are likely to be quite different from those who don't
Confounding variables must be compared across groups, as errors of randomization can lead to imbalances in the distribution of these factors
Analysis
Often RCT's compare continuous variables between groups: recognize that in general, it takes a much smaller difference to be statistically significant with continuous variables than with categorical variables
For example, do you want to know the difference in diastolic blood pressure between two groups, or do you want to know how many subjects in each group become normotensive?
Sources of bias
Errors of allocation
Ascertainment of outcomes
Loss to follow-up
Inclusion of all relevant outcomes
Cross-overs, intent-to-treat analysis
Selection bias (now relates to external validity/generalization)
Errors of randomization and confounding
Issues in RCTs
In general, lengthy and very expensive
Ethical and legal issues are important
Blinding can be difficult, cross-overs may be common, and drop-outs and loss to follow-up are major problems
Strong internal validity is achieved at the expense of external validity; your study groups may not generalize to any other population
Power issues become important when effect sizes are inadequate to reach statistical significance
Still, the gold standard of research design
Natural experiments
Researcher does not determine the group receiving the intervention, which occurs "naturally" or under control of some other process
Can address problems where RCT interventions are unrealistic as to what can be implemented and sustained in natural setting
Natural experiments
Can be used for rapid evaluations of innovative, expensive, or complex interventions or policy changes in natural settings
Worse internal validity; generalizability may still be limited (but is better than an RCT's)
Internal validity can be improved with more data points pre and post intervention (as in time series analysis)
Group randomized trials
Study design where the unit of assignment is an identifiable group, allocated to different exposures; the units of observation are the members of the group
Randomization provides the assumption of independence at the group level, with a desire that potential sources of bias are fairly distributed across the study exposures
Group randomized trials
There is extra variation attributable to groups, which increases the standard error of the intervention effect
This is worsened when the number of groups is small, limiting degrees of freedom and resulting in more problems with intraclass correlation
Quasi-experimental studies
These studies are those in which the researcher cannot or does not assign interventions randomly to participants, but may have some control over:
» who gets the intervention
» when the intervention is given, and/or
» when measurement occurs
Quasi-experimental studies
Study design needs to match the question under study:
There are instances where an RCT will not be the best design (when you cannot address generalizability), when an RCT is not feasible, and when an RCT is not appropriate
Quasi-experimental studies
There are two major concerns with quasi-experimental designs:
» additional sources of bias not controlled for by random assignment (especially selection bias)
» intraclass correlation that may be responsible for the observed effect, unrelated to the intervention (confounding)
Classic example: before-after study design—has same sources of bias as the observational before-after study
Quasi-experimental designs
Non-equivalent control group design
Study groups are assembled in a non-randomized fashion intended to minimize unequal distribution of important confounders, and the researcher decides which group(s) gets the intervention

Group A    O1  X  O2
Group B    O1     O2
Quasi-experimental designs
Time series design (interrupted time series, with or without non-equivalent control group; with a control group = multiple time-series design).
Represents a refinement of the pre-post study design in that multiple measurements over time give more of a sense of whether post-intervention differences can be assigned to the intervention
Multiple time series design
Study group:   O1 O2 O3 O4 X O5 O6 O7 O8
Control group: O1 O2 O3 O4   O5 O6 O7 O8
Quasi-experimental designs
Regression Discontinuity A kind of time series analysis, where units are
assigned to a condition based on a cutoff score on a measured covariate; for example, for a smoking cessation intervention, communities that exceed a certain cutoff for packs of cigarettes sold are given the intervention, and communities below that cutoff are the comparison
Regression discontinuity
The treatment effect is measured as the discontinuity between treatment and control regression lines at the cutoff point (not the group mean difference)
When properly implemented and analyzed, regression discontinuity gives an unbiased estimate of the treatment effect
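The mechanics can be illustrated with a small simulation. This is a hedged sketch (the function name, simulated data, and true effect of -5 are all invented for illustration): fit a separate regression line on each side of the cutoff and read the treatment effect off the gap between the lines at the cutoff.

```python
import numpy as np

def rd_effect(x, y, cutoff):
    """Estimate a regression-discontinuity treatment effect.

    Fits separate least-squares lines to the treated (x >= cutoff) and
    control (x < cutoff) units and returns the gap between the two
    fitted lines evaluated at the cutoff.
    """
    x, y = np.asarray(x, float), np.asarray(y, float)
    treated = x >= cutoff
    bt = np.polyfit(x[treated], y[treated], 1)    # (slope, intercept), treated side
    bc = np.polyfit(x[~treated], y[~treated], 1)  # (slope, intercept), control side
    return np.polyval(bt, cutoff) - np.polyval(bc, cutoff)

# Simulated example: communities are assigned to the intervention when
# tobacco sales exceed a cutoff of 50; the simulated true effect is a
# 5-point drop in smoking prevalence.
rng = np.random.default_rng(0)
sales = rng.uniform(0, 100, 400)
prevalence = 10 + 0.2 * sales - 5 * (sales >= 50) + rng.normal(0, 1, 400)
effect = rd_effect(sales, prevalence, 50)  # close to -5
```

Note the effect is measured at the cutoff only, matching the definition above: the discontinuity between the regression lines, not a difference in group means.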
Regression discontinuity
[Figure: smoking prevalence plotted against tobacco sales, with separate regression lines for the control and intervention groups; the treatment effect appears as the discontinuity between the two lines at the intervention cutoff]
Causal association Causation has its roots in the Koch-Henle Postulates
Koch-Henle covers diseases for which the cause is necessary and sufficient; usually restricted to infectious agents
These postulates are not appropriate when disease entities are multifactorial, such as most chronic diseases
Diseases not caused by infectious agents usually require considering etiologic factors that are sufficient but perhaps not necessary, or even not sufficient but contributory
Causal relationships
Steps in determining cause: » Investigate the statistical and temporal
association » Eliminate alternatives through research
Often epidemiology must be satisfied with determining “causal association”
Causal Association
The association should be strong
The association should be consistent
The association should show a dose-response relationship (biological gradient): strongest where you expect it to be
The exposure should precede the outcome
The association should be biologically plausible, including supportive data from other sources
Confounding
A distortion of the true relationship between a given exposure and a given outcome, resulting from a mutual relationship with one or more extraneous factors
The effect of the extraneous factor(s) can account for all or part of the observed relationship between exposure and outcome, or mask an underlying relationship
Confounding
Exists when the association between an exposure and an outcome is due, in whole or in part, to the mutual association with a third variable
Example: hot flashes are associated with endometrial cancer, through the mutual association with estrogen use
Confounding
Expressed diagrammatically:

Exposure of interest ------ Outcome of interest
            \                /
          Confounding factor
Criteria for confounding (need both):
The potential confounder must be associated with the outcome of interest: » the confounder is an actual risk factor for
the outcome » the confounder affects the likelihood of
recognizing the outcome The potential confounder must be
associated with the exposure of interest but not be a result of the exposure
When is a factor not a confounder?
If an individual's status regarding the confounder is a result of the exposure under study, or the confounder is in the "causal pathway" between exposure and outcome
If an individual's status regarding the confounder is a result of the disease under study
If the confounder is essentially measuring the same thing as the exposure
If the association between the confounder and the outcome of interest is thought to be due to chance
How do you detect confounding?
Determine if the potential confounder is associated with both exposure and outcome
Adjust for the potential confounder in analyzing the data: if there is a difference between the adjusted and unadjusted estimates of the effect of the exposure, then the potential confounder is a true confounder
How do you account for or control for confounding?
Prevent confounding through the study design: » Restriction: study only the subjects in a given
category » Matching: match individuals for comparison on the
basis of their status regarding the confounder (and use a matched analysis)
» Use a RCT design: randomly allocate subjects to exposed and unexposed groups (note that confounding can still happen by chance)
How do you account for or control for confounding?
Remove effects of confounding in the analysis: » Report stratum-specific rates: list the effect of the exposure on the outcome for each level of the confounder
» Use rate adjustment to account for the confounder » Use Mantel-Haenszel methods to account for the
confounder » Use regression analysis (especially attractive if you
need to consider several coincident confounders)
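Of these, the Mantel-Haenszel summary is simple enough to sketch directly. A minimal version follows; the stratified 2x2 counts are hypothetical (here, strata of a confounder such as gender), chosen only to show the mechanics.

```python
def mantel_haenszel_or(strata):
    """Mantel-Haenszel summary odds ratio across strata of a confounder.

    Each stratum is a 2x2 table (a, b, c, d) laid out as:
        a = exposed cases,   b = exposed non-cases,
        c = unexposed cases, d = unexposed non-cases.
    """
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den

# Hypothetical data stratified by gender
strata = [(45, 47, 9, 16), (5, 3, 42, 33)]
or_mh = mantel_haenszel_or(strata)  # a weighted summary of the stratum ORs
```

The summary OR lies between the stratum-specific ORs, weighting each stratum by its size; it is only appropriate when the stratum ORs are reasonably similar (i.e., no strong effect modification).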
Effect modification
Occurs when the effect of a risk factor on an outcome is different at different levels of a third factor; the third factor is known as an effect modifier
Note that compared to the definition of confounding, effect modification says nothing about the relationship between the outcome and the effect modifier
Effect modification
The most common effect modifiers seen are age and gender » For example, the effect of estrogen on
heart disease risk is very different for men and women; hence gender is an effect modifier
It is usually not appropriate to summarize over the strata of an effect modifier
Confounding and effect modification
Males          CAD    No CAD
 High chol      45      47
 Normal          9      16      OR = 1.7
Females
 High chol       5       3
 Normal         42      33      OR = 1.3
Pooled
 High chol      50      50
 Normal         51      49      OR = 0.96
Detecting confounding
           CAD    No CAD
Male        54      63
Female      47      36      OR = 0.66

        High chol   Normal
Male        92        25
Female       8        75    OR = 34.5

Gender is related both to the outcome of interest and to the risk factor of interest and therefore meets the criteria of a confounder
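Both checks run directly on the 2x2 counts from the preceding slides; a minimal sketch (the function name is illustrative):

```python
def odds_ratio(a, b, c, d):
    """Odds ratio from a 2x2 table: a = exposed cases, b = exposed
    non-cases, c = unexposed cases, d = unexposed non-cases."""
    return (a * d) / (b * c)

# Gender-specific and pooled ORs for high cholesterol and CAD
or_male = odds_ratio(45, 47, 9, 16)     # ~1.7
or_female = odds_ratio(5, 3, 42, 33)    # ~1.3
or_pooled = odds_ratio(50, 50, 51, 49)  # ~0.96: distorted by confounding

# Criteria for confounding: gender is associated with the outcome...
or_gender_cad = odds_ratio(54, 63, 47, 36)   # ~0.66
# ...and with the exposure
or_gender_chol = odds_ratio(92, 25, 8, 75)   # 34.5
```

The pooled OR (0.96) differs sharply from both stratum-specific ORs (1.7, 1.3), which is exactly the adjusted-versus-unadjusted discrepancy that flags a true confounder.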
Rate adjustment
Rate adjustment is a technique that provides a single number (an adjusted rate) for each of two or more comparison groups; the adjusted rate summarizes the group's experience without being influenced by a confounding variable when the comparison is made
Rate adjustment
Rate adjustment is most often used to account for the effect of age (age adjustment); it is also commonly used to adjust for gender, and could be used to adjust for any confounder or group of confounders
In rate adjustment, you essentially estimate how the comparison would turn out if the groups had the same distribution of the confounder (e.g., the same age distribution)
Direct adjustment
Choose a standard population whose age distribution is known, such as the US population
Calculate the number of events which would be expected to occur in each age group of the standard population using the age group-specific rates from each comparison group
Direct adjustment
Calculate the age-adjusted event rate for each comparison group:
Age-adjusted rate = Expected events in standard pop'n (using rates from comparison group) / Total in standard population
Direct adjustment
Interpret these rates as the expected event rate for each comparison group if the group population had the same age distribution as the standard population
Note that the age-adjusted rates will vary depending on the age distribution of the standard population used for the standardization
Direct adjustment If all you want to do is compare the event
rates in the comparison groups then what standard population you choose is not important » dividing the two rates will give you an age-
adjusted relative risk If you want to use the age-adjusted rates as
descriptive tools in other comparisons, you will want to use a standard such as the U.S. population
Direct adjustment
             US pop’n      CO rate of    Expected    WA rate of    Expected
                           disease       cases       disease       cases
Age 40-50    10 million    1/10,000      1,000       1.2/10,000    1,200
Age 50-60    10 million    2/10,000      2,000       2.3/10,000    2,300
Total        20 million                  3,000                     3,500
Direct adjustment
Age adjusted rate: Colorado 3,000/20,000,000 = 1.5/10,000
Age adjusted rate: Washington
3,500/20,000,000 = 1.75/10,000
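The direct-adjustment arithmetic can be sketched in a few lines of Python (names are illustrative); applying each state's age-specific rates, per 10,000, to the standard population reproduces the adjusted rates above:

```python
def direct_adjusted_rate(standard_pops, group_rates):
    """Directly age-adjusted rate: apply the study group's age-specific
    rates to the standard population's age distribution."""
    expected = sum(n * r for n, r in zip(standard_pops, group_rates))
    return expected / sum(standard_pops)

standard = [10_000_000, 10_000_000]      # US population, ages 40-50 and 50-60
co_rates = [1 / 10_000, 2 / 10_000]      # Colorado age-specific rates
wa_rates = [1.2 / 10_000, 2.3 / 10_000]  # Washington age-specific rates

co_adj = direct_adjusted_rate(standard, co_rates)  # 1.5 per 10,000
wa_adj = direct_adjusted_rate(standard, wa_rates)  # 1.75 per 10,000
```

Dividing the two adjusted rates (1.75/1.5) gives the age-adjusted relative risk mentioned above, and that ratio is the same for any choice of standard population.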
Indirect adjustment
Choose a standard population for which the age-specific event rates are known. This standard population can be either of the comparison groups or an external population
Indirect adjustment
Calculate the number of deaths which would be expected to occur in each age group of each of the comparison groups had they been subject to the event rates observed for the corresponding age groups in the standard population
Indirect adjustment
Calculate the Standardized Mortality Ratio (SMR) for each study group:
SMR = Observed number of deaths in study group / Expected number of deaths in study group
Indirect adjustment
             US death    Colorado                      Washington
             rate        Pop’n     Obs     Exp        Pop’n     Obs     Exp
Age 40-50    10%         10,000    1,500   1,000      15,000    1,000   1,500
Age 50-60    15%         20,000    3,500   3,000      20,000    2,500   3,000
Total                              5,000   4,000                3,500   4,500
Indirect adjustment
Colorado SMR: 5,000/4,000 = 1.25
Washington SMR:
3,500/4,500 = 0.78
Indirect adjustment
The SMRs can be interpreted as the ratio of the observed number of events in each comparison group to the number that would be expected based on the event rates from the standard population
An SMR of 2 for group A, using group B as the standard, means that twice as many deaths occurred in group A as would be expected from group B's rates, after accounting for age
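The SMR computation is a one-liner once expected deaths are in hand. A sketch reproducing the Colorado and Washington SMRs follows; the standard age-specific death rates of 10% and 15% are assumed here so that the expected counts match the worked example.

```python
def smr(observed, group_pops, standard_rates):
    """Standardized mortality ratio: observed deaths divided by the
    deaths expected if the group had experienced the standard
    age-specific rates."""
    expected = sum(n * r for n, r in zip(group_pops, standard_rates))
    return observed / expected

us_rates = [0.10, 0.15]  # assumed standard rates, ages 40-50 and 50-60

co_smr = smr(5_000, [10_000, 20_000], us_rates)  # 5,000 / 4,000 = 1.25
wa_smr = smr(3_500, [15_000, 20_000], us_rates)  # 3,500 / 4,500 ~ 0.78
```

Note only the age distribution of each study group is used, plus the standard's age-specific rates: the defining feature of indirect adjustment.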
Indirect vs. direct adjustment
Direct adjustment is commonly used to provide adjusted rates for descriptive purposes, and in comparing data from several different studies
Indirect vs. direct adjustment
Indirect adjustment may be preferable when there are only a small number of people in each age category of the study groups; in that case the age-specific rates are based on small numbers, and a change of only one or two individuals in the numerator produces big changes in those rates
Indirect vs. direct adjustment Both methods give a single number for each
study group These somewhat artificial summary numbers
are based on hypothetical circumstances, but since the circumstances are the same for both groups, these numbers provide a fairer comparison than the overall rates, which reflect both differences in age distributions and the underlying differences between comparison groups
Indirect vs. direct adjustment
                                   Direct adjustment    Indirect adjustment
Data used from standard pop’n      Age distribution     Age-specific rates
Data used from each study group    Age-specific rates   Age distribution
Result of adjustment for each      Age-adjusted rate    Ratio of observed to
  study group                                           expected events (SMR)
Disease Screening
The disease/prevention spectrum:

Primary Prevention        Secondary Prevention        Tertiary Prevention
No Disease  ->  Asymptomatic Disease  ->  Symptomatic Disease  ->  Disease with Complications
Criteria for targeting a disease for screening
The disease is important
The disease has a recognizable pre-symptomatic stage; this stage needs to be long enough to provide a window for screening, diagnosis and treatment
Reliable screening tests exist for the pre-symptomatic stage and have acceptable sensitivity and specificity as well as acceptable risks for the screened individuals.
Criteria for targeting a disease for screening
Treatment of the disease during the pre-symptomatic stage must result in improvement in outcome (morbidity and/or mortality)
Sufficient resources and access to these resources must exist for diagnosis and treatment of disease in the population with positive screening tests
The benefits of screening for this disease should outweigh its costs and compare favorably with the benefits of competing programs
Screening test utilities
                     The “Truth”
Test Results         Disease                  No Disease
  Disease            True Positives (TP)      False Positives (FP)
  No Disease         False Negatives (FN)     True Negatives (TN)
Sensitivity and Specificity
Sensitivity describes how often a screening test detects a disease when it is indeed present; = TP/(TP+FN), or true positives over the total with disease
Specificity describes how often a screening test detects the absence of disease when it is indeed absent; = TN/(TN+FP), or true negatives over the total without disease
Predictive Value
Positive predictive value describes how often individuals with positive tests actually have the disease; = TP/(TP+FP), or true positives over all positives
Negative predictive value describes how often individuals with negative tests are actually disease-free; = TN/(TN+FN), or true negatives over all negatives
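All four measures follow directly from the 2x2 counts; a minimal sketch (the function name and the counts below are hypothetical):

```python
def screening_stats(tp, fp, fn, tn):
    """Sensitivity, specificity, and predictive values from a 2x2 table."""
    return {
        "sensitivity": tp / (tp + fn),  # TP over all with disease
        "specificity": tn / (tn + fp),  # TN over all without disease
        "ppv": tp / (tp + fp),          # TP over all positive tests
        "npv": tn / (tn + fn),          # TN over all negative tests
    }

stats = screening_stats(tp=90, fp=50, fn=10, tn=850)
```

Note sensitivity and specificity read down the "truth" columns of the table, while the predictive values read across the test-result rows.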
Effect of prevalence
Sensitivity and specificity remain the same for the test regardless of the prevalence of disease in the population
Predictive value depends on sensitivity, specificity, as well as disease prevalence
Effect of prevalence
As the test is applied to populations with lower disease prevalence, positive predictive value drops and negative predictive value increases
Conversely, as disease prevalence increases, positive predictive value increases and negative predictive value drops
Example
If fecal occult blood testing is 92% sensitive and 95% specific, and the prevalence of colon cancer is 2/1000, what is the positive predictive value of a positive test (i.e., what percent of positives will have colon cancer)?
Assume you screen 100,000 adults:
  200 have cancer: 184 test positive (TP), 16 test negative (FN)
  99,800 have no cancer: 4,990 test positive (FP), 94,810 test negative (TN)
Then 184 out of (184 + 4,990), or 3.6%, of people testing positive have the disease
Note that negative predictive value is over 99.9%. As the prevalence increases to 50 per 1000, positive predictive value rises to 49.2% and negative predictive value decreases to 99.6%
100,000 adults:
  5,000 have cancer: 4,600 test positive (TP), 400 test negative (FN)
  95,000 have no cancer: 4,750 test positive (FP), 90,250 test negative (TN)
4,600 out of (4,600 + 4,750), or 49.2%, of people testing positive have the disease
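The prevalence effect can be computed without building the full cohort each time; this sketch (names are illustrative) fills in the 2x2 table as proportions and reproduces the fecal occult blood figures:

```python
def predictive_values(sens, spec, prevalence):
    """PPV and NPV from sensitivity, specificity, and prevalence,
    by filling in the 2x2 table for a screened cohort of size 1."""
    tp = sens * prevalence
    fn = (1 - sens) * prevalence
    fp = (1 - spec) * (1 - prevalence)
    tn = spec * (1 - prevalence)
    return tp / (tp + fp), tn / (tn + fn)

# Fecal occult blood testing: 92% sensitive, 95% specific
ppv_rare, npv_rare = predictive_values(0.92, 0.95, 2 / 1000)       # PPV ~3.6%
ppv_common, npv_common = predictive_values(0.92, 0.95, 50 / 1000)  # PPV ~49.2%
```

As prevalence rises from 2 to 50 per 1,000, PPV climbs from about 3.6% to 49.2% while NPV falls slightly, exactly the pattern described above.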
Screening for rare diseases If the disease is rare in the population screened,
even a test with high sensitivity and specificity will produce many more false positives than true positives
The false positives can result in significant costs, health risks, and anxiety, as further tests are required to diagnose disease
Strategies to improve the predictive value of a test include using a test with higher sensitivity and specificity, or targeting screening efforts on high-risk populations
Summary of effects of prevalence and sensitivity and specificity on predictive values
↑ Prevalence ----> ↑ PPV and ↓ NPV
↓ Prevalence ----> ↓ PPV and ↑ NPV
↑ Specificity ----> ↑ PPV
↑ Sensitivity ----> ↑ NPV
Screening test cutoffs
There are often times when it is possible to choose a cutoff for declaring a positive or negative test
Most often this occurs when the screening test involves measuring the level of a substance in a body fluid such as urine or blood
In such a case, the choice of a cutoff for a positive test must involve consideration of the effects of trading off false positives for false negatives
– (N.B.--selection of cutoffs may not apply for more subjective screening tests such as mammography.)
PKU example
PKU (phenylketonuria) affects about 1/10,000 newborns. The following results are from a cost-benefit analysis of a Swedish program
                 With PKU      Normal
Cutoff point     TP    FN      FP    TN           Sens/Spec    +/- Pred Val   Cost/Case
0.4 mmol/l       43     0      33    1,326,421    100/>99.9    57/100         379K
0.5 mmol/l       43     0      26    1,326,428    100/>99.9    62/100         378K
0.6 mmol/l       41     2      13    1,326,441     95/>99.9    76/>99.9       477K
0.7 mmol/l       38     5       7    1,326,447     88/>99.9    84/>99.9       624K
Costs are in Swedish Kroner and ignore the 15 Kroner cost of the screening test itself, using only the additional costs of diagnosis for normal infants with positive tests, the additional costs of treatment for PKU infants, and the lifetime care of undetected PKU infants.
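The sensitivity and predictive-value columns can be recomputed from the raw counts, which makes the cutoff trade-off explicit; a sketch using the counts from the table (variable names are illustrative):

```python
# (TP, FN, FP, TN) at each cutoff, taken from the PKU table above
cutoffs = {
    0.4: (43, 0, 33, 1_326_421),
    0.5: (43, 0, 26, 1_326_428),
    0.6: (41, 2, 13, 1_326_441),
    0.7: (38, 5, 7, 1_326_447),
}

def performance(tp, fn, fp, tn):
    """Sensitivity, specificity, and PPV for one cutoff."""
    return {
        "sens": tp / (tp + fn),
        "spec": tn / (tn + fp),
        "ppv": tp / (tp + fp),
    }

perf = {level: performance(*counts) for level, counts in cutoffs.items()}
```

Moving the cutoff from 0.7 down to 0.4 mmol/l raises sensitivity from 88% to 100% while PPV falls from 84% to 57%: the false-positive/false-negative trade-off in numbers.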
PKU example
[Figure: overlapping distributions of serum phenylalanine for normal children and PKU children, with two candidate cutoffs (A and B) marked; any cutoff in the overlap region misclassifies some children]
Caveats about cutoffs
No cutoff point totally separates those with or without disease
Choosing a lower cutoff point raises sensitivity at the expense of lowering positive predictive value and (although not seen in this example) specificity; choosing a higher cutoff does the opposite
Caveats about cutoffs
Thus the choice of the best cutoff involves trading off false positives against false negatives
FPs and FNs may not deserve equal consideration; here false negatives have worse health consequences as well as generating more costs
Screening test biases
In the evaluation of a screening procedure or program, at least two sources of bias must be considered, lead time and length time bias
These sources of bias can result in finding improvements in disease mortality even if the early treatment of the disease in no way alters the outcome
Lead time bias
Lead time bias ("zero-time shift bias") occurs because screening tests detect illness at earlier points in time
Lead time bias If you developed a screening test that diagnoses
a disease one year earlier, but early treatment had no effect on survival, you would know the person had disease for a year longer than those detected without screening
Comparison of the outcomes between screened and unscreened groups would yield the erroneous conclusion that the screening program resulted in an improved outcome, when all it did was add a year to the length of time that the disease was diagnosed
Length time bias Length-time bias can be understood by
recognizing that for most diseases, the natural history varies in time span » One cancer may progress quickly through all
phases of the disease spectrum, with a short asymptomatic phase and a short interval between the development of symptomatic disease and death
» Another cancer of the same organ may have a very long, indolent course, with long asymptomatic and symptomatic phases
Length time bias
If the prevalence pool (those with the disease in the population) is made up of individuals with these differences, then a screening test will be more likely to pick up individuals with longer asymptomatic phases
Hence, those detected by screening will tend to have longer, more indolent disease, and thus a better prognosis, regardless of whether the disease is detected by screening or not
Parallel and Series Testing Strategies
There are two methods for combining the results of multiple screening tests in an overall strategy: » Parallel testing: the overall screening result is
positive if any one test is positive; this strategy increases sensitivity at the expense of specificity
» Series testing: the overall screening result is positive only if all tests are positive; this strategy increases specificity at the expense of sensitivity
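Under the simplifying assumption that the two tests are conditionally independent given disease status (an assumption, not always true in practice), the combined characteristics follow from simple probability; a sketch with hypothetical test characteristics:

```python
def parallel_tests(test1, test2):
    """Combined (sensitivity, specificity) when the result is positive
    if EITHER test is positive, assuming conditional independence."""
    (se1, sp1), (se2, sp2) = test1, test2
    return 1 - (1 - se1) * (1 - se2), sp1 * sp2

def series_tests(test1, test2):
    """Combined (sensitivity, specificity) when the result is positive
    only if BOTH tests are positive, assuming conditional independence."""
    (se1, sp1), (se2, sp2) = test1, test2
    return se1 * se2, 1 - (1 - sp1) * (1 - sp2)

# Hypothetical tests: (sensitivity, specificity)
a, b = (0.90, 0.85), (0.80, 0.90)
par = parallel_tests(a, b)  # sensitivity rises to 0.98, specificity falls to 0.765
ser = series_tests(a, b)    # sensitivity falls to 0.72, specificity rises to 0.985
```

The direction of each trade-off matches the text: parallel testing misses fewer cases but flags more healthy people; series testing does the opposite.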
Concepts of infectious disease epidemiology
Types of host-agent interaction: spectrum of disease
Colonization: agent infects host continuously with no overt evidence of disease or infection
Covert infection: agent infects host, time-limited, with no overt indication (majority of infections)
Overt infection--infection with disease (minority of infections)
Typical patterns of infection
Inapparent infection frequent, clinical disease rare
 » Inapparent infection is important in disease transmission and control, and influences apparent epidemiology (understates amount of disease and overstates severity)
Clinical disease frequent, few severe cases
Infection nearly always fatal
Infectious disease terms
Infectivity--ability to invade and multiply in a host (establish an infection)
 » ID50 = dose of agent necessary to infect 50% of hosts
Infectiousness--ability to be transmitted to other hosts
Pathogenicity--ability to cause disease in a susceptible host; to produce clinical illness
Infectious disease terms
Virulence--ability to produce severe clinical illness, including death
 » LD50 = dose of agent necessary to kill 50% of hosts
Immunogenicity--ability to elicit an immune response (cellular, local, or systemic)
Areas important to the natural history of infectious disease
Reservoir--a thing (person, animal, insect, plant, soil) in which an agent lives, depends on for survival, and reproduces for transmission into a susceptible host
Vector--that which allows for the transmission of the agent
Transmission--the action of moving from a reservoir to a susceptible host
Host--the organism at the receiving end of the transmission
Types of transmission
Direct--immediate transfer from one host to another » Examples--air droplets, direct body contact » Control target--primary host
Indirect--transmission via an intermediate agent » Examples--Vehicle-borne (fomites, water)
– vector borne (mechanical/passive = flies; biological/active = mosquitoes)
– Control target--secondary host » Airborne--aerosolized organisms (mist, dust)
Issues in transmission Generation time--period between infection and
maximal communicability (can be the same as incubation time)
Herd immunity--resistance of a population to the spread of disease based on prior immunity of the population.
Secondary attack rate--measures the rate of spread of disease within an exposed group
= (# new cases - # initial cases) / (# susceptible - # initial cases)
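The secondary attack rate formula above can be sketched directly; the household outbreak numbers below are hypothetical, chosen only to show the arithmetic:

```python
def secondary_attack_rate(total_cases, initial_cases, susceptibles):
    """Secondary attack rate, per the formula above: cases beyond the
    initial (primary) cases, over the susceptibles remaining after
    excluding those initial cases."""
    return (total_cases - initial_cases) / (susceptibles - initial_cases)

# Hypothetical household outbreak: among 5 susceptibles, 1 index case
# introduces infection and 3 more people become ill (4 cases in total)
sar = secondary_attack_rate(total_cases=4, initial_cases=1, susceptibles=5)
# 3 secondary cases among 4 remaining susceptibles = 0.75
```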
Types of infections in a population Sporadic--occasional cases at irregular intervals Endemic--low level (usually), expected frequency of
disease occurrence; the constant presence of a disease within a given geographical area ("usual prevalence")
Hyperendemic--a gradual increase in the frequency of disease occurrence above endemic level
Epidemic--a sudden increase in the frequency of disease occurrence above endemic level (clearly in excess of expected levels)
Pandemic--an epidemic occurring across continents
Types of outbreaks/epidemics
Common source--may be a point source (example: food poisoning) or an ongoing source (example: water contamination)
Epidemiologic curve (plot of number of cases against time) for a point source is single-humped, with a rapid rise and a slower return to baseline
Curve for ongoing source shows rapid rise (at onset of exposure) to persistent epidemic level
Types of outbreaks/epidemics
Propagated epidemic-- transmission, direct or indirect, from one host to another
Epidemiologic curve is multi-humped: » the first peak represents the primary attack
rate for those exposed to the index case » subsequent peaks represent secondary and
beyond attack rates as those in the first peak infect others
Types of outbreaks/epidemics
Vector-borne--similar to common source, but with a more complex pattern because of the vector
Control of infectious disease
Control of reservoir: animal, human Control of transmission: vector,
environmental Reduction of host susceptibility Disease eradication
Vaccine efficacy
Vaccine efficacy = [(AR in unvaccinated - AR in vaccinated) / AR in unvaccinated] x 100%

= (1 - Relative Risk (vaccinated vs. unvaccinated)) x 100%

= [1 - (proportion of cases vaccinated x (1 - proportion of population vaccinated)) / ((1 - proportion of cases vaccinated) x proportion of population vaccinated)] x 100%
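Both forms of the formula are easy to check against each other in code; the outbreak numbers below are hypothetical, chosen so the two methods can be compared:

```python
def vaccine_efficacy(ar_unvacc, ar_vacc):
    """Vaccine efficacy from attack rates: (ARU - ARV) / ARU."""
    return (ar_unvacc - ar_vacc) / ar_unvacc

def vaccine_efficacy_screening(p_cases_vacc, p_pop_vacc):
    """The same quantity from the proportion of cases vaccinated and the
    proportion of the population vaccinated, per the last form above."""
    return 1 - (p_cases_vacc * (1 - p_pop_vacc)) / ((1 - p_cases_vacc) * p_pop_vacc)

# Hypothetical outbreak: attack rate 10% in unvaccinated, 2% in vaccinated,
# with 60% of the population vaccinated. Per 1,000 people that is 12
# vaccinated cases and 40 unvaccinated cases, so 12/52 of cases are vaccinated.
ve = vaccine_efficacy(0.10, 0.02)
ve2 = vaccine_efficacy_screening(p_cases_vacc=12 / 52, p_pop_vacc=0.60)
```

Both forms give 80% efficacy; the second ("screening") form is useful when attack rates cannot be observed directly but vaccination coverage and case vaccination status are known.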