ACPM Review Course Epidemiology
Objective of session: to review the major terms, definitions and
concepts of epidemiology
Agenda
Introduction
Measures of disease frequency
Descriptive epidemiology
Measures of excess risk
Study Design
» Descriptive studies
» Case-control studies
» Cohort studies
» Clinical trials and quasi-experiments
Agenda (cont.)
Epidemiologic evidence and causal inference
Confounding and effect modification
Disease screening
Infectious disease epidemiology
Measures of Disease Frequency
Counts = number of people with a disease
Rates - account for the denominator, or size of the population, and imply a period of time
Cumulative Incidence (most commonly used as synonymous with "incidence")
= number of new cases of a disease occurring in a specified time period .
number of people initially at risk
Synonymous with
» attack rate
» risk of disease
» probability of getting disease
Incidence density
= number of new cases of a disease occurring in a specified time period .
total amount of "person-time" at risk contributed during the time period
Estimates the instantaneous rate of occurrence of disease per unit of time, relative to the size of the population at risk
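The two incidence measures differ only in their denominators (people initially at risk vs. person-time at risk). A minimal sketch in Python, using hypothetical counts (8 new cases in one year among 1,000 people at risk contributing 950 person-years); the function names are illustrative:

```python
def cumulative_incidence(new_cases, n_at_risk):
    """New cases / people initially at risk: a proportion (risk)."""
    return new_cases / n_at_risk

def incidence_density(new_cases, person_time):
    """New cases / total person-time at risk: a true rate."""
    return new_cases / person_time

ci = cumulative_incidence(8, 1000)   # 0.008 (risk per person over the year)
idr = incidence_density(8, 950)      # ~0.0084 cases per person-year
```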
Prevalence
= number of existing cases of a disease at a specified time .
size of base population at that time
A "snapshot" view of disease frequency in a population at a single point in time, important for planning and allocation of resources
"Point prevalence" refers to the prevalence at a single point in time
"Period prevalence" refers to the prevalence measured for a specific time interval
Mortality
= number of people dying in a specified time period .
average number of people alive during that period of time
Using the number dying of a specific disease in the numerator produces: disease-specific mortality
Case fatality
= number of people who die of a disease total number of people who get the disease
Measures disease prognosis rather than disease frequency
Proportional mortality
= number of deaths due to a disease in a specified time period .
total number of deaths during that time period
Can be misleading if mortality rates for other causes are unusually high or low in a group
Proportional mortality
1960’s study of Hodgkin's disease in teachers:
» 2.5% of deaths in teachers due to Hodgkin's disease
» 1.0% of deaths in general population due to Hodgkin's disease
The authors concluded that teachers were at 2.5 times higher risk for death from Hodgkin's disease
Proportional mortality
In Denver, 10% of deaths in white children under 10 years are due to leukemia, vs. 5% of deaths in black children. Which of the following are true?
» The relative risk for leukemia in white vs. black children is 2.0
» The attributable risk for leukemia in white vs. black children is 5/100
» Neither the attributable risk nor the relative risk may be determined from the data provided
Relationships between disease rates
Prevalence, incidence (density), and duration of disease are related:
prevalence = incidence X average duration of disease .
1 + (incidence X average duration of disease)
= incidence X duration (when prevalence is low, i.e., <10%)
Holds only when incidence and duration are stable over time
Useful in predicting what a change in one variable will cause in another
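A quick numeric check of this relationship, using hypothetical values (incidence 0.01 per person-year, average duration 2 years, both assumed stable over time):

```python
def prevalence(incidence, duration):
    """prevalence = (I x D) / (1 + I x D); ~ I x D when prevalence is low."""
    x = incidence * duration
    return x / (1 + x)

p = prevalence(0.01, 2)  # ~0.0196, close to the simple product 0.01 x 2 = 0.02
```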
Relationships between disease rates
Mortality, incidence, and case fatality:
mortality = incidence X case fatality
(when incidence and case fatality are stable over time)
Relationships between disease rates
Survival rate and case fatality:
1 - case fatality = survival rate
Other useful rates and measures
Birth rate =
Number of live births in a year . Population (in thousands) at midyear
Other useful rates and measures
Fertility rate =
number of live births reported in one year .
number (in thousands) of women age 15-44 years at midyear
Some sources calculate total fertility rate by summing births/women for 5-year age categories
Some sources extend the age range to 10-49 years
Other useful rates and measures
Fetal death rate (stillbirth rate) =
annual number of fetal deaths (gest. age 20 wks/350 g) annual number of fetal deaths plus live births (in thousands)
Other useful rates and measures
Neonatal death rate =
annual number of deaths in the first 28 days of life annual number of live births (in thousands)
Other useful rates and measures
Perinatal death rate =
annual fetal deaths plus deaths in the first 7 or 28 days of life .
annual number of fetal deaths* plus live births (in thousands)
*Defined as either after 20 or 28 weeks gestation
Other useful rates and measures
Infant death rate =
annual number of deaths in the first year of life annual number of live births (in thousands)
Years of Potential Life Lost (YPLL)
A measure of premature mortality, YPLL takes into account not only the cause of death but the age of occurrence
Calculated by multiplying the number of cause-specific deaths in each age group by the difference between the midpoint of the age group and age 75 (or the average age at death), then summing across age groups
Years of Potential Life Lost

Age group                       0-10       11-20      21-30
# deaths                        2          9          4
# years lost between age 75
and midpoint of interval        75-5=70    75-15=60   75-25=50
total # years lost              70x2=140   60x9=540   50x4=200

TOTAL YPLL = 140+540+200 = 840
Life expectancy and life-time risk
Average age at death given live birth
Additional life expectancy changes as you age (and don’t die young)
Age-specific life expectancy comes from life-table (survival) analysis
Be careful of cumulative life-time risk estimates that don’t give you the life expectancy on which they are based
Descriptive epidemiology
Practical uses of descriptive epidemiology data:
» To provide clues to etiology and means of prevention
» To help target screening efforts
» To aid in diagnosis
» To aid in the planning of health services
» To provide baseline data
Sources of numerator data for descriptive epidemiology
Vital records (birth and death certificates)
Disease reports (for example, reportable diseases, tumor registries)
Medical records
Surveys
Numerator data
Problems can arise with numerator data, ranging from difficulties in defining a case to variability in the methods used to identify cases
Sources of denominator data
Census/vital statistics records
Enrollment records (health plans, industry or union records, alumni rosters, etc.)
Framework for reporting descriptive data
What kinds of people get the disease?
» Age: immunity to infectious diseases; slowly developing diseases; diseases with long latency periods (the time between exposure to a causative agent and onset of disease); environmental exposures that vary with age
» Gender: anatomic and physiologic differences which affect susceptibility; many differences in life style, environmental exposures
What kinds of people get the disease?
» Race/ethnicity: genetic differences in susceptibility; frequent association with socio-economic status, life style
» Socio-economic status: nutritional factors; life style; adequacy of medical care
» Occupation
» Marital status
» Other (factors important for some diseases and not others)
Where is the disease common or rare, and what are the characteristics of those places?
Physical environment
Man-made exposures that vary by place
How does the frequency of the disease change over time?
--and, what historical factors appear to correlate with those changes?
Short-term trends; disease outbreaks
Secular (long-term) trends
Cyclical variations
Cohort Effect
Age and time may interact to produce a "cohort effect", a point-in-time, cross-sectional observation that reflects variation in disease rates based on year of birth, and variation in disease rate within each cohort by age
Suspect a cohort effect whenever you see an unexpected decline in disease rates in older age groups
Cohort effect

[Figure: lung cancer mortality plotted against age (30-90 years), with curves labeled by birth cohort (1910-1960); the apparent decline in the cross-sectional curve at older ages reflects lower rates in the earlier birth cohorts rather than a true decline with age]
Measures of excess risk
The need to identify individuals at increased risk for contracting a disease pervades all aspects of medicine
Prevention
Who is at risk and should be targeted for primary and secondary prevention?
Diagnosis
Given the characteristics of (and therefore the constellation of risks for) a given individual presenting with a certain symptom complex, what is the most likely diagnosis?
Management
Once a patient is known to have a given condition, for what is he/she at further risk?
What characteristics put an individual at risk for an adverse reaction to a potential therapeutic intervention?
Two-by-two table for the computation of excess risk
                        Disease
                  Present    Absent
Risk   Present       a          b
Factor Absent        c          d
Relative risk and odds ratio
Relative risk (risk ratio, rate ratio) is the ratio of the incidence of a condition in the group of individuals with a specific characteristic (a "risk factor") to the incidence in the group of individuals without the risk factor
Relative risk
Relative risk (RR) = incidence in the exposed .
incidence in the unexposed
= {a/(a+b)} / {c/(c+d)}
Odds ratio
The relative risk can be estimated by the odds ratio
The odds ratio is used in case-control studies, where individuals with a disease are compared to individuals without the disease for the presence or absence of a risk factor
Odds ratio
Note that, in the computation of the relative risk, if the disease is rare, then a would be small compared to b, c would be small compared to d, and the relative risk could be approximated by the cross-product of the two-by-two table:
Odds ratio (OR) = (a/b) / (c/d) = ad/bc
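A sketch of both measures from the a/b/c/d cells of the two-by-two table, with hypothetical counts (a=30, b=70, c=10, d=90) chosen to show that the OR overstates the RR when the disease is not rare:

```python
def relative_risk(a, b, c, d):
    """{a/(a+b)} / {c/(c+d)} -- requires incidence data (cohort design)."""
    return (a / (a + b)) / (c / (c + d))

def odds_ratio(a, b, c, d):
    """Cross-product ad/bc -- computable from case-control data."""
    return (a * d) / (b * c)

rr = relative_risk(30, 70, 10, 90)  # ~3.0
orr = odds_ratio(30, 70, 10, 90)    # ~3.86 -- larger, since disease isn't rare here
```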
Attributable risk
Attributable risk, (risk difference, rate difference) is the difference between the incidence of the disease in individuals with a risk factor and in those without
Accounts for the baseline incidence of disease and gives the absolute amount of excess risk an individual incurs from the exposure
Requires a prospective study in order to produce incidence rates, and is more difficult to interpret
Attributable risk
Attributable risk (AR) = incidence in exposed - incidence in unexposed
= {a/(a+b)} - {c/(c+d)}
Number needed to treat (NNT)
Represents a more interpretable transformation of the attributable risk
Can be the number needed to screen, number needed to treat, or number needed to harm
NNT = 1/Attributable risk
“You’d have to treat (NNT) people to gain one additional outcome”
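A minimal sketch of the AR-to-NNT transformation, with hypothetical incidences (0.06 in the exposed, 0.02 in the unexposed):

```python
def attributable_risk(inc_exposed, inc_unexposed):
    """Absolute excess risk in the exposed (risk difference)."""
    return inc_exposed - inc_unexposed

def nnt(inc_exposed, inc_unexposed):
    """Number needed to treat = 1 / attributable risk."""
    return 1 / attributable_risk(inc_exposed, inc_unexposed)

ar = attributable_risk(0.06, 0.02)  # ~0.04
n = nnt(0.06, 0.02)                 # ~25 people treated per additional outcome gained
```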
Attributable risk percent
Attributable risk percent (attributable rate percent, attributable proportion, attributable fraction, etiologic fraction) is the proportion of disease among those with a risk factor that is due to the risk factor
Tells you the amount of disease in the exposed group that is due to the exposure
Attributable risk percent
Attributable risk percent (AR%)
= {(incidence in exposed - incidence in unexposed) / incidence in exposed} X 100%
= [{a/(a+b)} - {c/(c+d)}] / {a/(a+b)} X 100%
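With hypothetical incidences of 0.06 in the exposed and 0.02 in the unexposed, the AR% works out to about two-thirds:

```python
def ar_percent(inc_exposed, inc_unexposed):
    """Proportion of disease among the exposed that is due to the exposure."""
    return (inc_exposed - inc_unexposed) / inc_exposed * 100

pct = ar_percent(0.06, 0.02)  # ~66.7% of disease in the exposed is attributable
```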
Population attributable risk
Population attributable risk is the rate of disease in the population that is due to the exposure
Population attributable risk
Population attributable risk (PAR) = total incidence - incidence in unexposed
= {(a+c)/(a+b+c+d)} - {c/(c+d)}
Population attributable risk percent
Population attributable risk percent is the proportion of cases of disease that is due to a given risk factor
The PAR% is the amount of disease that would be prevented if the risk factor could be eliminated from the population
Population attributable risk percent
Population attributable risk percent (PAR%)
= {(total incidence - incidence in unexposed) / total incidence} X 100%
= [{(a+c)/(a+b+c+d)} - {c/(c+d)}] / {(a+c)/(a+b+c+d)} X 100%
Population attributable risk percent
PAR% can also be calculated from case-control studies using the odds ratio:
PAR% = {(OR - 1)/OR} X {a/(a+c)} X 100%
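A sketch of the PAR% computed both ways: from cohort-style 2x2 cells and from a case-control odds ratio. The counts (a=30, b=70, c=10, d=90) and the OR fed to the second function are hypothetical:

```python
def par_percent(a, b, c, d):
    """(total incidence - unexposed incidence) / total incidence x 100%."""
    total_inc = (a + c) / (a + b + c + d)
    unexposed_inc = c / (c + d)
    return (total_inc - unexposed_inc) / total_inc * 100

def par_percent_from_or(odds_ratio, a, c):
    """Case-control version: {(OR-1)/OR} x {a/(a+c)} x 100%."""
    return (odds_ratio - 1) / odds_ratio * (a / (a + c)) * 100

pct = par_percent(30, 70, 10, 90)  # ~50%: half the disease burden is due to exposure
```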
Study Design
Bias
The goal of science is to be accurate in the discovery, description, and measurement of the truth
Bias is a systematic deviation of study measurements, results, or inferences from the truth
The “internal validity” of a study relates to the minimization of bias, so that the study result can most confidently be assigned to the factors under study
Types of bias
Measurement bias
Recall bias
Selection bias
“CBC” of research evaluation
Could the findings be due to Chance (random error)?
Could the findings be due to Bias?
Could the findings be due to Confounding?
Overview diagram of research designs

Descriptive (hypothesis generating)
» Describe something
  - case report
  - case series
» Describe how something varies (by time, place)
  - descriptive epidemiology studies

Analytic (hypothesis testing)
» Observational Studies (non-experimental)
» Intervention Studies
  - Randomized Controlled Trial (true experiment, clinical trial)
  - Quasi-Experiment

Observational Studies
» Before-after Study (pretest-posttest study)
» Correlational Study (ecologic study)
» Cross-sectional Study (prevalence study, survey)
» Case-control Study (Trohoc study, case-referent study, case-compeer study, retrospective study)
» Cohort Study (follow-up study, longitudinal study, incidence study, causal-comparative study, prospective study)
  - Prospective (concurrent, futuristic cohort study)
  - Retrospective (historical, retrospective-prospective cohort study)
Hierarchy of Clinical Study Designs
Descriptive studies
Observational studies
Intervention (experimental) studies
Summary or extrapolation studies
» Meta-analysis
» Decision analysis
Descriptive studies
Usually lack hypothesis in advance
Usually satisfied with establishing non-causal associations
Case reports, case series
Descriptive reports (example: vital statistics reports, hospital discharge data, physician pharmacy profiles)
Analytic Studies
Usually have hypothesis specified in advance
Usually intended to establish causal associations
Research study structure
The variation in an exposure (a drug, environmental exposure, health habit, heritable gene, etc.) is associated (usually causally) with the variation in an outcome (disease, morbidity, mortality, quality of life, health care utilization, etc.)
Simplest format for reviewing research structure is the 2X2 table
Research study structure
                 Outcome
               Bad    Good
Exposure  Yes   a      b
          No    c      d

Designs vary primarily on how the cell subjects are obtained
Observational Studies
Before-after, Ecologic (correlational), Cross-sectional
Case-control (retrospective)
Cohort
» prospective cohort (prospective, follow-up)
» retrospective cohort (historical prospective)
The “lesser” observational studies
Before-after: outcomes for an individual or a population are compared before and after a known exposure or event
Requires few resources
Source of bias: unclear what would have happened without the intervention
The “lesser” observational studies
Ecological (Correlational): rates of exposure and outcomes for different populations are compared
Source of bias: the individuals with the exposure aren’t necessarily the ones with the outcome (“the ecological fallacy”)
The “lesser” observational studies
Cross-sectional: individual exposures and outcomes are determined at the same point in time for a population
Gives prevalence of outcomes and exposures, but statistically inefficient
Source of bias: can be difficult to establish temporal relationship between exposure and outcome
Case-control studies
Subjects with an outcome are compared to those without, comparing their prior exposure histories
The increased risk of having an exposure based on the outcome is interpreted as an increased risk of outcome as a result of exposure
Example: people with and without prostate cancer are compared to see if they had a vasectomy
Case-control studies

THE PAST                            TODAY
Exposure (a) / No exposure (c)  ←  Disease
Exposure (b) / No exposure (d)  ←  No disease
Situations where case-control studies are favored:
Disease under study represents a rare outcome event (only way to study some diseases)
Intent to study multiple potential risk factors for a single outcome
Not much is known about a disease, but there are associational suspicions and hypothesis-generating studies are needed
Expensive to diagnose or detect outcome in study individuals
Long latent period between exposure and outcome
Resources and time are limited
Steps in case-control studies
Define the hypothesis(-es)
Select your cases
Select your controls
Ascertain exposure status, as well as status on important confounders
Case selection
When possible, cases should consist of incident (newly-arising) rather than prevalent (existing) cases
Over-sampling of cases of long duration may tend to bias the results, describing factors that influence prognosis or survival rather than etiologic factors
If disease under study is very rare, you often must use prevalent cases to get enough to study
Sources for case selection
Representative of cases arising in a defined population (either representative or total/near total sample of cases)
Cases not arising from or representative of a defined population. (N.B.--cases must be 'unselected' within source--either choose all cases or select randomly)
Problems in case selection
Misclassification of disease status
Combination of heterogeneous outcomes in defining cases
Selection of controls
"Rule of thumb"--choose control subjects who, if they had gotten the disease under study, would have been eligible for case selection
Sources for controls
If cases represent all cases in a defined population, select controls from the non-diseased members of the same population
If cases are not from a defined population, select controls from individuals receiving care from the same source
Neighbors, friends, or family members
Persons who underwent the same case-finding procedure as cases, but were found to be disease-free (e.g., persons with a negative diagnostic procedure)
Matching in control selection
Matching controls represents an alternative to control of confounding factors in the analysis stage (i.e., by adjustment) by establishing criteria for selection of controls which prevent certain extraneous factors from being considered
The major disadvantage is that you can not evaluate the relationship between the matched variable and the outcome
Ascertainment of exposure
Data sources
Direct from subjects via interviews, questionnaires
Proxy respondents--next of kin, household members--this method is necessary when the disease is rapidly fatal
Pre-existing records--includes vital statistics data tapes, laboratory results, other medical records
Comments on exposure ascertainment
Need comparable ascertainment between cases and controls
Response rates should be high and similar for cases and controls
Similar considerations of ascertainment apply to measurement of exposure to confounding factors
Analysis of case control studies
Relative risk estimation is done by calculating the relative odds (odds ratio)
For rare outcomes (i.e., prevalence less than 10%), the odds ratio will be very close to the relative risk that would have been obtained from a similar cohort study
Sources of potential bias in case-control studies
Selection bias:
» Representativeness of the case group
» Appropriateness of the control group (especially if study is not population-based)
» Detection bias (unmasking bias)--results when the identification of cases varies with exposure status
Selection bias
Am J Epidemiol 1983;117:326-334
Hospital vs. population controls in evaluating the association between artificial sweeteners and bladder cancer
OR with hospital controls: 0.8-0.9
OR with community controls: 1.1-1.2
Sources of potential bias in case-control studies
Information bias
» Misclassification (including heterogeneous outcomes)
» Differential reporting of exposure data (including recall bias)
Confounding (especially if the confounding variable was not anticipated and measured)
» In case-control studies, the factor need not be a risk factor for the disease if it influences selection probability differentially in cases and controls
Misclassification and effects of heterogeneous outcomes: thrombotic stroke and OC use
Assume the “truth” in a study of 200 stroke patients and 200 controls is that 50% of thrombotic stroke patients and 10% of controls use OCs:
        Case    Control
+OC     100     20
-OC     100     180

OR = (100 X 180)/(100 X 20) = 9
Suppose that, as the study was conducted, 20 cases of non-thrombotic stroke (with OC use the same as controls, 10%) were included as cases:
        Case      Control
+OC     100+2     18
-OC     100+18    162

OR = (102 X 162)/(118 X 18) = 7.8
--inclusion of non-related cases dilutes the odds ratio
Suppose there were 20 cases of thrombotic stroke undetected (misclassified as controls) who had OC use the same as the detected cases (50%):
        Case    Control
+OC     90      20+10
-OC     90      180+10

OR = (90 X 190)/(90 X 30) = 6.3
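The three odds ratios in this thrombotic stroke/OC example can be checked directly from the cross-product (a sketch; the cell counts are those shown on the slides):

```python
def odds_ratio(a, b, c, d):
    """Cross-product ad/bc from a case-control 2x2 table."""
    return (a * d) / (b * c)

true_or = odds_ratio(100, 20, 100, 180)  # 9.0 -- the "truth"
diluted = odds_ratio(102, 18, 118, 162)  # ~7.8 -- non-related cases dilute the OR
missed = odds_ratio(90, 30, 90, 190)     # ~6.3 -- cases misclassified as controls
```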
Bias in observational studies

Misclassification                    Effect on OR
Exposed cases as controls            underestimate
Exposed controls as cases            overestimate
Exposed cases as unexposed           underestimate
Exposed controls as unexposed        overestimate
Unexposed cases as controls          overestimate
Unexposed controls as cases          underestimate
Unexposed cases as exposed           overestimate
Unexposed controls as exposed        underestimate
Factors that inhibit finding associations in case-control studies:
More than one causal pathway (allows some cases to not have the exposure)
Other components of causal pathway absent (allows some non-cases to have the exposure)
Insufficient variation in exposure
Too much confounding from other factors
Cohort studies
Subjects with and without the exposure are followed and outcomes are compared
Exposed and unexposed subjects can come from the same group (geographically and temporally) or from different groups
Can be done either prospectively or retrospectively (as long as the study group can be assembled on the basis of exposure status independent of outcome status)
Cohort studies

TODAY                 THE FUTURE (prospective)
THE PAST              TODAY (retrospective)
Exposed        →   Disease (a) / No disease (b)
Not exposed    →   Disease (c) / No disease (d)
Situations where cohort studies are favored:
Risk factor represents a rare event
Intent to study the multiple potential outcomes of a single exposure
Necessary if incidence rates are needed from the study
Necessary if limitations make other designs infeasible
Steps in cohort studies
Define the hypothesis(-es)
Select study population(s) (exposed and comparison groups)
Exclude subjects not at risk
Ascertain exposure (including confounders)
Monitor for and ascertain outcome
Study population selection
Study population is a group with a special exposure, or one that shares a geographic and/or temporal commonality
Study population is a group for which there are special data resources available
Some other available, identifiable population
Exclude subjects not at risk
(Those who have the disease or cannot get the disease)
This prevents significant bias in the results
Should be done at recruitment if follow-up information is expensive or difficult to collect
May be done at analysis if follow-up data are readily available
Ascertainment of exposure
Need to collect data on both main exposure(s) of interest as well as other potential confounding factors
Methods/sources
» Direct from subjects via surveys and/or examinations
» Abstract from available records
» Other, such as environmental exposures
Select groups for comparison
Rule of thumb--if not for the presence or absence of exposure, the groups should look like two random samples from the same "universe"
Comparison group can be internal group from the cohort without the exposure or varying levels of the exposure (most common design)
Select groups for comparison
Select concurrently studied comparison groups, such as workers in a similar occupational category as the exposed group
Comparison can be made with population rates/other published rates--very common in retrospective occupational cohort studies; presents certain problems with validity
Ascertainment of outcome
Methods/sources
» Vital records, such as death certificates
» Other available records, such as hospital discharge data, medical records, disease registries
» Directly from study cohort (follow-up surveys and/or examinations)
If possible, should "blind" observers to exposure status of each subject when ascertaining outcome
Analysis
Computation of the excess risk associated with the exposure involves determining and comparing the incidence or mortality in each comparison group
Outcome measures include relative risk, attributable risk, and population attributable risk percent
Sources of bias in cohort studies
Selection bias can occur in the formation of exposure groups
» Those choosing to be exposed may be different from those who don’t
» There may be other factors related to outcome that determined why one group was exposed and the other wasn't
Completeness of follow-up is a major source of potential bias, especially if loss to follow-up is unequal in exposure groups
Sources of bias in cohort studies
Ascertainment bias can occur, especially if method of determining outcome does not include blinding to exposure status
Confounding
Retrospective cohort studies
Defined as a study in which the cohort is assembled retrospectively, exposure data for this cohort are determined retrospectively, and outcome is determined now
Can retain the same level of rigor as a prospective cohort study
Retrospective cohort studies
Relies on the ability to assemble a cohort based on some common past experience and on the availability of the necessary exposure data collected without bias on all members of the cohort
While much faster and cheaper, loss to follow-up usually represents a formidable problem
Other problems (retrospective cohort)
Sample sizes are usually smaller
Misclassification of either exposure or outcome is potentially a greater problem
Data on exposure status for potential confounders may not be available
Nested studies
Cohort studies provide the opportunity to perform nested case-control studies
Once sufficient outcome endpoints have accrued, diseased individuals can be compared with those free of disease, and exposure status can be determined retrospectively
Most often used when a potential confounder is identified in the analysis as an important determinant of excess disease risk
Intervention studies
Randomized controlled trials (RCTs)
Natural experiments
Group randomization trials (GRTs)
Quasi-experimental studies
» Before-after
» Non-equivalent control group
» Interrupted time series
» Regression discontinuity
RCTs
Essentially the same as a cohort study, except the investigator decides who gets the exposure, using random assignment
Strongest study design of all, maximizing internal validity (usually at the expense of external validity)
Maximizes internal validity by promoting the equal distribution of potential confounders into exposed and unexposed groups
Steps in RCTs
Define the hypothesis
Select study subjects
Randomly allocate subjects to intervention groups
“Blind” subjects whenever possible
“Blind” investigators whenever possible
Follow and ascertain all relevant outcomes; monitor for adverse effects and stopping rules
Select study subjects
Requires informed consent and strict inclusion and exclusion criteria
Pre-randomization visits are used to help ensure successful participation
These design elements likely make the study population non-representative
Allocate subjects to intervention and/or control groups
Assignment must be random
Block randomization can be used to ensure equal distribution of important confounders
Quasi-random techniques must produce assignments that introduce no sources of bias
Blinding
Subjects should be blinded (e.g. with placebo treatment) to which study group (experimental vs. control) they are in whenever possible to guard against placebo effects and cross-over
Allocation concealment is important
Outcome assessment
Those ascertaining outcomes should be blinded to the subject's study group whenever possible to guard against investigator bias (double-blind trial)
Consider all relevant outcomes
Safety monitoring for adverse effects
Stopping rules (the point at which the results become statistically significant and ethically the trial should be ended)
Analysis
RCTs must be analyzed using the "intent-to-treat" approach: subjects must be analyzed as belonging to the group to which they were first randomized, even if they cross over to the other group
Re-assigning cross-overs invalidates the RCT, as those that cross-over are likely to be quite different from those who don't
Confounding variables must be compared across groups, as errors of randomization can lead to imbalances in the distribution of these factors
Analysis
Often RCT's compare continuous variables between groups: recognize that in general, it takes a much smaller difference to be statistically significant with continuous variables than with categorical variables
For example, do you want to know the difference in diastolic blood pressure between two groups, or do you want to know how many subjects in each group become normotensive?
Sources of bias
Errors of allocation
Ascertainment of outcomes
Loss to follow-up
Inclusion of all relevant outcomes
Cross-overs, intent-to-treat analysis
Selection bias (now relates to external validity/generalization)
Errors of randomization and confounding
Issues in RCTs
In general, lengthy and very expensive
Ethical and legal issues are important
Blinding can be difficult, cross-overs may be common, and drop-outs and loss to follow-up are major problems
Strong internal validity is achieved at the expense of external validity; your study groups may not generalize to any other population
Power issues become important when effect sizes are inadequate to reach statistical significance
Still, the gold standard of research design
Natural experiments
Researcher does not determine the group receiving the intervention, which occurs "naturally" or under control of some other process
Can address problems where RCT interventions are unrealistic as to what can be implemented and sustained in natural setting
Natural experiments
Can be used for rapid evaluations of innovative, expensive, or complex interventions or policy changes in natural settings
Worse internal validity; generalizability may still be limited (but is better than an RCT's)
Internal validity can be improved with more data points pre and post intervention (as in time series analysis)
Group randomized trials
Study design where the unit of assignment is an identifiable group, allocated to different exposures; the units of observation are the members of the group
Randomization provides the assumption of independence at the group level, with a desire that potential sources of bias are fairly distributed across the study exposures
Group randomized trials
There is extra variation attributable to groups, which increases the standard error of the intervention effect
This is worsened when the number of groups is small, limiting degrees of freedom and resulting in more problems with intraclass correlation
Quasi-experimental studies
These studies are those in which the researcher cannot or does not assign interventions randomly to participants, but may have some control over:
» who gets the intervention
» when the intervention is given, and/or
» when measurement occurs
Quasi-experimental studies
Study design needs to match the question under study:
There are instances where an RCT will not be the best design (when you cannot address generalizability), when an RCT is not feasible, and when an RCT is not appropriate
Quasi-experimental studies
There are two major concerns with quasi-experimental designs:
» additional sources of bias not controlled for by random assignment (especially selection bias)
» intraclass correlation that may be responsible for the observed effect, unrelated to the intervention (confounding)
Classic example: before-after study design—has same sources of bias as the observational before-after study
Quasi-experimental designs
Non-equivalent control group design
Study groups are assembled in a non-randomized fashion intended to minimize unequal distribution of important confounders, and the researcher decides which group(s) gets the intervention

Group A    O1  X  O2
Group B    O1     O2
Quasi-experimental designs
Time series design (interrupted time series, with or without non-equivalent control group; with a control group = multiple time-series design).
Represents a refinement of the pre-post study design in that multiple measurements over time give more of a sense of whether post-intervention differences can be assigned to the intervention
Multiple time series design
Study group:   O1 O2 O3 O4 X O5 O6 O7 O8
Control group: O1 O2 O3 O4   O5 O6 O7 O8
Quasi-experimental designs
Regression Discontinuity A kind of time series analysis, where units are
assigned to a condition based on a cutoff score on a measured covariate; for example, for a smoking cessation intervention, communities that exceed a certain cutoff for packs of cigarettes sold are given the intervention, and communities below that cutoff are the comparison
Regression discontinuity
The treatment effect is measured as the discontinuity between treatment and control regression lines at the cutoff point (not the group mean difference)
When properly implemented and analyzed, regression discontinuity gives an unbiased estimate of the treatment effect
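The mechanics can be illustrated with a small simulation. This is a hedged sketch (the function name, simulated data, and true effect of -5 are all invented for illustration): fit a separate regression line on each side of the cutoff and read the treatment effect off the gap between the lines at the cutoff.

```python
import numpy as np

def rd_effect(x, y, cutoff):
    """Estimate a regression-discontinuity treatment effect.

    Fits separate least-squares lines to the treated (x >= cutoff) and
    control (x < cutoff) units and returns the gap between the two
    fitted lines evaluated at the cutoff.
    """
    x, y = np.asarray(x, float), np.asarray(y, float)
    treated = x >= cutoff
    bt = np.polyfit(x[treated], y[treated], 1)    # (slope, intercept), treated side
    bc = np.polyfit(x[~treated], y[~treated], 1)  # (slope, intercept), control side
    return np.polyval(bt, cutoff) - np.polyval(bc, cutoff)

# Simulated example: communities are assigned to the intervention when
# tobacco sales exceed a cutoff of 50; the simulated true effect is a
# 5-point drop in smoking prevalence.
rng = np.random.default_rng(0)
sales = rng.uniform(0, 100, 400)
prevalence = 10 + 0.2 * sales - 5 * (sales >= 50) + rng.normal(0, 1, 400)
effect = rd_effect(sales, prevalence, 50)  # close to -5
```

Note the effect is measured at the cutoff only, matching the definition above: the discontinuity between the regression lines, not a difference in group means.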
Regression discontinuity
[Figure: smoking prevalence plotted against tobacco sales, with separate regression lines for the control and intervention groups; the treatment effect appears as the discontinuity between the two lines at the intervention cutoff]
Causal association Causation has its roots in the Koch-Henle Postulates
Koch-Henle covers diseases for which the cause is necessary and sufficient; usually restricted to infectious agents
These postulates are not appropriate when disease entities are multifactorial, such as most chronic diseases
Diseases not caused by infectious agents usually require considering etiologic factors that are sufficient but perhaps not necessary, or even not sufficient but contributory
Causal relationships
Steps in determining cause: » Investigate the statistical and temporal
association » Eliminate alternatives through research
Often epidemiology must be satisfied with determining “causal association”
Causal Association
The association should be strong
The association should be consistent
The association should show a dose-response relationship (biological gradient): strongest where you expect it to be
The exposure should precede the outcome
The association should be biologically plausible, including supportive data from other sources
Confounding
A distortion of the true relationship between a given exposure and a given outcome, resulting from a mutual relationship with one or more extraneous factors
The effect of the extraneous factor(s) can account for all or part of the observed relationship between exposure and outcome, or mask an underlying relationship
Confounding
Exists when the association between an exposure and an outcome is due, in whole or in part, to the mutual association with a third variable
Example: hot flashes are associated with endometrial cancer, through the mutual association with estrogen use
Confounding
Expressed diagrammatically:

Exposure of interest ------ Outcome of interest
            \                /
          Confounding factor
Criteria for confounding (need both):
The potential confounder must be associated with the outcome of interest: » the confounder is an actual risk factor for
the outcome » the confounder affects the likelihood of
recognizing the outcome The potential confounder must be
associated with the exposure of interest but not be a result of the exposure
When is a factor not a confounder?
If an individual's status regarding the confounder is a result of the exposure under study, or the confounder is in the "causal pathway" between exposure and outcome
If an individual's status regarding the confounder is a result of the disease under study
If the confounder is essentially measuring the same thing as the exposure
If the association between the confounder and the outcome of interest is thought to be due to chance
How do you detect confounding?
Determine if the potential confounder is associated with both exposure and outcome
Adjust for the potential confounder in analyzing the data: if there is a difference between the adjusted and unadjusted estimates of the effect of the exposure, then the potential confounder is a true confounder
How do you account for or control for confounding?
Prevent confounding through the study design: » Restriction: study only the subjects in a given
category » Matching: match individuals for comparison on the
basis of their status regarding the confounder (and use a matched analysis)
» Use a RCT design: randomly allocate subjects to exposed and unexposed groups (note that confounding can still happen by chance)
How do you account for or control for confounding?
Remove effects of confounding in the analysis: » Report stratum-specific rates: list the effect of the exposure on the outcome for each level of the confounder
» Use rate adjustment to account for the confounder » Use Mantel-Haenszel methods to account for the
confounder » Use regression analysis (especially attractive if you
need to consider several coincident confounders)
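Of these, the Mantel-Haenszel summary is simple enough to sketch directly. A minimal version follows; the stratified 2x2 counts are hypothetical (here, strata of a confounder such as gender), chosen only to show the mechanics.

```python
def mantel_haenszel_or(strata):
    """Mantel-Haenszel summary odds ratio across strata of a confounder.

    Each stratum is a 2x2 table (a, b, c, d) laid out as:
        a = exposed cases,   b = exposed non-cases,
        c = unexposed cases, d = unexposed non-cases.
    """
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den

# Hypothetical data stratified by gender
strata = [(45, 47, 9, 16), (5, 3, 42, 33)]
or_mh = mantel_haenszel_or(strata)  # a weighted summary of the stratum ORs
```

The summary OR lies between the stratum-specific ORs, weighting each stratum by its size; it is only appropriate when the stratum ORs are reasonably similar (i.e., no strong effect modification).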
Effect modification
Occurs when the effect of a risk factor on an outcome is different at different levels of a third factor; the third factor is known as an effect modifier
Note that compared to the definition of confounding, effect modification says nothing about the relationship between the outcome and the effect modifier
Effect modification
The most common effect modifiers seen are age and gender » For example, the effect of estrogen on
heart disease risk is very different for men and women; hence gender is an effect modifier
It is usually not appropriate to summarize over the strata of an effect modifier
Confounding and effect modification
Males          CAD    No CAD
 High chol      45      47
 Normal          9      16      OR = 1.7
Females
 High chol       5       3
 Normal         42      33      OR = 1.3
Pooled
 High chol      50      50
 Normal         51      49      OR = 0.96
Detecting confounding
           CAD    No CAD
Male        54      63
Female      47      36      OR = 0.66

        High chol   Normal
Male        92        25
Female       8        75    OR = 34.5

Gender is related both to the outcome of interest and to the risk factor of interest and therefore meets the criteria of a confounder
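Both checks run directly on the 2x2 counts from the preceding slides; a minimal sketch (the function name is illustrative):

```python
def odds_ratio(a, b, c, d):
    """Odds ratio from a 2x2 table: a = exposed cases, b = exposed
    non-cases, c = unexposed cases, d = unexposed non-cases."""
    return (a * d) / (b * c)

# Gender-specific and pooled ORs for high cholesterol and CAD
or_male = odds_ratio(45, 47, 9, 16)     # ~1.7
or_female = odds_ratio(5, 3, 42, 33)    # ~1.3
or_pooled = odds_ratio(50, 50, 51, 49)  # ~0.96: distorted by confounding

# Criteria for confounding: gender is associated with the outcome...
or_gender_cad = odds_ratio(54, 63, 47, 36)   # ~0.66
# ...and with the exposure
or_gender_chol = odds_ratio(92, 25, 8, 75)   # 34.5
```

The pooled OR (0.96) differs sharply from both stratum-specific ORs (1.7, 1.3), which is exactly the adjusted-versus-unadjusted discrepancy that flags a true confounder.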
Rate adjustment
Rate adjustment is a technique that provides a single number (an adjusted rate) for each of two or more comparison groups; the adjusted rate summarizes the group's experience without being influenced by a confounding variable when the comparison is made
Rate adjustment
Rate adjustment is most often used to account for the effect of age (age adjustment); it is also commonly used to adjust for gender, and could be used to adjust for any confounder or group of confounders
In rate adjustment, you essentially estimate how the comparison would turn out if the groups had the same distribution of the confounder (e.g., the same age distribution)
Direct adjustment
Choose a standard population whose age distribution is known, such as the US population
Calculate the number of events which would be expected to occur in each age group of the standard population using the age group-specific rates from each comparison group
Direct adjustment
Calculate the age-adjusted event rate for each comparison group:
Age-adjusted rate = Expected events in standard pop'n (using rates from comparison group) / Total in standard population
Direct adjustment
Interpret these rates as the expected event rate for each comparison group if the group population had the same age distribution as the standard population
Note that the age-adjusted rates will vary depending on the age distribution of the standard population used for the standardization
Direct adjustment If all you want to do is compare the event
rates in the comparison groups then what standard population you choose is not important » dividing the two rates will give you an age-
adjusted relative risk If you want to use the age-adjusted rates as
descriptive tools in other comparisons, you will want to use a standard such as the U.S. population
Direct adjustment
             US pop’n      CO rate of    Expected    WA rate of    Expected
                           disease       cases       disease       cases
Age 40-50    10 million    1/10,000      1,000       1.2/10,000    1,200
Age 50-60    10 million    2/10,000      2,000       2.3/10,000    2,300
Total        20 million                  3,000                     3,500
Direct adjustment
Age adjusted rate: Colorado 3,000/20,000,000 = 1.5/10,000
Age adjusted rate: Washington
3,500/20,000,000 = 1.75/10,000
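The direct-adjustment arithmetic can be sketched in a few lines of Python (names are illustrative); applying each state's age-specific rates, per 10,000, to the standard population reproduces the adjusted rates above:

```python
def direct_adjusted_rate(standard_pops, group_rates):
    """Directly age-adjusted rate: apply the study group's age-specific
    rates to the standard population's age distribution."""
    expected = sum(n * r for n, r in zip(standard_pops, group_rates))
    return expected / sum(standard_pops)

standard = [10_000_000, 10_000_000]      # US population, ages 40-50 and 50-60
co_rates = [1 / 10_000, 2 / 10_000]      # Colorado age-specific rates
wa_rates = [1.2 / 10_000, 2.3 / 10_000]  # Washington age-specific rates

co_adj = direct_adjusted_rate(standard, co_rates)  # 1.5 per 10,000
wa_adj = direct_adjusted_rate(standard, wa_rates)  # 1.75 per 10,000
```

Dividing the two adjusted rates (1.75/1.5) gives the age-adjusted relative risk mentioned above, and that ratio is the same for any choice of standard population.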
Indirect adjustment
Choose a standard population for which the age-specific event rates are known. This standard population can be either of the comparison groups or an external population
Indirect adjustment
Calculate the number of deaths which would be expected to occur in each age group of each of the comparison groups had they been subject to the event rates observed for the corresponding age groups in the standard population
Indirect adjustment
Calculate the Standardized Mortality Ratio (SMR) for each study group:
SMR = Observed number of deaths in study group / Expected number of deaths in study group
Indirect adjustment
             US death    Colorado                      Washington
             rate        Pop’n     Obs     Exp        Pop’n     Obs     Exp
Age 40-50    10%         10,000    1,500   1,000      15,000    1,000   1,500
Age 50-60    15%         20,000    3,500   3,000      20,000    2,500   3,000
Total                              5,000   4,000                3,500   4,500
Indirect adjustment
Colorado SMR: 5,000/4,000 = 1.25
Washington SMR:
3,500/4,500 = 0.78
Indirect adjustment
The SMRs can be interpreted as the ratio of the observed number of events in each comparison group to the number that would be expected based on the event rates from the standard population
An SMR of 2 for group A, using group B as the standard, means that twice as many deaths occurred in group A as would be expected from group B's rates, after accounting for age
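The SMR computation is a one-liner once expected deaths are in hand. A sketch reproducing the Colorado and Washington SMRs follows; the standard age-specific death rates of 10% and 15% are assumed here so that the expected counts match the worked example.

```python
def smr(observed, group_pops, standard_rates):
    """Standardized mortality ratio: observed deaths divided by the
    deaths expected if the group had experienced the standard
    age-specific rates."""
    expected = sum(n * r for n, r in zip(group_pops, standard_rates))
    return observed / expected

us_rates = [0.10, 0.15]  # assumed standard rates, ages 40-50 and 50-60

co_smr = smr(5_000, [10_000, 20_000], us_rates)  # 5,000 / 4,000 = 1.25
wa_smr = smr(3_500, [15_000, 20_000], us_rates)  # 3,500 / 4,500 ~ 0.78
```

Note only the age distribution of each study group is used, plus the standard's age-specific rates: the defining feature of indirect adjustment.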
Indirect vs. direct adjustment
Direct adjustment is commonly used to provide adjusted rates for descriptive purposes, and in comparing data from several different studies
Indirect vs. direct adjustment
Indirect adjustment may be preferable when there are only a small number of people in each age category of the study groups; in that case the age-specific rates are based on small numbers, and a change of only one or two individuals in the numerator produces big changes in those rates
Indirect vs. direct adjustment Both methods give a single number for each
study group These somewhat artificial summary numbers
are based on hypothetical circumstances, but since the circumstances are the same for both groups, these numbers provide a fairer comparison than the overall rates, which reflect both differences in age distributions and the underlying differences between comparison groups
Indirect vs. direct adjustment
                                   Direct adjustment    Indirect adjustment
Data used from standard pop’n      Age distribution     Age-specific rates
Data used from each study group    Age-specific rates   Age distribution
Result of adjustment for each      Age-adjusted rate    Ratio of observed to
  study group                                           expected events (SMR)
Disease Screening
The disease/prevention spectrum:

Primary Prevention        Secondary Prevention        Tertiary Prevention
No Disease  ->  Asymptomatic Disease  ->  Symptomatic Disease  ->  Disease with Complications
Criteria for targeting a disease for screening
The disease is important
The disease has a recognizable pre-symptomatic stage; this stage needs to be long enough to provide a window for screening, diagnosis and treatment
Reliable screening tests exist for the pre-symptomatic stage and have acceptable sensitivity and specificity as well as acceptable risks for the screened individuals.
Criteria for targeting a disease for screening
Treatment of the disease during the pre-symptomatic stage must result in improvement in outcome (morbidity and/or mortality)
Sufficient resources and access to these resources must exist for diagnosis and treatment of disease in the population with positive screening tests
The benefits of screening for this disease should outweigh its costs and compare favorably with the benefits of competing programs
Screening test utilities
                     The “Truth”
Test Results         Disease                  No Disease
  Disease            True Positives (TP)      False Positives (FP)
  No Disease         False Negatives (FN)     True Negatives (TN)
Sensitivity and Specificity
Sensitivity describes how often a screening test detects a disease when it is indeed present; = TP/(TP+FN), or true positives over the total with disease
Specificity describes how often a screening test detects the absence of disease when it is indeed absent; = TN/(TN+FP), or true negatives over the total without disease
Predictive Value
Positive predictive value describes how often individuals with positive tests actually have the disease; = TP/(TP+FP), or true positives over all positives
Negative predictive value describes how often individuals with negative tests are actually disease-free; = TN/(TN+FN), or true negatives over all negatives
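All four measures follow directly from the 2x2 counts; a minimal sketch (the function name and the counts below are hypothetical):

```python
def screening_stats(tp, fp, fn, tn):
    """Sensitivity, specificity, and predictive values from a 2x2 table."""
    return {
        "sensitivity": tp / (tp + fn),  # TP over all with disease
        "specificity": tn / (tn + fp),  # TN over all without disease
        "ppv": tp / (tp + fp),          # TP over all positive tests
        "npv": tn / (tn + fn),          # TN over all negative tests
    }

stats = screening_stats(tp=90, fp=50, fn=10, tn=850)
```

Note sensitivity and specificity read down the "truth" columns of the table, while the predictive values read across the test-result rows.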
Effect of prevalence
Sensitivity and specificity remain the same for the test regardless of the prevalence of disease in the population
Predictive value depends on sensitivity, specificity, as well as disease prevalence
Effect of prevalence
As the test is applied to populations with lower disease prevalence, positive predictive value drops and negative predictive value increases
Conversely, as disease prevalence increases, positive predictive value increases and negative predictive value drops
Example
If fecal occult blood testing is 92% sensitive and 95% specific, and the prevalence of colon cancer is 2/1000, what is the positive predictive value of a positive test (i.e., what percent of positives will have colon cancer)?
Assume you screen 100,000 adults:
  200 have cancer: 184 test positive (TP), 16 test negative (FN)
  99,800 have no cancer: 4,990 test positive (FP), 94,810 test negative (TN)
Then 184 out of (184 + 4,990), or 3.6%, of people testing positive have the disease
Note that negative predictive value is over 99.9%. As the prevalence increases to 50 per 1000, positive predictive value rises to 49.2% and negative predictive value decreases to 99.6%
100,000 adults:
  5,000 have cancer: 4,600 test positive (TP), 400 test negative (FN)
  95,000 have no cancer: 4,750 test positive (FP), 90,250 test negative (TN)
4,600 out of (4,600 + 4,750), or 49.2%, of people testing positive have the disease
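The prevalence effect can be computed without building the full cohort each time; this sketch (names are illustrative) fills in the 2x2 table as proportions and reproduces the fecal occult blood figures:

```python
def predictive_values(sens, spec, prevalence):
    """PPV and NPV from sensitivity, specificity, and prevalence,
    by filling in the 2x2 table for a screened cohort of size 1."""
    tp = sens * prevalence
    fn = (1 - sens) * prevalence
    fp = (1 - spec) * (1 - prevalence)
    tn = spec * (1 - prevalence)
    return tp / (tp + fp), tn / (tn + fn)

# Fecal occult blood testing: 92% sensitive, 95% specific
ppv_rare, npv_rare = predictive_values(0.92, 0.95, 2 / 1000)       # PPV ~3.6%
ppv_common, npv_common = predictive_values(0.92, 0.95, 50 / 1000)  # PPV ~49.2%
```

As prevalence rises from 2 to 50 per 1,000, PPV climbs from about 3.6% to 49.2% while NPV falls slightly, exactly the pattern described above.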
Screening for rare diseases If the disease is rare in the population screened,
even a test with high sensitivity and specificity will produce many more false positives than true positives
The false positives can result in significant costs, health risks, and anxiety, as further tests are required to diagnose disease
Strategies to improve the predictive value of a test include using a test with higher sensitivity and specificity, or targeting screening efforts on high-risk populations
Summary of effects of prevalence and sensitivity and specificity on predictive values
↑ Prevalence ----> ↑ PPV and ↓ NPV
↓ Prevalence ----> ↓ PPV and ↑ NPV
↑ Specificity ----> ↑ PPV
↑ Sensitivity ----> ↑ NPV
Screening test cutoffs
There are often times when it is possible to choose a cutoff for declaring a positive or negative test
Most often this occurs when the screening test involves measuring the level of a substance in a body fluid such as urine or blood
In such a case, the choice of a cutoff for a positive test must involve consideration of the effects of trading off false positives for false negatives
– (N.B.--selection of cutoffs may not apply for more subjective screening tests such as mammography.)
PKU example
PKU (phenylketonuria) affects about 1/10,000 newborns. The following results are from a cost-benefit analysis of a Swedish program
                 With PKU      Normal
Cutoff point     TP    FN      FP    TN           Sens/Spec    +/- Pred Val   Cost/Case
0.4 mmol/l       43     0      33    1,326,421    100/>99.9    57/100         379K
0.5 mmol/l       43     0      26    1,326,428    100/>99.9    62/100         378K
0.6 mmol/l       41     2      13    1,326,441     95/>99.9    76/>99.9       477K
0.7 mmol/l       38     5       7    1,326,447     88/>99.9    84/>99.9       624K
Costs are in Swedish Kroner and ignore the 15 Kroner cost of the screening test itself, using only the additional costs of diagnosis for normal infants with positive tests, the additional costs of treatment for PKU infants, and the lifetime care of undetected PKU infants.
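The sensitivity and predictive-value columns can be recomputed from the raw counts, which makes the cutoff trade-off explicit; a sketch using the counts from the table (variable names are illustrative):

```python
# (TP, FN, FP, TN) at each cutoff, taken from the PKU table above
cutoffs = {
    0.4: (43, 0, 33, 1_326_421),
    0.5: (43, 0, 26, 1_326_428),
    0.6: (41, 2, 13, 1_326_441),
    0.7: (38, 5, 7, 1_326_447),
}

def performance(tp, fn, fp, tn):
    """Sensitivity, specificity, and PPV for one cutoff."""
    return {
        "sens": tp / (tp + fn),
        "spec": tn / (tn + fp),
        "ppv": tp / (tp + fp),
    }

perf = {level: performance(*counts) for level, counts in cutoffs.items()}
```

Moving the cutoff from 0.7 down to 0.4 mmol/l raises sensitivity from 88% to 100% while PPV falls from 84% to 57%: the false-positive/false-negative trade-off in numbers.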
PKU example
[Figure: overlapping distributions of serum phenylalanine for normal children and PKU children, with two candidate cutoffs (A and B) marked; any cutoff in the overlap region misclassifies some children]
Caveats about cutoffs
No cutoff point totally separates those with or without disease
Choosing a lower cutoff point raises sensitivity at the expense of lowering positive predictive value and (although not seen in this example) specificity; choosing a higher cutoff does the opposite
Caveats about cutoffs
Thus the choice of the best cutoff involves trading off false positives against false negatives
FPs and FNs may not deserve equal consideration; here false negatives have worse health consequences as well as generating more costs
Screening test biases
In the evaluation of a screening procedure or program, at least two sources of bias must be considered, lead time and length time bias
These sources of bias can result in finding improvements in disease mortality even if the early treatment of the disease in no way alters the outcome
Lead time bias
Lead time bias ("zero-time shift bias") occurs because screening tests detect illness at earlier points in time
Lead time bias If you developed a screening test that diagnoses
a disease one year earlier, but early treatment had no effect on survival, you would know the person had disease for a year longer than those detected without screening
Comparison of the outcomes between screened and unscreened groups would yield the erroneous conclusion that the screening program resulted in an improved outcome, when all it did was add a year to the length of time that the disease was diagnosed
Length time bias Length-time bias can be understood by
recognizing that for most diseases, the natural history varies in time span » One cancer may progress quickly through all
phases of the disease spectrum, with a short asymptomatic phase and a short interval between the development of symptomatic disease and death
» Another cancer of the same organ may have a very long, indolent course, with long asymptomatic and symptomatic phases
Length time bias
If the prevalence pool (those with the disease in the population) is made up of individuals with these differences, then a screening test will be more likely to pick up individuals with longer asymptomatic phases
Hence, those detected by screening will tend to have longer, more indolent disease, and thus a better prognosis, regardless of whether the disease is detected by screening or not
Parallel and Series Testing Strategies
There are two methods for combining the results of multiple screening tests in an overall strategy: » Parallel testing: the overall screening result is
positive if any one test is positive; this strategy increases sensitivity at the expense of specificity
» Series testing: the overall screening result is positive only if all tests are positive; this strategy increases specificity at the expense of sensitivity
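Under the simplifying assumption that the two tests are conditionally independent given disease status (an assumption, not always true in practice), the combined characteristics follow from simple probability; a sketch with hypothetical test characteristics:

```python
def parallel_tests(test1, test2):
    """Combined (sensitivity, specificity) when the result is positive
    if EITHER test is positive, assuming conditional independence."""
    (se1, sp1), (se2, sp2) = test1, test2
    return 1 - (1 - se1) * (1 - se2), sp1 * sp2

def series_tests(test1, test2):
    """Combined (sensitivity, specificity) when the result is positive
    only if BOTH tests are positive, assuming conditional independence."""
    (se1, sp1), (se2, sp2) = test1, test2
    return se1 * se2, 1 - (1 - sp1) * (1 - sp2)

# Hypothetical tests: (sensitivity, specificity)
a, b = (0.90, 0.85), (0.80, 0.90)
par = parallel_tests(a, b)  # sensitivity rises to 0.98, specificity falls to 0.765
ser = series_tests(a, b)    # sensitivity falls to 0.72, specificity rises to 0.985
```

The direction of each trade-off matches the text: parallel testing misses fewer cases but flags more healthy people; series testing does the opposite.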
Concepts of infectious disease epidemiology
Types of host-agent interaction: spectrum of disease
Colonization: agent infects host continuously with no overt evidence of disease or infection
Covert infection: agent infects host, time-limited, with no overt indication (majority of infections)
Overt infection--infection with disease (minority of infections)
Typical patterns of infection
Inapparent infection frequent, clinical disease rare
 » Inapparent infection is important in disease transmission and control, and influences apparent epidemiology (understates amount of disease and overstates severity)
Clinical disease frequent, few severe cases
Infection nearly always fatal
Infectious disease terms
Infectivity--ability to invade and multiply in a host (establish an infection)
 » ID50 = dose of agent necessary to infect 50% of hosts
Infectiousness--ability to be transmitted to other hosts
Pathogenicity--ability to cause disease in a susceptible host; to produce clinical illness
Infectious disease terms
Virulence--ability to produce severe clinical illness, including death
 » LD50 = dose of agent necessary to kill 50% of hosts
Immunogenicity--ability to elicit an immune response (cellular, local, or systemic)
Areas important to the natural history of infectious disease
Reservoir--a thing (person, animal, insect, plant, soil) in which an agent lives, depends on for survival, and reproduces for transmission into a susceptible host
Vector--that which allows for the transmission of the agent
Transmission--the action of moving from a reservoir to a susceptible host
Host--the organism at the receiving end of the transmission
Types of transmission
Direct--immediate transfer from one host to another » Examples--air droplets, direct body contact » Control target--primary host
Indirect--transmission via an intermediate agent » Examples--Vehicle-borne (fomites, water)
– vector borne (mechanical/passive = flies; biological/active = mosquitoes)
– Control target--secondary host » Airborne--aerosolized organisms (mist, dust)
Issues in transmission Generation time--period between infection and
maximal communicability (can be the same as incubation time)
Herd immunity--resistance of a population to the spread of disease based on prior immunity of the population.
Secondary attack rate--measures the rate of spread of disease within an exposed group
= (# new cases - # initial cases) / (# susceptible - # initial cases)
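The secondary attack rate formula above can be sketched directly; the household outbreak numbers below are hypothetical, chosen only to show the arithmetic:

```python
def secondary_attack_rate(total_cases, initial_cases, susceptibles):
    """Secondary attack rate, per the formula above: cases beyond the
    initial (primary) cases, over the susceptibles remaining after
    excluding those initial cases."""
    return (total_cases - initial_cases) / (susceptibles - initial_cases)

# Hypothetical household outbreak: among 5 susceptibles, 1 index case
# introduces infection and 3 more people become ill (4 cases in total)
sar = secondary_attack_rate(total_cases=4, initial_cases=1, susceptibles=5)
# 3 secondary cases among 4 remaining susceptibles = 0.75
```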
Types of infections in a population Sporadic--occasional cases at irregular intervals Endemic--low level (usually), expected frequency of
disease occurrence; the constant presence of a disease within a given geographical area ("usual prevalence")
Hyperendemic--a gradual increase in the frequency of disease occurrence above endemic level
Epidemic--a sudden increase in the frequency of disease occurrence above endemic level (clearly in excess of expected levels)
Pandemic--an epidemic occurring across continents
Types of outbreaks/epidemics
Common source--may be a point source (example: food poisoning) or an ongoing source (example: water contamination)
Epidemiologic curve (plot of number of cases against time) for a point source is single-humped, with a rapid rise and a slower return to baseline
Curve for ongoing source shows rapid rise (at onset of exposure) to persistent epidemic level
Types of outbreaks/epidemics
Propagated epidemic-- transmission, direct or indirect, from one host to another
Epidemiologic curve is multi-humped: » the first peak represents the primary attack
rate for those exposed to the index case » subsequent peaks represent secondary and
beyond attack rates as those in the first peak infect others
Types of outbreaks/epidemics
Vector-borne--similar to common source, but with a more complex pattern because of the vector
Control of infectious disease
Control of reservoir: animal, human Control of transmission: vector,
environmental Reduction of host susceptibility Disease eradication
Vaccine efficacy
Vaccine efficacy = [(AR in unvaccinated - AR in vaccinated) / AR in unvaccinated] x 100%

= (1 - Relative Risk (vaccinated vs. unvaccinated)) x 100%

= [1 - (proportion of cases vaccinated x (1 - proportion of population vaccinated)) / ((1 - proportion of cases vaccinated) x proportion of population vaccinated)] x 100%
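Both forms of the formula are easy to check against each other in code; the outbreak numbers below are hypothetical, chosen so the two methods can be compared:

```python
def vaccine_efficacy(ar_unvacc, ar_vacc):
    """Vaccine efficacy from attack rates: (ARU - ARV) / ARU."""
    return (ar_unvacc - ar_vacc) / ar_unvacc

def vaccine_efficacy_screening(p_cases_vacc, p_pop_vacc):
    """The same quantity from the proportion of cases vaccinated and the
    proportion of the population vaccinated, per the last form above."""
    return 1 - (p_cases_vacc * (1 - p_pop_vacc)) / ((1 - p_cases_vacc) * p_pop_vacc)

# Hypothetical outbreak: attack rate 10% in unvaccinated, 2% in vaccinated,
# with 60% of the population vaccinated. Per 1,000 people that is 12
# vaccinated cases and 40 unvaccinated cases, so 12/52 of cases are vaccinated.
ve = vaccine_efficacy(0.10, 0.02)
ve2 = vaccine_efficacy_screening(p_cases_vacc=12 / 52, p_pop_vacc=0.60)
```

Both forms give 80% efficacy; the second ("screening") form is useful when attack rates cannot be observed directly but vaccination coverage and case vaccination status are known.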