Study Design in Molecular Epidemiology of Cancer Epi243 Zuo-Feng Zhang, MD, PhD

Objectives of Molecular Epidemiology

To gain knowledge about the distribution and determinants of disease occurrence and outcome that may be applied to reduce the frequency and impact of disease in human populations.

Epidemiological Study Design and Analysis

• Transitional studies provide a bridge between the use of biomarkers in laboratory experiments and their use in cancer epidemiological studies.

• These studies are employed to characterize biomarkers

• They address the problems involved in the use of biomarkers

• They serve as preliminary results rather than end results about cancer etiology and prevention

Epidemiological Study Design and Analysis

Transitional studies:

• Measure intra- and inter-subject variability

• Explore the feasibility of marker use in field conditions

• Identify potential confounding and effect-modifying factors for the marker

• Study mechanisms reflected by the biomarker

Transitional Studies

Transitional studies can be divided into three functional categories:

• Developmental

• Characterization

• Applied studies

Transitional Studies: Developmental Studies

Developmental studies involve determining:

• the biological relevance of the marker

• its pharmacokinetics

• the reproducibility of measurement of the marker

• the optimal conditions for collecting, processing, and storing biological specimens in which the marker is to be measured

Transitional Studies: Characterization

• Assessing inter-individual variation and the genetic and acquired factors that influence the variation of biomarkers in populations


• Assessing frequency or level of a marker in populations

• Identifying factors that are potential confounders or effect modifiers


• Establishing the components of variance in biomarker measurement: laboratory variability, intra-individual variation, and inter-individual variation. The ratio of intra-individual variation to inter-individual variation has important implications for study size and power
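The variance components mentioned above can be illustrated with a minimal sketch. It assumes a balanced design (the same number of repeated measurements per subject) and uses a one-way ANOVA estimator; the function name and the example data are hypothetical. Note that the within-subject component here lumps together laboratory variability and intra-individual variation; separating them would require replicate assays of the same specimen.

```python
from statistics import mean

def variance_components(measurements):
    """Estimate between- and within-subject variance from repeated measures.

    measurements: list of per-subject lists, each holding k repeated
    biomarker values (balanced design assumed).
    Returns (var_between, var_within, icc).
    """
    n = len(measurements)
    k = len(measurements[0])
    if n < 2 or k < 2 or any(len(m) != k for m in measurements):
        raise ValueError("need at least 2 subjects with the same k >= 2 repeats")

    subject_means = [mean(m) for m in measurements]
    grand_mean = mean(subject_means)

    # One-way ANOVA mean squares
    ms_between = k * sum((sm - grand_mean) ** 2 for sm in subject_means) / (n - 1)
    ms_within = sum(
        (x - sm) ** 2 for m, sm in zip(measurements, subject_means) for x in m
    ) / (n * (k - 1))

    var_within = ms_within
    var_between = max((ms_between - ms_within) / k, 0.0)  # truncate at zero
    total = var_between + var_within
    icc = var_between / total if total > 0 else 0.0
    return var_between, var_within, icc

# Hypothetical data: three subjects, two measurements each
var_b, var_w, icc = variance_components([[10, 12], [20, 22], [30, 32]])
print(f"between={var_b:.1f} within={var_w:.1f} ICC={icc:.3f}")
```

An intraclass correlation (ICC) close to 1 means a single specimen ranks subjects well; a low ICC signals that single measurements will misclassify subjects and that larger studies or repeated sampling may be needed.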

Transitional Studies: Applied Studies

• Applied studies assess the relationship between a marker and the event that it marks, including exposure, pre-clinical effects, disease, and susceptibility

• The study usually has a cross-sectional or short-term longitudinal design and is not intended to establish or refute a causal relationship between a given exposure and disease.

Transitional Studies: Ethical Issues

• The objectives of the research generally are not to identify health risks, but to identify characteristics of the biomarker or the distribution of the marker by population subtypes.

• The meaning of the biomarker results is usually unknown.

• There is a need to anticipate the impact of transitional studies on study subjects and plan to address their concerns.

Cohort or Case-Control Studies

• In clinic-based cohort studies of treated patients or screened populations, the inclusion of biological measures of exposure and susceptibility is both methodologically sound and logistically feasible

Cohort or Case-Control Studies

• In population-based studies, the collection of biological material for such markers is feasible but logistically more complex.

• For early biological markers, collection of materials (e.g., pre-cancerous lesions) is logistically feasible in a hospital setting, but becomes more difficult in the population setting

Prospective Studies: Strengths

• Exposure is measured before the outcome

• The source population is defined

• The participation rate is high if specimens are available for all subjects and follow-up is complete

Prospective Studies: Weaknesses

• The usually small number of cases for each of the many types of cancer

• A shortage of specimen material if the biomarker requires large amounts of specimen or unusual specimens

• Degradation of the biomarkers during long-term storage

• The lack of details on other potentially confounding or interacting exposures

Prospective Studies

• The major concern in cohort studies of short duration (as in case-control studies) is the possibility that the disease process has influenced the biomarker level among cases diagnosed within 1 to 2 years of specimen collection.

Prospective Studies: Misclassification

• In prospective studies of longer duration, there may be considerable misclassification of the etiologically relevant exposures if the specimens have been collected only at baseline.

• This misclassification arises because an individual's exposure level may change systematically over time and because there may be intra-individual variation in biomarker level.

Prospective Studies: Intra-Individual Variation

• The intra-individual misclassification may be reduced by taking multiple samples, but this will generally increase expenses of sample collection and storage and the burden on study subjects

• Similar approaches apply to taking samples at several points in time in an attempt to estimate time-integrated exposures or exposure change.

Prospective Studies

• An alternative approach is to estimate the extent of intra-individual variation, and the misclassification involved in taking single specimens, by taking multiple specimens in a sample of the cohort.

• This information can be used to correct for the bias toward the null introduced if the misclassification is non-differential, and therefore to de-attenuate observed relative risks
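One common way to carry out such a correction is sketched below. It is an illustration under stated assumptions, not necessarily the method intended in the lecture: the biomarker is continuous, its measurement error is classical and non-differential, the dose-response is log-linear, and the reliability coefficient (for example an intraclass correlation from the repeat-specimen subsample) is known. Under those assumptions the observed log relative risk is attenuated by the reliability coefficient, so dividing by it de-attenuates the estimate. The function name is hypothetical.

```python
import math

def deattenuate_rr(rr_observed, reliability):
    """De-attenuate a relative risk for non-differential measurement error.

    rr_observed: relative risk estimated from single specimens.
    reliability: reliability coefficient in (0, 1], e.g. an ICC estimated
                 from repeated specimens in a sample of the cohort.
    Assumes classical error and a log-linear exposure-disease relation.
    """
    if rr_observed <= 0:
        raise ValueError("relative risk must be positive")
    if not 0 < reliability <= 1:
        raise ValueError("reliability must lie in (0, 1]")
    return math.exp(math.log(rr_observed) / reliability)

# Hypothetical example: observed RR of 1.5 with a reliability of 0.5
print(round(deattenuate_rr(1.5, 0.5), 2))
```

The correction moves the estimate away from the null, and the lower the reliability, the larger the adjustment; the confidence interval widens correspondingly, which this sketch does not compute.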

Prospective Studies: Ethical Issues

• Repeated contact of subjects

• Informing the cohort members of their biomarker level is problematic if the biomarker is not considered to be sufficiently predictive of disease and if there are no preventive steps cohort members can take to reduce their risk of the disease

Nested Case-Control Study

• The biomarker can be measured in specimens matched on storage duration

• The case-control set can be analyzed in the same laboratory batch, reducing the potential for bias introduced by sample degradation and laboratory drift

Case-Cohort Study Design

• Specimens are collected at baseline for the entire cohort, and then from cases as they occur.

• The biomarker is measured in the newly collected case specimens, with the baseline cohort specimens serving as controls.

• Because the specimens for cases and controls are taken at different times, bias will be introduced if sample degradation or laboratory drift occurs over time

Case-Control Study Design

• For genetic susceptibility markers, case-control study design is highly appropriate

• Clinic-based case-control studies are particularly suitable for studies of intermediate end-points, as these end-points can be systematically measured.

• Clinic-based case-control studies are excellent for studying etiology of precancerous lesions (e.g., CIN)

Case-Control Study Design

• Biomarkers of internal dose (e.g., carrier status for infectious agents, such as HBsAg) or effective dose (e.g., PAH-DNA adducts) are appropriate when they are stable over a long period of time or when the exposures have been constant over the exposure period. However, it is essential that they are not affected by the disease process, diagnosis, or treatment.

The Case-Case Design: Applications in Tumor Marker and Genetic Polymorphism Studies

Case-Case Study Design

• To identify etiological heterogeneity

• To evaluate gene-environment interaction

Case-Case Study Design

• Also known as the case-only or case-series design

• Studies of cases only, without controls

• Can be employed to evaluate the etiological heterogeneity when studying tumor markers and exposure

• May be used to assess the statistical gene-environment or gene-gene interactions

Interaction Assessment using Case-Control Study

Stratum               Exposure odds ratio
Genotype abnormal     OR1 = OR11/OR10
Genotype normal       OR2 = OR01
Interaction measure   OR1/OR2

OR interaction = OR1/OR2 = OR11/(OR10 x OR01)

Comparison of Case-Control and Case-Case Study designs

Parameter            Case-control                    Case-case
Beta(01)             OR01                            Not measured
Beta(10)             OR10                            Not measured
Beta(interaction)    ORint = OR11/(OR01 x OR10)      Measured
Beta(11)             OR11 = OR01 x OR10 x ORint      Not measured

Assumptions for Case-Case Study Design

• Exposure and genotype occur independently in the population

• The risk of disease is small (or the disease is rare) at all levels of the study variables

From Rothman & Greenland, p.615

Smoking and TGF-alpha Polymorphism

Smoking   TGF-alpha   Cases       Controls     OR adj.
Never     Normal      36 (A00)    167 (B00)    1.0 (OR00)
Never     Positive     7 (A01)     34 (B01)    1.0 (OR01)
Yes       Normal      13 (A10)     69 (B10)    0.9 (OR10)
Yes       Positive    13 (A11)     11 (B11)    5.5 (OR11)

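OR int = OR11/(OR01 x OR10) = 5.5/(1.0 x 0.9) = 6.1

OR CA (case-only OR) = (A11 x A00)/(A10 x A01) = (13 x 36)/(13 x 7) = 5.1

OR CO (control-only OR) = (B11 x B00)/(B10 x B01)

OR11 = (A11 x B00)/(A00 x B11)

OR int = OR11/(OR01 x OR10) = OR CA/OR CO

OR CA = [OR11/(OR01 x OR10)] x OR CO = OR int x OR CO

Assumption: OR CO = 1 (exposure and genotype are independent among controls), so OR int = OR CA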
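The arithmetic on this slide can be checked with a short sketch using the cell counts from the table. Variable names are illustrative. Because the slide works from odds ratios rounded to one decimal, it obtains 6.1; with unrounded odds ratios the same ratio comes out somewhat higher, and it equals OR CA / OR CO exactly.

```python
# Cell counts from the table, keyed by (smoking, genotype): 0 = never/normal, 1 = yes/positive
cases = {(0, 0): 36, (0, 1): 7, (1, 0): 13, (1, 1): 13}
controls = {(0, 0): 167, (0, 1): 34, (1, 0): 69, (1, 1): 11}

def stratum_or(smoking, genotype):
    """Odds ratio for one exposure/genotype cell versus the (never, normal) reference."""
    return (cases[(smoking, genotype)] * controls[(0, 0)]) / (
        cases[(0, 0)] * controls[(smoking, genotype)]
    )

def cross_product(table):
    """Exposure-genotype odds ratio within a single group (cases or controls)."""
    return (table[(1, 1)] * table[(0, 0)]) / (table[(1, 0)] * table[(0, 1)])

or01, or10, or11 = stratum_or(0, 1), stratum_or(1, 0), stratum_or(1, 1)
or_int = or11 / (or01 * or10)   # multiplicative interaction from the case-control data
or_ca = cross_product(cases)    # case-only odds ratio
or_co = cross_product(controls) # control-only odds ratio

print(f"OR01={or01:.2f} OR10={or10:.2f} OR11={or11:.2f}")
print(f"OR int={or_int:.2f} OR CA={or_ca:.2f} OR CO={or_co:.2f}")
```

The case-only odds ratio approximates the interaction odds ratio only to the extent that the control-only odds ratio is close to 1, which is why the independence assumption matters.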

Sample Size

Design          Main effect (RR = 2.0)       Interaction (RR = 2.0)
Case-control    150 cases, 150 controls      600 cases, 600 controls
Case-case       -                            300 cases

Strengths of Case-Case Study Design

• Case-Case study design offers greater precision for estimating gene-environment interaction than case-control study design

• The power for detecting gene-environment interactions in a case-case study is comparable to the power for assessing a main effect in a classic case-control study, which leads to a reduced sample size for interaction assessment.

Strengths of Case-Case Study Design

• Only cases are needed, thus avoiding the difficult and often unsatisfactory selection of appropriate controls (avoiding selection bias for controls)

Limitations of Case-Case Study Design

• The main effects of susceptible genotype (G) and environment effect (E) cannot be estimated

• The case-case study assesses only departures from multiplicative effects; it will miss gene-environment interactions defined as departures from additivity.

Intervention Studies

• In studies of smoking cessation intervention, we can measure either serum cotinine or protein or DNA adducts (exposure) or p53 mutation, dysplasia and cell proliferation (intermediate markers for disease)

• Measure compliance with the intervention, such as by assaying serum β-carotene in a randomized trial of β-carotene.

Intervention Studies

Susceptibility markers (GSTM1) can also be used to determine whether the randomization is successful (comparable intervention and control arms)

Family Studies

• Does familial aggregation exist for a specific disease or characteristic?

• Is the aggregation due to genetic factors or environmental factors, or both?

• If a genetic component exists, how many genes are involved and what is their mode of inheritance?

• What is the physical location of these genes and what is their function?

Issues in Study Design and Analysis

• Relating a particular disease (or marker of early effect) to a particular exposure while minimizing bias, controlling for confounding, assessing and minimizing random error, and assessing interactions

Sample Size and Power Consideration

EPI243: Molecular Epidemiology of Cancer

Sample Size and Power

• False positive (alpha-level, or Type I error). The alpha-levels traditionally used and accepted are 0.01 or 0.05. The smaller the alpha-level, the larger the required sample size.

• False negative (beta-level, or Type II error). (1-beta) is called the power of the study. Investigators like to have a power of around 0.80 or 0.95 when planning a study, which means that there is an 80% or 95% chance of finding a statistically significant difference between study and control groups if a true difference of the assumed size exists.

• The difference between study and control groups (delta). Two factors need to be considered here: one is what difference is clinically important, and the other is what difference has been reported by previous studies.

• Variability. The greater the variability of the data, the larger the required sample size.

Power or Sample Size Estimate for Case-Control Studies

• Alpha-level (false positive): 0.05

• Beta-level (false negative level; 1-beta=power): 0.20

• Delta-level: Proportion of exposure in controls and exposure in cases or expected odds ratio
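How these three inputs combine can be sketched with a standard normal-approximation formula for an unmatched case-control study with equal numbers of cases and controls and a two-sided test. This is a simpler calculation than the matched-design (M.C.C.) computation named in the plot captions in this section, so its numbers will not reproduce those curves. Function names are hypothetical.

```python
import math
from statistics import NormalDist

def exposure_in_cases(p0, odds_ratio):
    """Exposure proportion among cases implied by p0 (controls) and the odds ratio."""
    return odds_ratio * p0 / (1 + p0 * (odds_ratio - 1))

def cases_needed(p0, odds_ratio, alpha=0.05, power=0.80):
    """Cases required (with an equal number of controls), unmatched design.

    p0: proportion exposed among controls.
    odds_ratio: odds ratio to be detected (must differ from 1).
    alpha: two-sided false positive level; power: 1 - beta.
    """
    if not 0 < p0 < 1:
        raise ValueError("p0 must lie strictly between 0 and 1")
    if odds_ratio <= 0 or odds_ratio == 1:
        raise ValueError("odds_ratio must be positive and different from 1")
    p1 = exposure_in_cases(p0, odds_ratio)
    p_bar = (p0 + p1) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p1 * (1 - p1) + p0 * (1 - p0))
    ) ** 2
    return math.ceil(numerator / (p1 - p0) ** 2)

# Example: 20% of controls exposed, odds ratio of 2.0, alpha 0.05, power 0.80
print(cases_needed(0.20, 2.0))
```

Lowering alpha, raising power, or targeting a smaller odds ratio each increase the required number of cases, in line with the points listed above.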

Power Estimate

Sample Size Estimate

[Figure: N vs P0 by OR, with Phi=0.20, M=1, Alpha=0.05, Power=0.80, M.C.C. Test. Curves for OR = 1.50, 2.00, and 2.50; x-axis P0 (0.0 to 0.5), y-axis N (0 to 2500).]

Estimate Minimum Detectable Odds Ratios

[Figure: OR vs P0 by N, with Phi=0.20, M=1, Alpha=0.05, Power=0.80, M.C.C. Test. Curves for N = 200, 400, and 600; x-axis P0 (0.0 to 1.0), y-axis OR (1 to 9).]

Gene-Environment (Gene-Gene) Interaction

EPI242: Molecular Epidemiology

Zuo-Feng Zhang, MD, PhD

Definition for Interaction

• Interaction (effect modification) occurs when the estimate of the effect of exposure depends on the level of another factor in the study base.

• Interaction is not a bias like confounding (or selection or information bias); rather, it is a real difference in the effect of exposure across subgroups that may be of considerable interest.

Interaction Assessment

                      Factor A
                      Absent    Present
Factor B   Absent     RR00      RR01
           Present    RR10      RR11


• RR00, relative risk when both factors absent

• RR01, relative risk when factor A present only

• RR10, relative risk when factor B present only

• RR11, relative risk when both factors A & B present


• Combined RR = RR11

• RR11 > RR01 x RR10 indicates more than multiplicative interaction

• Equivalently, interaction is present when RR11/RR10 > or < RR01/RR00, or when RR11/(RR01 x RR10) > or < 1

• Interaction RR = RR11 / (RR01 x RR10)

Odds Ratios for Two Factors: Interaction?

                      Factor B
                      Absent    Present
Factor A   Absent     1.0       2.5
           Present    4.0       10.0

No more than multiplicative interaction

• ORs for factor B: 2.5 when factor A absent; 2.5 (10.0/4.0) when factor A present

• ORs for factor A: 4.0 when B absent and 4.0 (10.0/2.5) when factor B present


                      Factor B
                      Absent    Present
Factor A   Absent     1.0       2.5
           Present    4.0       20.0

More than Multiplicative Interaction, Positive Quantitative Interaction

• ORs for factor B: 2.5 when factor A absent; 5.0 (20.0/4.0) when factor A present

• ORs for factor A: 4.0 when B absent and 8.0 (20.0/2.5) when factor B present


                      Factor B
                      Absent    Present
Factor A   Absent     1.0       2.5
           Present    4.0       5.0

Less than Multiplicative Interaction, Negative Quantitative Interaction

• Both factors increase the risk regardless of the value of the other factor, but the combined effect is less than the product of the two, although greater than that of either factor alone, giving a negative quantitative interaction.


                      Factor B
                      Absent    Present
Factor A   Absent     1.0       2.5
           Present    4.0       4.0

Less than Multiplicative Interaction, Negative Quantitative Interaction

• Both factors increase the risk

• When A is present, there is no additional effect of factor B

• Adding factor A to factor B only increases the risk to the degree found for factor A alone (4.0), leading to a negative quantitative interaction.
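The four example tables can be compared on the multiplicative scale with a small sketch. It computes the interaction ratio OR11/(OR for A alone x OR for B alone) for each joint odds ratio used in the examples (10.0, 20.0, 5.0, 4.0), with the single-factor odds ratios fixed at 4.0 for factor A and 2.5 for factor B. Function names and the labels returned are illustrative.

```python
def interaction_ratio(or_a_only, or_b_only, or_both):
    """Ratio of the joint odds ratio to the product of the single-factor odds ratios."""
    return or_both / (or_a_only * or_b_only)

def classify(ratio, tolerance=1e-9):
    """Label the interaction on the multiplicative scale."""
    if abs(ratio - 1.0) <= tolerance:
        return "multiplicative (no interaction on this scale)"
    if ratio > 1.0:
        return "more than multiplicative (positive interaction)"
    return "less than multiplicative (negative interaction)"

OR_A_ONLY = 4.0  # factor A present, factor B absent
OR_B_ONLY = 2.5  # factor B present, factor A absent

for or_both in (10.0, 20.0, 5.0, 4.0):
    ratio = interaction_ratio(OR_A_ONLY, OR_B_ONLY, or_both)
    print(f"joint OR {or_both:>4}: ratio {ratio:.1f} -> {classify(ratio)}")
```

A ratio of 1 corresponds to the first table, a ratio above 1 to the second, and ratios below 1 to the last two, where the joint effect falls short of the product of the individual effects.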

Sample Size Consideration for Interaction Assessment

• Evaluation of interaction requires a substantial increase in study size. For example, assessing interaction in a case-control study involves comparing the sizes of the odds ratios (relating exposure and disease) in different strata of the effect modifier, rather than merely testing whether the overall odds ratio is different from the null value of 1.0.

Sample Size Consideration

• The power to test interaction depends on the number of cases and controls in each stratum (of the effect modifier) rather than on the overall numbers of cases and controls.

• When considering possible interactions, the size of the study needs to be at least four times larger than when interaction is not considered (Smith and Day)
