Causation. Associations may be due to Chance (random error) statistics are used to reduce it by appropriate design of the study statistics are used to

Causation

Associations may be due to

Chance (random error) statistics are used to reduce it by appropriate design of the studystatistics are used to estimate the probability that the observed results are due to chance

Bias (Systematic error) must be considered in the design of the study

Confounding can be dealt with during both the design and the analysis of the study

Causation





Causation

Random error vs. Systematic error

70

70

70

70

70

70

70

70

70

70

----

70

69

70

71

71

70

68

72

69

69

71

----

70

71

71

71

71

71

71

71

71

71

71

----

71

70

71

72

73

70

71

70

69

70

71

----

71

Dealing with chance error

During design of study Sample size Power

During analysis (Statistical measures of chance)

Test of statistical significance (P value) Confidence intervals

Statistical measures of chance I(Test of statistical significance)

Observed association

Association in Reality

Yes No

Yes

No

Type I error

Type II error

P-value

the probability the observed results occurred by chancethe probability that an effect at least as extreme as that observed could have occurred by chance alone, given there is truly no relationship between exposure and disease (Ho)statistically non-significant results are not necessarily attributable to chance due to small sample size

Statistical Power

Power = 1 – type II errorPower = 1 - ß

P value

0.00001

Clinical ImportanceVS

Statistical Significance

Statistical measures of chance II(Confidence intervals)

Question?

20 out of 100 participants: 20%



What is the difference?

Answer: Confidence Interval

Definition: A range of values for a variable of interest constructed so that this range has a specified probability of including the true value of the variable for the populationCharacteristics:

a measure of the precision (stability) of an observed effectthe range within which the true magnitude of effect lies with a particular degree of certainty95% C.I. means that true estimate of effect (mean, risk, rate) lies within 2 standard errors of the population mean 95 times out of 100Confidence intervals get smaller (i.e. more precise or more certain) if the underlying data have less variation/scatterConfidence intervals get smaller if there are more people in your sample

95% Confidence Interval (95% CI)

20 out of 100 participants: 20%95% CI: 12 to 28

80 out of 400 participants: 20%95% CI: 16 to 24


95% CI: 19.2 to 20.8

How to Estimate CI?

Standard Error (SE)

95% CI = statistic ± 1.96 SE

Example: 95% CI of mean = sample mean ± SEM SDSEM = n

How to Estimate CI? (example)

A sample of 100 participantsMean of their age 25 yearsSD of age: 10CIM?

CIM = 25 ± 1.96 * 10 / 100 CIM ~ from 23 to 27

Confidence Interval vs

P value





Causation





Causation

Bias

Any systematic error that results in an incorrect estimate of the association between risk factors and outcome

Types of Bias

Selection bias – identification of individual subjects for inclusion in study on the basis of either exposure or disease status depends in some way on the other axis of interestObservation (information) bias – results from systematic differences in the way data on exposure or outcome are obtained from the various study groups

Selection Bias

Self-selection bias Healthy (or diseased) people may seek out participation in the study

Referral biasSicker patients are referred to major health centers

Diagnostic biasDiagnosis of outcome associated with exposure

Non-response biasResponse, or lack of it, depends on exposure

Differential loss to follow-upExposed (or unexposed) group followed with different intensity

Berkson’s biasHospitalization rates differ for by disease and presence/absence of the exposure of interest

Information Bias

Data collection biasBias that results from abstracting charts, interviews or surrogate interviews

Recall bias Disease occurrence enhances recall about potential exposures

Surveillance or detection biasThe exposure promoted more careful evaluation for the outcome of interest

Reporting biasExposure may be under-reported because of attitudes, perceptions, or beliefs

Bias results from systematic flaws

study design, data collection, analysis interpretation of results

Control of Bias

Can only be prevented and controlled during the design and conduct of a study:

Careful planning of measurementsFormal assessments of validityRegular calibration of instrumentsTraining of data collection personnel





Causation





Causation

Confounding

Confounding results when the effect of an exposure on the disease (or outcome) is distorted because of the association of exposure with other factor(s) that influence the outcome under study.

Confounding

Coffee MI

Smoking, Stress

Observed association, presumed causation

Observed associationTrue association

Properties of a confounder

A confounding factor must be a risk factor for the disease. The confounding factor must be associated with the exposure under study in the source population. A confounding factor must not be affected by the exposure or the disease.

The confounder cannot be an intermediate step in the causal path between the exposure and the disease.

Confounding

High fat diet

MI

cholesterol


Observed association True association

Control of Confounding

During design of study Restriction Matching Randomization

During analysis Stratified analysis Multivariate analysis

Confounding

Coffee MI

Smoking, Stress


Observed associationTrue association

Is there any confounding effect?

Coffee drinkingMyocardial Infarction

Yes No

Yes36 24

No24 36

OD= ?

OD = 36×36/24×24OD= 2.25

Stratification

CoffeeDrinkin

g

MyocardialInfarction

Yes No

Yes 2 6No 18 34

CoffeeDrinkin

g


Yes No

Yes 36 24No 24 36

CoffeeDrinkin

g


Yes No

Yes 34 18No 6 2

Smokers

Non Smokers

Stratification

CoffeeDrinkin

g


Yes No

Yes 2 6No 18 34

CoffeeDrinkin

g


Yes No

Yes 34 18No 6 2

Smokers

Non Smokers

OD= ?

OD = 34×2/18×6

OD= 0.63

OD= ?

OD = 2×34/6×18

OD= 0.63

Crude and Common OR

Crude OR = 2.25

Common OR = 0.63

So

There is confounding effect.

Mantel Haenzel OD

Sample sizes in the strata is not equal

?

Common OR is a weighted summary of OR

Control of Confounding

During design of study Restriction Matching Randomization

During analysis Stratified analysis Multivariate analysis





Causation





Causation

DETERMINATION OF CAUSATION

The general QUESTION:Is there a cause and effect relationship between the presence of factor X and the development of disease Y?One way of determining causation is personal experience by directly observing a sequence of events.

Personal Experience / Insufficient

Long latency period between exposure and diseaseCommon exposure to the risk factorSmall risk from common or uncommon exposureRare diseaseCommon diseaseMultiple causes of disease

Scientific Evidences

The answer is made by inference and relies on a summary of all valid evidence.

Nature of Evidence:

1. Replication of Findings – consistent in populations

2. Strength of Association –significant high risk

3. Temporal Sequence – exposure precede disease

Nature of Evidence:

4. Dose-Response – higher dose exposure, higher risk

5. Biologic Credibility – exposure linked to pathogenesis

6. Consideration of alternative explanations –

the extent to which other explanations have been considered.

Nature of Evidence

7. Cessation of exposure (Dynamics) –

removal of exposure – reduces risk

8. Specificity specific exposure is associated with only one disease

9. Experimental evidence

Bradford Hill Criteria (1965)

criteria for assessing causality: Consistency StrengthSpecificity TemporalityPlausibility Coherence Dose Response AnalogyExperimental evidence

Bradford Hill Criteria

Hill stated None of my criteria can bring indisputable evidence for or against the cause-and-effect hypothesis None can be required as sufficient alone

Henle-Koch Postulates

before a causative relationship can be accepted between a disease agent and the disease in question

The agent must be shown to be present in every case of the disease by isolation in a pure culture.The agent must not be found in cases of other disease.Once isolated, the agent must be capable of reproducing the disease in experimental animals.The agent must be recovered from the experimental disease produced

Henle-Koch Postulates

Useful in infectious diseasesNot useful with complicated chronic diseases

Heart diseaseDiabetesCancerViolence

H. pylori

Consistencyassociation has been replicated in other studies

Strength H. pylori is found in at least 90% of patients with duodenal ulcer

Temporal relationship11% of chronic gastritis patients go on the develop duodenal ulcers over a 10-year period.

Dose responsedensity of H.pylori is higher in patients with duodenal ulcer than in patients without

H. pylori

Biologic plausibilityoriginally – no biologic plausibilitythen H. pylori binding sites were foundknow H. pylori induces inflammation

CessationEradication of H Pylori heals duodenal ulcers

SMOKING AND LUNG CANCER

1. Strength of Association: The relative risks for the association of smoking and lung cancer are very high

2. Biologic Credibility: The burning of tobacco produces carcinogenic compounds which are inhaled and come into contact with pulmonary tissue.


3. Replication of findings: The association of cigarette smoke and lung cancer is found in both sexes in all races, in all socioeconomic classes, etc.

4. Temporal Sequence: Cohort studies clearly demonstrate that smoking precedes lung cancer and that lung cancer does not cause an individual to become a cigarette smoker.


5. Dose-Response: The more cigarette smoke an individual inhales, over a life-time, the greater the risk of developing lung cancer.

6. Dynamics (cessation of exposure):

Reduction in cigarette smoking reduces the risk of developing lung cancer.

Smoking is cited as a cause of lung cancer, however. . .

. . . smoking is not necessary (is not a prerequisite) to get lung cancer. Some people get lung cancer who have never smoked.. . . smoking alone does not cause lung cancer. Some smokers never get lung cancer.

Smoking is a member of a set of factors (i.e., web of causation) which cause lung cancer.

The identity of all the other factors in the set are unknown. (One factor in the web of causation is probably genetic susceptibility.)

necessary / sufficient

D

E A

B

C F

B

AH

G

F

C

AJ

I

“A” is necessary – since it appears in each sufficient causal complex

“A” is not sufficient –

Disease Not Present Disease Present Disease Present


necessary and sufficientthe factor always causes disease and disease is never present without the factor

most infectious diseasesnecessary but not sufficient

multiple factors are requiredcancer

sufficient but not necessarymany factors may cause same disease

leukemianeither sufficient nor necessary

multiple cause


Few causes are necessary and sufficient

High cholesterol is neither necessary nor sufficient for CVD because many individuals who develop CVD do not have high serum cholesterol levels

Hierarchy in Establishing Causality

trialsrandomized, double-blind, placebo-controlled with sufficient power and appropriate analysis

cohort studies & case-control studieshypothesis specified prior to analysis

case-seriesno comparison groups

Documents

Causation. Associations may be due to Chance (random error) statistics are used to reduce it by appropriate design of the study statistics are used to