Type I and II errors
Ana Jerončić
P value is short for probability value.
P = 0.07 = 7%: if there is no real effect, there is a 7% probability that we will encounter such a difference, or a more extreme one, by chance alone.
OR: when no real effect exists, if we repeat the experiment 100 times, such a difference (or a more extreme one) would be expected in about 7 experiments.
What is a p value?
P value is short for probability value.
P = 0.99 = 99%: if there is no real effect, there is a 99% probability that we will encounter such a difference, or an even more extreme one, by chance alone.
OR: when no real effect exists, if we repeat the experiment 100 times, such a difference (or a more extreme one) would be expected in about 99 experiments.
What is a p value?
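The "repeat the experiment many times" reading of a p value can be checked with a small simulation. The sketch below is only an illustration under stated assumptions: the test statistic is assumed to follow a standard normal distribution when no real effect exists, and the observed value `z_obs = 1.812` is a hypothetical number chosen so that the two-sided p value is close to 0.07.

```python
import math
import random


def two_sided_p(z):
    """Two-sided p value for a standard normal test statistic z."""
    return math.erfc(abs(z) / math.sqrt(2))


def fraction_as_extreme(z_observed, n_experiments=100_000, seed=1):
    """Repeat a no-effect experiment many times and return the fraction of
    repetitions whose statistic is at least as extreme as the observed one."""
    rng = random.Random(seed)
    hits = sum(
        abs(rng.gauss(0.0, 1.0)) >= abs(z_observed)
        for _ in range(n_experiments)
    )
    return hits / n_experiments


z_obs = 1.812  # hypothetical observed statistic (assumption, not from the slides)
p = two_sided_p(z_obs)
freq = fraction_as_extreme(z_obs)
print(f"p = {p:.3f}, simulated frequency under no effect = {freq:.3f}")
```

The simulated frequency should sit close to the computed p value, which is the sense in which "7 out of 100 experiments" is meant.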
What is a significance level α?
Interpretation of P-value (α = 0.05)
P < 0.05 (5%)
Significant difference between the treatments. Null hypothesis is rejected, alternative is accepted.
P ≥ 0.05
No significant difference between the treatments (the observed difference may have happened by chance). Null hypothesis is not rejected.
The P-value threshold that determines when to reject the null hypothesis.
It is the chance that you are willing to take of being wrong, i.e. of concluding that there is a real difference when there is none.
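The threshold rule can be written as a one-line decision function. This is a minimal sketch; the function name `decide` and the returned strings are illustrative choices, not part of the slides.

```python
def decide(p_value, alpha=0.05):
    """Compare a p value with the significance level alpha."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a p value must lie between 0 and 1")
    # Reject only when the p value falls below the chosen threshold.
    return "reject H0" if p_value < alpha else "do not reject H0"


for p in (0.01, 0.07, 0.99):
    print(p, decide(p))
```

Note that the same p value can lead to different decisions under different thresholds: `decide(0.07, alpha=0.10)` rejects, while `decide(0.07)` does not.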
What is a significance level α?
The most common significance level: α=0.05=5%
We accept a 5% risk of wrongly rejecting a null hypothesis that is actually true.
What is a significance level α?
α = 0.05: out of 40 decisions made when the null hypothesis is true, we could expect 2 to be wrong.
α is also called the Type I error: the probability of erroneously rejecting the null hypothesis.
Consequence of a Type I error: putting a useless medicine on the market!
What is α (Type I error)?
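The "2 out of 40" figure is simply the number of decisions multiplied by α. The sketch below also shows how quickly the chance of at least one Type I error grows; it assumes that every null hypothesis is true and that the decisions are independent, which is an assumption added here for illustration.

```python
def expected_false_positives(n_decisions, alpha):
    """Expected number of wrong rejections when every null hypothesis is true."""
    return n_decisions * alpha


def prob_at_least_one_false_positive(n_decisions, alpha):
    """Chance of one or more wrong rejections, assuming independent decisions."""
    return 1.0 - (1.0 - alpha) ** n_decisions


expected = expected_false_positives(40, 0.05)
at_least_one = prob_at_least_one_false_positive(40, 0.05)
print(f"expected wrong decisions: {expected:.1f}")
print(f"chance of at least one wrong decision: {at_least_one:.2f}")
```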
Watch out for…
p
The sample size calculation was based on the primary outcome, BMI or BMI z-score, which was assumed to have a SD of 1.5, or 1.0 respectively. To have 80% power to detect a difference in mean BMI of 0.38, or mean BMI z-score of 0.25 units between the groups at age 2 at the two-sided 5% significance level, we needed a sample size of 252 per group.
Example from the literature: "Effectiveness of a home-based early intervention on children’s BMI at age two years: randomised controlled trial." BMJ 2012;344:e3732
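The numbers quoted in this example can be roughly checked with the usual normal-approximation formula for comparing two means, n per group = 2 (z₁₋α/₂ + z₁₋β)² (SD / difference)². This is a sketch of one common textbook formula; the paper does not say which formula or software its authors used, so the function `n_per_group` is an assumption for illustration only.

```python
import math
from statistics import NormalDist


def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sided comparison of two means."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # quantile for the two-sided test
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired power
    return math.ceil(2 * (z_alpha + z_beta) ** 2 * (sd / delta) ** 2)


n_zscore = n_per_group(delta=0.25, sd=1.0)  # BMI z-score inputs from the quote
n_bmi = n_per_group(delta=0.38, sd=1.5)     # BMI inputs from the quote
print(n_zscore, n_bmi)
```

With the BMI z-score inputs the approximation gives 252 per group, the figure quoted above; with the BMI inputs it gives a slightly smaller number, so the larger of the two covers both outcomes.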
…. The higher-degree RR was deemed significantly better if the P-value for the higher-degree model was 0.01.
…..
Example from the literature: Quantitative Trait Locus Analysis of Longitudinal Quantitative Trait Data in Complex Pedigrees. Macgregor, S, Knott, S et al. Genetics 171, 1365-1376, 2005
Hippocampal gray matter volume change was assessed statistically using a two-tailed t contrast with a significance level set to 0.05 (corrected for multiple comparisons within the ROI). Uncorrected exploratory full-brain statistics were also performed with two-tailed t contrasts at a significance level set to 0.001.
Example: The Brain-Derived Neurotrophic Factor val66met Polymorphism and Variation in Human Cortical Morphology. Lukas Pezawas, Beth A. Verchinski, et al.
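To see why a correction for multiple comparisons matters, the sketch below uses the Bonferroni rule (divide α by the number of comparisons). The quoted passage does not say which correction the authors applied, so Bonferroni is swapped in here purely as the simplest illustration, and the number of comparisons `m = 20` is hypothetical. Independence of the comparisons is also assumed.

```python
def bonferroni_threshold(alpha, n_comparisons):
    """Per-comparison significance level under the Bonferroni rule."""
    if n_comparisons < 1:
        raise ValueError("need at least one comparison")
    return alpha / n_comparisons


def familywise_error_rate(per_test_alpha, n_comparisons):
    """Chance of at least one Type I error across independent comparisons."""
    return 1.0 - (1.0 - per_test_alpha) ** n_comparisons


m = 20  # hypothetical number of comparisons (assumption)
uncorrected = familywise_error_rate(0.05, m)
corrected = familywise_error_rate(bonferroni_threshold(0.05, m), m)
print(f"uncorrected: {uncorrected:.2f}, Bonferroni-corrected: {corrected:.3f}")
```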
The probability of erroneously failing to reject the null hypothesis.
The most common choice: β = 0.2
Consequence of a Type II error: keeping a good medicine away from patients!
What is β (Type II error)?
Power quantifies the ability of the study to find true differences.
Power = 1 − β = P(accept H1 given H1 is true): the probability of correctly identifying H1
(correctly identifying a better medicine)
If β = 0.2, power = 0.8 = 80%
What is power?
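Power can be estimated by simulating many trials in which a true difference exists and counting how often the null hypothesis is rejected. The sketch below assumes normally distributed outcomes with a known SD and a two-sided z test; the difference, SD and group size are hypothetical values chosen so that the power comes out near 80%.

```python
import math
import random
from statistics import NormalDist


def simulated_power(delta, sd, n, alpha=0.05, n_trials=2000, seed=7):
    """Fraction of simulated trials (true difference = delta) that reject H0."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se = sd * math.sqrt(2 / n)  # standard error of the difference in means
    rejections = 0
    for _ in range(n_trials):
        control = [rng.gauss(0.0, sd) for _ in range(n)]
        treated = [rng.gauss(delta, sd) for _ in range(n)]
        z = (sum(treated) / n - sum(control) / n) / se
        if abs(z) > z_crit:
            rejections += 1
    return rejections / n_trials


power = simulated_power(delta=0.5, sd=1.0, n=64)  # hypothetical trial design
beta = 1 - power  # estimated Type II error rate
print(f"estimated power = {power:.2f}, estimated beta = {beta:.2f}")
```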
Example
Studies of drug X have shown that its use induces very serious side effects. Therefore drug X was withdrawn from the market.
A new alternative, drug Y, was examined, and a reduction in harmful effects compared with drug X was observed.
What significance level would you use to evaluate the reduction in harmful effects of drug Y compared with drug X?
Example
The effect of alcohol on drivers’ reaction time was investigated in a simple random sample. Reaction times observed before and after alcohol intake showed an increase in average reaction time after alcohol intake.
What significance level would you use to evaluate the increase in reaction time?
The choice of α and β depends on:
1. the medical and practical consequences of the two kinds of errors
2. the desired impact of the results
α < β (the most common approach, α = 0.05 and β = 0.2), i.e. if the control treatment is already widely used and is known to be reasonably safe and effective, whereas the test treatment is new, costly, or produces serious side effects.
α > β, i.e. if there is no established control treatment and the test treatment is relatively inexpensive, easy to apply and not known to have any serious side effects.
The choice of α and β
Choices other than α = 0.05 and β = 0.2:
α = 0.10 and β = 0.2 for preliminary trials that are likely to be replicated.
α = 0.01 and β = 0.05 for trials that are unlikely to be replicated.
The choice of α and β
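Stricter choices of α and β cost patients. The sketch below compares the three combinations using the normal-approximation formula for two means; the standardised difference (δ = 0.5 with SD = 1) is a hypothetical value picked only to make the comparison concrete.

```python
import math
from statistics import NormalDist


def n_per_group(alpha, beta, delta=0.5, sd=1.0):
    """Approximate per-group sample size for a two-sided comparison of two means."""
    z = NormalDist().inv_cdf
    return math.ceil(2 * (z(1 - alpha / 2) + z(1 - beta)) ** 2 * (sd / delta) ** 2)


choices = {
    "preliminary trial": (0.10, 0.20),
    "most common": (0.05, 0.20),
    "unlikely to be replicated": (0.01, 0.05),
}
sizes = {name: n_per_group(a, b) for name, (a, b) in choices.items()}
for name, n in sizes.items():
    print(f"{name}: {n} per group")
```

Under these assumptions the strictest combination needs more than twice as many participants per group as the most common one.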
Nuvelo Pharmaceuticals, a company that was developing a clot-busting product for the indication of occluded central venous catheters, was sued by its investors for setting an extraordinarily small significance level, α = 0.00125.
http://onbiostatistics.blogspot.com/2010/01/significant-level-of-000125.html
Significance level at court
Power calculation
Power quantifies the ability of the study to find true differences.
Power = 1 − β = P(accept H1 given H1 is true):
the probability of correctly identifying H1
(correctly identifying a better medicine)
If β=0.2, power=0.8=80%
What is the power of the study?
δ is the minimum difference between groups that is judged to be clinically important:
1. the minimal effect that has clinical relevance in the management of patients
or
2. the anticipated effect of the new treatment
What is delta (δ)?
Power depends on 4 elements:
The real difference between the two medicines, δ: big δ → big power
The variation among individuals, σ: small σ → big power
The sample size, n: large n → big power
Type I error, α: large α → big power
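The four dependencies can be seen by changing one input at a time in a power formula. The sketch below uses the normal approximation for a two-sided comparison of two means with equal group sizes; the baseline values are hypothetical and the function name `power_two_groups` is an illustrative choice.

```python
import math
from statistics import NormalDist


def power_two_groups(delta, sd, n, alpha=0.05):
    """Approximate power of a two-sided z test comparing two group means."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)
    shift = abs(delta) / (sd * math.sqrt(2 / n))  # true difference in SE units
    # Probability that the statistic lands in either rejection region.
    return nd.cdf(shift - z_crit) + nd.cdf(-shift - z_crit)


base = power_two_groups(delta=0.5, sd=1.0, n=50)             # hypothetical baseline
bigger_delta = power_two_groups(delta=0.7, sd=1.0, n=50)     # larger real difference
smaller_sd = power_two_groups(delta=0.5, sd=0.8, n=50)       # less variation
larger_n = power_two_groups(delta=0.5, sd=1.0, n=100)        # more participants
larger_alpha = power_two_groups(delta=0.5, sd=1.0, n=50, alpha=0.10)

for label, value in [("baseline", base), ("bigger delta", bigger_delta),
                     ("smaller sd", smaller_sd), ("larger n", larger_n),
                     ("larger alpha", larger_alpha)]:
    print(f"{label}: power = {value:.2f}")
```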
Power calculation (assuming we compare two medicines)
Sample size N depends on α, the power 1 − β, δ and σ.
“How large a sample do I need?”
- Very commonly asked
- Important question
- Answer not so simple
Statistical power calculations
- Use statistical software or a graphical method
- Depends on data type
Sample Size
Braga L, Byrne R, Lorenzo A et al. Methodological quality assessment of RCTs in hypospadias literature. 23rd Annual ESPU Congress - Zurich, Switzerland - 2012
Analyses showed that publication after 2006 (p<0.01), RCT sample size >50 (p=0.03), significance level α=0.01 (p<0.01) and blinding of outcome assessor (p<0.01) were significantly associated with better quality of RCTs.
Interpret the results
Hypospadias is a birth defect of the urethra in males
Weir R. Randomised controlled trial to meta-analysis ratio: a reply from a group producing systematic reviews. 2007. The New Zealand Medical Journal 120, 1-3
Antman et al showed that recommendations for routine use of thrombolytic therapy first appeared in 1987, 14 years after a statistically significant reduction in mortality was apparent on a subsequent cumulative meta-analysis of all relevant RCTs.
At the first time a significant reduction in mortality was apparent in the cumulative meta-analysis of IV streptokinase therapy (1973, p=0.01), 2432 patients had been randomised in eight small trials. The results of a further 25 studies (34,542 additional patients) published before routine recommendation of thrombolytic therapy, reduced the significance level to p=0.001 in 1979 and p=0.0001 in 1986.
Interpret the results
Based on the results presented in the abstract, write down the conclusion section.
CONCLUSION: Overall advice to use steam inhalation, or ibuprofen rather than paracetamol, does not help control symptoms in patients with acute respiratory tract infections and must be balanced against the possible progression of symptoms during the next month for a minority of patients. Advice to use ibuprofen might help short term control of symptoms in those with chest infections and in children.
Little P, Moore M, et al. Ibuprofen, paracetamol, and steam for patients with respiratory tract infections in primary care: pragmatic randomised factorial trial. BMJ 2013 Oct 25;347:f6041
CONCLUSION: Our findings suggest the presence of heterogeneity in the associations between individual fruit consumption and risk of type 2 diabetes. Greater consumption of specific whole fruits, particularly blueberries, grapes, and apples, is significantly associated with a lower risk of type 2 diabetes, whereas greater consumption of fruit juice is associated with a higher risk.
Muraki I, Imamura F, et al. Fruit consumption and risk of type 2 diabetes: results from three prospective longitudinal cohort studies. BMJ. 2013 Aug 28;347:f5001
Conclusions Although limited in quantity, existing randomised trial evidence on exercise interventions suggests that exercise and many drug interventions are often potentially similar in terms of their mortality benefits in the secondary prevention of coronary heart disease, rehabilitation after stroke, treatment of heart failure, and prevention of diabetes.
Huseyin Naci, John P A Ioannidis et al. Comparative effectiveness of exercise and drug interventions on mortality outcomes: metaepidemiological study. BMJ 2013; 347
Sanjay Basu et al. Palm oil taxes and cardiovascular disease mortality in India: economic-epidemiologic model, BMJ. 2013 Oct 22;347;
Conclusions Curtailing palm oil intake through taxation may modestly reduce hyperlipidemia and cardiovascular mortality, but with potential distributional consequences differentially benefiting male and urban populations, as well as affecting food security.