Basics of Biostatistics for Health Research
Session 4 – February 28, 2013
Dr. Scott Patten, Professor of Epidemiology
Department of Community Health Sciences & Department of Psychiatry
Generate Commands Using Logic
generate obese2 = .
recode obese2 .=0 if bmi <= 30
recode obese2 .=1 if bmi > 30
tab obese obese2
prtest obese2, by(sex)
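Missing BMI values get coded as obese, which is strange: Stata treats a missing value as larger than any number, so bmi > 30 is true when bmi is missing.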
Missing Values and Logical Operators
• http://www.stata.com/support/faqs/data-management/logical-expressions-and-missing-values/
Generate Commands Using Logic
generate obese2 = .
recode obese2 .=0 if bmi <= 30
recode obese2 .=1 if bmi > 30 & bmi !=.
tab obese obese2, missing
prtest obese2, by(sex)
This code works.
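The missing-value behaviour can be seen on a tiny example. This is a minimal sketch using three made-up BMI values (not the course dataset); the variable names obese_bad and obese_ok are hypothetical and chosen only for illustration.

```stata
* Toy data (made up): one normal BMI, one high BMI, one missing BMI
clear
input bmi
22
31
.
end

* Unsafe: a missing BMI is treated as larger than 30, so it is coded 1
generate obese_bad = bmi > 30

* Safe: evaluate the comparison only where BMI is not missing
generate obese_ok = bmi > 30 if !missing(bmi)

list
```

The one-line form with `if !missing(bmi)` should give the same coding as the generate/recode steps on the slide, with missing BMI left as missing.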
Statistical Errors
P (exposed) 0.3
P (non-exposed) 0.1
Alt Hypoth. 0.2 (diff. between 2 prop.)
N (exposed) 30
N (non-exposed) 30 (set equal to exposed)
Alpha 0.05
Power 0.5095
[Figure: sampling distributions of the difference between two proportions under the null hypothesis and the alternative hypothesis, with a reject indicator; horizontal axis from -0.5 to 0.5.]
[Figure: power shown on a scale from 0 to 1, with simulation controls: Increase Sample Size, Increase Effect Size, Increase Alpha, Reset.]
Sample Size Simulation
Sample Size Calculation in STATA
Sample Size Dialogue Boxes
Let’s do a calculation!
• You are planning a parallel group RCT – with treatment and control groups.
• Normally, 20% of people die with disease X, but you expect to cut this in half with a new treatment.
• How many do you need in each group to achieve 95% power at alpha = 5%?
Output (sampsi)
. sampsi .2 .1, alpha(0.05) power(.95)

Estimated sample size for two-sample comparison of proportions

Test Ho: p1 = p2, where p1 is the proportion in population 1
                    and p2 is the proportion in population 2
Assumptions:

         alpha =   0.0500  (two-sided)
         power =   0.9500
            p1 =   0.2000
            p2 =   0.1000
         n2/n1 =   1.00

Estimated required sample sizes:

            n1 =      349
            n2 =      349
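By default sampsi applies a continuity correction when comparing proportions, which is part of why the answer is 349 per group. As a hedged sketch for comparison, the same scenario can be run without the correction, and with the newer power command. The power command is an assumption about your installation: it is only available in Stata 13 and later.

```stata
* Newer syntax (assumes Stata 13 or later): control-group proportion first
power twoproportions 0.2 0.1, alpha(0.05) power(0.95)

* Same scenario with sampsi, but without the continuity correction
sampsi .2 .1, alpha(0.05) power(.95) nocontinuity
```

Both calls should report a somewhat smaller sample size than 349 per group, because neither applies the continuity correction by default.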
Another Calculation
• A QoL scale in a particular disease has a mean score of 20 and a standard deviation of 5.
• You are conducting a placebo controlled trial to evaluate a treatment that is expected to improve the QoL by 2 points on this scale.
• You recruit n=50 into each group – what power will you achieve?
Output (sampsi)
Estimated power for two-sample comparison of means

Test Ho: m1 = m2, where m1 is the mean in population 1
                    and m2 is the mean in population 2
Assumptions:

         alpha =   0.0500  (two-sided)
            m1 =       20
            m2 =       22
           sd1 =        5
           sd2 =        5
sample size n1 =       50
            n2 =       50
         n2/n1 =     1.00

Estimated power:

         power =   0.5160
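The slide shows the output but not the command that produced it. A call along the following lines should reproduce it; this is a reconstruction from the assumptions listed in the output, not the original command.

```stata
* Means of 20 and 22, SD of 5 in both groups, 50 per group:
* giving n1() and n2() instead of power() makes sampsi report power
sampsi 20 22, sd1(5) sd2(5) n1(50) n2(50) alpha(0.05)
```

A quick check by hand agrees: the standard error of the difference is sqrt(25/50 + 25/50) = 1, so the power is approximately the normal probability below 2/1 - 1.96 = 0.04, which is about 0.516.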
• Go to www.ucalgary.ca/~patten
• Scroll to the bottom.
• Right click to download the files described as being “for PGME Students”
  – One is a dataset
  – One is a data dictionary
• Save them on your desktop
Review: Comparing Proportions
• We’ve looked at several procedures for comparing proportions (e.g. for obesity in men vs. women):
generate obese = .
recode obese .=0 if bmi <= 30
recode obese .=1 if bmi > 30 & bmi !=.
tab obese obese, missing
prtest obese, by(sex)
Epitab Commands
Review: Comparing Proportions
• We’ve looked at several procedures for comparing proportions (e.g. for obesity in men vs. women):
recode sex (2=1) (1=0)
cs obese sex
The output…
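. cs obese sex

                 | sex                    |
                 |   Exposed   Unexposed  |      Total
-----------------+------------------------+------------
           Cases |       961         599  |       1560
        Noncases |      5610        4405  |      10015
-----------------+------------------------+------------
           Total |      6571        5004  |      11575
                 |                        |
            Risk |  .1462487    .1197042  |   .1347732
                 |                        |
                 |      Point estimate    |    [95% Conf. Interval]
                 |------------------------+------------------------
 Risk difference |         .0265444       |    .0141393    .0389496
      Risk ratio |          1.22175       |    1.110833    1.343743
 Attr. frac. ex. |          .181502       |    .0997744      .25581
 Attr. frac. pop |         .1118099       |
                 +-------------------------------------------------
                               chi2(1) =    17.16  Pr>chi2 = 0.0000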
A “non-significant” association
generate highgluc = .
recode highgluc .=0 if glucose <= 140
recode highgluc .=1 if glucose > 140 & glucose !=.
generate female=sex
recode female (1=0) (2=1)
tab highgluc female, exact
How does this look with cs?
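. cs highgluc female

                 | female                 |
                 |   Exposed   Unexposed  |      Total
-----------------+------------------------+------------
           Cases |       108         110  |        218
        Noncases |      5574        4395  |       9969
-----------------+------------------------+------------
           Total |      5682        4505  |      10187
                 |                        |
            Risk |  .0190074    .0244173  |   .0213998
                 |                        |
                 |      Point estimate    |    [95% Conf. Interval]
                 |------------------------+------------------------
 Risk difference |        -.0054099       |   -.0111474    .0003276
      Risk ratio |         .7784391       |    .5986537    1.012217
 Prev. frac. ex. |         .2215609       |   -.0122169    .4013463
 Prev. frac. pop |           .12358       |
                 +-------------------------------------------------
                               chi2(1) =     3.51  Pr>chi2 = 0.0609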
Review: Try the cci command to obtain the OR
Check your work with the cc command.
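One way to approach this exercise, sketched under the assumption that the cell counts are read from the cs highgluc female output: cci is the immediate form of cc and takes the four cells in the order exposed cases, unexposed cases, exposed noncases, unexposed noncases.

```stata
* Immediate command: cell counts typed in from the cs output
* (exposed cases, unexposed cases, exposed noncases, unexposed noncases)
cci 108 110 5574 4395

* With the course dataset loaded and highgluc and female created as on
* the earlier slide, the data-based command should agree:
* cc highgluc female
```

By hand, the odds ratio is (108 × 4395) / (110 × 5574), or about 0.77, slightly further from 1 than the risk ratio of 0.78.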
Comparing Proportions?
• Yes: Fisher’s Exact Test
• No: Parametric Assumptions?
  – Yes: Multiple Groups?
    Yes: ANOVA
    No: t-test
  – No: Multiple Groups?
    Yes: Kruskal-Wallis
    No: Wilcoxon rank-sum
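Each branch of this decision tree corresponds to a Stata command. The sketch below uses the auto dataset that ships with Stata as a stand-in (it is not the course dataset), and the variable heavy is a hypothetical binary variable created only for illustration.

```stata
* Stand-in data: the auto dataset shipped with Stata (not the course data)
sysuse auto, clear
generate byte heavy = weight > 3000 if !missing(weight)

* Comparing proportions: Fisher's exact test
tabulate heavy foreign, exact

* Parametric assumptions met, two groups: t-test
ttest mpg, by(foreign)

* Parametric assumptions met, multiple groups: one-way ANOVA
oneway mpg rep78

* Parametric assumptions not met, two groups: Wilcoxon rank-sum
ranksum mpg, by(foreign)

* Parametric assumptions not met, multiple groups: Kruskal-Wallis
kwallis mpg, by(rep78)
```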
Two situations we haven’t covered…
• Severely skewed distributions
• Two continuous variables
Severely Skewed Variables
Solution: Make Some Categories
• For example (cigarettes per day):
  – Non-smokers
  – Light smokers (<20)
  – Moderate 20-40
  – Heavy > 40
• Your task: Make a variable with these categories and do a statistical test to compare men to women.
E.g. for the recoding…
generate smoke = .
recode smoke .=1 if cigpday==0
recode smoke .=2 if cigpday > 0 & cigpday < 20
recode smoke .=3 if cigpday >=20 & cigpday <= 40
recode smoke .=4 if cigpday > 40 & cigpday !=.
tab smoke, missing
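The same categories can also be built in a single recode call with the generate() option. This is an alternative sketch, not the approach shown on the slide. It assumes cigpday holds whole numbers (a non-integer such as 19.5 would not match any range and would be copied unchanged), and the toy data, the new variable smoke2 and the label name smokelbl are all made up for illustration.

```stata
* Toy data (made up) covering each category boundary plus a missing value
clear
input cigpday
0
5
19
20
40
41
60
.
end

* One-step recode; values not matched by any rule (here, missing) are kept
recode cigpday (0=1) (1/19=2) (20/40=3) (41/max=4), generate(smoke2)

* Optional value labels so the table is easier to read
label define smokelbl 1 "Non-smoker" 2 "Light" 3 "Moderate" 4 "Heavy"
label values smoke2 smokelbl
tab smoke2, missing
```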
Some output…
. tab smoke sex, exact

Enumerating sample-space combinations:
stage 4:  enumerations = 1
stage 3:  enumerations = 146
stage 2:  enumerations = 142603
stage 1:  enumerations = 0

           |          sex
     smoke |         1          2 |     Total
-----------+----------------------+----------
         1 |     2,428      4,170 |     6,598
         2 |       686      1,292 |     1,978
         3 |     1,754      1,073 |     2,827
         4 |       122         23 |       145
-----------+----------------------+----------
     Total |     4,990      6,558 |    11,548

           Fisher's exact =                 0.000
Two continuous variables
• E.g. diastolic blood pressure and BMI
• The place to start is always a scatter plot
• STATA calls this a “two way” graph
Start with Create
Select the two variables
Submit
The command produced…
• Produced by our dialogue box…
twoway (scatter diabp sysbp)
• The same dialogue box can fit a line…
twoway (lfit diabp sysbp)
This time select “line”
You can combine the two..
• Try it!
twoway (scatter diabp sysbp) (lfit diabp sysbp)
• To assess significance, use the regress command (can you find the menu option?)
regress diabp sysbp
Note: the linear output
• Line: y = mx + b
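• diabp = 33.42 + 0.364(sysbp)

. regress diabp sysbp

      Source |       SS       df       MS              Number of obs =   11627
-------------+------------------------------           F(  1, 11625) =11928.05
       Model |  800498.474     1  800498.474           Prob > F      =  0.0000
    Residual |  780160.451 11625  67.1105764           R-squared     =  0.5064
-------------+------------------------------           Adj R-squared =  0.5064
       Total |  1580658.92 11626  135.958965           Root MSE      =  8.1921

------------------------------------------------------------------------------
       diabp |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       sysbp |   .3639623   .0033325   109.22   0.000     .3574301    .3704946
       _cons |   33.42091   .4606105    72.56   0.000     32.51804    34.32379
------------------------------------------------------------------------------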
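The fitted line can be used to predict diastolic pressure at a given systolic pressure. A minimal sketch, with the coefficients typed in from the regress output; the scalar names b0 and b1 are hypothetical.

```stata
* Coefficients copied from the regress output (_cons and sysbp)
scalar b0 = 33.42091
scalar b1 = .3639623

* Predicted diastolic pressure at two systolic values
display "Predicted diabp at sysbp = 120: " b0 + b1*120
display "Predicted diabp at sysbp = 160: " b0 + b1*160
```

These work out to about 77 and 92 mm Hg. Immediately after running regress on the dataset, the same coefficients are available as _b[_cons] and _b[sysbp], so they do not need to be typed in.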
(In Class) Assignment for Today
• Assess whether there is an association between systolic blood pressure and death (you need to decide how)
• We’ll define elevated systolic blood pressure as being > 140 mm of Hg.
  – What is the risk ratio for death for people with elevated systolic blood pressure?
  – Is the risk ratio statistically significant?