MEDINF 409 INTRODUCTION TO BIOSTATISTICS

NORTHWESTERN UNIVERSITY

MMI 409 EXAM 2

Himabindu Moram 11/9/14



Q1a. Calculate a 95 percent confidence interval for the mean data. Show your work. Explain, in plain English, what your results mean. (5 points)

A1a.
x̄ = 39.04
SD = 11.51
n = 83
SE = SD / sqrt(n) = 11.51 / sqrt(83) = 1.263
Margin of error for a 95% CI: m = 1.96 * SE = 2.48
95% confidence interval for the mean: μ = x̄ ± m = 39.04 ± 2.48, that is, 36.56 to 41.52

The 95% confidence interval for μ is 36.56 to 41.52. In plain English: based on this sample of 83 people with a mean of 39.04 hours, we can be 95% confident that the true population mean number of hours worked per week lies between 36.56 and 41.52. If the sampling were repeated many times, about 95% of the intervals built this way would contain the true mean.

Q1b. If we increased the sample size to 10,000 and used a level of confidence of 99 percent, how would this impact the confidence interval for the mean? Calculate the confidence interval (showing your work) and describe your results. (5 points)

A1b.
x̄ = 39.04
SD = 11.51
n = 10,000
SE = SD / sqrt(n) = 11.51 / sqrt(10,000) = 0.1151
Margin of error for a 99% CI: m = 2.576 * SE = 0.296
99% confidence interval for the mean: μ = x̄ ± m = 39.04 ± 0.296, that is, 38.74 to 39.34

Raising the confidence level from 95% to 99% widens the interval on its own, but increasing the sample size to 10,000 shrinks the standard error from 1.263 to 0.1151, so the net effect is a much narrower interval. With a sample of 10,000 and a mean of 39.04, we can be 99% confident that the true population mean lies between 38.74 and 39.34 hours a week.
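The two interval calculations above can be checked with a few lines of Python. This is only a sketch using the z multipliers quoted in the answers (1.96 and 2.576); the function name is made up for illustration, and rounding happens only at the final step, so the last digit can differ slightly from a hand calculation that rounds intermediate values.

```python
import math

def mean_confidence_interval(mean, sd, n, z):
    """Return (standard error, margin of error, lower, upper) for a z-based CI."""
    se = sd / math.sqrt(n)      # standard error of the mean
    margin = z * se             # margin of error
    return se, margin, mean - margin, mean + margin

# Q1a: n = 83, 95% confidence (z = 1.96)
se_a, m_a, lo_a, hi_a = mean_confidence_interval(39.04, 11.51, 83, 1.96)
print(f"95% CI, n=83:     SE={se_a:.4f}  m={m_a:.3f}  ({lo_a:.2f}, {hi_a:.2f})")

# Q1b: n = 10,000, 99% confidence (z = 2.576)
se_b, m_b, lo_b, hi_b = mean_confidence_interval(39.04, 11.51, 10_000, 2.576)
print(f"99% CI, n=10,000: SE={se_b:.4f}  m={m_b:.3f}  ({lo_b:.2f}, {hi_b:.2f})")

# The larger sample outweighs the higher confidence level: the interval narrows.
print("Width ratio (Q1b / Q1a):", round((hi_b - lo_b) / (hi_a - lo_a), 3))
```

The width ratio printed at the end shows how strongly the sample size dominates: the 99% interval at n = 10,000 is a small fraction of the width of the 95% interval at n = 83.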


Q2. Using the normal distribution, we know that ____ of all random sample means will fall within +/-1.96 standard errors of the true population mean. (5 points)
A) 68
B) 95
C) 99
D) 100

A2. B) 95

Q3. ____ is the probability associated with failing to reject the null hypothesis when it is false; whereas, ____ is the probability associated with rejecting the null hypothesis when it is true. (5 points)
A) type I error; type II error
B) type II error; type I error
C) p-value; alpha
D) alpha; p-value

A3. B) type II error; type I error

Q4. The 40-hour workweek is generally considered as a standard in American society today. Using data from the 2006 General Social Survey, you wish to determine whether the mean number of hours worked per week by men in the sample differs from the 40-hour standard.

Weekly Hours Worked        Men
Range                      60
Minimum value              5
Maximum value              65
Mean                       42.31
Variance                   100.00
Standard deviation         10.00
Sum                        1,227
Number of observations     29

Q4a. State the null hypothesis (H0) and alternative hypothesis (H1). (5 points)

A4a. The hypotheses concern the population mean and the 40-hour standard, not the sample mean.
Null hypothesis H0: μ = 40
Alternative hypothesis Ha: μ ≠ 40 (two-sided)


Q4b. Use some or all of the information provided with this question (4) to calculate the value of the single sample t-test, setting alpha at .05. (5 points)

A4b.
Test statistic: the estimated standard error of the mean is SE = s / sqrt(n) = 10 / sqrt(29) = 1.857.
t = (x̄ - μ0) / SE = (42.31 - 40) / 1.857 = 2.31 / 1.857 = 1.244, with df = n - 1 = 29 - 1 = 28.

P-value: from the t table, the one-sided P-value is between 0.10 and 0.15, and the two-sided P-value is between 0.20 and 0.30.
Using a software utility: one-sided P-value = 0.1121; two-sided P-value = 0.2242.

Significance level: the observed difference is not significant at alpha = 0.05, because P > alpha.

Q4c. Explain, in plain English, what your results mean. (5 points)

A4c. Because P > alpha (0.05), the data do not provide statistically significant evidence against the null hypothesis H0. The observed sample mean of 42.31 hours is therefore not significantly different from the 40-hour standard; a difference of this size could plausibly arise by chance in a sample of 29 men.
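The single-sample t calculation can be reproduced from the summary statistics alone. The sketch below assumes nothing beyond the numbers in the table for Q4; the function name is hypothetical, and the critical value 2.048 (two-sided, alpha = 0.05, df = 28) is copied from a standard t table rather than computed.

```python
import math

def one_sample_t(sample_mean, sd, n, mu0):
    """Return (standard error, t statistic, degrees of freedom)."""
    se = sd / math.sqrt(n)
    t = (sample_mean - mu0) / se
    return se, t, n - 1

# Men's weekly hours: mean 42.31, SD 10.00, n = 29, tested against 40 hours
se, t_stat, df = one_sample_t(42.31, 10.0, 29, 40.0)

# Two-sided critical value for alpha = 0.05 with df = 28 (from a t table)
T_CRIT_05_DF28 = 2.048
reject_h0 = abs(t_stat) > T_CRIT_05_DF28

print(f"SE = {se:.3f}, t = {t_stat:.3f}, df = {df}")
print("Reject H0 at alpha = 0.05?", reject_h0)
```

Comparing |t| with the tabled critical value gives the same decision as comparing the P-value with alpha, without needing a statistics library.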


Q5. Suppose you are interested in comparing the mean number of hours worked by gender.

Weekly Hours Worked        Women      Men
Range                      59         60
Minimum value              11         5
Maximum value              70         65
Mean                       37.28      42.31
Variance                   142.88     100.00
Standard deviation         11.95      10.00
Sum                        2,013      1,227
Number of observations     54         29

Q5a. State the null and alternative hypotheses. (5 points)

A5a. Let μ1 be the population mean weekly hours worked by women and μ2 the population mean weekly hours worked by men.

Null hypothesis H0: μ1 = μ2

Alternative hypothesis Ha: μ1 ≠ μ2 (two-sided)

Q5b. Use all or some of the information provided to calculate the value of the independent-samples t-statistic. (5 points)

A5b.

                      Women         Men
Sample mean           x̄1 = 37.28    x̄2 = 42.31
Standard deviation    s1 = 11.95    s2 = 10.00
Sample size           n1 = 54       n2 = 29


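Hypotheses: H0: μ1 = μ2 against Ha: μ1 ≠ μ2 (two-sided)

Test statistic: SE = sqrt(s1²/n1 + s2²/n2) = sqrt(142.88/54 + 100.00/29) = 2.469, so t = (x̄2 - x̄1) / SE = 5.03 / 2.469 = 2.04

P-value: using the conservative df = 28, the smaller of (n1 - 1) and (n2 - 1), the one-sided P-value is between 0.025 and 0.05 and the two-sided P-value is between 0.05 and 0.10.

Significance level: for the two-sided test, the result is significant at the alpha = 0.10 level but not at the alpha = 0.05 level.

Q5c. Explain, in plain English, your results. (5 points)

A5c. Men in the sample worked about 5 hours more per week than women on average. With the conservative degrees of freedom, the two-sided P-value falls between 0.05 and 0.10, so the data provide only marginal evidence that mean hours worked differ between men and women: the difference is significant at the 0.10 level but falls just short at the 0.05 level. Hypotheses about a difference in means for one variable (HOURS) in two groups (women and men) call for an independent-samples t test. This is a two-tailed hypothesis test because a sample difference in either direction could lead to rejecting the null hypothesis.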
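The independent-samples statistic can be recomputed from the two summary tables. This is a sketch that assumes the unpooled (separate-variance) standard error together with the conservative degrees of freedom described above; the function name is hypothetical, and the two critical values for df = 28 are taken from a standard t table. Because nothing is rounded until the end, the result can differ in the last digit from a hand calculation.

```python
import math

def two_sample_t(mean_a, var_a, n_a, mean_b, var_b, n_b):
    """Unpooled two-sample t statistic with conservative degrees of freedom."""
    se = math.sqrt(var_a / n_a + var_b / n_b)   # SE of the difference in means
    t = (mean_a - mean_b) / se
    df_conservative = min(n_a, n_b) - 1         # smaller of (n_a - 1), (n_b - 1)
    return se, t, df_conservative

# Men (mean 42.31, variance 100.00, n = 29) versus women (37.28, 142.88, 54)
se, t_stat, df = two_sample_t(42.31, 100.00, 29, 37.28, 142.88, 54)

# Two-sided critical values for df = 28, from a t table
T_CRIT_10 = 1.701   # alpha = 0.10
T_CRIT_05 = 2.048   # alpha = 0.05

print(f"SE = {se:.3f}, t = {t_stat:.3f}, df = {df}")
print("Significant at alpha = 0.10?", abs(t_stat) > T_CRIT_10)
print("Significant at alpha = 0.05?", abs(t_stat) > T_CRIT_05)
```

The statistic lands between the two critical values, which is the same information as a two-sided P-value between 0.05 and 0.10.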


Q5d. For ANOVA, why is F statistic also called F ratio? (5 points)
A) it is the ratio of one group mean to another
B) it is the ratio of the mean square between to the mean square within
C) it is the ratio of the within group sum of squares to the between group sum of squares
D) it is the ratio of degrees of freedom for the within group sum of squares to the degrees of freedom for the between group sum of squares

Q6. Use the data file ALCOHOL.SAV (attached to the exam assignment) to conduct an ANOVA examining difference in alcohol use by age.

Q6a. State your null and alternative hypotheses. (5 points)

A6a. Let μ1, μ2, ..., μk be the population mean alcohol use in each of the k age groups.
Null hypothesis H0: μ1 = μ2 = ... = μk (mean alcohol use is the same in every age group)
Alternative hypothesis Ha: at least one age-group mean differs from the others

Q6b. Attach your SPSS output. (5 points)

A6b. [SPSS output: Descriptives table, ANOVA table, Means Plot]


Q6c. Interpret your SPSS output. (5 points)

A6c. The Descriptives table above provides useful descriptive statistics, including the mean, standard deviation, and 95% confidence intervals for the dependent variable “Alcohol Use”, reported for each age group separately and for all age groups combined. The ANOVA table shows the result of the analysis of variance and whether the group means differ significantly. The significance level is 0.004 (p = .004), which is below 0.05, so we reject the null hypothesis of equal means: mean alcohol use differs significantly across age groups. The Means Plot illustrates the mean differences.
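The F statistic reported in an ANOVA table is the mean square between groups divided by the mean square within groups. A minimal sketch of that calculation follows; the three groups are made-up numbers chosen for easy arithmetic, not values from ALCOHOL.SAV, and the function name is hypothetical.

```python
def one_way_anova(groups):
    """Return (MS between, MS within, F ratio) for a list of groups."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(g) / len(g) for g in groups]

    # Between-group variation: group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Within-group variation: observations around their own group mean
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return ms_between, ms_within, ms_between / ms_within

# Hypothetical data for three groups (illustration only)
groups = [[1, 2, 3], [2, 3, 4], [4, 5, 6]]
ms_b, ms_w, f_ratio = one_way_anova(groups)
print(f"MS between = {ms_b:.2f}, MS within = {ms_w:.2f}, F = {f_ratio:.2f}")
```

A large F means the group means vary much more than would be expected from the spread inside the groups, which is what a small significance value in the ANOVA table reflects.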


Q6d. Run two more ANOVAs – one for males (examining age and alcohol use) and one for females (examining age and alcohol use). What can you conclude? (10 points)

A6d. In the ANOVA output below (alcohol use by AGE), Male is coded 1 and Female is coded 0. There is a significant difference in alcohol use between men and women.

The ANOVA table is the most important part of the output because it tells us whether any of the independent variables have had an effect. The important things to look at in the table are the significance values of the independent variables. The first thing to notice is that there is a significant effect of alcohol (because the significance value is less than .05). The F ratio is highly significant, indicating an effect of the amount of alcohol consumed. This effect means that overall, when we ignore whether the participant was male or female, the amount of alcohol influenced their mate selection. The best way to see what this means is to look at the bar chart plotting the average mark for each level of alcohol (ignoring gender).


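Q7. Select the best response: Maximum power of a statistical test is achieved when the sample size for group 1 is _____________ the sample size for group 2. (5 points)

A) Greater than

B) Less than

C) Equal to

A7. C) Equal to. For a fixed total number of observations, the standard error of the difference between two group means is smallest when the groups are the same size, so a balanced design gives the most power.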
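For a fixed total number of subjects, the standard error of the difference between two means is smallest when the two groups are the same size, and a smaller standard error means more power. The sketch below illustrates this under the assumption that both groups share the same standard deviation; the total of 100 subjects and the SD of 10 are made-up values, and the function name is hypothetical.

```python
import math

def se_of_difference(sd, n1, n2):
    """Standard error of the difference in means, assuming a common SD."""
    return sd * math.sqrt(1 / n1 + 1 / n2)

TOTAL_N = 100   # hypothetical total sample size
SD = 10.0       # hypothetical common standard deviation

# Try several ways of splitting the same total between the two groups
splits = [(n1, TOTAL_N - n1) for n1 in (50, 60, 70, 80, 90)]
ses = {split: se_of_difference(SD, *split) for split in splits}

for (n1, n2), se in ses.items():
    print(f"n1 = {n1:2d}, n2 = {n2:2d}: SE = {se:.3f}")

best_split = min(ses, key=ses.get)
print("Smallest SE (most power) at split:", best_split)
```

The standard error grows steadily as the split becomes more lopsided, so the same true difference produces a smaller t statistic and is harder to detect.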