48
Statistics: Unlocking the Power of Data STAT 101 Dr. Kari Lock Morgan 10/25/12 Sections 6.4-6.6, 6.10-6.13 Single Mean t-distribution (6.4) Intervals and tests (6.5, 6.6) Difference in means Distribution, intervals and tests (6.10-6.12) Matched pairs (6.13) Intervals and tests (6.8, 6.9) Correlation Inference for Means: t-Distribution

STAT 101 Dr. Kari Lock Morgan 10/25/12

  • Upload
    kinsey

  • View
    31

  • Download
    2

Embed Size (px)

DESCRIPTION

STAT 101 Dr. Kari Lock Morgan 10/25/12. Inference for Means: t-Distribution. Sections 6.4-6.6, 6.10-6.13 Single Mean t-distribution (6.4) Intervals and tests (6.5, 6.6) Difference in means Distribution, intervals and tests (6.10-6.12) Matched pairs (6.13) - PowerPoint PPT Presentation

Citation preview

Page 1: STAT 101 Dr. Kari Lock Morgan 10/25/12

Statistics: Unlocking the Power of Data Lock5

STAT 101

Dr. Kari Lock Morgan

10/25/12

S e c t i o n s 6 . 4 - 6 . 6 , 6 . 1 0 - 6 . 1 3• Single Mean• t-distribution (6.4)• Intervals and tests (6.5, 6.6)

• Difference in means • Distribution, intervals and tests (6.10-6.12)• Matched pairs (6.13)• Intervals and tests (6.8, 6.9)

• Correlation

Inference for Means: t-Distribution

Page 2: STAT 101 Dr. Kari Lock Morgan 10/25/12

Statistics: Unlocking the Power of Data Lock5

Comments• Regrade requests due today. Hand them to me

in person or slide them under my office door.

• My Monday office hours are being permanently switched to Tuesday, 1 – 2:30. This change will be updated on the website.

• Both Heather and Sam have office hours Monday

• If you can’t make my office hours, I am always available to talk after class.

Page 3: STAT 101 Dr. Kari Lock Morgan 10/25/12

Statistics: Unlocking the Power of Data Lock5

If the distribution of the sample statistic is normal:

A confidence interval can be calculated by

where z* is a N(0,1) percentile depending on the level of confidence.

A p-value is the area in the tail(s) of a N(0,1) beyond

Inference Using N(0,1)

*sample statistic z SE

sample statistic null valueSE

z

Page 4: STAT 101 Dr. Kari Lock Morgan 10/25/12

Statistics: Unlocking the Power of Data Lock5

CLT for a MeanPopulation Distribution of

Sample DataDistribution of Sample Means

n = 10

n = 30

n = 50

Frequency

0 1 2 3 4 5 6

0.0

1.5

3.0

1.0 2.0 3.0

Frequency

0 1 2 3 4 5

04

8

1.5 2.0 2.5 3.0

Frequency

0 2 4 6 8 12

010

25

1.4 1.8 2.2 2.6

x

0 2 4 6 8 10

Page 5: STAT 101 Dr. Kari Lock Morgan 10/25/12

Statistics: Unlocking the Power of Data Lock5

The standard error for a sample mean can be calculated by

SE of a Mean

SEn

Page 6: STAT 101 Dr. Kari Lock Morgan 10/25/12

Statistics: Unlocking the Power of Data Lock5

The standard deviation of the population is

a)

b) s

c)

Standard Deviation

n

Page 7: STAT 101 Dr. Kari Lock Morgan 10/25/12

Statistics: Unlocking the Power of Data Lock5

The standard deviation of the sample is

a)

b) s

c)

Standard Deviation

n

Page 8: STAT 101 Dr. Kari Lock Morgan 10/25/12

Statistics: Unlocking the Power of Data Lock5

The standard deviation of the sample mean is

a)

b) s

c)

Standard Deviation

n The standard error is the standard

deviation of the statistic.

Page 9: STAT 101 Dr. Kari Lock Morgan 10/25/12

Statistics: Unlocking the Power of Data Lock5

If n ≥ 30*, then

CLT for a Mean

,X Nn

*Smaller sample sizes may be sufficient for symmetric distributions, and 30 may not be sufficient for very skewed distributions or distributions with high outliers

Page 10: STAT 101 Dr. Kari Lock Morgan 10/25/12

Statistics: Unlocking the Power of Data Lock5

• We don’t know the population standard deviation , so estimate it with the sample standard deviation, s

Standard Error

SEn

SE sn

Page 11: STAT 101 Dr. Kari Lock Morgan 10/25/12

Statistics: Unlocking the Power of Data Lock5

• Replacing with s changes the distribution of the z-statistic from normal to t

• The t distribution is very similar to the standard normal, but with slightly fatter tails to reflect this added uncertainty

t-distribution

Page 12: STAT 101 Dr. Kari Lock Morgan 10/25/12

Statistics: Unlocking the Power of Data Lock5

• The t-distribution is characterized by its degrees of freedom (df)

• Degrees of freedom are calculated based on the sample size

• The higher the degrees of freedom, the closer the t-distribution is to the standard normal

Degrees of Freedom

Page 13: STAT 101 Dr. Kari Lock Morgan 10/25/12

Statistics: Unlocking the Power of Data Lock5

t-distribution

Page 14: STAT 101 Dr. Kari Lock Morgan 10/25/12

Statistics: Unlocking the Power of Data Lock5

Aside: William Sealy Gosset

Page 15: STAT 101 Dr. Kari Lock Morgan 10/25/12

Statistics: Unlocking the Power of Data Lock5

Normality Assumption• Using the t-distribution requires an extra assumption: the data comes from a normal distribution

• Note: this assumption is about the original data, not the distribution of the statistic

• For large sample sizes we do not need to worry about this, because s will be a very good estimate of , and t will be very close to N(0,1)

• For small sample sizes (n < 30), we can only use the t-distribution if the distribution of the data is approximately normal

Page 16: STAT 101 Dr. Kari Lock Morgan 10/25/12

Statistics: Unlocking the Power of Data Lock5

Normality Assumption• One small problem: for small sample sizes, it is very hard to tell if the data actually comes from a normal distribution!

Population

x

0 2 4 6 8 10

Sample Data, n = 10

0 2 4 6 8 10 0 1 2 3 4 5 60.5 1.5 2.5 3.5

-2.0 -1.0 0.0 1.0 -0.5 0.5 1.0 1.5 2.0 -2 -1 0 1-4 -2 0 2 4

Page 17: STAT 101 Dr. Kari Lock Morgan 10/25/12

Statistics: Unlocking the Power of Data Lock5

Small Samples

• If sample sizes are small, only use the t-distribution if the data looks reasonably symmetric and does not have any extreme outliers.

• Even then, remember that it is just an approximation!

• In practice/life, if sample sizes are small, you should just use simulation methods (bootstrapping and randomization)

Page 18: STAT 101 Dr. Kari Lock Morgan 10/25/12

Statistics: Unlocking the Power of Data Lock5

Confidence Intervals

*sample statistic t SE

* sX tn

df = n – 1

IF n is large or the data is normal

t* is found as the appropriate percentile on a t-distribution with n – 1 degrees of freedom

Page 19: STAT 101 Dr. Kari Lock Morgan 10/25/12

Statistics: Unlocking the Power of Data Lock5

Hypothesis Testing

sample statistic null valueSE

t

0t Xsn

The p-value is the area in the tail(s) beyond t in a t-distribution with n – 1 degrees of freedom,IF n is large or the data is normal

df = n – 1 0 0:H

Page 20: STAT 101 Dr. Kari Lock Morgan 10/25/12

Statistics: Unlocking the Power of Data Lock5

Chips Ahoy!

?

A group of Air Force cadets bought bags of Chips Ahoy! cookies from all over the country to verify this claim. They hand counted the number of chips in 42 bags.

1261.6, 1 1 7.6X s Source: Warner, B. & Rutledge, J. (1999). “Checking the Chips Ahoy! Guarantee,” Chance, 12(1).

Page 21: STAT 101 Dr. Kari Lock Morgan 10/25/12

Statistics: Unlocking the Power of Data Lock5

Chips Ahoy!

Can we use hypothesis testing to prove that there are 1000 chips in every bag? (“prove” = find statistically significant)

(a) Yes(b) No We can only do inference for parameters

(such as mean, proportion, etc.), not for the value of every single bag.

Page 22: STAT 101 Dr. Kari Lock Morgan 10/25/12

Statistics: Unlocking the Power of Data Lock5

Can we use hypothesis testing to prove that the average number of chips per bag is 1000? (“prove” = find statistically significant)

(a) Yes(b) No

Statements of equality are always in the null hypothesis, and we can only reject or fail or reject the null, never accept the null. Therefore, we cannot use hypothesis testing to prove equality.

Chips Ahoy!

Page 23: STAT 101 Dr. Kari Lock Morgan 10/25/12

Statistics: Unlocking the Power of Data Lock5

Can we use hypothesis testing to prove that the average number of chips per bag is more than 1000? (“prove” = find statistically significant)

(a) Yes(b) No

Chips Ahoy!

Page 24: STAT 101 Dr. Kari Lock Morgan 10/25/12

Statistics: Unlocking the Power of Data Lock5

?

1) Are there more than 1000 chips in each bag, on average?

(a) Yes(b) No(c) Cannot tell from this data

2) Give a 99% confidence interval for the average number of chips in each bag.

1261.6, 117.6, 42X s n

Chips Ahoy!

Page 25: STAT 101 Dr. Kari Lock Morgan 10/25/12

Statistics: Unlocking the Power of Data Lock5

Chips Ahoy!0 : 1000

: 1000aHH

0 1261.6 1000 14.4117.642

Xtsn

0p value

This provides extremely strong evidence that the average number of chips per bag of Chips Ahoy! cookies is significantly greater than 1000.

42 30n

1. State hypotheses:

2. Check conditions:

3. Calculate test statistic:

4. Compute p-value:

4. Interpret in context:

Page 26: STAT 101 Dr. Kari Lock Morgan 10/25/12

Statistics: Unlocking the Power of Data Lock5

Chips Ahoy!

*

117.61261.6 2.742

1212.6,1310.6

sX tn

We are 99% confident that the average number of chips per bag of Chips Ahoy! cookies is between 1212.6 and 1310.6 chips.

42 30n 1. Check conditions:

2. Find t*:

4. Compute confidence interval:

4. Interpret in context:

* 2.7t

Page 27: STAT 101 Dr. Kari Lock Morgan 10/25/12

Statistics: Unlocking the Power of Data Lock5

t-distribution

Which of the following properties is/are necessary for to have a t-distribution?

a) the data is normalb) the sample size is largec) the null hypothesis is trued) a or be) d and c

0Xt sn

To use the t-distribution, either the sample size has to be large or the data has to come from a normal distribution. If these conditions are met, then t has a t-distribution when the null hypothesis is true.

As with all hypothesis testing, you find the distribution of the statistic (in this case the t-statistic) when the null is true, and then see how extreme your observed value is.

Page 28: STAT 101 Dr. Kari Lock Morgan 10/25/12

Statistics: Unlocking the Power of Data Lock5

SE for Difference in Means

2 21 2

1 2

SEn n

df = smaller of n1 – 1 and n2 – 1

Page 29: STAT 101 Dr. Kari Lock Morgan 10/25/12

Statistics: Unlocking the Power of Data Lock5

CLT for Difference in Means

1 230 and *I 3 ,f 0nn

2 21 2

1 2 1 21 2

,n n

X X N

*Smaller sample sizes may be sufficient for symmetric distributions, and 30 may not be sufficient for skewed distributions

Page 30: STAT 101 Dr. Kari Lock Morgan 10/25/12

Statistics: Unlocking the Power of Data Lock5

t-distribution

• For a difference in means, the degrees of freedom for the t-distribution is the smaller of n1 – 1 and n2 – 1

• The test for a difference in means using a t-distribution is commonly called a t-test

Page 31: STAT 101 Dr. Kari Lock Morgan 10/25/12

Statistics: Unlocking the Power of Data Lock5

The Pygmalion Effect

Source: Rosenthal, R. and Jacobsen, L. (1968). “Pygmalion in the Classroom: Teacher Expectation and Pupils’ Intellectual Development.” Holt, Rinehart and Winston, Inc.

Teachers were told that certain children (chosen randomly) were expected to be “growth spurters,” based on the Harvard Test of Inflected Acquisition (a test that didn’t actually exist). These children were selected randomly.

The response variable is change in IQ over the course of one year.

Page 32: STAT 101 Dr. Kari Lock Morgan 10/25/12

Statistics: Unlocking the Power of Data Lock5

The Pygmalion Effect

n sControl Students 255 8.42 12.0“Growth Spurters” 65 12.22 13.3

X

Does this provide evidence that merely expecting a child to do well actually causes the child to do better?

(a) Yes(b) No

If so, how much better?

*s1 and s2 were not given, so I set them to give the correct p-value

Page 33: STAT 101 Dr. Kari Lock Morgan 10/25/12

Statistics: Unlocking the Power of Data Lock5

Pygmalion Effect0 1 2

1 2

: 0: 0aH

H

1 22 2 2 21 2

1 2

12.22 8.420

13.3 1265 255

3.8 2.0961.813

X Xts sn n

.02p value

We have evidence that positive teacher expectations significantly increase IQ scores in elementary school children.

30,6255 5 30

1. State hypotheses:

2. Check conditions:

3. Calculate t statistic:

4. Compute p-value:

5. Interpret in context:

1

2

: Mean IQ change for "growth spurters"Mean IQ change for control stud ts: en

Page 34: STAT 101 Dr. Kari Lock Morgan 10/25/12

Statistics: Unlocking the Power of Data Lock5

Pygmalion Effect

From the paper:

“The difference in gains could be ascribed tochance about 2 in 100 times”

Page 35: STAT 101 Dr. Kari Lock Morgan 10/25/12

Statistics: Unlocking the Power of Data Lock5

Pygmalion Effect

We are 95% confident that telling teachers a student will be an intellectual “growth spurter” increases IQ scores by between 0.17 and 7.43 points on average, after 1 year.

1. Check conditions:

2. Find t*:

3. Compute the confidence interval:

4. Interpret in context:

* 1.9 298t

3.8 2 1.813 0.17,7.43

30,6255 5 30

95% CI

2 2* 1 2

1 21 2

sn nsX X t

Page 36: STAT 101 Dr. Kari Lock Morgan 10/25/12

Statistics: Unlocking the Power of Data Lock5

• A matched pairs experiment compares units to themselves or another similar unit, rather than just compare group averages

• Data is paired (two measurements on one unit, twin studies, etc.).

• Look at the difference in responses for each pair

Matched Pairs

Page 37: STAT 101 Dr. Kari Lock Morgan 10/25/12

Statistics: Unlocking the Power of Data Lock5

• Do pheromones (subconscious chemical signals) in female tears affect testosterone levels in men?

• Cotton pads had either real female tears or a salt solution that had been dripped down the same female’s face

• 50 men had a pad attached to their upper lip twice, once with tears and once without, order randomized.

• Response variable: testosterone level

Pheromones in Tears

Gelstein, et. al. (2011) “Human Tears Contain a Chemosignal," Science, 1/6/11.

Page 38: STAT 101 Dr. Kari Lock Morgan 10/25/12

Statistics: Unlocking the Power of Data Lock5

Why do a matched pairs experiment?

a) Decrease the standard deviation of the response

b) Increase the power of the test

c) Decrease the margin of error for intervals

d) All of the above

e) None of the above

Matched Pairs

Using matched pairs decreases the standard deviation of the response, which decreases the standard error. A smaller SE increases the power of tests and decreases the margin of error for intervals.

Page 39: STAT 101 Dr. Kari Lock Morgan 10/25/12

Statistics: Unlocking the Power of Data Lock5

• Matched pairs experiments are particularly useful when responses vary a lot from unit to unit

• We can decrease standard deviation of the response (and so decrease standard error of the statistic) by comparing each unit to a matched unit

Matched Pairs

Page 40: STAT 101 Dr. Kari Lock Morgan 10/25/12

Statistics: Unlocking the Power of Data Lock5

• For a matched pairs experiment, we look at the difference between responses for each unit, rather than just the average difference between treatment groups

• Get a new variable of the differences, and do inference for the difference as you would for a single mean

Matched Pairs

Page 41: STAT 101 Dr. Kari Lock Morgan 10/25/12

Statistics: Unlocking the Power of Data Lock5

• The average difference in testosterone levels between tears and no tears was -21.7 pg/ml.

• The standard deviation of these differences was 46.5

• Average level before sniffing was 155 pg/ml.

• The sample size was 50 men

• Do female tears lower male testosterone levels?

(a) Yes (b) No (c) ???

• By how much? Give a 95% confidence interval.

Pheromones in Tears

“pg” = picogram = 0.001 nanogram = 10-12 gram

Page 42: STAT 101 Dr. Kari Lock Morgan 10/25/12

Statistics: Unlocking the Power of Data Lock5

Pheromones in Tears0 : 0

: 0d

a dHH

0 21.7 3.36.58

dXtSE

This provides strong evidence that female tears decrease testosterone levels in men, on average.

50 30n

1. State hypotheses:

2. Check conditions:

3. Calculate test statistic:

4. Compute p-value:

5. Interpret in context:

21.7, 46.5, 50d d nX s 46.5 6.58

50dsn

SE

Page 43: STAT 101 Dr. Kari Lock Morgan 10/25/12

Statistics: Unlocking the Power of Data Lock5

Pheromones in Tears

*dX t SE

We are 95% confident that female tears on a cotton pad on a man’s upper lip decrease testosterone levels between 8.54 and 34.86 pg/ml, on average.

50 30n 1. Check conditions:

2. Find t*:

3. Compute the confidence interval:

4. Interpret in context:

21.7 2 6.58 -34.86, -8.54

*or 2t

21.76.58

d

SEX

Page 44: STAT 101 Dr. Kari Lock Morgan 10/25/12

Statistics: Unlocking the Power of Data Lock5

Correlation

212rSE

n

df = n - 2

t-distribution

Page 45: STAT 101 Dr. Kari Lock Morgan 10/25/12

Statistics: Unlocking the Power of Data Lock5

Social Networks and the Brain• Is the size of certain regions of your brain correlated with the size of your social network?

• Social network size measured by many different variables, one of which was number of facebook friends. Brain size measured by MRI.

• The sample correlation between number of Facebook friends and grey matter density of a certain region of the brain (left middle temporal gyrus), based on 125 people, is r = 0.354. Is this significant? (a) Yes (b) No

Source: R. Kanai, B. Bahrami, R. Roylance and G. Ree (2011). “Online social network size is reflected in human brain structure,” Proceedings of the Royal Society B: Biological Sciences. 10/19/11.

Page 46: STAT 101 Dr. Kari Lock Morgan 10/25/12

Statistics: Unlocking the Power of Data Lock5

Social Networks and the Brain0 : 0

: 0aHH

0 0.354 4.200.084

rtSE

This provides strong evidence that the size of the left middle temporal gyrus and number of facebook friends are positively correlated.

125 30n

1. State hypotheses:

2. Check conditions:

3. Calculate test statistic:

4. Compute p-value:

5. Interpret in context:

2

0.3541 0.354

1250.0

284SE

r

p-value = 0.00005

Page 47: STAT 101 Dr. Kari Lock Morgan 10/25/12

Statistics: Unlocking the Power of Data Lock5

Parameter Distribution Conditions Standard Error

ProportionNormal

All counts at least 10np ≥ 10, n(1 – p) ≥ 10

Difference in Proportions

NormalAll counts at least 10

n1p1 ≥ 10, n1(1 – p1) ≥ 10, n2p2 ≥ 10, n2(1 – p2) ≥ 10

Mean t, df = n – 1 n ≥ 30 or data normal

Difference in Means

t, df = smaller of n1 – 1, n2 – 1

n1 ≥ 30 or data normal, n2 ≥ 30 or data normal

Paired Diff. in Means

t, df = nd – 1 nd ≥ 30 or data normal

Correlationt, df = n – 2 n ≥ 30

(1 )p pn

2

n

1 1

1

2 2

2

(1 ) (1 )p p p pn n

2 21 2

1 2n n

2

1

d

n

212r

n

pg 454

Page 48: STAT 101 Dr. Kari Lock Morgan 10/25/12

Statistics: Unlocking the Power of Data Lock5

To DoRead Chapter 6

Do Homework 5 (due Tuesday, 10/30)