Lecture 6: Let’s Start Inferential Stats Probability and Samples: The Distribution of Sample Means

Let’s Do an Experiment

Imagine a jar filled with marbles. 2/3 of the marbles are one color and the remaining 1/3 are a different color.
– Sample 1: n = 5; red = 4, white = 1 (80% red)
– Sample 2: n = 20; red = 12, white = 8 (60% red)

Which sample are you more confident came from a population of 2/3 red and 1/3 white marbles? Why?

Tversky and Kahneman (1974) found that most people tend to focus on the sample proportion rather than the sample size. However, when asked how many marbles they would like to draw before making their decision, people preferred to draw 20 rather than 5.
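The "Why?" can be made concrete with a short calculation. This sketch makes two assumptions the slide does not state: the two competing hypotheses are "2/3 red" and "2/3 white", and they start with equal prior odds. Under those assumptions, the odds in favor of "2/3 red" after drawing r red and w white marbles are \( 2^{r-w} \). That quantity depends on the difference between the counts, not on the sample proportion. `odds_red_majority` is a hypothetical helper name.

```python
from fractions import Fraction

def odds_red_majority(red: int, white: int) -> Fraction:
    """Posterior odds for "2/3 red" vs "2/3 white", assuming equal prior odds.

    Hypothetical helper: each hypothesis gives the sample a likelihood that is
    a product of per-marble probabilities; the ratio simplifies to 2**(red - white).
    """
    p_if_red_majority = Fraction(2, 3) ** red * Fraction(1, 3) ** white
    p_if_white_majority = Fraction(1, 3) ** red * Fraction(2, 3) ** white
    return p_if_red_majority / p_if_white_majority

odds_sample_1 = odds_red_majority(red=4, white=1)   # n = 5, 80% red
odds_sample_2 = odds_red_majority(red=12, white=8)  # n = 20, 60% red
print(f"Sample 1 odds: {odds_sample_1} to 1")  # 8 to 1
print(f"Sample 2 odds: {odds_sample_2} to 1")  # 16 to 1
```

Under these assumptions, the larger sample is the stronger evidence (16 to 1 vs 8 to 1), even though its proportion of red is lower.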

Today’s Goal

First, from z-scores and probabilities we already KNOW:
– how scores relate to each other in a distribution
– how an individual score relates to its population
– where scores fit into their distributions (probabilities)
  • Are they representative?
  • Are they extreme?

Goal: to understand the relationship between samples and populations.

But… so far we only know about samples made up of a single individual score.
– Most researchers take much larger samples
  • e.g., 100 specimens, 30 dogs, 50 math scores

Populations
– Members must share at least 1 trait
– The more traits they must share,
  • the lower the ability to generalize
  • the smaller the population size

Samples
– The greater the n, the more accurate the parameter estimate (more chances to accurately represent the population)

Representative sample: a sample that possesses all the defining characteristics of the population from which it was drawn

For Example: Say we want to learn about college students at the UA. We randomly choose 30 students at the UA.
– Because we chose our sample randomly, it should be fairly representative of the population of students at the UA. But we may be missing some segments of the population (e.g., what if by chance our sample includes no Christians, or no international students?)
– Any statistics we compute for the sample will also not be identical to the corresponding population parameters
– What if we choose another random sample?

How do we know how closely our sample represents our population?

Z-scores tell us where a single score lies in its population AND where a sample mean lies in the distribution of sample means for its population.

Samples give us an incomplete and often inaccurate picture of our population, so we keep track of sampling error:
– Sampling error: the discrepancy, or amount of error, between a sample statistic and its corresponding population parameter.

Sampling Error
– Samples never precisely reflect the population
– The difference between the parameter and the statistic is the sampling error \( (\mu - M) \)
– Sampling error is expected and normal
– p(+ sampling error) = p(− sampling error)
  • Some samples overestimate and some underestimate
  • This error should be random

[Figure: frequency (f) histogram of scores, x-axis 1–9]
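A small simulation can illustrate the claim that overestimates and underestimates are equally likely. The sketch assumes a hypothetical normally distributed population (mean 100, SD 15, 10,000 members) and draws 10,000 random samples of n = 30. The seed and all sizes are arbitrary choices. Each error is computed as \( \mu - M \), as on the slide, so a negative error means the sample mean overestimated \( \mu \).

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is reproducible

# Hypothetical population: 10,000 scores drawn from a normal curve (mean 100, SD 15).
population = [rng.gauss(100, 15) for _ in range(10_000)]
mu = statistics.fmean(population)  # the population parameter

# Draw many random samples of n = 30; record each sampling error (mu - M).
n, num_samples = 30, 10_000
errors = [mu - statistics.fmean(rng.sample(population, n)) for _ in range(num_samples)]

# A negative error means the sample mean M overestimated mu.
share_over = sum(e < 0 for e in errors) / num_samples
print(f"share of samples that overestimate mu:  {share_over:.3f}")
print(f"share of samples that underestimate mu: {1 - share_over:.3f}")
print(f"average sampling error: {statistics.fmean(errors):+.3f}")
```

Roughly half the samples land on each side of \( \mu \), and the errors average out to nearly zero. That is what "this error should be random" predicts.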

Distribution of Sample Means
– 2 samples taken from the same population will probably be different
  • Different individuals → different means
  • Different scores → different standard deviations
– Given that we can take some extremely large number of samples… what pattern might these sample means show?

Distribution of sample means (or sampling distribution): the collection of sample means for all the possible random samples of a particular size (n) that can be taken from a population

So, we can compute probabilities: p(particular sample) = (number of particular samples) / (number of all possible samples)

Distribution of Sample Means
– A sampling distribution is a distribution of statistics (here, the means of samples)
– Consider a population of 4 scores: 2, 4, 6, 8

[Figure: frequency (f) histogram of the population scores, x-axis 1–9]

* See also… Box 7.1 in the book, page 205

Predictions
What would we expect if we created a distribution of all the possible n = 2 samples of our data set?
– Sample means won’t always be perfect, but they should pile up around the population mean
– They should start to form a normal distribution, because most of the sample means should pile up around the population mean and only a few should be extreme
– The larger the sample size, the closer the sample mean should be to the population mean, because a larger sample should be more representative

[Figure: frequency (f) histogram of the population scores, x-axis 1–9]

* If we choose samples of n = 2 (with replacement, counting order), then there are 4 × 4 = 16 different possible samples.

\( \mu = 5 \)

Sample   1st score   2nd score   M
 1       2           2           2
 2       2           4           3
 3       2           6           4
 4       2           8           5
 5       4           2           3
 6       4           4           4
 7       4           6           5
 8       4           8           6
 9       6           2           4
10       6           4           5
11       6           6           6
12       6           8           7
13       8           2           5
14       8           4           6
15       8           6           7
16       8           8           8

[Figure: frequency (f) histogram of the 16 sample means, x-axis 1–9]

(1) Sample means pile up around the population mean (they are representative).

(2) The distribution is approximately normal.

(3) We can use this sampling distribution to answer probability questions.

e.g., What is the probability of obtaining a sample mean less than 3?

p(M < 3) = 1/16 ≈ 0.06
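The 16-sample table and the probability can be reproduced by brute-force enumeration. This is a minimal Python sketch. It assumes sampling with replacement where order matters, which is how the 16 samples above are counted.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

population = [2, 4, 6, 8]
n = 2

# Every ordered sample of size n, drawn with replacement: 4**2 = 16 samples.
samples = list(product(population, repeat=n))
means = [Fraction(sum(s), n) for s in samples]

# Frequency of each sample mean (the distribution of sample means).
freq = Counter(means)
for m in sorted(freq):
    print(f"M = {m}: f = {freq[m]}")

# p(M < 3) = (number of samples with M < 3) / (number of all samples)
p_less_than_3 = Fraction(sum(m < 3 for m in means), len(samples))
print(f"p(M < 3) = {p_less_than_3} = {float(p_less_than_3):.4f}")
```

The printed frequencies (1, 2, 3, 4, 3, 2, 1 for M = 2 through 8) show the pile-up around \( \mu = 5 \). Exactly one sample, (2, 2), has a mean below 3.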

Central Limit Theorem
– It is not reasonable to take all the possible samples from a population; usually we just take one.

Central Limit Theorem: describes the general characteristics of the distribution of sample means
– For any population with mean \( \mu \) and standard deviation \( \sigma \), the distribution of sample means for sample size n will have a mean of \( \mu \) and a standard deviation of \( \sigma/\sqrt{n} \). It will approach a normal distribution as n approaches infinity.
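To see the theorem at work on a population that is far from normal, this sketch assumes a hypothetical exponential population. That population has mean 10 and standard deviation 10 and is strongly right-skewed (skewness 2). The sketch draws 20,000 samples of n = 30. The mean of the sample means should land near \( \mu \), and their standard deviation near \( \sigma/\sqrt{n} \approx 1.83 \). Their skewness should shrink toward 0; in theory it equals \( 2/\sqrt{n} \approx 0.37 \). The seed and sizes are arbitrary.

```python
import math
import random
import statistics

def skewness(values):
    """Population skewness: the average cubed z-score."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)

rng = random.Random(7)
mu = sigma = 10.0          # an exponential population has mean = SD = 1/rate
n, num_samples = 30, 20_000

# Each entry is the mean of one random sample of n scores from the skewed population.
sample_means = [
    statistics.fmean(rng.expovariate(1 / mu) for _ in range(n))
    for _ in range(num_samples)
]

print(f"mean of sample means: {statistics.fmean(sample_means):.3f} (mu = {mu})")
print(f"SD of sample means:   {statistics.pstdev(sample_means):.3f} "
      f"(sigma/sqrt(n) = {sigma / math.sqrt(n):.3f})")
print(f"skewness of sample means: {skewness(sample_means):.3f} (population: 2)")
```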

Central Limit Theorem
Perks: describes the distribution of sample means for any population, regardless of its original shape, mean, or standard deviation:
– Shape
– Central tendency
– Variability

Important mathematical finding:
– The sampling distribution of the mean has a mean equal to the population mean and a variance equal to the population variance divided by n: \( \mu_M = \mu \), \( \sigma_M^2 = \sigma^2/n \)
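For the small population 2, 4, 6, 8 from the earlier example, this finding can be checked exactly rather than by simulation. The Python sketch enumerates every ordered sample drawn with replacement (an assumption that matches the 16-sample table) for n = 1, 2, 3. It then compares the mean and variance of the sample means with \( \mu = 5 \) and \( \sigma^2/n \), where \( \sigma^2 = 5 \). Exact fractions avoid any rounding.

```python
from fractions import Fraction
from itertools import product

def mean(values):
    return sum(values, Fraction(0)) / len(values)

def pvariance(values):
    """Population variance: the average squared deviation from the mean."""
    m = mean(values)
    return sum(((v - m) ** 2 for v in values), Fraction(0)) / len(values)

population = [Fraction(x) for x in (2, 4, 6, 8)]
mu, sigma_sq = mean(population), pvariance(population)   # 5 and 5

results = {}
for n in (1, 2, 3):
    sample_means = [sum(s) / n for s in product(population, repeat=n)]
    results[n] = (mean(sample_means), pvariance(sample_means))
    print(f"n = {n}: mean of M = {results[n][0]}, variance of M = {results[n][1]} "
          f"(sigma^2/n = {sigma_sq / n})")
```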

A little more about the shape and mean of the distribution of sample means

Shape:
– Normal if the samples come from a population that is normal
– Approximately normal if the number of scores (n) in each sample is around 30 or more
– What does this mean for research?

Mean:
– The average of all sample means = the population mean
– This mean value is called the expected value of M: \( \mu_M = \mu \)
– Because this value will always equal \( \mu \), the book just uses \( \mu \) to refer to both the mean of the population and the mean of the distribution of sample means

Standard Error of M
The standard deviation of a distribution of sample means is called the standard error of M (\( \sigma_M \)).

Just as individual scores vary from the mean (standard deviation), the sample means that make up the sampling distribution vary from \( \mu \).
– This measures the average error between the sample and the population. IMPORTANT!
– Standard error = \( \sigma_M \) = the standard distance between M and \( \mu \).

Standard Error
How do we determine the standard error (SE)?

(1) Sample size: law of large numbers = the larger the sample size (n), the more probable it is that the sample mean will be close to the population mean.
– The larger the sample, the smaller the SE; the smaller the sample, the larger the SE.

(2) Standard deviation: standard error = \( \sigma_M = \sigma/\sqrt{n} = \sqrt{\sigma^2/n} \)
– By definition, \( \sigma \) is the standard distance between X and \( \mu \), so when n = 1 the SD and SE are the same.
– So \( \sigma \) is the starting point for the standard error: when n = 1, SE = \( \sigma \), and as n increases, SE decreases.
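The law-of-large-numbers statement can be quantified for a normal population. This sketch borrows the IQ population used in the illustration (\( \mu = 100 \), \( \sigma = 15 \)) and assumes scores are normally distributed. It computes p(|M − μ| < 5), the probability that a sample mean lands within 5 points of \( \mu \), using Python's standard-library `statistics.NormalDist`. The 5-point window is an arbitrary, hypothetical choice.

```python
import math
from statistics import NormalDist

mu, sigma = 100, 15   # IQ population from the illustration
window = 5            # hypothetical: "close" means within 5 IQ points

probabilities = {}
for n in (1, 5, 30, 100):
    se = sigma / math.sqrt(n)                     # sigma_M = sigma / sqrt(n)
    sampling_dist = NormalDist(mu, se)            # distribution of sample means
    p_close = sampling_dist.cdf(mu + window) - sampling_dist.cdf(mu - window)
    probabilities[n] = p_close
    print(f"n = {n:3d}: SE = {se:5.2f}, p(|M - mu| < {window}) = {p_close:.3f}")
```

As n grows, the SE shrinks, and the same 5-point window captures a larger share of the sample means.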

More about Standard Error
In sum, the standard error provides a way to measure the “average” distance between the sample mean and the population mean.

Research:
– Typically uses only 1 sample
– What if we had chosen a different sample? Think of the standard error as a measure of reliability
– When we calculate the standard error of that sample mean, we get an idea of how closely our sample mean represents the population mean
– Critical element in the inferential process

Illustration – Let’s do it!
We start with IQ, a population that is normally distributed with \( \mu = 100 \) and \( \sigma = 15 \).

4 samples are taken where:
– n = 1
– n = 5
– n = 30
– n = 100

First calculate the SE = \( \sigma/\sqrt{n} \)

Let’s Do it!
– n = 1; SE = \( 15/\sqrt{1} \) = 15
– n = 5; SE = \( 15/\sqrt{5} \) = 6.71
– n = 30; SE = \( 15/\sqrt{30} \) = 2.74
– n = 100; SE = \( 15/\sqrt{100} \) = 1.5

Volunteer to come up and sketch each of these normal distributions with a line denoting the mean and a line denoting the standard error.

What does this tell us about how researchers should select their samples?

[Figure: sketches of the four sampling distributions, labeled n = 1, 5, 30, 100]
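The four standard errors can be checked with a few lines of Python. `standard_error` is a hypothetical helper name, and the values are rounded to two decimals as on the slide.

```python
import math

def standard_error(sigma: float, n: int) -> float:
    """Standard error of the mean: sigma_M = sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

sigma = 15  # IQ population standard deviation
table = {n: round(standard_error(sigma, n), 2) for n in (1, 5, 30, 100)}
for n, se in table.items():
    print(f"n = {n:3d}: SE = {se}")
```

Because n sits under a square root, quadrupling the sample size only halves the SE. For example, going from n = 25 to n = 100 takes the SE from 3 to 1.5.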

Z-scores for sample means
We can compute the probability of obtaining a particular sample mean using z-scores.

Just like last week, only now the sample means are located in a distribution of sample means:

\( z = \dfrac{M - \mu}{\sigma_M} \)

Let’s Do it
The distribution of home prices for metro Boston has a mean price of $250K, with a standard deviation of $100K. What is the probability of randomly selecting a sample of n = 4 houses whose mean price is below $185K?

* Remember that probability is equivalent to proportion… use your unit normal (z) table.

* Also, start by drawing the distribution.
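Here is a worked version of this problem as a Python sketch. It assumes home prices are normally distributed, which a z-table answer needs with a sample as small as n = 4; real price data are often skewed. It also treats $100K as the population standard deviation. The unit normal table lookup is replaced by `statistics.NormalDist().cdf`.

```python
import math
from statistics import NormalDist

mu, sigma = 250, 100      # population mean and SD, in $K
n, sample_mean = 4, 185   # sample size and the sample mean of interest, in $K

se = sigma / math.sqrt(n)          # sigma_M = 100 / 2 = 50
z = (sample_mean - mu) / se        # (185 - 250) / 50 = -1.3
p_below = NormalDist().cdf(z)      # area to the left of z under the unit normal curve

print(f"SE = {se}, z = {z}, p(M < 185) = {p_below:.4f}")
```

Under these assumptions, about 0.097 (roughly 10%) of samples of 4 houses would have a mean price below $185K.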

One More…

An automobile manufacturer claims that a newly introduced model will average 45 MPG, with a standard deviation of 2. A sample of n = 4 cars is tested and averages M = 42 MPG. Is this sample mean likely to occur if the manufacturer’s claim is true? More specifically, is the sample mean within the range of values that would be expected 95% of the time?
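One way to work this problem is sketched below in Python, assuming MPG is normally distributed. First compute the z-score of M = 42. Then compute the middle-95% range \( \mu \pm 1.96\,\sigma_M \) of the distribution of sample means, using `NormalDist().inv_cdf(0.975)` (about 1.96) for the cutoff.

```python
import math
from statistics import NormalDist

mu, sigma = 45, 2      # manufacturer's claimed mean MPG and SD
n, sample_mean = 4, 42

se = sigma / math.sqrt(n)                 # 2 / 2 = 1 MPG
z = (sample_mean - mu) / se               # (42 - 45) / 1 = -3
z_crit = NormalDist().inv_cdf(0.975)      # about 1.96: bounds the middle 95%
low, high = mu - z_crit * se, mu + z_crit * se

print(f"z = {z:.2f}; middle 95% of sample means: {low:.2f} to {high:.2f} MPG")
print("M = 42 is inside the 95% range" if low <= sample_mean <= high
      else "M = 42 is outside the 95% range")
```

With z = −3, the sample mean falls well outside the range of about 43.04 to 46.96 MPG. A mean this low would therefore be unlikely if the claim were true.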

Z-Scores and Sample Means
It is possible to find the probability associated with any specific sample mean. We can make quantitative predictions about the kinds of samples obtained from any population.

You could use z-scores for your projects:
– Look up the average price of a car of a particular year on the net. Go car hunting and find a few cars made in that year: what was the probability of finding cars at that mean price?
– Do people in my family (using a sample of grandparents and great-grandparents) live longer than average? (Look up the average lifespan on the net.)

Further application of z-tests
z problems can be used in conjunction with probability estimates to determine whether a sample is from a known population.

The .05 convention in statistics states that if there is a 5% or smaller chance of obtaining a sample like this one from a particular population, then we can conclude that the sample does not represent that population.

Class Problem
The distribution of IQ scores has \( \mu = 100 \) and \( \sigma = 15 \). You have randomly sampled 50 students and found their mean IQ = 107. Are these students smarter than average? Was this sample mean higher than the population mean due to chance, or does this sample represent students who are truly smarter?
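One way to answer is sketched below in Python. It assumes normally distributed IQ scores and applies the .05 convention to the upper tail, because the question asks whether the students are smarter. A two-tailed check would double p and reach the same conclusion here.

```python
import math
from statistics import NormalDist

mu, sigma = 100, 15
n, sample_mean = 50, 107
alpha = 0.05                                  # the .05 convention

se = sigma / math.sqrt(n)                     # 15 / sqrt(50), about 2.12
z = (sample_mean - mu) / se                   # about 3.30
p_at_least = 1 - NormalDist().cdf(z)          # p(M >= 107) if these students are typical

print(f"SE = {se:.2f}, z = {z:.2f}, p = {p_at_least:.4f}")
if p_at_least <= alpha:
    print("p <= .05: unlikely to be chance; the sample probably is not from this population")
else:
    print("p > .05: the difference could plausibly be sampling error")
```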

In the Literature
Because the standard error plays an important role in inferential stats, you’ll see it reported in scientific papers:

Symbols: SE and SEM

Tables:

Group       n    Mean    SE
Control     17   32.23   2.31
Treatment   15   45.17   2.78
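If each SE in a table like this was computed as \( s/\sqrt{n} \) from the group's sample standard deviation s, a reader can recover each group's standard deviation by multiplying the SE by \( \sqrt{n} \). That computation is an assumption; check a paper's methods before relying on it. A short Python sketch using the table's values:

```python
import math

# (n, mean, SE) for each group, as reported in the table
groups = {
    "Control": (17, 32.23, 2.31),
    "Treatment": (15, 45.17, 2.78),
}

implied_sd = {}
for name, (n, mean, se) in groups.items():
    implied_sd[name] = se * math.sqrt(n)   # invert SE = SD / sqrt(n)
    print(f"{name}: mean = {mean}, SE = {se}, implied SD = {implied_sd[name]:.2f}")
```

The implied SDs (about 9.52 and 10.77) are several times larger than the SEs. The SE describes how much a group mean would vary from sample to sample, not how much individual scores vary.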

In the Literature

[Figure: bar graph of mean Reaction Time (ms) for the Experimental and Control conditions, y-axis 1025–1150; a second chart with a y-axis from 0 to 16 and an x-axis from 1 to 6]

Homework: Chapter 7

3, 5, 6, 7, 9, 10, 14, 16, 20, 24, 25, 26

Please also read Box 7.3 (page 211) for more on the difference between standard error and standard deviation.