
Epidemiology 9509 sampling distributions (more)

Epidemiology 9509
Principle of Biostatistics

Chapter 7: Sampling Distributions (continued)

John Koval

Department of Epidemiology and Biostatistics
University of Western Ontario


Next

We want to look at histograms of sample statistics (sample mean, median, sample variance, sample standard deviation) to see what their distributions look like.


sample mean of Bernoullis

Consider a sample of 10 observations from a Bernoulli distribution; that is, a sample of 10 responses to the question

Do you smoke?

where Yes is valued as 1 and No is valued as 0.


In what are we interested?

The proportion, p, which is the sample mean of a bunch of 0's and 1's.


Random variables - some math

Let us call X1 a random variable that measures the response (0 or 1) of the first person,

X2 the response of the second person,

etc., up to X10, the response of the 10th person.

Let Y be the sum of the responses of all ten subjects.

Then P, the sample proportion, is the average (sample mean) of all ten responses; that is,

$$P = \frac{Y}{n} = \frac{\sum_{i=1}^{10} X_i}{n} = \frac{0 + 1 + 1 + \cdots + 0}{10}$$
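As a small illustration of the formula P = Y/n, the following Python sketch computes a sample proportion as the mean of ten 0/1 responses. The slides themselves use SAS; Python is used here only for illustration, and the `responses` values are made-up data, not taken from the course.

```python
# Hypothetical responses to "Do you smoke?" (1 = Yes, 0 = No); made-up data.
responses = [0, 1, 1, 0, 0, 0, 0, 0, 0, 0]

n = len(responses)   # sample size (10 subjects)
y = sum(responses)   # Y, the number of "Yes" answers
p = y / n            # P = Y / n, the sample proportion

print(y, n, p)
```

Because every response is 0 or 1, summing the responses counts the smokers, so the sample mean and the sample proportion are the same number.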


Distribution of a sample mean of Bernoullis

Remember that Y is the sum of 10 Bernoullis.

So what is the distribution of Y (which can be thought of as the number of "successes" in a sample of size 10)?


Binomial(10, 0.2), where π = 0.2 is the population proportion of smokers, or the probability of picking a smoker at random.

Hence the distribution of the sample proportion is that of a multiple of the binomial distribution (Y scaled by 1/n);

that is, it has the same "boxes" as the binomial, except that the x-axis is marked in proportions rather than integers.
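The relabelling of the x-axis can be sketched in Python (an illustration only; the slides use SAS). The helper name `binomial_pmf` is hypothetical. The sketch computes the Binomial(10, 0.2) probabilities and then attaches the same probabilities to the proportions y/n.

```python
from math import comb

def binomial_pmf(y, n, pi):
    """Pr(Y = y) for Y ~ Binomial(n, pi)."""
    return comb(n, y) * pi**y * (1 - pi)**(n - y)

n, pi = 10, 0.2

# Distribution of the count Y: keys are the integers 0..10.
count_dist = {y: binomial_pmf(y, n, pi) for y in range(n + 1)}

# Distribution of the proportion P = Y/n: same probabilities ("boxes"),
# but the support is 0.0, 0.1, ..., 1.0 instead of 0, 1, ..., 10.
prop_dist = {y / n: prob for y, prob in count_dist.items()}

for p, prob in prop_dist.items():
    print(f"{p:.1f}  {prob:.5f}")
```

Only the keys change between `count_dist` and `prop_dist`; the probabilities are carried over untouched, which is what "a multiple of the binomial" means here.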


Binomial Distribution B(10,0.2)

 x   Pr(X=x)
 0   0.10737
 1   0.26844
 2   0.30199
 3   0.20133
 4   0.08808
 5   0.02642
 6   0.00551
 7   0.00079
 8   0.00007
 9   0.00000
10   0.00000


[Figure: bar chart of the Bin(10, 0.2) distribution; x-axis 0 to 10, y-axis Probability (0.00 to 0.30)]


Distribution of proportion

  p   Pr(P=p)
0.0   0.10737
0.1   0.26844
0.2   0.30199
0.3   0.20133
0.4   0.08808
0.5   0.02642
0.6   0.00551
0.7   0.00079
0.8   0.00007
0.9   0.00000
1.0   0.00000


[Figure: bar chart of the distribution of the proportion of 10 Bern(0.2)'s; x-axis 0.0 to 1.0, y-axis Probability (0.00 to 0.30)]


distribution of proportions

If the proportion is the average of a number of Bernoulli random variables, its distribution is exactly a multiple of a Binomial.

Hence we can always plot its distribution and calculate probabilities.

From a previous lecture, we know that for large sample size, n, and nπ > 5, the binomial distribution can be approximated by a Normal distribution.

Similarly, the distribution of the proportion, for large sample size, n, and nπ > 5, can be approximated by a multiple of a Normal distribution.
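To make the Normal approximation concrete, here is a hedged Python sketch (the slides use SAS). It assumes n = 100 and π = 0.2, so that nπ = 20 > 5, and uses the standard results that P has mean π and variance π(1 − π)/n; these values, the helper names, and the half-unit continuity correction are choices made for this illustration and do not come from the slides.

```python
from math import comb, erf, sqrt

def binomial_cdf(k, n, pi):
    """Exact Pr(Y <= k) for Y ~ Binomial(n, pi)."""
    return sum(comb(n, y) * pi**y * (1 - pi)**(n - y) for y in range(k + 1))

def normal_cdf(x, mu, sigma):
    """Pr(X <= x) for X ~ Normal(mu, sigma^2)."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

n, pi = 100, 0.2   # assumed values for illustration; n * pi = 20 > 5
k = 25             # we look at Pr(P <= 0.25), i.e. Pr(Y <= 25)

exact = binomial_cdf(k, n, pi)

# Normal approximation on the proportion scale:
# mean pi, standard deviation sqrt(pi * (1 - pi) / n).
sigma_p = sqrt(pi * (1 - pi) / n)
approx = normal_cdf((k + 0.5) / n, pi, sigma_p)   # 0.5/n is the continuity correction

print(f"exact = {exact:.4f}, normal approximation = {approx:.4f}")
```

Rerunning with a small n (for example n = 10, where nπ = 2) should show a noticeably worse match, which is the reason for the nπ > 5 condition.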


Sample means from other distributions

Easy stuff ends here.

If more complicated distributions produce the data from which we calculate sample means, we cannot get the distribution of the sample mean as easily as for the proportion.

However, for large samples, the distribution can be approximated.


Sampling from a Binomial

Consider taking a random sample of 10 people to whom you have administered the earlier described Stress Scale. We assume that the distribution of the Stress Scale is Binomial(10, 0.2).

From what we have just done, we know that if we simulate the taking of such a sample many times, we can plot the resulting statistic and see its distribution; in this case, that of the sample mean.


Distribution of sample mean - 1000 simulations

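title 'distribution of sample mean';
options ps=24 ls=64;

data samples;
  seed = 25487;      /* random number seed */
  nsim = 1000;       /* number of simulated samples */
  nsam = 10;         /* number of subjects per sample */
  nquest = 10;       /* binomial n: items on the Stress Scale */
  pi = 0.2;          /* binomial probability */
  do nrun = 1 to nsim;
    sumx = 0;
    do i = 1 to nsam;
      x = ranbin(seed, nquest, pi);   /* one Binomial(nquest, pi) score */
      sumx = sumx + x;
    end;
    xbar = sumx / nsam;               /* sample mean for this run */
    output;                           /* one record per simulated sample */
  end;
run;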
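For readers without SAS, the same simulation can be sketched in Python. This is not the course's code: the function name `simulate_sample_means` is hypothetical, each Binomial(nquest, pi) score is built as a sum of Bernoulli draws, and Python's random number stream differs from SAS's `ranbin`, so the numbers will not match the SAS output exactly even with the same seed value.

```python
import random
from statistics import mean, stdev

def simulate_sample_means(nsim, nsam, nquest, pi, seed=25487):
    """Return nsim sample means, each from nsam Binomial(nquest, pi) scores."""
    rng = random.Random(seed)
    xbars = []
    for _ in range(nsim):
        # One score = number of successes in nquest Bernoulli(pi) trials.
        scores = [sum(rng.random() < pi for _ in range(nquest))
                  for _ in range(nsam)]
        xbars.append(sum(scores) / nsam)
    return xbars

for nsam in (10, 30, 100):
    xbars = simulate_sample_means(nsim=1000, nsam=nsam, nquest=10, pi=0.2)
    print(nsam, round(mean(xbars), 3), round(stdev(xbars), 3))
```

The printed mean should sit close to 2.0 for every sample size, while the standard deviation should shrink as `nsam` grows.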


Distribution of sample mean (continued)

This is a default plot.

proc means;
  var xbar;
  title 'sampling distribution of sample means';
proc chart;
  vbar xbar / type=pct space=0;
proc gchart;
  vbar xbar / type=pct space=0;


Statistics

Sample statistics

nsam    Mean        Std Dev     Minimum     Maximum
---------------------------------------------------
  10    1.9980000   0.3982510   0.6000000   3.7000000
  30    1.9983867   0.2340997   1.1666667   2.9000000
 100    1.9984980   0.1279179   1.5600000   2.5600000
---------------------------------------------------

As the sample size increases:

1. the standard deviation gets smaller
2. the range gets smaller, and more symmetric


CHART output for sample size 10

Graphical representation of changes with sample size

[CHART histogram of xbar, sample size 10: y-axis Percentage (0 to 10), x-axis midpoints 1.1 to 3.1 by 0.4]


CHART output for sample size 30

[CHART histogram of xbar, sample size 30: y-axis Percentage (0 to 12), x-axis midpoints 1.5 to 2.9 by 0.2]


CHART output for sample size 100

[CHART histogram of xbar, sample size 100: y-axis Percentage (0 to 10), x-axis midpoints 1.7 to 2.5 by 0.2]

Fancier graphs

[Figure: sample size 10, default plot]

[Figure: sample size 30, default plot]

[Figure: sample size 100, default plot]

Distribution of sample mean (continued again)

This is a plot with a defined range, so that we can compare the output for sample sizes 10, 30, and 100.

proc gchart;
  vbar xbar / type=pct space=0
    midpoints = 0.6 to 3.4 by 0.2;

[Figure: sample size 10, plot with defined range]

[Figure: sample size 30, plot with defined range]

[Figure: sample size 100, plot with defined range]

We can see that the plots centre around the population mean (2.0).

Conclusions

1. as sample size gets larger, variance decreases

2. as sample size gets larger, curve looks more symmetric


Distribution of sample mean (more)

Alternatively, use the HISTOGRAM statement of PROC UNIVARIATE for both the histogram and the approximating Normal curve.

proc univariate;
  var xbar;
  histogram / normal(mu = 2.0 sigma = 0.4);

Here sigma = 0.4 is the value for nsam = 10; use sigma = 0.2309 for nsam = 30 and sigma = 0.1265 for nsam = 100.
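The sigma values passed to the Normal curve are consistent with the standard error of the mean, sqrt(σ²/nsam), where σ² = nquest × π × (1 − π) = 1.6 is the variance of one Binomial(10, 0.2) score. The slides do not show this calculation; the Python sketch below is one way to reproduce the three numbers.

```python
from math import sqrt

nquest, pi = 10, 0.2
pop_var = nquest * pi * (1 - pi)   # variance of one Binomial(10, 0.2) score: 1.6

# Standard deviation of the sample mean for each sample size.
sigmas = {nsam: sqrt(pop_var / nsam) for nsam in (10, 30, 100)}

for nsam, sigma in sigmas.items():
    print(nsam, round(sigma, 4))
# prints:
# 10 0.4
# 30 0.2309
# 100 0.1265
```

These theoretical values can be compared with the simulated standard deviations in the statistics table (0.398, 0.234, 0.128).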

[Figure: sample size 10, histogram and theoretical distribution]

[Figure: sample size 30, histogram and theoretical distribution]

[Figure: sample size 100, histogram and theoretical distribution]

Conclusions

1. as sample size gets larger, curve looks more Normal


Sampling from other distributions

1. Normal (the perfect case): the distribution of the sample mean is Normal regardless of sample size.

2. Symmetric, e.g., Uniform: the distribution of the sample mean is symmetric (for the Uniform, the tails may be truncated); even for "smallish" samples, the distribution is approximately Normal.

3. Asymmetric (continuous counterpart of the Binomial): like the Binomial,
   3.1 for large sample size, the distribution is approximately Normal;
   3.2 for small sample size, the approximation to the Normal is poor.
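The three cases above can be checked with a rough Python simulation (the slides use SAS). The sketch measures the skewness of simulated sample means, which is near 0 for a symmetric distribution. The Exponential distribution stands in for the "asymmetric continuous" case; that choice, the sample sizes 5 and 100, and the helper names are assumptions made for this illustration.

```python
import random

def skewness(values):
    """Moment-based sample skewness (0 for a perfectly symmetric sample)."""
    n = len(values)
    m = sum(values) / n
    s = (sum((v - m) ** 2 for v in values) / n) ** 0.5
    return sum(((v - m) / s) ** 3 for v in values) / n

def sample_mean_skewness(draw, nsam, nsim=2000, seed=1):
    """Skewness of nsim simulated sample means, each from nsam draws."""
    rng = random.Random(seed)
    xbars = [sum(draw(rng) for _ in range(nsam)) / nsam for _ in range(nsim)]
    return skewness(xbars)

# Parent distributions; the exponential is an assumed stand-in for case 3.
parents = {
    "normal": lambda rng: rng.gauss(0.0, 1.0),
    "uniform": lambda rng: rng.uniform(0.0, 1.0),
    "exponential": lambda rng: rng.expovariate(1.0),
}

for name, draw in parents.items():
    for nsam in (5, 100):
        print(name, nsam, round(sample_mean_skewness(draw, nsam), 2))
```

For the Normal and Uniform parents the skewness should stay near 0 at both sample sizes, while for the Exponential parent it should be clearly positive at nsam = 5 and much closer to 0 at nsam = 100.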


The Central Limit Theorem

• take a sample of size nsam

• for nsam large enough, the distribution of the sample mean will be approximately Normal


The Central Limit Theorem (statistically)

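• draw a sample of nsam observations from a distribution with mean µ and variance σ²

• for nsam large enough, approximately,

$$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n_{\text{sam}}}\right)$$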
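As a hedged example of using this result, the Python sketch below (the slides use SAS) approximates the probability that the mean Stress Scale score of 30 people exceeds 2.5, taking µ = 2.0 and σ² = 1.6 from the Binomial(10, 0.2) example. The cutoff 2.5 and the sample size 30 are arbitrary choices for illustration.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma2 = 2.0, 1.6   # mean and variance of one Binomial(10, 0.2) score
nsam = 30               # assumed sample size for this example

# Central Limit Theorem: X-bar is approximately N(mu, sigma2 / nsam).
xbar_dist = NormalDist(mu, sqrt(sigma2 / nsam))

# Approximate Pr(X-bar > 2.5).
upper_tail = 1 - xbar_dist.cdf(2.5)
print(round(upper_tail, 4))
```

Because the approximation improves with sample size, the same calculation with nsam = 10 should be treated with more caution than with nsam = 100.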
