Source: nursing.hku.hk/biostats/lab_ft/lecture4.pdf

Point and Interval Estimation

Daniel Y.T. Fong (Email: [email protected])

NURS4302 - STATISTICS

School of Nursing, The University of Hong Kong

Statistical Inference

[Diagram: Population → Sample by random sampling (easy!); Sample → Population by statistical inference]

■ Estimation
■ Hypothesis Testing

Learning Objectives

1. To estimate a population mean

2. To estimate a population proportion

… when we do not have data from the population

Estimation for Mean - Point estimation

… when we do not have data from the whole population

Population Sample

Can You Plan for This?

What is the blood pressure, on average, of an African-Chinese after taking calcium for 12 weeks?

What we want to do …

[Diagram: Sampling goes from the population (blood pressure of all African-Chinese at 12 weeks after calcium intake; mean unknown) to the sample (mean observed). Estimation goes back from the sample mean to the population mean.]

How Will You Start the Study?

1. Decide a sample size! Let's take it as 5!

2. Draw a random sample

What is the blood pressure, on average, of an African-Chinese after taking calcium for 12 weeks?

97, 121, 113, 98, 101 Average = 106

Point estimate of the population mean

Are we done?

Point Estimation

97, 121, 113, 98, 101 Average = 106

112, 108, 97, 101, 113 Average = 106.2

99, 109, 92, 98, 121 Average = 103.8

What is the blood pressure, on average, of an African-Chinese after taking calcium for 12 weeks?

Is it sufficient to just report the sample mean?
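A minimal sketch of the point above, using only the three samples listed on this slide: each random sample yields a different point estimate of the same unknown population mean. The variable names are illustrative and not part of the lecture.

```python
from statistics import mean

# The three random samples of size 5 shown above (blood pressure readings).
samples = [
    [97, 121, 113, 98, 101],
    [112, 108, 97, 101, 113],
    [99, 109, 92, 98, 121],
]

# Each sample gives its own point estimate of the same population mean.
point_estimates = [mean(s) for s in samples]

for s, m in zip(samples, point_estimates):
    print(s, "-> point estimate:", m)
```

The estimates differ from sample to sample, which is why the sample mean alone does not tell us how precise the estimate is.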

Knowing the Sample Mean

Sample mean is random!

Average of all possible sample means is the population mean!

Sample mean is an unbiased estimate of the population mean

Sample variance (using n-1) is also an unbiased estimate of the population variance

Sample means:
Sample 1: 106
Sample 2: 106.2
Sample 3: 103.8
Sample 4: 108
…
Average: 105 = population mean (the truth)

Example – An Illustration

You are interested in the heights of a family consisting of 4 members

You did not know that their heights are 1.7m, 1.5m, 0.9m and 0.8m

Taking random samples of size 2, with replacement

Mean = 1.23 m, Variance = 0.15 m²

Example – An Illustration

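All 4 × 4 = 16 possible samples of size 2, with each sample variance computed using divisor of 2-1 = 1:

Average of the sample means = 1.23

Average of the sample variances = 0.15

Population mean = 1.23
Population variance = 0.15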
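The illustration can be reproduced by listing every possible sample. The sketch below assumes ordered samples of size 2 drawn with replacement (16 in total) and uses Python's standard `statistics` module; the variable names are illustrative.

```python
from itertools import product
from statistics import mean, pvariance, variance

heights = [1.7, 1.5, 0.9, 0.8]  # the four family members (metres)

# All 4 x 4 = 16 ordered samples of size 2 drawn with replacement.
all_samples = list(product(heights, repeat=2))

sample_means = [mean(s) for s in all_samples]
sample_vars = [variance(s) for s in all_samples]  # divisor n - 1 = 1

print("population mean        :", round(mean(heights), 3))
print("average of sample means:", round(mean(sample_means), 3))
print("population variance    :", round(pvariance(heights), 3))
print("average of sample vars :", round(mean(sample_vars), 3))
```

The two averages match the population mean and variance, which is what "unbiased" means here.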

Example – An Illustration

Average of the sample means = 1.23

Their standard deviation (divisor = 16) is the standard error for the mean

Standard error for the mean refers to the variability of a sample mean

Standard deviation (σ) of the population refers to the variability of measurements in the population

Both of them are population parameters and are therefore unknown

Standard Error for the Mean

Standard error for the mean $= \dfrac{\sigma}{\sqrt{n}}$

where n is the sample size

Example – An Illustration

Average of the sample means = 1.23

Their standard deviation (divisor = 16) is 0.27

$\dfrac{\sigma}{\sqrt{n}} = \dfrac{0.38}{\sqrt{2}} = 0.27$
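As a check on the family-height illustration, the sketch below computes the standard error in two ways: as the standard deviation of the 16 possible sample means, and from the formula σ/√n. It assumes the same 16 ordered samples of size 2; the names `se_from_means` and `se_from_formula` are illustrative.

```python
from itertools import product
from math import sqrt
from statistics import mean, pstdev

heights = [1.7, 1.5, 0.9, 0.8]  # population of four heights (metres)
n = 2                           # sample size

# Means of all 16 possible samples of size 2 (with replacement).
sample_means = [mean(s) for s in product(heights, repeat=n)]

# pstdev divides by the number of values (16 here), as on the slide.
se_from_means = pstdev(sample_means)

# Population SD divided by the square root of the sample size.
se_from_formula = pstdev(heights) / sqrt(n)

print(round(se_from_means, 2), round(se_from_formula, 2))
```

Both routes give the same value, about 0.27.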

With only one random sample, how can the standard error for the mean be estimated ?

Sample Standard Error for the Mean (SEM)

SEM = sample standard error for the mean
SD = sample standard deviation
n = sample size

$SEM = \dfrac{SD}{\sqrt{n}}$

SEM vs SD

[Figure: SEM plotted against sample size (n, from 0 to 20) for SD = 10, SD = 20 and SD = 30, where $SEM = SD/\sqrt{n}$]

The larger the variability in the sample/population, the larger is the variability of the sample mean

An increase in sample size will reduce the SEM
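The two statements above can be seen by tabulating the SEM for the three SD values in the figure. This is a small sketch; the sample sizes 5, 10 and 20 are chosen only for illustration.

```python
from math import sqrt

def sem(sd, n):
    """Sample standard error for the mean: SD divided by sqrt(n)."""
    return sd / sqrt(n)

for sd in (10, 20, 30):
    row = [round(sem(sd, n), 2) for n in (5, 10, 20)]
    print(f"SD = {sd}: SEM at n = 5, 10, 20 ->", row)
```

Reading across a row shows the SEM shrinking as n grows; reading down a column shows it growing with the SD.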

Is SEM < SD always true?

Revisiting Example

What is the blood pressure, on average, of an African-Chinese after taking calcium for 12 weeks?

97, 121, 113, 98, 101
Average = 106, SD = 10.54, SEM = $10.54/\sqrt{5}$ = 4.71

112, 108, 97, 101, 113
Average = 106.2, SD = 6.98, SEM = $6.98/\sqrt{5}$ = 3.12

■ We need to report the sample mean in order to estimate the population mean

■ We need to report the SEM in order to know the precision of using the sample mean for estimation
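The reported figures can be reproduced from the raw data. The sketch below uses `statistics.stdev`, which applies the n-1 divisor; the helper name `summarize` is an illustrative choice.

```python
from math import sqrt
from statistics import mean, stdev

def summarize(sample):
    """Return (sample mean, sample SD with divisor n - 1, SEM)."""
    n = len(sample)
    sd = stdev(sample)
    return mean(sample), sd, sd / sqrt(n)

for data in ([97, 121, 113, 98, 101], [112, 108, 97, 101, 113]):
    m, sd, sem = summarize(data)
    print(f"mean = {m}, SD = {sd:.2f}, SEM = {sem:.2f}")
```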

Example from the Literature

A survey was conducted to examine the perception of Hong Kong high school students in choosing nursing as their career

604 male and 639 female students responded

For each student, a PNC score was obtained to assess the perception of nursing as a career
◦ The highest possible PNC score is 44, indicating the most positive attitude towards nursing as a career.
◦ The lowest score is 11, signifying a strong negative attitude.

Suppose you are particularly interested in the mean PNC score of all male high school students in Hong Kong.

~ Law & Arthur (2003)

Estimating the mean PNC score of all male high school students in Hong Kong

1. In the sample of 604 male students, the sample mean and standard deviation are 27.8 and 3.42.

2. A point estimate is 27.8

3. The SEM is $3.42/\sqrt{604}$ = 0.14

It can generally be considered as relatively small. Therefore, the point estimate can be considered as precise.

Estimation for Mean - Interval estimation

… when we do not have data from the whole population

Population Sample

An Alternative to Point Estimate

The Idea

Obtain an interval that we are highly certain includes the population mean (the truth)

Truth (unknown)

So, how?

Sampling Distribution for the Mean

[Diagram: repeated samples (Sample 1, Sample 2, Sample 3, …) drawn from the population give sample means mean1, mean2, mean3, …; together these form the sampling distribution for the mean]

Sampling Distribution for the Mean

What is given …
A sample whose corresponding population has mean = µ and variance = σ²

What we can say …
The distribution of the sample mean is $N(\mu, \sigma^2/n)$, provided the sample size is large enough (Central limit theorem)
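A small simulation can illustrate the central limit theorem statement. The sketch below assumes a hypothetical skewed population (exponential with mean 1 and variance 1), a sample size of 30 and 5000 repeated samples; none of these choices come from the lecture.

```python
import random
from statistics import mean, pvariance

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical skewed population: exponential with mean 1 and variance 1.
mu, sigma2 = 1.0, 1.0
n = 30            # size of each sample (assumed "large enough")
n_samples = 5000  # number of repeated samples

# Draw repeated samples and record each sample mean.
sample_means = [
    sum(random.expovariate(1.0) for _ in range(n)) / n
    for _ in range(n_samples)
]

print("mean of sample means    :", round(mean(sample_means), 3), "vs mu =", mu)
print("variance of sample means:", round(pvariance(sample_means), 4),
      "vs sigma^2/n =", round(sigma2 / n, 4))
```

The sample means centre on µ and their variance is close to σ²/n, even though the population itself is far from Normal.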

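Getting an interval with high certainty to include the true value (µ)?

Getting an interval with 95% chance to include the true value (µ)?

Suppose the interval is $(l_1, l_2)$.

$$P(l_1 < \mu < l_2) = 0.95$$

$$P(-l_2 < -\mu < -l_1) = 0.95$$

$$P(\bar{X} - l_2 < \bar{X} - \mu < \bar{X} - l_1) = 0.95$$

$$P\left(\frac{\bar{X} - l_2}{\sqrt{\sigma^2/n}} < \frac{\bar{X} - \mu}{\sqrt{\sigma^2/n}} < \frac{\bar{X} - l_1}{\sqrt{\sigma^2/n}}\right) = 0.95$$

$$P\left(\frac{\bar{X} - l_2}{\sqrt{\sigma^2/n}} < Z < \frac{\bar{X} - l_1}{\sqrt{\sigma^2/n}}\right) = 0.95, \quad \text{where } Z = \frac{\bar{X} - \mu}{\sqrt{\sigma^2/n}} \sim N(0,1)$$

[Figure: standard Normal curve centred at 0, with area 0.95 between -1.96 and 1.96]

This is what we required!

$$\frac{\bar{X} - l_1}{\sqrt{\sigma^2/n}} = 1.96 \quad\Rightarrow\quad l_1 = \bar{X} - 1.96\sqrt{\frac{\sigma^2}{n}}$$

Similarly,

$$l_2 = \bar{X} + 1.96\sqrt{\frac{\sigma^2}{n}}$$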
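The value 1.96 used for the 95% interval can be checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the variable names are illustrative.

```python
from statistics import NormalDist

z = NormalDist()  # standard Normal distribution, N(0, 1)

# Probability that Z falls between -1.96 and 1.96.
coverage = z.cdf(1.96) - z.cdf(-1.96)

# Going the other way: the value that leaves 2.5% in the upper tail.
critical = z.inv_cdf(0.975)

print(round(coverage, 4), round(critical, 3))
```

The middle 95% of the standard Normal distribution lies within about 1.96 of zero, which is where the multiplier in the limits comes from.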

σ² is unknown!!

Interval Estimation

(1-α)100% confidence interval for the mean (population):

$$(\bar{X} - t(\mathrm{df}, \alpha/2) \times \mathrm{SEM},\ \bar{X} + t(\mathrm{df}, \alpha/2) \times \mathrm{SEM})$$

$\bar{X}$ = sample mean
SEM = estimated standard error for the mean
t(df, α/2) = t value that depends on two values: df and α
df = degrees of freedom = n-1
1-α = level of certainty/confidence, 0 < α < 1

e.g. An α = 0.05 specifies a 95% confidence interval

Obtaining t(df, α/2)

[Figure: t-distribution with degrees of freedom = df; the tails beyond -t(df, α/2) and t(df, α/2) each have area α/2]

1. Determine df and α

2. Check out the critical values from the t-Table (using two-tailed)

Obtaining t(df, α/2)

[t-table, row df = 4: t(4, 0.1/2) = 2.132, t(4, 0.05/2) = 2.776, t(4, 0.01/2) = 4.604]

Revisiting Example

97, 121, 113, 98, 101
Average = 106, SD = 10.54, SEM = $10.54/\sqrt{5}$ = 4.71

What is the blood pressure, on average, of an African-Chinese after taking calcium for 12 weeks?

1. Determine df and α

df = 4 (= 5-1); α = 0.05 (specified)

2. Check out the critical values from the t-table

t(4, 0.05/2) = 2.776

(1-α)100% confidence interval: $(\bar{X} - t(\mathrm{df}, \alpha/2) \times \mathrm{SEM},\ \bar{X} + t(\mathrm{df}, \alpha/2) \times \mathrm{SEM})$

A 95% confidence interval for the mean is (106 - 2.776 × 4.71, 106 + 2.776 × 4.71) = (92.9, 119.1)

So, we are 95% certain that the interval (92.9, 119.1) includes the true population mean.
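The calculation can be sketched in code. Python's standard library has no t-distribution, so the sketch below hard-codes the df = 4 critical values quoted in these slides; the lookup table and the function name `t_interval_df4` are illustrative assumptions and work only for samples of size 5.

```python
from math import sqrt
from statistics import mean, stdev

# Two-tailed critical values for df = 4, as quoted from the t-table.
T_TABLE_DF4 = {0.10: 2.132, 0.05: 2.776, 0.01: 4.604}

def t_interval_df4(sample, alpha=0.05):
    """(1 - alpha)100% confidence interval for the mean; n must be 5."""
    if len(sample) != 5:
        raise ValueError("this lookup table only covers df = 4 (n = 5)")
    x_bar = mean(sample)
    sem = stdev(sample) / sqrt(len(sample))  # SD uses the n - 1 divisor
    t = T_TABLE_DF4[alpha]
    return x_bar - t * sem, x_bar + t * sem

low, high = t_interval_df4([97, 121, 113, 98, 101])
print(round(low, 1), round(high, 1))
```

For other sample sizes the critical value would have to come from a t-table or a statistics package.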

Confidence Intervals

Drew 100 random samples. Hence, 100 CIs.

[Figure: the 100 CIs at the 95% level and at the 99% level, marking which intervals include the true value and which do not]
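The idea of drawing 100 samples and checking each interval can be imitated by simulation. The sketch below assumes a Normal population with mean 105 and SD 10 (illustrative values, not data from the lecture) and samples of size 5, so the critical value is t(4, 0.05/2) = 2.776.

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(1)  # fixed seed so the run is repeatable

TRUE_MEAN, TRUE_SD = 105, 10  # assumed population, for illustration only
T_95_DF4 = 2.776              # t(4, 0.05/2) from the t-table

covered = 0
for _ in range(100):
    sample = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(5)]
    half_width = T_95_DF4 * stdev(sample) / sqrt(5)
    # The interval includes the truth when the sample mean lies
    # within half_width of the true mean.
    if abs(mean(sample) - TRUE_MEAN) <= half_width:
        covered += 1

print(covered, "of 100 intervals include the true mean")
```

In the long run about 95 of every 100 such intervals should include the true mean; any single run of 100 will vary around that figure.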

Revisiting Example

What is the blood pressure, on average, of an African-Chinese after taking calcium for 12 weeks?

112, 108, 97, 101, 113
Average = 106.2, SD = 6.98, SEM = $6.98/\sqrt{5}$ = 3.12

A 90% confidence interval for the mean is (106.2 - 2.132 × 3.12, 106.2 + 2.132 × 3.12) = (99.5, 112.9)

A 95% confidence interval for the mean is (106.2 - 2.776 × 3.12, 106.2 + 2.776 × 3.12) = (97.5, 114.9)

A 99% confidence interval for the mean is (106.2 - 4.604 × 3.12, 106.2 + 4.604 × 3.12) = (91.8, 120.6)

Width of Confidence Interval

The width reflects the precision

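90% CI: (99.5, 112.9), "nine sure catches out of ten" (十拿九穩)

95% CI: (97.5, 114.9), "the supremacy of nine and five", the emperor's throne (九五之尊)

99% CI: (91.8, 120.6), "one miss in a hundred shots" (百發失一)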
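The link between confidence level and width can be made explicit with the rounded summary values of the second sample (mean 106.2, SEM 3.12) and the df = 4 critical values quoted in the slides. This is a brief sketch with illustrative variable names.

```python
x_bar, sem = 106.2, 3.12                     # rounded values from the slides
t_values = {90: 2.132, 95: 2.776, 99: 4.604}  # t(4, alpha/2) per level

widths = {}
for level, t in t_values.items():
    low, high = x_bar - t * sem, x_bar + t * sem
    widths[level] = high - low  # width = 2 x t x SEM
    print(f"{level}% CI: ({low:.1f}, {high:.1f}), width = {widths[level]:.1f}")
```

A higher confidence level needs a larger critical value, so the interval is wider and the estimate less precise.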

Example from the Literature - Revisit

1. In the sample of 604 male students, the sample mean and standard deviation are 27.8 and 3.42.

2. A point estimate is 27.8 with SEM = 0.14

3. df = 603, no such row on the t-table!

(1-α)100% confidence interval: $(\bar{X} - t(\mathrm{df}, \alpha/2) \times \mathrm{SEM},\ \bar{X} + t(\mathrm{df}, \alpha/2) \times \mathrm{SEM})$

Normal Approximation

A t-distribution with df = ∞ is the standard Normal distribution

For a 90% confidence interval, approximate t(603, 0.1/2) by 1.645: (27.57, 28.03)

For a 95% confidence interval, approximate t(603, 0.05/2) by 1.96: (27.53, 28.07)

For a 99% confidence interval, approximate t(603, 0.01/2) by 2.576: (27.44, 28.16)
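With the Normal approximation, the critical values no longer need a table. The sketch below obtains them from `statistics.NormalDist` and applies them to the rounded summary values 27.8 and 0.14; the function name `z_interval` is an illustrative choice.

```python
from statistics import NormalDist

x_bar, sem = 27.8, 0.14  # rounded values from the slides

def z_interval(alpha):
    """(1 - alpha)100% CI using the standard Normal critical value."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 when alpha = 0.05
    return x_bar - z * sem, x_bar + z * sem

for alpha in (0.10, 0.05, 0.01):
    low, high = z_interval(alpha)
    print(f"{1 - alpha:.0%} CI: ({low:.2f}, {high:.2f})")
```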

Q & A

1. On the sample standard error for the mean (SEM) of a sample

True or False?

■ It measures the variability of the observations
■ It is a measure of how far the sample mean is likely to be from the population mean
■ It is greater than the SD of the sample
■ It is proportional to the number of observations

Estimation for Proportion

… when we do not have data from the whole population

Population Sample

Estimation of Proportion

What is the percentage of persons who experienced lethal shock after receiving the current vaccine?

What we want to do …

[Diagram: Sampling goes from the population (whether or not all vaccinated persons experienced lethal shock; proportion unknown) to the sample (proportion observed). Estimation goes back from the sample proportion to the population proportion.]

How Will You Start the Study?

What is the percentage of persons who experienced lethal shock after receiving the current vaccine?

■ Decide a sample size!
■ Sample size = 136
■ Number of persons experienced lethal shock = 9

Point estimate of the population proportion

Sample proportion (p) = 9/136 = 6.6%

Confidence Interval for Proportion

Sample standard error for proportion (SEP)

$SEP = \sqrt{\dfrac{p(1-p)}{n}}$

(1-α)100% confidence interval for the proportion, when the sample size is sufficiently large:

$(p - z(\alpha/2) \times SEP,\ p + z(\alpha/2) \times SEP)$

SEP = 0.021

A 95% CI = (2.4%, 10.8%)
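The vaccine example can be worked through in a few lines. The sketch below implements the large-sample interval with z = 1.96 as the default; the function name `proportion_interval` is an illustrative assumption.

```python
from math import sqrt

def proportion_interval(successes, n, z=1.96):
    """Return (p, SEP, CI) using the large-sample Normal interval."""
    p = successes / n
    sep = sqrt(p * (1 - p) / n)  # sample standard error for proportion
    return p, sep, (p - z * sep, p + z * sep)

p, sep, (low, high) = proportion_interval(9, 136)
print(f"p = {p:.1%}, SEP = {sep:.3f}, 95% CI = ({low:.1%}, {high:.1%})")
```

The interval is computed from the unrounded p and SEP; rounding them first can shift the limits slightly in the last digit.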

Critical Value from the Standard Normal Distribution

z(0.05/2) = 1.96

Example from the Literature - Revisit

In the same survey, students were also asked if they would consider nursing as a career possibility

A total of 348 of the 1243 students responded that they would

Estimate the proportion of students who would consider nursing as a career possibility

A point estimate for the proportion is 348/1243 = 28%

The SEP = $\sqrt{p(1-p)/n} = \sqrt{0.28(1-0.28)/1243}$ = 0.013

A 95% confidence interval for the proportion of school students who would consider nursing as a career possibility is

$(0.28 - 1.96(0.013),\ 0.28 + 1.96(0.013))$ = (0.255, 0.305)

That is, we are 95% confident that the proportion of school students who would consider nursing as a career possibility is between 25.5% and 30.5%.

Q & A

2. Decide True or False in the following questions.

■ The SEP becomes smaller when the sample size becomes smaller.
■ The SEP when p=0.1 is larger than that when p=0.5.
■ A 95% confidence interval for a population proportion bears 95% chance to include the sample proportion.

True or False?