5. Estimation
2014/4/17

Source: hosting03.snu.ac.kr/~hokim/int/2014/chap5.pdf


5.1 Introduction

• Statistical inference

– The process of drawing conclusions about a population based on the results obtained from a sample taken from that population

1) Estimation

2) Hypothesis testing

• Estimate

1) Point estimate

2) Interval estimate

• Estimator: e.g. $\bar{x} = \sum x_i / n$ is an estimator of $\mu$

• Unbiasedness

$\hat{\theta}$ (a random variable based on the data) is an unbiased estimator of the parameter $\theta$ if $E(\hat{\theta}) = \theta$.

ex) $E(\bar{X}) = \mu$, so the sample mean is an unbiased estimator of the population mean if the samples are randomly selected from $N(\mu, \sigma^2)$.

• Target population vs. sampled population

ex) The sample variance $s^2 = \frac{1}{n-1}\sum (y_i - \bar{y})^2$ is an unbiased estimator of $\sigma^2$: $E(s^2) = \sigma^2$

So $\frac{1}{n}\sum (y_i - \bar{y})^2$ is not an unbiased estimator: $E\left[\frac{1}{n}\sum (y_i - \bar{y})^2\right] = \frac{n-1}{n}\sigma^2 \neq \sigma^2$

• Bias = $E(\hat{\theta}) - \theta$

• Bias of an unbiased estimator is zero

• Probability sampling and non-probability sampling

• Randomization

• Blinding
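
The unbiasedness claim can be checked numerically. The following is a minimal simulation sketch, not part of the original slides; the seed, sample size, true variance, and number of replicates are all illustrative assumptions.

```r
# Simulation sketch: compare the (n - 1) and n divisors as estimators of sigma^2.
# All settings below (seed, n, sigma2, reps) are illustrative assumptions.
set.seed(1)
n      <- 5       # sample size
sigma2 <- 4       # true population variance
reps   <- 20000   # number of simulated samples

# Each row is one random sample of size n from N(0, sigma2)
samples <- matrix(rnorm(n * reps, mean = 0, sd = sqrt(sigma2)), nrow = reps)

s2_unbiased <- apply(samples, 1, var)      # var() divides by n - 1
s2_biased   <- s2_unbiased * (n - 1) / n   # same sum of squares divided by n

mean_unbiased <- mean(s2_unbiased)  # should be close to sigma2
mean_biased   <- mean(s2_biased)    # should be close to (n - 1) / n * sigma2
print(c(mean_unbiased, mean_biased))
```

Averaged over many samples, the n − 1 version should settle near 4 and the n version near 3.2, which is the bias $E(\hat{\theta}) - \theta = -\sigma^2/n$ in action.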

5.2 Confidence interval of the population mean

• If we select samples repeatedly from a normal population, the interval $\bar{x} \pm z_{(1-\alpha/2)}\,\sigma_{\bar{x}}$ will include $\mu$ with probability $1-\alpha$

• $1-\alpha$: confidence level (ex. .95)

  $\alpha$: significance level (ex. .05)

• $100(1-\alpha)\%$ conf. int. = estimated value ± (reliability coef.) × (SE) = $\bar{x} \pm z_{(1-\alpha/2)}\,\sigma_{\bar{x}}$, where $\sigma_{\bar{x}} = \sigma/\sqrt{n}$

<Example 5.2.1>

A researcher measures the amount of a certain enzyme. n = 10, sample mean = 22. We can assume normality with population variance = 45. What is the 95% C.I. of $\mu$?

$\sigma_{\bar{x}} = \sqrt{45/10} = 2.12$

$\bar{x} \pm 2\sigma_{\bar{x}} = 22 \pm 2(2.12) = (17.76, 26.24)$

(Here the reliability coefficient 1.96 is rounded to 2; with z = 1.96 the interval is (17.84, 26.16).)

<Example 5.2.2>

Measuring the maximum strength of a certain muscle. We want the 99% CI of the population mean. We assume normality with population variance = 144. n = 15, sample mean = 84.3.

z = 2.58 for a 0.99 confidence level

SE: $\sigma_{\bar{x}} = 12/\sqrt{15} = 3.10$

What is the 99% C.I. of $\mu$?

$84.3 \pm 2.58(3.10) = 84.3 \pm 8.0 = (76.3, 92.3)$

• Sample from a non-normal population -> central limit theorem

<Example 5.2.3>

Delay time caused by patients arriving late at a clinic. n = 35, sample mean = 17.2 min, sd from a previous study (assumed to be known) = 8 min. The population is not normally distributed.

What is the 90% CI of $\mu$?

$\sigma_{\bar{x}} = 8/\sqrt{35} = 1.35$

$17.2 \pm 1.645(1.35) = 17.2 \pm 2.2 = (15.0, 19.4)$

<Example 5.2.4>

Measure the activity of a certain enzyme in 35 patients. Sample mean = 0.7164, population variance = 0.36. 95% CI?

-> apply the CLT

$0.7164 \pm 1.96(0.6/\sqrt{35}) = (0.5176, 0.9152)$

CI calculation using R

> m <- 0.7164          # sample mean
> s <- sqrt(0.36)      # known population sd
> n <- 35              # sample size
> alpha <- 0.05        # significance level
> error <- qnorm(1-alpha/2)*s/sqrt(n)   # margin of error
> left <- m-error
> right <- m+error
> left
[1] 0.5176234
> right
[1] 0.9151766

# The same calculation wrapped in a function.
# Note: this name masks stats::confint for the rest of the session.
confint <- function(m,s,n,alpha=0.05){
  error <- qnorm(1-alpha/2)*s/sqrt(n)
  left <- m-error
  right <- m+error
  print(c(left,right))
}

confint(0.7164,sqrt(0.36),35)
[1] 0.5176234 0.9151766

5.3 t-distribution

• Population variance is known and n is large: $z = \dfrac{\bar{x}-\mu}{\sigma/\sqrt{n}}$

• Population variance is not known and n is large: replace $\sigma$ with $s = \sqrt{\dfrac{\sum (x_i-\bar{x})^2}{n-1}}$

• Small sample size (n < 30): $t = \dfrac{\bar{x}-\mu}{s/\sqrt{n}} \sim t_{n-1}$

derived by Gosset: "Student's t-distribution" (Table E)

x <- seq(-4, 4, length=100)
hx <- dnorm(x)                       # standard normal density
degf <- c(1, 3, 8, 30)               # degrees of freedom to compare
colors <- c("red", "blue", "darkgreen", "gold", "black")
labels <- c("df=1", "df=3", "df=8", "df=30", "normal")

# Normal curve as a dashed line
plot(x, hx, type="l", lty=2, xlab="x value", ylab="Density",
     main="Comparison of t Distributions")

# Overlay one t density per degrees-of-freedom value
for (i in 1:4){
  lines(x, dt(x,degf[i]), lwd=2, col=colors[i])
}

legend("topright", inset=.05, title="Distributions", labels,
       lwd=2, lty=c(1, 1, 1, 1, 2), col=colors)

• Some properties of the t-distribution

1) Mean = 0

2) Symmetric about the mean

3) Variance > 1, and -> 1 as n -> ∞

4) $-\infty < t < \infty$

5) The shape depends on the degrees of freedom (df) = n − 1

6) Flatter at the center and heavier in the tails than the normal distribution

7) t-distribution -> normal distribution as n − 1 -> ∞

• CI: $\bar{x} \pm t_{(1-\alpha/2)}\,\dfrac{s}{\sqrt{n}}$

<Example 5.3.1>

n = 15, measuring amylase: sample mean = 96 units/100 ml, sd = 35, population variance is not known. 95% CI of the population mean?

SE($\bar{x}$) = $s/\sqrt{n} = 35/\sqrt{15} = 9.04$

df = n − 1 = 14, $t_{0.975} = 2.1448$

$96 \pm 2.1448(9.04) = 96 \pm 19 = (77, 115)$
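
The t-based interval of Example 5.3.1 can be reproduced in R in the same style as the earlier z-based calculation. This is a sketch; the helper name `t_confint` is a hypothetical name chosen here, not something from the slides.

```r
# Hypothetical helper: t-based CI for a mean from summary statistics
# (sample mean m, sample sd s, sample size n).
t_confint <- function(m, s, n, alpha = 0.05) {
  se    <- s / sqrt(n)                     # estimated standard error
  tcrit <- qt(1 - alpha / 2, df = n - 1)   # reliability coefficient from t(n - 1)
  c(lower = m - tcrit * se, upper = m + tcrit * se)
}

# Example 5.3.1: amylase, n = 15, mean = 96, sd = 35
ci <- t_confint(m = 96, s = 35, n = 15)
print(round(ci, 1))
```

Without rounding the margin to 19, the interval comes out at roughly (76.6, 115.4), in line with the (77, 115) on the slide.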

• Choice of z and t

[Flowchart: Is the population normal? -> Is n large enough? -> Is σ² known? The branches lead to z or t; for a non-normal population, apply the CLT when n is large enough and use non-parametric methods otherwise.]

5.4 CI of the difference of two means

• Samples from normal populations

$(\bar{x}_1 - \bar{x}_2) \pm z_{(1-\alpha/2)}\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$

<Example 5.4.1>

Serum uric acid is measured in 12 patients ($\bar{x}_1 = 4.5$ mg/100 ml) and in 15 normal controls ($\bar{x}_2 = 3.4$). The variances are known to be 1 for each group. 95% CI for $\mu_1 - \mu_2$?

$1.1 \pm 1.96\sqrt{\dfrac{1}{12} + \dfrac{1}{15}} = 1.1 \pm 1.96(.39) = (.3, 1.9)$

The CI does not include 0.

• Samples from non-normal populations -> central limit theorem

<Example 5.4.2>

To compare the socio-economic status (SES) of patients from two hospitals: 75 patients from hospital A, $\bar{x}_1 = 6{,}800$; 80 patients from hospital B, $\bar{x}_2 = 4{,}450$. The population standard deviations are $\sigma_1 = 600$, $\sigma_2 = 500$.

99% CI of $\mu_1 - \mu_2$?

$(6800 - 4450) \pm 2.58\sqrt{\dfrac{(600)^2}{75} + \dfrac{(500)^2}{80}} = (2120, 2580)$
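
Both known-variance examples use the same formula, so a single function covers them. The sketch below assumes the function name `z_diff_confint` and its argument names, which are illustrative and not from the slides.

```r
# Hypothetical helper: z-based CI for mu1 - mu2 when both population
# variances (var1, var2) are known.
z_diff_confint <- function(m1, m2, var1, var2, n1, n2, alpha = 0.05) {
  se <- sqrt(var1 / n1 + var2 / n2)   # SE of the difference of the means
  z  <- qnorm(1 - alpha / 2)          # reliability coefficient
  d  <- m1 - m2
  c(lower = d - z * se, upper = d + z * se)
}

# Example 5.4.1: serum uric acid, variances known to be 1
ci_uric <- z_diff_confint(4.5, 3.4, 1, 1, 12, 15)

# Example 5.4.2: SES, sd's 600 and 500, 99% confidence
ci_ses <- z_diff_confint(6800, 4450, 600^2, 500^2, 75, 80, alpha = 0.01)

print(round(ci_uric, 2))
print(round(ci_ses))
```

Note that the function takes variances, so the standard deviations 600 and 500 are squared at the call site.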

• t-distribution and the difference of the means

In practice, the population variances are typically unknown. Two approaches:

1) Equal variances, $\sigma_1^2 = \sigma_2^2$

2) Different variances, $\sigma_1^2 \neq \sigma_2^2$

1) When the variances are equal:

we compute the pooled estimate as a weighted average of the two sample variances

$s_p^2 = \dfrac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}$

$100(1-\alpha)\%$ CI of $\mu_1 - \mu_2$:

$(\bar{x}_1 - \bar{x}_2) \pm t_{(1-\alpha/2)}\sqrt{\dfrac{s_p^2}{n_1} + \dfrac{s_p^2}{n_2}}$, df = $n_1 + n_2 - 2$

<Example 5.4.3>

Measuring amylase: 15 normal controls (group 2), sample mean = 96, sd = 35; 22 patients (group 1), sample mean = 120, sd = 40. The populations are normal. The variances are unknown but equal. 95% CI of $\mu_1 - \mu_2$?

$s_p^2 = \dfrac{14(35)^2 + 21(40)^2}{15+22-2} = 1450$

$(120 - 96) \pm 2.0301\sqrt{\dfrac{1450}{15} + \dfrac{1450}{22}} = 24 \pm 26 = (-2, 50)$
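
The pooled-variance interval can be sketched in R from the summary statistics alone. The function name `pooled_t_confint` and the shape of its return value are assumptions made for this illustration.

```r
# Hypothetical helper: pooled-variance t interval for mu1 - mu2,
# assuming normal populations with equal (unknown) variances.
pooled_t_confint <- function(m1, m2, s1, s2, n1, n2, alpha = 0.05) {
  df    <- n1 + n2 - 2
  sp2   <- ((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / df   # pooled variance
  se    <- sqrt(sp2 / n1 + sp2 / n2)
  tcrit <- qt(1 - alpha / 2, df)
  d     <- m1 - m2
  list(sp2 = sp2, df = df,
       ci = c(lower = d - tcrit * se, upper = d + tcrit * se))
}

# Example 5.4.3: patients (group 1) vs. normal controls (group 2)
pooled <- pooled_t_confint(m1 = 120, m2 = 96, s1 = 40, s2 = 35, n1 = 22, n2 = 15)
print(pooled$sp2)
print(round(pooled$ci, 1))
```

With raw data rather than summary statistics, `t.test(x, y, var.equal = TRUE)` should give the same kind of interval.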

2) When the variances are different

$t' = \dfrac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$ does not follow the t-distribution!

Approximate reliability coefficient:

$t'_{(1-\alpha/2)} = \dfrac{w_1 t_1 + w_2 t_2}{w_1 + w_2}$

* $w_1 = s_1^2/n_1$, $w_2 = s_2^2/n_2$; $t_1 = t_{(1-\alpha/2)}$ with df = $n_1 - 1$, $t_2 = t_{(1-\alpha/2)}$ with df = $n_2 - 1$

$100(1-\alpha)\%$ CI of $\mu_1 - \mu_2$:

$(\bar{x}_1 - \bar{x}_2) \pm t'_{(1-\alpha/2)}\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}$

<Example 5.4.4>

Measuring a certain biomarker. We can assume normal distributions, but the variances are not equal.

Patients: n = 10, sample mean = 62.6, sd = 33.8, $t_9 = 2.2622$

Normal: n = 20, sample mean = 47.2, sd = 10.1, $t_{19} = 2.0930$

95% CI of $\mu_1 - \mu_2$?

$t' = \dfrac{114.244(2.2622) + 5.1005(2.0930)}{114.244 + 5.1005} = 2.255$

$(62.6 - 47.2) \pm 2.255\sqrt{\dfrac{33.8^2}{10} + \dfrac{10.1^2}{20}} = (-9.2, 40.0)$
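
The weighted reliability coefficient t' is easy to compute directly. The following sketch assumes the helper name `weighted_t_confint`; it implements the weighted-t formula shown on the slide, which is not what R's built-in `t.test` does for unequal variances (that function uses the Welch-Satterthwaite approximation to the degrees of freedom, so its interval will differ slightly).

```r
# Hypothetical helper: CI for mu1 - mu2 with unequal variances, using the
# weighted reliability coefficient t' = (w1*t1 + w2*t2) / (w1 + w2).
weighted_t_confint <- function(m1, m2, s1, s2, n1, n2, alpha = 0.05) {
  w1 <- s1^2 / n1
  w2 <- s2^2 / n2
  t1 <- qt(1 - alpha / 2, df = n1 - 1)
  t2 <- qt(1 - alpha / 2, df = n2 - 1)
  t_prime <- (w1 * t1 + w2 * t2) / (w1 + w2)
  se <- sqrt(w1 + w2)
  d  <- m1 - m2
  list(t_prime = t_prime,
       ci = c(lower = d - t_prime * se, upper = d + t_prime * se))
}

# Biomarker example: patients (n = 10) vs. normal (n = 20)
unequal <- weighted_t_confint(62.6, 47.2, 33.8, 10.1, 10, 20)
print(round(unequal$t_prime, 3))
print(round(unequal$ci, 1))
```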

[Flowchart: Is the population normal? -> Is σ² known? For a non-normal population, apply the CLT or use non-parametric methods.]

Homework

• 5.2.4 5.2.5

• 5.3.4 5.3.5

• 5.4.3 5.4.10

5.5 CI of a population proportion

$100(1-\alpha)\%$ CI of p: $\hat{p} \pm z_{(1-\alpha/2)}\sqrt{\hat{p}(1-\hat{p})/n}$

<Example 5.5.1>

Oral hygiene behavior: 123 out of 300 people take two oral examinations per year. 95% CI of p?

$\hat{p} = 123/300 = 0.41$

$0.41 \pm 1.96\sqrt{0.41(0.59)/300} = 0.41 \pm 0.056 = (.35, .47)$

5.6 CI of the difference of two population proportions

$100(1-\alpha)\%$ CI of $p_1 - p_2$:

$(\hat{p}_1 - \hat{p}_2) \pm z_{(1-\alpha/2)}\sqrt{\dfrac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$

<Example 5.6.1>

Recovery from a disease under two treatments. 200 patients are randomly assigned to two treatment groups of 100 each. Treatment A: 78 patients recovered within 3 days; treatment B: 90 patients. 95% CI of $p_1 - p_2$?

$(.78 - .90) \pm 1.96\sqrt{\dfrac{(.78)(.22)}{100} + \dfrac{(.90)(.10)}{100}} = (-.22, -.02)$
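
The normal-approximation intervals of sections 5.5 and 5.6 can be sketched as two small R functions. The names `prop_confint` and `prop_diff_confint` are hypothetical. R's built-in `prop.test` uses a different (score-based) interval, so its output will not match these exactly.

```r
# Hypothetical helper: normal-approximation CI for one proportion.
prop_confint <- function(x, n, alpha = 0.05) {
  p  <- x / n
  se <- sqrt(p * (1 - p) / n)
  z  <- qnorm(1 - alpha / 2)
  c(lower = p - z * se, upper = p + z * se)
}

# Hypothetical helper: normal-approximation CI for p1 - p2.
prop_diff_confint <- function(x1, n1, x2, n2, alpha = 0.05) {
  p1 <- x1 / n1
  p2 <- x2 / n2
  se <- sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
  z  <- qnorm(1 - alpha / 2)
  d  <- p1 - p2
  c(lower = d - z * se, upper = d + z * se)
}

ci_p    <- prop_confint(123, 300)               # Example 5.5.1
ci_diff <- prop_diff_confint(78, 100, 90, 100)  # Example 5.6.1
print(round(ci_p, 3))
print(round(ci_diff, 3))
```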

5.7 Sample size calculation: inference of the mean

d = (reliability coef.) × (SE) = (width of the CI)/2

• Sampling from an infinite population

$d = z\dfrac{\sigma}{\sqrt{n}} \;\Rightarrow\; n = \dfrac{z^2\sigma^2}{d^2}$

• Sampling without replacement from a small population

$d = z\dfrac{\sigma}{\sqrt{n}}\sqrt{\dfrac{N-n}{N-1}} \;\Rightarrow\; n = \dfrac{N z^2\sigma^2}{d^2(N-1) + z^2\sigma^2}$

<Example 5.7.1>

Measure the daily protein intake of teenage girls. Width of the CI = 10 (±5), confidence level = 0.95, population sd = 20. The population is very large, so we can ignore the finite population correction factor.

$z = 1.96$, $\sigma = 20$, $d = 5$

$n = \dfrac{z^2\sigma^2}{d^2} = \dfrac{(1.96)^2(20)^2}{(5)^2} = 61.47 \approx 62$ girls

5.8 Sample size calculation: inference of the proportion

$n = \dfrac{z^2 pq}{d^2}$ (when $n/N \le 0.05$), where $q = 1 - p$

$n = \dfrac{N z^2 pq}{d^2(N-1) + z^2 pq}$

<Example 5.8.1>

Surveying the proportion of households with medical service. We know p < 0.35. d of the 95% CI = 0.05, n = ?

$n = \dfrac{(1.96)^2(.35)(.65)}{(.05)^2} = 349.6 \approx 350$ households
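
The sample size formulas of sections 5.7 and 5.8 can be sketched as follows. The function names and the convention of passing `N = Inf` for an effectively infinite population are assumptions made for this illustration.

```r
# Hypothetical helper: sample size for estimating a mean to within +/- d.
# N = Inf means the finite population correction is ignored.
n_for_mean <- function(sigma, d, alpha = 0.05, N = Inf) {
  z <- qnorm(1 - alpha / 2)
  if (is.infinite(N)) {
    n <- z^2 * sigma^2 / d^2
  } else {
    n <- N * z^2 * sigma^2 / (d^2 * (N - 1) + z^2 * sigma^2)
  }
  ceiling(n)   # always round up to a whole unit
}

# Hypothetical helper: sample size for estimating a proportion to within +/- d.
n_for_prop <- function(p, d, alpha = 0.05, N = Inf) {
  z  <- qnorm(1 - alpha / 2)
  pq <- p * (1 - p)
  if (is.infinite(N)) {
    n <- z^2 * pq / d^2
  } else {
    n <- N * z^2 * pq / (d^2 * (N - 1) + z^2 * pq)
  }
  ceiling(n)
}

n_girls      <- n_for_mean(sigma = 20, d = 5)    # Example 5.7.1
n_households <- n_for_prop(p = 0.35, d = 0.05)   # Example 5.8.1
print(c(n_girls, n_households))
```

Rounding up with `ceiling` matches the slides, where 61.47 becomes 62 and 349.6 becomes 350.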

Homework

• 5.5.3 5.6.2 5.7.3 5.8.4

5.9 CI of the variance of a normal population

• Point estimator of the variance: $s^2$

Good estimator? 'unbiasedness'

$E(s^2) = \sigma^2$ ?

• Pop = (6, 8, 10, 12, 14), n = 2 (example from chap. 4)

$\sigma^2 = \dfrac{\sum (x_i - \mu)^2}{N} = 8$, $S^2 = \dfrac{\sum (x_i - \mu)^2}{N-1} = 10$

with replacement: $E(s^2) = \dfrac{\sum s_i^2}{N^n} = \dfrac{0 + 2 + \cdots + 2 + 0}{25} = 8 = \sigma^2$

without replacement: $E(s^2) = \dfrac{\sum s_i^2}{{}_N C_n} = \dfrac{2 + 8 + \cdots + 2}{10} = 10 = S^2$
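
Because the population is tiny, both expectations can be verified by listing every possible sample. The sketch below does this in R; the variable names are illustrative.

```r
# Enumerate all samples of size 2 from the population (6, 8, 10, 12, 14).
pop <- c(6, 8, 10, 12, 14)
N   <- length(pop)

sigma2 <- sum((pop - mean(pop))^2) / N         # population variance, divisor N
S2     <- sum((pop - mean(pop))^2) / (N - 1)   # divisor N - 1

# With replacement: all N^2 = 25 ordered samples
with_repl <- expand.grid(x1 = pop, x2 = pop)
E_with    <- mean(apply(with_repl, 1, var))

# Without replacement: all choose(5, 2) = 10 unordered samples
without_repl <- combn(pop, 2)
E_without    <- mean(apply(without_repl, 2, var))

print(c(sigma2 = sigma2, E_with = E_with, S2 = S2, E_without = E_without))
```

The average of $s^2$ over the 25 samples drawn with replacement equals $\sigma^2 = 8$, and over the 10 samples drawn without replacement it equals $S^2 = 10$.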

N large -> N ≈ N − 1 -> $E(s^2) = S^2 \approx \sigma^2$

The CI of $\sigma^2$ depends on the distribution of $\dfrac{(n-1)s^2}{\sigma^2} = \sum_{i=1}^{n}(x_i - \bar{x})^2/\sigma^2$.

$\dfrac{(n-1)s^2}{\sigma^2} \sim \chi^2_{n-1}$ (chi-square distribution, Table F)

$100(1-\alpha)\%$ CI of $\sigma^2$?

$\chi^2_{\alpha/2} < \dfrac{(n-1)s^2}{\sigma^2} < \chi^2_{(1-\alpha/2)}$

$100(1-\alpha)\%$ CI of $\sigma^2$: $\dfrac{(n-1)s^2}{\chi^2_{(1-\alpha/2)}} < \sigma^2 < \dfrac{(n-1)s^2}{\chi^2_{\alpha/2}}$

$100(1-\alpha)\%$ CI of $\sigma$: $\sqrt{\dfrac{(n-1)s^2}{\chi^2_{(1-\alpha/2)}}} < \sigma < \sqrt{\dfrac{(n-1)s^2}{\chi^2_{\alpha/2}}}$

<Example 5.9.1>

n = 15, measuring amylase: $\bar{x} = 96$, $s = 35$. 95% CI of $\sigma^2$?

$s^2 = 1225$, df = n − 1 = 14

$\dfrac{(14)(1225)}{26.119} < \sigma^2 < \dfrac{(14)(1225)}{5.629}$

$656.6101 < \sigma^2 < 3046.7223$

$25.62 < \sigma < 55.20$
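
The chi-square quantiles from Table F are available in R through `qchisq`, so the interval of Example 5.9.1 can be reproduced as below. The helper name `var_confint` is a hypothetical choice for this sketch.

```r
# Hypothetical helper: CI for the variance of a normal population,
# given the sample variance s2 and the sample size n.
var_confint <- function(s2, n, alpha = 0.05) {
  df <- n - 1
  c(lower = df * s2 / qchisq(1 - alpha / 2, df),   # divide by the upper quantile
    upper = df * s2 / qchisq(alpha / 2, df))       # divide by the lower quantile
}

# Example 5.9.1: amylase, n = 15, s = 35
ci_var <- var_confint(s2 = 35^2, n = 15)
ci_sd  <- sqrt(ci_var)   # CI for sigma
print(round(ci_var, 1))
print(round(ci_sd, 2))
```

Small differences from the slide values come from the table quantiles being rounded to three decimals.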

5.10 CI for the ratio of the variances of two normal populations

$\dfrac{s_1^2/\sigma_1^2}{s_2^2/\sigma_2^2} \sim F_{n_1-1,\,n_2-1}$ (Table G)

<Example 5.10.1>

Normal adults, n = 21 (group 1); Parkinson's disease patients, n = 16 (group 2). We observe the response time to a certain stimulus. Sample variance of group 1 = 1600, group 2 = 1225.

$100(1-\alpha)\%$ CI of $\sigma_1^2/\sigma_2^2$?

$F_{\alpha/2} < \dfrac{s_1^2/\sigma_1^2}{s_2^2/\sigma_2^2} < F_{(1-\alpha/2)} \;\Rightarrow\; \dfrac{s_1^2/s_2^2}{F_{(1-\alpha/2)}} < \dfrac{\sigma_1^2}{\sigma_2^2} < \dfrac{s_1^2/s_2^2}{F_{\alpha/2}}$

95% CI of $\sigma_1^2/\sigma_2^2$?

$\dfrac{1600/1225}{2.76} < \dfrac{\sigma_1^2}{\sigma_2^2} < \dfrac{1600/1225}{.389}$, with $F_{.975} = 2.76$ and $F_{.025} = .389$ (df = 20, 15)

$.473 < \sigma_1^2/\sigma_2^2 < 3.36$
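
The F quantiles from Table G can be obtained with `qf`, which gives a direct way to recompute Example 5.10.1. The helper name `var_ratio_confint` is an assumption of this sketch.

```r
# Hypothetical helper: CI for sigma1^2 / sigma2^2 from two sample variances,
# assuming both populations are normal.
var_ratio_confint <- function(s2_1, s2_2, n1, n2, alpha = 0.05) {
  ratio <- s2_1 / s2_2
  df1   <- n1 - 1
  df2   <- n2 - 1
  c(lower = ratio / qf(1 - alpha / 2, df1, df2),
    upper = ratio / qf(alpha / 2, df1, df2))
}

# Example 5.10.1: normal adults (n = 21) vs. Parkinson's patients (n = 16)
ci_ratio <- var_ratio_confint(1600, 1225, 21, 16)
print(round(ci_ratio, 3))
```

With raw data, `var.test(x, y)` should report the same kind of interval.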

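• Statistical distributions

$Y_1, Y_2, \ldots, Y_n$: random sample, iid from $N(\mu, \sigma^2)$

$\bar{Y} \sim N(\mu, \sigma^2/n)$

$\dfrac{\bar{Y}-\mu}{\sigma/\sqrt{n}} \sim N(0,1)$

$\dfrac{\bar{Y}-\mu}{s/\sqrt{n}} \sim t_{n-1}$

$\chi^2_n$: sum of squares of n independent standard normal rv's

$\sum_{i=1}^{n}\left(\dfrac{Y_i-\mu}{\sigma}\right)^2 \sim \chi^2_n$, since $\dfrac{Y_i-\mu}{\sigma} \sim N(0,1)$

$\dfrac{(n-1)s^2}{\sigma^2} = \sum_{i=1}^{n}\dfrac{(Y_i-\bar{Y})^2}{\sigma^2} \sim \chi^2_{n-1}$

$P\left(\chi^2_{\alpha/2,\,n-1} < \dfrac{(n-1)s^2}{\sigma^2} < \chi^2_{1-\alpha/2,\,n-1}\right) = 1-\alpha$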

$F_{n_1,n_2}$: ratio of two independent chi-squares, each divided by its df (df = $n_1, n_2$)

$F_{n_1,n_2} = \dfrac{\chi^2_{n_1}/n_1}{\chi^2_{n_2}/n_2}$

$\dfrac{s_1^2/\sigma_1^2}{s_2^2/\sigma_2^2} \sim F_{n_1-1,\,n_2-1}$

$P\left(F_{\alpha/2} < \dfrac{s_1^2\,\sigma_2^2}{s_2^2\,\sigma_1^2} < F_{1-\alpha/2}\right) = 1-\alpha$

Homework

• 5.9.3 5.9.7 5.10.5 5.10.7

• Review exercises 13 21 23