Mean and Variance
Distribution ?
dist’n of a sample pop’n dist’n
statistics
(sample) statistic (population) parameter
X %freq
Head 1 0.5
Tail 0 0.5
Total 1.0
X freq %freq
Head 1 20 0.4
Tail 0 30 0.6
Total 50 1.0
dist’n of a sample
pop’n dist’n
X %freq
Head 1 0.35
Tail 0 0.65
Total 1.0
Y %freq
1 1/6
2 1/6
3 1/6
4 1/6
5 1/6
6 1/6
Total 1.0
Y freq %freq
1 10 0.1
2 20 0.2
3 10 0.1
4 20 0.2
5 20 0.2
6 20 0.2
Total 100 1.0
A new variable X from mseg of credit card data
mseg X
Low Spender 1
Med Low Spender 2
Average Spender 3
Med High Spender 4
High Spender 5
X freq %freq
1 26 0.26
2 20 0.20
3 11 0.11
4 25 0.25
5 18 0.18
Total 100 1.00
X %freq
1 ?
2 ?
3 ?
4 ?
5 ?
Total 1.00
Variable X of credit card data
?
Measure for location (center)
Mean,
Mode
Median
(truncated, winsorized) Mean
Mean
Median
50% 50%
Median
Mode
Hit/Stop, Bust
Dealer's hidden card ?
Card values: 2 - 9; 1 or 11; 10
Outlier
Truncated mean / Winsorized mean
[Figure: data points with an outlier, illustrating the truncated mean and the winsorized mean]
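The two robust means named above can be sketched in a few lines of Python. This is only an illustration: the data list and the helper names `truncated_mean` and `winsorized_mean` are assumptions made here, not values or names from the slides, and `k` is the number of observations handled at each end.

```python
from statistics import mean

def truncated_mean(data, k):
    """Drop the k smallest and k largest values, then average the rest."""
    s = sorted(data)
    return mean(s[k:len(s) - k])

def winsorized_mean(data, k):
    """Replace the k smallest/largest values by the nearest kept value, then average."""
    s = sorted(data)
    core = s[k:len(s) - k]
    return mean([core[0]] * k + core + [core[-1]] * k)

# Hypothetical data with one outlier (64); not taken from the slides
data = [1, 4, 5, 6, 6, 64]

print(mean(data))                # ordinary mean, pulled up by the outlier
print(truncated_mean(data, 1))   # mean of [4, 5, 6, 6]
print(winsorized_mean(data, 1))  # mean of [4, 4, 5, 6, 6, 6]
```

Both versions ignore how far away the outlier is; they differ only in whether the extreme observations are discarded or pulled in to the nearest remaining value.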
Quartiles
Q1 (25th percentile): 25% below, 75% above
Q2 (50th percentile, Median): 50% below, 50% above
Q3 (75th percentile): 75% below, 25% above
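Illustration: Yoo Jae-il, staff reporter [email protected]
Off-target housing statistics lead to misguided real estate policy
Korea's PIR is calculated from the average house price and the average household income of urban workers. The US PIR, in contrast, is based on the median price and the median income. The median price is found by lining up the homes sold in an area from the cheapest to the most expensive and taking the middle value.
Kim Sun-duk, head of the Construction Industry Strategy Research Institute, said, "Average prices and average incomes can be distorted when a few high-priced homes or extremely high earners are included." Moreover, Korean house prices are asking prices, whereas US house prices are based on actual transaction prices.
Cha Hak-bong, staff reporter, [email protected] Posted: 2007.03.26 23:31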
Wrong housing statistics make for wrong real estate policy.
Although the median represents house prices better than the mean, the Korean government publishes house price statistics based on the mean. The mean price can be distorted by just one or two extreme prices.
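A small numeric check of this point, as a sketch with made-up numbers: the prices below are hypothetical (arbitrary units) and are not taken from the article. Adding a single extreme price moves the mean a long way while the median barely changes.

```python
from statistics import mean, median

# Hypothetical house prices in arbitrary units (illustrative only)
prices = [200, 220, 250, 270, 300]
with_extreme = prices + [5000]   # add one extremely expensive house

print(mean(prices), median(prices))              # 248 and 250
print(mean(with_extreme), median(with_extreme))  # 1040 and 260.0
```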
percentile
p% (100-p)%
p-th percentile
Measure for variability
Range
InterQuartile Range (IQR)
Variance
Standard Deviation
Range
Q1 Q2 Q3
$\mathrm{IQR} = Q_3 - Q_1$
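As a rough sketch, the range and the interquartile range can be computed with Python's standard `statistics.quantiles`. The data are hypothetical, and the quartile convention is an assumption: the default "exclusive" method is used here, and other conventions can give slightly different cut points.

```python
from statistics import quantiles

# Hypothetical data (not from the slides)
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]

data_range = max(data) - min(data)   # Range = largest value - smallest value
q1, q2, q3 = quantiles(data, n=4)    # three cut points: Q1, Q2 (median), Q3
iqr = q3 - q1                        # IQR = Q3 - Q1

print(data_range, q1, q2, q3, iqr)
```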
variance, standard deviation
Y %freq
1 1/6
2 1/6
3 1/6
4 1/6
5 1/6
6 1/6
Total 1.0
Y freq %freq
1 10 0.1
2 20 0.2
3 10 0.1
4 20 0.2
5 20 0.2
6 20 0.2
Total 100 1.0
Distribution of a sample: Mean (Y) = 1*0.1 + 2*0.2 + 3*0.1 + ... + 6*0.2 = 3.8
Population distribution: Mean (Y) = 1*(1/6) + 2*(1/6) + ... + 6*(1/6) = 3.5
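The two means can be reproduced from the tables with a short Python sketch. The helper name `mean_from_table` is an assumption introduced here; the frequencies are the ones listed in the slides. Exact fractions avoid rounding noise.

```python
from fractions import Fraction

def mean_from_table(table):
    """Mean of a distribution given as {value: relative frequency}."""
    return sum(x * p for x, p in table.items())

# Population distribution of a fair die: each face has relative frequency 1/6
population = {y: Fraction(1, 6) for y in range(1, 7)}

# Distribution of a sample of 100 rolls (frequencies from the slide table)
freq = {1: 10, 2: 20, 3: 10, 4: 20, 5: 20, 6: 20}
n = sum(freq.values())
sample = {y: Fraction(c, n) for y, c in freq.items()}

print(float(mean_from_table(population)))  # 3.5
print(float(mean_from_table(sample)))      # 3.8
```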
X freq %freq
Low Spender 1 26 0.26
Med Low Spender 2 20 0.20
Average Spender 3 11 0.11
Med High Spender 4 25 0.25
High Spender 5 18 0.18
Total 100 1.00
Mean of X
Mean (X) = 1*0.26 + 2*0.20 + 3*0.11 + 4*0.25 + 5*0.18 = 2.89
$X \sim f$
$$E(X) = \sum_i x_i f(x_i)$$
X f
$x_1$ $f(x_1)$
...
$x_n$ $f(x_n)$
Total 1
$$\sum_i f(x_i) = 1$$
$X \sim f$
$$E(X^2) = \sum_i x_i^2 f(x_i)$$
X $X^2$ f
$x_1$ $x_1^2$ $f(x_1)$
...
$x_n$ $x_n^2$ $f(x_n)$
Total 1
A new variable Q = (X - 3)^2
X Q %freq
Low Spender 1 (-2)^2 0.26
Med Low Spender 2 (-1)^2 0.20
Average Spender 3 0^2 0.11
Med High Spender 4 1^2 0.25
High Spender 5 2^2 0.18
Total 1.00
Mean (Q) = (-2)^2*0.26 + (-1)^2*0.20 + 0^2*0.11 + 1^2*0.25 + 2^2*0.18
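The mean of the transformed variable can be evaluated directly from the table. The sketch below uses a small helper, `expect`, which is a name assumed here for illustration; it computes the sum of g(x_i) f(x_i) over the table of X.

```python
from fractions import Fraction as F

# Relative frequencies of X from the credit card table
f = {1: F(26, 100), 2: F(20, 100), 3: F(11, 100), 4: F(25, 100), 5: F(18, 100)}

def expect(g, dist):
    """Mean of g(X): sum of g(x_i) * f(x_i) over the table."""
    return sum(g(x) * p for x, p in dist.items())

mean_q = expect(lambda x: (x - 3) ** 2, f)   # Q = (X - 3)^2
print(float(mean_q))  # 2.21
```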
$X \sim f$
$$E[(X - c)^2] = \sum_i (x_i - c)^2 f(x_i)$$
Let $c = E(X)$:
$$\mathrm{Var}(X) = E[(X - E(X))^2]$$
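To see what choosing c = E(X) does, the sketch below evaluates E[(X - c)^2] on the credit card table for several values of c. It treats the table's relative frequencies as f (so the divisor is n, not n - 1); the helper names `expect` and `mean_sq_dev` are assumptions introduced for this example.

```python
from fractions import Fraction as F

# Relative frequencies of X from the credit card table, treated as f
f = {1: F(26, 100), 2: F(20, 100), 3: F(11, 100), 4: F(25, 100), 5: F(18, 100)}

def expect(g, dist):
    """Sum of g(x_i) * f(x_i) over the table."""
    return sum(g(x) * p for x, p in dist.items())

def mean_sq_dev(c):
    """E[(X - c)^2] for a constant c."""
    return expect(lambda x: (x - c) ** 2, f)

mean_x = expect(lambda x: x, f)   # E(X) = 2.89
var_x = mean_sq_dev(mean_x)       # Var(X) = E[(X - E(X))^2]

print(float(mean_x), float(var_x))   # 2.89 2.1979
print(float(mean_sq_dev(3)))         # 2.21, slightly larger than Var(X)
```

In this example every other choice of c gives a larger value than c = E(X), which is one reason the deviations are measured from the mean.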
$X \sim f^*$
$$E^*(X) = \sum_i x_i f^*(x_i) = \bar{X}$$
X $f^*$
$x_1$ $f^*(x_1)$
...
$x_n$ $f^*(x_n)$
Total 1
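Distribution of a sample
$$E^*(X) = \sum_i x_i f^*(x_i) = \frac{1}{n}\sum_{i=1}^{n} x_i = \bar{X}$$
Sample mean

Each observation listed separately:
X $f^*$
1 1/5
3 1/5
2 1/5
1 1/5
2 1/5
Total 1

Grouped by value:
X freq $f^*$
1 2 2/5
3 1 1/5
2 2 2/5
Total 5 1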
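A short sketch of the same idea in Python: the sample mean is the same whether each observation gets weight 1/n or the values are first grouped into a frequency table. The five observations below are the ones the slide tables appear to list; treat them as an assumption of this example.

```python
from collections import Counter
from fractions import Fraction

sample = [1, 3, 2, 1, 2]   # assumed observations, n = 5
n = len(sample)

# Ungrouped: every observation has weight 1/n
mean_raw = Fraction(sum(sample), n)

# Grouped: f*(x) = freq(x) / n for each distinct value x
f_star = {x: Fraction(c, n) for x, c in Counter(sample).items()}
mean_grouped = sum(x * p for x, p in f_star.items())

print(mean_raw, mean_grouped)   # both 9/5
```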
$X \sim f^*$
$$\mathrm{Var}^*(X) = E^*[(X - E^*(X))^2] = \sum_i (x_i - \bar{x})^2 f^*(x_i) = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2$$
(O)
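Sample variance
$$s^2 \text{ or } s_X^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$$
$$\mathrm{Var}^*(X) = \frac{n}{n-1}\,E^*[(X - E^*(X))^2] = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$$
From here on, Var*(X) denotes this sample variance, with divisor n - 1.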
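The difference between the two divisors can be checked against Python's standard library, where `statistics.pvariance` divides by n and `statistics.variance` divides by n - 1. The sample below is hypothetical and is used only for illustration.

```python
from statistics import pvariance, variance

sample = [1, 3, 2, 1, 2]   # hypothetical sample, n = 5
n = len(sample)
xbar = sum(sample) / n

ss = sum((x - xbar) ** 2 for x in sample)   # sum of squared deviations
var_n = ss / n                              # divisor n
s2 = ss / (n - 1)                           # divisor n - 1 (sample variance)

print(var_n, pvariance(sample))   # both about 0.56
print(s2, variance(sample))       # both about 0.7
print(s2 / var_n)                 # ratio n / (n - 1), about 1.25 here
```

With only five observations the factor n/(n - 1) is 1.25, so the choice of divisor matters for small samples.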
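For large n,
$$\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2 \approx \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2 \quad\text{since}\quad \frac{n}{n-1} \approx 1$$
$n \ge 20$ is large enough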
$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$$
$n \to N$
$$\sigma_X^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2$$
Standard deviation
$$\mathrm{sd}(X) = \sqrt{\mathrm{Var}(X)}$$
$$\mathrm{sd}^*(X) = \sqrt{\mathrm{Var}^*(X)}$$
V = (X - 2.89)^2
X V freq
Low Spender 1 (1-2.89)^2 26
Med Low Spender 2 (2-2.89)^2 20
Average Spender 3 (3-2.89)^2 11
Med High Spender 4 (4-2.89)^2 25
High Spender 5 (5-2.89)^2 18
Total 100
Var*(X) = (1/99)[(1-2.89)^2*26 + ... + (5-2.89)^2*18] = 2.22
sd*(X) = 1.49
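The figures for the credit card data can be reproduced from the frequency table with a few lines of Python. This is a sketch that follows the divisor n - 1 = 99 used in the calculation above; the variable names are chosen here for illustration.

```python
import math

# Frequencies of X from the credit card table
freq = {1: 26, 2: 20, 3: 11, 4: 25, 5: 18}
n = sum(freq.values())                                  # 100

mean_x = sum(x * c for x, c in freq.items()) / n        # sample mean
var_star = sum((x - mean_x) ** 2 * c for x, c in freq.items()) / (n - 1)
sd_star = math.sqrt(var_star)

print(round(mean_x, 2), round(var_star, 2), round(sd_star, 2))  # 2.89 2.22 1.49
```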
dist’n of a sample pop’n dist’n
statistics
sample mean population mean
sample variance population variance
sample median population median
…. ….
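Population (size N) and sample (size n)
Examples: no. of teeth, weight of body, no. of phone calls
Population of size N, discrete (no. of teeth): $f(x_i) = \mathrm{freq}_i / N$, $\sum_i f(x_i) = 1$
Population, continuous (weight of body): $f(x)$, $\int f(x)\,dx = 1$
Sample of size n (no. of phone calls): $f(x_i) = \mathrm{freq}_i / n$, $\sum_i f(x_i) = 1$
Integrals and sums correspond:
$\int f(x)\,dx$ and $\sum_i f(x_i)$
$\int x^2 f(x)\,dx$ and $\sum_i x_i^2 f(x_i)$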
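E is defined in the same way for $f^*(x_i)$, $f(x_i)$ and $f(x)$.
Continuous:
$$E(X) = \int x f(x)\,dx$$
$$\mathrm{Var}(X) = E[(X - E(X))^2] = \int (x - \mu)^2 f(x)\,dx$$
Discrete:
$$E(X) = \sum_i x_i f(x_i)$$
$$\mathrm{Var}(X) = E[(X - E(X))^2] = \sum_i (x_i - \mu)^2 f(x_i)$$
where $\mu = E(X)$.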
Expected value
$$E(X) = \int x f(x)\,dx$$
$$E(X) = \sum_i x_i f(x_i)$$
X f(xi)
Head 1 0.5
Tail 0 0.5
$$E(X) = 0.5$$
Y f(yi)
1 1/6
2 1/6
3 1/6
4 1/6
5 1/6
6 1/6
$$E(Y) = 3.5$$
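One way to connect these expected values with the distribution of a sample is a quick simulation. This is an illustrative sketch, not part of the slides: the sample size and the random seed are arbitrary choices. The sample means of many simulated tosses and rolls should land close to E(X) = 0.5 and E(Y) = 3.5.

```python
import random

rng = random.Random(0)   # fixed seed so the run is repeatable
n = 100_000              # number of simulated tosses / rolls (arbitrary)

# Coin: Head = 1, Tail = 0, each with probability 0.5
coin_mean = sum(rng.randint(0, 1) for _ in range(n)) / n

# Fair die: faces 1..6, each with probability 1/6
die_mean = sum(rng.randint(1, 6) for _ in range(n)) / n

print(coin_mean, die_mean)   # close to 0.5 and 3.5, not exactly equal
```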
$$E(1) = 1$$
$$E(1) = \sum_i 1 \cdot f(x_i) = 1$$
$$E(c) = c$$
X f(xi)
1 1/2
1 1/4
1 1/8
1 1/8
$$E(3X) = 3E(X)$$
$$E(3X) = \sum_i 3x_i f(x_i) = 3\sum_i x_i f(x_i) = 3E(X)$$
X 3X f(xi)
1 3 1/2
2 6 1/4
3 9 1/8
4 12 1/8
$$E(cX) = cE(X)$$
$$E(E(X)) = E(E(X) \cdot 1) = E(X)\,E(1) = E(X)$$
$$E(X\,E(X)) = E(E(X)\,X) = E(X)\,E(X) = (E(X))^2$$
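These rules can be verified numerically on the small table with X = 1, 2, 3, 4 and f = 1/2, 1/4, 1/8, 1/8. The helper `expect` is a name assumed here; it computes the sum of g(x_i) f(x_i).

```python
from fractions import Fraction as F

# Distribution of X from the slide table
f = {1: F(1, 2), 2: F(1, 4), 3: F(1, 8), 4: F(1, 8)}

def expect(g, dist):
    """Sum of g(x_i) * f(x_i) over the table."""
    return sum(g(x) * p for x, p in dist.items())

e_x = expect(lambda x: x, f)            # E(X) = 15/8
e_one = expect(lambda x: 1, f)          # E(1) = 1
e_3x = expect(lambda x: 3 * x, f)       # E(3X) = 3 E(X)
e_x_ex = expect(lambda x: x * e_x, f)   # E(X E(X)) = (E(X))^2

print(e_x, e_one, e_3x, e_x_ex)   # 15/8 1 45/8 225/64
```

The key point is that E(X) is itself a constant, so it can be pulled out of an expectation like any other constant c.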
E: $f^*(x_i)$, $f(x_i)$, $f(x)$
100 × X + 10 × Y (X: coin, Y: die)
$$\sum_i (a x_i + b y_i) = a\sum_i x_i + b\sum_i y_i$$
$$E(aX + bY) = aE(X) + bE(Y)$$
X Y 100X 10Y 100X+10Y f
1 (H) 1 100 10 110 1/12
0 (T) 1 0 10 10 1/12
1 (H) 2 100 20 120 1/12
0 (T) 2 0 20 20 1/12
1 (H) 6 100 60 160 1/12
0 (T) 6 0 60 60 1/12
$$E(100X + 10Y) = (1/12)[110 + 10 + \cdots + 60] = 100E(X) + 10E(Y) = 85$$
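The value 85 can be checked both ways in Python: by summing over all 12 (coin, die) outcomes and by using linearity. The sketch assumes, as the table does, that each of the 12 combinations has relative frequency 1/12.

```python
from fractions import Fraction
from itertools import product

coin = {1: Fraction(1, 2), 0: Fraction(1, 2)}       # Head = 1, Tail = 0
die = {y: Fraction(1, 6) for y in range(1, 7)}

# Direct: sum (100x + 10y) * f over all 12 outcomes, each with f = 1/2 * 1/6 = 1/12
direct = sum((100 * x + 10 * y) * px * py
             for (x, px), (y, py) in product(coin.items(), die.items()))

# Linearity: 100 E(X) + 10 E(Y)
e_x = sum(x * p for x, p in coin.items())
e_y = sum(y * p for y, p in die.items())
via_linearity = 100 * e_x + 10 * e_y

print(direct, via_linearity)   # 85 85
```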
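$$\mathrm{Var}(X) = E[(X - E(X))^2] = E[X^2 - 2X\,E(X) + (E(X))^2] = E(X^2) - (E(X))^2$$
For any constant c,
$$\mathrm{Var}(X) = E[(X - E(X))^2] \le E[(X - c)^2]$$
$$\mathrm{Var}(1) = 0$$
$$\mathrm{Var}(aX) = a^2\,\mathrm{Var}(X)$$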
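As a closing check, the variance rules can be verified on the fair die with exact fractions. The helper names `expect` and `var`, and the choice a = 3, are assumptions made for this illustration.

```python
from fractions import Fraction as F

die = {y: F(1, 6) for y in range(1, 7)}   # fair die

def expect(g, dist):
    """Sum of g(x_i) * f(x_i) over the table."""
    return sum(g(x) * p for x, p in dist.items())

def var(g, dist):
    """Variance of g(X), computed as E[(g(X) - E(g(X)))^2]."""
    m = expect(g, dist)
    return expect(lambda x: (g(x) - m) ** 2, dist)

var_y = var(lambda y: y, die)                                           # 35/12
shortcut = expect(lambda y: y * y, die) - expect(lambda y: y, die) ** 2  # E(Y^2) - (E(Y))^2
var_3y = var(lambda y: 3 * y, die)                                       # 9 * Var(Y)
var_const = var(lambda y: 1, die)                                        # Var(1) = 0

print(var_y, shortcut, var_3y, var_const)   # 35/12 35/12 105/4 0
```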
Thank you !!