POSTS AND TELECOMMUNICATIONS INSTITUTE OF TECHNOLOGY
---------------------------------------
Vũ Thị Gương
TIME SERIES DATA MINING TECHNIQUES
APPLIED TO STOCK FORECASTING
Major: Data Transmission and Computer Networks. Code: 60.48.15
MASTER'S THESIS SUMMARY
HANOI - 2012
The thesis was completed at:
POSTS AND TELECOMMUNICATIONS INSTITUTE OF TECHNOLOGY
Scientific supervisor: Dr. Nguyễn Đức Dũng
Reviewer 1: ....................................................................
Reviewer 2: ....................................................................
The thesis will be defended before the Master's Thesis Examination
Committee of the Posts and Telecommunications Institute of Technology
At: ..... o'clock, day ..... month ..... year ............
The thesis can be consulted at:
- The library of the Posts and Telecommunications Institute of Technology
INTRODUCTION
1. Rationale for the topic
As society develops, the volume of information grows explosively.
This huge body of data is an invaluable resource if we know how to
discover and exploit the useful information it contains. The problem
is therefore how to store and exploit the data. Traditional data
exploitation methods increasingly fail to meet practical needs. A new
technical trend has emerged in response: Knowledge Discovery and Data
Mining (KDD).
Data mining technology lets us obtain useful knowledge by extracting
information that has a certain relationship or correlation from a
large (or very large) data store, information that cannot normally be
recognized. This in turn helps solve problems of search, of
forecasting future trends and behaviors, and supports many other
intelligent functions. Today data mining technologies are widely
applied in most fields, including data analysis and forecasting.
One of the most important problems in modern finance is finding
effective ways to summarize and visualize stock market data so that
individuals and organizations receive useful information about market
behavior to support their investment decisions. The large amount of
valuable data generated by the stock market has attracted researchers
to explore this problem with a variety of methods.
In Vietnam the stock market is still fairly new, but its potential and
considerable benefits are widely recognized. Exploiting this market
successfully would bring high economic returns, and stock market
forecasting is an important part of doing so. For this reason I chose
the topic "Time series data mining techniques applied to stock
forecasting" for my thesis. The aim is to understand data mining
technology and its substantial applications in forecasting and
predicting future trends, especially in financial and stock markets,
so that suitable investment and trading decisions can be made.
2. Research objectives
- Study the concepts, role, applications and techniques of data
mining.
- Study time series analysis techniques in data mining and apply them
to forecasting in general and to stock market forecasting in
particular.
- Study the ARIMA (AutoRegressive Integrated Moving Average) model,
which covers model identification, parameter estimation, and
forecasting based on the optimally selected parameter estimates.
Experiment with the ARIMA model on real time series data, applying it
to stock data with the goal of forecasting stock prices.
3. Subject and scope of the research
The research covers data mining techniques, focusing on time series
analysis applied to forecasting the rise and fall of the stock market.
The ARIMA model is tested on data for VNIndex, ABT and ACB.
4. Research methods
Study the theory of data mining techniques.
Study and analyze financial and stock data.
Study the theoretical basis of the ARIMA model for real time series
data and how to apply it to a practical problem: forecasting the rise
and fall of the stock market.
Build and run the ARIMA model and apply it to time series data mining
for financial and stock forecasting.
Use the Eviews software to run the experiments.
Evaluate the forecast results obtained.
5. Structure of the thesis
The main content of the thesis is divided into 3 chapters:
Chapter 1: Overview of data mining. Gives an overview of the
knowledge discovery process and data mining, data mining techniques,
and applications of data mining.
Chapter 2: Time series data mining techniques. Introduces real time
series data and the forecasting problem currently of interest in data
mining. Presents the theoretical basis of the ARIMA model and the
steps of model development. The forecasting problem is treated from
the perspective of using the ARIMA model for real time series. The
chapter then introduces the Eviews software used to run the model.
Chapter 3: Applying the ARIMA model to stock forecasting. Presents
forecasting experiments on financial and stock data series using the
ARIMA model. The steps of the model are run in Eviews 6, and the
results are reported and compared with actual values.
The thesis ends with the conclusion and directions for future work.
Chapter 1: OVERVIEW OF DATA MINING
1.1. Introduction
1.1.1. Concepts
Data Mining
Knowledge Discovery (KD)
Data mining is a process of extracting information that has a certain
relationship or correlation from a large (or very large) data store,
with the aim of predicting future trends and behaviors or finding sets
of useful information that cannot normally be recognized.
1.1.2. The knowledge discovery process in databases
Figure 1.1. The knowledge discovery process
1.2. Data mining techniques
1.2.1. Decision trees
1.2.2. Neural networks
1.2.3. Clustering
1.2.4. Association rules
1.2.5. Factor analysis
1.2.6. Time series
1.3. Applications of data mining
1.3.1. Types of data that can be mined
Because data mining is so widely applied, it can work with many
different kinds of data. Typical examples are relational databases,
multidimensional databases (multidimensional structures, data
warehouses), transactional databases, object-relational databases,
spatial and temporal data, time series data, multimedia databases,
and text and Web data.
1.3.2. Applications of data mining
Data mining is a field of wide interest and application. Typical
applications include: (i) data analysis and decision support;
(ii) medical treatment; (iii) text mining; (iv) bioinformatics;
(v) finance and the stock market; (vi) insurance.
1.3.3. Applications of data mining techniques in the stock market
Typical applications of data mining in financial and stock markets
are analyzing financial conditions and forecasting the prices of
stocks. This gives investors more opportunities to choose which
stocks to invest in and to choose a suitable form and scale of
trading, so as to increase value effectively.
1.3.3.1. Applications of decision trees
1.3.3.2. Applications of neural networks
1.3.3.3. Applications of clustering
1.3.3.4. Applications of association rules
1.3.3.5. Applications of factor analysis
1.3.3.6. Applications of time series
Chapter 2: TIME SERIES DATA MINING TECHNIQUES
2.1. The forecasting problem
Forecasting is indispensable to human activity in an age of
information explosion. It provides the basis needed for planning;
without a science of forecasting, plans for the future would not be
very convincing.
There are many methods and techniques for solving forecasting
problems, among them forecasting from time series. ARIMA is a
quantitative time-based forecasting model: the future value of the
forecast variable depends on how that variable has moved in the past
(the historical data series).
2.2. Time series data
A time series is a sequence of observations ordered in time. These
observations are mostly collected at discrete, equally spaced
intervals. Time series models are applied especially in short-term
forecasting. In forecasting problems in general, and in financial and
stock forecasting in particular, data are usually represented as time
series. Among the kinds of data that are analyzed, time series data
are always among the most common.
2.2.1. Real time series
2.2.2. Long-term trend component
2.2.3. Seasonal component
2.2.4. Cyclical component
2.2.5. Irregular component
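2.3. The ARIMA model for time series data
2.3.1. Tools used in the model
2.3.1.1. Autocorrelation function, ACF (AutoCorrelation Function)
The sample autocorrelation at lag k of a series y(1), ..., y(n) with
mean \bar{y} is, in its standard form:
\[ r_k = \frac{\sum_{t=1}^{n-k} \bigl(y(t) - \bar{y}\bigr)\bigl(y(t+k) - \bar{y}\bigr)}{\sum_{t=1}^{n} \bigl(y(t) - \bar{y}\bigr)^2} \]  (2.1)
2.3.1.2. Partial autocorrelation function, PACF
\[ y(t+k) = C_{k1}\,y(t+k-1) + C_{k2}\,y(t+k-2) + \dots + C_{k,k-1}\,y(t+1) + C_{kk}\,y(t) + e(t) \]  (2.2)
The partial autocorrelation C_{kk} is computed with Durbin's formula:
\[ C_{11} = r_1, \qquad C_{kk} = \frac{r_k - \sum_{j=1}^{k-1} C_{k-1,j}\, r_{k-j}}{1 - \sum_{j=1}^{k-1} C_{k-1,j}\, r_j}, \qquad C_{kj} = C_{k-1,j} - C_{kk}\, C_{k-1,k-j} \quad (j = 1, \dots, k-1) \]  (2.3)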
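As an illustration of the ACF and of Durbin's recursion for the PACF, here is a minimal Python sketch. It is not the thesis's implementation (the thesis uses Eviews); the function names `sample_acf` and `pacf_durbin` are hypothetical, and the code assumes the standard sample-ACF definition and the standard form of Durbin's recursion.

```python
def sample_acf(y, max_lag):
    """Sample autocorrelations r_0 .. r_max_lag of the series y."""
    n = len(y)
    mean = sum(y) / n
    dev = [v - mean for v in y]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
            for k in range(max_lag + 1)]


def pacf_durbin(r, max_lag):
    """Partial autocorrelations C_kk, k = 1 .. max_lag, from the ACF r."""
    pacf = []
    prev = []  # prev[j - 1] holds C_{k-1, j}
    for k in range(1, max_lag + 1):
        if k == 1:
            ckk = r[1]
            cur = [ckk]
        else:
            num = r[k] - sum(prev[j - 1] * r[k - j] for j in range(1, k))
            den = 1.0 - sum(prev[j - 1] * r[j] for j in range(1, k))
            ckk = num / den
            # C_{k,j} = C_{k-1,j} - C_kk * C_{k-1,k-j} for j = 1 .. k-1
            cur = [prev[j - 1] - ckk * prev[k - j - 1] for j in range(1, k)]
            cur.append(ckk)
        pacf.append(ckk)
        prev = cur
    return pacf
```

For an AR(1) process the theoretical ACF decays geometrically and the PACF cuts off after lag 1, which is the pattern used later when reading the correlogram to choose p and q.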
2.3.1.3. The AR(p) model
y(t) = a0 + a1y(t-1) + a2y(t-2) + ... + apy(t-p) + e(t) (2.4)
AR(1) model: y(t) = a0 + a1y(t-1) + e(t)
AR(2) model: y(t) = a0 + a1y(t-1) + a2y(t-2) + e(t)
2.3.1.4. The MA(q) model
y(t) = b0 + e(t) + b1e(t-1) + b2e(t-2) + ... + bqe(t-q) (2.5)
MA(1) model: y(t) = b0 + e(t) + b1e(t-1)
MA(2) model: y(t) = b0 + e(t) + b1e(t-1) + b2e(t-2)
2.3.1.5. Differencing I(d)
First difference (I(1)): z(t) = y(t) - y(t-1)
Second difference (I(2)): h(t) = z(t) - z(t-1)
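The differencing step can be sketched in a few lines of Python. This is only an illustration under the definitions above; the helper name `difference` is hypothetical.

```python
def difference(y, d=1):
    """Apply first differencing d times: z(t) = y(t) - y(t-1)."""
    for _ in range(d):
        y = [y[t] - y[t - 1] for t in range(1, len(y))]
    return y


# A series with a quadratic trend: one difference leaves a linear trend,
# a second difference leaves a constant series.
series = [1, 4, 9, 16, 25]
z = difference(series, d=1)
h = difference(series, d=2)
```

Each pass shortens the series by one observation, so a series of n points yields n - d points after differencing d times.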
2.3.2. The ARIMA model
- ARMA(p,q) model:
y(t) = a0 + a1y(t-1) + a2y(t-2) + ... + apy(t-p) + e(t)
+ b1e(t-1) + b2e(t-2) + ... + bqe(t-q) (2.6)
- ARIMA(p,d,q) model:
ARIMA(1,1,1) model:
y(t) - y(t-1) = a0 + a1(y(t-1) - y(t-2)) + e(t) + b1e(t-1)
or z(t) = a0 + a1z(t-1) + e(t) + b1e(t-1),
where z(t) = y(t) - y(t-1) is the first difference: d = 1.
Similarly, for ARIMA(1,2,1):
h(t) = a0 + a1h(t-1) + e(t) + b1e(t-1),
where h(t) = z(t) - z(t-1) is the second difference: d = 2.
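To make the ARIMA(1,1,1) equation concrete, the sketch below produces multi-step forecasts from given coefficients. It is an illustration only: the function name `forecast_arima111` and its parameters are hypothetical, the coefficients are assumed to have been estimated elsewhere, and future error terms are replaced by their expected value of zero.

```python
def forecast_arima111(y, a0, a1, b1, last_resid, steps):
    """Forecast y for `steps` periods with z(t) = a0 + a1*z(t-1) + e(t) + b1*e(t-1),
    where z(t) = y(t) - y(t-1)."""
    z_prev = y[-1] - y[-2]   # last observed first difference
    e_prev = last_resid      # last in-sample residual
    level = y[-1]
    forecasts = []
    for _ in range(steps):
        z_hat = a0 + a1 * z_prev + b1 * e_prev
        level += z_hat       # undo the differencing: y(t) = y(t-1) + z(t)
        forecasts.append(level)
        z_prev = z_hat
        e_prev = 0.0         # assumption: future shocks set to zero
    return forecasts
```

Because the moving-average term only carries the last known residual, it affects the first forecast step and then drops out, while the autoregressive term keeps shaping the forecasted differences.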
2.3.3. Model development steps
2.3.3.1. Model identification
2.3.3.2. Parameter estimation
2.3.3.3. Diagnostic checking
2.3.3.4. Forecasting
Figure 2.16. Diagram of the Box-Jenkins modeling procedure
2.4. The EVIEWS software
2.4.1. Introduction to the Eviews application
Figure 2.17. The main Eviews window [Source: Eviews 5 User's Guide,
p. 16]
2.4.2. Using Eviews to carry out the steps of the ARIMA model
2.4.2.1. Model identification
2.4.2.2. Model estimation and checking
2.4.2.3. Forecasting
Chapter 3: APPLYING THE ARIMA MODEL TO STOCK FORECASTING
3.1. Financial and stock data
Stock data are known as a multi-attribute time series, because
several attributes are recorded together at each point in time. The
attributes of stock data are: Open, High, Low, Close, Volume.
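A record with these attributes can be sketched as a small Python data structure. The column order, the header row, and the sample rows below are assumptions made purely for illustration; they are not taken from the thesis data, and the names `Bar` and `read_bars` are hypothetical.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Bar:
    ticker: str
    date: str
    open: float
    high: float
    low: float
    close: float
    volume: int


def read_bars(text):
    """Parse CSV text with an assumed layout:
    ticker,date,open,high,low,close,volume (first row is a header)."""
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header row
    return [Bar(r[0], r[1], float(r[2]), float(r[3]), float(r[4]),
                float(r[5]), int(r[6]))
            for r in reader if r]


# Made-up sample rows, for illustration only.
SAMPLE = """ticker,date,open,high,low,close,volume
XYZ,20120101,10.0,10.5,9.8,10.2,1000
XYZ,20120102,10.2,10.9,10.1,10.8,1500
"""

bars = read_bars(SAMPLE)
closing_prices = [b.close for b in bars]  # the univariate series to model
```

Only the closing price is carried forward into a univariate model such as ARIMA; the other attributes stay available for inspection.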
3.2. The ARIMA model for stock forecasting
3.2.1. The model-building process
- Model identification
- Model estimation and checking
- Forecasting
3.2.2. Designing the ARIMA model for the data
The steps for building a model are as follows:
1. Select the variable
2. Prepare the data
Determine whether the data series is stationary
Determine the seasonal factor
Determine the trend factor
3. Determine the components p and q of the ARMA model
4. Estimate the parameters and diagnose the most suitable model
5. Forecast in the short term
3.3. Experiments
The ARIMA model and the Box-Jenkins method are used to carry out 3
short-term forecasts of closing prices: for VnIndex, for the stock
ticker ABT (Ben Tre Aquaproduct Import and Export Joint Stock
Company) and for the stock ticker ACB (Asia Commercial Joint Stock
Bank), based on the historical data series of each.
3.2.1. Experimental environment
3.2.2. Input data
The input data for the thesis were taken from
http://www.cophieu68.com/datametastock.php. They consist of 3 .CSV
files, one for each of the 3 tickers, downloaded from that website.
The data have the following form:
Figure 3.1. Input data.
Creating the workfiles.
3.2.3. Data processing
3.2.3.1. Checking the stationarity of the stock series
Stationarity is assessed from the chart of the closing price variable
of each stock series.
Figure 3.6. Closing price chart of ABT
3.2.3.2. Model identification
- The parameters p, d, q of the ARIMA model for each ticker are
determined from the autocorrelation plots.
Figure 3.9. SAC and SPAC plots of the GIADONGCUA series of VNINDEX
3.2.3.3. Estimation and testing with the ARIMA model
Figure 3.16. Estimating the ARIMA(1,0,1) model for ABT
Figure 3.17. Results of the ARIMA(1,0,1) model for ABT
Figure 3.18. Residual check for the ABT series
Table 3.2. Evaluation criteria for the ARIMA models of ABT

ARIMA model    BIC        Adjusted R2   SEE
ARIMA(1,0,0)   2.385271   0.814950      0.782972
ARIMA(1,0,1)   2.345217   0.825445      0.760445
ARIMA(1,0,2)   2.397569   0.816063      0.780614

The model chosen for the ABT series is ARIMA(1,0,1).
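The choice can be checked against the values in Table 3.2 with a short Python sketch. Treating the lowest BIC as the selection rule is an assumption about how the criteria were combined; the helper name `select_model` is hypothetical.

```python
# Values copied from Table 3.2.
candidates = {
    "ARIMA(1,0,0)": {"bic": 2.385271, "adj_r2": 0.814950, "see": 0.782972},
    "ARIMA(1,0,1)": {"bic": 2.345217, "adj_r2": 0.825445, "see": 0.760445},
    "ARIMA(1,0,2)": {"bic": 2.397569, "adj_r2": 0.816063, "see": 0.780614},
}


def select_model(models):
    """Return the name of the model with the smallest BIC."""
    return min(models, key=lambda name: models[name]["bic"])


best = select_model(candidates)
# Cross-check with the other two criteria: highest adjusted R2, lowest SEE.
best_by_r2 = max(candidates, key=lambda name: candidates[name]["adj_r2"])
best_by_see = min(candidates, key=lambda name: candidates[name]["see"])
```

For these three candidates all three criteria point to the same model, so the choice does not depend on which criterion is given priority.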
3.2.3. Performing the forecast
Closing prices of VNINDEX, ABT and ACB are forecast for 8 days, from
11/09/2012 to 20/09/2012.
Figure 3.22. Forecasting
Figure 3.23. Forecast results for VNINDEX.
Table 3.4. Forecast VNINDEX prices compared with actual prices

Date         Forecast price   Actual price   Difference   Error (%)
11/09/2012   390.8433         386.6           4.2433      1.09
12/09/2012   391.1221         388.4           2.7221      0.70
13/09/2012   391.3961         391.4          -0.0039      ~0.00
14/09/2012   391.6655         398.9          -7.2345      1.85
17/09/2012   391.9303         401.8          -9.8697      2.52
18/09/2012   392.1906         394.5          -2.3094      0.59
19/09/2012   392.4465         394.6          -2.1535      0.55
20/09/2012   392.6980         389.3           3.3980      0.87

Assessment: the forecast results are fairly accurate (the error is
very low, from approximately 0% to 2.52%).
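The difference and error columns of Table 3.4 can be reproduced with the sketch below. The thesis does not state the error formula; the percentages in the table appear consistent with |forecast - actual| / forecast x 100, so that is what the code assumes. The function name `evaluate` is hypothetical.

```python
# (date, forecast price, actual price) copied from Table 3.4.
rows = [
    ("11/09/2012", 390.8433, 386.6),
    ("12/09/2012", 391.1221, 388.4),
    ("13/09/2012", 391.3961, 391.4),
    ("14/09/2012", 391.6655, 398.9),
    ("17/09/2012", 391.9303, 401.8),
    ("18/09/2012", 392.1906, 394.5),
    ("19/09/2012", 392.4465, 394.6),
    ("20/09/2012", 392.6980, 389.3),
]


def evaluate(forecast, actual):
    """Return (difference, percentage error relative to the forecast)."""
    diff = forecast - actual
    pct = abs(diff) / forecast * 100  # assumed denominator: the forecast price
    return diff, pct


results = [evaluate(f, a) for _, f, a in rows]
pct_errors = [round(pct, 2) for _, pct in results]
max_error = max(pct_errors)
```

Dividing by the actual price instead would change some entries slightly (for example the 14/09/2012 row), which is why the denominator is called out as an assumption.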
CONCLUSION
The thesis has presented an overview of data mining: its concepts,
techniques and applications. It focuses on time series data mining
techniques applied to a practical problem of current interest, namely
forecasting in general and stock price forecasting in particular.
The thesis has also presented the theoretical basis of real time
series and of the ARIMA model (the tools used in the model and the
model-building procedure), as well as the Eviews software and its use
in carrying out the steps of the ARIMA model for stock forecasting.
The author has largely mastered the procedure for using Eviews to
build an ARIMA model for real time series data and to compute
forecast values for stock data series.
The thesis applied this theory in experiments on three stock series
(the VnIndex index and the tickers ABT and ACB), based on the
historical data of each series (257 past observations), and forecast
the closing prices for the next 10 days. The forecasts were analyzed,
checked and compared with actual prices, and were found to be fairly
accurate and reliable. This indicates that the ARIMA model proposed
for each stock series in the thesis is fairly well suited to
short-term forecasting of stock prices.
Alongside these results, the thesis still has some limitations:
- The estimation and evaluation algorithms remain quite limited.
- Trading sessions can be affected by large external factors such as
investor psychology, the influence of other stock markets, and news
of policy changes, all of which increase the forecast error. The
model's results are therefore mainly of reference value. This is only
a technical analysis model and cannot yet forecast accurately,
because it depends on a single variable, time, whereas the process
being forecast depends on many factors.
Direction for future work: build a multivariate ARIMA model in which
the stock price index depends on several different variables.