VIETNAM NATIONAL UNIVERSITY, HANOI
UNIVERSITY OF ENGINEERING AND TECHNOLOGY

PHAN THỊ THUẬN

EVENT EXTRACTION FROM VIETNAMESE NEWS TEXTS

MASTER'S THESIS IN INFORMATION TECHNOLOGY

HÀ NỘI - 2014



    VIETNAM NATIONAL UNIVERSITY, HANOI
    UNIVERSITY OF ENGINEERING AND TECHNOLOGY

    PHAN THỊ THUẬN

    EVENT EXTRACTION FROM VIETNAMESE NEWS TEXTS

    Field: Information Technology
    Specialization: Information Systems
    Code: 60480104

    MASTER'S THESIS IN INFORMATION TECHNOLOGY

    SCIENTIFIC SUPERVISOR: Dr. NGUYỄN TRÍ THÀNH

    HÀ NỘI - 2014

    ACKNOWLEDGMENTS

    First of all, I would like to express my sincere thanks and deepest gratitude to my supervisor, Dr. Nguyễn Trí Thành. He devotedly instructed, guided, encouraged and helped me throughout the work on this thesis.

    I would also like to thank Assoc. Prof. Dr. Hà Quang Thụy. He generously helped, encouraged and advised me during my research and work at the Knowledge Technology Laboratory (KTLab).

    I thank my colleagues and fellow students at the Knowledge Technology Laboratory (KTLab), University of Engineering and Technology, who supported me a great deal while I carried out this thesis.

    Finally, I would like to thank my family and friends. These loved ones have always stood by me, caring for and encouraging me throughout my studies and the completion of this thesis.

    Thank you all sincerely!

    Hà Nội, 20 June 2014

    Graduate student

    Phan Thị Thuận

    DECLARATION

    I declare that the solution for event extraction from Vietnamese news texts presented in this thesis was carried out by me under the supervision of Dr. Nguyễn Trí Thành.

    I have fully cited the references and the related national and international research works. Every borrowing from related studies is clearly attributed to its source in the list of references of this thesis.

    Hà Nội, June 2014

    Author of the thesis

    Phan Thị Thuận

    CONTENTS

    LIST OF FIGURES
    LIST OF TABLES
    INTRODUCTION
    Chapter 1. INTRODUCTION TO THE TOPIC
    1.1. THE INFORMATION EXTRACTION PROBLEM FOR TEXT
    1.2. OVERVIEW OF EVENTS
    1.2.1. Definition of an event
    1.2.2. Event extraction
    1.3. EVENT EXTRACTION FROM VIETNAMESE NEWS TEXTS
    1.3.1. The accident event extraction problem
    1.3.2. Event detection
    1.3.3. Event extraction
    1.4. SIGNIFICANCE OF THE ACCIDENT EVENT EXTRACTION PROBLEM
    1.4.1. Scientific significance
    1.4.2. Practical significance
    1.5. CONCLUSION
    Chapter 2. APPROACHES
    2.1. RULE-BASED APPROACH
    2.1.1. Lexico-syntactic patterns
    2.1.2. Lexico-semantic patterns
    2.1.3. Form and representation of rules
    2.2. MACHINE LEARNING APPROACH
    2.3. HYBRID APPROACH COMBINING RULES AND MACHINE LEARNING
    2.5. SUMMARY
    Chapter 3. PROPOSED MODEL FOR ACCIDENT EVENT EXTRACTION
    3.1. CHARACTERISTICS OF ACCIDENT EVENTS
    3.2. PROBLEM STATEMENT
    3.3. MODEL FOR DETECTING AND EXTRACTING ACCIDENT EVENTS
    3.3.1. Proposed method
    3.3.2. Model for detecting and extracting accident events
    3.4. SOLVING THE EVENT DETECTION AND ACCIDENT EVENT EXTRACTION PROBLEMS
    3.4.1. Problem 1: detecting accident events (phase 1)
    3.4.1.1. Problem statement
    3.4.1.2. Building the rule set
    3.4.1.3. Building the classification model
    3.4.2. Problem 2: extracting accident events (phase 2)
    3.4.2.1. Problem statement
    3.4.2.2. Extracting the time
    3.4.2.3. Extracting the location
    3.4.2.4. Extracting the casualties
    3.4.2.5. Extracting the vehicle that caused the accident
    3.5. SUMMARY
    Chapter 4. EXPERIMENTS AND EVALUATION
    4.1. ENVIRONMENT AND TOOLS USED IN THE EXPERIMENTS
    4.2. BUILDING THE DATASET
    4.2.1. Data collection
    4.2.2. Data preprocessing
    4.3. EVALUATION OF THE EVENT DETECTION PROCESS
    4.3.1. Evaluation of the data filter
    4.3.2. Evaluation of the classification process
    4.4. EVALUATION OF THE EVENT EXTRACTION PROCESS
    4.4.1. Experiment without the classifier
    4.4.2. Experiment with the classifier
    4.4.3. Remarks
    4.5. ERROR ANALYSIS
    4.5.1. Error analysis of the event detection process
    4.5.2. Error analysis of the event extraction process
    4.6. SOME RESULTS FROM ANALYZING THE EVENTS
    Chart 4.3. Number of accidents by province
    4.7. SUMMARY
    REFERENCES

    LIST OF FIGURES

    Figure 3.1. The accident event detection and extraction process
    Figure 3.2. Event detection component
    Figure 3.3. News title containing words related to means of transport
    Figure 3.4. News title containing no words related to means of transport
    Figure 3.5. Event extraction component
    Figure 4.1. Filter error when the data lies outside the traffic accident domain

    LIST OF TABLES

    Table 3.1. Means of transport
    Table 4.1. Hardware configuration
    Table 4.2. Software tools used
    Table 4.3. Components of a news article
    Table 4.4. Error rate of the data filtering process
    Table 4.5. Classification results
    Table 4.6. Evaluation of extraction: data not passed through the classifier
    Table 4.7. Evaluation of extraction: data passed through the classifier
    Table 4.8. Some errors in the extraction process

    INTRODUCTION

    Information Extraction (IE) is a subfield of Data Mining (DM), and Event Extraction (EE) is an important part of it. In recent years, event extraction has attracted much attention from researchers around the world and has produced many practical results. It can be applied to many data domains, such as economics, culture, healthcare, society (for example, information about traffic accidents), politics, and so on.

    Online newspapers regularly report annual accident figures. According to the e-newspaper http://binhduong.gov.vn, on the morning of 3 January 2013 the Government held an online conference to review traffic order and safety work in 2012 and to set out the tasks for 2013. The conference was chaired by Deputy Prime Minister Nguyễn Xuân Phúc. At the conference, the National Traffic Safety Committee reported that in 2012 there were 36,376 traffic accidents nationwide, killing 9,838 people and injuring 38,060.

    Likewise, according to the e-newspaper http://hanoimoi.com.vn, on 31 December 2013 Deputy Prime Minister Nguyễn Xuân Phúc, Chairman of the National Traffic Safety Committee, chaired an online conference with ministries, sectors and localities. Its purpose was to review traffic safety work in 2013 and to set out the tasks for 2014. According to the Committee's statistics, in 2013 there were 29,385 traffic accidents nationwide, killing 9,369 people and injuring 29,500.

    These annual figures show that the number of accidents remains very high, and so do the numbers of deaths and casualties. At the same time, online newspapers publish accident reports promptly and in considerable detail. Event extraction is a fast-developing field, so it can be used to extract useful information from these reports. The results can then be aggregated into statistics that help administrators manage traffic and help the public travel safely. For these reasons, the author chose to study the topic "Event extraction from Vietnamese news texts", with accident events as the target domain. The thesis is divided into four chapters:

    Chapter 1. Introduction to the topic

    This chapter introduces the event extraction problem in the context of the information explosion on the Internet. It also describes the scientific significance, practical significance and applications of extracting traffic accident events from Vietnamese text.

    Chapter 2. Approaches

    This chapter presents three approaches to event extraction, with remarks on each:
    - the rule-based approach;
    - the machine learning approach;
    - the hybrid approach combining rules and machine learning.
    On that basis, the thesis identifies a suitable approach for accident event extraction.

    Chapter 3. Proposed model for accident event extraction

    This chapter states the accident event extraction problem and describes the overall model. It then states, models in detail and solves the two sub-problems: event detection and event extraction.

    Chapter 4. Experiments and evaluation

    This chapter describes the experiments and evaluates the proposed solution on its two sub-problems, event detection and event extraction. The event detection phase is evaluated with three measures: precision (P), recall (R) and the F1 score. The event extraction phase is evaluated against manual (hand-made) assessment. Statistics and charts of the extracted attributes are also presented.
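    For reference, the three detection measures have the standard definitions below. TP, FP and FN are notation introduced here, not taken from the thesis:
    - TP: accident articles correctly flagged;
    - FP: non-accident articles wrongly flagged;
    - FN: accident articles missed.

```latex
P = \frac{TP}{TP + FP}, \qquad
R = \frac{TP}{TP + FN}, \qquad
F_1 = \frac{2 \cdot P \cdot R}{P + R}
```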

    Conclusion: summarizes the results achieved by the thesis, its limitations, and directions for future work.

    Chapter 1. INTRODUCTION TO THE TOPIC

    This chapter covers the following topics:
    - the information extraction problem;
    - an overview of events;
    - event extraction from Vietnamese news texts (specifically, news about accidents);
    - the scientific and practical significance of the accident event extraction problem.

    1.1. THE INFORMATION EXTRACTION PROBLEM FOR TEXT

    According to Douglas E. Appelt, information extraction (IE) can be regarded as lying between information retrieval (IR) and text understanding (TU) [2]. Information retrieval focuses only on the relevant pieces of information in a text and does not attempt to understand it. Information extraction, in contrast, is also concerned with the relevant events in the text and represents them as templates. Text understanding focuses on a small part of the text (a sentence or a paragraph), whereas information extraction considers the entire content of the text.

    According to Peshkin and Pfeffer [11], information extraction can be defined as the task of filling templates with information drawn from previously unseen data in a predefined domain. Its goal is to extract from text the salient information about events, entities and the relations between them. Information extraction can therefore be viewed as a technique for acquiring knowledge from large data sources on the Internet and representing it as formatted, useful information.

    The information extraction problem for text can be stated as follows:

    - Input: arbitrary text data.

    - Output: useful information in structured form.

    1.2. OVERVIEW OF EVENTS

    The role of event extraction is to extract meaningful information from large data sets. It has attracted considerable attention and research investment from the scientific community.

    The Message Understanding Conferences (MUC) were launched in 1987 with funding from the U.S. Department of Defense (DARPA), and the concept of an event was introduced there for the first time. Many conferences followed, forming the MUC series. Each conference targeted different information, but all shared one feature: the information was extracted from data about crises, typically crime, terrorism or bombings.

    One of MUC's main contributions was template-based information extraction (scenario templates). The organizers specified the templates, and participating teams had to fill them in automatically. The extracted events contained information such as the organization, the participants (people, things, incidents), time, location and quantities. The precision and recall of the systems participating in MUC ranged from 50% to 60% [5].

    The Topic Detection and Tracking (TDT) program, organized from 1997, attracted many university research groups. It was coordinated by the U.S. National Institute of Standards and Technology (NIST) and DARPA to address the detection, tracking and linking of events. Participating groups included:
    - the CMU group from Carnegie Mellon University;
    - the BBN group from BBN Technologies;
    - the DRAGON group from Dragon Systems;
    - the UPENN group from the University of Pennsylvania.

    The main TDT tasks are Story Segmentation, Topic Tracking, Topic Detection, First Story Detection and Link Detection.

    The Automatic Content Extraction (ACE) program of the University of Pennsylvania has also drawn wide interest from the information extraction and event extraction communities. It focuses on English, Chinese and Arabic. The extracted information includes entities, relations between entities, and the events in which the entities participate.

    Information extraction in general, and event extraction in particular, is therefore an important and timely problem that receives a great deal of attention from the scientific community. The following subsections clarify the definition of an event (Section 1.2.1) and of event extraction (Section 1.2.2).

    1.2.1. Definition of an event

    Event extraction was first introduced as an important topic at the Message Understanding Conference (MUC) in 1987 [21]. In MUC, an event is defined as something that has an actor, a time, a place and an effect on its surroundings.

    In the ACE program, George R. Doddington et al. define an event as an action involving participants [22]. ACE divides events into 8 types:
    - LIFE (birth, death);
    - MOVEMENT;
    - TRANSACTION;
    - BUSINESS (economic);
    - CONFLICT;
    - CONTACT (communication);
    - PERSONNEL (hiring, job changes);
    - JUSTICE (legal matters).

    Each type is further divided into subtypes. For example, LIFE has subtypes such as BE-BORN, INJURE and DIE. PERSONNEL has START-POSITION (taking up a position), END-POSITION (leaving a position), NOMINATE and ELECT.

    The studies listed above all agree that an event can be viewed as a template made up of several attributes (elements). Event extraction is concerned with how to fill each attribute with the appropriate information from the source text.

    1.2.2. Event extraction

    What do event extraction and information extraction have in common? Event extraction can be seen as a subfield of information extraction. Information extraction deals with isolated pieces of data (person names, places, numbers, ...). Event extraction is more concerned with the structure and relatedness of the information within an event, which makes it easy for the reader to infer meaningful facts.

    For example: on the morning of 30/4, on Xuân Thủy street in Hà Nội, a serious accident left 2 people on a motorbike badly injured. The initial cause was reported to be a taxi driver who accelerated and crashed straight into the motorbike travelling in the same direction.

    From this example, information extraction yields isolated results such as 30/4, Hà Nội, 2 or taxi. Event extraction instead yields one tuple of attributes describing the event: {30/4, Hà Nội, 2 people injured, taxi}. This tuple is more useful and more complete than the isolated values.
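    The contrast can be made concrete with a small Python sketch. It assumes that an upstream tagger has already produced typed values. The type labels (`DATE`, `LOC`, `CASUALTY`, `VEHICLE`), the `AccidentEvent` record and the helpers are hypothetical names for illustration. Information extraction stops at the flat list; event extraction binds each value to a named slot of one record.

```python
from dataclasses import dataclass, fields
from typing import List, Optional, Tuple


@dataclass
class AccidentEvent:
    """Hypothetical event template: one slot per attribute of interest."""
    date: Optional[str] = None
    location: Optional[str] = None
    casualties: Optional[str] = None
    vehicle: Optional[str] = None


# Hypothetical mapping from entity-type labels to template slots.
SLOT_FOR_TYPE = {"DATE": "date", "LOC": "location",
                 "CASUALTY": "casualties", "VEHICLE": "vehicle"}


def build_event(entities: List[Tuple[str, str]]) -> AccidentEvent:
    """Bind typed values to the slots of a single event record.

    The first value seen for a slot is kept; unknown types are ignored.
    """
    event = AccidentEvent()
    for etype, value in entities:
        slot = SLOT_FOR_TYPE.get(etype)
        if slot is not None and getattr(event, slot) is None:
            setattr(event, slot, value)
    return event


def missing_slots(event: AccidentEvent) -> List[str]:
    """Slots the text did not fill, e.g. to flag incomplete events."""
    return [f.name for f in fields(event) if getattr(event, f.name) is None]


# Typed values for the example above (hand-written, not tagger output).
tagged = [("DATE", "30/4"), ("LOC", "Hà Nội"),
          ("CASUALTY", "2 people injured"), ("VEHICLE", "taxi")]
flat = [value for _, value in tagged]   # plain IE: unconnected values
event = build_event(tagged)             # EE: one structured record
```

    Reading `event.vehicle` or `missing_slots(event)` is only possible once the values are bound to slots. The flat list cannot tell which number belongs to which attribute.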

    In general, event extraction takes unstructured text as input and produces knowledge represented as structured information. This information is very useful for data mining applications such as statistics, monitoring systems and decision support systems. Event extraction can be applied to a specific domain, such as traffic accidents, tour information or epidemics. It yields the information surrounding an event, typically its time, location, quantities, and so on.

    According to Grishman et al., event extraction is difficult because of natural language processing (NLP) issues and the characteristics of the data [21]. Event extraction depends heavily on NLP, in particular on named entity recognition (NER). In addition, the input data for event extraction is highly diverse, which affects the effectiveness of the extraction process.

    1.3. EVENT EXTRACTION FROM VIETNAMESE NEWS TEXTS

    1.3.1. The accident event extraction problem

    As noted in the Introduction, information extraction (IE), and event extraction (EE) in particular, is a subfield of data mining (DM) that has attracted much attention from researchers in recent years. It offers a shortcut to mining knowledge from text.

    The information to be extracted about an accident event includes:
    - the time (hour of the day) and the date (dd/mm/yyyy);
    - the day of the week and the month/year;
    - the location of the accident;
    - the number of casualties;
    - the vehicles involved and the vehicle that caused the accident;
    - the age and occupation of the driver of that vehicle;
    - the terrain where it happened;
    - the cause of the accident, and so on.

    The extraction results serve as input to mining systems. Examples include statistics and visualizations on a map of Vietnam showing:
    - accident hotspots;
    - the times of day with a higher risk of accidents;
    - the months or seasons with more traffic accidents;
    - the age groups most at risk.

    Such results help administrators take measures to reduce the number of accidents, put up warning signs where the risk is high, and educate people about safe road use. They also help people protect themselves so that they do not become victims of accidents.

    The accident event extraction problem is stated as follows:

    Input: an arbitrary article from an online newspaper.

    Output: the information about the accident event it reports (if any).

    The accident event extraction problem is split into two sub-problems.

    The first is accident event detection. It takes an arbitrary online news article as input and must decide whether the article reports an accident event. Its output is the input to the second sub-problem, extraction.

    The second is extraction. The information that could be extracted from an accident event includes the date, the location, the number of casualties, the vehicle that caused the accident, the hour of the day, the age and gender of the driver involved, the terrain, and so on. Within the scope of this thesis, the author focuses on extracting the attribute tuple (time, location, casualties, vehicle that caused the accident).
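    The two-phase decomposition can be sketched as a small Python pipeline in which phase 2 only runs on articles accepted by phase 1. The functions `toy_detect` and `toy_extract` are hypothetical stand-ins, not the components of Chapter 3: one is a single keyword test, the other a date regular expression. Only the four attribute names follow the scope stated above.

```python
import re
from typing import Callable, Dict, Optional

Record = Dict[str, Optional[str]]
ATTRIBUTES = ("time", "location", "casualties", "vehicle")


def run_pipeline(article: str,
                 detect: Callable[[str], bool],
                 extract: Callable[[str], Record]) -> Optional[Record]:
    """Phase 1 (detection) gates phase 2 (extraction).

    Returns None for articles that do not report an accident; otherwise a
    record with exactly the four attributes (missing values stay None).
    """
    if not detect(article):
        return None
    found = extract(article)
    return {name: found.get(name) for name in ATTRIBUTES}


def toy_detect(text: str) -> bool:
    # Hypothetical detector: "tai nạn" is the Vietnamese word for "accident".
    return "tai nạn" in text.lower()


def toy_extract(text: str) -> Record:
    # Hypothetical extractor: fills only the time, via a dd/mm(/yyyy) pattern.
    m = re.search(r"\b\d{1,2}/\d{1,2}(?:/\d{4})?\b", text)
    return {"time": m.group(0) if m else None}
```

    Swapping in a trained classifier for `toy_detect` or a rule set for `toy_extract` does not change `run_pipeline`. This separation is what makes the two sub-problems independently replaceable.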

    1.3.2. Event detection

    The event detection problem asks how to detect whether a text contains an accident event: given an input text, how can we decide whether it reports an accident? According to Grishman et al. [13], event detection is an unsupervised learning process. The authors use keywords to decide whether a text contains an epidemic event. The two keywords they use are "outbreak of" and "died from".
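    A minimal Python sketch of keyword-driven detection in this spirit follows. The two epidemic keywords come from the text above. The Vietnamese accident keywords and the function names are hypothetical. Plain substring matching is a simplification: it ignores word segmentation and negation.

```python
from typing import Iterable, List


def matched_keywords(text: str, keywords: Iterable[str]) -> List[str]:
    """Return the trigger phrases found in the text.

    Case-insensitive substring match; no labelled training data is involved.
    """
    lowered = text.lower()
    return [k for k in keywords if k.lower() in lowered]


def contains_event(text: str, keywords: Iterable[str]) -> bool:
    """A document is flagged as reporting the event if any keyword occurs."""
    return bool(matched_keywords(text, keywords))


# The two keywords cited above for epidemic events.
EPIDEMIC_KEYWORDS = ["outbreak of", "died from"]

# Hypothetical Vietnamese keywords for the accident domain (illustrative only):
# "tai nạn" = accident, "va chạm" = collision, "tông vào" = crash into.
ACCIDENT_KEYWORDS = ["tai nạn", "va chạm", "tông vào"]
```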

    According to Doan et al. [14], event detection can be treated as a supervised learning process. In their study, the authors classify documents with a classifier trained on a labelled data set. After training, the classifier decides whether an input text contains an epidemic event.
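    The supervised alternative can be sketched with a multinomial Naive Bayes classifier, a common baseline for document classification. It is a stand-in chosen here, not necessarily the classifier used by Doan et al. The four training sentences and the labels `accident` and `other` are hypothetical. Whitespace tokenization also ignores Vietnamese word segmentation.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    # Whitespace split yields Vietnamese syllables; no word segmentation.
    return text.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs: List[Tuple[str, str]]) -> "NaiveBayes":
        self.doc_counts: Counter = Counter(label for _, label in docs)
        self.word_counts: Dict[str, Counter] = defaultdict(Counter)
        for text, label in docs:
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.n_docs = len(docs)
        return self

    def predict(self, text: str) -> str:
        words = [w for w in tokenize(text) if w in self.vocab]  # skip unseen words
        best_label, best_score = "", -math.inf
        for label, n in self.doc_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n / self.n_docs)  # log prior
            score += sum(math.log((counts[w] + 1) / denom) for w in words)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical labelled data, far too small for real use.
train = [
    ("xe tải va chạm xe máy hai người bị thương", "accident"),
    ("xe khách lật trên quốc lộ một người tử vong", "accident"),
    ("giá vàng tăng mạnh trong phiên sáng nay", "other"),
    ("đội tuyển thắng trận chung kết", "other"),
]
model = NaiveBayes().fit(train)
```

    Unlike the keyword approach, no trigger phrase is written by hand. The decision comes from word statistics in the labelled examples, so the quality of the detector depends on the size and coverage of the annotated corpus.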

    The studies of Grishman et al. and of Doan et al. thus offer two different ways of detecting epidemic events. Either method can be adapted to detect traffic accident events. The first requires building a suitable keyword set for accident events; the second requires a suitable labelled data set.

    1.3.3. Event extraction

    The event extraction problem asks how to extract the attributes of an event. Among the many methods for event extraction, rule-based methods (an unsupervised approach) were used very early [13]. In this approach, the rules are usually derived from a survey of the data and then used to extract the attributes of an event.

    Another approach uses machine learning and NLP techniques. It typically uses named entity recognition (NER) to obtain the basic attributes of an event, such as the time, location and person names. It then combines these attributes into an event [14].

    In short, event extraction in general, and accident event extraction in particular, can be divided into two sub-problems: event detection and event extraction. Chapter 3 describes in detail the techniques the author applies to solve them.

    1.4. SIGNIFICANCE OF THE ACCIDENT EVENT EXTRACTION PROBLEM

    1.4.1. Scientific significance

    Event extraction interests many researchers. The results of accident event extraction provide a basis for data mining tasks such as statistics, trend prediction, monitoring systems and decision support.

    1.4.2. Practical significance

    The results of accident event extraction are input data for statistics such as:
    - the times of day when accidents occur most often (morning, morning rush hour, noon, evening rush hour or night);
    - the months of the year with the most accidents (festival season, holiday season or rainy season);
    - the vehicles most often involved (buses, trucks, taxis, coaches, ...);
    - the age of drivers (18-20, over 60, or other age groups);
    - the occupation of drivers (freelancers, motorbike-taxi drivers, civil servants, ...);
    - the terrain where accidents happen (bends, intersections, slopes, slippery roads, bumpy roads, expressways, ...).

    These statistics can then be visualized on a map to show the sensitive locations where accidents are frequent.

    This gives people more knowledge for road use, such as when and on which stretches of road accidents are frequent. That knowledge can help them avoid the risk of accidents.

    It also helps users who want to search for information about traffic accidents.

    Moreover, the results can give administrators an objective view of the traffic accident situation and help them take preventive measures, such as:
    - repairing and upgrading infrastructure;
    - raising public awareness of road safety;
    - putting up warning signs where the risk of accidents is high, reminding road users to slow down and watch the road carefully.

    Statistics derived from accident event extraction also allow administrators to compare the scale and severity of accidents across periods. From this comparison they can make an overall assessment of the direction in which the accident situation is developing.

    1.5. CONCLUSION

    This chapter has introduced the event extraction problem, focusing on the basic concepts of event extraction in general and of accident event extraction in particular. It has also introduced the two basic sub-problems of accident event extraction: event detection and event extraction. Finally, it has described the scientific and practical significance of the problem and the difficulties in solving it. Chapter 2 presents the approaches to accident event detection and extraction.

    Chapter 2. APPROACHES

    Hogenboom F. et al. [4] survey three basic approaches suitable for event extraction from text:
    - the rule-based approach, also called the knowledge-driven approach;
    - the machine learning approach, also called the data-driven approach;
    - a combination of the two, also called the hybrid approach.

    The first, knowledge-driven approach usually relies on the knowledge of domain experts, typically linguists and experts in the data domain. These experts must have and understand the data in order to write the rules, and the resulting rule sets are typically used to extract the attributes of an event.

    The second, data-driven approach relies on knowledge learned from a large data set to extract the information of an event, usually with statistical methods and mathematical models. Named entity recognition (NER) is a typical example.

    The last approach combines the two.

    This chapter presents three approaches to traffic accident event extraction: the rule-based approach, the machine learning approach, and the hybrid approach combining rules and machine learning. At the end, the author gives remarks and motivates the solution adopted in Chapter 3. Each approach is described in Sections 2.1, 2.2 and 2.3.

    2.1. RULE-BASED APPROACH

    The rule-based approach is also called the knowledge-driven approach. Its rule set is usually produced from the knowledge of domain experts, typically linguists and experts in the data domain, who must have and understand the data before writing the rules.

    2.1.1. Lexico-syntactic patterns

    Lexico-syntactic rules, also called lexico-syntactic patterns, were among the earliest techniques used for event extraction. The patterns are written by domain experts (expert knowledge) in the form of rules [4]. Typically, the rules are expressed as regular expressions.

    Lexico-syntactic rules combine lexical items (words and characters) and syntactic information with regular expressions. Once the regular expressions are built, they are matched against the input text to extract the values of the corresponding attributes. Sometimes the rules take an even simpler form: keywords. Lexico-syntactic rule sets have been used for event extraction in [7], [5], [6]. For example:
    - Nishihara et al. use three keywords, place, object and action, to represent an event extracted from blogs [10].
    - In the biomedical domain, Yakushiji et al. use a parser combined with a grammar to identify relations and events [16].
    - In political news, Aone et al. use lexico-syntactic rules to extract event information [24].

    Lexico-syntactic rules locate the parameters of an event in the text but do not capture the meaning of the text.
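    A minimal Python sketch of lexico-syntactic patterns for Vietnamese accident news follows. Each pattern mixes literal cue words with regular-expression structure: `ngày` ('day'), `người` ('people'), `bị thương` ('injured'), `tử vong` ('dead'). Matching the patterns against the text fills the attribute slots. The patterns and the sample sentence are hypothetical examples, not the thesis's rule set.

```python
import re
from typing import Dict, List

# Hypothetical patterns; each captures the value(s) for one attribute.
PATTERNS: Dict[str, re.Pattern] = {
    # "ngày 30/4" or "ngày 30/4/2014"
    "date": re.compile(r"ngày\s+(\d{1,2}/\d{1,2}(?:/\d{4})?)", re.IGNORECASE),
    # "<number> người ... bị thương|tử vong|chết", at most 3 words in between
    "casualties": re.compile(
        r"(\d+)\s+người(?:\s+\S+){0,3}?\s+(bị thương|tử vong|chết)",
        re.IGNORECASE),
    # closed list of vehicle names
    "vehicle": re.compile(r"\b(xe tải|xe khách|xe máy|ô tô|tắc-xi)\b",
                          re.IGNORECASE),
}


def apply_patterns(text: str) -> Dict[str, List[str]]:
    """Match every pattern against the text.

    The captured groups of each match are joined into one value.
    """
    results: Dict[str, List[str]] = {}
    for slot, pattern in PATTERNS.items():
        results[slot] = [" ".join(g for g in m.groups() if g)
                         for m in pattern.finditer(text)]
    return results


sample = ("Ngay sáng ngày 30/4, trên đường Xuân Thủy, Hà Nội xảy ra vụ tai nạn "
          "làm 2 người trên xe máy bị thương nặng. Nguyên nhân bước đầu được "
          "cho là do tài xế tắc-xi tăng tốc.")
```

    On `sample`, the patterns return one date (`30/4`), one casualty value (`2 bị thương`) and two vehicles (`xe máy`, `tắc-xi`). This also shows the limitation stated above: the patterns find both vehicles but cannot tell which one caused the accident.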

    When rules are used for event extraction, it is sometimes necessary to extract concepts with a special meaning, or the relations between the extracted components. Lexico-syntactic rules cannot do this. The rule-based approach usually handles it with lexico-semantic patterns, described in Section 2.1.2.

    2.1.2. Lexico-semantic patterns

    Lexico-semantic rules address the need to extract concepts with a special meaning and the relations between extracted components. They are not just words expressed as regular expressions, but words together with the relations between them.
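    A minimal Python sketch of the difference follows. Instead of listing word sequences, the pattern is written over concepts (`VEHICLE COLLIDE VEHICLE`), and a small lexicon maps surface phrases to those concepts. One rule can then also assign roles to the two vehicles. The lexicon, concept names and role names are hypothetical; a real system would draw them from an ontology or gazetteer.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical concept lexicon: Vietnamese phrase -> semantic class.
LEXICON: Dict[str, str] = {
    "xe tải": "VEHICLE", "xe máy": "VEHICLE",
    "xe khách": "VEHICLE", "ô tô": "VEHICLE",
    "tông vào": "COLLIDE", "đâm vào": "COLLIDE", "va chạm với": "COLLIDE",
}
MAX_LEN = max(len(k.split()) for k in LEXICON)  # longest entry, in words


def to_concepts(text: str) -> List[Tuple[str, str]]:
    """Greedy longest-match lookup; returns (concept, phrase) in text order.

    Words that start no lexicon entry are skipped.
    """
    tokens = text.lower().replace(",", " ").replace(".", " ").split()
    found: List[Tuple[str, str]] = []
    i = 0
    while i < len(tokens):
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in LEXICON:
                found.append((LEXICON[phrase], phrase))
                i += n
                break
        else:  # no lexicon entry starts at position i
            i += 1
    return found


def match_collision(text: str) -> Optional[Dict[str, str]]:
    """Pattern over concepts: VEHICLE COLLIDE VEHICLE -> (agent, patient)."""
    seq = to_concepts(text)
    for (c1, w1), (c2, _), (c3, w3) in zip(seq, seq[1:], seq[2:]):
        if (c1, c2, c3) == ("VEHICLE", "COLLIDE", "VEHICLE"):
            return {"agent_vehicle": w1, "patient_vehicle": w3}
    return None
```

    Adding a phrase such as "xe buýt" (bus) to `LEXICON` extends the same rule to buses without writing a new pattern. This is the practical advantage of concept-level rules over purely lexical ones.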

    Lexico-semantic rules have been used for many purposes and in many domains. For example:
    - Li Fang et al. use lexico-semantic rules to extract information from the stock market [25].
    - Cohen et al. [17] use concept recognizers in the biomedical domain to extract biomedical information from a data set.
    - Capet et al. use semantic patterns to extract events for an early warning system [27].
    - Vargas-Vera and Celjuska propose a framework for recognizing events in the news stories of the Knowledge Media Institute (KMi) [26].

    Event extraction from unstructured text can be applied in many domains, such as finance, stock markets, biomedicine and legal news. The picture would be incomplete without a closer look at the form and representation of rules for entity extraction, which Section 2.1.3 presents.

    2.1.3. Form and Representation of Rules

    According to Sunita Sarawagi's survey Information Extraction [1], a basic rule has the form "Contextual Pattern → Action".
    - A contextual pattern consists of one or more labelled patterns. These capture properties of one or more entities and the context in which they appear in the text.
    - A labelled pattern is a pattern, roughly a regular expression, defined over features of the tokens in the text, together with an optional label.
    - The features can be properties of the token itself, of its context, or of the document in which the token appears.

    Most rule-based systems are cascaded: rules are applied in several phases, and each phase annotates its input with labels that serve as input features for later phases. For example, an extractor for people's contact addresses can be built with two phases of rules:
    - The first phase labels tokens with entity labels such as person names, geographic locations (street names, city names) and e-mail addresses.
    - The second phase identifies address blocks, using the output of the first phase as additional features.
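    The two-phase cascade can be sketched in Python as follows. Stage 1 labels individual tokens using hypothetical dictionaries and an e-mail regular expression. Stage 2 decides only from the stage-1 labels when it looks for an address block. The dictionaries, the 12-token window and the block rule are illustrative assumptions, not Sarawagi's actual rules.

```python
import re
from typing import List

# Hypothetical stage-1 resources.
TITLE_DICT = {"Dr.", "Prof.", "Mr.", "Ms."}
CITY_DICT = {"Hanoi", "Boston"}
EMAIL_RE = re.compile(r"^[\w.+-]+@[\w-]+(?:\.[\w-]+)+$")


def stage1(tokens: List[str]) -> List[str]:
    """Stage 1: give every token an entity label (or 'O')."""
    labels: List[str] = []
    for tok in tokens:
        prev = labels[-1] if labels else "O"
        if EMAIL_RE.match(tok):
            labels.append("EMAIL")
        elif tok in CITY_DICT:
            labels.append("CITY")
        elif tok in TITLE_DICT:
            labels.append("TITLE")
        elif tok[:1].isupper() and prev in ("TITLE", "PERSON"):
            labels.append("PERSON")  # capitalized word right after a title/name
        else:
            labels.append("O")
    return labels


def stage2(tokens: List[str], labels: List[str], window: int = 12) -> List[str]:
    """Stage 2: the stage-1 labels are the input features.

    An address block starts at a TITLE and ends at the first EMAIL within
    `window` tokens, provided a CITY occurs in between.
    """
    blocks: List[str] = []
    for i, lab in enumerate(labels):
        if lab != "TITLE":
            continue
        for j in range(i + 1, min(i + 1 + window, len(labels))):
            if labels[j] == "EMAIL":
                if "CITY" in labels[i:j]:
                    blocks.append(" ".join(tokens[i:j + 1]))
                break
    return blocks


tokens = "Write to Dr. Le Minh , 144 Xuan Thuy , Hanoi , minh@example.vn today".split()
labels = stage1(tokens)
```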

    1/. Features of Tokens

    Each token in a sentence is usually associated with a set of features obtained from one or more of the following:

    - The string representing the token.

    - The orthography type of the token, such as capitalized word, lowercase word, mixed-case word, number, special symbol, space, punctuation mark, ...

    - The part of speech of the token.

    - The list of dictionaries in which the token appears. This can often be refined to indicate whether the token matches the start, end or middle word of a dictionary entry. For example, a token like "New" matches the first word of an entry in a dictionary of city names, so it is associated with the feature "Dictionary-Lookup = start of city".

    - Annotations attached by earlier processing steps.
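    The token features listed above can be sketched in Python as a function that maps one token to a feature dictionary. The orthography categories, the feature names and the small city gazetteer are assumptions for illustration. Part-of-speech tags and annotations from earlier steps are left out because they require a tagger and a preceding pipeline.

```python
from typing import Dict, Iterable, List, Set, Union


def orthography(tok: str) -> str:
    """Orthographic type of a token."""
    if tok.isdigit():
        return "number"
    if tok.isalpha():
        if tok.isupper():
            return "all_caps"
        if tok.islower():
            return "lowercase"
        if tok[0].isupper() and tok[1:].islower():
            return "capitalized"
        return "mixed_case"
    if tok.isspace():
        return "space"
    if not any(c.isalnum() for c in tok):
        return "punctuation"
    return "mixed"


def dictionary_lookup(tok: str, entries: Iterable[str], name: str) -> Set[str]:
    """Refined dictionary feature.

    Reports whether the token is the start, a middle word or the end of
    some (possibly multi-word) dictionary entry.
    """
    feats: Set[str] = set()
    for entry in entries:
        words = entry.split()
        if tok == words[0]:
            feats.add(f"start of {name}")
        if tok in words[1:-1]:
            feats.add(f"middle of {name}")
        if len(words) > 1 and tok == words[-1]:
            feats.add(f"end of {name}")
    return feats


CITIES = ["New York", "New Delhi", "Hà Nội"]  # hypothetical gazetteer


def token_features(tok: str) -> Dict[str, Union[str, List[str]]]:
    # POS tags and earlier annotations are omitted from this sketch.
    return {
        "string": tok,
        "orthography": orthography(tok),
        "dictionary_lookup": sorted(dictionary_lookup(tok, CITIES, "city")),
    }
```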

    Rules to Identify a Single Entity:

    A rule for recognizing a single entity consists of three kinds of patterns:

    - An optional pattern capturing the context before the start of the entity.

    - A pattern matching the tokens of the entity.

    - An optional pattern capturing the context after the end of the entity.

  • 13

    For example, a pattern that identifies person names of the form "Dr. Yair Weiss" consists of three parts: a title token listed in a dictionary of titles (with entries such as Prof, Dr and Mr), a dot, and two capitalized words:

    ({Dictionary-Lookup = Titles} {String = "."} {Orthography type = capitalized word}{2}) → Person Names

    Each condition inside curly braces is a condition on a single token. It may be followed by a number giving how many times that token is repeated. The following rule marks every number that follows the prepositions "by" and "in" as a Year entity:

    ({String = "by"} | {String = "in"}) ({Orthography type = Number}):y → Year = :y

    This rule has two patterns. The first captures the context in which a Year entity occurs, and the second captures the properties of the tokens that form the year. Another rule, for finding company names of the form "The XYZ Corp." or "ABC Ltd.", is:

    ({String = "The"}? {Orthography type = All capitalized} {Orthography type = Capitalized word, DictionaryType = Company end}) → Company name
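    To make the notation concrete, the person-name rule and the year rule above can be hand-coded in Python. This is only a sketch. A real rule engine interprets the rule syntax instead of hard-coding each rule. The tokenizer, the title dictionary and the function names below are assumptions made for the example.

```python
import re
from typing import List, Set

TOKEN_RE = re.compile(r"\w+|[^\w\s]")  # words, or single punctuation marks


def is_capitalized(tok: str) -> bool:
    return tok[:1].isupper() and tok[1:].islower()


def find_person_names(text: str, titles: Set[str]) -> List[str]:
    """({Dictionary-Lookup=Titles} {String="."} {Orthography=capitalized word}{2}) -> Person Names"""
    toks = list(TOKEN_RE.finditer(text))
    names = []
    for i in range(len(toks) - 3):
        t = [m.group() for m in toks[i:i + 4]]
        if t[0] in titles and t[1] == "." and is_capitalized(t[2]) and is_capitalized(t[3]):
            # Return the original character span, e.g. "Dr. Yair Weiss".
            names.append(text[toks[i].start():toks[i + 3].end()])
    return names


def find_years(text: str) -> List[str]:
    """({String="by"} | {String="in"}) ({Orthography=Number}):y -> Year = :y"""
    toks = TOKEN_RE.findall(text)
    return [toks[i + 1] for i in range(len(toks) - 1)
            if toks[i] in ("by", "in") and toks[i + 1].isdigit()]


print(find_person_names("We thank Dr. Yair Weiss for his comments.", {"Prof", "Dr", "Mr"}))
# ['Dr. Yair Weiss']
print(find_years("Published in 1998 and revised by 2003."))
# ['1998', '2003']
```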

    2/. Rules to Mark Entity Boundaries

    For some entity types, long ones such as book titles in particular, it is more efficient to define separate rules that mark the start and the end of an entity. These rules fire independently, and all tokens between a start marker and an end marker are taken as the entity. Viewed another way, each such rule inserts a single SGML tag into the text, either a begin tag or an end tag. Inconsistencies, such as two begin markers before a single end marker, must be handled by a separate consolidation policy. For example, the following rule inserts a <journal> tag to mark the start of a journal name in a citation record:

    ({String = "to"} {String = "appear"} {String = "in"}):jstart ({Orthography type = Capitalized word}{2-5}) → insert <journal> after :jstart

    Many successful rule-based extraction systems are built on such rules, for example (LP)² [60], STALKER [156], Rapier [43] and WIEN [121, 23].
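    The start-marker rule for journal names can be approximated with one regular expression, as in the following Python sketch. It assumes that a capitalized word is an uppercase letter followed by lowercase letters. It inserts only the start tag, so an end rule and a consolidation policy would still be needed. The rule only decides where <journal> is inserted, so checking for at least two capitalized words after "to appear in" gives the same decision as the {2-5} repetition.

```python
import re

# "to appear in", followed (lookahead) by at least two capitalized words.
JSTART_RULE = re.compile(r"\bto appear in\s+(?=[A-Z][a-z]+\s+[A-Z][a-z]+)")


def insert_journal_start(citation: str) -> str:
    """Insert a <journal> start tag right after the matched context :jstart."""
    return JSTART_RULE.sub(lambda m: m.group(0) + "<journal>", citation)


print(insert_journal_start("J. Doe. Fast parsing. to appear in Pattern Recognition Letters, 2014."))
# J. Doe. Fast parsing. to appear in <journal>Pattern Recognition Letters, 2014.
```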

    3/. Rules for Multiple Entities

    Some rules take the form of regular expressions with multiple slots, each slot representing a different entity, so that a single rule recognizes several entities at once. Such rules are most useful for record-oriented data. For example, the rule-based system WHISK [18] targets extraction from structured records such as medical records, equipment maintenance logs and classified ads. The following rule, rephrased from [18], extracts two entities, the number of bedrooms and the rent, from an apartment rental ad:

    ({Orthography type = Digit}):Bedrooms ({String = "BR"}) ({}*) ({String = "$"}) ({Orthography type = Number}):Price → Number of Bedrooms = :Bedrooms, Rent = :Price
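    A multi-slot rule maps naturally onto a regular expression with one named group per slot. The Python sketch below approximates the rental-ad rule. It assumes that "BR" is written in upper case, that the bedroom count is a single digit, and that the price may contain thousands separators. It is not WHISK's own rule language.

```python
import re
from typing import Dict, List

# Digit:Bedrooms  "BR"  any tokens (lazy)  "$"  Number:Price
RENTAL_RULE = re.compile(r"\b(?P<bedrooms>\d)\s*BR\b.*?\$(?P<price>\d[\d,]*)")


def extract_rentals(ad_text: str) -> List[Dict[str, int]]:
    """Apply the two-slot rule repeatedly; each match yields one record."""
    return [
        {"bedrooms": int(m.group("bedrooms")),
         "rent": int(m.group("price").replace(",", ""))}
        for m in RENTAL_RULE.finditer(ad_text)
    ]


print(extract_rentals("2 BR apt, garden, $1200. 3 BR house near park $1,950."))
# [{'bedrooms': 2, 'rent': 1200}, {'bedrooms': 3, 'rent': 1950}]
```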

    4/. Alternative Forms of Rules

    Many state-of-the-art rule-based systems allow arbitrary programs written in procedural languages such as Java and C++ in place of both the pattern part and the action part of a rule. For example, GATE [19] supports Java programs in the action part of a rule, in place of its custom rule language JAPE. This is a powerful capability. It lets the action part of a rule access the features used in the pattern part and use them to insert new fields for the annotated string. For example, the action part could insert the standardized form of a string taken from a dictionary. These new fields can then serve as additional features for later rules in the pipeline. Similarly, in the Prolog-based declarative formulations of [20], any procedural code can be substituted as the pattern matcher for any subset of entity types.

    In general, knowledge-based systems were originally built with the rule-based approach. This approach has two main advantages. First, it needs less training data than the data-driven approach. Second, it makes it possible to write good regular expressions for extraction based on syntactic, lexical and semantic components. The rule-based approach is well suited to extracting temporal information such as "rạng sáng hôm qua" (early yesterday morning) or "giữa trưa hôm nay" (at noon today). It gives very high precision, because the rules are built to extract specific information, but low recall. It is therefore a good fit for problems where precision matters most.

    The rule-based approach also has drawbacks. The rule builder must act as a domain expert: they must understand the data well and know the language, its vocabulary and its syntax. Moreover, a rule set is usually built to extract specific information, so moving to a different data domain means building a new rule set. Building rule sets can be very time-consuming and costly.

    2.2. THE MACHINE LEARNING APPROACH

    This approach is sometimes also called the data-driven approach. Machine-learning approaches are widely used in natural language processing applications, where large training sets are used to fit models to linguistic phenomena [9]. They are usually based on probabilistic models, information theory and linear algebra. Commonly used basic techniques include Term Frequency-Inverse Document Frequency (TF-IDF), n-grams and clustering.
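    To illustrate the first of these techniques, the following Python sketch computes TF-IDF weights for a few toy documents tokenized into syllables. It uses one common variant: term frequency normalized by document length, and idf = log(N/df). Other weighting and smoothing schemes exist.

```python
import math
from collections import Counter
from typing import Dict, List


def tf_idf(docs: List[List[str]]) -> List[Dict[str, float]]:
    n_docs = len(docs)
    # Document frequency: the number of documents each term occurs in.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


docs = [["xe", "khách", "đâm", "xe", "tải"],
        ["giá", "xe", "tăng"],
        ["xe", "tải", "lật"]]
w = tf_idf(docs)
print(round(w[0]["khách"], 4), w[0]["xe"])  # 0.2197 0.0
```

    Under this variant, a term that occurs in every document, such as "xe" here, gets weight 0.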

    There are many examples of applying the data-driven approach to extracting event information. In 2009, Okamoto et al. [9] used a framework based on hierarchical clustering techniques to detect local events. Because clustering can give good results for event extraction, Liu M. et al. [10] combined weighted undirected bipartite graphs with clustering to extract key entities and significant events from daily news. Tanev et al. [13] also used clustering techniques, to extract violent events and disasters for a monitoring system.

    The data-driven approach does not require its builder to have linguistic or domain expertise. It does, however, require a large amount of training data, from which probabilistic models approximating the data are built. It has two advantages. First, no linguists or domain experts need to be involved. Second, a trained model can be reused across different data domains.

    The data-driven approach also has drawbacks. First, in event extraction it does not handle semantic issues: it only discovers relations present in the data set. Second, it needs a large amount of data to train the model, and in some cases labeling that data is time-consuming and costly. Third, the approach rests on statistical models, so poorly prepared training data can lead to poor extraction results.

    2.3. THE HYBRID APPROACH: COMBINING RULES AND MACHINE LEARNING

    The approach that combines rules and machine learning (the hybrid approach) is commonly used in event extraction. Most knowledge-driven systems are complemented with data-driven methods, which lets them overcome the weaknesses of purely knowledge-based methods. For example, Piskorski et al. [12] used bootstrapping techniques in a system that extracts violence-related events from online news with high precision and recall.

    Morik [8] combined semantic rules with Conditional Random Fields (CRFs), represented as undirected graphs, to extract events from the plenary sessions of the German parliament. Here, the author addresses a limitation of supervised learning algorithms by working with clusters. Lee et al. [8] used an ontology-based fuzzy approach to extract events from Chinese news, relying on grammar-based statistics and part-of-speech tagging. Chun et al. [3] extracted biomedical events using syntactic rules combined with co-occurrences, so their method can also be regarded as hybrid.

    In this thesis, the author uses the hybrid approach for three reasons.

    First, the input contains a large volume of articles to be classified into the traffic-accident domain. The most suitable first step is therefore to filter them with syntactic rules, which considerably reduces the amount of input to the event-detection step.

    Second, an accident event has four pieces of information: time, location, number of casualties and type of vehicle causing the accident. The time, the casualties and the vehicle type in particular are sometimes stated vaguely or without detail, for example "giữa trưa" (at noon), "đúng lúc tan tầm" (right at rush hour), "2 người thiệt mạng" (2 people killed), "làm chết 1 người" (killing 1 person) or "xe khách đâm vào xe tải" (a coach crashed into a truck). The author therefore uses semantic rules to extract this information.

    Third, the system includes classification and named-entity recognition, both of which are handled well by data-driven statistical methods.

    2.5. SUMMARY

    This chapter presented several approaches to the problem and the advantages and disadvantages of each. The author concludes that combining rules and machine learning is an appropriate way to solve the accident event extraction problem. The problem statement, the model and the solution method are presented in detail in Chapter 3.

    Chapter 3. A PROPOSED MODEL FOR EXTRACTING ACCIDENT EVENTS

    This chapter analyzes and clarifies the accident event extraction problem. It examines the characteristics of accident events, states the problem and proposes a model. It then describes in detail how the two key problems of the thesis are solved: accident event detection and accident event extraction.

    3.1. CHARACTERISTICS OF ACCIDENT EVENTS

    A survey of the accident-news domain shows that accident event detection must distinguish clearly between reports of a specific traffic accident and general news on the topic of traffic accidents. Reports of a specific accident are what this thesis targets, for example "on the morning of 25/5 a terrible accident occurred on National Highway 1A". By contrast, articles titled "How to reduce the number of traffic accidents" or "Shocking death toll from accidents in the first half of 2014" are general traffic-accident news, not reports of a specific accident.

    The same survey shows that an accident event may contain the time of the accident, the location, the number of casualties, the vehicle causing the accident, the cause of the accident, the age of the driver, the time of day at which the accident happened, and so on. Of these, the time, location, number of casualties and vehicle causing the accident are of particular interest. They are the information extracted for each accident event.

    3.2. PROBLEM STATEMENT

    The general problem is event extraction from Vietnamese news text. This thesis focuses on extracting events from traffic-accident news, from now on called accident event extraction. The author stresses that an accident event is distinct from news that concerns traffic accidents without reporting one (for example, a report on a discussion about how to reduce traffic accidents).

    This chapter addresses the extraction of traffic-accident information from Vietnamese news articles taken from Vietnamese online newspapers. The goal is to extract, from unstructured text, the information about an accident event: the time and place of the accident, the number of casualties (deaths and injuries), the vehicle causing the accident, the age of the person causing the accident, the terrain, and the cause of the accident. The problem is stated as follows:

    Input: a news article from an online newspaper.

    Output: whether the input article reports a traffic accident event and, if it does, the extracted information about that accident.

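    The information in a traffic-accident news article (from now on, an accident article) is defined as a tuple E with four components: Time, Location, Casualties and Vehicle causing the accident. Formally:

    $$E = \langle \text{Time}, \text{Location}, \text{Casualties}, \text{Vehicle} \rangle \qquad (3.1)$$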

    Time: the time at which the accident occurred.

    Location: the place where the accident occurred.

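    Casualties: the number of people killed and the number of people injured, represented as a list with two fields, deaths and injured. For example, consider "xế hộp do say rượu đâm trực tiếp vào nhà người dân, làm cho 2 người bị thương nặng, tài xế chết ngay tại chỗ" (a drunk-driven car crashed straight into a house, seriously injuring 2 people; the driver died on the spot). The casualties are extracted as:

    Deaths | Injured
    1      | 2

    In another example, "xe khách đâm thẳng vào xe tải bên đường, làm 3 hành khách bị thương" (a coach crashed straight into a truck parked by the road, injuring 3 passengers), the casualties are extracted as:

    Deaths | Injured
    0      | 3

    Vehicle causing the accident: only the type of the vehicle that caused the accident is extracted.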

    For example, consider the accident event E = ⟨12/7/2013, Quốc lộ 1A, (0, 3), xe máy⟩. From these four basic pieces of information, we can easily infer that on 12 July 2013 an accident occurred on National Highway 1A (Quốc lộ 1A), injuring 3 people riding motorbikes (xe máy).

    As defined above, the input of the model is news articles from online newspapers. The author chose online newspapers as input for three reasons. First, they contain a wealth of information. Second, the information is reliable and up to date. Third, collecting data from online newspapers is fairly easy. The data are therefore both diverse and current.

    The model is split into two sub-problems. The first, phase 1 (accident event detection), decides whether an article contains information about a specific accident. The second, phase 2 (accident event extraction), takes the articles that phase 1 has identified as accident events and extracts the information about each event.

    3.3. A MODEL FOR DETECTING AND EXTRACTING ACCIDENT EVENTS

    3.3.1. Proposed method

    Chapter 2 presented three approaches: the rule-based approach, the machine-learning approach, and the hybrid approach that combines rules with machine learning. This section develops the idea of combining rules and machine learning for the accident event extraction problem.

    Phase 1, accident event detection: the input of this phase is news articles from online newspapers, which are numerous and cover many different topics. The author therefore splits this phase into two steps. Step 1 uses rules to keep the articles in the traffic-accident domain, and step 2 uses a classifier to identify the articles that contain an accident event. Accident event detection thus combines rules and machine learning.

    Phase 2, accident event extraction: this phase extracts the time of the accident, its location, the number of casualties and the vehicle causing it.

    - The location is extracted with named-entity recognition (NER) together with an ontology or a dictionary.

    - Time information may be in a standard format (dd/mm/yyyy) or a non-standard one, such as "giữa trưa" (at noon), "nửa đêm" (midnight) or "giờ tan tầm" (rush hour), so rules are used to extract it.

    - The number of casualties (deaths and injuries) is extracted with named-entity recognition and rules.

    - For the vehicle causing the accident, the author builds a dictionary of means of transport and then uses rules to match against it.

    Solving the problems of both phases therefore combines rules and machine learning (here, classification and named-entity recognition). The model of both phases is presented in detail in Section 3.3.2, and the solutions to the two problems in Section 3.4.

    3.3.2. Model for detecting and extracting accident events

    To solve the problems of both phases described in Section 3.3.1, the author proposes a model for detecting and extracting accident events with four main components:

    Figure 3.1: The accident event detection and extraction process

    Data collection: automatically collects news articles from online newspapers on the Internet and passes them to the pre-processing step.

    Data pre-processing: removes the HTML tags from the collected data to obtain plain text, which is then passed to the accident event detection step.

    Event detection: takes the output of pre-processing and uses rules to keep the data in the traffic-accident domain. It then uses machine learning to classify whether each article reports a traffic accident. Articles that do not are discarded; those that do are passed to the event extraction step.

    Event extraction: extracts the characteristic information of the accident, namely the time, location, number of casualties and vehicle causing the accident.
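    The four components can be chained as a simple sequence of functions. In the Python sketch below, only pre-processing is implemented concretely, using the standard library's html.parser. The domain filter, the classifier and the extractor are passed in as functions because they are described in Section 3.4. All names are hypothetical, and data collection is not shown.

```python
from html.parser import HTMLParser
from typing import Callable, Dict, List


class _TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: List[str] = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)


def strip_html(html: str) -> str:
    """Pre-processing: remove HTML tags and collapse whitespace."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


def run_pipeline(pages: List[str],
                 in_domain: Callable[[str], bool],
                 classify: Callable[[str], str],
                 extract: Callable[[str], Dict]) -> List[Dict]:
    events = []
    for page in pages:
        text = strip_html(page)          # pre-processing
        if not in_domain(text):          # event detection, step 1: rule filter
            continue
        if classify(text) != "EVENT":    # event detection, step 2: classifier
            continue
        events.append(extract(text))     # event extraction
    return events


print(strip_html("<h1>Xe khách lật</h1><p>2 người <b>bị thương</b></p>"))
# Xe khách lật 2 người bị thương
```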

    3.4. SOLVING THE ACCIDENT EVENT DETECTION AND EXTRACTION PROBLEMS

    In Problem 1, rules are applied to the plain text produced by pre-processing to keep the data in the traffic-accident domain. A classifier then checks whether each article is an accident report; if it is, the article is passed to Problem 2, accident event extraction. The models and detailed solutions of the two problems are presented in Sections 3.4.1 and 3.4.2.

    3.4.1. Problem 1: Accident event detection (phase 1)

    3.4.1.1. Problem statement

    Problem 1 takes as input the data from the pre-processing step (plain text) and determines whether it contains an accident event. Formally:

    Input: a news article from a newspaper website, in plain-text form.

    Output: whether the article contains an accident event.

    Phase 1 has two functions: a data filter and a classifier. The filter works on the pre-processed data, that is, the plain text obtained after stripping HTML tags from the collected articles, and keeps the articles in the traffic-accident domain. The classifier then checks whether each of these articles contains an accident event. The accident event detection process is shown in Figure 3.2.

    Figure 3.2: The event detection component

    3.4.1.2. Building the rule set

    As described in Section 3.4.1.1, the event detection phase has two functions. The first filters the data, keeping the articles in the traffic-accident domain. The second classifies whether the data contain an accident event. This section describes the first function in detail.

    A survey of the data shows that the title of an article usually conveys its content fairly completely. The author therefore filters articles by their titles rather than by their content.

    The data filter works as follows:

    (1) A rule set is built from a survey of the data domain, using keywords related to traffic accidents.

    (2) The filter matches these rules against each article title. If the title matches the rules, the article belongs to the traffic-accident domain; otherwise it does not.

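    The survey also shows that most titles in the traffic-accident domain contain words referring to means of transport. Examples:

    - "Tp.HCM: Xe khách kéo lê xe máy trên đường" (Ho Chi Minh City: coach drags a motorbike along the road)

    - "Xe bus rơi xuống hẻm núi, 56 người thương vong" (bus plunges into a gorge, 56 casualties)

    - "Ô tô đi trái đường, 1 người thiệt mạng" (car on the wrong side of the road, 1 killed)

    - "TP.HCM: Nam thanh niên tử vong dưới gầm xe ben" (Ho Chi Minh City: young man dies under a dump truck)

    A few accident titles contain no vehicle word, for example "Nghệ An: Hai thí sinh không thể thi tốt nghiệp vì TNGT" (Nghe An: two candidates unable to sit their graduation exams because of a traffic accident). Such titles instead contain words like "tai nạn" (accident), "tai nạn giao thông" (traffic accident), its abbreviation "TNGT", or "tai nạn bi thảm" (tragic accident). Examples are shown in Figures 3.3 and 3.4.

    Figure 3.3: Titles containing words related to means of transport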

    Figure 3.4: Titles without words related to means of transport

    Based on the data survey and on practice, the author built a set of means of transport, called the vehicle dictionary. The vehicle names are listed in Table 3.1.

    Table 3.1: Means of transport (vehicle dictionary)

    No. | Vehicle      | No. | Vehicle
    1   | Xe           | 29  | Xe lu
    2   | Ô tô         | 30  | Máy tuốt
    3   | Mô tô        | 31  | Xe cần cẩu
    4   | Xe máy       | 32  | Máy xúc
    5   | Xe khách     | 33  | Tắc-xi
    6   | Xe buýt      | 34  | Xe thồ
    7   | Xe hơi       | 35  | Xe hàng
    8   | Xe bốn chỗ   | 36  | Xe đò
    9   | Xế hộp       | 37  | Xe bò
    10  | Xe trâu      | 38  | Xe ngựa
    11  | Xe điện      | 39  | Công-te-nơ
    12  | Tàu hoả      | 40  | Cần cẩu
    13  | Máy bay      | 41  | Xe ba gác
    14  | Tàu lửa      | 42  | Xe đua
    15  | Xe tải       | 43  | Xe phân khối lớn
    16  | Xe ôm        | 44  | Xe ga
    17  | Xe đạp       | 45  | Xích-lô
    18  | Xe đạp điện  | 46  | Trực thăng
    19  | Công nông    | 47  | Xe bus
    20  | Máy kéo      | 48  | Xe ben
    21  | Xe lu        | 49  | Xe 3 bánh
    22  | Ô tô 4 chỗ   | 50  | Xe ba bánh
    23  | Xe đầu kéo   | 51  | Xe 3 gác
    24  | Xe 7 chỗ     | 52  | Thuyền
    25  | Ô tô 7 chỗ   | 53  | Đò
    26  | Xe 16 chỗ    | 54  | Xuồng máy
    27  | Xe 24 chỗ    | 55  | Tàu
    28  | Xe 29 chỗ    | 56  | Ghe

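    From this, the author builds rules for two cases. In the first case, Pattern 1 matches the article title against the vehicle dictionary, and the article is kept if it matches. Otherwise, Pattern 2 is used. The patterns are given in formulas (3.1) and (3.2).

    Pattern 1 = ⟨vehicle⟩ (3.1)

    Examples for Pattern 1:

    - The word "xe" is found in the title "Xe chở bia đâm cột điện, 2 người mắc kẹt trong cabin" (beer truck crashes into a power pole, 2 people trapped in the cab).

    - The word "xe buýt" is found in the title "Tp.HCM: Xe buýt cán nát chân người bộ hành" (Ho Chi Minh City: bus crushes a pedestrian's leg).

    In another example, "Đưa em đi thi đại học, chị bị tai nạn giao thông" (taking her younger sibling to the university entrance exam, a young woman has a traffic accident), the title contains no vehicle. Pattern 1 is therefore skipped and Pattern 2 is applied.

    Pattern 2 = ⟨verb⟩ # ⟨noun⟩ (3.2)

    where:

    - verb comprises the words: tai nạn, TNGT, ...

    - noun comprises the words: giao thông, thương tâm, ...

    Examples for Pattern 2:

    tai nạn # thương tâm

    tai nạn # giao thông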
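    The two title patterns can be sketched in Python as follows. The sketch makes three assumptions. First, only a subset of Table 3.1 is included. Second, titles are NFC-normalized, lower-cased and matched at word boundaries, longest entry first. Third, "#" in Pattern 2 is read as "immediately followed by", which is only one possible reading of the notation.

```python
import re
import unicodedata

# A subset of the vehicle dictionary in Table 3.1, lower-cased.
VEHICLES = ["xe", "ô tô", "mô tô", "xe máy", "xe khách", "xe buýt", "xe bus",
            "xe tải", "xe ben", "xe đạp", "xe đầu kéo", "xế hộp", "tàu hoả",
            "tàu", "máy bay", "thuyền", "xuồng máy"]
PATTERN2_VERBS = ["tai nạn", "tngt"]          # the "verb" slot of Pattern 2
PATTERN2_NOUNS = ["giao thông", "thương tâm"]  # the "noun" slot of Pattern 2


def _alternation(words):
    """Regex alternation, longest entry first so "xe máy" wins over "xe"."""
    words = [unicodedata.normalize("NFC", w) for w in words]
    return "|".join(re.escape(w) for w in sorted(words, key=len, reverse=True))


PATTERN1 = re.compile(rf"\b(?:{_alternation(VEHICLES)})\b")
# '#' read as "immediately followed by" (an assumption about the notation).
PATTERN2 = re.compile(
    rf"\b(?:{_alternation(PATTERN2_VERBS)})\s+(?:{_alternation(PATTERN2_NOUNS)})\b")


def in_traffic_domain(title: str) -> bool:
    """Keep a title if Pattern 1 matches; otherwise fall back to Pattern 2."""
    t = unicodedata.normalize("NFC", title).lower()
    if PATTERN1.search(t):
        return True
    return PATTERN2.search(t) is not None


for title in ["Tp.HCM: Xe buýt cán nát chân người bộ hành",
              "Đưa em đi thi đại học, chị bị tai nạn giao thông",
              "Giá vàng hôm nay tăng mạnh"]:
    print(in_traffic_domain(title), title)
```

    Under this reading, Pattern 2 does not match a title such as "Nghệ An: Hai thí sinh không thể thi tốt nghiệp vì TNGT", because the abbreviation TNGT has no noun after it. Covering such titles would require adding TNGT as a stand-alone keyword.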

    3.4.1.3. Building the classification model

    The classifier decides whether an article contains an accident event. It assigns one of two classes: EVENT for articles that contain an accident event and NOT_EVENT for those that do not. The survey showed that the title and summary of an article contain the main content of the whole article, so the author uses them to build the feature vector that represents the text. The features used in training are 2-grams, 3-grams and 4-grams. The author built a training set and uses it to identify the texts that contain an event.

    The author uses a Maximum Entropy (ME) model for three reasons:

    (1) The training data are texts, whose feature-vector representation is sparse, and ME works well with sparse data.

    (2) ME trains fairly fast, and experiments show that it gives good results on text data.

    (3) The ME source code is open, so it can be customized.

    The ME model is based on conditional probability and can integrate a wide variety of features from the training set into the classification problem. The idea of ME is to choose, among all distributions that satisfy the constraints derived from the training data, the one that makes no further assumptions. In other words, the model distribution must satisfy the constraints of the observed data while being as close to the uniform distribution as possible, that is, it must have maximum entropy.

    After training, all the data that pass the filter are fed to the model. Texts labeled EVENT become the input of the extraction process; texts labeled NOT_EVENT are ignored.
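    For indicator features f_i(x, y) with weights λ_i, a conditional maximum-entropy model has the form

    $$p(y \mid x) = \frac{\exp\left(\sum_i \lambda_i f_i(x, y)\right)}{\sum_{y'} \exp\left(\sum_i \lambda_i f_i(x, y')\right)},$$

    which is the same as multinomial logistic regression. The Python sketch below trains such a model on syllable 2-, 3- and 4-gram features by stochastic gradient ascent on the conditional log-likelihood. It is a from-scratch illustration, not the open-source ME toolkit used by the author. The learning rate, the number of epochs, the absence of regularization and the four training titles are all assumptions.

```python
import math
import re
import unicodedata
from typing import Dict, Iterable, List, Sequence, Tuple

LABELS = ("EVENT", "NOT_EVENT")


def ngram_features(text: str, orders: Iterable[int] = (2, 3, 4)) -> List[str]:
    """Syllable n-grams of a title + summary (Vietnamese syllables are space-separated)."""
    syllables = re.findall(r"\w+", unicodedata.normalize("NFC", text).lower())
    return [" ".join(syllables[i:i + n])
            for n in orders
            for i in range(len(syllables) - n + 1)]


class MaxEnt:
    def __init__(self, labels: Sequence[str] = LABELS) -> None:
        self.labels = tuple(labels)
        self.weights: Dict[Tuple[str, str], float] = {}  # lambda for f(x, y)

    def probabilities(self, feats: List[str]) -> List[float]:
        scores = [sum(self.weights.get((f, y), 0.0) for f in feats) for y in self.labels]
        top = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - top) for s in scores]
        z = sum(exps)
        return [e / z for e in exps]

    def train(self, data: List[Tuple[str, str]], epochs: int = 10, lr: float = 0.5) -> None:
        for _ in range(epochs):
            for text, gold in data:
                feats = ngram_features(text)
                probs = self.probabilities(feats)
                for y, p in zip(self.labels, probs):
                    # Gradient of log p(gold | x): observed minus expected feature value.
                    grad = (1.0 if y == gold else 0.0) - p
                    for f in feats:
                        self.weights[(f, y)] = self.weights.get((f, y), 0.0) + lr * grad

    def predict(self, text: str) -> str:
        probs = self.probabilities(ngram_features(text))
        return self.labels[probs.index(max(probs))]


TRAIN = [
    ("Xe khách đâm xe tải trên quốc lộ 1A, 2 người tử vong", "EVENT"),
    ("Xe máy va chạm ô tô, một người bị thương nặng", "EVENT"),
    ("Hội thảo bàn giải pháp giảm thiểu tai nạn giao thông", "NOT_EVENT"),
    ("Số người chết vì tai nạn giao thông giảm trong năm 2014", "NOT_EVENT"),
]
model = MaxEnt()
model.train(TRAIN)
print(model.predict("Ô tô tải đâm xe máy, hai người bị thương"))  # EVENT
```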

    3.4.2. Problem 2: Accident event extraction (phase 2)

    3.4.2.1. Problem statement

    The event extractor can be considered the central component of the model: it is where the information about a traffic accident event is extracted. Formally, the event extraction problem is:

    Input: an article containing an accident event.

    Output: the information about the accident, namely time, location, number of casualties and vehicle causing the accident. The casualties comprise the number of victims killed and the number injured. They are given as a list with two fields (deaths, injured) and one corresponding record holding the two numbers.

    The event extraction problem is illustrated in Figure 3.5.

    Figure 3.5: The event extraction component

    The extractor has four components: time extraction, location extraction, casualty extraction and extraction of the vehicle causing the accident.

    - Time extraction uses rules to obtain the time of the accident, meaning the date on which it happened, not the time of day.

    - Location extraction uses a dictionary of locations.

    - Casualty extraction and vehicle extraction use rules.

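    3.4.2.2. Time extraction

    The survey of the data set shows that time is usually expressed in two forms: absolute and relative. Absolute time is usually written as DD/MM/YYYY or DD/MM, where DD is the day, MM the month and YYYY the year of the accident. Two examples:

    - "vào ngày 8/5 trên quốc lộ 5 xe máy va quyệt vào ô tô làm hai người bị thương" (on 8/5 on National Highway 5 a motorbike grazed a car, injuring two people)

    - "vào ngày 09/7/2014, vì lái xe trong tình trạng say rượu mà xế hộp đâm thẳng xuống hồ nước, nạn nhân chết tại chỗ" (on 09/7/2014 a drunk-driven car plunged into a lake; the victim died on the spot)

    In many cases, however, time is stated vaguely and indirectly. Consider "ngay sáng sớm ngày 5/5, một vụ tai nạn thảm khốc xảy ra, tài xế xe tải đâm thẳng vào xe khách, rất may không có thiệt mạng nhưng toàn bộ hành khách bị thương được đưa đi cấp cứu" (early in the morning of 5/5 a terrible accident occurred when a truck crashed into a coach; fortunately no one was killed, but all injured passengers were taken to hospital). Here the moment of the accident is vague: it is only "early morning". The phrase "sáng sớm" (early morning) therefore has to be combined with the exact date to give the time information.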

    Because time is expressed in these two ways, the author uses predefined rules to extract it. Absolute time can easily be extracted with a regular expression (RE). Relative time consists of two components: a prefix and a date. The prefix is a word denoting relative time, such as "rạng sáng" (early morning), "nửa đêm" (midnight) or "chiều" (afternoon). The date is written as DD/MM/YYYY. The time extraction rule is:

    Time = ⟨Prefix⟩ + ⟨Date⟩ (3.2)

    where:

    - Prefix is one of the words: vào (on/at), ngày (day), sáng (morning), trưa (noon), chiều (afternoon), tối (evening), nửa đêm (midnight), trưa nay (this noon), sáng nay (this morning), chiều nay (this afternoon), vào giờ tan tầm (at rush hour), hôm qua (yesterday), hôm nay (today), tối qua (last evening), đêm qua (last night), rạng sáng nay (early this morning), tháng (month);

    - Date has the format DD/MM/YYYY or DD/MM.

    If the article does not mention a date, the publication date of the article is used by default.

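    The following examples illustrate how the regular expression and the rule are used to extract the time of an event.

    Example 1: On 23/5, a coach travelling from South to North reached the area in front of Thai Phu market in Duc Thanh commune, Mo Duc district. It suddenly veered to the left side of the road and rammed a truck driven by Duc Tho, which was travelling in the opposite direction. Because it was going too fast, the coach continued about 100 m further and crashed into another truck, belonging to driver Nguyen Duc Lan, that was parked at the roadside. The violent collision smashed the fronts of all 3 vehicles and scattered glass across the road. The driver, the driver's assistant and 3 passengers of the coach were seriously injured. Local residents had to break the windows to rescue the victims and take them to hospital. In total, the accident injured five people on the coach and wrecked the fronts of the coach and the trucks.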

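    Example 2: On the afternoon of 24/8 (chiều ngày 24/8), Mr. Ho Van Dao was carrying his wife and one-year-old child on a motorbike from Tinh Hiep commune, Son Tinh district (Quang Ngai) to the mountainous district of Tra Bong. The motorbike skidded and fell and was dragged under a truck. At the scene, the motorbike and the 3 people on it were trapped under the truck. The Son Tinh district police recovered the bodies of Mr. Dao's family and cordoned off the scene to investigate. According to the authorities' initial assessment, Mr. Dao's motorbike skidded on the slippery road. The truck travelling in the same direction could not brake in time; it pulled the motorbike under it and dragged the couple and their child, killing all three on the spot.

    In Example 1, the time is extracted with the regular expression; in Example 2, it is extracted with the time rule. The result for Example 1 is "23/5" and the result for Example 2 is "chiều ngày 24/8" (afternoon of 24/8).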
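    The rule Time = ⟨Prefix⟩ + ⟨Date⟩, together with the publication-date fallback, can be sketched in Python with one regular expression. The sketch makes three assumptions. A date written as DD/MM takes the year of the publication date. "-" is accepted as a separator besides "/", as in "27-5" in Example 4 below. Any run of listed prefixes directly before the date is kept.

```python
import re
import unicodedata
from datetime import date
from typing import Optional, Tuple

# Relative-time prefixes listed above; longest first so "sáng nay" beats "sáng".
PREFIXES = sorted(
    (unicodedata.normalize("NFC", p) for p in [
        "vào", "ngày", "sáng", "trưa", "chiều", "tối", "nửa đêm", "trưa nay",
        "sáng nay", "chiều nay", "vào giờ tan tầm", "hôm qua", "hôm nay",
        "tối qua", "đêm qua", "rạng sáng nay", "tháng"]),
    key=len, reverse=True)
PREFIX_ALT = "|".join(re.escape(p) for p in PREFIXES)

# Time = <Prefix>* + <Date>, Date = DD/MM or DD/MM/YYYY ("-" also accepted: an assumption).
TIME_RULE = re.compile(
    rf"((?:\b(?:{PREFIX_ALT})\s+)*)\b(\d{{1,2}})[/-](\d{{1,2}})(?:[/-](\d{{4}}))?\b",
    re.IGNORECASE)


def extract_time(text: str, published: date) -> Tuple[Optional[str], date]:
    """Return (matched phrase, resolved date); fall back to the publication date."""
    text = unicodedata.normalize("NFC", text)
    for m in TIME_RULE.finditer(text):
        day, month = int(m.group(2)), int(m.group(3))
        year = int(m.group(4)) if m.group(4) else published.year  # DD/MM: assumed year
        try:
            return m.group(0).strip(), date(year, month, day)
        except ValueError:  # not a valid calendar date, e.g. 31/2
            continue
    return None, published


print(extract_time("Chiều ngày 24/8, anh Dao chở vợ con bằng xe máy", date(2013, 9, 1)))
# ('Chiều ngày 24/8', datetime.date(2013, 8, 24))
```

    On Example 2, this returns the phrase "Chiều ngày 24/8", which matches the result given above. On Example 1, it returns "Ngày 23/5" rather than "23/5", because "ngày" is itself one of the listed prefixes.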

    3.4.2.3. Location extraction

    Location extraction uses NER and a dictionary of locations:

    Step 1: apply NER.

    Step 2: collect the entities that NER has tagged.

    Step 3: check them against the location dictionary to find the correct locations.
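    Steps 2 and 3 can be sketched in Python as below. Step 1 is not shown. The sketch assumes that a Vietnamese NER tool has already produced (entity text, tag) pairs and that location entities carry the tag "LOC". The input format, the tag name and the toy location dictionary are all assumptions.

```python
import unicodedata
from typing import Iterable, List, Tuple


def _norm(s: str) -> str:
    """NFC-normalize, lower-case and collapse whitespace for comparison."""
    return " ".join(unicodedata.normalize("NFC", s).lower().split())


def extract_locations(ner_output: Iterable[Tuple[str, str]],
                      gazetteer: Iterable[str],
                      location_tag: str = "LOC") -> List[str]:
    known = {_norm(g) for g in gazetteer}
    # Step 2: keep the entities tagged as locations.
    candidates = [text for text, tag in ner_output if tag == location_tag]
    # Step 3: keep only the candidates found in the location dictionary.
    return [c for c in candidates if _norm(c) in known]


ner_output = [("Quảng Ngãi", "LOC"), ("Nguyễn Văn A", "PER"), ("khu vực chợ", "LOC")]
print(extract_locations(ner_output, ["Quảng Ngãi", "Quảng Bình", "Hà Nội"]))
# ['Quảng Ngãi']
```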

    3.4.2.4. Casualty extraction

    The author uses rules to extract the number of casualties. The casualty extraction rule is:

    Number of victims = ⟨Number⟩ + ⟨Suffix⟩ (3.3)

    where:

    - Number is the number of victims, written as digits or as words: number = {"một", "hai", "ba", "bốn", "năm", "sáu", "bảy", "tám", "chín", "mười"} (one to ten) and the digits [1..9];

    - Suffix is a word such as tử vong (died), bị thương (injured), thiệt mạng (killed), chết (dead) or nhập viện (hospitalized): suffix = {"bị thương", "chết", "tử vong", "thiệt mạng", "chết thảm", "thương nặng", "thương nhẹ", "cấp cứu", "bệnh viện"}.

    The result is recorded as a list with two fields, deaths and injured, and one record holding the corresponding numbers.

    Example 3: At about 12:55 today (2/6), a taxi and a motorbike collided on the Thang Long - Noi Bai expressway (Hanoi), about 200 m from Me Linh Plaza. As a result, 2 people on the motorbike were very seriously injured (bị thương rất nặng) and were still lying on the road, not yet taken to hospital. The accident also disrupted traffic through the area, and vehicles heading into the city centre moved with difficulty.

    Result for Example 3: 0 deaths and 2 injured.

    Deaths | Injured
    0      | 2

    Example 4: At about 22:00 on 27-5, a serious traffic accident occurred at km 1045+950 of National Highway 1A, in The Loi hamlet, Tinh Phong commune, Son Tinh district, Quang Ngai province. It left 1 young man dead on the spot (tử vong tại chỗ) and 1 person hospitalized (nhập viện).

    Result for Example 4: 1 death and 1 injured.

    Deaths | Injured
    1      | 1
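    The casualty rule can be sketched in Python as follows. Beyond the rule itself, the sketch makes three assumptions. Up to four words may separate the number from the suffix, as in "2 người trên xe máy bị thương" in Example 3. Each number is paired with the nearest following suffix. The suffixes are split into a death group and an injury group, with "nhập viện" (mentioned in the text above) counted as an injury.

```python
import re
import unicodedata
from typing import List, Optional, Tuple


def _nfc(s: str) -> str:
    return unicodedata.normalize("NFC", s)


NUMBER_WORDS = {_nfc(w): n for w, n in {
    "một": 1, "hai": 2, "ba": 3, "bốn": 4, "năm": 5,
    "sáu": 6, "bảy": 7, "tám": 8, "chín": 9, "mười": 10}.items()}
# Splitting the suffix set into deaths and injuries is an assumption of this sketch.
DEATH_SUFFIXES = ["tử vong", "thiệt mạng", "chết thảm", "chết"]
INJURY_SUFFIXES = ["bị thương", "thương nặng", "thương nhẹ", "nhập viện",
                   "cấp cứu", "bệnh viện"]
SUFFIXES = sorted(
    [(tuple(_nfc(s).split()), "death") for s in DEATH_SUFFIXES]
    + [(tuple(_nfc(s).split()), "injury") for s in INJURY_SUFFIXES],
    key=lambda item: len(item[0]), reverse=True)


def _number(tok: str) -> Optional[int]:
    return int(tok) if tok.isdecimal() else NUMBER_WORDS.get(tok)


def _suffix_at(tokens: List[str], j: int) -> Optional[str]:
    for words, kind in SUFFIXES:
        if tuple(tokens[j:j + len(words)]) == words:
            return kind
    return None


def extract_casualties(text: str, max_gap: int = 4) -> Tuple[int, int]:
    """Number of victims = <Number> + <Suffix>; returns (deaths, injured)."""
    tokens = re.findall(r"\w+", _nfc(text).lower())
    deaths = injured = 0
    for i, tok in enumerate(tokens):
        n = _number(tok)
        if n is None:
            continue
        # Nearest suffix at most max_gap words after the number.
        for j in range(i + 1, min(i + 2 + max_gap, len(tokens))):
            if _number(tokens[j]) is not None:
                break  # another number starts its own candidate
            kind = _suffix_at(tokens, j)
            if kind == "death":
                deaths += n
                break
            if kind == "injury":
                injured += n
                break
    return deaths, injured


print(extract_casualties("làm 1 thanh niên tử vong tại chỗ và 1 người phải nhập viện"))
# (1, 1)
```

    This sketch does not count a death stated without a number, such as "tài xế chết" (the driver died).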

    3.4.2.5. Extraction of the vehicle causing the accident

    The author uses rules to extract the vehicle causing the accident. The rule is:

    Vehicle causing the accident = ⟨Noun⟩ + ⟨Verb⟩ (3.4)

    where:

    - Noun is a means of transport from the dictionary, such as xe khách (coach), xe tải (truck) or xe đầu kéo (tractor unit). The full set is listed in Table 3.1.

    - Verb comprises words such as đối đầu (head-on), đâm xe (crash into a vehicle), gây tai nạn (cause an accident), đụng xe (collide) and đâm nhau (crash into each other). The full set of verbs is: verbs = {"đâm nát đầu", "đâm xe", "đấu đầu", "xe đối đầu", "đụng xe", "đâm nhau", "tai nạn giao thông", "gây tai nạn", "gặp tai nạn", "húc nhau", "lao xuống gầm", "chui vào gầm", "bị tông", "tông mạnh", "cán chết", "cán qua", "húc", "đâm", "chui gầm", "lật tàu", "trật bánh", "tàu trật bánh", "đắm thuyền", "chìm thuyền", "lật thuyền", "lật ngửa"}.

    Example 5: At about 17:00 on 26/5/2014, an accident killed a young man at km 677+700 of National Highway 1A (QL1A), in Dinh Muoi hamlet, Duy Ninh commune, Quang Ninh district (Quang Binh). At that moment, a truck with licence plate 60C-116.80 was travelling from north to south. When it reached Duy Ninh commune, a young man riding a motorbike (xe máy) with licence plate 73G1-074.03 in the opposite direction crashed head-on (đâm chính diện) into the front of the truck and died on the spot. The victim was identified as Ngo Dinh Lam (born 1989), of Phu Loc hamlet, Gia Ninh commune, Quang Ninh district (Quang Binh).

    Result for Example 5: the vehicle causing the accident is xe máy (motorbike).

    3.5. SUMMARY

    In this chapter, the author proposed a method and a model for the overall accident event extraction problem, and described in detail the method and model for its two sub-problems: accident event detection and accident event extraction. The first combines rules and machine learning to detect traffic accident events, and the detected articles become the input of the second. The second extracts the time, location, number of casualties and vehicle causing the accident. Both problems thus combine rules and machine learning. Chapter 4 evaluates the effectiveness of the method experimentally.

    Chapter 4. EXPERIMENTS AND EVALUATION

    This chapter first presents the environment, the tools and the packages built by the author. It then evaluates the effectiveness of the method on the two key problems, event detection and event extraction. Finally, it discusses the experimental results of the proposed method and summarizes the chapter.

    4.1. EXPERIMENTAL ENVIRONMENT AND TOOLS

    The hardware configuration and the software tools used in the experiments are given in Tables 4.1 and 4.2.

    Table 4.1: Hardware configuration

    No. | Component        | Specification
    1   | CPU              | 2.6 GHz Intel Core i5
    2   | RAM              | 8 GB
    3   | Operating system | Windows 7
    4   | Storage          | 256 GB

    Table 4.2: Software tools

    No. | Software                        | Function                     | Source
    1   | Teleport Pro                    | Downloads data from websites | http://teleport-pro.en.softonic.com/
    2   | Eclipse Standard/Kepler Release | Programming environment      | http://eclipse.org/eclipse
    3   | JsoupParser                     | HTML parsing toolkit         | http://jsoup.org/apidocs/org
    4   | JvnTextPro v.2.1                |                              | Cam-Tu Nguyen, http://jvntextpro.sourceforge.net
    5   | vn.hus.nlp.tokenizer-4.1.1      | Open source                  | https://code.google.com/p/vntaggergate-plugin/source/browse/lib/vn.hus.nlp.tokenizer-4.1.1.jar?r=85418c90bafeec89da9203f9a7f10338d2cff40c

    4.2. BUILDING THE DATA SET

    4.2.1. Data collection

    Data were collected from two sites: http://vovgiaothong.vn/giao-thong-trong-nuoc/ (the VOV National Traffic channel of the Voice of Vietnam) and http://antoangiaothong.gov.vn/tai-nan-giao-thong/ (the National Traffic Safety Committee). The author chose these sites because they report accidents across the country quickly and fairly completely.

    The data were collected with Teleport Pro, which retrieves 500 articles from the above websites; in total, 3,000 articles were collected.

    4.2.2. Data pre-processing

    The data were stored as JSON. The author converted them back to HTML and then stripped the HTML tags to obtain plain text, yielding 3,000 articles. The components of an article are shown in Table 4.3.

    Table 4.3: Components of an article

    No. | Component        | Description
    1   | Title            | The title of the article
    2   | Summary          | The summary of the article
    3   | Publication date | The date on which the article was published
    4   | Content          | The body of the article

    4.3. EVALUATION OF EVENT DETECTION

    4.3.1. Evaluation of the data filter

    Description: this experiment evaluates the performance of the data filter.

    Experiment statement

    - Input: a set of articles collected from http://vovgiaothong.vn/giao-thong-trong-nuoc/ and http://antoangiaothong.gov.vn/tai-nan-giao-thong/

    - Output: the articles related to the traffic-accident domain

    Data: 3,000 articles.

    Sau qu trnh lc d liu thu c tng s 919 bn tin thuc min tai nn

    giao thng, trong s bn tin khng lin quan n tai nn giao thng rt t, v

    c th tnh t l li theo cng thc 4.1. Chi tit c trnh ny trong bng 4.4.

    Table 4.4. Error rate of the filtering step

    Total articles | Unrelated articles | Error rate
    919 | 19 | 3.9%

    The error rate of the filtering step is computed with formula (4.1), where:

    - Total is the total number of articles obtained after filtering;

    - Unrelated is the number of articles that do not belong to the traffic-accident domain.

    As Table 4.4 shows, the filtering step achieves fairly high accuracy.

    4.3.2. Evaluating the classifier

    Experiment description: this part evaluates the classification step.

    Experiment specification

    - Input: a set of filtered articles

    - Output: the articles labeled EVENT or NOT_EVENT


    Experimental data: each evaluation round uses 100 articles sampled at random from the articles retained by the data filter. Table 4.5 gives the results of the evaluation rounds.

    Table 4.5. Classification results

    Round | Correctly classified | Incorrectly classified | Not found | Precision | Recall | F1
    1 | 85 | 12 | 3 | 88% | 97% | 92%
    2 | 81 | 16 | 3 | 84% | 96% | 90%
    3 | 83 | 15 | 2 | 85% | 98% | 91%
    4 | 85 | 11 | 4 | 89% | 96% | 92%
    5 | 80 | 17 | 3 | 82% | 96% | 89%
    Average | 82.8 | 14.2 | 3 | 85% | 97% | 91%

    As Table 4.5 shows, the classifier achieves an average precision (P) of 85%, a recall (R) of 97%, and an F1 of 91%.
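    The percentages in Table 4.5 can be recomputed from the counts. The sketch below assumes that precision = correct / (correct + incorrect) and recall = correct / (correct + not found). These have the same form as formulas (4.2) and (4.3) in Section 4.4.1. The names `Round`, `prf`, and `as_percent` are illustrative only. Pooling the counts of the five rounds (414 correct, 71 incorrect, 15 not found) also reproduces the 85%, 97%, and 91% of the Average row.

```python
from typing import NamedTuple, Tuple


class Round(NamedTuple):
    correct: int    # articles the classifier labeled correctly
    incorrect: int  # articles it labeled wrongly
    missed: int     # articles it did not find


def prf(correct: int, wrong: int, missed: int) -> Tuple[float, float, float]:
    """Precision, recall and F1 computed from raw counts."""
    precision = correct / (correct + wrong)
    recall = correct / (correct + missed)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def as_percent(scores) -> Tuple[int, ...]:
    """Round fractions to whole percentages, as printed in Table 4.5."""
    return tuple(round(100 * s) for s in scores)


# Counts copied from the five rounds of Table 4.5.
ROUNDS = [Round(85, 12, 3), Round(81, 16, 3), Round(83, 15, 2),
          Round(85, 11, 4), Round(80, 17, 3)]

if __name__ == "__main__":
    for number, rnd in enumerate(ROUNDS, start=1):
        print(number, as_percent(prf(*rnd)))
    # Pool the counts of all rounds (micro-averaging).
    pooled = Round(*(sum(column) for column in zip(*ROUNDS)))
    print("pooled", pooled, as_percent(prf(*pooled)))
```

    Dividing the pooled counts by five gives the 82.8, 14.2, and 3 shown in the Average row.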

    4.4. EVALUATING EVENT EXTRACTION

    4.4.1. Experiment without the classifier

    Experiment description: this part evaluates extraction performance.

    Experiment specification

    - Input: an article in the traffic-accident domain

    - Output: information about the accident event: the time of the accident, the location of the accident, the casualties (number killed and number injured), and the vehicle that caused the accident.


    Experimental data: 200 articles sampled at random from the traffic-accident domain. These articles were not passed through the classifier.

    Formula (3.1) defines an event E as a tuple of time, location, casualties, and the vehicle that caused the accident. A correct event should therefore contain all four components. An event that lacks the vehicle causing the accident and the time of the accident is counted as wrong.
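    The event tuple can be sketched as a small data type. In the hypothetical Python class below, the field names and the `missing_fields` and `is_complete` helpers are illustrative and are not taken from the thesis. `is_complete` encodes only the rule stated above: a correct event should contain all four components. Formula (3.1) itself is not reproduced here.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

FIELD_ORDER = ("time", "location", "casualties", "vehicle")


@dataclass
class AccidentEvent:
    """Hypothetical container for E = (time, location, casualties, vehicle)."""
    time: Optional[str] = None                    # when the accident happened
    location: Optional[str] = None                # where the accident happened
    casualties: Optional[Tuple[int, int]] = None  # (number killed, number injured)
    vehicle: Optional[str] = None                 # vehicle that caused the accident

    def missing_fields(self) -> list:
        """Names of the components the extractor left empty (Null)."""
        return [name for name in FIELD_ORDER if getattr(self, name) is None]

    def is_complete(self) -> bool:
        """A correct event should contain all four components."""
        return not self.missing_fields()


if __name__ == "__main__":
    full = AccidentEvent("30/4/2014", "Nam Định", (1, 2), "xe tải")  # made-up values
    partial = AccidentEvent(location="Nam Định")
    print(full.is_complete(), partial.missing_fields())
```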

    To evaluate event extraction, the author uses three measures: precision (P), recall (R), and the F1 score. They are defined by formulas (4.2), (4.3), and (4.4):

    $$P = \frac{N_{\text{correct}}}{N_{\text{correct}} + N_{\text{wrong}}} \qquad (4.2)$$

    $$R = \frac{N_{\text{correct}}}{N_{\text{correct}} + N_{\text{missed}}} \qquad (4.3)$$

    $$F_1 = \frac{2 \times P \times R}{P + R} \qquad (4.4)$$

    where:

    - N_correct (number of correct events) is the number of events the model extracts correctly;

    - N_wrong (number of wrong events) is the number of events the model extracts incorrectly;

    - N_missed (number of events not extracted) is the number of events the model fails to extract.


    The author evaluates the extraction model with formulas (4.2), (4.3), and (4.4). Table 4.6 gives the results.

    Table 4.6. Extraction results on data not passed through the classifier

    Website | Correct events | Wrong events | Events not found | P | R | F1
    antoangiaothong.gov.vn | 160 | 34 | 6 | 82% | 96% | 89%
    vovgiaothong.vn | 154 | 37 | 9 | 81% | 94% | 87%
    Overall | 314 | 71 | 15 | 82% | 95% | 88%
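    As a worked check of formulas (4.2) to (4.4), take the antoangiaothong.gov.vn row of Table 4.6 (160 correct, 34 wrong, 6 not found). The middle step of the F1 line uses the identity F1 = 2 N_correct / (2 N_correct + N_wrong + N_missed), which follows from substituting (4.2) and (4.3) into (4.4):

    $$P = \frac{160}{160 + 34} = \frac{160}{194} \approx 0.8247, \qquad R = \frac{160}{160 + 6} = \frac{160}{166} \approx 0.9639$$

    $$F_1 = \frac{2 \times P \times R}{P + R} = \frac{2 \times 160}{2 \times 160 + 34 + 6} = \frac{320}{360} \approx 0.8889$$

    These values round to the 82%, 96%, and 89% reported in the table.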

    4.4.2. Experiment with the classifier

    Experimental data: 100 articles taken from those that contain an accident event (labeled EVENT). The extraction results are again evaluated with formulas (4.2), (4.3), and (4.4). Table 4.7 gives the details.

    Table 4.7. Extraction results on data passed through the classifier

    Website | Correct events | Wrong events | Events not found | P | R | F1
    antoangiaothong.gov.vn | 91 | 5 | 4 | 95% | 96% | 95%
    vovgiaothong.vn | 93 | 4 | 2 | 96% | 98% | 97%
    Overall | 184 | 9 | 6 | 95% | 97% | 96%

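    4.4.3. Discussion

    Table 4.6 covers data not passed through the classifier, and Table 4.7 covers data passed through it. Extraction performs better on the classified data: overall F1 rises from 88% to 96%. The gain comes mainly from precision, which rises from 82% to 95%, while recall rises from 95% to 97%. This shows the importance of the classifier in the model.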

    4.5. ERROR ANALYSIS

    4.5.1. Errors in event detection

    Analysis of the data after the experiments revealed errors when a title mentions a vehicle but the article does not belong to the traffic-accident domain. For example, the article in Figure 4.1 is about buying a vehicle on installment. Its title contains the vehicle word "xe" (vehicle), yet the article belongs to the commerce domain rather than the traffic-accident domain. The filter nevertheless assigns it to the traffic-accident domain.

    Figure 4.1. Filter error on an article outside the traffic-accident domain
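    The error in Figure 4.1 is typical of a filter that accepts an article as soon as its title mentions a vehicle. The Python sketch below is hypothetical and is not the filter used in the thesis. `naive_filter` reproduces the failure on a title about buying a vehicle on installment. `stricter_filter` additionally requires an accident cue such as "tai nạn" (accident) or "va chạm" (collision). Both word lists are illustrative assumptions.

```python
import re
import unicodedata

# Illustrative word lists; these are assumptions, not the thesis's dictionaries.
VEHICLE_TERMS = ["xe máy", "xe tải", "xe khách", "ô tô", "xe"]
ACCIDENT_CUES = ["tai nạn", "va chạm", "tông", "lật", "tử vong", "bị thương"]


def _mentions(text: str, terms) -> bool:
    """True if any term occurs as a whole word (Unicode-aware, case-insensitive)."""
    normalized = unicodedata.normalize("NFC", text).lower()
    return any(re.search(r"\b" + re.escape(term) + r"\b", normalized)
               for term in terms)


def naive_filter(title: str) -> bool:
    """Accept any title that mentions a vehicle (reproduces the Figure 4.1 error)."""
    return _mentions(title, VEHICLE_TERMS)


def stricter_filter(title: str, body: str = "") -> bool:
    """Accept only articles that mention both a vehicle and an accident cue."""
    text = f"{title} {body}"
    return _mentions(text, VEHICLE_TERMS) and _mentions(text, ACCIDENT_CUES)


if __name__ == "__main__":
    title = "Mua xe trả góp"  # hypothetical title in the spirit of Figure 4.1
    print(naive_filter(title), stricter_filter(title))  # True False
```

    Requiring a second cue trades some recall for precision. Accident reports that never use one of the cue words would then be rejected.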

    4.5.2. Errors in event extraction

    Extraction of some attributes is still weak. Investigating the causes, the author found the following frequent errors:

    - Location: some articles mention only a street name and omit the locality (commune, district, or province). In such cases the exact location cannot be determined, or the value is Null. In a few cases, abbreviated information cannot be extracted.

    - Vehicle that caused the accident: in some cases the wrong vehicle is extracted. For example, the phrase "xe máy bị đâm, nạn nhân chết tại chỗ" means "the motorbike was hit and the victim died at the scene". From it, "xe máy" (motorbike) is extracted as the vehicle that caused the accident, which is wrong.

    - Number of victims: the phrase "Nạn nhân được người dân đưa đi cấp cứu" means "the victim was taken to hospital by local residents". No victim count can be extracted from it because there is no numeric quantifier.

    Table 4.8 lists more of these errors.


    Table 4.8. Some errors in the extraction process

    No. | Correct information | Extracted information
    1 | Phường 4, Quận 1, Phường 9, TP Hồ Chí Minh | Quận 5, Phường 7, Quận Bình Thạnh
    2 | Tỉnh Pray Veng | Null
    3 | Huyện Xuân Trường, Nam Định | Nam Định
    4 | Quốc lộ 1A | Null
    5 | xe máy bị đâm | Xe máy
    6 | Nạn nhân được người dân | Null
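    The errors in Table 4.8 are characteristic of rule-based and dictionary-based extraction. The toy Python extractor below is hypothetical and does not reproduce the thesis's rule set. Its gazetteer, vehicle list, and casualty pattern are assumptions chosen to show the mechanisms behind rows 3 to 6:

    - A province-level gazetteer drops the district (row 3) and finds nothing for a road name (row 4).
    - Taking the first vehicle mentioned mislabels the victim's motorbike as the cause (row 5).
    - A casualty pattern that needs an explicit number returns no count for "Nạn nhân được người dân đưa đi cấp cứu" (row 6).

    `None` plays the role of Null.

```python
import re
import unicodedata
from typing import Optional

# Hypothetical province-level gazetteer (no districts, no roads).
GAZETTEER = ["Nam Định", "TP Hồ Chí Minh", "Hà Nội"]
# Hypothetical vehicle vocabulary, in lower case.
VEHICLES = ["xe máy", "xe tải", "xe khách", "xe buýt", "ô tô"]
# A number directly followed by "người" (people), e.g. "2 người tử vong".
CASUALTY_RE = re.compile(r"(\d+)\s+người")


def _nfc(text: str) -> str:
    return unicodedata.normalize("NFC", text)


def extract_location(text: str) -> Optional[str]:
    """Return every gazetteer entry found in the text, or None (Null)."""
    text = _nfc(text)
    hits = [place for place in GAZETTEER if place in text]
    return ", ".join(hits) if hits else None


def extract_vehicle(text: str) -> Optional[str]:
    """Naive rule: the earliest vehicle mention is taken as the cause."""
    lowered = _nfc(text).lower()
    found = [(lowered.find(v), v) for v in VEHICLES if v in lowered]
    return min(found)[1] if found else None


def extract_casualties(text: str) -> Optional[int]:
    """Return the first '<number> người' count, or None when no number is given."""
    match = CASUALTY_RE.search(_nfc(text))
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    print(extract_location("Huyện Xuân Trường, Nam Định"))               # Nam Định
    print(extract_location("Quốc lộ 1A"))                                # None
    print(extract_vehicle("xe máy bị đâm, nạn nhân chết tại chỗ"))       # xe máy
    print(extract_casualties("Nạn nhân được người dân đưa đi cấp cứu"))  # None
```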

    4.6. SOME RESULTS OF EVENT ANALYSIS

    The extraction results are used to compute statistics: the number of accidents per week, per day of the week, per province, and per type of vehicle involved.
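    Each of these statistics is a group-and-count over one attribute of the extracted events. The minimal Python sketch below assumes a hypothetical `Accident` record holding the accident date, the province, and the vehicle. It buckets weeks by ISO (year, week) number, a choice the thesis does not specify.

```python
from collections import Counter
from datetime import date
from typing import Iterable, NamedTuple

WEEKDAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
                 "Friday", "Saturday", "Sunday"]


class Accident(NamedTuple):
    """Hypothetical record built from one extracted event."""
    day: date
    province: str
    vehicle: str


def accident_statistics(events: Iterable[Accident]) -> dict:
    """Count accidents per ISO week, weekday, province and vehicle type."""
    events = list(events)  # allow a generator to be traversed four times
    return {
        "week": Counter(tuple(e.day.isocalendar()[:2]) for e in events),
        "weekday": Counter(WEEKDAY_NAMES[e.day.weekday()] for e in events),
        "province": Counter(e.province for e in events),
        "vehicle": Counter(e.vehicle for e in events),
    }


if __name__ == "__main__":
    sample = [  # made-up events for illustration only
        Accident(date(2014, 4, 30), "TP Hồ Chí Minh", "xe máy"),
        Accident(date(2014, 5, 1), "Hà Nội", "xe tải"),
        Accident(date(2014, 5, 3), "TP Hồ Chí Minh", "xe máy"),
    ]
    stats = accident_statistics(sample)
    print(stats["week"])                     # Counter({(2014, 18): 3})
    print(stats["province"].most_common(1))  # [('TP Hồ Chí Minh', 2)]
```

    Keying weeks by (year, week) rather than by week number alone keeps dates near New Year from being merged with the wrong year.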

    1./ Accidents per week over two months (April and May 2014). The data are concentrated in April and May 2014. The statistics show a striking rise in accidents over the 30/4 and 1/5 public holidays, with 191 accidents and 109 deaths nationwide. Chart 4.1 shows the details.

    Chart 4.1. Accidents per week in April and May


    2./ Accidents per day of the week. The number of accidents rises considerably at weekends. Chart 4.2 shows the number of accidents on each day of the week.

    Chart 4.2. Accidents per day of the week

    3./ Accidents per province, for four representative provinces nationwide. Ho Chi Minh City has the highest number of accidents. Chart 4.3 shows the details.

    Chart 4.3. Accidents per province


    4./ Vehicles that most frequently cause accidents, for the five vehicle types with the highest accident counts. Chart 4.4 shows the figures for each vehicle type.

    Chart 4.4. Accidents per vehicle type

    From these statistics, the author draws the following observations:

    - Road users, and drivers in particular, should take great care on public holidays and weekends and in large cities. The same applies when traveling by motorbike, bus, coach, container truck, and especially truck. This care avoids unfortunate accidents for themselves and for other road users.

    - Authorities should adopt effective measures to prevent traffic accidents, especially during long holidays.

    4.7. SUMMARY

    In this chapter, the author carried out experiments to examine and evaluate the information extraction model for Vietnamese news text built in Chapter 3. The experimental results show that the model is a feasible solution to the problem of extracting accident events.


    CONCLUSION

    1./ Results of the thesis

    In this thesis, the author has studied event extraction methods, in particular methods that combine rules and machine learning for event detection and event extraction. On this basis, the author built a model and a detailed method for detecting and extracting traffic-accident events. On the accident domain, event extraction reaches P = 95%, R = 97%, and F1 = 96% (Table 4.7, data passed through the classifier), which demonstrates the feasibility of the model.

    2./ Limitations

    - The classifier's results are not yet high, because articles reporting an accident event are hard to distinguish from articles containing other traffic-accident information.

    - The rule set is built by hand, so it can hardly cover all of the data. As a result, the rules may miss relevant data in the domain.

    - Dictionary-based location extraction is still ambiguous in some cases, when the article does not give enough information about the location.

    - Abbreviated information is sometimes extracted incorrectly.

    3./ Future work

    Future research will continue to refine and extend the event extraction model for Vietnamese news text. The model will be extended to extract further important attributes. Examples are the time of day of the accident, the age and occupation of the driver of the vehicle that caused the accident, and the terrain where the accident occurred.

    The extraction results can then be aggregated along several dimensions:

    - time of day (for example, at night, during the morning commute, or at evening rush hour);

    - day of the week (working days versus weekends);

    - season (festival season, university entrance exam season, rainy season, summer holidays);

    - terrain (slopes, bends, roads with many junctions);

    - driver occupation.

    Such statistics can help identify the causes of accidents and compare the scale and severity of accidents across time periods. They can also show in which direction the accident situation is developing. In addition, the statistics will be visualized on a map of Vietnam, marking accident-prone locations with warnings, road signs, and notes.


    REFERENCES

    English-language references

    [1] Sunita Sarawagi. Information extraction. Indian Institute of Technology, CSE, Mumbai 400076, India, 2008.

    [2] Douglas E. Appelt. Introduction to information extraction technology. Tutorial held at IJCAI-99, Stockholm, Sweden, 1999.

    [3] Hong-Woo Chun, Young-Sook Hwang, and Hae-Chang Rim. Unsupervised event extraction from biomedical literature using co-occurrence information and basic patterns. In: 1st International Joint Conference on Natural Language Processing (IJCNLP 2004). Lecture Notes in Computer Science, vol. 3248, pp. 777-786. Springer-Verlag Berlin Heidelberg, 2004.

    [4] Frederik Hogenboom, Flavius Frasincar, Uzay Kaymak, and Franciska de Jong. An overview of event extraction from text. In: Workshop on Detection, Representation, and Exploitation of Events in the Semantic Web (DeRiVE 2011) at the Tenth International Semantic Web Conference (ISWC 2011), vol. 779, pp. 48-57, 2011.

    [5] M. A. Hearst. Automatic acquisition of hyponyms from large text corpora. In: 14th Conference on Computational Linguistics (COLING 1992), vol. 2, pp. 539-545, 1992.

    [6] M. A. Hearst. Automated discovery of WordNet relations. In: WordNet: An Electronic Lexical Database and Some of Its Applications, pp. 131-151. MIT Press, 1998.

    [7] Jethro Borsje, Frederik Hogenboom, and Flavius Frasincar. Semi-automatic financial events discovery based on lexico-semantic patterns. International Journal of Web Engineering and Technology, 6(2):115-140, 2010.


    [8] Chang-Shing Lee, Yea-Juan Chen, and Zhi-Wei Jian. Ontology-based fuzzy event extraction agent for Chinese e-news summarization. Expert Systems with Applications, 25(3):431-447, 2003.

    [9] Masayuki Okamoto and Masaaki Kikuchi. Discovering volatile events in your neighborhood: Local-area topic extraction from blog entries. In: 5th Asia Information Retrieval Symposium (AIRS 2009). Lecture Notes in Computer Science, vol. 5839, pp. 181-192. Springer-Verlag Berlin Heidelberg, 2009.

    [10] Mingrong Liu, Yicen Liu, Liang Xiang, Xing Chen, and Qing Yang. Extracting key entities and significant events from online daily news. In: 9th International Conference on Intelligent Data Engineering and Automated Learning (IDEAL 2008). Lecture Notes in Computer Science, vol. 5326, pp. 201-209. Springer-Verlag Berlin Heidelberg, 2008.

    [11] L. Peshkin and A. Pfeffer. Bayesian information extraction network. In: Proc. of the 18th International Joint Conference on Artificial Intelligence (IJCAI), 2003.

    [12] Jakub Piskorski, Hristo Tanev, and Pinar Oezden Wennerberg. Extracting violent events from on-line news for ontology population. In: 10th International Conference on Business Information Systems (BIS 2007). Lecture Notes in Computer Science, vol. 4439, pp. 287-300. Springer-Verlag Berlin Heidelberg, 2007.

    [13] Ralph Grishman, Silja Huttunen, and Roman Yangarber. Information extraction for enhanced access to disease outbreak reports. Journal of Biomedical Informatics, 35(4):236-246, 2002.

    [14] Son Doan, Ai Kawazoe, and Nigel Collier. Global Health Monitor: a web-based system for detecting and mapping infectious diseases. In: Proc. International Joint Conference on Natural Language Processing (IJCNLP), Companion Volume, Hyderabad, India, pp. 951-956, 2008.


    [15] Svitlana Volkova, Doina Caragea, William H. Hsu, and Swathi Bujuru. Animal disease event recognition and classification. 2010.

    [16] Akane Yakushiji, Yuka Tateisi, Yusuke Miyao, and Jun'ichi Tsujii. Event extraction from biomedical papers using a full parser. In: 6th Pacific Symposium on Biocomputing (PSB 2001), pp. 408-419, 2001.

    [17] K. Bretonnel Cohen, Karin Verspoor, Helen L. Johnson, Chris Roeder, Philip V. Ogren, William A. Baumgartner Jr., Elizabeth White, Hannah Tipney, and Lawrence Hunter. High-precision biological event extraction with a concept recognizer. In: Workshop on BioNLP: Shared Task, collocated with the NAACL-HLT 2009 Meeting, pp. 50-58. Association for Computational Linguistics, 2009.

    [18] S. Soderland. Learning information extraction rules for semi-structured and free text. Machine Learning, vol. 34, 1999.

    [19] H. Cunningham, D. Maynard, K. Bontcheva, and V. Tablan. GATE: A framework and graphical development environment for robust NLP tools and applications. In: Proceedings of the 40th Anniversary Meeting of the Association for Computational Linguistics, 2002.

    [20] W. Shen, A. Doan, J. F. Naughton, and R. Ramakrishnan. Declarative information extraction using Datalog with embedded extraction predicates. In: VLDB, pp. 1033-1044, 2007.

    [21] Ralph Grishman and Beth Sundheim. Message Understanding Conference-6: a brief history. In: Proceedings of the 16th Conference on Computational Linguistics (COLING), Stroudsburg, PA, USA, vol. 1, pp. 466-471, 1996.

    [22] George R. Doddington. The Automatic Content Extraction (ACE) program: tasks, data, and evaluation. In: LREC, 2004.

    [23] Yoko Nishihara, Keita Sato, and Wataru Sunayama. Event extraction and visualization for obtaining personal experiences from blogs. In: Symposium on Human Interface 2009 on Human Interface and the Management of Information. Information and Interaction. Part II. Lecture Notes in Computer Science, vol. 5618, pp. 315-324. Springer-Verlag Berlin Heidelberg, 2009.


    [24] Chinatsu Aone and Mila Ramos-Santacruz. REES: A large-scale relation and event extraction system. In: 6th Applied Natural Language Processing Conference (ANLP 2000), pp. 76-83. Association for Computational Linguistics, 2000.

    [25] Fang Li, Huanye Sheng, and Dongmo Zhang. Event pattern discovery from the stock market bulletin. In: 5th International Conference on Discovery Science (DS 2002). Lecture Notes in Computer Science, vol. 2534, pp. 35-49. Springer-Verlag Berlin Heidelberg, 2002.

    [26] Maria Vargas-Vera and David Celjuska. Event recognition on news stories and semi-automatic population of an ontology. In: 3rd IEEE/WIC/ACM International Conference on Web Intelligence (WI 2004), pp. 615-618, 2004.

    [27] Philippe Capet, Thomas Delavallade, Takuya Nakamura, Agnes Sandor, Cedric Tarsitano, and Stavroula Voyatzi. A risk assessment system with automatic extraction of event types. In: Intelligent Information Processing IV, IFIP International Federation for Information Processing, vol. 288, pp. 220-229. Springer Boston, 2008.