DecisionTree Vanxuan.net

  • Definition of a decision tree
    A decision tree is a type of predictive model.
    The machine learning technique that builds such models is called decision tree learning, or simply decision trees.
    It is a descriptive means of computing conditional probabilities.
    It combines mathematical and computational techniques to help describe, classify, and generalize a given data set.

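  • Definition of a decision tree
    A decision tree is a hierarchical structure of nodes and branches.
    Three kinds of node appear in the tree:
    Root node
    Internal node: carries the name of a database attribute
    Leaf node: carries the name of a class Ci
    Branch: carries one possible value of the attribute
    A decision tree classifies an object by traversing from the root until a leaf is reached; that leaf gives the object's class.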

  • Example
    David manages a well-known golf club. He has trouble predicting whether members will show up. On some days everyone wants to play and the club does not have enough staff to serve them. On other days, for no apparent reason, nobody comes and the club is overstaffed.
    David's goal is to optimize the number of staff on duty each day by using the weather forecast to predict when people will come to play golf. To do so, he needs to understand why customers decide to play and whether that decision can be explained.
    Over two weeks he records:
    Outlook: sunny, overcast, or raining
    Temperature, in degrees Fahrenheit
    Humidity
    Wind: whether or not it is strong
    And, of course, the number of people who came to play golf that day.
    David ends up with a data set of 14 rows and 5 columns.

  • Example

  • Example
    Decide when to play golf and when not to.
    [Figure: decision tree]
    Outlook
      Sunny -> Humidity
        High -> No
        Normal -> Yes
      Overcast -> Yes
      Rain -> Wind
        Strong -> No
        Weak -> Yes

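  • Example
    Decide when to play golf and when not to.
    [Figure: partial tree with Outlook at the root and branches Sunny, Overcast, Rain; under Sunny, Humidity with High -> No and Normal -> Yes]
    Each node carries an attribute (an independent variable).
    Each branch corresponds to one value of that attribute.
    Each leaf is a class (the dependent variable).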

  • Traversing the decision tree
    [Figure: the play-golf tree, followed from the Outlook root through Humidity (under Sunny) or Wind (under Rain) down to a Yes/No leaf]

  • Logical expressions
    Notation: ∧ = AND, ∨ = OR
    Outlook=Sunny ∧ Wind=Weak
    [Figure: decision tree]
    Outlook
      Sunny -> Wind
        Strong -> No
        Weak -> Yes
      Overcast -> No
      Rain -> No

  • Logical expressions
    Outlook=Sunny ∨ Wind=Weak
    [Figure: decision tree]
    Outlook
      Sunny -> Yes
      Overcast -> Wind
        Strong -> No
        Weak -> Yes
      Rain -> Wind
        Strong -> No
        Weak -> Yes

  • Logical expressions
    (Outlook=Sunny ∧ Humidity=Normal) ∨ Outlook=Overcast ∨ (Outlook=Rain ∧ Wind=Weak)
    [Figure: decision tree]
    Outlook
      Sunny -> Humidity
        High -> No
        Normal -> Yes
      Overcast -> Yes
      Rain -> Wind
        Strong -> No
        Weak -> Yes
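    One way to check that a tree and its logical expression say the same thing is to compare them on every combination of attribute values. The sketch below does that for the play-golf tree; the function names are hypothetical and chosen only for illustration.

```python
from itertools import product


def play_by_expression(outlook, humidity, wind):
    # Disjunction of the conjunctions along every path that ends in "Yes".
    return ((outlook == "Sunny" and humidity == "Normal")
            or outlook == "Overcast"
            or (outlook == "Rain" and wind == "Weak"))


def play_by_tree(outlook, humidity, wind):
    # Hand-written traversal of the same tree, root first.
    if outlook == "Sunny":
        return humidity == "Normal"
    if outlook == "Overcast":
        return True
    return wind == "Weak"                 # remaining branch: outlook == "Rain"


combos = list(product(["Sunny", "Overcast", "Rain"],
                      ["High", "Normal"],
                      ["Strong", "Weak"]))
agree = all(play_by_expression(*c) == play_by_tree(*c) for c in combos)
print(len(combos), agree)
```

    The two functions agree on all 12 combinations, which is what "each Yes path contributes one AND term, and the paths are joined by OR" predicts.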

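  • Building a decision tree
    The tree is built top-down.
    Attributes are categorical; continuous-valued attributes are discretized in advance.
    All training samples start at the root of the tree.
    An attribute is chosen to split the samples into branches. The choice is based on a statistical or heuristic measure.
    The same construction is then repeated for each branch.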

  • Building a decision tree
    Stopping conditions:
    All samples that reach a node belong to the same class (the node becomes a leaf).
    No attribute is left that could split the samples further.
    No samples are left at the node.
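    The top-down procedure and its stopping conditions can be sketched as a short recursive function. This is a minimal ID3-style sketch, not the deck's own implementation. ROWS is an ASSUMPTION: it follows the classic 14-day play-golf table from the literature, with Temperature omitted for brevity, because the deck's own table did not survive extraction. Information gain is used as the selection measure.

```python
import math
from collections import Counter

# ASSUMPTION: classic 14-day play-golf table (Outlook, Humidity, Wind, Play).
ROWS = [
    ("Sunny", "High", "Weak", "No"),       ("Sunny", "High", "Strong", "No"),
    ("Overcast", "High", "Weak", "Yes"),   ("Rain", "High", "Weak", "Yes"),
    ("Rain", "Normal", "Weak", "Yes"),     ("Rain", "Normal", "Strong", "No"),
    ("Overcast", "Normal", "Strong", "Yes"), ("Sunny", "High", "Weak", "No"),
    ("Sunny", "Normal", "Weak", "Yes"),    ("Rain", "Normal", "Weak", "Yes"),
    ("Sunny", "Normal", "Strong", "Yes"),  ("Overcast", "High", "Strong", "Yes"),
    ("Overcast", "Normal", "Weak", "Yes"), ("Rain", "High", "Strong", "No"),
]
ATTRS = {"Outlook": 0, "Humidity": 1, "Wind": 2}   # attribute name -> column index


def entropy(rows):
    counts = Counter(r[-1] for r in rows)
    return -sum(c / len(rows) * math.log2(c / len(rows)) for c in counts.values())


def gain(rows, idx):
    sizes = Counter(r[idx] for r in rows)
    remainder = sum(n / len(rows) * entropy([r for r in rows if r[idx] == v])
                    for v, n in sizes.items())
    return entropy(rows) - remainder


def build(rows, attrs):
    labels = Counter(r[-1] for r in rows)
    # Stop: the node is pure, or no attribute is left (then use the majority class).
    if len(labels) == 1 or not attrs:
        return labels.most_common(1)[0][0]
    best = max(attrs, key=lambda a: gain(rows, attrs[a]))
    rest = {a: i for a, i in attrs.items() if a != best}
    # Branches are created only for values present in rows, so the
    # "no samples left at the node" case never arises in this sketch.
    values = sorted({r[attrs[best]] for r in rows})
    return (best, {v: build([r for r in rows if r[attrs[best]] == v], rest)
                   for v in values})


tree = build(ROWS, ATTRS)
print(tree)
```

    On the assumed table this yields Outlook at the root, Humidity under Sunny, Wind under Rain, and a pure Yes leaf under Overcast.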

  • Attribute selection
    Selection measure: the chosen attribute is the one most useful for classification (the one that leads to the smallest tree).
    Two measures are in common use:
    1. Information gain
    Assumes all attributes are categorical (non-numeric).
    Can be adapted to numeric attributes.
    2. Gini index
    Assumes all attributes are numeric.
    Assumes several candidate split values exist for each attribute.
    Can be adapted to categorical attributes.

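  • Information gain
    S: number of samples in the training set
    Si: number of samples of S that belong to class Ci, for i = 1, ..., m
    Information needed to classify a sample:
    $$I(S_1, \dots, S_m) = -\sum_{i=1}^{m} \frac{S_i}{S} \log_2 \frac{S_i}{S}$$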
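    The information needed to classify a sample is the entropy of the class distribution. A minimal sketch in Python, assuming the class counts are given as a plain list; the function name info is hypothetical.

```python
import math


def info(class_counts):
    """Entropy, in bits, of a node described by its per-class sample counts."""
    total = sum(class_counts)
    # Classes with zero samples contribute nothing (0 * log 0 is taken as 0).
    return -sum(c / total * math.log2(c / total) for c in class_counts if c > 0)


print(f"{info([9, 5]):.3f}")   # 9 Yes and 5 No samples
print(f"{info([7, 7]):.3f}")   # evenly mixed node
```

    A node with 9 samples of one class and 5 of the other needs about 0.940 bits; an evenly mixed two-class node needs exactly 1 bit, and a pure node needs 0.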

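  • Information gain
    Attribute A has the values {a1, a2, ..., an}.
    Using A splits the training set into n subsets {S1, S2, ..., Sn}.
    Sij: number of samples of class Ci in subset Sj (where A = aj)
    Entropy of attribute A:
    $$E(A) = \sum_{j=1}^{n} \frac{S_{1j} + \dots + S_{mj}}{S} \, I(S_{1j}, \dots, S_{mj})$$

    Information gain from branching on attribute A:
    $$Gain(A) = I(S_1, \dots, S_m) - E(A)$$

    At each level, the attribute with the largest gain is chosen to split the current tree.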
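    The two extremes of information gain are easy to see in code. In this sketch, which uses hypothetical names and made-up counts, each subset Sj is given as a list of per-class counts Sij.

```python
import math


def info(counts):
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)


def gain(subsets):
    """subsets[j][i] is S_ij: the number of class C_i samples in subset S_j."""
    class_totals = [sum(column) for column in zip(*subsets)]   # S_1 .. S_m
    total = sum(class_totals)                                  # S
    e_a = sum(sum(s) / total * info(s) for s in subsets)       # E(A)
    return info(class_totals) - e_a


perfect = gain([[4, 0], [0, 4]])   # every subset is pure
useless = gain([[2, 2], [2, 2]])   # every subset is as mixed as the parent
print(perfect, useless)
```

    A split into pure subsets recovers the full parent entropy (gain 1 bit here), while a split that leaves each subset as mixed as the parent gains nothing.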

  • Example

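  • Information gain, example
    We have:
    S = 14
    m = 2
    C1 = Yes, C2 = No
    S1 = 9, S2 = 5
    $$I(S_1, S_2) = I(9, 5) = -\frac{9}{14}\log_2\frac{9}{14} - \frac{5}{14}\log_2\frac{5}{14} = 0.940$$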

  • Information gain, example
    Humidity:
    High: [3+, 4-], E = 0.985
    Normal: [6+, 1-], E = 0.592
    Gain(S, Humidity) = 0.940 - (7/14)*0.985 - (7/14)*0.592 = 0.151
    Note: to compute log2 5 on a pocket calculator, press 5 log / 2 log =, that is, $\log_2 5 = \log 5 / \log 2$.

  • Information gain, example
    Wind:
    Weak: [6+, 2-], E = 0.811
    Strong: [3+, 3-], E = 1.000
    Gain(S, Wind) = 0.940 - (8/14)*0.811 - (6/14)*1.000 = 0.048

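  • Information gain, example
    Outlook:
    Sunny: [2+, 3-], E = 0.971
    Overcast: [4+, 0-], E = 0.000
    Rain: [3+, 2-], E = 0.971
    Gain(S, Outlook) = 0.940 - (5/14)*0.971 - (4/14)*0.0 - (5/14)*0.971 = 0.247
    For comparison: Gain(S, Humidity) = 0.151 and Gain(S, Wind) = 0.048, so Outlook has the largest gain and is chosen for the root.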
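    The three gains can be re-derived directly from the [+, -] counts on the slides. This is a sketch with hypothetical names; only the per-branch counts are taken from the example. Computing without rounding the intermediate entropies gives about 0.152 for Humidity rather than 0.151; the ranking is unchanged.

```python
import math


def info(counts):
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)


def gain(subsets):
    class_totals = [sum(column) for column in zip(*subsets)]
    total = sum(class_totals)
    return info(class_totals) - sum(sum(s) / total * info(s) for s in subsets)


# Per-branch [Yes, No] counts from the example (14 samples: 9 Yes, 5 No).
BRANCH_COUNTS = {
    "Humidity": [[3, 4], [6, 1]],
    "Wind": [[6, 2], [3, 3]],
    "Outlook": [[2, 3], [4, 0], [3, 2]],
}

gains = {attr: gain(counts) for attr, counts in BRANCH_COUNTS.items()}
best = max(gains, key=gains.get)
for attr, value in gains.items():
    print(f"Gain(S, {attr}) = {value:.3f}")
print("split on:", best)
```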

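  • Gini index
    Gini index of node t:
    $$GINI(t) = 1 - \sum_{j} [p(j \mid t)]^2$$
    where $p(j \mid t)$ is the relative frequency of class j at node t.
    Maximum: $1 - 1/n_c$, reached when the samples are evenly distributed over the $n_c$ classes.
    Minimum: 0, reached when all samples belong to a single class.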

  • Gini index examples
    C1 = 0, C2 = 6: P(C1) = 0/6 = 0, P(C2) = 6/6 = 1, GINI = 1 - (P(C1)^2 + P(C2)^2) = 1 - (0 + 1) = 0
    C1 = 1, C2 = 5: P(C1) = 1/6, P(C2) = 5/6, GINI = 1 - (1/6)^2 - (5/6)^2 = 0.278
    C1 = 2, C2 = 4: P(C1) = 2/6, P(C2) = 4/6, GINI = 1 - (2/6)^2 - (4/6)^2 = 0.444
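    The Gini index of a node is a one-line computation once the class counts are known. A minimal sketch, with a hypothetical function name, applied to six-sample nodes with two classes:

```python
def gini(class_counts):
    """Gini index of a node described by its per-class sample counts."""
    total = sum(class_counts)
    return 1.0 - sum((c / total) ** 2 for c in class_counts)


for counts in ([0, 6], [1, 5], [2, 4], [3, 3]):
    print(counts, round(gini(counts), 3))
```

    The last case, an even 3/3 split, reaches the two-class maximum of 1 - 1/2 = 0.5.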

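  • Splitting with the Gini index
    When node p is split into k branches, the quality of the split is:
    $$GINI_{split} = \sum_{i=1}^{k} \frac{n_i}{n} \, GINI(i)$$
    where $n_i$ is the number of samples in child node i and $n$ is the number of samples in node p.
    The attribute with the smallest $GINI_{split}$ is chosen for the split.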

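  • Splitting on a binary attribute
    The node is split into exactly 2 branches.

          N1   N2
    C1     5    1
    C2     2    4

    N1 holds 7 samples and N2 holds 5, so the class frequencies are taken over 7 and 5 respectively:
    Gini(N1) = 1 - (5/7)^2 - (2/7)^2 = 0.408
    Gini(N2) = 1 - (1/5)^2 - (4/5)^2 = 0.320
    Gini_split = 7/12 * 0.408 + 5/12 * 0.320 = 0.371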
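    The weighted Gini of a split can be checked numerically. The sketch below uses hypothetical function names and the count table from this slide. Class frequencies are computed within each child, that is 5/7 and 2/7 in N1 and 1/5 and 4/5 in N2, so that they sum to 1 in every node.

```python
def gini(counts):
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


def gini_split(children):
    """Weighted average of the children's Gini indices (weights n_i / n)."""
    n = sum(sum(child) for child in children)
    return sum(sum(child) / n * gini(child) for child in children)


n1, n2 = [5, 2], [1, 4]            # [C1, C2] counts in each child node
print(round(gini(n1), 3), round(gini(n2), 3), round(gini_split([n1, n2]), 3))
```

    With these counts the children score about 0.408 and 0.320, and the split scores about 0.371.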

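  • Splitting on a continuous attribute
    Use a single split value for a binary split.
    Use several split values for a multiway split.
    For each candidate value v, count the samples of each class in the two partitions A < v and A ≥ v.
    Simple way to choose v: for every value v in the database, compute the Gini index of the split and keep the value with the smallest Gini. This is inefficient.
    [Figure: binary split on Tax, > 80K versus < 80K]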

  • Splitting on a continuous attribute
    Efficient way to choose v:
    Sort the attribute values in ascending order.
    Take the midpoint between each pair of consecutive values as a candidate split value and compute its Gini index.
    Choose the split value with the lowest Gini index.
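    The sort-then-scan idea can be sketched as follows. The function name, the income values (in thousands), and the labels are hypothetical illustration data, not taken from the slides. Class counts on each side are updated incrementally, so the sorted data is scanned only once.

```python
from collections import Counter


def gini(counts):
    total = sum(counts.values())
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


def best_threshold(values, labels):
    """Return (threshold, gini_split) for the best binary split A < v / A >= v."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    left, right = Counter(), Counter(label for _, label in pairs)
    best_score, best_value = float("inf"), None
    for i in range(n - 1):
        value, label = pairs[i]
        left[label] += 1                   # move sample i to the left partition
        right[label] -= 1
        next_value = pairs[i + 1][0]
        if next_value == value:
            continue                       # no threshold fits between equal values
        threshold = (value + next_value) / 2
        score = (i + 1) / n * gini(left) + (n - i - 1) / n * gini(right)
        if score < best_score:
            best_score, best_value = score, threshold
    return best_value, best_score


# Hypothetical data: taxable income and a Yes/No class label.
income = [125, 100, 70, 120, 95, 60, 220, 85, 75, 90]
label = ["No", "No", "No", "No", "Yes", "No", "No", "Yes", "No", "Yes"]
print(best_threshold(income, label))
```

    On this made-up data the best cut falls at 97.5, between 95 and 100, with a split Gini of about 0.3.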

  • Converting a decision tree into rules
    Knowledge is represented as IF-THEN rules.
    One rule is created for each path from the root to a leaf.
    The attribute-value pairs along the path are joined by conjunction (AND).
    The leaf carries the name of the class.

  • Converting a decision tree into rules
    [Figure: the play-golf tree with Outlook at the root, Humidity under Sunny, Wind under Rain]
    R1: If (Outlook=Sunny) ∧ (Humidity=High) Then Play=No
    R2: If (Outlook=Sunny) ∧ (Humidity=Normal) Then Play=Yes
    R3: If (Outlook=Overcast) Then Play=Yes
    R4: If (Outlook=Rain) ∧ (Wind=Strong) Then Play=No
    R5: If (Outlook=Rain) ∧ (Wind=Weak) Then Play=Yes
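    Rule extraction amounts to listing every root-to-leaf path. A minimal sketch, assuming a hypothetical nested-tuple encoding of the play-golf tree and writing AND in place of the conjunction symbol:

```python
# Hypothetical encoding: (attribute, {value: subtree}) for internal nodes,
# a plain string for leaves.
TREE = ("Outlook", {
    "Sunny": ("Humidity", {"High": "No", "Normal": "Yes"}),
    "Overcast": "Yes",
    "Rain": ("Wind", {"Strong": "No", "Weak": "Yes"}),
})


def rules(tree, path=()):
    """Return one IF-THEN rule per root-to-leaf path."""
    if isinstance(tree, str):                       # leaf: emit the finished rule
        condition = " AND ".join(f"({a}={v})" for a, v in path)
        return [f"If {condition} Then Play={tree}"]
    attribute, branches = tree
    found = []
    for value, subtree in branches.items():
        found.extend(rules(subtree, path + ((attribute, value),)))
    return found


for number, rule in enumerate(rules(TREE), start=1):
    print(f"R{number}: {rule}")
```

    The tree has five leaves, so five rules come out, in the same order as R1 to R5.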

  • Advantages of decision trees
    Decision trees are easy to understand.
    Data preparation for a decision tree is basic or unnecessary.
    Decision trees can handle both numeric data and categorical data.
    A decision tree is a white-box model.
    A model can be validated with statistical tests.
    Decision trees can process large amounts of data in a short time.
