Effective prediction of three common diseases by combining SMOTE with Tomek links technique for imbalanced medical data

Min Zeng, Beiji Zou, Faran Wei, Xiyao Liu, Lei Wang
School of Information Science and Engineering, Central South University, Changsha, People's Republic of China
Mobile Health Ministry of Education-China Mobile Joint Laboratory, Central South University, Changsha, People's Republic of China
Corresponding author: Lei Wang
E-mail addresses: zengmin@….com (Min Zeng), bjzou@csu.edu.cn (Beiji Zou), franwee@….com (Faran Wei), lxyzoewx@csu… (Xiyao Liu), wanglei@csu.edu.cn (Lei Wang)

Abstract: Diabetes, vertebral column pathologies and Parkinson's disease are three common diseases that have high prevalence and have brought great trouble and pain to billions of patients. Computer-aided diagnosis can support physicians' decision making. However, the imbalanced nature of medical data sets has hampered the mining of medical resources. In this study, we propose a powerful preprocessing method that combines the Synthetic Minority Oversampling Technique (SMOTE) with the Tomek links technique, and we apply it to the imbalanced medical data sets of the three diseases. Using … classifiers, we compared the experimental results with those obtained using SMOTE alone, to evaluate the effectiveness of this method. The results show that SMOTE combined with Tomek links is much superior to SMOTE alone. The performance is evidently better: …, … and … out of a total of … evaluation metrics are improved for diabetes, Parkinson's disease and vertebral column pathologies, respectively.

Keywords: imbalanced medical data; SMOTE; Tomek links; diabetes; vertebral column pathologies; Parkinson's disease

I. INTRODUCTION

Diabetes, vertebral column pathologies and Parkinson's disease have brought great trouble and pain to a great number of patients. These diseases cause serious harm to people's health and quality of life, and place a heavy burden on families and society. Accurate diagnosis of these diseases is vital to improving patients' quality of life. Machine learning techniques are developing rapidly and are used for the detection and prediction of these common diseases […]. However, medical data are normally collected over a long period of time, so imbalanced data sets are often encountered. Data imbalance means that samples are not evenly distributed among the different classes. The imbalanced nature of medical data often hampers the mining of medical resources. Although great progress has been achieved in machine learning, constructing efficient algorithms that learn from imbalanced data remains a challenging task.

Several methods have been proposed to deal with imbalanced data, and data sampling is one of the most frequently used preprocessing techniques […]. The Synthetic Minority Oversampling Technique (SMOTE), an effective data sampling algorithm, has been applied to analyze imbalanced medical data […]. The results in Refs. […] showed that SMOTE yielded improved performance compared with using the raw data sets. Other work […] demonstrated that SMOTE is superior, or at least comparable, to conventional random sampling techniques. These studies improved the performance of classifiers by using SMOTE. We suggest that the performance can be further improved by combining SMOTE with data cleaning techniques.

In this work, we propose a preprocessing technique that combines SMOTE with Tomek links to address the problem of imbalanced medical data. Tomek links, as a data cleaning technique, are applied to remove the samples generated by SMOTE near the classification boundary. With this cleaning step, the boundary between different classes can be identified more easily. To the best of our knowledge, such a combined technique has never been utilized in the treatment of medical data. Our results show that much better performance is obtained with SMOTE plus Tomek links than with SMOTE alone. Based on previous research […] and our experimental results, we conclude that the combined method, using SMOTE as a data sampling technique and Tomek links as a data cleaning technique, is a powerful preprocessing algorithm for imbalanced medical classification data.

II. METHODOLOGY

The Synthetic Minority Oversampling Technique (SMOTE), an oversampling technique, was proposed by Chawla et al. […]. The key idea is to find, for each minority class sample $x_i$, its K nearest neighbors among the minority class samples, and then to randomly select one of these neighbors, denoted $\hat{x}_i$. By using …

978-1-4673-7755-3/16/$31.00 ©2016 IEEE 225 Proceedings of ICOACS2016
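A new synthetic sample $x_{new}$ is then generated by random interpolation between $x_i$ and $\hat{x}_i$:

$$x_{new} = x_i + (\hat{x}_i - x_i) \times \delta$$

where $\delta$ is a random number in [0, 1].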
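To make the interpolation step concrete, here is a minimal pure-Python sketch of SMOTE as described above. It is not the authors' implementation. It assumes Euclidean distance, a fixed number of synthetic samples per minority sample (the hypothetical parameter `n_per_sample`), and $\delta$ drawn from [0, 1) with Python's `random.random()`. The names `euclidean` and `smote` are illustrative.

```python
import math
import random
from typing import List, Optional, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors (assumed metric)."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def smote(minority: List[List[float]], n_per_sample: int, k: int = 5,
          rng: Optional[random.Random] = None) -> List[List[float]]:
    """Generate n_per_sample synthetic samples for each minority sample x_i.

    x_new = x_i + (x_hat_i - x_i) * delta, where x_hat_i is drawn at random
    from the K nearest minority neighbours of x_i and delta is in [0, 1).
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = rng if rng is not None else random.Random()
    k = min(k, len(minority) - 1)  # cannot have more neighbours than samples
    synthetic = []
    for i, x_i in enumerate(minority):
        # K nearest neighbours of x_i, searched among minority samples only.
        others = [x for j, x in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda x: euclidean(x_i, x))[:k]
        for _ in range(n_per_sample):
            x_hat = rng.choice(neighbours)  # randomly selected neighbour
            delta = rng.random()            # random number in [0, 1)
            synthetic.append([a + (b - a) * delta
                              for a, b in zip(x_i, x_hat)])
    return synthetic


minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote(minority, n_per_sample=2, k=2, rng=random.Random(0))
print(len(new_points))  # 8
```

With four minority samples at the corners of the unit square and `k=2`, each corner's two nearest minority neighbours are its adjacent corners. Every synthetic point therefore lies on an edge of the square. More generally, both endpoints of each interpolation are minority samples, so every synthetic point lies on a segment between two existing minority samples.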

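Given two samples $x_i$ and $x_j$ belonging to different classes, let $d(x_i, x_j)$ be the distance between $x_i$ and $x_j$. The pair $(x_i, x_j)$ is called a Tomek link if there is no sample $x_k$ such that

$$d(x_i, x_k) < d(x_i, x_j) \quad \text{or} \quad d(x_j, x_k) < d(x_i, x_j)$$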
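The Tomek link definition above can be checked directly. Below is an illustrative, quadratic-time pure-Python sketch. It assumes Euclidean distance and a cleaning step that removes both samples of every Tomek link. The text here does not specify which member or members are removed, so that choice is an assumption. `tomek_links` and `remove_tomek_links` are hypothetical names. In the combined method, this cleaning step would run on the training set after SMOTE has added the synthetic minority samples.

```python
import math
from typing import List, Sequence, Tuple


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance d(a, b) between two feature vectors (assumed metric)."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def tomek_links(X: List[List[float]], y: List[int]) -> List[Tuple[int, int]]:
    """Return index pairs (i, j), i < j, that form Tomek links.

    (x_i, x_j) with different labels is a Tomek link when no other sample
    x_k satisfies d(x_i, x_k) < d(x_i, x_j) or d(x_j, x_k) < d(x_i, x_j).
    """
    n = len(X)
    if n < 2:
        return []
    # Distance from each sample to its nearest other sample.
    nearest = [min(euclidean(X[i], X[k]) for k in range(n) if k != i)
               for i in range(n)]
    links = []
    for i in range(n):
        for j in range(i + 1, n):
            if y[i] == y[j]:
                continue  # Tomek links only join samples of different classes
            d_ij = euclidean(X[i], X[j])
            # nearest[i] <= d_ij always holds (x_j is a candidate), so ">="
            # means no x_k is strictly closer to x_i (likewise for x_j).
            if nearest[i] >= d_ij and nearest[j] >= d_ij:
                links.append((i, j))
    return links


def remove_tomek_links(X: List[List[float]], y: List[int]):
    """Cleaning step (assumption): drop both samples of every Tomek link."""
    drop = {idx for pair in tomek_links(X, y) for idx in pair}
    keep = [i for i in range(len(X)) if i not in drop]
    return [X[i] for i in keep], [y[i] for i in keep]


X = [[0.0], [1.0], [1.1], [3.0]]
y = [0, 0, 1, 1]
print(tomek_links(X, y))         # [(1, 2)]
print(remove_tomek_links(X, y))  # ([[0.0], [3.0]], [0, 1])
```

In this one-dimensional toy example, the samples at indices 1 and 2 are each other's closest points and carry different labels. They therefore form the only Tomek link, and cleaning leaves the two well-separated samples.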