www.polyomx.org

Wang Y1,2, Damaraju S1,3,4, Cass CE1,3,4, Murray D3,4, Fallone G3,4, Parliament M3,4 and Greiner R1,2

PolyomX Program1, Department of Computing Science2 and Oncology3, U of A and Cross Cancer Institute4

Study Design

AIM: To explore the possible relationship between 51 single nucleotide polymorphisms (SNPs) in candidate genes involved in DNA damage recognition/repair/response and clinical radiation toxicity in a retrospective cohort of patients (n=82) treated with conformal radiotherapy (3DCRT) for prostate cancer. In this study, we tested techniques from Machine Learning (ML) to build classifiers and to predict toxicity in patients treated with radiation.

SNPs (Single Nucleotide Polymorphisms) are commonly occurring genetic variations. A SNP may affect an individual's susceptibility to disease or response to a particular treatment by altering the expression of the gene in which it occurs.

Analysis of Single Nucleotide Polymorphisms in Candidate Genes and Application of Machine Learning Techniques to Predict Radiation Toxicity in Prostate Cancer Patients Treated with Conformal Radiotherapy

Methods

SNPs served as features (independent variables) and the patient response to treatment as the class label (dependent variable). Patients (n=28) with adverse reactions (rectal bleeding) to radiation more than 90 days after treatment were labeled negative and the remaining 54 positive in a binary classification. We considered two types of classifiers: the "J48" decision tree and the "KStar" nearest-neighbor classifier. For each classifier, we also used information gain to rank the SNPs and then considered classifiers based on the top k SNPs, for different values of k. We used ten-fold cross-validation to estimate the quality (predictive accuracy) of each classifier with each feature subset, as a way to identify the best classification system. We ran a permutation test (using 4000 trials) to test the significance of our results.

A decision tree is a tree-structured decision diagram built from the training data. It can be used to classify new data.

Information Gain is a concept from information theory and decision tree learning. It measures the increase in information obtained by adding a new attribute node to a rule or decision tree. An attribute with high information gain is usually preferred over other attributes.
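As a minimal sketch of how such a ranking could be computed, the Python below estimates information gain as the mutual information between one SNP attribute and the class label, using empirical frequencies. The genotype strings, the labels, and the function names `information_gain` and `rank_snps` are hypothetical illustrations, not the study's data or code, and the choice of log base 2 is an assumption.

```python
from collections import Counter
from math import log2


def information_gain(attribute, labels):
    """Mutual information I(A, C) in bits, from empirical frequencies."""
    n = len(labels)
    count_a = Counter(attribute)
    count_c = Counter(labels)
    count_ac = Counter(zip(attribute, labels))
    # Sum P(a,c) * log2(P(a,c) / (P(a) * P(c))) over the observed pairs.
    return sum(
        (n_ac / n) * log2((n_ac / n) / ((count_a[a] / n) * (count_c[c] / n)))
        for (a, c), n_ac in count_ac.items()
    )


def rank_snps(snp_table, labels):
    """Return SNP names sorted from highest to lowest information gain."""
    gains = {name: information_gain(col, labels) for name, col in snp_table.items()}
    return sorted(gains, key=gains.get, reverse=True)


# Toy data (hypothetical): genotypes for 4 patients at two made-up SNPs.
labels = ["neg", "neg", "pos", "pos"]
snps = {
    "snp_informative": ["AG", "AG", "AA", "AA"],    # tracks the label exactly
    "snp_uninformative": ["CT", "CC", "CT", "CC"],  # independent of the label
}
ranking = rank_snps(snps, labels)
```

Keeping only the first k names of `ranking` would give the "top k SNPs" feature subset that a classifier is then trained on.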

Results

Our initial analysis suggested 70-80% prediction accuracy using the following SNPs, listed in rank order: XRCC3 (A>G, 5’ UTR Nt 4541), CYP2D6*4 (G>A, Splicing defect), BRCA2 (A>G, K 1132 K), MLH1 (C>T, V 219 I), BRCA1 (A>G, R 356 Q), RAD51 (G>T, 5’ UTR Nt 172), BRCA2 (A>G, S 455 S), BRCA2 (C>A, N 289 H), and BRCA2 (A>G, D 991 N). The 4,000-trial permutation test demonstrated significance at the p<0.05 level for both J48 and KStar classifiers.

Radiation toxicity: Patients treated with conformal radiotherapy (3DCRT) were given an RTOG toxicity score from 0 to 5. We assigned a positive or negative label to each patient based on toxicity scores: a score of 2 or higher during the course of the treatment was considered negative (experiencing an adverse reaction to radiation therapy), while all other patients were given a positive label.
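The labeling rule can be written as a short function. This is only a sketch: the function name `toxicity_label`, the `threshold` parameter, and the example scores are hypothetical, and the rule encoded is the "score of 2 or higher is negative" convention stated above.

```python
def toxicity_label(rtog_score, threshold=2):
    """Map an RTOG toxicity score (0-5) to a binary class label."""
    if not 0 <= rtog_score <= 5:
        raise ValueError("RTOG score must be between 0 and 5")
    # A score at or above the threshold counts as an adverse reaction.
    return "negative" if rtog_score >= threshold else "positive"


scores = [0, 1, 2, 3, 5]  # hypothetical scores, not patient data
labels = [toxicity_label(s) for s in scores]
```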

Machine Learning: The field of machine learning is concerned with the question of how to construct computer programs that automatically improve with experience. [1] The techniques are designed to find patterns in training data and classify new data.

KStar is a nearest neighbor method with a generalized distance function based on transformations.

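Permutation test: Randomly rearrange the class labels of the data and run the shuffled data through the same algorithm. Repeating this many times shows how often an accuracy as high as the observed one arises by chance.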
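A minimal sketch of the label-shuffling procedure follows. In the study the score would be the cross-validated accuracy of J48 or KStar; here `genotype_rule_accuracy` is a toy stand-in, and all names, the add-one correction, the fixed seed, and the data are assumptions made for illustration.

```python
import random
from collections import Counter, defaultdict


def permutation_p_value(score_fn, features, labels, trials=4000, seed=0):
    """Estimate how often shuffled labels score at least as well as the real ones."""
    rng = random.Random(seed)
    observed = score_fn(features, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)  # break any link between features and labels
        if score_fn(features, shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (trials + 1)


def genotype_rule_accuracy(features, labels):
    """Toy score: accuracy of predicting each genotype's majority label."""
    by_genotype = defaultdict(Counter)
    for genotype, label in zip(features, labels):
        by_genotype[genotype][label] += 1
    correct = sum(max(counts.values()) for counts in by_genotype.values())
    return correct / len(labels)


# Hypothetical data: one SNP whose genotype matches the label exactly.
genotypes = ["AA"] * 10 + ["AG"] * 10
labels = ["pos"] * 10 + ["neg"] * 10
p_value = permutation_p_value(genotype_rule_accuracy, genotypes, labels)
```

A small `p_value` indicates that the observed score is unlikely to arise when the labels carry no information about the features.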

Information gain I(A, C):

• 0 if attribute “a” is NOT correlated with class “c”
• Positive if correlated

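K-fold Cross Validation is a common method used for model checking. The data are split into K parts; each part is held out once for testing while the model is trained on the remaining parts. (Example: K=3)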
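The splitting step can be sketched in a few lines of Python. The function name `k_fold_indices` is hypothetical, and this version does no shuffling or stratification, which a real analysis would typically add.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of k folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Spread the remainder so that fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


# K=3 on six samples: each sample is tested exactly once.
three_folds = list(k_fold_indices(6, 3))

# Ten folds over a cohort of 82 samples, as in the setup described above.
ten_folds = list(k_fold_indices(82, 10))
```

The predictive accuracy estimate is then the average of the accuracies measured on the held-out test indices of each fold.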

Reference

[1] Mitchell, T. Machine Learning. McGraw-Hill, Boston, 1997.

Conclusion: Machine Learning techniques can be used for SNP data analyses and clinical treatment outcome prediction. This preliminary analysis demonstrates the utility of Machine Learning in discriminating between populations according to SNP data towards identifying predictive SNPs for use in radio-genomics in the near future.

Acknowledgements

This work was funded by the Research Initiatives Program of the Alberta Cancer Board.

Information gain of attribute A with respect to class C:

$$I(A, C) = \sum_{a,\,c} P(a, c)\,\log\frac{P(a, c)}{P(a)\,P(c)}$$