
A Random Walk Based Approach for Improving Interaction Network and Increasing Prediction Accuracy

Chengwei LEI, Ph.D., Assistant Professor of Computer Science

Department of Electrical Engineering and Computer Science, McNeese State University

What Is an Interaction Network?

• An interaction network is a network of nodes that are connected by features.

• If the features are physical and molecular, the interaction network consists of molecular interactions, usually those found in cells.

First Introduced in Biology

Network View of a Protein Interaction Network

Sounds familiar?


Even in Mechanical Engineering

Real-world Classification

• Noisy data

• Overfitting problem

• Few true “driver” changes vs. a vast number of “passenger” changes

Figure: Good vs. Bad

Current Methods

Statistical test → Pick the most significant ones → Classifier → Prediction
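The statistical-test step above (test every feature, keep the most significant ones) can be illustrated with a minimal sketch. It assumes two classes labelled "Good" and "Bad" and uses Welch's t statistic; the slides do not say which t-test variant was used, and the function names `welch_t` and `select_top_features` are hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two independent samples with unequal variances."""
    # statistics.variance uses the sample (n - 1) denominator; each group needs
    # at least two values and a non-zero spread, or this divides by zero.
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


def select_top_features(X, labels, k):
    """Return the indices of the k features with the largest |t| between classes.

    X      -- list of samples, each a list of feature (node readout) values
    labels -- one class label per sample, "Good" or "Bad" (hypothetical labels)
    """
    scores = []
    for j in range(len(X[0])):
        good = [row[j] for row, y in zip(X, labels) if y == "Good"]
        bad = [row[j] for row, y in zip(X, labels) if y == "Bad"]
        scores.append((abs(welch_t(good, bad)), j))
    # Most significant first (largest absolute t statistic).
    scores.sort(reverse=True)
    return [j for _, j in scores[:k]]


# Toy data: 6 samples x 3 features; only feature 0 separates the classes.
X = [
    [5.0, 1.0, 2.0], [6.0, 1.2, 2.1], [5.5, 0.9, 1.9],
    [1.0, 1.1, 2.0], [1.5, 1.0, 2.2], [0.5, 0.8, 1.8],
]
labels = ["Good"] * 3 + ["Bad"] * 3
print(select_top_features(X, labels, 1))  # [0]
```

Each feature is scored on its own, which is exactly the limitation raised next: the relationships between features play no part in the ranking.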

Problem?

• They ignore the relationships between nodes/features/sensors

Our approach

• Improve prognosis by combining

– Node readout data

– Node-node interaction networks

Network → Transformation Matrix, applied to the node readout data → Classifier → Prediction
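One plausible reading of this pipeline, stated here as an assumption rather than the author's exact procedure, is that the samples-by-features readout matrix is multiplied by the features-by-features transformation matrix before classification. The sketch shows that step with plain lists; `transform` is a hypothetical helper, and the orientation (X·T rather than X·Tᵀ) is also assumed.

```python
def transform(X, T):
    """Return X @ T for a samples-by-features X and a features-by-features T.

    Assumption: the enhanced value of feature j for a sample is
    sum_i X[sample][i] * T[i][j], i.e. a network-weighted mix of readouts.
    """
    n = len(T)
    if any(len(row) != n for row in X):
        raise ValueError("each sample must have one value per row of T")
    return [
        [sum(row[i] * T[i][j] for i in range(n)) for j in range(len(T[0]))]
        for row in X
    ]


# One sample with three features and a toy 3x3 transformation matrix.
X = [[1.0, 0.0, 2.0]]
T = [
    [0.5, 0.5, 0.0],
    [0.25, 0.5, 0.25],
    [0.0, 0.5, 0.5],
]
print(transform(X, T))  # [[0.5, 1.5, 1.0]]
```

Feature 1 has a readout of 0.0 in this sample but receives 1.5 after the transformation because its network neighbours (features 0 and 2) carry signal; this is the kind of effect the "enhanced dataset" is meant to capture.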

Transformation Matrix

• The transformation matrix is generated by applying the Random Walk with Restart (RWR) algorithm to the interaction network.

• A random walk is a mathematical formalization of a path that consists of a succession of random steps.

Random Walk


• A random walk of a single walker on a graph G is a walk on G in which the next node is chosen uniformly at random from the neighbors of the current node: when the walk is at node v, the probability of moving to node u in the next step is $P_{vu} = 1/d(v)$ if v and u are connected and 0 otherwise, where d(v) is the degree of v.
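The transition rule $P_{vu} = 1/d(v)$ can be made concrete with a short sketch that builds the transition matrix from an edge list and simulates a few steps. The graph, the helper names `transition_matrix` and `random_walk`, and the assumption that every node has at least one neighbor are all illustrative choices, not taken from the slides.

```python
import random


def transition_matrix(n, edges):
    """Build P with P[v][u] = 1/d(v) if v and u are connected, else 0.

    n     -- number of nodes, labelled 0..n-1
    edges -- undirected (v, u) pairs; every node is assumed to have degree >= 1
    """
    neighbors = [set() for _ in range(n)]
    for v, u in edges:
        neighbors[v].add(u)
        neighbors[u].add(v)
    P = [[0.0] * n for _ in range(n)]
    for v in range(n):
        d = len(neighbors[v])  # degree d(v)
        for u in neighbors[v]:
            P[v][u] = 1.0 / d
    return P, neighbors


def random_walk(neighbors, start, steps, rng):
    """Take `steps` uniform random steps from `start`; return the visited nodes."""
    path = [start]
    for _ in range(steps):
        # sorted() gives random.choice a sequence with a reproducible order
        path.append(rng.choice(sorted(neighbors[path[-1]])))
    return path


# Toy graph: triangle 0-1-2 plus a pendant node 3 attached to node 2.
P, nbrs = transition_matrix(4, [(0, 1), (0, 2), (1, 2), (2, 3)])
print(P[2])  # node 2 has degree 3, so each neighbor gets probability 1/3
print(random_walk(nbrs, start=0, steps=5, rng=random.Random(42)))
```

Every row of P sums to 1, so P is row-stochastic; an isolated node would give an all-zero row and would need special handling.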

Figure: a random walk on an example graph, shown at Step 1, Step 2, Step 3, …, Step N

Random Walk with Restart

• A random walker starts from a node v and

– visits its neighbors with uniform probability

– revisits the start node v with a fixed probability c

• The probability for a random walker to be on node j after k steps is expressed in terms of:

– $f_{ij}^{k}(v)$, the probability for a random walker to take the path from i to j at step k

– $F_j(v)$, which at equilibrium is the probability for a random walker starting from node v to reach node j => similarity between patients v and j
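Since the slide's own equation was not recovered, the sketch below uses the common power-iteration form of RWR as an assumption: $r^{(k+1)} = (1-c)\, r^{(k)} P + c\, e_v$, where $e_v$ puts all mass on the start node. At convergence, entry j of $r$ plays the role of $F_j(v)$, and stacking one row per start node gives a node-by-node matrix. The restart value c = 0.15, the toy graph, and the names `rwr` and `rwr_matrix` are illustrative choices.

```python
def rwr(P, start, c=0.15, tol=1e-12, max_iter=10_000):
    """Random Walk with Restart from node `start`, by power iteration.

    P -- row-stochastic transition matrix, P[i][j] = 1/d(i) for neighbours
    c -- restart probability (0.15 is an arbitrary example value)
    Returns r, where r[j] approximates F_j(start): the equilibrium probability
    that a walker starting from `start` is at node j.
    """
    n = len(P)
    e = [1.0 if j == start else 0.0 for j in range(n)]
    r = e[:]
    for _ in range(max_iter):
        # With prob. 1 - c follow a random edge, with prob. c jump back to start.
        nxt = [(1 - c) * sum(r[i] * P[i][j] for i in range(n)) + c * e[j]
               for j in range(n)]
        converged = sum(abs(a - b) for a, b in zip(nxt, r)) < tol
        r = nxt
        if converged:
            break
    return r


def rwr_matrix(P, c=0.15):
    """Row v is the RWR distribution of a walker that restarts at node v."""
    return [rwr(P, v, c) for v in range(len(P))]


# Toy graph: triangle 0-1-2 plus node 3 attached to node 2.
P = [
    [0.0, 0.5, 0.5, 0.0],
    [0.5, 0.0, 0.5, 0.0],
    [1 / 3, 1 / 3, 0.0, 1 / 3],
    [0.0, 0.0, 1.0, 0.0],
]
F = rwr_matrix(P)
print([round(x, 3) for x in F[0]])
```

The restart keeps most of the probability mass near the start node, so nodes a few hops away (node 3 seen from node 0) receive a smaller but non-zero similarity. Each update shrinks the error by a factor of 1 - c, so the iteration converges even on bipartite graphs.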

How about Two?

Experiments

• Biology data

– Cancer prediction

Classification results

Wang’s Dataset

Figure: the network and the transformation matrix (entries shown as 0/1 values); matrix dimensions shown: 286, 7885, 10144 and 2259

Figure: T-test on Good vs. Bad samples (data dimensions 286 × 7885, plus 2259); significant-gene counts shown: 1247 and 1678; overlap values shown: 483, 1195 and 52

P-value comparison for Wang’s data

Significantly down-regulated genes

Significantly up-regulated genes

For Vijver’s dataset

Figure: DE (differentially expressed) genes; values shown: 146349 856

Further verification

• For verification, search for each gene in the PubMed database

– Pick the top DE genes from the original dataset and from the enhanced dataset

– Use the keywords “(GENE-NAME) AND Cancer AND (Metastasis OR Metastatic)”
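The slides do not say how the PubMed searches were run. As a sketch under that caveat, the code below fills the keyword template for each gene and builds an NCBI E-utilities ESearch URL that asks only for the hit count (`rettype=count`). It sends no request, and the helper names `pubmed_query` and `esearch_count_url` are hypothetical. PubMed expects Boolean operators in upper case, which is why the template uses `OR`.

```python
from urllib.parse import urlencode

# NCBI E-utilities ESearch endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def pubmed_query(gene):
    """Fill the slide's keyword template for one gene symbol."""
    return f"({gene}) AND Cancer AND (Metastasis OR Metastatic)"


def esearch_count_url(gene):
    """URL asking PubMed (via ESearch) only for the number of matching records."""
    params = {"db": "pubmed", "term": pubmed_query(gene), "rettype": "count"}
    return ESEARCH_URL + "?" + urlencode(params)


# Genes discussed on the following slides.
for gene in ["SLC26A8", "RPS6", "G2E3", "RACGAP1"]:
    print(pubmed_query(gene))
    print(esearch_count_url(gene))
```

Comparing these counts for the top genes of the original and the enhanced dataset gives a rough, literature-based check of whether the newly significant genes have known links to metastasis.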

Top 15 DE genes in original dataset

Top 15 original non-significant genes in the enhanced dataset


• SLC26A8 is a gene related to diseases of the male reproductive system

• It is also related to breast cancer

– A. E. Dahm, A. L. Eilertsen, J. Goeman, “A microarray study on the effect of four hormone therapy regimens on gene transcription in whole blood from healthy postmenopausal women,” Thrombosis Research, vol. 130, no. 1, pp. 45–51, 2012.

– J.-H. Shin, E. Son, H. Lee, S. Kim, “Molecular and functional expression of anion exchangers in cultured normal human nasal epithelial cells,” Acta Physiologica, vol. 191, no. 2, pp. 99–110, 2007.


• RPS6 is an important gene in cancer research, especially for the development of antibody-based cancer drugs

– J. C. Potratz, D. N. Saunders, D. H. Wai, et al., “Synthetic lethality screens reveal RPS6 and MST1R as modifiers of insulin-like growth factor-1 receptor inhibitor activity in childhood sarcomas,” Cancer Research, vol. 70, no. 21, pp. 8770–8781, 2010.

– F. Henjes, C. Bender, S. von der Heyde, L. Braun, H. et al., “Strong EGFR signaling in cell line models of ERBB2-amplified breast cancer attenuates response towards ERBB2-targeting drugs,” Oncogenesis, vol. 1, no. 7, p. e16, 2012.


• G2E3 is a dual-function ubiquitin ligase required for early embryonic development

• It is also a nucleocytoplasmic shuttling protein with DNA damage-responsive localization

– W. S. Brooks, E. S. Helton, S. Banerjee, “G2E3 is a dual function ubiquitin ligase required for early embryonic development,” Journal of Biological Chemistry, vol. 283, no. 32, pp. 22304–22315, 2008.


• RACGAP1 plays a regulatory role in cell growth, transformation and metastasis

– S. Saigusa, K. Tanaka, Y. Mohri, M. Ohi, T. Shimura, et al., “Clinical significance of RACGAP1 expression at the invasive front of gastric cancer,” Gastric Cancer, pp. 1–9, 2014.

– V. Kotoula, K. T. Kalogeras, G. Kouvatseas, D. Televantou, R. Kronenwett, “Sample parameters affecting the clinical relevance of RNA biomarkers in translational breast cancer research,” Virchows Archiv, vol. 462, no. 2, pp. 141–154, 2013.

– K. Pliarchopoulou, K. Kalogeras, R. Kronenwett, et al., “Prognostic significance of RACGAP1 mRNA expression in high-risk early breast cancer: a study in primary tumors of breast cancer patients participating in a randomized Hellenic Cooperative Oncology Group trial,” Cancer Chemotherapy and Pharmacology, vol. 71, no. 1, pp. 245–255, 2013.


Ongoing Experiment

Thank you