Simulation and Application on learning gene causal relationships Xin Zhang


Page 1: Simulation and Application on learning gene causal relationships Xin Zhang

Simulation and Application on learning gene causal relationships

Xin Zhang

Page 2: Simulation and Application on learning gene causal relationships Xin Zhang

Introduction

• High-throughput genetic technologies empower us to study how genes interact with each other;

• Simulation is used to evaluate how well the IC algorithm learns gene causal relationships;

• We present an algorithm (the mIC algorithm) for learning causal relationships with knowledge of topological ordering information;

• We apply the mIC algorithm to the Melanoma dataset;

Page 3: Simulation and Application on learning gene causal relationships Xin Zhang

Steps for Simulation Study

• Construct a causal network N;

• Generate datasets based on the causal network;

• Learn from the simulated data using causal algorithms (e.g. the IC algorithm) to obtain a network N´;

• Compare the original network N with the obtained network N´ with respect to precision and recall;

Page 4: Simulation and Application on learning gene causal relationships Xin Zhang

Modeling and simulation of a causal Boolean network (BN)

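• Boolean network: genes A and B are the inputs of a Boolean function f that determines gene C, i.e. C = f(A, B)

[Figure: nodes A and B feed the function f, whose output is node C]

• Construct a causal structure;

• Assign parameters (proper functions) to each node with causal parents;

• Assign a probability distribution;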

Page 5: Simulation and Application on learning gene causal relationships Xin Zhang

Constructing Boolean Network

1. Generate M BNs with up to 3 causal parents for each node;

2. For each BN, generate a random proper function for each node;

3. Assign random probabilities for the root gene(s);

4. Given one configuration, get probability distribution;

5. Collect 200 data points for each network;

6. Repeat above steps 3-5 for all M networks.
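The construction steps above can be sketched roughly as follows. This is a minimal illustration, not the author's implementation: the function names (`depends_on_all`, `random_boolean_network`, `sample_dataset`), the use of gene indices as a fixed ordering to keep the graph acyclic, and forward sampling in place of an explicit joint probability distribution (step 4) are all assumptions made for the example. "Proper" is taken here to mean that every parent affects the output for at least one input row.

```python
import random
from itertools import product

def depends_on_all(table, k):
    """True if flipping any single parent changes the output for some row."""
    for pos in range(k):
        if all(table[b] == table[b[:pos] + (1 - b[pos],) + b[pos + 1:]] for b in table):
            return False  # the parent at `pos` never matters: not proper
    return True

def random_boolean_network(n_genes, max_parents=3, rng=random):
    """Steps 1-2: random parents (earlier genes only, so the graph is a DAG)
    and a random truth table per gene, redrawn until it is proper."""
    parents, tables = {}, {}
    for g in range(n_genes):
        k = rng.randint(0, min(max_parents, g))
        parents[g] = sorted(rng.sample(range(g), k))
        while True:
            table = {bits: rng.randint(0, 1) for bits in product((0, 1), repeat=k)}
            if depends_on_all(table, k):
                break
        tables[g] = table
    return parents, tables

def sample_dataset(parents, tables, n_samples=200, rng=random):
    """Steps 3-5: random probabilities for root genes, then forward sampling."""
    root_prob = {g: rng.random() for g in parents if not parents[g]}
    data = []
    for _ in range(n_samples):
        row = []
        for g in range(len(parents)):  # parents always have smaller indices
            if g in root_prob:
                row.append(int(rng.random() < root_prob[g]))
            else:
                row.append(tables[g][tuple(row[p] for p in parents[g])])
        data.append(row)
    return data

# Step 6 with a hypothetical M = 3 networks of 5 genes each.
rng = random.Random(0)
networks = [random_boolean_network(5, rng=rng) for _ in range(3)]
datasets = [sample_dataset(p, t, rng=rng) for p, t in networks]
```

Each dataset is a list of 200 rows, one 0/1 value per gene; non-root genes are deterministic functions of their parents, so all randomness enters through the root genes.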

Page 6: Simulation and Application on learning gene causal relationships Xin Zhang

Constructing Causal Structure

[Figure: example causal structure over nodes A, B, C, D and E]

Page 7: Simulation and Application on learning gene causal relationships Xin Zhang

Steps for constructing causal structure

Page 8: Simulation and Application on learning gene causal relationships Xin Zhang

Proper function (1)

Proper function: a function that reflects the influence of every one of its operands (predictors).

Example:

Simplifying f gives c = a, so c is a function of a alone.

b is a pseudo-predictor of c and has no effect on c.

Therefore f is not a proper function.
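A small check in the spirit of this example, as a sketch only. The function `f` below is a hypothetical stand-in (the slide's own formula is not reproduced here), chosen so that it simplifies to c = a; `essential_inputs` and `is_proper` are illustrative names. An input counts as essential if flipping it changes the output for at least one assignment of the other inputs.

```python
from itertools import product

def essential_inputs(f, n):
    """Return the positions of inputs that can change the output of f."""
    essential = []
    for pos in range(n):
        for bits in product((False, True), repeat=n):
            flipped = bits[:pos] + (not bits[pos],) + bits[pos + 1:]
            if f(*bits) != f(*flipped):
                essential.append(pos)
                break
    return essential

def is_proper(f, n):
    """A function is treated as proper if every input is essential."""
    return len(essential_inputs(f, n)) == n

# Hypothetical example: (a AND b) OR (a AND NOT b) simplifies to a.
f = lambda a, b: (a and b) or (a and not b)
g = lambda a, b: a and b

print(essential_inputs(f, 2), is_proper(f, 2))  # b (position 1) is a pseudo-predictor
print(essential_inputs(g, 2), is_proper(g, 2))
```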

Page 9: Simulation and Application on learning gene causal relationships Xin Zhang

Proper function (2)

• Definition:

With n predictors, the number of proper functions is given by:
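Assuming that a proper function is one that depends on every one of its n predictors, the count can be checked by enumerating all truth tables. The sketch below compares that enumeration with the standard inclusion-exclusion expression, sum over k of (-1)^(n-k) * C(n, k) * 2^(2^k); whether this is the same expression as on the original slide is an assumption. The function names are illustrative.

```python
from itertools import product
from math import comb

def count_proper_bruteforce(n):
    """Count truth tables over n inputs in which every input matters."""
    rows = list(product((0, 1), repeat=n))
    count = 0
    for outputs in product((0, 1), repeat=len(rows)):
        table = dict(zip(rows, outputs))
        # Input p matters if flipping it changes the output for some row.
        if all(
            any(table[r] != table[r[:p] + (1 - r[p],) + r[p + 1:]] for r in rows)
            for p in range(n)
        ):
            count += 1
    return count

def count_proper_formula(n):
    """Inclusion-exclusion over the subsets of inputs the function may use."""
    return sum((-1) ** (n - k) * comb(n, k) * 2 ** (2 ** k) for k in range(n + 1))

for n in (1, 2, 3):
    print(n, count_proper_bruteforce(n), count_proper_formula(n))
```

Under this assumption the counts for n = 1, 2, 3 are 2, 10 and 218, out of 4, 16 and 256 Boolean functions respectively.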

Page 10: Simulation and Application on learning gene causal relationships Xin Zhang

Probability Distribution

Page 11: Simulation and Application on learning gene causal relationships Xin Zhang

Generating dataset

Page 12: Simulation and Application on learning gene causal relationships Xin Zhang

Steps of learning gene causal relationships

• Step 1: obtain the probability distribution and sample data;

• Step 2: apply the algorithms to find causal relations;

• Step 3: compare the original and obtained networks based on precision and recall;

• Step 4: repeat steps 1-3 for every random network;

Page 13: Simulation and Application on learning gene causal relationships Xin Zhang

Comparing two networks

[Figure: the original network and the obtained network, each over nodes A, B, C and D]

Page 14: Simulation and Application on learning gene causal relationships Xin Zhang

Precision and Recall

• Original graph is a DAG, while obtained graph has both directed and undirected edges;

[Figure: the edges of the original graph and of the obtained graph, partitioned into FN, TP, TN and FP; the diagram also marks PFN, PTP, PTN and PFP]

Recall = ATP/(AFN+ATP), Precision = ATP/(ATP + AFP)
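The two ratios can be computed from edge sets as in the following sketch. It is a simplification under stated assumptions: only directed edges are compared, each written as a hypothetical `(parent, child)` tuple, and undirected edges in the obtained graph (the P-prefixed counts on the slide) are not scored. The example networks are invented for illustration.

```python
def precision_recall(original_edges, obtained_edges):
    """Precision = TP / (TP + FP), Recall = TP / (TP + FN) over directed edges."""
    original_edges, obtained_edges = set(original_edges), set(obtained_edges)
    tp = len(original_edges & obtained_edges)   # edges found and correct
    fp = len(obtained_edges - original_edges)   # edges found but not in the original
    fn = len(original_edges - obtained_edges)   # original edges that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical example: two of three obtained edges are correct,
# and one of three original edges is missed.
original = {("A", "C"), ("B", "C"), ("C", "D")}
obtained = {("A", "C"), ("C", "D"), ("D", "B")}
precision, recall = precision_recall(original, obtained)
```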

Page 15: Simulation and Application on learning gene causal relationships Xin Zhang

Observational equivalence and Transitive Closure

• Two DAGs are said to be observationally equivalent (OE) if they have the same skeleton and the same set of v-structures;

[Figure: two observationally equivalent (OE) DAGs over nodes A, B, C and D]
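The observational-equivalence criterion (same skeleton, same v-structures) can be tested directly, as in this sketch. Edges are hypothetical `(parent, child)` tuples and the function names are illustrative; a v-structure is taken to be a pair of nonadjacent parents sharing a child.

```python
from itertools import combinations

def skeleton(edges):
    """Undirected version of the graph: each edge as an unordered pair."""
    return {frozenset(e) for e in edges}

def v_structures(edges):
    """All triples a -> child <- b where a and b are not adjacent."""
    skel = skeleton(edges)
    parents = {}
    for x, y in edges:
        parents.setdefault(y, set()).add(x)
    found = set()
    for child, ps in parents.items():
        for a, b in combinations(sorted(ps), 2):
            if frozenset((a, b)) not in skel:
                found.add((a, child, b))
    return found

def observationally_equivalent(edges1, edges2):
    return (skeleton(edges1) == skeleton(edges2)
            and v_structures(edges1) == v_structures(edges2))

chain = {("A", "B"), ("B", "C")}       # A -> B -> C
reversed_chain = {("B", "A"), ("C", "B")}  # A <- B <- C
collider = {("A", "B"), ("C", "B")}    # A -> B <- C
```

Here the chain and the reversed chain are equivalent (same skeleton, no v-structures), while the collider shares their skeleton but adds a v-structure at B.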

Transitive closure (TC): if A -> B -> C, then the closure also contains A -> C

cc(x,y): true if there is a directed or an undirected edge from x to y;

pcc(x,y): true if there is a path from x to y consisting of properly directed and undirected edges

pcc(x,y) := cc(x,y) | (pcc(x,z) ∧ pcc(z,y)) for some z
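The recursive definition of pcc amounts to a transitive closure of cc, which can be sketched with Warshall's algorithm. In this illustration (names and example graph are assumptions) a directed edge can be followed only in its own direction, while an undirected edge can be followed either way.

```python
def pcc_closure(nodes, directed, undirected):
    """Return reach[x] = set of all y for which pcc(x, y) holds."""
    reach = {x: set() for x in nodes}
    for x, y in directed:        # cc via a directed edge x -> y
        reach[x].add(y)
    for x, y in undirected:      # cc via an undirected edge x - y, both ways
        reach[x].add(y)
        reach[y].add(x)
    # Warshall: allow each node z in turn as an intermediate step.
    for z in nodes:
        for x in nodes:
            if z in reach[x]:
                reach[x] |= reach[z]
    return reach

# Hypothetical partially directed graph: A -> B, B - C, D isolated.
reach = pcc_closure(["A", "B", "C", "D"], {("A", "B")}, {("B", "C")})
```

With this graph, pcc(A, C) holds through B, but pcc(C, A) does not, because the edge A -> B cannot be traversed backwards.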

Page 16: Simulation and Application on learning gene causal relationships Xin Zhang

Result for IC algorithm

Page 17: Simulation and Application on learning gene causal relationships Xin Zhang

How to improve IC algorithm

• The original IC algorithm did not perform well at learning gene causal relationships;

• A possible way to improve its performance is to incorporate extra information;

• If we know the topological ordering of the regulatory network, that knowledge can help improve the learning result;

Page 18: Simulation and Application on learning gene causal relationships Xin Zhang

Gene topological ordering

• Ordering information is available if a specific gene is known to be the causal parent of another gene;

• if one gene appears before another gene in a pathway;

• if one gene is known to be at the beginning or at the end of the pathway;

IC algorithm + topological ordering information

Page 19: Simulation and Application on learning gene causal relationships Xin Zhang

mIC algorithm

• The mIC algorithm is based on IC, but combines topological ordering information with steady-state data to infer causality;

• 3 steps of the mIC algorithm:

– Find conditional independence:

For each pair of genes gi and gj in a dataset, test pairwise conditional independence. If they are dependent, search for a set

Sij = {gk | gi and gj are independent given gk, with i<k<j or j<k<i}.

Construct an undirected graph G such that gi and gj are connected by an edge if and only if they are pairwise dependent and no Sij can be found;

– Find v-structures:

For each pair of nonadjacent genes gi and gj with a common neighbor gk, if gk ∉ Sij, k>i and k>j, add arrowheads pointing at gk, giving gi -> gk <- gj;

– Orient more edges according to rules:

Orient the remaining undirected edges without creating new cycles or v-structures;
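The first two steps above can be sketched as follows. This is an illustration under several assumptions, not the author's code: genes are numbered 0..n-1 in their topological order, the independence test is supplied by the caller as a hypothetical callable `indep(i, j, cond)` (where `cond` is an empty tuple or a single gene), a pair that is already marginally independent gets an empty Sij and no edge, and the third step (rule-based orientation) is left out.

```python
from itertools import combinations

def mic_skeleton_and_colliders(n_genes, indep):
    """Return (undirected_edges, directed_edges) after steps 1 and 2."""
    adjacent = {g: set() for g in range(n_genes)}
    sepset = {}
    # Step 1: keep an edge only for dependent pairs with no separating gene.
    for i, j in combinations(range(n_genes), 2):
        if indep(i, j, ()):
            sepset[(i, j)] = set()
            continue
        # Candidate separators lie strictly between i and j in the ordering.
        s = {k for k in range(i + 1, j) if indep(i, j, (k,))}
        if s:
            sepset[(i, j)] = s
        else:
            adjacent[i].add(j)
            adjacent[j].add(i)
    # Step 2: orient gi -> gk <- gj for nonadjacent gi, gj with common neighbor gk.
    directed = set()
    for i, j in combinations(range(n_genes), 2):
        if j in adjacent[i]:
            continue
        for k in adjacent[i] & adjacent[j]:
            if k not in sepset[(i, j)] and k > i and k > j:
                directed.add((i, k))
                directed.add((j, k))
    undirected = {(i, j) for i in adjacent for j in adjacent[i]
                  if i < j and (i, j) not in directed and (j, i) not in directed}
    return undirected, directed

# Hypothetical oracles standing in for statistical tests on data.
chain_oracle = lambda i, j, cond: {i, j} == {0, 2} and cond == (1,)   # 0 -> 1 -> 2
collider_oracle = lambda i, j, cond: {i, j} == {0, 1} and cond == ()  # 0 -> 2 <- 1

chain_result = mic_skeleton_and_colliders(3, chain_oracle)
collider_result = mic_skeleton_and_colliders(3, collider_oracle)
```

In practice `indep` would wrap a statistical test on the sampled data; passing it in as a callable keeps the graph logic separate from the choice of test.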

Page 20: Simulation and Application on learning gene causal relationships Xin Zhang

Results from mIC algorithm

Page 21: Simulation and Application on learning gene causal relationships Xin Zhang

Melanoma dataset

• The 10 genes in this study were chosen from the 587 genes in the melanoma data;

• Previous studies identified WNT5A as a gene of interest involved in melanoma;

• Controlling the influence of WNT5A in the regulation can reduce the chance of melanoma metastasizing;

Page 22: Simulation and Application on learning gene causal relationships Xin Zhang

Applying mIC algorithm on Melanoma Dataset

[Figure: network diagram; node label WNT5A]

Partial biological prior knowledge: MMP3 is expected to be at the end of the pathway.

Pirin causally influences WNT5A: to maintain the level of WNT5A, we need to control WNT5A directly or through pirin.

WNT5A directly causes MART-1.

Page 23: Simulation and Application on learning gene causal relationships Xin Zhang

Conclusion

• We evaluated the IC algorithm using simulated data;

• We presented the mIC algorithm, which can infer gene causal relationships from steady-state data with gene topological ordering information;

• We performed simulations based on Boolean networks to evaluate the performance of the causal algorithms;

• We applied the mIC algorithm to real biological microarray data, the Melanoma dataset;

• The results showed that some of the important causal relationships associated with the WNT5A gene were identified using the mIC algorithm.