Reconstructing Gene Networks Presented by Andrew Darling Based on article “Research Towards Reconstruction of Gene Networks from Expression Data by Supervised

Reconstructing Gene Networks

Presented by Andrew Darling Based on article

“Research Towards Reconstruction of Gene Networks from Expression Data by Supervised Learning”

- Soinov, Krestyaninova, Brazma

Outline

Why study another microarray algorithm?

Background information

Methods

Results

Discussion

Conclusions

Why study another microarray algorithm?

The study of microarray data continues

It is still unclear what the data means

It is still unclear how the genome works

Confirm existing knowledge about gene networks using existing datasets

Proof of concept in a new algorithm using existing knowledge and datasets

This algorithm actually explains its reasoning

Background information

What is a gene network?

What is supervised learning?

What are decision trees / classifiers?

Why use classifiers?

What is a gene network?

A model of genes affecting other genes: which other genes affect a given gene, and how they affect it

The effects can be positive, negative, or more complicated

Several model types, usually drawn as graphs of nodes and edges:

Boolean networks (genes are on or off)

Bayesian networks (conditional probabilities)

Differential equations (derivatives, integrals)
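To make the Boolean model type concrete, here is a minimal sketch of a synchronous Boolean network. The three genes A, B and C and their update rules are hypothetical and not taken from the paper; each gene is either on or off, and every gene is updated at once from the current state.

```python
from typing import Callable, Dict, List

State = Dict[str, bool]

# Hypothetical rules: each gene's next state is a Boolean function of the current state.
RULES: Dict[str, Callable[[State], bool]] = {
    "A": lambda s: not s["C"],         # C represses A
    "B": lambda s: s["A"],             # A activates B
    "C": lambda s: s["A"] and s["B"],  # A and B together activate C
}

def step(state: State) -> State:
    """Update every gene simultaneously from the current state."""
    return {gene: rule(state) for gene, rule in RULES.items()}

def simulate(state: State, n_steps: int) -> List[State]:
    """Return the trajectory of states, including the starting state."""
    trajectory = [state]
    for _ in range(n_steps):
        state = step(state)
        trajectory.append(state)
    return trajectory

if __name__ == "__main__":
    for t, s in enumerate(simulate({"A": True, "B": False, "C": False}, 6)):
        print(t, {g: int(v) for g, v in s.items()})
```

With these rules the network cycles back to its starting state after five steps, which is the kind of oscillation a cell-cycle network might show.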

Gene network - example

What is supervised learning?

The paper was unclear on the subject; perhaps the term refers to the type of algorithm used

It may have involved human interaction with the software

Possibly, the software produced the classifiers in the form of decision trees, and users then interpreted the output as classification rules
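In the standard sense of the term, supervised learning fits a model to examples whose correct answers (labels) are already known, then uses that model to predict labels for new examples. The sketch below assumes, for illustration only, that regulator and target states are already discretized to "up"/"down". The majority-vote lookup table is a deliberately simple stand-in, not the C4.5 learner the paper used; the function names and the default label are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Pattern = Tuple[str, ...]        # discretized states of the regulator genes
Example = Tuple[Pattern, str]    # (regulator pattern, known target label)

def train(examples: List[Example]) -> Dict[Pattern, str]:
    """Learn, for each regulator pattern seen in training, the most frequent target label."""
    votes: Dict[Pattern, Counter] = defaultdict(Counter)
    for pattern, label in examples:
        votes[pattern][label] += 1
    return {pattern: counts.most_common(1)[0][0] for pattern, counts in votes.items()}

def predict(model: Dict[Pattern, str], pattern: Pattern, default: str = "down") -> str:
    """Predict the target label; fall back to a hypothetical default for unseen patterns."""
    return model.get(pattern, default)

# Hypothetical labelled training data: (regulator 1 state, regulator 2 state) -> target state.
training = [
    (("up", "down"), "up"),
    (("up", "down"), "up"),
    (("up", "down"), "down"),   # noisy example, outvoted 2 to 1
    (("down", "up"), "down"),
    (("down", "down"), "down"),
]
model = train(training)
print(predict(model, ("up", "down")))  # up
print(predict(model, ("up", "up")))    # down (unseen pattern, default used)
```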

What are decision trees / classifiers?

A decision tree is a directed acyclic graph with a tree shape

Each tree explains which other genes affect a specific target gene

Inner nodes are the gene products of other genes

Edges are concentration thresholds on those gene products; these are the rules of the tree

Leaf nodes are the effects on transcription of the target gene

Each tree is a classifier for one specific gene
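The tree structure described above can be sketched as follows, assuming binary splits of the form "level <= threshold". The regulator names reg1 and reg2 and the threshold values are hypothetical. The `rules` function walks every root-to-leaf path, which is one way a tree can be read off as classification rules.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, Tuple, Union

@dataclass
class Leaf:
    label: str          # predicted transcription state of the target gene

@dataclass
class Node:
    gene: str           # regulator whose product level is tested at this inner node
    threshold: float    # concentration threshold carried by the two outgoing edges
    low: "Tree"         # subtree followed when level <= threshold
    high: "Tree"        # subtree followed when level > threshold

Tree = Union[Leaf, Node]

def classify(tree: Tree, sample: Dict[str, float]) -> str:
    """Follow edges from the root until a leaf is reached."""
    while isinstance(tree, Node):
        tree = tree.low if sample[tree.gene] <= tree.threshold else tree.high
    return tree.label

def rules(tree: Tree, path: Tuple[str, ...] = ()) -> Iterator[Tuple[str, str]]:
    """Yield one (condition, label) rule per root-to-leaf path."""
    if isinstance(tree, Leaf):
        yield " AND ".join(path) or "TRUE", tree.label
        return
    yield from rules(tree.low, path + (f"{tree.gene} <= {tree.threshold}",))
    yield from rules(tree.high, path + (f"{tree.gene} > {tree.threshold}",))

# Hypothetical classifier for one target gene.
tree = Node("reg1", 0.5, Leaf("down"), Node("reg2", -0.2, Leaf("up"), Leaf("down")))
print(classify(tree, {"reg1": 1.0, "reg2": -0.5}))  # up
for condition, label in rules(tree):
    print(f"IF {condition} THEN target is {label}")
```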

Classifiers – model of gene networks

The expression of a gene is a function of its transcription

The transcription of a gene is modeled in discrete states: expressed more than average, or expressed less than average

The transcription state is affected by the amounts of other gene products (the expression of other genes)

Yeast cell cycle data are used to test the algorithm, and previous knowledge is used to judge its accuracy
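A minimal sketch of the two-state discretization, assuming "average" means the mean of each gene's own expression profile across time points and that a value exactly at the mean counts as "down". The paper's exact cut-off may differ, and the function names are hypothetical.

```python
from statistics import mean
from typing import Dict, List

def discretize(profile: List[float]) -> List[str]:
    """Label each time point 'up' if above the gene's mean expression, else 'down'."""
    avg = mean(profile)
    return ["up" if x > avg else "down" for x in profile]

def discretize_all(expr: Dict[str, List[float]]) -> Dict[str, List[str]]:
    """Apply the two-state discretization gene by gene."""
    return {gene: discretize(profile) for gene, profile in expr.items()}

# Hypothetical log-ratio profile over four time points (its mean is 0.0).
print(discretize([0.5, -1.5, 2.0, -1.0]))  # ['up', 'down', 'up', 'down']
```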

Why use classifiers?

The gene products affecting a specific gene are listed explicitly in the tree

Concentrations can take continuous values

Each additional dataset refines the decision information

Decision trees are easy to read and interpret
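Continuous concentrations can be handled by choosing a threshold for each regulator. This sketch uses the information-gain criterion of ID3/C4.5-style learners and tries the midpoints between consecutive sorted values. C4.5 additionally normalizes by split information (the gain ratio) when comparing attributes, and its exact threshold convention may differ, so treat this as an illustration rather than C4.5 itself. The data are hypothetical.

```python
import math
from collections import Counter
from typing import List, Optional, Tuple

def entropy(labels: List[str]) -> float:
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_threshold(values: List[float], labels: List[str]) -> Tuple[Optional[float], float]:
    """Return (threshold, information gain) of the best split 'value <= t'; (None, 0.0) if none helps."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    base = entropy(labels)
    best: Tuple[Optional[float], float] = (None, 0.0)
    for (lo, _), (hi, _) in zip(pairs, pairs[1:]):
        if lo == hi:
            continue  # no threshold separates identical values
        t = (lo + hi) / 2
        left = [lab for v, lab in pairs if v <= t]
        right = [lab for v, lab in pairs if v > t]
        gain = base - len(left) / n * entropy(left) - len(right) / n * entropy(right)
        if gain > best[1]:
            best = (t, gain)
    return best

# Hypothetical regulator concentrations and the target gene's state in the same samples.
conc = [4.0, 1.0, 5.0, 2.0]
state = ["up", "down", "up", "down"]
print(best_threshold(conc, state))  # (3.0, 1.0)
```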

Classifier - example

Methods

Use an induction algorithm, the program C4.5, to generate decision trees

Apply the program in three ways:

Simultaneous: regulation of the target gene as a function of the other genes at the same time point

Time delay: regulation of the target gene as a function of the other genes at previous time points

Changes: regulation as a function of the changes in the other genes
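The three ways of applying the program amount to three ways of building training examples from a time series. The sketch below is an assumption about how such examples could be assembled, not the paper's code: `expr` maps hypothetical gene names to expression values per time point, `target_states` holds the target gene's discretized state at each time point, and in all three modes the label is the target's state at time t (the slides do not say whether the "changes" mode labels examples by the target's change instead).

```python
from typing import Dict, List, Tuple

Example = Tuple[List[float], str]  # (regulator features, target label)

def build_examples(expr: Dict[str, List[float]], target_states: List[str],
                   target: str, mode: str) -> Tuple[List[str], List[Example]]:
    """Build training examples for one target gene in one of three modes."""
    regulators = [g for g in expr if g != target]
    examples: List[Example] = []
    for t in range(len(target_states)):
        if mode == "simultaneous":
            # Other genes at the same time point as the target.
            feats = [expr[g][t] for g in regulators]
        elif mode == "time_delay":
            if t == 0:
                continue  # no earlier time point to use
            feats = [expr[g][t - 1] for g in regulators]
        elif mode == "changes":
            if t == 0:
                continue
            # Change in each other gene between consecutive time points.
            feats = [expr[g][t] - expr[g][t - 1] for g in regulators]
        else:
            raise ValueError(f"unknown mode: {mode}")
        examples.append((feats, target_states[t]))
    return regulators, examples

# Hypothetical three-time-point series; T is the target gene.
expr = {"T": [0.2, 1.1, -0.3], "R1": [1.0, 3.0, 2.0], "R2": [0.0, 0.5, 1.5]}
states = ["down", "up", "down"]
for mode in ("simultaneous", "time_delay", "changes"):
    print(mode, build_examples(expr, states, "T", mode)[1])
```

Each resulting list of examples could then be handed to a tree learner to produce one classifier per target gene and mode.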

Results - given

These genes and yeast datasets:

Spellman, Cho, …

Cdc28, Alpha-factor

Results – produced this

Results – with this accuracy

Discussion

There is some concern about the accuracy, which ranged between 70% and 94% on systems with known interactions

Does that imply that the microarray data are wrong, or that the algorithm is flawed?

Conclusions

Decision trees and classifiers seem to be a better way to explain gene expression

This paper did not do a good job of explaining how to build and use them

Reference to the algorithm itself was almost specious
