Reconstructing Gene Networks
Presented by Andrew Darling
Based on the article “Towards Reconstruction of Gene Networks from Expression Data by Supervised Learning” by Soinov, Krestyaninova, and Brazma
Outline
Why study another microarray algorithm?
Background info
Methods
Results
Discussion
Conclusion
Why study another microarray algorithm?
Study of microarray data continues
  Still unclear on what the data means
  Still unclear on how the genome works
Confirm existing knowledge about gene networks using existing datasets
Proof of concept in a new algorithm using existing knowledge and datasets
This algorithm actually explains its reasoning
Background information
What is a gene network?
What is supervised learning?
What are decision trees / classifiers?
Why use classifiers?
What is a gene network?
A model of genes affecting other genes
  What other genes affect a given gene
  How other genes affect a given gene
    Positive, negative, complicated
Several model types – graphs, nodes, edges
  Boolean (on/off)
  Bayesian network (conditional probability)
  Differential equations (derivatives, integrals)
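As a rough illustration of the simplest model type listed above, the Boolean one, here is a minimal sketch in which each gene is either on or off and its next state depends on the current states of the other genes. The gene names (geneA, geneB, geneC) and the rules are hypothetical and are not taken from the paper.

```python
# Hypothetical three-gene Boolean network; the gene names and rules are
# invented for illustration and are not taken from the paper.
RULES = {
    "geneA": lambda s: not s["geneC"],             # C represses A (negative effect)
    "geneB": lambda s: s["geneA"],                 # A activates B (positive effect)
    "geneC": lambda s: s["geneA"] and s["geneB"],  # C needs both A and B (combined effect)
}

def step(state):
    """Compute the next on/off state of every gene from the current state."""
    return {gene: bool(rule(state)) for gene, rule in RULES.items()}

def simulate(state, n_steps):
    """Return the list of states visited, starting with the initial state."""
    trajectory = [dict(state)]
    for _ in range(n_steps):
        state = step(state)
        trajectory.append(state)
    return trajectory
```

Calling simulate with an initial state and a step count returns the sequence of on/off patterns, which is one way to see positive and negative effects play out over time.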
Gene network - example
What is supervised learning?
The paper was unclear on the subject
Perhaps a reference to the type of algorithm used
It may have involved human interaction with the software
Possibly, the software produced the classifiers in the form of a decision tree, then users interpreted the output into classification rules
What are decision trees / classifiers?
Acyclic directed graph - tree
Each graph explains what other genes affect a specific gene
  Inner nodes are gene products of other genes
  Edges are thresholds of concentration of the gene products of the other genes – rules of the tree
  Leaf nodes are effects on transcription of the specific gene
Each graph is a classifier for a specific gene
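The structure described on this slide can be sketched as a small nested dictionary: inner nodes name a regulator gene and a concentration threshold, and leaves give the predicted state of the target gene. This is only an illustrative sketch; the gene names (regulator1, regulator2), thresholds, and leaf labels are hypothetical and are not results from the paper.

```python
# Hypothetical classifier for one target gene. Gene names and thresholds are
# invented for illustration; they are not results from the paper.
TREE = {
    "gene": "regulator1", "threshold": 0.5,
    "low": "down",                       # leaf: target expressed less than average
    "high": {
        "gene": "regulator2", "threshold": -0.2,
        "low": "up",                     # leaf: target expressed more than average
        "high": "down",
    },
}

def classify(tree, expression):
    """Walk from the root to a leaf, following the threshold rule at each inner node."""
    node = tree
    while isinstance(node, dict):
        branch = "high" if expression[node["gene"]] > node["threshold"] else "low"
        node = node[branch]
    return node

def rules(tree, path=()):
    """List every root-to-leaf path as a readable classification rule."""
    if not isinstance(tree, dict):
        return [" and ".join(path) + " => target " + tree]
    gene, t = tree["gene"], tree["threshold"]
    return (rules(tree["low"], path + (f"{gene} <= {t}",))
            + rules(tree["high"], path + (f"{gene} > {t}",)))
```

The rules helper shows how a tree can be read off as classification rules, one per leaf.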
Classifiers – model of gene networks
Expression of gene is function of transcription
Transcription of gene is in discrete states
  Expressed more than average
  Expressed less than average
Transcription state affected by amount of other gene products (expression of other genes)
Use yeast cell cycle data to test algorithm and previous knowledge to judge accuracy
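The two discrete states on this slide (expressed more or less than average) can be illustrated with a short sketch that compares each measurement with the gene's own mean. The function name discretize, the labels "up" and "down", and the sample values are assumptions made for illustration, not data from the yeast sets.

```python
from statistics import mean

def discretize(series):
    """Label each measurement 'up' if above the gene's own average, else 'down'."""
    avg = mean(series)
    return ["up" if value > avg else "down" for value in series]

# Hypothetical log-ratio expression values for one gene over six time points.
target = [-0.8, -0.1, 0.6, 1.1, 0.2, -0.4]
states = discretize(target)
```

The resulting list of states is what a classifier for the target gene would try to predict from the expression of other genes.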
Why use classifiers?
The products affecting a specific gene are listed in the tree
Allows for continuous values for concentrations
Each additional dataset refines the decision information
Decision trees are easy to read and interpret
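To show how a tree can work with continuous concentration values, here is a simplified sketch of choosing a threshold on one regulator by information gain. It is an assumption-laden illustration: it uses plain information gain and midpoints between observed values as candidates, whereas C4.5 itself ranks splits by gain ratio, and the function names are hypothetical.

```python
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((labels.count(c) / total) * log2(labels.count(c) / total)
                for c in set(labels))

def best_threshold(values, labels):
    """Try midpoints between sorted distinct values; return (threshold, gain)."""
    base = entropy(labels)
    distinct = sorted(set(values))
    best = (None, 0.0)
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2
        # Split the target labels by whether the regulator is at or below t.
        left = [l for v, l in zip(values, labels) if v <= t]
        right = [l for v, l in zip(values, labels) if v > t]
        remainder = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
        if base - remainder > best[1]:
            best = (t, base - remainder)
    return best
```

A threshold that separates the "up" and "down" states of the target gene cleanly gets the highest gain and would become an edge rule in the tree.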
Classifier - example
Methods
Use induction algorithm to generate decision trees
  Program called C4.5
Apply program three ways
  Regulation of target gene as a function of other genes at same time (simultaneous)
  Regulation of target gene as a function of other genes at previous times (time delay)
  Regulation as a function of change of other genes (changes)
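The three ways of applying the program differ mainly in how each training example pairs regulator measurements with the target gene's state. The sketch below shows one plausible reading of the slide; the function names, the one-step lag, and the exact pairing used for the "changes" variant are assumptions rather than details confirmed by the paper.

```python
def simultaneous(regulators, target_states):
    """Pair regulator levels at time t with the target state at time t."""
    return [(regulators[t], target_states[t]) for t in range(len(target_states))]

def time_delay(regulators, target_states, lag=1):
    """Pair regulator levels at time t - lag with the target state at time t."""
    return [(regulators[t - lag], target_states[t])
            for t in range(lag, len(target_states))]

def changes(regulators, target_states):
    """Pair the change in each regulator from t-1 to t with the target state at t."""
    rows = []
    for t in range(1, len(target_states)):
        delta = {g: regulators[t][g] - regulators[t - 1][g] for g in regulators[t]}
        rows.append((delta, target_states[t]))
    return rows
```

Here regulators is a list with one dictionary of expression values per time point, and target_states is the matching list of discrete states for the target gene. Each function returns (features, label) pairs that a tree-induction program could be trained on.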
Results - given
These genes and yeast datasets
Spellman, Cho, … Cdc28 Alpha-factor
Results – produced this
Results – with this accuracy
Discussion
Some concern about the accuracy, which was between 70% and 94% on systems with known interactions
Does that imply that the microarray data is wrong or the algorithm is flawed?
Conclusions
Decision trees and classifiers seem a better way to explain gene expression
This paper did not do a good job of explaining how to make / use them
Reference to the algorithm itself was almost specious