Using Bayesian Networks to Analyze Expression Data
N. Friedman, M. Linial, I. Nachman, D. Pe’er @ Hebrew University
What I will cover
• Domain background
• Overview of their work
• Causal networks vs. Bayes networks
• Application
• Results
• What is gene expression?
  – The process by which information from a gene is used in the synthesis of a functional gene product (protein or RNA).
• Think of it as a menu for a holiday dinner.
  – Certain ingredients are needed to pull it off right.
  – Too much or too little of something can lead to odd results.
• Advances in technology led to DNA microarrays.
  – A snapshot of the internals of a cell at a given moment in time.
  – No more having to look at one gene at a time for comparison.
• Most computational analysis has focused on clustering algorithms.
  – Cluster like genes with like genes.
  – Useful for finding co-regulated genes, but not for finding the structure of the regulation process.
Overview
• How can key relations in cellular systems be discovered from large amounts of microarray data?
• The authors propose a Bayesian network framework for discovering gene interactions from microarray data.
  – Tries to capture statistical dependencies between genes.
  – Aims to understand interactions from multiple expression measurements.
Overview
• Want to uncover properties of the network by examining dependence and conditional independence in the gene data.
  – For example, how one gene interacts with another.
  – This information can be used to suggest causal influence.
Bayesian Network
• Useful for a few reasons:
  – Well suited to describing locally interacting entities.
  – Well-understood set of algorithms and successful use in many areas.
  – Can be used to infer a causal network, even though they are not mathematically defined as such.
  – Able to handle noise fairly well.
Causal Network
• Very similar to a typical Bayesian network.
• A Bayesian network with the strict requirement that the relationships are causal.
  – An edge from X to Y means that X causes something about Y.
• Learning multiple networks that share the same directed path from X to Y may indicate a causal relationship between X and Y.
Bayes vs Causal
• A Bayesian network generally deals with dependence.
• A causal network deals with strict causal relationships.
• Bayesian networks can have equivalent networks.
  – X → Y is equivalent to X ← Y.
• Causal network
  – The above cannot hold, by the definition of causal networks.
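The equivalence of the two edge directions can be checked numerically. The sketch below uses a made-up joint distribution over two binary variables (the numbers are purely illustrative, not taken from the paper) and shows that both factorizations, P(X)P(Y|X) and P(Y)P(X|Y), reproduce the same joint. This is why observational data alone cannot tell the two Bayesian networks apart.

```python
import math

# Hypothetical joint distribution P(X, Y) over two binary variables.
# Keys are (x, y); the probabilities are illustrative only.
joint = {(0, 0): 0.30, (0, 1): 0.10, (1, 0): 0.20, (1, 1): 0.40}

# Marginals P(X) and P(Y).
p_x = {x: sum(p for (xi, _), p in joint.items() if xi == x) for x in (0, 1)}
p_y = {y: sum(p for (_, yi), p in joint.items() if yi == y) for y in (0, 1)}

# Conditionals used by each network structure.
p_y_given_x = {(x, y): joint[(x, y)] / p_x[x] for (x, y) in joint}
p_x_given_y = {(x, y): joint[(x, y)] / p_y[y] for (x, y) in joint}

# Network X -> Y factorizes as P(X) * P(Y | X).
joint_x_to_y = {(x, y): p_x[x] * p_y_given_x[(x, y)] for (x, y) in joint}
# Network Y -> X factorizes as P(Y) * P(X | Y).
joint_y_to_x = {(x, y): p_y[y] * p_x_given_y[(x, y)] for (x, y) in joint}

# Both structures encode exactly the same distribution.
same = all(math.isclose(joint_x_to_y[k], joint_y_to_x[k]) for k in joint)
print(same)
```

A causal reading breaks this symmetry: intervening on X changes Y only under X → Y, so the two graphs make different predictions once interventions are allowed.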
Learning Causal Patterns
• Need to determine a causal interpretation of the network.
• Observation
  – Passive measurement of the domain.
• Intervention
  – Setting variable values using outside forces.
Causal Markov assumption
• Given the values of a variable's immediate causes, it is independent of its earlier causes.
  – Once we know the state of a gene's parents, its more distant ancestors no longer matter for that gene.
Analyzing Expression Data
• Consider distributions over all possible states of the system (can include environmental states, etc.).
• The state of the system is described by a set of random variables.
  – Each random variable denotes the expression level of one gene.
• Take all of these variables and build the joint distribution.
• Learning from expression data is difficult because it involves transcript levels from thousands of genes.
• However, these gene networks are sparse, so Bayesian networks are still well suited.
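Sparsity matters because of how a Bayesian network factorizes the joint distribution. As a reminder of the standard definition (the notation Pa(X_i) for the parents of X_i in the graph G is an assumption here, since the slides do not fix a symbol):

```latex
P(X_1, \dots, X_n) = \prod_{i=1}^{n} P\bigl(X_i \mid \mathrm{Pa}^{G}(X_i)\bigr)
```

When each gene has only a few parents, every factor on the right depends on a small number of variables, so the model stays manageable even with thousands of genes.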
Learning the model
• Markov relations: a feature that indicates whether two genes are related in a joint biological process.
• Order relations: a feature that captures a global property of the network.
  – Used as an indication of some causality between X and Y, though it is not certain.
Confidence of features
• Produce m different networks G_1, ..., G_m and, for each feature of interest f, calculate its confidence:

conf(f) = \frac{1}{m} \sum_{i=1}^{m} f(G_i)

• where f(G) is 1 if f is a feature of G, and 0 otherwise.
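The confidence score defined above is the fraction of the m networks in which a feature holds. A minimal sketch follows, assuming each learned network is represented as a set of directed edges and a feature is any function returning True or False. Both the representation and the example networks are hypothetical.

```python
def confidence(feature, networks):
    """Return (1/m) * sum of f(G_i) over the m learned networks."""
    m = len(networks)
    return sum(1 for g in networks if feature(g)) / m

# Hypothetical learned networks, each a set of directed edges (parent, child).
networks = [
    {("A", "B"), ("B", "C")},
    {("A", "B")},
    {("B", "A"), ("B", "C")},
    {("A", "B"), ("C", "B")},
]

def has_edge_a_to_b(graph):
    """Feature: the directed edge A -> B is present in the graph."""
    return ("A", "B") in graph

conf_ab = confidence(has_edge_a_to_b, networks)
print(conf_ab)  # 3 of the 4 networks contain A -> B, so this prints 0.75
```

Markov and order relations would be written as other feature functions over the same graph representation.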
Learning the network structure
• Issues
  – Extremely large search space (super-exponential in the number of variables).
• Need to identify potential parents for each gene using simple statistics to build the network.
  – Reduces the search space to networks in which each variable Xi takes its parents only from its candidate set.
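One way to picture the candidate-parent step is sketched below. It assumes absolute Pearson correlation as the "simple statistic" and keeps the top k genes per target. The choice of statistic, the function names, and the toy data are all assumptions for illustration; the slides do not say which measure the authors use.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)

def candidate_parents(expression, k):
    """For each gene, keep the k other genes with the highest |correlation|."""
    candidates = {}
    for gene, levels in expression.items():
        scored = [
            (abs(pearson(levels, other_levels)), other)
            for other, other_levels in expression.items()
            if other != gene
        ]
        scored.sort(reverse=True)
        candidates[gene] = [other for _, other in scored[:k]]
    return candidates

# Toy expression data: gene -> levels across four experiments (made up).
expression = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.1, 3.9, 6.2, 8.0],  # tracks g1 closely
    "g3": [5.0, 1.0, 4.0, 2.0],  # only weakly related to the others
}
candidates = candidate_parents(expression, k=1)
print(candidates)
```

The structure search would then consider only graphs whose edges into each gene come from its candidate list, which shrinks the search space dramatically when k is small.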
Different local probability models
• Multinomial model
  – Treat each variable as discrete and learn a multinomial distribution describing the possible states of each child given the state of its parents.
• Linear Gaussian model
  – Linear regression model for the child given its parents.
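For the multinomial model, the conditional distribution of a child given its parents can be estimated by simple counting. The sketch below is a maximum-likelihood estimate on made-up samples. The three discrete levels (-1, 0, 1 for under-expressed, normal, over-expressed) and the function name are assumptions for illustration.

```python
from collections import Counter, defaultdict

def estimate_cpt(samples, child, parents):
    """Estimate P(child | parents) by counting co-occurrences in the samples."""
    counts = defaultdict(Counter)
    for sample in samples:
        parent_state = tuple(sample[p] for p in parents)
        counts[parent_state][sample[child]] += 1
    cpt = {}
    for parent_state, child_counts in counts.items():
        total = sum(child_counts.values())
        cpt[parent_state] = {
            value: count / total for value, count in child_counts.items()
        }
    return cpt

# Made-up discretized measurements for a parent gene X and a child gene Y.
samples = [
    {"X": 1, "Y": 1},
    {"X": 1, "Y": 1},
    {"X": 1, "Y": 0},
    {"X": -1, "Y": -1},
    {"X": -1, "Y": -1},
    {"X": 0, "Y": 0},
]
cpt = estimate_cpt(samples, child="Y", parents=["X"])
print(cpt[(1,)])  # distribution of Y when X is over-expressed
```

The linear Gaussian alternative would instead fit a regression of the child's continuous expression level on its parents' levels, avoiding the discretization step.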
Results
• Applied to cell cycle expression patterns.
• 76 gene expression measurements.
• Treat each measurement as an independent sample.
• Performed the bootstrap algorithm along with the sparse search algorithm to extract learned features.
  – Performed on only 250 genes.
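The resampling half of the bootstrap step can be sketched as follows, assuming each measurement is one sample and that resampled data sets are drawn with replacement at the original size. The function name and the placeholder samples are hypothetical, and the structure learner that would be run on each resampled data set is not shown.

```python
import random

def bootstrap_datasets(samples, m, seed=0):
    """Draw m data sets, each sampled with replacement from the original samples."""
    rng = random.Random(seed)
    n = len(samples)
    return [[samples[rng.randrange(n)] for _ in range(n)] for _ in range(m)]

# Placeholder samples standing in for the 76 expression measurements.
samples = [f"measurement_{i}" for i in range(76)]
resampled = bootstrap_datasets(samples, m=5)
print(len(resampled), len(resampled[0]))
```

Learning one network per resampled data set yields the collection of networks over which feature confidence is computed.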
Test robustness
• Tested the confidence assessment using a randomized data set: a random permutation of the order of experiments, applied independently per gene.
  – The random data did not perform well, since it contains no real features to find.
  – This indicates that the learned features are not artifacts of the bootstrap estimation.
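The randomization used in this robustness test can be sketched as below: each gene's measurements are shuffled independently, which keeps every gene's own distribution of values but destroys the relationships between genes. The data layout (gene name mapped to a list of levels) and the function name are assumptions for illustration.

```python
import random

def permute_per_gene(expression, seed=0):
    """Shuffle the order of experiments independently for each gene."""
    rng = random.Random(seed)
    randomized = {}
    for gene, levels in expression.items():
        shuffled = list(levels)  # copy so the original data is left untouched
        rng.shuffle(shuffled)
        randomized[gene] = shuffled
    return randomized

# Toy expression data: gene -> levels across five experiments (made up).
expression = {
    "g1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "g2": [2.0, 4.0, 6.0, 8.0, 10.0],
}
randomized = permute_per_gene(expression)
print(randomized)
```

Running the same learning and confidence pipeline on the randomized data gives a baseline for judging whether features found in the real data are meaningful.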