Understanding Gene Regulation: From Networks to Mechanisms
Daphne Koller, Stanford University
Gene Regulatory Networks
• Controlled by diverse mechanisms
• Modified by endogenous and exogenous perturbations
http://en.wikipedia.org/wiki/Gene_regulatory_network
Goals
• Infer regulatory network and mechanisms that control gene expression
• Identify effect of perturbations on network
• Understand effect of gene regulation on phenotype
Outline
• Regulatory networks for gene expression
• Individual genetic variation and gene regulation
• Cell differentiation and gene regulation
• Expression changes underlying phenotype
Regulatory Network I
• mRNA level of regulator can indicate its activity level
• Target expression is predicted by expression of its regulators
• Use expression of regulatory genes as regulators
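[Figure: network of yeast genes (PHO5, PHM6, SPL2, PHO3, PHO84, VTC3, GIT1, PHO2, TEC1, GPA1, ECM18, UTH1, MEC3, MFA1, SAS5, SEC59, SGS1, PHO4, ASG7, RIM15, HAP1), with PHO2, GPA1, MFA1, SAS5, PHO4 and RIM15 shown again as a separate group]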
Segal et al., Nature Genetics 2003; Lee et al., PNAS 2006
Transcription factors, signal transduction proteins, mRNA processing factors, …
Regulatory Network II
• Co-regulated genes have similar regulation program
• Exploit modularity and predict expression of entire module
• Allows uncovering complex regulatory programs
[Figure: the same gene network (PHO5, PHM6, SPL2, PHO3, PHO84, VTC3, GIT1, PHO2, TEC1, GPA1, ECM18, UTH1, MEC3, MFA1, SAS5, SEC59, SGS1, PHO4, ASG7, RIM15, HAP1), with a group of genes labelled "module" and "Targets"]
Segal et al., Nature Genetics 2003; Lee et al., PNAS 2006
“Regulatory Program” ?
Module Networks*
• Learning quickly runs out of statistical power
• Poor regulator selection lower in the tree
• Many correct regulators not selected
• Arbitrary choice among correlated regulators
• Combinatorial search
• Multiple local optima
[Figure: module genes' regulation program drawn as a tree. Nodes test activator expression and repressor expression (true/false); the leaves are contexts A, B and C, in which target gene expression is induced or repressed. Genes in module: Gene1, Gene2, Gene3. Legend: Activator, Repressor.]
* Segal et al., Nature Genetics 2003
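Regulation as Linear Regression
$\min_w \left( \sum_i w_i x_i - E_{Targets} \right)^2$
$E_{Targets} = w_1 x_1 + \dots + w_N x_N + \varepsilon$, with parameters $w_1, w_2, \dots, w_N$
[Figure: regulators x1, x2, …, xN feed into ETargets with weights w1, w2, …, wN. Example: module expression written as a weighted sum of GPA1 and MFA1 expression, with weights -3 and 0.5.]
But we often have hundreds or thousands of regulators … and linear regression gives them all nonzero weight!
Problem: This objective learns too many regulators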
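The problem stated on this slide can be seen in a small sketch. Everything in it is an assumption for illustration: the data are synthetic, the sizes (112 samples, 20 regulators) are arbitrary, and only two regulators truly affect the target. Ordinary least squares still assigns a nonzero weight to every regulator.

```python
import numpy as np

# Synthetic stand-in for expression data (all sizes and values are illustrative).
rng = np.random.default_rng(0)
n_samples, n_regulators = 112, 20
X = rng.normal(size=(n_samples, n_regulators))    # regulator expression x_1..x_N
true_w = np.zeros(n_regulators)
true_w[0], true_w[1] = -3.0, 0.5                  # only two regulators matter
E_targets = X @ true_w + 0.1 * rng.normal(size=n_samples)

# Ordinary least squares: minimize (sum_i w_i x_i - E_targets)^2 over w.
w_hat, *_ = np.linalg.lstsq(X, E_targets, rcond=None)

n_nonzero = int(np.count_nonzero(w_hat))
print(f"{n_nonzero} of {n_regulators} regulators receive a nonzero weight")
```

The two true weights are recovered closely, but the 18 irrelevant regulators also get small nonzero weights, so the fitted program does not say which regulators matter.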
Lasso* (L1) Regression
$\min_w \left( w_1 x_1 + \dots + w_N x_N - E_{Targets} \right)^2 + C \sum_i |w_i|$
• Induces sparsity in the solution w (many $w_i$'s set to zero)
• Provably selects “right” features when many features are irrelevant
• Convex optimization problem
• Unique global optimum
• Efficient optimization
• But, arbitrary choice among correlated regulators
[Figure: regulators x1, x2, …, xN feed into ETargets with weights w1, w2, …, wN; L2 and L1 penalties compared in the (w1, w2) parameter plane]
* Tibshirani, 1996
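A minimal sketch of how the L1 penalty produces sparsity, using plain coordinate descent with soft-thresholding. This is one standard way to solve the lasso objective and is not necessarily the solver used in the cited work. The function names, the synthetic data and the value C = 20 are assumptions for illustration.

```python
import numpy as np

def soft_threshold(a, t):
    """Shrink a toward zero by t; values within [-t, t] become exactly 0."""
    return np.sign(a) * max(abs(a) - t, 0.0)

def lasso_cd(X, y, C, n_iter=200):
    """Coordinate descent for  min_w ||y - X w||^2 + C * sum_i |w_i|."""
    n, p = X.shape
    w = np.zeros(p)
    z = (X ** 2).sum(axis=0)                  # x_j^T x_j for each regulator
    for _ in range(n_iter):
        for j in range(p):
            r_j = y - X @ w + X[:, j] * w[j]  # residual ignoring regulator j
            rho = X[:, j] @ r_j
            w[j] = soft_threshold(rho, C / 2.0) / z[j]
    return w

# Synthetic data: only regulators 0 and 1 truly affect the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(112, 20))
true_w = np.zeros(20)
true_w[0], true_w[1] = -3.0, 0.5
y = X @ true_w + 0.1 * rng.normal(size=112)

w_lasso = lasso_cd(X, y, C=20.0)
selected = np.flatnonzero(w_lasso)
print("selected regulators:", selected)
```

The selected weights are shrunk slightly toward zero relative to their true values, which is the usual price of the L1 penalty.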
Elastic Net* Regression
$\min_w \left( w_1 x_1 + \dots + w_N x_N - E_{Targets} \right)^2 + C \sum_i |w_i| + D \sum_i w_i^2$
• Induces sparsity
• But avoids arbitrary choices among relevant features
• Convex optimization problem
• Unique global optimum
• Efficient optimization algorithms
[Figure: regulators x1, x2, …, xN feed into ETargets with weights w1, w2, …, wN; L2 and L1 penalties compared in the (w1, w2) parameter plane]
Lee et al., PLOS Genetics 2009
* Zou & Hastie, 2005
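The difference between the two penalties on correlated regulators can be sketched with an extreme case: two identical regulator columns. The code uses coordinate descent on the elastic net objective above. The function names, the synthetic data and the values of C and D are assumptions for illustration, and this is not the authors' implementation.

```python
import numpy as np

def soft_threshold(a, t):
    return np.sign(a) * max(abs(a) - t, 0.0)

def elastic_net_cd(X, y, C, D, n_iter=300):
    """Coordinate descent for
    min_w ||y - X w||^2 + C * sum_i |w_i| + D * sum_i w_i^2.
    With D = 0 this reduces to the lasso."""
    n, p = X.shape
    w = np.zeros(p)
    z = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(p):
            r_j = y - X @ w + X[:, j] * w[j]  # residual ignoring regulator j
            rho = X[:, j] @ r_j
            w[j] = soft_threshold(rho, C / 2.0) / (z[j] + D)
    return w

# Regulators 0 and 1 are perfectly correlated (duplicated); regulator 2 is noise.
rng = np.random.default_rng(1)
x = rng.normal(size=112)
X = np.column_stack([x, x, rng.normal(size=112)])
y = 2.0 * x + 0.1 * rng.normal(size=112)

w_l1 = elastic_net_cd(X, y, C=10.0, D=0.0)    # lasso
w_en = elastic_net_cd(X, y, C=10.0, D=50.0)   # elastic net
print("lasso:      ", np.round(w_l1, 3))
print("elastic net:", np.round(w_en, 3))
```

In this sketch the lasso run puts all the weight on whichever duplicate is visited first, while the elastic net run splits the weight equally between the two, because the added squared penalty makes the optimum unique.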
Learning Regulatory Network
• Cluster genes into modules
• Learn a regulatory program for each module
[Figure: network of yeast genes (PHO5, PHM6, SPL2, PHO3, PHO84, VTC3, GIT1, PHO2, TEC1, GPA1, ECM18, UTH1, MEC3, MFA1, SAS5, SEC59, SGS1, PHO4, ASG7, RIM15, HAP1), with PHO2, GPA1, MFA1, SAS5, PHO4 and RIM15 shown again as a separate group]
Lee et al., PLoS Genet 2009
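[Figure: example regulatory program: module expression written as a weighted sum of GPA1 and MFA1 expression, with weights -3 and 0.5]
This is a Bayesian network
• But multiple genes share same program
• Dependency model is linear regression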
Outline
• Regulatory networks for gene expression
• Individual genetic variation and gene regulation
  • Effect of genotype on expression
  • Regulatory potential
• Cell differentiation and gene regulation
• Expression changes underlying phenotype
Genotype → phenotype
…ACTCGGTTGGCCTAAATTCGGCCCGG…
…ACCCGGTAGGCCTTAATTCGGCCCGG…
:
…ACTCGGTAGGCCTATATTCGGCCGGG…
[Diagram: Different sequences → ?? (perturbations to regulatory network) → Different phenotypes]
Genotype → Regulation
…ACTCGGTTGGCCTAAATTCGGCCCGG…
…ACCCGGTAGGCCTTAATTCGGCCCGG…
:
…ACTCGGTAGGCCTATATTCGGCCGGG…
[Diagram: Different sequences → ?? → Perturbations to regulatory network]
Goals:
• Infer regulatory network that controls gene expression
• Identify mechanisms by which genetic variation affects gene expression
eQTL Data [Brem et al. (2002) Science]
• Cross: BY × RM, 112 progeny
• Expression data: 6000 genes × 112 individuals
• Genotype data: 3000 markers × 112 individuals, binary marker values (e.g. 0101100100…011)
Traditional Approach: Single Marker
Expression quantitative trait loci (eQTL) mapping
• For each gene, find the marker that is most predictive of its expression level [Yvert G et al. (2003) Nat Gen].
[Figure: genotype data (markers × individuals, binary) beside expression data (genes × individuals, induced/repressed); Marker j, chosen among markers 1, 2, 3, 4, 5, …, is tested against the mRNA level of Gene i]
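The single-marker approach can be sketched as follows. "Most predictive" is taken here to mean the largest absolute Pearson correlation between a marker's genotype and a gene's expression. That scoring choice, the function name `single_marker_eqtl` and the synthetic data are assumptions for illustration; the cited studies may use a different test statistic.

```python
import numpy as np

def single_marker_eqtl(genotype, expression):
    """For each gene, return the index of the best single marker and its correlation.

    genotype:   (n_markers, n_individuals) array of 0/1 marker values
    expression: (n_genes, n_individuals) array of expression levels
    Assumes every marker and every gene varies across individuals.
    """
    G = genotype - genotype.mean(axis=1, keepdims=True)
    E = expression - expression.mean(axis=1, keepdims=True)
    G = G / np.linalg.norm(G, axis=1, keepdims=True)
    E = E / np.linalg.norm(E, axis=1, keepdims=True)
    corr = E @ G.T                            # (n_genes, n_markers) Pearson r
    best = np.abs(corr).argmax(axis=1)
    return best, corr[np.arange(len(best)), best]

# Synthetic example: gene 0 is driven by marker 7, the other genes are noise.
rng = np.random.default_rng(2)
genotype = rng.integers(0, 2, size=(30, 112))
expression = rng.normal(size=(5, 112))
expression[0] = 2.0 * genotype[7] + 0.3 * rng.normal(size=112)

best_marker, best_r = single_marker_eqtl(genotype, expression)
print("best marker for gene 0:", best_marker[0])
```

Each gene is scored against each marker independently, so this baseline cannot represent several markers or expression regulators acting together on one gene.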
LirNet Regulatory network
• E-regulators: Activity (expression) of regulatory genes
• G-regulators: Genotype of genes, measured as values of chromosomal markers
Lee et al., PNAS 2006; Lee et al., PLoS Genetics 2009
[Figure: the gene network (PHO5, PHM6, SPL2, PHO3, PHO84, VTC3, GIT1, PHO2, TEC1, GPA1, ECM18, UTH1, MEC3, MFA1, SAS5, SEC59, SGS1, PHO4, ASG7, RIM15, HAP1) extended with chromosomal markers M1, M22, M120, M321 and M1011]
The Telomere Module
• 40/42 genes in telomeres
• Enriched for telomere maintenance (p < 10^-11) & helicase activity (p < 10^-18)
• Enrichment for Rap1p targets (29/42; p < 10^-15)
Marker: Chr XII: 1056097
Includes Rif2:
• controls telomere length
• establishes telomeric silencing
• 6 coding & 8 promoter SNPs
• binds to Rap1p C-terminus
Lee et al., PNAS 2006
Some Chromatin Modules
• Locus containing Sir1: 4 coding SNPs; 4/5 consecutive genes
• Locus containing uncharacterized Sir1 homologue: 87(!) coding SNPs; 5/7 consecutive genes
[Figure labels: Known Sir1 targets; Chr XI: 643655; Chr X: 22213]
Lee et al., PNAS 2006
Chromatin as Mechanism
• 23 modules (out of 165) with “chromosomal features”
• 16 have “chromatin regulators” (p < 10^-7)
• Chromatin modification explains significant part of variation in gene expression between strains
Lee et al., PNAS 2006
[Table: original columns are module, #genes, Runs, Dom, DEG, Enrich, Chrom, Reg-E, Reg-G. Only the module ID, gene count and text annotations survive.]
module   #genes   annotation
728      16
732      7        Telo
742      24
743      6        Telo
744      5        Telo
755      14
757      3        Telo
770      35
783      23
790      122
848      15
855      42       Telo
862      20
869      48
876      51       Telo
890      114
897      15
904      56       TY,LTR
917      31
941      110
968      218
972      7        Telo
991      4
993      5
1018     88
1025     217
1030     15
1054     6        Telo
1061     46       Telo
1078     90
1085     42
1120     8        Telo
1126     26       Telo
1133     19
1145     7        Telo
1155     75
Mechanism I
“Evolutionary strategy” to make coordinated changes in gene expression by modifying a small number of hubs
The Puf3 Module
• 147/153 genes (P ~ 10^-130) are pulldown targets of mRNA binding protein Puf3
• PUF family: Sequence specific mRNA binding proteins (3’ UTR)
• Regulate degradation of mRNA and/or repress translation
[Figure: heat map (scale -1 to +1) across BY, RM and the 112 segregants, showing PUF3 expression, genotype and regulator weights; regulators shown: HAP4, TOP2, KEM1, GCN1, GCN20, DHH1, with the label “P-body components”]
Lee et al., PLOS Genetics 2009
P-Bodies
• Dhh1 regulates mRNA decapping in P-bodies
• mRNAs stored in P-bodies are translationally repressed
• Translation: mRNAs can exit P-bodies and resume translation
• RNA degradation: RNA stored in P-bodies can be degraded (via decapping)
But what regulates sequence-specific localization of mRNAs to P-bodies?
[Figure: GFP of Dhh1**; schematic of a cell with a P-body and Puf3 protein (?)]
* Sheth and Parker (2003) Science 300:805
** Beliakova-Bethell, et al. (2006) RNA 12:94
Microscopy experiment
• Fluorescent microscopy: Puf3 localizes to P-body
• Supports hypothesis of Puf3 involvement in regulating mRNA degradation by P-bodies
[Figure: micrographs labelled Dhh1 and Puf3; schematic of a cell with a P-body, Dhh1 and Puf3 protein (?)]
Joint work with David Drubin and Pam Silver
Lee et al., PLOS Genetics 2009
What Regulates the P-bodies?
• A marker that covers a large region in Chr 14 (ChrXIV:449639).
• Region contains ~30 genes and ~318 SNPs.
• Experiments for all 30 genes not feasible!
[Figure: the marker with BY and RM genotypes and the genes DHH1, GCN20, GCN1, KEM1 and BLM3]
Lee et al., PLOS Genetics 2009
Outline
• Regulatory networks for gene expression
• Individual genetic variation and gene regulation
  • Effect of genotype on expression
  • Regulatory potential
• Cell differentiation and gene regulation
• Expression changes underlying phenotype
Motivation
Not all SNPs are equally likely to be causal.
[Figure: gene and SNP positions along ChrXIV: 449,639-502,316, over the sequence TACGTAGGAACCTGTACCA … GGAAAATATCAAATCCAACGACGTTAGCCAATGCGATCGAATGGGAACGTA]
• SNP 1: Conserved residue in a gene involved in RNA degradation
• SNP 2: In nonconserved intergenic region
“Regulatory features” F:
1. Gene region?
2. Protein coding region?
3. Nonsynonymous?
4. Create a stop codon?
5. Strong conservation?
:
Idea: Prioritize SNPs that have “good” regulatory features
But how do we weight different features?
Lee et al., PLOS Genetics 2009
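Bayesian L1-Regularization
For module m with expression $y_m$, and regulator k with value $x_{mk}$ and weight $w_{mk}$:
$y_m \sim P(y_m \mid x; w) = \mathcal{N}\left( \sum_k w_{mk} x_{mk}, \varepsilon^2 \right)$
$w_{mk} \sim P(w) = \mathrm{Laplacian}(0, C)$
Objective: $\max_w \sum_{m=1}^{M} \log P(y_m \mid x_m, w_m) - C \sum_{m,k} |w_{mk}|$
Prior variance: higher prior variance → weight can more easily deviate from 0 → regulator more likely to be selected
[Figure: Laplacian prior densities over $w_i$ for two different prior variances]
Lee et al., PLOS Genetics 2009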
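The link between the Laplacian prior and the L1 penalty can be made explicit with a short derivation. It assumes the prior density is written as $P(w_{mk}) = \tfrac{C}{2} e^{-C |w_{mk}|}$. That parameterization is an assumption chosen so that the penalty comes out as $C \sum |w_{mk}|$; the slides do not state it.

```latex
\begin{aligned}
\hat{w}
 &= \arg\max_w \; \log P(y \mid x, w) + \log P(w) \\
 &= \arg\max_w \; \sum_{m=1}^{M} \log P(y_m \mid x_m, w_m)
    + \sum_{m,k} \log\!\left( \tfrac{C}{2}\, e^{-C |w_{mk}|} \right) \\
 &= \arg\max_w \; \sum_{m=1}^{M} \log P(y_m \mid x_m, w_m)
    - C \sum_{m,k} |w_{mk}|
\end{aligned}
```

The constant $\log(C/2)$ does not depend on $w$ and drops out. With the Gaussian likelihood, $\log P(y_m \mid x_m, w_m)$ is a negative squared error up to scaling, so the MAP estimate has the same form as the lasso objective. Under this parameterization the prior variance is $2/C^2$, so a smaller $C$ means a higher prior variance and a weaker pull toward zero.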
Metaprior Model (Hierarchical Bayes)
For module m and regulator k:
$y_m \sim P(y_m \mid x; w) = \mathcal{N}\left( \sum_k w_{mk} x_{mk}, \varepsilon^2 \right)$
$w_{mk} \sim \mathrm{Laplacian}(0, C_{mk})$
$C_{mk} = g(\beta^T f_{mk})$, where $f_{mk}$ are the regulatory features and β is the regulatory prior
Regulatory features: Inside a gene? Protein coding region? Strong conservation? TF binds to module genes? : (example values in the figure: YES/NO, YES/NO, NO/NO, …)
“Regulatory potential” = β1 × Inside a gene? + β2 × Protein coding region? + β3 × Conserved? …
[Figure: Module 1 … Module M, each with expression E_module regressed on potential regulators x1, x2, …, xN, with weights w_11, w_12, …, w_1N through w_M1, w_M2, …, w_MN]
Lee et al., PLOS Genetics 2009
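A minimal sketch of computing a regulatory potential from binary regulatory features. The slides leave the link function g unspecified, so the logistic sigmoid used here is an assumption. The feature matrix, the β values and the function names are also hypothetical.

```python
import numpy as np

def sigmoid(s):
    """Assumed link function g; the slides do not specify g."""
    return 1.0 / (1.0 + np.exp(-s))

def regulatory_potential(F, beta):
    """Regulatory potential g(beta^T f) for each candidate regulator (row of F)."""
    return sigmoid(F @ beta)

# Hypothetical features per SNP: [inside a gene?, protein coding?, conserved?]
F = np.array([
    [1, 1, 1],   # coding SNP at a conserved residue
    [1, 0, 0],   # inside a gene, non-coding, not conserved
    [0, 0, 0],   # nonconserved intergenic SNP
], dtype=float)
beta = np.array([0.3, 0.2, 0.9])   # hypothetical learned regulatory weights

potential = regulatory_potential(F, beta)
print(np.round(potential, 3))
```

A SNP with more "good" features gets a higher potential, which in the metaprior model corresponds to a prior that lets its weight move away from zero more easily.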
Metaprior Method
1. Learn regulatory programs (one per module: Module i, Module i+1, Module i+2, …, each with expression E_i and regulators x1, x2, …, xN)
2. Learn β (regulatory weights over regulatory features F, e.g. Non-synonymous, Conservation, AA small → large, Cell cycle; axis 0 to 0.3)
3. Compute regulatory potential of SNPs
[Figure: example of a BY (lab) vs RM (wild) protein sequence difference, its binary regulatory feature vector, and the resulting regulatory potentials (0.3 and 0.9)]
Lee et al., PLOS Genetics 2009
Maximize P(E, β, W | X)
Empirical hierarchical Bayes
• Use point estimate of model parameters
• Learn priors from data to maximize joint posterior
Transfer Learning
What do regulatory potentials do?
• They do not change selection of “strong” regulators, those where prediction of targets is clear
• They only help disambiguate between weak ones
• Strong regulators help teach us what to look for in other regulators
• Transfer of knowledge between different prediction tasks
Learned regulatory weights
[Figure: two bar charts of learned regulatory weights (axis 0 to 0.5) over regulatory features, grouped as Location, AA property change, Gene function and Pairwise feature. Bar values are not recoverable.]
Yeast regulatory weights, features shown: Non-synonymous coding; Stop codon; Synonymous coding; 3' UTR; 500 bp upstream; 5' UTR; 500 bp downstream; Conservation score; Cis-regulation; Change of average mass (Da); Change of isoelectric point (pI); Change of pK1; Change of pK2; Change of hydrophobicity; Change of pKa; Change of polarity; Change of pH; Change of van der Waals; Transcription regulator activity; Telomere organization and …; Protein folding; Glucose metabolic process; RNA modification; Hexose metabolic process; Cell cycle checkpoint; Proteolysis; same GO process; same GO function; ChIP-chip binding
Human regulatory weights, features shown: Non-synonymous coding; Synonymous coding; Intron; Locus region; Splice site; UTR region; Conservation score; Cis-regulation; Change of average mass (Da); Change of isoelectric point; Change of pK1; Change of pKa; Change of polarity; Change of pH; Change of van der Waals volume; cell communication; signal transduction; transcription
Lee et al., PLOS Genetics 2009
Statistical Evaluation
PGV: Percent genetic variation explained by the predicted regulatory program for each gene
[Figure: number of genes with PGV > X% plotted against PGV (%), from 100 down to 0. Curve labels: 1650 genes (Lirnet); 1450 genes (Lirnet without regulatory prior); 850 genes (Geronemo*); 250 genes (Brem & Kruglyak)]
* Lee et al. PNAS 2006
Lee et al., PLOS Genetics 2009
Biological evaluation I
How many predicted interactions have support in other data?
• Deletion/over-expression microarrays [Hughes et al. 2000; Chua et al. 2006]
• ChIP-chip binding experiments [Harbison et al. 2004]
• Transcription factor binding sites [MacIsaac et al. 2006]
• mRNA binding pull-down experiments [Gerber et al. 2004]
• Literature-curated signaling interactions
[Figure: bar chart of % supported interactions (0 to 90), reported as % interactions and % modules, for Geronemo, Flat Lirnet (Lirnet without regulatory features) and Lirnet; schematic of a regulator (Reg) acting on a module via a cascade through a TF]
Lee et al., PLOS Genetics 2009
Biological Evaluation II
[Figure: % supported regulatory predictions (0 to 100) plotted against significance of support (%, 0 to 90) for Lirnet, Zhu et al (Nature Genet 2008) and Random]
Lee et al., PLOS Genetics 2009
What Regulates the P-Bodies?
[Figure: the regulatory potential (axis 0.4 to 0.7) over all 318 SNPs in the region ChrXIV:415,000-495,000, with gene annotation from the Saccharomyces Genome Database (SGD); marker at ChrXIV:449639; MKT1 labelled]
High-scoring regulatory features
Feature                  Weight
Conservation             0.230
Cis-regulation           0.208
Same GO proc             0.204
Non-syn                  0.158
Mass (Da)                0.066
pK1                      0.060
Polar                    0.057
pI                       0.050
Translation regulation   0.046
Lee et al., PLOS Genetics 2009
Mkt1
• Mkt1 binds to mRNAs at 3’ UTR
• BY has SNP at conserved residue in nuclease domain
[Figure: Mkt1 domain diagram (XPG-N, XPG-I*, Pbp1 interaction). Histogram over the range -1.2 to 1.2 (counts 0 to 35) for the Puf3 module, the P-body module and all others; labels: MKT1, BY, RM, mkt1 in RM]
Lee et al., PLOS Genetics 2009
Predicting Causal Regulators
Finding causal regulators for 13 “chromosomal hotspots”
Region   Zhu et al [Nat Genet 08]                      Lirnet (top 3 are considered)
1        None                                          SEC18 RDH54 SPT7
2        TBS1, TOS1, ARA1, CSH1, SUP45, CNS1, AMN1     AMN1 CNS1 TOS1
3        None                                          TRS20 ABD1 PRP5
4        LEU2, ILV6, NFS1, CIT2, MATALPHA1             LEU2 PGS1 ILV6
5        MATALPHA1                                     MATALPHA1 MATALPHA2 RBK1
6        URA3                                          URA3 NPP2 PAC2
7        GPA1                                          STP2 GPA1 NEM1
8        HAP1                                          HAP1 NEJ1 GSY2
9        YRF1-4, YRF1-5, YLR464W                       SIR3 HMG2 ECM7
10       None                                          ARG81 TAF13 CAC2
11       SAL1, TOP2                                    MKT1 TOP2 MSK1
12       PHM7                                          PHM7 ATG19 BRX1
13       None                                          ADE2 ORT1 CAT5
8 validated regulators in 7 regions
14 validated regulators in 11 regions
Lee et al., PLOS Genetics 2009
Learning Regulatory Priors
• Learns regulatory potentials that are specific to organism and even data set
• Can use any set of regulatory features:
  • Sequence features
  • Functional features for relevant gene
  • Features of regulator/target(s) pairs
• Applicable to any organism, including ones where functional data may not be readily available
Lee et al., PLOS Genetics 2009
Conclusion
• Framework for modeling gene regulation
• Use machine learning to identify regulatory program
• Hierarchical Bayesian techniques to capture regularities in effects of perturbations on network
• Uncovers diverse regulatory mechanisms:
  • Chromatin remodeling
  • mRNA degradation
Acknowledgements
• Su-In Lee, University of Washington
• Dana Pe’er, Columbia University
• Aimee Dudley, Institute for Systems Biology
• Nevan Krogan, UCSF
• Pam Silver, David Drubin, Harvard Medical School
• National Science Foundation