Analysis of time-course gene expression data
Shyamal D. Peddada
Biostatistics Branch
National Inst. Environmental Health Sciences (NIH)
Research Triangle Park, NC
Outline of the talk
Some objectives for performing “long series” time-course experiments
A. Single cell-cycle experiment
– A nonlinear regression model
– Phase angle of a cell cycle gene
– Inference
– Open research problems
B. Multiple cell-cycle experiments
– “Coherence” between multiple cell-cycle experiments
– Illustration
– Open research problems
Objectives
Some genes play an important role during the cell division cycle process. They are known as “cell-cycle genes”.
Objectives: Investigate various characteristics of cell-cycle and/or circadian genes such as:
– Amplitude of initial expression
– Period
– Phase angle of expression (angle of maximum expression for a cell cycle gene)
Phases in cell division cycle
A brief description
• G1 phase:
"GAP 1". For many cells, this phase is the major period of cell growth during its lifespan.
• S ("Synthesis") phase:
DNA replication occurs.
A brief description
• G2 phase:
"GAP 2": Cells prepare for M phase. The G2 checkpoint prevents cells from entering mitosis if DNA has been damaged since the last division, providing an opportunity for DNA repair and stopping the proliferation of damaged cells.
• M (“Mitosis”) phase:
Nuclear (chromosomes separate) and cytoplasmic (cytokinesis) division occur. Mitosis is further divided into 4 phases.
Single, long series experiment …
Whitfield et al. (Molecular Biology of the Cell, 2002)
Basic design is as follows:
Experimental units: Human cancer cells (HeLa)
Microarray platform: cDNA chips used with approx 43000 probes (i.e. roughly 29000 genes)
3 different patterns of time points (i.e. 3 different experiments)
One of the goals of these experiments was to identify periodically expressed genes.
Whitfield et al. (Molecular Biology of the Cell, 2002)
Experiment 1: (26 time points)
HeLa cancer cells arrested in the S-phase using double thymidine block.
Sampling times after arrest (hrs):
– 0 1 2 3 4 5 6 7 8 9 10 11 12 14 15 16 18 20 22 24 26 28 32 36 40 44.
Whitfield et al. (2002)
Experiment 2: (47 time points)
HeLa cancer cells arrested in the S-phase using double thymidine block.
Sampling times after arrest (hrs):
– every hour between 0 and 46.
Whitfield et al. (2002)
Experiment 3: (19 time points)
HeLa cancer cells arrested in the M-phase using thymidine and then nocodazole.
Sampling times after arrest (hrs):
– 0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36.
Whitfield et al. (2002)
Phase marker genes:
Cell Cycle Phase    Genes
----------------    -----
G1/S                CCNE1, CDC6, PCNA, E2F1
S                   RFC4, RRM2
G2                  CDC2, TOP2A, CCNA2, CCNF
G2/M                STK15, CCNB1, PLK, BUB1
M/G1                VEGFC, PTTG1, CDKN3, RAD21
Questions
Can we describe the gene expression of a cell-cycle gene as a function of time?
Can we determine the phase angle for a given cell-cycle gene? i.e. can we quantify the previous table in terms of angles on a circle?
What is the period of expression for a given gene?
Can we test the hypothesis that all cell-cycle genes share the same time period?
Etc.
Profile of PCNA based on experiment 2 data
Some important observations
1. Gene expression has a sinusoidal shape
2. Gene expression for a given gene is an average value of mRNA levels across a large number of cells
3. Duration of cell cycle varies stochastically across cells
4. Initially cells are synchronized but over time they fall out of synchrony
5. Gene expression of a cell-cycle gene is expected to “decrease/decay” over time. This is because of items 2 and 4 listed above!
Random Periods Model (PNAS, 2004)
$$f(t) = a + bt + \frac{K}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \cos\left(\frac{2\pi t}{T\exp(\sigma z)} + \phi\right) \exp\left(-\frac{z^{2}}{2}\right) dz$$
• a and b: background drift parameters
• K: the initial amplitude
• T: the average period
• σ: the attenuation parameter
• φ: the phase angle
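A minimal numerical sketch of the random periods model, assuming the integral form f(t) = a + bt + K/sqrt(2π) ∫ cos(2πt / (T exp(σz)) + φ) exp(-z²/2) dz. The function name, the use of Gauss-Hermite quadrature, and the default of 64 nodes are illustrative choices, not part of the original talk; the quadrature can lose accuracy when t/T or σ is large.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss


def random_periods_model(t, a, b, K, T, sigma, phi, n_nodes=64):
    """Evaluate f(t) = a + b*t + K/sqrt(2*pi) * integral over z of
    cos(2*pi*t / (T*exp(sigma*z)) + phi) * exp(-z**2 / 2)."""
    t = np.atleast_1d(np.asarray(t, dtype=float))
    # Gauss-Hermite nodes and weights for the weight function exp(-z**2 / 2);
    # the weights sum to sqrt(2*pi).
    z, w = hermegauss(n_nodes)
    # One period T*exp(sigma*z) per quadrature node.
    periods = T * np.exp(sigma * z)
    cosines = np.cos(2.0 * np.pi * t[:, None] / periods[None, :] + phi)
    integral = cosines @ w
    return a + b * t + K / np.sqrt(2.0 * np.pi) * integral


# Hypothetical parameter values, chosen only to illustrate the attenuation.
times = np.arange(0.0, 47.0)
curve = random_periods_model(times, a=0.0, b=0.0, K=1.0, T=15.0, sigma=0.1, phi=0.0)
```

With sigma = 0 every cell has the same period and the curve reduces to an undamped cosine; a positive sigma spreads the periods, so the averaged signal decays over time.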
Fitted curves for some phase marker genes
Whitfield et al. (2002)
Phase marker genes:
Phase    Genes                         Phase angles (radians)
-----    -----                         ----------------------
G1/S     CCNE1, CDC6, PCNA, E2F1       0.56, 5.96, 5.87, 5.83
S        RFC4, RRM2                    5.47, 5.36
G2       CDC2, TOP2A, CCNA2, CCNF      4.24, 3.74, 3.55, 3.25
G2/M     STK15, CCNB1, PLK, BUB1       3.06, 2.67, 2.61, 2.51
M/G1     VEGFC, PTTG1, CDKN3, RAD21    2.66, 2.40, 2.25, 1.81
A hypothesis of biological interest
Do all cell cycle genes have the same T and the same σ, while the other 4 parameters are gene specific? i.e.
$$H_0: T_g = T,\ \sigma_g = \sigma \ \text{ for all genes } g$$
An Important Feature
Correlated data
– Temporal correlation within gene
– Gene-to-gene correlations
Test Statistic
Wald statistic for heteroscedastic linear and non-linear models
– Zhang, Peddada and Rogol (2000)
– Shao (1992)
– Wu (1986)
The Null Distribution
Due to the underlying correlation structure
– Asymptotic $\chi^{2}$ approximation is not appropriate.
– Use moving-blocks bootstrap technique on the residuals of the nonlinear model.
Kunsch (1989)
Moving-blocks Bootstrap
Step 1: Fit the null model to the data and compute the residuals.
Step 2: Draw a simple random sample (with replacement) from all possible blocks of consecutive residuals of a specific size.
Step 3: Add these residuals to the fitted curve under the null hypothesis to obtain the bootstrap data set.
Step 4: Using the bootstrap data fit the model under the alternate hypothesis and compute the Wald statistic.
Step 5: Repeat the above steps a large number of times.
Step 6: The bootstrap p-value is the proportion of the above Wald statistics that exceed the Wald statistic determined from the actual data.
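The resampling in Step 2 and the p-value in Step 6 can be sketched as follows. This is a minimal illustration, assuming the residuals are stored as a one-dimensional array in time order; the function names and the block size are hypothetical, and fitting the nonlinear model and computing the Wald statistic (Steps 1 and 4) are not shown.

```python
import numpy as np


def moving_block_resample(residuals, block_size, rng):
    """Concatenate randomly chosen blocks of consecutive residuals
    (drawn with replacement) and trim the result to the original length."""
    residuals = np.asarray(residuals, dtype=float)
    n = residuals.size
    n_possible = n - block_size + 1      # number of overlapping blocks
    n_needed = -(-n // block_size)       # ceil(n / block_size)
    starts = rng.integers(0, n_possible, size=n_needed)
    blocks = [residuals[s:s + block_size] for s in starts]
    return np.concatenate(blocks)[:n]


def bootstrap_p_value(observed_stat, bootstrap_stats):
    """Proportion of bootstrap Wald statistics exceeding the observed one."""
    return float(np.mean(np.asarray(bootstrap_stats) > observed_stat))


rng = np.random.default_rng(0)
residuals = np.arange(20.0)              # stand-in residuals for illustration
resampled = moving_block_resample(residuals, block_size=4, rng=rng)
# Step 3 would then form: bootstrap_data = fitted_null_curve + resampled
```

Sampling whole blocks rather than single residuals keeps the short-range temporal correlation within each block, which is the reason for preferring this scheme to an ordinary residual bootstrap here.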
Analysis of experiment 2
The bootstrap p-value for testing
$$H_0: T_g = T,\ \sigma_g = \sigma$$
using Experiment 2 data of Whitfield et al. (2002) is 0.12.
Thus the null hypothesis is not rejected, and our model with a common T and σ is biologically plausible.
Statistical inferences on the phase angle φ
Multiple experiments
Some questions of interest
How to evaluate or combine results from multiple cell division cycle experiments?
– Are the results “consistent” across experiments?
How to evaluate this?
What could be a possible criterion?
Data
$\hat{\phi}_{g,i}$: RPM estimate of the phase angle of a cell-cycle gene ‘g’ from the $i^{th}$ experiment.
Representation using a circle
Consider 4 cell cycle genes A, B, C, D. The vertical line in the circle denotes the reference line. The angles are measured in the counter-clockwise direction.
Thus the sequential order of expression in this example is A, B, D, C.
[Figure: circle with a vertical reference line and the genes A, B, D, C marked around it]
“Coherence” in multiple cell-cycle experiments
A group of cell cycle genes are said to be coherent across experiments if their sequential order of the phase angles is preserved across experiments.
[Figure: three circles, labeled Exp 1, Exp 2 and Exp 3, each showing the positions of genes A, B, C and D]
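The definition of coherence can be checked mechanically: two experiments agree if the genes, sorted by phase angle, appear in the same order up to a cyclic shift of the starting point. The sketch below assumes phase angles are given in radians in a dictionary keyed by gene name; the function names and the angle values are hypothetical.

```python
import math


def circular_order(phase_angles):
    """Gene names sorted by phase angle (radians, taken modulo 2*pi)."""
    return sorted(phase_angles, key=lambda g: phase_angles[g] % (2 * math.pi))


def same_circular_order(order_a, order_b):
    """True if order_b is a cyclic rotation of order_a."""
    if sorted(order_a) != sorted(order_b):
        return False
    n = len(order_a)
    return n == 0 or any(order_a[k:] + order_a[:k] == order_b for k in range(n))


# Hypothetical phase angles for genes A, B, C, D in three experiments.
exp1 = {"A": 0.3, "B": 1.2, "D": 2.5, "C": 4.0}
exp2 = {"A": 5.9, "B": 0.4, "D": 1.9, "C": 3.1}   # same order, shifted start
exp3 = {"A": 0.3, "B": 2.5, "D": 1.2, "C": 4.0}   # B and D swapped

coherent_12 = same_circular_order(circular_order(exp1), circular_order(exp2))
coherent_13 = same_circular_order(circular_order(exp1), circular_order(exp3))
```

This check applies to true or error-free angles only; with estimated angles a small estimation error can swap two neighboring genes, which is why a statistical criterion is needed.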
Geometric Representation
We shall represent phase angles from multiple cell cycle experiments using concentric circles.
Each circle represents an experiment.
Same gene from a pair of experiments is connected by a line segment.
– A figure with non-intersecting lines indicates perfect coherence.
– If there is no coherence at all then there will be many intersecting lines.
Example: Perfectly Coherent
Example: No coherence
Estimated Phase Angles
Due to statistical errors in estimation, the estimated phase angles from multiple cell cycle experiments need not preserve the sequential order even though the true phase angles are in a sequential order.
How to evaluate coherence?
Some background on regression for circular data
[Figure: two circles, Experiment A and Experiment B, showing the estimated phase angles $\hat{\phi}_{1,1}, \hat{\phi}_{2,1}, \hat{\phi}_{3,1}$ and $\hat{\phi}_{1,2}, \hat{\phi}_{2,2}, \hat{\phi}_{3,2}$ of three genes]
Question: Can we determine a rotation matrix A such that we can rotate the circle representing Experiment A to obtain the circle representing Experiment B?
Angle of rotation for a rigid body
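Yes! By solving the following minimization problem:
$$\min_{A \in S} \sum_{g=1}^{n} \left\| \hat{\phi}_{g,2} - A\,\hat{\phi}_{g,1} \right\|^{2}$$
where S denotes the set of 2x2 rotation matrices and each phase angle is represented by the corresponding point on the unit circle. The estimated rotation matrix has the form
$$\hat{A}_{uv} = \begin{pmatrix} \cos\hat{\theta}_{v|u} & \sin\hat{\theta}_{v|u} \\ -\sin\hat{\theta}_{v|u} & \cos\hat{\theta}_{v|u} \end{pmatrix}$$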
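A minimal sketch of the least-squares rotation between two sets of phase angles. It assumes each angle is mapped to the unit vector (cos x, sin x) and that the rotation angle is measured counter-clockwise; the sign convention of the matrix on the slide may differ. The function name and the example angles are hypothetical.

```python
import numpy as np


def estimate_rotation_angle(phi_from, phi_to):
    """Angle theta minimizing sum ||u(phi_to) - u(phi_from + theta)||**2,
    where u(x) = (cos x, sin x)."""
    diff = np.asarray(phi_to, dtype=float) - np.asarray(phi_from, dtype=float)
    # The objective equals sum(2 - 2*cos(diff - theta)), which is minimized
    # at the circular mean of the angle differences.
    return float(np.arctan2(np.sin(diff).sum(), np.cos(diff).sum()))


# Hypothetical phase angles: experiment 2 is experiment 1 rotated by 0.7 rad.
phi_1 = np.array([0.2, 1.5, 3.0, 5.5])
phi_2 = (phi_1 + 0.7) % (2 * np.pi)
theta_hat = estimate_rotation_angle(phi_1, phi_2)
```

Working with sines and cosines of the differences, rather than averaging the differences directly, avoids problems when angles wrap around 2π.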
Determination of Coherence Across “k” Experiments
The Basic Idea
Consider a rigid body rotating in a plane. Suppose the body is perfectly rigid with no deformations.
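Let $A_{i \to i+1}$ denote the 2x2 rotation matrix from experiment i to experiment i+1 (with k+1 = 1). Then
$$A_{1\to 2}\, A_{2\to 3}\, A_{3\to 4} \cdots A_{k-1\to k} = A_{1\to k}$$
Alternatively,
$$A_{1\to 2}\, A_{2\to 3}\, A_{3\to 4} \cdots A_{k-1\to k}\, A'_{1\to k} = I \iff A_{1\to 2}\, A_{2\to 3}\, A_{3\to 4} \cdots A_{k-1\to k}\, A_{k\to 1} = I$$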
The Basic Idea
Equivalently, if
$$A_{i\to i+1} = \begin{pmatrix} \cos\hat{\theta}_{i+1|i} & \sin\hat{\theta}_{i+1|i} \\ -\sin\hat{\theta}_{i+1|i} & \cos\hat{\theta}_{i+1|i} \end{pmatrix}$$
then under perfect rigid body motion we should have
$$\cos\left(\sum_{i=1}^{k} \theta_{i+1|i}\right) = 1$$
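The rigid-body condition can be verified numerically: multiplying the rotation matrices around the full cycle of experiments gives the identity exactly when the rotation angles sum to a multiple of 2π. The sketch below uses the matrix layout [[cos, sin], [-sin, cos]] shown on the slide; the function names and the angle values are hypothetical.

```python
import numpy as np


def rotation_matrix(theta):
    """2x2 rotation matrix in the layout [[cos, sin], [-sin, cos]]."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, s], [-s, c]])


def cycle_product(thetas):
    """Product of the rotation matrices around the cycle 1 -> 2 -> ... -> k -> 1."""
    product = np.eye(2)
    for theta in thetas:
        product = product @ rotation_matrix(theta)
    return product


# Hypothetical rotation angles for k = 3 experiments that close the cycle.
closed = [1.0, 2.0, 2 * np.pi - 3.0]
# Angles that do not close the cycle.
not_closed = [1.0, 2.0, 0.5]

closed_product = cycle_product(closed)
open_product = cycle_product(not_closed)
```

Because planar rotations commute, the order of the factors does not affect the product, so only the sum of the angles matters.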
Problem!
In the present context we do NOT necessarily have a rigid body!
– Not all experiments are performed with same precision.
– The time axis may not be constant across experiments.
– Number of time points may not be same across experiments.
– Etc.
Example: Not a rigid motion but perfectly coherent
Consequence
Rotation matrix A alone may not be enough to bring two circles to congruence!
An additional “association/scaling” parameter may be needed, as seen in the previous figure!
Circular-Circular regression model for a pair of experiments (Downs and Mardia, 2002)
For $g = 1, 2, \ldots, G$, let $(\hat{\phi}_{g,1}, \hat{\phi}_{g,2})$ denote a pair of angular variables.
Suppose $\hat{\phi}_{g,2} \mid \hat{\phi}_{g,1}$ is von Mises distributed with mean direction $\mu$ and concentration parameter $\kappa$.
Circular-Circular Regression Model (Downs and Mardia, 2002)
The regression model is given by the link function
$$\tan\left(\frac{\mu - \alpha_{2|1}}{2}\right) = \omega_{2|1} \tan\left(\frac{\hat{\phi}_{g,1} - \beta_{2|1}}{2}\right),$$
where
$$\theta_{2|1} = \alpha_{2|1} - \beta_{2|1} = \text{the angle of rotation}$$
$$\omega_{2|1} = \text{“association parameter”}$$
$$0 \le \omega_{2|1} \le 1, \quad -\pi \le \theta_{2|1} \le \pi$$
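A minimal sketch of the link function, solved for the mean direction μ given a phase angle from the first experiment. It assumes the form tan((μ - α)/2) = ω tan((φ - β)/2) with 0 ≤ ω ≤ 1; the function names and parameter values are hypothetical, and no fitting of α, β, ω is attempted here.

```python
import numpy as np


def wrap_angle(x):
    """Wrap an angle to the interval (-pi, pi]."""
    return np.angle(np.exp(1j * np.asarray(x, dtype=float)))


def mean_direction(phi, alpha, beta, omega):
    """Solve tan((mu - alpha)/2) = omega * tan((phi - beta)/2) for mu."""
    half = (np.asarray(phi, dtype=float) - beta) / 2.0
    # arctan2 avoids the singularity of tan at half = pi/2.
    return wrap_angle(alpha + 2.0 * np.arctan2(omega * np.sin(half), np.cos(half)))


# Hypothetical phase angles and parameters.
phi = np.array([0.3, 1.7, 3.1, 4.9, 6.0])
alpha, beta = 1.0, 0.4
mu_rigid = mean_direction(phi, alpha, beta, omega=1.0)   # pure rotation
mu_none = mean_direction(phi, alpha, beta, omega=0.0)    # no association
```

The two extreme cases show the role of ω: with ω = 1 the model is a pure rotation by θ = α - β, and with ω = 0 the mean direction is α regardless of the first experiment.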
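Back to the toy examples
$$(\hat{\omega}_{B|A}, \hat{\omega}_{C|B}, \hat{\omega}_{A|C}) = (1, 1, 1), \quad |\hat{\theta}_{B|A} + \hat{\theta}_{C|B} + \hat{\theta}_{A|C}| = 0$$
$$(\hat{\omega}_{B|A}, \hat{\omega}_{C|B}, \hat{\omega}_{A|C}) = (.64, .34, .20), \quad |\hat{\theta}_{B|A} + \hat{\theta}_{C|B} + \hat{\theta}_{A|C}| = 0$$
$$(\hat{\omega}_{B|A}, \hat{\omega}_{C|B}, \hat{\omega}_{A|C}) \approx (0, 0, 0), \quad |\hat{\theta}_{B|A} + \hat{\theta}_{C|B} + \hat{\theta}_{A|C}| \approx 2.2$$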
Determination Of Coherence
Suppose we have K experiments, labeled as 1, 2, 3, …, K. Let $\hat{\theta}_{i|j}$ denote the angle of rotation for the regression of i on j for a group of g genes.
Compute
$$\left|\sum_{i=1}^{K} \hat{\theta}_{i+1|i}\right|$$
Note $K + 1 \equiv 1$.
Determination Of Coherence
We expect $\left|\sum_{i=1}^{K} \hat{\theta}_{i+1|i}\right|$ under no coherence to be “stochastically” larger than $\left|\sum_{i=1}^{K} \hat{\theta}_{i+1|i}\right|$ under coherence.
Comparison of Cumulative Distribution Functions
[Figure: cumulative distribution functions. Blue line: Coherence. Pink line: No Coherence]
Determination Of Coherence
For a given data set compute
$$c = \left|\sum_{i=1}^{K} \hat{\theta}_{i+1|i}\right|$$
Generate the bootstrap distribution of $\left|\sum_{i=1}^{K} \hat{\theta}_{i+1|i}\right|$ under the null hypothesis of no coherence.
Bootstrap P-value For Coherence
Let $\hat{\theta}^{*}_{i+1|i}$ denote the angle of rotation using the bootstrap sample. Then the P-value is:
$$P\left(\left|\sum_{i=1}^{K} \hat{\theta}^{*}_{i+1|i}\right| \le c\right)$$
Illustration: Whitfield et al. data
There are 3 experiments. The phase angle of each gene was estimated using the model of Liu et al. (2004).
A total of 47 common cell-cycling genes were selected from the three experiments.
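Estimates
The estimated values of interest are
$$(\hat{\omega}_{2|1}, \hat{\omega}_{3|2}, \hat{\omega}_{1|3}) = (0.67, 0.70, 0.64),$$
$$(\hat{\theta}_{2|1}, \hat{\theta}_{3|2}, \hat{\theta}_{1|3}) = (0.5, -3.03, 2.59)$$
Note that
$$|\hat{\theta}_{2|1} + \hat{\theta}_{3|2} + \hat{\theta}_{1|3}| = 0.06 \text{ radians}$$
$$P(|\hat{\theta}^{*}_{2|1} + \hat{\theta}^{*}_{3|2} + \hat{\theta}^{*}_{1|3}| \le 0.06) \approx 0.029$$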
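As a quick arithmetic check of the reported statistic, the three estimated rotation angles quoted on this slide can be summed directly. The variable names are illustrative; only the three angle values come from the slide.

```python
import math

# Estimated rotation angles (radians) quoted on the slide.
theta_hat = [0.5, -3.03, 2.59]

# Absolute value of their sum, the coherence statistic for K = 3 experiments.
statistic = abs(math.fsum(theta_hat))
print(round(statistic, 2))
```

The rounded value agrees with the 0.06 radians reported for these data.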
Conclusion
Since the bootstrap P-value < 0.05, we conclude that the three experiments are coherent.
Phase (rad): columns A, B, C. Res (rad): columns B - B|A, C - C|B, A - A|C. Dispersion (rad): column Cir_dist.
Accession Gene Symbol    A      B      C      B - B|A  C - C|B  A - A|C  Cir_dist
--------- -----------    -----  -----  -----  -------  -------  -------  --------
AA135809  EST            0.882  0.040  3.399    -0.29     0.66    -0.10  0.04
W93120    EST            0.260  0.427  2.580     0.52    -0.58     0.53  0.21
T54121    CCNE1*         1.191  0.559  2.661     0.02    -0.65     1.35  0.33
AA131908  FLJ10540       3.534  2.220  6.186    -0.65     0.65    -0.08  0.25
AA088457  EST            2.613  2.373  5.700     0.66    -0.02    -0.68  0.08
AA464019  E2-EPF         3.478  2.464  5.798    -0.33    -0.02     0.12  0.07
AA430092  BUB1           3.566  2.510  6.132    -0.41     0.26    -0.01  0.11
AA425404  FLJ10156       3.508  2.519  6.241    -0.32     0.36    -0.14  0.12
H73329    C20orf1        3.494  2.594  5.873    -0.22    -0.09     0.08  0.04
AA629262  PLK            3.314  2.613  5.888     0.05    -0.10    -0.11  0.02
AA157499  MAPK13         3.390  2.615  5.784    -0.05    -0.20     0.04  0.02
AA282935  MPHOSPH1       3.826  2.667  6.233    -0.64     0.19     0.18  0.12
AA053556  MKI67          3.600  2.731  5.665    -0.24    -0.44     0.33  0.06
AA279990  TACC3          3.804  2.810  0.275    -0.46     0.37    -0.05  0.13
AA402431  CENPE          3.556  2.892  5.939    -0.01    -0.33     0.10  0.01
R11407    STK15          3.484  2.940  5.869     0.14    -0.44     0.08  0.01
AA598776  CDC20          3.355  2.957  5.854     0.34    -0.47    -0.04  0.00
AA262211  KIAA0008       3.457  2.989  5.918     0.23    -0.44     0.02  0.00
AA421171  NUF2R          3.785  3.000  5.679    -0.24    -0.69     0.50  0.10
AA010065  CKS2           3.341  3.030  5.826     0.43    -0.57    -0.04  0.02
AA292964  CKS2           3.312  3.037  5.980     0.48    -0.42    -0.17  0.01
AA430511  FLJ14642       4.170  3.244  1.653    -0.57     1.35    -0.74  0.70
AA430511  FLJ14642       4.170  3.244  1.474    -0.57     1.17    -0.57  0.55
AA676797  CCNF           4.024  3.249  1.170    -0.35     0.86    -0.46  0.36
AA458994  PMSCL1         0.841  3.387  0.298    -0.15    -0.13     0.12  0.01
AA235662  FLJ14642       3.653  3.396  1.278     0.35     0.85    -0.92  0.51
N63744    FLJ10468       3.864  3.511  0.637     0.15     0.11    -0.23  0.07
AA620485  ANKT           3.709  3.531  0.923     0.40     0.38    -0.59  0.24
AA608568  CCNA2          3.857  3.541  6.133     0.19    -0.70     0.28  0.05
R96941    C20orf129      3.751  3.546  0.667     0.36     0.11    -0.36  0.11
AA504625  KNSL1          4.107  3.551  0.410    -0.17    -0.15     0.17  0.00
AI053446  EST            4.348  3.612  1.256    -0.45     0.65    -0.21  0.21
R22949    EST            4.164  3.631  0.161    -0.17    -0.46     0.39  0.02
AA452513  KNSL5          3.915  3.730  0.192     0.29    -0.50     0.12  0.03
T66935    DKFZp762E1312  4.193  3.884  0.800     0.04    -0.01    -0.01  0.03
AA099033  USP1*          5.000  4.760  2.876    -0.12     1.43    -1.45  0.71
AA485454  EST            4.886  5.086  0.891     0.33    -0.79     0.61  0.24
AA485454  EST            4.275  5.086  0.891     1.12    -0.79     0.00  0.44
AA485454  EST            4.886  5.235  0.891     0.48    -0.90     0.61  0.32
AA485454  EST*           4.275  5.235  0.891     1.27    -0.90     0.00  0.55
AA620553  FEN1           5.897  5.510  3.028    -0.21     1.02    -0.79  0.23
AA425120  CHAF1B         5.697  5.714  1.685     0.16    -0.49     0.76  0.16
N57722    MCM6           0.047  5.817  2.568    -0.23     0.31     0.34  0.00
AA450264  PCNA           0.195  5.858  2.438    -0.29     0.14     0.67  0.02
H51719    ORC1L          5.906  5.917  2.889     0.19     0.54    -0.57  0.14
H59203    CDC6           0.551  5.968  2.723    -0.43     0.33     0.61  0.04
R06900    RAMP           0.243  6.049  2.889    -0.13     0.42     0.06  0.00
Statistical inferences on the phase angle φ
- Some open problems
Estimation subject to inequality constraints
It is reasonable to hypothesize that for a normal cell division cycle, the p phase marker genes must express in an order around the unit circle.
Thus they must satisfy:
$$0 \le \phi_1 \le \phi_2 \le \cdots \le \phi_p \le 2\pi$$
Open problems - data from single experiment
How to estimate the phase angles subject to the simple order restriction
$$0 \le \phi_1 \le \phi_2 \le \cdots \le \phi_p \le 2\pi \; ?$$
More generally, how to estimate the phase angles subject to the isotropic simple order restriction
$$\phi_1 \le \phi_2 \le \cdots \le \phi_p \; ?$$
How to test the above hypothesis? What are the null and alternative hypotheses?
Open problems – data from multiple experiments
How do we estimate the phase angles from multiple experiments under the order restriction on the phase angles of cell cycle genes?
What are the statistical errors associated with such an estimator?
How to construct confidence intervals and test hypotheses?
Acknowledgments
Delong Liu (former Post-doc at NIEHS)
David Umbach (NIEHS)
Leping Li (NIEHS)
Clare Weinberg (NIEHS)
Pat Crocket (Constella Group)
Cristina Rueda (Univ. of Valladolid, Spain)
Miguel Fernandez (Univ. of Valladolid, Spain)