Analysis of time-course gene expression data

Shyamal D. Peddada
Biostatistics Branch
National Inst. Environmental Health Sciences (NIH)
Research Triangle Park, NC


Outline of the talk

Some objectives for performing “long series” time-course experiments

A. Single cell-cycle experiment

– A nonlinear regression model
– Phase angle of a cell cycle gene
– Inference
– Open research problems

B. Multiple cell-cycle experiments

– “Coherence” between multiple cell-cycle experiments
– Illustration
– Open research problems

Objectives

Some genes play an important role during the cell division cycle process. They are known as “cell-cycle genes”.

Objectives: Investigate various characteristics of cell-cycle and/or circadian genes such as:

– Amplitude of initial expression
– Period
– Phase angle of expression (angle of maximum expression for a cell cycle gene)


Phases in cell division cycle

A brief description

• G1 phase:

"GAP 1". For many cells, this phase is the major period of cell growth during its lifespan.

• S (“Synthesis”) phase:

DNA replication occurs.

• G2 phase:

“GAP 2”: Cells prepare for M phase. The G2 checkpoint prevents cells from entering mitosis when DNA has been damaged since the last division, providing an opportunity for DNA repair and stopping the proliferation of damaged cells.

• M (“Mitosis”) phase:

Nuclear (chromosomes separate) and cytoplasmic (cytokinesis) division occur. Mitosis is further divided into 4 phases.

Single, long series experiment …

Whitfield et al. (Molecular Biology of the Cell, 2002)

Basic design is as follows:

Experimental units: Human cancer cells (HeLa)

Microarray platform: cDNA chips used with approx 43000 probes (i.e. roughly 29000 genes)

3 different patterns of time points (i.e. 3 different experiments)

One of the goals of these experiments was to identify periodically expressed genes.

Experiment 1: (26 time points)

HeLa cancer cells arrested in the S-phase using double thymidine block.

Sampling times after arrest (hrs):

– 0 1 2 3 4 5 6 7 8 9 10 11 12 14 15 16 18 20 22 24 26 28 32 36 40 44.

Experiment 2: (47 time points)

HeLa cancer cells arrested in the S-phase using double thymidine block.

Sampling times after arrest (hrs):

– every hour between 0 and 46.

Experiment 3: (19 time points)

HeLa cancer cells arrested in the M-phase using thymidine and then nocodazole.

Sampling times after arrest (hrs):

– 0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36.

Whitfield et al. (2002)

Phase marker genes:

Cell Cycle Phase    Genes
----------------    --------------------------
G1/S                CCNE1, CDC6, PCNA, E2F1
S                   RFC4, RRM2
G2                  CDC2, TOP2A, CCNA2, CCNF
G2/M                STK15, CCNB1, PLK, BUB1
M/G1                VEGFC, PTTG1, CDKN3, RAD21

Questions

Can we describe the gene expression of a cell-cycle gene as a function of time?

Can we determine the phase angle for a given cell-cycle gene? i.e. can we quantify the previous table in terms of angles on a circle?

What is the period of expression for a given gene?

Can we test the hypothesis that all cell-cycle genes share the same time period?

Etc.

[Figure: Profile of PCNA based on experiment 2 data]

Some important observations

1. Gene expression has a sinusoidal shape

2. Gene expression for a given gene is an average value of mRNA levels across a large number of cells

3. Duration of cell cycle varies stochastically across cells

4. Initially cells are synchronized but over time they fall out of synchrony

5. Gene expression of a cell-cycle gene is expected to “decrease/decay” over time. This is because of items 2 and 4 listed above!
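
Observation 5 can be illustrated with a small simulation. The sketch below is not from the talk: it assumes that each cell's cycle length is a mean period multiplied by a log-normal factor, and the name `population_average`, the parameter `mean_period` and all default values are hypothetical. It averages a unit cosine over many cells, so the population signal starts at full amplitude and shrinks as the cells drift out of synchrony.

```python
import math
import random


def population_average(t, n_cells=5000, mean_period=15.0, sigma=0.1, seed=1):
    """Average a unit-amplitude cosine over cells with random cycle lengths.

    Assumption (illustrative only): cell i has period
    mean_period * exp(sigma * z_i) with z_i standard normal.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_cells):
        period = mean_period * math.exp(sigma * rng.gauss(0.0, 1.0))
        total += math.cos(2.0 * math.pi * t / period)
    return total / n_cells


if __name__ == "__main__":
    # All cells start in phase, so the average is 1 at t = 0 and
    # its magnitude shrinks at later times.
    for t in (0.0, 15.0, 30.0, 45.0):
        print(t, round(population_average(t), 3))
```

The measured expression is the average over cells (item 2), and the spread of cycle lengths (items 3 and 4) is what makes that average decay.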

Random Periods Model (PNAS, 2004)

$$f(t) = a + bt + \frac{K}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \cos\left(\frac{2\pi t}{T \exp(\sigma z)} + \phi\right) \exp\left(-\frac{z^2}{2}\right) dz$$

• a and b: background drift parameters
• K: the initial amplitude
• T: the average period
• $\sigma$: the attenuation parameter
• $\phi$: the phase angle
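
As a rough numerical sketch, the mean curve of the random periods model, read here as f(t) = a + bt + (K/sqrt(2π)) ∫ cos(2πt/(T exp(σz)) + φ) exp(-z²/2) dz, can be evaluated with a trapezoid rule. The function name `rpm_mean`, the grid size and the truncation of the integral to [-8, 8] are assumptions made for illustration; this is not the authors' implementation.

```python
import math


def rpm_mean(t, a, b, K, T, sigma, phi, n_grid=2001, z_max=8.0):
    """Evaluate the random periods model curve at time t.

    The integral over z is approximated by the trapezoid rule on
    [-z_max, z_max]; the Gaussian weight is negligible outside it.
    """
    h = 2.0 * z_max / (n_grid - 1)
    total = 0.0
    for i in range(n_grid):
        z = -z_max + i * h
        weight = 0.5 if i in (0, n_grid - 1) else 1.0
        period = T * math.exp(sigma * z)
        total += (weight
                  * math.cos(2.0 * math.pi * t / period + phi)
                  * math.exp(-0.5 * z * z))
    integral = total * h
    return a + b * t + K * integral / math.sqrt(2.0 * math.pi)
```

With sigma = 0 the integral reduces to a + bt + K cos(2πt/T + φ); a larger sigma damps the oscillation faster.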

[Figures: Fitted curves for some phase marker genes]

Whitfield et al. (2002)

Phase marker genes:

Phase    Genes                         Phase angles (radians)
-----    --------------------------    ----------------------
G1/S     CCNE1, CDC6, PCNA, E2F1       0.56, 5.96, 5.87, 5.83
S        RFC4, RRM2                    5.47, 5.36
G2       CDC2, TOP2A, CCNA2, CCNF      4.24, 3.74, 3.55, 3.25
G2/M     STK15, CCNB1, PLK, BUB1       3.06, 2.67, 2.61, 2.51
M/G1     VEGFC, PTTG1, CDKN3, RAD21    2.66, 2.40, 2.25, 1.81

A hypothesis of biological interest

Do all cell cycle genes have the same T and the same $\sigma$, while the other 4 parameters are gene specific? i.e.

$$H_0: T_g = T,\ \sigma_g = \sigma \quad \text{for all genes } g$$

An Important Feature

Correlated data

– Temporal correlation within gene

– Gene-to-gene correlations

Test Statistic

Wald statistic for heteroscedastic linear and non-linear models

– Zhang, Peddada and Rogol (2000)
– Shao (1992)
– Wu (1986)

The Null Distribution

Due to the underlying correlation structure

– Asymptotic approximation is not appropriate.

– Use moving-blocks bootstrap technique on the residuals of the nonlinear model.

Kunsch (1989)

Moving-blocks Bootstrap

Step 1: Fit the null model to the data and compute the residuals.

Step 2: Draw a simple random sample (with replacement) from all possible blocks, of a specific size, of consecutive residuals.

Step 3: Add these residuals to the fitted curve under the null hypothesis to obtain the bootstrap data set.

Step 4: Using the bootstrap data fit the model under the alternate hypothesis and compute the Wald statistic.


Step 5: Repeat the above steps a large number of times.

Step 6: The bootstrap p-value is the proportion of the above Wald statistics that exceed the Wald statistic determined from the actual data.
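
The six steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used for the talk: `null_fit` stands for the fitted values under the null model, `residuals` for the Step 1 residuals, and `stat_fn` is a hypothetical placeholder for "refit under the alternative and return the Wald statistic". The block size and number of replicates are arbitrary defaults.

```python
import random


def moving_block_resample(residuals, block_size, rng):
    """Step 2: concatenate randomly chosen blocks of consecutive residuals."""
    n = len(residuals)
    blocks = [residuals[i:i + block_size] for i in range(n - block_size + 1)]
    resampled = []
    while len(resampled) < n:
        resampled.extend(rng.choice(blocks))  # sample blocks with replacement
    return resampled[:n]  # trim to the original series length


def bootstrap_p_value(observed_stat, null_fit, residuals, stat_fn,
                      block_size=4, n_boot=500, seed=0):
    """Steps 3 to 6: proportion of bootstrap statistics above the observed one."""
    rng = random.Random(seed)
    exceed = 0
    for _ in range(n_boot):
        errors = moving_block_resample(residuals, block_size, rng)
        boot_data = [m + e for m, e in zip(null_fit, errors)]  # Step 3
        if stat_fn(boot_data) > observed_stat:                 # Step 4
            exceed += 1
    return exceed / n_boot                                     # Step 6
```

Resampling whole blocks rather than single residuals is what keeps the short-range temporal correlation within each bootstrap series.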

Analysis of experiment 2

The bootstrap p-value for testing

$$H_0: T_g = T,\ \sigma_g = \sigma$$

using Experiment 2 data of Whitfield et al. (2002) is 0.12.

Thus our model is biologically plausible.

Statistical inferences on the phase angle φ

Multiple experiments

Some questions of interest

How to evaluate or combine results from multiple cell division cycle experiments?

– Are the results “consistent” across experiments?

How to evaluate this?

What could be a possible criterion?

Data

$\hat\phi_{g,i}$: RPM estimate of the phase angle of a cell-cycle gene ‘g’ from the $i$-th experiment.

Representation using a circle

Consider 4 cell cycle genes A, B, C, D. The vertical line in the circle denotes the reference line. The angles are measured in a counter-clockwise direction.

Thus the sequential order of expression in this example is A, B, D, C.

[Figure: circle with genes A, B, D, C marked at their phase angles]

“Coherence” in multiple cell-cycle experiments

A group of cell cycle genes is said to be coherent across experiments if the sequential order of their phase angles is preserved across experiments.

[Figure: three circles, labelled Exp 1, Exp 2 and Exp 3, each with genes A, B, C, D marked]
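
For error-free phase angles, the definition of coherence can be checked directly. The sketch below is illustrative only: the helper names `cyclic_order` and `is_coherent` and the example angles are made up, and ties between angles are ignored. It sorts the genes of each experiment counter-clockwise and compares the resulting orders up to rotation.

```python
import math


def cyclic_order(angles):
    """Return the genes in counter-clockwise order.

    `angles` maps gene name -> phase angle in radians. The order is
    rotated to start at the alphabetically first gene, so that two
    experiments differing only by a rotation give the same list.
    """
    ordered = sorted(angles, key=lambda g: angles[g] % (2.0 * math.pi))
    start = ordered.index(min(ordered))
    return ordered[start:] + ordered[:start]


def is_coherent(experiments):
    """True if every experiment shows the same cyclic order of genes."""
    orders = [cyclic_order(exp) for exp in experiments]
    return all(order == orders[0] for order in orders)


if __name__ == "__main__":
    exp1 = {"A": 0.1, "B": 1.0, "D": 2.0, "C": 3.0}
    exp2 = {g: a + 2.5 for g, a in exp1.items()}  # same order, rotated
    print(cyclic_order(exp1), is_coherent([exp1, exp2]))
```

With estimated angles this exact check is too strict, since estimation error alone can swap neighbouring genes.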

Geometric Representation

We shall represent phase angles from multiple cell cycle experiments using concentric circles.

Each circle represents an experiment.

Same gene from a pair of experiments is connected by a line segment.

– A figure with non-intersecting lines indicates perfect coherence.

– If there is no coherence at all then there will be many intersecting lines.

[Figures: two examples titled “Perfectly Coherent”, followed by one example titled “No coherence”]

Estimated Phase Angles

Due to statistical errors in estimation, the estimated phase angles from multiple cell cycle experiments need not preserve the sequential order even though the true phase angles are in a sequential order.

How to evaluate coherence?

Some background on regression for circular data

[Figure: two circles. Experiment A shows the angles $\hat\phi_{1,1}, \hat\phi_{2,1}, \hat\phi_{3,1}$; Experiment B shows the angles $\hat\phi_{1,2}, \hat\phi_{2,2}, \hat\phi_{3,2}$.]

Question: Can we determine a rotation matrix A such that we can rotate the circle representing Experiment A to obtain the circle representing Experiment B?

Angle of rotation for a rigid body

Yes! By solving the following minimization problem:

$$\min_{A \in S} \sum_{g=1}^{n} \left\| \hat\phi_{g,2} - A \hat\phi_{g,1} \right\|^2$$

$$\hat A = \begin{pmatrix} \cos\hat\theta_{u|v} & \sin\hat\theta_{u|v} \\ -\sin\hat\theta_{u|v} & \cos\hat\theta_{u|v} \end{pmatrix}$$
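
For planar rotations this least-squares problem has a closed form. The sketch below assumes each phase angle is treated as a point on the unit circle and that the rotation is counted counter-clockwise from the first experiment to the second; the slide's matrix may use the opposite sign convention, and the function name `rotation_angle` is hypothetical. Minimizing the sum of squared distances is then the same as maximizing the sum of cos(φ₂ − φ₁ − θ), which gives θ as the circular mean of the differences.

```python
import math


def rotation_angle(phi_from, phi_to):
    """Least-squares rotation angle carrying phi_from onto phi_to.

    Both arguments are sequences of angles in radians, one per gene.
    Returns the angle theta in (-pi, pi] that maximizes
    sum(cos(phi_to[g] - phi_from[g] - theta)).
    """
    s = sum(math.sin(b - a) for a, b in zip(phi_from, phi_to))
    c = sum(math.cos(b - a) for a, b in zip(phi_from, phi_to))
    return math.atan2(s, c)


if __name__ == "__main__":
    exp_a = [0.1, 1.0, 2.0]
    exp_b = [x + 0.7 for x in exp_a]  # exact rotation by 0.7 radians
    print(rotation_angle(exp_a, exp_b))
```

Working with sines and cosines of the differences avoids problems when an angle wraps past 2π.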

Determination of Coherence Across “k” Experiments

The Basic Idea

Consider a rigid body rotating in a plane. Suppose the body is perfectly rigid with no deformations.

Let $A_{i \to i+1}$ denote the 2x2 rotation matrices from experiment i to i+1 (k+1 = 1). Then

$$A_{1 \to 2} A_{2 \to 3} A_{3 \to 4} \cdots A_{k-1 \to k} = A_{1 \to k}$$

Alternatively

$$A_{1 \to 2} A_{2 \to 3} A_{3 \to 4} \cdots A_{k-1 \to k} A'_{1 \to k} = I \iff A_{1 \to 2} A_{2 \to 3} A_{3 \to 4} \cdots A_{k-1 \to k} A_{k \to 1} = I$$

Equivalently, if

$$A_{i \to i+1} = \begin{pmatrix} \cos\hat\theta_{i+1|i} & \sin\hat\theta_{i+1|i} \\ -\sin\hat\theta_{i+1|i} & \cos\hat\theta_{i+1|i} \end{pmatrix}$$

then under perfect rigid body motion we should have

$$\cos\left( \sum_{i=1}^{k} \hat\theta_{i+1|i} \right) = 1$$
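
The step from the matrix condition to the cosine condition can be spelled out as follows. This is a sketch that assumes every matrix is an exact planar rotation of the form $A(\theta)$ written below; the shorthand $\theta_i$ for the angle of the $i$-th rotation is introduced here for brevity.

$$A(\theta) = \begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix}, \qquad A(\theta_1) A(\theta_2) = A(\theta_1 + \theta_2)$$

$$\prod_{i=1}^{k} A(\theta_i) = A\left( \sum_{i=1}^{k} \theta_i \right), \qquad A(\theta) = I \iff \cos\theta = 1$$

So the product of the k rotations is the identity exactly when the angles sum to a multiple of $2\pi$.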

Problem!

In the present context we do NOT necessarily have a rigid body!

– Not all experiments are performed with same precision.

– The time axis may not be constant across experiments.

– Number of time points may not be same across experiments.

– Etc.

[Figure: Example: Not a rigid motion but perfectly coherent]

Consequence

Rotation matrix A alone may not be enough to bring two circles to congruence!

An additional “association/scaling” parameter may be needed, as seen in the previous figure!

Circular-Circular regression model for a pair of experiments (Downs and Mardia, 2002)

For $g = 1, 2, \ldots, G$, let $(\hat\phi_{g,1}, \hat\phi_{g,2})$ denote a pair of angular variables.

Suppose $\hat\phi_{g,2} \mid \hat\phi_{g,1}$ is von Mises distributed with mean direction $\mu$ and concentration parameter $\kappa$.

The regression model is given by the link function

$$\tan\left( \frac{\mu - \alpha_{2|1}}{2} \right) = \omega_{2|1} \tan\left( \frac{\hat\phi_{g,1} - \beta_{2|1}}{2} \right),$$

where

$\theta_{2|1} = \alpha_{2|1} - \beta_{2|1}$ = the angle of rotation
$\omega_{2|1}$ = “association” parameter
$0 \le \omega_{2|1} \le 1, \quad -\pi \le \theta_{2|1} \le \pi$
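
A small sketch of the link function, read here as tan((μ − α)/2) = ω tan((φ − β)/2) and solved for the mean direction μ. The function name `regression_mean_direction` is hypothetical, and the code only illustrates how the parameters act; it is not a fitting routine.

```python
import math


def regression_mean_direction(phi, alpha, beta, omega):
    """Mean direction mu implied by the link function.

    Solves tan((mu - alpha) / 2) = omega * tan((phi - beta) / 2)
    for mu and returns it in [0, 2*pi).
    """
    mu = alpha + 2.0 * math.atan(omega * math.tan((phi - beta) / 2.0))
    return mu % (2.0 * math.pi)


if __name__ == "__main__":
    # omega = 1: pure rotation by theta = alpha - beta = 0.7
    print(regression_mean_direction(1.0, alpha=1.0, beta=0.3, omega=1.0))
    # omega = 0: the predictor is ignored and mu equals alpha
    print(regression_mean_direction(2.5, alpha=1.0, beta=0.3, omega=0.0))
```

With ω = 1 the model is a rigid rotation by α − β, and with ω = 0 the response direction does not depend on the predictor at all, which is why ω acts as an association parameter.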

Back to the toy examples

[Figure: toy example]

$$(\hat\omega_{B|A}, \hat\omega_{C|B}, \hat\omega_{A|C}) = (1, 1, 1), \qquad |\hat\theta_{B|A} + \hat\theta_{C|B} + \hat\theta_{A|C}| = 0$$

[Figure: toy example]

$$(\hat\omega_{B|A}, \hat\omega_{C|B}, \hat\omega_{A|C}) = (.64, .34, .20), \qquad |\hat\theta_{B|A} + \hat\theta_{C|B} + \hat\theta_{A|C}| = 0$$

[Figure: toy example]

$$(\hat\omega_{B|A}, \hat\omega_{C|B}, \hat\omega_{A|C}) \approx (0, 0, 0), \qquad |\hat\theta_{B|A} + \hat\theta_{C|B} + \hat\theta_{A|C}| \approx 2.2$$

Determination Of Coherence

Suppose we have K experiments, labeled as 1, 2, 3, …, K. Let $\hat\theta_{i|j}$ denote the angle of rotation for the regression of i on j for a group of g genes.

Compute

$$\left| \sum_{i=1}^{K} \hat\theta_{i+1|i} \right|$$

Note $K + 1 \equiv 1$.

We expect $\left| \sum_{i=1}^{K} \hat\theta_{i+1|i} \right|$ under no coherence to be “stochastically” larger than $\left| \sum_{i=1}^{K} \hat\theta_{i+1|i} \right|$ under coherence.

Comparison of Cumulative Distribution Functions

[Figure: cumulative distribution functions. Blue line: Coherence. Pink line: No Coherence.]

Determination Of Coherence

For a given data set compute

$$c = \left| \sum_{i=1}^{K} \hat\theta_{i+1|i} \right|$$

Generate the bootstrap distribution of $\left| \sum_{i=1}^{K} \hat\theta_{i+1|i} \right|$ under the null hypothesis of no coherence.

Bootstrap P-value For Coherence

Let $\hat\theta^*_{i+1|i}$ denote the angle of rotation using the bootstrap sample. Then the P-value is:

$$P\left( \left| \sum_{i=1}^{K} \hat\theta^*_{i+1|i} \right| \le c \right)$$
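
Once the bootstrap replicates are available, the P-value is a simple proportion. The sketch below assumes the statistic is the absolute value of the plain sum of the rotation angles, and that the replicates have already been generated under the null hypothesis of no coherence (how to generate them is not covered here). The function names and the bootstrap values in the example are made up for illustration.

```python
def coherence_statistic(rotation_angles):
    """c = |theta_{2|1} + theta_{3|2} + ... + theta_{1|K}| in radians."""
    return abs(sum(rotation_angles))


def coherence_p_value(c, bootstrap_statistics):
    """Proportion of bootstrap statistics that are at most c.

    Small values of the statistic support coherence, so the
    P-value counts the lower tail of the null distribution.
    """
    hits = sum(1 for s in bootstrap_statistics if s <= c)
    return hits / len(bootstrap_statistics)


if __name__ == "__main__":
    c = coherence_statistic([0.5, -3.03, 2.59])   # about 0.06
    fake_null = [0.01, 0.5, 1.0, 2.0]             # made-up replicates
    print(c, coherence_p_value(c, fake_null))
```

Note the direction of the inequality: unlike the Wald test earlier in the talk, evidence against the null here is a small statistic.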

Illustration: Whitfield et al. data

There are 3 experiments. The phase angle of each gene was estimated using the model of Liu et al. (2004).

A total of 47 common cell-cycling genes were selected from the three experiments.

Estimates

The estimated values of interest are

$$(\hat\omega_{2|1}, \hat\omega_{3|2}, \hat\omega_{1|3}) = (0.67, 0.70, 0.64),$$

$$(\hat\theta_{2|1}, \hat\theta_{3|2}, \hat\theta_{1|3}) = (0.5, -3.03, 2.59)$$

Note that

$$|\hat\theta_{2|1} + \hat\theta_{3|2} + \hat\theta_{1|3}| = 0.06 \text{ radians}$$

$$P\left( |\hat\theta^*_{2|1} + \hat\theta^*_{3|2} + \hat\theta^*_{1|3}| \le 0.06 \right) \approx 0.029$$

Conclusion

Since the bootstrap P-value < 0.05, we conclude that the three experiments are coherent.

                         Phase (rad)          Res (rad)                  Dispersion (rad)
Accession Gene Symbol    A      B      C      B - B|A  C - C|B  A - A|C  Cir_dist
AA135809  EST            0.882  0.040  3.399    -0.29     0.66    -0.10      0.04
W93120    EST            0.260  0.427  2.580     0.52    -0.58     0.53      0.21
T54121    CCNE1*         1.191  0.559  2.661     0.02    -0.65     1.35      0.33
AA131908  FLJ10540       3.534  2.220  6.186    -0.65     0.65    -0.08      0.25
AA088457  EST            2.613  2.373  5.700     0.66    -0.02    -0.68      0.08
AA464019  E2-EPF         3.478  2.464  5.798    -0.33    -0.02     0.12      0.07
AA430092  BUB1           3.566  2.510  6.132    -0.41     0.26    -0.01      0.11
AA425404  FLJ10156       3.508  2.519  6.241    -0.32     0.36    -0.14      0.12
H73329    C20orf1        3.494  2.594  5.873    -0.22    -0.09     0.08      0.04
AA629262  PLK            3.314  2.613  5.888     0.05    -0.10    -0.11      0.02
AA157499  MAPK13         3.390  2.615  5.784    -0.05    -0.20     0.04      0.02
AA282935  MPHOSPH1       3.826  2.667  6.233    -0.64     0.19     0.18      0.12
AA053556  MKI67          3.600  2.731  5.665    -0.24    -0.44     0.33      0.06
AA279990  TACC3          3.804  2.810  0.275    -0.46     0.37    -0.05      0.13
AA402431  CENPE          3.556  2.892  5.939    -0.01    -0.33     0.10      0.01
R11407    STK15          3.484  2.940  5.869     0.14    -0.44     0.08      0.01
AA598776  CDC20          3.355  2.957  5.854     0.34    -0.47    -0.04      0.00
AA262211  KIAA0008       3.457  2.989  5.918     0.23    -0.44     0.02      0.00
AA421171  NUF2R          3.785  3.000  5.679    -0.24    -0.69     0.50      0.10
AA010065  CKS2           3.341  3.030  5.826     0.43    -0.57    -0.04      0.02
AA292964  CKS2           3.312  3.037  5.980     0.48    -0.42    -0.17      0.01
AA430511  FLJ14642       4.170  3.244  1.653    -0.57     1.35    -0.74      0.70
AA430511  FLJ14642       4.170  3.244  1.474    -0.57     1.17    -0.57      0.55
AA676797  CCNF           4.024  3.249  1.170    -0.35     0.86    -0.46      0.36
AA458994  PMSCL1         0.841  3.387  0.298    -0.15    -0.13     0.12      0.01
AA235662  FLJ14642       3.653  3.396  1.278     0.35     0.85    -0.92      0.51
N63744    FLJ10468       3.864  3.511  0.637     0.15     0.11    -0.23      0.07
AA620485  ANKT           3.709  3.531  0.923     0.40     0.38    -0.59      0.24
AA608568  CCNA2          3.857  3.541  6.133     0.19    -0.70     0.28      0.05
R96941    C20orf129      3.751  3.546  0.667     0.36     0.11    -0.36      0.11
AA504625  KNSL1          4.107  3.551  0.410    -0.17    -0.15     0.17      0.00
AI053446  EST            4.348  3.612  1.256    -0.45     0.65    -0.21      0.21
R22949    EST            4.164  3.631  0.161    -0.17    -0.46     0.39      0.02
AA452513  KNSL5          3.915  3.730  0.192     0.29    -0.50     0.12      0.03
T66935    DKFZp762E1312  4.193  3.884  0.800     0.04    -0.01    -0.01      0.03
AA099033  USP1*          5.000  4.760  2.876    -0.12     1.43    -1.45      0.71
AA485454  EST            4.886  5.086  0.891     0.33    -0.79     0.61      0.24
AA485454  EST            4.275  5.086  0.891     1.12    -0.79     0.00      0.44
AA485454  EST            4.886  5.235  0.891     0.48    -0.90     0.61      0.32
AA485454  EST*           4.275  5.235  0.891     1.27    -0.90     0.00      0.55
AA620553  FEN1           5.897  5.510  3.028    -0.21     1.02    -0.79      0.23
AA425120  CHAF1B         5.697  5.714  1.685     0.16    -0.49     0.76      0.16
N57722    MCM6           0.047  5.817  2.568    -0.23     0.31     0.34      0.00
AA450264  PCNA           0.195  5.858  2.438    -0.29     0.14     0.67      0.02
H51719    ORC1L          5.906  5.917  2.889     0.19     0.54    -0.57      0.14
H59203    CDC6           0.551  5.968  2.723    -0.43     0.33     0.61      0.04
R06900    RAMP           0.243  6.049  2.889    -0.13     0.42     0.06      0.00

Statistical inferences on the phase angle φ

- Some open problems

Estimation subject to inequality constraints

It is reasonable to hypothesize that for a normal cell division cycle, the p phase marker genes must express in an order around the unit circle.

Thus they must satisfy:

$$0 \le \phi_1 \le \phi_2 \le \cdots \le \phi_p \le 2\pi$$

Open problems - data from single experiment

How to estimate the phase angles subject to the simple order restriction?

$$0 \le \phi_1 \le \phi_2 \le \cdots \le \phi_p \le 2\pi$$

More generally, how to estimate the phase angles subject to the isotropic simple order restriction?

$$\phi_1 \le \phi_2 \le \cdots \le \phi_p$$

How to test the above hypothesis? What are the null and alternative hypotheses?

Open problems – data from multiple experiments

How do we estimate the phase angles from multiple experiments under the order restriction on the phase angles of cell cycle genes?

What are the statistical errors associated with such an estimator?

How to construct confidence intervals and test hypotheses?

Acknowledgments

Delong Liu (former Post-doc at NIEHS)
David Umbach (NIEHS)
Leping Li (NIEHS)
Clare Weinberg (NIEHS)
Pat Crocket (Constella Group)
Cristina Rueda (Univ. of Valladolid, Spain)
Miguel Fernandez (Univ. of Valladolid, Spain)