XPRIME: A Novel Motif Searching Method

Presentation prepared for the WNAR conference held at Portland State University in 2009

Rachel L. Poulsen

Department of Statistics, Brigham Young University

June 15, 2009

Introduction

DNA contains the genetic instructions that uniquely define an organism

RNA is created to carry genetic instructions from the DNA to the rest of the cell

The process of DNA “talking” to the rest of the cell is called transcription

Transcription

Figure: Transcription (DNA → RNA)

Position Weight Matrix (PWM) (Hertz et al 1990)

ETS1 TF binding motif

Position:  1      2      3      4      5      6      7      8
A          0.067  0.333  0.0    0.0    1.0    0.533  0.267  0.067
C          0.933  0.600  0.0    0.0    0.0    0.133  0.067  0.400
G          0.000  0.000  1.0    1.0    0.0    0.000  0.667  0.000
T          0.000  0.067  0.0    0.0    0.0    0.333  0.000  0.533
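A minimal sketch of how the ETS1 matrix could be held in code, with one row per nucleotide and one column per motif position. The values are copied from the slide; the names `ETS1_PWM`, `column_sums`, and `consensus` are illustrative and not part of XPRIME.

```python
# ETS1 PWM from the slide: rows are nucleotides, columns are motif positions 1-8.
ETS1_PWM = {
    "A": [0.067, 0.333, 0.0, 0.0, 1.0, 0.533, 0.267, 0.067],
    "C": [0.933, 0.600, 0.0, 0.0, 0.0, 0.133, 0.067, 0.400],
    "G": [0.000, 0.000, 1.0, 1.0, 0.0, 0.000, 0.667, 0.000],
    "T": [0.000, 0.067, 0.0, 0.0, 0.0, 0.333, 0.000, 0.533],
}


def column_sums(pwm):
    """Sum the four nucleotide probabilities at each motif position."""
    width = len(pwm["A"])
    return [sum(pwm[base][j] for base in "ACGT") for j in range(width)]


def consensus(pwm):
    """Return the most probable nucleotide at each motif position."""
    width = len(pwm["A"])
    return "".join(max("ACGT", key=lambda base: pwm[base][j]) for j in range(width))


print(consensus(ETS1_PWM))  # CCGGAAGT
```

Each column sums to 1 up to the rounding used on the slide, which is a quick check that the rows were read in the right order.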

Sequence Logos

Figure: DNA binding motif for the ETS1 TF

De Novo motif searching

Regular expression enumeration

1 Actual count vs. expected count

2 Dictionary-based sequence model (Bussemaker et al. 2000)

PWM updating

1 MEME (Bailey et al 1995)

2 Gibbs Motif Sampler (GMS) (Lawrence et al 1993)

3 BioProspector (Liu et al 2001)

4 AlignACE (Roth et al 1998)
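The "actual count vs. expected count" idea can be sketched as follows. This is an illustration only: it assumes an independent-base background whose frequencies are estimated from the input sequences, which is one common choice and not necessarily what any of the cited tools use. The function name `kmer_enrichment` is hypothetical.

```python
from collections import Counter
from math import prod


def kmer_enrichment(sequences, k):
    """Return {kmer: (observed, expected)} for every k-mer seen in the input.

    The expected count assumes bases are drawn independently with the
    frequencies observed in the input (an assumption of this sketch).
    """
    base_counts = Counter("".join(sequences))
    total = sum(base_counts.values())
    freq = {base: count / total for base, count in base_counts.items()}

    # Count every overlapping window of length k.
    observed = Counter(
        seq[i:i + k] for seq in sequences for i in range(len(seq) - k + 1)
    )
    n_windows = sum(max(len(seq) - k + 1, 0) for seq in sequences)

    return {
        kmer: (count, n_windows * prod(freq[base] for base in kmer))
        for kmer, count in observed.items()
    }
```

Ranking k-mers by the ratio of observed to expected count then gives a crude list of over-represented candidate motifs.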

Known Motif Search

1 GREP

2 Database search with scoring function (Hertz et al 1990)
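A grep-style search for a known motif can be sketched with a regular expression. The pattern `CCGGA[AT]GT` below is a hypothetical example loosely based on the ETS1 consensus and is not taken from the presentation; the lookahead is used so that overlapping matches are also reported.

```python
import re


def grep_motif(sequence, pattern):
    """Return 0-based start positions of all matches, including overlapping ones."""
    # A zero-width lookahead lets the scan restart one base after each hit.
    regex = re.compile(f"(?=({pattern}))")
    return [match.start() for match in regex.finditer(sequence)]


print(grep_motif("TTCCGGAAGTAA", "CCGGA[AT]GT"))  # [2]
```

A regular expression either matches or does not, so it cannot rank near-matches; the scoring-function approach of item 2 addresses that by assigning each candidate site a number.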

XPRIME: An Improved Method

TRANSFAC (Matys et al 2003)

Information pulled from in vitro experiments and literature

Most methods justify results using TRANSFAC

XPRIME incorporates prior information

XPRIME can search for both de novo motifs and known motifs simultaneously

Notation and Data

Indices

w: width of motif

L: length of sequence

m: motif indicator

i: position in sequence

j: position in motif

s: indicates sequence

The data, $z_s$

$z_s = (y_{is}, \Delta_{1i}, \Delta_{2i}, \ldots, \Delta_{(m+1)i})$

$y_i$ represents the position (w-mer)

$\Delta_{mi}$ indicates whether $y_i$ belongs to motif m

$\Delta_{(m+1)i}$ indicates whether $y_i$ belongs to the background motif
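A small sketch of this data layout, assuming that the w-mers are the overlapping windows of the sequence and that each indicator vector is one-hot with the background as its last entry. Both are assumptions made for illustration; the function names are hypothetical.

```python
def wmers(sequence, w):
    """Return the overlapping w-mers y_1, ..., y_(L-w+1) of a sequence of length L."""
    return [sequence[i:i + w] for i in range(len(sequence) - w + 1)]


def initial_data(sequence, w, n_motifs):
    """Pair each w-mer with a one-hot indicator vector of length n_motifs + 1.

    Every w-mer starts assigned to the background (last entry); this starting
    assignment is an arbitrary choice for the sketch.
    """
    background = [0] * n_motifs + [1]
    return [(y, list(background)) for y in wmers(sequence, w)]


print(wmers("ACGTA", 3))  # ['ACG', 'CGT', 'GTA']
```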

The Scoring Function

$$\text{MotifScore} = f(y) = \prod_{j=1}^{w} \sum_{i \in \{A,C,G,T\}} p_{ij} \, I(y_j = i)$$
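Because the indicator selects exactly one nucleotide per position, the score reduces to a product of one PWM entry per position. A minimal sketch using the ETS1 matrix from the earlier slide; the function names and the example sequence are illustrative.

```python
# ETS1 PWM from the earlier slide: rows are nucleotides, columns are positions 1-8.
ETS1_PWM = {
    "A": [0.067, 0.333, 0.0, 0.0, 1.0, 0.533, 0.267, 0.067],
    "C": [0.933, 0.600, 0.0, 0.0, 0.0, 0.133, 0.067, 0.400],
    "G": [0.000, 0.000, 1.0, 1.0, 0.0, 0.000, 0.667, 0.000],
    "T": [0.000, 0.067, 0.0, 0.0, 0.0, 0.333, 0.000, 0.533],
}


def motif_score(wmer, pwm):
    """f(y): product over positions j of the PWM entry for the observed base y_j."""
    score = 1.0
    for j, base in enumerate(wmer):
        score *= pwm[base][j]
    return score


def scan(sequence, pwm):
    """Score every overlapping window of the motif width along a sequence."""
    w = len(pwm["A"])
    return [motif_score(sequence[i:i + w], pwm) for i in range(len(sequence) - w + 1)]


scores = scan("TTCCGGAAGTAA", ETS1_PWM)
best = max(range(len(scores)), key=scores.__getitem__)
print(best)  # 2, the window CCGGAAGT
```

A single zero entry in the PWM forces the whole product to zero, which is why every other window in this example scores exactly 0.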

Methods: Complete Data Likelihood

(m+1)-component mixture model

$$L(\theta \mid z) = \prod_{i=1}^{L_s} C(y_i) \, [r_1 f_1(y_i)]^{\Delta_{1i}} \, [r_2 f_2(y_i)]^{\Delta_{2i}} \cdots [r_{m+1} f_{m+1}(y_i)]^{\Delta_{(m+1)i}}$$

f(y) is the Motif Score equation
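On the log scale the likelihood becomes a sum in which each w-mer contributes only the term of the component it is assigned to. The sketch below makes one loud assumption: the slide does not define $C(y_i)$, so it is treated as not depending on θ and dropped, meaning the function returns the log-likelihood only up to an additive constant. PWMs are stored as a list of per-position dictionaries, which is a layout chosen for this example.

```python
import math


def motif_score(wmer, pwm):
    """f(y): product over positions of the PWM entry for the observed base."""
    score = 1.0
    for j, base in enumerate(wmer):
        score *= pwm[j][base]
    return score


def complete_data_loglik(wmers, indicators, weights, pwms):
    """Sum of Delta_ci * (log r_c + log f_c(y_i)) over w-mers i and components c.

    wmers      : list of w-mers y_i
    indicators : one-hot lists, one per w-mer, background last
    weights    : mixture weights r_1, ..., r_(m+1)
    pwms       : one PWM per component, background last
    C(y_i) is omitted (assumed constant in theta).
    """
    total = 0.0
    for y, delta in zip(wmers, indicators):
        for d, r, pwm in zip(delta, weights, pwms):
            if d:  # only the assigned component contributes
                total += math.log(r) + math.log(motif_score(y, pwm))
    return total
```

If a w-mer is assigned to a component that gives it score zero, `math.log` raises `ValueError`; a practical implementation would need to guard against that case.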

Methods: Priors

$f_{m+1}(y)$ is fixed a priori

The $\Delta_{(m+1)i}$'s are missing a priori

$f_1(y), \ldots, f_m(y)$ have product Dirichlet priors such that

$$\pi(f_m(y)) \propto \prod_{j=1}^{L} \prod_{k \in \{A,C,G,T\}} p_{mjk}^{\,a_{p_{mij}} - 1}$$

r also has a Dirichlet prior

$$\pi(r) \propto \prod_{i=1}^{M} r_i^{\,a_{r_i} - 1}$$
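A sketch of drawing from these priors using only the Python standard library. A Dirichlet draw is obtained by normalizing independent Gamma draws, which is a standard construction. The rule in `pseudocounts_from_pwm` for turning a known PWM into hyperparameters is purely hypothetical, as are its `strength` and `floor` parameters; the presentation does not say how XPRIME sets them.

```python
import random

BASES = "ACGT"


def sample_dirichlet(alphas, rng):
    """Draw from Dirichlet(alphas) by normalizing independent Gamma(alpha, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def pseudocounts_from_pwm(columns, strength=10.0, floor=0.5):
    """Hypothetical rule: a_jk = strength * p_jk + floor, so every a_jk is positive."""
    return [{b: strength * col[b] + floor for b in BASES} for col in columns]


def sample_pwm(pseudocounts, rng):
    """Product Dirichlet: draw each motif position independently."""
    return [
        dict(zip(BASES, sample_dirichlet([col[b] for b in BASES], rng)))
        for col in pseudocounts
    ]


rng = random.Random(1)
known = [{"A": 0.0, "C": 1.0, "G": 0.0, "T": 0.0},
         {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}]
prior = pseudocounts_from_pwm(known)
pwm_draw = sample_pwm(prior, rng)                # one PWM drawn from the prior
r_draw = sample_dirichlet([1.0, 1.0, 1.0], rng)  # mixture weights for m + 1 = 3
```

Larger pseudocounts concentrate the prior around the known motif, while equal pseudocounts leave a position uninformative, as would suit a de novo motif.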

Methods: Gibbs Algorithm

1 Draws Δ's from a multinomial distribution

$$p_\Delta \propto r_M \, f_M(y)$$

2 Draws r from a Dirichlet distribution

$$\alpha_r = \sum_{i=1}^{L} \Delta_{Mi} + a_M$$

3 Draws $p_{mij}$ from a Dirichlet distribution

$$\alpha_{p_{mij}} = \sum_{i=1}^{L} \sum_{k \in \{A,C,G,T\}} \Delta_{mi} \, I(y_{ij} = k) + a_{p_{mij}}$$
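The three steps can be sketched as one loop. This is an illustrative sampler written from the slide, not the XPRIME implementation: it treats the w-mers as independent, keeps the background PWM fixed, returns only the final draw, and stores PWMs as a list of per-position dictionaries. All function and parameter names are assumptions.

```python
import random

BASES = "ACGT"


def score(wmer, pwm):
    """Motif score: product over positions of the PWM entry for the observed base."""
    s = 1.0
    for j, base in enumerate(wmer):
        s *= pwm[j][base]
    return s


def dirichlet(alphas, rng):
    """Dirichlet draw via normalized Gamma(alpha, 1) variates."""
    g = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(g)
    return [x / total for x in g]


def draw_pwm(alpha_columns, rng):
    """Draw every PWM column from its own Dirichlet distribution."""
    return [dict(zip(BASES, dirichlet([col[b] for b in BASES], rng)))
            for col in alpha_columns]


def gibbs(wmers, prior_counts, background, a_r, n_iter, seed=0):
    """prior_counts: one list of {base: pseudocount} columns per motif.
    background  : fixed PWM for component m+1.
    a_r         : Dirichlet hyperparameters for r, length m+1.
    """
    rng = random.Random(seed)
    m, w = len(prior_counts), len(background)
    r = dirichlet(a_r, rng)                               # start from the priors
    pwms = [draw_pwm(pc, rng) for pc in prior_counts]
    labels = []
    for _ in range(n_iter):
        comps = pwms + [background]
        # Step 1: draw each indicator with probability proportional to r_c * f_c(y_i).
        labels = [
            rng.choices(range(m + 1),
                        weights=[r[c] * score(y, comps[c]) for c in range(m + 1)])[0]
            for y in wmers
        ]
        # Step 2: draw r from Dirichlet(component counts + a_r).
        r = dirichlet([labels.count(c) + a_r[c] for c in range(m + 1)], rng)
        # Step 3: draw each motif column from Dirichlet(base counts + pseudocounts).
        for c in range(m):
            members = [y for y, lab in zip(wmers, labels) if lab == c]
            pwms[c] = draw_pwm(
                [{b: sum(y[j] == b for y in members) + prior_counts[c][j][b]
                  for b in BASES} for j in range(w)],
                rng,
            )
    return r, pwms, labels
```

A call such as `gibbs(wmers, prior_counts, background, a_r=[1.0, 1.0], n_iter=200)` returns the last sampled weights, motif PWMs, and assignments; in practice the draws would be stored across iterations and summarized after a burn-in period.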

An Example: ETS1

We hypothesize that ETS1 has a specific binding site

The Data

1 ETS1 only

2 GABP only

3 ETS1 and GABP

ETS1 Binding Motifs

(a) ETS1 from TRANSFAC

(b) ETS1 from ETS1 only

(c) ETS1 from GABP only

(d) ETS1 from ETS1/GABP

Justification of Prior Information

Pete Hollenhorst sequence logo

Figure: Motif found without prior specification

Figure: Motif found with prior specification

Conclusions and Future Research

XPRIME successfully searches for de novo and known motifs

Evidence found suggesting ETS1 has its own binding motif

Hidden Markov Models and the forward-backward algorithm

Prior information on r
