Motif discovery Tutorial 5


Motif discovery
• MEME – Creates a motif PSSM de novo (unknown motif)
• MAST – Searches for a PSSM in a DB
• TOMTOM – Searches for a PSSM in motif DBs

Agenda

Cool story of the day: How NOT to be a bioinformatician

Motif – definition

Motif: a widespread pattern with biological significance.

Sequence motif

PTB (RNA-binding protein): UCUU

CAP (DNA-binding protein): TGTGAXXXXXXTCACAXT (X = any nucleotide)

Sequence motif – definition

Frequency of each residue at motif positions 1 to 10:

Residue  1    2    3    4    5    6    7    8    9    10
A        0    0    0    0    0    3/6  1/6  2/6  0    0
D        0    3/6  2/6  0    0    1/6  5/6  1/6  0    1/6
E        0    0    4/6  1    0    0    0    0    1    5/6
G        0    1/6  0    0    1    2/6  0    0    0    0
H        0    1/6  0    0    0    0    0    0    0    0
N        0    1/6  0    0    0    0    0    0    0    0
Y        1    0    0    0    0    0    0    3/6  0    0

Aligned sites the matrix is computed from (".." = flanking sequence):
..YDEEGGDAEE..
..YDEEGGDAEE..
..YGEEGADYED..
..YDEEGADYEE..
..YNDEGDDYEE..
..YHDEGAADEE..

Motif: a nucleotide or amino-acid sequence pattern that is widespread and has biological significance.

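PSSM - position-specific scoring matrix: a score for every residue at every motif position, usually the log-odds of the residue's observed frequency at that position against its background frequency.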
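The following is a minimal Python sketch that rebuilds the frequency table from the six aligned sites and converts it into a log-odds PSSM. The uniform background, the pseudocount of 0.5 and the helper names (`frequency_matrix`, `log_odds_pssm`) are illustrative assumptions. They are not the scoring scheme of any particular tool.

```python
import math
from collections import Counter

# The six aligned sites from the frequency-table example.
SITES = [
    "YDEEGGDAEE",
    "YDEEGGDAEE",
    "YGEEGADYED",
    "YDEEGADYEE",
    "YNDEGDDYEE",
    "YHDEGAADEE",
]
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def frequency_matrix(sites):
    """One {residue: fraction of sites} dict per motif position."""
    width = len(sites[0])
    if any(len(s) != width for s in sites):
        raise ValueError("sites must be aligned (same length)")
    n = len(sites)
    return [
        {res: count / n for res, count in Counter(s[i] for s in sites).items()}
        for i in range(width)
    ]


def log_odds_pssm(sites, alphabet, background=None, pseudocount=0.5):
    """Score = log2(smoothed observed frequency / background frequency).

    Assumes a uniform background unless one is given; the pseudocount keeps
    residues never seen at a position from scoring minus infinity.
    """
    if background is None:
        background = {a: 1 / len(alphabet) for a in alphabet}
    total = len(sites) + pseudocount * len(alphabet)
    pssm = []
    for i in range(len(sites[0])):
        counts = Counter(s[i] for s in sites)  # missing residues count as 0
        pssm.append({a: math.log2((counts[a] + pseudocount) / total / background[a])
                     for a in alphabet})
    return pssm


pfm = frequency_matrix(SITES)
pssm = log_odds_pssm(SITES, AMINO_ACIDS)
```

Here `pfm[6]` (position 7) contains only D (5/6) and A (1/6). At every position, `pssm` assigns positive scores to the residues observed there and negative scores to all others.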

Can we find motifs using multiple sequence alignment (MSA)?

In principle yes, in practice no: motifs are short local patterns, and local multiple sequence alignment is a hard problem to solve.

Motif search: from de-novo motifs to motif annotation

[Figure: overview of motif search tools; labels include "gapped motifs" and "Large DNA data"]

http://meme.sdsc.edu/

MEME

MEME – Multiple EM* for Motif Elicitation

• Motif discovery from unaligned sequences - genomic or protein sequences

• Flexible model of motif presence (a motif can be absent from some sequences or appear several times in one sequence)

*Expectation-maximization
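The following toy version illustrates the EM idea using its simplest model: exactly one motif occurrence per sequence (OOPS). The E-step computes, for each sequence, a posterior over motif start positions. The M-step re-estimates the motif's letter frequencies from the posterior-weighted counts.

Several simplifications are assumptions of this sketch:
• A uniform DNA background and a fixed motif width.
• Restarting from every window of the first sequence.
• Hypothetical function names and test sequences.

MEME itself also supports the zero-or-one and any-number models, searches over widths, and uses priors and more careful starting points.

```python
import math

ALPHABET = "ACGT"
BACKGROUND = {a: 0.25 for a in ALPHABET}  # assumption: uniform background


def e_step(seqs, motif):
    """Posterior over motif start positions in each sequence (OOPS model)."""
    width = len(motif)
    posteriors, loglik = [], 0.0
    for s in seqs:  # assumes every sequence is at least `width` long
        ratios = []
        for j in range(len(s) - width + 1):
            r = 1.0  # likelihood ratio, motif vs. background, window at j
            for k in range(width):
                r *= motif[k][s[j + k]] / BACKGROUND[s[j + k]]
            ratios.append(r)
        total = sum(ratios)
        posteriors.append([r / total for r in ratios])
        loglik += math.log(total / len(ratios))  # uniform prior over starts
    return posteriors, loglik


def m_step(seqs, posteriors, width, pseudocount=0.1):
    """Re-estimate letter frequencies from posterior-weighted counts."""
    counts = [{a: pseudocount for a in ALPHABET} for _ in range(width)]
    for s, post in zip(seqs, posteriors):
        for j, p in enumerate(post):
            for k in range(width):
                counts[k][s[j + k]] += p
    return [{a: c[a] / sum(c.values()) for a in ALPHABET} for c in counts]


def find_motif(seqs, width, iters=30):
    """Run EM from every window of the first sequence and keep the best fit."""
    best = None
    first = seqs[0]
    for start in range(len(first) - width + 1):
        # Starting motif biased towards the letters of this window.
        motif = [{a: 0.7 if a == c else 0.1 for a in ALPHABET}
                 for c in first[start:start + width]]
        for _ in range(iters):
            posteriors, _ = e_step(seqs, motif)
            motif = m_step(seqs, posteriors, width)
        posteriors, loglik = e_step(seqs, motif)
        if best is None or loglik > best[0]:
            best = (loglik, motif, posteriors)
    _, motif, posteriors = best
    consensus = "".join(max(col, key=col.get) for col in motif)
    sites = [max(range(len(p)), key=p.__getitem__) for p in posteriors]
    return consensus, sites


# Hypothetical sequences with a planted TATAAT site in each.
SEQS = ["GCGCTATAATGCGC", "TATAATCCGGCCGG", "CCGGCCGGTATAAT", "GGCTATAATCCGCG"]
consensus, sites = find_motif(SEQS, width=6)
print(consensus, sites)
```

On these sequences the sketch recovers TATAAT at start positions 4, 0, 8 and 3.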


MEME - Input

Input file (FASTA file)

How many times in each sequence?

How many motifs?

How many sites?

Range of motif lengths
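When MEME is run locally rather than through the web form, the same questions become command-line options. The sketch below only builds the argument list. The flag names (`-dna`/`-protein`, `-oc`, `-mod`, `-nmotifs`, `-minw`/`-maxw`, `-minsites`/`-maxsites`) are taken from MEME's command-line documentation; verify them against the installed version. The helper name, file names and default values are hypothetical.

```python
def meme_args(fasta_path, out_dir, per_sequence="zoops", nmotifs=3,
              minw=6, maxw=20, min_sites=None, max_sites=None, protein=False):
    """Build a `meme` command line mirroring the web form's questions."""
    if per_sequence not in {"oops", "zoops", "anr"}:
        raise ValueError("per_sequence must be 'oops', 'zoops' or 'anr'")
    if not 0 < minw <= maxw:
        raise ValueError("need 0 < minw <= maxw")
    args = ["meme", fasta_path, "-protein" if protein else "-dna",
            "-oc", out_dir,
            "-mod", per_sequence,                    # how many times in each sequence?
            "-nmotifs", str(nmotifs),                # how many motifs?
            "-minw", str(minw), "-maxw", str(maxw)]  # range of motif lengths
    if min_sites is not None:                        # how many sites?
        args += ["-minsites", str(min_sites)]
    if max_sites is not None:
        args += ["-maxsites", str(max_sites)]
    return args


print(" ".join(meme_args("seqs.fasta", "meme_out")))
```

On a machine with MEME installed, the list can be passed to `subprocess.run(args, check=True)`.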

MEME - Output

Motif e-value

MEME – Sequence logo

A graphical representation of the sequence motif, annotated with:
• Motif length
• Number of appearances
• Motif e-value

MEME – Sequence logo

High information content = high confidence

The relative sizes of the letters indicate their frequencies at that position. The total height of the letters depicts the information content of the position, in bits.

Multilevel Consensus
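The column heights in a logo come from a short calculation. Information content is log2(alphabet size) minus the Shannon entropy of the column, and each letter's height is its frequency times that value. In this minimal sketch, a protein alphabet of 20 is assumed (use 4 for DNA, giving a maximum of 2 bits). No small-sample correction is applied, although logo programs typically add one. The two example columns are positions 1 and 2 of the frequency table on the definition slide.

```python
import math


def information_content(column, alphabet_size=20):
    """Bits at one position: log2(alphabet_size) - Shannon entropy (no correction)."""
    entropy = -sum(p * math.log2(p) for p in column.values() if p > 0)
    return math.log2(alphabet_size) - entropy


def letter_heights(column, alphabet_size=20):
    """Height of each letter in the logo stack: frequency x information content."""
    ic = information_content(column, alphabet_size)
    return {res: p * ic for res, p in column.items()}


pos1 = {"Y": 1.0}                                    # fully conserved
pos2 = {"D": 3 / 6, "G": 1 / 6, "H": 1 / 6, "N": 1 / 6}  # mixed column
print(round(information_content(pos1), 2))  # → 4.32
print(round(information_content(pos2), 2))  # → 2.53
```

The conserved Y column reaches the maximum of about 4.32 bits, while the mixed column is shorter and its height is shared among four letters.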

MEME – Sequence logo

Patterns can be presented as regular expressions

[AG]-x-V-x(2)-{YW}

[] - Either residue
x - Any residue
x(2) - Any residue in the next 2 positions
{} - Any residue except these

Examples: AYVACM, GGVGAA
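The pattern notation above (the style PROSITE uses) maps directly onto ordinary regular expressions. This minimal Python sketch handles only the elements listed above. It does not handle ranges such as x(2,4) or the < and > anchors. The function name is hypothetical.

```python
import re


def pattern_to_regex(pattern):
    """Translate '-'-separated pattern elements into a Python regex string."""
    parts = []
    for element in pattern.split("-"):
        # Split an optional repeat count "(n)" off the end of the element.
        m = re.fullmatch(r"(.+?)(?:\((\d+)\))?", element)
        core, repeat = m.group(1), m.group(2)
        if core == "x":
            regex = "."                          # any residue
        elif core.startswith("{") and core.endswith("}"):
            regex = "[^" + core[1:-1] + "]"      # any residue except these
        else:
            regex = core                         # single residue or [..] set
        if repeat:
            regex += "{" + repeat + "}"
        parts.append(regex)
    return "".join(parts)


rx = pattern_to_regex("[AG]-x-V-x(2)-{YW}")
print(rx)  # → [AG].V.{2}[^YW]
for candidate in ("AYVACM", "GGVGAA", "AYVACW"):
    print(candidate, bool(re.fullmatch(rx, candidate)))
```

Use `re.search` instead of `re.fullmatch` to find the pattern inside a longer sequence.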

MEME – motif alignment

Sequence names

Position in sequence

Strength of match

Motif within sequence

MEME – motif locations

Sequence names

Overall strength of motif matches

Motif location in the input sequence

What can we do with motifs?

• MAST - Search for them in non-annotated sequence databases (protein and DNA).

• TOMTOM - Compare DNA motifs with databases of known motifs, e.g. to find the protein that binds them.

MAST

• Searches for motifs (one or more) in sequence databases:
  – Like BLAST, but with motifs as input
  – Similar to iterations of PSI-BLAST

• Profile defines strength of match
  – Multiple motif matches per sequence

• MEME uses MAST to summarize results:
  – Each MEME result is accompanied by the MAST result for searching the discovered motifs in the given sequences.

http://meme.sdsc.edu/meme4_4_0/cgi-bin/mast.cgi
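The core of a MAST-style search is sliding a PSSM along each database sequence and keeping the best-scoring window. In this simplified Python sketch, sequences are ranked by that raw score. MAST additionally converts scores to p-values and combines them into per-sequence E-values; this sketch does not. It also scans one strand only and assumes a uniform background. The sites, sequences and helper names are hypothetical.

```python
import math


def build_pssm(sites, alphabet="ACGT", pseudocount=0.5):
    """Log-odds PSSM from aligned sites, assuming a uniform background."""
    background = 1 / len(alphabet)
    total = len(sites) + pseudocount * len(alphabet)
    return [{a: math.log2((column.count(a) + pseudocount) / total / background)
             for a in alphabet}
            for column in zip(*sites)]  # one tuple of letters per position


def best_hit(sequence, pssm):
    """Best-scoring window of `sequence` as (start, score)."""
    width = len(pssm)
    windows = range(len(sequence) - width + 1)
    return max(((j, sum(pssm[k][sequence[j + k]] for k in range(width)))
                for j in windows), key=lambda hit: hit[1])


def search(database, pssm):
    """Rank database entries (name -> sequence) by their best motif score."""
    hits = [(name, *best_hit(seq, pssm)) for name, seq in database.items()]
    return sorted(hits, key=lambda hit: hit[2], reverse=True)


pssm = build_pssm(["TATAAT", "TATAAT", "TACAAT", "TATATT"])
database = {  # hypothetical sequences standing in for a real database
    "seq1": "GGCGTATAATGCC",
    "seq2": "GGGCCCGGGCCC",
    "seq3": "ATTACAATCGC",
}
for name, start, score in search(database, pssm):
    print(f"{name}\tstart={start}\tscore={score:.2f}")
```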

MAST - Input

Input file (motifs)

Database

If you wish to use motifs discovered by MEME

MAST - Output

Input motifs

Presence of the motifs in a given database

MAST – Output (another example, global view)


TOMTOM

• Searches one or more query DNA motifs against one or more databases of target motifs, and reports for each query a list of target motifs, ranked by p-value.

• The output contains results for each query, in the order that the queries appear in the input file.

http://meme.sdsc.edu/meme/doc/tomtom.html
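TOMTOM compares motifs with motifs rather than scanning sequences. In this simplified Python sketch, the query is slid along each target, and each overlap is scored by the mean Pearson correlation of the aligned columns. Pearson correlation is one of the column-comparison functions TOMTOM offers. Targets are then ranked by their best score. Unlike TOMTOM, the sketch ignores reverse complements and background frequencies and reports no p-values. The motif names, the `from_consensus` helper and `min_overlap` are hypothetical.

```python
import math

ALPHABET = "ACGT"


def from_consensus(seq, p=0.85):
    """Hypothetical helper: a motif that strongly prefers each consensus letter."""
    other = (1 - p) / 3
    return [{a: p if a == c else other for a in ALPHABET} for c in seq]


def pearson(col_a, col_b):
    """Pearson correlation between two motif columns (dicts over ACGT)."""
    xs = [col_a[a] for a in ALPHABET]
    ys = [col_b[a] for a in ALPHABET]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0  # flat column: no signal


def similarity(query, target, min_overlap=4):
    """Best mean column correlation over all offsets (forward strand only)."""
    best = -math.inf
    for offset in range(-(len(target) - 1), len(query)):
        # Query column i is aligned with target column i - offset.
        pairs = [(query[i], target[i - offset]) for i in range(len(query))
                 if 0 <= i - offset < len(target)]
        if len(pairs) >= min_overlap:
            best = max(best, sum(pearson(a, b) for a, b in pairs) / len(pairs))
    return best


def rank_targets(query, database, min_overlap=4):
    scores = {name: similarity(query, motif, min_overlap)
              for name, motif in database.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


query = from_consensus("TGTGA")
database = {"motif_A": from_consensus("GGTGTGAC"),
            "motif_B": from_consensus("CCCGGG")}
for name, score in rank_targets(query, database):
    print(name, round(score, 3))
```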

TOMTOM - Input

Input motif

Background frequencies

Database

TOMTOM - Output

Input motif

Matching motifs

TOMTOM – Output

Wrong input (RNA sequence of the RNA-binding protein NOVA1)

“OK” results

MAST vs. TOMTOM

               MAST                  TOMTOM
Comparison     Profile against DB    Profile against profile
DB             General DBs           Known motif DBs

Cool Story of the day

How NOT to be a bioinformatician