Exploring Protein Sequences

Tutorial 5

Exploring Protein Sequences

• Multiple alignment

– ClustalW

• Motif discovery

– MEME

– JASPAR

• More than two sequences

– DNA

– Protein

• Evolutionary relation

– Homology

– Phylogenetic tree

– Detect motif

Multiple Sequence Alignment

Aligned sequences:

GTCGTAGTCG-GC-TCGAC
GTC-TAG-CGAGCGT-GAT
GC-GAAG-AG-GCG-AG-C
GCCGTCG-CG-TCGTA-AC

Unaligned input sequences (labeled A to D in the figure):

GTCGTAGTCGGCTCGAC
GTCTAGCGAGCGTGAT
GCGAAGAGGCGAGC
GCCGTCGCGTCGTAAC

• Dynamic programming

– Optimal alignment

– Exponential in the number of sequences

• Progressive

– Efficient

– Heuristic
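To see why exact dynamic programming does not scale, here is a rough sketch. It assumes the standard formulation in which the table has one cell per combination of prefix lengths, so its size is the product of (length + 1) over all sequences. The function name `dp_table_cells` is illustrative only.

```python
from math import prod


def dp_table_cells(lengths):
    """Number of cells in the full dynamic-programming table
    for sequences with the given lengths."""
    return prod(n + 1 for n in lengths)


if __name__ == "__main__":
    # Each extra sequence of length 100 multiplies the table size by 101.
    for k in range(2, 7):
        print(k, "sequences of length 100:", dp_table_cells([100] * k), "cells")
```

Progressive methods avoid this growth by only ever filling two-dimensional tables, at the price of losing the optimality guarantee.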


ClustalW

“CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice”, J D Thompson et al

• Progressive

– At each step, align two existing alignments or sequences

– Gaps present in older alignments remain fixed
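One progressive step can be sketched as follows: a new sequence is aligned against an existing alignment, and the columns of the existing alignment are copied unchanged, so its gaps stay where they are. This is a simplified illustration, not ClustalW's actual scoring: the match, mismatch and gap values and the column-averaging rule are assumptions, and `align_to_alignment` is a hypothetical name.

```python
def align_to_alignment(rows, seq, match=1, mismatch=-1, gap=-2):
    """Globally align `seq` to an existing alignment `rows`
    (a list of equal-length strings); returns the new list of rows."""
    cols = ["".join(r[j] for r in rows) for j in range(len(rows[0]))]

    def col_score(col, ch):
        # Average score of `ch` against every symbol in the column;
        # a gap already present in the column is charged as a gap.
        total = 0
        for c in col:
            total += gap if c == "-" else (match if c == ch else mismatch)
        return total / len(col)

    n, m = len(cols), len(seq)
    S = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        S[i][0] = i * gap
    for j in range(1, m + 1):
        S[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            S[i][j] = max(S[i - 1][j - 1] + col_score(cols[i - 1], seq[j - 1]),
                          S[i - 1][j] + gap,
                          S[i][j - 1] + gap)

    # Traceback: old columns are copied verbatim, never edited.
    out_cols = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and S[i][j] == S[i - 1][j - 1] + col_score(cols[i - 1], seq[j - 1]):
            out_cols.append(cols[i - 1] + seq[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and S[i][j] == S[i - 1][j] + gap:
            out_cols.append(cols[i - 1] + "-")       # gap in the new sequence
            i -= 1
        else:
            out_cols.append("-" * len(rows) + seq[j - 1])  # new gap column in old rows
            j -= 1
    out_cols.reverse()
    return ["".join(col[k] for col in out_cols) for k in range(len(rows) + 1)]


if __name__ == "__main__":
    old = ["GTCGTAGTCG-GC-TCGAC", "GTC-TAG-CGAGCGT-GAT"]
    for row in align_to_alignment(old, "GCGAAGAGGCGAGC"):
        print(row)
```

Deleting the newly inserted all-gap columns from the first two output rows gives back the old alignment exactly, which is the "gaps remain fixed" property.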

ClustalW

Alignment (first 15 columns shown):

GTCGTAGTCG-GC-T
GTC-TAG-CGAGCGT
GC-GAAG-AG-GCG-
GCCGTCG-CG-TCGT

Input sequences:

GTCGTAGTCGGCTCGAC
GTCTAGCGAGCGTGAT
GCGAAGAGGCGAGC
GCCGTCGCGTCGTAAC

ClustalW - Input

Scoring matrix

Gap scoring

Input sequences

ClustalW - Output

Input sequences

Pairwise alignment scores

Building alignment

Final score

ClustalW Output

Sequence names Sequence positions

Match strength in decreasing order: * : .
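The consensus line can be reproduced with a short sketch. It assumes the residue groups commonly documented for Clustal's protein consensus symbols ("*" identical, ":" all residues in one strongly conserved group, "." all in one weakly conserved group); treat the group lists as an assumption to check against the ClustalW documentation. The function name is illustrative.

```python
# Assumed Clustal residue groups for protein alignments.
STRONG = ["STA", "NEQK", "NHQK", "NDEQ", "QHRK", "MILV", "MILF", "HY", "FYW"]
WEAK = ["CSA", "ATV", "SAG", "STNK", "STPA", "SGND",
        "SNDEQK", "NDEQHK", "NEQHRK", "FVLIM", "HFY"]


def conservation_line(rows):
    """Return one symbol per alignment column: '*', ':', '.' or ' '."""
    symbols = []
    for column in zip(*rows):
        residues = set(column)
        if "-" in residues:
            symbols.append(" ")          # columns with gaps get no symbol
        elif len(residues) == 1:
            symbols.append("*")          # fully conserved
        elif any(residues <= set(g) for g in STRONG):
            symbols.append(":")          # strongly similar residues
        elif any(residues <= set(g) for g in WEAK):
            symbols.append(".")          # weakly similar residues
        else:
            symbols.append(" ")
    return "".join(symbols)


if __name__ == "__main__":
    # The six motif sequences used later in this tutorial.
    rows = ["YDEEGGDAEE", "YDEEGGDAEE", "YGEEGADYED",
            "YDEEGADYEE", "YNDEGDDYEE", "YHDEGAADEE"]
    print(conservation_line(rows))  # prints: * :**   *:
```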

http://www.megasoftware.net/

Can we find motifs using multiple sequence alignment?

     1    2    3    4    5    6    7    8    9    10
A    0    0    0    0    0    0.5  1/6  1/3  0    0
D    0    0.5  1/3  0    0    1/6  5/6  1/6  0    1/6
E    0    0    2/3  1    0    0    0    0    1    5/6
G    0    1/6  0    0    1    1/3  0    0    0    0
H    0    1/6  0    0    0    0    0    0    0    0
N    0    1/6  0    0    0    0    0    0    0    0
Y    1    0    0    0    0    0    0    0.5  0    0

  1 3 5 7 9
..YDEEGGDAEE..
..YDEEGGDAEE..
..YGEEGADYED..
..YDEEGADYEE..
..YNDEGDDYEE..
..YHDEGAADEE..
  * :**   *:
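The frequency table can be recomputed from the six aligned sequences with a small sketch. It simply counts residues per column and divides by the number of sequences; `frequency_matrix` is an illustrative name, and exact fractions are used so the values read like the table.

```python
from collections import Counter
from fractions import Fraction

# The six aligned motif instances from the slide.
SITES = ["YDEEGGDAEE", "YDEEGGDAEE", "YGEEGADYED",
         "YDEEGADYEE", "YNDEGDDYEE", "YHDEGAADEE"]


def frequency_matrix(sites):
    """Return one dict per column mapping residue -> frequency."""
    n = len(sites)
    matrix = []
    for column in zip(*sites):
        counts = Counter(column)
        matrix.append({res: Fraction(c, n) for res, c in counts.items()})
    return matrix


if __name__ == "__main__":
    for pos, col in enumerate(frequency_matrix(SITES), start=1):
        cells = ", ".join(f"{res}={freq}" for res, freq in sorted(col.items()))
        print(f"position {pos}: {cells}")
```

Every column sums to 1, which is a quick sanity check on any hand-built profile.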

Motif: a widespread pattern with biological significance

Can we find motifs using multiple sequence alignment?

Yes and no.

MEME – Multiple EM for Motif Elicitation

• http://meme.sdsc.edu/

• Motif discovery from unaligned sequences

– Genomic or protein sequences

• Flexible model of motif presence (a motif can be absent in some sequences or appear several times in one sequence)

MEME - Input

Email address

Multiple input sequences

How many times in each sequence?

How many motifs?

How many sites?

Range of motif lengths

MEME - Output

Motif length

Number of times

Like BLAST

MEME - Output

Probability * 10

‘a’=10, ‘:’=0
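The one-character encoding described on this slide can be sketched as below. The mapping of 10 to 'a' and 0 to ':' comes from the slide; rounding to the nearest tenth is an assumption (MEME may round differently), and `simplified_symbol` is a hypothetical helper name.

```python
def simplified_symbol(p):
    """Encode a probability in [0, 1] as a single character:
    digits 1-9 for probability * 10, 'a' for 10, ':' for 0."""
    level = round(p * 10)   # assumption: nearest tenth
    if level == 0:
        return ":"
    if level == 10:
        return "a"
    return str(level)


if __name__ == "__main__":
    # Frequencies of Y at the ten motif positions in this tutorial's example.
    y_row = [1, 0, 0, 0, 0, 0, 0, 0.5, 0, 0]
    print("Y", "".join(simplified_symbol(p) for p in y_row))
```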

MEME - Output

Low uncertainty = high information content
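The relation between uncertainty and information content can be made concrete with a sketch. It assumes a uniform background over 20 amino acids and no small-sample correction, so the information content of a column is log2(20) minus its Shannon entropy; MEME's own calculation may differ in these details.

```python
from math import log2


def information_content(column, alphabet_size=20):
    """Information content (bits) of one motif column.

    `column` maps residue -> probability and should sum to 1.
    Assumes a uniform background distribution.
    """
    entropy = -sum(p * log2(p) for p in column.values() if p > 0)
    return log2(alphabet_size) - entropy


if __name__ == "__main__":
    conserved = {"Y": 1.0}                      # no uncertainty
    mixed = {"D": 0.5, "G": 1/6, "H": 1/6, "N": 1/6}
    print("fully conserved column:", round(information_content(conserved), 2), "bits")
    print("variable column:       ", round(information_content(mixed), 2), "bits")
```

A fully conserved column has zero entropy and therefore the maximum information content, while a column with every residue equally likely carries none.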

MEME - Output

Multilevel Consensus

Sequence names

Reverse complement (genomic input only)

Position in sequence

Strength of match

Motif within sequence

MEME - Output

Overall strength of motif matches

sequence lengths

Motif instance

MEME - Output

‘-’=Other strand

MAST

• Searches for motifs (one or more) in sequence databases:

– Like BLAST, but with motifs as input

– Similar to iterations of PSI-BLAST

• Profile defines strength of match

– Multiple motif matches per sequence

– Combined E-value for all motifs

• MEME uses MAST to summarize results:

– Each MEME result is accompanied by the MAST result of searching for the discovered motifs in the given sequences.

JASPAR

• Profiles

– Transcription factor binding sites

– Multicellular eukaryotes

– Derived from published collections of experiments

• Open data access

JASPAR

• Profiles

– Modeled as matrices

– Can be converted into PSSMs for scanning genomic sequences

     1    2    3    4    5    6    7    8    9    10
A    0    0    0    0    0    0.5  1/6  1/3  0    0
D    0    0.5  1/3  0    0    1/6  5/6  1/6  0    1/6
E    0    0    2/3  1    0    0    0    0    1    5/6
G    0    1/6  0    0    1    1/3  0    0    0    0
H    0    1/6  0    0    0    0    0    0    0    0
N    0    1/6  0    0    0    0    0    0    0    0
Y    1    0    0    0    0    0    0    0.5  0    0
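Converting a profile into a PSSM and scanning with it can be sketched as follows, using the protein example from this tutorial (for JASPAR's DNA profiles the same construction would use a four-letter alphabet). The pseudocount, the uniform background and all function names are assumptions chosen for illustration, not JASPAR's or MAST's exact procedure.

```python
from math import log2

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"
SITES = ["YDEEGGDAEE", "YDEEGGDAEE", "YGEEGADYED",
         "YDEEGADYEE", "YNDEGDDYEE", "YHDEGAADEE"]


def build_pssm(sites, pseudocount=1.0):
    """Log-odds score per position and residue, against a uniform background."""
    n = len(sites)
    background = 1.0 / len(ALPHABET)
    pssm = []
    for column in zip(*sites):
        scores = {}
        for res in ALPHABET:
            # Pseudocount keeps unseen residues from scoring minus infinity.
            freq = (column.count(res) + pseudocount * background) / (n + pseudocount)
            scores[res] = log2(freq / background)
        pssm.append(scores)
    return pssm


def scan(pssm, sequence):
    """Score every window of the sequence; returns (score, start) pairs.
    The sequence must only contain letters from ALPHABET."""
    width = len(pssm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        hits.append((sum(col[ch] for col, ch in zip(pssm, window)), start))
    return hits


if __name__ == "__main__":
    pssm = build_pssm(SITES)
    score, start = max(scan(pssm, "MKTAYDEEGGDAEELLK"))
    print("best window starts at index", start, "with score", round(score, 2))
```

The highest-scoring window is the one whose residues are all frequent at the corresponding profile positions, which is what a profile search reports as a match.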

Search profile

http://jaspar.cgb.ki.se/
