
Multiple Alignments
Motifs/Profiles

• What is multiple alignment?
• HOW does one do this?
• WHY does one do this?
• What do we mean by a motif or profile?

BIO520 Bioinformatics Jim Lund

Prev. reading: Ch 1-5
Assigned reading: Ch 6.4, 6.5, 6.6

Information from Alignments

• Infer biological function
  – Conserved elements critical for function
  – Divergent elements relate to divergent function
• Infer structure (2°, 3°)
• Infer phylogeny
  – History
  – Evolutionary forces (selection…)

How do I find similar sequences?

Multiple Alignment

•Global, Optimal

•Theory

•Computation

•Progressive Alignment

Multiple Alignment: better alignments

Alignment Methods/Programs

• GAP (GCG suite)
  – Optimal alignment
• MSA
  – (Nearly) optimal alignment
• Clustal W/X
  – Progressive alignment
• PSI-BLAST
  – Searches for matching sequences iteratively
  – Search sequence is the invariant master for the alignment.

MSA Strategy

c(A) = Σ c(A_{i,j}), summed over all sequence pairs i < j. Minimize the score!

• HUGE matrix (aa^(# of seqs)) can CRASH the computer
  – time ~ product of the sequence lengths
  – 1000 x 10,000 OK, but 200 x 200 x 200 x 200 NOT
• Alignment procedure
  – nearly optimal (only considers a subset of all alignments)
  – weight sequences via distance
  – branch-and-bound algorithm
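The size argument can be checked with a few lines of Python. This is a minimal sketch, not part of the MSA program: the function names `lattice_cells` and `predecessors_per_cell` are made up for illustration, and the cell count uses the slide's approximation (product of the sequence lengths).

```python
from math import prod


def lattice_cells(lengths):
    """Approximate cell count of the full dynamic-programming lattice.

    Uses the slide's approximation: the product of the sequence lengths.
    """
    return prod(lengths)


def predecessors_per_cell(n_sequences):
    """Number of neighbouring cells each cell can be reached from.

    In every step each sequence contributes either a residue or a gap,
    and the all-gap choice is excluded, giving 2**k - 1 moves.
    """
    return 2 ** n_sequences - 1


# The two cases quoted on the slide.
for lengths in ([1000, 10_000], [200, 200, 200, 200]):
    print(lengths, lattice_cells(lengths), predecessors_per_cell(len(lengths)))
```

The pairwise case needs about 10 million cells with 3 moves each. The four-sequence case needs about 1.6 billion cells with 15 moves each, even though every sequence is short.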

Running MSA

• Download and run it locally (UNIX):
  – http://www.ncbi.nlm.nih.gov/CBBresearch/Schaffer/genetic_analysis.html
• On the internet:
  – http://searchlauncher.bcm.tmc.edu/multi-align/multi-align.html
• Rerun on segments AFTER Clustal...

Clustal Strategy

1. Rapid pairwise alignments, each-to-each
2. Calculate distance matrix
   – Create guide tree (neighbor joining)
3. Align
   – Closest pairs first
   – Add pairs or align sub-alignments
   – Adjust similarity matrix as alignment proceeds
4. Add sequences
   – Introduce gaps
     • Gaps at loops, not inside known 2° structures
     • Dynamic gap weighting

Clustal Strategy

Pairwise alignments → Guide tree → Align

Clustal W(X) Strategy

1. Pairwise alignments

The pairwise alignment number here is a dissimilarity measure.
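One common dissimilarity measure is the fraction of differing residues over the aligned, non-gap columns. The sketch below assumes that definition. The function name `p_distance` is hypothetical, and the exact formula Clustal uses may differ.

```python
def p_distance(row_a, row_b, gap="-"):
    """Fraction of differing residues between two aligned rows.

    Columns where either row has a gap are skipped, so the value is
    1 - (fraction identity) over the columns that can be compared.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    compared = 0
    differing = 0
    for a, b in zip(row_a, row_b):
        if a == gap or b == gap:
            continue  # gap columns carry no identity information here
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return differing / compared


# Toy rows (made up): one mismatch in four compared columns.
print(p_distance("ACGT", "ACGA"))  # 0.25
```

A value of 0 means the rows are identical wherever both have a residue. Larger values mean more divergent pairs, which are aligned later.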

Clustal W(X) Strategy

2. Unrooted neighbor tree (dendrogram)

Clustal W(X) Strategy

3. Guide tree

Clustal W(X) Strategy

4. Progressive alignment using guide tree
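How a distance matrix turns into an alignment order can be sketched with a small clustering routine. Note the swap: Clustal W builds its guide tree by neighbor joining, whereas this sketch uses the simpler average-linkage (UPGMA) method to keep the code short. The function name `upgma_guide_tree` and the distances are made up for illustration.

```python
def upgma_guide_tree(names, dist):
    """Build a nested-tuple guide tree by average-linkage clustering.

    dist maps frozenset({a, b}) to the distance between leaves a and b.
    The closest clusters are merged first, which is also the order in
    which a progressive aligner would align them.
    """
    # Each cluster is (subtree, tuple_of_leaf_names).
    clusters = [(name, (name,)) for name in names]

    def average_distance(c1, c2):
        pairs = [(a, b) for a in c1[1] for b in c2[1]]
        return sum(dist[frozenset(p)] for p in pairs) / len(pairs)

    while len(clusters) > 1:
        index_pairs = [
            (i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        ]
        i, j = min(
            index_pairs,
            key=lambda ij: average_distance(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = (
            (clusters[i][0], clusters[j][0]),
            clusters[i][1] + clusters[j][1],
        )
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0][0]


# Hypothetical distances: A/B are close, C/D are close, the rest are far.
names = ["A", "B", "C", "D"]
dist = {frozenset(p): 0.8 for p in [("A", "C"), ("A", "D"), ("B", "C"), ("B", "D")]}
dist[frozenset(("A", "B"))] = 0.1
dist[frozenset(("C", "D"))] = 0.2
print(upgma_guide_tree(names, dist))  # (('A', 'B'), ('C', 'D'))
```

Reading the tree from the leaves inward gives the progressive order: align A with B, align C with D, then align the two sub-alignments.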

Running Clustal W/X

• WWW, Win, Mac, UNIX
  – http://www2.ebi.ac.uk/clustalw/
• Input
  – Multiple sequence file (PIR, FASTA,…)
• Can FORCE alignments
• Specify secondary structures
• Considerations
  – Fast, easy, widely used
  – Divergent proteins OK (trees misleading)

“The Right Proteins”: GAPDH

Rabbit KAENGKLVING-KAITIFQERDPANIKWGDAGAEYVVESTGVFTTMEKAGAHLKGGAKRV 117

Chick KAENGKLVING-HAITIFQERDPSNIKWADAGAEYVVESTGVFTTMEKAGAHLKGGAKRV 117

*********** :**********.:***.*******************************

“The Right Proteins”: GAPDH

Human KAEDGKLVIDG-KAITIFQERDPENIKWGDAGTAYVVESTGVFTTMEKAGAHLKGGAKRI 118

Tobacco KVKDEKTLLFGEKSVRVFGIRNPEEIPWAEAGADFVVESTGVFTDKDKAAAHLKGGAKKV 110

Entamoeba EAGENAIIVNGHKIV-VKAERDPAQIGWGALGVDYVVESTGVFTTIPKAEAHIKGGAKKV 105

:. : :: * : : :*:* :* *. *. :********* ** **:*****::

Alignment Interpretation

• DNA sequences
  – >50% “worth looking at” (eyeball test)
  – ~75% needed for phylogeny
• Polypeptide sequences
  – 80% similar = SAME tertiary structure
  – 30-80% domains = similar structure
  – 15-30% ????
  – <15% short motifs
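The polypeptide rules of thumb can be written as a small lookup. This is a sketch only: the function name `interpret_protein_similarity` is hypothetical, and the handling of the exact boundary values (80, 30 and 15 go to the higher category) is an assumption, since the slide does not say.

```python
def interpret_protein_similarity(percent):
    """Map a percent similarity between two polypeptides to the slide's rule of thumb.

    Boundary values are assigned to the higher category; that choice is
    an assumption, not something the slide specifies.
    """
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    if percent >= 80:
        return "same tertiary structure"
    if percent >= 30:
        return "domains with similar structure"
    if percent >= 15:
        return "unclear (????)"
    return "short motifs"


for value in (95, 45, 20, 10):
    print(value, interpret_protein_similarity(value))
```

These are guidelines for deciding what to look at next, not tests of homology.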

Uses of Alignment

• Understanding or predicting mutant function

• Finding motifs in DNA or polypeptides

• Directing experiments--e.g. PCR primers

• Phylogeny


Viewing and interpreting alignments

• Color residues by property
  • Conservation in the alignment
  • Known properties
    • Substitution groups: STA, HY
    • Physicochemical property
      • charge
      • hydrophobicity
• Programs for visualization
  • Jalview
  • AMAS
  • Alscript
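Coloring by property amounts to mapping each residue to a class and then assigning one color per class. The sketch below shows only the mapping step. The grouping in `PROPERTY_CLASSES` is one of several schemes in use and is an assumption here, not the scheme of Jalview, AMAS or Alscript. The one-letter class codes are also made up.

```python
# Assumed grouping; viewers differ on borderline residues such as G, P, Y, C.
PROPERTY_CLASSES = {
    "a": "DE",        # acidic (negatively charged)
    "b": "KRH",       # basic (positively charged)
    "h": "AVLIMFWC",  # hydrophobic
    "p": "STNQYGP",   # polar / small
}

# Invert to a residue -> class-code lookup table.
RESIDUE_TO_CLASS = {
    residue: code
    for code, residues in PROPERTY_CLASSES.items()
    for residue in residues
}


def property_string(aligned_row, gap="-"):
    """Translate an aligned row into class codes, keeping gaps in place.

    Residues outside the table (e.g. X) are shown as '?'.
    """
    return "".join(
        gap if residue == gap else RESIDUE_TO_CLASS.get(residue.upper(), "?")
        for residue in aligned_row
    )


print(property_string("KAEDG-X"))  # bhaap-?
```

Stacking the class strings of several rows makes it easy to spot columns where the residue changes but the property is conserved.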

Viewing alignments

JalView alignment viewer

How to build multiple alignments

1. Find sequences to align (db search).

2. Choose which regions of each protein to include.

• Sequences should be of similar lengths.

3. Run multiple alignment program.

4. Inspect multiple alignment for problems.

• Regions with many gaps have aligned poorly.

5. Remove disruptive sequences and re-run alignment.

6. Add back remaining sequences avoiding disruption.
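The inspection in step 4 can be partly automated by measuring how gap-rich each column is. This is a minimal sketch: the function names are hypothetical, and the 0.5 threshold is an arbitrary illustrative choice.

```python
def column_gap_fractions(rows, gap="-"):
    """Return, for each alignment column, the fraction of rows with a gap."""
    if len({len(row) for row in rows}) != 1:
        raise ValueError("need at least one row, all of the same length")
    n_rows = len(rows)
    return [
        sum(row[i] == gap for row in rows) / n_rows
        for i in range(len(rows[0]))
    ]


def gappy_columns(rows, threshold=0.5, gap="-"):
    """Indices (0-based) of columns whose gap fraction reaches the threshold."""
    return [
        i
        for i, fraction in enumerate(column_gap_fractions(rows, gap))
        if fraction >= threshold
    ]


# Toy alignment (made up): only the third column is mostly gaps.
rows = ["AC-T", "A--T", "ACGT"]
print(gappy_columns(rows))  # [2]
```

Runs of flagged columns mark regions worth inspecting by eye. The sequences that contribute most of those gaps are candidates for removal in step 5.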

Interpro

• Pfam 7.3 (3865 domains)
• PRINTS 33.0 (1650 fingerprints)
• PROSITE 17.5 (1565 and 252 preliminary profiles)
• ProDom 2001.3 (1346 domains)
• SMART 3.1 (509 domains)
• TIGRFAMs 1.2 (814 domains)
• SWISS-PROT 40.27 (113470 entries)
• TrEMBL 21.12 (685610 entries)

Interpro

A database of protein families, domains and functional sites

• PROSITE, home of regular expressions and profiles;

• Pfam, SMART, TIGRFAMs, PIRSF, and SUPERFAMILY, keepers of hidden Markov models (HMMs);

• PRINTS, provider of fingerprints (groups of aligned, un-weighted motifs).
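A PROSITE-style regular expression can be translated into a Python regex and used to scan a sequence. The sketch below covers the common pattern syntax only (`x`, `[...]`, `{...}`, repeat counts, `<` and `>` anchors outside brackets). The function name `prosite_to_regex` is hypothetical, and the test sequence is made up. The example pattern is the N-glycosylation site pattern, N-{P}-[ST]-{P}.

```python
import re

# One pattern element: x, a residue, [allowed] or {forbidden}, optional (n) or (n,m).
_ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern):
    """Convert a PROSITE-style pattern into a Python regular expression string."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        at_start = element.startswith("<")  # anchored to the N-terminus
        at_end = element.endswith(">")      # anchored to the C-terminus
        match = _ELEMENT.fullmatch(element.strip("<>"))
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = match.groups()
        if core == "x":
            core = "."                       # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"   # any residue except these
        repeat = ""
        if low is not None:
            repeat = "{" + low + ("," + high if high else "") + "}"
        parts.append(("^" if at_start else "") + core + repeat + ("$" if at_end else ""))
    return "".join(parts)


regex = prosite_to_regex("N-{P}-[ST]-{P}.")
print(regex)                                # N[^P][ST][^P]
print(re.search(regex, "AANGSA").start())   # 2
```

A regular expression either matches or it does not. Profiles and HMMs instead score every position, which is why they can detect more divergent family members.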

Interpro

NCBI CDD (Conserved Domain Database)

Domains from:

• Pfam (Protein families)
  – A database of protein families that currently contains > 7973 entries.
• SMART (a Simple Modular Architecture Research Tool)
  – More than 500 domain families found in signalling, extracellular and chromatin-associated proteins are detectable.
  – Domains are extensively annotated with respect to phyletic distributions, functional class, tertiary structures and functionally important residues.
• COGs (Clusters of Orthologous Groups)
  – Proteins or groups of paralogs from at least 3 lineages that correspond to an ancient conserved domain
