1 Orthology and paralogy A practical approach Searching the primaries Searching the secondaries Significance of database matches DB Web addresses Software Web addresses


1

Orthology and paralogy

A practical approach

Searching the primaries

Searching the secondaries

Significance of database matches

DB Web addresses

Software Web addresses

2

Why Search Databases?

• To find out if a new DNA sequence is already deposited in the databanks.

• To find proteins homologous to a putative coding ORF.

3

Why Search Databases?

• To find similar non-coding DNA stretches in the database (for example, repeat elements or regulatory sequences).

• To locate false priming sites for a set of PCR oligonucleotides.
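As a rough illustration of the last point, the sketch below scans a template sequence for places where a PCR oligonucleotide could anneal, allowing a small number of mismatches. The function names, the mismatch threshold and the example sequences are assumptions made for illustration; a real search against a databank would use a dedicated search program rather than this brute-force scan.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def find_priming_sites(template, primer, max_mismatches=1):
    """List (position, strand, mismatches) for every candidate priming site.

    "+" means the primer sequence itself resembles the template as written;
    "-" means the reverse complement of the primer resembles it.
    """
    template = template.upper()
    primer = primer.upper()
    hits = []
    for strand, query in (("+", primer), ("-", reverse_complement(primer))):
        for start in range(len(template) - len(query) + 1):
            window = template[start:start + len(query)]
            mismatches = sum(1 for a, b in zip(window, query) if a != b)
            if mismatches <= max_mismatches:
                hits.append((start, strand, mismatches))
    return hits


# Hypothetical example: one intended site, found on the template as written.
print(find_priming_sites("TTTGATTACATTT", "GATTACA", max_mismatches=0))
```

Any hit other than the intended target is a candidate false priming site and deserves a closer look.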

4

What Databases Are Available?

• DNA (nucleotide sequences):

The big databases: GenBank, EMBL, DDBJ and their weekly updates. These databases exchange information routinely.

• Genomic databases such as Human (GDB), Mouse (MGD), Yeast (SGD), etc.

• Special databases: ESTs (expressed sequence tags), STSs (sequence-tagged sites), EPD (eukaryotic promoter database), REPBASE (repetitive sequence database) and many others.

5

What Databases Are Available?

• Protein (amino acid sequences):

The big databases are: Swiss-Prot (high level of annotation) and PIR (protein identification resource).

• Translated databases like: SPTREMBL (translated EMBL) and GenPept (translation of coding regions in GenBank).

• Special databases like: PDB (sequences derived from the 3D structures in the Brookhaven PDB).
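Translated databases are built by translating annotated coding regions into amino acid sequences. The sketch below shows the basic operation, assuming the standard genetic code and a single forward reading frame; the names `CODON_TABLE` and `translate` are illustrative and are not taken from any of the databases above.

```python
# Standard genetic code, laid out with the bases in the order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna, frame=0):
    """Translate DNA in one forward frame; '*' marks a stop codon.

    Codons containing anything other than A, C, G, T become 'X'.
    """
    dna = dna.upper()
    protein = []
    for start in range(frame, len(dna) - 2, 3):
        codon = dna[start:start + 3]
        protein.append(CODON_TABLE.get(codon, "X"))
    return "".join(protein)


print(translate("ATGGCCTAA"))  # Met, Ala, stop
```

Searching a putative ORF against a protein database amounts to translating it like this first and then comparing amino acid sequences.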

6

Web Addresses

• http://www.ncbi.nlm.nih.gov/Entrez/

– http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?CMD=search&DB=nucleotide

– http://www.ncbi.nlm.nih.gov/Genbank/GenbankOverview.html

– http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Protein

7

Let us go

http://www.ncbi.nlm.nih.gov/Entrez/

8

What is GenBank?

• http://www.ncbi.nlm.nih.gov/Genbank/GenbankOverview.html

• GenBank® is the NIH genetic sequence database, an annotated collection of all publicly available DNA sequences …

10

NCBI databases

• http://www.ncbi.nlm.nih.gov/Database/index.html

http://www.ncbi.nlm.nih.gov/Database/tut1.html

Let us try a tutorial

11

Web Addresses

• http://www.ebi.ac.uk/Databases/

– http://www.ebi.ac.uk/embl/index.html

– http://www.ebi.ac.uk/swissprot/index.html

– http://www.ebi.ac.uk/microarray/ArrayExpress/arrayexpress.html

12

Homology and Analogy

It is important to understand a concept that underpins sequence analysis - homology.

The term homology is confounded and abused in the literature.

Simply, sequences are said to be homologous if they are related by divergence from a common ancestor.

13

What Is Homology? (from the Technion course)

• Similarity or likeness between properties in species.

• Before Darwin, homology was defined morphologically:

• Example:

14

Homology

Bats and butterflies fly, but are different.

Bats fly and whales swim, yet the bones in a bat's wing and a whale's flipper are strikingly alike.

Bat wings and butterfly wings are not homologous.

Bat wings and whale flippers are homologous.

15

Homology Interpretation from Darwin to 21st Century

• Darwin (1859) explained homology as the result of descent with modification from a common ancestor.

• Modern genetics: Homology information is in the genes.

• Two sequences are homologous if they are both similar and have a common ancestor.

16

When Does Similarity Imply Homology?

• Similarity by itself is not enough: for example, similarity between short sequences could be random (the sequences may derive from different ancestors).

• Large enough similarities typically imply homology (and usually we do not have direct evidence of descent).

• Sequence similarity comes with a significance measure.
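To see why short matches can be random, here is a back-of-the-envelope sketch of how many exact matches to a query of a given length would be expected by chance. It assumes a random database with independent, equally frequent bases, which real sequences are not, and it is not the statistic reported by any particular search program; the function name and the database size are illustrative.

```python
def expected_chance_matches(query_length, database_length, alphabet_size=4):
    """Expected number of exact chance matches to one fixed query.

    Assumes independent, uniformly distributed letters (a simplification).
    """
    positions = max(database_length - query_length + 1, 0)
    return positions * (1.0 / alphabet_size) ** query_length


# Hypothetical database of one billion bases.
for length in (10, 20, 30):
    print(length, expected_chance_matches(length, 10**9))
```

Under these assumptions a 10-base match is expected hundreds of times by chance alone, whereas a 30-base exact match is very unlikely to be random.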

17

Homology and Analogy

Understanding homology allows us to appreciate the concept of analogy; this is encountered in protein structures that share similar folds but have no demonstrable sequence similarity; or that share groups of catalytic residues with almost exactly equivalent spatial geometries, but otherwise have neither sequence nor structural similarity. Such relationships are thought to result from convergence to similar biological solutions from different evolutionary starting-points.

18

Homology and Analogy

The essence of sequence analysis is the inference of homology.

Homology is not a measure of similarity, but an absolute statement that sequences have a divergent rather than a convergent relationship.

Thus, phrases that quantify homology are meaningless.
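What can be quantified is similarity. A minimal sketch, assuming two sequences that have already been aligned and use "-" for gaps, computes percent identity over the columns where both sequences have a residue. The function name and the gap convention are assumptions for illustration, and other definitions of percent identity exist.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical residues over columns without a gap in either sequence."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns where either sequence has a gap
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


print(percent_identity("ACGT-A", "ACCTGA"))
```

The result is a statement about similarity ("80% identical"), from which homology may or may not be inferred; it is not "80% homology".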

19

Orthology and Paralogy

Homologous proteins may perform the same function in different species (orthologues) or different but related functions within one organism (paralogues).

Comparison of orthologues allows study of molecular palaeontology, while paralogues have provided deeper insights into the underlying mechanisms of evolution.

20

Orthology and Paralogy

Paralogues arose from single genes via successive duplication events.

The duplicated genes followed separate evolutionary pathways, and new specificities evolved through variation and adaptation.

21

Complete genomes

• http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Genome

• Let us walk around among genomes

22

COGs

Phylogenetic classification of proteins encoded in complete genomes

Clusters of Orthologous Groups of proteins (COGs) were delineated by comparing protein sequences encoded in 43 complete genomes, representing 30 major phylogenetic lineages. Each COG consists of individual proteins or groups of paralogs from at least 3 lineages and thus corresponds to an ancient conserved domain. Proteins from two eukaryotic genomes (Drosophila melanogaster and Caenorhabditis elegans) were assigned to COGs and can be reached from each individual COG page.
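The "at least 3 lineages" condition can be expressed as a simple filter over candidate clusters. The sketch below checks only that one condition, not the sequence comparisons by which COGs were actually delineated; the data layout, cluster identifiers, protein names and lineage labels are all hypothetical.

```python
def clusters_meeting_lineage_rule(clusters, min_lineages=3):
    """Keep clusters whose members come from at least `min_lineages` lineages.

    `clusters` maps a cluster id to a list of (protein_name, lineage) pairs.
    Returns a dict from cluster id to the sorted list of lineages represented.
    """
    kept = {}
    for cluster_id, members in clusters.items():
        lineages = {lineage for _protein, lineage in members}
        if len(lineages) >= min_lineages:
            kept[cluster_id] = sorted(lineages)
    return kept


# Hypothetical toy data: two paralogs from one lineage count as one lineage.
toy_clusters = {
    "cluster_A": [("p1", "lineage1"), ("p2", "lineage1"),
                  ("p3", "lineage2"), ("p4", "lineage3")],
    "cluster_B": [("p5", "lineage1"), ("p6", "lineage2")],
}
print(clusters_meeting_lineage_rule(toy_clusters))
```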

23

COGs

• http://www.ncbi.nlm.nih.gov/COG/

• Cognitor

• http://www.ncbi.nlm.nih.gov/COG/xognitor.html

• COG Help

• http://www.ncbi.nlm.nih.gov/COG/COGhelp.html#top

» FTP

ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Mycobacterium_leprae/