EBI is an Outstation of the European Molecular Biology Laboratory. Proteins to Proteomes The InterPro Database


http://www.ebi.ac.uk/interpro

Origins of InterPro

UniProt

Swiss-Prot: 290K (annotated)

TrEMBL: 5M (raw data) ???

Automated annotation: InterPro


Curated Annotation in InterPro

TrEMBL

uncharacterised sequence

Swiss-Prot

annotated sequence

TrEMBL

feed back common annotation

groups of related proteins

(same family or share domains)

multiple signatures

InterPro


Finding Conserved Signatures

• Pattern: simplest (limited)

• Fingerprint

• Sequence clustering

• HMM: more information


Patterns

Pattern/motif in sequence → regular expression

Can define important sites:

• Enzyme catalytic site

• Prosthetic group attachment

• Metal ion binding site

• Cysteines for disulphide bonds

• Protein or molecule binding

EXAMPLE: Insulin

B chain xxxxxxCxxxxxxxxxxxxCxxxxxxxxx

A chain xxxxxCCxxxCxxxxxxxxCx


EXAMPLE: PS00262 Insulin family signature

C-C-{P}-x(2)-C-[STDNEKPI]-x(3)-[LIVMFS]-x(3)-C

Regular expression

MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQVGQVELGGGPGAGSLQPLALEGSLQKRGIVEQ CCTSICSLYQLENYC N
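
The pattern above can be tried directly against the sequence. The following is a minimal sketch, assuming only the common PROSITE elements (single residues, x, [allowed], {forbidden} and repeat counts); `prosite_to_regex` is a hypothetical helper written for this example, not part of PROSITE or InterPro.

```python
import re

# Pattern and insulin precursor sequence copied from the slide (PS00262).
PS00262 = "C-C-{P}-x(2)-C-[STDNEKPI]-x(3)-[LIVMFS]-x(3)-C"
INSULIN = (
    "MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQ"
    "VGQVELGGGPGAGSLQPLALEGSLQKRGIVEQCCTSICSLYQLENYCN"
)

# One PROSITE element: a residue, x, [allowed] or {forbidden},
# optionally followed by a repeat count (n) or (n,m).
_ELEMENT = re.compile(r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE pattern into Python regex syntax (hypothetical helper)."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = m.groups()
        if core == "x":
            core = "."                      # x = any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"  # {P} = any residue except P
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(core)
    return "".join(parts)


regex = prosite_to_regex(PS00262)
match = re.search(regex, INSULIN)
print(regex)
print(match.group())
```

The matched substring is the region set apart by spaces in the sequence shown on the slide.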


Patterns

Sequence alignment

Define pattern: Insulin family motif

Extract pattern sequences: xxxxxxxxxxxxxxxxxxxxxxxx

Build regular expression: C-C-{P}-x(2)-C-[STDNEKPI]-x(3)-[LIVMFS]-x(3)-C

Pattern signature: PS00000


Fingerprints

Several motifs characterise family

Different combinations of motifs describe subfamilies

Identify small conserved regions in divergent proteins

EXAMPLE: PR00107 Phosphocarrier HPr signature

PTHP_ENTFA: MEKKEFHIVAETGIHARPATLLVQTASKFNSDINLEYKGKSVNLK

SIMGVMSLGVGQGSDVTITVDGADEAEGMAAIVETLQKEGLAE


PTHP_ENTFA:

His phosphorylation site

Ser phosphorylation site

Conserved site

MEKKEFHIVAET GIHARPATLLVQTASK FNSDINLEY KGKSVNLK

SIMGVMSL GVGQGSDVTITVDGADE AEGMAAIVETLQKEGLAE


PTHP_ENTFA: MEKKEFHIVAET GIHARPATLLVQTASK FNSDINLEY KGKSVNLK

SIMGVMSL GVGQGSDVTITVDGADE AEGMAAIVETLQKEGLAE

1) GIHARPATLLVQTASKF

2) KGKSVNLKSIMGVMSL

3) LGVGQGSDVTITVDGADE 3-motif fingerprint
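
The idea that a fingerprint is several motifs found in the right order can be sketched as below. This is a simplification under a stated assumption: it uses exact string matching on the three motif sequences from the slide, whereas a real fingerprint method scores each motif rather than requiring identity. `fingerprint_hit` is a hypothetical name.

```python
from typing import List, Optional

# PTHP_ENTFA sequence and the three motifs, copied from the slide (PR00107).
HPR = (
    "MEKKEFHIVAETGIHARPATLLVQTASKFNSDINLEYKGKSVNLK"
    "SIMGVMSLGVGQGSDVTITVDGADEAEGMAAIVETLQKEGLAE"
)
MOTIFS = ["GIHARPATLLVQTASKF", "KGKSVNLKSIMGVMSL", "LGVGQGSDVTITVDGADE"]


def fingerprint_hit(sequence: str, motifs: List[str]) -> Optional[List[int]]:
    """Return 0-based motif start offsets if all motifs occur in the listed order.

    Returns None when any motif is missing or appears out of order.
    """
    starts = []
    search_from = 0
    for motif in motifs:
        pos = sequence.find(motif, search_from)
        if pos == -1:
            return None
        starts.append(pos)
        # The next motif must start later; a small overlap is tolerated.
        search_from = pos + 1
    return starts


print(fingerprint_hit(HPR, MOTIFS))        # all three motifs, in order
print(fingerprint_hit(HPR, MOTIFS[::-1]))  # reversed order is rejected
```

Motifs 2 and 3 as listed share one residue in this sequence, which is why the sketch only requires increasing start positions instead of non-overlapping matches.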


Fingerprints

Sequence alignment

Define motifs: His phosphorylation site, Ser phosphorylation site, Conserved site

Extract motif sequences

xxxxxxxxxxxxxxxxxxxxxxxx

xxxxxxxxxxxxxxxxxxxxxxxx

xxxxxxxxxxxxxxxxxxxxxxxx

Fingerprint signature (motifs 1, 2, 3): correct order, correct spacing

PR00000


Sequence clustering

Automatic clustering of homologous domains

**Rarely covers entire domain (conserved core)

**Signature size can change with release

Known domain families

Recruit homologous domains (PSI-BLAST)

Automatic clustering (MKDOM2)

Align domain families

ProDom


Hidden Markov Models (HMM)

Can characterise protein over entire length

Models conserved and divergent regions (position-specific scoring)

Models insertions and deletions

Outperform in sensitivity and specificity

More flexible (can use partial alignments)


Sequence alignment (Sequences 1 to 7)

Scoring matrix (residue frequency at each position in alignment)

Profile


Phe, Tyr and Leu found at position 1 of alignment

Phe most conserved: highest match value


Tyr and Leu found at equal frequency at position 1

Tyr closer to Phe than Leu

Scores: F > Y > L

Probability method gauges scoring parameters
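
A small worked example may help show why Tyr can outscore Leu even when both occur equally often. It is a sketch under assumptions: the alignment column is invented (five Phe, one Tyr, one Leu), and the weighting is a simple average of BLOSUM62 substitution scores, which is one classic way to build a profile and not necessarily the method used by any InterPro member database.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical residues at position 1 of a seven-sequence alignment.
COLUMN = ["F", "F", "F", "F", "F", "Y", "L"]

# BLOSUM62 scores for the three residues involved; keys are alphabetically sorted pairs.
BLOSUM62 = {
    ("F", "F"): 6, ("Y", "Y"): 7, ("L", "L"): 4,
    ("F", "Y"): 3, ("F", "L"): 0, ("L", "Y"): -1,
}


def sub_score(a: str, b: str) -> int:
    """Look up the symmetric substitution score for residues a and b."""
    return BLOSUM62[tuple(sorted((a, b)))]


def column_frequencies(column: List[str]) -> Dict[str, float]:
    """Residue frequency at one alignment position."""
    counts = Counter(column)
    return {res: n / len(column) for res, n in counts.items()}


def profile_scores(column: List[str]) -> Dict[str, float]:
    """Score each residue as the frequency-weighted mean of its substitution scores."""
    freqs = column_frequencies(column)
    return {
        a: sum(f * sub_score(a, b) for b, f in freqs.items())
        for a in freqs
    }


freqs = column_frequencies(COLUMN)
scores = profile_scores(COLUMN)
print(freqs)
print(scores)
```

Frequency alone ties Tyr and Leu; weighting by similarity to the dominant Phe separates them, giving F > Y > L as on the slide.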


Hidden Markov Models (HMM)

Sequence alignment

Begin → M1 → M2 → M3 → M4 → End

M = match state


Hidden Markov Models (HMM)

Begin → M1 → M2 → M3 → M4 → End

Delete states: D1, D2, D3, D4

Insert states: I1, I2, I3

M = match state, I = insert state, D = delete state
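
The state layout above can be written down as a set of allowed transitions. The arrows of the original figure are not recoverable from the text, so this sketch assumes the conventional profile-HMM topology in which every match, insert and delete state can lead to the next match or delete state, and insert states loop on themselves; some implementations omit the insert-delete links. `profile_hmm_edges` is a hypothetical name.

```python
from typing import List, Set, Tuple


def profile_hmm_edges(n_match: int) -> Set[Tuple[str, str]]:
    """Allowed transitions for a profile HMM with n_match match states.

    Assumed layout: M1..Mn, D1..Dn, and insert states I1..I(n-1)
    sitting between consecutive match positions.
    """
    edges = {("Begin", "M1"), ("Begin", "D1")}
    for k in range(1, n_match):
        m, i, d = f"M{k}", f"I{k}", f"D{k}"
        next_m, next_d = f"M{k + 1}", f"D{k + 1}"
        edges |= {
            (m, next_m), (m, i), (m, next_d),   # match: advance, insert, or skip next
            (i, i), (i, next_m), (i, next_d),   # insert: emit more, then move on
            (d, next_m), (d, i), (d, next_d),   # delete: rejoin or keep skipping
        }
    edges |= {(f"M{n_match}", "End"), (f"D{n_match}", "End")}
    return edges


def path_allowed(path: List[str], edges: Set[Tuple[str, str]]) -> bool:
    """Check that every consecutive pair of states is an allowed transition."""
    return all((a, b) in edges for a, b in zip(path, path[1:]))


edges = profile_hmm_edges(4)
# A sequence lacking a residue at position 2 passes through D2 instead of M2.
print(path_allowed(["Begin", "M1", "D2", "M3", "M4", "End"], edges))
# A sequence with two extra residues after position 1 loops on I1.
print(path_allowed(["Begin", "M1", "I1", "I1", "M2", "M3", "M4", "End"], edges))
```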


SAM Profile HMMs

Homologous structural superfamilies

Start with single seed sequence

Proteins in superfamily may have low sequence identity

Few proteins in family have PDB structures

Create 1 model for every protein in superfamily; combine results


Specialisation of Databases

PRINTS: Describe sibling families

PROSITE: Identify binding and active sites

PRODOM: Describe conserved core of domains

PFAM: Wide coverage of domains & families

SMART: Signalling, extracellular & nuclear domains

TIGRFAM: Functional classification of families

PIRSF: Families conserved in domain composition

PANTHER: Functional classification of families

Superfam: Structural-based domain classification

GENE3D: Structural-based domain classification


Foundations of InterPro

Manual curation

Integration of signatures

InterPro


InterPro Entry

Groups similar signatures together

Adds extensive annotation

Linked to other databases

Structural information and viewers

Links related signatures


Assigning Type

Domain: Biological units with defined boundaries

Family: Full-length signatures grouping related proteins

Repeat: Signature repeated as a series of short motifs

Site: Protein feature described by a Prosite pattern

Region: Any signature that doesn’t fit the above


Grouping Signatures Together

1) Same positions, same protein hits: PFAM (100), PROSITE (100)

IPR000001

2) Same positions, different protein hits: PFAM, PROSITE: (100), (50)

IPR000001, IPR000002

3) Different positions, same protein hits: PROSITE (100), PFAM (100)

IPR000001, IPR000002

4) Different positions: PFAM (100), PROSITE (100)

IPR000001, IPR000002
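
The grouping rule on this slide (one entry only when two signatures hit the same proteins at the same positions) can be sketched as follows. The 0.9 thresholds, the `Signature` class and all accessions are assumptions made for illustration; in InterPro the decision is made by curators, not by a fixed cutoff.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class Signature:
    name: str
    # protein accession -> (start, end) of the match, 1-based inclusive
    hits: Dict[str, Tuple[int, int]]


def same_proteins(a: Signature, b: Signature, min_shared: float = 0.9) -> bool:
    """True when the two signatures hit (nearly) the same set of proteins."""
    union = a.hits.keys() | b.hits.keys()
    if not union:
        return False
    shared = a.hits.keys() & b.hits.keys()
    return len(shared) / len(union) >= min_shared


def same_positions(a: Signature, b: Signature, min_overlap: float = 0.9) -> bool:
    """True when, on every shared protein, the two matches cover the same region."""
    shared = a.hits.keys() & b.hits.keys()
    if not shared:
        return False
    for acc in shared:
        (s1, e1), (s2, e2) = a.hits[acc], b.hits[acc]
        overlap = max(0, min(e1, e2) - max(s1, s2) + 1)
        span = max(e1, e2) - min(s1, s2) + 1
        if overlap / span < min_overlap:
            return False
    return True


def same_entry(a: Signature, b: Signature) -> bool:
    """Case 1 on the slide: same positions and same protein hits."""
    return same_proteins(a, b) and same_positions(a, b)


# Hypothetical data mirroring the slide's cases.
pfam = Signature("PFAM", {f"P{i}": (10, 100) for i in range(100)})
prosite_same = Signature("PROSITE", {f"P{i}": (12, 100) for i in range(100)})
prosite_half = Signature("PROSITE", {f"P{i}": (10, 100) for i in range(50)})
prosite_shifted = Signature("PROSITE", {f"P{i}": (150, 200) for i in range(100)})

print(same_entry(pfam, prosite_same))     # case 1: one entry
print(same_entry(pfam, prosite_half))     # case 2: separate entries
print(same_entry(pfam, prosite_shifted))  # case 3: separate entries
```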


Link related signatures - relationships

1) Parent - Child (subgroup of more closely related proteins)

PFAM (100): Protein kinase

SMART (75): Serine kinase

PROSITE (25): Tyrosine kinase

No proteins in common (SMART and PROSITE)

Parent: Protein kinase (PFAM)

Children: Serine kinase (SMART), Tyrosine kinase (PROSITE)

Applies to domains and families
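
Read as set relationships, the parent-child example looks like this. The sketch assumes that "child" means the child's protein hits form a proper subset of the parent's; the accessions are invented, and only the counts (100, 75, 25) come from the slide.

```python
from typing import Set


def is_child(child_hits: Set[str], parent_hits: Set[str]) -> bool:
    """True when the child's proteins form a proper subset of the parent's."""
    return child_hits < parent_hits


# Hypothetical accessions; only the set sizes follow the slide.
proteins = [f"P{i:03d}" for i in range(100)]
protein_kinase = set(proteins)         # PFAM, 100 proteins
serine_kinase = set(proteins[:75])     # SMART, 75 proteins
tyrosine_kinase = set(proteins[75:])   # PROSITE, 25 proteins

print(is_child(serine_kinase, protein_kinase))
print(is_child(tyrosine_kinase, protein_kinase))
print(serine_kinase.isdisjoint(tyrosine_kinase))  # siblings share no proteins
```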


Link related signatures - relationships

2) Contains – Found in (Describes domain composition)

PFAM: Receptor family

SMART: N-terminal domain

PROSITE: C-terminal domain

Both families and domains can contain domains

Found in (Pfam)

Contains (Smart and Prosite)


Link related signatures - relationships

2) Contains – Found in

Coverage: Signature must cover the entire (>90%) sequence of contained signature

PFAM, SMART: Contains / Found in

PFAM, SMART: Contains / Found in (overlapping)
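
The coverage rule can be expressed as a short calculation on match coordinates. This is a sketch: the >90% threshold is from the slide, while the function names, the coordinate convention (1-based, inclusive) and the example positions are assumptions.

```python
from typing import Tuple

Region = Tuple[int, int]  # (start, end), 1-based inclusive


def coverage(container: Region, contained: Region) -> float:
    """Fraction of the contained match that lies inside the container match."""
    c_start, c_end = container
    start, end = contained
    overlap = max(0, min(c_end, end) - max(c_start, start) + 1)
    return overlap / (end - start + 1)


def relationship(container: Region, contained: Region, threshold: float = 0.9) -> str:
    """Classify two matches on the same protein by how much one covers the other."""
    fraction = coverage(container, contained)
    if fraction > threshold:
        return "contains"
    if fraction > 0:
        return "overlapping"
    return "separate"


# Hypothetical match positions on one protein.
print(relationship((1, 300), (50, 150)))   # fully covered
print(relationship((1, 100), (60, 160)))   # partial overlap only
print(relationship((1, 100), (200, 250)))  # no overlap
```

When the result is "contains", the covering signature would be labelled Contains and the covered one Found in.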


Relationships – evolutionary context

InterPro Relationship / Criteria for Signature

GENE3D: Grandparent / Structural family

PFAM, PFAM: Parents / Sequence families

TIGRFAM, TIGRFAM, TIGRFAM, TIGRFAM: Children / Functional families

Unique to InterPro


Extensive Annotation

Annotation Fields in InterPro

• Name and short name

• Entry type (family, domain, site)

• Relationships (links related signatures)

• GO mapping (large scale classification)

• Abstract

• Taxonomy (search/download using taxonomy)

• Examples

• Publications


Select species-specific protein sets


Links to Other Databases

Annotation Fields in InterPro

• Blocks (family alignments)

• IntEnz (enzymes)

• Prosite documents

• COME (bioinorganic motifs)

• CAZy (carbohydrate-active enzymes)

• IUPHAR (GPCR receptors)

• CluSTr (protein clusters)

• Pandit (phylogenetic trees of PFAMs)

• Merops (peptidases & inhibitors)


Structural information

Structures: PDB

Classification: CATH, SCOP

Homology Models: Swiss-Model, ModBase


Sequence-Structure Display

Signatures predictive of protein annotation

Structural data for specific proteins

AstexViewer® for structure


Structure Viewer

Navigate between structure and sequence

Manipulate structures


Other Features – splice variants

Splice variants


Other Features – domain architecture

Select data set of these proteins

Each ‘balloon’ represents a linked InterPro domain


Other Features – protein-protein interactions

Lists proteins in entry known to be involved in protein-protein interactions

IntAct database of interactions


Protein Sequence Coverage

InterPro signatures cover:

95% of UniProt/Swiss-Prot proteins

79% of UniProt/TrEMBL proteins

>4 million matches in InterPro

>16,000 InterPro entries

>50,000 signature methods


Searching InterPro

http://www.ebi.ac.uk/interpro/

Search tools include:

• Text Search

• InterProScan (sequence search)


InterPro Text Search

Text search box

Search using:

• text

• protein ID

• InterPro ID

• GO term

Search results

Direct links to entry


InterProScan Search

Use ftp site to run multiple sequences simultaneously

Member database search engines

Paste in sequence (protein/nucleotide)


InterProScan Search Results

single InterPro entry

Direct links to entry

Direct links to signature databases