Today’s menu: UniProt (Swiss-Prot/TrEMBL), PROSITE, Pfam, Gene Ontology

Protein and Function Databases

Tutorial 7

Glossary

Domain: A structural unit which can be found in multiple protein contexts.

Motif: A short unit found outside globular domains.

Repeat: A short unit which is unstable in isolation but forms a stable structure when multiple copies are present.

Family: A collection of related proteins.

UniProt

The Universal Protein Resource (UniProt) is a central repository of protein sequence, function, classification, and cross-reference data. It was created by joining the information contained in Swiss-Prot and TrEMBL (together with PIR-PSD).

http://www.uniprot.org/
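The Swiss-Prot/TrEMBL split is visible in every UniProt FASTA header: the prefix `sp` marks a Swiss-Prot (manually reviewed) entry and `tr` marks a TrEMBL (automatically annotated) entry. The sketch below parses such a header. The function and class names are made up for illustration, and it assumes the standard `>db|accession|entry_name description OS=...` layout.

```python
from typing import NamedTuple


class UniProtHeader(NamedTuple):
    section: str      # "Swiss-Prot" or "TrEMBL"
    accession: str
    entry_name: str
    description: str


# Database prefixes used in UniProt FASTA headers.
SECTIONS = {"sp": "Swiss-Prot", "tr": "TrEMBL"}


def parse_uniprot_header(line: str) -> UniProtHeader:
    """Split a UniProt FASTA header into its main fields (illustrative helper)."""
    if not line.startswith(">"):
        raise ValueError("not a FASTA header line")
    ident, _, rest = line[1:].partition(" ")
    db, accession, entry_name = ident.split("|")
    # The free-text description runs up to the organism field (OS=), if present.
    description = rest.split(" OS=", 1)[0].strip()
    return UniProtHeader(SECTIONS[db], accession, entry_name, description)


header = parse_uniprot_header(
    ">sp|P69905|HBA_HUMAN Hemoglobin subunit alpha OS=Homo sapiens OX=9606 GN=HBA1 PE=1 SV=2"
)
print(header.section, header.accession, header.description)
```

A header starting with `>tr|` would be reported as TrEMBL by the same function.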

Characterized proteins

Hypothetical proteins

Pfam

• http://pfam.sanger.ac.uk/

• Pfam is a database of multiple alignments of protein domains or conserved protein regions.
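To make "conserved region" concrete, here is a minimal sketch that measures per-column conservation in a toy multiple alignment. The three sequences are invented for illustration, and this simple counting is not how Pfam itself scores sequences (Pfam builds profile hidden Markov models from its alignments).

```python
from collections import Counter

# Toy alignment (invented sequences, all the same length, no gaps).
alignment = ["ACDEF", "ACDQF", "ACEEF"]


def column_conservation(rows):
    """Return (most common residue, fraction of rows carrying it) per column."""
    result = []
    for column in zip(*rows):
        residue, count = Counter(column).most_common(1)[0]
        result.append((residue, count / len(rows)))
    return result


profile = column_conservation(alignment)
consensus = "".join(residue for residue, _ in profile)
fully_conserved = [i for i, (_, fraction) in enumerate(profile) if fraction == 1.0]

print(consensus)        # consensus sequence of the toy alignment
print(fully_conserved)  # 0-based indices of columns identical in every row
```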

One more example

Description

Structure info

Gene Ontology

Links

What kind of domains can we find in Pfam?

Trusted Domains

Repeats and Motifs

Fragment Domains

Nested Domains

Disulfide bonds

Important residues (e.g. active sites)

Transmembrane domains

Low complexity regions

Coiled coils (two or three alpha helices that wind around each other)

Context domains: domains that do not score above the family threshold but are expected to be real, based on the other domains found in the protein.

Signal peptides (indicate a protein that will be secreted)

PROSITE

• http://www.expasy.org/tools/scanprosite

• PROSITE is a database of protein domains and motifs that can be searched by either regular expression patterns or sequence profiles.

Search Results

Domains architecture

http://www.expasy.ch/tools/pratt/

PRATT: Make a pattern from FASTA-format sequences in order to query PROSITE

Match options: Greedy, Overlap and Include

Example: search the pattern A-x(1,3)-A against the sequence ABACADAEAFA
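The A-x(1,3)-A example can be tried out with ordinary regular expressions. The sketch below translates a simple PROSITE pattern into a Python regex and then shows how greedy, non-greedy, and overlapping matching change the hits on ABACADAEAFA. The converter is a hypothetical helper that covers only basic syntax (`x`, `x(n)`, `x(n,m)`, `[ABC]`, `{ABC}`, `<`, `>`), and Python's `re` semantics are assumed here; ScanProsite's own Greedy/Overlap/Include options may behave differently.

```python
import re

# One PROSITE pattern element: optional "<", a residue spec, optional repeat, optional ">".
ELEMENT = re.compile(
    r"(<?)(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?(>?)"
)


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE pattern into a Python regex (illustrative)."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        m = ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        start, core, low, high, end = m.groups()
        if core == "x":                 # x = any residue
            core = "."
        elif core.startswith("{"):      # {ABC} = any residue except A, B, C
            core = "[^" + core[1:-1] + "]"
        repeat = ""
        if low is not None:             # x(2) -> .{2}, x(1,3) -> .{1,3}
            repeat = "{" + low + ("," + high if high else "") + "}"
        parts.append(("^" if start else "") + core + repeat + ("$" if end else ""))
    return "".join(parts)


sequence = "ABACADAEAFA"
regex = prosite_to_regex("A-x(1,3)-A")

# Greedy: each x(1,3) takes as many residues as it can; hits do not overlap.
greedy = re.findall(regex, sequence)

# Non-greedy: each x(1,3) takes as few residues as it can.
lazy = re.findall(regex.replace("}", "}?"), sequence)

# Overlapping: a lookahead lets a new hit start inside a previous one.
overlapping = re.findall("(?=(" + regex + "))", sequence)

print(regex)        # A.{1,3}A
print(greedy)       # ['ABACA', 'AEAFA']
print(lazy)         # ['ABA', 'ADA', 'AFA']
print(overlapping)  # ['ABACA', 'ACADA', 'ADAEA', 'AEAFA', 'AFA']
```

The same eleven-residue sequence yields two, three, or five hits depending on the matching mode, which is why the options matter when interpreting scan results.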

Gene Ontology (GO)

• GO is a database of biological processes, molecular functions and cellular components.

• GO does not contain sequence information, nor gene or protein descriptions.

• GO is linked to gene and protein databases.

• GO is structured as a hierarchy. It is not a strict tree, because a term can have more than one parent.

http://www.geneontology.org/

Three principal branches

http://www.geneontology.org/amigo/

GO structure is a Directed Acyclic Graph
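The difference between a tree and a directed acyclic graph is that a term may have several parents. A minimal sketch with a toy graph: the term names follow GO naming style, but the graph is heavily simplified and is an illustrative assumption, not an extract of the real ontology.

```python
# Toy child -> parents mapping (simplified, for illustration only).
parents = {
    "hexose biosynthetic process": [
        "hexose metabolic process",
        "monosaccharide biosynthetic process",
    ],
    "hexose metabolic process": ["monosaccharide metabolic process"],
    "monosaccharide biosynthetic process": ["monosaccharide metabolic process"],
    "monosaccharide metabolic process": ["biological_process"],
    "biological_process": [],
}


def ancestors(term, graph):
    """Collect every term reachable by following parent links upward."""
    seen = set()
    stack = list(graph[term])
    while stack:
        current = stack.pop()
        if current not in seen:  # a shared ancestor is visited only once
            seen.add(current)
            stack.extend(graph[current])
    return seen


result = ancestors("hexose biosynthetic process", parents)
for name in sorted(result):
    print(name)
```

The starting term has two parents, and both paths converge on the same ancestor, which a tree could not represent.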

Important: note the source (evidence code) of each GO annotation

GO sources

ISS: Inferred from Sequence/Structural Similarity

IDA: Inferred from Direct Assay

IPI: Inferred from Physical Interaction

TAS: Traceable Author Statement

NAS: Non-traceable Author Statement

IMP: Inferred from Mutant Phenotype

IGI: Inferred from Genetic Interaction

IEP: Inferred from Expression Pattern

IC: Inferred by Curator

ND: No Data available

IEA: Inferred from Electronic Annotation
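Because the evidence code tells you how much to trust an annotation, a common first step is to filter annotations by code. The sketch below groups the codes listed above into broad categories and keeps only experimentally supported annotations. The gene names, term labels, and annotation list are hypothetical placeholders, not real data.

```python
# Evidence codes from the list above, grouped into broad categories.
EVIDENCE_GROUPS = {
    "experimental": {"IDA", "IPI", "IMP", "IGI", "IEP"},
    "computational": {"ISS"},
    "author statement": {"TAS", "NAS"},
    "curator statement": {"IC", "ND"},
    "electronic": {"IEA"},
}


def evidence_group(code):
    """Return the category an evidence code belongs to."""
    for group, codes in EVIDENCE_GROUPS.items():
        if code in codes:
            return group
    raise KeyError(f"unknown evidence code: {code}")


# Hypothetical annotations: (gene, GO term label, evidence code).
annotations = [
    ("geneA", "term 1", "IDA"),
    ("geneA", "term 2", "IEA"),
    ("geneB", "term 1", "ISS"),
    ("geneC", "term 3", "IMP"),
]

experimental_only = [
    annotation for annotation in annotations
    if evidence_group(annotation[2]) == "experimental"
]

for gene, term, code in experimental_only:
    print(gene, term, code)
```

Annotations with IEA are assigned automatically without curator review, so they are usually the first to be set aside when high-confidence function assignments are needed.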

http://www.ebi.ac.uk/interpro/

InterPro
