The Complex Portal - relationship to Gene Ontology

Sandra Orchard (IntAct)

Project Aim

• To design an online portal to search and visualise protein complexes

• Including cross-referencing to source databases and beyond

• Export to interested parties in a format of their choice

• Incorporate the data into network analysis tools

• Emphasis on major model organisms, chosen to span the taxonomic range:

• Homo sapiens, Saccharomyces cerevisiae, Escherichia coli

• Mus musculus, Caenorhabditis elegans, Drosophila melanogaster, Schizosaccharomyces pombe, Arabidopsis thaliana

• All data held in IntAct DB – shared editor, protein update mechanism, QC procedures

• Separate search and visualisation facility

• wwwdev.ebi.ac.uk/intact/complex/

Definition: stable protein complexes

A stable set (2 or more) of interacting protein molecules which

• can be co-purified and

• have been shown to exist as a functional unit in vivo.

Non-protein molecules (e.g. small molecules, nucleic acids) may also be present in the complex.

What is not a stable complex?

• Two proteins associated in a pulldown / coimmunoprecipitation with no functional link

• Enzyme/substrate, receptor/ligand or similar transient interactions

• Exception - obligate complex that requires substrate/ligand, e.g. PDGF receptors

Source Databases

• PDBe (EBI) – almost 1000 complexes imported

• ChEMBL (EBI) – 81 complexes imported, more to come with each release

• MatrixDB (Sylvie Ricard-Blum, Univ. of Lyon)

• Mining UniProt – yeast (Bernd Roechert, SIB – manually)

• Reactome – human (EBI)

• Manual curation from IMEx DBs & the literature

• Gramene – Arabidopsis

• Unmaintained web resources – CYGD (yeast), CORUM (human), E. coli website, 3D Complexes (Sarah Teichmann, EBI)

Data captured currently for IntAct complexes

• Participants – proteins (UniProt), small molecules (ChEBI), nucleic acids (Ensembl, ChEBI, RNACentral?)

• Species

• Stoichiometry – when known

• Topology (= binding sites) – when known
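The fields listed above could be held in a record along the following lines. This is only an illustrative sketch: the class names, field names and placeholder identifiers are assumptions, not the IntAct data model. The size check mirrors the "2 or more interacting protein molecules" part of the definition.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Participant:
    identifier: str                      # accession in the source database (placeholder values below)
    source_db: str                       # "uniprot", "chebi", "ensembl", ... (hypothetical labels)
    stoichiometry: Optional[int] = None  # None when the stoichiometry is not known


@dataclass
class ComplexRecord:
    species: str
    participants: List[Participant] = field(default_factory=list)
    # Topology as pairs of participant identifiers that bind each other, when known.
    binding_sites: List[Tuple[str, str]] = field(default_factory=list)

    def protein_count(self) -> int:
        """Count protein molecules, treating unknown stoichiometry as one copy."""
        return sum(
            1 if p.stoichiometry is None else p.stoichiometry
            for p in self.participants
            if p.source_db == "uniprot"
        )

    def meets_size_criterion(self) -> bool:
        """True when the record contains at least two protein molecules."""
        return self.protein_count() >= 2


# Placeholder identifiers, not real accessions.
example = ComplexRecord(
    species="Homo sapiens",
    participants=[
        Participant("PROT_A", "uniprot", stoichiometry=2),
        Participant("SMALL_MOL_X", "chebi"),
    ],
)
```

A homodimer with one small-molecule ligand, as in `example`, passes the size check because only protein participants are counted.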

Data captured currently for IntAct complexes

• Complex-specific, free-text annotation fields:

• Function and context – UniProt-style (visible in search results)

• Assembly, e.g. homodimer, heterotetramer…

• Physical properties, e.g. MW, size, topology/assembly

• Ligands

• Disease

Data captured currently for IntAct complexes

• Complex names:

• Recommended name: most recognisable name from literature, use GO component if specific complex exists in GO

• Systematic name: based on Reactome’s new CV names – ‘string of gene names with stoichiometry’

• Synonyms: all other names the complex may be known as
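A 'string of gene names with stoichiometry' could be assembled roughly as follows. The slide does not give the exact Reactome syntax, so the `NxGENE` form, the `:` separator and the alphabetical ordering are all assumptions made for illustration, as is the function name.

```python
from typing import Mapping, Optional


def systematic_name(components: Mapping[str, Optional[int]], sep: str = ":") -> str:
    """Join gene names with their copy numbers into one string.

    `components` maps a gene name to its stoichiometry, or to None when
    the stoichiometry is not known. The output format is hypothetical.
    """
    parts = []
    for gene in sorted(components):  # fixed order so the name is reproducible
        copies = components[gene]
        parts.append(gene if copies is None else f"{copies}x{gene}")
    return sep.join(parts)


name = systematic_name({"PDGFRB": 2, "PDGFB": 2})
```

Sorting the gene names means the same set of components always yields the same string, whatever order they were entered in.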

Data captured currently for IntAct complexes

• Structured annotation using GO (BP, MF, CC)

• Cross references to experimental evidence:

• IMEx (+ non-IMEx IntAct & DIP), PDB, EMDB

• Cross references to related complex data:

• Reactome (human)

• ChEMBL

• PubMed (for further information)

• IntEnz (enzyme EC numbers)

• OMIM (disease)

• ECO (evidence code ontology)

Parallel Annotation of complexes in GO

• At project start: > 400 complex terms in GO CC, mostly children of GO:0043234 protein complex, lacking hierarchical structure

• Good collaboration with GO to provide structured annotation

• Parent terms mainly based on complex function

• TermGenie (TG) Standard Form <protein_complex_by_activity>

• Otherwise use TG Free Form

• Some complexes still direct children of GO:0043234 protein complex

• Adding “logical definitions” / “cross-products” / “extensions”

• e.g. “capable_of x activity”

ECO – Evidence Code Ontology

• ECO:0000353 physical interaction evidence used in manual assertion (=IPI)

• full experimental evidence for the complexes is present

• ECO:0000266 - sequence orthology evidence used in manual assertion (=ISO)

• only limited experimental evidence exists for the complex in one species (e.g. mouse), but the complex has been curated in another species (e.g. human) and orthologous gene products exist in the former species, e.g. PDGFs

• ECO:0000306: inference from background scientific knowledge used in manual assertion, if:

• no or only partial experimental evidence can be found but the complexes are generally assumed to exist, e.g. GABA receptors exist in ChEMBL
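The choice between the three evidence codes above can be written as a small decision function. The ECO identifiers and labels are taken from the slide; the function name, the boolean inputs and the order in which the cases are checked are assumptions for illustration, not the curation rules as implemented.

```python
ECO_LABELS = {
    "ECO:0000353": "physical interaction evidence used in manual assertion",
    "ECO:0000266": "sequence orthology evidence used in manual assertion",
    "ECO:0000306": "inference from background scientific knowledge used in manual assertion",
}


def choose_eco(full_experimental_evidence: bool,
               curated_in_orthologous_species: bool,
               generally_assumed_to_exist: bool) -> str:
    """Pick an ECO code for a complex (hypothetical helper).

    Assumed precedence: direct experimental evidence first, then
    orthology to a complex curated in another species, then
    background knowledge.
    """
    if full_experimental_evidence:
        return "ECO:0000353"
    if curated_in_orthologous_species:
        return "ECO:0000266"
    if generally_assumed_to_exist:
        return "ECO:0000306"
    raise ValueError("no evidence category on this slide applies")


# A mouse complex with limited data whose human counterpart is already curated.
mouse_code = choose_eco(False, True, False)
```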

Download

• At present:

• One PSI-MI XML 2.5.4 file for all complexes on the FTP site

• From next IntAct release:

• One file per complex within a folder per species on ftp site and a zip file per species

• Future:

• Separate files for each complex accessible on each complex details page

• List of files for complexes from search results list

• Database-specific dumps

• Network analysis appropriate format (as developed by MIPS)
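The planned layout (one file per complex, one folder per species, plus one zip per species) can be sketched as a path plan. The root folder name, the file extension, the underscore folder naming and the placeholder complex identifiers are all hypothetical; the slide does not specify the actual FTP paths.

```python
from collections import defaultdict
from pathlib import PurePosixPath
from typing import Dict, Iterable, List, Tuple


def plan_export(complexes: Iterable[Tuple[str, str]],
                root: str = "complex_export"):
    """Map (complex_id, species) pairs to per-species folders and archives.

    Returns two dicts keyed by species folder name: the list of
    per-complex file paths, and the path of that species' zip file.
    """
    layout: Dict[str, List[PurePosixPath]] = defaultdict(list)
    for complex_id, species in complexes:
        folder = species.replace(" ", "_")
        layout[folder].append(PurePosixPath(root, folder, f"{complex_id}.xml"))
    archives = {folder: PurePosixPath(root, f"{folder}.zip") for folder in layout}
    return dict(layout), archives


layout, archives = plan_export([
    ("CPX_A", "Homo sapiens"),      # placeholder identifiers
    ("CPX_B", "Homo sapiens"),
    ("CPX_C", "Escherichia coli"),
])
```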

Project status

• Website will move to the production site at the end of March

• Further development (particularly graphics) will be made public over the next 6 months

• Curation priorities – Human (mouse), yeast, E. coli

- user requests

• Exports to GOA (process and component) and UniProt under discussion.

Future Plans - Display

• Add search filters, e.g.

• Species – almost done

• GO terms

• ECO

Advanced Search

• Links to ‘experimental evidence’ and ‘related complexes’ searches

• Schematic view of complex

• Add existing widgets/BioJS components to show content from other databases directly in the Complex Portal

- crystal structure, pathway, enzyme reactions etc.

Future Plans - Functionality

• Concept of ‘sets’ – important for Reactome import

• Hierarchy of complex sets → specific complex → sub-complex

• Introducing features to indicate, e.g. complex-drug binding sites

Complexes on demand

1. Request via ‘Contact us’ button

1. Name & components

2. Experimental paper

3. Full details including function, stoichiometry and topology

... or we give you access to the editor to create your own

Summary of ‘User Survey’ and own goals

Summary of ‘User Survey’ - Search

Summary of ‘User Survey’ - Display

Summary of ‘User Survey’ - Features

Expression Atlas?

Summary of ‘User Survey’ - Features

Manually for mouse

ECO xref to exp-evidence

Summary of ‘User Survey’ - Features

Definition???Reactome

Summary of ‘User Survey’ - Features

Summary of ‘User Survey’ - Downloads

IntAct and Complex Portal homepage

Complex Portal: UniProt-style display

Complex Portal: tab-style display
