EMBL-EBI Structural Proteomics Automatic Target Selection Gordon Whamond



Project Overview

Aim:

• Provide a resource that automatically selects potential targets for protein structure determination while minimising, where required, human interaction with the software.

Input:

• Raw amino acid sequence

• UniProt accession number

• UniProt accession number and a sequence range

Output:

• Query sequence showing possible domains

• All candidates for structure determination

• Recommendation for which sequence to use
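
The three input types listed for this slide could be told apart with a small parser. The following is a minimal sketch, not the project's actual code: the `ACCESSION:START-END` syntax for a sequence range, the `Query` type and the `parse_query` function are all hypothetical, and the accession pattern is a simplified six-character form rather than the full UniProt rule.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple

# Simplified six-character accession pattern (assumption; the full UniProt rule is stricter).
ACCESSION = r"[A-Z][0-9][A-Z0-9]{3}[0-9]"
ACCESSION_RE = re.compile(rf"^{ACCESSION}$")
# Hypothetical syntax for "accession plus sequence range": ACCESSION:START-END
RANGE_RE = re.compile(rf"^({ACCESSION}):(\d+)-(\d+)$")
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


@dataclass
class Query:
    kind: str  # "sequence", "accession" or "accession_range"
    accession: Optional[str] = None
    sequence: Optional[str] = None
    seq_range: Optional[Tuple[int, int]] = None  # 1-based, inclusive


def parse_query(text: str) -> Query:
    """Classify user input as one of the three supported input types."""
    token = "".join(text.split()).upper()  # drop whitespace and line breaks
    match = RANGE_RE.match(token)
    if match:
        start, end = int(match.group(2)), int(match.group(3))
        if start < 1 or end < start:
            raise ValueError(f"invalid sequence range: {start}-{end}")
        return Query("accession_range", accession=match.group(1),
                     seq_range=(start, end))
    if ACCESSION_RE.match(token):
        return Query("accession", accession=token)
    if token and set(token) <= AMINO_ACIDS:
        return Query("sequence", sequence=token)
    raise ValueError("input is neither a sequence nor a recognised accession")
```

Checking the range form first, then the bare accession, then the raw sequence keeps the three cases unambiguous, because an accession always contains digits and a raw sequence never does.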


Considerations

• Is there a known structure?

• Are there Classified Structural (CATH, SCOP) Domains?

• Are there Known Sequence (Pfam) Domains?

• Are there Predicted Structural (Gene3D, Superfamily) Domains?

• Do Domain Boundaries Conform to Secondary Structure Restrictions?

• Which Species has a Representative Domain that is the Most Compactly Folded?

• The core implementation needs to be extensible and easily maintainable.
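
As an illustration of the secondary structure consideration above, the sketch below tests whether both boundaries of a candidate domain fall in coil rather than inside a helix or strand. The rule itself is an assumption made for illustration, since the slides do not state the actual restrictions; the three-state string encoding (`H`, `E`, `C`) and the function name `boundaries_conform` are likewise hypothetical.

```python
def boundaries_conform(ss: str, start: int, end: int, coil: str = "C") -> bool:
    """Return True if both domain boundaries lie in coil.

    ss    -- per-residue secondary structure string, e.g. "CCHHHHCCEEEC"
             (H = helix, E = strand, C = coil; assumed encoding)
    start -- 1-based first residue of the candidate domain
    end   -- 1-based last residue of the candidate domain
    """
    if not (1 <= start <= end <= len(ss)):
        raise ValueError("domain boundaries fall outside the sequence")
    # A boundary inside a helix or strand would cut that element in two.
    return ss[start - 1] == coil and ss[end - 1] == coil
```

A fuller version might shift a failing boundary to the nearest coil residue instead of rejecting the candidate outright.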


Taverna

The software is to be implemented using the Taverna workbench.

Taverna is a tool for formulating the workflow and implementing each of its processes as a distributed web service.

Tom Oinn - http://taverna.sourceforge.net/

Advantages:

• Distributed computing reduces resource requirements

• Easily extensible system

• Maintenance issues shifted to external providers

Disadvantages:

• Learning curve

• Convincing service providers to adopt a standard format

• Maintenance issues shifted to external providers


Taverna

The prototype workflow is quite complex when expanded to show all of the incorporated sub-workflows.

Luckily, Taverna can provide a top-level view.


Taverna


Dealing With DAS


Taverna


Process Data

Secondary Structure Elements: (method not yet chosen)

Sequence Domains: Pfam, Gene3D, Superfamily, etc.

Protein Folding: RONN, FoldIndex, DisEMBL

Rank Target Selection: based on loop lengths, folding predictions, etc.
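
The ranking step could be sketched as a penalty score that combines the two criteria named above, loop length and folding prediction. Everything specific here is an assumption for illustration: the `Candidate` fields, the 0.5 disorder threshold, the 20-residue loop limit and the weighting are hypothetical, and the slides do not describe the scoring actually used.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Candidate:
    label: str
    disorder: List[float]  # per-residue disorder probability in [0, 1]
    longest_loop: int      # length of the longest predicted loop, in residues


def disordered_fraction(scores: Sequence[float], threshold: float = 0.5) -> float:
    """Fraction of residues predicted to be disordered."""
    if not scores:
        return 0.0
    return sum(1 for s in scores if s >= threshold) / len(scores)


def penalty(c: Candidate, max_loop: int = 20, loop_weight: float = 0.5) -> float:
    """Lower is better: penalise disorder and loops longer than max_loop."""
    loop_excess = max(0, c.longest_loop - max_loop) / max_loop
    return disordered_fraction(c.disorder) + loop_weight * loop_excess


def rank(candidates: Sequence[Candidate]) -> List[Candidate]:
    """Return candidates ordered from most to least promising."""
    return sorted(candidates, key=penalty)
```

For example, a candidate with no predicted disorder and short loops would rank ahead of one with the same disorder profile but a 30-residue loop, which in turn would rank ahead of one that is half disordered.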


Starting the Process


Monitoring Progress


Assess Data


Review Results


Extensibility

Java Services

• Straightforward to provide as a web service using Tomcat and Axis

• WSDL (describing the service) can be generated automatically

Legacy Software

• Any command-line tool can be wrapped into a web service using Soaplab

• For example, the EMBOSS tools are already available


Extensibility

Output Format:

To ensure generic service compatibility, it helps to define a common results format. For this reason we are using the e-Family service schema (http://www.efamily.org.uk/).

Current collaborators include:

The Weizmann Institute - FoldIndex

University of Oxford - RONN
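
To illustrate why a common results format helps, the sketch below converts per-residue scores from any predictor into the same list of region records. It does not reproduce the e-Family schema, whose fields are not given in these slides; the `Feature` record, its field names and the 0.5 threshold are hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class Feature:
    method: str        # service that produced the result, e.g. "RONN"
    feature_type: str  # what the region represents, e.g. "disorder"
    start: int         # 1-based, inclusive
    end: int           # 1-based, inclusive


def regions_from_scores(method: str, feature_type: str,
                        scores: Sequence[float],
                        threshold: float = 0.5) -> List[Feature]:
    """Collapse per-residue scores into contiguous above-threshold regions."""
    features: List[Feature] = []
    run_start: Optional[int] = None
    for pos, score in enumerate(scores, start=1):
        if score >= threshold:
            if run_start is None:
                run_start = pos  # a new region opens here
        elif run_start is not None:
            features.append(Feature(method, feature_type, run_start, pos - 1))
            run_start = None
    if run_start is not None:  # region runs to the end of the sequence
        features.append(Feature(method, feature_type, run_start, len(scores)))
    return features
```

Once every service reports regions in one shape, downstream steps such as ranking or display need no service-specific parsing.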


Results Viewers

http://www.efamily.org.uk/software/dasclients/spice/


Conclusions

Taverna and Web Services:

• Taverna facilitates the provision of complex distributed systems that utilise web services

• This reduces maintenance overheads and keeps technology requirements at a reasonable level

• It is also easily extensible to accommodate new services

Availability:

• Hopefully the core system will be ready by the end of the year

• This will provide the basic workflow for users to customise according to their needs


Acknowledgments

Thanks to:

Tom Oinn

Andreas Prlic

The RONN and FoldIndex teams

The MSD Group