Modelling proteins and proteomes using Linux clusters Ram Samudrala University of Washington




Examples of biological problems

Protein structure prediction/docking simulations - need to run different trajectories that sometimes talk with each other

Molecular dynamics simulations - need more cohesive parallelisation

Polarisable force fields - need true parallelisation

Bioinformatics searches/exploration - trivially parallelisable
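
For the trivially parallelisable case, the work can simply be dealt out to CPUs with no communication between them. A minimal sketch, assuming a hypothetical list of sequence identifiers and a hypothetical `partition` helper (neither comes from the talk):

```python
def partition(tasks, n_workers):
    """Deal tasks out round-robin so each worker gets a near-equal share."""
    return [tasks[i::n_workers] for i in range(n_workers)]


# Hypothetical sequence identifiers; real ones would come from a database.
sequences = [f"seq{i:03d}" for i in range(10)]

# One independent share per CPU; no share depends on any other.
shares = partition(sequences, 4)
```

Each share could then be handed to a separate process or machine, since no result depends on another.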


Computational issues

Need efficient methods to start/stop jobs

Need load-balancing queuing system

Need fast communications at times

Need stability (months/years uptimes)

Need low maintenance/management overhead

Need low installation overhead

Needs to be cheap!


Hardware and operating system

256 AMD and Intel CPUs (1-2.5 GHz)

0.5-1 GB RAM, 100-200 GB hard disk, dual-processor motherboards

100 Mbps Ethernet connectivity for 64-processor sets

White boxes are good but use up space – 1U rack units ideal

Minimal Linux installation – create clone “CD” – copy on all machines


Our solution

No single solution – user implements their own

Completely decentralised

Analyse problem and determine parallelisable parts

Implementation specific to problem

Use local scratch space for computation

Redundant storage of data for faster access

Limit problem space to specific problems
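
The local-scratch pattern can be sketched as: copy the input from shared storage once, do all the work on the node's own disk, and copy the result back once. This is only an illustration under stated assumptions: `run_on_scratch`, its parameters, and the `compute` callable are hypothetical names, not the group's actual tooling.

```python
import shutil
import tempfile
from pathlib import Path


def run_on_scratch(shared_input, shared_output_dir, compute, scratch_root=None):
    """Copy input to node-local scratch, compute there, copy the result back.

    `compute` is a hypothetical callable: it takes the local input path and
    returns the path of the output file it wrote inside the scratch directory.
    """
    shared_input = Path(shared_input)
    shared_output_dir = Path(shared_output_dir)
    with tempfile.TemporaryDirectory(dir=scratch_root) as scratch:
        local_in = Path(scratch) / shared_input.name
        shutil.copy(shared_input, local_in)      # single read from shared storage
        local_out = Path(compute(local_in))      # heavy I/O stays on the local disk
        final = shared_output_dir / local_out.name
        shutil.copy(local_out, final)            # single write back
    return final
```

The scratch directory is removed when the `with` block exits, so a failed job does not leave files behind on the node.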


Problem specific implementation

MCSA/GA: socket-based communication of trajectories; multiple trajectories on different CPUs

Docking: sample different ligands/regions of the protein on different CPUs

MD: Pairwise force-fields are additive

PFF: ?

Bioinformatics: trivial parallelisation; communication by disk
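
The MD point above relies on additivity: if the total energy is a sum over atom pairs, the pairs can be divided among CPUs and the partial sums added at the end. A minimal sketch, assuming a Lennard-Jones 12-6 potential and made-up coordinates purely for illustration (the talk does not name a potential, and `split_pairs` and `partial_energy` are hypothetical helpers):

```python
import math
from itertools import combinations


def pair_energy(a, b, epsilon=1.0, sigma=1.0):
    """Lennard-Jones 12-6 energy between two points (assumed potential)."""
    r = math.dist(a, b)
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def split_pairs(n_atoms, n_workers):
    """Deal all i<j atom pairs out round-robin, one share per worker."""
    pairs = list(combinations(range(n_atoms), 2))
    return [pairs[w::n_workers] for w in range(n_workers)]


def partial_energy(coords, pairs):
    """Energy contribution from one worker's share of the pairs."""
    return sum(pair_energy(coords[i], coords[j]) for i, j in pairs)


# Made-up coordinates for four atoms.
coords = [(0.0, 0.0, 0.0), (1.2, 0.0, 0.0), (0.0, 1.5, 0.0), (0.0, 0.0, 2.0)]
shares = split_pairs(len(coords), n_workers=3)

# Each partial sum could be computed on a different CPU; only the
# final scalars need to be communicated and added.
total = sum(partial_energy(coords, share) for share in shares)
```

The same decomposition does not carry over directly to polarisable force fields, where contributions are no longer independent per pair, which is consistent with the open question mark on the PFF line.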


Semi-exhaustive segment-based folding

EFDVILKAAGANKVAVIKAVRGATGLGLKEAKDLVESAPAALKEGVSKDDAEALKKALEEAGAEVEVK

generate: fragments from database, 14-state φ,ψ model

… …

minimise: Monte Carlo with simulated annealing, conformational space annealing, GA

… …

filter: all-atom pairwise interactions, bad contacts, compactness, secondary structure
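
The minimisation step names Monte Carlo with simulated annealing. The idea can be sketched on a toy problem: propose a random move, always accept it if the energy drops, accept it with Boltzmann probability if the energy rises, and lower the temperature gradually. Everything here is an assumption for illustration (a one-dimensional quadratic "energy", a geometric cooling schedule, and the parameter values); it is not the energy function or schedule used for protein folding in the talk.

```python
import math
import random


def simulated_annealing(energy, x0, step=0.5, t_start=2.0, t_end=1e-3,
                        cooling=0.95, moves_per_temp=50, seed=0):
    """Metropolis Monte Carlo with a geometric cooling schedule."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    t = t_start
    while t > t_end:
        for _ in range(moves_per_temp):
            cand = x + rng.uniform(-step, step)   # random trial move
            e_cand = energy(cand)
            # Metropolis criterion: always accept downhill moves,
            # accept uphill moves with probability exp(-dE / T).
            if e_cand <= e or rng.random() < math.exp(-(e_cand - e) / t):
                x, e = cand, e_cand
                if e < best_e:
                    best_x, best_e = x, e
        t *= cooling                              # cool down
    return best_x, best_e


# Toy energy with its minimum at x = 2.
best_x, best_e = simulated_annealing(lambda x: (x - 2.0) ** 2, x0=-5.0)
```

Independent runs with different seeds are separate trajectories, so they map naturally onto different CPUs.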


Ab initio prediction at CASP

T170/sfrp3 – 4.8 Å for all 69 aa


Comparative modelling at CASP

T182 – 1.0 Å (249 aa; 41% id)


Prediction of SARS CoV proteinase inhibitors

Ekachai Jenwitheesuk


Bioverse – S. typhimurium protein-protein interaction network

Jason McDermott


Bioverse – H. sapiens protein-protein interaction network

Jason McDermott


Future directions

Network connection with multiple ethernet cards based on traffic analysis

Gigabit ethernet (switches are still expensive)

Better network filesystems