Computation in Biology
Nagasuma Chandra
Bioinformatics Centre & SERC
IISc
Next-generation biologists must straddle computation and biology
Organism
Organ
Cell
Organelle
Tissue
Supramolecular assembly
Macromolecule
Hierarchical structures in living systems
Genome Sequence- a book of life
DOE-Genomes.org
examplesfromenglishtextgenomicbiologytakesaholisticapproachtomolecularbiologyandevolutionbystudyingthecompletegenomeitsgenesanditsproteinexpressionpatternsncbiprovidesseveralgenomicbiologytoolsandresourcesincludingorganismspecificpagesthatincludelinkstomanywebsitesanddatabasesrelevanttothatspeciesweinviteyoutoexplorethelinksprovidedonthispage.
Genomic biology takes a holistic approach to molecular biology and evolution by studying the complete genome, its genes, and its protein expression patterns. NCBI provides several genomic biology tools and resources, including organism-specific pages that include links to many web sites and databases relevant to that species. We invite you to explore the links provided on this page.
Molecular circuitry in the cell
Biochemical networks
www.expasy.ch
Cellular networks
Characteristics of the yeast proteome: map of protein-protein interactions.
H.Jeong, S.P. Mason, A.-L. Barabasi, Z.N. Oltvai, Nature, 411, 40-41 (2001);
Role of computation
Data management
Data analysis & interpretation
Prediction
Application
What you need…
A model
A computational tool
Models
Levels of modelling
Abstraction level
Hierarchy in living organisms
Abstraction level of the model
Molecular models
Sequences Structures
Genome Sequences The ‘omics’ era
Software tools
Accelrys
Tripos
MOE
BioSuite
Schrodinger
+ hundreds of academic software bits
What you can do …………. Sequence Space
Determine identity of the molecule
Predict physicochemical properties
Predict three-dimensional structure
Predict function
Apply in pharmaceutical/ other industries
Examples
Accelrys GCG
MOE
BioSuite
Example usage
Examples of GCG capabilities
Sequence Comparison
Database Searching and Retrieval
DNA/RNA Secondary Structure Prediction
Editing and Publication
Evolution
Fragment Assembly
Gene Finding and Pattern Recognition
Sequence Importing and Exporting
Mapping
Primer Selection
Protein Analysis
Single Gene/Protein Sequence analysis- MOE
The colored bars over the sequences reflect the secondary structure of those sequences having associated atomic coordinates. Chains with sequence-only data have no such bars. In this instance, seven of the chains in the family have structural data and can therefore be used as structural templates.
This image illustrates the Residue Identity matrix in MOE, which shows that Chains 13 and 14 have the highest percent identity to the query sequence.
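A residue identity matrix of this kind can be sketched in a few lines of Python. This is only an illustration, not MOE's implementation: the function names, the chain names, and the toy alignment are hypothetical, and the percent identity here is assumed to be the fraction of matching residues over alignment columns where neither sequence has a gap (other denominators are also in common use).

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity between two rows of an alignment (equal length, '-' = gap)."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    # Assumption: only columns without a gap in either sequence are counted.
    columns = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


def identity_matrix(aligned: dict) -> dict:
    """Pairwise percent identity for every pair of named aligned sequences."""
    names = list(aligned)
    return {(p, q): percent_identity(aligned[p], aligned[q])
            for p in names for q in names}


# Hypothetical toy alignment; these are not the chains shown in the slide.
alignment = {
    "query":   "MKT-AYIAK",
    "chain_a": "MKTQAYIAR",
    "chain_b": "MRT-AYLAK",
}
matrix = identity_matrix(alignment)
print(matrix[("query", "chain_a")])  # 87.5
print(matrix[("query", "chain_b")])  # 75.0
```

The chain with the highest value in the query row would be the first candidate for a structural template.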
Whole genome sequence analysis- BioSuite
Structures
Advantages of structural-level studies
The protein folding problem
Sequence-Structure Gap
Need to predict structure using computational methods
Applications
Four levels of protein structure
What you can do …………. Structure Space
Visualize structures
Build molecular models
Manipulate
Analyse
Simulate molecular behaviour
Apply in Drug Discovery
Visualization: Viewer Module of InsightII
Pulldowns
Module Icon
Icon Palette
Command prompt
Information Area
Visualizations
Ligand-Protein Interaction
Aiding NMR structure determination
Aiding crystal structure determination.. X-ray crystallography
Building molecular models
Small molecules
Protein/ Nucleic acid/ Carbohydrates
Predicting Protein Structure
Homology modelling
Threading
Modifications- Site directed mutants
Protein-ligand complexes
BIOPOLYMER
Biopolymer module provides tools for building and modifying a wide range of biological macromolecules, including proteins, peptides, nucleic acids, and carbohydrates.
Backbone structure of the C-terminal fragment of E. coli 50S ribosomal protein (in yellow), predicted from the alpha-carbon trace using the Protein/Backbone command of the Biopolymer module. The crystallographic backbone structure is shown superimposed in blue. The RMS deviation between corresponding backbone atoms of the two structures is 0.52 Angstroms.
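The RMS deviation quoted for such comparisons is the square root of the mean squared distance between corresponding atoms. A minimal sketch follows, assuming the two coordinate sets are already superimposed and listed in the same atom order; the function name and the toy coordinates are illustrative and are not taken from the Biopolymer module.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of (x, y, z) tuples.

    Assumes the structures are already superimposed; no fitting is done here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Toy example: every atom is displaced by exactly 1 Angstrom along z.
predicted = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
crystal = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd(predicted, crystal))  # 1.0
```

In practice the structures are first optimally superimposed (for example with a least-squares fit) before the deviation is reported.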
It is useful in:
Building Proteins and Peptides
Structural Domain Analysis
Building Carbohydrates
Building Nucleic Acids
Structural Database Searching
This module in turn can be used later by other programs for structure refinement and analysis of small and large molecules
Manipulations Eg., Conformation tweaking
The following images are examples of this method of predicting conformations of a few long sidechains of PDB protein 1IC6.A. In each of the following figures, the native conformation is shown colored by element. In the left image, the predicted rotamer (the rotamer with the lowest deltaG) is shown in white. In the right image, all other rotamers generated by the conformational search are shown.
ASP_187
HIS_229
MODELER
MODELER uses a comparative modeling methodology to rapidly build structural models for protein sequences without a known structure. It derives 3D protein models without the time-consuming separate stages of core region identification and loop region building or searching that are inherent to manual homology modeling schemes.
MODELER can create a model even with only one source protein. In this case, the structure for dihydrofolate reductase from Lactobacillus casei is used to generate a model for the E. coli protein. The model is 2.2 Å RMS deviation from the crystal structure of the E. coli protein.
PROFILES – 3D
Profiles-3D offers a unique approach to structure prediction by measuring the compatibility between protein sequences and known protein structures, and then using this information to address the inverse protein folding problem. Profiles-3D enables you to investigate which particular fold an amino acid sequence is likely to adopt.
Benefits:
Profiles-3D can test the validity of a model or preliminary structures derived from experimental data or modeling studies.
Profiles-3D can suggest which 3D structure an amino acid sequence is likely to adopt by relating structural properties to amino acid sequence information.
Reference template proteins identified by Profiles-3D can be used as input to the InsightII Homology and MODELER modules.
This image shows the result of a “Profiles-3D Verify” showing a ribbon drawing of a model of myoglobin, where a single alpha-helix has been purposely misfolded. Profiles-3D has detected the misfolded region, and Insight II has automatically created the subset that was used to color the structure and ribbon.
MATCHMAKER
MatchMaker uses an inverse-folding method to predict the 3D structure of a protein from its amino acid sequence. By comparing a new protein sequence to its topology fingerprint database, MatchMaker assesses the ability of a sequence to adopt characteristic topologies.
Even in the absence of strong sequence similarity, MatchMaker generates high quality structural models.
Examples of MatchMaker output, including a histogram of sequence-structural compatibility (upper right), a sub-optimal alignment plot (upper left), an energy profile (middle left), and a prediction of structural elements (helix/beta strand, buried/exposed) for the input sequence.
Simulations- ‘Discover’
Analysis
Protein characterization
Protein comparison
Sequence-Structure-Function relationships
Active site detection
Ligand binding mode analysis
Electrostatic analysis
Structure Analysis
Quality Check
ProTable is used to analyze and evaluate protein structures. ProTable creates Ramachandran plots, assesses deviation of local geometries and side chain rotameric states from standard protein values, and determines the energetics of each residue.
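A Ramachandran plot shows the backbone torsion angles phi and psi of each residue. Each is a dihedral angle defined by four consecutive backbone atoms: phi by C(i-1), N(i), CA(i), C(i), and psi by N(i), CA(i), C(i), N(i+1). The sketch below computes one such dihedral from four points; it is a generic textbook calculation with an illustrative function name, not ProTable's code.

```python
import math


def _sub(p, q):
    return (p[0] - q[0], p[1] - q[1], p[2] - q[2])


def _dot(p, q):
    return p[0] * q[0] + p[1] * q[1] + p[2] * q[2]


def _cross(p, q):
    return (p[1] * q[2] - p[2] * q[1],
            p[2] * q[0] - p[0] * q[2],
            p[0] * q[1] - p[1] * q[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees for four points (x, y, z)."""
    b0 = _sub(p0, p1)          # bond pointing back from p1 to p0
    b1 = _sub(p2, p1)          # central bond
    b2 = _sub(p3, p2)          # bond pointing forward from p2 to p3
    norm = math.sqrt(_dot(b1, b1))
    if norm == 0.0:
        raise ValueError("central bond has zero length")
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Toy coordinates: the fourth point is rotated a quarter turn about the z axis.
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))  # 90.0
```

Computing phi and psi for every residue and plotting one against the other gives the scatter that is then checked against the allowed regions.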
PROTABLE
These images show the results of a ProTable evaluation of a theoretical model of prostate-specific antigen (2PSA).
MatchMaker energies reveal a loop (highlighted in green) that may require further refinement. Structures are colored by probability (purple and blue are low probability; orange and red are high probability). An automated Ramachandran analysis (right) identifies backbone torsions in borderline or disallowed regions.
DELPHI
DelPhi is a powerful and versatile Poisson-Boltzmann electrostatics simulation engine. DelPhi gives you the ability to determine the specificity of ligand-receptor interactions, which aids in accelerating drug discovery.
DelPhi calculates: Electrostatic properties, including the effects of bulk solvent and ionic strength, for nucleic acids, polysaccharides, and complexes such as glycoproteins and protein/DNA.
HIV protease, rendered with an electrostatic contour surface with a stick rendering of the drug inside the surface. Blue is positive, red is negative charge and gray is neutral.
Applications: Drug Discovery
SITEID
SiteID provides analysis and visualization tools leading to the identification of potential binding sites within or at the surface of biological targets.
The binding pocket of dihydrofolate reductase located by SiteID and shown as a MOLCAD surface. The red areas of the surface indicate contact atoms in the pocket, while the yellow areas show the residues in which those atoms are contained. The inhibitor (methotrexate) is shown in green.
Applications:
Locate ligand binding pockets on a macromolecule.
Identify protein-protein interaction surfaces.
Identify constraints in a novel protein structure for 3D database searching to find or optimize lead compounds.
Active Site Detection: MOE uses a fast geometric algorithm, based on Edelsbrunner’s alpha shapes, to detect candidate protein-ligand and protein-protein binding sites. Individual sites can be visualized or populated with “dummy atoms” for docking calculations or used as starting points for de novo ligand design efforts.
STRUCTURE BASED DESIGN TOOLS
Left: PDB 1AAQ (HIV-1 Protease) and the first site located by the MOE Site Finder. Middle: 1AAQ with the complexed ligand (hydroxyethylene isostere). Right: Hydroxyethylene isostere overlaid with calculated alpha spheres of the first site.
FLEXX
FlexX rapidly docks a conformationally flexible ligand into a binding site, using an incremental construction algorithm that builds the ligand in the active site.
FlexX is composed of four basic components:
Conformational flexibility.
Set of possible protein-ligand interactions.
Scoring function for the interactions.
Algorithm for placement and incremental growth of the ligand from a defined core.
A set of inhibitors docked into the active site of Carboxypeptidase A by FlexX. The protein backbone and the active site surface were rendered using MOLCAD. The active site surface is color-coded by electrostatic potential.
RACHEL
RACHEL performs automated combinatorial optimization of lead compounds by systematically derivatizing user-defined sites on the ligand.
Applications:
Combinatorially enumerate user defined sites on a lead scaffold to optimize binding within a receptor
Bridge high-affinity ligand fragments positioned within the active site
The X-ray structure of N9 influenza virus neuraminidase (2QWK) shown with five ligands generated using RACHEL that are predicted to be active. Hydrogen bonds between the ligands and residues are indicated by dashed yellow lines. The surface was rendered using MOLCAD. Dark purple regions contain a greater acceptor/donor density and light purple regions indicate areas where hydrogen bonding is less likely to occur.
HTS-QSAR: CCG’s unique Binary QSAR methodology is ideal for building pass/fail models from high error content data and standard molecular descriptors. The resulting probabilistic models (based on Bayesian statistical inference) are used as a biasing agent in the design of focused combinatorial libraries.
HIGH THROUGHPUT DISCOVERY TOOLS
Molecular Databases: The MOE Molecular Database is a disk-based spreadsheet central to the manipulation and visualization of large collections of compounds. Data can be imported and exported in various standard file formats and merged with structural or biological activity data.
MOLECULAR DATABASE VIEWER MOLECULAR DATABASE CALCULATOR
CHEMINFORMATICS TOOLS
SEARCH COMPARE
Search Compare provides systematic conformational search and analysis as well as superimposition and molecular similarity analysis.
Using Search Compare, two angiotensin II antagonists are flexibly superimposed based on the field similarity (combined steric and electrostatic potentials).
UNITY
Unity locates compounds in databases that match a pharmacophore or fit a receptor site.
Applications:
Exploration of databases for compounds consistent with a pharmacophore hypothesis
Lead explosion by retrieving similar compounds
Virtual screening of compound databases to discover lead compounds
Determining reagents in commercial databases that support combinatorial chemistry synthesis
A UNITY query constructed at the active site of the streptavidin/biotin complex (1STP). Yellow lines originate at hydrogen bonding sites of the protein (shown as spheres) and terminate within the spatial constraint for complementary ligand sites. A surface constraint at the protein/ligand interface is shown in green. The spatial cap in red accounts for a bifurcated interaction with an Asp carboxyl. Partial match groups are shown in different colors: red, yellow, or green.
CATALYST/SHAPE
Catalyst/SHAPE identifies compounds that possess similar 3D shapes to a specified 3D conformation.
Methotrexate is displayed (left: hydrogens removed) in its conformation bound to the enzyme dihydrofolate reductase. On the right are 3D compounds retrieved from Derwent’s World Drug Index that best fit the shape of the bound conformation of methotrexate. This shape-based 3D search was performed with Accelrys’ Catalyst/SHAPE.
FEATURES:
• Performs flexible shape-based database searches.
• Performs statistical analysis of shape indices of a particular database.
• Simultaneously performs shape and pharmacophore searches via a merged query.
HypoGen
Given only available experimental information such as 2D structures and biological activities of a set of molecules, Catalyst can be used to generate general interaction hypotheses that explain variations in activity across a set of molecules.
Two 5HT3 antagonists (green and yellow) mapped on to a six-feature hypothesis.
C2-LIGAND FIT
C2.LigandFit provides active site finding, flexible docking and scoring capabilities, allowing evaluation of compounds against a receptor site
Active site identification for HIV Protease using the C2•LigandFit flood filling technique
Features
• Active site search by flood filling method
• Fast conformational search for ligand in protein cavity
• Fast grid method for evaluation of protein-ligand interactions
• Clustering of docked conformers
• Multiple scoring functions
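The flood filling idea behind the active site search can be illustrated with a generic grid-based sketch. This is an assumption-laden toy and not the C2•LigandFit algorithm itself: the protein is represented as a hypothetical 3D grid where True marks points occupied by protein atoms, and the fill collects every free point reachable from a seed through face-sharing neighbours.

```python
from collections import deque


def flood_fill_cavity(occupied, seed):
    """Return the set of free grid points connected to `seed`.

    `occupied[x][y][z]` is True where the (hypothetical) protein grid is filled.
    Connectivity is through the six face-sharing neighbours of each point.
    """
    nx, ny, nz = len(occupied), len(occupied[0]), len(occupied[0][0])
    if occupied[seed[0]][seed[1]][seed[2]]:
        return set()  # the seed lies inside the protein
    steps = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    cavity = {seed}
    queue = deque([seed])
    while queue:
        x, y, z = queue.popleft()
        for dx, dy, dz in steps:
            n = (x + dx, y + dy, z + dz)
            if not (0 <= n[0] < nx and 0 <= n[1] < ny and 0 <= n[2] < nz):
                continue  # outside the grid
            if occupied[n[0]][n[1]][n[2]] or n in cavity:
                continue  # protein point, or already visited
            cavity.add(n)
            queue.append(n)
    return cavity


# Toy 3x3x3 grid: fully occupied except a two-point pocket and one isolated hole.
grid = [[[True] * 3 for _ in range(3)] for _ in range(3)]
for x, y, z in [(1, 1, 1), (1, 1, 2), (0, 0, 0)]:
    grid[x][y][z] = False

pocket = flood_fill_cavity(grid, (1, 1, 1))
print(sorted(pocket))  # [(1, 1, 1), (1, 1, 2)]
```

A real site search would also need a way to separate the cavity from bulk solvent and to rank candidate sites by size, which this sketch leaves out.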
C2ADME TOOL
C2ADME provides computational models for the prediction of absorption, distribution, metabolism, and excretion (ADME) properties derived from chemical structures.
Plot of Polar Surface Area (PSA) vs. LogP for a sample of the World Drug Index (WDI) database showing the 95% and 99% confidence limit ellipses corresponding to the Absorption Model. The points are color coded by Absorption level (Good, Moderate, Poor and Very Poor).
Features:
C2•ADME provides computational ADME/Tox prediction tools with the ability to predict problematic New Chemical Entities at an early stage of the development process
C2•ADME currently includes models for passive intestinal absorption, blood-brain barrier (BBB) penetration, and aqueous solubility at 25°C.
In-built utilities
Scripting- automation
Session Folders
Log files
What you should remember …..
Good computational practices
Other users are as important as yourself
Do not use up licenses unduly
Preparation
Evaluate protocol, choice of package, follow job submission rules
Access details
Insight/ Catalyst/ Cerius – SGI machines- base modules- several licenses
Tripos- SGI machines
MOE- Linux platform/ Windows/ SGI
BioSuite- Linux