Computation in Biology Nagasuma Chandra Bioinformatics Centre & SERC IISc


Next-generation biologists must straddle computation and biology

Organism

Organ

Cell

Organelle

Tissue

Supramolecular assembly

Macromolecule

Hierarchical structures in living systems

Genome sequence: a book of life

DOE-Genomes.org

examplesfromenglishtextgenomicbiologytakesaholisticapproachtomolecularbiologyandevolutionbystudyingthecompletegenomeitsgenesanditsproteinexpressionpatternsncbiprovidesseveralgenomicbiologytoolsandresourcesincludingorganismspecificpagesthatincludelinkstomanywebsitesanddatabasesrelevanttothatspeciesweinviteyoutoexplorethelinksprovidedonthispage.


Genomic biology takes a holistic approach to molecular biology and evolution by studying the complete genome, its genes, and its protein expression patterns. NCBI provides several genomic biology tools and resources, including organism-specific pages that include links to many web sites and databases relevant to that species. We invite you to explore the links provided on this page.

Molecular circuitry in the cell

Biochemical networks

www.expasy.ch

Cellular networks

Characteristics of the yeast proteome: map of protein-protein interactions.

H.Jeong, S.P. Mason, A.-L. Barabasi, Z.N. Oltvai, Nature, 411, 40-41 (2001);

Role of computation

• Data management
• Data analysis and interpretation
• Prediction
• Application

What you need…

• A model
• A computational tool

Models

Levels of modelling

Abstraction level

Hierarchy in living organisms

Abstraction level of the model

Molecular models

• Sequences
• Structures

Genome sequences: the ‘omics’ era

Software tools

• Accelrys
• Tripos
• MOE
• BioSuite
• Schrödinger
• Plus hundreds of academic software packages

bits

What you can do: Sequence space

• Determine the identity of the molecule
• Predict physicochemical properties
• Predict three-dimensional structure
• Predict function
• Apply in pharmaceutical and other industries
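As a minimal sketch of what "predict physicochemical properties" can mean at the sequence level, the snippet below computes amino-acid composition and the mean Kyte-Doolittle hydropathy (often called GRAVY) of a protein sequence. The function names are illustrative and are not taken from any of the packages listed here; those packages offer far richer property predictions.

```python
from collections import Counter

# Kyte-Doolittle hydropathy scale (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def composition(seq):
    """Return the fraction of each residue type in a one-letter sequence."""
    seq = seq.upper()
    counts = Counter(seq)
    return {aa: n / len(seq) for aa, n in counts.items()}


def gravy(seq):
    """Return the mean Kyte-Doolittle hydropathy over all residues."""
    seq = seq.upper()
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


if __name__ == "__main__":
    peptide = "MKTAYIAKQR"  # hypothetical example sequence
    print(composition(peptide))
    print(round(gravy(peptide), 3))
```

A positive mean hydropathy suggests a predominantly hydrophobic sequence; a negative value suggests a hydrophilic one.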

Examples

Accelrys GCG

MOE

BioSuite

Example usage

Examples of GCG capabilities

• Sequence comparison
• Database searching and retrieval
• DNA/RNA secondary structure prediction
• Editing and publication
• Evolution
• Fragment assembly
• Gene finding and pattern recognition
• Sequence importing and exporting
• Mapping
• Primer selection
• Protein analysis

Single gene/protein sequence analysis: MOE

The colored bars over the sequences reflect the secondary structure of those sequences having associated atomic coordinates. Chains with sequence-only data have no such bars. In this instance, seven of the chains in the family have structural data and can therefore be used as structural templates.

This image illustrates Residue Identity matrix in MOE which shows Chains 13 and 14 have the highest percent identity to the query sequence.
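The idea behind a residue identity matrix can be sketched in a few lines. The example below assumes the sequences are already aligned (equal length, with "-" for gaps) and counts identities over positions where neither sequence has a gap. This is one common convention and is not necessarily the definition MOE uses; the chain names and sequences are made up for illustration.

```python
def percent_identity(a, b, gap="-"):
    """Percent identity of two aligned sequences, ignoring gapped columns."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not pairs:
        return 0.0
    matches = sum(x == y for x, y in pairs)
    return 100.0 * matches / len(pairs)


def identity_matrix(aligned):
    """Return {(name_i, name_j): percent identity} for every pair of chains."""
    names = list(aligned)
    return {
        (p, q): percent_identity(aligned[p], aligned[q])
        for p in names
        for q in names
    }


if __name__ == "__main__":
    # Hypothetical aligned chains.
    chains = {"query": "ACDEFG", "chain13": "ACDEFA", "chain14": "AC-EYG"}
    matrix = identity_matrix(chains)
    for other in ("chain13", "chain14"):
        print(other, round(matrix[("query", other)], 1))
```

Chains with the highest identity to the query are the natural first candidates for structural templates.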

Whole-genome sequence analysis: BioSuite

Structures

Advantages of structural-level studies

• The protein folding problem
• Sequence-structure gap
• Need to predict structure using computational methods
• Applications

Four levels of protein structure


What you can do: Structure space

• Visualize structures
• Build molecular models
• Manipulate
• Analyse
• Simulate molecular behaviour
• Apply in drug discovery

Visualization: Viewer module of InsightII

• Pulldowns
• Module icon
• Icon palette
• Command prompt
• Information area

Visualizations

Ligand-Protein Interaction

Aiding NMR structure determination

Aiding crystal structure determination: X-ray crystallography

Building molecular models

• Small molecules
• Proteins, nucleic acids, carbohydrates
• Predicting protein structure: homology modelling, threading
• Modifications: site-directed mutants
• Protein-ligand complexes

BIOPOLYMER

Biopolymer module provides tools for building and modifying a wide range of biological macromolecules, including proteins, peptides, nucleic acids, and carbohydrates.

Backbone structure of the C-terminal fragment of E. coli 50S ribosomal protein (in yellow), predicted from the carbon trace using the Protein/Backbone command of the Biopolymer module. The crystallographic backbone structure is shown superimposed in blue. The RMS deviation between corresponding backbone atoms of the two structures is 0.52 Angstroms.
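The RMS deviation quoted above is the root of the mean squared distance between corresponding atoms. A minimal sketch follows, assuming the two structures are already superimposed and their atoms are listed in the same order. The coordinates are made-up numbers, not the ribosomal protein data.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of (x, y, z)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(coords_a, coords_b):
        total += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(total / len(coords_a))


if __name__ == "__main__":
    # Hypothetical backbone atoms from a predicted and a crystal structure.
    predicted = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.2, 0.0)]
    crystal = [(0.1, 0.0, 0.0), (1.4, 0.1, 0.0), (3.0, 0.0, 0.3)]
    print(round(rmsd(predicted, crystal), 3))
```

Modelling packages normally perform an optimal superposition before reporting this number; that step is omitted here.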

It is useful in:

• Building proteins and peptides
• Structural domain analysis
• Building carbohydrates
• Building nucleic acids
• Structural database searching

The output of this module can in turn be used by other programs for structure refinement and analysis of small and large molecules.

Manipulations, e.g., conformation tweaking

The following images are examples of this method of predicting conformations of a few long sidechains of PDB protein 1IC6.A. In each of the following figures, the native conformation is shown colored by element. In the left image, the predicted rotamer (the rotamer with the lowest deltaG) is shown in white. In the right image, all other rotamers generated by the conformational search are shown.

ASP_187

HIS_229

MODELER

MODELER uses a comparative modeling methodology to rapidly build structural models for protein sequences without a known structure. It derives 3D protein models without the time-consuming separate stages of core region identification and loop region building or searching that are inherent to manual homology modeling schemes.

MODELER can create a model even with only one source protein. In this case, the structure of dihydrofolate reductase from Lactobacillus casei is used to generate a model for the E. coli protein. The model has a 2.2 Å RMS deviation from the crystal structure of the E. coli protein.

PROFILES-3D

Profiles-3D offers a unique approach to structure prediction by measuring the compatibility between protein sequences and known protein structures, and then using this information to address the inverse protein folding problem. Profiles-3D enables you to investigate which particular fold an amino acid sequence is likely to adopt.

Benefits:

Profiles-3D can test the validity of a model or preliminary structures derived from experimental data or modeling studies.

Profiles-3D can suggest which 3D structure an amino acid sequence is likely to adopt by relating structural properties to amino acid sequence information.

Reference template proteins identified by Profiles-3D can be used as input to the InsightII Homology and MODELER modules.

This image shows the result of a “Profiles-3D Verify” run: a ribbon drawing of a model of myoglobin in which a single alpha-helix has been purposely misfolded. Profiles-3D has detected the misfolded region, and Insight II has automatically created the subset that was used to color the structure and ribbon.

MATCHMAKER

MatchMaker uses an inverse-folding method to predict the 3D structure of a protein from its amino acid sequence. By comparing a new protein sequence to its topology fingerprint database, MatchMaker assesses the ability of a sequence to adopt characteristic topologies.

Even in the absence of strong sequence similarity, MatchMaker generates high quality structural models.

Examples of MatchMaker output, including a histogram of sequence-structural compatibility (upper right), a sub-optimal alignment plot (upper left), an energy profile (middle left), and a prediction of structural elements (helix/beta strand, buried/exposed) for the input sequence.

Simulations- ‘Discover’

Analysis

Protein characterization Protein Comparison

Sequence-Structure-Function relationships

Active site detection Ligand Binding mode analysis Electrostatic analysis

Structure Analysis

Quality Check

ProTable is used to analyze and evaluate protein structures. ProTable creates Ramachandran plots, assesses deviation of local geometries and side chain rotameric states from standard protein values, and determines the energetics of each residue.
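A Ramachandran plot is a scatter of the backbone torsion angles phi and psi for each residue, where phi is the dihedral C(i-1)-N(i)-CA(i)-C(i) and psi is N(i)-CA(i)-C(i)-N(i+1). The sketch below shows how such a torsion angle can be computed from four atomic positions. It is a generic geometric calculation with illustrative function names, not ProTable's code.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral(p0, p1, p2, p3):
    """Signed torsion angle in degrees about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)
    # Components of b0 and b2 perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))


def phi_psi(c_prev, n, ca, c, n_next):
    """Backbone torsions of one residue from five atom positions."""
    return dihedral(c_prev, n, ca, c), dihedral(n, ca, c, n_next)
```

Plotting psi against phi for every residue and comparing the points with the usual allowed regions is what flags backbone torsions as borderline or disallowed.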

PROTABLE

These images show the results of a ProTable evaluation of a theoretical model of prostate-specific antigen (2PSA).

MatchMaker energies reveal a loop (highlighted in green) that may require further refinement. The structure is colored by probability (purple and blue are low probability; orange and red are high probability). An automated Ramachandran analysis (right) identifies backbone torsions in borderline or disallowed regions.

DELPHI

DelPhi is a Poisson-Boltzmann electrostatics simulation engine. It helps you assess the specificity of ligand-receptor interactions, which can accelerate drug discovery.

DelPhi calculates electrostatic properties, including the effects of bulk solvent and ionic strength, for nucleic acids, polysaccharides, and complexes such as glycoproteins and protein/DNA.
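For orientation, the linearized Poisson-Boltzmann equation is commonly written in Gaussian units as shown below. This is the standard textbook form rather than a formula taken from the DelPhi documentation, and the symbols are introduced here for illustration: \phi is the electrostatic potential, \varepsilon the position-dependent dielectric constant, \rho the fixed charge density of the solute, and \kappa the Debye-Hückel screening parameter, which is non-zero only in the solvent and depends on the ionic strength I.

```latex
\nabla \cdot \left[ \varepsilon(\mathbf{r}) \, \nabla \phi(\mathbf{r}) \right]
  - \varepsilon(\mathbf{r}) \, \kappa^{2}(\mathbf{r}) \, \phi(\mathbf{r})
  = -4 \pi \rho(\mathbf{r}),
\qquad
\kappa^{2} = \frac{8 \pi e^{2} I}{\varepsilon k_{B} T}
```

The first term carries the effect of the solvent dielectric, and the \kappa^{2} term carries the effect of ionic strength; setting I = 0 recovers the Poisson equation.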

HIV protease, rendered with an electrostatic contour surface with a stick rendering of the drug inside the surface. Blue is positive, red is negative charge and gray is neutral.

Applications: Drug Discovery

SITEID

SiteID provides analysis and visualization tools leading to the identification of potential binding sites within or at the surface of biological targets.

The binding pocket of dihydrofolate reductase located by SiteID and shown as a MOLCAD surface. The red areas of the surface indicate contact atoms in the pocket, while the yellow areas show the residues in which those atoms are contained. The inhibitor (methotrexate) is shown in green.

Applications:

• Locate ligand binding pockets on a macromolecule.
• Identify protein-protein interaction surfaces.
• Identify constraints in a novel protein structure for 3D database searching to find or optimize lead compounds.

Active Site Detection: MOE uses a fast geometric algorithm, based on Edelsbrunner’s alpha shapes, to detect candidate protein-ligand and protein-protein binding sites. Individual sites can be visualized or populated with “dummy atoms” for docking calculations or as starting points for de novo ligand design efforts.

STRUCTURE BASED DESIGN TOOLS

Left: PDB 1AAQ (HIV-1 protease) and the first site located by the MOE Site Finder. Middle: 1AAQ with the complexed ligand (hydroxyethylene isostere). Right: hydroxyethylene isostere overlaid with the calculated alpha spheres of the first site.

FLEXX

FlexX rapidly docks a conformationally flexible ligand into a binding site, using an incremental construction algorithm that builds the ligand in the active site.

FlexX is composed of four basic components:

Conformational flexibility.

Set of possible protein-ligand interactions.

Scoring function for the interactions.

Algorithm for placement and incremental growth of the ligand from a defined core.

A set of inhibitors docked into the active site of Carboxypeptidase A by FlexX. The protein backbone and the active site surface were rendered using MOLCAD. The active site surface is color-coded by electrostatic potential.


RACHEL

RACHEL performs automated combinatorial optimization of lead compounds by systematically derivatizing user-defined sites on the ligand.

Applications:

Combinatorially enumerate user defined sites on a lead scaffold to optimize binding within a receptor

Bridge high-affinity ligand fragments positioned within the active site

The X-ray structure of N9 influenza virus neuraminidase (2QWK) shown with five ligands generated using RACHEL that are predicted to be active. Hydrogen bonds between the ligands and residues are indicated by dashed yellow lines. The surface was rendered using MOLCAD. Dark purple regions contain a greater acceptor/donor density, and light purple regions indicate areas where hydrogen bonding is less likely to occur.

HTS-QSAR: CCG’s Binary QSAR methodology is designed for building pass/fail models from high-error-content data and standard molecular descriptors. The resulting probabilistic models (based on Bayesian statistical inference) are used as a biasing agent in the design of focused combinatorial libraries.

HIGH THROUGHPUT DISCOVERY TOOLS

Molecular Databases: The MOE Molecular Database is a disk-based spreadsheet central to the manipulation and visualization of large collections of compounds. Data can be imported and exported in various standard file formats and merged with structural or biological activity data.

MOLECULAR DATABASE VIEWER

MOLECULAR DATABASE CALCULATOR

CHEMINFORMATICS TOOLS

SEARCH COMPARE

Search Compare provides systematic conformational search and analysis, as well as superimposition and molecular similarity analysis.

Using Search Compare, two angiotensin II antagonists are flexibly superimposed based on the field similarity (combined steric and electrostatic potentials).

UNITY

Unity locates compounds in databases that match a pharmacophore or fit a receptor site.

Applications:

Exploration of databases for compounds consistent with a pharmacophore hypothesis

Lead explosion by retrieving similar compounds

Virtual screening of compound databases to discover lead compounds

Determining reagents in commercial databases that support combinatorial chemistry synthesis
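Retrieving similar compounds for lead explosion can be illustrated with a generic fingerprint comparison. The sketch below ranks a small library by Tanimoto similarity to a query, treating each compound as a set of "on" fingerprint bits. The source does not say which similarity measure UNITY uses, so Tanimoto is an assumption chosen because it is widely used; the compound names, bit sets, and threshold are invented for the example.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bits."""
    a, b = set(fp_a), set(fp_b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def most_similar(query, library, threshold=0.7):
    """Return (name, score) pairs at or above the threshold, best first."""
    hits = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    hits = [(name, score) for name, score in hits if score >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical fingerprints: each integer is the index of a set bit.
    lead = {1, 4, 7, 9, 12}
    library = {
        "cmpd_A": {1, 4, 7, 9, 12},
        "cmpd_B": {1, 4, 7, 9},
        "cmpd_C": {2, 3, 5},
    }
    print(most_similar(lead, library))
```

Raising the threshold narrows the search to close analogues; lowering it widens the set of retrieved compounds.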

A UNITY query constructed at the active site of the streptavidin/biotin complex (1STP). Yellow lines originate at hydrogen bonding sites of the protein (shown as spheres) and terminate within the spatial constraint for complementary ligand sites. A surface constraint at the protein/ligand interface is shown in green. The spatial cap in red accounts for a bifurcated interaction with an Asp carboxyl. Partial match groups are shown in different colors: red, yellow, or green.

CATALYST/SHAPE

Catalyst/SHAPE identifies compounds that possess similar 3D shapes to a specified 3D conformation.

Methotrexate, a dihydrofolate reductase inhibitor, is displayed (left: hydrogens removed) in its conformation bound to the enzyme. On the right are 3D compounds retrieved from Derwent’s World Drug Index that best fit the shape of the bound conformation of methotrexate. This shape-based 3D search was performed with Accelrys’ Catalyst/SHAPE.

FEATURES:

• Performs flexible shape-based database searches.
• Performs statistical analysis of shape indices of a particular database.
• Simultaneously performs shape and pharmacophore searches via a merged query.

HypoGen

Given only available experimental information such as 2D structures and biological activities of a set of molecules, Catalyst can be used to generate general interaction hypotheses that explain variations in activity across a set of molecules.

Two 5HT3 antagonists (green and yellow) mapped on to a six-feature hypothesis.

C2•LIGANDFIT

C2•LigandFit provides active site finding, flexible docking, and scoring capabilities, allowing evaluation of compounds against a receptor site.

Active site identification for HIV protease using the C2•LigandFit flood-filling technique

Features

• Active site search by flood filling method

• Fast conformational search for ligand in protein cavity

• Fast grid method for evaluation of protein-ligand interactions

• Clustering of docked conformers

• Multiple scoring functions
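The flood-filling idea behind the active site search can be shown on a toy grid. In the sketch below, "#" marks grid points occupied by protein and "." marks empty points; starting from a seed, the fill collects every empty point connected to it. This is a generic two-dimensional flood fill written for illustration. The actual C2•LigandFit procedure works on a 3D grid with its own criteria, which the source does not describe.

```python
from collections import deque


def flood_fill(grid, seed):
    """Return the set of empty cells ('.') connected to the seed cell."""
    rows, cols = len(grid), len(grid[0])
    r0, c0 = seed
    if grid[r0][c0] != ".":
        return set()
    seen = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and grid[nr][nc] == "." and (nr, nc) not in seen:
                seen.add((nr, nc))
                queue.append((nr, nc))
    return seen


if __name__ == "__main__":
    # Hypothetical slice through a protein with two separate cavities.
    protein = [
        "#####",
        "#..##",
        "#.###",
        "###.#",
        "#####",
    ]
    print(sorted(flood_fill(protein, (1, 1))))
```

Each connected set of empty points is a candidate cavity; larger sets would be the more plausible ligand binding sites.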

C2ADME TOOL

C2•ADME provides computational models for the prediction of absorption, distribution, metabolism, and excretion (ADME) properties derived from chemical structures.

Plot of Polar Surface Area (PSA) vs. LogP for a sample of the World Drug Index (WDI) database showing the 95% and 99% confidence limit ellipses corresponding to the Absorption Model. The points are color coded by absorption level (Good, Moderate, Poor, and Very Poor).

Features:

C2•ADME provides computational ADME/Tox prediction tools with the ability to predict problematic New Chemical Entities at an early stage of the development process

C2•ADME currently includes models for passive intestinal absorption, blood-brain barrier (BBB) penetration, and aqueous solubility at 25°C.

In-built utilities

• Scripting (automation)
• Session folders
• Log files

What you should remember

• Good computational practices
• Other users are as important as yourself
• Do not use up licenses unduly
• Preparation: evaluate the protocol and the choice of package, and follow job submission rules

Access details

• Insight/Catalyst/Cerius: SGI machines, base modules, several licenses
• Tripos: SGI machines
• MOE: Linux platform/Windows/SGI
• BioSuite: Linux

Documents
