Drug Design Using Bioinformatics

International Medical University

Introduction to Bioinformatics


Aniqah Zulfa Binti Abdul Latif

MB0710029885

Medical Biotechnology 1/10

Table of Contents

Introduction
Drug Discovery
Bioinformatics Approach to Drug Design
Sequence Annotation Databases
Structure Prediction
High-Throughput and Virtual Screening
  1. High-Throughput Screening
  2. Virtual Screening
    a. Docking Screening
    b. Similarity Screening
ADMET
Future Outlooks
References

INTRODUCTION


Bioinformatics is a field that combines computer science with natural sciences such as biology, chemistry and medicine. It plays an important role in organizing, managing and interpreting biological data. Genomics and proteomics form the backbone of the field. This essay examines how bioinformatics has developed within the drug design process. [1]

Bioinformatics has matured to the point that it has transformed the customary approaches to drug design and development, which increasingly favor computational methodologies. Methods such as high-throughput screening, microarrays, two-dimensional (2D) gel experiments, large-scale mass spectrometry and chemical library screens are recognized for their contribution in bringing many promising and reliable drugs to the community. Besides deepening the molecular and chemical understanding of a drug under development, these methods have also been used to speed up the overall process of drug discovery. [2]

Cited 1 - http://www.ittc.ku.edu/bioinfo_seminar/F07.html

With the drastic increase in the use of computers in scientific research, the major challenge scientists face nowadays is not collecting data but interpreting, analyzing, retrieving and storing it. Most scientific data are held in large-scale databases that contain experimental results, gene sequences, mutations and millions of nucleotide polymorphisms. For example, GenBank contains 39,000,000 sequence records totalling 43 billion bases and occupying 100 gigabytes of disk space. More than 1,000 viral genomes, 200 bacterial genomes and more than a dozen eukaryotic genomes have been sequenced. Finally, the PubMed database contains 15 million abstracts from more than 4,600 journals, occupying more than 40 gigabytes of textual data. [3]

Biologists have been working side by side with computer scientists to manage this so-called “data explosion”. This collaboration has given rise to two new arenas in information science: bioinformatics and cheminformatics. Cheminformatics concentrates on chemistry, which is the backbone of drug design. By combining both fields, scientists are able to predict the pharmaceutical importance of a drug by retrieving and visualizing stored experimental data. [4]

This essay looks at the features of bioinformatics that are significant in pharmaceutical research, specifically in drug design and development. Because the field is so wide, we concentrate on the bioinformatics tools that address the most important pharmaceutical factors, especially the structural prediction of drug targets. We also discuss how the metabolism and toxicity of drugs can be predicted and understood using bioinformatics resources and relevant software. [5]

Drug Discovery

For a drug to have high efficacy and potency, it should be as specific as possible and have as few side effects as possible. A good chemist should therefore identify the drug’s target before designing the drug, and design it according to the specificity of that target and its action. Take proteases as an example: these are enzymes that break down proteins. Proteases support many metabolic activities in the body, but they also play an important role in human disease. In AIDS, the Human Immunodeficiency Virus (HIV) relies on its own protease to cleave viral precursor proteins into the mature proteins needed to assemble new infectious viruses. In osteoporosis, osteoclast cells that attach to the bone surface produce proteases that make bones more fragile. In such cases, the drugs should be designed to act specifically as inhibitors of the protease in question. The major challenge, however, is to achieve enough specificity while lowering the possible side effects of the drug. [6]

Earlier, most of the human genome was still unknown, so drug development was constrained to a small percentage of the possible drug targets. Thanks to bioinformatics, the task of selecting drug targets has become much lighter as more and more genome sequences are identified and stored in gene databases. [7]

In the drug design process, it is also important to understand the function of the proteins involved. To achieve this, a bioinformatician performs a computational analysis that predicts the three-dimensional (3D) structure of the proteins. Software tools can then be used to generate the 3D structure with the coordinates of the desired epitopes. [8]

Bioinformatics Approach to Drug Design

Bioinformatics can take numerous approaches to developing significant and reliable drugs. These approaches include: [9]


Identification and characterization of genes
Identification and characterization of proteins
Molecular phylogeny
Determination of protein structure
Analysis and finding of promoters
Identification and analysis of splice sites
Analysis of genome and proteome
Identification of transcription factor binding sites
Simulation of biochemical
Analysis of DNA microarrays
Analysis and identification of motifs

Sequence Annotation Databases

In pharmaceutical research, it is important to understand and interpret the gene and protein sequences of a particular organism in order to gain an overview of the possible protein drug targets. For example, the routine sequencing of bacteria, parasites and other pathogenic organisms helps scientists identify the basis of their pathogenicity. Moreover, sequencing mammalian genomes has helped in categorizing various drug-metabolizing enzymes, and the gene information is widely used to study protein expression in pharmacology and toxicology experiments. From sequence annotation data, we are able to predict the proteins on which a drug acts, the mechanism of the drug and its metabolism. [10]

Two main providers offer sequence annotation data:

National Center for Biotechnology Information (NCBI)
European Bioinformatics Institute (EBI)

In general, NCBI offers data that are DNA-rich, whereas EBI offers protein-rich information. Other sequence annotation databases include the following:

GenBank
GenBank Stats
Ensembl
EntrezGene
UCSC-GoldenPath
RefSeq
SwissProt
UniProt
TrEMBL
GeneCards
Mouse genome database (MGD)
Rat genome database (RGD)
MAGPIE/BLUEJAY
SymAtlas
CypAlleles DB
Directory of P-450 containing systems
Cytochrome P-450 interaction table
Human membrane transporter database (HMTD)
Transporter page
Human ABC transporter database
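As an illustration of how records from such sequence databases are typically retrieved and read, the following is a minimal sketch in Python. It assumes the NCBI E-utilities `efetch` endpoint and the FASTA text format; the function names and the sample record are hypothetical and are not taken from the sources cited in this essay. The sketch only builds the request URL and parses a local sample, so it runs without network access.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities efetch service.
EUTILS_EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_fasta_url(accession, db="nuccore"):
    """Build an efetch URL that asks for one record in FASTA text format."""
    params = {"db": db, "id": accession, "rettype": "fasta", "retmode": "text"}
    return EUTILS_EFETCH + "?" + urlencode(params)


def parse_fasta(text):
    """Return a dict mapping each FASTA header (without '>') to its sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            # Sequence lines are concatenated and normalized to upper case.
            records[header] += line.upper()
    return records


# Hypothetical record, used so that the sketch needs no network connection.
sample = """>example_1 hypothetical sequence
atggccattg
taatgggccg
"""
records = parse_fasta(sample)
url = efetch_fasta_url("NM_000546")
```

In practice the text returned from the URL would be passed to `parse_fasta` in place of the sample string.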


Structure Prediction

One application of bioinformatics in drug design is to understand the connection between a protein’s amino acid sequence and its 3D structure. The structure of a protein gives an overview of how the protein functions. As a result, the most vital step is the identification and classification of the protein, because this is what allows its 2D and 3D structure to be visualized and, in turn, predicted. [11]

Drug design is facilitated by understanding the structure of the target protein. The prediction starts by identifying the genes and amino acid sequences before moving on to the purified protein, which results in a more accurate prediction of the protein structure. [11]

Thanks to bioinformatics, various databases now offer the 3D structures of proteins and other macromolecules. Examples of such databases are the Molecular Modeling Database (MMDB) and the Protein Data Bank (PDB). [12]
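To show what a 3D structure entry looks like in practice, here is a minimal sketch that reads alpha-carbon (CA) coordinates from PDB-format `ATOM` records and measures the distance between two residues. It assumes the fixed-column layout of the PDB text format; the three sample lines describe a hypothetical two-residue fragment, and the function name is illustrative only.

```python
import math


def parse_ca_atoms(pdb_text):
    """Collect alpha-carbon (CA) atoms from fixed-column PDB ATOM records."""
    atoms = []
    for line in pdb_text.splitlines():
        # Columns 13-16 hold the atom name, 18-20 the residue name,
        # 22 the chain, 23-26 the residue number, 31-54 the x, y, z values.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            atoms.append({
                "res_name": line[17:20].strip(),
                "chain": line[21],
                "res_seq": int(line[22:26]),
                "xyz": (float(line[30:38]),
                        float(line[38:46]),
                        float(line[46:54])),
            })
    return atoms


# Hypothetical fragment; the coordinates are chosen so the result is easy to check.
sample_lines = [
    "ATOM      1  CA  ALA A   1       0.000   0.000   0.000  1.00  0.00           C",
    "ATOM      2  N   GLY A   2       1.000   1.000   1.000  1.00  0.00           N",
    "ATOM      3  CA  GLY A   2       3.000   4.000   0.000  1.00  0.00           C",
]
ca_atoms = parse_ca_atoms("\n".join(sample_lines))

# Euclidean distance between the two alpha carbons, in angstroms.
distance = math.dist(ca_atoms[0]["xyz"], ca_atoms[1]["xyz"])
```

Reading by column position rather than splitting on whitespace matters because adjacent PDB fields can touch each other when values are wide.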

The methods by which protein structure is predicted fall into three standard categories:

Ab initio / de novo prediction
Homology modeling
Fold recognition (threading)

De novo prediction is used when the protein sequence has little or no similarity to any sequence of known structure. It is based on the chemistry and physics of protein structure. Secondly, homology modeling predicts a structure by comparison with a homologous sequence, on the basis that homologous sequences tend to produce similar structures. However, not every homologous sequence yields a structure similar enough to be useful. Thirdly, the threading or fold recognition method is used when two proteins have similar three-dimensional structures but distinct primary sequences. This method can therefore establish a structural alignment that sequence comparison alone would not reveal. MAMMOTH is one of the programs used for structural alignment, and SCOP is a database that classifies proteins by structure. [12]
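The choice between these methods depends largely on how similar the query sequence is to a protein of known structure. The sketch below, written in Python, aligns two sequences with the Needleman-Wunsch global alignment algorithm and reports their percent identity. The scoring values, the 30% cutoff and the function names are assumptions made for illustration; they are not taken from the cited sources, and real tools use substitution matrices and more refined gap penalties.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Globally align two sequences and return the two aligned strings."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aligned_a, aligned_b):
    """Percentage of alignment columns holding the same residue in both rows."""
    same = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * same / len(aligned_a)


def suggest_method(identity, cutoff=30.0):
    """Illustrative rule of thumb only; the cutoff is an assumption."""
    if identity >= cutoff:
        return "homology modeling"
    return "fold recognition or de novo prediction"


# Hypothetical query and template sequences.
aligned = needleman_wunsch("ACDEFGHIK", "ACDFGHIK")
identity = percent_identity(*aligned)
method = suggest_method(identity)
```

Here the second sequence lacks one residue, so the alignment contains a single gap and the remaining positions are identical.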

High-Throughput and Virtual Drug Screening

After a drug has been identified and its structure and function have been predicted, it needs to be tested for its efficacy in vitro as well as in vivo. Several approaches can be used to screen drugs. They can be classified into high-throughput screening and virtual drug screening. [13]

High-Throughput Screening

High-throughput screening is the traditional approach to recognizing a drug’s activity. In this method, large numbers of chemicals are tested systematically against the drug target in vitro. The whole process is automated, so that 100,000 molecules can be screened per day. The test systems may be cell-based or may use whole organisms. [13]
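The data produced by such an automated screen are usually reduced to a list of "hits". The following Python sketch shows one common way of doing this, assuming an inhibition assay in which each plate carries uninhibited and fully inhibited control wells. The 50% threshold, the compound names and the signal values are hypothetical and serve only to illustrate the calculation.

```python
from statistics import mean


def percent_inhibition(signal, uninhibited_mean, inhibited_mean):
    """Scale a raw signal so the uninhibited control is 0% and the
    fully inhibited control is 100%."""
    return 100.0 * (uninhibited_mean - signal) / (uninhibited_mean - inhibited_mean)


def call_hits(wells, uninhibited_controls, inhibited_controls, threshold=50.0):
    """Return the compounds whose percent inhibition reaches the threshold."""
    high = mean(uninhibited_controls)
    low = mean(inhibited_controls)
    hits = {}
    for compound, signal in wells.items():
        inhibition = percent_inhibition(signal, high, low)
        if inhibition >= threshold:
            hits[compound] = inhibition
    return hits


# Hypothetical plate readings in arbitrary signal units.
uninhibited = [1000.0, 980.0, 1020.0]   # wells with no inhibitor
inhibited = [100.0, 90.0, 110.0]        # wells with a known strong inhibitor
wells = {"cmpd_A": 950.0, "cmpd_B": 400.0, "cmpd_C": 190.0}

hits = call_hits(wells, uninhibited, inhibited)
```

With these values, cmpd_B and cmpd_C pass the threshold while cmpd_A, which barely lowers the signal, does not.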

Virtual Drug Screening

Virtual drug screening is an expensive yet precise approach to testing a drug’s activity. It draws on several separate databases that provide sequence and structure information for genes, and uses this information to predict the 3D structure of proteins against which compounds can be screened computationally. The precision of virtual screening depends on the accuracy and completeness of the underlying data. [13]


Virtual screening includes several methods and two of them are:

Docking-Based Virtual Screening

Similarity-Based Virtual Screening

Docking-Based Virtual Screening

This method of virtual screening involves identifying and characterizing the binding sites of the drug’s target proteins. The surface of a target protein can be modeled using docking programs such as DOCK and AutoDock. These programs are used with compound databases such as ZINC to identify potential ligands that can bind to the binding sites of the protein. Moreover, this approach takes the conformation of the protein’s side chains into account when selecting ligands and characterizes the side chains as conserved or non-conserved. Conserved side chains are found in the binding sites of many different proteins, so ligands that bind them are non-specific. Non-conserved side chains, on the other hand, are expected to give more specific binding. We therefore need to identify the degree of specificity of the ligands that target the protein’s binding sites. [14]

The significance of this for drug design is that, if we treat the drugs we want to design as the ligands, we can use this approach to assess how specifically the designed drugs bind to the binding sites of the targeted protein. [14]
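The distinction between conserved and non-conserved binding-site residues can be sketched with a simple calculation. The Python example below assumes that the binding-site residues of several related proteins have already been aligned, and labels each position by the fraction of proteins sharing the most common residue. The sequences, the 0.8 cutoff and the function names are hypothetical; this is not how DOCK or AutoDock work internally.

```python
from collections import Counter


def column_conservation(aligned_sites):
    """For each alignment column, return the fraction of sequences that
    carry the most common residue at that position."""
    length = len(aligned_sites[0])
    if any(len(site) != length for site in aligned_sites):
        raise ValueError("all aligned sequences must have the same length")
    fractions = []
    for column in zip(*aligned_sites):
        top_count = Counter(column).most_common(1)[0][1]
        fractions.append(top_count / len(column))
    return fractions


def classify_positions(aligned_sites, cutoff=0.8):
    """Label each position as conserved or non-conserved (cutoff is an assumption)."""
    return ["conserved" if fraction >= cutoff else "non-conserved"
            for fraction in column_conservation(aligned_sites)]


# Hypothetical binding-site residues from four related proteins.
sites = ["DTGAG", "DTGSG", "DSGLG", "DTGTG"]
fractions = column_conservation(sites)
labels = classify_positions(sites)
```

Positions labeled non-conserved are the ones a designer would favor when aiming for a ligand that binds one protein of the family and not the others.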


Similarity-Based Virtual Screening

This method of virtual screening relies on small-molecule alignment: test ligands are screened against databases of known ligands, and the most similar known ligands serve as references for the test ligands. The similarity of an alignment is scored by the degree to which molecular groups overlap. Examples of programs that make use of this concept are GASP and FlexS. [15]
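The idea of scoring similarity by overlapping molecular groups can be illustrated with the Tanimoto coefficient, a widely used similarity measure for sets of molecular features. Note that this is a simpler, substituted technique: the programs named above superimpose molecules in three dimensions, whereas the Python sketch below only compares sets of feature labels. The ligand names and feature sets are hypothetical.

```python
def tanimoto(features_a, features_b):
    """Tanimoto coefficient: shared features divided by all distinct features."""
    a, b = set(features_a), set(features_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_by_similarity(query, library):
    """Return (name, score) pairs sorted from most to least similar."""
    scored = [(name, tanimoto(query, features))
              for name, features in library.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical test ligand and library of known ligands.
query = {"aromatic_ring", "carboxylic_acid", "hydroxyl"}
library = {
    "ligand_1": {"aromatic_ring", "carboxylic_acid", "ester"},
    "ligand_2": {"aromatic_ring", "carboxylic_acid", "hydroxyl", "amine"},
    "ligand_3": {"amide", "thiol"},
}

ranking = rank_by_similarity(query, library)
```

The known ligand at the top of the ranking would be taken as the reference for the test ligand.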

In addition to ligand alignment, the structure of a ligand’s binding site can also be used to recognize possible drug targets. This approach makes use of both protein structure databases and ligand binding affinity databases; the Binding Database is one example of the latter. With these two resources, we can examine proteins that have comparable functions. Because proteins with similar functions may also possess similar binding sites, we can predict the interactions between the ligands and the targets of the drugs of interest. Relibase is a tool for analyzing ligand binding that reports significant data, including the conformation of binding pockets, the interactions of water molecules and the degree of specificity of the ligands. [15]

On top of that, databases such as the Comprehensive Medicinal Chemistry Database and the MACCS-II Drug Data Report are specially designed screening libraries that report the performance of drug molecules in vivo. Such databases also include the chemical properties of the drug molecules, for instance their hydrogen bonding properties, log P, molecular weight and the possible attachment of certain functional groups. [15]
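Properties of this kind are often combined into simple filters for drug-likeness. As an illustration, the Python sketch below applies Lipinski's rule of five, a well-known heuristic that is not described in the cited sources and is introduced here only as an example. The compounds and their property values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Compound:
    name: str
    molecular_weight: float   # in daltons
    log_p: float              # octanol-water partition coefficient
    h_bond_donors: int
    h_bond_acceptors: int


def rule_of_five_violations(compound):
    """Count how many of Lipinski's four limits the compound exceeds."""
    checks = [
        compound.molecular_weight > 500,
        compound.log_p > 5,
        compound.h_bond_donors > 5,
        compound.h_bond_acceptors > 10,
    ]
    return sum(checks)


def is_drug_like(compound, max_violations=1):
    """The rule is usually read as tolerating at most one violation."""
    return rule_of_five_violations(compound) <= max_violations


# Hypothetical library entries.
library = [
    Compound("cmpd_X", 350.0, 2.1, 2, 5),
    Compound("cmpd_Y", 720.0, 6.3, 6, 12),
]
drug_like = [c.name for c in library if is_drug_like(c)]
```

A filter like this is a coarse first pass; it removes compounds unlikely to be orally available before more expensive screening is applied.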


ADMET

Bioinformatics plays a very important role in describing the ADMET (absorption, distribution, metabolism, excretion and toxicity) of drugs during drug design and development. Many clinical trials have failed to describe the ADMET of drugs in sufficient detail. This is because the ADMET of a drug is an extremely complicated picture: scientists need to understand the behavior of the compound from its entry into the digestive tract until it reaches its target. Many chemical reactions take place in between, and every detail of them is crucial in predicting the ADMET of the compound. [16]

What bioinformatics does in this case is predict the ADMET from collected data such as the size of the compound, its lipophilicity and the presence of particular functional groups. From this information, a QSAR (Quantitative Structure-Activity Relationship) model can be built. A QSAR model attempts to relate the structure and properties of a compound quantitatively to a well-defined process, in this case a biological one. [16]
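In its simplest form, a QSAR model is a regression of a measured activity on one or more molecular descriptors. The Python sketch below fits a straight line relating a single descriptor, log P, to an activity value by ordinary least squares. The data points are hypothetical and the single-descriptor form is an assumption made for brevity; practical QSAR models use many descriptors and are validated on compounds held out from the fit.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Predicted activity for a compound with descriptor value x."""
    return slope * x + intercept


# Hypothetical training set: log P values and measured activities.
log_p = [1.0, 2.0, 3.0, 4.0]
activity = [4.1, 4.9, 6.1, 6.9]

slope, intercept = fit_line(log_p, activity)
predicted = predict(2.5, slope, intercept)
```

For this data the fitted slope is positive, meaning that within the hypothetical series more lipophilic compounds are predicted to be more active.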

Various QSAR programs have been created specifically to predict the ADMET of a compound. One example is ADMET Predictor from Simulations Plus. [16]

1 - ADMET Predictive Software

Because they predict the performance of such a complex system, these ADMET prediction tools achieve only 60 to 70% accuracy. Certain toxicity models, on the other hand, give more reliable results because each is designed for only one specific type of toxicity.


Future Outlook of Bioinformatics in Drug Designing

In the future, the data collected may not be limited to the molecular basis of organisms; physiological and even epidemiological information could also be collected and interpreted. Such information could give a more accurate picture of a specific disease across populations and racial or ethnic groups. Combined with high-throughput in vitro ADMET data, it could be used to predict the likelihood of adverse effects, toxicity and pharmacokinetics across a population.

Given the drastic pace at which bioinformatics is developing, a new and innovative era of medical and health sciences can be expected.

References

1 Special issue: Biological databases, Nucleic Acids Res., 29, 1 (2001).
2 K. Rutherford, J. Parkhill, J. Crook, T. Horsnell, et al., Bioinformatics, 16, 944 (2000).
3 A.A. Schaffer, Y.I. Wolf, C.P. Ponting, E.V. Koonin, et al., Bioinformatics, 15, 1000 (1999).
4 M.J. Callow, S. Dudoit, E.L. Gong, T.P. Speed, et al., Genome Res., 10, 2022 (2000).
5 http://www.genomicglossaries.com/content/chapterinfosourcestext.asp
6 http://www.scfbio-iitd.res.in/tutorial/drugdiscovery.htm
7 http://www.b-eye-network.com/view/852
8 http://www.pharmainfo.net/reviews/computer-aided-drug-design-and-bioinformatics-current-tool-designing
9 http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1609333/
10 http://www.vls3d.com/courses_talk/Villoutreix_intro_drug_design.pdf
11 http://www.mrc-lmb.cam.ac.uk/genomes/madanm/pdfs/medinfo.pdf
12 R. Rodriguez, G. Chinea, N. Lopez, T. Pons, G. Vriend, Homology modeling, model and software evaluation: three related resources, Bioinformatics, 14, 523-528 (1998).
13 http://biospectrumindia.ciol.com/content/careers/10306091.asp
14 A.R. Ortiz, P. Gomez-Puertas, A. Leo-Macias, et al., Computational approaches to model ligand selectivity in drug design, Curr. Top. Med. Chem., 6(1), 41-55 (2006).
15 http://www.slideshare.net/bknanjwade/applications-of-bioinformatics-in-drug-discovery-and-process-presentation
16 http://phobos.ramapo.edu/~pbagga/binf/binf_future.htm
