
Bioinformatics

PVPSIT, Vijayawada, 30th September 2010

Allam Appa Rao, JNTUK

04/19/23 Allam Appa Rao 1

Socrates taught us that the essence of science is measuring, counting and weighing, together with reasoning from postulates or axioms.


Knuth: “Science is what we understand well enough to explain to a computer.”

Dijkstra: “Computer Science is no more about computers than astronomy is about telescopes.” (http://www.quotationspage.com/quote/788.html)

“If you want to understand life, don't think about vibrant, throbbing gels and oozes, think about information technology.” --- Richard Dawkins (University of California, Berkeley; Oxford University), The Blind Watchmaker, 1986, Norton, p. 112.

IT

Over the past few decades, rapid developments in molecular research technologies (MRT) and in information technologies (IT) have combined to produce a tremendous amount of information related to molecular biology.


Bioinformatics: Definition

Bioinformatics is the application of information technology and computer science to the field of molecular biology. The term bioinformatics was coined by Paulien Hogeweg and Ben Hesper in 1970 for the study of informatic processes in biotic systems.


Bioinformatics: Applications

Bioinformatics focuses on developing and applying computationally intensive techniques such as pattern recognition, data mining, machine learning algorithms and visualization.


Bioinformatics: Activities

Common activities in bioinformatics include mapping and analyzing DNA and protein sequences, aligning DNA and protein sequences to compare them, and creating and viewing 3-D models of protein structures.
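Sequence alignment can be illustrated with the classic Needleman–Wunsch dynamic-programming algorithm for global pairwise alignment. The sketch below assumes a simple scoring scheme (match +1, mismatch −1, linear gap −2) chosen only for illustration; it is not the scoring used by tools such as ClustalW or BLAST, which rely on substitution matrices and affine gap penalties.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Global alignment of a and b; returns (score, aligned_a, aligned_b).

    Illustrative scoring only: fixed match/mismatch scores and a linear gap penalty.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,   # gap in b
                              score[i][j - 1] + gap)   # gap in a
    # Trace back from the bottom-right cell to recover one optimal alignment
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))
```

For example, `needleman_wunsch("GATTACA", "GATACA")` returns a score of 4 (six matches and one gap) together with one optimal pair of gapped strings.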


DM: Definition

Diabetes Mellitus is a condition in which the body either does not produce enough, or does not properly respond to, insulin, a hormone produced in the pancreas.


Bioinformatics: Entailment

Bioinformatics entails the creation and advancement of databases, algorithms, computational and statistical techniques, and theory to solve formal and practical problems arising from the management and analysis of biological data.


Bioinformatics: Research

Major research efforts in the field include sequence alignment, gene finding, genome assembly, protein structure alignment, protein structure prediction, prediction of gene expression and protein-protein interactions, genome-wide association studies and the modelling of evolution.


DM: Epidemic

Diabetes Mellitus (DM) affects about 10-15% of Indians and is assuming epidemic proportions; newer, robust therapeutic approaches for both its prevention and its treatment are needed.


Our Work: BDNF

• BDNF is an endogenous protein naturally present in the human body, and hence therapeutics developed using it are expected to have fewer side effects.

• If BDNF can indeed prevent and ameliorate DM, it would pave the way for new therapeutic opportunities.

• BDNF as a therapeutic tool for the prevention and treatment of DM.


Protein folding?

The ability of a protein to fold reliably into a pre-determined conformation despite a near infinite number of possibilities is, despite much research, still poorly understood.

The structure of a protein is determined purely by the amino acid sequence, and the structure of the protein determines the function.

The function of a protein depends entirely on the ability of the protein to fold rapidly and reliably to its native structure. Many proteins fold spontaneously into their native structure in aqueous solution.

It has been suggested that for a protein of 100 amino acids, a purely random conformational search would require around 10^29 years, and yet proteins are able to fold on a timescale of milliseconds to seconds.

This suggests that only a small amount of conformational space is sampled during the folding process and this in turn implies the existence of kinetic folding pathways.

This paradox of how proteins fold rapidly and reliably to their native conformation is known as the protein folding problem.
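The kind of estimate behind this argument (often called Levinthal's paradox) can be reproduced in a few lines. The sketch below assumes three conformations per residue and one conformation sampled every 10^-13 s; these are illustrative assumptions rather than figures from the source, and under them the estimate comes out near 10^27 years. Changing the assumed number of states or the sampling rate shifts the exponent, which is why published figures such as the one quoted above differ.

```python
import math

def random_search_years(residues: int, states_per_residue: int,
                        seconds_per_conformation: float) -> float:
    """Time (years) to enumerate every conformation by random search.

    Assumption: each residue independently adopts one of `states_per_residue`
    conformations, and one conformation is sampled every
    `seconds_per_conformation` seconds.
    """
    conformations = states_per_residue ** residues
    seconds = conformations * seconds_per_conformation
    return seconds / (365.25 * 24 * 3600)

years = random_search_years(100, 3, 1e-13)
print(f"~10^{math.log10(years):.0f} years")  # ~10^27 years
```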

Statistics is the essence of making sense.


Application of Shannon’s information theory breaks genetics and molecular biology out of the descriptive mode into the quantitative mode


George Gamow (1904-68)

Shannon: Information Flow

Information flow in an information-theoretical context is the transfer of information from a variable h to a variable l in a given process. The measure of information flow, p, is defined as the uncertainty before the process started minus the uncertainty after the process terminated. This can be quantified as

$$p = H(h \mid l) - H(h \mid l')$$

where H(h | l) is the conditional entropy (equivocation) of variable h (before the process started) given the variable l (before the process started), and H(h | l') is the conditional entropy (equivocation) of variable h (before the process started) given the variable l' (the value of variable l after the process finished).

Each conditional entropy can be written as H(X | Y) = H(X, Y) − H(Y), where H(X, Y) is the joint entropy, which can be calculated as follows (P(x, y) being the joint probability):

$$H(X,Y) = -\sum_{x}\sum_{y} P(x,y)\,\log_2 P(x,y)$$
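To make the definition of p concrete, the sketch below computes H(h | l) and H(h | l') from small joint probability tables, using H(X | Y) = H(X, Y) − H(Y). The toy process (a secret bit h copied into l) is a hypothetical example chosen so that exactly one bit flows.

```python
import math
from collections import defaultdict

def entropy(dist):
    """Shannon entropy in bits of a {outcome: probability} mapping."""
    return sum(-p * math.log2(p) for p in dist.values() if p > 0)

def conditional_entropy(joint):
    """H(X | Y) = H(X, Y) - H(Y) for a joint {(x, y): probability} mapping."""
    marginal_y = defaultdict(float)
    for (_, y), p in joint.items():
        marginal_y[y] += p
    return entropy(joint) - entropy(marginal_y)

def information_flow(joint_h_l, joint_h_l_after):
    """p = H(h | l) - H(h | l'): the drop in uncertainty about h."""
    return conditional_entropy(joint_h_l) - conditional_entropy(joint_h_l_after)

# Hypothetical process: h is a fair bit; before the process l is constant,
# afterwards l' is an exact copy of h, so one full bit flows from h to l.
before = {(0, "const"): 0.5, (1, "const"): 0.5}
after = {(0, 0): 0.5, (1, 1): 0.5}
print(information_flow(before, after))  # 1.0
```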


Gene Protein

Information in living organisms

One of the prime characteristics of all living organisms is the information they contain for all operational processes.

Braitenberg, a German cyberneticist, has submitted evidence ‘that information is an intrinsic part of the essential nature of life.’ The transmission of information plays a fundamental role in everything that lives.

• Without a doubt, the most complex information processing system in existence is the human body. If we take all human information processes together, that is, conscious ones (language) and unconscious ones (information-controlled functions of the organs, hormone system), this involves the processing of 10^24 bits daily.

• This astronomically high figure is higher by a factor of 1,000,000 than the total human knowledge of 10^18 bits stored in all the world’s libraries.


Alan Turing and the Gatlinburg symposium on information theory in biology

The logic of Turing machines has an isomorphism with the logic of the genetic information system

• Information source: DNA

• Transmission of information: through m/t/r RNA

• Tasks to be completed: transcription, translation

• Output: protein(s)


Information Theory, Evolution, and the Origin of Life, Hubert P. Yockey, p. 35

Shannon, Turing, Gamow and Rao

H (gene) L (protein)


Information Transfer Process

The word “information” derives from the Latin informare, which means “to put into form”.

Latent

• Existing or present

• but concealed or inactive

Manifest

Readily seen, or understood:

apparent, clear, evident, noticeable, observable


DNA Protein

How does the code work?

• Template for construction of proteins


Inherited disease: broken/damaged DNA → broken/damaged proteins

Viral disease: foreign DNA/RNA → foreign proteins (akin to a computer virus)


Latent Information Manifested Information

Manifestation


Genomic InformationACGTCCGGCCTTATACGCTAATAAGCGCTCTATGTCTATACGCGCGATGCCGTACGAGACGCTAATAAGCGCTCTATGTCTATACGCGCGATGCCGTACGAGACGCTAATAAGCGCTCTATGTCTATACGCGCGATGCCGTACGAGACGCTAATAAGCGCTCTATGTCTATACGCGCGATGCCGTACGAGACGCTAATAAGCGCTCTATGTCTATACGCGCGATGCCGTACGAGACGCTAATAAGCGCTCTATGTCTATACGCGCGATGCCGTACGAGACGCTAATAAGCGCTCTATGTCTATACGCGCGATGCCGTACGAGACGCTAATAAGCGCTCTATGTCTATACGCGCGATGCCGTACGAGACGCTAATAAGCGCTCTATGTCTATACGCGCGATGCCGTACGAGACGCTAATAAGCGCTCTATGTCTATACGCGCGATGCCGTACGAGACGCTAATAAGCGCTCTATGTCTATACGCGCGATGCCGTACGAGACGCTAATAAGCGCTCTATGTCTATACGCGCGATGCCGTACGAGACGCTAATAAGCGCTCTATGTCTATACGCGCGATGCCGTACGAGACGCTAATAAGCGCTCTATGTCTATACGCGCGATGCCGTACGAGACGCTAATAAGCGCTCTATGTCTATACGCGCGATGCCGTACGAGACGCTAATAAGCGCTCTATGTCTATACGCGCGATGCCGTACGAGACGCTAATAAGCGCTCTATGTCTATACGCGCGATGCCGTACGAGACGCTAATAAGCGCTCTATGTCTATACGCGCGATGCCGTACGAGACGCTAATAAGCGCTCTATGTCTATACGCGCGATGCCGTACGAGACGCTAATAAGCGCTCTATGTCTATACGCGCGATGCCGTACGAGACGCTAATAAGCGCTCTATGTCTATACGCGCGATGCCGTACGAGACGCTAATAAGCGCTCTATGTCTATACGCGCGATGCCGTACGAG…

What good is all this genetic information?

Information Inheritance

• Human beings are endowed with information, encoded in the genetic material of inheritance, that controls development, reproduction and self-repair.

• The carrier of this information is the complex structure of DNA.


Anything Computable!

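• Markov defined what became known as Markov algorithms, a string-rewriting model of computation; Markov models, notably hidden Markov models (HMMs), are widely used in biological computation

• Alonzo Church used the lambda calculus as a model of computation

• Kurt Gödel defined the recursive functions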
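A Markov algorithm is an ordered list of string-rewriting rules: at each step the first applicable rule rewrites the leftmost occurrence of its pattern, and the run stops when a terminal rule fires or no rule applies. The interpreter below is a minimal sketch of that model; `run_markov_algorithm` and the rule sets in the usage note are hypothetical illustrations, not taken from the source.

```python
def run_markov_algorithm(rules, text, max_steps=10_000):
    """Apply an ordered list of Markov-algorithm rules to `text`.

    Each rule is (pattern, replacement, is_terminal). The first rule whose
    pattern occurs rewrites its leftmost occurrence; the run halts after a
    terminal rule fires or when no rule applies.
    """
    for _ in range(max_steps):
        for pattern, replacement, is_terminal in rules:
            index = text.find(pattern)
            if index != -1:
                text = text[:index] + replacement + text[index + len(pattern):]
                if is_terminal:
                    return text
                break  # restart from the first rule
        else:
            return text  # no rule applied: the algorithm halts
    raise RuntimeError("step limit reached; the algorithm may not halt")
```

For example, the single rule `("ba", "ab", False)` sorts `"baba"` into `"aabb"`, and the rule `("T", "U", False)` rewrites `"ATGTTA"` into `"AUGUUA"`, mimicking only the letter substitution of transcription.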


Genetic Information Flow

• Within the body, genetic information flows from DNA to protein and other products:

– first, by the transcription of portions of the DNA into so-called messenger RNA, and

– second, by translation: the assembly of individual amino acids into polypeptides, including proteins.

This is the process of life and living related to information flow.
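The two steps of this flow can be sketched in code using the standard genetic code (NCBI translation table 1). This is a deliberately simplified model, with assumptions stated: the input is the coding strand, so transcription is just T → U; there are no introns, splicing or regulatory signals; and translation starts at the first AUG and stops at the first stop codon. The function names are illustrative.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in U, C, A, G order
# (first base outermost); '*' marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def transcribe(coding_strand: str) -> str:
    """Simplified transcription: mRNA = coding-strand sequence with U for T."""
    return coding_strand.upper().replace("T", "U")

def translate(mrna: str) -> str:
    """Translate from the first AUG start codon up to the first stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

print(translate(transcribe("CCATGTTTGGCTAAGG")))  # MFG
```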


Paradigm of 'computational thinking'

Advances in computational power and computational methods have led to the crystallization of the paradigm of 'computational thinking'.

This paradigm of 'computational thinking' takes applications of computer science far beyond mere programming and data management.

This paradigm of 'computational thinking' provides new methods for understanding complex lifestyle diseases such as diabetes.


The Value of the Right Tool

3.2 Billion Nucleotides

A “Supercomputer” Cluster

The new Challenges in Computer Science

• Promoter recognition in genomic sequence,

• Understanding data from microarray experiments, and

• Accurate prediction of protein folds from sequences


OVERVIEW

With the widespread availability of nucleotide and amino acid sequences, novel methods for extracting biologically and clinically relevant knowledge are feasible.

Data is deposited on the Internet on websites such as GeneCards, available at http://www.genecards.org/mirror.shtml.

Further information can be obtained from related sites: UniProt (http://www.uniprot.org) and SwissProt (http://www.expasy.org/sprot/).

Using FASTA and CLUSTAL_X programs, similarity scores can be calculated to choose items of interest.
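FASTA's first stage finds short exact word matches (k-tuples) shared by the query and a database sequence and counts them along each diagonal of the comparison matrix; diagonals with many hits mark candidate regions of similarity, which FASTA then rescores and extends. The sketch below covers only that word-hit stage, assuming k = 2 and toy sequences; it is not the FASTA program itself.

```python
from collections import Counter, defaultdict

def ktup_diagonal_hits(query: str, subject: str, k: int = 2) -> Counter:
    """Count shared k-tuples on each diagonal (subject position - query position)."""
    index = defaultdict(list)  # k-tuple -> start positions in subject
    for j in range(len(subject) - k + 1):
        index[subject[j:j + k]].append(j)
    hits = Counter()
    for i in range(len(query) - k + 1):
        for j in index.get(query[i:i + k], []):
            hits[j - i] += 1
    return hits

hits = ktup_diagonal_hits("ACGTAC", "TTACGTAC", k=2)
print(hits.most_common(1))  # [(2, 5)]
```

Here every 2-tuple of the query also occurs two positions further along the subject, so diagonal 2 collects all five hits.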

Further information can be obtained by mining text, either manually or, increasingly, using text-mining tools and resources such as PathBinderH and the GENIA corpus.

Bioinformatics approach to extract information from genes

METHODS AND PROCEDURES

Computerization is necessary to build a database of chromatography-coupled mass spectrometric analytical data.

On the basis of these proteomic data, we can identify as biomarkers the proteins whose expression differs between healthy and diseased conditions.
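A common first pass at such a comparison is to rank proteins by the fold change of their abundance between disease and healthy samples, together with a test statistic. The sketch below assumes replicate intensities per protein (hypothetical numbers) and uses log2 fold change and Welch's t statistic; the source does not specify which statistics were used, and a real study would also need normalization and multiple-testing correction.

```python
import math
from statistics import mean, stdev

def rank_candidate_biomarkers(healthy, disease):
    """Rank proteins by |log2 fold change| (disease vs healthy) with Welch's t.

    `healthy` and `disease` map protein names to replicate intensities
    (at least two positive values per group). Returns a list of
    (name, log2_fold_change, welch_t), largest |log2_fold_change| first.
    """
    results = []
    for name in sorted(healthy.keys() & disease.keys()):
        h, d = healthy[name], disease[name]
        log2_fc = math.log2(mean(d) / mean(h))
        diff = mean(d) - mean(h)
        # Welch's t: difference of means over the unpooled standard error
        se = math.sqrt(stdev(h) ** 2 / len(h) + stdev(d) ** 2 / len(d))
        t = diff / se if se > 0 else (math.copysign(math.inf, diff) if diff else 0.0)
        results.append((name, log2_fc, t))
    return sorted(results, key=lambda r: abs(r[1]), reverse=True)

# Hypothetical replicate intensities for two proteins.
healthy = {"P1": [10, 12, 11], "P2": [5.0, 5.5, 4.5]}
disease = {"P1": [40, 44, 42], "P2": [5.0, 5.0, 5.5]}
for name, fc, t in rank_candidate_biomarkers(healthy, disease):
    print(f"{name}: log2FC={fc:.2f}, t={t:.1f}")
```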

Affinity chromatography is one of the fastest liquid chromatographic methods for the separation and purification of biomolecules, owing to its high molecular specificity.

Proteomics and tools for identifying and understanding the biochemistry of proteins and pathways are in a new stage of development and evaluation.

Computational approach of protein analysis using affinity chromatography: application to proteomics

A rapid and sensitive RP-HPLC method with UV detection (242 nm) was developed for routine analysis of famciclovir in pharmaceutical formulations.

Chromatography was performed with a mobile phase of methanol and phosphate buffer (50:50, v/v) at a flow rate of 1.0 mL min−1.

Quantitation was accomplished with the internal standard method. The procedure was validated for linearity (correlation coefficient = 0.9999), accuracy, robustness and intermediate precision.

Experimental design was used for validation of robustness and intermediate precision.
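With the internal standard method, the analyte peak area is divided by the internal-standard peak area, a calibration line is fitted to these ratios against known concentrations, and the line is inverted to quantify unknowns; the linearity figure reported is the correlation coefficient of such a line. The sketch below assumes an ordinary least-squares fit and uses hypothetical peak areas and concentrations.

```python
import math

def fit_internal_standard_calibration(concentrations, analyte_areas, is_areas):
    """Least-squares line of (analyte area / internal-standard area) vs concentration.

    Returns (slope, intercept, r), r being the Pearson correlation coefficient.
    """
    ratios = [a / s for a, s in zip(analyte_areas, is_areas)]
    n = len(concentrations)
    mx, my = sum(concentrations) / n, sum(ratios) / n
    sxx = sum((x - mx) ** 2 for x in concentrations)
    syy = sum((y - my) ** 2 for y in ratios)
    sxy = sum((x - mx) * (y - my) for x, y in zip(concentrations, ratios))
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept, sxy / math.sqrt(sxx * syy)

def quantify(analyte_area, is_area, slope, intercept):
    """Invert the calibration line to estimate an unknown's concentration."""
    return (analyte_area / is_area - intercept) / slope

# Hypothetical standards (concentration units arbitrary) with a constant IS area.
conc = [10, 20, 30, 40, 50]
slope, intercept, r = fit_internal_standard_calibration(
    conc, [500, 1000, 1500, 2000, 2500], [1000] * 5)
print(f"r = {r:.4f}, unknown = {quantify(1250, 1000, slope, intercept):.1f}")
```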

Contd…

Development and Validation of LC Method for the Determination of Famciclovir in Pharmaceutical Formulation Using an Experimental Design

To test robustness, three factors were considered: percentage (v/v) of methanol in the mobile phase, flow rate and pH. Flow rate, the percentage of organic modifier and pH have a considerable effect on the response.
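The source does not state which experimental design was used for the robustness study. Assuming, for illustration, a two-level full factorial design over the three factors (2^3 = 8 runs with coded levels −1 and +1), each factor's main effect is the mean response at its high level minus the mean response at its low level. The responses below are synthetic, constructed so that the expected effects are known in advance.

```python
from itertools import product

def main_effects(responses):
    """Main effects for a two-level full factorial design.

    `responses` maps a tuple of coded levels (-1 or +1, one per factor) to the
    measured response. Effect = mean(response at +1) - mean(response at -1).
    """
    n_factors = len(next(iter(responses)))
    effects = []
    for f in range(n_factors):
        high = [y for levels, y in responses.items() if levels[f] == +1]
        low = [y for levels, y in responses.items() if levels[f] == -1]
        effects.append(sum(high) / len(high) - sum(low) / len(low))
    return effects

# Synthetic responses for factors (methanol %, flow rate, pH), built as
# 100 + 2*x1 - 3*x2 + 0.5*x3, so the main effects should be 4, -6 and 1.
runs = {levels: 100 + 2 * levels[0] - 3 * levels[1] + 0.5 * levels[2]
        for levels in product((-1, 1), repeat=3)}
print(main_effects(runs))  # [4.0, -6.0, 1.0]
```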

For the intermediate precision measurement, the variables considered were analyst, equipment and number of days. The RSD value (0.86%, n = 24) indicated acceptable precision of the analytical method.
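The RSD is the sample standard deviation expressed as a percentage of the mean. A minimal sketch, using toy values rather than the 24 determinations from the study:

```python
from statistics import mean, stdev

def relative_standard_deviation(values):
    """RSD (%) = 100 * sample standard deviation / mean."""
    return 100 * stdev(values) / mean(values)

print(relative_standard_deviation([9.0, 10.0, 11.0]))  # 10.0
```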

The proposed method was simple, sensitive, precise, accurate and quick, and useful for routine quality control.

Mathematical Analysis of Diabetes Related Proteins Having High Sequence Complexity

We searched for proteins affecting diabetes, determined in which common species these proteins were more prevalent, and performed composition analysis of those proteins having high sequence complexity.
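Protein composition analysis starts from the fraction of each amino acid in a sequence. One simple measure of sequence complexity is the Shannon entropy of that composition, which is low for repetitive, biased sequences and high for diverse ones. The sketch below uses this entropy as an assumed stand-in; the source does not say which complexity measure was used (tools such as SEG, for example, apply the Wootton–Federhen measure over sliding windows).

```python
import math
from collections import Counter

def composition(sequence: str) -> dict:
    """Fraction of each amino acid (one-letter codes) in a protein sequence."""
    counts = Counter(sequence.upper())
    total = sum(counts.values())
    return {aa: n / total for aa, n in sorted(counts.items())}

def shannon_complexity(sequence: str) -> float:
    """Shannon entropy (bits per residue) of the residue composition.

    Ranges from 0 (one repeated residue) up to log2(20), about 4.32 bits,
    when all 20 amino acids are equally frequent.
    """
    return sum(-f * math.log2(f) for f in composition(sequence).values())

print(shannon_complexity("QQQQQQQQ"))  # 0.0
print(shannon_complexity("ACDE"))      # 2.0
```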

About 90% of rat genes have counterparts in the mouse and human genomes, which is why we looked for proteins common to the three species (Rat Genome Sequencing Consortium 2004, www.ratbehaviour.org/Ratsmice.htm).

The distribution pattern of the protein variates was examined, and bivariate plots were then drawn.

Contd…

The bivariate plots show similar clustering for Rattus norvegicus and Mus musculus but some variation for Homo sapiens. This is consistent with Rattus norvegicus and Mus musculus being relatively close in the phylogenetic tree (Sridhar GR et al.), hence their similar clustering.

Proteins lying away from the cluster are outliers because they have different compositional characteristics.

PHYLOGENETIC TREES: DIABETIC COMPLICATIONS AND RELATED CONDITIONS

Bioinformatics analysis of diabetic retinopathy using functional protein sequences

Diabetic retinopathy is the leading cause of blindness among patients with diabetes mellitus.

We evaluated the role of several proteins that are likely to be involved in diabetic retinopathy by performing multiple sequence alignment with the ClustalW tool and constructing a phylogram from functional protein sequences extracted from NCBI.

The phylogram was constructed using the Neighbor-Joining algorithm.
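Neighbor-joining builds a tree by repeatedly merging the pair of taxa that minimizes Q(a, b) = (n − 2)·d(a, b) − Σd(a, ·) − Σd(b, ·), assigning branch lengths to the merged pair and computing distances from the new node to the remaining taxa. The sketch below implements these steps on a small hypothetical distance matrix (taxa a to e); ClustalW and phylogeny packages provide production implementations.

```python
def neighbor_joining(names, dist):
    """Neighbor-joining on a symmetric distance matrix.

    Returns (joins, newick): joins lists (a, b, branch_a, branch_b) for each
    merge; newick is the final unrooted tree as a Newick string.
    """
    d = {a: {b: float(dist[i][j]) for j, b in enumerate(names)}
         for i, a in enumerate(names)}
    nodes = list(names)
    joins = []
    while len(nodes) > 2:
        n = len(nodes)
        totals = {a: sum(d[a][b] for b in nodes) for a in nodes}
        # Pick the pair minimizing Q(a, b) = (n - 2) d(a, b) - totals[a] - totals[b]
        best = None
        for x in range(n):
            for y in range(x + 1, n):
                a, b = nodes[x], nodes[y]
                q = (n - 2) * d[a][b] - totals[a] - totals[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        la = d[a][b] / 2 + (totals[a] - totals[b]) / (2 * (n - 2))
        lb = d[a][b] - la
        joins.append((a, b, la, lb))
        u = f"({a}:{la:g},{b}:{lb:g})"  # the new internal node, labelled by its subtree
        d[u] = {u: 0.0}
        for c in nodes:
            if c not in (a, b):
                d[u][c] = d[c][u] = (d[a][c] + d[b][c] - d[a][b]) / 2
        nodes = [c for c in nodes if c not in (a, b)] + [u]
    a, b = nodes
    return joins, f"({a},{b}:{d[a][b]:g});"

# Hypothetical distances between five taxa.
names = ["a", "b", "c", "d", "e"]
D = [[0, 5, 9, 9, 8],
     [5, 0, 10, 10, 9],
     [9, 10, 0, 8, 7],
     [9, 10, 8, 0, 3],
     [8, 9, 7, 3, 0]]
joins, tree = neighbor_joining(names, D)
print(joins[0])  # ('a', 'b', 2.0, 3.0)
```

On this matrix the first merge joins a and b with branch lengths 2 and 3.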

It was observed that aldose reductase and nitric oxide synthase are closely associated with diabetic retinopathy.

Contd…

It is likely that vascular endothelial growth factor, pro-inflammatory cytokines, advanced glycation end products and adhesion molecules, which also play a role in diabetic retinopathy, may do so by modulating the activities of aldose reductase and nitric oxide synthase.

These results imply that methods designed to normalize aldose reductase and nitric oxide synthase activities could be of significant benefit in the prevention and treatment of diabetic retinopathy.

COMPUTATIONAL PROTEIN SEQUENCES ANALYSIS FOR DIABETIC RETINOPATHY – A BIOINFORMATICS STUDY

The role of bioinformatics is to aid life scientists in gathering and processing genomic data to study protein function.

Another important role is to aid researchers at pharmaceutical companies in making detailed studies of protein structures to facilitate drug design.

The human genome, with about 3 billion nucleotide base pairs, has an estimated 20,000–25,000 protein-coding genes (early estimates were about 30,000), and the functions of many of them are known.

These genes direct the synthesis of different proteins, which differ from one another in their amino acid sequences.

The physiological functions of a protein depend upon this sequence.

Contd…

The functions of the protein butyrylcholinesterase are not well known.

Therefore its amino acid sequence is compared with the sequences of 29 different proteins using computational techniques.

Close similarity is observed with the protein EST2_human, which suggests similar physiological actions.

Findings from such computational techniques, now regarded as indispensable, help life scientists to proceed with their work in wet laboratories.

Contd…

The amino acid sequence of BChE is compared with proteins that act as inhibitors of neovascularisation, and similarity is found with one of the inhibitors. Early onset of diabetic retinopathy is often found in patients who have insufficient BChE in their serum.

This suggests that BChE may act as an inhibitor of the neovascularisation that causes retinopathy.

Bioinformatics analysis of functional protein sequences reveals a role for brain-derived neurotrophic factor in obesity and type 2 diabetes mellitus

Using bioinformatics techniques and sequence analysis algorithms, a comparative study between humans and rodents revealed similarity in the behavior of genes involved in the control of energy homeostasis.

Brain-derived neurotrophic factor (BDNF) modulates the secretion and actions of insulin, leptin, ghrelin, various neurotransmitters and peptides, and pro-inflammatory cytokines involved in energy homeostasis, suggesting that BDNF has a significant role in the pathobiology of obesity and type 2 diabetes mellitus.

Contd…

Based on this evidence, we propose that obesity and type 2 diabetes could be disorders of the brain and that BDNF could serve as a biomarker for predicting their development.

Hence, methods developed to selectively deliver BDNF to appropriate hypothalamic neurons may form a novel approach to their treatment.

Bioinformatics Analysis of Functional Protein Sequences Reveals a Role for Tumor Necrosis Factor-α and Nitric Oxide in Insulin Resistance Syndrome

Using bioinformatics techniques and sequence analysis algorithms, we identified that tumor necrosis factor-α (TNF-α) and nitric oxide (NO) have a significant role in the pathobiology of insulin resistance syndrome, a condition that is common in subjects with abdominal obesity, hypertension, dyslipidemia, atherosclerosis and coronary heart disease, conditions that are accompanied by endothelial dysfunction due to reduced endothelial nitric oxide generation.

TNF-α has neurotoxic actions, stimulates inducible NO synthase activity, and modulates the expression of neurotransmitters involved in the control of feeding and thermogenesis.

Contd…

NO is a neurotransmitter and influences the secretion and actions of various hypothalamic peptides and neuropeptides.

Insulin suppresses the production of TNF-α but stimulates that of endothelial NO.

This close interaction between TNF-α, NO, hypothalamic peptides and insulin suggests that regulation of TNF-α and NO production and action could be critical in the management of insulin resistance syndrome and its associated conditions.

BUTYRYLCHOLINESTERASE

Phylogenetic Tree Construction of Butyrylcholinesterase Sequences in Life Forms

Butyrylcholinesterase is an enzyme with few known physiological functions. It is related to acetylcholinesterase, which has been shown to be expressed in a variety of life forms.

We performed a search using the human butyrylcholinesterase gene (HGNC:983; MIM:177400) and found the sequence in a broad spectrum of organisms, including plants, bacteria and animals.

Therefore, butyrylcholinesterase appears to have arisen early in evolution and to have been conserved.

Serum butyrylcholinesterase in type 2 diabetes mellitus: a biochemical and bioinformatics approach

Background

Butyrylcholinesterase is an enzyme that may serve as a marker of metabolic syndrome. We (a) measured its level in persons with diabetes mellitus and (b) constructed a family tree of the enzyme using nucleotide sequences downloaded from NCBI. Butyrylcholinesterase was estimated colorimetrically using a commercially available kit (Randox Lab, UK). Phylogenetic trees were constructed by a distance method (Fitch and Margoliash method) and by the maximum parsimony method.

Contd…

Results

There was a negative correlation between serum total cholesterol and butyrylcholinesterase (r = -0.407; p < 0.05) and between serum LDL cholesterol and butyrylcholinesterase (r = -0.435; p < 0.05). There was no statistically significant correlation among the other biochemical parameters. In the evolutionary tree construction, both methods gave similar trees, except for an inversion in the positions of Sus scrofa (M62778) and Oryctolagus cuniculus (M62779) between the Fitch and Margoliash and the maximum parsimony methods.
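Maximum parsimony prefers the tree that explains the aligned sequences with the fewest character changes. Fitch's algorithm scores a fixed binary tree in one post-order pass per alignment column; a full maximum-parsimony analysis repeats this scoring over many candidate topologies. The sketch below uses hypothetical four-taxon sequences, not the butyrylcholinesterase data.

```python
def fitch_site(tree, states):
    """Fitch small-parsimony pass for one alignment column.

    `tree` is a nested tuple of leaf names (binary); `states` maps each leaf to
    its character. Returns (candidate_state_set, minimum_number_of_changes).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch_site(left, states)
    right_set, right_cost = fitch_site(right, states)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1  # one extra change

def parsimony_score(tree, sequences):
    """Total Fitch parsimony length of a tree over all aligned columns."""
    length = len(next(iter(sequences.values())))
    return sum(fitch_site(tree, {name: seq[i] for name, seq in sequences.items()})[1]
               for i in range(length))

seqs = {"A": "AC", "B": "AC", "C": "GT", "D": "GT"}  # hypothetical alignment
print(parsimony_score((("A", "B"), ("C", "D")), seqs))  # 2
print(parsimony_score((("A", "C"), ("B", "D")), seqs))  # 4
```

The topology grouping A with B needs fewer changes, so parsimony would prefer it.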

Conclusion

The level of butyrylcholinesterase enzyme was inversely related to serum cholesterol; the dendrogram showed that the structures from evolutionarily close species were placed near each other.

Alzheimer's disease and Type 2 diabetes mellitus: the cholinesterase connection?

Alzheimer's disease and type 2 diabetes mellitus tend to occur together.

We sought to identify protein(s) common to both conditions that could suggest a possible unifying pathogenic role. Using human neuronal butyrylcholinesterase (AAH08396.1) as the reference protein, we used the BLAST tool for protein-to-protein comparison in humans. We found three groups of sequences among a series of 12, with an E-value between 0–12, common to both Alzheimer's disease and diabetes: butyrylcholinesterase precursor K allele (NP_000046.1), acetylcholinesterase isoform E4-E6 precursor (NP_000656.1), and apoptosis-related acetylcholinesterase (1B41|A).
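The BLAST E-value is the number of alignments with at least the observed score expected by chance in a search of that size; smaller values indicate more significant matches. Under Karlin–Altschul statistics, E = K·m·n·e^(−λS) for raw score S, query length m and database length n. The sketch below assumes illustrative values λ ≈ 0.267 and K ≈ 0.041 (roughly those used for BLOSUM62 with gap costs 11/1), hypothetical lengths, and ignores the effective-length corrections BLAST applies.

```python
import math

def evalue_from_raw(score: float, m: int, n: int, K: float, lam: float) -> float:
    """Expected chance hits: E = K * m * n * exp(-lambda * S)."""
    return K * m * n * math.exp(-lam * score)

def bit_score(score: float, K: float, lam: float) -> float:
    """Normalized bit score S' = (lambda * S - ln K) / ln 2."""
    return (lam * score - math.log(K)) / math.log(2)

# Assumed statistical parameters and hypothetical search sizes.
K, LAM = 0.041, 0.267
m, n = 300, 1_000_000  # query length, total database residues
S = 60                 # raw alignment score
E = evalue_from_raw(S, m, n, K, LAM)
# The same E-value follows from the bit score: E = m * n * 2 ** (-S')
assert math.isclose(E, m * n * 2 ** (-bit_score(S, K, LAM)))
print(f"E = {E:.2e}")
```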

Contd…

Butyrylcholinesterase- and acetylcholinesterase-related proteins were found common to both Alzheimer's disease and diabetes; they may play an etiological role by influencing insulin resistance and lipid metabolism.