Bioinformatics 1

7/31/2019 Bioinformatics 1

1/24

Bioinformatics

Bioinformatics is a branch of biological science whichdeals with the study of methods for storing, retrieving andanalyzing biological data, such as nucleic acid (DNA/RNA)and protein sequences, structures, functions, pathwaysand genetic interactions. It generates new knowledge

about drug designing and development of new softwaretools. Bioinformatics also deals with algorithms, databases andinformation systems, web technologies, artificial intelligenceand soft computing, information and computation theory,structural biology, software engineering, data mining, imageprocessing, modeling and simulation, signal processing,discrete mathematics, control and system theory, circuit theory,and statistics.

Commonly used software tools and technologies in this fieldincludeJava,XML,Perl,C,C++,Python,R,MySQL,SQL,CUDA,MATLAB, andMicrosoft Excel.
http://en.wikipedia.org/wiki/Java_(programming_language)http://en.wikipedia.org/wiki/Java_(programming_language)http://en.wikipedia.org/wiki/Java_(programming_language)http://en.wikipedia.org/wiki/XMLhttp://en.wikipedia.org/wiki/XMLhttp://en.wikipedia.org/wiki/XMLhttp://en.wikipedia.org/wiki/Perlhttp://en.wikipedia.org/wiki/Perlhttp://en.wikipedia.org/wiki/Perlhttp://en.wikipedia.org/wiki/C_(programming_language)http://en.wikipedia.org/wiki/C_(programming_language)http://en.wikipedia.org/wiki/C_(programming_language)http://en.wikipedia.org/wiki/C%2B%2Bhttp://en.wikipedia.org/wiki/C%2B%2Bhttp://en.wikipedia.org/wiki/C%2B%2Bhttp://en.wikipedia.org/wiki/Python_(programming_language)http://en.wikipedia.org/wiki/Python_(programming_language)http://en.wikipedia.org/wiki/Python_(programming_language)http://en.wikipedia.org/wiki/R_(programming_language)http://en.wikipedia.org/wiki/R_(programming_language)http://en.wikipedia.org/wiki/MySQLhttp://en.wikipedia.org/wiki/MySQLhttp://en.wikipedia.org/wiki/MySQLhttp://en.wikipedia.org/wiki/SQLhttp://en.wikipedia.org/wiki/SQLhttp://en.wikipedia.org/wiki/SQLhttp://en.wikipedia.org/wiki/CUDAhttp://en.wikipedia.org/wiki/CUDAhttp://en.wikipedia.org/wiki/CUDAhttp://en.wikipedia.org/wiki/CUDAhttp://en.wikipedia.org/wiki/MATLABhttp://en.wikipedia.org/wiki/MATLABhttp://en.wikipedia.org/wiki/Microsoft_Excelhttp://en.wikipedia.org/wiki/Microsoft_Excelhttp://en.wikipedia.org/wiki/Microsoft_Excelhttp://en.wikipedia.org/wiki/Microsoft_Excelhttp://en.wikipedia.org/wiki/MATLABhttp://en.wikipedia.org/wiki/CUDAhttp://en.wikipedia.org/wiki/CUDAhttp://en.wikipedia.org/wiki/SQLhttp://en.wikipedia.org/wiki/MySQLhttp://en.wikipedia.org/wiki/R_(programming_language)http://en.wikipedia.org/wiki/Python_(programming_language)http://en.wikipedia.org/wiki/C%2B%2Bhttp://en.wikipedia.org/wiki/C_(programming_language)http://en.wikipedia.org/wiki/Perlhttp://en.wikipedia.org/wiki/XMLhttp://en.wikipedia.org/wiki/Java_(programming_language)


2/24

Introduction


3/24

Introduction

Bioinformatics was applied in the creation and maintenance of

a database to store biological information at the beginning of

the "genomic revolution", such asnucleotide

sequencesandamino acid sequences. Development of this

type of database involved not only design issues but the

development of complex interfaces whereby researchers could

access existing data as well as submit new or revised data.

In order to study how normal cellular activities are altered in

different disease states, the biological data must be combined

to form a comprehensive picture of these activities. Therefore,

the field of bioinformatics has evolved such that the most

pressing task now involves the analysis and interpretation of

various types of data. This includes nucleotide and amino acid

sequences, protein domains, and protein structures.[1]The

actual process of analyzing and interpreting data is referred to

as computational biology. Important sub-disciplines within

bioinformatics and computational biology include:
http://en.wikipedia.org/wiki/Nucleotide_sequenceshttp://en.wikipedia.org/wiki/Nucleotide_sequenceshttp://en.wikipedia.org/wiki/Nucleotide_sequenceshttp://en.wikipedia.org/wiki/Nucleotide_sequenceshttp://en.wikipedia.org/wiki/Amino_acid_sequencehttp://en.wikipedia.org/wiki/Amino_acid_sequencehttp://en.wikipedia.org/wiki/Amino_acid_sequencehttp://en.wikipedia.org/wiki/Bioinformatics#cite_note-0http://en.wikipedia.org/wiki/Bioinformatics#cite_note-0http://en.wikipedia.org/wiki/Bioinformatics#cite_note-0http://en.wikipedia.org/wiki/Bioinformatics#cite_note-0http://en.wikipedia.org/wiki/Amino_acid_sequencehttp://en.wikipedia.org/wiki/Nucleotide_sequenceshttp://en.wikipedia.org/wiki/Nucleotide_sequences


4/24

the development and implementation of tools that enable

efficient access to, and use and management of, various

types of information.

the development of new algorithms (mathematical formulas)

and statistics with which to assess relationships among

members of large data sets. For example, methods to locate

a gene within a sequence, predict protein structure and/or

function, and cluster protein sequences into families of

related sequences.

The primary goal of bioinformatics is to increase the

understanding of biological processes. What sets it apart from

other approaches, however, is its focus on developing and

applying computationally intensive techniques to achieve this

goal. Examples include:pattern recognition,data

mining,machine learningalgorithms, andvisualization. Major

research efforts in the field includesequence alignment,gene

finding,genome assembly,drug design,drug discovery,protein

structure alignment,protein structure prediction, prediction
http://en.wikipedia.org/wiki/Pattern_recognitionhttp://en.wikipedia.org/wiki/Pattern_recognitionhttp://en.wikipedia.org/wiki/Pattern_recognitionhttp://en.wikipedia.org/wiki/Data_mininghttp://en.wikipedia.org/wiki/Data_mininghttp://en.wikipedia.org/wiki/Data_mininghttp://en.wikipedia.org/wiki/Data_mininghttp://en.wikipedia.org/wiki/Machine_learninghttp://en.wikipedia.org/wiki/Machine_learninghttp://en.wikipedia.org/wiki/Machine_learninghttp://en.wikipedia.org/wiki/Biological_Data_Visualizationhttp://en.wikipedia.org/wiki/Biological_Data_Visualizationhttp://en.wikipedia.org/wiki/Biological_Data_Visualizationhttp://en.wikipedia.org/wiki/Sequence_alignmenthttp://en.wikipedia.org/wiki/Sequence_alignmenthttp://en.wikipedia.org/wiki/Sequence_alignmenthttp://en.wikipedia.org/wiki/Gene_findinghttp://en.wikipedia.org/wiki/Gene_findinghttp://en.wikipedia.org/wiki/Gene_findinghttp://en.wikipedia.org/wiki/Gene_findinghttp://en.wikipedia.org/wiki/Genome_assemblyhttp://en.wikipedia.org/wiki/Genome_assemblyhttp://en.wikipedia.org/wiki/Genome_assemblyhttp://en.wikipedia.org/wiki/Drug_designhttp://en.wikipedia.org/wiki/Drug_designhttp://en.wikipedia.org/wiki/Drug_designhttp://en.wikipedia.org/wiki/Drug_discoveryhttp://en.wikipedia.org/wiki/Drug_discoveryhttp://en.wikipedia.org/wiki/Drug_discoveryhttp://en.wikipedia.org/wiki/Protein_structural_alignmenthttp://en.wikipedia.org/wiki/Protein_structural_alignmenthttp://en.wikipedia.org/wiki/Protein_structural_alignmenthttp://en.wikipedia.org/wiki/Protein_structural_alignmenthttp://en.wikipedia.org/wiki/Protein_structure_predictionhttp://en.wikipedia.org/wiki/Protein_structure_predictionhttp://en.wikipedia.org/wiki/Protein_structure_predictionhttp://en.wikipedia.org/wiki/Protein_structure_predictionhttp://en.wikipedia.org/wiki/Protein_structural_alignmenthttp://en.wikipedia.org/wiki/Protein_structural_alignmenthttp://en.wikipedia.org/wiki/Drug_discoveryhttp://en.wikipedia.org/wiki/Drug_designhttp://en.wikipedia.org/wiki/Genome_assemblyhttp://en.wikipedia.org/wiki/Gene_findinghttp://en.wikipedia.org/wiki/Gene_findinghttp://en.wikipedia.org/wiki/Sequence_alignmenthttp://en.wikipedia.org/wiki/Biological_Data_Visualizationhttp://en.wikipedia.org/wiki/Machine_learninghttp://en.wikipedia.org/wiki/Data_mininghttp://en.wikipedia.org/wiki/Data_mininghttp://en.wikipedia.org/wiki/Pattern_recognition


5/24

ofgene expressionandproteinprotein interactions,genome-

wide association studiesand the modeling ofevolution.

Interestingly, the term bioinformaticswas coined before the

"genomic revolution".Paulien Hogewegand Ben Hesper

introduced the term in 1978 to refer to "the study of information

processes in biotic systems".[2][3]

This definition placed

bioinformatics as a field parallel to biophysics or biochemistry

(biochemistry is the study of chemical processes in biological

systems).[3]However, its primary use since at least the late

1980s has been to describe the application of computer

science and information sciences to the analysis of biological

data, particularly in those areas of genomics involving large-

scaleDNA sequencing.

Bioinformatics now entails the creation and advancement of

databases, algorithms, computational and statistical techniques

and theory to solve formal and practical problems arising from

the management and analysis of biological data.
http://en.wikipedia.org/wiki/Gene_expressionhttp://en.wikipedia.org/wiki/Gene_expressionhttp://en.wikipedia.org/wiki/Gene_expressionhttp://en.wikipedia.org/wiki/Protein%E2%80%93protein_interactionshttp://en.wikipedia.org/wiki/Protein%E2%80%93protein_interactionshttp://en.wikipedia.org/wiki/Protein%E2%80%93protein_interactionshttp://en.wikipedia.org/wiki/Protein%E2%80%93protein_interactionshttp://en.wikipedia.org/wiki/Protein%E2%80%93protein_interactionshttp://en.wikipedia.org/wiki/Genome-wide_association_studieshttp://en.wikipedia.org/wiki/Genome-wide_association_studieshttp://en.wikipedia.org/wiki/Genome-wide_association_studieshttp://en.wikipedia.org/wiki/Genome-wide_association_studieshttp://en.wikipedia.org/wiki/Evolutionhttp://en.wikipedia.org/wiki/Evolutionhttp://en.wikipedia.org/wiki/Evolutionhttp://en.wikipedia.org/wiki/Paulien_Hogeweghttp://en.wikipedia.org/wiki/Paulien_Hogeweghttp://en.wikipedia.org/wiki/Paulien_Hogeweghttp://en.wikipedia.org/wiki/Bioinformatics#cite_note-1http://en.wikipedia.org/wiki/Bioinformatics#cite_note-1http://en.wikipedia.org/wiki/Bioinformatics#cite_note-1http://en.wikipedia.org/wiki/Bioinformatics#cite_note-Hogeweg2011-2http://en.wikipedia.org/wiki/Bioinformatics#cite_note-Hogeweg2011-2http://en.wikipedia.org/wiki/Bioinformatics#cite_note-Hogeweg2011-2http://en.wikipedia.org/wiki/DNA_sequencinghttp://en.wikipedia.org/wiki/DNA_sequencinghttp://en.wikipedia.org/wiki/DNA_sequencinghttp://en.wikipedia.org/wiki/DNA_sequencinghttp://en.wikipedia.org/wiki/Bioinformatics#cite_note-Hogeweg2011-2http://en.wikipedia.org/wiki/Bioinformatics#cite_note-1http://en.wikipedia.org/wiki/Bioinformatics#cite_note-1http://en.wikipedia.org/wiki/Paulien_Hogeweghttp://en.wikipedia.org/wiki/Evolutionhttp://en.wikipedia.org/wiki/Genome-wide_association_studieshttp://en.wikipedia.org/wiki/Genome-wide_association_studieshttp://en.wikipedia.org/wiki/Protein%E2%80%93protein_interactionshttp://en.wikipedia.org/wiki/Gene_expression


6/24

Over the past few decades rapid developments in genomic and

other molecular research technologies and developments in

information technologies have combined to produce a

tremendous amount of information related to molecular biology.

Bioinformatics is the name given to these mathematical and

computing approaches used to glean understanding of

biological processes.

Common activities in bioinformatics include mapping and

analyzingDNAand protein sequences, aligning different DNA

and protein sequences to compare them, and creating and

viewing 3-D models of protein structures.

There are two fundamental ways of modelling a Biological

system (e.g., living cell) both coming under Bioinformatic

approaches.

Static

Sequences Proteins, Nucleic acids and Peptides

Structures Proteins, Nucleic acids, Ligands (includingmetabolites and drugs) and Peptides

Interaction data among the above entities includingmicroarray data and Networks of proteins, metabolites
http://en.wikipedia.org/wiki/DNAhttp://en.wikipedia.org/wiki/DNAhttp://en.wikipedia.org/wiki/DNAhttp://en.wikipedia.org/wiki/DNA


7/24

Dynamic

Systems Biology comes under this category includingreaction fluxes and variable concentrations of metabolites

Multi-Agent Based modelling approaches capturingcellular events such as signalling, transcription andreaction dynamics

A broad sub-category under bioinformatics isstructural

bioinformatics.
http://en.wikipedia.org/wiki/Structural_bioinformaticshttp://en.wikipedia.org/wiki/Structural_bioinformaticshttp://en.wikipedia.org/wiki/Structural_bioinformaticshttp://en.wikipedia.org/wiki/Structural_bioinformaticshttp://en.wikipedia.org/wiki/Structural_bioinformaticshttp://en.wikipedia.org/wiki/Structural_bioinformatics


8/24

Sequence analysis


9/24

Sequence analysis

Main articles:Sequence alignmentandSequence database

Since thePhage -X174wassequencedin 1977,[4]theDNA

sequencesof thousands of organisms have been decoded and

stored in databases. This sequence information is analyzed to

determine genes that encodepolypeptides(proteins), RNA

genes, regulatory sequences, structural motifs, and repetitive

sequences. A comparison of genes within aspeciesor between

different species can show similarities between protein

functions, or relations between species (the use ofmolecular

systematicsto constructphylogenetic trees). With the growing

amount of data, it long ago became impractical to analyze DNA

sequences manually. Today,computer programssuch

asBLASTare used daily to search sequences from more than

260 000 organisms, containing over 190

billionnucleotides.[5]

These programs can compensate for

mutations (exchanged, deleted or inserted bases) in the DNA
http://en.wikipedia.org/wiki/Sequence_alignmenthttp://en.wikipedia.org/wiki/Sequence_alignmenthttp://en.wikipedia.org/wiki/Sequence_alignmenthttp://en.wikipedia.org/wiki/Sequence_databasehttp://en.wikipedia.org/wiki/Sequence_databasehttp://en.wikipedia.org/wiki/Sequence_databasehttp://en.wikipedia.org/wiki/Phi_X_174http://en.wikipedia.org/wiki/Phi_X_174http://en.wikipedia.org/wiki/Phi_X_174http://en.wikipedia.org/wiki/Phi_X_174http://en.wikipedia.org/wiki/Sequencinghttp://en.wikipedia.org/wiki/Sequencinghttp://en.wikipedia.org/wiki/Sequencinghttp://en.wikipedia.org/wiki/Bioinformatics#cite_note-pmid870828-3http://en.wikipedia.org/wiki/Bioinformatics#cite_note-pmid870828-3http://en.wikipedia.org/wiki/Bioinformatics#cite_note-pmid870828-3http://en.wikipedia.org/wiki/DNA_sequencehttp://en.wikipedia.org/wiki/DNA_sequencehttp://en.wikipedia.org/wiki/DNA_sequencehttp://en.wikipedia.org/wiki/DNA_sequencehttp://en.wikipedia.org/wiki/Polypeptideshttp://en.wikipedia.org/wiki/Polypeptideshttp://en.wikipedia.org/wiki/Polypeptideshttp://en.wikipedia.org/wiki/Proteinshttp://en.wikipedia.org/wiki/Specieshttp://en.wikipedia.org/wiki/Specieshttp://en.wikipedia.org/wiki/Molecular_systematicshttp://en.wikipedia.org/wiki/Molecular_systematicshttp://en.wikipedia.org/wiki/Molecular_systematicshttp://en.wikipedia.org/wiki/Molecular_systematicshttp://en.wikipedia.org/wiki/Phylogenetic_treehttp://en.wikipedia.org/wiki/Phylogenetic_treehttp://en.wikipedia.org/wiki/Phylogenetic_treehttp://en.wikipedia.org/wiki/Computer_programhttp://en.wikipedia.org/wiki/Computer_programhttp://en.wikipedia.org/wiki/Computer_programhttp://en.wikipedia.org/wiki/BLASThttp://en.wikipedia.org/wiki/BLASThttp://en.wikipedia.org/wiki/BLASThttp://en.wikipedia.org/wiki/Nucleotideshttp://en.wikipedia.org/wiki/Nucleotideshttp://en.wikipedia.org/wiki/Bioinformatics#cite_note-pmid18073190-4http://en.wikipedia.org/wiki/Bioinformatics#cite_note-pmid18073190-4http://en.wikipedia.org/wiki/Bioinformatics#cite_note-pmid18073190-4http://en.wikipedia.org/wiki/Bioinformatics#cite_note-pmid18073190-4http://en.wikipedia.org/wiki/Nucleotideshttp://en.wikipedia.org/wiki/BLASThttp://en.wikipedia.org/wiki/Computer_programhttp://en.wikipedia.org/wiki/Phylogenetic_treehttp://en.wikipedia.org/wiki/Molecular_systematicshttp://en.wikipedia.org/wiki/Molecular_systematicshttp://en.wikipedia.org/wiki/Specieshttp://en.wikipedia.org/wiki/Proteinshttp://en.wikipedia.org/wiki/Polypeptideshttp://en.wikipedia.org/wiki/DNA_sequencehttp://en.wikipedia.org/wiki/DNA_sequencehttp://en.wikipedia.org/wiki/Bioinformatics#cite_note-pmid870828-3http://en.wikipedia.org/wiki/Sequencinghttp://en.wikipedia.org/wiki/Phi_X_174http://en.wikipedia.org/wiki/Sequence_databasehttp://en.wikipedia.org/wiki/Sequence_alignment


10/24

sequence, to identify sequences that are related, but not

identical. A variant of thissequence alignmentis used in the

sequencing process itself. The so-calledshotgun

sequencingtechnique (which was used, for example, byThe

Institute for Genomic Researchto sequence the first bacterial

genome,Haemophilus influenzae)[6]does not produce entire

chromosomes. Instead it generates the sequences of many

thousands of small DNA fragments (ranging from 35 to 900

nucleotides long, depending on the sequencing technology).

The ends of these fragments overlap and, when aligned

properly by a genome assembly program, can be used to

reconstruct the complete genome. Shotgun sequencing yields

sequence data quickly, but the task of assembling the

fragments can be quite complicated for larger genomes. For a

genome as large as thehuman genome, it may take many days

of CPU time on large-memory, multiprocessor computers to

assemble the fragments, and the resulting assembly will usually

contain numerous gaps that have to be filled in later. Shotgun

sequencing is the method of choice for virtually all genomes
http://en.wikipedia.org/wiki/Sequence_alignmenthttp://en.wikipedia.org/wiki/Sequence_alignmenthttp://en.wikipedia.org/wiki/Sequence_alignmenthttp://en.wikipedia.org/wiki/Shotgun_sequencinghttp://en.wikipedia.org/wiki/Shotgun_sequencinghttp://en.wikipedia.org/wiki/Shotgun_sequencinghttp://en.wikipedia.org/wiki/Shotgun_sequencinghttp://en.wikipedia.org/wiki/The_Institute_for_Genomic_Researchhttp://en.wikipedia.org/wiki/The_Institute_for_Genomic_Researchhttp://en.wikipedia.org/wiki/The_Institute_for_Genomic_Researchhttp://en.wikipedia.org/wiki/The_Institute_for_Genomic_Researchhttp://en.wikipedia.org/wiki/Haemophilus_influenzaehttp://en.wikipedia.org/wiki/Haemophilus_influenzaehttp://en.wikipedia.org/wiki/Bioinformatics#cite_note-pmid7542800-5http://en.wikipedia.org/wiki/Bioinformatics#cite_note-pmid7542800-5http://en.wikipedia.org/wiki/Bioinformatics#cite_note-pmid7542800-5http://en.wikipedia.org/wiki/Human_genomehttp://en.wikipedia.org/wiki/Human_genomehttp://en.wikipedia.org/wiki/Human_genomehttp://en.wikipedia.org/wiki/Human_genomehttp://en.wikipedia.org/wiki/Bioinformatics#cite_note-pmid7542800-5http://en.wikipedia.org/wiki/Haemophilus_influenzaehttp://en.wikipedia.org/wiki/The_Institute_for_Genomic_Researchhttp://en.wikipedia.org/wiki/The_Institute_for_Genomic_Researchhttp://en.wikipedia.org/wiki/Shotgun_sequencinghttp://en.wikipedia.org/wiki/Shotgun_sequencinghttp://en.wikipedia.org/wiki/Sequence_alignment


11/24

sequenced today, and genome assembly algorithms are a

critical area of bioinformatics research.

Another aspect of bioinformatics in sequence analysis is

annotation. This involves computationalgene findingto search

for protein-coding genes, RNA genes, and other functional

sequences within a genome. Not all of the nucleotides within a

genome are part of genes. Within the genomes of higher

organisms, large parts of the DNA do not serve any obvious

purpose. This so-calledjunk DNAmay, however, contain

unrecognized functional elements. Bioinformatics helps to

bridge the gap between genome andproteomeprojects for

example, in the use of DNA sequences for protein identification.
http://en.wikipedia.org/wiki/Gene_findinghttp://en.wikipedia.org/wiki/Gene_findinghttp://en.wikipedia.org/wiki/Gene_findinghttp://en.wikipedia.org/wiki/Junk_DNAhttp://en.wikipedia.org/wiki/Junk_DNAhttp://en.wikipedia.org/wiki/Junk_DNAhttp://en.wikipedia.org/wiki/Proteomehttp://en.wikipedia.org/wiki/Proteomehttp://en.wikipedia.org/wiki/Proteomehttp://en.wikipedia.org/wiki/Proteomehttp://en.wikipedia.org/wiki/Junk_DNAhttp://en.wikipedia.org/wiki/Gene_finding


12/24

Genome annotation

Main article:Gene finding

In the context ofgenomics,annotationis the process of

marking the genes and other biological features in a DNA

sequence. The first genome annotation software system was

designed in 1995 by Dr. Owen White, who was part of the team

atThe Institute for Genomic Researchthat sequenced and

analyzed the first genome of a free-living organism to be

decoded, the bacteriumHaemophilus influenzae. Dr. White

built a software system to find the genes (places in the DNA

sequence that encode a protein), the transfer RNA, and other

features, and to make initial assignments of function to those

genes. Most current genome annotation systems work similarly,

but the programs available for analysis of genomic DNA are

constantly changing and improving.
http://en.wikipedia.org/wiki/Gene_findinghttp://en.wikipedia.org/wiki/Gene_findinghttp://en.wikipedia.org/wiki/Gene_findinghttp://en.wikipedia.org/wiki/Genomicshttp://en.wikipedia.org/wiki/Genomicshttp://en.wikipedia.org/wiki/Genome_project#Genome_annotationhttp://en.wikipedia.org/wiki/Genome_project#Genome_annotationhttp://en.wikipedia.org/wiki/Genome_project#Genome_annotationhttp://en.wikipedia.org/wiki/The_Institute_for_Genomic_Researchhttp://en.wikipedia.org/wiki/The_Institute_for_Genomic_Researchhttp://en.wikipedia.org/wiki/The_Institute_for_Genomic_Researchhttp://en.wikipedia.org/wiki/Haemophilus_influenzaehttp://en.wikipedia.org/wiki/Haemophilus_influenzaehttp://en.wikipedia.org/wiki/Haemophilus_influenzaehttp://en.wikipedia.org/wiki/Haemophilus_influenzaehttp://en.wikipedia.org/wiki/The_Institute_for_Genomic_Researchhttp://en.wikipedia.org/wiki/Genome_project#Genome_annotationhttp://en.wikipedia.org/wiki/Genomicshttp://en.wikipedia.org/wiki/Gene_finding


13/24

Analysis of gene expression

Theexpressionof many genes can be determined by

measuringmRNAlevels with multiple techniques

includingmicroarrays,expressed cDNA sequence tag(EST)

sequencing,serial analysis of gene expression(SAGE) tag

sequencing,massively parallel signature

sequencing(MPSS),RNA-Seq, also known as "Whole

Transcriptome Shotgun Sequencing" (WTSS), or various

applications of multiplexed in-situ hybridization. All of these

techniques are extremely noise-prone and/or subject to bias in

the biological measurement, and a major research area in

computational biology involves developing statistical tools to

separatesignalfromnoisein high-throughput gene expression

studies. Such studies are often used to determine the genes

implicated in a disorder: one might compare microarray data

from cancerousepithelialcells to data from non-cancerous cells

to determine the transcripts that are up-regulated and down-

regulated in a particular population of cancer cells.
http://en.wikipedia.org/wiki/Gene_expressionhttp://en.wikipedia.org/wiki/Gene_expressionhttp://en.wikipedia.org/wiki/Gene_expressionhttp://en.wikipedia.org/wiki/Messenger_RNAhttp://en.wikipedia.org/wiki/Messenger_RNAhttp://en.wikipedia.org/wiki/Messenger_RNAhttp://en.wikipedia.org/wiki/DNA_microarrayhttp://en.wikipedia.org/wiki/DNA_microarrayhttp://en.wikipedia.org/wiki/DNA_microarrayhttp://en.wikipedia.org/wiki/Expressed_sequence_taghttp://en.wikipedia.org/wiki/Expressed_sequence_taghttp://en.wikipedia.org/wiki/Expressed_sequence_taghttp://en.wikipedia.org/wiki/Serial_analysis_of_gene_expressionhttp://en.wikipedia.org/wiki/Serial_analysis_of_gene_expressionhttp://en.wikipedia.org/wiki/Serial_analysis_of_gene_expressionhttp://en.wikipedia.org/wiki/Massively_parallel_signature_sequencinghttp://en.wikipedia.org/wiki/Massively_parallel_signature_sequencinghttp://en.wikipedia.org/wiki/Massively_parallel_signature_sequencinghttp://en.wikipedia.org/wiki/Massively_parallel_signature_sequencinghttp://en.wikipedia.org/wiki/RNA-Seqhttp://en.wikipedia.org/wiki/RNA-Seqhttp://en.wikipedia.org/wiki/RNA-Seqhttp://en.wikipedia.org/wiki/Signal_(information_theory)http://en.wikipedia.org/wiki/Signal_(information_theory)http://en.wikipedia.org/wiki/Signal_(information_theory)http://en.wikipedia.org/wiki/Noisehttp://en.wikipedia.org/wiki/Noisehttp://en.wikipedia.org/wiki/Noisehttp://en.wikipedia.org/wiki/Epithelialhttp://en.wikipedia.org/wiki/Epithelialhttp://en.wikipedia.org/wiki/Epithelialhttp://en.wikipedia.org/wiki/Epithelialhttp://en.wikipedia.org/wiki/Noisehttp://en.wikipedia.org/wiki/Signal_(information_theory)http://en.wikipedia.org/wiki/RNA-Seqhttp://en.wikipedia.org/wiki/Massively_parallel_signature_sequencinghttp://en.wikipedia.org/wiki/Massively_parallel_signature_sequencinghttp://en.wikipedia.org/wiki/Serial_analysis_of_gene_expressionhttp://en.wikipedia.org/wiki/Expressed_sequence_taghttp://en.wikipedia.org/wiki/DNA_microarrayhttp://en.wikipedia.org/wiki/Messenger_RNAhttp://en.wikipedia.org/wiki/Gene_expression


14/24


15/24

Analysis of protein expression

Protein microarraysand high throughput (HT)mass

spectrometry(MS) can provide a snapshot of the proteins

present in a biological sample. Bioinformatics is very much

involved in making sense of protein microarray and HT MS

data; the former approach faces similar problems as with

microarrays targeted at mRNA, the latter involves the problem

of matching large amounts of mass data against predicted

masses from protein sequence databases, and the complicated

statistical analysis of samples where multiple, but incomplete

peptides from each protein are detected.
http://en.wikipedia.org/wiki/Protein_microarrayhttp://en.wikipedia.org/wiki/Protein_microarrayhttp://en.wikipedia.org/wiki/Mass_spectrometryhttp://en.wikipedia.org/wiki/Mass_spectrometryhttp://en.wikipedia.org/wiki/Mass_spectrometryhttp://en.wikipedia.org/wiki/Mass_spectrometryhttp://en.wikipedia.org/wiki/Mass_spectrometryhttp://en.wikipedia.org/wiki/Mass_spectrometryhttp://en.wikipedia.org/wiki/Protein_microarray


16/24

Comparative genomics

Main article:Comparative genomics

The core of comparative genome analysis is the establishment

of the correspondence betweengenes(orthologyanalysis) or

other genomic features in different organisms. It is these

intergenomic maps that make it possible to trace the

evolutionary processes responsible for the divergence of two

genomes. A multitude of evolutionary events acting at various

organizational levels shape genome evolution. At the lowest

level, point mutations affect individual nucleotides. At a higher

level, large chromosomal segments undergo duplication, lateral

transfer, inversion, transposition, deletion and insertion.

Ultimately, whole genomes are involved in processes of

hybridization, polyploidization andendosymbiosis, often leading

to rapid speciation. The complexity of genome evolution poses

many exciting challenges to developers of mathematical
http://en.wikipedia.org/wiki/Comparative_genomicshttp://en.wikipedia.org/wiki/Comparative_genomicshttp://en.wikipedia.org/wiki/Comparative_genomicshttp://en.wikipedia.org/wiki/Geneshttp://en.wikipedia.org/wiki/Geneshttp://en.wikipedia.org/wiki/Geneshttp://en.wikipedia.org/wiki/Orthologyhttp://en.wikipedia.org/wiki/Orthologyhttp://en.wikipedia.org/wiki/Orthologyhttp://en.wikipedia.org/wiki/Endosymbiosishttp://en.wikipedia.org/wiki/Endosymbiosishttp://en.wikipedia.org/wiki/Endosymbiosishttp://en.wikipedia.org/wiki/Orthologyhttp://en.wikipedia.org/wiki/Geneshttp://en.wikipedia.org/wiki/Comparative_genomics


17/24

models and algorithms, who have recourse to a spectra of

algorithmic, statistical and mathematical techniques, ranging

from exact,heuristics, fixed parameter andapproximation

algorithmsfor problems based on parsimony models toMarkov

Chain Monte Carloalgorithms forBayesian analysisof

problems based on probabilistic models.

Modeling biological systems

Main article:Computational systems biology

Systems biology involves the use ofcomputer

simulationsofcellularsubsystems (such as thenetworks of

metabolitesandenzymeswhich comprisemetabolism,signal

transductionpathways andgene regulatory networks) to both

analyze and visualize the complex connections of these cellular

processes.Artificial life or virtual evolution attempts to
http://en.wikipedia.org/wiki/Heuristicshttp://en.wikipedia.org/wiki/Heuristicshttp://en.wikipedia.org/wiki/Heuristicshttp://en.wikipedia.org/wiki/Approximation_algorithmshttp://en.wikipedia.org/wiki/Approximation_algorithmshttp://en.wikipedia.org/wiki/Approximation_algorithmshttp://en.wikipedia.org/wiki/Approximation_algorithmshttp://en.wikipedia.org/wiki/Markov_Chain_Monte_Carlohttp://en.wikipedia.org/wiki/Markov_Chain_Monte_Carlohttp://en.wikipedia.org/wiki/Markov_Chain_Monte_Carlohttp://en.wikipedia.org/wiki/Markov_Chain_Monte_Carlohttp://en.wikipedia.org/wiki/Bayesian_analysishttp://en.wikipedia.org/wiki/Bayesian_analysishttp://en.wikipedia.org/wiki/Bayesian_analysishttp://en.wikipedia.org/wiki/Computational_systems_biologyhttp://en.wikipedia.org/wiki/Computational_systems_biologyhttp://en.wikipedia.org/wiki/Computational_systems_biologyhttp://en.wikipedia.org/wiki/Computer_simulationhttp://en.wikipedia.org/wiki/Computer_simulationhttp://en.wikipedia.org/wiki/Computer_simulationhttp://en.wikipedia.org/wiki/Computer_simulationhttp://en.wikipedia.org/wiki/Cell_(biology)http://en.wikipedia.org/wiki/Cell_(biology)http://en.wikipedia.org/wiki/Cell_(biology)http://en.wikipedia.org/wiki/Metabolic_networkhttp://en.wikipedia.org/wiki/Metabolic_networkhttp://en.wikipedia.org/wiki/Metabolic_networkhttp://en.wikipedia.org/wiki/Metabolic_networkhttp://en.wikipedia.org/wiki/Enzymehttp://en.wikipedia.org/wiki/Enzymehttp://en.wikipedia.org/wiki/Enzymehttp://en.wikipedia.org/wiki/Metabolismhttp://en.wikipedia.org/wiki/Metabolismhttp://en.wikipedia.org/wiki/Metabolismhttp://en.wikipedia.org/wiki/Signal_transductionhttp://en.wikipedia.org/wiki/Signal_transductionhttp://en.wikipedia.org/wiki/Signal_transductionhttp://en.wikipedia.org/wiki/Signal_transductionhttp://en.wikipedia.org/wiki/Gene_regulatory_networkhttp://en.wikipedia.org/wiki/Gene_regulatory_networkhttp://en.wikipedia.org/wiki/Artificial_lifehttp://en.wikipedia.org/wiki/Artificial_lifehttp://en.wikipedia.org/wiki/Artificial_lifehttp://en.wikipedia.org/wiki/Gene_regulatory_networkhttp://en.wikipedia.org/wiki/Signal_transductionhttp://en.wikipedia.org/wiki/Signal_transductionhttp://en.wikipedia.org/wiki/Metabolismhttp://en.wikipedia.org/wiki/Enzymehttp://en.wikipedia.org/wiki/Metabolic_networkhttp://en.wikipedia.org/wiki/Metabolic_networkhttp://en.wikipedia.org/wiki/Cell_(biology)http://en.wikipedia.org/wiki/Computer_simulationhttp://en.wikipedia.org/wiki/Computer_simulationhttp://en.wikipedia.org/wiki/Computational_systems_biologyhttp://en.wikipedia.org/wiki/Bayesian_analysishttp://en.wikipedia.org/wiki/Markov_Chain_Monte_Carlohttp://en.wikipedia.org/wiki/Markov_Chain_Monte_Carlohttp://en.wikipedia.org/wiki/Approximation_algorithmshttp://en.wikipedia.org/wiki/Approximation_algorithmshttp://en.wikipedia.org/wiki/Heuristics


18/24

understand evolutionary processes via the computer simulation

of simple (artificial) life forms.

Prediction of protein structure

Main article:Protein structure prediction

Protein structure prediction is another important application of

bioinformatics. Theamino acidsequence of a protein, the so-

calledprimary structure, can be easily determined from the

sequence on the gene that codes for it. In the vast majority of

cases, this primary structure uniquely determines a structure in

its native environment. (Of course, there are exceptions, such

as thebovine spongiform encephalopathy a.k.a.Mad Cow

Diseaseprion.) Knowledge of this structure is vital in
http://en.wikipedia.org/wiki/Protein_structure_predictionhttp://en.wikipedia.org/wiki/Protein_structure_predictionhttp://en.wikipedia.org/wiki/Protein_structure_predictionhttp://en.wikipedia.org/wiki/Amino_acidhttp://en.wikipedia.org/wiki/Amino_acidhttp://en.wikipedia.org/wiki/Amino_acidhttp://en.wikipedia.org/wiki/Primary_structurehttp://en.wikipedia.org/wiki/Primary_structurehttp://en.wikipedia.org/wiki/Primary_structurehttp://en.wikipedia.org/wiki/Bovine_spongiform_encephalopathyhttp://en.wikipedia.org/wiki/Bovine_spongiform_encephalopathyhttp://en.wikipedia.org/wiki/Bovine_spongiform_encephalopathyhttp://en.wikipedia.org/wiki/Mad_Cow_Diseasehttp://en.wikipedia.org/wiki/Mad_Cow_Diseasehttp://en.wikipedia.org/wiki/Mad_Cow_Diseasehttp://en.wikipedia.org/wiki/Mad_Cow_Diseasehttp://en.wikipedia.org/wiki/Prionhttp://en.wikipedia.org/wiki/Prionhttp://en.wikipedia.org/wiki/Prionhttp://en.wikipedia.org/wiki/Prionhttp://en.wikipedia.org/wiki/Mad_Cow_Diseasehttp://en.wikipedia.org/wiki/Mad_Cow_Diseasehttp://en.wikipedia.org/wiki/Bovine_spongiform_encephalopathyhttp://en.wikipedia.org/wiki/Primary_structurehttp://en.wikipedia.org/wiki/Amino_acidhttp://en.wikipedia.org/wiki/Protein_structure_prediction


19/24

understanding the function of the protein. For lack of better

terms, structural information is usually classified as one

ofsecondary,tertiaryandquaternarystructure. A viable

general solution to such predictions remains an open problem.

As of now, most efforts have been directed towards heuristics

that work most of the time.

One of the key ideas in bioinformatics is the notion

ofhomology. In the genomic branch of bioinformatics,

homology is used to predict the function of a gene: if the

sequence of geneA, whose function is known, is homologous to

the sequence of gene B, whose function is unknown, one could

infer that B may share A's function. In the structural branch of

bioinformatics, homology is used to determine which parts of a

protein are important in structure formation and interaction with

other proteins. In a technique called homology modeling, this

information is used to predict the structure of a protein once the

structure of a homologous protein is known. This currently

remains the only way to predict protein structures reliably.
http://en.wikipedia.org/wiki/Secondary_structurehttp://en.wikipedia.org/wiki/Secondary_structurehttp://en.wikipedia.org/wiki/Secondary_structurehttp://en.wikipedia.org/wiki/Tertiary_structurehttp://en.wikipedia.org/wiki/Tertiary_structurehttp://en.wikipedia.org/wiki/Tertiary_structurehttp://en.wikipedia.org/wiki/Quaternary_structurehttp://en.wikipedia.org/wiki/Quaternary_structurehttp://en.wikipedia.org/wiki/Quaternary_structurehttp://en.wikipedia.org/wiki/Homology_(biology)http://en.wikipedia.org/wiki/Homology_(biology)http://en.wikipedia.org/wiki/Homology_(biology)http://en.wikipedia.org/wiki/Homology_(biology)http://en.wikipedia.org/wiki/Quaternary_structurehttp://en.wikipedia.org/wiki/Tertiary_structurehttp://en.wikipedia.org/wiki/Secondary_structure


20/24

One example of this is the similar protein homology between

hemoglobin in humans and the hemoglobin in legumes

(leghemoglobin). Both serve the same purpose of transporting

oxygen in the organism. Though both of these proteins have

completely different amino acid sequences, their protein

structures are virtually identical, which reflects their near

identical purposes.

Other techniques for predicting protein structure include protein

threading and de novo(from scratch) physics-based modeling.
http://en.wikipedia.org/wiki/Leghemoglobinhttp://en.wikipedia.org/wiki/Leghemoglobinhttp://en.wikipedia.org/wiki/Leghemoglobin


21/24

Web services in bioinformatics

SOAPandREST-based interfaces have been developed for a

wide variety of bioinformatics applications allowing an

application running on one computer in one part of the world to

use algorithms, data and computing resources on servers in

other parts of the world. The main advantages derive from the

fact that end users do not have to deal with software and

database maintenance overheads.

Basic bioinformatics services are classified by theEBIinto

three categories:SSS(Sequence Search

Services),MSA(Multiple Sequence Alignment)
http://en.wikipedia.org/wiki/SOAPhttp://en.wikipedia.org/wiki/SOAPhttp://en.wikipedia.org/wiki/RESThttp://en.wikipedia.org/wiki/RESThttp://en.wikipedia.org/wiki/European_Bioinformatics_Institutehttp://en.wikipedia.org/wiki/European_Bioinformatics_Institutehttp://en.wikipedia.org/wiki/European_Bioinformatics_Institutehttp://en.wikipedia.org/wiki/Sequence_alignment_softwarehttp://en.wikipedia.org/wiki/Sequence_alignment_softwarehttp://en.wikipedia.org/wiki/Sequence_alignment_softwarehttp://en.wikipedia.org/wiki/Multiple_sequence_alignmenthttp://en.wikipedia.org/wiki/Multiple_sequence_alignmenthttp://en.wikipedia.org/wiki/Multiple_sequence_alignmenthttp://en.wikipedia.org/wiki/Multiple_sequence_alignmenthttp://en.wikipedia.org/wiki/Sequence_alignment_softwarehttp://en.wikipedia.org/wiki/European_Bioinformatics_Institutehttp://en.wikipedia.org/wiki/RESThttp://en.wikipedia.org/wiki/SOAP


22/24

andBSA(Biological Sequence Analysis). The availability of

theseservice-orientedbioinformatics resources demonstrate

the applicability of web based bioinformatics solutions, and

range from a collection of standalone tools with a common data

format under a single, standalone or web-based interface, to

integrative, distributed and extensiblebioinformatics workflow

management systems.
http://en.wikipedia.org/wiki/Bioinformatics#Sequence_analysishttp://en.wikipedia.org/wiki/Bioinformatics#Sequence_analysishttp://en.wikipedia.org/wiki/Bioinformatics#Sequence_analysishttp://en.wikipedia.org/wiki/Service-orientationhttp://en.wikipedia.org/wiki/Service-orientationhttp://en.wikipedia.org/wiki/Service-orientationhttp://en.wikipedia.org/wiki/Bioinformatics_workflow_management_systemshttp://en.wikipedia.org/wiki/Bioinformatics_workflow_management_systemshttp://en.wikipedia.org/wiki/Bioinformatics_workflow_management_systemshttp://en.wikipedia.org/wiki/Bioinformatics_workflow_management_systemshttp://en.wikipedia.org/wiki/Bioinformatics_workflow_management_systemshttp://en.wikipedia.org/wiki/Bioinformatics_workflow_management_systemshttp://en.wikipedia.org/wiki/Service-orientationhttp://en.wikipedia.org/wiki/Bioinformatics#Sequence_analysis


23/24

High-throughput image analysis

Computational technologies are used to accelerate or fully

automate the processing, quantification and analysis of large

amounts of high-information-contentbiomedical imagery.

Modern image analysis systems augment an observer's ability

to make measurements from a large or complex set of images,

by improvingaccuracy,objectivity, or speed. A fully developed

analysis system may completely replace the observer. Although

these systems are not unique to biomedical imagery,
http://en.wikipedia.org/w/index.php?title=Biomedical_imagery&action=edit&redlink=1http://en.wikipedia.org/w/index.php?title=Biomedical_imagery&action=edit&redlink=1http://en.wikipedia.org/w/index.php?title=Biomedical_imagery&action=edit&redlink=1http://en.wikipedia.org/wiki/Accuracyhttp://en.wikipedia.org/wiki/Accuracyhttp://en.wikipedia.org/wiki/Accuracyhttp://en.wikipedia.org/wiki/Objectivity_(science)http://en.wikipedia.org/wiki/Objectivity_(science)http://en.wikipedia.org/wiki/Objectivity_(science)http://en.wikipedia.org/wiki/Objectivity_(science)http://en.wikipedia.org/wiki/Accuracyhttp://en.wikipedia.org/w/index.php?title=Biomedical_imagery&action=edit&redlink=1


24/24

biomedical imaging is becoming more important for

bothdiagnosticsand research. Some examples are:

high-throughput and high-fidelity quantification and sub-

cellular localization (high-content screening,

cytohistopathology,Bioimage informatics)

morphometrics

clinical image analysis and visualization

determining the real-time air-flow patterns in breathing lungs

of living animals

quantifying occlusion size in real-time imagery from the

development of and recovery during arterial injury

making behavioral observations from extended video

recordings of laboratory animals

infrared measurements for metabolic activity determination

inferring clone overlaps inDNA mapping, e.g. the Sulston

score
http://en.wikipedia.org/wiki/Diagnosticshttp://en.wikipedia.org/wiki/Diagnosticshttp://en.wikipedia.org/wiki/Diagnosticshttp://en.wikipedia.org/wiki/High-content_screeninghttp://en.wikipedia.org/wiki/High-content_screeninghttp://en.wikipedia.org/wiki/Bioimage_informaticshttp://en.wikipedia.org/wiki/Bioimage_informaticshttp://en.wikipedia.org/wiki/Bioimage_informaticshttp://en.wikipedia.org/wiki/Morphometricshttp://en.wikipedia.org/wiki/Morphometricshttp://en.wikipedia.org/wiki/Gene_mappinghttp://en.wikipedia.org/wiki/Gene_mappinghttp://en.wikipedia.org/wiki/Gene_mappinghttp://en.wikipedia.org/wiki/Sulston_scorehttp://en.wikipedia.org/wiki/Sulston_scorehttp://en.wikipedia.org/wiki/Sulston_scorehttp://en.wikipedia.org/wiki/Sulston_scorehttp://en.wikipedia.org/wiki/Sulston_scorehttp://en.wikipedia.org/wiki/Gene_mappinghttp://en.wikipedia.org/wiki/Morphometricshttp://en.wikipedia.org/wiki/Bioimage_informaticshttp://en.wikipedia.org/wiki/High-content_screeninghttp://en.wikipedia.org/wiki/Diagnostics

Documents

Bioinformatics 1