Upload
eric-lobo
View
219
Download
0
Embed Size (px)
Citation preview
7/31/2019 Bioinformatics 1
1/24
Bioinformatics
Bioinformatics is a branch of biological science whichdeals with the study of methods for storing, retrieving andanalyzing biological data, such as nucleic acid (DNA/RNA)and protein sequences, structures, functions, pathwaysand genetic interactions. It generates new knowledge
about drug designing and development of new softwaretools. Bioinformatics also deals with algorithms, databases andinformation systems, web technologies, artificial intelligenceand soft computing, information and computation theory,structural biology, software engineering, data mining, imageprocessing, modeling and simulation, signal processing,discrete mathematics, control and system theory, circuit theory,and statistics.
Commonly used software tools and technologies in this fieldincludeJava,XML,Perl,C,C++,Python,R,MySQL,SQL,CUDA,MATLAB, andMicrosoft Excel.
http://en.wikipedia.org/wiki/Java_(programming_language)http://en.wikipedia.org/wiki/Java_(programming_language)http://en.wikipedia.org/wiki/Java_(programming_language)http://en.wikipedia.org/wiki/XMLhttp://en.wikipedia.org/wiki/XMLhttp://en.wikipedia.org/wiki/XMLhttp://en.wikipedia.org/wiki/Perlhttp://en.wikipedia.org/wiki/Perlhttp://en.wikipedia.org/wiki/Perlhttp://en.wikipedia.org/wiki/C_(programming_language)http://en.wikipedia.org/wiki/C_(programming_language)http://en.wikipedia.org/wiki/C_(programming_language)http://en.wikipedia.org/wiki/C%2B%2Bhttp://en.wikipedia.org/wiki/C%2B%2Bhttp://en.wikipedia.org/wiki/C%2B%2Bhttp://en.wikipedia.org/wiki/Python_(programming_language)http://en.wikipedia.org/wiki/Python_(programming_language)http://en.wikipedia.org/wiki/Python_(programming_language)http://en.wikipedia.org/wiki/R_(programming_language)http://en.wikipedia.org/wiki/R_(programming_language)http://en.wikipedia.org/wiki/MySQLhttp://en.wikipedia.org/wiki/MySQLhttp://en.wikipedia.org/wiki/MySQLhttp://en.wikipedia.org/wiki/SQLhttp://en.wikipedia.org/wiki/SQLhttp://en.wikipedia.org/wiki/SQLhttp://en.wikipedia.org/wiki/CUDAhttp://en.wikipedia.org/wiki/CUDAhttp://en.wikipedia.org/wiki/CUDAhttp://en.wikipedia.org/wiki/CUDAhttp://en.wikipedia.org/wiki/MATLABhttp://en.wikipedia.org/wiki/MATLABhttp://en.wikipedia.org/wiki/Microsoft_Excelhttp://en.wikipedia.org/wiki/Microsoft_Excelhttp://en.wikipedia.org/wiki/Microsoft_Excelhttp://en.wikipedia.org/wiki/Microsoft_Excelhttp://en.wikipedia.org/wiki/MATLABhttp://en.wikipedia.org/wiki/CUDAhttp://en.wikipedia.org/wiki/CUDAhttp://en.wikipedia.org/wiki/SQLhttp://en.wikipedia.org/wiki/MySQLhttp://en.wikipedia.org/wiki/R_(programming_language)http://en.wikipedia.org/wiki/Python_(programming_language)http://en.wikipedia.org/wiki/C%2B%2Bhttp://en.wikipedia.org/wiki/C_(programming_language)http://en.wikipedia.org/wiki/Perlhttp://en.wikipedia.org/wiki/XMLhttp://en.wikipedia.org/wiki/Java_(programming_language)7/31/2019 Bioinformatics 1
2/24
Introduction
7/31/2019 Bioinformatics 1
3/24
Introduction
Bioinformatics was applied in the creation and maintenance of
a database to store biological information at the beginning of
the "genomic revolution", such asnucleotide
sequencesandamino acid sequences. Development of this
type of database involved not only design issues but the
development of complex interfaces whereby researchers could
access existing data as well as submit new or revised data.
In order to study how normal cellular activities are altered in
different disease states, the biological data must be combined
to form a comprehensive picture of these activities. Therefore,
the field of bioinformatics has evolved such that the most
pressing task now involves the analysis and interpretation of
various types of data. This includes nucleotide and amino acid
sequences, protein domains, and protein structures.[1]The
actual process of analyzing and interpreting data is referred to
as computational biology. Important sub-disciplines within
bioinformatics and computational biology include:
http://en.wikipedia.org/wiki/Nucleotide_sequenceshttp://en.wikipedia.org/wiki/Nucleotide_sequenceshttp://en.wikipedia.org/wiki/Nucleotide_sequenceshttp://en.wikipedia.org/wiki/Nucleotide_sequenceshttp://en.wikipedia.org/wiki/Amino_acid_sequencehttp://en.wikipedia.org/wiki/Amino_acid_sequencehttp://en.wikipedia.org/wiki/Amino_acid_sequencehttp://en.wikipedia.org/wiki/Bioinformatics#cite_note-0http://en.wikipedia.org/wiki/Bioinformatics#cite_note-0http://en.wikipedia.org/wiki/Bioinformatics#cite_note-0http://en.wikipedia.org/wiki/Bioinformatics#cite_note-0http://en.wikipedia.org/wiki/Amino_acid_sequencehttp://en.wikipedia.org/wiki/Nucleotide_sequenceshttp://en.wikipedia.org/wiki/Nucleotide_sequences7/31/2019 Bioinformatics 1
4/24
the development and implementation of tools that enable
efficient access to, and use and management of, various
types of information.
the development of new algorithms (mathematical formulas)
and statistics with which to assess relationships among
members of large data sets. For example, methods to locate
a gene within a sequence, predict protein structure and/or
function, and cluster protein sequences into families of
related sequences.
The primary goal of bioinformatics is to increase the
understanding of biological processes. What sets it apart from
other approaches, however, is its focus on developing and
applying computationally intensive techniques to achieve this
goal. Examples include:pattern recognition,data
mining,machine learningalgorithms, andvisualization. Major
research efforts in the field includesequence alignment,gene
finding,genome assembly,drug design,drug discovery,protein
structure alignment,protein structure prediction, prediction
http://en.wikipedia.org/wiki/Pattern_recognitionhttp://en.wikipedia.org/wiki/Pattern_recognitionhttp://en.wikipedia.org/wiki/Pattern_recognitionhttp://en.wikipedia.org/wiki/Data_mininghttp://en.wikipedia.org/wiki/Data_mininghttp://en.wikipedia.org/wiki/Data_mininghttp://en.wikipedia.org/wiki/Data_mininghttp://en.wikipedia.org/wiki/Machine_learninghttp://en.wikipedia.org/wiki/Machine_learninghttp://en.wikipedia.org/wiki/Machine_learninghttp://en.wikipedia.org/wiki/Biological_Data_Visualizationhttp://en.wikipedia.org/wiki/Biological_Data_Visualizationhttp://en.wikipedia.org/wiki/Biological_Data_Visualizationhttp://en.wikipedia.org/wiki/Sequence_alignmenthttp://en.wikipedia.org/wiki/Sequence_alignmenthttp://en.wikipedia.org/wiki/Sequence_alignmenthttp://en.wikipedia.org/wiki/Gene_findinghttp://en.wikipedia.org/wiki/Gene_findinghttp://en.wikipedia.org/wiki/Gene_findinghttp://en.wikipedia.org/wiki/Gene_findinghttp://en.wikipedia.org/wiki/Genome_assemblyhttp://en.wikipedia.org/wiki/Genome_assemblyhttp://en.wikipedia.org/wiki/Genome_assemblyhttp://en.wikipedia.org/wiki/Drug_designhttp://en.wikipedia.org/wiki/Drug_designhttp://en.wikipedia.org/wiki/Drug_designhttp://en.wikipedia.org/wiki/Drug_discoveryhttp://en.wikipedia.org/wiki/Drug_discoveryhttp://en.wikipedia.org/wiki/Drug_discoveryhttp://en.wikipedia.org/wiki/Protein_structural_alignmenthttp://en.wikipedia.org/wiki/Protein_structural_alignmenthttp://en.wikipedia.org/wiki/Protein_structural_alignmenthttp://en.wikipedia.org/wiki/Protein_structural_alignmenthttp://en.wikipedia.org/wiki/Protein_structure_predictionhttp://en.wikipedia.org/wiki/Protein_structure_predictionhttp://en.wikipedia.org/wiki/Protein_structure_predictionhttp://en.wikipedia.org/wiki/Protein_structure_predictionhttp://en.wikipedia.org/wiki/Protein_structural_alignmenthttp://en.wikipedia.org/wiki/Protein_structural_alignmenthttp://en.wikipedia.org/wiki/Drug_discoveryhttp://en.wikipedia.org/wiki/Drug_designhttp://en.wikipedia.org/wiki/Genome_assemblyhttp://en.wikipedia.org/wiki/Gene_findinghttp://en.wikipedia.org/wiki/Gene_findinghttp://en.wikipedia.org/wiki/Sequence_alignmenthttp://en.wikipedia.org/wiki/Biological_Data_Visualizationhttp://en.wikipedia.org/wiki/Machine_learninghttp://en.wikipedia.org/wiki/Data_mininghttp://en.wikipedia.org/wiki/Data_mininghttp://en.wikipedia.org/wiki/Pattern_recognition7/31/2019 Bioinformatics 1
5/24
ofgene expressionandproteinprotein interactions,genome-
wide association studiesand the modeling ofevolution.
Interestingly, the term bioinformaticswas coined before the
"genomic revolution".Paulien Hogewegand Ben Hesper
introduced the term in 1978 to refer to "the study of information
processes in biotic systems".[2][3]
This definition placed
bioinformatics as a field parallel to biophysics or biochemistry
(biochemistry is the study of chemical processes in biological
systems).[3]However, its primary use since at least the late
1980s has been to describe the application of computer
science and information sciences to the analysis of biological
data, particularly in those areas of genomics involving large-
scaleDNA sequencing.
Bioinformatics now entails the creation and advancement of
databases, algorithms, computational and statistical techniques
and theory to solve formal and practical problems arising from
the management and analysis of biological data.
http://en.wikipedia.org/wiki/Gene_expressionhttp://en.wikipedia.org/wiki/Gene_expressionhttp://en.wikipedia.org/wiki/Gene_expressionhttp://en.wikipedia.org/wiki/Protein%E2%80%93protein_interactionshttp://en.wikipedia.org/wiki/Protein%E2%80%93protein_interactionshttp://en.wikipedia.org/wiki/Protein%E2%80%93protein_interactionshttp://en.wikipedia.org/wiki/Protein%E2%80%93protein_interactionshttp://en.wikipedia.org/wiki/Protein%E2%80%93protein_interactionshttp://en.wikipedia.org/wiki/Genome-wide_association_studieshttp://en.wikipedia.org/wiki/Genome-wide_association_studieshttp://en.wikipedia.org/wiki/Genome-wide_association_studieshttp://en.wikipedia.org/wiki/Genome-wide_association_studieshttp://en.wikipedia.org/wiki/Evolutionhttp://en.wikipedia.org/wiki/Evolutionhttp://en.wikipedia.org/wiki/Evolutionhttp://en.wikipedia.org/wiki/Paulien_Hogeweghttp://en.wikipedia.org/wiki/Paulien_Hogeweghttp://en.wikipedia.org/wiki/Paulien_Hogeweghttp://en.wikipedia.org/wiki/Bioinformatics#cite_note-1http://en.wikipedia.org/wiki/Bioinformatics#cite_note-1http://en.wikipedia.org/wiki/Bioinformatics#cite_note-1http://en.wikipedia.org/wiki/Bioinformatics#cite_note-Hogeweg2011-2http://en.wikipedia.org/wiki/Bioinformatics#cite_note-Hogeweg2011-2http://en.wikipedia.org/wiki/Bioinformatics#cite_note-Hogeweg2011-2http://en.wikipedia.org/wiki/DNA_sequencinghttp://en.wikipedia.org/wiki/DNA_sequencinghttp://en.wikipedia.org/wiki/DNA_sequencinghttp://en.wikipedia.org/wiki/DNA_sequencinghttp://en.wikipedia.org/wiki/Bioinformatics#cite_note-Hogeweg2011-2http://en.wikipedia.org/wiki/Bioinformatics#cite_note-1http://en.wikipedia.org/wiki/Bioinformatics#cite_note-1http://en.wikipedia.org/wiki/Paulien_Hogeweghttp://en.wikipedia.org/wiki/Evolutionhttp://en.wikipedia.org/wiki/Genome-wide_association_studieshttp://en.wikipedia.org/wiki/Genome-wide_association_studieshttp://en.wikipedia.org/wiki/Protein%E2%80%93protein_interactionshttp://en.wikipedia.org/wiki/Gene_expression7/31/2019 Bioinformatics 1
6/24
Over the past few decades rapid developments in genomic and
other molecular research technologies and developments in
information technologies have combined to produce a
tremendous amount of information related to molecular biology.
Bioinformatics is the name given to these mathematical and
computing approaches used to glean understanding of
biological processes.
Common activities in bioinformatics include mapping and
analyzingDNAand protein sequences, aligning different DNA
and protein sequences to compare them, and creating and
viewing 3-D models of protein structures.
There are two fundamental ways of modelling a Biological
system (e.g., living cell) both coming under Bioinformatic
approaches.
Static
Sequences Proteins, Nucleic acids and Peptides
Structures Proteins, Nucleic acids, Ligands (includingmetabolites and drugs) and Peptides
Interaction data among the above entities includingmicroarray data and Networks of proteins, metabolites
http://en.wikipedia.org/wiki/DNAhttp://en.wikipedia.org/wiki/DNAhttp://en.wikipedia.org/wiki/DNAhttp://en.wikipedia.org/wiki/DNA7/31/2019 Bioinformatics 1
7/24
Dynamic
Systems Biology comes under this category includingreaction fluxes and variable concentrations of metabolites
Multi-Agent Based modelling approaches capturingcellular events such as signalling, transcription andreaction dynamics
A broad sub-category under bioinformatics isstructural
bioinformatics.
http://en.wikipedia.org/wiki/Structural_bioinformaticshttp://en.wikipedia.org/wiki/Structural_bioinformaticshttp://en.wikipedia.org/wiki/Structural_bioinformaticshttp://en.wikipedia.org/wiki/Structural_bioinformaticshttp://en.wikipedia.org/wiki/Structural_bioinformaticshttp://en.wikipedia.org/wiki/Structural_bioinformatics7/31/2019 Bioinformatics 1
8/24
Sequence analysis
7/31/2019 Bioinformatics 1
9/24
Sequence analysis
Main articles:Sequence alignmentandSequence database
Since thePhage -X174wassequencedin 1977,[4]theDNA
sequencesof thousands of organisms have been decoded and
stored in databases. This sequence information is analyzed to
determine genes that encodepolypeptides(proteins), RNA
genes, regulatory sequences, structural motifs, and repetitive
sequences. A comparison of genes within aspeciesor between
different species can show similarities between protein
functions, or relations between species (the use ofmolecular
systematicsto constructphylogenetic trees). With the growing
amount of data, it long ago became impractical to analyze DNA
sequences manually. Today,computer programssuch
asBLASTare used daily to search sequences from more than
260 000 organisms, containing over 190
billionnucleotides.[5]
These programs can compensate for
mutations (exchanged, deleted or inserted bases) in the DNA
http://en.wikipedia.org/wiki/Sequence_alignmenthttp://en.wikipedia.org/wiki/Sequence_alignmenthttp://en.wikipedia.org/wiki/Sequence_alignmenthttp://en.wikipedia.org/wiki/Sequence_databasehttp://en.wikipedia.org/wiki/Sequence_databasehttp://en.wikipedia.org/wiki/Sequence_databasehttp://en.wikipedia.org/wiki/Phi_X_174http://en.wikipedia.org/wiki/Phi_X_174http://en.wikipedia.org/wiki/Phi_X_174http://en.wikipedia.org/wiki/Phi_X_174http://en.wikipedia.org/wiki/Sequencinghttp://en.wikipedia.org/wiki/Sequencinghttp://en.wikipedia.org/wiki/Sequencinghttp://en.wikipedia.org/wiki/Bioinformatics#cite_note-pmid870828-3http://en.wikipedia.org/wiki/Bioinformatics#cite_note-pmid870828-3http://en.wikipedia.org/wiki/Bioinformatics#cite_note-pmid870828-3http://en.wikipedia.org/wiki/DNA_sequencehttp://en.wikipedia.org/wiki/DNA_sequencehttp://en.wikipedia.org/wiki/DNA_sequencehttp://en.wikipedia.org/wiki/DNA_sequencehttp://en.wikipedia.org/wiki/Polypeptideshttp://en.wikipedia.org/wiki/Polypeptideshttp://en.wikipedia.org/wiki/Polypeptideshttp://en.wikipedia.org/wiki/Proteinshttp://en.wikipedia.org/wiki/Specieshttp://en.wikipedia.org/wiki/Specieshttp://en.wikipedia.org/wiki/Molecular_systematicshttp://en.wikipedia.org/wiki/Molecular_systematicshttp://en.wikipedia.org/wiki/Molecular_systematicshttp://en.wikipedia.org/wiki/Molecular_systematicshttp://en.wikipedia.org/wiki/Phylogenetic_treehttp://en.wikipedia.org/wiki/Phylogenetic_treehttp://en.wikipedia.org/wiki/Phylogenetic_treehttp://en.wikipedia.org/wiki/Computer_programhttp://en.wikipedia.org/wiki/Computer_programhttp://en.wikipedia.org/wiki/Computer_programhttp://en.wikipedia.org/wiki/BLASThttp://en.wikipedia.org/wiki/BLASThttp://en.wikipedia.org/wiki/BLASThttp://en.wikipedia.org/wiki/Nucleotideshttp://en.wikipedia.org/wiki/Nucleotideshttp://en.wikipedia.org/wiki/Bioinformatics#cite_note-pmid18073190-4http://en.wikipedia.org/wiki/Bioinformatics#cite_note-pmid18073190-4http://en.wikipedia.org/wiki/Bioinformatics#cite_note-pmid18073190-4http://en.wikipedia.org/wiki/Bioinformatics#cite_note-pmid18073190-4http://en.wikipedia.org/wiki/Nucleotideshttp://en.wikipedia.org/wiki/BLASThttp://en.wikipedia.org/wiki/Computer_programhttp://en.wikipedia.org/wiki/Phylogenetic_treehttp://en.wikipedia.org/wiki/Molecular_systematicshttp://en.wikipedia.org/wiki/Molecular_systematicshttp://en.wikipedia.org/wiki/Specieshttp://en.wikipedia.org/wiki/Proteinshttp://en.wikipedia.org/wiki/Polypeptideshttp://en.wikipedia.org/wiki/DNA_sequencehttp://en.wikipedia.org/wiki/DNA_sequencehttp://en.wikipedia.org/wiki/Bioinformatics#cite_note-pmid870828-3http://en.wikipedia.org/wiki/Sequencinghttp://en.wikipedia.org/wiki/Phi_X_174http://en.wikipedia.org/wiki/Sequence_databasehttp://en.wikipedia.org/wiki/Sequence_alignment7/31/2019 Bioinformatics 1
10/24
sequence, to identify sequences that are related, but not
identical. A variant of thissequence alignmentis used in the
sequencing process itself. The so-calledshotgun
sequencingtechnique (which was used, for example, byThe
Institute for Genomic Researchto sequence the first bacterial
genome,Haemophilus influenzae)[6]does not produce entire
chromosomes. Instead it generates the sequences of many
thousands of small DNA fragments (ranging from 35 to 900
nucleotides long, depending on the sequencing technology).
The ends of these fragments overlap and, when aligned
properly by a genome assembly program, can be used to
reconstruct the complete genome. Shotgun sequencing yields
sequence data quickly, but the task of assembling the
fragments can be quite complicated for larger genomes. For a
genome as large as thehuman genome, it may take many days
of CPU time on large-memory, multiprocessor computers to
assemble the fragments, and the resulting assembly will usually
contain numerous gaps that have to be filled in later. Shotgun
sequencing is the method of choice for virtually all genomes
http://en.wikipedia.org/wiki/Sequence_alignmenthttp://en.wikipedia.org/wiki/Sequence_alignmenthttp://en.wikipedia.org/wiki/Sequence_alignmenthttp://en.wikipedia.org/wiki/Shotgun_sequencinghttp://en.wikipedia.org/wiki/Shotgun_sequencinghttp://en.wikipedia.org/wiki/Shotgun_sequencinghttp://en.wikipedia.org/wiki/Shotgun_sequencinghttp://en.wikipedia.org/wiki/The_Institute_for_Genomic_Researchhttp://en.wikipedia.org/wiki/The_Institute_for_Genomic_Researchhttp://en.wikipedia.org/wiki/The_Institute_for_Genomic_Researchhttp://en.wikipedia.org/wiki/The_Institute_for_Genomic_Researchhttp://en.wikipedia.org/wiki/Haemophilus_influenzaehttp://en.wikipedia.org/wiki/Haemophilus_influenzaehttp://en.wikipedia.org/wiki/Bioinformatics#cite_note-pmid7542800-5http://en.wikipedia.org/wiki/Bioinformatics#cite_note-pmid7542800-5http://en.wikipedia.org/wiki/Bioinformatics#cite_note-pmid7542800-5http://en.wikipedia.org/wiki/Human_genomehttp://en.wikipedia.org/wiki/Human_genomehttp://en.wikipedia.org/wiki/Human_genomehttp://en.wikipedia.org/wiki/Human_genomehttp://en.wikipedia.org/wiki/Bioinformatics#cite_note-pmid7542800-5http://en.wikipedia.org/wiki/Haemophilus_influenzaehttp://en.wikipedia.org/wiki/The_Institute_for_Genomic_Researchhttp://en.wikipedia.org/wiki/The_Institute_for_Genomic_Researchhttp://en.wikipedia.org/wiki/Shotgun_sequencinghttp://en.wikipedia.org/wiki/Shotgun_sequencinghttp://en.wikipedia.org/wiki/Sequence_alignment7/31/2019 Bioinformatics 1
11/24
sequenced today, and genome assembly algorithms are a
critical area of bioinformatics research.
Another aspect of bioinformatics in sequence analysis is
annotation. This involves computationalgene findingto search
for protein-coding genes, RNA genes, and other functional
sequences within a genome. Not all of the nucleotides within a
genome are part of genes. Within the genomes of higher
organisms, large parts of the DNA do not serve any obvious
purpose. This so-calledjunk DNAmay, however, contain
unrecognized functional elements. Bioinformatics helps to
bridge the gap between genome andproteomeprojects for
example, in the use of DNA sequences for protein identification.
http://en.wikipedia.org/wiki/Gene_findinghttp://en.wikipedia.org/wiki/Gene_findinghttp://en.wikipedia.org/wiki/Gene_findinghttp://en.wikipedia.org/wiki/Junk_DNAhttp://en.wikipedia.org/wiki/Junk_DNAhttp://en.wikipedia.org/wiki/Junk_DNAhttp://en.wikipedia.org/wiki/Proteomehttp://en.wikipedia.org/wiki/Proteomehttp://en.wikipedia.org/wiki/Proteomehttp://en.wikipedia.org/wiki/Proteomehttp://en.wikipedia.org/wiki/Junk_DNAhttp://en.wikipedia.org/wiki/Gene_finding7/31/2019 Bioinformatics 1
12/24
Genome annotation
Main article:Gene finding
In the context ofgenomics,annotationis the process of
marking the genes and other biological features in a DNA
sequence. The first genome annotation software system was
designed in 1995 by Dr. Owen White, who was part of the team
atThe Institute for Genomic Researchthat sequenced and
analyzed the first genome of a free-living organism to be
decoded, the bacteriumHaemophilus influenzae. Dr. White
built a software system to find the genes (places in the DNA
sequence that encode a protein), the transfer RNA, and other
features, and to make initial assignments of function to those
genes. Most current genome annotation systems work similarly,
but the programs available for analysis of genomic DNA are
constantly changing and improving.
http://en.wikipedia.org/wiki/Gene_findinghttp://en.wikipedia.org/wiki/Gene_findinghttp://en.wikipedia.org/wiki/Gene_findinghttp://en.wikipedia.org/wiki/Genomicshttp://en.wikipedia.org/wiki/Genomicshttp://en.wikipedia.org/wiki/Genome_project#Genome_annotationhttp://en.wikipedia.org/wiki/Genome_project#Genome_annotationhttp://en.wikipedia.org/wiki/Genome_project#Genome_annotationhttp://en.wikipedia.org/wiki/The_Institute_for_Genomic_Researchhttp://en.wikipedia.org/wiki/The_Institute_for_Genomic_Researchhttp://en.wikipedia.org/wiki/The_Institute_for_Genomic_Researchhttp://en.wikipedia.org/wiki/Haemophilus_influenzaehttp://en.wikipedia.org/wiki/Haemophilus_influenzaehttp://en.wikipedia.org/wiki/Haemophilus_influenzaehttp://en.wikipedia.org/wiki/Haemophilus_influenzaehttp://en.wikipedia.org/wiki/The_Institute_for_Genomic_Researchhttp://en.wikipedia.org/wiki/Genome_project#Genome_annotationhttp://en.wikipedia.org/wiki/Genomicshttp://en.wikipedia.org/wiki/Gene_finding7/31/2019 Bioinformatics 1
13/24
Analysis of gene expression
Theexpressionof many genes can be determined by
measuringmRNAlevels with multiple techniques
includingmicroarrays,expressed cDNA sequence tag(EST)
sequencing,serial analysis of gene expression(SAGE) tag
sequencing,massively parallel signature
sequencing(MPSS),RNA-Seq, also known as "Whole
Transcriptome Shotgun Sequencing" (WTSS), or various
applications of multiplexed in-situ hybridization. All of these
techniques are extremely noise-prone and/or subject to bias in
the biological measurement, and a major research area in
computational biology involves developing statistical tools to
separatesignalfromnoisein high-throughput gene expression
studies. Such studies are often used to determine the genes
implicated in a disorder: one might compare microarray data
from cancerousepithelialcells to data from non-cancerous cells
to determine the transcripts that are up-regulated and down-
regulated in a particular population of cancer cells.
http://en.wikipedia.org/wiki/Gene_expressionhttp://en.wikipedia.org/wiki/Gene_expressionhttp://en.wikipedia.org/wiki/Gene_expressionhttp://en.wikipedia.org/wiki/Messenger_RNAhttp://en.wikipedia.org/wiki/Messenger_RNAhttp://en.wikipedia.org/wiki/Messenger_RNAhttp://en.wikipedia.org/wiki/DNA_microarrayhttp://en.wikipedia.org/wiki/DNA_microarrayhttp://en.wikipedia.org/wiki/DNA_microarrayhttp://en.wikipedia.org/wiki/Expressed_sequence_taghttp://en.wikipedia.org/wiki/Expressed_sequence_taghttp://en.wikipedia.org/wiki/Expressed_sequence_taghttp://en.wikipedia.org/wiki/Serial_analysis_of_gene_expressionhttp://en.wikipedia.org/wiki/Serial_analysis_of_gene_expressionhttp://en.wikipedia.org/wiki/Serial_analysis_of_gene_expressionhttp://en.wikipedia.org/wiki/Massively_parallel_signature_sequencinghttp://en.wikipedia.org/wiki/Massively_parallel_signature_sequencinghttp://en.wikipedia.org/wiki/Massively_parallel_signature_sequencinghttp://en.wikipedia.org/wiki/Massively_parallel_signature_sequencinghttp://en.wikipedia.org/wiki/RNA-Seqhttp://en.wikipedia.org/wiki/RNA-Seqhttp://en.wikipedia.org/wiki/RNA-Seqhttp://en.wikipedia.org/wiki/Signal_(information_theory)http://en.wikipedia.org/wiki/Signal_(information_theory)http://en.wikipedia.org/wiki/Signal_(information_theory)http://en.wikipedia.org/wiki/Noisehttp://en.wikipedia.org/wiki/Noisehttp://en.wikipedia.org/wiki/Noisehttp://en.wikipedia.org/wiki/Epithelialhttp://en.wikipedia.org/wiki/Epithelialhttp://en.wikipedia.org/wiki/Epithelialhttp://en.wikipedia.org/wiki/Epithelialhttp://en.wikipedia.org/wiki/Noisehttp://en.wikipedia.org/wiki/Signal_(information_theory)http://en.wikipedia.org/wiki/RNA-Seqhttp://en.wikipedia.org/wiki/Massively_parallel_signature_sequencinghttp://en.wikipedia.org/wiki/Massively_parallel_signature_sequencinghttp://en.wikipedia.org/wiki/Serial_analysis_of_gene_expressionhttp://en.wikipedia.org/wiki/Expressed_sequence_taghttp://en.wikipedia.org/wiki/DNA_microarrayhttp://en.wikipedia.org/wiki/Messenger_RNAhttp://en.wikipedia.org/wiki/Gene_expression7/31/2019 Bioinformatics 1
14/24
7/31/2019 Bioinformatics 1
15/24
Analysis of protein expression
Protein microarraysand high throughput (HT)mass
spectrometry(MS) can provide a snapshot of the proteins
present in a biological sample. Bioinformatics is very much
involved in making sense of protein microarray and HT MS
data; the former approach faces similar problems as with
microarrays targeted at mRNA, the latter involves the problem
of matching large amounts of mass data against predicted
masses from protein sequence databases, and the complicated
statistical analysis of samples where multiple, but incomplete
peptides from each protein are detected.
http://en.wikipedia.org/wiki/Protein_microarrayhttp://en.wikipedia.org/wiki/Protein_microarrayhttp://en.wikipedia.org/wiki/Mass_spectrometryhttp://en.wikipedia.org/wiki/Mass_spectrometryhttp://en.wikipedia.org/wiki/Mass_spectrometryhttp://en.wikipedia.org/wiki/Mass_spectrometryhttp://en.wikipedia.org/wiki/Mass_spectrometryhttp://en.wikipedia.org/wiki/Mass_spectrometryhttp://en.wikipedia.org/wiki/Protein_microarray7/31/2019 Bioinformatics 1
16/24
Comparative genomics
Main article:Comparative genomics
The core of comparative genome analysis is the establishment
of the correspondence betweengenes(orthologyanalysis) or
other genomic features in different organisms. It is these
intergenomic maps that make it possible to trace the
evolutionary processes responsible for the divergence of two
genomes. A multitude of evolutionary events acting at various
organizational levels shape genome evolution. At the lowest
level, point mutations affect individual nucleotides. At a higher
level, large chromosomal segments undergo duplication, lateral
transfer, inversion, transposition, deletion and insertion.
Ultimately, whole genomes are involved in processes of
hybridization, polyploidization andendosymbiosis, often leading
to rapid speciation. The complexity of genome evolution poses
many exciting challenges to developers of mathematical
http://en.wikipedia.org/wiki/Comparative_genomicshttp://en.wikipedia.org/wiki/Comparative_genomicshttp://en.wikipedia.org/wiki/Comparative_genomicshttp://en.wikipedia.org/wiki/Geneshttp://en.wikipedia.org/wiki/Geneshttp://en.wikipedia.org/wiki/Geneshttp://en.wikipedia.org/wiki/Orthologyhttp://en.wikipedia.org/wiki/Orthologyhttp://en.wikipedia.org/wiki/Orthologyhttp://en.wikipedia.org/wiki/Endosymbiosishttp://en.wikipedia.org/wiki/Endosymbiosishttp://en.wikipedia.org/wiki/Endosymbiosishttp://en.wikipedia.org/wiki/Orthologyhttp://en.wikipedia.org/wiki/Geneshttp://en.wikipedia.org/wiki/Comparative_genomics7/31/2019 Bioinformatics 1
17/24
models and algorithms, who have recourse to a spectra of
algorithmic, statistical and mathematical techniques, ranging
from exact,heuristics, fixed parameter andapproximation
algorithmsfor problems based on parsimony models toMarkov
Chain Monte Carloalgorithms forBayesian analysisof
problems based on probabilistic models.
Modeling biological systems
Main article:Computational systems biology
Systems biology involves the use ofcomputer
simulationsofcellularsubsystems (such as thenetworks of
metabolitesandenzymeswhich comprisemetabolism,signal
transductionpathways andgene regulatory networks) to both
analyze and visualize the complex connections of these cellular
processes.Artificial life or virtual evolution attempts to
http://en.wikipedia.org/wiki/Heuristicshttp://en.wikipedia.org/wiki/Heuristicshttp://en.wikipedia.org/wiki/Heuristicshttp://en.wikipedia.org/wiki/Approximation_algorithmshttp://en.wikipedia.org/wiki/Approximation_algorithmshttp://en.wikipedia.org/wiki/Approximation_algorithmshttp://en.wikipedia.org/wiki/Approximation_algorithmshttp://en.wikipedia.org/wiki/Markov_Chain_Monte_Carlohttp://en.wikipedia.org/wiki/Markov_Chain_Monte_Carlohttp://en.wikipedia.org/wiki/Markov_Chain_Monte_Carlohttp://en.wikipedia.org/wiki/Markov_Chain_Monte_Carlohttp://en.wikipedia.org/wiki/Bayesian_analysishttp://en.wikipedia.org/wiki/Bayesian_analysishttp://en.wikipedia.org/wiki/Bayesian_analysishttp://en.wikipedia.org/wiki/Computational_systems_biologyhttp://en.wikipedia.org/wiki/Computational_systems_biologyhttp://en.wikipedia.org/wiki/Computational_systems_biologyhttp://en.wikipedia.org/wiki/Computer_simulationhttp://en.wikipedia.org/wiki/Computer_simulationhttp://en.wikipedia.org/wiki/Computer_simulationhttp://en.wikipedia.org/wiki/Computer_simulationhttp://en.wikipedia.org/wiki/Cell_(biology)http://en.wikipedia.org/wiki/Cell_(biology)http://en.wikipedia.org/wiki/Cell_(biology)http://en.wikipedia.org/wiki/Metabolic_networkhttp://en.wikipedia.org/wiki/Metabolic_networkhttp://en.wikipedia.org/wiki/Metabolic_networkhttp://en.wikipedia.org/wiki/Metabolic_networkhttp://en.wikipedia.org/wiki/Enzymehttp://en.wikipedia.org/wiki/Enzymehttp://en.wikipedia.org/wiki/Enzymehttp://en.wikipedia.org/wiki/Metabolismhttp://en.wikipedia.org/wiki/Metabolismhttp://en.wikipedia.org/wiki/Metabolismhttp://en.wikipedia.org/wiki/Signal_transductionhttp://en.wikipedia.org/wiki/Signal_transductionhttp://en.wikipedia.org/wiki/Signal_transductionhttp://en.wikipedia.org/wiki/Signal_transductionhttp://en.wikipedia.org/wiki/Gene_regulatory_networkhttp://en.wikipedia.org/wiki/Gene_regulatory_networkhttp://en.wikipedia.org/wiki/Artificial_lifehttp://en.wikipedia.org/wiki/Artificial_lifehttp://en.wikipedia.org/wiki/Artificial_lifehttp://en.wikipedia.org/wiki/Gene_regulatory_networkhttp://en.wikipedia.org/wiki/Signal_transductionhttp://en.wikipedia.org/wiki/Signal_transductionhttp://en.wikipedia.org/wiki/Metabolismhttp://en.wikipedia.org/wiki/Enzymehttp://en.wikipedia.org/wiki/Metabolic_networkhttp://en.wikipedia.org/wiki/Metabolic_networkhttp://en.wikipedia.org/wiki/Cell_(biology)http://en.wikipedia.org/wiki/Computer_simulationhttp://en.wikipedia.org/wiki/Computer_simulationhttp://en.wikipedia.org/wiki/Computational_systems_biologyhttp://en.wikipedia.org/wiki/Bayesian_analysishttp://en.wikipedia.org/wiki/Markov_Chain_Monte_Carlohttp://en.wikipedia.org/wiki/Markov_Chain_Monte_Carlohttp://en.wikipedia.org/wiki/Approximation_algorithmshttp://en.wikipedia.org/wiki/Approximation_algorithmshttp://en.wikipedia.org/wiki/Heuristics7/31/2019 Bioinformatics 1
18/24
understand evolutionary processes via the computer simulation
of simple (artificial) life forms.
Prediction of protein structure
Main article:Protein structure prediction
Protein structure prediction is another important application of
bioinformatics. Theamino acidsequence of a protein, the so-
calledprimary structure, can be easily determined from the
sequence on the gene that codes for it. In the vast majority of
cases, this primary structure uniquely determines a structure in
its native environment. (Of course, there are exceptions, such
as thebovine spongiform encephalopathy a.k.a.Mad Cow
Diseaseprion.) Knowledge of this structure is vital in
http://en.wikipedia.org/wiki/Protein_structure_predictionhttp://en.wikipedia.org/wiki/Protein_structure_predictionhttp://en.wikipedia.org/wiki/Protein_structure_predictionhttp://en.wikipedia.org/wiki/Amino_acidhttp://en.wikipedia.org/wiki/Amino_acidhttp://en.wikipedia.org/wiki/Amino_acidhttp://en.wikipedia.org/wiki/Primary_structurehttp://en.wikipedia.org/wiki/Primary_structurehttp://en.wikipedia.org/wiki/Primary_structurehttp://en.wikipedia.org/wiki/Bovine_spongiform_encephalopathyhttp://en.wikipedia.org/wiki/Bovine_spongiform_encephalopathyhttp://en.wikipedia.org/wiki/Bovine_spongiform_encephalopathyhttp://en.wikipedia.org/wiki/Mad_Cow_Diseasehttp://en.wikipedia.org/wiki/Mad_Cow_Diseasehttp://en.wikipedia.org/wiki/Mad_Cow_Diseasehttp://en.wikipedia.org/wiki/Mad_Cow_Diseasehttp://en.wikipedia.org/wiki/Prionhttp://en.wikipedia.org/wiki/Prionhttp://en.wikipedia.org/wiki/Prionhttp://en.wikipedia.org/wiki/Prionhttp://en.wikipedia.org/wiki/Mad_Cow_Diseasehttp://en.wikipedia.org/wiki/Mad_Cow_Diseasehttp://en.wikipedia.org/wiki/Bovine_spongiform_encephalopathyhttp://en.wikipedia.org/wiki/Primary_structurehttp://en.wikipedia.org/wiki/Amino_acidhttp://en.wikipedia.org/wiki/Protein_structure_prediction7/31/2019 Bioinformatics 1
19/24
understanding the function of the protein. For lack of better
terms, structural information is usually classified as one
ofsecondary,tertiaryandquaternarystructure. A viable
general solution to such predictions remains an open problem.
As of now, most efforts have been directed towards heuristics
that work most of the time.
One of the key ideas in bioinformatics is the notion
ofhomology. In the genomic branch of bioinformatics,
homology is used to predict the function of a gene: if the
sequence of geneA, whose function is known, is homologous to
the sequence of gene B, whose function is unknown, one could
infer that B may share A's function. In the structural branch of
bioinformatics, homology is used to determine which parts of a
protein are important in structure formation and interaction with
other proteins. In a technique called homology modeling, this
information is used to predict the structure of a protein once the
structure of a homologous protein is known. This currently
remains the only way to predict protein structures reliably.
http://en.wikipedia.org/wiki/Secondary_structurehttp://en.wikipedia.org/wiki/Secondary_structurehttp://en.wikipedia.org/wiki/Secondary_structurehttp://en.wikipedia.org/wiki/Tertiary_structurehttp://en.wikipedia.org/wiki/Tertiary_structurehttp://en.wikipedia.org/wiki/Tertiary_structurehttp://en.wikipedia.org/wiki/Quaternary_structurehttp://en.wikipedia.org/wiki/Quaternary_structurehttp://en.wikipedia.org/wiki/Quaternary_structurehttp://en.wikipedia.org/wiki/Homology_(biology)http://en.wikipedia.org/wiki/Homology_(biology)http://en.wikipedia.org/wiki/Homology_(biology)http://en.wikipedia.org/wiki/Homology_(biology)http://en.wikipedia.org/wiki/Quaternary_structurehttp://en.wikipedia.org/wiki/Tertiary_structurehttp://en.wikipedia.org/wiki/Secondary_structure7/31/2019 Bioinformatics 1
20/24
One example of this is the similar protein homology between
hemoglobin in humans and the hemoglobin in legumes
(leghemoglobin). Both serve the same purpose of transporting
oxygen in the organism. Though both of these proteins have
completely different amino acid sequences, their protein
structures are virtually identical, which reflects their near
identical purposes.
Other techniques for predicting protein structure include protein
threading and de novo(from scratch) physics-based modeling.
http://en.wikipedia.org/wiki/Leghemoglobinhttp://en.wikipedia.org/wiki/Leghemoglobinhttp://en.wikipedia.org/wiki/Leghemoglobin7/31/2019 Bioinformatics 1
21/24
Web services in bioinformatics
SOAPandREST-based interfaces have been developed for a
wide variety of bioinformatics applications allowing an
application running on one computer in one part of the world to
use algorithms, data and computing resources on servers in
other parts of the world. The main advantages derive from the
fact that end users do not have to deal with software and
database maintenance overheads.
Basic bioinformatics services are classified by theEBIinto
three categories:SSS(Sequence Search
Services),MSA(Multiple Sequence Alignment)
http://en.wikipedia.org/wiki/SOAPhttp://en.wikipedia.org/wiki/SOAPhttp://en.wikipedia.org/wiki/RESThttp://en.wikipedia.org/wiki/RESThttp://en.wikipedia.org/wiki/European_Bioinformatics_Institutehttp://en.wikipedia.org/wiki/European_Bioinformatics_Institutehttp://en.wikipedia.org/wiki/European_Bioinformatics_Institutehttp://en.wikipedia.org/wiki/Sequence_alignment_softwarehttp://en.wikipedia.org/wiki/Sequence_alignment_softwarehttp://en.wikipedia.org/wiki/Sequence_alignment_softwarehttp://en.wikipedia.org/wiki/Multiple_sequence_alignmenthttp://en.wikipedia.org/wiki/Multiple_sequence_alignmenthttp://en.wikipedia.org/wiki/Multiple_sequence_alignmenthttp://en.wikipedia.org/wiki/Multiple_sequence_alignmenthttp://en.wikipedia.org/wiki/Sequence_alignment_softwarehttp://en.wikipedia.org/wiki/European_Bioinformatics_Institutehttp://en.wikipedia.org/wiki/RESThttp://en.wikipedia.org/wiki/SOAP7/31/2019 Bioinformatics 1
22/24
andBSA(Biological Sequence Analysis). The availability of
theseservice-orientedbioinformatics resources demonstrate
the applicability of web based bioinformatics solutions, and
range from a collection of standalone tools with a common data
format under a single, standalone or web-based interface, to
integrative, distributed and extensiblebioinformatics workflow
management systems.
http://en.wikipedia.org/wiki/Bioinformatics#Sequence_analysishttp://en.wikipedia.org/wiki/Bioinformatics#Sequence_analysishttp://en.wikipedia.org/wiki/Bioinformatics#Sequence_analysishttp://en.wikipedia.org/wiki/Service-orientationhttp://en.wikipedia.org/wiki/Service-orientationhttp://en.wikipedia.org/wiki/Service-orientationhttp://en.wikipedia.org/wiki/Bioinformatics_workflow_management_systemshttp://en.wikipedia.org/wiki/Bioinformatics_workflow_management_systemshttp://en.wikipedia.org/wiki/Bioinformatics_workflow_management_systemshttp://en.wikipedia.org/wiki/Bioinformatics_workflow_management_systemshttp://en.wikipedia.org/wiki/Bioinformatics_workflow_management_systemshttp://en.wikipedia.org/wiki/Bioinformatics_workflow_management_systemshttp://en.wikipedia.org/wiki/Service-orientationhttp://en.wikipedia.org/wiki/Bioinformatics#Sequence_analysis7/31/2019 Bioinformatics 1
23/24
High-throughput image analysis
Computational technologies are used to accelerate or fully
automate the processing, quantification and analysis of large
amounts of high-information-contentbiomedical imagery.
Modern image analysis systems augment an observer's ability
to make measurements from a large or complex set of images,
by improvingaccuracy,objectivity, or speed. A fully developed
analysis system may completely replace the observer. Although
these systems are not unique to biomedical imagery,
http://en.wikipedia.org/w/index.php?title=Biomedical_imagery&action=edit&redlink=1http://en.wikipedia.org/w/index.php?title=Biomedical_imagery&action=edit&redlink=1http://en.wikipedia.org/w/index.php?title=Biomedical_imagery&action=edit&redlink=1http://en.wikipedia.org/wiki/Accuracyhttp://en.wikipedia.org/wiki/Accuracyhttp://en.wikipedia.org/wiki/Accuracyhttp://en.wikipedia.org/wiki/Objectivity_(science)http://en.wikipedia.org/wiki/Objectivity_(science)http://en.wikipedia.org/wiki/Objectivity_(science)http://en.wikipedia.org/wiki/Objectivity_(science)http://en.wikipedia.org/wiki/Accuracyhttp://en.wikipedia.org/w/index.php?title=Biomedical_imagery&action=edit&redlink=17/31/2019 Bioinformatics 1
24/24
biomedical imaging is becoming more important for
bothdiagnosticsand research. Some examples are:
high-throughput and high-fidelity quantification and sub-
cellular localization (high-content screening,
cytohistopathology,Bioimage informatics)
morphometrics
clinical image analysis and visualization
determining the real-time air-flow patterns in breathing lungs
of living animals
quantifying occlusion size in real-time imagery from the
development of and recovery during arterial injury
making behavioral observations from extended video
recordings of laboratory animals
infrared measurements for metabolic activity determination
inferring clone overlaps inDNA mapping, e.g. the Sulston
score
http://en.wikipedia.org/wiki/Diagnosticshttp://en.wikipedia.org/wiki/Diagnosticshttp://en.wikipedia.org/wiki/Diagnosticshttp://en.wikipedia.org/wiki/High-content_screeninghttp://en.wikipedia.org/wiki/High-content_screeninghttp://en.wikipedia.org/wiki/Bioimage_informaticshttp://en.wikipedia.org/wiki/Bioimage_informaticshttp://en.wikipedia.org/wiki/Bioimage_informaticshttp://en.wikipedia.org/wiki/Morphometricshttp://en.wikipedia.org/wiki/Morphometricshttp://en.wikipedia.org/wiki/Gene_mappinghttp://en.wikipedia.org/wiki/Gene_mappinghttp://en.wikipedia.org/wiki/Gene_mappinghttp://en.wikipedia.org/wiki/Sulston_scorehttp://en.wikipedia.org/wiki/Sulston_scorehttp://en.wikipedia.org/wiki/Sulston_scorehttp://en.wikipedia.org/wiki/Sulston_scorehttp://en.wikipedia.org/wiki/Sulston_scorehttp://en.wikipedia.org/wiki/Gene_mappinghttp://en.wikipedia.org/wiki/Morphometricshttp://en.wikipedia.org/wiki/Bioimage_informaticshttp://en.wikipedia.org/wiki/High-content_screeninghttp://en.wikipedia.org/wiki/Diagnostics