Bioinformatics 1

Embed Size (px)

Citation preview

  • 7/31/2019 Bioinformatics 1

    1/24

    Bioinformatics

    Bioinformatics is a branch of biological science whichdeals with the study of methods for storing, retrieving andanalyzing biological data, such as nucleic acid (DNA/RNA)and protein sequences, structures, functions, pathwaysand genetic interactions. It generates new knowledge

    about drug designing and development of new softwaretools. Bioinformatics also deals with algorithms, databases andinformation systems, web technologies, artificial intelligenceand soft computing, information and computation theory,structural biology, software engineering, data mining, imageprocessing, modeling and simulation, signal processing,discrete mathematics, control and system theory, circuit theory,and statistics.

    Commonly used software tools and technologies in this fieldincludeJava,XML,Perl,C,C++,Python,R,MySQL,SQL,CUDA,MATLAB, andMicrosoft Excel.

    http://en.wikipedia.org/wiki/Java_(programming_language)http://en.wikipedia.org/wiki/Java_(programming_language)http://en.wikipedia.org/wiki/Java_(programming_language)http://en.wikipedia.org/wiki/XMLhttp://en.wikipedia.org/wiki/XMLhttp://en.wikipedia.org/wiki/XMLhttp://en.wikipedia.org/wiki/Perlhttp://en.wikipedia.org/wiki/Perlhttp://en.wikipedia.org/wiki/Perlhttp://en.wikipedia.org/wiki/C_(programming_language)http://en.wikipedia.org/wiki/C_(programming_language)http://en.wikipedia.org/wiki/C_(programming_language)http://en.wikipedia.org/wiki/C%2B%2Bhttp://en.wikipedia.org/wiki/C%2B%2Bhttp://en.wikipedia.org/wiki/C%2B%2Bhttp://en.wikipedia.org/wiki/Python_(programming_language)http://en.wikipedia.org/wiki/Python_(programming_language)http://en.wikipedia.org/wiki/Python_(programming_language)http://en.wikipedia.org/wiki/R_(programming_language)http://en.wikipedia.org/wiki/R_(programming_language)http://en.wikipedia.org/wiki/MySQLhttp://en.wikipedia.org/wiki/MySQLhttp://en.wikipedia.org/wiki/MySQLhttp://en.wikipedia.org/wiki/SQLhttp://en.wikipedia.org/wiki/SQLhttp://en.wikipedia.org/wiki/SQLhttp://en.wikipedia.org/wiki/CUDAhttp://en.wikipedia.org/wiki/CUDAhttp://en.wikipedia.org/wiki/CUDAhttp://en.wikipedia.org/wiki/CUDAhttp://en.wikipedia.org/wiki/MATLABhttp://en.wikipedia.org/wiki/MATLABhttp://en.wikipedia.org/wiki/Microsoft_Excelhttp://en.wikipedia.org/wiki/Microsoft_Excelhttp://en.wikipedia.org/wiki/Microsoft_Excelhttp://en.wikipedia.org/wiki/Microsoft_Excelhttp://en.wikipedia.org/wiki/MATLABhttp://en.wikipedia.org/wiki/CUDAhttp://en.wikipedia.org/wiki/CUDAhttp://en.wikipedia.org/wiki/SQLhttp://en.wikipedia.org/wiki/MySQLhttp://en.wikipedia.org/wiki/R_(programming_language)http://en.wikipedia.org/wiki/Python_(programming_language)http://en.wikipedia.org/wiki/C%2B%2Bhttp://en.wikipedia.org/wiki/C_(programming_language)http://en.wikipedia.org/wiki/Perlhttp://en.wikipedia.org/wiki/XMLhttp://en.wikipedia.org/wiki/Java_(programming_language)
  • 7/31/2019 Bioinformatics 1

    2/24

    Introduction

  • 7/31/2019 Bioinformatics 1

    3/24

    Introduction

    Bioinformatics was applied in the creation and maintenance of

    a database to store biological information at the beginning of

    the "genomic revolution", such asnucleotide

    sequencesandamino acid sequences. Development of this

    type of database involved not only design issues but the

    development of complex interfaces whereby researchers could

    access existing data as well as submit new or revised data.

    In order to study how normal cellular activities are altered in

    different disease states, the biological data must be combined

    to form a comprehensive picture of these activities. Therefore,

    the field of bioinformatics has evolved such that the most

    pressing task now involves the analysis and interpretation of

    various types of data. This includes nucleotide and amino acid

    sequences, protein domains, and protein structures.[1]The

    actual process of analyzing and interpreting data is referred to

    as computational biology. Important sub-disciplines within

    bioinformatics and computational biology include:

    http://en.wikipedia.org/wiki/Nucleotide_sequenceshttp://en.wikipedia.org/wiki/Nucleotide_sequenceshttp://en.wikipedia.org/wiki/Nucleotide_sequenceshttp://en.wikipedia.org/wiki/Nucleotide_sequenceshttp://en.wikipedia.org/wiki/Amino_acid_sequencehttp://en.wikipedia.org/wiki/Amino_acid_sequencehttp://en.wikipedia.org/wiki/Amino_acid_sequencehttp://en.wikipedia.org/wiki/Bioinformatics#cite_note-0http://en.wikipedia.org/wiki/Bioinformatics#cite_note-0http://en.wikipedia.org/wiki/Bioinformatics#cite_note-0http://en.wikipedia.org/wiki/Bioinformatics#cite_note-0http://en.wikipedia.org/wiki/Amino_acid_sequencehttp://en.wikipedia.org/wiki/Nucleotide_sequenceshttp://en.wikipedia.org/wiki/Nucleotide_sequences
  • 7/31/2019 Bioinformatics 1

    4/24

    the development and implementation of tools that enable

    efficient access to, and use and management of, various

    types of information.

    the development of new algorithms (mathematical formulas)

    and statistics with which to assess relationships among

    members of large data sets. For example, methods to locate

    a gene within a sequence, predict protein structure and/or

    function, and cluster protein sequences into families of

    related sequences.

    The primary goal of bioinformatics is to increase the

    understanding of biological processes. What sets it apart from

    other approaches, however, is its focus on developing and

    applying computationally intensive techniques to achieve this

    goal. Examples include:pattern recognition,data

    mining,machine learningalgorithms, andvisualization. Major

    research efforts in the field includesequence alignment,gene

    finding,genome assembly,drug design,drug discovery,protein

    structure alignment,protein structure prediction, prediction

    http://en.wikipedia.org/wiki/Pattern_recognitionhttp://en.wikipedia.org/wiki/Pattern_recognitionhttp://en.wikipedia.org/wiki/Pattern_recognitionhttp://en.wikipedia.org/wiki/Data_mininghttp://en.wikipedia.org/wiki/Data_mininghttp://en.wikipedia.org/wiki/Data_mininghttp://en.wikipedia.org/wiki/Data_mininghttp://en.wikipedia.org/wiki/Machine_learninghttp://en.wikipedia.org/wiki/Machine_learninghttp://en.wikipedia.org/wiki/Machine_learninghttp://en.wikipedia.org/wiki/Biological_Data_Visualizationhttp://en.wikipedia.org/wiki/Biological_Data_Visualizationhttp://en.wikipedia.org/wiki/Biological_Data_Visualizationhttp://en.wikipedia.org/wiki/Sequence_alignmenthttp://en.wikipedia.org/wiki/Sequence_alignmenthttp://en.wikipedia.org/wiki/Sequence_alignmenthttp://en.wikipedia.org/wiki/Gene_findinghttp://en.wikipedia.org/wiki/Gene_findinghttp://en.wikipedia.org/wiki/Gene_findinghttp://en.wikipedia.org/wiki/Gene_findinghttp://en.wikipedia.org/wiki/Genome_assemblyhttp://en.wikipedia.org/wiki/Genome_assemblyhttp://en.wikipedia.org/wiki/Genome_assemblyhttp://en.wikipedia.org/wiki/Drug_designhttp://en.wikipedia.org/wiki/Drug_designhttp://en.wikipedia.org/wiki/Drug_designhttp://en.wikipedia.org/wiki/Drug_discoveryhttp://en.wikipedia.org/wiki/Drug_discoveryhttp://en.wikipedia.org/wiki/Drug_discoveryhttp://en.wikipedia.org/wiki/Protein_structural_alignmenthttp://en.wikipedia.org/wiki/Protein_structural_alignmenthttp://en.wikipedia.org/wiki/Protein_structural_alignmenthttp://en.wikipedia.org/wiki/Protein_structural_alignmenthttp://en.wikipedia.org/wiki/Protein_structure_predictionhttp://en.wikipedia.org/wiki/Protein_structure_predictionhttp://en.wikipedia.org/wiki/Protein_structure_predictionhttp://en.wikipedia.org/wiki/Protein_structure_predictionhttp://en.wikipedia.org/wiki/Protein_structural_alignmenthttp://en.wikipedia.org/wiki/Protein_structural_alignmenthttp://en.wikipedia.org/wiki/Drug_discoveryhttp://en.wikipedia.org/wiki/Drug_designhttp://en.wikipedia.org/wiki/Genome_assemblyhttp://en.wikipedia.org/wiki/Gene_findinghttp://en.wikipedia.org/wiki/Gene_findinghttp://en.wikipedia.org/wiki/Sequence_alignmenthttp://en.wikipedia.org/wiki/Biological_Data_Visualizationhttp://en.wikipedia.org/wiki/Machine_learninghttp://en.wikipedia.org/wiki/Data_mininghttp://en.wikipedia.org/wiki/Data_mininghttp://en.wikipedia.org/wiki/Pattern_recognition
  • 7/31/2019 Bioinformatics 1

    5/24

    ofgene expressionandproteinprotein interactions,genome-

    wide association studiesand the modeling ofevolution.

    Interestingly, the term bioinformaticswas coined before the

    "genomic revolution".Paulien Hogewegand Ben Hesper

    introduced the term in 1978 to refer to "the study of information

    processes in biotic systems".[2][3]

    This definition placed

    bioinformatics as a field parallel to biophysics or biochemistry

    (biochemistry is the study of chemical processes in biological

    systems).[3]However, its primary use since at least the late

    1980s has been to describe the application of computer

    science and information sciences to the analysis of biological

    data, particularly in those areas of genomics involving large-

    scaleDNA sequencing.

    Bioinformatics now entails the creation and advancement of

    databases, algorithms, computational and statistical techniques

    and theory to solve formal and practical problems arising from

    the management and analysis of biological data.

    http://en.wikipedia.org/wiki/Gene_expressionhttp://en.wikipedia.org/wiki/Gene_expressionhttp://en.wikipedia.org/wiki/Gene_expressionhttp://en.wikipedia.org/wiki/Protein%E2%80%93protein_interactionshttp://en.wikipedia.org/wiki/Protein%E2%80%93protein_interactionshttp://en.wikipedia.org/wiki/Protein%E2%80%93protein_interactionshttp://en.wikipedia.org/wiki/Protein%E2%80%93protein_interactionshttp://en.wikipedia.org/wiki/Protein%E2%80%93protein_interactionshttp://en.wikipedia.org/wiki/Genome-wide_association_studieshttp://en.wikipedia.org/wiki/Genome-wide_association_studieshttp://en.wikipedia.org/wiki/Genome-wide_association_studieshttp://en.wikipedia.org/wiki/Genome-wide_association_studieshttp://en.wikipedia.org/wiki/Evolutionhttp://en.wikipedia.org/wiki/Evolutionhttp://en.wikipedia.org/wiki/Evolutionhttp://en.wikipedia.org/wiki/Paulien_Hogeweghttp://en.wikipedia.org/wiki/Paulien_Hogeweghttp://en.wikipedia.org/wiki/Paulien_Hogeweghttp://en.wikipedia.org/wiki/Bioinformatics#cite_note-1http://en.wikipedia.org/wiki/Bioinformatics#cite_note-1http://en.wikipedia.org/wiki/Bioinformatics#cite_note-1http://en.wikipedia.org/wiki/Bioinformatics#cite_note-Hogeweg2011-2http://en.wikipedia.org/wiki/Bioinformatics#cite_note-Hogeweg2011-2http://en.wikipedia.org/wiki/Bioinformatics#cite_note-Hogeweg2011-2http://en.wikipedia.org/wiki/DNA_sequencinghttp://en.wikipedia.org/wiki/DNA_sequencinghttp://en.wikipedia.org/wiki/DNA_sequencinghttp://en.wikipedia.org/wiki/DNA_sequencinghttp://en.wikipedia.org/wiki/Bioinformatics#cite_note-Hogeweg2011-2http://en.wikipedia.org/wiki/Bioinformatics#cite_note-1http://en.wikipedia.org/wiki/Bioinformatics#cite_note-1http://en.wikipedia.org/wiki/Paulien_Hogeweghttp://en.wikipedia.org/wiki/Evolutionhttp://en.wikipedia.org/wiki/Genome-wide_association_studieshttp://en.wikipedia.org/wiki/Genome-wide_association_studieshttp://en.wikipedia.org/wiki/Protein%E2%80%93protein_interactionshttp://en.wikipedia.org/wiki/Gene_expression
  • 7/31/2019 Bioinformatics 1

    6/24

    Over the past few decades rapid developments in genomic and

    other molecular research technologies and developments in

    information technologies have combined to produce a

    tremendous amount of information related to molecular biology.

    Bioinformatics is the name given to these mathematical and

    computing approaches used to glean understanding of

    biological processes.

    Common activities in bioinformatics include mapping and

    analyzingDNAand protein sequences, aligning different DNA

    and protein sequences to compare them, and creating and

    viewing 3-D models of protein structures.

    There are two fundamental ways of modelling a Biological

    system (e.g., living cell) both coming under Bioinformatic

    approaches.

    Static

    Sequences Proteins, Nucleic acids and Peptides

    Structures Proteins, Nucleic acids, Ligands (includingmetabolites and drugs) and Peptides

    Interaction data among the above entities includingmicroarray data and Networks of proteins, metabolites

    http://en.wikipedia.org/wiki/DNAhttp://en.wikipedia.org/wiki/DNAhttp://en.wikipedia.org/wiki/DNAhttp://en.wikipedia.org/wiki/DNA
  • 7/31/2019 Bioinformatics 1

    7/24

    Dynamic

    Systems Biology comes under this category includingreaction fluxes and variable concentrations of metabolites

    Multi-Agent Based modelling approaches capturingcellular events such as signalling, transcription andreaction dynamics

    A broad sub-category under bioinformatics isstructural

    bioinformatics.

    http://en.wikipedia.org/wiki/Structural_bioinformaticshttp://en.wikipedia.org/wiki/Structural_bioinformaticshttp://en.wikipedia.org/wiki/Structural_bioinformaticshttp://en.wikipedia.org/wiki/Structural_bioinformaticshttp://en.wikipedia.org/wiki/Structural_bioinformaticshttp://en.wikipedia.org/wiki/Structural_bioinformatics
  • 7/31/2019 Bioinformatics 1

    8/24

    Sequence analysis

  • 7/31/2019 Bioinformatics 1

    9/24

    Sequence analysis

    Main articles:Sequence alignmentandSequence database

    Since thePhage -X174wassequencedin 1977,[4]theDNA

    sequencesof thousands of organisms have been decoded and

    stored in databases. This sequence information is analyzed to

    determine genes that encodepolypeptides(proteins), RNA

    genes, regulatory sequences, structural motifs, and repetitive

    sequences. A comparison of genes within aspeciesor between

    different species can show similarities between protein

    functions, or relations between species (the use ofmolecular

    systematicsto constructphylogenetic trees). With the growing

    amount of data, it long ago became impractical to analyze DNA

    sequences manually. Today,computer programssuch

    asBLASTare used daily to search sequences from more than

    260 000 organisms, containing over 190

    billionnucleotides.[5]

    These programs can compensate for

    mutations (exchanged, deleted or inserted bases) in the DNA

    http://en.wikipedia.org/wiki/Sequence_alignmenthttp://en.wikipedia.org/wiki/Sequence_alignmenthttp://en.wikipedia.org/wiki/Sequence_alignmenthttp://en.wikipedia.org/wiki/Sequence_databasehttp://en.wikipedia.org/wiki/Sequence_databasehttp://en.wikipedia.org/wiki/Sequence_databasehttp://en.wikipedia.org/wiki/Phi_X_174http://en.wikipedia.org/wiki/Phi_X_174http://en.wikipedia.org/wiki/Phi_X_174http://en.wikipedia.org/wiki/Phi_X_174http://en.wikipedia.org/wiki/Sequencinghttp://en.wikipedia.org/wiki/Sequencinghttp://en.wikipedia.org/wiki/Sequencinghttp://en.wikipedia.org/wiki/Bioinformatics#cite_note-pmid870828-3http://en.wikipedia.org/wiki/Bioinformatics#cite_note-pmid870828-3http://en.wikipedia.org/wiki/Bioinformatics#cite_note-pmid870828-3http://en.wikipedia.org/wiki/DNA_sequencehttp://en.wikipedia.org/wiki/DNA_sequencehttp://en.wikipedia.org/wiki/DNA_sequencehttp://en.wikipedia.org/wiki/DNA_sequencehttp://en.wikipedia.org/wiki/Polypeptideshttp://en.wikipedia.org/wiki/Polypeptideshttp://en.wikipedia.org/wiki/Polypeptideshttp://en.wikipedia.org/wiki/Proteinshttp://en.wikipedia.org/wiki/Specieshttp://en.wikipedia.org/wiki/Specieshttp://en.wikipedia.org/wiki/Molecular_systematicshttp://en.wikipedia.org/wiki/Molecular_systematicshttp://en.wikipedia.org/wiki/Molecular_systematicshttp://en.wikipedia.org/wiki/Molecular_systematicshttp://en.wikipedia.org/wiki/Phylogenetic_treehttp://en.wikipedia.org/wiki/Phylogenetic_treehttp://en.wikipedia.org/wiki/Phylogenetic_treehttp://en.wikipedia.org/wiki/Computer_programhttp://en.wikipedia.org/wiki/Computer_programhttp://en.wikipedia.org/wiki/Computer_programhttp://en.wikipedia.org/wiki/BLASThttp://en.wikipedia.org/wiki/BLASThttp://en.wikipedia.org/wiki/BLASThttp://en.wikipedia.org/wiki/Nucleotideshttp://en.wikipedia.org/wiki/Nucleotideshttp://en.wikipedia.org/wiki/Bioinformatics#cite_note-pmid18073190-4http://en.wikipedia.org/wiki/Bioinformatics#cite_note-pmid18073190-4http://en.wikipedia.org/wiki/Bioinformatics#cite_note-pmid18073190-4http://en.wikipedia.org/wiki/Bioinformatics#cite_note-pmid18073190-4http://en.wikipedia.org/wiki/Nucleotideshttp://en.wikipedia.org/wiki/BLASThttp://en.wikipedia.org/wiki/Computer_programhttp://en.wikipedia.org/wiki/Phylogenetic_treehttp://en.wikipedia.org/wiki/Molecular_systematicshttp://en.wikipedia.org/wiki/Molecular_systematicshttp://en.wikipedia.org/wiki/Specieshttp://en.wikipedia.org/wiki/Proteinshttp://en.wikipedia.org/wiki/Polypeptideshttp://en.wikipedia.org/wiki/DNA_sequencehttp://en.wikipedia.org/wiki/DNA_sequencehttp://en.wikipedia.org/wiki/Bioinformatics#cite_note-pmid870828-3http://en.wikipedia.org/wiki/Sequencinghttp://en.wikipedia.org/wiki/Phi_X_174http://en.wikipedia.org/wiki/Sequence_databasehttp://en.wikipedia.org/wiki/Sequence_alignment
  • 7/31/2019 Bioinformatics 1

    10/24

    sequence, to identify sequences that are related, but not

    identical. A variant of thissequence alignmentis used in the

    sequencing process itself. The so-calledshotgun

    sequencingtechnique (which was used, for example, byThe

    Institute for Genomic Researchto sequence the first bacterial

    genome,Haemophilus influenzae)[6]does not produce entire

    chromosomes. Instead it generates the sequences of many

    thousands of small DNA fragments (ranging from 35 to 900

    nucleotides long, depending on the sequencing technology).

    The ends of these fragments overlap and, when aligned

    properly by a genome assembly program, can be used to

    reconstruct the complete genome. Shotgun sequencing yields

    sequence data quickly, but the task of assembling the

    fragments can be quite complicated for larger genomes. For a

    genome as large as thehuman genome, it may take many days

    of CPU time on large-memory, multiprocessor computers to

    assemble the fragments, and the resulting assembly will usually

    contain numerous gaps that have to be filled in later. Shotgun

    sequencing is the method of choice for virtually all genomes

    http://en.wikipedia.org/wiki/Sequence_alignmenthttp://en.wikipedia.org/wiki/Sequence_alignmenthttp://en.wikipedia.org/wiki/Sequence_alignmenthttp://en.wikipedia.org/wiki/Shotgun_sequencinghttp://en.wikipedia.org/wiki/Shotgun_sequencinghttp://en.wikipedia.org/wiki/Shotgun_sequencinghttp://en.wikipedia.org/wiki/Shotgun_sequencinghttp://en.wikipedia.org/wiki/The_Institute_for_Genomic_Researchhttp://en.wikipedia.org/wiki/The_Institute_for_Genomic_Researchhttp://en.wikipedia.org/wiki/The_Institute_for_Genomic_Researchhttp://en.wikipedia.org/wiki/The_Institute_for_Genomic_Researchhttp://en.wikipedia.org/wiki/Haemophilus_influenzaehttp://en.wikipedia.org/wiki/Haemophilus_influenzaehttp://en.wikipedia.org/wiki/Bioinformatics#cite_note-pmid7542800-5http://en.wikipedia.org/wiki/Bioinformatics#cite_note-pmid7542800-5http://en.wikipedia.org/wiki/Bioinformatics#cite_note-pmid7542800-5http://en.wikipedia.org/wiki/Human_genomehttp://en.wikipedia.org/wiki/Human_genomehttp://en.wikipedia.org/wiki/Human_genomehttp://en.wikipedia.org/wiki/Human_genomehttp://en.wikipedia.org/wiki/Bioinformatics#cite_note-pmid7542800-5http://en.wikipedia.org/wiki/Haemophilus_influenzaehttp://en.wikipedia.org/wiki/The_Institute_for_Genomic_Researchhttp://en.wikipedia.org/wiki/The_Institute_for_Genomic_Researchhttp://en.wikipedia.org/wiki/Shotgun_sequencinghttp://en.wikipedia.org/wiki/Shotgun_sequencinghttp://en.wikipedia.org/wiki/Sequence_alignment
  • 7/31/2019 Bioinformatics 1

    11/24

    sequenced today, and genome assembly algorithms are a

    critical area of bioinformatics research.

    Another aspect of bioinformatics in sequence analysis is

    annotation. This involves computationalgene findingto search

    for protein-coding genes, RNA genes, and other functional

    sequences within a genome. Not all of the nucleotides within a

    genome are part of genes. Within the genomes of higher

    organisms, large parts of the DNA do not serve any obvious

    purpose. This so-calledjunk DNAmay, however, contain

    unrecognized functional elements. Bioinformatics helps to

    bridge the gap between genome andproteomeprojects for

    example, in the use of DNA sequences for protein identification.

    http://en.wikipedia.org/wiki/Gene_findinghttp://en.wikipedia.org/wiki/Gene_findinghttp://en.wikipedia.org/wiki/Gene_findinghttp://en.wikipedia.org/wiki/Junk_DNAhttp://en.wikipedia.org/wiki/Junk_DNAhttp://en.wikipedia.org/wiki/Junk_DNAhttp://en.wikipedia.org/wiki/Proteomehttp://en.wikipedia.org/wiki/Proteomehttp://en.wikipedia.org/wiki/Proteomehttp://en.wikipedia.org/wiki/Proteomehttp://en.wikipedia.org/wiki/Junk_DNAhttp://en.wikipedia.org/wiki/Gene_finding
  • 7/31/2019 Bioinformatics 1

    12/24

    Genome annotation

    Main article:Gene finding

    In the context ofgenomics,annotationis the process of

    marking the genes and other biological features in a DNA

    sequence. The first genome annotation software system was

    designed in 1995 by Dr. Owen White, who was part of the team

    atThe Institute for Genomic Researchthat sequenced and

    analyzed the first genome of a free-living organism to be

    decoded, the bacteriumHaemophilus influenzae. Dr. White

    built a software system to find the genes (places in the DNA

    sequence that encode a protein), the transfer RNA, and other

    features, and to make initial assignments of function to those

    genes. Most current genome annotation systems work similarly,

    but the programs available for analysis of genomic DNA are

    constantly changing and improving.

    http://en.wikipedia.org/wiki/Gene_findinghttp://en.wikipedia.org/wiki/Gene_findinghttp://en.wikipedia.org/wiki/Gene_findinghttp://en.wikipedia.org/wiki/Genomicshttp://en.wikipedia.org/wiki/Genomicshttp://en.wikipedia.org/wiki/Genome_project#Genome_annotationhttp://en.wikipedia.org/wiki/Genome_project#Genome_annotationhttp://en.wikipedia.org/wiki/Genome_project#Genome_annotationhttp://en.wikipedia.org/wiki/The_Institute_for_Genomic_Researchhttp://en.wikipedia.org/wiki/The_Institute_for_Genomic_Researchhttp://en.wikipedia.org/wiki/The_Institute_for_Genomic_Researchhttp://en.wikipedia.org/wiki/Haemophilus_influenzaehttp://en.wikipedia.org/wiki/Haemophilus_influenzaehttp://en.wikipedia.org/wiki/Haemophilus_influenzaehttp://en.wikipedia.org/wiki/Haemophilus_influenzaehttp://en.wikipedia.org/wiki/The_Institute_for_Genomic_Researchhttp://en.wikipedia.org/wiki/Genome_project#Genome_annotationhttp://en.wikipedia.org/wiki/Genomicshttp://en.wikipedia.org/wiki/Gene_finding
  • 7/31/2019 Bioinformatics 1

    13/24

    Analysis of gene expression

    Theexpressionof many genes can be determined by

    measuringmRNAlevels with multiple techniques

    includingmicroarrays,expressed cDNA sequence tag(EST)

    sequencing,serial analysis of gene expression(SAGE) tag

    sequencing,massively parallel signature

    sequencing(MPSS),RNA-Seq, also known as "Whole

    Transcriptome Shotgun Sequencing" (WTSS), or various

    applications of multiplexed in-situ hybridization. All of these

    techniques are extremely noise-prone and/or subject to bias in

    the biological measurement, and a major research area in

    computational biology involves developing statistical tools to

    separatesignalfromnoisein high-throughput gene expression

    studies. Such studies are often used to determine the genes

    implicated in a disorder: one might compare microarray data

    from cancerousepithelialcells to data from non-cancerous cells

    to determine the transcripts that are up-regulated and down-

    regulated in a particular population of cancer cells.

    http://en.wikipedia.org/wiki/Gene_expressionhttp://en.wikipedia.org/wiki/Gene_expressionhttp://en.wikipedia.org/wiki/Gene_expressionhttp://en.wikipedia.org/wiki/Messenger_RNAhttp://en.wikipedia.org/wiki/Messenger_RNAhttp://en.wikipedia.org/wiki/Messenger_RNAhttp://en.wikipedia.org/wiki/DNA_microarrayhttp://en.wikipedia.org/wiki/DNA_microarrayhttp://en.wikipedia.org/wiki/DNA_microarrayhttp://en.wikipedia.org/wiki/Expressed_sequence_taghttp://en.wikipedia.org/wiki/Expressed_sequence_taghttp://en.wikipedia.org/wiki/Expressed_sequence_taghttp://en.wikipedia.org/wiki/Serial_analysis_of_gene_expressionhttp://en.wikipedia.org/wiki/Serial_analysis_of_gene_expressionhttp://en.wikipedia.org/wiki/Serial_analysis_of_gene_expressionhttp://en.wikipedia.org/wiki/Massively_parallel_signature_sequencinghttp://en.wikipedia.org/wiki/Massively_parallel_signature_sequencinghttp://en.wikipedia.org/wiki/Massively_parallel_signature_sequencinghttp://en.wikipedia.org/wiki/Massively_parallel_signature_sequencinghttp://en.wikipedia.org/wiki/RNA-Seqhttp://en.wikipedia.org/wiki/RNA-Seqhttp://en.wikipedia.org/wiki/RNA-Seqhttp://en.wikipedia.org/wiki/Signal_(information_theory)http://en.wikipedia.org/wiki/Signal_(information_theory)http://en.wikipedia.org/wiki/Signal_(information_theory)http://en.wikipedia.org/wiki/Noisehttp://en.wikipedia.org/wiki/Noisehttp://en.wikipedia.org/wiki/Noisehttp://en.wikipedia.org/wiki/Epithelialhttp://en.wikipedia.org/wiki/Epithelialhttp://en.wikipedia.org/wiki/Epithelialhttp://en.wikipedia.org/wiki/Epithelialhttp://en.wikipedia.org/wiki/Noisehttp://en.wikipedia.org/wiki/Signal_(information_theory)http://en.wikipedia.org/wiki/RNA-Seqhttp://en.wikipedia.org/wiki/Massively_parallel_signature_sequencinghttp://en.wikipedia.org/wiki/Massively_parallel_signature_sequencinghttp://en.wikipedia.org/wiki/Serial_analysis_of_gene_expressionhttp://en.wikipedia.org/wiki/Expressed_sequence_taghttp://en.wikipedia.org/wiki/DNA_microarrayhttp://en.wikipedia.org/wiki/Messenger_RNAhttp://en.wikipedia.org/wiki/Gene_expression
  • 7/31/2019 Bioinformatics 1

    14/24

  • 7/31/2019 Bioinformatics 1

    15/24

    Analysis of protein expression

    Protein microarraysand high throughput (HT)mass

    spectrometry(MS) can provide a snapshot of the proteins

    present in a biological sample. Bioinformatics is very much

    involved in making sense of protein microarray and HT MS

    data; the former approach faces similar problems as with

    microarrays targeted at mRNA, the latter involves the problem

    of matching large amounts of mass data against predicted

    masses from protein sequence databases, and the complicated

    statistical analysis of samples where multiple, but incomplete

    peptides from each protein are detected.

    http://en.wikipedia.org/wiki/Protein_microarrayhttp://en.wikipedia.org/wiki/Protein_microarrayhttp://en.wikipedia.org/wiki/Mass_spectrometryhttp://en.wikipedia.org/wiki/Mass_spectrometryhttp://en.wikipedia.org/wiki/Mass_spectrometryhttp://en.wikipedia.org/wiki/Mass_spectrometryhttp://en.wikipedia.org/wiki/Mass_spectrometryhttp://en.wikipedia.org/wiki/Mass_spectrometryhttp://en.wikipedia.org/wiki/Protein_microarray
  • 7/31/2019 Bioinformatics 1

    16/24

    Comparative genomics

    Main article:Comparative genomics

    The core of comparative genome analysis is the establishment

    of the correspondence betweengenes(orthologyanalysis) or

    other genomic features in different organisms. It is these

    intergenomic maps that make it possible to trace the

    evolutionary processes responsible for the divergence of two

    genomes. A multitude of evolutionary events acting at various

    organizational levels shape genome evolution. At the lowest

    level, point mutations affect individual nucleotides. At a higher

    level, large chromosomal segments undergo duplication, lateral

    transfer, inversion, transposition, deletion and insertion.

    Ultimately, whole genomes are involved in processes of

    hybridization, polyploidization andendosymbiosis, often leading

    to rapid speciation. The complexity of genome evolution poses

    many exciting challenges to developers of mathematical

    http://en.wikipedia.org/wiki/Comparative_genomicshttp://en.wikipedia.org/wiki/Comparative_genomicshttp://en.wikipedia.org/wiki/Comparative_genomicshttp://en.wikipedia.org/wiki/Geneshttp://en.wikipedia.org/wiki/Geneshttp://en.wikipedia.org/wiki/Geneshttp://en.wikipedia.org/wiki/Orthologyhttp://en.wikipedia.org/wiki/Orthologyhttp://en.wikipedia.org/wiki/Orthologyhttp://en.wikipedia.org/wiki/Endosymbiosishttp://en.wikipedia.org/wiki/Endosymbiosishttp://en.wikipedia.org/wiki/Endosymbiosishttp://en.wikipedia.org/wiki/Orthologyhttp://en.wikipedia.org/wiki/Geneshttp://en.wikipedia.org/wiki/Comparative_genomics
  • 7/31/2019 Bioinformatics 1

    17/24

    models and algorithms, who have recourse to a spectra of

    algorithmic, statistical and mathematical techniques, ranging

    from exact,heuristics, fixed parameter andapproximation

    algorithmsfor problems based on parsimony models toMarkov

    Chain Monte Carloalgorithms forBayesian analysisof

    problems based on probabilistic models.

    Modeling biological systems

    Main article:Computational systems biology

    Systems biology involves the use ofcomputer

    simulationsofcellularsubsystems (such as thenetworks of

    metabolitesandenzymeswhich comprisemetabolism,signal

    transductionpathways andgene regulatory networks) to both

    analyze and visualize the complex connections of these cellular

    processes.Artificial life or virtual evolution attempts to

    http://en.wikipedia.org/wiki/Heuristicshttp://en.wikipedia.org/wiki/Heuristicshttp://en.wikipedia.org/wiki/Heuristicshttp://en.wikipedia.org/wiki/Approximation_algorithmshttp://en.wikipedia.org/wiki/Approximation_algorithmshttp://en.wikipedia.org/wiki/Approximation_algorithmshttp://en.wikipedia.org/wiki/Approximation_algorithmshttp://en.wikipedia.org/wiki/Markov_Chain_Monte_Carlohttp://en.wikipedia.org/wiki/Markov_Chain_Monte_Carlohttp://en.wikipedia.org/wiki/Markov_Chain_Monte_Carlohttp://en.wikipedia.org/wiki/Markov_Chain_Monte_Carlohttp://en.wikipedia.org/wiki/Bayesian_analysishttp://en.wikipedia.org/wiki/Bayesian_analysishttp://en.wikipedia.org/wiki/Bayesian_analysishttp://en.wikipedia.org/wiki/Computational_systems_biologyhttp://en.wikipedia.org/wiki/Computational_systems_biologyhttp://en.wikipedia.org/wiki/Computational_systems_biologyhttp://en.wikipedia.org/wiki/Computer_simulationhttp://en.wikipedia.org/wiki/Computer_simulationhttp://en.wikipedia.org/wiki/Computer_simulationhttp://en.wikipedia.org/wiki/Computer_simulationhttp://en.wikipedia.org/wiki/Cell_(biology)http://en.wikipedia.org/wiki/Cell_(biology)http://en.wikipedia.org/wiki/Cell_(biology)http://en.wikipedia.org/wiki/Metabolic_networkhttp://en.wikipedia.org/wiki/Metabolic_networkhttp://en.wikipedia.org/wiki/Metabolic_networkhttp://en.wikipedia.org/wiki/Metabolic_networkhttp://en.wikipedia.org/wiki/Enzymehttp://en.wikipedia.org/wiki/Enzymehttp://en.wikipedia.org/wiki/Enzymehttp://en.wikipedia.org/wiki/Metabolismhttp://en.wikipedia.org/wiki/Metabolismhttp://en.wikipedia.org/wiki/Metabolismhttp://en.wikipedia.org/wiki/Signal_transductionhttp://en.wikipedia.org/wiki/Signal_transductionhttp://en.wikipedia.org/wiki/Signal_transductionhttp://en.wikipedia.org/wiki/Signal_transductionhttp://en.wikipedia.org/wiki/Gene_regulatory_networkhttp://en.wikipedia.org/wiki/Gene_regulatory_networkhttp://en.wikipedia.org/wiki/Artificial_lifehttp://en.wikipedia.org/wiki/Artificial_lifehttp://en.wikipedia.org/wiki/Artificial_lifehttp://en.wikipedia.org/wiki/Gene_regulatory_networkhttp://en.wikipedia.org/wiki/Signal_transductionhttp://en.wikipedia.org/wiki/Signal_transductionhttp://en.wikipedia.org/wiki/Metabolismhttp://en.wikipedia.org/wiki/Enzymehttp://en.wikipedia.org/wiki/Metabolic_networkhttp://en.wikipedia.org/wiki/Metabolic_networkhttp://en.wikipedia.org/wiki/Cell_(biology)http://en.wikipedia.org/wiki/Computer_simulationhttp://en.wikipedia.org/wiki/Computer_simulationhttp://en.wikipedia.org/wiki/Computational_systems_biologyhttp://en.wikipedia.org/wiki/Bayesian_analysishttp://en.wikipedia.org/wiki/Markov_Chain_Monte_Carlohttp://en.wikipedia.org/wiki/Markov_Chain_Monte_Carlohttp://en.wikipedia.org/wiki/Approximation_algorithmshttp://en.wikipedia.org/wiki/Approximation_algorithmshttp://en.wikipedia.org/wiki/Heuristics
  • 7/31/2019 Bioinformatics 1

    18/24

    understand evolutionary processes via the computer simulation

    of simple (artificial) life forms.

    Prediction of protein structure

    Main article:Protein structure prediction

    Protein structure prediction is another important application of

    bioinformatics. Theamino acidsequence of a protein, the so-

    calledprimary structure, can be easily determined from the

    sequence on the gene that codes for it. In the vast majority of

    cases, this primary structure uniquely determines a structure in

    its native environment. (Of course, there are exceptions, such

    as thebovine spongiform encephalopathy a.k.a.Mad Cow

    Diseaseprion.) Knowledge of this structure is vital in

    http://en.wikipedia.org/wiki/Protein_structure_predictionhttp://en.wikipedia.org/wiki/Protein_structure_predictionhttp://en.wikipedia.org/wiki/Protein_structure_predictionhttp://en.wikipedia.org/wiki/Amino_acidhttp://en.wikipedia.org/wiki/Amino_acidhttp://en.wikipedia.org/wiki/Amino_acidhttp://en.wikipedia.org/wiki/Primary_structurehttp://en.wikipedia.org/wiki/Primary_structurehttp://en.wikipedia.org/wiki/Primary_structurehttp://en.wikipedia.org/wiki/Bovine_spongiform_encephalopathyhttp://en.wikipedia.org/wiki/Bovine_spongiform_encephalopathyhttp://en.wikipedia.org/wiki/Bovine_spongiform_encephalopathyhttp://en.wikipedia.org/wiki/Mad_Cow_Diseasehttp://en.wikipedia.org/wiki/Mad_Cow_Diseasehttp://en.wikipedia.org/wiki/Mad_Cow_Diseasehttp://en.wikipedia.org/wiki/Mad_Cow_Diseasehttp://en.wikipedia.org/wiki/Prionhttp://en.wikipedia.org/wiki/Prionhttp://en.wikipedia.org/wiki/Prionhttp://en.wikipedia.org/wiki/Prionhttp://en.wikipedia.org/wiki/Mad_Cow_Diseasehttp://en.wikipedia.org/wiki/Mad_Cow_Diseasehttp://en.wikipedia.org/wiki/Bovine_spongiform_encephalopathyhttp://en.wikipedia.org/wiki/Primary_structurehttp://en.wikipedia.org/wiki/Amino_acidhttp://en.wikipedia.org/wiki/Protein_structure_prediction
  • 7/31/2019 Bioinformatics 1

    19/24

    understanding the function of the protein. For lack of better

    terms, structural information is usually classified as one

    ofsecondary,tertiaryandquaternarystructure. A viable

    general solution to such predictions remains an open problem.

    As of now, most efforts have been directed towards heuristics

    that work most of the time.

    One of the key ideas in bioinformatics is the notion

    ofhomology. In the genomic branch of bioinformatics,

    homology is used to predict the function of a gene: if the

    sequence of geneA, whose function is known, is homologous to

    the sequence of gene B, whose function is unknown, one could

    infer that B may share A's function. In the structural branch of

    bioinformatics, homology is used to determine which parts of a

    protein are important in structure formation and interaction with

    other proteins. In a technique called homology modeling, this

    information is used to predict the structure of a protein once the

    structure of a homologous protein is known. This currently

    remains the only way to predict protein structures reliably.

    http://en.wikipedia.org/wiki/Secondary_structurehttp://en.wikipedia.org/wiki/Secondary_structurehttp://en.wikipedia.org/wiki/Secondary_structurehttp://en.wikipedia.org/wiki/Tertiary_structurehttp://en.wikipedia.org/wiki/Tertiary_structurehttp://en.wikipedia.org/wiki/Tertiary_structurehttp://en.wikipedia.org/wiki/Quaternary_structurehttp://en.wikipedia.org/wiki/Quaternary_structurehttp://en.wikipedia.org/wiki/Quaternary_structurehttp://en.wikipedia.org/wiki/Homology_(biology)http://en.wikipedia.org/wiki/Homology_(biology)http://en.wikipedia.org/wiki/Homology_(biology)http://en.wikipedia.org/wiki/Homology_(biology)http://en.wikipedia.org/wiki/Quaternary_structurehttp://en.wikipedia.org/wiki/Tertiary_structurehttp://en.wikipedia.org/wiki/Secondary_structure
  • 7/31/2019 Bioinformatics 1

    20/24

    One example of this is the similar protein homology between

    hemoglobin in humans and the hemoglobin in legumes

    (leghemoglobin). Both serve the same purpose of transporting

    oxygen in the organism. Though both of these proteins have

    completely different amino acid sequences, their protein

    structures are virtually identical, which reflects their near

    identical purposes.

    Other techniques for predicting protein structure include protein

    threading and de novo(from scratch) physics-based modeling.

    http://en.wikipedia.org/wiki/Leghemoglobinhttp://en.wikipedia.org/wiki/Leghemoglobinhttp://en.wikipedia.org/wiki/Leghemoglobin
  • 7/31/2019 Bioinformatics 1

    21/24

    Web services in bioinformatics

    SOAPandREST-based interfaces have been developed for a

    wide variety of bioinformatics applications allowing an

    application running on one computer in one part of the world to

    use algorithms, data and computing resources on servers in

    other parts of the world. The main advantages derive from the

    fact that end users do not have to deal with software and

    database maintenance overheads.

    Basic bioinformatics services are classified by theEBIinto

    three categories:SSS(Sequence Search

    Services),MSA(Multiple Sequence Alignment)

    http://en.wikipedia.org/wiki/SOAPhttp://en.wikipedia.org/wiki/SOAPhttp://en.wikipedia.org/wiki/RESThttp://en.wikipedia.org/wiki/RESThttp://en.wikipedia.org/wiki/European_Bioinformatics_Institutehttp://en.wikipedia.org/wiki/European_Bioinformatics_Institutehttp://en.wikipedia.org/wiki/European_Bioinformatics_Institutehttp://en.wikipedia.org/wiki/Sequence_alignment_softwarehttp://en.wikipedia.org/wiki/Sequence_alignment_softwarehttp://en.wikipedia.org/wiki/Sequence_alignment_softwarehttp://en.wikipedia.org/wiki/Multiple_sequence_alignmenthttp://en.wikipedia.org/wiki/Multiple_sequence_alignmenthttp://en.wikipedia.org/wiki/Multiple_sequence_alignmenthttp://en.wikipedia.org/wiki/Multiple_sequence_alignmenthttp://en.wikipedia.org/wiki/Sequence_alignment_softwarehttp://en.wikipedia.org/wiki/European_Bioinformatics_Institutehttp://en.wikipedia.org/wiki/RESThttp://en.wikipedia.org/wiki/SOAP
  • 7/31/2019 Bioinformatics 1

    22/24

    andBSA(Biological Sequence Analysis). The availability of

    theseservice-orientedbioinformatics resources demonstrate

    the applicability of web based bioinformatics solutions, and

    range from a collection of standalone tools with a common data

    format under a single, standalone or web-based interface, to

    integrative, distributed and extensiblebioinformatics workflow

    management systems.

    http://en.wikipedia.org/wiki/Bioinformatics#Sequence_analysishttp://en.wikipedia.org/wiki/Bioinformatics#Sequence_analysishttp://en.wikipedia.org/wiki/Bioinformatics#Sequence_analysishttp://en.wikipedia.org/wiki/Service-orientationhttp://en.wikipedia.org/wiki/Service-orientationhttp://en.wikipedia.org/wiki/Service-orientationhttp://en.wikipedia.org/wiki/Bioinformatics_workflow_management_systemshttp://en.wikipedia.org/wiki/Bioinformatics_workflow_management_systemshttp://en.wikipedia.org/wiki/Bioinformatics_workflow_management_systemshttp://en.wikipedia.org/wiki/Bioinformatics_workflow_management_systemshttp://en.wikipedia.org/wiki/Bioinformatics_workflow_management_systemshttp://en.wikipedia.org/wiki/Bioinformatics_workflow_management_systemshttp://en.wikipedia.org/wiki/Service-orientationhttp://en.wikipedia.org/wiki/Bioinformatics#Sequence_analysis
  • 7/31/2019 Bioinformatics 1

    23/24

    High-throughput image analysis

    Computational technologies are used to accelerate or fully

    automate the processing, quantification and analysis of large

    amounts of high-information-contentbiomedical imagery.

    Modern image analysis systems augment an observer's ability

    to make measurements from a large or complex set of images,

    by improvingaccuracy,objectivity, or speed. A fully developed

    analysis system may completely replace the observer. Although

    these systems are not unique to biomedical imagery,

    http://en.wikipedia.org/w/index.php?title=Biomedical_imagery&action=edit&redlink=1http://en.wikipedia.org/w/index.php?title=Biomedical_imagery&action=edit&redlink=1http://en.wikipedia.org/w/index.php?title=Biomedical_imagery&action=edit&redlink=1http://en.wikipedia.org/wiki/Accuracyhttp://en.wikipedia.org/wiki/Accuracyhttp://en.wikipedia.org/wiki/Accuracyhttp://en.wikipedia.org/wiki/Objectivity_(science)http://en.wikipedia.org/wiki/Objectivity_(science)http://en.wikipedia.org/wiki/Objectivity_(science)http://en.wikipedia.org/wiki/Objectivity_(science)http://en.wikipedia.org/wiki/Accuracyhttp://en.wikipedia.org/w/index.php?title=Biomedical_imagery&action=edit&redlink=1
  • 7/31/2019 Bioinformatics 1

    24/24

    biomedical imaging is becoming more important for

    bothdiagnosticsand research. Some examples are:

    high-throughput and high-fidelity quantification and sub-

    cellular localization (high-content screening,

    cytohistopathology,Bioimage informatics)

    morphometrics

    clinical image analysis and visualization

    determining the real-time air-flow patterns in breathing lungs

    of living animals

    quantifying occlusion size in real-time imagery from the

    development of and recovery during arterial injury

    making behavioral observations from extended video

    recordings of laboratory animals

    infrared measurements for metabolic activity determination

    inferring clone overlaps inDNA mapping, e.g. the Sulston

    score

    http://en.wikipedia.org/wiki/Diagnosticshttp://en.wikipedia.org/wiki/Diagnosticshttp://en.wikipedia.org/wiki/Diagnosticshttp://en.wikipedia.org/wiki/High-content_screeninghttp://en.wikipedia.org/wiki/High-content_screeninghttp://en.wikipedia.org/wiki/Bioimage_informaticshttp://en.wikipedia.org/wiki/Bioimage_informaticshttp://en.wikipedia.org/wiki/Bioimage_informaticshttp://en.wikipedia.org/wiki/Morphometricshttp://en.wikipedia.org/wiki/Morphometricshttp://en.wikipedia.org/wiki/Gene_mappinghttp://en.wikipedia.org/wiki/Gene_mappinghttp://en.wikipedia.org/wiki/Gene_mappinghttp://en.wikipedia.org/wiki/Sulston_scorehttp://en.wikipedia.org/wiki/Sulston_scorehttp://en.wikipedia.org/wiki/Sulston_scorehttp://en.wikipedia.org/wiki/Sulston_scorehttp://en.wikipedia.org/wiki/Sulston_scorehttp://en.wikipedia.org/wiki/Gene_mappinghttp://en.wikipedia.org/wiki/Morphometricshttp://en.wikipedia.org/wiki/Bioimage_informaticshttp://en.wikipedia.org/wiki/High-content_screeninghttp://en.wikipedia.org/wiki/Diagnostics