NCBI Bioinformatics Workshop
Rabat, Morocco 2012
What is Bioinformatics?
Bioinformatics is the application of information technology to the field of molecular biology. The term bioinformatics was coined by Paulien Hogeweg in 1979 for the study of informatic processes in biotic systems. Its primary use since at least the late 1980s has been in genomics and genetics, particularly in those areas of genomics involving large-scale DNA sequencing. Bioinformatics now entails the creation and advancement of databases, algorithms, computational and statistical techniques, and theory to solve formal and practical problems arising from the management and analysis of biological data.
Wikipedia
What is NCBI?
• Create automated systems for knowledge about molecular biology, biochemistry, and genetics.
• Perform research into advanced methods of analyzing and interpreting molecular biology data.
• Enable biotechnology researchers and medical care personnel to use the systems and methods developed.
On November 4, 1988, President Ronald Reagan signed the Health Omnibus Extension Act, creating the National Center for Biotechnology Information (NCBI) as part of the National Library of Medicine at NIH.
History of molecular biology
1860 Genetics: Gregor Mendel discovered that genes determine the characteristics of an organism and that genes are passed to children from both parents.
1943 Molecular biology: James Watson discovered that the DNA molecule might store the genes.
1962 Nobel Prize: James Watson, Francis Crick, Wilkins (Rosalind Franklin).
1970 Central Dogma: first announced in 1958 and restated by Francis Crick in Nature.
Central Dogma of molecular biology
The central dogma of molecular biology was first enunciated by Francis Crick in 1958 and restated in a Nature paper published in 1970. The general transfers describe the normal flow of biological information: DNA can be copied to DNA (DNA replication), DNA information can be copied into mRNA (transcription), and proteins can be synthesized using the information in mRNA as a template (translation).
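The three general transfers can be illustrated with a minimal Python sketch. The function names and the example sequence are illustrative assumptions, and the codon table is deliberately partial (only the codons the example needs), so this is a toy model rather than a complete implementation of the genetic code.

```python
# Toy model of the information flow DNA -> DNA, DNA -> mRNA, mRNA -> protein.
# All names here are illustrative; the codon table is a small subset of the
# standard genetic code.

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

CODON_TABLE = {
    "AUG": "M",                          # start codon (methionine)
    "GCC": "A",                          # alanine
    "AAA": "K",                          # lysine
    "UGG": "W",                          # tryptophan
    "UAA": "*", "UAG": "*", "UGA": "*",  # stop codons
}


def replicate(strand):
    """DNA replication: return the reverse-complement strand."""
    return "".join(COMPLEMENT[base] for base in reversed(strand))


def transcribe(coding_strand):
    """Transcription: the mRNA reads like the coding strand with U for T."""
    return coding_strand.replace("T", "U")


def translate(mrna):
    """Translation: read codons from the first AUG until a stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        # "X" marks a codon that is missing from the partial table.
        amino_acid = CODON_TABLE.get(mrna[i:i + 3], "X")
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


dna = "ATGGCCAAATGGTAA"
mrna = transcribe(dna)
print(mrna)             # AUGGCCAAAUGGUAA
print(translate(mrna))  # MAKW
print(replicate("ATGC"))  # GCAT
```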
Does the central dogma still stand?
Koonin EV. Biol Direct. 2012 Aug 23;7(1):27. [Epub ahead of print]
History of biotechnology
1590 The microscope is invented by Janssen.
1675 Leeuwenhoek discovers protozoa and bacteria.
1855 The Escherichia coli bacterium is discovered (a major research and production tool for biotechnology).
1879 Flemming discovers chromatin, rod-like structures in the cell nucleus, later called ‘chromosomes’.
1942 The electron microscope is used to identify and characterize a bacteriophage, a virus that infects bacteria.
1953 Watson and Crick reveal the three-dimensional structure of DNA.
1973 Cohen and Boyer perform the first successful recombinant DNA experiment, using bacterial genes.
1983 The Polymerase Chain Reaction (PCR) technique is developed.
1995 The first bacterial genome is sequenced by whole genome shotgun technology.
2001 The sequence of the human genome is published in Science and Nature, making it possible for researchers all over the world to begin developing treatments.
2005 Next Generation Sequencing: Illumina, MiSeq, Ion Torrent, PacBio.
History of Bioinformatics
Sequence databases
1960 – Margaret Dayhoff collected sequences in a database that later became PIR
1974 – GenBank; 1980 – EMBL (ENA); 1984 – DDBJ; 1984 – SwissProt
Sequence comparison
1970 – Needleman-Wunsch global pairwise alignment
1972 – Smith-Waterman local alignment
1973 – multiple alignment
Database searches by sequence similarity
1988 – FASTA by Pearson and Lipman
1990 – BLAST by Altschul, Gish, Lipman
Text search and retrieval system
1990 – Entrez designed by Lipman and Benson
Algorithms
Gene prediction
Protein structure
Hidden Markov Model
Clustering
Trees
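As an illustration of the global pairwise alignment named in the timeline, here is a minimal Python sketch of Needleman-Wunsch dynamic programming. The scoring values (match +1, mismatch -1, linear gap -1) and the function name are assumptions chosen for the example, not values taken from the slides. When several alignments tie for the best score, the traceback returns just one of them.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Return (score, aligned_a, aligned_b) for a global alignment of a and b."""
    rows, cols = len(a) + 1, len(b) + 1

    # score[i][j] = best score for aligning a[:i] with b[:j].
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    def substitution(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill the matrix: each cell takes the best of diagonal, up, and left moves.
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1] + substitution(i, j),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    aligned_a, aligned_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + substitution(i, j):
            aligned_a.append(a[i - 1])
            aligned_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            aligned_a.append(a[i - 1])
            aligned_b.append("-")
            i -= 1
        else:
            aligned_a.append("-")
            aligned_b.append(b[j - 1])
            j -= 1

    return score[-1][-1], "".join(reversed(aligned_a)), "".join(reversed(aligned_b))


print(needleman_wunsch("ACGT", "AGT"))  # (2, 'ACGT', 'A-GT')
```

A local (Smith-Waterman style) variant differs mainly in clamping cell scores at zero and starting the traceback from the highest-scoring cell instead of the corner.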
Diagram labels: Hypothesis, Data management, Model, Experiment, Data, Validation, Visualization, Analysis, Interpretation
Problem Solving
For every complex problem, there is an answer that is clear, simple, and wrong… - H. L. Mencken
ROC curve analysis
Receiver Operating Characteristic (ROC) curve analysis (Metz, 1978; Zweig & Campbell, 1993)
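To make ROC analysis concrete, the following Python sketch builds the curve from scored predictions and computes the area under it with the trapezoidal rule. The labels and scores are made-up illustration data, the function names are assumptions, and the code assumes at least one positive and one negative example.

```python
def roc_points(labels, scores):
    """Return (false positive rate, true positive rate) points.

    labels: 1 for a true positive case, 0 for a true negative case.
    scores: classifier scores, higher meaning "more likely positive".
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])

    points = [(0.0, 0.0)]
    tp = fp = 0
    for idx, (score, label) in enumerate(ranked):
        if label:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last item of a group of tied scores.
        if idx == len(ranked) - 1 or ranked[idx + 1][0] != score:
            points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Made-up example: six cases ranked by score, three positives, three negatives.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
print(round(auc(roc_points(labels, scores)), 3))  # 0.889
```

An AUC of 1.0 corresponds to a ranking that places every positive above every negative, while 0.5 corresponds to a random ranking.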
Challenges in Computational Biology
Protein structure prediction
Homology searches
Multiple alignments and phylogenetic tree
Genome assembly and annotation
Challenging issues in Bioinformatics
• Data management
– Processing, storage
– Accuracy (high-throughput, low quality)
– Search and retrieval
– Presentation
• Data analysis
– Algorithms
– Statistical techniques
• Simulation
– Modeling and prediction
– Parameter estimation
– Prediction accuracy
NCBI mission: discovery initiative
NCBI Analysis
Search
Visualization
Validation
What is GenBank? NCBI’s Primary Sequence Database
• Nucleotide-only sequence database
• Archival in nature
– Historical
– Reflective of submitter point of view (subjective)
– Redundant
• GenBank Data
– Direct submissions (traditional records)
– Batch submissions (EST, GSS, STS)
– ftp accounts (genome data)
• Three collaborating databases
– GenBank
– DNA Database of Japan (DDBJ)
– European Molecular Biology Laboratory (EMBL) Database
Sequence Databases
Figure: sequence reads from Labs and Sequencing Centers flow into GenBank (updated ONLY by submitters); Algorithms (UniGene), Curators (RefSeq), and Genome Assembly build on those records (updated continually by NCBI).
Next Generation Sequencing
NGS produces a lot of data
Information retrieval
NCBI Discovery initiative
Entrez Search and retrieval system
"From a computer in the comfort of your own home or from one in your neighborhood library, you will be able to access timely and accurate information. Already 30,000 people a day are using MEDLINE. By making it more accessible -- free and private -- we can increase that number many times over."
Vice President Gore 1997
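Entrez can also be queried programmatically through NCBI's E-utilities. The sketch below builds an ESearch request URL and, as a separate step, fetches the matching record IDs. The helper names, the search term, and the `retmax` value are assumptions made for the example; only the URL-building step is run here, since `search_ids` needs network access.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# ESearch endpoint of the NCBI E-utilities.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(term, db="pubmed", retmax=5):
    """Build an ESearch URL for a text query against one Entrez database."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return EUTILS_ESEARCH + "?" + urlencode(params)


def search_ids(term, db="pubmed", retmax=5):
    """Run the search over the network and return the list of record IDs."""
    with urlopen(build_esearch_url(term, db, retmax)) as response:
        result = json.load(response)
    return result["esearchresult"]["idlist"]


print(build_esearch_url("central dogma"))
# https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=central+dogma&retmax=5&retmode=json
```

Calling `search_ids("central dogma")` would return up to five PubMed IDs; for anything beyond occasional queries, NCBI's usage guidelines on request rates and API keys apply.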
Improve information retrieval
Add links and filters
Related information
Rescuing Zero-Result PubMed Searches
Chart (2008 and 2011): searches shown as unassisted, zero-result rescued by spelling, gene sensor, citation sensor/Hydra, and auto-complete (16% of all PubMed searches); improvement figures shown: 19% and 37%.
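The idea of rescuing a zero-result search by spelling can be sketched with fuzzy matching against a list of known terms. This is only an illustration using Python's standard `difflib`, not the method PubMed actually uses; the vocabulary, the cutoff, and the function name are all assumptions.

```python
import difflib


def rescue_query(query, vocabulary, cutoff=0.8):
    """Replace each unknown query term with its closest known term, if any."""
    corrected = []
    for term in query.lower().split():
        if term in vocabulary:
            corrected.append(term)
            continue
        # Closest vocabulary entry whose similarity ratio reaches the cutoff.
        matches = difflib.get_close_matches(term, vocabulary, n=1, cutoff=cutoff)
        corrected.append(matches[0] if matches else term)
    return " ".join(corrected)


# Hypothetical vocabulary standing in for the terms indexed by a search system.
vocabulary = ["protein", "sequence", "alignment", "genome"]
print(rescue_query("protien sequnce alignment", vocabulary))
# protein sequence alignment
```

Terms with no sufficiently close match are left unchanged, so a query that cannot be rescued is passed through as typed.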
Sequence analysis
Visualization
NCBI Bioinformatics Workshop 2009
NCBI Bioinformatics Workshop 2011