Bioinformatics: Applying the Concept of Information in Biology


Bioinformatics: Applying the Concept of Information in Biology

The theory of evolution is the conceptual framework of biology and medicine, and bioinformatics is the tool used to analyze and quantify evolutionary relationships at every level of investigation: molecular, physiological, or ecological.

Diagrammatic view of metabolic pathways showing the major functional interactions between the synthesis and degradation of nutrients from different food groups

(from KEGG; http://www.genome.ad.jp:80/kegg/metabolism.html)

Ellis, R.J., Macromolecular crowding, 2001, TIBS 26:597

Macromolecular crowding in bacterial cytoplasm

Hoppert & Mayer, 1999, Prokaryotes. Am. Sci. 87:518

What makes a scientific discipline? A look at the history of biochemistry.

Knowing where we come from helps us understand where we are going. Novel ways of curing diseases and fighting off infections will include individualized prescription drug regimens, gene therapy, and the development of new generations of antibiotics. These changes are no less sweeping and broad than those brought to biology by the chemists and physicists of the 1920s, 30s, and 40s, who were attracted to the most obvious problem in biology at that time: the staggering lack of an atomistic understanding of genetics.

1926, 1930

First reports that proteins carry enzymatic activity (urease, pepsin) in cellular metabolism; an important step in demonstrating that proteins catalyze chemical reactions and are not merely structural components of cells

1934

First successful X-ray study of the globular protein pepsin by Bernal and Crowfoot; it does not show high-resolution detail, but it demonstrates a water-covered protein surface

1937

Citric acid cycle described by Hans Krebs; this is the central energy-yielding pathway in all organisms; the reactions of a complete biochemical pathway could be elucidated in the absence of any protein structure information (kinetic data represent the macroscopic behavior of enzymes)

1941

'One gene, one enzyme' hypothesis by Beadle and Tatum

1944

DNA is the carrier of genetic information in bacteria (Oswald Avery)

1945

First complete amino acid content of a protein is published (not its sequence, however)

1951

First complete amino acid sequence of a protein, the hormone insulin, published by Fred Sanger

Model proposed for the alpha helix and beta sheet, highlighting the importance of hydrogen bonds in protein structures (Pauling and Corey)

1953

DNA structure at atomic resolution by Crick, Watson, and Wilkins; they propose a model for DNA replication based on the structural information; the concept of structure-function relationship has been successfully used to solve a major problem in biology

1962

High-resolution structure of myoglobin at 2 Angstrom confirms for the first time the existence of alpha-helix structures in proteins (Perutz and Kendrew)

The structure of the enzyme lysozyme with a bound inhibitor molecule is solved at 2 Angstrom resolution, giving the first structural insight into enzyme-substrate interaction and Koshland's induced fit theory

1963

Genetic code solved; links DNA sequence to amino acid sequence in proteins (Holley, Khorana, Nirenberg)

Database structures

• Sequences

• Structures

• Pathways

• Analysis tools

• Prediction tools

• Functional categories & interactivity

• PubMed

http://www.genome.ad.jp:80/dbget/dbget.links.html

Integrated database retrieval system, GenomeNet, Japan

KEGG: Kyoto Encyclopedia of Genes and Genomes

http://www.genome.ad.jp:80/kegg/kegg2.html

Analysis, Prediction, Data Mining

• Similarity searches

• Structure prediction

• Gene prediction

• Pathway reconstruction

• Visualization and Modeling

• Pattern recognition

• Clustering

• Annotation

Analysis of sequence information

Principal component analysis of variability found in whole genome databases

Prediction of relationships among sequences

Clusters of Orthologous Groups (COG) at NCBI

Clusters of orthologous groups: sequences of individual proteins or protein families represented in at least 3 species (currently microorganisms only), and thus corresponding to an ancient conserved domain

Translation, Transcription, Signaling

Rickettsia prowazekii

Obligate intracellular parasite, the causative agent of epidemic typhus. The functional profiles of these genes show similarities to those of mitochondrial genes: no genes required for anaerobic glycolysis are found in either R. prowazekii or mitochondrial genomes, but a complete set of genes encoding components of the tricarboxylic acid cycle and the respiratory-chain complex is found in R. prowazekii. In effect, ATP production in Rickettsia is the same as that in mitochondria. Many genes involved in the biosynthesis and regulation of biosynthesis of amino acids and nucleosides in free-living bacteria are absent from R. prowazekii and mitochondria. Such genes seem to have been replaced by homologues in the nuclear (host) genome. (Nature 1998 Nov 12;396(6707):133-40)

Phylogenetic analyses indicate that R. prowazekii is more closely related to mitochondria than is any other microbe studied so far.

E. coli

KEGG – pathway maps

Glycolysis pathway map from KEGG

Rickettsia prowazekii; Escherichia coli K-12 MG1655

Rickettsia prowazekii

Escherichia coli

Homo sapiens

What kind of information can be obtained using the COG database?

1. Annotation of proteins. Known functions (and two- or three-dimensional structures) of one COG member can often be directly attributed to the other members of the COG. Caution must be used here, however, since some COGs contain paralogs whose function may not precisely correspond to that of the known protein.

2. Phylogenetic patterns. These show the presence or absence of proteins from a given organism in a specific COG. Applied systematically, such patterns can indicate whether a particular metabolic pathway exists in an organism.

3. Multiple alignments. Each COG page includes a link to a multiple alignment of COG members, which can be used to identify conserved sequence residues and analyze evolutionary relationships between member proteins.
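The phylogenetic patterns described in point 2 can be illustrated with a minimal Python sketch. The COG identifiers and memberships below are invented toy data, not taken from the COG database, and the function names are hypothetical. The sketch only shows the idea of reading presence or absence per organism and checking whether every step of a pathway is represented.

```python
# Toy data (hypothetical): COG identifier -> organisms that contribute a member protein.
# Neither the identifiers nor the memberships come from the real COG database.
cog_members = {
    "COG_A": {"E. coli", "R. prowazekii", "M. genitalium"},
    "COG_B": {"E. coli", "M. genitalium", "H. pylori"},
    "COG_C": {"E. coli", "R. prowazekii", "H. pylori"},
}

# Fixed organism order, so that every pattern string has the same column layout.
organisms = ["E. coli", "H. pylori", "M. genitalium", "R. prowazekii"]


def phylogenetic_pattern(cog_id):
    """Return a presence/absence string with one character per organism."""
    members = cog_members[cog_id]
    return "".join("+" if org in members else "-" for org in organisms)


def missing_steps(organism, pathway_cogs):
    """List the COGs of a pathway in which the organism has no member."""
    return [cog for cog in pathway_cogs if organism not in cog_members[cog]]


def pathway_complete(organism, pathway_cogs):
    """True if the organism is represented in every COG of the pathway."""
    return not missing_steps(organism, pathway_cogs)


if __name__ == "__main__":
    pathway = ["COG_A", "COG_B", "COG_C"]  # hypothetical three-step pathway
    for cog in pathway:
        print(cog, phylogenetic_pattern(cog))
    for org in organisms:
        print(org, pathway_complete(org, pathway), missing_steps(org, pathway))
```

A missing COG is only a hint that a pathway step is absent: an unrelated protein may perform the same function, so the result should be read as a hypothesis to check rather than a conclusion.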

Hierarchical cluster analysis of DNA microarrays

Eisen et al. (1998) PNAS 95:14863; http://rana.lbl.gov/EisenSoftware.htm

Hierarchical clustering and factor analysis of DNA microarrays

Factor analysis (and principal component analysis) identifies three independent factors (eigenvectors) accounting for 99.5% of the variability of the array data (6 arrays; three conditions; each condition repeated once). Factor one (F1) accounts for the variability in hybridization strength. Factor two (F2) accounts for gene-specific differences in hybridization strength that distinguish Va2 from both Vb5 and control (see diagram F2-F1). Factor three (F3) shows that there are general condition-specific differences that distinguish control from Vb5 and from Va2 but are highly reproducible when the experiment is repeated by labeling cDNA from the same RNA samples. The dendrogram obtained from hierarchical clustering of the six arrays shows the same relationship as that determined by the second factor (F2) from factor analysis.
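The hierarchical clustering step can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the analysis behind the slide. The expression values are invented, the distance is taken to be 1 minus the Pearson correlation, and average linkage is used as the merge rule. Only the array names (control, Va2, Vb5, each measured twice) follow the text.

```python
from math import sqrt

# Invented expression values for five genes on six arrays (assumption: toy data only).
arrays = {
    "control_1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "control_2": [1.1, 2.1, 2.9, 4.2, 4.9],
    "Va2_1": [5.0, 1.0, 4.0, 2.0, 3.0],
    "Va2_2": [4.8, 1.2, 4.1, 1.9, 3.2],
    "Vb5_1": [2.0, 2.0, 3.0, 5.0, 4.0],
    "Vb5_2": [2.1, 1.9, 3.2, 4.9, 4.1],
}


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (must not be constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def average_linkage(profiles):
    """Agglomerative clustering with average linkage.

    Returns the merges in order as (cluster_a, cluster_b, distance) tuples,
    where each cluster is a tuple of array names.
    """
    clusters = [(name,) for name in profiles]

    def distance(a, b):
        # Mean of (1 - correlation) over all cross-cluster pairs of arrays.
        pairs = [(i, j) for i in a for j in b]
        total = sum(1.0 - pearson(profiles[i], profiles[j]) for i, j in pairs)
        return total / len(pairs)

    merges = []
    while len(clusters) > 1:
        candidates = (
            (distance(a, b), a, b)
            for idx, a in enumerate(clusters)
            for b in clusters[idx + 1:]
        )
        dist, a, b = min(candidates, key=lambda item: item[0])
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
        merges.append((a, b, dist))
    return merges


if __name__ == "__main__":
    for a, b, dist in average_linkage(arrays):
        print(f"merge {a} + {b} at distance {dist:.3f}")
```

The order of the merges is what a dendrogram draws: with these toy values the replicate arrays join first, and the Va2 pair joins the others last.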

One can ask any biologically interesting question concerning relationships between database entries, e.g.:

How many genes in the human genome?

Minimal gene set theory!

Evolutionary psychology: Explaining behavioral traits.

Minimal gene set theory!

A minimal gene set can be defined as follows: any knock-out that does not kill the organism proves that the organism has more genes than it needs for survival. A minimal gene set is therefore one in which every single-gene knock-out results in a non-viable clone.
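This definition translates directly into a test over single-gene knock-outs. The sketch below is a toy illustration: the gene names and the `toy_viable` rule are hypothetical stand-ins for an experimental viability assay, chosen so that two redundant genes can substitute for each other.

```python
def is_minimal(genes, is_viable):
    """True if the gene set is viable and every single-gene knock-out is not."""
    genes = frozenset(genes)
    if not is_viable(genes):
        return False
    return all(not is_viable(genes - {gene}) for gene in genes)


def dispensable_genes(genes, is_viable):
    """Genes whose individual knock-out still leaves a viable organism."""
    genes = frozenset(genes)
    return sorted(gene for gene in genes if is_viable(genes - {gene}))


def toy_viable(genes):
    """Hypothetical viability rule, standing in for a real experiment.

    The organism needs geneA and geneB, plus at least one of the two
    redundant genes geneC1 and geneC2.
    """
    has_essential = {"geneA", "geneB"} <= genes
    has_redundant = bool(genes & {"geneC1", "geneC2"})
    return has_essential and has_redundant


if __name__ == "__main__":
    full_set = {"geneA", "geneB", "geneC1", "geneC2"}
    print(is_minimal(full_set, toy_viable))
    print(dispensable_genes(full_set, toy_viable))
    print(is_minimal({"geneA", "geneB", "geneC1"}, toy_viable))
```

The example also shows a limit of the single knock-out criterion: each redundant gene is dispensable on its own, but the two cannot be removed together.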

The smallest gene set found (apart from large viral genomes with >200 genes) is 467 genes, in Mycoplasma genitalium. The latter can hardly be considered a free-living organism.

Autonomous (neither symbiotic nor parasitic) species do not tend to have minimal gene sets. Chemotrophs, for which there are only archaea known, have genomes with usually more than 2,000 open reading frames, up to four times the minimal gene set found in eubacteria.

Phototrophs include, besides bacterial species, some of the largest life forms (trees), which contain some of the largest genomes.

Name                               Proteins   In COGs
Methanococcus jannaschii           1786       1330
Methanobac. thermoautotrophicum    1873       1388
Saccharomyces cerevisiae           5955       2290
Escherichia coli K12               4275       3414
Escherichia coli O157              5315       3662
Helicobacter pylori                1576       1096
Rickettsia prowazekii              835        697
Mycoplasma pneumoniae              689        425
Mycoplasma genitalium              484        381

Genome “size” (number of proteins) of some microorganisms
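The protein counts listed for each genome can be turned into a simple coverage figure: the fraction of an organism's proteins that fall into a COG. The short sketch below does that arithmetic with the numbers given above. The dictionary layout and the function name are illustrative choices.

```python
# (proteins, proteins in COGs) for each genome, as listed in the table.
genomes = {
    "Methanococcus jannaschii": (1786, 1330),
    "Methanobac. thermoautotrophicum": (1873, 1388),
    "Saccharomyces cerevisiae": (5955, 2290),
    "Escherichia coli K12": (4275, 3414),
    "Escherichia coli O157": (5315, 3662),
    "Helicobacter pylori": (1576, 1096),
    "Rickettsia prowazekii": (835, 697),
    "Mycoplasma pneumoniae": (689, 425),
    "Mycoplasma genitalium": (484, 381),
}


def cog_coverage(proteins, in_cogs):
    """Fraction of an organism's proteins that belong to some COG."""
    return in_cogs / proteins


coverage = {name: cog_coverage(*counts) for name, counts in genomes.items()}

if __name__ == "__main__":
    # Print genomes from highest to lowest COG coverage.
    for name, fraction in sorted(coverage.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{name:34s} {fraction:6.1%}")
```

With these counts the small parasitic genome of Rickettsia prowazekii has the highest coverage and the eukaryote Saccharomyces cerevisiae the lowest, which fits a COG collection built from microbial genomes.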

How many genes in the human genome?

Bets: 165; Mean: 61,710; Lowest: 27,462; Highest: 153,478

Source: Sanger Institute, http://www.ensembl.org/Genesweep/

Assessment of the gene number will take place at the 2003 Cold Spring Harbor Laboratory Genome meeting