Introduction to the course Introduction to Molecular Biology (Part I)

Algorithms in Bioinformatics: Lecture 01 Introduction

Lucia Moura

Fall 2010

Introduction to the course

“Bioinformatics is the study of biology through computer modeling and analysis. It is a multi-discipline research involving biology, statistics, data-mining, machine learning and algorithms.”

Textbook: Wing-Kin Sung, Algorithms in Bioinformatics, CRC Press, 2009.

This course will give an in-depth view of algorithmic techniques used in bioinformatics.

Course contents (tentative):

- Introduction to Molecular Biology (chapter 1)
- Sequence Similarity (chapter 2): global/local/semi-global alignment, gap penalty, scoring functions
- Suffix trees and related data structures (chapter 3): algorithms to build a suffix tree, applications
- Genome Alignment (chapter 4): methods that use suffix trees and the longest common subsequence algorithm
- Multiple sequence alignment (chapter 6): dynamic programming, approximation algorithms, heuristics
- Phylogeny Reconstruction (chapter 7): constructing a phylogenetic tree given different types of data
- Genome Rearrangement (chapter 9): reversals, transpositions, etc.; various distances considered
- Other topics: RNA secondary structure prediction (guest lecture); other topics/guest lectures TBA

Course Administration

Please refer to the course outline: http://www.site.uottawa.ca/~lucia/courses/5126-10/outline.html

Molecular Biology: DNA, RNA, Protein, Gene, Chromosome, Genome

Intro to Molecular biology: DNA, RNA, Protein

Our body has organs formed by tissues, which are collections of similar cells that perform specialized functions. A cell is the minimal self-reproducing unit in all living species. It performs two functions:

1 Stores and passes genetic information for preserving life from generation to generation. This is done via DNA molecules.

2 Performs chemical reactions necessary to maintain our life. To do this, portions of DNA called genes are transcribed into RNA molecules, which in turn guide the synthesis of proteins. Proteins are the main catalysts for chemical reactions in the cell.

Next we discuss these macromolecules (molecules formed from a collection of smaller molecules): protein, DNA and RNA.

Proteins

Proteins are the building blocks of cells; they execute nearly all cell functions.

Understanding proteins is essential to understanding how the body functions, as well as other biological processes.

A protein (also called a polypeptide) is a chain of amino acids (on average, around 350 amino acids form a protein), each bonded to its neighbour through a covalent peptide bond. The protein’s primary structure is given by its sequence of amino acids. There are 20 different common amino acids.

Computer science language: a protein’s primary structure corresponds to a string (of length 350 symbols on average) over an alphabet of size 20.
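
The string view of a protein can be made concrete with a small sketch. It assumes the usual one-letter codes for the 20 common amino acids; the function name and the example sequence are made up for illustration and do not come from the lecture.

```python
# The 20 common amino acids in the usual one-letter code.
AMINO_ACIDS = frozenset("ACDEFGHIKLMNPQRSTVWY")


def is_protein_sequence(seq: str) -> bool:
    """Return True if seq is a non-empty string over the 20-letter alphabet."""
    return len(seq) > 0 and all(ch in AMINO_ACIDS for ch in seq.upper())


# Hypothetical sequence, invented for this example.
example = "MKTAYIAKQR"
print(len(AMINO_ACIDS), is_protein_sequence(example), is_protein_sequence("MKXB"))
```

Once the primary structure is just a string, string algorithms (alignment, suffix trees) apply to it directly.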

Amino acid structure

1 Amino group (NH2)

2 Carboxyl Group (COOH)

3 R-group (side chain): a different R-group characterizes each of the 20 common amino acids.

Amino acids join together via a peptide bond

DNA

Deoxyribonucleic Acid (DNA) is the genetic material in all living organisms. It stores the instructions for the cell to perform its functions.

“DNA can be thought of as a large cookbook with recipes for making every protein in a cell. (...) The information in the genes is read, perhaps millions of times in the life of an organism, but the DNA itself is never used up.”

DNA consists of 2 strands of nucleotides forming a double helix structure. DNA nucleotides vary depending on 4 possible nitrogenous bases: adenine (A), guanine (G), cytosine (C), thymine (T). One strand is a polynucleotide (a sequence of nucleotides of 4 types); the second strand carries the complementary bases (A = T, C ≡ G).

Computer science language: a DNA’s primary structure corresponds to a string over the alphabet {A, C, G, T} (the second strand is determined by the first).
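
A minimal sketch of how the second strand can be derived from the first under the pairing rule A = T, C ≡ G. The function names are illustrative. The reverse-complement step additionally assumes the standard convention that the two strands run in opposite directions and that each is written 5' to 3', which this slide does not spell out.

```python
# Watson-Crick pairing: A pairs with T, C pairs with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Base-by-base complement, read in the same left-to-right order."""
    return "".join(COMPLEMENT[base] for base in strand)


def reverse_complement(strand: str) -> str:
    """The partner strand written in its own 5'-to-3' direction (assumed convention)."""
    return complement(strand)[::-1]


print(complement("ATGCGT"))          # TACGCA
print(reverse_complement("ATGCGT"))  # ACGCAT
```

Because either strand can be recomputed from the other, it is enough to store one string per DNA molecule.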

DNA Nucleotides

1 A pentose sugar (deoxyribose)

2 Phosphate group (bound to the 5’ carbon)

3 Nitrogenous base (bound to the 1’ carbon): A, C, T, G

DNA formed by chaining nucleotides I

DNA formed by chaining nucleotides II

Watson-Crick base pairing

DNA double helix structure (Watson and Crick 1953)

DNA replication

The cell duplicates its DNA and passes it to two daughter cells.

1 The double strand is separated.

2 Each strand forms a template for a complementary new strand.
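
The two replication steps can be mimicked on strings. This is only a toy model: a duplex is assumed to be a pair (strand, position-wise complement), strand direction and the replication machinery are ignored, and all names are illustrative.

```python
# Position-wise complement table for the DNA alphabet {A, C, G, T}.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(strand: str) -> str:
    return strand.translate(_COMPLEMENT)


def replicate(duplex):
    """Return two daughter duplexes, each keeping one strand of the parent."""
    old_top, old_bottom = duplex        # step 1: separate the double strand
    new_bottom = complement(old_top)    # step 2: each old strand is a template
    new_top = complement(old_bottom)    #         for a complementary new strand
    return (old_top, new_bottom), (new_top, old_bottom)


parent = ("ATGCGT", complement("ATGCGT"))
daughter1, daughter2 = replicate(parent)
print(daughter1 == parent, daughter2 == parent)
```

In this model both daughters carry the same sequence as the parent, and each contains one old strand and one newly built strand.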

RNA

Ribonucleic Acid (RNA) is the nucleic acid produced during the transcription process (from DNA to RNA). The nucleotide structure for RNA is similar to the one for DNA. Differences:

1 Ribose sugar in place of deoxyribose;

2 Nitrogenous bases pair as (A, U), (C, G); uracil instead of thymine.

RNA is single stranded.
RNA can form more complex 3D structures than DNA, and so performs more functions.
Proteins can perform even more functions than RNA.
DNA is more stable than RNA for storing information.

Nucleotide structure for RNA

1 A pentose sugar (ribose)

2 Phosphate group (bound to the 5’ carbon)

3 Nitrogenous base (bound to the 1’ carbon): A,U,C,G

Different types of RNA

mRNA: messenger RNA; carries the encoded information needed to make proteins.

ncRNA: non-coding RNA, which includes:

- ribosomal RNA (rRNA): part of ribosomes; helps translate mRNA into proteins.
- transfer RNA (tRNA): acts like a molecular dictionary that translates the nucleic acid code into the amino acid sequence of proteins.
- short ncRNA: regulates the process of generating proteins from genes.
- long ncRNA: diverse functions, some still unknown.

Genome, Chromosome and Gene

Genome: the set of all DNA in an organism. Genome size varies, and size doesn’t necessarily correspond to complexity:
the bacterium Mycoplasma genitalium has a genome of ∼ 600,000 base pairs;
human and mouse genomes have ∼ 3 billion base pairs;
the single-celled organism Amoeba dubia has ∼ 670 billion base pairs!

The genome is partitioned into chromosomes; each chromosome is a double-stranded DNA chain wrapped around histones. Humans have 23 pairs of chromosomes (e.g. males have 22 pairs of autosomes, one X and one Y chromosome).

A gene is a “substring” of DNA that encodes a protein or an RNA molecule. Each chromosome contains many genes. In the human genome there are ∼ 30,000 genes.
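
The "substring" view of a gene can be sketched by storing a chromosome as one string and a gene as a pair of coordinates on it. The `Gene` class, the 0-based half-open coordinates and the toy sequence are assumptions made for this example, not part of the lecture.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str
    start: int  # 0-based position of the first base (inclusive)
    end: int    # position just past the last base (exclusive)


def gene_sequence(chromosome: str, gene: Gene) -> str:
    """Extract the substring of the chromosome that the gene occupies."""
    return chromosome[gene.start:gene.end]


# Toy chromosome and gene, invented for illustration.
chromosome = "TTGACATGGCCTAAGGC"
toy_gene = Gene(name="toyA", start=5, end=14)
print(gene_sequence(chromosome, toy_gene))  # ATGGCCTAA
```

Storing coordinates rather than copies of the sequence keeps many genes on one chromosome cheap to represent.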

Chromosomes

Processes to be understood in more detail next class:

DNA replication and DNA mutation.

Central Dogma (proposed by Crick in 1958): the process of transferring information from DNA to RNA to protein.

- Transcription (transfer of genetic information from the DNA to the mRNA): DNA is transcribed to mRNA, i.e., during the transcription process, an mRNA is synthesized from a DNA template.

- Translation (mRNA is translated to protein): the mRNA is translated into an amino acid sequence. Here the genetic code is used: each codon (3 consecutive symbols) is translated into its corresponding amino acid.
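
Transcription and translation can be sketched on strings as follows. The sketch assumes the input is the coding (non-template) strand, so the mRNA is the same string with U in place of T, and it uses the standard genetic code with stop codons written as `*`. The function names, and the choice to start reading at position 0 and stop at the first stop codon, are simplifications for illustration.

```python
from itertools import product

# Standard genetic code, codons enumerated in the order U, C, A, G
# for each of the three positions; '*' marks a stop codon.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def transcribe(coding_strand: str) -> str:
    """mRNA for a DNA coding strand: same sequence with U instead of T."""
    return coding_strand.replace("T", "U")


def translate(mrna: str) -> str:
    """Translate codon by codon from position 0 until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


mrna = transcribe("ATGGCCTAA")
print(mrna, translate(mrna))  # AUGGCCUAA MA
```

The table has 64 codons for 20 amino acids plus stop, so several codons map to the same amino acid.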

Brief History of Bioinformatics

1866: Mendel discovered genetics (hybridization of peas, genes).
1869: DNA was discovered.
1944: Avery, MacLeod and McCarty showed that DNA is the carrier of genetic information.
1953: Watson and Crick deduced the double helix structure of DNA.
1970s and beyond: several biotechnology techniques were developed, e.g. DNA sequencing using any tissue; polymerase chain reaction.
1977: RNA splicing in eukaryotes was discovered (introns/exons).
1980-1990: genome sequencing of various organisms (e.g. E. coli).
1990: the Human Genome Project was launched.
1998: Fire and Mello discovered RNA interference.
2003: sequencing of the human genome was completed (first draft in 2000).
2006-now: second-generation sequencing technology is available.
Other projects: Genomes to Life (understanding the detailed mechanisms of cells), ENCODE (annotating all the genes and functional elements), HapMap (studying differences in genetic data among people).
