COT 6930 HPC and Bioinformatics Introduction to Molecular Biology Xingquan Zhu Dept. of Computer Science and Engineering


COT 6930 HPC and Bioinformatics

Introduction to Molecular Biology

Xingquan Zhu
Dept. of Computer Science and Engineering


Outline

• Cell
• DNA
  • DNA Structure
  • DNA Sequencing
• RNA (DNA → RNA)
• Protein


Life begins with Cell

• Cells are the fundamental working units of every living system.
• A cell is the smallest structural unit of an organism that is capable of independent function.
  • Unicellular organisms (any living being consisting of a single cell): mainly bacteria
  • Multicellular organisms (organisms consisting of more than one cell): plants and animals
• All cells have some common features: membrane, cytoplasm.
• A cell is able to survive and multiply independently in an appropriate environment.
• A human body contains an estimated 6 × 10^13 (60 trillion) cells of about 210 distinct cell types.
• Cells differ in size: a human red blood cell is about 7 microns in diameter, while some neurons are about 1 m long (from spinal cord to leg).
• Name a cell visible to the naked eye...


Cell

• Living organisms (on Earth) require the ability to
  • Separate inside from outside (lipids)
  • Build 3D machinery to perform biological functions (proteins)
  • Store information on how to build machinery (DNA)
• The basic unit of life
  • Every living thing is made of cells.
  • Every cell comes from a pre-existing cell.
  • All of life's functions are cellular.


Organisms – Eukaryotes and Prokaryotes

• Every organism is composed of one of two radically different types of cells: prokaryotic cells or eukaryotic cells.
• Prokaryotic cells are simpler than eukaryotic cells.
• Prokaryotes are (mostly) single-celled organisms.
• A eukaryotic cell has a nucleus, separated from the rest of the cell by a membrane.
• Eukaryotes can be single-celled (yeast) or multicellular (animals, plants).


Organisms – Eukaryotes and Prokaryotes

Prokaryotes                                  Eukaryotes
Single cell                                  Single or multi cell
No nucleus                                   Nucleus
No organelles                                Organelles
One piece of circular DNA                    Chromosomes
No mRNA post-transcriptional modification    Exon/intron splicing


Structure of a Eukaryotic Cell

• The nucleus contains chromosomes, which are the carriers of the genetic material

• Organelles such as centrioles, lysosomes, and Golgi complexes are enclosed compartments within the cell and are responsible for particular biological processes

• Area of the cell outside the nucleus and the organelles is called the cytoplasm


Composition of Cells

• Cell membrane
  • Boundary between the cell and the outside world
  • Cell membranes consist of two layers of lipid molecules with hydrophobic ends facing in (keeps water out)
• Nucleus
  • Contains genetic material
  • Separated from the rest of the cell by a nuclear membrane


The nucleus
1. nuclear envelope
2. nucleolus
3. chromosomes


All Cells have common Cycles

• Growth of a single cell and its subsequent division is called the cell cycle (M: mitosis).
• Prokaryotes, particularly bacteria, are extremely successful at multiplying.
• Multicellular organisms typically begin life as a single cell. The single cell has to grow, divide, and differentiate into different cell types to produce tissues and, in higher eukaryotes, organs.


All cells come from pre-existing cells


Molecular Biology: Studying life at the molecular level

• DNA
• Protein
• RNA
  • mRNA
  • rRNA
  • tRNA
• Protein synthesis
  • Transcription (DNA → RNA)
  • Translation (RNA → protein)


Molecules of Life

• All life depends on 3 critical molecules: DNA, RNA, and protein
• All 3 are specified linearly
  • DNA and RNA are nucleic acids, constructed from nucleotides
    • Each can be considered a string written in a four-letter alphabet (A, C, G, T for DNA; A, C, G, U for RNA)
  • Proteins are constructed from amino acids
    • Strings in a twenty-letter alphabet of amino acids
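The string view of these molecules can be sketched in a few lines of Python. This is only an illustration: the names `DNA_ALPHABET`, `RNA_ALPHABET`, `PROTEIN_ALPHABET`, and `classify_sequence` are assumptions made for this sketch, and the protein alphabet uses the standard one-letter codes of the 20 amino acids.

```python
# Hypothetical names, chosen for this sketch only.
DNA_ALPHABET = set("ACGT")
RNA_ALPHABET = set("ACGU")
# One-letter codes of the 20 standard amino acids.
PROTEIN_ALPHABET = set("ACDEFGHIKLMNPQRSTVWY")


def classify_sequence(seq: str) -> str:
    """Report the first alphabet (DNA, RNA, protein) that contains every letter of seq."""
    letters = set(seq.upper())
    if not letters:
        return "empty"
    if letters <= DNA_ALPHABET:
        return "DNA"
    if letters <= RNA_ALPHABET:
        return "RNA"
    if letters <= PROTEIN_ALPHABET:
        return "protein"
    return "unknown"


print(classify_sequence("ATACGTA"))
print(classify_sequence("AUGCCGUAA"))
print(classify_sequence("MKVLA"))
```

Because the alphabets overlap, a short string such as "ACG" fits all three; the sketch simply reports the first match, so the answer is a guess rather than a proof of molecule type.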


Central dogma of molecular biology

DNA → RNA → protein → phenotype


DNA, RNA, Protein

Self replication and genetic code

• DNA: DNA → DNA (Replication)
• RNA: DNA → RNA (Transcription / Gene Expression)
• Protein: RNA → Protein (Translation)



DNA (Deoxyribonucleic Acid) Structure

• Physical structure
  • Double (stranded) helix
  • Sugar and phosphate groups form the backbone
  • Complementary bases (A-T, C-G) connected by hydrogen bonds
  • 5' = end with a free phosphate group
  • 3' = end with a free hydroxyl (OH) group


DNA

• Composition
  • Sequence of nucleotides
  • Deoxyribonucleotide = deoxyribose sugar + phosphate group + base


Nucleotide Bases


Nucleotides

• The five-carbon sugar (a pentose) in nucleotides has two types
  • Deoxyribose, which has a hydrogen atom attached to its #2 carbon atom (designated 2'): DNA
  • Ribose, which has a hydroxyl group there: RNA


DNA structure


Why 5’ and 3’

Deoxyribonucleotide = deoxyribose sugar + phosphate group + base

The deoxyribose sugar in DNA is a pentose, a five-carbon sugar. Four carbons and an oxygen make up the five-membered ring; the other carbon branches off the ring. The carbon constituents of the sugar ring are numbered 1'-4' (pronounced "one-prime carbon"), starting with the carbon to the right of the oxygen going clockwise. The fifth carbon (5') branches from the 4' carbon.


DNA - Denaturation, Hybridization


DNA For bioinformatics

• DNA can be represented as a sequence of letters (A, C, G, T)

  5' A T A C G T A 3'
  3' T A T G C A T 5' (matching strand, redundant)

• Terms
  • Base pair (bp) – one pair of DNA bases (1 letter)
  • Gene – section of DNA that produces a functional product
  • Chromosome – physical linear sequence of DNA
  • Genome – entire collection of DNA for an organism

Organism     Chromosomes   Genome size
E. coli      1             5 × 10^6 bases (5 Mbp)
Drosophila   8             2 × 10^8 bases (200 Mbp)
Human        46            3 × 10^9 bases (3 Gbp)
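The "matching strand, redundant" remark can be made concrete: given one strand, the other follows from the A-T and C-G pairing rules. A minimal Python sketch, assuming upper-case input over A, C, G, T (the names `COMPLEMENT`, `complement`, and `reverse_complement` are illustrative, not from the slides):

```python
# Watson-Crick pairing: A pairs with T, C pairs with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Base-by-base complement, written in the same left-to-right order as the input."""
    return "".join(COMPLEMENT[base] for base in strand)


def reverse_complement(strand: str) -> str:
    """The matching strand rewritten in its own 5'-to-3' direction."""
    return complement(strand)[::-1]


top = "ATACGTA"                   # 5' -> 3', as on the slide
print(complement(top))            # matching strand read 3' -> 5'
print(reverse_complement(top))    # same strand read 5' -> 3'
```

Since the second strand is fully determined by the first, sequence files usually store only one strand.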


DNA Replication

• DNA can be replicated
  • DNA strands are split
  • DNA polymerase (enzyme) reads one strand (template)
  • Builds new (complementary) strand to form duplicate DNA


DNA fascinating fact

• Each cell has 2 m of DNA
• Average person has 75 trillion cells = 75 × 10^12
• Length of DNA in a person = 150 × 10^12 m
• Each person has enough DNA to go to the sun and back 500 times
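The arithmetic behind this fact can be checked with a short Python sketch. The 2 m and 75 trillion figures come from the slide; the Earth-Sun distance (about 1.496 × 10^11 m) is not given there and is an assumption added for the check.

```python
DNA_PER_CELL_M = 2.0        # from the slide
CELLS_PER_PERSON = 75e12    # from the slide
SUN_DISTANCE_M = 1.496e11   # assumption: mean Earth-Sun distance, not stated in the slides

total_dna_m = DNA_PER_CELL_M * CELLS_PER_PERSON
round_trips = total_dna_m / (2 * SUN_DISTANCE_M)

print(f"Total DNA length: {total_dna_m:.2e} m")
print(f"Round trips to the sun: {round_trips:.0f}")
```

Under these inputs the result lands close to the slide's figure of 500 round trips.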


Organization of DNA in chromosomes

• 3 bases / amino acid
• 27,000 bases / protein (1 gene)
• 20,000 genes / genome
• 3,000,000,000 base pairs / genome

(Figure labels: homologous, histone proteins, Human Genome Project)


Genome

• Gene: Contiguous subparts of single-stranded DNA that are templates for producing proteins.
• Chromosomes: compact chains of coiled DNA.
• Genome: The complete set of DNA, including all genes, in a given organism.
• Noncoding part: The function of DNA material between genes is largely unknown.

Source: www.mtsinai.on.ca/pdmg/Genetics/basic.htm


More Terminology

• The genome is an organism's complete set of DNA.
  • A small bacterial genome can contain about 600,000 DNA base pairs
  • Human and mouse genomes have some 3 billion.
• The human genome has 23 pairs of chromosomes.
  • Each chromosome contains many genes.
• Gene
  • Basic physical and functional unit of heredity.
  • Specific sequences of DNA bases that encode instructions on how to make proteins.


DNA sequences in the human genome


DNA homologies: 98.7%



DNA Sequencing (Sanger's Dideoxy Method)

• Method for identifying short DNA sequences
• Algorithm
  • Replicate DNA with (color-labeled) dideoxy-nucleotides
    • Creates fragments of DNA
  • Apply gel electrophoresis
    • Separates fragments based on size
  • Machine scans gel
    • Records level of color found at each position
  • Software calls bases
    • Predicts base at each position
• Limitations
  • Upper bound of 700-800 bases on sequence length
  • Larger DNA sequences will need to be assembled


DNA Sequencing

• Dideoxynucleotides
  • Similar to a normal nucleotide base
  • Missing 3' hydroxyl group terminates the DNA sequence
  • May be chemically modified to fluoresce under UV light


DNA Sequencing

• Example for GCGAATGTCCACAACGCTACAGGTG
  • Replicate DNA in the presence of dideoxy-Cytidine (ddC)
  • Replication terminates when ddC is used instead of C
  • Produces the following DNA fragments

    GC
    GCGAATGTC
    GCGAATGTCC
    GCGAATGTCCAC
    GCGAATGTCCACAAC
    GCGAATGTCCACAACGC
    GCGAATGTCCACAACGCTAC
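The fragment list in this example can be reproduced with a small Python sketch. It follows the slide's simplified picture, in which the copied strand is written with the same letters as the example sequence and every occurrence of the dideoxy base is a possible stopping point. The function name `terminated_fragments` is an assumption for illustration.

```python
def terminated_fragments(sequence: str, dideoxy_base: str) -> list[str]:
    """All prefixes of sequence that end in dideoxy_base, shortest first."""
    return [
        sequence[: i + 1]
        for i, base in enumerate(sequence)
        if base == dideoxy_base
    ]


sequence = "GCGAATGTCCACAACGCTACAGGTG"
for fragment in terminated_fragments(sequence, "C"):
    print(fragment)
```

Running the same function with "A", "G", and "T" gives the fragments that the other three dideoxy reactions would produce.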


DNA Sequencing

• Gel electrophoresis
  • Place DNA fragments in gel
  • Apply electric field
  • Speed of fragment is determined by size
    • Smaller = faster
    • Larger = slower
• After given time
  • Fragments are separated in gel
  • Fragments are sorted by size (number of bases)
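Sorting by size is what makes base calling possible: if the fragments from all four dideoxy reactions are ordered by length and the color of each terminating base is read off, the sequence appears one base at a time. A minimal Python sketch of that idea, assuming ideal conditions (exactly one fragment of every length, no read errors); the function name `read_gel` is hypothetical.

```python
def read_gel(fragments: list[str]) -> str:
    """Order fragments by length, as the gel does, and call the last base of each."""
    ordered = sorted(fragments, key=len)   # the shortest fragment travels farthest
    return "".join(fragment[-1] for fragment in ordered)


sequence = "GCGAATGTCCACAACGCTACAGGTG"
# Pool of terminated fragments from all four dideoxy reactions (every prefix).
fragments = [sequence[: i + 1] for i in range(len(sequence))]
fragments.reverse()                        # arrival order in the tube is arbitrary

print(read_gel(fragments))
```

In practice resolution degrades for long fragments, which is one reason for the 700-800 base upper bound on read length.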


Gel electrophoresis


DNA Sequencing


DNA Sequencing



Central Dogma of Biology: DNA, RNA, and the Flow of Information

(Figure labels: Replication, Transcription, Translation)


Ribonucleic acid (RNA)

• Composition
  • Sequence of nucleotides
  • Ribonucleotide = ribose sugar + phosphate group + base
• Major differences between DNA and RNA
  • RNA: usually single stranded
  • RNA: ribose sugar, DNA: deoxyribose sugar
  • RNA: Uracil (U) instead of Thymine (T)
• DNA → RNA (Transcription / Gene Expression)
  • RNA polymerase (enzyme)
    • Finds gene initiation marker (promoter) on DNA strand
    • Reads DNA strand containing marker
    • Builds (complementary) strand of messenger RNA (mRNA)
    • Stops when gene end marker (terminator) found
  • Resulting RNA sequence = transcript
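The complementary-copy step of transcription can be sketched in Python. This is a simplified illustration that ignores how the start and end markers are found and just maps each template base to its RNA partner (A → U, T → A, C → G, G → C). The names `DNA_TO_RNA` and `transcribe` are assumptions for this sketch.

```python
# Pairing of a DNA template base with the RNA base built opposite it.
DNA_TO_RNA = {"A": "U", "T": "A", "C": "G", "G": "C"}


def transcribe(template_3_to_5: str) -> str:
    """mRNA (5' -> 3') built against a DNA template strand given in 3' -> 5' order."""
    return "".join(DNA_TO_RNA[base] for base in template_3_to_5)


template = "TATGCAT"          # 3' -> 5', the matching strand of 5' ATACGTA 3'
print(transcribe(template))
```

The transcript has the same letters as the other (coding) DNA strand, with U in place of T.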


Ribonucleotides

• The five-carbon sugar (a pentose) in nucleotides has two types
  • Deoxyribose, which has a hydrogen atom attached to its #2 carbon atom (designated 2'): DNA
  • Ribose, which has a hydroxyl group there: RNA


Transcription Example (1)


Transcription Example (2)


Transcription Example (3)


Transcription Example (4)


Transcription Example


What is an Enzyme?

• Proteins that catalyze (i.e. accelerate) chemical reactions
• They are not living things
• Two types of enzyme
  • Join specific molecules together to form new molecules
  • Break specific molecules apart into separate molecules
• Things about enzymes
  • Enzymes are specific: each performs only one specific job; about 3000 types of enzymes identified so far
  • Enzymes are catalysts: they can perform the same job over and over again, millions of times, without being consumed in the process
  • Enzymes are efficient
  • Enzymes are natural: once they have done their job, enzymes break down swiftly and can be absorbed back into nature
