1 Sequence Optimization For Synthetic Genes Using Genetic Algorithms David Sigfredo Angulo 1 Rob Vogelbacher 1, Benjamin R. Capraro 2 , Tobin Sosnick 2 , Shohei Koide 2 1 School of Computer Science Telecommunications and Information Systems DePaul University 2 Department of Biochemistry and Molecular Biology The University of Chicago

Page 1: Sequence Optimization For Synthetic Genes Using Genetic Algorithms

Sequence Optimization For Synthetic Genes Using Genetic Algorithms

David Sigfredo Angulo (1)

Rob Vogelbacher (1), Benjamin R. Capraro (2), Tobin Sosnick (2), Shohei Koide (2)

(1) School of Computer Science, Telecommunications and Information Systems, DePaul University
(2) Department of Biochemistry and Molecular Biology, The University of Chicago

Page 2: Sequence Optimization For Synthetic Genes Using Genetic Algorithms

Introduction

• Genetic Algorithms:
  – Use ideas based on the biology of genes
  – Create software that uses such stochastic means to search through large search spaces
  – The resulting algorithm has nothing to do with genes
• Designing Genes
  – This search space is huge
  – REALLY NOVEL IDEA:
    • Use Genetic Algorithms based on genes to design genes!!

Page 3: Sequence Optimization For Synthetic Genes Using Genetic Algorithms

Outline

• Short Biology Tutorial
• DNA Sequence Generation
  – Why is the problem difficult?
• IBG Gene Designer
  – Genetic Algorithm (GA) solution
  – Heuristics and Fitness Evaluation

Page 4: Sequence Optimization For Synthetic Genes Using Genetic Algorithms

First

• Before the problem can be described
  – Must give some background biochemistry principles
• Tutorial outline
  – DNA
  – Codons
  – Protein
    • Synthetic genes
      – What are they and what are they used for?
  – Restriction Enzymes
  – Expressing Proteins using Vectors

Page 5: Sequence Optimization For Synthetic Genes Using Genetic Algorithms

Transcription/Translation

DNA → (transcription, by RNA polymerase) → RNA → (translation, by ribosomes) → Protein

Central Dogma of Molecular Biology

Page 6: Sequence Optimization For Synthetic Genes Using Genetic Algorithms

DNA

• Deoxyribonucleic acid
• Strand backbone is made of sugar & phosphate molecules
• Strands connected by nitrogen-containing nucleotide bases
• Two strands join making a double helix
• Each strand is made of nucleotides joined together

Page 7: Sequence Optimization For Synthetic Genes Using Genetic Algorithms

[Figure: levels of DNA packing, from double helix to chromosome]

• 2 nm: short region of DNA double helix
• 11 nm: "beads on a string" form of chromatin
• 30 nm: chromatin fiber of packed nucleosomes
• 300 nm: section of chromosome in an extended form
• 700 nm: condensed section of chromosome
• 1100 nm: entire mitotic chromosome

Page 8: Sequence Optimization For Synthetic Genes Using Genetic Algorithms

DNA

Four Nucleotides: A, G, T, C

Page 9: Sequence Optimization For Synthetic Genes Using Genetic Algorithms

DNA: Base Pairing

Page 10: Sequence Optimization For Synthetic Genes Using Genetic Algorithms

Short Biology Tutorial

• Tutorial outline
  – DNA
  – Codons
  – Protein
  – Restriction Enzymes
  – Expressing Proteins using Vectors

Page 11: Sequence Optimization For Synthetic Genes Using Genetic Algorithms

DNA Sequence Generation: Codon to Amino Acid Translation

http://campus.queens.edu/faculty/jannr/Genetics/images/codon.jpg
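The codon table on this slide can be expressed as a small lookup. The following is a minimal sketch, not code from IBG GeneDesigner: it assumes the standard genetic code, and the names `CODON_TABLE`, `translate`, and `synonymous_codons` are illustrative.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code with codons enumerated in TCAG order; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna):
    """Translate a coding DNA sequence into one-letter amino acid codes."""
    dna = dna.upper()
    if len(dna) % 3:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def synonymous_codons(amino_acid):
    """Return every codon that encodes the given amino acid, sorted alphabetically."""
    return sorted(c for c, aa in CODON_TABLE.items() if aa == amino_acid)
```

Reading the table in the other direction (`synonymous_codons`) is what "backtranslation" relies on: most amino acids can be written with more than one codon.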

Page 12: Sequence Optimization For Synthetic Genes Using Genetic Algorithms

Short Biology Tutorial

• Tutorial outline
  – DNA
  – Codons
  – Protein
  – Restriction Enzymes
  – Expressing Proteins using Vectors

Page 13: Sequence Optimization For Synthetic Genes Using Genetic Algorithms

Proteins: AA Chains

Page 14: Sequence Optimization For Synthetic Genes Using Genetic Algorithms

Proteins

• Amino Acid Chains Fold Into Complex 3D Structures
• Functional properties depend on 3D structure
• Usefulness depends on functional properties
  – E.g. designing drugs

Page 15: Sequence Optimization For Synthetic Genes Using Genetic Algorithms

Designed/Expressed Proteins Extremely Useful

• Designed Proteins
  – Can be used to study protein structure
  – Can be used to study effects of other proteins
    • Can be designed to “knock out” other proteins
    • Can be designed to “block” the action of other proteins
• Expressed proteins
  – Expressed in cow’s milk or chicken eggs
  – Can manufacture drugs on large scales in this way
    • E.g. insulin

Page 16: Sequence Optimization For Synthetic Genes Using Genetic Algorithms

Synthetic Genes

• DNA sequences
  – “Backtranslated” from a novel Protein or Amino Acid sequence

DNA → (transcription, by RNA polymerase) → RNA → (translation, by ribosomes) → Protein

• We’ll put the DNA for our designed protein into an organism (a vector)
• Then that vector will make (express) our protein
• But how do we get the DNA into an organism???

Page 17: Sequence Optimization For Synthetic Genes Using Genetic Algorithms

Short Biology Tutorial

• Tutorial outline
  – DNA
  – Codons
  – Protein
  – Restriction Enzymes
  – Expressing Proteins using Vectors

Page 18: Sequence Optimization For Synthetic Genes Using Genetic Algorithms

Restriction Enzyme Digests

• Watson – Crick 1953
• Took 20 years to be able to do anything with DNA
• H. Smith (and others) made a discovery that allowed manipulation and deciphering of DNA
• Discovery was that bacteria produce enzymes that introduce breaks in double-stranded DNA molecules whenever they encounter a specific string of nucleotides
• These enzymes are called Restriction Enzymes
• Restriction Enzymes can be used as precise scissors
  – They let biologists cut (and paste) portions of DNA

Page 19: Sequence Optimization For Synthetic Genes Using Genetic Algorithms

EcoRI

• EcoRI was one of the first Restriction Enzymes discovered
  – "Eco" because it was isolated from E. coli (Escherichia coli)
  – "R" because it came from E. coli strain RY13
  – "I" because it was the first Restriction Enzyme identified in that strain
  – Now over 300 Restriction Enzymes known
• EcoRI cleaves (restricts, digests) DNA
  – Between the G and A nucleotides
  – Only when it encounters them in the string 5'-GAATTC-3'
  – This is called the restriction site

5'-GAATTC-3'
3'-CTTAAG-5'

5'-G AATTC-3'
3'-CTTAA G-5'

Regulated by EcoRI
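The EcoRI cut can be sketched as a string operation on the top strand. This is an illustrative sketch only: it assumes linear DNA written 5' to 3', and the helper names `reverse_complement` and `digest` are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def digest(seq, site="GAATTC", cut_offset=1):
    """Cut the top strand `cut_offset` bases into every occurrence of `site`.

    The defaults model EcoRI, which cuts between the G and the first A.
    """
    fragments, start = [], 0
    pos = seq.find(site)
    while pos != -1:
        fragments.append(seq[start:pos + cut_offset])
        start = pos + cut_offset
        pos = seq.find(site, pos + 1)
    fragments.append(seq[start:])
    return fragments
```

Because GAATTC equals its own reverse complement, the enzyme finds the same site on both strands, and each cut leaves an AATT overhang.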

Page 20: Sequence Optimization For Synthetic Genes Using Genetic Algorithms

Sticky Ends

• Many restriction enzymes cut in such a way that some single-stranded DNA is left at both ends
• These nucleotide sequences
  – Are complementary to each other
  – Are 5'-AATT-3' in the case of EcoRI
  – Can base pair with other nucleotides in a sequence
  – Thus, are called "sticky ends"
  – Can temporarily hold two DNA strands together
  – The enzyme ligase will permanently join those strands
  – This is called ligation

5'-GAATTC-3'
3'-CTTAAG-5'

5'-G AATTC-3'
3'-CTTAA G-5'

Regulated by EcoRI

Page 21: Sequence Optimization For Synthetic Genes Using Genetic Algorithms

Short Biology Tutorial

• Tutorial outline
  – DNA
  – Codons
  – Protein
  – Restriction Enzymes
  – Expressing Proteins using Vectors

Page 22: Sequence Optimization For Synthetic Genes Using Genetic Algorithms

Gene Synthesis: On the Lab Bench

• Initial Sequence Construction
  – Oligonucleotides (short strands of DNA) are defined with complementary overlapping sites
    • The “sticky ends”
  – Assembly PCR
    • Oligonucleotides and polymerase are mixed and placed in a thermocycler
    • Creates contiguous DNA sequence from component oligos

Page 23: Sequence Optimization For Synthetic Genes Using Genetic Algorithms

Gene Synthesis: On the Lab Bench (cont)

• After PCR, generated DNA sequence cut with restriction enzymes
• Expression host's plasmid cut with restriction enzymes
• Synthetic gene inserted into plasmid and plasmid repaired
• Expression Vectors
  – Host organisms used to express the synthetic genes (make the protein)
  – Typically E. coli
    • Possibly Chickens or Cows
• Expression vector can now express protein coded for by synthetic gene
  – A bit more complicated than described above!!!

Page 24: Sequence Optimization For Synthetic Genes Using Genetic Algorithms

DNA Sequence Generation: Gene Insertion

Page 25: Sequence Optimization For Synthetic Genes Using Genetic Algorithms

Outline

• Short Biology Tutorial
• DNA Sequence Generation
  – Why is the problem difficult?
• IBG Gene Designer
  – Genetic Algorithm (GA) solution
  – Heuristics and Fitness Evaluation

Page 26: Sequence Optimization For Synthetic Genes Using Genetic Algorithms

DNA Sequence Generation: The Computational Problem

• Why is the problem difficult?
  – Conflicting goals
    • Avoid restriction sites
    • Maximize Codon Preference
    • Thus, cannot use a deterministic algorithm
  – Degeneracy (redundancy) of the DNA code: 64 codons, 20 (21) amino acids (see next slide)
    • Several synonymous codons are translated into the same amino acid
    • Synonymous codons per AA vary from one to six (average is four codons per AA)
    • Huge number of possible DNA Sequences
      – Average 2N for a protein of amino acid length N
  – Codon Preference
    • Varying levels of tRNA assembly components in organisms
    • Codon usage for a particular AA greatly influences protein expression
  – (continued)
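The size of this search space can be computed exactly for any given protein: it is the product of the number of synonymous codons at each position. The sketch below assumes the standard genetic code, in which 61 sense codons are shared among 20 amino acids; `DEGENERACY` and `count_encodings` are illustrative names.

```python
from math import prod

# Number of synonymous codons per amino acid in the standard genetic code.
DEGENERACY = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4, "H": 2, "I": 3,
    "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6, "T": 4, "W": 1, "Y": 2, "V": 4,
}


def count_encodings(protein):
    """Count the distinct DNA sequences that translate to `protein`."""
    return prod(DEGENERACY[aa] for aa in protein)
```

The count grows exponentially with protein length, which is why exhaustive enumeration is impractical for real proteins and a stochastic search becomes attractive.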

Page 27: Sequence Optimization For Synthetic Genes Using Genetic Algorithms

DNA Sequence Generation: Codon to Amino Acid Translation

http://campus.queens.edu/faculty/jannr/Genetics/images/codon.jpg

Page 28: Sequence Optimization For Synthetic Genes Using Genetic Algorithms

DNA Sequence Generation: The Computational Problem (cont)

• Why is the problem difficult?
  – (continued)
  – Restriction Enzymes
    • The vector will contain many restriction enzymes
      – If these cut up our DNA, we won’t express our proteins
      – We must design the DNA string using synonymous codons so that there are no restriction sites
    • Helpful to include some other restriction sites
      – We must design the DNA string using synonymous codons so that these are included
  – (continued)

Page 29: Sequence Optimization For Synthetic Genes Using Genetic Algorithms

DNA Sequence Generation: The Computational Problem (cont)

• Why is the problem difficult?
  – (continued)
  – mRNA Secondary Structure
    • In prokaryotes, mRNA can fold into complex shapes
    • This inhibits protein creation
  – Oligonucleotide generation
    • Want a specific melting temperature so that the complex folding doesn’t take place
    • The “sticky ends” must have the same melting temperature so that they will bind together.
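One common rough estimate of melting temperature for short oligonucleotides is the Wallace rule, Tm ≈ 2·(A+T) + 4·(G+C) in °C. The slides do not say which Tm formula GeneDesigner uses, so the sketch below is an assumption made for illustration, and `wallace_tm` and `tm_spread` are hypothetical names.

```python
def wallace_tm(oligo):
    """Estimate melting temperature in degrees C with the Wallace rule."""
    oligo = oligo.upper()
    at = oligo.count("A") + oligo.count("T")
    gc = oligo.count("G") + oligo.count("C")
    return 2 * at + 4 * gc


def tm_spread(overlaps):
    """Return the gap between the highest and lowest Tm among overlap regions."""
    tms = [wallace_tm(o) for o in overlaps]
    return max(tms) - min(tms)
```

A design step could then prefer overlap regions for which `tm_spread` is small, so that all overlaps anneal under similar conditions.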

Page 30: Sequence Optimization For Synthetic Genes Using Genetic Algorithms

Outline

• Short Biology Tutorial
• DNA Sequence Generation
  – Why is the problem difficult?
• IBG Gene Designer
  – Genetic Algorithm (GA) solution
  – Heuristics and Fitness Evaluation

Page 31: Sequence Optimization For Synthetic Genes Using Genetic Algorithms

IBG GeneDesigner: Our Solution

• IBG GeneDesigner

Page 32: Sequence Optimization For Synthetic Genes Using Genetic Algorithms

IBG GeneDesigner: Genetic Algorithm

• Uses a Genetic Algorithm for sequence optimization
  – Tournament selection model
  – Uniform and single-point crossover (behind the scenes; not user selectable at present)
  – Mutation causes codon “wobbling”
  – Sequence “fitness” determined by heuristic evaluation
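The operators listed above can be sketched on a gene represented as a list of codons. This is not the GeneDesigner implementation: the tournament size, population size, mutation rate, and all function names below are assumptions chosen for illustration, and the fitness function is supplied by the caller.

```python
import random
from itertools import product

# Standard genetic code in TCAG order; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), AMINO_ACIDS)}
SYNONYMS = {}
for codon, aa in CODON_TO_AA.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def random_gene(protein, rng):
    """Backtranslate a protein by picking a random codon for each amino acid."""
    return [rng.choice(SYNONYMS[aa]) for aa in protein]


def mutate(gene, rate, rng):
    """Codon "wobble": swap codons for synonyms, so the protein never changes."""
    return [rng.choice(SYNONYMS[CODON_TO_AA[c]]) if rng.random() < rate else c
            for c in gene]


def single_point_crossover(a, b, rng):
    """Take the head of one parent and the tail of the other, cut between codons."""
    if len(a) < 2:
        return list(a)
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def uniform_crossover(a, b, rng):
    """Pick each codon from either parent with equal probability."""
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]


def tournament(population, fitness, k, rng):
    """Return the fittest of k individuals drawn at random."""
    return max(rng.sample(population, k), key=fitness)


def evolve(protein, fitness, pop_size=40, generations=50, rate=0.05, seed=0):
    """Run a simple generational GA and return the best gene in the final population."""
    rng = random.Random(seed)
    population = [random_gene(protein, rng) for _ in range(pop_size)]
    for _ in range(generations):
        children = []
        for _ in range(pop_size):
            a = tournament(population, fitness, 3, rng)
            b = tournament(population, fitness, 3, rng)
            cross = rng.choice([single_point_crossover, uniform_crossover])
            children.append(mutate(cross(a, b, rng), rate, rng))
        population = children
    return max(population, key=fitness)
```

Because every individual encodes the same protein and crossover only cuts between codons, every child is also a valid encoding; the search moves only through synonymous sequences.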

Page 33: Sequence Optimization For Synthetic Genes Using Genetic Algorithms

IBG GeneDesigner: Fitness Evaluation

• GeneDesigner heuristics
  – Manipulation of nucleotide percentages/ratios to reduce mRNA secondary structure formation
  – Inclusion and Exclusion of restriction sites
    • Restriction sites requested for inclusion should only occur once
  – Matching of codon preference
  – Oligonucleotide generation
    • Fitness determined by melting points, start and end nucleotide
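A heuristic score combining several of these criteria might look like the sketch below. The slides do not give the actual scoring formula or weights, so the equal weighting, the `target_gc` parameter, and the names `count_occurrences` and `fitness` are all assumptions; the oligonucleotide criteria are left out for brevity.

```python
def count_occurrences(dna, site):
    """Count occurrences of `site` in `dna`, including overlapping ones."""
    return sum(dna.startswith(site, i) for i in range(len(dna) - len(site) + 1))


def fitness(dna, codon_weights, exclude=(), include=(), target_gc=0.5):
    """Score a non-empty coding sequence; higher is better.

    codon_weights maps codon -> preference in [0, 1] for the expression host
    (hypothetical values supplied by the caller).
    """
    codons = [dna[i:i + 3] for i in range(0, len(dna), 3)]
    codon_score = sum(codon_weights.get(c, 0.0) for c in codons) / len(codons)
    gc = (dna.count("G") + dna.count("C")) / len(dna)
    gc_penalty = abs(gc - target_gc)
    # Every occurrence of an excluded site costs one point.
    exclude_penalty = sum(count_occurrences(dna, s) for s in exclude)
    # Requested sites should occur exactly once: penalize both 0 and 2+ copies.
    include_penalty = sum(abs(count_occurrences(dna, s) - 1) for s in include)
    return codon_score - gc_penalty - exclude_penalty - include_penalty
```

This sketch scans only the given strand, which is enough for palindromic sites such as GAATTC; a non-palindromic site would also need a scan of the reverse complement.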

Page 34: Sequence Optimization For Synthetic Genes Using Genetic Algorithms

IBG GeneDesigner: Future Work

• Algorithm parameters
  – Systematically manipulate GA parameters to identify default values for sequence optimization
    • Population size
    • Number of generations
    • Mutation rate
    • Convergence criteria
  – Modify heuristic weighting scheme
• Selection models
  – Experiment with alternative selection models (Roulette wheel, elitism, limit population replacement)

Page 35: Sequence Optimization For Synthetic Genes Using Genetic Algorithms

IBG GeneDesigner: Future Work (cont)

• Move algorithm to ECJ architecture
  – Use the Strength-Pareto multi-objective optimization algorithm
• Create web-based version of application
• Explore island model effects on optimization

Page 36: Sequence Optimization For Synthetic Genes Using Genetic Algorithms

Results

• IBG GeneDesigner was used to generate a nucleotide sequence for the SH3 domain of α-spectrin1.
• The codon optimization option was set for expression in E. coli with a 40% G/C bias.
• We also used the application to generate four assembly PCR template oligonucleotide sequences to produce the protein coding sequence flanked by desired restriction enzyme recognition sites.
• The calculated Tm values of the three overlapping regions were within 1.6 °C of each other
  – Promoting similar annealing behavior between strands
  – Success of the reaction was confirmed by DNA sequencing of a pUC19 expression vector containing the PCR product cloned between restriction sites included in the gene design
• Summary: Protein Made!!!

Page 37: Sequence Optimization For Synthetic Genes Using Genetic Algorithms

Input: Protein Sequence, Vector, Restriction Enzymes

Page 38: Sequence Optimization For Synthetic Genes Using Genetic Algorithms

Input: Flanking Sequences

Page 39: Sequence Optimization For Synthetic Genes Using Genetic Algorithms

Input: Algorithm Parameters and Fitness Scores

Page 40: Sequence Optimization For Synthetic Genes Using Genetic Algorithms

Output: Generation of Oligonucleotides

Page 41: Sequence Optimization For Synthetic Genes Using Genetic Algorithms
Page 42: Sequence Optimization For Synthetic Genes Using Genetic Algorithms

Acknowledgements

• Graduate student who did much of the coding
  • Rob Vogelbacher
• University of Chicago undergraduate who used it to build a protein
  • Benjamin R. Capraro
• His advisor
  • Tobin Sosnick
• Our collaborator at University of Chicago
  • Shohei Koide