Protein Structure Prediction on a Lattice Model via Multimodal Optimization Techniques

Ka-Chun Wong, Kwong-Sak Leung, Man-Hon Wong
Department of Computer Science & Engineering
The Chinese University of Hong Kong, HKSAR, China
{kcwong, ksleung, mhwong}@cse.cuhk.edu.hk


Outline

• Introduction
• Background
• Objective
• Related Works
• Paper Contributions
  • Apply multimodal optimization techniques
  • Propose a novel mutation method
• Experiments
• Conclusion

Introduction

A protein is:
• a sequence of amino acid residues folded into a 3D structure
• important for living organisms:
  • transporting materials across cells
  • catalyzing metabolic reactions
  • defending the body against viruses

Introduction

Protein Function: Substantially depends on its 3D structure

http://www.pdb.org/pdb/explore/explore.do?structureId=2X7M

Introduction

Protein Structure Determination
• "Wet-lab" experiments exist:
  • X-ray crystallography
  • NMR spectroscopy
  • ……
• But they are:
  • Labor intensive
  • Not scalable
  • Expensive

Introduction

"Wet-lab" experiments for Protein Structure Determination are:
• Costly
• Time-consuming
• Not scalable
• Accurate

Computational approaches for Protein Structure Prediction are:
• Less costly
• Fast
• Scalable
• Less accurate

Complementary twins:
• Wet-labs for fine-tuning
• Computation for coarse-tuning

Introduction

Protein Structure Prediction (PSP)
• Input: an amino acid sequence
• Output: the 3D structure of the sequence
• Divided into two classes: using / not using similar sequences and their structures

[Diagram labels: ……YDVAEGCKVV…… · Prediction · Similar sequences & their structures]

Introduction

This paper focuses on:
• De novo protein structure prediction on the 3D HP lattice model using evolutionary algorithms *
• "De novo" means that the input of the method contains only the sequence to be predicted

*N. Krasnogor, W.E. Hart, J. Smith, and D. Pelta. Protein structure prediction with evolutionary algorithms. In Eiben Garzon Honovar Jakiela Banzhaf, Daida and Smith, editors, International Genetic and Evolutionary Computation Conference (GECCO99), pages 1569-1601. Morgan Kaufmann, 1999.

Background

3D HP lattice model
• Assumes that the main driving forces of folding are the interactions among the hydrophobic amino acid residues
• All known amino acid residues are experimentally classified as either hydrophobic (H) or polar (P)

Background

3D HP lattice model
• An amino acid sequence is represented as a string in {H,P}+
• The sequence is folded in a restricted space, a cubic lattice

Background

• Amino acid residue: bead
• Peptide bond: straight line

Example sequence: HPHPPHHPHPPHPHHPPHPH
• H: red
• P: blue
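The bead-and-line picture can be made concrete in code. The following is a minimal Python sketch, not the authors' implementation: it assumes an absolute encoding in which every bond is one of six unit moves on the cubic lattice, and the letter codes (U, D, L, R, F, B) and the function names are illustrative choices.

```python
# Unit moves on the cubic lattice; the letter codes are an illustrative choice.
MOVES = {
    "U": (0, 0, 1), "D": (0, 0, -1),
    "L": (-1, 0, 0), "R": (1, 0, 0),
    "F": (0, 1, 0), "B": (0, -1, 0),
}


def decode(directions):
    """Return the bead coordinates of a chain described by absolute moves."""
    coords = [(0, 0, 0)]  # the first bead is placed at the origin
    for d in directions:
        dx, dy, dz = MOVES[d]
        x, y, z = coords[-1]
        coords.append((x + dx, y + dy, z + dz))
    return coords


def is_self_avoiding(coords):
    """A valid conformation never places two beads on the same lattice site."""
    return len(set(coords)) == len(coords)


sequence = "HPHPPHHPHPPHPHHPPHPH"
# A fully stretched chain: one bond per pair of consecutive residues.
straight = decode("R" * (len(sequence) - 1))
```

A chain of n residues needs n - 1 moves, so the 20-residue example sequence is described by 19 letters.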

Objective

• Find the conformation with the minimal energy
• That is, maximize the number of H-H bonds formed by two residues that are not adjacent in the sequence (non-local H-H bonds)
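Counting non-local H-H bonds is straightforward once each residue has lattice coordinates. The sketch below is one possible way to do it, under the assumption that every non-local H-H contact contributes -1 to the energy; the function name `hp_energy` and the coordinate-list input are hypothetical, not taken from the paper.

```python
def hp_energy(sequence, coords):
    """Energy = -(number of non-local H-H contacts) for one conformation.

    sequence: string over {H, P}; coords: one (x, y, z) lattice site per residue.
    """
    position_to_index = {pos: i for i, pos in enumerate(coords)}
    if len(position_to_index) != len(coords):
        raise ValueError("conformation is not self-avoiding")
    contacts = 0
    for i, (x, y, z) in enumerate(coords):
        if sequence[i] != "H":
            continue
        # Look only in the three positive directions so each pair is counted once.
        for neighbour in ((x + 1, y, z), (x, y + 1, z), (x, y, z + 1)):
            j = position_to_index.get(neighbour)
            # Skip residues that are adjacent in the sequence (local bonds).
            if j is not None and sequence[j] == "H" and abs(i - j) > 1:
                contacts += 1
    return -contacts
```

For example, the four-residue chain HPPH folded into a unit square brings its two H residues next to each other, so its energy is -1.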

Objective

Mathematically, it is to minimize the following function: *

[Equation, with annotated terms: bond energy; distance function; only non-sequence-adjacent residues are checked]

* H. Li, R. Helling, C. Tang, and N. Wingreen. Emergence of Preferred Structures in a Simple Model of Protein Folding. Science, 273(5275):666–669, 1996.
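The function itself is not reproduced in this transcript. A commonly used form that matches the annotations (a bond energy term and a distance function restricted to non-sequence-adjacent residues) is sketched below. The symbols are assumptions rather than the slide's exact notation: $\sigma_i \in \{H, P\}$ is the type of residue $i$, $r_i$ is its lattice position, and the choice $E_{HH} = -1$ simply mirrors the stated objective of maximizing non-local H-H bonds.

```latex
H = \sum_{i<j} E_{\sigma_i \sigma_j}\, \Delta(r_i - r_j),
\qquad
\Delta(r_i - r_j) =
\begin{cases}
1, & \text{if } r_i \text{ and } r_j \text{ are adjacent lattice sites and } |i - j| > 1,\\
0, & \text{otherwise,}
\end{cases}
\qquad
E_{HH} = -1,\quad E_{HP} = E_{PP} = 0 .
```

Under these assumptions, $H$ equals minus the number of non-local H-H bonds, so minimizing $H$ and maximizing that count are the same task.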

Related Works

• Unger et al. were the first to apply a hybridized genetic algorithm to the problem [1]
• Patton et al. use a standard genetic algorithm [2]

[1] Unger, R. and Moult, J. 1993. Genetic Algorithm for 3D Protein Folding Simulations. In Proceedings of the 5th International Conference on Genetic Algorithms, S. Forrest, Ed. Morgan Kaufmann Publishers, San Francisco, CA, 581-588.
[2] Patton, A. L., Punch, W. F., and Goodman, E. D. 1995. A Standard GA Approach to Native Protein Conformation Prediction. In Proceedings of the 6th International Conference on Genetic Algorithms (July 15 - 19, 1995), L. J. Eshelman, Ed. Morgan Kaufmann Publishers, San Francisco, CA, 574-581.

Related Works

• Berger et al. prove that the problem is NP-complete [1]
• Krasnogor et al. discuss the basic algorithmic factors affecting the problem [2]

[1] Berger, B. and Leighton, T. 1998. Protein folding in the hydrophobic-hydrophilic (HP) model is NP-complete. In Proceedings of the Second Annual International Conference on Computational Molecular Biology. RECOMB '98. ACM, New York, NY, 30-39.
[2] N. Krasnogor, W.E. Hart, J. Smith, and D. Pelta. Protein structure prediction with evolutionary algorithms. In Eiben Garzon Honovar Jakiela Banzhaf, Daida and Smith, editors, International Genetic and Evolutionary Computation Conference (GECCO99), pages 1569-1601. Morgan Kaufmann, 1999.

Related Works

Since then, many related algorithms have been proposed. Some examples:
• Multimeme algorithm by Krasnogor et al.
• Guided genetic algorithm by Hoque et al.
• Ant colony algorithm by Shmygelska et al.
• Differential Evolution by Bitello et al.
• Immune Algorithm by Cutello et al.
• EDA by Santana et al.

Paper Contributions

Observation:
• Some diversity-preserving techniques are incorporated in most algorithms:
  • Duplicate predator [1]
  • Aging operator [2]
  • Additional renormalization of the pheromone [3]

[1] G. A. Cox, T. V. Mortimer-Jones, R. P. Taylor, and R. L. Johnston. Development and optimisation of a novel genetic algorithm for studying model protein folding. Theoretical Chemistry Accounts: Theory, Computation, and Modeling, 112(3):163–178, 2004.
[2] V. Cutello, G. Nicosia, M. Pavone, and J. Timmis. An immune algorithm for protein structure prediction on lattice models. IEEE Transactions on Evolutionary Computation, 11(1):101–117, Feb. 2007.
[3] A. Shmygelska and H. Hoos. An ant colony optimisation algorithm for the 2d and 3d hydrophobic polar protein folding problem. BMC Bioinformatics, 6(1):30, 2005.

Paper Contributions

Observation:
• Unger et al. have observed that there can be multiple conformations for each energy value [1]
• A study also indicates that the fitness landscapes of the problem are multimodal [2]

[1] R. Unger and J. Moult. Genetic algorithms for protein folding simulations. J. Mol. Biol., 231:75–81, May 1993.
[2] S. D. Flores and J. Smith. Study of fitness landscapes for the HP model of protein structure prediction. In Evolutionary Computation, 2003. CEC ’03. pages 2338–2345, Dec. 2003.

Paper Contributions

In this paper:
• Apply multimodal optimization techniques to solve the PSP problem:
  • Fitness Sharing (SharingGA) [1]
  • Species Conserving (SCGA) [2]
  • Crowding (CGA) [3]

1. Goldberg, D. E. and Richardson, J. 1987. Genetic algorithms with sharing for multimodal function optimization. In Proceedings of the Second international Conference on Genetic Algorithms on Genetic Algorithms and their Application, 41-49.

2. Li, J., Balazs, M. E., Parks, G. T., and Clarkson, P. J. 2002. A species conserving genetic algorithm for multimodal function optimization. Evol. Comput. 10, 3 (Sep. 2002), 207-234.

3. De Jong, K. A. 1975. An Analysis of the Behavior of a Class of Genetic Adaptive Systems. Doctoral Thesis. UMI Order Number: AAI7609381. University of Michigan.
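To illustrate how two of these techniques preserve diversity, here is a minimal Python sketch of fitness sharing and of a crowding-style replacement. It is written under stated assumptions and is not the paper's code: individuals are equal-length direction strings compared by Hamming distance (the distance measure used in the experiments), raw fitness is taken as the negated energy so that higher is better, and the parameter names `sigma_share`, `alpha`, and `crowding_factor` are illustrative.

```python
import random


def hamming(a, b):
    """Number of positions at which two equal-length encodings differ."""
    return sum(x != y for x, y in zip(a, b))


def shared_fitness(population, raw_fitness, sigma_share, alpha=1.0):
    """Fitness sharing: divide each raw fitness by its niche count."""
    shared = []
    for i, ind in enumerate(population):
        niche_count = 0.0
        for other in population:
            d = hamming(ind, other)
            if d < sigma_share:  # only neighbours inside the niche radius count
                niche_count += 1.0 - (d / sigma_share) ** alpha
        # niche_count >= 1 because every individual is at distance 0 from itself
        shared.append(raw_fitness[i] / niche_count)
    return shared


def crowding_replace(population, offspring, crowding_factor, rng):
    """Crowding: the offspring replaces the most similar of a random sample."""
    candidates = rng.sample(range(len(population)), crowding_factor)
    closest = min(candidates, key=lambda i: hamming(population[i], offspring))
    population[closest] = offspring
    return closest
```

Sharing lowers the fitness of individuals in crowded regions, whereas crowding keeps niches alive by making offspring compete only with similar individuals. Species conservation is not sketched here.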

Paper Contributions

In this paper:
• Propose a novel mutation method
  • Mix two types of mutation: sometimes use RM, sometimes use AM
  • RM: mutation in relative encoding
  • AM: mutation in absolute encoding
• Apply it in CGA (the result is called CGA-mixed)
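The idea of mixing RM and AM can be sketched as follows. This is only an illustration under several assumptions, not the authors' operator: individuals are stored in a five-symbol relative encoding (F, L, R, U, D), absolute moves use the letters E, W, N, S, U, D, the initial frame `START` is arbitrary, the two mutations are chosen with a fixed probability `p_relative`, and self-avoidance is not checked.

```python
import random

# Absolute unit moves on the cubic lattice (letter codes are illustrative).
ABS_MOVES = {"E": (1, 0, 0), "W": (-1, 0, 0), "N": (0, 1, 0),
             "S": (0, -1, 0), "U": (0, 0, 1), "D": (0, 0, -1)}
VEC_TO_ABS = {v: k for k, v in ABS_MOVES.items()}
ABS_SYMBOLS = "EWNSUD"
REL_SYMBOLS = "FLRUD"
START = ((1, 0, 0), (0, 0, 1))  # assumed initial heading and up vectors


def neg(v):
    return (-v[0], -v[1], -v[2])


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def turn(heading, up, symbol):
    """Apply one relative move and return the new (heading, up) frame."""
    left = cross(up, heading)
    if symbol == "F":
        return heading, up
    if symbol == "L":
        return left, up
    if symbol == "R":
        return neg(left), up
    if symbol == "U":
        return up, neg(heading)
    if symbol == "D":
        return neg(up), heading
    raise ValueError(symbol)


def relative_to_absolute(relative):
    heading, up = START
    out = []
    for s in relative:
        heading, up = turn(heading, up, s)
        out.append(VEC_TO_ABS[heading])
    return "".join(out)


def absolute_to_relative(absolute):
    heading, up = START
    out = []
    for a in absolute:
        for s in REL_SYMBOLS:
            new_heading, new_up = turn(heading, up, s)
            if new_heading == ABS_MOVES[a]:
                out.append(s)
                heading, up = new_heading, new_up
                break
        else:
            return None  # an immediate reversal has no relative symbol
    return "".join(out)


def point_mutate(genes, alphabet, rng):
    """Replace one randomly chosen gene by a different symbol."""
    pos = rng.randrange(len(genes))
    options = [s for s in alphabet if s != genes[pos]]
    return genes[:pos] + rng.choice(options) + genes[pos + 1:]


def mixed_mutation(relative, rng, p_relative=0.5):
    """Mutate a relative-encoded individual with RM or AM, chosen at random."""
    if rng.random() < p_relative:
        return point_mutate(relative, REL_SYMBOLS, rng)  # RM
    absolute = relative_to_absolute(relative)
    while True:  # AM: retry if the mutated chain reverses onto itself
        candidate = absolute_to_relative(point_mutate(absolute, ABS_SYMBOLS, rng))
        if candidate is not None:
            return candidate
```

The two operators move through the search space differently. Changing one relative gene rotates the whole remainder of the chain, while changing one absolute move leaves the absolute directions of all other bonds untouched.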

Experiments

Experimental settings:
• Relative encoding [1]
• Hamming distance
• 100 individuals (overlapping)
• Uniform deterministic parent selection
• Truncation survival selection
• 50 runs
• 10^5 and 5 × 10^6 energy evaluations
• UN [2] as a control algorithm

[1] N. Krasnogor, W.E. Hart, J. Smith, and D. Pelta. Protein structure prediction with evolutionary algorithms. In Eiben Garzon Honovar Jakiela Banzhaf, Daida and Smith, editors, International Genetic and Evolutionary Computation Conference (GECCO99), pages 1569-1601. Morgan Kaufmann, 1999.
[2] K.A. De Jong. Evolutionary computation: a unified approach. MIT Press, Cambridge MA, 2006.
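The selection settings can be read as one generation of an overlapping-population evolutionary algorithm. The sketch below shows one plausible reading, with the assumptions labeled: "uniform deterministic" is taken to mean that every individual produces exactly one offspring, and truncation keeps the lowest-energy individuals from parents and offspring combined. The function names and the `mutate` and `energy` callables are hypothetical.

```python
def truncation_survival(parents, offspring, energy, size):
    """Keep the `size` lowest-energy individuals from parents plus offspring."""
    pool = list(parents) + list(offspring)
    return sorted(pool, key=energy)[:size]


def one_generation(population, mutate, energy):
    """One generation with an overlapping population of constant size."""
    # Assumed meaning of uniform deterministic parent selection:
    # every individual acts as a parent exactly once.
    offspring = [mutate(ind) for ind in population]
    return truncation_survival(population, offspring, energy, len(population))
```

With 100 individuals, each such generation costs 100 energy evaluations, so a budget of 10^5 evaluations would correspond to roughly 1000 generations under this reading.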

Experiments

10^5 energy evaluations over 50 runs

H(x): the lowest energy over 50 runs

mean ± σ: the lowest energy of each run, averaged over 50 runs
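The two reported statistics can be computed from the per-run results as in the short sketch below. It assumes that the input is a list holding the lowest energy found in each run, and it uses the population standard deviation for σ; whether the paper uses the population or the sample form is not stated here, so that choice is an assumption.

```python
import statistics


def summarize_runs(best_energy_per_run):
    """Return (lowest energy over all runs, mean of per-run bests, sigma)."""
    lowest = min(best_energy_per_run)
    mean = statistics.mean(best_energy_per_run)
    sigma = statistics.pstdev(best_energy_per_run)  # population form (assumed)
    return lowest, mean, sigma
```

Lower values are better for both statistics, because the objective is energy minimization.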

Experiments

5 × 10^6 energy evaluations over 50 runs

H(x): the lowest energy over 50 runs

mean ± σ: the lowest energy of each run, averaged over 50 runs

Experiments

The experimental results quoted in the following literature are taken and compared under the same termination condition:
• Santana, R.; Larranaga, P.; Lozano, J.A., "Protein Folding in Simplified Models With Estimation of Distribution Algorithms," Evolutionary Computation, IEEE Transactions on, vol.12, no.4, pp.418-438, Aug. 2008
• Cutello, V.; Nicosia, G.; Pavone, M.; Timmis, J., "An Immune Algorithm for Protein Structure Prediction on Lattice Models," Evolutionary Computation, IEEE Transactions on, vol.11, no.1, pp.101-117, Feb. 2007

Experiments

10^5 energy evaluations over 50 runs

H(x): the lowest energy over 50 runs

mean ± σ: the lowest energy of each run, averaged over 50 runs

Experiments

5 × 10^6 energy evaluations over 50 runs

H(x): the lowest energy over 50 runs

mean ± σ: the lowest energy of each run, averaged over 50 runs

Conclusion

In this paper, we:
• Apply multimodal optimization techniques to PSP
• Propose a novel mutation method for PSP

Some results comparable with those of the state-of-the-art algorithms have been obtained.

The source code can be downloaded at: http://pc89075.cse.cuhk.edu.hk:8080/myapp/GECCO2010-PSP-LatticeModels.zip

Q&A


Paper Contributions

Proposed mutation method, applied in CGA (called CGA-mixed)