Automated Alphabet Reduction Method with Evolutionary Algorithms for Protein Structure Prediction
Jaume Bacardit, Michael Stout, Jonathan D. Hirst, Kumara Sastry, Xavier Llorà and Natalio Krasnogor
University of Nottingham and University of Illinois at Urbana-Champaign


Page 1: Automated Alphabet Reduction Method with Evolutionary Algorithms  for Protein Structure Prediction

Automated Alphabet Reduction Method with Evolutionary Algorithms for Protein Structure Prediction

Jaume Bacardit, Michael Stout, Jonathan D. Hirst, Kumara Sastry, Xavier Llorà and Natalio Krasnogor

University of Nottingham and University of Illinois at Urbana-Champaign

Page 2

What is a protein?

Page 3

Protein Structure Prediction (PSP)

The goal is to predict the (complex) 3D structure (and some sub-features) of a protein from its amino acid sequence (a 1D object)

[Figure: Primary Sequence and 3D Structure]

Page 4

Alphabet reduction process and validation

[Figure: flow diagram of the process. Labels, in extraction order: Dataset (Card=20); ECGA; Mutual Information; Size = N (<20); Dataset (Card=N, N<20); BioHEL; Test set; Accuracy; Ensemble of rule sets; Domain (CN, RSA, …)]
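The diagram pairs ECGA with Mutual Information, which suggests that candidate groupings of amino acids are scored by how much information the reduced letters keep about the class being predicted. The following is a minimal sketch of that scoring idea only, not the authors' implementation: the function names, the four-letter toy data and the single-residue (windowless) setup are all assumptions made for illustration, and the ECGA search itself is not shown.

```python
import math
from collections import Counter


def reduce_sequence(seq, groups):
    """Map each residue to the index of the group that contains it."""
    letter_to_group = {aa: str(i) for i, grp in enumerate(groups) for aa in grp}
    return "".join(letter_to_group[aa] for aa in seq)


def mutual_information(xs, ys):
    """Empirical mutual information (in bits) between two paired label sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


# Hypothetical toy data: one residue per instance, class 0 = buried, 1 = exposed.
residues = "AAKKLLEE"
labels = [0, 0, 1, 1, 0, 0, 1, 1]

mi_full = mutual_information(residues, labels)
# A grouping that keeps residues of the same class together loses nothing.
mi_good = mutual_information(reduce_sequence(residues, ["AL", "KE"]), labels)
# A grouping that mixes the classes destroys the signal.
mi_bad = mutual_information(reduce_sequence(residues, ["AK", "LE"]), labels)
```

In this toy case the two-letter alphabet {AL, KE} keeps all of the information of the four-letter one, while {AK, LE} keeps none; a search procedure would prefer the former.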

Page 5

This entry is human competitive because:

• G: The result solves a problem of indisputable difficulty in its field (Difficult)
• D: The result is publishable in its own right as a new scientific result, independent of the fact that the result was mechanically created (Publishable)
• E: The result is equal to or better than the most recent human-created solution to a long-standing problem for which there has been a succession of increasingly better human-created solutions (≥Human)
• B: The result is equal to or better than a result that was accepted as a new scientific result at the time when it was published in a peer-reviewed scientific journal (Innovative)

Page 6

G: Difficulty

• PSP is, after many decades of research, still one of the main unsolved problems in Science
• In the 2006 CASP experiment, one of the best methods (Rosetta@home) used > 3 cpu yrs to predict a single protein
• Amino acid sequence is a string drawn from a 20-letter alphabet
• Some AAs are similar & could be grouped, reducing the dimensionality of the domain
• We can find a new alphabet with much lower cardinality than the AA representation without losing critical information in the process
• We can tailor alphabet reduction automatically to a variety of PSP-related domains

Page 7

Why is this entry human-competitive?

• The initial version of our alphabet reduction process has been accepted in GECCO 2007, in the biological applications track
• One of the most famous alphabet reductions is the HP model, which reduces AA types to only two: Hydrophobic & Polar (e.g. [Broome & Hecht, 2000])
• Other experts use a broader set of physico-chemical properties to propose reduced alphabets (examples in later slides)
• We have improved upon both of the above

Criteria addressed on this slide: D (Publishable), E (≥Human)

Page 8

B: Innovative

• Comparison of our results against other reduced alphabets existing in the literature and human-designed ones, applied to two PSP-related datasets, Coordination Number (CN) and Solvent Accessibility (SA)
• Our method produces the best reduced alphabets

Alphabet     Letters   CN acc.    SA acc.    Diff. (CN/SA)   Ref.
AA           20        74.0±0.6   70.7±0.4   ---             ---
Our method   5         73.3±0.5   70.3±0.4   0.7/0.4         This work
WW5          6         73.1±0.7   69.6±0.4   0.9/1.1         [Wang & Wang, 99]
SR5          6         73.1±0.7   69.6±0.4   0.9/1.1         [Solis & Rackovsky, 00]
MU4          5         72.6±0.7   69.4±0.4   1.4/1.3         [Murphy et al., 00]
MM5          6         73.1±0.6   69.3±0.3   0.9/1.4         [Melo & Marti-Renom, 06]
HD1          7         72.9±0.6   69.3±0.4   1.1/1.4         This work
HD2          9         73.0±0.6   69.3±0.4   1.0/1.4         This work
HD3          11        73.2±0.6   69.9±0.4   0.8/0.8         This work

WW5, SR5, MU4 and MM5 are alphabets from the literature; HD1, HD2 and HD3 are expert-designed alphabets.
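The Diff. column is consistent with the accuracy lost relative to the full 20-letter AA alphabet, first for CN and then for SA. The short sketch below recomputes it from the mean accuracies in the table; the variable and function names are illustrative, and the ± deviations are ignored.

```python
# Mean accuracies copied from the table: (CN acc., SA acc.).
BASELINE = (74.0, 70.7)  # full 20-letter AA alphabet
RESULTS = {
    "Our method": (73.3, 70.3),
    "WW5": (73.1, 69.6),
    "SR5": (73.1, 69.6),
    "MU4": (72.6, 69.4),
    "MM5": (73.1, 69.3),
    "HD1": (72.9, 69.3),
    "HD2": (73.0, 69.3),
    "HD3": (73.2, 69.9),
}


def accuracy_drop(cn_acc, sa_acc):
    """Accuracy lost versus the AA baseline, rounded to one decimal place."""
    return (round(BASELINE[0] - cn_acc, 1), round(BASELINE[1] - sa_acc, 1))


drops = {name: accuracy_drop(*acc) for name, acc in RESULTS.items()}
# The alphabet with the smallest combined drop across both datasets.
best = min(drops, key=lambda name: sum(drops[name]))
```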

Page 9

Why is this entry better than the other entries?

• PSP is a very difficult and very relevant domain
  • It has been named as a Grand Challenge by the USA government [1]
  • The impacts of having better protein structure models are countless
    • Genetic therapy
    • Synthesis of drugs for incurable diseases
    • Improved crops
    • Environmental remediation
• Our contribution is a small but concrete step towards achieving this goal

[1] Committee on Physical, Mathematical, and Engineering Sciences, Federal Coordinating Council for Science, Engineering, and Technology. Grand Challenges 1993: High Performance Computing and Communications, 1992.

Page 10

Better than other entries: New understanding of the folding process

• Simpler rules obtained by BioHEL
  • AA alphabet: If AA−4 {F, G, I, L, V, X, Y}, AA−3 {F, G, Q, W}, AA−2 {C, N, P}, AA−1 {A, I, Q, V, Y}, AA {K}, AA1 {F, I, L, M, N, P, T, V}, AA2 {N, P, Q, S}, AA3 {C, I, L, R, W}, AA4 {A, C, I, L, R, S} then AA is exposed
  • Reduced alphabet: If AA−4 {1, 3}, AA−3 {1, 3}, AA {3}, AA1 {1, 3}, AA2 {1}, AA3 {0} then AA is exposed
  • 0 = ACFHILMVWY, 1 = DEKNPQRST (EK for AA), 3 = X
• Unexpected explanations: Alphabet reduction clustered AA types that experts did not expect. Analyzing the data verified that the groups were sound
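The rules on this slide test the residues at fixed offsets around a target position against sets of letters. The set-membership operators are not legible in the slide text, so the sketch below supports both "in" and "not in" conditions and uses a made-up three-condition rule rather than either rule above. The `Condition` type, the function names and the use of X as padding beyond the chain ends are assumptions for illustration, not BioHEL's actual representation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Condition:
    offset: int            # position relative to the target residue
    letters: frozenset     # letter set the residue is tested against
    negate: bool = False   # True means the residue must NOT be in `letters`


def window_letter(seq, pos, offset, pad="X"):
    """Residue at pos + offset, or the padding symbol outside the chain."""
    i = pos + offset
    return seq[i] if 0 <= i < len(seq) else pad


def rule_matches(rule, seq, pos):
    """True when every condition of the rule holds around position `pos`."""
    return all(
        (window_letter(seq, pos, c.offset) in c.letters) != c.negate
        for c in rule
    )


# Hypothetical rule, reusing the slide's letter groups only as example sets.
toy_rule = [
    Condition(-1, frozenset("DEKNPQRST")),
    Condition(0, frozenset("K")),
    Condition(+1, frozenset("ACFHILMVWY"), negate=True),
]

seq = "AEKDL"
exposed_positions = [i for i in range(len(seq)) if rule_matches(toy_rule, seq, i)]
```

With a reduced alphabet the same matcher applies unchanged; only the letter sets shrink, which is what makes the rules shorter to read.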

Page 11

Better than other entries: run-time reduction & conclusions

• Alphabet reduction is also beneficial in the short term
  • We have extrapolated the reduced alphabet to Position-Specific Scoring Matrices (PSSM)
  • PSSM is the state-of-the-art representation for PSP, with orders of magnitude more information than the AA alphabet
  • Learning time of BioHEL using PSSM has been reduced from 2 weeks to 3 days with only 0.5% accuracy drop
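The slide does not say how the reduced alphabet was carried over to PSSM. One plausible reading, assumed here purely for illustration, is that the per-residue PSSM columns belonging to the same group are merged into a single column, which shrinks the input that the learner has to process. The sketch below averages the columns of each group; the function name, the toy matrix and the choice of averaging are all hypothetical.

```python
def reduce_pssm(pssm, column_letters, groups):
    """Collapse per-residue PSSM columns into one averaged column per group."""
    index = {aa: j for j, aa in enumerate(column_letters)}
    return [
        [sum(row[index[aa]] for aa in grp) / len(grp) for grp in groups]
        for row in pssm
    ]


# Hypothetical 2-position PSSM over a 4-letter toy alphabet.
column_letters = "AKLE"
pssm = [
    [2, -1, 4, -3],
    [0, 3, -2, 5],
]
reduced = reduce_pssm(pssm, column_letters, ["AL", "KE"])

# Speed-up implied by the reported learning times (2 weeks versus 3 days).
speedup = 14 / 3
```

Under this assumption each sequence position drops from one value per amino acid type to one value per group, and the reported learning times correspond to a speed-up of roughly 4.7 times.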

• We consider that our entry is the best because it addresses successfully and in many ways a very relevant, important, high profile and timely problem