Prediction of protein and peptide isoelectric point based

Preview:

Citation preview

Łukasz P. Kozłowski

Warsaw, 13.12.2018

Prediction of protein and peptide isoelectric point based on

sequence and structure features

lukasz.kozlowski.lpk@gmail.comlukaskoz@mimuw.edu.pl

Outline of the project

1.1 IPC version 2.0 - sequence (1st Student)- structure (2nd Student)

1.2 Proteome-pI version 2.0

Timeline: ~24 months

Output: • 2-3 publications expected (hopefully only 2)• 2 Master’s theses

www.creative-proteomics.com

www.creative-proteomics.com

Isoelectric point

Introduction

acidic medialow pH

neutral form basic mediahigh pH

Introduction

Introduction

acidic medialow pH

neutral form basic mediahigh pH

PDB: 4DKL

Two-dimensional electrophoresis(2D-PAGE)

Image source: goo.gl/saHJRH

Isoelectric point importance - isolating the proteins

Isoelectric point importance - isolating the proteins

Isoelectric Point-Based Fractionation with

Liquid Chromatography and Mass Spectrometry (2D-LC-MS)

Ovitrap mass spec machine

Isoelectric Point-Based Fractionation with

Liquid Chromatography and Mass Spectrometry (2D-LC-MS)

Source: 10.1016/j.ymeth.2016.01.013

Isoelectric point importance - isolating the proteins

Isoelectric point importance – X –ray crystalography

Source: www.creative-biostructure.com

Isoelectric point importance – X –ray crystalography

Source: www.creative-biostructure.com

Protein crystals do not grow in the buffer with pH around isoelectric point

Current state of the art in Isoelectric Point calculation

www: isoelectric.org

Reference: Kozlowski LP (2016) IPC - Isoelectric Point Calculator. Biology Direct 11:55

Henderson–Hasselbalch equation

Acid dissociation constant (Ka)

the strength of an acid in solution (the equilibrium constant for a chemical reaction known as dissociation in the context of acid–base reactions

Henderson–Hasselbalch equation

negative charged positively charged

NQ=QN1+QN2+QN3+QN4+QN5+QP1+QP2+QP3+QP4       

Henderson–Hasselbalch equation

negative charged positively charged

NQ=QN1+QN2+QN3+QN4+QN5+QP1+QP2+QP3+QP4       

Henderson–Hasselbalch equation

negative charged positively charged

NQ=QN1+QN2+QN3+QN4+QN5+QP1+QP2+QP3+QP4       

Optimization

negative charged positively charged

NQ=QN1+QN2+QN3+QN4+QN5+QP1+QP2+QP3+QP4       

Brute force attack

Checking all possible combinations is not very tractable as even for 9 variables (charged amino acid pKa) in range of pH of 3 (±1.5 pH of average for given amino acid pKa) with 0.01 precision gives 1.9683 × 1022 possibilities. Far too many to compute.

Basinhopping optimization using truncated Newton algorithmThis produces suboptimal results in more reasonable time with less than few dozens of

iterations with pKa optimized with high precission.

In the nutshell, the basinhopping algorithm is iterative search procedure with each cycle composed of the following features:

random perturbation of the coordinates local minimization accept or reject the new coordinates based on the minimized function value of the

Metropolis criterion of standard Monte Carlo algorithm

As an initial seed previously published pKa values were used. To limit search space truncated Newton algorithm was used with 2 pH units bounds for pKa variables (e.g. if starting point for Cys pKa was 8.5 the solution was allowed in the interval [6.5, 10.5]).For more details how those algorithms works read Wales & Doye, 1997 (doi: 10.1021/jp970984n)

Current state of the art in Isoelectric Point calculation

Reference: Kozlowski LP (2016) IPC - Isoelectric Point Calculator. Biology Direct 11:55

Expected results

1) Better tools for prediction of isoelectric point:

a) sequence based predictor *

b) structure based predictor *

2) Update of Proteome-pI database *

* parts which can be published as separate publications

Source: www.edureka.co

Use deep learning for Isoelectric Point prediction

Deep learning architecture

The plan

Source of the data

Datasets:

a) The literature (summarized in Kozlowski, 2016) 

a) The collaborators:   

- prof. Henning Urlaub (University Medical Center & Max Planck Institute for Biophysical Chemistry, Gottingen, Germany)

- dr Ilian Atanassov (Protein Core Facility at Max Planck Institute for Biology of Ageing in Colone, Germany).

Proteome-pI database

http://isoelectricpointdb.org/

Reference: Kozlowski LP. Proteome-pI: proteome isoelectric point database Nucleic Acids Res. 2017, 45 (D1): D1112-D1116. doi: 10.1093/nar/gkw978

www: isoelectricpoint db.org

Proteome-pI database

http://isoelectricpointdb.org/

Reference: Kozlowski LP. Proteome-pI: proteome isoelectric point database Nucleic Acids Res. 2017, 45 (D1): D1112-D1116. doi: 10.1093/nar/gkw978

Proteome-pI database

http://isoelectricpointdb.org/

Current state: good db for 2D-PAGE

After the update: good db also for MS

Reference: Kozlowski LP. Proteome-pI: proteome isoelectric point database Nucleic Acids Res. 2017, 45 (D1): D1112-D1116. doi: 10.1093/nar/gkw978

More details about the offer

https://bit.ly/2G0OhPU (Polish version)https://bit.ly/2L1U7iR (English version)

Proponowane tematy prac magisterskich:

1. Przewidywanie punktu izoelektrycznego białek i peptydów z sekwencji za pomocą głębokich sieci neuronowych

2. Przewidywanie punktu izoelektrycznego białek ze struktury trzeciorzędowej za pomocą głębokich sieci neuronowych

More details about the offer

Thank you for your time

Any other

questions & comments

email: lukasz.kozlowski.lpk@gmail.com www: bioinformatic.netmark.pl

Recommended