Upload
others
View
4
Download
0
Embed Size (px)
Citation preview
Łukasz P. Kozłowski
Warsaw, 13.12.2018
Prediction of protein and peptide isoelectric point based on
sequence and structure features
[email protected]@mimuw.edu.pl
Outline of the project
1.1 IPC version 2.0 - sequence (1st Student)- structure (2nd Student)
1.2 Proteome-pI version 2.0
Timeline: ~24 months
Output: • 2-3 publications expected (hopefully only 2)• 2 Master’s theses
www.creative-proteomics.com
www.creative-proteomics.com
Isoelectric point
Introduction
acidic medialow pH
neutral form basic mediahigh pH
Introduction
Introduction
acidic medialow pH
neutral form basic mediahigh pH
PDB: 4DKL
Two-dimensional electrophoresis(2D-PAGE)
Image source: goo.gl/saHJRH
Isoelectric point importance - isolating the proteins
Isoelectric point importance - isolating the proteins
Isoelectric Point-Based Fractionation with
Liquid Chromatography and Mass Spectrometry (2D-LC-MS)
Ovitrap mass spec machine
Isoelectric Point-Based Fractionation with
Liquid Chromatography and Mass Spectrometry (2D-LC-MS)
Source: 10.1016/j.ymeth.2016.01.013
Isoelectric point importance - isolating the proteins
Isoelectric point importance – X –ray crystalography
Source: www.creative-biostructure.com
Isoelectric point importance – X –ray crystalography
Source: www.creative-biostructure.com
Protein crystals do not grow in the buffer with pH around isoelectric point
Current state of the art in Isoelectric Point calculation
www: isoelectric.org
Reference: Kozlowski LP (2016) IPC - Isoelectric Point Calculator. Biology Direct 11:55
Henderson–Hasselbalch equation
Acid dissociation constant (Ka)
the strength of an acid in solution (the equilibrium constant for a chemical reaction known as dissociation in the context of acid–base reactions
Henderson–Hasselbalch equation
negative charged positively charged
NQ=QN1+QN2+QN3+QN4+QN5+QP1+QP2+QP3+QP4
Henderson–Hasselbalch equation
negative charged positively charged
NQ=QN1+QN2+QN3+QN4+QN5+QP1+QP2+QP3+QP4
Henderson–Hasselbalch equation
negative charged positively charged
NQ=QN1+QN2+QN3+QN4+QN5+QP1+QP2+QP3+QP4
Optimization
negative charged positively charged
NQ=QN1+QN2+QN3+QN4+QN5+QP1+QP2+QP3+QP4
Brute force attack
Checking all possible combinations is not very tractable as even for 9 variables (charged amino acid pKa) in range of pH of 3 (±1.5 pH of average for given amino acid pKa) with 0.01 precision gives 1.9683 × 1022 possibilities. Far too many to compute.
Basinhopping optimization using truncated Newton algorithmThis produces suboptimal results in more reasonable time with less than few dozens of
iterations with pKa optimized with high precission.
In the nutshell, the basinhopping algorithm is iterative search procedure with each cycle composed of the following features:
random perturbation of the coordinates local minimization accept or reject the new coordinates based on the minimized function value of the
Metropolis criterion of standard Monte Carlo algorithm
As an initial seed previously published pKa values were used. To limit search space truncated Newton algorithm was used with 2 pH units bounds for pKa variables (e.g. if starting point for Cys pKa was 8.5 the solution was allowed in the interval [6.5, 10.5]).For more details how those algorithms works read Wales & Doye, 1997 (doi: 10.1021/jp970984n)
Current state of the art in Isoelectric Point calculation
Reference: Kozlowski LP (2016) IPC - Isoelectric Point Calculator. Biology Direct 11:55
Expected results
1) Better tools for prediction of isoelectric point:
a) sequence based predictor *
b) structure based predictor *
2) Update of Proteome-pI database *
* parts which can be published as separate publications
Source: www.edureka.co
Use deep learning for Isoelectric Point prediction
Deep learning architecture
The plan
Source of the data
Datasets:
a) The literature (summarized in Kozlowski, 2016)
a) The collaborators:
- prof. Henning Urlaub (University Medical Center & Max Planck Institute for Biophysical Chemistry, Gottingen, Germany)
- dr Ilian Atanassov (Protein Core Facility at Max Planck Institute for Biology of Ageing in Colone, Germany).
Proteome-pI database
http://isoelectricpointdb.org/
Reference: Kozlowski LP. Proteome-pI: proteome isoelectric point database Nucleic Acids Res. 2017, 45 (D1): D1112-D1116. doi: 10.1093/nar/gkw978
www: isoelectricpoint db.org
Proteome-pI database
http://isoelectricpointdb.org/
Reference: Kozlowski LP. Proteome-pI: proteome isoelectric point database Nucleic Acids Res. 2017, 45 (D1): D1112-D1116. doi: 10.1093/nar/gkw978
Proteome-pI database
http://isoelectricpointdb.org/
Current state: good db for 2D-PAGE
After the update: good db also for MS
Reference: Kozlowski LP. Proteome-pI: proteome isoelectric point database Nucleic Acids Res. 2017, 45 (D1): D1112-D1116. doi: 10.1093/nar/gkw978
More details about the offer
https://bit.ly/2G0OhPU (Polish version)https://bit.ly/2L1U7iR (English version)
Proponowane tematy prac magisterskich:
1. Przewidywanie punktu izoelektrycznego białek i peptydów z sekwencji za pomocą głębokich sieci neuronowych
2. Przewidywanie punktu izoelektrycznego białek ze struktury trzeciorzędowej za pomocą głębokich sieci neuronowych
More details about the offer
Thank you for your time
Any other
questions & comments
email: [email protected] www: bioinformatic.netmark.pl