
Quorum-Sensing Control Repressor - Pseudomonas aeruginosa Secondary Structure Prediction using Particle Swarm Optimization (PSO) tuned Artificial Neural Network

Saravanan K¹, Sivakumar S²

¹Department of Physics, AVS Engineering College, Salem, Tamilnadu, INDIA

²Department of Physics, Government Arts College (Autonomous), Salem, Tamilnadu, INDIA

Corresponding E.Mail:[email protected].

ABSTRACT

Quorum sensing controls gene expression in hundreds of Proteobacteria, including a number of plant and animal pathogens. The acyl-homoserine lactone (AHL) receptors are generally members of a family of related transcription factors, and although they have been targets for the development of antivirulence therapeutics, there is very little structural information about this class of bacterial receptors. Secondary structure prediction therefore becomes one of the most important and challenging problems. Machine learning techniques have been applied to this problem with substantial success. Although neural-network-based prediction has become popular, training the networks involves considerable processing. To overcome this drawback, this work proposes a new topology, a PSO-trained neural network, which tunes the network automatically and is designed for protein secondary structure (SS) prediction. The results are compared with those of other prediction methods, and the proposed method is more accurate than the other methods compared.

Keywords: protein structure prediction / secondary structure / neural network / back-propagation / PSO.

INTRODUCTION

Proteins perform many biological functions and are the building blocks of organisms. They are complex organic compounds whose basic unit is the amino acid. Proteins are initially linear chains of amino acids, which can vary in length from a few up to thousands of residues. Under the influence of several chemical and physical factors, proteins fold into unique 3D structures that determine their biological functions and properties. Misfolding occurs when a protein folds into a 3D structure other than its correct native structure, which can lead to many diseases, such as Alzheimer's disease and several types of cancer. Because of the importance of this issue to human life, scientists have developed laboratory techniques such as X-ray crystallography and nuclear magnetic resonance (NMR) to determine


ISSN NO: 1524-2560

http://jscglobal.org/

Journal of Scientific Computing

Volume 8 Issue 12 2019

the native structures of proteins. Although these methods are reliable, they are not always feasible. Hence, predicting the native structure of a protein from its primary sequence is an important and challenging task in computational biology. The primary structure of a protein is the linear sequence of amino acids connected by peptide bonds. Proteins fold due to the hydrophobic effect, van der Waals interactions, electrostatic forces, hydrogen bonding, and other interactions. Secondary structures are local three-dimensional structures characterized by a repeating hydrogen-bonding pattern; the most common are helices and strands. Proteins containing these secondary structures can fold further into a tertiary structure, a bundle of secondary structures, turns, and loops. Furthermore, the aggregation of several separately folded protein chains forms the so-called quaternary structure [1]. Computational approaches to protein structure prediction are largely heuristic and can be classified as homology modeling, threading, and ab initio methods [2].

Comparative-modeling methods successfully exploit the property that proteins with similar sequences tend to adopt similar structures. Most of these methods use neural networks and achieve good prediction accuracies, using evolutionary information in the form of multiple sequence alignment profiles [3]. The best methods achieve accuracies of up to 79.01% with bidirectional neural networks [4–6].

Current secondary-structure prediction approaches can be categorized into three groups: neighbor-based, model-based, and meta-predictor-based [7]. Neighbor-based approaches predict the secondary structure by identifying a set of similar sequence fragments with known secondary structure; model-based approaches employ sophisticated machine learning techniques to learn a predictive model trained on sequences of known structure; and meta-predictor-based approaches predict by combining the results of various neighbor-based and/or model-based techniques.

Historically, the most successful model-based approaches, such as PSIPRED [8], were based on neural network (NN) learning techniques [9]. Protein secondary structures are traditionally characterized by three general states: helix (H), strand (E), and coil (C). The DSSP program [10] proposed a finer characterization that extends these three states into eight: 3₁₀-helix (G), α-helix (H), π-helix (I), β-strand (E), bridge (B), turn (T), bend (S), and others (C). Prediction of the three states from protein sequences (i.e., the Q3 prediction problem) has been investigated intensively for decades using many machine learning methods, including probabilistic graphical models [11,12], support vector machines [13,14], hidden Markov models [15,16], artificial neural networks [19–21], and bidirectional recurrent neural networks (BRNN) [17].
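The text does not say how DSSP's eight states are collapsed into the three Q3 classes. One common convention, assumed in the sketch below, maps H, G and I to helix, E and B to strand, and everything else to coil; other reductions (for example, only H to helix and only E to strand) are also in use. The names `EIGHT_TO_THREE` and `reduce_dssp` are illustrative.

```python
# Assumed reduction scheme: H, G, I -> H; E, B -> E; every other symbol -> C.
EIGHT_TO_THREE = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}


def reduce_dssp(states: str) -> str:
    """Map an 8-state DSSP string to the 3-state (H/E/C) alphabet.

    Symbols not listed in EIGHT_TO_THREE (T, S, blank, '*', '-', ...) become coil.
    """
    return "".join(EIGHT_TO_THREE.get(s, "C") for s in states)


# Usage: a fragment containing 3-10 helix (G), turn (T) and alpha-helix (H).
print(reduce_dssp("GGGT*HHHHH"))  # HHHCCHHHHH
```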

However, designing an Artificial Neural Network (ANN) is a complex task because its performance depends on the architecture, the selected transfer function, and the learning algorithm used to train the set of synaptic weights [18]. To overcome this drawback, a new methodology that automatically tunes the ANN using the particle swarm optimization (PSO) algorithm is implemented here...


Quorum sensing, a cell-cell communication system, is broadly distributed among bacteria and is commonly used to regulate the production of shared products. An important consequence of quorum sensing is that the production of certain products is delayed until the population density is high. The bacterium Pseudomonas aeruginosa has a particularly complicated quorum sensing system involving multiple signals and receptors, one of which is the quorum-sensing control repressor QscR. Hence it is necessary to predict the structure of this protein. In this work, to predict the structure of QscR, a new hybrid topology called the PSO-trained Neural Network is introduced...

METHODOLOGY AND MATERIALS

PARTICLE SWARM OPTIMIZATION (PSO)

Particle Swarm Optimization was developed by James Kennedy and Russell Eberhart in 1995. It is a population-based technique inspired by collective biological behavior such as flocking and swarming; the idea first arose from the behavior observed in swarms of bees, schools of fish, and flocks of birds. The key advantages of PSO are its fast convergence, simple implementation, and the fact that it needs no gradient information. PSO is initialized with a randomly generated population of particles and searches by moving this population. Each particle in the population represents a candidate solution to the given problem [22,24]. The particles travel through the search space and change their positions using (i) the distance between Pbest and the particle's current position and (ii) the distance between Gbest and the particle's current position. Each particle remembers the best position it has achieved so far, known as Pbest, the personal best; the best position found by the whole group is Gbest, the global best. PSO accelerates each particle toward its Pbest and the Gbest locations. Figure 1 shows the alteration of particle position in particle swarm optimization.

Figure 1. Particle position alteration in PSO

where $V_i^{Gbest}$ is the velocity based on Gbest and $V_i^{Pbest}$ is the velocity based on Pbest.


Figure 2. Particle Swarm group behavior

Every particle in the group can produce a solution to the given problem. The position of the i-th particle is represented as $X_i = (x_{i1}, x_{i2}, \ldots, x_{in})$ and its velocity as $V_i = (v_{i1}, v_{i2}, \ldots, v_{in})$, as shown in Figure 2.

Figure 3. The behavior of Pbest and Gbest

The personal best $P_i^{best}$ of the i-th particle and the global best $G^{best}$ are specified as $P_i^{best} = (x_{i1}^{Pbest}, x_{i2}^{Pbest}, \ldots, x_{in}^{Pbest})$ and $G^{best} = (x_1^{Gbest}, x_2^{Gbest}, \ldots, x_n^{Gbest})$, as illustrated in Figure 3. The velocity of particle i is updated as shown in Equation 1.

$$V_i(t+1) = \omega V_i(t) + c_1 r_1 \left(P_i^{best} - X_i(t)\right) + c_2 r_2 \left(G^{best} - X_i(t)\right) \qquad (1)$$

where
$X_i(t)$ - the present position of particle i at iteration t
t - iteration counter
$P_i^{best}$ - the best position of particle i up to iteration t
$G^{best}$ - the global best position of the entire swarm up to iteration t
$c_1, c_2$ - the acceleration coefficients, which vary between 0 and 4
$V_i(t)$ - the velocity of particle i at iteration t
ω - the inertia weight (damping factor), which decreases from 0.9 to 0.4 and controls the influence of the previous velocity on the new velocity
$r_1, r_2$ - random variables in the range [0, 2]
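Equation 1 can be applied dimension by dimension as in the minimal sketch below. The function name `update_velocity` is hypothetical, and r1 and r2 are supplied by the caller as single values per particle per iteration; that is an assumption, since many implementations draw a fresh pair for every dimension.

```python
def update_velocity(v, x, pbest, gbest, w, c1, c2, r1, r2):
    """Equation 1 for one particle, applied dimension by dimension.

    v, x, pbest, gbest: equal-length lists of floats (velocity, position,
    personal best, global best). r1, r2: random factors chosen by the caller;
    the text draws them from [0, 2].
    """
    return [
        w * vi + c1 * r1 * (pi - xi) + c2 * r2 * (gi - xi)  # inertia + cognitive + social
        for vi, xi, pi, gi in zip(v, x, pbest, gbest)
    ]


# Usage: one dimension with fixed random factors; the result is approximately [1.95].
print(update_velocity([0.5], [1.0], [2.0], [3.0], w=0.9, c1=1.0, c2=2.0, r1=0.5, r2=0.25))
```

A particle that already sits on both Pbest and Gbest keeps only the damped inertia term.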

The inertia weight ω is calculated by Equation 2:

$$\omega = \omega_{max} - \frac{\omega_{max} - \omega_{min}}{iter_{max}} \times iter \qquad (2)$$

where
$\omega_{max}$ - initial weight
$\omega_{min}$ - final weight
$iter_{max}$ - maximum iteration number
$iter$ - current iteration number
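A direct sketch of the linearly decreasing inertia weight of Equation 2, using the 0.9 to 0.4 range stated in the text as defaults. The function name `inertia_weight` is hypothetical.

```python
def inertia_weight(iteration, max_iterations, w_max=0.9, w_min=0.4):
    """Equation 2: inertia weight falling linearly from w_max to w_min."""
    if max_iterations <= 0:
        raise ValueError("max_iterations must be positive")
    return w_max - (w_max - w_min) / max_iterations * iteration


# Usage: with the 1000 generations of Table 1, the weight halfway through is about 0.65.
print(inertia_weight(500, 1000))
```

A large early weight favors exploration of the search space; the smaller late weight favors refinement around the best positions found.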

A new velocity is calculated in the direction of $P_i^{best}$ and $G^{best}$ to change the current search point (Swagatam Das et al. 2005). Every particle then moves from its current position to a new position using the updated velocity, as given in Equation 3:

$$X_i(t+1) = X_i(t) + V_i(t+1) \qquad (3)$$

In PSO, all particles attempt to move to improved positions, and through the mutual effort of all particles the best position (optimal solution) is obtained. The iteration ends when a stopping condition is reached. Common stopping conditions are a maximum number of iterations, a number of iterations without an update of the global best solution, and a predefined fitness value (Miller 2002). The PSO algorithm therefore completes after a certain number of iterations or once the fitness value is close enough to the desired output. Many variants of PSO are available for discrete optimization, constrained optimization, and multi-objective optimization [25].
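Putting Equations 1 to 3 and the stopping conditions together gives the loop below. It is a minimal sketch, not the authors' implementation. The function name, the search bounds, the `target` and `patience` stopping parameters, and the asynchronous Gbest update (Gbest is refreshed as soon as any particle improves on it) are all assumptions. r1 and r2 are drawn uniformly from [0, r_max]; r_max defaults to 1.0, the usual choice in the PSO literature, whereas the text states a range of [0, 2].

```python
import random


def pso_minimize(f, dim, n_particles=30, max_iter=200, c1=1.0, c2=2.0,
                 w_max=0.9, w_min=0.4, bounds=(-5.0, 5.0), r_max=1.0,
                 target=None, patience=None, seed=None):
    """Minimize f over R^dim with a basic global-best PSO (Equations 1-3).

    Stops after max_iter iterations, when the best fitness reaches `target`,
    or after `patience` iterations without a Gbest update.
    Returns (gbest_position, gbest_fitness).
    """
    rng = random.Random(seed)
    lo, hi = bounds
    xs = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vs = [[0.0] * dim for _ in range(n_particles)]
    pbest = [x[:] for x in xs]
    pbest_f = [f(x) for x in xs]
    g = min(range(n_particles), key=lambda i: pbest_f[i])
    gbest, gbest_f = pbest[g][:], pbest_f[g]
    stall = 0

    for it in range(max_iter):
        w = w_max - (w_max - w_min) / max_iter * it                  # Equation 2
        improved = False
        for i in range(n_particles):
            r1, r2 = rng.uniform(0.0, r_max), rng.uniform(0.0, r_max)
            vs[i] = [w * v + c1 * r1 * (p - x) + c2 * r2 * (gb - x)  # Equation 1
                     for v, x, p, gb in zip(vs[i], xs[i], pbest[i], gbest)]
            xs[i] = [x + v for x, v in zip(xs[i], vs[i])]            # Equation 3
            fx = f(xs[i])
            if fx < pbest_f[i]:                                      # new Pbest
                pbest[i], pbest_f[i] = xs[i][:], fx
                if fx < gbest_f:                                     # new Gbest
                    gbest, gbest_f = xs[i][:], fx
                    improved = True
        stall = 0 if improved else stall + 1
        if target is not None and gbest_f <= target:
            break
        if patience is not None and stall >= patience:
            break
    return gbest, gbest_f


# Usage: minimize the 2-D sphere function, whose minimum is 0 at the origin.
best_x, best_f = pso_minimize(lambda x: sum(t * t for t in x), dim=2, seed=1)
print(best_x, best_f)
```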

ARTIFICIAL NEURAL NETWORK (ANN)

An ANN is a function approximator that maps inputs to outputs. A typical three-layer network, depicted in Figure 4, consists of input, hidden, and output layers, in which each neuron acts as an independent computational element.


Figure 4. An Artificial Neural Network
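A minimal sketch of the forward pass through such a three-layer network follows. The logistic sigmoid transfer function and the one-row-per-neuron weight layout are assumptions; the text does not specify either, and the names `sigmoid` and `forward` are illustrative.

```python
import math


def sigmoid(z):
    """Logistic transfer function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def forward(x, W1, b1, W2, b2):
    """Input layer -> hidden layer -> output layer of a three-layer perceptron.

    W1 has one row per hidden neuron (n_in weights each), W2 one row per
    output neuron (n_hidden weights each); b1 and b2 are the bias vectors.
    """
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    return [sigmoid(sum(w * h for w, h in zip(row, hidden)) + b)
            for row, b in zip(W2, b2)]


# Usage: 2 inputs, 2 hidden neurons, 1 output; all-zero weights give 0.5.
print(forward([1.0, 2.0], [[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0], [[0.0, 0.0]], [0.0]))  # [0.5]
```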

The input layer is the layer of neurons that receives inputs directly from outside the network. A layer that is not connected to the network output is called a hidden layer, and the layer whose output is passed to the world outside the network is the output layer. Weight functions apply weights to the inputs to obtain weighted inputs, as specified by a particular function. Different algorithms can be applied to minimize the network error during ANN training. This usually amounts to finding the correct tuning of the network, which depends on the weights, biases, number of neurons in the hidden layer, and number of iterations. Among these factors, the training algorithm is important because it determines how the optimum weights and biases are reached. Generally, the performance of the ANN models during the training process is assessed using the sum of squares of the errors:

$$E(W) = \sum_{p=1}^{T} \sum_{k=1}^{m} \left(y_{pk} - d_{pk}\right)^2$$

where
T is the total number of training samples,
m is the number of output layer neurons,
W represents the vector containing all the weights in the network,
$y_{pk}$ is the actual output of output neuron k for sample p, and
$d_{pk}$ is the corresponding desired output.
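The performance function described above sums the squared errors over T training samples and m output neurons. A direct sketch, assuming no scaling factor (some texts include a factor of 1/2); the name `sum_squared_error` is illustrative.

```python
def sum_squared_error(outputs, targets):
    """E(W): sum over T samples and m output neurons of (y - d)^2.

    outputs, targets: lists of T lists, each holding m floats.
    """
    return sum(
        (y - d) ** 2
        for y_p, d_p in zip(outputs, targets)  # loop over samples p
        for y, d in zip(y_p, d_p)              # loop over output neurons k
    )


# Usage: T = 2 samples, m = 2 output neurons.
print(sum_squared_error([[1.0, 0.0], [0.5, 0.5]], [[1.0, 1.0], [0.0, 1.0]]))  # 1.5
```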

The optimum number of neurons in the hidden layer is often obtained through a trial-and-error procedure. Aside from the above-mentioned parameters, the correct selection of the input variables that affect the target variable is considered one of the most important stages in developing ANN models.
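The text does not state how the input variables were encoded for QscR. A common choice in neural-network secondary-structure prediction, used in early work such as Qian and Sejnowski [17], is a sliding window of residues centred on the residue being predicted, with each window position one-hot encoded. The window size of 13 and the 21-unit alphabet (20 amino acids plus a spacer unit for positions beyond the chain ends) below are assumptions for illustration, as is the name `encode_windows`.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard one-letter codes


def encode_windows(sequence, window=13):
    """One-hot encode a sliding window centred on every residue.

    Each window position uses 21 units: 20 amino acids plus one spacer unit
    for positions that fall off either end of the chain (or unknown residues).
    Returns one feature vector of length window * 21 per residue.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd")
    half = window // 2
    n_units = len(AMINO_ACIDS) + 1
    vectors = []
    for centre in range(len(sequence)):
        vec = [0] * (window * n_units)
        for k in range(window):
            pos = centre - half + k
            if 0 <= pos < len(sequence) and sequence[pos] in AMINO_ACIDS:
                idx = AMINO_ACIDS.index(sequence[pos])
            else:
                idx = len(AMINO_ACIDS)  # spacer / unknown unit
            vec[k * n_units + idx] = 1
        vectors.append(vec)
    return vectors


# Usage: the first residues of QscR with a small window of 3.
vectors = encode_windows("MHD", window=3)
print(len(vectors), len(vectors[0]))  # 3 63
```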


ANN TRAINING WITH PSO ALGORITHM

To train the ANN with the PSO algorithm, the following procedure is used. The algorithm is utilized to find the optimum weights and biases of the ANN model. The values of the weights and biases form the search space of the algorithm, which has n dimensions [23], where n is the total number of weights (and biases) that need to be optimized. Each particle has an n-dimensional position vector and an n-dimensional velocity vector. Here both weights and biases are denoted by W. The optimal set of weights is obtained by flying the particles around the search space. At each iteration, the algorithm produces a set of weights whose fitness is assessed by assigning these weights to the nodes and predicting the target value. The accuracy of the prediction obtained with the assigned weights is then evaluated as the difference between the actual and predicted values, which should be minimized through the optimization process. The best fitness a particle has achieved so far is taken as its personal best.
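Because each particle is a flat vector of n weights and biases, it has to be split back into the layer matrices before the network can be evaluated. The sketch below assumes a three-layer network with bias vectors and a packing order of hidden-layer weights, hidden biases, output weights, then output biases; that order and the names `n_params` and `unpack` are illustrative assumptions, not the authors' layout.

```python
def n_params(n_in, n_hidden, n_out):
    """Dimension n of the search space: all weights and biases of a 3-layer MLP."""
    return n_in * n_hidden + n_hidden + n_hidden * n_out + n_out


def unpack(position, n_in, n_hidden, n_out):
    """Split a flat particle position into W1, b1, W2, b2 (row per neuron)."""
    if len(position) != n_params(n_in, n_hidden, n_out):
        raise ValueError("position has the wrong dimension")
    values = iter(position)
    W1 = [[next(values) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [next(values) for _ in range(n_hidden)]
    W2 = [[next(values) for _ in range(n_hidden)] for _ in range(n_out)]
    b2 = [next(values) for _ in range(n_out)]
    return W1, b1, W2, b2


# Usage: 3 inputs, 4 hidden neurons, 2 outputs -> a 26-dimensional search space.
print(n_params(3, 4, 2))  # 26
```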

Similarly, the best fitness of the swarm is used as the global best. This process is repeated for a specific number of iterations until the optimized weights for the ANN are obtained. The steps for a PSO-optimized ANN are given below. For a three-layered perceptron, W[1] and W[2] represent the connection weight matrices between the input layer and the hidden layer, and between the hidden layer and the output layer, respectively. When a PSO algorithm is applied to train the multilayer perceptron, the i-th particle is denoted by:

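$$W_i = \{W_i^{[1]}, W_i^{[2]}\} \qquad (4)$$

$$V''^{[j]}_i(m,n) = V^{[j]}_i(m,n) + r\,a\left(P^{[j]}_i(m,n) - W^{[j]}_i(m,n)\right) + s\,b\left(P^{[j]}_g(m,n) - W^{[j]}_i(m,n)\right) \qquad (5)$$

$$W''^{[j]}_i(m,n) = W^{[j]}_i(m,n) + V''^{[j]}_i(m,n)\,t \qquad (6)$$

where j = 1, 2; m = 1, ..., M_j; n = 1, ..., N_j;
M_j and N_j are the row and column sizes of the matrices W, P, and V;
$P^{[j]}_i$ and $P^{[j]}_g$ are the personal-best and global-best weight matrices;
r and s are positive constants;
a and b are random numbers in the range from 0 to 1;
t is the time step between observations and is often taken as unity;
V'' and W'' represent the new values.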


Applying Equation (5), the new velocity of the particle is computed from its previous velocity and the distances of its current position from the best positions found by the particle itself and by the group. The second term on the right-hand side of Equation (5) represents the private thinking of the particle itself, whilst the social part, i.e., the third term on the right-hand side, denotes the collaboration among the particles as a group. The new position according to the new velocity is determined by Equation (6).
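Equations (5) and (6) update every element (m, n) of each weight matrix W^[j] of particle i. The sketch below applies that update to one matrix stored as a list of rows, with the second term pulling toward the particle's own best matrix P_i and the third toward the swarm's best matrix P_g. It draws a and b afresh for each element (an assumption), and its defaults r = 1.0 and s = 2.0 are borrowed from c1 and c2 in Table 1 purely for illustration. The name `update_matrix` is hypothetical.

```python
import random


def update_matrix(W, V, P_i, P_g, r=1.0, s=2.0, t=1.0, rng=random):
    """Element-wise update of one weight matrix W^[j] of particle i.

    W, V, P_i, P_g are lists of rows with identical shapes (M_j x N_j).
    Returns (W_new, V_new), i.e. W'' and V''.
    """
    V_new, W_new = [], []
    for w_row, v_row, p_row, g_row in zip(W, V, P_i, P_g):
        v_out, w_out = [], []
        for w, v, p, g in zip(w_row, v_row, p_row, g_row):
            a, b = rng.random(), rng.random()           # random numbers in [0, 1)
            v2 = v + r * a * (p - w) + s * b * (g - w)  # previous + cognitive + social
            v_out.append(v2)
            w_out.append(w + v2 * t)                    # move by the new velocity
        V_new.append(v_out)
        W_new.append(w_out)
    return W_new, V_new


# Usage: a 1 x 2 matrix whose best matrices differ from the current weights.
W_new, V_new = update_matrix([[0.0, 1.0]], [[0.0, 0.0]], [[1.0, 1.0]], [[2.0, 0.0]],
                             rng=random.Random(0))
print(W_new, V_new)
```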

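The fitness function F is the mean squared error, defined as:

$$F = \frac{1}{n}\sum_{k=1}^{n}\left(y_k - d_k\right)^2$$

where
F is the fitness value,
n is the number of data points, and
$y_k$ and $d_k$ are the network output and the desired output for data point k.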
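Each particle is scored by loading its weights into the network and computing the mean squared error over the n data points. In the sketch below the network is abstracted as a hypothetical callable `predict(position, x)` returning one scalar output per data point; the names `mse_fitness` and `particle_fitness` are likewise illustrative.

```python
def mse_fitness(predicted, desired):
    """Fitness F = (1/n) * sum_k (y_k - d_k)^2 over n data points (lower is better)."""
    if len(predicted) != len(desired) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    return sum((y - d) ** 2 for y, d in zip(predicted, desired)) / len(predicted)


def particle_fitness(position, predict, inputs, desired):
    """Score one particle: `predict(position, x)` is a hypothetical callable that
    loads the particle's weights into the network and returns its output for x."""
    return mse_fitness([predict(position, x) for x in inputs], desired)


# Usage with a toy one-weight "network" y = w * x.
print(particle_fitness([2.0], lambda p, x: p[0] * x, [1.0, 2.0], [2.0, 5.0]))  # 0.5
```

In the PSO loop this value is what decides whether a particle's Pbest and the swarm's Gbest are replaced.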

STUDY AREA AND DATA

In this work, a set of 100 proteins was used for training and a set of 5 proteins for testing. All these sets contain a representative mix of the three secondary structure classes: α-helix, β-strand, and coil. Each set was used both as a validation set and as a test set.

Table 1: The parameter configuration used in PSO

S.No Parameters Values

1 Particles 100

2 C1 1

3 C2 2

4 Max generation 1000


RESULTS AND DISCUSSION

HYDROPHILICITY PLOT

A hydrophilicity plot is a quantitative analysis of the degree of hydrophobicity or hydrophilicity of the amino acids of a protein. It is used to characterize or identify possible structures or domains of a protein.

Figure 5. Hydrophilicity plot of QscR

From Figure 5, the amino acids with positive hydrophobicity values may be part of α-helical segments.
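The text does not name the scale or window size behind Figure 5. As an assumption, the sketch below uses the Kyte-Doolittle hydropathy scale, on which positive values indicate hydrophobic residues, averaged over a sliding window of 9 residues. The name `hydropathy_profile` is illustrative.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=9):
    """Mean hydropathy of each full window along the chain.

    `sequence` must contain only the 20 standard one-letter codes.
    Returns len(sequence) - window + 1 values.
    """
    if window < 1 or window > len(sequence):
        raise ValueError("window must be between 1 and len(sequence)")
    values = [KYTE_DOOLITTLE[aa] for aa in sequence]
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


# Usage on the first ten residues of QscR (from Table 2).
print(hydropathy_profile("MHDEREGYLE", window=9))
```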

Table 2: Predicted secondary structure of QscR under different topologies

Methods             Secondary structure

Sequence (1-50)     MHDEREGYLE ILSRITTEEE FFSLVLEICG NYGFEFFSFG ARAPFPLTAP
DSSP                ******SHHH HHHH** SHHH HHHHHHHHHH HTT*SEEEEE EE***STTS*
MLNN                CHHHHHHHHH HHHHCCCHHH HHHHHHHHHH HHCCCEEEEE EECCCCCCCC
Proposed PSONN      CHHHHHHHHH HHHHCHCHHH HHHHHHHHHH HHCHCEEEEE EECCCCCCCH

Sequence (51-100)   KYHFLSNYPG EWKSRYISED YTSIDPIVRH GLLEYTPLIW NGEDFQENRF
DSSP                *EEEEE*** H HHHHHHHHTT GGGT*HHHHH HHHS*S* EEE ETTT*SS*HH
MLNN                CEEEECCCCH HHHHHHHHHC CHHHCHHHHH HHHCCCCEEE CCCCCHHHHH
Proposed PSONN      HEEEECCCCH HHHHHHHHHC CHHHHHHHHH HHHCCCHEEE CCCCHHHHHH

Sequence (101-150)  FWEEALHHGI RHGWSIPVRG KYGLISMLSL VRSSESIAAT EILEKESFLL
DSSP                HHHHHHHTT* *EEEEEEEE* GGG*EEEEEE EESSS*** HH HHHHHHHHHH
MLNN                HHHHHHHHCC CCEEEEEEEC CCCCEEEEEE ECCCCCCCHH HHHHHHHHHH
Proposed PSONN      HHHHHHHHCH CCEEEEEEEH CCCCEEEEEE ECCCCCCHHH HHHHHHHHHH

Sequence (151-200)  WITSMLQATF GDLLAPRIVP ESNVRLTARE TEMLKWTAVG KTYGEIGLIL
DSSP                HHHHHHHHHH HHHHHHHHSG GGG**** HHH HHHHHHHHTT **HHHHHHHH
MLNN                HHHHHHHHHH HHHHCCCCCC CCCCCCCHHH HHHHHHHHHC CCHHHHHHHH
Proposed PSONN      HHHHHHHHHH HHHHCCCCCH CCCCCCHHHH HHHHHHHHHC HCHHHHHHHH

Sequence (201-237)  SIDQRTVKFH IVNAMRKLNS SNKAEATMKA YAIGLLN ---
DSSP                TS*HHHHHHH HHHHHHHTT* SSHHHHHHHH HHTT*** ---
MLNN                CCCHHHHHHH HHHHHHHHCC CCHHHHHHHH HHHCCCC ---
Proposed PSONN      CCHHHHHHHH HHHHHHHHCH CCHHHHHHHH HHHCCHH ---
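The agreement between each prediction in Table 2 and the DSSP reference can be summarized by the Q3 score, the percentage of residues whose three-state label matches. The sketch below is illustrative: it assumes the strings are aligned residue by residue with the display spacing of Table 2 removed, and it reduces DSSP symbols with one common convention (H, G, I to helix; E, B to strand; everything else, including '*', to coil), which the text does not confirm. The names `to_three_state` and `q3` are hypothetical.

```python
# Assumed 8-to-3 reduction for the DSSP reference (H, G, I -> H; E, B -> E; rest -> C).
REDUCE = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}


def to_three_state(dssp):
    """Collapse a DSSP string (any other symbol, including '*', becomes coil) to H/E/C."""
    return "".join(REDUCE.get(s, "C") for s in dssp)


def q3(predicted, reference):
    """Q3: percentage of residues whose H/E/C label matches the reference."""
    if len(predicted) != len(reference) or not reference:
        raise ValueError("strings must be non-empty and of equal length")
    matches = sum(p == r for p, r in zip(predicted, reference))
    return 100.0 * matches / len(reference)


# Usage on a short hypothetical fragment (not taken from Table 2).
print(q3("HHEC", to_three_state("HGEB")))  # 75.0
```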

ASSESSMENT OF PREDICTION ACCURACY

Four routinely used assessment criteria were adopted here: sensitivity (SN), specificity (SP), accuracy (ACC), and AUC (area under the Receiver Operating Characteristic curve), where

$$SN = \frac{TP}{TP + FN}, \qquad SP = \frac{TN}{TN + FP}, \qquad ACC = \frac{TP + TN}{TP + TN + FP + FN}$$

and TP, TN, FP, and FN are the numbers of true positives, true negatives, false positives, and false negatives, respectively. The experimental results are given in Table 3.
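Given the confusion counts, SN, SP and ACC follow directly. How the three-class problem was turned into binary counts is not stated; computing the criteria per class in a one-versus-rest fashion is one possibility (an assumption). The function name and the counts in the usage line are hypothetical.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return (sensitivity, specificity, accuracy) as percentages."""
    sn = 100.0 * tp / (tp + fn)                   # share of positives recovered
    sp = 100.0 * tn / (tn + fp)                   # share of negatives recovered
    acc = 100.0 * (tp + tn) / (tp + tn + fp + fn)  # share of all cases correct
    return sn, sp, acc


# Usage with hypothetical counts.
print(classification_metrics(90, 80, 20, 10))  # (90.0, 80.0, 85.0)
```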

Table 3: The prediction performance of the algorithms

S.No Methodology SN (%) SP (%) ACC (%)

1 MLNN 92.55 97.17 94.28

2 PSO-NN 93.73 97.59 95.04

The ROC (Receiver Operating Characteristic) curve plots the true positive rate against the false positive rate, and the AUC is a reliable measure for evaluating performance. Overall, PSO-NN performed better of the two algorithms, as can also be seen from Figure 6. Figure 7 depicts the 3D structure of QscR.
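The AUC summarized by the ROC curve equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counting one half. The pairwise sketch below computes it from hypothetical scores and labels; it is O(P·N) and meant for illustration rather than large datasets. The name `roc_auc` is illustrative.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a random positive outscores a random negative.

    labels: 1 for positive, 0 for negative. Ties count one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Usage: one positive (0.4) is ranked below one negative (0.6).
print(roc_auc([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0]))  # 0.75
```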

Figure 6. ROC Curve


Figure 7. 3D structure of QscR

COMPARISON WITH OTHER METHODS

Finally, the results obtained with the proposed methodology are compared with the performance of other methods applied to the secondary structure of QscR, as displayed in Table 4.

Table 4. Comparative analysis of the performance of the other methods in secondary

structure prediction

S.No Method Alpha (%) Beta sheet(%) Coil

1 DSSP 54 13 -

2 STRIDE 54 11 -

3 MLNN 68 15 4

4 PSO-NN 72 16 3

From Table 4, the proposed PSO-NN assigns the highest helix (72%) and β-sheet (16%) percentages, and it is concluded that it gives better secondary structure prediction than the other topologies.
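Table 4 reports helix and β-sheet percentages without stating how they were counted. Assuming each is the share of residues carrying that label in a three-state (H/E/C) string, they can be computed as in the sketch below; the function name `composition` is illustrative.

```python
def composition(structure):
    """Percentage of helix (H), strand (E) and coil (C) positions in a 3-state string."""
    if not structure:
        raise ValueError("empty structure string")
    n = len(structure)
    return {state: 100.0 * structure.count(state) / n for state in "HEC"}


# Usage on a short hypothetical fragment.
print(composition("HHEC"))  # {'H': 50.0, 'E': 25.0, 'C': 25.0}
```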

CONCLUSION

In this work, a novel PSO-based ANN method is implemented to predict the secondary structure of a protein. The proposed predictor achieved promising results and outperformed the other predictors compared. The scheme automatically tunes the neural network using the PSO optimization algorithm, and it achieved an accuracy of about 95% on the independent dataset. The experimental results indicate that the proposed method could be useful in assisting the discovery of important protein modifications and in protein structure prediction research.


REFERENCES

1. Rylance G., Applications of genetic algorithms in protein folding studies, 2004, First-year report, School of Chemistry, England.

2. Sikder A.R., Zomaya A.Y., An overview of protein-folding techniques: issues and perspectives, 2005, International Journal of Bioinformatics Research and Applications, V-1, P-121–143.

3. Rost B., Sander C., Prediction of protein secondary structure at better than 70% accuracy, 1993, Journal of Molecular Biology, V-232, P-584.

4. Baldi P., Brunak S., Frasconi P., et al., Exploiting the past and the future in protein secondary structure prediction, 1999, Bioinformatics, V-15(11), P-937.

5. Pollastri G., McLysaght A., Porter: a new, accurate server for protein secondary structure prediction, 2005, Bioinformatics, V-21, P-1719.

6. Baldi P., Brunak S., Bioinformatics: The Machine Learning Approach, 2001, MIT Press, Cambridge.

7. Hu H.-J., Harrison R.W., Current methods for protein secondary-structure prediction based on support vector machines, 2007, in Knowledge Discovery in Bioinformatics: Techniques, Methods, and Applications.

8. Jones D.T., Protein secondary structure prediction based on position-specific scoring matrices, 1999, Journal of Molecular Biology, V-292(2), P-195.

9. Wang S., Peng J., et al., Protein secondary structure prediction using deep convolutional neural fields, 2016, Scientific Reports, V-6, P-18962.

10. Kabsch W., Sander C., Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features, 1983, Biopolymers, V-22(12), P-2577–2637.

11. Schmidler S.C., Liu J.S., Brutlag D.L., Bayesian segmentation of protein secondary structure, 2000, Journal of Computational Biology, V-7(1-2), P-233–248.

12. Chu W., Ghahramani Z., A graphical model for protein secondary structure prediction, 2004, Proceedings of the 21st International Conference on Machine Learning (ICML), ACM, New York, P-161–168.

13. Hua S., Sun Z., A novel method of protein secondary structure prediction with high segment overlap measure: support vector machine approach, 2001, Journal of Molecular Biology, V-308(2), P-397–407.

14. Guo J., Chen H., Sun Z., A novel method for protein secondary structure prediction using dual-layer SVM and profiles, 2004, Proteins: Structure, Function, and Bioinformatics, V-54(4), P-738–743.

15. Asai K., Hayamizu S., Prediction of protein secondary structure by the hidden Markov model, 1993, Bioinformatics, V-9(2), P-141.

16. Aydin Z., Altunbasak Y., et al., Protein secondary structure prediction for a single-sequence using hidden semi-Markov models, 2006, BMC Bioinformatics, V-7, P-178.

17. Qian N., Sejnowski T.J., Predicting the secondary structure of globular proteins using neural network models, 1988, Journal of Molecular Biology, V-202(4), P-865–884.

18. Jones D.T., Protein secondary structure prediction based on position-specific scoring matrices, 1999, Journal of Molecular Biology, V-292(2), P-195.

19. Buchan D.W., Minneci F., Nugent T.C., et al., Scalable web services for the PSIPRED Protein Analysis Workbench, 2013, Nucleic Acids Research, V-41(W1), P-W349–W357.

20. Faraggi E., et al., SPINE X: improving protein secondary structure prediction by multistep learning coupled with prediction of solvent accessible surface area and backbone torsion angles, 2012, Journal of Computational Chemistry, V-33(3), P-259–267.

21. Baldi P., Brunak S., Frasconi P., et al., Exploiting the past and the future in protein secondary structure prediction, 1999, Bioinformatics, V-15(11), P-937–946.

22. Mendes R., Kennedy J., Neves J., The fully informed particle swarm: simpler, maybe better, 2004, IEEE Transactions on Evolutionary Computation, V-8(3), P-204–210.

23. Yang C.H., Lin Y.S., A particle swarm optimization-based approach with local search for predicting protein folding, 2017, Journal of Computational Biology, V-24(10), P-981–994.

24. Wilke D.N., Analysis of the particle swarm optimization algorithm, 2005, Master's dissertation, University of Pretoria.

25. Geis M., Middendorf M., Particle swarm optimization for finding RNA secondary structures, 2011, International Journal of Intelligent Computing and Cybernetics, V-4(2), P-160–186.
