[IEEE 2009 International Conference on Advances in Recent Technologies in Communication and Computing - Kottayam, Kerala, India (2009.10.27-2009.10.28)] 2009 International Conference

Notice of Violation of IEEE Publication Principles

"Wavelet Assignment Graph Kernal for Drug Virtual Screening" by Soumya T. Soman and Soman K.P. in the Proceedings of the 2009 International Conference on Advances in Recent Technology in Communication and Computing, pp 282-284 After careful and considered review of the content and authorship of this paper by a duly constituted expert committee, this paper has been found to be in violation of IEEE's Publication Principles. This paper contains significant portions of original text from the paper cited below. The original text was copied without attribution (including appropriate references to the original author(s) and/or paper title) and without permission. Due to the nature of this violation, reasonable effort should be made to remove all past references to this paper, and future references should be made to the following article: "Graph Wavelet Alignment Kernals for Drug Virtual Screening" by A. Smalter, J. Huan, G. Lushington in the Proceedings of the 7th Annual International Conference on Computational Systems Bioinformatic, Life Sciences Society, August 2008, pp. 327-338

Wavelet Assignment Graph kernel for Drug Virtual Screening

Soumya.T. Soman Computational Engineering and Networking

Amrita Vishwa Vidyapeetham Coimbatore, India.

[email protected]

Soman K P Computational Engineering and Networking

Amrita Vishwa Vidyapeetham Coimbatore, India.

[email protected]

Abstract - We propose a kernel function for graph classification, the Wavelet Assignment graph kernel, which has applications in drug discovery. It is an extension of the wavelet alignment graph kernel. We model chemical compounds as graphs and extract features by applying wavelet analysis to the graph-structured chemical structure: for each atom, we collect features about the atom and its local environment at different scales. To find the similarity between two graphs, the nodes of one graph are aligned with the nodes of the other graph such that the total similarity is maximized over all possible alignments. For this alignment we use the Wavelet Assignment graph kernel. We evaluated the kernel function on the Predictive Toxicology Challenge data set. Our results indicate that the new kernel function classifies more accurately than the existing wavelet alignment graph kernel function.

Keywords - SVM, wavelet, kernel, QSAR.

I. INTRODUCTION

Chemoinformatics is the use of information technology to manage chemical information and solve chemical problems. It is a rapidly emerging research area that provides a wide array of statistical, data mining, and machine learning techniques for finding relationships between the structures of chemical compounds and their biological properties. Predicting the biological activity of chemical compounds is one of the major goals of chemoinformatics.

Recently, Support Vector Machines (SVMs) have been used in drug discovery, especially in virtual screening [1]. An SVM works by constructing a hyperplane in a high-dimensional feature space. Two important characteristics of SVMs are the use of kernel functions (i.e., inner products of two points in a Hilbert space) to transform a non-linear classification problem into a linear one, and the use of a large-margin classifier to separate points with different class labels. To evaluate classifier performance, we applied our graph kernel methods to the Predictive Toxicology Challenge (PTC) data set, which reports the carcinogenicity of several hundred chemical compounds for male mice (MM), female mice, male rats, etc. For feature creation we apply the wavelet analysis proposed by Aaron Smalter et al. [2]: for each atom in a graph, we collect features about the atom and its local environment at different scales. The accuracies obtained are better than those of the existing wavelet alignment kernels for graphs.

The rest of the paper is structured as follows. Section 2 presents an overview of previous work on quantitative chemical structure-property relationship studies and the importance of SVMs in virtual screening. Section 3 outlines background information on the graph representation of chemical compounds, graph kernel functions, and graph wavelet analysis. Section 4 describes the algorithmic details of the design of the graph wavelet alignment kernel. Section 5 describes our modification of the wavelet alignment graph kernel (the Wavelet Assignment graph kernel). Section 6 summarizes our experiments and results on the classification of chemical compounds. Finally, we conclude in Section 7.

II. PREVIOUS WORK

A target property is a measurable quantity of a chemical compound. Target properties fall into two categories: continuous (e.g., binding affinities to a protein) and discrete (e.g., active compounds vs. inactive compounds). The relationship between a chemical compound and its target property is typically investigated through a quantitative structure-property relationship (QSPR). A QSPR method may be generally defined as a function that maps a chemical space to a property space in the form

$$P = k(D) \qquad (1)$$

where $D$ is a chemical structure, $P$ is a property, and the function $k$ is a mapping from the chemical space to the property space. Many classification methods have been applied to build QSPR models. Recently, Support Vector Machines (SVMs) have been increasingly used in the virtual screening phase of drug discovery.

III. BACKGROUND

This section gives general background on the computational analysis of chemical structure-property relationships, covering the graph representation of chemical compounds, graph kernel functions, and graph wavelet analysis.

A. Graph Representation of Chemical Compounds

A chemical compound is conventionally represented as an undirected graph in which nodes represent atoms, labeled with their atom types, and edges represent bonds, labeled with their bond types (single, double, or aromatic).
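As an illustration of this representation, the following minimal Python sketch stores a compound as an undirected labeled graph. The class name `MoleculeGraph`, its methods, and the formaldehyde example are hypothetical and are not taken from our implementation; they only show one plausible data layout.

```python
from collections import defaultdict


class MoleculeGraph:
    """Undirected labeled graph: nodes carry atom types, edges carry bond types."""

    def __init__(self):
        self.atoms = {}                 # node id -> element symbol
        self.bonds = defaultdict(dict)  # node id -> {neighbor id: bond type}

    def add_atom(self, node_id, element):
        self.atoms[node_id] = element

    def add_bond(self, u, v, bond_type="single"):
        # The graph is undirected, so the bond is stored in both directions.
        self.bonds[u][v] = bond_type
        self.bonds[v][u] = bond_type

    def neighbors(self, node_id):
        return sorted(self.bonds[node_id])


# Hypothetical example: formaldehyde (H2C=O) with explicit hydrogens.
mol = MoleculeGraph()
for node_id, element in enumerate(["C", "O", "H", "H"]):
    mol.add_atom(node_id, element)
mol.add_bond(0, 1, "double")
mol.add_bond(0, 2, "single")
mol.add_bond(0, 3, "single")
```

Storing each bond under both endpoints keeps neighbor lookups symmetric, which is what the neighborhood computations in Section IV rely on.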

2009 International Conference on Advances in Recent Technologies in Communication and Computing

978-0-7695-3845-7/09 $25.00 © 2009 IEEE

DOI 10.1109/ARTCom.2009.197


Figure 1 shows the graphical representation of a chemical compound.

Figure 1. Graphical representation of a chemical compound.

B. Graph Kernel Functions

Kernel methods such as support vector machines are becoming increasingly popular because of their high performance. In an SVM, all computations are done via a kernel function. To apply kernel methods to graph classification, we therefore first need to define a kernel function between graphs. Several types of graph kernel functions exist, for example kernels based on paths [3] and on cyclic patterns [4].

C. Graph Wavelet Analysis

Wavelets are usually applied to numerically valued data such as communication signals or mathematical functions, as well as to regularly structured numeric data such as matrices and images. Graphs, however, are arbitrarily structured and may represent innumerable relationships and topologies between data elements. Recent work has established the successful application of wavelet functions to graphs for multi-resolution analysis. Examples of wavelet functions are the Haar wavelet and the Mexican hat wavelet.

IV. ALGORITHM DESIGN

In the following sections we outline the algorithm for the design of the wavelet alignment graph kernel.

A. Graph Wavelet Analysis & Feature Extraction

Wavelet analysis transforms a series of signals into a set of summaries at different scales. It offers efficient tools to decompose and represent a function of arbitrary shape. We use a wavelet function for feature extraction. To define reasonable graph wavelet functions, we introduce the following two concepts.

• h-hop neighborhood
• Discrete wavelet functions

h-Hop Neighborhood

The h-hop neighborhood of a node $v$ in a graph $G$, denoted by $N_h(v)$, is the set of nodes that are exactly $h$ hops away from $v$. If $h = 0$, then $N_0(v) = \{v\}$, and if $h = 1$, then $N_1(v) = \{u \mid (u, v) \in E[G]\}$. The average feature measurement, denoted by $\bar{f}_j(v)$, for the nodes in $N_j(v)$ is

$$\bar{f}_j(v) = \frac{1}{|N_j(v)|} \sum_{u \in N_j(v)} f_u \qquad (2)$$
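The neighborhoods $N_j(v)$ and the averages of equation (2) can be sketched in Python with a breadth-first search. This is a minimal sketch, not our Java implementation: the function names, the adjacency-dictionary input, and the toy graph are hypothetical, and returning 0 for an empty layer is an assumption that the text does not address.

```python
from collections import deque


def hop_neighborhoods(adjacency, v, max_hop):
    """Return [N_0(v), ..., N_max_hop(v)]; N_h(v) holds the nodes whose
    shortest-path distance from v is exactly h."""
    distance = {v: 0}
    queue = deque([v])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hop:
            continue  # nodes beyond max_hop are not needed
        for neighbor in adjacency[node]:
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    layers = [set() for _ in range(max_hop + 1)]
    for node, d in distance.items():
        layers[d].add(node)
    return layers


def average_feature(features, layer):
    """Equation (2): mean of f_u over the nodes u of one layer N_j(v)."""
    if not layer:
        return 0.0  # assumption: an empty layer contributes nothing
    return sum(features[u] for u in layer) / len(layer)


# Hypothetical toy graph: t - v - u - w, with atomic numbers as the feature.
adjacency = {"v": ["t", "u"], "t": ["v"], "u": ["v", "w"], "w": ["u"]}
atomic_number = {"v": 6, "t": 6, "u": 6, "w": 8}
layers = hop_neighborhoods(adjacency, "v", 2)
```

In this toy graph the 1-hop layer of `v` contains two carbon atoms, so its average atomic number is 6, and the 2-hop layer contains a single oxygen atom.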

Given a node $v$ in the graph $G$, we label every node with its shortest distance to $v$. Here $N_0(v) = \{v\}$ and $N_1(v) = \{t, u\}$. If the feature vector contains the single feature of atomic number, $\bar{f}_1(v)$ is the average atomic number of the atoms that are exactly 1 hop away from $v$. In our example, $t$ and $u$ are both carbon with atomic number 6, so $\bar{f}_1(v)$ is equal to 6.

Discrete Wavelet Functions

To apply a wavelet function to a discrete structure such as a graph, we convert the wavelet function $\psi(x)$ so that it applies to the h-hop neighborhood. We scale a wavelet function $\psi(x)$ (such as the Haar wavelet) to have support on the domain $[0, 1]$ with integral 0, and partition the domain into $h + 1$ intervals. We then compute $\psi_{j,h}$ as the average of $\psi(x)$ over the $j$th interval, $0 \le j \le h$, as follows:

$$\psi_{j,h} \equiv \frac{1}{h+1} \int_{j/(h+1)}^{(j+1)/(h+1)} \psi(x)\, dx \qquad (3)$$
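For the Haar wavelet, the integral in equation (3) can be evaluated exactly. The sketch below follows equation (3) as written, including its $1/(h+1)$ prefactor, and assumes the Haar wavelet takes the value $+1$ on $[0, 1/2)$ and $-1$ on $[1/2, 1]$. The function names are hypothetical.

```python
from fractions import Fraction

HALF = Fraction(1, 2)


def haar_integral(a, b):
    """Exact integral over [a, b], a sub-interval of [0, 1], of the Haar
    wavelet: +1 on [0, 1/2) and -1 on [1/2, 1]."""
    positive_part = min(b, HALF) - min(a, HALF)  # length of [a, b] below 1/2
    negative_part = max(b, HALF) - max(a, HALF)  # length of [a, b] above 1/2
    return positive_part - negative_part


def discrete_wavelet(h, integral=haar_integral):
    """Equation (3): the coefficients psi_{j,h} for j = 0, ..., h."""
    width = Fraction(1, h + 1)
    return [width * integral(j * width, (j + 1) * width) for j in range(h + 1)]
```

For example, `discrete_wavelet(1)` gives the coefficients 1/4 and -1/4, and `discrete_wavelet(2)` gives 1/9, 0, and -1/9. In both cases the coefficients sum to zero, as expected for a wavelet with integral 0.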

We then apply wavelet analysis to discrete structures such as graphs. The result is called the wavelet measurement, denoted by $\Gamma_h(v)$, for a node $v$ in a graph $G$ at scale up to $h > 0$:

$$\Gamma_h(v) = C_{h,v} \sum_{j=0}^{h} \psi_{j,h}\, \bar{f}_j(v) \qquad (4)$$

where $C_{h,v}$ is a normalization factor with

$$C_{h,v} = \left( \sum_{j=0}^{h} \frac{\psi_{j,h}^{2}}{|N_j(v)|} \right)^{-1/2}$$
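Equation (4) and its normalization factor can be sketched as follows for one node and one scale. The function name and the list-based inputs are hypothetical. Skipping empty layers in the normalization and returning 0 when the normalization sum vanishes are assumptions made here to avoid division by zero; the text does not specify these cases.

```python
import math


def wavelet_measurement(psi, avg_features, layer_sizes):
    """Equation (4) for one node v at one scale h.

    psi[j]          -- discrete wavelet coefficient psi_{j,h}
    avg_features[j] -- average feature of layer N_j(v), equation (2)
    layer_sizes[j]  -- |N_j(v)|, for j = 0, ..., h
    """
    if not (len(psi) == len(avg_features) == len(layer_sizes)):
        raise ValueError("all inputs must have h + 1 entries")
    # Normalization: C = (sum_j psi_j^2 / |N_j(v)|)^(-1/2).
    # Assumption: empty layers (size 0) are left out of the sum.
    norm_sq = sum(p * p / n for p, n in zip(psi, layer_sizes) if n > 0)
    if norm_sq == 0:
        return 0.0  # assumption: no usable layer, so no measurement
    c = 1.0 / math.sqrt(norm_sq)
    return c * sum(p * f for p, f in zip(psi, avg_features))


# Hypothetical node: a carbon (6) whose two neighbors average to 7, at h = 1.
gamma_1 = wavelet_measurement([0.25, -0.25], [6.0, 7.0], [1, 2])
```

Because the coefficients sum to zero, a node whose layers all share the same average feature gets a measurement of 0, so the measurement responds to differences between a node and its surroundings.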

We define $\Gamma^h(v)$ as the sequence of wavelet measurements applied to a node $v$ with scale values up to $h$, that is, $\Gamma^h(v) = \{\Gamma_1(v), \Gamma_2(v), \ldots, \Gamma_h(v)\}$. Finally, we introduce the wavelet measurement vector into the alignment kernel with the following formula:

$$k_\Gamma(G, G') = \max_{\pi} \sum_{v \in V[G]} k_a\big(\Gamma^h(v), \Gamma^h(\pi(v))\big) \qquad (5)$$

This is the wavelet alignment graph kernel (WA), where $k_a$ is either a linear or an RBF kernel.
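Equation (5) can be sketched by exhaustive search for small graphs. The sketch makes several assumptions that the text does not state: $\pi$ is taken to be an injective map from $V[G]$ into $V[G']$, $G$ has no more nodes than $G'$, and the RBF bandwidth `gamma` is a hypothetical parameter. Exhaustive search is used only for clarity and is not how a practical implementation would compute the maximum.

```python
import math
from itertools import permutations


def linear_kernel(a, b):
    return sum(x * y for x, y in zip(a, b))


def rbf_kernel(a, b, gamma=1.0):
    # gamma is a hypothetical bandwidth; the text does not report a value.
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))


def alignment_kernel(gammas_g, gammas_g2, base_kernel=rbf_kernel):
    """Equation (5) by brute force.

    gammas_g[i] is the wavelet measurement vector of node i of G, and
    gammas_g2 holds the vectors of G'. Assumes |V[G]| <= |V[G']| and
    searches over all injective node maps pi.
    """
    n, m = len(gammas_g), len(gammas_g2)
    if n > m:
        raise ValueError("this sketch assumes |V[G]| <= |V[G']|")
    best = -math.inf
    for image in permutations(range(m), n):  # image[i] plays the role of pi(i)
        total = sum(base_kernel(gammas_g[i], gammas_g2[image[i]]) for i in range(n))
        best = max(best, total)
    return best


# Hypothetical measurement vectors (h = 2) for a 2-node and a 3-node graph.
g = [[1.0, 0.0], [0.0, 1.0]]
g2 = [[0.0, 1.0], [1.0, 0.0], [5.0, 5.0]]
```

With the RBF base kernel the best alignment pairs each node of `g` with its identical counterpart in `g2`, giving a value of 2. With the linear base kernel the large vector `[5.0, 5.0]` dominates and the maximum is 6, which shows how the choice of $k_a$ changes the preferred alignment.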


Figure 3 shows a chemical graph overlaid with a wavelet function centered on a specific vertex. The wavelet is most intense at hop distance zero, that is, at the vertex on which it is centered (here the central vertex).

Figure 3. Superposition of a wavelet function on the chemical graph.

V. MODIFICATION OF WAVELET ALIGNMENT GRAPH KERNEL

To find the similarity between two graphs, we use the following formula:

$$k_\Gamma(G, G') = \max_{\pi} \sum_{v \in V[G]} k_a\big(\Gamma^h(v), \Gamma^h(\pi(v))\big)$$

where $k_a$ is the optimal assignment kernel [5], defined by

$$k_A(x, y) := \begin{cases} \max_{\pi} \sum_{i=1}^{|x|} k_1\big(x_i, y_{\pi(i)}\big) & \text{if } |y| \ge |x| \\ \max_{\pi} \sum_{j=1}^{|y|} k_1\big(x_{\pi(j)}, y_j\big) & \text{otherwise} \end{cases}$$
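The two branches of the optimal assignment kernel can be sketched as one routine that always assigns the elements of the smaller set to distinct elements of the larger one. This is a brute-force illustration under the assumption that $\pi$ ranges over injective assignments; the function names are hypothetical, and a practical implementation would replace the enumeration with a polynomial-time assignment algorithm.

```python
from itertools import permutations


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def optimal_assignment_kernel(x, y, base_kernel=dot):
    """k_A(x, y): best total similarity when every element of the smaller
    set is assigned to a distinct element of the larger set."""
    swapped = len(x) > len(y)  # True selects the 'otherwise' branch
    small, large = (y, x) if swapped else (x, y)
    best = float("-inf")
    for image in permutations(range(len(large)), len(small)):
        total = 0.0
        for i, j in enumerate(image):
            # Keep the argument order k_1(x_., y_.) in both branches.
            if swapped:
                total += base_kernel(large[j], small[i])
            else:
                total += base_kernel(small[i], large[j])
        best = max(best, total)
    return best


# Hypothetical one-dimensional descriptors.
x = [[1.0], [2.0]]
y = [[3.0]]
```

Handling both size orderings in this way makes the value independent of argument order whenever $k_1$ is symmetric: here `optimal_assignment_kernel(x, y)` and `optimal_assignment_kernel(y, x)` both pair `[2.0]` with `[3.0]`.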

where $k_1$ is either a linear or an RBF kernel function. This modified wavelet alignment graph kernel is called the Wavelet Assignment graph kernel (WS).

VI. EXPERIMENTAL RESULTS

A. Datasets

To evaluate classifier performance we used the Predictive Toxicology Challenge data set [6], which contains chemical compounds classified according to their toxicity in male rats (PTC-MR), female rats (PTC-FR), male mice (PTC-MM), and female mice (PTC-FM). Statistics for these data sets are given in Table 1.

B. Methods

In our experimental study, we collected neighborhood information with hop distances h = 1, 2, and 3. All tasks are binary classification. We used the support vector machine (SVM) classifier to generate activity predictions, specifically the LibSVM implementation by Chang et al. [7]. This software accepts a user-supplied kernel, so we provided our wavelet-based graph assignment kernels (linear and RBF) as a kernel matrix, which the software then uses for training. We performed 10-fold cross-validation. For binary classification we used C-SVM with C = 0.01 for all four data sets. We used the Haar wavelet function in our experiments. We developed and tested our algorithm in a Java programming environment.

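TABLE I. DATA SET AND CLASS STATISTICS

| Data set | Graphs | Class labels | Count (label 1) | Count (label -1) |
|----------|--------|--------------|-----------------|------------------|
| PTC-MM   | 336    | binary       | 129             | 207              |
| PTC-MR   | 344    | binary       | 152             | 192              |
| PTC-FR   | 351    | binary       | 121             | 230              |
| PTC-FM   | 349    | binary       | 143             | 206              |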
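As an illustration of how a kernel matrix can be handed to LibSVM, the sketch below writes class labels and a kernel matrix in LibSVM's precomputed-kernel text layout, in which each line holds the label, a serial number under index 0, and then the kernel values against every training instance. The function name and the 2x2 example matrix are hypothetical, and this Python sketch is not the Java code used in our experiments.

```python
def to_libsvm_precomputed(labels, kernel_matrix):
    """Format labels and a square kernel matrix as LibSVM precomputed-kernel
    lines: '<label> 0:<serial> 1:<K(i,1)> ... n:<K(i,n)>' with 1-based serials."""
    if len(labels) != len(kernel_matrix):
        raise ValueError("one label is needed per kernel-matrix row")
    lines = []
    for serial, (label, row) in enumerate(zip(labels, kernel_matrix), start=1):
        entries = " ".join(
            f"{column}:{value:.6g}" for column, value in enumerate(row, start=1)
        )
        lines.append(f"{label:+d} 0:{serial} {entries}")
    return lines


# Hypothetical kernel values for two compounds with labels 1 and -1.
example_lines = to_libsvm_precomputed([1, -1], [[2.0, 0.5], [0.5, 1.0]])
```

A file in this layout can then be trained with LibSVM's precomputed kernel type, for example `svm-train -s 0 -t 4 -c 0.01 -v 10` for C-SVM with C = 0.01 and 10-fold cross-validation. The exact invocation used in our experiments is not recorded here, so this command line is an assumption.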

C. Results

Table 2 reports the prediction accuracy (%) on the four data sets under 10-fold cross-validation. Among the four kernels compared (WA RBF, WA Linear, WS RBF, and WS Linear), the Wavelet Assignment RBF kernel achieves the highest accuracy on every PTC data set; on PTC-MR it ties with the Wavelet Assignment linear kernel.

TABLE II. CLASSIFICATION ACCURACY OF THE DATA SETS

| Data set | WA RBF | WA Linear | WS* RBF | WS Linear |
|----------|--------|-----------|---------|-----------|
| PTC-FM   | 51.46  | 55.81     | 59.09   | 57.90     |
| PTC-FR   | 52.87  | 59.31     | 65.52   | 64.52     |
| PTC-MM   | 52.36  | 58.91     | 61.60   | 60.16     |
| PTC-MR   | 52.38  | 52.09     | 55.81   | 55.81     |

* WS: Wavelet Assignment kernel

VII. CONCLUSION

In this work we modified the wavelet alignment graph kernel into the Wavelet Assignment graph kernel for finding the similarity of chemical compounds, based on wavelet-based descriptors. Our experimental study shows that the modified graph kernel achieves higher classification accuracy than the existing wavelet alignment graph kernel on the PTC data sets.

REFERENCES

[1] Pierre Mahe, "Kernel design for virtual screening of molecules using support vector machines," Ph.D. thesis, Ecole des Mines de Paris, September 15, 2006.

[2] Aaron Smalter, Jun Huan, and Gerald Lushington, "Graph Wavelet Alignment Kernels for Drug Virtual Screening," Department of Electrical Engineering and Computer Science and Molecular Graphics and Modeling Laboratory, University of Kansas.

[3] H. Kashima, K. Tsuda, and A. Inokuchi, "Marginalized kernels between labeled graphs," in Proc. of the Twentieth Int. Conf. on Machine Learning, 2003.

[4] Tamas Horvath, Thomas Gartner, and Stefan Wrobel, "Cyclic pattern kernels for predictive graph mining," SIGKDD, 2004.

[5] Holger Frohlich, Jorg K. Wegner, Florian Sieker, and Andreas Zell, "Optimal Assignment Kernels for Attributed Molecular Graphs."

[6] C. Helma, R. King, and S. Kramer, "The predictive toxicology challenge 2000-2001," Bioinformatics.

[7] C. Chang and C. Lin, "LIBSVM: a library for support vector machines," 2001. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
