This article was downloaded by: [Soongsil University]
On: 22 April 2012, At: 16:31
Publisher: Taylor & Francis
Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK
SAR and QSAR in Environmental Research
Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/gsar20
yaInChI: Modified InChI string scheme for line notation of chemical structures
Y.S. Cho a, K.T. No b & K.-H. Cho a
a Department of Bioinformatics and Research Center for Integrative Basic Science, SoongSil University, Seoul, Korea
b Department of Biotechnology, Yonsei University, Seoul, Korea
Available online: 02 Apr 2012
To cite this article: Y.S. Cho, K.T. No & K.-H. Cho (2012): yaInChI: Modified InChI string scheme for line notation of chemical structures, SAR and QSAR in Environmental Research, 23:3-4, 237-255
To link to this article: http://dx.doi.org/10.1080/1062936X.2012.657677
SAR and QSAR in Environmental Research
Vol. 23, Nos. 3–4, April–June 2012, 237–255
yaInChI: Modified InChI string scheme for line notation of chemical structures$£
Y.S. Cho a, K.T. No b and K.-H. Cho a*
a Department of Bioinformatics and Research Center for Integrative Basic Science, SoongSil University, Seoul, Korea; b Department of Biotechnology, Yonsei University, Seoul, Korea
(Received 30 August 2011; in final form 19 October 2011)
A modified InChI (International Chemical Identifier) string scheme, yaInChI (yet another InChI), is suggested as a method for including the structural information of a given molecule in a straightforward and more easily readable form. The yaInChI scheme is applicable for checking structural identity with higher sensitivity and for generating three-dimensional (3-D) structures from the one-dimensional (1-D) string with less ambiguity than the general InChI method. The modifications in yaInChI cover non-rotatable single bonds, the stereochemistry of organometallic compounds, allene and cumulene, and the parity of atoms with a lone pair. Additionally, yaInChI better preserves the original information of the given input file (SDF) using the protonation information, hydrogen count +1, and original bond type, which are not considered or only restrictively considered in InChI and SMILES. When yaInChI is used to perform a duplication check on a 3-D chemical structure database, Ligand.Info, it shows more discriminating power than InChI. The structural information provided by yaInChI is in a compact format, making it a promising solution for handling large chemical structure databases.
Keywords: line notation; duplication check; InChI; chemical database; SMILES
1. Introduction
Solving the 'One compound-One name' conundrum, that is, providing a unique name for each compound, has attracted major interest in chemistry and related fields. Presently, SMILES [1,2] and InChI [3–5] strings are the most widely used to provide compound names (or line notations). The original SMILES scheme was designed by Arthur and David Weininger in the 1980s [2] and has since been modified by others, affording several different algorithms for producing SMILES strings [6]. The SMILES strings share the same representation, but the generated SMILES strings differ from one algorithm to another depending on the string generation method and canonicalization algorithm. The canonicalization algorithm is used to assign unique numbers to atoms in a molecule, independent of the order of atoms in a given input file. Additionally, SMILES is difficult to apply to molecules with complicated structures.
*Corresponding author. Email: [email protected]
$Dedicated to the memory of Professor Corwin H. Hansch (1918–2011).
£Presented at CMTPI 2011: Computational Methods in Toxicology and Pharmacology Integrating Internet Resources (Maribor, Slovenia, 3–7 September 2011).
ISSN 1062–936X print/ISSN 1029–046X online
© 2012 Taylor & Francis
http://dx.doi.org/10.1080/1062936X.2012.657677
http://www.tandfonline.com
In contrast, InChI was developed in cooperation with the International Union of Pure and Applied Chemistry (IUPAC) and the National Institute of Standards and Technology (NIST). InChI is the latest method for describing chemical structures as strings and it overcomes the ambiguity associated with SMILES strings. The InChI string is derived from the chemical structure and uses a unique, layered and tautomer-friendly characterization. InChI was designed to assign one name to one compound using a universal canonical numbering system, affording a unique string. InChI uses a layered format, which serves a variety of aims and allows tautomeric forms to be described within the InChI string. Moreover, compared with SMILES, InChI can easily be applied to molecules with complicated structures. However, the InChI string is not easily readable because InChI represents all bond types with a single dash (-), which does not provide the number or location of double or triple bonds in a molecule. To estimate bond types, the user must understand orbital and valence theories, know the number of hydrogen atoms attached to the central atom and identify the charge. Additionally, it is very difficult to determine the number of rings and their sizes from the InChI string. Both the InChI and SMILES systems have several limitations in describing chemical structures: non-rotatable single bonds, allene or cumulene, the parity of atoms that have three branches and one lone pair such as amines, the stereochemistry of metal-containing compounds, and the generation of three-dimensional (3-D) structures from one-dimensional (1-D) strings. The advantages and disadvantages of SMILES and InChI are listed in Table 1 [7].
Table 1. Characteristics of SMILES and InChI methods.

[Structure drawing of the example compound]

|  | SMILES | InChI |
| --- | --- | --- |
| String | CCCS(=O)(=O)[N]1=NC=c2sc(C)cc2=B1O | InChI=1S/C9H13BN2O3S2/c1-3-4-17(14,15)12-10(13)8-5-7(2)16-9(8)6-11-12/h5-6,13H,3-4H2,1-2H3 |
| Characteristics | not unique; bond type is explicitly expressed; no tautomer information; difficult to generate string; ring information | unique; multiple layer information; supports tautomer information; low human readability; no ring information |

Large chemical databases often contain different compounds that share the same name, or the same compound stored under different names or IDs, which makes the database inefficient and difficult to use. The best way of checking molecule identity is to convert the 3-D structures to 1-D strings and then compare the outputs; however, doing so requires more sophisticated methods. We suggest a modified InChI scheme, yaInChI, to overcome the limitations of the current InChI (ver. 1.03) and to include as much structural information as possible, in order to limit the ambiguity that is present in other methods. However, some ambiguity is necessary for compatibility with different conventions. The yaInChI method could be beneficial for purposes such as checking structural identity, improving readability and generating 3-D structures from the corresponding 1-D string. yaInChI is available at http://ebio.ssu.ac.kr/yaInChI.
2. Methods
To avoid ambiguity and to enhance the readability of InChI, we propose a modified version of InChI called yaInChI. yaInChI was developed based on the InChI scheme, which means yaInChI inherits most of the layers from InChI. Additionally, yaInChI contains a few more layers such as /bt, /nr, /en, /mt and /mh, and modifies the /c, /t, /fh, /p and /q layers to provide more structural information. An outline of the yaInChI layers and a comparison to InChI are presented in Table 2. The main purpose of the yaInChI system is to include more structural information in a structure file. With the yaInChI string, one can compare molecules very easily and convert them from a 1-D string to 3-D structures with less ambiguity.
2.1 Input file format
The input for yaInChI is the standard SDF (structure data file) format [8] except for the extra atom information in the eighth column of the atom block, which is not used in the standard SDF format (Figure 1). The SDF format does not present tautomer information in the standard format, so some modifications were necessary. Tautomer-related mobile hydrogen information from a tautomer-detection program is reported in the eighth column. The tautomer information can be obtained from various algorithms [9].
The InChI system calculates the mobile hydrogen using an intrinsic tautomer detection algorithm based on balanced network searches (BNS) [10]. However, the accuracy of tautomer detection algorithms is still controversial, so instead of using a tautomer detection algorithm, yaInChI uses tautomer information from the input file. Atoms that belong to the same mobile hydrogen group have the same number in the eighth column, as shown in Figure 1. If the tautomer detection program can distinguish the stability of the tautomers, information such as 1A, 1B and 1C, where the number represents the number of the tautomer group and the letter represents the order of the tautomer stability, can be added by the user. All of the x, y and z coordinates are required for optimum performance because some stereochemical outputs are estimated from the coordinates; for example, the configurations of metal-containing compounds and the four types of special double bond stereochemistry, which are described in the next section.
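As a rough illustration of the input convention described above, the following Python sketch reads one atom-block line and picks out the fourth extra column (hydrogen count +1) and the eighth extra column (tautomer group label). It is only a sketch under stated assumptions: the function name `parse_atom_line` is hypothetical, the line is split on whitespace rather than by fixed-width fields, and the second example line with the label `1A` is made up for illustration.

```python
def parse_atom_line(line):
    """Split one atom-block line into coordinates, element and extra columns.

    Assumption: fields are whitespace-separated, and the columns after the
    element symbol are the "extra atom information" columns of the text.
    """
    fields = line.split()
    x, y, z = (float(value) for value in fields[:3])
    symbol = fields[3]
    extras = fields[4:]
    # Fourth extra column: hydrogen count +1 (0 means "not specified").
    h_plus_1 = int(extras[3]) if len(extras) > 3 else 0
    # Eighth extra column: tautomer group label such as "1" or "1A".
    label = extras[7] if len(extras) > 7 else "0"
    return {
        "xyz": (x, y, z),
        "symbol": symbol,
        "h_count_plus_1": h_plus_1,
        "excess_h": max(h_plus_1 - 1, 0),
        "tautomer_group": None if label == "0" else label,
    }


# Nitrogen atom line as it appears in Table 10(a).
n_atom = parse_atom_line("4.4382 -0.2142 0.9226 N 0 0 0 2 0 0 0 0")
# Hypothetical line carrying a tautomer group label in the eighth column.
tagged = parse_atom_line("0.0000 0.0000 0.0000 N 0 0 0 0 0 0 0 1A")
```

For the Table 10(a) nitrogen, the value 2 in the hydrogen count +1 column corresponds to one excess hydrogen, which is the information the /fh layer is meant to carry.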
2.2 Stereochemistry of special double bonds
Generally, the stereochemistry of some special double bonds such as allene or cumulene, and of non-rotatable single bonds, is expressed as cis or trans based on the assumption that all atoms involved in the stereochemistry are planar. However, the molecule stereochemistry (dihedral angle) is sometimes closer to −90° or +90° than to 0° or 180°. If one molecule has a dihedral angle of 89° and another has 91°, they will end up as cis and trans conformations, respectively, under the prototypical cis-trans definition. Instead, the yaInChI system uses four stereochemistry definitions to represent the stereochemistry in these special cases. Though seemingly complicated, the information is very useful for building a 3-D structure from a given 1-D string. The stereochemistry definitions used in this paper are presented in Table 3. Additionally, the yaInChI method does not represent the stereochemistry (/b, /en and /nr layers) of atoms in rings with fewer than seven members. The rings in a molecule are identified using the RP-Path [11].
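The four-way classification can be sketched in a few lines of Python. This is a minimal sketch, not the authors' implementation: the function name `stereo_symbol` is hypothetical, the interval boundaries are taken from Table 3 with the lower bound exclusive and the upper bound inclusive (an assumption about the table), and the ASCII characters `+`, `-`, `=` and `%` stand in for the four symbols.

```python
def stereo_symbol(angle_deg):
    """Map a dihedral angle in degrees to one of four stereo symbols."""
    # Wrap the angle into the interval [-180, 180).
    t = (angle_deg + 180.0) % 360.0 - 180.0
    if 45.0 < t <= 135.0:
        return "+"
    if -135.0 < t <= -45.0:
        return "-"
    if -45.0 < t <= 45.0:
        return "="
    # Remaining cases: t <= -135 or t > 135.
    return "%"


# The 89 and 91 degree example from the text now falls in the same class.
same_class = stereo_symbol(89.0) == stereo_symbol(91.0)
```

With a plain cis-trans split at 90°, the two angles in the example would receive different labels; with four classes they share one symbol, and the class boundaries move to ±45° and ±135°.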
Table 2. Identification of yaInChI layers and comparison with InChI.

| Layer | Meaning of layer | Difference between yaInChI and InChI | Included in canonicalization |
| --- | --- | --- | --- |
| 1. Main layer | | | |
| /f | chemical formula | Not changed | No |
| /c | connectivity | Modified, yaInChI specific /c layer | |
| /h | hydrogen (include mobile hydrogen) | Not modified, but takes information from given SDF file | |
| 2. Charge layer | | | |
| /q | net charge | Modified, net charge of molecule | No |
| /p | protonation | Modified, information of all protonated atoms | No |
| 3. Stereo layer | | | |
| /b | cis–trans double bond | Not changed | |
| /en | allene or cumulene | Added, structural information of series of double bonds | |
| /t | parity | Modified, includes atoms having three different branches with lone pair and four branches having three or four different branches | |
| /nr | non-rotatable bond | Added, structural information of non-rotatable single bond | |
| /mt | metal connectivity | Added, structural information of metal connectivity | |
| /m | parity inverted to obtain relative stereo | Deleted | No |
| /s | stereo type | Deleted | No |
| 4. Extra layer | | | |
| /i | isotope | Not changed | |
| /mh | tautomer-specific hydrogen | Added, original tautomer specific hydrogen information | No |
| /fh | hydrogen count +1 | Added, original value of hydrogen count +1 column | No |
| /bt | bond table | Added, bond information of given input | No |
Figure 1. SDF format column information.

Table 3. Stereochemistry definitions for special double bonds.

| Dihedral angle (T) | Symbol |
| --- | --- |
| +45° < T ≤ 135° | + |
| −135° < T ≤ −45° | − |
| −45° < T ≤ 45° | = |
| T ≤ −135° or 135° < T | % |

2.3 Connectivity layer, /c
The /c layer contains the unique atom numbers and their connection table values based on the canonicalization process (described in Section 2.11). In InChI, the atom having not only the smallest number of branches but also the smallest canonical number (see Section 2.11 for canonical numbers) is selected as the starting atom. The remaining atoms are ordered from the smallest canonical number using the connection table. In contrast, the /c layer of yaInChI is designed to use the longest path among the pairwise shortest paths found by the Floyd and Warshall algorithm [12] as the main connectivity string (main chain), providing a rough estimate of the molecular length. If the longest paths are of the same length, the path containing end-point atoms with the smallest number of branches is selected as the main chain. Then, the branch connectivity strings are added to the front of the main chain using a similar method as above based on the connection table data. The newly generated strings are merged into the previously generated string using parentheses. This process is repeated until all information of the connection table is used. Rings are expressed using the same number twice. In Table 4, '1(3-7(8)4)2-6-4' means that atom numbers 1, 3, 7, 4, 6 and 2 make a six-membered ring, and '1-2-6-4-5' is the longest path in the molecule. With this scheme, the length of the molecule, the number of rings and their sizes, the number of branches and the overall molecule shape can be visualized.
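The main-chain search can be illustrated with a small Floyd-Warshall sketch in Python. It is a simplified illustration rather than the yaInChI implementation: the function names are hypothetical, the bond list is read off the /c string '1(3-7(8)4)2-6-4-5' from Table 4, and the tie-breaking rules between equally long candidate paths are not reproduced.

```python
from itertools import product


def floyd_warshall(atoms, bonds):
    """All-pairs shortest path lengths (in bonds) and a next-hop table."""
    inf = float("inf")
    dist = {i: {j: (0 if i == j else inf) for j in atoms} for i in atoms}
    nxt = {i: {j: (j if i == j else None) for j in atoms} for i in atoms}
    for a, b in bonds:
        dist[a][b] = dist[b][a] = 1
        nxt[a][b], nxt[b][a] = b, a
    # product() varies the first index slowest, so k is the outer loop.
    for k, i, j in product(atoms, repeat=3):
        if dist[i][k] + dist[k][j] < dist[i][j]:
            dist[i][j] = dist[i][k] + dist[k][j]
            nxt[i][j] = nxt[i][k]
    return dist, nxt


def longest_shortest_pairs(dist):
    """Length of the longest shortest path and the atom pairs that reach it."""
    longest = max(d for row in dist.values() for d in row.values())
    pairs = [(i, j) for i in dist for j in dist[i]
             if i < j and dist[i][j] == longest]
    return longest, pairs


def walk(nxt, start, end):
    """Rebuild one shortest path from the next-hop table."""
    path = [start]
    while path[-1] != end:
        path.append(nxt[path[-1]][end])
    return path


atoms = list(range(1, 9))
bonds = [(1, 3), (3, 7), (7, 8), (7, 4), (1, 2), (2, 6), (6, 4), (4, 5)]
dist, nxt = floyd_warshall(atoms, bonds)
length, pairs = longest_shortest_pairs(dist)
main_chain_candidate = walk(nxt, 1, 5)
```

For this molecule the longest shortest path spans four bonds and is reached by the atom pairs (1, 5) and (2, 8). Atoms 1 and 5 are joined by two shortest paths of equal length, one through each side of the ring, so the path returned by `walk` may differ from the '1-2-6-4-5' chain quoted in the text.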
2.4 Charge layer, /q and /p
The /q and /p layers in yaInChI store information on the net charge and the protons, respectively, of a given molecule. These definitions differ from those of the InChI system. InChI changes the charged state and bond types by adding extra protons to radicals, disconnecting salts and metals, and recalculating the formal charges according to the new state in the normalization step [13]. This process in the InChI system loses the original charge distribution information. The yaInChI system also uses a normalization step to neutralize the molecule charge distribution, but maintains salt and metal structures, takes tautomer information from the input file and represents the original charge distribution in the /p layer. The original charge distribution information is provided by the SDF (atom block in the 'proton' column). The utilization of the /p layer in InChI and yaInChI for keeping the protonation information for molecules (a) and (b) is shown in Table 4. InChI generates the same string for both (a) and (b); however, yaInChI generates different strings for each molecule according to the information contained in the input file. Information in /p could affect the /mh and /bt layers; therefore, when the /p, /mh and /bt layers are eliminated, yaInChI generates the same string for both molecules.

Table 4. Representation of charge information in yaInChI.¹

[Structure drawings: (a) and (b) share the skeleton C1, C2, C3, C4, N5, N6, N7, O8; (a) is drawn with formal charges +1 and −1, (b) with charges 0 and 0]

|  | (a) | (b) |
| --- | --- | --- |
| yaInChI | yaInChI=/fC4H5N3O/c1(3-7(8)4)2-6-4-5/h1-3H,(H2,5,6)/mh5H2/p7Y3,8Y5/bt44441441 | yaInChI=/fC4H5N3O/c1(3-7(8)4)2-6-4-5/h1-3H,(H2,5,6)/mh5-6H/bt21122112 |
| InChI | InChI=1S/C4H5N3O/c5-4-6-2-1-3-7(4)8/h1-3H,(H2,5,6) | InChI=1S/C4H5N3O/c5-4-6-2-1-3-7(4)8/h1-3H,(H2,5,6) |

¹ Additionally, eliminating the /p, /mh and /bt layers from yaInChI results in the same output as InChI.
For the duplication check, /p, /mt and /bt layers were not considered, but could be included depending on the desired sensitivity. The net charge of the molecule in Table 4 column (a) was 'zero' and therefore the /q layer does not appear in the string.
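The idea of adjusting sensitivity by dropping layers can be sketched as simple string processing. The following Python sketch is an illustration only: the helper names `split_layers` and `strip_layers` are hypothetical, layer contents are treated as opaque text, and the two input strings are copied from Table 4 as they appear there.

```python
# Two-letter layer tags must be tested before one-letter tags.
TWO_LETTER_TAGS = ("bt", "nr", "en", "mt", "mh", "fh")


def split_layers(string):
    """Split 'yaInChI=/f.../c...' into its prefix and (tag, content) pairs."""
    prefix, _, body = string.partition("=/")
    layers = []
    for chunk in body.split("/"):
        tag = chunk[:2] if chunk[:2] in TWO_LETTER_TAGS else chunk[:1]
        layers.append((tag, chunk[len(tag):]))
    return prefix, layers


def strip_layers(string, drop=("p", "mh", "bt")):
    """Return the string with the listed layers removed."""
    prefix, layers = split_layers(string)
    kept = [tag + content for tag, content in layers if tag not in drop]
    return prefix + "=/" + "/".join(kept)


mol_a = ("yaInChI=/fC4H5N3O/c1(3-7(8)4)2-6-4-5/h1-3H,(H2,5,6)"
         "/mh5H2/p7Y3,8Y5/bt44441441")
mol_b = ("yaInChI=/fC4H5N3O/c1(3-7(8)4)2-6-4-5/h1-3H,(H2,5,6)"
         "/mh5-6H/bt21122112")
duplicates = strip_layers(mol_a) == strip_layers(mol_b)
```

Comparing the full strings treats (a) and (b) as different entries, whereas comparing the stripped strings treats them as duplicates; the choice of `drop` is where the desired sensitivity would be set.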
2.5 Cumulene layer, /en
InChI and SMILES can manage the stereochemistry of cumulene structures. The InChI system uses even and odd numbers of double bonds to determine the stereochemistry. An even number of double bonds is treated in the /t layer (parity layer) and suggests a tetrahedral structure, whereas an odd number of double bonds is treated in the /b layer (cis–trans layer) and indicates a cis–trans conformation [14]. However, in some cases, a cumulene could have a cis or trans conformation even though it has an even number of double bonds, and could have a tetrahedral conformation with an odd number of double bonds, due to the steric constraints of the entire molecule. Unfortunately, InChI cannot separate those cases correctly. Therefore, yaInChI utilizes the /en layer to avoid any uncertainty related to cumulene (Tables 5 and 6). The information in the /en layer was calculated using the dihedral angle defined by the atoms at both ends of the double bond series and, at each end, the attached atom of larger canonical number. For example, the dihedral angle of the four atoms C1-C3-C12-C11 in Table 6 was measured. The angle definition of the /en layer is presented in Table 3. The information in the /en layer consists of two numbers and one symbol between them, which represent the atoms at both ends of the double bond series and one of the four types of stereochemistry, respectively.

Table 5. Misrepresentation of allene stereochemistry using InChI for (a) 1-[2-(1,1,4,8-Tetramethyl-nona-2,3,7-trienyl)-oxazolidin-3-yl]-ethanone and (b) 2-methyl-2,3-pentadien-1-amine.

[Structure drawings: (a) atoms C1–C18, N19, O20, O21; (b) atoms C1–C6, N7]

|  | (a) | (b) |
| --- | --- | --- |
| yaInChI | yaInChI=/fC18H29NO2/c1-14(2)8-7-9-15(3)10-11-18(5,6)17(21-13-12-19)19-16(20)4/h8,11,17H,7,9,12-13H2,1-6H3/en11^15/t17-/nr19+16/bt111111112122111112111 | yaInChI=/fC6H11N/c1-3-4-6(2)5-7/h3H,5,7H2,1-2H3/en3^6/bt112211 |
| InChI | InChI=1S/C18H29NO2/c1-14(2)8-7-9-15(3)10-11-18(5,6)17-19(16(4)20)12-13-21-17/h8,11,17H,7,9,12-13H2,1-6H3/t10?,17-/m1/s1 | InChI=1S/C6H11N/c1-3-4-6(2)5-7/h3H,5,7H2,1-2H3 |
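The dihedral angle that feeds the /en layer can be computed from four atomic positions. The sketch below uses a standard atan2-based formula in plain Python; it is not taken from the yaInChI source, the function name `dihedral_deg` is hypothetical, and the sign convention (which sense of rotation counts as positive) is an assumption that would need to match the convention behind Table 3.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_deg(p0, p1, p2, p3):
    """Signed dihedral angle p0-p1-p2-p3 in degrees.

    For an allene or cumulene, p1 and p2 would be the two end atoms of the
    double bond series and p0, p3 the chosen substituents on each end.
    """
    b0 = _sub(p0, p1)
    axis = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(axis, axis))
    axis = tuple(x / norm for x in axis)
    # Project the two outer vectors onto the plane perpendicular to the axis.
    v = _sub(b0, tuple(_dot(b0, axis) * x for x in axis))
    w = _sub(b2, tuple(_dot(b2, axis) * x for x in axis))
    return math.degrees(math.atan2(_dot(_cross(axis, v), w), _dot(v, w)))


# Idealized geometries along the z axis: planar cis and perpendicular.
cis = dihedral_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))
perpendicular = dihedral_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
```

Because only the projections perpendicular to the axis are used, the same function works whether the two middle points are directly bonded or are the ends of a longer chain of double bonds.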
2.6 Parity layer, /t
The concept of parity is similar to chirality and provides information pertaining to the spatial direction of four branches attached to the centre atom. Parity uses canonical numbers of atoms instead of weights or branch priority. In InChI, only atoms having four different branches or centre atoms of even numbers of double bonds (cumulene) can have parity and are expressed in the /t layer, but in yaInChI, any sp3 atom, for example, an atom with three branches and one lone pair, is also included. The lone pair cannot change its position freely and thus is included in the parity layer, /t. The yaInChI system indicates parity of atoms having both four and three different types of branches to provide selectivity in situations such as N15 (Table 6). C13 (Table 6) has only three different types of branches; however, without displaying the parity on C13, the two molecules are not distinguishable. For reference, the /m and /s layers (the stereo options in InChI) were not used in yaInChI because the /m and /s layers are subordinate to the /t layer and stereoisomerism related to the /t layer can be distinguished without these layers. The symbols following the atom numbers, '+' and '−', indicate clockwise and counter-clockwise spatial arrangements of atoms with increasing canonical numbers, respectively. The lone pair has the lowest priority.
Table 6. Example of /en and /t layers for hypothetical isomers.¹

[Structure drawings: (a) and (b), atoms C1–C13, N14, N15]

|  | (a) | (b) |
| --- | --- | --- |
| yaInChI | yaInChI=/fC13H22N2/c1-3-4-12(11-14-13-8-10-15)5-6-13-7-9-15-2/h3,14H,5-11H2,1-2H3/en3%12/t13-,15-/bt1122111111111111 | yaInChI=/fC13H22N2/c1-3-4-12(11-14-13-8-10-15)5-6-13-7-9-15-2/h3,14H,5-11H2,1-2H3/en3^12/t13-,15Y/bt1122111111111111 |
| InChI | InChI=1S/C13H22N2/c1-3-4-12-5-6-13(14-11-12)7-9-15(2)10-8-13/h3,14H,5-11H2,1-2H3 | InChI=1S/C13H22N2/c1-3-4-12-5-6-13(14-11-12)7-9-15(2)10-8-13/h3,14H,5-11H2,1-2H3 |

¹ The lone pair of N15 in (a) is closer to N14, whereas the lone pair of N15 in (b) is closer to C6. The yaInChI strings for these molecules are different in the /en and /t layers; however, InChI considers (a) and (b) to be the same.
2.7 Non-rotatable single bond layer, /nr
A non-rotatable single bond is a single bond that cannot rotate freely, such as a peptide bond in proteins. The peptide bond, C–N, is presented as a single bond in SDF format but it cannot rotate freely because the sp2–sp2 hybridization gives it double bond character. Molecules can have different stereochemistry around the C–N bond, cis or trans, with different properties. However, SMILES and InChI do not handle non-rotatable single bonds and consider these molecules to be the same. In contrast, the yaInChI system uses the /nr layer, which provides information about non-rotatable single bonds, including the amide group and sp2 carbons connected to three nitrogen atoms as in hydroxyl arginine. Because non-rotatable single bonds can have angles close to 90° and −90°, the /nr layer follows the four types of stereochemistry described in Section 2.2. Further, non-rotatable single bonds can exist in various forms within the same molecule; for example, an amide can transform to an imidic acid by tautomerization (Table 7). Therefore, yaInChI gives the same string for both (a) the imidic acid in the cis form and (b) the amide in the cis form with non-rotatable single bond information. The InChI system indicates the cis form for both cases; however, it cannot distinguish the stereochemistry around the non-rotatable bonds if they are different, as in (c), the amide in the trans form.
The information in the /nr layer consists of two numbers and one symbol between the numbers, which represent the two atoms at both ends of the non-rotatable single bond and one of the stereochemistry cases, respectively.
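Since the /nr and /en layers share the "number, symbol, number" layout, a single small parser can read both. The Python sketch below is illustrative only: the name `parse_bond_stereo` is hypothetical, the symbol is taken to be any single non-digit character so that the parser does not depend on which glyphs are used, and the handling of several comma-separated entries is an assumption that the text does not confirm.

```python
import re

# One entry: atom number, one non-digit symbol, atom number.
_ENTRY = re.compile(r"^(\d+)(\D)(\d+)$")


def parse_bond_stereo(layer_content):
    """Parse /nr or /en layer content into (atom, symbol, atom) tuples."""
    entries = []
    for entry in layer_content.split(","):  # comma separation is assumed
        match = _ENTRY.match(entry)
        if match is None:
            raise ValueError("unrecognized entry: " + entry)
        first, symbol, second = match.groups()
        entries.append((int(first), symbol, int(second)))
    return entries


# Layer contents as they appear in Tables 5 and 7.
cis_amide = parse_bond_stereo("4^3")
trans_amide = parse_bond_stereo("4%3")
oxazolidine = parse_bond_stereo("19+16")
```

For the N-methylacetamide entries of Table 7, the cis and trans amides parse to the same atom pair with different symbols, which is exactly the difference that the /nr layer adds on top of InChI.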
2.8 Metal connectivity layer, /mt
In InChI, all metal atoms of organometallic compounds are disconnected in the main layer and are not considered as a part of the molecule. With the 'reconnect' option, the user is able to manage the metal connectivity but not the stereochemistry of the metal atom.
Table 7. Example of /nr layer related to tautomers of N-methylacetamide.¹

[Structure drawings: atoms C1, C2, C3, N4, O5 in each of (a), (b) and (c)]

|  | (a) Imidic acid cis form | (b) Amide cis form | (c) Amide trans form |
| --- | --- | --- | --- |
| yaInChI | yaInChI=/fC3H7NO/c1-3(5)4-2/h1-2H3,(H,4,5)/nr4^3/mh5H/bt1121 | yaInChI=/fC3H7NO/c1-3(5)4-2/h1-2H3,(H,4,5)/nr4^3/mh4H/bt1112 | yaInChI=/fC3H7NO/c1-3(5)4-2/h1-2H3,(H,4,5)/nr4%3/mh4H/bt1112 |
| InChI | InChI=1S/C3H7NO/c1-3(5)4-2/h1-2H3,(H,4,5) | InChI=1S/C3H7NO/c1-3(5)4-2/h1-2H3,(H,4,5) | InChI=1S/C3H7NO/c1-3(5)4-2/h1-2H3,(H,4,5) |

¹ The compounds are the same in (a) and (b) whereas (c) is a different compound from (a) and (b) because of the non-rotatable single bond. The user can determine the level of identification sensitivity by including or excluding the /nr and /mh layers.
In contrast, yaInChI considers the stereochemistry of metals, distinguishes molecules having different metal connectivity and preserves the original structural information. Table 8 shows an example of organometallic compounds with different stereochemistry. Metals in molecules can have various hybridization states and geometries (shapes). The yaInChI system was devised to treat metals with up to six bonds, which can have nine different shapes in total (Table 9). The stereochemistry of distorted molecules was estimated using the provided atomic coordinates and fitted to one of the nine shapes.
The first number in the /mt layer indicates the canonical number of the metal (centre atom) and the numbers after ':' indicate the atoms attached to the centre atom. In the case of two and three branches, the different symbols between the numbers, such as '-', '=' and '_', indicate different shapes. In the case of two, three and four branches, the first number after ':' is always the smallest number among the attached atoms (in this case 1) and the second number is the next atom in the clockwise direction, and so on. With five and six branches, the numbers in parentheses indicate atoms in the plane, starting from the smallest number followed by numbers in the clockwise direction. The number before '(' is the axial atom with the smaller canonical number and the number after ')' is the axial atom with the larger canonical number. The atoms in the plane and in the axial direction are estimated from the given atomic coordinates.
One of the purposes of yaInChI is to use the string to generate a 3-D structure. If the initial structure is poor, the 3-D structure may have the wrong geometry regardless of whether energy minimization is used. This layer provides useful information when generating a 3-D structure from the 1-D string.
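To make the /mt notation concrete, the following Python sketch splits one entry into the metal atom, the axial atoms (if any) and the ordered ring of attached atoms. The function name `parse_mt` and the dictionary keys are hypothetical, and the separator is read as any single non-digit character because the strings in Table 8 and the shapes in Table 9 use different separator glyphs.

```python
import re


def parse_mt(entry):
    """Parse one /mt entry such as '5:1-2-3-4' or '6:1(2=3=4)5'."""
    metal, _, rest = entry.partition(":")
    with_axial = re.fullmatch(r"(\d+)\(([^)]*)\)(\d+)", rest)
    if with_axial:
        # Five or six branches: axial atom, in-plane atoms, axial atom.
        low, plane, high = with_axial.groups()
        axial = (int(low), int(high))
    else:
        # Two to four branches: a flat, ordered list of attached atoms.
        plane, axial = rest, None
    return {
        "metal": int(metal),
        "symbol": re.search(r"\D", plane).group(),
        "axial": axial,
        "ring": [int(n) for n in re.split(r"\D", plane)],
    }


pyramid = parse_mt("5:1_2_3_4")            # notation from Table 9
bipyramid = parse_mt("6:1(2=3=4)5")        # notation from Table 9
ruthenium_a = parse_mt("6:1(2^3^5)4")      # entry from Table 8(a)
ruthenium_b = parse_mt("6:3(1^2^5)4")      # entry from Table 8(b)
```

The two ruthenium entries from Table 8 parse to the same metal atom but to different axial pairs, (1, 4) and (3, 4), which is how the two stereoisomers end up with different strings.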
Table 8. yaInChI displays stereochemistry of metal-containing compounds.

[Structure drawings: (a) and (b), atoms N1, N2, O3, O4, P5 around Ru6]

|  | String |
| --- | --- |
| yaInChI (a) | yaInChI=/fH8N2O2PRu/c1-6(3,4,5)2/h3-4H,1-2,5H2/q+2/mt6:1(2^3^5)4/p6+2/bt11111 |
| yaInChI (b) | yaInChI=/fH8N2O2PRu/c1-6(3,4,5)2/h3-4H,1-2,5H2/q+2/mt6:3(1^2^5)4/p6+2/bt11111 |
| InChI(OB) | InChI=1S/2H2N.2H2O.H2P.Ru/h5*1H2;/q2*-1;;;-1;+7/p-2 |
| InChI(OB) with 'reconnect' option | InChI=1/2H2N.2H2O.H2P.Ru/h5*1H2;/q2*-1;;;-1;+7/p-2/rH8N2O2PRu/c1-6(2,3,4)5/h3-4H,1-2,5H2/q+2 |

Table 9. Types of hybridization of metal-containing compounds and notation using /mt layer.

| Number of branches | Metal connectivity type | /mt notation |
| --- | --- | --- |
| 2 | Bent | 3:1-2 |
| 2 | Linear | 3:1=2 |
| 3 | Horn | 4:1-2-3 |
| 3 | Trigonal planar | 4:1=2=3 |
| 4 | Tetrahedral | 5:1-2-3-4 |
| 4 | Plane | 5:1=2=3=4 |
| 4 | Pyramid | 5:1_2_3_4 |
| 5 | Trigonal bipyramid | 6:1(2=3=4)5 |
| 6 | Octahedral | 7:1(2=3=4=5)6 |

2.9 Extra hydrogen layer, /mh and /fh
The extra hydrogen layer of yaInChI consists of two parts, /mh and /fh. The /mh layer represents the tautomer-specific hydrogen, which means the location of the hydrogen among the paired atoms in the tautomer parentheses (/h layer). A tautomer is an isomer of an organic compound that readily converts from one form to another at room temperature.
Representing extra hydrogen information due to tautomerization in yaInChI is somewhat different from InChI. The InChI system represents the mobile hydrogen groups for tautomers in the /h layer by placing paired atoms in parentheses. For example, '(H2,5,6)' indicates that two hydrogen atoms are connected to the 'N5' or 'N6' atom and that these hydrogens can migrate from one location to the other (Table 4). InChI calculates mobile hydrogen using an intrinsic BNS-based tautomer detection algorithm [13]; however, the accuracy of tautomer-detection algorithms is questionable. The yaInChI system uses the tautomer information provided in the input file, as explained in Section 2.1. If a tautomer detection program is elaborate enough to distinguish the stability of tautomers, one can add further information such as 1A, 1B and 1C in the column. In that case, the numbers in the /mh layer follow the order of stability, not the order of canonical numbers. Various tautomer detection algorithms could be used according to the user's choice.
The /fh layer contains the information in the hydrogen count +1 column (the fourth column of extra atom information in the atom block, Figure 1), which gives the number of excess hydrogen atoms. InChI does not retain the information in the hydrogen count +1 column (Table 10); however, yaInChI includes this information and so remains faithful to the input file.
2.10 Bond type layer, /bt
In the InChI scheme, the various types of bonds in a molecule are not explicitly presented because it is impossible to present defined bond types when a molecule has tautomers and various protonated states. Bond types could be calculated from given information such as atom types, number of attached hydrogen atoms and the charged state. However, in molecules with complicated structures, assigning bond types is ambiguous and aromaticity calculation from non-aromatic bond types is difficult.

Table 10. Representation of /fh layer in yaInChI.¹

[Structure drawings: (a) and (b), atoms C1–C7 and N8; in (a) a hydrogen is drawn on N8]

(a) SDF atom and bond blocks:
      8  8  0  0  0  0  0  0  0  0999 V2000
        4.9974   -2.5284    0.3886 C   0  0  0  0  0  0  0  0
        4.0236   -1.4336    0.5516 C   0  0  0  0  0  0  0  0
        4.4382   -0.2142    0.9226 N   0  0  0  2  0  0  0  0
        3.3820    0.6084    1.0004 C   0  0  0  0  0  0  0  0
        3.5628    2.0101    1.4080 C   0  0  0  0  0  0  0  0
        2.2107   -0.0952    0.6709 C   0  0  0  0  0  0  0  0
        2.6317   -1.4059    0.3775 C   0  0  0  0  0  0  0  0
        1.7871   -2.5412   -0.0647 C   0  0  0  0  0  0  0  0
      1  2  1  0  0  0
      2  3  4  0  0  0
      3  4  4  0  0  0
      4  5  1  0  0  0
      4  6  4  0  0  0
      6  7  4  0  0  0
      2  7  4  0  0  0
      7  8  1  0  0  0

(b) SDF atom and bond blocks:
      8  8  0  0  0  0  0  0  0  0999 V2000
        4.9974   -2.5284    0.3886 C   0  0  0  0  0  0  0  0
        4.0236   -1.4336    0.5516 C   0  0  0  0  0  0  0  0
        4.4382   -0.2142    0.9226 N   0  0  0  0  0  0  0  0
        3.3820    0.6084    1.0004 C   0  0  0  0  0  0  0  0
        3.5628    2.0101    1.4080 C   0  0  0  0  0  0  0  0
        2.2107   -0.0952    0.6709 C   0  0  0  0  0  0  0  0
        2.6317   -1.4059    0.3775 C   0  0  0  0  0  0  0  0
        1.7871   -2.5412   -0.0647 C   0  0  0  0  0  0  0  0
      1  2  1  0  0  0
      2  3  4  0  0  0
      3  4  4  0  0  0
      4  5  1  0  0  0
      4  6  4  0  0  0
      6  7  4  0  0  0
      2  7  4  0  0  0
      7  8  1  0  0  0

|  | (a) | (b) |
| --- | --- | --- |
| yaInChI | yaInChI=/fC7H11N/c1-5(7-3)4-6(8-7)2/h4,8H,1-3H3/fh8H/bt11144444 | yaInChI=/fC7H10N/c1-5(7-3)4-6(8-7)2/h4H,1-3H3/bt11144444 |
| InChI | InChI=1S/C7H11N/c1-5-4-6(2)8-7(5)3/h4,8H,1-3H3 | InChI=1S/C7H11N/c1-5-4-6(2)8-7(5)3/h4,8H,1-3H3 |

¹ The yaInChI system provides hydrogen count +1 in a column of the SDF format. The N8 atom of molecule (a) has '2' in the hydrogen count +1 column in the SDF, which means one excess hydrogen, while (b) does not. Neither InChI (IUPAC) nor InChI (OB) considers this information, which means InChI generates the same string for both molecules.
The yaInChI system presents the molecule bond type in the /bt layer. With the /bt layer, the yaInChI string is able to conserve the original bond type information considering tautomer-specific forms and charged states. The /bt layer information is generated from the bond information in the SDF format. To generate the /bt layer, the bond information is sorted in ascending order using lexicographical comparison (within each bond, the first and second atoms are ordered by atom number in ascending order, and then the atom pairs are sorted lexicographically in ascending order), for example, (1,2) < (2,3) and (3,4) < (3,5). The numbers in the bond type column (the third column of the bond information, Figure 1) range from 1 to 8. The numbers correspond to the SDF format definition (1 = single, 2 = double, 3 = triple, 4 = aromatic, 5 = single or double, 6 = single or aromatic, 7 = double or aromatic, and 8 = any). The atom numbers connected by each bond do not need to be displayed in the /bt layer because this information can be extracted from the connectivity layer, /c.
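The sorting rule can be sketched directly. In the Python sketch below, the function name `bond_type_layer` is hypothetical, and the example bond list is an assumption: the atom pairs are read off the /c string '1-5(7-3)4-6(8-7)2' of Table 10, with the three methyl bonds taken as single (1) and the five ring bonds as aromatic (4), following the bond block shown in that table.

```python
def bond_type_layer(bonds):
    """Build the /bt digit string from (atom, atom, bond_type) triples.

    Atoms are canonical numbers; bond types use the SDF codes 1 to 8.
    """
    # Order the two atoms inside each bond, then sort the pairs.
    normalized = [(min(a, b), max(a, b), kind) for a, b, kind in bonds]
    normalized.sort(key=lambda bond: (bond[0], bond[1]))
    return "".join(str(kind) for _, _, kind in normalized)


# Bonds of the Table 10 molecule in canonical numbering (assumed types).
table10_bonds = [
    (1, 5, 1), (5, 7, 4), (7, 3, 1), (5, 4, 4),
    (4, 6, 4), (6, 8, 4), (8, 7, 4), (6, 2, 1),
]
bt = bond_type_layer(table10_bonds)
```

Under these assumptions the sorted pairs are (1,5), (2,6), (3,7), (4,5), (4,6), (5,7), (6,8) and (7,8), and the resulting digit string agrees with the `/bt11144444` layer shown in Table 10.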
The purpose of the /bt layer is to display the bond type of a molecule given by the SDF so that the information could be used to convert from a 1-D string to a 3-D structure with less ambiguity. Because the /bt layer may vary in different tautomer or protonation states, even with the bond type represented in the SDF file, this layer could be eliminated for duplication checks. Table 4 shows an example of a molecule with two different /bt layers.
2.11 Modified canonicalization algorithm
It is necessary to generate the same 1-D string for a molecule, even if the atoms are entered into the SDF in a different order. To do that, InChI uses a canonicalization algorithm, which algorithmically generates a set of unique atom labels (canonical numbers) [14]. The InChI canonicalization algorithm consists of four major steps:
(A) After removing all hydrogen atoms, atoms are labelled by considering atom name and number of connections.
(B) After adding hydrogen atoms to heavy atoms except mobile hydrogens, all atoms are re-labelled with hydrogen connection.
(C) After adding isotopic composition to the structure, the atoms are re-ordered.
(D) Finally the canonical numbers are obtained by considering stereochemistry.
The yaInChI system has several extra layers compared with InChI and therefore requires a modified canonicalization algorithm. Though similar to the InChI algorithm, the new algorithm includes the /en, /nr and /mt layers in Major Step D [14] together with the /b and /t layers (see Table 2). For the /en and /nr layers, the priority of symbols is ‘+’ > ‘-’ > ‘=’ > ‘%’, and for the /mt layer, the priority is ‘-’ > ‘=’ > ‘_’. (See Table 9 for the definition of ‘-’, ‘=’, ‘_’ in the /mt layer.)
Other added or modified layers such as /p, /mh, /fh and /bt were not included in the modified canonicalization algorithm because they do not represent different molecules but rather different states. Therefore, when applying the yaInChI string to the canonicalization and duplication check, the /p, /mh, /fh and /bt layers should not be considered. However, to better preserve the original information of the given input file, these layers should be included in the yaInChI string. The /q layer was also removed from the modified
canonicalization algorithm because it represents the state of the molecule and not a specific atom, but could be included in the duplication check.
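As a rough illustration of leaving the state layers out of a duplication check, the following Python sketch removes the /p, /mh, /fh and /bt layers from a yaInChI-like string. The function name `duplicate_check_key` is hypothetical, the sample strings are made up to resemble the strings in the tables rather than real yaInChI output, and the sketch assumes that layers are separated by '/' and can be recognized by a simple prefix match.

```python
# Layers that describe a state of the molecule rather than a different molecule.
STATE_LAYER_PREFIXES = ("p", "mh", "fh", "bt")


def duplicate_check_key(yainchi: str) -> str:
    """Return the string without the /p, /mh, /fh and /bt layers (sketch)."""
    head, *layers = yainchi.split("/")
    kept = [layer for layer in layers
            if not layer.startswith(STATE_LAYER_PREFIXES)]
    return "/".join([head, *kept])


# Hypothetical strings that differ only in their state layers.
first = "yaInChI=/fC2H6O/c1-2-3/h3H,2H2,1H3/mh3H/bt11"
second = "yaInChI=/fC2H6O/c1-2-3/h3H,2H2,1H3/bt11"

print(duplicate_check_key(first))
print(duplicate_check_key(first) == duplicate_check_key(second))
```

The full string, including the state layers, would still be stored so that the original input information is preserved; only the comparison key omits them.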
2.12 Test chemical dataset
Of the 1,159,274 molecules in the Ligand.Info Meta Database (ver. 1.02) [15], 1,140,785 were used to test the ability of yaInChI. Entries that lacked 3-D coordinates (1,244) or contained multiple molecules (16,298) were removed. Entries with the atomic symbol ‘A’ (947) were also eliminated because OpenBabel™ (ver. 2.3.0) [16], with which we wanted to compare, could not handle these molecules. Finally, two hypothetical molecules (Table 8) were added to the test chemical data set because there were no examples in Ligand.Info that related to different organometallic stereochemistry. Therefore, 1,140,787 molecules were used for the test chemical data set. Ligand.Info was chosen because it provides a convincing collection of biologically active compounds with 3-D structures and because duplication checks using SMILES have been reported for it. Those results indicated that 1,016,389 compounds out of 1,159,224 compounds (87.7%) were unique [15].
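The data set sizes quoted above can be checked with a few lines of Python. The variable names are illustrative only; all numbers are taken from the text.

```python
total_in_database = 1_159_274

# Entries removed before testing, as listed in the text.
removed = {
    "no 3-D coordinates": 1_244,
    "multiple molecules": 16_298,
    "atomic symbol 'A'": 947,
}

retained = total_in_database - sum(removed.values())
test_set_size = retained + 2  # two hypothetical molecules from Table 8

# Uniqueness reported for the SMILES-based check in reference [15].
smiles_unique_percent = round(100 * 1_016_389 / 1_159_224, 1)

print(retained, test_set_size, smiles_unique_percent)
```

The count of 1,140,785 therefore already reflects all three removals, and adding the two hypothetical molecules gives the final 1,140,787.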
3. Results and discussion
The InChI system was originally developed in cooperation with IUPAC and NIST, and then implemented in other software. The original InChI (IUPAC) has several discrepancies between the manual [13] and the software (ver. 1.03). The InChI scheme implemented in OpenBabel™ (ver. 2.3.0), InChI(OB), has fewer bugs and was therefore used to test the yaInChI system for duplication.
In general, the yaInChI method is similar to InChI. However, yaInChI was developed to contain more information on stereochemistry by using the /en, /t, /nr and /mt layers, and to improve the amount of structural information by utilizing charge (/q and /p), extra hydrogen (/fh), tautomers (/mh), connectivity (/c) and bond type (/bt) layers. As mentioned previously, the /p, /fh, /mh and /bt layers were not considered in the yaInChI duplication check because they represent different states, not different molecules. In the case of the InChI system, the /p layer was included in the duplication check because it is defined differently.
The InChI and yaInChI methods generally consider tautomeric structures to be the same molecule. Because yaInChI takes the tautomer information from the input file, tautomer information from InChI(OB) was used for fair comparison. The test chemical dataset described in Section 2.12 was used to check redundancy. The results from the duplication check for both methods are presented in Table 11 and shown in Figure 2.
Table 11. Duplication check results for InChI and yaInChI.1
Method     Total        Unique group   Duplicated group
InChI      1,140,787    998,016        142,771
yaInChI    1,140,787    998,076        142,711
1 InChI strings are not modified and the yaInChI strings do not include the /mh, /p, /fh and /bt layers for duplication check.
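A duplication check of this kind can be sketched in Python by counting distinct strings. This is an illustration rather than the procedure used in the study: the function name `duplication_summary` is hypothetical, and reading "Unique group" as the number of distinct strings and "Duplicated group" as the remainder is an inference from the fact that the two columns of Table 11 sum to the total.

```python
from collections import Counter
from typing import Dict, Iterable


def duplication_summary(keys: Iterable[str]) -> Dict[str, int]:
    """Count distinct line-notation strings and the entries that repeat them."""
    counts = Counter(keys)
    total = sum(counts.values())
    unique = len(counts)  # one representative per distinct string
    return {"total": total, "unique": unique, "duplicated": total - unique}


# Toy input: six entries that collapse to three distinct strings.
summary = duplication_summary(["A", "A", "B", "C", "C", "C"])
print(summary)

# The rows of Table 11 satisfy the same relation.
assert 998_016 + 142_771 == 1_140_787
assert 998_076 + 142_711 == 1_140_787
```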
The number of unique molecules produced using yaInChI was larger than that produced using InChI because of the enhanced stereochemical representations. The yaInChI-specific layers, such as /t, /nr and /mt, distinguish the stereochemistry in greater detail. Tables 8 and 12 show some examples that InChI(OB) cannot distinguish correctly.
The yaInChI and InChI systems differed in 95 cases, but the numerical difference between them is 94 cases (Table 13 and Figure 2). In one case, yaInChI classified a group of four molecules into a, a, a and b, whereas InChI classified them into a, a, b and b. Although both methods classified the molecules into two groups, the groupings were different. The numbers of cases were 24, 1, 1, 3, 15 and 51, which were related to non-rotatable bond (/nr), parity (/t), metal connectivity (/mt), charge (/q and /p layers of InChI), hydrogen information (/h) and aromaticity, respectively (Table 13). The differences relating to the /nr, /t and /mt layers were expected because InChI either does not consider the information (/nr and /mt) or gives it a different meaning (/t).
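The case of the four molecules shows why counting groups alone can hide a disagreement. The short Python sketch below, with a hypothetical helper `partition`, compares the two classifications as partitions: both yield two groups, yet the groupings differ.

```python
from typing import FrozenSet, Sequence, Set


def partition(labels: Sequence[str]) -> Set[FrozenSet[int]]:
    """Group molecule indices by the label (string) assigned to them."""
    groups = {}
    for index, label in enumerate(labels):
        groups.setdefault(label, set()).add(index)
    return {frozenset(members) for members in groups.values()}


yainchi_groups = partition(["a", "a", "a", "b"])
inchi_groups = partition(["a", "a", "b", "b"])

same_group_count = len(yainchi_groups) == len(inchi_groups)
same_grouping = yainchi_groups == inchi_groups
print(same_group_count, same_grouping)
```

Such a case contributes to the number of differing cases without changing the count of unique groups for either method.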
Table 12. Misrepresentation in hybridization and hydrogen number for InChI(OB).1
[Structure diagrams of molecules (a) and (b); each shows atoms C1–C13, N14 and N15.]
yaInChI (a): yaInChI=/fC13H14N2/c1(3-7-11(15-12)9)5-9-13(14)10(12-8-4)6-2-4/h2,4,6,8H,1,3,5,7H2,(H2,14,15)/mh14H2/bt11441414144444441
yaInChI (b): yaInChI=/fC13H10N2/c1(3-7-11(15-12)9)5-9-13(14)10(12-8-4)6-2-4/h1-8H,(H2,14,15)/mh14H2/bt44444444444444441
InChI (a) and (b): InChI=1S/C13H14N2/c14-13-9-5-1-3-7-11(9)15-12-8-4-2-6-10(12)13/h1,3,5,7H,2,4,6,8H2,(H2,14,15)
1 The InChI system regards the two different molecules as the same, but (a) has an sp3 carbon and (b) has no sp3 carbon. These molecules have 14 and 10 hydrogen atoms, respectively.
Figure 2. Venn diagram of duplication-check results. The number of cases common to both InChI and yaInChI, the number of InChI-specific unique cases and the number of yaInChI-specific unique cases are 997,999, 17 and 77, respectively.
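The numbers in Figure 2 can be cross-checked against Tables 11 and 13 with a short Python calculation. The variable names are illustrative; every value comes from the figure caption or the tables.

```python
shared = 997_999      # unique cases found by both methods
inchi_only = 17       # InChI-specific unique cases
yainchi_only = 77     # yaInChI-specific unique cases

inchi_unique = shared + inchi_only        # "Unique group" for InChI, Table 11
yainchi_unique = shared + yainchi_only    # "Unique group" for yaInChI, Table 11
numerical_difference = inchi_only + yainchi_only

# Case counts by category from Table 13.
table_13 = {
    "non-rotatable bond (/nr)": 24,
    "parity (/t)": 1,
    "metal connectivity (/mt)": 1,
    "charge (/q)": 3,
    "hydrogen count+1": 14,
    "hydrogen number and location": 1,
    "aromaticity": 51,
}
differing_cases = sum(table_13.values())

print(inchi_unique, yainchi_unique, numerical_difference, differing_cases)
```

The Venn regions reproduce the unique-group counts of Table 11, and the 94 method-specific cases sit one below the 95 differing cases listed in Table 13.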
Molecules with a metal atom have a different charge string (/q and /p in InChI) in some cases due to different normalization steps in yaInChI (Table 14).
The 15 cases related to the /h layer were classified into two categories: 14 cases were related to hydrogen count+1 and one case was related to hydrogen number and location. The 14 cases related to the ‘hydrogen count+1’ column arose because InChI does not utilize the information, whereas yaInChI represents it in the /h and /fh layers. Information from the ‘hydrogen count+1’ column could be implicitly indicated in the /h layer in InChI(OB), but was explicitly indicated using the yaInChI system.
One difference in the /h layer is due to the miscalculation of hydrogen in the InChI trial. After careful visual inspection, we concluded that the yaInChI number of hydrogen atoms and location definitions were correct.
The 51 differences related to aromaticity were due to misidentification of aromaticity using InChI(OB) (Table 12). The difference in aromaticity for the two methods was identified using a comparison of the number of hydrogen atoms attached in aromatic bonds. Determination of the aromaticity of a molecule was very complicated in some cases. Further investigation was completed by comparing the results with InChI (IUPAC). In 48 of the 51 cases, our result was identical to InChI (IUPAC); in the three remaining cases, InChI (IUPAC) displayed error messages. It seems that InChI (OB) has a bug in the aromaticity calculation program. The yaInChI system takes the aromaticity information from the SDF file rather than calculating it.
The yaInChI string has a layered structure similar to that of the InChI system, thus allowing modifications to the level of sensitivity for the purpose of duplication check. For example, stereochemistry information layers (/b, /en, /t, /nr, /mt) can be included or excluded from the duplication check. Excluding the layers provides similar results for both methods. The yaInChI system provides higher structural sensitivity and achieves various levels of sensitivity depending on the layers included. Several features were added to the basic InChI system, including non-rotatable single bonds, metal connectivity, stereochemistry of organometallic compounds, allene and cumulene, and atom parity with lone pairs. All of the protonation information, including hydrogen count+1, and the original bond type were incorporated into the yaInChI system to better preserve the original information. These were either not considered or restricted in the InChI and SMILES systems. The hydrogen count+1 and original bond type are not essential information, but are useful for generating 3-D structures from 1-D strings. Additionally, the yaInChI method uses four classes of stereochemistry notation for non-rotatable double bonds and allene or cumulene, which provides valuable molecular geometry information.
Table 13. Different cases between yaInChI and InChI (OB).
Cases                                                      Number of cases
Non-rotatable bond (/nr)                                   24
Parity (/t)                                                1
Metal connectivity (/mt)                                   1
Charge (/q)                                                3
Hydrogen information (/h): hydrogen count+1                14
Hydrogen information (/h): hydrogen number and location    1
Aromaticity                                                51
The duplication check results indicate that the yaInChI system was more discriminating than any other method. The Ligand.Info and PubChem compound databases were ca. 87.5% (998,076 compounds) and 93% unique, respectively, based on the yaInChI studies. The yaInChI system shows promise as a useful and efficient tool to eliminate redundancy in large
Table 14. Example of different normalization steps in yaInChI and InChI.1
[Structure diagrams of molecules (a) and (b): a tin complex with atoms C1–C26, O27–O30 and Sn31. In (a), O27–O30 carry positive charges and C21 and C22 carry negative charges; in (b), only O27 and O30 carry positive charges.]
InChI(OB) (a): InChI=1S/2C7H6O2.2C6H5.Sn/c2*8-6-4-2-1-3-5-7(6)9;2*1-2-4-6-5-3-1;/h2*1-5H,(H,8,9);2*1-5H;/q;;2*-1;+4
InChI(OB) (b): InChI=1S/2C7H6O2.2C6H5.Sn/c2*8-6-4-2-1-3-5-7(6)9;2*1-2-4-6-5-3-1;/h2*1-5H,(H,8,9);2*1-5H;/q;;;;+4/p-2
yaInChI (a): yaInChI=/fC26H20O4Sn/c3(10-18-24-23)9-17-23-27-31(21-13-5-1-7-15-21,22-14-6-2-8-16-22,28-24,30-26-20-12-4)29-25(26)19-11-4/h1-20H/q+2/mt31:21(22=27=28=29)30/p21Y5,22Y5,27Y3,28-3,29Y3,30Y3/bt444412214444211244441221111211121111
yaInChI (b): yaInChI=/fC26H20O4Sn/c3(10-18-24-23)9-17-23-27-31(21-13-5-1-7-15-21,22-14-6-2-8-16-22,28-24,30-26-20-12-4)29-25(26)19-11-4/h1-20H/q+2/mt31:21(22=27=28=29)30/p27Y3,30Y3/bt444412214444211244441221111211121111
InChI (IUPAC) (a): InChI=1S/2C7H6O2.2C6H5.Sn/c2*8-6-4-2-1-3-5-7(6)9;2*1-2-4-6-5-3-1;/h2*1-5H,(H,8,9);2*1-5H;/q;;2*-1;+4
InChI (IUPAC) (b): InChI=1S/2C7H6O2.2C6H5.Sn/c2*8-6-4-2-1-3-5-7(6)9;2*1-2-4-6-5-3-1;/h2*1-5H,(H,8,9);2*1-5H;/q;;;;+4/p-2
InChI(OB) (reconnected metal) (a): InChI=1/2C7H6O2.2C6H5.Sn/c2*8-6-4-2-1-3-5-7(6)9;2*1-2-4-6-5-3-1;/h2*1-5H,(H,8,9);2*1-5H;/q;;2*-1;+4/rC26H22O4Sn/c1-5-13-21(14-6-1)31(22-15-7-2-8-16-22,27-23-17-9-3-10-18-24(23)28-31)29-25-19-11-4-12-20-26(25)30-31/h1-20,27,29H/q+2
InChI(OB) (reconnected metal) (b): InChI=1/2C7H6O2.2C6H5.Sn/c2*8-6-4-2-1-3-5-7(6)9;2*1-2-4-6-5-3-1;/h2*1-5H,(H,8,9);2*1-5H;/q;;;;+4/p-2/rC26H20O4Sn/c1-5-13-21(14-6-1)31(22-15-7-2-8-16-22,27-23-17-9-3-10-18-24(23)28-31)29-25-19-11-4-12-20-26(25)30-31/h1-20H/q+2
1 The yaInChI system considers both molecules to be the same after charge normalization, whereas InChI does not.
chemical databases, which is very important in the fields of chemoinformatics and drug discovery.
4. Conclusions
The yaInChI system was developed to incorporate as much structural information as possible for a given molecule. The yaInChI method uses more layers than the prototypical InChI system. This feature provides higher-sensitivity inspection of structural identity in a large chemical database. We applied yaInChI to several compound databases and found that yaInChI provided superior duplication results compared to InChI. Furthermore, with the /bt, /fh and modified /c layers, the yaInChI system is easier to read than InChI. Because yaInChI contains more structural information in a compact format than other methods, it could be used to generate 3-D structures with less ambiguity, which is important for large chemical databases that are affected by duplication. The yaInChI system reported here provides a readable, straightforward output with improved structural sensitivity. These advances show promise for future applications of yaInChI in large chemical databases.
Acknowledgements
This work was supported by the Human Resources Development of the Korea Institute of Energy Technology Evaluation and Planning (KETEP) grant funded by the Ministry of Knowledge Economy, Republic of Korea (No. 20104010100610) and a grant of the Korea Healthcare Technology R & D Project, Ministry for Health, Welfare & Family Affairs, Republic of Korea (No. A100096).
References
[1] D. Weininger, SMILES, a chemical language and information system. I. Introduction to methodology and encoding rules, J. Chem. Inf. Comput. Sci. 28 (1988), pp. 31–36.
[2] D. Weininger, A. Weininger, and J.L. Weininger, SMILES. 2. Algorithm for generation of unique SMILES notation, J. Chem. Inf. Comput. Sci. 29 (1989), pp. 97–101.
[3] S.R. Heller and A.D. McNaught, The IUPAC international chemical identifier (InChI), Chem. Int. 31 (2009), pp. 7–9.
[4] IUPAC, The IUPAC International Chemical Identifier (InChI™); available at http://www.iupac.org/inchi (last accessed July 2011).
[5] Murray-Rust Research Group, University of Cambridge, The Unofficial InChI FAQ; available at http://wwmm.ch.cam.ac.uk/inchifaq/ (last accessed July 2011).
[6] Daylight Chemical Information Systems Inc., SMILES – A Simplified Chemical Language; available at http://www.daylight.com/dayhtml/doc/theory/theory.smiles.html (last accessed July 2011).
[7] B. Kosata, Comparison of InChI to other chemical formats; available at http://inchi.info/ (last accessed July 2011).
[8] A. Dalby, J.G. Nourse, W.D. Hounshell, A.K.I. Gushurst, D.L. Grier, B.A. Leland, and J. Laufer, Description of several chemical structure file formats used by computer programs developed at Molecular Design Limited, J. Chem. Inf. Comput. Sci. 32 (1992), pp. 244–255.
[9] A.W. Wendy, Tautomerism in chemical information management systems, J. Comput. Aid. Mol. Des. 24 (2010), pp. 497–520.
[10] W. Kocay and D. Stone, An algorithm for balanced flows, J. Comb. Math. Comb. Comput. 19 (1995), pp. 3–31.
[11] C.J. Lee, Y.M. Kang, K.H. Cho, and K.T. No, A robust method for searching the smallest set of smallest rings with a path-included distance matrix, Proc. Natl. Acad. Sci. U. S. A. 106 (2009), pp. 17355–17358.
[12] R.W. Floyd, Algorithm 97: Shortest Path, Commun. Ass. Comput. Mach. 5 (1962), p. 345.
[13] E.S. Stephen, R.H. Stephen, and V.T. Dmitrii, IUPAC International Chemical Identifier (InChI) InChI version 1, software version 1.03 (2010) Technical Manual; available at http://www.iupac.org/inchi/download/version1.03/INCHI-1-DOC.zip (last accessed July 2011).
[14] R. Apodaca, InChI canonicalization algorithm; available at http://depth-first.com/articles/2006/08/12/inchi-canonicalization-algorithm (last accessed July 2011).
[15] M. von Grotthuss, G. Koczyk, J. Pas, L.S. Wyrwicz, and L. Rychlewski, Ligand.info small-molecule meta-database, Comb. Chem. High. Throughput Screening 7 (2004), pp. 757–761.
[16] The Open Babel Package, version 2.3.0; available at http://openbabel.org/wiki/Main_Page (last accessed July 2011).