Substructures and Patterns in 2-D Chemical Space
Danail Bonchev
Department of Mathematics and Applied Mathematics and Center for the Study of Biological Complexity
Virginia Commonwealth University
Workshop CCSWS2: Optimization, Search and Graph-Theoretical Algorithms for Chemical Compound Space, IPAM, UCLA, 11-15 April, 2011
Molecular Properties and Graph Theory
• H. Wiener JACS 69(1947)17; JPC 52 (1948) 1082 – empirical equations – “path number”
• H. Hosoya Bull Chem Soc Japan 44 (1971) 2332 – reformulation in graph theory terms
• E. Smolenski Zh. Fiz. Khim. 38 (1964) 700
M. Gordon and J. W. Kennedy, J Chem Soc Faraday Trans II 69 (1973) 484.
$P(G) = a_0 + \sum_{i=1}^{k} a_i H_i$
$H_i$ - the number of subgraphs of k nodes
$P(G) = \sum_{i=1}^{k} a_i \, TI_i$
$TI_i$ – topological invariant
$\chi(G) = \sum_{i}^{k(\max)} (a_i a_j \cdots a_k)^{-1/2}$
Molecular Connectivity Concept
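Randić, 1975: $\chi(G) = \sum_{i\,\mathrm{adj}\,j} (a_i a_j)^{-1/2}$
Kier and Hall, 1976: $\chi_{SG}(G) = \sum_{i}^{k(\max)} (a_i a_j \cdots a_k)^{-1/2}$
where SG = p (path), c (cluster), pc (path-cluster), etc.
[Figure: examples of path, cluster, and path-cluster subgraphs]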
Kier and Hall Valence Connectivity Indices
Definition: $a_k^v = (Z_k^v - H_k) / (Z_k - Z_k^v - 1)$
$a_k^v$ - valence connectivity for the k-th atom in the molecular graph
$Z_k$ - the total number of electrons in the k-th atom
$Z_k^v$ - the number of valence electrons in the k-th atom
$H_k$ - the number of hydrogen atoms directly attached to the k-th non-hydrogen atom
m = 0 - atomic valence connectivity indices
m = 1 - one-bond path valence connectivity indices
m = 2 - two-bond fragment valence connectivity indices
m = 3 - three-contiguous-bond fragment valence connectivity indices, etc.
L. B. Kier, L. H. Hall, Eur. J. Med. Chem., 1977, 12, 307.
• The success of molecular connectivity indices
• Why do they work so well?
From Molecular Connectivity to Overall Topological Indices
1986, Bertz & Herndon – the idea of using the total subgraph count as a similarity measure
1995-1997, Bonchev/Bertz – a subgraph count-based measure of structural complexity
1995-2005, Bonchev – overall topological indices
Bertz, S.; Herndon, W. C. In Artificial Intelligence Applications in Chemistry; ACS: Washington, D.C., 1986, pp. 169-175.
D. Bonchev, Bulg. Chem. Commun. 28, 567-582 (1995).
D. Bonchev, SAR QSAR Environ. Res. 7, 23-43 (1997).
Bertz, S. H. and Sommer, T. J. Chem. Commun. 2409-2410 (1997).
S. H. Bertz and W. F. Wright, Graph Theory Notes New York Acad. Sci. 32-48 (1998).
D. Bonchev, In: Topological Indices and Related Descriptors, J. Devillers and A. T. Balaban, Eds., Gordon and Breach, Reading, U.K., 1999, p. 361-401.
D. Bonchev, J. Chem. Inf. Comput. Sci. 40, 934-941 (2000).
D. Bonchev, J. Mol. Graphics Model., 5271 (2001) 1-11.
D. Bonchev, N. Trinajstić, SAR QSAR Environ. Res. 12 (2001) 213-235.
D. Bonchev, J. Chem. Inf. Comput. Sci., 41 (2001) 582-592.
D. Bonchev, Lect. Ser. Computer and Computational Sciences, 4, 1554-1557 (2005).
The Subgraph Count
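Example: N = 5, E = 4
[Figure: the subgraphs of the example graph, grouped by number of edges e = 0, 1, 2, 3, 4]
0SC = 5, 1SC = 4, 2SC = 4, 3SC = 3, 4SC = 1
SC = 17 (5, 4, 4, 3, 1)
$SC = {}^{0}SC + {}^{1}SC + {}^{2}SC + \dots + {}^{E}SC$
$SCV = \{{}^{0}SC, {}^{1}SC, {}^{2}SC, \dots, {}^{E}SC\}$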
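The subgraph count can be checked by brute force on small graphs. The sketch below is an illustration, not the author's program: it assumes that a "subgraph" means a connected set of edges (plus the single vertices for e = 0), and it assumes the example graph is the 2-methylbutane skeleton, chosen because it reproduces the values quoted above. The function names are hypothetical.

```python
from itertools import combinations

def is_connected(edge_subset):
    """Return True if the edges in edge_subset form one connected piece."""
    vertices = {v for edge in edge_subset for v in edge}
    start = next(iter(vertices))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for a, b in edge_subset:
            if a == u and b not in seen:
                seen.add(b)
                stack.append(b)
            elif b == u and a not in seen:
                seen.add(a)
                stack.append(a)
    return seen == vertices

def subgraph_count_vector(n_vertices, edges):
    """Return [0SC, 1SC, ..., ESC] for a graph given as an edge list."""
    vector = [n_vertices]  # order 0: every single vertex counts as a subgraph
    for e in range(1, len(edges) + 1):
        vector.append(sum(1 for sub in combinations(edges, e) if is_connected(sub)))
    return vector

# Assumed example graph: 2-methylbutane skeleton (N = 5, E = 4), branch at vertex 2.
isopentane = [(1, 2), (2, 3), (3, 4), (2, 5)]
scv = subgraph_count_vector(5, isopentane)
print(scv, sum(scv))
```

Enumerating all edge subsets is exponential in E, so this only works for molecule-sized graphs.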
From Subgraph Count To Overall Topological Indices
The idea: Weight all subgraphs with graph-invariant values and sum up to characterize the structure as a whole. Sum up weighted subgraphs having the same number of edges to capture different levels of graph complexity.
Motivation: The more complete the molecular structure representation, the better it captures the patterns of structural complexity, the more distinctive the topological descriptor, and the more accurate the structure-property relationship.
The Overall Topological Complexity Indices
Definition 1: The Overall Topological Index OTI(G) of any graph G is defined as the sum of the topological index values $TI_i(G_i)$ of all K subgraphs $G_i$ of G:
$OTI(G) = \sum_{i=1}^{K} TI_i(G_i)$
Definition 2: The e-th order Overall Topological Index ${}^{e}OTI(G)$ of any graph G is defined as the sum of the topological index values $TI_j({}^{e}G_j)$ of all ${}^{e}K$ subgraphs ${}^{e}G_j$ of G that have e edges:
${}^{e}OTI(G) = \sum_{j=1}^{{}^{e}K} TI_j({}^{e}G_j)$
Corollary 1: The Overall Topological Index OTI(G) of any graph G can be presented as a sum over all e-orders of this index, ${}^{e}OTI(G)$:
$OTI(G) = \sum_{e=0}^{E} {}^{e}OTI(G) = {}^{0}OTI + {}^{1}OTI + {}^{2}OTI + \dots + {}^{E}OTI$
Some More Definitions
Definition 3: The Overall Topological Index Vector OTIV(G) of any graph G is the ordered sequence of all eOTIs:
$OTIV(G) = OTI({}^{0}OTI, {}^{1}OTI, {}^{2}OTI, \dots, {}^{E}OTI)$
Corollary 2: The E-order overall topological index, ${}^{E}OTI(G)$, is the index TI(G) itself:
${}^{E}OTI(G) = TI(G)$
Even More Definitions
Definition 4a: The average overall topological index $OTI_a(G)$ and its e-order term ${}^{e}OTI_a(G)$ are obtained by dividing OTI(G) or ${}^{e}OTI(G)$ by the number of vertices V:
$OTI_a(G) = OTI(G)/V$; ${}^{e}OTI_a(G) = {}^{e}OTI(G)/V$
Definition 4b: The normalized overall topological index $OTI_n(G)$ and its e-order term ${}^{e}OTI_n(G)$ are obtained by dividing OTI(G) or ${}^{e}OTI(G)$ by the value that the index has for the complete graph $K_V$ having the same number of vertices V:
$OTI_n(G) = OTI(G)/OTI(K_V)$; ${}^{e}OTI_n(G) = {}^{e}OTI(G)/{}^{e}OTI(K_V)$
Aren’t You Tired of Definitions?
The overall topological indices work well for molecules, but what about networks? Computational disaster!
The Solution: Use the first several orders of the OTIs!
Definition 5: The cumulative p-th order overall index ${}^{p}OTI(G)$ is defined as the sum of the ${}^{e}OTI(G)$ terms over the first orders e = 0, 1, 2, …, p:
${}^{p}OTI(G) = \sum_{e=0}^{p} {}^{e}OTI(G)$
How to Apply the Overall Topological Indices Approach to Molecules with Heteroatoms?
For OTI(G) ≡ OC, OM1, OM2 (overall connectivity and the overall first and second Zagreb indices), substitute the vertex degree $a_i$ with the Kier and Hall atomic valence term $a_i^v$.
Example for overall connectivity index, OC:
$OC^v(G) = \sum_{e=0}^{E} {}^{e}OC^v(G) = \sum_{e=0}^{E} \sum_{i=1}^{{}^{e}K} A_i^v({}^{e}G_i) = \sum_{e=0}^{E} \sum_{i=1}^{{}^{e}K} \sum_{j=1}^{N({}^{e}G_i)} a_j^v({}^{e}G_i)$
Topological Indices Used in Realizing the Overall Indices Program
Total Adjacency, A(G): $A(G) = \sum_{i=1}^{N} a_i$
$a_i$ - degree of vertex i; N – number of vertices in G
First Zagreb Index, M1(G): $M_1(G) = \sum_{i=1}^{N} a_i^2$
Second Zagreb Index, M2(G): $M_2(G) = \sum_{i\,\mathrm{adj}\,j} a_i a_j$
Wiener Number, W(G): $W(G) = \frac{1}{2} \sum_{i=1}^{N} d_i = \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} d_{ij}$
$d_i$ - distance of vertex i (the sum of its distances to all other vertices); $d_{ij}$ - distance between vertices i and j
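The four indices defined above are easy to compute from an edge list. The following is a minimal sketch under stated assumptions: the graph is connected and unweighted, distances come from breadth-first search, and the helper names and the 2-methylbutane test graph are illustrative choices, not part of the original slides.

```python
from collections import deque

def adjacency(n_vertices, edges):
    """Build an adjacency list for vertices numbered 1..n_vertices."""
    adj = {v: [] for v in range(1, n_vertices + 1)}
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    return adj

def distance_sum(adj, source):
    """Sum of shortest-path distances from source to all other vertices (d_i)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return sum(dist.values())

def basic_indices(n_vertices, edges):
    """Return total adjacency A, Zagreb indices M1 and M2, and Wiener number W."""
    adj = adjacency(n_vertices, edges)
    degree = {v: len(adj[v]) for v in adj}
    return {
        "A": sum(degree.values()),
        "M1": sum(d * d for d in degree.values()),
        "M2": sum(degree[a] * degree[b] for a, b in edges),
        "W": sum(distance_sum(adj, v) for v in adj) // 2,  # each pair counted twice
    }

# Assumed test graph: 2-methylbutane skeleton, branch at vertex 2.
isopentane = [(1, 2), (2, 3), (3, 4), (2, 5)]
indices = basic_indices(5, isopentane)
print(indices)
```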
The Overall Hosoya Index
The Hosoya Index z(G):
$z(G) = \sum_{k=0}^{k(\max)} p(G,k)$
p(G,k) is the number of ways of choosing k mutually non-adjacent edges in G, p(G,0) being unity and p(G,1) the number of edges.
H. Hosoya, Bull. Chem. Soc. Japan 44, 2332-2339 (1971).
The Overall Hosoya Index OZ(G):
$OZ(G) = \sum_{e=0}^{E} {}^{e}OZ(G) = \sum_{e=0}^{E} \sum_{i=1}^{{}^{e}K} Z_i({}^{e}G_i) = \sum_{e=0}^{E} \sum_{i=1}^{{}^{e}K} \sum_{k=0}^{k(\max)} p({}^{e}G_i, k)$
Examples of Calculation of Overall Topological Indices
N = 5, E = 4
[Figure: the subgraphs of the example graph for e = 0 to 4; the vertices carry the degree labels 1, 1, 1, 2, 3]
e    SC    OZ                OC
0    5     5x1 = 5           3x1+2+3 = 8
1    4     4x2 = 8           2x4+5+3 = 16
2    4     4x3 = 12          5+3x6 = 23
3    3     1x4+2x5 = 14      3x7 = 21
4    1     1x7 = 7           3x1+2+3 = 8
SC = 17 (5, 4, 4, 3, 1); OZ = 46 (5, 8, 12, 14, 7); OC = 76 (8, 16, 23, 21, 8)
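The per-order OC and OZ values in this example can be reproduced by weighting every connected edge subset. The sketch below is illustrative only and rests on assumptions: the example graph is taken to be the 2-methylbutane skeleton (its vertex degrees are 1, 1, 1, 2, 3), OC weights a subgraph by the sum of the degrees its vertices have in the whole graph, and OZ weights it by its Hosoya index with the empty edge set included. The function names are hypothetical.

```python
from itertools import combinations

def is_connected(edge_subset):
    """Return True if the edges in edge_subset form one connected piece."""
    vertices = {v for edge in edge_subset for v in edge}
    start = next(iter(vertices))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for a, b in edge_subset:
            if a == u and b not in seen:
                seen.add(b)
                stack.append(b)
            elif b == u and a not in seen:
                seen.add(a)
                stack.append(a)
    return seen == vertices

def hosoya_index(edge_subset):
    """Count all sets of mutually non-adjacent edges, the empty set included."""
    count = 0
    for k in range(len(edge_subset) + 1):
        for chosen in combinations(edge_subset, k):
            endpoints = [v for edge in chosen for v in edge]
            if len(endpoints) == len(set(endpoints)):  # no shared vertex
                count += 1
    return count

def overall_connectivity_and_hosoya(n_vertices, edges):
    """Return the per-order vectors ([0OC..EOC], [0OZ..EOZ])."""
    degree = {v: 0 for v in range(1, n_vertices + 1)}
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    oc = [sum(degree.values())]  # order 0: each vertex contributes its degree
    oz = [n_vertices]            # order 0: a single vertex has Z = 1
    for e in range(1, len(edges) + 1):
        oc_e = oz_e = 0
        for sub in combinations(edges, e):
            if is_connected(sub):
                oc_e += sum(degree[v] for v in {v for edge in sub for v in edge})
                oz_e += hosoya_index(sub)
        oc.append(oc_e)
        oz.append(oz_e)
    return oc, oz

isopentane = [(1, 2), (2, 3), (3, 4), (2, 5)]  # assumed example graph
oc_vector, oz_vector = overall_connectivity_and_hosoya(5, isopentane)
print(sum(oc_vector), oc_vector)
print(sum(oz_vector), oz_vector)
```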
Formulae for the Overall Indices for Some Classes of Graphs
n – total number of vertices; q – total number of edges; e – number of edges in a subgraph
eSC – number of subgraphs having e edges each
Linear (Path) Graphs
${}^{e}SC(P_n) = n - e$; $SC(P_n) = n(n+1)/2$
${}^{e}OC(P_n) = 2[q(e+1) - e^2]$; $OC(P_n) = n(n-1)(n+4)/3$
${}^{e}OW(P_n) = e(e+1)(e+2)(n-e)/6$; $OW(P_n) = (n+3)(n+2)(n+1)n(n-1)/120$
Monocyclic Graphs
${}^{e}SC(C_n) = n$; ${}^{q}SC(C_n) = 1$; $SC(C_n) = n^2 + 1$
${}^{e}OC(C_n) = 2n(e+1)$; ${}^{q}OC(C_n) = 2n$; $OC(C_n) = n(n^2+n+2)$
${}^{e}OW(C_n) = e(e+1)(e+2)n/6$ (for e = 1, 2, …, n-1); ${}^{q}OW(C_n) = W(C_n)$
$OW(C_n) = (n^5+2n^4+2n^3-2n^2-an)/24$; a(even) = 0; a(odd) = 3
Star Graphs
${}^{0}SC(S_n) = n$; ${}^{e}SC(S_n) = \binom{q}{e}$; $SC(S_n) = 2^q + q$
${}^{0}OC(S_n) = 2q$; ${}^{e}OC(S_n) = \binom{q}{e}(q+e)$; $OC(S_n) = 2q + \sum_{e=1}^{q} \binom{q}{e}(q+e)$
${}^{e}OW(S_n) = \frac{q!}{e!\,(q-e)!}\, e^2$; $OW(S_n) = \sum_{i=0}^{n-2} \frac{(n-1)!\,(n-i-1)^2}{i!\,(n-i-1)!}$
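The closed-form totals for path and monocyclic graphs can be compared with a brute-force enumeration for small n. This is a hedged sketch rather than the author's derivation: it assumes subgraphs are connected edge subsets (plus single vertices), that OC sums the degrees the vertices have in the full graph, and that OW sums the Wiener numbers computed inside each subgraph. All function names are hypothetical.

```python
from collections import deque
from itertools import combinations

def total_distance(adj):
    """Sum of BFS distances over ordered vertex pairs, or None if disconnected."""
    total = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        if len(dist) != len(adj):
            return None
        total += sum(dist.values())
    return total

def overall_totals(n_vertices, edges):
    """Brute-force (SC, OC, OW) summed over all orders e = 0..E."""
    degree = {v: 0 for v in range(1, n_vertices + 1)}
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    sc, oc, ow = n_vertices, sum(degree.values()), 0  # contributions of order e = 0
    for e in range(1, len(edges) + 1):
        for sub in combinations(edges, e):
            adj = {}
            for a, b in sub:
                adj.setdefault(a, []).append(b)
                adj.setdefault(b, []).append(a)
            ordered_sum = total_distance(adj)
            if ordered_sum is None:
                continue  # disconnected edge subset: not counted
            sc += 1
            oc += sum(degree[v] for v in adj)
            ow += ordered_sum // 2
    return sc, oc, ow

def path_edges(n):
    return [(i, i + 1) for i in range(1, n)]

def cycle_edges(n):
    return path_edges(n) + [(n, 1)]

def path_formulas(n):
    return (n * (n + 1) // 2,
            n * (n - 1) * (n + 4) // 3,
            (n + 3) * (n + 2) * (n + 1) * n * (n - 1) // 120)

def cycle_formulas(n):
    a = 0 if n % 2 == 0 else 3
    return (n * n + 1,
            n * (n * n + n + 2),
            (n**5 + 2 * n**4 + 2 * n**3 - 2 * n**2 - a * n) // 24)

checks = [overall_totals(n, path_edges(n)) == path_formulas(n) for n in range(2, 8)]
checks += [overall_totals(n, cycle_edges(n)) == cycle_formulas(n) for n in range(3, 8)]
print(all(checks))
```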
Total Walk Count, TWC
$TWC = \sum_{l=1}^{V-1} WC^{(l)} = \sum_{l=1}^{V-1} \sum_{i=1}^{V} w_i^{(l)}$
$w_i^{(l)}$ - the number of walks of length l that start in vertex i
$WC^{(l)}$ - the total number of walks of length l
Example: TWC = 106 (8, 16, 28, 54)
[Figure: example graph with vertices numbered 1-5 and sample walks of length l = 1, 2, 3]
Rücker, G. & Rücker, C., J. Chem. Inf. Comput. Sci. (2000), 40, 99-106.
Rücker, G. & Rücker, C., J. Chem. Inf. Comput. Sci. (2001), 41, 1457-1462.
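Walk counts follow from a simple recurrence: the number of walks of length l starting at a vertex is the sum, over its neighbors, of their walk counts of length l - 1. The sketch below applies that recurrence; the function name and the 2-methylbutane test graph are assumptions made for illustration.

```python
def walk_counts(n_vertices, edges):
    """Return [WC(1), ..., WC(V-1)], the total number of walks of each length."""
    adj = {v: [] for v in range(1, n_vertices + 1)}
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    walks = {v: 1 for v in adj}  # exactly one walk of length 0 starts at each vertex
    totals = []
    for _ in range(1, n_vertices):
        # a walk of length l from v is one step to a neighbor w, then a walk of length l-1 from w
        walks = {v: sum(walks[w] for w in adj[v]) for v in adj}
        totals.append(sum(walks.values()))
    return totals

isopentane = [(1, 2), (2, 3), (3, 4), (2, 5)]  # assumed example graph
wc = walk_counts(5, isopentane)
print(sum(wc), wc)
```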
The Six Overall Topological Indices Order Structures According to Patterns of Increasing Complexity
[Figure: graphs 1-21, each labeled with its number and OC value, # (OC)]
Table 1. Quantitative Comparison of the Six Overall Topological Indices in C2-C7 Alkanes
Graph   SC    OC    OW    OM1    OM2    OZ
1       3     4     1     1      1      4
2       6     14    6     22     10     10
3       10    32    21    56     26     21
4       11    39    24    87     27     23
5       15    60    56    110    60     40
6       17    76    67    168    67     46
7       20    100   80    292    68     52
8       21    100   126   188    130    72
9       24    127   154   277    149    84
10      25    136   161   300    161    89
11      28    164   188   404    172    100
12      30    181   197   505    168    103
13      28    154   252   294    272    125
14      32    194   311   418    315    147
15      34    214   333   468    351    159
16      36    234   354   516    390    172
17      37    246   384   584    366    173
18      40    276   411   668    410    191
19      41    284   414   762    370    185
20      44    314   440   850    412    202
21      49    369   510   1075   433    225
Table 2. Standard deviations of the best five-parameter models of C3-C8 alkane properties produced by the six overall topological indices versus those obtained with the set of molecular connectivity indices
Property                          Correlation coefficient, R   Standard deviation, SD   SD of the best molecular connectivity models
Boiling Point, °C                 0.9993                       1.60                     3.31
Heat of Formation, kJ/mol         0.9995                       1.02                     1.37
Heat of Vaporization, kJ/mol      0.9950                       0.67                     0.79
Heat of Atomization, kcal/mol     1.0000                       0.30                     5.78
Surface Tension, dyn/cm           0.9963                       0.17                     0.22
Molar Volume*, cm3/mol            0.9999                       0.23                     0.36
Molar Refraction, cm3/mol         1.0000                       0.041                    0.044
Critical Volume, L/mol            0.9948                       0.0079                   0.0087
Critical Pressure, atm            0.9955                       0.37                     0.50
Critical Temperature*, °C         0.9983                       3.23                     4.76
The Overall Topological Indices and Complexity of Structures Containing Cycles
[Figure: structures 3-15; the index values below are listed in three groups, in the order they appear on the slide]
SC = 11, 17, 20, 26; OC = 32, 76, 100, 160; TWC = 58, 106, 140, 150
SC = 29, 31, 54, 57; OC = 190, 212, 482, 522; TWC = 178, 214, 300, 350
SC = 61, 114, 119, 477, 973; OC = 566, 1316, 1396, 7806, 18180; TWC = 337, 538, 608, 1200, 1700
The Overall Complexity Measures Can Discriminate Very Subtle Complexity Features
[Figure: structures 1 and 2]
Index      Structure 1                 Structure 2
SC         28 (5, 8, 9, 5, 1)          30 (5, 9, 10, 5, 1)
OC (in)    111 (12, 28, 41, 25, 5)     135 (16, 40, 49, 25, 5)
TWC        15 (5, 5, 5)                21 (5, 7, 9)
The complexity of structure 2 is higher because it has a more complex cycle.
Cyclicity contributes more to complexity than branching.
Some Conclusions
• While the six topological indices used show degeneracy and order the isomeric molecules differently, the overall indices are non-degenerate and order the molecules similarly, in series of increasing complexity.
• The sets of overall topological indices produce QSPR models with (sometimes considerably) smaller standard deviations than the corresponding models with molecular connectivity indices.
• The best model statistics are shown by the overall connectivity indices, followed by the overall Wiener indices.
The patterns of structural complexity deserve considerable attention due to their generality
Molecular Branching
Wiener, 1947: First analyzed some aspects of the branching of the molecular skeleton by fitting experimental data for several properties of alkanes to the deviation of his “path number” W in branched alkanes from that of the linear isomer.
Graph-invariants tested early as “branching indices” of acyclic molecules correlating to their properties:
Graph non-adjacency number, Hosoya, 1971
Graph largest eigenvalue, Lovasz and Pelikan, 1973
First and second Zagreb indices, Gutman et al., 1975
Molecular branching index, Randić, 1975
Rouvray, D. H. and King, R. B., Eds., Topology in Chemistry. Discrete Mathematics of Molecules. Horwood, Chichester, U.K. 2002.
Wiener, H. Structural Determination of Paraffin Boiling Points. J. Am. Chem. Soc. 1947, 69, 17-20. Relation of the Physical Properties of the Isomeric Alkanes to Molecular Structure. J. Phys. Chem. 1948, 52, 1082-1089.
The Goal: To go beyond inventing new graph invariants and fitting experimental data, and to try to understand the topological basis of molecular properties.
The Hypothesis: The increase in branching complexity is associated with a decrease in the Wiener number W.
D. Bonchev and N. Trinajstic, On Topological Characterization of Molecular Branching. Intern. J. Quantum Chem. Symp. 12(1978)293‑303.
D. Bonchev, and N. Trinajstic, Information Theory, Distance Matrix, and Molecular Branching, J. Chem. Phys. 67(1977) 4517‑4533.
D. Bonchev, Topological Order in Molecules. 1. Molecular Branching Revisited, Theochem 336(1995)137-156.
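The hypothesis that more branching means a smaller Wiener number is easy to illustrate on the three pentane isomers. The sketch below is only an illustration: the carbon-skeleton edge lists and the function name are assumptions chosen for the example, and the Wiener number is computed by breadth-first search on the hydrogen-suppressed graph.

```python
from collections import deque

def wiener_number(n_vertices, edges):
    """Half the sum of shortest-path distances over all ordered vertex pairs."""
    adj = {v: [] for v in range(1, n_vertices + 1)}
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    total = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
    return total // 2

# Hydrogen-suppressed carbon skeletons of the C5 isomers (assumed vertex numbering).
skeletons = {
    "n-pentane": [(1, 2), (2, 3), (3, 4), (4, 5)],
    "2-methylbutane": [(1, 2), (2, 3), (3, 4), (2, 5)],
    "2,2-dimethylpropane": [(1, 2), (2, 3), (2, 4), (2, 5)],
}
w_values = {name: wiener_number(5, edges) for name, edges in skeletons.items()}
print(w_values)
```

W drops from the linear chain to the singly branched and then the doubly branched isomer, in line with the hypothesis.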
The Branching Patterns of Molecular Structures
The Rules of Branching
[Equations for Rules 1-8: each rule gives the change in the Wiener number, ΔW, produced by one branching transformation, written through N, N1, N2, and j and compared with zero; the operators and signs did not survive extraction]
(N – number of vertices in the main chain; j – branch position; N1 - number of vertices in the branch)
[Figure and equations: ΔW for a branch of n1 vertices attached at vertices u and v (u1, v1), written through d(u), d(v), L, and the counts n(u,i) and n(v,i); exact form not recoverable]
Generalization of the Branching Rules
Five more general rules were derived: three describe mechanisms of formation of new branches, one describes branch transformations related to a redistribution of vertex degrees, and one shows that branch elongation is topologically identical to shifting a branch toward a more central position.
Conclusions:
The number of branches and the number of vertices of higher degree are considerably stronger complexity factors than the branch length and branch centrality. However, the role of centrality increases with the size of the system and becomes dominant in polymeric macromolecules.
D. Bonchev, Topological Order in Molecules. 1. Molecular Branching Revisited, Theochem 336(1995)137-156.
O. E. Polansky and D. Bonchev, Commun. Math. Comput. Chem. (MATCH) 1986, 21, 133‑186; 1990, 25, 3‑40.
Molecular Cyclicity
Similar conjecture: All structural patterns that increase the cyclic complexity of molecules are associated with a decrease in the Wiener number.
Bonchev, Mekenyan, Trinajstic, 1979-1983
Cyclic complexity increases by (figure panels A-E):
• reduction in the cycle size for the creation of more cycles of smaller size (26, 2 x 14, 3 x 10, 4 x 8, 6 x 6)
• a stronger link between the cycles
• transforming a linear chain of cycles into a zigzag-like one
• increasing the number of cycles fused to a common edge (propellerity)
Papers for cyclic complexity: Intern. J. Quantum Chem. 1980, 17, 845‑89; 1981, 19, 929‑955. Math. Comput. Chem. (MATCH) 1979, 6, 93‑115; 1981, 11, 145‑168; Croat. Chem. Acta 1983, 56, 237‑261.
[Figure: HOMO and LUMO levels. Rules 3, 5-7, 9, 10, 12-15: ΔW < 0, ΔE > 0. Rule 1: ΔW < 0, ΔE < 0]
With a single exception, the 15 rules derived for benzenoid hydrocarbons identify structural transformations that increase their stability.
Topology of Polymers
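[Figure: polymer structures 1-10]
Wiener “infinite” index: the limit of the Wiener number of a polymer having N non-H atoms, normalized per unit distance and unit bond:
$W_\infty = \lim_{N \to \infty} \frac{W}{[N(N-1)/2](N+C-1)} = \lim_{N \to \infty} \frac{aN^3 + bN^2 + cN + d}{[N(N-1)/2](N+C-1)}$
(N – number of atoms, C – number of cycles)
For structure 9:
[Equation: the limit of a ratio of two cubic polynomials in N, equal to 1/12 = 0.0833]
[Equation: linear correlation between E and W with coefficients 1.533 and 0.851; n = 25, r = 0.902, s = 0.01; signs not recoverable]
(Bonchev, Mekenyan et al., 1980-1983)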
Improved Method for Calculating Wiener Infinite Index
A simple equation incorporating only topological invariants of the monomer unit was derived 10 years later. These are the numbers of atoms N1 and cycles C1 in the monomer unit, as well as the number of bonds d (or the graph distance) between two neighboring monomer units:
$W_\infty = \frac{d}{3(N_1 + C_1)}$
Examples:
d = 2, N1 = 4, C1 = 1, W∞ = 2/15; d = 4, N1 = 6, C1 = 1, W∞ = 4/21
T.-S. Balaban, A. T. Balaban, and D. Bonchev, J. Mol. Structure (Theochem) 2001, 535, 81-92
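The two numerical examples fix the form of the monomer-unit equation as d / [3(N1 + C1)]; the plus sign is inferred from those examples, since the operator is not legible in the extracted formula. A small sketch with exact fractions (the function name is hypothetical):

```python
from fractions import Fraction

def wiener_infinite_index(d, n1, c1):
    """W-infinity from monomer invariants, assuming the form d / (3 * (N1 + C1))."""
    return Fraction(d, 3 * (n1 + c1))

example_a = wiener_infinite_index(d=2, n1=4, c1=1)
example_b = wiener_infinite_index(d=4, n1=6, c1=1)
print(example_a, example_b)
```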
Equations linking the Wiener number to the radius of gyration and viscosity of polymer melts and solutions
$R_g^2 = b^2 R_{top}^2 = b^2 W / N^2$
$g = R_g^2 / (R_g^2)_{lin} = W / W_{lin}$
g is the Zimm-Stockmayer branching ratio of a branched macromolecule
g (3-arm star) = W / W_lin = (3x1+3x2) / (3x1 + 2x2 + 1x3) = 9/10 = 0.9
[Equation: zero-shear viscosity expressed through the friction coefficient, c, b², W, and N with a numerical factor of 6; exact form not recoverable]
c is the number of polymer chains in a unit volume
Rg² and g are measured by laser light scattering
D. Bonchev, E. Markel, and A. Dekmezian, J. Chem. Inf. Comput. Sci. 2001, 41, 1274-1285.
D. Bonchev, E. Markel, and A. Dekmezian, Polymer 2002, 43, 203-222.
Kirchhoff-number-based generalization of the equations for polymers containing atomic rings
D. Bonchev, O. Mekenyan, and H. Fritsche, An Approach to the Topological Modeling of Crystal Growth, J. Cryst. Growth 1980, 49, 90‑96.
D. Bonchev, O. Mekenyan, and H. Fritsche, A Topological Approach to Crystal Vacancy Studies. I. Model Crystallites with a Single Vacancy, Phys. Stat. Sol. (a) 1979, 55, 181‑187.
O. Mekenyan, D. Bonchev, and H. Fritsche, A Topological Approach to Crystal Vacancy Studies. II. Model Crystallites with Two and Three Vacancies, Phys. Stat. Sol. (a) 1979, 56, 607‑614.
O. Mekenyan, D. Bonchev, and H. Fritsche, A Topological Approach to Crystal Defect Studies, Z. Phys. Chem. (Leipzig) 1984, 265, 959‑967.
H. G. Fritsche, D. Bonchev, and O. Mekenyan, Deutung der Magischen Zahlen von Argonclustern als Extremwerte Topologischer Indizes, Z. Chem. 1987, 27, 234.
H. G. Fritsche, D. Bonchev, and O. Mekenyan, On the Topologies of (M13)13 Superclusters of Ruthenium, Rhodium and Gold, J. Less‑Common Metals 1988, 141, 137‑143.
H. G. Fritsche, D. Bonchev, and O. Mekenyan, Are Small Clusters of Inert‑Gas Atoms Polyhedra of Minimum Surfaces? Phys. Stat. Sol. (b) 1988, 148K, 101‑104.
H. G. Fritsche, D. Bonchev, and O. Mekenyan, The Optimum Topology of Small Clusters, Z. Phys. Chem. (Leipzig) 1989, 270, 467‑476.
H. G. Fritsche, D. Bonchev, and O. Mekenyan, A Topological Approach to Studies of Ordered Structures of Absorbed Gases in Host Lattices (I). The Structure of ‑PdD0.5, Crystal Res. Technol. 1983, 18, 1075‑1081.
Topology of Crystals
Basic criterion used: Wiener number minimum
Crystal Growth
The detailed sequences of crystal growth were constructed by adding one atom at each step and selecting, from a number of candidate structures, the one with the minimum Wiener number.
The reproduced shapes are maximally close to the spherical shape typical for free nucleation in the vapor phase and for crystallization under zero-gravity conditions:
[Figure: growing crystallites with W = 1, 8, 48, 369, 972, 5536]
Crystallization on a substrate with a low surface energy.
The crystallization on a substrate with a high surface energy also reproduced the experimentally observed monolayer shape.
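The minimum-W growth criterion can be sketched on a simple cubic lattice. The code below is an illustration under assumptions, not the published procedure: atoms sit on integer lattice sites, bonds join nearest neighbors, and one greedy step adds the empty neighboring site that gives the smallest Wiener number. Under these assumptions, box-shaped crystallites of 1x1x2, 2x2x1, 2x2x2, 3x3x2, and 3x3x3 atoms reproduce the first five W values quoted for the figure; whether the figure showed exactly these shapes is an assumption. All names are hypothetical.

```python
from collections import deque
from itertools import product

STEPS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]

def neighbors(site):
    """The six nearest-neighbor sites of a simple cubic lattice site."""
    x, y, z = site
    return [(x + dx, y + dy, z + dz) for dx, dy, dz in STEPS]

def wiener(sites):
    """Wiener number of a connected cluster; distances are counted along bonds."""
    sites = set(sites)
    total = 0
    for source in sites:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in neighbors(u):
                if w in sites and w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
    return total // 2

def box(a, b, c):
    """All sites of an a x b x c block of atoms."""
    return set(product(range(a), range(b), range(c)))

def grow_one_atom(sites):
    """Greedy step: occupy the empty neighboring site that minimizes W."""
    sites = set(sites)
    candidates = {w for s in sites for w in neighbors(s)} - sites
    best = min(sorted(candidates), key=lambda site: wiener(sites | {site}))
    return sites | {best}

box_values = [wiener(box(*shape)) for shape in
              [(1, 1, 2), (2, 2, 1), (2, 2, 2), (3, 3, 2), (3, 3, 3)]]
print(box_values)

bent_trimer = {(0, 0, 0), (1, 0, 0), (0, 1, 0)}
print(sorted(grow_one_atom(bent_trimer)))
```

Starting from the bent trimer, the greedy step closes the square, the only candidate with W = 8. Ties between candidates are broken here by sorted site order, which is an arbitrary choice.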
Prediction of the most probable locations of crystal vacancies and defect atoms
Criterion used: $\Delta W = W_0 - W = \max$
Equations were derived for a series of two- and three-dimensional models of crystal lattice with variable vacancy locations. For a simple cubic crystallite having N = 3x3x3 atoms, the variation in the Wiener number is expressed as:
[Equation: ΔW as a polynomial in N and the vacancy coordinates i, j, k; not recoverable from the extracted text]
where i, j, k are the lattice nodes along the x, y, and z coordinate axes, respectively.
ΔW increases when going from volume to face to edge to corner in agreement with thermodynamic theory and quantum chemical calculations.
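The stated trend can be checked numerically for the 3x3x3 crystallite. The sketch below rests on assumptions: ΔW is taken as W0 - W, where W0 is the Wiener number of the perfect crystallite and W that of the crystallite with one atom removed, and distances are measured along nearest-neighbor bonds of the remaining atoms. The function and variable names are hypothetical.

```python
from collections import deque
from itertools import product

STEPS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]

def wiener(sites):
    """Wiener number of a connected set of lattice sites (bond-path distances)."""
    sites = set(sites)
    total = 0
    for source in sites:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            x, y, z = queue.popleft()
            for dx, dy, dz in STEPS:
                w = (x + dx, y + dy, z + dz)
                if w in sites and w not in dist:
                    dist[w] = dist[(x, y, z)] + 1
                    queue.append(w)
        total += sum(dist.values())
    return total // 2

def vacancy_delta_w(size, vacancy):
    """Delta W = W0 - W for a single vacancy in a size x size x size crystallite."""
    lattice = set(product(range(size), repeat=3))
    return wiener(lattice) - wiener(lattice - {vacancy})

# One representative site of each type in the 3x3x3 crystallite.
vacancy_sites = {
    "volume": (1, 1, 1),
    "face": (0, 1, 1),
    "edge": (0, 0, 1),
    "corner": (0, 0, 0),
}
delta_w = {kind: vacancy_delta_w(3, site) for kind, site in vacancy_sites.items()}
print(delta_w)
```

With this definition ΔW grows from the volume site to the face, edge, and corner sites, matching the ordering described in the text.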
Modeling of Atomic Clusters
The Wiener number minimum was used again as a criterion
Adding one atom at a time over a certain crystal face and connecting this atom to all face atoms produced cluster genetic lines.
Two of the genetic lines resulted in icosahedra, two others yielded a cubo-octahedron, and another line generated an anticubo-octahedron, in agreement with the experimental data.
The minimum of the Wiener number in the icosahedron cluster also explained the “magic” number 13, for which a maximum intensity of cluster mass spectra has been observed.
The approach correctly predicted the doubly magic metal superclusters [(M13)13]n, where M = ruthenium, rhodium or gold, as well as the stable argon clusters at the magic numbers 13, 19, 23, 26, 29, and 32.
“Small-World” Connectivity
• Complex network properties: High Connectivity and Small Diameter
• They can be integrated into a single parameter:
$B1 = A/D = \langle a_i \rangle / \langle d_i \rangle$
$B2 = \sum_{i=1}^{V} b_i = \sum_{i=1}^{V} a_i / d_i$
<a_i> - average vertex degree; <d_i> - average node distance
b_i - a measure of node centrality
B1 – a quick estimate of network complexity
B2 – a much more precise complexity measure
Complexity Patterns Analysis
[Figure: structures 1-13]
Structure   A/D     B2      B3
1           0.200   1.105   2.385
2           0.222   1.294   2.554
3           0.250   1.571   2.628
4           0.333   1.667   3.871
5           0.313   1.677   3.641
6           0.313   1.783   3.650
7           0.429   2.200   3.387
8           0.400   2.211   4.972
9           0.429   2.410   4.957
10          0.538   2.867   6.298
11          0.538   2.943   6.311
12          0.818   4.200   9.580
13          1       5       11.61
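B1 and B2 need only vertex degrees and vertex distance sums. The sketch below is an illustration under assumptions: A is the sum of vertex degrees, D the sum of all vertex distance sums, and b_i = a_i / d_i. The four five-vertex test graphs (path, 2-methylbutane skeleton, star, complete graph) are chosen here because they reproduce A/D and B2 values that appear in the table; which numbered structure each one corresponds to in the figure is an assumption. The function name is hypothetical.

```python
from collections import deque
from itertools import combinations

def b1_b2(n_vertices, edges):
    """Return (B1, B2): B1 = A/D and B2 = sum over vertices of a_i / d_i."""
    adj = {v: [] for v in range(1, n_vertices + 1)}
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    degree = {v: len(adj[v]) for v in adj}
    dist_sum = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        dist_sum[source] = sum(dist.values())
    b1 = sum(degree.values()) / sum(dist_sum.values())
    b2 = sum(degree[v] / dist_sum[v] for v in adj)
    return b1, b2

graphs = {
    "path": [(1, 2), (2, 3), (3, 4), (4, 5)],
    "2-methylbutane": [(1, 2), (2, 3), (3, 4), (2, 5)],
    "star": [(1, 2), (1, 3), (1, 4), (1, 5)],
    "complete": list(combinations(range(1, 6), 2)),
}
results = {name: tuple(round(x, 3) for x in b1_b2(5, edges))
           for name, edges in graphs.items()}
print(results)
```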