1 Chemical Structure Representation and Search Systems Lecture 5. Nov 13, 2003 John Barnard Barnard Chemical Information Ltd Chemical Informatics Software & Consultancy Services Sheffield, UK


1 Chemical Structure Representation and Search Systems

Lecture 5. Nov 13, 2003

John Barnard

Barnard Chemical Information Ltd
Chemical Informatics Software & Consultancy Services

Sheffield, UK


2 Lecture 5: Topics to be Covered

• Reaction searching
o atom-atom mapping
o Maximal Common Substructure search

• 3D substructure search

• Searching Markush structures in patents
o nature and origin of Markush structures
o fragment codes
o topological systems (MARPAT, Markush DARC)


3 Searching Chemical Reactions

each database entry contains several molecules
• reactants
• products
• catalysts
• solvents
• etc.

may want query substructure confined to one of these
• can be done by assigning a role indicator to each molecule
• but role indicators are not enough on their own for a useful reaction search system
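To see why role indicators alone fall short, here is a minimal Python sketch. It assumes each molecule in a reaction entry is stored with a role indicator plus, purely for illustration, a precomputed set of feature labels; the entry layout, feature names and the `role_search` helper are all hypothetical. A role-restricted query for C=O in a reactant and C-OH in a product succeeds even though nothing links the two groups.

```python
# Hypothetical reaction entry: each molecule has a role indicator and a
# precomputed set of feature labels (illustrative only, not a real schema).
reaction = [
    {"role": "reactant", "features": {"C=O", "C-OH"}},
    {"role": "reactant", "features": {"C-Br"}},
    {"role": "product", "features": {"C=O", "C-OH", "C-O-C"}},
    {"role": "product", "features": {"H-Br"}},
]


def role_search(entry, query):
    """True if, for every (role, feature) pair in the query, some molecule
    with that role contains the feature. Nothing ties the features together."""
    return all(
        any(feature in mol["features"] for mol in entry if mol["role"] == role)
        for role, feature in query.items()
    )


# Query: a ketone in a reactant and a hydroxyl in a product.
print(role_search(reaction, {"reactant": "C=O", "product": "C-OH"}))  # True
```

The entry is reported as a hit although the C=O survives unchanged in the product; only an atom-atom mapping can show that the product hydroxyl oxygen is a different atom from the reactant ketone oxygen.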


4 Reaction search

Query: C=O → C-OH


5 Reaction search

Query: C=O → C-OH

“Hit”: [reaction diagram with O, OH, CH3, Br and OCH3 groups; HBr is a by-product]

We didn’t get what we wanted, because the hydroxyl in the product did not involve the same oxygen as the ketone in the reactant.

We need to “map” the atoms between the reactant and product.


6 Atom mapping

atoms on each side of the reaction can be numbered to show which corresponds to which
• similar mappings can be used in the query

automatic assignment of atom mapping is very important in reaction indexing systems
• problem is obviously related to finding a graph isomorphism between reactant and product sides
• except that the two sides are NOT isomorphic

[Figure: the reaction from slide 5 with corresponding atoms given the same number (.1. to .12.) on both sides, e.g. O.11., OH.8., CH3 .10. and Br.12.]
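A minimal Python sketch of how a mapping resolves the false hit, assuming atoms are numbered separately on each side and the mapping is a plain dictionary from reactant atom to product atom. The data layout and the helper `ketone_o_becomes_hydroxyl` are hypothetical; heavy atoms only, with hydrogen counts stored per atom. In the second case the reaction happens elsewhere in the molecule, and that part is omitted for brevity.

```python
def ketone_o_becomes_hydroxyl(r_atoms, r_bonds, p_atoms, p_bonds, mapping):
    """r_atoms / p_atoms: atom id -> (element, attached H count).
    r_bonds / p_bonds: frozenset({a, b}) -> bond order.
    mapping: reactant atom id -> product atom id (the atom-atom map).
    True if some reactant C=O oxygen is mapped to a product O that is singly
    bonded to the mapped carbon and carries one H (i.e. becomes a C-OH)."""
    for bond, order in r_bonds.items():
        if order != 2:
            continue
        a, b = tuple(bond)
        for c, o in ((a, b), (b, a)):
            if r_atoms[c][0] != "C" or r_atoms[o][0] != "O":
                continue
            pc, po = mapping.get(c), mapping.get(o)
            if pc is None or po is None:
                continue  # unmapped atom (e.g. lost to a by-product)
            if p_bonds.get(frozenset((pc, po))) == 1 and p_atoms[po] == ("O", 1):
                return True
    return False


# Case 1: acetone -> propan-2-ol; the ketone O itself becomes the OH.
r_atoms = {0: ("C", 0), 1: ("O", 0), 2: ("C", 3), 3: ("C", 3)}
r_bonds = {frozenset((0, 1)): 2, frozenset((0, 2)): 1, frozenset((0, 3)): 1}
p_atoms = {10: ("C", 1), 11: ("O", 1), 12: ("C", 3), 13: ("C", 3)}
p_bonds = {frozenset((10, 11)): 1, frozenset((10, 12)): 1, frozenset((10, 13)): 1}
print(ketone_o_becomes_hydroxyl(r_atoms, r_bonds, p_atoms, p_bonds,
                                {0: 10, 1: 11, 2: 12, 3: 13}))  # True

# Case 2: a hydroxy ketone whose C=O and C-OH both survive unchanged.
h_atoms = {0: ("C", 0), 1: ("O", 0), 2: ("C", 3), 3: ("C", 2), 4: ("O", 1)}
h_bonds = {frozenset((0, 1)): 2, frozenset((0, 2)): 1,
           frozenset((0, 3)): 1, frozenset((3, 4)): 1}
identity = {i: i for i in h_atoms}
print(ketone_o_becomes_hydroxyl(h_atoms, h_bonds, h_atoms, h_bonds, identity))  # False
```

A role-only search would accept case 2 (C=O in a reactant, C-OH in a product); following the mapping rejects it.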


7 Maximal common subgraph

atoms and bonds in red represent the largest subgraph that is common to both sides
• all these atoms have the same neighbours on both sides
• none of these bonds are made or broken

remaining atoms and bonds represent the reaction site

[Figure: the mapped reaction from slide 6, with the maximal common subgraph highlighted in red]
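Once a mapping exists, separating the common part from the reaction site is simple bookkeeping, sketched below in Python. This assumes bonds are keyed by pairs of atom-map numbers with bond orders as values; the function name and the toy O-methylation are hypothetical. Note that this is not an MCS computation: the MCS is what one solves to obtain the mapping in the first place.

```python
def reaction_centre(reactant_bonds, product_bonds):
    """Bonds are keyed by frozensets of atom-map numbers, valued by bond order.
    Bonds with the same order on both sides belong to the common (unchanged)
    part; the rest are made, broken or changed, and their atoms form the
    reaction site. Hydrogens are implicit, so X-H bond changes are not seen."""
    changed = {
        bond
        for bond in reactant_bonds.keys() | product_bonds.keys()
        if reactant_bonds.get(bond, 0) != product_bonds.get(bond, 0)
    }
    site_atoms = set().union(*changed)
    return changed, site_atoms


# Hypothetical O-methylation R-O(H) + CH3-Br -> R-O-CH3 + HBr, heavy atoms only:
# map 1 = carbon of R, 2 = oxygen, 3 = methyl carbon, 4 = bromine
reactant = {frozenset((1, 2)): 1, frozenset((3, 4)): 1}
product = {frozenset((1, 2)): 1, frozenset((2, 3)): 1}
changed, site = reaction_centre(reactant, product)
print(sorted(site))  # [2, 3, 4]
```

The C1-O2 bond is in the common part; the broken C3-Br4 bond and the new O2-C3 bond mark the reaction site.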


8 Maximal common subgraph

Finding the MCS between two graphs is an NP-complete problem
• even worse than subgraph isomorphism, because you don’t know in advance how big the subgraph will be
• exhaustive backtracking is prohibitively slow
• the best algorithms find an approximate solution (i.e. a large, but not necessarily maximal, subgraph)
• tricks can be used to determine an upper bound for the size of the MCS (so you can stop looking when you’ve found one of this size)
• new algorithm published 2002
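The backtracking search and the upper-bound trick can be illustrated on toy graphs. This is a sketch under stated assumptions: it finds a maximum common induced subgraph on atoms (element labels only, bond orders ignored, not necessarily connected), whereas reaction-mapping systems usually work on bonds and require connectivity; the function names are hypothetical, and this is not the 2002 algorithm mentioned above. The bound is per element: no more atoms of an element can be matched than the smaller of its two counts.

```python
from collections import Counter


def label_bound(labels1, atoms1, labels2, atoms2):
    """Upper bound on further matches: per element, min of the two counts."""
    c1 = Counter(labels1[a] for a in atoms1)
    c2 = Counter(labels2[a] for a in atoms2)
    return sum(min(n, c2[el]) for el, n in c1.items())


def mcis(labels1, bonds1, labels2, bonds2):
    """Exhaustive backtracking for a maximum common induced subgraph.
    labels: atom -> element; bonds: set of frozenset({a, b}).
    Returns a dict mapping atoms of graph 1 to atoms of graph 2."""
    order = list(labels1)
    best = {}
    ceiling = label_bound(labels1, order, labels2, list(labels2))

    def consistent(u, v, mapping):
        # bonded in graph 1 exactly when the images are bonded in graph 2
        return all(
            (frozenset((u, u2)) in bonds1) == (frozenset((v, v2)) in bonds2)
            for u2, v2 in mapping.items()
        )

    def extend(i, mapping):
        nonlocal best
        if len(mapping) > len(best):
            best = dict(mapping)
            if len(best) == ceiling:
                return True  # reached the upper bound: stop looking
        if i == len(order):
            return False
        used = set(mapping.values())
        unused = [v for v in labels2 if v not in used]
        if len(mapping) + label_bound(labels1, order[i:], labels2, unused) <= len(best):
            return False  # prune: this branch cannot beat the best so far
        u = order[i]
        for v in unused:
            if labels2[v] == labels1[u] and consistent(u, v, mapping):
                mapping[u] = v
                if extend(i + 1, mapping):
                    return True
                del mapping[u]
        return extend(i + 1, mapping)  # also try leaving u unmatched

    extend(0, {})
    return best


# Toy example: propan-1-ol heavy atoms (C1-C2-C3-O4) vs ethanol (Ca-Cb-Oc).
propanol = ({1: "C", 2: "C", 3: "C", 4: "O"},
            {frozenset((1, 2)), frozenset((2, 3)), frozenset((3, 4))})
ethanol = ({"a": "C", "b": "C", "c": "O"},
           {frozenset(("a", "b")), frozenset(("b", "c"))})
print(mcis(*propanol, *ethanol))  # {2: 'a', 3: 'b', 4: 'c'}
```

Even on these tiny inputs the search branches over every assignment; the early stop at `ceiling` and the per-branch bound are what keep it tractable here.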


9 Applications of MCS

MCS algorithms can be applied to things other than atom-atom mapping in reactions
• structural similarity between molecules
o size of MCS (relative to size of molecules) can be used as a measure of similarity of molecules
• approximate match searches
o search for molecules containing at least 80% of query substructure
• multiple maximal common substructure
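Both uses reduce to simple arithmetic once the MCS size is known. The Python sketch below assumes atom counts as the size measure and uses a Tanimoto-style ratio (MCS size over the size of the union) as one common normalisation; other systems normalise differently. The function names are hypothetical; the 80% threshold comes from the slide.

```python
def mcs_similarity(n_atoms_a, n_atoms_b, n_mcs):
    """Tanimoto-style ratio: MCS size over the size of the union of A and B."""
    return n_mcs / (n_atoms_a + n_atoms_b - n_mcs)


def approximate_match(n_query_atoms, n_mcs, threshold=0.8):
    """Hit if the MCS with the query covers at least `threshold` of the query."""
    return n_mcs >= threshold * n_query_atoms


print(round(mcs_similarity(10, 12, 8), 3))  # 0.571
print(approximate_match(10, 8), approximate_match(10, 7))  # True False
```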


10 Multiple MCS

largest substructure common to whole set of molecules
• can be used to extract “core” for a Markush structure
• might represent features important for biological activity
• even more difficult than MCS of two molecules
o unfortunately it doesn’t work to find the MCS of the first two, and then the MCS between that and the third, etc.: the pairwise MCS may keep parts missing from the third molecule while discarding a smaller substructure that all of them share
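The failure of the sequential approach can be shown with an analogy, sketched in Python: for unbranched chains of atoms, a connected MCS behaves like a longest common substring of the atom-label sequences (ignoring that a chain can be read in either direction). Strings stand in for chains here, and the example strings and function names are hypothetical.

```python
def longest_common_substring(a, b):
    """Dynamic programming: cur[j] = length of the common run ending at a[i-1], b[j-1]."""
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur
    return a[best_end - best_len:best_end]


def common_to_all(strings):
    """Brute force: longest substring of the shortest string found in every string."""
    shortest = min(strings, key=len)
    for length in range(len(shortest), 0, -1):
        for start in range(len(shortest) - length + 1):
            candidate = shortest[start:start + length]
            if all(candidate in s for s in strings):
                return candidate
    return ""


chains = ["AAAAxBB", "AAAAyBB", "zBBz"]
pairwise = longest_common_substring(chains[0], chains[1])   # "AAAA"
sequential = longest_common_substring(pairwise, chains[2])  # nothing left
print(repr(sequential), repr(common_to_all(chains)))        # '' 'BB'
```

The first pairwise step keeps “AAAA”, which the third chain lacks, and throws away “BB”, which all three share.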


11 3-D substructure search

Analogous to 2-D substructure search
• need to find atoms in correct spatial orientation relative to each other
o some fuzziness (tolerance) permitted in distance values
• query can be defined as a group of atoms, with specified interatomic distances
o sometimes called a pharmacophore
• both query and database structures can be shown as topological graphs in which the nodes are atoms, but the edges are interatomic distances


12 3-D substructure searching

the interatomic distances are the labels on the edges

graph is fully-connected (an edge between every pair of nodes)

the graph edges do not correspond to bonds in the molecule

matching is then a process of subgraph isomorphism between such graphs

[Figure: fully-connected graph with nodes N, C, C and O; edge labels 2.3Å, 5.1Å, 2.5Å, 6.4Å, 7.1Å and 4.1Å]
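Subgraph isomorphism on such distance-labelled graphs can be sketched by brute force in Python: try every ordered choice of database atoms for the query points and check element types and all interatomic distances within a tolerance. This is deliberately simpler than Ullmann’s algorithm; the data layout (element plus Cartesian coordinates in Å), the tolerance value and the function name are assumptions for illustration.

```python
import math
from itertools import permutations


def match_pharmacophore(query_types, query_dists, atoms, tol=0.3):
    """Brute-force 3D match: try every ordered choice of database atoms for the
    query points and check element types and all interatomic distances.
    query_types: element per query point; query_dists: {(i, j): distance in Å};
    atoms: list of (element, (x, y, z)); tol: permitted deviation in Å."""
    hits = []
    for combo in permutations(range(len(atoms)), len(query_types)):
        if any(atoms[a][0] != t for a, t in zip(combo, query_types)):
            continue
        if all(
            abs(math.dist(atoms[combo[i]][1], atoms[combo[j]][1]) - d) <= tol
            for (i, j), d in query_dists.items()
        ):
            hits.append(combo)
    return hits


# Hypothetical query N...C...O and a toy database "molecule" (coordinates in Å).
query_types = ["N", "C", "O"]
query_dists = {(0, 1): 1.5, (0, 2): 3.0, (1, 2): 1.5}
molecule = [("N", (0.0, 0.0, 0.0)), ("C", (1.5, 0.0, 0.0)),
            ("O", (3.0, 0.0, 0.0)), ("O", (0.0, 5.0, 0.0))]
print(match_pharmacophore(query_types, query_dists, molecule))  # [(0, 1, 2)]
```

Because the graphs are fully connected, every pair of query points is checked, which is why this kind of matching costs more than 2D substructure search.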


13 3D substructure searching

subgraph isomorphism involving fully-connected graphs is computationally more demanding than for 2D substructure search

• Ullmann’s algorithm performs well
• other approaches (e.g. clique detection) have also been used

fingerprint-like screening stages can also be applied in the search, based on 3D-fragments such as 3-point pharmacophores

• screens based on torsion and valence angles have also been used

Willett, P. Three-Dimensional Chemical Structure Handling. Wiley: New York (1991)
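A screening stage can be sketched in Python with distance-binned keys. For brevity this uses two-point (atom-pair distance) keys rather than the 3-point pharmacophore keys mentioned above; the bin width, tolerance and function names are assumptions. The query side accepts every bin overlapping the tolerance window, so a molecule that would match the full geometric search is never screened out.

```python
import math
from itertools import combinations

BIN_WIDTH = 1.0  # Å; an illustrative choice


def pair_keys(atoms):
    """Screen keys for a database molecule: (sorted element pair, distance bin)."""
    keys = set()
    for (e1, p1), (e2, p2) in combinations(atoms, 2):
        keys.add((tuple(sorted((e1, e2))), int(math.dist(p1, p2) // BIN_WIDTH)))
    return keys


def passes_screen(query_types, query_dists, mol_keys, tol=0.5):
    """The molecule survives if every query distance has a key for the same
    element pair in some bin overlapping [d - tol, d + tol]; survivors still
    need the full geometric match."""
    for (i, j), d in query_dists.items():
        pair = tuple(sorted((query_types[i], query_types[j])))
        lo = int(max(d - tol, 0.0) // BIN_WIDTH)
        hi = int((d + tol) // BIN_WIDTH)
        if not any((pair, b) in mol_keys for b in range(lo, hi + 1)):
            return False
    return True


molecule = [("N", (0.0, 0.0, 0.0)), ("C", (1.5, 0.0, 0.0)),
            ("O", (3.0, 0.0, 0.0)), ("O", (0.0, 5.0, 0.0))]
keys = pair_keys(molecule)
print(passes_screen(["N", "C", "O"], {(0, 1): 1.5, (0, 2): 3.0, (1, 2): 1.5}, keys))  # True
print(passes_screen(["N", "O"], {(0, 1): 7.0}, keys))  # False
```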


14 Chemical patents

Contract between inventor and State to encourage innovation

• Inventor reveals nature of invention
• State grants protected monopoly over its exploitation for limited period

Invention must be novel, useful and non-obvious
• new ways of making compounds
• new compounds with useful properties (therapeutic uses)

Essential for success of pharmaceutical industry

Knowledge of existing patents (prior art) essential to avoid fruitless development


15 Chemical patents

May claim single product or process

More usually claim class of products or processes to ensure protection for closely-related compounds etc.

Very broad claims can disguise true nature of invention
• But may claim compounds which lack claimed activity
• Nested series of claims (A, preferably B, more preferably C etc.) can provide “fallback” positions

Extremely broad claims have become more common as Patent Offices moved to publication before examination
• Sibley, J. F. “Too broad generic disclosures: a problem for all” J. Chem. Inf. Comput. Sci. 1991, 31 (1), 5-8


16 R1-X-R36

R1 is a substituted or unsubstituted, mono-, di- or polycyclic, aromatic or non-aromatic carbocyclic or heterocyclic ring system, or…

X is a single or double bond, substituted or unsubstituted heteroatom, or substituted carbon atom, or substituted or unsubstituted chain of two or more carbon atoms and/or heteroatoms…

R36 is substituted or unsubstituted asymmetrical heterocyclic ring system having at least 3 nitrogens…

[Structure 32 from Claim 105 of PCT Application 8704321, claimed as novel]


17 The patent explosion

Originally only granted patents were published.

Belgium (1950s), Netherlands (1964) and EPO (1978) began publishing all patent applications.

Rapid publication makes information available very quickly.

Huge number of patents, many of low quality, with insufficient or incorrect details, or no novelty.

Less work for patent examiners but greater problems for retrieval systems.


18 Structural information in chemical patents

Uses mixture of:

• 2D structure diagrams

• linear formulae (e.g. “C2H5”, “EtOH”)

• specific nomenclature (e.g. “phenyl”, “isopropyl”)

• generic nomenclature (e.g. “alkyl”, “heteroaryl”)

• non-structural expressions (e.g. “pharmaceutically acceptable cation”, “group known in the art”)

Many machine-readable systems just show structural information as free text and images


19 Specific Structures from Patents

Several databases contain specific molecules claimed in patents
• Chemical Abstracts Registry
• Derwent Registry
• MDL announced major new database Nov 2003
o will include reactions, molecules and Markush display
o http://www.mdl.com/company/news/press_releases/2003/pr_patentdb_07nov03.jsp


20 Markush Structures

also known as “Generic Structures” or “R-group Structures”

chemical structures involving variable parts

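[Figure: a generic structure with OH, R1 and R2 on a common scaffold; R1 = Br, I or Cl; R2 = one of several short alkyl chains (CH2/CH3 groups), with * marking the attachment points]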


21 Markush Structures

compact representation of a set or class of specific compounds with common structural features

used in
• chemical patents
• query structures in substructure search systems
• Quantitative Structure-Activity Relationship (QSAR) analysis
o class of related compounds with activity data
• combinatorial libraries
o rapid synthesis of large numbers of related compounds
• legislation (controlled drugs, chemical weapons)


22 Variability in Markush Structures

s-variation (substituent variation): list of alternative values for an R-group

p-variation (position variation): variable point of attachment

f-variation (frequency variation): multiple occurrence of groups

h-variation (homology variation): generically described group (e.g. “alkyl”)
• potentially infinite set of specific alternatives


23 Types of variation

substituent variation

R1 is methyl or ethyl

homology variation

R2 is alkyl

position variation

R3 is amino

frequency variation

m is 1-3

[Structure diagram: scaffold bearing OH, Cl, R1, R2, R3 and a (CH2)m chain]


24 Types of Markush structure

subst homol posn freq

Patents * * * *

Queries * (*) (*) (*)

QSAR * *

Libraries * (*) (*)

Legislation * * (*) (*)


25 Markush Structures

Compact representation for sets of molecules
• common parts shown once only

Can be considered as a formal “grammar” for generating valid molecules (“sentences”)

Enumeration of coverage usually impractical and often impossible (infinite sets)

Appropriate algorithms for handling take advantage of the Markush representation:
• Avoid enumeration (especially infinite sets)
• Compare finite grammars rather than infinite sets of valid sentences
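A Python sketch of why enumeration is avoided, assuming a Markush structure with only finite substituent (s-) variation, stored as a dictionary from R-group to its list of alternatives. The structure and function names are hypothetical. The coverage size follows directly from the “grammar” as a product of option counts, while enumeration grows multiplicatively; h-variation such as “alkyl” has no finite count at all and is outside this sketch.

```python
from itertools import product
from math import prod

# Hypothetical finite Markush structure: R-group -> alternatives (s-variation only).
markush = {
    "R1": ["methyl", "ethyl"],
    "R2": ["F", "Cl", "Br", "I"],
    "R3": ["H", "OH", "NH2"],
}


def count_members(structure):
    """Size of the coverage, computed from the grammar without enumerating it."""
    return prod(len(alts) for alts in structure.values())


def enumerate_members(structure):
    """Lazily generate each specific compound as an R-group assignment."""
    names = list(structure)
    for choice in product(*structure.values()):
        yield dict(zip(names, choice))


print(count_members(markush))            # 24
print(next(enumerate_members(markush)))  # {'R1': 'methyl', 'R2': 'F', 'R3': 'H'}

# Ten R-groups with 20 alternatives each is already far beyond enumeration.
ten_groups = {f"R{i}": range(20) for i in range(1, 11)}
print(count_members(ten_groups))         # 10240000000000
```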


26 Dr Eugene A. Markush

born Budapest, Hungary, c. 1888

migrated to USA, 1913 (Citizen, 1920)

Founded Pharma Chemical Corporation (NJ), 1919

Filed US patent 1506316 on pyrazolone dyes, 9 January 1924, using the expression “where R is a group selected from ...” to circumvent the USPTO “rule against ‘or’ ”

died New York, 21 April 1968


27 Markush storage and retrieval

Early systems (1950s, 1960s) developed in-house by pharmaceutical companies/consortiums

High costs of patent abstracting and technical difficulties with automation shifted development to specialist companies

Fragmentation code systems superseded by topological (structure graphics) systems


28 Fragmentation Codes

Structural features (ring systems, functional groups, etc.) used as indexing terms

Structural relationships usually lost
• all alternatives tend to be “over-coded”
• retrieved structures include many “false drops” (“ballast”)

Codes originally assigned manually
• Now usually generated (semi-)automatically from graphical input
• Queries also generated automatically

Some codes use “closed” set of terms (periodically revised); others are “open-ended”
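How lost structural relationships produce false drops can be sketched in Python. The Markush structure, fragment term names and helper functions are hypothetical: R1 = 4-chlorophenyl or methyl, indexed as the union of terms from all alternatives. The `same_alternative` check only shows that the terms come from different alternatives; it is not a real structure match.

```python
# Hypothetical Markush: R1 = 4-chlorophenyl or methyl, with illustrative
# fragment terms for each alternative.
markush = {"R1": [{"benzene ring", "chloro"}, {"methyl"}]}


def fragment_codes(structure):
    """Fragment-code indexing: the union of terms over all alternatives.
    Which term came from which alternative (and how they were joined) is lost."""
    return set().union(*(terms for alts in structure.values() for terms in alts))


def code_search(query_terms, structure):
    return query_terms <= fragment_codes(structure)


def same_alternative(query_terms, structure):
    """Finer check: all query terms must come from one alternative of one R-group."""
    return any(query_terms <= terms for alts in structure.values() for terms in alts)


query = {"methyl", "chloro"}  # e.g. looking for a chloromethyl substituent
print(code_search(query, markush), same_alternative(query, markush))  # True False
```

The code search retrieves the structure because both terms occur somewhere in its coding, even though no alternative contains chlorine on a methyl group: a false drop.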


29 Fragmentation Codes

Derwent World Patent Index Chemical Code
• Closed code with about one thousand terms
• Large comprehensive backfile (from early 1960s)
• Available for online searching (Questel)

IFI/Plenum Code
• Open-ended code
• Used for “CLAIMS” database (U.S. patents)
• Available for online searching (STN)
o no graphical interface


30 Fragmentation Codes

GREMAS code
• Very sophisticated open-ended code
• Private collaboration between (mainly) German pharmaceutical companies
• Good retrieval performance
• Input discontinued in early 1990s
• Backfile (from 1950s) still searched at a few companies


31 Graphical (“topological”) systems

Development started in early 1980s

Intended to supplement graphical substructure search systems for specific structures
• MACCS, CAS Online, DARC, etc.

User draws graphical (sub)structure query

System displays graphical Markush structure hits

Two commercial systems implemented
• available for online searching only
• each with its own database
• no “in-house” systems or databases


32 Markush DARC

Joint development of
• Questel SA (software and online host)
• Derwent Information Ltd (WPIM database)
• INPI (French Patent Office) (PHARMSEARCH database)

Integrated database (“Merged Markush File”) now available
• http://www.inpi.fr/inpi/mms/index.htm
• Extension forwards (Derwent) and backwards (INPI)


33 MARPAT

software and database from Chemical Abstracts Service

available online via STN International
• http://www.cas.org/CASFILES/marpat.html

integrated with CA Registry database of specific compounds

Proposal to allow Derwent database to be searched with MARPAT software dropped in mid 1990s for commercial reasons


34 The Markush Problem

Representation
• Mixture of structures and text
• Generic (h-variant) expressions
• Vagueness (“where by X we mean…”)

Search
• The “translation” problem
o Specific groups (e.g. tert. butyl) must be matched against generic expressions (e.g. 1-6C alkyl)
• The “segmentation” problem
o Boundaries between scaffold and R-groups may not coincide in query and database structures


35 Matching Markush Structures

Translation and segmentation problems coincide to make it difficult to spot matching structures

[Figure: two Markush structures with overlapping coverage; R-group definitions include alkyl, isopropyl, t-butyl, cycloalkyl, NH2, O and S]


36 Sheffield University Research

Extended project (1979-1994) on Markush structure storage and retrieval
• designed external (GENSAL) and internal (ECTR) storage formats
o parameter lists for homology-variant groups
• developed novel matching algorithms based around graph isomorphism
o “reduced graph” concept
• influenced development of commercial systems
o independent work also done at CAS, Derwent and Questel

Downs and Barnard, J. Documentation, 1998, 54 (1), 106-120


37 GENSAL

formalised version of language used in patent specifications

design analogous to programming language; lexical elements include
• structure diagrams
• specific and generic chemical nomenclature
• substitution operators
• position/multiplicity values

GENSAL Interpreter program (compiler) generates internal representation based on “partial” connection tables with links between them


38 GENSAL example

[Structure diagram with R1, R2, R3, an O atom and * attachment points]

R1 = H / alkyl <1-4>;
R2 = F / Cl;
R1 + R2 = SD;
R3 = phenyl OSB <1-2> Cl;
IF R2 = Cl THEN R1 = H.
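The effect of the conditional statement can be sketched in Python at the level of group alternatives. This assumes a hypothetical, much-simplified reading of GENSAL: only “/”-separated alternatives are parsed, the `alkyl <1-4>` alternative stays symbolic (it is homology-variant, so it is not enumerated), and the `R1 + R2` and `R3` statements are not modelled. `parse_assignment` and `allowed` are illustrative names, not part of any GENSAL implementation.

```python
from itertools import product


def parse_assignment(statement):
    """Split a simplified 'R1 = H / alkyl <1-4>;' line into name and alternatives.
    Handles only '/'-separated alternatives (a hypothetical subset of GENSAL)."""
    name, rhs = statement.rstrip(";. ").split("=", 1)
    return name.strip(), [alt.strip() for alt in rhs.split("/")]


groups = dict(parse_assignment(s) for s in ["R1 = H / alkyl <1-4>;", "R2 = F / Cl ;"])


def allowed(choice):
    # IF R2 = Cl THEN R1 = H
    return choice["R2"] != "Cl" or choice["R1"] == "H"


combos = [dict(zip(groups, values)) for values in product(*groups.values())]
valid = [c for c in combos if allowed(c)]
print(len(combos), len(valid))  # 4 3
```

The constraint removes the alkyl/Cl combination while leaving the alkyl alternative itself unexpanded.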


39 Parameter Lists

Represent generic (“homology-variant”) expressions by a set of permitted numerical ranges for structural parameters, e.g. “alkyl”:
• 1-n carbon atoms
• 0 heteroatoms
• 0 double or triple bonds
• 0-n branch points
• 0 rings
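A parameter list maps naturally onto a dictionary of inclusive ranges. The Python sketch below assumes hypothetical parameter names and n = 6 (as in “1-6C alkyl”), with the branch-point range set to 0-6 for illustration. A specific group matches if each of its values lies in range; two generic groups can share members only if every pair of ranges intersects (a necessary, not sufficient, condition).

```python
# Hypothetical parameter names; n = 6 is an illustrative limit ("1-6C alkyl").
ALKYL_1_6 = {
    "carbons": (1, 6),
    "heteroatoms": (0, 0),
    "multiple_bonds": (0, 0),
    "branch_points": (0, 6),
    "rings": (0, 0),
}


def covers(generic, specific):
    """A specific group matches if every parameter value lies in the generic range."""
    return all(lo <= specific[name] <= hi for name, (lo, hi) in generic.items())


def overlaps(generic_a, generic_b):
    """Two generic groups can share members only if every pair of ranges intersects."""
    return all(
        max(generic_a[k][0], generic_b[k][0]) <= min(generic_a[k][1], generic_b[k][1])
        for k in generic_a
    )


n_propyl = {"carbons": 3, "heteroatoms": 0, "multiple_bonds": 0, "branch_points": 0, "rings": 0}
vinyl = {"carbons": 2, "heteroatoms": 0, "multiple_bonds": 1, "branch_points": 0, "rings": 0}
print(covers(ALKYL_1_6, n_propyl), covers(ALKYL_1_6, vinyl))  # True False
```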


40 Reduced Graphs

connected groups of atoms “collapsed” to form a single node of the reduced graph
• atoms in the same ring system (R)
• optionally branched carbon chains (C)
• connected acyclic heteroatoms (Z)

[Figure: an example molecule (containing N, NH, CH2, OH and O groups) and its reduced graph, with nodes Z 3, R 9, C 2, Z 1 and Z 1]
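A reduced graph can be built with union-find, sketched here in Python. Assumptions: hydrogens are implicit, ring bonds are supplied as a precomputed input (ring perception is not shown), and the function name and node labels (type letter plus atom count) are illustrative. Ring atoms merge only through ring bonds, so two rings joined by an acyclic bond stay separate ring systems; acyclic carbons merge with acyclic carbons and acyclic heteroatoms with acyclic heteroatoms.

```python
from collections import Counter


def reduced_graph(elements, bonds, ring_bonds):
    """elements: element symbol per atom index; bonds: set of frozenset pairs;
    ring_bonds: the subset of bonds lying in rings (assumed precomputed).
    Returns (nodes, edges): nodes = {root atom: 'R6', 'C2', ...}."""
    ring_atoms = {a for bond in ring_bonds for a in bond}

    def node_type(a):
        if a in ring_atoms:
            return "R"
        return "C" if elements[a] == "C" else "Z"

    parent = list(range(len(elements)))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    for bond in bonds:
        a, b = tuple(bond)
        ta, tb = node_type(a), node_type(b)
        merge = (bond in ring_bonds) if ta == tb == "R" else ta == tb
        if merge:
            parent[find(a)] = find(b)

    sizes = Counter(find(a) for a in range(len(elements)))
    nodes = {root: f"{node_type(root)}{n}" for root, n in sizes.items()}
    edges = {frozenset((find(a), find(b))) for a, b in map(tuple, bonds)
             if find(a) != find(b)}
    return nodes, edges


# Phenylacetic acid heavy atoms: benzene ring 0-5, CH2 (6), C (7), =O (8), OH (9).
elements = ["C"] * 6 + ["C", "C", "O", "O"]
ring = {frozenset((i, (i + 1) % 6)) for i in range(6)}
bonds = ring | {frozenset((0, 6)), frozenset((6, 7)), frozenset((7, 8)), frozenset((7, 9))}
nodes, edges = reduced_graph(elements, bonds, ring)
print(sorted(nodes.values()), len(edges))  # ['C2', 'R6', 'Z1', 'Z1'] 3
```

Because the node boundaries come from fixed rules rather than from how a patent drafter split the molecule into R-groups, the same compound always reduces to the same graph.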


41 Reduced Graphs

boundaries between nodes are non-arbitrary
• thus provides solution to segmentation problem

each node can be described by a parameter list

homology-variant groups can also be represented as reduced graph nodes with parameter lists
• thus provides solution to translation problem:
o first identify isomorphism between reduced graphs
o if parameter lists match, can do atom-by-atom match on original atoms in specific groups, if necessary

[Figure: reduced graph nodes annotated with parameter values (e.g. R 6, R 5, C 8, C 2, N 1, O 2)]
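The two-stage match can be sketched in Python. For simplicity this checks whole-graph isomorphism between equal-sized reduced graphs by brute force (a real system would need substructure matching and a smarter search), and uses a single hypothetical parameter, the atom count per node. Specific groups carry degenerate ranges (lo == hi); generic nodes carry real ranges. All names are illustrative.

```python
from itertools import permutations


def ranges_overlap(p, q):
    """Parameter lists map a name to an inclusive (lo, hi) range; a specific
    group uses lo == hi. Lists are compatible if every pair of ranges meets."""
    return all(max(p[k][0], q[k][0]) <= min(p[k][1], q[k][1]) for k in p)


def reduced_graph_match(query, target):
    """query / target: (nodes, edges) with nodes = {id: (node_type, params)}
    and edges = set of frozenset({id1, id2}). Brute-force isomorphism of
    equal-sized reduced graphs, then a parameter-list check on each node pair."""
    q_nodes, q_edges = query
    t_nodes, t_edges = target
    if len(q_nodes) != len(t_nodes) or len(q_edges) != len(t_edges):
        return None
    q_ids = list(q_nodes)
    for perm in permutations(t_nodes):
        m = dict(zip(q_ids, perm))
        if any(q_nodes[q][0] != t_nodes[m[q]][0] for q in q_ids):
            continue  # node types (R, C, Z) must agree
        if {frozenset(m[a] for a in e) for e in q_edges} != t_edges:
            continue  # not an isomorphism
        if all(ranges_overlap(q_nodes[q][1], t_nodes[m[q]][1]) for q in q_ids):
            return m  # candidate for atom-by-atom checking
    return None


# Query: a specific compound reduced to R6-C3-Z1.
query = ({1: ("R", {"atoms": (6, 6)}), 2: ("C", {"atoms": (3, 3)}), 3: ("Z", {"atoms": (1, 1)})},
         {frozenset((1, 2)), frozenset((2, 3))})
# Database Markush: ring of 5-10 atoms, 1-6C chain, 1-2 heteroatoms.
target = ({"a": ("R", {"atoms": (5, 10)}), "b": ("C", {"atoms": (1, 6)}), "c": ("Z", {"atoms": (1, 2)})},
          {frozenset(("a", "b")), frozenset(("b", "c"))})
print(reduced_graph_match(query, target))  # {1: 'a', 2: 'b', 3: 'c'}
```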


42 Design of Commercial Systems

Sheffield system never implemented commercially

Ideas incorporated into both Markush DARC and MARPAT
• also used by BCI Ltd. in various projects

Other ideas developed independently
• both systems have patent protection

Basic concepts parallel those developed at Sheffield

• Barnard, J. M. “A comparison of different approaches to Markush structure handling” JCICS, 1991, 31 (1), 64-67

• Berks, A. “Current state of the art of Markush topological search systems”, World Patent Information, 2001, 23, 5-13


43 Markush DARC

Specific groups shown as structure diagrams
• Rather clunky display (one R-group at a time)

Generic groups shown as “superatoms”
• e.g. CHK = alkyl, HEF = fused heterocycle
• qualitative attributes used in searching
• quantitative parameters (texnotes) available for display

reduced graph concepts used in atom-by-atom search stage


44 Markush DARC Display


45 MARPAT

Part of CASLink substructure search system on STN

Input and display use text and graphics
• similar to GENSAL

Generic Group Nodes with quantitative attributes (not fully implemented for search)


46 MARPAT Generic Group Nodes

R any group

Cy cyclic group

Ak carbon chain

Q heteroatom

Cb carbocycle

Hy heterocycle

X halogen

M metal

GGN definitions imply reduced graph concept

“Spin-off” GGNs generated for specific groups to allow specific-generic matching (“translation”)
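The spin-off idea can be sketched in Python: each specific group is tagged with every GGN that describes it, so a generic query node matches by simple set membership. The classification rules, group descriptions and function names below are hypothetical, built only from the one-line GGN definitions listed above; in particular, whether metals also count as Q is not stated here, so the sketch keeps them separate.

```python
HALOGENS = {"F", "Cl", "Br", "I"}
METALS = {"Li", "Na", "K", "Mg", "Zn"}  # illustrative subset only


def spin_off_ggns(kind, hetero_ring=False, element=None):
    """Hypothetical rules giving every GGN that describes a specific group.
    kind: "ring" (a ring system), "carbon_chain" or "atom" (a single atom)."""
    ggns = {"R"}  # R: any group
    if kind == "ring":
        ggns |= {"Cy", "Hy" if hetero_ring else "Cb"}
    elif kind == "carbon_chain":
        ggns.add("Ak")
    elif kind == "atom":
        if element in METALS:
            ggns.add("M")
        elif element not in {"C", "H"}:
            ggns.add("Q")
            if element in HALOGENS:
                ggns.add("X")
    return ggns


def specific_matches_generic(ggn, kind, **attributes):
    """Translation step: a generic query node matches a specific group if the
    group's spin-off GGNs include it."""
    return ggn in spin_off_ggns(kind, **attributes)


print(sorted(spin_off_ggns("ring", hetero_ring=True)))     # ['Cy', 'Hy', 'R']
print(specific_matches_generic("X", "atom", element="Cl"))  # True
```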


47 MARPAT Display

MSTR 1

G1 = N, CH
G2 = H, X, SC,Cl
DER: or acid addition salts
MPL: Claim 1


48 Conclusions from Lecture 5

Chemical reaction search requires atom-atom mapping between reactant and product
• Maximal Common Subgraph algorithms can be used

3D substructure search uses interatomic distances as edge labels in fully-connected graphs

Markush structures pose particular problems for structure search systems
• extremely broad classes
• homology-variant (generic) expressions
• segmentation between R-groups

Two publicly-available Markush search systems for chemical patents

• Markush DARC and MARPAT


49 Further Reading

Chen, L.; Nourse, J. G.; Christie, B. D.; Leland, B. A.; Grier, D. L. “Over 20 years of reaction access from MDL: a novel reaction substructure search system”. J. Chem. Inf. Comput. Sci. 2002, 42, 1296-1310.

“Representation and manipulation of 3D molecular structures”. Chapter 2 (pp. 27-52) in A. R. Leach and V. J. Gillet, An Introduction to Chemoinformatics, Dordrecht: Kluwer, 2003

Berks, A. H. “Current state of the art of Markush topological search systems”. In J. Gasteiger (ed.) Handbook of Chemoinformatics: From Data to Knowledge, Vol 2, pp. 885-903, Wiley-VCH, 2003


50 Lecture 6: Topics to be Covered

Similarity searching
• similarity search vs. substructure search
• similarity and distance metrics
• different types of descriptor for similarity search
• choice of descriptors

The drug discovery process