Maximizing the Use of the Lawson Number in Beilstein Searching Gary Wiggins and Usha Coca School of Informatics Indiana University ACS CERM, June 4, 2004

Abstract

In the Beilstein database, the Lawson Number is based on the Beilstein System for classifying organic substances. Every substance in the Beilstein file has at least one Lawson Number, and the smaller the Lawson Number, the more common the fragment. While the Lawson Number is a searchable field, searching with Lawson Numbers is not equivalent to substructure or Markush searching. Since the Lawson Numbers represent certain structural fragments, they can be used for structural similarity searches. Searches that include the Lawson Number are effective when used in combination with other search keys, such as molecular formula, element ranges, etc. The Lawson Number is also useful when combined with NOT in substructure searches. Thus, the Lawson Number could serve as an effective index search key if its meaning were known. We have developed a prototype system that could be interfaced with the CrossFire system for effective use of the Lawson Numbers in searching. The system will be described and demonstrated.

Friedrich Konrad Beilstein (1838-1906)

Beilstein Handbook of Organic Chemistry in 27 “volumes”

Series           Abbrev.     Coverage
Basic            H           up to 1910
Sup Ser I        E I         1910-1919
Sup Ser II       E II        1920-1929
Sup Ser III      E III       1930-1949 (v. 1-16 only)
Sup Ser III/IV   E III/IV    1930-1959 (v. 17-27 only)
Sup Ser IV       E IV        1950-1959 (v. 1-16 only)
Sup Ser V        E V         1960-1979 (English)

Psst! Want a good, cheap set of Beilstein?

We have finally decided that our cramped Chemistry Library can no longer afford the luxury of retaining our Beilstein print collection (which has probably not been touched for several years now, since we acquired the online version). We hope we can find a new home for the collection (all 437 volumes, plus a handful of how-to-use-it texts), otherwise it must be discarded. Any organization willing to pay the shipping costs is welcome to this collection. If interested, please contact me directly.

Howard M. Dess
Chemistry and Physics Librarian
Library of Science and Medicine
Rutgers University
Piscataway, NJ 08854-8009

Source: CHMINF-L, June 1, 2004

Beilstein Handbook: Arrangement of Compounds

Beilstein: a collection of critically evaluated data on organic compounds arranged in a classified manner

Arrangement:
– Acyclic Compounds, Volumes 1-4
– Isocyclic Compounds, Volumes 5-16
– Heterocyclic Compounds, Volumes 17-27
– Divided into System Numbers 1-4720

Each Supplementary Series (E) volume contains the same classes of compounds as the corresponding Basic (H) volume

System Number Meaning

Beilstein Institute never published the meanings of the System Numbers

System Number 3691 means “heterocyclic carbon frameworks with exactly 2 N ring atoms with a combination of exactly 2 hydroxy groups and 1 carboxylic acid group”

Placement of Info in Beilstein: Registry (Index) Compounds

– Stem nuclei: Hydrocarbons, saturated followed by unsaturated
– Oxy = Hydroxy compounds: alcohols (OH)
– Oxo = Carbonyl compounds: aldehydes and ketones (C=O)
– Carboxylic Acids (COOH)
– Sulfinic Acids (SO2H)
– Sulfonic Acids (SO3H)
– Chalcogen Oxoacids (XO2H, XO2OH); X = S, Se, Te
– Amines (NH2)
– Hydroxylamines (NHOH) & Dihydroxylamines (N(OH)2)
– Hydrazines (NHNH2)
– Azo compounds (N=NH)
– More complex N functionalities
– Groups containing other elements (P, As, Si, Mg, etc.)

Beilstein System Algorithm 1

Beilstein “hydrolysis” scheme based on an instinctive chemical classification as perceived by an organic chemist

Carbons with more than one (non-ring) heteroatom attached are always regarded as derived from carbonyl groups, if:
– at least one of the heteroatoms is other than the attachment atom of a substituent (halogen, nitro, nitroso, azide)

Beilstein System Algorithm 2

Splits any molecule into a set of fragments

Splitting points are C-Q bonds, where Q is a heteroatom that does not belong to a ring in common with the C in question

Fragments then classified and coded using
– skeletal features
– type and multiplicity of chemical functional groups (including masked groups)
– degree of unsaturation
– carbon number

(See "Notes for Users" at the start of each Beilstein volume published from about 1992 onwards.)
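
The splitting rule above can be illustrated with a small Python sketch. It is only a literal reading of the rule as stated on the slide, not the Beilstein implementation: the molecule representation (element list, bond list, explicit ring sets), the function name `split_fragments`, and the choice to report only carbon-containing pieces are all assumptions made for illustration. Hydrogens are assumed to be implicit, and the later classification and coding of fragments is not attempted.

```python
def split_fragments(elements, bonds, rings=()):
    """Cut every C-Q bond (Q = heteroatom) unless C and Q share a ring,
    then return the carbon-containing connected pieces as sorted index lists.

    elements: list of element symbols, hydrogens omitted (assumption)
    bonds:    list of (i, j) atom-index pairs
    rings:    iterable of sets of atom indices, supplied by the caller
    """
    def is_split_point(i, j):
        # A split point needs exactly one carbon and one heteroatom.
        if (elements[i] == "C") == (elements[j] == "C"):
            return False
        # The bond is kept when both atoms belong to a common ring.
        return not any(i in ring and j in ring for ring in rings)

    # Build the adjacency map from the bonds that are not cut.
    neighbours = {i: set() for i in range(len(elements))}
    for i, j in bonds:
        if not is_split_point(i, j):
            neighbours[i].add(j)
            neighbours[j].add(i)

    # Collect connected components with a simple depth-first walk.
    seen, fragments = set(), []
    for start in range(len(elements)):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            atom = stack.pop()
            if atom not in component:
                component.add(atom)
                stack.extend(neighbours[atom] - component)
        seen |= component
        if any(elements[k] == "C" for k in component):
            fragments.append(sorted(component))
    return fragments


# Ethyl acetate, CH3-C(=O)-O-CH2-CH3 (atoms 0..5, no rings)
ethyl_acetate = split_fragments(
    ["C", "C", "O", "O", "C", "C"],
    [(0, 1), (1, 2), (1, 3), (3, 4), (4, 5)],
)

# N-ethylaziridine: ring N0-C1-C2, ethyl group C3-C4 on the nitrogen
n_ethylaziridine = split_fragments(
    ["N", "C", "C", "C", "C"],
    [(0, 1), (1, 2), (0, 2), (0, 3), (3, 4)],
    rings=[{0, 1, 2}],
)
```

In the first example the ester falls apart into two two-carbon frameworks; in the second the ring nitrogen stays with its ring because it shares a ring with its carbon neighbours, while the exocyclic N-C bond is cut.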

Source of Ambiguity

In the physical Beilstein Handbook, the end of one system number and the beginning of another sometimes occur on the same physical page.

This leads to bleed-over from the previous section (e.g., alkyl hydrocarbons linked to the simplest alcohol, methanol)

Lawson Number

Originally used in the program SANDRA

Algorithmic expression of the System Numbers in the printed work
– System Numbers: 1-4720
– Lawson Numbers: 8-32759
– System Number = Lawson Number divided by 8 (roughly)

Inherited the ambiguity of the page number placement
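
As a rough illustration of the "divided by 8" relationship, the following Python sketch estimates a System Number from a Lawson Number. The function name and the use of integer division are assumptions; the slide only says the relationship holds roughly, so the result should be treated as a starting point for browsing, not as an exact mapping.

```python
LN_MIN, LN_MAX = 8, 32759  # Lawson Number range quoted on the slide


def approx_system_number(ln: int) -> int:
    """Estimate the System Number for a Lawson Number (LN // 8).

    Integer division is an assumption; the relationship is only approximate.
    """
    if not LN_MIN <= ln <= LN_MAX:
        raise ValueError(f"Lawson Number {ln} is outside {LN_MIN}-{LN_MAX}")
    return ln // 8


lowest = approx_system_number(LN_MIN)
highest = approx_system_number(LN_MAX)
```

With plain division by 8, the top of the Lawson Number range lands near 4094 rather than 4720, which shows why the relationship can only be called rough.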

Lawson Number: Purpose

To divide the total virtual structure universe of published and unpublished compounds into approximately equal sections (virtual pages) of related compounds

Lawson Number Occurrence 1

Any compound may have several LNs; most have 2 to 3.

In 1991 (1.8 million compounds in the file at that time):
– 25.1% had 1
– 39.4% had 2
– 24.0% had 3
– 8.5% had 4
– 3.0% had > 4

The average LN occurred in about 70 compounds in 1991
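
The 1991 distribution can be turned into a rough average number of LNs per compound. The short Python sketch below does that arithmetic; because the slide gives no breakdown for the "> 4" group, it counts those compounds as having exactly 5 LNs (an assumption), so the result is a lower bound.

```python
# Percentage of compounds by number of LNs (1991 figures from the slide)
SHARE_BY_COUNT = {1: 25.1, 2: 39.4, 3: 24.0, 4: 8.5}
SHARE_MORE_THAN_FOUR = 3.0  # counted as exactly 5 LNs (assumption)


def mean_lns_lower_bound() -> float:
    """Lower bound on the mean number of LNs per compound."""
    total = sum(SHARE_BY_COUNT.values()) + SHARE_MORE_THAN_FOUR
    weighted = sum(n * pct for n, pct in SHARE_BY_COUNT.items())
    weighted += 5 * SHARE_MORE_THAN_FOUR
    return weighted / total


estimate = mean_lns_lower_bound()
```

The estimate comes out at a little over 2.2 LNs per compound, consistent with the statement that most compounds have 2 to 3.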

Lawson Number Occurrence 2

Occasionally an LN will represent a unique structure; e.g., LN 12 retrieves only BRN 4736629.

What governs the value of the LN?

In order of influence:
– Cyclic class (number and type of heteroatoms)
– Chemical functions (amine, hydroxy, etc.)
– Degree of unsaturation of the carbon framework with respect to multiple bonds at carbon + ring closures
– Carbon count of the carbon-complete fragment framework
– Degree of carbon branching
– Degree of halogen and nitro substitution
– Chalcogen exchange
– Ring sizes

Beilstein Handbook of Organic Chemistry: SANDRA

SANDRA, Structure AND Reference Analyzer

– Program that interpreted a graphical structure of a compound and predicted where it should be found in printed Beilstein

– Developed in 1987 by Alexander Lawson for use on a local microcomputer

SANDRA fragment screens had a heavy chemical bias: classified according to chemical structure

Beilstein Handbook of Organic Chemistry: SANDRA

12-digit code linked information to page ranges

Beilstein Handbook of Organic Chemistry: SANDRA

This compound belongs in v. 13, Syst. 1823, H p. 348

Hashcodes:

• Ethylamine 000500010002

• Phenol 800100010906

• Non-localized amino-cyclohexanol 800510010306

Beilstein Handbook of Organic Chemistry: SANDRA

The 12-digit hash code had a corresponding 4-digit code, e.g., the number 1849 linked 800510010306 to System no. 1823, H-page 348.

The four-digit number retained the sortability of the 12-digit code, but gave a hashcode for each fragment that can be stored in 2 bytes: 7392-28C1-1610
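
To make the 2-byte storage concrete, here is a minimal Python sketch that packs a Lawson Number into two bytes and reads it back. The encoding (big-endian signed 16-bit integer) and the function names are assumptions; the slides do not say how the values were actually stored. A signed 16-bit integer tops out at 32767, so every LN in the range 8-32759 fits.

```python
import struct


def pack_ln(ln: int) -> bytes:
    """Pack a Lawson Number into 2 bytes (assumed: big-endian signed 16-bit)."""
    return struct.pack(">h", ln)


def unpack_ln(data: bytes) -> int:
    """Recover the Lawson Number from its 2-byte form."""
    return struct.unpack(">h", data)[0]


packed = pack_ln(31459)       # an LN quoted elsewhere in this talk
hex_form = packed.hex()       # four hexadecimal digits
restored = unpack_ln(packed)
```

Values above 32767 do not fit this format: `struct.pack` raises `struct.error` for them.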

Lawson Number Planned Enhancements (around 1990)

A second phase of the LN implementation, for LNs greater than 32767, never materialized
– It was to include 8000 shape discriminators to help avoid false drops, with LN values in the range 32776-40951
– Ring skeletal shapes for all mono and bicyclic systems (including fused, bridged, and spiro rings) of 3-10 ring atoms, containing 0, 1, or 2 heteroatoms of the set (O, N, S) in any combination or any ring position, would get a unique LN
– Rings with 11-17 atoms, including O, N, S ring atoms, would get an LN
– Another LN for those with heteroatoms other than N, O, S
– All mono and bicyclic systems with 18 or more ring atoms were to get one LN
– A single LN for tricyclic and greater ring systems (further discrimination could be based on present or not present, such as steroid skeletons, morphane, adamantane, etc.)

Lawson Number Uses

Most effectively used when combined with other search elements, e.g.:
– Molecular Formula
– Element Ranges
– Boolean operator NOT in combination with substructures
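
The idea of combining Lawson Numbers with other search keys can be sketched in Python over a toy in-memory list. Everything in the sketch is hypothetical: the record layout, the `search` function, the record numbers 1-4, and the molecular formulas are invented for illustration and do not come from the Beilstein file or reflect CrossFire query syntax. Only the Lawson Number values themselves are taken from the examples in this talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Substance:
    brn: int                  # toy record number, not a real BRN
    formula: str              # invented molecular formula
    lawson_numbers: frozenset


TOY_RECORDS = [
    Substance(1, "C6H10O3", frozenset({31459, 289})),
    Substance(2, "C6H10O3", frozenset({31460})),
    Substance(3, "C7H12O3", frozenset({31459, 289})),
    Substance(4, "C2H6O", frozenset({289})),
]


def search(records, ln_all=(), ln_range=None, formula=None, exclude=None):
    """Return record numbers that satisfy every supplied criterion.

    ln_all:   Lawson Numbers that must all be present (AND)
    ln_range: (low, high); at least one LN must fall inside it
    formula:  exact molecular formula match
    exclude:  predicate; records for which it is true are dropped (NOT)
    """
    hits = []
    for rec in records:
        if not set(ln_all) <= rec.lawson_numbers:
            continue
        if ln_range is not None:
            low, high = ln_range
            if not any(low <= ln <= high for ln in rec.lawson_numbers):
                continue
        if formula is not None and rec.formula != formula:
            continue
        if exclude is not None and exclude(rec):
            continue
        hits.append(rec.brn)
    return hits


by_range = search(TOY_RECORDS, ln_range=(31456, 31471))
narrowed = search(TOY_RECORDS, ln_range=(31456, 31471), ln_all=(289,))
with_formula = search(TOY_RECORDS, ln_all=(31459, 289), formula="C6H10O3")
with_not = search(TOY_RECORDS, ln_all=(289,),
                  exclude=lambda rec: rec.formula.startswith("C7"))
```

Each added criterion narrows the hit list, which mirrors how an LN range on its own is broad but becomes selective once a second LN, a formula, or a NOT condition is attached.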

Lawson Number Search Tool

http://mypage.iu.edu/~ucoca/begperl/formFetch.html

Lawson Number Search in Usha’s DB for COOH/O-R/(O4)

Retrieves (among seven LN ranges):

LN Range       Function
31456-31471    COOH/O-R/(O4)

Beilstein CrossFire Search for LN Range 31456-31471

Yielded 10,467 hits on 4/15/2004

One of those was BRN 18833 with LNs 31459 and 289.

Lawson Number Search in Usha’s DB

Revealed that LN 289 is O-R(*1)

Combining the previous Beilstein CrossFire search with LN 289 yielded 4910 hits on 4/15/2004.

Lawson Number Search for LN 289 in Usha’s Database

Lawson Number Search in Beilstein CrossFire

Find a compound with a cyclopentane ring with three free sites (over 440,000 substances) and with both LN 31459 and LN 289

Result: 10 substances on 4/15/2004

CrossFire LN Search Yields Very Diverse Results

Lawson Number Range Search #2 on CrossFire

LN range 23369-25200 yielded 668,065 substances on 6/3/04

When combined with the chemical name segment Aziridin* in proximity to Propion*, the search yielded 142 substances.

CrossFire 2 Search Results: All have in common LN 24059

Lawson Number 24059

Parent Heterocycles N(1)
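
The kind of lookup shown in these examples, from a Lawson Number (or a function description) to its meaning, can be sketched as a small range table in Python. The table holds only the three entries quoted in this talk; the data structure and the function names `describe_ln` and `ranges_for` are assumptions for illustration and say nothing about how the actual Web tool is built.

```python
# (low, high, meaning) entries, limited to those quoted in this talk
LN_MEANINGS = [
    (289, 289, "O-R(*1)"),
    (24059, 24059, "Parent Heterocycles N(1)"),
    (31456, 31471, "COOH/O-R/(O4)"),
]


def describe_ln(ln: int):
    """Return the meaning of a Lawson Number, or None if it is not in the table."""
    for low, high, meaning in LN_MEANINGS:
        if low <= ln <= high:
            return meaning
    return None


def ranges_for(text: str):
    """Return the LN ranges whose meaning contains the given text."""
    return [(low, high) for low, high, meaning in LN_MEANINGS if text in meaning]


meaning_31459 = describe_ln(31459)
unknown = describe_ln(12)
cooh_ranges = ranges_for("COOH")
```

Looking up in both directions matters: a searcher starts from a functional description to find the LN range to enter in CrossFire, and then uses the reverse lookup to interpret the other LNs attached to the hits.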

Possible to Link CrossFire to Usha’s Web Tool

Hop-in feature

– Allows users to jump into CrossFire Commander and run a search from a link on the Web (or from an external package)

Conclusion

While the Lawson Number was originally developed as a tool to aid in finding the correct place for a given compound in the printed Beilstein, it clearly has utility in online searches of the Beilstein database. Having a Web supplement that defines the meaning of the Lawson Numbers will enhance the usefulness of the search field.

Bibliography and Acknowledgement

The generous input from Dr. Alexander Lawson is much appreciated!

Lawson, Alexander J. “Structure graphics in: pointers to Beilstein out.” in: Warr, Wendy A., ed. Graphics for chemical structures: integration with text and data. (ACS Symposium Series; 341) American Chemical Society: Washington, 1987, 80-87.

Lawson, Alexander J. “Chemical structure browsing.” in: Warr, Wendy A., ed. Chemical structure information systems: Interfaces, communication, and standards. (ACS Symposium Series; 400) American Chemical Society: Washington, 1989, 41-49.

Lawson, Alexander J. “The Lawson similarity number (LN). Offline generation and online use.” in: Heller, Stephen R., ed. The Beilstein online database: implementation, content, and retrieval. (ACS Symposium Series; 436) American Chemical Society: Washington, 1990, 143-155.

Bibliography (continued)

Sunkel, J.; Hoffman, E.; Luckenbach, R. “Straightforward procedure for locating chemical compounds in the Beilstein Handbook.” Journal of Chemical Education 1981, 58(12), 982-986.

“A powerful tool for chemists: The Lawson-Number.” [brochure] Springer-Verlag: Berlin, 1989?

Lawson, Alexander. Personal communication. 22 June 2001.

Meehan, Paul; Schofield, Helen. “CrossFire; a structural revolution for chemists.” Online Information Review 2001, 25(4), 241-249.

MIMAS (Manchester Information & Associated Services)

– JISC-supported UK national data center
– Run by Manchester Computing at the University of Manchester
– Provides access to ISI Web of Knowledge, JSTOR, CrossFire, etc.
– http://www.mimas.ac.uk/

MIMAS CrossFire Services

Very useful documentation
– http://www.mimas.ac.uk/crossfire/docs.html
– Introductory guides
– Training materials
– Manuals

UW-Madison CrossFire Site

Links to a locally-produced help file http://chemistry.library.wisc.edu/beilstein/home.htm

Quick Guide http://chemistry.library.wisc.edu/beilstein/quickguide.htm

Beilstein on STN

Beilstein on STN (Workshop Manual). FIZ Karlsruhe: Eggenstein-Leopoldshafen, 2003.

http://www.stn-international.com/training_center/chemistry/beilstein/beilstein_wsm.pdf

MDL Web Site

Replaces the former Beilstein site

MDL Knowledge Base
– http://www.mdl.com/support/knowledgebase/index.jsp