Maximizing the Use of the Lawson Number in Beilstein Searching
Gary Wiggins and Usha Coca
School of Informatics
Indiana University
ACS CERM, June 4, 2004
Abstract
In the Beilstein database, the Lawson Number is based on the Beilstein System for classifying organic substances. Every substance in the Beilstein file has at least one Lawson Number, and the smaller the Lawson Number, the more common the fragment. While the Lawson Number is a searchable field, searching with Lawson Numbers is not equivalent to substructure or Markush searching. Because Lawson Numbers represent certain structural fragments, they can be used for structural similarity searches. Lawson Number searches are most effective when combined with other search keys, such as molecular formula or element ranges. The Lawson Number is also useful when combined with NOT in substructure searches. Thus, the Lawson Number could serve as an effective index search key if the meaning of each number were known. We have developed a prototype system that could be interfaced with the CrossFire system to make effective use of Lawson Numbers in searching. The system will be described and demonstrated.
Friedrich Konrad Beilstein (1838-1906)
Beilstein Handbook of Organic Chemistry in 27 “volumes”
Series            Abbrev.    Coverage
Basic             H          up to 1910
Sup Ser I         E I        1910-1919
Sup Ser II        E II       1920-1929
Sup Ser III       E III      1930-1949 (v. 1-16 only)
Sup Ser III/IV    E III/IV   1930-1959 (v. 17-27 only)
Sup Ser IV        E IV       1950-1959 (v. 1-16 only)
Sup Ser V         E V        1960-1979 (English)
Psst! Want a good, cheap set of Beilstein? We have finally decided that our cramped Chemistry Library can no longer afford the luxury of retaining our Beilstein print collection (which has probably not been touched for several years now, since we acquired the online version). We hope we can find a new home for the collection (all 437 volumes, plus a handful of how-to-use-it texts), otherwise it must be discarded. Any organization willing to pay the shipping costs is welcome to this collection. If interested, please contact me directly.
Howard M. Dess
Chemistry and Physics Librarian
Library of Science and Medicine
Rutgers University
Piscataway, NJ 08854-8009
Source: CHMINF-L, June 1, 2004
Beilstein Handbook: Arrangement of Compounds
Beilstein: a collection of critically evaluated data on organic compounds arranged in a classified manner
Arrangement:
– Acyclic Compounds, Volumes 1-4
– Isocyclic Compounds, Volumes 5-16
– Heterocyclic Compounds, Volumes 17-27
– Divided into System Numbers 1-4720
Each Supplementary Series (E) volume contains the same classes of compounds as the corresponding Basic (H) volume
System Number Meaning
The Beilstein Institute never published the meanings of the System Numbers.
For example, System Number 3691 means “heterocyclic carbon frameworks with exactly 2 N ring atoms with a combination of exactly 2 hydroxy groups and 1 carboxylic acid group”.
Placement of Info in Beilstein: Registry (Index) Compounds
• Stem nuclei: Hydrocarbons, saturated followed by unsaturated
• Oxy = Hydroxy compounds: alcohols (OH)
• Oxo = Carbonyl compounds: aldehydes and ketones (C=O)
• Carboxylic Acids (COOH)
• Sulfinic Acids (SO2H)
• Sulfonic Acids (SO3H)
• Chalcogen Oxoacids (XO2H, XO2OH); X = S, Se, Te
• Amines (NH2)
• Hydroxylamines (NHOH) & Dihydroxylamines (N(OH)2)
• Hydrazines (NHNH2)
• Azo compounds (N=NH)
• More complex N functionalities
• Groups containing other elements (P, As, Si, Mg, etc.)
Beilstein System Algorithm 1
Beilstein “hydrolysis” scheme based on an instinctive chemical classification as perceived by an organic chemist
Carbons with more than one (non-ring) heteroatom attached are always regarded as derived from carbonyl groups, if:
– at least one of the heteroatoms is other than the attachment atom of a substituent (halogen, nitro, nitroso, azide)
Beilstein System Algorithm 2
• Splits any molecule into a set of fragments
• Splitting points are C-Q bonds, where Q is a heteroatom that does not belong to a ring in common with the C in question
• Fragments then classified and coded using
– skeletal features
– type and multiplicity of chemical functional groups (including masked groups)
– degree of unsaturation
– carbon number
(See "Notes for Users" at the start of each Beilstein volume published from about 1992 onwards.)
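The splitting rule can be made concrete with a toy sketch. It assumes a hand-built molecular graph (atoms keyed by integer id, hydrogens implicit) and ring membership supplied by the caller rather than perceived; the function name `split_fragments` and the data layout are hypothetical, and the sketch covers only the C-Q cut, not the subsequent classification and coding. A heteroatom bonded to two carbons is cut from both, so each carbon fragment records it as a function, which loosely mirrors the "hydrolysis" view (an ether becoming two alcohols).

```python
from collections import defaultdict


def split_fragments(atoms, bonds, rings=()):
    """Toy version of the C-Q splitting rule (hypothetical data model).

    atoms: {atom_id: element symbol}, hydrogens implicit
    bonds: list of (atom_id, atom_id) pairs
    rings: list of sets of atom ids, supplied by the caller
    Returns [(sorted carbon-fragment atom ids, sorted heteroatoms cut from it)].
    """
    def share_ring(a, b):
        return any(a in ring and b in ring for ring in rings)

    def is_split_bond(a, b):
        # Split at C-Q where Q is a heteroatom sharing no ring with that carbon.
        return any(
            atoms[c] == "C" and atoms[q] != "C" and not share_ring(c, q)
            for c, q in ((a, b), (b, a))
        )

    adjacency = defaultdict(set)
    cut_from = defaultdict(list)  # carbon id -> heteroatoms split off it
    for a, b in bonds:
        if is_split_bond(a, b):
            carbon, hetero = (a, b) if atoms[a] == "C" else (b, a)
            cut_from[carbon].append(atoms[hetero])
        else:
            adjacency[a].add(b)
            adjacency[b].add(a)

    seen, fragments = set(), []
    for start in atoms:
        if start in seen or atoms[start] != "C":
            continue  # fragments are seeded from carbons only
        component, stack = set(), [start]
        while stack:  # depth-first walk over the bonds that were not cut
            atom = stack.pop()
            if atom not in component:
                component.add(atom)
                stack.extend(adjacency[atom] - component)
        seen |= component
        functions = sorted(el for atom in component for el in cut_from[atom])
        fragments.append((sorted(component), functions))
    return fragments


# Ethyl methyl ether, CH3-O-CH2-CH3: the ether O is cut from both sides.
print(split_fragments({1: "C", 2: "O", 3: "C", 4: "C"}, [(1, 2), (2, 3), (3, 4)]))
# [([1], ['O']), ([3, 4], ['O'])]
```

A ring heteroatom (for example the O of tetrahydrofuran, when its ring is passed in `rings`) is never cut, so it stays inside the carbon fragment as part of a heterocyclic framework.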
Source of Ambiguity
In the physical Beilstein Handbook, the end of one system number and the beginning of another sometimes occur on the same physical page.
This leads to bleed-over from the previous section (e.g., alkyl hydrocarbons linked to methanol, the simplest alcohol)
Lawson Number
• Originally used in the program SANDRA
• Algorithmic expression of the System Numbers in the printed work
– System Numbers: 1-4720
– Lawson Numbers: 8-32759
– System Number = Lawson Number divided by 8 (roughly)
• Inherited the ambiguity of the page number placement
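The "divided by 8" relationship can be written as a one-line estimate. This is only a sketch of the rough rule stated on the slide, not Beilstein's actual mapping: the top of the quoted LN range, 32759, divides to 4094, well short of System Number 4720, which is one reason the correspondence can only be approximate. The helper name `approx_system_number` is hypothetical.

```python
LN_MIN, LN_MAX = 8, 32759  # Lawson Number range quoted on the slide


def approx_system_number(lawson_number: int) -> int:
    """Rough System Number estimate from a Lawson Number (LN // 8).

    Hypothetical helper: it applies the slide's 'divided by 8 (roughly)' rule
    and nothing more, so treat the result as a starting point only.
    """
    if not LN_MIN <= lawson_number <= LN_MAX:
        raise ValueError(f"LN {lawson_number} outside {LN_MIN}-{LN_MAX}")
    return lawson_number // 8


print(approx_system_number(289), approx_system_number(32759))  # 36 4094
```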
Lawson Number: Purpose
To divide the total virtual structure universe of published and unpublished compounds into approximately equal sections (virtual pages) of related compounds
Lawson Number Occurrence 1
Any compound may have several LNs; most have 2 to 3. In 1991 (1.8 million compounds in the file at that time):
– 25.1% had 1
– 39.4% had 2
– 24.0% had 3
– 8.5% had 4
– 3.0% had > 4
The average LN occurred in about 70 compounds in 1991.
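The 1991 percentages can be checked and turned into a rough average with a few lines of arithmetic. Because the slide groups everything above four LNs into "> 4", the sketch counts that group as exactly 5, so the result is only a lower bound on the mean number of LNs per compound; that substitution is an assumption, not a figure from the source.

```python
# Share of compounds (percent) by number of Lawson Numbers, 1991 file (from the slide).
# Key 5 stands in for the "> 4" group, so the mean below is a lower bound.
distribution = {1: 25.1, 2: 39.4, 3: 24.0, 4: 8.5, 5: 3.0}

total = sum(distribution.values())
two_or_three = distribution[2] + distribution[3]
mean_lower_bound = sum(k * pct for k, pct in distribution.items()) / total

print(f"total {total:.1f}%, 2-3 LNs {two_or_three:.1f}%, mean >= {mean_lower_bound:.2f}")
```

With these figures the shares sum to 100.0%, 63.4% of compounds carry two or three LNs (consistent with "most have 2 to 3"), and the mean is at least about 2.25 LNs per compound.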
Lawson Number Occurrence 2
Occasionally an LN represents a unique structure; e.g., LN 12 retrieves only BRN 4736629.
What governs the value of the LN? In order of influence:
• Cyclic class (number and type of heteroatoms)
• Chemical functions (amine, hydroxy, etc.)
• Degree of unsaturation of the carbon framework with respect to multiple bonds at carbon + ring closures
• Carbon count of the carbon-complete fragment framework
• Degree of carbon branching
• Degree of halogen and nitro substitution
• Chalcogen exchange
• Ring sizes
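"In order of influence" describes a lexicographic ordering: a difference in an earlier criterion outweighs any difference in the later ones. A minimal sketch, assuming each criterion has already been reduced to an integer code (the field names and codes below are hypothetical; the real LN encoding is not given here), uses a dataclass with `order=True`, which compares fields left to right like a tuple.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class LNSortKey:
    """Hypothetical sort key; fields follow the slide's order of influence."""
    cyclic_class: int        # number and type of ring heteroatoms
    functions: int           # chemical functions (amine, hydroxy, etc.)
    unsaturation: int        # multiple bonds at carbon + ring closures
    carbon_count: int        # carbons in the carbon-complete fragment framework
    branching: int
    halogen_nitro: int
    chalcogen_exchange: int
    ring_sizes: int


# Two hypothetical fragments: same cyclic class, different function codes.
big = LNSortKey(0, 1, 0, 20, 3, 2, 0, 0)
small = LNSortKey(0, 2, 0, 1, 0, 0, 0, 0)
print(big < small)  # True: the function code decides before carbon count is consulted
```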
Beilstein Handbook of Organic Chemistry: SANDRA
SANDRA: Structure AND Reference Analyzer
– Program that interpreted a graphical structure of a compound and predicted where it should be found in the printed Beilstein
– Developed in 1987 by Alexander Lawson for use on a local microcomputer
SANDRA fragment screens had a heavy chemical bias: classified according to chemical structure
Beilstein Handbook of Organic Chemistry: SANDRA
A 12-digit code linked information to page ranges
Beilstein Handbook of Organic Chemistry: SANDRA
This compound belongs in v. 13, Syst. 1823, H p. 348
Hashcodes:
• Ethylamine 000500010002
• Phenol 800100010906
• Non-localized amino-cyclohexanol 800510010306
Beilstein Handbook of Organic Chemistry: SANDRA
Each 12-digit hash code had a corresponding 4-digit code; e.g., the number 1849 linked 800510010306 to System No. 1823, H-page 348.
The four-digit number retained the sortability of the 12-digit code but gave a hashcode for each fragment that could be stored in 2 bytes: 7392-28C1-1610
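The point of the 4-digit code is that an order-preserving substitute for the 12-digit hash fits in 16 bits. The sketch below assumes a simple scheme, ranking the distinct 12-digit codes in sorted order and packing each rank as a big-endian unsigned 16-bit integer with Python's `struct` module; it does not reproduce SANDRA's actual assignments (it will not turn 800510010306 into 1849), and the helper names are hypothetical. Big-endian packing is chosen because comparing the raw bytes then gives the same order as comparing the numbers.

```python
import struct


def build_rank_table(hash_codes):
    """Map each distinct 12-digit code to its rank in sorted order (hypothetical scheme).

    Equal-length digit strings sort the same way as the numbers they spell.
    """
    table = {code: rank for rank, code in enumerate(sorted(set(hash_codes)))}
    if len(table) > 0xFFFF:
        raise ValueError("more distinct codes than fit in 2 bytes")
    return table


def pack_rank(rank):
    """Store a rank in 2 bytes; big-endian keeps byte order equal to numeric order."""
    return struct.pack(">H", rank)


codes = ["800510010306", "000500010002", "800100010906"]  # hashcodes from the slide
table = build_rank_table(codes)
packed = {code: pack_rank(table[code]) for code in codes}

# Sorting by the 2-byte values reproduces the order of the 12-digit codes.
assert sorted(codes) == sorted(codes, key=packed.__getitem__)
print(packed)
```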
Lawson Number: Planned Enhancements (around 1990)
A second phase of the LN implementation, for LNs greater than 32767, never materialized. It
– was to include 8000 shape discriminators to help avoid false drops, with LN values in the range 32776-40951
– Ring skeletal shapes for all mono- and bicyclic systems (including fused, bridged, and spiro rings) of 3-10 ring atoms, containing 0, 1, or 2 heteroatoms of the set (O, N, S) in any combination or any ring position, would each get a unique LN
– Rings with 11-17 atoms, including O, N, S ring atoms, would get an LN
– Another LN for those with heteroatoms other than N, O, S
– All mono- and bicyclic systems with 18 or more ring atoms were to get one LN
– A single LN for tricyclic and larger ring systems (further discrimination could be based on the presence or absence of skeletons such as steroids, morphane, adamantane, etc.)
Lawson Number Uses
Most effectively used when combined with other search elements, e.g.:
– Molecular Formula
– Element Ranges
– Boolean operator NOT in combination with substructures
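The combinations listed above can be modelled as set operations over compound records. In this minimal sketch the records, Lawson Numbers, and formulas are all hypothetical, and a precomputed set of BRNs stands in for a real substructure search; it shows only the Boolean logic (LN AND molecular formula, substructure NOT LN), not CrossFire's query syntax.

```python
# Hypothetical records: BRN -> (molecular formula, Lawson Numbers).
RECORDS = {
    101: ("C6H12O", {111, 222}),
    102: ("C6H12O", {333}),
    103: ("C7H14O", {111}),
}


def ln_and_formula(records, ln, formula):
    """Compounds carrying the given LN AND having the given molecular formula."""
    return {brn for brn, (mf, lns) in records.items() if ln in lns and mf == formula}


def substructure_not_ln(substructure_hits, records, ln):
    """Substructure hits NOT carrying the given LN."""
    return {brn for brn in substructure_hits if ln not in records[brn][1]}


print(ln_and_formula(RECORDS, 111, "C6H12O"))              # {101}
print(substructure_not_ln({101, 102, 103}, RECORDS, 111))  # {102}
```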
Lawson Number Search Tool
http://mypage.iu.edu/~ucoca/begperl/formFetch.html
Lawson Number Search in Usha’s DB
A search for COOH/O-R/(O4) retrieves (among seven LN ranges):
LN Range       Function
31456-31471    COOH/O-R/(O4)
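Conceptually, a lookup of this kind maps LN ranges to fragment descriptions. A minimal sketch, assuming non-overlapping ranges held in a sorted list and searched with Python's `bisect` module, follows; the three meanings are the ones quoted in this talk, the table itself is a hypothetical excerpt, and nothing here reflects how the Web tool is actually implemented.

```python
import bisect

# Hypothetical excerpt of an LN-meaning table: (first LN, last LN, meaning).
# The meanings are the ones quoted in this talk; ranges must not overlap.
LN_MEANINGS = sorted([
    (31456, 31471, "COOH/O-R/(O4)"),
    (289, 289, "O-R(*1)"),
    (24059, 24059, "Parent Heterocycles N(1)"),
])
RANGE_STARTS = [start for start, _, _ in LN_MEANINGS]


def describe(ln):
    """Return the meaning of the range containing ln, or None if unknown."""
    i = bisect.bisect_right(RANGE_STARTS, ln) - 1  # last range starting at or before ln
    if i >= 0:
        start, end, meaning = LN_MEANINGS[i]
        if start <= ln <= end:
            return meaning
    return None


print(describe(31459))  # COOH/O-R/(O4)
print(describe(290))    # None
```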
Beilstein CrossFire Search for LN Range 31456-31471
The search yielded 10,467 hits on 4/15/2004. One of those was BRN 18833, with LNs 31459 and 289.
Lawson Number Search in Usha’s DB
Revealed that LN 289 is O-R(*1)
Combining the previous Beilstein CrossFire search with LN 289 yielded 4910 hits on 4/15/2004.
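The two-step narrowing above (an LN range search, then AND with a single LN) is the kind of query an inverted index from Lawson Number to compound IDs handles well. The sketch assumes such an index held in memory; BRN 18833 and its LNs 31459 and 289 are taken from the slides, while the other records and the class name `LNIndex` are hypothetical, and the actual CrossFire index structure is not described here.

```python
import bisect
from collections import defaultdict


class LNIndex:
    """Toy inverted index: Lawson Number -> set of BRNs (hypothetical)."""

    def __init__(self, records):
        self.postings = defaultdict(set)
        for brn, lawson_numbers in records.items():
            for ln in lawson_numbers:
                self.postings[ln].add(brn)
        self.sorted_lns = sorted(self.postings)  # enables range queries

    def in_range(self, low, high):
        """BRNs with at least one LN in [low, high]."""
        lo = bisect.bisect_left(self.sorted_lns, low)
        hi = bisect.bisect_right(self.sorted_lns, high)
        hits = set()
        for ln in self.sorted_lns[lo:hi]:
            hits |= self.postings[ln]
        return hits

    def having(self, ln):
        """BRNs carrying exactly this LN."""
        return set(self.postings.get(ln, ()))


records = {
    18833: {31459, 289},   # from the slides
    900001: {31460},       # hypothetical
    900002: {289, 1200},   # hypothetical
}
index = LNIndex(records)
step1 = index.in_range(31456, 31471)  # range search
step2 = step1 & index.having(289)     # ... AND LN 289
print(sorted(step1), sorted(step2))   # [18833, 900001] [18833]
```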
Lawson Number Search for LN 289 in Usha’s Database
Lawson Number Search in Beilstein CrossFire
Find a compound with a cyclopentane ring with three free sites (over 440,000 substances) and with both LN 31459 and LN 289.
Result: 10 substances on 4/15/2004
CrossFire LN Search Yields Very Diverse Results
Lawson Number Range Search #2 on CrossFire
A search for the LN range 23369-25200 yielded 668,065 substances on 6/3/04. When combined with the chemical name segment Aziridin* in proximity to Propion*, the search yielded 142 substances.
CrossFire Search #2 Results: All have LN 24059 in common
Lawson Number 24059
Parent Heterocycles N(1)
Possible to Link CrossFire to Usha’s Web Tool
Hop-in feature
– Allows users to jump into CrossFire Commander and run a search from a link on the Web (or from an external package)
Conclusion
While the Lawson Number was originally developed as a tool to aid in finding the correct place for a given compound in the printed Beilstein, it clearly has utility in online searches of the Beilstein database. Having a Web supplement that defines the meaning of the Lawson Numbers will enhance the usefulness of the search field.
Bibliography and Acknowledgement
The generous input from Dr. Alexander Lawson is much appreciated!
Lawson, Alexander J. “Structure graphics in: pointers to Beilstein out.” in: Warr, Wendy A., ed. Graphics for chemical structures: integration with text and data. (ACS Symposium Series; 341) American Chemical Society: Washington, 1987, 80-87.
Lawson, Alexander J. “Chemical structure browsing.” in: Warr, Wendy A., ed. Chemical structure information systems: Interfaces, communication, and standards. (ACS Symposium Series; 400) American Chemical Society: Washington, 1989, 41-49.
Lawson, Alexander J. “The Lawson similarity number (LN). Offline generation and online use.” in: Heller, Stephen R., ed. The Beilstein online database: implementation, content, and retrieval. (ACS Symposium Series; 436) American Chemical Society: Washington, 1990, 143-155.
Bibliography
Sunkel, J.; Hoffman, E.; Luckenbach, R. “Straightforward procedure for locating chemical compounds in the Beilstein Handbook.” Journal of Chemical Education 1981, 58(12), 982-986.
“A powerful tool for chemists: The Lawson-Number.” [brochure] Springer-Verlag: Berlin, [1989?].
Lawson, Alexander. Personal communication. 22 June 2001.
Meehan, Paul; Schofield, Helen. “CrossFire; a structural revolution for chemists.” Online Information Review 2001, 25(4), 241-249.
MIMAS (Manchester Information & Associated Services)
• JISC-supported UK national data center
• Run by Manchester Computing at the University of Manchester
• Provides access to ISI Web of Knowledge, JSTOR, CrossFire, etc.
• http://www.mimas.ac.uk/
MIMAS CrossFire Services
Very useful documentation
– http://www.mimas.ac.uk/crossfire/docs.html
– Introductory guides
– Training materials
– Manuals
UW-Madison CrossFire Site
Links to a locally produced help file: http://chemistry.library.wisc.edu/beilstein/home.htm
Quick Guide: http://chemistry.library.wisc.edu/beilstein/quickguide.htm
Beilstein on STN
Beilstein on STN (Workshop Manual). FIZ Karlsruhe: Eggenstein-Leopoldshafen, 2003.
http://www.stn-international.com/training_center/chemistry/beilstein/beilstein_wsm.pdf
MDL Web Site
Replaces the former Beilstein site
MDL Knowledge Base
– http://www.mdl.com/support/knowledgebase/index.jsp