In-Depth Structure Searching in DWPIM on STNext® – Best Practices
Agenda
• Node types
• Attributes for nodes
• STN generic nodes vs. Derwent superatoms
• Search examples
Derwent Markush Resource on STNext
• Markush Indexing provided for DWPI records of Derwent sections B (pharma), C (agro) and E (chemistry)
• Powerful Markush retrieval solution supporting Derwent nodes
• STN structure search options
  ‒ Markush search modes: substructure (SSS) and closed substructure (CSS)
  ‒ Search scope includes SAMPLE, FULL and BATCH search (SUBSET planned)
DWPIM: 2.1 M Markush
DWPI: 890,000 patent records
Crossover commands: => S Lx and => TRA MCN /AN
DWPIM Structure Searching and Crossover to DWPI is Similar to REGISTRY/CAplus℠
=> FILE DWPIM
=> Uploading structure file: 2018_0006_Structure
L1 STRUCTURE UPLOADED
=> S L1 SSS FULL
100.0% PROCESSED 0 ITERATIONS 1443 ANSWERS
L2 1443 SEA SSS FUL L3
=> FIL WPINDEX
=> S L2
L3 955 L5
=> D FULLG AHITSTR
1. DWPIM structure search
2. Crossover to WPINDEX
3. WPINDEX display with HIT structures
More information in previous e-seminar
STNext® home page
Click the Draw button to open the STN Structure Editor.
All drawn structures can be found under the Structures option of the My Files menu, and are saved until manually deleted.
STN Structure Editor
The four options for nodes can be found here.
Structures are made up of two things – Nodes & Bonds
• Nodes
  − Specific atoms (C, N, O, etc.)
  − Shortcuts (OH, NH2, CH3, etc.)
  − Variables
    − STN Generic Nodes (Ak, Cy, Cb, Hy)
    − Derwent Superatoms (CHK, CHE, CHY, ARY, CYC, HEF, HEA, HET)
  − R Groups
  − User defined variables
Specific Atom nodes
Click on this button to open specific atom options.
Common specific atoms can also be selected here.
Shortcut nodes
Click on this button to open shortcut options.
Note: All shortcuts with hydrogens force hydrogens at that site, even with substructure searches.
Shortcut nodes (cont.)
Put the cursor over any shortcut to see the corresponding chemical structure.
Variable nodes
Click on this button to open variable node options.
Variable nodes (cont.)
Click on the Derwent generic node options to see specific Derwent superatoms.
R Group nodes
Click on this button to open the R Group options. An R Group can be made up of specific atoms, shortcuts, variables, or any combination thereof.
Multiple R Groups (i.e., R1, R2, etc.) can be created.
Additional STN Structure Editor functions
• Repeating Group button
• Variable Points of Attachment button
• Lock Ring button
• Lock Atoms button
Repeating Group button
Click on the Repeating Group button, draw a box around the node(s) of interest, set the range in the number boxes, then click Apply.
Variable Points of Attachment (VPA) button
Draw the node separate from the core structure, click on the VPA button and drag from the node to the point of interest. Repeat as necessary.
Lock Ring button
Use the Lock Ring function on ring systems to block them from further ring fusion, or on a chain to block it from ring formation.
Click on the Lock Ring button, put the cursor over any ring node or bond, and left-click. Ring nodes and bonds will become bold, denoting that the ring is locked.
Lock Atoms button
Use the Lock Atoms function to block a node from any substitution.
Click on the Lock Atoms button, put the cursor over the node, and left-click. The node will have a box around it, denoting that the node is locked.
Sample Structure
Attributes for Nodes
• Hydrogen Count
  ‒ Any, Exact, Minimum
• Markush Attributes
• Mass
• Node Type
• Non-Hydrogen Count
• Valency
Hydrogen Count Node Attribute for Specific Atoms
Hydrogen Count attribute can be associated with specific atoms, but not with shortcuts or variables.
Right click on node to open up Node Attributes menu.
Attributes for Nodes
• Hydrogen Count
• Markush Attributes
  ‒ Element Counts, Generic Definitions: Heterocycle, Markush Attributes, and Non-Hydrogen Count
• Mass
• Node Type
• Non-Hydrogen Count
• Valency
Attributes for Hy Node
Note the highlighted attributes in the Attributes Values panel for the node in question. The default Markush Attribute for the Hy STN Generic Node is Atom. That means that all structures retrieved will have only specific heterocyclic rings at that position, and will NOT have the Derwent HEA, HET or HEF superatoms at that position.
Attributes for Nodes
Right click on node to open up Node Attributes menu.
Markush Attributes: Structure Match Levels
• ATOM
  ‒ Retrieves only specific elements and groups of elements
• CLASS
  ‒ Retrieves the results of ATOM plus all generic nodes of the node hierarchy except the generic R-node
• ANY
  ‒ Retrieves the results of CLASS plus the generic R-node
Structure Match Levels (cont.)
• Default match levels
  ‒ ATOM for ring nodes
  ‒ CLASS for chain nodes
• Right click on node, then choose Markush attributes to change the match level
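The three match levels nest inside one another, which can be pictured with a small model. The following is a minimal Python sketch, not STN code: the names `MatchLevel`, `NodeKind`, `matches` and `default_level` are hypothetical and only restate the rules and defaults listed above.

```python
from enum import Enum


class MatchLevel(Enum):
    ATOM = 1   # specific elements and groups of elements only
    CLASS = 2  # ATOM results plus generic nodes of the node hierarchy
    ANY = 3    # CLASS results plus the generic R-node


class NodeKind(Enum):
    SPECIFIC = 1  # a concrete atom or ring in the indexed structure
    GENERIC = 2   # a generic node of the hierarchy, e.g. HEA, HET, HEF
    R_NODE = 3    # the generic R-node


# Which indexed node kinds each match level is allowed to retrieve.
ALLOWED = {
    MatchLevel.ATOM: {NodeKind.SPECIFIC},
    MatchLevel.CLASS: {NodeKind.SPECIFIC, NodeKind.GENERIC},
    MatchLevel.ANY: {NodeKind.SPECIFIC, NodeKind.GENERIC, NodeKind.R_NODE},
}


def matches(level: MatchLevel, indexed: NodeKind) -> bool:
    """Return True if a query node at `level` may match the indexed node kind."""
    return indexed in ALLOWED[level]


def default_level(is_ring_node: bool) -> MatchLevel:
    """Default match level: ATOM for ring nodes, CLASS for chain nodes."""
    return MatchLevel.ATOM if is_ring_node else MatchLevel.CLASS
```

In this model, widening a ring node from its default to CLASS is what lets generic nodes at that position count as hits.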
Full Substructure Search for Sample Structure
After each substructure search, the INCOMPLETE records were separated out from the structure search results. They can be dealt with separately.
Displays for First Substructure Search
Note that the highlighted portion of the structure shows specific heterocyclic rings where the Hy variable was.
Class Markush Attribute for Hy Node
The Markush Attribute for the Hy node is changed from Atom to Class. Once the change has been made, click on Apply, then OK.
Class Markush Attribute for Hy Node Substructure Search
A substructure search was run on the edited structure (L6). The records previously found with the Atom attribute were then removed from the resulting set (L8) to give L9.
Displays for Second Substructure Search
Note that records with HEA, HET and HEF are now being captured as well.
Any Markush Attribute for Hy Node
The Markush Attribute for the Hy node is changed from Class to Any.
Any Markush Attribute for Hy Node Substructure Search
A substructure search was run on the edited structure (L11). The records previously found with the Atom or Class attributes were then removed from the resulting set (L13) to give L14.
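Removing the records already found at a narrower match level is a set difference, which isolates what each wider level adds. The following Python sketch illustrates the idea only; the record identifiers and the `newly_retrieved` helper are hypothetical and do not come from the search session.

```python
def newly_retrieved(current, *earlier):
    """Return the records in `current` that no earlier answer set contained."""
    seen = set().union(*earlier)
    return current - seen


# Hypothetical answer sets for the same query at the three match levels.
atom_hits = {"REC-001", "REC-002"}
class_hits = {"REC-001", "REC-002", "REC-003"}
any_hits = {"REC-001", "REC-002", "REC-003", "REC-004"}

only_class = newly_retrieved(class_hits, atom_hits)
only_any = newly_retrieved(any_hits, atom_hits, class_hits)
```

Reviewing only the difference sets keeps the extra, more generic hits apart from the records that were already evaluated.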
Displays for Third Substructure Search
Note that records with XX (or for older records, UNK) are now being captured as well.
Option: Define heteroatom counts for Hy node
Going back to the Atom Markush Attribute, it is possible to limit the Hy node to heterocyclic rings having certain kinds and counts of heteroatoms.
In this example, the Hy heterocyclic ring must have at least one nitrogen and at least one oxygen. Once the changes have been made, click Apply, then OK.
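The element-count restriction amounts to a minimum-count test on the ring atoms. The following is a minimal Python sketch of that test, assuming a ring is represented as a plain list of element symbols; the `meets_minimum_counts` helper is hypothetical and is not how the search engine is implemented.

```python
from collections import Counter


def meets_minimum_counts(ring_atoms, minimums):
    """Return True if the ring has at least the required number of each element."""
    counts = Counter(ring_atoms)
    return all(counts[element] >= required for element, required in minimums.items())


# At least one nitrogen and at least one oxygen in the ring.
minimums = {"N": 1, "O": 1}

morpholine = ["C", "C", "N", "C", "C", "O"]  # ring contains both N and O
pyridine = ["C", "C", "C", "C", "C", "N"]    # ring contains N only

morpholine_ok = meets_minimum_counts(morpholine, minimums)
pyridine_ok = meets_minimum_counts(pyridine, minimums)
```

Because the counts are minimums, a ring with two nitrogens and one oxygen also passes.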
Attributes Values Panel for Hy Node
With the cursor over the Hy node, the Element Counts are now displayed in the Attributes Values Panel.
Full Substructure Search for Hy (ATOM) node with atom counts
Displays for Fourth Substructure Search
Note that the specific heterocyclic ring has at least one nitrogen and at least one oxygen in the ring.
Option: Define heteroatom counts for Hy node
Going back to the Class Markush Attribute, it is possible to limit the Hy node to heterocyclic rings having certain kinds and counts of heteroatoms.
In this example, the heterocyclic ring must have at least one nitrogen and at least one oxygen. Once the changes have been made, click Apply, then OK.
Full Substructure Search for Hy (CLASS) node with atom counts
A substructure search was run on the edited structure (L20). The records previously found with the Atom attribute were then removed from the resulting set (L22) to give L23.
Displays for Fifth Substructure Search
Note that the HET node has scope note 15, which indicates that the nitrogen and oxygen counts are possible for that node.
Displays for Fifth Substructure Search (cont.)
Note that the HET node has scope note 32, which indicates that the nitrogen and oxygen counts are possible for that node.
STN Generic Nodes vs. Derwent Superatoms
• The STN Generic Nodes are used by the CAS indexers when indexing Markush structures for MARPAT®
• The Derwent Superatoms are used by the Clarivate Analytics indexers when indexing Markush structures for DWPIM
• Searchers can use the STN Generic Nodes in structures for searching in DWPIM, DCR, MARPAT® and REGISTRY
  ‒ STN Structure Editor translates STN Generic Nodes into corresponding Derwent Superatoms for DWPIM and DCR searches
• Searchers can use the Derwent Superatoms in structures for searching in DWPIM only!
  ‒ DCR search capability using the Derwent Superatoms is coming
STN Generic Nodes
• Ak = Alkyl chain
• Cy = Any ring system
• Cb = Any carbocyclic ring system
• Hy = Any heterocyclic ring system
• X = Any halogen atom, including At
• M = Any metal atom
• Q = Any atom except for Carbon or Hydrogen
• A = Any atom except for Hydrogen
Carbon chain Derwent Superatoms
• CHK = Alkyl, Alkylene
• CHE = Alkenyl, Alkenylene
• CHY = Alkynyl, Alkynylene
Carbocyclic Derwent Superatoms
• ARY = Aryl carbocyclic system
  ‒ Carbocyclic systems which have at least one benzene ring or quinoid variant, monocyclic or fused
• CYC = Any carbocyclic ring not covered by ARY
  ‒ Monocyclic or fused
Heterocyclic Derwent Superatoms
• HEA = Heteroaryl
  ‒ Aromatic, monocyclic
  ‒ Either fully unsaturated 6 membered rings or 5 membered rings with 2 double bonds
• HET = Heterocycle
  ‒ Nonaromatic, monocyclic rings which do not fit the definition of HEA
• HEF = Heterocycle (fused)
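The three heterocyclic superatoms partition heterocycles by fusion and unsaturation. The following Python sketch restates the definitions above as a decision rule. It is an illustration under stated assumptions, not the Derwent indexing procedure: the function name and parameters are hypothetical, "fully unsaturated 6 membered" is read here as three ring double bonds, and aromaticity is not checked independently.

```python
def heterocycle_superatom(fused, ring_size, ring_double_bonds):
    """Pick HEA, HET or HEF for a heterocycle, following the slide definitions.

    Assumption: a 6-membered ring counts as fully unsaturated when it has
    three ring double bonds.
    """
    if fused:
        return "HEF"  # any fused heterocycle
    six_membered_hea = ring_size == 6 and ring_double_bonds == 3
    five_membered_hea = ring_size == 5 and ring_double_bonds == 2
    if six_membered_hea or five_membered_hea:
        return "HEA"  # monocyclic heteroaryl
    return "HET"      # monocyclic ring that does not fit HEA


pyridine = heterocycle_superatom(fused=False, ring_size=6, ring_double_bonds=3)
furan = heterocycle_superatom(fused=False, ring_size=5, ring_double_bonds=2)
piperidine = heterocycle_superatom(fused=False, ring_size=6, ring_double_bonds=0)
indole = heterocycle_superatom(fused=True, ring_size=9, ring_double_bonds=4)
```

The fused test comes first because HEF covers every fused heterocycle regardless of unsaturation.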
STN Generic Nodes translate into Derwent Superatoms for DWPIM searches
• Ak => CHK or CHE or CHY
• Cy => ARY or CYC or HEA or HET or HEF
• Cb => ARY or CYC
• Hy => HEA or HET or HEF
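The translation above is a fixed lookup from each STN generic node to a list of alternative superatoms. A minimal Python sketch of that lookup follows; the table contents come from the list above, while the `translate` helper and its pass-through behaviour for other nodes are assumptions made for illustration.

```python
# STN generic node -> alternative Derwent superatoms (from the list above).
GENERIC_TO_SUPERATOMS = {
    "Ak": ("CHK", "CHE", "CHY"),
    "Cy": ("ARY", "CYC", "HEA", "HET", "HEF"),
    "Cb": ("ARY", "CYC"),
    "Hy": ("HEA", "HET", "HEF"),
}


def translate(node):
    """Return the superatoms a generic node stands for.

    Assumption: any node without an entry (e.g. a specific atom) is
    returned unchanged as a one-element tuple.
    """
    return GENERIC_TO_SUPERATOMS.get(node, (node,))


hy_alternatives = translate("Hy")
```

Note that the Cy alternatives are exactly the Cb alternatives plus the Hy alternatives, matching "any ring system" as carbocyclic or heterocyclic.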
STN Generic Nodes vs. Derwent Superatoms example
Note that in this example, the Hy Match Level attribute is Atom.
Upload Structure into DWPIM
Run Substructure Search
Replace Hy with HEA, repeat strategy
Replace HEA with HET, repeat strategy
Replace HET with HEF, repeat strategy
Compare Hy results with HEA, HET and HEF results
L1 = Hy structure
L5 = HEA structure
L9 = HET structure
L13 = HEF structure
Another way to compare Hy to HEA, HET and HEF
No additional records were found using Hy versus HEA, HET and HEF, and vice versa.
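This comparison checks that the Hy answer set equals the combined HEA, HET and HEF answer sets. The following Python sketch shows one way to express that check, using hypothetical record identifiers rather than the actual results; the `compare` helper is not an STN function.

```python
def compare(hy_hits, *superatom_hits):
    """Report records found on only one side of the comparison."""
    combined = set().union(*superatom_hits)
    return {
        "only_hy": hy_hits - combined,
        "only_superatoms": combined - hy_hits,
    }


# Hypothetical answer sets for the four structure queries.
hy = {"REC-A", "REC-B", "REC-C"}
hea = {"REC-A"}
het = {"REC-B"}
hef = {"REC-B", "REC-C"}

result = compare(hy, hea, het, hef)
sets_agree = not result["only_hy"] and not result["only_superatoms"]
```

Both difference sets being empty corresponds to the outcome reported above; a non-empty set would point to records worth inspecting individually.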
Best Practice: How to deal with Iteration Incompletes
From our experience:
• Most often very generic Markush structures are of no relevance
• Most often the core structure is not related to the query structure
• If the set needs to be evaluated:
  ‒ do not analyze structures in DWPIM, but cross over the incomplete structures to WPINDEX/WPIDS/WPIX
• If possible, narrow the result to a technical area like pharma (B/DC), e.g. by using roles
• Display WPINDEX records with graphic images, e.g.
  => D AN TI GI
  => D FULLG
  (Graphic images often include structures from the claims)
• Development work for DWPIM continues, to further reduce the number of incompletes
DWPIM to DWPI Patent Family Records
By including the INCOMPLETES, 47 additional patent families are added to the search results. Limit this search by class codes, keyword search, Manual Codes, etc.
Advantages of the DWPIM Implementation on STNext®
Consistent searching in all structure and Markush databases on STN
• Single structure query for all structure files with STN structure conventions - adaptations to DWPIM Markush searching can easily be done
• Single flexible concept for the Markush search options (match levels) on STN
Integrated environment of Derwent files for search and display
• Combination of structure (DCR, DWPIM) and text searches (DWPI)
• Integrated display in DWPI, including structures from DCR and DWPIM
Deployment of the full potential of the DWPI Markush database
• Derwent bonding conventions are retained as provided by Clarivate Analytics
• Availability of Derwent node conventions for searching
High precision state-of-the-art Markush search engine
Improved evaluation of Markush structures with hit structure display, highlighting, and assembled structures
Summary
• Markush searching is as much art as it is science
• Do not be afraid to play around with structures
  ‒ Compare results as changes are made to attributes to understand the effect of those changes
• Do not be afraid to ask for help!
E-Seminar on DWPIM coming next
E-Seminars on February 28, 2019
Multifile structure searching on STNext®
• Why is it important to search DWPIM in addition to other structure databases on STN (e.g. MARPAT®)
• Case study evaluating unique hits
• Best practices
Sign up for the STN training newsletter by FIZ Karlsruhe!
Resources
• DWPIM Manual
  ‒ http://www.stn-international.de/dwpim_reference_manual.html
• Recorded sessions
  ‒ Structure Searching on STNext®
    • http://www.stn-international.com/201803_structure_search_stnext.html
  ‒ The Derwent Markush Resource database (aka DWPIM) on STNext®
    • http://www.stn-international.com/201810_dwpim_stnext.html