In-Depth Structure Searching in DWPIM on STNext® – Best Practices

Agenda

• Node types

• Attributes for nodes

• STN generic nodes vs. Derwent superatoms

• Search examples

Derwent Markush Resource on STNext

• Markush Indexing provided for DWPI records of Derwent sections B (pharma), C (agro) and E (chemistry)

• Powerful Markush retrieval solution supporting Derwent nodes

• STN structure search options

‒ Markush search modes: substructure and closed substructure (SSS, CSS)

‒ Search scope includes SAMPLE, FULL and BATCH search (SUBSET planned)

DWPIM: 2.1 M Markush

DWPI: 890,000 patent records

=> S Lx

=> TRA MCN /AN

DWPIM Structure Searching and Crossover to DWPI Is Similar to REGISTRY/CAplus℠

1. DWPIM structure search

=> FILE DWPIM

=> Uploading structure file: 2018_0006_Structure

L1 STRUCTURE UPLOADED

=> S L1 SSS FULL

100.0% PROCESSED 0 ITERATIONS 1443 ANSWERS

L2 1443 SEA SSS FUL L3

2. Crossover to WPINDEX

=> FIL WPINDEX

=> S L2

L3 955 L5

3. WPINDEX display with HIT structures

=> D FULLG AHITSTR

More information in previous e-seminar

STNext® home page

Click the Draw button to open the STN Structure Editor.

All drawn structures can be found under the Structures option of the My Files menu, and are saved until manually deleted.

STN Structure Editor

The four options for nodes can be found here.

Structures are made up of two things – Nodes & Bonds

• Nodes

− Specific atoms (C, N, O, etc.)

− Shortcuts (OH, NH2, CH3, etc.)

− Variables

    − STN Generic Nodes (Ak, Cy, Cb, Hy)

    − Derwent Superatoms (CHK, CHE, CHY, ARY, CYC, HEF, HEA, HET)

− R Groups

− User defined variables

Specific Atom nodes

Click on this button to open specific atom options.

Common specific atoms can also be selected here.

Shortcut nodes

Click on this button to open shortcut options.

Note: All shortcuts with hydrogens force hydrogens at that site, even with substructure searches.

Shortcut nodes

Put the cursor over any shortcut to see the corresponding chemical structure.

Variable nodes

Click on this button to open variable node options.

Variable nodes

Click on the Derwent generic node options to see specific Derwent superatoms.

R Group nodes

Click on this button to open the R Group options. An R Group can be made up of specific atoms, shortcuts, variables, or any combinations thereof.

Multiple R Groups (i.e., R1, R2, etc.) can be created.

Additional STN Structure Editor functions

• Repeating Group button

• Variable Points of Attachment button

• Lock Ring button

• Lock Atoms button

Repeating Group button

Click on Repeating Group button, draw a box around the node(s) of interest, set range in number boxes, then click Apply.
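To make the effect of the range boxes concrete, the sketch below enumerates the chains that a repeating group would stand for. It is only an illustration: the function name `expand_repeating_group`, the string representation of the unit, and the assumption that both ends of the range are inclusive are hypothetical and are not taken from STNext.

```python
def expand_repeating_group(unit: str, low: int, high: int) -> list[str]:
    """Return the unit repeated low..high times (both ends inclusive, by assumption)."""
    if low < 0 or high < low:
        raise ValueError("range must satisfy 0 <= low <= high")
    return [unit * n for n in range(low, high + 1)]


# A CH2 unit boxed as a repeating group with the range set to 1-3
print(expand_repeating_group("CH2", 1, 3))
```

Each string in the list corresponds to one chain length that a single query with a repeating group would cover.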

Variable Points of Attachment (VPA) button

Draw node separate from core structure, click on VPA button and drag from node to point of interest. Repeat as necessary.

Lock Ring button

Use the Lock Ring function on ring systems to block them from further ring fusion, or on a chain to block it from ring formation.

Click on Lock Ring button, put cursor over any ring node or bond, left mouse click. Ring nodes and bonds will become bold, denoting that the ring is locked.

Lock Atoms button

Use the Lock Atoms function to block a node from any substitution.

Click on Lock Atoms button, put cursor over node, left mouse click. Node will have a box around it, denoting that the node is locked.

Sample Structure

Attributes for Nodes

• Hydrogen Count

‒ Any, Exact, Minimum

• Markush Attributes

• Mass

• Node Type

• Non-Hydrogen Count

• Valency

Hydrogen Count Node Attribute for Specific Atoms

Hydrogen Count attribute can be associated with specific atoms, but not with shortcuts or variables.

Right click on node to open up Node Attributes menu.
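The three Hydrogen Count settings (Any, Exact, Minimum) can be read as simple predicates on the number of hydrogens at an atom. The sketch below is one possible reading, not the STN implementation: the class name `HydrogenCount`, its fields, and the interpretation of Minimum as "at least this many" are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HydrogenCount:
    """Hypothetical model of the Hydrogen Count node attribute."""
    mode: str = "any"   # "any", "exact" or "minimum"
    value: int = 0

    def matches(self, hydrogens: int) -> bool:
        if self.mode == "any":
            return True                      # no restriction
        if self.mode == "exact":
            return hydrogens == self.value   # exactly this many hydrogens
        if self.mode == "minimum":
            return hydrogens >= self.value   # assumed: at least this many
        raise ValueError(f"unknown mode: {self.mode}")


# A nitrogen that must carry exactly one hydrogen
nh = HydrogenCount("exact", 1)
print(nh.matches(1), nh.matches(2))
```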

Attributes for Nodes

• Hydrogen Count

• Markush Attributes

‒ Element Counts, Generic Definitions: Heterocycle, Markush Attributes, and Non-Hydrogen Count

• Mass

• Node Type

• Non-Hydrogen Count

• Valency

Attributes for Hy Node

Note the highlighted attributes in the Attributes Values panel for the node in question. The default Markush Attribute for the Hy STN Generic Node is Atom. That means that all structures retrieved will only have specific heterocyclic rings at that position, and will NOT have the Derwent HEA, HET or HEF superatoms at that position.

Attributes for Nodes

Right click on node to open up Node Attributes menu.

Markush Attributes: Structure Match Levels

• ATOM

‒ Retrieves only specific elements and groups of elements

• CLASS

‒ Retrieves the results of ATOM plus all generic nodes of the node hierarchy except the generic R-node

• ANY

‒ Retrieves the results of CLASS plus the generic R-node
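The three match levels are nested: each level retrieves everything the previous one does, plus one more kind of indexed node. A minimal sketch of that nesting follows. The names `MatchLevel`, `REQUIRED_LEVEL` and `is_retrieved`, and the three labels for kinds of indexed node, are hypothetical and only encode the definitions listed above.

```python
from enum import IntEnum


class MatchLevel(IntEnum):
    ATOM = 1
    CLASS = 2
    ANY = 3


# Lowest match level at which each kind of indexed node is retrieved
# (labels are illustrative, not STN terminology)
REQUIRED_LEVEL = {
    "specific": MatchLevel.ATOM,   # specific elements and groups of elements
    "generic": MatchLevel.CLASS,   # generic nodes of the node hierarchy
    "r_node": MatchLevel.ANY,      # the generic R-node
}


def is_retrieved(indexed_kind: str, level: MatchLevel) -> bool:
    """True if a query node set to `level` matches an indexed node of this kind."""
    return level >= REQUIRED_LEVEL[indexed_kind]


for level in MatchLevel:
    matched = [kind for kind in REQUIRED_LEVEL if is_retrieved(kind, level)]
    print(level.name, matched)
```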


Structure Match Levels (cont.)

• Default match levels

‒ ATOM for ring nodes

‒ CLASS for chain nodes

• Right click on node, then choose Markush attributes to change the match level

Full Substructure Search for Sample Structure

After the substructure search is run, the INCOMPLETE records are always separated out from the structure search answers. They can be dealt with separately.

Displays for First Substructure Search

Note that the highlighted portion of the structure shows specific heterocyclic rings where the Hy variable was.

Displays for First Substructure Search (cont.)

Note that the highlighted portion of the structure shows specific heterocyclic rings where the Hy variable was.

Class Markush Attribute for Hy Node

The Markush Attribute for the Hy node is changed from Atom to Class. Once the change has been made, click on Apply then OK.

Class Markush Attribute for Hy Node Substructure Search

A substructure search was run on the edited structure (L6). The records previously found with the Atom attribute were removed from the resulting set (L8) to give L9.

Displays for Second Substructure Search

Note that records with HEA, HET and HEF are now being captured as well.

Displays for Second Substructure Search (cont.)

Note that records with HEA, HET and HEF are now being captured as well.

Any Markush Attribute for Hy Node

The Markush Attribute for the Hy node is changed from Class to Any.

Any Markush Attribute for Hy Node Substructure Search

A substructure search was run on the edited structure (L11). The records previously found with the Atom or Class attributes were removed from the resulting set (L13) to give L14.
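Removing earlier answers from each broader search is a set difference. The sketch below shows the idea with made-up record identifiers. The helper `new_hits` and the sample data are hypothetical; on STN the same step is done with the search commands shown on the slides.

```python
def new_hits(current: set[str], *previous: set[str]) -> set[str]:
    """Records in `current` that none of the `previous` answer sets contain."""
    already_seen = set().union(*previous)
    return current - already_seen


# Made-up record identifiers for the Atom, Class and Any searches
atom_hits = {"rec1", "rec2"}
class_hits = {"rec1", "rec2", "rec3", "rec4"}
any_hits = {"rec1", "rec2", "rec3", "rec4", "rec5"}

print(sorted(new_hits(class_hits, atom_hits)))            # added by Class
print(sorted(new_hits(any_hits, atom_hits, class_hits)))  # added by Any
```

Reviewing only the difference at each step makes it easier to see what each match level contributes.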

Displays for Third Substructure Search

Note that records with XX (or for older records, UNK) are now being captured as well.

Option: Define heteroatom counts for Hy node

Going back to the Atom Markush Attribute, it is possible to limit the Hy node to heterocyclic rings having certain kinds and counts of heteroatoms.

Option: Define heteroatom counts for Hy node

In this example, the Hy heterocyclic ring must have at least one nitrogen and one oxygen. Once the changes have been made, click Apply, then OK.
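The element-count restriction amounts to a minimum-count test on the atoms of the ring. A minimal sketch follows, assuming a ring is represented simply as a list of element symbols. The function name `meets_minimum_counts` and that representation are hypothetical.

```python
from collections import Counter


def meets_minimum_counts(ring_atoms: list[str], minimums: dict[str, int]) -> bool:
    """True if the ring has at least the required number of each listed element."""
    counts = Counter(ring_atoms)
    return all(counts[element] >= n for element, n in minimums.items())


# At least one nitrogen and at least one oxygen, as in the example above
required = {"N": 1, "O": 1}

oxazole_like = ["C", "C", "C", "N", "O"]        # ring atoms of an oxazole
pyridine_like = ["C", "C", "C", "C", "C", "N"]  # ring atoms of a pyridine

print(meets_minimum_counts(oxazole_like, required))
print(meets_minimum_counts(pyridine_like, required))
```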

Attributes Values Panel for Hy Node

With the cursor over the Hy node, the Element Counts are now displayed in the Attributes Values Panel.

Full Substructure Search for Hy (ATOM) node with atom counts

Displays for Fourth Substructure Search

Note that the specific heterocyclic ring has at least one nitrogen and at least one oxygen in the ring.

Displays for Fourth Substructure Search (cont.)

Note that the specific heterocyclic ring has at least one nitrogen and at least one oxygen in the ring.

Option: Define heteroatom counts for Hy node

Going back to the Class Markush Attribute, it is possible to limit the Hy node to heterocyclic rings having certain kinds and counts of heteroatoms.

Option: Define heteroatom counts for Hy node

In this example, the heterocyclic ring must have at least one nitrogen and one oxygen. Once the changes have been made, click Apply, then OK.

Full Substructure Search for Hy (CLASS) node with atom counts

A substructure search was run on the edited structure (L20). The records previously found with the Atom attribute were removed from the resulting set (L22) to give L23.

Displays for Fifth Substructure Search

Note that the HET node has scope note 15, which denotes that the nitrogen and oxygen counts are possible for that node.

Displays for Fifth Substructure Search (cont.)

Note that the HET node has scope note 32, which denotes that the nitrogen and oxygen counts are possible for that node.

STN Generic Nodes vs. Derwent Superatoms

• The STN Generic Nodes are used by the CAS indexers when indexing Markush structures for MARPAT®

• The Derwent Superatoms are used by the Clarivate Analytics indexers when indexing Markush structures for DWPIM

• Searcher can use the STN Generic Nodes in structures for searching in DWPIM, DCR, MARPAT® and REGISTRY

‒ STN Structure Editor translates STN Generic Nodes into corresponding Derwent Superatoms for DWPIM and DCR searches

• Searcher can use the Derwent Superatoms in structures for searching in DWPIM only!

‒ DCR search capability using the Derwent Superatoms is coming

STN Generic Nodes

• Ak = Alkyl chain

• Cy = Any ring system

• Cb = Any carbocyclic ring system

• Hy = Any heterocyclic ring system

• X = Any halogen atom, including At

• M = Any metal atom

• Q = Any atom except for Carbon or Hydrogen

• A = Any atom except for Hydrogen

Carbon chain Derwent Superatoms

• CHK = Alkyl, Alkylene

• CHE = Alkenyl, Alkenylene

• CHY = Alkynyl, Alkynylene


Carbocyclic Derwent Superatoms

• ARY = Aryl carbocyclic system

‒ Carbocyclic systems which have at least one benzene ring or quinoid variant, monocyclic or fused

• CYC = Any carbocyclic ring not covered by ARY

‒ Monocyclic or fused

Heterocyclic Derwent Superatoms

• HEA = Heteroaryl

‒ Aromatic, monocyclic

‒ Either fully unsaturated 6 membered rings or 5 membered rings with 2 double bonds

• HET = Heterocycle

‒ Nonaromatic, monocyclic which do not fit the definition of HEA

• HEF = Heterocycle (fused)
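The three heterocyclic superatoms partition heterocycles by two questions: is the ring fused, and if not, does it meet the HEA ring-size and double-bond rule? The sketch below encodes just that reading. The `Heterocycle` fields and the function `derwent_superatom` are hypothetical, and "fully unsaturated 6 membered" is taken here to mean three ring double bonds, which is an assumption rather than the Derwent indexing rule itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Heterocycle:
    """Hypothetical, highly simplified description of a heterocyclic ring."""
    fused: bool
    ring_size: int
    ring_double_bonds: int


def derwent_superatom(ring: Heterocycle) -> str:
    if ring.fused:
        return "HEF"  # fused heterocycles
    six_fully_unsaturated = ring.ring_size == 6 and ring.ring_double_bonds == 3
    five_with_two = ring.ring_size == 5 and ring.ring_double_bonds == 2
    if six_fully_unsaturated or five_with_two:
        return "HEA"  # monocyclic heteroaryl
    return "HET"      # remaining monocyclic heterocycles


examples = {
    "pyridine": Heterocycle(fused=False, ring_size=6, ring_double_bonds=3),
    "furan": Heterocycle(fused=False, ring_size=5, ring_double_bonds=2),
    "piperidine": Heterocycle(fused=False, ring_size=6, ring_double_bonds=0),
    "indole": Heterocycle(fused=True, ring_size=5, ring_double_bonds=2),
}
for name, ring in examples.items():
    print(name, derwent_superatom(ring))
```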


STN Generic Nodes translate into Derwent Superatoms for DWPIM searches

• Ak => CHK or CHE or CHY

• Cy => ARY or CYC or HEA or HET or HEF

• Cb => ARY or CYC

• Hy => HEA or HET or HEF
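The translation listed above is a one-to-many lookup. A small sketch of it as a table follows. The dictionary and function names are hypothetical, and the real translation is performed by the STN Structure Editor, not by user code.

```python
# STN generic node -> Derwent superatoms it is translated into for DWPIM
GENERIC_TO_SUPERATOMS = {
    "Ak": ("CHK", "CHE", "CHY"),
    "Cy": ("ARY", "CYC", "HEA", "HET", "HEF"),
    "Cb": ("ARY", "CYC"),
    "Hy": ("HEA", "HET", "HEF"),
}


def translate(node: str) -> tuple[str, ...]:
    """Superatoms for a generic node; any other node is returned unchanged."""
    return GENERIC_TO_SUPERATOMS.get(node, (node,))


print(translate("Hy"))
print(translate("N"))
```

In this table the Cy entry is exactly the union of the Cb and Hy entries, which matches Cy being defined as any ring system.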


STN Generic Nodes vs. Derwent Superatoms example

Note that in this example, the Hy Match Level attribute is Atom.

Upload Structure into DWPIM

Run Substructure Search

Replace Hy with HEA, repeat strategy

Replace HEA with HET, repeat strategy

Replace HET with HEF, repeat strategy

Compare Hy results with HEA, HET and HEF results

L1 = Hy structure

L5 = HEA structure

L9 = HET structure

L13 = HEF structure

Another way to compare Hy to HEA, HET and HEF

No additional records were found using Hy versus HEA, HET and HEF, and vice versa.
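The comparison checks in both directions: nothing found by Hy alone, and nothing found by the superatoms alone. A sketch of that two-way check with made-up record identifiers follows. The function `compare_hits` and the sample data are hypothetical and do not reproduce the answer sets on the slides.

```python
def compare_hits(hy: set[str], *superatom_sets: set[str]) -> tuple[set[str], set[str]]:
    """Return (records only in the Hy set, records only in the combined superatom sets)."""
    combined = set().union(*superatom_sets)
    return hy - combined, combined - hy


# Made-up answer sets standing in for the Hy, HEA, HET and HEF searches
hy_hits = {"rec1", "rec2", "rec3", "rec4"}
hea_hits = {"rec1", "rec2"}
het_hits = {"rec3"}
hef_hits = {"rec2", "rec4"}

only_hy, only_superatoms = compare_hits(hy_hits, hea_hits, het_hits, hef_hits)
print(only_hy, only_superatoms)
```

Two empty sets mean the Hy query and the three superatom queries retrieved the same records.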

Best Practice: How to deal with Iteration Incompletes

From our experience:

• Most often very generic Markush structures are of no relevance

• Most often core structure is not related to query structure

• If set needs to be evaluated:

‒ do not analyze structures in DWPIM but crossover incomplete structures to WPINDEX/WPIDS/WPIX

• If possible, narrow result to technical area like pharma (B/DC), e.g. by using roles

• Display WPINDEX records with graphic images, e.g.

=> D AN TI GI

=> D FULLG

(Graphic images include structures often from the claims)

• Development work for DWPIM is continued to reduce number of incompletes further

DWPIM to DWPI Patent Family Records

By including the INCOMPLETES, 47 additional patent families are added to the search results. Limit this search by class codes, keyword search, Manual Codes, etc.

Advantages of the DWPIM Implementation on STNext®

Consistent searching in all structure and Markush databases on STN

• Single structure query for all structure files with STN structure conventions - adaptations to DWPIM Markush searching can easily be done

• Single flexible concept for the Markush search options (match levels) on STN

Integrated environment of Derwent files for search and display

• Combination of structure (DCR, DWPIM) and text searches (DWPI)

• Integrated display in DWPI, including structures from DCR and DWPIM

Deployment of the full potential of the DWPI Markush database

• Derwent bonding conventions are retained as provided by Clarivate Analytics

• Availability of Derwent node conventions for searching

High precision state-of-the-art Markush search engine

• Improved evaluation of Markush structures with hit structure display, highlighting, and assembled structures

Summary

• Markush searching is as much art as it is science

• Do not be afraid to play around with structures

‒ Compare results as changes are made to attributes to understand the effect of those changes

• Do not be afraid to ask for help!

E-Seminar on DWPIM coming next

E-Seminars on February 28, 2019

Multifile structure searching on STNext®

• Why is it important to search DWPIM in addition to other structure databases on STN (e.g. MARPAT®)

• Case study evaluating unique hits

• Best practices

Sign up for the STN training newsletter by FIZ Karlsruhe!

Resources

• DWPIM Manual

‒ http://www.stn-international.de/dwpim_reference_manual.html

• Recorded sessions

‒ Structure Searching on STNext®

• http://www.stn-international.com/201803_structure_search_stnext.html

‒ The Derwent Markush Resource database (aka DWPIM) on STNext®

• http://www.stn-international.com/201810_dwpim_stnext.html

For more information …

CAS

help@cas.org

Support: www.cas.org

FIZ Karlsruhe

helpdesk@fiz-karlsruhe.de

Support: www.stn-international.de