The Derwent Markush Resource (DWPIM) on STNext®
Webex, 30.01.2020
Thomas Stengel (Product Manager Chemistry & Patents)
Agenda
• DWPIM Database information, content & Indexing
• Basic Markush searching techniques
• Advanced Markush searching techniques (descriptors, roles)
• DWPIM compared to MMS and MARPAT
• Latest DWPIM Release: Enhancements & Outstanding Issues
• Special Topics
Information about DWPIM on STNext
Webinars about DWPIM on STNext
Latest DWPIM seminars
• DWPIM on STNext: Introduction and New Functionality
• Multifile Structure Searching on STNext: Why is it Important? How is it Done?
• In-depth Structure Searching in DWPIM on STNext: Best Practices
• The Derwent Markush Resource (aka DWPIM) on STNext
What is the Derwent Markush Resource (DWPIM)?
TI New pyrazolopyrimidine derivative are Bruton's tyrosine kinase inhibitors used to treat cancer comprises solid cancer e.g. brain tumor, malignant astrocytoma and blood cancer e.g. leukemia, autoimmune disease e.g. rheumatoid arthritis
CMC UPB 20181206 ...
RIN: 01174 01732
MCN: 2121-56402-N 2121-56402-P
=> TRA MCN /AN
AN 2121-56402...
Labels: Markush Compound Number (MCN), Accession Number (AN), Chemical Code (CMC)
Indexing guidelines for DWPIM
• Markush structures are indexed from:
‒ the patent claims
‒ the embodiment if a 'wider disclosure' is indicated
• The maximum number of Markush and DCR structures indexed per DWPI basic patent is 99
‒ Specific structures which cannot be covered using DCR indexing are covered within a DWPIM Markush structure
‒ Specific structures prior to DCR implementation (1999) can be found in DWPIM
Example: Markush of specific compounds (9213-F8101)
TI New 5-formyl-1,1,2,3,3,4,6-hepta:methyl:indane - which is organoleptic agent with fragrant musk-like aroma
CMC ... M3 *01* G023 G034 G036 G038 G039 G212 J011 J241 J431 K0 K840 L143 M210 M211 M212 M240 M283 M320 M414 M510 M520 M531 M540 M710 Q253 Q254 R021 R022 R023 M903 M904
MCN: 9213-F8101-N 9317-G7301-M 9317-G7301-N 9317-G7301-Q 9319-A0801-N
Panels: In DWPI (US 5095152 A) | In DWPIM
Example: Markush of Series of specific compounds (9830-C8901)
Differences DWPIM Structure vs Patent Claim
• Indexing conventions
‒ Keto-enol tautomerism (keto form is the preferred one in DWPIM)
• Use of Markush terminology and shortcuts
‒ DWPIM: Use of superatom terms (CHK, ARY etc.) & shortcuts (CO2, SO3 etc.)
• Allowing for variable attachments
‒ All parts of the structure where the attachment can be made by a variable group are assigned G group elements
• Allowing for exceptions mentioned in the patent (provisos)
‒ E.g. if A = value x, then B can take on a subset of its possible values
• Allowing for system limits
‒ Means sometimes one structure is split into 2 or more structures
Comparing the patent claim with the DWPIM structure is helpful for optimizing the query
Ways to generate structures
• Structure Editor
• CAS REGISTRY Number
• InChI Strings
• SMILES Strings
• Import structure
• .cxf format fully supported
• .mol, .str formats supported for specific structures
• Command line
Structure Editor Preferences
Derwent Markush Attributes for nodes and bonds
When the structure is uploaded for the session, all Derwent attributes are displayed in the transcript (ring lock indicated with bold bonds).
DWPIM Structure Searching and Crossover to DWPI is Similar to REGISTRY/CAplus
1. DWPIM structure search
=> FILE DWPIM
=> Uploading structure file: 2018_0006_Structure
L1 STRUCTURE UPLOADED
=> S L1 SSS FULL
100.0% PROCESSED 0 ITERATIONS 1443 ANSWERS
L2 1443 SEA SSS FUL L3
2. Crossover to DWPI
=> FIL WPINDEX
=> S L2
L3 955 L5
3. DWPI display with HIT structures
=> D FULLG AHITSTR
Three Markush Displays
DWPI:
AHITSTR assembled
BHITSTR brief
FHITSTR full
DWPIM:
ASB assembled
BRIEF brief
FULL full
Types of Searches
• Sample (default)
• Subset
• Batch
• Substructure SSS and Closed Substructure CSS
Subset Search in DWPIM
• All valid search types (CSS, SSS) and search scopes (SAMPLE, FULL) may be used in the subset search. The search syntax follows STN conventions.
Batch Search in DWPIM
• Increased search time per structure may reduce the number of iteration incompletes
• Extended overall search time of 90 minutes may increase the number of completed searches
Substructure Search in DWPIM
Separating out the records retrieved with incomplete designation. Preserves hit structures
Meaning of „Iterations“
There is no full file projection available for online and batch search.
Information on Iterations: no meaning, to be ignored!
Number of iteration incompletes to give 50 sample records
Best Practice: How to deal with Iteration Incompletes
• Most often very generic Markush structures are of no relevance
• Most often the core structure is not related to the query structure
• If the set needs to be evaluated:
‒ do not analyze structures in DWPIM but crossover incomplete structures to DWPI
• If possible, narrow the result to a technical area like pharma (B/DC), e.g. by using roles
• Display WPINDEX records with graphic images (incl. structures from the claims), e.g.
=> D AN TI GI
=> D FULLG
• Split up complicated query structures with many possible variations into two or more separate, less complicated structures
Applying Roles in DWPIM
• SDM: 26 Substance Descriptors (mainly tech- and structure-related)
• MDE: 3 Markush Descriptors (specific generic)
Applying Roles in DWPI (cross-over from DWPIM)
• DCR, DCN: 30 Roles (compound provenance, type, analytics)
Syntax: S L-num(T)(role(s))/MCN
Example: S L-num(T)(CL)/MCN
D HIT
Applying Roles in DWPI (cross-over from DWPIM)
• Frag-codes: > 100 Roles (Pharmaceutical and Agricultural activities, properties and uses)
Syntax: S L-num(P)(role(s))/M0,M2,M3,M4
Example: S L-num(P)(Q25)/M0,M2,M3,M4
D HIT
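When several roles have to be checked one after another, the two query patterns above can be assembled as text before pasting them into the session. This is a minimal sketch: `role_query` is a hypothetical helper, not an STN function, and the L-number `L3` is an assumed placeholder. It takes one role at a time because the slides do not show how several roles are separated.

```python
def role_query(l_number: str, operator: str, role: str, fields: list[str]) -> str:
    """Format a role-qualified search command in the pattern shown on the slides.

    Hypothetical helper: it only builds a string and does not talk to STN.
    """
    if operator not in ("T", "P"):
        # Only the (T) and (P) operators appear in the slide syntax.
        raise ValueError(f"unexpected operator: {operator!r}")
    if not fields:
        raise ValueError("at least one search field is required")
    return f"S {l_number}({operator})({role})/{','.join(fields)}"


# DCR/DCN role via the MCN field, then a fragmentation-code role.
print(role_query("L3", "T", "CL", ["MCN"]))
print(role_query("L3", "P", "Q25", ["M0", "M2", "M3", "M4"]))
```

Each generated command would still be followed by D HIT to display the hit terms, as on the slides.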
DWPIM vs. MMS: content and database structure
MMS: DWPI basic patents, PHARM patents; MMS (Markush + Specific)
STN: DWPI basic patents; DWPIM (Markush), DCR (Specific)
Summary of key differences DWPIM vs MMS
• Structure displays (hit structures only available in DWPIM)
• Free sites (MMS “closed”, DWPIM “open”)
• Match Level (STN) vs. Translation (Questel)
• Differences in search functions
• Bond values
• VPA Function
• Exclusion of certain elements / fragments
MARPAT℠ and DWPIM complement each other as regards
• Authority coverage
• Compound class coverage
• Indexing policies
‒ Chains / bonds
‒ Generic Nodes
‒ Match Level
‒ G-group numbering
‒ Tautomers
• Time periods covered
Generic nodes (=superatoms) in DWPIM
X preserved
M preserved
DWPIM superatoms defined by properties
till 1990s
Searching POL
Query
US20060051824
0323-31403
Mixed Match Level Rings can be searched
• Only the combination of ML ATOM and ML ANY is allowed (Clarivate indexing rules)
• Hybrid ring systems can only contain the generic node XX.
These records can only be found with ML ANY
Mixed Match Level Rings can be searched
• ML combinations ATOM + CLASS, CLASS + ANY and ATOM + CLASS + ANY are not allowed. The following rules apply:
• The most common type of ML is determined and applied for all ring atoms
• If it is ML ANY, the second most common will be applied for all ring atoms
• For equal numbers of assigned MLs the lower ML is assigned, e.g. for a ring with 3 atoms ML ATOM and 3 atoms ML CLASS, the overall ML ATOM will be assigned.
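The three rules above can be read as a small decision procedure. The sketch below is one interpretation of the slide text, not the STN implementation. The function name and the ordering ATOM < CLASS < ANY are assumptions made for illustration, and rings that use one level only or the allowed ATOM + ANY mix are returned unchanged.

```python
from collections import Counter

# Assumed ordering from lowest (most specific) to highest (most generic).
ML_ORDER = {"ATOM": 0, "CLASS": 1, "ANY": 2}


def resolve_ring_match_level(atom_levels):
    """Return the match level per ring atom after applying the slide's rules."""
    levels = [level.upper() for level in atom_levels]
    if not levels:
        raise ValueError("a ring needs at least one atom")
    unknown = set(levels) - set(ML_ORDER)
    if unknown:
        raise ValueError(f"unknown match level(s): {sorted(unknown)}")

    present = set(levels)
    if len(present) == 1 or present == {"ATOM", "ANY"}:
        return levels  # single level or the allowed ATOM + ANY mix

    counts = Counter(levels)
    # Most common first; on equal counts the lower level wins.
    ranked = sorted(counts, key=lambda lvl: (-counts[lvl], ML_ORDER[lvl]))
    chosen = ranked[0]
    if chosen == "ANY":
        chosen = ranked[1]  # ANY is replaced by the second most common level
    return [chosen] * len(levels)


# The slide's example: 3 atoms ML ATOM and 3 atoms ML CLASS.
print(resolve_ring_match_level(["ATOM"] * 3 + ["CLASS"] * 3))
```

For the slide's six-membered example the function assigns ATOM to every ring atom, which matches the tie rule quoted above.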
Indexing inconsistencies: rare example for ML Atom-Class combination within same ring
MARPAT: ML Atom-Class within rings allowed
Carbons: ML Class unlimited
ML Atom
Mixed Match Level Rings (ATOM-ANY)
Ring contraction does not take place: ring size of query structure is preserved.
MARPAT: Ring contraction allowed
DWPIM: Ring contraction not allowed
(Pyrrolidine ML ANY)
Search in DWPIM
DWPIM hit structure
MARPAT hit structure
(Pyrrolidine ML CLASS)
Search in MARPAT
Mixed Match Level Rings
ML ANY atom#        completes
None (= all ATOM)   522
4                   529
3,4                 532
3,4,5               532
3,4,5,6             536
2,3,4,5,6           536
All CLASS           4858
All ANY             7284
Mixed Match Level Rings
Query 1 Hit record Query 2
ML ANY ML ANY
Mixed Match Level Rings
• ML ATOM: Pyridines
• ML CLASS: Pyridines, HEA, HEF
• ML ANY: HEA, HEF, Pyridines, XX, ring containing XX
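The three bullets describe nested hit scopes: each higher match level retrieves everything the lower one does, plus more generic nodes. A minimal sketch of that relationship follows, with the pyridine example from the slide encoded as plain sets. The dictionary and the helper name are illustrative assumptions, not part of STN.

```python
# Target node types a pyridine query ring can match at each match level,
# copied from the slide's three bullets.
ML_SCOPE = {
    "ATOM": {"pyridine"},
    "CLASS": {"pyridine", "HEA", "HEF"},
    "ANY": {"pyridine", "HEA", "HEF", "XX", "ring containing XX"},
}


def levels_matching(target: str) -> list[str]:
    """List the match levels at which the given target node type is retrieved."""
    return [level for level in ("ATOM", "CLASS", "ANY") if target in ML_SCOPE[level]]


print(levels_matching("HEA"))  # generic heteroaryl node
print(levels_matching("XX"))   # fully generic node
```

Under this encoding a record indexed only with the XX node appears at ML ANY alone, consistent with the earlier remark that hybrid ring systems can only be found with ML ANY.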
Tautomers – Carboxylic acids and amides
Case 1: X and Y are different (chains)
Example:
• MARPAT:
− Bonds normalized
− Indexing: double bond at O, S
• DWPIM:
− Bonds as single and double bonds
− Indexing: double bond at O, S
Figure: DWPIM and MARPAT hit counts for STR 13 and STR 14. Values shown: 3598, 3970, 229 (indexing inconsistencies), 3970
Tautomers – Lactams
Case 2: X and Y are different (rings)
Example:
• MARPAT: bonds are normalized
• DWPIM: located double and single bonds
Figure: DWPIM and MARPAT hit counts for STR 25 and STR 26. Values shown: 278 (indexing inconsistencies), 5248, 5249, 5251
Tautomers – Pyridinone type
• MARPAT: N-C-O bonds normalized. Preferred indexing as 2-pyridinone tautomer (Oxo Rule)
• DWPIM: Aromatization takes priority over tautomerization, i.e. 2-pyridinol tautomer indexed
Tautomers – Pyridinone type
Bond normalization for pyridinone-type in MARPAT but not in DWPIM
DWPIM: 83 and 4 hits for the two tautomer queries, 87 combined
MARPAT: 52 and 52 hits for the two tautomer queries, 52 combined
Aromatization rule in DWPIM, but some hits are „keto“
Example: Valaciclovir
STR 1 STR 2
DWPIM, MARPAT: N-C-N bonds are normalized in both databases
Tautomers – Imidazoles and Guanidines
Case 3: X and Y are N
DWPIM: Marpat:
Tautomers – Keto-Enol
No bond normalization for keto-enol bonds in DWPIM or MARPAT
DWPIM: 4 and 196 hits for the two tautomer queries, 200 combined
MARPAT: 19 and 20 hits for the two tautomer queries, 39 combined
Keto rule in DWPIM, but some hits are „hydroxy“
Nov 2019 Release – Resolved Issues / Enhancements
• Repeating groups enhancements:
• In combination with ring lock
• Starting with 0 (e.g. [0-3])
• Without upper or lower limit (e.g. [2-] or [-2])
• Atom-Atom match with A- and Q-node issue resolved
• Ring-lock function issues resolved
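The repeating-group notations listed above ([0-3], [2-], [-2]) are closed or half-open count ranges. The following sketch shows how such a range could be parsed and checked. The parser is purely illustrative and only assumes the bracket notation exactly as it appears on the slide; the function names are hypothetical.

```python
import re
from typing import Optional, Tuple

# "[low-high]" where either bound may be omitted, as in the slide examples.
_RANGE = re.compile(r"^\[(\d*)-(\d*)\]$")

Bounds = Tuple[Optional[int], Optional[int]]


def parse_repeat_range(text: str) -> Bounds:
    """Parse a repeating-group range into (lower, upper); None means no limit."""
    match = _RANGE.match(text.strip())
    if match is None:
        raise ValueError(f"not a repeat range: {text!r}")
    low_text, high_text = match.groups()
    if not low_text and not high_text:
        raise ValueError("at least one bound is required")
    low = int(low_text) if low_text else None
    high = int(high_text) if high_text else None
    if low is not None and high is not None and low > high:
        raise ValueError("lower bound exceeds upper bound")
    return low, high


def allows(count: int, bounds: Bounds) -> bool:
    """Check whether a repeat count falls inside the parsed range."""
    low, high = bounds
    return (low is None or count >= low) and (high is None or count <= high)


for notation in ("[0-3]", "[2-]", "[-2]"):
    print(notation, parse_repeat_range(notation))
```

With these ranges, a count of 0 is accepted by [0-3] and [-2] but rejected by [2-].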
Ring and Chain Expansion via free sites
• Chain nodes have at least 1 free site and can therefore match to other chain nodes:
CHK → CHE, CHY
CHE → CHY
• Ring nodes have at least 2 free sites and can therefore match to other ring nodes:
Cb → HEF
CYC → ARY, HEF
HET → HEF
HEA → HEF
ARY → HEF
• Currently the ring expansion cannot be avoided (attribute „monocyclic“ not applicable).
Ring and Chain Expansion via free sites – Example 1
Query: Pyridine ring ML Class
Without ring expansion: 6113 hits
With ring expansion: 8639 hits
Result: AN: 2149-71702
Ring expansion: Pyridine, HEA, HEF
Current implementation! Will be reversed (Q1 2020)
Ring and Chain Expansion via free sites – Example 2
ML Atom
SSS Full
Current implementation ! Will be reversed (Q1 2020)
Ring and Chain Expansion via free sites – Planned Changes
• Chain nodes have at least 1 free site and can therefore match to other chain nodes:
CHK → CHE, CHY
CHE → CHY
• No ring expansion takes place for CYC, ARY, Cb, HEA, HET.
• Attribute „monocyclic/polycyclic“ for CYC and ARY.
Queries with Fragments do not work correctly
• Hit records may be missed due to a matching problem.
Example: STR2 does not include all hits retrieved with STR1
STR1 STR2
Adjust bond value for Carboxylic acid derivatives to e/n
Switch from n (default) to e/n
• On STN, carboxylic acid derivatives as well as the corresponding phosphoric, sulfonic and selenic acid derivatives are defined with normalized bonds
• Affected groups: -COOH, -CO2H, -COSH, -CSSH, -CS2H, -OPO3H2, -PO3H2, -PO2H, -OSO3H, -SO3H, -SO2H, -SeO3H, -SeO2H
• How to handle this issue for query structures:
e/n (default)
Avoid shortcuts since their bond values can't be changed!
DWPIM WPIX Cross-Over: Hit Structures
• Recommendation: hit structure information is preserved within DWPIM. Therefore the answer set should always be saved in DWPIM as well.
• Records from DWPIM and DWPI can be mutually allocated by Markush number (AN in DWPIM corresponds to MCN in DWPI).
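To allocate saved DWPIM answers to DWPI records outside the session, the MCN values can be reduced to their DWPIM accession numbers. The sketch below assumes, based only on the examples in this deck (AN 2121-56402 against MCN 2121-56402-N and 2121-56402-P), that an MCN is the AN plus a one-letter suffix. Both function names are hypothetical.

```python
from collections import defaultdict


def split_mcn(mcn: str) -> tuple[str, str]:
    """Split an MCN such as '2121-56402-N' into (AN, suffix letter)."""
    an, separator, suffix = mcn.strip().rpartition("-")
    if not separator or not an or len(suffix) != 1 or not suffix.isalpha():
        raise ValueError(f"unexpected MCN format: {mcn!r}")
    return an, suffix.upper()


def group_suffixes_by_an(mcns: list[str]) -> dict[str, list[str]]:
    """Collect the suffix letters found for each DWPIM accession number."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for mcn in mcns:
        an, suffix = split_mcn(mcn)
        grouped[an].append(suffix)
    return dict(grouped)


# MCN field of the DWPI record shown for US 5095152 A earlier in the deck.
mcn_field = "9213-F8101-N 9317-G7301-M 9317-G7301-N 9317-G7301-Q 9319-A0801-N"
print(group_suffixes_by_an(mcn_field.split()))
```

The keys of the resulting dictionary are the accession numbers to look up in the saved DWPIM answer set.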
Reporting Function (including hit structures)
• Reporting as „Substance Report“ does not work (fix planned in Q1/2020)
• Workaround: Use the patent template and drag the „substance descriptor“ field for DWPIM and the „manual code“ field for DWPI reporting.
XX as Query Node
Possible hit records
XX node for „linker“ searches
Complexity ↑, # incompletes ↑, search time ↑
Nested G-Group search – Example 1
R1
R2
Search time: „seconds“
Nested G-Group search – Example 2
Cy, Q
R4: X, Ak, Cy
R1
R2
R3
Search time: „minutes“
Nested G-Group search – Example 3
Example 3 Search: limit of 1024 possible variations reached. Error message: SYSTEM ERROR
R4: X, Ak, Cy
R3
R2
R1
Matching of carbonyls adjacent to carbon chains
General Spin-off Rules:
Spin-offs are always generated from real nodes but never from generic nodes.
• E.g. CO1-CHK group
• Starting from CO1:
• a spin-off for the CO1 group is generated (CHK*), adjacent CHK included
Chain contraction takes place
• Starting from CHK:
• No spin-off is generated for the generic node CHK
No chain contraction with adjacent CO1 group
Dependency on Search Direction
Example Query –CO-CH2-CH3
1) Search Direction from left to right
Query: -CO-CH2-CH3    Spin-off: -CO-CHK(C=2)
Target: -CO-CHK       -CO-CHK
Match!
2) Search Direction from right to left
Query: -CO-CH2-CH3    Spin-off: CHK(C=3) O
Target: -CO-CHK       CHK-CO-
No Match!
Example 1
Figure: Query and Index structures with CO1 and CHK* nodes marked „match“ and „no match“
• Search direction starting at imidazolinone N
• No hit because there is no match for the CO1 moiety
Example 2
Figure: Query and Index structures with CO1 and CHK* nodes marked „match“
• Search direction starting at NH2
• Hit (matching is complete)