Pistachio

“Fantastic reactions and how to use them”

John Mayfield, Ingvar Lagerstedt and Roger Sayle

NextMove Software

NIH Virtual Workshop on Reaction Informatics, May 2021



What is Pistachio?

A document centric database of 13.3 million reactions

Automatically extracted from U.S., European and WIPO patents

JSON and SMILES provided for bulk analysis/model building

Containerised WebApp for exploring and querying the data

The aim is to extract reactions as described in the original document, warts and all


History

Daniel’s PhD Thesis (2012): repository.cam.ac.uk/handle/1810/244727

Daniel Mark Lowe, “Extraction of chemical structures and reactions from the literature”, Department of Chemistry, Pembroke College. Dissertation submitted for the degree of Doctor of Philosophy, June 2012.

Original Open-Source Project: dan2097/patent-reaction-extraction

Pistachio (13.3 million): nextmovesoftware.com/pistachio

We use an internal fork built using LeadMine instead of OSCAR4. This primarily improves chemical entity and physical quantity recognition, spelling correction, etc.

USPTO CC-Zero Subset (3.7 million): Chemical_reactions_from_US_patents_1976-Sep2016_/5104873


Data Impact

Christos Nicolaou et al. The Proximal Lilly Collection: Mapping, Exploring and Exploiting Feasible Chemical Space. J. Chem. Inf. Model., 2016, 56 (7), pp 1253–1266

Nadine Schneider et al. Big Data from Pharmaceutical Patents: A Computational Analysis of Medicinal Chemists’ Bread and Butter. J. Med. Chem., 2016, 59 (9), pp 4385–4402

Bowen Liu et al. Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS Cent. Sci., 2017, 3 (10), pp 1103–1113

Philippe Schwaller et al. Molecular transformer: A model for uncertainty-calibrated chemical reaction prediction ACS Cent. Sci., 2019, 5 (9), pp 1572–1583

Connor Coley et al. Prediction of Organic Reaction Outcomes Using Machine Learning. ACS Cent. Sci., 2017, 3 (5), pp 434–443

Philippe Schwaller et al. Extraction of organic chemistry grammar from unsupervised learning of chemical reactions. Sci. Adv., 2021, 7 (15)

Alessandra Toniato et al. Unassisted noise reduction of chemical reaction datasets. Nat. Mach. Intell. 2021

Amol Thakkar et al. Artificial intelligence and automation in computer aided synthesis planning. React. Chem. Eng., 2021, 6


Important: The same reaction will occur in application/grant, related patents, sketches/text and different authorities (WIPO/EPO/USPTO). Counting unique reactions using RInChI without any role normalisation gives ~4.2 million.

The repeated occurrences are often identical but not always: the description, yield or actions can differ.
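As a rough illustration of grouping repeated occurrences of one reaction, the sketch below keys each record on an order-insensitive form of its reaction SMILES. This is a stand-in for RInChI, not the method used for Pistachio: it assumes the component SMILES are already canonical, and the document identifiers and reactions are made up for the example.

```python
from collections import defaultdict

def reaction_key(rxn_smiles: str) -> str:
    """Order-insensitive key: sort the dot-separated components of each role.

    Assumes 'reactants>agents>products' with canonical component SMILES.
    """
    reactants, agents, products = rxn_smiles.split(">")

    def norm(part: str) -> str:
        return ".".join(sorted(c for c in part.split(".") if c))

    return ">".join(norm(p) for p in (reactants, agents, products))

def group_duplicates(records):
    """Map each reaction key to the list of documents it was extracted from."""
    groups = defaultdict(list)
    for doc_id, rxn in records:
        groups[reaction_key(rxn)].append(doc_id)
    return dict(groups)

# Hypothetical records: the same esterification written in two component
# orders, plus a different route to the same product.
records = [
    ("US-GRANT-A", "CCO.CC(=O)O>>CC(=O)OCC"),
    ("EP-GRANT-B", "CC(=O)O.CCO>>CC(=O)OCC"),
    ("WO-APPL-C", "CC(=O)Cl.CCO>>CC(=O)OCC"),
]
groups = group_duplicates(records)
```

Keeping the list of source documents per key, rather than discarding repeats, preserves the differing descriptions and yields for later inspection.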


A solution of 2-(2-hydroxyethyl)-5-methoxy-1-indanone (105 mg, 0.51 mmol) in methanol (2.0 mL) at room temperature was treated with ethyl vinyl ketone (EVK, 0.102 mL) and 0.5M sodium methoxide in methanol (0.204 mL, 0.1 mmol). The mixture was stirred in a capped flask and heated in an oil bath at 60° C. for 8 hours. After cooling, the reaction mixture was diluted with EtOAc (25 mL), washed with 0.2N HCl (15 mL), water (15 mL), and brine (15 mL), dried over MgSO4, filtered, and evaporated under vacuum to afford 2-(2-hydroxyethyl)-5-methoxy-2-(3-oxopentyl)-1-indanone as an oil.

Amy Fried and Robert Wilkening, Merck Sharp & Dohme. Estrogen receptor modulators. US 7151196 B2 [0236] (19-Dec-2006), Example 2, Step 2

A solution of 2-(2-hydroxyethyl)-5-methoxy-1-indanone (105 mg, 0.51 mmol) in methanol (2.0 mL) at room temperature was treated with ethyl vinyl ketone (EVK, 0.102 mL) and 0.5M sodium methoxide in methanol (0.204 mL, 0.1 mmol). The mixture was stirred in a capped flask and heated in an oil bath at 60°C for 8 hours. After cooling, the reaction mixture was diluted with EtOAc (25 mL), washed with 0.2N HCl (15 mL), water (15 mL), and brine (15 mL), dried over MgSO4, filtered, and evaporated under vacuum to afford 2-(2-hydroxyethyl)-5-methoxy-2-(3-oxopentyl)-1-indanone (138 mg, 93% yield) as an oil.

Dann Parker, Ronald Ratcliffe, Kenneth Wildonger and Robert Wilkening, Merck Sharp & Dohme. Estrogen Receptor Modulators. EP 1257264 B1 [0261] (14-Sep-2011), EXAMPLE 34, Step 2


USPTO vs Pistachio Data

• Updated quarterly
• EPO, WIPO patents
• USPTO sketches
• NameRxn Classification/AAM
  – Improved role assignment
  – NameRxn 71.5% coverage
• Example/Step Labels
• Solvent Mixtures
• Solvent associations
• Document Assignees, Targets and Diseases
• Continual tweaks based on feedback

Source               Reactions   Date
U.S. Grant Text      3,366,399   2021-05-18
U.S. Appl. Text      3,629,411   2021-05-13
WIPO PCT Text        1,520,596   2021-05-06
Euro. Grant Text     1,074,590   2021-05-12
Euro. Appl. Text       702,035   2021-05-12
U.S. Grant Sketch    1,211,521   2021-05-18
U.S. Appl. Sketch    1,834,132   2021-05-13


USPTO vs Pistachio Data

Pistachio is a “super-set” of USPTO but not strictly so…

• NameRxn filtering/mapping
• Improved/changed name-to-structure, roles, sectioning
• Structure normalisation differences
• Whack-a-mole/pachinko machine
  – Obvious sensible change can have unforeseen consequences

Figure: NextMove’s pachinko machine


Regression Testing

USPTO vs Pistachio Data

• Reactions from USPTO Application Text 2001-22nd Sep 2016
  – 1,939,253 CC-Zero Subset
  – 2,568,513 Pistachio

Compared by      Common      CC-Zero only (-)   Pistachio only (+)
SMILES             458,995   1,480,258          2,109,518
~RInChI          1,386,306     552,947          1,182,207
norm SMILES      1,465,946     473,307          1,102,567
paragraph Id     1,866,314      72,939            702,199
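The comparison figures above are plain set arithmetic: the minus value is the number of keys found only in the CC-Zero subset and the plus value the number found only in Pistachio. A minimal sketch, assuming each dataset has already been reduced to a set of comparison keys (SMILES, ~RInChI, normalised SMILES or paragraph Id); the function names are illustrative.

```python
def compare(cc_zero: set, pistachio: set):
    """Return (common, only_in_cc_zero, only_in_pistachio) counts."""
    common = cc_zero & pistachio
    return len(common), len(cc_zero - pistachio), len(pistachio - cc_zero)

def differences(total_a: int, total_b: int, common: int):
    """Derive the (-, +) figures from the two totals and the overlap."""
    return total_a - common, total_b - common

# Toy key sets standing in for the real data.
toy = compare({"r1", "r2", "r3"}, {"r2", "r3", "r4", "r5"})

# Reproduce the reported rows from the totals and overlaps.
CC_ZERO, PISTACHIO = 1_939_253, 2_568_513
by_smiles = differences(CC_ZERO, PISTACHIO, 458_995)
by_paragraph = differences(CC_ZERO, PISTACHIO, 1_866_314)
```

The choice of key dominates the result: the same two datasets overlap by under half a million reactions on raw SMILES and by nearly 1.9 million on paragraph Id.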


Overview of extraction


Text Extraction

Example from US20020133011A1 [0070]

Sectioning

Tagging/Tokenization

Parsing

Action Phrases

Reaction Assembly

Text Extraction - Sectioning

Examples from WO 2020/239862 A1 (PatentScope OCR) showing a missed break and an extra break

Text Extraction - Tagging/Tokenization

Tags shown in the example: UnitType.Mass; UnitType.Percent with QuantityType.Yield; UnitType.Percent with QuantityType.Purity


Text Extraction - Action Phrases

Type: Add
Compounds
• ethyl cyanoacetate (mass=13.56 g)
• ethyl 4-fluorocinnamate (mass=19.4 g)
• sodium ethoxide (mass=2.3 g, vol=50 ml)
Conditions
• 2-3 minutes
• 60° C.

Type: Heat
Conditions
• 1 hour

Type: Cool
Conditions
• 5° C.

Type: Yield
Compounds
• 2-cyano-3-(flurophenyl)-glutarate (mass=23 g, yield=74%, purity=98%)
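One way to hold action phrases like those above in memory is a small pair of record types. This is only a sketch: the class and field names are hypothetical and do not reflect the Pistachio JSON schema, and the ordering Add, Heat, Cool, Yield is assumed from the chemistry rather than stated on the slide.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Compound:
    name: str
    mass: Optional[str] = None            # quantity kept as written in the text
    volume: Optional[str] = None
    yield_percent: Optional[float] = None
    purity_percent: Optional[float] = None

@dataclass
class Action:
    kind: str                             # e.g. "Add", "Heat", "Cool", "Yield"
    compounds: List[Compound] = field(default_factory=list)
    conditions: List[str] = field(default_factory=list)

# Values copied from the slide; compound names are kept exactly as extracted.
actions = [
    Action("Add",
           [Compound("ethyl cyanoacetate", mass="13.56 g"),
            Compound("ethyl 4-fluorocinnamate", mass="19.4 g"),
            Compound("sodium ethoxide", mass="2.3 g", volume="50 ml")],
           ["2-3 minutes", "60° C."]),
    Action("Heat", conditions=["1 hour"]),
    Action("Cool", conditions=["5° C."]),
    Action("Yield",
           [Compound("2-cyano-3-(flurophenyl)-glutarate", mass="23 g",
                     yield_percent=74.0, purity_percent=98.0)]),
]

def products(action_list):
    """Compounds attached to Yield actions are the candidate products."""
    return [c for a in action_list if a.kind == "Yield" for c in a.compounds]
```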

Text Extraction - Reaction Assembly

Preliminary role assignment based on action, surrounding context and dictionaries (common solvents/catalysts)


ChEMU 2020 Evaluation Lab

                             Exact matching                 Relaxed matching
Run                          F1-score  Precision  Recall    F1-score  Precision  Recall
Task 1                       0.8983    0.9042     0.8924    0.9240    0.9301     0.9181
Task 2                       0.8977    0.9441     0.8556    n/a       n/a        n/a
end-2-end                    0.8026    0.8492     0.7609    0.8196    0.8663     0.7777
end-2-end (after deadline)   0.8255    0.8746     0.7816    0.8420    0.8909     0.7983

Nguyen D.Q. et al. (2020) ChEMU: Named Entity Recognition and Event Extraction of Chemical Reactions from Patents. In: Jose J. et al. (eds) Advances in Information Retrieval. ECIR 2020. Lecture Notes in Computer Science, vol 12036. Springer, Cham. https://doi.org/10.1007/978-3-030-45442-5_74

Daniel Lowe and John Mayfield. Extraction of reactions from patents using grammars. 2020. http://ceur-ws.org/Vol-2696/paper_221.pdf
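For readers checking the ChEMU table, the F1-score is the harmonic mean of precision and recall. The short sketch below recomputes it from the reported precision and recall; because those inputs are themselves rounded to four places, the result can differ from the reported F1 in the last digit.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# (precision, recall, reported F1) for the exact-matching columns.
exact_rows = {
    "Task 1": (0.9042, 0.8924, 0.8983),
    "Task 2": (0.9441, 0.8556, 0.8977),
    "end-2-end": (0.8492, 0.7609, 0.8026),
    "end-2-end (after deadline)": (0.8746, 0.7816, 0.8255),
}

recomputed = {run: f1_score(p, r) for run, (p, r, _) in exact_rows.items()}
```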


Sketch Extraction

Figure: reaction sketch from Example 26, US 9718816 B2, with labelled steps (Step 1, Step 2, Step 3, Step 4, etc.)

Figure: NextMove’s Praline

John May, et al. Sketchy Sketches: Hiding Chemistry in Plain Sight. Seventh Joint Sheffield Conference on Cheminformatics. 2016


Overview of Filtering/Mapping


Reaction Filtering - Text

Flowchart; stages as labelled on the slide:

• Initial checks: 1 <= Precursors <= 15; 1 <= Products <= 4; Product not on left; Min Product Size = 9; Definite Reference
• NameRxn AAM
• Indigo AAM
• 2 <= Num Precursors <= 15
• Rebond
• Fix Roles
• Calculate Yield
• Outcomes: Reject (three exits), Mapped, Sane
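The initial checks in the text filter can be sketched as a single function returning a rejection reason. This is an illustration under stated assumptions, not the Pistachio implementation: the record type is hypothetical, heavy-atom counts are taken as precomputed inputs, "Min Product Size" is read as a heavy-atom count, and the "Definite Reference" check is left out because the slide does not define it.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Candidate:
    precursors: List[str]            # component SMILES on the left-hand side
    products: List[str]              # component SMILES on the right-hand side
    product_heavy_atoms: List[int]   # precomputed, one count per product

def initial_reject_reason(c: Candidate, min_product_size: int = 9) -> Optional[str]:
    """Return why a candidate fails the initial checks, or None if it passes."""
    if not 1 <= len(c.precursors) <= 15:
        return "precursor count"
    if not 1 <= len(c.products) <= 4:
        return "product count"
    if any(p in c.precursors for p in c.products):
        return "product on left"
    if any(n < min_product_size for n in c.product_heavy_atoms):
        return "small product"
    return None

# Benzyl benzoate (16 heavy atoms) passes; ethyl acetate (6) is too small.
ok = Candidate(["OCc1ccccc1", "O=C(Cl)c1ccccc1"],
               ["O=C(OCc1ccccc1)c1ccccc1"], [16])
small = Candidate(["CCO", "CC(=O)O"], ["CC(=O)OCC"], [6])
```

Returning a reason string rather than a bare boolean makes it easier to report which filter rejected a reaction.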


Reaction Filtering - Sketch

Flowchart; stages as labelled on the slide:

• Initial checks: Specific Reaction; Precursors >= 1; Products >= 1
• NameRxn AAM
• Fix Roles
• Outcomes: Reject, Mapped, Sane


ROLE FIX

1) Move all agents to reactants

2) Atom-Atom Mapping - Michael addition (3.11.92)

3) Move unmapped reactants back to agents
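The three role-fix steps above can be sketched on reaction SMILES strings as follows. The atom-atom mapper is passed in as a callable because no particular mapper API is assumed here; the `fake_mapper` and its hand-written output exist only to make the example runnable, and the esterification is a made-up input rather than the Michael addition from the slide.

```python
import re
from typing import Callable, List

_ATOM_MAP = re.compile(r":\d+\]")   # matches the ':n]' of a mapped atom

def _components(part: str) -> List[str]:
    return [c for c in part.split(".") if c]

def fix_roles(rxn_smiles: str, mapper: Callable[[str], str]) -> str:
    """Reassign roles from the atom-atom mapping of a reaction SMILES."""
    reactants, agents, products = rxn_smiles.split(">")
    # 1) Move all agents to reactants.
    pooled = ".".join(_components(reactants) + _components(agents))
    # 2) Atom-atom mapping (delegated to the supplied mapper).
    mapped = mapper(f"{pooled}>>{products}")
    m_reactants, _, m_products = mapped.split(">")
    # 3) Move unmapped reactants back to agents.
    keep = [c for c in _components(m_reactants) if _ATOM_MAP.search(c)]
    back = [c for c in _components(m_reactants) if not _ATOM_MAP.search(c)]
    return f"{'.'.join(keep)}>{'.'.join(back)}>{m_products}"

def fake_mapper(rxn: str) -> str:
    # Hand-written mapping for the example input only (hypothetical).
    return ("[CH3:1][CH2:2][OH:3].[CH3:4][C:5](=[O:6])O.OS(=O)(=O)O"
            ">>[CH3:4][C:5](=[O:6])[O:3][CH2:2][CH3:1]")

# Acetic acid starts out mislabelled as an agent alongside sulfuric acid.
out = fix_roles("CCO>CC(=O)O.OS(=O)(=O)O>CC(=O)OCC", fake_mapper)
```

After the fix, acetic acid contributes mapped atoms and is kept as a reactant, while the unmapped sulfuric acid returns to the agents.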


Why NameRxn?

• 1,543 rule based classes - easy to update a mapping disagreement
• Higher precision/lower recall
• Originally for pharmaceutical ELNs ~80%
• Pistachio coverage is ~71.5%
  – >77% USPTO appl. text.
• Fast ~380 reactions per second per core
  – A few hours to remap entire database
  – Speed depends on backend

Figure: 4.1.6 Cyclic Beckmann rearrangement
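The "few hours" remapping estimate follows from the quoted throughput. A small worked calculation, assuming the 13.3 million reaction total and perfect scaling across cores; the core counts are illustrative, since the slide does not say how many are used.

```python
def remap_hours(n_reactions: int, per_second_per_core: float, cores: int = 1) -> float:
    """Wall-clock hours to remap n_reactions, assuming linear scaling."""
    seconds = n_reactions / (per_second_per_core * cores)
    return seconds / 3600.0

single_core = remap_hours(13_300_000, 380)         # about 9.7 hours
four_cores = remap_hours(13_300_000, 380, cores=4) # about 2.4 hours
```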


NameRxn - Magic functional groups

NameRxn was originally written as a classification tool; AAM is a by-product

• For us no answer is better than a wrong answer
• Lowest number of wrong answers (disagreement with gold-standard)
• Yellow bar is so-called “magic group additions” where a product atom is unmapped:
  – We didn’t know where a group came from
  – Where the group came from was missing
  – Stoichiometry (multiple groups from one reactant)
• Aim to indicate this better in bulk data
• AMAP bench

Arkadii Lin et al. Atom-to-atom mapping: a benchmarking study of popular mapping algorithms and consensus strategies https://chemrxiv.org/articles/preprint/Atom-to-Atom_Mapping_A_Benchmarking_Study_of_Popular_Mapping_Algorithms_and_Consensus_Strategies/13012679/1


It’s a kind of magic…

Bromo Grignard + nitrile ketone synthesis (3.7.10)

EP0200736B1 [0072] Example 1, Step 1

Mappings shown: RxnMapper/Indigo; RxnMapper

Water comes from the quenching:

“The reaction mixture is slowly poured into ice cold 10% hydrochloric acid”

“quenched slowly with 2N aq. HCl” (different paragraph)


Symmetry/Stoichiometry

NameRxn - 8.2.2 Sulfanyl to sulfonyl

RxnMapper

US20010000511A1 [0357]


Symmetry/Stoichiometry

Handle by reusing atom-maps in the reactant

US20010000511A1 [0357]


Symmetry/Stoichiometry

RxnMapper

Indigo

US 03674855 A


AMAP BENCH

Indigo

RxnMapper

US20200071310A1 [0487] Example 34

AMAP bench: Changed: 23, Broken: 13, C-C Broken: 7

AMAP bench: Changed: 5, Broken: 3, C-C Broken: 0

Daniel Lowe, Roger Sayle. Evaluating the Quality and Performance of Automatic Atom Mapping Algorithms. 244th ACS National Meeting & Exposition. Aug 2012


Indigo/RxnMapper

8-(3,5-Bis-trifluoromethyl-benzoyl)-3-furan-2-yl-methyl-1-o-tolyl-1,3,8-triaza-spiro[4.5]decane-2,4-dione

AMAP bench: Changed: 4, Broken: 2, C-C Broken: 1

Ambiguous Names


Ambiguous Names

NameRxn/Indigo/RxnMapper

8-(3,5-Bis-trifluoromethyl-benzoyl)-3-furan-2-ylmethyl-1-o-tolyl-1,3,8-triaza-spiro[4.5]decane-2,4-dione

1.2.9 Alcohol + amine condensation

AMAP bench: Changed: 2, Broken: 1, C-C Broken: 0

Case Study

US 2020/0087299 A1, Examples 1 to 33

Mapping outcomes shown on successive slides:

NameRxn 127/154 82.4%, Indigo 15/154 9.7%, Reject 12/154 7.7%

NameRxn 132/154 85.7%, Indigo 10/154 6.4%, Reject 12/154 7.7%


Example 27

Typo: “tert-butyl”

Typo: “tert-butyl”


Example 12

Step 1 Small Product (8 heavy atoms)

Typo: “methyl”


Data Storage Tips


Hierarchical Data

Hierarchical data is indexed in the WebApp: NameRxn Tags, Assignees, Diseases (MeSH), Targets (ChEMBL), IPC Codes

A simple way of storing and searching the data is to use a nested identifier string, e.g. LIKE '11.%' pulls back all AstraZeneca and related companies (identifiers listed below)

NameRxn is handled slightly differently: we pack the three-level number into an integer

Parent queries, e.g. 3.1 (Suzuki coupling), can be handled as a range

See also: https://www.postgresql.org/docs/9.1/ltree.html

11 AstraZeneca

11.5 Imperial Chemical Industries

11.7 MedImmune

...
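The nested-identifier lookup can be tried with Python's built-in `sqlite3` module. This is a minimal sketch with a hypothetical table layout; the `110` row is invented to show that the dot in the pattern stops unrelated prefixes from matching. Note that `LIKE '11.%'` on its own matches only descendants, so the sketch adds an equality test to include the parent row as well (unnecessary if identifiers are stored with a trailing separator).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE assignee (path TEXT PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO assignee VALUES (?, ?)",
    [
        ("11", "AstraZeneca"),
        ("11.5", "Imperial Chemical Industries"),
        ("11.7", "MedImmune"),
        ("110", "Hypothetical unrelated company"),  # must not match '11.%'
    ],
)

def subtree(connection, root: str):
    """Names of the root identifier and everything nested beneath it."""
    rows = connection.execute(
        "SELECT name FROM assignee WHERE path = ? OR path LIKE ? ORDER BY path",
        (root, root + ".%"),
    )
    return [name for (name,) in rows]

names = subtree(conn, "11")
```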

Packing: (lvl1<<24)|(lvl2<<16)|(lvl3&0xffff)

3.1.1    50397185
4.1.42   67174442
3.1      >= 50397184 and <= 50462720
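The packing expression and the range query for a parent class can be checked with a few lines of Python. The function names are illustrative. The sketch assumes the second level fits in 8 bits, and it returns a half-open range so that 3.2.0 is excluded, a slight tightening of the inclusive upper bound quoted above.

```python
def pack(lvl1: int, lvl2: int = 0, lvl3: int = 0) -> int:
    """Pack a three-level NameRxn number into one integer."""
    return (lvl1 << 24) | (lvl2 << 16) | (lvl3 & 0xFFFF)

def unpack(code: int):
    """Inverse of pack, assuming lvl2 < 256 and lvl3 < 65536."""
    return code >> 24, (code >> 16) & 0xFF, code & 0xFFFF

def parent_range(lvl1: int, lvl2: int):
    """Half-open [low, high) range covering every lvl1.lvl2.* class.

    Assumes lvl2 < 255 so that lvl2 + 1 does not spill into lvl1.
    """
    return pack(lvl1, lvl2), pack(lvl1, lvl2 + 1)

low, high = parent_range(3, 1)
in_suzuki = low <= pack(3, 1, 1) < high   # 3.1.1 falls under 3.1
outside = low <= pack(4, 1, 42) < high    # 4.1.42 does not
```

Storing the packed value in an indexed integer column lets a parent query become a simple range scan.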

NameRxn concepts and rxno

CINF 13, ACS Fall 2017, Washington, D.C.

1 Heteroatom alkylation and arylation
  .7 O-substitution
    .1 Chan-Lam ether coupling
    .2 Diazomethane esterification
    .3 Ethyl esterification
    .4 Hydroxy to methoxy
    .5 Hydroxy to triflyloxy
    .6 Methyl esterification
    .n
2 Acylation and related processes
  .6 O-acylation to ester
    .1 Ester Schotten-Baumann
    .2 Esterification (generic)
    .3 Fischer-Speier esterification
    .4 Baeyer-Villiger oxidation
    .5 Yamaguchi esterification
    .6 Hydroxy to imidazolecarbonyloxy
    .7 Imidazolecarbonyl to ester
    .8 Hydroxy to acetoxy
    .9 Steglich esterification
    .n

RXNO concepts (counts as shown on the slide): Esterification (7), Chan-Lam coupling (3), Schotten-Baumann Reaction (9)

RXNO: http://github.com/rsc-ontologies/rxno


Summary

We always welcome feedback if you spot a mistake!

• It’s a long tail but many things are simple changes that are fixed when rerun
• Lots of people are “cleaning” the data; we’d rather know what was wrong and whether we can fix it

Plans

• Reaction sketch compound numbers
• Better quality indication
  – Integrate RxnMapper, AMAP bench indicators, Boot-strapping sequences
• Handle reactions from non-English patents
• General procedures/example references, currently only resolve compounds

Acknowledgements

Daniel Lowe (MineSoft)

Richard Gowers (NextMove Software)