Underpinning the computer-aided synthesis design system ARChem are algorithms that extract synthetic knowledge from large reaction databases. These slides discuss the generation of reaction rules that facilitate retrosynthetic analysis, as well as the extraction of information about expected yields, regioselectivity, functional group compatibility, and stereochemistry.
Extracting Synthetic Knowledge from Reaction Databases
Orr Ravitz
SimBioSys Inc.
246th ACS National Meeting
ARChem – main concepts
A computer-aided synthesis design system.
The Approach:
Comprehensive rule- and precedent-based retrosynthetic analysis back to available starting materials.
Automated rule generation with manual rule curation.
Generate many alternatives.
Provide supporting literature examples.
Allow user guidance and control.
Solution Display
Exploring Alternative Paths
Supporting Examples
Chemical Interference
Functional groups that may interfere with transformations are highlighted.
Functional Group Tolerance
A breakdown of the example set by the functional groups present beyond the reaction center provides evidence for compatibility.
Examples can be exported to the database's web interface for further analysis.
Stereochemistry
Currently: exact matches, starting materials
Coming soon: rule-based
Essential Information
Automated extraction of knowledge
Reaction rules
Yield values
Chemical interference - functional group tolerance
Regioselectivity
Stereochemistry
Data → (perceive) → Information → (generalize) → Knowledge
System Design
Reactions
Reaction Rules
Starting Materials
Expert Knowledge-bases
Target
Rule Extraction
Reactions → Reaction Rules
Source reactions: esterification examples, other examples
Reaction rules: esterification rule, other rule
Reaction Perception
Source reaction:
Extracted core
Extended core
Reaction file with atom mapping
Atoms attached to bonds changed, made or broken in the reaction
Include all structural motifs that are essential for the reaction to occur
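The extracted core described above can be sketched in a few lines. This is only an illustration, assuming the atom-mapped reaction has already been parsed into two bond tables keyed by atom-map numbers; the data layout and the name `extract_core` are hypothetical and are not taken from ARChem.

```python
def extract_core(reactant_bonds, product_bonds):
    """Return the map numbers of atoms attached to bonds changed, made or broken.

    Each argument maps frozenset({a, b}) of atom-map numbers to a bond order.
    """
    core = set()
    for bond in set(reactant_bonds) | set(product_bonds):
        # A bond absent on one side has order None, so made and broken
        # bonds are caught by the same comparison as changed bond orders.
        if reactant_bonds.get(bond) != product_bonds.get(bond):
            core |= bond
    return core


# Hypothetical esterification: acid C(1)(=O(2))-O(3) plus alcohol O(4)-C(5)
# gives ester C(1)(=O(2))-O(4)-C(5); the C(1)-O(3) bond is broken.
reactants = {
    frozenset({1, 2}): 2,
    frozenset({1, 3}): 1,
    frozenset({4, 5}): 1,
}
products = {
    frozenset({1, 2}): 2,
    frozenset({1, 4}): 1,
    frozenset({4, 5}): 1,
}
core = extract_core(reactants, products)
```

Here the core contains atoms 1, 3 and 4; the carbonyl oxygen (2) and the alkyl carbon (5) sit on unchanged bonds and would only enter through a later core extension.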
Extending the Core: Passengers vs Drivers
The goal of chemical perception is to discriminate between structural features that are essential for the reaction, and those that are passengers.
Shell-based approach: 1st shell
2nd shell
Graph-based methods are inappropriate.
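A shell-based extension of the kind named above can be sketched as a breadth-first expansion over the molecular graph. The adjacency-list input and the name `extend_core` are assumptions made for illustration only.

```python
def extend_core(core, adjacency, shells=1):
    """Grow the core by whole shells of neighbouring atoms."""
    extended = set(core)
    frontier = set(core)
    for _ in range(shells):
        # Next shell: neighbours of the current frontier not yet included.
        frontier = {
            neighbour
            for atom in frontier
            for neighbour in adjacency.get(atom, ())
        } - extended
        extended |= frontier
    return extended


# Hypothetical chain 1-2-3-4-5 with the extracted core at atom 1.
adjacency = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
first_shell = extend_core({1}, adjacency, shells=1)
second_shell = extend_core({1}, adjacency, shells=2)
```

A purely distance-based expansion like this treats every neighbour alike, whether it drives the reaction or is merely a passenger.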
Mechanism-Dependent Core Extension
Nucleophilic aromatic substitution:
Addition /elimination mechanism Requires a π acceptor group in ortho or para position
Via organometallic intermediate
Rule Extraction
Similar extended cores
Completed reaction rule
Common extracted core
Nucleofuge (NF) - a leaving group which carries away the bonding electron pair.
Generalized rule
Generalized group (NF) is replaced by the most common group.
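Choosing the most common group as the representative of a generalized position can be illustrated with a simple frequency count. The group labels and the function name below are hypothetical; how ARChem represents generalized groups internally is not stated in the slides.

```python
from collections import Counter


def generalize_group(observed_groups):
    """Return the most frequent group and the full frequency table."""
    counts = Counter(observed_groups)
    representative, _ = counts.most_common(1)[0]
    return representative, dict(counts)


# Hypothetical nucleofuges seen at the same position across similar cores.
leaving_groups = ["Cl", "Br", "Cl", "OTs", "Cl", "Br"]
representative, counts = generalize_group(leaving_groups)
```

Keeping the frequency table alongside the representative preserves the evidence for which concrete groups the generalized position actually covers.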
Interfering Functionality
Following rule abstraction, compatible functionality is detected by examining the examples:
Compatible Interfering
Moieties outside the extended core are listed as compatible.
Other functional groups will be inferred as 'possibly interfering'.
Possibly interfering functionality will be penalized in scoring and highlighted to the user.
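The compatible versus possibly interfering split above amounts to a set comparison between the groups seen in the examples and the groups present in a target. The sketch below assumes functional groups have already been perceived and named; the names and the function are illustrative only.

```python
def classify_groups(examples, target_groups):
    """Split a target's functional groups by evidence in the example set.

    examples: iterable of sets, each holding the functional groups found
    outside the extended core of one supporting example.
    """
    compatible = set()
    for groups_outside_core in examples:
        compatible |= set(groups_outside_core)
    target = set(target_groups)
    # Groups never seen surviving the reaction are only *possibly*
    # interfering: absence of evidence, not proof of incompatibility.
    return target & compatible, target - compatible


examples = [{"ester", "ether"}, {"nitrile"}, {"ether", "aryl halide"}]
tolerated, possibly_interfering = classify_groups(
    examples, {"ether", "aldehyde", "nitrile"}
)
```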
Regioselectivity – Main Steps
Recognize rule’s reaction type – electrophilic substitution, nucleophilic addition etc.
Only reactions prone to regioselectivity are subject to regio calculations.
Identify competing sites
Identify substituents and other structural motifs that may influence the directionality
Collect statistics from example set regarding selectivity in the reaction core as well as elsewhere in the molecule (chemoselectivity)
Assign regioselectivity to rule if predefined statistical requirements are met.
Collecting Statistics
Electrophilic aromatic substitutions
For each example in DB:
Evaluate ring activation, including for heteroaromatic rings and fused rings
Evaluate location, type and neighborhood of ring substituents
Identify symmetry
Compute environment signatures that include all aromatic features plus relevant substituents
For each rule:
Cluster reacting vs. non-reacting signature-equivalent sites for reactions with yield > 20%
Define regioselectivity if examples ratio is 10:1
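The per-rule statistics above can be sketched as pairwise counting of reacting versus non-reacting site signatures. The 20% yield filter and the 10:1 ratio come from the slide; reading 10:1 as a minimum, the tuple layout and the signature strings are assumptions. A real implementation would presumably also require a minimum number of examples, which is omitted here.

```python
from collections import Counter

MIN_YIELD = 20.0  # percent; the slide counts reactions with yield > 20%
MIN_RATIO = 10    # the slide uses 10:1; treated here as "at least" (assumption)


def preferred_sites(examples):
    """Return (winner, loser) signature pairs that meet the ratio.

    examples: iterable of (reacting_signature, competing_signatures,
    yield_percent), one entry per database example.
    """
    wins = Counter()
    for reacting, competing, yield_percent in examples:
        if yield_percent <= MIN_YIELD:
            continue  # low-yield examples are not counted as evidence
        for other in competing:
            if other != reacting:
                wins[(reacting, other)] += 1
    return {
        (a, b)
        for (a, b), n in wins.items()
        if n >= MIN_RATIO * wins.get((b, a), 0)
    }


# Hypothetical evidence: 11 examples react at "para" over "ortho",
# one counter-example above the yield cut-off and one below it.
examples = [("para", ["ortho"], 75.0)] * 11 + [
    ("ortho", ["para"], 60.0),
    ("ortho", ["para"], 10.0),
]
preferred = preferred_sites(examples)
```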
Regio Example
X=Cl, 84% X=Cl, 5.5% Rejected
Misinterpreted yield value provided positive evidence
Stereochemistry – the challenge
Efficient machine perception and representation of a broad range of synthetically important stereogenic types, including tetrahedral C, S, N and P as well as alkenes, allenes and atropisomers
Representation of stereochemical reaction rules and stereochemical strategies
Develop a versatile stereochemical substructure algorithm to support retron matching
Efficient discovery of symmetry in stereochemically defined molecules and rules - avoid duplicate routes
Stereoselectivity is captured inaccurately and inconsistently across common databases.
The Data
Database content | Portion of data | Notes
Number of unmapped examples | 14% | Reaction type unknown
Number of examples belonging to reactions with 5000 or more examples | 4% | Ubiquitous protection / deprotection reactions
Number of examples belonging to reactions with 20 or less examples | 16% | Bad atom maps (database errors); multistep reaction sequences
General useable examples | 65% |
[Chart: Examples with quoted selectivity values. x-axis: selectivity metric (yield, cs, de, ee); y-axis: % of database, 0 to 100]
[Chart: Examples with selectivity above a threshold. x-axis: threshold selectivity values (> 0%, > 25%, > 50%, > 75%, > 90%, > 95%, > 98%); y-axis: % of available, 0 to 100; series: yield, cs, de, ee]
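The quantity plotted against the threshold axis can be computed as below. This is a sketch that assumes "% of available" means the share of examples with a quoted value for that metric; the threshold list mirrors the chart axis, while the sample values are invented for illustration.

```python
THRESHOLDS = [0, 25, 50, 75, 90, 95, 98]  # percent, as on the chart axis


def share_above(values, thresholds=THRESHOLDS):
    """Percentage of quoted values strictly above each threshold."""
    if not values:
        return {t: 0.0 for t in thresholds}
    return {
        t: 100.0 * sum(v > t for v in values) / len(values)
        for t in thresholds
    }


ee_values = [99.0, 96.0, 80.0, 40.0]  # hypothetical quoted ee values
shares = share_above(ee_values)
```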
Stereo-Rules Generation – A Different Approach
Manually code rules for a diverse set of useful enantioselective and generally selective reaction types.
Mine supporting examples from existing large reaction databases to discover reaction scope and limitations for each rule.
Find effective strategies to aid planning of a stereocontrolled synthesis
Reactions: Diels-Alder, Sharpless, reduction of C=C, reduction of C=O
70 reaction types with ee>95% and more than 50 examples
Designing a Rule-Set
Reaction type | Bond alterations | Examples with ee ≥95% | Notes
Addition of C nucleophiles to C=C | CH + C=C → CCCH | 1603 | Mostly conjugate additions
Reduction of C=O | C=O → HCOH | 1553 | Any type of carbonyl
Addition of C nucleophiles to C=O | CH + C=O → CCOH | 1265 | Includes mostly Aldols + alkynylations
Reduction of C=C | C=C → HCCH | 1120 | Wide variety of environments
Addition of C nucleophiles to C=N | CH + C=N → CCNH | 639 | Any type of C=N
Epoxidation of C=C | C=C → C1CO1 | 415 | Sharpless, Jacobsen, Shi etc.
Addition via R3B to C=C | C-B + C=C → CCCH | 329 | Mostly conjugate addition to enones
Addition via R2Zn to C=O | C-Zn + C=O → CCOH | 306 |
Dihydroxylation of C=C | C=C → HOCCOH | 266 |
Reduction of C=N | C=N → HCNH | 256 | Any type of C=N
Diels-Alder | C=C + C=CC=C → C1CCC=CC1 | 222 | Carbocyclic Diels-Alder
Cyclopropanation of C=C | C=N + C=C → C1CC1 | 222 | Via diazo precursor (carbene)
Mukaiyama Aldol | SiOC=C + C=O → O=CCCOH | 210 |
C substitution of Br | CH + CBr → CC | 199 |
[2+3] azomethine cycloaddition | C=NCH + C=C → N1CCCC1 | 198 |
Addition via R2Zn to C=C | CZn + C=C → CCCH | 162 | Mostly conjugate addition to enones
Addition via R3B to C=O | CB + C=O → CCOH | 141 |
Oxidation of sulphides | S → S=O | 137 | Chiral sulphoxides
Enabling Technology
Perception of stereochemistry in structural diagrams
Stereocenter manipulation and stereo descriptors
Rotations (E + 8C₃ + 3C₂):
Label | Op | 1 2 3 4
A | E | 1 2 3 4
B | C₃² | 1 3 4 2
C | C₃¹ | 1 4 2 3
D | C₂ | 2 1 4 3
E | C₃¹ | 2 3 1 4
F | C₃² | 2 4 3 1
G | C₃² | 3 1 2 4
H | C₃¹ | 3 2 4 1
J | C₂ | 3 4 1 2
K | C₃¹ | 4 1 3 2
L | C₃² | 4 2 1 3
M | C₂ | 4 3 2 1
Reflection:
Op | 1 2 3 4
σ | 2 1 3 4
Conceptual Model Stereo Descriptor
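The twelve rotations listed above are exactly the even permutations of the four ligand positions, while the reflection is an odd permutation. The sketch below checks this by counting inversions and uses the same parity test to compare two ligand orderings around one tetrahedral centre. It illustrates the group theory only and is not a description of ARChem's descriptor code; the function names and ligand labels are hypothetical.

```python
def parity(perm):
    """Return 0 for an even permutation of 1..n and 1 for an odd one."""
    inversions = sum(
        1
        for i in range(len(perm))
        for j in range(i + 1, len(perm))
        if perm[i] > perm[j]
    )
    return inversions % 2


# The twelve rotation rows (A to M) and the reflection row from the slide.
ROTATIONS = [
    (1, 2, 3, 4), (1, 3, 4, 2), (1, 4, 2, 3), (2, 1, 4, 3),
    (2, 3, 1, 4), (2, 4, 3, 1), (3, 1, 2, 4), (3, 2, 4, 1),
    (3, 4, 1, 2), (4, 1, 3, 2), (4, 2, 1, 3), (4, 3, 2, 1),
]
REFLECTION = (2, 1, 3, 4)


def same_configuration(order_a, order_b):
    """True if two ligand orderings of one centre differ only by a rotation."""
    position = {ligand: i + 1 for i, ligand in enumerate(order_a)}
    return parity(tuple(position[ligand] for ligand in order_b)) == 0


rotations_are_even = all(parity(p) == 0 for p in ROTATIONS)
reflection_is_odd = parity(REFLECTION) == 1
```

For example, `same_configuration(("F", "Cl", "Br", "H"), ("Cl", "Br", "F", "H"))` is true because the second ordering is a three-fold rotation of the first, whereas swapping only two ligands gives the mirror-image configuration.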
Enabling Technology
Chemical constraints layer of representation
CONNECTIONS=1,2,3 FUSION=BIARYL
RINGS=5+6,6+7 BRIDGEHEAD=YES
DIFFRING=1 EPS=0,1
SAMERING=1 HETS=0,1,2
DIFF=1 NONAROMHETS=0,1,2
SAME=1 HALOGENS=0,1,2
ARYL=YES FGS=ALCOHOL
SPCENTRE=1,2,3 FGNOT=CARBONYL
CHARGE=YES PROP=EWG
HS=0,1,2 PROPNOT=Lg
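Constraint strings in the KEY=VALUE form listed above are straightforward to read into a lookup table. The parser below is a minimal sketch that assumes whitespace-separated tokens with comma-separated alternatives; the meaning of each keyword is not specified in the slides, so nothing here interprets the values.

```python
def parse_constraints(text):
    """Parse 'KEY=V1,V2 KEY=V' tokens into {key: [values]}.

    Purely numeric values become ints; everything else stays a string.
    """
    constraints = {}
    for token in text.split():
        key, _, raw = token.partition("=")
        constraints[key] = [
            int(value) if value.isdigit() else value
            for value in raw.split(",")
        ]
    return constraints


parsed = parse_constraints(
    "CONNECTIONS=1,2,3 FUSION=BIARYL RINGS=5+6,6+7 HETS=0,1,2"
)
```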
Substructure search/match
Reduction of Ketones to Secondary Alcohols
Base ARChem rule
Level | Hits | ee | de | (screen) | Notes
Level 0: Bond change constraints only | 10,004 | - | - | (10,004) | Not unique to ketone → secondary alcohol conversion
Level 1: + Environment constraints | 8,442 | - | - | (10,004) | Unique to ketone → secondary alcohol conversion; 140 tolerated functional groups
Level 2: + Stereochemical constraints | 6,525 | 3,457 | 4,711 | (6,765) | Enantioselective and diastereoselective examples
Dihydroxylation of Alkenes
Level 1: Bond changes with environment constraints
2253 examples (2416 screened)
Level 2: + Stereochemical constraints
Hits | ee | de | (screen)
1,428 | 1,008 | 1,151 | (1,634)
428 | 117 | 352 | (444)
Level 3: + Substitution patterns
Hits | ee | de
681 | 578 | 552
526 | 289 | 418
206 | 131 | 168
12 | 10 | 11
236 | 89 | 191
123 | 51 | 103
51 | 27 | 41
8 | 4 | 7
Conclusions
Useful chemical knowledge can be extracted algorithmically from reaction databases.
Automation is crucial given the size and growth of databases.
Different layers of knowledge are tightly entangled: regioselectivity, chemoselectivity and stereoselectivity overlap considerably.
The extracted knowledge can be applied effectively in computer-aided synthesis design, and empower chemists by offering new ideas and a broader perspective on the literature.
But...
The quality of the extracted knowledge depends heavily on the accuracy and scope of the source data!
The Rule-Set
Cut-off threshold
Useful reactions
Noise
Distractions
Low utility reactions
Bad atom maps (avoid)
Rare multistep reaction sequences (low utility)
Multiple concurrent reactions on substrate (very low utility)
Exotic heterocycle formation (promote)
Ubiquitous protection / deprotection FGIs such as alcohol/ester, amine/amide etc (demote)
Conclusions
A significant portion of the data is being lost due to mapping errors and other problems.
Yield and selectivity information is captured inconsistently.
What can be done:
Metadata perception can be improved (in progress).
Mapping algorithms should reflect contemporary mechanistic understanding of reactions.
Systematic mapping errors can be manually fixed (planned).
Extracted rules can be manually curated (continuous).
Acknowledgements
SimBioSys
James Law - Regioselectivity
Victoria Lubitch
Yasamin Salmasi
Aniko Simon
Zsolt Zsoldos
Reaction Data
Elsevier – Reaxys
Wiley - CIRX
RSC - MOS
Accelrys - RefLib
University of Leeds
Tony Cook - Stereochemistry
Peter Johnson
Steve Marsden
Other Collaborators
ChemAxon
And…
ARChem users! THANK YOU!