Scientific & technical presentation JChem Cartridge for Oracle version 5.3, January 2010

Scientific & technical presentation

JChem Cartridge for Oracle

version 5.3, January 2010

Contents

• Purpose of JChem Cartridge

• Features of JChem Cartridge

• Constituents of the JChem Cartridge API

• Normal Tables vs. JChem Tables

• Architecture of JChem Cartridge

Purpose of JChem Cartridge

• Access JChem functionality using SQL:

SELECT count(*) FROM nci WHERE jc_contains(structure, 'Brc1cnc2ccccc12') = 1

Access JChem in any programming environment offering Oracle connectivity (.NET, Java, Perl, PHP, Python, Apache mod_plsql...)

• Execute SQL queries efficiently using extensible indexes

Precompute chemical information on structures by creating jc_idxtype indexes:

CREATE INDEX jcxnci ON nci(structure) INDEXTYPE IS jc_idxtype

The jc_idxtype implementation scans the indexed column for eligible structures in a single performance-optimized operation, the domain index scan.
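The SQL access described on this slide can be driven from any client with Oracle connectivity. Below is a minimal Python sketch, assuming a DB-API 2.0 style cursor that accepts positional bind values for `:1` and assuming jc_contains accepts the query structure as a bind value. The names `count_substructure_hits` and `StubCursor` are hypothetical; the stub only stands in for a real database cursor so the call pattern can be shown without an Oracle instance.

```python
import re

# Conservative pattern for unquoted Oracle identifiers (letters, digits, _ $ #).
_IDENTIFIER = re.compile(r"^[A-Za-z][A-Za-z0-9_$#]*$")


def count_substructure_hits(cursor, table, column, query_smiles):
    """Count rows whose structure column contains the query as a substructure.

    Hypothetical helper: works with any DB-API 2.0 style cursor.
    """
    # Table and column names cannot be bound, so validate them strictly.
    for name in (table, column):
        if not _IDENTIFIER.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    sql = f"SELECT count(*) FROM {table} WHERE jc_contains({column}, :1) = 1"
    # The query structure travels as a bind value, never as inlined text.
    cursor.execute(sql, [query_smiles])
    return cursor.fetchone()[0]


class StubCursor:
    """Stand-in for a real cursor (assumption: no database is available here)."""

    def __init__(self, canned_count):
        self.canned_count = canned_count
        self.calls = []

    def execute(self, sql, params):
        # Record the statement and binds instead of sending them to Oracle.
        self.calls.append((sql, list(params)))

    def fetchone(self):
        return (self.canned_count,)


stub = StubCursor(canned_count=42)
hits = count_substructure_hits(stub, "nci", "structure", "Brc1cnc2ccccc12")
```

With a real connection, the stub would be replaced by the cursor of whichever Oracle driver the client environment provides.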

Features of JChem Cartridge

• Adds chemistry knowledge into the SQL language of Oracle (SELECT, INSERT, UPDATE, ...)

• Substructure, superstructure, full structure, similarity searching

• Complex chemical expressions using the Chemical Terms language that includes logP, pKa, ...

• Automatic property calculation during registration

• Standardization (canonicalization) during registration

• Structure format conversions (MRV, Molfile, SDfile, RDfile, SMILES, CML, etc.); 2D, 3D image generation

• Structure enumeration using reaction rules

• User-defined fingerprint columns

• Custom similarity search through molecular descriptors

• Interaction with Oracle optimizer

Structure search features

• Wide range of query atoms

• Query properties

• R-group queries

• Full SMARTS support

• Coordination compounds

• Link nodes

• Pseudo atoms, lone pairs

• Relative stereo

• Reaction search features

• Hit coloring, position variation

• Polymers

See detailed information on structure search: www.chemaxon.com/conf/Structural_Search.ppt

Search options

• Stereo on/off

• Ignore charge/isotope/radical/valence/mixture brackets

• Vague bond matching options

• Chemical Terms filter

• Tautomer search

• Inverse hit list

• Maximum search time / number of hits

• Combine with non-structure conditions

• Ordering of results

• etc.

Searching in Markush structures

Combinatorial Markush structure registration and search

• Markush features handled in search & enumeration:

• R-groups (nesting to any depth)

• Atom lists, bond lists

• Position variation bond

• Link nodes and repeating units

• Homology groups

• Compatible Markush enumeration plugin

Detailed description:

http://www.chemaxon.com/jchem/doc/user/Query.html#combinatorialMarkush

Standardization

• Default standardization includes:

– Hydrogen removal

– Aromatization

• Custom standardization can be specified for each table or JChem index

Custom Standardization Example

[Figure: a structure before and after custom standardization]

Standardizer presentation: http://www.chemaxon.com/conf/Standardizer.ppt

Compatibility and integration

File formats:

• SMILES

• MDL molfile (v2000 and v3000)

• MDL SDF

• RXN

• RDF

• MRV

• IUPAC name, InChI

• Markush DARC

• CDX

Operating systems:

• Windows

• Linux

• Solaris

• HP-UX

• etc.

DB engines:

Oracle versions 9i R2 or above

For alternative RDBMS systems, see the JChem Base presentation: http://www.chemaxon.com/JChem_Base.ppt

Elements of the JChem Cartridge API

• Operators (jc_...) for SQL and their functional forms (jcf package) for PL/SQL

• Parameters for index creation

• DML operators for JChem tables

• Support functions for user defined operators

Operators and functions I.

Typical operator:

jc_<some-operation>(<target-structure-column>, <some-operand>)

Operator for substructure search:

jc_contains(<target-structure-column>, <query-structure>)

“Swiss-army-knife” search operator:

jc_compare(<target-structure-column>, <query-structure>, <options>)

Operators and functions II.

• Chemical Terms

– Over 100 built-in functions, including

- elemental analysis

- topological descriptors

- property predictions (logP/D, pKa, PSA, H bond donors/acceptors, charge, etc.)

- tautomers, protonation forms

– User-defined functions

– Example: the Lipinski rule in Chemical Terms

SELECT count(*) FROM nci_3m WHERE jc_compare(structure, 'O=C1ONC(N1c2ccccc2)-c3ccccc3', 'sep=! t:s!ctFilter:(mass() <= 500) && (logP() <= 5) && (donorCount() <= 5) && (acceptorCount() <= 10)') = 1
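The option string in the Lipinski example can be assembled programmatically on the client side. The sketch below is a minimal Python illustration; the layout (`sep=<char>`, a space, then `key:value` pairs joined by the separator) is inferred only from the examples in this presentation, not from a documented grammar, and the helper name `build_compare_options` is hypothetical.

```python
def build_compare_options(search_type, sep="!", **options):
    """Compose a jc_compare option string in the layout used on the slides.

    Assumed layout (inferred from the slide examples):
    'sep=<sep> t:<type><sep>key:value<sep>key:value...'
    """
    parts = [f"t:{search_type}"]
    for key, value in options.items():
        text = str(value)
        # A value containing the separator would be split in the wrong place.
        if sep in text:
            raise ValueError(f"value for {key!r} contains the separator {sep!r}")
        parts.append(f"{key}:{text}")
    return f"sep={sep} " + sep.join(parts)


# The four Lipinski conditions from the slide, joined as one Chemical Terms filter.
lipinski = " && ".join([
    "(mass() <= 500)",
    "(logP() <= 5)",
    "(donorCount() <= 5)",
    "(acceptorCount() <= 10)",
])

# 's' is the search type used in the slide's example.
options = build_compare_options("s", ctFilter=lipinski)
```

Choosing a separator that cannot occur in any value (here `!`) is what allows Chemical Terms expressions containing spaces and `&&` to pass through unchanged.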

Operators and functions III.

• jc_compare: substructure/similarity/exact searching combined with Chemical Terms expressions

• jc_matchcount: number of occurrences of the query structure in the target

• jc_evaluate: Chemical Terms evaluation

• jc_molweight: molecular weight

• jc_formula: molecular formula

• jc_react: structure enumeration based on virtual reactions

• jc_standardize: structure canonicalization

• jc_molconvert: conversion to different formats (image generation is supported)

• jc_tanimoto: similarity search

• jcf.hitColorAndAlign: substructure coloring and alignment

Operators and functions IV.

Chemical Terms and Query Prefiltering:

SELECT id, purchase_date FROM compounds_instock WHERE jc_compare(structure, 'C(=S)([N][N])[S]', 'sep=! t:t!simThreshold:0.9!ctFilter:logp()>1!filterQuery:SELECT rowid FROM compounds_instock WHERE purchase_date > DATE ''2002-01-01''') = 1

Prefiltering restricts the search to a subset of rows, so the search executes more efficiently.
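The doubled single quotes around the date in the prefiltering example arise because the filter query is itself nested inside a SQL string literal. A minimal Python sketch of that quoting step follows; the helper name `sql_string_literal` is hypothetical, and the quote-doubling rule it applies is standard SQL rather than anything specific to the cartridge.

```python
def sql_string_literal(text):
    """Wrap text in single quotes, doubling any embedded single quotes."""
    return "'" + text.replace("'", "''") + "'"


# The inner query as it would be written on its own, with ordinary quotes.
filter_query = (
    "SELECT rowid FROM compounds_instock "
    "WHERE purchase_date > DATE '2002-01-01'"
)

# Option string in the layout shown on the slide (separator '!').
options = (
    "sep=! t:t!simThreshold:0.9!ctFilter:logp()>1!filterQuery:" + filter_query
)

# Literal form, ready to be placed as the third argument of jc_compare.
literal = sql_string_literal(options)
```

Where the client driver allows it, passing the option string as a bind value instead of a literal would avoid this quoting step altogether.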

Dynamic generation of static images:

SELECT jc_molconvertb(structure, 'png -2') FROM nci WHERE id = :1

Available image formats: png, jpeg, svg, ...

[Figure: generated PNG image]

Similarity search example displaying ID, SMILES code, and molweight:

SELECT cd_id, cd_smiles, cd_molweight FROM my_structures WHERE jc_tanimoto(cd_smiles, 'CC(=O)Oc1ccccc1C(O)=O') >= 0.8;
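To make the 0.8 threshold in the similarity example concrete, the sketch below computes a Tanimoto coefficient in Python for two toy fingerprints, each given as a set of "on" bit positions. It assumes jc_tanimoto returns the standard Tanimoto coefficient over binary fingerprints; the bit positions are invented for illustration and do not correspond to real chemical fingerprints.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two binary fingerprints given as sets of on-bits."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    if not union:
        # Convention chosen for this sketch: two empty fingerprints are identical.
        return 1.0
    # Shared on-bits divided by all on-bits present in either fingerprint.
    return len(a & b) / len(union)


# Toy fingerprints (hypothetical bit positions).
query_fp = {1, 4, 7, 9, 12}
target_fp = {1, 4, 7, 9, 15}

score = tanimoto(query_fp, target_fp)

# Mirrors the '>= 0.8' condition in the SQL example above.
is_hit = score >= 0.8
```

Here four bits are shared out of six distinct bits, so the pair falls below a 0.8 threshold and would not be reported as a hit.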

Operators and functions V.

Calculate logP:

SELECT jc_evaluate('OC(=O)c1c2ccccc2nc3ccccc13', 'logp') FROM dual;

Generate tautomers:

SELECT jc_evaluate_x('NC1=C(CC=O)C=CCC1', 'chemTerms:tautomers() outFormat:smiles') FROM dual;

Generate resonants:

SELECT jc_evaluate_x('NC1=C(CC=O)C=CCC1', 'chemTerms:resonants() outFormat:smiles') FROM dual;

Index parameters

Index parameters affect:

• Fingerprint attributes

• Standardizer configuration

• Table space and storage options of the index table

Examples:

• Standardization by stripping hydrogens and using basic aromatization:

CREATE INDEX jcxnci ON nci(structure) INDEXTYPE IS jc_idxtype PARAMETERS('STD_CONFIG=removeexplicitH..aromatize:b')

• Add structural keys to fingerprint for more efficient substructure searching (structural keys are defined in table stfp_keys):

CREATE INDEX jcxnci ON nci(structure) INDEXTYPE IS jc_idxtype PARAMETERS('STRUCTURALFP_CONFIG=select structure from stfp_keys')
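The CREATE INDEX statements above follow a fixed pattern, so deployment scripts can generate them. The Python sketch below builds the DDL text for a single parameter string taken verbatim from the slide; the helper name `create_jc_index_ddl` is hypothetical, and how several parameters are combined in one PARAMETERS clause is not shown in this presentation, so the sketch does not attempt it.

```python
import re

# Conservative pattern for unquoted Oracle identifiers (letters, digits, _ $ #).
_IDENTIFIER = re.compile(r"^[A-Za-z][A-Za-z0-9_$#]*$")


def create_jc_index_ddl(index_name, table, column, parameters=None):
    """Return CREATE INDEX text for a jc_idxtype index (hypothetical helper)."""
    # Identifiers are spliced into the DDL text, so validate them strictly.
    for name in (index_name, table, column):
        if not _IDENTIFIER.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    ddl = f"CREATE INDEX {index_name} ON {table}({column}) INDEXTYPE IS jc_idxtype"
    if parameters:
        # The parameter string sits inside a SQL literal: double embedded quotes.
        escaped = parameters.replace("'", "''")
        ddl += f" PARAMETERS('{escaped}')"
    return ddl


# Parameter string copied from the standardization example on this slide.
ddl = create_jc_index_ddl(
    "jcxnci", "nci", "structure", "STD_CONFIG=removeexplicitH..aromatize:b"
)
```

Omitting the `parameters` argument yields the plain form of the statement shown earlier in the deck.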

Calls Not Using Indexes

Using SQL statements for calling JChem operators on structures not stored in a table

Sample SQL statement without index information:

SELECT jc_contains('O=C1C=CNC=C1', 'n1ccccc1') FROM dual

Setting default properties for calls not using indexes:

CALL jc_set_default_property('standardizerConfig', 'aromatize:b')

Supported Column Types

• VARCHAR2: typically for short formats, e.g. SMILES

• CLOB, BLOB: for longer formats, e.g. MDL molfile, Marvin (mrv)

Supported Structure Table Types

• Regular Table: nci_1k

CREATE INDEX jcxnci_1k...

Index table: jcxnci_1k_jcx, with the rowid of the base table (nci_1k)

• JChem Table (generated by jcman or API): jc_nci_1k

CREATE INDEX jcxjc_nci_1k...

Regular Tables vs. JChem Tables

• Regular structure tables

– base table and index table are physically distinct

– index properties are specified as index parameters

• JChem structure tables

– base table and index table are physically the same

– most of the “index” properties are specified during table creation (jcman or Java API)

• Pros & Cons:

– inserts from outside the database are faster with JChem tables

– JChem tables require Java API or the jcman command line tool (for table creation) and Java API or special cartridge functions for INSERTs, UPDATEs and DELETEs; standard SQL can be used with regular tables in all cases.

JChem Cartridge Architecture

Computation intensive operations are performed in a separate Sun JVM.

Advantages: fast execution (optimized native code), flexibility in deployment

[Figure: architecture diagram. Labels: Oracle, JChem Cartridge, RMI, JDBC, JChem Server, JChem Core, JChem Streams, JChem Base, Search, Update, Cache]

Performance

Table containing 19,528,372 structures from PubChem, searched on a desktop PC with an Intel Quad CPU Q6600 2.40GHz and 8GB memory (JChem 5.2)

Substructure search results:

Query Structure                                Hit Count  Time (ms)
C1CN1c2cnnc3c(cncc23)C4=CSC=C4                         0      1,487
O=C1ONC(N1c2ccccc2)c3ccccc3                          129        823
Oc1c(N=N)c(cc2cc(ccc12)S(O)(=O)=O)S(O)(=O)=O          93        764
C(Sc1ncnc2ncnc12)c3ccccc3                            489        786
NC1=CC=NC2=C1C=CC(Cl)=C2                           6,001      1,189
c1ncc2ncnc2n1                                    146,256      6,665
Clc1ccccc1                                     2,975,285     82,646

Future plans

• Flexible 3D pharmacophore search

• R-Group decomposition

• Clustering

• Maximum common substructure search type

• Extended connectivity fingerprint (ECFP)

Summary

JChem Cartridge for Oracle provides access to the rich functionality of JChem Base in a flexible and efficient manner.

JChem Cartridge for Oracle uses creative solutions to broaden the applicability of JChem's core functions while preserving key benefits of the Java platform.

Links

• Documentation

– www.jchem.com/doc/admin/cartridge.html

– www.jchem.com/doc/guide/cartridge/index.html

• Forum

– www.chemaxon.com/forum/

• Brochure

– www.chemaxon.com/brochures/JChem_Cartridge.pdf

Visit other technical presentations

MarvinSketch/View http://www.chemaxon.com/MarvinSketch_View.ppt

MarvinSpace http://www.chemaxon.com/MarvinSpace.ppt

Calculator Plugins http://www.chemaxon.com/Calculator_Plugins.ppt

JChem Base http://www.chemaxon.com/JChem_Base.ppt

JChem Cartridge http://www.chemaxon.com/JChem_Cartridge.ppt

Standardizer http://www.chemaxon.com/Standardizer.ppt

Screen http://www.chemaxon.com/Screen.ppt

JKlustor http://www.chemaxon.com/JKlustor.ppt

Fragmenter http://www.chemaxon.com/Fragmenter.ppt

Reactor http://www.chemaxon.com/Reactor.ppt