What’s new in JChem back-end and Markush storage, search and enumeration

Szabolcs Csepregi

Solutions for Cheminformatics


Contents

• ChemAxon chemical database tools

• Main features of JChem Base, Cartridge

• Example interfaces: JSP, ASP, AJAX examples

• Integration with other CXN products

• Markush structure storage, search and enumeration

• Recent developments, plans


Chemical database products

JChem Base
– A library for adding chemical structures to relational database systems. Available for Java, JSP and .NET.
– An open-source web application example is available.

JChem Cartridge for Oracle
– Extends Oracle SQL with chemical operators and a chemical index.
– SQL interface to ChemAxon functionality.

Instant JChem
– An all-in-one desktop chemical database application.

JChem Web Services
– SOAP interface to JChem Base

JC4XL
– Excel integration (coming)


Compatibility and integration

Supported chemical file formats:
• SMILES
• MDL MOL/RXN/SDF/RDF (v2000 and v3000)
• CML, MRV
• IUPAC and traditional names
• InChI, mol2, PDB, etc.

Database engines:
• Oracle, MySQL, MS SQL Server, MS Access, PostgreSQL, IBM DB2, Derby, etc.

All operating systems through:
• Java API (JChem Base)
• .NET API (JChem Base + IKVM) – for Windows
• SQL (Cartridge)


Structure searching: features

• Substructure, Similarity, Full, Full fragment, etc. search types

• Wide range of query atoms

• Query properties

• R-group queries

• Full SMARTS support

• Coordination compounds

• Link nodes

• Pseudo atoms, Lone pairs

• Relative stereo

• Reaction search features

• Polymers

• Position variation

• Hit coloring ...

www.chemaxon.com/conf/Structural_Search.ppt


Structure searching: options

Some selected structure search options:
– Chemical Terms filter constraint
– Tautomer search
– Stereo on/off
– Ignore charge/isotope/radical/valence/polymers
– Vague bond matching modes: "or aromatic"; ignore bond types
– Inverse hit list
– Maximum search time / number of hits
– SQL SELECT statement for pre-filtering
– Ordering of results
– etc.


Structure search: performance

Test environment: JChem Base 5.2.0, Intel Quad Q6600 2.4GHz, 8GB RAM; Oracle 10.2.0.3

Compound registration:

Number of compounds    Elapsed time                Elapsed time
                       (duplicates not checked)    (duplicates checked)
10,000                 21 s                        26 s
100,000                2 min                       2 min 36 s
200,000                3 min 45 s                  5 min 5 s

Substructure search in PubChem (19.5 million compounds):

Query                  Number of hits    Search time
[structure image]      2                 0.81 s
[structure image]      93                0.79 s
[structure image]      5,855             1.457 s
[structure image]      142,950           11.076 s
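The registration timings on this slide can be turned into throughput figures with a little arithmetic. The sketch below only reuses the numbers quoted above; the function names are illustrative and not part of any ChemAxon tool.

```python
# Registration timings copied from the slide, converted to seconds:
# number of compounds -> (duplicates not checked, duplicates checked)
TIMINGS = {
    10_000: (21, 26),
    100_000: (2 * 60, 2 * 60 + 36),
    200_000: (3 * 60 + 45, 5 * 60 + 5),
}


def throughput(compounds: int, seconds: float) -> float:
    """Average registration rate in compounds per second."""
    return compounds / seconds


def duplicate_check_overhead(unchecked: float, checked: float) -> float:
    """Relative extra time spent when duplicate checking is switched on."""
    return checked / unchecked - 1.0


if __name__ == "__main__":
    for n, (unchecked, checked) in TIMINGS.items():
        print(
            f"{n:>7,} compounds: "
            f"{throughput(n, unchecked):5.0f}/s unchecked, "
            f"{throughput(n, checked):5.0f}/s checked, "
            f"overhead {duplicate_check_overhead(unchecked, checked):.0%}"
        )
```

On these figures the rate without duplicate checking rises from roughly 480 compounds per second for 10,000 compounds to roughly 890 per second for 200,000, and duplicate checking adds about 24% to 36% to the elapsed time.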


Table types

Control allowed chemical structures and available operations

• Molecule

• Reaction

• Markush

• Query

• Any structure


Example web applications

Open source JSP, ASP examples
– Marvin applets are used for query drawing and structure visualization

AJAX example
– Back-end is JChem Web Services
– No Java is needed for browsing

Demo


Integration

Integration with other ChemAxon tools:
– Custom, uniform chemical representation (Standardizer – see separate presentation today)
– Automatically calculated properties through Chemical Terms calculated columns (Calculator plugins)
– Additional similarity calculations (Screen – JChem Base only)
– Tautomer handling:
  • Tautomer search
  • Tautomer duplicate filter table/index option
  • Custom tautomer transforms or canonical tautomer using Standardizer
– Query drawing and structure visualization (Marvin)

Provides the most consistent interface and back-end.


Integration

Additional Cartridge functionality
– JChem index (for non-JChem tables)
– Communication with Oracle optimizer
– Reaction based enumeration (Reactor)
– Format conversions – image generation also
– Markush enumeration (Calculator plugins)
– Property predictions through Chemical Terms (Calculator plugins)


Registration system

• New component for registration system is under development (API only)

• Main features:
  – Customizable business logic
    • Multilevel duplication control
    • Customizable corporate registration ID
    • Handling of salts, batches, lots, samples, and mixtures
  – Identification, split and registration of salt and solvent structures
  – Storage of input structures in original format
  – Mock registration (dry run)
  – Pre-registration through a transitory area
  – Basic, customizable implementation examples
    • Separate examples for chemists and registrars

• Web and Instant JChem interfaces will follow later
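To make the ideas of duplicate control, a customizable corporate ID and a mock registration (dry run) concrete, here is a toy in-memory sketch. Everything in it is hypothetical: the class `ToyRegistry`, the `CXN-` ID pattern and the use of an already standardized structure string as the duplicate key are assumptions for illustration, not the registration component described on the slide.

```python
from dataclasses import dataclass, field


@dataclass
class ToyRegistry:
    """Hypothetical in-memory registry; not the ChemAxon registration API."""

    prefix: str = "CXN"  # assumed corporate ID prefix
    records: dict = field(default_factory=dict)  # structure key -> corporate ID

    def register(self, structure_key: str, dry_run: bool = False) -> tuple:
        """Return (corporate_id, status).

        structure_key is assumed to be an already standardized structure
        string, so equal keys mean duplicate structures.
        With dry_run=True the outcome is reported but nothing is stored.
        """
        if structure_key in self.records:
            return self.records[structure_key], "duplicate"
        corporate_id = f"{self.prefix}-{len(self.records) + 1:06d}"
        if dry_run:
            return corporate_id, "would register"
        self.records[structure_key] = corporate_id
        return corporate_id, "registered"


if __name__ == "__main__":
    registry = ToyRegistry()
    print(registry.register("CCO", dry_run=True))  # nothing is stored
    print(registry.register("CCO"))                # stored under a new ID
    print(registry.register("CCO"))                # reported as a duplicate
```

A real system would derive the key from a standardized structure and would add further levels of duplicate control (salts, batches, lots), which this sketch leaves out.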


Handling of Markush structures


Markush structures

• Combinatorial Markush structure registration and search
• Features handled in search and enumeration:
  – R-groups (nesting to any depth)
  – Atom lists, bond lists
  – Position variation bond
  – Link nodes
  – Repeating units
  – Homology groups (aryl, alkyl, etc.)
    • Built-in
    • User-defined

• Compatible Markush enumeration plugin


Markush Enumeration

• Markush enumeration plugin
  – Full enumeration
  – Selected parts only
  – Random enumeration
  – Calculate library size: exact size of huge Markush libraries (arbitrary precision), or magnitude
  – Scaffold alignment and coloring
  – Markush code
  – Optional example homology group enumeration
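The difference between full enumeration, random enumeration and library size calculation can be sketched with a deliberately simplified model: a scaffold with independent R-groups, each listing its alternatives. The data and function names below are hypothetical and do not use the Markush enumeration plugin; nesting, position variation and duplicate structures arising from symmetry are ignored.

```python
import itertools
import math
import random

# Hypothetical Markush description: each R-group maps to its alternatives.
R_GROUPS = {
    "R1": ["H", "CH3", "Cl"],
    "R2": ["OH", "NH2"],
    "R3": ["phenyl", "pyridyl", "thienyl", "furyl"],
}


def library_size(r_groups):
    """Exact number of combinations; Python integers have arbitrary precision."""
    return math.prod(len(alternatives) for alternatives in r_groups.values())


def magnitude(size):
    """Order of magnitude of a positive size, i.e. floor(log10(size))."""
    return len(str(size)) - 1


def full_enumeration(r_groups):
    """Yield every combination as a dict; feasible for small libraries only."""
    names = list(r_groups)
    for combo in itertools.product(*(r_groups[name] for name in names)):
        yield dict(zip(names, combo))


def random_enumeration(r_groups, count, seed=None):
    """Draw `count` members at random without enumerating the whole library."""
    rng = random.Random(seed)
    return [
        {name: rng.choice(alternatives) for name, alternatives in r_groups.items()}
        for _ in range(count)
    ]


if __name__ == "__main__":
    print(library_size(R_GROUPS), "combinations in the small example")
    # 40 R-groups with 10 alternatives each: far too many to enumerate,
    # but the exact size and a random sample are still cheap to compute.
    big = {f"R{i}": list(range(10)) for i in range(1, 41)}
    print(library_size(big), "combinations, magnitude", magnitude(library_size(big)))
    print(random_enumeration(big, 2, seed=0)[0]["R1"])
```

The point of the sketch is that the size is a product of counts, so it can be computed exactly even when listing the members is out of the question, while random enumeration only needs one independent choice per R-group.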


Markush storage & search

• Available in JChem Base and Instant JChem

• No enumeration involved – can handle very complex Markush structures (tested up to 10^40, but no explicit limits were built in)

• Substructure and Full structure search

• Basic query features supported

• Substructure hit visualization: "Markush structure reduction"


Markush demo


What’s new


What’s new: JChem Base

5.1
– Position variation in queries
– New fast & reliable tautomer duplicate search

5.2
– .NET API
– Polymer storage and search
– New query options and features including searching of attached data, group matching of undefined R-atoms, repeating units
– Improved substructure search performance
– JChem Web Services
– New metrics for similarity search (Tversky, etc.) (5.2.2)
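For readers unfamiliar with the Tversky metric mentioned above, the following is a minimal sketch of the standard Tversky index on binary fingerprints, represented here as Python sets of "on" bit positions. How JChem parameterizes the weights is not stated on the slide, so `alpha` and `beta` are simply the textbook weights, and the zero-denominator convention is a choice made for this example.

```python
def tversky(a: set, b: set, alpha: float, beta: float) -> float:
    """Tversky index: |A∩B| / (|A∩B| + alpha*|A-B| + beta*|B-A|)."""
    common = len(a & b)
    only_a = len(a - b)
    only_b = len(b - a)
    denominator = common + alpha * only_a + beta * only_b
    # Returning 0.0 for an empty denominator is a convention chosen here.
    return common / denominator if denominator else 0.0


def tanimoto(a: set, b: set) -> float:
    """Tanimoto is the special case alpha = beta = 1."""
    return tversky(a, b, 1.0, 1.0)


def dice(a: set, b: set) -> float:
    """Dice is the special case alpha = beta = 0.5."""
    return tversky(a, b, 0.5, 0.5)


if __name__ == "__main__":
    query = {1, 2, 3, 4}           # bits set in the query fingerprint
    target = {3, 4, 5, 6, 7, 8}    # bits set in the target fingerprint
    print(tanimoto(query, target))
    print(dice(query, target))
    # Unequal weights make the measure asymmetric: with alpha=1, beta=0
    # only the query bits missing from the target are penalized.
    print(tversky(query, target, 1.0, 0.0))
    print(tversky(query, target, 0.0, 1.0))
```

The asymmetric weighting is what makes Tversky useful for "substructure-like" similarity, where extra features in the target should count less than features missing from it.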


What’s new: JChem Base

Polymer support details

• Polymer brackets and properties (type, connectivity, etc.) considered during search and registration

• Attached data search (optional) – attached to atoms/bonds/brackets

• Source- and structure-based representation equivalence is checked (but can be switched off)
  – Addition to a double bond, e.g. polystyrene
  – Polymerization through elimination of water or HCl, e.g. polyester, polyamide


What’s new: JChem Base

Polymer support details (cont.)

• Ladder type polymers

• Phase-shifting (for ht SRU) (can be switched off)

• End group matching:
  – * atoms: unspecified end groups
  – Search option to switch on/off end group matching

• Copolymer types: co, alt, rnd, blk, grf, xl, mer, mod

• Polymer mixtures

• New search options
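Phase-shifting for head-to-tail repeating units means that the same polymer can be drawn with the brackets placed at different points along the chain, for example -[O-CH2-CH2]-, -[CH2-CH2-O]- and -[CH2-O-CH2]- for poly(ethylene glycol). A toy way to recognize such equivalent frames is to compare a canonical cyclic rotation of the unit. The sketch below works on plain token sequences, which is a strong simplification and an assumption of this example; it says nothing about how JChem matches polymers at the atom level.

```python
def canonical_phase(unit):
    """Return the lexicographically smallest cyclic rotation of a repeating unit.

    The unit is a sequence of backbone tokens read head to tail.
    """
    unit = tuple(unit)
    if not unit:
        return unit
    return min(unit[i:] + unit[:i] for i in range(len(unit)))


def same_repeating_unit(unit_a, unit_b):
    """True if the two units differ only by where the brackets were placed."""
    return canonical_phase(unit_a) == canonical_phase(unit_b)


if __name__ == "__main__":
    peg_a = ["O", "CH2", "CH2"]
    peg_b = ["CH2", "CH2", "O"]
    peg_c = ["CH2", "O", "CH2"]
    print(same_repeating_unit(peg_a, peg_b))
    print(same_repeating_unit(peg_a, peg_c))
    print(same_repeating_unit(peg_a, ["O", "CH2"]))  # a different polymer
```

Only rotations are considered here; reading the chain in the opposite direction, end groups and branching would need extra handling.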


What’s new: Cartridge-specific

5.1
– Tautomer duplicate filtering index option
– Alter index option
– Improved import speed (5.1.3)
– Improved upgrade: no need to remove/recreate indices (5.1.4)

5.2
– Interactive installer
– Increased substructure search performance (5.2.2)
– Tversky similarity search (5.2.2)


What’s new: Markush

• New features
  – Homology groups
    • 19 built-in groups
    • Customizable:
      – Examples (for built-in groups, enumeration only)
      – Full user-defined homology groups defined by R-group definition
    • Marvin templates for easier sketching
  – Import reagent files as R-groups
  – Position variation and repeating units


Plans


Plans: JChem Base & Cartridge

JChem Base

• Further speed improvements (SSS, similarity)

• New vague bond level options

• R-group decomposition integration

• Improved support for Screen molecular descriptors

Cartridge

• Screen molecular descriptors (BCUT, pharmacophore similarity, chemical hashed fingerprints, etc.) and metrics (Euclidean, Dice, etc.) for similarity search

• User-defined descriptor fingerprints

• Markush tables and search

• JChem Server, JChem cluster


Plans: Markush

– .VMN import (format used by Merged Markush Service & Derwent World Patent Index)
– Multiple graphical attachment points of R-groups
– Homology variation queries
– Overlap analysis of Markush structures
– Homology group properties (# of atoms, branching points, # of heteroatoms, etc.)
– Conditions for Markush variables


Summary

• JChem Base and Cartridge are comprehensive and efficient

• Markush structure storage, search and enumeration now approach coverage of the Markush features found in patents

• Continuous development, improvements in the pipeline


Find out more

• Product descriptions & links: www.chemaxon.com/products.html

• Forum: www.chemaxon.com/forum

• Presentations and posters: www.chemaxon.com/conf

• Download: www.chemaxon.com/download.html