AMBIT: Chemoinformatics Software for Data Management
Joanna Jaworska, P&G Brussels, Belgium
Nina Jeliazkova, Ideaconsult Ltd., Bulgaria



Introduction – why AMBIT?

The shortage of free, publicly accessible, methodologically transparent software was identified as one of the roadblocks to broader use of in-silico methods (ICCA Workshop in Setubal 2002, OECD).

Efficient use of existing information on chemicals requires better ways of:

• Storage − standardized formats, computer-automated verification of structures, capability to store large amounts of data

• Taking advantage of the rapidly evolving field of data mining and extraction of relevant information


IT strategy

Ambit - building blocks for Decision Support System

High emphasis on

• Interoperability for “plug and play”

• Flexibility: modular design

• Transparency

− Open source, relying on open standards. Open source software lowers the user barrier, facilitates dissemination activities and enables the reproducibility of models and results

− The cheminformatics functionality relies on the open source Java library, the Chemistry Development Kit (CDK) http://cdk.sourceforge.net/

− The software is based on the MySQL database (www.mysql.com), which is the most popular open source relational database.

− Chemical Markup Language (CML)

• An acknowledged method of encoding chemical data in XML

• Is being adopted by a large number of chemical organisations, from government, through commercial to academia.

• The choice of CML for the internal format makes the database independent of the software which is able to access it, in contrast to some proprietary solutions.
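To illustrate why an XML-based internal format keeps the data independent of any one program, here is a minimal sketch that writes and reads a CML molecule using only the Python standard library. It is not AMBIT code (AMBIT itself is Java/CDK); the helper names molecule_to_cml and element_types are hypothetical, and only a small subset of CML (molecule, atomArray, atom, bondArray, bond) is used.

```python
import xml.etree.ElementTree as ET

# Namespace used by CML documents.
CML_NS = "http://www.xml-cml.org/schema"


def molecule_to_cml(mol_id, atoms, bonds):
    """Serialize a molecule to a CML string.

    atoms: list of (atom_id, element_symbol)
    bonds: list of (atom_id_1, atom_id_2, bond_order)
    Hypothetical helper for illustration only.
    """
    ET.register_namespace("", CML_NS)
    mol = ET.Element(f"{{{CML_NS}}}molecule", {"id": mol_id})
    atom_array = ET.SubElement(mol, f"{{{CML_NS}}}atomArray")
    for atom_id, element in atoms:
        ET.SubElement(atom_array, f"{{{CML_NS}}}atom",
                      {"id": atom_id, "elementType": element})
    bond_array = ET.SubElement(mol, f"{{{CML_NS}}}bondArray")
    for a1, a2, order in bonds:
        ET.SubElement(bond_array, f"{{{CML_NS}}}bond",
                      {"atomRefs2": f"{a1} {a2}", "order": order})
    return ET.tostring(mol, encoding="unicode")


def element_types(cml_text):
    """Read the element symbols back with a generic XML parser."""
    root = ET.fromstring(cml_text)
    return [atom.get("elementType") for atom in root.iter(f"{{{CML_NS}}}atom")]


# Ethanol heavy atoms only (hydrogens omitted for brevity).
ethanol_cml = molecule_to_cml(
    "ethanol",
    atoms=[("a1", "C"), ("a2", "C"), ("a3", "O")],
    bonds=[("a1", "a2", "1"), ("a2", "a3", "1")],
)
print(ethanol_cml)
print(element_types(ethanol_cml))
```

Any tool with an XML parser can read the stored record back, which is the point the slide makes about independence from the accessing software.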


Ambit - Overview

AMBIT software is a set of libraries and tools, providing various cheminformatics functionalities for data management.

The AMBIT system consists of a database and functional modules allowing a variety of flexible searches and mining of the data stored in the database.

The unique feature of AMBIT is the ability to store multifaceted information about chemical structures and provide a searchable interface linking these diverse components.


Ambit overview

The AMBIT database:

• stores chemical structures; their identifiers, such as CAS numbers and InChI; attributes such as molecular descriptors; and experimental data together with test descriptions and literature references. The database can also store QSAR models. In addition, the software can generate a suite of 2D and 3D molecular descriptors.

• can be searched by identifier, attribute value or range, experimental data value or range, user-defined structure and substructure, and structural similarity

• contains over 450 000 chemical compounds with data imported from over a dozen databases [http://ambit.acad.bg/ambit/stats/]. The number of compounds keeps growing, and one of the system’s great strengths is that any dataset can be imported for comparison and analysis. AMBITDatabaseTools 1.10 allows users to create a local database and import their own sets of chemical compounds.

AMBIT Discovery performs chemical grouping and assesses the applicability domain of a QSAR. It offers several approaches to similarity assessment: statistical approaches that rely on ‘descriptor space’, approaches based on mechanistic understanding, and approaches based on structural similarity.

ToxTree is a flexible, user-friendly application that integrates structure-based classification schemes. Currently 3 schemes are available: Verhaar for fish toxicity, Cramer for human acute toxicity, and BfR rules for skin irritation. ToxTree implements a plug-in mechanism that allows it to be extended with modules developed later, without recompiling the application. ToxTree and AMBIT modules can be integrated with one another.

Toxmatch – stand-alone application for pairwise similarity assessments, intended for read-across.

QSAR database: under development. Will store information in QMRF. Large effort on standardization.
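The ‘descriptor space’ approach to applicability domain mentioned for AMBIT Discovery can be pictured, in its simplest form, as a range check: a query compound is inside the domain if each of its descriptors falls within the range spanned by the training set. The sketch below assumes that simplest variant; it is not taken from AMBIT Discovery, the function names are hypothetical, and the descriptor values are placeholders rather than real data.

```python
def descriptor_ranges(training_set):
    """Per-descriptor (min, max) over the training compounds.

    training_set: list of dicts mapping descriptor name -> value.
    """
    names = training_set[0].keys()
    return {
        name: (min(c[name] for c in training_set),
               max(c[name] for c in training_set))
        for name in names
    }


def in_domain(compound, ranges):
    """True if every descriptor lies inside the training range."""
    return all(lo <= compound[name] <= hi for name, (lo, hi) in ranges.items())


# Placeholder descriptor values, for illustration only.
training = [
    {"logP": 1.0, "MW": 100.0},
    {"logP": 3.0, "MW": 250.0},
    {"logP": 2.2, "MW": 180.0},
]
ranges = descriptor_ranges(training)
print(in_domain({"logP": 2.0, "MW": 200.0}, ranges))  # inside both ranges
print(in_domain({"logP": 5.0, "MW": 200.0}, ranges))  # logP outside the range
```

A range check is deliberately crude: it treats the domain as a box and ignores empty regions inside it, which is one reason distance-based and density-based alternatives exist.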


AMBIT Database Today

Not restricted to these datasets! Any dataset can be imported (e.g. DSSTox, AQUIRE, LLNA …)


AMBIT Database Schema


Experimental results repository


Ambit database

Two user interfaces to the database

• Online

• Standalone

Online

• A more restricted interface

Standalone

• Full interface

• Can be used for storing & managing confidential data

Common

• Can link with other databases and pull information via web services


AMBIT database functionalities

Storage: information about chemicals (name and structure), descriptors, experimental data and QSAR models

• Example with a tailored template: BCF golden database, LRI project (EURAS), Q2 2007

• QSAR database with QMRF (ECB funded)

Conversion

• Different computer formats of structure, CAS-structure

Calculation

• Variety of descriptors

• The available list is growing thanks to contributions to CDK

Search

• Identification search (CAS, SMILES, chemical name)

• Descriptor search

• Experimental data search

• Substructure and similarity search

Complex searches with multiple criteria (standalone)
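A complex search with multiple criteria amounts to combining a descriptor range and an experimental data range in one relational query. The sketch below shows the idea on a toy schema. Everything in it is an assumption made for illustration: the table and column names are hypothetical and not the AMBIT schema, SQLite stands in for MySQL so that the example is self-contained, and the experimental values are placeholders, not measurements.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory database with a hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE compound (id INTEGER PRIMARY KEY, cas TEXT, smiles TEXT);
        CREATE TABLE descriptor (compound_id INTEGER, name TEXT, value REAL);
        CREATE TABLE experiment (compound_id INTEGER, endpoint TEXT, value REAL);
    """)
    conn.executemany("INSERT INTO compound VALUES (?, ?, ?)", [
        (1, "64-17-5", "CCO"),        # ethanol
        (2, "71-43-2", "c1ccccc1"),   # benzene
        (3, "50-00-0", "C=O"),        # formaldehyde
    ])
    conn.executemany("INSERT INTO descriptor VALUES (?, ?, ?)", [
        (1, "heavy_atoms", 3), (2, "heavy_atoms", 6), (3, "heavy_atoms", 2),
    ])
    # Placeholder numbers, NOT real experimental results.
    conn.executemany("INSERT INTO experiment VALUES (?, ?, ?)", [
        (1, "demo_endpoint", 10.0), (2, "demo_endpoint", 2.5),
        (3, "demo_endpoint", 0.4),
    ])
    return conn


def search(conn, descriptor, d_min, d_max, endpoint, e_max):
    """CAS numbers matching a descriptor range AND an experimental limit."""
    rows = conn.execute("""
        SELECT c.cas
        FROM compound c
        JOIN descriptor d ON d.compound_id = c.id
        JOIN experiment e ON e.compound_id = c.id
        WHERE d.name = ? AND d.value BETWEEN ? AND ?
          AND e.endpoint = ? AND e.value <= ?
        ORDER BY c.id
    """, (descriptor, d_min, d_max, endpoint, e_max)).fetchall()
    return [row[0] for row in rows]


conn = build_demo_db()
print(search(conn, "heavy_atoms", 2, 6, "demo_endpoint", 5.0))
```

Keeping descriptors and experimental values in separate name/value tables lets new descriptors or endpoints be added without changing the schema, at the cost of one join per criterion.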


What kinds of searches are desired?

• Detailed analyses for pairwise similarity

• Similarity of a compound to compounds in the database

• Similarity of a compound to a reference set

• Similarity of a set of compounds to compounds in the database

• Grouping based on chemical class


Ambit online: Searching for basic information


AMBIT Online: Similarity search – replace with new search results !!!


AMBIT Online: Query result


Links to other databases (example: KEGG)


Link to Aquire


Information about QSAR models


Ambit Database Tools 1.20

Standalone application

available at http://ambit.acad.bg/downloads


Ambit converter (Batch search)

Ambit converter can open: CML, CSV, HIN, ICHI, INCHI, MDL MOL, MDL SDF, MOL2, PDB, SMI, TXT and XYZ file types

Ambit converter can save: SDF, MOL, CSV, TXT, SMI file types.

• CAS-SMILES conversion based on a database lookup

• Descriptors calculation

• Cramer rules

• Verhaar scheme
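A batch CAS-SMILES conversion by lookup can be sketched as follows. The check-digit rule for CAS Registry Numbers is standard (digits weighted 1, 2, 3, ... from the right, sum taken modulo 10); the rest is an assumption made for illustration: a small in-memory dictionary stands in for the AMBIT database, and the function names are hypothetical.

```python
import re

CAS_PATTERN = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")

# Tiny stand-in for the database lookup table (illustration only).
CAS_TO_SMILES = {
    "64-17-5": "CCO",        # ethanol
    "71-43-2": "c1ccccc1",   # benzene
    "50-00-0": "C=O",        # formaldehyde
}


def is_valid_cas(cas):
    """Check the format and the check digit of a CAS Registry Number."""
    match = CAS_PATTERN.match(cas)
    if not match:
        return False
    # Digits before the check digit, read right to left, weighted 1, 2, 3, ...
    digits = (match.group(1) + match.group(2))[::-1]
    total = sum(int(d) * weight for weight, d in enumerate(digits, start=1))
    return total % 10 == int(match.group(3))


def batch_cas_to_smiles(cas_numbers, table=CAS_TO_SMILES):
    """Return (found, invalid, missing) for a batch of CAS numbers."""
    found, invalid, missing = {}, [], []
    for cas in cas_numbers:
        if not is_valid_cas(cas):
            invalid.append(cas)
        elif cas in table:
            found[cas] = table[cas]
        else:
            missing.append(cas)
    return found, invalid, missing


found, invalid, missing = batch_cas_to_smiles(
    ["64-17-5", "64-17-6", "7732-18-5", "71-43-2"]
)
print(found, invalid, missing)
```

Validating the check digit before the lookup separates typing errors from compounds that are genuinely absent from the table.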


Ambit Database Tools 1.20

Import to Database

• Compounds – several file formats

• Descriptors – SDF, CSV, TXT

• Experimental data – SDF, CSV, TXT

• QSAR models – SDF, CSV, TXT

Database processing

• Calculate SMILES/Fingerprints/Atom environments – necessary in order to perform substructure and similarity search. Should be invoked after importing compounds into the database

• several file formats

• Descriptors calculation

• Distances calculation – used to speed up the distance-between-heavy-atoms query
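One common reason precomputed fingerprints are needed for substructure search is prescreening: if the query sets a bit that a stored compound does not, that compound cannot contain the query and the expensive atom-by-atom match can be skipped. The sketch below shows this general technique on a toy key-based fingerprint. It is an assumption that AMBIT uses fingerprints in this way, the key list is hypothetical, and real fingerprints (for example those in CDK) are computed from the structure rather than from hand-listed features.

```python
# Hypothetical structural keys, one bit each (illustration only).
KEYS = ["C", "N", "O", "C-C", "C-O", "C=O", "C-N", "aromatic ring"]
KEY_BIT = {key: 1 << position for position, key in enumerate(KEYS)}


def fingerprint(features):
    """Pack a collection of structural keys into an integer bitmask."""
    fp = 0
    for feature in features:
        fp |= KEY_BIT[feature]
    return fp


def may_contain(target_fp, query_fp):
    """Prescreen: every query bit must also be set in the target.

    True means 'possible match, run the full substructure test';
    False means the target certainly does not contain the query.
    """
    return (target_fp & query_fp) == query_fp


# Features listed by hand for three stored compounds.
stored = {
    "ethanol": fingerprint(["C", "O", "C-C", "C-O"]),
    "benzene": fingerprint(["C", "C-C", "aromatic ring"]),
    "methylamine": fingerprint(["C", "N", "C-N"]),
}

query = fingerprint(["C", "O", "C-O"])  # a C-O fragment
candidates = [name for name, fp in stored.items() if may_contain(fp, query)]
print(candidates)
```

The prescreen only rules compounds out; survivors still need a full substructure match, since different structures can share the same bits.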


Ambit Database Tools 1.20

• perform a CAS RN search in the database (submenu "Search -> CAS RN search");

• perform a SMILES search in the database (submenu "Search -> SMILES");

• perform a molecular formula search in the database (submenu "Search -> Molecular formula");

• define structure, descriptor, distance-based and experimental data criteria and perform searches in the database

Output:

• On screen

• To file

The user can select among the different datasets existing in the AMBIT database. Subsequent searches will be performed only within the selected dataset.


AMBIT User Interface

Example: Search by structure

• Exact search

• Substructure search

• Similarity search

• Fingerprints

• Atom environments
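Fingerprint-based similarity search is usually scored with the Tanimoto coefficient: the number of bits set in both fingerprints divided by the number of bits set in either. The sketch below assumes fingerprints stored as integer bitmasks; the slides do not state which coefficient or threshold AMBIT uses, so the function names, the threshold and the toy bit patterns are all illustrative.

```python
def popcount(value):
    """Number of bits set in a non-negative integer."""
    return bin(value).count("1")


def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two bitmask fingerprints.

    Returns 0.0 when both fingerprints are empty (a convention chosen
    here; other conventions exist).
    """
    union = popcount(fp_a | fp_b)
    if union == 0:
        return 0.0
    return popcount(fp_a & fp_b) / union


def similarity_search(query_fp, database, threshold):
    """Names of stored fingerprints scoring at or above the threshold,
    best match first."""
    scored = [(tanimoto(query_fp, fp), name) for name, fp in database.items()]
    hits = [(score, name) for score, name in scored if score >= threshold]
    return [name for score, name in sorted(hits, key=lambda hit: -hit[0])]


# Toy bit patterns, not derived from real structures.
database = {"x": 0b1110, "y": 0b0111, "z": 0b1000}
print(tanimoto(0b1110, 0b0111))
print(similarity_search(0b1110, database, threshold=0.5))
```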


AMBIT User Interface

Example: Search by descriptors


AMBIT User Interface

Example: Search by experimental data


Similarity based on toxicity mechanism

Verhaar scheme

Verhaar H.J.M., Van Leeuwen C., Hermens J.L.M., Classifying Environmental Pollutants. 1: Structure-Activity Relationships for Prediction of Aquatic Toxicity, Chemosphere, Vol. 25, No. 4, pp. 471-491, 1992

34 rules

5 classes

• Class 1: Narcosis or baseline toxicity

• Class 2: Less inert compounds

• Class 3: Unspecific reactivity

• Class 4: Compounds and groups of compounds acting by a specific mechanism

• Class 5: Not possible to classify according to these rules
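Once each compound has been assigned a Verhaar class, mechanism-based similarity reduces to comparing class labels. The sketch below assumes the class assignments are already available (for example from ToxTree) and does not implement the 34 rules. The compound names are placeholders, the function names are hypothetical, and treating two Class 5 compounds as not similar is a design choice made here, on the grounds that "not classifiable" says nothing about a shared mechanism.

```python
from collections import defaultdict

# Class labels as listed on the slide.
VERHAAR_CLASSES = {
    1: "Narcosis or baseline toxicity",
    2: "Less inert compounds",
    3: "Unspecific reactivity",
    4: "Compounds and groups of compounds acting by a specific mechanism",
    5: "Not possible to classify according to these rules",
}


def group_by_class(assignments):
    """Group compound names by their assigned Verhaar class number."""
    groups = defaultdict(list)
    for compound, verhaar_class in assignments.items():
        if verhaar_class not in VERHAAR_CLASSES:
            raise ValueError(f"unknown Verhaar class: {verhaar_class}")
        groups[verhaar_class].append(compound)
    return dict(groups)


def same_mechanism(compound_a, compound_b, assignments):
    """True if both compounds share a class other than Class 5."""
    class_a = assignments[compound_a]
    class_b = assignments[compound_b]
    return class_a == class_b and class_a != 5


# Placeholder assignments, not real classifications.
assignments = {
    "compound A": 1, "compound B": 1, "compound C": 3,
    "compound D": 5, "compound E": 5,
}
for verhaar_class, members in sorted(group_by_class(assignments).items()):
    print(verhaar_class, VERHAAR_CLASSES[verhaar_class], members)
```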


Chemical similarity assessment using the database

Exact substructure search based on 2D

Structural Similarity search (various methods)

Criteria on descriptors

Based on mechanistic understanding (Verhaar scheme)


Another view on Similarity assessments with Toxmatch and Discovery

Discovery

• similarity to a set (summary representation)

Toxmatch

• pairwise similarities

• Similarity to a set (nearest neighbours)
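The two views of "similarity to a set" can be contrasted in a few lines. A summary representation compares the query with one aggregate of the set, while a nearest-neighbour view compares it with its closest individual members. The sketch below assumes Euclidean distance in descriptor space and uses the centroid as the summary; neither choice is stated in the slides, the function names are hypothetical, and the coordinates are placeholders.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid_distance(query, reference_set):
    """Summary view: distance from the query to the centroid of the set."""
    count = len(reference_set)
    centroid = [sum(column) / count for column in zip(*reference_set)]
    return euclidean(query, centroid)


def knn_distance(query, reference_set, k=1):
    """Nearest-neighbour view: mean distance to the k closest members."""
    nearest = sorted(euclidean(query, member) for member in reference_set)[:k]
    return sum(nearest) / len(nearest)


# Placeholder descriptor vectors.
reference_set = [(0.0, 0.0), (4.0, 0.0)]
query = (4.0, 3.0)
print(centroid_distance(query, reference_set))
print(knn_distance(query, reference_set, k=1))
print(knn_distance(query, reference_set, k=2))
```

The nearest-neighbour view stays meaningful when the reference set has several separate clusters, whereas a centroid can fall in an empty region between them.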


Thank you

Questions?