Online Chemical Modeling Environment: Models
Iurii Sushko, Sergey Novotarskiy
Thursday, August 13, 2009



DESCRIPTION

AACIMP 2009 Summer School lecture by Yuriy Sushko and Sergii Novotarskyi. "Environmental Chemoinformatics" course.


Page 1: Online Chemical Modeling Environment: Models

Online chemical modeling environment: models

Iurii Sushko, Sergey Novotarskiy
Thursday, August 13, 2009

Page 2: Online Chemical Modeling Environment: Models

Existing alternatives

Classical approach: Weka, R, Mathematica

Advantages:

1. Most flexible
2. Suitable for research and deep analysis

Disadvantages:

1. Complex: suited to mathematicians, computer scientists and statisticians, but not to chemists and biologists
2. Very tedious data preparation

Page 3: Online Chemical Modeling Environment: Models
Page 4: Online Chemical Modeling Environment: Models

Community-driven source vs. authority-driven source

Page 5: Online Chemical Modeling Environment: Models

Collaboration in QSAR

Possibilities for collaboration in QSAR:

1. Use others' data
   a. build models based on others' data
   b. validate your models against others' data
2. Use others' models
   a. validate your data against published models
   b. use the output of published models as input for new ones
   c. compare the performance of published models with your own

All existing modeling tools lack means of collaboration

Page 6: Online Chemical Modeling Environment: Models

OCHEM advantages

Collaboration-targeted features:

1. Tight connection between database and modeling tools
2. Wiki, discussion, comments, tags

Simplified modeling workflow:

1. Sensible defaults for most parameters
2. Only necessary parameters requested
3. Data representation targeted at chemists
4. Possibility of fine-tuning for experts
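
The "sensible defaults plus expert fine-tuning" idea can be pictured as a small configuration merge. This is only a sketch under stated assumptions, not OCHEM's implementation: the function `build_config`, the parameter names and the `validation_folds` value are hypothetical. The method and descriptor names are taken from the "Basic facts" slide.

```python
# Hypothetical defaults; only "ANN" and "E-State" are names that appear in the deck.
DEFAULTS = {
    "method": "ANN",
    "descriptors": "E-State",
    "validation_folds": 5,  # assumed value, for illustration only
}


def build_config(training_set, overrides=None):
    """Combine the one required input with defaults and optional expert overrides."""
    overrides = overrides or {}
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        # Reject parameters the (hypothetical) workflow does not know about.
        raise KeyError(f"unknown parameters: {sorted(unknown)}")
    return {"training_set": training_set, **DEFAULTS, **overrides}


# A chemist supplies only the training set; an expert may override a default.
basic = build_config("my working set")
expert = build_config("my working set", {"method": "KNN"})
print(basic["method"], expert["method"])
```

The required input stays separate from the optional ones, so the simple path asks for a single value while every default remains reachable for fine-tuning.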

Page 7: Online Chemical Modeling Environment: Models

Modeling workflow

1. Data preparation

2. Building a model

3. Analysing the model

4. Application of the model

AD (applicability domain)

Page 8: Online Chemical Modeling Environment: Models

Stage 1 – Data preparation

Data Point:
• Property: logP = 0.5, Melting Point = 100 C
• Condition: temperature, pH, species, tissue, method
• Article: Garberg, P. "In vitro models for …"
• Structure: Benzene, Urea, ...

• Introducer: Bill G., Sergey B.
• Date of modification
• Information system
• Tags: Toxicology, Biology, Partition coefficient

• Filtering: Toxicology, Biology, Partition coefficient
• Manipulation: Editing
• Organization: Working sets
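
The data-point record on this slide can be pictured as a small data structure. The following is a minimal sketch, not the actual OCHEM schema: the class `DataPoint`, its field names and the helper `filter_by_tag` are hypothetical. The sample values are the ones shown on the slide, paired arbitrarily for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class DataPoint:
    """One experimental measurement (hypothetical layout, not the OCHEM schema)."""
    structure: str                                   # e.g. "Benzene"
    prop: str                                        # measured property, e.g. "logP"
    value: float
    conditions: dict = field(default_factory=dict)   # temperature, pH, species, ...
    article: str = ""                                # source publication
    introducer: str = ""                             # user who entered the record
    tags: set = field(default_factory=set)


def filter_by_tag(points, tag):
    """Return the data points that carry the given tag."""
    return [p for p in points if tag in p.tags]


# Slide values, combined arbitrarily; these are not real measurements.
points = [
    DataPoint("Benzene", "logP", 0.5, tags={"Partition coefficient"}),
    DataPoint("Urea", "Melting Point", 100.0, tags={"Toxicology"}),
]

# Filtering by tag yields a candidate working set for modeling.
working_set = filter_by_tag(points, "Partition coefficient")
print([p.structure for p in working_set])
```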

Page 9: Online Chemical Modeling Environment: Models

Stage 1 – Data preparation

• Tags: Toxicology, Biology, Partition coefficient
• Manipulation: Editing
• Organization: Working sets
• Filtering: Toxicology, Biology, Partition coefficient

Page 10: Online Chemical Modeling Environment: Models

Stage 1: Data preparation

Page 11: Online Chemical Modeling Environment: Models

Stage 1: Data preparation

Page 12: Online Chemical Modeling Environment: Models

Stage 1: Data preparation

Page 13: Online Chemical Modeling Environment: Models

Stage 1: Data preparation

Page 14: Online Chemical Modeling Environment: Models

Stage 2: Model building - input data

Page 15: Online Chemical Modeling Environment: Models

Stage 2: Model building - descriptors (I)

Page 16: Online Chemical Modeling Environment: Models

Stage 2: Model building - descriptors (II)

Page 17: Online Chemical Modeling Environment: Models

Stage 2: Model building – descriptors (manual)

Page 18: Online Chemical Modeling Environment: Models

Stage 3: Analysing the model (I)
Basic model statistics
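
The slide does not list which statistics are reported. As an illustration only, the sketch below computes three common regression measures (RMSE, MAE and R²) from observed and predicted values; the function name `regression_stats` is hypothetical.

```python
import math


def regression_stats(observed, predicted):
    """Return RMSE, MAE and R^2 for paired observed/predicted values."""
    n = len(observed)
    if n == 0 or n != len(predicted):
        raise ValueError("observed and predicted must be non-empty and equally long")
    residuals = [o - p for o, p in zip(observed, predicted)]
    ss_res = sum(r * r for r in residuals)            # residual sum of squares
    mean_obs = sum(observed) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)  # total sum of squares
    return {
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(r) for r in residuals) / n,
        # R^2 is undefined when all observed values are identical.
        "r2": 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan"),
    }


stats = regression_stats([1.0, 2.0, 3.0], [1.1, 1.9, 3.2])
print(stats)
```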

Page 19: Online Chemical Modeling Environment: Models

Stage 3: Analysing the model (II)
Applicability domain assessment
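
The deck does not say how the applicability domain is assessed. One simple and widely used approach, assumed here purely for illustration, is a distance-to-training-set check in descriptor space: a compound is considered inside the domain if its nearest training compound is closer than a chosen threshold. The function names and the threshold are hypothetical.

```python
import math


def nearest_distance(query, training):
    """Euclidean distance from a descriptor vector to its nearest training vector."""
    return min(math.dist(query, t) for t in training)


def in_domain(query, training, threshold):
    """True if the query lies within `threshold` of some training compound."""
    return nearest_distance(query, training) <= threshold


# Toy two-descriptor training set (made-up numbers, not real descriptors).
training = [(0.0, 0.0), (1.0, 1.0)]
print(in_domain((0.1, 0.0), training, threshold=0.5))
print(in_domain((5.0, 5.0), training, threshold=0.5))
```

Predictions for compounds outside the domain are then reported as less reliable rather than being withheld.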

Page 20: Online Chemical Modeling Environment: Models

Stage 4: Application of the model
Selection of the model of interest

Model published by another user

Newly created model

Page 21: Online Chemical Modeling Environment: Models

Stage 4: Application of the model
Provide target compounds

Page 22: Online Chemical Modeling Environment: Models

Stage 4: Application of the model
Prediction results

Target compound | Prediction | Accuracy assessment

Page 23: Online Chemical Modeling Environment: Models

Stage 4: Application of the model
Assessment of accuracy of predictions

Target compound

Page 24: Online Chemical Modeling Environment: Models

Need for distribution of calculations

Fact: QSAR modeling is calculation-intensive

Examples of calculations:
• Training of neural network ensembles
• Computing 3D conformations
• Computing complex molecular descriptors

Solution:
• Distributed calculation network
• User can postpone or cancel a task, or fetch its results later
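
The submit, cancel and fetch-later pattern described on this slide can be sketched with Python's standard `concurrent.futures` module. This is an illustration on a local thread pool, not the OCHEM calculation network: the class `TaskClient` and the stand-in job `train_stub` are hypothetical.

```python
import itertools
from concurrent.futures import ThreadPoolExecutor


class TaskClient:
    """Hypothetical handle on a pool of calculation workers."""

    def __init__(self, workers=2):
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._tasks = {}
        self._ids = itertools.count(1)

    def submit(self, func, *args):
        """Queue a calculation and return a task id the user can keep."""
        task_id = next(self._ids)
        self._tasks[task_id] = self._pool.submit(func, *args)
        return task_id

    def cancel(self, task_id):
        """Try to cancel; only tasks that have not started can be cancelled."""
        return self._tasks[task_id].cancel()

    def is_ready(self, task_id):
        """True once the task has finished or was cancelled."""
        return self._tasks[task_id].done()

    def fetch(self, task_id, timeout=None):
        """Block until the result is available (or the timeout expires)."""
        return self._tasks[task_id].result(timeout=timeout)

    def shutdown(self):
        self._pool.shutdown(wait=True)


def train_stub(n):
    # Stand-in for an expensive job such as training a neural network ensemble.
    return sum(i * i for i in range(n))


client = TaskClient()
task = client.submit(train_stub, 1000)
# ... the user may do other work here and come back later ...
print(client.fetch(task))
client.shutdown()
```

Handing the user a task id instead of blocking on the result is what lets long calculations outlive a single browsing session.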

Page 25: Online Chemical Modeling Environment: Models

Automatic updates and testing

• Calculation servers are automatically updated when a new release becomes available
• Servers are tested automatically after each update
• Tasks that did not pass the tests are disabled, keeping the server functional
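
The "disable what fails, keep the rest running" policy can be sketched as a self-test that a server runs after each update. This is a minimal illustration under assumed names, not the OCHEM code: `self_test`, the task registry layout and the two sample tasks are hypothetical.

```python
def self_test(task_types):
    """Run each task type's smoke test and return the names that passed.

    `task_types` maps a name to (function, sample_input, expected_output).
    """
    enabled = set()
    for name, (func, sample_input, expected) in task_types.items():
        try:
            if func(sample_input) == expected:
                enabled.add(name)
        except Exception:
            # A crashing task is simply left disabled; the server keeps running.
            pass
    return enabled


def mean(values):
    return sum(values) / len(values)


def broken(values):
    # Simulates a task whose dependency went missing after an update.
    raise RuntimeError("missing native library")


TASKS = {
    "mean": (mean, [1.0, 3.0], 2.0),
    "broken": (broken, [1.0], 1.0),
}

enabled = self_test(TASKS)
print(sorted(enabled))
```

A server would then accept only the task types in `enabled`, so one failing calculation type does not take the whole server out of service.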

Page 26: Online Chemical Modeling Environment: Models

Backend - distributed calculation

• Central metaserver, distributed calculation servers
• Automatic server updates, on-the-fly server testing

Page 27: Online Chemical Modeling Environment: Models

Basic facts

• About 50000 experimental measurements on 285 physicochemical properties, published in about 2000 articles
• Implemented modeling methods: ANN, KNN, MLR, Kernel ridge regression
• Integrated descriptors: Dragon, E-State, Fragments
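
Of the listed methods, KNN is the easiest to show in a few lines. The sketch below is a plain k-nearest-neighbours regression, given only to illustrate the technique: the Euclidean distance, the unweighted mean and the default k are assumptions, the function name `knn_predict` is hypothetical, and the numbers are toy data.

```python
import math


def knn_predict(train_x, train_y, query, k=3):
    """Predict a property as the mean over the k nearest training compounds."""
    if not 1 <= k <= len(train_x):
        raise ValueError("k must be between 1 and the number of training points")
    # Rank training compounds by Euclidean distance in descriptor space.
    ranked = sorted(zip(train_x, train_y), key=lambda pair: math.dist(pair[0], query))
    nearest = ranked[:k]
    return sum(y for _, y in nearest) / k


# One-descriptor toy training set (made-up values).
train_x = [(0.0,), (1.0,), (2.0,), (10.0,)]
train_y = [0.0, 1.0, 2.0, 10.0]
print(knn_predict(train_x, train_y, (1.0,), k=3))
```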

Page 28: Online Chemical Modeling Environment: Models

Backend - basic facts

• Platform: Java EE
• Database: MySQL
• Server: Tomcat
• ORM: Hibernate
• MVC: Spring framework
• Client side: AJAX, HTML + JavaScript