UGM 2006 Miklós Vargyas What’s new in JKlustor



Overview

• An introduction to JKlustor
  – Brief history of the product
  – Main features
  – Usage examples
  – Performance

• LibMCS, an alternative approach to clustering chemical structures
  – Concepts, motivation
  – Features
  – Performance

• Future of JKlustor

Brief history of JKlustor

• First discovery tool in the JChem package
  – Jarp released in version 1.5.2 (March 22, 2001)
  – Compr 1.5.7 (May 27, 2001)
  – Ward 1.5.9 (Jun 25, 2001)

• API released in JChem 1.6.2 (May 16, 2002)

• Experimental LibMCS first released in JChem 3.0 (Dec 1, 2004)

• New JKlustor GUI to be released in JChem 3.?

JKlustor features

• Similarity based clustering
  – ChemAxon’s topological fingerprint
  – External data points, arbitrary dimension
  – Tanimoto, weighted Euclidean

• Hierarchical clustering: Ward
  – Reciprocal nearest neighbor algorithm
  – Kelley method

• Non-hierarchical clustering: Jarvis-Patrick

• Diversity calculation: Compr

• Structure based clustering: LibMCS
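
The Tanimoto measure named in the feature list can be sketched for binary fingerprints as below. This is only an illustration, not JKlustor code: fingerprints are assumed to be stored as sets of "on" bit positions, and the function name `tanimoto` is hypothetical.

```python
def tanimoto(fp_a: set[int], fp_b: set[int]) -> float:
    """Tanimoto coefficient of two binary fingerprints given as sets of set-bit positions."""
    if not fp_a and not fp_b:
        return 1.0  # convention chosen here: two empty fingerprints count as identical
    common = len(fp_a & fp_b)
    # shared bits divided by the number of distinct bits set in either fingerprint
    return common / (len(fp_a) + len(fp_b) - common)


a = {1, 5, 9, 12}
b = {1, 5, 10, 12, 40}
similarity = tanimoto(a, b)        # 3 shared bits out of 6 distinct bits
dissimilarity = 1.0 - similarity   # the complement, usable as a distance
```

Clustering tools commonly work with the dissimilarity (1 minus the coefficient), so that smaller values mean closer structures.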

JKlustor usage

• Command line tools
  – Pipelining commands
  – Option flags
  – Structure file/database input
  – Manual creation of cluster views

[Diagram: Input SDFile → GenerateMD → NNeib → JarvisPatrick → CreateView → MarvinView → Picture]

JKlustor usage

• Prepare data and run clustering

    generatemd c input.sdf -k CF -c cfp.xml -D -o fingerprints.txt
    nneib -f 512 -t 0.1 -g -i fingerprints.txt -o neighborlists.txt
    jarp -c 0.2 -y -i neighborlists.txt -o clusters.txt

• View first cluster

    crview -i id -c "clid=1" -s input.sdf -t clusters.txt -o jarp_cluster1.sdf
    mview -c 3 -r 3 jarp_cluster1.sdf

• View centroids, display cluster id and size

    crview -i "centr:2" -c "size>=20" -d "clid:size" -s input.sdf -t clusters.txt -o jarp_centroids.sdf
    mview -c 3 -r 3 -f "clid:size" jarp_centroids.sdf
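
The neighbor-list step followed by Jarvis-Patrick clustering can be sketched as below. This is the textbook form of the method, not the `jarp` implementation: the in-memory neighbor-list layout, the `min_common` parameter, and the function name are assumptions, and the slides do not say how the `jarp` option flags relate to them.

```python
def jarvis_patrick(neighbors: dict[int, list[int]], min_common: int) -> list[set[int]]:
    """Textbook Jarvis-Patrick clustering on precomputed nearest-neighbor lists.

    Two items are joined when each appears in the other's neighbor list and the
    two lists share at least `min_common` entries. Clusters are the connected
    groups that result. Every listed neighbor must itself be a key of `neighbors`.
    """
    parent = {i: i for i in neighbors}

    def find(i: int) -> int:
        # union-find lookup with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, near_i in neighbors.items():
        for j in near_i:
            near_j = neighbors[j]
            if i in near_j and len(set(near_i) & set(near_j)) >= min_common:
                parent[find(i)] = find(j)

    groups: dict[int, set[int]] = {}
    for i in neighbors:
        groups.setdefault(find(i), set()).add(i)
    return sorted(groups.values(), key=min)


# Hypothetical neighbor lists for six structures (three nearest neighbors each)
neighbor_lists = {
    0: [1, 2, 3],
    1: [0, 2, 3],
    2: [0, 1, 3],
    3: [0, 1, 2],
    4: [5, 3, 2],
    5: [4, 3, 2],
}
clusters = jarvis_patrick(neighbor_lists, min_common=2)
```

Structures 4 and 5 list 3 and 2 as neighbors, but the relation is not mutual, so they form a cluster of their own.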

JKlustor usage

JKlustor performance

[Chart: run time (s), 0 to 14000, versus library size (100 to 100000 structures); series: Ward 512 and Jarp 512]

• Memory: O(n)

• Time: Jarvis-Patrick O(n^1.5), Ward O(n^2)
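
To make the two time complexities concrete, the sketch below estimates how run time grows when the library size is multiplied by ten. It assumes the run time follows the stated power law exactly, with no constant overhead. The reference time of 1.0 is a placeholder, not a measured value, and the function name is hypothetical.

```python
def scaled_time(t_ref: float, n_ref: int, n: int, exponent: float) -> float:
    """Extrapolate run time from a reference point, assuming time ~ n ** exponent."""
    return t_ref * (n / n_ref) ** exponent


# Growth factor when going from 10 000 to 100 000 structures
jarp_factor = scaled_time(1.0, 10_000, 100_000, 1.5)   # about 31.6 times longer
ward_factor = scaled_time(1.0, 10_000, 100_000, 2.0)   # about 100 times longer
```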

What is MCS?

• The Maximum Common Substructure of two chemical structures
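
As a toy illustration of the idea, the sketch below finds the size of the largest common part of two small molecular graphs by exhaustive search. It is not the LibMCS algorithm. It matches atoms by element only, ignores bond orders, requires that matched atom pairs agree on whether they are bonded, and does not require the common part to be connected. The graph encoding and the function name are assumptions, and the search is only practical for very small inputs.

```python
def common_substructure_size(atoms_a, bonds_a, atoms_b, bonds_b):
    """Atom count of the largest common induced subgraph, by brute-force backtracking.

    atoms_*: dict mapping atom index -> element symbol
    bonds_*: set of frozenset({i, j}) atom-index pairs
    """
    order = list(atoms_a)
    best = 0

    def extend(pos, mapping):
        nonlocal best
        best = max(best, len(mapping))
        # Stop if even matching every remaining atom cannot beat the best so far.
        if len(mapping) + len(order) - pos <= best:
            return
        a = order[pos]
        used = set(mapping.values())
        for b in atoms_b:
            if b in used or atoms_a[a] != atoms_b[b]:
                continue
            # The new pair must agree with all earlier pairs on bonded / not bonded.
            consistent = all(
                (frozenset((a, x)) in bonds_a) == (frozenset((b, y)) in bonds_b)
                for x, y in mapping.items()
            )
            if consistent:
                mapping[a] = b
                extend(pos + 1, mapping)
                del mapping[a]
        extend(pos + 1, mapping)  # also try leaving atom `a` unmatched

    extend(0, {})
    return best


# Propane (C-C-C) versus ethanol (C-C-O), hydrogens omitted
propane = ({0: "C", 1: "C", 2: "C"}, {frozenset((0, 1)), frozenset((1, 2))})
ethanol = ({0: "C", 1: "C", 2: "O"}, {frozenset((0, 1)), frozenset((1, 2))})
shared_atoms = common_substructure_size(*propane, *ethanol)  # the C-C fragment
```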

Clustering by MCS?

• Find the MCS of a group of structures

Very brief history of LibMCS

• Reaction automapper, based on Maximum Common Subgraph Search

• MCS class API made public

• Customer requested MCS based clustering
  – More intuitive than similarity based
  – Focused set analysis
    • screens: 2000 – 10000 structures
    • lead optimization: 3000 – 5000 structures
  – Should be hierarchical (outliers)
  – Ultimate goal: cluster 5000 compounds in 5 seconds

LibMCS features

• MCS based hierarchical clustering

• Flexible search options

• Hierarchy browser

• Filtering by chemical properties

• Cluster statistics

• No size limitation

• Fast operation

LibMCS – Dendrogram view

LibMCS – Molecule view

LibMCS – Table view

LibMCS – Statistics

LibMCS – Selections

LibMCS – Property filters

LibMCS – Output files

LibMCS – Output files

CCCN1CC(=O)SCC(C)C1=O CC1CSC(=O)CN(C2CCCC2)C1=O 0 21 0
CCCN1CC(=O)SCC(C)C1=O CC1CSC(=O)C2CCCN2C1=O 0 21 0
OC(=O)C1CCCN1C(=O)CCS CC(CS)C(=O)N1CCCC1C(O)=O 0 19 0
OC(=O)C1CCCN1C(=O)CCS [H]C1(CCCN1C(=O)CCS)C(O)=O 0 19 0
OC(=O)C1CCCN1C(=O)CCS OC(=O)C1CCCN1C(=O)C2CCCC2SC(=O)C3=CC=CC=C3 0 19 0
OC(=O)C1CCCN1C(=O)CCS OC(=O)C1CCCN1C(=O)C2CCCCC2S 0 19 0
CCC(=O)N(CC1=CC=CC=C1)C(C)C=O CC1SC(=O)C2(C)CC3=CC=CC=C3CN2C1=O 0 20 0
CCC(=O)N(CC1=CC=CC=C1)C(C)C=O CC1CSC(=O)C2CC3=C(CN2C1=O)C=CC=C3 0 20 0
CC1SC(=O)C2CCCN2C1=O CC1SC(=O)C2CCCN2C1=O 0 30 0
CC1SC(=O)CNC1=O CC1SC(=O)CNC1=O 0 29 0
OC(=O)C1CSCCCCCCCCC(CS)C(=O)N1 OC(=O)C1CSCCCCCCCCC(CS)C(=O)N1 0 31 0
CC(S)C(=O)NCC(O)=O CC(S)C(=O)NCC(O)=O 0 24 0
CCC1=CC=CC=C1 CC(NC(CCC1=CC=CC=C1)C(O)=O)C(=O)N2CCCC2C(O)=O 0 22 0
CCC1=CC=CC=C1 CCOC(=O)C(CC1=CC=CC=C1)NC(=O)NC(CC2=CC=CC=C2)C(=O)OCC 0 22 0
OC(=O)C1CCCN1C(=O)NC2=CC=CC=C2 OC(=O)C1CCCN1C(=O)NC2=CC=CC=C2 0 23 0
C\C(Cl)=N/OC(N)=O C\C(Cl)=N/OC(N)=O 0 27

> <Cluster_ID>
1163

> <Element_count>
1

> <Parent_ID>
1

$$$$

  Marvin  05290619172D

 23 24  0  0  0  0            999 V2000
    2.4230   -0.3587    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
    3.1375    0.0538    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    3.1375    0.8788    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
   -0.4349   -1.1837    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
   -1.1494   -1.5962    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -1.8638   -1.1837    0.0000 N   0  0  3  0  0  0  0  0  0  0  0  0
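
The cluster fields in the SDF output (Cluster_ID, Element_count, Parent_ID) can be read back with a few lines of code. The sketch below relies only on the general SDfile conventions: a `> <NAME>` header line, value lines ending at a blank line, and records separated by `$$$$`. It is not a ChemAxon API, and the function name is hypothetical.

```python
def read_sd_fields(text: str) -> list[dict[str, str]]:
    """Collect the data fields of each SDfile record as a dict of name -> value.

    Records that carry no data fields are skipped.
    """
    records = []
    for block in text.split("$$$$"):
        fields = {}
        lines = block.splitlines()
        i = 0
        while i < len(lines):
            line = lines[i]
            if line.startswith(">") and "<" in line:
                # field name sits between "<" and the last ">"
                name = line[line.index("<") + 1 : line.rindex(">")]
                i += 1
                value = []
                # value lines run until the next blank line
                while i < len(lines) and lines[i].strip():
                    value.append(lines[i])
                    i += 1
                fields[name] = "\n".join(value)
            else:
                i += 1
        if fields:
            records.append(fields)
    return records


sample = """> <Cluster_ID>
1163

> <Element_count>
1

> <Parent_ID>
1

$$$$
"""
cluster_fields = read_sd_fields(sample)
```

With the Parent_ID values, the cluster hierarchy can be rebuilt by linking each record to the record whose Cluster_ID equals its Parent_ID, assuming that is what the field denotes.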

LibMCS – RGroup decomposition

LibMCS – RGroup decomposition

LibMCS – Performance

• Depends on
  – average structure size
  – total diversity
  – minimal required MCS size
  – atom/bond constraints

• Scales linearly

• Maximum speed achieved
  – 1 000 structures in 3 seconds

• Memory requirements
  – 100 000 structures occupy 200MB
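
A quick back-of-the-envelope reading of these figures, set against the 5000-compounds-in-5-seconds goal from the LibMCS history slide. The calculation assumes the quoted maximum speed holds linearly at other library sizes and takes 1 MB as 10^6 bytes. Both are simplifications, since the slide notes that speed depends on the data set.

```python
best_rate = 1_000 / 3.0                  # structures per second at the quoted maximum speed
goal_rate = 5_000 / 5.0                  # rate needed to cluster 5000 compounds in 5 seconds

time_for_5000 = 5_000 / best_rate        # about 15 seconds at the quoted rate
speedup_needed = goal_rate / best_rate   # about a 3-fold speedup to reach the goal

bytes_per_structure = 200e6 / 100_000    # about 2000 bytes of memory per structure
```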

LibMCS – Performance

[Chart: running time (sec), 0 to 4000, versus structure count, 0 to 35000]

LibMCS – Further applications

• Find the MCS of existing clusters

• Data retrieval

• Assay analysis

• Compound acquisition

• Combinatorial library profiling

Development plans

• Disconnected MCS

• Multi-group clustering

• More chemical sense (e.g. avoid opening rings, consider chirality)

• Performance tuning (e.g. NN)

• Integrate Ward/Jarp into new GUI

• Additive clustering

• Clustering million compound libraries

• Integrate Chemical Terms

• Integrate molecular descriptors, optimized metrics

Summary

• New tool in JKlustor based on MCS

• More plausible grouping

• Hierarchical with dendrogram browser

• Statistics

• Filtering, coloring, selection

Acknowledgements

• Developers
  – Ferenc Csizmadia, Árpád Tamási, András Volford, Szilárd Doránt
  – Péter Vadász, Nóra Máté

• Special thanks