Indiana University School of
Projects 1-4 update
David Wild
CICC Quarterly Meeting
January 27th 2006
CICC-related projects
• Formal CICC projects
  1. Innovative cross-screen analysis of NIH DTP Human Tumor Cell Line Data – innovative scientific analysis of NIH HTS data
  2. Development of cheminformatics web services and use cases in Taverna – web service & workflow infrastructure
  3. Development of a novel interface for the analysis of PubChem HTS data – tools for interacting with lots of complex data
  4. A structure storage and searching system for Distributed Drug Discovery – innovative kinds of chemical databases
• Other, related projects
  – Fast clustering of very large datasets using Linux clusters
  – Smart client for mining drug discovery data (Microsoft supported)
[Diagram: how the projects relate]
• PROJECT 1: Innovative cross-screen analysis of HTS data
• PROJECT 2: Web services & workflows
• PROJECT 3: Visualization, navigation & analysis tools for HTS data
• PROJECT 4: Experimental databases
• SMART CLIENT: Smart interfaces (incl. NLP, RSS, agents, etc.)
• SMART CLIENT: General drug discovery web services & workflows
• FAST PARALLEL CLUSTERING: Using DivKmeans & AVIDD
Desired outcomes by Summer 2006
• A chemical informatics web service infrastructure running at IU
• Several Taverna workflows that use these and other web services, and which demonstrate that the infrastructure can be used to perform complex, relevant operations on PubChem data
• Demonstrated scientific results with the NIH DTP data
• An established Distributed Drug Discovery database linked with PubChem, that shows that our techniques together with PubChem can be employed in ways which benefit humanity in general
• A sandbox PubChem copy with improved functionality and architecture
• One or more novel visualization tools for PubChem data
• Demonstrate the feasibility of fast, accurate clustering of very large datasets (including the whole of PubChem) using the AVIDD Linux Cluster and a parallelized clustering algorithm (DivKmeans)
• Show that .NET and Java-based web services can work well together in a common infrastructure
• Demonstrate the feasibility of a natural language or other straightforward interface for scientists to express their information needs
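The clustering outcome above names a divisive k-means algorithm (DivKmeans). As a rough illustration of the general idea only, the sketch below repeatedly bisects the largest cluster with plain 2-means until the requested number of clusters is reached. It is a serial, generic bisecting k-means written for this note, not the BCI DivKmeans program or its parallelized form; the function names, the use of numeric tuples instead of chemical fingerprints, and the "split the largest cluster" rule are all assumptions.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of tuples."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def two_means(points, rng, iterations=20):
    """Split points into two non-empty groups with k-means (k=2).

    Returns None if no split into two non-empty groups was found.
    """
    centers = rng.sample(points, 2)
    groups = None
    for _ in range(iterations):
        new_groups = ([], [])
        for p in points:
            nearer = 0 if sq_dist(p, centers[0]) <= sq_dist(p, centers[1]) else 1
            new_groups[nearer].append(p)
        if not new_groups[0] or not new_groups[1]:
            break  # keep the last valid split, if any
        groups = new_groups
        new_centers = [mean(groups[0]), mean(groups[1])]
        if new_centers == centers:
            break  # converged
        centers = new_centers
    return groups


def divisive_kmeans(points, k, seed=0):
    """Bisect the largest cluster until there are k clusters (hypothetical rule)."""
    rng = random.Random(seed)
    clusters = [list(points)]
    while len(clusters) < k:
        clusters.sort(key=len, reverse=True)
        largest = clusters[0]
        if len(largest) < 2:
            break
        split = two_means(largest, rng)
        if split is None:
            break
        clusters = [split[0], split[1]] + clusters[1:]
    return clusters


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (20, 0), (21, 0)]
    for cluster in divisive_kmeans(data, 3):
        print(cluster)
```

Because each split only touches one cluster, the sub-clusters can be processed independently, which is one reason a divisive scheme lends itself to a Linux cluster; the sketch does not attempt that.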
Workflow: Cluster the compounds in the NIH DTP database by chemical structure, then choose representative compounds from the clusters and dock them into PDB protein files of interest

Services (tool used):
• NIH Database Service (PostgreSQL, CHORD)
• Fingerprint Generator (BCI Makebits)
• Cluster Analysis (BCI Divkmeans)
• Table Management (VoTables)
• Plot Visualizer (VoPlot)
• Docking Selector (Script)
• 2D-3D (OpenEye OMEGA)
• Docking (OpenEye FRED)
• 3D Visualizer (JMOL)
• PDB Database Service

Data passed between services:
• SMILES + ID
• Fingerprints
• SMILES + ID + Data
• Cluster Membership
• SMILES + ID + Cluster # + Data
• MOL File
• PDB Structure + Box
• Docked Complex
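The slide does not say how representative compounds are chosen from each cluster. One plausible reading of the "Docking Selector" step is sketched below: for every cluster, pick the medoid, i.e. the member with the highest total Tanimoto similarity to the other members. The record layout (ID, cluster number, set of on-bit positions), the function names, and the compound IDs are hypothetical and are not taken from the CICC services.

```python
from collections import defaultdict


def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit positions."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def pick_representatives(records):
    """Return {cluster_number: compound_id}, choosing each cluster's medoid.

    records: iterable of (compound_id, cluster_number, fingerprint_bits).
    """
    by_cluster = defaultdict(list)
    for compound_id, cluster, bits in records:
        by_cluster[cluster].append((compound_id, frozenset(bits)))

    representatives = {}
    for cluster, members in by_cluster.items():
        def total_similarity(member):
            # Sum of similarities to every member of the same cluster.
            return sum(tanimoto(member[1], other[1]) for other in members)

        representatives[cluster] = max(members, key=total_similarity)[0]
    return representatives


if __name__ == "__main__":
    # Made-up IDs and fingerprints, for illustration only.
    example = [
        ("cmpd-A", 1, {1, 2, 3}),
        ("cmpd-B", 1, {1, 2}),
        ("cmpd-C", 1, {1, 2, 3, 4}),
        ("cmpd-D", 2, {9}),
    ]
    print(pick_representatives(example))
```

The selected IDs would then be looked up as SMILES, converted to 3D, and passed on to docking; those steps depend on the OpenEye tools named in the diagram and are not sketched here.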
[Workflow diagram]

Services (tool used):
• NIH Database Service (PostgreSQL, CHORD)
• PDB Ligand Database Service
• PDB Database Service
• Docking Selector (Script)
• 2D-3D (OpenEye OMEGA)
• Docking (OpenEye FRED)
• 3D Visualizer (JMOL)

Data passed between services:
• SMILES + ID + Data
• NIH SMILES + ID
• MOL File
• Protein
• Docked Complex