Making workflows Work
Enterprise KNIME deployment at Lilly
James A. Lumley (Research IT UK)
ChemAxon UGM Budapest 2014
Making workflows Work!
1. Why KNIME?
2. Old meets New
3. Don’t mention structures
4. Better conversions
20/05/2014
KNIME@Lilly ‘Freemium’ turned ‘Premium’
• 2010: Strong usage, including open-source contributions
by Mike Bodkin's UK CompChem group
• 2012: Research IT consolidated workflow tools via
KNIME.com Enterprise license and built an infrastructure
to develop and deploy the tool globally
20/05/2014 Company Confidential © 2014 Eli Lilly and Company
• Java/Eclipse platform allows easy creation of custom extensions (including security model)
• Server helps drive Sci/IT collaboration and knowledge capture
• Integration with existing legacy systems & data (esp. via SOA)
• Strong precedent for workflow software in Pharma
OpenSource Nodes:
• Due a ‘refresh’
• Chemaxon dependency in many
nodes including:
• Chemical structure handling
(conversions), sketcher (Marvin),
Molecule Difference check (testing)
and rendering (views)
Example Lilly Node
using Chemaxon:
• Multi-molecule sketcher
extension based on Marvin
• Configure to sketch and edit
multiple structures or reactions
• Output multiple structures
(port_0) or reactions (port_1)
on node execution
• Internally reuse code for
sketcher applet in webportal
2 years on, significant usage*:
• CompChem/MedChem, ADME Reporting, Analytical
Technologies Automation, Sample Management,
Automating Data ETL & Data Exploration…
* http://www.knime.com/files/004_kuduk2013-jamesalumley-lilly.pdf
* http://www.knime.com/knime-user-day-uk-2013-news
Many nodes link legacy systems:
1. Retain ‘trusted’ status of internal data access tools
(e.g.: internal system for integrated data access,
Mobius)*
2. Retain power of in-house legacy predictive modelling
code, e.g.: SVM models in Unix code
3. Interface with new systems e.g.: AT Structure
Verification tools
50% of >100 internal nodes use SOA or similar to
serve analytics tools and data to KNIME
* http://www.triconference.com/11/ird
Making workflows Work!
3. Don’t Mention Structures:
Getting KNIME to work with different data
security models
Huge reliance on SOA to provide Tools and Data to KNIME:
+ moves data security issues to web service layer
+ reduces CPU load on ‘office’ laptops
- Services need constant monitoring
- Large work effort adding NTLM Auth to Webservice nodes
• In-application support page/tab
• Status of Webservices (separates node errors from service layer errors)
• Links to Webpages
• Known Bugs/Issues from Redmine
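The idea of separating node errors from service layer errors can be sketched in Java. This is a minimal illustration under stated assumptions, not the Lilly support page: the class `ServiceStatusBoard`, the service names and the probes are all hypothetical, and a real probe would call the web service instead of returning a fixed value.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BooleanSupplier;

/** Hypothetical status board: tells node errors apart from service-layer errors. */
final class ServiceStatusBoard {
    enum Cause { NODE, SERVICE_LAYER }

    // Service name -> probe that returns true when the service responds.
    private final Map<String, BooleanSupplier> probes = new LinkedHashMap<>();

    void register(String serviceName, BooleanSupplier probe) {
        probes.put(serviceName, probe);
    }

    /** A probe that throws is treated the same as a service that is down. */
    boolean isUp(String serviceName) {
        BooleanSupplier probe = probes.get(serviceName);
        if (probe == null) {
            throw new IllegalArgumentException("Unknown service: " + serviceName);
        }
        try {
            return probe.getAsBoolean();
        } catch (RuntimeException e) {
            return false;
        }
    }

    /** Status of every registered service, in registration order. */
    Map<String, Boolean> snapshot() {
        Map<String, Boolean> status = new LinkedHashMap<>();
        for (String name : probes.keySet()) {
            status.put(name, isUp(name));
        }
        return status;
    }

    /** If the backing service is down, blame the service layer, else the node. */
    Cause classifyFailure(String serviceName) {
        return isUp(serviceName) ? Cause.NODE : Cause.SERVICE_LAYER;
    }

    public static void main(String[] args) {
        ServiceStatusBoard board = new ServiceStatusBoard();
        board.register("property-service", () -> true);   // stand-in probe
        board.register("structure-service", () -> false); // stand-in probe
        System.out.println(board.snapshot());
        System.out.println(board.classifyFailure("structure-service"));
    }
}
```

A node that fails while its backing service reports as up is flagged as a node error; otherwise the failure is attributed to the service layer.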
Making workflows Work!
4. Better conversions
Ensuring good interplay between the many
chemical data types in KNIME without users feeling
the pain
• Converter nodes in top 20 most commonly used nodes in analysis
of >2000 workflows on Lilly KNIME server
• Some workflows contain around 50% converter nodes
• New users confused by multiple molecule types and conversions
(Analysis from Summer 2012)
Example workflow (diagram), with converter nodes between the steps:
• Internal data retrieval system serves data and molecule as Chime type
• Property calculator needs CDK type
• Internal Unix code (service layer) requires SMILES value
• Lilly Matched Pairs node requires RDKit type
• Different Chemical
Types don’t work well
together
• Users constantly
converting chemical
data from one ‘type’ to
another
• Worse for Lilly nodes,
which use many formats
with no ‘standard’
vendor-like
representation
Aim:
• Remove need for user to manually add chemical converter nodes
• Ensure nodes that use different chemical formats work together
better
KNIME.com introduced “Adaptor Cell” in 2.9
• Container with several representations of same entity
• Node can add additional representations that can be re-used by
downstream nodes
• Avoids multiple conversions
• Original representation still present
• Vendor Specific! No pseudo standards such as SDF
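The container idea in the bullets above can be sketched in Java. This is a hedged illustration, not the KNIME adaptor cell API: `MoleculeAdapterCell`, `getOrConvert` and the type names are hypothetical, representations are plain strings, and the converter is a placeholder that performs no chemistry.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

/** Hypothetical stand-in for an adaptor cell; not the KNIME API. */
final class MoleculeAdapterCell {
    private final String originalType;
    // Type name -> textual representation; the original is always kept.
    private final Map<String, String> representations = new LinkedHashMap<>();

    MoleculeAdapterCell(String originalType, String originalValue) {
        this.originalType = originalType;
        representations.put(originalType, originalValue);
    }

    Optional<String> get(String type) {
        return Optional.ofNullable(representations.get(type));
    }

    /** Converts from the original once, then serves the cached copy downstream. */
    String getOrConvert(String type, Function<String, String> converter) {
        String cached = representations.get(type);
        if (cached == null) {
            cached = converter.apply(representations.get(originalType));
            representations.put(type, cached);
        }
        return cached;
    }

    public static void main(String[] args) {
        MoleculeAdapterCell cell = new MoleculeAdapterCell("SMILES", "CCO");
        // Placeholder converter: a real one would call a chemistry toolkit.
        Function<String, String> toFakeSdf = smiles -> "<sdf for " + smiles + ">";
        cell.getOrConvert("SDF", toFakeSdf); // first call converts
        cell.getOrConvert("SDF", toFakeSdf); // second call reuses the cached value
        System.out.println(cell.get("SMILES").isPresent()); // original still present
    }
}
```

The first node that needs a new representation pays for the conversion; downstream nodes asking for the same type reuse it, and the original representation is never overwritten.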
Representations shown in the diagram: SDF, RDKit, CDK, Indigo
Lilly Solution for (Pseudo) standards:
• Extension point for handling Molecule Type conversions
• Depends on Marvin library for Molecule conversions
• In development!
• Will be released opensource
• Extension point moves conversions into Node configuration
• Workflow still documents explicit type conversions
• Still retains support for Converter nodes if/when appropriate
• Converters could be ‘chained’ if a direct conversion is not available
(e.g.: InChI or Chime)
• Example shown in dialogue: before and after views for a node that
requires SMILES
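Chaining can be viewed as a shortest-path search over the direct conversions that are available. The Java sketch below assumes a breadth-first search; it is not the Lilly extension point. The class `ConverterChain`, the type names and the registered direct conversions are hypothetical, and only the route is computed, with no molecule conversion performed.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical planner that chains direct conversions between molecule types. */
final class ConverterChain {
    // Source type -> types reachable with one direct conversion.
    private final Map<String, List<String>> direct = new HashMap<>();

    void addDirect(String from, String to) {
        direct.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
    }

    /** Shortest chain of types from source to target, or an empty list if none. */
    List<String> findChain(String source, String target) {
        Map<String, String> previous = new HashMap<>();
        Set<String> seen = new HashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(source);
        seen.add(source);
        while (!queue.isEmpty()) {
            String current = queue.remove();
            if (current.equals(target)) {
                // Walk back through the predecessors to rebuild the chain.
                LinkedList<String> chain = new LinkedList<>();
                for (String t = target; t != null; t = previous.get(t)) {
                    chain.addFirst(t);
                }
                return chain;
            }
            for (String next : direct.getOrDefault(current, List.of())) {
                if (seen.add(next)) {
                    previous.put(next, current);
                    queue.add(next);
                }
            }
        }
        return List.of();
    }

    public static void main(String[] args) {
        ConverterChain chain = new ConverterChain();
        chain.addDirect("Chime", "MOL");   // assumed direct conversion
        chain.addDirect("MOL", "SMILES");  // assumed direct conversion
        System.out.println(chain.findChain("Chime", "SMILES")); // prints [Chime, MOL, SMILES]
    }
}
```

Breadth-first search returns the chain with the fewest conversion steps, which keeps the number of intermediate representations low.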
Making Workflows Work:
• Added many legacy tools and data services into
KNIME via custom nodes and SOA
• Aided usability by adding dashboard for service layer
monitoring
• Added authentication handling via NTLM Auth to
provide data authentication at source
• Adding molecule handling framework to reduce
number of molecule conversions users need
Acknowledgements
Java Coding & Infrastructure (Lilly):
Luke Bullard, Tom Wilkin
Project Management, End User support, Expert Users (& Testers), Previous Developers etc.:
Derek Marren, Marnie Williams, Pip Turner, Matt Hirst, Dave Thorner, Dave Evans, Mike Bodkin,
Niko Fechner, Roger Robinson, Jibo Wang, Christos Nicolaou, Beth Wright, Gary Sharman,
Simon Richards, Stuart Morton, Jason Ochoada, Jim Hughes
(In no particular order!)
KNIME.com
Bernd, Thorsten, Thomas, Aaron ++