Making workflows Work Enterprise KNIME deployment at Lilly James A. Lumley (Research IT UK) ChemAxon UGM Budapest 2014

EUGM 2014 - James Lumley (Eli Lilly and Co.): Making Workflows Work: Enterprise deployment of KNIME at Lilly

Making workflows Work!

1. Why KNIME?

2. Old meets New

3. Don’t mention structures

4. Better conversions

20/05/2014

Making workflows Work!

1. Why KNIME?

KNIME@Lilly ‘Freemium’ turned ‘Premium’

• 2010: Strong usage, including open-source contributions, by Mike Bodkin's UK CompChem group

• 2012: Research IT consolidated workflow tools via a KNIME.com Enterprise license and built an infrastructure to develop and deploy the tool globally

20/05/2014 Company Confidential © 2014 Eli Lilly and Company

• Java/Eclipse platform allows easy creation of custom extensions (including security model)

• Server helps drive Sci/IT collaboration and knowledge capture

• Integration with existing legacy systems & data (esp. via SOA)

• Strong precedent for workflow software in Pharma

Infrastructure to support the deployment

OpenSource Nodes:

• Due a ‘refresh’

• ChemAxon dependency in many nodes, including chemical structure handling (conversions), sketcher (Marvin), Molecule Difference check (testing) and rendering (views)

Example Lilly Node using ChemAxon:

• Multi-molecule sketcher extension based on Marvin

• Configure to sketch and edit multiple structures or reactions

• Output multiple structures (port_0) or reactions (port_1) on node execution

• Internally reuse code for sketcher applet in webportal

2 years on, significant usage*:

• CompChem/MedChem, ADME Reporting, Analytical Technologies Automation, Sample Management, Automating Data ETL & Data Exploration…

* http://www.knime.com/files/004_kuduk2013-jamesalumley-lilly.pdf

* http://www.knime.com/knime-user-day-uk-2013-news

Making workflows Work!

2. Old Meets New:

KNIME working alongside legacy systems

Many nodes link legacy systems:

1. Retain ‘trusted’ status of internal data access tools (e.g. internal system for integrated data access, Mobius)*

2. Retain power of in-house legacy predictive modelling code, e.g. SVM models (Unix code)

3. Interface with new systems, e.g. AT Structure Verification tools

50% of >100 internal nodes use SOA or similar to serve analytics tools and data to KNIME

* http://www.triconference.com/11/ird

Making workflows Work!

3. Don’t Mention Structures:

Getting KNIME to work with different data security models

Huge reliance on SOA to provide Tools and Data to KNIME:

+ moves data security issues to web service layer

+ reduces CPU load on ‘office’ laptops

- Services need constant monitoring

- Large work effort adding NTLM Auth to Webservice nodes

• In-application support page/tab

• Status of Webservices (separates node errors from service layer errors)

• Links to Webpages

• Known Bugs/Issues from Redmine
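
The idea of separating node errors from service layer errors can be sketched roughly as follows. This is only an illustration in Python, not the Lilly implementation (KNIME nodes themselves are written in Java). The names `check_services`, `classify_failure` and the two probe functions are hypothetical, and the probes stand in for real web service health checks.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class ServiceStatus:
    name: str
    up: bool
    detail: str


def check_services(probes: Dict[str, Callable[[], None]]) -> Dict[str, ServiceStatus]:
    """Run each probe; a probe signals an outage by raising an exception."""
    report = {}
    for name, probe in probes.items():
        try:
            probe()
            report[name] = ServiceStatus(name, True, "OK")
        except Exception as exc:
            report[name] = ServiceStatus(name, False, f"{type(exc).__name__}: {exc}")
    return report


def classify_failure(service: str, report: Dict[str, ServiceStatus]) -> str:
    """Attribute a failed node execution to the service layer or to the node."""
    status = report.get(service)
    if status is None:
        return "unknown service"
    # If the backing service is healthy, the problem is assumed to be in the node.
    return "node error" if status.up else "service layer error"


# Hypothetical probes; a real one would call the web service's status endpoint.
def _healthy() -> None:
    return None


def _down() -> None:
    raise ConnectionError("no response")


report = check_services({"data-retrieval": _healthy, "property-models": _down})
```

A support tab could render `report` directly, so a user whose node fails can see at a glance whether the service behind it was down.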

Making workflows Work!

4. Better conversions

Ensuring good interplay between the many chemical data types in KNIME without users feeling the pain

• Converter nodes in top 20 most commonly used nodes in analysis of >2000 workflows on Lilly KNIME server

• Some workflows contain around 50% converter nodes

• New users confused by multiple molecule types and conversions

(Analysis from Summer 2012)
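
A usage analysis of this kind could be sketched as below. The slides do not say how the server analysis was done, so this is an assumption-laden illustration: the workflow contents, the node names and the `CONVERTERS` set are all made up, and each workflow is reduced to a flat list of node names.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set


def node_usage(workflows: Dict[str, List[str]]) -> Counter:
    """Count how often each node type appears across all workflows."""
    counts: Counter = Counter()
    for nodes in workflows.values():
        counts.update(nodes)
    return counts


def converter_fraction(nodes: Iterable[str], converters: Set[str]) -> float:
    """Fraction of a workflow's nodes that are converter nodes."""
    nodes = list(nodes)
    if not nodes:
        return 0.0
    return sum(1 for n in nodes if n in converters) / len(nodes)


# Hypothetical data: node names are placeholders, not real KNIME node names.
CONVERTERS = {"to_rdkit", "to_cdk", "to_smiles"}
workflows = {
    "wf_a": ["reader", "to_rdkit", "matched_pairs", "to_smiles"],
    "wf_b": ["reader", "to_cdk", "property_calc", "writer"],
}

usage = node_usage(workflows)
fractions = {name: converter_fraction(nodes, CONVERTERS) for name, nodes in workflows.items()}
```

Here `usage.most_common(20)` would give a top-20 ranking, and `fractions` flags workflows dominated by converters.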

[Diagram: example workflow in which each tool expects a different molecule type, with converter nodes between them. The internal data retrieval system serves data and the molecule as Chime type; the Lilly Matched Pairs node requires RDKit type; the internal Unix code (service layer) requires a SMILES value; the property calculator needs CDK type.]

• Different chemical types don’t work well together

• Users constantly converting chemical data from one ‘type’ to another

• Worse for Lilly nodes that utilise many formats with no ‘standard’ vendor-like representation

Aim:

• Remove need for user to manually add chemical converter nodes

• Ensure nodes that use different chemical formats work together better

KNIME.com introduced “Adaptor Cell” in 2.9

• Container with several representations of same entity

• Node can add additional representations that can be re-used by downstream nodes

• Avoids multiple conversions

• Original representation still present

• Vendor Specific! No pseudo standards such as SDF
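
The container idea can be illustrated with a minimal sketch. This is not the KNIME API (which is Java); the `AdapterCell` class below, its method names and the `fake_to_rdkit` converter are hypothetical, and the "conversion" is a stand-in that does no real chemistry.

```python
from typing import Any, Callable, Dict, List


class AdapterCell:
    """Holds several representations of one entity, keyed by type name."""

    def __init__(self, type_name: str, value: Any) -> None:
        self.original_type = type_name
        self._reps: Dict[str, Any] = {type_name: value}

    def has(self, type_name: str) -> bool:
        return type_name in self._reps

    def get(self, type_name: str) -> Any:
        return self._reps[type_name]

    def get_or_convert(self, type_name: str, convert: Callable[[Any], Any]) -> Any:
        """Return a cached representation, converting from the original only once."""
        if type_name not in self._reps:
            self._reps[type_name] = convert(self._reps[self.original_type])
        return self._reps[type_name]


calls: List[str] = []


def fake_to_rdkit(sdf: str) -> tuple:
    # Placeholder conversion; records each call so re-use can be observed.
    calls.append(sdf)
    return ("rdkit-object", sdf)


cell = AdapterCell("SDF", "ethanol SDF block")
first = cell.get_or_convert("RDKit", fake_to_rdkit)   # converts
second = cell.get_or_convert("RDKit", fake_to_rdkit)  # re-uses the stored value
```

The second request does not trigger another conversion, and the original SDF value is still available through `cell.get("SDF")`.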

[Figure labels: SDF, RDKit, CDK, Indigo]

Lilly Solution for (Pseudo) standards:

• Extension point for handling Molecule Type conversions

• Depends on Marvin library for Molecule conversions

• In development!

• Will be released opensource

[Figure: Before]

• Extension point moves conversions into Node configuration

• Workflow still documents explicit type conversions

• Still retains support for Converter nodes if/when appropriate

Converters could be ‘chained’ if direct conversion not available (e.g.: InChI or Chime). Example shown in dialogue:

[Figure: node dialogue, Before and After; the node requires SMILES]
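
One way to picture chaining is a shortest-path search over registered converters. The sketch below is an assumption about how such a framework might behave, not the Lilly extension point: `ConverterRegistry` and its methods are hypothetical, the intermediate `MOL` type is chosen only for illustration, and the converters just wrap strings rather than calling Marvin.

```python
from collections import deque
from typing import Callable, Dict, List, Optional

Converter = Callable[[str], str]


class ConverterRegistry:
    """Registers direct converters and chains them when no direct one exists."""

    def __init__(self) -> None:
        self._edges: Dict[str, Dict[str, Converter]] = {}

    def register(self, source: str, target: str, fn: Converter) -> None:
        self._edges.setdefault(source, {})[target] = fn

    def find_chain(self, source: str, target: str) -> Optional[List[str]]:
        """Breadth-first search for the shortest list of types from source to target."""
        if source == target:
            return [source]
        queue = deque([[source]])
        seen = {source}
        while queue:
            path = queue.popleft()
            for nxt in self._edges.get(path[-1], {}):
                if nxt in seen:
                    continue
                if nxt == target:
                    return path + [nxt]
                seen.add(nxt)
                queue.append(path + [nxt])
        return None

    def convert(self, value: str, source: str, target: str) -> str:
        chain = self.find_chain(source, target)
        if chain is None:
            raise LookupError(f"no conversion from {source} to {target}")
        # Apply each direct converter along the chain in order.
        for a, b in zip(chain, chain[1:]):
            value = self._edges[a][b](value)
        return value


registry = ConverterRegistry()
registry.register("Chime", "MOL", lambda v: f"MOL({v})")      # placeholder converter
registry.register("MOL", "SMILES", lambda v: f"SMILES({v})")  # placeholder converter

chain = registry.find_chain("Chime", "SMILES")
result = registry.convert("x", "Chime", "SMILES")
```

With only these two converters registered, a Chime input reaches a node that requires SMILES through the intermediate type, while the reverse direction has no chain.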

Making Workflows Work:

• Added many legacy tools and data services into KNIME via custom nodes and SOA

• Aided usability by adding dashboard for service layer monitoring

• Added authentication handling via NTLM Auth to provide data authentication at source

• Adding molecule handling framework to reduce number of molecule conversions users need

Acknowledgements

Java Coding & Infrastructure (Lilly):

Luke Bullard, Tom Wilkin

Project Management, End User support, Expert Users (& Testers), Previous Developers etc.:

Derek Marren, Marnie Williams, Pip Turner, Matt Hirst, Dave Thorner, Dave Evans, Mike Bodkin, Niko Fechner, Roger Robinson, Jibo Wang, Christos Nicolaou, Beth Wright, Gary Sharman, Simon Richards, Stuart Morton, Jason Ochoada, Jim Hughes

(In no particular order!)

KNIME.com

Bernd, Thorsten, Thomas, Aaron ++