Michael R. Berthold University of Konstanz, Germany
KNIME.com AG, Switzerland
The Berkeley R Language Beginner Study Group
Nov 19, 2013
R and KNIME: The Best of Two Worlds.
Agenda
• KNIME Overview
• Demo / Intro
• Interactive R Nodes
• A few Examples
• Q&A
A Brief History of KNIME
• 2004: KNIME development commences
• 2006: KNIME v1 released
• 2006: Spin-off in Konstanz, Germany
• 2006-2007: First commercial partners
• 2008: KNIME moves to Zurich
• 2010: Enterprise products released
• 2011: KNIME.com AG founded
• 2013: KNIME comes to the West Coast…
+3000 Organizations Using KNIME
• ~30% Life Science
• ~70% Business Intelligence, Analytics
+50 Very Active Community Developers
“KNIME saved my life in a world of scripts that I do not want to learn!” (2012)
Who’s Using KNIME?
• >17,000 individuals
• ~3,000 organizations worldwide
• ~300 KNIME.com customers
The KNIME Platform
KNIME loads and integrates data from diverse data sources:
• Different databases
• Various file formats (CSV, XML, SDF, etc.)
KNIME provides a huge repository of easy-to-use modules for:
• Data preprocessing
• Data fusion
• Data transformation
In addition to standard data mining techniques, KNIME adds cutting-edge data analysis algorithms (…thanks to its academic roots).
Interactive views provide data overviews and insights into the learned models. Interactive linking & brushing techniques allow for powerful exploration of models and data.
KNIME
Due to its open API and “node-in-a-sandbox” approach, additional (also external) tools are easily integrated, e.g.
• Access to the statistics tool R
• Complete integration of the machine learning library WEKA
• Application-area-specific integrations, e.g. CDK (Chemistry Development Kit), RDKit, ImageJ, …
KNIME is Eclipse-based: integrating other Eclipse projects such as BIRT, DTP, etc. provides even more functionality.
KNIME Selected Node Highlights
Over 1000 native and embedded nodes included:
• Statistics, Data Mining, Time Series, Image Processing, Neighborgrams, Web Analytics, Text Mining, Network Analysis, Social Media Analysis, WEKA, R
• Database Support, ETL, Text Processing, Data Generation, XML Read/Write, PMML Read/Write, Social Media Analysis, Business Intelligence, Community Nodes, 3rd Party Nodes
Advanced Visualization
Community Contributors
Technology Partners
Distribution & Consulting Partners
Community Contributors
Community User Base
Academic Institutions:
• Universität Tübingen (BALL, OpenMS)
• Freie Universität Berlin (SeqAn)
• MPI Dresden (ImgLib)
• Universität Dresden (Palladin)
• ETH Zürich (OpenBIS)
• Dublin University (OMERO)
• University of Wisconsin (ImageJ2)
• …
Commercial Contributors:
• Dymatrix Consulting Group (Uplift Nodes)
• Eli Lilly (ChemInf suite)
• Novartis (RDKit, Indigo)
• Vernalis (Proteomics)
• Cenix (SOAP Nodes)
• Böhringer-Ingelheim (various sponsored nodes)
• …
[Chart: Annual User Group Meeting Attendees, Oct-06 through Aug-13; vertical axis 0 to 200]
Dr. Rosaria Silipo (consultant)
Simon Richards (Eli Lilly)
Mike Mazanetz (Evotec)
What can I do with KNIME?
Standardization
Data Integration
Tool Integration – Version A
Tool Integration – Version B
Big Data: Clustering Meter IDs
• 30 clusters with k-Means on average daily, monthly, hourly, ... kW values
• Average hourly time series, cluster by cluster
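The clustering step on this slide can be sketched outside KNIME in a few lines of Java. This is a minimal illustration of plain k-means (Lloyd's algorithm) on per-meter load profiles, not the workflow shown in the talk: the class name MeterClustering, the evenly spaced seeding, and the four toy profiles with four hourly averages each are all assumptions made for the example. The slide itself uses 30 clusters on real meter data.

```java
import java.util.Arrays;

public class MeterClustering {

    /** Squared Euclidean distance between two load profiles of equal length. */
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int j = 0; j < a.length; j++) {
            double diff = a[j] - b[j];
            sum += diff * diff;
        }
        return sum;
    }

    /** Plain k-means (Lloyd's algorithm); returns the cluster index of each profile. Requires k <= profiles.length. */
    static int[] kMeans(double[][] profiles, int k, int maxIterations) {
        int n = profiles.length;
        int dim = profiles[0].length;
        // Illustrative seeding (an assumption): k evenly spaced profiles become the initial centroids.
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) {
            centroids[c] = profiles[c * n / k].clone();
        }
        int[] assignment = new int[n];
        Arrays.fill(assignment, -1);
        for (int iter = 0; iter < maxIterations; iter++) {
            // Assignment step: attach every profile to its nearest centroid.
            boolean changed = false;
            for (int i = 0; i < n; i++) {
                int best = 0;
                for (int c = 1; c < k; c++) {
                    if (squaredDistance(profiles[i], centroids[c])
                            < squaredDistance(profiles[i], centroids[best])) {
                        best = c;
                    }
                }
                if (assignment[i] != best) {
                    assignment[i] = best;
                    changed = true;
                }
            }
            if (!changed) {
                break; // converged: no profile switched cluster
            }
            // Update step: move each centroid to the mean of its members.
            double[][] sums = new double[k][dim];
            int[] counts = new int[k];
            for (int i = 0; i < n; i++) {
                counts[assignment[i]]++;
                for (int j = 0; j < dim; j++) {
                    sums[assignment[i]][j] += profiles[i][j];
                }
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) {
                    continue; // keep the old centroid if a cluster is empty
                }
                for (int j = 0; j < dim; j++) {
                    centroids[c][j] = sums[c][j] / counts[c];
                }
            }
        }
        return assignment;
    }

    public static void main(String[] args) {
        // Toy data: average kW per time slot for four hypothetical meters.
        double[][] profiles = {
            {1.0, 1.0, 5.0, 5.0},
            {1.1, 0.9, 5.2, 4.8},
            {5.0, 5.0, 1.0, 1.0},
            {4.9, 5.1, 1.0, 1.2}
        };
        int[] clusters = kMeans(profiles, 2, 100);
        System.out.println(Arrays.toString(clusters));
    }
}
```

Averaging the member profiles of each cluster afterwards gives the "average hourly time series, cluster by cluster" view named on the slide.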
KNIME and Big Data
• Big ETL
• Big Analytics
• Big Data(bases)
What else is KNIME used for?
And more…:
• Next Best Offer
• Survey Analysis
• (Big) Time Series Data
• …
Commercial KNIME (Attention – sales pitch!)
Tools for Collaboration:
• KNIME TeamSpace
• KNIME Server
• Training, Consulting, and Custom Development
Standardization: KNIME TeamSpace at Work
Standardization: The KNIME Server in its element
Resources
http://www.knime.org/learning-hub
• Links to Guides, White Papers, Documentation, and the KNIME YouTube Channel
• Tons of example workflows!
http://www.knime.org/knimepress
• Books for Beginners, Advanced KNIME Users, and SAS Users
• Free Beginner’s Guide: use code “meetupsf13”
The R in KNIME Webinar: http://www.youtube.com/watch?v=wCvnO96d8h4
Demo.
Why use KNIME and R?
• Powerful statistics
• Leading edge algorithms
• Powerful/flexible graphics
• Widely accepted language
• Powerful user interface
• Designed for big data
• Integrates com and org tools
• Enterprise grade solutions
• Open source analytics
• Cross platform
• Vibrant communities
R KNIME
R in KNIME: 3 ways to play…
• Community (RServe Integration)
• Core (Deprecated soon)
• R Interactive (Today's topic)
Overview of R (Interactive)
• Different input and output options
• Grey ports enable workspace branching
The Interactive Editor
[Screenshot labels: Columns, Variables, Code Editor, Workspace Overview, Console, Templates, Preview, List, Summary]
Node: R Source
• Get data from an R data frame
• Assign output to knime.out
• Use with foreign, RCurl, or ...
Node: R Snippet
• Generic data manipulation
• Derive knime.out from knime.in
• Use with grep(), plyr, or ...
Nodes: R Mining
• Use R models in KNIME
• Learner & Predictor motif
• PMML support for portability
Nodes: R View
• Generic R plots
• plot(knime.in)
• Use with many packages including ggplot2
Metanodes and R: Quickforms
Metanodes and R: Deployment
• Abstract: Configure w/ simple dialog
• Share (TeamSpace/Server)
• Deploy (KNIME Webportal)
Embedding plots in BIRT
• Generate plots in R
• Send to BIRT
EQPOL Data with Bioconductor I
• EQPOL: External Quality Assurance Program Oversight Laboratory
• NIH, NIAID, DAIDS program for QA of HIV/AIDS research
• Can machine learning automate some manual analysis?
• Problem: Lots of real data (~100,000,000 rows)
• Bioconductor provides flowCore to make this easier
EQPOL Data with Bioconductor II
(Node) Development
KNIME Platform: Technology Overview
[Architecture diagram; labels:]
• KNIME Workflow Manager & User Interface
• KNIME Data Management and Execution Layer: Execution Control, Meta Data Handling, Data Management
• Node groups: KNIME I/O, KNIME Native Algorithms, Open Source Integrations (R, BIRT, …), Partner Extensions, Community Extensions
• Node Interface and Data Mgmt & Execution Ctrl (shown once per node group)
• Cluster Execution, Multi Core Execution, Distributed Data Storage, Distributed Execution, In Memory Data Handling, Automatic Data Caching
• Data Type Extensions
Node Architecture
• KNIME interacts only with a Node
• Node takes care of embedding the node in the infrastructure
• New nodes implement Model/View/Dialog
[Class diagram labels:]
• class Node (final)
• class NodeDialogPane (abstract)
• class NodeView (abstract)
• class NodeModel (abstract)
• class NodeFactory (abstract)
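The Model/View/Dialog split can be illustrated with a small, self-contained Java sketch. It does not use the KNIME API: every class below (SketchNode, SketchNodeModel, SketchNodeDialogPane, SketchNodeView, SketchNodeFactory, UpperCaseFactory) is a hypothetical stand-in with invented method signatures, chosen only to mirror the roles on the slide. The real KNIME base classes have richer contracts.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class NodeArchitectureSketch {

    /** Stand-in for the abstract model class: holds the node's actual logic. */
    abstract static class SketchNodeModel {
        abstract List<String> execute(List<String> input);
    }

    /** Stand-in for the abstract dialog class: exposes the node's settings. */
    abstract static class SketchNodeDialogPane {
        abstract String describeSettings();
    }

    /** Stand-in for the abstract view class: presents the model's output. */
    abstract static class SketchNodeView {
        abstract String render(List<String> output);
    }

    /** Stand-in for the abstract factory: the single entry point a new node provides. */
    abstract static class SketchNodeFactory {
        abstract SketchNodeModel createModel();
        abstract SketchNodeDialogPane createDialogPane();
        abstract SketchNodeView createView();
    }

    /** Final wrapper: the only class the surrounding framework talks to. */
    static final class SketchNode {
        private final SketchNodeModel model;
        private final SketchNodeDialogPane dialog;
        private final SketchNodeView view;

        SketchNode(SketchNodeFactory factory) {
            this.model = factory.createModel();
            this.dialog = factory.createDialogPane();
            this.view = factory.createView();
        }

        String settings() {
            return dialog.describeSettings();
        }

        String run(List<String> input) {
            return view.render(model.execute(input));
        }
    }

    /** A concrete example node: upper-cases every input value. */
    static final class UpperCaseFactory extends SketchNodeFactory {
        @Override
        SketchNodeModel createModel() {
            return new SketchNodeModel() {
                @Override
                List<String> execute(List<String> input) {
                    List<String> out = new ArrayList<>();
                    for (String value : input) {
                        out.add(value.toUpperCase(Locale.ROOT));
                    }
                    return out;
                }
            };
        }

        @Override
        SketchNodeDialogPane createDialogPane() {
            return new SketchNodeDialogPane() {
                @Override
                String describeSettings() {
                    return "no settings";
                }
            };
        }

        @Override
        SketchNodeView createView() {
            return new SketchNodeView() {
                @Override
                String render(List<String> output) {
                    return String.join(", ", output);
                }
            };
        }
    }

    public static void main(String[] args) {
        SketchNode node = new SketchNode(new UpperCaseFactory());
        List<String> input = new ArrayList<>();
        input.add("r");
        input.add("knime");
        System.out.println(node.settings());
        System.out.println(node.run(input));
    }
}
```

Because the wrapper is final and everything node-specific sits behind the factory, the surrounding framework never needs to know which concrete model, dialog, or view it is running. That is the point of "KNIME interacts only with a Node".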
Node Extension Wizard
• Included in the KNIME Developer Version
• Allows creation of plugin projects including functioning KNIME nodes (with sample code)
• Helpful to easily create all node classes
– Generates all Java classes
– Node is registered with the plugin project
– Launch KNIME and enjoy the new node working!
Node Extension Wizard
• Specify all settings to create a new KNIME node
– In a completely new plugin project, or
– Into an existing project
• Node type: Sink, Source, Learner, Predictor, Manipulator, Visualizer, Meta, or Other
• Include sample code or not
Node Extension Wizard
• Contains all Java classes (including sample code)
• Node is registered in the plugin.xml
• NodeDialog and NodeView class are also created and registered to the NodeFactory