What's new…
Bernd Wiswedel
KNIME.com AG, Zurich, Switzerland
Two feature releases last year: 2.6 and 2.7
Documented in the changelog, the “What's new” summary, and as videos on YouTube
“What's new” page on knime.org
KNIMETV YouTube channel
Outline
Illustrative examples
• Swiss Survival Analysis
• KNIME Forum Analysis
• (Next Best Offer)
New Features in 2.6 & 2.7
Swiss Survival Analysis
• Survival Analysis / Actuarial Tables
• Using population and deaths data to predict
longevity
• Creating the tables
• Investigating the tables
• Creating customer tables for:
• Overall
• Personal
• Historical
• Forecasting
• Make it easy to use for the non-expert!
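A minimal sketch of how an actuarial (period life) table can be derived from population and deaths data. The counts below are made up for illustration, and the conversion from death rate to death probability, q = m / (1 + m/2), is a common textbook approximation; it is an assumption here, not necessarily what the KNIME workflow does.

```python
from dataclasses import dataclass


@dataclass
class AgeGroup:
    age: int           # age at the start of the one-year interval
    population: float  # mid-year population at that age (hypothetical numbers)
    deaths: float      # deaths observed at that age during the year


def life_expectancy(groups):
    """Return the remaining life expectancy for each age of a period life table."""
    survivors = 1.0
    alive = []  # share of a birth cohort alive at the start of each interval
    lived = []  # person-years lived within each interval
    for index, group in enumerate(groups):
        rate = group.deaths / group.population       # central death rate m_x
        prob = min(1.0, rate / (1.0 + 0.5 * rate))   # death probability q_x
        alive.append(survivors)
        if index == len(groups) - 1:
            # Open-ended last interval: everybody left dies at the observed rate.
            lived.append(survivors / rate if rate > 0 else 0.0)
        else:
            lived.append(survivors * (1.0 - 0.5 * prob))
            survivors *= 1.0 - prob
    expectancy = {}
    remaining = 0.0
    for group, l_x, big_l_x in reversed(list(zip(groups, alive, lived))):
        remaining += big_l_x
        expectancy[group.age] = remaining / l_x if l_x > 0 else 0.0
    return expectancy


# Toy table: nobody dies at age 0, half of the population dies per year afterwards.
table = life_expectancy([AgeGroup(0, 1000, 0), AgeGroup(1, 1000, 500)])
```

In the toy table the open-ended last group contributes 1 / 0.5 = 2 expected years, so a newborn has one certain year plus those two.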
KNIME Forum Analysis
Learn something about the KNIME forum:
http://tech.knime.org/forum
Challenges:
• Get data into KNIME
• Extract simple statistics (how many posts,
response time, response length)
• Classify topics and detect topic shifts
• Identify content and users
Forum Analysis – Get Data
Two alternatives:
• Connect to the underlying database and read the content
  Doable, but complicated and not generic: 7+ tables need to be read, prepared and joined
• Crawl the web page and parse the HTML
• Use the XML parser and Palladian’s HTML retriever nodes
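As a rough illustration of the "crawl and parse" alternative outside of KNIME, the sketch below collects links from a page using only the Python standard library. The sample markup and the thread paths are invented for this example; the real forum pages will look different, and the actual workflow uses the XML parser and Palladian nodes instead.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the absolute target of every <a href="..."> on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            self.links.append(urljoin(self.base_url, href))


def extract_links(html, base_url):
    parser = LinkCollector(base_url)
    parser.feed(html)
    return parser.links


# Hypothetical page fragment; the paths are assumptions for illustration only.
SAMPLE_PAGE = """
<html><body>
  <a href="/forum/knime-general/some-thread">Some thread</a>
  <a href="/forum/knime-general?page=1">next</a>
  <a name="anchor-without-href">skipped</a>
</body></html>
"""

links = extract_links(SAMPLE_PAGE, "http://tech.knime.org/forum")
```

A crawler would fetch each collected sub-page in turn and repeat the extraction for the threads found there.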
Forum Analysis – Structure of forum
• Several categories: “KNIME General”, “KNIME Reporting”, “Palladian”, … (~20 in total)
• Discussion threads on several sub-pages
• Each thread consists of an initial post and a variable number of comments
Forum Analysis – Crawler Flow
The crawler output is the input for all subsequent workflows!
Forum Analysis – Simple Statistics
• Input table from crawler workflow
• Meta nodes perform simple preprocessing, e.g. average number of active users per month
• Many different reporting nodes with different statistics; reporting extension generates PDF, DOC, …
• Number of active users per year: an active user is a user with at least one comment or one post in that year
• Number of posts per year: numbers are just posts (new discussion threads), not comments
• Number of posts per month and year: big increase in early 2011. Coincidentally, Simon Richards (richards99) joined
• Who comments/answers on posts?
• Response time
• Number of comments per post
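The statistics above are simple aggregations over the crawler table. The sketch below shows one possible way to compute three of them in plain Python; the row layout (thread id, kind, user, timestamp) and the sample rows are assumptions made for this example, not the actual column layout of the crawler output.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical rows: (thread_id, kind, user, timestamp)
ROWS = [
    (1, "post", "alice", datetime(2011, 1, 10, 9, 0)),
    (1, "comment", "bob", datetime(2011, 1, 10, 11, 30)),
    (1, "comment", "alice", datetime(2011, 1, 11, 8, 0)),
    (2, "post", "carol", datetime(2012, 3, 5, 14, 0)),
    (2, "comment", "bob", datetime(2012, 3, 6, 14, 0)),
]


def active_users_per_year(rows):
    """A user is active in a year with at least one post or comment in it."""
    users = defaultdict(set)
    for _thread, _kind, user, stamp in rows:
        users[stamp.year].add(user)
    return {year: len(names) for year, names in sorted(users.items())}


def posts_per_year(rows):
    """Counts new discussion threads only, not comments."""
    counts = defaultdict(int)
    for _thread, kind, _user, stamp in rows:
        if kind == "post":
            counts[stamp.year] += 1
    return dict(sorted(counts.items()))


def first_response_hours(rows):
    """Hours between the initial post and the earliest comment per thread."""
    posted, answered = {}, {}
    for thread, kind, _user, stamp in rows:
        if kind == "post":
            posted[thread] = stamp
        elif thread not in answered or stamp < answered[thread]:
            answered[thread] = stamp
    return {
        thread: (answered[thread] - posted[thread]).total_seconds() / 3600.0
        for thread in posted
        if thread in answered
    }
```

Threads without any comment are left out of the response time result rather than counted as zero.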
Forum Analysis – Classify Posts
• Use text mining to classify forum posts into categories such as ‘io’, ‘manipulation’, ‘mining’, …
• No training set available, so (mis-)use the KNIME node descriptions
• See evolution of discussion topics over the years
• Goal: classify forum posts (only the first post, no comments) using the KNIME node description text as labeled training set
• Node descriptions are read from XML dumps (generated with the KNIME command line tool); the forum data input file is prepared with text mining tools
• An archive with all XML files is unzipped into a temp location
• XML files are read in a loop and preprocessed (header and footer removed)
• Each description is converted into a KNIME text document, from which (stemmed) terms are extracted
• Training data extracted: learning attributes are keyword occurrences; target is the document category
• The model is verified by splitting the data into train/test
• A random forest classifier is used to address the high dimensionality of the small (and sparse) data set
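To make "learning attributes are keyword occurrences" concrete, here is a small sketch that turns labeled texts into a 0/1 occurrence table and encodes a forum post with the same keywords. The three training texts are invented stand-ins for node descriptions, stemming is left out for brevity, and the classifier itself (a random forest in the workflow) is not part of this sketch.

```python
import re


def terms(text):
    """Lower-cased set of words in a text (no stemming in this sketch)."""
    return set(re.findall(r"[a-z]+", text.lower()))


# Hypothetical (category, description) pairs standing in for node descriptions.
TRAINING = [
    ("io", "Reads a CSV file from disk"),
    ("manipulation", "Filters rows of the input table"),
    ("mining", "Trains a decision tree model"),
]


def build_vocabulary(docs):
    vocabulary = set()
    for _label, text in docs:
        vocabulary |= terms(text)
    return sorted(vocabulary)


def occurrence_row(text, vocabulary):
    """One attribute per keyword: 1 if it occurs in the text, else 0."""
    present = terms(text)
    return [1 if word in present else 0 for word in vocabulary]


vocabulary = build_vocabulary(TRAINING)
# Learning attributes are keyword occurrences; the target is the category.
training_table = [(occurrence_row(text, vocabulary), label) for label, text in TRAINING]
# A forum post is encoded with the keywords taken from the training data.
post_row = occurrence_row("How do I filter rows in a table?", vocabulary)
```

Because the post is encoded against the training vocabulary only, words that never appear in a node description are ignored, which keeps the training and prediction tables column-compatible.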
• Continuing with the main input branch (input table from crawler workflow): preprocessing similar to before, extracting date, author, title, …
• The attribute table is extracted using the keywords from the node description (training) data
• The remainder of the workflow ranks the predictions and prepares for the report
• Hot topics have always been manipulation and mining … tasks that KNIME is very good at
• Note also the increase of ‘flowcontrol’ over the years and the low ‘r’ traffic (separate forum category, not part of this data set)
Forum Analysis – Content & Users
• Look at individual categories (KNIME
General, Developer, Reporting, …)
• Learn what is discussed
• See who is contributing
• Input: all discussions in one forum category
• Output: a multi-page report with tag cloud and user connection graph; combines KNIME’s text and network mining extensions
• Input table from crawler workflow
• Main loop over all ~20 categories
• Per category: general statistics, user network analysis, text analytics
• Text analysis: forum posts are converted to documents and tagged (persons, node names, node categories)
• Terms are fed into a tag cloud; colors represent persons (‘kilian’), nodes (‘bow creator’), node categories (‘xml’), …
• Network analysis: ignore topics (content), only look at user relationships. Network nodes represent users, connections represent (directed) relationships between users
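A minimal sketch of the user network idea: users become nodes and each comment adds weight to a directed edge. The slides do not say how the direction is defined, so drawing the edge from the commenter to the thread author is an assumption here, as are the thread records themselves.

```python
from collections import Counter

# Hypothetical threads: the author of the initial post and the users who commented.
THREADS = [
    {"author": "alice", "commenters": ["bob", "carol", "bob"]},
    {"author": "bob", "commenters": ["alice"]},
    {"author": "carol", "commenters": ["carol", "bob"]},
]


def build_user_graph(threads):
    """Directed, weighted edges (commenter -> thread author); content is ignored."""
    edges = Counter()
    for thread in threads:
        for commenter in thread["commenters"]:
            if commenter != thread["author"]:  # skip replies to one's own thread
                edges[(commenter, thread["author"])] += 1
    return edges


graph = build_user_graph(THREADS)
```

The edge weights could then drive line thickness in a graph viewer, so that frequent responders stand out.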
• Network analysis: very simple user graph, visualized with the standard KNIME graph viewer
• Data is collected and sent to the reporting extension
• Multi-page PDF output for different forum categories
• Text Mining forum category
• RDKit (community chemistry extension)
• KNIME Users – not dominated by any particular users
Reviewing all workflows
What do all these flows have in common? They all require the “Crawler” data.
• All workflows rely on the same input data
• Requires re-run of the “Crawler” workflow and updating parameters in each analysis flow
• Better: use a meta node and share it between all instances
Now use it in all the analysis flows.
Nice … but now all workflows fetch the data each time they execute!
Let’s add a cache option.
Quickform node defining a switch:
- Get data from web, or
- use cached file (lives on server)
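The cache switch boils down to a single boolean choosing between a fresh crawl and a stored file. A rough Python equivalent is sketched below; the function name, the JSON file format and the `fetch` callback are illustrative assumptions, not how the Quickform node or the meta node is implemented.

```python
import json
import os
import tempfile


def load_forum_data(use_cache, cache_path, fetch):
    """Return forum data from the cached file or, otherwise, from a fresh crawl."""
    if use_cache and os.path.exists(cache_path):
        with open(cache_path, encoding="utf-8") as handle:
            return json.load(handle)
    data = fetch()  # expensive step: crawl the web pages
    with open(cache_path, "w", encoding="utf-8") as handle:
        json.dump(data, handle)  # refresh the cache for later runs
    return data


# Stand-in for the crawler; records how often it is actually invoked.
calls = []


def fake_crawl():
    calls.append(1)
    return [{"title": "example thread", "comments": 2}]


with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "forum.json")
    first = load_forum_data(False, path, fake_crawl)  # crawls and writes the cache
    second = load_forum_data(True, path, fake_crawl)  # served from the cached file
```

With the switch on, the second call never touches the crawler, which is the behaviour the shared meta node is meant to give every analysis flow.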
Meta Node Templates
• Meta nodes as isolated functional units
• Shared on KNIME Server (or teamspace) for
use in other workflows or by other users
• Quickforms to expose relevant parameters
in meta node dialog or in wizard execution
• Can also be used on the KNIME server…
KNIME Web Portal
NBO as a typical Project
• Collect training data from multiple sources: DB tables, text files, Excel files, SAS files, binary tables, map files
• Define file paths and parameters
• Train and evaluate a number of prediction algorithms to predict variable Target
• Retrieve old model that has been working decently so far
• Compare performances and choose best model
• Recalculate predictions based on best model and save
• Save best model
• Read current data
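The "compare performances and choose best model" step can be pictured as scoring every candidate, including the old model, on the same data and keeping the one with the lowest error. The sketch below assumes a numeric target and a mean absolute percentage error as the score; the model names and numbers are invented, and the actual workflow may use a different metric.

```python
def mean_error_percent(actual, predicted):
    """Mean absolute error relative to the actual value, in percent.

    Assumes no actual value is zero.
    """
    errors = [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)


def choose_best(actual, predictions_by_model):
    """Score every model on the same data and return the name with the lowest error."""
    scores = {
        name: mean_error_percent(actual, predicted)
        for name, predicted in predictions_by_model.items()
    }
    best = min(scores, key=scores.get)
    return best, scores


# Hypothetical target values and predictions of the old and two new models.
ACTUAL = [100.0, 200.0, 400.0]
PREDICTIONS = {
    "old_model": [110.0, 180.0, 400.0],
    "candidate_a": [100.0, 190.0, 380.0],
    "candidate_b": [150.0, 200.0, 300.0],
}

best_name, scores = choose_best(ACTUAL, PREDICTIONS)
```

Keeping the old model in the comparison means a retrained model only replaces it when it actually performs better.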
NBO as an Example
• Collect training data from multiple sources
• Select best prediction model
• Apply best model to score data
• Select files and define parameters
• Build a report
NBO Report
Mean Error in %
Quickform dialogs
Execution Wizard
File Upload Quickforms
Value Selection Quickform
Integer Input Quickform
Status
“Workflow Stopped” light
“Workflow Running” icon
“Workflow Running” light
Errors and Warnings
Report
Export report as
Results of past Executions
New Features in 2.6 & 2.7 - Highlights
• Enhanced database functionality
• File Handling node collection
• More flexible R integration
• Streaming API
• Better (Java) scripting support
• Hypothesis testing nodes
• UI Changes
Enhanced DB functionality
• Database update and delete
• New type support: Boolean and Blobs
File Handling Nodes
• Set of nodes to read, (un)zip, copy, move, convert, … files
• Adds notion of uniform resource identifier (URI) and MIME types; used in 3rd party extensions
• Nodes to upload and download files: ssh, http, ftp, …
Hypothesis Testing Nodes
• Collection of nodes to extract statistical measures
• Different t-tests
• ANOVA
• (Crosstab)
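For readers unfamiliar with what a t-test node reports, the sketch below computes the statistic of one common variant, Welch's two-sample t-test, with the Python standard library. It only illustrates the kind of measure involved; which t-test variants the KNIME nodes offer and how they compute them is not stated here, and the sample values are made up.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Welch's t statistic and approximate degrees of freedom for two samples."""
    var_a, var_b = variance(sample_a), variance(sample_b)
    n_a, n_b = len(sample_a), len(sample_b)
    se2 = var_a / n_a + var_b / n_b  # squared standard error of the mean difference
    t = (mean(sample_a) - mean(sample_b)) / sqrt(se2)
    df = se2 ** 2 / (
        (var_a / n_a) ** 2 / (n_a - 1) + (var_b / n_b) ** 2 / (n_b - 1)
    )
    return t, df


# Hypothetical measurements for two groups.
t_value, dof = welch_t([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
```

The p-value would follow from the t distribution with the returned degrees of freedom, which needs a statistics library and is omitted here.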
Flexible R integration
• Before KNIME 2.7:
• With KNIME 2.7:
Scripting – Java Snippet & friends
• Enhanced functionality:
• Define multiple outputs at once
• Script templates
• Better editor
• Syntax highlighting
• Auto completion
Streaming API
Enhanced programming interface in KNIME enabling nodes to be streamed and distributed.
KNIME UI Changes
KNIME Explorer replaces “Workflow Projects”
Customizable node repository (getting from 1500+ nodes to <100)
Tons more …
Summary
Discussed KNIME usage examples; check the “Examples” server for even more
New functionality is constantly added, thanks to community, partners and customers
And more is coming…