Treparel Delftechpark 26 2628 XH Delft
The Netherlands www.treparel.com
Jeroen Kleinhoven CEO
February, 2014
Introducing Treparel:
Big Data Text Analytics &
Visualization applications
Industry Thought Leaders about Treparel
“Treparel KMX’s visualization capabilities around its auto-categorization and clustering offer immediate insight into unstructured data sets and appear to be adaptable and customizable to customer needs. Its approach to auto-categorization utilizes statistical principles and machine learning that require significantly less training and tuning on the part of customers than other approaches.” David Schubmehl, IDC
“As we acquire more and more information, we need tools that will guide us through the data maze. Analysts need tools to help them understand patterns and define clusters. Users need to explore data to uncover relationships from scattered sources. Treparel’s KMX serves both these needs with its ability to cluster and categorize collections of data with a high degree of accuracy, and its interactive visualization tools that enable exploration of large data sets.” Sue Feldman, Synthexis.com (author: The Answer Machine)
Treparel KMX – All Rights Reserved 2013 2 www.treparel.com
Some of our clients & partners
KMX is an integral part of our IP analysis toolbox. It contributes to our capability of making added value IP analyses of technologies and competitors to support strategic decision making.
www.fusepool.eu
“We’ve sped up our patent searches from 2 days to 2 hours using KMX technology”
Key Business Problems Treparel KMX solves
Application Area | Business problem | Value
IP & Patent Search | How to improve the time-consuming and costly manual search process of patents. | Reduce research time, improve precision & recall of relevant documents. Improve legal position and drive more revenue from IP.
Competitive Analysis | How to increase knowledge on competitors by gaining clustered insights from (semi-)public sources. | Improve competitive advantage by determining international strategy, product roadmap, R&D planning, marketing campaigns and customer sentiment.
Healthcare | How to identify health risks and find correlations in diseases or medical defects. | Early identification of health risks by cross-discipline analyses on medical records, clinical observations and medical images.
Media & Publishing | How to improve search and content analytics on large volumes of publications. | Text analytics embedded in publishing improves relevance and accuracy of search and shows previously hidden documents.
Key Business Problems Treparel KMX solves - 2
Use Cases | Business problem | Value
Sentiment Analysis | How to manage current and future customers and their interactions | Deriving sentiment from critical customer-based text sources can drive revenue, satisfaction and loyalty
Voice of Customer | How to manage communications and interactions with employees, managers, subordinates and employment candidates | Analyzing HR-related information (like CVs and projects) to match demand to supply.
eDiscovery | How to manage and mitigate general litigation risk and cost in large sets of text and emails. | Text analytics applied to legal trials or in laws and jurisprudence improves accuracy in legal cases and lowers costs.
Predictive Analysis | How to identify early signs of required maintenance that affect customer satisfaction and operational costs | Use customer satisfaction surveys on food quality to identify airplane ovens requiring maintenance tune-ups
Part 1: KMX: Ready to Use Text Analytics
Intuitive Content Clustering, Classification & Visualization
KMX Text Analytics Application overview
[Diagram: Acquire documents; Text Preprocessing and Indexing; Clustering; Classification; Semantic Analysis; Taxonomies, Ontologies; Visualization; Present Results]
KMX unique functions:
• Extract concepts in context using clustering and classification of documents
• Use classification to create ranked lists and to tag subsets
• Support of binary and multi-class classification
• Enterprise edition (server/cloud) & Professional edition (desktop)
• Integration with other applications through KMX API
Query & Search Tools
Clustering: User Unsupervised Analytics
Benefits: Get quick insights through automated visual clusters with annotations to enhance the discovery process
1. Analyze the clusters and the relationships in the data
2. Explore outliers in the data
3. Find documents of interest
What it does: A visualization of clusters where the documents are displayed as points and the distance between them shows their similarity.
What KMX delivers: Use KMX to do:
1. Perform text preprocessing (stemming, tokenization, etc.)
2. Calculate a similarity measure between all documents
3. Calculate visualization (landscape) with automatic annotation
4. Create the visualization
– As a static image
– Or provide interaction where the user can zoom in/out with support for adaptive annotation
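The preprocessing and similarity steps on this slide can be illustrated with a minimal sketch. The deck does not say which similarity measure KMX uses, so the TF-IDF weighting and cosine similarity below are assumptions chosen for illustration, stemming is left out, and all function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens (no stemming)."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            # Smoothed inverse document frequency keeps every weight positive.
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_matrix(docs):
    """Pairwise similarity between all documents (assumed measure: cosine)."""
    vectors = tfidf_vectors(docs)
    return [[cosine(a, b) for b in vectors] for a in vectors]


docs = [
    "patent for an asthma inhaler device",
    "inhaler device for treating asthma",
    "method for brewing coffee",
]
sim = similarity_matrix(docs)
```

In this toy corpus the two inhaler documents score much closer to each other than either does to the coffee document. A landscape layout would then place similar documents near each other, which this sketch does not attempt.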
Classification: User Supervised Analytics
Benefits: Find small, accurate and precise result sets fast, and enable trend reporting and alerting by reusing predefined categorization models.
1. Obtain a ranked list of the most relevant documents
2. Separate the important documents from the irrelevant documents (noise)
How it works: A list of the relevant documents defined from a user's perspective.
What KMX delivers: Use KMX to do:
1. Tag (label) a small number of relevant and irrelevant documents
– Use search to identify documents that need to be tagged
– Perform manual tagging
– Select documents interactively from the visualization (brushing)
2. Create a Classifier (categorizer) using the tagged documents
3. Automatically perform the classification on all documents
4. Obtain the important documents ranked high and the irrelevant documents ranked low
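The tag, train and rank workflow above can be sketched in a few lines. The deck does not describe the learning algorithm inside KMX, so the centroid-based scorer below is a stand-in assumption, and the function names and sample documents are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Term counts for one document."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count bags."""
    dot = sum(count * b.get(term, 0) for term, count in a.items())
    norm = (math.sqrt(sum(c * c for c in a.values()))
            * math.sqrt(sum(c * c for c in b.values())))
    return dot / norm if norm else 0.0


def train_classifier(relevant_docs, irrelevant_docs):
    """Step 2: build one summed term-count centroid per tagged class."""
    relevant = sum((bag_of_words(d) for d in relevant_docs), Counter())
    irrelevant = sum((bag_of_words(d) for d in irrelevant_docs), Counter())
    return relevant, irrelevant


def rank(classifier, docs):
    """Steps 3 and 4: score every document and sort from high to low."""
    relevant, irrelevant = classifier
    scored = []
    for doc in docs:
        bag = bag_of_words(doc)
        score = cosine(bag, relevant) - cosine(bag, irrelevant)
        scored.append((score, doc))
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Step 1: a small number of hand-tagged documents (invented examples).
tagged_relevant = ["inhaler for asthma treatment", "respiratory inflammation therapy"]
tagged_irrelevant = ["coffee brewing machine"]
corpus = [
    "asthma inhaler with dose counter",
    "espresso coffee machine",
    "therapy for respiratory disease",
]
ranked = rank(train_classifier(tagged_relevant, tagged_irrelevant), corpus)
```

Documents that resemble the relevant examples end up at the top of `ranked`, and those that resemble the irrelevant examples sink to the bottom with a negative score.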
Visualization: Discovering Unexpected Insights
Benefits: KMX visualizations support the process of constructing a visual image in the mind to understand the data better.
How it works: KMX offers a visualization framework with various methods for seeing the unseen. It enriches the process of discovery and fosters profound and unexpected insights.
What KMX delivers: Different visualizations or visual pipelines to:
• Comprehend datasets that are too large to grasp by mental imagination
• Discover previously unknown properties of the data set that may not have been anticipated
• Reveal inherent problems of the data, for instance errors and artefacts
• Examine large-scale features of the dataset as well as the local features, or allow the user to see local features in a larger-scale reference
• Let users form hypotheses based on the (newly) observed phenomena or developed insights
Add-on servers: Auto Reporting & Batch Classification
• Auto Reporting Server
– Support automated analysis for aggregated results for multiple users
– Pie & bar charts
– Landscape visualizations for overview of subjects
– Enabling rich interaction via web interface
• Classification Batch Server
– High-performance stand-alone text-classification server
– Enables large scale parallel processing
Business Value from Content with KMX
✓ Text Analytics for Anyone and Everyone – Intuitive to use and learn. Designed for every user: business (info consumers) and scientific (info creators).
✓ Instant Business Insights – Explore all of your unstructured data (text, blogs, email, patents) without limits.
✓ Rapid Time to Value – Adaptable and customizable to users' needs. No implementation or extensive and expensive modelling or development. Significantly less training and tuning.
✓ Any size deployment – Meets every business need from a single user to large multilevel user groups.
✓ Language independent – Search and analyze most of the world’s languages using machine translation.
✓ Any kind of deployment – Use it from your desktop or in a (private) cloud. Buy the software-as-a-service or get the output-as-a-service.
✓ Enterprise-proven, IP & IT friendly – Successfully delivering value to IP, business and markets in multinational companies.
✓ Integration – Use the KMX API to increase the value of unstructured data in your IP discovery infrastructure
Part 2: KMX software: User Interface, key functions & value
KMX: Model, Analyse, Discover and Visualize in one view and deploy it at large scale
[Screenshot callouts: Document text; Search and highlighting; Landscape visualization; Coloring of classification score; Brushing; Filtering]
KMX Example: ‘Ebola, SARS, Bird flu: How do they relate?’
KMX: Optimize Output using Classification Performance Tuning
[Screenshot callouts: Precision and Recall; Distribution of classification scores; Document classification for three classes]
Use Case 1: Performing small to large scale SWOT analysis (on AstraZeneca patents)
[Diagram: a funnel from the Patent Database (+10.000 patents) through Queries to 986 patents, then through Ranking and Filtering to 29 patents, delivered as Output to the Business User]
SWOT analysis example
Start with removing irrelevant patents using Classification and Filtering to determine:
• Who are the important players (assignees, inventors)?
• Where are the important patents filed (countries)?
• What is the trend over time (growth of patents over the years)?
• NB: we used a (very) simple query to find 986 patents filed under AstraZeneca.
Landscaping and Ranking: From 986 to the most relevant patents
Fig: Using visual selection (brushing) to build a classification model (Classifier) to be able to rank the full data set and to extract the most relevant patents.
Landscaping and Ranking: What are the most relevant Respiratory & Inflammation patents?
Fig: Ranked patents using a Classifier for Respiratory & Inflammation patents (in yellow, the selection of 29 absolutely relevant patents to be further analyzed). We used ‘respiratory’ to demonstrate highlighting capabilities.
Yellow = most important patents (+80% score)
Blue = least relevant patents (for this analysis)
NB: crosshair points to 1 specific patent (full text in left pane)
How Reliable & Accurate are the results?
Review your results with advanced performance tools
The quality of the automatic classification (categorization) is shown in the histogram, where a small number of documents with a high classification score are separated from the large number of remaining documents.
[Histogram labels: Non relevant documents; Relevant documents]
KMX calculates the Precision and Recall of the results using cross validation.
• Precision is essential for: First analysis & Alerting services
• Recall is crucial for: Freedom to Operate search, Validity search, Patentability search
• Both need to be high for: Patent portfolio landscape analysis, Technology Exploration, Risk Assessments
Fig: Classification performance 1280 patents on ‘biomass’
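As a reminder of what the two measures mean, here is a minimal sketch that computes precision and recall for one set of classification scores at a chosen cut-off. The scores, labels and thresholds are invented for illustration and the function name is hypothetical. Cross validation would repeat this calculation on held-out folds and average the results, which the sketch does not do.

```python
def precision_recall(scores, labels, threshold):
    """Precision and recall when every score >= threshold is predicted relevant.

    scores: classification scores between 0 and 1
    labels: True where the document is actually relevant
    """
    predicted = [score >= threshold for score in scores]
    true_pos = sum(1 for p, y in zip(predicted, labels) if p and y)
    false_pos = sum(1 for p, y in zip(predicted, labels) if p and not y)
    false_neg = sum(1 for p, y in zip(predicted, labels) if not p and y)
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return precision, recall


# Invented example data: five documents with known relevance.
scores = [0.95, 0.85, 0.60, 0.30, 0.10]
labels = [True, True, False, True, False]

strict = precision_recall(scores, labels, threshold=0.8)   # few, clean results
lenient = precision_recall(scores, labels, threshold=0.2)  # more complete results
```

The strict cut-off returns only relevant documents but misses one of them, which suits alerting. The lenient cut-off finds every relevant document at the cost of one false hit, which suits a Freedom to Operate search.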
Use Case 2: Concept detection using document classification
Extracting concepts in context from classification of documents
1. Visualization → multiple topic clusters
2. Select cluster → select documents with similar topics
3. Select training documents within the sub-cluster
4. Build Classifier and classify
5. Rank documents → find set of documents with related concepts
6. Extract concepts
KMX Example: ‘Ebola, SARS, Bird flu: How do they relate?’
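One simple way to picture the final "extract concepts" step is to look for terms that occur far more often in the top-ranked documents than in the rest. The deck does not explain how KMX extracts concepts, so the scoring below is an assumption for illustration only, and the stop list, function names and sample texts are hypothetical.

```python
import re
from collections import Counter

# Tiny illustrative stop list; a real one would be much longer.
STOP_WORDS = {"the", "of", "a", "in", "and", "to", "is", "for"}


def terms(text):
    """Lowercased alphabetic tokens with stop words removed."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def extract_concepts(top_docs, other_docs, limit=3):
    """Return the terms most over-represented in the top-ranked documents."""
    top_df = Counter(t for doc in top_docs for t in set(terms(doc)))
    other_df = Counter(t for doc in other_docs for t in set(terms(doc)))

    def lift(term):
        # Share of top documents containing the term, divided by a smoothed
        # share of the remaining documents containing it.
        top_rate = top_df[term] / len(top_docs)
        other_rate = (other_df[term] + 1) / (len(other_docs) + 1)
        return top_rate / other_rate

    # Highest lift first; ties are broken alphabetically for stable output.
    ranked = sorted(top_df, key=lambda term: (-lift(term), term))
    return ranked[:limit]


top_ranked = [
    "Ebola virus outbreak in humans",
    "SARS virus outbreak spreads",
    "bird flu virus found in poultry",
]
low_ranked = [
    "stock market report",
    "virus scanner software update",
]
concepts = extract_concepts(top_ranked, low_ranked)
```

Here "outbreak" scores above "virus" because "virus" also appears in a low-ranked document, which is the sense in which the concept is taken "in context".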
Part 3: NEW: Content Dashboard (InfoApp)
Integrated SaaS-based search, reporting, visualization and analysis
Role of KMX in Integrated Information Applications
[Diagram labels]
Content sources: Text, Research Literature, Patent Data, Tweets, Documents, Enterprise Content, Websites
KMX functions: Text Mining (Stem/Token), Text Prep, Indexing, Clustering, Classification, Visualize
Domain or Market Specific InfoApps (by Partners): Dashboard, Reporting, Search, Visualization, Alerting, Exploring
Users: Information Consumers (+100 users); Creators / Data Scientists (1-5 users)
Client/Server: Management, Development and Integration
Other labels: Mobile, Web
Content Dashboard: Content Driven Analytical solution
Easy-to-use access to search, reporting and analysis of content such as patents, emails, legislation, application notes and websites
Content Dashboard: Content analytics beyond key-word search
Interactive taxonomy with multiple coupled views and advanced search in large sets of documents
Content Dashboard: Built in analytics & interactive visualizations
Ad-hoc or Standard interactive visualizations leading directly to the underlying documents or notes
Part 4: NEW: KMX API for OEM partners
Put best in class content analytics in your solutions
Solutions built on KMX
Fig 1. McKinsey diagram showing the three technology layers of the Big Data technology stack
Partner solutions:
• IP & Patent Analytics
• Media & Publishing
• HR
• eDiscovery (Law & Legislation)
• Fraud Detection
• National Security & Police
• Sentiment analytics
• CRM/Voice of Customer
• Government
• Sharepoint (Enrich & Migrate)
• Content-based Dashboards
KMX platform: Big Data Text Analytics (cloud based platform / API)
KMX Empowers InfoApps (solution partners/OEM/VAR)
KMX API for OEM: Embed Advanced Text Analytics in your solution
Classification: Supervised analytics to help users automatically categorize large sets of documents. The Classification process can use a small number of document sets for learn-by-example categorization. By sorting the content of documents by topic, relevancy and keywords, users can apply their own models or rules for classification.
Visualization: Advanced visual knowledge discovery for displaying, exporting and sharing data results, ranked document lists, labeled and enriched data or interactive visualizations. Terms can be extracted to use in building thesauri or taxonomies.
Clustering: Provides users unsupervised analytics and automatically identifies inherent themes or information clusters. Through a dynamic hierarchical topic view into search results it enables users to quickly focus on annotated subjects rather than scrolling through long results lists.
KMX API:
• XML-RPC and REST (JSON)
• Python Pickle protocol
Server:
• User / Tenant management
• User objects management (datasets, work spaces, classifiers, stop lists, etc.)
Databases: Oracle, PostgreSQL
Client Application:
• Native Windows (for creating Analysis pipelines)
• Using Qt for GUI
• Using OpenGL for visualizations
Example Application Areas: Advanced Visualizations, Interactive Analytics, Text Disambiguation, Data Enrichment, Click-through Optimization, Concept Extraction, Automated Tagging, Semantic Discovery, Named Entity Recognition, Document Overlap Display, SWOT analysis, Sentiment Analysis, Predictive Analytics
KMX enables information and knowledge professionals to gain faster, reliable, more precise insights in large complex unstructured data sets allowing them to make better informed decisions.
Treparel is a leading technology solution provider in Big Data Text Analytics & Visualization