Transcript
Page 1: 2014: Treparel Big Data Text Analytics & Visualization

Treparel Delftechpark 26 2628 XH Delft

The Netherlands www.treparel.com

Jeroen Kleinhoven CEO

[email protected]

February, 2014

Introducing Treparel:

Big Data Text Analytics & Visualization applications

Page 2: 2014: Treparel Big Data Text Analytics & Visualization

Industry Thought Leaders about Treparel

“Treparel KMX’s visualization capabilities around its auto-categorization and clustering offer immediate insight into unstructured data sets and appear to be adaptable and customizable to customer needs. Its approach to auto-categorization utilizes statistical principles and machine learning that require significantly less training and tuning on the part of customers than other approaches.” David Schubmehl, IDC

“As we acquire more and more information, we need tools that will guide us through the data maze. Analysts need tools to help them understand patterns and define clusters. Users need to explore data to uncover relationships from scattered sources. Treparel’s KMX serves both these needs with its ability to cluster and categorize collections of data with a high degree of accuracy, and its interactive visualization tools that enable exploration of large data sets.” Sue Feldman, Synthexis.com (author: The Answer Machine)

Treparel KMX – All Rights Reserved 2013 2 www.treparel.com

Page 3: 2014: Treparel Big Data Text Analytics & Visualization

Some of our clients & partners

KMX is an integral part of our IP analysis toolbox. It contributes to our capability of making added value IP analyses of technologies and competitors to support strategic decision making.

www.fusepool.eu

“We’ve sped up our patent searches from 2 days to 2 hours using KMX technology”


Page 4: 2014: Treparel Big Data Text Analytics & Visualization

Key Business Problems Treparel KMX solves

Application Area | Business problem | Value

IP & Patent Search | How to improve the time-consuming and costly manual search process for patents. | Reduce research time, improve precision & recall of relevant documents. Improve legal position and drive more revenue from IP.

Competitive Analysis | How to increase knowledge of competitors by gaining clustered insights from (semi-)public sources. | Improve competitive advantage by determining international strategy, product roadmap, R&D planning, marketing campaigns and customer sentiment.

Healthcare | How to identify health risks and find correlations in diseases or medical defects. | Early identification of health risks by cross-discipline analyses of medical records, clinical observations and medical images.

Media & Publishing | How to improve search and content analytics on large volumes of publications. | Text analytics embedded in publishing improves relevance and accuracy of search and shows previously hidden documents.


Page 5: 2014: Treparel Big Data Text Analytics & Visualization

Key Business Problems Treparel KMX solves - 2

Use Cases | Business problem | Value

Sentiment Analysis | How to manage current and future customers and their interactions. | Deriving sentiment from critical customer-based text sources can drive revenue, satisfaction and loyalty.

Voice of Customer | How to manage communications and interactions with employees, managers, subordinates and employment candidates. | Analyzing HR-related information (like CVs and projects) to match demand to supply.

eDiscovery | How to manage and mitigate general litigation risk and cost in large sets of text and emails. | Text analytics applied to legal trials or in laws and jurisprudence improves accuracy in legal cases and lowers costs.

Predictive Analysis | How to identify early signs of required maintenance that affect customer satisfaction and operational costs. | Use customer satisfaction surveys on food quality to identify airplane ovens requiring maintenance tune-ups.


Page 6: 2014: Treparel Big Data Text Analytics & Visualization

Part 1: KMX: Ready to Use Text Analytics

Intuitive Content Clustering, Classification & Visualization


Page 7: 2014: Treparel Big Data Text Analytics & Visualization

Visualization

Clustering   Classification

Text Preprocessing and Indexing

Acquire documents

Present Results

Taxonomies, Ontologies

Semantic Analysis

KMX Text Analytics Application overview

KMX unique functions:

• Extract concepts in context using clustering and classification of documents
• Use classification to create ranked lists and to tag subsets
• Support of binary and multi-class Classification
• Enterprise edition (server/cloud) & Professional edition (desktop)
• Integration with other applications through KMX API


Query & Search Tools

Page 8: 2014: Treparel Big Data Text Analytics & Visualization

Benefits: Get quick insights through automated visual clusters with annotations to enhance the discovery process

1. Analyze the clusters and the relationships in the data
2. Explore outliers in the data
3. Find documents of interest

What it does: A visualization of clusters where the documents are displayed as points and the distance between them shows their similarity.

What KMX delivers: Use KMX to:

1. Perform text preprocessing (stemming, tokenization, etc.)
2. Calculate a similarity measure between all documents
3. Calculate the visualization (landscape) with automatic annotation
4. Create the visualization
   – As a static image
   – Or provide interaction where the user can zoom in/out with support for adaptive annotation

Clustering: User Unsupervised Analytics
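As a rough illustration of steps 1 and 2 (preprocessing and a document-to-document similarity measure), here is a minimal Python sketch. It is not KMX's implementation: the slides do not say which weighting or similarity measure KMX uses, so TF-IDF with cosine similarity is an assumption, stemming is omitted, and the function names and sample documents are made up for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens (no stemming)."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Smoothed inverse document frequency (assumed weighting scheme).
        vectors.append(
            {t: c * (math.log((1 + n) / (1 + df[t])) + 1.0) for t, c in tf.items()}
        )
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarity_matrix(docs):
    """Pairwise similarity between all documents."""
    vectors = tfidf_vectors(docs)
    return [[cosine(a, b) for b in vectors] for a in vectors]


# Hypothetical sample documents, for illustration only.
docs = [
    "Ebola virus outbreak in West Africa",
    "SARS virus outbreak in Asia",
    "Patent for an airplane oven heating element",
]
sim = similarity_matrix(docs)
```

In this sketch the first two documents share terms and so score as more similar to each other than either does to the third. A landscape view would then place documents in two dimensions so that distances reflect these similarities; that projection step is not shown here.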


Page 9: 2014: Treparel Big Data Text Analytics & Visualization

Benefits: Find small, accurate and precise result sets fast, and enable trend reporting and alerting by reusing predefined categorization models.

1. Obtain a ranked list of the most relevant documents
2. Separate the important documents from the irrelevant documents (noise)

How it works: A list of the relevant documents defined from a user's perspective.

What KMX delivers: Use KMX to:

1. Tag (label) a small number of relevant and irrelevant documents
   – Use search to identify documents that need to be tagged
   – Perform manual tagging
   – Select documents interactively from the visualization (brushing)
2. Create a Classifier (categorizer) using the tagged documents
3. Automatically perform the classification on all documents
4. Obtain a ranking with the important documents ranked high and the irrelevant documents ranked low

Classification: User Supervised Analytics
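The tag, train, classify and rank workflow above can be sketched in a few lines of Python. The slides do not name the classifier KMX uses, so this example swaps in a simple centroid (Rocchio-style) scorer purely for illustration: each document is scored by how close it is to the tagged relevant examples minus how close it is to the tagged irrelevant ones. All names and sample documents are hypothetical.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts for one document."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(w * b.get(t, 0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(
        sum(w * w for w in b.values())
    )
    return dot / norm if norm else 0.0


def centroid(vectors):
    """Sum of the vectors; cosine similarity ignores the overall scale."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return total


def rank(docs, relevant_ids, irrelevant_ids):
    """Return (score, doc_index) pairs, most relevant first."""
    vecs = [vectorize(d) for d in docs]
    pos = centroid([vecs[i] for i in relevant_ids])
    neg = centroid([vecs[i] for i in irrelevant_ids])
    scored = [(cosine(v, pos) - cosine(v, neg), i) for i, v in enumerate(vecs)]
    return sorted(scored, reverse=True)


# Hypothetical documents; the user tags doc 0 as relevant and doc 2 as irrelevant.
docs = [
    "inhaler for asthma and respiratory inflammation",
    "respiratory inflammation treatment compound",
    "tablet coating machine for packaging",
    "compound reducing airway inflammation in asthma",
    "packaging line conveyor machine",
]
ranking = rank(docs, relevant_ids=[0], irrelevant_ids=[2])
ranked_ids = [i for _, i in ranking]
```

With only two tagged documents, the untagged respiratory documents rise to the top of the ranking and the packaging documents sink to the bottom, which is the behaviour step 4 describes.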


Page 10: 2014: Treparel Big Data Text Analytics & Visualization

Benefits: KMX visualizations support the process of constructing a visual image in the mind to understand the data better.

How it works: KMX offers a visualization framework with various methods for seeing the unseen. It enriches the process of discovery and fosters profound and unexpected insights.

What KMX delivers: Different visualizations or visual pipelines to:

• Comprehend large datasets, datasets that are too large to grasp by mental imagination
• Discover previously unknown properties of the data set that may not have been anticipated
• Reveal inherent problems of the data, for instance errors and artefacts
• Examine large-scale features of the dataset as well as local features, or see local features within a larger-scale reference
• Let users form hypotheses based on the (newly) observed phenomena or developed insights

Visualization: Discovering Unexpected Insights


Page 11: 2014: Treparel Big Data Text Analytics & Visualization

Add-on servers: Auto Reporting & Batch Classification

• Auto Reporting Server
  – Support automated analysis for aggregated results for multiple users
  – Pie & bar charts
  – Landscape visualizations for overview of subjects
  – Enabling rich interaction via web interface

• Classification Batch Server
  – High-performance stand-alone text-classification server
  – Enables large-scale parallel processing


Page 12: 2014: Treparel Big Data Text Analytics & Visualization

Business Value from Content with KMX

• Text Analytics for Anyone and Everyone – Intuitive to use and learn. Designed for every user: business (info consumers) and scientific (info creators).

• Instant Business Insights – Explore all of your unstructured data (text, blogs, email, patents) without limits.

• Rapid Time to Value – Adaptable and customizable to users' needs. No implementation or extensive and expensive modelling or development. Significantly less training and tuning.

• Any size deployment – Meets every business need from a single user to large multilevel type user groups.

• Language independent – Search and analyze most of the world’s languages using machine translation.

• Any kind of deployment – Use it from your desktop or in a (private) cloud. Buy the software-as-a-service or get the output-as-a-service.

• Enterprise-proven, IP & IT friendly – Successfully delivering value to IP, business and markets in multinational companies.

• Integration – Use the KMX API to increase the value of unstructured data in your IP discovery infrastructure


Page 13: 2014: Treparel Big Data Text Analytics & Visualization

Part 2: KMX software:

User Interface, key functions & value


Page 14: 2014: Treparel Big Data Text Analytics & Visualization

KMX: Model, Analyse, Discover and Visualize in one view and deploy it at large scale

Document text

Search and highlighting

Landscape visualization   Coloring of classification score

Brushing

Filtering

KMX Example: ‘Ebola, SARS, Bird flu: How do they relate?’

Page 15: 2014: Treparel Big Data Text Analytics & Visualization

KMX: Optimize Output using Classification Performance Tuning

Precision and Recall

Distribution of classification scores

Document classification for three classes


Page 16: 2014: Treparel Big Data Text Analytics & Visualization

Use Case 1: Performing small to large scale SWOT analysis (on AstraZeneca patents)

Patent Database

+10.000 patents

986 patents

29 patents

Ranking

Queries

Filtering

SWOT analysis example

Start with removing irrelevant patents using Classification and Filtering to determine:

• Who are the important players (assignees, inventors)?
• Where are the important patents filed (countries)?
• What is the trend over time (growth of patents over the years)?
• NB: we used a (very) simple query to find 986 patents filed under AstraZeneca.
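Once the irrelevant patents have been filtered out, the three questions above reduce to simple counts over the remaining records. The Python sketch below shows one way to do that; the record fields (`assignee`, `country`, `year`) and the sample values are hypothetical placeholders, not data from the AstraZeneca analysis.

```python
from collections import Counter

# Hypothetical patent records left after classification and filtering.
patents = [
    {"assignee": "Company A", "country": "US", "year": 2010},
    {"assignee": "Company A", "country": "EP", "year": 2011},
    {"assignee": "Company B", "country": "US", "year": 2011},
    {"assignee": "Company A", "country": "US", "year": 2012},
    {"assignee": "Company C", "country": "JP", "year": 2012},
    {"assignee": "Company B", "country": "US", "year": 2012},
]


def summarize(records):
    """Count patents per assignee, per filing country and per year."""
    return {
        # Who are the important players?
        "players": Counter(r["assignee"] for r in records).most_common(),
        # Where are the important patents filed?
        "countries": Counter(r["country"] for r in records).most_common(),
        # What is the trend over time?
        "trend": sorted(Counter(r["year"] for r in records).items()),
    }


summary = summarize(patents)
```

Here `summary["players"]` lists assignees by patent count, `summary["countries"]` does the same for filing countries, and `summary["trend"]` gives the number of patents per year in chronological order.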

   

Output

Business User

Ranking   Filtering


Page 17: 2014: Treparel Big Data Text Analytics & Visualization

Landscaping and Ranking: From 986 to the most relevant patents

Fig: Using visual selection (brushing) to build a classification model (Classifier) to rank the full data set and extract the most relevant patents.

Page 18: 2014: Treparel Big Data Text Analytics & Visualization

Landscaping and Ranking: What are the most relevant Respiratory & Inflammation patents?

Fig: Ranked patents using a Classifier for Respiratory & Inflammation patents (in yellow: the selection of 29 absolutely relevant patents to be analyzed further). We used ‘respiratory’ to demonstrate highlighting capabilities.

Yellow = most important patents (+80% score)
Blue = least relevant patents (for this analysis)

NB: crosshair points to 1 specific patent (full text in left pane)

Page 19: 2014: Treparel Big Data Text Analytics & Visualization

How Reliable & Accurate are the results?

Review your results with advanced performance tools

The quality of the automatic classification (categorization) is shown in the histogram, where a small number of documents with a high classification score are separated from the large number of documents.

Non relevant documents   Relevant documents

KMX calculates the Precision and Recall of the results using cross validation.

• Precision is essential for: First analysis & Alerting services
• Recall is crucial for: Freedom to Operate search, Validity search, Patentability search
• Both need to be high for: Patent portfolio landscape analysis, Technology Exploration, Risk Assessments

Fig: Classification performance 1280 patents on ‘biomass’
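For readers less familiar with the two measures: precision is the share of documents flagged as relevant that really are relevant, and recall is the share of truly relevant documents that were flagged. The Python sketch below computes both with k-fold cross validation. It is a generic illustration, not KMX's procedure; the toy threshold "classifier", the fold layout and the sample scores are all assumptions made for the example.

```python
def precision_recall(predicted, actual):
    """Precision and recall for boolean predictions against boolean labels."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def cross_validate(items, labels, train, k=5):
    """Predict every item with a model trained on the other folds only."""
    predictions = [None] * len(items)
    for fold in range(k):
        held_out = set(range(fold, len(items), k))
        train_items = [x for i, x in enumerate(items) if i not in held_out]
        train_labels = [y for i, y in enumerate(labels) if i not in held_out]
        predict = train(train_items, train_labels)
        for i in held_out:
            predictions[i] = predict(items[i])
    return precision_recall(predictions, labels)


def train_threshold(xs, ys):
    """Toy classifier: threshold halfway between the two class means."""
    pos = [x for x, y in zip(xs, ys) if y]
    neg = [x for x, y in zip(xs, ys) if not y]
    threshold = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda x: x > threshold


# Hypothetical one-dimensional "documents" with relevance labels.
items = [0.9, 0.1, 0.8, 0.2, 0.95, 0.15, 0.85, 0.05, 0.7, 0.3]
labels = [True, False, True, False, True, False, True, False, True, False]
cv_precision, cv_recall = cross_validate(items, labels, train_threshold, k=5)
```

Because every prediction comes from a model that never saw the item during training, the resulting precision and recall estimate how the classifier would behave on untagged documents.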


Page 20: 2014: Treparel Big Data Text Analytics & Visualization


Extracting concepts in context from classification of documents

Use Case 2: Concept detection using document classification

1. Visualization → multiple topic clusters
2. Select cluster → select documents with similar topics
3. Select training documents within the sub-cluster
4. Build Classifier and classify
5. Rank documents → find set of documents with related concepts
6. Extract concepts
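The final step, extracting concepts from the top-ranked documents, can be approximated by looking for terms that occur in most of the selected documents but rarely in the rest. The slides do not describe how KMX extracts concepts, so the Python sketch below is only one plausible stand-in; the scoring rule, the tiny stop list and the sample documents are assumptions.

```python
import re
from collections import Counter

# Minimal illustrative stop list; a real one would be much longer.
STOP_WORDS = {"the", "of", "and", "in", "a", "to", "is", "for"}


def terms(text):
    """Lowercased alphabetic tokens with stop words removed."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def extract_concepts(selected_docs, other_docs, top_n=3):
    """Terms ranked by (share of selected docs) minus (share of other docs)."""
    sel = Counter(t for d in selected_docs for t in set(terms(d)))
    oth = Counter(t for d in other_docs for t in set(terms(d)))
    scores = {
        t: sel[t] / len(selected_docs) - oth[t] / max(len(other_docs), 1)
        for t in sel
    }
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_n]


# Hypothetical top-ranked documents versus the remainder of the collection.
selected = [
    "Ebola virus transmission in bats",
    "SARS virus transmission between humans",
    "Bird flu virus transmission in poultry",
]
others = [
    "Patent for virus scanning software",
    "Poultry farm logistics",
]
concepts = extract_concepts(selected, others, top_n=2)
```

In this toy collection the shared concepts are "transmission" and "virus"; "transmission" ranks first because it never appears outside the selected set.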

KMX Example: ‘Ebola, SARS, Bird flu: How do they relate?’


Page 21: 2014: Treparel Big Data Text Analytics & Visualization

Part 3: NEW: Content Dashboard (InfoApp)

Integrated SaaS-based search, reporting, visualization and analysis


Page 22: 2014: Treparel Big Data Text Analytics & Visualization

Role of KMX in Integrated Information Applications

Domain or Market Specific InfoApps (by Partners)

Text

Research Literature

Mobile

Patent Data Tweets

Web

Documents

Text PreP Indexing Clustering Classification Visualize

Text Mining Stem/Token

Email

Enterprise Content

Websites

Dashboard Reporting Search Visualization Alerting Exploring

Information Consumers (+100 users)

Creators / Data Scientists (1-5 users)

Client/Server

Management, Development and Integration


Page 23: 2014: Treparel Big Data Text Analytics & Visualization

Content Dashboard: Content Driven Analytical solution

Easy-to-use access to Search, Reporting & Analysis of content like Patents, Emails, Legislation, Application Notes, websites

Page 24: 2014: Treparel Big Data Text Analytics & Visualization

Content Dashboard: Content analytics beyond key-word search

Interactive taxonomy with multiple coupled views and advanced search in large sets of documents

Page 25: 2014: Treparel Big Data Text Analytics & Visualization

Content Dashboard: Built in analytics & interactive visualizations

Ad-hoc or Standard interactive visualizations leading directly to the underlying documents or notes

Page 26: 2014: Treparel Big Data Text Analytics & Visualization

Part 4: NEW: KMX API for OEM partners:

Put best-in-class content analytics in your solutions


Page 27: 2014: Treparel Big Data Text Analytics & Visualization

Solutions built on KMX

Fig 1. McKinsey diagram showing the three technology layers of the Big Data technology stack

Partner solutions:

• IP & Patent Analytics
• Media & Publishing
• HR
• eDiscovery (Law & Legislation)
• Fraud Detection
• National Security & Police
• Sentiment analytics
• CRM/Voice of Customer
• Government
• Sharepoint (Enrich & Migrate)
• Content-based Dashboards

KMX platform Big Data Text Analytics

(cloud based platform / API)

KMX Empowers InfoApps (solution partners/OEM/VAR)

Page 28: 2014: Treparel Big Data Text Analytics & Visualization

KMX API for OEM: Embed Advanced Text Analytics in your solution

Classification: Supervised analytics to help users automatically categorize large sets of documents. The Classification process can use a small number of document sets for learn-by-example categorization. By sorting the content of documents by topic, relevancy and keywords, users can apply their own models or rules for classification.

Visualization: Advanced visual knowledge discovery for displaying, exporting and sharing data results, ranked document lists, labeled and enriched data or interactive visualizations. Terms can be extracted to use in building thesauri or taxonomies.

Clustering: Provides users unsupervised analytics and automatically identifies inherent themes or information clusters. Through a dynamic hierarchical topic view into search results it enables users to quickly focus on annotated subjects rather than scrolling through long results lists.

KMX API: XML-RPC and REST (JSON); Python Pickle protocol

Server: User / Tenant mgt; User objects mgt (datasets, work spaces, classifiers, stop lists,.)

Databases: Oracle, PostgreSQL

Client Application: Native Windows (for creating Analysis pipelines); using Qt for GUI; using OpenGL for visualizations

Example Application Areas: Advanced Visualizations, Interactive Analytics, Text Disambiguation, Data Enrichment, Click-through Optimization, Concept Extraction, Automated Tagging, Semantic Discovery, Named Entity Recognition, Document Overlap Display, SWOT analysis, Sentiment Analysis, Predictive Analytics

Page 29: 2014: Treparel Big Data Text Analytics & Visualization

KMX enables information and knowledge professionals to gain faster, reliable, more precise insights in large complex unstructured data sets allowing them to make better informed decisions.

Treparel is a leading technology solution provider in Big Data Text Analytics & Visualization

