Text Mining & Visualization Impressions of emerging capabilities Cynthia Barcelon-Yang (speaker) Yun Yun Yang (speaker) Lucy Akers Bristol-Myers Squibb 2007 PIUG Northeast Conference New Brunswick, New Jersey

Text Mining & Visualization - Patent Information Users Group · Introduction –Text Mining &Visualization Overview of Text Mining Tools Capabilities Data Sources Results Strengths



Text Mining & Visualization

Impressions of emerging capabilities

Cynthia Barcelon-Yang (speaker)

Yun Yun Yang (speaker)

Lucy Akers

Bristol-Myers Squibb

2007 PIUG Northeast Conference

New Brunswick, New Jersey


• Introduction – Text Mining & Visualization

• Overview of Text Mining Tools

■ Capabilities

■ Data Sources

■ Results

■ Strengths

• Summary


Why do we need a tool to do text mining?

Welcome to the age of too much information...


Typical questions asked of IP Operations

Often, the IP Operations group within an organization provides centralized support to a wide range of business units, and is responsible for answering the following:

• How many patents do we have concerning technology ‘x’?

• How does our portfolio compare with company ‘ABC’?

• Who is citing our portfolio?

• Which patents does business unit ‘xyz’ own?

• Which patents should we divest as a result of selling division XYZ?

• How do our invention disclosures compare with current granted patents?

• How do we improve our patent operations?


What is text mining?

(according to Marti Hearst of UC Berkeley School of Information)

■ The discovery of new, previously unknown information, by automatically extracting information from different written resources.

■ A variation on a field called data mining, which tries to find interesting patterns in large databases.

■ Many researchers think it will require a full simulation of how the mind works before we can write programs that read the way people do.

■ Related to computational linguistics (also known as natural language processing).

■ Hearst distinguishes between "real" text mining, which discovers new pieces of knowledge, and approaches that find overall trends in textual data.


Text Mining Process

Courtesy of: Invention Machine Corp.


Common Tasks

• List generation (can be displayed as histograms)

• List cleanup and grouping of concepts

• Co-occurrence matrices and other graphing

• Clustering, categorization, grouping and extraction of text

• Mapping document clusters or concepts

• Adding temporal components to maps

• Citation analysis

• Subject/Action/Object (SAO) functions (an NLP technique)

• Federated searching, e.g. across the Internet or intranets
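A co-occurrence matrix, one of the common tasks listed above, can be sketched in a few lines of Python. This is a minimal illustration and not how any of the reviewed tools computes one. It assumes each document has already been reduced to a set of cleaned-up concepts (the list generation and cleanup steps), and the concept sets are invented for the example.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(docs):
    """Count, for each unordered pair of concepts, how many documents mention both."""
    pairs = Counter()
    for concepts in docs:
        # Sorting gives every pair one canonical order, so (a, b) and (b, a) merge.
        for a, b in combinations(sorted(set(concepts)), 2):
            pairs[(a, b)] += 1
    return pairs


# Hypothetical concept sets, one per document.
docs = [
    {"stem cell", "differentiation", "scaffold"},
    {"stem cell", "differentiation"},
    {"scaffold", "polymer"},
]
matrix = cooccurrence(docs)
print(matrix[("differentiation", "stem cell")])  # 2
```

Each pair count is one cell of the matrix. Pairs with high counts are the candidates for the graphs and maps mentioned in the task list.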


Project Planning

■ Phase I

► Literature searches, key references, brainstorming of text/data mining & visualization

► Identify potential tools to evaluate

► Vendor onsite demonstrations

► Summary of initial tool evaluations

■ Phase II

► Pilot selected tools

► Identify potential client groups and interview representative clients


Investigation & Process Approach

■ Scout the literature/internet sources & brainstorm

■ Benchmark

► “Patinformatics – Tools and Tasks” by Tony Trippe, World Patent Information 25 (2003) 211–221

► “Data Visualization Tools - A Perspective from the Pharmaceutical Industry” by Jeannette Eldridge, World Patent Information 28 (2006) 43–49

■ Vendor demos


Tools Initially Identified

AnaVist, Anacubis, Aureka, Bioalma, BizInt, ClearForest, Delphion, Entrieva (Semio), GoldFire, Inxight, M-CAM, Matheo Patent, OmniViz, PatAnalyst, Quosa, Technology Watch, Temis, VantagePoint, Vivisimo, Wisdomain, Wistract


Vendor Tool Demonstrations

1. Quosa

2. Inxight

3. PatAnalyst

4. OmniViz

5. Temis

6. Aureka

7. Wisdomain

8. GoldFire

9. VantagePoint

10. ClearForest

11. M-CAM

12. RefViz


* Overview of Vendor Tools

• Type of Tool

• Capabilities

• Data Sources

• Results

• Strengths

• Summary

* Text mining tool slides are provided courtesy of the vendors.


Text Mining Capabilities

• Keyword Analysis

■ Extracting nouns or noun phrases from text without understanding their meaning or relationships, and without counting how many times they appear

• Statistical Analysis

■ Frequency-based analysis – counting the number of times a word appears in the text

• Linguistic Analysis

■ Natural language processing (NLP) – “Trained Agent”

■ Semantic analysis
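The difference between keyword analysis and statistical analysis can be shown with a small Python sketch. It is a simplified stand-in, not any vendor's method. A hypothetical stop-word list replaces the part-of-speech tagging a real tool would use to find nouns and noun phrases, and the two abstracts are made up.

```python
import re
from collections import Counter

# Hypothetical stop list; a real keyword extractor would use a part-of-speech
# tagger to keep only nouns and noun phrases.
STOPWORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "the", "to", "with"}


def candidate_terms(text):
    """Keyword step: lowercase word tokens minus stop words (no meaning involved)."""
    tokens = re.findall(r"[a-z][a-z-]*", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def term_frequencies(texts):
    """Statistical step: count how often each candidate term occurs across texts."""
    counts = Counter()
    for text in texts:
        counts.update(candidate_terms(text))
    return counts


abstracts = [
    "A method for the culture of stem cells in a polymer scaffold.",
    "Stem cells are differentiated with a growth factor.",
]
freq = term_frequencies(abstracts)
print(freq["stem"], freq["cells"], freq["scaffold"])  # 2 2 1
```

Neither step "understands" the text. Linguistic analysis would be needed to treat "stem cells" as one concept, or "differentiated" as related to "differentiation".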


Text Mining Data Sources

■ Unstructured text

► Full-text documents, emails

■ Structured text

► Database records, such as records from STN or PubMed

■ Hybrid content

► Patents: the front page is structured, the body text is not
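Hybrid content matters for a question such as “How many patents do we have concerning technology ‘x’?”. The “we” part can be answered from structured front-page fields, while “technology ‘x’” lives in the unstructured text. The sketch below uses a hypothetical `PatentRecord` type and invented records to combine an exact assignee filter with a naive substring match. The weakness of that substring match is exactly the gap text mining tools aim to fill.

```python
from dataclasses import dataclass


@dataclass
class PatentRecord:
    # Structured "front page" fields: can be filtered directly.
    number: str
    assignee: str
    # Unstructured body text: needs text mining to interpret properly.
    text: str


def count_patents(records, assignee, term):
    """Structured filter on assignee, then a naive substring match on the text."""
    term = term.lower()
    return sum(
        1 for r in records
        if r.assignee == assignee and term in r.text.lower()
    )


# Hypothetical records for illustration only.
records = [
    PatentRecord("P-1", "Acme Pharma", "Tablet coating using a polymer film."),
    PatentRecord("P-2", "Acme Pharma", "Polymer scaffold for cell culture."),
    PatentRecord("P-3", "Other Co", "Polymer membrane for filtration."),
]
print(count_patents(records, "Acme Pharma", "polymer"))  # 2
```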


Data Sources

I. General Data Sources (Unstructured): ClearForest, GoldFire Innovator, Inxight, OmniViz, Temis

II. Bibliographic Data Sources (Structured): Quosa, RefViz, VantagePoint

III. Patent-Focused (Hybrid): Aureka, M-CAM, PatAnalyst, Wisdomain


Evaluation Template

• Type of Tool

■ Text mining software tool

■ Database content provider

■ Both

• Capabilities

■ Keyword analysis

■ Statistical analysis

■ Linguistic analysis

• Data Sources

■ Structured bibliographic data sources

■ Unstructured sources – full-text web, email, corporate repositories, etc.

■ Hybrid sources – patents, combination of structured/unstructured

• Results

■ Lists of documents

■ Tables

■ Charts/Graphs

■ Maps

• Strengths – Disclaimer: Our Impressions only!

• Summary


GoldFire Innovator

• Type of tool – text mining tool


GoldFire Innovator

• Technology – Semantic Analysis


GoldFire Innovator


GoldFire Innovator

• Data Sources

■ Unstructured information from personal data, corporate data, deep web content, patents, and the Internet

► 15 million worldwide patents

► Database of over 8,000 scientific effects

► 3,000 cross-disciplinary scientific deep-web sites

• Results

■ Static categorization of key concepts

■ Accurate answers to questions

■ Dynamic document summarization


GoldFire Innovator - Strengths

• Precision retrieval of targeted R&D content

► Retrieves information from context – semantic indexing

► Automated summaries and categorization

► Relevance filtering and ranking

• Natural language queries for searching

► Ask the right questions, e.g. “How to dry paper?” or “How to balance diets?”

• Innovation Trend Analysis

► Competitive analysis

► Technology analysis

► Patent relationship analysis – citation analysis


Inxight

• Type of Tool

■ Text mining software tool

• Capability

■ Natural Language Processing

■ Contextual extractions (leaning towards semantic analysis)

• Data Source

■ Unstructured text from websites, internal repositories, full-text documents

■ Documents have to be pre-processed to extract metadata and identify entity types

• Results

■ Hierarchical categorization


Inxight - Strengths

• Federated search capability

• Claims to be more accurate than a human reader

• Software works in 32 languages and recognizes 27 entity types

• Can process 1.2 gigabytes per hour

• Claims to have the most powerful linguistic algorithms in the field


Temis

• Type of tool

■ Text mining solutions – software

• Capability

■ Natural language processing

► Insight Discoverer™ Extractor – information extraction server powered by XeLDA and used with specialized Skill Cartridges

► Insight Discoverer™ Categorizer – document categorization server

► Insight Discoverer™ Clusterer – automated classification server

► XeLDA – multilingual linguistic engine for natural language processing

► Skill Cartridge – a set of customizable knowledge components that define the information to be extracted. The two major knowledge components are multilingual dictionaries and multilingual extraction rules (which establish relationships between defined concepts).


Skill Cartridge Overview

• Open architecture

■ Plug & Play annotation components

■ Each defines areas of interest & extraction rules

■ Extraction rules describe the sentence structure that characterizes a concept

Figure: text (any kind, any format) is processed by the XeLDA™ engine into words (any concept), and plug & play Skill Cartridges™ are used by the Insight Discoverer™ Extractor. Example cartridges:

■ Merger & Acquisition – Meaning = Acquisition

► Target & buyer; amount & date; ...

■ Positive & Negative Sentiment Analysis – Meaning = Satisfaction

► People, companies, products; satisfaction; support; ...
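To make the idea of a Skill Cartridge concrete, the sketch below pairs a tiny dictionary with one extraction rule that describes the sentence structure of an acquisition (buyer, target, amount). It is a hypothetical regular-expression illustration in Python, not Temis's rule language or the XeLDA engine. The verb list, the capitalized-name pattern, and `extract_acquisitions` are all invented for the example.

```python
import re

# Hypothetical dictionary component: verbs that signal an acquisition.
ACQUISITION_VERBS = ["acquired", "bought", "purchased"]

# Sequence of capitalized words, used as a crude company-name pattern.
NAME = r"[A-Z][\w&-]*(?: [A-Z][\w&-]*)*"

# Hypothetical extraction rule: <buyer> <acquisition verb> <target> [for <amount>]
ACQUISITION_RULE = re.compile(
    rf"(?P<buyer>{NAME}) (?:{'|'.join(ACQUISITION_VERBS)}) (?P<target>{NAME})"
    r"(?: for (?P<amount>\$[\d.,]+ (?:million|billion)))?"
)


def extract_acquisitions(text):
    """Return one dict (buyer, target, amount) for each match of the rule."""
    return [m.groupdict() for m in ACQUISITION_RULE.finditer(text)]


for fact in extract_acquisitions(
    "Acme Corp acquired Beta Labs for $2.5 billion. Gamma Inc bought Delta."
):
    print(fact)
```

A production cartridge would rely on linguistic analysis rather than capitalization. It would also handle other languages, the passive voice ("X was acquired by Y"), and dates.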


Temis

• Data Sources

■ Any kind, any format: internal & external data, documents, literature, patents, clinical trials, chemistry and biology, bioinformatics, Internet, email, etc.

• Results

■ Clusters, rankings, and lists for discovering information trends and relationships


Temis - Strengths

• Searching by concepts

► Selecting concepts from a concept tree

• Specialized Skill Cartridges

► General Skill Cartridges

– Analytics

– Text Mining 360°

– Competitive Intelligence

– Human Resources Management

► Life Science Skill Cartridges

– Biological Entity Relationships (best selling)

– Medical Entity Relationships

– Chemical Entity Relationships

– Competitive Intelligence Life Sciences Edition


Temis - Strengths

• Strong extraction, categorization, and clustering capabilities

• Robust XeLDA linguistic engine

• Quick trend analysis

• Chemical Document Browser – specialized extraction module that translates chemical substance nomenclature into chemical structures


OmniViz

• Type of tool

■ Visually based data/text mining software

• Capability

■ Algorithm-based statistical analysis, not semantics

• Data source/type

■ Numeric, text, categorical, chemical structures, sequence, structured/unstructured text

• Results

■ Interactive visualization maps such as CoMet, Correlation, Galaxy Proximity, etc.


OmniViz


OmniViz - Strengths

■ Interactive visualizations

■ Supports analysis of large amounts of data (millions of documents) - numeric, categorical and full-text analysis, including patents.

■ Broad applications including gene expression, sequence & pathway analysis, chemical structures, cheminformatics, clinical trial, patent analysis, diagnosis and treatment, legal, marketing data, regulatory compliance, intelligence analysis, etc.

■ Flexible data import and merge capabilities


ClearForest

• Type of Tool

■ Text mining tool (text analytics solution)

• Capability

■ Semantic analysis/NLP

• Data Sources

■ Unstructured text – websites

■ Patents

■ Internal documents

■ Metadata

• Results

■ Structured data entities

■ List of potential solutions for identified issues

■ Visualization tools – trend graphs, category maps

► Color and font are used to show the intensity of relationships


ClearForest

Text Analytics: How it Works

Figure: the tagging platform takes unstructured text (documents such as Text, Word, Excel, Email, WWW, PDF, plus database text fields) and performs extraction across records, including domain-specific entities & relationships. The results feed unified analysis, output (DB, XML), and role-based interfaces.

Example of extracted records:

Part          Problem   Condition
Fuel Pump     Fails     corroded
Pump Relay    Shorts    Cold weather
Headlight     Fails     Running hot
Engine        Stalls    At low speeds

Example XML output:

<PartProblemCondition>
  <Part> Fuel Pump </Part>
  <Problem> Fails </Problem>
  <Condition> Corroded </Condition>
</PartProblemCondition>
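The XML output for the example rows can be reproduced with a minimal Python sketch. The dictionary look-up is a hypothetical stand-in for ClearForest's tagging platform. `PARTS`, `PROBLEMS`, and `to_xml` are assumptions, and the parser only handles text fields that follow the pattern part, problem, condition.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical domain dictionaries built from the example rows.
PARTS = ["fuel pump", "pump relay", "headlight", "engine"]
PROBLEMS = ["fails", "shorts", "stalls"]


def to_xml(record: str) -> Optional[str]:
    """Tag a '<part> <problem> <condition>' text field and serialize it as XML."""
    text = record.lower().strip()
    part = next((p for p in PARTS if text.startswith(p)), None)
    if part is None:
        return None
    rest = text[len(part):].strip()
    problem = next((w for w in PROBLEMS if rest.startswith(w)), None)
    if problem is None:
        return None
    condition = rest[len(problem):].strip()

    root = ET.Element("PartProblemCondition")
    for tag, value in (("Part", part), ("Problem", problem), ("Condition", condition)):
        ET.SubElement(root, tag).text = value.title()
    # encoding="unicode" returns a str without an XML declaration.
    return ET.tostring(root, encoding="unicode")


print(to_xml("Fuel Pump Fails corroded"))
```

Records that match no dictionary entry return `None`. This is the point where a real platform would fall back on linguistic analysis instead of simply giving up.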


ClearForest

Packaged Extraction Modules

Inputs: U.S. Patent Search, MicroPatent Search, database fields, Text, Word, Excel, etc.

Outputs for patents:

■ Structured data entities: Agent, Application Number, Assignee, Assignee Address, Examiner, Filing Date, Inventor, Inventor Address, IPC, Issue Date, Number Of Claims, Patent Citations, Patent Number, US Class

■ Entities: Claim Element, Claim Invention, Extracted Terms, Invention Terms, Measurement Terms, Number of Claims, Patent Section, Problem Solved Terms, Problems Solved, Process Technology Terms, Technology Terms


ClearForest - Strengths

• Can be applied to a wide range of applications, as evidenced by the wide variety of available extraction modules

■ Security/intelligence gathering

■ Product/customer information

■ Corporate/people profiles

■ Patents

■ Biomedical entities

• The analytics tool can discover unexpected relationships between entities that standard, manual methods would not otherwise uncover.


VantagePoint

• Type of tool

■ Text mining software mainly used for technology assessment and company profiling

• Capability

■ Uses pattern matching, rule-based, and natural language processing techniques

• Data Sources

■ Works best with structured data: text data from bibliographic databases

• Results

■ Summaries, charts, matrices, maps, and graphs


VantagePoint - Key Features

• Rapid navigation in large abstract collections

• Helps find relationships within your data

• Visually displays relationships

• Buckets documents to help in categorization

• Utilities for cleaning data

• User-created thesauri for reducing data

• Scripting capabilities to automate knowledge gathering

• Easily exports output to other applications

• Can be configured to text mine most forms of structured bibliographic data


VantagePoint - Strengths

• List Creation and Cleanup

■ Patent assignee, author, inventor

■ Pre-built IPC and user-created thesauri

• Analytical tool box

■ Rapid navigation in large abstract collections to answer who, where, what, and when, but not how and why

■ Visually displays relationships

• Scripting capabilities to automate knowledge gathering

■ Can be configured to extract from structured databases
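List cleanup with a user-created thesaurus amounts to mapping name variants onto one preferred form before counting. The Python sketch below uses a hypothetical thesaurus and invented assignee strings. VantagePoint's own thesaurus format and matching rules are not described here.

```python
import re
from collections import Counter

# Hypothetical user-created thesaurus: normalized variant -> preferred name.
THESAURUS = {
    "acme pharma": "Acme Pharma",
    "acme pharma inc": "Acme Pharma",
    "acme pharmaceuticals inc": "Acme Pharma",
}


def normalize(name):
    """Drop periods/commas, fold case and spacing, then map known variants."""
    key = re.sub(r"[.,]", "", name).strip().lower()
    key = re.sub(r"\s+", " ", key)
    return THESAURUS.get(key, name.strip())


def assignee_counts(assignees):
    """Count assignees after cleanup, so variants are no longer split."""
    return Counter(normalize(a) for a in assignees)


raw = ["Acme Pharma, Inc.", "ACME PHARMA", "Acme Pharmaceuticals Inc", "Beta Labs"]
print(assignee_counts(raw))
```

Without the thesaurus, the three Acme variants would appear as three separate assignees in a ranked list.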


RefViz

• Type of tool

■ Text analysis and data visualization software

• Capability

■ Statistical and linguistic analysis

► “Mathematical signature” – relationship of words

► Uses a thesaurus tool

• Data Sources

■ Only structured data: title and abstract/notes fields, or records from ISI Web of Science, PubMed, OCLC, Output

• Results

■ “Galaxy” & matrix visualization
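RefViz's “mathematical signature” is not specified further on these slides. A common way to express the “relationship of words” numerically is to turn each reference into a word-count vector and compare vectors with cosine similarity. The Python sketch below uses that approach as an assumed stand-in, with invented reference titles that echo the farming and herb example from the Galaxy view. A Galaxy-style map would then place references with high similarity close together.

```python
import math
import re
from collections import Counter


def signature(text):
    """Crude word-count vector standing in for a document's 'signature'."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity of two count vectors: 1.0 = same word profile, 0.0 = no overlap."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


farm_1 = signature("herbicide use in crop farming")
farm_2 = signature("crop farming and herbicide resistance")
medicine = signature("herbal extracts in traditional medicine")
print(round(cosine(farm_1, farm_2), 2), round(cosine(farm_1, medicine), 2))  # 0.6 0.2
```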


RefViz - Strengths

■ Reference Retriever™ can search multiple online sources simultaneously

■ Can be used together with EndNote, ProCite, and Reference Manager to provide an additional level of analysis to existing reference collections

■ Analyzes large numbers of references by thematic content

■ Interactive, visual landscape


Reveal trends and associations in references

The Galaxy view organizes references according to how they are related conceptually.

References on farming and herbs, either their cultivation or use as herbicides, are found in the upper left region of the Galaxy.

Groups in the lower right focus on herbs in medicine.

The region in between farming and medicine contains a mix of references about herbage diets in farm animals, herbal extracts from plants, and research on health effects of herbicide exposure.


Quosa

• Type of tool

■ Text mining tool based on concept extraction/clustering

• Capability

■ Statistical analysis (term extraction, frequency ranking, concept extraction using a dynamic extraction algorithm from MIT/Harvard)

• Data sources

■ Unstructured text – PubMed, Ovid, Google Scholar

■ Patents

■ Internal documents

• Results

■ Highly organized collection of documents (folders on a shared server or local machine)

■ Team sharing and annotating


Quosa - Strengths

Full-text retrieval and management of scientific documents

■ Gets the full article from a journal or patent gateway

► PubMed, Ovid, USPTO website

■ Document summary from My Article Organizer

■ Download to EndNote


M-CAM Doors™

• Type of tool

■ Patent database provider, with a text analysis and risk management solution

• Capability

■ Linguistic & semantic-based analysis, multilingual

• Data Sources

■ Patents from over 88 patenting authorities, 50 million patent documents

■ Journal articles (by the end of summer 2006)

• Results

■ “Compass” citation view

■ “Magellan” telescope & hourglass – patent life timeline

■ Patent uniqueness and enforceability analysis

■ Competitive intelligence analysis – financial risk analysis for mergers/acquisitions and stock trading


M-CAM Doors™

Hourglass view – shows behavior and intent

Red bar – cited patents

Blue bar – citing patents

Green bar – concurrent art (shares pendency with the subject patent)

Purple bar – volume of uncited patents

Orange bar – volume of patents that did not cite the subject patent
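The first three Hourglass categories can be derived from citation and date data alone. The Python sketch below counts cited, citing, and concurrent-art patents for a subject patent. It assumes that “shares pendency” means the application-to-issue periods overlap, and that concurrent art excludes direct citations. The purple and orange bars are not defined precisely enough on the slide to sketch. The `Patent` type and sample data are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Patent:
    number: str
    filed: int   # application year
    issued: int  # issue year
    cites: set = field(default_factory=set)  # numbers of patents this one cites


def hourglass_counts(subject, corpus):
    """Count the cited (red), citing (blue) and concurrent-art (green) groups."""
    others = [p for p in corpus if p.number != subject.number]
    cited = {p.number for p in others if p.number in subject.cites}
    citing = {p.number for p in others if subject.number in p.cites}
    # Concurrent art (assumed): pendency overlaps the subject's, no direct citation.
    concurrent = {
        p.number for p in others
        if p.filed <= subject.issued and subject.filed <= p.issued
        and p.number not in cited and p.number not in citing
    }
    return {"cited": len(cited), "citing": len(citing), "concurrent": len(concurrent)}


# Hypothetical data: subject pending 1990-1993.
subject = Patent("S", 1990, 1993, {"A"})
corpus = [
    subject,
    Patent("A", 1985, 1988),          # cited by S
    Patent("B", 1995, 1998, {"S"}),   # cites S
    Patent("C", 1991, 1994),          # pending at the same time as S
    Patent("D", 1980, 1983),          # unrelated
]
print(hourglass_counts(subject, corpus))
```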


M-CAM Doors™ - Strengths

• Powerful visual interface for citation analysis, with related family & legal status views

• Can rate each patent for its uniqueness, reliance on related patents, and enforcement potential, based on the Hourglass view

• Can rank patent clusters by relevance to business objectives

• Competitive Intelligence/Investment Research

■ New Patent Thursday™, Patent Portfolio Confidence Rating™, Custom PPCR™


PatAnalyst

• Type of tool

■ Patent database provider – integrated source (UNIPAT) of patent databases from US, PCT, EPO, PAJ, Germany, UK, France and Switzerland

■ Patent search & examination service

• Capability

■ No text mining algorithm

• Data Sources

■ 51.5 million patent documents – bibliographic data from 70 countries, from the EPO

■ 15 million full-text documents – 8 countries/patenting authorities

• Results

■ Viewer – analyze and organize the patent documents/families

■ Easy-to-use analytical color highlighting of keywords

■ Organized folders of documents


PatAnalyst - Strengths

• Powerful user interface with enhanced display features

■ Keywords highlighted in different colors

■ Side-by-side views of full-text and standard bibliographic data

■ Integrated IPC category trees

■ “Live” legal status & patent family tree view from EPO Viewer (EPOQUE)

■ Combined search of full-text & bibliographic data


Aureka

• Type of tool

■ Content and software tool specializing in visualization and citation analysis

• Capability

■ Keyword and statistical analysis

• Data Sources

■ Patent databases listed in MicroPatent’s FullText collection

• Results

■ ThemeScape maps, hyperbolic citation trees, text clusters


Aureka ThemeScape Map of Stem Cell Technology

A ThemeScape map of a large set of documents provides an initial view of the content. Additional probing and analysis of the map will help to reveal more insight.


Citation Tree of Patent EP0778277

The citation tree of a patent provides insight into a corporation’s strategic intent with that patent, e.g. building a picket fence, a non-core patent, or a lack of R&D interest.


Aureka – Strengths

• Strong citation analysis tool

► Interactive citation tree – intelligence analysis and strategic planning

• Annotation capabilities

• Strong visualization analysis

► Patent mapping with ThemeScape

► Clustering by Vivisimo


Wisdomain

• Type of tool

■ Content and software tool: web-based searching and citation tool; the analysis module is local

• Capability

■ Keyword analysis, citation-map visualized searching

• Data Sources

■ Patents, specializing in US, EP, PCT, PAJ, INPADOC legal and family status, China abstracts, Korea abstracts

• Results

■ Genealogy tree, tables, charts


Wisdomain - Strengths

• Strong citation analysis capability

► Backward and forward citations, with more than one level of nesting

► Collateral citation analysis

► Citation alerts

• Genealogy Tree

► Good for competitive analysis and licensing strategy planning

• Graphic view of the search results
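Backward and forward citations with more than one level of nesting can be sketched as a breadth-first walk over a citation graph. In the Python below, the citation data and function names are hypothetical. Backward citations follow the “cites” links directly, and forward citations follow the reversed links.

```python
from collections import deque


def citation_levels(start, graph, max_depth):
    """Breadth-first walk; returns {patent: nesting level} up to max_depth."""
    levels = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if levels[current] == max_depth:
            continue  # do not expand beyond the requested nesting depth
        for neighbour in graph.get(current, ()):
            if neighbour not in levels:
                levels[neighbour] = levels[current] + 1
                queue.append(neighbour)
    return levels


def reverse(graph):
    """Turn 'patent -> patents it cites' into 'patent -> patents citing it'."""
    citing = {}
    for source, targets in graph.items():
        for target in targets:
            citing.setdefault(target, set()).add(source)
    return citing


# Hypothetical backward-citation data: patent -> patents it cites.
cites = {"A": {"B", "C"}, "B": {"D"}, "D": {"E"}}
backward = citation_levels("A", cites, max_depth=2)          # A:0, B:1, C:1, D:2
forward = citation_levels("D", reverse(cites), max_depth=2)  # D:0, B:1, A:2
```

Breadth-first order guarantees that each patent is recorded at its shallowest nesting level, which is usually what a citation tree display wants.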


Collateral Citation: identifying similar patents sharing the same pending period with the subject patent

Figure: timeline of the subject patent (applied 1990, issued 1993) and surrounding patents, with the key collateral patents that share its pending period highlighted.

7 collateral patents are identified based on indirect citation relations.
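The slide defines collateral patents only as sharing the subject patent's pending period and being found through indirect citation relations. The Python sketch below makes one possible reading explicit. It is an assumption, not Wisdomain's documented rule. Under this reading, a patent is collateral if:

- its pending period overlaps the subject's;
- it neither cites nor is cited by the subject; and
- it either shares a cited reference with the subject, or is cited by a later patent that also cites the subject.

The `Patent` type and the data are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Patent:
    number: str
    applied: int  # application year
    issued: int   # issue year
    cites: set = field(default_factory=set)


def collateral(subject, corpus):
    """Patents pending at the same time as `subject` and linked to it only indirectly."""
    citers = [p for p in corpus if subject.number in p.cites]
    found = set()
    for p in corpus:
        if p.number == subject.number:
            continue
        if not (p.applied <= subject.issued and subject.applied <= p.issued):
            continue  # pending periods do not overlap
        if p.number in subject.cites or subject.number in p.cites:
            continue  # direct citation, so not collateral
        shares_reference = bool(p.cites & subject.cites)
        co_cited = any(p.number in q.cites for q in citers)
        if shares_reference or co_cited:
            found.add(p.number)
    return found


# Hypothetical data: subject applied 1990, issued 1993.
subject = Patent("S", 1990, 1993, {"R1"})
corpus = [
    subject,
    Patent("P1", 1991, 1994, {"R1"}),        # shares reference R1, overlapping pendency
    Patent("P2", 1985, 1988, {"R1"}),        # shares R1 but pending earlier
    Patent("P3", 1992, 1995),                # co-cited with S by L
    Patent("L", 1996, 1998, {"S", "P3"}),    # later patent citing both
    Patent("P4", 1991, 1995, {"S"}),         # cites S directly
]
print(sorted(collateral(subject, corpus)))  # ['P1', 'P3']
```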


Summary

Vendor Name | Strength | Potential User Groups

ClearForest | Extraction modules | Business Intelligence
GoldFire | Sophisticated semantic analysis tool | R&D scientists
Inxight | Extraction & Federated Search | R&D Informatics
OmniViz | Interactive visualization | R&D scientists
Temis | Extraction using Specialized Skill Cartridges | R&D scientists, Business Intelligence
Quosa | Full-text retrieval & mgmt | R&D scientists
RefViz | Bibliographic data post-processing | R&D scientists, Information Professionals
VantagePoint | Analytical tool box for technology or company assessment | Information Professionals, Business Intelligence
Aureka | Patent mapping, clustering & citation analysis | Legal/Patent Dept., R&D scientists, Information Professionals, Strategic Planning, Business Intelligence
M-CAM | Patent uniqueness & enforcement analysis | Business Intelligence, Legal/Patent Dept., Information Professionals
PatAnalyst | Powerful full-text user interface with display features | Information Professionals, R&D scientists
Wisdomain | Strong collateral citation analysis | R&D scientists, Information Professionals


Path Forward

■ Phase II

► Pilot selected tools

► Identify potential client groups and interview representative clients


Closing Remarks


Acknowledgements

Peter Mattei (Aureka)

Thomas Klose (ClearForest)

Shelley Pavlek (GoldFire/Invention Machine)

Joanne Freeman (Inxight)

Marlene Khouri (M-CAM)

Heahyun Yoo (OmniViz)

Tony Medina (PatAnalyst)

Michael Rogers (Quosa)

Karen Stesis (RefViz)

Tisha Zawisky (Temis)

Lou Ann DiNallo (VantagePoint)

Mary Talmadge-Grebenar (Wisdomain)

Joseph Bezek

Claudia Powers

Ramesh Durvasula (Informatics)

Ronald Stoner (Mead Johnson)


Questions