Semantics Rule, Keywords Drool
J. Brooke Aker, CEO, Expert System USA
February 2010
Corporate background
• Most accurate, largest, fastest growing semantics company worldwide
• 100+ customers, including large corporations and government, in:
– business intelligence
– enterprise search & data extensibility
– market sentiment
– customer care
• 100+ dedicated engineers focused on core semantic technology, applications, tools and services:
– 200 man-years invested in the development of COGITO over the last 10 years.
• 20 years old, private & profitable – FY2008: $13.5M, 110+ employees, 30% growth each of last 3 years
– Offices in Connecticut, California, UK, Italy, & Germany
Why Do Keywords Drool?
3 Problems with Search Technology:
1. Same Word, Different Meanings
Jaguar (animal) vs. Jaguar (car)
2. Different Words, Same Meaning
Disability Legislation = Equal Opportunity Law
3. Different Words, Related Meaning
Organization: Company
Organization: Charity
Organization: Trade Union
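The first two problems can be reproduced with a few lines of literal keyword matching. This is a minimal sketch, not any product's search engine; the document ids and texts in `DOCS` are invented for illustration.

```python
# Hypothetical mini-corpus; ids and texts are invented for illustration.
DOCS = {
    "d1": "new disability legislation passed",
    "d2": "equal opportunity law explained",
    "d3": "jaguar spotted in the rainforest",
    "d4": "jaguar unveils a new sports car",
}


def keyword_search(query, docs=DOCS):
    """Return ids of documents that contain every query word literally."""
    terms = query.lower().split()
    return sorted(
        doc_id
        for doc_id, text in docs.items()
        if all(term in text.split() for term in terms)
    )


# Different words, same meaning: d2 is missed entirely.
print(keyword_search("disability legislation"))  # ['d1']
# Same word, different meanings: animal and car come back mixed.
print(keyword_search("jaguar"))  # ['d3', 'd4']
```

The literal match has no way to know that d2 is about the same subject as d1, or that d3 and d4 are about different things.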
Results in Declining Productivity
[Chart: Productivity of Search plotted against Amount of Information]
Search approaches shown: Databases, Files & Folders, Directories, Keyword Search (Google), Tagging, Natural Language Search, Semantic Search
Eras shown: Desktop (PC Era), World Wide Web (Web 1.0), Social Web (Web 2.0), Semantic Web (Web 3.0)
Information Tasks In Business
[Matrix: Query (Well Formed / Not Well Formed) against Sources (Known / Not Known); tasks shown: Search, Discovery, Analysis, Exploration]
Information Measures In Business
1. Precision: Retrieving a high level of accurate results relevant to your search query (a measure of exactness)
2. Recall: Retrieving a high percentage of relevant documents (a measure of completeness)
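Both measures are simple ratios over sets of document ids. The sketch below uses the standard definitions (precision = relevant retrieved / all retrieved; recall = relevant retrieved / all relevant); the function name and the sample ids are illustrative assumptions.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a single query.

    retrieved: ids the search system returned
    relevant:  ids a human judged relevant to the query
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Four documents returned, two of them relevant; three relevant documents exist.
p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d2", "d5"])
print(round(p, 2), round(r, 2))  # 0.5 0.67
```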
[Chart: Recall (low to high) against Precision (low to high), positioning PowerSet, Keywords, Statistics, and Semantics]
What Business Wants IT to Provide
Semantics plays a role in all these except perhaps the last 2.
Source: AMR Research
So What Then is the Semantic Web?
Web 1.0: One Producer, Many Consumers
Web 2.0: Everyone Produces, Everyone Consumes
Web 3.0: Everyone Produces, Pinpoint Consumption
Semantics
COGITO® : deep analysis
4 Approaches (definition and example)
• Morphological Analysis: understand word forms. Example: dog, dogs, and dog-catcher are closely related.
• Grammatical Analysis: understand the parts of speech. Example: "There are 40 rows in the table" uses rows as a noun, vs. "She rows 5 times a week" uses rows as a verb.
• Logical Analysis: understand how words relate to other words. Example: "Jeffrey Skilling, represented by Attorney Daniel Petrocelli, is married to Rebecca Carter". Rebecca is married to Jeffrey, not Daniel.
• Semantic Analysis (disambiguation): understand the context of key words. Example: "I used beef broth for my soup stock" uses stock in the context of food, vs. "The company keeps lots of stock on hand" uses stock in the context of inventory.
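The "stock" example can be illustrated with a toy context-overlap rule: pick the sense whose cue words share the most words with the sentence. This is a deliberately simplified stand-in and not COGITO's method; the sense labels and cue-word sets in `SENSES` are assumptions made up for this sketch.

```python
# Hypothetical cue words for two senses of "stock"; not taken from any real lexicon.
SENSES = {
    "food": {"broth", "soup", "beef", "simmer", "cook"},
    "inventory": {"company", "warehouse", "hand", "goods", "supply"},
}


def disambiguate(sentence, senses=SENSES):
    """Return the sense whose cue words overlap most with the sentence."""
    words = {w.strip('.,"').lower() for w in sentence.split()}
    scores = {sense: len(words & cues) for sense, cues in senses.items()}
    return max(scores, key=scores.get)


print(disambiguate("I used beef broth for my soup stock"))      # food
print(disambiguate("The company keeps lots of stock on hand"))  # inventory
```

A real system would draw the context from a full semantic network rather than a hand-written cue list, but the principle of letting surrounding words select the meaning is the same.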
• Technology that understands the real meaning of the words – based on theories of human comprehension
The solution is Semantics
Using human comprehension for machine understanding of text.
Machine understanding of text needs:
• A semantic network
• A parser to trace each text back to its basic elements
• A linguistic engine to query the semantic network
• A system to eliminate ambiguity
Steps to establish meaning
1. Parse
2. Eliminate Ambiguity
3. Order & Priority
Components: Semantic Network, Linguistic Query Engine
COGITO® is generic and horizontal, and can transform unstructured information into structured data that can be managed with standard databases.
What is a Semantic Network?
• The heart of semantic technology.
• Quality of results derives from the complexity and richness of the network.
• Includes all definitions of all words.
• Includes relationships among all words.
COGITO® English Semantic Network:
- 350,000 words
- 2.8m relationships
Semantic Networks
Traditional technologies can only “guess” the meaning using keywords, shallow linguistics, and statistics.
Semantic Networks instead identify:
Connections
Concepts
Terms
Abbrev.
Phrases
Meanings
Domains
“San Jose is an American city”
“San Jose is a geographic part of California”
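The two San Jose statements can be stored as typed links between concepts. This is a minimal sketch of a semantic network as a labeled graph, not the COGITO data model; the class name and the relation labels `is_a` and `part_of` are assumptions for illustration.

```python
from collections import defaultdict


class SemanticNetwork:
    """Toy semantic network: concepts joined by named relations."""

    def __init__(self):
        # (source concept, relation name) -> set of target concepts
        self._edges = defaultdict(set)

    def add(self, source, relation, target):
        self._edges[(source, relation)].add(target)

    def related(self, source, relation):
        """Return the targets linked from source by relation, sorted."""
        return sorted(self._edges.get((source, relation), set()))


net = SemanticNetwork()
net.add("San Jose", "is_a", "American city")
net.add("San Jose", "part_of", "California")

print(net.related("San Jose", "is_a"))     # ['American city']
print(net.related("San Jose", "part_of"))  # ['California']
```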
Technology Stack
Semantic Network
Linguistic Query Engine
Development Studio
Languages: English, Arabic, Italian, German, Other Middle Eastern
1. Morphology
2. Grammatical
3. Logic
4. Disambiguation
Develop & Add Custom Rules
80% Precision
90%+ Precision
Semantic Intelligence
• Linguistic rules
• Sentence analysis
• Semantic Network
Shallow text analytics
• Statistics
• Heuristic rules
• Morphological recognition
Keyword-based technologies
Capabilities: Disambiguation, Entity extraction, Categorization, Natural lang. UI, Semantic Search, Discovery, Sentiment
100% Semantic Technology
Superior Performance
Semantic text analysis processing speed (one CPU): 60 KB/sec
Scalability in number of CPUs: Virtually unlimited
Typical time of access to a concept in the semantic net: < 10^-6 sec
Number of concepts in English semantic net: 350,000
Hyponyms and hypernyms: 400,000+
Hypernyms and troponyms: 55,000
Average # of attributes for each concept: 20
Number of relations in semantic net (English): 2,800,000
Software memory footprint (semantic net and engine): 50 MB
Expert System Unique Feature #1
• Expanded Definition Sets - captures all possible ways of expressing a concept, beyond the use of a single word:
• Compound word – like “blackbird” or “cookbook”
• Collocation – like “overhead projector” or “landing field”
• Idiomatic expression – like “to fly off the handle” or “to weigh anchor”
• Locutions – group of words that express simple concepts that cannot be expressed by a single word
• Verbal lemmas – such as a verb in the infinitive form, e.g. “to write”, or verbal collocations, e.g. “to sneak away”
Keyword / Statistical and Shallow Semantic Tech Fails Here: it treats the words in “to fly off the handle” as separate words, not as a single concept.
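Treating a multi-word expression as one concept can be sketched with longest-match lookup against a table of known expressions. This is an illustrative sketch only; the `EXPRESSIONS` table and its concept labels are hypothetical and far smaller than a real definition set.

```python
# Hypothetical expression table: token sequence -> concept label.
EXPRESSIONS = {
    ("fly", "off", "the", "handle"): "LOSE_TEMPER",
    ("overhead", "projector"): "OVERHEAD_PROJECTOR",
}
MAX_LEN = max(len(key) for key in EXPRESSIONS)


def chunk(tokens):
    """Replace known multi-word expressions with one concept, longest match first."""
    out = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate span first, then shrink.
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in EXPRESSIONS:
                out.append(EXPRESSIONS[key])
                i += n
                break
        else:
            # No expression starts here: keep the plain word.
            out.append(tokens[i])
            i += 1
    return out


print(chunk("he will fly off the handle".split()))
# ['he', 'will', 'LOSE_TEMPER']
```

A keyword index would instead store "fly", "off", "the", and "handle" as four unrelated entries.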
Expert System Unique Feature #2
• Expanded Semantic Relations - expanded set (65) of relations between concepts, derived by looking at their use within the text. Answers questions like “Who did what to whom?”, often called a “triple” or subject-action-object. WordNet, for example, contains only 5 relation types.
• Verb / Subject
• Verb / Direct Object
• Adjective / Class
• Syncon / Class
• Syncon / Corpus
• Syncon / Geography
• Fine Grain / Coarse Grain
• Supernomen / Subnomen
• Omninomen / Parsnomen
Keyword / Statistical and Shallow Semantic Tech Fails Here: it treats “RIM sued Verizon” as the same thing as “Verizon sued RIM”.
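The RIM and Verizon example is easy to check: the two sentences have identical bags of words but different subject-action-object triples. The sketch below assumes three-word "Subject verb Object" sentences; `naive_triple` is a hypothetical helper, not a real parser.

```python
from collections import Counter


def bag_of_words(sentence):
    """Word counts with order discarded, as a keyword index would see them."""
    return Counter(sentence.lower().split())


def naive_triple(sentence):
    """Split a three-word 'Subject verb Object' sentence into a triple.

    Assumption: the sentence has exactly three words in that order.
    """
    subject, action, obj = sentence.split()
    return (subject, action, obj)


a = "RIM sued Verizon"
b = "Verizon sued RIM"

print(bag_of_words(a) == bag_of_words(b))  # True: keywords cannot tell them apart
print(naive_triple(a) == naive_triple(b))  # False: the triples differ
print(naive_triple(a))                     # ('RIM', 'sued', 'Verizon')
```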
Expert System Unique Feature #3
• Categories of Attributes – every concept in the semantic network also contains attributes which are organized into a hierarchy of categories. The attributes and categories are assigned to maximize similarities and differences between concepts as an aid in disambiguation.
Categories: object, animals, plants, people, concepts, places, time, natural phenomena, states, quantity, groups
Keyword / Statistical and Shallow Semantic Tech Fails Here: it can’t tell you which portions of a document are related categorically … e.g. as a first cut, it only points to words, not to sections within a long document.
Thank you
Brooke Aker
CEO of Expert System US
+1 860-614-2411
www.expertsystem.net