Patterns of Semantic Integration: Riding the Next Wave
April 2006
Dan McCreary
President
Dan McCreary & [email protected]
(952) 931-9198
Managed Metadata Solutions
Licensed Under Creative Commons 2.5
Creative Commons 2.5
• Attribution. You must attribute the work in the manner specified by the author or licensor.
• Noncommercial. You may not use this work for commercial purposes.
• Share Alike. If you alter, transform, or build upon this work, you may distribute the resulting work only under a license identical to this one.
Patterns of Semantic Integration
Our ever-increasing understanding of solid-state physics has allowed Moore’s Law to proceed unabated for the last 40 years. Exciting developments in quantum physics, nanotechnology and molecular self-assembly will continue this trend for the foreseeable future. But why is it that an instructor can’t quickly import a database of 10,000 subject-appropriate lesson plans and quiz items into their learning-management system and dynamically adjust classroom content and assessments to individual student learning styles and interests? The key to this and other computer-to-computer interoperability challenges lies in the difficulty computer systems have in finding and precisely exchanging data. Enter the Semantic Web. The designers of the current World Wide Web realized that the gateway to this does not require faster computers and networks but instead lies in the careful publishing and exchange of data semantics (or meaning) and the precise publishing of data that describes data (metadata) in a machine-readable structure. This presentation will review patterns that researchers around the world are using to make the job of computer integration easier, allowing even ultimate frisbee™ coaches access to vast amounts of structured information.
Background for Dan McCreary
• Computer Consultant in Minneapolis
• Became obsessed at a young age with computer-to-computer communications
• Interested in OO, XML, semantics and business strategy
Pattern Themes
• We learn how to create and use models of the world to discover underlying patterns of nature
• Computer-to-computer communication also uses models, which allow us to find underlying patterns and solve these problems
Agenda
• The steps required for precise exchange of information between computer systems
• Define “semantics” and key concepts in the semantic web
• HTML, XML, RDF
• Discuss limitations of current HTML web and XML
• Show how Semantic Web technologies solve many of these problems
• Semantic patterns
• Predictions
• References
1970 Sci-Fi Classic: “The Forbin Project”
A New Intersystem Language!
Lesson: Before you take over the world you must exchange semantically precise metadata!
Moore’s Law
Creative Commons 1.0 Courtesy of Ray Kurzweil and Kurzweil Technologies, Inc
Thesis: We Need Semantics
• For the next revolution in computing
– We don’t need faster CPUs
– We don’t need larger hard drives
– We don’t need faster networks
– We don’t need more HTML linking
• We need to link our concepts using semantic technologies
The Agent Vision
The Semantic Web will bring structure to the meaningful content of Web pages, creating an environment where software agents roaming from page to page can readily carry out sophisticated tasks for users.
“The Semantic Web: A new form of Web content that is meaningful to computers will unleash a revolution of new possibilities”
By Tim Berners-Lee, James Hendler and Ora Lassila
Overlapping Terminology
Data Warehouse
Data Mining
Enterprise Application Integration (EAI)
Metadata Discovery
Statistical Analysis
Pattern Discovery
Relational Database Metadata
Semantic Web
Business Semantics Data Dictionary
HTML Web
Computer Science Is About Abstraction
Diagram axes: Time, Level of Abstraction
Machine Language: 10100101
Assembly Language: MOV R0, A1 / BNE F32C
FORTRAN: DO I=1, 100 / I=I+1
Structured Programming: Proc(i1, i2, o1)
Object-oriented Programming
XML
GUI
Person to Person Dialog
Sound
Words
Concepts
Sentences
Conversation
Problem Solving
higher abstraction
Computer to Computer Dialog
Internet
XML Tags
Documents/XML Schema
Graphs/Ontologies/RDF/OWL
Semantic Integration
Agents
You Are Here
Semantic Triangle
Concept: a pattern of neural activity in our brain
Referent: physical objects
Symbol: “cat”, “Katze” (German), “gato” (Spanish)
Symbol symbolizes Concept; Concept refers to Referent; Symbol stands for Referent
Ogden, C. K., & Richards, I. A. (1923) The Meaning of Meaning
Symbols Can Only Directly Link to Concepts
Ogden, C. K., & Richards, I. A. (1923) The Meaning of Meaning
Concept
Referent
Symbol: “cat”
• The link between a symbol and its referent is an INDIRECT link
• The path from symbol to referent MUST pass through the concept
• Only symbols can be transmitted between computers
The Problem of Semantic Ambiguity
Did you say you were looking for mixed nuts?
context=food context=hardware
People use context to derive the correct meaning.
59 meanings of "run"
"run"
18 noun "senses": tally, test, footrace, streak, play, …
41 verb "senses": move fast, scat, go, operate, has form, …
"the kids ran to the store"
"the Yankees scored a run in the bottom of the 9th"
"the experiment ran for over an hour"
"her run of luck was just starting"
"she broke the mile run record"
"the football 3rd down play was a run"
"13 other noun meanings…"
"I would run from a ticking bomb."
"The path runs up the hill."
"You need training to run this machine."
"The movie plot runs like this."
"36 other verb meanings…"
Source: WordNet at http://wordnet.princeton.edu/
Context
Analogy: English Dictionary
Term
Metadata (data about data)
Definitions
source: www.m-w.com
Note: people use context to find the correct meaning.
Word Senses
“run”
tally
test
footrace
streak
play
move fast
scat
go
operate
has form
duration
A single word maps to many concepts
Synonym Ring
<Person>Joe Smith</Person>
<Individual>Joe Smith</Individual>
<Human>Joe Smith</Human>
Joe Smith
Many symbols for the same object
Refers To
Symbolizes
Stands For
I’m Thinking of an Animal…
• It has four legs
• It has fur
• It chases mice
• It goes “meow”
If you describe enough of the properties of a concept, you can have reasonable assurance that two concepts are the same.
Note: since “concepts” are neural patterns in the brain, an “exact” match between concepts is difficult to measure.
Concept Linking
Question: How can you tell if two concepts are the same if two systems don’t share the same symbol?
Answer: If they have the same properties (and relationships), you can assume with reasonable probability that they are the same concept.
symbol
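One way to make "same properties, same concept" concrete is to score the overlap between two property lists. The sketch below is only an illustration: the property strings, the Jaccard measure and the 0.75 threshold are all assumptions, not something the slides prescribe.

```python
def property_overlap(a: set, b: set) -> float:
    """Jaccard similarity: shared properties divided by all properties seen."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def probably_same(a: set, b: set, threshold: float = 0.75) -> bool:
    """Treat two concepts as the same when enough properties overlap."""
    return property_overlap(a, b) >= threshold


# Hypothetical property lists published by two systems under different symbols.
cat = {"has four legs", "has fur", "chases mice", "goes meow"}
katze = {"has four legs", "has fur", "chases mice", "goes meow", "has whiskers"}
dog = {"has four legs", "has fur", "chases cats", "goes woof"}

print(probably_same(cat, katze))  # overlap is 4 of 5 properties
print(probably_same(cat, dog))    # overlap is 2 of 6 properties
```

The threshold is where "reasonable probability" gets decided; a real registry would also compare relationships, not just flat properties.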
Semantics is About Concept Linking
• Wouldn’t it be nice…
– If computers could name things internally or on a web site however they liked (keep using the current web)
– But we could always link those names back to a centralized database of concepts
– Computers could do this automatically just like they translate domain names (www.google.com) into IP addresses (64.233.187.99)
– Then we could communicate precisely without dictating the names that are used inside a computer system or on a web page
HTML Sample
<title>The Problem of Semantics</title>
<p>This is a standard document that is sent between two computers using the <a href="http://w3c.org/Protocols">HTTP</a> protocol. Note that other than the markup tags like <b>bold</b> there is very little that a computer can do to understand the meaning of the text.</p>
Unless computers "understand" the words in the English language it will be very difficult for them to understand the meaning or semantics of the web.
What Computers "See" Today
<title></title><p><a href="http://w3c.org"></a> <b></b> </p>
Unless computers "understand" the words in the English language it will be very difficult for them to understand the meaning or semantics of the web.
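The "tags only" view can be reproduced with Python's standard html.parser module. This is a rough sketch: the class name TagSkeleton and the shortened sample string are made up for illustration, and the parser simply drops every piece of human-readable text.

```python
from html.parser import HTMLParser


class TagSkeleton(HTMLParser):
    """Collects tags and attributes while discarding the human-readable text."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        attr_text = "".join(f' {name}="{value}"' for name, value in attrs)
        self.parts.append(f"<{tag}{attr_text}>")

    def handle_endtag(self, tag):
        self.parts.append(f"</{tag}>")


def skeleton(html: str) -> str:
    """Return only the markup of an HTML fragment."""
    parser = TagSkeleton()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


# Shortened, hypothetical version of the sample page.
sample = ('<title>The Problem of Semantics</title>'
          '<p>Sent using the <a href="http://w3c.org/Protocols">HTTP</a> '
          'protocol with <b>bold</b> markup.</p>')
print(skeleton(sample))
```

Everything a person would read is gone; what remains is structure and one link target.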
XML allows you to create new “tags”
<PersonGivenName>Joe</PersonGivenName>
<PersonFamilyName>Smith</PersonFamilyName>
<Address>123 Main Street</Address>
<City>Anytown</City>
<State>Minnesota</State>
<Phone>(651) 555-1234</Phone>
Without a data dictionary, it is difficult to know what the data elements mean. The tags appear in patterns but what they mean is still a mystery to a computer.
<tag>data</tag>
Which external computers may not understand
<></><></><></><></><></>
Without a “data dictionary”, it is difficult to know what the meaning of the data elements is. The tags appear in patterns but what they mean is still a mystery to a computer.
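A small sketch of what a data dictionary adds. The dictionary entries, their wording and the <Record> wrapper element are assumptions made for this example; only the tag names come from the slide.

```python
import xml.etree.ElementTree as ET

# Hypothetical data dictionary entries; the definitions are illustrative.
DATA_DICTIONARY = {
    "PersonGivenName": "The first (given) name of a person",
    "PersonFamilyName": "The last (family) name of a person",
    "State": "The name of a state within a country",
}

# The <Record> wrapper is an assumption so the fragment parses as one document.
RECORD = """<Record>
  <PersonGivenName>Joe</PersonGivenName>
  <PersonFamilyName>Smith</PersonFamilyName>
  <State>Minnesota</State>
  <Phone>(651) 555-1234</Phone>
</Record>"""


def annotate(xml_text):
    """Pair each element with its dictionary definition, or None if unknown."""
    return [(el.tag, el.text, DATA_DICTIONARY.get(el.tag))
            for el in ET.fromstring(xml_text)]


for tag, value, meaning in annotate(RECORD):
    print(f"{tag} = {value!r}: {meaning or 'no definition, meaning unknown'}")
```

The parser reads every element without trouble, but Phone stays a mystery because nothing in the dictionary says what it means.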
Metadata
• Metadata is any data that describes other data
• Metadata is itself data and is stored in specialized structures (directed graphs) to aid comparison with other metadata
• A controlled store of metadata is called a “registry”
Metadata describes Data
RDBMS
document keywords
tables
web navigation
columns
source-code
org-chart
product-specs
Hypertext Links and Data Element Links
The Semantic Web
Metadata Registry A
Metadata Registry B
The semantic web is about linking conceptual data elements in published metadata registries
The Hypertext Web
The current HTML web is focused on linking published documents with HTML
Enter the URI…
• Today's web allows documents to be accessed by people if people put links in between documents – the hypertext web
• But it is very difficult for machines to "understand" what we are saying and what we mean and what to do with the data
• But machines CAN determine if two URIs match:
<SurName>Smith</SurName> <LastName>Smith</LastName>
http://www.shared_dictionary.com/PersonFamilyName
MDR
Hey, you both “mean” the same thing!
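The URI comparison can be sketched in a few lines. The example.org URIs and the TAG_TO_URI table stand in for a real metadata registry and are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical registry entries: local tag name -> shared concept URI.
TAG_TO_URI = {
    "SurName": "http://example.org/mdr/PersonFamilyName",
    "LastName": "http://example.org/mdr/PersonFamilyName",
    "FirstName": "http://example.org/mdr/PersonGivenName",
}


def concept_uri(xml_fragment: str) -> Optional[str]:
    """Look up the shared URI for the fragment's tag, or None if unmapped."""
    return TAG_TO_URI.get(ET.fromstring(xml_fragment).tag)


def mean_the_same(first: str, second: str) -> bool:
    """Two elements match only when both map to the same known URI."""
    uri = concept_uri(first)
    return uri is not None and uri == concept_uri(second)


print(mean_the_same("<SurName>Smith</SurName>", "<LastName>Smith</LastName>"))
print(mean_the_same("<SurName>Smith</SurName>", "<FirstName>Joe</FirstName>"))
```

Neither system had to rename its tags; the only shared agreement is the URI each tag maps to.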
Subject-Verb-Object Triple
Person
“Joe”
Has-a-Given-Name
The person is named “Joe”.
<PersonGivenName>Joe</PersonGivenName>
Triples Are Almost All URIs
http://MyDictionary/DataElement/Person
“Dan”
http://MyDictionary/DataElement/PersonGivenName
URIs can point to a standard location in a metadata registry.
The “type” of link.
Sample RDF Document
<?xml version="1.0"?>
<RDF>
  <Description about="http://www.danmccreary.com/Training/Classes/Semantic_Web">
    <author>Dan McCreary</author>
    <created>2006-01-01</created>
    <modified>2006-03-15</modified>
  </Description>
</RDF>
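To see how a document like this turns into triples, here is a minimal sketch using Python's standard ElementTree parser. It treats the sample as plain XML, assuming the simplified, namespace-free form shown on the slide; a production system would use a real RDF parser and namespaced properties.

```python
import xml.etree.ElementTree as ET

# The simplified sample document from the slide (no namespaces).
RDF_TEXT = """<?xml version="1.0"?>
<RDF>
  <Description about="http://www.danmccreary.com/Training/Classes/Semantic_Web">
    <author>Dan McCreary</author>
    <created>2006-01-01</created>
    <modified>2006-03-15</modified>
  </Description>
</RDF>"""


def to_triples(rdf_text):
    """Turn each property of each Description into (subject, predicate, object)."""
    triples = []
    for description in ET.fromstring(rdf_text).findall("Description"):
        subject = description.get("about")
        for prop in description:
            triples.append((subject, prop.tag, (prop.text or "").strip()))
    return triples


for triple in to_triples(RDF_TEXT):
    print(triple)
```

Each child element becomes one row: the "about" URI is the subject, the tag is the predicate and the text is the object.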
Massive Databases of "Triple Stores"
Subject Predicate Object
A triple store is:
- A database with just 3 columns
- But millions or billions of rows
May require specialized hardware
Key metrics:
- Time to load triples into application
- Time to save triples into database
- Time to browse to an element
- Time to configure system
Sample projects:
• Kowari
• 3Store
• Sesame
RDF "Triple Store"
See: http://simile.mit.edu/reports/stores/
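A toy in-memory version shows the three-column idea. The class name, the example.org subjects and the linear scan are assumptions for illustration; real triple stores such as the projects listed above rely on indexes to cope with millions of rows.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]


class TripleStore:
    """A list of (subject, predicate, object) rows with wildcard matching."""

    def __init__(self) -> None:
        self._rows: List[Triple] = []

    def add(self, subject: str, predicate: str, obj: str) -> None:
        self._rows.append((subject, predicate, obj))

    def match(self, subject: Optional[str] = None,
              predicate: Optional[str] = None,
              obj: Optional[str] = None) -> List[Triple]:
        """Return rows matching the pattern; None acts as a wildcard."""
        pattern = (subject, predicate, obj)
        return [row for row in self._rows
                if all(want is None or want == got
                       for want, got in zip(pattern, row))]


store = TripleStore()
store.add("http://example.org/people/1", "PersonGivenName", "Dan")
store.add("http://example.org/people/1", "PersonFamilyName", "McCreary")
store.add("http://example.org/people/2", "PersonGivenName", "Joe")

print(store.match(predicate="PersonGivenName"))
```

Every query is a pattern over the same three columns, which is why load time and browse time dominate the key metrics.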
Semantic Web Standards Stack
Layers, bottom to top:
URI/IRI, Unicode
XML, Namespaces
XML Query, XML Schema
RDF Model & Syntax
Ontology (OWL)
Rules/Query
Logic
Proof
Trusted Semantic Web
Alongside the layers: Signature, Encryption
Source: Tim Berners-Lee www.w3c.org
http://www.w3.org/Consortium/Offices/Presentations/SemanticWeb/34.html
Example of Metadata Registry
Metaphor: The Translator Agent
Customer (Spanish only): “Me gustaría una cerveza”
Translation Service (speaks Spanish and English): “May I have a beer?”
Internal Server (English only): “Coming right up!”
Cost of Mapping
• Goal: create semantic maps to a few metadata standards, not many standards
Point-to-point (R1, R2, R3, … RN all linked to each other): mapping each metadata registry to the N other metadata registries is the O(N²) problem
Hub (R1, R2, R3, … RN each linked to an ESB): mapping each registry to one metadata registry is the O(N) problem (ESB: Enterprise Service Bus)
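The difference is easy to see with a little arithmetic. The sketch assumes one bidirectional mapping per pair of registries in the point-to-point case and one mapping per registry to a separate shared registry in the hub case; the function names are made up for illustration.

```python
def point_to_point_mappings(n: int) -> int:
    """Every pair of registries needs its own mapping: n * (n - 1) / 2."""
    return n * (n - 1) // 2


def hub_mappings(n: int) -> int:
    """Every registry maps once to the shared registry."""
    return n


for n in (5, 10, 50):
    print(n, point_to_point_mappings(n), hub_mappings(n))
```

If each direction needs its own translator, the point-to-point count doubles to n * (n - 1), which is still quadratic growth.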
Semantic Mappers and Semantic Brokers
Report request in Model A
Metadata Translation Service
XML response in Model A
TDS in Model B
Metadata Registry: Model A, Model B, metadata mappings
RDF queries, XML results
Data Warehouse (RDBMS): SQL or XMLA queries in Model B
Gartner: Vocabulary-based transformation
XMLA: XML for Analysis
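A rough sketch of the translation step: each model maps its own field names to shared concepts, and the broker uses those two maps to rewrite a record. The field names, concept names and the translate function are all hypothetical.

```python
# Hypothetical registry mappings: local field name -> shared concept.
MODEL_A = {"SurName": "PersonFamilyName", "FirstName": "PersonGivenName"}
MODEL_B = {"last_nm": "PersonFamilyName", "first_nm": "PersonGivenName"}


def translate(record: dict, source_map: dict, target_map: dict) -> dict:
    """Rename fields from the source model to the target model via concepts."""
    concept_to_target = {concept: name for name, concept in target_map.items()}
    translated = {}
    for name, value in record.items():
        target = concept_to_target.get(source_map.get(name))
        if target is not None:  # fields with no shared concept are dropped
            translated[target] = value
    return translated


request = {"SurName": "Smith", "FirstName": "Joe", "Shoe": "10"}
print(translate(request, MODEL_A, MODEL_B))
```

Neither model knows about the other; both only know the shared concepts held in the registry.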
Wikipedia Rocks!
• It is currently burdensome to add new metadata to the registry
• Would like to add “Edit this data element” (à la wikis)
• Ideally a “Semantic Wiki”
See: Wikipedia: “Semantic Wiki”
Retrieving Data: An Evolution
• Shorten the time-to-report interval
• Allow users to "browse" data sets interactively
• Remove the dependence on programmers with "backlogs" of reports
• Users frequently waited days, weeks or months to get a custom report created
Monthly “Green Bar” Reports → Browseable Graphical Interface (Cognos)
Increasing Responsiveness
Classification and Categorization
• Whenever we break the continuous observable world into a predefined list of categories, where each category has a label, we call each label a categorical value. These will then become the "dimensions" of our cube.
"red" "green" "blue"
George Lakoff: Women, Fire, and Dangerous Things: What Categories Reveal About the Mind
Note: NO OVERLAP!
Metadata Discovery
• Tools that “scan” data sources and create new ontologies or mappings to existing ontologies
Metadata Registry
Data Source Mappings
Relational Database
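A minimal sketch of the "scan" step against a relational source, using Python's built-in sqlite3 module. The person table and the shape of the returned entries are assumptions for illustration; mapping the discovered columns to an ontology is the harder part and is not shown.

```python
import sqlite3


def discover(conn: sqlite3.Connection) -> list:
    """List every table column with its declared type."""
    found = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # Each PRAGMA row is (cid, name, type, notnull, dflt_value, pk).
        for row in conn.execute(f'PRAGMA table_info("{table}")'):
            found.append({"table": table, "column": row[1], "type": row[2]})
    return found


# Hypothetical data source with a single table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE person (given_name TEXT, family_name TEXT, birth_year INTEGER)"
)

for entry in discover(conn):
    print(entry)
```

Each discovered entry is a candidate data element that a person or tool could then map to an existing registry concept.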
Federated Ontologies
What do you do when you have more than one Ontology?
1) Combine
2) Map
3) Federate
Tools for combination and federation
Multiple Overlapping Ontologies
Cost of Poor Semantics
• IT departments spend 40-60% of their costs on integration
• 90% of integration costs are due to poor semantics
• If every application used and "published" a machine-readable ontology with mappings to published ontologies, integration could be almost "automatic"
Gartner
Metadata cast into formal logics will drive interoperability, automation, cost cutting, better search capabilities and new business opportunities.
Semantic Web Drives Data Management, Automation and Knowledge and Discovery
Alexander Linder
March 2005
G00125145
Semantic Spectrum
Time/Money
High Semantic Clarity
Strong Semantics
Weak Semantics
UML, XMI
Taxonomies
Ontologies
Thesaurus
RDF
XML, XSLT
See also: Wikipedia/semantic spectrum
Glossaries
OWL
Controlled Vocabularies
Word/HTML
Concept Maps
Enterprise Data Models
Structures for Increased Semantics
HTML PDF Word PowerPoint Excel Access Server XML RDBMS RDF Taxonomies Ontologies SOA WSDL
Increased Semantic Precision
Source: Network Inference
Friend of a Friend
• A "Proof of Concept for RDF"
• Requires each person to put an RDF file on their web pages
• System in place to prevent spammers from harvesting e-mail addresses
• Sample RDF vocabulary
• Sample FoaF file:
<foaf:Person>
  <foaf:name>Dan McCreary</foaf:name>
  <foaf:knows>
    <foaf:Person>
      <foaf:name>Bill Titus</foaf:name>
    </foaf:Person>
  </foaf:knows>
</foaf:Person>
© emode.com
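A short sketch of reading "knows" relationships out of a FOAF fragment with Python's standard ElementTree parser. The xmlns:foaf declaration is added here so the fragment parses on its own, and the knows function is a made-up helper; a full FOAF file would normally sit inside an rdf:RDF element.

```python
import xml.etree.ElementTree as ET

FOAF = "http://xmlns.com/foaf/0.1/"  # the FOAF vocabulary namespace

# The slide's sample, with a namespace declaration added so it parses alone.
FOAF_TEXT = f"""<foaf:Person xmlns:foaf="{FOAF}">
  <foaf:name>Dan McCreary</foaf:name>
  <foaf:knows>
    <foaf:Person>
      <foaf:name>Bill Titus</foaf:name>
    </foaf:Person>
  </foaf:knows>
</foaf:Person>"""


def knows(xml_text: str) -> list:
    """Return (person, "knows", friend) triples from one foaf:Person."""
    ns = {"foaf": FOAF}
    person = ET.fromstring(xml_text)
    name = person.findtext("foaf:name", namespaces=ns)
    friends = person.findall("foaf:knows/foaf:Person", ns)
    return [(name, "knows", friend.findtext("foaf:name", namespaces=ns))
            for friend in friends]


print(knows(FOAF_TEXT))
```

Following each friend's own FOAF file in the same way is what turns scattered personal pages into a linked social graph.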
Ontology Architectures
• One "big" ontology (see Cycorp, cyc.com)
– Using a single "Uber-Ontology"
– Akin to "Boiling the Ocean"
• Compared to:
– Many smaller ontologies
– Micro-formats (RDF/A)
– How to combine?
CYC contains over 3 million "assertions"
Source: cyc.com
If You Give A Kid A Hammer…
…the whole world becomes a nail
• People solve problems with the tools they know
• Semantics are new tools for solving computer-to-computer communication problems
• Intelligent agents will be prevalent when we teach organizations to publish their metadata
Cognitive Styles
The way we solve problems is dependent on the tools we know how to use.
Shoshana Zuboff (1988)
In the Age of the Smart Machine
Technology creates:- new ways of thinking- new ways of approaching and solving problems- new sets of "Cognitive Styles"
It is only if we share these cognitive styles that we will be able to create a coherent technology strategy that everyone understands
Agents
Open The Door To The Semantic Web!
• Metadata publishing is hard
• It is a foundation upon which the Semantic Web will be built
• The benefits are indirect and need strong executive sponsorship
• Metadata publishing is no “silver bullet”
• I believe it is the most direct way to get to the Semantic Web
• This will be the most practical way to build intelligent agents
Questions & Answers
If software is ever going to be able to effectively inter-operate (in ways that were not explicitly preconceived and engineered), it will be because applications share enough of the semantics of their data elements.
Doug Lenat, Cycorp
Semantic Technology Conference 2005