UMBC, an Honors University in Maryland
Knowledge Sharing on the Semantic Web
Tim Finin
University of Maryland, Baltimore County
Department of Homeland Security
Advanced Scientific Computing Program
Text Analysis Workshop 25 May 2005
http://creativecommons.org/licenses/by-nc-sa/2.0/
This work was partially supported by DARPA contract F30602-97-1-0215, NSF grants CCR007080 and IIS9875433 and grants from IBM, Fujitsu and HP.
This talk
• Motivation
• The knowledge sharing problem
• Some ongoing projects
  – Finding knowledge on the web
  – Evaluating provenance and trust
• Conclusions
“XML is Lisp's bastard nephew, with uglier syntax and no semantics. Yet XML is poised to enable the creation of a Web of data that dwarfs anything since the Library at Alexandria.”
-- Philip Wadler, Et tu XML? The fall of the relational empire, VLDB, Rome, September 2001.
“The web has made people smarter. We need to understand how to use it to make machines smarter, too.”
-- Michael I. Jordan (UC Berkeley), paraphrased from a talk at AAAI, July 2002
“The Semantic Web will globalize KR, just as the WWW globalized hypertext”
-- Tim Berners-Lee
This talk
• Motivation
• The knowledge sharing problem
• Some ongoing projects
  – Finding knowledge on the web
  – Evaluating provenance and trust
• Conclusions
Knowledge Sharing 1.0
• In 1990 the DARPA knowledge sharing effort defined an approach for interoperability among KB systems and agents
–KIF + Shared Ontologies + KQML
• It was (and is) a great vision that resulted in much good research and some sound standards
–Supporting knowledge interoperability, agent communication, agent tasking and cooperation, etc.
• It never really made it out of the lab
Knowledge Sharing 2.0
• The Web is a Blob, consuming all in its path. Resistance is futile.
• More seriously, it promotes sharing, building on others’ content, offering your content for building upon, decentralization, community development and evolution, common identifiers (URIs), using a working infrastructure, collaborating with industry, etc.
• These are significant advantages.
• The Semantic Web can be the interlingua and infrastructure for interoperability and knowledge sharing.
From where will the markup come?
• A few authors will add it manually.
• More will use annotation tools.
  – SMORE: Semantic Markup, Ontology and RDF Editor
• Intelligent processors (e.g., NLP) can understand documents and add markup (hard)
  – Machine learning powered information extraction tools show promise
• Lots of web content comes from databases & we can generate SW markup along with the HTML
  – See http://ebiquity.umbc.edu/
From where will the markup come?
• In many tools, part of the metadata information is present, but thrown away at output
  – e.g., a business chart can be generated by a tool…
  – …it “knows” a chart’s structure, classification, etc.
  – …but, usually, this information is lost
  – …storing it in metadata is easy!
• So “semantic web aware” tools can produce lots of metadata
  – E.g., Adobe’s use of its XMP platform
This talk
• Motivation
• The knowledge sharing problem
• Some ongoing projects
  – Finding knowledge on the web
  – Evaluating provenance and trust
• Conclusions
Google has made us smarter
Something similar is needed by people and software agents for information on the semantic web.
Why use IR techniques?
• We will want to retrieve over both the structured and unstructured parts of a Semantic Web Document (SWD).
• We should prepare for the appearance of text documents with embedded SW markup.
• We may want to get our SWDs into conventional search engines, such as Google.
• IR techniques also have some unique characteristics that may be very useful, e.g., ranking matches, measuring similarity between documents, relevance feedback, etc.
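As a rough illustration of ranking matches and measuring similarity, the sketch below treats each document as a bag of whitespace-separated tokens, with URIrefs kept whole and case preserved, and ranks documents against a query by TF-IDF cosine similarity. The weighting scheme, the function names and the sample documents are assumptions made for illustration; the slides do not say which IR model Swoogle uses.

```python
import math
from collections import Counter

def idf_table(bags):
    """Inverse document frequency per token (smoothed so no weight is zero)."""
    n_docs = len(bags)
    doc_freq = Counter(tok for bag in bags.values() for tok in bag)
    return {tok: math.log(n_docs / df) + 1.0 for tok, df in doc_freq.items()}

def weigh(bag, idf):
    """Turn a token-count bag into a sparse TF-IDF vector; unknown tokens are dropped."""
    return {tok: tf * idf[tok] for tok, tf in bag.items() if tok in idf}

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(tok, 0.0) for tok, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def rank(query, docs):
    """Return (score, name) pairs, best match first."""
    bags = {name: Counter(text.split()) for name, text in docs.items()}
    idf = idf_table(bags)
    q = weigh(Counter(query.split()), idf)
    return sorted(((cosine(q, weigh(bag, idf)), name) for name, bag in bags.items()),
                  reverse=True)

# Hypothetical documents mixing URIrefs (structured part) and plain words (text part).
DOCS = {
    "time.owl": "http://example.org/time#before http://example.org/time#after time interval",
    "people.rdf": "http://xmlns.com/foaf/0.1/Person http://xmlns.com/foaf/0.1/name person name",
}
```

Because tokens are compared exactly, a query for a URIref with the wrong capitalization matches nothing, which is the behavior one would expect for case-sensitive URIrefs.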
Swoogle Architecture
[Figure: Swoogle architecture. Stages: SWD discovery, metadata creation, data analysis, interface. Components: the Web, candidate URLs, web crawler, SWD reader, SWD cache, SWD metadata, IR analyzer, SWD analyzer, web server, web service, agent service.]
340K SWDs, 48M triples, 97K classes, 55K properties, 7M individuals (April 2005)
Find “Time” Ontology (Demo 1)
We can use a set of keywords to search for an ontology. For example, “time, before, after” are basic concepts for a “Time” ontology.

Digest “Time” Ontology, document view (Demo 2a)

Digest “Time” Ontology, term view (Demo 2b)
[Screenshot: terms listed include TimeZone, before, intAfter.]

Find Term “Person” (Demo 3)
Not capitalized! URIrefs are case sensitive!

Digest Term “Person” (Demo 4)
167 different properties
562 different properties

Swoogle Today (Demo 5a)
Swoogle’s Triple Store lets you shop
And check out your triples into any of several reasoners
Summary

Swoogle (Mar 2004)
• Automated SWD discovery
• SWD metadata creation and search
• Ontology rank (rational surfer model)
• Swoogle watch
• Web interface

Swoogle2 (Sep 2004)
• Ontology dictionary
• Swoogle statistics
• Web service interface (WSDL)
• Bag of URIref IR search
• Triple shopping cart

Swoogle3 (July 2005)
• Better (re-)crawling strategies
• Better navigation models
• Index instance data
• More metadata (ontology mapping and OWL-S services)
• Better web service interfaces
• IR component for string literals
Will it Scale? And How?
• An open question is how well our approach will scale and what techniques will work as the semantic web grows.
• Here’s a rough estimate of the data on the semantic web based on Swoogle’s crawling

System/date  Terms      Documents  Individuals  Triples   Bytes
Swoogle2     1.5x10^5   3.5x10^5   7x10^6       5x10^7    7x10^9
Swoogle3     1.75x10^5  5x10^5     1x10^7       7.5x10^7  1x10^10
2005         2.5x10^5   5x10^6     5x10^7       5x10^8    5x10^10
2008         5x10^5     5x10^7     5x10^8       5x10^9    5x10^11
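To make the estimates easier to compare, the short sketch below derives a few ratios from the table: bytes per triple, triples per document, and the growth factor between two rows. The numbers are copied from the table; the helper names are illustrative only.

```python
# Rows copied from the table above: (terms, documents, individuals, triples, bytes).
ESTIMATES = {
    "Swoogle2": (1.5e5, 3.5e5, 7e6, 5e7, 7e9),
    "Swoogle3": (1.75e5, 5e5, 1e7, 7.5e7, 1e10),
    "2005": (2.5e5, 5e6, 5e7, 5e8, 5e10),
    "2008": (5e5, 5e7, 5e8, 5e9, 5e11),
}
COLUMNS = ("terms", "documents", "individuals", "triples", "bytes")

def bytes_per_triple(row):
    """Average storage per triple implied by a table row."""
    return row[4] / row[3]

def triples_per_document(row):
    """Average number of triples per document implied by a table row."""
    return row[3] / row[1]

def growth(earlier, later):
    """Column-by-column growth factor between two rows."""
    return {name: b / a for name, a, b in zip(COLUMNS, earlier, later)}

for label, row in ESTIMATES.items():
    print(label, round(bytes_per_triple(row)), round(triples_per_document(row)))
```

For instance, the Swoogle2 row implies about 140 bytes per triple, and the 2008 row assumes roughly 100 times as many documents as the Swoogle3 row.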
Harnessing Google
• Google started indexing RDF documents some time in late 2003
• Can we take advantage of this?
• We’ve developed techniques to get some structured data to be indexed by Google
• And then later retrieved
• Technique: give Google enhanced documents with additional annotations containing Swangle Terms™
This talk
• Motivation
• The knowledge sharing problem
• Some ongoing projects
  – Finding knowledge on the web
  – Evaluating provenance and trust
• Conclusions
Levels of granularity on the Semantic Web
• The semantic web has several levels of granularity.
• We’re most familiar with documents and triples.
• We’ve been exploring the notion of an RDF Molecule as a “meaningful” collection of RDF triples.
• We believe that RDF molecules will be useful for gathering evidence to verify an RDF graph and recording the provenance.
[Figure: levels of granularity, from coarsest to finest: Universal RDF Graph, RDF Documents, Named Graphs, Molecules, Triples.]
RDF Molecules
• An RDF graph can be decomposed into subgraphs.
• A lossless decomposition is one in which the original graph can be recovered by concatenating the components.
• The presence of “blank nodes” limits our ability to completely reduce the graph to triples.
• RDF molecules are subgraphs which cannot be further decomposed.
• RDF molecules are useful as minimal units of “evidence” in support of a graph.
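A minimal sketch of the simplest such decomposition, assuming no knowledge about the properties involved: triples that share a blank node must stay together, while every fully ground triple forms a molecule on its own. Triples are plain tuples of strings, blank nodes are written `?x` as on the slides, and the function name and the example address are placeholders, not part of the talk.

```python
def is_blank(term):
    """Blank nodes are written with a leading '?', following the slides' notation."""
    return term.startswith("?")

def naive_molecules(triples):
    """Group triples into molecules: triples sharing a blank node end up together."""
    parent = list(range(len(triples)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    first_seen = {}  # blank node -> index of the first triple that mentions it
    for i, (s, _p, o) in enumerate(triples):
        for term in (s, o):
            if is_blank(term):
                if term in first_seen:
                    parent[find(i)] = find(first_seen[term])
                else:
                    first_seen[term] = i

    groups = {}
    for i, triple in enumerate(triples):
        groups.setdefault(find(i), []).append(triple)
    return list(groups.values())

# A graph without blank nodes and the same facts stated about a blank node.
G1 = [("http://foo.com/john", "foaf:name", "John Smith"),
      ("http://foo.com/john", "foaf:mbox", "mailto:john@example.org")]
G2 = [("?x", "foaf:name", "John Smith"),
      ("?x", "foaf:mbox", "mailto:john@example.org")]

print(len(naive_molecules(G1)), len(naive_molecules(G2)))
```

The decomposition is lossless in the sense used above: the molecules partition the triples, so taking their union gives back the original graph.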
[Figure: an RDF graph of interest, with nodes numbered 1 to 7.]
[Figure: an RDF graph of interest and the graph’s molecules.]
[Figure: an RDF graph of interest, the graph’s molecules, and web pages containing one or more molecules discovered by Swoogle.]
Blank nodes cause RDF molecule

G1: RDF graph without blank node (2 molecules)
@prefix foaf: <http://xmlns.com/foaf/0.1/>.
(http://foo.com/john foaf:name “John Smith”)
(http://foo.com/john foaf:mbox mailto:[email protected])

G2: RDF graph with one blank node (1 molecule)
@prefix foaf: <http://xmlns.com/foaf/0.1/>.
(?x foaf:name “John Smith”)
(?x foaf:mbox mailto:[email protected])
Impact of functional dependency

G3:
@prefix foaf: <http://xmlns.com/foaf/0.1/>.
t1: (?x foaf:firstName “John”)
t2: (?x foaf:surname “Smith”)
t3: (?x foaf:mbox mailto:[email protected])

Is foaf:mbox an Inverse Functional Property?
• No: one molecule {t1 t2 t3}
• Yes: two molecules {t1 t3} {t2 t3}

Molecule(s) produced after functional decomposition
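One possible reading of this example, sketched under strong simplifying assumptions: every blank node appears only as a subject, and a blank node is considered identified as soon as it has one triple whose property is inverse functional (IFP) and whose value is ground. Each remaining triple on that node is then paired with the identifying triple. This is an illustration that reproduces the G3 outcome, not necessarily the authors' decomposition algorithm; the function name and the example address are placeholders.

```python
def is_blank(term):
    """Blank nodes are written with a leading '?', following the slides' notation."""
    return term.startswith("?")

def functional_molecules(triples, ifps):
    """Decompose triples into molecules, using IFP triples as keys for blank nodes.

    Assumes blank nodes occur only in subject position (true for G3).
    """
    molecules, by_node = [], {}
    for triple in triples:
        subject = triple[0]
        if is_blank(subject):
            by_node.setdefault(subject, []).append(triple)
        else:
            molecules.append([triple])  # ground triples stand alone

    for node_triples in by_node.values():
        keys = [t for t in node_triples if t[1] in ifps]
        if not keys:
            # Nothing identifies the blank node: its triples cannot be separated.
            molecules.append(node_triples)
            continue
        key = keys[0]  # simplification: use the first identifying triple only
        others = [t for t in node_triples if t != key]
        molecules.extend([t, key] for t in others)
        if not others:
            molecules.append([key])
    return molecules

T1 = ("?x", "foaf:firstName", "John")
T2 = ("?x", "foaf:surname", "Smith")
T3 = ("?x", "foaf:mbox", "mailto:john@example.org")
G3 = [T1, T2, T3]

print(functional_molecules(G3, ifps=set()))          # one molecule
print(functional_molecules(G3, ifps={"foaf:mbox"}))  # two molecules
```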
Propagation of functional dependency

G4:
@prefix foaf: <http://xmlns.com/foaf/0.1/>.
@prefix kin: <http://ebiquity.umbc.edu/ontologies/kin/0.3/>.
(?y foaf:surname "Wang")
(?y kin:motherOf ?x)
(?x foaf:name "Li Ding")
(?x foaf:mbox mailto:[email protected])

foaf:mbox and kin:motherOf are IFP
• Terminal Molecules: two
• Non-Terminal Molecules: two
• Contextual Molecule: n/a

[Figure: graph G4 with its triples labeled t1 to t4, and the molecules built from them.]
Beyond functional dependency
• Our examples relied on OWL inverse functional properties.
• A more general (and realistic) approach will be based on probabilities.
• At issue is the conditional probability that two blank nodes S1 and S2 are equivalent if each has a P property with value O:
  prob(S1=S2 | P(S1,O), P(S2,O))
• A set of properties can be used to get a high probability, e.g., John Smith and J. Smith share the same home phone number and office phone number.
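The slide does not say how evidence from several properties is combined. One simple possibility, shown here purely as an assumption, is a noisy-OR: treat each shared property value as independent evidence and compute the probability that at least one of them correctly identifies the two nodes as the same. The per-property probabilities below are invented for illustration.

```python
def combined_match_probability(per_property_probs):
    """Noisy-OR combination of per-property estimates of prob(S1=S2 | P(S1,O), P(S2,O)).

    Assumes the pieces of evidence are independent, which is a modeling choice,
    not something stated in the talk.
    """
    prob_all_wrong = 1.0
    for p in per_property_probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"not a probability: {p}")
        prob_all_wrong *= 1.0 - p
    return 1.0 - prob_all_wrong

# Hypothetical estimates: sharing a home phone number, sharing an office phone number.
home_phone, office_phone = 0.8, 0.7
print(combined_match_probability([home_phone, office_phone]))
```

With these made-up inputs the combined estimate is higher than either property gives alone, which matches the slide's point that a set of properties can yield a high probability.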
Utility of Molecules
• Why are RDF molecules interesting?
• Suppose we have a graph and we seek evidence from the web to verify its accuracy.
  – E.g., verifying the information in a foaf description.
• Approach:
  – Decompose the graph into molecules
  – Search for instances of each using Swoogle4
  – Note the source and provenance of each molecule
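The approach above can be sketched with an in-memory stand-in for the search step. Here `SOURCES` is a hypothetical dictionary from document URL to the triples found there, playing the role of results that a search service would return; a molecule counts as present in a source if its triples match the source's triples under some consistent binding of its blank nodes (written `?x`). All names, URLs and data are illustrative assumptions.

```python
def unify(term, value, binding):
    """Bind a blank node to a value, or compare a ground term for equality."""
    if term.startswith("?"):
        return binding.setdefault(term, value) == value
    return term == value

def occurs_in(molecule, facts, binding=None):
    """True if every triple of the molecule matches some fact under one binding."""
    binding = {} if binding is None else binding
    if not molecule:
        return True
    (s, p, o), rest = molecule[0], molecule[1:]
    for fs, fp, fo in facts:
        trial = dict(binding)  # copy so a failed attempt leaves no bindings behind
        if fp == p and unify(s, fs, trial) and unify(o, fo, trial):
            if occurs_in(rest, facts, trial):
                return True
    return False

def gather_evidence(molecules, sources):
    """For each molecule, list the source URLs in which it occurs."""
    return [[url for url, facts in sources.items() if occurs_in(m, facts)]
            for m in molecules]

MOLECULES = [
    [("http://foo.com/john", "foaf:name", "John Smith")],
    [("?x", "foaf:name", "John Smith"), ("?x", "foaf:mbox", "mailto:john@example.org")],
]
SOURCES = {
    "http://example.org/a.rdf": [("http://foo.com/john", "foaf:name", "John Smith")],
    "http://example.org/b.rdf": [("_:b1", "foaf:name", "John Smith"),
                                 ("_:b1", "foaf:mbox", "mailto:john@example.org")],
    "http://example.org/c.rdf": [("_:p", "foaf:name", "John Smith"),
                                 ("_:q", "foaf:mbox", "mailto:john@example.org")],
}

for molecule, urls in zip(MOLECULES, gather_evidence(MOLECULES, SOURCES)):
    print(molecule, "<-", urls)
```

The third source is not counted as evidence for the second molecule because its name and mailbox belong to different nodes, which is why the molecule, rather than the single triple, is the unit being searched for.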
This talk
• Motivation
• The knowledge sharing problem
• Some ongoing projects
  – Finding knowledge on the web
  – Evaluating provenance and trust
• Conclusion
Conclusion
• The web will contain the world’s knowledge in forms accessible to people and computers
• We need better ways to discover, index, search and reason over SW knowledge
• Special attention must be applied to provenance and trust
• We must develop, deploy and build on open, non-proprietary standards for knowledge sharing.
• The W3C standards RDF and OWL are a foundation for the first generation
For more information
http://ebiquity.umbc.edu/
Annotated in OWL
Nobody ever got fired for buying IBM
Nobody ever got fired for choosing Web technology
This talk
• Motivation
• The knowledge sharing problem
• Some ongoing projects
  – Finding knowledge on the web
  – Evaluating provenance and trust
  – NLP meets the semantic web
• Conclusions
NLP meets the semantic web
• Agents can benefit from knowledge and information extracted by sophisticated NLP systems.
• NLP systems can make good use of facts published on the web.
• The semantic web provides both an interlingua and publication method for this information exchange
• We’re working on a system to translate information between OntoSem and OWL
O2O System Architecture
[Figure: O2O system architecture. Components: NL text, OntoSem, ontology, fact repository, TMR, OntoSem2OWL, OWL ontology, TMRs in OWL, OWL2OntoSem.]
Issues
• Mismatch between NLP KR systems and Semantic Web KR languages, e.g.
  – Most NLP systems use default reasoning
  – Relaxing constraints for metaphorical readings
• Practical ontology mapping systems need to be developed
  – Combining distributed, partial maps is an interesting idea
Types of molecule

G4:
@prefix foaf: <http://xmlns.com/foaf/0.1/>.
t1: (http://www.cs.umbc.edu/~dingli1 foaf:name "Li Ding")
t2: (http://www.cs.umbc.edu/~dingli1 foaf:knows ?x)
t3: (?x foaf:name "Tim Finin")
t4: (?x foaf:mbox mailto:[email protected])
t5: (?x foaf:mbox mailto:[email protected])

foaf:mbox is not IFP
• Terminal Molecule: {t1}
• Non-Terminal Molecule: n/a
• Contextual Molecule: {t2 t3 t4 t5}

foaf:mbox is IFP
• Terminal Molecules: six
• Non-Terminal Molecules: two
• Contextual Molecule: n/a

[Figure: graph G4 with its triples labeled t1 to t5, and the molecules built from them in each case.]