

UMBC, an Honors University in Maryland

Knowledge Sharing on the Semantic Web

Tim Finin

University of Maryland, Baltimore County

Department of Homeland Security, Advanced Scientific Computing Program

Text Analysis Workshop 25 May 2005

http://creativecommons.org/licenses/by-nc-sa/2.0/

This work was partially supported by DARPA contract F30602-97-1-0215, NSF grants CCR007080 and IIS9875433 and grants from IBM, Fujitsu and HP.


This talk

• Motivation
• The knowledge sharing problem
• Some ongoing projects
  – Finding knowledge on the web
  – Evaluating provenance and trust
• Conclusions


“XML is Lisp's bastard nephew, with uglier syntax and no semantics. Yet XML is poised to enable the creation of a Web of data that dwarfs anything since the Library at Alexandria.”

-- Philip Wadler, Et tu XML? The fall of the relational empire, VLDB, Rome, September 2001.


“The web has made people smarter. We need to understand how to use it to make machines smarter, too.”

-- Michael I. Jordan (UC Berkeley), paraphrased from a talk at AAAI, July 2002


“The Semantic Web will globalize KR, just as the WWW globalized hypertext”

-- Tim Berners-Lee


This talk

• Motivation
• The knowledge sharing problem
• Some ongoing projects
  – Finding knowledge on the web
  – Evaluating provenance and trust
• Conclusions


Knowledge Sharing 1.0

• In 1990 the DARPA knowledge sharing effort defined an approach for interoperability among KB systems and agents

–KIF + Shared Ontologies + KQML

• It was (and is) a great vision that resulted in much good research and some sound standards

–Supporting knowledge interoperability, agent communication, agent tasking and cooperation, etc.

• It never really made it out of the lab


Knowledge Sharing 2.0

• The Web is a Blob, consuming all in its path. Resistance is futile.

• More seriously, it promotes sharing, building on others’ content, offering your content for building upon, decentralization, community development and evolution, common identifiers (URIs), using a working infrastructure, collaborating with industry, etc.

• These are significant advantages

• The Semantic Web can be the interlingua and infrastructure for interoperability and knowledge sharing.


From where will the markup come?

• A few authors will add it manually.

• More will use annotation tools.
  – SMORE: Semantic Markup, Ontology and RDF Editor

• Intelligent processors (e.g., NLP) can understand documents and add markup (hard)
  – Machine learning powered information extraction tools show promise

• Lots of web content comes from databases & we can generate SW markup along with the HTML
  – See http://ebiquity.umbc.edu/
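As a rough illustration of generating markup alongside HTML from the same database row, the sketch below emits an HTML fragment and matching N-Triples lines using FOAF. The record layout (`uri`, `name`, `mbox`) and the function names are assumptions made for this example; they are not taken from the ebiquity site.

```python
import html

FOAF = "http://xmlns.com/foaf/0.1/"


def nt_literal(text):
    """Escape a string as an N-Triples literal."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def render_person(record):
    """Return (html_fragment, ntriples) for one hypothetical database row."""
    # Human-readable view of the row.
    link = html.escape(record["mbox"], quote=True)
    name = html.escape(record["name"])
    page = f'<p><a href="{link}">{name}</a></p>'

    # Machine-readable view of the same row, one triple per line.
    subject = f'<{record["uri"]}>'
    triples = [
        f'{subject} <{FOAF}name> {nt_literal(record["name"])} .',
        f'{subject} <{FOAF}mbox> <{record["mbox"]}> .',
    ]
    return page, "\n".join(triples)


row = {
    "uri": "http://example.org/people/1",  # hypothetical identifier
    "name": "Jane Doe",
    "mbox": "mailto:jane@example.org",
}
page, ntriples = render_person(row)
print(page)
print(ntriples)
```

Both outputs come from one record, so the semantic markup costs little beyond what is already needed to render the page.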


From where will the markup come?

• In many tools, part of the metadata information is present, but thrown away at output
  – e.g., a business chart can be generated by a tool…
  – …it “knows” a chart’s structure, classification, etc.
  – …but, usually, this information is lost
  – …storing it in metadata is easy!

• So “semantic web aware” tools can produce lots of metadata
  – E.g., Adobe’s use of its XMP platform


This talk

• Motivation
• The knowledge sharing problem
• Some ongoing projects
  – Finding knowledge on the web
  – Evaluating provenance and trust
• Conclusions


Google has made us smarter

Something similar is needed by people and software agents for information on the semantic web.



Why use IR techniques?

• We will want to retrieve over the structured and unstructured parts of a Semantic Web Document (SWD)

• We should prepare for the appearance of text documents with embedded SW markup

• We may want to get our SWDs into conventional search engines, such as Google.

• IR techniques also have some unique characteristics that may be very useful, e.g., ranking matches, measuring similarity between documents, relevance feedback, etc.
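To make "ranking matches" and "measuring similarity between documents" concrete, here is a minimal sketch of TF-IDF weighting with cosine similarity over bags of terms. It is a generic textbook technique, not a description of how Swoogle ranks results; the document names and token lists are made up for illustration.

```python
import math
from collections import Counter


def idf_table(docs):
    """Build a smoothed inverse-document-frequency function from the corpus."""
    n = len(docs)
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))
    # Smoothing keeps the weight finite for terms that appear in no document.
    return lambda term: math.log((1 + n) / (1 + df[term])) + 1.0


def vectorize(tokens, idf):
    """Turn a bag of tokens into a sparse {term: weight} vector."""
    return {term: count * idf(term) for term, count in Counter(tokens).items()}


def cosine(a, b):
    """Cosine similarity of two sparse vectors (0.0 if either is empty)."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query, docs):
    """Return (name, score) pairs, best match first."""
    idf = idf_table(docs)
    q = vectorize(query, idf)
    scored = [(name, cosine(q, vectorize(tokens, idf))) for name, tokens in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical documents, each reduced to a bag of terms.
docs = {
    "time.owl": ["time", "before", "after", "interval", "instant"],
    "foaf.rdf": ["person", "name", "mbox", "knows"],
    "calendar.owl": ["time", "event", "date"],
}
for name, score in rank(["time", "before", "after"], docs):
    print(f"{score:.3f}  {name}")
```

The same vectors can be compared document to document, which gives the similarity measure mentioned in the last bullet.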


Swoogle Architecture

[Figure: Swoogle architecture. Stages: SWD discovery, metadata creation, data analysis, interface. Components: The Web, Candidate URLs, Web Crawler, SWD Reader, SWD Cache, SWD Metadata, IR analyzer, SWD analyzer, Web Server, Web Service, Agent Service]

340K SWDs, 48M triples, 97K classes, 55K properties, 7M individuals (April 2005)

Find “Time” Ontology (Demo 1)

We can use a set of keywords to search for an ontology. For example, “time”, “before” and “after” are basic concepts for a “Time” ontology.

Digest “Time” Ontology, document view (Demo 2a)

Digest “Time” Ontology, term view (Demo 2b)

[Screenshot: term list including TimeZone, before, intAfter]

Find Term “Person” (Demo 3)

Not capitalized! URIrefs are case sensitive!

Digest Term “Person” (Demo 4)

[Screenshot: 167 different properties; 562 different properties]

Swoogle Today (Demo 5a)


Swoogle’s Triple Store lets you shop

And check out your triples into any of several reasoners


Summary

Swoogle (Mar, 2004)
• Automated SWD discovery
• SWD metadata creation and search
• Ontology rank (rational surfer model)
• Swoogle watch
• Web Interface

Swoogle2 (Sep, 2004)
• Ontology dictionary
• Swoogle statistics
• Web service interface (WSDL)
• Bag of URIref IR search
• Triple shopping cart

Swoogle3 (July 2005)
• Better (re-)crawling strategies
• Better navigation models
• Index instance data
• More metadata (ontology mapping and OWL-S services)
• Better web service interfaces
• IR component for string literals


Will it Scale? And How?

• An open question is how well our approach will scale and what techniques will work as the semantic web grows.

• Here’s a rough estimate of the data on the semantic web based on Swoogle’s crawling

System/date   Terms      Documents  Individuals  Triples    Bytes
Swoogle2      1.5x10^5   3.5x10^5   7x10^6       5x10^7     7x10^9
Swoogle3      1.75x10^5  5x10^5     1x10^7       7.5x10^7   1x10^10
2005          2.5x10^5   5x10^6     5x10^7       5x10^8     5x10^10
2008          5x10^5     5x10^7     5x10^8       5x10^9     5x10^11
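A quick way to read these estimates is to derive ratios from them, such as bytes per triple or the growth expected between two rows. The sketch below does only that arithmetic, taking each table entry as a mantissa times a power of ten; the dictionary layout and helper names are choices made for this example.

```python
# Estimates copied from the table: terms, documents, individuals, triples, bytes.
ESTIMATES = {
    "Swoogle2": {"terms": 1.5e5, "documents": 3.5e5, "individuals": 7e6, "triples": 5e7, "bytes": 7e9},
    "Swoogle3": {"terms": 1.75e5, "documents": 5e5, "individuals": 1e7, "triples": 7.5e7, "bytes": 1e10},
    "2005": {"terms": 2.5e5, "documents": 5e6, "individuals": 5e7, "triples": 5e8, "bytes": 5e10},
    "2008": {"terms": 5e5, "documents": 5e7, "individuals": 5e8, "triples": 5e9, "bytes": 5e11},
}


def bytes_per_triple(row):
    """Average storage per triple implied by one row of the table."""
    return ESTIMATES[row]["bytes"] / ESTIMATES[row]["triples"]


def growth_factor(earlier, later, column):
    """How many times larger a column is in the later row."""
    return ESTIMATES[later][column] / ESTIMATES[earlier][column]


for row in ESTIMATES:
    print(f"{row:9s} about {bytes_per_triple(row):.0f} bytes per triple")
print("triples, 2005 to 2008: x", growth_factor("2005", "2008", "triples"))
print("terms,   2005 to 2008: x", growth_factor("2005", "2008", "terms"))
```

Under these figures the triple count grows tenfold from the 2005 row to the 2008 row while the vocabulary only doubles, so most of the projected growth is instance data rather than new terms.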


Harnessing Google

• Google started indexing RDF documents some time in late 2003
• Can we take advantage of this?
• We’ve developed techniques to get some structured data to be indexed by Google
• And then later retrieved
• Technique: give Google enhanced documents with additional annotations containing Swangle Terms ™


This talk

• Motivation
• The knowledge sharing problem
• Some ongoing projects
  – Finding knowledge on the web
  – Evaluating provenance and trust
• Conclusions


Levels of granularity on the Semantic Web

• The semantic web has several levels of granularity.

• We’re most familiar with documents and triples.

• We’ve been exploring the notion of an RDF Molecule as a “meaningful” collection of RDF triples.

• We believe that RDF molecules will be useful for: gathering evidence to verify an RDF graph and recording the provenance.

Universal RDF Graph

RDF Documents

Named Graphs

Molecules

Triples


RDF Molecules

• An RDF graph can be decomposed into subgraphs.

• A lossless decomposition is one in which the original graph can be recovered by concatenating the components.

• The presence of “blank nodes” limits our ability to completely reduce the graph to triples.

• RDF molecules are subgraphs which can not be further decomposed.

• RDF molecules are useful as minimal units of “evidence” in support of a graph.

[Figure: An RDF graph of interest, with nodes numbered 1 to 7]

[Figure: The graph’s molecules]

[Figure: Web pages containing one or more molecules discovered by Swoogle]

Blank nodes cause RDF molecule

G1: RDF graph without blank node (2 molecules)

@prefix foaf: <http://xmlns.com/foaf/0.1/>.
(http://foo.com/john foaf:name “John Smith”)
(http://foo.com/john foaf:mbox mailto:john@foo.com)

G2: RDF graph with one blank node (1 molecule)

@prefix foaf: <http://xmlns.com/foaf/0.1/>.
( ?x foaf:name “John Smith” )
( ?x foaf:mbox mailto:john@foo.com )
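The G1 and G2 counts can be reproduced with a naive decomposition that keeps together any triples sharing a blank node and splits everything else into single triples. The sketch below is one way to do that with a small union-find; writing blank nodes as strings beginning with `_:` and representing triples as plain tuples are assumptions of this example.

```python
from collections import defaultdict


def is_blank(term):
    # Assumption of this sketch: blank nodes are strings starting with "_:".
    return isinstance(term, str) and term.startswith("_:")


def naive_molecules(triples):
    """Group triples (tuples) into sets connected through shared blank nodes."""
    triples = list(triples)
    parent = list(range(len(triples)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    first_seen = {}  # blank node -> index of the first triple mentioning it
    for i, (s, _p, o) in enumerate(triples):
        for term in (s, o):
            if is_blank(term):
                if term in first_seen:
                    parent[find(i)] = find(first_seen[term])
                else:
                    first_seen[term] = i

    groups = defaultdict(set)
    for i, triple in enumerate(triples):
        groups[find(i)].add(triple)
    return [frozenset(group) for group in groups.values()]


G1 = [
    ("http://foo.com/john", "foaf:name", "John Smith"),
    ("http://foo.com/john", "foaf:mbox", "mailto:john@foo.com"),
]
G2 = [
    ("_:x", "foaf:name", "John Smith"),
    ("_:x", "foaf:mbox", "mailto:john@foo.com"),
]
print(len(naive_molecules(G1)))  # 2
print(len(naive_molecules(G2)))  # 1
```

In G1 every term is a URI or literal, so each triple can stand alone; in G2 the blank node only has meaning within the pair of triples, so they must stay together.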


Impact of functional dependency

G3:

@prefix foaf: <http://xmlns.com/foaf/0.1/>.
t1: (?x foaf:firstName "John")
t2: (?x foaf:surname "Smith")
t3: (?x foaf:mbox mailto:john@foo.com )

Is foaf:mbox an Inverse Functional Property?

• No: one molecule {t1, t2, t3}
• Yes: two molecules {t1, t3} {t2, t3}

Molecule(s) produced after functional decomposition
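The G3 outcome can be sketched in code. The version below is a deliberately simplified, single-step reading of functional decomposition: a blank node counts as grounded when it is the subject of a triple whose predicate is an inverse functional property (IFP) and whose object is not blank. It does not implement propagation through chains of blank nodes, and when a blank node has several identifying triples it attaches all of them, which may differ from the decomposition used in the talk. The `_:` blank-node spelling and the function names are assumptions of this example.

```python
from collections import defaultdict


def is_blank(term):
    # Assumption of this sketch: blank nodes are strings starting with "_:".
    return isinstance(term, str) and term.startswith("_:")


def functional_molecules(triples, ifps):
    """Simplified functional decomposition (single step, no propagation).

    triples: iterable of (subject, predicate, object) tuples
    ifps:    set of predicates treated as inverse functional properties
    """
    triples = list(triples)

    # Identifying triples: blank subject, IFP predicate, non-blank object.
    identifying = defaultdict(list)
    for s, p, o in triples:
        if is_blank(s) and p in ifps and not is_blank(o):
            identifying[s].append((s, p, o))
    id_triples = {t for ts in identifying.values() for t in ts}

    # The remaining triples stay connected only through ungrounded blank nodes.
    rest = [t for t in triples if t not in id_triples]
    parent = list(range(len(rest)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    first_seen = {}
    for i, (s, _p, o) in enumerate(rest):
        for term in (s, o):
            if is_blank(term) and term not in identifying:
                if term in first_seen:
                    parent[find(i)] = find(first_seen[term])
                else:
                    first_seen[term] = i

    groups = defaultdict(set)
    for i, triple in enumerate(rest):
        groups[find(i)].add(triple)

    # Each group carries along the triples that identify its grounded blanks.
    molecules = []
    for group in groups.values():
        molecule = set(group)
        for s, _p, o in group:
            for term in (s, o):
                molecule.update(identifying.get(term, ()))
        molecules.append(frozenset(molecule))

    # Identifying triples not attached to any group stand alone.
    attached = set().union(*molecules) if molecules else set()
    molecules.extend(frozenset([t]) for t in id_triples - attached)
    return molecules


t1 = ("_:x", "foaf:firstName", "John")
t2 = ("_:x", "foaf:surname", "Smith")
t3 = ("_:x", "foaf:mbox", "mailto:john@foo.com")
G3 = [t1, t2, t3]

print(len(functional_molecules(G3, ifps=set())))          # 1
print(len(functional_molecules(G3, ifps={"foaf:mbox"})))  # 2
```

With the IFP declared, the mailbox triple acts as a key for the blank node, so the first-name and surname triples can each be checked independently as long as the key travels with them.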


Propagation of functional dependency

G4:

@prefix foaf: <http://xmlns.com/foaf/0.1/>.
@prefix kin: <http://ebiquity.umbc.edu/ontologies/kin/0.3/>.
(?y foaf:surname "Wang")
(?y kin:motherOf ?x)
(?x foaf:name "Li Ding")
(?x foaf:mbox mailto:dingli1@umbc.edu )

foaf:mbox and kin:motherOf are IFP

• Terminal Molecules: two
• Non-Terminal Molecules: two
• Contextual Molecule: n/a

[Figure: graph G4 with triples labelled t1 to t4 and the resulting molecules]


Beyond functional dependency

• Our examples relied on OWL inverse functional properties

• A more general (and realistic) approach will be based on probabilities

• At issue is the conditional probability that two blank nodes S1 and S2 are equivalent if each has a P property with value O:

prob(S1=S2 | P(S1,O), P(S2,O))

• A set of properties can be used to get a high probability, e.g., John Smith and J. Smith share the same home phone number and office phone number
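One simple way to see how several shared properties raise the probability is a noisy-OR combination, which treats each shared property as independent evidence that the two nodes are the same. The talk does not specify a combination rule, so this is only an illustrative assumption, and the per-property probabilities used below are invented numbers.

```python
def combined_match_probability(per_property_probs):
    """Noisy-OR combination of independent pieces of evidence.

    Each entry is an assumed value of prob(S1=S2 | P(S1,O), P(S2,O)) for one
    shared property P. Independence between properties is an assumption.
    """
    remaining_doubt = 1.0
    for p in per_property_probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"not a probability: {p}")
        remaining_doubt *= 1.0 - p
    return 1.0 - remaining_doubt


# Hypothetical values: sharing a home phone number and an office phone number.
home_phone = 0.6
office_phone = 0.7
print(round(combined_match_probability([home_phone]), 2))                # 0.6
print(round(combined_match_probability([home_phone, office_phone]), 2))  # 0.88
```

Neither property alone is decisive here, but together they leave little doubt, which is the effect described in the last bullet.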


Utility of Molecules

• Why are RDF molecules interesting?

• Suppose we have a graph and we seek evidence from the web to verify its accuracy.
  – E.g., verifying the information in a foaf description.

• Approach:
  – Decompose the graph into molecules
  – Search for instances of each using Swoogle4
  – Note the source and provenance of each molecule
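The approach can be sketched as a lookup from each molecule of the graph under test to the sources that contain it. The code below is a minimal illustration, not Swoogle's interface: it assumes the molecules have already been computed, that the search step is replaced by an in-memory dictionary of source URL to molecules, and that each molecule mentions at most one blank node, so renaming blank nodes to a single placeholder is a safe way to compare them. All names and URLs are hypothetical.

```python
def normalise(molecule):
    """Rename blank nodes to one placeholder so molecules can be compared.

    Only safe under the stated assumption that a molecule mentions at most
    one blank node (written here as a string starting with "_:").
    """
    def fix(term):
        return "_:b" if isinstance(term, str) and term.startswith("_:") else term

    return frozenset((fix(s), p, fix(o)) for s, p, o in molecule)


def gather_evidence(molecules, sources):
    """Map each molecule to the sorted list of source URLs that support it.

    molecules: molecules of the graph being verified
    sources:   dict of source URL -> list of molecules found in that source
    """
    evidence = {}
    for molecule in molecules:
        wanted = normalise(molecule)
        evidence[wanted] = sorted(
            url
            for url, found in sources.items()
            if any(wanted <= normalise(candidate) for candidate in found)
        )
    return evidence


claim = frozenset({
    ("_:x", "foaf:name", "John Smith"),
    ("_:x", "foaf:mbox", "mailto:john@foo.com"),
})
sources = {
    # Same two triples about one blank node: supports the claim.
    "http://example.org/a": [frozenset({
        ("_:p", "foaf:name", "John Smith"),
        ("_:p", "foaf:mbox", "mailto:john@foo.com"),
    })],
    # The two facts are about different blank nodes: no support.
    "http://example.org/b": [
        frozenset({("_:q", "foaf:name", "John Smith")}),
        frozenset({("_:r", "foaf:mbox", "mailto:john@foo.com")}),
    ],
}
for molecule, urls in gather_evidence([claim], sources).items():
    print(len(molecule), "triples supported by", urls)
```

Comparing whole molecules rather than single triples is what keeps the second source from counting as evidence, since its two triples may describe different people.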


This talk

• Motivation
• The knowledge sharing problem
• Some ongoing projects
  – Finding knowledge on the web
  – Evaluating provenance and trust
• Conclusion


Conclusion

• The web will contain the world’s knowledge in forms accessible to people and computers

• We need better ways to discover, index, search and reason over SW knowledge

• Special attention must be applied to provenance and trust

• We must develop, deploy and build on open, non-proprietary standards for knowledge sharing.

• The W3C standards RDF and OWL are a foundation for the first generation


For more information

http://ebiquity.umbc.edu/

Annotated in OWL


Nobody ever got fired for buying IBM


Nobody ever got fired for choosing Web technology


This talk

• Motivation
• The knowledge sharing problem
• Some ongoing projects
  – Finding knowledge on the web
  – Evaluating provenance and trust
  – NLP meets the semantic web
• Conclusions


NLP meets the semantic web

• Agents can benefit from knowledge and information extracted by sophisticated NLP systems.

• NLP systems can make good use of facts published on the web.

• The semantic web provides both an interlingua and publication method for this information exchange

• We’re working on a system to translate information between OntoSem and OWL


O2O System Architecture

[Figure components: NL Text, OntoSem, Ontology, Fact Repository, TMR, OntoSem2OWL, OWL Ontology, TMRs in OWL, OWL2OntoSem]


Issues

• Mismatch between NLP KR systems and Semantic Web KR languages, e.g.
  – Most NLP systems use default reasoning
  – Relaxing constraints for metaphorical readings

• Practical ontology mapping systems need to be developed
  – Combining distributed, partial maps is an interesting idea


Types of molecule

G4:

@prefix foaf: <http://xmlns.com/foaf/0.1/>.
t1: (http://www.cs.umbc.edu/~dingli1 foaf:name "Li Ding")
t2: (http://www.cs.umbc.edu/~dingli1 foaf:knows ?x )
t3: (?x foaf:name "Tim Finin")
t4: (?x foaf:mbox mailto:finin@umbc.edu)
t5: (?x foaf:mbox mailto:finin@cs.umbc.edu)

foaf:mbox is IFP

• Terminal Molecules {t1} {t4} {t5} {t3, t4} {t3, t5} {t4, t5}
• Non-Terminal Molecules {t2, t4} {t2, t5}
• Contextual Molecule n/a

foaf:mbox is not IFP

• Terminal Molecule {t1}
• Non-Terminal Molecule n/a
• Contextual Molecule {t2, t3, t4, t5}
