Knowledge Sharing on the Semantic Web

Tim Finin
University of Maryland, Baltimore County (UMBC, an Honors University in Maryland)

Department of Homeland Security Advanced Scientific Computing Program
Text Analysis Workshop, 25 May 2005

http://creativecommons.org/licenses/by-nc-sa/2.0/

This work was partially supported by DARPA contract F30602-97-1-0215, NSF grants CCR007080 and IIS9875433 and grants from IBM, Fujitsu and HP.

This talk

• Motivation
• The knowledge sharing problem
• Some ongoing projects
  – Finding knowledge on the web
  – Evaluating provenance and trust
• Conclusions


“XML is Lisp's bastard nephew, with uglier syntax and no semantics. Yet XML is poised to enable the creation of a Web of data that dwarfs anything since the Library at Alexandria.”

-- Philip Wadler, Et tu XML? The fall of the relational empire, VLDB, Rome, September 2001.


“The web has made people smarter. We need to understand how to use it to make machines smarter, too.”

-- Michael I. Jordan (UC Berkeley), paraphrased from a talk at AAAI, July 2002


“The Semantic Web will globalize KR, just as the WWW globalized hypertext”

-- Tim Berners-Lee

This talk

• Motivation
• The knowledge sharing problem
• Some ongoing projects
  – Finding knowledge on the web
  – Evaluating provenance and trust
• Conclusions


Knowledge Sharing 1.0

• In 1990 the DARPA knowledge sharing effort defined an approach for interoperability among KB systems and agents

–KIF + Shared Ontologies + KQML

• It was (and is) a great vision that resulted in much good research and some sound standards

–Supporting knowledge interoperability, agent communication, agent tasking and cooperation, etc.

• It never really made it out of the lab

Knowledge Sharing 2.0

• The Web is a Blob, consuming all in its path. Resistance is futile.
• More seriously, it promotes sharing, building on others’ content, offering your content for building upon, decentralization, community development and evolution, common identifiers (URIs), using a working infrastructure, collaborating with industry, etc.
• These are significant advantages.
• The Semantic Web can be the interlingua and infrastructure for interoperability and knowledge sharing.

From where will the markup come?

• A few authors will add it manually.
• More will use annotation tools.
  – SMORE: Semantic Markup, Ontology and RDF Editor
• Intelligent processors (e.g., NLP) can understand documents and add markup (hard)
  – Machine learning powered information extraction tools show promise
• Lots of web content comes from databases & we can generate SW markup along with the HTML
  – See http://ebiquity.umbc.edu/
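The point about database-backed content can be illustrated with a minimal sketch. It assumes a hypothetical database row and a hypothetical helper `record_to_turtle`, and it emits Turtle for simplicity; the slides do not say which serialization or tooling the ebiquity site uses.

```python
def literal(value):
    """Quote a string as a Turtle literal, escaping backslashes, quotes and newlines."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return '"' + escaped + '"'


def record_to_turtle(subject_uri, record,
                     prefix="foaf", namespace="http://xmlns.com/foaf/0.1/"):
    """Serialize one database row as Turtle, one property per column.

    Assumption: every column name is also a property name in the vocabulary.
    """
    header = f"@prefix {prefix}: <{namespace}> ."
    body = " ;\n".join(
        f"    {prefix}:{field} {literal(value)}" for field, value in record.items()
    )
    return f"{header}\n<{subject_uri}>\n{body} ."


# Hypothetical row, as it might come back from a people table.
row = {"name": "John Smith", "title": "Dr"}
turtle = record_to_turtle("http://foo.com/john", row)
print(turtle)
```

The same row that fills an HTML template can be passed through a function like this, so the markup costs nothing extra to author.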

From where will the markup come?

• In many tools, part of the metadata information is present, but thrown away at output
  – e.g., a business chart can be generated by a tool…
  – …it “knows” a chart’s structure, classification, etc.
  – …but, usually, this information is lost
  – …storing it in metadata is easy!
• So “semantic web aware” tools can produce lots of metadata
  – E.g., Adobe’s use of its XMP platform

This talk

• Motivation
• The knowledge sharing problem
• Some ongoing projects
  – Finding knowledge on the web
  – Evaluating provenance and trust
• Conclusions

Google has made us smarter

Something similar is needed by people and software agents for information on the semantic web.

Why use IR techniques?

• We will want to retrieve over the structured and unstructured parts of a Semantic Web Document (SWD)
• We should prepare for the appearance of text documents with embedded SW markup
• We may want to get our SWDs into conventional search engines, such as Google.
• IR techniques also have some unique characteristics that may be very useful, e.g., ranking matches, measuring similarity between documents, relevance feedback, etc.
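Ranking matches and measuring similarity can be sketched with a plain bag-of-terms model and cosine similarity. This is a generic IR illustration, not Swoogle's actual ranking; the document names and their term lists are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it on anything that is not a letter or digit."""
    cleaned = "".join(c.lower() if c.isalnum() else " " for c in text)
    return cleaned.split()


def cosine(a, b):
    """Cosine similarity between two term-frequency vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * \
        math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank(query, docs):
    """Return the names of matching documents, best match first."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(text))), name) for name, text in docs.items()]
    return [name for score, name in sorted(scored, reverse=True) if score > 0]


# Hypothetical SWDs, each reduced to the terms it mentions.
docs = {
    "time.owl": "time before after interval instant",
    "event.owl": "event time place",
    "person.rdf": "person name mbox knows",
}
print(rank("time, before, after", docs))
```

The same vectors could be built from the URIrefs a document uses rather than from words, which is one way to read "bag of URIref" search.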


Swoogle Architecture

[Figure: Swoogle architecture diagram. Layers: SWD discovery, metadata creation, data analysis, interface. Components: The Web, Candidate URLs, Web Crawler, SWD Reader, SWD Cache, SWD Metadata, IR analyzer, SWD analyzer, Web Server, Web Service, Agent Service]

340K SWDs, 48M triples, 97K classes, 55K properties, 7M individuals (April 2005)

Find “Time” Ontology

We can use a set of keywords to search for an ontology. For example, “time”, “before” and “after” are basic concepts for a “Time” ontology.

Demo1


Digest “Time” Ontology (document view)

Demo2(a)

Digest “Time” Ontology (term view)

Demo2(b)

[Screenshot: term view listing terms such as TimeZone, before and intAfter]

Find Term “Person”

Demo3

Not capitalized! URIref is case sensitive!

Digest Term “Person”

Demo4

167 different properties

562 different properties

Swoogle Today

Demo5(a)


Swoogle’s Triple Store lets you shop

And check out your triples into any of several reasoners

Summary

Swoogle (Mar, 2004)
• Automated SWD discovery
• SWD metadata creation and search
• Ontology rank (rational surfer model)
• Swoogle watch
• Web Interface

Swoogle2 (Sep, 2004)
• Ontology dictionary
• Swoogle statistics
• Web service interface (WSDL)
• Bag of URIref IR search
• Triple shopping cart

Swoogle3 (July 2005)
• Better (re-)crawling strategies
• Better navigation models
• Index instance data
• More metadata (ontology mapping and OWL-S services)
• Better web service interfaces
• IR component for string literals

Will it Scale? And How?

• An open question is how well our approach will scale and what techniques will work as the semantic web grows.
• Here’s a rough estimate of the data on the semantic web based on Swoogle’s crawling

System/date   Terms       Documents   Individuals   Triples     Bytes
Swoogle2      1.5x10^5    3.5x10^5    7x10^6        5x10^7      7x10^9
Swoogle3      1.75x10^5   5x10^5      1x10^7        7.5x10^7    1x10^10
2005          2.5x10^5    5x10^6      5x10^7        5x10^8      5x10^10
2008          5x10^5      5x10^7      5x10^8        5x10^9      5x10^11
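To get a feel for these estimates, a few derived figures can be computed directly from the table. The numbers below are copied from the rows above; the helper names are illustrative only.

```python
# Rows copied from the table: (terms, documents, individuals, triples, bytes).
estimates = {
    "Swoogle2": (1.5e5, 3.5e5, 7e6, 5e7, 7e9),
    "Swoogle3": (1.75e5, 5e5, 1e7, 7.5e7, 1e10),
    "2005": (2.5e5, 5e6, 5e7, 5e8, 5e10),
    "2008": (5e5, 5e7, 5e8, 5e9, 5e11),
}
TRIPLES, BYTES = 3, 4  # column positions within each row


def bytes_per_triple(row):
    """Average storage per triple implied by one row of the table."""
    return estimates[row][BYTES] / estimates[row][TRIPLES]


def growth(column, start, end):
    """Factor by which a column grows between two rows."""
    return estimates[end][column] / estimates[start][column]


for row in estimates:
    print(row, round(bytes_per_triple(row)), "bytes per triple")
print("triple growth, Swoogle3 to 2008:",
      round(growth(TRIPLES, "Swoogle3", "2008"), 1))
```

On these estimates the triple count grows by roughly two orders of magnitude between Swoogle3 and the 2008 projection, while the assumed bytes per triple stay in the 100 to 140 range.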

Harnessing Google

• Google started indexing RDF documents some time in late 2003
• Can we take advantage of this?
• We’ve developed techniques to get some structured data to be indexed by Google
• And then later retrieved
• Technique: give Google enhanced documents with additional annotations containing Swangle Terms™

This talk

• Motivation
• The knowledge sharing problem
• Some ongoing projects
  – Finding knowledge on the web
  – Evaluating provenance and trust
• Conclusions

Levels of granularity on the Semantic Web

• The semantic web has several levels of granularity.
• We’re most familiar with documents and triples.
• We’ve been exploring the notion of an RDF Molecule as a “meaningful” collection of RDF triples.
• We believe that RDF molecules will be useful for gathering evidence to verify an RDF graph and recording the provenance.

[Figure: levels of granularity, from coarsest to finest: Universal RDF Graph, RDF Documents, Named Graphs, Molecules, Triples]


RDF Molecules

• An RDF graph can be decomposed into subgraphs.

• A lossless decomposition is one in which the original graph can be recovered by concatenating the components.

• The presence of “blank nodes” limits our ability to completely reduce the graph to triples.

• RDF molecules are subgraphs which can not be further decomposed.

• RDF molecules are useful as minimal units of “evidence” in support of a graph.

[Figure: an RDF graph of interest, with nodes numbered 1 to 7]

[Figure: an RDF graph of interest and the graph’s molecules]

[Figure: an RDF graph of interest, the graph’s molecules, and web pages containing one or more molecules discovered by Swoogle]

Blank nodes cause RDF molecule

G1: RDF graph without blank node (2 molecules)

@prefix foaf: <http://xmlns.com/foaf/0.1/>.
(http://foo.com/john foaf:name “John Smith”)
(http://foo.com/john foaf:mbox mailto:[email protected])

G2: RDF graph with one blank node (1 molecule)

@prefix foaf: <http://xmlns.com/foaf/0.1/>.
(?x foaf:name “John Smith”)
(?x foaf:mbox mailto:[email protected])
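The G1/G2 contrast can be reproduced with a minimal sketch of a naive decomposition: triples that share a blank node stay together, and ground triples stand alone. It assumes triples are plain tuples, blank nodes are written `?x` as on the slides, and the mailbox value is a placeholder because the address is not legible in the source.

```python
from collections import defaultdict


def is_blank(node):
    """Blank nodes are written '?x', '?y', ... (slide notation, assumed here)."""
    return node.startswith("?")


def naive_molecules(triples):
    """Group triples connected through shared blank nodes (union-find)."""
    parent = list(range(len(triples)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    first_seen = {}  # blank node -> index of the first triple that mentions it
    for i, (s, _p, o) in enumerate(triples):
        for node in (s, o):
            if is_blank(node):
                if node in first_seen:
                    parent[find(i)] = find(first_seen[node])
                else:
                    first_seen[node] = i

    groups = defaultdict(list)
    for i, t in enumerate(triples):
        groups[find(i)].append(t)
    return list(groups.values())


MBOX = "mailto:john@example.org"  # placeholder address, not from the slides
g1 = [("http://foo.com/john", "foaf:name", "John Smith"),
      ("http://foo.com/john", "foaf:mbox", MBOX)]
g2 = [("?x", "foaf:name", "John Smith"),
      ("?x", "foaf:mbox", MBOX)]

print(len(naive_molecules(g1)), "molecules in G1")
print(len(naive_molecules(g2)), "molecule(s) in G2")
```

Concatenating the returned groups gives back the original triples, which is the lossless property described earlier.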

Impact of functional dependency

G3:

@prefix foaf: <http://xmlns.com/foaf/0.1/>.
t1: (?x foaf:firstName “John”)
t2: (?x foaf:surname “Smith”)
t3: (?x foaf:mbox mailto:[email protected])

Molecule(s) produced after functional decomposition. Is foaf:mbox an Inverse Functional Property?
• N: one molecule {t1, t2, t3}
• Y: two molecules {t1, t3} and {t2, t3}
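A simplified sketch of the idea behind functional decomposition follows. It is one reading of the G3 example, not the authors' algorithm: a blank node counts as grounded when it is the subject of an inverse functional property (IFP) triple with a non-blank object, and each remaining triple is paired with one grounding triple per grounded blank node it mentions. It does not propagate grounding through chains of blank nodes. The tuple encoding, `?x` notation and mailbox value are assumptions.

```python
from itertools import product


def is_blank(node):
    return node.startswith("?")


def functional_molecules(triples, ifps):
    """Decompose triples given a set of inverse functional property names."""
    # Blank node -> the IFP triples that identify it.
    grounding = {}
    for t in triples:
        s, p, o = t
        if is_blank(s) and p in ifps and not is_blank(o):
            grounding.setdefault(s, []).append(t)
    grounders = {t for ts in grounding.values() for t in ts}

    def open_nodes(t):
        """Blank nodes in t that no IFP triple identifies."""
        return {n for n in (t[0], t[2]) if is_blank(n) and n not in grounding}

    # Triples sharing an ungrounded blank node must stay in one cluster.
    clusters = []  # (list of triples, set of ungrounded blank nodes)
    for t in triples:
        members, nodes = [t], open_nodes(t)
        for cluster in [c for c in clusters if c[1] & nodes]:
            clusters.remove(cluster)
            members = cluster[0] + members
            nodes = nodes | cluster[1]
        clusters.append((members, nodes))

    molecules, used = [], set()
    for members, _nodes in clusters:
        if len(members) == 1 and members[0] in grounders:
            continue  # grounding triples are attached to other molecules below
        needed = sorted({n for t in members for n in (t[0], t[2]) if n in grounding})
        for choice in product(*(grounding[n] for n in needed)):
            molecules.append(frozenset(members) | frozenset(choice))
            used.update(choice)
    # A grounding triple nobody needed still has to appear somewhere.
    molecules += [frozenset([t]) for t in triples if t in grounders and t not in used]
    return molecules


t1 = ("?x", "foaf:firstName", "John")
t2 = ("?x", "foaf:surname", "Smith")
t3 = ("?x", "foaf:mbox", "mailto:john@example.org")  # placeholder address
g3 = [t1, t2, t3]

print(len(functional_molecules(g3, set())), "molecule without the IFP")
print(len(functional_molecules(g3, {"foaf:mbox"})), "molecules with foaf:mbox as IFP")
```

With `foaf:mbox` treated as an IFP, the mailbox triple is repeated in both molecules, so each one can be matched against the web on its own.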

Propagation of functional dependency

G4:

@prefix foaf: <http://xmlns.com/foaf/0.1/>.
@prefix kin: <http://ebiquity.umbc.edu/ontologies/kin/0.3/>.
(?y foaf:surname "Wang")
(?y kin:motherOf ?x)
(?x foaf:name "Li Ding")
(?x foaf:mbox mailto:[email protected])

foaf:mbox and kin:motherOf are IFP

[Figure: graph G4 with its triples labeled t1 to t4, and the resulting molecules: two terminal molecules, two non-terminal molecules, contextual molecule n/a]

Beyond functional dependency

• Our examples relied on OWL inverse functional properties
• A more general (and realistic) approach will be based on probabilities
• At issue is the conditional probability that two blank nodes S1 and S2 are equivalent if each has a P property with value O.
  prob(S1=S2 | P(S1,O), P(S2,O))
• A set of properties can be used to get a high probability, e.g., John Smith and J. Smith share the same home phone number and office phone number
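One way to combine several shared properties is Bayes' rule in odds form. The sketch below assumes the properties are conditionally independent, which the slides do not claim, and every number in it is hypothetical.

```python
def posterior_same(prior, evidence):
    """Estimate prob(S1 = S2) after observing shared property values.

    prior: assumed prob(S1 = S2) before looking at any property.
    evidence: one (p_if_same, p_if_different) pair per shared value, i.e. the
        probability of seeing the shared value if S1 and S2 are the same
        individual versus if they are different individuals.
    Assumes the pieces of evidence are independent of each other.
    """
    odds = prior / (1.0 - prior)
    for p_if_same, p_if_different in evidence:
        odds *= p_if_same / p_if_different  # likelihood ratio of this property
    return odds / (1.0 + odds)


# Hypothetical likelihoods for "John Smith" and "J. Smith".
home_phone = (0.9, 0.001)    # few unrelated people share a home phone
office_phone = (0.9, 0.01)   # office numbers are shared more often

one_property = posterior_same(0.001, [home_phone])
two_properties = posterior_same(0.001, [home_phone, office_phone])
print(f"home phone only: {one_property:.3f}")
print(f"home and office phone: {two_properties:.3f}")
```

An inverse functional property is the limiting case in which the probability of sharing a value between different individuals goes to zero.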

Utility of Molecules

• Why are RDF molecules interesting?
• Suppose we have a graph and we seek evidence from the web to verify its accuracy.
  – E.g., verifying the information in a foaf description.
• Approach:
  – Decompose the graph into molecules
  – Search for instances of each using Swoogle4
  – Note the source and provenance of each molecule
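The search-and-record steps can be sketched as follows. A small in-memory dictionary stands in for Swoogle, the source URLs and mailbox value are hypothetical, and only ground molecules are handled; matching molecules that contain blank nodes would need graph matching rather than a subset test.

```python
def gather_evidence(molecules, index):
    """Map each molecule to the sources whose triples contain all of it.

    molecules: iterable of frozensets of (subject, predicate, object) triples.
    index: dict from source URL to the set of triples found in that document
        (a stand-in for a real search service).
    """
    evidence = {}
    for molecule in molecules:
        evidence[molecule] = sorted(
            source for source, triples in index.items() if molecule <= triples
        )
    return evidence


# Molecules of a ground foaf description (placeholder mailbox value).
m_name = frozenset({("http://foo.com/john", "foaf:name", "John Smith")})
m_mbox = frozenset({("http://foo.com/john", "foaf:mbox", "mailto:john@example.org")})

# Hypothetical crawled documents and the triples they contain.
index = {
    "http://example.org/a.rdf": m_name | m_mbox,
    "http://example.org/b.rdf": m_name,
}

evidence = gather_evidence([m_name, m_mbox], index)
for molecule, sources in evidence.items():
    print(len(sources), "source(s) for", sorted(molecule))
```

The list of sources per molecule is the provenance record; how much trust each source deserves is a separate question.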

This talk

• Motivation
• The knowledge sharing problem
• Some ongoing projects
  – Finding knowledge on the web
  – Evaluating provenance and trust
• Conclusion

Conclusion

• The web will contain the world’s knowledge in forms accessible to people and computers
• We need better ways to discover, index, search and reason over SW knowledge
• Special attention must be applied to provenance and trust
• We must develop, deploy and build on open, non-proprietary standards for knowledge sharing.
• The W3C standards RDF and OWL are a foundation for the first generation

For more information

http://ebiquity.umbc.edu/

Annotated in OWL

Nobody ever got fired for buying IBM

Nobody ever got fired for choosing Web technology

This talk

• Motivation
• The knowledge sharing problem
• Some ongoing projects
  – Finding knowledge on the web
  – Evaluating provenance and trust
  – NLP meets the semantic web
• Conclusions

NLP meets the semantic web

• Agents can benefit from knowledge and information extracted by sophisticated NLP systems.
• NLP systems can make good use of facts published on the web.
• The semantic web provides both an interlingua and publication method for this information exchange
• We’re working on a system to translate information between OntoSem and OWL

O2O System Architecture

[Figure: O2O system architecture. Components: NL Text, OntoSem, Ontology, Fact Repository, TMR, OntoSem2OWL, OWL Ontology, TMRs in OWL, OWL2OntoSem]

Issues

• Mismatch between NLP KR systems and Semantic Web KR languages, e.g.
  – Most NLP systems use default reasoning
  – Relaxing constraints for metaphorical readings
• Practical ontology mapping systems need to be developed
  – Combining distributed, partial maps is an interesting idea

Types of molecule

G4:

@prefix foaf: <http://xmlns.com/foaf/0.1/>.
(http://www.cs.umbc.edu/~dingli1 foaf:name "Li Ding")
(http://www.cs.umbc.edu/~dingli1 foaf:knows ?x)
(?x foaf:name "Tim Finin")
(?x foaf:mbox mailto:[email protected])
(?x foaf:mbox mailto:[email protected])

[Figure: graph G4 with its triples labeled t1 to t5, and the molecules produced in each of the two cases below]

foaf:mbox is not IFP
• Terminal Molecule: {t1}
• Non-Terminal Molecule: n/a
• Contextual Molecule: {t2, t3, t4, t5}

foaf:mbox is IFP
• Terminal Molecules: six, built from t1 and from pairs among t2, t3, t4 and t5
• Non-Terminal Molecules: two
• Contextual Molecule: n/a