UMBC an Honors University in Maryland 1 Knowledge Sharing on the Semantic Web Tim Finin University...

UMBC an Honors University in Maryland

Knowledge Sharing on the Semantic Web

Tim Finin

University of Maryland, Baltimore County

Department of Homeland Security

Advanced Scientific Computing Program

Text Analysis Workshop 25 May 2005

http://creativecommons.org/licenses/by-nc-sa/2.0/

This work was partially supported by DARPA contract F30602-97-1-0215, NSF grants CCR007080 and IIS9875433, and grants from IBM, Fujitsu and HP.


This talk

• Motivation
• The knowledge sharing problem
• Some ongoing projects
– Finding knowledge on the web
– Evaluating provenance and trust
• Conclusions


“XML is Lisp's bastard nephew, with uglier syntax and no semantics. Yet XML is poised to enable the creation of a Web of data that dwarfs anything since the Library at Alexandria.”

-- Philip Wadler, Et tu XML? The fall of the relational empire, VLDB, Rome, September 2001.


“The web has made people smarter. We need to understand how to use it to make machines smarter, too.”

-- Michael I. Jordan (UC Berkeley), paraphrased from a talk at AAAI, July 2002


“The Semantic Web will globalize KR, just as the WWW globalized hypertext”

-- Tim Berners-Lee


Knowledge Sharing 1.0

• In 1990 the DARPA knowledge sharing effort defined an approach for interoperability among KB systems and agents

– KIF + Shared Ontologies + KQML

• It was (and is) a great vision that resulted in much good research and some sound standards

– Supporting knowledge interoperability, agent communication, agent tasking and cooperation, etc.

• It never really made it out of the lab


Knowledge Sharing 2.0

• The Web is a Blob, consuming all in its path. Resistance is futile.

• More seriously, it promotes sharing, building on others’ content, offering your content for building upon, decentralization, community development and evolution, common identifiers (URIs), using a working infrastructure, collaborating with industry, etc.

• These are significant advantages.

• The Semantic Web can be the interlingua and infrastructure for interoperability and knowledge sharing.


From where will the markup come?

• A few authors will add it manually.

• More will use annotation tools.
– SMORE: Semantic Markup, Ontology and RDF Editor

• Intelligent processors (e.g., NLP) can understand documents and add markup (hard).
– Machine learning powered information extraction tools show promise

• Lots of web content comes from databases, and we can generate SW markup along with the HTML.
– See http://ebiquity.umbc.edu/


From where will the markup come?

• In many tools, part of the metadata information is present, but thrown away at output
– e.g., a business chart can be generated by a tool…
– …it “knows” a chart’s structure, classification, etc.
– …but, usually, this information is lost
– …storing it in metadata is easy!

• So “semantic web aware” tools can produce lots of metadata
– E.g., Adobe’s use of its XMP platform


Google has made us smarter

Something similar is needed by people and software agents for information on the semantic web.


Why use IR techniques?

• We will want to retrieve over the structured and unstructured parts of a Semantic Web Document (SWD).

• We should prepare for the appearance of text documents with embedded SW markup.

• We may want to get our SWDs into conventional search engines, such as Google.

• IR techniques also have some unique characteristics that may be very useful, e.g., ranking matches, measuring similarity between documents, relevance feedback, etc.
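
As a rough illustration of the similarity measure mentioned in the last bullet, the sketch below treats each SWD as a bag of URIrefs and compares documents by cosine similarity. The three documents and the function name `cosine_similarity` are made up for illustration; this is not a description of how Swoogle itself computes similarity.

```python
import math
from collections import Counter


def cosine_similarity(bag_a: Counter, bag_b: Counter) -> float:
    """Cosine similarity between two bags of terms (term -> count)."""
    shared = set(bag_a) & set(bag_b)
    dot = sum(bag_a[t] * bag_b[t] for t in shared)
    norm_a = math.sqrt(sum(c * c for c in bag_a.values()))
    norm_b = math.sqrt(sum(c * c for c in bag_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical SWDs, each reduced to the URIrefs it mentions.
FOAF = "http://xmlns.com/foaf/0.1/"
doc1 = Counter([FOAF + "Person", FOAF + "name", FOAF + "mbox"])
doc2 = Counter([FOAF + "Person", FOAF + "name", FOAF + "knows"])
doc3 = Counter(["http://example.org/time#before"])

# Rank the other documents by similarity to doc1, most similar first.
ranked = sorted(
    {"doc2": doc2, "doc3": doc3}.items(),
    key=lambda item: cosine_similarity(doc1, item[1]),
    reverse=True,
)
```

Because URIrefs are compared as whole strings, two terms that differ only in case count as different terms.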


Swoogle Architecture

[Figure: architecture diagram. Layers: SWD discovery, metadata creation, data analysis, interface. Components shown: The Web, Candidate URLs, Web Crawler, SWD Reader, SWD Cache, SWD Metadata, IR analyzer, SWD analyzer, Web Server, Web Service, Agent Service.]

340K SWDs, 48M triples, 97K classes, 55K properties, 7M individuals (April 2005)

Find “Time” Ontology

We can use a set of keywords to search for an ontology. For example, “time, before, after” are basic concepts for a “Time” ontology.

Digest “Time” Ontology (document view): Demo2(a)

Digest “Time” Ontology (term view): Demo2(b)

Terms shown: …, TimeZone, before, intAfter

Find Term “Person”: Demo3

Not capitalized! URIref is case sensitive!

Digest Term “Person”: Demo4

167 different properties

562 different properties

Demo5(a) Swoogle


Swoogle’s Triple Store lets you shop

And check out your triples into any of several reasoners


Summary

• Swoogle (Mar 2004)
– Automated SWD discovery
– SWD metadata creation and search
– Ontology rank (rational surfer model)
– Swoogle watch
– Web interface

• Swoogle2 (Sep 2004)
– Ontology dictionary
– Swoogle statistics
– Web service interface (WSDL)
– Bag of URIref IR search
– Triple shopping cart

• Swoogle3 (July 2005)
– Better (re-)crawling strategies
– Better navigation models
– Index instance data
– More metadata (ontology mapping and OWL-S services)
– Better web service interfaces
– IR component for string literals


Will it Scale? And How?

• An open question is how well our approach will scale and what techniques will work as the semantic web grows.

• Here’s a rough estimate of the data on the semantic web based on Swoogle’s crawling

System/date  Terms      Documents  Individuals  Triples    Bytes
Swoogle2     1.5x10^5   3.5x10^5   7x10^6       5x10^7     7x10^9
Swoogle3     1.75x10^5  5x10^5     1x10^7       7.5x10^7   1x10^10
2005         2.5x10^5   5x10^6     5x10^7       5x10^8     5x10^10
2008         5x10^5     5x10^7     5x10^8       5x10^9     5x10^11


Harnessing Google

• Google started indexing RDF documents some time in late 2003
• Can we take advantage of this?
• We’ve developed techniques to get some structured data to be indexed by Google
• And then later retrieved
• Technique: give Google enhanced documents with additional annotations containing Swangle Terms™
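
The slides do not spell out how Swangle terms are built. The sketch below shows one plausible encoding, assuming each triple pattern is hashed into an opaque, word-like token that a text search engine can index and later match exactly. The SHA-1 and base32 choices, the `WILDCARD` marker and both function names are assumptions made for illustration, not necessarily the authors' scheme.

```python
import base64
import hashlib
from itertools import product

WILDCARD = "*"  # hypothetical "don't care" marker for an unspecified component


def swangle_term(s: str, p: str, o: str) -> str:
    """Hash one (subject, predicate, object) pattern into a word-like token."""
    digest = hashlib.sha1("\x00".join((s, p, o)).encode("utf-8")).digest()
    # Base32 output uses only letters and digits, so a tokenizer keeps it whole.
    return base64.b32encode(digest).decode("ascii").rstrip("=").lower()


def swangle_triple(s: str, p: str, o: str) -> list[str]:
    """Tokens for every pattern that keeps at least one component of the triple."""
    terms = []
    for keep in product((True, False), repeat=3):
        if not any(keep):
            continue  # skip the all-wildcard pattern
        parts = [value if k else WILDCARD for value, k in zip((s, p, o), keep)]
        terms.append(swangle_term(*parts))
    return terms


triple = ("http://foo.com/john", "http://xmlns.com/foaf/0.1/name", "John Smith")
terms = swangle_triple(*triple)

# A query for "anything whose foaf:name is John Smith" hashes the same pattern,
# so it can be sent to the search engine as an ordinary keyword.
query = swangle_term(WILDCARD, triple[1], triple[2])
```

Under this assumed scheme the enhanced document carries the tokens in `terms` as extra text, and retrieval reduces to keyword search on `query`.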


Levels of granularity on the Semantic Web

• The semantic web has several levels of granularity.

• We’re most familiar with documents and triples.

• We’ve been exploring the notion of an RDF Molecule as a “meaningful” collection of RDF triples.

• We believe that RDF molecules will be useful for gathering evidence to verify an RDF graph and recording the provenance.

Levels shown, from coarsest to finest: Universal RDF Graph, RDF Documents, Named Graphs, Molecules, Triples


RDF Molecules

• An RDF graph can be decomposed into subgraphs.

• A lossless decomposition is one in which the original graph can be recovered by concatenating the components.

• The presence of “blank nodes” limits our ability to completely reduce the graph to triples.

• RDF molecules are the subgraphs that cannot be losslessly decomposed any further.

• RDF molecules are useful as minimal units of “evidence” in support of a graph.

An RDF graph of interest

The graph’s molecules

Web pages containing one or more molecules discovered by Swoogle

Blank nodes cause RDF molecules

G1: RDF graph without blank node (2 molecules)

@prefix foaf: <http://xmlns.com/foaf/0.1/>.
(http://foo.com/john foaf:name "John Smith")
(http://foo.com/john foaf:mbox mailto:john@foo.com)

G2: RDF graph with one blank node (1 molecule)

@prefix foaf: <http://xmlns.com/foaf/0.1/>.
(?x foaf:name "John Smith")
(?x foaf:mbox mailto:john@foo.com)
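
The molecule counts for G1 and G2 can be reproduced with a minimal sketch of a blank-node-based decomposition, in which triples stay together exactly when they are linked through shared blank nodes. The triple representation, the `?x` convention for blank nodes and the function names are assumptions of this sketch, and it ignores functional dependencies entirely.

```python
from collections import defaultdict

Triple = tuple[str, str, str]


def is_blank(term: str) -> bool:
    """Blank nodes are written ?x, ?y, ... as on the slides (a sketch convention)."""
    return term.startswith("?")


def decompose(graph: list[Triple]) -> list[set[Triple]]:
    """Group triples so that any two triples sharing a blank node end up together."""
    parent: dict[str, str] = {}

    def find(node: str) -> str:
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    # Pass 1: blank nodes that co-occur in a triple belong to the same molecule.
    for s, _, o in graph:
        blanks = [t for t in (s, o) if is_blank(t)]
        for other in blanks[1:]:
            parent[find(other)] = find(blanks[0])

    # Pass 2: a ground triple is a molecule by itself; the rest follow their blank node.
    molecules: list[set[Triple]] = []
    by_root: dict[str, set[Triple]] = defaultdict(set)
    for triple in graph:
        blanks = [t for t in (triple[0], triple[2]) if is_blank(t)]
        if blanks:
            by_root[find(blanks[0])].add(triple)
        else:
            molecules.append({triple})
    return molecules + list(by_root.values())


G1 = [
    ("http://foo.com/john", "foaf:name", "John Smith"),
    ("http://foo.com/john", "foaf:mbox", "mailto:john@foo.com"),
]
G2 = [
    ("?x", "foaf:name", "John Smith"),
    ("?x", "foaf:mbox", "mailto:john@foo.com"),
]
```

Taking the union of the returned sets gives back the original graph, which is the lossless property the decomposition is meant to have.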


Impact of functional dependency

@prefix foaf: <http://xmlns.com/foaf/0.1/>.
(?x foaf:firstName "John")
(?x foaf:surname "Smith")
(?x foaf:mbox mailto:john@foo.com)

Is foaf:mbox an Inverse Functional Property?

Molecule(s) produced after functional decomposition:
• No: one molecule { }
• Yes: two molecules { } { }

[Figure: the molecules were drawn using the triple labels t1, t2, t3.]


Propagation of functional dependency

@prefix foaf: <http://xmlns.com/foaf/0.1/>.
@prefix kin: <http://ebiquity.umbc.edu/ontologies/kin/0.3/>.
(?y foaf:surname "Wang")
(?y kin:motherOf ?x)
(?x foaf:name "Li Ding")
(?x foaf:mbox mailto:dingli1@umbc.edu)

foaf:mbox and kin:motherOf are IFP

• Terminal Molecules { } { }
• Non-Terminal Molecules { } { }
• Contextual Molecule n/a
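
The propagation idea can be sketched as a small fixed-point computation: a blank node becomes identifiable when it is the subject of an inverse functional property whose value is either a URI or literal, or another blank node that is already identifiable. The function name `grounded_blank_nodes` and the `?x` convention are assumptions of this sketch, and it handles only inverse functional properties, not the full molecule construction.

```python
Triple = tuple[str, str, str]


def is_blank(term: str) -> bool:
    """Blank nodes are written ?x, ?y, ... as on the slides (a sketch convention)."""
    return term.startswith("?")


def grounded_blank_nodes(graph: list[Triple], ifps: set[str]) -> set[str]:
    """Blank nodes that can be identified through a chain of IFP triples."""
    grounded: set[str] = set()
    changed = True
    while changed:  # repeat until no new blank node becomes identifiable
        changed = False
        for s, p, o in graph:
            if p not in ifps or not is_blank(s) or s in grounded:
                continue
            # An IFP value pins down its subject, provided the value itself is pinned down.
            if not is_blank(o) or o in grounded:
                grounded.add(s)
                changed = True
    return grounded


graph = [
    ("?y", "foaf:surname", "Wang"),
    ("?y", "kin:motherOf", "?x"),
    ("?x", "foaf:name", "Li Ding"),
    ("?x", "foaf:mbox", "mailto:dingli1@umbc.edu"),
]

both = grounded_blank_nodes(graph, {"foaf:mbox", "kin:motherOf"})
mbox_only = grounded_blank_nodes(graph, {"foaf:mbox"})
```

With both properties declared inverse functional, `?x` is identified by its mailbox and `?y` in turn by being the mother of `?x`; with only `foaf:mbox`, the dependency stops at `?x`.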


Beyond functional dependency

• Our examples relied on OWL inverse functional properties
• A more general (and realistic) approach will be based on probabilities
• At issue is the conditional probability that two blank nodes S1 and S2 are equivalent if each has a P property with value O:
prob(S1=S2 | P(S1,O), P(S2,O))
• A set of properties can be used to get a high probability, e.g., John Smith and J. Smith share the same home phone number and office phone number
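
One hedged way to make the last bullet concrete is a naive Bayes style update that treats each shared property value as independent evidence. The slides do not commit to this model; the prior and the likelihood ratios below are invented purely for illustration.

```python
def posterior_same(prior: float, likelihood_ratios: list[float]) -> float:
    """P(S1 = S2 | evidence): prior odds times the product of likelihood ratios.

    Each ratio is P(shared value | same entity) / P(shared value | different entities).
    """
    odds = prior / (1.0 - prior)
    for ratio in likelihood_ratios:
        odds *= ratio
    return odds / (1.0 + odds)


# Hypothetical numbers: sharing a home phone is taken to be 200 times more likely
# for the same person than for two different people; sharing an office phone, 50 times.
prior = 0.001
home_only = posterior_same(prior, [200.0])
home_and_office = posterior_same(prior, [200.0, 50.0])
```

With these made-up values a single shared phone number leaves the match doubtful (roughly 0.17), while the two together push it above 0.9, which is the effect the bullet describes.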


Utility of Molecules

• Why are RDF molecules interesting?

• Suppose we have a graph and we seek evidence from the web to verify its accuracy.
– E.g., verifying the information in a FOAF description.

• Approach:
– Decompose the graph into molecules
– Search for instances of each using Swoogle4
– Note the source and provenance of each molecule
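
The three steps of the approach can be sketched for the simplest case, a graph of ground triples, where each triple is its own molecule. The in-memory `documents` index stands in for a Swoogle search and is purely hypothetical, as are the example URLs; molecules containing blank nodes would need matching up to blank node renaming, which this sketch does not attempt.

```python
Triple = tuple[str, str, str]


def find_evidence(
    molecules: list[frozenset[Triple]],
    documents: dict[str, set[Triple]],
) -> dict[frozenset[Triple], set[str]]:
    """Map each molecule to the URLs of the documents that contain all of its triples."""
    return {
        molecule: {url for url, triples in documents.items() if molecule <= triples}
        for molecule in molecules
    }


name = ("http://foo.com/john", "foaf:name", "John Smith")
mbox = ("http://foo.com/john", "foaf:mbox", "mailto:john@foo.com")
phone = ("http://foo.com/john", "foaf:phone", "tel:+1-555-0100")

# Step 1: the graph to verify, already decomposed into one molecule per ground triple.
claimed = [frozenset({name}), frozenset({mbox}), frozenset({phone})]

# Step 2: a stand-in for the search index (hypothetical documents).
documents = {
    "http://example.org/a.rdf": {name, mbox},
    "http://example.org/b.rdf": {name},
}

# Step 3: record where each molecule was found and which ones lack any support.
evidence = find_evidence(claimed, documents)
unsupported = [m for m, urls in evidence.items() if not urls]
```

Keeping the source URLs per molecule, rather than per graph, is what lets trust be assessed one minimal piece of evidence at a time.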


Conclusion

• The web will contain the world’s knowledge in forms accessible to people and computers

• We need better ways to discover, index, search and reason over SW knowledge

• Special attention must be applied to provenance and trust

• We must develop, deploy and build on open, non-proprietary standards for knowledge sharing.

• The W3C standards RDF and OWL are a foundation for the first generation


For more information

http://ebiquity.umbc.edu/

Annotated in OWL


Nobody ever got fired for buying IBM

Nobody ever got fired for choosing Web technology

• Some ongoing projects
– Finding knowledge on the web
– Evaluating provenance and trust
– NLP meets the semantic web

• Conclusions


NLP meets the semantic web

• Agents can benefit from knowledge and information extracted by sophisticated NLP systems.

• NLP systems can make good use of facts published on the web.

• The semantic web provides both an interlingua and publication method for this information exchange

• We’re working on a system to translate information between OntoSem and OWL


O2O System Architecture

[Figure: architecture diagram. Components shown: NL Text, OntoSem, Ontology, Fact Repository, OntoSem2OWL, OWL Ontology, TMRs in OWL, OWL2OntoSem.]


Issues

• Mismatch between NLP KR systems and Semantic Web KR languages, e.g.
– Most NLP systems use default reasoning
– Relaxing constraints for metaphorical readings

• Practical ontology mapping systems need to be developed
– Combining distributed, partial maps is an interesting


Types of molecule

@prefix foaf: <http://xmlns.com/foaf/0.1/>.
(http://www.cs.umbc.edu/~dingli1 foaf:name "Li Ding")
(http://www.cs.umbc.edu/~dingli1 foaf:knows ?x)
(?x foaf:name "Tim Finin")
(?x foaf:mbox mailto:finin@umbc.edu)
(?x foaf:mbox mailto:finin@cs.umbc.edu)

foaf:mbox is not IFP
• Terminal Molecule { }
• Non-Terminal Molecule n/a
• Contextual Molecule { }

foaf:mbox is IFP
• Terminal Molecules { } { } { } { } { } { }
• Non-Terminal Molecules { } { }
• Contextual Molecule n/a

[Figure: the molecules were drawn using the triple labels t2, t3, t4, t5.]
