eBiquity Lab, CSEE, UMBC @ Swoogle Tutorial (Part I: Swoogle R & D) A brief introduction to Swoogle An overview of Swoogle research A summary of Swoogle development Presented by eBiquity Lab, CSEE, UMBC

Swoogle Tutorial (Part I: Swoogle R & D)


Page 1: Swoogle Tutorial  (Part I: Swoogle R & D)

Swoogle Tutorial (Part I: Swoogle R & D)

  • A brief introduction to Swoogle
  • An overview of Swoogle research
  • A summary of Swoogle development

Presented by eBiquity Lab, CSEE, UMBC

Page 2: Swoogle Tutorial  (Part I: Swoogle R & D)

1. Introduction

  • Motivation
  • Swoogle in the Semantic Web
  • Glossary
  • Swoogle Architecture

Page 3: Swoogle Tutorial  (Part I: Swoogle R & D)

Motivation

  • (Google + Web) has made us all smarter.
  • Something similar is needed by people and software agents for information on the Semantic Web.

Page 4: Swoogle Tutorial  (Part I: Swoogle R & D)

The Role of Swoogle in Semantic Web

[Figure: diagram placing Swoogle as the Data Finder among Semantic Web services. Node labels: Semantic Web Services; Semantic web data; Software Agents, Applications; SW data service; database; (Web) document; RDF document; Directory/Digest Service; Service Finder; Data Finder (Swoogle). Edge labels: uses, digests, searches.]

Page 5: Swoogle Tutorial  (Part I: Swoogle R & D)

Concepts Explained

[Figure: example RDF graph spanning two documents. The ontology document (SWO) http://xmlns.com/foaf/1.0/ holds the terms: the class foaf:Person (rdf:type rdfs:Class, rdfs:subClassOf wordNet:Agent) and the property foaf:mbox (rdf:type rdf:Property, rdfs:domain foaf:Person). The instance document (SWI) holds the individual http://foo.com/foaf.rdf#finin (rdf:type foaf:Person, foaf:mbox [email protected]). Callouts: SWD, SWO, SWI, Term, Class, Property, Individual.]

NOTE: Qualified Names (QNames) are used to shorten well-known namespaces as follows:

  • rdf: => http://www.w3.org/1999/02/22-rdf-syntax-ns#
  • rdfs: => http://www.w3.org/2000/01/rdf-schema#
  • foaf: => http://xmlns.com/foaf/1.0/
  • wordNet: => http://xmlns.com/wordnet/1.6/
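As a minimal sketch of how the QName shorthand in the note works, the snippet below expands a QName back into a full URI reference. The prefix table follows the note (the foaf and wordNet URIs are copied as the slides give them); the function name `expand_qname` is a hypothetical helper, not part of Swoogle.

```python
# Prefix table taken from the slide's note. The trailing "#" on the rdfs
# namespace follows the W3C RDF Schema namespace URI.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "foaf": "http://xmlns.com/foaf/1.0/",      # as written in the slides
    "wordNet": "http://xmlns.com/wordnet/1.6/",
}


def expand_qname(qname, prefixes=PREFIXES):
    """Expand 'prefix:local' into a full URI reference (hypothetical helper)."""
    prefix, sep, local = qname.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError("unknown or missing prefix in %r" % qname)
    return prefixes[prefix] + local


print(expand_qname("foaf:Person"))  # http://xmlns.com/foaf/1.0/Person
print(expand_qname("rdf:type"))     # http://www.w3.org/1999/02/22-rdf-syntax-ns#type
```

Prefixes are matched case-sensitively here, so `wordNet:Agent` expands while `wordnet:Agent` raises an error.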

Page 6: Swoogle Tutorial  (Part I: Swoogle R & D)

Glossary

Document
  • A Semantic Web Document (SWD) is an online document written in Semantic Web languages (i.e. RDF and OWL).
  • An ontology document (SWO) is a SWD that contains mostly term definitions (i.e. classes and properties). It corresponds to the T-Box in Description Logic.
  • An instance document (SWI or SWDB) is a SWD that contains mostly class individuals. It corresponds to the A-Box in Description Logic.

Term
  • A term is a non-anonymous RDF resource which is the URI reference of either a class or a property.

Individual
  • An individual is a non-anonymous RDF resource which is the URI reference of a class member.

In Swoogle, a document D is a valid SWD if and only if JENA* correctly parses D and produces at least one triple.

*JENA is a Java framework for writing Semantic Web applications. http://www.hpl.hp.com/semweb/jena2.htm

[Figure: a term (foaf:Person, rdf:type rdfs:Class) and an individual (http://.../foaf.rdf#finin, rdf:type foaf:Person).]
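The glossary distinctions can be sketched in a few lines. This is not Swoogle's implementation: it assumes the document has already been parsed into (subject, predicate, object) tuples written as QNames, and the 0.5 threshold for "mostly" as well as the name `classify_swd` are assumptions made for illustration.

```python
# Types whose instances count as term definitions (an assumed, minimal set).
DEFINING_TYPES = {"rdfs:Class", "owl:Class", "rdf:Property"}


def classify_swd(triples, threshold=0.5):
    """Label parsed triples as 'not a SWD', 'SWO', 'SWI', or plain 'SWD'."""
    if not triples:
        # Glossary rule: a valid SWD must yield at least one triple.
        return "not a SWD"
    typed = [(s, o) for s, p, o in triples if p == "rdf:type"]
    terms = {s for s, o in typed if o in DEFINING_TYPES}
    individuals = {s for s, o in typed if o not in DEFINING_TYPES}
    total = len(terms) + len(individuals)
    if total == 0:
        return "SWD"  # valid, but nothing to decide the kind on
    return "SWO" if len(terms) / total > threshold else "SWI"


ontology = [
    ("foaf:Person", "rdf:type", "rdfs:Class"),
    ("foaf:mbox", "rdf:type", "rdf:Property"),
    ("foaf:mbox", "rdfs:domain", "foaf:Person"),
]
instance = [
    ("ex:finin", "rdf:type", "foaf:Person"),
    ("ex:finin", "foaf:mbox", "mailto:someone@example.org"),  # placeholder value
]
print(classify_swd(ontology), classify_swd(instance), classify_swd([]))
```

A real pipeline would obtain the triples from an RDF parser such as JENA, as the glossary states; the hard-coded lists above only stand in for that step.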

Page 7: Swoogle Tutorial  (Part I: Swoogle R & D)

Swoogle Architecture

[Figure: architecture diagram with four group labels: SWD discovery, metadata creation, data analysis, interface. Component labels: The Web, Web Crawler, Candidate URLs, SWD Reader, SWD Cache, SWD Metadata, IR analyzer, SWD analyzer, Web Server, Web Service, Agent Service.]

Page 8: Swoogle Tutorial  (Part I: Swoogle R & D)

2. Swoogle Research

  • Discovery
  • Digest
  • Search & Navigation
  • Rank
  • Statistics

Page 9: Swoogle Tutorial  (Part I: Swoogle R & D)

Discovery -- research

  • Discovering URLs of possible SWDs automatically
      • Google crawler
      • Focused crawler
      • Semantic Web crawler, e.g. scutter
  • Revisiting URLs

Page 10: Swoogle Tutorial  (Part I: Swoogle R & D)

Discovery -- results

  • Crawler performance
      • Google crawler is the best
      • Focused crawler needs to be improved
  • Verified pure SWDs are only 1/3 of discovered URLs
  • Some NSWDs contain embedded RDF graphs

                   SWD             NSWD            Undecided      TOTAL
Focused Crawler      1,465   7%     10,580  52%        8,292     20,337
google crawler     273,023  36%    369,371  49%      110,794    753,188
swd_crawler         61,870  15%    285,506  70%       57,709    405,085
TOTAL              336,358         665,457           176,795  1,178,610

Source: Swoogle (2005-Jan-05)
SELECT `discovered_by`, sum(isRDF), sum(1-isRDF), count(*) FROM `digest_url` WHERE 1 group by discovered_by
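The percentages in the crawler table can be rechecked from the raw counts. The sketch below copies the counts from the table; the helper name `shares` is hypothetical.

```python
# (SWD, NSWD, Undecided) counts per crawler, copied from the table.
counts = {
    "Focused Crawler": (1465, 10580, 8292),
    "google crawler": (273023, 369371, 110794),
    "swd_crawler": (61870, 285506, 57709),
}


def shares(swd, nswd, undecided):
    """Return (row total, SWD %, NSWD %) with percentages rounded to integers."""
    total = swd + nswd + undecided
    return total, round(100 * swd / total), round(100 * nswd / total)


for crawler, row in counts.items():
    total, swd_pct, nswd_pct = shares(*row)
    print(f"{crawler:16s} total={total:>9,d}  SWD={swd_pct}%  NSWD={nswd_pct}%")

all_swd = sum(row[0] for row in counts.values())
all_urls = sum(sum(row) for row in counts.values())
print(f"overall SWD share: {100 * all_swd / all_urls:.1f}%")
```

The per-crawler results agree with the 7%, 36%, and 15% SWD figures in the table, and the overall share comes out slightly below the "1/3" quoted on the slide.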

Page 11: Swoogle Tutorial  (Part I: Swoogle R & D)

Digest -- research

  • Document metadata
      • Annotative: general metadata, SWD metadata, ontology metadata
      • Inter-document relations
      • Document-term relations
  • Term metadata
      • Term definition
      • Inter-term relation
          • Class-property bond (C-P bond): rdfs:domain
          • Property-class bond (P-C bond): rdfs:range
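A minimal sketch of how C-P and P-C bonds could be collected from rdfs:domain and rdfs:range statements, assuming triples are given as QName tuples. The function name `bonds` and the sample triples are illustrative assumptions, not Swoogle code.

```python
from collections import defaultdict


def bonds(triples):
    """Collect C-P bonds (via rdfs:domain) and P-C bonds (via rdfs:range)."""
    cp = defaultdict(set)  # class -> properties that name it as rdfs:domain
    pc = defaultdict(set)  # property -> classes that it names as rdfs:range
    for s, p, o in triples:
        if p == "rdfs:domain":
            cp[o].add(s)
        elif p == "rdfs:range":
            pc[s].add(o)
    return dict(cp), dict(pc)


# Illustrative triples only.
sample = [
    ("foaf:mbox", "rdfs:domain", "foaf:Person"),
    ("foaf:name", "rdfs:domain", "foaf:Person"),
    ("foaf:knows", "rdfs:range", "foaf:Person"),
]
cp, pc = bonds(sample)
print(sorted(cp["foaf:Person"]))  # ['foaf:mbox', 'foaf:name']
print(sorted(pc["foaf:knows"]))   # ['foaf:Person']
```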

Page 12: Swoogle Tutorial  (Part I: Swoogle R & D)

Document Metadata

  • Web document metadata
      • When/how discovered/fetched
      • Suffix of URL
      • Last modified time
      • Document size
  • SWD metadata
      • Language features: OWL species, RDF encoding
      • Statistical features: # of defined/used terms, # of declared/used namespaces, ontology ratio, ontology rank
  • Ontology annotation
      • Label
      • Version
      • Comment
  • Relations
      • Links to other SWDs: imported SWDs, referenced SWDs, extended SWDs, prior version
      • Links to terms: classes/properties defined, classes/properties used

Page 13: Swoogle Tutorial  (Part I: Swoogle R & D)

Digest “Time” Ontology (document view)

Demo2(a)

Page 14: Swoogle Tutorial  (Part I: Swoogle R & D)

Document-Term Relation

[Figure: the example graph from "Concepts Explained", annotated with document-term relations. In the ontology document http://xmlns.com/foaf/1.0/, foaf:Person (rdf:type rdfs:Class, rdfs:subClassOf wordNet:Agent) is a defined Class and foaf:mbox (rdf:type rdf:Property, rdfs:domain foaf:Person) is a defined Property. In the instance document (shown as http://www.cs.umbc.edu/~finin/foaf.rdf and as http://foo.com/foaf.rdf), foaf:Person is a populated Class, foaf:mbox is a populated Property, and http://foo.com/foaf.rdf#finin (rdf:type foaf:Person, foaf:mbox [email protected]) is a defined Individual.]
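The five relation labels in the figure can be derived from a document's triples roughly as follows. This is a sketch under stated assumptions: triples are QName tuples, only rdfs:Class, owl:Class, and rdf:Property mark definitions, and predicates from the rdf:, rdfs:, and owl: vocabularies are ignored when looking for populated properties. The function name is hypothetical.

```python
META_CLASSES = {"rdfs:Class", "owl:Class"}
META_PROPERTIES = {"rdf:Property"}
SCHEMA_PREFIXES = ("rdf:", "rdfs:", "owl:")  # assumption: not counted as populated


def document_term_relations(triples):
    """Group the resources of one document by the figure's relation labels."""
    rel = {
        "defined Class": set(),
        "defined Property": set(),
        "populated Class": set(),
        "populated Property": set(),
        "defined Individual": set(),
    }
    for s, p, o in triples:
        if p == "rdf:type":
            if o in META_CLASSES:
                rel["defined Class"].add(s)
            elif o in META_PROPERTIES:
                rel["defined Property"].add(s)
            else:
                rel["populated Class"].add(o)
                rel["defined Individual"].add(s)
        elif not p.startswith(SCHEMA_PREFIXES):
            rel["populated Property"].add(p)
    return rel


instance_doc = [
    ("ex:finin", "rdf:type", "foaf:Person"),
    ("ex:finin", "foaf:mbox", "mailto:someone@example.org"),  # placeholder value
]
for label, members in document_term_relations(instance_doc).items():
    print(label, sorted(members))
```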

Page 15: Swoogle Tutorial  (Part I: Swoogle R & D)

Digest “Time” Ontology (term view)

Demo2(b)


Page 16: Swoogle Tutorial  (Part I: Swoogle R & D)

Term Metadata

  • Term Definition
      • rdfs:subClassOf -- foaf:Agent
      • rdfs:label -- “Person”
  • C-P bond (from SWI)
      • foaf:name
      • dc:title
  • C-P bond (from SWO)
      • foaf:mbox
      • foaf:name

[Figure: metadata for foaf:Person gathered from three documents. Onto 1: foaf:name and foaf:mbox, each with rdfs:domain foaf:Person. Onto 2: foaf:Person with rdf:type owl:Class, rdfs:label “Person”, rdfs:subClassOf foaf:Agent. SWD3: an individual with rdf:type foaf:Person and foaf:name “Tim Finin”.]

Page 17: Swoogle Tutorial  (Part I: Swoogle R & D)

Digest Term “Person”

Demo4

Page 18: Swoogle Tutorial  (Part I: Swoogle R & D)

Term Distribution (grouped by local name)

case-insensitive         case-sensitive
local name     count     rank  local name    count     rank  local name  count
Name             656        1  name            560       11  source        129
Person           399        2  Person          357       12  email         125
Title            349        3  title           292       13  Book          124
Location         334        4  description     242       14  address       121
Description      288        5  location        213       15  Event         117
Date             257        6  type            196       16  Location      114
Type             242        7  date            173       17  author        111
country          236        8  value           154       18  Animal        111
Address          212        9  Organization    134       19  Country       104
organization     186       10  country         130       20  language      103

total          72502           total         76827
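Grouping terms by local name, with and without case folding, can be sketched as below. Treating everything after the last "#" or "/" as the local name is a simplifying assumption, and the example URIrefs under example.org are made up for illustration.

```python
from collections import Counter


def local_name(uriref):
    """Return the part after the last '#' or '/' (a simplifying assumption)."""
    cut = max(uriref.rfind("#"), uriref.rfind("/"))
    return uriref[cut + 1:]


def group_by_local_name(urirefs, case_sensitive=True):
    """Count terms per local name, optionally folding case first."""
    names = (local_name(u) for u in urirefs)
    if not case_sensitive:
        names = (n.lower() for n in names)
    return Counter(names)


terms = [
    "http://xmlns.com/foaf/1.0/Person",
    "http://example.org/a#person",   # hypothetical URIref
    "http://example.org/b#Person",   # hypothetical URIref
    "http://example.org/b#name",     # hypothetical URIref
]
print(group_by_local_name(terms))
print(group_by_local_name(terms, case_sensitive=False))
```

Folding case merges `Person` and `person` into one group, which is why the case-insensitive total in the table is smaller than the case-sensitive one.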

Page 19: Swoogle Tutorial  (Part I: Swoogle R & D)

Digest -- result

Ontological Term Distribution (populated, defined)

type      Pop.  Def.  # terms    %   Total terms   # populated    %   Total populated
class      0     1     83,602  88%                           0    0%
class      1     0      3,954   4%                   1,002,961   13%
class      1     1      7,065   7%       94,621      6,483,485   87%        7,486,446
property   0     1     42,853  73%                           0    0%
property   1     0      8,312  14%                   2,438,455    6%
property   1     1      7,836  13%       59,001     36,899,842   94%       39,338,297

Source: Swoogle (2005-Jan-05)
SELECT res_type,sign(cnt_instance_populate>0), sign(cnt_swd_def>0),count(*), sum(cnt_instance_populate) FROM `digest_term` WHERE 1 group by res_type, sign(cnt_instance_populate>0), sign(cnt_swd_def>0)
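The grouping performed by the SQL query can be mirrored in plain Python to make the (populated, defined) buckets explicit. The column names come from the query; the rows below are toy data invented for illustration, not Swoogle's.

```python
from collections import defaultdict


def term_distribution(rows):
    """rows: iterable of (res_type, cnt_instance_populate, cnt_swd_def).

    Returns {(res_type, populated?, defined?): (number of terms, total instances)}.
    """
    groups = defaultdict(lambda: [0, 0])
    for res_type, cnt_instance_populate, cnt_swd_def in rows:
        key = (res_type, int(cnt_instance_populate > 0), int(cnt_swd_def > 0))
        groups[key][0] += 1                      # count(*)
        groups[key][1] += cnt_instance_populate  # sum(cnt_instance_populate)
    return {key: tuple(value) for key, value in groups.items()}


toy_rows = [  # toy data, not taken from Swoogle
    ("class", 0, 1),
    ("class", 5, 0),
    ("class", 3, 2),
    ("class", 0, 3),
    ("property", 10, 1),
]
for key, (n_terms, n_instances) in sorted(term_distribution(toy_rows).items()):
    print(key, n_terms, n_instances)
```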

Page 20: Swoogle Tutorial  (Part I: Swoogle R & D)

Search & Navigation -- research

The Semantic Web is not the Web

  • Search service
      • Document search: an RDF document is not free text
      • Term search: URIref and compound local name
  • Navigation service
      • The RDF graph: typed links
      • The web of RDF documents: few hyperlinks
      • The social network of agents: trust & provenance

Page 21: Swoogle Tutorial  (Part I: Swoogle R & D)

Find “Time” Ontology

We can use a set of keywords to search for an ontology. For example, “time”, “before”, and “after” are basic concepts of a “Time” ontology.

Demo1

Page 22: Swoogle Tutorial  (Part I: Swoogle R & D)

Find Term “Person”

Demo3

Not capitalized! URIref is case sensitive!

Page 23: Swoogle Tutorial  (Part I: Swoogle R & D)

Current Swoogle Navigation Model

  • A URIref refers to
      • A term, i.e. an instance of an RDFS class/property
      • An individual, i.e. populated terms
  • A SWD could be
      • SWO: term definition
      • SWI: individuals
  • Observations
      • RDF resources are semantically linked in the RDF graph
      • SWDs are poorly linked due to the absence of an explicit hyperlink concept
      • Ontologies are more interesting
  • Approach
      • Build inter-document relations
      • Rational surfing model

[Figure: document types on the Web. Labels: SWOs, SWIs, HTML documents, Images, Audio files, Video files.]

Page 24: Swoogle Tutorial  (Part I: Swoogle R & D)

Semantic Web Navigation Model (new!)

[Figure: navigation graph with entry points Term Search, Document Search, and RDF Graph Navigation. Node labels: Resource, RDF Document, Ontology, Namespace, URL, URIref. Link labels: populatesClass, populatesProperty, refersClass, refersProperty; definesClass, definesProperty; rdfsOntology, owldlOntology; owl:imports, owl:priorVersion, owl:backwardCompatibleWith, owl:incompatibleWith; rdfs:seeAlso, rdfs:isDefinedBy; isDefinedBy, isUsedBy; usesNamespace; rdfs:subClassOf; sameNamespace, sameLocalname.]

Page 25: Swoogle Tutorial  (Part I: Swoogle R & D)

Ranking -- research

  • Ranking method: PageRank variation
  • Surfing models:

Surfing model            What to rank     Scope          Idea
Rational surfing model   SWD              Semantic Web   Summarize inter-document relations as EX, TM, IM, PV
Plain Graph Model        Resource         RDF graph      RDF graph is browsed as a weighted directed graph
RDFS-based Model         Resource         RDF graph      RDF graph is browsed only with RDFS semantics
SW navigation model      Resource & SWD   Semantic Web   Assume Swoogle is used in navigation

Page 26: Swoogle Tutorial  (Part I: Swoogle R & D)

Ranking with Rational Surfing Model: An Example

[Figure: four documents and the links between them. http://www.cs.umbc.edu/~finin/foaf.rdf (an individual with rdf:type foaf:Person and foaf:mbox [email protected]); http://xmlns.com/foaf/1.0/ (foaf:Person with rdf:type rdfs:Class and rdfs:subClassOf wordNet:Person); http://xmlns.com/wordnet/1.6/ (wordNet:Person with rdf:type rdfs:Class and rdfs:subClassOf wordNet:Individual); http://www.w3.org/2000/01/rdf-schema (rdfs:subClassOf with rdf:type rdf:Property). Inter-document link labels: TM (three links) and EX (one link).]

Page 27: Swoogle Tutorial  (Part I: Swoogle R & D)

Demo6 Swoogle’s top 10

This report is generated dynamically from the latest data, which takes 5 to 10 seconds.

Swoogle uses a PageRank-like algorithm to rank Semantic Web documents. Well-known ontologies are highly ranked.
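To illustrate what a PageRank variation over typed inter-document links might look like, here is a minimal weighted PageRank sketch. It is not Swoogle's algorithm: the relation weights, the damping factor, the short document names, and the reading of EX, TM, IM, and PV as extends, uses-term, imports, and prior-version are all assumptions made for this example.

```python
def weighted_pagerank(links, weights, damping=0.85, iterations=50):
    """links: list of (source_doc, target_doc, relation); returns {doc: rank}."""
    docs = sorted({d for s, t, _ in links for d in (s, t)})
    out = {d: {} for d in docs}
    for s, t, rel in links:
        out[s][t] = out[s].get(t, 0.0) + weights[rel]
    n = len(docs)
    rank = {d: 1.0 / n for d in docs}
    for _ in range(iterations):
        new = {d: (1.0 - damping) / n for d in docs}
        for s in docs:
            total = sum(out[s].values())
            if total == 0:
                # Dangling document: spread its rank evenly over all documents.
                for d in docs:
                    new[d] += damping * rank[s] / n
            else:
                # Follow each outgoing link in proportion to its relation weight.
                for t, w in out[s].items():
                    new[t] += damping * rank[s] * w / total
        rank = new
    return rank


# Hypothetical weights per relation type.
weights = {"EX": 3.0, "TM": 1.0, "IM": 4.0, "PV": 2.0}
# Links loosely modelled on the example slide; document names are shortened.
links = [
    ("finin-foaf", "foaf", "TM"),
    ("foaf", "rdfs", "TM"),
    ("foaf", "wordnet", "EX"),
    ("wordnet", "rdfs", "TM"),
]
ranks = weighted_pagerank(links, weights)
for doc, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(f"{doc:12s} {score:.3f}")
```

In this toy graph the widely referenced schema document ends up ranked above the personal FOAF file, in line with the slide's remark that well-known ontologies are highly ranked.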

Page 28: Swoogle Tutorial  (Part I: Swoogle R & D)

Statistics -- research

Summarize the dataset collected by Swoogle

  • Swoogle Watch
      • Swoogle Today
      • Distribution of visited URLs
      • Document discovery log
      • Term discovery log
  • Semantic Web Watch
      • SWD distribution by last-modified month
      • SWD distribution by website
      • SWD distribution by suffix
  • Ontology Watch
      • Term (class/property) usage
      • Namespace usage

Page 29: Swoogle Tutorial  (Part I: Swoogle R & D)

Demo5(a) Swoogle Today

Page 30: Swoogle Tutorial  (Part I: Swoogle R & D)

Demo5(b) Swoogle Statistics

[Figure: chart with callout labels FOAF, Trustix, W3C, Stanford.]

Page 31: Swoogle Tutorial  (Part I: Swoogle R & D)

Demo5(c) Swoogle Statistics

Page 32: Swoogle Tutorial  (Part I: Swoogle R & D)

Miscellaneous

  • Submit URL for focused crawler
  • Swoogle Web Service (delivered in Sept.): http://swoogle.umbc.edu/webservice/
      • Search document
      • Search term
      • Term digest

Page 33: Swoogle Tutorial  (Part I: Swoogle R & D)

If you can’t find your ontologies in Swoogle, they may not have been indexed by Swoogle yet.

Please submit them to increase their visibility.

From site map

When your query fails

Demo7 Submit URL for focused crawler

Page 34: Swoogle Tutorial  (Part I: Swoogle R & D)

3. Summary

  • Summary
  • Current Status

Page 35: Swoogle Tutorial  (Part I: Swoogle R & D)

Summary

Swoogle (Mar, 2004)
  • Automated SWD discovery
  • SWD metadata creation and search
  • Ontology rank (rational surfer model)
  • Swoogle watch
  • Web interface

Swoogle2 (Sep, 2004)
  • Ontology dictionary
  • Swoogle statistics
  • Web service interface (WSDL)
  • Bag of URIref IR search

Swoogle3
  • Better discovery & revisit strategies
  • Better navigation models
  • Semantic web dataset
  • Index instance data
  • More metadata (ontology mapping)
  • Better web service interfaces

[Timeline axis labels: 2004, 2005]

Page 36: Swoogle Tutorial  (Part I: Swoogle R & D)

Current Status

  • Swoogle Watch reported (Jan 6, 2005)
      • 46.7 M triples
      • 336 K SWDs: 4 K ontologies
      • 153 K terms: 94 K classes & 59 K properties
  • Ongoing work
      • Research
          • Self-adaptive SWD discovery
          • Efficient SWD digest and RDF Graph Abstract
          • Semantic Web navigation model
      • Engineering
          • Enhancing Web Service interface