Swoogle Semantic 201183583

Swoogle: A Semantic Web Search and Metadata Engine

Li Ding, Tim Finin, Anupam Joshi, Rong Pan, Pavan Reddivari, Vishal Doshi, R. Scott Cost, Joel Sachs, Yun Peng

Department of Computer Science and Electrical Engineering, University of Maryland Baltimore County, Baltimore MD 21250, USA

Presented by

Adhitya Bhawiyuga (201183583)

Content

Introduction
Semantic Web Document
Swoogle Architecture
Finding Semantic Web Document
Semantic Web Document Metadata
Ranking
Indexing and Retrieval
Current Status
Conclusion and Future Work

Swoogle : Introduction

Are you familiar with this?

Introduction : What is Swoogle

Swoogle is a search engine
A crawler-based indexing and retrieval system
Intended for the Semantic Web
Extracts metadata for each document
Computes relations between documents

Introduction : Related Work

Ontology-based annotation systems, e.g. SHOE, Ontobroker, WebKB, QuizRDF: based on annotations rather than on entire documents
Ontology repositories, e.g. DAML Ontology Library, SemWebCentral: do not automatically discover Semantic Web documents
Semantic Web browsers, e.g. Ontaria: focus only on storing RDF rather than on metadata

Swoogle : Semantic Web Document

Semantic Web Document : SWD

a document in a semantic web language that is online and accessible to web users and software agents...

SWD : Classification

SWD is divided into:
Semantic Web Ontology (SWO): a significant proportion of its statements define new terms (e.g. classes, properties)
Semantic Web Database (SWDB): does not define or extend a significant number of terms; in other words, it mostly describes individuals

In Swoogle, an SWD is classified as SWO or SWDB using a threshold formulation

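SWD : Classification Example

[SWO example: RDF listing not preserved; surviving text: Label, Property]

[SWDB example: RDF listing not preserved; surviving text: Tim Finin, Tim, 9da08e2b4dc670d9254ab4a4b4d61637fed3b18f, 49953f47b9c33484a753eaf14102af56c0148d37]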

Swoogle : Architecture

Swoogle : Architecture (1)

[Figure: Swoogle architecture. SWD discovery: web, Web crawler, candidate URLs, SWD Reader. Metadata creation: SWD cache, SWD metadata. Data analysis: IR analyzer, SWD analyzer. Interface: Web server, Web service, Agent service.]

Swoogle : Architecture (2)

SWD discovery: discovers potential SWDs throughout the web
Metadata creation: caches a snapshot of each SWD and generates objective metadata about it
Data analysis: builds analytical reports based on the cached SWDs and the created metadata
Interface: provides data services to the Semantic Web community

Swoogle : Finding SWD

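Finding SWD

Google-based crawler: uses the Google web service to search the web for candidate SWDs
Focused crawler: starts from a URL given by the user, verifies whether each document is an SWD, and discovers further SWDs through its relations, e.g. import

[Diagram labels: web crawler, web]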
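The focused crawler can be pictured as a breadth-first walk over document relations. The following is only a minimal sketch under stated assumptions: `focused_crawl`, `fetch_related`, `is_swd` and the `toy_web` dictionary are hypothetical names invented for illustration, and the "ends with .owl" check is a toy stand-in for real SWD verification, not how Swoogle does it.

```python
from collections import deque


def focused_crawl(seed_url, fetch_related, is_swd, max_docs=100):
    """Breadth-first crawl from seed_url, keeping only documents accepted by is_swd.

    fetch_related(url) returns the URLs a document relates to (e.g. its imports).
    is_swd(url) returns True if the document should be treated as an SWD.
    Both callables are assumptions supplied by the caller.
    """
    found = []
    seen = {seed_url}
    queue = deque([seed_url])
    while queue and len(found) < max_docs:
        url = queue.popleft()
        if not is_swd(url):
            continue  # not an SWD: do not keep it and do not follow its links
        found.append(url)
        for related in fetch_related(url):
            if related not in seen:
                seen.add(related)
                queue.append(related)
    return found


# Toy in-memory "web" (hypothetical data): URL -> related URLs.
toy_web = {
    "http://example.org/a.owl": ["http://example.org/b.owl", "http://example.org/page.html"],
    "http://example.org/b.owl": ["http://example.org/a.owl"],
    "http://example.org/page.html": ["http://example.org/c.owl"],
}

crawled = focused_crawl(
    "http://example.org/a.owl",
    fetch_related=lambda url: toy_web.get(url, []),
    is_swd=lambda url: url.endswith(".owl"),  # toy check, assumption only
)
```

In this toy run the HTML page is rejected, so the document it points to is never reached; a real crawler would need an actual parser to decide what counts as an SWD.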

Swoogle : SWD Metadata

SWD Metadata : About

Basic metadata: syntactic and semantic features of an SWD
Relations: relations between SWDs
Analytical results: describe the SWD ranking

Basic Metadata (1)

Language features: properties describing the syntactic and semantic features of an SWD, e.g. encoding (RDF/XML), language (OWL, DAML), OWL species (OWL-DL, OWL-Lite)
RDF statistics: properties summarizing the node distribution; they record counts of rdfs:Class, rdf:Property and individuals, from which the ontology ratio is obtained
Ontology annotation: properties describing an SWD as an ontology; Swoogle records the properties of owl:Ontology instances, e.g. label, comment, versionInfo

Basic Metadata : Determining Ontology Ratio

ontology-ratio:

\[ R(foo) = \frac{|C(foo)| + |P(foo)|}{|C(foo)| + |P(foo)| + |I(foo)|} \]

where |C(foo)| is the number of classes, |P(foo)| the number of properties and |I(foo)| the number of individuals in the SWD foo

if ontology-ratio = 1, the document is a pure SWO
if ontology-ratio = 0, the document is a pure SWDB
if 0 < ontology-ratio < 1, the document is classified by comparing the ratio against a threshold
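As an illustration of the ontology ratio and the threshold rule, here is a minimal sketch. The function names and the default threshold of 0.8 are assumptions made for the example; the slides do not state which threshold Swoogle uses.

```python
def ontology_ratio(n_classes: int, n_properties: int, n_individuals: int) -> float:
    """Ontology ratio R = (|C| + |P|) / (|C| + |P| + |I|)."""
    defined_terms = n_classes + n_properties
    total = defined_terms + n_individuals
    if total == 0:
        raise ValueError("document has no classes, properties or individuals")
    return defined_terms / total


def classify_swd(ratio: float, threshold: float = 0.8) -> str:
    """Label a document SWO when its ratio reaches the threshold, else SWDB.

    The default threshold is a hypothetical value chosen for illustration.
    """
    return "SWO" if ratio >= threshold else "SWDB"


# A document defining one class and populating it with three individuals.
ratio = ontology_ratio(n_classes=1, n_properties=0, n_individuals=3)
label = classify_swd(ratio)
```

With one defined term out of four nodes the ratio is 0.25, so under the assumed threshold the document is labelled SWDB.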

Relations Metadata (1)

Swoogle captures the following SWD relations:

TM/IN: captures term references between two SWDs
IM: captures the ontology import relation, e.g. owl:imports, daml:imports
EX: captures the ontology extension relation, e.g. rdfs:subClassOf, rdfs:subPropertyOf
PV: shows that an ontology is a prior version of another, e.g. owl:priorVersion

Relations Metadata (2)

Swoogle captures the following SWD relations:

CPV: shows that an ontology is a prior version of, and compatible with, another, e.g. owl:DeprecatedProperty, owl:DeprecatedClass
IPV: shows that an ontology is a prior version of, and incompatible with, another, e.g. owl:incompatibleWith
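The relation types listed on these two slides can be read as a lookup from vocabulary terms to relation codes. The sketch below encodes only the example terms named on the slides; the table name, the function name and the choice to fall back to "TM" for any other term are assumptions for illustration.

```python
# Example terms taken from the slides; the real lists may be longer.
RELATION_TERMS = {
    "IM": {"owl:imports", "daml:imports"},
    "EX": {"rdfs:subClassOf", "rdfs:subPropertyOf"},
    "PV": {"owl:priorVersion"},
    "CPV": {"owl:DeprecatedProperty", "owl:DeprecatedClass"},
    "IPV": {"owl:incompatibleWith"},
}


def relation_type(term: str) -> str:
    """Return the relation code for a term, defaulting to a plain term reference."""
    for code, terms in RELATION_TERMS.items():
        if term in terms:
            return code
    return "TM"  # assumption: anything else counts as a generic term reference


codes = [relation_type(t) for t in ("owl:imports", "rdfs:subClassOf", "foaf:name")]
```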

Swoogle : Ranking SWD

Ranking SWD : Google Page Rank Concept

Google introduced the PageRank concept to evaluate the relative importance of web documents, expressed as a probability
The probability for a document is calculated from the probability of accessing it directly and the probability of reaching it by following one of the links pointing to it
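For reference, the PageRank formula is commonly written as below. The notation is chosen here for illustration: d is the damping factor, L(a) is the set of documents linking to a, and C(x) is the number of outgoing links of x.

```latex
\[ PR(a) = (1 - d) + d \sum_{x \in L(a)} \frac{PR(x)}{C(x)} \]
```

The term (1 - d) corresponds to direct access, and the sum corresponds to following a link; dividing by C(x) is what makes every outgoing link of x equally likely.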

Ranking SWD : Swoogle Page Rank Concept (1)

Google PageRank uses a uniform probability: every link between web documents is treated the same way
SWDs can link to each other in several different ways, e.g. import, uses-term, extends
Different link types should be treated differently (given different weights)
Therefore, Swoogle uses the rational random surfing model

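Ranking SWD : Rational Random Surfing Model

\[ \mathrm{rawPR}(a) = (1 - d) + d \sum_{x \in L(a)} \mathrm{rawPR}(x) \, \frac{f(x, a)}{f(x)} \]

\[ f(x, a) = \sum_{l \in \mathrm{links}(x, a)} \mathrm{weight}(l) \]

\[ f(x) = \sum_{a \in T(x)} f(x, a) \]

Here (1 - d) is the direct-access probability, f(x, a) sums the weights of all links from x to a, and f(x) sums f(x, a) over all outlinks of x. L(a) is the set of documents linking to a, and T(x) is the set of documents that x links to.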
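The raw rank formula can be evaluated by simple fixed-point iteration. The following is a minimal sketch, not Swoogle's implementation: the function name, the damping factor 0.85, the iteration count and the link weights are all assumptions chosen for the example.

```python
from collections import defaultdict


def raw_page_rank(links, weights, d=0.85, iterations=50):
    """Iterate rawPR(a) = (1 - d) + d * sum_x rawPR(x) * f(x, a) / f(x).

    links: iterable of (source, target, link_type) triples.
    weights: dict mapping link_type to its weight (hypothetical values).
    """
    # f[x][a] is the summed weight of all links from x to a.
    f = defaultdict(lambda: defaultdict(float))
    nodes = set()
    for x, a, link_type in links:
        f[x][a] += weights[link_type]
        nodes.update((x, a))
    # f_out[x] is the summed weight over all outlinks of x.
    f_out = {x: sum(targets.values()) for x, targets in f.items()}

    rank = {node: 1.0 for node in nodes}
    for _ in range(iterations):
        new_rank = {node: 1.0 - d for node in nodes}
        for x, targets in f.items():
            if f_out[x] == 0:
                continue  # all outlinks have zero weight: nothing to pass on
            for a, weight in targets.items():
                new_rank[a] += d * rank[x] * weight / f_out[x]
        rank = new_rank
    return rank


# Hypothetical weights: an import counts three times as much as a term use.
example_weights = {"imports": 3.0, "uses-term": 1.0}
example_links = [("A", "B", "imports"), ("A", "C", "uses-term")]
ranks = raw_page_rank(example_links, example_weights)
```

In this toy graph A has no inlinks, so its rank settles at 1 - d, and B ends up above C because the import link carries more weight than the term reference.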

Swoogle : Indexing and Retrieval

Information Retrieval

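Uses traditional information retrieval methods
Works well with both pure SWD documents and text documents with embedded markup

[Example, pure SWD document: RDF listing not preserved; surviving text: W3Schools, Jan Egil Refsnes]

[Example, text with embedded markup: "Here I describe the rdf:Description syntax:" followed by an RDF listing that was not preserved; surviving text: W3Schools, Jan Egil Refsnes]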

Traditional Information Retrieval

N-gram based matching
Samples are matched against URIrefs
Given a word, slide a window of n characters over it
Find the matching samples, each with a probability

Word-based matching
Reduce the RDF to triples
Extract the URIs from the SWD
Match them against the given word
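The n-gram approach above can be sketched as follows. This is an illustration only: the function names are invented, and the overlap fraction is a simple stand-in for the matching probability mentioned on the slide, which the slides do not define.

```python
def char_ngrams(text: str, n: int = 3) -> set:
    """Slide a window of n characters over the lowercased text."""
    text = text.lower()
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def ngram_score(word: str, uriref: str, n: int = 3) -> float:
    """Fraction of the word's n-grams that also occur in the URIref.

    This overlap score is an assumption used in place of a real
    probability model.
    """
    word_grams = char_ngrams(word, n)
    if not word_grams:
        return 0.0  # word shorter than n characters: nothing to match
    return len(word_grams & char_ngrams(uriref, n)) / len(word_grams)


uri = "http://xmlns.com/foaf/0.1/Person"
score_person = ngram_score("person", uri)
score_time = ngram_score("time", uri)
```

Because matching works on character windows, a query word can match inside a URIref without the URIref being split into words first.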

Indexing

After retrieval, each SWD is indexed and ranked according to the page ranking formula. The five highest-ranked documents:

Rank  URL                                                     Value
1     http://www.w3.org/1999/02/22-rdf-syntax-ns              2845.97
2     http://www.w3.org/2000/01/rdf-schema                    2814.21
3     http://www.daml.org/2001/03/daml+oil                    311.65
4     http://www.w3.org/2002/07/owl                           192.18
5     http://www.w3.org/2000/10/rdftests/rdfcore/testSchema   59.82

Current Status


Conclusion and Future Work

Powerful search and indexing systems are needed by Semantic Web developers and researchers to help them find and analyze SWDs.
Current web search engines such as Google and AlltheWeb do not work well with SWDs, as they are designed to work with natural languages.
Swoogle runs multiple crawlers to discover SWDs through meta-search and link-following.

Thank you
