Swoogle: A Semantic Web Search and Metadata Engine
Li Ding, Tim Finin, Anupam Joshi, Rong Pan, Pavan Reddivari, Vishal Doshi, R. Scott Cost, Joel Sachs, Yun Peng
Department of Computer Science and Electronic Engineering, University of Maryland Baltimore County, Baltimore MD 21250, USA
Presented by
Adhitya Bhawiyuga (201183583)
Content
Introduction
Semantic Web Document
Swoogle Architecture
Finding Semantic Web Document
Semantic Web Document Metadata
Ranking
Indexing and Retrieval
Current Status
Conclusion and Future Work
Swoogle : Introduction
Are you familiar with this?
Introduction : What is Swoogle
Swoogle is a search engine
A crawler-based indexing and retrieval system
Intended for the Semantic Web
Extracts metadata for each document
Computes relations between documents
Introduction : Related Work
Ontology-based annotation systems (e.g. SHOE, Ontobroker, WebKB, QuizRDF): based on annotations rather than on entire documents
Ontology repositories (e.g. DAMLOntologyLibrary, SEM Web Central): do not automatically discover semantic web documents
Semantic web browsers (e.g. Ontaria): focus only on storing RDF rather than on metadata
Swoogle : Semantic Web Document
Semantic Web Document : SWD
a document in a semantic web language that is online and accessible to web users and software agents...
SWD : Classification
SWDs are divided into:
Semantic Web Ontology (SWO): a significant proportion of its statements define new terms (e.g. classes, properties)
Semantic Web Database (SWDB): does not define or extend a significant number of terms; it mainly describes individuals
Swoogle classifies an SWD by applying a threshold to this proportion
SWD : Classification Example
[Example: SWO fragment defining the terms Label and Property]
[Example: SWDB fragment describing the individual Tim Finin]
Swoogle : Architecture
Swoogle : Architecture (1)
[Figure: Swoogle architecture. SWD discovery (web crawler, candidate URLs, SWD reader) feeds metadata creation (SWD cache, SWD metadata), which feeds data analysis (IR analyzer, SWD analyzer), which feeds the interface (web server, web service, agent service).]
Swoogle : Architecture (2)
SWD discovery: discovers potential SWDs throughout the web
Metadata creation: caches a snapshot of each SWD and generates objective metadata about it
Data analysis: builds analytical reports based on the cached SWDs and the created metadata
Interface: provides data services to the Semantic Web community
Swoogle : Finding SWD
Finding SWD
Google-based crawler: uses the Google web service to find candidate URLs
Focused crawler: starts from a URL supplied by a user, verifies that it is an SWD, and discovers further SWDs through its relations (e.g. imports)
Swoogle : SWD Metadata
SWD Metadata : About
Basic metadata: syntactic and semantic features of an SWD
Relations: relations between SWDs
Analytical results: describe the ranking of SWDs
Basic Metadata (1)
Language features: properties describing the syntactic and semantic features of an SWD, e.g. encoding (XML/RDF), language (OWL, DAML), OWL species (OWL-DL, OWL-Lite)
RDF statistics: properties summarizing the node distribution; they record how many nodes are classes (rdfs:Class), properties (rdf:Property), or individuals, and are used to obtain the ontology ratio
Ontology annotation: properties describing an SWD as an ontology; Swoogle records the properties of owl:Ontology instances, e.g. label, comment, versionInfo
Basic Metadata : Determining Ontology Ratio
$$R(foo) = \frac{|C(foo)| + |P(foo)|}{|C(foo)| + |P(foo)| + |I(foo)|}$$
where R(foo) is the ontology-ratio of the SWD foo, |C(foo)| is its number of classes, |P(foo)| its number of properties, and |I(foo)| its number of individuals
if ontology-ratio = 1, the SWD is a pure SWO
if ontology-ratio = 0, the SWD is a pure SWDB
if 0 < ontology-ratio < 1, the SWD is classified by comparing the ratio against a threshold
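The ratio and the threshold rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the default threshold of 0.8 are hypothetical choices for the example, not values taken from these slides.

```python
def ontology_ratio(num_classes, num_properties, num_individuals):
    """R(foo) = (|C| + |P|) / (|C| + |P| + |I|)."""
    defined_terms = num_classes + num_properties
    total = defined_terms + num_individuals
    if total == 0:
        raise ValueError("document defines no classes, properties, or individuals")
    return defined_terms / total


def classify_swd(ratio, threshold=0.8):
    """Label an SWD as SWO or SWDB.

    The threshold value is a hypothetical placeholder chosen for illustration.
    """
    return "SWO" if ratio >= threshold else "SWDB"


# A document with 3 classes, 1 property, and 4 individuals has ratio 0.5.
ratio = ontology_ratio(3, 1, 4)
label = classify_swd(ratio)
```

With the assumed threshold, the mixed document above is labelled SWDB, while a document that defines only classes and properties (ratio 1) is labelled SWO.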
Relations Metadata (1) Swoogle captures following SWD relation
TM/IN: captures term references between two SWDs
IM: captures the ontology import relation, e.g. owl:imports, daml:imports
EX: captures the ontology extension relation, e.g. rdfs:subClassOf, rdfs:subPropertyOf
PV: shows that an ontology is a prior version of another, e.g. owl:priorVersion
Relations Metadata (2) Swoogle captures following SWD relation
CPV: shows that an ontology is a prior version of, and compatible with, another, e.g. owl:DeprecatedProperty, owl:DeprecatedClass
IPV: shows that an ontology is a prior version of, and incompatible with, another, e.g. owl:incompatibleWith
Swoogle : Ranking SWD
Ranking SWD : Google Page Rank Concept
Google introduced the PageRank concept to evaluate the relative importance of web documents, expressed as a probability
The probability of reaching a page combines the probability of accessing it directly with the probability of following one of the links that point to it
Ranking SWD : Swoogle Page Rank Concept (1)
Google PageRank uses a uniform probability, meaning all links between web documents are treated in the same manner
SWDs can link to each other in different ways, e.g. imports, uses-term, extends
Different kinds of link should be treated differently (given different weights)
Therefore, Swoogle uses a rational random surfing model
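Ranking SWD : Rational Random Surfing Model
$$rawPR(a) = (1 - d) + d \sum_{x \in L(a)} rawPR(x) \, \frac{f(x, a)}{f(x)}$$
$$f(x, a) = \sum_{l \in links(x, a)} weight(l)$$
$$f(x) = \sum_{a \in T(x)} f(x, a)$$
The term (1 - d) is the direct access probability; L(a) is the set of SWDs that link to a; f(x, a) sums the weights of all links from x to a; f(x) sums over all outlinks of x, where T(x) is the set of SWDs that x links to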
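A minimal Python sketch of the rawPR iteration above, assuming a small in-memory link list. The weight values in `LINK_WEIGHTS`, the damping factor of 0.85, and the fixed iteration count are hypothetical choices for illustration and are not taken from the slides.

```python
from collections import defaultdict

# Hypothetical per-relation weights; the actual values are not given in the slides.
LINK_WEIGHTS = {"imports": 3.0, "extends": 2.0, "uses-term": 1.0}


def raw_page_rank(links, d=0.85, iterations=50):
    """Iterate rawPR(a) = (1 - d) + d * sum_x rawPR(x) * f(x, a) / f(x).

    `links` is a list of (source, target, relation) triples.
    """
    f_pair = defaultdict(float)  # f(x, a): total weight of links from x to a
    f_out = defaultdict(float)   # f(x): total weight of all outlinks of x
    docs = set()
    for x, a, relation in links:
        weight = LINK_WEIGHTS[relation]
        f_pair[(x, a)] += weight
        f_out[x] += weight
        docs.update((x, a))

    rank = {doc: 1.0 for doc in docs}
    for _ in range(iterations):
        new_rank = {doc: 1.0 - d for doc in docs}
        for (x, a), weight in f_pair.items():
            new_rank[a] += d * rank[x] * weight / f_out[x]
        rank = new_rank
    return rank


links = [
    ("A", "C", "imports"),
    ("B", "C", "uses-term"),
    ("A", "B", "uses-term"),
]
rank = raw_page_rank(links)
```

In this toy graph, document C is reached by a heavily weighted import link and a term reference, so it ends up ranked above B, which in turn ranks above A (no incoming links).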
Swoogle : Indexing and Retrieval
Information Retrieval
Uses traditional information retrieval methods
Works with both pure SWDs and text documents with embedded markup
[Example: an rdf:Description fragment (W3Schools, Jan Egil Refsnes), shown once as a pure SWD document and once as text with embedded markup]
Traditional Information Retrieval
N-gram based matching: slide an n-character window over the given word, match the resulting samples against URIrefs, and score the matches with a probability
Word based matching: reduce the RDF to triples, extract the URIs from the SWD, and match them against the given word
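The two matching styles can be contrasted with a short Python sketch. It is only an approximation: the overlap fraction below stands in for the probability mentioned on the slide, and the function names and the example URIref `http://example.org/onto#TimeZone` are hypothetical.

```python
import re


def char_ngrams(text, n=3):
    """Slide an n-character window over the lowercased text."""
    text = text.lower()
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def ngram_score(query, uriref, n=3):
    """Fraction of the query's n-grams that also occur in the URIref."""
    query_grams = char_ngrams(query, n)
    if not query_grams:
        return 0.0
    return len(query_grams & char_ngrams(uriref, n)) / len(query_grams)


def word_match(query, uriref):
    """Word-based matching: compare the query to whole alphabetic tokens of the URIref."""
    tokens = {token.lower() for token in re.findall(r"[A-Za-z]+", uriref)}
    return query.lower() in tokens


uri = "http://example.org/onto#TimeZone"  # hypothetical URIref
partial = ngram_score("time", uri)        # n-grams "tim" and "ime" both occur
exact = word_match("time", uri)           # "timezone" is one token, so no word match
```

The example shows the design trade-off: n-gram matching finds "time" inside the compound name TimeZone, while word-based matching only succeeds on whole tokens.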
Indexing
After retrieval, each SWD is indexed based on the page ranking formula

Rank  URL                                                      Value
1     http://www.w3.org/1999/02/22-rdf-syntax-ns               2845.97
2     http://www.w3.org/2000/01/rdf-schema                     2814.21
3     http://www.daml.org/2001/03/daml+oil                     311.65
4     http://www.w3.org/2002/07/owl                            192.18
5     http://www.w3.org/2000/10/rdftests/rdfcore/testSchema    59.82
Current Status
Conclusion and Future Work
Powerful search and indexing systems are needed by Semantic Web developers and researchers to help them find and analyze SWDs
Current web search engines such as Google and AlltheWeb do not work well with SWDs, as they are designed to work with natural languages
Swoogle runs multiple crawlers to discover SWDs through meta-search and link-following
Thank you