UMBC, an Honors University in Maryland
Search Engines for Semantic Web Knowledge
Tim Finin
University of Maryland, Baltimore County
Joint work with Li Ding, Anupam Joshi, Yun Peng, Pranam Kolari, Pavan Reddivari, Sandor Dornbush, Rong Pan, Akshay Java, Joel Sachs, Scott Cost and Vishal Doshi
http://creativecommons.org/licenses/by-nc-sa/2.0/
This work was partially supported by DARPA contract F30602-97-1-0215, NSF grants CCR007080 and IIS9875433 and grants from IBM, Fujitsu and HP.
This talk
• Motivation
• Semantic web 101
• Swoogle Semantic Web search engine
• Use cases and applications
• State of the Semantic Web
• Conclusions
Google has made us smarter
But what about our agents?
[Figure: agent diagram; message labels: tell, register]
Agents still have a very minimal understanding of text and images.
This talk
• Motivation
• Semantic web 101
• Swoogle Semantic Web search engine
• Use cases and applications
• State of the Semantic Web
• Conclusions
XML helps
“XML is Lisp's bastard nephew, with uglier syntax and no semantics. Yet XML is poised to enable the creation of a Web of data that dwarfs anything since the Library at Alexandria.”
-- Philip Wadler, Et tu XML? The fall of the relational empire, VLDB, Rome, September 2001.
Semantic Web adds semantics
“The Semantic Web will globalize KR, just as the WWW globalized hypertext”
-- Tim Berners-Lee
Semantic Web 101
<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/"
         xmlns:uni="http://ebiquity.umbc.edu/ontologies/uni/">
  <uni:Student>
    <foaf:name>Li Ding</foaf:name>
    <foaf:mbox rdf:resource="mailto:[email protected]"/>
  </uni:Student>
</rdf:RDF>
• RDF/XML
• rdf:RDF tag
• namespaces ontologies
• Semantic graph, URIs as nodes & links
• triples
[Figure: graph of the example; one node with a foaf:name edge to "Li Ding" and an rdf:type edge to uni:Student]
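The example can be read as three triples about one unnamed node. The following is a minimal sketch of that reading using only Python's standard library. It handles just the typed-node subset of RDF/XML shown on the slide, not the full syntax, and the mailbox address is a placeholder because the original address is obscured in this copy of the slides. The function names are illustrative.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FOAF = "http://xmlns.com/foaf/0.1/"
UNI = "http://ebiquity.umbc.edu/ontologies/uni/"

# Same shape as the slide's example; the mailbox is a made-up placeholder.
DOC = f"""<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF xmlns:rdf="{RDF}" xmlns:foaf="{FOAF}" xmlns:uni="{UNI}">
  <uni:Student>
    <foaf:name>Li Ding</foaf:name>
    <foaf:mbox rdf:resource="mailto:someone@example.org"/>
  </uni:Student>
</rdf:RDF>"""


def uri(tag):
    """Turn ElementTree's '{namespace}local' tag form into a plain URI."""
    return tag[1:].replace("}", "", 1)


def triples(xml_bytes):
    """Read typed node elements under rdf:RDF as (subject, predicate, object)."""
    root = ET.fromstring(xml_bytes)
    out = []
    for i, node in enumerate(root):
        # A node element without rdf:about is treated as a blank node.
        subject = node.get(f"{{{RDF}}}about", f"_:b{i}")
        # The element name itself carries the rdf:type of the node.
        out.append((subject, RDF + "type", uri(node.tag)))
        for prop in node:
            resource = prop.get(f"{{{RDF}}}resource")
            value = resource if resource is not None else (prop.text or "").strip()
            out.append((subject, uri(prop.tag), value))
    return out


for triple in triples(DOC.encode("utf-8")):
    print(triple)
```

Expanding the prefixed names into full URIs is what lets documents from different sites share one graph: two files that both use http://xmlns.com/foaf/0.1/name are talking about the same property.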
But what about our agents?
A Google for knowledge on the Semantic Web is needed by software agents and programs
[Figure: agent diagram with repeated Swoogle labels; message labels: tell, register]
This talk
• Motivation
• Semantic web 101
• Swoogle Semantic Web search engine
• Use cases and applications
• State of the Semantic Web
• Conclusions
• http://swoogle.umbc.edu/
• Running since summer 2004
• 1.4M RDF documents, 250M RDF triples, 10K ontologies
Swoogle Architecture
[Figure: architecture diagram. Layers: Discovery, Analysis, Index, Search Services. Components: Candidate URLs, Bounded Web Crawler, Google Crawler, SwoogleBot, SWD classifier, Ranking, document cache, IR Indexer, SWD Indexer, Semantic Web metadata, Web Server (human, html), Web Service (machine, rdf/xml). Sources: the Web, Semantic Web. Legend: information flow, Swoogle's web interface]
A Hybrid Harvesting Framework
[Figure: harvesting workflow. Inputs: manual submission, the Web. Crawling modes: meta crawling (Google API call), bounded HTML crawling, RDF crawling, with seed sets Seeds M, Seeds H and Seeds R. Other elements: Swoogle Sample Dataset, inductive learner]
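The slide names bounded HTML crawling without defining the bound. The following is a minimal sketch under one plausible reading, assumed here and not confirmed by the slides: the crawler stays on the seed's host and stops after a fixed number of link hops. The toy link graph, the function names and the extension-based filter are all hypothetical stand-ins for real fetching and for the SWD classifier.

```python
from collections import deque
from urllib.parse import urlparse

# Hypothetical in-memory "web": URL -> outgoing links (stands in for HTTP fetches).
TOY_WEB = {
    "http://a.example/": ["http://a.example/people.html", "http://b.example/"],
    "http://a.example/people.html": ["http://a.example/foaf.rdf"],
    "http://a.example/foaf.rdf": [],
    "http://b.example/": ["http://b.example/data.rdf"],
    "http://b.example/data.rdf": [],
}


def bounded_crawl(seed, links, max_depth=2):
    """Breadth-first crawl that stays on the seed's host, within max_depth hops."""
    host = urlparse(seed).netloc
    seen = {seed}
    queue = deque([(seed, 0)])
    visited = []
    while queue:
        url, depth = queue.popleft()
        visited.append(url)
        if depth == max_depth:
            continue  # bound reached: do not follow links from this page
        for nxt in links.get(url, []):
            if nxt not in seen and urlparse(nxt).netloc == host:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return visited


def rdf_candidates(urls):
    """Crude extension filter; a real classifier would fetch and parse each URL."""
    return [u for u in urls if u.endswith((".rdf", ".owl"))]


pages = bounded_crawl("http://a.example/", TOY_WEB)
print(rdf_candidates(pages))
```

Bounding the crawl keeps the cost predictable on sites that mix a few RDF files with many ordinary HTML pages, and any candidates it finds can serve as seeds for further RDF crawling.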
This talk
• Motivation
• Semantic web 101
• Swoogle Semantic Web search engine
• Use cases and applications
• State of the Semantic Web
• Conclusions
Applications and use cases
• Supporting Semantic Web developers
– Ontology designers, vocabulary discovery, who’s using my ontologies or data?, use analysis, errors, statistics, etc.
• Searching specialized collections
– Spire: aggregating observations and data from biologists
– InferenceWeb: searching over and enhancing proofs
– SemNews: Text Meaning of news stories
• Supporting SW tools
– Triple shop: finding data for SPARQL queries
By default, ontologies are ordered by their ‘popularity’, but they can also be ordered by recency or size.
80 ontologies were found that had these three terms
Let’s look at this one
Basic Metadata
hasDateDiscovered: 2005-01-17
hasDatePing: 2006-03-21
hasPingState: PingModified
type: SemanticWebDocument
isEmbedded: false
hasGrammar: RDFXML
hasParseState: ParseSuccess
hasDateLastmodified: 2005-04-29
hasDateCache: 2006-03-21
hasEncoding: ISO-8859-1
hasLength: 18K
hasCntTriple: 311.00
hasOntoRatio: 0.98
hasCntSwt: 94.00
hasCntSwtDef: 72.00
hasCntInstance: 8.00
These are the namespaces this ontology uses. Clicking on one shows all of the documents using the namespace.
All of this is available in RDF form for the agents among us.
Here’s what the agent sees. Note the swoogle and wob (web of belief) ontologies.
We can also search for terms (classes, properties) like terms for “person”.
10K terms associated with “person”! Ordered by use.
Let’s look at foaf:Person’s metadata
UMBC Triple Shop
• http://sparql.cs.umbc.edu/
• Online SPARQL RDF query processing based on HP’s Jena and Joseki with several interesting features
• Selectable level of inference over model
• Automatically finds SWDs for given queries using Swoogle backend database
– Provide dataset creation wizard
– Dataset can be stored on our server or downloaded
– Tag, share and search over saved datasets
Web-scale semantic web data access
[Figure: sequence diagram among agent, data access service and the Web. Labels: ask (“person”), Search vocabulary, Search URIrefs in SW vocabulary, inform (“foaf:Person”), Compose query, ask (“?x rdf:type foaf:Person”), Search URLs in SWD index, inform (doc URLs), Fetch docs, Populate RDF database, Index RDF data, Query local RDF database]
Who knows Anupam Joshi?
Show me their names, email address and pictures
The UMBC ebiquity site publishes lots of RDF data, including FOAF profiles
No FROM clause!
Constraints on where the data comes from
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT DISTINCT ?p2name ?p2mbox ?p2pix
WHERE {
  ?p1 foaf:name "Anupam Joshi" .
  ?p1 foaf:mbox ?p1mbox .
  ?p2 foaf:knows ?p3 .
  ?p3 foaf:mbox ?p1mbox .
  ?p2 foaf:name ?p2name .
  ?p2 foaf:mbox ?p2mbox .
  OPTIONAL { ?p2 foaf:depiction ?p2pix } .
}
ORDER BY ?p2name
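The query does not ask for people who know ?p1 directly. It joins through the shared mailbox ?p1mbox, so a description of Anupam Joshi in one document can be linked to a differently named node in someone else's FOAF file. The following is a minimal sketch of that join over toy data. It is a naive pattern matcher written for illustration, not how Jena or Joseki evaluate SPARQL, and every name, mailbox and URL in the toy data other than "Anupam Joshi" is made up.

```python
# Toy triples with prefixed names left unexpanded; all values are hypothetical.
TOY_TRIPLES = [
    ("_:a1", "foaf:name", "Anupam Joshi"),
    ("_:a1", "foaf:mbox", "mailto:joshi@example.org"),
    ("_:b1", "foaf:name", "Bob"),
    ("_:b1", "foaf:mbox", "mailto:bob@example.org"),
    ("_:b1", "foaf:knows", "_:b2"),
    ("_:b2", "foaf:mbox", "mailto:joshi@example.org"),
    ("_:c1", "foaf:name", "Alice"),
    ("_:c1", "foaf:mbox", "mailto:alice@example.org"),
    ("_:c1", "foaf:depiction", "http://example.org/alice.jpg"),
    ("_:c1", "foaf:knows", "_:c2"),
    ("_:c2", "foaf:mbox", "mailto:joshi@example.org"),
    ("_:d1", "foaf:name", "Carol"),
    ("_:d1", "foaf:mbox", "mailto:carol@example.org"),
    ("_:d1", "foaf:knows", "_:d2"),
    ("_:d2", "foaf:mbox", "mailto:someone-else@example.org"),
]


def is_var(term):
    return term.startswith("?")


def match(patterns, triples, binding=None):
    """Yield every variable binding that satisfies all triple patterns."""
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        new = dict(binding)
        for want, have in zip(first, triple):
            if is_var(want):
                # Bind the variable, or reject if it is already bound differently.
                if new.setdefault(want, have) != have:
                    break
            elif want != have:
                break
        else:
            yield from match(rest, triples, new)


def who_knows(name, triples):
    required = [
        ("?p1", "foaf:name", name),
        ("?p1", "foaf:mbox", "?p1mbox"),
        ("?p2", "foaf:knows", "?p3"),
        ("?p3", "foaf:mbox", "?p1mbox"),
        ("?p2", "foaf:name", "?p2name"),
        ("?p2", "foaf:mbox", "?p2mbox"),
    ]
    rows = set()  # a set gives the effect of DISTINCT
    for b in match(required, triples):
        # OPTIONAL: keep the row even when no depiction exists.
        pics = [o for s, p, o in triples
                if s == b["?p2"] and p == "foaf:depiction"] or [None]
        for pic in pics:
            rows.add((b["?p2name"], b["?p2mbox"], pic))
    return sorted(rows, key=lambda row: row[0])  # ORDER BY ?p2name


for row in who_knows("Anupam Joshi", TOY_TRIPLES):
    print(row)
```

Carol is excluded because the person she knows has a different mailbox, while Bob is kept without a picture because the depiction pattern is optional.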
Swoogle found 292 RDF data files that appear relevant to answering our query
Let’s save the dataset before we use it
And tag it so we and others can find it more easily.
Here we are using it to get an answer to “Who knows Anupam Joshi”
He has many friends!
This talk
• Motivation
• Semantic web 101
• Swoogle Semantic Web search engine
• Use cases and applications
• State of the Semantic Web
• Conclusions
Will it Scale? How?
Here’s a rough estimate of the data in RDF documents on the semantic web based on Swoogle’s crawling
System/date   Terms      Documents   Individuals   Triples     Bytes
Swoogle2      1.5x10^5   3.5x10^5    7x10^6        5x10^7      7x10^9
Swoogle3      2x10^5     7x10^5      1.5x10^7      7.5x10^7    1x10^10
2006          1x10^6     5x10^7      5x10^7        5x10^9      5x10^11
2008          5x10^6     5x10^9      5x10^9        5x10^11     5x10^13
We think Swoogle’s centralized approach can be made to work for the next few years if not longer.
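A few ratios make the estimate easier to read. The short calculation below uses only the figures from the table. The helper names are illustrative, and the ratios are back-of-the-envelope arithmetic on rounded estimates, not measurements.

```python
# Estimates from the table: (terms, documents, individuals, triples, bytes).
ESTIMATES = {
    "Swoogle2": (15 * 10**4, 35 * 10**4, 7 * 10**6, 5 * 10**7, 7 * 10**9),
    "Swoogle3": (2 * 10**5, 7 * 10**5, 15 * 10**6, 75 * 10**6, 10**10),
    "2006": (10**6, 5 * 10**7, 5 * 10**7, 5 * 10**9, 5 * 10**11),
    "2008": (5 * 10**6, 5 * 10**9, 5 * 10**9, 5 * 10**11, 5 * 10**13),
}


def ratios(row):
    """Average triples per document and bytes per triple for one table row."""
    terms, documents, individuals, triples, size = row
    return {
        "triples_per_document": triples / documents,
        "bytes_per_triple": size / triples,
    }


def growth(earlier, later):
    """Column-by-column growth factor between two rows of the table."""
    return [b / a for a, b in zip(ESTIMATES[earlier], ESTIMATES[later])]


for name, row in ESTIMATES.items():
    r = ratios(row)
    print(name, round(r["triples_per_document"]), round(r["bytes_per_triple"]))

print(growth("2006", "2008"))
```

Across all four rows the table works out to roughly 100 to 145 triples per document and 100 to 140 bytes per triple. The 2006 to 2008 projection multiplies documents, individuals, triples and bytes by 100, while terms grow only by a factor of 5.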
How much reasoning?
• SwoogleN (N<=3) does limited reasoning
– It’s expensive
– It’s not clear how much should be done
• More reasoning would benefit many use cases
– e.g., type hierarchy
• Recognizing specialized metadata
– e.g., that ontology A maps some terms from B to C
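To illustrate the type hierarchy case: a search for instances of foaf:Person misses nodes typed only with a subclass unless rdfs:subClassOf links are followed. The sketch below shows that one inference step over toy data. It is not a description of what Swoogle implements. The foaf:Person to foaf:Agent link comes from the FOAF vocabulary, while the uni:Student to foaf:Person link and the ex: instances are assumptions made for the example.

```python
# Direct rdfs:subClassOf links. The uni:Student entry is an assumption.
SUBCLASS_OF = {
    "uni:Student": ["foaf:Person"],
    "foaf:Person": ["foaf:Agent"],
}

# Asserted rdf:type of some hypothetical instances.
TYPES = {
    "ex:a": "uni:Student",
    "ex:b": "foaf:Person",
    "ex:c": "foaf:Document",
}


def superclasses(cls, subclass_of):
    """Return cls plus every class reachable through rdfs:subClassOf links."""
    found, stack = {cls}, [cls]
    while stack:
        for parent in subclass_of.get(stack.pop(), ()):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def instances_of(target, types, subclass_of):
    """Instances whose asserted type is target or one of its subclasses."""
    return sorted(
        node for node, cls in types.items()
        if target in superclasses(cls, subclass_of)
    )


print(instances_of("foaf:Person", TYPES, {}))           # asserted types only
print(instances_of("foaf:Person", TYPES, SUBCLASS_OF))  # with subclass reasoning
```

Even this small closure has to be computed over every class in every indexed ontology, which gives a sense of why the slide calls reasoning expensive at web scale.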
This talk
• Motivation
• Semantic web 101
• Swoogle Semantic Web search engine
• Use cases and applications
• State of the Semantic Web
• Conclusions
Conclusion
• The web will contain the world’s knowledge in forms accessible to people and computers
– We need better ways to discover, index, search and reason over SW knowledge
• SW search engines address different tasks than HTML search engines
– So they require different techniques and APIs
• Swoogle-like systems can help create consensus ontologies and foster best practices
– Swoogle is for Semantic Web 1.0
– Semantic Web 2.0 will make different demands
For more information
http://ebiquity.umbc.edu/
Annotated in OWL
backup