Tabulator: Exploring and Analyzing Linked Data on the Semantic Web
Tim Berners-Lee, Yuhsin Chen, Lydia Chilton, Dan Connolly,
Ruth Dhanaraj, James Hollenbach, Adam Lerer, David Sheets
Decentralized Information Group, Computer Science and Artificial Intelligence Laboratory,
Massachusetts Institute of Technology, Cambridge, MA, USA.
2008-10-01
Presentation by JongHeum Yeon, IDS Lab.
Copyright 2008 by CEBT
Contents
Introduction
Methodology
Scenarios
User Interface
Web vs. Semantic Web/Explore vs. Analyze
Exploring in outliner mode/Views of RDF data
Network access algorithms
From representation to RDF/What to dereference/Hash signs and redirects
Friend-of-a-friend conventions/Inference on the client
Related Work
Future Work
Evaluation
Conclusions
Introduction
The Tabulator
RDF browser
For users
– To access and interact with the entire web of RDF data
For developers
– To post their data in RDF, and to refine and promote RDF linking standards
For providers
– To see how their data interacts with the rest of the Semantic Web.
demonstrate and utilize
– the power of linked RDF data with a user-friendly Semantic Web browser
A challenge for Semantic Web browsers
domain-specific applications in a generic program
Motivation
HTML
An author could immediately see the results of the work
and add immediate value to the page by linking to other chosen resources
URI
Many projects such as Biopax are currently available online
but consist of a small number of huge archive files
– the emphasis on inference over fixed datasets
– lack of linked data is self-sustaining
with nothing to link to, there is no incentive to put one’s own data on the web
– lack of a straightforward generic data browser
that gives immediate feedback and gratification for putting linked data online
Methodology
The browser should be as easy as possible for a new user to pick up
and easy for developers to extend with their own ideas
Asynchronous Javascript and RDF (AJAR) platform
HTML DOM
HTTP
XML
RDF
SPARQL
Scenarios
The documents published by W3C and its organizational structure are on separate web pages. As a working group chair, Alice wants a list of documents which are in Last Call status, their editors, and the email addresses of the domain leaders responsible for the working groups for which the document is a deliverable. She sorts them by the date comments are due, and finds one she almost missed. She emails the chair of the working group asking for an extension of the deadline.
Bob has no idea what a line for $364 on his credit card statement is about. He looks at it on a calendar view, sees it was a Saturday, but still doesn’t know. He looks for photos he took at the same time, and displays them on the same calendar. Now he understands: he took the kids to an amusement park.
Scenarios
Charlie notes that Danielle’s blog is critical of his work, and wants to understand why that might be. Danielle has in her data file information pointing to papers she has written, projects she has worked on, and professional acquaintances. Charlie realizes they have an acquaintance in common who can probably help resolve the issue.
An existing relational database such as the W3C internal enterprise database, designed without semantic web export in mind, is exported into linked RDF. The Tabulator is found to be effective for browsing data in the database, both for the original users of the data, and also for new users.
User Interface
Web Browser
navigates along links between documents
Semantic Web Browser
navigates along relationships (predicates) in a web of concepts
have an awareness of the underlying web of documents (and query services)
Tabulator
Primary : the graph of logical information
Secondary : the web of documents
The user explores an abstract web of data
– conjunction of all the graphs of documents that have been read
Users can check the provenance or source of any piece of information
Each source is color-coded by status: not yet accessed (blue), already fetched (green), in progress (yellow), or failed (red)
User Interface
http://dig.csail.mit.edu/2005/ajar/release/tabulator/0.8/tab.html
http://dig.csail.mit.edu/2007/wiki/Projects.rdf#OpenLinkedDataProject
User Interface
Explore vs. Analyze
WWW is something we explore, following links on a hunch
Exploring the semantic web in this way involves
– moving from node to node
– finding more arcs
– reassessing our next move
Tabulator
operates in two interlocking modes
Exploration
Analysis
User Interface
Exploration mode
explore an RDF graph in a tree view
Analysis mode
select certain fields (arcs or predicates) to define a pattern
ask the Tabulator to find all examples of that pattern
match the query pattern to the RDF graph
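The pattern matching behind analysis mode can be sketched as follows. This is a minimal illustration, not the Tabulator's actual query engine: the tuple representation of triples, the `?`-prefixed variables, and the names `match_triple` and `find_matches` are all assumptions made for this example, and the sample data only loosely follows Alice's scenario.

```python
from typing import Dict, Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def is_var(term: str) -> bool:
    """Variables are written with a leading '?', as in SPARQL."""
    return term.startswith("?")


def match_triple(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` equals `triple`, or return None."""
    result = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            # Bind the variable, or check it against an earlier binding.
            if result.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:
            return None
    return result


def find_matches(patterns: List[Triple], graph: List[Triple],
                 binding: Optional[Binding] = None) -> Iterator[Binding]:
    """Yield every variable binding under which all patterns occur in the graph."""
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in graph:
        extended = match_triple(first, triple, binding)
        if extended is not None:
            yield from find_matches(rest, graph, extended)


# Hypothetical data: which Last Call documents have which editors?
graph = [
    ("doc1", "status", "LastCall"), ("doc1", "editor", "alice"),
    ("doc2", "status", "Draft"), ("doc2", "editor", "bob"),
    ("doc3", "status", "LastCall"), ("doc3", "editor", "carol"),
]
query = [("?doc", "status", "LastCall"), ("?doc", "editor", "?who")]
rows = [(b["?doc"], b["?who"]) for b in find_matches(query, graph)]
```

Each binding corresponds to one row of the resulting table; the selected fields become the columns.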
Analysis mode
User Interface
Outliner mode
People are typically very comfortable in a tree-like environment
Much data in the world has been organized into trees
The web is largely composed of overlapping trees with local roots all over the place
Views of RDF data
Map View
Views of RDF data
Calendar and Timeline View
Views of RDF data
SPARQL: Formulating Queries in Tabulator
Network access algorithms
The dereferencing of links in RDF has not been explored to a great extent to date
If an inference engine simply downloads data
whenever it comes across a new URI
in an unbounded open web
the process would continue without limit
In the case of a user's browser
there is direction
given by where the user chooses to navigate
or by a specific query
Network access algorithms: Limitations
Server has to anticipate what data will be useful
In the general case of an exploring user
Some data would be just too numerous
e.g. for a person, an entire track of every known date/location
Not all data is available in machine-readable form
Network access algorithms: Hash signs and redirects
URI containing a hash (#)
http://dig.csail.mit.edu/2005/ajar/ajaw/data#Tabulator
http://dig.csail.mit.edu/2007/wiki/tabulator#project
Information on many things in the same file
URIs without a #, when the URI denotes an arbitrary thing
Server responds: HTTP 303
Forwarding the client to a document that contains information about the thing
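The two conventions can be sketched as one lookup that maps the URI of a thing to the document describing it. This is an illustrative sketch only: `document_uri` and the `see_other` table are hypothetical names, and the table stands in for the 303 responses a real server would send over HTTP.

```python
from typing import Dict
from urllib.parse import urldefrag


def document_uri(thing_uri: str, see_other: Dict[str, str]) -> str:
    """Return the URI of the document to fetch for `thing_uri`.

    Hash URIs: the document is the part before the '#', so many things can
    be described in the same file.
    Other URIs: follow the server's 303 redirect, simulated here by the
    `see_other` lookup table (an assumption made for this example).
    """
    document, fragment = urldefrag(thing_uri)
    if fragment:
        return document
    return see_other.get(thing_uri, thing_uri)


# Hypothetical 303 table for a hashless URI that denotes a person.
redirects = {"http://example.org/id/alice": "http://example.org/doc/alice.rdf"}

tabulator_doc = document_uri(
    "http://dig.csail.mit.edu/2005/ajar/ajaw/data#Tabulator", redirects)
alice_doc = document_uri("http://example.org/id/alice", redirects)
```

With a hash URI the client never sends the fragment to the server, so no redirect is needed; the hashless form costs an extra round trip per thing.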
Network access algorithms: From representation to RDF
Conventions - should the whole set be a standard?
Content-type is definitive
text/plain: Guess, with a warning
application/rdf+xml: Parse as RDF/XML
text/rdf+n3: Parse as N3 (*not imp.)
text/html:
– link rel=meta in HTML
– GRDDL profile
– RDFa ??
*+xml*:
– GRDDL profile
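The table above amounts to a dispatch on the Content-Type header. The sketch below shows one way to express it; the function name `choose_parser` and the returned strategy labels are hypothetical, and the order of the checks is an assumption, since the slide does not specify how overlapping cases are resolved.

```python
def choose_parser(content_type: str) -> str:
    """Map a Content-Type header value to a parsing strategy label.

    The labels are placeholders for this example, not names of real parsers.
    """
    # Drop parameters such as "; charset=utf-8" and normalize case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type == "application/rdf+xml":
        return "rdfxml"
    if media_type == "text/rdf+n3":
        return "n3"
    if media_type == "text/html":
        # Look for link rel=meta, a GRDDL profile, or embedded RDFa.
        return "html-hints"
    if media_type == "text/plain":
        return "guess-with-warning"
    if media_type.endswith("+xml"):
        # Any other XML dialect: rely on a GRDDL profile.
        return "grddl"
    return "unsupported"
```

Checking `application/rdf+xml` before the generic `+xml` case matters, because the RDF/XML type would otherwise fall through to the GRDDL branch.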
Network access algorithms: What to dereference
If the user is browsing information about a subject x
Looking up involves
looking up the URI of x itself
looking up any y where the store includes the fact that { x rdfs:seeAlso y}.
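The lookup rule above can be sketched as a function over the triples already in the store. This is a minimal sketch assuming triples are held as plain tuples; `uris_to_look_up` is a hypothetical name, and the actual fetching is left out.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]

RDFS_SEE_ALSO = "http://www.w3.org/2000/01/rdf-schema#seeAlso"


def uris_to_look_up(x: str, store: List[Triple]) -> List[str]:
    """Return x itself, then every y with { x rdfs:seeAlso y } in the store."""
    targets = [x]
    for subject, predicate, obj in store:
        if subject == x and predicate == RDFS_SEE_ALSO and obj not in targets:
            targets.append(obj)
    return targets


# Hypothetical store contents.
store = [
    ("http://example.org/id/alice", RDFS_SEE_ALSO, "http://example.org/alice-foaf.rdf"),
    ("http://example.org/id/alice", "http://xmlns.com/foaf/0.1/name", "Alice"),
    ("http://example.org/id/bob", RDFS_SEE_ALSO, "http://example.org/bob-foaf.rdf"),
]
to_fetch = uris_to_look_up("http://example.org/id/alice", store)
```

Because fetching a document can add new rdfs:seeAlso facts about x, a client would presumably re-run the lookup after each load until no new URIs appear.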
Network access algorithms: Friend-of-a-friend conventions
In FOAF, the rdfs:seeAlso property on a person links to their FOAF file
The protocol is
to load the resource linked by rdfs:seeAlso
to merge ('smush') nodes with the same mailbox or mailbox hash
As these are inverse-functional properties,
the merging is dealt with by the inference layer
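Smushing on inverse-functional properties can be sketched with a small union-find structure. This is an illustration under stated assumptions, not the Tabulator's implementation: triples are plain tuples, property names are written as prefixed strings for brevity, and `smush` is a hypothetical helper.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

# Inverse-functional properties: equal values imply the same subject.
INVERSE_FUNCTIONAL = {"foaf:mbox", "foaf:mbox_sha1sum"}


def smush(triples: List[Triple]) -> List[Triple]:
    """Merge nodes that share a value for an inverse-functional property."""
    parent: Dict[str, str] = {}

    def find(node: str) -> str:
        # Union-find lookup with path halving.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    first_owner: Dict[Tuple[str, str], str] = {}
    for subject, predicate, obj in triples:
        if predicate in INVERSE_FUNCTIONAL:
            key = (predicate, obj)
            if key in first_owner:
                # Same mailbox seen before: both subjects are one node.
                parent[find(subject)] = find(first_owner[key])
            else:
                first_owner[key] = subject

    # Rewrite every triple onto the representative nodes, dropping duplicates.
    return sorted({(find(s), p, find(o)) for s, p, o in triples})


# Hypothetical data: two blank nodes describing the same person.
merged = smush([
    ("_:a", "foaf:mbox", "mailto:alice@example.org"),
    ("_:a", "foaf:name", "Alice"),
    ("_:b", "foaf:mbox", "mailto:alice@example.org"),
    ("_:b", "foaf:knows", "_:c"),
    ("_:c", "foaf:name", "Carol"),
])
```

After merging, the foaf:knows arc that was stated about `_:b` appears on `_:a`, which is how facts from separately loaded FOAF files end up on one node.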
Network access algorithms: Inference on the client
The limited inference performed
<owl:sameAs> smushing
– Merging nodes <owl:sameAs> each other
– Merging of equal nodes
Merging nodes
– With identical functional or inverse functional properties
– Essential for the FOAF convention
rdfs:subPropertyOf
– used for finding subproperties of rdfs:label for the user interface
URI canonicalization
– Certain URIs have only syntactic differences and are always equivalent
Hashed property entailment
– { ?p :Sha1Property ?q. ?x ?p ?y. ?y crypto:sha1 ?z} => {?x ?q ?z}
– This is needed for matching foaf:mbox to foaf:mbox_sha1sum.
OWL-DL entailment is not supported.
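The hashed property entailment rule can be sketched as a single forward-chaining pass. This is a hedged illustration: the `SHA1_PROPERTY` table stands in for the `:Sha1Property` statements, prefixed names are plain strings, and `hashed_property_entailment` is a hypothetical name. The hash is taken over the full mailto: URI, following the FOAF definition of foaf:mbox_sha1sum.

```python
import hashlib
from typing import Set, Tuple

Triple = Tuple[str, str, str]

# { ?p :Sha1Property ?q } pairs, written as a lookup table for this example.
SHA1_PROPERTY = {"foaf:mbox": "foaf:mbox_sha1sum"}


def hashed_property_entailment(triples: Set[Triple]) -> Set[Triple]:
    """Apply { ?p :Sha1Property ?q. ?x ?p ?y. ?y crypto:sha1 ?z } => { ?x ?q ?z }."""
    derived = set(triples)
    for x, p, y in triples:
        q = SHA1_PROPERTY.get(p)
        if q is not None:
            z = hashlib.sha1(y.encode("utf-8")).hexdigest()
            derived.add((x, q, z))
    return derived


# Hypothetical data: one file gives the mailbox, another only its hash.
mbox = "mailto:alice@example.org"
published_hash = hashlib.sha1(mbox.encode("utf-8")).hexdigest()
facts = {
    ("_:a", "foaf:mbox", mbox),
    ("_:b", "foaf:mbox_sha1sum", published_hash),
}
closure = hashed_property_entailment(facts)
```

Once `_:a` carries a derived foaf:mbox_sha1sum equal to the one on `_:b`, the two nodes share an inverse-functional property value and can be merged by the smushing step.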
Future Work
Extension of query-by-example to intuitive rule-building systems
Experiments with a variety of remote stores & query engines
Provide a more graphical user interface using Scalable Vector Graphics (SVG) or 3D
Extensions of the Fresnel language to cover input form views are one possibility
The user may want to add to the existing data
The client could save back a modified file, or share every change over the network with a server and with other clients viewing or editing the same information
Evaluation
W3C staff members tested and evaluated the Tabulator
No way to undo or go back to a certain previous state
Lack of a cognitive map
A need for spreadsheet-like navigation
Conclusions
Domain-specific applications
will always be important
will always do better at specific tasks than the general one
Interoperability between a generic client and an application-specific one is crucial
In order to enable generic browsing
users should be encouraged to leave sufficiently powerful user interface tips in ontologies
so that a generic application can acquire the ability
– to provide an effective and useful interface to data from previously unknown domains
Current Works in 2007
Tabulator Redux: Writing Into the Semantic Web, unpublished, 2007
Object Selection
Predicate Selection
Editing in Table Mode
Network Protocol for Writing
Future Work in 2007
Browser integration
Updating Information
Collaboration
Predicates
Social Policy
UI/Usability
Longer term developments
Interesting Points
Written by Tim Berners-Lee
Scenarios in the Semantic Web
Gives examples of practical use cases of the Semantic Web
Provides a great vision of Semantic Web browsers