
Tabulator: Exploring and Analyzing linked data on the Semantic Web

Tim Berners-Lee, Yuhsin Chen, Lydia Chilton, Dan Connolly,

Ruth Dhanaraj, James Hollenbach, Adam Lerer, David Sheets

Decentralized Information Group, Computer Science and Artificial Intelligence Laboratory,

Massachusetts Institute of Technology, Cambridge, MA, USA.

2008-10-01

Presentation by JongHeum Yeon, IDS Lab.

Copyright 2008 by CEBT

Contents

Introduction

Methodology

Scenarios

User Interface

Web vs. Semantic Web/Explore vs. Analyze

Exploring in outliner mode/Views of RDF data

Network access algorithms

From representation to RDF/What to dereference/Hash signs and redirects

Friend-of-a-friend conventions/Inference on the client

Related Work

Future Work

Evaluation

Conclusions



Introduction

The Tabulator

RDF browser

For users

– To access and interact with the entire web of RDF data

For developers

– To post their data in RDF, refine and promote RDF linking standards

For providers

– To see how their data interacts with the rest of the Semantic Web.

demonstrate and utilize

– the power of linked RDF data with a user-friendly Semantic Web browser

A challenge for Semantic Web browsers

domain-specific applications in a generic program



Motivation

HTML

could immediately see the results of the work

add immediate value to his page by linking to other chosen resources

URI

Many projects such as BioPAX are currently available online

but consist of a small number of huge archive files

– the emphasis on inference over fixed datasets

– lack of linked data is self-sustaining

without things to link <-> no incentive for putting one’s own data on the web

– lack of a straightforward generic data browser

which would give immediate feedback and gratification for putting linked data online



Methodology

The browser should be as easy as possible for a new user to pick up

and easy for developers to extend with their own ideas

Asynchronous Javascript and RDF (AJAR) platform

HTML DOM

HTTP

XML

RDF

SPARQL



Scenarios

The documents published by W3C and its organizational structure are on separate web pages. As a working group chair, Alice wants a list of documents which are in Last Call status, their editors, and the email address of the domain leaders responsible for the working groups for which the document is a deliverable. She sorts them by the date comments are due, and finds one she almost missed. She emails the chair of the working group asking for an extension of the deadline.

Bob has no idea what a line for $364 on his credit card statement is about. He looks at it on a calendar view, sees it was a Saturday, but still doesn’t know. He looks for photos he took at the same time, and displays them on the same calendar. Now he understands: he took the kids to an amusement park.



Scenarios

Charlie notes that Danielle’s blog is critical of his work, and wants to understand why that might be. Danielle has in her data file information pointing to papers she has written, projects she has worked on, and professional acquaintances. Charlie realizes they have an acquaintance in common who can probably help resolve the issue.

An existing relational database such as the W3C internal enterprise database, designed without semantic web export in mind, is exported into linked RDF. The Tabulator is found to be effective for browsing data in the database, both for the original users of the data, and also for new users.



User Interface

Web Browser

navigates along links between documents

Semantic Web Browser

navigates along relationships (predicates) in a web of concepts

have an awareness of the underlying web of documents (and query services)

Tabulator

Primary : the graph of logical information

Secondary : the web of documents

The user explores an abstract web of data

– conjunction of all the graphs of documents that have been read

Users check the provenance or source of any piece of information

data is not yet accessed (blue), already fetched (green), in progress (yellow), or failed (red)
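
The provenance and fetch-status ideas above can be sketched roughly as follows. This is a minimal illustration, not the Tabulator's actual store: `FetchState`, `ProvenanceStore`, and the example.org URIs are hypothetical names, and triples are assumed to be plain tuples of strings.

```python
from collections import defaultdict
from enum import Enum


class FetchState(Enum):
    """Document states, using the colors named on the slide."""
    UNFETCHED = "blue"
    FETCHED = "green"
    IN_PROGRESS = "yellow"
    FAILED = "red"


class ProvenanceStore:
    """Hypothetical store: the conjunction of all graphs read so far."""

    def __init__(self):
        self.sources = defaultdict(set)  # triple -> document URIs that state it
        self.state = {}                  # document URI -> FetchState

    def add_document(self, doc_uri, triples):
        # Merge a document's triples into the store and remember their source.
        for triple in triples:
            self.sources[triple].add(doc_uri)
        self.state[doc_uri] = FetchState.FETCHED

    def mark(self, doc_uri, state):
        self.state[doc_uri] = state

    def status(self, doc_uri):
        # Anything never requested counts as not yet accessed.
        return self.state.get(doc_uri, FetchState.UNFETCHED)

    def provenance(self, triple):
        # Every document in which this statement was found.
        return sorted(self.sources.get(triple, ()))


store = ProvenanceStore()
fact = ("ex:alice", "foaf:knows", "ex:bob")
store.add_document("http://example.org/a.rdf", [fact])
store.add_document("http://example.org/b.rdf", [fact])
store.mark("http://example.org/c.rdf", FetchState.FAILED)
print(store.provenance(fact))
print(store.status("http://example.org/c.rdf").value)
```

Keeping the source document next to each triple is what lets a user ask where any piece of merged information came from.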



User Interface

http://dig.csail.mit.edu/2005/ajar/release/tabulator/0.8/tab.html

http://dig.csail.mit.edu/2007/wiki/Projects.rdf#OpenLinkedDataProject







User Interface

Explore vs. Analyze

WWW is something we explore, following links on a hunch

Exploring the semantic web in this way involves

– moving from node to node

– finding more arcs

– reassessing our next move

Tabulator

operates in two interlocking modes

Exploration

Analysis



User Interface

Exploration mode

explore an RDF graph in a tree view

Analysis mode

select certain fields (arcs or predicates) to define a pattern

ask the Tabulator to find all examples of that pattern

match the query pattern to the RDF graph
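
Matching a user-defined pattern against the graph can be sketched as below. This is a simplified illustration under stated assumptions, not the Tabulator's query engine: triples are tuples of strings, variables are strings starting with `?`, and `match_pattern` and the sample data are hypothetical.

```python
def match_pattern(graph, pattern, binding=None):
    """Yield each variable binding under which every pattern triple is in graph."""
    binding = binding or {}
    if not pattern:
        yield dict(binding)
        return
    first, rest = pattern[0], pattern[1:]
    for triple in graph:
        candidate = dict(binding)
        matched = True
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # A variable must keep the value it was first bound to.
                if candidate.setdefault(term, value) != value:
                    matched = False
                    break
            elif term != value:
                matched = False
                break
        if matched:
            yield from match_pattern(graph, rest, candidate)


graph = [
    ("doc1", "status", "LastCall"), ("doc1", "editor", "Alice"),
    ("doc2", "status", "Draft"), ("doc2", "editor", "Bob"),
    ("doc3", "status", "LastCall"), ("doc3", "editor", "Carol"),
]
# The fields the user selected define the pattern; each match becomes a table row.
pattern = [("?doc", "status", "LastCall"), ("?doc", "editor", "?who")]
for row in match_pattern(graph, pattern):
    print(row["?doc"], row["?who"])
```

Each resulting binding corresponds to one row of the table that analysis mode would show.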



Analysis mode



User Interface

Outliner mode

People are typically very comfortable in a tree-like environment

Much data in the world has been organized into trees

The web is largely composed of overlapping trees with local roots all over the place



Views of RDF data

Map View



Views of RDF data

Calendar and Timeline View



Views of RDF data

SPARQL: Formulating Queries in Tabulator



Network access algorithms

The dereferencing of links in RDF has not been explored to a great extent to date

If an inference engine simply downloads data

whenever it comes across a new URI

in an unbounded open web

the downloading would continue without limit

In the case of a user browser

there is direction

in the direction the user chooses to navigate

in a specific query
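
The contrast with an unbounded crawl can be sketched as a loader that only fetches when the user opens a node. This is an illustrative assumption about the design, not the Tabulator's code: `OnDemandLoader`, the `fetch` callable, and the in-memory `fake_web` are hypothetical stand-ins for real HTTP access.

```python
class OnDemandLoader:
    """Fetch a document only when the user navigates to it, and only once."""

    def __init__(self, fetch):
        self.fetch = fetch    # hypothetical callable: document URI -> list of triples
        self.loaded = set()   # documents already requested
        self.triples = []     # everything read so far

    def expand(self, uri):
        # Called when the user opens a node; returns True if a fetch happened.
        if uri in self.loaded:
            return False
        self.loaded.add(uri)
        self.triples.extend(self.fetch(uri))
        return True


# In-memory stand-in for the web, so the sketch runs without network access.
fake_web = {
    "http://example.org/a": [("ex:a", "ex:linksTo", "http://example.org/b")],
    "http://example.org/b": [("ex:b", "ex:linksTo", "http://example.org/c")],
}
loader = OnDemandLoader(lambda uri: fake_web.get(uri, []))
loader.expand("http://example.org/a")
# The link to b is now known, but b is not fetched until the user asks for it.
print(sorted(loader.loaded))
```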



Network access algorithms: Limitations

Server has to anticipate what data will be useful

In the general case of an exploring user

Some data would be just too numerous

e.g. for a person, an entire track of every known date/location

Not all data is available in machine-readable form



Network access algorithms: Hash signs and redirects

URI containing a hash (#)

http://dig.csail.mit.edu/2005/ajar/ajaw/data#Tabulator

http://dig.csail.mit.edu/2007/wiki/tabulator#project

Information on many things in the same file

URIs without a # if the URI denotes an arbitrary thing

Server responds: HTTP 303

Forwarding the client to a document that contains information
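
The two conventions above can be sketched as follows. This is a simplified model rather than the Tabulator's fetcher: `document_uri` and `resolve` are hypothetical helpers, and the `redirects` dictionary stands in for a server answering HTTP 303 so that the example runs offline.

```python
from urllib.parse import urldefrag


def document_uri(thing_uri):
    """Strip the fragment: a hash URI is described by the document before the #."""
    doc, _fragment = urldefrag(thing_uri)
    return doc


def resolve(thing_uri, redirects):
    """Return the document to read for a thing.

    `redirects` is a hypothetical table simulating servers that reply
    303 See Other for hashless URIs denoting arbitrary things.
    """
    doc = document_uri(thing_uri)
    if doc != thing_uri:
        return doc                       # hash URI: read the enclosing document
    return redirects.get(thing_uri, thing_uri)


print(document_uri("http://dig.csail.mit.edu/2007/wiki/tabulator#project"))
print(resolve("http://example.org/id/alice",
              {"http://example.org/id/alice": "http://example.org/doc/alice"}))
```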



Network access algorithms: From representation to RDF

Conventions - should the whole set be a standard?

Content-type is definitive

text/plain: Guess with a warning

application/rdf+xml: Parse as RDF/XML

text/rdf+n3: Parse as N3 (*not imp.)

text/html

– link rel=meta in HTML

– GRDDL profile

– RDFa ??

*+xml*:

– GRDDL profile
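
The content-type conventions listed above amount to a dispatch on the media type. The sketch below is a hedged illustration of that dispatch only: `choose_strategy` and the strings it returns are hypothetical labels, and no actual parsing is performed.

```python
def choose_strategy(content_type):
    """Map an HTTP Content-Type header to the handling listed on the slide."""
    # Drop parameters such as "; charset=utf-8" and normalize case.
    media = content_type.split(";", 1)[0].strip().lower()
    if media == "application/rdf+xml":
        return "parse as RDF/XML"
    if media == "text/rdf+n3":
        return "parse as N3"  # marked "(*not imp.)" on the slide
    if media == "text/html":
        return "look for link rel=meta, GRDDL profile, RDFa"
    if media.endswith("+xml"):
        return "look for GRDDL profile"
    if media == "text/plain":
        return "guess, with a warning"
    return "no convention listed"


print(choose_strategy("application/rdf+xml; charset=utf-8"))
print(choose_strategy("text/plain"))
```

The exact RDF/XML type is tested before the generic `+xml` suffix so that it is not swallowed by the broader rule.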



Network access algorithms: From representation to RDF



Network access algorithms: What to dereference

If user is browsing information about a subject x

Looking up involves

looking up the URI of x itself

looking up any y where the store includes the fact that { x rdfs:seeAlso y}.
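
The lookup rule above can be written down directly. This is a minimal sketch assuming the store is a list of string triples; `uris_to_look_up` and the example.org data are hypothetical.

```python
RDFS_SEE_ALSO = "http://www.w3.org/2000/01/rdf-schema#seeAlso"


def uris_to_look_up(x, store):
    """Return x itself plus every y such that the store holds { x rdfs:seeAlso y }."""
    targets = [x]
    for subject, predicate, obj in store:
        if subject == x and predicate == RDFS_SEE_ALSO and obj not in targets:
            targets.append(obj)
    return targets


store = [
    ("http://example.org/id/alice", RDFS_SEE_ALSO, "http://example.org/alice/foaf.rdf"),
    ("http://example.org/id/bob", RDFS_SEE_ALSO, "http://example.org/bob/foaf.rdf"),
]
print(uris_to_look_up("http://example.org/id/alice", store))
```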



Network access algorithms: Friend-of-a-friend conventions

In FOAF, rdfs:seeAlso gives a link to a person's FOAF file

The protocol is

to load the resource linked by rdfs:seeAlso

to merge ('smush') nodes with the same mailbox or mailbox hash

As these are inverse-functional properties

dealt with by the inference layer
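
Smushing on an inverse-functional property can be sketched with a small union-find. This is one possible way to do it, not necessarily how the Tabulator's inference layer works: `smush`, the prefixed names, and the blank-node labels are hypothetical.

```python
def smush(triples, inverse_functional):
    """Merge subjects that share a value for an inverse-functional property."""
    parent = {}

    def find(node):
        # Union-find lookup with path halving; unknown nodes map to themselves.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    first_owner = {}  # (property, value) -> first subject seen with it
    for s, p, o in triples:
        if p in inverse_functional:
            key = (p, o)
            if key in first_owner:
                parent[find(s)] = find(first_owner[key])  # same mailbox: same node
            else:
                first_owner[key] = s
    # Rewrite every triple in terms of the representative nodes.
    return {(find(s), p, find(o)) for s, p, o in triples}


triples = [
    ("_:a", "foaf:mbox", "mailto:alice@example.org"),
    ("_:a", "foaf:name", "Alice"),
    ("_:b", "foaf:mbox", "mailto:alice@example.org"),
    ("_:b", "foaf:knows", "_:c"),
]
for triple in sorted(smush(triples, {"foaf:mbox"})):
    print(triple)
```

After merging, facts stated about `_:b` in one file appear on the same node as facts stated about `_:a` in another.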



Network access algorithms: Inference on the client

The limited inference performed

<owl:sameAs> smushing

– Merging nodes <owl:sameAs> each other

– Merging of equal nodes

Merging nodes

– With identical functional or inverse functional properties

– Essential for the FOAF convention

rdfs:subPropertyOf

– used for finding subproperties of rdfs:label for the user interface

URI canonicalization

– Certain URIs have only syntactic differences and are always equivalent

Hashed property entailment

– { ?p :Sha1Property ?q. ?x ?p ?y. ?y crypto:sha1 ?z} => {?x ?q ?z}

– This is needed for matching foaf:mbox to foaf:mbox_sha1sum.

OWL-DL entailment is not supported.
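
The hashed property entailment rule above can be illustrated as follows. This is a sketch under assumptions: `sha1_hex` and `entail_hashed` are hypothetical helpers, the `sha1_properties` mapping plays the role of the `:Sha1Property` statements, and the hash is taken over the full mailbox URI string, including `mailto:`, as foaf:mbox_sha1sum is defined.

```python
import hashlib


def sha1_hex(value):
    """Hex SHA-1 digest of a string, standing in for crypto:sha1."""
    return hashlib.sha1(value.encode("utf-8")).hexdigest()


def entail_hashed(triples, sha1_properties):
    """For each { p :Sha1Property q } and { x p y }, derive { x q sha1(y) }."""
    derived = set()
    for x, p, y in triples:
        q = sha1_properties.get(p)
        if q is not None:
            derived.add((x, q, sha1_hex(y)))
    return derived


triples = [("_:a", "foaf:mbox", "mailto:alice@example.org")]
rules = {"foaf:mbox": "foaf:mbox_sha1sum"}
for triple in entail_hashed(triples, rules):
    print(triple)
```

With the derived triple in place, a node known only by its mailbox hash can be smushed with a node known by its plain mailbox.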



Future Work

Extension of query-by-example to intuitive rule-building systems

Experiments with a variety of remote stores & query engines

Provide a more graphical user interface using Scalable Vector Graphics (SVG) or 3D

Extensions of the Fresnel language to cover input form views are one possibility

user may want to add from the existing data

the client could save back a modified file, or share every change over the network with a server and with other clients viewing or editing the same information



Evaluation

W3C staff members test and evaluate

No way to undo or go back to a certain previous state

Lack of cognitive map

Requirement of spreadsheet like navigation



Conclusions

Domain-specific applications

will always be important

will always do better at specific tasks than the general one

Interoperability between a generic client and an application-specific one is crucial

In order to enable generic browsing

users should be encouraged to leave sufficiently powerful user interface tips in ontologies

generic application can acquire the ability

– to provide an effective and useful interface to data from previously unknown domains



Current Work in 2007

Tabulator Redux: Writing Into the Semantic Web, unpublished, 2007

Object Selection

Predicate Selection

Editing in Table Mode

Network Protocol for Writing



Future Work in 2007

Browser integration

Updating Information

Collaboration

Predicates

Social Policy

UI/Usability

Longer term developments



Interesting Points

Written by Tim Berners-Lee

Scenarios in semantic web

Gives examples of practical use cases of the Semantic Web

Provide a great vision of semantic web browsers
