
The Semantic Web: From Representation to Realization

Kristinn R. Thorisson1,2, Nova Spivack2, and James M. Wissner2

1 Center for Analysis & Design of Intelligent Agents and School of Computer Science, Reykjavik University
Menntavegur 1, 101 Reykjavik, Iceland

2 Radar Networks, Inc.
410 Townsend Street, Suite 150, San Francisco, CA 94107

[email protected], {nova,jim}@radarnetworks.com

Abstract. A semantically-linked web of electronic information – the Semantic Web – promises numerous benefits including increased precision in automated information sorting, searching, organizing and summarizing. Realizing this requires significantly more reliable meta-information than is readily available today. It also requires a better way to represent information that supports unified management of diverse data and diverse manipulation methods: from basic keywords to various types of artificial intelligence, to the highest level of intelligent manipulation – the human mind. How this is best done is far from obvious. Relying solely on hand-crafted annotation and ontologies, or solely on artificial intelligence techniques, seems less likely to succeed than a combination of the two. In this paper we describe an integrated, complete solution to these challenges that has already been implemented and tested with hundreds of thousands of users. It is based on an ontological representational level we call SemCards that combines ontological rigour with flexible user interface constructs. SemCards are machine- and human-readable digital entities that allow non-experts to create and use semantic content, while empowering machines to better assist and participate in the process. SemCards enable users to easily create semantically-grounded data that in turn acts as examples for automation processes, creating a positive iterative feedback loop of metadata creation and refinement between user and machine. They provide a holistic solution to the Semantic Web, supporting powerful management of the full lifecycle of data, including its creation, retrieval, classification, sorting and sharing. We have implemented the SemCard technology on the semantic Web site Twine.com, showing that the technology is indeed versatile and scalable. Here we present the key ideas behind SemCards and describe the initial implementation of the technology.

Keywords: Semantic Web, Ontologies, Knowledge Management, User Interface, SemCards, Human-Machine Collaboration, Twine.com, Metadata.

N.T. Nguyen and R. Kowalczyk (Eds.): Transactions on CCI II, LNCS 6450, pp. 90–107, 2010. © Springer-Verlag Berlin Heidelberg 2010


1 Introduction

Intelligent automated retrieval, manipulation and presentation of information defines the speed of progress in much of today’s high-technology work. In a world where information is at the center, any improvement is welcomed that can help automate even more of the massive amounts of data manipulation necessary. In many people’s vision of the Semantic Web machines take center stage, based on a deeper knowledge of the data they manipulate than currently possible. To do so calls for metadata – data about the data. Making machines smarter at tasks such as retrieving relevant information at relevant times automatically from the vast collection, even on today’s average laptop hard drive, requires much more meta-information than is available at present for this data.

Accurate metadata can only be derived from an understanding of content; classifying photographs according to what they depict, for example, is best done by a recognition of the entities in them, lighting conditions, weather, film stock, lens type used, etc. Authoring metadata for images by hand, to continue with this example, will be an impossible undertaking, even if we limited the metadata to surface phenomena such as the basic objects included in the picture, as the number of photographs generated and shared by people is increasing exponentially. Power tools designed for manual metadata creation would only improve the situation incrementally, not exponentially, as needed.

Although text analysis has come quite a long way and is much further advanced than image analysis, artificial intelligence techniques for analyzing text and images have a long way to go to reliably decipher the complex content of such data. The falling price of computing power could help in this respect, as image analysis is resource-intensive. This will not be sufficient, however, as general-purpose image analysis (read: software with “common sense”) is needed to analyze and classify the full range of images produced by people based on content. On the one hand, achieving the full potential of a semantic web, leaving metadata creation to current AI technologies, will not be possible as these technologies are simply not powerful enough. This state of affairs may very possibly extend well beyond the next decade. On the other hand, because the amount of data available online is growing exponentially, and can be expected to continue to do so, manual metadata entry will never catch up to the extent necessary for significant effect. Creating the full set of ontologies by hand required for adequate machine manipulation would be a Herculean effort; waiting for the adequate machine intelligence could delay the Semantic Web for decades.

Does this mean the semantic web is unrealizable until machines become significantly smarter? Not necessarily. While we believe that neither hand-crafted ontologies nor current (or next wave) artificial intelligence techniques alone can achieve a giant leap towards the Semantic Web, a clever combination of the two could potentially achieve more than a notable improvement. The idea is that if online manual labor could somehow be augmented in such a way that it supported automatic classification, making up for its weak points, this could help move the total amount of semantically-tagged data closer to the 100% mark and help automatic processes get over the well-known “90% accuracy brick wall”.


For us the question about how to achieve the vision of the Semantic Web has been, What kind of collaborative framework will best address the building of the Semantic Web? Most tools and methodologies designed for automating data handling are not suitable for human usage – the underlying data representations are designed for machines in ways that are not meant for human consumption. Data formats designed exclusively for human usage, such as HTML, are not suitable for machine manipulation – the data is unstructured, the process is slow, error prone and ultimately, to make it work, calls for massive amounts of machine intelligence that are well beyond today’s reach.

This line of reasoning has resulted in our two-prong approach to the creation of the Semantic Web: First, we develop a system for helping people take a more structured approach to their data creation, management and manipulation and second, we develop automatic analysis mechanisms that use the human-provided structured data and framework to expand the semantic classification beyond what is possible to do by hand. We have already achieved significant progress on the first part of this approach; the second part is also well under way. Our method facilitates an iterative interaction loop between the user’s information input, the automated extension of this work and subsequent monitoring of feedback on the extensions from the user.

Semantic Cards, or SemCards, is what we call the underlying representation of our approach. It is a technology that combines ontology creation, management/usage with the user interface in a way that supports simultaneously (a) human metadata creation, manipulation and consumption, (b) expert-user creation and maintenance of ontologies, and (c) automation services that are augmented by human-created meaningful examples of metadata and semantic relationship links, which greatly enhances their functionality and accuracy. SemCards provide an intermediate ontological representational level that allows end-users to create rich semantic networks for their information sphere.

One of the big problems with automation is low quality of results. While statistics may work reasonably in some cases as a solution to this, for any single individual the “average user” is all too often too different on too many dimensions for such an approach to be useful. The SemCard intermediate layer encourages users to create metadata and semantic links, which provides underlying automation with highly specific, user-motivated examples. The net effect is an increase in the possible collaboration between the user and the machine. Semi-intelligent processes can be usefully employed without requiring significant or immediate leaps in AI research.

From the users’ perspective what we have developed is a network portal where they can organize their own information for personal use, publish any of that information to any group – be it “emails” addressed to a single individual or photo albums shared with the world – and manage the information shared with them from others, whether it is documents, books, music, etc. Under the hood are powerful ontology-driven technologies for organizing all categories of data, including access management, relational (semantic) links and display policies, in a way that is relatively transparent to the user. The result is a system that offers improved automation and control over access management, information organization and display features.

Here we describe the ideas behind our approach and give a short overview of a use-case on the semantic Web site Twine.com. The paper is organized as follows: First we review related work, then we describe the technology underlying SemCards and explain how they are used. We then describe our Web portal Twine.com, where we have implemented various user interfaces for enabling the use of SemCards in a number of ways, including making semantically rich Web bookmarks, notes, blogs and semantically-annotated uploads.

2 Related Work

The full vision of the Semantic Web will require significant amounts of metadata, some of which describes entities themselves, and some of which describes relationships between entities. Two camps can be seen proposing rather different approaches to this problem. One extreme claims that manual creation of metadata will never work as it is not only slow and error-prone, the level to which it would have to be done would go well beyond the patience of any average user – quite possibly all. To this camp the only real option is automation. The other camp points out that automation is even more error-prone than manual creation, as current efforts at automatic semantic annotation on massive scales produce only moderate results of between 80% and 90% correct, at the very best [1]. They claim that the remaining 10% will always be beyond reach because it requires significant amounts of human-level intelligence to be done correctly. Further, as argued by Etzioni and Gribble [2], metadata augmentation has quite possibly not been done by the general user population because they have seen no benefits in doing so. Lastly, this camp points to the massive amounts of tagging and data entry done on sites such as Wikipedia, Myspace and Facebook as proof that end-users are quite willing to provide (some amount of) metadata. Giving them the right tools might change this. Applications that connect casual end-users with ontologically-driven content and processes are, nevertheless, virtually non-existent.

Many efforts have focused on building digital content management with a focus on the object. Of these, our technology bears perhaps the greatest resemblance to the Buckets of Maly et al. [3] which are “self-contained, intelligent, and aggregative ... objects that are capable of enforcing their own terms and conditions, negotiating access, and displaying their contents”. Like SemCards, Buckets are fairly self-contained, with specifications for how they should be displayed. Buckets grew out of Kahn and Wilensky’s [4] proposed infrastructure for digital information services. Key to their proposal was the notion of digital object, composed of essentially the two familiar parts, data and metadata. The subsequent work on FEDORA [5] saw the creation of an open-source software framework for the “storage, management, and dissemination of complex objects and the relationships among them” [6]. Buckets represent a focus on storing content in digital libraries, most likely manipulated by experts. In contrast, SemCards aim at enabling casual end-users to create metadata. Buckets are targeted to machine manipulation; SemCards are aimed at machine manipulation as well, but more importantly at supporting automatically generated meta-information. SemCards also differ from Buckets in that they are especially designed to be sharable between multiple users over mixed-architecture networks.

The Haystack [7] and Chandler1 projects were efforts to create new user interfaces for viewing and working with semantic objects. While this work was important – and in many ways still is – it also shows how difficult it is to lead such efforts to ultimate fruition while addressing all the key issues that must be solved. Our work on PersonalRadar followed a similar path2, albeit always with the ultimate objective of solving the hard problems related to deployment over a WAN.

The separate representation layer provided by SemCards is a key difference between prior efforts and ours. They enable ontologically-driven constructs to be collaboratively built by ontology specialists, algorithms and end-users, encouraging them to provide examples to improve the automation. Because of this, SemCards are tolerant to end-user mistakes; the casual Internet user cannot be expected to invest a lot of time in understanding the intricacies of the kinds of advanced ontologies required. Separating the two makes the automation systems more robust to manual input errors.

Other important differences between our approach and prior work are an integrated ability to share data between individuals and groups of users over a network, with complex policy control over access and sharing, and the flexible use of SemCards to represent metadata for real-world objects and hypothetical constructs – as “library index cards for digital content, physical things and abstract ideas”.

Although current enterprise portals are capable of organizing group or team information, they are often inaccessible to the public or to individuals, and they are expensive as they are highly monolithic. Even less utilitarian and intelligent with respect to organizing information are the popular online search engines which are designed for largely unstructured data. Furthermore, these typically organize information and data by relevance to keywords. We have built a network portal, Twine.com, for deploying the SemCard technology. Twine.com provides a test of the strength of our semantic object framework when deployed over the Internet and working in an integrated, coordinated manner. Our work sets itself apart from prior work on the Semantic Web in that it has already been tested with a relatively large number of end users, with measurable results.

1 http://chandlerproject.org

2 PersonalRadar was a desktop application that we developed around the same time that Chandler became public, and in some ways it presented similar solutions to the semantic interface; the semantic search/filtering interface for PersonalRadar was, however, vastly superior to anything we have seen so far proposed for that purpose. Unfortunately the numerous excellent interface ideas developed for PersonalRadar are still not supported by Twine as it is virtually impossible to implement these methods over a standard network link.



3 SemCards: Semantic Objects for Collaborative Ontology-Driven Information Management

A single SemCard can be characterized as an intuitive user interface construct that bridges between a user and an underlying ontology that affords all the benefits of a Semantic Web such as automatic relationship discovery, sorting, data mining, semantic search, etc. Together many SemCards form semantic nets that are in every way the embodiment of what many have envisioned the Semantic Web to be. Instead of being complex, convoluted and non-intuitive as any machine-manipulatable ontology will appear to the uninitiated (cf. [8]), SemCards provide a powerful and intuitive interface to a unified framework for managing information.

As mentioned above, SemCards form an intermediate separation layer between ontologies and the user interface. By isolating the stochastic nature of end-user activity from underlying semantic networks built with ontological rigour, two important goals are achieved. First, end-users are encouraged to create metadata for their content, as the input methods are familiar and straightforward. SemCards shield the deep ontology from being affected by end-user activity. This not only helps stabilize the system, it also spares the automation processes from having to deal with the “ground shifting from underneath”. Second, the automation processes are provided with manually-created semantic nets, created directly and continuously by end-users, that serve as examples and can be used to improve the automatic metadata creation. The net result of this is a significant improvement in automation quality and speed, including automation of many tedious details of information management such as data sharing policy maintenance, indexing, sorting – in fact, of the full data management lifecycle.

3.1 Structure of a SemCard

In its simplest version a SemCard will appear to the user as a form with fields or slots. A SemCard has one template and one or more instances, corresponding roughly to the object-oriented programming concepts of object template/class, and object instance, respectively. Under the hood their slots are ontologically defined; however, the end user normally does not see this. To take an example, a SemCard for holding an e-mail message may look exactly like any interface to a regular e-mail program. However, the slots (“To:”, “From:”, etc.) reference an ontology that defines what kinds of data each slot can take, what type of information that is, etc. The e-mail SemCard, when created, will contain information about who authored which part of the content and when. Additionally, the author will not simply be a regular “From” but have a link to the SemCard representing the author of the e-mail SemCard.
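The template/instance split and the ontologically typed slots can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the Twine.com implementation: the class names `SlotDef`, `SemCardTemplate` and `SemCardInstance`, the `ex:` ontology terms and the example address are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Dict

@dataclass(frozen=True)
class SlotDef:
    label: str          # what the end user sees, e.g. "To:"
    ontology_term: str  # hypothetical ontology property behind the slot
    value_type: type    # the kind of data the slot accepts

@dataclass
class SemCardTemplate:
    name: str
    slots: Dict[str, SlotDef]

    def instantiate(self) -> "SemCardInstance":
        return SemCardInstance(template=self)

@dataclass
class SemCardInstance:
    template: SemCardTemplate
    values: Dict[str, Any] = field(default_factory=dict)

    def fill(self, slot: str, value: Any) -> None:
        slot_def = self.template.slots[slot]  # KeyError for unknown slots
        if not isinstance(value, slot_def.value_type):
            raise TypeError(f"{slot_def.label} expects {slot_def.value_type.__name__}")
        self.values[slot] = value

person = SemCardTemplate("Person", {"name": SlotDef("Name", "ex:name", str)})
email = SemCardTemplate("Email", {
    "to": SlotDef("To:", "ex:recipient", str),
    # The author is a link to another SemCard, not a plain string.
    "from": SlotDef("From:", "ex:author", SemCardInstance),
    "body": SlotDef("Message", "ex:content", str),
})

author = person.instantiate()
author.fill("name", "Kris")

message = email.instantiate()
message.fill("to", "jim@example.org")
message.fill("from", author)
```

The user only ever sees the labels; the type check in `fill` stands in for whatever validation the underlying ontology would perform.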

No executable code. An important feature of SemCards is that they are completely passive – they do not carry with them any executable code: We have entirely separated the services operating on the SemCards from the SemCards themselves, leaving only a specification for the desired operations (named processes) to be done on a SemCard in the SemCard itself. This has many benefits, the most important of which are simplicity in usage and ease of maintaining compatibility between systems that use SemCards.

Fig. 1. Metadata for entities, digital or physical, is semantically defined by an underlying ontology that appears to the user as (networks of) SemCards

Unique ID. Every SemCard instance has a globally unique identifier (GUID), timestamps representing time of creation and related temporal aspects such as times of modification, as well as a set of policies. Its author is also represented, as are any authors of modifications throughout the SemCard’s lifetime. The SemCard’s policies allow it to be displayed, shared, copied, etc. in prescribed ways, through the use of rules.
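As a rough sketch of this bookkeeping, the fragment below gives each card a GUID, a creation timestamp, a modification log and a small policy set. The field names and the policy vocabulary (`"display"`, `"share"`, `"copy"`) are assumptions made for illustration; the text does not specify how policies are encoded.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Set, Tuple

def _now() -> datetime:
    return datetime.now(timezone.utc)

@dataclass
class SemCard:
    author: str
    # Hypothetical policy vocabulary: the operations this card permits.
    allowed_operations: Set[str] = field(default_factory=lambda: {"display"})
    guid: str = field(default_factory=lambda: str(uuid.uuid4()))
    created: datetime = field(default_factory=_now)
    # One (timestamp, author) entry per modification in the card's lifetime.
    modifications: List[Tuple[datetime, str]] = field(default_factory=list)

    def modify(self, by: str) -> None:
        self.modifications.append((_now(), by))

    def permits(self, operation: str) -> bool:
        return operation in self.allowed_operations

card = SemCard(author="kris", allowed_operations={"display", "share"})
card.modify(by="auto-tagger")
```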

Representing any entity. Any type of digital object or information can be pointed to with a SemCard, e.g. a Web page, a product, a service offer, a data record in a database, a file or other media object, media streams, a link to a remote Web service, etc. A SemCard can thus represent any digital item, like a PNG image or PDF document, physical entities such as a person, building, street, or a kitchen utensil, as well as immaterial things like ideas, mythological phenomena and intellectual creations. A SemCard can also represent collections, for example a SemCard representing a group of friends would contain links to the SemCards representing the individuals of that social group. Equally importantly, SemCards can represent relationships between SemCards, for example, that a person is the author of an idea.
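The idea that entities, collections and relationships are all cards of the same kind can be sketched as follows. The single `SemCard` class and its field names are hypothetical simplifications; only the relation name Authored-By is taken from the paper.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class SemCard:
    kind: str                            # e.g. "Person", "Idea", "Authored-By"
    label: str = ""
    subject: Optional["SemCard"] = None  # set only on relationship cards
    target: Optional["SemCard"] = None
    members: List["SemCard"] = field(default_factory=list)  # for collections

def triples(cards: List[SemCard]) -> List[Tuple[str, str, str]]:
    """Read every relationship card as a (subject, relation, object) triple."""
    return [(c.subject.label, c.kind, c.target.label)
            for c in cards if c.subject is not None and c.target is not None]

kris = SemCard("Person", "Kris")
nova = SemCard("Person", "Nova")
idea = SemCard("Idea", "Some idea")
authored = SemCard("Authored-By", subject=idea, target=kris)  # a relationship
friends = SemCard("Group", "Friends", members=[kris, nova])   # a collection

cards = [kris, nova, idea, authored, friends]
print(triples(cards))
```

Because a relationship is itself a card, it can carry its own author, timestamps and policies like any other card.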

Display rules. A SemCard can carry display rules that dictate how the SemCard itself (as well as its target reference – the thing it represents) should be displayed to the user. These can describe, for example, its owner’s preferences or the display device required. As SemCards carry with them their own display specifications their on-screen representation can be customized by their users; the same SemCard can thus be displayed differently to two different users with different preferences. The rules can specify how metadata and slot values in the SemCard should be organized and what human-readable labels should be used for them, if any, as well as what aspects of the SemCard appear as interactive elements in the interface, and the results of specific interaction with those elements.
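A small sketch shows how one card could be rendered differently for two users. The rule format used here, an ordered list of (slot, label) pairs with optional per-user overrides, is an assumption chosen for brevity rather than the actual rule language.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Rules = List[Tuple[str, str]]  # ordered (slot name, human-readable label) pairs

@dataclass
class SemCard:
    slots: Dict[str, str]
    default_rules: Rules
    user_rules: Dict[str, Rules] = field(default_factory=dict)

    def render(self, user: str) -> str:
        # Fall back to the card's own rules when the user has no preferences.
        rules = self.user_rules.get(user, self.default_rules)
        return "\n".join(f"{label}: {self.slots[slot]}" for slot, label in rules)

card = SemCard(
    slots={"title": "Quarterly report", "author": "David", "format": "pdf"},
    default_rules=[("title", "Title"), ("author", "Author")],
    user_rules={"nova": [("format", "Type"), ("title", "Document")]},
)
print(card.render("kris"))
print(card.render("nova"))
```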

3.2 Using SemCards

Creating an instance of a SemCard involves simply selecting the appropriate SemCard template (“template”) from a menu or via a search-enhanced selector interface. To fill out the SemCard instance, one or more slots are filled with values – these could be semantic links to other SemCards, typed entities or unclassified content. Each SemCard instance, its semantic dimensions and their values, can be stored as an XML (Extensible Markup Language) object, using e.g. the RDF (Resource Description Framework) format [9].
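To make the storage step concrete, the sketch below serializes one card as RDF/XML using Python's standard `xml.etree.ElementTree` module. The choice of Dublin Core properties and the `urn:example:` identifiers are assumptions for the example; the actual vocabulary and identifier scheme are not given in the text.

```python
import xml.etree.ElementTree as ET
from typing import Dict

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("rdf", RDF)
ET.register_namespace("dc", DC)

def to_rdf_xml(card_uri: str, literals: Dict[str, str], links: Dict[str, str]) -> str:
    """Serialize one card: literal slot values and links to other cards."""
    root = ET.Element(f"{{{RDF}}}RDF")
    desc = ET.SubElement(root, f"{{{RDF}}}Description", {f"{{{RDF}}}about": card_uri})
    for prop, text in literals.items():
        ET.SubElement(desc, f"{{{DC}}}{prop}").text = text
    for prop, target in links.items():
        # A semantic link to another card is stored as a resource reference.
        ET.SubElement(desc, f"{{{DC}}}{prop}", {f"{{{RDF}}}resource": target})
    return ET.tostring(root, encoding="unicode")

xml_text = to_rdf_xml(
    "urn:example:card/1",
    literals={"title": "A bookmark"},
    links={"creator": "urn:example:card/2"},
)
print(xml_text)
```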

While SemCards could be invisible to users, hidden underneath the standard applications they use, typically a user will want to view and manipulate the information they represent directly, especially for linking them together. For example, a document authored by David, a non-SemCard user, is received by Kris’ SemCard system. When received, a SemCard of type “SemDocument” is automatically created. Kris links the SemDocument SemCard to his SemCard representing the document’s author, David, using an instance of the SemCard type Authored-By. This instantly puts the received document in a rich semantic context of the network of all SemCards that link, in one way or another, the document-author pair to a lot of metadata as to who created what at what time, and who shared it with whom, how, and so on.

For viewing and manipulating SemCards we have developed both client-based editors in the spirit of Haystack [10] and Web-based interfaces. Our PersonalRadar desktop application, of which several prototypes were made, made SemCards actively usable on personal computers, expanding the reach of Twine.com down to personal data, via semantically rich networks. SemCards can be created in many ways; doing so manually from scratch involves selecting a SemCard template type, making an instance of it and customizing its slots using typed entities from an underlying ontology.

Fig. 2. The iterative nature of human-machine metadata annotation. (1) User creates digital document, (2) a SemCard instance is automatically created; the automation infers that a particular image is included in the document and (3) creates a SemCard for it and a SemCard of type Includes that links the two; (4) relationship between the SemCards now forms a triplet that the user can inspect (here shown in prepositional form, but is typically graphical); (5) user modifies the results (+/-) from which (6) the automation processes generalize to improve their own performance.

SemCard templates are ideally fully defined by one or more ontologies. However, the case could arise where a user wants to represent an entity for which no template exists. A user can create free-form slots and collect them into a new SemCard (that has no template). As long as the type of the SemCard – or at least one slot in it – has a connection to a known ontology (it will always have its author and date of creation), the automation mechanisms can use this information as a basis for further automatic refinement of the SemCard instance, like linking it to (what are believed to be) related SemCards. Managing such automatic semantic links becomes akin to unstructured database management; it will of course never be as good as that for fully-specified SemCards, but because these SemCards live in a rich network of other SemCards, this problem is typically not as large as it may seem.
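How a template-less card might still be linked to related cards can be sketched with a deliberately naive heuristic. The scoring rule below (shared author plus identical slot values) is purely an assumption for illustration; the paper does not describe the matching method actually used.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class SemCard:
    title: str
    author: str                     # always known, as is the creation date
    template: Optional[str] = None  # None marks a free-form card
    slots: Dict[str, str] = field(default_factory=dict)

def relatedness(a: SemCard, b: SemCard) -> int:
    """Hypothetical score: shared author plus number of identical slot values."""
    shared = sum(1 for key, value in a.slots.items() if b.slots.get(key) == value)
    return int(a.author == b.author) + shared

def suggest_links(card: SemCard, others: List[SemCard], minimum: int = 1) -> List[SemCard]:
    scored = [(relatedness(card, other), other) for other in others if other is not card]
    ranked = sorted((s for s in scored if s[0] >= minimum), key=lambda s: -s[0])
    return [other for _, other in ranked]

recipe = SemCard("Grandma's soup", "kris", slots={"cuisine": "Icelandic"})
library = [
    SemCard("Fish stew", "kris", "Recipe", {"cuisine": "Icelandic"}),
    SemCard("Trip notes", "kris", "Note"),
    SemCard("Budget", "jim", "SemDocument"),
]
print([card.title for card in suggest_links(recipe, library)])
```

Even with no template, the author slot alone gives the free-form card a foothold in the network.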

3.3 End-Users Versus Ontology Experts

In our system expert designers create basic SemCard templates for all major entities such as digital documents, presentations, video files etc., where a template’s meta-tags are hand-picked and surface presentation defined (see Figure 3). Importantly, non-experts can then create derivative SemCards by modifying these, adding or removing pre-assigned slots in the SemCards, or making new ones from scratch, using either completely new slots or pre-existing ontologically defined slots (e.g. by copying slots from other SemCards or from a library). All underlying ontological relationships are maintained in the new SemCard; a modified SemCard will store the specifics of its creation history (and can either carry that data around as metadata or link to it in an online database via a GUID). This history information, together with the SemCard’s subsequent use and further modification by hundreds or thousands of users, can be used by the automation system to make inferences about the semantics of the new SemCard and its relation to the underlying ontology, which was not modified in the process.

As SemCards isolate the user from the related ontologies, classificatory mistakes in their creation do not destroy the underlying ontologies. This results in a kind of graceful degradation; instead of breaking the system such mistakes only make the automated handling of information in the system slightly less accurate. The relationship between SemCards and the underlying ontology can be likened to non-destructive editing for video: As the original data (i.e. the ontologies) is not changed, and the creation history is instead represented in a separate intermediate layer, the edit history of any SemCard can be traced back and reverted, if need be, with no change to the underlying ontologies.
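The non-destructive editing analogy can be made concrete with a short sketch in which every edit appends a snapshot to the card while the ontology stays read-only. The snapshot-list design and the `ex:` terms are assumptions; the paper only states that history is kept in the intermediate layer and can be reverted.

```python
from types import MappingProxyType
from typing import Dict, Iterable, List

# Read-only stand-in for the underlying ontology (terms are hypothetical).
ONTOLOGY = MappingProxyType({"ex:title": str, "ex:pages": int})

class SemCard:
    def __init__(self, slots: Dict[str, object]) -> None:
        self._history: List[Dict[str, object]] = [dict(slots)]  # snapshot 0

    @property
    def slots(self) -> Dict[str, object]:
        return dict(self._history[-1])

    def edit(self, changes: Dict[str, object], remove: Iterable[str] = ()) -> None:
        # Every edit appends a new snapshot; earlier ones are never altered.
        snapshot = {**self._history[-1], **changes}
        for name in remove:
            snapshot.pop(name, None)
        self._history.append(snapshot)

    def revert(self, steps: int = 1) -> None:
        if not 0 < steps < len(self._history):
            raise ValueError("cannot revert past the original card")
        del self._history[-steps:]

card = SemCard({"ex:title": "Report"})
card.edit({"ex:pages": 12})
card.edit({"ex:nickname": "Q2"}, remove=["ex:title"])  # a classificatory mistake
card.revert()                                          # undo it; ontology untouched
```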

Behind each SemCard is thus an ontology that defines the meaning of the SemCard slots, specifies valid values and relations between slots (see Figure 1). An ontology like FOAF (cf. [11]) or the Dublin Core [12] can be used with SemCards, as each SemCard carries with it a reference to the ontology it is based on. Thus, networks of ontologies can be used with SemCards, whether they use a basic, simple and singleton ontology like the Dublin Core or are defined more deeply in e.g. foundational ontologies such as DOLCE, SUMO [13] [14], or OCHRE [15].

In our current implementation we have created a fairly extensive ontology for important digital data types including Web page, 2-D image, URL, text document, as well as for physical entities such as person, place, organization, etc. The idea is to make this ontology open-source to encourage linking of other ontologies to it, extending its reach and improving its utility, and ultimately bringing the Semantic Web to maturity sooner.

4 Collaborative SemCard Creation by Man and Machine

The iterative addition and editing of SemCards by users and the automation mechanisms creates a positive feedback loop of improvement on the network (Figure 2); initial example networks provide a model for the automation. Reasoning mechanisms are used to infer the implications of corrections to automatically-created data, based on original manual creation. When the initial manual data entry and corrections reach a critical point the automation starts to provide significant and noticeable enhancements to the user. Increased manual input, especially in the form of additions to automatically generated semantic links, also allows the automation system to make inferences about the quality of the data entry, not just for a single user but for many. This allows it to improve the accuracy of its own automation even further, and suggestions to users about related data will be more relevant and targeted.

Fig. 3. The ontology editor allows expert ontology creators to quickly create, manage, connect and extend the multitude of ontologies underlying Twine.com
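One very simple way to picture the feedback loop is a suggester that tracks, per relation type, how often users accept or reject automatically created links. The smoothed acceptance rate and the 0.6 threshold below are assumptions chosen for illustration; the reasoning mechanisms actually used are not specified in the text.

```python
from collections import defaultdict

class LinkSuggester:
    def __init__(self, threshold: float = 0.6) -> None:
        self.threshold = threshold
        self.accepted = defaultdict(int)
        self.rejected = defaultdict(int)

    def feedback(self, relation: str, accepted: bool) -> None:
        # A user's "+" or "-" on an automatically created link.
        (self.accepted if accepted else self.rejected)[relation] += 1

    def confidence(self, relation: str) -> float:
        # Laplace-smoothed acceptance rate; 0.5 when nothing is known yet.
        a, r = self.accepted[relation], self.rejected[relation]
        return (a + 1) / (a + r + 2)

    def should_auto_create(self, relation: str) -> bool:
        return self.confidence(relation) >= self.threshold

suggester = LinkSuggester()
for _ in range(3):
    suggester.feedback("Includes", accepted=True)
suggester.feedback("Authored-By", accepted=False)
suggester.feedback("Authored-By", accepted=False)
```

With no feedback the suggester stays below the threshold, which mirrors the point that the automation becomes useful only after enough manual examples exist.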

An important feature of SemCards is that they record significant amounts of metadata about themselves, including their own genesis. This makes automatic creation of SemCards much more flexible as the automation process can make inferences about the quality of the SemCards (based on e.g. edit history). Because the same representational framework – SemCards – can be used for all data, including friend networks, author-entity relationships, object-owner, etc., inferencing can use the multiple SemCard relationship types (e.g. not only who created it but also who the creator’s friends are) to decide how to perform automatic relationship creation, data-slot filling, automatic correction or deletion. Moreover, as the SemCard stores its edit history, including who/what made the edits, any such changes can be undone with relative ease. Since this history is stored as semantic information, it can be used to sort the SemCards according to their history. This makes managing SemCards over time much more flexible than if they were history-scarce, such as the loosely-defined metadata of most data on people’s hard drives. For example, caching, compressing or any other processes can be made history-sensitive to a high level of detail.

As an example of collaborative automatic/manual creation of SemCards, Nova, a SemCard end-user, finds a useful URL and creates a SemCard for it of type “bookmark for a Web-page” (see Figure 4). He makes personal comments on the Web page’s contents by making a “Note” SemCard and linking it to the Webpage SemCard. Nova’s automation processes, running on the SemCard hosting site, add two things: They fill the Webpage SemCard with machine-readable metadata from the Web page, and they also link these SemCards to a new SemCard that they created, containing further information mined from the Web site. Now Nova shares (a copy of) the SemCard with Jim (it gets saved in his SemCard space), who may add his own comments and links to related SemCards; the fact that the SemCard was shared with Jim by Nova is automatically recorded as part of the SemCard’s metadata. Thus, events, data and metadata are created seamlessly and unobtrusively through the collaborative paradigm.

Fig. 4. The Twine bookmarklet popup enables Web surfers to create SemCards quickly. The system automatically fills in relevant information (“Title”, “Description”, “icon”, etc.).
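The sharing step in this example can be sketched as copying a card into the recipient's space and recording the event as metadata. Recording the event on the copy, the `events` list and the `spaces` dictionary are all assumptions made for the sketch, as is the example URL.

```python
import copy
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class SemCard:
    kind: str
    slots: Dict[str, str]
    events: List[Dict[str, str]] = field(default_factory=list)  # provenance

def share(card: SemCard, sender: str, recipient: str,
          spaces: Dict[str, List[SemCard]]) -> SemCard:
    """Save a copy in the recipient's space and record who shared it."""
    shared = copy.deepcopy(card)
    shared.events.append({"event": "shared", "by": sender, "with": recipient})
    spaces.setdefault(recipient, []).append(shared)
    return shared

spaces: Dict[str, List[SemCard]] = {}
bookmark = SemCard("Bookmark", {"url": "http://example.org/article"})
jim_copy = share(bookmark, sender="nova", recipient="jim", spaces=spaces)
jim_copy.slots["comment"] = "Worth reading"  # Jim's addition stays on his copy
```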

As their authorship is automatically recorded in the SemCards, this can later be used to e.g. exclude all SemCards created by particular automation processes, should this be desired. Proactive automatic mining of a user’s SemCards can reveal implicit relationships that the system can automatically make explicit, facilitating faster future retrieval through particular relationship chains in the resulting relationship graphs.
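Both operations mentioned here, filtering by recorded authorship and making implicit relationships explicit, can be sketched over a flat list of links. The two-hop shortcut rule, the relation name `Related-To` and the process names are hypothetical; they stand in for whatever mining the system performs.

```python
from typing import List, NamedTuple, Set

class Link(NamedTuple):
    subject: str
    relation: str
    target: str
    created_by: str  # a user name or the name of an automation process

def exclude_creators(links: List[Link], creators: Set[str]) -> List[Link]:
    return [link for link in links if link.created_by not in creators]

def explicit_shortcuts(links: List[Link]) -> List[Link]:
    """Turn each chain a -> b -> c into a direct a -> c link, if none exists."""
    known = {(link.subject, link.target) for link in links}
    shortcuts = []
    for first in links:
        for second in links:
            pair = (first.subject, second.target)
            if first.target == second.subject and pair[0] != pair[1] and pair not in known:
                known.add(pair)
                shortcuts.append(Link(pair[0], "Related-To", pair[1], "shortcut-miner"))
    return shortcuts

links = [
    Link("note-1", "Annotates", "bookmark-1", "nova"),
    Link("bookmark-1", "Mentions", "org-1", "entity-extractor"),
]
print(explicit_shortcuts(links))
print(exclude_creators(links, {"entity-extractor"}))
```

Because the shortcut links record their own creator, they can later be excluded in exactly the same way as any other automatically created link.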

As a user’s SemCard database grows user-customized automation becomes more relevant; in the long run, as the benefits of automation become increasingly obvious to each user, people will see the benefits of providing a bit of extra meta-information when they create e.g. a word processing document or an image. This will trigger a positive upward spiral where increased use of automation will motivate users to add more pieces of metadata, which will in turn enable better automation.

5 Deployment on Twine.com

Fig. 5. Upon creation of the bookmark, SemCard users can choose to share it (left side of popup) with users via Twines they have created or subscribed to (“My Twines”), or directly with people they have connected with (“My Connections”)

We have implemented the SemCard technology and deployed it on Twine.com, an online Semantic Web portal where people can create accounts and use a SemCard-enabled system to manage their online activities and information, including bookmarks, digital files, sharing policies, and more.

As of summer 2009 there were well over 4 million SemCards on Twine.com. At that time Twine.com had around 250 thousand registered users and over 2 million unique visits monthly,3 using the interface developed for the Web site. The rate of new SemCard creation had grown to 3K per day, created by an estimated 10% of the users. So far, users seem to rarely correct the automatically-generated SemCards, but a relatively small subset of Twine power users add extensive additional information to them.

We will now detail an actual example of making a SemCard for a Web page, a short article on the Physorg.com Web site.4 As can be seen in Figure 4, when a user comes to a Web page of interest they can click on the bookmarklet “Twine This”, which brings up a simple menu with a few information fields. Parts of the SemCard slots have been filled in; the user can choose to edit these, overwrite them with their own information or leave them as-is. When the user clicks “save” a SemCard for this Web page is created in their Twine account. The user can choose to share this item with users and/or twines (see Figure 5) – a twine is a SemCard that can be described as “a blog with controlled access permissions” – in other words, a SemCard for a set of SemCards with particular visual presentation and adjustable viewing permissions.5 The twine SemCard shows the dynamic properties of SemCards for specifying dynamic processes, e.g. calling on services from mining, inferencing, etc.

3 According to compete.com

4 http://www.physorg.com/news157210821.html

Fig. 6. Left and center: Two snapshots of the same dropdown box are shown; automatic processing of a bookmarked Web page can detect places, people, organizations and various named entities (“tags”). The user can then modify these by deleting (clicking on the [x]) and adding new ones. The left image shows the “other tags” that were auto-recognized; the middle image shows “organizations”, as well as one user-added “place” (“Montreal”). Right: Automatic pop-up of items in various categories such as “people”, “places”, etc., related to a SemCard.

When the “bookmark” SemCard is saved, using the “Save” button on the lower right of the bookmarklet popup, the SemCard is stored on Twine.com. Any sharing selection that the user made during creation will make the bookmark SemCard available to the users who have permission to read those twines; for example, sharing it with the twine Architecture of Intelligence (Figure 5) will enable everyone who has been invited to subscribe to this twine to see it. On their home page on Twine.com this SemCard will now additionally bring forth a lot of information, including auto-tagging (recognized entities, relationships, etc.).
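The sharing rule described here, where a SemCard shared with a twine becomes visible to that twine's subscribers, can be sketched as follows. The `Twine` class, its fields and the card identifier are hypothetical simplifications; in the system described above a twine is itself a SemCard with richer permissions.

```python
from dataclasses import dataclass, field


@dataclass
class Twine:
    """Hypothetical stand-in: a named set of SemCards with a subscriber list."""
    name: str
    subscribers: set = field(default_factory=set)
    cards: list = field(default_factory=list)

    def share(self, card_id):
        """Make a SemCard (here just an identifier) available in this twine."""
        self.cards.append(card_id)


def visible_cards(user, twines):
    """Collect the cards a user can see through the twines they subscribe to."""
    return [card for twine in twines
            if user in twine.subscribers
            for card in twine.cards]


arch = Twine("Architecture of Intelligence", subscribers={"nova", "jim"})
arch.share("bookmark:physorg-article")

jim_sees = visible_cards("jim", [arch])      # subscriber: card is visible
kris_sees = visible_cards("kris", [arch])    # not subscribed: nothing visible
```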

As seen in Figure 6, a cross next to each auto-generated tag allows the user to delete the ones that they don’t agree with. Further related information is automatically pulled forward, sorted into “places”, “people”, “organizations”, “other tags” and “types of items”: The last one is interesting as it is a unique feature of semantic Webs – here one can find related SemCards of type “video”, for example, or “product”.

5 We will use “twine” with a lower-case “t” to refer to a SemCard of type “twine”.

Fig. 7. One type of semantic search box on Twine.com

Many machine learning techniques can be employed for automatic tagging, entity extraction and relationship detection – in our implementation we have used vector space representations to profile users and their semantic networks and subsequently select related items from other semantic nets. Using a (semantic) drill-down search mechanism a user can keep refining a search for a SemCard by selecting any combination of type, tags, author, etc. (Figure 7). During such drill-downs, the suggestions of related material made by the automation become progressively better.
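As an illustration of the vector space idea, the sketch below profiles a user as a sparse tag-frequency vector and ranks candidate items by cosine similarity. The paper does not specify the features or similarity measure actually used on Twine.com, so the tag-count features, the cosine measure and all names here are assumptions.

```python
import math
from collections import Counter


def profile(tag_lists):
    """Build a sparse tag-frequency vector from the tags of a user's SemCards."""
    vec = Counter()
    for tags in tag_lists:
        vec.update(tags)
    return vec


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def related(user_vec, candidates, top_n=2):
    """Return the names of the top_n candidates with non-zero similarity."""
    scored = [(cosine(user_vec, Counter(tags)), name)
              for name, tags in candidates.items()]
    scored = [(score, name) for score, name in scored if score > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:top_n]]


# Tags from two of the user's SemCards (illustrative data only).
user_vec = profile([["semantic web", "rdf"], ["rdf", "ontology"]])

# Items from other users' semantic nets, described by their tags.
candidates = {
    "owl-guide": ["ontology", "rdf", "owl"],
    "cooking": ["recipes", "pasta"],
    "sparql-intro": ["rdf", "sparql"],
}

suggestions = related(user_vec, candidates)
```

Here `suggestions` ranks "owl-guide" ahead of "sparql-intro" and drops "cooking", which shares no tags with the user's profile.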

6 Future SemCard-Driven Automation Services

As already mentioned, the system we have developed enables automatic semantic mining of content sent by a user. Using existing semantic networks created manually and automatically by the system, this mining can be done without requiring any actions or special editing by the user, such as inserting special characters or identifying terms or phrases as potential semantic objects. We will now provide a few examples of future automation services enabled by the SemCard technology. These have not been implemented as services yet, but prototypes already exist.

Intelligent E-Mail/Sharing. An example of a potential future use of the SemCard technology is for email-like purposes. In this example the user has a semantic email account with a semantic service provider (or the user keeps the same normal non-semantic email account but adjusts mailbox settings so that mail received and sent is processed by the provider). The semantic service provider processes all incoming and outgoing email, automatically creating SemCards representing the email itself and concepts referenced in the email and identified by entity detection algorithms [16]. No intervention would be required of the user, other than initial set-up. When the SemCards have been created they are automatically linked to other previously defined SemCards in the user’s account, enlarging the user’s knowledge network. Now the user can for example find all emails “sent by Jim to Nova about Semantic Web technologies regarding the PersonalRadar product” – a perfectly valid search using semantic relations built from information readily available in the user’s account.
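The example search can be read as a conjunction of semantic relations on one email. The following sketch expresses it over a set of subject-predicate-object triples held in plain Python. The identifiers and predicate names (`sentBy`, `sentTo`, `about`, `regarding`) are invented for illustration, and an RDF store with a proper query language would be the more natural choice in a real deployment.

```python
# Hypothetical triples that email mining might produce.
triples = {
    ("email:1", "type", "Email"),
    ("email:1", "sentBy", "person:jim"),
    ("email:1", "sentTo", "person:nova"),
    ("email:1", "about", "topic:semantic-web"),
    ("email:1", "regarding", "product:PersonalRadar"),
    ("email:2", "type", "Email"),
    ("email:2", "sentBy", "person:jim"),
    ("email:2", "sentTo", "person:nova"),
    ("email:2", "about", "topic:lunch"),
    ("email:3", "type", "Email"),
    ("email:3", "sentBy", "person:nova"),
    ("email:3", "sentTo", "person:jim"),
    ("email:3", "about", "topic:semantic-web"),
    ("email:3", "regarding", "product:PersonalRadar"),
}


def find(triples, **constraints):
    """Return subjects that satisfy every predicate=object constraint."""
    subjects = {s for s, _, _ in triples}
    return sorted(s for s in subjects
                  if all((s, p, o) in triples for p, o in constraints.items()))


# "Sent by Jim to Nova about Semantic Web technologies
#  regarding the PersonalRadar product"
matches = find(
    triples,
    type="Email",
    sentBy="person:jim",
    sentTo="person:nova",
    about="topic:semantic-web",
    regarding="product:PersonalRadar",
)
```

Only `email:1` satisfies all five constraints; `email:2` has a different topic and `email:3` has sender and recipient reversed.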


Fig. 8. The user interface for searching large collections of SemCards has a familiar, easily navigated multi-column tabbed layout

Because the underlying representation for this sharing of text messages is the SemCard, this activity constitutes semantic sharing. Its principles can be applied to any digital object – with a SemCard client the same sharing method used for the email SemCards can be used for sharing any digital object; there is no need to use “attachments” for sharing such entities as they are first-class objects with full meta-data about their history including creation, manipulation and sharing events.

Semantic social networks from emails. Another important feature enabled by SemCards is the creation of semantic relationship networks, where a user’s relationships are automatically mapped based on email correspondence. Such a network will be most useful if the correspondence is also based on SemCard technology, as described above, but regular email can also be used to form the basis for this technology. To move to this technology from their current software, users provide their private and business contacts in their account, for example by uploading their address book into the system.

Depending on various factors, including the content and number of the emails exchanged, the relationship links are assigned types such as “friend”, “colleague”, “relative” and “conversants”, with the last type being the most generic. To take the example of the “conversants” link type: The link is created between the user and another person when they have exchanged at least two emails, where the second email was a response to the first. The link contains the time of the exchange (time of sending, time of reading, both emails), as well as who made the link, and when (even when automatically created). As before, the user can set a preference for minimal, medium, or heavy mining of her email.
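The “conversants” rule lends itself to a short sketch: a link is created once a second email answers a first one between the same two people. The `Email` record, its field names and the `auto-linker` creator label are assumptions made for this example, and read times are omitted for brevity even though the text says the link stores them.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Email:
    """Hypothetical minimal email record."""
    msg_id: str
    sender: str
    recipient: str
    sent: datetime
    in_reply_to: Optional[str] = None


def conversant_links(emails, user, created_by="auto-linker", created_at=None):
    """Create one "conversants" link per person who exchanged a reply pair with user."""
    created_at = created_at or datetime.now()
    by_id = {e.msg_id: e for e in emails}
    links = {}
    for reply in sorted(emails, key=lambda e: e.sent):
        first = by_id.get(reply.in_reply_to)
        if first is None:
            continue  # not a response to a known email
        # The reply must go back from the first email's recipient to its sender.
        if (reply.sender, reply.recipient) != (first.recipient, first.sender):
            continue
        if user not in (first.sender, first.recipient):
            continue
        other = first.recipient if first.sender == user else first.sender
        # Keep the earliest qualifying exchange; record who made the link and when.
        links.setdefault(other, {
            "type": "conversants",
            "first_sent": first.sent,
            "reply_sent": reply.sent,
            "created_by": created_by,
            "created_at": created_at,
        })
    return links


emails = [
    Email("m1", "nova", "jim", datetime(2009, 6, 1, 9, 0)),
    Email("m2", "jim", "nova", datetime(2009, 6, 1, 10, 30), in_reply_to="m1"),
    Email("m3", "nova", "kris", datetime(2009, 6, 2, 8, 0)),  # never answered
]

links = conversant_links(emails, user="nova")
```

In this data only Jim becomes a conversant of Nova, since the email to Kris received no response.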

The system processes the addressees of all emails to infer who is communicating with the user about what, as in the above email example, as well as inferring with whom the user has relationships, what kind of relationships those are, and what projects they relate to. Emails are then linked to those inferred projects. This enables very powerful personalization of information display: For example, sets of different preference settings can be associated with separate (named or unnamed) groups of contacts, enabling differential treatment depending on who the user communicates with. A group called “friends” may have certain settings for how entries from/to them should be formatted for viewing; a “personal Facebook-like service” with a corresponding look could be set up by a user for one of her groups, while using a vastly different display setup for others.

7 Conclusions

To realize the full potential of the Semantic Web vision, several challenges must be met. One of these is the unreliability of automated metadata creation systems, another is the lack of a strong and flexible framework for representing data and metadata. We have developed SemCards, which solve these challenges in a way that takes advantage of current technologies while allowing for growth in the foreseeable future. We have implemented this technology on the desktop as well as on the Web, showing it to scale to hundreds of thousands of users.

The technology presents a powerful representation scheme that enables collaborative human-machine and human-human creation of Semantic Web information. SemCards achieve this by separating hard-core ontologies from the end-user, mediating these via graphical information structures, represented under the hood using RDF and OWL, while supplying their own visual representation schemas for on-screen viewing. The SemCard framework allows better sharing, storing, annotating, enhancing and expanding semantic networks, creating true knowledge networks through a collaboration between people and artificial intelligence programs.

The Semantic Web site Twine.com, which has well over 2 million monthly unique visitors, has demonstrated the usefulness and extensibility of the technology. In close collaboration with automation processes, these users have created over 5 million SemCards to date. Our results so far show that SemCards can support all of the features described in this paper for over 300 thousand users and we have good reason to believe that the technology will scale well beyond this.

Other proposed approaches for realizing the Semantic Web vision have fallen short on one or more of the key features that SemCards address and solve. We believe that as a uniform standard for representing data and metadata on the World Wide Web, SemCards, or a related technology, could very well be the missing glue that is needed to link together the forces – natural and artificial – that are needed to propel the Web forward to the next level, the semantic level.


Acknowledgments. The work was supported by Radar Networks, Inc., by Reykjavik University, Iceland, and by the School of Computer Science at Reykjavik University. The authors would like to thank the Twine team for the implementation of the SemCard technology on Twine.com, as well as the funders of Radar, Vulcan Ventures Inc., Leapfrog Ventures Inc. and the angel investors.

References

1. Dill, S., Eiron, N., Gibson, D., Gruhl, D., Guha, R., Jhingran, A., Kanungo, T., Rajagopalan, S., Tomkins, A., Tomlin, J.A., Zien, J.Y.: SemTag and Seeker: Bootstrapping the Semantic Web via automated semantic annotation. In: Proceedings of the World Wide Web Conference (2003) doi: 10.1145/775152.775178

2. Etzioni, O., Gribble, S.: An evolutionary approach to the Semantic Web. Poster presentation at the First International Semantic Web Conference (2002)

3. Maly, K., Nelson, M.L., Zubair, M.: Smart objects, dumb archives: A user-centric, layered digital library framework. D-Lib Magazine, 5 (1999)

4. Kahn, R., Wilensky, R.: A framework for distributed digital object services. International Journal on Digital Libraries 6, 115–123 (1995)

5. Payette, S., Lagoze, C.: Flexible and extensible digital object and repository architecture (FEDORA). In: Nikolaou, C., Stephanidis, C. (eds.) ECDL 1998. LNCS, vol. 1513, pp. 41–59. Springer, Heidelberg (1998)

6. Lagoze, C., Payette, S., Shin, E., Wilper, C.: Fedora: An architecture for complex objects and their relationships. Journal of Digital Libraries - Special Issue on Complex Objects (2005) doi: arXiv:cs/0501012v6

7. Karger, D.R., Bakshi, K., Huynh, D., Quan, D., Sinha, V.: Haystack: A general-purpose information management tool for end users based on semistructured data. In: CIDR, pp. 13–26 (2005)

8. Drummond, N., Jupp, S., Moulton, G., Stevens, R.: A practical guide to building OWL ontologies using the Protege 4 and CO-ODE tools, 1.2. edn. (2009)

9. Decker, S., Melnik, S., Harmelen, F.V., Fensel, D., Klein, M., Erdmann, M., Horrocks, I.: Knowledge networking in the Semantic Web: The roles of XML and RDF (2000)

10. Huynh, D., Karger, D.R., Quan, D.: Haystack: A platform for creating, organizing and visualizing information using RDF. In: Semantic Web Workshop, WWW 2002 (May 2002)

11. Ding, L., Zhou, L., Finin, T., Joshi, A.: How the Semantic Web is being used: An analysis of FOAF documents. In: Proceedings of the 38th International Conference on System Sciences, p. 313.3 (2005) doi: 10.1109/HICSS.2005.299

12. Sutton, S.A., Mason, J.: The Dublin Core and metadata for educational resources. In: International Conference on Dublin Core and Metadata Applications, pp. 25–31 (2001)

13. Varma, V.: Building large scale ontology networks. In: Language Engineering Conference (LEC 2002), Hyderabad, India, p. 121 (2002)

14. Hong, J.F., Li, X.B., Huang, C.R.: Ontology-based predication of compound relations: A study based on SUMO. In: Proceedings of PACLIC18, Waseda University, Japan, pp. 151–160 (2004)

15. Schneider, L.: Designing foundational ontologies: The object-centered high-level reference ontology OCHRE as a case study. In: Song, I.-Y., Liddle, S.W., Ling, T.-W., Scheuermann, P. (eds.) ER 2003. LNCS, vol. 2813, pp. 91–104. Springer, Heidelberg (2003)

16. von Brzeski, V., Irmak, U., Kraft, R.: Leveraging context in user-centric entity detection systems. In: Proceedings of the 16th Conference on Information and Knowledge Management (2007)
