10
Agents in Delivering Personalized Content Based on Semantic Metadata Teppo Kurki Sami Jokela Reijo Sulonen HelsinkiUniversity of Technology Otakaari 1, Espoo, Finland Tel: +358-9-4511 [email protected], [email protected], [email protected] Marko Turpeinen Alma Media Corp. Etel~iesplanadi 24, Helsinki, Finland Tel: +358-9-50771 marko.turpeinen @almamedia.fi Abstract In the SmartPush project professionaleditors add semantic metadata to information flow when the content is created. This metadatais used to filter the information flow to provide the end users with a personalized news service. Personalization and deliveryprocessis modeled as software agents, to whom the user delegates the task of sifting throughincoming information. The key components of the SmartPush architecture have been implemented, and the focus in the project is shifting towards a pilot implementation and testing the ideas in practice. Introduction Internet and online distribution are changingthe rules of professional publishing. Physical constraints of media products are not any more the key factors in the publishing process. In newspaper publishing, the bottleneck was what could fit in the paper. Today publishers are capable of creating more content, but they need new methods for personalized distribution. Incrementalprice for publishing on the Internet is negligible and the limiting factors are in producing the right content and delivering it to the right user. Locating relevant information is time consuming and expensive. Customers need personalized information flow from trusted information sources. This service should adapt to customer’schangingneeds and interests. The main goal of the SmartPush -project is to bring together and satisfy the needs of both the professional publishers and their customers. This is achieved by targeting and personalizing the information flow. Content providers see metadata as a promising way to make their information better searchable and usable on the web. Currently news organizations deliver targeted Copyright © 1998, American Association for Artificial Intelligence (www.aaai.org). Allrights reserved. 84 information based on fixed and very limited metadata schemes. These specifications are often proprietary for one content provider. Sometimes providers agree upon a shared metadata scheme, like ANPA format (ANPA 1984), which is widely used in newswirecontent header description. In our approach, authors augment the information flow with rich metadata descriptions that are used in the delivery process. Information filtering is usually based either on textual information flow or on somemanual classification scheme embedded in the flow. Our system uses semantic metadata for personalization. The customer should be able to delegate the filtering task to a software agent. The agent should possess enough machine intelligence to match changing customer needs with continuous information flow from the content providers. Agent autonomy promises flexibility in adding newfunctionality and changing the waythe system works on the fly (Figure 1). Content 1~~~ [ User provider Figure1. Agents in content delivery. Metadata-based Information Filtering Textual information flow is normally analyzed using some variation of the term vector model (Salton and McGill 1983). Manual classification has typically been based on keywords or categories, which are used to makesimple decisions with Boolean logic. These approaches are fundamentally different. Classification system relies on human assistance, where an From: AAAI Technical Report SS-99-03. Compilation copyright © 1999, AAAI (www.aaai.org). All rights reserved.

Agents in Delivering Personalized Content-Based Semantic Metadata

  • Upload
    others

  • View
    4

  • Download
    0

Embed Size (px)

Citation preview

Agents in Delivering Personalized Content Based on Semantic Metadata

Teppo Kurki Sami Jokela Reijo Sulonen

Helsinki University of TechnologyOtakaari 1, Espoo, Finland

Tel: [email protected], [email protected], [email protected]

Marko Turpeinen

Alma Media Corp.Etel~iesplanadi 24, Helsinki, Finland

Tel: +358-9-50771marko.turpeinen @ almamedia.fi

AbstractIn the SmartPush project professional editors add semanticmetadata to information flow when the content is created.This metadata is used to filter the information flow toprovide the end users with a personalized news service.Personalization and delivery process is modeled as softwareagents, to whom the user delegates the task of siftingthrough incoming information. The key components of theSmartPush architecture have been implemented, and thefocus in the project is shifting towards a pilotimplementation and testing the ideas in practice.

Introduction

Internet and online distribution are changing the rules ofprofessional publishing. Physical constraints of mediaproducts are not any more the key factors in the publishingprocess. In newspaper publishing, the bottleneck was whatcould fit in the paper. Today publishers are capable ofcreating more content, but they need new methods forpersonalized distribution. Incremental price for publishingon the Internet is negligible and the limiting factors are inproducing the right content and delivering it to the rightuser.

Locating relevant information is time consuming andexpensive. Customers need personalized information flowfrom trusted information sources. This service should adaptto customer’s changing needs and interests.

The main goal of the SmartPush -project is to bringtogether and satisfy the needs of both the professionalpublishers and their customers. This is achieved bytargeting and personalizing the information flow.

Content providers see metadata as a promising way tomake their information better searchable and usable on theweb. Currently news organizations deliver targeted

Copyright © 1998, American Association for Artificial Intelligence(www.aaai.org). All rights reserved.

84

information based on fixed and very limited metadataschemes. These specifications are often proprietary for onecontent provider. Sometimes providers agree upon a sharedmetadata scheme, like ANPA format (ANPA 1984), whichis widely used in newswire content header description. Inour approach, authors augment the information flow withrich metadata descriptions that are used in the deliveryprocess.

Information filtering is usually based either on textualinformation flow or on some manual classification schemeembedded in the flow. Our system uses semantic metadatafor personalization.

The customer should be able to delegate the filteringtask to a software agent. The agent should possess enoughmachine intelligence to match changing customer needswith continuous information flow from the contentproviders. Agent autonomy promises flexibility in addingnew functionality and changing the way the system workson the fly (Figure 1).

Content 1~~~ [ User

provider

Figure 1. Agents in content delivery.

Metadata-based Information Filtering

Textual information flow is normally analyzed using somevariation of the term vector model (Salton and McGill1983). Manual classification has typically been based onkeywords or categories, which are used to make simpledecisions with Boolean logic.

These approaches are fundamentally different.Classification system relies on human assistance, where an

From: AAAI Technical Report SS-99-03. Compilation copyright © 1999, AAAI (www.aaai.org). All rights reserved.

expert makes the decision about what categories theinformation covers. Term vector model based approachestreat information as is. With term vector models thedecision process is more abstract and lends itself better tofully automatic personalization and delivery.

Our vision is that one can combine these two by usingmetadata for information filtering. This means that as thematerial is published, it is enriched with a semanticmetadata representation of the content. The associated highquality metadata has enough expressive power to functionas a surrogate for the entire content in information filtering.

Compared to simple keywords or categorization, themetadata should be more expressive and have widercoverage. Expressiveness facilitates more accuratepersonalization. By using the metadata for filtering weeliminate the need to transfer the full-length content, thusdiminishing the required bandwidth.

The fundamental question in information filtering iswhether or not a given user should be provided with aparticular piece of information. This question can beanswered in different stages of the content delivery process(Figure 2). Simple categorization done by the domainexpert implicitly targets the content very early in theprocess. On the other hand, an agent armed with a user’sknowledge base has to first fetch the information and thenanalyze it to decide if it should be given to the user.

If we think of an information creation and deliveryprocess, we can use software agents in various stages ofthe process. The selection or matching that results in apersonalized feed is one suitable task for an agent. Theuser delegates the task of sifting through the availableinformation to the agent. The intelligence required for thistask depends on the nature of information that the materialhas.

If the material is "dumb" ASCII text, the agent’s task ishard - in principle the agent has to mimic exactly the user’s

thoughts in reading textual information. This task becomesan order of magnitude harder, if the information is nottextual, but sound or video.

Structured documents contain more relevantinformation, since they carry information about thedifferent components and their relations within thedocument. This way the agent has more information towork on. For example it can tell from the structure whatparts of the documents contain heading information, whichdescribes the content in a compact form.

To develop this idea of augmenting the informationcontent to the extreme, the information flow could betagged for each potential user already by the contentprovider. The agent’s job would be just to route relevantinformation to the end users.

Somewhere between these extremes is the optimaldistribution of intelligence between the agent and theinformation flow. If we can describe the content in a waythat can express the most important aspects of the enduser’s interest profile, the job of the agent becomesconsiderably less demanding. It only has to compare thedescriptions of new pieces of content against the user’sinterest profile, and make delivery decisions based on howwell these two match each other.

This approach has naturally its drawbacks. Developing amodel that can capture even the most important aspectscovering the interests of a group of users is not an easytask. The expressiveness of any model is seriously limitedwhen compared to what can be expressed in naturallanguage, not to mention subtle semantics introduced bylayout structures or temporal structures in sound and video.

However, the metadata-based approach has definiteadvantages. Instead of each and every agent in the systemmaking their own analysis of the information, this workcan be done once and made available to all interestedparties. This way we can achieve economics of scale that

Content provider Content provider

Delivery

Content providerIIIII IContentcreation

Manualcategorization

Content provider

Content [creation

Metadata ]creation

End user End user End user End user

Figure 2. Different alternatives for personalized content delivery.

85

make it possible to use considerable human and machineresources for the metadata creation process. If the work isdone only once, it can be done properly instead ofhaphazardly doing it over and over again.

Pros Cons

Eady ¯ Bandwidth savings ¯ Loss of richness in¯ Multiuse advantages information, if¯ Mobility only metadata is¯ Uniformity delivered

Late ¯ Flexibility ¯ Requires the full-¯ Full material scale information

available flow

¯ Own judgment ¯ Nonuniformity

Table 1. Position of content analysisin the filtering process.

Generating Metadata during ContentDevelopmentSmartPush architecture relies on compact structuralrepresentation of the content, the metadata (Figure 3).Metadata means machine understandable informationabout information. An introduction to metadata standardscan be found e.g. in (W3Ca 1998, IFLA 1998).

U.S. Supporting

Figure 3. A news article with its semantic metadatarepresentation.

Our approach requires that we can provide the deliveryprocess with both the actual content and structuraldescriptions of the content semantics. This in turn leads tothe need to generate this metadata during the contentdevelopment (Figure 4).

Content creation Content formatting Content publication

Figure 4. Steps in the content development process.

The amount and type of required metadata depends on themedia and the end products created from the raw content.

Metadata can be generated during content creation orseparately for each product during the content formattingand publication phases. Some of the issues related togenerating semantic information within contentdevelopment process are discussed below:¯ Automated generation. Fully automated analysis of

information semantics is difficult, because theinterpretation of the content relies on human expertise.Metadata tools must therefore assist the generation asmuch as possible while leaving the judgment inclassification to the human experts.

¯ Different interpretations of the same content. Everyonehas his or her own interpretation and side of a story.However, if professionals using somewhat objective andsystematic methods describe the content, we can rely onone conceptual model that can then be extended. Theseexperts are available mostly during content creation andcontent formatting phases.

¯ Author expertise. Creator or editor together withassisting tools is the best source for ideas and thoughtsbehind the content. In a later phase of the process allaspects of the content can not be known.

¯ Traditional publishing rules inapplicable. Ensuredquality, brands and structures similar to traditionalmedia world do not exist, because online producers arenot bound to expensive production facilities or realestate. It is thus easy to imitate media conglomerates andadd own possibly false interpretations of the content. Ifsemantic information is accepted only from entrustedcontent providers or their subcontractors, we can rely onthose descriptions and thus improve perceived quality.

¯ Need for coordination. If semantic information does notexist in the information flow, every instance requiringthat information must develop methods for generatingand storing it. This in turn leads to inefficient andrepetitive analysis of content and varying storagemethods. In addition, if the content descriptions are laterchanged, distribution might lead to consistencyproblems.

In the SmartPush system, semantic analysis and metadatageneration are tightly integrated with the authoring. Thismeans that during the content creation phase authorsgenerate the required semantic information with the help of

86

a custom-made tool (Figure 5). This tool generates suggestion of suitable semantic information for theinformation flow, after which authors make necessarymodifications and publish this information together withthe actual content.

Figure 5. Content provider’s tool formetadata analysis and creation.

Metadata for Information FilteringThe following preconditions have to be met to successfullyuse metadata representations in information filtering.¯ Expressiveness and coverage. Metadata system must be

able to cover users’ interests sufficiently to makepcrsonalization decisions possible. One must understandwhat qualities customers value in the information flowand how these are expressed in the metadata structure.Defining such structure is a difficult task, becausecustomers have different interpretations andprioritizations regarding the content.

¯ Machine understandability. Once metadata is extracted,agents must be able to process it without humanassistance and without original content.

¯ Structural compatibility. Agents require uniform orcompatible metadata format to process contentoriginating from different sources. This dictates that theformat is flexible and expressive to accommodatedifferent needs, for example to describe different mediaformats like video and audio as well as material indifferent languages.

¯ Semantic compatibility. In addition to a common format,agents need shared understanding of the metadatasemantics. This is achieved with a common vocabularywith which to describe different types of content. If wewant to use the same metadata format with multipleorganizations, the descriptions must be agreed uponbetween these instances. Agents are of little help inresolving compatibility issues, as this task requires deepunderstanding of the content.

Several standardization efforts for creating uniformmetadata structures are under way. The Resource

Description Framework, RDF (Lassila and Swick 1998), a data model for describing attributes and relationsbetween documents. RDF is implemented with ExtensibleMarkup Language, XML (W3Cb 1998). Another effort Dublin Core (Weibel and Miller 1997), which defines a setof attributes for storing bibliographical information.Information Content & Exchange, ICE, concerns logisticinformation for republishing and syndicating electroniccontent.

None of these metadata efforts define usable semanticsfor describing content. We have chosen RDF as themetadata interchange format for our system, but havedeveloped our own method for describing topics coveredby an article. Our method consists of several independentdimensions. Each dimension describes one aspect of thecontent such as source, geographical location, story type orsubject area.

Each dimension has its own data model. For describingthe subject area of the material we have extended a simplehierarchic category system. The document’s relation todifferent concepts is expressed as weights in the leaf nodesof a hierarchical concept model (Figure 6). By expressingthe relative importance of different concepts as weights wegain a lot more expressive power compared to a simplebinary belongs-to relation. This model provides also anatural way for summarizing related lower level topics tohigher level concepts. In the current model the contentprovider determines the concept model, but it could also bedefined and refined by feedback from the users.

I News

]1

, [ ,Economies

Literature0.1

J

Sports 0.9

I , I[ St=k markets

[ rr=k ,fio,dII I

Diving. ]

Sailing0.1 0.8

IWater sports

0.9

Wind surfing I

Figure 6. Simplified example of a hierarchical model.

SmartPush Architecture

One part of the SmartPush project is to implement aworking prototype for an information targetingenvironment. The underlying goal is to develop aframework where a content producer can provide amultitude of different, customized information feeds andwhere the end user can filter different incoming feeds toget a personalized information service.

87

Metadata flow- from the content

producer

Matching Agent- matches documentmetadata againstregistered profiles

Profile Agent- manages profiles- handles feedback

Web

Delivery Agent SMS- communicateswith the user

Email

Pushchannel

End user

Figure 7. Main agent types.

Our architecture rests on two assumptions aboutsemantic metadata. Firstly, we assume that it will bepossible to create a domain specific ontology capable ofexpressing the different facets on which the end userdecides his or her interest for a given information item.Secondly, we assume that it will be not only possible butalso commercially viable to enhance the content withmetadata descriptions as it is created.

SmartPush relies on high quality metadata representationof the content. We can think of the content producers assources for metadata flows that describe the introduction ofnew content in the producers’ multimedia databases. Thisflow can be distributed to any and all interested parties,since it acts as an advertisement for the actual content. Themetadata flow is used to tell which content referencesshould be delivered to each user.

This type of a system can be implemented as amonolithic information system based on a central databaseor as a system of relatively independent software entities.We believe that in today’s networked informationmarketplace a rigid centralized system can not keep upwith the changes in the environment. Content providerscan no longer control the whole distribution chain. InternetService Providers, portal managers and others interfacingwith the end user need professional, high quality content.Thus there is a growing need for a distributed system thatchannels high quality content to the right users.

In the SmartPush system there exist three different majoragent types. The tasks involved in managing user profiles,matching incoming metadata records and deliveringinformation to the user are divided among three differentprocesses (Figure 7). For facilitating dynamic collaborationamong the agents we need also a coordination agent andontology agents. This division of tasks among severaldifferent agents allows flexibility in distributing processingload, changing the flow of information on the fly anddistributing tasks across organizational boundaries.

Profile Agent

The profile agent’s tasks involve managing profile and userinformation (Figure 8). When a new user enters the systemhe or she needs a fresh profile, which can be created inmany ways. For example the user can rank sampledocuments and the system derives the profile informationfrom them. The user can also give explicit keywordsdescribing the profile or pick one or more prototypeprofiles as a baseline profile that will start to adapt. Oncethe interest profile is created it needs to be storedpermanently.

The user needs to specify also options for delivering theresults of the information filtering process. The profilecontains information about how the system should actwhen sufficiently interesting material appears. If thesystem judges that the user’s interest for an incomingdocument exceeds a certain threshold it can take directaction such as sending an email message to deliver thesuggestion. An example of a more passive delivery optionis keeping a list of ten most interesting suggestions for theuser to retrieve at his discretion.

Figure 8. Profile agent administration view.

88

Adaptation is an important part of the overall system andrequires either implicit or explicit feedback from the user.As the user gives feedback for the system’s documentsuggestions the profile agent’s job is to modify the contentsof the interest profile to be better in line with the feedbackreceived over time. This means that the profile agent mayneed to maintain explicit feedback history as well ashistory of the documents that have been suggested to theuser and the user’s reactions to them. Depending on theimplementation of the adaptation algorithm historyinformation may be stored in a summary form or as acomplete log.

The location of the profile agent in the content provider -mediator - consumer chain has bearing on what functionsare possible. The profile can be stored and managed on theend user’s machine. This discloses as little information aspossible about the user, but makes all aspects of inter-agentcommunication and social filtering harder to implement.Furthermore the user has to carry a device containing theprofile with him for using the system on a mobile platformor at different locations. Distributed profile managementmakes also software upgrades more troublesome.

We have made the decision to have the profile agent inthe network hosted by an information broker. One or moreprofile agents provide services to several users, managetheir profiles and modify the profiles as the users interactwith the system. Thus the user can contact his or her agentwhen and where he or she wants and make changesindependent of location, available bandwidth, or device.

Separating the different functions in the delivery chainnecessitates trusting only one party, the profile agent, withthe user’s interest information. The user’s identity can behidden from the matching agent and content provider ifneeded.

A database covering several users’ profiles provides alsoa foundation for exploring social information filtering(Shardanand and Maes 1995). Similarities between profilescan be used to detect common areas of interest uncoveredby each individual’s profiles. The users benefit from thisanalysis of the profile database by getting suggestions theymight otherwise have missed.

Matching AgentThe actual matching process is carried out in a matchingagent (Figure 9). The interaction between the profile agentand the matching agent functions on subscribe-unsubscribebasis, where the profile agent expresses the user’s interestareas by sending over a compact interest representation.The matching agent sees the user as an abstractidentification code, an interest profile and possible deliveryoptions such as a minimum interest value defined forincoming suggestions.

rn ’

Figure 9. Matching agent status view.

The task of the matching agent is to compare incomingmetadata records against the base of registered profiles.The agent listens for incoming metadata, decodes andparses it as it arrives and compares it against each profile.The comparisons are based on a distance measure thatmeasures how far the document is from each profile(Figure 10). We have developed an asymmetric distancemeasure that considers how well the document fits somepart of the profile (Savia, Kurki, and Jokela 1998). Thebest matching document does not have to cover all theareas that the user has interest for. Many other systems usea symmetric distance measure, such as the cosine measureused with the term vector model.

http:#smartpush:cs:hut.111 10:1http:l/vcww.h ut.fi! ~i 0111Markkinoilla SP:n inopeaap~i~itSst~ikiiteltiin 10:005...3 kk:n korko 4,73 prosenttla 10:087.:.

, Nesteell~ nyt 25 000osakkeenornistajaa 10,086;,:

Figure 10. Example suggestions with distance measures.

Optimized computations are required for a scalablematching process. Our model for content description ishierarchic and summarizes the user’s interest to several toplevel categories. This offers a natural way to limit thenumber of profiles that have to be considered as possiblematches. The agent can take into account only thoseprofiles that have something in common with the documenton the top level of the hierarchy.

Delivery AgentOnce the matching agent has decided to suggest a certainreference to the user it hands it over to the delivery agent.The delivery agent’s role is to deliver the suggestion to theuser according to the user’s delivery options.

The delivery agent has different modules as its actors forcommunicating the suggestions to the user. The deliveryprocess can involve two-way communication through arelatively restricted channel. For example the user can askfor current suggestions by sending a short message from amobile phone to the delivery agent and have it return asuggestion list.

89

The basic idea in using a separate delivery agent is toseparate the end user communication details from therelatively high level interagent communications. This waynew communication and interaction mechanisms can beadded without changes in the basic setup.

Ontology AgentThe incoming metadata flow and the profiles managed bythe profile agent have to have common representations forthe matching agent to be able to make the comparisons in ameaningful way. Even if the matching agent understandsthe digital representations, they have to be compatible alsoon a conceptual level.

There is no doubt that creating a universal conceptualmodel and wide scale adoption of a single model isimpossible in the real world. Thus we need to take intoaccount the need for ontology agents that are able totranslate between different representations available fromdifferent content producers. If there is no ontology agentthat is able to offer at least a partial bridge between thesemantic islands the system remains specific to a singlesource of information.

Coordination AgentIf there are individual agents that need to discover eachother, we need a coordination agent. In our case forexample dynamic introduction of new ontologies anddelivery services necessitates a system where new servicesannounce their availability and contact information to acentral coordination agent that acts as a clearinghouse forthe current capabilities of the whole system.

A linear distribution chain consisting of a single triple ofone matching agent, one profile agent and one deliveryagent does not really need automatic coordinationtechniques. The agents’ interconnections are hard coded orset up once with no dynamic reconfiguration.

Agenthood in SmartPushWhen applying "Agenthood as an ascriptor" (Bradshaw1997) model to the SmartPush agents, we can easily admitthat the components are not really agents. As creators ofthe system we possess knowledge about its inner workingsand regard it as a deterministic system. However, if weobserve the system from the user’s point of view, thesystem may seem to be a rational entity with its capabilityto learn from users and to recommend interestinginformation.

As a common definition for agenthood does not exist,we analyze some common agent qualities and comparethese to the existing and near future SmartPusharchitecture.

MediationAgents have been characterized as intelligent middlewaremodules that mediate between resources and customers

(Wiederhold 1992). Our agents’ tasks can be mostlyconsidered as mediation services. They access data frommultiple resources, and represent the information in anintegrated and homogenized package to the users.

Resource models and customer models in mediationservices are conceptually based on hierarchies (Wiederholdand Genesereth 1997). This allows inheritance, selection,and aggregation in the model. Hierarchical approach is alsoa key to SmartPush functionality.

The matching agent is the matchmaker between themetadata used as the resource model, and the customermodel that represents the end user. This agent acts as alogical mediator that connects the user with rightinformation objects, as they become available. Thefunctionality of the matching agent is similar to content-based routing of matchmakers in (Kuokka 1995).

Profile agents encapsulate user’s interests and representthe user to other agents in the system. They provide amediation service functioning as gatekeepers by notrevealing private information unless necessary.

AdaptivityThe profile agent acts as an interface agent (Maes 1994)that monitors user’s behavior, learns from theseinteractions, and adapts to user’s needs. This meanschanging user’s interest profile according to monitoredbehavior.

SmartPush users provide feedback on suggestions andthe system is able to adapt to changing interests. Thus thesystem acts as an adaptive agent by learning and improvingits performance with experience.

Cooperation and CommunicationShoham has defined agenthood as "agent is a softwareentity which functions continuously and autonomously in aparticular environment, often inhabited by other agents andprocesses" (Shoham 1997).

Social ability, meaning an agent possessing explicitmodel of other agents, does not exist in SmartPush. Theonly moment in SmartPush where an agent has knowledgeabout interiors of another agent is when they communicatedirectly with the same data structures.

SmartPush agents are not currently cooperative, butcollaborative filtering functionality is to be incorporated tothe future versions of the system. When those facilities areready, SmartPush will be able to recommend documentsbased also on other people’s interests. Whethercollaborative filtering qualifies for multi-agent behavior, iswhole another question.

Without common semantics and understanding of thecontent there is little to communicate between differentparts of the system. SmartPush does not currently use anygeneric agent communication language (ACL) such KQML, but instead relies on Java Remote MethodInvocation (RMI) mechanism for communication betweenthe agents.

90

Without any coordination or ontology translationservices in the current implementation there is little needfor a universal communication mechanism. All agents inthe system use shared, proprietary concepts. The mainadvantages for using separate agents for different tasksrelate to load balancing, scalability issues and privacy.

In the future multiple content providers using differentconceptual models require the use of an ACL. Withaccessible ontology agents SmartPush would be able toconsolidate information from different sources soproviding the user with more coherent service.

AutonomyWhether the SmartPush system acts proactively orreactively depends on the point of view. From customer’spoint of view the system is proactive with its ability tomonitor its environment and recommend new material byusing its information push facilities.

If we observe the system from the content provider’sviewpoint, it seems to act reactively. From that angle thesystem is activated by the distribution of a new informationobject, which triggers the delivery of new material to therelevant customers. All in all, the system actsautonomously after the initialization.

MobilityThe current model of dividing tasks among differentmodels does not require agent mobility. The tasks areclearly divided and there is no need to migratefunctionality from one agent to another. Bandwidth savingsthat motivate mobile agent architectures are not significantin our approach (Harrison, Chess, and Kershenbaum 1995).

Related WorkIn most agent-based information filtering systems agentsanalyze the information in complete and original form andcreate an index for document matching (Moukas and Maes1998, Mostafa et al. 1997). Our matching agent comparesmetadata descriptions of incoming material against userprofiles as new material is published. The computationalrequirements for the matching process are therefore greatlyreduced.

Information Lens (Malone et al. 1987) is a rule-basedsystem for information filtering for co-operative workenvironments inside organizations. Users create their ownagents that filter the information based on the metadata thatthe material contains.

Fishwrap is a personalized newspaper that receivesmaterial from multiple information sources. The systemuses a knowledge-base of topic definitions and matchesincoming material against the specified topics (Chesnais1995). This pre-categorizing significantly speeds up thepersonalization process for each Fishwrap reader.

InfoFinder is an agent that learns to find and categorizeinformation using sample documents selected by the user.By using these samples, it extracts semantically significant

phrases from documents and tries to learn optimal searchqueries for each category (Krulwich 1997).

Edlnfo project at Swedish Institute of Computer Scienceuses edited adaptive hypermedia model, which combineshuman expertise with machine intelligence in the filteringprocess (Rudstrtm, Waern, H66k 1997). A human expertacts as the information broker between information sourcesand readers. Edlnfo users are able to collaborativelyannotate and categorize the information. This approach hasbeen tested in the ConCall system in filtering conferencecall messages (Waern et al. 1998). In SmartPush, assume that the human expertise for metadata creation canbe located at the information source instead ofintermediary human brokers. However, the possibility offlexible user defined classification schemas is worthexploring.

Push technology has been touted as the solution toinformation overload problem. The information isdistributed using the channel metaphor where users switchbetween channels of different types of information content.Most commercial push-oriented services, such as PointCastNetwork, are based on static user models derived from dataentered directly by the user. Our work is more focused onpersonalization and adaptivity, leaving push technology asone option for delivery.

In our earlier work, we have specified a generic agent-based architecture for news filtering (Turpeinen et al.1996), and introduced the idea of using edited semanticmetadata in the filtering process (Saarela et al. 1997).

Conclusions and Future WorkWe have presented in this paper an information deliveryarchitecture based on semantic metadata and agents. Onceprofessional authors have generated both the content andits semantic metadata, agents add flexibility to the deliveryby performing user profiling, document matching anddelivery tasks.

The idea in our approach is to use the author’s expertiseand perform the semantic analysis of the content as early aspossible, within the creation of the actual content. By usingwell-defined and expressive semantic models, we cangenerate personalized information flows with relativelystraightforward computations. Incorporated semanticinformation allows agents to concentrate on the deliverywithout a need to understand the content. These ideastogether lead to a more accurately targeted and higherquality information flow to the customers.

This paper describes work in progress. Our work has hada strong emphasis on methodologies for content and profilerepresentation as well as for matching algorithms. Thepresent implementation of the system can be seen as aproof of concept working on artificial data and profiles.Our current focus is shifting towards a pilotimplementation and testing the ideas with a continuousreal-world data flow. The future work in the projectincludes also a field trial within a commercial online newsservice.

The ideas presented in this paper have backing frommedia industry professionals and the described scenarioseems viable for applications in today’s online media.However, as the current results are still purely academic,the ideas can be fully validated only after the completion ofthe pilot phase.

AcknowledgementsThis research is part of the SmartPush project sponsored bythe Finnish Technology Development Centre TEKES andAlma Media, WSOY, Sonera, ICL, Nokia Research Centreand TeamWARE Group.

ReferencesANPA Format: Wireless Service Transmission Guidelines1984. Report Number 84-2. American NewspaperPublishers Association. Washington, DC.

Bradshaw, J. 1997. An Introduction to Software Agents. InSoftware Agents, Bradshaw, J.(ed.), AAAI Press / TheMIT Press. Cambridge, USA.

Chesnais, P.; Mucklo, M.; and Sheena J. A. 1995. TheFishwrap Personalized News System, IEEE SecondInternational Workshop on Community NetworkingIntegrating Multimedia Services to the Home. June 1995,Princeton, New Jersey, USA

Harrison, C. G.; Chess, D. M.; and Kershenbaum A. 1995.Mobile agents: Are they a good idea? IBM ResearchDivision, T.J. Watson Research Center. Yorktown Hights,NY, U.S.A.

IFLA 1998. Metadata Resourceshttp://www.nlc-bnc.califiallIImetadata.htm

Krulwich, B. and Burkey, C. 1997. The InfoFinder Agent:Learning User Interests through Heuristic PhraseExtraction. IEEE Expert, September/October 1997.

Kuokka, D. and Harada 1995. Supporting InformationRetrieval via Matchmaking. Working Notes 1995 AAAISpring Symposium on Information Gathering inHeterogeneous, Distributed Environments. TechnicalReport SS-95-08, AAAI Press, 1995.

Lassila, O. and Swick, R. 1998. Resource DescriptionFramework (RDF) Model and Syntax Specification.W3C Working Draft 08 October 1998,http://www.w3.org/TR/WD-rdf-syntax/ (work-in-progress).

Maes, P. (1994). Agents that Reduce Work andInformation Overload. Communications of the ACM, Vol.37, No. 7.

Malone, T. W.; Grant K. R.; Turbak F. A.; Brobst S. A.;and Cohen M. D. 1987. Intelligent information-sharingsystems. Communications of the ACM, Vol. 30, No. 5, pp.390-402.

Mostafa, J.; Mukhopadhyay, S.; Lam W.; and Palakal M.1997. A Multilevel Approach to Intelligent InformationFiltering: Model, System, and Evaluation. ACMTransactions on Information Systems, Vol. 15, No. 4,October 1997, pp. 368-399.

Moukas, A. and Maes, P. 1998. Amalthaea: An EvolvingMulti-Agent Information Filtering and Discovery Systemfor the WVdW. Autonomous Agents and Multi-AgentSystems, Vol.1, No.I, pp. 59-88

Rudstr6m, ~.; Waern, A.; H65k, K. 1997. InteractiveAdaptation of Intranet Newsletters. Sixth InternationalConference on User Modeling, Sardinia, 2-5 June 1997

Saarela, J.; Turpeinen, M.; Puskala, T.; Korkea-aho, M.;and Sulonen, R. 1997. Logical Structure of a HypermediaNewspaper. Information Processing & Management, Vol.33 No. 5, pp. 599-614.

Salton, G. and McGill, M.J. 1983. Introduction to ModernInformation Retrieval. McGraw-Hill, New York

Savia, E.; Kurki, T., Jokela, S. (1998). Metadata basedMatching of Documents and User Profiles. In Human andArtificial Information Processing, Proceedings of the ~hFinnish Artificial Intelligence Conference, pp. 61-69:Finnish Artificial Intelligence Society.

Shardanand, U. and Maes, P. 1995. Social InformationFiltering: Algorithms for Automating "Word of Mouth".CHI’95 Proceedings, Denver, Colorado, USA.

Shoham, Y. 1997. An Overview of Agent-OrientedProgramming. In Software Agents, Bradshaw, J.(ed.),AAAI Press / The MIT Press. Cambridge, USA.

Turpeinen, M.; Saarela, J.; Korkea-aho, M.; Puskala, T.;Sulonen, R. 1996. Architecture for Agent-mediatedPersonalised News Services. PAAM’96: The FirstInternational Conference on Practical Application ofIntelligent Agents and Multi-Agent Technology, 22-24April 1996, London, UK.

W3Ca 1998. Metadata Activity. W3C Technology andSociety domain,http://www.w3.org/Metadata

W3Cb 1998. Extensible Markup Language (XMLTM). W3C

Architecture Domain,http://www.w3.org/XML/

92

W3Cc 1998. The Information and Content Exchange (ICE)Protocol. Submission to the Worm Wide Web Consortium,http://www.w3.org/TR/1998/NOTE-ice- 19981026

Weibel, S. and Miller, E. 1997. Dublin Core MetadataElement Set WWW homepage.http://purl.org/metadata/dublin_core

Waern, A.; Tierney, M.; Rudstr6m, A.; and Laaksolahti, J.1998. ConCall: An information service for researchersbased on EdInfo. SICS Technical Report T98-04, October1998.http://www.sics.se/-mark/papers/t9804/t98-04.htm

Wiederhold, G. 1992. Mediators in the Architecture ofFuture Information Systems. Computer, Vol. 25, No.3,March 1992, pp. 38-49.

Wiederhold, G. and Genesereth, M. 1997. The ConceptualBasis for Mediation Services. IEEE Expert,September/October 1997, pp. 38-47.

93