Collaborative Search and Retrieval: Finding Information Together


Mark van Setten and Ferial Moelaert-El Hadidy

Telematica Instituut

P.O. Box 589

7500 AN Enschede, The Netherlands

+31 53 4850485

[email protected], [email protected]

ABSTRACT

The emerging field of collaborative search and retrieval

offers exciting possibilities for both businesses and

consumers. This paper offers insight into several types of

information search and retrieval that combine the field of

computer supported collaborative work with information

search and retrieval. We examine the roots of collaborative

search and retrieval, the major challenges, the types of

solutions that exist today, what we can expect in years to

come and issues that need to be addressed.

Keywords

Collaborative Search and Retrieval, Collaboration,

Browsing, Filtering, Profiles, Agents

INTRODUCTION

Consider the following situation: A TV producer looking

for a video fragment of a running horse on a beach requests

a librarian to retrieve this fragment from a large collection

of video information. A few days later the producer

receives a tape with a number of scenes. Not satisfied with

the results, the producer asks the librarian for the same type

of fragments but without any buildings or people in the

scene. This time the producer finds a scene that matches his

needs.

Consumers of information, like the one sketched above,

have one thing in common: they face the challenge of

finding relevant information. This becomes more difficult

due to the explosive growth of available digital information.

If the TV producer could collaborate with the librarian

during the search process, a quicker and better query

formulation could be achieved. This increases the quality of

the search results and decreases the duration of the search

process.

The converging boundaries between the richness of high-

resolution video, high-fidelity sound, the interactive content

of the PC and the Internet add even more complexity. These

challenges are visible at both the content production and

deployment stage. Content production represents the

practice of creating, indexing, cataloging, and storing media

assets in digital form so that they can be searched, shared,

distributed, re-purposed, and e-commerce enabled. Content

deployment is concerned with disclosure and delivery of

digital content to end users, based upon the information

need of those end users.

Effective management of content and media assets will lead

to better utilization of intellectual capital. Ultimately, the

goal is effective knowledge management, the fullest

exploitation of our intellectual capital that is the basis upon

which companies will soon be valued. Using collaboration

in the content management chain is a strategy that may lead

to more effective knowledge management.

This paper offers an introduction to the emerging area of

using collaboration in the content deployment process.

Information search and retrieval, which is concerned with

the disclosure and delivery of digital content to the end

users, is one of the main parts of the deployment process.

Combining computer supported collaborative work

(CSCW) and information search and retrieval results in

collaborative search and retrieval: the support of

collaboration in searching and retrieving digital information

by integrated applications using telecommunication and/or

information technology.

Various research projects tackle the field of collaborative

search and retrieval. GroupWeb [8], CSCW3 [9], and Let’s

Browse [13] support collaboration during browsing.

Ariadne [24] focuses on the use of search histories, while

Fab [1], Sifter [12], NewT [14], and Phoaks [22] use

collaboration in filtering and recommending information.

Most of this research focuses on just one of the types of

collaborative search and retrieval. This paper examines the

roots of collaborative search and retrieval, the major

challenges, the types of solutions that exist today, what we

can expect in years to come and issues that need to be

addressed.

This paper starts with an overview of the major types of

collaborative search and retrieval that we distinguish. These

types are then described in more detail, including

challenges and possible solutions. Where available,

examples of research and tools are given. Furthermore, we

identify several general issues that need to be addressed for

all types of collaborative search and retrieval.

COLLABORATIVE SEARCH AND RETRIEVAL TYPES

There are three main ways for users to search for

information: browsing, querying and filtering. Browsing is

the process of searching for information by navigating

through links between electronic documents. Querying is

searching for information by explicitly formulating the

user’s need in a question. Filtering is the process in which a

system filters a vast amount of information and only

delivers or recommends information to the user that is

relevant to the user.

Within search and retrieval, there are two types of players:

users and systems (also called agents). Retrieved

information can also be re-used. Furthermore, a

distinction between two types of collaboration can be made:

working together and mediated working. Working together

means two or more players are both directly involved in the

same task and are equal partners, whereas mediated

working means that one player performs a task by using

another player as an expert who is doing part of the task,

based on his expertise.

Based on these aspects, we distinguish five types of

collaborative search and retrieval:

1. Collaborative browsing: collaboration when users are browsing;

2. Mediated searching: collaboration when users and/or systems are working with mediators during querying;

3. Collaborative information filtering: collaboration during information filtering;

4. Collaborative agents: collaboration between systems;

5. Collaborative re-use of results: collaboration when retrieved information is re-used.

Besides these five types of collaborative search and

retrieval, there is also an interesting side effect that can be

enhanced when using collaboration in search and retrieval,

which is building communities of interest.

These five types of collaborative search and retrieval and

building communities of interest are subsequently discussed

in the remainder of this section.

Collaborative Browsing

With collaborative browsing, people are searching and

retrieving information together via browsing. Browsing can

be seen as an implicit way to formulate a query [26].

Another type of browsing is when someone is not searching

for a specific document, but is merely exploring a certain

subject.

Advantages

One of the major problems with browsing is that the user

does not always know up-front whether a link will actually

bring him closer to his goal or if the new information is

interesting. Browsing together with other people can reduce

this problem. Collaborative browsing can help answer the

question of “which link should be followed next?” as some

participants may already have read some of the referenced

documents.

By using collaborative browsing, we believe that group

understanding can be improved, as team members become

aware of reasons behind choices and know why some

documents are more relevant than others, because they were

all involved in the browsing process. This also helps the

process of teambuilding, as people are actually working

together in performing a task.

Collaborative browsing also enables division of labor.

When a group needs to browse through an extensive

collection of references, each member can explore a subset

of this collection. This makes the whole browsing process

more efficient [28]. When dividing labor, two processes

become important: splitting and joining. Splitting is the

division of tasks amongst the group members, after which

each member individually performs the tasks assigned to

him. At the end, the results of the individual tasks must be

joined to create a combined end result.
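The splitting and joining steps can be sketched in a few lines. This is only an illustration under stated assumptions: the collection is a plain list of links, tasks are handed out round-robin, and each member reports the links judged relevant. The names `split_links` and `join_results` are hypothetical and do not come from any of the cited systems.

```python
def split_links(links, members):
    """Splitting: divide the links round-robin among the group members."""
    assignment = {member: [] for member in members}
    for index, link in enumerate(links):
        member = members[index % len(members)]
        assignment[member].append(link)
    return assignment


def join_results(individual_results):
    """Joining: merge each member's relevant links into one duplicate-free list."""
    joined = []
    seen = set()
    # Members are visited in sorted order so the joined list is reproducible.
    for member in sorted(individual_results):
        for link in individual_results[member]:
            if link not in seen:
                seen.add(link)
                joined.append(link)
    return joined
```

Round-robin assignment is the simplest possible policy; a real system might instead assign links according to each member's expertise.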

An advantage that follows from both the division of labor

and the sharing of knowledge is that collaborative browsing

allows groups to build up expertise quickly, especially

when there is collaboration between experts in different

domains [28]. When information is needed that is situated

on the boundaries of two or more different knowledge

domains, experts from each field may not have enough

knowledge of the other domains to distinguish between

relevant and irrelevant information. When experts browse

together, they can consult and help each other in finding the

most relevant information.

Another area of collaborative browsing can be found in the

entertainment domain, with games like team scavenger

hunts [28], e.g. Quake – capture the flag.

Challenges

Currently, there are three major challenges when people

want to browse collaboratively:

The first challenge is that every participant has his own

background, knowledge and interests (which is also an

advantage in cross-domain browsing). Although this can

make the choice of the next link to follow easier (due to

possible up-front knowledge of participants), it can also

make the choice more difficult as conflicting opinions may

arise amidst the participants. There are some solutions to

address this issue. One solution is the use of user profiles.

For each participant there is a user profile that describes his

interests and backgrounds. A collaborative browsing system

can use these profiles to advise the group about the links

that best match the interests of all participants. Another

solution is using predefined group goals that are used by

the system in a similar way as user profiles. These solutions

are related to filtering systems (see the collaborative

information filtering section). Finally, a collaborative

browsing system can take advantage of a voting system,

where users vote for the link to follow. The system collects

these votes and uses them to either advise on the link to

follow or automatically follow the link with the most

votes. Combinations of the described solutions are also

possible.
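As a minimal sketch of the voting solution, assume each participant casts one vote for a link and the system either reports the ranking as advice or follows the top-ranked link. The function names and the alphabetical tie-break are illustrative assumptions, not part of any system described here.

```python
from collections import Counter


def rank_links(votes):
    """Return (link, vote_count) pairs, most votes first; ties broken by link name.

    `votes` maps each participant to the link that participant voted for.
    """
    counts = Counter(votes.values())
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def winning_link(votes):
    """Return the link to follow automatically, or None if nobody voted."""
    ranking = rank_links(votes)
    return ranking[0][0] if ranking else None
```

A system that only advises would display the output of `rank_links`; a system that navigates on behalf of the group would call `winning_link`.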

Another main challenge of collaborative browsing is that all

participants need to read simultaneously through the

displayed information. During browsing, all participants

read information from one or more screens. When a

document does not fit on one screen, scrolling through the

document is necessary. With more than one user, this is

complicated as everyone reads at a different speed. A

solution for this problem is allowing each participant to

have a certain degree of freedom in reading through the

information. Degrees of freedom in reading through

document(s) are (based on the zipper model [5], [6], which

identifies various levels on which states can be dynamically

coupled and uncoupled in a collaborative system, and

applied to browsing):

• Fixed page scrolling: All participants watch the same

information simultaneously, and scroll to the same

position in the document. This type of collaborative

browsing can be provided by using application sharing

on existing browsers. It does not allow users to read

information independently and does not address the

challenge we described above.

• Independent page scrolling: all users are looking at the

same document on their own screens and they can

scroll through the current page independently. When

one user chooses to browse to another page, all

browsers of the other users automatically go to that

page. On that page, they can again read and scroll

through the information independently. This is a form

of relaxed “what-you-see-is-what-I-see” (WYSIWIS)

and can for example be found in GroupWeb [8].

• Independent document browsing: participants cannot

only read and scroll through the current page, they can

also start reading other directly linked pages. The level

of depth to which individual members can browse is

determined up-front. There is one user who is in

control of the common “base” page and who can

synchronize the screens of all participants. This is very

useful for group browsing, as it allows each user to

explore a subset of links, after which the whole group

can make a better decision on which links are relevant

for the whole group.

• Independent site browsing or independent related

browsing: this allows participants to browse completely

independently. The collaborative browsing system keeps

track of where each individual participant is and has

been. It can inform other participants of pages that

other members already read. It can also synchronize the

individual screens to one page of interest. This type of

collaborative browsing is related to collaborative

filtering systems, but in this situation the participants

are browsing synchronously.
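The four degrees of reading freedom can be read as a table stating which participant actions are mirrored on every screen. The sketch below encodes that reading; the action names ("scroll", "navigate", "sync") and the table itself are our own simplification for illustration, not a specification of the zipper model or of any cited system.

```python
from enum import Enum


class ReadingFreedom(Enum):
    FIXED_PAGE_SCROLLING = 1
    INDEPENDENT_PAGE_SCROLLING = 2
    INDEPENDENT_DOCUMENT_BROWSING = 3
    INDEPENDENT_SITE_BROWSING = 4


# Actions that are replayed on all screens at each level (assumed mapping).
# "sync" stands for an explicit request to bring all screens to one page.
MIRRORED_ACTIONS = {
    ReadingFreedom.FIXED_PAGE_SCROLLING: {"scroll", "navigate"},
    ReadingFreedom.INDEPENDENT_PAGE_SCROLLING: {"navigate"},
    ReadingFreedom.INDEPENDENT_DOCUMENT_BROWSING: {"sync"},
    ReadingFreedom.INDEPENDENT_SITE_BROWSING: {"sync"},
}


def is_mirrored(action, freedom):
    """Return True if the action should be replayed on every participant's screen."""
    return action in MIRRORED_ACTIONS[freedom]
```

The difference between the last two levels (a bounded link depth versus unrestricted browsing with tracking) is not captured by this table and would need additional state.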

Related to the previous challenges is the question of

who can actually activate the next link. One possibility is to

give one user the control. When deciding which link to

follow he may use the advice of others (using voting, a

common goal, a recommendation of the system based on

user profiles and common goals etc). Another type of

control is when the system selects the link to follow, based

on voting of the participants, common goals and/or user

profiles. In this situation the participants have no direct

influence over the link to follow after it has been calculated

by the system. It is also possible to give every user the

rights to navigate, but this may lead to conflicts.

Spatial and temporal aspects

Based on the place dimension in the time-place matrix of

CSCW [10], two types of collaborative browsing are

distinguished: co-located and remote. With co-located

collaborative browsing all the participants are in the same

location, whereas with remote collaborative browsing some

or all participants are at different locations.

The time dimension in the time-place matrix can also be

used to differentiate between two types of collaborative

browsing: synchronous collaborative browsing and

asynchronous collaborative browsing. When people

browse synchronously, they read the information, choose

the next link and follow these links simultaneously. With

asynchronous browsing, participants read the current

information and choose the next link to follow at different

times. Depending on the type of decision-making (voting,

user profiles, common goals or some combination)

participants must wait until some sort of synchronization

point has been reached. Examples of synchronization points

are time, a minimum number of participants that have read

the information or an intervention from a coordinator.

Especially for asynchronous browsing the issues of splitting

and joining are important.
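The three example synchronization points (time, a minimum number of readers, a coordinator intervention) can be combined in a single check. The sketch below is illustrative only; the parameter names are hypothetical and any one satisfied condition is assumed to be sufficient.

```python
def sync_point_reached(elapsed_seconds, readers_done, *,
                       time_limit=None, min_readers=None,
                       coordinator_says_go=False):
    """Return True once any configured synchronization point has been reached.

    A condition left as None is treated as not configured.
    """
    if coordinator_says_go:
        return True
    if time_limit is not None and elapsed_seconds >= time_limit:
        return True
    if min_readers is not None and readers_done >= min_readers:
        return True
    return False
```

For example, `sync_point_reached(120, 2, time_limit=300, min_readers=3)` is false until either five minutes have passed or a third participant has finished reading.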

Current systems

Currently, there are some systems available for

collaborative browsing, but they do not cover all the

possibilities of collaborative browsing. One research system

for collaborative browsing is called “Let’s Browse” [13].

This system is used for synchronous co-located

collaborative browsing, using user interests.

Another possibility is using application sharing systems,

like those available as part of Microsoft NetMeeting¹ and

Friends [27]. In these applications, a remote user can view

(and if allowed even control) the screen of another user’s

system. This allows multiple users to browse together when

used in combination with remote conference facilities like

Internet Phone, video conferencing or chat facilities.

¹ http://www.microsoft.com/netmeeting

Other synchronous browser sharing systems are GroupWeb

[8], CSCW3 [9] and the proxy-based approach for

cooperative WWW Browsing described in [3].

WebEx² is a system for remote meetings via the web. One

of the functionalities is collaborative browsing. The degree

of reading freedom is rather low, as the users can only

watch the chairman browse through the information and

they are not able to scroll through the current document

themselves.

There is an application that uses an electronic equivalent of

post-it notes to support remote asynchronous collaborative

browsing. This application is called uTOK³. With this

application users can leave electronic notes on web pages.

Users visiting a web page can read the notes placed by

other users. These notes contain questions, suggestions and

remarks about the web page.

Currently most systems for collaborative browsing hardly

support splitting and joining and only allow fixed page

scrolling or independent page scrolling. To take full

advantage of the possibilities of collaborative browsing, we

recommend that both splitting and joining and other degrees

of reading freedom should be supported.

This section discussed the use of collaboration during

browsing, while the next section discusses the use of

collaboration when users and/or systems are working with

mediators during querying.

Mediated Searching

There are situations in which people need the help of other

people. This is also true when searching and retrieving

information. Although a person may have knowledge about

the domain he works in, he may not be an expert in

searching information, for which he needs help. Someone

may also need help when trying to find information about

new topics outside his own knowledge domain. In both

situations, people turn to others to support them in their

search tasks.

The classic example of a mediated search environment is

the library where librarians offer support to a user. Another

example is booking a trip for a conference or holiday. Tour

operators help booking a seat on an airplane, a hotel and a

rental car. Also the use of call centers and helpdesks that

help people to find information or to solve problems is an

example of mediated searching. These types of mediators

have existed for several years now, but mediation is not always

effective or efficient. In some situations, problems occur

(the hotel is not what the customer asked for, the plane

leaves a day too late or the requested information formally

meets the query, but it is not exactly what the customer

wanted). It can also require many interactions between the

customer and the mediator before the mediator fully

understands the customer’s needs.

² http://www.webex.com

³ http://www.utok.com

Figure 1. Query Formulation and Result Browsing (diagram labels: Goals; Query Formulation; Matching; Result Presentation; Query Reformulation; Translating goal into query; Processing query; Returning results; Matching results with goals)

Information and communication technology (ICT) can offer

support for some of these situations. The support is

described on two levels: the query formulation level and the

result browsing with query reformulation level. Figure 1

visualizes these two levels in the search and retrieval

process. On the first level, the search expert and customer

use ICT to formulate the need of the customer in a way that

allows them to both understand this need. Tools are needed

to help the (remote) communication between customer and

expert, e.g. phone support, e-mail and video-conferencing.

Collaborative result browsing is the same as collaborative

browsing, except that the roles of the participants differ. There

are participants who are search experts (the mediator) while

others are domain experts (the users). An example is the

librarian who is the search expert and the library customer

who is the domain expert. For more information about

collaborative browsing, see the previous section.

Figure 2. Relationship between Query Quality and Result Browsing

There is a relationship between the quality of query

formulation and the need for collaborative result browsing.

When query formulation quality is high (left side of

Figure 2) the need for collaborative result browsing is low

as the search results will be more relevant and contain fewer

irrelevant items, and vice versa (right side of Figure 2).

For collaborative query formulation, there are two ways in

which collaboration can take place: direct collaboration

and indirect collaboration. Direct collaboration means

collaborative query formulation using tools, whereas

indirect collaboration is performed by using search history.

Collaborative query formulation using tools

Some information search and retrieval systems require

complex query formulation. For such systems, experts are

usually needed to translate users’ needs into correct and

efficient system queries. When this translation process is an

asynchronous and/or remote process, the interaction

between the user and the expert is limited. This increases

the potential for translation errors, frequently resulting in

incorrect queries and wrong results. In this case, tools are

needed to help improve the understanding between the

user and the expert.

The actual need of collaborative query formulation also

depends on the quality and ease-of-use of the user interface

of the information search and retrieval system. We believe

that well-designed and easy-to-use interfaces result in less

need for experts. An alternative to collaborative

query formulation is the use of artificial intelligence

dialogue systems that help the user to formulate a query.

We believe that for specialized domains and cross-domain

searches, the need for mediation by experts still remains.

Using search history

Another class of mediated searching is the reuse of search

history. Search history is the recorded and stored

information about a user’s interactions with the system

during a search (e.g. query formulation, browsing through

results, query reformulation etc). A non-expert user can use

the search history of experts (or other users, including

himself) to learn how they search for information. In this

case, the knowledge of an expert is used in an indirect

manner. The search history can be visualized in a manner

that can easily be interpreted by others. Besides viewing, a

user can even replay (parts of) the search process during his

own search process using different parameters. An example

of this type of expert search history reuse is implemented in

the Ariadne system [24]. Search history can also be used the

other way around. According to Twidale and Nichols [23],

experts can use the search history of a user to give that user

suitable advice on how to search for required information.
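To make the idea of recording and replaying a search history concrete, the following sketch stores each interaction as a step and replays the steps with substituted terms. It is a simplified illustration under our own assumptions; the class names and the text-substitution mechanism are hypothetical and do not describe how Ariadne [24] is implemented.

```python
from dataclasses import dataclass, field


@dataclass
class SearchStep:
    action: str   # e.g. "query", "browse", "reformulate"
    detail: str   # e.g. the query text or the visited result


@dataclass
class SearchHistory:
    user: str
    steps: list = field(default_factory=list)

    def record(self, action, detail):
        """Store one interaction with the search system."""
        self.steps.append(SearchStep(action, detail))

    def replay(self, run_step, substitutions=None):
        """Re-run every recorded step, optionally with different parameters.

        `run_step(action, detail)` is supplied by the caller and performs
        the step; `substitutions` maps old terms to their replacements.
        """
        substitutions = substitutions or {}
        results = []
        for step in self.steps:
            detail = step.detail
            for old, new in substitutions.items():
                detail = detail.replace(old, new)
            results.append(run_step(step.action, detail))
        return results
```

A non-expert could, for instance, replay an expert's history for "horse beach" with the substitution `{"horse": "dog"}` to reuse the expert's sequence of reformulations on a different topic.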

This section discussed the use of collaboration when users

and/or systems are working with mediators during querying.

The next section discusses the use of collaboration in

information filtering.

Collaborative Information Filtering

Instead of explicit collaboration, like collaborative

browsing and most types of mediated search and retrieval,

there is also a type of implicit collaboration. In these

situations, users are helped in their information search and

retrieval by services that recommend information or show

the relevancy of information based on the use, history and

ratings of these documents by other users. Within this kind of

collaboration, we distinguish three types:

• Information filtering, which filters information based

on the interests of the user and the content of the

information;

• Collaborative filtering, which filters information based

on the interests of the user and the ratings of

information by users with similar interests;

• Collaborative information filtering, which filters

information based on the interests of the users, the

content of the information and the ratings of

information by users with similar interests.

Information filtering

In single user situations, information filtering is the process

in which a system filters a vast amount of information and

only delivers information that is relevant to the user. In this

situation the collaboration is between a human and a

system. In contrast with information retrieval systems,

information filtering systems are commonly used to support

long-term information needs of a particular user or group

with similar needs. Where information retrieval systems

operate on a relatively static set of documents (at least

during the process of information retrieval), information

filtering systems operate on a continuously changing stream

of documents (like newsgroups). The documents in this

stream need to be identified and valued based on their

relevance for a particular user or group. Information

filtering systems calculate the relevancy of a document

based on their knowledge about what is relevant for that

user. This knowledge is stored in the user’s profiles [26].

The main method used for calculating the relevancy of

documents is using feature extraction. Features of a

document are calculated and stored in the user’s profile with

their importance weights. The most widely used feature

extraction methods use categorizations of documents into

classes [4], [17].

The key aspect of information filtering is that the system

bases its recommendations on the content and its

knowledge about the interests of the user. The interest of a

user can vary in time [12]. Within most of these systems

user relevance feedback plays an important role. Relevance

feedback allows users to indicate the actual relevancy of a

document that has been read. This indication is then used

by the system to improve the user’s profile (in particular the

weights for the different features/categories).
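The interplay between a weighted profile and relevance feedback can be sketched as follows. We assume, purely for illustration, that a profile is a mapping from feature (or category) names to weights, that a document's relevance is the weighted sum of its feature values, and that feedback nudges the weights with a simple additive update. This is not the update rule of Sifter, Fab or any other cited system.

```python
def relevance(profile, features):
    """Predicted relevance: weighted sum of the document's feature values."""
    return sum(profile.get(name, 0.0) * value for name, value in features.items())


def apply_feedback(profile, features, feedback, learning_rate=0.1):
    """Return a new profile adjusted toward the user's relevance feedback.

    `feedback` is the relevance the user actually assigned to the document;
    the weights of the features present in it move in proportion to the
    prediction error (an assumed, deliberately simple update rule).
    """
    error = feedback - relevance(profile, features)
    updated = dict(profile)
    for name, value in features.items():
        updated[name] = updated.get(name, 0.0) + learning_rate * error * value
    return updated
```

After positive feedback on a document, the predicted relevance of that same document rises, which is the behaviour relevance feedback is meant to produce.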

Although these principles seem to work in practice, there

are limitations and problems associated with them ([1],

[12], [17]):

1. Because a set of classes or features is used, not all the

interests of every user can exactly be captured in these

classes or features. This makes every set inaccurate.

2. Finding the set of categories is difficult. It is not only a

finite set that must represent an infinite set of

documents, the environment in which the optimal set

must be determined is also dynamic (new information

is added, old information removed, other documents

are updated etc). The available information changes

over time.

3. Users must give feedback in the form of rankings.

Users are often not willing to give this feedback,

mainly because it is extra work. Furthermore, the

rankings may sometimes appear inconsistent to the

filtering system. Users may be uncertain about their

needs or they may not be very discriminating when

ranking documents (giving all documents an average

rating). The set of categories used may not correspond

with the way the user would normally group his

documents.

4. Information filtering systems result in over-

specialization. The information that users get is

restricted only to what is selected via their profile. But

interests of users can change over time. These shifts

can happen quickly or may occur slowly over time.

Shifts in interest should be detected as soon as possible

by the filtering systems to prevent degradation in the

quality of the suggested documents.

5. Analysis of the content is, in general, very shallow.

Only certain aspects of the content are used in the

analysis. For more complex media, like audio and

video, content analysis techniques are in the early

stages of development.

Different systems handle these issues in different ways.

Some systems incorporate algorithms to detect shifts in the

user’s interests, like Sifter [12], [17]. Other systems like

Phoaks [22] try to incorporate collaborative aspects in

information filtering to solve these problems. These last

types of systems will be the focus of the next two sections.

Collaborative Filtering

Collaborative filtering (also called social filtering) systems

also recommend information to users. These systems search

for users with similar interests and recommend items those

users liked. Instead of computing the similarity between

items (content), the system computes the similarity between

user interests [16]. How much others liked an item is

derived from the ratings they gave it. Information

filtering systems look at the content of the information,

whereas collaborative filtering systems look at the opinions

of users with similar interests. In these systems users are

identified as being similar if a good correlation can be

found between the ratings of documents made by these

users.
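
The correlation-based matching described above can be sketched as follows. This is a minimal illustration, not the method of any cited system: it assumes Pearson correlation over the documents two users have both rated, and a prediction formed from the mean-adjusted ratings of positively correlated users. The function names, the toy ratings and the weighting scheme are all assumptions made for this example.

```python
from math import sqrt

def pearson(ratings_a, ratings_b):
    """Pearson correlation over documents rated by both users; 0.0 if undefined."""
    common = sorted(set(ratings_a) & set(ratings_b))
    if len(common) < 2:
        return 0.0
    xs = [ratings_a[d] for d in common]
    ys = [ratings_b[d] for d in common]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)

def predict(user, doc, all_ratings):
    """Predict user's rating of doc from positively correlated users who rated it."""
    own = all_ratings[user]
    own_mean = sum(own.values()) / len(own)
    num = den = 0.0
    for other, theirs in all_ratings.items():
        if other == user or doc not in theirs:
            continue
        weight = pearson(own, theirs)
        if weight <= 0:
            continue  # ignore users with dissimilar or unrelated taste
        other_mean = sum(theirs.values()) / len(theirs)
        num += weight * (theirs[doc] - other_mean)
        den += weight
    # Fall back to the user's own average when no similar user rated the document.
    return own_mean if den == 0 else own_mean + num / den

# Hypothetical toy data: user -> {document -> rating on a 1-5 scale}.
ratings = {
    "ann": {"d1": 5, "d2": 1, "d3": 4},
    "bob": {"d1": 4, "d2": 2, "d3": 5, "d4": 5},
    "cat": {"d1": 1, "d2": 5, "d3": 2, "d4": 1},
}
```

In this toy data, "ann" correlates positively with "bob" and negatively with "cat", so only bob's rating influences the prediction for the unseen document "d4". No content analysis of the documents is involved at any point.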

There are two major variants for collaborative filtering

systems [15]:

• Open group collaborative filtering: In this variant, all

users are part of a virtual community of people that do

not know one another. This means that

recommendations are based on ratings of users that do

not know each other.

• Closed group collaborative filtering: When a closed

user group is used, all users know each other. Knowing

the other users can influence the trust a user has in the

recommendations of the system. It is also an incentive

for users to rate items correctly.

One of the key advantages of collaborative filtering systems

is that they work on all media types, as no content analysis is

needed. Another advantage of collaborative filtering

systems is that they can discover new items of interest to the

user, simply because other people liked them. There are

however two main limitations to collaborative filtering

systems [7]:

• Scarcity problem: This means that information objects

that are rather obscure (a limited number of people

prefer these items) receive almost no ratings, which

means that users who are interested in these obscure

items will not have them recommended.

• Early-rater problem: This problem refers to the fact

that a recommendation of new items is not possible, as

nobody has rated them yet. It also refers to the fact that

the first users of the system get no or very limited (and

often incorrect) recommendations, due to the fact that

there are almost no ratings in the system (yet). Also a

user who starts using the system will get rather

inaccurate recommendations because the system has

not yet been able to learn the interests of that user.

Collaborative Information Filtering, which is discussed

next, can help to overcome these two limitations.

Collaborative Information Filtering

The approach of collaborative information filtering

combines techniques from collaborative filtering (finding

similar users) with techniques from information filtering

(filtering based on content). The main purpose of these

approaches is to build collaborative filtering systems that

avoid the rating scarcity and early-rater problems.

There are several approaches, such as:

• Communicating agents approach;

• Correlating profiles approach;

• Filterbots.

The Communicating Agents approach [14] uses agents in

the sense of information filtering agents. An agent will

try to filter all documents for one user, based on the

interests of that user. The difference between the traditional

information filtering and the communicating agents

approach is that the agent also asks advice from the agents

of other users. Based on both its knowledge of the user and

the advice of other agents, the agent recommends

documents to the user. Using relevance feedback, the agent

updates its knowledge about the user and the confidence it

has in the other agents whose advice was used. For

information filtering, two experimental systems were

developed [14]: NewT to filter Usenet news messages and

Maxims to filter e-mail.

The approach of Balabanović & Shoham [1] is called the

Correlating Profiles approach. In this approach, user

profiles are based solely on content analysis and the

relevance feedback of the user. These profiles are compared

with the profiles of other users to identify similar users.

Users receive documents both when these score highly

against their own profile and when they are rated highly by

users with a similar profile. In their implementation, called

Fab, they use two types of agents. The collection agent

finds pages for a specific topic and the profile of a

collection agent represents a topic of interest to a

dynamically changing group of users. The selection agent

finds pages for a specific user and represents a single user’s

interest. The selection agent’s profile represents multiple

interests for one user and may be served by several

collection agents. The main difference between the

correlating profiles and the communicating agent approach

is that with the latter approach a personal agent asks advice

from other agents, without directly accessing other users’

ratings. With the correlating profiles approach a personal

agent has direct access to the ratings of all other users.
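
The correlating profiles idea can be illustrated with a small sketch. It is not the Fab implementation: it assumes that profiles and documents are sparse term-weight dictionaries, that cosine similarity is used both to match documents to a profile and to find users with similar profiles, and that the thresholds shown are reasonable. All names and values are hypothetical.

```python
from math import sqrt

def cosine(p, q):
    """Cosine similarity between two sparse term-weight dictionaries."""
    dot = sum(w * q.get(t, 0.0) for t, w in p.items())
    norm_p = sqrt(sum(w * w for w in p.values()))
    norm_q = sqrt(sum(w * w for w in q.values()))
    return 0.0 if norm_p == 0 or norm_q == 0 else dot / (norm_p * norm_q)

def recommend(user, profiles, docs, ratings,
              content_min=0.5, peer_min=0.8, liked=4):
    """Return ids of documents that match the user's own profile
    or that were rated highly by a user with a similar profile."""
    own = profiles[user]
    # Similar users are found by comparing content-based profiles, not ratings.
    peers = [u for u in profiles
             if u != user and cosine(own, profiles[u]) >= peer_min]
    picked = []
    for doc_id, terms in docs.items():
        by_content = cosine(own, terms) >= content_min
        by_peers = any(ratings.get(p, {}).get(doc_id, 0) >= liked for p in peers)
        if by_content or by_peers:
            picked.append(doc_id)
    return picked

# Hypothetical toy data.
profiles = {
    "ann": {"horse": 0.9, "beach": 0.4},
    "bob": {"horse": 0.8, "beach": 0.5},
    "cat": {"finance": 1.0},
}
docs = {
    "d1": {"horse": 1.0},
    "d2": {"sailing": 1.0},
    "d3": {"finance": 1.0},
}
ratings = {"bob": {"d2": 5}, "cat": {"d3": 5}}
```

Here "ann" receives "d1" because of its content, and "d2" because "bob", whose profile is similar, rated it highly even though its content does not match her profile. Document "d3" is not recommended, since "cat" is not a similar user.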

Filterbots [19] are automated rating robots that evaluate

new documents as soon as they are published and enter

ratings for these documents. A collaborative filtering

system treats a filterbot as a “normal” user that enters many

ratings but does not request any predictions. The

collaborative filtering engine need not even know whether

users are filterbots or humans. The authors of a filterbot

need not be concerned with the use of the filterbot in a

collaborative information filtering system. They only need

to develop algorithms that analyze a document and return a

rating to the collaborative filtering engine. Filterbots can be

especially useful when used in a collaborative information

filtering system that correlates users with other users’

profiles. This allows a filterbot to be used for a user only

when it matches that user’s interests. When the profile of a user

correlates with the filterbot, the ratings from that filterbot

are weighted higher. Filterbots are either simple algorithms

that convert a new document into a rating (e.g. counting the

total number of words in the document) or more advanced

learning agents that change based on their usefulness to a

group. In addition, “personal” filterbots can be developed

that try to learn the needs of one user (for example by

learning to use a group of simple filterbots that are most

suitable for this one user).
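
A simple filterbot of the word-counting kind mentioned above might look like the sketch below. The `RatingStore` class is a toy stand-in for a collaborative filtering engine, and the mapping from word count to a 1-5 rating is an arbitrary assumption chosen for illustration; neither is taken from the GroupLens system.

```python
class RatingStore:
    """Toy stand-in for the rating table of a collaborative filtering engine."""

    def __init__(self):
        self.ratings = {}  # rater id -> {document id -> rating}

    def add_rating(self, rater, doc_id, rating):
        # The store does not distinguish human raters from filterbots.
        self.ratings.setdefault(rater, {})[doc_id] = rating

def length_filterbot(text, words_per_point=50):
    """Rate a document from 1 to 5 using only its word count."""
    n_words = len(text.split())
    return min(5, 1 + n_words // words_per_point)

def publish(store, doc_id, text, bots):
    """On publication, let every filterbot enter a rating like a normal user."""
    for name, bot in bots.items():
        store.add_rating(name, doc_id, bot(text))

store = RatingStore()
store.add_rating("ann", "d1", 4)  # a human rating, stored the same way
publish(store, "d2", "word " * 120, {"length_bot": length_filterbot})
```

Because the bot's ratings sit in the same table as human ratings, a correlation-based engine would weight "length_bot" highly only for users whose own ratings happen to agree with it.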

More information about these types of collaborative search

and retrieval systems is found in [14], [16] and [20]. This

section discussed the use of collaboration in information

filtering. The next section discusses collaboration between

systems.

Collaborative Agents

Not only people can collaborate; systems can also work

together to search and retrieve information. Due to the

increasing size of available information and the distributed

nature of the Internet, it is impossible for one system to

know all available content on the Internet. Systems that

perform (search and retrieval) tasks on behalf of someone

else (human or system) are generally called agents.

Good examples of collaborative agents are meta-search

engines. The main characteristic of a meta-search engine is

that it does not search the web itself for information. It

passes the query to other search engines, collects the results

from the other search engines, combines these results and

presents these combined results to the user. Examples of

meta-search engines are MetaCrawler⁴, Mamma.com⁵ and

Metagopher⁶.
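
The pass-collect-combine cycle of a meta-search engine can be sketched as follows. The two engine functions are hypothetical stand-ins for remote search engines, and the merging rule (summing reciprocal ranks so that URLs returned by several engines or ranked near the top come first) is an assumption for this example; actual meta-search engines may combine results differently.

```python
def meta_search(query, engines, limit=10):
    """Send the query to every engine and merge the ranked URL lists."""
    scores = {}
    for search in engines:
        for rank, url in enumerate(search(query), start=1):
            # A URL returned by several engines, or near the top, scores higher.
            scores[url] = scores.get(url, 0.0) + 1.0 / rank
    # Sort by descending score; ties are broken alphabetically by URL.
    ranked = sorted(scores, key=lambda u: (-scores[u], u))
    return ranked[:limit]

# Hypothetical stand-ins for remote search engines.
def engine_a(query):
    return ["http://example.org/a", "http://example.org/b"]

def engine_b(query):
    return ["http://example.org/b", "http://example.org/c"]

results = meta_search("running horse beach", [engine_a, engine_b])
```

The meta-search function never indexes any content itself; it only relies on the result lists that the underlying engines return.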

Other examples of collaborative agents are special search

tools that must be installed on the user’s computer. The

advantage of a local meta-search application is that it can

save queries and results locally. This makes it possible to

schedule updates of queries, so up-to-date results are always

available. Some of these applications also use specific

search engines depending on the subject for which

information is requested (e.g. musical search engines for

queries with musical subjects). Examples of these types of

products are Copernic⁷ and AnswerChase⁸.

Another advantage of collaborative agents is parallel

processing (splitting the work). Similar agents get part of

the search task, which makes the whole search process more

efficient. This technique is not specific to certain types of

applications, but can be used by any search application.

Several well-known search engines use multiple agents in

parallel to probe the web for new content.
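
Splitting a search task over similar agents and joining their partial results can be sketched as below. This is only an illustration under simple assumptions: the "agents" are threads in one process, the collection is an in-memory dictionary, and matching is a plain substring test. All names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def split(items, parts):
    """Divide items into `parts` roughly equal, interleaved slices."""
    return [items[i::parts] for i in range(parts)]

def search_partition(partition, keyword):
    """One agent's share of the work: scan its documents for the keyword."""
    return [doc_id for doc_id, text in partition if keyword in text.lower()]

def parallel_search(documents, keyword, agents=3):
    """Run one agent per partition and join the partial results."""
    partitions = split(list(documents.items()), agents)
    with ThreadPoolExecutor(max_workers=agents) as pool:
        partial = pool.map(lambda p: search_partition(p, keyword), partitions)
        return sorted(hit for hits in partial for hit in hits)

# Hypothetical toy collection.
documents = {
    "d1": "A horse running on a beach",
    "d2": "Stock market report",
    "d3": "Wild horses in the dunes",
    "d4": "Beach volleyball finals",
}
```

In a distributed setting each partition would typically be handled by an agent on a different machine, which is where the efficiency gain described above would come from.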

This section discussed collaboration between systems. The

next section discusses the use of collaboration when re-

using retrieved information.

Collaborative Re-use of Results

In collaborative settings people usually share search results.

There are four ways in which people share found

information [18]:

• Sharing results with other members of a team;

• Broadcasting interesting information;

• Acting as a consultant for other information searchers;

• Archiving.

In all these types of sharing, information can be shared

either unmodified or modified. Unmodified sharing of

results is simply the process of forwarding (electronically or

physically) the results to other people. The user may

provide the information to different users, depending on

their interests (as far as these are known to the providing

user). This process can be seen as a manual filtering system.

It is also possible that someone first modifies the

information before sharing it with others (reversed content

provision). These modifications can range from

summarizing to analyzing and rewriting the information.

Adding annotations to the information is also a type of

modification.

4 http://www.metacrawler.com

5 http://www.mamma.com

6 http://www.metagopher.com

7 http://www.copernic.com

8 http://www.answerchase.com

This section discussed the use of collaboration when re-

using retrieved information. The next section discusses an

interesting side effect of using collaboration in the search

and retrieval process.

Building Communities of Interest

An effect of the Internet is that it brings people with similar

interests together from all over the world. This provides

opportunities for collaboration. But due to the vast amount

of available information, it is not easy to find people with

the same interests. Existing collaborative search and

retrieval techniques, specifically collaborative (information)

filtering, are already equipped with techniques for finding

similar users. They only need extra functionality to allow

similar users to contact each other. There are already some

collaborative search and retrieval tools that incorporate

functionalities specifically created for this purpose.

An example of such an application is Cobrow⁹, which is an

application that shows people that are browsing in the

vicinity of web pages the user is currently viewing. Icons

with names visualize the other users. A user can directly

contact them by starting a chat session or running a web

telephony application.

One possibility offered by these types of applications is that

web store customers can talk to each other and discuss the

available products [11]. A store representative can also get

into contact with possible customers. In other words, it

brings content providers closer to content consumers.

ISSUES

In the previous sections, we described five types of

collaborative search and retrieval. Also a side effect of

collaboration in search and retrieval has been described,

namely building communities of interest. Several

applications already use these techniques, but most are not

widely used, apply only to textual information and have

limited collaborative aspects. There are issues that need to

be addressed to make collaborative search and retrieval

systems successful. Some of these issues are of a technical

nature, while others are more socially or economically oriented.

This section describes some of the major issues that we

identify.

9 http://www.cobrow.com

Multimedia

Current information filtering systems and collaborative

information filtering systems are mostly based on textual

information (even in video and music recommendation

systems, the recommendations are based on the textual

metadata available). Until recently, the technology and

algorithms for building these systems on other media than

text were not available. Even now, the possible feature

extractions for other media are still limited, or not directly

interpretable by humans (like color histograms).

As both information filtering and collaborative information

filtering base their recommendations on the content of the

information, the success of these systems for media other

than text depends largely on the development of new

user-oriented feature extraction algorithms. Another issue with

multimedia is the need for more bandwidth, when working

in a network environment. As the size of multimedia

documents is large, especially when including audio and

video, high bandwidth and reliable networks are required

for fast electronic transmission.

Quality of Tools

The quality of the collaborative search and retrieval tools

currently available is not sufficient. On the one hand, some of

these tools are limited in their range of functionality. Most

of the current collaborative browsing tools lack support for

multiple degrees of freedom in reading through the

information and do not (fully) support splitting and joining

of search tasks. On the other hand, the quality of information

offered by collaborative (information) filtering systems is

still limited, mainly due to problems with automated

calculation of information relevancy and the early-rater and

scarcity problems. The combination of information filtering

with collaborative filtering seems to offer better solutions,

but this is a rather new approach that needs more research.

Another issue regarding current collaborative search and

retrieval tools is the lack of standardization. Each tool uses

its own internally developed mechanisms, but for the acceptance of

collaborative systems, standards are necessary to ensure

interoperability between different tools.

Security and Access Rights

Not every piece of information is available to everyone.

Some information is classified or protected by security and

access rights. This raises the question of how a

collaborative search and retrieval system should handle

differences in security and access rights. For example, what

should a system do when one user has sufficient rights to a

piece of information but another does not, and they want to

browse to this information collaboratively? This issue must

be addressed in research and development of collaborative

search and retrieval systems.

Privacy

Collaborative search and retrieval systems store an

increasing amount of personal information about their users.

In collaborative (information) filtering this information is

necessary to find similar users, while in mediated systems

the search history of users needs to be stored. Without this

information, collaborative systems cannot perform their

functions optimally. There is a trade-off: users need to

sacrifice privacy to increase the functionality of

collaborative systems [24], [25]. Some people will have no

problems with sacrificing part of their privacy, while others

are unwilling to do so. A solution could be that

collaborative systems offer different levels of privacy and

functionality. Where possible, these systems need to store

information as anonymously and as securely as possible. The

most basic principle is that users must be made aware of

who stores what information about them and for what

reasons.

Ownership

An issue related to privacy is ownership. Who owns the

information stored about a user? Is it the user, the service

provider or a third party that gathers and stores the

information? Can a user make the service provider delete or

change his information? Is a service provider allowed to use

the stored information for other purposes than specified by

the collaborative system? Even if information is stored

completely anonymously this issue remains. Patterns found

in the stored information might be sufficient to uniquely

identify the person [24]. Most of these questions are of a

legal nature and should therefore be solved by laws or

agreements.

Trust and Payment

Another issue is trust. Do users trust the systems that

recommend information to them? Do users trust the

opinions of other users that are used to recommend

information? Related to this question of trust is the question

of payment. If information is free of charge, there is no

guarantee of its quality. The same holds for ratings.

Fee-based information and ratings might imply some form

of quality control. If experts are paid for rating information

from their domain, the quality of ratings will increase and

experts will become more willing to actually rate the

information. A new type of service that goes further than

fee-based information services is the value-added

information service [2], such as Lexis-Nexis¹⁰, Dialog¹¹

and Dow Jones Interactive¹². These services focus on the

information needs of business professionals: they collect

and archive high-quality business information. Access to

these services, however, requires a membership.

10 http://www.lexisnexis.com

11 http://www.dialog.com

12 http://www.djinteractive.com

CONCLUSIONS

As the Internet grows not only in bandwidth but also in the

amount of available information, it becomes more difficult

to find and retrieve information. Even with the current

search engines on the Internet, it is very hard to find high

quality and relevant information, especially when

multimedia information is required. Part of the solution to

this problem is the development of advanced search and

retrieval systems, in which the field of collaborative search

and retrieval will play a major role.

This emerging field offers exciting possibilities for both

businesses and consumers. This paper sheds light on

several types of information search and retrieval that

combine the field of computer supported collaborative

work with information search and retrieval. For some types

of collaborative search and retrieval, applications already

exist, but most of them are new and only used in small

communities. Before collaborative search and retrieval

systems can become a success, several issues have to be

addressed, for which research on collaborative search and

retrieval is needed.

We conclude that especially collaborative browsing and

collaborative (information) filtering are important types of

collaborative search and retrieval. For these types, research

should focus on issues such as degrees of freedom and

splitting and joining within collaborative browsing, and on

collaborative information filtering as a combination of

information filtering and collaborative filtering. Research

into technical issues alone is not sufficient; social and

economic issues need to be addressed as well.

This paper surveyed the field of collaborative search and

retrieval and is intended as a starting point for further

research within the GigaCE project of the Telematica

Instituut. More detailed information can be found in the

reports on which this paper is based [20], [21].

ACKNOWLEDGMENTS

This work is conducted within the GigaCE project, which is

part of the Dutch GigaPort project (www.gigaport.nl) that

focuses on the next generation Internet technologies and

applications. The Telematica Instituut is one of the key

players in the GigaPort project and manages the research

activities in the project. We thank Daan Velthausz, Henri

ter Hofte and Andrew Tokmakoff for their help in making

this paper possible.

REFERENCES

1. Balabanović, M. & Shoham, Y. Fab: content-based

collaborative recommendation. Communications of the

ACM, 40, 3 (March 1997), 66-72.

2. Bates, M.E. Selecting Business Intelligence Sources:

The Public Web vs. Value-Added Online Services,

White Paper from Dow Jones Reuters Business

Interactive LLC, 1997,

http://www.factiva.com/inspiring/feevfree/complete.htm

3. Cabri, G., Leonardi, L. & Zambonelli, F. Supporting

Cooperative WWW Browsing: a Proxy-based

Approach. In Proc. of the Seventh Euromicro Workshop

on Parallel and Distributed Processing, IEEE, 1999,

3-5.

4. Ehrmantraut, M., Harder, T., Wittig, H. & Steinmetz, R.

The personal electronic program guide – towards the

pre-selection of individual TV programs. Proc. ACM

CIKM ’96, 1996, 243-249.

5. ter Hofte, G.H. & van der Lugt, H.J. CoCoDoc:

a framework for collaborative compound document

editing based on OpenDoc and CORBA. Open

Distributed Processing and Distributed Platforms,

Proceedings of IFIP/IEEE international conference on

Open Distributed Processing and Distributed Platforms,

1997, 15-33.

http://extranet.telin.nl/dscgi/ds.py/Get/File-419

6. ter Hofte, G.H. Working Apart Together: Foundations

for Component Groupware, Telematica Instituut

Fundamental Research Series, vol. 001. Enschede, the

Netherlands: Telematica Instituut, 1998

http://www.telin.nl/publicaties/1998/wat/wat.htm.

7. Good, N., Schafer, B., Konstan, J.A., Borchers, A.,

Sarwar, B., Herlocker, J. & Riedl, J. Combining

Collaborative Filtering with Personal Agents for Better

Recommendation, American Association for Artificial

Intelligence, 1999.

8. Greenberg, S., and Roseman, M. GroupWeb: A WWW

Browser as Real Time Groupware. In Companion to the

Proceedings of ACM SIGCHI’96, 1996, 271-272.

9. Gross, T. The CSCW3 Prototype – Supporting

Collaboration in Global Information Systems. In

Conference Supplement of the Fifth European

Conference on Computer-Supported Cooperative Work

– EC-CSCW’97 (Sept. 7-11, Lancaster UK).

10. Johansen, R. Groupware: computer support for business

teams, New York, NY: Free Press, 1988.

11. Kobayashi, M., Shinozaki, M., Sakairi, T., Touma, M.,

Shahrokh, D., Wolf, C. Collaborative Customer Services

Using Synchronous Web Browser Sharing. In Proc. of

ACM CSCW’98, 1998, 99-108.

12. Lam, W., Mukhopadhyay, S., Mostafa, J. & Palakal, M.

Detection of shifts in user interests for personalized

information filtering. Proc. ACM SIGIR’96, 1996,

317-325.

13. Lieberman, H., van Dyke, N.W., & Vivacqua, A.S. Let’s

Browse: a collaborative web browsing agent. Proc.

ACM IUI ’99, 1999, 65-68.

14. Maes, P. Agents that reduce work and information

overload. Communications of the ACM, 37, 7 (1994),

31-40.

15. McCarthy, J.F. InfoShare: a system to support co-

operative information seeking in a real community of

users. In Churchill, E., Snowdon, D., Golovchinsky, G.

(Eds.), Proceedings of CSCW’98 workshop on

Collaborative and co-operative information seeking in

digital information environments, 1998.

16. Mladenic, D. Text-learning and related intelligent

agents: a survey. IEEE Intelligent Systems, (July-August

1999), 44-54.

17. Mostafa, J., Mukhopadhyay, S., Lam, W. & Palakal, M.

A Multilevel Approach to Intelligent Information

Filtering: Model, System, and Evaluation, ACM

Transactions on Information Systems, 15, 4 (1997),

368-399.

18. O’Day, V.L. & Jeffries, R. Information artisans: patterns

of result sharing by information searchers. Proc. ACM

COOCS’93, 1993, 98-107.

19. Sarwar, B.M., Konstan, J.A., Borchers, A., Herlocker,

J., Miller, B. & Riedl, J. Using filtering agents to

improve prediction quality in the GroupLens research

collaborative filtering system. Proc. of CSCW’98, 1998,

345-354.

20. Setten, M., Moelaert-El Hadidy, F. Search and

Retrieval: Collaborative Search and Retrieval, GigaCE

report, Telematica Instituut, The Netherlands.

21. Setten, M., Moelaert-El Hadidy, F. New Services:

Collaborative Search and Retrieval, GigaCE report,

Telematica Instituut, The Netherlands.

22. Terveen, L.G., Hill, W.C., Amento, B., McDonald, D. &

Creter, J. Building task-specific interfaces to high

volume conversational data. Proc. ACM CHI’97, 1997,

226-233.

23. Twidale, M.B., Nichols, D.M., Mariani, J.A., Rodden, T.

& Sawyer, P. Supporting the active learning of

collaborative database browsing techniques. Association

For Learning Technology Journal, 3, 1 (1995), 75-79.

24. Twidale, M.B. & Nichols, D. Collaborative browsing

and visualization of the search process, Proc. ASLIB,

1996, 177-182.

25. Twidale, M.B. & Nichols, D.M. A Survey of

Applications of CSCW for Digital Libraries: Technical

Report CSEG/4/98. Lancaster UK: Lancaster University

Computing Department, 1998.

26. Velthausz, D.D. Cost-effective network-based

multimedia information retrieval, Telematica Instituut

Fundamental Research Series, vol. 003. Enschede, the

Netherlands: Telematica Instituut, 1998.

http://www.telin.nl/publicaties/1998/admire/admire.htm.

27. Verhoosel, J.P.C., Wibbels, M., Batteram, H.J. &

Bakker, J.L. Rapid service development on a TINA-

based service deployment platform, Proc. of TINA’99,

1999, http://extranet.telin.nl/dscgi/ds.py/Get/File-1919

28. Zeballos, G.S. Tools for efficient collaborative web

browsing. In Churchill, E., Snowdon, D., Golovchinsky,

G. (Eds.), Proceedings of CSCW’98 workshop on

Collaborative and co-operative information seeking in

digital information environments, 1998.