Collaborative Search and Retrieval: Finding Information Together
Mark van Setten and Ferial Moelaert-El Hadidy
Telematica Instituut
P.O. Box 589
7500 AN Enschede, The Netherlands
+31 53 4850485
[email protected], [email protected]
ABSTRACT
The emerging field of collaborative search and retrieval
offers exciting possibilities for both businesses and
consumers. This paper offers insight into several types of
collaborative search and retrieval, which combines the field of
computer supported collaborative work with information
search and retrieval. We examine the roots of collaborative
search and retrieval, the major challenges, the types of
solutions that exist today, what we can expect in years to
come and issues that need to be addressed.
Keywords
Collaborative Search and Retrieval, Collaboration,
Browsing, Filtering, Profiles, Agents
INTRODUCTION
Consider the following situation: A TV producer looking
for a video fragment of a running horse on a beach requests
a librarian to retrieve this fragment from a large collection
of video information. A few days later the producer
receives a tape with a number of scenes. Not satisfied with
the results the producer asks the librarian for the same type
of fragments but without any buildings and people in the
scene. This time the producer finds a scene that matches his
needs.
Consumers of information, like the one sketched above,
have one thing in common: they face the challenge of
finding relevant information. This becomes more difficult
due to the explosive growth of available digital information.
If the TV producer could collaborate with the librarian
during the search process, a quicker and better query
formulation could be achieved. This increases the quality of
the search results and decreases the duration of the search
process.
The converging boundaries between the richness of high-
resolution video, high-fidelity sound, the interactive content
of the PC and the Internet add even more complexity. These
challenges are visible at both the content production and
deployment stage. Content production represents the
practice of creating, indexing, cataloging, and storing media
assets in digital form so that they can be searched, shared,
distributed, re-purposed, and e-commerce enabled. Content
deployment is concerned with disclosure and delivery of
digital content to end users, based upon the information
need of those end users.
Effective management of content and media assets will lead
to better utilization of intellectual capital. Ultimately, the
goal is effective knowledge management, the fullest
exploitation of our intellectual capital that is the basis upon
which companies will soon be valued. Using collaboration
in the content management chain is a strategy that may lead
to more effective knowledge management.
This paper offers an introduction to the emerging area of
using collaboration in the content deployment process.
Information search and retrieval, which is concerned with
the disclosure and delivery of digital content to the end
users, is one of the main parts of the deployment process.
Combining computer supported collaborative work
(CSCW) and information search and retrieval results in
collaborative search and retrieval: the support of
collaboration in searching and retrieving digital information
by integrated applications using telecommunication and/or
information technology.
Various research projects tackle the field of collaborative
search and retrieval. GroupWeb [8], CSCW3 [9], and Let’s
Browse [13] support collaboration during browsing.
Ariadne [24] focuses on the use of search histories, while
Fab [1], Sifter [12], NewT [14], and Phoaks [22] use
collaboration in filtering and recommending information.
Most of this research focuses on just one of the types of
collaborative search and retrieval. This paper examines the
roots of collaborative search and retrieval, the major
challenges, the types of solutions that exist today, what we
can expect in years to come and issues that need to be
addressed.
This paper starts with an overview of the major types of
collaborative search and retrieval that we distinguish. These
types are then described in more detail, including
challenges and possible solutions. Where available,
examples of research and tools are given. Furthermore, we
identify several general issues that need to be addressed for
all types of collaborative search and retrieval.
COLLABORATIVE SEARCH AND RETRIEVAL TYPES
There are three main ways for users to search for
information: browsing, querying and filtering. Browsing is
the process of searching for information by navigating
through links between electronic documents. Querying is
searching for information by explicitly formulating the
user’s need in a question. Filtering is the process in which a
system filters a vast amount of information and only
delivers or recommends information to the user that is
relevant to the user.
Within search and retrieval, there are two types of players:
users and systems (also called agents). Retrieved
information can also be re-used. Furthermore, a
distinction between two types of collaboration can be made:
working together and mediated working. Working together
means two or more players are both directly involved in the
same task and are equal partners, whereas mediated
working means that one player performs a task by using
another player as an expert who is doing part of the task,
based on his expertise.
Based on these aspects, we distinguish five types of
collaborative search and retrieval:
1. Collaborative browsing: collaboration when users are browsing;
2. Mediated searching: collaboration when users and/or systems are working with mediators during querying;
3. Collaborative information filtering: collaboration during information filtering;
4. Collaborative agents: collaboration between systems;
5. Collaborative re-use of results: collaboration when retrieved information is re-used.
Besides these five types of collaborative search and
retrieval, there is also an interesting side effect that can be
enhanced when using collaboration in search and retrieval,
which is building communities of interest.
These five types of collaborative search and retrieval and
building communities of interest are subsequently discussed
in the remainder of this section.
Collaborative Browsing
With collaborative browsing, people are searching and
retrieving information together via browsing. Browsing can
be seen as an implicit way to formulate a query [26].
Another type of browsing is when someone is not searching
for a specific document, but is merely exploring a certain
subject.
Advantages
One of the major problems with browsing is that the user
does not always know up-front whether a link will actually
bring him closer to his goal or if the new information is
interesting. Browsing together with other people can reduce
this problem. Collaborative browsing can help answer the
question of “which link should be followed next?” as some
participants may already have read some of the referenced
documents.
By using collaborative browsing, we believe that group
understanding can be improved, as team members become
aware of reasons behind choices and know why some
documents are more relevant than others, because they were
all involved in the browsing process. This also helps the
process of teambuilding, as people are actually working
together in performing a task.
Collaborative browsing also enables division of labor.
When a group needs to browse through an extensive
collection of references, each member can explore a subset
of this collection. This makes the whole browsing process
more efficient [28]. When dividing labor, two processes
become important: splitting and joining. Splitting is the
division of tasks amongst the group members, after which
each member individually performs the tasks assigned to
him. At the end, the results of the individual tasks must be
joined to create a combined end result.
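
The splitting and joining steps can be sketched as follows. This is only an illustrative sketch: the round-robin assignment, the de-duplicating merge and the function names `split` and `join` are assumptions made for the example, not a method taken from the cited work.

```python
from typing import Dict, List


def split(links: List[str], members: List[str]) -> Dict[str, List[str]]:
    """Splitting: deal the links out round-robin so each member gets a subset."""
    assignment: Dict[str, List[str]] = {member: [] for member in members}
    for index, link in enumerate(links):
        assignment[members[index % len(members)]].append(link)
    return assignment


def join(findings: Dict[str, List[str]]) -> List[str]:
    """Joining: merge the individual findings into one list without duplicates."""
    merged: List[str] = []
    for member in findings:
        for item in findings[member]:
            if item not in merged:
                merged.append(item)
    return merged
```

In practice the joining step is the harder one, since the individual results may overlap or conflict; the merge above only removes exact duplicates.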
An advantage that follows from both the division of labor
and the sharing of knowledge is that collaborative browsing
allows groups to build up expertise quickly, especially
when there is collaboration between experts in different
domains [28]. When information is needed that is situated
on the boundaries of two or more different knowledge
domains, experts from each field may not have enough
knowledge of the other domains to distinguish between
relevant and irrelevant information. When experts browse
together, they can consult and help each other in finding the
most relevant information.
Another area of collaborative browsing can be found in the
entertainment domain, with games like team scavenger
hunts [28], e.g. Quake – capture the flag.
Challenges
Currently, there are three major challenges when people
want to browse collaboratively:
The first challenge is that every participant has his own
background, knowledge and interests (which is also an
advantage in cross-domain browsing). Although this can
make the choice of the next link to follow easier (due to
possible up-front knowledge of participants), it can also
make the choice more difficult as conflicting opinions may
arise amidst the participants. There are some solutions to
address this issue. One solution is the use of user profiles.
For each participant there is a user profile that describes his
interests and backgrounds. A collaborative browsing system
can use these profiles to advise the group about the links
that best match the interests of all participants. Another
solution is using predefined group goals that are used by
the system in a similar way as user profiles. These solutions
are related to filtering systems (see the collaborative
information filtering section). Finally, a collaborative
browsing system can take advantage of a voting system,
where users vote for the link to follow. The system collects
these votes and uses them to either advise on the link to
follow or automatically follow the link with the most
votes. Combinations of the described solutions are also
possible.
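
Two of these solutions, voting and profile-based advice, can be sketched as follows. The data layout (profiles and links as sets of topic labels) and the rule that a link is scored by its weakest match over all participants are assumptions made for this example; the text does not prescribe a particular scoring rule.

```python
from collections import Counter
from typing import Dict, Set


def link_by_votes(votes: Dict[str, str]) -> str:
    """Return the link with the most votes; ties go to the link voted for first."""
    return Counter(votes.values()).most_common(1)[0][0]


def link_by_profiles(link_topics: Dict[str, Set[str]],
                     profiles: Dict[str, Set[str]]) -> str:
    """Return the link whose topics overlap with every participant's interests.

    Assumed rule: a link is only as good as its overlap with the least
    interested participant, so the minimum overlap is maximized.
    """
    def weakest_match(link: str) -> int:
        topics = link_topics[link]
        return min(len(topics & interests) for interests in profiles.values())

    return max(link_topics, key=weakest_match)
```

A combined solution could, for instance, use the profile-based score to break ties in the vote.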
Another main challenge of collaborative browsing is that all
participants need to read simultaneously through the
displayed information. During browsing, all participants
read information from one or more screens. When a
document does not fit on one screen, scrolling through the
document is necessary. With more than one user, this is
complicated as everyone reads at a different speed. A
solution for this problem is allowing each participant to
have a certain degree of freedom in reading through the
information. Degrees of freedom in reading through
document(s) are (based on the zipper model [5], [6], which
identifies various levels on which states can be dynamically
coupled and uncoupled in a collaborative system, and
applied to browsing):
• Fixed page scrolling: All participants watch the same
information simultaneously, and scroll to the same
position in the document. This type of collaborative
browsing can be provided by using application sharing
on existing browsers. It does not allow users to read
information independently and does not address the
challenge we described above.
• Independent page scrolling: all users are looking at the
same document on their own screens and they can
scroll through the current page independently. When
one user chooses to browse to another page, all
browsers of the other users automatically go to that
page. On that page, they can again read and scroll
through the information independently. This is a form
of relaxed “what-you-see-is-what-I-see” (WYSIWIS)
and can for example be found in GroupWeb [8].
• Independent document browsing: participants can not
only read and scroll through the current page, they can
also start reading other directly linked pages. The level
of depth to which individual members can browse is
determined up-front. There is one user who is in
control of the common “base” page and who can
synchronize the screens of all participants. This is very
useful for group browsing, as it allows each user to
explore a subset of links, after which the whole group
can make a better decision on which links are relevant
for the whole group.
• Independent site browsing or independent related
browsing: this allows participants to browse completely
independently. The collaborative browsing system keeps
track of where each individual participant is and has
been. It can inform other participants of pages that
other members already read. It can also synchronize the
individual screens to one page of interest. This type of
collaborative browsing is related to collaborative
filtering systems, but in this situation the participants
are browsing synchronously.
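
The four degrees of reading freedom can be modeled as an ordered set of levels. The sketch below is an assumption-laden illustration: the names `ReadingFreedom`, `scroll_is_shared` and `may_browse_alone`, and the `max_depth` parameter for the up-front depth limit, are hypothetical and not taken from the zipper model or from GroupWeb.

```python
from enum import IntEnum


class ReadingFreedom(IntEnum):
    """The four degrees of reading freedom, ordered from least to most free."""
    FIXED_PAGE_SCROLLING = 0
    INDEPENDENT_PAGE_SCROLLING = 1
    INDEPENDENT_DOCUMENT_BROWSING = 2
    INDEPENDENT_SITE_BROWSING = 3


def scroll_is_shared(level: ReadingFreedom) -> bool:
    """Only fixed page scrolling forces everyone to the same scroll position."""
    return level == ReadingFreedom.FIXED_PAGE_SCROLLING


def may_browse_alone(level: ReadingFreedom, depth_from_base: int, max_depth: int) -> bool:
    """Can a participant open a page without moving the rest of the group?"""
    if level == ReadingFreedom.INDEPENDENT_SITE_BROWSING:
        return True  # completely independent browsing
    if level == ReadingFreedom.INDEPENDENT_DOCUMENT_BROWSING:
        return depth_from_base <= max_depth  # limited to the agreed depth
    return False  # at the two lowest levels every navigation moves the group
```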
Related to the previous challenges is the question of
who can actually activate the next link. One possibility is to
give one user the control. When deciding which link to
follow he may use the advice of others (using voting, a
common goal, a recommendation of the system based on
user profiles and common goals etc). Another type of
control is when the system selects the link to follow, based
on voting of the participants, common goals and/or user
profiles. In this situation the participants have no direct
influence over the link to follow after it has been calculated
by the system. It is also possible to give every user the
rights to navigate, but this may lead to conflicts.
Spatial and temporal aspects
Based on the place dimension in the time-place matrix of
CSCW [10] two types of collaborative browsing are
distinguished: co-located and remote. With co-located
collaborative browsing all the participants are in the same
location, whereas with remote collaborative browsing some
or all participants are at different locations.
The time dimension in the time-place matrix can also be
used to differentiate between two types of collaborative
browsing: synchronous collaborative browsing and
asynchronous collaborative browsing. When people
browse synchronously, they read the information, choose
the next link and follow these links simultaneously. With
asynchronous browsing, participants read the current
information and choose the next link to follow at different
times. Depending on the type of decision-making (voting,
user profiles, common goals or some combination)
participants must wait until some sort of synchronization
point has been reached. Examples of synchronization points
are time, a minimum number of participants that have read
the information or an intervention from a coordinator.
Especially for asynchronous browsing the issues of splitting
and joining are important.
Current systems
Currently, there are some systems available for
collaborative browsing, but they do not cover all the
possibilities of collaborative browsing. One research system
for collaborative browsing is called “Let’s Browse” [13].
This system is used for synchronous co-located
collaborative browsing, using user interests.
Another possibility is using application sharing systems,
like those available as part of Microsoft NetMeeting¹ and
Friends [27]. In these applications, a remote user can view
(and if allowed even control) the screen of another user’s
system. This allows multiple users to browse together when
used in combination with remote conference facilities like
Internet Phone, video conferencing or chat facilities.
¹ http://www.microsoft.com/netmeeting
Other synchronous browser sharing systems are GroupWeb
[8], CSCW3 [9] and the proxy-based approach for
cooperative WWW Browsing described in [3].
WebEx² is a system for remote meetings via the web. One
of the functionalities is collaborative browsing. The degree
of reading freedom is rather low, as the users can only
watch the chairman browse through the information and
they are not able to scroll through the current document
themselves.
There is an application that uses an electronic equivalent of
post-it notes to support remote asynchronous collaborative
browsing. This application is called uTOK³. With this
application users can leave electronic notes on web pages.
Users visiting a web page can read the notes placed by
other users. These notes contain questions, suggestions and
remarks about the web page.
Currently most systems for collaborative browsing hardly
support splitting and joining and only allow fixed page
scrolling or independent page scrolling. To take full
advantage of the possibilities of collaborative browsing, we
recommend that both splitting and joining and other degrees
of reading freedom should be supported.
This section discussed the use of collaboration during
browsing, while the next section discusses the use of
collaboration when users and/or systems are working with
mediators during querying.
Mediated SearchingThere are situations in which people need the help of other
people. This is also true when searching and retrieving
information. Although a person may have knowledge about
the domain he works in, he may not be an expert in
searching information, for which he needs help. Someone
may also need help when trying to find information about
new topics outside his own knowledge domain. In both
situations, people turn to others to support them in their
search tasks.
The classic example of a mediated search environment is
the library where librarians offer support to a user. Another
example is booking a trip for a conference or holiday. Tour
operators help booking a seat on an airplane, a hotel and a
rental car. Also the use of call centers and helpdesks that
help people to find information or to solve problems is an
example of mediated searching. These types of mediators
have existed for several years now. But such mediation is not always
effective and efficient. In some situations, problems occur
(the hotel is not what the customer asked for, the plane
leaves a day too late or the requested information formally
meets the query, but it is not exactly what the customer
wanted). It can also require many interactions between the
customer and the mediator before the mediator fully
understands the customer’s needs.
² http://www.webex.com
³ http://www.utok.com
[Figure 1: diagram with the elements Goals, Query Formulation, Matching and Result Presentation, and the labels “Translating goal into query”, “Processing Query”, “Returning Results”, “Matching results with goals” and “Query Reformulation”.]
Figure 1. Query Formulation and Result Browsing
Information and communication technology (ICT) can offer
support for some of these situations. The support is
described on two levels: the query formulation level and the
result browsing with query reformulation level. Figure 1
visualizes these two levels in the search and retrieval
process. On the first level, the search expert and customer
use ICT to formulate the need of the customer in a way that
allows them to both understand this need. Tools are needed
to help the (remote) communication between customer and
expert, e.g. phone support, e-mail and video-conferencing.
Collaborative result browsing is the same as collaborative
browsing, except that the roles of the participants differ. There
are participants who are search experts (the mediator) while
others are domain experts (the users). An example is the
librarian who is the search expert and the library customer
who is the domain expert. For more information about
collaborative browsing, see the previous section.
Figure 2. Relationship between Query Quality
and Result Browsing
There is a relationship between the quality of query
formulation and the need for collaborative result browsing.
When query formulation quality is high (left side of
Figure 2) the need for collaborative result browsing is low
as the search results will be more relevant and contain fewer
irrelevant results, and vice versa (right side of Figure 2).
For collaborative query formulation, there are two ways in
which collaboration can take place: direct collaboration
and indirect collaboration. Direct collaboration means
collaborative query formulation using tools, whereas
indirect collaboration is performed by using search history.
Collaborative query formulation using tools
Some information search and retrieval systems require
complex query formulation. For such systems, experts are
usually needed to translate a user’s needs into correct and
efficient system queries. When this translation process is an
asynchronous and/or remote process, the interaction
between the user and the expert is limited. This increases
the potential for translation errors, frequently resulting in
incorrect queries and wrong results. In this case, tools are
needed to help improve the understanding between the
user and the expert.
The actual need of collaborative query formulation also
depends on the quality and ease-of-use of the user interface
of the information search and retrieval system. We believe
that well designed and easy to use interfaces result in less
need for experts. Another alternative for collaborative
query formulation is the use of artificial intelligence
dialogue systems that help the user to formulate a query.
We believe that for specialized domains and cross-domain
searches, the need for mediation by experts still remains.
Using search history
Another class of mediated searching is the reuse of search
history. Search history is the recorded and stored
information about interactions with the systems by a user
during a search (e.g. query formulation, browsing through
results, query reformulation etc). A non-expert user can use
the search history of experts (or other users, including
himself) to learn how they search for information. In this
case, the knowledge of an expert is used in an indirect
manner. The search history can be visualized in a manner
that can easily be interpreted by others. Besides viewing, a
user can even replay (parts of) the search process during his
own search process using different parameters. An example
of this type of expert search history reuse is implemented in
the Ariadne system [24]. Search history can also be used the
other way around. According to Twidale and Nichols [23],
experts can use the search history of a user to give that user
suitable advice on how to search for required information.
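
Recording a search history and replaying it with different parameters can be sketched as follows. This is a hypothetical data structure for illustration only; it does not reproduce how Ariadne [24] stores or visualizes search histories, and the names `SearchStep`, `SearchHistory`, `record` and `replay` are assumptions.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional


@dataclass
class SearchStep:
    action: str             # e.g. "query", "browse" or "reformulate"
    params: Dict[str, str]  # the parameters used in this step


@dataclass
class SearchHistory:
    owner: str
    steps: List[SearchStep] = field(default_factory=list)

    def record(self, action: str, **params: str) -> None:
        """Store one interaction with the search system."""
        self.steps.append(SearchStep(action, dict(params)))

    def replay(self, execute: Callable[[str, Dict[str, str]], Any],
               overrides: Optional[Dict[str, str]] = None) -> List[Any]:
        """Re-run every step, replacing only parameters the step already has."""
        overrides = overrides or {}
        return [
            execute(step.action, {k: overrides.get(k, v) for k, v in step.params.items()})
            for step in self.steps
        ]
```

A non-expert could, for example, replay an expert's history against a different collection by passing `{"collection": "film"}` as `overrides`, where `execute` is whatever function submits a step to the search system.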
This section discussed the use of collaboration when users
and/or systems are working with mediators during querying.
The next section discusses the use of collaboration in
information filtering.
Collaborative Information Filtering
Instead of explicit collaboration, like collaborative
browsing and most types of mediated search and retrieval,
there is also a type of implicit collaboration. In these
situations, users are helped in their information search and
retrieval by services that recommend information or show
the relevancy of information based on the use, history and
ranking of other users for these documents. In this type of
collaboration, we distinguish three types:
• Information filtering, which filters information based
on the interests of the user and the content of the
information;
• Collaborative filtering, which filters information based
on the interests of the user and the ratings of
information by users with similar interests;
• Collaborative information filtering, which filters
information based on the interests of the users, the
content of the information and the ratings of
information by users with similar interests.
Information filtering
In single user situations, information filtering is the process
in which a system filters a vast amount of information and
only delivers information that is relevant to the user. In this
situation the collaboration is between a human and a
system. In contrast with information retrieval systems,
information filtering systems are commonly used to support
long-term information needs of a particular user or group
with similar needs. Where information retrieval systems
operate on a relatively static set of documents (at least
during the process of information retrieval), information
filtering systems operate on a continuously changing stream
of documents (like newsgroups). The documents in this
stream need to be identified and valued based on their
relevance for a particular user or group. Information
filtering systems calculate the relevancy of a document
based on their knowledge about what is relevant for that
user. This knowledge is stored in the user’s profiles [26].
The main method used for calculating the relevancy of
documents is feature extraction. Features of a
document are calculated and stored in the user’s profile with
their importance weights. The most widely used feature
extraction methods use categorizations of documents into
classes [4], [17].
The key aspect of information filtering is that the system
bases its recommendations on the content and its
knowledge about the interests of the user. The interest of a
user can vary in time [12]. Within most of these systems
user relevance feedback plays an important role. Relevance
feedback allows users to indicate the actual relevancy of a
document that has been read. This indication is then used
by the system to improve the user’s profile (in particular the
weights for the different features/categories).
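
A weighted-feature profile with relevance feedback can be sketched as follows. The scoring rule (a weighted sum) and the update rule (a simple error-proportional correction with learning rate `rate`) are assumptions chosen for illustration; the systems cited in this section may use different representations and learning methods.

```python
from typing import Dict


def relevance(profile: Dict[str, float], features: Dict[str, float]) -> float:
    """Score a document as the weighted sum of its features under the profile."""
    return sum(profile.get(name, 0.0) * value for name, value in features.items())


def apply_feedback(profile: Dict[str, float], features: Dict[str, float],
                   rating: float, rate: float = 0.1) -> None:
    """Move the feature weights toward the relevance the user actually reported."""
    error = rating - relevance(profile, features)
    for name, value in features.items():
        profile[name] = profile.get(name, 0.0) + rate * error * value
```

After feedback, features of documents the user rated higher than predicted gain weight, and features of documents rated lower than predicted lose weight.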
Although these principles seem to work in practice, there
are limitations and problems associated with them ([1],
[12], [17]):
1. Because a set of classes or features is used, not all the
interests of every user can exactly be captured in these
classes or features. This makes every set inaccurate.
2. Finding the set of categories is difficult. It is not only a
finite set that must represent an infinite set of
documents, the environment in which the optimal set
must be determined is also dynamic (new information
is added, old information removed, other documents
are updated etc). The available information changes
over time.
3. Users must give feedback in the form of rankings.
Users are often not willing to give this feedback,
mainly because it is extra work. Furthermore, the
rankings may sometimes appear inconsistent to the
filtering system. Users may be uncertain about their
needs or they may not be very discriminating when
ranking documents (giving all documents an average
rate). The set of categories used may not correspond
with the way the user would normally group his
documents.
4. Information filtering systems result in over-
specialization. The information that users get is
restricted only to what is selected via their profile. But
interests of users can change over time. These shifts
can happen quickly or gradually.
Shifts in interest should be detected as soon as possible
by the filtering systems to prevent degradation in the
quality of the suggested documents.
5. Analysis of the content is, in general, very shallow.
Only certain aspects of the content are used in the
analysis. For more complex media, like audio and
video, content analysis techniques are in the early
stages of development.
Different systems handle these issues in different ways.
Some systems incorporate algorithms to detect shifts in the
user’s interests, like Sifter [12], [17]. Other systems like
Phoaks [22] try to incorporate collaborative aspects in
information filtering to solve these problems. These last
types of systems will be the focus of the next two sections.
Collaborative Filtering
Collaborative filtering (also called social filtering) systems
also recommend information to users. These systems search
for users with similar interests and recommend items those
users liked. Instead of computing the similarity between
items (content), the system computes the similarity between
user interests [16]. How others liked a piece of information
is based on how they ranked those items. Information
filtering systems look at the content of the information,
whereas collaborative filtering systems look at the opinions
of users with similar interests. In these systems users are
identified as being similar if a good correlation can be
found between the ratings of documents made by these
users.
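
Finding similar users through correlated ratings can be sketched as follows. The text does not name a specific correlation measure; the Pearson correlation over co-rated documents is assumed here, and the function names are hypothetical.

```python
from math import sqrt
from typing import Dict


def pearson(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Pearson correlation of two users' ratings over the documents both rated."""
    common = [doc for doc in a if doc in b]
    if len(common) < 2:
        return 0.0  # not enough shared ratings to correlate
    mean_a = sum(a[doc] for doc in common) / len(common)
    mean_b = sum(b[doc] for doc in common) / len(common)
    cov = sum((a[doc] - mean_a) * (b[doc] - mean_b) for doc in common)
    var_a = sum((a[doc] - mean_a) ** 2 for doc in common)
    var_b = sum((b[doc] - mean_b) ** 2 for doc in common)
    if var_a == 0 or var_b == 0:
        return 0.0  # a user who rates everything the same carries no signal
    return cov / sqrt(var_a * var_b)


def most_similar(user: Dict[str, float], others: Dict[str, Dict[str, float]]) -> str:
    """Return the name of the user whose ratings correlate best with `user`."""
    return max(others, key=lambda name: pearson(user, others[name]))
```

Items that the most similar users rated highly, and that the target user has not yet seen, are then candidates for recommendation.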
There are two major variants for collaborative filtering
systems [15]:
• Open group collaborative filtering: In this variant, all
users are part of a virtual community of people that do
not know one another. This means that
recommendations are based on ratings of users that do
not know each other.
• Closed group collaborative filtering: When a closed
user group is used, all users know each other. Knowing
the other users can influence the trust a user has in the
recommendations of the system. It is also an incentive
for users to rate items correctly.
One of the key advantages of collaborative filtering systems
is that they work on all media types, as no content analysis is
needed. Another advantage of collaborative filtering
systems is that they can discover new items of interest to the
user, simply because other people liked them. There are
however two main limitations to collaborative filtering
systems [7]:
• Scarcity problem: This means that information objects
that are rather obscure (a limited number of people
prefer these items) receive almost no ratings, which
means that users who are interested in these obscure
items will not get them recommended to them.
• Early-rater problem: This problem refers to the fact
that a recommendation of new items is not possible, as
nobody has rated them yet. It also refers to the fact that
the first users of the system get no or very limited (and
often incorrect) recommendations, due to the fact that
there are almost no ratings in the system (yet). Also, a
user who starts using the system will get rather
inaccurate recommendations because the system has
not yet been able to learn the interests of that user.
Collaborative Information Filtering, which is discussed
next, can help to overcome these two limitations.
Collaborative Information Filtering
The approach of collaborative information filtering
combines techniques from collaborative filtering (finding
similar users) with techniques from information filtering
(filtering based on content). The main purpose of these
approaches is to obtain collaborative filtering systems
without the rating scarcity and early-rater
problems. There are several approaches, such as:
• Communicating agents approach;
• Correlating profiles approach;
• Filterbots.
The Communicating Agents approach [14] uses agents in
the meaning of information filtering agents. An agent will
try to filter all documents for one user, based on the
interests of that user. The difference between the traditional
information filtering and the communicating agents
approach is that the agent also asks advice from the agents
of other users. Based on both the knowledge of his user and
the advice of other agents, the agent recommends
documents to the user. Using relevance feedback, the agent
updates its knowledge about the user and the confidence it
has in the other agents of which the advice was used. For
information filtering, two experimental systems were
developed [14]: NewT to filter Usenet news messages and
Maxims to filter e-mail.
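
The idea of an agent that combines its own judgment with advice from other agents, and adjusts its confidence in them after feedback, can be sketched as follows. This is not the algorithm used in NewT or Maxims: the trust-weighted average, the initial confidence of 0.5 and the update rule are assumptions for illustration, and updating the agent's knowledge about its own user is left out.

```python
from typing import Callable, Dict


class FilterAgent:
    """Hypothetical personal filtering agent that also listens to peer agents."""

    def __init__(self, own_score: Callable[[str], float]) -> None:
        self.own_score = own_score         # the agent's own estimate in [0, 1]
        self.trust: Dict[str, float] = {}  # confidence in each peer agent

    def predict(self, doc: str, advice: Dict[str, float]) -> float:
        """Trust-weighted average of the agent's own score and the peers' advice."""
        total, weight = self.own_score(doc), 1.0
        for peer, score in advice.items():
            confidence = self.trust.get(peer, 0.5)
            total += confidence * score
            weight += confidence
        return total / weight

    def feedback(self, advice: Dict[str, float], actual: float, rate: float = 0.2) -> None:
        """Raise trust in peers whose advice was close to the user's feedback."""
        for peer, score in advice.items():
            confidence = self.trust.get(peer, 0.5)
            agreement = 1.0 - abs(score - actual)
            self.trust[peer] = confidence + rate * (agreement - confidence)
```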
The approach of Balabanović & Shoham [1] is called the
Correlating Profiles approach. In this approach, user
profiles are based solely on content analysis and the
relevance feedback of the user. These profiles are compared
with the profiles of other users to identify similar users.
Users receive documents that both score highly using their
own user profile, and also when they are rated highly by
users with a similar profile. In their implementation, called
Fab, they use two types of agents. The collection agent
finds pages for a specific topic and the profile of a
collection agent represents a topic of interest to a
dynamically changing group of users. The selection agent
finds pages for a specific user and represents a single user’s
interest. The selection agent’s profile represents multiple
interests for one user and may be served by several
collection agents. The main difference between the
correlating profiles and the communicating agents approach
is that with the latter a personal agent asks advice
from other agents without directly accessing other users’
ratings. With the correlating profiles approach, a personal
agent has direct access to the ratings of all other users.
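The delivery rule of the correlating profiles approach can be illustrated with a small sketch. It is not the Fab implementation: profiles and documents are assumed to be simple term-weight dictionaries, similarity is assumed to be the cosine measure, and the function names and thresholds are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two term-weight dictionaries."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def should_receive(user, doc_terms, profiles, ratings,
                   content_min=0.6, similar_min=0.8, rating_min=0.7):
    """Deliver a document if it matches the user's own profile, or if a
    user with a similar profile rated it highly (thresholds are assumed)."""
    own = profiles[user]
    if cosine(own, doc_terms) >= content_min:
        return True
    return any(
        other != user
        and rating >= rating_min
        and cosine(own, profiles[other]) >= similar_min
        for other, rating in ratings.items()
    )
```

Here ratings maps each user who has already rated the document to a rating between 0 and 1. Note that the personal profile is compared directly with the profiles and ratings of other users, which is the property that distinguishes this approach from communicating agents.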
Filterbots [19] are automated rating robots that evaluate
new documents as soon as they are published and enter
ratings for these documents. A collaborative filtering
system treats a filterbot as a “normal” user that enters many
ratings but does not request any predictions. The
collaborative filtering engine need not even know whether
users are filterbots or humans. The authors of a filterbot
need not be concerned with how the filterbot is used in a
collaborative information filtering system. They only need
to develop algorithms that analyze a document and return a
rating to the collaborative filtering engine. Filterbots can be
especially useful in a collaborative information
filtering system that correlates users with other users’
profiles. This allows a filterbot’s ratings to be used for a user only
when they match that user’s interests. When the profile of a user
correlates with the filterbot, the ratings from that filterbot
are weighted higher. Filterbots are either simple algorithms
that convert a new document into a rating (e.g. counting the
total number of words in the document) or more advanced
learning agents that change based on their usefulness to a
group. In addition, “personal” filterbots can be developed
that try to learn the needs of one user (for example by
learning to use a group of simple filterbots that are most
suitable for this one user).
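As an illustration, a simple word-count filterbot and the way a filtering engine might weight its ratings for one user could look as follows. This is a sketch under stated assumptions, not the GroupLens implementation: the 1 to 5 rating scale, the word-count thresholds, the use of Pearson correlation and all function names are chosen for the example only.

```python
import math


def length_filterbot(text, short=50, long=500):
    """Simple filterbot: map a document's word count to a rating from 1 to 5."""
    words = len(text.split())
    if words <= short:
        return 1.0
    if words >= long:
        return 5.0
    return 1.0 + 4.0 * (words - short) / (long - short)


def pearson(xs, ys):
    """Pearson correlation of two equally long rating lists (0.0 if undefined)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def filterbot_weight(user_ratings, bot_ratings):
    """Weight for a filterbot's ratings, based on agreement with one user."""
    shared = sorted(set(user_ratings) & set(bot_ratings))
    if len(shared) < 2:
        return 0.0          # not enough co-rated documents to correlate
    r = pearson([user_ratings[d] for d in shared],
                [bot_ratings[d] for d in shared])
    return max(0.0, r)      # ignore filterbots that disagree with the user
```

A user who tends to prefer long documents would correlate strongly with this filterbot and so receive its ratings with a high weight, while for a user with opposite preferences the weight drops to zero.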
More information about these types of collaborative search
and retrieval systems can be found in [14], [16] and [20]. This
section discussed the use of collaboration in information
filtering. The next section discusses collaboration between
systems.
Collaborative Agents
Not only people can collaborate; systems can also work
together to search and retrieve information. Due to the
increasing size of available information and the distributed
nature of the Internet, it is impossible for one system to
know all available content on the Internet. Systems that
perform (search and retrieval) tasks on behalf of someone
else (human or system) are generally called agents.
Good examples of collaborative agents are meta-search
engines. The main characteristic of a meta-search engine is
that it does not search the web itself for information. It
passes the query to other search engines, collects the results
from the other search engines, combines these results and
presents these combined results to the user. Examples of
meta-search engines are Meta Crawler⁴, Mamma.com⁵ and
Metagopher⁶.
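The pass, collect and combine steps of a meta-search engine can be sketched as follows. The engines are modelled as plain functions that return a ranked list of URLs, and results are merged by summing reciprocal ranks; both are assumptions for illustration, since the merging strategies of the meta-search engines named above are not described here.

```python
from collections import defaultdict


def meta_search(query, engines):
    """Pass the query to every engine and merge the ranked result lists."""
    scores = defaultdict(float)
    for engine in engines:
        # Each engine is assumed to return URLs ordered from best to worst.
        for rank, url in enumerate(engine(query), start=1):
            scores[url] += 1.0 / rank      # higher ranks contribute more
    # Best combined score first; ties are broken alphabetically.
    return sorted(scores, key=lambda url: (-scores[url], url))
```

A result returned by several engines accumulates score from each of them, so it tends to rise above results that only a single engine found.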
Other examples of collaborative agents are special search
tools that must be installed on the user’s computer. The
advantage of a local meta-search application is that it can
save queries and results locally. This makes it possible to
schedule updates of queries, so up-to-date results are always
available. Some of these applications also use specific
search engines depending on the subject for which
information is requested (e.g. musical search engines for
queries with musical subjects). Examples of these types of
products are Copernic⁷ and AnswerChase⁸.
Another advantage of collaborative agents is parallel
processing (splitting the work). Similar agents get part of
the search task, which makes the whole search process more
efficient. This technique is not specific to certain types of
applications; it can be used by any search application.
Several well-known search engines use multiple agents in
parallel to probe the web for new content.
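Splitting a search task over similar agents can be sketched with a thread pool. The example assumes a small in-memory collection of (identifier, text) pairs and a plain substring match. Both are stand-ins for real agents that would each query a different part of a distributed collection.

```python
from concurrent.futures import ThreadPoolExecutor


def search_partition(query, documents):
    """One agent's share of the work: scan its partition for the query."""
    return [doc_id for doc_id, text in documents if query.lower() in text.lower()]


def parallel_search(query, documents, workers=3):
    """Split the collection over several agents and join their results."""
    partitions = [documents[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial = list(pool.map(lambda part: search_partition(query, part),
                                partitions))
    return sorted(hit for hits in partial for hit in hits)
```

The joined result is the same as that of a single sequential scan. The gain is in elapsed time when each partition involves slow operations such as network requests.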
This section discussed collaboration between systems. The
next section discusses the use of collaboration when re-
using retrieved information.
Collaborative Re-use of Results
In collaborative settings people usually share search results.
There are four ways in which people share found
information [18]:
• Sharing results with other members of a team;
• Broadcasting interesting information;
• Acting as a consultant for other information searchers;
• Archiving.
In all these types of sharing, information can be either
shared unmodified or modified. Unmodified sharing of
results is simply the process of forwarding (electronically or
physically) the results to other people. There might be a
process in which the user provides the information to
different users, depending on their interests (as far as
known to the providing user). This process can be seen as a
manual filtering system. It is also possible that someone
first modifies the information before sharing it with others
(reversed content provision). These modifications can range
from summarizing to analyzing and rewriting the
information. Also adding annotations to the information is a
type of modification.
4 http://www.metacrawler.com
5 http://www.mamma.com
6 http://www.metagopher.com
7 http://www.copernic.com
8 http://www.answerchase.com
This section discussed the use of collaboration when re-
using retrieved information. The next section discusses an
interesting side effect of using collaboration in the search
and retrieval process.
Building Communities of Interest
An effect of the Internet is that it brings people with similar
interests together from all over the world. This provides
opportunities for collaboration. But due to the vast amount
of available information, it is not easy to find people with
the same interests. Existing collaborative search and
retrieval techniques, specifically collaborative (information)
filtering, are already equipped with techniques for finding
similar users. They only need extra functionality to allow
similar users to contact each other. There are already some
collaborative search and retrieval tools that incorporate
functionalities specifically created for this effect.
An example of such an application is Cobrow⁹, which
shows people who are browsing in the
vicinity of the web pages the user is currently viewing. Icons
with names visualize the other users. A user can directly
contact them by starting a chat session or running a web
telephony application.
One possibility of these types of applications is that
webstore customers can talk to each other and discuss the
available products [11]. A store representative can also get
into contact with possible customers. In other words, it
brings content providers closer to content consumers.
ISSUES
In the previous sections, we described five types of
collaborative search and retrieval. Also a side effect of
collaboration in search and retrieval has been described,
namely building communities of interest. Several
applications already use these techniques, but most are not
widely used, apply only to textual information and have
limited collaborative aspects. There are issues that need to
be addressed to make collaborative search and retrieval
systems successful. Some of these issues are technical,
while others are more social or economic in nature.
This section describes some of the major issues that we
identify.
9 http://www.cobrow.com
Multimedia
Current information filtering systems and collaborative
information filtering systems are mostly based on textual
information (even in video and music recommendation
systems, the recommendations are based on the textual
metadata available). Until recently, the technology and
algorithms for building these systems on other media than
text were not available. Even now, the possible feature
extractions for other media are still limited, or not directly
applicable for human interpretation (like color histograms).
As both information filtering and collaborative information
filtering base their recommendations on the content of the
information, the success of these systems for media other
than text depends largely on the development of new
user-oriented feature extraction algorithms. Another issue with
multimedia is the need for more bandwidth, when working
in a network environment. As the size of multimedia
documents is large, especially when including audio and
video, high bandwidth and reliable networks are required
for fast electronic transmission.
Quality of Tools
The quality of the collaborative search and retrieval tools
currently available is not sufficient. On one side, some of
these tools are limited in their range of functionality. Most
of the current collaborative browsing tools lack support for
multiple degrees of freedom in reading through the
information and do not (fully) support splitting and joining
of search tasks. On the other side, the quality of information
offered by collaborative (information) filtering systems is
still limited, mainly due to problems with automated
calculation of information relevancy and the first-rater and
scarcity problems. The combination of information filtering
with collaborative filtering seems to offer better solutions,
but this is a rather new approach that needs more research.
Another issue regarding current collaborative search and
retrieval tools is the lack of standardization. Each tool uses
its own independently developed mechanisms, but for collaborative
systems to be accepted, standards are necessary to ensure
interoperability between different tools.
Security and Access Rights
Not every piece of information is available to everyone.
Some information is classified or protected by security and
access rights. This raises the question of how a
collaborative search and retrieval system should handle
differences in security and access rights. For example, what should a
system do when one user has sufficient rights to a piece of
information but another user does not, and they want to
browse to this information collaboratively? This issue must
be addressed in research and development of collaborative
search and retrieval systems.
Privacy
Collaborative search and retrieval systems store an
increasing amount of personal information about their users.
In collaborative (information) filtering this information is
necessary to find similar users, while in mediated systems
the search history of users needs to be stored. Without this
information, collaborative systems cannot perform their
functions optimally. There is a trade-off: users need to
sacrifice privacy to increase the functionality of
collaborative systems [24], [25]. Some people will have no
problems with sacrificing part of their privacy, while others
are unwilling to do so. A solution could be that
collaborative systems offer different levels of privacy and
functionality. Where possible, these systems need to store
information as anonymously and as securely as possible. The
most basic principle is that users must be made aware of
who stores what information about them and for what
reasons.
Ownership
An issue related to privacy is ownership. Who owns the
information stored about a user? Is it the user, the service
provider or a third party that gathers and stores the
information? Can a user make the service provider delete or
change his information? Is a service provider allowed to use
the stored information for other purposes than specified by
the collaborative system? Even if information is stored
completely anonymously this issue remains. Patterns found
in the stored information might be sufficient to uniquely
identify the person [24]. Most of these questions are of a
legal nature and should therefore be solved by laws or
agreements.
Trust and Payment
Another issue is trust. Do users trust the systems that
recommend information to them? Do users trust the
opinions of other users that are used to recommend
information? Related to this question of trust is the question
of payment. If information is free of charge, there is no
guarantee for the quality of this information. This is also the
situation for ratings. Fee-based information and ratings
might imply some form of quality control. If experts are
paid for rating information from their domain, the quality of
ratings is likely to increase and experts become more willing to
actually rate the information. A new type of service that
goes further than fee-based information services is the
value-added information service [2], such as Lexis-Nexis¹⁰,
Dialog¹¹ and Dow Jones Interactive¹². These services
focus on the information needs of business
professionals. They collect and archive high-quality
business information, but access to them
requires a membership.
CONCLUSIONS
As the Internet is growing not only in bandwidth but also in
the available information, it becomes more difficult to find
and retrieve information. Even with the current search
engines on the Internet, it is very hard to find high-quality
and relevant information, especially when multimedia
information is required. Part of the solution for this problem
is the development of advanced search and retrieval
systems, where the field of collaborative search and
retrieval will play a major role.
10 http://www.lexisnexis.com
11 http://www.dialog.com
12 http://www.djinteractive.com
This emerging field offers exciting possibilities for both
businesses and consumers. This paper provides insight into
several types of information search and retrieval that
combine the field of computer supported collaborative
work with information search and retrieval. For some types
of collaborative search and retrieval, applications already
exist, but most of them are new and only used in small
communities. Before collaborative search and retrieval
systems can become a success, several issues have to be
addressed, for which research on collaborative search and
retrieval is needed.
We conclude that collaborative browsing and
collaborative (information) filtering are especially important types of
collaborative search and retrieval. For collaborative browsing, research
should focus on issues such as degrees of freedom and splitting
and joining of search tasks; for filtering, it should focus on collaborative
information filtering as a combination of information
filtering and collaborative filtering. Research into technical
issues alone is not sufficient; social and economic issues
need to be addressed as well.
This paper surveyed the field of collaborative search and
retrieval and is intended as a starting point for further
research within the GigaCE project of the Telematica
Instituut. More detailed information can be found in the
reports on which this paper is based [20], [21].
ACKNOWLEDGMENTS
This work is conducted within the GigaCE project, which is
part of the Dutch GigaPort project (www.gigaport.nl) that
focuses on the next generation Internet technologies and
applications. The Telematica Instituut is one of the key
players in the GigaPort project and manages the research
activities in the project. We thank Daan Velthausz, Henri
ter Hofte and Andrew Tokmakoff for their help in making
this paper possible.
REFERENCES
1. Balabanović, M. & Shoham, Y. Fab: content-based
collaborative recommendation. Communications Of The
ACM, 40 , 3 (March 1997), 66-72.
2. Bates, M.E. Selecting Business Intelligence Sources:
The Public Web vs. Value-Added Online Services,
White Paper from Dow Jones Reuters Business
Interactive LLC, 1997,
http://www.factiva.com/inspiring/feevfree/complete.htm
3. Cabri, G., Leonardi, L. & Zambonelli, F. Supporting
Cooperative WWW Browsing: a Proxy-based
Approach. In Proc. of the Seventh Euromicro Workshop
on Parallel and Distributed Processing, IEEE, 1999, 3-5.
4. Ehrmantraut, M., Harder, T., Wittig, H. & Steinmetz, R.
The personal electronic program guide – towards the
pre-selection of individual TV programs. Proc. ACM
CIKM ’96, 1996, 243-249.
5. ter Hofte, G.H. & van der Lugt, H.J. CoCoDoc:
a framework for collaborative compound document
editing based on OpenDoc and CORBA. Open
Distributed Processing and Distributed Platforms,
Proceedings of IFIP/IEEE international conference on
Open Distributed Processing and Distributed Platforms,
1997, 15-33.
http://extranet.telin.nl/dscgi/ds.py/Get/File-419
6. ter Hofte, G.H. Working Apart Together: Foundations
for Component Groupware, Telematica Instituut
Fundamental Research Series, vol. 001. Enschede, the
Netherlands: Telematica Instituut, 1998.
http://www.telin.nl/publicaties/1998/wat/wat.htm.
7. Good, N., Schafer, B., Konstan, J.A., Borchers, A.,
Sarwar, B., Herlocker, J. & Riedl, J. Combining
Collaborative Filtering with Personal Agents for Better
Recommendation, American Association for Artificial
Intelligence, 1999.
8. Greenberg, S., and Roseman, M. GroupWeb: A WWW
Browser as Real Time Groupware. In Companion to the
Proceedings of ACM SIGCHI’96, 1996, 271-272.
9. Gross, T. The CSCW3 Prototype – Supporting
Collaboration in Global Information Systems. In
Conference Supplement of the Fifth European
Conference on Computer-Supported Cooperative Work
– EC-CSCW’97 (Sept. 7-11, Lancaster UK)
10. Johansen, R. Groupware: computer support for business
teams. New York, NY: Free Press, 1988.
11. Kobayashi, M., Shinozaki, M., Sakairi, T., Touma, M.,
Shahrokh, D., Wolf, C. Collaborative Customer Services
Using Synchronous Web Browser Sharing. In Proc. of
ACM CSCW’98, 1998, 99-108.
12. Lam, W., Mukhopadhyay, S., Mostafa, J. & Palakal, M.
Detection of shifts in user interests for personalized
information filtering. Proc. ACM SIGIR’96, 1996, 317-325.
13. Lieberman, H., van Dyke, N.W., & Vivacqua, A.S. Let’s
Browse: a collaborative web browsing agent. Proc.
ACM IUI ’99, 1999, 65-68.
14. Maes, P. Agents that reduce work and information
overload. Communications of the ACM, 37, 7 (1994),
31-40.
15. McCarthy, J.F. InfoShare: a system to support co-
operative information seeking in a real community of
users. In Churchill, E., Snowdon, D., Golovchinsky, G.
(Eds.), Proceedings of CSCW’98 workshop on
Collaborative and co-operative information seeking in
digital information environments, 1998.
16. Mladenic, D. Text-learning and related intelligent
agents: a survey. IEEE Intelligent Systems, (July-August
1999), 44-54.
17. Mostafa, J., Mukhopadhyay, S., Lam, W. & Palakal, M.
A Multilevel Approach to Intelligent Information
Filtering: Model, System, and Evaluation, ACM
Transactions on Information Systems, 15, 4 (1997),
368-399.
18. O’Day, V.L. & Jeffries, R. Information artisans: patterns
of result sharing by information searchers. Proc. ACM
COOCS’93, 1993, 98-107.
19. Sarwar, B.M., Konstan, J.A., Borchers, A., Herlocker,
J., Miller, B. & Riedl, J. Using filtering agents to
improve prediction quality in the GroupLens research
collaborative filtering system. Proc. of CSCW’98, 1998,
345-354.
20. Setten, M., Moelaert-El Hadidy, F. Search and
Retrieval: Collaborative Search and Retrieval, GigaCE
report, Telematica Instituut, The Netherlands.
21. Setten, M., Moelaert-El Hadidy, F. New Services:
Collaborative Search and Retrieval, GigaCE report,
Telematica Instituut, The Netherlands.
22. Terveen, L.G., Hill, W.C., Amento, B., McDonald, D. &
Creter, J. Building task-specific interfaces to high
volume conversational data. Proc. ACM CHI’97, 1997,
226-233.
23. Twidale, M.B, Nichols, D.M., Mariani, J.A., Rodden, T.
& Sawyer, P. Supporting the active learning of
collaborative database browsing techniques. Association
For Learning Technology Journal, 3, 1 (1995), 75-79.
24. Twidale, M.B. & Nichols, D. Collaborative browsing
and visualization of the search process, Proc. ASLIB,
1996, 177-182.
25. Twidale, M.B. & Nichols, D.M. A Survey of
Applications of CSCW for Digital Libraries: Technical
Report CSEG/4/98. Lancaster UK: Lancaster University
Computing Department, 1998.
26. Velthausz, D.D. Cost-effective network-based
multimedia information retrieval, Telematica Instituut
Fundamental Research Series, vol. 003. Enschede, the
Netherlands: Telematica Instituut, 1998.
http://www.telin.nl/publicaties/1998/admire/admire.htm.
27. Verhoosel, J.P.C, Wibbels, M., Batteram, H.J. &
Bakker, J.L. Rapid service development on a TINA-
based service deployment platform, Proc. of TINA’99,
1999, http://extranet.telin.nl/dscgi/ds.py/Get/File-1919
28. Zeballos, G.S. Tools for efficient collaborative web
browsing. In Churchill, E., Snowdon, D., Golovchinsky,
G. (Eds.), Proceedings of CSCW’98 workshop on
Collaborative and co-operative information seeking in
digital information environments, 1998.