Inforadar @ UPRM Computing Systems Research Group Prof. Bienvenido Vélez-Rivera – Leader José Enseñat – Graduate student Juan Torres – Undergraduate student


University of Puerto Rico

Mayagüez Campus

PRECISE Project

Mayagüez, October 07, 2000

Problem Statement

Query-based Web Search

large result-set

short query

BUT:

– queries hard to write

– sequential access to result set inadequate

Proposed Solution

Inforadar’s interactive query hierarchies

[Screenshot labels: seed query; selected query; dynamic categories are queries; result set for selected query]

[Screenshot labels: colors indicate node status; level 2 categories; icons mark documents read or in-basket]
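
The idea that "categories are queries" can be illustrated with a small data structure. This is only a sketch under stated assumptions: the class `QueryNode`, the function `result_set`, and the toy inverted index are hypothetical names chosen for illustration and are not taken from Inforadar. Each node holds a conjunction of terms, and refining a node appends one more term, so every category in the hierarchy is itself a runnable query.

```python
from dataclasses import dataclass, field


@dataclass
class QueryNode:
    """One node of a query hierarchy: a category that is itself a query."""
    terms: tuple                                  # conjunction of terms
    children: list = field(default_factory=list)  # more specific categories

    def refine(self, term):
        """Add a child category whose query is this node's query AND `term`."""
        child = QueryNode(self.terms + (term,))
        self.children.append(child)
        return child

    def query(self):
        """Render the node as a conjunctive query string."""
        return " AND ".join(self.terms)


def result_set(node, index):
    """Return the ids of documents that contain every term of the node.

    `index` maps a term to the set of document ids containing it
    (a toy inverted index, assumed here for illustration).
    """
    sets = [index.get(term, set()) for term in node.terms]
    return set.intersection(*sets) if sets else set()


# Toy data: the seed query and two levels of categories.
index = {"uprm": {1, 2, 3, 4}, "admissions": {1, 2}, "graduate": {2, 4}}
seed = QueryNode(("uprm",))
level1 = seed.refine("admissions")
level2 = level1.refine("graduate")
print(level2.query())
print(sorted(result_set(level2, index)))
```

Selecting a node in such a tree amounts to running its query, which is why the result set shown always belongs to the currently selected query.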

Theoretical Formulation

Coverage-based Category Evaluation Metric

Goal: Avoid Redundancy and Information Loss

[Figure: seed query q with categories q1 and q2. (a) low information loss, high redundancy; (b) high information loss, low redundancy; (c) better]

Ideal: Select categories that best approximate a partition

But: This is an NP-complete problem
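
The two failure modes in the figure can be made concrete with a small calculation. The definitions below are assumptions made for illustration, not the metric used in the talk: information loss is taken as the fraction of seed-query documents that no category covers, and redundancy as the fraction that more than one category covers. The function name `coverage_report` is hypothetical.

```python
def coverage_report(seed_docs, categories):
    """Measure how far a set of categories is from partitioning seed_docs.

    seed_docs:  set of document ids returned by the seed query q
    categories: dict mapping a category name to its set of document ids
    Both ratios are assumed definitions, chosen only for illustration.
    """
    if not seed_docs:
        return {"information_loss": 0.0, "redundancy": 0.0}
    covered = set()   # documents in at least one category
    repeated = set()  # documents in more than one category
    for docs in categories.values():
        docs = docs & seed_docs      # ignore documents outside D(q)
        repeated |= covered & docs
        covered |= docs
    return {
        "information_loss": len(seed_docs - covered) / len(seed_docs),
        "redundancy": len(repeated) / len(seed_docs),
    }


seed_docs = set(range(1, 9))
case_a = coverage_report(seed_docs, {"q1": {1, 2, 3, 4, 5, 6}, "q2": {3, 4, 5, 6, 7, 8}})
case_b = coverage_report(seed_docs, {"q1": {1, 2}, "q2": {7, 8}})
case_c = coverage_report(seed_docs, {"q1": {1, 2, 3, 4}, "q2": {5, 6, 7, 8}})
print(case_a)  # overlapping categories: no loss, high redundancy
print(case_b)  # small disjoint categories: high loss, no redundancy
print(case_c)  # a partition: neither loss nor redundancy
```

Under these assumed definitions a perfect partition scores zero on both measures, which matches the stated ideal.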

CTS: A greedy approximation algorithm for category selection

Approach: CTS picks the term t_i maximizing a score computed from D(q ^ t_i) and C [scoring expression: operators not recoverable]

C = set of documents covered by previously selected terms

Goal: Pick best term among {t1, t2, t3}

[Figure: D(q) containing D(q ^ t1), D(q ^ t2), D(q ^ t3) and the already covered set C; the three candidates are labeled "low coverage", "high redundancy", and "winning category!"]
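
A greedy selection loop of this kind can be sketched as follows. The scoring rule is an assumption: each candidate term is scored by the number of documents in D(q ^ t_i) that are not already in C, which rewards coverage and penalizes redundancy but is not necessarily the exact CTS expression. The function name `greedy_select` and the toy data are hypothetical.

```python
def greedy_select(seed_docs, term_docs, k):
    """Greedily pick up to k category terms for a seed query.

    seed_docs: set of document ids matching the seed query, D(q)
    term_docs: dict mapping a candidate term t to the documents containing t
    Assumed score: count of documents in D(q ^ t) not yet covered.
    """
    # D(q ^ t) for every candidate term t
    candidates = {t: docs & seed_docs for t, docs in term_docs.items()}
    covered = set()   # C: documents covered by previously selected terms
    selected = []
    for _ in range(k):
        best, best_gain = None, 0
        for term in sorted(candidates):      # sorted for deterministic ties
            if term in selected:
                continue
            gain = len(candidates[term] - covered)
            if gain > best_gain:
                best, best_gain = term, gain
        if best is None:                     # no term adds new documents
            break
        selected.append(best)
        covered |= candidates[best]
    return selected, covered


seed_docs = set(range(1, 11))
term_docs = {
    "t1": {1, 2, 3, 4, 5},
    "t2": {4, 5, 6},        # mostly overlaps t1: high redundancy
    "t3": {6, 7, 8, 9},
}
selected, covered = greedy_select(seed_docs, term_docs, k=2)
print(selected)
print(sorted(covered))
```

In this toy run t1 is chosen first because it covers the most documents. On the second round t2 adds only one new document while t3 adds four, so t3 wins even though both candidates are similar in size.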

Experimental Plan

• Implement Inforadar site indexing ALL website data at UPRM

• Make Inforadar the official search engine for the UPRM web site

• Conduct usability study

• Analyze real user feedback

• Incorporate feedback into an improved design

References

• Query Lookahead for Query-Based Document Categorization

– Ph.D. Thesis

– Massachusetts Institute of Technology

– September 1999

• Fast and Effective Query Refinement

– Bienvenido Vélez, Ron Weiss, Mark Sheldon and David K. Gifford

– ACM Conference on Research and Development in Information Retrieval (SIGIR 97)

• HyPursuit: A network search engine exploiting content-link similarity

– R. Weiss, B. Vélez, M. Sheldon, C. Namprempre, P. Szilagyi and D. K. Gifford

– ACM Conference on Hypertext (HyperText 96)
