Inforadar @ UPRM
Computing Systems Research Group
Prof. Bienvenido Vélez-Rivera – Leader
José Enseñat – Graduate student
Juan Torres – Undergraduate student
University of Puerto Rico
Mayagüez Campus
PRECISE Project
Mayagüez, October 07, 2000
Problem Statement
Query-based Web Search
large result-set
short query
BUT:
- queries are hard to write
- sequential access to the result set is inadequate
Proposed Solution
Inforadar’s Interactive Query Hierarchies
[Figure: Inforadar screen. Callouts: seed query; selected query; result set for selected query; dynamic categories are queries]
[Figure: Inforadar screen. Callouts: colors indicate node status; level 2 categories; icons mark documents read or in-basket]
Theoretical Formulation
Coverage-based Category Evaluation Metric
Goal: Avoid Redundancy and Information Loss
[Figure: panels showing categories q1 and q2 relative to the seed query q: (a) low information loss, high redundancy; (b) high information loss, low redundancy; (c) better]
Ideal: Select categories that best approximate a partition
But: This is an NP-complete problem
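The trade-off between the three panels can be made concrete with a small sketch. The slides do not define the two quantities numerically, so the definitions here are assumptions: information loss is taken as the fraction of seed-query documents that no category covers, and redundancy as the fraction covered by more than one category. The function name `coverage_metrics` and the sample document ids are hypothetical.

```python
from collections import Counter

def coverage_metrics(seed_docs, categories):
    """Return (information_loss, redundancy) for categories over seed_docs.

    Assumed definitions (not taken from the slides):
      information_loss = share of seed documents in no category
      redundancy       = share of seed documents in two or more categories
    """
    seed = set(seed_docs)
    if not seed:
        return 0.0, 0.0
    counts = Counter()
    for docs in categories.values():
        # Only documents in the seed result set are relevant.
        counts.update(set(docs) & seed)
    lost = sum(1 for d in seed if counts[d] == 0)
    repeated = sum(1 for d in seed if counts[d] > 1)
    return lost / len(seed), repeated / len(seed)

seed = {1, 2, 3, 4, 5, 6}
overlapping = {"q1": {1, 2, 3, 4}, "q2": {3, 4, 5, 6}}  # like panel (a)
sparse = {"q1": {1, 2}, "q2": {5}}                      # like panel (b)
partition = {"q1": {1, 2, 3}, "q2": {4, 5, 6}}          # like panel (c)

for name, cats in [("a", overlapping), ("b", sparse), ("c", partition)]:
    print(name, coverage_metrics(seed, cats))
```

Under these assumed definitions, a true partition scores zero on both measures, which is why it is the ideal the slide names.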
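CTS: A greedy approximation algorithm for category selection
Approach: CTS picks the term t_i maximizing:
[Formula garbled in extraction; the surviving symbols show two expressions, each combining D(q ^ t_i) with C]
C = set of documents covered by previously selected terms
Goal: Pick the best term among {t1, t2, t3}
[Figure: document sets D(q ^ t1), D(q ^ t2) and D(q ^ t3), together with C, inside D(q). Callouts: winning category!; low coverage; high redundancy]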
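A greedy selection loop of this kind might be sketched as follows. The scoring formula on the slide did not survive extraction, so the score used here (newly covered documents minus documents already in C) is a stand-in assumption and not the CTS metric itself. The names `greedy_select`, `term_docs` and `k` are hypothetical, and each entry of `term_docs` plays the role of D(q ^ t).

```python
def greedy_select(seed_docs, term_docs, k):
    """Greedily pick up to k terms; return (chosen_terms, covered_docs).

    ASSUMPTION: the score is |D(q^t) - C| - |D(q^t) & C|, a stand-in
    for the slide's formula, which was lost in extraction.
    """
    seed = set(seed_docs)                 # D(q)
    covered = set()                       # C: covered by earlier picks
    chosen = []
    candidates = {t: set(d) & seed for t, d in term_docs.items()}

    def score(term):
        docs = candidates[term]
        # Reward new coverage, penalize overlap with C.
        return len(docs - covered) - len(docs & covered)

    while candidates and len(chosen) < k:
        # sorted() makes tie-breaking deterministic (alphabetical).
        best = max(sorted(candidates), key=score)
        if score(best) <= 0:
            break                         # nothing useful left to add
        chosen.append(best)
        covered |= candidates.pop(best)
    return chosen, covered

seed = set(range(1, 13))
term_docs = {
    "t0": {1, 2, 3, 4, 5, 6},   # picked first, becomes C
    "t1": {7, 8, 9, 10},        # good new coverage, no overlap
    "t2": {12},                 # low coverage
    "t3": {3, 4, 5, 6, 11},     # high redundancy with C
}
chosen, covered = greedy_select(seed, term_docs, k=2)
print(chosen)
```

In this toy data, once t0 has been selected, t1 wins the next round: t2 adds only one document and t3 mostly repeats what C already holds, mirroring the three cases labeled in the figure.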
Experimental Plan
• Implement an Inforadar site indexing ALL website data at UPRM
• Make Inforadar the official search engine for the UPRM web site
• Conduct usability study
• Analyze real user feedback
• Incorporate feedback into an improved design
References
• Query Lookahead for Query-Based Document Categorization
– Ph.D. Thesis
– Massachusetts Institute of Technology
– September 1999
• Fast and Effective Query Refinement
– Bienvenido Vélez, Ron Weiss, Mark Sheldon and David K. Gifford
– ACM Conference on Research and Development in Information Retrieval (SIGIR 97)
• HyPursuit: A network search engine exploiting content-link similarity
– R. Weiss, B. Vélez, M. Sheldon, C. Namprempre, P. Szilagyi and D. K. Gifford
– ACM Conference on Hypertext (HyperText 96)