Document Collections
cs5984: Information Visualization
Chris North
Where are we?
• Multi-D
• 1D
• 2D
• Hierarchies/Trees
• Networks/Graphs
• Document collections
• 3D
• Design Principles
• Empirical Evaluation
• Java Development
• Visual Overviews
• Multiple Views
• Peripheral Views
Structured Document Collections
• Multi-dimensional
  • author, title, date, journal, …
• Trees
  • Dewey Decimal
• Networks
  • web, citations
Envision
• Ed Fox, et al.
• Multi-D
• similar to Spotfire
Unstructured Document Collections
• Focus on Full Text
• Examples:
  • Digital libraries, encyclopedias
  • Web, homepages, photo collections
• Tasks:
  • Search, keyword
  • Browse
  • Themes, subjects, topics, library coverage
  • Size, distributions
Visualization Strategies
• Cluster Maps (today)
• Keyword Query (today)
• Relationships
• Reduced representation
• User-controlled layout
Cluster Map
• Create a “map” of the document collection
  • Similar documents near
  • Dissimilar documents far
• “Grocery store” concept
Document Vectors
               Doc1   Doc2   Doc3   …
“aardvark”      1      2      0
“banana”        2      1      0
“chris”         0      0      3
…
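• Similarity between a pair of docs = dot product of their vectors: $\mathrm{sim}(d_i, d_j) = \vec{d_i} \cdot \vec{d_j} = \sum_t d_{i,t}\, d_{j,t}$, summed over terms $t$
  • e.g. Doc1 · Doc2 = 1·2 + 2·1 + 0·0 = 4
• Lay out documents in a 2-D map by similarity
  • Similar to the spring model for graph layout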
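A minimal sketch of the similarity measure and the 2-D layout, assuming the raw term counts from the table above as vector entries, the plain dot product as similarity, and a simple spring model in which every pair of documents is joined by a spring whose rest length shrinks as similarity grows. The rest-length rule `1 / (1 + sim)`, the step size, and the iteration count are illustrative choices, not part of the original method.

```python
import math
import random

# Term counts from the table: each document is a vector over VOCAB.
VOCAB = ["aardvark", "banana", "chris"]
DOCS = {
    "Doc1": [1, 2, 0],
    "Doc2": [2, 1, 0],
    "Doc3": [0, 0, 3],
}

def dot(u, v):
    """Similarity of two document vectors as their dot product."""
    return sum(a * b for a, b in zip(u, v))

def spring_layout(docs, iterations=500, step=0.05, seed=0):
    """Place documents in 2-D so that similar documents end up close together.

    Assumption: each pair is joined by a spring with rest length
    1 / (1 + similarity), so higher similarity means a shorter spring.
    """
    rng = random.Random(seed)
    names = list(docs)
    pos = {n: [rng.uniform(-1, 1), rng.uniform(-1, 1)] for n in names}
    for _ in range(iterations):
        for i, a in enumerate(names):
            for b in names[i + 1:]:
                dx = pos[b][0] - pos[a][0]
                dy = pos[b][1] - pos[a][1]
                dist = math.hypot(dx, dy) or 1e-9
                rest = 1.0 / (1.0 + dot(docs[a], docs[b]))
                # Hooke's law: move both ends in proportion to the stretch.
                f = step * (dist - rest) / dist
                pos[a][0] += f * dx
                pos[a][1] += f * dy
                pos[b][0] -= f * dx
                pos[b][1] -= f * dy
    return pos

if __name__ == "__main__":
    for name, (x, y) in spring_layout(DOCS).items():
        print(f"{name}: ({x:.2f}, {y:.2f})")
```

Doc1 and Doc2 share terms (dot product 4) and settle close together, while Doc3 shares none and ends up about one unit from both. Practical systems usually normalize the vectors (cosine similarity) so that long documents do not dominate.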
Cluster Algorithms
• Partition clustering: partition into k subsets
  • Pick k seeds
  • Iteratively attract nearest neighbors
• Hierarchical clustering: dendrogram
  • Group nearest-neighbor pair
  • Iterate
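Both families can be sketched in a few lines. Assumptions: documents are points compared with Euclidean distance; the partition step is written as plain k-means (pick k seeds, assign each document to its nearest seed, move each seed to the mean of its members, repeat); the hierarchical step is single-link agglomerative clustering, which repeatedly merges the nearest pair of groups and records each merge, the information a dendrogram draws. Neither is claimed to be the exact algorithm used by the systems in this lecture.

```python
import math
import random

def dist(u, v):
    """Euclidean distance between two document points."""
    return math.dist(u, v)

def kmeans(points, k, iterations=20, seed=0):
    """Partition clustering: pick k seeds, then iteratively reassign and recenter."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign each point to its nearest center.
        labels = [min(range(k), key=lambda c: dist(p, centers[c])) for p in points]
        # Move each center to the mean of its members (keep it if it has none).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(xs) / len(members) for xs in zip(*members)]
    return labels

def agglomerative(points):
    """Hierarchical clustering: repeatedly merge the nearest pair of clusters.

    Single linkage (distance between the closest members). Returns the merge
    history as (left, right, distance) tuples, i.e. the dendrogram.
    """
    clusters = [[i] for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(dist(points[a], points[b]) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((sorted(clusters[i]), sorted(clusters[j]), d))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return merges

if __name__ == "__main__":
    docs = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.3, 4.9)]
    print(kmeans(docs, k=2))
    for left, right, d in agglomerative(docs):
        print(left, "+", right, f"at distance {d:.2f}")
```

The partition method needs k up front; the hierarchical method does not, but the naive version shown here is cubic in the number of documents, which is one reason large collections favor partitioning.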
Kohonen Maps
• Xia Lin, “Document Space”
  » samal, ying
• http://faculty.cis.drexel.edu/sitemap/index.html
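A Kohonen map (self-organizing map) is a grid of nodes, each holding a weight vector in document space. A minimal training sketch: for each document, find the best-matching node and pull it and its grid neighbors toward the document, shrinking the learning rate and neighborhood radius over time. The grid size, linear decay schedule, and Gaussian neighborhood are illustrative assumptions, not the settings used by Lin's Document Space or WEBSOM.

```python
import math
import random

def best_matching_unit(weights, doc):
    """Grid node whose weight vector is closest to the document."""
    return min(weights, key=lambda node: math.dist(weights[node], doc))

def train_som(docs, rows=4, cols=4, epochs=200, seed=0):
    """Train a rows x cols self-organizing map on document vectors."""
    rng = random.Random(seed)
    dim = len(docs[0])
    # Each grid node starts with a random weight vector in document space.
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    radius0 = max(rows, cols) / 2
    for t in range(epochs):
        frac = t / epochs
        lr = 0.5 * (1 - frac)                     # learning rate decays toward 0
        radius = max(radius0 * (1 - frac), 0.5)   # neighborhood shrinks
        for doc in rng.sample(docs, len(docs)):   # visit documents in random order
            bmu = best_matching_unit(weights, doc)
            for node, w in weights.items():
                grid_d = math.dist(node, bmu)     # distance on the grid, not in doc space
                h = math.exp(-(grid_d ** 2) / (2 * radius ** 2))
                for k in range(dim):
                    w[k] += lr * h * (doc[k] - w[k])
    return weights

if __name__ == "__main__":
    docs = [[1, 2, 0], [2, 1, 0], [0, 0, 3]]  # term counts from the document-vector table
    som = train_som(docs)
    for d in docs:
        print(d, "->", best_matching_unit(som, d))
```

Each document is then drawn at its best-matching grid cell, so similar documents land in the same or neighboring cells, giving the cluster-map layout.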
Themescapes, Cartia
• PNL
• Mountain height = cluster size
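One way to turn “mountain height = cluster size” into a terrain is a height field built from one bump per cluster. Assumption, not the documented ThemeScape method: each cluster contributes a Gaussian bump centered at its 2-D map position with a peak equal to its document count, and the terrain height is the sum of the bumps; the `spread` parameter is hypothetical.

```python
import math

def terrain_height(x, y, clusters, spread=1.0):
    """Height of the theme landscape at (x, y).

    clusters: list of ((cx, cy), size) pairs; each cluster adds a Gaussian
    bump whose peak equals its size (hypothetical rendering rule).
    """
    return sum(
        size * math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * spread ** 2))
        for (cx, cy), size in clusters
    )

if __name__ == "__main__":
    # Hypothetical clusters: a large theme and a small one, far apart.
    clusters = [((0.0, 0.0), 40), ((10.0, 0.0), 5)]
    print(round(terrain_height(0.0, 0.0, clusters), 2))   # 40.0
    print(round(terrain_height(10.0, 0.0, clusters), 2))  # 5.0
```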
WebSOM
• http://websom.hut.fi/websom/
Cluster Map
• Good:
  • Map of collection
  • Major themes and sizes
  • Relationships between themes
  • Scales up
• Bad:
  • Where to locate documents with multiple themes?
    » Both mountains, between mountains, …?
  • Relationships between documents, within documents?
  • Algorithm becomes (too) critical
Keyword Query
• Keyword query, search engine
  • Rank-ordered list
• “Information Retrieval”
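A minimal sketch of a keyword query that returns a rank-ordered list. Assumption: a document's score is the dot product of the query's term-count vector with the document's term-count vector (the same vectors as in the cluster-map section); real search engines use weighted schemes such as tf-idf, which this omits. Documents sharing no term with the query are dropped from the list.

```python
from collections import Counter

def rank(query, docs):
    """Return (doc_id, score) pairs for documents matching the query, best first.

    Score = dot product of query and document term-count vectors.
    """
    q = Counter(query.lower().split())
    results = []
    for doc_id, text in docs.items():
        counts = Counter(text.lower().split())
        score = sum(q[t] * counts[t] for t in q)
        if score > 0:
            results.append((doc_id, score))
    # Highest score first; ties broken by document id for a stable order.
    results.sort(key=lambda r: (-r[1], r[0]))
    return results

if __name__ == "__main__":
    docs = {
        "Doc1": "aardvark banana banana",
        "Doc2": "aardvark aardvark banana",
        "Doc3": "chris chris chris",
    }
    print(rank("banana", docs))  # [('Doc1', 2), ('Doc2', 1)]
```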
TileBars
• Hearst, “TileBars”
  » reenal, xueqi
• http://elib.cs.berkeley.edu/tilebars/
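TileBars draw each retrieved document as a bar split into text segments, with one row per query term (Hearst's interface uses sets of terms) and each square shaded by how often that term occurs in that segment. The sketch below computes and crudely renders that grid. Assumptions: segments are fixed-size word windows, a simplification of the topic-based segmentation used by Hearst's system, and `tilebar`/`render` are hypothetical helper names.

```python
def tilebar(text, terms, segment_words=50):
    """Return one row per query term and one column per segment, holding term counts.

    Segments are fixed-size windows of `segment_words` words (a simplification).
    """
    words = text.lower().split()
    segments = [words[i:i + segment_words] for i in range(0, len(words), segment_words)]
    return [[seg.count(term.lower()) for seg in segments] for term in terms]

def render(grid, shades=" .:#"):
    """Draw the grid as text, using darker characters for higher counts."""
    peak = max((c for row in grid for c in row), default=0) or 1
    return ["|" + "".join(shades[round(c / peak * (len(shades) - 1))] for c in row) + "|"
            for row in grid]

if __name__ == "__main__":
    doc = "banana " * 6 + "filler " * 6 + "aardvark banana " * 3
    grid = tilebar(doc, ["banana", "aardvark"], segment_words=6)
    print(grid)  # [[6, 0, 3], [0, 0, 3]]
    for line in render(grid):
        print(line)
```

The last column is shaded in both rows, which is the pattern TileBars make visible at a glance: the segment where the query terms co-occur.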
VIBE
• Korfhage, http://www.pitt.edu/~korfhage/interfaces.html
• Documents located between query keywords using spring model
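One way to read the spring model: each query keyword is a fixed point on the screen, and a document is tied to every keyword by a spring whose stiffness is that keyword's weight in the document. Setting the net force $\sum_k w_k (p_k - x)$ to zero gives $x = \sum_k w_k p_k / \sum_k w_k$, the weighted average of the keyword positions. The sketch computes that equilibrium directly; the keyword positions, the use of raw term counts as weights, and the name `vibe_position` are hypothetical.

```python
def vibe_position(weights, poi_positions):
    """Equilibrium of springs pulling a document toward each keyword.

    weights: keyword -> spring stiffness (e.g. term count in the document)
    poi_positions: keyword -> (x, y) screen position of that keyword
    Returns the weighted average of keyword positions, or None if no keyword matches.
    """
    total = sum(weights.get(k, 0) for k in poi_positions)
    if total == 0:
        return None
    x = sum(weights.get(k, 0) * px for k, (px, py) in poi_positions.items()) / total
    y = sum(weights.get(k, 0) * py for k, (px, py) in poi_positions.items()) / total
    return (x, y)

if __name__ == "__main__":
    pois = {"aardvark": (0.0, 0.0), "banana": (3.0, 0.0), "chris": (0.0, 3.0)}
    print(vibe_position({"aardvark": 1, "banana": 2}, pois))  # (2.0, 0.0)
    print(vibe_position({"chris": 3}, pois))                   # (0.0, 3.0)
```

A document with twice as many “banana” as “aardvark” occurrences sits two thirds of the way toward “banana”. Note that documents with different term mixes can land on the same spot, since only the ratios of the weights matter.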
VR-VIBE
Keyword Query
• Good:
  • Reduces the browsing space
  • Map according to user’s interests
• Bad:
  • What keywords do I use?
  • What about other related documents that don’t use these keywords?
  • No initial overview
  • Mega-hit, zero-hit problem (too many results, or none)
Assignment
• Thurs: Document Collections
  • Bederson, “Image Browsing”
    » Rui, anusha
  • Card, “WebBook and Web Forager”
    » mrinmayee, ming
• Demo your hw3: tues or thurs
Next Week
• Tues: 3-D data
  • Kniss, “Interactive Volume Rendering with Direct Manip”
    » xueqi, mahesh
• Thurs: Workspaces
  • Robertson, “Task Gallery”
    » supriya, varun
  • Upson, “AVS”
    » christa, jun
• Thanksgiving break
• Tues 27: Debates
  • Kobsa, “Empirical comparison of comm infovis systems”
    » kunal, zhiping
Upcoming Sched
• Tues: 3-D data
• Thurs: Workspaces
• Thanksgiving break
• Tues 27: Debates
• Thurs 29: How (not) to lie with visualization
• Dec: project presentations
• Dec 7: CHI 2-pagers due, student posters due