Page 1

Visualization in Text Information Retrieval

Ben Houston, Exocortex Technologies
www.exocortex.org

Zack Jacobson, CAC

Page 2

The Starting Goal

The original project goal: can we come up with a graphical way of representing search results that is superior to text-only displays?

Other VITA project members: Els Goyette, Olivier Dagenais, Sarah Rosser

Page 3

A Text IR Interaction Model

[Diagram: the User, the Interface, the IR Search Proxy and the Document Collection, linked by Query(s), Results and Browsing.]

Page 4

A Quantification of User Needs

Specific resource.
– Has a particular book or web page in mind.

Specific information.
– Needs a book on a particular subject that contains particular information.

Specific knowledge.
– Needs to learn about an unfamiliar subject.

Page 5

A Quantification of User Needs

Specific resource.
Specific information.
– IR is good at these tasks. IMHO, visualization would be an unneeded hindrance.

Specific knowledge.
– Maybe there is an opportunity here: there is a lot of information to sift through.

Page 6

Formalizing Knowledge Search

There is a hypothetical set of relevant documents that the user would like: $D_r$.

The user attempts to obtain $D_r$ by making an initial guess and then refining it through a series of queries $q_1, q_2, \ldots, q_n$.

We can think of this as an iterative, evolutionary hill climber.
– A series of subgoals: find $q_{n+1}$ such that $P(D_r \mid q_{n+1}) > P(D_r \mid q_n)$.

Thus: how can we help the user maximize $P(D_r \mid q)$ as quickly as possible?
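The hill-climbing view above can be made concrete with a toy sketch. Everything here is an assumption for illustration, not the method used in the VITA prototypes: documents are sets of terms, a query retrieves the documents that contain all of its terms, and the Jaccard overlap between the retrieved set and $D_r$ stands in for $P(D_r \mid q)$. Each step tries every query that adds or drops one term and keeps the best one only if it strictly improves the score.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Docs = Dict[str, FrozenSet[str]]


def retrieve(query: FrozenSet[str], docs: Docs) -> Set[str]:
    """Boolean AND retrieval: ids of documents containing every query term."""
    return {doc_id for doc_id, terms in docs.items() if query <= terms}


def score(query: FrozenSet[str], docs: Docs, relevant: Set[str]) -> float:
    """Assumed stand-in for P(D_r | q): Jaccard overlap of retrieved set and D_r."""
    retrieved = retrieve(query, docs)
    union = retrieved | relevant
    return len(retrieved & relevant) / len(union) if union else 0.0


def hill_climb(start: Set[str], docs: Docs, relevant: Set[str],
               max_steps: int = 20) -> Tuple[List[FrozenSet[str]], float]:
    """Refine q_1, q_2, ... for as long as each step strictly improves the score."""
    vocabulary = set().union(*docs.values())
    query = frozenset(start)
    current = score(query, docs, relevant)
    history = [query]
    for _ in range(max_steps):
        # Candidate q_{n+1}: add one unused term or drop one existing term.
        # Sorting keeps tie-breaking deterministic.
        candidates = [query | {t} for t in sorted(vocabulary - query)]
        candidates += [query - {t} for t in sorted(query)]
        if not candidates:
            break
        best = max(candidates, key=lambda q: score(q, docs, relevant))
        best_score = score(best, docs, relevant)
        if best_score <= current:
            break  # local optimum: no neighbouring query does better
        query, current = best, best_score
        history.append(query)
    return history, current


if __name__ == "__main__":
    docs = {
        "d1": frozenset({"jaguar", "car", "engine"}),
        "d2": frozenset({"jaguar", "cat", "jungle"}),
        "d3": frozenset({"cat", "jungle", "habitat"}),
        "d4": frozenset({"car", "engine", "repair"}),
    }
    # The user is after big cats but starts with the ambiguous query "jaguar".
    history, final = hill_climb({"jaguar"}, docs, relevant={"d2", "d3"})
    for q in history:
        print(sorted(q))
    print(final)
```

On this toy collection the climb goes from {jaguar} to {cat, jaguar} and then drops "jaguar" to reach {cat}, which retrieves exactly $D_r$. A useful refinement step can remove a term, not just add one.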

Page 7

Don’t forget… the common IR problems.

Difficulty in formulating effective queries.
– The average number of terms per query is about 1.5.

Words do not have a 1:1 mapping to semantic concepts.

Determining the relevance ranking of an individual document.
– Going past just words.

How do you deal with 1 billion documents?
– Did you know the number is more than doubling every year?
– Databases/indices of 500+ GB each.

Page 8

Our efforts

1st try: Bar charts (even 3D bar charts!).
– Naïve first attempts; we won’t mention those.

2nd try: Concept-document clustering in an information space.
– Two prototypes, NetViz and AutoViz; more should be developed.

Page 9

The Major “Neat” Features

Focus on concrete representation of the query.
Use data-mining techniques before visualization.
Visual summaries.
An active model for interaction.
Bridging the gaps between “serial” queries.
Widening / narrowing to get context.

Page 10

Location, Color, Size, Shape

Show the concepts in meaningful spatial relationships to one another.

Show the specific results positioned in relation to the concepts.
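One way to position results "in relation to the concepts" is sketched below. It is an assumption, not how NetViz or AutoViz actually lay things out: each query concept gets a fixed anchor on a circle, and each document is drawn at the centroid of the anchors of the concepts it contains, weighted by hypothetical per-concept weights (for example, term frequencies).

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def concept_anchors(concepts: List[str], radius: float = 1.0) -> Dict[str, Point]:
    """Assumed layout: spread the query concepts evenly around a circle."""
    n = len(concepts)
    return {
        c: (radius * math.cos(2 * math.pi * i / n),
            radius * math.sin(2 * math.pi * i / n))
        for i, c in enumerate(concepts)
    }


def place_document(weights: Dict[str, float], anchors: Dict[str, Point]) -> Point:
    """Weighted centroid of the anchors of the concepts a document contains."""
    used = {c: w for c, w in weights.items() if c in anchors and w > 0}
    total = sum(used.values())
    if total == 0:
        return (0.0, 0.0)  # documents matching no concept sit at the centre
    x = sum(w * anchors[c][0] for c, w in used.items()) / total
    y = sum(w * anchors[c][1] for c, w in used.items()) / total
    return (x, y)


if __name__ == "__main__":
    anchors = concept_anchors(["jaguar", "cat", "car", "engine"])
    # A document mostly about jaguars, with some mention of cats.
    print(place_document({"jaguar": 2.0, "cat": 1.0}, anchors))
```

A known weakness of this kind of layout is that a document weighted equally on two opposite concepts also lands at the centre, where it looks like an unrelated document; color, size or shape can help tell them apart.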

Page 11

Display Intra-result Structure

Clustering on implicit/latent trends

Page 12

Visual Document Summaries

Instead of [image], let’s show intra-document concept co-occurrence.
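A rough sketch of extracting intra-document concept co-occurrence, under simplifying assumptions of our own: concepts are single lowercase words, sentences are split on `.`, `!` and `?`, and two concepts co-occur when they appear in the same sentence. The resulting pair counts are the kind of data a visual summary could draw as a small graph or matrix.

```python
import re
from collections import Counter
from itertools import combinations
from typing import Iterable


def concept_cooccurrence(text: str, concepts: Iterable[str]) -> Counter:
    """Count how often each pair of concepts appears in the same sentence."""
    wanted = {c.lower() for c in concepts}
    pairs: Counter = Counter()
    for sentence in re.split(r"[.!?]+", text):
        words = set(re.findall(r"[a-z]+", sentence.lower()))
        # Sorted so each pair is always keyed in the same (alphabetical) order.
        present = sorted(wanted & words)
        for a, b in combinations(present, 2):
            pairs[(a, b)] += 1
    return pairs


if __name__ == "__main__":
    doc = "The jaguar is a big cat. Jaguar cars have a V8 engine. The cat sleeps."
    print(concept_cooccurrence(doc, ["jaguar", "cat", "engine"]))
```

In the example, "jaguar" co-occurs once with "cat" and once with "engine", while "cat" and "engine" never share a sentence. That is exactly the kind of split a visual summary could make obvious at a glance.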

Page 13

Exploring within a result set

Highlighting and extracting subsets.

Each document has a probability distribution over the different clusters:

$\sum_i P(C_i \mid d) = 1$
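The slide does not say how $P(C_i \mid d)$ is computed. The sketch below assumes one simple option: take a softmax over a document's cosine similarity to each cluster centroid, which guarantees the probabilities sum to 1. It then highlights and extracts the documents whose probability for a chosen cluster passes a threshold. The `sharpness` parameter and the 0.5 threshold are illustrative choices, not values from the slides.

```python
import math
from typing import Dict, List

Vector = Dict[str, float]  # sparse term -> weight vector


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_distribution(doc: Vector, centroids: List[Vector],
                         sharpness: float = 5.0) -> List[float]:
    """Assumed P(C_i | d): softmax of similarities, so the values sum to 1."""
    exps = [math.exp(sharpness * cosine(doc, c)) for c in centroids]
    total = sum(exps)
    return [e / total for e in exps]


def extract_subset(docs: Dict[str, Vector], centroids: List[Vector],
                   cluster: int, threshold: float = 0.5) -> List[str]:
    """Sorted ids of documents with P(C_cluster | d) >= threshold."""
    return sorted(doc_id for doc_id, vec in docs.items()
                  if cluster_distribution(vec, centroids)[cluster] >= threshold)
```

Because every document keeps a full distribution rather than a single label, a document that straddles two clusters can be highlighted in both, for example with a lower threshold.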

Page 14

Exploring outside a result set

(Slightly Hypothetical)

Present three things to the user:
1. Where the user is. (The City)
2. What is at the user's current location. (The Sights)
3. What related/nearby places there are. (The Highways)

A mockup of this is available on my website: www.exocortex.org/~ben/trendanalysis2.html
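The city / sights / highways idea can be sketched over a toy collection of term sets. This is a hypothetical reading of the slide: the "city" is the current query, the "sights" are the documents it matches, and the "highways" are neighbouring queries formed by adding one of the terms that most often co-occur in those documents.

```python
from collections import Counter
from typing import Dict, FrozenSet, List, Tuple

Docs = Dict[str, FrozenSet[str]]


def explore(location: FrozenSet[str], docs: Docs,
            k: int = 3) -> Tuple[FrozenSet[str], List[str], List[FrozenSet[str]]]:
    """Return (city, sights, highways) for a term-set query."""
    # The Sights: documents containing every term of the current query.
    sights = sorted(d for d, terms in docs.items() if location <= terms)
    # Count the other terms that appear in those documents.
    counts = Counter(t for d in sights for t in docs[d] - location)
    # The Highways: the k most frequent co-occurring terms, ties broken alphabetically.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:k]
    highways = [location | {t} for t, _ in ranked]
    return location, sights, highways


if __name__ == "__main__":
    docs = {
        "d1": frozenset({"jaguar", "car", "engine"}),
        "d2": frozenset({"jaguar", "cat", "jungle"}),
        "d3": frozenset({"cat", "jungle", "habitat"}),
    }
    city, sights, highways = explore(frozenset({"cat"}), docs)
    print(sorted(city), sights, [sorted(h) for h in highways])
```

Following a highway simply makes it the new city, so the user can move through the collection one neighbouring query at a time instead of typing each query from scratch.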

Page 15

Bridging Serial Queries

Instead of requiring users to judge each query as a separate entity, why not let them see what changes in the results as they refine their query?
– Currently we do serial searching with backtracking.
– A potentiator for non-serial methods of exploration in a (Bayesian) “concept space” network.

$P(D_r \mid q_{n+1}) > P(D_r \mid q_n)$
$P(D_r \mid f(q_{n+1}, q_n)) > P(D_r \mid f(q_n))$
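As a minimal, hypothetical illustration of showing "what changes in the results" between $q_n$ and $q_{n+1}$, the sketch below diffs two ranked result lists into added, dropped and re-ranked documents. It does not attempt the combined scoring function $f$ from the slide; it only surfaces the difference so the user does not have to judge each query in isolation.

```python
from typing import Dict, List


def result_diff(prev: List[str], curr: List[str]) -> Dict[str, list]:
    """Compare the ranked results of q_n (prev) and q_{n+1} (curr)."""
    prev_rank = {d: i for i, d in enumerate(prev)}
    curr_rank = {d: i for i, d in enumerate(curr)}
    return {
        # New documents that q_{n+1} brought in.
        "added": [d for d in curr if d not in prev_rank],
        # Documents that q_{n+1} lost.
        "dropped": [d for d in prev if d not in curr_rank],
        # Documents kept by both queries but at a different rank: (id, old, new).
        "moved": [(d, prev_rank[d], curr_rank[d]) for d in curr
                  if d in prev_rank and prev_rank[d] != curr_rank[d]],
    }


if __name__ == "__main__":
    print(result_diff(["d1", "d2", "d3"], ["d2", "d3", "d4"]))
```
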

Page 16

Widening / Narrowing Scope

Allowing for interactive narrowing or widening of the display by filtering on document relevance.
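Narrowing or widening by relevance can be as simple as a threshold on a per-document relevance score. The sketch assumes the scores already exist and lie in [0, 1], however the engine computes them. Raising the threshold narrows the display and lowering it widens it. `step_threshold` is a hypothetical slider helper, not part of any prototype.

```python
from typing import Dict, List


def visible(scores: Dict[str, float], threshold: float) -> List[str]:
    """Documents whose relevance is at least `threshold`, most relevant first."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [doc_id for doc_id, s in ranked if s >= threshold]


def step_threshold(threshold: float, widen: bool, step: float = 0.1) -> float:
    """Hypothetical slider step: lower the threshold to widen, raise it to narrow."""
    new = threshold - step if widen else threshold + step
    return min(1.0, max(0.0, new))  # keep the threshold inside [0, 1]


if __name__ == "__main__":
    scores = {"d1": 0.9, "d2": 0.6, "d3": 0.3}
    print(visible(scores, 0.5))  # narrow view
    print(visible(scores, 0.2))  # wider view
```
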

Page 17

Browser Integration… of course

Spawning of browsers.
Seamless browser integration (hypothetical).

Page 18

Results and Predictions

“Ad hoc” Results

Extracting / presenting intra-result set structure is extremely effective.
There is value in breaking free from serial queries.
Provide landmarks and easy exploratory interaction models. More work needed. ???
The current browser interface is really limiting.
The underlying engine is more critical than the visualizations.
Visual document summaries need more work.
Overactive (hyperactive) interfaces are hard to learn.

Ben’s Future of Text IR

• Visualization is usually a fix for insufficient data-mining / algorithm techniques (in text IR).

• Intra-result set clustering works in text-only displays too. It will be integrated into existing text search engines.

• The metaphor of exploring an information space will become more popular.

Page 19

Hmm… Sturgeon tastes good.

Want to try it? Download the prototypes!
NetViz: http://www.exocortex.org/netviz

AutoViz: http://www.exocortex.org/autoviz

Comments? Email [email protected]