Knowledge Management Institute
1
Markus Strohmaier 2012
Navigational Models
Markus Strohmaier, Denis Helic Multimediale Informationssysteme II
Overview
Next 3 lectures (held by me):
• Models of user navigation
• Search query log analysis
• Recommendation algorithms
The Memex (1945)
• A mechanized private library for individual use
• Mimics associative memory, where users can
  – insert documents
  – navigate documents
  – retrieve documents
  – build trails through documents
• Operated and maintained individually
• But trails can be shared socially, e.g.
  (i) user A sends a trail to user B
  (ii) user B modifies it and shares it with user C
  (iii) user C uses the trail for navigation
[Bush 1945] V. Bush. As We May Think. Atlantic Monthly, 1945.
[Figure: The Memex [Bush 1945]. Users A, B and C connected by steps (i), (ii) and (iii): C’s interaction with documents is mediated by users A and B.]
Web based Retrieval: Challenges
The Web is self-organized
• No central authority (for the WWW) or main index
• Everyone can add (even edit) pages
• Pages disappear on a regular basis
  – A US study claimed that in 2 investigated tech. journals, 50% of the cited links were inaccessible after four years.
• Lots of errors and falsehoods, no quality control
Web based Retrieval: Challenges
The Web is hyperlinked
• Based on HTML markup tags and URIs
• Pages are interconnected
  – Unidirectional links (in-link, out-link, self-link)
• Network structures emerge from the links
  – Link analysis is possible
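As a minimal illustration of what link analysis can start from, the sketch below counts in-links, out-links and self-links from a list of directed links. The page names and the `link_counts` helper are hypothetical, made up for this example.

```python
from collections import Counter

def link_counts(links):
    """Count in-links, out-links and self-links per page.

    `links` is an iterable of (source, destination) pairs; each pair is one
    unidirectional hyperlink.
    """
    in_links, out_links, self_links = Counter(), Counter(), Counter()
    for src, dst in links:
        if src == dst:
            self_links[src] += 1      # a page linking to itself
        else:
            out_links[src] += 1       # link leaves src
            in_links[dst] += 1        # link arrives at dst
    return in_links, out_links, self_links

# Hypothetical toy web of four pages.
links = [
    ("home", "about"), ("home", "news"), ("news", "home"),
    ("about", "home"), ("blog", "home"), ("blog", "blog"),
]
in_links, out_links, self_links = link_counts(links)
print(in_links.most_common())
```

Because links are unidirectional, a page's in-link count can differ from its out-link count, which is the raw material for link-based ranking.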
The World Wide Web (1990-2000)
A user’s interaction with the web is mediated by (a few) editors and publishers
The World Wide Web Today (2010)
Interaction between individuals and computational systems is mediated by the aggregate behavior of massive numbers (millions) of users.
Information Foraging Theory
Slides based on
• Information Foraging Theory for Control Room Resilience, Ronald Laurids Boring, PhD
• Information Search (Shneiderman and Plaisant, Ch. 13) from http://wps.aw.com/aw_shneider_dtui_13
Cost of Knowledge, Search, Cognition, and Computers
Information systems (computers) and the “cost” of acquiring knowledge
– A first principle of information system design
– “Cognitive information ergonomics”
  • Efficiency / productivity gain / usability / …
– “Economics of cognition and the cognitive cost of knowledge”
There is (and has always been) a cost to acquiring information / knowledge
– cost = user/worker time plus, e.g., machine cost, database access charge, book
Many studies fail to document increased profit directly from the implementation of a (single) information system
– However, there is no doubt that worker productivity increased dramatically in the late 20th century
– Productivity was greatly enhanced by the pervasive use of electronic information systems (computers)
Informavores and Information Foraging
That the human quest for information is innate and adaptive is well known
Humans are informavores
– George Miller, 1983 (of “… magic number 7 ± 2”)
– Organisms that hunger for information about the world and themselves
“A wealth of information creates a poverty of attention and a need to allocate it efficiently”
– Herb Simon (AI, Nobel prize, economics, cognition)
Consider the analogy between acquiring knowledge and animals seeking food
– Pirolli, P. and S. Card (1995). Information Foraging in Information Access Environments, in CHI '95, p. 518
– Pirolli, P. (2007) ….. Book …..
Information overload vs. diet
Information Foraging Theory (IFT)
– Pirolli and Card, Xerox PARC
– “an approach to the analysis of human activities involving information access technologies”
– Derives from optimal foraging theory in biology and anthropology
  • Analyzes the adaptive value of food-foraging strategies
Analyzes trade-offs between the value of information gained and the costs of performing the activity in human-computer interaction tasks
– Models and analysis techniques are needed to determine the value added by information access, manipulation, and presentation techniques
The real information system design problem is not how to collect more information, but how to optimize the user’s time
– Increase relevant information gained per unit time expended
IFT provides a relatively “formal” (quantitative) account
IFT – Time Scales
Considers the “adaptiveness of human-system designs in the context of the information ecologies in which tasks are performed”
– Ecology as a system, here of information
Time scales of information seeking and sense making activities
– Cognitive band (~100 ms – 10 s)
– Rational band (minutes to hours)
– Social band (days to months)
We have seen much of the cognitive band; now the others
Time scales of analysis
Time scale (s)    Psychological domain
10-1000           Problem solving, decision making
1-100             Visual search, motor behavior
0.100-1           Visual attention, perceptual judgment
[Figure: user interface domain, illustrated by a search-result snippet for Pete Pirolli's home page (www.parc.xerox.com/istl/members/pirolli/pirolli.html)]
IFT – An Ecological Perspective
Time scales of information seeking and sense making activities
– Cognitive band (~100 ms – 10 s)
– Rational band (minutes to hours)
– Social band (days to months)
As the time scale increases, there is less regard for how internal processing accomplishes the linking of actions to goals
Assumes behavior is governed by “rational principles and shaped by constraints and affordances of the task environment”
An ecological perspective, i.e., behavior is “adaptive” in that it accomplishes some goal
IFT – Metaphor and Quantitative
Information Foraging Theory
– The name is both a metaphor and a straightforward use of biological “optimal foraging theory”
Metaphor:
– Animals adapt behavior and structure through evolution
  • (humans don’t have to wait that long!)
– Animals adapt to increase their rate of energy intake, etc.
  • To do this they evolve different methods
  • E.g., wolves hunt prey, spiders build webs and wait
And there are analogies to this
– E.g., hunting = active information seeking, waiting = information filtering
– Humans (and others) hunt in groups when the variance of food is high
  • Accept a lower expected mean to minimize the probability of days without food
– Also, on the social time scale, sharing of information
Optimal Foraging Theory - Biology
Developed in biology for understanding opportunities and forces of adaptation
– Elements of the theory can help in understanding existing human adaptations for gaining and making sense of information
– Also aids task analysis for creating new interactive information system designs
Optimality models include
– Decision assumptions
  • Which of the problems faced by an agent are to be analyzed
  • E.g., whether to pursue a particular type of information (or prey) when encountered, how long to spend
– Currency assumptions
  • How choices are to be evaluated, e.g., information value (food value)
– Constraint assumptions
  • Limit and define relationships among decision and currency variables
  • E.g., from task structure, interface technology, user knowledge
Information Foraging Theory
Information foraging is usually a task embedded in the context of some other task
– Value and cost structure are defined in relation to the embedding task
– The value of external information may lie in improvements to the outcomes of the embedding task
Usually, the embedding task is some ill-structured problem
– Additional knowledge is needed to better define goals, available actions, heuristics, etc.
– E.g., choosing a graduate school, developing a business strategy
Using an optimality model does not imply that human behavior is classically rational
– I.e., that humans have perfect information and infinite computational resources
– Rather, humans exhibit bounded rationality, or make choices based on satisficing
IFT – Information Patch Model
Information patch model, from optimal foraging theory
Rate of currency intake, R = U / (Ts + Th)
– U = net amount of currency gained
– Ts = time spent searching
– Th = time spent exploiting
Net currency gain, U = Uf - Cf
– Uf = overall currency intake (gross amount foraged)
– Cf = currency expended in foraging
Average rate of currency intake, u = Uf / (λTs)
– Assumes information workers/foragers/consumers encounter information as a linear function of time
– Total n items encountered = λTs, where λ is the rate of encounter with items
IFT – Information Patch Model
Average cost of handling items: h = Th / (λTs)
Let s = search cost per unit time, then total cost of search = sTs
Then, substituting into the equation for R, the rate of currency intake:
R = (λTs·u - sTs) / (Ts + λTs·h) = (λu - s) / (1 + λh)
So R can be expressed in terms of
– Average rate of currency intake, u
– Search cost per unit time, s
– Cost of handling items, h
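The substitution can be checked numerically. The sketch below is a minimal illustration under the assumptions Uf = λTs·u, Cf = sTs and Th = λTs·h (h being handling time per item); the function names and the sample numbers are hypothetical.

```python
def rate_from_totals(U_f, C_f, T_s, T_h):
    """R = U / (Ts + Th) with U = Uf - Cf, as on the patch-model slide."""
    return (U_f - C_f) / (T_s + T_h)

def rate_of_gain(lam, u, s, h):
    """Same rate expressed per item: R = (lam*u - s) / (1 + lam*h).

    lam: items encountered per unit search time
    u:   average gain per item encountered
    s:   search cost per unit time
    h:   average handling time per item (assumed definition: Th / (lam*Ts))
    """
    return (lam * u - s) / (1.0 + lam * h)

# Hypothetical forager: 2 items per minute, 3 units of value per item,
# 1 unit of cost per minute of search, 0.5 minutes to handle each item.
lam, u, s, h = 2.0, 3.0, 1.0, 0.5
T_s = 10.0                      # minutes spent searching
U_f = lam * T_s * u             # gross gain over all items encountered
C_f = s * T_s                   # total search cost
T_h = lam * T_s * h             # total handling (exploitation) time

print(rate_from_totals(U_f, C_f, T_s, T_h))
print(rate_of_gain(lam, u, s, h))
```

Both expressions give the same rate, and the per-item form shows that Ts cancels out: the rate depends only on λ, u, s and h.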
Optimal Foraging Time in a Patch
• gi(t), cumulative gain function
  – Amount of information gained in time t
  – gA(t) = random order of encounter
    • Increase in information equal for all elements
    • Hence, constant slope
  – gB(t) and gC(t) = ordered by relevancy
    • “Relevant” items, those with higher information content, are encountered earlier
    • Hence, the rate of information increase is highest early on and then decreases
[Figure: cumulative gain curves gA(t), gB(t), gC(t). λp = rate of encounter with relevant items; x-axis: travel time between patches; RB and RC = rates of return; tB and tC = optimal foraging times]
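The optimal time in a patch can be found numerically by maximizing the overall rate of return g(t) / (travel time + t). The sketch below assumes a diminishing-returns gain function g(t) = G·(1 - exp(-t/τ)), which is a hypothetical stand-in for the relevance-ordered curves gB(t) and gC(t); the parameter values are made up.

```python
import math

def gain(t, total=10.0, tau=5.0):
    """Hypothetical cumulative gain with diminishing returns."""
    return total * (1.0 - math.exp(-t / tau))

def optimal_patch_time(travel_time, gain_fn=gain, t_max=60.0, step=0.01):
    """Grid search for the time in patch that maximizes gain per unit time.

    The rate counts both the travel time between patches and the time
    spent inside the patch.
    """
    best_t, best_rate = step, 0.0
    for i in range(1, int(t_max / step) + 1):
        t = i * step
        rate = gain_fn(t) / (travel_time + t)
        if rate > best_rate:
            best_t, best_rate = t, rate
    return best_t, best_rate

t_short, r_short = optimal_patch_time(travel_time=2.0)
t_long, r_long = optimal_patch_time(travel_time=10.0)
print(t_short, r_short)
print(t_long, r_long)
```

With this gain function, a longer travel time between patches yields a longer optimal stay and a lower overall rate of return. For a constant-slope curve such as gA(t) the rate keeps rising with t, so there is no interior optimum.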
IFT - Cost of Knowledge
Foraging efficiency
– Animals minimize energy expenditure to get the required gain in sustenance
– Humans minimize effort to get the necessary gain in information
Again, foraging for food has much in common with seeking information
– Like edible plants in the wild, useful information items are often grouped together, but separated by long distances in an “information wasteland”
Also, information “scent”
– Like the scent of food, information in the current environment that will assist in finding more information clusters
Activities are analyzed according to the value gained and the cost incurred
– Resource costs
  • Expenditures of time and cognitive effort incurred
– Opportunity costs
  • Benefits that could be gained by engaging in other activities
  • E.g., if not gaining information about visualization, could be gaining information about software design
IFT
Information processing systems evolve so as to maximize the gain of valuable information per unit cost
– Sensory systems (vision, hearing)
– Information access (card catalogs, offices)
maximize (information value / cost of interaction)
Navigating Networks
How can we model user navigation on networks?
Experiment [Milgram]
Goal
• Define a single target person and a group of starting persons
• Generate an acquaintance chain from each starter to the target
Experimental Set Up
• Each starter receives a document
• and was asked to begin moving it by mail toward the target
• Information about the target: name, address, occupation, company, college, year of graduation, wife’s name and hometown
• Information about relationship (friend/acquaintance) [Granovetter 1973]
Constraints
• The starter group was only allowed to send the document to people they know and
• was urged to choose the next recipient in such a way as to advance the progress of the document toward the target
[Photo: Milgram, 1933-1984]
Introduction
The simplest way of formulating the small-world problem is: Starting with any two people in the world, what is the likelihood that they will know each other? A somewhat more sophisticated formulation, however, takes account of the fact that while person X and Z may not know each other directly, they may share a mutual acquaintance - that is, a person who knows both of them. One can then think of an acquaintance chain with X knowing Y and Y knowing Z. Moreover, one can imagine circumstances in which X is linked to Z not by a single link, but by a series of links, X-A-B-C-D…Y-Z. That is to say, person X knows person A who in turn knows person B, who knows C… who knows Y, who knows Z.
[Milgram 1967, according to http://www.ils.unc.edu/dpr/port/socialnetworking/theory_paper.html#2]
An Experimental Study of the Small World Problem [Travers and Milgram 1969]
A social network experiment tailored towards
• Demonstrating
• Defining
• And measuring
inter-connectedness in a large society (USA)
A test of the modern idea of “six degrees of separation”, which states that every person on earth is connected to any other person through a chain of acquaintances not longer than 6
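The length of an acquaintance chain can be made concrete as a path length in an undirected graph. The sketch below is only an illustration with a hypothetical six-person network: it computes the shortest chain with breadth-first search, which needs global knowledge of the graph, whereas participants in the experiment only had local knowledge when forwarding the document.

```python
from collections import deque

def chain_length(graph, start, target):
    """Number of links on the shortest acquaintance chain, or None if unreachable."""
    if start == target:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        person, dist = queue.popleft()
        for friend in graph.get(person, ()):
            if friend == target:
                return dist + 1
            if friend not in seen:
                seen.add(friend)
                queue.append((friend, dist + 1))
    return None

# Hypothetical chain X-A-B-C-Y-Z, stored as symmetric adjacency lists.
acquaintances = {
    "X": ["A"], "A": ["X", "B"], "B": ["A", "C"],
    "C": ["B", "Y"], "Y": ["C", "Z"], "Z": ["Y"],
}
print(chain_length(acquaintances, "X", "Z"))
```

In this toy chain X reaches Z over 5 links, i.e. through 4 intermediaries.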
Results I
• How many of the starters would be able to establish contact with the target?
  – 64 out of 296 reached the target
• How many intermediaries would be required to link starters with the target?
  – Well, that depends: the overall mean is 5.2 links
  – Through hometown: 6.1 links
  – Through business: 4.6 links
  – Boston group faster than Nebraska groups
  – Nebraska stockholders not faster than Nebraska random
• What form would the distribution of chain lengths take?
Results III
• Common paths
• Also see: Gladwell’s “Law of the few”
Follow up work (2008)
http://arxiv.org/PS_cache/arxiv/pdf/0803/0803.0939v1.pdf
– Horvitz and Leskovec study 2008
– 30 billion conversations among 240 million people on Microsoft Messenger
– Communication graph with 180 million nodes and 1.3 billion undirected edges
– Largest social network constructed and analyzed to date (2008)
Decentralized Search
Goal: Navigate from START to TARGET using local and background knowledge only
[Figure: a (tag-tag) network with start node, target node and candidate next nodes, showing the shortest path to the target; background knowledge is a tag hierarchy (Folksonomy 1, Folksonomy ..., Folksonomy n)]
• Shortest path with global knowledge: pGK = 3
• Shortest path found with local knowledge: pLK = 4
• Δ = pLK - pGK
Idea: use folksonomies as background knowledge. Then, the performance of decentralized search depends on the suitability of folksonomies. In other words, we can evaluate the suitability of folksonomies for decentralized search through simulations.
J. Kleinberg. The small-world phenomenon: An algorithmic perspective. Proc. 32nd ACM Symposium on Theory of Computing, 2000. Also appears as Cornell Computer Science Technical Report 99-1776 (October 1999)
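A simulation of this kind can be sketched as follows. This is a minimal illustration, not the authors' implementation: the greedy rule (move to the unvisited neighbor that is closest to the target in the hierarchy), the toy network and the toy hierarchy are all assumptions chosen so that pGK = 3 and pLK = 4.

```python
from collections import deque

def ancestors(parent, node):
    """Return [node, parent(node), ..., root] in the hierarchy."""
    chain = [node]
    while node in parent:
        node = parent[node]
        chain.append(node)
    return chain

def tree_distance(parent, a, b):
    """Number of hierarchy edges between a and b via their lowest common ancestor."""
    depth_from_b = {n: i for i, n in enumerate(ancestors(parent, b))}
    for i, n in enumerate(ancestors(parent, a)):
        if n in depth_from_b:
            return i + depth_from_b[n]
    return float("inf")

def shortest_path_length(graph, start, target):
    """pGK: breadth-first search with global knowledge of the network."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == target:
            return dist[node]
        for nxt in graph[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return None

def decentralized_search(graph, parent, start, target, max_hops=10):
    """pLK: greedy agent that sees only its neighbors plus the hierarchy."""
    current, visited, hops = start, {start}, 0
    while current != target and hops < max_hops:
        candidates = [n for n in graph[current] if n not in visited]
        if not candidates:
            return None                      # dead end
        current = min(candidates, key=lambda n: tree_distance(parent, n, target))
        visited.add(current)
        hops += 1
    return hops if current == target else None

# Hypothetical background knowledge: root -> {a, b}, a -> {a1, a2}, b -> {b1, b2}.
parent = {"a": "root", "b": "root", "a1": "a", "a2": "a", "b1": "b", "b2": "b"}
# Hypothetical tag-tag network (undirected).
edges = [("a1", "a"), ("a1", "a2"), ("a2", "b1"), ("b1", "b2"),
         ("a", "root"), ("root", "b"), ("b", "b2")]
graph = {}
for x, y in edges:
    graph.setdefault(x, []).append(y)
    graph.setdefault(y, []).append(x)

p_gk = shortest_path_length(graph, "a1", "b2")
p_lk = decentralized_search(graph, parent, "a1", "b2")
print(p_gk, p_lk, p_lk - p_gk)
```

Here the agent follows the hierarchy upward (a1, a, root, b, b2) and misses the shortcut a1, a2, b1, b2, so Δ = 1. Swapping in a different `parent` mapping changes the path the agent finds, which is what makes the hierarchy itself the object of evaluation.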
Evaluating Hierarchical Structures in Networks
How can we measure the efficiency of hierarchical structures for navigation?
The World Wide Web (1990-2000)
How efficient is this as a navigational aid?
Construction of hierarchies from unstructured tagging data
From tag centrality to tag generality:
[Heymann and Garcia-Molina 2006]
high tag centrality: more abstract
low tag centrality: more specific
Other existing folksonomy algorithms: k-means, affinity propagation, …
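The centrality-to-generality idea can be sketched roughly as follows. This is a simplified illustration only loosely in the spirit of the cited approach, not a reproduction of it: degree centrality in the tag co-occurrence graph, raw co-occurrence counts as similarity, and the toy posts are all assumptions.

```python
from collections import defaultdict
from itertools import combinations

def build_hierarchy(posts):
    """Build a tag tree: central tags near the root, peripheral tags below.

    posts: list of tag lists, one per tagged resource.
    Returns (root, parent) where parent maps each non-root tag to its parent.
    Tags that never co-occur with another tag are ignored.
    """
    cooc = defaultdict(lambda: defaultdict(int))
    for tags in posts:
        for a, b in combinations(sorted(set(tags)), 2):
            cooc[a][b] += 1
            cooc[b][a] += 1
    # Assumed centrality: number of distinct co-occurring tags (degree),
    # ties broken alphabetically so the result is deterministic.
    order = sorted(cooc, key=lambda t: (-len(cooc[t]), t))
    root, parent, placed = order[0], {}, [order[0]]
    for tag in order[1:]:
        # Attach under the already placed tag it co-occurs with most often;
        # on ties, max() keeps the earlier (more central) candidate.
        parent[tag] = max(placed, key=lambda p: cooc[tag].get(p, 0))
        placed.append(tag)
    return root, parent

# Hypothetical tagging data.
posts = [
    ["music", "jazz"], ["music", "rock"], ["music", "jazz", "saxophone"],
    ["jazz", "saxophone"], ["music", "rock", "guitar"],
    ["rock", "guitar"], ["rock", "drums"],
]
root, parent = build_hierarchy(posts)
print(root, parent)
```

In this toy data the most central tag ("music") becomes the root and the least central tags end up as leaves, matching the reading of high centrality as more abstract and low centrality as more specific.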
Decentralized Search
Evaluation Framework
[Figure: Folksonomy 1, Folksonomy …, Folksonomy n feed two evaluation tracks]
• Simulation → Performance Evaluation: which folksonomy performs best on a given navigational task
• Click-Data → Explanatory Evaluation: which folksonomy explains actual user behavior best
Success Rates Across Different Folksonomies
[Figure: success rate vs. max hops n on the flickr dataset, for a random folksonomy, k-means / affinity propagation, and tag generality approaches]
• All approaches outperform a random folksonomy
• Tag generality approaches outperform k-means / affinity propagation
Success rate: the number of times an agent is successful in finding a path using a particular folksonomy as background knowledge
Max hops n: the maximal number of steps an agent is allowed to perform before stopping (a tunable parameter, e.g., an agent only follows n links)
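How the success rate depends on max hops n can be sketched as below. The sketch assumes a deterministic agent, so that stopping after n hops is equivalent to discarding longer paths afterwards, and it reports the success rate as a fraction of all attempts; the path lengths are made-up sample values, not results from the slides.

```python
def success_rate(path_lengths, max_hops):
    """Fraction of search attempts that reach the target within max_hops.

    path_lengths: one entry per attempt, the number of hops the agent
    needed, or None if it never reached the target.
    """
    if not path_lengths:
        return 0.0
    successes = sum(1 for p in path_lengths if p is not None and p <= max_hops)
    return successes / len(path_lengths)

# Hypothetical outcomes of 8 simulated searches.
attempts = [3, 4, None, 7, 2, None, 5, 12]
for n in (5, 10, 20):
    print(n, success_rate(attempts, n))
```

Raising n can only increase the success rate, which is why the curves in such plots are non-decreasing in n.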
Success Rates Across Different Datasets
Holds for all datasets (to different extents)
But how efficient are those folksonomies during search?
Efficiency: how often does an agent not find the global shortest path, but some other path that is longer?
Stretch Δ = pLK - pGK
Shortest paths found with local knowledge
• Finds no path: Δ = infinite
• Finds a path that is 1 hop longer: Δ = 1
• Finds the shortest possible path: Δ = 0
Tag generality approaches (d+e) find much shorter paths!
Holds for all datasets (to different extents)
[Figure: distribution of stretch Δ; panel label: Bibsonomy, K-Means]
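The stretch distribution over many simulated searches can be tabulated as in the sketch below. The (pLK, pGK) pairs are made-up sample values, and the helper names are hypothetical.

```python
import math
from collections import Counter

def stretch(p_lk, p_gk):
    """Delta = pLK - pGK, or infinity when the agent found no path."""
    return math.inf if p_lk is None else p_lk - p_gk

def stretch_distribution(results):
    """Count how often each stretch value occurs.

    results: iterable of (p_lk, p_gk) pairs, with p_lk = None on failure.
    """
    return Counter(stretch(p_lk, p_gk) for p_lk, p_gk in results)

# Hypothetical outcomes of six start/target pairs.
results = [(3, 3), (4, 3), (None, 2), (5, 5), (6, 4), (4, 3)]
dist = stretch_distribution(results)
print(dist)
```

A folksonomy that guides the agent well concentrates the mass of this distribution at Δ = 0 and keeps the Δ = infinite bucket small.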
Summary
• Decentralized search as a natural model of user navigation on the web
• Emergence of dynamic, user-generated links reduces control
• Empirical studies and new algorithms are needed to recover important system properties
End of Presentation