Secondary Evidence for User Satisfaction With Community Information Systems. Gregory B. Newby, University of North Carolina at Chapel Hill. ASIS Midyear Meeting 1999. What do we want to know? Who are information seekers; users? What are their needs? Are their needs being met?
Secondary Evidence for User Satisfaction With Community Information Systems
Gregory B. Newby
University of North Carolina at Chapel Hill
ASIS Midyear Meeting 1999

What do we want to know?
Who are information seekers; users?
What are their needs?
Are their needs being met?
Context: the goals and missions of the community net

What else do we want to know?
Are people viewing sponsorship information?
Reading policy documents?
Displaying images?
Using search engines or indexes?
Local or remote?
Browsing or reading?

Possible sources of evidence
Content analysis: what’s available on the system(s)? Questions asked.
Sociological research: talk to people, look at what they use the net for, etc.
Psychological research: evaluate cognitive change in user knowledge, etc.
Market research: broad data collection from multiple potential audiences

More possible sources of evidence
Secondary data: artifacts generated by information system use
Today’s focus: analysis of log file entries
– Web usage statistics
– Instrumenting online menu systems
– Login or call history
– Other system logs (email, FTP)

What questions may be asked of secondary data?
What content is accessed, with what frequency?
What paths are followed to content?
Are entry points, policy documents, or other front-end material bypassed?
Is content read, skimmed or skipped through?
What subsets of content are viewed by individuals (patterns of use)?

What’s wrong with Web server logs?
Aggregate level access to content: not the whole story!
What are SESSIONS like (a sequence of accesses by a single person)?
What are paths from item to item (transcends a single “referrer” log)?
Are data used linearly (following hyperlinks)?
How long is spent on a document?

More analysis is feasible. Sample: Web server logs
Single line entries for each “hit” (HTTP “GET” or similar request)
Separate file for errors, referrers
Sample entry:
56kdial52.absi.net - - [22/May/1999:20:12:45 -0500] "GET /index.html HTTP/1.0" 200 6353

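The sample entry follows the Common Log Format (host, ident, user, timestamp, request, status, bytes). As a minimal sketch, assuming every line has exactly that layout, a single entry could be split into fields as below. The names `CLF_PATTERN` and `parse_clf_line` are illustrative and not part of the original talk.

```python
import re
from datetime import datetime

# Common Log Format: host ident authuser [timestamp] "request" status bytes
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)

def parse_clf_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = CLF_PATTERN.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    # %b expects English month abbreviations (default C locale); %z reads "-0500"
    fields["time"] = datetime.strptime(fields["time"], "%d/%b/%Y:%H:%M:%S %z")
    parts = fields["request"].split()
    fields["method"] = parts[0] if parts else ""
    fields["path"] = parts[1] if len(parts) > 1 else ""
    fields["status"] = int(fields["status"])
    fields["size"] = 0 if fields["size"] == "-" else int(fields["size"])
    return fields

entry = parse_clf_line(
    '56kdial52.absi.net - - [22/May/1999:20:12:45 -0500] '
    '"GET /index.html HTTP/1.0" 200 6353'
)
print(entry["host"], entry["path"], entry["status"], entry["size"])
```

Lines that do not match (for example, entries from an error log) come back as `None`, so they can be counted or set aside rather than silently misparsed.
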
Sources of complexity:
Multiple types of servers might be on a single system (e.g., RealServer, database server, search engine)
A Web page visit might involve many files
Frames and other authoring techniques can confuse
More than one person might use the same remote computer

Question: Can we get the “story” of a session?
Yes! Just track through all the “hits” from the same host within a narrow time period
– Challenge: how narrow a time period?
– Challenge: some hosts support multiple simultaneous users (but not many)
– Challenge: lots of files per page might confuse things (but narrow time frames of +/- a few seconds can help)
– Challenge: what is the structure of the site?

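One way to make the "same host within a narrow time period" idea concrete is sketched below. It assumes hits have already been reduced to (host, time, path) tuples. The function name `build_sessions` and the 30-minute default gap are assumptions for illustration; the slides deliberately leave "how narrow?" open.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def build_sessions(hits, max_gap=timedelta(minutes=30)):
    """Group (host, time, path) hits into per-host sessions.

    A new session starts whenever the gap since the host's previous hit
    exceeds max_gap. The default gap is an assumed value, not one from
    the talk.
    """
    by_host = defaultdict(list)
    for host, when, path in hits:
        by_host[host].append((when, path))

    sessions = []
    for host, host_hits in by_host.items():
        host_hits.sort()  # order each host's hits by time
        current = [host_hits[0]]
        for previous, hit in zip(host_hits, host_hits[1:]):
            if hit[0] - previous[0] > max_gap:
                sessions.append((host, current))
                current = []
            current.append(hit)
        sessions.append((host, current))
    return sessions

# Four hits taken from the docsouth sample later in the talk
hits = [
    ("128.22.40.142", datetime(1999, 5, 20, 11, 8, 34), "/docsouth"),
    ("128.22.40.142", datetime(1999, 5, 20, 11, 8, 45), "/docsouth/dasmain.html"),
    ("128.22.40.142", datetime(1999, 5, 20, 11, 19, 58), "/docsouth/southlit/southlit.html"),
    ("128.22.40.142", datetime(1999, 5, 20, 11, 38, 40), "/docsouth/neh/neh.html"),
]
print(len(build_sessions(hits)))                         # one session at 30 minutes
print(len(build_sessions(hits, timedelta(minutes=15))))  # split at the 18-minute gap
```

The choice of gap changes the "story": the same four hits read as one visit or two. Hosts shared by several simultaneous users are not separated by this sketch.
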
Sample “GET” might include multiple files
203.87.57.76 - - [20/May/1999:18:44:48 -0400] "GET /~gbnewby/inls80/explore2.html HTTP/1.1" 200 9681
203.87.57.76 - - [20/May/1999:18:44:50 -0400] "GET /~gbnewby/inls80/octo.gif HTTP/1.1" 200 12053
203.87.57.76 - - [20/May/1999:18:44:53 -0400] "GET /~gbnewby/inls80/pmail.gif HTTP/1.1" 200 593

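The three hits above are really one page view: an HTML page plus two images fetched seconds later. A rough sketch of collapsing such bursts follows, assuming a time-ordered list of (time, path) hits from one host. The name `group_page_views` and the 5-second window are illustrative assumptions.

```python
from datetime import datetime, timedelta

def group_page_views(session_hits, window=timedelta(seconds=5)):
    """Collapse time-ordered (time, path) hits into page views.

    A hit arriving within `window` of the previous hit is treated as a file
    embedded in the current page; anything later starts a new page view.
    The window size is an assumption and will misfile very fast clicks.
    """
    views = []
    last_time = None
    for when, path in session_hits:
        if last_time is not None and when - last_time <= window:
            views[-1]["embedded"].append(path)
        else:
            views.append({"time": when, "page": path, "embedded": []})
        last_time = when
    return views

hits = [
    (datetime(1999, 5, 20, 18, 44, 48), "/~gbnewby/inls80/explore2.html"),
    (datetime(1999, 5, 20, 18, 44, 50), "/~gbnewby/inls80/octo.gif"),
    (datetime(1999, 5, 20, 18, 44, 53), "/~gbnewby/inls80/pmail.gif"),
]
views = group_page_views(hits)
print(views[0]["page"], views[0]["embedded"])
```

Grouping by time rather than by file extension also catches frame sets, where the extra files are themselves HTML.
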
Here’s a “story” (gbn’s pages)
116.33.237.26 - - [08/May/1999:09:30:59 -0400] "GET /~gbnewby/index_top.html HTTP/1.0" 200 7030
116.33.237.26 - - [09/May/1999:00:44:45 -0400] "GET /~gbnewby/index_top.html HTTP/1.0" 200 7030
116.33.237.26 - - [09/May/1999:11:43:31 -0400] "GET /gbnewby/forms HTTP/1.0" 301 186
116.33.237.26 - - [09/May/1999:12:06:30 -0400] "GET /gbnewby/forms/ HTTP/1.0" 200 1837
116.33.237.26 - - [09/May/1999:16:36:06 -0400] "GET /~gbnewby HTTP/1.0" 301 181
116.33.237.26 - - [09/May/1999:17:44:47 -0400] "GET /~gbnewby/ HTTP/1.0" 200 1355
116.33.237.26 - - [10/May/1999:06:20:22 -0400] "GET /gbnewby/review2.html HTTP/1.0" 200 5178
116.33.237.26 - - [10/May/1999:09:33:51 -0400] "GET /gbnewby/vita.html HTTP/1.0" 200 29487
116.33.237.26 - - [10/May/1999:13:33:30 -0400] "GET /gbnewby/inls80/explore1.html HTTP/1.0" 200 3977
116.33.237.26 - - [11/May/1999:02:43:15 -0400] "GET /gbnewby/inls80/explore2.html HTTP/1.0" 200 9681
116.33.237.26 - - [11/May/1999:09:21:56 -0400] "GET /~gbnewby/vita.html HTTP/1.0" 200 29487
116.33.237.26 - - [11/May/1999:10:05:31 -0400] "GET /gbnewby/presentations/security.html HTTP/1.0" 200 11270
116.33.237.26 - - [11/May/1999:13:35:27 -0400] "GET /gbnewby/index_top.html HTTP/1.0" 200 7030

Question: What are entry points for particular documents?
You’re on easy street with httpd “referrer” logs, but these are often not kept (for efficiency)
Otherwise, you don’t know where someone came from unless it was from YOUR site
By looking through a session “story” you can see the path people take to particular pages. Analyze finding aids!

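When referrer logs are missing, entry points can be approximated from session stories. The sketch below assumes each session has already been reduced to an ordered list of page paths. The helper names `entry_points_for` and `paths_to` are hypothetical, and the second session in the example is invented to show a direct arrival.

```python
from collections import Counter

def entry_points_for(target, sessions):
    """Count the first page of every session that reaches `target`."""
    entries = Counter()
    for paths in sessions:
        if target in paths:
            entries[paths[0]] += 1
    return entries

def paths_to(target, sessions):
    """Return the pages leading up to the first visit to `target`, per session."""
    found = []
    for paths in sessions:
        if target in paths:
            found.append(tuple(paths[: paths.index(target) + 1]))
    return found

target = "/docsouth/harriet/harriet.html"
sessions = [
    # Pages from the docsouth sample in this talk
    ["/docsouth/dasmain.html", "/docsouth/search.html",
     "/docsouth/southlit/southlit.html", "/docsouth/neh/neh.html",
     "/docsouth/neh/texts.html", "/docsouth/harriet/menu.html", target],
    # Hypothetical session: a visitor who lands on the document directly
    [target],
]
print(entry_points_for(target, sessions))
print(paths_to(target, sessions)[0])
```

A session whose first page is the target itself suggests an arrival from outside the site (a bookmark or a remote link), which is exactly the case a referrer log would have explained.
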
Here’s a path, including searching and reading
128.22.40.142 - - [20/May/1999:11:08:34 -0400] "GET /docsouth HTTP/1.0" 301 307
128.22.40.142 - - [20/May/1999:11:08:45 -0400] "GET /docsouth/dasmain.html HTTP/1.0" 200 2705
128.22.40.142 - - [20/May/1999:11:08:46 -0400] "GET /docsouth/dasnav.html HTTP/1.0" 200 679
128.22.40.142 - - [20/May/1999:11:08:46 -0400] "GET /docsouth/images/greensquare.gif HTTP/1.0" 200 55
128.22.40.142 - - [20/May/1999:11:08:56 -0400] "GET /docsouth/search.html HTTP/1.0" 200 3778

(Part II. This is via metalab.unc.edu)
128.22.40.142 - - [20/May/1999:11:08:57 -0400] "GET /docsouth/images/greenarrow.gif HTTP/1.0" 200 113
128.22.40.142 - - [20/May/1999:11:19:58 -0400] "GET /docsouth/southlit/southlit.html HTTP/1.0" 200 3685
128.22.40.142 - - [20/May/1999:11:20:07 -0400] "GET /docsouth/southlit/southlitmain.html HTTP/1.0" 200 2583
128.22.40.142 - - [20/May/1999:11:20:07 -0400] "GET /docsouth/southlit/southlitnav.html HTTP/1.0" 200 789

(Part III.)
128.22.40.142 - - [20/May/1999:11:38:40 -0400] "GET /docsouth/neh/neh.html HTTP/1.0" 200 3539
128.22.40.142 - - [20/May/1999:11:38:45 -0400] "GET /docsouth/neh/nehmain.html HTTP/1.0" 200 2743
128.22.40.142 - - [20/May/1999:11:38:45 -0400] "GET /docsouth/neh/nehnav.html HTTP/1.0" 200 759
128.22.40.142 - - [20/May/1999:11:39:21 -0400] "GET /docsouth/neh/specialneh.html HTTP/1.0" 200 16549
128.22.40.142 - - [20/May/1999:11:39:51 -0400] "GET /docsouth/neh/texts.html HTTP/1.0" 200 11999
128.22.40.142 - - [20/May/1999:11:40:16 -0400] "GET /docsouth/harriet/menu.html HTTP/1.0" 200 2085
128.22.40.142 - - [20/May/1999:11:40:27 -0400] "GET /docsouth/harriet/small.gif HTTP/1.0" 200 43701
128.22.40.142 - - [20/May/1999:11:41:01 -0400] "GET /docsouth/harriet/harriet.html HTTP/1.0" 200 217418
128.22.40.142 - - [20/May/1999:11:41:07 -0400] "GET /docsouth/harriet/harrietcva.gif HTTP/1.0" 200 85180
128.22.40.142 - - [20/May/1999:11:41:11 -0400] "GET /docsouth/harriet/harriettpa.gif HTTP/1.0" 200 77742

Question: Where do people go from a particular location?
Again, your “story” logs can track this
Again, caching is a particular challenge. For example, a user might follow hyperlinks, but the logs show discontinuities (because they went via a cached document)

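Counting what follows a given page across sessions is one way to answer this question. The sketch below assumes sessions are ordered lists of page paths; `next_pages` is an illustrative name, and the second session is invented for the example.

```python
from collections import Counter

def next_pages(source, sessions):
    """Count which page is requested immediately after `source`."""
    counts = Counter()
    for paths in sessions:
        for here, there in zip(paths, paths[1:]):
            if here == source:
                counts[there] += 1
    return counts

sessions = [
    # Pages from the "specifics, to index, to sub-index" sample in this talk
    ["/mrm/father.html", "/index.html", "/directory.html",
     "/directory/culture.html", "/prairienations/index.htm",
     "/directory/nature.html"],
    # Hypothetical second visitor
    ["/index.html", "/mrm/father.html"],
]
print(next_pages("/index.html", sessions))
```

Because of caching, a logged transition is not proof of a hyperlink between the two pages: the visitor may have gone back through a cached copy in between, so counts like these are best read alongside the site structure.
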
Sample: going from specifics, to index, to sub-index
4blah18.blahinc.com - - [22/May/1999:00:21:01 -0500] "GET /mrm/father.html HTTP/1.0" 200 1760
4blah18.blahinc.com - - [22/May/1999:00:21:03 -0500] "GET /mrm/bluegrass.gif HTTP/1.0" 200 26959
4blah18.blahinc.com - - [22/May/1999:00:27:48 -0500] "GET /index.html HTTP/1.0" 200 6216
4blah18.blahinc.com - - [22/May/1999:00:27:51 -0500] "GET /beige_pale.gif HTTP/1.0" 200 2085
4blah18.blahinc.com - - [22/May/1999:00:27:53 -0500] "GET /pnetlogo.gif HTTP/1.0" 200 3861
4blah18.blahinc.com - - [22/May/1999:00:28:07 -0500] "GET /directory.html HTTP/1.0" 302 216
4blah18.blahinc.com - - [22/May/1999:00:28:16 -0500] "GET /directory/culture.html HTTP/1.0" 200 2980
4blah18.blahinc.com - - [22/May/1999:00:28:18 -0500] "GET /directory/buggy.jpg HTTP/1.0" 200 8213
4blah18.blahinc.com - - [22/May/1999:00:28:38 -0500] "GET /prairienations/index.htm HTTP/1.0" 200 9136
4blah18.blahinc.com - - [22/May/1999:00:30:23 -0500] "GET /directory/nature.html HTTP/1.0" 200 6865

Question: How long is spent on a document?
Easy: inter-click time from a session
You could even make an “average time per document” (AT/D) for some gateway documents (such as user agreements). Or, infer AT/D by tracking those sessions that “seem” to be contiguous. This is challenging: what if someone goes to another site, or takes a nap?
Caching is still a problem

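Inter-click time can be computed directly once a session is a time-ordered list of page views. The sketch below is one possible reading of "average time per document": it averages the gap to the next page view and drops gaps above a cutoff, to set aside the "takes a nap" case. The function name and the 10-minute cutoff are assumptions.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def average_time_per_document(page_views, max_dwell=timedelta(minutes=10)):
    """Mean seconds between a page view and the next one, per path.

    page_views is a time-ordered list of (time, path) for one session.
    Gaps longer than max_dwell are ignored (an assumed cutoff), and the last
    page of a session has no measurable time at all.
    """
    totals = defaultdict(lambda: [0.0, 0])  # path -> [seconds, count]
    for (start, path), (end, _next_path) in zip(page_views, page_views[1:]):
        dwell = end - start
        if dwell <= max_dwell:
            totals[path][0] += dwell.total_seconds()
            totals[path][1] += 1
    return {path: seconds / count for path, (seconds, count) in totals.items()}

# HTML pages from Part III of the docsouth sample (frame and image hits omitted)
views = [
    (datetime(1999, 5, 20, 11, 38, 40), "/docsouth/neh/neh.html"),
    (datetime(1999, 5, 20, 11, 39, 21), "/docsouth/neh/specialneh.html"),
    (datetime(1999, 5, 20, 11, 39, 51), "/docsouth/neh/texts.html"),
    (datetime(1999, 5, 20, 11, 40, 16), "/docsouth/harriet/menu.html"),
    (datetime(1999, 5, 20, 11, 41, 1), "/docsouth/harriet/harriet.html"),
]
for path, seconds in average_time_per_document(views).items():
    print(path, seconds)
```

Note what is missing from the result: the 217 KB document at the end of the session, arguably the one the visitor came to read, has no following click and therefore no time estimate.
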
Analysis of other secondary sources of data
See Newby & Bishop 1997 for instrumentation of menu systems
– Log choices of menu options
– Correlate with basic user demographics (collected online)
– Problem: most modern systems are not login-based, they’re Web-based
Access logs: are people coming in from dial-up lines, academic locations, etc? Dial-up = watch graphics!

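Host names in the access log give a rough hint of where visitors connect from. The classifier below is only a heuristic sketch: the substrings it looks for ("dial", "ppp", "modem") and the ".edu" test are assumptions about naming habits, not rules from the talk, and many hosts will not follow them.

```python
import re

def classify_host(host):
    """Guess a connection type from a logged host name (heuristic only)."""
    name = host.lower()
    if re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}", name):
        return "unresolved"  # bare IP address, no name to inspect
    if re.search(r"dial|ppp|modem", name):
        return "dial-up"     # assumed ISP naming convention
    if name.endswith(".edu"):
        return "academic"
    return "other"

for host in ["56kdial52.absi.net", "metalab.unc.edu",
             "203.87.57.76", "4blah18.blahinc.com"]:
    print(host, classify_host(host))
```

A high share of dial-up visitors is the cue behind "watch graphics": large images cost those users the most time.
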
Conclusions
The “easy” automated tools for Web log analysis are insufficient
They could be extended with some programming effort or utilities
“Eyeballing” the logs is still useful
Be cautious about privacy - both your own site’s policy, and the problems of posting some log data