FeedWiz: Using Automated Document Clustering to “Map the Blogosphere”
David Schuff ([email protected]), Temple University
Ozgur Turetken ([email protected]), Ryerson University
The role of weblogs
Increasingly important mode of discourse
Is this really the “new media”?
The consequences
Proliferation of information
Easy self-publishing
Proliferation of content
Leads to a “silo effect”
Limited information diet of only a few blogs
Will tend to seek out confirmatory points of view
Our area of interest is news and political blogs. Not a blog about Paris Hilton (yes, there is one).
The consequences
(Strict) filtering is seen as a threat to public discourse and democracy (Sunstein 2004)
At least, the true potential of the blogosphere is not being realized
The power law distribution
A relationship in which one variable is proportional to a power of the other
Used to explain website popularity
Figure: number of inbound links by weblog (2002)
http://www.shirky.com/writings/powerlaw_weblog.html
The top 3% of the political blog sites accounted for 20% of the inbound links
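The concentration described above can be checked on any list of inbound-link counts. The following is a minimal sketch, not the analysis behind the slide: the function name `top_share` is hypothetical, and the data are synthetic counts that fall off as 1000 / rank (a power law with exponent 1), not Shirky's figures.

```python
def top_share(link_counts, fraction):
    """Return the share of all inbound links held by the top `fraction` of blogs."""
    ranked = sorted(link_counts, reverse=True)
    # Number of blogs in the top group, rounded to the nearest whole blog (at least 1).
    top_n = max(1, round(len(ranked) * fraction))
    return sum(ranked[:top_n]) / sum(ranked)


# Synthetic, made-up data: 100 blogs whose inbound links follow 1000 / rank.
synthetic_links = [1000 / rank for rank in range(1, 101)]

print(f"Share of links held by the top 3%: {top_share(synthetic_links, 0.03):.1%}")
```

With a real crawl, `synthetic_links` would be replaced by the observed inbound-link count of each weblog.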
The decision support and information systems context
A key challenge is to create tools that help “filter, sort, and navigate” the blogosphere (Cayzer 2004)
Blogging is essentially a form of CMC (Tan et al. 2005)
Can facilitate “common understanding”
The formation of an opinion is essentially a decision-making issue
Research question
How can information presentation techniques be used to improve information consumption on the blogosphere?
Our proposition: This can be done by presenting information organized by content, not by author (or site)
What we’re drawing from
Chunking and semantic networks (Miller 1956, Mandler 1967, Quillian 1968, Collins and Quillian 1969)
Clustering of text-based documents (Chen et al. 1996, Chen et al. 1998, Pirolli et al. 1996, Spangler et al. 2003, Roussinov and Chen 2001, Turetken and Sharda 2004)
Information visualization
“Preattentive” extraction of information (Bray 1996)
Size and color (Shneiderman 1994)
FeedWiz (demo)
Live demo…
How it works…
Select/create a list of weblogs
Navigate clusters of blog entries
Browse the individual clusters
Study 1 design
Quasi-experiment (semi-controlled)
Two groups of subjects
Both given a list of weblogs
Group A: Given an ordered list of URLs
Group B: Given FeedWiz
Measuring effectiveness
Study how attitudes change (OXO design)
O: Ask subjects’ opinion on an issue (e.g., hybrid cars)
X: Give subjects an hour to read the list of blogs
O: Ask subjects again for their opinion on that issue
Measuring…
Opinion (agree/disagree and supporting rationale)
Level of conviction
Sources (blogs) used to form the opinion
Hypotheses
H1: In forming their opinions, FeedWiz users will use more sources than those who use an ordered list
H2: FeedWiz users will be more likely to change their opinions than those who use an ordered list
H3: FeedWiz users are less likely to form strong opinions than those who use an ordered list
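Each hypothesis compares the two groups on one of the measures collected in the study. A minimal sketch of how those per-group measures might be tabulated follows. The `Response` record, its field names, the 1-7 conviction scale, and the sample rows are all assumptions made for illustration; they are not study data, and no significance test is implied.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Response:
    group: str             # "list" (ordered list of URLs) or "feedwiz"
    opinion_before: str    # "agree" or "disagree", asked before reading
    opinion_after: str     # same question, asked after reading
    conviction_after: int  # hypothetical 1-7 scale
    sources_used: int      # number of blogs cited as sources


def summarize(responses, group):
    """Per-group measures corresponding to H1, H2, and H3."""
    rows = [r for r in responses if r.group == group]
    return {
        "mean_sources": mean([r.sources_used for r in rows]),           # H1
        "change_rate": mean([1.0 if r.opinion_before != r.opinion_after
                             else 0.0 for r in rows]),                  # H2
        "mean_conviction": mean([r.conviction_after for r in rows]),    # H3
    }


# Made-up placeholder rows, only to show the shape of the data.
responses = [
    Response("list", "agree", "agree", 6, 2),
    Response("list", "disagree", "disagree", 5, 3),
    Response("feedwiz", "agree", "disagree", 4, 5),
    Response("feedwiz", "agree", "agree", 3, 6),
]

for group in ("list", "feedwiz"):
    print(group, summarize(responses, group))
```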
Study 2 design
Intensive data collection with small sample
Tracking of eye-movements
Recording verbal comments
Protocol analysis
For further insights on usability of tool
Expected contributions
Investigate how opinions are formed from blogs
Understand how information presentation techniques can influence information consumption
Implications for public discourse on the web
Creation of a highly usable tool which demonstrates those techniques
References
Bray, T. (1996). Measuring the Web. In Proceedings of the Fifth International World Wide Web Conference, Paris, France.
Cayzer, S. (2004). Semantic blogging and decentralized knowledge management. Communications of the ACM, 47(12), 47-52.
Chen, H., Nunamaker, J., Orwig, R.E., & Titkova, O. (1998). Information visualization for collaborative computing. IEEE Computer, 31(8), 75-82.
Chen, H., Schuffels, C., & Orwig, R.E. (1996). Internet categorization and search: A self-organizing approach. Journal of Visual Communication and Image Representation, 7(1), 88-102.
Collins, A.M. & Quillian, M.R. (1969). Retrieval time from semantic memory. Journal of Verbal Learning and Verbal Behavior, 8, 240-247.
Mandler, G. (1967). Organization in memory. In K. W. Spence, & J. T. Spence (Eds.), The Psychology of Learning and Motivation (pp. 327-372). New York, NY: Academic Press.
Miller, G.A. (1956). The magical number seven, plus or minus two: Some limits on our capacity for processing information. Psychological Review, 63(2), 81-97.
Pirolli, P., Schank, P., Hearst, M., & Diehl, C. (1996). Scatter/gather browsing communicates the topic structure of a very large text collection. In Proceedings of the Conference on Human Factors in Computing Systems, New York, NY: ACM Press, 213-220.
Quillian, M.R. (1968). Semantic memory. In M. Minsky (Ed.), Semantic Information Processing (pp. 227-270), Cambridge, MA: The MIT Press.
References (continued)
Roussinov, D.G. & Chen, H. (2001). Information navigation on the web by clustering and summarizing query results. Information Processing and Management, 37(6), 789-817.
Shirky, C. (2003). Power laws, weblogs, and inequality. Accessed September 26, 2006 from http://www.shirky.com/writings/powerlaw_weblog.html.
Shneiderman, B. (1994). Dynamic queries for visual information seeking. IEEE Software, 11(6), 70.
Spangler, S., Kreulen, J.T., & Lessler, J. (2003). Generating and browsing multiple taxonomies over a document collection. Journal of Management Information Systems, 19(4), 191-212.
Sunstein, C.R. (2004). Democracy and filtering. Communications of the ACM, 47(12), 57-59.
Tan, C., Goswami, S., Chan, Y., & Zhong, Y. (2005). Conceptual evaluation of weblog as a computer-mediated communication application. In Proceedings from the 11th Americas Conference on Information Systems, Omaha, NE, 2361-2367.
Turetken, O. & Sharda, R. (2004). Development of a fisheye-based information search processing aid (FISPA) for managing information overload in the web environment. Decision Support Systems, 37(3), 415-434.
Appendix: How FeedWiz Works
FeedWiz Application Architecture
FeedWiz Application Server: .NET Web Service (C#)
Feed Aggregation Module: reads weblog sites (RSS feeds)
Hierarchical Clustering Module: Intelligent Miner for Text
FeedWiz Client: Flash application
Exchanged between client and server: list of blog URLs; hierarchy (XML) and individual posts
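As an illustration of the feed aggregation step, the sketch below pulls individual posts out of an RSS 2.0 document. It is written in Python with the standard library rather than the C# web service the slides describe, and the function name `parse_posts` and the sample feed are hypothetical.

```python
import xml.etree.ElementTree as ET

# A small, made-up RSS 2.0 feed standing in for a real weblog feed.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <item>
      <title>First post</title>
      <link>http://example.com/first</link>
      <description>Text of the first post.</description>
    </item>
    <item>
      <title>Second post</title>
      <link>http://example.com/second</link>
      <description>Text of the second post.</description>
    </item>
  </channel>
</rss>"""


def parse_posts(rss_text):
    """Return one dict per <item> with its title, link, and description text."""
    root = ET.fromstring(rss_text)
    posts = []
    for item in root.iter("item"):
        posts.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "text": item.findtext("description", default=""),
        })
    return posts


for post in parse_posts(SAMPLE_RSS):
    print(post["title"], post["link"])
```

In a full pipeline, the text of each post would then be handed to the clustering module.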
Appendix: How the documents are clustered
Blog posts are saved as text files on the FeedWiz server
Those files are grouped into clusters based on similarity
An output file is generated that describes the hierarchy
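The grouping step can be illustrated with a small agglomerative clustering sketch. This is a stand-in under stated assumptions, not the algorithm used by Intelligent Miner for Text: it assumes bag-of-words vectors, cosine similarity, and merging the two most similar clusters at each iteration, and the names `vectorize`, `cosine`, and `cluster` are hypothetical. It expects at least one post.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts for one post."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def cluster(posts):
    """Merge the two most similar clusters until a single hierarchy remains.

    `posts` maps a post name to its text. Returns a nested tuple of post names.
    """
    # Each cluster is a pair: (hierarchy so far, summed term counts).
    clusters = [(name, vectorize(text)) for name, text in posts.items()]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                sim = cosine(clusters[i][1], clusters[j][1])
                if best is None or sim > best[0]:
                    best = (sim, i, j)
        _, i, j = best
        merged = ((clusters[i][0], clusters[j][0]),
                  clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0][0]


# Made-up posts: two about hybrid cars, one about an election.
sample_posts = {
    "hybrid-1": "hybrid cars save fuel",
    "hybrid-2": "hybrid cars cost more but save fuel",
    "election": "the election campaign and voters",
}
print(cluster(sample_posts))
```

The nested tuple plays the role of the hierarchy description; a real system would serialize it (for example to XML) for the client to browse.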
Hierarchical Clustering Module: original collection, 1st iteration, 2nd iteration, 3rd iteration, …, nth iteration