High-dimensional social data: a mapper’s worst nightmare

Elijah Wright
School of Library and Information Science
Indiana University, Bloomington

Mapping Humanity’s Knowledge and Expertise in the Digital Domain
Annual Meeting of the Association of American Geographers, Denver, CO, April 5-9, 2005




Pros and Cons, or “Why Map Social Artifacts?”

One of the largest benefits of my work, and of all similar work, is that it tries to transform very large social interaction patterns into a more understandable form.

The largest danger with this approach, of course, is that fine-grained details of high interpretive value can quickly become lost in a sea of possibly, but not necessarily, relevant data.

Major advantage: we can come to understand aspects of systems that are far too complex to map out by any other means.


Visualization 1 and User Task


User Task Questions

• How do I find myself within this map?

• How do I locate other people who share more than one, or some combination, of my interests?

• How do I track myself within this map as my ideas and my posting habits change?

• What can I learn from the relationships between clusters of points in this map? Should I infer that proximity between the centers of two topic clusters means that people tend to post about both of those topics?

• How do I interpret cluster centrality versus marginal positioning?
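The second question, locating people who share some combination of one’s interests, can be made concrete with a set-overlap score. This is a minimal sketch under stated assumptions. Each user is represented only by a set of declared interests, and the users and interests are hypothetical. Jaccard similarity is one convenient choice of measure. It is not the method behind the maps in this talk.

```python
# Hypothetical toy data: each user maps to a set of declared interests.
INTERESTS = {
    "alice": {"semantic web", "cartography", "weblogs"},
    "bob": {"weblogs", "cartography", "knitting"},
    "carol": {"knitting", "gardening"},
    "dave": {"semantic web", "weblogs", "cartography", "music"},
}


def jaccard(a: set, b: set) -> float:
    """Share of interests two users have in common: |a & b| / |a | b|."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def most_similar(user: str, interests: dict, k: int = 2) -> list:
    """Rank the other users by Jaccard overlap with `user`, highest first."""
    mine = interests[user]
    scores = [
        (other, jaccard(mine, theirs))
        for other, theirs in interests.items()
        if other != user
    ]
    # Sort by descending score, then by name, so ties are deterministic.
    scores.sort(key=lambda pair: (-pair[1], pair[0]))
    return scores[:k]


if __name__ == "__main__":
    for other, score in most_similar("alice", INTERESTS):
        print(f"{other}: {score:.2f}")
```

Running it prints `dave: 0.75` and then `bob: 0.50`. Plain Jaccard treats every shared interest as equally informative. Giving more weight to shared rare interests is one possible refinement.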


What do we all share, and what can we get from these questions?

• I want us to share new, exciting, and innovative ways of thinking about the arrangement of high-dimensional data, and to actively contribute to each other’s work. For myself, I want to better understand how cartographers and geographers would like to see us use spatial metaphors to place abstract (social and other) data into imagined space.

• The “real-world” nature of social questions provides a convenient grounding for interpreting new techniques.


What I do

• Geographers and information scientists face similar problems, especially in visualizing high-dimensional, abstract data that may or may not have a real-world spatial component.

• My own work attempts to analyze, model, and visualize the evolving structure of social communication networks (via citations, weblogs, or Semantic Web data). The data are often so large that it is very difficult to generate usable visualizations, or any meaningful analysis of the networks’ systemic properties.
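Two of the simplest systemic properties of a link network can be computed directly. This is a minimal illustration that assumes the data arrive as a list of directed (source, target) links between weblogs. The node names are hypothetical. In-degree and reciprocity are standard social network analysis measures chosen for the example; they do not summarize the author’s analyses. One simple way to follow an evolving structure is to compute these measures separately for successive snapshots of the link list.

```python
from collections import Counter

# Hypothetical directed links between weblogs: (source, target).
LINKS = [
    ("a", "b"), ("b", "a"), ("a", "c"),
    ("c", "b"), ("d", "b"), ("d", "a"),
]


def in_degree(links):
    """Count incoming links per node, a simple prominence measure."""
    return Counter(target for _, target in links)


def reciprocity(links):
    """Fraction of distinct directed links whose reverse link also exists.

    Self-links are ignored, and repeated links are counted once.
    """
    edges = {(s, t) for s, t in links if s != t}
    if not edges:
        return 0.0
    mutual = sum(1 for s, t in edges if (t, s) in edges)
    return mutual / len(edges)


if __name__ == "__main__":
    print(in_degree(LINKS).most_common(2))
    print(f"reciprocity: {reciprocity(LINKS):.2f}")
```

Here only the a/b pair links in both directions, so reciprocity is 2/6, or about 0.33.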


More What-I-Do

• As we saw in my sample set of user tasks, it is common for users to want to develop an understanding of how they relate to others within the system, where interesting activity is going on, and where fruitful results may be gleaned from additional conversation or interaction.

• These seem to be core, common user tasks that many providers of high-dimensional or high-quality data sets wish to support.


Algorithmic Toolbox

• Many. I draw primarily on Social Network Analysis, and often use Principal Components Analysis (PCA), multidimensional scaling (MDS), and other techniques related to the singular value decomposition (SVD).

• More of the approaches from various schools of analysis (information retrieval research, network analysis, corpus-linguistic methods) share mathematical roots (factor analysis, principal components analysis, eigensystems of one sort or another) than is commonly admitted.
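The shared mathematical roots of PCA and MDS can be seen in a small case. This is a minimal sketch that assumes plain Euclidean distances. Classical (Torgerson) MDS double-centers the squared distance matrix, which recovers the Gram matrix Xc·Xc^T of the column-centered data. As a result, its first coordinate matches the first principal-component scores up to sign. Several choices are assumptions made to keep the example dependency-free: the toy `DATA`, the helper names, and the use of power iteration in place of a library SVD or eigensolver. The equivalence does not hold for non-metric MDS or for non-Euclidean dissimilarities.

```python
import math
import random

# Hypothetical toy data: 5 observations (rows) described by 3 variables.
DATA = [
    [2.0, 0.0, 1.0],
    [0.0, 1.0, 0.0],
    [4.0, 1.0, 2.0],
    [6.0, 3.0, 2.0],
    [8.0, 2.0, 5.0],
]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def top_eigen(m, iterations=500, seed=0):
    """Power iteration for the dominant eigenpair of a symmetric PSD matrix."""
    rng = random.Random(seed)
    v = [rng.random() for _ in m]  # random start, almost surely not orthogonal to the answer
    for _ in range(iterations):
        w = [dot(row, v) for row in m]
        norm = math.sqrt(dot(w, w))
        v = [x / norm for x in w]
    eigenvalue = dot(v, [dot(row, v) for row in m])  # Rayleigh quotient
    return eigenvalue, v


def center_columns(x):
    """Subtract each column mean so the point cloud is centered on the origin."""
    means = [sum(col) / len(x) for col in zip(*x)]
    return [[value - mu for value, mu in zip(row, means)] for row in x]


def pca_first_scores(x):
    """Scores on the first principal axis, the top eigenvector of Xc^T Xc."""
    xc = center_columns(x)
    cols = list(zip(*xc))
    scatter = [[dot(ci, cj) for cj in cols] for ci in cols]
    _, axis = top_eigen(scatter)
    return [dot(row, axis) for row in xc]


def classical_mds_first_coords(x):
    """Classical (Torgerson) MDS computed from squared Euclidean distances only."""
    n = len(x)
    d2 = [[sum((a - b) ** 2 for a, b in zip(p, q)) for q in x] for p in x]
    row_mean = [sum(r) / n for r in d2]
    grand_mean = sum(row_mean) / n
    # Double centering, B = -1/2 * J D^2 J, written element-wise (D^2 is symmetric).
    b = [[-0.5 * (d2[i][j] - row_mean[i] - row_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]
    eigenvalue, vec = top_eigen(b)
    return [math.sqrt(eigenvalue) * v for v in vec]


if __name__ == "__main__":
    pca = pca_first_scores(DATA)
    mds = classical_mds_first_coords(DATA)
    if dot(pca, mds) < 0:  # eigenvectors are only defined up to sign
        mds = [-m for m in mds]
    for p, m in zip(pca, mds):
        print(f"PCA {p:8.4f}   MDS {m:8.4f}")
```

After sign alignment, the two printed columns agree to rounding. With real data one would normally call an SVD routine such as NumPy’s `numpy.linalg.svd` rather than hand-written power iteration.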


Motivation, and Suggestions for Disciplinary Synthesis

• The visualization problem, together with the need for research into how users cognize and interpret our ‘maps’.

• The data storage and management problem, along with issues of data quality and data sampling.

• The mathematical problem: much of this work relies on techniques that are relatively difficult to learn, evaluate, or teach.

• Trust and privacy issues.

• All of these need *synthesis* into solutions. This is **HARD**.


Technical Challenges

• The primary technical challenge facing us, as users of high-dimensional data, is to provide both appropriate statistical methods and reliable, efficient storage systems that can scale as the data to be considered grow rapidly in size.
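One small, standard way to keep such data compact is to store only the observations that actually exist, rather than a dense users × interests matrix. The sketch below uses hypothetical (user, interest) pairs to illustrate a coordinate-list store plus an inverted index. It is not a description of the storage behind this work. If each user lists k of m possible interests, a dense matrix fills only k/m of its cells, so the saving grows with the size of the vocabulary.

```python
from collections import defaultdict

# Hypothetical observations: only the (user, interest) pairs that actually occur.
PAIRS = [
    ("alice", "weblogs"), ("alice", "cartography"),
    ("bob", "weblogs"), ("bob", "knitting"),
    ("carol", "knitting"),
]


def build_inverted_index(pairs):
    """Map each interest to the set of users who declared it."""
    index = defaultdict(set)
    for user, interest in pairs:
        index[interest].add(user)
    return dict(index)


def density(pairs):
    """Share of cells a dense users x interests matrix would actually fill."""
    users = {user for user, _ in pairs}
    interests = {interest for _, interest in pairs}
    if not users:
        return 0.0
    return len(set(pairs)) / (len(users) * len(interests))


if __name__ == "__main__":
    index = build_inverted_index(PAIRS)
    print(sorted(index["weblogs"]))
    print(f"density: {density(PAIRS):.2f}")
```

The inverted index answers “who else lists this interest?” without scanning every user.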


Non-technical (social!) challenges

• The most important non-technical challenges are interpretation and the retention of contextually sensitive data. Many current systems are difficult to interpret for anyone other than their designers, or they require expertise that the target users do not readily have. Similarly, it is difficult to retain an appropriate amount of contextually important information within a large universe of data. Users viewing maps of all of science, or of large subsets of human knowledge, may devise interesting (and perhaps more valuable) ways to arrange the data that are not initially obvious to system designers. Any system that allows for the creation of maps should therefore carefully consider user input and how end users interpret the stored data.


Sample Data and Visualizations / ‘Maps’

My work is primarily associated with the analysis of weblog and Semantic Web data. With a number of other researchers, I have been studying the network properties of both the “blogosphere” (the global, interconnected network of weblog authors) and data associated with the FOAF (Friend-of-a-Friend) project, a Semantic Web vocabulary for describing people and their relationships.
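To build a network from FOAF data, the foaf:knows statements have to be turned into edges. This is a minimal sketch using only Python’s standard XML parser. It assumes a hypothetical document in one common RDF/XML pattern: a foaf:Person with a foaf:nick and nested foaf:knows/foaf:Person entries. Real FOAF files vary considerably. Some use rdf:Description with rdf:type, give links by rdf:resource, identify people by hashes such as foaf:mbox_sha1sum instead of nicknames, or use other RDF serializations. A production pipeline would more likely rely on a full RDF parser such as rdflib.

```python
import xml.etree.ElementTree as ET

# Namespace URIs defined by RDF and by the FOAF vocabulary.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "foaf": "http://xmlns.com/foaf/0.1/",
}

# Hypothetical FOAF document using the nested-Person pattern.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <foaf:Person>
    <foaf:nick>alice</foaf:nick>
    <foaf:knows>
      <foaf:Person><foaf:nick>bob</foaf:nick></foaf:Person>
    </foaf:knows>
    <foaf:knows>
      <foaf:Person><foaf:nick>carol</foaf:nick></foaf:Person>
    </foaf:knows>
  </foaf:Person>
</rdf:RDF>"""


def knows_edges(rdf_xml):
    """Return (person, friend) nickname pairs from nested foaf:knows statements."""
    root = ET.fromstring(rdf_xml)
    edges = []
    for person in root.findall("foaf:Person", NS):
        source = person.findtext("foaf:nick", default="", namespaces=NS).strip()
        for friend in person.findall("foaf:knows/foaf:Person", NS):
            target = friend.findtext("foaf:nick", default="", namespaces=NS).strip()
            if source and target:  # skip people without a nickname
                edges.append((source, target))
    return edges


if __name__ == "__main__":
    for source, target in knows_edges(SAMPLE):
        print(f"{source} -> {target}")
```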


Sample Maps

1) Content-based MDS map of a weblog corpus [see slide #2]

2) Vis of PCA of LiveJournal users’ interests

3) Vis of correspondences between LJ user interests and their social relations

4) Vis of a snowball crawl of the blogosphere, with content-analytic codes applied

5) A smaller slice of Vis 4: Catholic weblog authors
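A snowball crawl starts from seed weblogs and repeatedly follows their links outward. This is a minimal breadth-first sketch with two assumptions. A hypothetical in-memory link table stands in for fetching and parsing weblog pages, and a hop limit serves as the stopping rule. The seeds, depth, and link sources of the crawl behind Vis 4 are not described here.

```python
from collections import deque

# Hypothetical outgoing links, standing in for links parsed from fetched weblogs.
OUT_LINKS = {
    "seed": ["b1", "b2"],
    "b1": ["b3", "seed"],
    "b2": ["b3", "b4"],
    "b3": ["b5"],
    "b4": [],
    "b5": ["b6"],
}


def snowball(links, seeds, max_depth):
    """Breadth-first snowball sample; returns each reached node's hop distance."""
    depth = {seed: 0 for seed in seeds}
    queue = deque(depth)  # deduplicated seeds
    while queue:
        node = queue.popleft()
        if depth[node] == max_depth:
            continue  # hop limit reached: keep the node but do not expand it
        for neighbor in links.get(node, []):
            if neighbor not in depth:  # visit each weblog only once
                depth[neighbor] = depth[node] + 1
                queue.append(neighbor)
    return depth


if __name__ == "__main__":
    print(snowball(OUT_LINKS, ["seed"], max_depth=2))
```

With `max_depth=2` the sample reaches b3 and b4 but stops before b5. Any snowball sample is biased toward the neighborhood of its chosen seeds, and this matters when interpreting maps built from it.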


Visualization 2

• LJ FOAF vis


Visualization 3

Clusters and Groups


Visualization 4


Visualization 5 - Catholic weblog authors


Planned Work (some now completed…)

• When I proposed this talk, my collaborator (John Paolillo) and I were preparing a chapter for the second edition of Vladimir Geroimenko’s book, Visualizing the Semantic Web. That’s now done, and is the source for Vis. 2/3.

• An article for The Semantic Web Journal and a research paper for a social networks conference (focused on the evolving network structure of NIH author networks) are also in the planning stages.


Representative Work

• Much of my recent research reading has focused on the application of social network analysis principles to the graph structures produced by interconnections between weblog and Semantic Web documents. Also see papers linked from http://www.blogninja.com/ for a sense of what our research group is up to and has done in the past.

• Representative papers:

Herring, Susan C., Kouper, Inna, Paolillo, John C., Scheidt, Lois Ann, Tyworth, Michael, Welsch, Peter, Wright, Elijah, & Yu, Ning (2005). Conversations in the Blogosphere: An Analysis “From the Bottom Up”. In Proceedings of the Thirty-Eighth Hawaii International Conference on System Sciences (HICSS-38). Los Alamitos: IEEE Press. Available at http://www.blogninja.com/hicss05.blogconv.pdf

Paolillo, John C., & Wright, Elijah (2004). The Challenges of FOAF Characterization. In Proceedings of the 1st Workshop on Friend of a Friend, Social Networking and the Semantic Web. Available at http://www.w3.org/2001/sw/Europe/events/foaf-galway/papers/fp/challenges_of_foaf_characterization/


Citations…

• Paolillo, John C., & Wright, Elijah (2004). The Challenges of FOAF Characterization. In Proceedings of the 1st Workshop on Friend of a Friend, Social Networking and the Semantic Web. Available at http://www.w3.org/2001/sw/Europe/events/foaf-galway/papers/fp/challenges_of_foaf_characterization/

• Herring, Susan C., Kouper, Inna, Paolillo, John C., Scheidt, Lois Ann, Tyworth, Michael, Welsch, Peter, Wright, Elijah, & Yu, Ning (2005). Conversations in the Blogosphere: An Analysis “From the Bottom Up”. In Proceedings of the Thirty-Eighth Hawaii International Conference on System Sciences (HICSS-38). Los Alamitos: IEEE Press.

• Herring, Susan C., Kouper, Inna, Scheidt, Lois Ann, & Wright, Elijah (2004). Women and Children Last: The Discourse Construction of Weblogs. In Laura J. Gurak, Smiljana Antonijevic, Laurie Johnson, Clancy Ratliff, & Jessica Reyman (Eds.), Into the Blogosphere: Rhetoric, Community, and Culture of Weblogs. Minneapolis.

• Herring, Susan C., Scheidt, Lois Ann, Bonus, Sabrina, & Wright, Elijah (in press). Weblogs as a Bridging Genre. Information Technology & People.

• Herring, Susan C., Scheidt, Lois Ann, Bonus, Sabrina, & Wright, Elijah (2004b). Bridging the Gap: A Genre Analysis of Weblogs. In Proceedings of the Thirty-Seventh Hawaii International Conference on System Sciences (HICSS-37). Los Alamitos: IEEE Press.

• Scheidt, Lois Ann, & Wright, Elijah (2004). Common Visual Design Elements of Weblogs. In Laura J. Gurak, Smiljana Antonijevic, Laurie Johnson, Clancy Ratliff, & Jessica Reyman (Eds.), Into the Blogosphere: Rhetoric, Community, and Culture of Weblogs. Minneapolis.