Analyzing Multidimensional Networks within MediaWikis

DESCRIPTION

The MediaWiki platform supports popular socio-technical systems such as Wikipedia as well as thousands of other wikis. This software encodes and records a variety of relationships about the content, history, and editors of its articles, such as hyperlinks between articles, discussions among editors, and editing histories. These relationships can be analyzed using standard techniques from social network analysis; however, extracting relational data from Wikipedia has traditionally required specialized knowledge of its API, information retrieval, network analysis, and data visualization, which has inhibited scholarly analysis. We present a software library called the NodeXL MediaWiki Importer that extracts a variety of relationships from the MediaWiki API and integrates with the popular NodeXL network analysis and visualization software. This library allows users to query and extract a variety of multidimensional relationships from any MediaWiki installation with a publicly accessible API. We present a case study examining the similarities and differences between different relationships for the Wikipedia articles about "Pope Francis" and "Social media." We conclude by discussing the implications this library has for both theoretical and methodological research as well as community management, and we outline future work to expand the capabilities of the library.

Analyzing Multidimensional Networks within MediaWikis

WikiSym 2013, Hong Kong, China, August 7, 2013

Brian Keegan, Ph.D. @bkeegan

Arber Ceni

Marc A. Smith, Ph.D. @marc_smith

Outline
•  Motivation
•  Relationships within MediaWikis
•  Multidimensional network exploration
•  NodeXL platform
•  NodeXL MediaWiki Importer
•  Case Study
•  Demo

Motivation
•  Collaboration is fundamentally relational
•  Use network analysis methods to understand success of wikis
•  A variety of MediaWiki meta-data accessible through API are relational
•  Build on top of existing network analysis package to simplify retrieval, structuring, cleanup, and visualization

Relationship types
•  User-Object relationships (diagram: nodes e and a)
•  User-User relationships (diagram: nodes e and e)
•  Object-Object relationships (diagram: nodes a and a)

User-Object relationships (e, a)

•  Editing
   •  user e makes a revision to article a

•  Watchlist
   •  user e has article a on watchlist

•  Affiliation
   •  user e is a member of project a
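As a rough illustration of the editing relationship, the sketch below collapses a list of revisions into weighted user-object ties. The records and the function name `editing_edges` are hypothetical, made up for this example; they are not taken from the importer.

```python
from collections import Counter

# Hypothetical revision records: one (editor, article) pair per revision.
revisions = [
    ("Alice", "Pope Francis"),
    ("Bob", "Pope Francis"),
    ("Alice", "Pope Francis"),
    ("Alice", "Social media"),
]

def editing_edges(revisions):
    """Collapse revisions into weighted ties: (e, a) -> number of revisions."""
    return Counter(revisions)

edges = editing_edges(revisions)
for (editor, article), weight in sorted(edges.items()):
    print(editor, "->", article, "weight", weight)
```

Using the revision count as the tie weight is one possible choice; a binary "has edited" tie would work equally well for the definitions above.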

Undirected User-User relationships (e1, e2)

•  Co-authorship
   •  e1 and e2 edited the same article

•  Co-affiliation
   •  e1 and e2 are members of the same project
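Co-authorship can be derived from editing ties by projecting the user-article pairs onto users. The following is a minimal sketch under that assumption; the function name `co_authorship` and the sample data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def co_authorship(editing_pairs):
    """Project (editor, article) pairs onto undirected editor-editor ties.

    Returns {frozenset({e1, e2}): number of articles both editors edited}.
    """
    editors_by_article = defaultdict(set)
    for editor, article in editing_pairs:
        editors_by_article[article].add(editor)

    ties = defaultdict(int)
    for editors in editors_by_article.values():
        # Every pair of editors on the same article shares one more article.
        for e1, e2 in combinations(sorted(editors), 2):
            ties[frozenset((e1, e2))] += 1
    return dict(ties)

pairs = [
    ("Alice", "A1"), ("Bob", "A1"),
    ("Alice", "A2"), ("Bob", "A2"), ("Carol", "A2"),
]
ties = co_authorship(pairs)
```

Co-affiliation follows the same pattern with project memberships in place of edits.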

Directed User-User relationships (e1 → e2)

•  Discussion
   •  e1 left a message on e2’s talk page

•  Article trajectory
   •  e2 modified the article after e1
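One way to read "e2 modified the article after e1" is as a tie between the editors of consecutive revisions. The sketch below assumes that reading, and also assumes that an editor following themselves produces no tie; both are illustrative choices, and `article_trajectory` is a hypothetical name.

```python
from collections import Counter

def article_trajectory(revision_editors):
    """Directed ties e1 -> e2 for consecutive revisions by different editors.

    `revision_editors` lists the editor of each revision, oldest first.
    """
    ties = Counter()
    for e1, e2 in zip(revision_editors, revision_editors[1:]):
        if e1 != e2:  # assumption: skip an editor following themselves
            ties[(e1, e2)] += 1
    return ties

history = ["Alice", "Bob", "Bob", "Alice", "Carol", "Alice"]
trajectory = article_trajectory(history)
```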

Undirected Object-Object relationships (a1, a2)

•  Shared authorship
   •  a1 and a2 were edited by the same users

•  Category co-membership
   •  a1 and a2 are members of the same categories
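Shared authorship is the mirror image of co-authorship: articles are tied when their editor sets overlap. The sketch below adds a `min_shared` threshold, a hypothetical parameter for illustration, so that only articles with several editors in common are linked.

```python
from itertools import combinations

def shared_authorship(editors_by_article, min_shared=2):
    """Undirected article-article ties weighted by number of shared editors.

    `editors_by_article` maps each article to the set of users who edited it.
    """
    ties = {}
    for a1, a2 in combinations(sorted(editors_by_article), 2):
        shared = editors_by_article[a1] & editors_by_article[a2]
        if len(shared) >= min_shared:
            ties[(a1, a2)] = len(shared)
    return ties

editors_by_article = {
    "A1": {"Alice", "Bob", "Carol"},
    "A2": {"Alice", "Bob"},
    "A3": {"Carol"},
}
article_ties = shared_authorship(editors_by_article)
```

Category co-membership can be computed the same way by passing sets of categories instead of sets of editors.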

Directed Object-Object relationships (a1 → a2)

•  Hyperlinks
   •  a1 has a link to a2

•  Editor trajectory
   •  a2 is modified by a user after a1
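For hyperlinks, a small sketch: starting from a seed article, keep the articles it links to as nodes and record the directed links among them. The `outlinks` mapping and the function name `hyperlink_ego_network` are hypothetical; in practice the mapping would have to be filled from the wiki's link data.

```python
def hyperlink_ego_network(seed, outlinks):
    """Directed ties (a1, a2) among the articles linked from `seed`.

    `outlinks` maps an article title to the titles it links to.
    """
    nodes = set(outlinks.get(seed, ()))
    return {
        (a1, a2)
        for a1 in nodes
        for a2 in outlinks.get(a1, ())
        if a2 in nodes and a2 != a1  # keep only ties inside the neighborhood
    }

outlinks = {
    "Seed": ["A", "B", "C"],
    "A": ["B", "Seed", "Z"],
    "B": ["A"],
    "C": [],
}
links = hyperlink_ego_network("Seed", outlinks)
```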

Multidimensional networks

•  Multiple types of links between nodes
   •  Hyperlink
   •  Shared authorship
   •  Category co-membership

•  Presence of overlapping ties may explain collaboration more richly

•  Absence of overlapping ties may reveal anomalies for follow-on analysis
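A simple way to look for overlapping ties is to record, for each pair of nodes, which relationship types connect them. The sketch below treats every layer as undirected for the comparison, which is a simplifying assumption; the layer contents and the name `tie_overlap` are hypothetical.

```python
def tie_overlap(layers):
    """Map each undirected node pair to the set of layers in which it is tied."""
    overlap = {}
    for name, ties in layers.items():
        for a1, a2 in ties:
            overlap.setdefault(frozenset((a1, a2)), set()).add(name)
    return overlap

layers = {
    "hyperlink": [("A1", "A2"), ("A2", "A3")],
    "shared authorship": [("A2", "A1")],
    "category co-membership": [("A1", "A2"), ("A1", "A3")],
}
overlap = tie_overlap(layers)

# Pairs tied in several layers versus pairs tied in only one.
multiplex = {pair for pair, names in overlap.items() if len(names) > 1}
single = {pair for pair, names in overlap.items() if len(names) == 1}
```

Pairs in `single` are the candidates for the follow-on analysis mentioned above, for example two articles that link to each other but share no editors.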

Network exploration [figures]

NodeXL Platform
•  https://nodexl.codeplex.com/
•  Lower barriers to entry by using spreadsheet workflows
•  Network analysis plug-in for Microsoft Excel
•  “Spigots” to import network data from Twitter, Facebook, Flickr, Email, YouTube, and WWW

NodeXL MediaWiki Importer
•  https://wikiimporter.codeplex.com/
•  Graph data provider for NodeXL → new “spigot”
•  Queries MediaWiki API through DotNetWikiBot framework
•  Given a Page and a Site, returns a PageList
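The importer itself goes through DotNetWikiBot, so the following is not its code. It is only a sketch of what a revision-history request to the MediaWiki API can look like when built by hand, using the standard `action=query` and `prop=revisions` parameters. The helper name `revisions_query_url` is hypothetical, and no request is actually sent.

```python
from urllib.parse import urlencode

def revisions_query_url(api_url, title, limit=50):
    """Build a MediaWiki API URL asking for a page's recent revisions."""
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        "rvprop": "user|timestamp",  # who edited, and when
        "rvlimit": limit,
        "format": "json",
    }
    return api_url + "?" + urlencode(params)

url = revisions_query_url("https://en.wikipedia.org/w/api.php", "Pope Francis")
print(url)
```

Any MediaWiki installation with a publicly accessible API can be targeted by changing `api_url`.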

NodeXL MediaWiki Importer [screenshot]
•  Relationship to crawl
•  Boundary conditions

Case Study
•  Compare the structures of different relationships across two types of English Wikipedia articles
   •  “Social media”
   •  “Pope Francis”

•  Node layout via “Harel-Koren Fast Multiscale”
   •  Spring-embedding layout to emphasize clusters of ties

•  Nodes grouped via “Clauset-Newman-Moore”
   •  Nodes assigned to group if more ties within group than outside

•  “Group-in-a-box” layout
   •  Ties within group visualized individually, ties between groups collapsed together
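The Clauset-Newman-Moore method groups nodes by greedily merging groups to increase modularity, a score that is high when ties fall within groups more often than expected by chance. The sketch below only computes that score for a given grouping; it is not the Clauset-Newman-Moore algorithm and not NodeXL's implementation. The function name `modularity` and the toy graph are hypothetical.

```python
def modularity(edges, groups):
    """Modularity Q of a grouping of an undirected, unweighted graph.

    edges:  iterable of (u, v) pairs, each undirected tie listed once
    groups: dict mapping node -> group label
    """
    edges = list(edges)
    m = len(edges)
    within = {}  # group -> number of ties inside the group
    degree = {}  # group -> summed degree of the group's nodes
    for u, v in edges:
        gu, gv = groups[u], groups[v]
        degree[gu] = degree.get(gu, 0) + 1
        degree[gv] = degree.get(gv, 0) + 1
        if gu == gv:
            within[gu] = within.get(gu, 0) + 1
    return sum(
        within.get(g, 0) / m - (degree[g] / (2 * m)) ** 2
        for g in degree
    )

# Two triangles joined by a single bridging tie (c, d).
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]
two_groups = {"a": 1, "b": 1, "c": 1, "d": 2, "e": 2, "f": 2}
one_group = {node: 1 for node in two_groups}
q_two = modularity(edges, two_groups)
q_one = modularity(edges, one_group)
```

Splitting the toy graph into its two triangles scores higher than lumping everything together, which is the kind of grouping the case study layout relies on.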

Co-authorship
[Figure panels: Pope Francis, Social media]
•  Nodes are editors who contributed to the article
•  Editors are linked if they also contributed to the same other articles

Article trajectory
[Figure panels: Pope Francis, Social media]
•  Nodes are editors who contributed to the article
•  Editors are linked if they edited after one another

User discussion
[Figure panels: Pope Francis, Social media]
•  Nodes are editors who contributed to the article
•  Editors are linked if they left messages on other users’ talk pages

Shared authorship
[Figure panels: Pope Francis, Social media]
•  Nodes are other articles edited by the users who contributed to the article
•  Articles are linked if they share multiple co-authors

Hyperlink
[Figure panels: Pope Francis, Social media]
•  Nodes are articles linked from the seed article
•  Articles are linked if they link to each other

Structural Typologies [figure]

Discussion
•  Wikipedia and other MediaWiki projects contain a variety of complex and multidimensional relationships among users and objects

•  NodeXL MediaWiki Importer is a tool for simplifying complex data extraction and analysis workflows

•  NodeXL provides a powerful suite of tools to analyze and visualize the structure of multidimensional relationships

•  Together they support empirical testing of social theories as well as diagnosing the health of online communities

Future work
•  Incorporating additional meta-data
   •  Editors (registered, edit count, block count, tenure)
   •  Objects (namespace, age, edit count, assessment, pageviews)
   •  Content-level features (images, keywords)
   •  Temporal features

•  Additional relationships
   •  Inter-language links
   •  Backlinks
   •  Wiki-love
   •  Blocks (users and objects)

THANK YOU!
