17
Drawing the Big Picture: Temporal Visualization of Dynamic Collaboration Graphs of OSS Software Forks author name 1, author name 2 university name address, country {author1,author2}@department.university http://department.university/~author1 http://department.university/~author2 Abstract. How can we understand FOSS collaboration better? Can social issues that emerge be identified and addressed as they happen? Can the community heal itself, become more transparent and inclusive, and promote diversity? We propose a technique to address these issues by quantitative analysis and temporal visualization of social dynamics in FOSS communities. We used social network analysis metrics to iden- tify growth patterns and unhealthy dynamics; This gives the community a heads-up when they can still take action to ensure the sustainability of the project. Keywords. FOSS, Social Dynamics, Visualization, Temporal Visual- ization, Temporal Analysis, Free/Open Source Software, FLOSS, Fork- ing, Social Network Analysis 1 Introduction Social networks are a ubiquitous part of our social lives, and the creation of online social communities has been a natural extension of this phenomena. Free/Open Source Software (FOSS) development efforts are prime examples of how community can be leveraged in software development, as efforts are formed around communities of interest, and depend on continued interest and involvement in order to stay alive [Nyman 2011]. Though the bulk of collaboration and communication in FOSS communities occurs online and is publicly accessible, there are many open questions about the social dynamics in FOSS communities. Projects might go through a meta- morphosis when faced with an influx of new developers or the involvement of an outside organization. Conflicts between developers raised as the result of divergent opinions about the future of the project might lead to an ensuing fork of the project and the dilution of the community. Forking, either as a vio- lent split when there is a conflict or as a friendly divide when new features are experimentally added both affect the community [Bezrukova et al. 2010].

Drawing the Big Picture: Temporal Visualization of Dynamic ...research.engr.oregonstate.edu/hci/sites/research... · a dynamic process. In this paper, we propose to use temporal social

  • Upload
    others

  • View
    0

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Drawing the Big Picture: Temporal Visualization of Dynamic ...research.engr.oregonstate.edu/hci/sites/research... · a dynamic process. In this paper, we propose to use temporal social

Drawing the Big Picture:Temporal Visualization of Dynamic

Collaboration Graphs of OSS Software Forks

author name 1, author name 2

university nameaddress, country

{author1,author2}@department.universityhttp://department.university/~author1

http://department.university/~author2

Abstract. How can we understand FOSS collaboration better? Cansocial issues that emerge be identified and addressed as they happen?Can the community heal itself, become more transparent and inclusive,and promote diversity? We propose a technique to address these issuesby quantitative analysis and temporal visualization of social dynamicsin FOSS communities. We used social network analysis metrics to iden-tify growth patterns and unhealthy dynamics; This gives the communitya heads-up when they can still take action to ensure the sustainabilityof the project.

Keywords. FOSS, Social Dynamics, Visualization, Temporal Visual-ization, Temporal Analysis, Free/Open Source Software, FLOSS, Fork-ing, Social Network Analysis

1 Introduction

Social networks are a ubiquitous part of our social lives, and the creation ofonline social communities has been a natural extension of this phenomena.Free/Open Source Software (FOSS) development efforts are prime examplesof how community can be leveraged in software development, as efforts areformed around communities of interest, and depend on continued interest andinvolvement in order to stay alive [Nyman 2011].

Though the bulk of collaboration and communication in FOSS communitiesoccurs online and is publicly accessible, there are many open questions aboutthe social dynamics in FOSS communities. Projects might go through a meta-morphosis when faced with an influx of new developers or the involvement ofan outside organization. Conflicts between developers raised as the result ofdivergent opinions about the future of the project might lead to an ensuingfork of the project and the dilution of the community. Forking, either as a vio-lent split when there is a conflict or as a friendly divide when new features areexperimentally added both affect the community [Bezrukova et al. 2010].

Page 2: Drawing the Big Picture: Temporal Visualization of Dynamic ...research.engr.oregonstate.edu/hci/sites/research... · a dynamic process. In this paper, we propose to use temporal social

2 author name 1, author name 2

Most recent studies of FOSS communities have tended to suffer from animportant limitation. They treat community as a static structure rather thana dynamic process. In this paper, we propose to use temporal social networkanalysis to study the evolution and social dynamics of FOSS communities. Withthese techniques we aimed to identify measures associated with unhealthy groupdynamics, e.g. a simmering conflict, as well as early indicators of major eventsin the lifespan of a community. One dynamic we are especially interested in,are those of forked FOSS projects. We used the results of a study of forkedFOSS projects by [Robles and Gonzalez-Barahona 2012] as the starting pointfor out study, and tried to gain a better understanding of the evolution of thesecommunities.

This paper is organized as follows: We present related literature on onlinesocial communities, recounting their focus and the findings. We then present thegap in the literature, and what further study needs to be done. Next, we discusswhy the issue needs to be addressed and who benefits from it, in the motivationsection. After that, in methodology section, we describe how gathering data,doing the analysis, and the visualization of the findings was carried out. At theend, we present results, discussion and threats to validity.

2 Related Work

The social structures of FOSS communities have been studied extensively overthe past decade. Researchers have studied the social structure and dynamicsof team communications [Howison et al. 2006, Bird et al. 2008], identifyingknowledge brokers and their associated activities in FOSS projects [Sowe etal. 2006], their sustainability [Nyman 2011], FOSS forking [Nyman 2011, Rob-les and Gonzalez-Barahona 2012], their topology [Bird et al. 2008], their de-mographic diversity [Kunegis et al. 2012], gender differences in the process ofjoining them [Kuechler et al. 2012] and the role of the core team in their com-munities [Torres et al. 2011]. They have tended to look at community as a staticstructure rather than a dynamic process. This makes it hard to determine causeand effect, or the exact impact of social changes.

The study of communities has grown in popularity in part thanks to ad-vances in social network analysis. From the earliest works on studying infor-mation flow and predicting conflict and fission in groups by Zachary [Zachary1977], to the more recent works of [Leskovec et al.] on the statistical prop-erties of community structure in social networks, there is a growing body ofquantitative research on online communities.

The earliest works on communities was done with a focus on informationdiffusion in a community [Zachary 1977]. Zachary investigated the fission of acommunity, the process of communities splitting into two or more parts. Hefound that fission could be predicted by applying the Ford-Fulkerson min-cutalgorithm [Ford and Fulkerson 1957] on the group’s communication graph; “theunequal flow of sentiments across the ties” and discriminatory sharing of infor-

Page 3: Drawing the Big Picture: Temporal Visualization of Dynamic ...research.engr.oregonstate.edu/hci/sites/research... · a dynamic process. In this paper, we propose to use temporal social

Drawing the Big Picture: Temporal Visualization of Collaboration Graphs 3

mation lead to “subcommunities with more internal stability than the commu-nity as a whole.”

Community splits in FOSS projects are referred to as forks, and are rela-tively common. Forking is defined as “when a part of a development community(or a third party not related to the project) starts a completely independentline of development based on the source code basis of the project.” Robles andGonzalez-Barahona [Robles and Gonzalez-Barahona 2012] identified 220 signif-icant FOSS projects that have forked over the past 30 years, and compiled acomprehensive list of the dates and reasons for forking. They classified theseinto six main categories. Table 3. shows their results, discussed further in thenext sections, and which we build on extensively in this work. They identified agap in the literature in case of “how the community moves when a fork occurs”.

The dynamic behavior of a network and identifying key events was the aimof a study by [Asur et al. 2009]. They studied three DBLP co-authorship net-works and defined the evolution of these networks as following one of thesepaths: a) Continue, b) k-Merge, c) k-Split, d) Form, or e) Dissolve. They alsodefined four possible transformation events for individual members: 1) Appear,2) Disappear, 3) Join, and 4) Leave. They compared groups extracted from con-secutive snapshots, based on the size and overlap of every pair of groups. Then,they labeled groups with events, and used these identified events to calculatethe metrics in Table 1 for the nodes and the network.

Table 1: The measures of diversity used by [Asur et al. 2009]

Metrics Meaning

Stability Tendency of a node to have interactions with the same nodes overtime

Sociability Tendency of a node to have different interactions

Influence Number of followers a node has on a network and how its actions arecopied and/or followed by other nodes. (e.g. when it joins/leaves aconversation, many other nodes join/leave the conversation, too)

Popularity Number of nodes in a cluster (how crowded a sub-community is)

The communication patterns of FOSS developers in a bug repository wereexamined by [Howison et al. 2006]. They calculated out-degree centrality astheir metric. Out-degree centrality measures the proportion of the number oftimes a node contacted other nodes (outgoing) over how many times it wascontacted by other nodes (incoming). They calculated this centrality over time“in 90-day windows, moving the window forward 30 days at a time.” They foundthat “while change at the center of FOSS projects is relatively uncommon,”participation across the community is highly skewed, following a power-lawdistribution, where many participants appear for a short period of time, anda very small number of participants are at the center for long periods. Our

Page 4: Drawing the Big Picture: Temporal Visualization of Dynamic ...research.engr.oregonstate.edu/hci/sites/research... · a dynamic process. In this paper, we propose to use temporal social

4 author name 1, author name 2

approach is similar to theirs in how we form collaboration graphs and performour temporal analysis. Our approach is different in terms of our project selectioncriteria, the metrics we examine, and our research questions.

The tension between diversity and homogeneity in a community was studiedby [Kunegis et al. 2012]. Table 2 lists five network statistics they used to examinethe evolution of large-scale networks over time. They found that except for thediameter, all other measures of diversity shrunk as the networks matured overtheir lifespan. Kunegis et al. [Kunegis et al. 2012] argued that one possiblereason could be that the community structure consolidates as projects mature.

Table 2: The measures of diversity used by [Kunegis et al. 2012]

Network property Network is diverse when A network is diverse when

Paths between nodes Paths are long Effective diameter

Degrees of nodes Degrees are equal Gini coefficient of the degreedistribution

Communities Communities have similar sizes Fractional rank of the adja-cency matrix

Random walks Random walks have high proba-bility of return

Weighted spectral distribution

Control of nodes Nodes are hard to control Number of driver nodes

Community dynamics was the focus of a recent study by [Hanneman andKlamma 2013] on three open source bioinformatics communities. They mea-sured ”age” of users, as starting from their first activity and found survivalrates and two indicators for significant changes in the core of the community.They identified a survival rate pattern of 20-40-90%, meaning that only 20%of the newcomers survived after their first year, 40% of the survivors survivedthrough the second year, and 90% of the remaining ones, survived over the nextyears. As for the change in the core, they suggested that a falling maximal be-tweenness in combination with an increasing network diameter as an indicatorfor a significant change in the core, e.g. retirement of a central person in thecommunity. Our approach builds on top of their findings, and the evolution ofbetweenness centralities and network diameters for the projects in our studyare depicted in the following sections.

To date, most studies on FOSS have only been carried out on a small numberof projects, and using snapshots in time. To our knowledge, no study has beendone of project forking that has taken into account the temporal dimension.

3 Methodology

We argue that the social interactions data reflects the changes the communitygoes through, and will be able to describe the context surrounding a forking

Page 5: Drawing the Big Picture: Temporal Visualization of Dynamic ...research.engr.oregonstate.edu/hci/sites/research... · a dynamic process. In this paper, we propose to use temporal social

Drawing the Big Picture: Temporal Visualization of Collaboration Graphs 5

event. Robles and Gonzalez-Barahona [Robles and Gonzalez-Barahona 2012]classify the main reasons for forking into six classes, listed in Table 3.

Table 3: The main reasons for forking as classified by [Robles and Gonzalez-Barahona 2012]

Reason for forking Example forks

Technical (Addition of functionality) Amarok & Clementine Player

More community-driven development Asterisk & Callweaver

Discontinuation of the original project Apache web server

Commercial strategy forks LibreOffice & OpenOffice.org

Legal issues X.Org & XFree

Differences among developer team Kamailio & OpenSIPS

Three of the six listed reasons are socially related, and so should arguably bereflected somehow in the social interaction data. As an example, if a fork occursbecause of a desire for “more community-driven development”, we expect to seean interaction patterns in the collaboration data showing a strongly-connectedcore that is hard to penetrate for the rest of the community. In other words, thepower stays in the hands of the same people throughout, as new people comeand go.

We aimed to visualize and quantify how the community is structured, howit evolves, and the degree to which community involvement changes over time.To this end, we aimed to pick projects from the aforementioned three socialcategories of forked projects. This involved obtaining communication archives,creating the collaboration graphs, applying social network analysis techniqueand measuring SNA metrics, and visualizing the evolving graphs. We did thisin four phases as described in the following:

3.1 Phase 1: Data Collection

The entry point for our data collection was the results of a study of forksby [Robles and Gonzalez-Barahona 2012]. Their database included informationabout 220 significant forks and their reasons. We applied a three-stage selectioncriteria to those projects. A project was shortlisted if the fork had happenedafter the year 2000. The second stage, we checked the shortlisted projects’websites for availability of their data, and found that very few of them hadtheir collaboration data available. The next stage, we checked the size of theprojects, as we were looking for project communities of approximately the samesize. This three stage filtering process resulted in the projects listed in Table 4.

Data collection involved mining mailing list archives and bug repositories,although the bug repositories data was not used in the study described in thispaper, and is planned to be used in our future study. We collected data for the

Page 6: Drawing the Big Picture: Temporal Visualization of Dynamic ...research.engr.oregonstate.edu/hci/sites/research... · a dynamic process. In this paper, we propose to use temporal social

6 author name 1, author name 2

Table 4: The project forks for which collaboration data was collected

Projects Reason for forking Year

Amarok & Clementine Player Technical (Addition of functionality) 2010

Asterisk & Callweaver More community-driven development 2007

Kamailio & OpenSIPS Differences among developer team 2008

year in which the fork had happened, as well as for three month before andthree months after that year. So, in case of the Amarok and Clemetine Playerfork in 2010, we collected data dating back from October 2009 till March 2011.This argueably ensures the data captures the social context context around thetime of fork.

3.2 Phase 2: Creating Communication Graphs

Many social structures can be represented as graphs. The nodes represent ac-tors/players and the edges between them represent the interaction among theactors. Such graphs can be a snapshot of the network, which forms a staticgraph, or a longitudinally changing network, also called a dynamic graph.

In this phase, we processed the data to form a communication graph rep-resenting the community. This is fraught with complications, as collaborationgraph for a large community, e.g. Linux Kernel with 9515 contributors will bea complex network of 9515 nodes and millions of edges. Special graph samplingalgorithms are needed to use clustering techniques and social network analysistools on such a large and complex network. These graph-sampling techniqueshave been studied and compared by [Wang et al. 2011] and [Lu et al. 2011].[Chi et al. 2007] e.g. applied a temporal smoothness to their spectral clusteringalgorithm, and argued that incorporating this method results in a more robustclustering results, while it provides less sensitivity to short-term noise and is“adaptive to long-term clustering drift.”

We made use of the mlstat command line based tool to analyze mboxesand parsed the information into a locally hosted database for each project. Thedatabases were then queried as to form the communication graphs. We werelooking for how people have interacted with each other, but how about thegeneral mailing list? We decided to treat the general mailing list as a person,because the bulk of the communication was targeted at it, and most newcomersstart participating in the community by sending their questions to the generalmailing list. Each communication effort was captured with a timestamp. Thisallowed us to form a dynamic graph, in which the nodes would exist if andonly if they had an interaction with another node during the period we wereinterested in.

Page 7: Drawing the Big Picture: Temporal Visualization of Dynamic ...research.engr.oregonstate.edu/hci/sites/research... · a dynamic process. In this paper, we propose to use temporal social

Drawing the Big Picture: Temporal Visualization of Collaboration Graphs 7

3.3 Phase 3: Temporal Evolution Analysis

In this phase, we wanted to analyze the changes that happen to the communityover a given period of time, i.e. three months before and three months after theyear in which the forking event happened. To this end, we were interested in see-ing the changes in the roles individuals have, any shifts in the center of gravityof the project, unusual dilution or concentration of the part of the communityand how information can diffuse along the community. These changes can trans-late into questions, e.g. is everyone still connected to everyone like before, orthere are bridges burnt down. The roles individuals take in the community arein par with the individual’s importance or centrality in the network. We canmeasure each individual’s importance with a social network analysis measurecalled centrality. i.e. who is more important/central than the others? e.g. whohas had the most number of ”friends” in a community? (degree centrality)

There are many ways of looking at an individual’s importance. One is calledcloseness centrality. The farness of a node is defined as the sum of its dis-tances to all other nodes. The closeness of a node is defined as the inverseof the farness. More informally, the more central a node is the lower its to-tal distance to all other nodes. Closeness centrality can be used as a measureof how fast information will spread through the network [Chakrabarty andFaloutsos 2006]. Secondly, if we are looking for people who can serve as bridgesbetween two distinct communities, we could measure the node’s betweennesscentrality. Betweenness centralities for mediators who act as intermediate enti-ties between other nodes are higher [Chakrabarty and Faloutsos 2006]. Third,if cross-community collaboration is the focus, we can measure edge betweennesscentrality. Edges connecting nodes from different communities have higher edgecentrality values. In the community collaboration graph, edge betweenness orstress of an edge is the number of these shortest paths that the edge belongs to,considering all shortest paths between all pairs of nodes in the graph. Fourth,one can claim that certain people in the community are more important thanothers, and whoever is close to them, is relatively more important than others.In graph terms, this is measured by eigenvector centrality, which is based onthe assumption that connections to high-profile nodes contribute more to theimportance of a node. Google’s PageRank link-analysis algorithm by [Brin andPage 1998] is a variant of the eigenvector centrality measure. In short, cen-trality measures have been used in several studies to identify key player in acommunity.

In addition to the centrality measures, we planned to look into the resilienceof the community as well. By resilience, we mean how well the network holdsits structure and form when some parts of it are deleted, added, or changed.For a graph, the resilience of a graph is a measure of its robustness to node oredge failures. This could occur for instance when an influential member of thecommunity leaves. Many real-world graphs are resilient to random failures butvulnerable to targeted attacks. Resilience can be related to the graph diameter :

Page 8: Drawing the Big Picture: Temporal Visualization of Dynamic ...research.engr.oregonstate.edu/hci/sites/research... · a dynamic process. In this paper, we propose to use temporal social

8 author name 1, author name 2

Fig. 1: Heat-map color-coded examples of nodes with high centrality metric areshown above. The same network is analysed four times with the following cen-trality measures: A) Degree centrality, B) Closeness centrality, C) Betweennesscentrality and D) Eigenvector centrality [Rocchini 2012]

a graph whose diameter does not increase much on node or edge removal hashigher resilience [Chakrabarty and Faloutsos 2006].

We measured betweenness centrality [Brandes 2001] of the top significantnodes in the communication graph, and the graph diameter over time. Figures3-5 show the measured betweenness centralities over the 1.5 year period forthe Camailio, Amarok and Asterisk projects respectively. We did this analysisover time, rather than just a one-snapshot analysis. So, the temporal changesin measured values were what we were primarily interested in. To do temporalanalysis, we had two options; 1) To look at snapshots of the network stateover time, e.g. to look at the network snapshots in every week, the same waythat a video is composed of many consecutive frames, and 2) To look at aperiod through a time window. We preferred the second approach, and lookedthrough a time window of three months wide with 1.5 month overlaps withthe next time frame. For creating the visualizations, we used a 3 months timeframe that progressed six days a frame. In this way, we had a relatively smoothtransition.

Page 9: Drawing the Big Picture: Temporal Visualization of Dynamic ...research.engr.oregonstate.edu/hci/sites/research... · a dynamic process. In this paper, we propose to use temporal social

Drawing the Big Picture: Temporal Visualization of Collaboration Graphs 9

3.4 Phase 4: Temporal Visualization

Several visualization techniques and tools are used in the field of social net-work analysis, for instance Gephi [Bastian et al. 2009], NetLogo [Wilensky1999], igraph [Csardi and Nepusz 2006], NetworkX [Hagberg et al. 2008], So-NIA [Bender-deMoll and McFarland 2006], NodeXL [Smith et al. 2009]. Gephi,is a FLOSS tool for exploring and manipulating networks. It is capable of han-dling large networks with more than 20,000 nodes and features several SNAalgorithms. It is customizable with plugins and we used it for dynamic networkvisualization.

We temporally visualized the dynamic network changes which were recordedand are uploaded (anonymously, for the sake of double blindedness) online1. Thevideos show how the community graph is structured, using a continuous force-directed linear-linear model, in which the nodes are positioned in the spacenear or far from each other proportional to the graph distance between them.This results in a graph shape between between Fruchterman & Rheingold’s[Fruchterman and Reingold 1991] layout and Noack’s LinLog [Noack 2007].

(a) (b) (c) (d)

(e) (f) (g)

Fig. 2: Snapshots from the video visualization of Kamailio project’s collabora-tion graph (October 2007 - March 2009) in which a core contributor (coloredred) moves to the periphery and eventually departs out of the community.

1 Video visualizations available at http://goo.gl/aObtqa

Page 10: Drawing the Big Picture: Temporal Visualization of Dynamic ...research.engr.oregonstate.edu/hci/sites/research... · a dynamic process. In this paper, we propose to use temporal social

10 author name 1, author name 2

4 Results and Discussion

4.1 Kamailio Project

Figure 2 shows seven frames of interest from the Kamailio project’s lifetimearound the fork date. The events described here are easier to fully grasp bywatching the visualization video. (URL in the footnote2) The video shows the adynamic graph of contributors. A node’s size in a proportional to the number ofinteractions the node (the contributor) has had within the 1.5 year time periodsurrounding the fork. The positions of the node change and node and edges arepresent if they had have interactions within a three month time window shown,with six day progress step at a time. The 1 minute and 37 seconds video showsthe lifetime of the Kamailio project between October 2009, and March 2011.Nodes are colored based on the modularity of the network. One can easily seethat the community starts with the GeneralList as the the biggest node, andfour major core contributors and three major contributors of a slightly lessersize. The big red-colored node’s transitions are hard to miss, as this major con-tributor departs from the core to the periphery of the network (Video minute1:02) and then dispatches and leaves the community (Video minute 1:24). Ourobservation is that this is easy to detect and can arguably be associated witheither a conflict or retirement. This corresponds to the personal difference cat-egory of forking reasons.

Fig. 3: Kamailio project’s top contributors’ betweenness centralities and net-work diameter over time between October 2009 to March 2011 in 3-months timewindows with 1.5 months overlaps

2 Video visualizations available at http://goo.gl/aObtqa

Page 11: Drawing the Big Picture: Temporal Visualization of Dynamic ...research.engr.oregonstate.edu/hci/sites/research... · a dynamic process. In this paper, we propose to use temporal social

Drawing the Big Picture: Temporal Visualization of Collaboration Graphs 11

Figure 3 shows the betweenness centrality of the major contributors of Ka-mailio project over the same 1.5 year time period. The horizontal axis marksthe dates, i.e. each mark represents a 3-month time window with 1.5 monthsoverlap with adjacent marks. The vertical axis shows the percentage of the topbetweenness centralities for each node. The saliency of the GeneralList, coloredas light blue, is apparent due to to its continous and dominant presense in thestacked area chart. The chart legend lists the contributors based on the colorand in the same order of appearance on the chart starting from the bottom.One can easily see that around the ”Aug. 15, 2008 - Nov. 15, 2008” tick markon the horizontal axis, several contributors’ betweenness centralities shrink toalmost zero and disappear. This quantitative analysis helps in identifying thedate of fork with a month accuracy.

The network diameter of Kamailio project over the same 1.5 year time pe-riod is also shown in Figure 3, under the area chart. The increase in the networkdiameter during the same identified time period confirms the findings of [Han-nemann and Klamma 2013].

This approach can be used in identifying the people involved in the assumedconflict and the date the fork happened with a month accuracy.

4.2 Amarok Project

The video for the Amarok project fork is available online3, and the results fromquantitative analysis of the betweenness centralities and the network diametersare depicted in Figures 4. The results show that the network diameter has notincreased over the studied time period corresponding to the fork date, whichshows a resilient network. The video visualization shows the dynamic changes inthe network structure, which are typical of a healthy network dynamic, ratherthan a simmering conflict. These indicators show that Amarok fork in 2010arguably belongs to the addition of technical functionality category of forkingreasons.

4.3 Asterisk Project

The video for the Asterisk project fork is also available online3, and the resultsfrom quantitative analysis of the betweenness centralities and the network di-ameters are depicted in Figures 5. The results show that the network diameterhas remained steady at 6 throughout the period. The Asterisk community wasby far the most crowded project among the three, with 932 nodes and 4282edges within the specified time period. The stacked area chart depicts the dis-tribution of centralities, in which, we can see a 80%-20% distribution, i.e. about80% or more of the centralities are limited to six major players, with the restof the community accounting for only 20%. This is evident in the video rep-resentation as well, as the top-level structure of the network holds throughout

3 Video visualizations available at http://goo.gl/aObtqa

Page 12: Drawing the Big Picture: Temporal Visualization of Dynamic ...research.engr.oregonstate.edu/hci/sites/research... · a dynamic process. In this paper, we propose to use temporal social

12 author name 1, author name 2

Fig. 4: Amarok project’s top contributors’ betweenness centralities and networkdiameter over time between October 2009 to March 2011 in 3-months timewindows with 1.5 months overlaps

the time period. The combined results from the visual analysis and quantita-tive analysis arguably links the Asterisk fork to the more community-drivencategory of forkings reasons.

5 Conclusion

In this work, we studied collaboration network of three open source projectsusing a combined analysis method of temporal visualization and temporalquantitative analysis. We based our study on two papers by [Robles andGonzalez-Barahona 2012] and [Hanneman and Klamma 2013], and identifiedthree projects that had forked in the recent past. We mined the collaborationdata, formed dynamic collaboration graphs, and measured social network anal-ysis metrics over an 18-month period time window.

We also visualized the dynamic graph (available online) and as stacked areacharts over time. The visualizations and the quantitative results showed thedifferences among the projects in the three forking reasons of personal differ-ences among the developer teams, technical differences (addition of new func-tionality) and more community-driven development. The personal differencesrepresentative project was identifiable, and so was the date it forked, with amonth accuracy.

The novelty of the approach was in applying the temporal analysis ratherthan static analysis, and in the temporal visualization of community structure.

Page 13: Drawing the Big Picture: Temporal Visualization of Dynamic ...research.engr.oregonstate.edu/hci/sites/research... · a dynamic process. In this paper, we propose to use temporal social

Drawing the Big Picture: Temporal Visualization of Collaboration Graphs 13

Fig. 5: Asterisk project’s top contributors’ betweenness centralities and networkdiameter over time between October 2009 to March 2011 in 3-months timewindows with 1.5 months overlaps

We showed that this approach shed light on the structure of these projects andreveal information that cannot be seen otherwise.

6 Threats to Validity

The presented findings may not be generalized to all OSS projects. The projectsstudies in this paper were selected from a pool of candidate projects, partly be-cause data about them was available. Given access, a better sampling approachhas to be adopted, which could result in a more robust investigation. Further-more, the proposed technique uses the data from online communications. Theassumption that all the communication can be captured by mining repositoriesis intuitively imperfect, but inevitable. Hence, to minimize the effect of this as-sumption, we plan to complement the quantitative approach with a qualitativeapproach of interviewing key individuals from the community as future work.This will help triangulate the results.

7 Future Work

Considering the novelty of temporal dynamic analysis and temporal visualiza-tion, there is ample possibility for further studies. To validate the results ofthis study, the proposed technique could be applied to other OSS project forks,

Page 14: Drawing the Big Picture: Temporal Visualization of Dynamic ...research.engr.oregonstate.edu/hci/sites/research... · a dynamic process. In this paper, we propose to use temporal social

14 author name 1, author name 2

with additional parameters. In today’s world of big data, the possibilities arenumerous for formulating novel well-formed research questions.

Acknowledgments

We would like to thank the Amarok, Asterisk and Kamailio project membersfor making their data available, without which this study would not have beenpossible.

References

1. Asur, S., S. Parthasarathy, and D. Ucar, (2009), “An event-based framework forcharacterizing the evolutionary behavior of interaction graphs,” in ACM Trans.Knowledge Discovery Data. 3, 4, Article 16, (November 2009), 36 pages. 2009.

2. Azarbakht, A. and C. Jensen, (2013), “Analyzing FOSS Collaboration & So-cial Dynamics with Temporal Social Networks,” in Proceedings of the 9th In-ternational Conference on Open Source Systems Doctoral Consortium, KoperCapodistria, Slovenia, 2013.

3. Bastian, M., S. Heymann, and M. Jacomy, “Gephi: an open source software forexploring and manipulating networks,” presented at the International AAAI Con-ference on Weblogs and Social Media, 2009.

4. Bender-deMoll, S. and D. A. McFarland, “The Art and Science of Dynamic Net-work Visualization,” Journal of Social Structure, vol. 7, 2006.

5. Bergquist, M. and Ljungberg, J., (2001). , “The power of gifts: organizing socialrelationships in open source communities,” Info. Systems J., 11, 305-320. 2001.

6. Bird, C., D. Pattison, R. D’Souza, V. Filkov, and P. Devanbu, “Latent socialstructure in open source projects,” in Proceedings of the 16th ACM SIGSOFTInternational Symposium on Foundations of software engineering, New York, NY,USA: ACM, pp. 24-35, 2008.

7. Brandes, U. (2001). “A Faster Algorithm for Betweenness Centrality”, in Journalof Mathematical Sociology 25(2):163-177.

8. Brin, S. and L. Page, “The anatomy of a large-scale hypertextual Web searchengine,” in Proceedings of the 7th international conference on World Wide Web7 (WWW7), Philip H. Enslow, Jr. and Allen Ellis (Eds.). Elsevier Science Pub-lishers B. V., Amsterdam, The Netherlands, The Netherlands, 107-117. 1998.

9. Chakrabarti, D. and C. Faloutsos. “Graph mining: Laws, generators, and algo-rithms,” ACM Computing Surveys, 38, 1, Article 2, 2006.

10. Chi, Y., X. Song, D. Zhou, K. Hino, and B. L. Tseng, “Evolutionary spectral clus-tering by incorporating temporal smoothness,” in Proceedings of the 13th ACMSIGKDD International Conference on Knowledge Discovery and Data Mining(KDD ’07). ACM, New York, NY, USA, 153-162. 2007.

11. Crowston, K., and Scozzi, B., (2002). “Open Source Software Projects as VirtualOrganizations: Competency Rallying for Software Development,” IEE Proceed-ings Software. 149(1), 3-17.

12. Crowston, K., K. Wei, J. Howison. 2012. “Free/Libre open-source software devel-opment: What we know and what we do not know,” ACM Comput. Surv. 44, 2,Article 7, 2012.

Page 15: Drawing the Big Picture: Temporal Visualization of Dynamic ...research.engr.oregonstate.edu/hci/sites/research... · a dynamic process. In this paper, we propose to use temporal social

Drawing the Big Picture: Temporal Visualization of Collaboration Graphs 15

13. Csardi, G. and T. Nepusz, “The igraph software package for complex networkresearch,” in InterJournal Complex Systems, 2006.

14. Cuny, F., (Mar. 25 2010), Github Explorer, Available at:http://www.flickr.com/photos/franck /4460144638/, 2010.

15. De Souza, C. R. B., Froehlich, J., and Dourish, P. (2005), “Seeking the Source:Software Source Code as a Social and Technical Artifact,” Proc. ACM Intern.Conf. Supporting Group Work (GROUP 2005), 197-206, Sanibel Island, Florida.2005.

16. Elliott, M. and Scacchi, W., (2003), “Free Software Developers as an OccupationalCommunity: Resolving Conflicts and Fostering Collaboration,” Proc. ACM In-tern. Conf. Supporting Group Work, 21-30, Sanibel Island, FL, November. 2003.

17. Erdos, P. and A. Renyi, “On random graphs, I,” Publicationes Mathematicae(Debrecen), vol. 6, pp. 290-297, 1959.

18. FLOSS (2002). , “Free/Libre and Open Source Software: Survey and Study,FLOSS Final Report,”http://www.flossproject.org/report/. 2006.

19. Ford, L. R. and D. R. Folkerson, “A simple algorithm for finding maximal net-work flows and an application to the Hitchcock problem,” Canadian Journal ofMathematics, vol. 9, pp. 210-218, 1957.

20. Fruchterman, T. M. J. and E. M. Reingold, (Nov. 1991). “Graph drawing byforce-directed placement,” Softw: Pract. Exper., vol. 21, no. 11, pp. 11291164.

21. Hagberg, A. A., D. A. Schult, and P. J. Swart, “Exploring network structure,dynamics, and function using NetworkX,” in Proceedings of the 7th Python inScience Conference (SciPy2008), Pasadena, CA USA, 2008.

22. Howison, J. and K. Crowston. “The perils and pitfalls of mining SourceForge,”In Proceedings of the International Workshop on Mining Software Repositories(MSR 2004), pp. 7-11. 2004.

23. Howison, J., K. Inoue, and K. Crowston, “Social dynamics of free and open sourceteam communications,” in Proceedings of the IFIP Second International Confer-ence on Open Source Systems, 319-330, 2006.

24. Howison, J., M. Conklin, and K. Crowston, “FLOSSmole: A collaborative reposi-tory for FLOSS research data and analyses,” International Journal of InformationTechnology and Web Engineering, 1(3), 1726. 2006.

25. Huntley, C.L., (2003), “Organizational Learning in Open-Source SoftwareProjects: An Analysis of Debugging Data,” IEEE Trans. Engineering Manage-ment, 50(4), 485-493. 2003.

26. Jergensen, C., A. Sarma, and P. Wagstrom. 2011. “The onion patch: migrationin open source ecosystems,” In Proceedings of the 19th ACM SIGSOFT sympo-sium and the 13th European conference on Foundations of software engineering(ESEC/FSE ’11). ACM, New York, NY, USA, 70-80. 2011.

27. Koch, S. (Ed.), (2005), “Free/Open Source Software Development,” Idea GroupPublishing, Hershey, PA. 2005.

28. Kuechler, V., C. Gilbertson, and C. Jensen, “Gender Differences in Early Freeand Open Source Software Joining Process,” Open Source Systems: Long-TermSustainability, pp. 78-93. 2012.

29. Kunegis, J., S. Sizov, F. Schwagereit, and D. Fay, “Diversity dynamics in onlinenetworks,” in Proceedings of the 23rd ACM Conference on Hypertext and SocialMedia, Milwaukee, Wisconsin, USA, 2012.

30. Leskovec, J., Kleinberg, J., and Faloutsos, C.: “Graphs over time: densifica-tion laws, shrinking diameters and possible explanations,” in Proceedings of the

Page 16: Drawing the Big Picture: Temporal Visualization of Dynamic ...research.engr.oregonstate.edu/hci/sites/research... · a dynamic process. In this paper, we propose to use temporal social

16 author name 1, author name 2

SIGKDD International Conference on Knowledge Discovery and data Mining,2005.

31. Leskovec, J., K. J. Lang, A. Dasgupta, and M. W. Mahoney, “Statistical propertiesof community structure in large social and information networks,” in Proceedingsof the 17th International Conference on World Wide Web (WWW ’08), ACM,New York, NY, USA, 695-704. 2008.

32. Lu, J. and D. Li, “Sampling online social networks by random walk,” in Proceed-ings of the First ACM International Workshop on Hot Topics on InterdisciplinarySocial Networks Research, ACM, New York, NY, USA, 33-40. 2012.

33. Madey, G., Freeh, V., and Tynan, R., (2002), “The Open Source DevelopmentPhenomenon: An Analysis Based on Social Network Theory,” Proc. AmericasConf. Info. Systems (AMCIS2002), 1806-1813, Dallas, TX. 2002.

34. Madey, G., Freeh, V., and Tynan, R., (2005), “Modeling the F/OSS Community:A Quantitative Investigation,” in S. Koch (ed.), Free/Open Source Software De-velopment, 203-221, Idea Group Publishing, Hershey, PA. 2005.

35. Noack, A., (2007) “Energy models for graph clustering,” J. Graph AlgorithmsAppl., vol. 11, no. 2, pp. 453480.

36. Nyman, L. , “Understanding code forking in open source software,” in Proceedingsof the 7th International Conference on Open Source Systems Doctoral Consor-tium, Salvador, Brazil, 2011.

37. Nyman, L., T. Mikkonen, J. Lindman, and M. Fougere, “Forking: the invisiblehand of sustainability in open source software,” in Proceedings of SOS 2011:Towards Sustainable Open Source, 2011.

38. Oh, W. and S. Jeon, “Membership dynamics and network stability in the open-source community: the ising perspective,” in Proceedings of the InternationalConference on Information Systems, 2004.

39. Ovaska, P., Rossi, M. and Marttiin, P. (2003), “Architecture as a CoordinationTool in Multi-Site Software Development,” Software ProcessImprovement andPractice, 8(3), 233-247. 2003.

40. Robles, G. and J. M. Gonzalez-Barahona, “A comprehensive study of softwareforks: Dates, reasons and outcomes,” in Proceedings of the 8th InternationalConference on Open Source Systems, Hammamet, Tunisia, 2012.

41. Rocchini, C. (Nov. 27 2012), Wikimedia Commons, Available:http://en.wikipedia.org/wiki/File:Centrality.svg, 2012.

42. Scacchi, W., (2004), “Free/Open Source Software Development Practices in theComputer Game Community,” IEEE Software, 21(1), 59-67, January/February.2004.

43. Scacchi, W., (2007), “Free/Open Source Software Development: Recent ResearchResults and Methods,” in M.V. Zelkowitz (ed.), Advances in Computers, 69, 243-295, 2007.

44. Smith, M. A., B. Shneiderman, N. Milic-Frayling, E. Mendes Rodrigues, V.Barash, C. Dunne, T. Capone, A. Perer, and E. Gleave, “Analyzing (social me-dia) networks with NodeXL,” in Proceedings of the 4th International Conferenceon Communities and Technologies (C& T ’09), 2009.

45. Sowe, S., L. Stamelos, and L. Angelis, “Identifying knowledge brokers that yieldsoftware engineering knowledge in OSS projects,” Information and Software Tech-nology, vol. 48, pp. 1025-1033, Nov 2006.

46. Torres, M. R. M., S. L. Toral, M. Perales, and F. Barrero, “Analysis of the CoreTeam Role in Open Source Communities,” in Complex, Intelligent and Software

Page 17: Drawing the Big Picture: Temporal Visualization of Dynamic ...research.engr.oregonstate.edu/hci/sites/research... · a dynamic process. In this paper, we propose to use temporal social

Drawing the Big Picture: Temporal Visualization of Collaboration Graphs 17

Intensive Systems (CISIS), 2011 International Conference on, pp. 109-114. IEEE,2011.

47. Wang, T., Y. Chen, Z. Zhang, T. Xu, L. Jin, P. Hui, B. Deng, and X. Li. “Under-standing graph sampling algorithms for social network analysis,” in DistributedComputing Systems Workshops (ICDCSW), 31st International Conference on,pp. 123-128. IEEE, 2011.

48. Wilensky, U., “NetLogo,” in Center for Connected Learning and Computer-BasedModeling, Northwestern University. Evanston, IL., 1999.

49. Xu, J., Y. Gao, S Christley, and G Madey. 2005, “A Topological Analysis of theOpen Souce Software Development Community,” in Proceedings of the Proceed-ings of the 38th Annual Hawaii International Conference on System Sciences -Volume 07 (HICSS ’05), Vol. 7. IEEE Computer Society, Washington, DC, USA.2005.

50. Ye, Y. and Kishida, K., (2003), “Towards an understanding of the motivation ofopen source software developers,” Proc. 25th Intern. Conf. Software Engineering,Portland, OR, 419-429, IEEE Computer Society, May. 2003.

51. Zachary, W., “An information flow model for conflict and fission in small groups,”Journal of Anthropological Research, vol. 33, no. 4, pp. 452-473, 1977.