Wikipedia Vote Network - Social Networks

Page 1: Wikipedia Vote Network - Social Networks

Social Networks Analytics

Hubert Lo Prateek Maitra Aaron Strahl

Wikipedia Vote Network

Page 2: Wikipedia Vote Network - Social Networks

Outline

- Introduction

- Wikipedia Request for Adminship

- Is the RfA process fair?

- Application Techniques

  - Descriptive Statistics

  - Distributions, Betweenness, Clustering

- Graph Partitioning

- Key Takeaways

- Conclusion

Page 3: Wikipedia Vote Network - Social Networks

Background

Page 4: Wikipedia Vote Network - Social Networks

Background

About RfA and its process:

Nomination

Notice of RfA

Expressing Opinions

Discussion, decision, and closing procedures

Page 5: Wikipedia Vote Network - Social Networks

Research Question

Question: We set out to analyze the directed-graph relationship between Wikipedia administrators and average users in a Wikipedia voting dataset.

Are the procedures in place fair or not?

Page 6: Wikipedia Vote Network - Social Networks

Application Techniques

Techniques:

Descriptive Statistics and Interpretation

Graph Partitioning/Visuals

Filtering the network by increasing degree - Gephi

Network Degree Distribution

Pattern of Random or Preferential Attachment?

Page 7: Wikipedia Vote Network - Social Networks

Descriptive Statistics

Vertices Count: 7,115

Edges Count: 103,689

Reciprocity: 0.0564

Global Clustering: 0.1254791

Average Path: 3.34

Diameter: 10

Strongly Connected: False

Weakly Connected: False
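
Two of the statistics above, reciprocity and weak connectivity, can be sketched in a few lines. This is a minimal illustration on a hypothetical toy edge list, not the Wikipedia vote dataset, and the function names are made up for the example. Reciprocity is taken here as the fraction of directed edges whose reverse edge also exists, which is one common definition and may differ from the one used by the tool behind the slide.

```python
from collections import defaultdict, deque

def reciprocity(edges):
    """Fraction of directed edges (u, v) whose reverse (v, u) is also present."""
    edge_set = set(edges)
    mutual = sum(1 for (u, v) in edge_set if (v, u) in edge_set)
    return mutual / len(edge_set)

def is_weakly_connected(nodes, edges):
    """True if all nodes form one component when edge direction is ignored."""
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    nodes = list(nodes)
    seen = {nodes[0]}
    queue = deque([nodes[0]])
    while queue:
        current = queue.popleft()
        for nxt in neighbors[current]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return len(seen) == len(nodes)

# Hypothetical toy graph: an edge (voter, candidate) means "voter voted on candidate".
toy_nodes = ["a", "b", "c", "d", "e"]
toy_edges = [("a", "b"), ("b", "a"), ("c", "b"), ("d", "b")]

toy_reciprocity = reciprocity(toy_edges)              # a<->b is the only mutual pair
toy_weak = is_weakly_connected(toy_nodes, toy_edges)  # "e" has no edges at all
```

In the toy graph a single isolated node is enough to make the graph not weakly connected, which is the same kind of situation a "Weakly Connected: False" result points to.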

Page 8: Wikipedia Vote Network - Social Networks

Degree Distribution

• The long-tail distribution is very evident

• Nodes with degree 0 to 100 account for about 85% of all the nodes in the dataset

Page 9: Wikipedia Vote Network - Social Networks

Degree Distribution

• A few hubs with a large number of links.

• Many nodes with a small number of links.
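
A degree histogram and the share of nodes at or below a degree threshold can be computed as in the sketch below. The toy edge list and the helper names are hypothetical, and degree is taken as in-degree plus out-degree, which is an assumption about how the slides count it. Nodes with no edges never appear in an edge list, so they would have to be added separately.

```python
from collections import Counter

def total_degree(edges):
    """Total degree (in + out) per node for a directed edge list."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return degree

def share_at_or_below(degree, threshold):
    """Fraction of nodes whose degree is at most `threshold`."""
    return sum(1 for d in degree.values() if d <= threshold) / len(degree)

# Hypothetical toy graph: one hub receives votes from nine low-degree voters.
toy_edges = [(f"voter{i}", "hub") for i in range(9)] + [("voter0", "voter1")]

degree = total_degree(toy_edges)
histogram = Counter(degree.values())    # degree -> number of nodes with that degree
low_share = share_at_or_below(degree, 2)
```

Even in this tiny example the long-tail shape shows up: nine of the ten nodes have degree 2 or less, while a single hub holds most of the links.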

Page 10: Wikipedia Vote Network - Social Networks

Log-Log Plot

• The quantity being measured can be viewed as a type of popularity

• Rich-get-richer phenomenon
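
On a log-log plot, a power-law degree distribution appears as a roughly straight line, and its slope estimates the exponent. The sketch below fits that slope by ordinary least squares on a synthetic histogram; the data and the function name are hypothetical. A straight-line fit like this is only a rough diagnostic and not a rigorous test for a power law.

```python
import math

def loglog_slope(histogram):
    """Least-squares slope of log(count) against log(degree)."""
    points = [
        (math.log(k), math.log(c))
        for k, c in histogram.items()
        if k > 0 and c > 0          # log is undefined for zero degree or zero count
    ]
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    return sxy / sxx

# Synthetic histogram that follows count = 1000 * degree^-2 exactly.
toy_histogram = {k: 1000 * k ** -2 for k in (1, 2, 4, 8, 16)}
slope = loglog_slope(toy_histogram)   # close to -2 for this synthetic data
```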

Page 11: Wikipedia Vote Network - Social Networks

Average Betweenness and Degree

• Degree centrality and node betweenness appear to be roughly linearly related

• Nodes with a higher degree have higher betweenness scores
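
To make the betweenness score concrete, here is a minimal sketch of Brandes' algorithm for an unweighted directed graph, run on a hypothetical toy graph. It returns unnormalized scores and assumes the edge list has no duplicate edges; the tool used for the slides may normalize differently.

```python
from collections import defaultdict, deque

def betweenness(nodes, edges):
    """Unnormalized betweenness for an unweighted directed graph (Brandes)."""
    successors = defaultdict(list)
    for u, v in edges:
        successors[u].append(v)
    score = {n: 0.0 for n in nodes}
    for source in nodes:
        # BFS from source: count shortest paths (sigma) and record predecessors.
        sigma = {n: 0 for n in nodes}
        dist = {n: -1 for n in nodes}
        preds = {n: [] for n in nodes}
        sigma[source], dist[source] = 1, 0
        order = []
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in successors[v]:
                if dist[w] < 0:                 # first time w is reached
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:      # v lies on a shortest path to w
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating path dependencies.
        delta = {n: 0.0 for n in nodes}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    return score

# Hypothetical toy graph: three voters point at a hub, which points at a target.
toy_nodes = ["v1", "v2", "v3", "hub", "target"]
toy_edges = [("v1", "hub"), ("v2", "hub"), ("v3", "hub"), ("hub", "target")]
scores = betweenness(toy_nodes, toy_edges)
```

In the toy graph the hub is both the highest-degree node and the only node that sits on shortest paths between others, which mirrors the degree and betweenness relationship described above.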

Page 12: Wikipedia Vote Network - Social Networks

Average Clustering and Degree

• Local clustering appears to decrease rapidly as degree centrality increases, resembling a power-law decay

• Nodes with moderate degree centrality still show high clustering levels
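
The average local clustering per degree value can be sketched as follows. The example treats the graph as undirected (an assumption; the slides do not say how direction was handled) and uses a hypothetical toy graph and helper names.

```python
from collections import defaultdict
from itertools import combinations

def undirected_neighbors(edges):
    """Neighbor sets with edge direction and self-loops ignored."""
    nbrs = defaultdict(set)
    for u, v in edges:
        if u != v:
            nbrs[u].add(v)
            nbrs[v].add(u)
    return nbrs

def local_clustering(nbrs, node):
    """Share of a node's neighbor pairs that are themselves connected."""
    k = len(nbrs[node])
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(nbrs[node], 2) if b in nbrs[a])
    return 2 * links / (k * (k - 1))

def average_clustering_by_degree(edges):
    """Map each degree value to the mean local clustering of nodes with that degree."""
    nbrs = undirected_neighbors(edges)
    buckets = defaultdict(list)
    for node in list(nbrs):
        buckets[len(nbrs[node])].append(local_clustering(nbrs, node))
    return {k: sum(values) / len(values) for k, values in buckets.items()}

# Hypothetical toy graph: a triangle (a, b, c) plus a hub linked to all of it
# and to two extra nodes (d, e) that know nobody else.
toy_edges = [("a", "b"), ("b", "c"), ("c", "a"),
             ("hub", "a"), ("hub", "b"), ("hub", "c"), ("hub", "d"), ("hub", "e")]
by_degree = average_clustering_by_degree(toy_edges)
```

In this toy graph the degree-3 nodes are fully clustered while the degree-5 hub is not, the same direction of effect the slide describes.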

Page 13: Wikipedia Vote Network - Social Networks

Average Constraint and Degree

• Average constraint (embeddedness) and degree centrality have a negative linear relationship.

• The majority of users have a relatively low level of constraint.

Page 14: Wikipedia Vote Network - Social Networks

Average Neighbor Degree and Degree Plot

• Low-degree users show a wide spread; their neighbors have a higher average degree.

• As degree increases, users' neighbors have comparatively lower-degree connections.
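
The quantity plotted here, the mean degree of a node's neighbors set against the node's own degree, can be sketched like this. Direction is ignored (an assumption), and the toy data and function name are hypothetical.

```python
from collections import defaultdict

def average_neighbor_degree(edges):
    """Map each node to (its degree, mean degree of its neighbors), ignoring direction."""
    nbrs = defaultdict(set)
    for u, v in edges:
        if u != v:
            nbrs[u].add(v)
            nbrs[v].add(u)
    result = {}
    for node, adjacent in nbrs.items():
        mean_nbr_degree = sum(len(nbrs[m]) for m in adjacent) / len(adjacent)
        result[node] = (len(adjacent), mean_nbr_degree)
    return result

# Hypothetical toy graph: four low-degree voters all point at one hub.
toy_edges = [(f"voter{i}", "hub") for i in range(4)]
summary = average_neighbor_degree(toy_edges)
```

The low-degree voters have a high-degree neighbor, and the hub has only low-degree neighbors, which is the pattern the two bullets describe.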

Page 15: Wikipedia Vote Network - Social Networks

Application Techniques - Partitioning

Challenge: How to partition the graph? We have a very dense network with a lot of edges.

Nodes: 7,066

Edges: 103,663

Page 16: Wikipedia Vote Network - Social Networks

Graph Networks - Partitioning

We raised the minimum degree step by step to see how the network changed.

Degree Range: 2 to 1,167.

Nodes: 4,797 (67.42%)

Edges: 101,394 (97.97%)
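
The degree-range filtering applied in Gephi can be approximated with the sketch below. It assumes degrees are computed once on the full graph and that an edge survives only if both endpoints survive; Gephi's exact behavior may differ, and the toy data and function name are hypothetical.

```python
from collections import Counter

def filter_by_min_degree(edges, min_degree):
    """Keep nodes with total degree >= min_degree and the edges between them."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    kept_nodes = {n for n, d in degree.items() if d >= min_degree}
    kept_edges = [(u, v) for u, v in edges if u in kept_nodes and v in kept_nodes]
    return kept_nodes, kept_edges, len(degree)

# Hypothetical toy graph: a small core (a, b, c) with three peripheral voters.
toy_edges = [("a", "b"), ("b", "c"), ("c", "a"), ("a", "c"),
             ("p1", "a"), ("p2", "a"), ("p3", "b")]

kept_nodes, kept_edges, total_nodes = filter_by_min_degree(toy_edges, 3)
node_share = len(kept_nodes) / total_nodes        # share of nodes that survive
edge_share = len(kept_edges) / len(toy_edges)     # share of edges that survive
```

Raising `min_degree` strips away the periphery first, so the surviving nodes are the densely connected core.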

Page 17: Wikipedia Vote Network - Social Networks

Graph Networks - Increasing Degree

Page 18: Wikipedia Vote Network - Social Networks

Graph Networks - Partitioning

Degree Range: 160 to 1,167.

Nodes: 262 (3.68%)

Edges: 9,959 (9.60%)

Page 19: Wikipedia Vote Network - Social Networks

Graph Networks - Partitioning

Degree Range: 260 to 1,167.

Nodes: 92 (1.29%)

Edges: 2,098 (2.02%)

Page 20: Wikipedia Vote Network - Social Networks

Core Component

• The majority of these nodes have very high betweenness scores.

• The majority of these nodes have high eigenvector centrality.

• They belong to the strongly connected component with id 1016.

Page 21: Wikipedia Vote Network - Social Networks

Key Takeaways

- The RfA network for adding new administrators is neither weakly nor strongly connected

- The network is organized around a dense, central core with many nodes on the periphery

- Rich-get-richer (preferential attachment) characteristics are exhibited

- Although every vote counts the same, an administrator’s vote has the potential to bring many more votes along with it

- Graph partitioning allows us to view the core clearly

Page 22: Wikipedia Vote Network - Social Networks

So is it Fair?

- Ultimately, we determined that the Wikipedia RfA process is fair but highly flawed, with underlying nuances

- Although a new user’s vote and an administrator’s vote technically carry the same weight, administrators leverage the power of their personal networks

- As a result, current administrators retain control over the network as a whole and decide who gets to become an administrator

Page 23: Wikipedia Vote Network - Social Networks

Questions?