Eur. Phys. J. B 77, 597–609 (2010) DOI: 10.1140/epjb/e2010-00279-x
THE EUROPEAN PHYSICAL JOURNAL B
Networks and emotion-driven user communities at popular blogs
M. Mitrovic1, G. Paltoglou2, and B. Tadic1,a
1 Department of Theoretical Physics, Jozef Stefan Institute, Box 3000, SI-1001 Ljubljana, Slovenia
2 Statistical Cybernetics Research Group, School of Computing and Information Technology, University of Wolverhampton, UK
Received 6 May 2010 / Received in final form 25 August 2010
Published online 27 September 2010 © EDP Sciences, Società Italiana di Fisica, Springer-Verlag 2010
Abstract. Online communication at Web portals represents technology-mediated user interactions, leading to massive data and potentially new techno-social phenomena not seen in real social mixing. Apart from being dynamically driven, user interactions via posts are indirect, suggesting the importance of the contents of the posted material. We present a systematic way to study Blog data by combining approaches from the physics of complex networks with computer science methods of text analysis. We map the Blog data onto a bipartite network where users and posts with comments are the two natural partitions. With machine learning methods we classify the texts of posts and comments by their emotional content as positive or negative, or otherwise objective (neutral). Using spectral methods for weighted bipartite graphs, we identify topological communities featuring users clustered around certain popular posts, and underline the role of emotional contents in the emergence and evolution of these communities.b
Science of the Web  is an emerging multidisciplinary area with interconnected contributions from the physics of complex dynamical systems, computer science, and social science. Apart from developing technology and algorithms for safe and efficient information processing, research on the Web is concerned with understanding its structure  and the underlying evolution mechanisms  as well as the emergent social phenomena among Web users [4,5]. In this work we present a systematic methodology for the study of collective user behavior on Web portals. The approach is based on the physics of complex networks and the computer science methods of text analysis.
Emotions & Emerging Behavior in Cyberspace. Recent developments in communication technologies have induced new types of human interactions mediated by computer networks and the on-line availability of different types of data. This forms the basis for a new practice of social communication, leading to potentially new technology-driven social phenomena not observed in conventional social mixing and thus calling for new scientific approaches [1,4,6,7]. On the other hand, a huge amount of data on user communications over different Web portals is rapidly accumulating, which offers ample possibilities for empirical study. The methodology of complex dynamical systems and the mapping of data onto networks provide ways to perform detailed quantitative analysis.
a e-mail: Bosiljka.Tadic@ijs.si
b All data are fully anonymized. No information about user IDs is given.
An important feature of online communications is that user interactions are mediated by the posted material, e.g., the text of posts and comments on the Blogs studied here. The indirect interactions not only change the conventional social rules known from face-to-face communication, but also indicate the importance of the contents of the posted material . In Blogs, the posted text may affect the behavior of the users who read it in different ways, depending on the information that the text contains, but also on its aesthetic, moral or emotional contents [8,11]. Recent studies increasingly show that the emotions expressed in the text (or other posted materials) play an important role in online social dynamics. The strength of the emotions expressed by an individual, e.g., the user reading a posted text, can be measured in the laboratory  and observed at the level of large-scale social effects [11,13,14].
A number of conceptually different Web sites are currently available, ranging from sites with consumers' opinions about products, e.g., movie databases (IMDb), books and music records (Amazon), through sites for exchanging opinions about everyday events (Diggs, Blogs, Forums), to fast on-line communication on friendship-based networks (Facebook, FriendFeed, MySpace). The Blogs are conceptually in between the consumer networks and the friendship networks mentioned above, and thus play a special role in the study of on-line social communities [8,9,15–19]. In Blogs, authors express and exchange their opinions via (short) written texts with other users, who are generally not acquaintances in real life. Registration of bloggers is required on many Blog sites, which enables quantitative analysis and tracing of users' activity over time.
Network representations of on-line interactions. Network representations of complex dynamical systems, including social systems, have proved to be a useful tool for the quantitative study of both the structure of networks and the dynamics on them (for recent reviews see Refs. [20,21]). Mapping the data related to different social media onto networks reveals correlated dynamical behaviors, which are manifested in power-law dependences in the structure of networks and other related distributions [19,22–27]. Studies of group formation in networks related to movie data [22,28,29], music genres [24,25], subjects of the posts in Blogs , forums , news sites and conference publications , etc., show that similar mechanisms might underlie the behavior of humans in these on-line communications. Methods for analyzing the content of short messages and textual posts [5,32] and their emotional content [12,33,34] enable understanding of how interactions at the micro-level (user-to-post-to-user) lead to large-scale behavior within these virtual communities.
Mapping the data onto bipartite networks [19,22,25,28] is a suitable representation which enables the analysis and identification of different user communities. Statistical theory and community detection using eigenvalue spectral analysis of networks reveal that different mechanisms may drive the dynamics on very popular posts compared to all other posts. (Details of the spectral analysis of modular networks are described in Ref. , while other methods based on maximization of modularity are reviewed in Refs. [36,37].) In particular, the behavior of bloggers on normally popular posts appears to follow a pattern of self-organized dynamical behavior, with communities mostly related to subject preference, whereas subjects appear completely mixed in the case of very popular Blogs , indicating different underlying mechanisms.
In this work we focus on studying popular posts collected from bbc.co.uk/blogs/ by mapping the high-resolution data onto bipartite graphs and finding communities of users on them. We study the text of the posts and comments of users within these communities with the aid of machine learning approaches trained to detect and distinguish emotions in text. This enables us to study systematically the role of emotions in the emergence and evolution of the user communities and the patterns of user behavior at these popular posts.
2 Data structure and contents of popular blogs
We collected data  from the bbc.co.uk/blogs/ site for a period of nearly two years, from June 2007 until February 2009. The dataset contains, at high temporal resolution, the user actions identified by user ID (posting comments related to a given post), as well as the IDs of the posts and comments and their text. The concept of the BBC Blogs is rather special: the original posts are written by a few (invited) authors, who often do not take part in the discussion. All posts belong to one of the predefined categories, according to their subjects. Users are registered by IDs and allowed to comment on these posts. Information about comment-on-comment relations is not stored, so all comments are automatically attributed to the original post. The whole dataset consists of NP = 3792 posts and NC = 80 873 comments written by NU = 21 462 users.
As mentioned above, we focus on the popular posts and on the analysis of the emotional contents of the user comments related to them. As the popularity break-point occurs at 100 comments (see the discussion below and Ref. ), from the entire dataset we select these posts and all users and their comments related to them. We find NP = 248 popular posts and NU = 13 674 users who wrote NC = 53 606 comments on these posts. We downloaded the text of each of these posts and of each related comment, and analyzed it with the emotion classifier described below. These posts belong to five different subject categories: Business and Economy, Music and Art, Sport, Technology, and Nature and Science. Knowing the authors and the posting times for all posts and comments, we are able to reconstruct temporal patterns of user behavior and link them to the emotional contents of the texts.
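The selection of popular posts described above can be illustrated with a minimal Python sketch. It is not the authors' code: the record layout (`Comment` with `comment_id`, `post_id`, `user_id`) and the function name `select_popular` are hypothetical, and treating a post as popular when it has at least `threshold` comments is an assumption about how the break-point at 100 comments is applied.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Comment(NamedTuple):
    """One comment record; the field names are hypothetical."""
    comment_id: int
    post_id: int
    user_id: int


def select_popular(comments: Iterable[Comment], threshold: int = 100):
    """Return (popular post IDs, their comments, the users who wrote them)."""
    comments = list(comments)
    # Count how many comments each post received.
    per_post = Counter(c.post_id for c in comments)
    # Assumption: "popular" means at least `threshold` comments.
    popular_posts = {p for p, n in per_post.items() if n >= threshold}
    # Keep only the comments attached to popular posts, and their authors.
    popular_comments = [c for c in comments if c.post_id in popular_posts]
    popular_users = {c.user_id for c in popular_comments}
    return popular_posts, popular_comments, popular_users


# Toy data: post 1 receives three comments, post 2 receives one.
toy = [
    Comment(10, 1, 100),
    Comment(11, 1, 101),
    Comment(12, 1, 100),
    Comment(13, 2, 102),
]
posts, kept, users = select_popular(toy, threshold=3)
print(len(posts), len(users), len(kept))  # sizes playing the role of NP, NU, NC
```

Applied to the full dataset with the default threshold, the three returned collections would correspond to the popular posts, their comments, and the users counted by NP, NU and NC above, provided the assumed threshold convention matches the one used in the paper.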
2.1 Mapping the data onto bipartite networks
The Blog data can be suitably represented by directed bipartite graphs with users as one partition and posts and comments as the other partition . By definition , in bipartite networks links are allowed only between nodes of different partitions, which fully respects the structure of the interaction between users over posts and comments in the Blog data. In the data we have iU = 1, . . . , NU users and jB = 1, . . . , NP + NC posts and comments, which together make N = 106 127 nodes of the bipartite network, which is eventually reduced to N = 67 528 nodes in the case of the popular posts. The post/comment jB is linked to its author iU through a directed link t