Viral Misinformation: The Role of Homophily and Polarization

Aris Anagnostopoulos, Dept. of Computer, Control, and Management Engineering, Sapienza University of Rome, Italy
Alessandro Bessi, IUSS Pavia, Piazza della Vittoria, Pavia, Italy
Guido Caldarelli, IMT Lucca, Piazza San Ponziano 6, Lucca, Italy
Michela Del Vicario, IMT Lucca, Piazza San Ponziano 6, Lucca, Italy
Fabio Petroni, Dept. of Computer, Control, and Management Engineering, Sapienza University of Rome, Italy
Antonio Scala, ISC-CNR, Piazzale Aldo Moro, Roma, Italy
Fabiana Zollo, IMT Lucca, Piazza San Ponziano 6, Lucca, Italy
Walter Quattrociocchi*, IMT Lucca, Piazza San Ponziano 6, Lucca, Italy, [email protected]

* Corresponding author

arXiv:1411.2893v1 [cs.SI] 11 Nov 2014

ABSTRACT
The spreading of unsubstantiated rumors on online social networks (OSNs), whether unintentional or intentional (e.g., for political reasons or even trolling), can have serious consequences, as in the recent case of rumors about Ebola causing disruption to health-care workers. Here we show that indicators aimed at quantifying information-consumption patterns can provide important insights into the virality of false claims. In particular, we address the driving forces behind the popularity of content by analyzing a sample of 1.2M Italian Facebook users consuming two different (and opposite) types of information: science and conspiracy news. We show that users' engagement across different content correlates with the number of friends having similar consumption patterns (homophily), indicating the area of the social network where certain types of content are more likely to spread. We then test diffusion patterns on an external sample of 4,709 intentionally satirical false claims, showing that neither the presence of hubs (structural properties) nor the most active users (influencers) are prevalent in viral phenomena. Instead, we find that in an environment where misinformation is pervasive, users' aggregation around shared beliefs may make habitual exposure to conspiracy stories (polarization) a determinant of the virality of false information.

General Terms
Misinformation, Virality, Attention Patterns

1. INTRODUCTION
The Web has become pervasive, and digital technology permeates every aspect of daily life. Social interaction, health-care activity, political engagement, and economic decision-making are all influenced by digital hyperconnectivity [1–8]. Nowadays, everyone can produce and access a variety of information, actively participating in the diffusion and reinforcement of narratives, by which we mean information coherent with a given worldview. Such a paradigm shift in the consumption of information has profoundly affected the way people get informed [9–15]. However, the role of socio-technical systems in shaping public opinion remains unclear. Indeed, the spreading of unsubstantiated rumors, whether unintentional or intentional (e.g., for political reasons or even trolling), can have serious consequences; the World Economic Forum has listed massive digital misinformation as one of the main risks for modern society [16].

On online social networks, users discover and share information with their friends, and through cascades of reshares information can reach a large number of individuals. Cascades are rare [17], but they are common on online social media such as Facebook or Twitter [18–20]. A telling example is the popular case of Senator Cirenga's law [21,22], which proposed to fund policy makers with 134 million euros (10% of the Italian GDP) in case of defeat in the political competition. This was an intentional joke – the text of the post explicitly mentioned its provocative nature – yet it became popular among online political activists to the extent that it has been used as an argument in political debates [23]. The information overload mixed with users' limited attention may

facilitate the creation of urban legends.

Predicting the evolution of cascades is far from trivial [24]. In this work, we want to understand the driving forces behind the popularity of content, accounting for users' information-consumption patterns. To this end, we focus on two very distinct types of information: scientific and conspiracy news. The former aims at diffusing scientific knowledge as well as scientific thinking, while the latter provides alternative arguments that are difficult to substantiate. Conspiracists tend to reduce the complexity of reality by explaining significant social or political occurrences as plots conceived by powerful individuals or organizations. Since these kinds of arguments can sometimes involve the rejection of science, alternative explanations are invoked to replace the scientific evidence [25]. For instance, people who reject the link between HIV and AIDS generally believe that AIDS was created by the U.S. Government to control the African American population [26,27]. Just recently, the fear of an Ebola outbreak in the United States rippled through social media networks [28–30]. Furthermore, the spread of misinformation can be particularly difficult to correct [31–34]. In fact, it has recently been shown in [35] that conspiracist and mainstream information reverberate in a similar way on social media, and that users who are habitually exposed to conspiracy stories are the most prone to like and share satirical information [36]. Meanwhile, scientific debunking of conspiracy rumors creates a reinforcement effect in the consumption of conspiracist information by polarized users [34].

Virality of unsubstantiated rumors. In this work, we analyze a sample of 1.2M Italian Facebook users consuming two different (and opposite) kinds of information: scientific and conspiracy news. We observe that users' engagement across different content correlates with the number of friends having similar consumption patterns (homophily). Then, we focus on the popularity of posts of the two categories, finding that both present similar statistical signatures. Finally, we test these diffusion patterns on an external sample of 4,709 pieces of intentionally false information – satirical versions and paradoxical claims – confirming that neither hubs (structural properties) nor the most active users (influencers) are relevant to characterize viral phenomena. Our analysis shows that where misinformation is pervasive, users' aggregation around shared beliefs makes frequent exposure to conspiracy stories (polarization) a determinant of the virality of false information. Furthermore, we provide important insights toward the understanding of viral processes around false information on online social media. In particular, through the relation between users' engagement and their number of friends, we provide a new metric – users' polarization, defined on information-consumption patterns – to detect areas of the social network where false claims are more likely to spread. We conclude the paper with a case study on the role of frequent exposure to conspiracy stories (polarization) in the virality of false information.

2. RELATED WORKS
Recent studies have explored the evolution of social phenomena on online social networks from observable data; one of the most investigated aspects is structural properties and their effect on social dynamics [10,37,38]. The strength of weak ties is currently a matter of debate. In [38] it is shown that while long ties are relevant in the spreading of innovation and social movements, they are not enough to spread social reinforcement. In [10] the author found evidence of a reinforcement effect on adoption in clustered networks, where many redundant ties are present. In a more recent work [37], the number of distinct connected components among a user's friends, representing separate social contexts for the user, proved to be more influential than the number of friends itself; other properties of potential influencers have been addressed in [39]. Another interesting challenge is distinguishing influence from homophily [40–42]. The study in [40] is performed over the global network of instant-messaging traffic among about 30 million users of Yahoo.com, with complete data on the adoption of a mobile service application and precise dynamic behavioral data on users. The effect of homophily proves to explain more than 50% of the phenomena perceived as contagion. In [41] the role of the social network and of exposure to friends' activities in information resharing on Facebook is analyzed. Through controlled experiments in which users were divided into groups exposed and not exposed to friends' resharing, the authors were able to isolate contagion from confounding effects such as homophily. They found that in the exposed case there is a considerably higher chance of sharing content.

Recent studies conducted on Facebook have aimed at unfolding the characteristics of cascades [18] and at predicting their trajectories and shapes [43]. As for the characteristics, it was found that a small but significant fraction of posts forms wide and deep cascades, and that different cascades may evolve in different ways. Many aspects of cascade behavior – e.g., under which structural and user-constrained properties cascades can be predicted – are hard tasks that have not been completely explored. In recent years, a new online phenomenon has been attracting the interest of the research community: the spreading of unsubstantiated and false claims through OSNs (such as Facebook), which often reverberate, leading to mass misinformation. The study in [35] is a detailed analysis of information consumption by Facebook users on different categories of pages: alternative information sources, political activism, and mainstream media. The authors pointed out evidence that mainstream media information reverberates as much as unsubstantiated information, and that exposure to the latter makes users more likely to interact with intentionally injected false information. More recently, in [34] it has been shown that exposure to debunking posts might increase users' engagement in consuming conspiracy information.

3. DATA COLLECTION
In this work, we aim at testing the relationship of content consumption with the social network's structure and with viral phenomena. To do this, we need to clearly identify sources that promote different content referring to opposite worldviews. We define the space of our investigation with the help of Facebook groups that are very active in the debunking of conspiracy theses (see the acknowledgements section for further details). We started from 73 public Facebook pages, of which 34 are about scientific news and 39 about news that can be considered conspiratorial; we refer to the former as science pages and to the latter as conspiracy pages. In Table 1 we summarize the details of our data collection.


We downloaded all posts from these pages in a timespan of 4 years (2010 to 2014). In addition, we collected all the likes and comments on the posts, and we counted the number of shares¹. In total, we collected around 9M likes and 1M comments, performed by about 1.2M and 280K Facebook users, respectively (see Table 1). Likes, shares, and comments have different meanings from the user's viewpoint. Most of the time, a like stands for positive feedback on a post; a share expresses the will to increase the visibility of a given piece of information; and comments are the way in which online collective debates take form. Comments may contain negative or positive feedback with respect to the post.

In addition, we collected the ego networks of the users who liked at least one post on science or conspiracy pages², that is, for each of these users we collected the list of friends and the links between them. This allowed us to create a social network of users and the (publicly declared) connections between them, for a total of about 1.2M nodes and 34.5M edges.
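As an illustration of how such a network can be assembled from the crawled ego networks, the following Python sketch (names are ours and purely illustrative; the paper does not publish its pipeline) merges per-user friend lists into one undirected graph:

```python
# Illustrative sketch: merge crawled ego networks (user -> list of friends)
# into a single undirected friendship graph, stored as adjacency sets.

def build_graph(ego_networks):
    adj = {}
    for u, friends in ego_networks.items():
        adj.setdefault(u, set())
        for v in friends:
            adj[u].add(v)
            adj.setdefault(v, set()).add(u)  # friendship is symmetric
    return adj

def num_edges(adj):
    # Each undirected edge is stored twice, once per endpoint.
    return sum(len(nbrs) for nbrs in adj.values()) // 2
```

The adjacency-set representation makes the induced-subgraph operations used later in the paper straightforward.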

              Total        Science      Conspiracy
  Pages              73            34            39
  Posts         271,296        62,705       208,591
  Likes       9,164,781     2,505,399     6,659,382
  Comments    1,017,509       180,918       836,591
  Shares     17,797,819     1,471,088    16,326,731
  Likers      1,196,404       332,357       864,047
  Commenters    279,972        53,438       226,534

Table 1: Breakdown of the Facebook dataset. The number of pages, posts, likes, comments, shares, likers, and commenters for science and conspiracy pages.

Finally, we obtained access to 4,709 posts from two satirical Facebook pages (to which we will refer as troll posts and troll pages) promoting intentionally false and caricatural versions of the most debated issues. The more popular of the two is called “Semplicemente Me”; it is followed by about 7K users and is focused on general online rumors. The second one is called “Simply Humans”; it is followed by about 1K users and hosts mostly posts of a conspiratorial nature. We collected about 40K likes and 59K comments on these posts, performed by about 16K and 43K Facebook users, respectively. These pages were able to trigger several viral phenomena, one of them reaching more than 100K shares. In Section 7, we use troll memes to measure how the social ecosystem under investigation responds to the injection of false information.

4. PRELIMINARIES AND DEFINITIONS
Here we describe how we preprocess our dataset and provide some of the basic definitions that we use throughout the paper. Let P be the set of all the posts in our collection, and define P_science (P_consp) as the set of posts posted on one of the 34 (39) Facebook pages about science (conspiracy) news. We call posts that were posted on pages about science, science posts, and similarly for conspiracy posts. Let V be the set of all the 1.2M users that we observed and E the set of edges representing the Facebook friendship connections between them. These define a graph G = (V, E).

¹ Unfortunately, although Facebook provides this number for a given post, it provides the IDs of only a small number of the sharers.
² We used publicly available data, so we collected only data for which the users had the corresponding permissions open. We estimated that these form more than 96% of the total connections, allowing us to perform a meaningful analysis of the connections between users.

We also define the graph of likes, G^L = (V^L, E^L), which is the subgraph of G composed of the users who have liked at least one post. Thus, V^L is the set of users of V who have liked at least one post, and we set E^L = {(u, v) ∈ E : u, v ∈ V^L}.

For each user u, let θ(u) be the total number of likes that u has expressed on posts in our collection P. We define the engagement ψ(u) of user u as the liking activity of u normalized with respect to all the users in our dataset, namely ψ(u) = θ(u) / max_v θ(v). We are interested in learning the extent to which users are polarized towards a given type of content. Following previous works [34–36], we study the polarization of users, which we informally define as the tendency of users to interact with only a single type of information. Here we study polarization towards science and conspiracy. To quantify it, we define the polarization ρ(u) ∈ [0, 1] of a user u ∈ V^L as the fraction of likes that u has performed on conspiracy posts: assuming that u has performed x and y likes on science and conspiracy posts, respectively, we let ρ(u) = y / (x + y). Thus, a user u for whom ρ(u) = 0 is polarized towards science, whereas a user with ρ(u) = 1 is polarized towards conspiracy. Note that we use the liking activity to define polarization and ignore the commenting activity. The former is usually an explicit endorsement of the original post, whereas a comment may be an endorsement, a criticism, or even a response to a previous comment.
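In code, both quantities reduce to simple ratios over per-user like counts. A minimal Python sketch (variable names are ours and purely illustrative, not the authors' code):

```python
# Illustrative sketch: engagement psi(u) and polarization rho(u).
# likes[u] = (x, y): likes of user u on science (x) and conspiracy (y) posts.

def engagement_and_polarization(likes):
    theta = {u: x + y for u, (x, y) in likes.items()}      # theta(u): total likes
    max_theta = max(theta.values())
    psi = {u: t / max_theta for u, t in theta.items()}     # psi(u) = theta(u) / max_v theta(v)
    rho = {u: y / (x + y) for u, (x, y) in likes.items()}  # rho(u) = y / (x + y)
    return psi, rho
```

A user with ρ(u) = 0 liked only science posts; one with ρ(u) = 1, only conspiracy posts.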

In Figure 1 we depict the polarization of all the users in V^L. In the left panel, we show the probability density function (PDF). Recall that the value ρ(u) = 0 corresponds to users who have liked only science posts and the value ρ(u) = 1 to users who have liked only conspiracy posts. From the plots it is evident that the vast majority of users are polarized either towards science or towards conspiracy. Moreover, the quantile–quantile comparison shows that the distributions of the number of likes for users totally polarized towards science and towards conspiracy – i.e., the distributions of the values {ψ(u) : ρ(u) = 0} and {ψ(u) : ρ(u) = 1} – are very similar. For further details on the interaction between users of the two categories, refer to [36]. The findings depicted in Figure 1 suggest that most of the likers can be divided into two sets: those polarized towards science and those polarized towards conspiracy news. Let V^L_science be the users with polarization of more than 95% towards science (i.e., less than 5% towards conspiracy), formally

V^L_science = {u ∈ V^L : ρ(u) < 0.05},

and V^L_consp the users with polarization of more than 95% towards conspiracy, namely

V^L_consp = {u ∈ V^L : ρ(u) > 0.95}.

For convenience, we call the first set of users science users and the second one conspiracy users, without implying that the former are scientists or that the latter necessarily believe in conspiracy theories. Note that these two sets contain


Figure 1: Polarization on contents. Probability density function (PDF) of users' polarization across posts, and quantile–quantile comparison (QQ plot) between the distributions of the values {ψ(u) : ρ(u) = 0} and {ψ(u) : ρ(u) = 1}.

most of the users in the peaks of Figure 1. We also experimented with threshold values ranging from 50% to 99%, and the results were qualitatively the same. We can then define the induced subgraphs of G, G^L_science = (V^L_science, E^L_science) and G^L_consp = (V^L_consp, E^L_consp), in the natural way; for example, E^L_science contains all the edges in E such that both endpoints are in V^L_science. In Section 6 we study these two graphs extensively.

We also define some graph-theoretic terms that we use throughout the paper. In general, nodes in the graphs represent the Facebook users of our study; for convenience, we use the terms node and user interchangeably. We assume that the graph to which we refer is clear from the context. The degree deg(u) of a node u is the number of neighbors of u. The clustering coefficient of u is the number of closed triplets centered at u over the total number of such triplets. The k-core of a graph H is the maximal subgraph H′ of H such that the degree of each node in H′ is at least k. The coreness of a vertex is k if it belongs to the k-core but not to the (k + 1)-core. The density of a graph with x nodes and y edges is y divided by the number of possible edges, i.e., 2y / (x(x − 1)).
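The measures above can be sketched in a few lines of Python over an adjacency-set representation; this is illustrative code of our own, not the paper's implementation (in practice a graph library would be used instead):

```python
# Illustrative sketches of the graph measures used in the paper,
# on an adjacency-set representation {node: set(neighbors)}.
from itertools import combinations

def density(adj):
    x = len(adj)
    y = sum(len(nbrs) for nbrs in adj.values()) // 2  # each edge stored twice
    return y / (x * (x - 1) / 2)                      # y over all possible edges

def clustering(adj, u):
    # Fraction of pairs of u's neighbors that are themselves connected.
    nbrs, k = adj[u], len(adj[u])
    if k < 2:
        return 0.0
    closed = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return closed / (k * (k - 1) / 2)

def coreness(adj):
    # Iterative peeling: repeatedly remove nodes of degree <= k; a node's
    # core number is the value of k at which it gets removed.
    deg = {u: len(nbrs) for u, nbrs in adj.items()}
    remaining, core, k = dict(adj), {}, 0
    while remaining:
        peel = [u for u, d in deg.items() if u in remaining and d <= k]
        if not peel:
            k += 1
            continue
        for u in peel:
            core[u] = k
            for v in remaining[u]:
                if v in remaining:
                    deg[v] -= 1
            del remaining[u]
    return core
```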

Finally, we introduce some statistical notions that we use in Section 5 to analyze how science and conspiracy information is consumed. Let us define a random variable T on the interval [0, ∞), indicating the time at which an event takes place (we will use it to represent the time elapsed from the moment a post was posted until a given user liked it). The cumulative distribution function (CDF), F(t) = Pr(T ≤ t), indicates the probability that the event takes place within a given time t. The survival function, defined as the complementary CDF (CCDF)³ of T, gives the probability that an event lasts beyond a given time period t. To estimate this probability we use the Kaplan–Meier estimate [44]. Let n_t denote the number of posts still being liked just before a given time step t, and let d_t denote the number of posts that stop being liked at t. Then the estimated survival probability just after time t is (n_t − d_t) / n_t, and, if we have N observations at times t_1 ≤ t_2 ≤ ... ≤ t_N, assuming that the events at times t_i are jointly independent, the Kaplan–Meier estimate of the survival function at time t is defined as

S(t) = ∏_{t_i < t} (n_{t_i} − d_{t_i}) / n_{t_i}.
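Under the simplifying assumption of no censoring (every post's last like is observed), the estimator can be sketched as follows; this is our own illustrative Python, not the authors' code:

```python
# Illustrative sketch: Kaplan-Meier estimate of the survival function S(t)
# from event times (e.g., per-post time of the last like), with no censoring.
from collections import Counter

def kaplan_meier(times):
    """Return a list of (t, S) pairs over the distinct event times."""
    counts = Counter(times)
    n = len(times)               # n_t: observations still "at risk" before t
    s, curve = 1.0, []
    for t in sorted(counts):
        d = counts[t]            # d_t: events occurring exactly at t
        s *= (n - d) / n         # multiply in the factor (n_t - d_t) / n_t
        curve.append((t, s))
        n -= d
    return curve
```

With no censoring the estimate coincides with one minus the empirical CDF; its value lies in handling posts whose liking activity has not yet ended at the time of observation.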

5. USERS AND CONTENTS
In this section, we study how users consume information from science and conspiracy pages. Figure 2 shows the empirical complementary cumulative distribution function (CCDF) of the number of likes, comments, and shares of posts in P_science and P_consp. Now, we concentrate on the


Figure 2: Consumption Patterns. Empirical Complementary Cumulative Distribution Function (CCDF) for the number of likes, comments, and shares of posts in P_science and P_consp.
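The empirical CCDFs plotted in Figure 2 are straightforward to compute; a small Python sketch of our own (not the paper's code):

```python
# Illustrative sketch: empirical CCDF  f(x) = Pr(X > x)  of a sample,
# e.g., the per-post numbers of likes, comments, or shares.

def empirical_ccdf(values):
    """Return (x, Pr(X > x)) pairs over the distinct observed values."""
    n = len(values)
    return [(x, sum(1 for v in values if v > x) / n)
            for x in sorted(set(values))]
```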

users and we measure how the science and conspiracy users (i.e., the users in V^L_science and V^L_consp) keep consuming information in P_science and P_consp over time. Figure 3 shows the empirical cumulative distribution function (CDF) of the lifetime of users in V^L_science and V^L_consp on posts in P_science and P_consp, respectively. The lifetime of a user u ∈ V^L_science is the difference between the post times of the last and the first post in P_science that she liked, and similarly for the lifetime of users in V^L_consp. Finally, we turn our attention to the lifetime of the posts in P_science and P_consp by computing their empirical survival function. Figure 4 shows the Kaplan–Meier estimate and 95% confidence intervals for the survival function of P_science and P_consp. Despite the very different kinds of content, the results indicate that consumption patterns (number of likes, comments, and shares), as well as the persistence of users and

³ We remind the reader that the CCDF of a random variable X is one minus its CDF, i.e., the function f(x) = Pr(X > x).


Figure 3: Users' Lifetime. Empirical Cumulative Distribution Function (CDF) for the temporal distance between the first and last like of users in V^L_science and V^L_consp on posts in P_science and P_consp, respectively; see the text for more details.

Figure 4: Posts' Lifetime. Kaplan–Meier estimate and 95% confidence intervals for the Survival Function of P_science and P_consp.


Figure 5: Network metrics. Complementary Cumulative Distribution Function (CCDF) of the degree of each node, the size of the connected components, and the coreness of each node of the graphs G (all), G^L_science (scientific news), and G^L_consp (conspiracy news). All graphs present similar distributions.

posts over time, are similar for both categories.
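For concreteness, the lifetime measure used in Figure 3 can be sketched as follows (illustrative Python with our own naming, not the authors' code):

```python
# Illustrative sketch: a user's lifetime is the difference between the
# posting times of the last and the first post (of her category) she liked.

def user_lifetimes(liked_post_times):
    """liked_post_times: {user: [posting times of the posts the user liked]}."""
    return {u: max(ts) - min(ts) for u, ts in liked_post_times.items() if ts}
```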

6. CONTENTS AND FRIENDS
In this section, we analyze the topological features of the social networks, accounting for users' information-consumption patterns. As a first measure, in Figure 5 we show the complementary cumulative distribution function (CCDF) of the degree of each node, the size of the connected components, and the coreness of each node for the graphs G, G^L_science, and G^L_consp. All graphs present similar distributions. Next, we compare the network structure of the entire graph G and of the subgraphs of polarized users, G^L_science and G^L_consp, when we restrict to different levels of engagement. In Figure 6 we show various topological metrics for subgraphs of the three aforementioned graphs. On the x-axis we vary the engagement of nodes and consider the subgraphs of G, G^L_science, and G^L_consp induced by the nodes u with engagement ψ(u) ≥ x. On the y-axis we consider a variety of topological measures: size (number of nodes), density, clustering coefficient, number of connected components, size of the maximum connected component, and maximum coreness over all the nodes in the subgraph. The results show that the more active the users we consider, the more the connectivity patterns move towards a denser and more clustered network.
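The engagement filter behind Figure 6 amounts to inducing a subgraph on the sufficiently engaged nodes; a minimal sketch (our own illustrative code, adjacency-set representation):

```python
# Illustrative sketch: subgraph induced by the nodes u with psi(u) >= x.

def induced_subgraph(adj, psi, x):
    keep = {u for u in adj if psi.get(u, 0.0) >= x}
    # Keep only edges whose endpoints both survive the filter.
    return {u: adj[u] & keep for u in keep}
```

The topological measures of Figure 6 are then computed on induced_subgraph(adj, psi, x) for increasing values of x.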

The liking activity is a good approximation for associating users with their preferred content [34–36], so users' liking activity across content of the different categories may be interpreted as a preferential attitude towards one or the other type of information. To see how consumers of different content are distributed over the network, we measure how the polarization of the network changes along with users' engagement. In Figure 7, we show the average value of ρ(u) as a function of ψ(u): for each value of x, we consider the subnetwork of G induced by the users u with ψ(u) ≥ x, and we compute the average polarization among the users in the entire subnetwork, in the maximal k-core, and in the

Page 6: Viral Misinformation: The Role of Homophily and Polarization · Viral Misinformation: The Role of Homophily and Polarization Aris Anagnostopoulos Dept. of Computer, Control, and Management

ψ

size

05x

104

1x10

51.

5x10

5

0 0.025 0.05

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

allscientific newsconspiracy news

ψ

dens

ity0

2x10

−44x

10−4

6x10

−48x

10−4

1x10

−3

0 0.025 0.05

●●●●●●●

●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●

●●●●●●●●●●●●

allscientific newsconspiracy news

ψ

clus

terin

g0

0.05

0.1

0.15

0.2

0.25

0.3

0 0.025 0.05

●●

●●●●●●●●●

●●●●●●●●

●●●●●●●●

●●●●●●

●●

●●●●

●●●●●●

●●

allscientific newsconspiracy news

ψ

# co

nnec

ted

com

pone

nts

5000

1000

015

000

0 0.025 0.05

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

allscientific newsconspiracy news

ψsize

max

con

nect

ed c

ompo

nent

05x

104

1x10

51.

5x10

5

0 0.025 0.05

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

allscientific newsconspiracy news

ψ

max

cor

enes

s20

4060

8010

0

0 0.025 0.05

●●●

●●

●●●

●●●

●●●●●●●●●

●●●●

●●●●●●●●●●●●●●●●●●●●●●

allscientific newsconspiracy news

Figure 6: Network topology and users engagement.Size (number of nodes), density, clustering coef-ficient, number of connected components, size ofthe maximum connected component, and maximumcoreness over all the nodes in the subgraph inducedby nodes of engagement ψ(u) ≥ x.

maximal connected component.

We can observe in Figure 7 that the lowest value of the polarization in the largest connected component is attained at ψ(u) = 0.025. Given this observation, in Figure 8 we show the largest connected component of the subnetwork of the friendship network G induced by the nodes u ∈ G with ψ(u) > 0.025, where the color of each node u depends on ρ(u) (red nodes are science users and blue are conspiracy ones). We identify a main community of users polarized towards conspiracy news.
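The computation behind figures 7 and 8, restricting the friendship network to users with engagement above a threshold and averaging the polarization over the largest connected component, can be sketched as follows. The adjacency list, ψ, and ρ values below are toy, hypothetical data, and `largest_component` is not the authors' code.

```python
from collections import deque

def largest_component(adj, keep):
    """Largest connected component of the subgraph induced by `keep`."""
    keep = {u for u in keep if u in adj}
    seen, best = set(), set()
    for s in keep:
        if s in seen:
            continue
        comp, q = {s}, deque([s])        # BFS from each unvisited node
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v in keep and v not in comp:
                    comp.add(v)
                    q.append(v)
        seen |= comp
        if len(comp) > len(best):
            best = comp
    return best

# Toy friendship graph with engagement psi and polarization rho (hypothetical)
adj = {1: [2, 3], 2: [1, 3], 3: [1, 2], 4: [5], 5: [4], 6: []}
psi = {1: 0.04, 2: 0.05, 3: 0.03, 4: 0.04, 5: 0.001, 6: 0.09}
rho = {1: 0.9, 2: 0.95, 3: 0.85, 4: 0.1, 5: 0.5, 6: 0.3}

keep = {u for u in adj if psi[u] >= 0.025}   # engagement threshold x
comp = largest_component(adj, keep)
avg_rho = sum(rho[u] for u in comp) / len(comp)
```

Repeating the computation for a range of thresholds x yields the curves of Figure 7.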

Figure 8 hints that there is a strong presence of homophily with respect to polarization, and we delve deeper into this issue. In Figure 9 we show the fraction of polarized friends as a function of the engagement degree ψ(·). Specifically, in the left panel, we consider each user u ∈ V^L_science with a given ψ(u), compute the fraction of u's neighbors in G that are polarized towards science (i.e., that belong to V^L_science), and show the average of these fractions over all such users u. In the right panel, we do the same for the conspiracy users u ∈ V^L_consp and their conspiracy friends. Figure 9 clearly indicates the presence of homophily in the friendship network with respect to news preference. This is an important phenomenon for understanding viral processes, because the latter might diffuse among connected users having similar information-consumption patterns; we study this question further in Section 7.

Figure 7: Users' engagement and network. For each value of x, we consider the subnetwork of G induced by users u with ψ(u) ≥ x and we compute the average polarization among the users u in the entire subnetwork, in the maximal k-core, and in the maximal connected component.

Figure 8: Largest connected component. Maximal connected component of the friendship network for nodes u with ψ(u) > 0.025, where nodes are colored according to the polarization ρ(u).

Figure 9: Fraction of polarized neighbors as a function of ψ(·). Left: science users; right: conspiracy users.

To further confirm this point, we show that for a polarized user u, the fraction of polarized friends in her category y(u) can be predicted by means of a linear regression model with intercept, where the explanatory variable is a logarithmic transformation of the number of likes θ(u), that is, y(u) = β0 + β1 log(θ(u)). Coefficients are estimated using ordinary least squares; with the corresponding standard errors in parentheses, they are β0 = 0.70 (0.005) and β1 = 0.043 (0.001), with R² = 0.95, for users polarized towards science, and β0 = 0.71 (0.003) and β1 = 0.047 (0.0006), with R² = 0.98, for users polarized towards conspiracy. All the p-values are close to zero. In Figure 10 we show the fit of the model for users polarized towards science and conspiracy.
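The homophily measure of Figure 9, the fraction of a polarized user's friends sharing her category, can be sketched as follows. The adjacency list, category labels, and helper name are hypothetical illustrations, not the authors' data or code.

```python
def same_category_fraction(adj, category, user):
    """Fraction of `user`'s neighbors sharing `user`'s category."""
    friends = adj[user]
    if not friends:
        return 0.0
    same = sum(1 for v in friends if category.get(v) == category[user])
    return same / len(friends)

# Toy friendship graph with per-user news category (hypothetical)
adj = {"a": ["b", "c", "d"], "b": ["a"], "c": ["a"], "d": ["a"]}
category = {"a": "science", "b": "science", "c": "conspiracy", "d": "science"}
frac = same_category_fraction(adj, category, "a")   # 2 of 3 friends agree
```

Averaging this fraction over all polarized users with a given engagement ψ produces the curves shown in Figure 9.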

Summarizing, we find highly polarized communities around different contents. Such polarization emerges in the friendship network, where we identify homophily: indeed, the more active a polarized user is in her category, the larger the number of polarized friends she has in the same category. Thanks to such a characterization, we are able to identify the areas of the social network where a given content is more likely to diffuse. This is the topic of the next section.
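The ordinary-least-squares fit y(u) = β0 + β1 log θ(u) used above can be reproduced in outline as follows. The data are synthetic, generated under the fitted science coefficients, so this is a sketch of the fitting procedure only, not the original analysis.

```python
import numpy as np

# Synthetic likes-per-user and response under y = 0.70 + 0.043 * log(theta)
rng = np.random.default_rng(1)
theta = rng.integers(1, 500, size=1_000)
y = 0.70 + 0.043 * np.log(theta) + rng.normal(0, 0.01, size=theta.size)

# Design matrix: intercept column plus log-transformed likes
X = np.column_stack([np.ones(theta.size), np.log(theta)])
(beta0, beta1), *_ = np.linalg.lstsq(X, y, rcond=None)
```

With real data, the standard errors and R² reported in the text would come from the usual OLS diagnostics (e.g., via a statistics package) rather than from `lstsq` alone.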

7. VIRALITY AND POLARIZATION

In this section, we analyze the determinants behind viral phenomena related to false information. In particular, we focus on the role of structural features of the social network, as well as on users' consumption patterns.

7.1 Network Structure and Users' Activity

For a better understanding of posts' virality, we use the number of shares, since it is a proxy of the number of people reached by a post. Hence, we pick two random samples of 5,000 posts from each of the sets Pscience and Pconsp, and we compute structural as well as information-consumption metrics to be compared with the total number of shares. In particular, we concentrate on the degree deg(u) and the liking activity ψ(u) of each user u who liked the posts in our samples. In figures 11 and 12 we show the number of shares versus, respectively, the average values of the degree deg(u) and the liking activity ψ(u) among the users who liked a given post

Figure 10: Predicting the number of polarized friends. For a polarized user u, the fraction of polarized friends in her category y(u) can be predicted by means of a linear regression model with intercept, where the explanatory variable is a logarithmic transformation of the number of likes θ(u), that is, y(u) = β0 + β1 log(θ(u)). Coefficients are estimated using ordinary least squares; with the corresponding standard errors in parentheses, they are β0 = 0.70 (0.005) and β1 = 0.043 (0.001), with R² = 0.95, for users polarized towards science, and β0 = 0.71 (0.003) and β1 = 0.047 (0.0006), with R² = 0.98, for users polarized towards conspiracy. All the p-values are close to zero.

Figure 11: Number of shares versus average degree of users who liked the posts in our samples (left: scientific news; right: conspiracy news).

from our samples. The results provide an outline of viral processes: in viral posts (i.e., posts with high numbers of shares), very active users (in terms either of number of friends or of liking activity) form a tiny fraction of the overall users. We have checked that similar results hold when accounting for commenting activity.
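The per-post statistic behind Figure 11, the average degree of the users who liked each post, can be sketched as follows. The likes, degrees, and share counts below are toy, hypothetical data, and `avg_liker_degree` is an illustrative helper.

```python
def avg_liker_degree(likes, degree):
    """Map each post to the mean degree of the users who liked it."""
    return {
        post: sum(degree[u] for u in users) / len(users)
        for post, users in likes.items() if users
    }

# Hypothetical likes, user degrees, and share counts
likes = {"p1": ["u1", "u2"], "p2": ["u2", "u3", "u4"]}
degree = {"u1": 10, "u2": 100, "u3": 4, "u4": 6}
shares = {"p1": 12_000, "p2": 40}
avg_deg = avg_liker_degree(likes, degree)
```

Plotting `shares[p]` against `avg_deg[p]` for the sampled posts gives scatterplots of the kind shown in figures 11 and 12 (with ψ(u) in place of the degree for the latter).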

Figure 12: Number of shares versus average liking activity of users who liked the posts in our samples (left: scientific news; right: conspiracy news).

Since the distributions of both the degree and the liking activity are heavy tailed, the large volumes of users that make a post viral are mainly composed of users with low degree and low liking activity.

7.2 Viral Processes on False Information

Figure 13: Number of shares versus average degree and average liking activity of users who liked troll posts.

Users' preference towards information belonging to a particular kind of narrative might provide interesting insights into the determinants of viral phenomena. In light of the findings of the previous section, we now focus on how false information spreads on Facebook. We use the set of 4,709 troll posts described in Section 3 and study the virality of each post with respect to the network structure and users' liking activity. Notice that this set of posts is disjoint from the one used to classify users as polarized towards science or conspiracy; thus we can obtain unbiased results regarding the effect of users' polarization on the diffusion of false information.

Figure 13 shows the same information as figures 11 and 12, but for troll posts. Again, in viral posts, very active users (in terms either of number of friends or of liking activity) form a tiny fraction of the overall users. However, we found an interesting fact: in most of the viral posts, users are highly polarized. In Figure 14, for each troll post, we show the polarization ρ(·) of the users who liked it against its number of shares. Whereas both the degree and the liking activity of users become low as the number of shares increases, the polarization ρ(u) remains relevant at all levels of shares. In Figure 15 we show the average value of the polarization for increasing levels of the number of shares. More precisely, for each level of virality x, we compute the average polarization of all users who liked troll posts with a number of shares of at least x. As we can see, the curve has an increasing trend that asymptotically stabilizes around 0.76. The plot indicates that for posts containing false claims with more than 100 shares, the average polarization of the users who liked them is high and remains constant as the virality increases across different orders of magnitude.
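The quantity plotted in Figure 15, the average polarization of all users who liked troll posts with at least x shares, can be sketched as follows. The posts, polarization values, and helper are toy, hypothetical stand-ins for the data.

```python
def avg_polarization_above(posts, rho, x):
    """Mean rho over likers of posts whose share count is >= x."""
    users = [u for shares, likers in posts for u in likers if shares >= x]
    return sum(rho[u] for u in users) / len(users) if users else None

# Hypothetical troll posts as (number of shares, users who liked the post)
posts = [(150, ["a", "b"]), (20, ["c"]), (5_000, ["a", "d"])]
rho = {"a": 0.9, "b": 0.8, "c": 0.1, "d": 0.7}
val = avg_polarization_above(posts, rho, 100)   # only posts with >= 100 shares
```

Sweeping the threshold x over increasing values produces the curve of Figure 15.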

To better characterize the role of the polarization ρ(·) in viral phenomena around false information, we study how it changes as we consider posts with increasing levels of virality. In Figure 16 we show the PDF of the users' average polarization for different levels of virality. We observe that for highly viral posts (with false information) the average polarization of the users who like them peaks around 0.8.

Figure 14: Polarization on troll posts. The average polarization versus the number of shares for troll posts.

Figure 15: Average polarization of users who liked troll posts for increasing values of the number of shares.

Figure 16: PDF of the average polarization for different levels of virality.

Observe that the peak increases with the virality of the posts, an indication that high polarization is most of the time essential for a post to become viral. The results indicate that users' polarization is a good indicator for detecting viral phenomena around false information. The diffusion of false claims proliferates within (tightly clustered) communities of users who are usually exposed to unsubstantiated rumors.

7.3 Case Study

Let us now obtain some more intuition by studying in more detail the most viral troll posts [45–47]. In Figure 17 we show the probability density function of the polarization of the users who liked each of the three troll posts. The most viral post, with 132K shares, says that in the year of the post (2013), after 5,467 years, the month of October has 5 Tuesdays, Wednesdays, and Thursdays, and that this is such a rare event that Chinese people call it the year of the glory shu tan tzu. The fact that October 2013 has 5 Tuesdays, Wednesdays, and Thursdays is true; the rest, however, is false: this happens around once every seven years, and the phrase has no meaning in Chinese. The second post is the popular case of Senator Cirenga mentioned in the introduction, which received more than 36K shares. The third one has a more political flavor, stating that the former Italian speaker of the house received a large sum of money after his resignation. This post was shared 28K times. We can see that in all three posts there is a significant proportion of conspiracy users, who have seen the posts through a viral process (note that the total number of subscribers to the Facebook group pages is upper bounded by only about 8K, and we expect a much smaller proportion to have seen the posts in their Facebook feeds). Interestingly, there is always a non-negligible number of science users who have also liked the posts, even though they are four to ten times fewer.

Figure 17: PDF of the polarization of the users who liked one of the three most viral troll posts.

Indeed, we can observe that the percentage of science users depends directly on the content of the post: the more a post leans towards conspiracy, the lower the fraction of science users. To provide a better outline of this phenomenon, we also plot in Figure 18 the two most viral posts [48, 49] from the troll page that is more specific in teasing conspiracy stories (see Section 3). The first post claims that a Chinese engineer discovered an eco-friendly (uranium-based) lamp that produces extremely bright light at an extremely low cost (on the order of pennies) while being able to last several hundred years. The second one is a version of the chemtrail conspiracy theory, mentioning, among other things, that one of the ingredients with which we are being sprayed is sildenafil citrate, the active ingredient of Viagra. Note that in these posts the liking activity of science users is very close to zero.

Figure 18: PDF of the polarization of the users who liked one of the two most viral conspiracy-like troll posts.

8. CONCLUSIONS

Nowadays, understanding the determinants of viral phenomena occurring on socio-technical systems is an important challenge. The free circulation of contents has changed the way people get informed, and the quality of information plays a fundamental role in collective debates. In this work, we addressed the relationship between the virality of false claims and users' information-consumption patterns. To do this, we analyzed a sample of 1.2M Italian Facebook users consuming different (and opposite) types of information (science and conspiracy news). We found that users' engagement across different contents correlates with the number of friends having similar consumption patterns (homophily). Next, we studied the virality of posts of the two categories, finding that both present similar statistical signatures. Finally, we tested diffusion patterns on an external set of 4,709 intentionally satirical false posts (troll posts) and discovered that neither the presence of hubs (structural properties) nor the most active users (influencers) are prevalent in viral phenomena. The results of this work provide important insights towards the understanding of viral processes around false information on online social media. In particular, we provide a new metric, user polarization defined on information-consumption patterns, which, along with users' engagement and number of friends, helps in discovering where in the social network false claims are more likely to spread. In addition, we show that users' aggregation around shared beliefs makes the frequent exposure to conspiracy stories (polarization) a determinant for the virality of false information.

9. ACKNOWLEDGMENTS

Funding for this work was provided by the EU FET projects MULTIPLEX (nr. 317532) and SIMPOL (nr. 610704). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. Special thanks for building the atlas of pages to Protesi di Protesi di Complotto, Che vuol dire reale, La menzogna diventa verita e passa alla storia, Simply Humans, Semplicemente me. Finally, we want to thank Salvatore Previti, Elio Gabalo, Titta, Morocho, and Sandro Forgione for interesting discussions.


10. REFERENCES

[1] G. Lotan, E. Graeff, M. Ananny, D. Gaffney, I. Pearce, and D. Boyd. The revolutions were tweeted: Information flows during the 2011 Tunisian and Egyptian revolutions. International Journal of Communications, 5:1375–1405, 2011.

[2] K. Lewis, M. Gonzalez, and J. Kaufman. Social selection and peer influence in an online social network. Proceedings of the National Academy of Sciences, 109(1):68–72, January 2012.

[3] J. Leskovec, D. Huttenlocher, and J. Kleinberg. Signed networks in social media. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI '10, pages 1361–1370, New York, NY, USA, 2010. ACM.

[4] J. Kleinberg. Analysis of large-scale social and information networks. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 371, 2013.

[5] R. Kahn and D. Kellner. New media and internet activism: From the 'Battle of Seattle' to blogging. New Media and Society, 6(1):87–95, 2004.

[6] P. N. Howard. The Arab Spring's cascading effects. Miller-McCune, 2011.

[7] S. Gonzalez-Bailon, J. Borge-Holthoefer, A. Rivero, and Y. Moreno. The dynamics of protest recruitment through an online network. Scientific Reports, 2011.

[8] R. Bond, J. Fariss, J. Jones, A. Kramer, C. Marlow, J. Settle, and J. Fowler. A 61-million-person experiment in social influence and political mobilization. Nature, 489(7415):295–298, 2012.

[9] S. Buckingham Shum, K. Aberer, A. Schmidt, S. Bishop, P. Lukowicz, S. Anderson, Y. Charalabidis, J. Domingue, S. Freitas, I. Dunwell, B. Edmonds, F. Grey, M. Haklay, M. Jelasity, A. Karpistsenko, J. Kohlhammer, J. Lewis, J. Pitt, R. Sumner, and D. Helbing. Towards a global participatory platform. The European Physical Journal Special Topics, 214(1):109–152, 2012.

[10] D. Centola. The spread of behavior in an online social network experiment. Science, 329(5996):1194–1197, 2010.

[11] M. Paolucci, T. Eymann, W. Jager, J. Sabater-Mir, R. Conte, S. Marmo, S. Picascia, W. Quattrociocchi, T. Balke, S. Koenig, T. Broekhuizen, D. Trampe, M. Tuk, I. Brito, I. Pinyol, and D. Villatoro. Social Knowledge for e-Governance: Theory and Technology of Reputation. Roma: ISTC-CNR, 2009.

[12] W. Quattrociocchi, R. Conte, and E. Lodi. Opinions manipulation: Media, power and gossip. Advances in Complex Systems, 14(4):567–586, 2011.

[13] W. Quattrociocchi, M. Paolucci, and R. Conte. On the effects of informational cheating on social evaluations: Image and reputation through gossip. International Journal of Knowledge and Learning, 5(5/6):457–471, 2009.

[14] V. Bekkers, H. Beunders, A. Edwards, and R. Moody. New media, micromobilization, and political agenda setting: Crossover effects in political mobilization and media usage. The Information Society, 27(4):209–219, 2011.

[15] W. Quattrociocchi, G. Caldarelli, and A. Scala. Opinion dynamics on interacting networks: Media competition and social influence. Scientific Reports, 4, 2014.

[16] L. Howell. Digital wildfires in a hyperconnected world. In WEF Report 2013. World Economic Forum, 2013.

[17] S. Goel, D. J. Watts, and D. G. Goldstein. The structure of online diffusion networks. In Proceedings of the 13th ACM Conference on Electronic Commerce, EC '12, pages 623–638, New York, NY, USA, 2012. ACM.

[18] P. A. Dow, L. Adamic, and A. Friggeri. The anatomy of large Facebook cascades. In E. Kiciman, N. B. Ellison, B. Hogan, P. Resnick, and I. Soboroff, editors, ICWSM. The AAAI Press, 2013.

[19] R. Kumar, M. Mahdian, and M. McGlohon. Dynamics of conversations. In Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '10, pages 553–562, New York, NY, USA, 2010. ACM.

[20] Z. Ma, A. Sun, and G. Cong. On predicting the popularity of newly emerging hashtags in Twitter. JASIST, 64(7):1399–1410, 2013.

[21] Study explains why your stupid Facebook friends are so gullible, March 2014.

[22] Facebook, trolls, and Italian politics, March 2014.

[23] G. Ambrosetti. I forconi: "Il senato ha approvato una legge per i parlamentari in crisi". Chi non verra rieletto, oltre alla buonuscita, si becchera altri soldi. Sara vero? Website, August 2013. Last checked: 19.01.2014.

[24] M. J. Salganik, P. S. Dodds, and D. J. Watts. Experimental study of inequality and unpredictability in an artificial cultural market. Science, 311(5762):854–856, 2006.

[25] B. Franks, A. Bangerter, and M. W. Bauer. Conspiracy theories as quasi-religious mentality: An integrated account from cognitive science, social representations theory, and frame theory. Frontiers in Psychology, 4:424, 2013.

[26] C. R. Sunstein and A. Vermeule. Conspiracy theories: Causes and cures. Journal of Political Philosophy, 17(2):202–227, 2009.

[27] K. McKelvey and F. Menczer. Truthy: Enabling the study of online social networks. In Proceedings of CSCW '13, 2013.

[28] Ebola lessons: How social media gets infected, March 2014.

[29] The Ebola conspiracy theories, March 2014.

[30] The inevitable rise of Ebola conspiracy theories, March 2014.

[31] M. L. Meade and H. L. Roediger. Explorations in the social contagion of memory. Memory and Cognition, 30(7):995–1009, 2002.

[32] C. Mann and F. Stewart. Internet Communication and Qualitative Research: A Handbook for Researching Online (New Technologies for Social Research series). Sage Publications Ltd, September 2000.

[33] R. K. Garrett and B. E. Weeks. The promise and peril of real-time corrections to political misperceptions. In Proceedings of the 2013 Conference on Computer Supported Cooperative Work, CSCW '13, pages 1047–1058. ACM, 2013.


[34] A. Bessi, G. Caldarelli, M. Del Vicario, A. Scala, and W. Quattrociocchi. Social determinants of content selection in the age of (mis)information. Proceedings of SOCINFO 2014, abs/1409.2651, 2014.

[35] D. Mocanu, L. Rossi, Q. Zhang, M. Karsai, and W. Quattrociocchi. Collective attention in the age of (mis)information. Computers in Human Behavior, abs/1403.3344, submitted.

[36] A. Bessi, M. Coletto, G. A. Davidescu, A. Scala, G. Caldarelli, and W. Quattrociocchi. Science vs conspiracy: Collective narratives in the age of (mis)information. PLoS ONE, submitted.

[37] J. Ugander, L. Backstrom, C. Marlow, and J. Kleinberg. Structural diversity in social contagion. Proceedings of the National Academy of Sciences, 2012.

[38] D. Centola and M. Macy. Complex contagion and the weakness of long ties. American Journal of Sociology, 113:702–734, 2007.

[39] S. Aral and D. Walker. Identifying influential and susceptible members of social networks. Science, 337, 2012.

[40] S. Aral, L. Muchnik, and A. Sundararajan. Distinguishing influence-based contagion from homophily-driven diffusion in dynamic networks. PNAS, 106, 2009.

[41] E. Bakshy, I. Rosenn, C. Marlow, and L. Adamic. The role of social networks in information diffusion. WWW '12, 2012.

[42] D. Centola. Homophily, cultural drift, and the co-evolution of cultural groups. Journal of Conflict Resolution, 51:905–929, 2007.

[43] J. Cheng, L. Adamic, P. Dow, J. Kleinberg, and J. Leskovec. Can cascades be predicted? WWW '14, 2014.

[44] E. L. Kaplan and P. Meier. Nonparametric estimation from incomplete observations. Journal of the American Statistical Association, 53(282):457–481, 1958.

[45] Semplicemente me, November 2014. https://www.facebook.com/286003974834081/photos/a.286013618166450.51339.286003974834081/403452636422547.

[46] Semplicemente me, November 2014. https://www.facebook.com/286003974834081/photos/a.307604209340724.56105.286003974834081/335640103203801.

[47] Semplicemente me, November 2014. https://www.facebook.com/286003974834081/photos/a.307604209340724.56105.286003974834081/306244606143351.

[48] Simply humans, November 2014. https://www.facebook.com/photo.php?fbid=494359530660272.

[49] Simply humans, November 2014. https://www.facebook.com/HumanlyHuman/photos/a.468232656606293.1073741826.466607330102159/532577473505144.