Interactions with Potential Mis/Disinformation URLs Among U.S. Users on Facebook, 2017-2019

Aydan Bailey*, Theo Gregersen*, Franziska Roesner

Paul G. Allen School of Computer Science & Engineering, University of Washington

Note (September 10, 2021): After the publication of this paper, we learned that the dataset provided by Facebook contained an error: interactions from the 50% of U.S. users who had no political page affinity (PPA) score were excluded (see https://www.nytimes.com/live/2020/2020-election-misinformation-distortions#facebook-sent-flawed-data-to-misinformation-researchers). When the data is updated, we plan to revisit our analyses. As written, the paper reports only on the 50% of U.S. users who had PPA scores and who thus interacted with Facebook in certain ways that may impact our findings.

ABSTRACT
Misinformation and disinformation online, and on social media in particular, have become a topic of widespread concern. Recently, Facebook and Social Science One released a large, unique, privacy-preserving dataset to researchers that contains data on URLs shared on Facebook in 2017-2019, including how users interacted with posts and demographic data from those users. We conduct an exploratory analysis of this data through the lens of mis/disinformation, finding that posts containing potential and known mis/disinformation URLs drew substantial user engagement. We also find that older and more politically conservative U.S. users were more likely to be exposed to (and ultimately re-share) potential mis/disinformation, but that those users who were exposed were roughly equally likely to click regardless of demographics. We discuss the implications of our findings for platform interventions and further study towards understanding and reducing the spread of mis/disinformation on social media.

CCS CONCEPTS
• Human-centered computing → Empirical studies in collaborative and social computing; • Information systems → Social networking sites; • Security and privacy → Social aspects of security and privacy;

KEYWORDS
misinformation, disinformation, social media

*Co-first authors listed in alphabetical order.

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).
FOCI'21, August 27, 2021, Virtual Event, USA
© 2021 Copyright held by the owner/author(s).
ACM ISBN 978-1-4503-8640-1/21/08.
https://doi.org/10.1145/3473604.3474561

ACM Reference Format:
Aydan Bailey, Theo Gregersen, and Franziska Roesner. 2021. Interactions with Potential Mis/Disinformation URLs Among U.S. Users on Facebook, 2017-2019. In ACM SIGCOMM 2021 Workshop on Free and Open Communications on the Internet (FOCI'21), August 27, 2021, Virtual Event, USA. ACM, New York, NY, USA, 11 pages. https://doi.org/10.1145/3473604.3474561

1 INTRODUCTION
The last several years have brought significant concerns about the spread and influence of (unintentionally false) misinformation and (intentionally false) disinformation online [25, 43, 49]. Sometimes referred to as "fake news" or information pollution, mis/disinformation is notorious for spreading through social media platforms like Facebook and Twitter, where large (and increasing) fractions of Americans consume at least some of their news [39].

While social media platforms analyze their own user data internally, it is also important to study mis/disinformation in a public, neutral, academic manner. Such research enables us to compare findings across platforms, contributes to scientific knowledge about human behaviors related to online mis/disinformation more generally, and ultimately informs efforts to reduce its spread and impact. Prior efforts have often focused on Twitter because the openness of the platform lends itself to external research, though prior work has also studied public data on other platforms (see Section 2). For more closed platforms like Facebook, academic work often relies on smaller-scale case studies and/or controlled user studies rather than large-scale organic platform usage data.

In this work, we leverage a released privacy-protected dataset from Facebook [29] to conduct an exploratory study of the spread of potential mis/disinformation and associated user behaviors on the platform. This dataset, made available to researchers through an application process organized by Facebook and Social Science One [40], contains billions of rows of Facebook data from January 2017 to December 2019 about which demographic groups of users saw and interacted with which URLs on the platform. Though this dataset has some privacy-related limitations, it provides us with a unique view into a slice of Facebook user behaviors and the spread of mis/disinformation and other URLs shared on Facebook.


Specifically, we investigate the following research questions, focused on U.S. Facebook users:

(1) RQ1: Exposure to and interactions with potential mis/disinformation on Facebook. How, and how much, did U.S. Facebook users in our dataset see and interact with known or potential mis/disinformation URLs?

(2) RQ2: Demographic differences. What, if any, differences exist in how different demographic groups interacted with US-shared mis/disinformation URLs in our dataset, including people of different genders, age groups, and U.S. political inclinations?

To investigate these questions, we must overcome several challenges. First, we must grapple with the question of "what is mis/disinformation" in the first place. In the absence of ground truth, we use existing identifications of known or potential mis/disinformation URLs: Facebook's own third-party fact checking labels, and a list of low-quality potential mis/disinformation domains previously compiled by other researchers [51]. We must also consider several limitations of the dataset, including its privacy-preserving, noisy properties and limited scope, as well as more fundamental limitations about the ability of platform data (where behaviors are influenced by a platform's design) to shed light on human behavior in general [50]. We thus focus on an exploratory analysis that describes what we find in the data; we do not aim to make generalizable claims about people in general.

Among our results, we find that in 2017-2019, known and potential mis/disinformation drew substantial user engagement on Facebook, and that large fractions of users re-shared posts containing (any) URLs without first clicking on them. We also find that older adults with more conservative U.S. political leanings were exposed to, clicked on, and re-shared potential mis/disinformation more than other groups. However, the largest differences are in who was exposed to (i.e., shown) these posts in the first place, which may reflect only partly on explicit user behaviors and instead result from Facebook's newsfeed algorithm.

Based on our findings, we reflect on the implications for platform-based mis/disinformation defenses (e.g., the importance of preventing initial exposure) as well as future research (e.g., towards defenses that reduce susceptibility and spread after exposure). Overall, given this unique dataset, our results provide a view into U.S. users' exposure to and interactions with potential mis/disinformation on Facebook in 2017-2019 that could not previously be studied at this scale.

2 BACKGROUND AND RELATED WORK
Fully systematizing the broad and growing research space around online mis/disinformation is beyond our scope, but we point to several summaries [25, 43, 49] and highlight some of the most related examples. We present a more detailed comparison of our findings to prior work in Section 5.

Ecosystem and Platform Studies. Methodologically, our work is most closely related to prior studies of user behaviors based on social media platform data directly. There are several important limitations of this methodology, including limitations of platform APIs [43]. Notably, Facebook's API is limited and has become more so over time [15, 34], though Twitter's API also has limitations [46].

Moreover, human behaviors on specific platforms are shaped by the designs of those platforms, and thus findings from these contexts should not be over-generalized [50].

Nevertheless, studying data on platforms themselves gives important insights into how mis/disinformation spreads in practice. Due to its relatively open API, many of these studies have focused on Twitter [1, 6, 7, 18, 38, 44, 48], though significant work has also studied user behaviors on Facebook [12, 13, 16, 20, 27, 45] as well as on platforms like Reddit, WhatsApp, YouTube, and others [4, 14, 21, 23, 28, 35, 36].

Studies of Facebook tend to predate the early 2018 API restrictions and/or focus on specific group or topic case studies, e.g., public Facebook groups on known conspiracy theories or Facebook pages of known news outlets [30, 47]. Our work provides insights from a broader view of Facebook data that was made available through Facebook's partnership with Social Science One [29]. Research on this dataset is beginning to appear [19]; our investigation complements this recent work, confirming some of its findings with a different view on mis/disinformation as well as providing new results.

User Studies. Other work has studied people directly, conducting survey, controlled lab, or in-depth qualitative studies to test and/or observe interactions with mis/disinformation on social media platforms. As above, these studies have focused primarily on Facebook and/or Twitter [12, 13, 16, 17, 20, 26, 27, 45] or surveyed people about their general social media usage [39]. Findings about the effectiveness of interventions on social media platforms (e.g., "fake news" labels) have been mixed [5, 13, 17, 24, 33].

3 METHODS

3.1 Dataset Overview
The Facebook URL dataset that we analyze was made available to us via an application process in a partnership between the Social Science Research Council, Social Science One, and Facebook [41]. As per our Research Data Agreement with Facebook, we accessed this dataset only via Facebook-provided servers and did not download a copy of the dataset. Between the original public description of the dataset [29] and our analysis in May 2021, Facebook updated the dataset (at least) to cover a longer time period (through March 2020 at the time of our analysis). Since we did not download or save a previous version of the dataset, we cannot verify exactly what changes have been made over time. The results in this paper are based on the version of the dataset that was available to us on Facebook's servers as of early May 2021, which we describe here.

The version of the dataset we analyzed in May 2021 consisted of 41.7 million URLs from 1.1 million unique parent domains. This includes only URLs that were shared or posted publicly on Facebook at least 100 times (with Laplace(5) noise). The included URLs were posted between January 2011 and March 2020.¹ The dataset also includes 1.7 trillion datapoints of user interactions with those URLs (e.g., clicks, re-shares) from January 2017 through December 2019, associated with some noisy (privacy-preserving) demographic information. Data about repeated actions (e.g., subsequent clicks from users who clicked on a URL more than once) or users who later deleted their Facebook accounts is not included.

¹Though the bulk of the URL data and all of the user interaction data is from the 2017-2019 time range, some URLs shared during that period were first posted as early as January 2011, as reflected by the first_post_time attribute in the dataset.

Our Scope and Baseline. We only consider URL data during the time period for which we have corresponding user interaction data: January 2017 through December 2019. We also consider only interactions from U.S. users (for whom there is additional demographic information, discussed below), and we scope the URL list to those URLs that are labeled as most-shared in the US (as opposed to in other countries). Our final subset thus consists of 9,172,097 URLs from 216,137 unique parent domains (and associated user interaction data). We refer to this set of URLs as our baseline "US URLs" subset.

We use this subset of all US URLs as a baseline against which we compare user interactions with known and potential mis/disinformation. We use this full subset rather than a more specific baseline (e.g., mainstream or trustworthy news sources) because this allows us to consider the "average" US URL on Facebook without imposing any value judgements on what is trustworthy or otherwise worth including in a baseline. This full set of US URLs likely includes a wide variety of types of content (which future work may wish to separate out further), but we note that because the dataset includes only URLs that were shared publicly 100 or more times (with Laplace(5) noise), a potentially long tail of unpopular websites is excluded by definition.
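For concreteness, the scoping above can be thought of as a simple filter over the URL-attributes table. The sketch below is illustrative only: the column names (top_share_country, parent_domain) are hypothetical stand-ins we introduce here, the real schema is defined by the dataset's codebook, and our actual analyses ran as queries on Facebook-provided servers.

```python
import pandas as pd

def scope_us_baseline(urls: pd.DataFrame) -> pd.DataFrame:
    """Return the baseline "US URLs" subset from a URL-attributes table.

    Column names are hypothetical stand-ins for the dataset's real schema;
    the interaction data is separately restricted to January 2017 through
    December 2019.
    """
    us_urls = urls[urls["top_share_country"] == "US"]  # labeled most-shared in the U.S.
    # In our data this yields ~9.17M URLs from ~216K unique parent domains.
    return us_urls
```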

Demographic Data. The dataset includes demographic information associated with URL interactions: age bracket, gender, and "political page affinity". The latter refers to U.S. political leaning and is available only for U.S. users. It is calculated by analyzing the pages that users follow, based on methods from Barberá et al. [3]. The dataset documentation does not detail further how PPA is calculated here, but the end result maps users into five integer buckets, from -2 (ideological left, i.e., liberal) to 2 (ideological right, i.e., conservative). For simplicity, and to avoid suggesting conclusions where we draw none, we omit demographic categories with substantially less data (i.e., "other" for gender and "NA" for age) from our figures and analysis.

Differential Privacy. In order to release this dataset while protecting the privacy of the users whose data is represented, Facebook applied differential privacy techniques to add noise to the data [9, 10, 29]. Specifically, the dataset has been made differentially private on the level of user interactions, meaning that the dataset would look statistically the same whether or not any particular user interaction is actually included. Theoretically, this gives plausible deniability about any particular action by any particular user, while still accurately reflecting behavior statistically at this large scale.

In practice, differential privacy here is accomplished by adding Gaussian noise to certain columns in the dataset. In our results, we thus display calculated (noisy) values as well as 95% confidence intervals. Signal-to-noise properties apply: though statistics for larger samples include more noise, that noise will generally be smaller relative to the true value.

Because noise was applied to the dataset only for data columns related to user activity and demographics, URL attributes such as post times and fact-check ratings are precise.
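To make the confidence-interval handling concrete, the sketch below computes a 95% interval for an aggregate of noisy cells, assuming each cell carries independent zero-mean Gaussian noise with a known standard deviation. The noise_std value used here is a placeholder, not the dataset's actual noise parameter (those are specified per column in the codebook), and the actual estimators can be more sophisticated [10].

```python
import math

def noisy_sum_ci(noisy_cells, noise_std, z=1.96):
    """95% confidence interval for a sum of differentially private cells.

    Assumes each cell carries independent N(0, noise_std^2) noise, so the
    noise on the sum has standard deviation noise_std * sqrt(n).
    """
    total = sum(noisy_cells)
    half_width = z * noise_std * math.sqrt(len(noisy_cells))
    return total - half_width, total + half_width

# Aggregating 10,000 noisy counts: the absolute noise grows like sqrt(n),
# but shrinks relative to a true total that grows like n.
low, high = noisy_sum_ci([5.0] * 10_000, noise_std=10.0)  # placeholder noise scale
```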

3.2 Known and Potential Mis/Disinformation URL Subsets

To investigate our research questions, we need a list of mis/disinformation domains and/or URLs. However, compiling such a list is a major research challenge of its own. Since addressing that challenge is not a goal of this work, we rely on two existing characterizations: one that is likely too narrow and one that is likely too broad.

(1) Third-Party Fact-Checked False List ("TPFC False US"). Our most straightforward source of "false" URLs are those flagged directly by Facebook in collaboration with its third-party fact checking partners. This process focuses primarily on "viral" misinformation (i.e., popular posts) as well as clear falsehoods. Posts of flagged URLs are labeled in Facebook's UIs and/or downranked in its newsfeed algorithm [11]. Specifically, we focus on US URLs marked as "fact checked as false" or "fact checked as mixture or false headline" (as opposed to, e.g., "satire"). There were 6,838 US URLs with such ratings, from 2,644 unique parent domains.

The advantage of this list is our high confidence in the "false" labels. The disadvantages include that inclusion in the list depends on decisions made by Facebook and its partners (e.g., a focus on viral content, and limited coverage due to limited resources). Moreover, Facebook's fact-checking choices are not entirely transparent and have been criticized for being both potentially too harsh and/or too lenient on both U.S. politically left- and right-leaning content [32, 42]. This list thus does not represent a labeling of a random sample of URLs, and exclusion from this list does not imply truth.

(2) Broad Potential Misinformation List ("Low-Quality News US"). Since Facebook's fact-checked URL list is a narrow view of potential mis/disinformation, we also consider a much broader view of low-quality news. Specifically, we consider a subset of potential mis/disinformation domains subsampled from a list from Zeng et al. [51]. This list (of which we obtained an updated version in October 2020) includes 1,344 domains compiled from amateur, open source, and professional content checkers. Since this list includes a broad range of low-quality news domains, we subsampled based on labels only from professional fact-checking sources: Snopes, FactCheck.org, and Politifact. The resulting subsample contains 120 domains, 103 of which were present and corresponded to 108,408 US URLs in the Facebook dataset.

The disadvantage of this list is that it may be overly general: not all articles on a domain may contain falsehoods, and thus we can consider these URLs only potential mis/disinformation or, more generally, low-quality news. However, this list provides us with a much broader perspective that is not influenced by Facebook's fact-checking decisions.
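For illustration, the sketch below shows how the two subsets might be carved out of the baseline US URLs table from the earlier sketch. The tpfc_rating column and its label strings are hypothetical stand-ins for however the fact-check rating is encoded in the dataset, and low_quality_domains stands for the 120-domain subsample of the Zeng et al. list.

```python
import pandas as pd

def tpfc_false_subset(us_urls: pd.DataFrame) -> pd.DataFrame:
    """URLs rated false (or mixture / false headline) by Facebook's fact-checking partners."""
    false_labels = {"fact checked as false", "fact checked as mixture or false headline"}
    return us_urls[us_urls["tpfc_rating"].isin(false_labels)]  # ~6,838 URLs in our data

def low_quality_subset(us_urls: pd.DataFrame, low_quality_domains: set) -> pd.DataFrame:
    """URLs whose parent domain appears on the subsampled low-quality news list."""
    return us_urls[us_urls["parent_domain"].isin(low_quality_domains)]  # ~108,408 URLs
```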

3.3 Ethics
Our study was reviewed by our university's human subjects review board (IRB). Because the dataset does not include identifiable information about individuals, our IRB determined that this study did not classify as human subjects research. Nevertheless, we treated the data with caution and adhered to Facebook's guidelines about its use: accessing it only on Facebook-provided servers, making no efforts to de-anonymize or identify any individuals, and submitting the paper to Facebook to review for potential privacy issues in advance of publication. (No changes to the paper were requested by Facebook.)

3.4 Limitations
First, as discussed, we lack ground truth for mis/disinformation, and instead consider both a narrow (the TPFC False URL list) and a broad (the low-quality news URL list) view for this exploratory study.

Second, the noisy, privacy-preserving properties of the dataset limit us to questions about interaction data in aggregate (not individual users or actions). The fact that the dataset (also for privacy reasons) includes only data about URLs that were shared publicly by enough people on Facebook also means there may be a selection bias towards certain URLs (those that tend to be shared publicly and widely) and Facebook users (those who make non-private posts).

Third, there is other data that would provide useful context that we simply do not have. For example, we lack demographic information about who posted URLs in the first place; the "share" interactions consist only of re-shares of posts. We also cannot compare posts containing URLs (reflected in this dataset) with posts that do not contain URLs.

Moreover, the fact that the dataset is not public and available only through an application process and official agreement, though helping to protect the privacy of the Facebook users whose data is represented, makes it challenging for others to reproduce our results. The fact that we only access the data on Facebook's servers (per our agreement with Facebook), where the dataset may be updated or our access revoked at any time, also makes it potentially challenging for us (and others) to reproduce our own results in the future. As discussed, the May 2021 version of the dataset we analyzed includes more records than described in the June 2020 public description of the dataset [29].

Finally, we cannot separate the role of Facebook's newsfeed algorithm and other design features from user behaviors [50]. For example, when we consider URL views, we cannot distinguish whether a user saw a post because they chose to follow a relevant public Facebook page, because their friend posted it, or because Facebook pushed a sponsored post to their feed. We must assume that Facebook's algorithm attempts to push posts to people that they are likely to engage with. Similarly, we lack details about how U.S. political leaning (PPA) was calculated, so we cannot distinguish whether correlations between PPA and certain behaviors (e.g., which URLs a user engages with) are by definition or emergent.

Still, this large Facebook dataset provides a unique opportunity to study user interactions with potential mis/disinformation on the platform itself, providing a complementary perspective to other methodologies with other limitations. In our results, we focus on describing what we find in this dataset about how people on Facebook interacted with URLs in 2017-2019; we do not intend to make generalizable claims about human behaviors in general.

4 FINDINGS
We first characterize our URL subsets and then study how Facebook users interacted with mis/disinformation and low-quality news URLs, comparing to a baseline of all US-shared URLs and across demographic groups.

Parent Domain         Unique URLs   TPFC-F   Total Views
foxnews.com           67.2K         7        [2.22e10, 2.23e10]
cnn.com               76.1K         0        [2.16e10, 2.17e10]
nytimes.com           71.2K         1        [2.11e10, 2.12e10]
buzzfeed.com          27.8K         0        [1.74e10, 1.75e10]
npr.org               25.4K         0        [1.70e10, 1.70e10]
youtube.com           863.6K        163      [1.50e10, 1.54e10]
washingtonpost.com    60.8K         1        [1.26e10, 1.27e10]
huffingtonpost.com    38.6K         0        [1.25e10, 1.26e10]
dailywire.com         27.1K         20       [1.13e10, 1.14e10]
nbcnews.com           33.1K         0        [1.01e10, 1.02e10]

Table 1: US Subset Domains with Most Views. TPFC-F refers to the number of URLs from that domain that Facebook's third-party fact-checkers labeled as false. Given the differential privacy noise applied to the data, view counts in Tables 1-3 are presented with 95% confidence intervals; overlapping ranges mean that we cannot distinguish the underlying non-private data points. Thus, the ordering in these tables (by view count) is approximate.

Parent Domain              Unique URLs   TPFC-F   Total Views
higherperspectives.com     830           2        [1.34e9, 1.35e9]
dailysnark.com             3.7K          4        [1.08e9, 1.10e9]
madworldnews.com           4.8K          4        [5.69e8, 5.99e8]
awarenessact.com           2.5K          10       [5.19e8, 5.39e8]
disclose.tv                3.7K          6        [4.65e8, 4.91e8]
thegatewaypundit.com       15.0K         42       [4.36e8, 4.82e8]
thepoliticalinsider.com    5.1K          2        [3.51e8, 3.78e8]
infowars.com               8.9K          25       [3.38e8, 3.77e8]
worldnewsdailyreport.com   430           11       [3.43e8, 3.53e8]
conservativepost.com       2.0K          8        [3.12e8, 3.33e8]

Table 2: Low-Quality News Domains with Most Views.

Parent Domain           Unique URLs   TPFC-F   Total TPFC-F Views
youtube.com             863.6K        163      [5.32e7, 5.70e7]
livebr0adcast.com       156           130      [5.56e5, 2.00e6]
yournewswire.com        2.6K          81       [6.71e7, 7.10e7]
overseasdaily.com       132           78       [3.34e6, 4.96e6]
actual-eventstv.com     105           76       [4.10e5, 1.51e6]
channel23news.com       238           61       [1.60e7, 1.99e7]
wfrv9.com               63            59       [1.04e6, 4.94e6]
dailyusaupdate.com      481           55       [6.13e6, 9.97e6]
actual-events.com       139           53       [7.90e4, 9.83e5]
network-channel.com     138           50       [1.14e3, 6.94e5]

Table 3: Domains with Most TPFC False US URLs.

4.1 Characterizing Our URL Subsets
Tables 1 and 2 show the view counts for the top ten most-viewed domains in our baseline US and our low-quality news URL subsets, respectively. Note that "views" here refers to instances when a user saw a Facebook post containing a URL appearing in their newsfeed, not instances where a user clicked through to the target of the URL. The tables also show how many unique URLs from each domain appear in that subset, as well as how many URLs were fact-checked by Facebook's partners as false (the "TPFC-F" column).

We highlight several observations. First, the most popular US-shared domains in our dataset (Table 1) corresponded largely to well-known mainstream news sites, though we note the outsize popularity of youtube.com. Second, we note that the most popular US-shared domains (Table 1) and the most popular domains from our low-quality news list (Table 2) are disjoint. That is, potential mis/disinformation domains in the form of low-quality news sites were not among the most popular shared domains.

Table 3 shows the domains in our dataset with the most URLs that were third-party fact checked as false, and the view counts for those URLs. This table reveals the comparatively small scope of Facebook's third-party fact checking labels, which applied only to a small number of URLs, most of which (except youtube.com) came from domains that appeared with relatively few unique URLs on Facebook.


Figure 1: Average Cumulative View Counts Over Time for URLs in Different Subsets. 95% confidence intervals are shown.

Figure 2: Percentage of URL Views in Each URL Subset by (a) Gender, (b) Age Group, (c) Political Page Affinity. 95% confidence intervals are shown.


4.2 Views (Exposure): Who Saw What?

Overall Views/Exposure. We begin with a core question: how much were people exposed to (i.e., shown on their newsfeed) URLs in our different mis/disinformation subsets, compared to the baseline dataset of all US URLs? In exploring this question, we emphasize again that Facebook plays a role in who sees which posts. In other words, trends we surface below may reflect Facebook's newsfeed algorithm rather than (or in addition to) differences in human behaviors (e.g., which Facebook pages they follow) [50].

To investigate views, we return to Tables 1 and 2, which show the total number of views across all URLs from each domain. We find that URLs from the most popular US domains overall received an order of magnitude more views than URLs from popular domains in the low-quality news subset (though the latter still received large numbers of views). We also note that the total TPFC False URL views for the domains in Table 3 were substantial considering the smaller number of TPFC False URLs: just dozens of TPFC False URLs produced several million views (perhaps by definition, given that viral misinformation is prioritized for Facebook's fact checking).

Spread Over Time. We also ask: how quickly did URLs in the different subsets spread? In Figure 1, we investigate the cumulative views over time, averaged across URLs, for each of our subsets. Since the dataset provides timestamp information only at month-level granularity, we can track only cumulative monthly views. We find that the average URL (in all subsets) spreads most quickly within the first month after the initial post, suggesting that mis/disinformation interventions must be deployed quickly to be effective.
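The per-URL curves behind Figure 1 can be computed by cumulatively summing each URL's monthly view counts and then averaging across the URLs in a subset. A minimal sketch, assuming a long-format table with hypothetical url_id, month, and views columns and one row per URL-month:

```python
import pandas as pd

def average_cumulative_views(monthly_views: pd.DataFrame) -> pd.Series:
    """Average cumulative (noisy) view count by months-since-first-post.

    Assumes one row per (url_id, month) with contiguous months per URL;
    confidence intervals from the privacy noise are omitted for brevity.
    """
    monthly_views = monthly_views.sort_values(["url_id", "month"])
    monthly_views["months_since_post"] = monthly_views.groupby("url_id").cumcount()
    monthly_views["cum_views"] = monthly_views.groupby("url_id")["views"].cumsum()
    return monthly_views.groupby("months_since_post")["cum_views"].mean()
```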

From Figure 1, we also see that URLs in the TPFC False subset received significantly more views on average than those in other subsets (hence likely making them targets for fact-checking). In addition, TPFC False URLs were longer-lived, receiving a greater fraction of views after the first month compared to the others (nearly doubling in cumulative views). Unfortunately, the granularity of the timestamps in the dataset prevents us from investigating whether spread decreased substantially after the time of fact checking.

Demographic Differences in Exposure. We now turn to our second research question, considering different demographic groups (gender, age, and political leaning). While we do not know the overall demographic breakdown of Facebook users in general, we can compare demographics in interactions with our US URLs baseline to interactions with our low-quality news and TPFC False URL subsets.

Figure 2 shows the percentage of views from users in different demographic categories for the average URL in each URL subset. For example, considering the TPFC False URL subset, we find that on average (i.e., averaged across all URLs in this subset), more than 20% of views came from users ages 65 or older. Taking all US URLs as our baseline, we find that known or potential mis/disinformation URLs in both lists were shown more often (on average) than baseline to male users, to users ages 55 and older, and to users with political page affinity of 1 or more (i.e., right-leaning).

However, the breakdown in Figure 2 obscures potential interactions between demographic categories (for example, older users may also tend to be more conservative). To tease apart these potential interactions, Figure 3 shows heatmaps of how often different demographic groups were shown posts containing these URLs.


Figure 3: Percentage of URL Views by Demographic Groups (Age and Political Page Affinity) for Each URL Subset. Percentages are presented in ranges with 95% confidence intervals due to differential privacy noise.

Each heatmap corresponds to a URL subset, and each square of each heatmap represents a different demographic intersection defined by the tuple (age bucket, political page affinity). (Since gender did not have an impact on our conclusions, we collapse gender for readability of the heatmap.) The percentages represent the fraction of views that an average URL-containing post received from each demographic group (i.e., for each heatmap, the percentages sum to 100%).

Figure 3 thus compares how frequently, relative to each other, different overlapping demographic groups were shown posts containing URLs. For both of our mis/disinformation URL subsets, we find that US users who are older and more politically right-leaning were more likely to see misinformation URLs in their feeds. The top baseline heatmap, by contrast, shows a larger percentage of views from younger and farther left-leaning users; in other words, the trends for low-quality news and fact-checked false URLs were not merely reflections of overall trends of everyone's Facebook usage and/or experience.

4.3 Clicking: Who Clicked What?
When a Facebook user saw a post containing a URL, what did they do? Figure 4 shows a variety of behaviors, aggregated across all users in our dataset, for each URL subset. In this subsection, we begin by considering clicking behaviors. Unlike views, which can result from Facebook pushing content to users, clicks are actions taken directly by users (though still predicated on having seen the post in the first place).

Overall Clicking Behaviors. Figure 4's set of "clicks" bars shows average click-through rates, i.e., average clicks-per-view. We find that, compared to the baseline of all US URLs, which users clicked on roughly 6% of the time they saw them, URLs in our mis/disinformation sets were clicked on more often: roughly 8-9% of the time. In other words, the average US user who saw a potential mis/disinformation URL was more likely to click on it than the average user who encountered any random link on Facebook. We note that this comparison depends on our choice of baseline, comparing to the "average" US-shared URL; the reasons for increased engagement with potential mis/disinformation here may result not from mis/disinformation tactics but (for example) because news-style content receives more engagement in general.

Demographic Differences in Click-Through Rate. The heatmaps in Figure 5 break down the click-through rates for different demographic groups and URL subsets. Here, we show a baseline click-through rate heatmap on top; the percentages (with 95% confidence intervals) represent the average click-through rate per demographic intersection for all US-shared URLs. The next two heatmaps show differences in click-through rate from the baseline: for each demographic intersection, we calculated the ratio of the click-through rate to the baseline click-through rate. (For example, a value of 1 would indicate no change, while a value of 2 would indicate that the click-through rate doubled.)
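Conceptually, each heatmap cell is a per-group rate: total clicks divided by total views for one (age bucket, political page affinity) intersection, and, for the mis/disinformation subsets, the ratio of that rate to the corresponding baseline rate. A hedged sketch with hypothetical column names, ignoring the noise-derived confidence intervals:

```python
import pandas as pd

def click_through_by_group(interactions: pd.DataFrame) -> pd.Series:
    """Clicks-per-view for each (age bucket, PPA) demographic intersection."""
    grouped = interactions.groupby(
        ["age_bucket", "political_page_affinity"]
    )[["clicks", "views"]].sum()
    return grouped["clicks"] / grouped["views"]

def ratio_to_baseline(subset_ctr: pd.Series, baseline_ctr: pd.Series) -> pd.Series:
    """Per-group ratio to baseline: 1 means no change; 2 means the rate doubled."""
    return subset_ctr / baseline_ctr
```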

To our surprise, the demographic trends observed when considering views (above) do not fully hold here. In addition to baseline click-through rate behavior being roughly comparable on both sides of the political spectrum, we see that for the TPFC False and low-quality news URLs, differences in click-through rates compared to the baseline were not skewed by political page affinity. That is, despite the demographic differences in who saw these posts, political leaning did not seem to impact click-through rates for known or potential mis/disinformation URLs.


Figure 4: Average Actions per View for URLs in Each Subset. 95% confidence intervals are shown.


We do continue to see differences by age, though not unique to mis/disinformation: older adults were more likely to click on URLs in general (top heatmap), without additional age-related differences for potential mis/disinformation URLs.

We cannot distinguish why people clicked on links, or the impacts of reading the underlying content (e.g., perhaps politically left-leaning users click on right-leaning links to debunk them). Still, this finding suggests that political differences in engagement with misinformation reported in prior research may be partially due to who is exposed by the platform in the first place, rather than more fundamental differences between groups. Moreover, we emphasize that this exposure is not random: Facebook's newsfeed algorithm may optimize showing these posts to precisely those users more likely to click on them, and our findings may reflect the success (or at least uniformity) of Facebook's algorithm in doing just that. Still, this result underscores the importance of limiting initial exposure to mis/disinformation, as we discuss further in Section 5.

4.4 Sharing: Who Shared What?
We now consider URL sharing behaviors. Our dataset includes two types of interactions: shares, which involve users re-sharing a Facebook post containing a URL after clicking on it, and shares without clicks, where users re-share a post containing a URL without first clicking on the URL. These two metrics are mutually exclusive, meaning that "shares" refers only to shares with clicks. We also emphasize that our interaction data (with associated demographic information) includes only re-shares, not original posts of a URL.

Overall Sharing Behaviors. Figure 4 shows that mis/disinformation URLs garnered more shares, and more shares without clicks, per view than average US-shared URLs. Share rates are particularly high for TPFC False URLs. Since Facebook prioritizes fact-checking viral mis/disinformation, this finding may reflect in part how the TPFC False subset was selected in the first place. However, note that the sharing rates for the low-quality news URLs approached the rates for these noteworthy TPFC False URLs, i.e., they also drew substantial engagement (per view). We emphasize again that the comparison to baseline depends on our choice of a broad baseline: we see that (these) potential mis/disinformation URLs spread more efficiently than other US URLs on average, but we leave to future work more detailed comparisons against other potential URL subsets of interest (e.g., mainstream news).

Across all URL subsets, we were surprised at the high rate of shares without clicks. Considering share counts (rather than shares-per-view), nearly half of all shares for all URL subsets were made by Facebook users who had not clicked on the URL they were re-sharing. Specifically, the percentage of shares without clicks for the average URL was (with 95% confidence intervals) 42.29-42.35% in the overall US-shared dataset, 39.81-40.18% for TPFC False, and 41.66-42.13% for low-quality news. The fact that these ratios are similar across all URL subsets suggests that there were not necessarily URL- or content-based differences driving this behavior.
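The quoted percentages are, in essence, the share-without-click count as a fraction of all re-shares (with and without clicks) over a URL subset. A minimal sketch under the same hypothetical schema as above, again omitting the noise-derived confidence intervals:

```python
import pandas as pd

def percent_shares_without_clicks(interactions: pd.DataFrame) -> float:
    """Percentage of all re-shares made without first clicking the URL."""
    without = interactions["shares_without_clicks"].sum()
    with_clicks = interactions["shares"].sum()  # "shares" counts re-shares with clicks only
    return 100.0 * without / (without + with_clicks)
```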

Demographic Differences in Sharing URLs. We consider the rates at which different groups shared the URLs they saw. Figure 6 is constructed like the click-through rate heatmaps (with a top baseline and differences from that baseline below), now considering the average shares-per-view ratio per demographic intersection. Here we consider both shares with and without clicks, i.e., all re-shares. Note that, as with clicks, we cannot distinguish the goals of a share (e.g., sharing because one agrees with the article, or to debunk it).

Unlike for click-throughs, we again see a trend towards higher sharing rates (given that a user is exposed) of mis/disinformation URLs by politically right-leaning users, particularly of TPFC False URLs. Unlike views, here the impact of political leaning seems to dominate compared to age: younger right-leaning users were also more likely to share these URLs compared to baseline. At the same time, we find that older Facebook users were slightly more likely to share URLs in general, regardless of whether they are mis/disinformation.

Finally, we investigate specifically what was shared by Facebook users in different demographic buckets. Figure 7 considers the top 50 most-viewed domains from our baseline and low-quality news URL subsets, plotting the average age and political leaning of users who shared URLs from those domains. We have labeled some example domains, including a cluster of potential mis/disinformation URLs shared mostly by older, politically conservative users, as well as examples on the political left. (The TPFC False URL subset is omitted from this comparison because there are not enough flagged URLs per most-viewed TPFC False domain to reduce noise via averaging sufficiently to allow for meaningful comparison.)
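Each point in Figure 7 can be read as a share-weighted average of demographic bucket values for one domain. The sketch below assumes, for illustration only, that age bracket and political page affinity have already been mapped to numeric codes, and weights each bucket by its total re-share count; the paper's actual averaging and noise handling may differ in detail.

```python
import pandas as pd

def domain_share_demographics(interactions: pd.DataFrame) -> pd.DataFrame:
    """Share-weighted average age bucket and PPA of re-sharers, per parent domain.

    Assumes age_bucket and political_page_affinity are numeric codes (hypothetical schema).
    """
    def weighted_avg(group: pd.DataFrame) -> pd.Series:
        weights = group["shares"] + group["shares_without_clicks"]
        return pd.Series({
            "avg_age_bucket": (group["age_bucket"] * weights).sum() / weights.sum(),
            "avg_ppa": (group["political_page_affinity"] * weights).sum() / weights.sum(),
        })
    return interactions.groupby("parent_domain").apply(weighted_avg)
```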


(a) The top heatmap shows baseline click-through rates (clicks per view) by demographic group for all US-shared URLs.

(b) Differences in click-through rates from baseline (e.g., TPFC False URLs were clicked through by adults age 65+ and PPA=2 1.37-1.45x more often than average US-shared URLs).

Figure 5: Baseline and Difference in Click-Through Rates by Demographic Groups and URL Subsets. Ranges representing 95% confidence intervals are shown; overlapping ranges do not allow us to distinguish the underlying non-private values.

(a) The top heatmap shows baseline sharing rates (shares per view) by demographic group for all US-shared URLs.

(b) Differences in sharing rates from baseline (e.g., average TPFC False URLs were shared by adults age 65+ and PPA=2 2.48-2.61x more often than average US-shared URLs).

Figure 6: Baseline and Difference in Shares/View by Demographic Groups and URL Subsets. Ranges representing 95% confidence intervals are shown; overlapping ranges do not allow us to distinguish the underlying non-private values.


Figure 7: Average Demographics for Users Who Shared URLs from the Top 50 Most Viewed Domains. For both the US-shared URLs baseline and the low-quality news URL subset, we consider the top 50 most-viewed domains and calculate the average demographics (age and political page affinity) for users who re-shared posts containing URLs from those domains. Several example domains are labeled, and 95% confidence intervals are shown.

5 DISCUSSION

Reflecting on Demographic Differences. Our results show demographic differences in who interacted with potential mis/disinformation on Facebook in 2017-2019. Most strikingly, we find that older adults with right-leaning U.S. political affinities were more exposed to (i.e., viewed more posts containing) potential mis/disinformation URLs. As one caveat, recall that identifying mis/disinformation is challenging: while we relied on reputable sources for our URL lists, we cannot exclude the possibility that these lists (and Facebook's own fact-checking) may disproportionately include politically conservative content. Future work should continue to consider how to identify mis/disinformation and investigate our research questions with those perspectives. Future work should also consider additional baseline URL subsets (e.g., trustworthy news sources, by some definition).

Also noteworthy is that political demographic differences flattened out for click-through rates (though they reappeared for sharing rates). In other words, once a user saw a post containing a mis/disinformation URL, political leaning did not seem to play a role in whether a user clicked on it. This finding may reflect in part Facebook's success targeting users likely to click, and recall that we do not know why people clicked or whether they took the false or low-quality content at face value. Still, this result should caution us not to overestimate the impact of political leaning on how and how often people engage with mis/disinformation without considering the likelihood of being exposed in the first place. The fact that we again see demographic differences for shares may result from many reasons that future work might study; one observation we make is that, unlike clicks, shares are publicly visible, which may impact behaviors among different demographic groups.

At the same time, we see that older adults were more likely to be exposed to (i.e., view), click on, and re-share potential mis/disinformation than younger adults. This is the case both because older adults were disproportionately likely to be shown posts containing these URLs, but also because they then exhibited higher click-through and sharing rates on Facebook in general. Future defenses, either on social media platforms or outside of them (e.g., educational efforts), should thus look towards reducing susceptibility of older adults in particular.

Comparisons to Prior Findings and Future Research Questions. Our findings contribute to a broader understanding of how people interact with potential mis/disinformation on social media, based on a unique large-scale dataset. While smaller-scale and controlled studies help explain how or why people interact with "fake news", larger-scale measurements can shed light on broader trends. Most related to our work, prior results from larger-scale studies of Facebook and Twitter suggested that false news (under various definitions) is shared more by U.S. politically right-leaning users and older adults, though sharing is generally rare [18, 20]; that Facebook users are more likely to see, click, and share content aligned with their political ideology [2]; that users tend to focus on a small number of news sources [37]; and that low-quality news sources may receive more user engagement than mainstream news [8]. Our findings confirm and update these prior (often pre-2017) results with data from 2017-2019, and emphasize an important nuance in how we should interpret findings about demographic differences: overall trends can be heavily influenced by what content users are exposed to by the platform in the first place.

Our findings confirm other recently published results based on this dataset [19], which used a different list of low-credibility news domains (based on NewsGuard [31]) but also found that older, more conservative adults were more likely to see and share those URLs. Since the underlying dataset is not publicly available, we believe that multiple corroborating peer-reviewed sets of findings are scientifically valuable. In addition to a complementary perspective considering different mis/disinformation URL subsets, our work also adds an analysis of clicks (in addition to views and shares) and highlights that views should be interpreted as reflecting users' exposure rather than their direct behaviors.

Our quantitative results also raise questions for follow-on, more qualitative investigations. Most importantly, our results can reveal trends in what Facebook users do, but not why or under what conditions they do it. For instance: Why (and which types of content) do people reshare so frequently without clicking on the associated URLs? Why do people click on URLs in posts, and what is the result of that action, e.g., how often do they click in order to fact-check? How and why do older adults use Facebook differently? While our data does not allow us to answer these questions, we emphasize that large-scale dataset releases like this one, despite their limitations, are crucial to allowing researchers to see these trends and formulate such follow-on questions.

Implications for Defenses. Platform defenses can aim to prevent mis/disinformation from reaching users (preventing exposure), or warn them when they encounter it (reducing susceptibility) or when they attempt to share it (reducing spread). Our finding that people across the political spectrum were roughly equally likely to click on (but not necessarily re-share) potential mis/disinformation once exposed suggests that preventing exposure may be crucial. Put another way, one might question Facebook's role and responsibility in surfacing these posts to susceptible users in the first place.

The fact that people frequently re-share posts without clicking on URLs suggests that interventions at sharing time may also be valuable (e.g., a recent Twitter change prompting users to click on URLs before retweeting them [22]).

Regarding interventions at the time of a user's exposure, such as "false news" labels on posts, our dataset unfortunately provides limited insight. Because most shares of URLs happen in the first month or two of their lifetime on Facebook, the month-level granularity of our data prevents us from investigating whether the spread of TPFC-False URLs decreases significantly once they have been fact-checked (and thus labeled in users' feeds). We can confirm from our data that the scope of Facebook's fact-checking efforts is limited (to already viral and clear-cut false cases), meaning that Facebook's fact-checking alone can address only the tip of the iceberg of potential mis/disinformation on the platform. That said, Facebook's methods indeed succeed at fact-checking highly popular mis/disinformation URLs, with potentially great impact.

6 CONCLUSION
We conducted an exploratory analysis of a large-scale dataset of URLs shared on Facebook in 2017-2019, investigating how and how much Facebook users were exposed to and interacted with posts containing potential mis/disinformation URLs, and how these interactions differed across demographic groups. We find that potential mis/disinformation URLs received substantial user engagement, particularly from older adults and from U.S. politically right-leaning users (though not uniformly), and add to a rich and growing literature on mis/disinformation on social media. There are many additional questions we could have investigated in this dataset (e.g., considering different URL subsets) and many more questions that this particular dataset cannot answer. We hope that our exploratory analysis provides a foundation for continued investigations.

ACKNOWLEDGEMENTS
We are very grateful to Social Science One and to Facebook for their partnership in making the Facebook URL Shares dataset available to researchers, including ourselves, and to the Facebook team for their logistical and technical support working with the data. We also wish to thank Sewoong Oh for his guidance on working with differentially private data. We thank our anonymous reviewers and our shepherd, Drew Springall, for their feedback. We are also grateful to Scott Bailey for reviewing an earlier draft of this paper. We thank Eric Zeng for providing us with the updated low-quality news URL list. Other members of the UW Security and Privacy Research Lab also helped contribute feedback and analysis ideas. This work was supported in part by a Social Media and Democracy Research Grant from the Social Science Research Council, by the National Science Foundation under Awards CNS-1513584 and CNS-1651230, and by the UW Center for an Informed Public and the John S. and James L. Knight Foundation.

REFERENCES
[1] Ahmer Arif, Leo Graiden Stewart, and Kate Starbird. 2018. Acting the Part: Examining Information Operations Within #BlackLivesMatter Discourse. Proceedings of the ACM on Human-Computer Interaction (CSCW) 2, Article 20 (Nov. 2018), 27 pages.
[2] Eytan Bakshy, Solomon Messing, and Lada A. Adamic. 2015. Exposure to ideologically diverse news and opinion on Facebook. Science 348, 6239 (2015), 1130-1132.
[3] Pablo Barberá, John T. Jost, Jonathan Nagler, Joshua A. Tucker, and Richard Bonneau. 2015. Tweeting From Left to Right: Is Online Political Communication More Than an Echo Chamber? Psychological Science 26, 10 (2015), 1531-1542.
[4] Alessandro Bessi, Fabiana Zollo, Michela Del Vicario, Michelangelo Puliga, Antonio Scala, Guido Caldarelli, Brian Uzzi, and Walter Quattrociocchi. 2016. Users Polarization on Facebook and Youtube. PLOS ONE 11, 8 (08 2016).
[5] Leticia Bode and Emily K. Vraga. 2015. In Related News, That Was Wrong: The Correction of Misinformation Through Related Stories Functionality in Social Media. Journal of Communication 65, 4 (2015), 619-638.
[6] Alexandre Bovet and Hernán A. Makse. 2019. Influence of fake news in Twitter during the 2016 US presidential election. Nature Communications 10, 7 (2019).
[7] Ceren Budak. 2019. What Happened? The Spread of Fake News Publisher Content During the 2016 U.S. Presidential Election. In The World Wide Web Conference (WWW).
[8] Peter Burger, Soeradj Kanhai, Alexander Pleijter, and Suzan Verberne. 2019. The reach of commercially motivated junk news on Facebook. PLOS ONE 14, 8 (08 2019).
[9] Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. 2006. Calibrating Noise to Sensitivity in Private Data Analysis. In Third Conference on Theory of Cryptography (TCC).
[10] Georgina Evans and Gary King. 2020. Statistically Valid Inferences from Differentially Private Data Releases with Application to the Facebook URLs Dataset. (2020). https://j.mp/38NrmRW
[11] Facebook. [n.d.]. Fact-Checking on Facebook. ([n.d.]). https://www.facebook.com/business/help/2593586717571940
[12] Jessica T. Feezell and Brittany Ortiz. 2019. 'I saw it on Facebook': An experimental analysis of political learning through social media. Information, Communication & Society (2019), 1-20.
[13] Martin Flintham, Christian Karner, Khaled Bachour, Helen Creswick, Neha Gupta, and Stuart Moran. 2018. Falling for Fake News: Investigating the Consumption of News via Social Media. In CHI Conference on Human Factors in Computing Systems.
[14] Adam Fourney, Miklós Z. Rácz, Gireeja Ranade, Markus Mobius, and Eric Horvitz. 2017. Geographic and Temporal Trends in Fake News Consumption During the 2016 US Presidential Election. In ACM Conference on Information and Knowledge Management.
[15] Deen Freelon. 2018. Computational Research in the Post-API Age. Political Communication 35, 4 (2018), 665-668.
[16] R. Kelly Garrett and Shannon Poulsen. 2019. Flagging Facebook Falsehoods: Self-Identified Humor Warnings Outperform Fact Checker and Peer Warnings. Journal of Computer-Mediated Communication 24, 5 (10 2019), 240-258.
[17] Christine Geeng, Savanna Yee, and Franziska Roesner. 2020. Fake News on Facebook and Twitter: Investigating How People (Don't) Investigate. In ACM Conference on Human Factors in Computing Systems (CHI).
[18] Nir Grinberg, Kenneth Joseph, Lisa Friedland, Briony Swire-Thompson, and David Lazer. 2019. Fake news on Twitter during the 2016 U.S. presidential election. Science 363, 6425 (2019).
[19] Andy Guess, Kevin Aslett, Joshua Tucker, Richard Bonneau, and Jonathan Nagler. 2021. Cracking Open the News Feed: Exploring What U.S. Facebook Users See and Share with Large-Scale Platform Data. Journal of Quantitative Description: Digital Media 1 (April 2021).
[20] Andrew Guess, Jonathan Nagler, and Joshua Tucker. 2019. Less than you think: Prevalence and predictors of fake news dissemination on Facebook. Science Advances 5, 1 (2019).
[21] Andrew M. Guess, Brendan Nyhan, and Jason Reifler. 2020. Exposure to untrustworthy websites in the 2016 US election. Nature Human Behavior 4 (2020), 472-480.
[22] Taylor Hatmaker. 2020. Twitter plans to bring prompts to 'read before you retweet' to all users. (Sept. 2020). https://techcrunch.com/2020/09/24/twitter-read-before-retweet
[23] Eslam Hussein, Prerna Juneja, and Tanushree Mitra. 2020. Measuring Misinformation in Video Search Platforms: An Audit Study on YouTube. Proceedings of the ACM on Human-Computer Interaction 4 (05 2020).
[24] Daniel Jolley and Karen M. Douglas. 2017. Prevention is better than cure: Addressing anti-vaccine conspiracy theories. Journal of Applied Social Psychology 47, 8 (2017), 459-469.
[25] David Lazer, Matthew A. Baum, Yochai Benkler, Adam J. Berinsky, Kelly M. Greenhill, Filippo Menczer, Miriam J. Metzger, Brendan Nyhan, Gordon Pennycook, David Rothschild, Michael Schudson, Steven A. Sloman, Cass R. Sunstein, Emily Thorson, Duncan J. Watts, and Jonathan Zittrain. 2018. The science of fake news. Science 359 (2018), 1094-1096.
[26] Ji Young Lee and S. Shyam Sundar. 2013. To Tweet or to Retweet? That Is the Question for Health Professionals on Twitter. Health Communication 28, 5 (2013), 509-524.
[27] Yanqin Lu. 2019. Incidental Exposure to Political Disagreement on Facebook and Corrective Participation: Unraveling the Effects of Emotional Responses and Issue Relevance. International Journal of Communication 13 (2019).
[28] Binny Mathew, Ritam Dutt, Pawan Goyal, and Animesh Mukherjee. 2019. Spread of Hate Speech in Online Social Media. In 10th ACM Conference on Web Science (WebSci).
[29] Solomon Messing, Christina DeGregorio, Bennett Hillenbrand, Gary King, Saurav Mahanti, Zagreb Mukerjee, Chaya Nayak, Nate Persily, Bogdan State, and Arjun Wilkins. 2020. Facebook Privacy-Protected Full URLs Data Set. Harvard Dataverse (2020). https://doi.org/10.7910/DVN/TDOAPG
[30] Mia Moody-Ramirez and Andrew B. Church. 2019. Analysis of Facebook Meme Groups Used During the 2016 US Presidential Election. Social Media + Society 5, 1 (2019).
[31] NewsGuard. [n.d.]. NewsGuard: Restoring Trust & Accountability. ([n.d.]). https://www.newsguardtech.com
[32] Donie O'Sullivan. 2020. Facebook fact-checkers to Trump supporters: We are not trying to censor you. (Oct. 2020). https://www.cnn.com/2020/10/29/tech/fact-checkers-facebook-trump/index.html
[33] Gordon Pennycook, Adam Bear, Evan Collins, and David G. Rand. 2020. The Implied Truth Effect: Attaching Warnings to a Subset of Fake News Headlines Increases Perceived Accuracy of Headlines Without Warnings. Management Science (Feb. 2020).
[34] Quintly. 2018. Changes to Facebook's API - February 6th, 2018. (Feb. 2018). https://support.quintly.com/hc/en-us/articles/115004414274-Changes-to-Facebook-s-API-February-6th-2018
[35] Gustavo Resende, Philipe Melo, Hugo Sousa, Johnnatan Messias, Marisa Vasconcelos, Jussara Almeida, and Fabrício Benevenuto. 2019. (Mis)Information Dissemination in WhatsApp: Gathering, Analyzing and Countermeasures. In The World Wide Web Conference (WWW).
[36] Mattia Samory and Tanushree Mitra. 2018. Conspiracies Online: User discussions in a Conspiracy Community Following Dramatic Events. In International AAAI Conference on Web and Social Media.
[37] Ana Lucía Schmidt, Fabiana Zollo, Michela Del Vicario, Alessandro Bessi, Antonio Scala, Guido Caldarelli, H. Eugene Stanley, and Walter Quattrociocchi. 2017. Anatomy of news consumption on Facebook. Proceedings of the National Academy of Sciences 114, 12 (2017), 3035-3039.
[38] Chengcheng Shao, Giovanni Luca Ciampaglia, Onur Varol, Kaicheng Yang, Alessandro Flammini, and Filippo Menczer. 2018. The spread of low-credibility content by social bots. In Nature Communications.
[39] Elisa Shearer and Elizabeth Grieco. 2019. Americans are wary of the role social media sites play in delivering the news. Pew Research Center (Oct. 2019). https://www.journalism.org/2019/10/02/americans-are-wary-of-the-role-social-media-sites-play-in-delivering-the-news

[40] Social Science One [n d] Building Industry-Academic Partnerships ([n d])httpssocialscienceone

[41] Social Science Research Council [n d] Social Media and Democracy ResearchGrants ([n d]) httpswwwssrcorgfellowshipsviewsocial-media-and-democracy-research-grants

[42] Olivia Solon 2020 Sensitive to claims of bias Facebook relaxed misinformationrules for conservative pages (Aug 2020) httpswwwnbcnewscomtechtech-newssensitive-claims-bias-facebook-relaxed-misinformation-rules-conservative-pages-n1236182

[43] Samuel Spies 2020 How Misinformation Spreads (July 2020) httpsmediawellssrcorgliterature-reviewshow-misinformation-spreadsversions1-0

[44] Kate Starbird Ahmer Arif Tom Wilson Katherine Van Koevering Katya Yefi-mova and Daniel Scarnecchia 2018 Ecosystem or Echo-System ExploringContent Sharing across Alternative Media Domains In International AAAI Con-ference on Web and Social Media (ICWSM)

[45] Edson C Tandoc Jr 2019 Tell Me Who Your Sources Are Journalism Practice13 2 (2019) 178ndash190

[46] Rebekah Tromble Andreas Storz and Daniela Stockmann 2017 We Donrsquot KnowWhat We Donrsquot Know When and How the Use of Twitterrsquos Public APIs BiasesScientific Inference (Nov 2017) httpsssrncomabstract=3079927

[47] Michela Del Vicario Alessandro Bessi Fabiana Zollo Fabio Petroni AntonioScala Guido Caldarelli H Eugene Stanley and Walter Quattrociocchi 2016 Thespreading of misinformation online Proceedings of the National Academy ofSciences 113 3 (2016) 554ndash559

[48] Soroush Vosoughi Deb Roy and Sinan Aral 2018 The spread of true and falsenews online Science 359 6380 (2018) 1146ndash1151

[49] Claire Wardle and Hossein Derakhshan 2017 Information Disorder Toward aninterdisciplinary framework for research and policymaking Technical ReportCouncil of Europe

[50] Angela Xiao Wu and Harsh Taneja 2020 Platform enclosure of human behaviorand its measurement Using behavioral trace data against platform epistemeNew Media amp Society (2020)

[51] Eric Zeng Tadayoshi Kohno and Franziska Roesner 2020 Bad News Clickbaitand Deceptive Ads on News and Misinformation Websites In Workshop onTechnology and Consumer Protection



Specifically, we investigate the following research questions, focused on U.S. Facebook users:

(1) RQ1: Exposure to and interactions with potential mis/disinformation on Facebook. How, and how much, did U.S. Facebook users in our dataset see and interact with known or potential mis/disinformation URLs?

(2) RQ2: Demographic differences. What differences, if any, exist in how different demographic groups interacted with U.S.-shared mis/disinformation URLs in our dataset, including people of different genders, age groups, and U.S. political inclinations?

To investigate these questions, we must overcome several challenges. First, we must grapple with the question of "what is mis/disinformation?" in the first place. In the absence of ground truth, we use existing identifications of known or potential mis/disinformation URLs: Facebook's own third-party fact-checking labels, and a list of low-quality potential mis/disinformation domains previously compiled by other researchers [51]. We must also consider several limitations of the dataset, including its privacy-preserving noisy properties and limited scope, as well as more fundamental limitations on the ability of platform data (where behaviors are influenced by a platform's design) to shed light on human behavior in general [50]. We thus focus on an exploratory analysis that describes what we find in the data; we do not aim to make generalizable claims about people in general.

Among our results, we find that in 2017-2019, known and potential mis/disinformation drew substantial user engagement on Facebook, and that large fractions of users re-shared posts containing (any) URLs without first clicking on them. We also find that older adults with more conservative U.S. political leanings were exposed to, clicked on, and re-shared potential mis/disinformation more than other groups. However, the largest differences are in who was exposed to (i.e., shown) these posts in the first place, which may reflect only partly on explicit user behaviors and instead result from Facebook's newsfeed algorithm.

Based on our findings, we reflect on the implications for platform-based mis/disinformation defenses (e.g., the importance of preventing initial exposure) as well as future research (e.g., towards defenses that reduce susceptibility and spread after exposure). Overall, given this unique dataset, our results provide a view into U.S. users' exposure to and interactions with potential mis/disinformation on Facebook in 2017-2019 that could not previously be studied at this scale.

2 BACKGROUND AND RELATED WORK
Fully systematizing the broad and growing research space around online mis/disinformation is beyond our scope, but we point to several summaries [25, 43, 49] and highlight some of the most related examples. We present a more detailed comparison of our findings to prior work in Section 5.

Ecosystem and Platform Studies. Methodologically, our work is most closely related to prior studies of user behaviors based on social media platform data directly. There are several important limitations of this methodology, including limitations of platform APIs [43]. Notably, Facebook's API is limited and has become more so over time [15, 34], though Twitter's API also has limitations [46].

Moreover, human behaviors on specific platforms are shaped by the designs of those platforms, and thus findings from these contexts should not be over-generalized [50].

Nevertheless, studying data from the platforms themselves gives important insights into how mis/disinformation spreads in practice. Due to its relatively open API, many of these studies have focused on Twitter [1, 6, 7, 18, 38, 44, 48], though significant work has also studied user behaviors on Facebook [12, 13, 16, 20, 27, 45], as well as on platforms like Reddit, WhatsApp, YouTube, and others [4, 14, 21, 23, 28, 35, 36].

Studies of Facebook tend to predate the early 2018 API restrictions and/or focus on specific group or topic case studies, e.g., public Facebook groups on known conspiracy theories or Facebook pages of known news outlets [30, 47]. Our work provides insights from a broader view of Facebook data that was made available through Facebook's partnership with Social Science One [29]. Research on this dataset is beginning to appear [19]; our investigation complements this recent work, confirming some of its findings with a different view on mis/disinformation as well as providing new results.

User Studies. Other work has studied people directly, conducting survey, controlled lab, or in-depth qualitative studies to test and/or observe interactions with mis/disinformation on social media platforms. As above, these studies have focused primarily on Facebook and/or Twitter [12, 13, 16, 17, 20, 26, 27, 45] or surveyed people about their general social media usage [39]. Findings about the effectiveness of interventions on social media platforms (e.g., "fake news" labels) have been mixed [5, 13, 17, 24, 33].

3 METHODS
3.1 Dataset Overview
The Facebook URL dataset that we analyze was made available to us via an application process in a partnership between the Social Science Research Council, Social Science One, and Facebook [41]. As per our Research Data Agreement with Facebook, we accessed this dataset only via Facebook-provided servers and did not download a copy of the dataset. Between the original public description of the dataset [29] and our analysis in May 2021, Facebook updated the dataset (at least) to cover a longer time period (through March 2020 at the time of our analysis). Since we did not download or save a previous version of the dataset, we cannot verify exactly what changes have been made over time. The results in this paper are based on the version of the dataset that was available to us on Facebook's servers as of early May 2021, which we describe here.

The version of the dataset we analyzed in May 2021 consisted of 41.7 million URLs from 1.1 million unique parent domains. This includes only URLs that were shared or posted publicly on Facebook at least 100 times (with Laplace(5) noise). The included URLs were posted between January 2011 and March 2020.¹ The dataset also includes 17 trillion datapoints of user interactions with those URLs (e.g., clicks, re-shares) from January 2017 through December 2019, associated with some noisy (privacy-preserving) demographic information. Data about repeated actions (e.g., subsequent clicks from users who clicked on a URL more than once) or about users who later deleted their Facebook accounts is not included.

¹ Though the bulk of the URL data and all of the user interaction data is from the 2017-2019 time range, some URLs shared during that period were first posted as early as January 2011, as reflected by the first_post_time attribute in the dataset.
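To make the URL-inclusion rule concrete, here is a minimal sketch of how a Laplace-noised share-count threshold of the kind described above could work. The function, parameter names, and random seed are our own illustration (assuming Laplace noise with scale 5 added to the true public share count); this is not Facebook's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def included_in_dataset(public_share_count: int,
                        threshold: int = 100,
                        laplace_scale: float = 5.0) -> bool:
    """Illustrative check: a URL is included if its public share count,
    plus Laplace(scale=5) noise, meets the threshold of 100.
    (Hypothetical re-implementation of the rule described in the text.)"""
    noisy_count = public_share_count + rng.laplace(loc=0.0, scale=laplace_scale)
    return noisy_count >= threshold

# URLs shared publicly just under or over 100 times may fall on either
# side of the cutoff once noise is added.
for shares in (90, 99, 100, 110):
    print(shares, included_in_dataset(shares))
```

One consequence of this design is that membership in the dataset near the 100-share boundary is itself randomized, which is part of why a long tail of unpopular URLs is excluded only approximately.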

Our Scope and Baseline. We only consider URL data during the time period for which we have corresponding user interaction data: January 2017 through December 2019. We also consider only interactions from U.S. users (for whom there is additional demographic information, discussed below), and we scope the URL list to those URLs that are labeled as most-shared in the U.S. (as opposed to in other countries). Our final subset thus consists of 9,172,097 URLs from 216,137 unique parent domains (and associated user interaction data). We refer to this set of URLs as our baseline "U.S. URLs" subset.

We use this subset of all U.S. URLs as a baseline against which we compare user interactions with known and potential mis/disinformation. We use this full subset, rather than a more specific baseline (e.g., mainstream or trustworthy news sources), because this allows us to consider the "average" U.S. URL on Facebook without imposing any value judgements on what is trustworthy or otherwise worth including in a baseline. This full set of U.S. URLs likely includes a wide variety of types of content (which future work may wish to separate out further), but we note that because the dataset includes only URLs that were shared publicly 100 or more times (with Laplace(5) noise), a potentially long tail of unpopular websites is excluded by definition.

Demographic Data. The dataset includes demographic information associated with URL interactions: age bracket, gender, and "political page affinity" (PPA). The latter refers to U.S. political leaning and is available only for U.S. users. It is calculated by analyzing the pages that users follow, based on methods from Barberá et al. [3]. The dataset documentation does not detail further how PPA is calculated here, but the end result maps users into five integer buckets, from -2 (ideological left, i.e., liberal) to 2 (ideological right, i.e., conservative). For simplicity, and to avoid suggesting conclusions where we draw none, we omit demographic categories with substantially less data (i.e., "other" for gender and "NA" for age) from our figures and analysis.

Differential Privacy. In order to release this dataset while protecting the privacy of the users whose data is represented, Facebook applied differential privacy techniques to add noise to the data [9, 10, 29]. Specifically, the dataset has been made differentially private on the level of user interactions, meaning that the dataset would look statistically the same whether or not any particular user interaction is actually included. Theoretically, this gives plausible deniability about any particular action by any particular user, while still accurately reflecting behavior statistically at this large scale.

In practice, differential privacy here is accomplished by adding Gaussian noise to certain columns in the dataset. In our results, we thus display calculated (noisy) values as well as 95% confidence intervals. Useful signal-to-noise properties apply: though statistics computed over larger samples include more noise in absolute terms, that noise will generally be smaller relative to the true value.
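As an illustration of how such intervals can be computed, the sketch below derives a 95% confidence interval for an aggregate built from Gaussian-noised rows. The noise scale `sigma_per_row` and the example values are placeholders; the dataset's actual per-column noise parameters are described in its documentation [10, 29], not here. The example also makes the signal-to-noise point above concrete: the noise on a sum of N rows grows only with the square root of N, while the underlying value typically grows with N.

```python
import math

def ci_95(noisy_value: float, sigma_per_row: float, n_rows_aggregated: int) -> tuple:
    """95% confidence interval for a sum of independently noised rows.

    Each aggregated row carries independent Gaussian noise, so the noise on the
    sum has standard deviation sigma_per_row * sqrt(n_rows_aggregated).
    sigma_per_row is a placeholder, not the dataset's actual parameter.
    """
    sigma_total = sigma_per_row * math.sqrt(n_rows_aggregated)
    margin = 1.96 * sigma_total  # 95% two-sided normal quantile
    return (noisy_value - margin, noisy_value + margin)

# Example: a noisy view count aggregated over one million rows.
low, high = ci_95(noisy_value=2.22e10, sigma_per_row=10.0, n_rows_aggregated=1_000_000)
print(f"[{low:.3e}, {high:.3e}]")
```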

Because noise was applied only to data columns related to user activity and demographics, URL attributes such as post times and fact-check ratings are precise.

3.2 Known and Potential Mis/Disinformation URL Subsets

To investigate our research questions, we need a list of mis/disinformation domains and/or URLs. However, compiling such a list is a major research challenge of its own. Since addressing that challenge is not a goal of this work, we rely on two existing characterizations: one that is likely too narrow, and one that is likely too broad.

(1) Third-Party Fact-Checked False List ("TPFC False US"). Our most straightforward source of "false" URLs are those flagged directly by Facebook in collaboration with its third-party fact-checking partners. This process focuses primarily on "viral" misinformation (i.e., popular posts) as well as clear falsehoods. Posts of flagged URLs are labeled in Facebook's UIs and/or downranked in its newsfeed algorithm [11]. Specifically, we focus on U.S. URLs marked as "fact checked as false" or "fact checked as mixture or false headline" (as opposed to, e.g., "satire"). There were 6,838 U.S. URLs with such ratings, from 2,644 unique parent domains.

The advantage of this list is our high confidence in the "false" labels. The disadvantages include that inclusion in the list depends on decisions made by Facebook and its partners (e.g., a focus on viral content, and limited coverage due to limited resources). Moreover, Facebook's fact-checking choices are not entirely transparent and have been criticized for being both potentially too harsh and/or too lenient on both U.S. politically left- and right-leaning content [32, 42]. This list thus does not represent a labeling of a random sample of URLs, and exclusion from this list does not imply truth.

(2) Broad Potential Misinformation List ("Low-Quality News US"). Since Facebook's fact-checked URL list is a narrow view of potential mis/disinformation, we also consider a much broader view of low-quality news. Specifically, we consider a subset of potential mis/disinformation domains subsampled from a list from Zeng et al. [51]. This list (of which we obtained an updated version in October 2020) includes 1,344 domains compiled from amateur, open source, and professional content checkers. Since this list includes a broad range of low-quality news domains, we subsampled based on labels only from professional fact-checking sources: Snopes, FactCheck.org, and Politifact. The resulting subsample contains 120 domains, 103 of which were present in the Facebook dataset and corresponded to 108,408 U.S. URLs.
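The subsampling step amounts to a simple filter over the labeled domain list. The sketch below illustrates the idea with hypothetical field names and example domains; the actual format of the Zeng et al. [51] list may differ.

```python
# Hypothetical structure for the low-quality news list: each entry records
# the domain and which checkers flagged it.
low_quality_list = [
    {"domain": "example-junk-news.com", "sources": ["Snopes", "amateur-list"]},
    {"domain": "example-clickbait.net", "sources": ["open-source-list"]},
    {"domain": "example-hoax.org", "sources": ["Politifact"]},
]

PROFESSIONAL_SOURCES = {"Snopes", "FactCheck.org", "Politifact"}

# Keep only domains labeled by at least one professional fact-checking source.
subsample = [
    entry["domain"]
    for entry in low_quality_list
    if PROFESSIONAL_SOURCES & set(entry["sources"])
]
print(subsample)  # ['example-junk-news.com', 'example-hoax.org']
```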

The disadvantage of this list is that it may be overly general: not all articles on a domain may contain falsehoods, and thus we can consider these URLs only potential mis/disinformation or, more generally, low-quality news. However, this list provides us with a much broader perspective that is not influenced by Facebook's fact-checking decisions.

3.3 Ethics
Our study was reviewed by our university's human subjects review board (IRB). Because the dataset does not include identifiable information about individuals, our IRB determined that this study did not classify as human subjects research. Nevertheless, we treated the data with caution and adhered to Facebook's guidelines about its use: accessing it only on Facebook-provided servers, making no efforts to de-anonymize or identify any individuals, and submitting the paper to Facebook to review for potential privacy issues in advance of publication. (No changes to the paper were requested by Facebook.)

3.4 Limitations
First, as discussed, we lack ground truth for mis/disinformation, and instead consider both a narrow (the TPFC False URL list) and a broad (the low-quality news URL list) view for this exploratory study.

Second, the noisy, privacy-preserving properties of the dataset limit us to questions about interaction data in aggregate (not individual users or actions). The fact that the dataset (also for privacy reasons) includes only data about URLs that were shared publicly by enough people on Facebook also means there may be a selection bias towards certain URLs (those that tend to be shared publicly and widely) and certain Facebook users (those who make non-private posts).

Third, there is other data that would provide useful context that we simply do not have. For example, we lack demographic information about who posted URLs in the first place; the "share" interactions consist only of re-shares of posts. We also cannot compare posts containing URLs (reflected in this dataset) with posts that do not contain URLs.

Moreover, the fact that the dataset is not public and is available only through an application process and official agreement (though this helps protect the privacy of the Facebook users whose data is represented) makes it challenging for others to reproduce our results. The fact that we only access the data on Facebook's servers (per our agreement with Facebook), where the dataset may be updated or our access revoked at any time, also makes it potentially challenging for us (and others) to reproduce our own results in the future. As discussed, the May 2021 version of the dataset we analyzed includes more records than described in the June 2020 public description of the dataset [29].

Finally, we cannot separate the role of Facebook's newsfeed algorithm and other design features from user behaviors [50]. For example, when we consider URL views, we cannot distinguish whether a user saw a post because they chose to follow a relevant public Facebook page, because their friend posted it, or because Facebook pushed a sponsored post to their feed. We must assume that Facebook's algorithm attempts to push posts to the people most likely to engage with them. Similarly, we lack details about how U.S. political leaning (PPA) was calculated, so we cannot distinguish whether correlations between PPA and certain behaviors (e.g., which URLs a user engages with) are by definition or emergent.

Still, this large Facebook dataset provides a unique opportunity to study user interactions with potential mis/disinformation on the platform itself, providing a complementary perspective to other methodologies with other limitations. In our results, we focus on describing what we find in this dataset about how people on Facebook interacted with URLs in 2017-2019; we do not intend to make generalizable claims about human behaviors in general.

4 FINDINGS
We first characterize our URL subsets and then study how Facebook users interacted with mis/disinformation and low-quality news URLs, comparing to a baseline of all U.S.-shared URLs and across demographic groups.

Parent Domain          Unique URLs   TPFC-F   Total Views
foxnews.com            67.2K         7        [2.22e10, 2.23e10]
cnn.com                76.1K         0        [2.16e10, 2.17e10]
nytimes.com            71.2K         1        [2.11e10, 2.12e10]
buzzfeed.com           27.8K         0        [1.74e10, 1.75e10]
npr.org                25.4K         0        [1.70e10, 1.70e10]
youtube.com            863.6K        163      [1.50e10, 1.54e10]
washingtonpost.com     60.8K         1        [1.26e10, 1.27e10]
huffingtonpost.com     38.6K         0        [1.25e10, 1.26e10]
dailywire.com          27.1K         20       [1.13e10, 1.14e10]
nbcnews.com            33.1K         0        [1.01e10, 1.02e10]

Table 1: U.S. Subset Domains with Most Views. TPFC-F refers to the number of URLs from that domain that Facebook's third-party fact-checkers labeled as false. Given the differential privacy noise applied to the data, view counts in Tables 1-3 are presented as 95% confidence intervals; overlapping ranges mean that we cannot distinguish the underlying non-private data points. Thus, the ordering in these tables (by view count) is approximate.

Parent Domain              Unique URLs   TPFC-F   Total Views
higherperspectives.com     830           2        [1.34e9, 1.35e9]
dailysnark.com             3.7K          4        [1.08e9, 1.10e9]
madworldnews.com           4.8K          4        [5.69e8, 5.99e8]
awarenessact.com           2.5K          10       [5.19e8, 5.39e8]
disclose.tv                3.7K          6        [4.65e8, 4.91e8]
thegatewaypundit.com       15.0K         42       [4.36e8, 4.82e8]
thepoliticalinsider.com    5.1K          2        [3.51e8, 3.78e8]
infowars.com               8.9K          25       [3.38e8, 3.77e8]
worldnewsdailyreport.com   430           11       [3.43e8, 3.53e8]
conservativepost.com       2.0K          8        [3.12e8, 3.33e8]

Table 2: Low-Quality News Domains with Most Views.

Parent Domain           Unique URLs   TPFC-F   Total TPFC-F Views
youtube.com             863.6K        163      [5.32e7, 5.70e7]
livebr0adcast.com       156           130      [5.56e5, 2.00e6]
yournewswire.com        2.6K          81       [6.71e7, 7.10e7]
overseasdaily.com       132           78       [3.34e6, 4.96e6]
actual-eventstv.com     105           76       [4.10e5, 1.51e6]
channel23news.com       238           61       [1.60e7, 1.99e7]
wfrv9.com               63            59       [1.04e6, 4.94e6]
dailyusaupdate.com      481           55       [6.13e6, 9.97e6]
actual-events.com       139           53       [7.90e4, 9.83e5]
network-channel.com     138           50       [1.14e3, 6.94e5]

Table 3: Domains with Most TPFC False U.S. URLs.

4.1 Characterizing Our URL Subsets
Tables 1 and 2 show the view counts for the top ten most-viewed domains in our baseline U.S. and our low-quality news URL subsets, respectively. Note that "views" here refers to instances when a user saw a Facebook post containing a URL appear in their newsfeed, not instances where a user clicked through to the target of the URL. The tables also show how many unique URLs from each domain appear in that subset, as well as how many URLs were fact-checked by Facebook's partners as false (the "TPFC-F" column).

We highlight several observations. First, the most popular U.S.-shared domains in our dataset (Table 1) corresponded largely to well-known mainstream news sites, though we note the outsize popularity of youtube.com. Second, we note that the most popular U.S.-shared domains (Table 1) and the most popular domains from our low-quality news list (Table 2) are disjoint. That is, potential mis/disinformation domains in the form of low-quality news sites were not among the most popular shared domains.

Table 3 shows the domains in our dataset with the most URLs that were third-party fact-checked as false, and the view counts for those URLs. This table reveals the comparatively small scope of Facebook's third-party fact-checking labels, which applied only to a small number of URLs, most of which (except youtube.com) came from domains that appeared with relatively few unique URLs on Facebook.

Figure 1: Average Cumulative View Counts Over Time for URLs in Different Subsets. 95% confidence intervals are shown.

Figure 2: Percentage of URL Views in Each URL Subset, by (a) Gender, (b) Age Group, (c) Political Page Affinity. 95% confidence intervals are shown.

4.2 Views (Exposure): Who Saw What

Overall Views (Exposure). We begin with a core question: how much were people exposed to (i.e., shown on their newsfeed) URLs in our different mis/disinformation subsets, compared to the baseline dataset of all U.S. URLs? In exploring this question, we emphasize again that Facebook plays a role in who sees which posts. In other words, trends we surface below may reflect Facebook's newsfeed algorithm rather than (or in addition to) differences in human behaviors (e.g., which Facebook pages they follow) [50].

To investigate views, we return to Tables 1 and 2, which show the total number of views across all URLs from each domain. We find that URLs from the most popular U.S. domains overall received an order of magnitude more views than URLs from popular domains in the low-quality news subset (though the latter still received large numbers of views). We also note that the total TPFC False URL views for the domains in Table 3 were substantial considering the smaller number of TPFC False URLs: just dozens of TPFC False URLs produced several million views (perhaps by definition, given that viral misinformation is prioritized for Facebook's fact-checking).

Spread Over Time. We also ask: how quickly did URLs in the different subsets spread? In Figure 1, we investigate the cumulative views over time, averaged across URLs, for each of our subsets. Since the dataset provides timestamp information only at month-level granularity, we can track only cumulative monthly views. We find that the average URL (in all subsets) spreads most quickly within the first month after the initial post, suggesting that mis/disinformation interventions must be deployed quickly to be effective.
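A minimal sketch of the computation behind Figure 1, assuming a table with one row per URL and month and a (noisy) view count; the column names and toy values are placeholders, not the dataset's actual schema or results.

```python
import pandas as pd

# Placeholder schema: one row per (url, months_since_first_post) with noisy views.
views = pd.DataFrame({
    "url": ["a", "a", "a", "b", "b", "b"],
    "months_since_first_post": [0, 1, 2, 0, 1, 2],
    "noisy_views": [900, 150, 50, 4000, 600, 200],
})

# Cumulative views per URL over time, then averaged across URLs in the subset.
views = views.sort_values(["url", "months_since_first_post"])
views["cumulative_views"] = views.groupby("url")["noisy_views"].cumsum()
avg_cumulative = views.groupby("months_since_first_post")["cumulative_views"].mean()
print(avg_cumulative)  # in this toy example, most growth happens in month 0
```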

From Figure 1, we also see that URLs in the TPFC False subset received significantly more views on average than those in other subsets (hence likely making them targets for fact-checking). In addition, TPFC False URLs were longer-lived, receiving a greater fraction of views after the first month compared to the others (nearly doubling in cumulative views). Unfortunately, the granularity of the timestamps in the dataset prevents us from investigating whether spread decreased substantially after the time of fact-checking.

Demographic Differences in Exposure. We now turn to our second research question, considering different demographic groups (gender, age, and political leaning). While we do not know the overall demographic breakdown of Facebook users in general, we can compare demographics in interactions with our U.S. URLs baseline to interactions with our low-quality news and TPFC False URL subsets.

Figure 2 shows the percentage of views from users in different demographic categories for the average URL in each URL subset. For example, considering the TPFC False URL subset, we find that on average (i.e., averaged across all URLs in this subset), more than 20% of views came from users ages 65 or older. Taking all U.S. URLs as our baseline, we find that known or potential mis/disinformation URLs in both lists were shown more often (on average) than baseline to male users, to users ages 55 and older, and to users with a political page affinity of 1 or more (i.e., right-leaning).

However, the breakdown in Figure 2 obscures potential interactions between demographic categories (for example, older users may also tend to be more conservative). To tease apart these potential interactions, Figure 3 shows heatmaps of how often different demographic groups were shown posts containing these URLs.


Figure 3: Percentage of URL Views by Demographic Groups (Age and Political Page Affinity) for Each URL Subset. Percentages are presented as ranges with 95% confidence intervals due to differential privacy noise.

Each heatmap corresponds to a URL subset, and each square of each heatmap represents a different demographic intersection defined by the tuple (age bucket, political page affinity). (Since gender did not have an impact on our conclusions, we collapse gender for readability of the heatmap.) The percentages represent the fraction of views that an average URL-containing post received from each demographic group (i.e., for each heatmap, the percentages sum to 100%).

Figure 3 thus compares how frequently, relative to each other, different overlapping demographic groups were shown posts containing URLs. For both of our mis/disinformation URL subsets, we find that U.S. users who are older and more politically right-leaning were more likely to see mis/disinformation URLs in their feeds. The top baseline heatmap, by contrast, shows a larger percentage of views from younger and farther left-leaning users; in other words, the trends for low-quality news and fact-checked false URLs were not merely reflections of overall trends in everyone's Facebook usage and/or experience.

4.3 Clicking: Who Clicked What
When a Facebook user saw a post containing a URL, what did they do? Figure 4 shows a variety of behaviors, aggregated across all users in our dataset, for each URL subset. In this subsection, we begin by considering clicking behaviors. Unlike views, which can result from Facebook pushing content to users, clicks are actions taken directly by users (though still predicated on having seen the post in the first place).

Overall Clicking Behaviors. Figure 4's set of "clicks" bars shows average click-through rates, i.e., average clicks-per-view. We find that, compared to the baseline of all U.S. URLs, which users clicked on roughly 6% of the time they saw them, URLs in our mis/disinformation sets were clicked on more often: roughly 8-9% of the time. In other words, the average U.S. user who saw a potential mis/disinformation URL was more likely to click on it than the average user who encountered any random link on Facebook. We note that this comparison depends on our choice of baseline: comparing to the "average" U.S.-shared URL, the increased engagement with potential mis/disinformation here may result not from mis/disinformation tactics but (for example) because news-style content receives more engagement in general.

Demographic Differences in Click-Through Rate. The heatmaps in Figure 5 break down the click-through rates for different demographic groups and URL subsets. Here we show a baseline click-through rate heatmap on top: the percentages (with 95% confidence intervals) represent the average click-through rate per demographic intersection for all U.S.-shared URLs. The next two heatmaps show differences in click-through rate from the baseline: for each demographic intersection, we calculated the ratio of the click-through rate to the baseline click-through rate. (For example, a value of 1 would indicate no change, while a value of 2 would indicate that the click-through rate doubled.)
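The difference heatmaps can be assembled roughly as follows, assuming per-(age, PPA) click and view aggregates for each URL subset. The column names and numbers below are illustrative placeholders, not the dataset's schema or our results.

```python
import pandas as pd

# Toy per-demographic aggregates (illustrative values only).
rows = [
    # (subset, age bucket, PPA, noisy views, noisy clicks)
    ("US baseline", "18-24", -2, 1000.0, 55.0),
    ("US baseline", "18-24",  2, 1000.0, 60.0),
    ("US baseline", "65+",   -2, 1000.0, 80.0),
    ("US baseline", "65+",    2, 1000.0, 85.0),
    ("TPFC False",  "18-24", -2,  100.0,  7.0),
    ("TPFC False",  "18-24",  2,  100.0,  8.0),
    ("TPFC False",  "65+",   -2,  100.0, 11.0),
    ("TPFC False",  "65+",    2,  100.0, 12.0),
]
df = pd.DataFrame(rows, columns=["subset", "age", "ppa", "views", "clicks"])
df["ctr"] = df["clicks"] / df["views"]

# One (age x PPA) heatmap per subset, then divide by the baseline heatmap to
# get the ratio-to-baseline values shown in the difference heatmaps.
heatmaps = {
    subset: grp.pivot(index="age", columns="ppa", values="ctr")
    for subset, grp in df.groupby("subset")
}
ratio_to_baseline = heatmaps["TPFC False"] / heatmaps["US baseline"]
print(ratio_to_baseline)
```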

To our surprise, the demographic trends observed when considering views (above) do not fully hold here. In addition to baseline click-through rate behavior being roughly comparable on both sides of the political spectrum, we see that for the TPFC False and low-quality news URLs, differences in click-through rates compared to the baseline were not skewed by political page affinity. That is, despite the demographic differences in who saw these posts, political leaning did not seem to impact click-through rates for known or potential mis/disinformation URLs.

Figure 4: Average Actions per View for URLs in Each Subset. 95% confidence intervals are shown.

We do continue to see differences by age, though not uniquely for mis/disinformation: older adults were more likely to click on URLs in general (top heatmap), without additional age-related differences for potential mis/disinformation URLs.

We cannot distinguish why people clicked on links, or the impacts of reading the underlying content (e.g., perhaps politically left-leaning users click on right-leaning links to debunk them). Still, this finding suggests that political differences in engagement with misinformation reported in prior research may be partially due to who is exposed by the platform in the first place, rather than more fundamental differences between groups. Moreover, we emphasize that this exposure is not random: Facebook's newsfeed algorithm may optimize showing these posts to precisely those users more likely to click on them, and our findings may reflect the success (or at least uniformity) of Facebook's algorithm in doing just that. Still, this result underscores the importance of limiting initial exposure to mis/disinformation, as we discuss further in Section 5.

4.4 Sharing: Who Shared What
We now consider URL sharing behaviors. Our dataset includes two types of interactions: shares, which involve users re-sharing a Facebook post containing a URL after clicking on it, and shares without clicks, where users re-share a post containing a URL without first clicking on the URL. These two metrics are mutually exclusive, meaning that "shares" refers only to shares with clicks. We also emphasize that our interaction data (with associated demographic information) includes only re-shares, not original posts of a URL.

Overall Sharing Behaviors. Figure 4 shows that mis/disinformation URLs garnered more shares, and more shares without clicks, per view than average U.S.-shared URLs. Share rates are particularly high for TPFC False URLs. Since Facebook prioritizes fact-checking viral mis/disinformation, this finding may reflect in part how the TPFC False subset was selected in the first place. However, note that the sharing rates for the low-quality news URLs approached the rates for these noteworthy TPFC False URLs, i.e., they also drew substantial engagement (per view). We emphasize again that the comparison to baseline depends on our choice of a broad baseline: we see that (these) potential mis/disinformation URLs spread more efficiently than other U.S. URLs on average, but we leave to future work more detailed comparisons against other potential URL subsets of interest (e.g., mainstream news).

Across all URL subsets, we were surprised at the high rate of shares without clicks. Considering share counts (rather than shares-per-view), nearly half of all shares for all URL subsets were made by Facebook users who had not clicked on the URL they were re-sharing. Specifically, the percentage of shares without clicks for the average URL was (with 95% confidence intervals) 42.29-42.35% in the overall U.S.-shared dataset, 39.81-40.18% for TPFC False, and 41.66-42.13% for low-quality news. The fact that these ratios are similar across all URL subsets suggests that there were not necessarily URL- or content-based differences driving this behavior.
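For reference, the share-without-click percentage is a simple ratio of two (noisy) share counts; a minimal sketch with placeholder column names and toy values:

```python
import pandas as pd

# Placeholder per-URL aggregates: re-shares with and without a preceding click.
urls = pd.DataFrame({
    "shares_with_click": [120, 40, 300],
    "shares_without_click": [100, 25, 210],
})

total_shares = urls["shares_with_click"] + urls["shares_without_click"]
pct_without_click = urls["shares_without_click"] / total_shares * 100

# Percentage of shares without clicks for the average URL in this toy subset.
print(round(pct_without_click.mean(), 2))
```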

Demographic Differences in Sharing URLs. We next consider the rates at which different groups shared the URLs they saw. Figure 6 is constructed like the click-through rate heatmaps (with a top baseline and differences from that baseline below), now considering the average shares-per-view ratio per demographic intersection. Here we consider both shares with and without clicks, i.e., all re-shares. Note that, as with clicks, we cannot distinguish the goals of a share (e.g., sharing because one agrees with the article, or to debunk it).

Unlike for click-throughs, we again see a trend towards higher sharing rates (given that a user is exposed) of mis/disinformation URLs by politically right-leaning users, particularly of TPFC False URLs. Unlike for views, here the impact of political leaning seems to dominate compared to age: younger right-leaning users were also more likely to share these URLs compared to baseline. At the same time, we find that older Facebook users were slightly more likely to share URLs in general, regardless of whether they are mis/disinformation.

Finally, we investigate specifically what was shared by Facebook users in different demographic buckets. Figure 7 considers the top 50 most-viewed domains from our baseline and low-quality news URL subsets, plotting the average age and political leaning of users who shared URLs from those domains. We have labeled some example domains, including a cluster of potential mis/disinformation URLs shared mostly by older, politically conservative users, as well as examples on the political left. (The TPFC False URL subset is omitted from this comparison because there are not enough flagged URLs per most-viewed TPFC False domain to reduce noise via averaging sufficiently to allow for meaningful comparison.)
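Figure 7 positions each domain at the share-weighted average of its sharers' demographic buckets. The sketch below illustrates one way to compute such averages, under assumed bucket midpoints and column names; it is not the exact procedure or data behind the figure.

```python
import pandas as pd

# Toy rows: noisy share counts per (domain, age bucket midpoint, PPA bucket).
shares = pd.DataFrame({
    "domain": ["exampledomain.com"] * 3,
    "age_midpoint": [21.0, 45.0, 70.0],  # hypothetical bucket midpoints
    "ppa": [-1, 1, 2],
    "noisy_shares": [50.0, 200.0, 350.0],
})

# Share-weighted average age and PPA per domain.
summary = (
    shares.assign(
        weighted_age=shares["age_midpoint"] * shares["noisy_shares"],
        weighted_ppa=shares["ppa"] * shares["noisy_shares"],
    )
    .groupby("domain")[["weighted_age", "weighted_ppa", "noisy_shares"]]
    .sum()
)
summary["avg_age"] = summary["weighted_age"] / summary["noisy_shares"]
summary["avg_ppa"] = summary["weighted_ppa"] / summary["noisy_shares"]
print(summary[["avg_age", "avg_ppa"]])
```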


(a) The top heatmap shows baseline click-through rates (clicks per view) by demographic group for all U.S.-shared URLs.

(b) Differences in click-through rates from baseline (e.g., TPFC False URLs were clicked through by adults age 65+ with PPA = 2 1.37-1.45x more often than average U.S.-shared URLs).

Figure 5: Baseline and Difference in Click-Through Rates by Demographic Groups and URL Subsets. Ranges representing 95% confidence intervals are shown; overlapping ranges do not allow us to distinguish the underlying non-private values.

(a) The top heatmap shows baseline sharing rates (shares per view) by demographic group for all U.S.-shared URLs.

(b) Differences in sharing rates from baseline (e.g., average TPFC False URLs were shared by adults age 65+ with PPA = 2 2.48-2.61x more often than average U.S.-shared URLs).

Figure 6: Baseline and Difference in Shares-per-View by Demographic Groups and URL Subsets. Ranges representing 95% confidence intervals are shown; overlapping ranges do not allow us to distinguish the underlying non-private values.


Figure 7: Average Demographics of Users Who Shared URLs from the Top 50 Most-Viewed Domains. For both the U.S.-shared URLs baseline and the low-quality news URL subset, we consider the top 50 most-viewed domains and calculate the average demographics (age and political page affinity) of users who re-shared posts containing URLs from those domains. Several example domains are labeled, and 95% confidence intervals are shown.

5 DISCUSSION

Reflecting on Demographic Differences. Our results show demographic differences in who interacted with potential mis/disinformation on Facebook in 2017-2019. Most strikingly, we find that older adults with right-leaning U.S. political affinities were more exposed to (i.e., viewed more posts containing) potential mis/disinformation URLs. As one caveat, recall that identifying mis/disinformation is challenging: while we relied on reputable sources for our URL lists, we cannot exclude the possibility that these lists (and Facebook's own fact-checking) may disproportionately include politically conservative content. Future work should continue to consider how to identify mis/disinformation and investigate our research questions from those perspectives. Future work should also consider additional baseline URL subsets (e.g., trustworthy news sources, by some definition).

Also noteworthy is that political demographic differences flattened out for click-through rates (though they reappeared for sharing rates). In other words, once a user saw a post containing a mis/disinformation URL, political leaning did not seem to play a role in whether the user clicked on it. This finding may reflect in part Facebook's success at targeting users likely to click, and recall that we do not know why people clicked or whether they took the false or low-quality content at face value. Still, this result should caution us not to overestimate the impact of political leaning on how and how often people engage with mis/disinformation without considering the likelihood of being exposed in the first place. The fact that we again see demographic differences for shares may result from many reasons that future work might study; one observation we make is that, unlike clicks, shares are publicly visible, which may impact behaviors among different demographic groups.

At the same time, we see that older adults were more likely to be exposed to (i.e., view), click on, and re-share potential mis/disinformation than younger adults. This is the case both because older adults were disproportionately likely to be shown posts containing these URLs, and because they exhibited higher click-through and sharing rates on Facebook in general. Future defenses, whether on social media platforms or outside of them (e.g., educational efforts), should thus look towards reducing the susceptibility of older adults in particular.

Comparisons to Prior Findings and Future Research Questions. Our findings contribute to a broader understanding of how people interact with potential mis/disinformation on social media, based on a unique large-scale dataset. While smaller-scale and controlled studies help explain how or why people interact with "fake news," larger-scale measurements can shed light on broader trends. Most related to our work, prior results from larger-scale studies of Facebook and Twitter suggested that false news (under various definitions) is shared more by U.S. politically right-leaning users and older adults, though sharing is generally rare [18, 20]; that Facebook users are more likely to see, click, and share content aligned with their political ideology [2]; that users tend to focus on a small number of news sources [37]; and that low-quality news sources may receive more user engagement than mainstream news [8]. Our findings confirm and update these prior (often pre-2017) results with data from 2017-2019, and emphasize an important nuance in how we should interpret findings about demographic differences: overall trends can be heavily influenced by what content users are exposed to by the platform in the first place.

Our findings confirm other recently published results based on this dataset [19], which used a different list of low-credibility news domains (based on NewsGuard [31]) but also found that older, more conservative adults were more likely to see and share those URLs. Since the underlying dataset is not publicly available, we believe that multiple corroborating peer-reviewed sets of findings are scientifically valuable. In addition to a complementary perspective considering different mis/disinformation URL subsets, our work also adds an analysis of clicks (in addition to views and shares) and highlights that views should be interpreted as reflecting users' exposure rather than their direct behaviors.

Our quantitative results also raise questions for follow-on, more qualitative investigations. Most importantly, our results can reveal trends in what Facebook users do, but not why or under what conditions they do it. For instance: Why (and which types of content) do people re-share so frequently without clicking on the associated URLs? Why do people click on URLs in posts, and what is the result of that action, e.g., how often do they click in order to fact-check? How and why do older adults use Facebook differently? While our data does not allow us to answer these questions, we emphasize that large-scale dataset releases like this one, despite their limitations, are crucial to allowing researchers to see these trends and formulate such follow-on questions.

Implications for Defenses. Platform defenses can aim to prevent mis/disinformation from reaching users (preventing exposure), to warn them when they encounter it (reducing susceptibility), or to warn them when they attempt to share it (reducing spread). Our finding that people across the political spectrum were roughly equally likely to click on (but not necessarily re-share) potential mis/disinformation once exposed suggests that preventing exposure may be crucial. Put another way, one might question Facebook's role and responsibility in surfacing these posts to susceptible users in the first place.

The fact that people frequently re-share posts without clicking on URLs suggests that interventions at sharing time may also be valuable (e.g., a recent Twitter change prompting users to click on URLs before retweeting them [22]).

Regarding interventions at the time of a user's exposure, such as "false news" labels on posts, our dataset unfortunately provides limited insight. Because most shares of URLs happen in the first month or two of their lifetime on Facebook, the month-level granularity of our data prevents us from investigating whether the spread of TPFC False URLs decreases significantly once they have been fact-checked (and thus labeled in users' feeds). We can confirm from our data that the scope of Facebook's fact-checking efforts is limited (to already-viral and clear-cut false cases), meaning that Facebook's fact-checking alone can address only the tip of the iceberg of potential mis/disinformation on the platform. That said, Facebook's methods do succeed at fact-checking highly popular mis/disinformation URLs, with potentially great impact.

6 CONCLUSION
We conducted an exploratory analysis of a large-scale dataset of URLs shared on Facebook in 2017-2019, investigating how and how much Facebook users were exposed to and interacted with posts containing potential mis/disinformation URLs, and how these interactions differed across demographic groups. We find that potential mis/disinformation URLs received substantial user engagement, particularly from older adults and from U.S. politically right-leaning users (though not uniformly), and we add to a rich and growing literature on mis/disinformation on social media. There are many additional questions we could have investigated in this dataset (e.g., considering different URL subsets) and many more questions that this particular dataset cannot answer. We hope that our exploratory analysis provides a foundation for continued investigations.

ACKNOWLEDGEMENTS
We are very grateful to Social Science One and to Facebook for their partnership in making the Facebook URL Shares dataset available to researchers, including ourselves, and to the Facebook team for their logistical and technical support working with the data. We also wish to thank Sewoong Oh for his guidance on working with differentially private data. We thank our anonymous reviewers and our shepherd, Drew Springall, for their feedback. We are also grateful to Scott Bailey for reviewing an earlier draft of this paper. We thank Eric Zeng for providing us with the updated low-quality news URL list. Other members of the UW Security and Privacy Research Lab also contributed feedback and analysis ideas. This work was supported in part by a Social Media and Democracy Research Grant from the Social Science Research Council, by the National Science Foundation under Awards CNS-1513584 and CNS-1651230, and by the UW Center for an Informed Public and the John S. and James L. Knight Foundation.

REFERENCES
[1] Ahmer Arif, Leo Graiden Stewart, and Kate Starbird. 2018. Acting the Part: Examining Information Operations Within #BlackLivesMatter Discourse. Proceedings of the ACM on Human-Computer Interaction (CSCW) 2, Article 20 (Nov. 2018), 27 pages.
[2] Eytan Bakshy, Solomon Messing, and Lada A. Adamic. 2015. Exposure to ideologically diverse news and opinion on Facebook. Science 348, 6239 (2015), 1130–1132.
[3] Pablo Barberá, John T. Jost, Jonathan Nagler, Joshua A. Tucker, and Richard Bonneau. 2015. Tweeting From Left to Right: Is Online Political Communication More Than an Echo Chamber? Psychological Science 26, 10 (2015), 1531–1542.
[4] Alessandro Bessi, Fabiana Zollo, Michela Del Vicario, Michelangelo Puliga, Antonio Scala, Guido Caldarelli, Brian Uzzi, and Walter Quattrociocchi. 2016. Users Polarization on Facebook and Youtube. PLOS ONE 11, 8 (08 2016).
[5] Leticia Bode and Emily K. Vraga. 2015. In Related News, That Was Wrong: The Correction of Misinformation Through Related Stories Functionality in Social Media. Journal of Communication 65, 4 (2015), 619–638.
[6] Alexandre Bovet and Hernán A. Makse. 2019. Influence of fake news in Twitter during the 2016 US presidential election. Nature Communications 10, 7 (2019).
[7] Ceren Budak. 2019. What Happened? The Spread of Fake News Publisher Content During the 2016 U.S. Presidential Election. In The World Wide Web Conference (WWW).
[8] Peter Burger, Soeradj Kanhai, Alexander Pleijter, and Suzan Verberne. 2019. The reach of commercially motivated junk news on Facebook. PLOS ONE 14, 8 (08 2019).
[9] Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. 2006. Calibrating Noise to Sensitivity in Private Data Analysis. In Third Conference on Theory of Cryptography (TCC). 20 pages.
[10] Georgina Evans and Gary King. 2020. Statistically Valid Inferences from Differentially Private Data Releases, with Application to the Facebook URLs Dataset. (2020). https://j.mp/38NrmRW
[11] Facebook. [n. d.]. Fact-Checking on Facebook. https://www.facebook.com/business/help/2593586717571940
[12] Jessica T. Feezell and Brittany Ortiz. 2019. 'I saw it on Facebook': An experimental analysis of political learning through social media. Information, Communication & Society (2019), 1–20.
[13] Martin Flintham, Christian Karner, Khaled Bachour, Helen Creswick, Neha Gupta, and Stuart Moran. 2018. Falling for Fake News: Investigating the Consumption of News via Social Media. In CHI Conference on Human Factors in Computing Systems.
[14] Adam Fourney, Miklós Z. Rácz, Gireeja Ranade, Markus Mobius, and Eric Horvitz. 2017. Geographic and Temporal Trends in Fake News Consumption During the 2016 US Presidential Election. In ACM Conference on Information and Knowledge Management.
[15] Deen Freelon. 2018. Computational Research in the Post-API Age. Political Communication 35, 4 (2018), 665–668.
[16] R. Kelly Garrett and Shannon Poulsen. 2019. Flagging Facebook Falsehoods: Self-Identified Humor Warnings Outperform Fact Checker and Peer Warnings. Journal of Computer-Mediated Communication 24, 5 (10 2019), 240–258.
[17] Christine Geeng, Savanna Yee, and Franziska Roesner. 2020. Fake News on Facebook and Twitter: Investigating How People (Don't) Investigate. In ACM Conference on Human Factors in Computing Systems (CHI).
[18] Nir Grinberg, Kenneth Joseph, Lisa Friedland, Briony Swire-Thompson, and David Lazer. 2019. Fake news on Twitter during the 2016 U.S. presidential election. Science 363, 6425 (2019).
[19] Andy Guess, Kevin Aslett, Joshua Tucker, Richard Bonneau, and Jonathan Nagler. 2021. Cracking Open the News Feed: Exploring What U.S. Facebook Users See and Share with Large-Scale Platform Data. Journal of Quantitative Description: Digital Media 1 (April 2021).
[20] Andrew Guess, Jonathan Nagler, and Joshua Tucker. 2019. Less than you think: Prevalence and predictors of fake news dissemination on Facebook. Science Advances 5, 1 (2019).
[21] Andrew M. Guess, Brendan Nyhan, and Jason Reifler. 2020. Exposure to untrustworthy websites in the 2016 US election. Nature Human Behavior 4 (2020), 472–480.
[22] Taylor Hatmaker. 2020. Twitter plans to bring prompts to 'read before you retweet' to all users. (Sept. 2020). https://techcrunch.com/2020/09/24/twitter-read-before-retweet
[23] Eslam Hussein, Prerna Juneja, and Tanushree Mitra. 2020. Measuring Misinformation in Video Search Platforms: An Audit Study on YouTube. Proceedings of the ACM on Human-Computer Interaction 4 (05 2020).
[24] Daniel Jolley and Karen M. Douglas. 2017. Prevention is better than cure: Addressing anti-vaccine conspiracy theories. Journal of Applied Social Psychology 47, 8 (2017), 459–469.
[25] David Lazer, Matthew A. Baum, Yochai Benkler, Adam J. Berinsky, Kelly M. Greenhill, Filippo Menczer, Miriam J. Metzger, Brendan Nyhan, Gordon Pennycook, David Rothschild, Michael Schudson, Steven A. Sloman, Cass R. Sunstein, Emily Thorson, Duncan J. Watts, and Jonathan Zittrain. 2018. The science of fake news. Science 359 (2018), 1094–1096.
[26] Ji Young Lee and S. Shyam Sundar. 2013. To Tweet or to Retweet? That Is the Question for Health Professionals on Twitter. Health Communication 28, 5 (2013), 509–524.
[27] Yanqin Lu. 2019. Incidental Exposure to Political Disagreement on Facebook and Corrective Participation: Unraveling the Effects of Emotional Responses and Issue Relevance. International Journal of Communication 13 (2019).
[28] Binny Mathew, Ritam Dutt, Pawan Goyal, and Animesh Mukherjee. 2019. Spread of Hate Speech in Online Social Media. In 10th ACM Conference on Web Science (WebSci).
[29] Solomon Messing, Christina DeGregorio, Bennett Hillenbrand, Gary King, Saurav Mahanti, Zagreb Mukerjee, Chaya Nayak, Nate Persily, Bogdan State, and Arjun Wilkins. 2020. Facebook Privacy-Protected Full URLs Data Set. Harvard Dataverse (2020). https://doi.org/10.7910/DVN/TDOAPG
[30] Mia Moody-Ramirez and Andrew B. Church. 2019. Analysis of Facebook Meme Groups Used During the 2016 US Presidential Election. Social Media + Society 5, 1 (2019).
[31] NewsGuard. [n. d.]. NewsGuard: Restoring Trust & Accountability. https://www.newsguardtech.com
[32] Donie O'Sullivan. 2020. Facebook fact-checkers to Trump supporters: We are not trying to censor you. (Oct. 2020). https://www.cnn.com/2020/10/29/tech/fact-checkers-facebook-trump/index.html
[33] Gordon Pennycook, Adam Bear, Evan Collins, and David G. Rand. 2020. The Implied Truth Effect: Attaching Warnings to a Subset of Fake News Headlines Increases Perceived Accuracy of Headlines Without Warnings. Management Science (Feb. 2020).
[34] Quintly. 2018. Changes to Facebook's API - February 6th, 2018. (Feb. 2018). https://support.quintly.com/hc/en-us/articles/115004414274-Changes-to-Facebook-s-API-February-6th-2018
[35] Gustavo Resende, Philipe Melo, Hugo Sousa, Johnnatan Messias, Marisa Vasconcelos, Jussara Almeida, and Fabrício Benevenuto. 2019. (Mis)Information Dissemination in WhatsApp: Gathering, Analyzing, and Countermeasures. In The World Wide Web Conference (WWW).
[36] Mattia Samory and Tanushree Mitra. 2018. Conspiracies Online: User discussions in a Conspiracy Community Following Dramatic Events. In International AAAI Conference on Web and Social Media.
[37] Ana Lucía Schmidt, Fabiana Zollo, Michela Del Vicario, Alessandro Bessi, Antonio Scala, Guido Caldarelli, H. Eugene Stanley, and Walter Quattrociocchi. 2017. Anatomy of news consumption on Facebook. Proceedings of the National Academy of Sciences 114, 12 (2017), 3035–3039.
[38] Chengcheng Shao, Giovanni Luca Ciampaglia, Onur Varol, Kaicheng Yang, Alessandro Flammini, and Filippo Menczer. 2018. The spread of low-credibility content by social bots. In Nature Communications.
[39] Elisa Shearer and Elizabeth Grieco. 2019. Americans are wary of the role social media sites play in delivering the news. Pew Research Center (Oct. 2019). https://www.journalism.org/2019/10/02/americans-are-wary-of-the-role-social-media-sites-play-in-delivering-the-news
[40] Social Science One. [n. d.]. Building Industry-Academic Partnerships. https://socialscience.one
[41] Social Science Research Council. [n. d.]. Social Media and Democracy Research Grants. https://www.ssrc.org/fellowships/view/social-media-and-democracy-research-grants
[42] Olivia Solon. 2020. Sensitive to claims of bias, Facebook relaxed misinformation rules for conservative pages. (Aug. 2020). https://www.nbcnews.com/tech/tech-news/sensitive-claims-bias-facebook-relaxed-misinformation-rules-conservative-pages-n1236182
[43] Samuel Spies. 2020. How Misinformation Spreads. (July 2020). https://mediawell.ssrc.org/literature-reviews/how-misinformation-spreads/versions/1-0
[44] Kate Starbird, Ahmer Arif, Tom Wilson, Katherine Van Koevering, Katya Yefimova, and Daniel Scarnecchia. 2018. Ecosystem or Echo-System? Exploring Content Sharing across Alternative Media Domains. In International AAAI Conference on Web and Social Media (ICWSM).
[45] Edson C. Tandoc Jr. 2019. Tell Me Who Your Sources Are. Journalism Practice 13, 2 (2019), 178–190.
[46] Rebekah Tromble, Andreas Storz, and Daniela Stockmann. 2017. We Don't Know What We Don't Know: When and How the Use of Twitter's Public APIs Biases Scientific Inference. (Nov. 2017). https://ssrn.com/abstract=3079927
[47] Michela Del Vicario, Alessandro Bessi, Fabiana Zollo, Fabio Petroni, Antonio Scala, Guido Caldarelli, H. Eugene Stanley, and Walter Quattrociocchi. 2016. The spreading of misinformation online. Proceedings of the National Academy of Sciences 113, 3 (2016), 554–559.
[48] Soroush Vosoughi, Deb Roy, and Sinan Aral. 2018. The spread of true and false news online. Science 359, 6380 (2018), 1146–1151.
[49] Claire Wardle and Hossein Derakhshan. 2017. Information Disorder: Toward an interdisciplinary framework for research and policymaking. Technical Report. Council of Europe.
[50] Angela Xiao Wu and Harsh Taneja. 2020. Platform enclosure of human behavior and its measurement: Using behavioral trace data against platform episteme. New Media & Society (2020).
[51] Eric Zeng, Tadayoshi Kohno, and Franziska Roesner. 2020. Bad News: Clickbait and Deceptive Ads on News and Misinformation Websites. In Workshop on Technology and Consumer Protection.


from users who clicked on a URL more than once) or users who later deleted their Facebook accounts is not included.

Our Scope and Baseline. We only consider URL data during the time period for which we have corresponding user interaction data: January 2017 through December 2019. We also consider only interactions from U.S. users (for whom there is additional demographic information, discussed below), and we scope the URL list to those URLs that are labeled as most-shared in the U.S. (as opposed to in other countries). Our final subset thus consists of 9,172,097 URLs from 216,137 unique parent domains (and associated user interaction data). We refer to this set of URLs as our baseline "U.S. URLs" subset.
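To make this scoping step concrete, the sketch below filters a hypothetical URL table down to such a baseline; the column names (country_shared, first_post_month) are placeholders of ours, not the dataset's actual schema.

    import pandas as pd

    def us_baseline(urls: pd.DataFrame) -> pd.DataFrame:
        # Keep URLs labeled most-shared in the U.S. whose first post falls in 2017-2019.
        in_window = urls["first_post_month"].between("2017-01", "2019-12")
        return urls[(urls["country_shared"] == "US") & in_window]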

We use this subset of all U.S. URLs as a baseline against which we compare user interactions with known and potential mis/disinformation. We use this full subset rather than a more specific baseline (e.g., mainstream or trustworthy news sources) because this allows us to consider the "average" U.S. URL on Facebook without imposing any value judgements on what is trustworthy or otherwise worth including in a baseline. This full set of U.S. URLs likely includes a wide variety of types of content (which future work may wish to separate out further), but we note that because the dataset includes only URLs that were shared publicly 100 or more times (with Laplace(5) noise), a potentially long tail of unpopular websites is excluded by definition.

Demographic Data. The dataset includes demographic information associated with URL interactions: age bracket, gender, and "political page affinity" (PPA). The latter refers to U.S. political leaning and is available only for U.S. users. It is calculated by analyzing the pages that users follow, based on methods from Barberá et al. [3]. The dataset documentation does not detail further how PPA is calculated here, but the end result maps users into five integer buckets from -2 (ideological left, i.e., liberal) to 2 (ideological right, i.e., conservative). For simplicity, and to avoid suggesting conclusions where we draw none, we omit demographic categories with substantially less data (i.e., "other" for gender and "NA" for age) from our figures and analysis.

Differential Privacy. In order to release this dataset while protecting the privacy of the users whose data is represented, Facebook applied differential privacy techniques to add noise to the data [9, 10, 29]. Specifically, the dataset has been made differentially private on the level of user interactions, meaning that the dataset would look statistically the same whether or not any particular user interaction is actually included. Theoretically, this gives plausible deniability about any particular action by any particular user while still accurately reflecting behavior statistically at this large scale.

In practice, differential privacy here is accomplished by adding Gaussian noise to certain columns in the dataset. In our results, we thus display calculated (noisy) values as well as 95% confidence intervals. Signal-to-noise properties apply: though statistics for larger samples include more noise, that noise will generally be smaller relative to the true value.

Because noise was applied to the dataset only for data columns related to user activity and demographics, URL attributes such as post times and fact-check ratings are precise.
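As a concrete illustration of how such noisy counts can be handled, the sketch below computes a 95% confidence interval for a sum of independently noised cells. It assumes the per-cell Gaussian noise scale (SIGMA) is known from the dataset documentation; the value shown is a placeholder, not the real parameter.

    import math

    SIGMA = 10.0  # hypothetical per-cell noise standard deviation (placeholder)

    def ci_for_sum(noisy_cells, sigma=SIGMA, z=1.96):
        # Summing n independently noised cells sums n Gaussian noise terms,
        # so the noise on the total has standard deviation sigma * sqrt(n).
        total = sum(noisy_cells)
        half_width = z * sigma * math.sqrt(len(noisy_cells))
        return total - half_width, total + half_width

    # Example: a noisy total over four cells and its 95% confidence interval.
    low, high = ci_for_sum([1520.3, 980.7, 2210.1, 54.2])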

3.2 Known and Potential Mis/Disinformation URL Subsets

To investigate our research questions, we need a list of mis/disinformation domains and/or URLs. However, compiling such a list is a major research challenge of its own. Since addressing that challenge is not a goal of this work, we rely on two existing characterizations: one that is likely too narrow and one that is likely too broad.

(1) Third-Party Fact-Checked False List ("TPFC False US"). Our most straightforward source of "false" URLs are those flagged directly by Facebook in collaboration with its third-party fact checking partners. This process focuses primarily on "viral" misinformation (i.e., popular posts) as well as clear falsehoods. Posts of flagged URLs are labeled in Facebook's UIs and/or downranked in its newsfeed algorithm [11]. Specifically, we focus on U.S. URLs marked as "fact checked as false" or "fact checked as mixture or false headline" (as opposed to, e.g., "satire"). There were 6,838 U.S. URLs with such ratings, from 2,644 unique parent domains.

The advantage of this list is our high confidence in the "false" labels. The disadvantages include that inclusion in the list depends on decisions made by Facebook and its partners (e.g., a focus on viral content and limited coverage due to limited resources). Moreover, Facebook's fact-checking choices are not entirely transparent and have been criticized for being both potentially too harsh and/or too lenient on both U.S. politically left- and right-leaning content [32, 42]. This list thus does not represent a labeling of a random sample of URLs, and exclusion from this list does not imply truth.

(2) Broad Potential Misinformation List ("Low-Quality News US"). Since Facebook's fact-checked URL list is a narrow view of potential mis/disinformation, we also consider a much broader view of low-quality news. Specifically, we consider a subset of potential mis/disinformation domains subsampled from a list from Zeng et al. [51]. This list (of which we obtained an updated version in October 2020) includes 1,344 domains compiled from amateur, open source, and professional content checkers. Since this list includes a broad range of low-quality news domains, we subsampled based on labels only from professional fact-checking sources: Snopes, FactCheck.org, and Politifact. The resulting subsample contains 120 domains, 103 of which were present and corresponded to 108,408 U.S. URLs in the Facebook dataset.
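A minimal sketch of this subsampling step is below. It assumes the domain list is available as a CSV with hypothetical columns "domain" and "sources" (the checkers that flagged each domain); this is our own illustrative format, not necessarily that of the actual list.

    import csv

    PROFESSIONAL = {"snopes", "factcheck.org", "politifact"}

    def professional_subset(path):
        # Keep only domains flagged by at least one professional fact-checking source.
        keep = []
        with open(path, newline="") as f:
            for row in csv.DictReader(f):
                sources = {s.strip().lower() for s in row["sources"].split(";")}
                if sources & PROFESSIONAL:
                    keep.append(row["domain"])
        return keep

    # low_quality_domains = professional_subset("low_quality_news_domains.csv")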

The disadvantage of this list is that it may be overly general: not all articles on a domain may contain falsehoods, and thus we can consider these URLs only potential mis/disinformation or, more generally, low-quality news. However, this list provides us with a much broader perspective that is not influenced by Facebook's fact-checking decisions.

3.3 Ethics
Our study was reviewed by our university's human subjects review board (IRB). Because the dataset does not include identifiable information about individuals, our IRB determined that this study did not classify as human subjects research. Nevertheless, we treated the data with caution and adhered to Facebook's guidelines about its use: accessing it only on Facebook-provided servers, making no efforts to de-anonymize or identify any individuals, and submitting the paper to Facebook to review for potential privacy issues in advance of publication. (No changes to the paper were requested by Facebook.)

3.4 Limitations
First, as discussed, we lack ground truth for mis/disinformation and instead consider both a narrow (the TPFC False URL list) and a broad (the low-quality news URL list) view for this exploratory study.

Second, the noisy, privacy-preserving properties of the dataset limit us to questions about interaction data in aggregate (not individual users or actions). The fact that the dataset (also for privacy reasons) includes only data about URLs that were shared publicly by enough people on Facebook also means there may be a selection bias towards certain URLs (those that tend to be shared publicly and widely) and Facebook users (those who make non-private posts).

Third, there is other data that would provide useful context that we simply do not have. For example, we lack demographic information about who posted URLs in the first place; the "share" interactions consist only of re-shares of posts. We also cannot compare posts containing URLs (reflected in this dataset) with posts that do not contain URLs.

Moreover, the fact that the dataset is not public and available only through an application process and official agreement (though helping to protect the privacy of the Facebook users whose data is represented) makes it challenging for others to reproduce our results. The fact that we only access the data on Facebook's servers (per our agreement with Facebook), where the dataset may be updated or our access revoked at any time, also makes it potentially challenging for us (and others) to reproduce our own results in the future. As discussed, the May 2021 version of the dataset we analyzed includes more records than described in the June 2020 public description of the dataset [29].

Finally, we cannot separate the role of Facebook's newsfeed algorithm and other design features from user behaviors [50]. For example, when we consider URL views, we cannot distinguish whether a user saw a post because they chose to follow a relevant public Facebook page, because their friend posted it, or because Facebook pushed a sponsored post to their feed. We must assume that Facebook's algorithm attempts to push posts to people that they are likely to engage with. Similarly, we lack details about how U.S. political leaning (PPA) was calculated, so we cannot distinguish whether correlations between PPA and certain behaviors (e.g., which URLs a user engages with) are by definition or emergent.

Still, this large Facebook dataset provides a unique opportunity to study user interactions with potential mis/disinformation on the platform itself, providing a complementary perspective to other methodologies with other limitations. In our results, we focus on describing what we find in this dataset about how people on Facebook interacted with URLs in 2017-2019; we do not intend to make generalizable claims about human behaviors in general.

4 FINDINGS
We first characterize our URL subsets and then study how Facebook users interacted with mis/disinformation and low-quality news URLs, comparing to a baseline of all U.S.-shared URLs and across demographic groups.

Parent Domain | Unique URLs | TPFC-F | Total Views
foxnews.com | 672K | 7 | [2.22e10, 2.23e10]
cnn.com | 761K | 0 | [2.16e10, 2.17e10]
nytimes.com | 712K | 1 | [2.11e10, 2.12e10]
buzzfeed.com | 278K | 0 | [1.74e10, 1.75e10]
npr.org | 254K | 0 | [1.70e10, 1.70e10]
youtube.com | 8636K | 163 | [1.50e10, 1.54e10]
washingtonpost.com | 608K | 1 | [1.26e10, 1.27e10]
huffingtonpost.com | 386K | 0 | [1.25e10, 1.26e10]
dailywire.com | 271K | 20 | [1.13e10, 1.14e10]
nbcnews.com | 331K | 0 | [1.01e10, 1.02e10]

Table 1: U.S. Subset Domains with Most Views. TPFC-F refers to the number of URLs from that domain that Facebook's third-party fact-checkers labeled as false. Given the differential privacy noise applied to the data, view counts in Tables 1-3 are presented with 95% confidence intervals; overlapping ranges mean that we cannot distinguish the underlying non-private data points. Thus the ordering in these tables (by view count) is approximate.

Parent Domain | Unique URLs | TPFC-F | Total Views
higherperspectives.com | 830 | 2 | [1.34e9, 1.35e9]
dailysnark.com | 37K | 4 | [1.08e9, 1.10e9]
madworldnews.com | 48K | 4 | [5.69e8, 5.99e8]
awarenessact.com | 25K | 10 | [5.19e8, 5.39e8]
disclose.tv | 37K | 6 | [4.65e8, 4.91e8]
thegatewaypundit.com | 150K | 42 | [4.36e8, 4.82e8]
thepoliticalinsider.com | 51K | 2 | [3.51e8, 3.78e8]
infowars.com | 89K | 25 | [3.38e8, 3.77e8]
worldnewsdailyreport.com | 430 | 11 | [3.43e8, 3.53e8]
conservativepost.com | 20K | 8 | [3.12e8, 3.33e8]

Table 2: Low-Quality News Domains with Most Views.

Parent Domain | Unique URLs | TPFC-F | Total TPFC-F Views
youtube.com | 8636K | 163 | [5.32e7, 5.70e7]
livebr0adcast.com | 156 | 130 | [5.56e5, 2.00e6]
yournewswire.com | 26K | 81 | [6.71e7, 7.10e7]
overseasdaily.com | 132 | 78 | [3.34e6, 4.96e6]
actual-eventstv.com | 105 | 76 | [4.10e5, 1.51e6]
channel23news.com | 238 | 61 | [1.60e7, 1.99e7]
wfrv9.com | 63 | 59 | [1.04e6, 4.94e6]
dailyusaupdate.com | 481 | 55 | [6.13e6, 9.97e6]
actual-events.com | 139 | 53 | [7.90e4, 9.83e5]
network-channel.com | 138 | 50 | [1.14e3, 6.94e5]

Table 3: Domains with Most TPFC False U.S. URLs.

4.1 Characterizing Our URL Subsets
Tables 1 and 2 show the view counts for the top ten most-viewed domains in our baseline U.S. and our low-quality news URL subsets, respectively. Note that "views" here refers to instances when a user saw a Facebook post containing a URL appearing in their newsfeed, not instances where a user clicked through to the target of the URL. The tables also show how many unique URLs from each domain appear in that subset, as well as how many URLs were fact-checked by Facebook's partners as false (the "TPFC-F" column).

We highlight several observations. First, the most popular U.S.-shared domains in our dataset (Table 1) corresponded largely to well-known mainstream news sites, though we note the outsize popularity of youtube.com. Second, we note that the most popular U.S.-shared domains (Table 1) and the most popular domains from our low-quality news list (Table 2) are disjoint. That is, potential mis/disinformation domains in the form of low-quality news sites were not among the most popular shared domains.

Table 3 shows the domains in our dataset with the most URLs that were third-party fact checked as false, and the view counts for those URLs. This table reveals the comparatively small scope of Facebook's third-party fact checking labels, which applied only to a small number of URLs, most of which (except youtube.com) came from domains that appeared with relatively few unique URLs on Facebook.

Figure 1: Average Cumulative View Counts Over Time for URLs in Different Subsets. 95% confidence intervals are shown.

Figure 2: Percentage of URL Views in Each URL Subset by (a) Gender, (b) Age Group, (c) Political Page Affinity. 95% confidence intervals are shown.

4.2 Views (Exposure): Who Saw What

Overall Views (Exposure). We begin with a core question: how much were people exposed to (i.e., shown on their newsfeed) URLs in our different mis/disinformation subsets, compared to the baseline dataset of all U.S. URLs? In exploring this question, we emphasize again that Facebook plays a role in who sees which posts. In other words, trends we surface below may reflect Facebook's newsfeed algorithm rather than (or in addition to) differences in human behaviors (e.g., which Facebook pages they follow) [50].

To investigate views, we return to Tables 1 and 2, which show the total number of views across all URLs from each domain. We find that URLs from the most popular U.S. domains overall received an order of magnitude more views than URLs from popular domains in the low-quality news subset (though the latter still received large numbers of views). We also note that the total TPFC False URL views for the domains in Table 3 were substantial considering the smaller number of TPFC False URLs: just dozens of TPFC False URLs produced several million views (perhaps by definition, given that viral misinformation is prioritized for Facebook's fact checking).

Spread Over Time. We also ask: how quickly did URLs in the different subsets spread? In Figure 1, we investigate the cumulative views over time, averaged across URLs, for each of our subsets. Since the dataset provides timestamp information only at month-level granularity, we can track only cumulative monthly views. We find that the average URL (in all subsets) spreads most quickly within the first month after the initial post, suggesting that mis/disinformation interventions must be deployed quickly to be effective.
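The sketch below shows one way such average cumulative curves can be computed with pandas, assuming a hypothetical table with one row per (url, subset, month) and a noisy monthly view count; the column names are ours, not the dataset's.

    import pandas as pd

    def avg_cumulative_views(df):
        # df columns (assumed): url, subset, month ("YYYY-MM"), views (noisy count)
        df = df.sort_values(["url", "month"])
        df["month_idx"] = df.groupby("url").cumcount()          # position of the month in the URL's observed history
        df["cum_views"] = df.groupby("url")["views"].cumsum()   # running total per URL
        # Average the cumulative curve across URLs, separately for each subset.
        return df.groupby(["subset", "month_idx"])["cum_views"].mean()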

From Figure 1, we also see that URLs in the TPFC False subset received significantly more views on average than those in other subsets (hence likely making them targets for fact-checking). In addition, TPFC False URLs were longer-lived, receiving a greater fraction of views after the first month compared to the others (nearly doubling in cumulative views). Unfortunately, the granularity of the timestamps in the dataset prevents us from investigating whether spread decreased substantially after the time of fact checking.

Demographic Differences in Exposure. We now turn to our second research question, considering different demographic groups (gender, age, and political leaning). While we do not know the overall demographic breakdown of Facebook users in general, we can compare demographics in interactions with our U.S. URLs baseline to interactions with our low-quality news and TPFC False URL subsets.

Figure 2 shows the percentage of views from users in different demographic categories for the average URL in each URL subset. For example, considering the TPFC False URL subset, we find that on average (i.e., averaged across all URLs in this subset) more than 20% of views came from users ages 65 or older. Taking all U.S. URLs as our baseline, we find that known or potential mis/disinformation URLs in both lists were shown more often (on average) than baseline to male users, to users ages 55 and older, and to users with political page affinity of 1 or more (i.e., right-leaning).

However, the breakdown in Figure 2 obscures potential interactions between demographic categories (for example, older users may also tend to be more conservative). To tease apart these potential interactions, Figure 3 shows heatmaps of how often different demographic groups were shown posts containing these URLs.


Figure 3: Percentage of URL Views by Demographic Groups (Age and Political Page Affinity) for Each URL Subset. Percentages are presented in ranges with 95% confidence intervals due to differential privacy noise.

Each heatmap corresponds to a URL subset, and each square of each heatmap represents a different demographic intersection defined by the tuple (age bucket, political page affinity). (Since gender did not have an impact on our conclusions, we collapse gender for readability of the heatmap.) The percentages represent the fraction of views that an average URL-containing post received from each demographic group (i.e., for each heatmap, the percentages sum to 100%).

Figure 3 thus compares how frequently, relative to each other, different overlapping demographic groups were shown posts containing URLs. For both of our mis/disinformation URL subsets, we find that U.S. users who are older and more politically right-leaning were more likely to see misinformation URLs in their feeds. The top baseline heatmap, by contrast, shows a larger percentage of views from younger and farther left-leaning users; in other words, the trends for low-quality news and fact-checked false URLs were not merely reflections of overall trends of everyone's Facebook usage and/or experience.
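A minimal pandas sketch of how such heatmaps can be computed is below; the input is a hypothetical table with one row per (url, age_bucket, ppa) cell and a noisy view count, with placeholder column names rather than the dataset's actual schema.

    import pandas as pd

    def exposure_heatmap(views):
        # Total noisy views per (url, age bucket, PPA) cell.
        cells = views.groupby(["url", "age_bucket", "ppa"])["views"].sum().reset_index()
        # Each URL's distribution of views across demographic cells.
        cells["share"] = cells["views"] / cells.groupby("url")["views"].transform("sum")
        wide = cells.pivot_table(index="url", columns=["age_bucket", "ppa"],
                                 values="share", fill_value=0)
        # Average the per-URL distributions; the resulting heatmap sums to 100%.
        return (100 * wide.mean(axis=0)).unstack("ppa")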

4.3 Clicking: Who Clicked What
When a Facebook user saw a post containing a URL, what did they do? Figure 4 shows a variety of behaviors, aggregated across all users in our dataset, for each URL subset. In this subsection, we begin by considering clicking behaviors. Unlike views, which can result from Facebook pushing content to users, clicks are actions taken directly by users (though still predicated on having seen the post in the first place).

Overall Clicking Behaviors. Figure 4's set of "clicks" bars shows average click-through rates, i.e., average clicks-per-view. We find that compared to the baseline of all U.S. URLs, which users clicked on roughly 6% of the time they saw them, URLs in our mis/disinformation sets were clicked on more often: roughly 8-9% of the time. In other words, the average U.S. user who saw a potential mis/disinformation URL was more likely to click on it than the average user who encountered any random link on Facebook. We note that this comparison depends on our choice of baseline, comparing to the "average" U.S.-shared URL: the reasons for increased engagement with potential mis/disinformation here may result not from mis/disinformation tactics but (for example) because news-style content receives more engagement in general.

Demographic Differences in Click-Through Rate. The heatmaps in Figure 5 break down the click-through rates for different demographic groups and URL subsets. Here we show a baseline click-through rate heatmap on top: the percentages (with 95% confidence intervals) represent the average click-through rate per demographic intersection for all U.S.-shared URLs. The next two heatmaps show differences in click-through rate from the baseline: for each demographic intersection, we calculated the ratio of the click-through rate to the baseline click-through rate. (For example, a value of 1 would indicate no change, while a value of 2 would indicate that the click-through rate doubled.)
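The ratio computation itself is straightforward; a hedged sketch, assuming per-(age, PPA) totals of noisy clicks and views for each subset with placeholder column names:

    import pandas as pd

    def ctr_by_cell(df):
        # df columns (assumed): age_bucket, ppa, clicks, views (noisy totals)
        g = df.groupby(["age_bucket", "ppa"])[["clicks", "views"]].sum()
        return g["clicks"] / g["views"]

    def ctr_ratio_to_baseline(subset_df, baseline_df):
        # A value of 1 means no change; 2 means the subset's rate is double the baseline's.
        return ctr_by_cell(subset_df) / ctr_by_cell(baseline_df)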

To our surprise, the demographic trends observed when considering views (above) do not fully hold here. In addition to baseline click-through rate behavior being roughly comparable on both sides of the political spectrum, we see that for the TPFC False and low-quality news URLs, differences in click-through rates compared to the baseline were not skewed by political page affinity. That is, despite the demographic differences in who saw these posts, political leaning did not seem to impact click-through rates for known or potential mis/disinformation URLs.

Figure 4: Average Actions per View for URLs in Each Subset. 95% confidence intervals are shown.

We do continue to see differences by age, though not unique to mis/disinformation: older adults were more likely to click on URLs in general (top heatmap), without additional age-related differences for potential mis/disinformation URLs.

We cannot distinguish why people clicked on links or the impacts of reading the underlying content (e.g., perhaps politically left-leaning users click on right-leaning links to debunk them). Still, this finding suggests that political differences in engagement with misinformation reported in prior research may be partially due to who is exposed by the platform in the first place, rather than more fundamental differences between groups. Moreover, we emphasize that this exposure is not random: Facebook's newsfeed algorithm may optimize showing these posts to precisely those users more likely to click on them, and our findings may reflect the success (or at least uniformity) of Facebook's algorithm in doing just that. Still, this result underscores the importance of limiting initial exposure to mis/disinformation, as we discuss further in Section 5.

4.4 Sharing: Who Shared What
We now consider URL sharing behaviors. Our dataset includes two types of interactions: shares, which involve users re-sharing a Facebook post containing a URL after clicking on it, and shares without clicks, where users re-share a post containing a URL without first clicking on the URL. These two metrics are mutually exclusive, meaning that "shares" refers only to shares with clicks. We also emphasize that our interaction data (with associated demographic information) includes only re-shares, not original posts of a URL.

Overall Sharing Behaviors. Figure 4 shows that mis/disinformation URLs garnered more shares, and more shares without clicks, per view than average U.S.-shared URLs. Share rates are particularly high for TPFC False URLs. Since Facebook prioritizes fact-checking viral mis/disinformation, this finding may reflect in part how the TPFC False subset was selected in the first place. However, note that the sharing rates for the low-quality news URLs approached the rates for these noteworthy TPFC False URLs, i.e., they also drew substantial engagement (per view). We emphasize again that the comparison to baseline depends on our choice of a broad baseline: we see that (these) potential mis/disinformation URLs spread more efficiently than other U.S. URLs on average, but we leave to future work more detailed comparisons against other potential URL subsets of interest (e.g., mainstream news).

Across all URL subsets, we were surprised at the high rate of shares without clicks. Considering share counts (rather than shares-per-view), nearly half of all shares for all URL subsets was done by Facebook users who had not clicked on the URL they were re-sharing. Specifically, the percentage of shares without clicks for the average URL was (with 95% confidence intervals) 42.29-42.35% in the overall U.S.-shared dataset, 39.81-40.18% for TPFC False, and 41.66-42.13% for low-quality news. The fact that these ratios are similar across all URL subsets suggests that there were not necessarily URL- or content-based differences driving this behavior.
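For clarity, the quantity reported here is simply the fraction of all re-shares that happened without a click; a small sketch, treating the two (noisy) share counts as mutually exclusive per the dataset's definitions:

    def no_click_share_fraction(shares, shares_without_clicks):
        # shares: re-shares after clicking; shares_without_clicks: re-shares with no click.
        total = shares + shares_without_clicks
        return shares_without_clicks / total if total else float("nan")

    # Example with made-up counts: roughly 0.42, i.e., ~42% of re-shares had no click.
    fraction = no_click_share_fraction(5_800, 4_200)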

Demographic Differences in Sharing URLs. We consider the rates at which different groups shared the URLs they saw. Figure 6 is constructed like the click-through rate heatmaps (with a top baseline and differences from that baseline below), now considering the average shares-per-view ratio per demographic intersection. Here we consider both shares with and without clicks, i.e., all re-shares. Note that, as with clicks, we cannot distinguish the goals of a share (e.g., sharing because one agrees with the article or to debunk it).

Unlike for click-throughs, we again see a trend towards higher sharing rates (given that a user is exposed) of mis/disinformation URLs by politically right-leaning users, particularly of TPFC False URLs. Unlike views, here the impact of political leaning seems to dominate compared to age: younger right-leaning users were also more likely to share these URLs compared to baseline. At the same time, we find that older Facebook users were slightly more likely to share URLs in general, regardless of whether they are mis/disinformation.

Finally, we investigate specifically what was shared by Facebook users in different demographic buckets. Figure 7 considers the top 50 most-viewed domains from our baseline and low-quality news URL subsets, plotting the average age and political leaning of users who shared URLs from those domains. We have labeled some example domains, including a cluster of potential mis/disinformation URLs shared mostly by older, politically conservative users, as well as examples on the political left. (The TPFC False URL subset is omitted from this comparison because there are not enough flagged URLs per most-viewed TPFC False domain to reduce noise via averaging sufficiently to allow for meaningful comparison.)
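A sketch of how the per-domain averages plotted in Figure 7 can be computed is below, assuming a hypothetical table of noisy share and view counts per (domain, age bracket midpoint, PPA); the column names and numeric age midpoints are our own placeholders.

    import pandas as pd

    def domain_sharer_demographics(df, top_n=50):
        # df columns (assumed): domain, age_mid, ppa, shares, views (noisy totals)
        top = df.groupby("domain")["views"].sum().nlargest(top_n).index
        sub = df[df["domain"].isin(top)]
        def weighted_means(g):
            w = g["shares"].clip(lower=0)  # noisy counts can dip below zero
            return pd.Series({"avg_age": (g["age_mid"] * w).sum() / w.sum(),
                              "avg_ppa": (g["ppa"] * w).sum() / w.sum()})
        return sub.groupby("domain").apply(weighted_means)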


(a) The top heatmap shows baseline click-through rates (clicks per view) by demographic group, for all U.S.-shared URLs.

(b) Differences in click-through rates from baseline (e.g., TPFC False URLs were clicked through by adults age 65+ and PPA=2 1.37-1.45x more often than average U.S.-shared URLs).

Figure 5: Baseline and Difference in Click-Through Rates by Demographic Groups and URL Subsets. Ranges representing 95% confidence intervals are shown; overlapping ranges do not allow us to distinguish the underlying non-private values.

(a) The top heatmap shows baseline sharing rates (shares per view) by demographic group, for all U.S.-shared URLs.

(b) Differences in sharing rates from baseline (e.g., average TPFC False URLs were shared by adults age 65+ and PPA=2 2.48-2.61x more often than average U.S.-shared URLs).

Figure 6: Baseline and Difference in Shares/View by Demographic Groups and URL Subsets. Ranges representing 95% confidence intervals are shown; overlapping ranges do not allow us to distinguish the underlying non-private values.


Figure 7: Average Demographics for Users Who Shared URLs from the Top 50 Most-Viewed Domains. For both the U.S.-shared URLs baseline and the low-quality news URL subset, we consider the top 50 most-viewed domains and calculate the average demographics (age and political page affinity) for users who re-shared posts containing URLs from those domains. Several example domains are labeled, and 95% confidence intervals are shown.

5 DISCUSSION

Reflecting on Demographic Differences. Our results show demographic differences in who interacted with potential mis/disinformation on Facebook in 2017-2019. Most strikingly, we find that older adults with right-leaning U.S. political affinities were more exposed to (i.e., viewed more posts containing) potential mis/disinformation URLs. As one caveat, recall that identifying mis/disinformation is challenging: while we relied on reputable sources for our URL lists, we cannot exclude the possibility that these lists (and Facebook's own fact-checking) may disproportionately include politically conservative content. Future work should continue to consider how to identify mis/disinformation and investigate our research questions with those perspectives. Future work should also consider additional baseline URL subsets (e.g., trustworthy news sources, by some definition).

Also noteworthy is that political demographic differences flattened out for click-through rates (though they reappeared for sharing rates). In other words, once a user saw a post containing a mis/disinformation URL, political leaning did not seem to play a role in whether a user clicked on it. This finding may reflect in part Facebook's success targeting users likely to click, and recall that we do not know why people clicked or whether they took the false or low-quality content at face value. Still, this result should caution us not to overestimate the impact of political leaning on how and how often people engage with mis/disinformation without considering the likelihood of being exposed in the first place. The fact that we again see demographic differences for shares may result from many reasons that future work might study; one observation we make is that unlike clicks, shares are publicly visible, which may impact behaviors among different demographic groups.

At the same time, we see that older adults were more likely to be exposed to (i.e., view), click on, and re-share potential mis/disinformation than younger adults. This is the case both because older adults were disproportionately likely to be shown posts containing these URLs, but also because they then exhibited higher click-through and sharing rates on Facebook in general. Future defenses, either on social media platforms or outside of them (e.g., educational efforts), should thus look towards reducing susceptibility of older adults in particular.

Comparisons to Prior Findings and Future Research Questions. Our findings contribute to a broader understanding of how people interact with potential mis/disinformation on social media, based on a unique large-scale dataset. While smaller-scale and controlled studies help explain how or why people interact with "fake news", larger-scale measurements can shed light on broader trends. Most related to our work, prior results from larger-scale studies of Facebook and Twitter suggested that false news (under various definitions) is shared more by U.S. politically right-leaning users and older adults, though sharing is generally rare [18, 20]; that Facebook users are more likely to see, click, and share content aligned with their political ideology [2]; that users tend to focus on a small number of news sources [37]; and that low-quality news sources may receive more user engagement than mainstream news [8]. Our findings confirm and update these prior (often pre-2017) results with data from 2017-2019, and emphasize an important nuance in how we should interpret findings about demographic differences: overall trends can be heavily influenced by what content users are exposed to by the platform in the first place.

Our findings confirm other recently published results based on this dataset [19], which used a different list of low-credibility news domains (based on NewsGuard [31]) but also found that older, more conservative adults were more likely to see and share those URLs. Since the underlying dataset is not publicly available, we believe that multiple corroborating peer-reviewed sets of findings are scientifically valuable. In addition to a complementary perspective considering different mis/disinformation URL subsets, our work also adds an analysis of clicks (in addition to views and shares) and highlights that views should be interpreted as reflecting users' exposure rather than their direct behaviors.

Our quantitative results also raise questions for follow-on, more qualitative investigations. Most importantly, our results can reveal trends in what Facebook users do, but not why or under what conditions they do it. For instance: Why (and which types of content) do people re-share so frequently without clicking on the associated URLs? Why do people click on URLs in posts, and what is the result of that action, e.g., how often do they click in order to fact-check? How and why do older adults use Facebook differently? While our data does not allow us to answer these questions, we emphasize that large-scale dataset releases like this one, despite their limitations, are crucial to allowing researchers to see these trends and formulate such follow-on questions.

Implications for Defenses. Platform defenses can aim to prevent mis/disinformation from reaching users (preventing exposure), or warn them when they encounter it (reducing susceptibility) or when they attempt to share it (reducing spread). Our finding that people across the political spectrum were roughly equally likely to click on (but not necessarily re-share) potential mis/disinformation once exposed suggests that preventing exposure may be crucial. Put another way, one might question Facebook's role and responsibility in surfacing these posts to susceptible users in the first place.

The fact that people frequently re-share posts without clicking on URLs suggests that interventions at sharing time may also be valuable (e.g., a recent Twitter change prompting users to click on URLs before retweeting them [22]).

Regarding interventions at the time of a user's exposure, such as "false news" labels on posts, our dataset unfortunately provides limited insight. Because most shares of URLs happen in the first month or two of their lifetime on Facebook, the month-level granularity of our data prevents us from investigating whether the spread of TPFC-False URLs decreases significantly once they have been fact-checked (and thus labeled in users' feeds). We can confirm from our data that the scope of Facebook's fact-checking efforts is limited (to already-viral and clear-cut false cases), meaning that Facebook's fact-checking alone can address only the tip of the iceberg of potential mis/disinformation on the platform. That said, Facebook's methods indeed succeed at fact-checking highly popular mis/disinformation URLs, with potentially great impact.

6 CONCLUSION
We conducted an exploratory analysis of a large-scale dataset of URLs shared on Facebook in 2017-2019, investigating how and how much Facebook users were exposed to and interacted with posts containing potential mis/disinformation URLs, and how these interactions differed across demographic groups. We find that potential mis/disinformation URLs received substantial user engagement, particularly from older adults and from U.S. politically right-leaning users (though not uniformly), and add to a rich and growing literature on mis/disinformation on social media. There are many additional questions we could have investigated in this dataset (e.g., considering different URL subsets) and many more questions that this particular dataset cannot answer. We hope that our exploratory analysis provides a foundation for continued investigations.

ACKNOWLEDGEMENTS
We are very grateful to Social Science One and to Facebook for their partnership in making the Facebook URL Shares dataset available to researchers, including ourselves, and to the Facebook team for their logistical and technical support working with the data. We also wish to thank Sewoong Oh for his guidance on working with differentially private data. We thank our anonymous reviewers and our shepherd, Drew Springall, for their feedback. We are also grateful to Scott Bailey for reviewing an earlier draft of this paper. We thank Eric Zeng for providing us with the updated low-quality news URL list. Other members of the UW Security and Privacy Research Lab also helped contribute feedback and analysis ideas. This work was supported in part by a Social Media and Democracy Research Grant from the Social Science Research Council, by the National Science Foundation under Awards CNS-1513584 and CNS-1651230, and by the UW Center for an Informed Public and the John S. and James L. Knight Foundation.

REFERENCES
[1] Ahmer Arif, Leo Graiden Stewart, and Kate Starbird. 2018. Acting the Part: Examining Information Operations Within #BlackLivesMatter Discourse. Proceedings of the ACM on Human-Computer Interaction (CSCW) 2, Article 20 (Nov. 2018), 27 pages.

[2] Eytan Bakshy, Solomon Messing, and Lada A. Adamic. 2015. Exposure to ideologically diverse news and opinion on Facebook. Science 348, 6239 (2015), 1130–1132.

[3] Pablo Barberá, John T. Jost, Jonathan Nagler, Joshua A. Tucker, and Richard Bonneau. 2015. Tweeting From Left to Right: Is Online Political Communication More Than an Echo Chamber? Psychological Science 26, 10 (2015), 1531–1542.

[4] Alessandro Bessi, Fabiana Zollo, Michela Del Vicario, Michelangelo Puliga, Antonio Scala, Guido Caldarelli, Brian Uzzi, and Walter Quattrociocchi. 2016. Users Polarization on Facebook and Youtube. PLOS ONE 11, 8 (08 2016).

[5] Leticia Bode and Emily K. Vraga. 2015. In Related News, That Was Wrong: The Correction of Misinformation Through Related Stories Functionality in Social Media. Journal of Communication 65, 4 (2015), 619–638.

[6] Alexandre Bovet and Hernán A. Makse. 2019. Influence of fake news in Twitter during the 2016 US presidential election. Nature Communications 10, 7 (2019).

[7] Ceren Budak. 2019. What Happened? The Spread of Fake News Publisher Content During the 2016 U.S. Presidential Election. In The World Wide Web Conference (WWW).

[8] Peter Burger, Soeradj Kanhai, Alexander Pleijter, and Suzan Verberne. 2019. The reach of commercially motivated junk news on Facebook. PLOS ONE 14, 8 (08 2019).

[9] Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. 2006. Calibrating Noise to Sensitivity in Private Data Analysis. In Third Conference on Theory of Cryptography (TCC), 20 pages.

[10] Georgina Evans and Gary King. 2020. Statistically Valid Inferences from Differentially Private Data Releases, with Application to the Facebook URLs Dataset. (2020). https://j.mp/38NrmRW

[11] Facebook. [n. d.]. Fact-Checking on Facebook. ([n. d.]). https://www.facebook.com/business/help/2593586717571940

[12] Jessica T. Feezell and Brittany Ortiz. 2019. 'I saw it on Facebook': An experimental analysis of political learning through social media. Information, Communication & Society (2019), 1–20.

[13] Martin Flintham, Christian Karner, Khaled Bachour, Helen Creswick, Neha Gupta, and Stuart Moran. 2018. Falling for Fake News: Investigating the Consumption of News via Social Media. In CHI Conference on Human Factors in Computing Systems.

[14] Adam Fourney, Miklós Z. Rácz, Gireeja Ranade, Markus Mobius, and Eric Horvitz. 2017. Geographic and Temporal Trends in Fake News Consumption During the 2016 US Presidential Election. In ACM Conference on Information and Knowledge Management.

[15] Deen Freelon. 2018. Computational Research in the Post-API Age. Political Communication 35, 4 (2018), 665–668.

[16] R. Kelly Garrett and Shannon Poulsen. 2019. Flagging Facebook Falsehoods: Self-Identified Humor Warnings Outperform Fact Checker and Peer Warnings. Journal of Computer-Mediated Communication 24, 5 (10 2019), 240–258.

[17] Christine Geeng, Savanna Yee, and Franziska Roesner. 2020. Fake News on Facebook and Twitter: Investigating How People (Don't) Investigate. In ACM Conference on Human Factors in Computing Systems (CHI).

[18] Nir Grinberg, Kenneth Joseph, Lisa Friedland, Briony Swire-Thompson, and David Lazer. 2019. Fake news on Twitter during the 2016 U.S. presidential election. Science 363, 6425 (2019).

[19] Andy Guess, Kevin Aslett, Joshua Tucker, Richard Bonneau, and Jonathan Nagler. 2021. Cracking Open the News Feed: Exploring What U.S. Facebook Users See and Share with Large-Scale Platform Data. Journal of Quantitative Description: Digital Media 1 (April 2021).

[20] Andrew Guess, Jonathan Nagler, and Joshua Tucker. 2019. Less than you think: Prevalence and predictors of fake news dissemination on Facebook. Science Advances 5, 1 (2019).

[21] Andrew M. Guess, Brendan Nyhan, and Jason Reifler. 2020. Exposure to untrustworthy websites in the 2016 US election. Nature Human Behavior 4 (2020), 472–480.

[22] Taylor Hatmaker. 2020. Twitter plans to bring prompts to 'read before you retweet' to all users. (Sept. 2020). https://techcrunch.com/2020/09/24/twitter-read-before-retweet

[23] Eslam Hussein, Prerna Juneja, and Tanushree Mitra. 2020. Measuring Misinformation in Video Search Platforms: An Audit Study on YouTube. Proceedings of the ACM on Human-Computer Interaction 4 (05 2020).

[24] Daniel Jolley and Karen M. Douglas. 2017. Prevention is better than cure: Addressing anti-vaccine conspiracy theories. Journal of Applied Social Psychology 47, 8 (2017), 459–469.

[25] David Lazer, Matthew A. Baum, Yochai Benkler, Adam J. Berinsky, Kelly M. Greenhill, Filippo Menczer, Miriam J. Metzger, Brendan Nyhan, Gordon Pennycook, David Rothschild, Michael Schudson, Steven A. Sloman, Cass R. Sunstein, Emily Thorson, Duncan J. Watts, and Jonathan Zittrain. 2018. The science of fake news. Science 359 (2018), 1094–1096.

[26] Ji Young Lee and S. Shyam Sundar. 2013. To Tweet or to Retweet? That Is the Question for Health Professionals on Twitter. Health Communication 28, 5 (2013), 509–524.

[27] Yanqin Lu. 2019. Incidental Exposure to Political Disagreement on Facebook and Corrective Participation: Unraveling the Effects of Emotional Responses and Issue Relevance. International Journal of Communication 13 (2019).

[28] Binny Mathew, Ritam Dutt, Pawan Goyal, and Animesh Mukherjee. 2019. Spread of Hate Speech in Online Social Media. In 10th ACM Conference on Web Science (WebSci).

[29] Solomon Messing, Christina DeGregorio, Bennett Hillenbrand, Gary King, Saurav Mahanti, Zagreb Mukerjee, Chaya Nayak, Nate Persily, Bogdan State, and Arjun Wilkins. 2020. Facebook Privacy-Protected Full URLs Data Set. Harvard Dataverse (2020). https://doi.org/10.7910/DVN/TDOAPG

[30] Mia Moody-Ramirez and Andrew B. Church. 2019. Analysis of Facebook Meme Groups Used During the 2016 US Presidential Election. Social Media + Society 5, 1 (2019).

[31] NewsGuard. [n. d.]. NewsGuard: Restoring Trust & Accountability. ([n. d.]). https://www.newsguardtech.com

[32] Donie O'Sullivan. 2020. Facebook fact-checkers to Trump supporters: We are not trying to censor you. (Oct. 2020). https://www.cnn.com/2020/10/29/tech/fact-checkers-facebook-trump/index.html

[33] Gordon Pennycook, Adam Bear, Evan Collins, and David G. Rand. 2020. The Implied Truth Effect: Attaching Warnings to a Subset of Fake News Headlines Increases Perceived Accuracy of Headlines Without Warnings. Management Science (Feb. 2020).

[34] Quintly. 2018. Changes to Facebook's API - February 6th, 2018. (Feb. 2018). https://support.quintly.com/hc/en-us/articles/115004414274-Changes-to-Facebook-s-API-February-6th-2018

[35] Gustavo Resende, Philipe Melo, Hugo Sousa, Johnnatan Messias, Marisa Vasconcelos, Jussara Almeida, and Fabrício Benevenuto. 2019. (Mis)Information Dissemination in WhatsApp: Gathering, Analyzing and Countermeasures. In The World Wide Web Conference (WWW).

[36] Mattia Samory and Tanushree Mitra. 2018. Conspiracies Online: User discussions in a Conspiracy Community Following Dramatic Events. In International AAAI Conference on Web and Social Media.

[37] Ana Lucía Schmidt, Fabiana Zollo, Michela Del Vicario, Alessandro Bessi, Antonio Scala, Guido Caldarelli, H. Eugene Stanley, and Walter Quattrociocchi. 2017. Anatomy of news consumption on Facebook. Proceedings of the National Academy of Sciences 114, 12 (2017), 3035–3039.

[38] Chengcheng Shao, Giovanni Luca Ciampaglia, Onur Varol, Kaicheng Yang, Alessandro Flammini, and Filippo Menczer. 2018. The spread of low-credibility content by social bots. In Nature Communications.

[39] Elisa Shearer and Elizabeth Grieco. 2019. Americans are wary of the role social media sites play in delivering the news. Pew Research Center (Oct. 2019). https://www.journalism.org/2019/10/02/americans-are-wary-of-the-role-social-media-sites-play-in-delivering-the-news

[40] Social Science One. [n. d.]. Building Industry-Academic Partnerships. ([n. d.]). https://socialscience.one

[41] Social Science Research Council. [n. d.]. Social Media and Democracy Research Grants. ([n. d.]). https://www.ssrc.org/fellowships/view/social-media-and-democracy-research-grants

[42] Olivia Solon. 2020. Sensitive to claims of bias, Facebook relaxed misinformation rules for conservative pages. (Aug. 2020). https://www.nbcnews.com/tech/tech-news/sensitive-claims-bias-facebook-relaxed-misinformation-rules-conservative-pages-n1236182

[43] Samuel Spies. 2020. How Misinformation Spreads. (July 2020). https://mediawell.ssrc.org/literature-reviews/how-misinformation-spreads/versions/1-0

[44] Kate Starbird, Ahmer Arif, Tom Wilson, Katherine Van Koevering, Katya Yefimova, and Daniel Scarnecchia. 2018. Ecosystem or Echo-System? Exploring Content Sharing across Alternative Media Domains. In International AAAI Conference on Web and Social Media (ICWSM).

[45] Edson C. Tandoc Jr. 2019. Tell Me Who Your Sources Are. Journalism Practice 13, 2 (2019), 178–190.

[46] Rebekah Tromble, Andreas Storz, and Daniela Stockmann. 2017. We Don't Know What We Don't Know: When and How the Use of Twitter's Public APIs Biases Scientific Inference. (Nov. 2017). https://ssrn.com/abstract=3079927

[47] Michela Del Vicario, Alessandro Bessi, Fabiana Zollo, Fabio Petroni, Antonio Scala, Guido Caldarelli, H. Eugene Stanley, and Walter Quattrociocchi. 2016. The spreading of misinformation online. Proceedings of the National Academy of Sciences 113, 3 (2016), 554–559.

[48] Soroush Vosoughi, Deb Roy, and Sinan Aral. 2018. The spread of true and false news online. Science 359, 6380 (2018), 1146–1151.

[49] Claire Wardle and Hossein Derakhshan. 2017. Information Disorder: Toward an interdisciplinary framework for research and policymaking. Technical Report. Council of Europe.

[50] Angela Xiao Wu and Harsh Taneja. 2020. Platform enclosure of human behavior and its measurement: Using behavioral trace data against platform episteme. New Media & Society (2020).

[51] Eric Zeng, Tadayoshi Kohno, and Franziska Roesner. 2020. Bad News: Clickbait and Deceptive Ads on News and Misinformation Websites. In Workshop on Technology and Consumer Protection.

FOCIrsquo21 August 27 2021 Virtual Event USA Aydan Bailey Theo Gregersen and Franziska Roesner

advance of publication (No changes to the paper were requestedby Facebook)

34 LimitationsFirst as discussed we lack ground truth for misdisinformationand instead consider both a narrow (the TPFC False URL list) anda broad (the low-quality news URL list) view for this exploratorystudy

Second the noisy privacy-preserving properties of the datasetlimit us to questions about interaction data in aggregate (not indi-vidual users or actions) The fact that the dataset (also for privacyreasons) includes only data about URLs that were shared publicly byenough people on Facebook also means there may a selection biastowards certain URLs (that tend to be shared publicly and widely)and Facebook users (those who make non-private posts)

Third there is other data that would provide useful contextthat we simply do not have For example we lack demographicinformation about who posted URLs in the first place the ldquosharerdquointeractions consist only of re-shares of posts We also cannot com-pare posts containing URLs (reflected in this dataset) with poststhat do not contain URLs

Moreover the fact that the dataset is not public and available onlythrough an application process and official agreementmdash thoughhelping to protect the privacy of the Facebook users whose datais representedmdashmakes it challenging for others to reproduce ourresults The fact that we only access the data on Facebookrsquos servers(per our agreement with Facebook) where the dataset may beupdated or our access revoked at any time also makes it potentiallychallenging for us (and others) to reproduce our own results inthe future As discussed the May 2021 version of the dataset weanalyzed includes more records than described in the June 2020public description of the dataset [29]

Finally we cannot separate the role of Facebookrsquos newsfeed algo-rithm and other design features from user behaviors [50] For exam-ple when we consider URL views we cannot distinguish whethera user saw a post because they chose to follow a relevant publicFacebook page because their friend posted it or because Facebookpushed a sponsored post to their feed We must assume that Face-bookrsquos algorithm attempts to push posts to people that they arelikely to engage with Similarly we lack details about how US polit-ical leaning (PPA) was calculated so we cannot distinguish whethercorrelations between PPA and certain behaviors (eg which URLsa user engages with) are by definition or emergent

Still this large Facebook dataset provides a unique opportunityto study user interactions with potential misdisinformation on theplatform itself providing a complementary perspective to othermethodologies with other limitations In our results we focus ondescribing what we find in this dataset about how people on Face-book interacted with URLs in 2017-2019 we do not intend to makegeneralizable claims about human behaviors in general

4 FINDINGSWe first characterize our URL subsets and then study how Facebookusers interacted with misdisinformation and low-quality newsURLs comparing to a baseline of all US-shared URLs and acrossdemographic groups

Parent Domain Unique URLs TPFC-F Total Viewsfoxnewscom 672K 7 [222e10 223e10]cnncom 761K 0 [216e10 217e10]

nytimescom 712K 1 [211e10 212e10]buzzfeedcom 278K 0 [174e10 175e10]

nprorg 254K 0 [170e10 170e10]youtubecom 8636K 163 [150e10 154e10]

washingtonpostcom 608K 1 [126e10 127e10]huffingtonpostcom 386K 0 [125e10 126e10]

dailywirecom 271K 20 [113e10 114e10]nbcnewscom 331K 0 [101e10 102e10]

Table 1 US Subset Domains with Most Views TPFC-F refersto the number of URLs from that domain that Facebookrsquos third-party fact-checkers labeled as false Given the differential privacynoise applied to the data view counts in Tables 1-3 are presentedwith 95 confidence intervals overlapping ranges means that wecannot distinguish the underlying non-private data points Thusthe ordering in these tables (by view count) is approximate

Parent Domain Unique URLs TPFC-F Total Viewshigherperspectivescom 830 2 [134e9 135e9]

dailysnarkcom 37K 4 [108e9 110e9]madworldnewscom 48K 4 [569e8 599e8]awarenessactcom 25K 10 [519e8 539e8]

disclosetv 37K 6 [465e8 491e8]thegatewaypunditcom 150K 42 [436e8 482e8]thepoliticalinsidercom 51K 2 [351e8 378e8]

infowarscom 89K 25 [338e8 377e8]worldnewsdailyreportcom 430 11 [343e8 353e8]

conservativepostcom 20K 8 [312e8 333e8]Table 2 Low-Quality News Domains with Most Views

Parent Domain Unique URLs TPFC-F Total TPFC-F Viewsyoutubecom 8636K 163 [532e7 570e7]

livebr0adcastcom 156 130 [556e5 200e6]yournewswirecom 26K 81 [671e7 710e7]overseasdailycom 132 78 [334e6 496e6]actual-eventstvcom 105 76 [410e5 151e6]channel23newscom 238 61 [160e7 199e7]

wfrv9com 63 59 [104e6 494e6]dailyusaupdatecom 481 55 [613e6 997e6]actual-eventscom 139 53 [790e4 983e5]

network-channelcom 138 50 [114e3 694e5]Table 3 Domains with Most TPFC False US URLs

41 Characterizing Our URL SubsetsTables 1 and 2 show the view counts for the top ten most-vieweddomains in our baseline US and our low-quality news URL subsetsrespectively Note that ldquoviewsrdquo here refers to instances when a usersaw a Facebook post containing a URL appearing in their newsfeednot instances where a user clicked through to the target of the URLThe tables also show how many unique URLs from each domainappear in that subset as well as how many URLs were fact-checkedby Facebookrsquos partners as false (the ldquoTPFC-Frdquo column)

We highlight several observations First the most popular US-shared domains in our dataset (Table 1) corresponded largely towell-known mainstream news sites though we note the outsizepopularity of youtubecom Second we note that the most popularUS-shared domains (Table 1) and the most popular domains fromour low-quality news list (Table 2) are disjoint That is potentialmisdisinformation domains in the form of low-quality news siteswere not among the most popular shared domains

Table 3 shows the domains in our dataset with the most URLsthat were third-party fact checked as false and the view counts

Interactions with Potential MisDisinformation URLs Among US Users on Facebook FOCIrsquo21 August 27 2021 Virtual Event USA

Figure 1 Average Cumulative View Counts Over Time for URLs in Different Subsets 95 confidence intervals are shown

Figure 2 Percentage of URL Views in Each URL Subset by (a) Gender (b) Age Group (c) Political Page Affinity 95 confidenceintervals are shown

for those URLs This table reveals the comparatively small scopeof Facebookrsquos third-party fact checking labels which applied onlyto a small number of URLs most of which (except youtubecom)came from domains that appeared with relatively few unique URLson Facebook

42 Views (Exposure) Who SawWhat

Overall Views Exposure We begin with a core question howmuch were people exposed to (ie shown on their newsfeed) URLsin our different misdisinformation subsets compared to the base-line dataset of all US URLs In exploring this question we em-phasize again that Facebook plays a role in who sees which postsIn other words trends we surface below may reflect Facebookrsquosnewsfeed algorithm rather than (or in addition to) differences inhuman behaviors (eg which Facebook pages they follow) [50]

To investigate views we return to Tables 1 and 2 which show thetotal number of views across all URLs from each domain We findthat URLs from the most popular US domains overall received anorder of magnitude more views than URLs from popular domainsin the low-quality news subset (though the latter still received largenumbers of views) We also note that the total TPFC False URLviews for the domains in Table 3 were substantial considering thesmaller number of TPFC False URLs just dozens of TPFC False URLsproduced several million views (perhaps by definition given thatviral misinformation is prioritized for Facebookrsquos fact checking)

Spread Over Time. We also ask: how quickly did URLs in the different subsets spread? In Figure 1, we investigate the cumulative views over time, averaged across URLs, for each of our subsets. Since the dataset provides timestamp information only at month-level granularity, we can track only cumulative monthly views. We find that the average URL (in all subsets) spread most quickly within the first month after the initial post, suggesting that mis/disinformation interventions must be deployed quickly to be effective.
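As a minimal sketch of this computation (assuming a hypothetical long-format table with one row per URL and month, with columns url_id, month, and views, and assuming each URL has a row for every month after it is first posted), the average cumulative view curve could be computed as:

```python
import pandas as pd

# Illustrative monthly view counts for two URLs (hypothetical data).
views = pd.DataFrame({
    "url_id": [1, 1, 1, 2, 2],
    "month":  ["2017-01", "2017-02", "2017-03", "2018-06", "2018-07"],
    "views":  [900, 150, 50, 4000, 600],
})

def avg_cumulative_views(views: pd.DataFrame) -> pd.Series:
    """Average cumulative views per URL, indexed by months since first post."""
    views = views.sort_values(["url_id", "month"])
    # 0 = month of the initial post, 1 = the following month, etc.
    views["months_since_first"] = views.groupby("url_id").cumcount()
    views["cum_views"] = views.groupby("url_id")["views"].cumsum()
    return views.groupby("months_since_first")["cum_views"].mean()

print(avg_cumulative_views(views))
```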

From Figure 1 we also see that URLs in the TPFC False subset received significantly more views on average than those in other subsets (hence likely making them targets for fact-checking). In addition, TPFC False URLs were longer-lived, receiving a greater fraction of views after the first month compared to the others (nearly doubling in cumulative views). Unfortunately, the granularity of the timestamps in the dataset prevents us from investigating whether spread decreased substantially after the time of fact-checking.

Demographic Differences in Exposure. We now turn to our second research question, considering different demographic groups (gender, age, and political leaning). While we do not know the overall demographic breakdown of Facebook users in general, we can compare demographics in interactions with our U.S. URLs baseline to interactions with our low-quality news and TPFC False URL subsets.

Figure 2 shows the percentage of views from users in different demographic categories for the average URL in each URL subset. For example, considering the TPFC False URL subset, we find that on average (i.e., averaged across all URLs in this subset) more than 20% of views came from users ages 65 or older. Taking all U.S. URLs as our baseline, we find that known or potential mis/disinformation URLs in both lists were shown more often (on average) than baseline to male users, to users ages 55 and older, and to users with political page affinity of 1 or more (i.e., right-leaning).
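The Figure 2 percentages amount to normalizing each URL's views over demographic buckets and then averaging across URLs, so that each URL counts equally. A small pandas sketch, with hypothetical column names (url_id, age_group, views):

```python
import pandas as pd

# Hypothetical breakdown rows: view counts per (URL, demographic bucket).
rows = pd.DataFrame({
    "url_id":    [1, 1, 1, 2, 2, 2],
    "age_group": ["18-24", "45-54", "65+", "18-24", "45-54", "65+"],
    "views":     [100, 300, 600, 4000, 2500, 1500],
})

# Per-URL percentage of views from each age group, then the average across
# URLs (so every URL is weighted equally, as in Figure 2).
per_url = rows.pivot_table(index="url_id", columns="age_group",
                           values="views", aggfunc="sum", fill_value=0)
pct_per_url = per_url.div(per_url.sum(axis=1), axis=0) * 100
print(pct_per_url.mean(axis=0))   # average demographic breakdown per URL
```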

However, the breakdown in Figure 2 obscures potential interactions between demographic categories (for example, older users may also tend to be more conservative). To tease apart these potential interactions, Figure 3 shows heatmaps of how often different demographic groups were shown posts containing these URLs.


Figure 3: Percentage of URL Views by Demographic Groups (Age and Political Page Affinity) for Each URL Subset. Percentages are presented in ranges with 95% confidence intervals due to differential privacy noise.

Each heatmap corresponds to a URL subset, and each square of each heatmap represents a different demographic intersection defined by the tuple (age bucket, political page affinity). (Since gender did not have an impact on our conclusions, we collapse gender for readability of the heatmap.) The percentages represent the fraction of views that an average URL-containing post received from each demographic group (i.e., for each heatmap, the percentages sum to 100%).

Figure 3 thus compares how frequently, relative to each other, different overlapping demographic groups were shown posts containing URLs. For both of our mis/disinformation URL subsets, we find that U.S. users who are older and more politically right-leaning were more likely to see misinformation URLs in their feeds. The top baseline heatmap, by contrast, shows a larger percentage of views from younger and farther left-leaning users; in other words, the trends for low-quality news and fact-checked false URLs were not merely reflections of overall trends of everyone's Facebook usage and/or experience.

4.3 Clicking: Who Clicked What?

When a Facebook user saw a post containing a URL, what did they do? Figure 4 shows a variety of behaviors, aggregated across all users in our dataset, for each URL subset. In this subsection, we begin by considering clicking behaviors. Unlike views, which can result from Facebook pushing content to users, clicks are actions taken directly by users (though still predicated on having seen the post in the first place).

Overall Clicking Behaviors. Figure 4's set of "clicks" bars shows average click-through rates, i.e., average clicks-per-view. We find that compared to the baseline of all U.S. URLs, which users clicked on roughly 6% of the time they saw them, URLs in our mis/disinformation sets were clicked on more often: roughly 8-9% of the time. In other words, the average U.S. user who saw a potential mis/disinformation URL was more likely to click on it than the average user who encountered any random link on Facebook. We note that this comparison depends on our choice of baseline: comparing to the "average" U.S.-shared URL, the reasons for increased engagement with potential mis/disinformation here may result not from mis/disinformation tactics but (for example) because news-style content receives more engagement in general.

Demographic Differences in Click-Through Rate. The heatmaps in Figure 5 break down the click-through rates for different demographic groups and URL subsets. Here we show a baseline click-through rate heatmap on top: the percentages (with 95% confidence intervals) represent the average click-through rate per demographic intersection for all U.S.-shared URLs. The next two heatmaps show differences in click-through rate from the baseline: for each demographic intersection, we calculated the ratio of the click-through rate to the baseline click-through rate. (For example, a value of 1 would indicate no change, while a value of 2 would indicate that the click-through rate doubled.)
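The ratio heatmaps reduce to a per-cell division of click-through rates. A compact sketch of that calculation, assuming a hypothetical table of click and view totals per (age group, political page affinity, URL subset) cell; all names and values are illustrative:

```python
import pandas as pd

# Hypothetical per-cell totals of clicks and views for two subsets.
cells = pd.DataFrame({
    "age_group": ["65+", "65+", "25-34", "25-34"],
    "ppa":       [2, 2, -2, -2],
    "subset":    ["baseline", "tpfc_false", "baseline", "tpfc_false"],
    "clicks":    [900, 140, 300, 45],
    "views":     [10_000, 1_000, 10_000, 1_000],
})

cells["ctr"] = cells["clicks"] / cells["views"]            # clicks per view
wide = cells.pivot_table(index=["age_group", "ppa"], columns="subset", values="ctr")
wide["ratio_vs_baseline"] = wide["tpfc_false"] / wide["baseline"]  # 1.0 = no change
print(wide)
```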

To our surprise, the demographic trends observed when considering views (above) do not fully hold here. In addition to baseline click-through rate behavior being roughly comparable on both sides of the political spectrum, we see that for the TPFC False and low-quality news URLs, differences in click-through rates compared to the baseline were not skewed by political page affinity. That is, despite the demographic differences in who saw these posts, political leaning did not seem to impact click-through rates for known or potential mis/disinformation URLs.

Figure 4: Average Actions per View for URLs in Each Subset. 95% confidence intervals are shown.

We do continue to see differences by age, though not unique to mis/disinformation: older adults were more likely to click on URLs in general (top heatmap), without additional age-related differences for potential mis/disinformation URLs.

We cannot distinguish why people clicked on links, or the impacts of reading the underlying content (e.g., perhaps politically left-leaning users click on right-leaning links to debunk them). Still, this finding suggests that political differences in engagement with misinformation reported in prior research may be partially due to who is exposed by the platform in the first place, rather than more fundamental differences between groups. Moreover, we emphasize that this exposure is not random: Facebook's newsfeed algorithm may optimize for showing these posts to precisely those users more likely to click on them, and our findings may reflect the success (or at least uniformity) of Facebook's algorithm in doing just that. Still, this result underscores the importance of limiting initial exposure to mis/disinformation, as we discuss further in Section 5.

4.4 Sharing: Who Shared What?

We now consider URL sharing behaviors. Our dataset includes two types of interactions: shares, which involve users re-sharing a Facebook post containing a URL after clicking on it, and shares without clicks, where users re-share a post containing a URL without first clicking on the URL. These two metrics are mutually exclusive, meaning that "shares" refers only to shares with clicks. We also emphasize that our interaction data (with associated demographic information) includes only re-shares, not original posts of a URL.

Overall Sharing Behaviors. Figure 4 shows that mis/disinformation URLs garnered more shares, and more shares without clicks, per view than average U.S.-shared URLs. Share rates are particularly high for TPFC False URLs. Since Facebook prioritizes fact-checking viral mis/disinformation, this finding may reflect in part how the TPFC False subset was selected in the first place. However, note that the sharing rates for the low-quality news URLs approached the rates for these noteworthy TPFC False URLs, i.e., they also drew substantial engagement (per view). We emphasize again that the comparison to baseline depends on our choice of a broad baseline: we see that (these) potential mis/disinformation URLs spread more efficiently than other U.S. URLs on average, but we leave to future work more detailed comparisons against other potential URL subsets of interest (e.g., mainstream news).

Across all URL subsets, we were surprised at the high rate of shares without clicks. Considering share counts (rather than shares-per-view), nearly half of all shares for all URL subsets were done by Facebook users who had not clicked on the URL they were re-sharing. Specifically, the percentage of shares without clicks for the average URL was (with 95% confidence intervals) 42.29-42.35% in the overall U.S.-shared dataset, 39.81-40.18% for TPFC False, and 41.66-42.13% for low-quality news. The fact that these ratios are similar across all URL subsets suggests that there were not necessarily URL- or content-based differences driving this behavior.
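A minimal sketch of the shares-without-clicks calculation follows, ignoring the differential-privacy noise and the confidence intervals it requires, and assuming hypothetical per-URL columns shares and shares_without_clicks (mutually exclusive counts, as described above):

```python
import pandas as pd

# Hypothetical per-URL share counts for two subsets.
urls = pd.DataFrame({
    "subset":                ["baseline", "baseline", "tpfc_false"],
    "shares":                [120, 80, 300],                 # shares with clicks
    "shares_without_clicks": [90, 60, 200],
})

total = urls["shares"] + urls["shares_without_clicks"]
urls["pct_without_clicks"] = urls["shares_without_clicks"] / total * 100
# Average per-URL percentage within each subset (the quantity reported above).
print(urls.groupby("subset")["pct_without_clicks"].mean())
```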

Demographic Differences in Sharing URLs. We consider the rates at which different groups shared the URLs they saw. Figure 6 is constructed like the click-through rate heatmaps (with a top baseline and differences from that baseline below), now considering the average shares-per-view ratio per demographic intersection. Here we consider both shares with and without clicks, i.e., all re-shares. Note that, as with clicks, we cannot distinguish the goals of a share (e.g., sharing because one agrees with the article or to debunk it).

Unlike for click-throughs, we again see a trend towards higher sharing rates (given that a user was exposed) of mis/disinformation URLs by politically right-leaning users, particularly of TPFC False URLs. Unlike for views, here the impact of political leaning seems to dominate compared to age: younger right-leaning users were also more likely to share these URLs compared to baseline. At the same time, we find that older Facebook users were slightly more likely to share URLs in general, regardless of whether they were mis/disinformation.

Finally, we investigate specifically what was shared by Facebook users in different demographic buckets. Figure 7 considers the top 50 most-viewed domains from our baseline and low-quality news URL subsets, plotting the average age and political leaning of users who shared URLs from those domains. We have labeled some example domains, including a cluster of potential mis/disinformation URLs shared mostly by older, politically conservative users, as well as examples on the political left. (The TPFC False URL subset is omitted from this comparison because there are not enough flagged URLs per most-viewed TPFC False domain to reduce noise via averaging sufficiently to allow for meaningful comparison.)
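Each point in Figure 7 is, in essence, a share-weighted average of sharer demographics for one domain. A sketch under assumed column names, with age_midpoint and ppa standing in for the dataset's bucketed age and political page affinity values:

```python
import pandas as pd

# Hypothetical re-share counts per (domain, demographic bucket).
shares = pd.DataFrame({
    "parent_domain": ["siteA.com"] * 3 + ["siteB.com"] * 3,
    "age_midpoint":  [21, 50, 70, 21, 50, 70],
    "ppa":           [-2, 0, 2, -1, 0, 1],
    "share_count":   [10, 40, 150, 120, 60, 20],
})

# Share-weighted mean age and political page affinity per domain.
shares["w_age"] = shares["age_midpoint"] * shares["share_count"]
shares["w_ppa"] = shares["ppa"] * shares["share_count"]
g = shares.groupby("parent_domain")[["w_age", "w_ppa", "share_count"]].sum()
g["avg_age"] = g["w_age"] / g["share_count"]
g["avg_ppa"] = g["w_ppa"] / g["share_count"]
print(g[["avg_age", "avg_ppa"]])   # one point per domain, as plotted in Figure 7
```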


(a) The top heatmap shows baseline click-through rates (clicks per view) by demographic group for all U.S.-shared URLs.

(b) Differences in click-through rates from baseline (e.g., adults age 65+ with PPA=2 clicked through TPFC False URLs 1.37-1.45x more often than average U.S.-shared URLs).

Figure 5: Baseline and Difference in Click-Through Rates by Demographic Groups and URL Subsets. Ranges representing 95% confidence intervals are shown; overlapping ranges do not allow us to distinguish the underlying non-private values.

(a) The top heatmap shows baseline sharing rates (shares per view) by demographic group for all U.S.-shared URLs.

(b) Differences in sharing rates from baseline (e.g., adults age 65+ with PPA=2 shared average TPFC False URLs 2.48-2.61x more often than average U.S.-shared URLs).

Figure 6: Baseline and Difference in Shares/View by Demographic Groups and URL Subsets. Ranges representing 95% confidence intervals are shown; overlapping ranges do not allow us to distinguish the underlying non-private values.


Figure 7: Average Demographics for Users Who Shared URLs from the Top 50 Most-Viewed Domains. For both the U.S.-shared URLs baseline and the low-quality news URL subset, we consider the top 50 most-viewed domains and calculate the average demographics (age and political page affinity) for users who re-shared posts containing URLs from those domains. Several example domains are labeled, and 95% confidence intervals are shown.

5 DISCUSSION

Reflecting on Demographic Differences. Our results show demographic differences in who interacted with potential mis/disinformation on Facebook in 2017-2019. Most strikingly, we find that older adults with right-leaning U.S. political affinities were more exposed to (i.e., viewed more posts containing) potential mis/disinformation URLs. As one caveat, recall that identifying mis/disinformation is challenging: while we relied on reputable sources for our URL lists, we cannot exclude the possibility that these lists (and Facebook's own fact-checking) may disproportionately include politically conservative content. Future work should continue to consider how to identify mis/disinformation and investigate our research questions with those perspectives. Future work should also consider additional baseline URL subsets (e.g., trustworthy news sources, by some definition).

Also noteworthy is that political demographic differences flattened out for click-through rates (though they reappeared for sharing rates). In other words, once a user saw a post containing a mis/disinformation URL, political leaning did not seem to play a role in whether a user clicked on it. This finding may reflect in part Facebook's success targeting users likely to click, and recall that we do not know why people clicked or whether they took the false or low-quality content at face value. Still, this result should caution us not to overestimate the impact of political leaning on how and how often people engage with mis/disinformation without considering the likelihood of being exposed in the first place. The fact that we again see demographic differences for shares may result from many reasons that future work might study; one observation we make is that, unlike clicks, shares are publicly visible, which may impact behaviors among different demographic groups.

At the same time, we see that older adults were more likely to be exposed to (i.e., view), click on, and re-share potential mis/disinformation than younger adults. This is the case both because older adults were disproportionately likely to be shown posts containing these URLs, but also because they then exhibited higher click-through and sharing rates on Facebook in general. Future defenses, whether on social media platforms or outside of them (e.g., educational efforts), should thus look towards reducing susceptibility of older adults in particular.

Comparisons to Prior Findings and Future Research Questions. Our findings contribute to a broader understanding of how people interact with potential mis/disinformation on social media, based on a unique large-scale dataset. While smaller-scale and controlled studies help explain how or why people interact with "fake news", larger-scale measurements can shed light on broader trends. Most related to our work, prior results from larger-scale studies of Facebook and Twitter suggested that false news (under various definitions) is shared more by U.S. politically right-leaning users and older adults, though sharing is generally rare [18, 20]; that Facebook users are more likely to see, click, and share content aligned with their political ideology [2]; that users tend to focus on a small number of news sources [37]; and that low-quality news sources may receive more user engagement than mainstream news [8]. Our findings confirm and update these prior (often pre-2017) results with data from 2017-2019, and emphasize an important nuance in how we should interpret findings about demographic differences: overall trends can be heavily influenced by what content users are exposed to by the platform in the first place.

Our findings confirm other recently published results based on this dataset [19], which used a different list of low-credibility news domains (based on NewsGuard [31]) but also found that older, more conservative adults were more likely to see and share those URLs. Since the underlying dataset is not publicly available, we believe that multiple corroborating peer-reviewed sets of findings are scientifically valuable. In addition to a complementary perspective considering different mis/disinformation URL subsets, our work also adds an analysis of clicks (in addition to views and shares) and highlights that views should be interpreted as reflecting users' exposure rather than their direct behaviors.

Our quantitative results also raise questions for follow-on, more qualitative investigations. Most importantly, our results can reveal trends in what Facebook users do, but not why or under what conditions they do it. For instance: Why (and which types of content) do people re-share so frequently without clicking on the associated URLs? Why do people click on URLs in posts, and what is the result of that action (e.g., how often do they click in order to fact-check)? How and why do older adults use Facebook differently? While our data does not allow us to answer these questions, we emphasize that large-scale dataset releases like this one, despite their limitations, are crucial to allowing researchers to see these trends and formulate such follow-on questions.

Implications for Defenses. Platform defenses can aim to prevent mis/disinformation from reaching users (preventing exposure), or warn them when they encounter it (reducing susceptibility) or when they attempt to share it (reducing spread). Our finding that people across the political spectrum were roughly equally likely to click on (but not necessarily re-share) potential mis/disinformation once exposed suggests that preventing exposure may be crucial. Put another way, one might question Facebook's role and responsibility in surfacing these posts to susceptible users in the first place.

The fact that people frequently re-share posts without clicking on URLs suggests that interventions at sharing time may also be valuable (e.g., a recent Twitter change prompting users to click on URLs before retweeting them [22]).

Regarding interventions at the time of a user's exposure, such as "false news" labels on posts, our dataset unfortunately provides limited insight. Because most shares of URLs happen in the first month or two of their lifetime on Facebook, the month-level granularity of our data prevents us from investigating whether the spread of TPFC-False URLs decreases significantly once they have been fact-checked (and thus labeled in users' feeds). We can confirm from our data that the scope of Facebook's fact-checking efforts is limited (to already viral and clear-cut false cases), meaning that Facebook's fact-checking alone can address only the tip of the iceberg of potential mis/disinformation on the platform. That said, Facebook's methods indeed succeed at fact-checking highly popular mis/disinformation URLs, with potentially great impact.

6 CONCLUSION

We conducted an exploratory analysis of a large-scale dataset of URLs shared on Facebook in 2017-2019, investigating how and how much Facebook users were exposed to and interacted with posts containing potential mis/disinformation URLs, and how these interactions differed across demographic groups. We find that potential mis/disinformation URLs received substantial user engagement, particularly from older adults and from U.S. politically right-leaning users (though not uniformly), and add to a rich and growing literature on mis/disinformation on social media. There are many additional questions we could have investigated in this dataset (e.g., considering different URL subsets) and many more questions that this particular dataset cannot answer. We hope that our exploratory analysis provides a foundation for continued investigations.

ACKNOWLEDGEMENTS

We are very grateful to Social Science One and to Facebook for their partnership in making the Facebook URL Shares dataset available to researchers, including ourselves, and to the Facebook team for their logistical and technical support working with the data. We also wish to thank Sewoong Oh for his guidance on working with differentially private data. We thank our anonymous reviewers and our shepherd, Drew Springall, for their feedback. We are also grateful to Scott Bailey for reviewing an earlier draft of this paper. We thank Eric Zeng for providing us with the updated low-quality news URL list. Other members of the UW Security and Privacy Research Lab also helped contribute feedback and analysis ideas. This work was supported in part by a Social Media and Democracy Research Grant from the Social Science Research Council, by the National Science Foundation under Awards CNS-1513584 and CNS-1651230, and by the UW Center for an Informed Public and the John S. and James L. Knight Foundation.

REFERENCES

[1] Ahmer Arif, Leo Graiden Stewart, and Kate Starbird. 2018. Acting the Part: Examining Information Operations Within #BlackLivesMatter Discourse. Proceedings of the ACM on Human-Computer Interaction (CSCW) 2, Article 20 (Nov. 2018), 27 pages.
[2] Eytan Bakshy, Solomon Messing, and Lada A. Adamic. 2015. Exposure to ideologically diverse news and opinion on Facebook. Science 348, 6239 (2015), 1130-1132.
[3] Pablo Barberá, John T. Jost, Jonathan Nagler, Joshua A. Tucker, and Richard Bonneau. 2015. Tweeting From Left to Right: Is Online Political Communication More Than an Echo Chamber? Psychological Science 26, 10 (2015), 1531-1542.
[4] Alessandro Bessi, Fabiana Zollo, Michela Del Vicario, Michelangelo Puliga, Antonio Scala, Guido Caldarelli, Brian Uzzi, and Walter Quattrociocchi. 2016. Users Polarization on Facebook and Youtube. PLOS ONE 11, 8 (08 2016).
[5] Leticia Bode and Emily K. Vraga. 2015. In Related News, That Was Wrong: The Correction of Misinformation Through Related Stories Functionality in Social Media. Journal of Communication 65, 4 (2015), 619-638.
[6] Alexandre Bovet and Hernán A. Makse. 2019. Influence of fake news in Twitter during the 2016 US presidential election. Nature Communications 10, 7 (2019).
[7] Ceren Budak. 2019. What Happened? The Spread of Fake News Publisher Content During the 2016 U.S. Presidential Election. In The World Wide Web Conference (WWW).
[8] Peter Burger, Soeradj Kanhai, Alexander Pleijter, and Suzan Verberne. 2019. The reach of commercially motivated junk news on Facebook. PLOS ONE 14, 8 (08 2019).
[9] Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. 2006. Calibrating Noise to Sensitivity in Private Data Analysis. In Third Conference on Theory of Cryptography (TCC). 20 pages.
[10] Georgina Evans and Gary King. 2020. Statistically Valid Inferences from Differentially Private Data Releases, with Application to the Facebook URLs Dataset. (2020). https://j.mp/38NrmRW
[11] Facebook. [n. d.]. Fact-Checking on Facebook. ([n. d.]). https://www.facebook.com/business/help/2593586717571940
[12] Jessica T. Feezell and Brittany Ortiz. 2019. 'I saw it on Facebook': An experimental analysis of political learning through social media. Information, Communication & Society (2019), 1-20.
[13] Martin Flintham, Christian Karner, Khaled Bachour, Helen Creswick, Neha Gupta, and Stuart Moran. 2018. Falling for Fake News: Investigating the Consumption of News via Social Media. In CHI Conference on Human Factors in Computing Systems.
[14] Adam Fourney, Miklós Z. Rácz, Gireeja Ranade, Markus Mobius, and Eric Horvitz. 2017. Geographic and Temporal Trends in Fake News Consumption During the 2016 US Presidential Election. In ACM Conference on Information and Knowledge Management.
[15] Deen Freelon. 2018. Computational Research in the Post-API Age. Political Communication 35, 4 (2018), 665-668.
[16] R. Kelly Garrett and Shannon Poulsen. 2019. Flagging Facebook Falsehoods: Self-Identified Humor Warnings Outperform Fact Checker and Peer Warnings. Journal of Computer-Mediated Communication 24, 5 (10 2019), 240-258.
[17] Christine Geeng, Savanna Yee, and Franziska Roesner. 2020. Fake News on Facebook and Twitter: Investigating How People (Don't) Investigate. In ACM Conference on Human Factors in Computing Systems (CHI).
[18] Nir Grinberg, Kenneth Joseph, Lisa Friedland, Briony Swire-Thompson, and David Lazer. 2019. Fake news on Twitter during the 2016 U.S. presidential election. Science 363, 6425 (2019).
[19] Andy Guess, Kevin Aslett, Joshua Tucker, Richard Bonneau, and Jonathan Nagler. 2021. Cracking Open the News Feed: Exploring What U.S. Facebook Users See and Share with Large-Scale Platform Data. Journal of Quantitative Description: Digital Media 1 (April 2021).
[20] Andrew Guess, Jonathan Nagler, and Joshua Tucker. 2019. Less than you think: Prevalence and predictors of fake news dissemination on Facebook. Science Advances 5, 1 (2019).
[21] Andrew M. Guess, Brendan Nyhan, and Jason Reifler. 2020. Exposure to untrustworthy websites in the 2016 US election. Nature Human Behavior 4 (2020), 472-480.
[22] Taylor Hatmaker. 2020. Twitter plans to bring prompts to 'read before you retweet' to all users. (Sept. 2020). https://techcrunch.com/2020/09/24/twitter-read-before-retweet
[23] Eslam Hussein, Prerna Juneja, and Tanushree Mitra. 2020. Measuring Misinformation in Video Search Platforms: An Audit Study on YouTube. Proceedings of the ACM on Human-Computer Interaction 4 (05 2020).
[24] Daniel Jolley and Karen M. Douglas. 2017. Prevention is better than cure: Addressing anti-vaccine conspiracy theories. Journal of Applied Social Psychology 47, 8 (2017), 459-469.
[25] David Lazer, Matthew A. Baum, Yochai Benkler, Adam J. Berinsky, Kelly M. Greenhill, Filippo Menczer, Miriam J. Metzger, Brendan Nyhan, Gordon Pennycook, David Rothschild, Michael Schudson, Steven A. Sloman, Cass R. Sunstein, Emily Thorson, Duncan J. Watts, and Jonathan Zittrain. 2018. The science of fake news. Science 359 (2018), 1094-1096.
[26] Ji Young Lee and S. Shyam Sundar. 2013. To Tweet or to Retweet? That Is the Question for Health Professionals on Twitter. Health Communication 28, 5 (2013), 509-524.
[27] Yanqin Lu. 2019. Incidental Exposure to Political Disagreement on Facebook and Corrective Participation: Unraveling the Effects of Emotional Responses and Issue Relevance. International Journal of Communication 13 (2019).
[28] Binny Mathew, Ritam Dutt, Pawan Goyal, and Animesh Mukherjee. 2019. Spread of Hate Speech in Online Social Media. In 10th ACM Conference on Web Science (WebSci).
[29] Solomon Messing, Christina DeGregorio, Bennett Hillenbrand, Gary King, Saurav Mahanti, Zagreb Mukerjee, Chaya Nayak, Nate Persily, Bogdan State, and Arjun Wilkins. 2020. Facebook Privacy-Protected Full URLs Data Set. Harvard Dataverse (2020). https://doi.org/10.7910/DVN/TDOAPG
[30] Mia Moody-Ramirez and Andrew B. Church. 2019. Analysis of Facebook Meme Groups Used During the 2016 US Presidential Election. Social Media + Society 5, 1 (2019).
[31] NewsGuard. [n. d.]. NewsGuard: Restoring Trust & Accountability. ([n. d.]). https://www.newsguardtech.com/
[32] Donie O'Sullivan. 2020. Facebook fact-checkers to Trump supporters: We are not trying to censor you. (Oct. 2020). https://www.cnn.com/2020/10/29/tech/fact-checkers-facebook-trump/index.html
[33] Gordon Pennycook, Adam Bear, Evan Collins, and David G. Rand. 2020. The Implied Truth Effect: Attaching Warnings to a Subset of Fake News Headlines Increases Perceived Accuracy of Headlines Without Warnings. Management Science (Feb. 2020).
[34] Quintly. 2018. Changes to Facebook's API - February 6th, 2018. (Feb. 2018). https://support.quintly.com/hc/en-us/articles/115004414274-Changes-to-Facebook-s-API-February-6th-2018
[35] Gustavo Resende, Philipe Melo, Hugo Sousa, Johnnatan Messias, Marisa Vasconcelos, Jussara Almeida, and Fabrício Benevenuto. 2019. (Mis)Information Dissemination in WhatsApp: Gathering, Analyzing and Countermeasures. In The World Wide Web Conference (WWW).
[36] Mattia Samory and Tanushree Mitra. 2018. Conspiracies Online: User discussions in a Conspiracy Community Following Dramatic Events. In International AAAI Conference on Web and Social Media.
[37] Ana Lucía Schmidt, Fabiana Zollo, Michela Del Vicario, Alessandro Bessi, Antonio Scala, Guido Caldarelli, H. Eugene Stanley, and Walter Quattrociocchi. 2017. Anatomy of news consumption on Facebook. Proceedings of the National Academy of Sciences 114, 12 (2017), 3035-3039.
[38] Chengcheng Shao, Giovanni Luca Ciampaglia, Onur Varol, Kaicheng Yang, Alessandro Flammini, and Filippo Menczer. 2018. The spread of low-credibility content by social bots. In Nature Communications.
[39] Elisa Shearer and Elizabeth Grieco. 2019. Americans are wary of the role social media sites play in delivering the news. Pew Research Center (Oct. 2019). https://www.journalism.org/2019/10/02/americans-are-wary-of-the-role-social-media-sites-play-in-delivering-the-news/
[40] Social Science One. [n. d.]. Building Industry-Academic Partnerships. ([n. d.]). https://socialscience.one/
[41] Social Science Research Council. [n. d.]. Social Media and Democracy Research Grants. ([n. d.]). https://www.ssrc.org/fellowships/view/social-media-and-democracy-research-grants/
[42] Olivia Solon. 2020. Sensitive to claims of bias, Facebook relaxed misinformation rules for conservative pages. (Aug. 2020). https://www.nbcnews.com/tech/tech-news/sensitive-claims-bias-facebook-relaxed-misinformation-rules-conservative-pages-n1236182
[43] Samuel Spies. 2020. How Misinformation Spreads. (July 2020). https://mediawell.ssrc.org/literature-reviews/how-misinformation-spreads/versions/1-0/
[44] Kate Starbird, Ahmer Arif, Tom Wilson, Katherine Van Koevering, Katya Yefimova, and Daniel Scarnecchia. 2018. Ecosystem or Echo-System? Exploring Content Sharing across Alternative Media Domains. In International AAAI Conference on Web and Social Media (ICWSM).
[45] Edson C. Tandoc Jr. 2019. Tell Me Who Your Sources Are. Journalism Practice 13, 2 (2019), 178-190.
[46] Rebekah Tromble, Andreas Storz, and Daniela Stockmann. 2017. We Don't Know What We Don't Know: When and How the Use of Twitter's Public APIs Biases Scientific Inference. (Nov. 2017). https://ssrn.com/abstract=3079927
[47] Michela Del Vicario, Alessandro Bessi, Fabiana Zollo, Fabio Petroni, Antonio Scala, Guido Caldarelli, H. Eugene Stanley, and Walter Quattrociocchi. 2016. The spreading of misinformation online. Proceedings of the National Academy of Sciences 113, 3 (2016), 554-559.
[48] Soroush Vosoughi, Deb Roy, and Sinan Aral. 2018. The spread of true and false news online. Science 359, 6380 (2018), 1146-1151.
[49] Claire Wardle and Hossein Derakhshan. 2017. Information Disorder: Toward an interdisciplinary framework for research and policymaking. Technical Report. Council of Europe.
[50] Angela Xiao Wu and Harsh Taneja. 2020. Platform enclosure of human behavior and its measurement: Using behavioral trace data against platform episteme. New Media & Society (2020).
[51] Eric Zeng, Tadayoshi Kohno, and Franziska Roesner. 2020. Bad News: Clickbait and Deceptive Ads on News and Misinformation Websites. In Workshop on Technology and Consumer Protection.

  • Abstract
  • 1 Introduction
  • 2 Background and Related Work
  • 3 Methods
    • 31 Dataset Overview
    • 32 Known and Potential MisDisinformation URL Subsets
    • 33 Ethics
    • 34 Limitations
      • 4 Findings
        • 41 Characterizing Our URL Subsets
        • 42 Views (Exposure) Who Saw What
        • 43 Clicking Who Clicked What
        • 44 Sharing Who Shared What
          • 5 Discussion
          • 6 Conclusion
          • References

Interactions with Potential MisDisinformation URLs Among US Users on Facebook FOCIrsquo21 August 27 2021 Virtual Event USA

Figure 1 Average Cumulative View Counts Over Time for URLs in Different Subsets 95 confidence intervals are shown

Figure 2 Percentage of URL Views in Each URL Subset by (a) Gender (b) Age Group (c) Political Page Affinity 95 confidenceintervals are shown

for those URLs This table reveals the comparatively small scopeof Facebookrsquos third-party fact checking labels which applied onlyto a small number of URLs most of which (except youtubecom)came from domains that appeared with relatively few unique URLson Facebook

42 Views (Exposure) Who SawWhat

Overall Views Exposure We begin with a core question howmuch were people exposed to (ie shown on their newsfeed) URLsin our different misdisinformation subsets compared to the base-line dataset of all US URLs In exploring this question we em-phasize again that Facebook plays a role in who sees which postsIn other words trends we surface below may reflect Facebookrsquosnewsfeed algorithm rather than (or in addition to) differences inhuman behaviors (eg which Facebook pages they follow) [50]

To investigate views we return to Tables 1 and 2 which show thetotal number of views across all URLs from each domain We findthat URLs from the most popular US domains overall received anorder of magnitude more views than URLs from popular domainsin the low-quality news subset (though the latter still received largenumbers of views) We also note that the total TPFC False URLviews for the domains in Table 3 were substantial considering thesmaller number of TPFC False URLs just dozens of TPFC False URLsproduced several million views (perhaps by definition given thatviral misinformation is prioritized for Facebookrsquos fact checking)

Spread Over Time We also ask how quickly did URLs in thedifferent subsets spread In Figure 1 we investigate the cumulativeviews over time averaged across URLs for each of our subsets Sincethe dataset provides timestamp information only at month-levelgranularity we can track only cumulative monthly views We findthat the average URL (in all subsets) spreads most quickly within the

first month after the initial post suggesting that misdisinformationinterventions must be deployed quickly to be effective

From Figure 1 we also see that URLs in the TPFC False subsetreceived significantly more views on average than those in othersubsets (hence likely making them targets for fact-checking) Inaddition TPFC False URLs were longer-lived receiving a greaterfraction of views after the firstmonth compared to the others (nearlydoubling in cumulative views) Unfortunately the granularity of thetimestamps in the dataset prevent us from investigating whetherspread decreased substantially after the time of fact checking

Demographic Differences in ExposureWe now turn to our sec-ond research question considering different demographic groups(gender age and political leaning) While we do not know the over-all demographic breakdown of Facebook users in general we cancompare demographics in interactions with our US URLs baselineto interactions with our low-quality news and TPFC False URLsubsets

Figure 2 shows the percentage of views from users in differentdemographic categories for the average URL in each URL subsetFor example considering the TPFC False URL subset we find thaton average (ie averaged across all URLs in this subset) more than20 of views came from users ages 65 or older Taking all US URLsas our baseline we find that known or potential misdisinformationURLs in both lists were shownmore often (on average) than baselineto male users to users ages 55 and older and to users with politicalpage affinity of 1 or more (ie right-leaning)

However the breakdown in Figure 2 obscures potential inter-actions between demographic categories (for example older usersmay also tend to be more conservative) To tease apart these poten-tial interactions Figure 3 shows heatmaps of how often differentdemographic groups were shown posts containing these URLs

FOCIrsquo21 August 27 2021 Virtual Event USA Aydan Bailey Theo Gregersen and Franziska Roesner

Figure 3 Percentage of URL Views by Demographic Groups(Age and Political Page Affinity) for Each URL Subset Per-centages are presented in ranges with 95 confidence intervals dueto differential privacy noise

Each heatmap corresponds to a URL subset and each square ofeach heatmap represents a different demographic intersection de-fined by the tuple age bucket political page affinity (Since genderdid not have an impact on our conclusions we collapse gender for

readability of the heatmap) The percentages represent the fractionof views that an average URL-containing post received from eachdemographic group (ie for each heatmap the percentages sum to100)

Figure 3 thus compares how frequently relative to each otherdifferent overlapping demographic groups were shown posts con-taining URLs For both of our misdisinformation URL subsets wefind that US users who are older and more politically right-leaningwere more likely to see misinformation URLs in their feeds The topbaseline heatmap by contrast shows a larger percentage of viewsfrom younger and farther left-leaning usersmdash in other words thetrends for low-quality news and fact-checked false URLs were notmerely reflections of overall trends of everyonersquos Facebook usageandor experience

43 Clicking Who Clicked WhatWhen a Facebook user saw a post containing a URL what did theydo Figure 4 shows a variety of behaviors aggregated across allusers in our dataset for each URL subset In this subsection webegin by considering clicking behaviors Unlike views which canresult from Facebook pushing content to users clicks are actionstaken directly by users (though still predicated on having seen thepost in the first place)

Overall Clicking Behaviors Figure 4rsquos set of ldquoclicksrdquo bars showsaverage click-through rates ie average clicks-per-view We findthat compared to the baseline of all US URLs which users clickedon roughly 6 of the time they see them URLs in our misdisinfor-mation sets were clicked on more often roughly 8-9 of the timeIn other words the average US user who saw a potential misdis-information URL was more likely to click on it than the averageuser who encountered any random link on Facebook We note thatthis comparison depends on our choice of baseline comparing tothe ldquoaveragerdquo US-shared URL the reasons for increased engage-ment with potential misdisinformation here may result not frommisdisinformation tactics but (for example) because news-stylecontent receives more engagement in general

Demographic Differences in Click-Through Rate The heat-maps in Figure 5 break down the click-through rates for differentdemographic groups and URL subsets Here we show a baselineclick-through rate heatmap on top the percentages (with confi-dence intervals of 95) represent the average click-through rate perdemographic intersection for all US-shared URLs The next twoheatmaps show differences in click-through rate from the baselinefor each demographic intersection we calculated the ratio of theclick-through rate to the baseline click-through rate (For examplea value of 1 would indicate no change while a value of 2 wouldindicate that the click-through rate doubled)

To our surprise the demographic trends observed when consid-ering views (above) do not fully hold here In addition to baselineclick-through rate behavior being roughly comparable on both sidesof the political spectrum we see that for the TPFC False and low-quality news URLs differences in click-through rates comparedto the baseline were not skewed by political page affinity Thatis despite the demographic differences in who saw these posts

Interactions with Potential MisDisinformation URLs Among US Users on Facebook FOCIrsquo21 August 27 2021 Virtual Event USA

Figure 4 Average Actions per View for URLs in each Subset 95 confidence intervals are shown

political leaning did not seem to impact on click-through rates forknown or potential misdisinformation URLs

We do continue to see differences by age though not unique tomisdisinformation Older adults were more likely to click on URLsin general (top heatmap) without additional age-related differencesfor potential misdisinformation URLs

We cannot distinguishwhy people clicked on links or the impactsof reading the underlying content (eg perhaps politically left-leaning users click on right-leaning links to debunk them) Stillthis finding suggests that political differences in engagement withmisinformation reported in prior research may be partially due towho is exposed by the platform in the first place rather than morefundamental differences between groups Moreover we emphasizethat this exposure is not random Facebookrsquos newsfeed algorithmmay optimize showing these posts to precisely those users morelikely to click on them and our findings may reflect the success (orat least uniformity) of Facebookrsquos algorithm in doing just that Stillthis result underscores the importance of limiting initial exposureto misdisinformation as we discuss further in Section 5

44 Sharing Who Shared WhatWe now consider URL sharing behaviors Our dataset includestwo types of interactions shares which involve users re-sharinga Facebook post containing a URL after clicking on it and shareswithout clicks where users re-share a post containing a URLwithoutfirst clicking on the URL These two metrics are mutually exclusivemeaning that ldquosharesrdquo refers only to shares with clicks We alsoemphasize that our interaction data (with associated demographicinformation) includes only re-shares not original posts of a URL

Overall Sharing Behaviors Figure 4 shows that misdisinforma-tion URLs garnered more sharesmdash and more shares without clicksmdashper view than average US-shared URLs Share rates are particularlyhigh for TPFC False URLs Since Facebook prioritizes fact-checkingviral misdisinformation this finding may reflect in part how theTPFC False subset was selected in the first place However notethat the sharing rates for the low-quality news URLs approachedthe rates for these noteworthy TPFC False URLs ie they also drewsubstantial engagement (per view) We emphasize again that thecomparison to baseline depends on our choice of a broad baselinewe see that (these) potential misdisinformation URLs spread more

efficiently than other US URLs on average but we leave to fu-ture work more detailed comparisons against other potential URLsubsets of interest (eg mainstream news)

Across all URL subsets we were surprised at the high rate ofshares without clicks Considering share counts (rather than shares-per-view) nearly half of all shares for all URL subsets was doneby Facebook users who had not clicked on the URL they werere-sharing Specifically the percentage of shares without clicksfor the average URL was (with 95 confidence intervals) 4229-4235 in the overall US-shared dataset 3981-4018 for TPFCFalse and 4166-4213 for low-quality news The fact that theseratios are similar across all URL subsets suggests that there were notnecessarily URL or content based differences driving this behavior

Demographic Differences in Sharing URLs We consider therates at which different groups shared the URLs they see Figure 6is constructed like the click-through rate heatmaps (with a topbaseline and differences from that baseline below) now consideringthe average shares-per-view ratio per demographic intersectionHere we consider both shares with and without clicks ie all re-shares Note that as with clicks we cannot distinguish the goalsof a share (eg sharing because one agrees with the article or todebunk it)

Unlike for click-throughs we again see a trend towards highersharing rates (given that a user is exposed) of misdisinformationURLs by politically right-leaning users particularly of TPFC FalseURLs Unlike views here the impact of political leaning seemsto dominate compared to agemdashyounger right-leaning users werealso more likely to share these URLs compared to baseline At thesame time we find that older Facebook users were slightly morelikely to share URLs in general regardless of whether they aremisdisinformation

Finally we investigate specifically what was shared by Facebookusers in different demographic buckets Figure 7 considers the top 50most viewed domains from our baseline and low-quality news URLsubsets plotting the average age and political leaning of users whoshared URLs from those domains We have labeled some exampledomains including a cluster of potential misdisinformation URLsshared mostly by older politically conservative users as well asexamples on the political left (The TPFC False URL subset is omittedfrom this comparison because there are not enough flagged URLsper most-viewed TPFC False domain to reduce noise via averagingsufficiently to allow for meaningful comparison)

FOCIrsquo21 August 27 2021 Virtual Event USA Aydan Bailey Theo Gregersen and Franziska Roesner

(a) The top heatmap shows baseline click-through rates (clicks per view) bydemographic group for all US-shared URLs

(b) Differences in click-through rates from baseline (eg TPFC False URLswere clicked through by adults age 65+ and PPA=2 137-145x more oftenthan average US-shared URLs)

Figure 5 Baseline and Difference in Click-Through Rates byDemographic Groups and URL Subsets Ranges representing95 confidence intervals are shown overlapping ranges do notallow us to distinguish the underlying non-private values

(a) The top heatmap shows baseline sharing rates (shares per view) bydemographic group for all US-shared URLs

(b) Differences in sharing rates from baseline (eg average TPFC FalseURLs were shared by adults age 65+ and PPA=2 248-261x more often thanaverage US-shared URLs)

Figure 6 Baseline and Difference in SharesView by Demo-graphic Groups and URL Subsets Ranges representing 95 con-fidence intervals are shown overlapping ranges do not allow us todistinguish the underlying non-private values

Interactions with Potential MisDisinformation URLs Among US Users on Facebook FOCIrsquo21 August 27 2021 Virtual Event USA

Figure 7 Average Demographics for Users Who Shared URLs from the Top 50 Most Viewed Domains For both the US-sharedURLs baseline and the low-quality news URL subset we consider the top 50 most-views domains and calculate the average demographics(age and political page affinity) for users who re-shared posts containing URLs from those domains Several example domains are labeled and95 confidence intervals are shown

5 DISCUSSION

Reflecting on Demographic Differences Our results show de-mographic differences in who interacted with potential misdis-information on Facebook in 2017-2019 Most strikingly we findthat older adults with right-leaning US political affinities weremore exposed to (ie viewed more posts containing) potentialmisdisinformation URLs As one caveat recall that identifyingmisdisinformation is challengingmdashwhile we relied on reputablesources for our URL lists we cannot exclude the possibility thatthese lists (and Facebookrsquos own fact-checking) may disproportion-ately include politically conservative content Future work shouldcontinue to consider how to identify misdisinformation and in-vestigate our research questions with those perspectives Futurework should also consider additional baseline URL subsets (egtrustworthy news sources by some definition)

Also noteworthy is that political demographic differences flat-tened out for click-through rates (though they reappeared for shar-ing rates) In other words once a user saw a post containing amisdisinformation URL political leaning did not seem to play arole in whether a user clicked on it This finding may reflect inpart Facebookrsquos success targeting users likely to click and recallthat we do not know why people clicked or whether they took thefalse or low-quality content at face value Still this result shouldcaution us not to overestimate the impact of political leaning onhow and how often people engage with misdisinformation withoutconsidering the likelihood of being exposed in the first The fact thatwe again see demographic differences for shares may result frommany reasons that future work might study one observation wemake is that unlike clicks shares are publicly visible which mayimpact behaviors among different demographic groups

At the same time we see that older adults were more likely tobe exposed to (ie view) click on and re-share potential misdis-information than younger adults This is the case both because

older adults were disproportionately likely to be shown posts con-taining these URLs but also because they then exhibited higherclick-through and sharing rates on Facebook in general Futuredefensesmdash either on social media platforms or outside of them (egeducational efforts) mdash should thus look towards reducing suscepti-bility of older adults in particular

Comparisons to Prior Findings and Future Research Ques-tions Our findings contribute to a broader understanding of howpeople interact with potential misdisinformation on social mediabased on a unique large-scale dataset While smaller-scale and con-trolled studies help explain how or why people interact with ldquofakenewsrdquo larger-scale measurements can shed light on broader trendsMost related to our work prior results from larger-scale studiesof Facebook and Twitter suggested that false news (under variousdefinitions) is shared more by US politically right-leaning usersand older adults though sharing is generally rare [18 20] that Face-book users are more likely to see click and share content alignedwith their political ideology [2] that users tend to focus on a smallnumber of news sources [37] and that low-quality news sourcesmay receive more user engagement than mainstream news [8] Ourfindings confirm and update these prior (often pre-2017) resultswith data from 2017-2019 and emphasize an important nuance inhow we should interpret findings about demographic differencesoverall trends can be heavily influenced by what content users areexposed to by the platform in the first place

Our findings confirm other recently published results based onthis dataset [19] which used a different list of low-credibility newsdomains (based on NewsGuard [31]) but also found that older moreconservative adults were more likely to see and share those URLsSince the underlying dataset is not publicly available we believethat multiple corroborating peer-reviewed sets of findings are sci-entifically valuable In addition to a complementary perspectiveconsidering different misdisinformation URL subsets our work

FOCIrsquo21 August 27 2021 Virtual Event USA Aydan Bailey Theo Gregersen and Franziska Roesner

also adds an analysis of clicks (in addition to views and shares)and highlights that views should be interpreted as reflecting usersrsquoexposure rather than their direct behaviors

Our quantitative results also raise questions for follow-on morequalitative investigations Most importantly our results can revealtrends in what Facebook users do but not why or under whatconditions they do it For instanceWhy (andwhich types of content)do people reshare so frequently without clicking on the associatedURLs Why do people click on URLs in posts and what is the resultof that actionmdash eg how often do they click in order to fact-checkHow and why do older adults use Facebook differently While ourdata does not allow us to answer these questions we emphasize thatlarge-scale dataset releases like this one despite their limitationsare crucial to allowing researchers to see these trends and formulatesuch follow-on questions

Implications for Defenses Platform defenses can aim to preventmisdisinformation from reaching users (preventing exposure) orwarn them when they encounter it (reducing susceptibility) orwhen they attempt to share it (reducing spread) Our finding thatpeople across the political spectrum were roughly equally likely toclick on (but not necessarily re-share) potential misdisinformationonce exposed suggests that preventing exposure may be crucial Putanother way one might question Facebookrsquos role and responsibilityin surfacing these posts to susceptible users in the first place

The fact that people frequently re-share posts without clickingon URLs suggests that interventions at sharing time may also bevaluable (eg a recent Twitter change prompting users to click onURLs before retweeting them [22])

Regarding interventions at the time of a userrsquos exposuremdash suchas ldquofalse newsrdquo labels on postsmdash our dataset unfortunately pro-vides limited insight Because most shares of URLs happen in thefirst month or two of their lifetime on Facebook the month-levelgranularity of our data prevents us from investigating whetherthe spread of TPFC-False URLs decreases significantly once theyhave been fact-checked (and thus labeled in usersrsquo feeds) We canconfirm from our data that the scope of Facebookrsquos fact-checkingefforts is limited (to already viral and clear-cut false cases) mean-ing that Facebookrsquos fact-checking alone can address only the tip ofthe iceberg of potential misdisinformation on the platform Thatsaid Facebookrsquos methods indeed succeed at fact-checking highlypopular misdisinformation URLs with potentially great impact

6 CONCLUSIONWe conducted an exploratory analysis of a large-scale dataset ofURLs shared on Facebook in 2017-2019 investigating how and howmuch Facebook users were exposed to and interacted with postscontaining potential misdisinformation URLs and how these inter-actions differed across demographic groups We find that potentialmisdisinformation URLs received substantial user engagementparticularly from older adults and from US politically-right lean-ing users (though not uniformly) and add to a rich and growingliterature on misdisinformation on social media There are manyadditional questions we could have investigated in this dataset (egconsidering different URL subsets) and many more questions thatthis particular dataset cannot answer We hope that our exploratoryanalysis provides a foundation for continued investigations

ACKNOWLEDGEMENTSWe are very grateful to Social Science One and to Facebook for theirpartnership in making the Facebook URL Shares dataset availableto researchers including ourselves and to the Facebook team fortheir logistical and technical support working with the data Wealso wish to thank Sewoong Oh for his guidance on working withdifferentially private data We thank our anonymous reviewersand our shepherd Drew Springall for their feedback We are alsograteful to Scott Bailey for reviewing an earlier draft of this paperWe thank Eric Zeng for providing us with the updated low-qualitynews URL list Other members of the UW Security and PrivacyResearch Lab also helped contribute feedback and analysis ideasThis work was supported in part by a Social Media and DemocracyResearch Grant from the Social Science Research Council by theNational Science Foundation under Awards CNS-1513584 and CNS-1651230 and by the UW Center for an Informed Public and theJohn S and James L Knight Foundation

REFERENCES
[1] Ahmer Arif, Leo Graiden Stewart, and Kate Starbird. 2018. Acting the Part: Examining Information Operations Within #BlackLivesMatter Discourse. Proceedings of the ACM on Human-Computer Interaction (CSCW) 2, Article 20 (Nov. 2018), 27 pages.
[2] Eytan Bakshy, Solomon Messing, and Lada A. Adamic. 2015. Exposure to ideologically diverse news and opinion on Facebook. Science 348, 6239 (2015), 1130–1132.
[3] Pablo Barberá, John T. Jost, Jonathan Nagler, Joshua A. Tucker, and Richard Bonneau. 2015. Tweeting From Left to Right: Is Online Political Communication More Than an Echo Chamber? Psychological Science 26, 10 (2015), 1531–1542.
[4] Alessandro Bessi, Fabiana Zollo, Michela Del Vicario, Michelangelo Puliga, Antonio Scala, Guido Caldarelli, Brian Uzzi, and Walter Quattrociocchi. 2016. Users Polarization on Facebook and Youtube. PLOS ONE 11, 8 (08 2016).
[5] Leticia Bode and Emily K. Vraga. 2015. In Related News, That Was Wrong: The Correction of Misinformation Through Related Stories Functionality in Social Media. Journal of Communication 65, 4 (2015), 619–638.
[6] Alexandre Bovet and Hernán A. Makse. 2019. Influence of fake news in Twitter during the 2016 US presidential election. Nature Communications 10, 7 (2019).
[7] Ceren Budak. 2019. What Happened? The Spread of Fake News Publisher Content During the 2016 U.S. Presidential Election. In The World Wide Web Conference (WWW).
[8] Peter Burger, Soeradj Kanhai, Alexander Pleijter, and Suzan Verberne. 2019. The reach of commercially motivated junk news on Facebook. PLOS ONE 14, 8 (08 2019).
[9] Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. 2006. Calibrating Noise to Sensitivity in Private Data Analysis. In Third Conference on Theory of Cryptography (TCC), 20.
[10] Georgina Evans and Gary King. 2020. Statistically Valid Inferences from Differentially Private Data Releases, with Application to the Facebook URLs Dataset. (2020). https://j.mp/38NrmRW
[11] Facebook. [n. d.]. Fact-Checking on Facebook. ([n. d.]). https://www.facebook.com/business/help/2593586717571940
[12] Jessica T. Feezell and Brittany Ortiz. 2019. 'I saw it on Facebook': An experimental analysis of political learning through social media. Information, Communication & Society (2019), 1–20.
[13] Martin Flintham, Christian Karner, Khaled Bachour, Helen Creswick, Neha Gupta, and Stuart Moran. 2018. Falling for Fake News: Investigating the Consumption of News via Social Media. In CHI Conference on Human Factors in Computing Systems.
[14] Adam Fourney, Miklós Z. Rácz, Gireeja Ranade, Markus Mobius, and Eric Horvitz. 2017. Geographic and Temporal Trends in Fake News Consumption During the 2016 US Presidential Election. In ACM Conference on Information and Knowledge Management.
[15] Deen Freelon. 2018. Computational Research in the Post-API Age. Political Communication 35, 4 (2018), 665–668.
[16] R. Kelly Garrett and Shannon Poulsen. 2019. Flagging Facebook Falsehoods: Self-Identified Humor Warnings Outperform Fact Checker and Peer Warnings. Journal of Computer-Mediated Communication 24, 5 (10 2019), 240–258.
[17] Christine Geeng, Savanna Yee, and Franziska Roesner. 2020. Fake News on Facebook and Twitter: Investigating How People (Don't) Investigate. In ACM Conference on Human Factors in Computing Systems (CHI).
[18] Nir Grinberg, Kenneth Joseph, Lisa Friedland, Briony Swire-Thompson, and David Lazer. 2019. Fake news on Twitter during the 2016 U.S. presidential election. Science 363, 6425 (2019).
[19] Andy Guess, Kevin Aslett, Joshua Tucker, Richard Bonneau, and Jonathan Nagler. 2021. Cracking Open the News Feed: Exploring What U.S. Facebook Users See and Share with Large-Scale Platform Data. Journal of Quantitative Description: Digital Media 1 (April 2021).
[20] Andrew Guess, Jonathan Nagler, and Joshua Tucker. 2019. Less than you think: Prevalence and predictors of fake news dissemination on Facebook. Science Advances 5, 1 (2019).
[21] Andrew M. Guess, Brendan Nyhan, and Jason Reifler. 2020. Exposure to untrustworthy websites in the 2016 US election. Nature Human Behavior 4 (2020), 472–480.
[22] Taylor Hatmaker. 2020. Twitter plans to bring prompts to 'read before you retweet' to all users. (Sept. 2020). https://techcrunch.com/2020/09/24/twitter-read-before-retweet
[23] Eslam Hussein, Prerna Juneja, and Tanushree Mitra. 2020. Measuring Misinformation in Video Search Platforms: An Audit Study on YouTube. Proceedings of the ACM on Human-Computer Interaction 4 (05 2020).
[24] Daniel Jolley and Karen M. Douglas. 2017. Prevention is better than cure: Addressing anti-vaccine conspiracy theories. Journal of Applied Social Psychology 47, 8 (2017), 459–469.
[25] David Lazer, Matthew A. Baum, Yochai Benkler, Adam J. Berinsky, Kelly M. Greenhill, Filippo Menczer, Miriam J. Metzger, Brendan Nyhan, Gordon Pennycook, David Rothschild, Michael Schudson, Steven A. Sloman, Cass R. Sunstein, Emily Thorson, Duncan J. Watts, and Jonathan Zittrain. 2018. The science of fake news. Science 359 (2018), 1094–1096.
[26] Ji Young Lee and S. Shyam Sundar. 2013. To Tweet or to Retweet? That Is the Question for Health Professionals on Twitter. Health Communication 28, 5 (2013), 509–524.
[27] Yanqin Lu. 2019. Incidental Exposure to Political Disagreement on Facebook and Corrective Participation: Unraveling the Effects of Emotional Responses and Issue Relevance. International Journal of Communication 13 (2019).
[28] Binny Mathew, Ritam Dutt, Pawan Goyal, and Animesh Mukherjee. 2019. Spread of Hate Speech in Online Social Media. In 10th ACM Conference on Web Science (WebSci).
[29] Solomon Messing, Christina DeGregorio, Bennett Hillenbrand, Gary King, Saurav Mahanti, Zagreb Mukerjee, Chaya Nayak, Nate Persily, Bogdan State, and Arjun Wilkins. 2020. Facebook Privacy-Protected Full URLs Data Set. Harvard Dataverse (2020). https://doi.org/10.7910/DVN/TDOAPG
[30] Mia Moody-Ramirez and Andrew B. Church. 2019. Analysis of Facebook Meme Groups Used During the 2016 US Presidential Election. Social Media + Society 5, 1 (2019).
[31] NewsGuard. [n. d.]. NewsGuard: Restoring Trust & Accountability. ([n. d.]). https://www.newsguardtech.com
[32] Donie O'Sullivan. 2020. Facebook fact-checkers to Trump supporters: We are not trying to censor you. (Oct. 2020). https://www.cnn.com/2020/10/29/tech/fact-checkers-facebook-trump/index.html
[33] Gordon Pennycook, Adam Bear, Evan Collins, and David G. Rand. 2020. The Implied Truth Effect: Attaching Warnings to a Subset of Fake News Headlines Increases Perceived Accuracy of Headlines Without Warnings. Management Science (Feb. 2020).
[34] Quintly. 2018. Changes to Facebook's API - February 6th, 2018. (Feb. 2018). https://support.quintly.com/hc/en-us/articles/115004414274-Changes-to-Facebook-s-API-February-6th-2018
[35] Gustavo Resende, Philipe Melo, Hugo Sousa, Johnnatan Messias, Marisa Vasconcelos, Jussara Almeida, and Fabrício Benevenuto. 2019. (Mis)Information Dissemination in WhatsApp: Gathering, Analyzing and Countermeasures. In The World Wide Web Conference (WWW).
[36] Mattia Samory and Tanushree Mitra. 2018. Conspiracies Online: User discussions in a Conspiracy Community Following Dramatic Events. In International AAAI Conference on Web and Social Media.
[37] Ana Lucía Schmidt, Fabiana Zollo, Michela Del Vicario, Alessandro Bessi, Antonio Scala, Guido Caldarelli, H. Eugene Stanley, and Walter Quattrociocchi. 2017. Anatomy of news consumption on Facebook. Proceedings of the National Academy of Sciences 114, 12 (2017), 3035–3039.
[38] Chengcheng Shao, Giovanni Luca Ciampaglia, Onur Varol, Kaicheng Yang, Alessandro Flammini, and Filippo Menczer. 2018. The spread of low-credibility content by social bots. In Nature Communications.
[39] Elisa Shearer and Elizabeth Grieco. 2019. Americans are wary of the role social media sites play in delivering the news. Pew Research Center (Oct. 2019). https://www.journalism.org/2019/10/02/americans-are-wary-of-the-role-social-media-sites-play-in-delivering-the-news
[40] Social Science One. [n. d.]. Building Industry-Academic Partnerships. ([n. d.]). https://socialscience.one
[41] Social Science Research Council. [n. d.]. Social Media and Democracy Research Grants. ([n. d.]). https://www.ssrc.org/fellowships/view/social-media-and-democracy-research-grants
[42] Olivia Solon. 2020. Sensitive to claims of bias, Facebook relaxed misinformation rules for conservative pages. (Aug. 2020). https://www.nbcnews.com/tech/tech-news/sensitive-claims-bias-facebook-relaxed-misinformation-rules-conservative-pages-n1236182
[43] Samuel Spies. 2020. How Misinformation Spreads. (July 2020). https://mediawell.ssrc.org/literature-reviews/how-misinformation-spreads/versions/1-0
[44] Kate Starbird, Ahmer Arif, Tom Wilson, Katherine Van Koevering, Katya Yefimova, and Daniel Scarnecchia. 2018. Ecosystem or Echo-System? Exploring Content Sharing across Alternative Media Domains. In International AAAI Conference on Web and Social Media (ICWSM).
[45] Edson C. Tandoc Jr. 2019. Tell Me Who Your Sources Are. Journalism Practice 13, 2 (2019), 178–190.
[46] Rebekah Tromble, Andreas Storz, and Daniela Stockmann. 2017. We Don't Know What We Don't Know: When and How the Use of Twitter's Public APIs Biases Scientific Inference. (Nov. 2017). https://ssrn.com/abstract=3079927
[47] Michela Del Vicario, Alessandro Bessi, Fabiana Zollo, Fabio Petroni, Antonio Scala, Guido Caldarelli, H. Eugene Stanley, and Walter Quattrociocchi. 2016. The spreading of misinformation online. Proceedings of the National Academy of Sciences 113, 3 (2016), 554–559.
[48] Soroush Vosoughi, Deb Roy, and Sinan Aral. 2018. The spread of true and false news online. Science 359, 6380 (2018), 1146–1151.
[49] Claire Wardle and Hossein Derakhshan. 2017. Information Disorder: Toward an interdisciplinary framework for research and policymaking. Technical Report. Council of Europe.
[50] Angela Xiao Wu and Harsh Taneja. 2020. Platform enclosure of human behavior and its measurement: Using behavioral trace data against platform episteme. New Media & Society (2020).
[51] Eric Zeng, Tadayoshi Kohno, and Franziska Roesner. 2020. Bad News: Clickbait and Deceptive Ads on News and Misinformation Websites. In Workshop on Technology and Consumer Protection.


[38] Chengcheng Shao Giovanni Luca Ciampaglia Onur Varol Kaicheng YangAlessandro Flammini and Filippo Menczer 2018 The spread of low-credibilitycontent by social bots In Nature Communications

[39] Elisa Shearer and Elizabeth Grieco 2019 Americans are wary of the rolesocial media sites play in delivering the news Pew Research Center (Oct2019) httpswwwjournalismorg20191002americans-are-wary-of-the-role-social-media-sites-play-in-delivering-the-news

[40] Social Science One [n d] Building Industry-Academic Partnerships ([n d])httpssocialscienceone

[41] Social Science Research Council [n d] Social Media and Democracy ResearchGrants ([n d]) httpswwwssrcorgfellowshipsviewsocial-media-and-democracy-research-grants

[42] Olivia Solon 2020 Sensitive to claims of bias Facebook relaxed misinformationrules for conservative pages (Aug 2020) httpswwwnbcnewscomtechtech-newssensitive-claims-bias-facebook-relaxed-misinformation-rules-conservative-pages-n1236182

[43] Samuel Spies 2020 How Misinformation Spreads (July 2020) httpsmediawellssrcorgliterature-reviewshow-misinformation-spreadsversions1-0

[44] Kate Starbird Ahmer Arif Tom Wilson Katherine Van Koevering Katya Yefi-mova and Daniel Scarnecchia 2018 Ecosystem or Echo-System ExploringContent Sharing across Alternative Media Domains In International AAAI Con-ference on Web and Social Media (ICWSM)

[45] Edson C Tandoc Jr 2019 Tell Me Who Your Sources Are Journalism Practice13 2 (2019) 178ndash190

[46] Rebekah Tromble Andreas Storz and Daniela Stockmann 2017 We Donrsquot KnowWhat We Donrsquot Know When and How the Use of Twitterrsquos Public APIs BiasesScientific Inference (Nov 2017) httpsssrncomabstract=3079927

[47] Michela Del Vicario Alessandro Bessi Fabiana Zollo Fabio Petroni AntonioScala Guido Caldarelli H Eugene Stanley and Walter Quattrociocchi 2016 Thespreading of misinformation online Proceedings of the National Academy ofSciences 113 3 (2016) 554ndash559

[48] Soroush Vosoughi Deb Roy and Sinan Aral 2018 The spread of true and falsenews online Science 359 6380 (2018) 1146ndash1151

[49] Claire Wardle and Hossein Derakhshan 2017 Information Disorder Toward aninterdisciplinary framework for research and policymaking Technical ReportCouncil of Europe

[50] Angela Xiao Wu and Harsh Taneja 2020 Platform enclosure of human behaviorand its measurement Using behavioral trace data against platform epistemeNew Media amp Society (2020)

[51] Eric Zeng Tadayoshi Kohno and Franziska Roesner 2020 Bad News Clickbaitand Deceptive Ads on News and Misinformation Websites In Workshop onTechnology and Consumer Protection

  • Abstract
  • 1 Introduction
  • 2 Background and Related Work
  • 3 Methods
    • 31 Dataset Overview
    • 32 Known and Potential MisDisinformation URL Subsets
    • 33 Ethics
    • 34 Limitations
      • 4 Findings
        • 41 Characterizing Our URL Subsets
        • 42 Views (Exposure) Who Saw What
        • 43 Clicking Who Clicked What
        • 44 Sharing Who Shared What
          • 5 Discussion
          • 6 Conclusion
          • References

FOCIrsquo21 August 27 2021 Virtual Event USA Aydan Bailey Theo Gregersen and Franziska Roesner

(a) The top heatmap shows baseline click-through rates (clicks per view) by demographic group for all U.S.-shared URLs.

(b) Differences in click-through rates from baseline (e.g., TPFC False URLs were clicked through by adults age 65+ and PPA=2 1.37-1.45x more often than average U.S.-shared URLs).

Figure 5: Baseline and Difference in Click-Through Rates by Demographic Groups and URL Subsets. Ranges representing 95% confidence intervals are shown; overlapping ranges do not allow us to distinguish the underlying non-private values.

(a) The top heatmap shows baseline sharing rates (shares per view) by demographic group for all U.S.-shared URLs.

(b) Differences in sharing rates from baseline (e.g., average TPFC False URLs were shared by adults age 65+ and PPA=2 2.48-2.61x more often than average U.S.-shared URLs).

Figure 6: Baseline and Difference in Shares/View by Demographic Groups and URL Subsets. Ranges representing 95% confidence intervals are shown; overlapping ranges do not allow us to distinguish the underlying non-private values.
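To make the quantities behind Figures 5 and 6 concrete, below is a minimal sketch, not the paper's actual analysis pipeline, of how per-group click-through rates and their ratios to the all-URLs baseline could be computed from noisy aggregate counts, with uncertainty propagated from the Gaussian noise added for differential privacy. The column names, the noise scale SIGMA, and all counts are hypothetical assumptions, and the crude Monte Carlo interval stands in for the statistically valid estimators of Evans and King [10].

# Minimal sketch (hypothetical schema and numbers, not the paper's pipeline):
# per-demographic-cell click-through rates and their ratio to the all-URLs
# baseline, with a rough 95% interval propagating Gaussian DP noise.
import numpy as np
import pandas as pd

SIGMA = 10.0                        # assumed std. dev. of the noise added to each count
RNG = np.random.default_rng(0)

# One row per (age bracket, political page affinity) cell; counts are invented.
cells = pd.DataFrame({
    "age":        ["18-24", "65+", "65+"],
    "ppa":        [0, 2, -2],
    "views_all":  [5_000_000, 2_000_000, 1_800_000],  # noisy views, all U.S.-shared URLs
    "clicks_all": [250_000, 180_000, 150_000],        # noisy clicks, all U.S.-shared URLs
    "views_sub":  [40_000, 90_000, 20_000],           # noisy views, one URL subset (e.g., TPFC False)
    "clicks_sub": [2_100, 11_000, 1_700],             # noisy clicks, that subset
})

def rate_ratio_ci(v_sub, c_sub, v_all, c_all, n_draws=20_000):
    """Ratio of the subset's click-through rate to the baseline rate, with a
    simple Monte Carlo 95% interval that resamples each count under the
    assumed Gaussian noise (a simplification of the estimators in [10])."""
    ratios = []
    for _ in range(n_draws):
        vs, cs, va, ca = (x + RNG.normal(0, SIGMA) for x in (v_sub, c_sub, v_all, c_all))
        if min(vs, va) > 0:                     # guard against non-positive denominators
            ratios.append((cs / vs) / (ca / va))
    ratios = np.array(ratios)
    return ratios.mean(), np.percentile(ratios, 2.5), np.percentile(ratios, 97.5)

for _, row in cells.iterrows():
    mean, lo, hi = rate_ratio_ci(row.views_sub, row.clicks_sub, row.views_all, row.clicks_all)
    print(f"age={row.age:>5} ppa={int(row.ppa):+d}  CTR ratio ~ {mean:.2f} (95% CI {lo:.2f}-{hi:.2f})")

Under this kind of propagation, cells whose intervals overlap cannot be confidently distinguished, which is the caveat noted in the captions of Figures 5 and 6.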


Figure 7: Average Demographics for Users Who Shared URLs from the Top 50 Most Viewed Domains. For both the U.S.-shared URLs baseline and the low-quality news URL subset, we consider the top 50 most-viewed domains and calculate the average demographics (age and political page affinity) for users who re-shared posts containing URLs from those domains. Several example domains are labeled, and 95% confidence intervals are shown.

5 DISCUSSION

Reflecting on Demographic Differences. Our results show demographic differences in who interacted with potential mis/disinformation on Facebook in 2017-2019. Most strikingly, we find that older adults with right-leaning U.S. political affinities were more exposed to (i.e., viewed more posts containing) potential mis/disinformation URLs. As one caveat, recall that identifying mis/disinformation is challenging: while we relied on reputable sources for our URL lists, we cannot exclude the possibility that these lists (and Facebook's own fact-checking) may disproportionately include politically conservative content. Future work should continue to consider how to identify mis/disinformation and investigate our research questions with those perspectives. Future work should also consider additional baseline URL subsets (e.g., trustworthy news sources, by some definition).

Also noteworthy is that political demographic differences flattened out for click-through rates (though they reappeared for sharing rates). In other words, once a user saw a post containing a mis/disinformation URL, political leaning did not seem to play a role in whether the user clicked on it. This finding may reflect in part Facebook's success at targeting users likely to click, and recall that we do not know why people clicked or whether they took the false or low-quality content at face value. Still, this result should caution us not to overestimate the impact of political leaning on how and how often people engage with mis/disinformation without considering the likelihood of being exposed in the first place. The fact that we again see demographic differences for shares may result from many reasons that future work might study; one observation we make is that, unlike clicks, shares are publicly visible, which may impact behaviors among different demographic groups.

At the same time, we see that older adults were more likely to be exposed to (i.e., view), click on, and re-share potential mis/disinformation than younger adults. This is the case both because older adults were disproportionately likely to be shown posts containing these URLs and because they then exhibited higher click-through and sharing rates on Facebook in general. Future defenses, either on social media platforms or outside of them (e.g., educational efforts), should thus look towards reducing the susceptibility of older adults in particular.

Comparisons to Prior Findings and Future Research Questions. Our findings contribute to a broader understanding of how people interact with potential mis/disinformation on social media, based on a unique large-scale dataset. While smaller-scale and controlled studies help explain how or why people interact with "fake news", larger-scale measurements can shed light on broader trends. Most related to our work, prior results from larger-scale studies of Facebook and Twitter suggested that false news (under various definitions) is shared more by U.S. politically right-leaning users and older adults, though sharing is generally rare [18, 20]; that Facebook users are more likely to see, click, and share content aligned with their political ideology [2]; that users tend to focus on a small number of news sources [37]; and that low-quality news sources may receive more user engagement than mainstream news [8]. Our findings confirm and update these prior (often pre-2017) results with data from 2017-2019, and emphasize an important nuance in how we should interpret findings about demographic differences: overall trends can be heavily influenced by what content users are exposed to by the platform in the first place.

Our findings confirm other recently published results based on this dataset [19], which used a different list of low-credibility news domains (based on NewsGuard [31]) but also found that older, more conservative adults were more likely to see and share those URLs. Since the underlying dataset is not publicly available, we believe that multiple corroborating peer-reviewed sets of findings are scientifically valuable. In addition to a complementary perspective considering different mis/disinformation URL subsets, our work also adds an analysis of clicks (in addition to views and shares) and highlights that views should be interpreted as reflecting users' exposure rather than their direct behaviors.

Our quantitative results also raise questions for follow-on, more qualitative investigations. Most importantly, our results can reveal trends in what Facebook users do, but not why or under what conditions they do it. For instance: Why (and for which types of content) do people re-share so frequently without clicking on the associated URLs? Why do people click on URLs in posts, and what is the result of that action, e.g., how often do they click in order to fact-check? How and why do older adults use Facebook differently? While our data does not allow us to answer these questions, we emphasize that large-scale dataset releases like this one, despite their limitations, are crucial to allowing researchers to see these trends and formulate such follow-on questions.

Implications for Defenses. Platform defenses can aim to prevent mis/disinformation from reaching users (preventing exposure), or to warn them when they encounter it (reducing susceptibility) or when they attempt to share it (reducing spread). Our finding that people across the political spectrum were roughly equally likely to click on (but not necessarily re-share) potential mis/disinformation once exposed suggests that preventing exposure may be crucial. Put another way, one might question Facebook's role and responsibility in surfacing these posts to susceptible users in the first place.

The fact that people frequently re-share posts without clicking on their URLs suggests that interventions at sharing time may also be valuable (e.g., a recent Twitter change prompting users to click on URLs before retweeting them [22]).

Regarding interventions at the time of a user's exposure, such as "false news" labels on posts, our dataset unfortunately provides limited insight. Because most shares of URLs happen in the first month or two of their lifetime on Facebook, the month-level granularity of our data prevents us from investigating whether the spread of TPFC-False URLs decreases significantly once they have been fact-checked (and thus labeled in users' feeds). We can confirm from our data that the scope of Facebook's fact-checking efforts is limited (to already viral and clear-cut false cases), meaning that Facebook's fact-checking alone can address only the tip of the iceberg of potential mis/disinformation on the platform. That said, Facebook's methods indeed succeed at fact-checking highly popular mis/disinformation URLs, with potentially great impact.

6 CONCLUSION

We conducted an exploratory analysis of a large-scale dataset of URLs shared on Facebook in 2017-2019, investigating how and how much Facebook users were exposed to and interacted with posts containing potential mis/disinformation URLs, and how these interactions differed across demographic groups. We find that potential mis/disinformation URLs received substantial user engagement, particularly from older adults and from U.S. politically right-leaning users (though not uniformly), and add to a rich and growing literature on mis/disinformation on social media. There are many additional questions we could have investigated in this dataset (e.g., considering different URL subsets) and many more questions that this particular dataset cannot answer. We hope that our exploratory analysis provides a foundation for continued investigations.

ACKNOWLEDGEMENTS

We are very grateful to Social Science One and to Facebook for their partnership in making the Facebook URL Shares dataset available to researchers, including ourselves, and to the Facebook team for their logistical and technical support working with the data. We also wish to thank Sewoong Oh for his guidance on working with differentially private data. We thank our anonymous reviewers and our shepherd, Drew Springall, for their feedback. We are also grateful to Scott Bailey for reviewing an earlier draft of this paper. We thank Eric Zeng for providing us with the updated low-quality news URL list. Other members of the UW Security and Privacy Research Lab also helped contribute feedback and analysis ideas. This work was supported in part by a Social Media and Democracy Research Grant from the Social Science Research Council, by the National Science Foundation under Awards CNS-1513584 and CNS-1651230, and by the UW Center for an Informed Public and the John S. and James L. Knight Foundation.

REFERENCES

[1] Ahmer Arif, Leo Graiden Stewart, and Kate Starbird. 2018. Acting the Part: Examining Information Operations Within #BlackLivesMatter Discourse. Proceedings of the ACM on Human-Computer Interaction (CSCW) 2, Article 20 (Nov. 2018), 27 pages.
[2] Eytan Bakshy, Solomon Messing, and Lada A. Adamic. 2015. Exposure to ideologically diverse news and opinion on Facebook. Science 348, 6239 (2015), 1130-1132.
[3] Pablo Barberá, John T. Jost, Jonathan Nagler, Joshua A. Tucker, and Richard Bonneau. 2015. Tweeting From Left to Right: Is Online Political Communication More Than an Echo Chamber? Psychological Science 26, 10 (2015), 1531-1542.
[4] Alessandro Bessi, Fabiana Zollo, Michela Del Vicario, Michelangelo Puliga, Antonio Scala, Guido Caldarelli, Brian Uzzi, and Walter Quattrociocchi. 2016. Users Polarization on Facebook and Youtube. PLOS ONE 11, 8 (08 2016).
[5] Leticia Bode and Emily K. Vraga. 2015. In Related News, That Was Wrong: The Correction of Misinformation Through Related Stories Functionality in Social Media. Journal of Communication 65, 4 (2015), 619-638.
[6] Alexandre Bovet and Hernán A. Makse. 2019. Influence of fake news in Twitter during the 2016 US presidential election. Nature Communications 10, 7 (2019).
[7] Ceren Budak. 2019. What Happened? The Spread of Fake News Publisher Content During the 2016 U.S. Presidential Election. In The World Wide Web Conference (WWW).
[8] Peter Burger, Soeradj Kanhai, Alexander Pleijter, and Suzan Verberne. 2019. The reach of commercially motivated junk news on Facebook. PLOS ONE 14, 8 (08 2019).
[9] Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. 2006. Calibrating Noise to Sensitivity in Private Data Analysis. In Third Conference on Theory of Cryptography (TCC), 20 pages.
[10] Georgina Evans and Gary King. 2020. Statistically Valid Inferences from Differentially Private Data Releases, with Application to the Facebook URLs Dataset. (2020). https://j.mp/38NrmRW
[11] Facebook. [n. d.]. Fact-Checking on Facebook. https://www.facebook.com/business/help/2593586717571940
[12] Jessica T. Feezell and Brittany Ortiz. 2019. 'I saw it on Facebook': An experimental analysis of political learning through social media. Information, Communication & Society (2019), 1-20.
[13] Martin Flintham, Christian Karner, Khaled Bachour, Helen Creswick, Neha Gupta, and Stuart Moran. 2018. Falling for Fake News: Investigating the Consumption of News via Social Media. In CHI Conference on Human Factors in Computing Systems.
[14] Adam Fourney, Miklós Z. Rácz, Gireeja Ranade, Markus Mobius, and Eric Horvitz. 2017. Geographic and Temporal Trends in Fake News Consumption During the 2016 US Presidential Election. In ACM Conference on Information and Knowledge Management.
[15] Deen Freelon. 2018. Computational Research in the Post-API Age. Political Communication 35, 4 (2018), 665-668.
[16] R. Kelly Garrett and Shannon Poulsen. 2019. Flagging Facebook Falsehoods: Self-Identified Humor Warnings Outperform Fact Checker and Peer Warnings. Journal of Computer-Mediated Communication 24, 5 (10 2019), 240-258.
[17] Christine Geeng, Savanna Yee, and Franziska Roesner. 2020. Fake News on Facebook and Twitter: Investigating How People (Don't) Investigate. In ACM Conference on Human Factors in Computing Systems (CHI).
[18] Nir Grinberg, Kenneth Joseph, Lisa Friedland, Briony Swire-Thompson, and David Lazer. 2019. Fake news on Twitter during the 2016 U.S. presidential election. Science 363, 6425 (2019).
[19] Andy Guess, Kevin Aslett, Joshua Tucker, Richard Bonneau, and Jonathan Nagler. 2021. Cracking Open the News Feed: Exploring What U.S. Facebook Users See and Share with Large-Scale Platform Data. Journal of Quantitative Description: Digital Media 1 (April 2021).
[20] Andrew Guess, Jonathan Nagler, and Joshua Tucker. 2019. Less than you think: Prevalence and predictors of fake news dissemination on Facebook. Science Advances 5, 1 (2019).
[21] Andrew M. Guess, Brendan Nyhan, and Jason Reifler. 2020. Exposure to untrustworthy websites in the 2016 US election. Nature Human Behavior 4 (2020), 472-480.
[22] Taylor Hatmaker. 2020. Twitter plans to bring prompts to 'read before you retweet' to all users. (Sept. 2020). https://techcrunch.com/2020/09/24/twitter-read-before-retweet
[23] Eslam Hussein, Prerna Juneja, and Tanushree Mitra. 2020. Measuring Misinformation in Video Search Platforms: An Audit Study on YouTube. Proceedings of the ACM on Human-Computer Interaction 4 (05 2020).
[24] Daniel Jolley and Karen M. Douglas. 2017. Prevention is better than cure: Addressing anti-vaccine conspiracy theories. Journal of Applied Social Psychology 47, 8 (2017), 459-469.
[25] David Lazer, Matthew A. Baum, Yochai Benkler, Adam J. Berinsky, Kelly M. Greenhill, Filippo Menczer, Miriam J. Metzger, Brendan Nyhan, Gordon Pennycook, David Rothschild, Michael Schudson, Steven A. Sloman, Cass R. Sunstein, Emily Thorson, Duncan J. Watts, and Jonathan Zittrain. 2018. The science of fake news. Science 359 (2018), 1094-1096.
[26] Ji Young Lee and S. Shyam Sundar. 2013. To Tweet or to Retweet? That Is the Question for Health Professionals on Twitter. Health Communication 28, 5 (2013), 509-524.
[27] Yanqin Lu. 2019. Incidental Exposure to Political Disagreement on Facebook and Corrective Participation: Unraveling the Effects of Emotional Responses and Issue Relevance. International Journal of Communication 13 (2019).
[28] Binny Mathew, Ritam Dutt, Pawan Goyal, and Animesh Mukherjee. 2019. Spread of Hate Speech in Online Social Media. In 10th ACM Conference on Web Science (WebSci).
[29] Solomon Messing, Christina DeGregorio, Bennett Hillenbrand, Gary King, Saurav Mahanti, Zagreb Mukerjee, Chaya Nayak, Nate Persily, Bogdan State, and Arjun Wilkins. 2020. Facebook Privacy-Protected Full URLs Data Set. Harvard Dataverse (2020). https://doi.org/10.7910/DVN/TDOAPG
[30] Mia Moody-Ramirez and Andrew B. Church. 2019. Analysis of Facebook Meme Groups Used During the 2016 US Presidential Election. Social Media + Society 5, 1 (2019).
[31] NewsGuard. [n. d.]. NewsGuard: Restoring Trust & Accountability. https://www.newsguardtech.com
[32] Donie O'Sullivan. 2020. Facebook fact-checkers to Trump supporters: We are not trying to censor you. (Oct. 2020). https://www.cnn.com/2020/10/29/tech/fact-checkers-facebook-trump/index.html
[33] Gordon Pennycook, Adam Bear, Evan Collins, and David G. Rand. 2020. The Implied Truth Effect: Attaching Warnings to a Subset of Fake News Headlines Increases Perceived Accuracy of Headlines Without Warnings. Management Science (Feb. 2020).
[34] Quintly. 2018. Changes to Facebook's API - February 6th, 2018. (Feb. 2018). https://support.quintly.com/hc/en-us/articles/115004414274-Changes-to-Facebook-s-API-February-6th-2018
[35] Gustavo Resende, Philipe Melo, Hugo Sousa, Johnnatan Messias, Marisa Vasconcelos, Jussara Almeida, and Fabrício Benevenuto. 2019. (Mis)Information Dissemination in WhatsApp: Gathering, Analyzing and Countermeasures. In The World Wide Web Conference (WWW).
[36] Mattia Samory and Tanushree Mitra. 2018. Conspiracies Online: User discussions in a Conspiracy Community Following Dramatic Events. In International AAAI Conference on Web and Social Media.
[37] Ana Lucía Schmidt, Fabiana Zollo, Michela Del Vicario, Alessandro Bessi, Antonio Scala, Guido Caldarelli, H. Eugene Stanley, and Walter Quattrociocchi. 2017. Anatomy of news consumption on Facebook. Proceedings of the National Academy of Sciences 114, 12 (2017), 3035-3039.
[38] Chengcheng Shao, Giovanni Luca Ciampaglia, Onur Varol, Kaicheng Yang, Alessandro Flammini, and Filippo Menczer. 2018. The spread of low-credibility content by social bots. In Nature Communications.
[39] Elisa Shearer and Elizabeth Grieco. 2019. Americans are wary of the role social media sites play in delivering the news. Pew Research Center (Oct. 2019). https://www.journalism.org/2019/10/02/americans-are-wary-of-the-role-social-media-sites-play-in-delivering-the-news
[40] Social Science One. [n. d.]. Building Industry-Academic Partnerships. https://socialscience.one
[41] Social Science Research Council. [n. d.]. Social Media and Democracy Research Grants. https://www.ssrc.org/fellowships/view/social-media-and-democracy-research-grants
[42] Olivia Solon. 2020. Sensitive to claims of bias, Facebook relaxed misinformation rules for conservative pages. (Aug. 2020). https://www.nbcnews.com/tech/tech-news/sensitive-claims-bias-facebook-relaxed-misinformation-rules-conservative-pages-n1236182
[43] Samuel Spies. 2020. How Misinformation Spreads. (July 2020). https://mediawell.ssrc.org/literature-reviews/how-misinformation-spreads/versions/1-0
[44] Kate Starbird, Ahmer Arif, Tom Wilson, Katherine Van Koevering, Katya Yefimova, and Daniel Scarnecchia. 2018. Ecosystem or Echo-System? Exploring Content Sharing across Alternative Media Domains. In International AAAI Conference on Web and Social Media (ICWSM).
[45] Edson C. Tandoc Jr. 2019. Tell Me Who Your Sources Are. Journalism Practice 13, 2 (2019), 178-190.
[46] Rebekah Tromble, Andreas Storz, and Daniela Stockmann. 2017. We Don't Know What We Don't Know: When and How the Use of Twitter's Public APIs Biases Scientific Inference. (Nov. 2017). https://ssrn.com/abstract=3079927
[47] Michela Del Vicario, Alessandro Bessi, Fabiana Zollo, Fabio Petroni, Antonio Scala, Guido Caldarelli, H. Eugene Stanley, and Walter Quattrociocchi. 2016. The spreading of misinformation online. Proceedings of the National Academy of Sciences 113, 3 (2016), 554-559.
[48] Soroush Vosoughi, Deb Roy, and Sinan Aral. 2018. The spread of true and false news online. Science 359, 6380 (2018), 1146-1151.
[49] Claire Wardle and Hossein Derakhshan. 2017. Information Disorder: Toward an interdisciplinary framework for research and policymaking. Technical Report. Council of Europe.
[50] Angela Xiao Wu and Harsh Taneja. 2020. Platform enclosure of human behavior and its measurement: Using behavioral trace data against platform episteme. New Media & Society (2020).
[51] Eric Zeng, Tadayoshi Kohno, and Franziska Roesner. 2020. Bad News: Clickbait and Deceptive Ads on News and Misinformation Websites. In Workshop on Technology and Consumer Protection.

  • Abstract
  • 1 Introduction
  • 2 Background and Related Work
  • 3 Methods
    • 31 Dataset Overview
    • 32 Known and Potential MisDisinformation URL Subsets
    • 33 Ethics
    • 34 Limitations
      • 4 Findings
        • 41 Characterizing Our URL Subsets
        • 42 Views (Exposure) Who Saw What
        • 43 Clicking Who Clicked What
        • 44 Sharing Who Shared What
          • 5 Discussion
          • 6 Conclusion
          • References

Interactions with Potential MisDisinformation URLs Among US Users on Facebook FOCIrsquo21 August 27 2021 Virtual Event USA

Figure 7 Average Demographics for Users Who Shared URLs from the Top 50 Most Viewed Domains For both the US-sharedURLs baseline and the low-quality news URL subset we consider the top 50 most-views domains and calculate the average demographics(age and political page affinity) for users who re-shared posts containing URLs from those domains Several example domains are labeled and95 confidence intervals are shown

5 DISCUSSION

Reflecting on Demographic Differences Our results show de-mographic differences in who interacted with potential misdis-information on Facebook in 2017-2019 Most strikingly we findthat older adults with right-leaning US political affinities weremore exposed to (ie viewed more posts containing) potentialmisdisinformation URLs As one caveat recall that identifyingmisdisinformation is challengingmdashwhile we relied on reputablesources for our URL lists we cannot exclude the possibility thatthese lists (and Facebookrsquos own fact-checking) may disproportion-ately include politically conservative content Future work shouldcontinue to consider how to identify misdisinformation and in-vestigate our research questions with those perspectives Futurework should also consider additional baseline URL subsets (egtrustworthy news sources by some definition)

Also noteworthy is that political demographic differences flat-tened out for click-through rates (though they reappeared for shar-ing rates) In other words once a user saw a post containing amisdisinformation URL political leaning did not seem to play arole in whether a user clicked on it This finding may reflect inpart Facebookrsquos success targeting users likely to click and recallthat we do not know why people clicked or whether they took thefalse or low-quality content at face value Still this result shouldcaution us not to overestimate the impact of political leaning onhow and how often people engage with misdisinformation withoutconsidering the likelihood of being exposed in the first The fact thatwe again see demographic differences for shares may result frommany reasons that future work might study one observation wemake is that unlike clicks shares are publicly visible which mayimpact behaviors among different demographic groups

At the same time we see that older adults were more likely tobe exposed to (ie view) click on and re-share potential misdis-information than younger adults This is the case both because

older adults were disproportionately likely to be shown posts con-taining these URLs but also because they then exhibited higherclick-through and sharing rates on Facebook in general Futuredefensesmdash either on social media platforms or outside of them (egeducational efforts) mdash should thus look towards reducing suscepti-bility of older adults in particular

Comparisons to Prior Findings and Future Research Ques-tions Our findings contribute to a broader understanding of howpeople interact with potential misdisinformation on social mediabased on a unique large-scale dataset While smaller-scale and con-trolled studies help explain how or why people interact with ldquofakenewsrdquo larger-scale measurements can shed light on broader trendsMost related to our work prior results from larger-scale studiesof Facebook and Twitter suggested that false news (under variousdefinitions) is shared more by US politically right-leaning usersand older adults though sharing is generally rare [18 20] that Face-book users are more likely to see click and share content alignedwith their political ideology [2] that users tend to focus on a smallnumber of news sources [37] and that low-quality news sourcesmay receive more user engagement than mainstream news [8] Ourfindings confirm and update these prior (often pre-2017) resultswith data from 2017-2019 and emphasize an important nuance inhow we should interpret findings about demographic differencesoverall trends can be heavily influenced by what content users areexposed to by the platform in the first place

Our findings confirm other recently published results based onthis dataset [19] which used a different list of low-credibility newsdomains (based on NewsGuard [31]) but also found that older moreconservative adults were more likely to see and share those URLsSince the underlying dataset is not publicly available we believethat multiple corroborating peer-reviewed sets of findings are sci-entifically valuable In addition to a complementary perspectiveconsidering different misdisinformation URL subsets our work

FOCIrsquo21 August 27 2021 Virtual Event USA Aydan Bailey Theo Gregersen and Franziska Roesner

also adds an analysis of clicks (in addition to views and shares)and highlights that views should be interpreted as reflecting usersrsquoexposure rather than their direct behaviors

Our quantitative results also raise questions for follow-on morequalitative investigations Most importantly our results can revealtrends in what Facebook users do but not why or under whatconditions they do it For instanceWhy (andwhich types of content)do people reshare so frequently without clicking on the associatedURLs Why do people click on URLs in posts and what is the resultof that actionmdash eg how often do they click in order to fact-checkHow and why do older adults use Facebook differently While ourdata does not allow us to answer these questions we emphasize thatlarge-scale dataset releases like this one despite their limitationsare crucial to allowing researchers to see these trends and formulatesuch follow-on questions

Implications for Defenses Platform defenses can aim to preventmisdisinformation from reaching users (preventing exposure) orwarn them when they encounter it (reducing susceptibility) orwhen they attempt to share it (reducing spread) Our finding thatpeople across the political spectrum were roughly equally likely toclick on (but not necessarily re-share) potential misdisinformationonce exposed suggests that preventing exposure may be crucial Putanother way one might question Facebookrsquos role and responsibilityin surfacing these posts to susceptible users in the first place

The fact that people frequently re-share posts without clickingon URLs suggests that interventions at sharing time may also bevaluable (eg a recent Twitter change prompting users to click onURLs before retweeting them [22])

Regarding interventions at the time of a userrsquos exposuremdash suchas ldquofalse newsrdquo labels on postsmdash our dataset unfortunately pro-vides limited insight Because most shares of URLs happen in thefirst month or two of their lifetime on Facebook the month-levelgranularity of our data prevents us from investigating whetherthe spread of TPFC-False URLs decreases significantly once theyhave been fact-checked (and thus labeled in usersrsquo feeds) We canconfirm from our data that the scope of Facebookrsquos fact-checkingefforts is limited (to already viral and clear-cut false cases) mean-ing that Facebookrsquos fact-checking alone can address only the tip ofthe iceberg of potential misdisinformation on the platform Thatsaid Facebookrsquos methods indeed succeed at fact-checking highlypopular misdisinformation URLs with potentially great impact

6 CONCLUSIONWe conducted an exploratory analysis of a large-scale dataset ofURLs shared on Facebook in 2017-2019 investigating how and howmuch Facebook users were exposed to and interacted with postscontaining potential misdisinformation URLs and how these inter-actions differed across demographic groups We find that potentialmisdisinformation URLs received substantial user engagementparticularly from older adults and from US politically-right lean-ing users (though not uniformly) and add to a rich and growingliterature on misdisinformation on social media There are manyadditional questions we could have investigated in this dataset (egconsidering different URL subsets) and many more questions thatthis particular dataset cannot answer We hope that our exploratoryanalysis provides a foundation for continued investigations

ACKNOWLEDGEMENTSWe are very grateful to Social Science One and to Facebook for theirpartnership in making the Facebook URL Shares dataset availableto researchers including ourselves and to the Facebook team fortheir logistical and technical support working with the data Wealso wish to thank Sewoong Oh for his guidance on working withdifferentially private data We thank our anonymous reviewersand our shepherd Drew Springall for their feedback We are alsograteful to Scott Bailey for reviewing an earlier draft of this paperWe thank Eric Zeng for providing us with the updated low-qualitynews URL list Other members of the UW Security and PrivacyResearch Lab also helped contribute feedback and analysis ideasThis work was supported in part by a Social Media and DemocracyResearch Grant from the Social Science Research Council by theNational Science Foundation under Awards CNS-1513584 and CNS-1651230 and by the UW Center for an Informed Public and theJohn S and James L Knight Foundation

REFERENCES[1] Ahmer Arif Leo Graiden Stewart and Kate Starbird 2018 Acting the Part Exam-

ining Information Operations Within BlackLivesMatter Discourse Proceedingsof the ACM on Human-Computer Interaction (CSCW) 2 Article 20 (Nov 2018)27 pages

[2] Eytan Bakshy Solomon Messing and Lada A Adamic 2015 Exposure to ide-ologically diverse news and opinion on Facebook Science 348 6239 (2015)1130ndash1132

[3] Pablo Barberaacute John T Jost Jonathan Nagler Joshua A Tucker and RichardBonneau 2015 Tweeting From Left to Right Is Online Political CommunicationMore Than an Echo Chamber Psychological Science 26 10 (2015) 1531ndash1542

[4] Alessandro Bessi Fabiana Zollo Michela Del Vicario Michelangelo Puliga Anto-nio Scala Guido Caldarelli Brian Uzzi and Walter Quattrociocchi 2016 UsersPolarization on Facebook and Youtube PLOS ONE 11 8 (08 2016)

[5] Leticia Bode and Emily K Vraga 2015 In Related News That Was Wrong TheCorrection of Misinformation Through Related Stories Functionality in SocialMedia Journal of Communication 65 4 (2015) 619ndash638

[6] Alexandre Bovet and Hernaacuten A Makse 2019 Influence of fake news in Twitterduring the 2016 US presidential election Nature Communications 10 7 (2019)

[7] Ceren Budak 2019 What Happened The Spread of Fake News Publisher ContentDuring the 2016 US Presidential Election In The World Wide Web Conference(WWW)

[8] Peter Burger Soeradj Kanhai Alexander Pleijter and Suzan Verberne 2019 Thereach of commercially motivated junk news on Facebook PLOS ONE 14 8 (082019)

[9] Cynthia Dwork Frank McSherry Kobbi Nissim and Adam Smith 2006 Calibrat-ing Noise to Sensitivity in Private Data Analysis In Third Conference on Theoryof Cryptography (TCC) 20

[10] Georgina Evans and Gary King 2020 Statistically Valid Inferences from Differ-entially Private Data Releases with Application to the Facebook URLs Dataset(2020) httpsjmp38NrmRW

[11] Facebook [n d] Fact-Checking on Facebook ([n d]) httpswwwfacebookcombusinesshelp2593586717571940

[12] Jessica T Feezell and Brittany Ortiz 2019 lsquoI saw it on Facebookrsquo An experimentalanalysis of political learning through social media Information Communicationamp Society (2019) 1ndash20

[13] Martin Flintham Christian Karner Khaled Bachour Helen Creswick Neha Guptaand Stuart Moran 2018 Falling for Fake News Investigating the Consumptionof News via Social Media In CHI Conference on Human Factors in ComputingSystems

[14] Adam Fourney Mikloacutes Z Raacutecz Gireeja Ranade Markus Mobius and Eric Horvitz2017 Geographic and Temporal Trends in Fake News Consumption During the2016 US Presidential Election In ACM Conference on Information and KnowledgeManagement

[15] Deen Freelon 2018 Computational Research in the Post-API Age PoliticalCommunication 35 4 (2018) 665ndash668

[16] R Kelly Garrett and Shannon Poulsen 2019 Flagging Facebook FalsehoodsSelf-Identified Humor Warnings Outperform Fact Checker and Peer WarningsJournal of Computer-Mediated Communication 24 5 (10 2019) 240ndash258

[17] Christine Geeng Savanna Yee and Franziska Roesner 2020 Fake News onFacebook and Twitter Investigating How People (Donrsquot) Investigate In ACMConference on Human Factors in Computing Systems (CHI)

Interactions with Potential MisDisinformation URLs Among US Users on Facebook FOCIrsquo21 August 27 2021 Virtual Event USA

[18] Nir Grinberg Kenneth Joseph Lisa Friedland Briony Swire-Thompson andDavidLazer 2019 Fake news on Twitter during the 2016 US presidential electionScience 363 6425 (2019)

[19] Andy Guess Kevin Aslett Joshua Tucker Richard Bonneau and Jonathan Nagler2021 Cracking Open the News Feed Exploring What US Facebook Users Seeand Share with Large-Scale Platform Data Journal of Quantitative DescriptionDigital Media 1 (April 2021)

[20] Andrew Guess Jonathan Nagler and Joshua Tucker 2019 Less than you thinkPrevalence and predictors of fake news dissemination on Facebook ScienceAdvances 5 1 (2019)

[21] Andrew M Guess Brendan Nyhan and Jason Reifler 2020 Exposure to un-trustworthy websites in the 2016 US election Nature Human Behavior 4 (2020)472ndash480

[22] Taylor Hatmaker 2020 Twitter plans to bring prompts to lsquoread before youretweetrsquo to all users (Sept 2020) httpstechcrunchcom20200924twitter-read-before-retweet

[23] Eslam Hussein Prerna Juneja and Tanushree Mitra 2020 Measuring Misinfor-mation in Video Search Platforms An Audit Study on YouTube Proceedings ofthe ACM on Human-Computer Interaction 4 (05 2020)

[24] Daniel Jolley and Karen M Douglas 2017 Prevention is better than cure Ad-dressing anti-vaccine conspiracy theories Journal of Applied Social Psychology47 8 (2017) 459ndash469

[25] David Lazer MatthewA Baum Yochai Benkler Adam J Berinsky Kelly M Green-hill Filippo Menczer Miriam J Metzger Brendan Nyhan Gordon PennycookDavid Rothschild Michael Schudson Steven A Sloman Cass R Sunstein EmilyThorson Duncan J Watts and Jonathan Zittrain 2018 The science of fake newsScience 359 (2018) 1094ndash1096

[26] Ji Young Lee and S Shyam Sundar 2013 To Tweet or to Retweet That Is theQuestion for Health Professionals on Twitter Health Communication 28 5 (2013)509ndash524

[27] Yanqin Lu 2019 Incidental Exposure to Political Disagreement on Facebookand Corrective Participation Unraveling the Effects of Emotional Responses andIssue Relevance International Journal of Communication 13 (2019)

[28] Binny Mathew Ritam Dutt Pawan Goyal and Animesh Mukherjee 2019 Spreadof Hate Speech in Online Social Media In 10th ACM Conference on Web Science(WebSci)

[29] SolomonMessing Christina DeGregorio Bennett Hillenbrand Gary King SauravMahanti Zagreb Mukerjee Chaya Nayak Nate Persily Bogdan State and ArjunWilkins 2020 Facebook Privacy-Protected Full URLsData Set Harvard Dataverse(2020) httpsdoiorg107910DVNTDOAPG

[30] Mia Moody-Ramirez and Andrew B Church 2019 Analysis of Facebook MemeGroups Used During the 2016 US Presidential Election Social Media + Society 51 (2019)

[31] NewsGuard [n d] NewsGuard Restoring Trust amp Accountability ([n d])httpswwwnewsguardtechcom

[32] Donie OrsquoSullivan 2020 Facebook fact-checkers to Trump supporters We are nottrying to censor you (Oct 2020) httpswwwcnncom20201029techfact-checkers-facebook-trumpindexhtml

[33] Gordon Pennycook Adam Bear Evan Collins and David G Rand 2020 TheImplied Truth Effect Attaching Warnings to a Subset of Fake News HeadlinesIncreases Perceived Accuracy of Headlines Without Warnings ManagementScience (Feb 2020)

[34] Quintly 2018 Changes to Facebookrsquos API - February 6th 2018 (Feb2018) httpssupportquintlycomhcen-usarticles115004414274-Changes-to-

Facebook-s-API-February-6th-2018[35] Gustavo Resende Philipe Melo Hugo Sousa Johnnatan Messias Marisa Vas-

concelos Jussara Almeida and Fabriacutecio Benevenuto 2019 (Mis)InformationDissemination in WhatsApp Gathering Analyzing and Countermeasures InThe World Wide Web Conference (WWW)

[36] Mattia Samory and Tanushree Mitra 2018 Conspiracies Online User discussionsin a Conspiracy Community Following Dramatic Events In International AAAIConf on Web and Social Media

[37] Ana Luciacutea Schmidt Fabiana Zollo Michela Del Vicario Alessandro Bessi AntonioScala Guido Caldarelli H Eugene Stanley and Walter Quattrociocchi 2017Anatomy of news consumption on Facebook Proceedings of the National Academyof Sciences 114 12 (2017) 3035ndash3039

[38] Chengcheng Shao Giovanni Luca Ciampaglia Onur Varol Kaicheng YangAlessandro Flammini and Filippo Menczer 2018 The spread of low-credibilitycontent by social bots In Nature Communications

[39] Elisa Shearer and Elizabeth Grieco 2019 Americans are wary of the rolesocial media sites play in delivering the news Pew Research Center (Oct2019) httpswwwjournalismorg20191002americans-are-wary-of-the-role-social-media-sites-play-in-delivering-the-news

[40] Social Science One [n d] Building Industry-Academic Partnerships ([n d])httpssocialscienceone

[41] Social Science Research Council [n d] Social Media and Democracy ResearchGrants ([n d]) httpswwwssrcorgfellowshipsviewsocial-media-and-democracy-research-grants

[42] Olivia Solon 2020 Sensitive to claims of bias Facebook relaxed misinformationrules for conservative pages (Aug 2020) httpswwwnbcnewscomtechtech-newssensitive-claims-bias-facebook-relaxed-misinformation-rules-conservative-pages-n1236182

[43] Samuel Spies 2020 How Misinformation Spreads (July 2020) httpsmediawellssrcorgliterature-reviewshow-misinformation-spreadsversions1-0

[44] Kate Starbird Ahmer Arif Tom Wilson Katherine Van Koevering Katya Yefi-mova and Daniel Scarnecchia 2018 Ecosystem or Echo-System ExploringContent Sharing across Alternative Media Domains In International AAAI Con-ference on Web and Social Media (ICWSM)

[45] Edson C Tandoc Jr 2019 Tell Me Who Your Sources Are Journalism Practice13 2 (2019) 178ndash190

[46] Rebekah Tromble Andreas Storz and Daniela Stockmann 2017 We Donrsquot KnowWhat We Donrsquot Know When and How the Use of Twitterrsquos Public APIs BiasesScientific Inference (Nov 2017) httpsssrncomabstract=3079927

[47] Michela Del Vicario Alessandro Bessi Fabiana Zollo Fabio Petroni AntonioScala Guido Caldarelli H Eugene Stanley and Walter Quattrociocchi 2016 Thespreading of misinformation online Proceedings of the National Academy ofSciences 113 3 (2016) 554ndash559

[48] Soroush Vosoughi Deb Roy and Sinan Aral 2018 The spread of true and falsenews online Science 359 6380 (2018) 1146ndash1151

[49] Claire Wardle and Hossein Derakhshan 2017 Information Disorder Toward aninterdisciplinary framework for research and policymaking Technical ReportCouncil of Europe

[50] Angela Xiao Wu and Harsh Taneja 2020 Platform enclosure of human behaviorand its measurement Using behavioral trace data against platform epistemeNew Media amp Society (2020)

[51] Eric Zeng Tadayoshi Kohno and Franziska Roesner 2020 Bad News Clickbaitand Deceptive Ads on News and Misinformation Websites In Workshop onTechnology and Consumer Protection

  • Abstract
  • 1 Introduction
  • 2 Background and Related Work
  • 3 Methods
    • 31 Dataset Overview
    • 32 Known and Potential MisDisinformation URL Subsets
    • 33 Ethics
    • 34 Limitations
      • 4 Findings
        • 41 Characterizing Our URL Subsets
        • 42 Views (Exposure) Who Saw What
        • 43 Clicking Who Clicked What
        • 44 Sharing Who Shared What
          • 5 Discussion
          • 6 Conclusion
          • References

FOCIrsquo21 August 27 2021 Virtual Event USA Aydan Bailey Theo Gregersen and Franziska Roesner

also adds an analysis of clicks (in addition to views and shares)and highlights that views should be interpreted as reflecting usersrsquoexposure rather than their direct behaviors

Our quantitative results also raise questions for follow-on morequalitative investigations Most importantly our results can revealtrends in what Facebook users do but not why or under whatconditions they do it For instanceWhy (andwhich types of content)do people reshare so frequently without clicking on the associatedURLs Why do people click on URLs in posts and what is the resultof that actionmdash eg how often do they click in order to fact-checkHow and why do older adults use Facebook differently While ourdata does not allow us to answer these questions we emphasize thatlarge-scale dataset releases like this one despite their limitationsare crucial to allowing researchers to see these trends and formulatesuch follow-on questions

Implications for Defenses Platform defenses can aim to preventmisdisinformation from reaching users (preventing exposure) orwarn them when they encounter it (reducing susceptibility) orwhen they attempt to share it (reducing spread) Our finding thatpeople across the political spectrum were roughly equally likely toclick on (but not necessarily re-share) potential misdisinformationonce exposed suggests that preventing exposure may be crucial Putanother way one might question Facebookrsquos role and responsibilityin surfacing these posts to susceptible users in the first place

The fact that people frequently re-share posts without clickingon URLs suggests that interventions at sharing time may also bevaluable (eg a recent Twitter change prompting users to click onURLs before retweeting them [22])

Regarding interventions at the time of a userrsquos exposuremdash suchas ldquofalse newsrdquo labels on postsmdash our dataset unfortunately pro-vides limited insight Because most shares of URLs happen in thefirst month or two of their lifetime on Facebook the month-levelgranularity of our data prevents us from investigating whetherthe spread of TPFC-False URLs decreases significantly once theyhave been fact-checked (and thus labeled in usersrsquo feeds) We canconfirm from our data that the scope of Facebookrsquos fact-checkingefforts is limited (to already viral and clear-cut false cases) mean-ing that Facebookrsquos fact-checking alone can address only the tip ofthe iceberg of potential misdisinformation on the platform Thatsaid Facebookrsquos methods indeed succeed at fact-checking highlypopular misdisinformation URLs with potentially great impact

6 CONCLUSIONWe conducted an exploratory analysis of a large-scale dataset ofURLs shared on Facebook in 2017-2019 investigating how and howmuch Facebook users were exposed to and interacted with postscontaining potential misdisinformation URLs and how these inter-actions differed across demographic groups We find that potentialmisdisinformation URLs received substantial user engagementparticularly from older adults and from US politically-right lean-ing users (though not uniformly) and add to a rich and growingliterature on misdisinformation on social media There are manyadditional questions we could have investigated in this dataset (egconsidering different URL subsets) and many more questions thatthis particular dataset cannot answer We hope that our exploratoryanalysis provides a foundation for continued investigations

ACKNOWLEDGEMENTSWe are very grateful to Social Science One and to Facebook for theirpartnership in making the Facebook URL Shares dataset availableto researchers including ourselves and to the Facebook team fortheir logistical and technical support working with the data Wealso wish to thank Sewoong Oh for his guidance on working withdifferentially private data We thank our anonymous reviewersand our shepherd Drew Springall for their feedback We are alsograteful to Scott Bailey for reviewing an earlier draft of this paperWe thank Eric Zeng for providing us with the updated low-qualitynews URL list Other members of the UW Security and PrivacyResearch Lab also helped contribute feedback and analysis ideasThis work was supported in part by a Social Media and DemocracyResearch Grant from the Social Science Research Council by theNational Science Foundation under Awards CNS-1513584 and CNS-1651230 and by the UW Center for an Informed Public and theJohn S and James L Knight Foundation

REFERENCES[1] Ahmer Arif Leo Graiden Stewart and Kate Starbird 2018 Acting the Part Exam-

ining Information Operations Within BlackLivesMatter Discourse Proceedingsof the ACM on Human-Computer Interaction (CSCW) 2 Article 20 (Nov 2018)27 pages

[2] Eytan Bakshy Solomon Messing and Lada A Adamic 2015 Exposure to ide-ologically diverse news and opinion on Facebook Science 348 6239 (2015)1130ndash1132

[3] Pablo Barberaacute John T Jost Jonathan Nagler Joshua A Tucker and RichardBonneau 2015 Tweeting From Left to Right Is Online Political CommunicationMore Than an Echo Chamber Psychological Science 26 10 (2015) 1531ndash1542

[4] Alessandro Bessi Fabiana Zollo Michela Del Vicario Michelangelo Puliga Anto-nio Scala Guido Caldarelli Brian Uzzi and Walter Quattrociocchi 2016 UsersPolarization on Facebook and Youtube PLOS ONE 11 8 (08 2016)

[5] Leticia Bode and Emily K Vraga 2015 In Related News That Was Wrong TheCorrection of Misinformation Through Related Stories Functionality in SocialMedia Journal of Communication 65 4 (2015) 619ndash638

[6] Alexandre Bovet and Hernaacuten A Makse 2019 Influence of fake news in Twitterduring the 2016 US presidential election Nature Communications 10 7 (2019)

[7] Ceren Budak 2019 What Happened The Spread of Fake News Publisher ContentDuring the 2016 US Presidential Election In The World Wide Web Conference(WWW)

[8] Peter Burger Soeradj Kanhai Alexander Pleijter and Suzan Verberne 2019 Thereach of commercially motivated junk news on Facebook PLOS ONE 14 8 (082019)

[9] Cynthia Dwork Frank McSherry Kobbi Nissim and Adam Smith 2006 Calibrat-ing Noise to Sensitivity in Private Data Analysis In Third Conference on Theoryof Cryptography (TCC) 20

[10] Georgina Evans and Gary King 2020 Statistically Valid Inferences from Differ-entially Private Data Releases with Application to the Facebook URLs Dataset(2020) httpsjmp38NrmRW

[11] Facebook [n d] Fact-Checking on Facebook ([n d]) httpswwwfacebookcombusinesshelp2593586717571940

[12] Jessica T Feezell and Brittany Ortiz 2019 lsquoI saw it on Facebookrsquo An experimentalanalysis of political learning through social media Information Communicationamp Society (2019) 1ndash20

[13] Martin Flintham Christian Karner Khaled Bachour Helen Creswick Neha Guptaand Stuart Moran 2018 Falling for Fake News Investigating the Consumptionof News via Social Media In CHI Conference on Human Factors in ComputingSystems

[14] Adam Fourney Mikloacutes Z Raacutecz Gireeja Ranade Markus Mobius and Eric Horvitz2017 Geographic and Temporal Trends in Fake News Consumption During the2016 US Presidential Election In ACM Conference on Information and KnowledgeManagement

[15] Deen Freelon 2018 Computational Research in the Post-API Age PoliticalCommunication 35 4 (2018) 665ndash668

[16] R Kelly Garrett and Shannon Poulsen 2019 Flagging Facebook FalsehoodsSelf-Identified Humor Warnings Outperform Fact Checker and Peer WarningsJournal of Computer-Mediated Communication 24 5 (10 2019) 240ndash258

[17] Christine Geeng Savanna Yee and Franziska Roesner 2020 Fake News onFacebook and Twitter Investigating How People (Donrsquot) Investigate In ACMConference on Human Factors in Computing Systems (CHI)

Interactions with Potential MisDisinformation URLs Among US Users on Facebook FOCIrsquo21 August 27 2021 Virtual Event USA

[18] Nir Grinberg Kenneth Joseph Lisa Friedland Briony Swire-Thompson andDavidLazer 2019 Fake news on Twitter during the 2016 US presidential electionScience 363 6425 (2019)

[19] Andy Guess Kevin Aslett Joshua Tucker Richard Bonneau and Jonathan Nagler2021 Cracking Open the News Feed Exploring What US Facebook Users Seeand Share with Large-Scale Platform Data Journal of Quantitative DescriptionDigital Media 1 (April 2021)

[20] Andrew Guess Jonathan Nagler and Joshua Tucker 2019 Less than you thinkPrevalence and predictors of fake news dissemination on Facebook ScienceAdvances 5 1 (2019)

[21] Andrew M Guess Brendan Nyhan and Jason Reifler 2020 Exposure to un-trustworthy websites in the 2016 US election Nature Human Behavior 4 (2020)472ndash480

[22] Taylor Hatmaker 2020 Twitter plans to bring prompts to lsquoread before youretweetrsquo to all users (Sept 2020) httpstechcrunchcom20200924twitter-read-before-retweet

[23] Eslam Hussein Prerna Juneja and Tanushree Mitra 2020 Measuring Misinfor-mation in Video Search Platforms An Audit Study on YouTube Proceedings ofthe ACM on Human-Computer Interaction 4 (05 2020)

[24] Daniel Jolley and Karen M Douglas 2017 Prevention is better than cure Ad-dressing anti-vaccine conspiracy theories Journal of Applied Social Psychology47 8 (2017) 459ndash469

[25] David Lazer MatthewA Baum Yochai Benkler Adam J Berinsky Kelly M Green-hill Filippo Menczer Miriam J Metzger Brendan Nyhan Gordon PennycookDavid Rothschild Michael Schudson Steven A Sloman Cass R Sunstein EmilyThorson Duncan J Watts and Jonathan Zittrain 2018 The science of fake newsScience 359 (2018) 1094ndash1096

[26] Ji Young Lee and S Shyam Sundar 2013 To Tweet or to Retweet That Is theQuestion for Health Professionals on Twitter Health Communication 28 5 (2013)509ndash524

[27] Yanqin Lu 2019 Incidental Exposure to Political Disagreement on Facebookand Corrective Participation Unraveling the Effects of Emotional Responses andIssue Relevance International Journal of Communication 13 (2019)

[28] Binny Mathew, Ritam Dutt, Pawan Goyal, and Animesh Mukherjee. 2019. Spread of Hate Speech in Online Social Media. In 10th ACM Conference on Web Science (WebSci).

[29] Solomon Messing, Christina DeGregorio, Bennett Hillenbrand, Gary King, Saurav Mahanti, Zagreb Mukerjee, Chaya Nayak, Nate Persily, Bogdan State, and Arjun Wilkins. 2020. Facebook Privacy-Protected Full URLs Data Set. Harvard Dataverse (2020). https://doi.org/10.7910/DVN/TDOAPG

[30] Mia Moody-Ramirez and Andrew B. Church. 2019. Analysis of Facebook Meme Groups Used During the 2016 US Presidential Election. Social Media + Society 5, 1 (2019).

[31] NewsGuard. [n.d.]. NewsGuard: Restoring Trust & Accountability. ([n.d.]). https://www.newsguardtech.com

[32] Donie O’Sullivan. 2020. Facebook fact-checkers to Trump supporters: We are not trying to censor you. (Oct. 2020). https://www.cnn.com/2020/10/29/tech/fact-checkers-facebook-trump/index.html

[33] Gordon Pennycook, Adam Bear, Evan Collins, and David G. Rand. 2020. The Implied Truth Effect: Attaching Warnings to a Subset of Fake News Headlines Increases Perceived Accuracy of Headlines Without Warnings. Management Science (Feb. 2020).

[34] Quintly. 2018. Changes to Facebook’s API - February 6th, 2018. (Feb. 2018). https://support.quintly.com/hc/en-us/articles/115004414274-Changes-to-Facebook-s-API-February-6th-2018

[35] Gustavo Resende, Philipe Melo, Hugo Sousa, Johnnatan Messias, Marisa Vasconcelos, Jussara Almeida, and Fabrício Benevenuto. 2019. (Mis)Information Dissemination in WhatsApp: Gathering, Analyzing and Countermeasures. In The World Wide Web Conference (WWW).

[36] Mattia Samory and Tanushree Mitra. 2018. Conspiracies Online: User discussions in a Conspiracy Community Following Dramatic Events. In International AAAI Conference on Web and Social Media.

[37] Ana Lucía Schmidt, Fabiana Zollo, Michela Del Vicario, Alessandro Bessi, Antonio Scala, Guido Caldarelli, H. Eugene Stanley, and Walter Quattrociocchi. 2017. Anatomy of news consumption on Facebook. Proceedings of the National Academy of Sciences 114, 12 (2017), 3035–3039.

[38] Chengcheng Shao, Giovanni Luca Ciampaglia, Onur Varol, Kaicheng Yang, Alessandro Flammini, and Filippo Menczer. 2018. The spread of low-credibility content by social bots. In Nature Communications.

[39] Elisa Shearer and Elizabeth Grieco. 2019. Americans are wary of the role social media sites play in delivering the news. Pew Research Center (Oct. 2019). https://www.journalism.org/2019/10/02/americans-are-wary-of-the-role-social-media-sites-play-in-delivering-the-news

[40] Social Science One. [n.d.]. Building Industry-Academic Partnerships. ([n.d.]). https://socialscience.one

[41] Social Science Research Council. [n.d.]. Social Media and Democracy Research Grants. ([n.d.]). https://www.ssrc.org/fellowships/view/social-media-and-democracy-research-grants

[42] Olivia Solon. 2020. Sensitive to claims of bias, Facebook relaxed misinformation rules for conservative pages. (Aug. 2020). https://www.nbcnews.com/tech/tech-news/sensitive-claims-bias-facebook-relaxed-misinformation-rules-conservative-pages-n1236182

[43] Samuel Spies. 2020. How Misinformation Spreads. (July 2020). https://mediawell.ssrc.org/literature-reviews/how-misinformation-spreads/versions/1-0

[44] Kate Starbird, Ahmer Arif, Tom Wilson, Katherine Van Koevering, Katya Yefimova, and Daniel Scarnecchia. 2018. Ecosystem or Echo-System? Exploring Content Sharing across Alternative Media Domains. In International AAAI Conference on Web and Social Media (ICWSM).

[45] Edson C. Tandoc Jr. 2019. Tell Me Who Your Sources Are. Journalism Practice 13, 2 (2019), 178–190.

[46] Rebekah Tromble, Andreas Storz, and Daniela Stockmann. 2017. We Don’t Know What We Don’t Know: When and How the Use of Twitter’s Public APIs Biases Scientific Inference. (Nov. 2017). https://ssrn.com/abstract=3079927

[47] Michela Del Vicario, Alessandro Bessi, Fabiana Zollo, Fabio Petroni, Antonio Scala, Guido Caldarelli, H. Eugene Stanley, and Walter Quattrociocchi. 2016. The spreading of misinformation online. Proceedings of the National Academy of Sciences 113, 3 (2016), 554–559.

[48] Soroush Vosoughi, Deb Roy, and Sinan Aral. 2018. The spread of true and false news online. Science 359, 6380 (2018), 1146–1151.

[49] Claire Wardle and Hossein Derakhshan. 2017. Information Disorder: Toward an interdisciplinary framework for research and policymaking. Technical Report. Council of Europe.

[50] Angela Xiao Wu and Harsh Taneja. 2020. Platform enclosure of human behavior and its measurement: Using behavioral trace data against platform episteme. New Media & Society (2020).

[51] Eric Zeng, Tadayoshi Kohno, and Franziska Roesner. 2020. Bad News: Clickbait and Deceptive Ads on News and Misinformation Websites. In Workshop on Technology and Consumer Protection.
