Filter Bubble

Embed Size (px)

DESCRIPTION

a

Citation preview

  • 7/14/2019 Filter Bubble

    1/12

    Data Portraits: Connecting People of Opposing Views

    Eduardo Graells-GarridoUniversitat Pompeu

    FabraBarcelona, Spain

    [email protected]

    Mounia LalmasYahoo LabsLondon, [email protected]

    Daniele QuerciaYahoo Labs

    Barcelona, [email protected]

    ABSTRACT

    Social networks allow people to connect with each otherand have conversations on a wide variety of topics. How-ever, users tend to connect with like-minded people and readagreeable information, a behavior that leads to group polar-ization. Motivated by this scenario, we study how to takeadvantage of partial homophily to suggest agreeable contentto users authored by people with opposite views on sensitive

    issues. We introduce a paradigm to present a data portraitof users, in which their characterizing topics are visualizedand their corresponding tweets are displayed using an or-ganic design. Among their tweets we inject recommendedtweets from other people considering their views on sensitiveissues in addition to topical relevance, indirectly motivatingconnections between dissimilar people. To evaluate our ap-proach, we present a case study on Twitter about a sensitivetopic in Chile, where we estimate user stances for regularpeople and find intermediary topics. We then evaluated ourdesign in a user study. We found that recommending topi-cally relevant content from authors with opposite views in abaseline interface had a negative emotional effect. We sawthat our organic visualization design reverts that effect. Wealso observed significant individual differences linked to eval-

    uation of recommendations. Our results suggest that organicvisualization may revert the negative effects of providingpotentially sensitive content.

    Categories and Subject Descriptors

    H.5.2 [User Interfaces]: Graphical user interfaces (GUI)

    Keywords

    Visualization, social networks, homophily, sensitive issues,data portrait, recommendation.

    1. INTRODUCTIONToday, with social networks, users have less barriers to

    communicate. Geographical distance is no longer a limita-

    Unpublished manuscript. Ask authors before citing/distributing.November 2013.

    tion to interact with others or to know what is happeninganywhere in the world. In particular, real-time streams keeppeople aware of what is happening simply by reading contentposted from other accounts. Discussions for instance aboutpolitical events are driven by the usage of hashtags and any-one can contribute and participate without being connectedto other participants in the discussions. However, is it reallythat good?

    Social research has shown that people tend to connect and

    interact mostly with people with similar beliefs, a phenomenaknown as homophily. In addition, users prefer to accessagreeable information instead of information that challengestheir beliefs. These problems are augmented when socialnetworks recommend content based on what users alreadyknow, on what are their current connections, and what peoplethat are similar to them have done and liked before. Thisphenomenon of being surrounded only by similar people andhaving access to agreeable or likeable information is knownas the filter bubble [30].

    As a result, groups of users that share different pointsof view tend to polarize and disconnect from other groups.Previous work has focused on how to motivate users toread challenging information or how to motivate a change in

    behavior. This direct approach has not been effective asusers do not seem to value diversity or do not feel satisfiedwith it, a result explained by cognitive dissonance[19], a stateof discomfort that affects persons confronted with conflictingideas, beliefs, values or emotional reactions. Because socialnetworks make heavy use of recommendations, both curatedby other peers and machine generated, the motivation behindour research is: can we take advantage of user engagementwith recommendations to indirectly promote connection topeople of opposing views?

    Our methodology recommends content to users througha data portrait [14] of themselves, making them aware oftheir characterizing topics, concepts and relations in a socialnetwork. We call this characterizationuser preferences. Userpreferences are displayed following an organic design, based

    on familiar elements (wordclouds) and new stimuli (circlesin organic layouts). Then, we take advantage of partialhomophily as we inject recommended content in the contextof the preferences of the portrayed user. Besides topicalrelevance, the recommended content considers an additionalfactor: the difference in stances on sensitive issues for bothrecommender and recommendee. We call this difference viewgap. Hence, we nudge users to read content from peoplewho may have opposite views, or high view gaps, in thoseissues, while still being relevant according to their preferences.

    arXiv:1311.4

    658v1

    [cs.HC]19

    Nov2013

  • 7/14/2019 Filter Bubble

    2/12

    The thinking behind this indirect recommendation is thatif a connection is stablished, it could be possible that theportrayed user will receive potentially challenging content inthe future. This content could be better tolerated becauseof the primacy effect in impression formation [3], or, as thepopular interpretation says: first impressions matter. Thispaper is a first step in this direction by presenting an organicapproach to the visual design of data portraits.

    To evaluate our methodology, we performed a case study

    on the social network Twitter1 where the sensitive issue isabortion and the user population is from Chile. Twitter isa social network that contains a high percentage of publicdiscussions on what is currently happening. Chile has one ofthe strictest abortion laws in the world [41], and discussionon social networks is polarized in particular now ahead ofupcoming presidential elections (to be held in November2013). Twitter also provides two types of recommendations:the who to follow? widget [21], and #Discover, a sub-pagethat contains recent tweets published, retweeted or favoritedby others. All of these recommendations are driven by userconnectivity and user activity. Making recommendations onTwitter is not new, and we exploit this artefact to makeindirect recommendations using our data portrait paradigm.

    Using tweets crawled during July and August 2013, weuse our methodology to establish an abortion stance for anyTwitter account from Chile as a linear combination of the twoopposite stances: pro-choiceand pro-life. From all accountsthat have tweeted about abortion, we select a sample ofregular people. We find that indeed there is topical diversity;people of opposite views discuss many other topics, and somecommon to both stances. Then, we performed a user studywhere participants were divided into three groups: a baseline,where the data portrait consisted of an interactive wordcloudand two standard list of tweets: one from the portrayed users,and other with recommendations considering user preferencesonly; a treatment I, where the recommendations considereduser preferences and view gaps; and a treatment II, wherethe data portrait is based on our organic design and the

    recommendations considered user preferences and view gaps.Our results indicate that the user experience with our noveldata portrait design is comparable to a familiar baseline.The injection of content from people with opposite views hada negative effect on the emotional reaction of users whencomparing the baseline and the first treatment. However, ourorganic design helped to recover the emotional reaction to itsprevious levels. We also found how individual differences arerelated to recommendation attributes: openness is linked toperceived interestingness, and engagement with our designis related to perceived serendipity.

    This paper makes the following contributions: (1) An or-ganic visualization design that depicts a virtual portrait ofa user based on its characterizing content; (2) A methodol-ogy to characterize users from social networks into a set oftopics and stance on sensitive issues, and recommend tweetsbased on both; (3) A case study of a sensitive issue and acorresponding user study, demonstrating the potential of ourchosen methodologies and visualization design.

    2. BACKGROUND

    Visualization and Data Portraits Visualizations of socialnetwork data cover a wide range of applications, such as

    1https://twitter.com,visited 02-October-2013.

    event monitoring[15, 24], visual analysis [12], group contentanalysis[2], information diffusion[36]and ego-networks[22].Our work is related to the field known asCasual InformationVisualization, defined asthe use of computer mediated toolsto depict personally meaningful information in visual waysthat support everyday users in both everyday work and non-work situations [33]. The focus on everyday situations implythat there does not need to be a concrete task to be completed.To visualize user profiles we build data portraits [14], which

    are abstract representations of users interaction history[42]. These portraits have been built using content frome-mail[37], personal informatics systems [4], Twitter profiles[16]and internet forums [42]. Since we focus on the topicalaspect of profiles, we make use visual depictions of collectionsof words known as wordclouds [38].

    Selective Exposure. Selective exposure refers to the ten-dency of people to seek for information that reinforces theircurrent beliefs and ideas. Exposure to challenging informa-tion causes cognitive dissonance [19], an uncomfortable stateof mind that individuals try to alleviate by discarding theinformation or avoiding it. Previous works have focusedon how to minimize or avoid this dissonance to improveexposure to challenging information. InNewsCube [31]sev-

    eral automatically determined aspects of news stories arepresented to mitigate the problem of media bias and allowusers to access diverse points of views. A study on sortingand highlighting of political opinions identified two groupsof users: diversity-seeking and challenge-averse [28], wherethe latter was shown to be equally satisfied with a list ofmostly non highlighter agreeable items and a couple of high-lighted challenging items, and a list of only agreeable items.In OpinionSpace [18], users were presented with differentvisualizations of opinions.The usage of a visual approach didnot reduce selective exposure, although it generated moreengagement than a baseline and its users were more respect-ful with those having opposite opinions. In[27]users werepresented with a cartoon representation of the equilibrium oftheir reading behavior in terms of political diversity: reading

    only articles from one political side would make the figure inthe cartoon almost falls, whereas reading a fair proportionof political articles from both sides would keep the figurebalanced. Finally, [23] reported that information seekerspreferred agreeable information even when having to choosebetween agreeable and challenging items presented side byside, and high topic involvement and perceived threat werefound to be influencing factors in selective exposure.

    While previous works refer to direct presentation and ac-cess to diverse content, our work takes a different, indirectapproach. We do not explicitly display challenging contentfor two reasons: 1) users often do not value diversity [28],and2) we take advantage of the primacy effecton impressionformation [3]. Hence, we do not assess selective exposure

    directly, as our goal is to connect people of opposing viewson a sensitive topic, but having non challenging initial inter-actions.

    User Classification and Characterization. We considertwo levels at which users can be modeled. At the high level,self-reported location, name and biography can be used todetect user demographic features [26], which, in additionto linguistic features of user generated content, user behav-ior and user connectivity, can be used to classify users intopolitical parties (and other categorizations) using machinelearning algorithms[32]. However, because regular people

    https://twitter.com/https://twitter.com/https://twitter.com/
  • 7/14/2019 Filter Bubble

    3/12

    may not be as vocal as politically involved people, the ac-curacy of the prediction algorithms is often over-estimated[10]. At the low level, using a knowledge base allows thedetection of entities mentioned in tweets which, in turn, al-lows to assign characterizing categories to a Twitter profile[25]. Topic modelling [35] is used to calculate latent topicsfrom a collection of microblogs that then characterizes thecollection. Then, user profiles can be characterized by theprobability distributions of how those topics contribute to

    their content. A dimensionality reduction approach is pro-posed by OpinionSpace[18], where users are asked for theiropinion of several topics to build an opinion profile.

    In our work, user views in sensitive issues are estimatedusing a mixture of a linguistic approach and opinion profiles[18]. Since our work is focused on visualization, this is suf-ficient to obtain a good enough stance determination, as alinguistic approach has been shown to be reliable [32]. Foruser preferences or topical characterization we work withn-gramsto detect characteristic word sequences such as placenames, artists, movies, and other lexical features. We recom-mend tweets based on topical relevance and the differencesin user views. Although state-of-the-art algorithms also con-sider social relations and different measures of tweet quality

    [9] these features are expensive to estimate, which is outsidethe scope of this paper.

    Sensitive Issues and Politics in Social Media. Oppo-site communities in social media have been studied. Forinstance, [1] found that liberal and conservative blogs inthe US linked mostly to blogs of the same political party.[44] studied the interaction between two opposite groupsof users on Flickr (a social network about photography)2,one related to pro-anorexia and the other to pro-recovery.Pro-recovery users injected content into pro-anorexia groupsbut the intervention was counterproductive. The injectionof political content on Twitter from opposite parties hasbeen studied in[11], and was shown to motivate interactionsbetween p olitically vocal users. However, although the men-tion networkwas found to become less polarized, the retweetnetworkwas not. Debates on abortion between pro-life andpro-choiceadvocates have been studied in[43], who showedthat the interaction between persons having the same stancereinforced group identity, and discussions with members ofthe opposite group were found to be not meaningful, partlybecause the interface did not help in that aspect. As notedin[43], people hardly changed position on abortion based ondiscussions on Twitter. However, being connected to a morediverse group of people may help create a more meaningfuldiscussion and affect people points of view instead of merelyreinforcing them. This is what we strive for in our work.

    3. DATA PORTRAIT

    Our data portrait paradigm is aimed at creating a globalimage of a Twitter account, an image that depends on whatusers want their account to project to others. Existing workon data portraits have focused either on disruptive and artis-tic designs, or solely analytical ones. We acknowledge thatregular users might exhibit resistance when faced to disrup-tive changes in the interfaces they are used to. Therefore,as shown in Figure1, we propose to add a new stimuli to afamiliar element, wordclouds, which changes in a substantial

    2http://flickr.com, accessed on 08-October-2013.

    way the interaction with it, while providing a friendly andevocative appearance based on organic patterns.

    Depicting User Topics. In data portraits, text is impor-tant as it provides immediate context and detail [14]. Wemake use of the topics that characterize a Twitter accountas input for a wordcloud in each portrait. Wordclouds existsince 1997[38] so we expect users to be familiar with them.The font-size of each topic varies according to the score given

    to the topic: a higher score implies a higher font-size. Wordpositioning and color are random, as in popular wordclouds[39]. To make each topic clickable like a button, we positiontopics in the canvas using bounding box collision detectionto minimize overlap. We extend each bounding box alongits axes to have a larger clickable area.

    Depicting Tweets. Tweets are first depicted as circles inthe middle of the wordcloud, as we want to insinuate thattweets are originating those topics. The layout organizingthe circles is based on an organic model of pattern of floretsfrom Vogel [40], defined in polar coordinates as:

    r= c

    nand = n 137.508

    where c is a constant that defines how separated circles are,

    137.508 is the golden ratio and n is the reverse chronologicalordered index of each tweet (the oldest tweet will have index1). The color of each circle is based on the color assignedto the primary topic of the corresponding tweet. By usingcollision detection, wordcloud elements do not overlap withthe circles while positioning the topics in the canvas.

    Context, Nudging and Interaction. When a user se-lects a tweet, its content is presented inside a speech balloondesigned with a format that resembles the native format inTwitter. The tweet includes native options such asretweet,replyandmark as favorite. To nudge interaction with others,in addition to the portrayed individual tweet, we displayrelated tweetsmade by unknown individuals, as shown inFigure2. To provide topical context, tweets and topics are

    connected on-demand with bezier curves, as shown in Figure1. It is possible that one topic is linked to many tweets; andone tweet to many topics. Clicking on a topic highlights thetopic, giving it a button-like appearance and displaying theconnection curves to the corresponding tweets. Clicking ona circle displays the corresponding tweet and the connectioncurves to the corresponding topics. Closing the tweet doesnot discard the connection curves, hence, users can explorethe topics based on how they are interconnected accordingto the tweets. The curves are discarded by selecting anothertopic or by clicking into empty space in the canvas.

    4. METHODOLOGYWe consider the social media site of Twitter, where users

    publish micro-posts known astweets, each having a maximumlength of 140 characters. Each user is able to followotherusers and their tweets. We consider tweets about sensitiveissues, which can be defined as the topics for which the stancesor opinions tend to divide people. For instance, abortion isa sensitive issue in many countries, whereas musical tastesis usually not (even if people have different tastes in music).Often users annotate their tweets withhashtags, which aretext identifiers that start with the character #. For instance,#prochoiceand #prolife are two hashtags related to twoabortion stances.

    http://flickr.com/http://flickr.com/http://flickr.com/
  • 7/14/2019 Filter Bubble

    4/12

    Figure 1: Our data portrait design, based on a wordcloud and an organic layout of circles. The wordcloud contains characterizingtopics and each circle is a tweet about one or more of those topics. Here, the user has clicked on her or his characterizing topic#d3jsand links to corresponding tweets have been drawn.

    Figure 2: Display of tweets inside a pop-up speech balloon.

    Our aim is to build a tool that recommends users tweetsthat they may like, and unknown to them, that come frompeople who hold opposite views in a particular sensitive issue(in our case abortion for users in Chile). We thus need todetermine for each user, what is his or her view with respectto the sensitive issue under consideration, and what he andshe is interested in general (sport, dining, film, etc.).

    User ViewsFor any sensitive issue under consideration, wecollect relevant tweets. Relevance is determined based on

    keywords and hashtags contained in tweets. The collectedtweets fall into one of the several issue stances. For eachstance, we compute a stance vector in which each dimensionrefers to the importance of a given word taken from the tweetscontaining that word. This importance is calculated usingthe TF-IDF weighting scheme used in the vector space model

    in information retrieval[5, Chapter 3]. We also define a uservector, in which each dimension refers to the importance of agiven word taken from the user tweets. Finally, we define theuser stanceas a vector where each dimension corresponds tothe cosine similarity between the user vector and the severalissue stance vectors. For a pair of users, we define their viewgap as the distance between their stances.

    User Preferences To obtain thetop-ktopics that character-ize a user (Twitter account), we compute the most commonn-grams (with n up to 3) of the user tweets. Each set ofn-grams is ranked separately, where the score given to eachtopic is a linear combination of the number of occurrences(frequency) and how long ago has the n-gram been written(time). The weights for frequency and time depend on thenumber of words in the n-gram, as unigrams are expected

    to be more frequent than bi- or tri-grams. We then combinethe n-gram sets and select the top ranked ones. We do thisfor three types of content: mentions(interactions), hashtags(explicit topics) andwords(implicit topics).

    Tweet Recommendations Having each user views andpreferences, we are now able to suggest tweets to users. Wedo so by ranking tweets based on both:

    Topical Relevance: whether they match the user prefer-ences. To estimate topical relevance we use the cosinesimilarity between the user preferences under consider-

  • 7/14/2019 Filter Bubble

    5/12

    Data #

    Tweets 3002299Accounts 798590Keywords 1289901

    Accounts with Abortion-related Tweets 40201

    Table 1: Data crawled from Twitter during July and August2013.

    ation and a pool of candidate tweets.

    Opposite Views: whether their authors have a consid-erable view gap with the user under consideration.

    In this way, suggested tweets are still relevant but come froma diverse pool of users, where diversity is with respect touser stances on a topic.

    5. CASE STUDY: ABORTION IN CHILEOne of the most sensitive issues in Chile is abortion. Chile

    has one of the most severe abortion laws in the world; abor-tion is illegal and there is no exception. The history ofabortion in Chile is long, being declared legal in 1931 and il-legal again in 1989. Given that the presidential elections willbe held soon and the occurrence of several protests aroundthemes such as abortion, public education and gay marriage,social networks are being used to spread ideas and generatedebates. In this section we describe how we represent userviewson abortion for a set of regular users from Chile inTwitter. We then describe how we determine the topicaldiversity for those users, to showcase that we can find rele-vant content to recommend to them according to their userpreferences.

    DatasetWe crawled tweets for the period of July to August2013 using the Twitter Streaming API.3 Table 1 shows asummary of the crawled data.

    5.1 User ViewsTo represent user views, we look at the most discussed

    sensitive issues on Twitter. To find those issues, we crawledtweets in July using keywords for known issues and otherkeywords and hashtags for Chile (location names and presi-dential candidate names). From those tweets, we manuallyinspected the most used keywords to find the top-100 relatedto sensitive issues. Then, in August, we crawled tweets usingthose keywords, plus emergent keywords and hashtags fromunexpected news and events related to these issues. Finally,from those tweets we selected the final top-100 keywords todefine a corpus of documents built with the tweets containingthem. We built two stance vectors: pro-life and pro-choice.The keywords used to build both vectors are specified in

    Table2. The importance of each word in each stance vec-tor is weighted according to its TF-IDF with respect to apreviously defined corpus.

    In our data, the number of users who published tweetsusing keywords related to abortion (see Table 2) is 40, 197.Because we wanted to analyze regular users (and not ac-counts that neither interact with others nor participate intodiscussions), we used a boxplot to identify outliers according

    3https://dev.twitter.com/docs/streaming-api, ac-cessed on 3-October-2013. The data was crawled by the firstauthor.

    Figure 3: Boxplot showing distribution of in-degree (fol-lowers) and out-degree (friends) of chilean users who have

    tweeted about abortion.

    Figure 4: Distribution of user stances based on similaritybetween user vectors and stance vectors (pro-life and pro-choice).

    to connectivity (see Figure3). The median offollowers (in-degree) is 283 and the maximum non-outlier value is 1, 570.The median offriends(out-degree) is 332 and the maximumnon-outlier value is 1580. We filtered users according to theirin- and out-degrees. Then, we filtered users who had lessthan 1, 000 tweets. Finally, we selected users that reportedtheir location as Chile in their profiles and that have a knowngender. To geolocate users according to their self-reportedlocation we used a methodology from [20], and to detectgender we used lists of names. After applying those filterswe had a candidate pool of 3, 350 users. For these candidateswe downloaded their latest 1, 000 tweets and estimated theiruser vectors, by weighting the words used in their tweetsaccording to our corpus of sensitive issue keywords.

    We estimated the candidates stance on abortion by com-puting the cosine similarity between their user vectors and thepro-life and pro-choice stance vectors. The two-dimensionalstances are displayed on Figure4. The view gapsare definedas the distance between their stances. We also defined acandidate tendency as the difference between their similari-ties with the pro-choice stance vector and the pro-life vector.Figure5 shows the distribution of view gaps (top) and ten-dencies (bottom). The average view gap is 0.034 0.096,

    whereas the median is 0.0051 and the maximum is 1.129.The average tendency is 0.00078 0.07245, while the medianis 0.00052. Based on these results, our candidate pool isequally distributed between pro-life and pro-choice users,and most of the view gaps differences are small.

    5.2 Topical DiversityTo verify that user preferences can be estimated from

    our candidate pool, we explore the topical diversity of theusers using topic modelling. We applied Latent DirichletAllocation [7] to a corpus where each document contains

    https://dev.twitter.com/docs/streaming-apihttps://dev.twitter.com/docs/streaming-apihttps://dev.twitter.com/docs/streaming-api
  • 7/14/2019 Filter Bubble

    6/12

    Stan ce Keywords

    Pro-choice

    #abortolibre, #yoabortoel25, #abortolegal, #yoaborto, #abortoterapeutico, #proaborto, #abortolibresegurogratuito,#despenalizaciondelaborto

    Pro-life #provida, #profamilia, #abortoesviolencia, #noalaborto, #prolife, #sialavida, #dejalolatir, #siempreporlavida, provida,#nuncaaceptaremoselaborto

    General #marchaabortolegal, aborto, abortistas, abortista, abortos, abortados, abortivo, #bonoaborto

    Table 2: Keywords used to characterize the pro-choice and pro-life stances on abortion. General keywords plus stance keywordswere used to find people who talked about abortion in Twitter.

    Figure 5: Distribution of view gaps in abortion (distancesbetween user views for a pair of users) and tendency to pro-life or pro-choice stances for regular people in our dataset.

    candidate tweets, after filtering the top-50 corpus specificstopwords (in a similar way to [35]). LDA is a generativemodel that, given a number of topics k and a corpus ofdocuments, estimates which words contribute to each latenttopic. Then, we can estimate what is the probability for alatent topic to contribute to an arbitrary document.

    We ran LDA with k = 300. Then we built an undirectedtopic graphwhere each topic is a node, and two nodes areconnected if the two corresponding topics contribute to thesame candidate tweets document. The edge weight is thefraction of users that contributed to the edge. Having analmost fully connected graph, we filter the edges based ontheir weight, leaving only those in the upper-decile. Afterremoving the disconnected nodes we have 181 nodes. Finally,we compute the betweenness centrality [29]for all remainingnodes to find intermediary topics. This graph is shown in

    Figure6, where two topical communities can be seen. Themost descriptive terms for the top-5 intermediary topicsare shown in Table3. These terms were computed using aTF-IDF inspired term-score formula[6].

    We observe that the top intermediary topics include mostlyneutral keywords; they are not related to abortion-relatedissues. In particular, the most descriptive words for eachtopic are: santiago (location name), #atencion (a hashtagof general use),#junior(referring to the soccer playerJuniorFernandez), #vende (a hashtag to sell things) and #equipos(a hashtag that refers to soccer teams). Although we did not

    Figure 6: Topic Graph, where each node is a topic obtainedwith LDA applied to our corpus of user documents in Chile.Node size encodes betweenness centrality. Two nodes areconnected if both topics contribute to one user document.Only the upper-decile of edges in terms of number of userdocuments is kept on the graph.

    analyze the non-intermediary topics (for space constraints)the fact that we have identified some level of neutrality in

    the intermediary topics indicates that it is possible to linktwo topical communities through trivial things such as sportsand item exchange.

    6. USER EVALUATIONIn this section we evaluate with a user study our data por-

    trait paradigm, and using our methodology to defining userpreferences, stances, etc, described in the previous section.

    Participants. Participants were recruited from social net-works. No compensation was offered. We recruited 37 par-

    Topic Keywords

    57 santiago, partido, derecha, parte, trabajo, nacional

    156 #atencion, #popular, #familiar, #2014, #creativo,#recomendada

    10 #junior, #trabajando, #cueca, #resumen, #cambios,#facha

    157 #vende, #camara, #drama, #marco, #enfermo, #asqueroso

    99 #equipos, #ataque, #catolicos, #comunicaciones,#absurdo, #television

    Table 3: Representative keywords for intermediary LDAtopics.

  • 7/14/2019 Filter Bubble

    7/12

    ticipants (26 male and 11 female), all of them from Chileand active users of Twitter. When participants were askedto rate their experience with social networks they scoredthemselves 3.84 0.80 in average using a Likert scale from 1to 5. In addition, 84% (31) of participants reported to haveaccepted recommendations from other persons, whereas 54%(20) reported to have accepted recommendations from Twit-terwho to follow. When asked in the post-study survey fortheir position on abortion, 86% (32) support the legalization

    of abortion.Apparatus. For each participant we crawled their last3, 200 tweets (if applicable), which is the limit imposed bythe Twitter API. Then we estimated their user preferencesand user views on abortion according to our methodology.Participants were tested in an online-setting. Our dataportrait design (see Figure1) and the baseline design (seeFigure7) were implemented in HTML and Javascript usingd3.js[8]. Before the start of the experiment, we explainedthat we have built a visually explorable characterizationof them and that we will display related tweets to theircharacterization. Each participant was given a unique urlwith their portrait to visit. After the experiment, participantsfilled a post-study survey with two parts: one part contained

    usability questions and a second part with questions relatedto the sensitive issue of our case study. Participants were nottold that the recommendations they would receive were frompeople with eventually opposing views (if applicable), andthey were not told about the sensitive issues aspect of theexperiment until the second part of the post-study survey. Acomment was added at the end of the experiment to explainwhy we were asking about abortion and sensitive issues.

    Design and Procedure. The experiment used a between-groups design. We defined three groups: baseline, treatment Iand treatment II. The baseline interface contains a wordcloudand lists of tweets with a similar format to the currentTwitter interface (see Figure7), and the recommendationsconsideruser preferences only. Treatment I uses the samebaseline interface, but recommendations depend on bothuser preferences and view gapswith authors with respectto abortion, as defined in our methodology. Treatment IIconsiders both user preferencesand view gaps, and uses ourdata portrait design.

    Task. We asked participants to visit their portrait duringthree consecutive days, and to browse their portraits for aslong as they want, but for a minimum of three minutes. Theywere encouraged to explore their user preferences, but wedid not explicitly encourage them to interact with others.If participants tweeted during the days of experiment, theirportraits were updated.

    6.1 Quantitative ResultsThe task and measures we made were designed to evaluate

    if users reacted well to our data portrait paradigm. Sinceour design presents a non typical design, it could result intoa disrupting user experience for those users who are used tothe more traditional Twitter user interface. Therefore, ourfocus was on testing how comparable the user experienceswere. Considering the previous statement, our hypothesisis: participants in Treatment II will have a comparable userexperience than participants in Treatment I. Table4 showsthe average and standard deviations of answers given byparticipants on the post-study survey. Participants answeredusing a Likert scale from 1 to 5. The dfferences in means were

    Figure 7: Baseline interface using a wordcloud and standardlists of tweets.

    Figure 8: Distribution of answers to the question Do youthink the portrait presents a global image of your profile?

    tested with Welchs t-test. The table also lists the questionsthat were asked.4

    The results support our hypothesis. The scores obtainedfor the questions How much did you enjoy using the applica-tion? and Would you use the application if it was integratedin Twitter?are positive and do not show statistically signif-icant differences between the groups. However, there is anunexpected finding; participants in treatment I felt less rep-resented by their portraits (significant at p

  • 7/14/2019 Filter Bubble

    8/12

    Control(N= 12)

    Treatment I(N= 12)

    Treatment II(N= 13)

    How much did you enjoy using the application? 3.92 0.79 3.50 0.90 3.92 0.76Would you use the application if it was integrated in Twitter? 3.67 0.65 3.17 1.03 3.31 0.75How much did you feel represented by the portrait? 3.75 0.62 3.08 0.51*** 3.69 0.95*Do you think the portrait presents a global image of your profile? 3.42 1.16 3.17 1.11 3.54 0.97Do you think the portrait allows you to discover patterns in yourbehavior?

    3.83 0.94 3.50 1.24 3.85 1.07

    Recommendation Similarity 2.58 1.00 2.58 0.90 2.31 0.95Recommendation Interestingness 2.50 1.31 3.17 1.03 2.46 0.97*

    Recommendation Serendipity 2.

    83

    0.

    83 3.

    08

    1.

    08 2.

    92

    1.

    04

    Table 4: Means and standard deviations considering control group, treatment I (recommendations considering view gaps) andtreatment II (organic visualization and recommendations considering view gaps). ***: p

  • 7/14/2019 Filter Bubble

    9/12

    High Global Image Score(N= 20)

    Low Global Image Score(N= 17)

    How much did you enjoy using the application? 3.95 0.83 3.59 0.80Would you us e the a pplicatio n if it was inte grated in Twitter? 3.60 0.82* 3.12 0.78How much did you feel represented by the portrait? 3.90 0.64*** 3.06 0.66Do you think the portrait presents a global image of your profile? 4.25 0.44*** 2.35 0.49Do you think the portrait allows you to discover patterns in yourbehavior?

    4.05 0.89** 3.35 1.17

    Recommendation Similarity 2.35 0.93 2.65 0.93Recommendation Interestingness 2.60 1.14 2.82 1.13

    Recommendation Serendipity 3.

    30

    0.

    86** 2.

    53

    0.

    94

    Table 6: Means and standard deviations for two participant groups based on the score given to the questionDo you think theportrait presents a global image of your profile? ***: p

  • 7/14/2019 Filter Bubble

    10/12

    user interface can help preventing resistance to change, as itis well-known that significant changes in social networks andonline-services can generate some amount of user anger andfrustration. Indeed, it has be noted that one argument fordeliberately designing evocative visualizations for online so-cial environments is the existing default textual interfaces arethemselves evocative, they simply evoke an aura of business-like monotony rather than the lively social scene that actuallyexists[13].

    Our results support our claims. First, if we consider iden-tification with the data portrait as an emotional reaction, wefound that including diverse recommendations with respectto user views degraded the emotional response, whereas us-ing our data portrait design helped to recover it and putit back to the same level as before. Second, there were nodifferences in enjoyment between the different experimentalgroups; overall enjoyment was high. Thus, while we expectedresistance to change, we found that our data portrait designdid not cause any resistance. Hence, our results open a pathinthe usage of organic visualization design to revert the effectof potentially discomfortable information. Instead of diverserecommendations with respect to user views, other types ofdiscomfortable (e.g. propaganda) or annoying information

    (e.g. unwanted advertising) could lead to similar desirableeffect with our organic data portrait design.

    Practical Implications. Our results indicate that individ-ual differences matter. Indeed, the rating of the three aspectsof the recommendations we measured (interestingness, sim-ilarity and serendipity) were related to individual factors:users who tweeted about a sensitive issue gave higher scoresfor interestingness and similarity compared to those who didnot. Also, people who felt represented by the portrait gavehigher score to serendipity. This has two main implications:

    1. Users who openly speak about sensitive issues are moreopen to receive recommendations authored by peoplewith opposing views. Becauseopennessof Twitter userscan be predicted [34], these individual differences can

    be detected without explicitly having to determinewhether a user is concerned about specific sensitiveissues.

    2. Using the data portrait is a signal that users strive forserendipity. The serendipity rating of the recommen-dations is related with how users felt represented bytheir portrait, and those users that gave a statisticallysignificant higher score to the question: Would youuse the application it if was integrated in Twitter?.Therefore, if a data portrait was to be included in Twit-ter, the fact that a user is interacting with it could bea strong indication of his or her willingness to valueserendipitous recommendations.

    Abortion in Chile. In Section5, we studied the distribu-tion of user views on abortion in Chile. Although Chile is ahighly polarized country on a number of sensitive issues, andin particular abortion, we saw that user tendencies towardspro-life and pro-choice stances are equally distributed, andpolarity is rather low (see Figure5). How these observationscorrespond to the reality in Chile is hard to determine, asseveral public opinion surveys report contradictory resultswith respect to this issue [41]. Nonetheless, we expecteda higher polarization. A possible explanation is that ourTF-IDF weighting used in the stance vectorsis softening the

    stronger stances on the issue. It will be important to revisitthe computation of the stance vectors, for example, not onlyby considering the words that are deemed to relate to thesensitive issue under study but how these words are used bythe users. A second explanation is that users may be less vo-cal about this (or other) sensitive issue than our expectations.Previous work demonstrated that the average users are notas vocal as politically involved people [10]. This somewhatreflects what has been reported in the Chilean press about

    users and abortion: pro-life and pro-choice people rarelyinteract with each other (so no need to become vocal) andmost of what they do is to retweet agreeable information[17]. Considering these retweets in our TF-IDF weightingused to generate the stance vectors may have shown strongerpolarization. We did not consider retweets merely to obtaina user stance based on his or her own tweets. There is thusroom for improvement here. Nonetheless, to the extent of ourknowledge this is the first work looking into linking peoplewith opposite view on a given sensitive issue, and our mainfocus was the usage of the data portrait paradigm, and toshow its potential towards our goal.

    LimitationsWe discussed that the quality of the recom-mendations will require improvement. Although this may

    have an impact on our results, we believe that overall therecommendations were acceptable to inform us about thepotential of the proposed data portrait paradigm as a wayto link people having opposite views on sensitive issues.

    Future Work In addition to test better recommendationalgorithms, individual differences and ways to detect andmeasure them are two important problems to solve in futurework. Finally, to evaluate our methodology and our dataportrait design, a longitudinal study is required to establishwhether people with opposing views become connected, forhow long and why (or why not), and the overall effect ontheir beliefs.

    8. CONCLUSIONS

    We presented a methodology to determine user preferencesand user views about sensitive issues, and to use those torecommend relevant content authored by others who haveopposite views. This information was presented through adata portrait with an organic visual design. Hence, our ap-proach is different from previous work in that we propose anindirect way to connect dissimilar people. We evaluated ourdesign and methodology through a case study. We analyzedhow users in Chile tweeted about abortion, and used the out-comes to perform a usability study with Chilean users usinga proof-of-concept implementation of our recommendationapproach in a data portrait paradigm.

    We found that using an organic visualization design hasan impact on how users felt identified by the portrait. Inaddition, we found relationships between individual differ-ences and recommendation ratings: openness in tweetingabout sensitive issues is related to higher interestingness andsimilarity to user preferences, whereas serendipitous recom-mendation is related to higher identification with the dataportrait. These results have implications for the design ofrecommender systems and the usage of visualization in socialnetworks. We conclude that an indirect approach to connect-ing people with opposing views has great potential and usingan organic design constitutes a first step in that direction,where so-calledview gaps could be more tolerated.

  • 7/14/2019 Filter Bubble

    11/12

    9. REFERENCES[1] Lada A Adamic and Natalie Glance. The political

    blogosphere and the 2004 us election: divided they blog.InProceedings of the 3rd international workshop onLink discovery, pages 3643. ACM, 2005.

    [2] Daniel Archambault, Derek Greene, PadraigCunningham, and Neil Hurley. Themecrowds:Multiresolution summaries of twitter usage. InProceedings of the 3rd international workshop on Searchand mining user-generated contents, pages 7784. ACM,2011.

    [3] Solomon E Asch. Forming impressions of personality.The Journal of Abnormal and Social Psychology,41(3):258, 1946.

    [4] Yannick Assogba and Judith Donath. Mycrocosm:visual microblogging. In System Sciences, 2009.HICSS09. 42nd Hawaii International Conference on,pages 110. IEEE, 2009.

    [5] R. Baeza-Yates and B. Ribeiro-Neto.Moderninformation retrieval: the concepts and technologybehind search, 2nd. Edition. Addison-Wesley, Pearson,2011.

    [6] David M Blei and J Lafferty. Topic models.Text

    mining: classification, clustering, and applications,10:71, 2009.

    [7] David M Blei, Andrew Y Ng, and Michael I Jordan.Latent dirichlet allocation. the Journal of machineLearning research, 3:9931022, 2003.

    [8] Michael Bostock, Vadim Ogievetsky, and Jeffrey Heer.D3 data-driven documents.Visualization and ComputerGraphics, IEEE Transactions on, 17(12):23012309,2011.

    [9] Kailong Chen, Tianqi Chen, Guoqing Zheng, Ou Jin,Enpeng Yao, and Yong Yu. Collaborative personalizedtweet recommendation. In Proceedings of the 35thinternational ACM SIGIR conference on Research anddevelopment in information retrieval, pages 661670.

    ACM, 2012.[10] Raviv Cohen and Derek Ruths. Classifying political

    orientation on twitter: Its not easy!, 2013.

    [11] Michael Conover, Jacob Ratkiewicz, MatthewFrancisco, Bruno Goncalves, Filippo Menczer, andAlessandro Flammini. Political polarization on twitter.International AAAI Conference on Weblogs and SocialMedia, 2011.

    [12] Nicholas Diakopoulos, Mor Naaman, and FundaKivran-Swaine. Diamonds in the rough: Social mediavisual analytics for journalistic inquiry. In VisualAnalytics Science and Technology (VAST), 2010 IEEESymposium on, pages 115122. IEEE, 2010.

    [13] Judith Donath. A semantic approach to visualizing

    online conversations.Communications of the ACM,45(4):4549, 2002.

    [14] Judith Donath, Alex Dragulescu, Aaron Zinman,Fernanda Viegas, and Rebecca Xiong. Data portraits.In ACM SIGGRAPH 2010 Art Gallery, pages 375383.ACM, 2010.

    [15] Marian Dork, Daniel Gruen, Carey Williamson, andSheelagh Carpendale. A visual backchannel forlarge-scale events. IEEE Transactions on Visualizationand Computer Graphics, pages 11291138, 2010.

    [16] Alex Dragulescu. Lexigraphs: Twitter Data Portrait:

    jakedfw.http://vimeo.com/2404119, 2009. [Online;accessed 23-August-2013].

    [17] J. Fabrega, P. Paredes, and G. Vega. Aborto enTwitter: un debate a medias.http://www.elmostrador.cl/opinion/2012/04/04/

    aborto-en-twitter-un-debate-a-medias/ ,2012. [InSpanish; Online; accessed 16-September-2013. Titletranslation: Abortion in Twitter: a half-way debate].

    [18] Siamak Faridani, Ephrat Bitton, Kimiko Ryokai, andKen Goldberg. Opinion space: a scalable tool forbrowsing online comments. In Proceedings of theSIGCHI Conference on Human Factors in ComputingSystems, pages 11751184. ACM, 2010.

    [19] Leon Festinger.A theory of cognitive dissonance,volume 2. Stanford university press, 1962.

    [20] Eduardo Graells-Garrido and Barbara Poblete.#santiago is not #chile, or is it? a model to normalizesocial media impact. arXiv preprint arXiv:1309.1785,2013.

    [21] Pankaj Gupta, Ashish Goel, Jimmy Lin, AneeshSharma, Dong Wang, and Reza Zadeh. Wtf: The whoto follow service at twitter. In Proceedings of the 22ndinternational conference on World Wide Web, pages

    505514. International World Wide Web ConferencesSteering Committee, 2013.

    [22] Jeffrey Heer and danah boyd. Vizster: Visualizingonline social networks. In IEEE InformationVisualization (InfoVis), pages 3239, 2005.

    [23] Q Vera Liao and Wai-Tat Fu. Beyond the filter bubble:interactive effects of perceived threat and topicinvolvement on selective exposure to information. InProceedings of the SIGCHI Conference on HumanFactors in Computing Systems, pages 23592368. ACM,2013.

    [24] Adam Marcus, Michael S Bernstein, Osama Badar,David R Karger, Samuel Madden, and Robert C Miller.Twitinfo: aggregating and visualizing microblogs for

    event exploration. In Proceedings of the SIGCHIConference on Human Factors in Computing Systems,pages 227236. ACM, 2011.

    [25] Matthew Michelson and Sofus A Macskassy.Discovering users topics of interest on twitter: a firstlook. InProceedings of the fourth workshop onAnalytics for noisy unstructured text data, pages 7380.ACM, 2010.

    [26] Alan Mislove, Sune Lehmann, Yong-Yeol Ahn,Jukka-Pekka Onnela, and J Niels Rosenquist.Understanding the demographics of twitter users. InICWSM, 2011.

    [27] Sean A Munson, Stephanie Y Lee, and Paul Resnick.Encouraging reading of diverse political viewpointswith a browser widget. ICWSM, 2013.

    [28] Sean A Munson and Paul Resnick. Presenting diversepolitical opinions: how and how much. In Proceedingsof the SIGCHI conference on human factors incomputing systems, pages 14571466. ACM, 2010.

    [29] Mark EJ Newman. A measure of betweenness centralitybased on random walks.Social networks, 27(1):3954,2005.

    [30] Eli Pariser.The filter bubble: What the Internet ishiding from you. Penguin UK, 2011.

    [31] Souneil Park, Seungwoo Kang, Sangyoung Chung, and

    http://vimeo.com/2404119http://vimeo.com/2404119http://www.elmostrador.cl/opinion/2012/04/04/aborto-en-twitter-un-debate-a-medias/http://www.elmostrador.cl/opinion/2012/04/04/aborto-en-twitter-un-debate-a-medias/http://www.elmostrador.cl/opinion/2012/04/04/aborto-en-twitter-un-debate-a-medias/http://www.elmostrador.cl/opinion/2012/04/04/aborto-en-twitter-un-debate-a-medias/http://www.elmostrador.cl/opinion/2012/04/04/aborto-en-twitter-un-debate-a-medias/http://vimeo.com/2404119
  • 7/14/2019 Filter Bubble

    12/12

    Junehwa Song. Newscube: delivering multiple aspectsof news to mitigate media bias. In Proceedings of the27th international conference on Human factors incomputing systems, pages 443452. ACM, 2009.

    [32] Marco Pennacchiotti and Ana-Maria Popescu.Democrats, republicans and starbucks afficionados:user classification in twitter. In Proceedings of the 17thACM SIGKDD international conference on Knowledgediscovery and data mining, pages 430438. ACM, 2011.

    [33] Zachary Pousman, John T Stasko, and Michael Mateas.Casual information visualization: Depictions of data ineveryday life.Visualization and Computer Graphics,IEEE Transactions on, 13(6):11451152, 2007.

    [34] Daniele Quercia, Michal Kosinski, David Stillwell, andJon Crowcroft. Our twitter profiles, our selves:Predicting personality with twitter. In Privacy,security, risk and trust (passat), 2011 ieee thirdinternational conference on and 2011 ieee thirdinternational conference on social computing(socialcom), pages 180185. IEEE, 2011.

    [35] Daniel Ramage, Susan Dumais, and Dan Liebling.Characterizing microblogs with topic models. InInternational AAAI Conference on Weblogs and Social

    Media, volume 5, pages 130137, 2010.[36] Fernanda Viegas, Martin Wattenberg, Jack Hebert,

    Geoffrey Borggaard, Alison Cichowlas, JonathanFeinberg, Jon Orwant, and Christopher Wren. Google+ripples: a native visualization of information flow. InProceedings of the 22nd international conference onWorld Wide Web, pages 13891398. InternationalWorld Wide Web Conferences Steering Committee,2013.

    [37] Fernanda B Viegas, Scott Golder, and Judith Donath.Visualizing email content: portraying relationshipsfrom conversational histories. In Proceedings of theSIGCHI conference on Human Factors in computingsystems, pages 979988. ACM, 2006.

    [38] Fernanda B Viegas and Martin Wattenberg. Tag cloudsand the case for vernacular visualization. interactions,15(4):4952, 2008.

    [39] Fernanda B Viegas, Martin Wattenberg, and JonathanFeinberg. Participatory visualization with wordle.Visualization and Computer Graphics, IEEETransactions on, 15(6):11371144, 2009.

    [40] Helmut Vogel. A better way to construct the sunflowerhead. Mathematical biosciences, 44(3):179189, 1979.

    [41] Wikipedia. Abortion in Chile. https://en.wikipedia.org/wiki/Abortion_in_Chile, 2013.[Online; accessed 16-September-2013].

    [42] Rebecca Xiong and Judith Donath. Peoplegarden:creating data portraits for users. InProceedings of the12th annual ACM symposium on User interfacesoftware and technology, pages 3744. ACM, 1999.

    [43] Sarita Yardi and Danah Boyd. Dynamic debates: Ananalysis of group polarization over time on twitter.Bulletin of Science, Technology & Society,30(5):316327, 2010.

    [44] Elad Yom-Tov, Luis Fernandez-Luque, Ingmar Weber,and Steven P Crain. Pro-anorexia and pro-recoveryphoto sharing: a tale of two warring tribes. Journal ofmedical Internet research, 14(6), 2012.

    https://en.wikipedia.org/wiki/Abortion_in_Chilehttps://en.wikipedia.org/wiki/Abortion_in_Chilehttps://en.wikipedia.org/wiki/Abortion_in_Chilehttps://en.wikipedia.org/wiki/Abortion_in_Chile