5
Provided by the author(s) and University College Dublin Library in accordance with publisher policies. Please cite the published version when available. Title Using Twitter to recommend real-time topical news Authors(s) Phelan, Owen; McCarthy, Kevin; Smyth, Barry Publication date 2009-10 Publication information Proceedings of the third ACM conference on Recommender systems Conference details Paper presented at the 3rd ACM Conference on Recommender Systems (RecSys 2009), New York City, NY, USA, 22-25 October 2009 Publisher ACM Item record/more information http://hdl.handle.net/10197/1893 Publisher's version (DOI) 10.1145/1639714.1639794 Downloaded 2020-07-22T13:46:16Z The UCD community has made this article openly available. Please share how this access benefits you. Your story matters! (@ucd_oa) Some rights reserved. For more information, please see the item record link above.

Provided by the author(s) and University College Dublin ... · Content-based recommendation, News recommendation, Real-time recommendation, Social recommendation, Twitter 1. INTRODUCTION

  • Upload
    others

  • View
    4

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Provided by the author(s) and University College Dublin ... · Content-based recommendation, News recommendation, Real-time recommendation, Social recommendation, Twitter 1. INTRODUCTION

Provided by the author(s) and University College Dublin Library in accordance with publisher

policies. Please cite the published version when available.

Title Using Twitter to recommend real-time topical news

Authors(s) Phelan, Owen; McCarthy, Kevin; Smyth, Barry

Publication date 2009-10

Publication information Proceedings of the third ACM conference on Recommender systems

Conference details Paper presented at the 3rd ACM Conference on Recommender Systems (RecSys 2009),

New York City, NY, USA, 22-25 October 2009

Publisher ACM

Item record/more information http://hdl.handle.net/10197/1893

Publisher's version (DOI) 10.1145/1639714.1639794

Downloaded 2020-07-22T13:46:16Z

The UCD community has made this article openly available. Please share how this access

benefits you. Your story matters! (@ucd_oa)

Some rights reserved. For more information, please see the item record link above.

Page 2: Provided by the author(s) and University College Dublin ... · Content-based recommendation, News recommendation, Real-time recommendation, Social recommendation, Twitter 1. INTRODUCTION

Using Twitter to Recommend Real-Time Topical News !

Owen Phelan, Kevin McCarthy & Barry SmythCLARITY: Centre for Sensor Web Technologies

School of Computer Science and InformaticsUniversity College Dublin Ireland

[firstname.lastname]@ucd.ie

ABSTRACTRecommending news stories to users, based on their pref-erences, has long been a favourite domain for recommendersystems research. In this paper, we describe a novel ap-proach to news recommendation that harnesses real-timemicro-blogging activity, from a service such as Twitter, asthe basis for promoting news stories from a user’s favouriteRSS feeds. A preliminary evaluation is carried out on animplementation of this technique that shows promising re-sults.

Categories and Subject DescriptorsH.4 [Information Systems Applications]: Miscellaneous

General TermsAlgorithms, Experimentation, Theory

KeywordsContent-based recommendation, News recommendation, Real-time recommendation, Social recommendation, Twitter

1. INTRODUCTIONThere is a long history of using recommender systems to

help users to navigate through the sea of stories that arewritten and published everyday [1, 5]. Recommender sys-tems promise to promote the most relevant stories to a userbased on their learned or stated preferences or their previousnews consumption histories, helping the user in question tokeep up-to-date and to save valuable time sifting throughless relevant stories. Content-based and collaborative filter-ing techniques have been used to good e!ect and the recentgrowth of services such as Digg (www.digg.com) demonstrate

!This material is based on works supported by Science Foun-dation Ireland under Grant No. 07/CE/11147 CLARITYCSET

Permission to make digital or hard copies of all or part of this work forpersonal or classroom use is granted without fee provided that copies arenot made or distributed for profit or commercial advantage and that copiesbear this notice and the full citation on the first page. To copy otherwise, torepublish, to post on servers or to redistribute to lists, requires prior specificpermission and/or a fee.RecSys’09, October 23–25, 2009, New York, New York, USA.Copyright 2009 ACM 978-1-60558-435-5/09/10 ...$10.00.

the value of recommendation techniques when it comes todeliver a more relevant and compelling news service.

For all the success of recommender systems there are someaspects of news recommendation that are not well cateredfor. For example, in this short paper we will consider theproblem of identifying niche topical news stories. Currentrecommender systems are limited in their ability to identifysuch stories because, typically, they rely on a critical mass ofuser consumption before such stories can be recognised. Inthis paper, we consider a novel alternative to conventionalrecommendation approaches by harnessing a popular micro-blogging service such as Twitter (www.twitter.com) [2, 4]as a source of current and topical news. To this end wedescribe our initial attempts to mine Twitter informationwith a view to identifying emerging topics of interest, whichcan be matched against recent news coverage in an RSS feed,as the basis for story recommendation.

In the next section we describe our recommender system,Buzzer, focusing on the system architecture, and highlight-ing how real-time information is mined from Twitter andRSS feeds to provide a basis for matching and recommen-dation. Each user specifies a set of RSS feeds that theyare interested in, as well as a recommendation strategy theywish to employ (this will be explained further below). InSection 3 we describe the results of a small, preliminaryuser evaluation to compare how di!erent users respond todi!erent types of RSS recommendations based on a numberof di!erent recommendation strategies.

2. RECOMMENDING TOPICAL NEWSRSS (Really Simple Syndication) and Twitter are two im-

portant Web 2.0 technologies. The former is a data formatthat is designed to provide access to frequently updated con-tent. Most commonly, RSS is used as a way to syndicateor distribute news information in the form of short-updatesthat can be linked back to complete stories. RSS Readersthen allow users to aggregate the updates from many dif-ferent feeds to provide a one-stop-shop to breaking news,although as users subscribe to tens of RSS feeds this intro-duces a niche information overload problem [8].

Twitter in contrast is a so-called micro-blogging service. Itallows users to submit their own short (max 140 characters)status update messages, called tweets, while following thestatus updates of others. Recently there has been muchinterest in Twitter, partly because of its growth [2], andpartly because of its ability to provide access to thoughts,intentions and activities of millions of users in real-time. Thestarting point for this paper is the idea that mining tweets

UCD Library
Text Box
© ACM, 2009. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in Proceedings of the third ACM conference on Recommender systems (2009) http://doi.acm.org/10.1145/1639714.1639794
Page 3: Provided by the author(s) and University College Dublin ... · Content-based recommendation, News recommendation, Real-time recommendation, Social recommendation, Twitter 1. INTRODUCTION

can provide access to emerging topics and breaking eventsand that this information can be used as the basis for a novelapproach to ranking RSS news feeds so that topical articlescan be e!ectively promoted.

!""

#$%&&'(

)*+','-.,/'0'1

!'+233',/4&%2,-5,6%,'

7*88'(-!'1*9&1

:';

<(&%+9'1-=!> #$''&1-=#>

)%1&1-2?-1+2('/-4(&%+9'1-=<>@*'(%'1-=A>

!'B(4,C'/-<(&%+9'1-=!'+)%1&>

D1'(

E('?'(',+'1

Figure 1: Buzzer back-end architecture

2.1 Recommendation Approach & Architec-ture

Buzzer adopts a content-based recommendation technique,by mining content terms from RSS and Twitter feeds as thebasis for article ranking. Content-based approaches to rec-ommending news articles have proven successful in the past.For example, perhaps the earliest example of a news recom-mendation service, Krakatoa Chronicle [3] represented userprofiles as a weighted vector of terms drawn from the ar-ticles that a given user liked, and matched this weightedvector against a new set of articles to produce a ranked listfor presentation to the user. Similarly, Billsus and Pazzani’sNews-Dude [1] harnessed content based representations andmulti-strategy learning techniques to generate short-termand long-term user profiles, as the basis for news recom-mendation. Billsus and Pazzani [6] argue that content-basedapproaches to finding trends and topics in news articlesare di"cult because of the sheer random bag-of-words un-structured nature of articles, and the complexity of natural-language processing. We bypass this consideration becauseour technique finds common co-occuring terms among Twit-ter and RSS. The basic system architecture of Buzzer isshown in Figure 1.

The Buzzer system comprises three basic components:

1. The web-based Configuration Interface manages thebasic user registration process and allows users to pro-vide their Twitter account information and a list of

RSS feeds that they wish to follow; in fact providingTwitter account information is optional since, as dis-cussed later, Buzzer can use Twitter’s public timelineas an alternative source of tweets, as opposed to tweetsonly from friends on Twitter.

2. The Lucene Indexer is responsible for mining and in-dexing the appropriate Twitter and RSS information,given the user’s configuration settings.

3. The Recommendation Engine generates a ranked listof RSS stories based on the co-occurence of popularterms within the user’s RSS and Twitter indexes.

The process by which Buzzer generates a set of rankedRSS stories is presented in detail by the algorithm in Figure2. Given a user, u, and a set of RSS feeds, r, the systemfirst extracts the latest RSS articles, R, and Twitter tweets,T and separately indexes each article and tweet to producetwo Lucene indexes. The resulting index terms are thenextracted from these RSS and Twitter indexes as the basisto produce RSS and Twitter term vectors, MR and MT ,respectively.

Next, we identify the set of terms, t, that co-occur in MT

and MR; these are the words that are present in the latesttweets and the most recent RSS stories and they provide thebasis for our recommendation technique. Each term, ti, isused as a query against the RSS index to retrieve the setof articles A that contain ti along with their associated TF-IDF score [7] . Thus each co-occuring ti is associated witha set of articles A1, ...An, which contain ti, and the TF-IDFscore for ti in each of A1, ...An to produce a matrix as shownin Figure 3.

To calculate an overall score for each article we simplycompute the sum of the TF-IDF scores across all of theterms associated with that article as per Equation 1. Inthis way, articles which contain many tweet terms with highTF-IDF scores are preferred to articles that contain fewertweet terms with lower TF-IDF scores. Obviously this isa simple scoring mechanism but it does serve to provide astraightforward and justifiable starting point.

Score(Ai) =X

!ti

element(Ai, ti) (1)

Finally, producing the recommendation is a simple matterof selecting the top K articles with the highest scores.

2.2 Example Session & User InterfaceIn this section, we will describe a typical usage scenario.

The user logs into Buzzer using their Twitter login details(this is used for the Twitter API). The user then sets uptheir preferences for RSS feeds and recommendation strat-egy. The system then collects the latest RSS and Twitterdata and makes a recommended Buzzer feed. The user ispresented with a set of articles ranked by their relevancescore. The user can select from a range of three recommen-dation strategies, as follows:

1. Public-Rank - this strategy used the basic techniquedescribe above but mined tweets from the public time-line (that is, the most recent public tweets across theentire Twitter user-base).

2. Friends-Rank - this strategy mined tweets from theuser’s Twitter friends.

Page 4: Provided by the author(s) and University College Dublin ... · Content-based recommendation, News recommendation, Real-time recommendation, Social recommendation, Twitter 1. INTRODUCTION

1. define BuzzerRecommender(u, r)

2. Loop (every x minutes or on refresh) Do

3. T ! getTweets(u)

4. R ! getRSSFeeds(r)

5. LT(u) ! indexTweets(T)

6. LR(u) ! indexFeeds(R)

7. MT ! getTweetTerms(L

T (u))

8. MR ! getRSSTerms(L

R (u))

9. C ! MR ! M

T

10.

For each ti in C Do

11.

A ! getArticles(ti,

Aj, L

R)

12. For each Aj in A Do

13. ArticleMatrix(ti,

Aj) ! TFIDF(t

i, Aj, L

R)

14. End

15. End

16. RecList ! TopK{"

Aj in ArticleMatrix} Score(A

j)

17. return RecList

18. End

19. End

u: user, r: rss addresses, T: tweets,

R: rss articles, LT: lucene tweet index,

LR: lucene rss index, M

T: tweet terms map,

MR: rss terms map,

C: co-occuring terms map,

ArticleMatrix: matrix of terms and articles

RecList: buzzer recommendation feed

Monday 17 August 2009

Figure 2: High-level algorithm

3. Content-Rank - this benchmark strategy did not useTwitter but instead ranked articles based on term fre-quency alone, by scoring articles according to the fre-quency of occurence of the top-100 RSS terms.

Figure 4 shows an example set of recommended articlesbased on the users selected recommendation strategy (inthis case, the user has selected the Friends-Rank approach,which uses their own friends’ Twitter content).

The resulting articles include a hyperlinked title, the maincontent of the RSS article (usually a condensed / sample ofthe full article), and a set of explanatory metadata from theBuzzer system, namely the recommendation score, and theassociated terms that were used to gather that particulararticle (See Figure 4). This explanation aids the users un-derstanding as to why the system chose to rank a certainarticle in a certain way.

The results page also shows a standard term/frequencytag cloud that includes terms ordered and sized based onthe frequency of each term. This is also useful in explainingto the user the term space that the results were derived from.For example, if the user has selected a twitter-based strategy,such as using the public feed, these terms are the co-occuringterms between the specified RSS feeds of that user, and theTwitter database. The frequency is determined based onthese co-occuring terms’ frequencies in the Twitter database.

3. PRELIMINARY EVALUATIONThe basic Buzzer system provides users with an alterna-

tive way to access RSS stories. They can use the Buzzerinterface as an RSS reader or, alternatively, the Buzzer rec-ommendation lists can be published as RSS feeds themselvesand thus incorporated, as a summary feed, into the user’snormal RSS reader.

!"

!#

!$

!%

&" &' &( &# &$ &%

")#

")'"")(((

")'%

')*($")(((")$*% ")%'( ")++( "),++

")$'( ")$'

")#

")',(")%#(

&-./0123

!2-45672-/23

")##

")'(

")#$%

!

")#'(

"),++

!'

!(

Figure 3: Buzzer’s co-occurence matrix: each cellcontains the Lucene TF-IDF score (from the RSSindex) of the given term in the given article.

Ultimately we are interested in how well the recommen-dations produced by this novel news recommendation tech-nique are received by end-users. To test this we have car-ried out a preliminary evaluation using a small group of 10participants, over a period of 5 days. Each participant con-figured the system by providing up to 10 of their favouriteRSS feeds along with their Twitter account information, andperiodically selected a di!erent recommendation strategy tore-rank the content.

During the study users were asked to use Buzzer as theirRSS reader. To begin with, they were asked explore the dif-ferent types of recommendation strategies at their leisure.As a basic evaluation measure we focused on the click-throughfrequency for articles across the 3 di!erent recommendationstrategies. The users were told the names of each of therecommendation strategies, and were given a simple expla-nation of each upon initial registration for the evaluation.

The results shown in Figure 5 (A) represent the aver-age per-user click-throughs for each of the recommendationstrategies and there is a clear di!erence in the behaviour ofusers when comparing the Twitter-based strategies to thedefault content-based technique. For example, we see that,on average, the Twitter-based strategies resulted in between8.3 and 10.4 click-throughs per user compared with only 5.8article click-throughs for the content-based strategy; a rela-tive click-through increase of between 30% and 45% for theTwitter-based strategies.

We also see that these usage results suggest a prefer-ence for the Friends-Rank recommendations compared tothe recommendations derived from Twitter’s Public Time-line (Public-Rank). This suggests that users are more likelyto tune in to the themes and topics of interest to theirfriends than those that might be of interest to the Twit-ter public at large. Interestingly, however, this is at oddswith the feedback provided by participants as part of a post-

Page 5: Provided by the author(s) and University College Dublin ... · Content-based recommendation, News recommendation, Real-time recommendation, Social recommendation, Twitter 1. INTRODUCTION

Figure 4: Screenshot of results (articles rerankedbased on users friends’ Tweets)

trial questionnaire, which indicated a strong preference forthe Public-Rank articles as shown in Figure 5 (B); 67% ofusers indicated a preference for Public-Rank recommenda-tions compared with 22% of users indicating a preference forFriends-Rank recommendations. Incidentally, none of theparticipants favoured the Content-rank strategy and 11%didn’t know which strategy they preferred.

Figure 5: A) Average per user click-through for dif-ferent recommendation strategies. B) Preferred rec-ommendation strategies

Interestingly when we compared the ratio of Public-Rankto Friends-Rank click-throughs to the number of friends theuser follows on Twitter we found a correlation coe"cientof "0.89, suggesting that users with more friends tend tobenefit more so from the Friends-Rank recommendations,compared to the recommendations derived from the publictimeline.

Although our initial user study was preliminary, the Buzzerrecommender system was well received and we found thatparticipants preferred the Twitter-based recommendationstrategies. The Buzzer feed provided the participants withinteresting and topical articles which were viewed in greaterdetail by clicking-through to the full article text.

4. CONCLUSIONSIn this short paper, we have outlined a novel news recom-

mendation technique that harnesses real-time Twitter dataas the basis for ranking and recommending articles from acollection of RSS feeds. A prototype system has been devel-oped and deployed and early evaluation results suggest thatusers do benefit from the recommendations that are derivedfrom the Twitter data.

Looking to the future, the Buzzer system provides con-siderable opportunity for further innovation and experimen-tation as a test-bed for real-time recommendation. We arecurrently extending the feedback options that are presentedto users to facilitate negative as well as positive feedback.There are also many ways in which the content-based recom-mendation technique may be improved. For example, mov-ing from single terms to more complex phrases, which mayimprove the recommendation ranking. We also hope to focuson perfecting the scoring technique. Moreover, the Buzzersystem has the potential to act as a collaborative news ser-vice with a number of opportunities to provide additionalrecommendation services such as recommending new RSSfeeds to users or recommending relevant people to follow onTwitter.

5. REFERENCES[1] Daniel Billsus and Michael J. Pazzani. A personal news

agent that talks, learns and explains. In In Proceedingsof the Third International Conference on AutonomousAgents, pages 268–275. ACM Press, 1999.

[2] Akshay Java, Xiaodan Song, Tim Finin, and BelleTseng. Why we twitter: understanding microbloggingusage and communities. In WebKDD/SNA-KDD ’07:Proceedings of the 9th WebKDD and 1st SNA-KDD2007 workshop on Web mining and social networkanalysis, pages 56–65, NY, USA, 2007. ACM.

[3] Tomonari Kamba, Krishna Bharat, and Michael C.Albers. The krakatoa chronicle - an interactive,personalized, newspaper on the web. In Proceedings ofthe Fourth International World Wide Web Conference,pages 159–170, 1995.

[4] Balachander Krishnamurthy, Phillipa Gill, and MartinArlitt. A few chirps about twitter. In WOSP ’08:Proceedings of the first workshop on Online socialnetworks, pages 19–24, NY, USA, 2008. ACM.

[5] Ken Lang. Newsweeder: learning to filter netnews. InProceedings of the 12th International Conference onMachine Learning, pages 331–339. Morgan Kaufmannpublishers Inc.: San Mateo, CA, USA, 1995.

[6] Michael Pazzani and Daniel Billsus. Content-basedrecommendation systems. The Adaptive Web, pages325–341, 2007.

[7] G. Salton and M. J. Mcgill. Introduction to ModernInformation Retrieval. McGraw-Hill, Inc., New York,NY, USA, 1986.

[8] B. Smyth and P. Cotter. A personalised tv listingsservice for the digital tv age. Knowledge-BasedSystems, 13(2-3):53 – 59, 2000.