A Recommender System for Predicting User Engagement in Twitter

Jonathas Magalhães², Rubens Pessoa, Cleyton Souza, Evandro Costa, Joseana Fechine

The 2014 RecSys Challenge [1] consists of ordering tweets shared by users on IMDb according to the amount of interaction that they received. The interaction of a tweet is defined as the sum of the number of retweets and favorites that it received. Our objective is to present a contestant approach to the 2014 RecSys Challenge.
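The engagement definition above can be sketched in a few lines. This is an illustrative sketch only: the `Tweet` class and its field names are hypothetical and are not taken from the challenge data format.

```python
from dataclasses import dataclass


@dataclass
class Tweet:
    # Hypothetical record; field names are assumptions for illustration.
    tweet_id: str
    retweets: int
    favorites: int


def engagement(tweet: Tweet) -> int:
    """Interaction of a tweet: number of retweets plus number of favorites."""
    return tweet.retweets + tweet.favorites


def order_by_engagement(tweets):
    """Return the tweets sorted from most to least engagement."""
    return sorted(tweets, key=engagement, reverse=True)
```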

INTRODUCTION

¹ More information at http://www.grouptips.org.
² Corresponding author, e-mail: [email protected].

RECSYS CHALLENGE 2014

FEDERAL UNIVERSITY OF CAMPINA GRANDE

FEDERAL UNIVERSITY OF ALAGOAS

Intelligent, Personalized and Social Technologies Group¹

A RECOMMENDER SYSTEM FOR PREDICTING USER ENGAGEMENT IN TWITTER

[1] A. Said, S. Dooms, B. Loni, and D. Tikk. Recommender systems challenge 2014. In Proceedings of the Eighth ACM Conference on Recommender Systems, RecSys ’14, New York, NY, USA, 2014. ACM.

[2] S. Dooms, T. De Pessemier, and L. Martens. MovieTweetings: a movie rating dataset collected from Twitter. In Workshop on Crowdsourcing and Human Computation for Recommender Systems, CrowdRec at RecSys 2013, 2013.

REFERENCES

We use two datasets:

● The expanded MovieTweetings dataset [2], distributed by the organizers of the challenge, with the following attributes: movie id, movie rating, crawled time, tweet time, followers count, statuses count, favourites count and engagement.

● The IMDb dataset, which consists of additional information about the movies referenced by the tweets and complements the MovieTweetings dataset, with the following attributes: IMDb rating, IMDb votes count and movie year.
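One way to complement the tweet records with the IMDb attributes is a lookup keyed on the movie id. The sketch below assumes each tweet is a dictionary with a `movie_id` key and that the IMDb data is a dictionary indexed by that id; these names and the handling of missing movies are assumptions, not the procedure used in this work.

```python
def complement_with_imdb(tweets, imdb):
    """Attach IMDb attributes to each tweet record by movie id.

    tweets: list of dicts, each with a 'movie_id' key (hypothetical name).
    imdb:   dict mapping movie id -> dict of IMDb attributes.
    Tweets whose movie is missing from `imdb` are kept unchanged.
    """
    merged = []
    for row in tweets:
        extra = imdb.get(row["movie_id"], {})
        merged.append({**row, **extra})  # copy, so inputs are not mutated
    return merged
```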

COMPOSING AND PRE-PROCESSING THE DATASET

In this work we use three different regressors: Linear Regression, Pace Regression and the model tree induction algorithm M5Base, which is an extension of Quinlan’s algorithm to the regression task.

Table 2: Regression models and their parameters.

Besides the models presented in Table 2, we implemented three methods to combine them: Average, Median and Ranking.
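The three combination methods can be illustrated as follows. This is a sketch under assumptions: `preds` holds one list of predictions per model, and the Ranking method is interpreted here as averaging the rank each model assigns to an instance, which may differ from the actual implementation.

```python
from statistics import mean, median


def combine_average(preds):
    """Average the models' predictions for each instance."""
    return [mean(column) for column in zip(*preds)]


def combine_median(preds):
    """Take the median of the models' predictions for each instance."""
    return [median(column) for column in zip(*preds)]


def ranks(scores):
    """Rank instances by score; rank 1 is the highest score (ties by index)."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    result = [0] * len(scores)
    for position, index in enumerate(order):
        result[index] = position + 1
    return result


def combine_ranking(preds):
    """Mean rank per instance across models; a lower value ranks higher."""
    return [mean(column) for column in zip(*(ranks(p) for p in preds))]
```

Note that the Ranking combiner returns rank positions rather than engagement estimates, so the final ordering sorts by ascending value.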

REGRESSION STEP

Our approach is divided into three steps:

● Classification;

● Regression; and

● Ordering Results.

In the classification and regression steps we use the Weka API to train the models.
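The sketch below shows one possible way to chain the three steps. It rests on assumptions that are not stated here: that the classifier flags tweets expected to receive any engagement, that the regressor estimates the amount only for flagged tweets, and that unflagged tweets are scored zero. The `classify` and `regress` callables are hypothetical stand-ins for trained models.

```python
def rank_tweets(tweets, classify, regress):
    """Classification, regression, then ordering (illustrative only).

    classify(tweet) -> bool   (assumed: True if engagement is expected)
    regress(tweet)  -> float  (assumed: predicted amount of engagement)
    """
    scored = []
    for tweet in tweets:
        score = regress(tweet) if classify(tweet) else 0.0
        scored.append((score, tweet))
    # Ordering step: highest predicted engagement first (stable for ties).
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [tweet for _, tweet in scored]
```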

Figure 1: Overview of the Recommender System.

OVERVIEW OF THE RECOMMENDER SYSTEM

We use three classifiers: Naïve Bayes, Support Vector Machines (SVM) and the nearest-neighbor algorithm IBk.

Table 1: Classification models and their parameters.

We also implement a classifier that combines them using Voting. In other words, an instance is assigned to a given class if that class obtains the required majority of votes among the models presented.
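A minimal sketch of majority voting over the labels predicted by the individual classifiers follows. The default threshold (more than half of the votes) and the `None` result when no class reaches it are assumptions made for illustration.

```python
from collections import Counter


def majority_vote(votes, required=None):
    """Return the label with at least `required` votes, or None otherwise.

    votes:    list of labels, one per classifier.
    required: vote threshold; defaults to a strict majority (assumption).
    """
    if required is None:
        required = len(votes) // 2 + 1
    label, count = Counter(votes).most_common(1)[0]
    return label if count >= required else None
```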

CLASSIFICATION STEP

Table 3 summarizes the factors and the levels used for each one. Given these factors and levels, we have an experimental design with 2 * 7 * 9 = 126 treatments without replication. We use the normalized Discounted Cumulative Gain (nDCG) metric to compare the methods.
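For reference, nDCG@k can be computed as sketched below. Several DCG variants exist; this sketch assumes a linear gain with a log2 position discount, and the challenge's official evaluation may use a different formulation.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain of the first k items (linear gain)."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances_in_predicted_order, k):
    """DCG of the predicted order divided by the DCG of the ideal order."""
    ideal = dcg_at_k(sorted(relevances_in_predicted_order, reverse=True), k)
    if ideal == 0:
        return 0.0  # no relevant items: score defined as 0 here (assumption)
    return dcg_at_k(relevances_in_predicted_order, k) / ideal
```

Here the relevance of a tweet would be its true engagement, listed in the order produced by the recommender.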

Table 3: Experimental factors and their levels.

METHODOLOGY

Table 4 presents the nDCG@10 results of the ten best configurations of our approach.

Table 4: The nDCG@10 of the 10 best configurations.

RESULTS