
A Recommender System for Predicting User Engagement in Twitter


Jonathas Magalhães², Rubens Pessoa, Cleyton Souza, Evandro Costa, Joseana Fechine

FEDERAL UNIVERSITY OF CAMPINA GRANDE

FEDERAL UNIVERSITY OF ALAGOAS

Intelligent, Personalized and Social Technologies Group¹

RECSYS CHALLENGE 2014

¹ More information at http://www.grouptips.org.
² Corresponding author, e-mail: [email protected].

INTRODUCTION

The 2014 RecSys Challenge [1] consists of ordering the tweets that users shared from IMDb according to the amount of interaction they received. The interaction (engagement) of a tweet is defined as the sum of the number of retweets and favorites it received. Our objective is to present our approach as a contestant in the 2014 RecSys Challenge.
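The sketch below makes the challenge task concrete. It computes the engagement of a tweet as defined in the introduction (retweets plus favorites) and orders each user's tweets by it. This is a minimal Python illustration, not the authors' code. The `Tweet` record and its field names are hypothetical, and ranking each user's tweets separately is an assumption of this sketch.

```python
from collections import defaultdict
from typing import NamedTuple


class Tweet(NamedTuple):
    """Hypothetical minimal tweet record; field names are illustrative only."""

    user_id: int
    tweet_id: int
    retweets: int
    favorites: int


def engagement(tweet: Tweet) -> int:
    """Engagement as defined for the challenge: retweets plus favorites."""
    return tweet.retweets + tweet.favorites


def rank_per_user(tweets: list[Tweet]) -> dict[int, list[int]]:
    """Return each user's tweet ids sorted by descending engagement.

    Grouping by user is an assumption of this sketch. Python's sort is
    stable, so tweets with equal engagement keep their input order.
    """
    by_user: dict[int, list[Tweet]] = defaultdict(list)
    for tweet in tweets:
        by_user[tweet.user_id].append(tweet)
    return {
        user: [t.tweet_id for t in sorted(user_tweets, key=engagement, reverse=True)]
        for user, user_tweets in by_user.items()
    }


tweets = [Tweet(1, 10, 2, 1), Tweet(1, 11, 0, 0), Tweet(1, 12, 5, 0), Tweet(2, 20, 0, 1)]
print(rank_per_user(tweets))  # {1: [12, 10, 11], 2: [20]}
```

In the toy data, user 1's tweets have engagement 3, 0 and 5, so they are ordered as 12, 10, 11.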

REFERENCES

[1] A. Said, S. Dooms, B. Loni, and D. Tikk. Recommender systems challenge 2014. In Proceedings of the Eighth ACM Conference on Recommender Systems, RecSys ’14, New York, NY, USA, 2014. ACM.

[2] S. Dooms, T. De Pessemier, and L. Martens. MovieTweetings: a movie rating dataset collected from Twitter. In Workshop on Crowdsourcing and Human Computation for Recommender Systems, CrowdRec at RecSys 2013, 2013.

COMPOSING AND PRE-PROCESSING THE DATASET

We use two datasets:

● The expanded MovieTweetings dataset [2], distributed by the organizers of the challenge, with the following attributes: movie id, movie rating, crawled time, tweet time, followers count, statuses count, favourites count and engagement.

● An IMDb dataset with additional information about the movies referenced by the tweets, which complements the MovieTweetings dataset with the following attributes: IMDb rating, IMDb votes count and movie year.
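Combining the two datasets can be pictured as a left join on the movie identifier: each tweet row receives the IMDb attributes of the movie it references. The Python sketch below assumes rows are plain dictionaries with hypothetical field names (`movie_id`, `imdb_rating`, `imdb_votes`, `movie_year`). The poster does not describe its pre-processing, so leaving unmatched movies as `None` is also an assumption.

```python
IMDB_FIELDS = ("imdb_rating", "imdb_votes", "movie_year")  # hypothetical field names


def compose_features(tweet_rows: list[dict], imdb_by_movie: dict[str, dict]) -> list[dict]:
    """Left-join each tweet row with the IMDb attributes of the movie it references."""
    composed = []
    for row in tweet_rows:
        movie = imdb_by_movie.get(row["movie_id"], {})  # {} when the movie is unknown
        merged = dict(row)  # copy, so the caller's rows are not mutated
        for field in IMDB_FIELDS:
            merged[field] = movie.get(field)  # None if the attribute is missing
        composed.append(merged)
    return composed


# Toy data for illustration only.
tweet_rows = [
    {"movie_id": "m1", "movie_rating": 8, "followers_count": 120},
    {"movie_id": "m2", "movie_rating": 5, "followers_count": 3},
]
imdb_by_movie = {"m1": {"imdb_rating": 7.5, "imdb_votes": 1200, "movie_year": 2013}}
rows = compose_features(tweet_rows, imdb_by_movie)
print(rows[0]["imdb_rating"], rows[1]["imdb_rating"])  # 7.5 None
```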

REGRESSION STEP

In this work we use three different regressors: Linear Regression, Pace Regression and the model tree induction algorithm M5Base, which follows Quinlan's M5 approach of extending decision-tree induction to the regression task.

Table 2: Regression models and their parameters.

Besides the models presented in Table 2, we implemented three methods to combine their predictions: Average, Median and Ranking.
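The three combination methods can be sketched as functions over one prediction vector per model. Average and Median follow directly from their names. The poster does not define the Ranking method, so `combine_ranking` below is an assumption: a Borda-style mean-rank aggregation that may differ from the authors' version. This is plain Python, not the Weka API the authors used.

```python
from statistics import mean, median


def combine_average(predictions: list[list[float]]) -> list[float]:
    """Per-instance mean of the models' predicted engagement."""
    return [mean(values) for values in zip(*predictions)]


def combine_median(predictions: list[list[float]]) -> list[float]:
    """Per-instance median of the models' predicted engagement."""
    return [median(values) for values in zip(*predictions)]


def combine_ranking(predictions: list[list[float]]) -> list[float]:
    """ASSUMED rank-based combiner (Borda-style).

    Each model ranks the instances (rank 1 = highest prediction); the ranks are
    averaged and negated, so a larger score still means "place higher".
    """
    n = len(predictions[0])
    rank_sums = [0.0] * n
    for model_preds in predictions:
        order = sorted(range(n), key=lambda i: model_preds[i], reverse=True)
        for rank, i in enumerate(order, start=1):
            rank_sums[i] += rank
    return [-s / len(predictions) for s in rank_sums]


# One row per model, one column per tweet (toy values).
preds = [[1.0, 4.0, 2.0], [3.0, 0.0, 2.0], [2.0, 5.0, 8.0]]
print(combine_average(preds))  # [2.0, 3.0, 4.0]
print(combine_median(preds))   # [2.0, 4.0, 2.0]
```

Average and Median operate on the predicted values themselves. A rank-based combiner ignores their scale and only uses each model's ordering, which can help when the regressors' outputs are calibrated differently.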

OVERVIEW OF THE RECOMMENDER SYSTEM

Our approach is divided into three steps:

● Classification;
● Regression; and
● Ordering results.

In the classification and regression steps, we use the Weka API to train the models.

Figure 1: Overview of the Recommender System.
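The three steps can be chained as in the Python sketch below, where the trained models are passed in as plain functions. The poster does not say what the classifier predicts. As a hypothetical reading, this sketch assumes the classifier decides whether a tweet receives any engagement at all. Only tweets classified as engaging are then scored by the regressor, and each user's tweets are sorted by the resulting score.

```python
from collections import defaultdict
from typing import Callable, Sequence

Features = Sequence[float]


def recommend(
    tweets: list[tuple[int, int, Features]],
    classify: Callable[[Features], int],
    regress: Callable[[Features], float],
) -> dict[int, list[int]]:
    """Chain classification, regression and ordering over (user_id, tweet_id, features) rows.

    Hypothetical contract: classify(x) returns 1 if the tweet is expected to get
    any engagement and 0 otherwise; regress(x) returns the predicted amount.
    """
    scored: dict[int, list[tuple[float, int]]] = defaultdict(list)
    for user_id, tweet_id, x in tweets:
        # Step 1 (classification) gates step 2 (regression).
        score = regress(x) if classify(x) == 1 else 0.0
        scored[user_id].append((score, tweet_id))
    # Step 3 (ordering): sort each user's tweets by descending score.
    return {
        user: [tweet_id for _, tweet_id in sorted(items, key=lambda p: p[0], reverse=True)]
        for user, items in scored.items()
    }


toy_rows = [(1, 10, [0.0]), (1, 11, [3.0]), (1, 12, [1.0]), (2, 20, [0.5])]
print(recommend(toy_rows, classify=lambda x: int(x[0] > 0), regress=lambda x: 2.0 * x[0]))
# {1: [11, 12, 10], 2: [20]}
```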

CLASSIFICATION STEP

We use three classifiers: Naïve Bayes, Support Vector Machines (SVM) and the nearest-neighbor algorithm IBk.

Table 1: Classification models and their parameters.

We also implement a classifier that combines them by voting: an instance is assigned to a given class if that class obtains the required majority of votes among the models above.
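A minimal sketch of the voting rule follows, written in plain Python rather than with Weka. `required` defaults to a strict majority of the models. Returning `None` when no class reaches it is an assumption, since the poster does not say how such cases are resolved.

```python
from collections import Counter
from collections.abc import Hashable, Sequence
from typing import Optional


def majority_vote(labels: Sequence[Hashable], required: Optional[int] = None) -> Optional[Hashable]:
    """Return the class predicted by at least `required` models, or None.

    `labels` holds one predicted class per model; by default `required` is a
    strict majority, i.e. more than half of the models.
    """
    if required is None:
        required = len(labels) // 2 + 1
    label, count = Counter(labels).most_common(1)[0]
    return label if count >= required else None


print(majority_vote(["yes", "no", "yes"]))  # yes
```

With three classifiers and two classes, a strict majority always exists. The `None` case only arises with more classes or an even number of models.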

METHODOLOGY

Table 3 summarizes the experimental factors and the levels used for each one. Combining every level of every factor gives an experimental design with 2 × 7 × 9 = 126 treatments, without replication. We use normalized Discounted Cumulative Gain (nDCG) as the metric to compare the methods.

Table 3: Experimental factors and their levels.
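The nDCG metric can be sketched as follows. It uses the common logarithmic discount, DCG@k = sum over positions i = 1..k of g_i / log2(i + 1), normalized by the DCG of the ideal ordering. Two choices in this Python sketch are assumptions: a tweet's true engagement serves as its gain g_i, and one ranked list is scored at a time. The challenge organizers fix the exact gain function and how per-user scores are aggregated.

```python
import math
from typing import Sequence


def dcg_at_k(gains: Sequence[float], k: int) -> float:
    """DCG@k with a log2 position discount: sum of g_i / log2(i + 1), i starting at 1."""
    return sum(g / math.log2(i + 1) for i, g in enumerate(gains[:k], start=1))


def ndcg_at_k(ranked_gains: Sequence[float], k: int = 10) -> float:
    """nDCG@k: DCG of the predicted order divided by the DCG of the ideal order.

    `ranked_gains` holds the true engagement of each tweet, listed in the order
    the system ranked them. Returns 0.0 when no tweet has positive gain.
    """
    ideal = dcg_at_k(sorted(ranked_gains, reverse=True), k)
    return dcg_at_k(ranked_gains, k) / ideal if ideal > 0 else 0.0


print(ndcg_at_k([3, 2, 1]))           # 1.0
print(round(ndcg_at_k([1, 3]), 3))    # 0.797
```

A perfectly ordered list scores 1.0. Placing the more engaging tweet second lowers the score, because its gain is divided by log2(3) instead of log2(2).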

RESULTS

Table 4 presents the nDCG@10 of the ten best configurations of our approach.

Table 4: The nDCG@10 of the 10 best configurations.