Jonathas Magalhães², Rubens Pessoa, Cleyton Souza, Evandro Costa, Joseana Fechine
INTRODUCTION
The 2014 RecSys Challenge [1] consists of ordering tweets shared by users on IMDb according to the amount of interaction that they received. The interaction of a tweet is defined as the sum of the number of retweets and favorites that it received. Our objective is to present a contestant approach to the 2014 RecSys Challenge.
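The engagement definition given in the introduction (retweets plus favorites) and the resulting ordering can be illustrated with a minimal sketch. The `Tweet` class and its field names are hypothetical and are not taken from the challenge data format.

```python
from dataclasses import dataclass


@dataclass
class Tweet:
    # Hypothetical container; field names are assumptions for illustration.
    tweet_id: str
    retweet_count: int
    favorite_count: int


def engagement(tweet):
    """Interaction of a tweet: number of retweets plus number of favorites."""
    return tweet.retweet_count + tweet.favorite_count


def order_by_engagement(tweets):
    """Return the tweets sorted from most to least engagement."""
    return sorted(tweets, key=engagement, reverse=True)
```

In the challenge setting the true engagement of test tweets is unknown, so the ordering has to be produced from predicted values instead.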
¹ More information at http://www.grouptips.org.
² Corresponding author, e-mail: [email protected].
RECSYS CHALLENGE 2014
FEDERAL UNIVERSITY OF CAMPINA GRANDE
FEDERAL UNIVERSITY OF ALAGOAS
Intelligent, Personalized and Social Technologies Group¹
A RECOMMENDER SYSTEM FOR PREDICTING USER ENGAGEMENT IN TWITTER
REFERENCES
[1] A. Said, S. Dooms, B. Loni, and D. Tikk. Recommender systems challenge 2014. In Proceedings of the Eighth ACM Conference on Recommender Systems, RecSys '14, New York, NY, USA, 2014. ACM.
[2] S. Dooms, T. De Pessemier, and L. Martens. MovieTweetings: a movie rating dataset collected from Twitter. In Workshop on Crowdsourcing and Human Computation for Recommender Systems, CrowdRec at RecSys 2013, 2013.
COMPOSING AND PRE-PROCESSING THE DATASET
We use two datasets:
● The expanded MovieTweetings dataset [2], distributed by the organizers of the challenge, with the following attributes: movie id, movie rating, crawled time, tweet time, followers count, statuses count, favourites count and engagement.
● The IMDb dataset, which provides additional information about the movies referenced by the tweets in order to complement the MovieTweetings dataset, with the following attributes: IMDb rating, IMDb votes count and movie year.
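Complementing the tweet records with the IMDb attributes amounts to a join on the movie id. The sketch below is one possible way to do it, not the authors' pre-processing code; the dictionary keys (`movie_id`, `imdb_rating`, `imdb_votes`, `movie_year`) are hypothetical names chosen for illustration.

```python
# Hypothetical names for the three IMDb attributes listed above.
IMDB_FIELDS = ("imdb_rating", "imdb_votes", "movie_year")


def attach_imdb_features(tweets, movies):
    """Left-join tweet rows with IMDb attributes by movie id.

    tweets: list of dicts, each with a "movie_id" key (assumed name).
    movies: dict mapping movie id -> dict of IMDb attributes.
    Tweets whose movie is missing from `movies` get None for each IMDb field.
    """
    merged = []
    for row in tweets:
        extra = movies.get(row["movie_id"], {})
        combined = dict(row)
        for field in IMDB_FIELDS:
            combined[field] = extra.get(field)
        merged.append(combined)
    return merged
```

Keeping unmatched tweets with empty IMDb fields is a design choice made here for the sketch; how missing movies were handled is not stated in the source.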
REGRESSION STEP
In this work we use three regressors: Linear Regression, Pace Regression and the model tree induction algorithm M5Base, which extends Quinlan's algorithm to the regression task.
Table 2: Regression models and their parameters.
Besides the models presented in Table 2, we implemented three methods to combine them: Average, Median and Ranking.
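The three combination methods for the regressors can be sketched as follows. Average and Median are straightforward; the source does not define Ranking, so the version here (convert each model's predictions to ranks, then average the ranks) is an assumption and may differ from what was actually implemented. All function names are hypothetical.

```python
from statistics import mean, median


def combine_average(predictions):
    """predictions: one list of scores per model, aligned by instance."""
    return [mean(column) for column in zip(*predictions)]


def combine_median(predictions):
    return [median(column) for column in zip(*predictions)]


def to_ranks(scores):
    """Rank 1 goes to the highest score; ties are broken by position."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0] * len(scores)
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank
    return ranks


def combine_ranking(predictions):
    """Assumed interpretation: mean rank per instance (lower is better)."""
    rank_lists = [to_ranks(scores) for scores in predictions]
    return [mean(column) for column in zip(*rank_lists)]
```

Note that `combine_average` and `combine_median` return combined scores (higher is better), while `combine_ranking` returns mean ranks (lower is better), so the final sort direction differs.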
OVERVIEW OF THE RECOMMENDER SYSTEM
Our approach is divided into three steps:
● Classification;
● Regression; and
● Ordering results.
In the classification and regression steps we use the Weka API to train the models.
Figure 1: Overview of the Recommender System.
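The three steps (classification, regression, ordering) can be chained as in the sketch below. It does not use the Weka API; the two models are passed in as plain callables. It also assumes that the classifier decides whether a tweet receives any engagement at all and that the regressor is applied only to tweets classified as engaged. The source does not state what the classes are, so this role is an assumption, as are all names.

```python
def rank_tweets(tweets, has_engagement, predict_engagement):
    """Classify, regress, then order tweets by predicted engagement.

    has_engagement: callable tweet -> bool (hypothetical classifier).
    predict_engagement: callable tweet -> float (hypothetical regressor).
    """
    scored = []
    for tweet in tweets:
        # Step 1: classification. Step 2: regression on positives only.
        score = predict_engagement(tweet) if has_engagement(tweet) else 0.0
        scored.append((score, tweet))
    # Step 3: order results from highest to lowest predicted engagement.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [tweet for _, tweet in scored]
```

A toy usage, with thresholds standing in for trained models: `rank_tweets(tweets, lambda t: t["followers"] >= 100, lambda t: t["followers"] / 100)`.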
CLASSIFICATION STEP
We use three classifiers: Naïve Bayes, Support Vector Machines (SVM) and the nearest-neighbor algorithm IBk.
Table 1: Classification models and their parameters.
We also implement a classifier that combines them using voting. In other words, an instance is assigned to a given class if that class obtains the required majority of votes among the models presented.
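The voting rule for combining the classifiers can be illustrated with a small sketch. It assumes a simple majority by default (more than half of the models) and returns `None` when no class reaches the required number of votes; the exact threshold and tie handling used in the original system are not given in the source, and `majority_vote` is a hypothetical name.

```python
from collections import Counter


def majority_vote(labels, min_votes=None):
    """Return the class that reaches the required number of votes, else None.

    labels: one predicted class per model, for a single instance.
    min_votes: required votes; defaults to a simple majority (assumption).
    """
    if min_votes is None:
        min_votes = len(labels) // 2 + 1
    label, count = Counter(labels).most_common(1)[0]
    return label if count >= min_votes else None
```

With three models, two agreeing predictions are enough under the default threshold, for example `majority_vote(["pos", "neg", "pos"])`.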
METHODOLOGY
Table 3 summarizes the factors and the levels used for each one. Considering the factors and levels used, we have an experimental design with 2 * 7 * 9 = 126 treatments without replication. We use the normalized Discounted Cumulative Gain (nDCG) metric to compare the methods.
Table 3: Experimental factors and their levels.
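For reference, nDCG at a cutoff k can be computed as in the sketch below. It assumes the common formulation with linear gain (the true engagement value) and a log2 position discount; the challenge's official evaluator may use a different variant, so this is an illustration only. Function names are hypothetical.

```python
import math


def dcg_at_k(gains, k):
    """Discounted cumulative gain of the first k items (linear gain)."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))


def ndcg_at_k(gains_in_predicted_order, k):
    """nDCG@k: DCG of the predicted order divided by DCG of the ideal order.

    gains_in_predicted_order: true engagement values, listed in the order
    produced by the recommender.
    """
    ideal = dcg_at_k(sorted(gains_in_predicted_order, reverse=True), k)
    if ideal == 0:
        return 0.0  # no engagement at all: defined as 0 here by convention
    return dcg_at_k(gains_in_predicted_order, k) / ideal
```

A perfectly ordered list scores 1.0, and any misordering of items with different engagement values lowers the score.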
RESULTS
Table 4 presents the nDCG@10 results of the ten best configurations of our approach.
Table 4: The nDCG@10 of the 10 best configurations.