Using Grids to support Information Filtering Systems

Presentation given at the ICEIS 2009 Conference.

Leandro N. Ciuffo (leandro.ciuffo@ct.infn.it)

INFN-Catania - Italy

Running Collaborative Filtering Recommendations on gLite middleware

Recommender system settings

http://canalcinefilia.com.br

Explicit data collection (users need to rate > 20 movies)

Movielens-like recommender system (http://movielens.umn.edu)

MySQL

www

No dedicated servers / standard technologies

Hosted by a commercial Web hosting service

Data set

225 out of 390 users are able to get movie recommendations

921 movies

33,147 ratings

225 users × 921 movies

~207,000 ‘cells’

Rating Matrix
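
As a rough illustration of the rating matrix, the sketch below stores explicit ratings sparsely (one dictionary per user) and checks the dense size quoted above. The sample users, movies, ratings, and the helper name `build_rating_matrix` are hypothetical; only the 225 × 921 dimensions come from the slides.

```python
from collections import defaultdict


def build_rating_matrix(ratings):
    """Group (user, movie, rating) triples into {user: {movie: rating}}."""
    matrix = defaultdict(dict)
    for user, movie, rating in ratings:
        matrix[user][movie] = rating
    return dict(matrix)


# Hypothetical sample data, not taken from the real data set.
sample = [("ana", "Yes Man", 4), ("ana", "Wall-E", 5), ("bob", "Yes Man", 2)]
matrix = build_rating_matrix(sample)

# Dense size of the matrix described on the slide: 225 users x 921 movies.
cells = 225 * 921
print(cells)  # 207225
```

Most cells are empty, since each user rates only some movies, which is why a per-user dictionary is a convenient representation here.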

Users’ similarity - Pearson r correlation

The complexity of maintaining a similarity matrix with the Pearson correlation between every pair of users is O(m²n)
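
The cost can be seen in a minimal sketch of the pairwise computation, assuming m is the number of users and n the number of movies (the slide does not define them). Each of the m(m-1)/2 user pairs requires a pass over up to n co-rated movies. The function names and the three sample users are hypothetical, and restricting the correlation to co-rated movies is one common choice, not necessarily the one used in the presented system.

```python
from itertools import combinations
from math import sqrt


def pearson(ratings_a, ratings_b):
    """Pearson r over the movies both users rated; None if undefined."""
    common = sorted(set(ratings_a) & set(ratings_b))
    n = len(common)
    if n < 2:
        return None
    a = [ratings_a[m] for m in common]
    b = [ratings_b[m] for m in common]
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return None  # a user with constant ratings has no defined correlation
    return cov / sqrt(var_a * var_b)


def similarity_matrix(matrix):
    """Compute r for every unordered pair of users: m(m-1)/2 pairs."""
    sims = {}
    for u, v in combinations(sorted(matrix), 2):
        r = pearson(matrix[u], matrix[v])
        if r is not None:
            sims[(u, v)] = r
    return sims


# Hypothetical ratings for three users.
ratings = {
    "u1": {"a": 1, "b": 2, "c": 3},
    "u2": {"a": 2, "b": 4, "c": 6},
    "u3": {"a": 3, "b": 2, "c": 1},
}
sims = similarity_matrix(ratings)

# Number of user pairs for the 225 users mentioned in the slides.
pairs_225 = 225 * 224 // 2
```

For 225 users this gives 25,200 pairs, each independent of the others, which is what makes the computation easy to distribute.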

[Figure: example similarity values between users: 1, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3]

Pearson correlation: -1 ≤ r ≤ 1

Neighborhood: similarity threshold r > 0.3

Will this user enjoy the movie “Yes Man”?

User’s neighborhood / Rating / Prediction generated

Weighted mean

Generating recommendations

[Figure: neighbor similarity values 0.9, 0.8, 0.6, 0.4, 0.3]

A movie must be rated by at least 8 neighbors

This repeats for every movie not rated
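
The prediction step described above can be sketched as a similarity-weighted mean of the neighbors' ratings. This is one plain reading of "weighted mean"; the slides do not say whether ratings are mean-centered first. The function name `predict`, its parameters, and the sample neighbors are hypothetical; the 0.3 threshold and the minimum of 8 neighbors come from the slides.

```python
def predict(similarities, neighbor_ratings, threshold=0.3, min_neighbors=8):
    """Predict one user's rating for one movie.

    similarities: {neighbor: r} for the target user.
    neighbor_ratings: {neighbor: rating} for the movie being predicted.
    Returns None when fewer than min_neighbors qualifying neighbors rated it.
    """
    used = [
        (r, neighbor_ratings[n])
        for n, r in similarities.items()
        if r > threshold and n in neighbor_ratings
    ]
    if len(used) < min_neighbors:
        return None
    total_weight = sum(r for r, _ in used)
    return sum(r * rating for r, rating in used) / total_weight


# Hypothetical neighborhood: eight neighbors, all with r = 0.5 and rating 4.
sims = {f"n{i}": 0.5 for i in range(8)}
ratings = {name: 4 for name in sims}
prediction = predict(sims, ratings)
```

In the full system this function would be called once per (user, unrated movie) pair, skipping movies that too few neighbors have rated.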

www.canalcinefilia.com.br - predictions

65,388 predictions

Grid Computing

Computationally intensive research

Computation-intensive applications / experiments

EGEE numbers:

>260 sites

54 countries

~114,000 CPUs

>20 PetaBytes

>16,000 users

>200 VOs

>150,000 jobs/day

EGEE Grid (www.eu-egee.org)

The global network coverage

EELA

OSG

TeraGrid

NAREGI

EUMedGrid

BalticGrid

SEE-Grid

EUIndiaGrid

EUAsiaGrid

EUChinaGrid

DEISA

EGEE

The global Grid coverage

Implementation on the gLite-based Grid

GILDA (https://gilda.ct.infn.it)

Grid INFN Laboratory for Dissemination Activities

Grid test-bed for training

A ‘standard’ t-Infrastructure adopted by many projects

Users can practice before running their code on the production e-Infrastructures

~ 11 sites - 285 CPUs *

(*) The number of sites may change over time, as they are managed on a “best effort” basis

EELA (www.eu-eela.eu)

E-science grid facility for Europe and Latin America

Co-funded by EC (FP7)

~ 5800 CPUs

[Diagram: job submission workflow. Components shown: www / MySQL, Grid UI, .JDL, LFC, SE; input sandbox (Recommender.class, Start.sh, mdclient.config); Grid UI, WMS, CE 1 ... CE n, WN, SE]
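
To make the submission step more concrete, here is a hedged sketch that generates one gLite JDL description per job. The three input sandbox file names come from the slide; everything else is an assumption for illustration: the idea of splitting users into contiguous ranges, the arguments passed to `Start.sh`, the `std.out` / `std.err` / `predictions_<id>.sql` file names, and the helper names `split_users` and `make_jdl`. The slides do not state how the work is actually partitioned.

```python
def split_users(n_users, n_jobs):
    """Return inclusive (first, last) user index ranges of near-equal size."""
    base, extra = divmod(n_users, n_jobs)
    ranges = []
    start = 0
    for i in range(n_jobs):
        size = base + (1 if i < extra else 0)
        if size == 0:
            continue  # more jobs than users: skip empty ranges
        ranges.append((start, start + size - 1))
        start += size
    return ranges


def make_jdl(job_id, first_user, last_user):
    """Build the text of a JDL file for one job (hypothetical arguments)."""
    output_file = f"predictions_{job_id}.sql"
    return "\n".join([
        'Executable = "Start.sh";',
        f'Arguments = "{first_user} {last_user}";',
        'StdOutput = "std.out";',
        'StdError = "std.err";',
        'InputSandbox = {"Start.sh", "Recommender.class", "mdclient.config"};',
        f'OutputSandbox = {{"std.out", "std.err", "{output_file}"}};',
    ])


# Example: split the 225 users across 4 jobs (the job count is an assumption).
ranges = split_users(225, 4)
jdls = [make_jdl(i, first, last) for i, (first, last) in enumerate(ranges)]
print(jdls[0])
```

Each generated text would be saved as a `.jdl` file and submitted from the Grid UI through the WMS, which dispatches the jobs to the available CEs.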

Output - Version I

[Diagram: CE 1 ... CE n and their WNs produce .SQL files; the .SQL files return through the output sandbox (WMS, Grid UI); MySQL / www]
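
In Version I the jobs appear to return their results as .SQL files that are later loaded into the website's MySQL database. A minimal sketch of how a worker could serialize its predictions follows; the table name, the column names, and the function `to_sql` are hypothetical, since the slides do not show the schema.

```python
def to_sql(predictions, table="predictions"):
    """Serialize (user_id, movie_id, score) triples as SQL INSERT statements.

    Values are coerced to int/float so that only numbers reach the SQL text.
    """
    lines = []
    for user_id, movie_id, score in predictions:
        lines.append(
            f"INSERT INTO {table} (user_id, movie_id, score) "
            f"VALUES ({int(user_id)}, {int(movie_id)}, {float(score):.2f});"
        )
    return "\n".join(lines) + "\n"


# Hypothetical predictions produced by one job.
sql_text = to_sql([(1, 42, 3.5), (1, 77, 4.25)])
print(sql_text)
```

The resulting text could be written to the file listed in the job's output sandbox and imported on the web server once the job finishes.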

Output - Version II

[Diagram: CE 1 ... CE n and their WNs, AMGA, MySQL, www]

Implementation on OurGrid

OurGrid (www.ourgrid.org)

Opportunistic Grid

Job submissions can be handled by a web portal

What’s next?

Future work

Run experiments using the Netflix prize database

Create a new version using Amazon EC2

Provide performance comparisons among EELA (gLite), OurGrid, and Amazon EC2

I’m looking for partners

The End

leandro.ciuffo@ct.infn.it

http://www.canalcinefilia.com.br

http://canalcinefilia.com.br/en/credits/about.php

http://applications.eu-eela.eu/application_details.php?ID=59
