Using Grids to support Information Filtering Systems Leandro N. Ciuffo [email protected] n.it INFN-Catania - Italy Running Collaborative Filtering Recommendations on gLite middleware

Using Grids to support Information Filtering Systems

Presentation given at the ICEIS 2009 Conference.

Page 1: Using Grids to support Information Filtering Systems

Using Grids to support Information Filtering Systems

Leandro N. Ciuffo

[email protected]

INFN-Catania - Italy

Running Collaborative Filtering Recommendations on gLite middleware

Page 2: Using Grids to support Information Filtering Systems

Recommender system settings

Page 3: Using Grids to support Information Filtering Systems

http://canalcinefilia.com.br

Explicit data collection (users need to rate > 20 movies)

Movielens-like recommender system (http://movielens.umn.edu)

Page 4: Using Grids to support Information Filtering Systems

MySQL

www

No dedicated servers / standard technologies

Hosted by a commercial Web Hosting service

Page 5: Using Grids to support Information Filtering Systems

Data set

225 out of 390 users are able to get movie recommendations

921 movies

33,147 ratings

Page 6: Using Grids to support Information Filtering Systems

Rating Matrix

225 users X 921 movies

~207,000 ‘cells’

Page 7: Using Grids to support Information Filtering Systems

Users’ similarity - Pearson r correlation

The complexity of maintaining a similarity matrix with the Pearson correlation between every pair of users is O(m²n), for m users and n movies
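
The cost quoted above can be seen in a minimal sketch, assuming ratings are held as a dictionary per user (movie id to rating). The function names and the choice to return 0.0 when r is undefined are illustrative assumptions, not taken from the presentation.

```python
from math import sqrt


def pearson(ratings_a, ratings_b):
    """Pearson r between two users over the movies both have rated.

    Each argument maps movie id -> rating. Returns 0.0 when r is undefined
    (fewer than two shared movies, or zero variance for either user);
    that fallback is an assumption of this sketch.
    """
    common = ratings_a.keys() & ratings_b.keys()
    n = len(common)
    if n < 2:
        return 0.0
    mean_a = sum(ratings_a[m] for m in common) / n
    mean_b = sum(ratings_b[m] for m in common) / n
    cov = sum((ratings_a[m] - mean_a) * (ratings_b[m] - mean_b) for m in common)
    var_a = sum((ratings_a[m] - mean_a) ** 2 for m in common)
    var_b = sum((ratings_b[m] - mean_b) ** 2 for m in common)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)


def similarity_matrix(ratings):
    """Pearson r for every pair of users.

    ratings maps user id -> {movie id -> rating}. There are m*(m-1)/2 pairs
    and each pair costs O(n) in the number of movies, hence O(m^2 n) overall.
    """
    users = sorted(ratings)
    sims = {}
    for i, u in enumerate(users):
        for v in users[i + 1:]:
            sims[(u, v)] = pearson(ratings[u], ratings[v])
    return sims
```

Because each pair of users is independent of the others, the outer loop is the natural place to split the work across several jobs.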

Page 8: Using Grids to support Information Filtering Systems

Pearson correlation: -1 ≤ r ≤ 1

[Figure: neighbors labeled with similarity values 1, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3]

Neighborhood: similarity threshold r > 0.3

Page 9: Using Grids to support Information Filtering Systems

Generating recommendations

Will this user enjoy the movie “Yes Man”?

User’s neighborhood: 0.9 0.8 0.6 0.4 0.3

Rating

Weighted mean

Prediction generated

A movie must be rated by at least 8 neighbors

This repeats for every movie not rated
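
The prediction step on this slide can be sketched as follows. This is a minimal illustration, assuming the "weighted mean" is a plain similarity-weighted average of the neighbors' ratings; the slides do not say whether ratings are mean-centered first. The function name and parameter names are hypothetical, while the defaults (8 neighbors, r > 0.3) come from the slides.

```python
def predict_rating(neighbor_ratings, min_neighbors=8, threshold=0.3):
    """Predict one user's rating for one movie.

    neighbor_ratings is a list of (similarity, rating) pairs, one per other
    user who rated the movie. Only users with similarity above the threshold
    count as neighbors. Returns None when fewer than min_neighbors qualify.
    """
    usable = [(s, r) for s, r in neighbor_ratings if s > threshold]
    if len(usable) < min_neighbors:
        return None
    total_weight = sum(s for s, _ in usable)
    return sum(s * r for s, r in usable) / total_weight
```

Calling this once for every movie the user has not rated yields that user's full set of predictions.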

Page 10: Using Grids to support Information Filtering Systems

www.canalcinefilia.com.br - predictions

65,388 predictions

Page 11: Using Grids to support Information Filtering Systems

Grid Computing

Page 12: Using Grids to support Information Filtering Systems

Computationally intensive research

Page 13: Using Grids to support Information Filtering Systems

Computation-intensive applications / experiments

Page 14: Using Grids to support Information Filtering Systems

EGEE Grid (www.eu-egee.org)

EGEE numbers:
>260 sites
54 countries
~114,000 CPUs
>20 PetaBytes
>16,000 users
>200 VOs
>150,000 jobs/day

Page 15: Using Grids to support Information Filtering Systems

The global network coverage

Page 16: Using Grids to support Information Filtering Systems

EELA

OSG

TeraGrid

NAREGI

EUMedGrid

BalticGrid

SEE-Grid

EUIndiaGrid

EUAsiaGrid

EUChinaGrid

DEISA

EGEE

The global Grid coverage

Page 17: Using Grids to support Information Filtering Systems

Implementation on the gLite-based Grid

Page 18: Using Grids to support Information Filtering Systems

GILDA (https://gilda.ct.infn.it)

Grid INFN Laboratory for Dissemination Activities

Grid test-bed for training

A ‘standard’ t-Infrastructure adopted by many projects

Users can practice before running their code on the production e-Infrastructures

~ 11 sites - 285 CPUs *

(*) The number of sites may change over time, as they are managed on a “best effort” basis

Page 19: Using Grids to support Information Filtering Systems

EELA (www.eu-eela.eu)

E-science grid facility for Europe and Latin America

Co-funded by EC (FP7)

~ 5800 CPUs

Page 20: Using Grids to support Information Filtering Systems

EELA (www.eu-eela.eu)

Page 21: Using Grids to support Information Filtering Systems

MySQL

www

Page 22: Using Grids to support Information Filtering Systems

Grid UI

.JDL

LFC

Page 23: Using Grids to support Information Filtering Systems

SE

SE

Page 24: Using Grids to support Information Filtering Systems

Grid UI

Recommender.class

Start.sh

mdclient.config

Input sandbox
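
A job description for this input sandbox could be generated along these lines. This is only a sketch: the presentation does not show its .JDL file, so treating Start.sh as the executable, the std.out and std.err names, and the predictions.sql output file are all assumptions. Executable, StdOutput, StdError, InputSandbox and OutputSandbox are standard JDL attributes.

```python
def build_jdl(executable, input_files, output_files):
    """Return the text of a simple JDL job description.

    The stdout/stderr file names are hypothetical and are always added to
    the output sandbox so that they are returned with the job results.
    """
    def quoted_list(names):
        return "{" + ", ".join('"' + name + '"' for name in names) + "}"

    outputs = ["std.out", "std.err"] + list(output_files)
    lines = [
        'Executable = "' + executable + '";',
        'StdOutput = "std.out";',
        'StdError = "std.err";',
        "InputSandbox = " + quoted_list(input_files) + ";",
        "OutputSandbox = " + quoted_list(outputs) + ";",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # File names from the slide; the output file name is an assumption.
    print(build_jdl(
        "Start.sh",
        ["Start.sh", "Recommender.class", "mdclient.config"],
        ["predictions.sql"],
    ))
```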

Page 25: Using Grids to support Information Filtering Systems

Grid UI WMS

Page 26: Using Grids to support Information Filtering Systems

CE 1 CE 2 CE n

WN WN WN

SE SE

Page 27: Using Grids to support Information Filtering Systems

Output - Version I

Page 28: Using Grids to support Information Filtering Systems

CE 1 CE 2 CE n

WN WN WN

.SQL .SQL .SQL .SQL .SQL .SQL

.SQL .SQL .SQL

Page 29: Using Grids to support Information Filtering Systems

Grid UI WMS

Output sandbox

.SQL .SQL .SQL
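
In Version I each job returns its predictions as a .SQL file through the output sandbox. One way a job might write such a file is sketched below; the table name, column names and two-decimal rounding are hypothetical, since the slides do not show the database schema.

```python
def predictions_to_sql(predictions, table="predictions"):
    """Render (user_id, movie_id, score) tuples as SQL INSERT statements.

    Ids are cast to int and scores to float before formatting, so only
    numeric literals end up in the generated statements.
    """
    statements = []
    for user_id, movie_id, score in predictions:
        statements.append(
            "INSERT INTO {} (user_id, movie_id, score) VALUES ({}, {}, {:.2f});".format(
                table, int(user_id), int(movie_id), float(score)
            )
        )
    return "\n".join(statements) + "\n"
```

The resulting files can then be loaded into the site's MySQL database once all jobs have finished.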

Page 30: Using Grids to support Information Filtering Systems

MySQL

www

Page 31: Using Grids to support Information Filtering Systems

Output - Version II

Page 32: Using Grids to support Information Filtering Systems

CE 1 CE 2 CE n

WN WN WN

AMGA

Page 33: Using Grids to support Information Filtering Systems

MySQL

www

Page 34: Using Grids to support Information Filtering Systems

Implementation on OurGrid

Page 35: Using Grids to support Information Filtering Systems

OurGrid (www.ourgrid.org)

Opportunistic Grid

Job submissions can be handled by a web portal

Page 36: Using Grids to support Information Filtering Systems

What’s next?

Page 37: Using Grids to support Information Filtering Systems

Future work

Run experiments using the Netflix prize database

Create a new version using Amazon EC2

Provide performance comparisons among EELA (gLite), OurGrid and Amazon EC2

I’m looking for partners

Page 38: Using Grids to support Information Filtering Systems

The End

[email protected]

http://www.canalcinefilia.com.br

http://canalcinefilia.com.br/en/credits/about.php

http://applications.eu-eela.eu/application_details.php?ID=59