Presentation given at the ICEIS 2009 Conference.
Using Grids to support Information Filtering Systems
Leandro N. Ciuffo
INFN-Catania - Italy
Running Collaborative Filtering Recommendations on gLite middleware
Recommender system settings
http://canalcinefilia.com.br
Explicit data collection (users need to rate > 20 movies)
Movielens-like recommender system (http://movielens.umn.edu)
MySQL
www
No dedicated servers / standard technologies
Hosted by a commercial Web hosting service
Data set
225 out of 390 users are able to get movie recommendations
921 movies
33,147 ratings
225 users × 921 movies
~207,000 ‘cells’
Rating Matrix
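The cell count above is simple arithmetic, sketched here in Python. The density figure is only an upper bound under an assumption: the slides do not say whether the 33,147 ratings come from all 390 users or only from the 225 who can receive recommendations.

```python
# Figures taken from the slide; variable names are illustrative.
users = 225        # users able to get recommendations
movies = 921
ratings = 33_147   # may include ratings from users outside the 225 (assumption)

cells = users * movies          # size of the rating matrix
max_density = ratings / cells   # upper bound on the fraction of filled cells

print(cells)
print(round(max_density, 3))
```

Under that assumption at most about 16% of the cells hold a rating, so the matrix is sparse.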
Users’ similarity - Pearson r correlation
The complexity of maintaining a similarity matrix with the Pearson correlation between every pair of users is O(m^2 n), for m users and n movies
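A minimal Python sketch of the pairwise computation, to make the cost visible: the loop over user pairs gives the m^2 factor and each correlation scans up to n co-rated movies. This is not the presenter's implementation (the slides mention a Java class); the data layout and the names `pearson` and `similarity_matrix` are assumptions made for illustration.

```python
from itertools import combinations
from math import sqrt

def pearson(ratings_a, ratings_b):
    """Pearson r over the movies both users rated; None if undefined."""
    common = ratings_a.keys() & ratings_b.keys()
    n = len(common)
    if n < 2:
        return None
    mean_a = sum(ratings_a[m] for m in common) / n
    mean_b = sum(ratings_b[m] for m in common) / n
    cov = sum((ratings_a[m] - mean_a) * (ratings_b[m] - mean_b) for m in common)
    var_a = sum((ratings_a[m] - mean_a) ** 2 for m in common)
    var_b = sum((ratings_b[m] - mean_b) ** 2 for m in common)
    if var_a == 0 or var_b == 0:
        return None  # a constant rating vector has no defined correlation
    return cov / sqrt(var_a * var_b)

def similarity_matrix(ratings):
    """ratings: {user: {movie: rating}} -> {(u, v): r} for each user pair."""
    sims = {}
    for u, v in combinations(sorted(ratings), 2):  # m * (m - 1) / 2 pairs
        r = pearson(ratings[u], ratings[v])
        if r is not None:
            sims[(u, v)] = r
    return sims

# Toy data, invented for the example.
ratings = {
    "ana": {"m1": 5, "m2": 3, "m3": 4},
    "bob": {"m1": 4, "m2": 2, "m3": 3},
    "cid": {"m1": 1, "m2": 5, "m3": 2},
}
sims = similarity_matrix(ratings)
```

In the toy data "ana" and "bob" rate in lockstep, so their correlation is 1, while "ana" and "cid" disagree and get a negative value.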
[Figure: users linked by Pearson similarity values 1, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3]
Pearson correlation: -1 ≤ r ≤ 1
Neighborhood: similarity threshold r > 0.3
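The threshold rule can be sketched in a few lines of Python. The similarity dictionary layout and the function name `neighborhood` are assumptions for illustration; only the r > 0.3 cut-off comes from the slide.

```python
def neighborhood(user, sims, threshold=0.3):
    """Return {neighbor: r} for users whose similarity to `user` exceeds threshold."""
    neighbors = {}
    for (u, v), r in sims.items():
        if r > threshold:
            if u == user:
                neighbors[v] = r
            elif v == user:
                neighbors[u] = r
    return neighbors

# Toy similarities, invented for the example; each pair is stored once.
sims = {
    ("ana", "bob"): 0.9,
    ("ana", "cid"): 0.3,
    ("ana", "dan"): -0.4,
    ("bob", "cid"): 0.6,
}
ana_neighbors = neighborhood("ana", sims)
```

Because the comparison is strict, a user with r exactly 0.3 is left out of the neighborhood.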
Will this user enjoy the movie “Yes Man” ?
[Figure labels: user’s neighborhood; rating; prediction generated]
Weighted mean
Generating recommendations
[Figure: neighbors with similarities 0.9, 0.8, 0.6, 0.4, 0.3]
A movie must be rated by at least 8 neighbors
This repeats for every movie the user has not rated
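A hedged Python sketch of the prediction step: a plain similarity-weighted mean of the neighbors' ratings, returned only when enough neighbors rated the movie. The slides say "weighted mean" without giving the formula, so the unnormalized form below is an assumption, as are the function name `predict` and the data layout.

```python
def predict(neighbors, ratings, movie, min_neighbors=8):
    """Similarity-weighted mean of neighbors' ratings for `movie`.

    neighbors: {user: similarity}, ratings: {user: {movie: rating}}.
    Returns None when fewer than `min_neighbors` neighbors rated the movie.
    """
    rated = [
        (r, ratings[v][movie])
        for v, r in neighbors.items()
        if movie in ratings.get(v, {})
    ]
    if len(rated) < min_neighbors:
        return None
    total_weight = sum(r for r, _ in rated)
    if total_weight <= 0:
        return None  # cannot happen with a positive similarity threshold
    return sum(r * rating for r, rating in rated) / total_weight

# Toy data, invented for the example; min_neighbors is lowered to keep it short.
neighbors = {"bob": 0.9, "cid": 0.6}
ratings = {"bob": {"yes_man": 4}, "cid": {"yes_man": 2}}
score = predict(neighbors, ratings, "yes_man", min_neighbors=2)
```

With the default of 8 required neighbors the same call returns None, which is the case in which no prediction is generated for that movie.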
www.canalcinefilia.com.br - predictions
65,388 predictions
Grid Computing
Computation-intensive research
Computation-intensive applications / experiments
EGEE numbers:
>260 sites
54 countries
~114,000 CPUs
>20 PetaBytes
>16,000 users
>200 VOs
>150,000 jobs/day
EGEE Grid (www.eu-egee.org)
The global network coverage
EELA
OSG
TeraGrid
NAREGI
EUMedGrid
BalticGrid
SEE-Grid
EUIndiaGrid
EUAsiaGrid
EUChinaGrid
DEISA
EGEE
The global Grid coverage
Implementation on the gLite-based Grid
GILDA (https://gilda.ct.infn.it)
Grid INFN Laboratory for Dissemination Activities
Grid test-bed for training
A ‘standard’ t-Infrastructure adopted by many projects
Users can practice before running their code on the production e-Infrastructures
~ 11 sites - 285 CPUs *
(*) The number of sites may change over time, as they are managed on a “best effort” basis
EELA (www.eu-eela.eu)
E-science grid facility for Europe and Latin America
Co-funded by EC (FP7)
~ 5800 CPUs
[Diagram: job workflow on the Grid]
Data source: MySQL / www
Grid UI: .JDL job description
Storage: LFC, SE
Input sandbox: Recommender.class, Start.sh, mdclient.config
Submission path: Grid UI, WMS, CE 1 ... CE n, WN, SE
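The slides do not say how the work was divided among the jobs sent to the CEs. As one possible illustration only, the Python sketch below deals user ids round-robin into a fixed number of job lists, so that each job would generate predictions for its own subset of users. The function name `split_users` and the per-user split are assumptions, not the presenter's method.

```python
def split_users(user_ids, n_jobs):
    """Deal user ids round-robin into n_jobs lists (one list per grid job)."""
    if n_jobs < 1:
        raise ValueError("n_jobs must be at least 1")
    return [user_ids[i::n_jobs] for i in range(n_jobs)]

# Seven hypothetical user ids spread over three jobs.
chunks = split_users(list(range(1, 8)), 3)
print(chunks)
```

Round-robin keeps the lists within one user of each other in size, which matters when every job runs the same per-user loop.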
Output - Version I
[Diagram: the WNs behind CE 1 ... CE n produce .SQL files, which return through the WMS to the Grid UI in the output sandbox and are then loaded into MySQL / www]
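A minimal Python sketch of what a worker could emit in Version I: one SQL INSERT statement per prediction, collected into the text of a .SQL file. The table name `predictions` and the column names are hypothetical; the slides only show that .SQL files are produced.

```python
def to_sql(predictions, table="predictions"):
    """Render (user_id, movie_id, score) tuples as SQL INSERT statements.

    Table and column names are placeholders, not the site's real schema.
    """
    lines = []
    for user_id, movie_id, score in predictions:
        lines.append(
            f"INSERT INTO {table} (user_id, movie_id, score) "
            f"VALUES ({int(user_id)}, {int(movie_id)}, {float(score):.2f});"
        )
    return "\n".join(lines)

# Two made-up predictions.
sql_text = to_sql([(1, 42, 3.2), (1, 57, 4.75)])
print(sql_text)
```

Casting ids to int and scores to float before formatting keeps arbitrary text out of the generated statements.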
Output - Version II
[Diagram: the WNs behind CE 1 ... CE n write results through AMGA to MySQL / www]
Implementation on OurGrid
OurGrid (www.ourgrid.org)
Opportunistic Grid
Job submissions can be handled by a web portal
What’s next?
Future work
Run experiments using the Netflix prize database
Create a new version using Amazon EC2
Provide performance comparisons among EELA (gLite), OurGrid and Amazon EC2
I’m looking for partners
The End
http://www.canalcinefilia.com.br
http://canalcinefilia.com.br/en/credits/about.php
http://applications.eu-eela.eu/application_details.php?ID=59