Apache Mahout Algorithms

Mahout Algorithms
Mahmut Karakaya

Agenda
- Introduction
- Collaborative Filtering
- Map/Reduce
- Clustering
- Demo

What Mahout means
Elephant rider in Hindi

What Apache Mahout is
- Java, Hadoop
- Collaborative Filtering
- Book: Mahout in Action
- Mailing list: user@mahout.apache.org
- Release 0.9 (1-Feb-2014)

Who uses Mahout

Mahout in Apache Foundation

overstock.com saves $2m a year

Judd Bagley Saum Noursalehi

Others
- Weka (Machine Learning Library)
- LensKit (GroupLens)
- EasyRec (REST API)
- Write it yourself :)

Need to know ML?
hadoop jar mahout-core-0.8-job.jar \
  org.apache.mahout.cf.taste.hadoop.item.RecommenderJob \
  -Dmapred.input.dir=input/input.txt \
  -Dmapred.output.dir=output --usersFile input/users.txt --booleanData

Data Model (u,i,r)

Similarity

Cosine Similarity
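Cosine similarity is the dot product of two rating vectors divided by the product of their lengths. The following is a minimal sketch in plain Python, not Mahout's implementation; the two item vectors are hypothetical ratings given by the same three users.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length rating vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for an all-zero vector
    return dot / (norm_a * norm_b)

# Hypothetical ratings of two items by the same three users.
item_a = [5, 3, 4]
item_b = [4, 3, 5]
print(round(cosine_similarity(item_a, item_b), 3))  # 0.98
```

A value close to 1 means the two items were rated in nearly the same proportions; 0 means the vectors share no rated dimension.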

Collaborative Filtering
- Data format = userId, itemId, rating
- Create Model + Predict
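As an illustration of the userId, itemId, rating format, here is a small sketch that loads such lines into a nested dictionary. The function name load_ratings and the sample rows are hypothetical, and integer ids are an assumption.

```python
from collections import defaultdict

def load_ratings(lines):
    """Parse 'userId,itemId,rating' lines into {user: {item: rating}}."""
    ratings = defaultdict(dict)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        user, item, rating = line.split(",")
        ratings[int(user)][int(item)] = float(rating)
    return dict(ratings)

# Hypothetical sample data in the (u, i, r) format.
sample = ["1,101,5.0", "1,102,3.0", "2,101,4.0"]
model = load_ratings(sample)
print(model[1][102])  # 3.0
```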

Item Based - Similarity Matrix (Item-Item)

Item Based - Predict
- Weighted Sum:

r^(3,1) = 2 * 0.91 + ...
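The weighted sum can be sketched as follows. The slide elides the rest of the formula, so dividing by the sum of similarities (a common normalization) is an assumption here, and all numbers except 2 and 0.91 are made up.

```python
def predict_rating(user_ratings, similarities):
    """Similarity-weighted average of the ratings a user already gave.

    user_ratings: {item: rating} for one user
    similarities: {item: similarity to the target item}
    """
    numerator = 0.0
    denominator = 0.0
    for item, rating in user_ratings.items():
        sim = similarities.get(item, 0.0)
        numerator += rating * sim
        denominator += abs(sim)  # assumed normalization, not shown on the slide
    if denominator == 0:
        return None  # no rated item is similar to the target
    return numerator / denominator

# Hypothetical data: user 3 rated items 2 and 3; we predict item 1.
user3 = {2: 2.0, 3: 4.0}
sims_to_item1 = {2: 0.91, 3: 0.5}  # 0.91 echoes the slide; 0.5 is made up
print(predict_rating(user3, sims_to_item1))
```

Items that are more similar to the target pull the prediction toward the rating the user gave them.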

Item Based

Item Based.. Why in Mahout

- Generic recommender, like User Based
- User Based similarity matrix is heavier (usually far more users than items)

Singular Value Decomposition (SVD)

SVDRecommender

Factorization

Factorizer

Singular Value Decomposition (SVD)

m * n → m * k + n * k
10M → 100K + 10K

Let's say: m = 10K, n = 1K, k = 10
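The arithmetic above can be checked with a few lines of Python. The helper names are hypothetical; the second function shows the usual way a rating is read back from the two factor matrices, as a dot product of a user row and an item row.

```python
def stored_values(m, n, k):
    """Return (full matrix size, factored size) for an m x n matrix of rank k."""
    return m * n, m * k + n * k

def predict(user_factors, item_factors):
    """Predicted rating = dot product of two length-k factor vectors."""
    return sum(u * v for u, v in zip(user_factors, item_factors))

full, factored = stored_values(m=10_000, n=1_000, k=10)
print(full, factored)  # 10000000 110000
```

With these numbers the factored form stores 110K values instead of 10M.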

Singular Value Decomposition (SVD)

SVD k=3 λ=0.1 a=40 c.a=1

SVD k=3 λ=0.1 a=40 c.a=10

SVD.. Why in Mahout
- Won the Netflix Prize (matrix factorization was a core part of the winning solution)
- Parallelizable by row, column

Map / Reduce Mapper
1.txt   2.txt
Hello   Hello
Hello

Map / Reduce Mapper
Map1      Map2
Hello,1   Hello,1
Hello,1

Map / Reduce Reducer
Hello,3
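The Hello example can be simulated in a single process. This is a sketch of the map, shuffle and reduce phases in plain Python, not Hadoop code; how the three occurrences of Hello are split between 1.txt and 2.txt is an assumption.

```python
from collections import defaultdict

def map_words(text):
    """Mapper: emit a (word, 1) pair for every word in one input split."""
    return [(word, 1) for word in text.split()]

def shuffle(pairs):
    """Group emitted values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_counts(key, values):
    """Reducer: sum the counts collected for one word."""
    return key, sum(values)

# Assumed file contents: three occurrences of Hello across two files.
files = {"1.txt": "Hello\nHello", "2.txt": "Hello"}
emitted = [pair for text in files.values() for pair in map_words(text)]
result = dict(reduce_counts(k, v) for k, v in shuffle(emitted).items())
print(result)  # {'Hello': 3}
```

Each mapper only sees its own file, so the mappers can run in parallel; only the reducer needs all counts for a given word.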

Map / Reduce ItemBased
hadoop jar mahout-core-0.8-job.jar \
  org.apache.mahout.cf.taste.hadoop.item.RecommenderJob \
  -Dmapred.input.dir=input/input.txt \
  -Dmapred.output.dir=output --usersFile input/users.txt --booleanData

Map / Reduce ItemBased
Map 1

Map / Reduce ItemBased
Reduce 1

Map / Reduce ItemBased
Map 2

Map / Reduce ItemBased
Reduce 2
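The slides only name the four steps, so the following is one plausible reading, not a description of what RecommenderJob does internally: the first pass groups items by user, the second counts how often two items occur together. All function names and the input rows are hypothetical, and the input is boolean data (no ratings).

```python
from collections import defaultdict
from itertools import combinations

def map1(line):
    """Map 1: parse 'userId,itemId' and key the record by user."""
    user, item = line.split(",")[:2]
    return user, item

def reduce1(user, items):
    """Reduce 1: collect the distinct items each user interacted with."""
    return user, sorted(set(items))

def map2(user, items):
    """Map 2: emit every pair of items seen together for one user."""
    return [((a, b), 1) for a, b in combinations(items, 2)]

def reduce2(pair, counts):
    """Reduce 2: sum co-occurrences into one item-item matrix entry."""
    return pair, sum(counts)

def group(pairs):
    """Stand-in for the shuffle phase: group values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

lines = ["1,101", "1,102", "2,101", "2,102", "2,103"]  # hypothetical input
users = [reduce1(u, items) for u, items in group(map(map1, lines)).items()]
pairs = [p for u, items in users for p in map2(u, items)]
cooccurrence = dict(reduce2(k, v) for k, v in group(pairs).items())
print(cooccurrence[("101", "102")])  # 2
```

The resulting counts form a sparse item-item matrix, which is the kind of structure the Item Based similarity matrix needs.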

Map / Reduce.. Why in Mahout

Clustering (SM = single machine, MR = MapReduce)
- k-Means Clustering (SM, MR)
- Fuzzy k-Means (SM, MR)
- Canopy Clustering (SM, MR)
- Dirichlet (SM, MR)

Kmeans
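For readers who have not seen the algorithm, here is a minimal single-machine sketch of k-means (Lloyd's algorithm) in plain Python. It is not Mahout's implementation; the points and starting centroids are hypothetical, and math.dist requires Python 3.8 or later.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Plain k-means (Lloyd's algorithm) on tuples of floats."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for old, members in zip(centroids, clusters):
            if members:
                new_centroids.append(tuple(sum(c) / len(members)
                                           for c in zip(*members)))
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids, clusters

points = [(1.0, 1.0), (1.0, 2.0), (8.0, 8.0), (9.0, 8.0)]
centroids, clusters = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(centroids)  # [(1.0, 1.5), (8.5, 8.0)]
```

The result depends on the starting centroids, which is one reason the choice of initial centroids matters for k-means.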

Clustering Evaluation

Clustering Intra Distance

Clustering Inter Distance
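Intra and inter distance can be computed in several ways; the sketch below uses one common definition (mean distance to the centroid, and mean distance between centroids), which is an assumption rather than the exact measure used in the talk. The sample clusters are hypothetical.

```python
import math
from itertools import combinations

def centroid(cluster):
    """Coordinate-wise mean of the points in a cluster."""
    return tuple(sum(c) / len(cluster) for c in zip(*cluster))

def intra_cluster_distance(cluster):
    """Mean distance from the cluster's points to its centroid."""
    center = centroid(cluster)
    return sum(math.dist(p, center) for p in cluster) / len(cluster)

def inter_cluster_distance(clusters):
    """Mean distance between the centroids of every pair of clusters.

    Needs at least two clusters.
    """
    centers = [centroid(c) for c in clusters]
    pairs = list(combinations(centers, 2))
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)

clusters = [[(1.0, 1.0), (1.0, 3.0)], [(7.0, 2.0), (7.0, 2.0)]]
print(intra_cluster_distance(clusters[0]))  # 1.0
print(inter_cluster_distance(clusters))     # 6.0
```

A good clustering tends to have a small intra distance (tight clusters) and a large inter distance (well separated clusters).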

Clustering.. Why in Mahout
- Sparsity
- ~10m of 11m users registered 1 Sony product

Clustering.. Why in Mahout
- Group Recommendation
- Cluster Based Recommendation

Create WishList Experience

- Mahout (SVD)
- Play
- Heroku
- MongoLab
- REST

http://recommenderplaybbs.herokuapp.com/

Thank you