Real-Time Distributed Recommendation with Jubatus, 2012/02/25 @TokyoNLP, Preferred Infrastructure, Inc., Yuya Unno (@unnonouno)

Real-Time Distributed Recommendation with Jubatus @TokyoNLP#9



1. Real-time distributed recommendation with Jubatus, 2012/02/25 @TokyoNLP, Preferred Infrastructure (@unnonouno)

2. Yuya Unno (@unnonouno); unno/no/uno; Preferred Infrastructure; Sedue; Jubatus
3. Jubatus
4. Jubatus
5. Big Data!; PC, EC
6. STEP 1, STEP 2, STEP 3; 30, 30
7. Jubatus: developed by NTT PF and Preferred Infrastructure; released as OSS on 10/27; http://jubat.us/
8. Jubatus
9. TV
10. Hadoop & Mahout
11. Jubatus
12. Jubatus
13. UPDATE; ANALYZE; MIX; cf. MAP / REDUCE; ver. 0.2.0
14. (sum), (count). UPDATE: sum += x; count += 1. ANALYZE: return (sum / count). MIX: sum = sum1 + sum2; count = count1 + count2
15. Jubatus
16. libsvm: +1 1:1 3:1 8:1; C
17. RDB, Hadoop; SQL; Map/Reduce
18. Jubatus; MCMC
19. Jubatus; JSON; twitter API
22. $D = \{d_1, d_2, \ldots, d_n\}$; query $q$; similarity function $f$: return the $k$ items $d$ with the largest $f(d, q)$; $f$: e.g. Jaccard
24. similar_row: $q$, ID; update_row: ID; complete_row: $q$, similar_row
25. Cosine: $\cos(\theta(x, y)) = x^T y / (|x||y|)$; Jaccard: $\mathrm{Jacc}(X, Y) = |X \cap Y| / |X \cup Y|$
26. Locality Sensitive Hashing; minhash
28. Locality Sensitive Hashing (LSH): random vector $r$; for $x, y$, the signs of $x^T r$ and $y^T r$ agree with a probability determined by $\cos(\theta(x, y))$; with $k$ random vectors $\{r_1, \ldots, r_k\}$, $H(x) = \{\mathrm{sign}(x^T r_1), \ldots, \mathrm{sign}(x^T r_k)\}$; sign takes the value 1 or 0; $H(x)$ is $k$ bits
29. LSH: the fraction of matching bits estimates $1 - \theta(x, y)/\pi$, which yields $\cos(\theta(x, y))$
30. Jaccard: 0, 1 values are OK; $\mathrm{Jacc}(X, Y) = |X \cap Y| / |X \cup Y|$; $X = \{1, 2, 4, 6, 7\}$; $Y = \{1, 3, 5, 6\}$; $X \cap Y = \{1, 6\}$; $X \cup Y = \{1, 2, 3, 4, 5, 6, 7\}$; $\mathrm{Jacc}(X, Y) = 2/7$
31. minhash: $X = \{x_1, x_2, \ldots, x_n\}$; hash each element of $X$: $H(X) = \{h(x_1), \ldots, h(x_n)\}$; $m(X) = \mathrm{argmin}(H(X))$; the probability that $m(X) = m(Y)$ equals $\mathrm{Jacc}(X, Y)$; the rate at which $m(X) = m(Y)$ over several hash functions estimates $\mathrm{Jacc}(X, Y)$; $m(X)$ [Li+10a, Li+10b] (b-bit minwise hashing)
32. minhash: $X \cap Y$, $X \cup Y$ (sets $X$ and $Y$)
33. Weighted Jaccard: idf; $\mathrm{wJacc}(X, Y) = \sum_{i \in X \cap Y} w_i / \sum_{i \in X \cup Y} w_i$; with $w_i = 1$ this is the plain Jaccard; $X = \{1, 2, 4, 6, 7\}$; $Y = \{1, 3, 5, 6\}$; $w = (2, 3, 1, 4, 5, 2, 3)$; $X \cap Y = \{1, 6\}$; $X \cup Y = \{1, 2, 3, 4, 5, 6, 7\}$; $\mathrm{wJacc}(X, Y) = (2+2)/(2+3+1+4+5+2+3) = 4/20$
34. minhash for weighted Jaccard [Chum+08]: $X = \{x_1, x_2, \ldots, x_n\}$; $H(X) = \{h(x_1)/w_1, \ldots, h(x_n)/w_n\}$; $-\log(h(x))$; $w_i$; $m(X) = \mathrm{argmin}(H(X))$; the probability that $m(X) = m(Y)$ equals $\mathrm{wJacc}(X, Y)$
35. [Liu+11]; OK
36. Jubatus: $D$, $d$; $L(d, q)$, $d$
37. ID; mix; 1~100, 101~200, 201~300; CHT (Consistent Hashing)
38. MIX: servers 1, 2, 3 holding 1~100, 101~200, 201~300; MIX!!
39. LSH, minhash: bit; servers 1, 2, 3 holding 1~100, 101~200, 201~300; MIX!!
41. LSH; minhash
43. Jubatus; MIX; Locality Sensitive Hashing; minhash; Jubatus
44. References
    [Chum+08] Ondrej Chum, James Philbin, Andrew Zisserman. Near Duplicate Image Detection: min-Hash and tf-idf Weighting. BMVC 2008.
    [Li+10a] Ping Li, Arnd Christian König. b-Bit Minwise Hashing. WWW 2010.
    [Li+10b] Ping Li, Arnd Christian König, Wenhao Gui. b-Bit Minwise Hashing for Estimating Three-Way Similarities. NIPS 2010.
    [Liu+11] Wei Liu, Jun Wang, Sanjiv Kumar, Shih-Fu Chang. Hashing with Graphs. ICML 2011.
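
The Jaccard and minhash slides (30 to 34) can be illustrated with a small sketch. It is not Jubatus code: `unit_hash`, `minhash`, `estimate`, and `num_hashes` are hypothetical names, and SHA-256 keyed by a seed is only an assumed stand-in for the hash function h on the slides. The sketch computes the exact (weighted) Jaccard coefficient for the slide's X, Y, and w, then estimates it as the rate at which m(X) = m(Y) over many seeds, using -log(h(x))/w_i as the value to minimize, in the spirit of [Chum+08].

```python
import hashlib
import math
from fractions import Fraction

# Example data taken from the slides.
X = {1, 2, 4, 6, 7}
Y = {1, 3, 5, 6}
W = {1: 2, 2: 3, 3: 1, 4: 4, 5: 5, 6: 2, 7: 3}  # w = (2, 3, 1, 4, 5, 2, 3)


def jaccard(x, y, w=None):
    """Exact Jaccard coefficient; weighted if w (element -> weight) is given."""
    def weight(i):
        return 1 if w is None else w[i]
    inter = sum(weight(i) for i in x & y)
    union = sum(weight(i) for i in x | y)
    return Fraction(inter, union)


def unit_hash(element, seed):
    """Assumed stand-in for h: map (seed, element) to a float in (0, 1]."""
    digest = hashlib.sha256(f"{seed}:{element}".encode()).digest()
    value = int.from_bytes(digest[:8], "big")
    return (value + 1) / (2**64 + 1)


def minhash(x, seed, w=None):
    """m(X): the element of X minimizing -log(h(x_i)) / w_i (w_i = 1 if w is None)."""
    def key(i):
        wi = 1.0 if w is None else w[i]
        return -math.log(unit_hash(i, seed)) / wi
    return min(x, key=key)


def estimate(x, y, num_hashes=2000, w=None):
    """Fraction of seeds for which m(X) == m(Y); approximates (w)Jacc(X, Y)."""
    matches = sum(
        minhash(x, seed, w) == minhash(y, seed, w) for seed in range(num_hashes)
    )
    return matches / num_hashes


if __name__ == "__main__":
    print("exact Jacc :", jaccard(X, Y))        # 2/7
    print("exact wJacc:", jaccard(X, Y, W))     # 4/20, printed reduced as 1/5
    print("minhash estimate of Jacc :", estimate(X, Y))
    print("minhash estimate of wJacc:", estimate(X, Y, w=W))
```

Only the selected element m(X) per seed has to be stored for each row, which is what keeps the signature compact; the estimates get closer to the exact values as `num_hashes` grows.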