Recommendation Techn

Embed Size (px)



Text of Recommendation Techn

  • 1. 2014 MapR Technolo 2g0ie1s4 MapR Technologies 1

2. 2014 MapR Technologies 2Who I amTed Dunning, Chief Applications Architect, MapR TechnologiesEmail tdunning@apache.orgTwitter @Ted_DunningApache Mahout @ApacheMahout22 June 2014 Big Data Everywhere Conference #DataIsrael 3. Recommendation: Widely Used Machine LearningExample: Open source Apache Mahout used in production 2014 MapR Technologies 3 4. 2014 MapR Technologies 4Recommendations Data: interactions between people taking action (users) and items Data used to train recommendation model Goal is to suggest additional interactions Example applications: movie, music or map-based restaurant choices;suggesting sale items for e-stores or via cash-register receipts 5. Google maps: restaurant recommendations 2014 MapR Technologies 5 6. 2014 MapR Technologies 6Google maps: tech recommendations 7. 2014 MapR Technologies 7Tutorial Part 1:How recommendation works,or I want a pony 8. First question:Are you using the right data? 2014 MapR Technologies 9 9. 2014 MapR Technologies 10RecommendationBehavior of a crowdhelps us understandwhat individuals will do 10. 2014 MapR Technologies 11RecommendationsAlice got an apple andAlice a puppy 11. 2014 MapR Technologies 12RecommendationsAlice got an apple andAlice a puppyCharles Charles got a bicycle 12. 2014 MapR Technologies 13RecommendationsAlice got an apple andAlice a puppyBob Bob got an appleCharles Charles got a bicycle 13. 2014 MapR Technologies 14RecommendationsAlice got an apple andAlice a puppyBob What else would Bob like?Charles Charles got a bicycle 14. 2014 MapR Technologies 15RecommendationsAlice got an apple andAlice a puppyBob A puppy!Charles Charles got a bicycle 15. 2014 MapR Technologies 16You get the idea of howrecommenders can work 16. By the way, like me, Bob alsowants a pony 2014 MapR Technologies 17 17. 2014 MapR Technologies 18Recommendations?AliceBobAmeliaCharlesWhat if everybody gets apony?What else would you recommend fornew user Amelia? 18. 2014 MapR Technologies 19Recommendations?AliceBobAmeliaCharlesIf everybody gets a pony, itsnot a very good indicator ofwhat to else predict... 19. 2014 MapR Technologies 20Problems with Raw Co-occurrence Very popular items co-occur with everything or why its not veryhelpful to know that everybody wants a pony Examples: Welcome document; Elevator music Very widespread occurrence is not interesting to generate indicatorsfor recommendation Unless you want to offer an item that is constantly desired, such asrazor blades (or ponies) What we want is anomalous co-occurrence This is the source of interesting indicators of preference on which tobase recommendation 20. Overview: Get Useful Indicators from Behaviors 2014 MapR Technologies 211. Use log files to build history matrix of users x items Remember: this history of interactions will be sparse compared to allpotential combinations2. Transform to a co-occurrence matrix of items x items3. Look for useful indicators by identifying anomalous co-occurrences tomake an indicator matrix Log Likelihood Ratio (LLR) can be helpful to judge which co-occurrencescan with confidence be used as indicators of preference ItemSimilarityJob in Apache Mahout uses LLR 21. Apache Mahout: Overview Open source Apache project Mahout version is 0.9 released Feb 2014; inc Scala 2014 MapR Technologies 22 Summary 0.9 blog at Library of scalable algorithms for machine learning Some run on Apache Hadoop distributions; others do not require Hadoop Some can be run at small scale Some are run in parallel; others are sequential Includes the following main areas: Clustering & related techniques Classification Recommendation Mahout Math Library 22. 2014 MapR Technologies 24Log FilesAliceCharlesCharlesAliceAliceBobBob 23. 2014 MapR Technologies 25Log Filesu1u2u2u1u1u3u3t1t4t3t2t3t3t1 24. 2014 MapR Technologies 26History Matrix: Users x ItemsAliceBobCharles 25. 2014 MapR Technologies 27Co-Occurrence Matrix: Items x Items1 2 0111 11020 0How do you tell which co-occurrencesare useful? 26. 2014 MapR Technologies 28Co-Occurrence Matrix: Items x Items1 2 0111 11020 0Use LLR test to turn co-occurrenceintoindicators 27. 2014 MapR Technologies 29Co-occurrence Binary Matrix1not1not 1 28. Which one is the anomalous co-occurrence?A not AB 1 0not B 0 2 2014 MapR Technologies 30A not AB 13 1000not B 1000 100,000A not AB 1 0not B 0 10,000A not AB 10 0not B 0 100,000 29. Which one is the anomalous co-occurrence?A not A0.90 1.95B 1 0not B 0 2 2014 MapR Technologies 31A not AB 13 1000not B 1000 100,000A not A4.52 14.3B 1 0not B 0 10,000A not AB 10 0not B 0 100,000 30. 2014 MapR Technologies 32Co-Occurrence Matrix: Items x Items1 2 0111 11020 0Recap:Use LLR test to turn co-occurrenceinto indicators 31. Indicator Matrix: Anomalous Co-OccurrenceResult:Each marked row showsthe indicators of what torecommend 2014 MapR Technologies 33 32. Indicator Matrix: Anomalous Co-Occurrence 2014 MapR Technologies 34Why not pony + other item? 33. 2014 MapR Technologies 36How will you deliverrecommendations to users? 34. 2014 MapR Technologies 37Seeking SimplicityInnovation:Exploit search technology to deployyour recommendation system 35. 2014 MapR Technologies 38But first, a look at howsearch works 36. Apache Solr/Apache Lucene Apache Solr/Lucene is an open-source powerful searchengine used for flexible, heavily indexed queries including datasuch as Full text, geographical data, statistically weighted data 2014 MapR Technologies 39 Lucene Provides core retrieval Is a low-level library Solr Is a web-based wrapper around Lucene Is easy to integrate because you talk to it via a web-interface URL http://host machine:8888 37. 2014 MapR Technologies 40LucidWorks Enterprise platform and collection of applications based onApache Solr/Lucene Wrapper around Solr A free version ships with MapR LucidWorks leaves Solr exposed but makes Solradministration much easier, which in turn makes it easier touse Lucene URL http://host machine:8989 38. 2014 MapR Technologies 41LucidWorksSolrQuery ResponseIndexLuceneQuery ResponseData SourceRelationship of Solr /Lucene/LucidWorks 39. 2014 MapR Technologies 42Other Options: Elastic Search Apache Lucene library at the heart of several approaches It can also be used on its own Elastic Search is a different web interface for Lucene Real time search and analytics Open source (not Apache) Big advantage is less accumulated cruft 40. 2014 MapR Technologies 44What is a Document? Data is stored in collections made up of documents Documents contain fields that can be Indexed makes field searchable; dont have to index all Stored If you want Solr to return content, have Solr store content Not all data of interest must be stored: can access via stored URL (good for verylarge data set) Multi-value Body field can contain more than one type of data Facetted A way to refine a search or use for statistics Example: data for country: Could return facetted as 37 from US, 23 UK, 7 Japan 41. 2014 MapR Technologies 45Fields and How to Set Them Up Lucene is mostly used for text Text has to be tokenized Also supports other types Long, string, keywords, comma separated Fields properties such as stored, indexed, faceted can also bedefined Defaults arent usually so great 42. 2014 MapR Technologies 46Example of Facetted SearchThe indexed fields Area and Gender have beenfacetted to provide counts for the results 43. Field-specific Searches Using Lucene Syntax Documents often have title, author, keywords and body text General syntax is 2014 MapR Technologies 48field:(term1 term2) Alternativesfield:(term1 OR term2) field:(term1 term2 ~ 5) field:(term1 AND term2) Default field, default interpretations work very well for text 44. 2014 MapR Technologies 49Send Data to Solr (not LucidWorks) LucidWorks has lots of spiders that can use file extension ormime type to trigger certain file parsers Works best with web-ish sources and mime types Also includes MapR and MapR high volume indexers More common at modest volumes or for updates to use JSONformat{"id":"book_314", "title":{"set":"The Call of the Wild"}} Use REST interface to send update filescurl http://localhost:8983/solr/update -H 'Content-type:application/json' --data-binary @file.json 45. Back to recommendation:How do you abuse search to makerecommendation easy? 2014 MapR Technologies 51 46. Collection of Documents: Insert Meta-Data 2014 MapR Technologies 52SearchTechnologyItemmeta-dataIngest easily via NFSDocument forpuppy id: t4title: puppydesc: The sweetest little puppyever.keywords: puppy, dog, pet 47. From Indicator Matrix to New Indicator Field 2014 MapR Technologies 53id: t4title: puppydesc: The sweetest little puppyever.keywords: puppy, dog, petindicators: (t1)Solr documentfor puppyNote: data for the indicator field is added directly to meta-data for a document in ApacheSolr or Elastic Search index. You dont need to create a separate index for the indicators. 48. Lets look at a real example:we built a music recommender 2014 MapR Technologies 54 49. 2014 MapR Technologies 56 50. User activity: Listens to classic jazz hit Take the A Train 2014 MapR Technologies 57 51. System delivers recommendations based on activity 2014 MapR Technologies 58 52. 2014 MapR Technologies 59Lets look inside the musicrecommender 53. Music Meta Data for Search Document Collections 2014 MapR Technologies 60 MusicBrainz data Data includes Artist ID, MusicBrainz ID, Name, Group/Person, From (geolocations) and Gender as seen in this sample 54. Sample User Behavior Histories: Music Log Files 2014 MapR Technologies 6213 START 10113 218265428123 BEACON 10113 218265428124 START 10113 7960061193502834 BEACON 10113 7960061193