Data Mining and Recommendation Systems - SALIL NAVGIRE



Introduction

• Discovery of models for data

• Example: if the data is a set of numbers, we can assume it comes from a Gaussian distribution and estimate the parameters (mean and standard deviation) that define it completely

• Recognizing meaningful patterns in data -> data mining

• Predicting outcomes from known patterns -> machine learning (ML)
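
The Gaussian example above can be sketched in a few lines. This is a minimal illustration, not the author's method: the sample values and the function names `fit_gaussian` and `gaussian_pdf` are hypothetical, and the standard deviation is the population (maximum-likelihood) estimate.

```python
import math
import statistics

def fit_gaussian(data):
    """Estimate the two parameters that define a Gaussian: mean and std. deviation."""
    mu = statistics.fmean(data)
    sigma = statistics.pstdev(data, mu)  # population (maximum-likelihood) estimate
    return mu, sigma

def gaussian_pdf(x, mu, sigma):
    """Density of the fitted model at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# Hypothetical data set of numbers.
samples = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mu, sigma = fit_gaussian(samples)
print(mu, sigma, gaussian_pdf(mu, mu, sigma))
```

Once the two parameters are estimated, the model summarizes the whole data set and can score how likely any new value is.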


Data Mining Techniques

• Classification

• Predicting the class of a new item, given a set of classes and past instances whose class is known

• Example: loan approval based on decision tree classifiers

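[Figure: decision tree for loan approval. The root node splits on Job (Carpenter, Engineer, Doctor); each branch then splits on Income, ending in leaves labeled Bad or Good. Income thresholds shown in the figure: <30K, <40K, <50K, >50K, >90K, >100K.]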
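
A two-level tree of this shape (split on job, then on income) can be written by hand as a sketch. The pairing of income thresholds with jobs below is an assumption made for illustration, and `classify_applicant` is a hypothetical name. A real decision tree classifier would learn its splits from past loan data rather than have them hard-coded.

```python
# Hypothetical income cut-offs per job (annual income).
# Assumption: this job-to-threshold pairing is illustrative only.
GOOD_INCOME_THRESHOLD = {
    "Carpenter": 50_000,
    "Engineer": 90_000,
    "Doctor": 100_000,
}

def classify_applicant(job, income):
    """Walk a two-level tree: first split on job, then on income."""
    threshold = GOOD_INCOME_THRESHOLD.get(job)
    if threshold is None:
        return "Unknown"  # job not covered by this toy tree
    return "Good" if income > threshold else "Bad"

print(classify_applicant("Carpenter", 60_000))
print(classify_applicant("Doctor", 60_000))
```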


• Clustering

• Clustering algorithms find groups of items that are similar

• A clustering divides a dataset so that records with similar content are in the same group and the groups are as different as possible from each other

• K-Nearest Neighbor: a classification method that classifies a point based on its distances to the points in the training dataset

• Example: car sales
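
As a rough sketch of K-Nearest Neighbor, the function below labels a query point by majority vote among its k closest training points. The car-sales features, labels, and the name `knn_classify` are hypothetical, and Euclidean distance is an assumed choice of metric.

```python
import math
from collections import Counter

def knn_classify(train, query, k=3):
    """train: list of (feature_tuple, label). Returns the majority label of the k nearest."""
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical car-sales data: (age in years, mileage in 10,000 km) -> how quickly it sold.
train = [
    ((1.0, 1.0), "fast"),
    ((2.0, 1.5), "fast"),
    ((1.5, 2.0), "fast"),
    ((8.0, 9.0), "slow"),
    ((9.0, 8.5), "slow"),
    ((7.5, 9.5), "slow"),
]

print(knn_classify(train, (2.0, 2.0)))
print(knn_classify(train, (8.0, 8.0)))
```

Note that there is no training step: all the work happens at query time, which is why the method gets slower as the training set grows.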


• Regression

• Deals with predicting a value rather than a class

• Given x1, x2, x3, …, predict Y

• Use linear regression to estimate the coefficients a0, a1, a2, … in Y = a0 + a1x1 + a2x2 + …

• Use line-fitting and curve-fitting methods

• Example: find a relationship between smoking and cancer-related illness in patients
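
For a single input variable, the coefficients a0 and a1 have a closed-form least-squares solution. The sketch below assumes that one-variable case; the data points and the name `fit_line` are hypothetical.

```python
def fit_line(xs, ys):
    """Return (a0, a1) minimizing the squared error of y = a0 + a1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); intercept follows from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a1 = sxy / sxx
    a0 = mean_y - a1 * mean_x
    return a0, a1

# Hypothetical points that lie exactly on y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
a0, a1 = fit_line(xs, ys)
print(a0, a1)
```

With several input variables the same idea applies, but the coefficients are usually found with a linear algebra routine rather than by hand.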


• Association Rules

• These algorithms create rules that describe how often events have occurred together

• Example: when a customer buys a hammer, 90% of the time they also buy nails

• Example: spam classification based on conditional probability

• Support is a measure of what fraction of the population satisfies both the antecedent and the consequent of the rule

• Confidence is the measure of how often the consequent is true when the antecedent is true
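
Support and confidence as defined above can be computed directly from a list of transactions. This is a minimal sketch: the baskets are made up, and the function names are hypothetical.

```python
def support(transactions, antecedent, consequent):
    """Fraction of all transactions that contain both antecedent and consequent."""
    both = antecedent | consequent
    return sum(1 for t in transactions if both <= t) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Among transactions containing the antecedent, the fraction that also contain the consequent."""
    with_antecedent = [t for t in transactions if antecedent <= t]
    if not with_antecedent:
        return 0.0
    return sum(1 for t in with_antecedent if consequent <= t) / len(with_antecedent)

# Hypothetical shopping baskets.
baskets = [
    {"hammer", "nails"},
    {"hammer", "nails", "glue"},
    {"hammer"},
    {"nails"},
    {"glue"},
]

rule_support = support(baskets, {"hammer"}, {"nails"})        # 2 of 5 baskets
rule_confidence = confidence(baskets, {"hammer"}, {"nails"})  # 2 of 3 hammer baskets
print(rule_support, rule_confidence)
```

Confidence is the conditional probability of the consequent given the antecedent, estimated from the data.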

• Outlier Analysis

• Most data mining methods discard outliers as noise or exceptions

• However, in some applications, such as fraud detection, these rare events can be more interesting
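
The slides do not name a specific outlier detection method. As one simple possibility, the sketch below flags values whose z-score (distance from the mean in standard deviations) exceeds a threshold. The transaction amounts, the threshold of 2.0, and the name `find_outliers` are all assumptions for illustration.

```python
import statistics

def find_outliers(values, z_threshold=2.0):
    """Return the values lying more than z_threshold standard deviations from the mean."""
    mean = statistics.fmean(values)
    spread = statistics.pstdev(values, mean)
    if spread == 0:
        return []  # all values identical, nothing stands out
    return [v for v in values if abs(v - mean) / spread > z_threshold]

# Hypothetical card transaction amounts; one is far larger than the rest.
amounts = [20, 22, 19, 21, 23, 20, 500]
print(find_outliers(amounts))
```

A large outlier inflates the mean and standard deviation it is measured against, so on small samples a median-based measure may be a more robust choice.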


Knowledge Discovery Process

• Data Collection

• Data Cleaning

• Data Integration

• Data Selection

• Data Transformation

• Data Mining

• Evaluation

• Knowledge Presentation


Applications of Data Mining

• Marketing

• Analysis of consumer behavior

• Advertising campaigns

• Targeted mailings

• Segmentation of customers, stores, or products

• Finance

• Creditworthiness of clients

• Performance analysis of financial investments

• Fraud detection

• Manufacturing

• Optimization of resources

• Optimization of manufacturing processes

• Product design based on customer requirements

• Health Care

• Discovering patterns in X-ray images

• Analyzing side effects of drugs

• Effectiveness of treatments


Privacy Concerns

• Effective data mining requires large sources of data

• To achieve a wide spectrum of data, multiple data sources are linked

• Linking sources can be problematic for privacy. Suppose the following histories of a customer were linked:

• Shopping History

• Credit History

• Bank History

• Employment History

• The user's life story could then be painted from the collected data


Recommendation Systems

• Definition: recommendation systems (RS) are a subclass of information filtering systems that seek to predict the rating or preference that a user would give to an item

• Enhance the user experience by helping users find information and reducing search and navigation time

• Increase productivity and credibility

• Address the long tail phenomenon by surfacing less popular items that users would otherwise not find

• Types of RS

• Content-based RS

• Collaborative filtering RS

• Hybrid RS


• Content-based RS

• Recommend items similar to those the user preferred in the past

• User profiling is the key

• Items/content are usually described by keywords

• Limitations

• Not all content is well represented by keywords (e.g. images)

• Unrated items are not shown

• Users with thousands of purchases are a problem

• Example: Pandora uses the properties of a song from the Music Genome Project to play similar songs
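
A content-based recommender of the kind described above can be sketched with keyword sets. The example below builds a user profile from the keywords of liked items and ranks the remaining items by Jaccard overlap with that profile. The catalog, the keywords, and the function names are hypothetical, and Jaccard overlap is only one of several possible similarity measures.

```python
def jaccard(a, b):
    """Overlap between two keyword sets, from 0 (disjoint) to 1 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def recommend_by_content(liked, catalog, top_n=2):
    """Rank items the user has not liked yet by similarity to the user's keyword profile."""
    # The profile is the union of the keywords of everything the user liked.
    profile = set().union(*(catalog[item] for item in liked))
    candidates = [item for item in catalog if item not in liked]
    ranked = sorted(candidates, key=lambda item: jaccard(profile, catalog[item]), reverse=True)
    return ranked[:top_n]

# Hypothetical catalog: item -> descriptive keywords.
catalog = {
    "song_a": {"rock", "guitar", "fast"},
    "song_b": {"rock", "guitar", "slow"},
    "song_c": {"jazz", "piano", "slow"},
    "song_d": {"rock", "drums", "slow"},
}

print(recommend_by_content(["song_a"], catalog))
```

Only the user's own history is used here, so no other users' ratings are needed. The quality of the result depends entirely on how well the keywords describe the items.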


• Collaborative filtering method

• Uses other users' ratings for recommendation

• The key is to find users or user groups whose interests match those of the current user

• More users and more ratings give better results

• Limitations

• Cold start problem

• Large computation power required

• Sparsity

• Example: Last.fm and Spotify recommend songs by comparing a user's listening history with that of other users. Facebook and LinkedIn use collaborative filtering to recommend new friends and connections
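
The core step, finding the user whose interests best match the current user, can be sketched as follows. This is a simplified user-based variant under stated assumptions: similarity is the cosine over items both users have rated, only the single nearest neighbor is consulted, and the ratings, the `min_rating` cut-off, and the function names are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two users, computed over the items both have rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = math.sqrt(sum(u[i] ** 2 for i in common))
    norm_v = math.sqrt(sum(v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v)

def recommend_from_neighbor(ratings, user, min_rating=4):
    """Suggest items the most similar user rated highly and the current user has not rated."""
    others = [name for name in ratings if name != user]
    neighbor = max(others, key=lambda name: cosine(ratings[user], ratings[name]))
    seen = ratings[user]
    return [item for item, score in ratings[neighbor].items()
            if item not in seen and score >= min_rating]

# Hypothetical ratings on a 1-5 scale: user -> {item: rating}.
ratings = {
    "alice": {"song_a": 5, "song_b": 4},
    "bob": {"song_a": 5, "song_b": 4, "song_c": 5},
    "carol": {"song_a": 1, "song_b": 2, "song_c": 1},
}

print(recommend_from_neighbor(ratings, "alice"))
```

The sketch also shows the limitations listed above: a new user with no ratings has no neighbors (cold start), and every recommendation compares the user against all others, which is costly at scale.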


• Hybrid RS

• In some cases, combining content-based and collaborative filtering is more effective than either alone

• Can overcome the sparsity and cold start problems

• Netflix Prize: Netflix offered a prize of $1 million to the team that could improve the accuracy of its rating predictions by 10%. The competition ran from 2006 to 2009 and was won by BellKor's Pragmatic Chaos, who used an ensemble of 107 algorithms for a single prediction

• Amazon item-to-item collaborative filtering

• Compute the similarity between item pairs

• Combine the similar items into a recommendation list

• Each vector corresponds to an item, and its dimensions correspond to the customers who have purchased that item

• The similar-items table is built offline
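
The item-to-item steps above can be sketched as an offline job that builds a similar-items table from purchase histories. This is a simplified all-pairs illustration, not Amazon's production algorithm: the purchase data and the name `build_similar_items` are hypothetical, and each item vector is treated as binary (a customer either bought the item or did not).

```python
import math
from itertools import combinations

def build_similar_items(purchases):
    """purchases: customer -> set of items. Returns item -> [(other_item, similarity)], best first."""
    # Invert the data so each item maps to the set of customers who bought it.
    buyers = {}
    for customer, items in purchases.items():
        for item in items:
            buyers.setdefault(item, set()).add(customer)

    table = {item: [] for item in buyers}
    for a, b in combinations(sorted(buyers), 2):
        shared = len(buyers[a] & buyers[b])
        if shared == 0:
            continue  # no common customers, so similarity is zero
        # Cosine similarity of two binary vectors.
        sim = shared / math.sqrt(len(buyers[a]) * len(buyers[b]))
        table[a].append((b, sim))
        table[b].append((a, sim))

    for item in table:
        table[item].sort(key=lambda pair: pair[1], reverse=True)
    return table

# Hypothetical purchase histories.
purchases = {
    "c1": {"hammer", "nails"},
    "c2": {"hammer", "nails", "glue"},
    "c3": {"hammer", "glue"},
    "c4": {"paint"},
}

table = build_similar_items(purchases)
print(table["nails"])
```

Because the table is computed ahead of time, serving a recommendation only requires looking up the items a customer has already bought.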


• Measuring similarity
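
The formula on this slide did not survive extraction. One common way to measure the similarity of two item vectors is the cosine of the angle between them, given here as an assumption rather than as the slide's original content. The symbols A and B stand for two item vectors.

```latex
\operatorname{sim}(\vec{A}, \vec{B}) = \cos(\vec{A}, \vec{B})
  = \frac{\vec{A} \cdot \vec{B}}{\lVert \vec{A} \rVert \, \lVert \vec{B} \rVert}
```

For vectors with non-negative entries, such as purchase counts or ratings, the value ranges from 0 (no customers in common) to 1 (identical direction).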


Examples

• E-commerce: Amazon.com, eBay, Etsy

• Music: Spotify, Pandora

• Movies: Netflix.com, IMDb

• News: Digg, Summly

• Social networks: LinkedIn, Facebook, Quora, YouTube

• Apps: Play Store, Cover