Going data-driven Learnings from building a real-time recommender system Konrad Burnik September 21, 2016


Page 1: Spil games konrad

Going data-driven Learnings from building a real-time recommender system

Konrad Burnik

September 21, 2016

Page 2: Spil games konrad

Spil Games – Leading cross-platform publisher

Web Portals

Page 3: Spil games konrad

Spil Games - Web portal stats

• Portfolio of approx. 16K games

• 100 million monthly active users

• Channels: Family, Teen, Girls, Men

• Device Type: Desktop or Mobile

Page 4: Spil games konrad

Spil Games getting ready for the Big Data world

Page 5: Spil games konrad

This looks great! So, what's our first ML project?

Page 6: Spil games konrad

Just look around ...

Page 7: Spil games konrad

Example: Distribution of Games within labels at spelletjes.nl

Long tail you got there!

Page 8: Spil games konrad

"The" widget for recommendations

Page 10: Spil games konrad

Goals and challenges

• Provide better content for the users

• Optimize recommendations for business value

• Provide recommendations for new users

• Learn to use the new Spark infrastructure for solving all of the above

Page 11: Spil games konrad

Overview of the recommender system

• The infrastructure (before and after)

• Two key components of the new recommender

• Ephemeral (effectively solving the cold-start problem)

• Collaborative Filtering

Page 12: Spil games konrad


Spil Games Recommender Infrastructure (before)

Page 13: Spil games konrad

Spil Games Recommender Infrastructure (after)

[Diagram labels: Streaming, MLlib]

Page 14: Spil games konrad

Ephemeral Recommender

• For users who have some activity

• In particular, we wish to target users who came to the portals and played just a few games

Page 15: Spil games konrad

Ephemeral Recommender (challenges)

• What data can we use besides activity?

• How do we keep track of users?

• How do we quickly generate the recommendation lists?

Page 16: Spil games konrad

Ephemeral Recommender (key features)

• The ephemeral recommender is game-similarity based

• Exploiting the long-tail

• We also show games that have more business value for Spil Games, for example those with a sufficient lifetime value

• Processing 800-1500 events per second
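A game-similarity recommender of the kind listed above could be sketched as follows. This is a minimal illustration, not the Spil Games implementation: the co-play similarity measure, the function names `cooccurrence_similarity` and `recommend`, and the `value_weight` multiplier (a stand-in for business value or long-tail boosting) are all assumptions.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_similarity(sessions):
    """Build a game-to-game similarity table from lists of games played together.

    Assumption: similarity is the co-play count normalised by the geometric
    mean of each game's play count (a cosine-style measure).
    """
    plays = Counter()
    pairs = Counter()
    for games in sessions:
        unique = sorted(set(games))
        plays.update(unique)
        for a, b in combinations(unique, 2):
            pairs[(a, b)] += 1

    sim = {}
    for (a, b), n in pairs.items():
        score = n / (plays[a] * plays[b]) ** 0.5
        sim.setdefault(a, {})[b] = score
        sim.setdefault(b, {})[a] = score
    return sim


def recommend(played, sim, value_weight, k=3):
    """Rank unplayed games by summed similarity to the games already played.

    value_weight is a hypothetical per-game multiplier; games missing from it
    keep a neutral weight of 1.0.
    """
    scores = Counter()
    for game in played:
        for other, s in sim.get(game, {}).items():
            if other not in played:
                scores[other] += s * value_weight.get(other, 1.0)
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [game for game, _ in ranked[:k]]


sessions = [["a", "b"], ["a", "b"], ["a", "c"], ["b", "d"]]
sim = cooccurrence_similarity(sessions)
print(recommend(["a"], sim, {}))          # plain similarity ranking
print(recommend(["a"], sim, {"c": 2.0}))  # game "c" boosted by its weight
```

Because the ranking needs only the handful of games a visitor has just played, it works for users with almost no history, which is the cold-start case the slides target.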

Page 17: Spil games konrad

Example:

[Diagram labels: Action Puzzle]

Page 18: Spil games konrad

[Diagram labels: Action Puzzle; +1, +1, +1; Streaming; For You]

Page 19: Spil games konrad

[Diagram labels: Action Puzzle; +1, +1, +1; Streaming; For You]

Page 20: Spil games konrad

[Diagram labels: Action Puzzle; +1, +1, +1; Streaming; For You]

Page 21: Spil games konrad

[Diagram labels: Action Puzzle; +1, +1, +1; Streaming; For You]
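The slides do not spell out what the "+1" markers mean. One plausible reading, assumed here, is that every play event arriving on the stream increments a per-user counter for the game's label (taking Action and Puzzle to be two labels), and that the counters then drive the "For You" widget. The class `LabelCounters` and its methods are hypothetical names for that idea.

```python
from collections import Counter, defaultdict


class LabelCounters:
    """In-memory per-user label counters, updated one play event at a time."""

    def __init__(self):
        self._counts = defaultdict(Counter)

    def update(self, user_id, labels):
        # Each play adds +1 to every label attached to the played game.
        for label in labels:
            self._counts[user_id][label] += 1

    def top_labels(self, user_id, k=2):
        # Unknown users have no counters yet and get an empty list.
        counts = self._counts.get(user_id, Counter())
        ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
        return [label for label, _ in ranked[:k]]


counters = LabelCounters()
counters.update("u1", ["Action"])
counters.update("u1", ["Action", "Puzzle"])
counters.update("u1", ["Puzzle"])
counters.update("u1", ["Action"])
print(counters.top_labels("u1"))
```

In a real streaming job the counters would live in the stream processor's state or an external store rather than in a Python dictionary.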

Page 22: Spil games konrad

Collaborative Filtering

• For users who have a history of activity

• Proven to work at companies such as Amazon, Netflix, …

Page 23: Spil games konrad

Collaborative Filtering in general

* * ? ? * * * *

? * * * * * * ?

* * * * * * * ? * * * * *

* * * * ? * * * * * * *

Page 24: Spil games konrad

Collaborative Filtering in general

* * ? ? * * * *

? * * * * * * ?

* * * * * * * ? * * * * *

* * * * ? * * * * * * *

Can we predict the empty places?
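Predicting the empty places is commonly done by matrix factorization: each user and each game gets a short vector, and a missing rating is estimated as their dot product. Spark MLlib fits these vectors with alternating least squares (ALS); the sketch below swaps in plain stochastic gradient descent to stay short and dependency-free. The ratings, hyperparameters, and function names are illustrative assumptions.

```python
import random


def predict(U, V, u, i):
    """Estimated rating: dot product of user u's and item i's factor vectors."""
    return sum(a * b for a, b in zip(U[u], V[i]))


def squared_error(ratings, U, V):
    """Total squared error over the known (user, item, rating) triples."""
    return sum((r - predict(U, V, u, i)) ** 2 for u, i, r in ratings)


def factorize(ratings, n_users, n_items, rank=2, epochs=500,
              lr=0.01, reg=0.02, seed=0):
    """Fit user and item factors to the known ratings with SGD (not ALS)."""
    rng = random.Random(seed)
    U = [[rng.uniform(0.0, 0.1) for _ in range(rank)] for _ in range(n_users)]
    V = [[rng.uniform(0.0, 0.1) for _ in range(rank)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - predict(U, V, u, i)
            for f in range(rank):
                uf, vf = U[u][f], V[i][f]
                # Gradient step with L2 regularisation on both factors.
                U[u][f] += lr * (err * vf - reg * uf)
                V[i][f] += lr * (err * uf - reg * vf)
    return U, V


# Known entries of a 3x3 user-by-game matrix; every other cell is an empty place.
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0),
           (1, 2, 1.0), (2, 1, 2.0), (2, 2, 5.0)]
U, V = factorize(ratings, n_users=3, n_items=3)
print(predict(U, V, 0, 2))  # estimate for a cell that had no rating
```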

Page 25: Spil games konrad

Collaborative Filtering in general

* * * * * * * * * * * *

* * * * * * * * * * * *

* * * * * * * * * * * * * *

* * * * * * * * * * * * * * *

Great! But how do we get the highest ratings out?
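Once every cell has a predicted score, getting the highest ratings out is a top-N selection per user. A minimal sketch, assuming the factor vectors are already learned and that games the user has already played should be skipped; `top_n` and the sample vectors are hypothetical.

```python
import heapq


def top_n(user_vec, item_factors, seen, n=2):
    """Return the n unseen items with the highest predicted score for one user."""
    scored = (
        (sum(a * b for a, b in zip(user_vec, vec)), item)
        for item, vec in item_factors.items()
        if item not in seen
    )
    # heapq.nlargest keeps only n candidates in memory instead of sorting all items.
    return [item for _, item in heapq.nlargest(n, scored)]


item_factors = {
    "g1": [1.0, 0.0],
    "g2": [0.0, 1.0],
    "g3": [1.0, 1.0],
    "g4": [0.5, 0.5],
}
print(top_n([2.0, 1.0], item_factors, seen={"g3"}))
```

Selecting with a bounded heap avoids a full sort per user, which matters when the catalogue holds thousands of games and the user base runs into millions.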

Page 26: Spil games konrad

Collaborative Filtering in MLlib

Image obtained from databricks.com

Page 28: Spil games konrad

Collaborative Filtering (challenges)

• How do we aggregate the activity data?

• How do we score the data and scale it?

• Which users do we run the model on?

• How do we efficiently extract the recommendations from the model?

Page 29: Spil games konrad

Collaborative Filtering recommender (key features)

• Every hour, aggregating the last hour of user activity (~1.5-5 million rows) takes about 2 minutes

• Calculating the model based on a month of scored and scaled pre-aggregated activity takes about 1 hour

• We run the model only for users who were active in the last 5 hours

• Extracting the recommendations takes about 30 minutes with an optimized approach
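The steps above (hourly aggregation, scoring and scaling, restricting to recently active users) can be sketched in plain Python. The event layout `(user, game, timestamp)`, the log-based score, and all function names are assumptions; the slides do not state the actual scoring function, and the production pipeline ran on Spark rather than in-process.

```python
import math
from collections import Counter

HOUR = 3600  # seconds


def aggregate_hour(events, hour_start):
    """Count plays per (user, game) pair inside one hour-long window."""
    counts = Counter()
    for user, game, ts in events:
        if hour_start <= ts < hour_start + HOUR:
            counts[(user, game)] += 1
    return counts


def score(count):
    """Assumed scaling: dampen heavy repeat play with log(1 + count)."""
    return math.log1p(count)


def active_users(events, now, window_hours=5):
    """Users with at least one event in the last window_hours hours."""
    cutoff = now - window_hours * HOUR
    return {user for user, _, ts in events if cutoff <= ts <= now}


events = [("u1", "g1", 10), ("u1", "g1", 20), ("u2", "g2", 30), ("u1", "g3", 4000)]
hourly = aggregate_hour(events, hour_start=0)
scored = {pair: score(n) for pair, n in hourly.items()}
print(scored)
print(active_users(events, now=6 * HOUR))
```

Pre-aggregating each hour once means the monthly model input can be assembled from small hourly summaries instead of rescanning raw events.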

Page 30: Spil games konrad

Data amounts processed by CF

# total records
          Family       Teens        Girls        Men
Desktop   68 894 434   16 070 864   31 285 329   679 565
Mobile    2 532 549    404 934      1 276 879    2 249

# distinct users
          Family       Teens        Girls        Men
Desktop   16 127 074   5 254 646    5 022 497    357 721
Mobile    1 035 520    221 192      397 091      1 240

# distinct games
          Family       Teens        Girls        Men
Desktop   15 078       11 764       7 736        3 171
Mobile    3 151        5 532        1 792        467

Page 31: Spil games konrad

Results

• A deployment system is now in place for developing Spark apps

• Gained knowledge of using Spark infrastructure

• Gained knowledge of inner workings of recommenders as well as some related cutting-edge research

• Significantly improved the CTR of the "For You" widget in the two months the recommender has been live

Page 32: Spil games konrad

What have we learned?

• Giving recommendations is hard!

• Simple solutions often work best

• Exploring the long-tail is a good thing for diversification

• Spark is not as simple as hyped; you often need to tweak a lot!

Page 34: Spil games konrad

Thank you for your attention!

Contact: https://nl.linkedin.com/in/konrad-burnik