OCTOBER 11-14, 2016 BOSTON, MA

Smart Facets at Rakuten: Presented by Keith Thoma & Michael Pellegrini, Rakuten USA


About Rakuten

•  Founded in 1997 in Japan

•  Operates Rakuten Ichiba, the largest e-commerce site in Japan

•  One of the 15 largest internet companies in the world

•  10,000+ employees worldwide

•  $6.3 billion in revenue in FY2015

•  Over 50 subsidiaries worldwide

Search Technology Stack

Solr at Rakuten

•  30+ services within the Rakuten group use Solr

•  Solr supported in 10+ languages

•  At Rakuten.com:

    •  Search is supported via Solr

    •  Over 30 million products and 90 million different items

    •  Thousands of unique categories and attributes to search against

    •  Millions of queries a day!

Overview

Introduction to Facets

Built-In Facet Sorting Methods

Relevancy-Based Facet Sorting Methods

What Are Facets?

Facet Sorting Criteria

•  Top facets are relevant to the query

•  Top facet order reflects relevancy

•  Easy to maintain over time

•  Acceptable latency in production

Latency Impact of Facets

•  Facets are expensive

    •  Extra logic can be a performance hit

    •  In some cases, facets can slow down queries by 10x

    •  Out-of-memory errors (OOMs) in extreme cases


Example - iPhone 6 Cases

Assume we have the following brand facets for the query "iPhone 6 Cases":

Brands

•  Otterbox
•  Incipio
•  Apple
•  AAA Phone Cases

Search Results for iPhone 6 Cases

1.  Otterbox iPhone Case (Brand: Otterbox)
2.  Otterbox iPhone Case (Brand: Otterbox)
3.  Generic iPhone Case (Brand: Incipio)
4.  Generic iPhone Case (Brand: Incipio)
5.  Generic iPhone Case (Brand: Incipio)
6.  iPhone 6s + Case (Brand: Apple)
7.  Off-Brand iPhone Case (Brand: AAA Phone Cases)
8.  Off-Brand iPhone Case (Brand: AAA Phone Cases)
9.  Off-Brand iPhone Case (Brand: AAA Phone Cases)
10. Off-Brand iPhone Case (Brand: AAA Phone Cases)

Default Facet Sorting Methods

•  Name Sort: sort based on alphabetical order of facet values

•  Count Sort: sort based on result count per facet value

Let’s see how they do.

Name Sort

Brands

•  AAA Phone Cases
•  Apple
•  Incipio
•  Otterbox

Count Sort

Brands

•  AAA Phone Cases (Count: 4)
•  Incipio (Count: 3)
•  Otterbox (Count: 2)
•  Apple (Count: 1)
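The two default orderings can be reproduced on the example result set with a few lines of Python. This is only an illustrative sketch: the brand list is transcribed from the example results, and the helper names `name_sort` and `count_sort` are made up here. In Solr's classic field faceting, these orderings correspond to `facet.sort=index` and `facet.sort=count`.

```python
from collections import Counter

# Brand of each of the ten example results, in ranked order (from the slides).
result_brands = (
    ["Otterbox"] * 2 + ["Incipio"] * 3 + ["Apple"] + ["AAA Phone Cases"] * 4
)


def name_sort(brands):
    """Facet values in alphabetical order (like facet.sort=index)."""
    return sorted(set(brands))


def count_sort(brands):
    """Facet values by descending result count (like facet.sort=count)."""
    counts = Counter(brands)
    # Ties are broken alphabetically here; that tie-break is an assumption.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


print(name_sort(result_brands))
print(count_sort(result_brands))
```

Both orderings put AAA Phone Cases first, even though its products are the lowest-ranked results for the query.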

JSON Facets

•  We can sort on a value associated with a facet

•  Values must be written to an indexed field

•  Let’s add a static score to the mix and sort on that!
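As a rough illustration of sorting a facet on an associated value, the sketch below builds a request body for Solr's JSON Facet API in which a terms facet is sorted by a nested aggregation. The field names `brand` and `static_score`, the facet name `brands`, the stat name `top_score`, and the choice of `max` as the aggregation are all assumptions made for this example; the body is only printed, not sent to a server.

```python
import json


def build_facet_request(query, facet_field="brand", score_field="static_score"):
    """Build a JSON request body that sorts a terms facet by a nested stat.

    `facet_field` and `score_field` are hypothetical field names.
    """
    return {
        "query": query,
        "facet": {
            "brands": {
                "type": "terms",
                "field": facet_field,
                # Sort buckets by the nested aggregation named "top_score".
                "sort": "top_score desc",
                "facet": {"top_score": f"max({score_field})"},
            }
        },
    }


body = build_facet_request("iphone 6 cases")
print(json.dumps(body, indent=2))
```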

Search Results for iPhone 6 Cases

1.  Otterbox iPhone Case (Brand: Otterbox, Score: 30)
2.  Otterbox iPhone Case (Brand: Otterbox, Score: 30)
3.  Generic iPhone Case (Brand: Incipio, Score: 20)
4.  Generic iPhone Case (Brand: Incipio, Score: 20)
5.  Generic iPhone Case (Brand: Incipio, Score: 20)
6.  iPhone 6s + Case (Brand: Apple, Score: 100)
7.  Off-Brand iPhone Case (Brand: AAA Phone Cases, Score: 1)
8.  Off-Brand iPhone Case (Brand: AAA Phone Cases, Score: 1)
9.  Off-Brand iPhone Case (Brand: AAA Phone Cases, Score: 1)
10. Off-Brand iPhone Case (Brand: AAA Phone Cases, Score: 1)

Static Score Sort

Brands

•  Apple (Score: 100)
•  Otterbox (Score: 30)
•  Incipio (Score: 20)
•  AAA Phone Cases (Score: 1)
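The static score ordering can be checked offline against the example data. The sketch below is illustrative only: the (brand, score) pairs are transcribed from the example results, the function name is made up, and taking the highest score per brand is an assumption (in the example every product of a brand carries the same score, so any aggregate gives the same order).

```python
# (brand, static score) for each of the ten example results, from the slides.
scored_results = (
    [("Otterbox", 30)] * 2
    + [("Incipio", 20)] * 3
    + [("Apple", 100)]
    + [("AAA Phone Cases", 1)] * 4
)


def static_score_sort(docs):
    """Order facet values by the highest static score among their results."""
    best = {}
    for brand, score in docs:
        best[brand] = max(score, best.get(brand, score))
    return sorted(best.items(), key=lambda kv: kv[1], reverse=True)


print(static_score_sort(scored_results))
```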

Results – Built-In Sorting Methods

Methods compared: Name, Count, Static Score

Criteria:

•  Top facets are relevant to the query

•  Top facet order reflects relevancy

•  Easy to maintain over time

•  Acceptable latency in production


Score Sort

•  Try sorting on score:

    msg: "undefined field: "score"",
    org.apache.solr.common.SolrException: undefined field: "score" at
    org.apache.solr.schema.IndexSchema.getField(IndexSchema.java:1231)

•  Not supported out of the box

•  How could we add support for this?

Custom Collector Logic

•  Could be implemented via a custom collector

•  Would alter select facets

•  Would require extra effort when performing Solr upgrades

•  Could have a negative performance impact

•  Might need additional logic to support grouping/collapsing

API Wrapper

•  Run an API wrapper around Solr

•  Re-sort facets in the wrapper

•  Easy to add custom business rules
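A minimal sketch of the re-sorting step such a wrapper might perform is shown below. It assumes the wrapper already holds the result documents with their relevancy scores (Solr returns these when `score` is included in the `fl` parameter). The function name, the numeric scores, and the choice to rank each facet value by its best-scoring document are assumptions for illustration, not the presenters' implementation.

```python
def resort_facets_by_score(docs, facet_field="brand"):
    """Re-sort facet values by the best relevancy score among the results.

    `docs` mimics result documents that carry a "score" entry.
    Using the maximum score per facet value is an assumption of this sketch.
    """
    best = {}
    for doc in docs:
        value = doc[facet_field]
        best[value] = max(doc["score"], best.get(value, doc["score"]))
    return sorted(best, key=best.get, reverse=True)


# Hypothetical relevancy scores for the ten example results.
example_docs = (
    [{"brand": "Otterbox", "score": 9.5}, {"brand": "Otterbox", "score": 9.1}]
    + [{"brand": "Incipio", "score": s} for s in (8.7, 8.2, 7.9)]
    + [{"brand": "Apple", "score": 7.0}]
    + [{"brand": "AAA Phone Cases", "score": s} for s in (5.0, 4.8, 4.5, 4.1)]
)

print(resort_facets_by_score(example_docs))
```

Because the re-sort runs on the wrapper side, business rules (for example pinning or hiding a brand) could be applied to the returned list without touching Solr.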

Score Sort

Brands

•  Otterbox
•  Incipio
•  Apple
•  AAA Phone Cases

Is this the best sort order?

Blended Approach

•  Use both result scores and user data

•  Use machine learning to blend the scores together

User data per brand:

•  Otterbox: 30 user clicks
•  Incipio: 50 user clicks
•  Apple: 10 user clicks
•  AAA Phone Cases: 1 user click

Blended Workflow

[Diagram with components: Score, User Data, Feedback]

Blended Sort

Brands

•  Incipio
•  Otterbox
•  Apple
•  AAA Phone Cases
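To make the idea of blending concrete, here is a small sketch that mixes a per-brand result score with click counts. It is not the presenters' model: the talk describes machine learning for the blend, whereas this example substitutes a fixed-weight linear combination. The function name, the `weight` parameter, and the result scores are hypothetical; only the click counts come from the slides.

```python
def blended_sort(result_scores, clicks, weight=0.5):
    """Rank facet values by a weighted mix of result score and click count.

    Both signals are scaled to [0, 1] by their maximum before mixing. The
    fixed `weight` stands in for whatever a trained model would learn.
    """
    max_score = max(result_scores.values())
    max_clicks = max(clicks.values())
    blended = {
        value: weight * result_scores[value] / max_score
        + (1 - weight) * clicks.get(value, 0) / max_clicks
        for value in result_scores
    }
    return sorted(blended, key=blended.get, reverse=True)


# Hypothetical per-brand result scores; click counts are from the slides.
brand_scores = {"Otterbox": 9.5, "Incipio": 8.7, "Apple": 7.0, "AAA Phone Cases": 5.0}
brand_clicks = {"Otterbox": 30, "Incipio": 50, "Apple": 10, "AAA Phone Cases": 1}

print(blended_sort(brand_scores, brand_clicks))
```

With these inputs, Incipio's higher click count outweighs Otterbox's slightly higher result score, so Incipio moves to the top while the other brands keep their relative order.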

Impact of API Wrapper

•  Coverage of significant user queries

•  Can be used with grouping

•  Most calculations are done offline

•  No major impact on search latency

•  99% response time impact of less than 5 ms
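One way such a design can keep query-time cost low is to compute facet orderings offline and have the wrapper do only a lookup. The sketch below assumes that layout; the table `PRECOMPUTED_ORDER`, its contents, and the function name are hypothetical and are not taken from the talk.

```python
# Facet orderings computed offline, keyed by normalized query (hypothetical data).
PRECOMPUTED_ORDER = {
    "iphone 6 cases": ["Incipio", "Otterbox", "Apple", "AAA Phone Cases"],
}


def apply_precomputed_order(query, solr_facets):
    """Reorder Solr's facet values using an offline ranking, if one exists.

    `solr_facets` is a list of (value, count) pairs in Solr's own order.
    Values missing from the offline ranking keep their relative order at
    the end. Queries without an offline ranking are returned unchanged.
    """
    order = PRECOMPUTED_ORDER.get(query.strip().lower())
    if order is None:
        return list(solr_facets)
    rank = {value: i for i, value in enumerate(order)}
    # sorted() is stable, so unranked values stay in Solr's order.
    return sorted(solr_facets, key=lambda fc: rank.get(fc[0], len(rank)))


count_sorted = [("AAA Phone Cases", 4), ("Incipio", 3), ("Otterbox", 2), ("Apple", 1)]
print(apply_precomputed_order("iPhone 6 Cases", count_sorted))
```

Queries that are not covered fall back to Solr's own ordering, so the extra work per request is a dictionary lookup plus a sort over a short list.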

Results – Relevancy-Based Sorting Methods

Methods compared: Score – Custom Collector, Score – API Wrapper, Blended

Criteria:

•  Top facets are relevant to the query

•  Top facet order reflects relevancy

•  Easy to maintain over time

•  Acceptable latency in production

Real-World Examples

•  Samsung Galaxy
•  Diamond Ring
•  Coffee

Conclusions

•  Built-in facet sorting methods are not always optimal for relevancy

•  Sorting facets based on result score can improve relevancy

•  Integrating external signals (such as user data) makes the solution more robust

We’re Hiring!

•  Search Hackers

•  Data Scientists

•  NLP Gurus

•  Machine Learning Hobbyists

•  Deep Learning Knights

•  Apache Committers

Please visit rakuten.careers

Questions?