OCTOBER 13-16, 2016 • AUSTIN, TX

Nice Docs Finish First: Designing Search Ranking for Fairness at Etsy

Fiona Condon, Senior Software Engineer, Etsy

Etsy by the Numbers

• 1.5 million active sellers
• 21.7 million active buyers
• 32 million listings for sale
• Solr for listing search

Search at Etsy

Our challenge

To return the best listings

To return the most informative, honest, high-quality listings

To return a diverse, fresh mix of the most informative, honest, high-quality listings

Etsy Values

• We are a mindful, transparent, and humane business.
• We plan and build for the long term.
• We value craftsmanship in all we make.
• We believe fun should be part of everything we do.
• We keep it real, always.

Don’t hate the player, change the game

We keep it real, always.

TF (term frequency): the count of the term in the document

IDF (inverse document frequency): the inverse of the number of documents that contain the term
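Here is a minimal sketch of how the two quantities combine under classic Lucene TF-IDF (DefaultSimilarity, later renamed ClassicSimilarity). In that scheme tf is the square root of the raw count and idf is 1 + ln(numDocs / (docFreq + 1)). The class and method names are illustrative. The sketch keeps a single idf factor and leaves out the squared idf, length norms, boosts and query normalization of the full formula.

```java
/** Simplified sketch of classic Lucene-style TF-IDF term weighting (illustrative names). */
final class ClassicTfIdf {
    private ClassicTfIdf() {}

    /** Classic TF: square root of the raw number of occurrences in the document. */
    static double tf(int freq) {
        return Math.sqrt(freq);
    }

    /** Classic IDF: 1 + ln(numDocs / (docFreq + 1)); rarer terms get larger weights. */
    static double idf(long docFreq, long numDocs) {
        return 1.0 + Math.log(numDocs / (double) (docFreq + 1));
    }

    /** Per-term weight, ignoring norms, boosts and the other factors of the full score. */
    static double termWeight(int freq, long docFreq, long numDocs) {
        return tf(freq) * idf(docFreq, numDocs);
    }
}
```

Under these definitions, a listing that repeats a term nine times earns three times the tf of a listing that uses it once. That is the incentive the talk sets out to remove.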

@Override
public float tf(float freq) {
    // any occurrence counts once, so repeating a term adds nothing
    return (freq > 0) ? 1.0f : 0.0f;
}
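The override replaces classic sqrt(freq) TF with a presence check. It presumably lives in a subclass of Lucene's classic TF-IDF similarity, which declares tf(float freq), but that placement is an assumption.

The self-contained sketch below leaves Lucene out. It compares the two TF functions on a keyword-stuffed title and an honest one. Several simplifications apply, and all names are hypothetical:

- tokenization is a naive whitespace split;
- the score is a plain sum over query terms;
- IDF and norms are omitted so the TF effect stands alone.

```java
import java.util.function.DoubleUnaryOperator;

/** Hypothetical comparison of classic TF and binary TF on whole documents. */
final class BinaryTfDemo {
    /** Classic Lucene-style TF: square root of the raw count. */
    static final DoubleUnaryOperator CLASSIC_TF = Math::sqrt;

    /** Binary TF, mirroring the override: any occurrence counts once. */
    static final DoubleUnaryOperator BINARY_TF = freq -> (freq > 0) ? 1.0 : 0.0;

    private BinaryTfDemo() {}

    /** Counts occurrences of a term in lowercased, whitespace-tokenized text. */
    static int termFreq(String text, String term) {
        int count = 0;
        for (String token : text.toLowerCase().split("\\s+")) {
            if (token.equals(term)) {
                count++;
            }
        }
        return count;
    }

    /** Sums TF over the query terms; IDF and length norms are deliberately omitted. */
    static double score(String query, String doc, DoubleUnaryOperator tf) {
        double total = 0.0;
        for (String term : query.toLowerCase().split("\\s+")) {
            total += tf.applyAsDouble(termFreq(doc, term));
        }
        return total;
    }
}
```

For the query "silver necklace", the title "necklace necklace necklace necklace silver" scores 3.0 under classic TF, against 2.0 for "silver leaf necklace". With binary TF both score 2.0, so repetition no longer buys rank.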

• Set TF to 1

• Choose the right fields to index
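The slides do not say which listing fields Etsy chose to index. To illustrate the idea, the sketch below builds a toy inverted index that covers only an allow-list of fields. Text in any other field, such as a repetitive description, then cannot match a query at all. The field names title, tags and description are hypothetical. Postings are sets, so a repeated token is recorded once, in keeping with binary TF.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy inverted index that only indexes an explicit allow-list of fields (hypothetical sketch). */
final class FieldSelectiveIndex {
    private final Set<String> indexedFields;
    // term -> ids of listings containing it; a set, so repeats add nothing
    private final Map<String, Set<Long>> postings = new HashMap<>();

    FieldSelectiveIndex(Set<String> indexedFields) {
        this.indexedFields = Set.copyOf(indexedFields);
    }

    /** Indexes one listing; fields outside the allow-list are ignored entirely. */
    void add(long listingId, Map<String, String> fields) {
        for (Map.Entry<String, String> field : fields.entrySet()) {
            if (!indexedFields.contains(field.getKey())) {
                continue;
            }
            for (String token : field.getValue().toLowerCase().split("\\s+")) {
                if (!token.isEmpty()) {
                    postings.computeIfAbsent(token, t -> new TreeSet<>()).add(listingId);
                }
            }
        }
    }

    /** Returns the ids of listings whose indexed fields contain the term. */
    Set<Long> search(String term) {
        return postings.getOrDefault(term.toLowerCase(), Set.of());
    }
}
```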

Crafting a quality signal

We value craftsmanship in all we make.

• Avoid presentation bias
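Raw click-through rate is a tempting quality signal, but listings ranked near the top collect clicks partly because of where they were shown. The slides do not say how Etsy corrected for this. The sketch below swaps in one common technique, clicks over expected clicks (COEC):

- each impression contributes the average CTR of the position it was shown at;
- the listing is scored by its actual clicks divided by that expectation.

The per-position CTR table is a hypothetical input, not Etsy data.

```java
/** Hypothetical sketch of a position-debiased click signal: clicks over expected clicks. */
final class ClicksOverExpected {
    private final double[] ctrByPosition; // index 0 = top slot

    ClicksOverExpected(double[] ctrByPosition) {
        if (ctrByPosition.length == 0) {
            throw new IllegalArgumentException("need at least one position CTR");
        }
        this.ctrByPosition = ctrByPosition.clone();
    }

    /**
     * @param positions 0-based position of each impression of the listing
     * @param clicks    total clicks the listing received over those impressions
     * @return clicks / expected clicks; 1.0 means "exactly what its positions predict"
     */
    double score(int[] positions, int clicks) {
        double expected = 0.0;
        for (int p : positions) {
            // positions beyond the table reuse the last (lowest) CTR
            expected += ctrByPosition[Math.min(p, ctrByPosition.length - 1)];
        }
        if (expected == 0.0) {
            throw new IllegalArgumentException("no expected clicks; cannot score");
        }
        return clicks / expected;
    }
}
```

Take position CTRs of 0.10, 0.05 and 0.02, and two listings that each get 1 click from 10 impressions. Their raw CTR is the same. The listing always shown third scores 5.0, while the listing always shown first scores 1.0, because it earned only what the top slot predicts.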

• Store outside the index

<fieldType name="listing_quality_file" keyField="listing_id" defVal="0.5" stored="true" indexed="true" class="solr.ExternalFileField" valType="float" />
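Keeping the quality score outside the index lets it be refreshed without reindexing listings. Solr's ExternalFileField reads its values from a plain-text file named external_<fieldname> in the core's data directory. The file holds one key=value line per document key, and keys absent from the file fall back to defVal (0.5 above).

The sketch below shows one way to produce that file from a map of scores. The field name and scores are hypothetical; a field such as listing_quality declared with this type would read external_listing_quality. When Solr reloads the file depends on your configuration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical writer for the key=value file read by a Solr ExternalFileField. */
final class QualityFileWriter {
    private QualityFileWriter() {}

    /** Renders one "listing_id=score" line per listing, sorted by id for stable, diffable output. */
    static String render(Map<Long, Float> scores) {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<Long, Float> e : new TreeMap<>(scores).entrySet()) {
            out.append(e.getKey()).append('=').append(e.getValue()).append('\n');
        }
        return out.toString();
    }

    /** Writes external_<fieldName> into the given data directory and returns its path. */
    static Path write(Path dataDir, String fieldName, Map<Long, Float> scores) throws IOException {
        Path file = dataDir.resolve("external_" + fieldName);
        Files.write(file, render(scores).getBytes(StandardCharsets.UTF_8));
        return file;
    }
}
```

For example, scores {42: 0.8, 7: 0.25} render as the two lines 7=0.25 and 42=0.8.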

• Bootstrap
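The slide does not spell out how the signal is bootstrapped. One plausible approach, sketched here purely as an assumption, is to shrink each listing's observed rate toward a prior. New listings then start at a neutral value and move away from it only as evidence accumulates.

The prior of 0.5 mirrors defVal in the field type above. The prior strength (expressed as pseudo-observations) and the success and trial counts are hypothetical.

```java
/** Hypothetical sketch: bootstrap a quality score by shrinking observed rates toward a prior. */
final class BootstrappedQuality {
    private final double prior;
    private final double priorStrength; // number of pseudo-observations backing the prior

    BootstrappedQuality(double prior, double priorStrength) {
        if (priorStrength <= 0) {
            throw new IllegalArgumentException("priorStrength must be positive");
        }
        this.prior = prior;
        this.priorStrength = priorStrength;
    }

    /** Smoothed rate: equals the prior with no data and approaches successes/trials as data grows. */
    double score(long successes, long trials) {
        return (prior * priorStrength + successes) / (priorStrength + trials);
    }
}
```

With prior 0.5 and strength 20, the scores work out as follows:

- a brand-new listing scores 0.5;
- 9 successes in 10 trials scores about 0.63;
- 90 in 100 scores about 0.83.

A small lucky sample therefore cannot leapfrog a well-established listing.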

Freshness is fun

We believe fun should be part of everything we do.

• Diversify by seller

public SearchResults diversify(SearchResults results) {
    SearchResults diversifiedResults;
    int nextWindow = diversityOptions.getWindow();

    do {
        diversityOptions.window = nextWindow;
        diversifiedResults = shopDiversifier.diversify(results);

        DiversityStats shopDiversity = diversifiedResults.docs.stream()
            .collect(DiversityStatsCalculator.collector(ListingDoc::getShopId)).getStats();

        // if the results are sufficiently diverse, we're done
        if (shopDiversity.getDiversityIndex() <= diversityOptions.progressive.getTargetDiversityIndex()) {
            break;
        }

        // otherwise, broaden the window and retry
        nextWindow = Math.min(
            diversifiedResults.totalCount,
            Math.min(diversityOptions.getMaxWindow(), diversityOptions.window * 2)
        );
    } while (diversityOptions.window < nextWindow);

    return diversifiedResults;
}
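The method above depends on classes that are not shown (shopDiversifier, DiversityStatsCalculator, diversityOptions). A self-contained sketch of the same progressive loop follows. It makes two assumptions, and every name in it is hypothetical:

- within the window, results are interleaved round-robin across shops;
- the diversity index is the Herfindahl-Hirschman index, the sum of squared shop shares in the top k results. Lower means more diverse, consistent with the <= comparison above.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical, self-contained sketch of progressive shop diversification. */
final class ProgressiveDiversifier {
    /** Minimal stand-in for a ranked result: a listing id and the shop selling it. */
    static final class Listing {
        final String id;
        final String shopId;

        Listing(String id, String shopId) {
            this.id = id;
            this.shopId = shopId;
        }
    }

    private ProgressiveDiversifier() {}

    /** Interleaves shops round-robin within the first `window` results; the tail keeps its order. */
    static List<Listing> diversify(List<Listing> ranked, int window) {
        int w = Math.max(0, Math.min(window, ranked.size()));
        Map<String, Deque<Listing>> byShop = new LinkedHashMap<>(); // shops ordered by their best listing
        for (Listing listing : ranked.subList(0, w)) {
            byShop.computeIfAbsent(listing.shopId, s -> new ArrayDeque<>()).addLast(listing);
        }
        List<Listing> out = new ArrayList<>(ranked.size());
        while (out.size() < w) {
            for (Deque<Listing> queue : byShop.values()) {
                if (!queue.isEmpty()) {
                    out.add(queue.pollFirst()); // each shop's own listings stay in rank order
                }
            }
        }
        out.addAll(ranked.subList(w, ranked.size()));
        return out;
    }

    /** Sum of squared shop shares in the top k: 1/numShops when evenly spread, 1.0 for one shop. */
    static double diversityIndex(List<Listing> results, int k) {
        int n = Math.min(k, results.size());
        if (n <= 0) {
            return 0.0;
        }
        Map<String, Integer> counts = new HashMap<>();
        for (Listing listing : results.subList(0, n)) {
            counts.merge(listing.shopId, 1, Integer::sum);
        }
        double index = 0.0;
        for (int count : counts.values()) {
            double share = (double) count / n;
            index += share * share;
        }
        return index;
    }

    /** Doubles the window (capped by maxWindow and the result count) until the top k is diverse enough. */
    static List<Listing> progressive(List<Listing> ranked, int window, int maxWindow, int k, double target) {
        List<Listing> diversified;
        int current;
        int nextWindow = window;
        do {
            current = nextWindow;
            diversified = diversify(ranked, current);
            if (diversityIndex(diversified, k) <= target) {
                break; // sufficiently diverse, we're done
            }
            nextWindow = Math.min(ranked.size(), Math.min(maxWindow, current * 2));
        } while (current < nextWindow);
        return diversified;
    }
}
```

Consider three listings from shop a ranked ahead of single listings from shops b, c and d, with k = 4 and a target of 0.3:

- a window of 2 leaves the top four as a, a, a, b (index 0.625);
- a window of 4 gives a, b, a, a (index still 0.625);
- a window of 6 gives a, b, c, d (index 0.25), and the loop stops.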

• Recency boost

The inverse of the time elapsed between now and listing creation
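Solr's recip(x, m, a, b) function query computes a / (m*x + b), and the Solr documentation's recency example is recip(ms(NOW, <date field>), 3.16e-11, 1, 1). The sketch below reproduces that shape in plain Java, with x as the listing's age in milliseconds. With m = 3.16e-11, roughly one over the number of milliseconds in a year, a new listing gets 1.0 and a one-year-old listing about 0.5. These constants follow the documentation example, not Etsy's production values.

```java
/** Hypothetical recency boost shaped like Solr's recip(x, m, a, b) = a / (m*x + b). */
final class RecencyBoost {
    static final double MS_PER_YEAR = 365.25 * 24 * 60 * 60 * 1000;

    private RecencyBoost() {}

    /** recip(x, m, a, b) = a / (m * x + b). */
    static double recip(double x, double m, double a, double b) {
        return a / (m * x + b);
    }

    /** Boost for a listing created at createdMillis, evaluated at nowMillis. */
    static double boost(long nowMillis, long createdMillis) {
        long ageMillis = Math.max(0L, nowMillis - createdMillis); // clamp future timestamps to age 0
        return recip(ageMillis, 3.16e-11, 1.0, 1.0);
    }
}
```

A listing created a year ago gets roughly half the boost of one created today; a larger m makes the boost decay faster.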

Evaluating for stability

We plan and build for the long term.

• Replayer
  • Command-line tool
  • Uses sampled request logs to “replay” real traffic
  • Accepts target hosts, duration, and query rate
  • Programmatically filters or alters requests
  • Provides realistic stats on average and worst-case impact
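Here is a minimal sketch of the replay loop such a tool might run. It assumes the sampled log has already been parsed into request strings. The loop:

- takes filtering and rewriting as functions;
- paces requests to a target rate and stops at a maximum duration;
- reports mean, 99th-percentile (nearest-rank) and maximum latency.

The send callback stands in for an HTTP request to a target host. All names are hypothetical, not the Replayer's actual interface.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Predicate;
import java.util.function.ToLongFunction;
import java.util.function.UnaryOperator;

/** Hypothetical log-replay harness: filter, rewrite, pace, send, summarize latency. */
final class Replayer {
    /** Latency summary for one replay run. */
    static final class Stats {
        final int sent;
        final double meanMillis;
        final long p99Millis;
        final long maxMillis;

        Stats(int sent, double meanMillis, long p99Millis, long maxMillis) {
            this.sent = sent;
            this.meanMillis = meanMillis;
            this.p99Millis = p99Millis;
            this.maxMillis = maxMillis;
        }
    }

    private Replayer() {}

    static Stats replay(List<String> sampledRequests, Predicate<String> keep, UnaryOperator<String> rewrite,
                        ToLongFunction<String> sendAndTimeMillis, double queriesPerSecond, long maxDurationMillis) {
        if (queriesPerSecond <= 0) {
            throw new IllegalArgumentException("queriesPerSecond must be positive");
        }
        long intervalNanos = (long) (1e9 / queriesPerSecond);
        long start = System.nanoTime();
        long deadline = start + maxDurationMillis * 1_000_000L;
        long nextSend = start;
        List<Long> latencies = new ArrayList<>();
        for (String request : sampledRequests) {
            if (!keep.test(request)) {
                continue; // programmatic filtering, e.g. dropping bot traffic
            }
            long wait = nextSend - System.nanoTime();
            if (wait > 0) {
                try {
                    Thread.sleep(wait / 1_000_000L, (int) (wait % 1_000_000L)); // pace to the target rate
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    break;
                }
            }
            if (System.nanoTime() >= deadline) {
                break; // duration limit reached
            }
            latencies.add(sendAndTimeMillis.applyAsLong(rewrite.apply(request)));
            nextSend += intervalNanos;
        }
        if (latencies.isEmpty()) {
            return new Stats(0, 0.0, 0L, 0L);
        }
        Collections.sort(latencies);
        long sum = 0;
        for (long latency : latencies) {
            sum += latency;
        }
        int n = latencies.size();
        int p99Index = (int) Math.ceil(0.99 * n) - 1; // nearest-rank percentile
        return new Stats(n, (double) sum / n, latencies.get(p99Index), latencies.get(n - 1));
    }
}
```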

• RankDelta
  • Web UI
  • Lets the user specify the query set, hosts, and Thrift parameters in PHP code
  • Provides high-level statistics about the results
  • Plus a full result-set deep dive
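As an illustration of the kind of high-level statistic such a tool can compute, the sketch below compares one query's ranked listing ids from a control host and an experiment host. It reports two metrics: top-k overlap, and the mean absolute rank shift of listings present in both lists. These metrics are illustrative choices, not necessarily what RankDelta reports, and the names are hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical per-query comparison of two ranked lists of listing ids. */
final class RankDeltaStats {
    private RankDeltaStats() {}

    /** Fraction of the control top-k that also appears in the experiment top-k. */
    static double overlapAtK(List<String> control, List<String> experiment, int k) {
        List<String> controlTop = control.subList(0, Math.min(k, control.size()));
        Set<String> experimentTop = new HashSet<>(experiment.subList(0, Math.min(k, experiment.size())));
        if (controlTop.isEmpty()) {
            return 1.0; // empty control top-k: nothing can be missing
        }
        int shared = 0;
        for (String id : controlTop) {
            if (experimentTop.contains(id)) {
                shared++;
            }
        }
        return (double) shared / controlTop.size();
    }

    /** Mean absolute change in 0-based rank for listings present in both lists. */
    static double meanRankShift(List<String> control, List<String> experiment) {
        Map<String, Integer> experimentRank = new HashMap<>();
        for (int i = 0; i < experiment.size(); i++) {
            experimentRank.putIfAbsent(experiment.get(i), i);
        }
        long totalShift = 0;
        int shared = 0;
        for (int i = 0; i < control.size(); i++) {
            Integer j = experimentRank.get(control.get(i));
            if (j != null) {
                totalShift += Math.abs(i - j);
                shared++;
            }
        }
        return shared == 0 ? 0.0 : (double) totalShift / shared;
    }
}
```

For control a, b, c, d and experiment b, a, c, e:

- overlap at 3 is 1.0;
- overlap at 4 is 0.75;
- the mean rank shift over a, b and c is 2/3.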

• In an ideal world…

• Making trade-offs

Communicating clearly

We are a mindful, transparent, and humane business.

• Focus on the constants

• Provide a feedback loop

Takeaways

• Minor changes to the default scoring can be powerful
• Handle quality contextually
• Conscious diversity serves both searcher & searchee
• Invest in a feedback loop on ranking changes
• Be honest but keep it consistent

@fioroco fiona.io fiona@etsy.com

codeascraft.com etsy.com/careers