
[In]formation Retrieval: Search at LinkedIn


DESCRIPTION

[In]formation Retrieval: Search at LinkedIn
By Shakti Sinha & Daniel Tunkelang
Bay Area Search Meetup Presentation, March 27, 2013
http://www.meetup.com/Bay-Area-Search/events/63736862/

LinkedIn has a unique data collection: the 200M+ members who use LinkedIn are also part of the content those same members access using our information retrieval products. In this talk, the speakers will discuss some of the unique challenges we face in building the LinkedIn search platform, particularly around leveraging semi-structured and social content, understanding query intent, and personalizing relevance.

Shakti Sinha heads LinkedIn's search relevance team and has been making key contributions to LinkedIn's search products since 2010. He previously worked at Google as both a research intern and a software engineer. He has an MS in Computer Science from Stanford, as well as a BS from the College of Engineering, Pune.

Daniel Tunkelang leads LinkedIn's efforts around query understanding. Before that, he led LinkedIn's product data science team. He previously led a local search quality team at Google and was a founding employee of Endeca (acquired by Oracle in 2011). He has written a textbook on faceted search and is a recognized advocate of human-computer interaction and information retrieval (HCIR). He has a PhD in Computer Science from CMU, as well as BS and MS degrees from MIT.


Page 1: [In]formation Retrieval: Search at LinkedIn

Recruiting Solutions

[In]formation Retrieval: Search at LinkedIn

Shakti Sinha – Head, Search Relevance
Daniel Tunkelang – Head, Query Understanding

1


Page 2: [In]formation Retrieval: Search at LinkedIn

Why do 200M+ people use LinkedIn?

2

Page 3: [In]formation Retrieval: Search at LinkedIn

People use LinkedIn because of other people.

3

Page 4: [In]formation Retrieval: Search at LinkedIn

Search helps members find and be found.

4

Page 5: [In]formation Retrieval: Search at LinkedIn

Rich collection of professional content.

5

Page 6: [In]formation Retrieval: Search at LinkedIn

Every search is personalized.

6

Page 7: [In]formation Retrieval: Search at LinkedIn

Let’s talk a bit about how it all works.

§  Query Understanding

§  Search Spam

§  Unified Search

More at http://data.linkedin.com/search.

7

Page 8: [In]formation Retrieval: Search at LinkedIn

Query Understanding

8

Page 9: [In]formation Retrieval: Search at LinkedIn

People are semi-structured objects.

9

for i in [1..n]
    s ← w1 w2 … wi
    if Pc(s) > 0
        a ← new Segment()
        a.segs ← {s}
        a.prob ← Pc(s)
        B[i] ← {a}
    for j in [1..i-1]
        for b in B[j]
            s ← wj+1 … wi
            if Pc(s) > 0
                a ← new Segment()
                a.segs ← b.segs ∪ {s}
                a.prob ← b.prob × Pc(s)
                B[i] ← B[i] ∪ {a}
    sort B[i] by prob
    truncate B[i] to size k
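The segmentation pseudocode above can be rendered as a self-contained Python sketch. The `segment` function and the toy `PC` table are illustrative stand-ins: a real system would use a learned segment-probability model rather than a lookup table.

```python
def segment(words, pc, k=10):
    """Beam search over query segmentations.

    B[i] holds up to k highest-probability segmentations of the first
    i words; pc(s) is the probability that s is a single segment.
    """
    n = len(words)
    B = {0: [([], 1.0)]}              # one empty segmentation of the empty prefix
    for i in range(1, n + 1):
        candidates = []
        for j in range(i):            # extend each kept segmentation of words[:j]
            s = " ".join(words[j:i])  # candidate segment covering words[j..i-1]
            p = pc(s)
            if p > 0:
                for segs, prob in B[j]:
                    candidates.append((segs + [s], prob * p))
        candidates.sort(key=lambda c: c[1], reverse=True)
        B[i] = candidates[:k]         # beam: keep only the top k
    return B[n]

# Toy segment-probability table, standing in for a real model.
PC = {"software engineer": 0.6, "software": 0.3, "engineer": 0.3,
      "new york": 0.7, "new": 0.2, "york": 0.1}
best = segment("software engineer new york".split(), lambda s: PC.get(s, 0.0), k=5)
# top segmentation: ["software engineer", "new york"]
```

Seeding the whole-prefix case and the extension case collapse into one loop here (j = 0 is the single-segment candidate), which keeps the beam bookkeeping in one place.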

Page 10: [In]formation Retrieval: Search at LinkedIn

Word sense is contextual.

10

Page 11: [In]formation Retrieval: Search at LinkedIn

Understand queries as early as possible.

11

Page 12: [In]formation Retrieval: Search at LinkedIn

Query structure has many applications.

§  Boost results that match query interpretation.
§  Bucket search log analysis by query classes.
§  Query rewriting specific to query classes.
§  …

Query understanding focuses on set-level metrics.

Not just about the best answer, but about getting to the best question.

12

Page 13: [In]formation Retrieval: Search at LinkedIn

Search Spam

13

Page 14: [In]formation Retrieval: Search at LinkedIn

Let’s look at a search spammer.

14

Page 15: [In]formation Retrieval: Search at LinkedIn

Summary is verbose but legitimate.

15

Page 16: [In]formation Retrieval: Search at LinkedIn

But then comes the keyword stuffing.

16

Page 17: [In]formation Retrieval: Search at LinkedIn

How we train our search spam classifier.

§  Find the queries targeted by spammers. –  10,000 most common non-name queries.

§  Look at top results for a generic user. –  i.e., show unpersonalized search results.

§  Remove private profiles. –  Members first! Can’t sacrifice privacy to fight spammers.

§  Label data by crowdsourcing. –  Relevance is subjective, but spam is relatively objective.
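The training pipeline above might be sketched as follows. Every name here (`search`, `is_private`, `crowd_labels`) is invented for illustration, and a simple majority vote stands in for whatever label-aggregation scheme is actually used.

```python
# Sketch of assembling training data for a search spam classifier,
# following the steps above: common non-name queries, unpersonalized
# results, private profiles excluded, labels from crowdsourced votes.
def build_training_set(top_queries, search, is_private, crowd_labels):
    """top_queries: common non-name queries targeted by spammers.
    search(q): unpersonalized top results for query q.
    is_private(profile): True if the profile must be excluded.
    crowd_labels: profile -> list of crowd votes (True = spam)."""
    examples = []
    for q in top_queries:
        for profile in search(q):
            if is_private(profile):   # members first: never expose private profiles
                continue
            votes = crowd_labels.get(profile, [])
            if votes:
                is_spam = sum(votes) > len(votes) / 2   # majority vote
                examples.append((profile, is_spam))
    return examples
```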

17

Page 18: [In]formation Retrieval: Search at LinkedIn

ROC curve for spam thresholding.

18

[Figure: ROC curve, with spam score thresholds marked at points a and b (0 < a < b < 1); both axes run from 0 to 1.]

Page 19: [In]formation Retrieval: Search at LinkedIn

Integrate spamminess into relevance score.

§  Spam model yields a probability between 0 and 1.

§  Use the spam score as a piecewise linear factor:

   if score < spammin:    # not a spammer
       relevance *= 1.0
   elif score > spammax:  # spammer
       relevance *= 0.0
   else:                  # linear function of spamminess
       relevance *= (spammax - score) / (spammax - spammin)
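A minimal runnable version of this factor, with illustrative cutoffs spammin = 0.2 and spammax = 0.8 (the actual thresholds are not public):

```python
def spam_factor(score, spammin=0.2, spammax=0.8):
    """Piecewise-linear multiplier for the relevance score.

    score is the spam model's probability in [0, 1]. Results below
    spammin keep full relevance, those above spammax are zeroed out,
    and scores in between are discounted linearly.
    """
    if score < spammin:       # not a spammer
        return 1.0
    if score > spammax:       # spammer
        return 0.0
    return (spammax - score) / (spammax - spammin)

relevance = 3.0 * spam_factor(0.5)   # halfway between the cutoffs -> factor 0.5
```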

19

Page 20: [In]formation Retrieval: Search at LinkedIn

Spam is an arms race.

§  We can’t reveal precisely which features we use for spam detection, or spammers will work around them.

§  Spammers will try to reverse-engineer us anyway.

§  Personalization benefits us and our legitimate users – it’s hard to spam your way to high personalized ranking.

§  Fighting spam is all about making the investment less profitable for the spammer.

20

Page 21: [In]formation Retrieval: Search at LinkedIn

Unified Search

21

Page 22: [In]formation Retrieval: Search at LinkedIn

Un-Unified Search

22

Page 23: [In]formation Retrieval: Search at LinkedIn

Introducing LinkedIn Unified Search!

Goal: make all of our content more discoverable.

Three new features:
§  Query Auto-Complete
§  Content Type Suggestions
§  Unified Search Result Page

23

Page 24: [In]formation Retrieval: Search at LinkedIn

Query Auto-Complete

24

Page 25: [In]formation Retrieval: Search at LinkedIn

Best completion not always the most popular.

§  In a heavy-tailed distribution, even the most popular queries account for only a small fraction of the distribution.

§  We don’t want to suggest generic queries that would produce useless results. –  e.g., c -> company, j -> jobs

§  The goal is not only to infer the user’s intent but also to suggest a search that yields relevant results across content types.
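As a toy illustration of weighting popularity by result usefulness, consider the sketch below. Both tables are invented for illustration; a production system would learn these signals from behavior.

```python
# Toy auto-complete scorer: rank completions by popularity weighted by
# result usefulness, so a generic completion like "jobs" cannot win on
# raw popularity alone.
POPULARITY = {"jobs": 1000, "java developer": 150, "john smith": 300}
RESULT_QUALITY = {"jobs": 0.05, "java developer": 0.8, "john smith": 0.9}

def complete(prefix, k=3):
    matches = [q for q in POPULARITY if q.startswith(prefix)]
    return sorted(matches,
                  key=lambda q: POPULARITY[q] * RESULT_QUALITY[q],
                  reverse=True)[:k]

complete("j")   # "john smith" (270) and "java developer" (120) beat "jobs" (50)
```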

25

Page 26: [In]formation Retrieval: Search at LinkedIn

Content Type Suggestions

26

Page 27: [In]formation Retrieval: Search at LinkedIn

How we compute content type suggestions.

§  Rank content types by likelihood of a successful search. –  Consider click-through behavior as well as downstream actions.

§  Bootstrap using what we know from pre-unified search behavior. –  Tricky part is compensating for findability bias.

§  Continuously evaluate and collect feedback through user behavior. –  E.g., members using the left rail to select a particular vertical.

27

Page 28: [In]formation Retrieval: Search at LinkedIn

Unified Search Result Page

28

Page 29: [In]formation Retrieval: Search at LinkedIn

Intent Detection and Page Construction

§  Relevance is now a two-part computation:

P(Content Type | User, Query) × P(Document | User, Query, Content Type)

§  Intent detection comes first: it is inefficient to send all queries to all verticals.

§  Secondary components introduce diversity.
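The two-part computation can be sketched as follows. The probability table and the `min_type_prob` cutoff are invented for illustration; the cutoff models the point above about not sending every query to every vertical.

```python
# Sketch of the two-part relevance factorization: score only the content
# types the intent model considers likely, then rank documents within each.
P_TYPE = {"people": 0.7, "jobs": 0.2, "companies": 0.1}   # P(type | user, query)

def rank(docs_by_type, p_doc, min_type_prob=0.15):
    """docs_by_type: {content type: [doc, ...]}.
    p_doc(doc, t): stands in for P(doc | user, query, t)."""
    scored = []
    for t, p_t in P_TYPE.items():
        if p_t < min_type_prob:       # intent detection: skip unlikely verticals
            continue
        for doc in docs_by_type.get(t, []):
            scored.append((doc, p_t * p_doc(doc, t)))
    return sorted(scored, key=lambda x: x[1], reverse=True)

results = rank({"people": ["alice smith"], "jobs": ["job-123"], "companies": ["acme"]},
               lambda doc, t: 0.5)   # stub document model
# "companies" (P = 0.1) is never queried; "alice smith" ranks first
```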

29

Page 30: [In]formation Retrieval: Search at LinkedIn

Summary

§  Personalize every search and leverage structure.
§  Understand queries as early as possible.
§  Fight the spammers that be.
§  Unify and simplify the search experience.

Goal: help LinkedIn’s 200M+ members find and be found.

30

Page 31: [In]formation Retrieval: Search at LinkedIn

Thank you!

31

Page 32: [In]formation Retrieval: Search at LinkedIn

Want to learn more?

§  Check out http://data.linkedin.com/search.

§  Contact us:
–  Shakti: [email protected], http://linkedin.com/in/sdsinha
–  Daniel: [email protected], http://linkedin.com/in/dtunkelang

§  Did we mention that we’re hiring?

32