How Google Works A Ranking Engineer’s Perspective Paul Haahr SMX West March 3, 2016

Page 1: How Google Works: A Ranking Engineer's Perspective By Paul Haahr

How Google Works
A Ranking Engineer’s Perspective
Paul Haahr
SMX West
March 3, 2016

Page 2: How Google Works: A Ranking Engineer's Perspective By Paul Haahr

Google Search Today

Page 3: How Google Works: A Ranking Engineer's Perspective By Paul Haahr

Mobile First

Page 4: How Google Works: A Ranking Engineer's Perspective By Paul Haahr

Features

• spelling suggestions

• autocomplete

• related searches

• related questions

• calculator

• knowledge graph

• answers

• featured snippets

• maps

• images

• videos

• in-depth articles

• movie showtimes

• sports scores

• weather

• flight status

• package tracking

• …

Erin Simon
not legally mandated, but typically we call it "knowledge graph" externally rather than "knowledge panels"
Kara Berman
Agreed from a PR perspective.
Paul Haahr
Fixed. (I thought knowledge graph was the underlying data and knowledge panels were the presentation. We do refer to them as knowledge panels, at least in the local case. E.g., https://support.google.com/business/answer/6331288?hl=en. But it's no issue to change, so I changed it.)
Page 5: How Google Works: A Ranking Engineer's Perspective By Paul Haahr

Ranking

Page 6: How Google Works: A Ranking Engineer's Perspective By Paul Haahr

10 Blue Links

Page 7: How Google Works: A Ranking Engineer's Perspective By Paul Haahr

What documents do we show?

What order do we show them in?

Page 8: How Google Works: A Ranking Engineer's Perspective By Paul Haahr

Life of a Query

Page 9: How Google Works: A Ranking Engineer's Perspective By Paul Haahr

Two Parts of a Search Engine
• Ahead of time (before the query)
• Query processing

Page 10: How Google Works: A Ranking Engineer's Perspective By Paul Haahr

Before the Query
• Crawl the web
• Analyze the crawled pages
  • Extract links
  • Render contents
  • Annotate semantics
  • …
• Build an index

Page 11: How Google Works: A Ranking Engineer's Perspective By Paul Haahr

The Index
• Like the index of a book
• For each word, a list of pages it appears on
• Broken up into groups of millions of pages
  • At Google, these are called “shards”
  • 1000s of shards for the web index
• Plus per-document metadata
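
The index slide can be illustrated with a toy inverted index split into shards. This is only a sketch: the round-robin assignment of pages to shards, the whitespace tokenizer, and the example URLs are assumptions made for illustration, not details from the talk.

```python
from collections import defaultdict

def build_shards(pages, num_shards):
    """Build one inverted index (word -> set of page ids) per shard.

    pages: dict mapping page id -> page text.
    Round-robin sharding is an assumption; the talk does not say how
    pages are assigned to shards.
    """
    shards = [defaultdict(set) for _ in range(num_shards)]
    for n, (page_id, text) in enumerate(sorted(pages.items())):
        shard = shards[n % num_shards]
        for word in text.lower().split():  # naive tokenizer (assumption)
            shard[word].add(page_id)
    return shards

# Hypothetical pages, for illustration only.
pages = {
    "a.example/1": "gm trucks for sale",
    "b.example/2": "gm corn yields",
    "c.example/3": "trucks and tractors",
}
shards = build_shards(pages, num_shards=2)
print(sorted(shards[0]["trucks"]))
```

Each shard answers "which of my pages contain this word?" on its own, which is what lets a query be sent to all shards in parallel.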

Kara Berman
Is this something we've said before? If not, do we want to share it?
Paul Haahr
I'm pretty sure Jeff, at least, has talked about it. (E.g., http://web.stanford.edu/class/cs276/Jeff-Dean-Stanford-CS276-April-2015.pdf) And it raised no issues for the engineers who reviewed it.
Page 12: How Google Works: A Ranking Engineer's Perspective By Paul Haahr

Query Processing
• Query understanding and expansion

• Retrieval and scoring

• Post-retrieval adjustments

Page 13: How Google Works: A Ranking Engineer's Perspective By Paul Haahr

Query Understanding
• Does the query name any known entities?
  • [san jose convention center]
  • [matt cutts]
• Are there useful synonyms?
  • [gm trucks]: “gm” → “general motors”
  • [gm corn]: “gm” → “genetically modified”
• Context matters
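
The "gm" example can be sketched as a context-dependent synonym lookup. The table below and the rule "expand only when another query term appears in the context set" are hypothetical; the talk gives only the two example queries and does not describe the real mechanism.

```python
# Hypothetical synonym table: term -> list of (context words, expansion).
SYNONYMS = {
    "gm": [
        ({"trucks", "cars"}, "general motors"),
        ({"corn", "crops"}, "genetically modified"),
    ],
}

def expand(query):
    """Return a dict of term -> expansion chosen from the other query terms."""
    terms = query.lower().split()
    expansions = {}
    for term in terms:
        others = set(terms) - {term}
        for context, synonym in SYNONYMS.get(term, []):
            if others & context:  # some other term matches this context
                expansions[term] = synonym
                break
    return expansions

print(expand("gm trucks"))
print(expand("gm corn"))
```

With no disambiguating context (the query [gm] alone), this sketch expands nothing.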

Page 14: How Google Works: A Ranking Engineer's Perspective By Paul Haahr

Retrieval and Scoring
• Send the query to all the shards
• Each shard
  • Finds matching pages
  • Computes a score for query+page
  • Sends back the top N pages by score
• Combine all the top pages
• Sort by score
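
The retrieval steps above amount to a scatter-gather pattern, sketched here. The scoring function (count of matching query terms) and the shard layout (page id mapped to a set of words) are stand-ins chosen for brevity, not the scoring Google uses.

```python
import heapq

def search_shard(shard, query_terms, n):
    """Score every matching page in one shard and return its top n.

    shard: dict mapping page id -> set of words on the page.
    The score is a toy: the number of query terms the page contains.
    """
    scored = []
    for page_id, words in shard.items():
        score = len(query_terms & words)
        if score > 0:
            scored.append((score, page_id))
    return heapq.nlargest(n, scored)

def search(shards, query, n=2):
    """Send the query to every shard, combine the top pages, sort by score."""
    terms = set(query.lower().split())
    candidates = []
    for shard in shards:  # in a real system these calls would run in parallel
        candidates.extend(search_shard(shard, terms, n))
    candidates.sort(reverse=True)
    return candidates

# Hypothetical shards, for illustration only.
shards = [
    {"p1": {"gm", "trucks"}, "p2": {"trucks"}},
    {"p3": {"gm", "corn"}, "p4": {"gm", "trucks", "sale"}},
]
results = search(shards, "gm trucks", n=1)
print(results)
```

Because each shard returns only its top N, the merge step handles at most N times the number of shards candidates, however large the index is.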

Page 15: How Google Works: A Ranking Engineer's Perspective By Paul Haahr

Post-retrieval adjustments
• Host clustering, sitelinks
• Is there too much duplication?
• Spam demotions, manual actions
• …
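
One simple reading of a post-retrieval adjustment is limiting how many results a single host contributes. The sketch below is an assumption about what such a step could look like; the cap of two per host and the function name are invented for illustration and do not describe how host clustering is actually done.

```python
from urllib.parse import urlsplit

def limit_per_host(ranked_urls, max_per_host=2):
    """Keep results in ranked order, dropping extras from the same host.

    max_per_host is a hypothetical parameter, not a value from the talk.
    """
    seen = {}
    kept = []
    for url in ranked_urls:
        host = urlsplit(url).netloc
        seen[host] = seen.get(host, 0) + 1
        if seen[host] <= max_per_host:
            kept.append(url)
    return kept

ranked = [
    "https://a.example/1",
    "https://a.example/2",
    "https://a.example/3",
    "https://b.example/1",
]
print(limit_per_host(ranked))
```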

Erin Simon
I'd avoid using the phrase "manual actions." is there another way you could talk about what you're doing so that it sounds less like we're deliberately interfering with the fair and neutral process of the algorithm? maybe something like 'legally mandated removals' to indicate that we are not just tweaking results for our own reasons.
Paul Haahr
This is meant to be about spam (consider the audience) and not legal removals. I originally had "manual penalties" and Cody said, per Larry's request, they now always use the terminology "manual actions." But I reverse the order to "Spam demotions, manual actions" to make it clear that they're related.
Page 16: How Google Works: A Ranking Engineer's Perspective By Paul Haahr

What do ranking engineers do? (version 1)

Write code for those servers

Page 17: How Google Works: A Ranking Engineer's Perspective By Paul Haahr

Scoring Signals

Page 18: How Google Works: A Ranking Engineer's Perspective By Paul Haahr

Signal
• A piece of information used in scoring
• Query independent – feature of page
  • PageRank, language, mobile friendliness, ...
• Query dependent – feature of page & query
  • keyword hits, synonyms, proximity, …
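
The distinction between the two kinds of signal can be shown with a toy scorer. Everything numeric here is hypothetical: the weights, the simple sum, and the page fields are assumptions, and the talk does not say how signals are combined.

```python
def keyword_hits(query, page_text):
    """Query-dependent signal: how often the query terms occur on the page."""
    words = page_text.lower().split()
    return sum(words.count(term) for term in query.lower().split())

def score(query, page):
    """Combine signals with a plain sum (an assumption, for illustration)."""
    # Query-independent: known before any query arrives.
    independent = page["pagerank"] + (0.5 if page["mobile_friendly"] else 0.0)
    # Query-dependent: needs both the page and the query.
    dependent = keyword_hits(query, page["text"])
    return independent + dependent

# Hypothetical page record.
page = {"pagerank": 1.0, "mobile_friendly": True, "text": "gm trucks and more trucks"}
print(score("trucks", page))
```

The query-independent part can be computed ahead of time and stored with the per-document metadata; only the query-dependent part has to be computed at query time.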

Page 19: How Google Works: A Ranking Engineer's Perspective By Paul Haahr

What do ranking engineers do? (version 2)

Look for new signals.

Combine old signals in new ways.

Page 20: How Google Works: A Ranking Engineer's Perspective By Paul Haahr

Metrics

Page 21: How Google Works: A Ranking Engineer's Perspective By Paul Haahr

“If you can not measure it, you can not improve it.”

–Lord Kelvin (sort of)

Page 22: How Google Works: A Ranking Engineer's Perspective By Paul Haahr

Key Metrics
• Relevance
  • Does a page usefully answer the user’s query?
  • Ranking’s top-line metric
• Quality
  • How good are the results we show?
• Time to result (faster is better)
• ...

Page 23: How Google Works: A Ranking Engineer's Perspective By Paul Haahr

Higher results matter
• “Position weighted”
• “Reciprocally ranked” metrics
  • Position 1 is worth 1
  • Position 2 is worth ½
  • Position 3 is worth ⅓
  • Position 4 is worth ¼
  • …
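
The reciprocal weighting above can be written out directly. The 1/position weights come from the slide; applying them as a weighted sum over per-result ratings is an assumption about how such a metric might be assembled.

```python
from fractions import Fraction

def position_weight(position):
    """Weight of a result at a 1-based position: 1, 1/2, 1/3, ..."""
    return Fraction(1, position)

def weighted_score(ratings):
    """Sum of rating * weight over results in ranked order.

    ratings: per-result ratings, index 0 being position 1.
    The weighted-sum form is an assumption for illustration.
    """
    return sum(position_weight(i) * r for i, r in enumerate(ratings, start=1))

print(weighted_score([1, 0, 0]))  # good result at position 1
print(weighted_score([0, 0, 1]))  # same result at position 3
```

The same good result contributes three times as much at position 1 as at position 3, which is why moving it up improves the metric.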

Page 24: How Google Works: A Ranking Engineer's Perspective By Paul Haahr

What do ranking engineers do? (version 3)

Optimize for our metrics

Page 25: How Google Works: A Ranking Engineer's Perspective By Paul Haahr

But where do the metrics come from?

Page 26: How Google Works: A Ranking Engineer's Perspective By Paul Haahr

Evaluation

Page 27: How Google Works: A Ranking Engineer's Perspective By Paul Haahr

How do we measure ourselves?
• Live Experiments
• Human Rater Experiments

Page 28: How Google Works: A Ranking Engineer's Perspective By Paul Haahr

Live Experiments

Page 29: How Google Works: A Ranking Engineer's Perspective By Paul Haahr

Live Experiments
• A/B experiments on real traffic
  • Similar to what many other websites do
• Look for changes in click patterns
  • Harder to understand than you might expect
• A lot of traffic is in one experiment or another

Page 30: How Google Works: A Ranking Engineer's Perspective By Paul Haahr

Interpreting Live Experiments
• Both pages P1 and P2 answer user’s need
• For P1, answer is on the page
• For P2, answer is on the page and in the snippet
• Algorithm A puts P1 before P2 ⇒ user clicks on P1 ⇒ “good”
• Algorithm B puts P2 before P1 ⇒ no click ⇒ “bad”
• Do we really think A is better than B?
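
The P1/P2 scenario can be replayed in a few lines to show why a click-only reading misleads. The user model (the user looks at the top result and skips the click when its snippet already answers the need) and the "click means good" verdict are simplifications assumed for this sketch.

```python
def simulate_click(ranking, answer_in_snippet):
    """Return the page the user clicks, or None if no click is needed.

    Assumed user model: only the top result is considered, and a snippet
    that already contains the answer satisfies the user without a click.
    """
    top = ranking[0]
    if answer_in_snippet[top]:
        return None
    return top

def naive_verdict(click):
    """Click-only interpretation: a click is 'good', no click is 'bad'."""
    return "good" if click is not None else "bad"

answer_in_snippet = {"P1": False, "P2": True}
verdict_a = naive_verdict(simulate_click(["P1", "P2"], answer_in_snippet))
verdict_b = naive_verdict(simulate_click(["P2", "P1"], answer_in_snippet))
print(verdict_a, verdict_b)
```

Under this model algorithm B serves the user faster, yet the click-only verdict labels it "bad".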

Page 31: How Google Works: A Ranking Engineer's Perspective By Paul Haahr

Human Rater Experiments

Page 32: How Google Works: A Ranking Engineer's Perspective By Paul Haahr

Human Rater Experiments
• Show real people experimental search results
• Ask how good the results are
• Ratings aggregated across raters
• Published guidelines explain criteria for raters
• Tools support doing this in an automated way

Page 33: How Google Works: A Ranking Engineer's Perspective By Paul Haahr
Page 34: How Google Works: A Ranking Engineer's Perspective By Paul Haahr

Result Rating Task

Page 35: How Google Works: A Ranking Engineer's Perspective By Paul Haahr

Two Scales
• Needs Met
  • Does this page address the user’s need?
  • Our current relevance metric
• Page Quality
  • How good is the page?

Page 36: How Google Works: A Ranking Engineer's Perspective By Paul Haahr

Mobile First

Page 37: How Google Works: A Ranking Engineer's Perspective By Paul Haahr

Mobile First Rating

“Needs Met rating tasks ask [raters] to focus on mobile user needs and think about how helpful and satisfying the result is for the mobile users.”

Page 38: How Google Works: A Ranking Engineer's Perspective By Paul Haahr

How do we make it mobile-centric?
• More mobile queries than desktop in samples
• Pay attention to user’s location
• Tools display mobile user experience
• Raters visit websites on smartphones

Page 39: How Google Works: A Ranking Engineer's Perspective By Paul Haahr

Needs Met Rating

Page 40: How Google Works: A Ranking Engineer's Perspective By Paul Haahr

Needs Met Rating
• Fully Meets
• Highly Meets
• Moderately Meets
• Slightly Meets
• Fails to Meet

(Following examples are from Rater Guidelines)

Page 41: How Google Works: A Ranking Engineer's Perspective By Paul Haahr

Fully Meets

Page 42: How Google Works: A Ranking Engineer's Perspective By Paul Haahr

(Very) Highly Meets

Page 43: How Google Works: A Ranking Engineer's Perspective By Paul Haahr

Highly Meets

Page 44: How Google Works: A Ranking Engineer's Perspective By Paul Haahr

(More) Highly Meets

Page 45: How Google Works: A Ranking Engineer's Perspective By Paul Haahr

Moderately Meets

Page 46: How Google Works: A Ranking Engineer's Perspective By Paul Haahr

Slightly Meets

Page 47: How Google Works: A Ranking Engineer's Perspective By Paul Haahr

Fails to Meet

Page 48: How Google Works: A Ranking Engineer's Perspective By Paul Haahr

Page Quality Rating

Page 49: How Google Works: A Ranking Engineer's Perspective By Paul Haahr

Page Quality Concepts
• Expertise
• Authoritativeness
• Trustworthiness

Page 50: How Google Works: A Ranking Engineer's Perspective By Paul Haahr

High Quality Pages
• A satisfying amount of high quality main content

• The page and website are expert, authoritative, and trustworthy for the topic of the page

• The website has a good reputation for the topic of the page

Page 51: How Google Works: A Ranking Engineer's Perspective By Paul Haahr

Low Quality Pages
• The quality of the main content is low

• There is an unsatisfying amount of main content

• The author does not have expertise or is not trustworthy or authoritative for the topic

• The website has a negative reputation

• The secondary content is distracting or unhelpful

Page 52: How Google Works: A Ranking Engineer's Perspective By Paul Haahr

Optimizing Our Metrics

Page 53: How Google Works: A Ranking Engineer's Perspective By Paul Haahr

Ranking engineers
• Team of a few hundred computer scientists
• Focused on our metrics and signals
• Run lots of experiments
• Make lots of changes

Page 54: How Google Works: A Ranking Engineer's Perspective By Paul Haahr

Development Process
• Idea
• Repeat until ready:
  • Write code
  • Generate data
  • Run experiments
  • Analyze
• Launch report by Quantitative Analyst
• Launch review

Page 55: How Google Works: A Ranking Engineer's Perspective By Paul Haahr

What do ranking engineers do? (version 4)

Move results with good ratings up.

Move results with bad ratings down.

Page 56: How Google Works: A Ranking Engineer's Perspective By Paul Haahr

What Goes Wrong?

(And how do we fix it?)

Page 57: How Google Works: A Ranking Engineer's Perspective By Paul Haahr

Two kinds of problems
• Systematically bad ratings
• Metrics don’t capture things we care about

Page 58: How Google Works: A Ranking Engineer's Perspective By Paul Haahr

Bad Ratings

Page 59: How Google Works: A Ranking Engineer's Perspective By Paul Haahr

[texas farm fertilizer]
• User is looking for a brand of fertilizer
• Unlikely to want to go to the manufacturer’s headquarters
• Rater average called map of headquarters almost “Highly Meets”
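
A rater average like the one mentioned above could be computed as follows. The mapping of the Needs Met scale to the numbers 0 to 4, the use of a plain mean, and the four sample ratings are all hypothetical; the talk does not publish the actual ratings or the aggregation method.

```python
from statistics import mean

# Needs Met labels in ascending order; the numeric mapping (list index)
# is an assumption made for this sketch.
SCALE = [
    "Fails to Meet",
    "Slightly Meets",
    "Moderately Meets",
    "Highly Meets",
    "Fully Meets",
]

def average_rating(labels):
    """Average the ratings of several raters on the assumed 0-4 scale."""
    return mean(SCALE.index(label) for label in labels)

# Invented ratings, not data from the talk.
ratings = ["Highly Meets", "Highly Meets", "Moderately Meets", "Highly Meets"]
avg = average_rating(ratings)
print(avg)
```

An average just under the value for "Highly Meets" looks like a strong result, even though, as the slide notes, the user is unlikely to want a map of the headquarters. The aggregate can be systematically wrong when raters share the same misreading.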

Page 60: How Google Works: A Ranking Engineer's Perspective By Paul Haahr

Patterns of Losses
• Look for things we think are bad in results
  • Either live or from experiments
• Create examples for rater guidelines

Page 61: How Google Works: A Ranking Engineer's Perspective By Paul Haahr

New rater example

Page 62: How Google Works: A Ranking Engineer's Perspective By Paul Haahr

Missing Metrics

Page 63: How Google Works: A Ranking Engineer's Perspective By Paul Haahr
Page 64: How Google Works: A Ranking Engineer's Perspective By Paul Haahr

Low Quality Content in 2009-2011
• Lots of complaints about low quality content
• But our relevance metric kept going up
  • Low quality pages can be very relevant
  • We thought we were doing great
• ⇒ We weren’t measuring what we needed to

Page 65: How Google Works: A Ranking Engineer's Perspective By Paul Haahr

Quality Metric
• Gets directly at the quality issue
• Not the same as relevance
• Enabled development of quality-related signals

Page 66: How Google Works: A Ranking Engineer's Perspective By Paul Haahr

When the Metrics Miss Something

Page 67: How Google Works: A Ranking Engineer's Perspective By Paul Haahr

What do ranking engineers do? (version 5)

Fix rater guidelines or develop new metrics

(when necessary)

Page 68: How Google Works: A Ranking Engineer's Perspective By Paul Haahr

Thank you!

Page 69: How Google Works: A Ranking Engineer's Perspective By Paul Haahr

Questions?