Introduction to Information Retrieval
http://informationretrieval.org

IIR 15-2: Learning to Rank

Hinrich Schütze
Institute for Natural Language Processing, Universität Stuttgart

2011-08-29



Models and Methods

1 Boolean model and its limitations (30)

2 Vector space model (30)

3 Probabilistic models (30)

4 Language model-based retrieval (30)

5 Latent semantic indexing (30)

6 Learning to rank (30)


Take-away

Machine-learned relevance: We use machine learning to learn the relevance score (retrieval status value) of a document with respect to a query.

Learning to rank: A machine-learning method that directly optimizes the ranking (as opposed to classification or regression accuracy)


Outline

1 Machine-learned relevance

2 Learning to rank


Machine-learned relevance: Basic idea

Given: A training set of examples, each of which is a tuple of: a query q, a document d, a relevance judgment for d on q

Learn weights from this training set, so that the learned scores approximate the relevance judgments in the training set


Machine-learned relevance vs. Text classification

Both are machine learning approaches.

Text classification (if used for information retrieval, e.g., in relevance feedback) is query-specific.

We need a query-specific training set to learn the ranker.
We need to learn a new ranker for each query.

Machine-learned relevance and learning to rank usually refer to query-independent ranking.

We learn a single classifier or ranker.
We can then rank documents for a query that we don't have any relevance judgments for.


Two typical features used in machine-learned relevance

The vector space cosine similarity between query and document (denoted α)

The minimum window width within which the query terms lie (denoted ω)

Thus, we have:
one feature (α) that captures overall query-document similarity
one feature (ω) that captures query term proximity (often indicative of topical relevance)
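The proximity feature ω can be computed with a scan over the positions of query terms in the document. A minimal sketch, not from the slides; the function name and whitespace tokenization are illustrative assumptions:

```python
def min_window(doc_tokens, query_terms):
    """Width (in tokens) of the smallest window of doc_tokens that
    contains every query term; None if some term never occurs."""
    query = set(query_terms)
    # Keep only the positions where a query term occurs, in document order.
    hits = [(pos, tok) for pos, tok in enumerate(doc_tokens) if tok in query]
    best = None
    for start in range(len(hits)):
        seen = set()
        for end in range(start, len(hits)):
            seen.add(hits[end][1])
            if seen == query:  # window hits[start..end] covers all terms
                width = hits[end][0] - hits[start][0] + 1
                if best is None or width < best:
                    best = width
                break  # widening this window further cannot shrink it
    return best
```

For example, min_window("the quick fox likes the lazy fox".split(), ["fox", "lazy"]) returns 2, from the window "lazy fox".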


Machine-learned relevance: Setup for these two features

Training set

Example  DocID  Query             α      ω  Judgment
Φ1       37     linux ...         0.032  3  relevant
Φ2       37     penguin ...       0.02   4  nonrelevant
Φ3       238    operating system  0.043  2  relevant
Φ4       238    runtime ...       0.004  2  nonrelevant
Φ5       1741   kernel layer      0.022  3  relevant
Φ6       2094   device driver     0.03   2  relevant
Φ7       3191   device driver     0.027  5  nonrelevant

α is the cosine score. ω is the window width.


Machine-learned relevance: Setup (2)

Two classes: relevant = 1 and nonrelevant = 0

We now seek a scoring function that combines the values of the features to generate a value that is (close to) 0 or 1.

We wish this function to be in agreement with our set of training examples as much as possible.

The simplest classifier is a linear classifier, defined by an equation of the form:

Score(d, q) = Score(α, ω) = aα + bω + c,

where we learn the coefficients a, b, c from training data.
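One simple way to learn a, b, c is a least-squares fit on the seven examples in the training table. A stdlib-only sketch under that assumption (the slides don't prescribe a fitting method; the solver and names are illustrative):

```python
# Training table above as (alpha, omega, judgment), relevant = 1, nonrelevant = 0.
examples = [
    (0.032, 3, 1), (0.020, 4, 0), (0.043, 2, 1), (0.004, 2, 0),
    (0.022, 3, 1), (0.030, 2, 1), (0.027, 5, 0),
]

# Normal equations X^T X w = X^T y for feature rows x = (alpha, omega, 1).
X = [(al, om, 1.0) for al, om, _ in examples]
y = [float(j) for _, _, j in examples]
XtX = [[sum(r[i] * r[j] for r in X) for j in range(3)] for i in range(3)]
Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(3)]

def solve(A, rhs):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(rhs)
    M = [row[:] + [rhs[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

a, b, c = solve(XtX, Xty)

def score(alpha, omega):
    return a * alpha + b * omega + c
```

On this data the fit comes out with a > 0 and b < 0: higher cosine similarity raises the score, a wider query-term window lowers it. Documents can then be called relevant when Score ≥ 0.5.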


Graphic representation of the training set



In this case, we learn a linear classifier in 2D

A linear classifier in 2D is a line described by the equation w1d1 + w2d2 = θ.

Example for a 2D linear classifier

Points (d1, d2) with w1d1 + w2d2 ≥ θ are in the class c.

Points (d1, d2) with w1d1 + w2d2 < θ are in the complement class c̄.
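The decision rule reads directly as code; a toy sketch with made-up weights and threshold:

```python
def in_class_c(d1, d2, w1, w2, theta):
    """True if the point (d1, d2) lies on the class-c side of the
    decision line w1*d1 + w2*d2 = theta."""
    return w1 * d1 + w2 * d2 >= theta
```

With w1 = w2 = 1 and θ = 2, for example, the point (3, 1) falls in c while (0.5, 0.5) falls in the complement class.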


Summary

Machine-learned relevance

Assemble a training set of query-document-judgment triples.
Train a classification or regression model on the training set.
For a new query, apply the model to all documents (actually: a subset).
Rank documents according to the model's decisions.
Return the top K (e.g., K = 10) to the user.

In principle, any classification/regression method can be used.

Big advantage: we avoid hand-tuning scoring functions and simply learn them from training data.

Bottleneck: we need to maintain a representative set of training examples whose relevance assessments must be made by humans.
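The rank-and-return steps of this pipeline can be sketched as follows; the helper name is illustrative, and sum() stands in for whatever scoring model was learned:

```python
def rank_top_k(candidates, model_score, k=10):
    """candidates: list of (doc_id, feature_vector) pairs for one query.
    Returns the k doc ids with the highest model scores, best first."""
    ranked = sorted(candidates, key=lambda c: model_score(c[1]), reverse=True)
    return [doc_id for doc_id, _ in ranked[:k]]

# Made-up candidate subset for one query; sum() stands in for a learned model.
candidates = [("d1", [0.1, 0.2]), ("d2", [0.9, 0.3]), ("d3", [0.5, 0.4])]
top = rank_top_k(candidates, sum, k=2)  # ["d2", "d3"]
```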


Machine-learned relevance for more than two features

The approach can be readily generalized to a large number of features.

Any measure that can be calculated for a query-document pair is fair game for this approach.


LTR features used by Microsoft Research (1)

Features derived from standard IR models: query term number, query term ratio, length, idf, sum/min/max/mean/variance of term frequency, sum/min/max/mean/variance of length-normalized term frequency, sum/min/max/mean/variance of tf-idf weight, boolean model, BM25, LM-absolute-discounting, LM-dirichlet, LM-jelinek-mercer

Most of these features can be computed for different zones: body, anchor, title, url, whole document
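One family from this list, the sum/min/max/mean/variance of the query terms' frequencies in a zone, might be computed like this (a sketch; the function name and whitespace tokenization are assumptions):

```python
from collections import Counter

def tf_stats(query_terms, zone_tokens):
    """Sum/min/max/mean/variance of the query terms' frequencies in one
    zone (e.g. body, anchor, or title text, already tokenized)."""
    tf = Counter(zone_tokens)
    freqs = [tf[t] for t in query_terms]
    mean = sum(freqs) / len(freqs)
    var = sum((f - mean) ** 2 for f in freqs) / len(freqs)
    return {"sum": sum(freqs), "min": min(freqs),
            "max": max(freqs), "mean": mean, "var": var}
```

For instance, tf_stats(["device", "driver"], "the device driver loads the device".split()) yields sum 3, min 1, max 2, mean 1.5, variance 0.25.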


LTR features used by Microsoft Research (2)

Web-specific features: number of slashes in url, length of url, inlink number, outlink number, PageRank, SiteRank

Spam features: QualityScore

Usage-based features: query-url click count, url click count, url dwell time

All of these features can be assembled into a big feature vector and then fed into the machine learning algorithm.


Shortcoming of what we’ve presented so far

Approaching IR ranking like we have done so far is not necessarily the right way to think about the problem.

Statisticians normally first divide problems into classification problems (where a categorical variable is predicted) versus regression problems (where a real number is predicted).

In between: the specialized field of ordinal regression

Machine learning for ad hoc retrieval is most properly thought of as an ordinal regression problem.

Next up: ranking SVMs, a machine learning method that learns an ordering directly.


Outline

1 Machine-learned relevance

2 Learning to rank


Basic setup for ranking SVMs

As before we begin with a set of judged query-document pairs.

But we do not represent them as query-document-judgment triples.

Instead, we ask judges, for each training query q, to order the documents that were returned by the search engine with respect to relevance to the query.

We again construct a vector of features ψj = ψ(dj, q) for each document-query pair – exactly as we did before.

For two documents di and dj, we then form the vector of feature differences:

Φ(di, dj, q) = ψ(di, q) − ψ(dj, q)
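The construction just described can be sketched in a few lines of Python. Everything here is a toy illustration: the feature map psi (two hand-made features) and the judged document ordering are hypothetical stand-ins for a real feature extractor and real relevance judgments.

```python
# Sketch of the ranking-SVM training-set construction.
# psi(d, q) and the judged ordering below are hypothetical stand-ins.

def psi(doc, query):
    """Toy feature map psi(d, q): query-term overlap and document length."""
    words = doc.split()
    overlap = sum(1 for w in words if w in query.split())
    return [overlap, len(words)]

def phi(di, dj, q):
    """Vector of feature differences: Phi(di, dj, q) = psi(di, q) - psi(dj, q)."""
    a, b = psi(di, q), psi(dj, q)
    return [x - y for x, y in zip(a, b)]

def pairwise_training_set(query, ranked_docs):
    """For each pair with di judged more relevant than dj, emit
    (Phi(di, dj, q), +1) and the mirrored (Phi(dj, di, q), -1)."""
    examples = []
    for i in range(len(ranked_docs)):
        for j in range(i + 1, len(ranked_docs)):
            di, dj = ranked_docs[i], ranked_docs[j]
            examples.append((phi(di, dj, query), +1))
            examples.append((phi(dj, di, query), -1))
    return examples

q = "learning to rank"
docs_in_judged_order = ["learning to rank with svms",
                        "learning methods overview",
                        "unrelated document"]
train = pairwise_training_set(q, docs_in_judged_order)
```

Note that each judged pair contributes two mirrored examples, so the resulting classification problem is balanced by construction.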


Training a ranking SVM

Vector of feature differences: Φ(di, dj, q) = ψ(di, q) − ψ(dj, q)

By hypothesis, one of di and dj has been judged more relevant.

Notation: We write di ≺ dj for “di precedes dj in the results ordering”.

If di is judged more relevant than dj, then we will assign the vector Φ(di, dj, q) the class yijq = +1; otherwise −1.

This gives us a training set of pairs of vectors and “precedence indicators”.

We can then train an SVM on this training set with the goal of obtaining a classifier that returns

w^T Φ(di, dj, q) > 0 iff di ≺ dj
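The training step can be illustrated with a minimal sketch. A real ranking SVM minimizes a margin-based hinge loss; the perceptron below, trained on the same kind of difference vectors with toy data, only illustrates the training signal (update w whenever a pair is ordered wrongly) and is not the actual SVM optimization.

```python
# Learn w such that dot(w, Phi(di, dj, q)) > 0 iff di should precede dj.
# Perceptron stand-in for the SVM; all data are toy.

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def train_pairwise(examples, dim, epochs=50, lr=0.1):
    """examples: list of (difference_vector, y) with y in {+1, -1}."""
    w = [0.0] * dim
    for _ in range(epochs):
        for x, y in examples:
            if y * dot(w, x) <= 0:          # ranking violation: update w
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
    return w

# Toy difference vectors: the first feature correlates with relevance,
# the second is noise. Each pair appears with its mirrored negative.
examples = [([2.0, -1.0], +1), ([-2.0, 1.0], -1),
            ([1.0, 0.5], +1), ([-1.0, -0.5], -1)]
w = train_pairwise(examples, dim=2)
# After training, every training pair is ordered correctly:
assert all(y * dot(w, x) > 0 for x, y in examples)
```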


Advantages of Ranking SVMs vs. Classification/regression

Documents can be evaluated relative to other candidate documents for the same query . . .

. . . rather than having to be mapped to a global scale of goodness.

This is often an easier problem to solve since just a ranking is required rather than an absolute measure of relevance.


Why simple ranking SVMs don’t work that well

Ranking SVMs treat all ranking violations alike.

But some violations are minor problems, e.g., getting the order of two relevant documents wrong.

Other violations are big problems, e.g., ranking a nonrelevant document ahead of a relevant document.

In most IR settings, getting the order of the top documents right is key.

In the simple setting we have described, top and bottom ranks will not be treated differently.

→ Learning-to-rank frameworks actually used in IR are more complicated than what we have presented here.
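One common remedy is to make pairwise violations unequally important by weighting each pair with the gap in graded relevance. The specific weighting below (difference of DCG-style gains) is a hypothetical illustration of this cost-sensitive idea, not the exact scheme of any deployed system.

```python
# Weight each document pair by how much swapping it would hurt.
# The gain function mirrors the graded gains used in DCG; the overall
# weighting scheme is a hypothetical illustration.

def pair_weight(rel_i, rel_j):
    """Cost of swapping documents with graded relevance rel_i and rel_j.
    Swapping two equally relevant documents costs nothing; putting a
    nonrelevant document above a highly relevant one costs the most."""
    gain = lambda r: 2 ** r - 1          # graded gain, as in DCG
    return abs(gain(rel_i) - gain(rel_j))

# Relevance grades: 2 = highly relevant, 1 = relevant, 0 = nonrelevant.
assert pair_weight(1, 1) == 0                        # minor violation: no cost
assert pair_weight(2, 0) > pair_weight(2, 1)         # big violation costs more
```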


Example for superior performance of LTR

SVM algorithm that directly optimizes MAP (as opposed to ranking).

Proposed by: Yue, Finley, Radlinski, Joachims, ACM SIGIR 2007.

Performance compared to state-of-the-art models: cosine, tf-idf, BM25, language models (Dirichlet and Jelinek-Mercer)

Learning-to-rank clearly better than non-machine-learning approaches
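Average precision, the quantity that MAP averages over queries, can be computed from a ranked list of binary relevance labels as follows (the standard definition, shown on toy data):

```python
# Average precision (AP) of one ranked result list; MAP is the mean of
# AP over a set of queries. Labels: 1 = relevant, 0 = nonrelevant.

def average_precision(labels):
    """AP = mean, over the ranks k holding relevant documents,
    of precision@k."""
    hits, precisions = 0, []
    for k, rel in enumerate(labels, start=1):
        if rel:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(precisions) if precisions else 0.0

# Toy ranking with relevant documents at ranks 1 and 3:
ap = average_precision([1, 0, 1, 0])
# precision@1 = 1/1, precision@3 = 2/3, so AP = (1 + 2/3) / 2 = 5/6
```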


Assessment of learning to rank

The idea of learning to rank is old.

Early work by Norbert Fuhr and William S. Cooper

Renewed recent interest due to:

Better machine learning methods becoming available
More computational power
Willingness to pay for large annotated training sets

Strengths of learning-to-rank

Humans are bad at fine-tuning a ranking function with dozens of parameters.
Machine-learning methods are good at it.
Web search engines use a large number of features → web search engines need some form of learning to rank.


Information retrieval models: Pros and Cons

Least effort: Boolean system

In general, low user satisfaction

A little bit more effort: Vector space model

Acceptable performance in many cases

State-of-the-art performance: BM25, LMs

You need to tune parameters.

Best performance: learning to rank

But you need an expensive training set

Noisy data or vocabulary mismatch queries/documents & no time to custom-build a solution & collection is not too large

Use Latent Semantic Indexing


Take-away

Machine-learned relevance: We use machine learning to learn the relevance score (retrieval status value) of a document with respect to a query.

Learning to rank: A machine-learning method that directly optimizes the ranking (as opposed to classification or regression accuracy)


Resources

Chapter 15 of Introduction to Information Retrieval

Resources at http://informationretrieval.org/essir2011

References to learning to rank literature
Microsoft learning to rank datasets
How Google tweaks ranking


Exercise

Write down the training set from the last exercise as a training set for a ranking SVM.
