
Evaluation Metrics
Presented by Dawn Lawrie

1

Some Possibilities
Precision
Recall
F-measure
Mean Average Precision
Mean Reciprocal Rank

2

Precision

The proportion of items in a set that are things of interest

Example: I'm interested in apples

[Figure: a set of 5 pieces of fruit, 3 of which are apples]

Precision = 3 apples / 5 pieces of fruit

3
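A minimal Python sketch of the apple example, assuming the set is represented as a list of fruit labels (the names are illustrative):

```python
# Precision: the fraction of items in the set that are things of interest.
fruit_set = ["apple", "apple", "apple", "pear", "orange"]  # 5 pieces of fruit, 3 apples

precision = fruit_set.count("apple") / len(fruit_set)
print(precision)  # 3 / 5 = 0.6
```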

Recall

The proportion of all the things of interest that appear in the set

Example: I'm looking for apples

[Figure: a set containing 3 apples; 6 apples exist in total]

Recall = 3 apples / 6 total apples

4
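The same sketch extended to recall, assuming 6 apples exist in total and only 3 of them were retrieved into the set:

```python
# Recall: the fraction of all things of interest that ended up in the set.
retrieved = ["apple", "apple", "apple", "pear", "orange"]  # 3 apples retrieved
total_apples = 6                                           # 6 apples exist overall

recall = retrieved.count("apple") / total_apples
print(recall)  # 3 / 6 = 0.5
```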

F-measure

Harmonic mean of precision and recall
A combined measure that values each the same

\[ F_1 = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}} \]

5
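A sketch of the F1 computation, applied to the precision and recall from the apple example:

```python
# F1: harmonic mean of precision and recall, weighting both equally.
def f1(precision, recall):
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

print(f1(0.6, 0.5))  # apple example: 2 * 0.6 * 0.5 / (0.6 + 0.5) ≈ 0.545
```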

Where to use

The set is well defined
Order of things in the set doesn't matter

6

But with a Ranked List

[Figure: two ranked lists of items, ranks 1 through 10]

7

Mean Average Precision

Also known as MAP
Favored IR metric for ranked retrieval

8

Computing Average Precision

Let Relevant = Set of Apples
Ordered list = ranked list

[Figure: a ranked list; the apples (relevant items) appear at ranks 2, 3, 6, 10, 11, and 12]

\[ \mathrm{AP}(\mathrm{Relevant}) = \frac{\sum_{r \in \mathrm{Relevant}} \mathrm{Precision}\big(\mathrm{Rank}(r)\big)}{|\mathrm{Relevant}|} \]

Accumulating precision at each relevant rank:
1/2 + 2/3 + 3/6 + 4/10 + 5/11 + 6/12

AP = (1/2 + 2/3 + 3/6 + 4/10 + 5/11 + 6/12) / 6 ≈ 0.50

9
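A sketch of the same computation in Python, assuming the only input is the list of ranks at which the relevant items (apples) appear:

```python
# Average Precision: sum of precision at each relevant rank, divided by |Relevant|.
def average_precision(relevant_ranks):
    ranks = sorted(relevant_ranks)
    # At the i-th relevant item, precision = i relevant items seen within the top `rank`.
    return sum(i / rank for i, rank in enumerate(ranks, start=1)) / len(ranks)

# Apples at ranks 2, 3, 6, 10, 11, 12:
print(average_precision([2, 3, 6, 10, 11, 12]))  # ≈ 0.50
```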

Compute MAP
Compute average over a query set

Apple Query
Blueberry Query
Pineapple Query
Banana Query

\[ \mathrm{MAP}(\mathrm{Query}) = \frac{\sum_{q \in \mathrm{Query}} \mathrm{AP}\big(\mathrm{Relevant}(q)\big)}{|\mathrm{Query}|} \]

10
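A sketch of MAP over a small query set; the query names mirror the slide, but the rank lists are made up for illustration:

```python
# MAP: the mean of the per-query Average Precision values.
def average_precision(relevant_ranks):
    ranks = sorted(relevant_ranks)
    return sum(i / r for i, r in enumerate(ranks, start=1)) / len(ranks)

queries = {
    "apple":     [2, 3, 6, 10, 11, 12],  # from the previous slide
    "blueberry": [1, 4],                 # illustrative
    "pineapple": [5],                    # illustrative
    "banana":    [1, 2, 3],              # illustrative
}
map_score = sum(average_precision(r) for r in queries.values()) / len(queries)
print(map_score)
```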

Limitation of MAP

Results can be biased for query sets that include queries with few relevant documents

11

Mean Reciprocal Rank

Reciprocal Rank of a query:
\[ \mathrm{RR}(q) = \begin{cases} 0 & \text{if } q \text{ retrieves no relevant documents} \\ \dfrac{1}{\mathrm{TopRank}(q)} & \text{otherwise} \end{cases} \]

\[ \mathrm{MRR}(\mathrm{Query}) = \frac{\sum_{q \in \mathrm{Query}} \mathrm{RR}(q)}{|\mathrm{Query}|} \]

12
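A sketch of the two formulas, assuming each query is summarized by the rank of its first relevant document (None when nothing relevant is retrieved):

```python
# Reciprocal Rank: 1 / TopRank(q), or 0 if q retrieves no relevant documents.
def reciprocal_rank(top_rank):
    return 0.0 if top_rank is None else 1.0 / top_rank

# MRR: the average of the per-query reciprocal ranks.
def mean_reciprocal_rank(top_ranks):
    return sum(reciprocal_rank(r) for r in top_ranks) / len(top_ranks)

print(reciprocal_rank(3))                  # first relevant document at rank 3 -> 0.333...
print(reciprocal_rank(None))               # no relevant document retrieved   -> 0.0
print(mean_reciprocal_rank([3, 1, None]))  # (1/3 + 1 + 0) / 3 ≈ 0.444
```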

Understanding MRR

Ranks: 5, 15, 205, 215
RR values: 0.2, 0.067, 0.0049, 0.0047

Average rank: 110    MRR: 0.069

13
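The slide's numbers can be reproduced directly from the four top ranks:

```python
# First relevant document found at ranks 5, 15, 205, and 215 for four queries.
ranks = [5, 15, 205, 215]

rr_values = [1 / r for r in ranks]
average_rank = sum(ranks) / len(ranks)
mrr = sum(rr_values) / len(rr_values)

print([round(v, 4) for v in rr_values])  # [0.2, 0.0667, 0.0049, 0.0047]
print(average_rank)                      # 110.0
print(round(mrr, 3))                     # 0.069
```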

MRR vs. Average Rank
MRR = MAP when there is one relevant document per query
Bounds the result between 0 and 1

1 is perfect retrieval
Average rank is greatly influenced by documents retrieved at large ranks

Large rank values do not reflect the practical importance of those documents

MRR minimizes the difference between deep ranks such as 750 and 900

14
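A quick numeric check of the last point: in reciprocal-rank terms, ranks 750 and 900 are nearly indistinguishable, even though average rank sees a 150-position gap:

```python
print(1 / 750, 1 / 900)   # ≈ 0.00133 vs ≈ 0.00111
print(900 - 750)          # 150-position gap as average rank sees it
print(1 / 750 - 1 / 900)  # ≈ 0.00022 gap as MRR sees it
```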

Take Home Message
P/R and F-measure are good for well-defined sets
MAP is good for ranked results when you're looking for 5+ things
MRR is good for ranked results when you're looking for fewer than 5 things, and best when you're looking for just 1 thing

15
