
Evaluation Metrics
Presented by Dawn Lawrie

1

Some Possibilities
Precision
Recall
F-measure
Mean Average Precision
Mean Reciprocal Rank

2

Precision

The proportion of items in a set that are things of interest

Example: I'm interested in apples

[Figure: a set of 5 pieces of fruit, 3 of which are apples]

Precision = 3 apples / 5 pieces of fruit

3
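A minimal Python sketch of the apple example, assuming the set is represented as a list of fruit labels (the names are illustrative):

```python
# Precision: the fraction of items in the set that are things of interest.
fruit_set = ["apple", "apple", "apple", "pear", "orange"]  # 5 pieces of fruit, 3 apples

precision = fruit_set.count("apple") / len(fruit_set)
print(precision)  # 3 / 5 = 0.6
```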

Recall

The proportion of all the things of interest that appear in the set

Example: I'm looking for apples

[Figure: a set containing 3 apples; 6 apples exist in total]

Recall = 3 apples / 6 total apples

4
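The same sketch extended to recall, assuming 6 apples exist in total and only 3 of them were retrieved into the set:

```python
# Recall: the fraction of all things of interest that ended up in the set.
retrieved = ["apple", "apple", "apple", "pear", "orange"]  # 3 apples retrieved
total_apples = 6                                           # 6 apples exist overall

recall = retrieved.count("apple") / total_apples
print(recall)  # 3 / 6 = 0.5
```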

F-measure

Harmonic mean of precision and recall
A combined measure that values each the same

\[ F_1 = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}} \]

5
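A sketch of the F1 computation, applied to the precision and recall from the apple example:

```python
# F1: harmonic mean of precision and recall, weighting both equally.
def f1(precision, recall):
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

print(f1(0.6, 0.5))  # apple example: 2 * 0.6 * 0.5 / (0.6 + 0.5) ≈ 0.545
```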

Where to use

The set is well defined
Order of things in the set doesn't matter

6

But with a Ranked List

[Figure: two ranked lists of items, ranks 1 through 10]

7

Mean Average Precision

Also known as MAP
Favored IR metric for ranked retrieval

8

Computing Average Precision

Let Relevant = Set of Apples
Ordered list = ranked list

[Figure: a ranked list; the apples (relevant items) appear at ranks 2, 3, 6, 10, 11, and 12]

\[ \mathrm{AP}(\mathrm{Relevant}) = \frac{\sum_{r \in \mathrm{Relevant}} \mathrm{Precision}\big(\mathrm{Rank}(r)\big)}{|\mathrm{Relevant}|} \]

Accumulating precision at each relevant rank:
1/2 + 2/3 + 3/6 + 4/10 + 5/11 + 6/12

AP = (1/2 + 2/3 + 3/6 + 4/10 + 5/11 + 6/12) / 6 ≈ 0.50

9
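A sketch of the same computation in Python, assuming the only input is the list of ranks at which the relevant items (apples) appear:

```python
# Average Precision: sum of precision at each relevant rank, divided by |Relevant|.
def average_precision(relevant_ranks):
    ranks = sorted(relevant_ranks)
    # At the i-th relevant item, precision = i relevant items seen within the top `rank`.
    return sum(i / rank for i, rank in enumerate(ranks, start=1)) / len(ranks)

# Apples at ranks 2, 3, 6, 10, 11, 12:
print(average_precision([2, 3, 6, 10, 11, 12]))  # ≈ 0.50
```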

Compute MAP
Compute average over a query set

Apple Query
Blueberry Query
Pineapple Query
Banana Query

\[ \mathrm{MAP}(\mathrm{Query}) = \frac{\sum_{q \in \mathrm{Query}} \mathrm{AP}\big(\mathrm{Relevant}(q)\big)}{|\mathrm{Query}|} \]

10
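A sketch of MAP over a small query set; the query names mirror the slide, but the rank lists are made up for illustration:

```python
# MAP: the mean of the per-query Average Precision values.
def average_precision(relevant_ranks):
    ranks = sorted(relevant_ranks)
    return sum(i / r for i, r in enumerate(ranks, start=1)) / len(ranks)

queries = {
    "apple":     [2, 3, 6, 10, 11, 12],  # from the previous slide
    "blueberry": [1, 4],                 # illustrative
    "pineapple": [5],                    # illustrative
    "banana":    [1, 2, 3],              # illustrative
}
map_score = sum(average_precision(r) for r in queries.values()) / len(queries)
print(map_score)
```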

Limitation of MAP

Results can be biased for query sets that include queries with few relevant documents

11

Mean Reciprocal Rank

Reciprocal Rank of a query:
\[ \mathrm{RR}(q) = \begin{cases} 0 & \text{if } q \text{ retrieves no relevant documents} \\ \dfrac{1}{\mathrm{TopRank}(q)} & \text{otherwise} \end{cases} \]

\[ \mathrm{MRR}(\mathrm{Query}) = \frac{\sum_{q \in \mathrm{Query}} \mathrm{RR}(q)}{|\mathrm{Query}|} \]

12
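A sketch of the two formulas, assuming each query is summarized by the rank of its first relevant document (None when nothing relevant is retrieved):

```python
# Reciprocal Rank: 1 / TopRank(q), or 0 if q retrieves no relevant documents.
def reciprocal_rank(top_rank):
    return 0.0 if top_rank is None else 1.0 / top_rank

# MRR: the average of the per-query reciprocal ranks.
def mean_reciprocal_rank(top_ranks):
    return sum(reciprocal_rank(r) for r in top_ranks) / len(top_ranks)

print(reciprocal_rank(3))                  # first relevant document at rank 3 -> 0.333...
print(reciprocal_rank(None))               # no relevant document retrieved   -> 0.0
print(mean_reciprocal_rank([3, 1, None]))  # (1/3 + 1 + 0) / 3 ≈ 0.444
```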

Understanding MRR

Ranks: 5, 15, 205, 215
RR values: 0.2, 0.067, 0.0049, 0.0047

Average rank: 110    MRR: 0.069

13
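The slide's numbers can be reproduced directly from the four top ranks:

```python
# First relevant document found at ranks 5, 15, 205, and 215 for four queries.
ranks = [5, 15, 205, 215]

rr_values = [1 / r for r in ranks]
average_rank = sum(ranks) / len(ranks)
mrr = sum(rr_values) / len(rr_values)

print([round(v, 4) for v in rr_values])  # [0.2, 0.0667, 0.0049, 0.0047]
print(average_rank)                      # 110.0
print(round(mrr, 3))                     # 0.069
```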

MRR vs. Average Rank
MRR = MAP when there is one relevant document per query
Bounds the result between 0 and 1

1 is perfect retrieval
Average rank is greatly influenced by documents retrieved at large ranks

Large rank values do not reflect the practical importance of those documents

MRR minimizes the difference between deep ranks such as 750 and 900

14
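A quick numeric check of the last point: in reciprocal-rank terms, ranks 750 and 900 are nearly indistinguishable, even though average rank sees a 150-position gap:

```python
print(1 / 750, 1 / 900)   # ≈ 0.00133 vs ≈ 0.00111
print(900 - 750)          # 150-position gap as average rank sees it
print(1 / 750 - 1 / 900)  # ≈ 0.00022 gap as MRR sees it
```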

Take Home Message
P/R and F-measure are good for well-defined sets
MAP is good for ranked results when you're looking for 5+ things
MRR is good for ranked results when you're looking for fewer than 5 things, and best when you're looking for just 1 thing

15
