Performance Evaluation and Quality Assessment Stefan Pletschacher Europeana Newspapers Information Day London, 9 June 2014


Performance Evaluation and Quality Assessment by Stefan Pletschacher, University of Salford. Presentation given at the Europeana Newspapers Information Day, held at the British Library on 9 June 2014.


Page 1: Performance Evaluation and Quality Assessment

Performance Evaluation and Quality Assessment

Stefan Pletschacher

Europeana Newspapers Information Day

London, 9 June 2014

Page 2: Performance Evaluation and Quality Assessment

This project is partially funded under the ICT Policy Support Programme (ICT PSP) as part of the

Competitiveness and Innovation Framework Programme by the European Community

http://ec.europa.eu/ict_psp

Scope

• Intelligent/value-adding processes in digitisation projects (as opposed to “mechanical” tasks)

• Objective performance indicators for individual processing steps

• Objective quality measures for overall results of refinement processes

• In-depth analysis of specific problems (not just Pass/Fail QA)


Europeana Newspapers Information Day, London, 9 June 2014

Page 3: Performance Evaluation and Quality Assessment


Importance of PE&QA in Digitisation Projects

• Planning

  • Feasibility

  • Prioritisation

  • Costs, time, manual steps, specialist software

  • Services, output formats

• Implementation

  • Setup of workflows

  • Identification of bottlenecks

  • Optimisation of individual processing steps

• Monitoring and controlling

  • Agreed quality levels (OCR tools and commissioned services)


Page 4: Performance Evaluation and Quality Assessment


Digitisation Workflows and Evaluation Approaches

① Scanning

② Image enhancement: page splitting, border removal, dewarping (page curl, arbitrary warping), noise removal, binarisation

③ Layout analysis: segmentation of regions, lines, words and characters; region classification; logical layout analysis

④ OCR

⑤ Post-processing


• Individual processing steps vs. entire workflow

• Direct vs. indirect

• Based on use scenarios

Page 5: Performance Evaluation and Quality Assessment


Performance Evaluation Overview

[Diagram: Image Repository, Evaluation Tools and Evaluation Results, with compatibility through one common format (PAGE)]

Page 6: Performance Evaluation and Quality Assessment


Layout Analysis (Segmentation and Classification)

[Figure: ground truth compared with the result, illustrating the types of errors. Source: NLT/USAL]

Types of errors:

• Miss / partial miss

• Split

• Misclassification

• Merge

• False detection
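
The error types on this slide can be illustrated with a small Python sketch that compares ground-truth regions with result regions. Everything here is an assumption made for illustration: the `Region` class, the use of axis-aligned bounding boxes, and the rule that any overlap counts as a correspondence. Real layout evaluation typically works on region outlines and more refined overlap criteria, so this is not the method of any particular tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    # Hypothetical, simplified region: a name, a type and a bounding box.
    name: str
    kind: str    # e.g. "text" or "image"
    box: tuple   # (x0, y0, x1, y1), axis-aligned


def overlap_area(a, b):
    """Area of the intersection of two boxes (0 if they do not overlap)."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return max(width, 0) * max(height, 0)


def classify_errors(ground_truth, result):
    """Return a list of (error_type, region_name) pairs."""
    errors = []
    for gt in ground_truth:
        hits = [r for r in result if overlap_area(gt.box, r.box) > 0]
        if not hits:
            errors.append(("miss", gt.name))
            continue
        if len(hits) > 1:
            errors.append(("split", gt.name))
        # Assumes result regions do not overlap each other.
        covered = sum(overlap_area(gt.box, r.box) for r in hits)
        if covered < overlap_area(gt.box, gt.box):
            errors.append(("partial miss", gt.name))
        if any(r.kind != gt.kind for r in hits):
            errors.append(("misclassification", gt.name))
    for r in result:
        hits = [gt for gt in ground_truth if overlap_area(gt.box, r.box) > 0]
        if not hits:
            errors.append(("false detection", r.name))
        elif len(hits) > 1:
            errors.append(("merge", r.name))
    return errors


ground_truth = [
    Region("A", "text", (0, 0, 10, 10)),
    Region("B", "text", (20, 0, 30, 10)),
    Region("C", "image", (0, 20, 10, 30)),
]
result = [
    Region("R1", "text", (0, 0, 30, 10)),    # covers both A and B
    Region("R2", "text", (50, 50, 60, 60)),  # matches nothing
]
print(classify_errors(ground_truth, result))
# [('miss', 'C'), ('merge', 'R1'), ('false detection', 'R2')]
```

In this toy page, the two text columns A and B are merged into one result region, the image C is missed, and R2 is a false detection.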

Page 7: Performance Evaluation and Quality Assessment


Reading Order Detection

[Figure: reading order of the ground truth compared with the reading order of the result]

Page 8: Performance Evaluation and Quality Assessment


Text Recognition

• Comparison of Ground Truth and OCR output based on encoded text (ASCII, Unicode)

• Normalisation

• Character accuracy

• Distance measure: minimum number of edit operations (insertions, deletions, substitutions)

• Per character class (lower case, upper case, whitespace characters, numbers, symbols)

• Word accuracy

• Correctly recognised words vs. total word count

• Bag of words (index, ranking)

• Stop words and non-stop words

• Rejected and suspicious characters/words

• Substitution errors (higher penalty)

• OCR confidence ≠ accuracy
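
As a rough illustration of the Bag of Words measure listed above, the following Python sketch computes a success rate in two ways. The function names and the exact definitions are assumptions made for this example: the count-based variant is taken to respect how often each word occurs, while the index-based variant only asks whether a ground-truth word appears at all, as a search index would. The actual evaluation tools may define and normalise these measures differently.

```python
from collections import Counter


def tokenize(text):
    # Hypothetical normalisation: lower-case and split on whitespace.
    return text.lower().split()


def bag_of_words_count_based(ground_truth, ocr_output):
    """Share of ground-truth word occurrences also found in the OCR output."""
    gt = Counter(tokenize(ground_truth))
    ocr = Counter(tokenize(ocr_output))
    total = sum(gt.values())
    if total == 0:
        return 1.0
    # Counter intersection keeps the minimum count per word.
    matched = sum((gt & ocr).values())
    return matched / total


def bag_of_words_index_based(ground_truth, ocr_output):
    """Share of distinct ground-truth words that occur in the OCR output."""
    gt = set(tokenize(ground_truth))
    ocr = set(tokenize(ocr_output))
    if not gt:
        return 1.0
    return len(gt & ocr) / len(gt)


gt_text = "the cat and the hat"
ocr_text = "the cat and tne hat"  # one "the" misrecognised as "tne"
print(bag_of_words_count_based(gt_text, ocr_text))  # 0.8
print(bag_of_words_index_based(gt_text, ocr_text))  # 1.0
```

The example shows why the two variants can disagree: one occurrence of "the" is lost, which lowers the count-based rate, but the word is still findable, so the index-based rate is unaffected.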

Example: ground truth “OCR is cool” vs. OCR result “OOR is cod”
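
Applied to the example pair above, character and word accuracy can be sketched in Python with a plain edit distance (insertions, deletions and substitutions, each with cost 1). This is a minimal sketch under assumptions: the function names are hypothetical, no normalisation is applied, and accuracy is taken as one minus the number of edit operations divided by the ground-truth length. Per-class accuracy and higher penalties for substitutions are not modelled.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance between two sequences (strings or word lists)."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_item in enumerate(reference, start=1):
        current = [i]
        for j, hyp_item in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (ref_item != hyp_item)
            deletion = previous[j] + 1       # drop ref_item
            insertion = current[j - 1] + 1   # add hyp_item
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def accuracy(reference, hypothesis):
    """1 - (edit operations / reference length), clipped at 0."""
    if len(reference) == 0:
        return 1.0 if len(hypothesis) == 0 else 0.0
    errors = edit_distance(reference, hypothesis)
    return max(0.0, 1.0 - errors / len(reference))


ground_truth = "OCR is cool"
ocr_output = "OOR is cod"

print(edit_distance(ground_truth, ocr_output))                    # 3
print(round(accuracy(ground_truth, ocr_output), 3))               # 0.727
print(round(accuracy(ground_truth.split(), ocr_output.split()), 3))  # 0.333
```

Three character edits (C to O, one deleted "o", l to d) give a character accuracy of about 73%, yet only one of the three words survives intact, which is why character and word accuracy should be read separately.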

Page 9: Performance Evaluation and Quality Assessment


Interpretation of Results


• Metrics

  • Measurements of conditions

  • Types and number of errors

• Scenarios

  • Application context

  • Error weights

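[Diagram: individual errors (miss, misclassification, merge, split, false detection) are aggregated per type, e.g. merges M1, M2, M3 into a merge rate and splits S1, S2, ... into a split rate, and the per-type rates are combined into an overall error rate]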

• Overall success/error rates are based on

  • weighted individual results

  • type and size of affected regions

  • allowable vs. non-allowable errors
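
The idea of scenario-dependent error weights can be sketched as follows. This is only an illustration under assumptions: the weight values, the scenario names used as dictionary keys, and the formula (weighted affected area divided by total ground-truth area) are invented for the example and are not taken from the evaluation tools.

```python
# Hypothetical weights per use scenario: 0 would mean an allowable error,
# 1 a fully penalised one. The numbers are made up for illustration.
SCENARIO_WEIGHTS = {
    "keyword search": {
        "miss": 1.0, "misclassification": 0.5,
        "merge": 0.2, "split": 0.1, "false detection": 0.3,
    },
    "phrase search": {
        "miss": 1.0, "misclassification": 0.5,
        "merge": 1.0, "split": 0.5, "false detection": 0.3,
    },
}


def weighted_error_rate(errors, weights, total_area):
    """Weighted affected area as a share of the total ground-truth area.

    errors: iterable of (error_type, affected_area) pairs.
    Unknown error types are penalised with weight 1.0.
    """
    if total_area <= 0:
        raise ValueError("total_area must be positive")
    penalty = sum(weights.get(kind, 1.0) * area for kind, area in errors)
    return min(1.0, penalty / total_area)


# The same detected errors, interpreted under two scenarios.
errors = [("miss", 100), ("merge", 500)]
for scenario, weights in SCENARIO_WEIGHTS.items():
    rate = weighted_error_rate(errors, weights, total_area=10_000)
    print(f"{scenario}: success rate {1 - rate:.2%}")
```

With these made-up weights, the merge is nearly allowable for keyword search but fully penalised for phrase search, so the same result scores differently depending on the application context.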

Page 10: Performance Evaluation and Quality Assessment


Use Scenarios


Use Scenario Frequency

Keyword search in full text 82%

Image browsing - provenance (incl. title, year etc) 76%

Phrase search in full text 76%

Information aggregation (linking to related resources) 59%

Metadata-based search 53%

Crowd sourcing (correction/enrichment) 53%

Semantic search (respecting context) 41%

Access via content structure (article tracking, TOC etc.) 41%

Geolocation based services 29%

Print/eBook on demand 29%

Access through mobile Apps 29%

Translation (incl. search term translation for retrieval, historical - modern language) 29%

Content-based image retrieval (textual description and/or image as query) 29%

Image browsing - categories (like advertisement, image) 24%

Text mining 24%

Search hit highlighting 24%

Content summarisation 18%

Social media integration (and vice versa integration in social media) 18%

Repurposing/Reformatting 18%

Recommendations 12%

Screen reader (text to speech) 12%

Information retrieval (incl. queries in natural language, fuzzy search etc) 12%

Negative search (eliminate results according to unwanted keywords) 6%

Intended/conceivable use scenarios, based on a survey among 17 project partners in the Europeana Newspapers project (2013)

Page 11: Performance Evaluation and Quality Assessment


PRImA Tools

[Diagram: overview of the PRImA tools around PAGE XML (layout and text content), document images and repositories. Labels recoverable from the diagram:]

• Tools and prototypes: Aletheia, Web Aletheia, JAletheia, Crowd Prototype, Tesseract Exporter, FineReader Exporter, SimplePageExporter C++, Typewritten OCR, Segmenter, Converter, Validator, Dewarping, Image Tool, Metadata Extractor, Snippet Extractor, Serialised Text Exporter, PAGE to SVG, Optimiser, Sandbox

• Evaluation: Layout Eval (layout correspondence, reading order), Text Eval (Bag of Words, character and word accuracy), Dewarping Eval; outputs Layout Quality and OCR Accuracy

• Functions: validation, conversion, filtering; threshold, Otsu and Sauvola binarisation; image and PAGE XML snippets

• Formats: PAGE XML, ALTO XML, FineReader XML, Gamera XML (PAGE Scanner), XSD

• Legend: Tool, Prototype, Data; Java, Web, Command Line

Page 12: Performance Evaluation and Quality Assessment


Ground Truth Production

• Aletheia

  • Page border

  • Print space

  • Layout regions (incl. metadata)

  • Text lines

  • Words

  • Glyphs

  • Unicode text

  • Reading order

  • Layers

• Ground Truth Validator

• FineReader Engine Exporter (Preproduction)

• Ground Truth Converter/Normaliser

Page 13: Performance Evaluation and Quality Assessment


Evaluation Tools

• Segmentation, Classification, and Reading Order

• OCR Text

• Deskewing

• Dewarping

• Border Removal

• Binarisation

• Double Page Splitting


Page 14: Performance Evaluation and Quality Assessment


Case Studies

Usability and Potential of Existing Material

When to re-process existing material?

• Evolutionary Improvements in OCR Technology

• Specifically Trained OCR Engines

• Dedicated Language and Font Support

• Re-scanning

YMMV – results depend strongly on the material in question


Page 15: Performance Evaluation and Quality Assessment


Evolutionary Improvements in OCR Technology

ABBYY FRE9 vs. FRE10

[Chart: OCR Performance (Bag of Words). Axes: OCR Engine (FRE 9, FRE 10) against Success Rate. Series: count based, index based. Data labels: 72.7% (FRE 9) and 73.7% (FRE 10); 67.4% (FRE 9) and 68.7% (FRE 10). Annotated change: +1%]

[Chart: Layout Analysis Performance. Axes: OCR Engine (FineReader 9, FineReader 10) against Success Rate. Scenarios shown: General Recognition; Access via Content Structure; Content-Based Image Retrieval; IMPACT - Text Structure Scenario (no reading order); Keyword Search in Full Text; Phrase Search in Full Text; Print and eBook on Demand. Data labels as pairs (FineReader 9, FineReader 10), in the order they appear: 80.9%, 75.5%; 69.6%, 68.0%; 98.5%, 96.4%; 85.1%, 79.0%; 90.5%, 85.1%; 71.8%, 70.8%; 67.9%, 65.9%. Annotated change: -1..-6%]

Page 16: Performance Evaluation and Quality Assessment


Specifically Trained OCR Engines

[Chart: Layout Analysis of Historical Newspapers: off-the-shelf software vs. optimised/trained systems (HNLA2013). Success Rate, OCR Scenario. Data labels in the order they appear: Tesseract 3 53.3%, FRE 10 80.3%, EPITA 81.4%, JOUVE 82.0%, PAL 85.2%, Fraunhofer 2013 84.3%, Fraunhofer 2011 82.4%. Annotated change: +5%]

Page 17: Performance Evaluation and Quality Assessment


Dedicated Language and Font Support

Recognition of Blackletter (Fraktur) Documents: Standard vs. Gothic Mode (ABBYY FRE10)

[Chart: OCR Performance (Bag of Words). Axes: Setting for OCR Engine (Normal, Gothic) against Success Rate. Series: count based, index based. Data labels: 30.8% (Normal) and 94.2% (Gothic); 30.2% (Normal) and 94.0% (Gothic). Annotated change: +64%]

[Chart: Layout Analysis Performance. Axes: FRE10 OCR Engine Setting (Normal, Gothic) against Success Rate. Scenarios shown: General Recognition; Access via Content Structure; Content-Based Image Retrieval; IMPACT - Text Structure Scenario (no reading order); Keyword Search in Full Text; Phrase Search in Full Text; Print and eBook on Demand. Data labels as pairs (Normal, Gothic), in the order they appear: 92.5%, 93.4%; 75.3%, 74.9%; 99.5%, 99.5%; 94.2%, 94.4%; 94.7%, 94.6%; 76.6%, 76.1%; 75.0%, 74.7%]

Page 18: Performance Evaluation and Quality Assessment


Re-scanning

OCR results on bi-tonal and re-scanned greyscale images for documents with varying contrast (ABBYY FRE10):

[Chart: OCR Performance (Bag of Words). Axes: Dataset (Original, Re-scanned) against Success Rate. Series: count based, index based. Data labels: 27.7% (Original) and 64.2% (Re-scanned); 28.0% (Original) and 62.9% (Re-scanned). Annotated change: +35%]

Page 19: Performance Evaluation and Quality Assessment


More information:

PRImA

www.primaresearch.org

Europeana Newspapers

www.europeana-newspapers.eu/
