30
Christian Federmann, DFKI LT Lab Appraise |əˈprāz|

Appraise - mt-archive.info · System Description ‣Appraise is... ‣an open-source tool for MT evaluation ‣available under a BSD license from GitHub ‣actively maintained and

  • Upload
    others

  • View
    4

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Appraise - mt-archive.info · System Description ‣Appraise is... ‣an open-source tool for MT evaluation ‣available under a BSD license from GitHub ‣actively maintained and

Christian Federmann, DFKI LT Lab

Appraise |əˈprāz|

Page 2: Appraise - mt-archive.info · System Description ‣Appraise is... ‣an open-source tool for MT evaluation ‣available under a BSD license from GitHub ‣actively maintained and

Appraise |əˈprāz|

Christian Federmann, DFKI LT Lab

verb [ with obj. ]assess the value or quality of

Page 3: Appraise - mt-archive.info · System Description ‣Appraise is... ‣an open-source tool for MT evaluation ‣available under a BSD license from GitHub ‣actively maintained and

Overview

‣Appraise in a nutshell

‣Annotation tasks

‣Added extras

‣Available for free

‣Outlook

Page 4: Appraise - mt-archive.info · System Description ‣Appraise is... ‣an open-source tool for MT evaluation ‣available under a BSD license from GitHub ‣actively maintained and

Appraise in a nutshell

Page 5: Appraise - mt-archive.info · System Description ‣Appraise is... ‣an open-source tool for MT evaluation ‣available under a BSD license from GitHub ‣actively maintained and

System Description

‣Appraise is...

‣ an open-source tool for MT evaluation

‣ available under a BSD license from GitHub

‣ actively maintained and extended

‣ alive and kicking since 2008...

Page 6: Appraise - mt-archive.info · System Description ‣Appraise is... ‣an open-source tool for MT evaluation ‣available under a BSD license from GitHub ‣actively maintained and

Features

‣Appraise implements...

‣ translation quality checking

‣ ranking of translations

‣ error classification

‣manual post-editing

Page 7: Appraise - mt-archive.info · System Description ‣Appraise is... ‣an open-source tool for MT evaluation ‣available under a BSD license from GitHub ‣actively maintained and

Batteries Included

‣Appraise comes...

‣ ready for deployment as web-based service

‣with inter-annotator agreement scores

‣ offering XML-based import/export formats

‣ featuring full-blown administration backend

Page 8: Appraise - mt-archive.info · System Description ‣Appraise is... ‣an open-source tool for MT evaluation ‣available under a BSD license from GitHub ‣actively maintained and

Say “hello”...

Page 9: Appraise - mt-archive.info · System Description ‣Appraise is... ‣an open-source tool for MT evaluation ‣available under a BSD license from GitHub ‣actively maintained and

Annotation Tasks

Page 10: Appraise - mt-archive.info · System Description ‣Appraise is... ‣an open-source tool for MT evaluation ‣available under a BSD license from GitHub ‣actively maintained and

Ranking

‣Ranking of n ≥ 2 translations

‣ source text and, optionally, reference shown

‣wherever available, context is displayed

‣ standard browser controls for ranking

‣UI supports keyboard shortcuts

Page 11: Appraise - mt-archive.info · System Description ‣Appraise is... ‣an open-source tool for MT evaluation ‣available under a BSD license from GitHub ‣actively maintained and

Ranking

Page 12: Appraise - mt-archive.info · System Description ‣Appraise is... ‣an open-source tool for MT evaluation ‣available under a BSD license from GitHub ‣actively maintained and

3-Way Ranking

Page 13: Appraise - mt-archive.info · System Description ‣Appraise is... ‣an open-source tool for MT evaluation ‣available under a BSD license from GitHub ‣actively maintained and

Error Classification

‣Classification of errors in given translation

‣ source (or target) text and translation shown

‣ classification based on (Vilar et al., 2006)

‣ global errors (“missing words”, “too many”)

‣ local error classes (“minor”, “severe”)

Page 14: Appraise - mt-archive.info · System Description ‣Appraise is... ‣an open-source tool for MT evaluation ‣available under a BSD license from GitHub ‣actively maintained and

Error Classification

Page 15: Appraise - mt-archive.info · System Description ‣Appraise is... ‣an open-source tool for MT evaluation ‣available under a BSD license from GitHub ‣actively maintained and

Quality Estimation

‣Quickly estimate quality of given translations

‣ source text and translation shown

‣ depending on task design, reference given

‣ user selects one of three result categories

‣ focus on quick decisions (“gut feeling”)

Page 16: Appraise - mt-archive.info · System Description ‣Appraise is... ‣an open-source tool for MT evaluation ‣available under a BSD license from GitHub ‣actively maintained and

Quality Estimation

Page 17: Appraise - mt-archive.info · System Description ‣Appraise is... ‣an open-source tool for MT evaluation ‣available under a BSD license from GitHub ‣actively maintained and

Post-editing

‣ Post-edit a given translation

‣ source and one or more translations shown

‣ user selects translation “easiest to post-edit”

‣ “Translate from scratch” mode

‣ simple standard browser controls used

Page 18: Appraise - mt-archive.info · System Description ‣Appraise is... ‣an open-source tool for MT evaluation ‣available under a BSD license from GitHub ‣actively maintained and

Post-editing

Page 19: Appraise - mt-archive.info · System Description ‣Appraise is... ‣an open-source tool for MT evaluation ‣available under a BSD license from GitHub ‣actively maintained and

Added Extras

Page 20: Appraise - mt-archive.info · System Description ‣Appraise is... ‣an open-source tool for MT evaluation ‣available under a BSD license from GitHub ‣actively maintained and

Task Status Overview

Page 21: Appraise - mt-archive.info · System Description ‣Appraise is... ‣an open-source tool for MT evaluation ‣available under a BSD license from GitHub ‣actively maintained and

Agreement Scores

‣Appraise gives you...

‣Krippendorff ’s α (Krippendorff, 2004)

‣ Fleiss’ κ (Fleiss, 1971) based on (Cohen, 1960)

‣ Bennett’s S (Bennett, 1954)

‣ Scott’s π (Scott, 1955)

Page 22: Appraise - mt-archive.info · System Description ‣Appraise is... ‣an open-source tool for MT evaluation ‣available under a BSD license from GitHub ‣actively maintained and

Agreement Scores

Page 23: Appraise - mt-archive.info · System Description ‣Appraise is... ‣an open-source tool for MT evaluation ‣available under a BSD license from GitHub ‣actively maintained and

Admin Backend

Page 24: Appraise - mt-archive.info · System Description ‣Appraise is... ‣an open-source tool for MT evaluation ‣available under a BSD license from GitHub ‣actively maintained and

Available for free

Page 25: Appraise - mt-archive.info · System Description ‣Appraise is... ‣an open-source tool for MT evaluation ‣available under a BSD license from GitHub ‣actively maintained and

Hosted on GitHub

Page 26: Appraise - mt-archive.info · System Description ‣Appraise is... ‣an open-source tool for MT evaluation ‣available under a BSD license from GitHub ‣actively maintained and

Hosted on GitHub

‣Get it at...

‣ https://github.com/cfedermann/Appraise

Page 27: Appraise - mt-archive.info · System Description ‣Appraise is... ‣an open-source tool for MT evaluation ‣available under a BSD license from GitHub ‣actively maintained and

Outlook

Page 28: Appraise - mt-archive.info · System Description ‣Appraise is... ‣an open-source tool for MT evaluation ‣available under a BSD license from GitHub ‣actively maintained and

Outlook

‣Appraise will...

‣ happily support people trying it out

‣ soon get more refined task status views

‣ explore new annotation user interfaces

‣ improve from your feedback/pull requests

Page 29: Appraise - mt-archive.info · System Description ‣Appraise is... ‣an open-source tool for MT evaluation ‣available under a BSD license from GitHub ‣actively maintained and

Questions?

Page 30: Appraise - mt-archive.info · System Description ‣Appraise is... ‣an open-source tool for MT evaluation ‣available under a BSD license from GitHub ‣actively maintained and

Acknowledgements

This work has been funded under the Seventh Framework Programme for Research and Technological Development of the European Commission (grant agreement no.: 249119) through the T4ME contract.

We are grateful to the anonymous reviewers for their valuable feedback on the draft version of the MTM paper.