Towards the New Czech Grammar-checker RASLAN 2018 Vojtěch Mrkývka [email protected] December 7, 2018


Towards the New Czech Grammar-checker
RASLAN 2018

Vojtěch Mrkývka
[email protected]

December 7, 2018


Introduction: Goal

  • New grammar-checker of Czech
  • Web-based application
  • Using new and existing tools developed at MU

V.Mrkývka · Towards the New Czech Grammar-checker · December 7, 2018 2 / 13


Introduction: Motivation

  • Suitable tools already exist or are in development at MU
  • The current best Czech grammar-checker is part of a proprietary system
  • Create an alternative to applications like Grammarly, but for Czech


Current version: Interface

The current interface

  • Based on the online text editor TinyMCE
  • Written mostly in JavaScript, as TinyMCE modules
  • Asynchronous processing
  • Communication with the backend via AJAX


Current version: Processing diagram

[Diagram with the stages: tokenization; lemmatization & tagging; four correction modules, each labelled "some module"; correction displaying]
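The stages named in the diagram can be read as a simple pipeline in which independent modules inspect the same token list and report corrections. The following is a minimal Python sketch under stated assumptions, not the project's actual code (the slides describe a JavaScript front end and do not show the backend). The regular-expression tokenizer, the `Correction` tuple, and both module functions are hypothetical, and the lemmatization and tagging stage is only marked by a comment because the slides do not say which tagger is used.

```python
import re
from typing import Callable, List, NamedTuple


class Correction(NamedTuple):
    """Hypothetical record: which token to change, and how."""
    index: int
    original: str
    replacement: str


# A module takes the token list and returns the corrections it found.
Module = Callable[[List[str]], List[Correction]]


def tokenize(text: str) -> List[str]:
    # Assumption: whitespace runs are kept as tokens of their own.
    return re.findall(r"\s+|\w+|[^\w\s]", text)


def multiple_whitespaces(tokens: List[str]) -> List[Correction]:
    # Hypothetical module: collapse a run of whitespace into one space.
    return [Correction(i, t, " ")
            for i, t in enumerate(tokens)
            if t.isspace() and len(t) > 1]


def space_before_punctuation(tokens: List[str]) -> List[Correction]:
    # Hypothetical module: drop whitespace directly before punctuation.
    found = []
    for i in range(len(tokens) - 1):
        if tokens[i].isspace() and tokens[i + 1] in {".", ",", ";", ":", "!", "?"}:
            found.append(Correction(i, tokens[i], ""))
    return found


def run_pipeline(text: str, modules: List[Module]) -> List[Correction]:
    tokens = tokenize(text)
    # Lemmatization and tagging would run here; omitted in this sketch.
    corrections: List[Correction] = []
    for module in modules:
        corrections.extend(module(tokens))
    # Sorted by token index, ready to hand to the displaying stage.
    return sorted(corrections)


result = run_pipeline("The  dog is running .",
                      [multiple_whitespaces, space_before_punctuation])
for correction in result:
    print(correction)
```

Keeping every module behind the same call signature means a new check can be added without touching the tokenizer or the displaying stage, which is one plausible reason for the modular layout in the diagram.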


Current version: Correction displaying

The dog is runing .
0 1 2 3 4 5 6 7

Tokens to display the mistake at: 6
Correction: 6/runing/running
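The index row in the example suggests that whitespace counts as a token, which would put "runing" at index 6 and the full stop at index 7. Assuming that reading, and assuming the correction record is a slash-separated triple of token index, original token, and replacement, a Python sketch of parsing and applying such a record could look like this. The function names and the tokenizer are hypothetical.

```python
import re
from typing import List, Tuple


def tokenize(text: str) -> List[str]:
    # Assumption: words, whitespace runs, and punctuation marks are all tokens.
    return re.findall(r"\s+|\w+|[^\w\s]", text)


def parse_correction(record: str) -> Tuple[int, str, str]:
    # Assumed record layout: "<token index>/<original>/<replacement>".
    index, original, replacement = record.split("/", 2)
    return int(index), original, replacement


def apply_correction(tokens: List[str], record: str) -> List[str]:
    index, original, replacement = parse_correction(record)
    # Guard against a record that no longer matches the text.
    if tokens[index] != original:
        raise ValueError(f"token {index} is {tokens[index]!r}, expected {original!r}")
    return tokens[:index] + [replacement] + tokens[index + 1:]


tokens = tokenize("The dog is runing.")
corrected = apply_correction(tokens, "6/runing/running")
print("".join(corrected))
```

Carrying the original token inside the record lets the displaying stage detect stale corrections, for example when the user has edited the text while the asynchronous request was still running.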


Current version: Implemented modules

Correction                           TP   FP   TN   FN   Precision   Recall
Misspellings (excl. proper nouns)    24    0  487   16       1.000    0.600
Misspellings (incl. proper nouns)     7   17  497    6       0.292    0.538
Vocalisation of prepositions          4    0    8    0       1.000    1.000
Multiple whitespaces                  4    0  515    0       1.000    1.000
Whitespace near punctuation           7    0  119    0       1.000    1.000
Conditionals                          2    0    1    0       1.000    1.000
Commas in a sentence                  3    0    0    4       1.000    0.429
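The precision and recall columns follow from the counts by the standard definitions, precision = TP / (TP + FP) and recall = TP / (TP + FN). As a quick check, the short Python sketch below recomputes three rows of the table; the dictionary and function names are illustrative only.

```python
from typing import Dict, Tuple

# (TP, FP, TN, FN) copied from three rows of the table above.
COUNTS: Dict[str, Tuple[int, int, int, int]] = {
    "Misspellings (excl. proper nouns)": (24, 0, 487, 16),
    "Misspellings (incl. proper nouns)": (7, 17, 497, 6),
    "Commas in a sentence": (3, 0, 0, 4),
}


def precision(tp: int, fp: int) -> float:
    # Share of reported corrections that were real mistakes.
    if tp + fp == 0:
        raise ValueError("precision is undefined when nothing was reported")
    return tp / (tp + fp)


def recall(tp: int, fn: int) -> float:
    # Share of real mistakes that were reported.
    if tp + fn == 0:
        raise ValueError("recall is undefined when there were no mistakes")
    return tp / (tp + fn)


for name, (tp, fp, tn, fn) in COUNTS.items():
    print(f"{name}: precision={precision(tp, fp):.3f} recall={recall(tp, fn):.3f}")
```

Note how few positive cases stand behind several rows: the comma module, for instance, was evaluated on only seven actual mistakes.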


Proximate issues: Genuine testing

A problem with testing

  • The testing data set was too small
  • The mistakes were artificial
  ⇒ Need for an API and a collection of correctly annotated genuine texts


Proximate issues: Spell-checking

A problem with spell-checking

  • Precision is low
  • The dictionary is not updated often
  ⇒ A method of adding new words, using a different lexicon...
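One way the first remedy, a method of adding new words, could look is a base lexicon combined with a separate set of later additions. This is purely an illustrative Python sketch: the slides do not describe the spell-checker's internals, and the `Lexicon` class, its methods, and the sample words are all hypothetical.

```python
from typing import Iterable, List, Set


class Lexicon:
    """Hypothetical lexicon: a fixed base list plus words added later."""

    def __init__(self, base_words: Iterable[str]) -> None:
        self._base: Set[str] = {w.lower() for w in base_words}
        self._added: Set[str] = set()

    def add_word(self, word: str) -> None:
        # Additions are stored apart from the base list so that the
        # base dictionary can be replaced without losing them.
        self._added.add(word.lower())

    def __contains__(self, word: str) -> bool:
        key = word.lower()
        return key in self._base or key in self._added


def find_unknown(tokens: List[str], lexicon: Lexicon) -> List[int]:
    # Report indices of alphabetic tokens that the lexicon does not know.
    return [i for i, t in enumerate(tokens) if t.isalpha() and t not in lexicon]


lexicon = Lexicon(["is", "running"])
tokens = ["Grammarly", " ", "is", " ", "runing"]
before = find_unknown(tokens, lexicon)   # the name and the typo are both flagged
lexicon.add_word("Grammarly")
after = find_unknown(tokens, lexicon)    # only the typo remains flagged
print(before, after)
```

Words missing from the dictionary, such as proper nouns, are a typical source of false positives, so each accepted addition removes one class of false alarm.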


Proximate issues: Error reporting

A problem with error reporting

  • Allow users to flag miscorrections
  • How to avoid displaying a flagged miscorrection again?
  ⇒ Probably module-dependent


Thank you for your attention!

This work was supported by the project of specific research Čeština v jednotě synchronie a diachronie (Czech language in unity of synchrony and diachrony; project no. MUNI/A/0862/2017).