Arabic spell checkers

  • 1. Arabic Spell Checkers
       Natural Language Processing - CS465
       Supervised by: Dr. Amal Al-Saif
       Done by: Hanan Al-Mohammadi, Mona Al-Mutairi
       Imam Muhammad ibn Saud University, Department of Computer Science and Information System
  • 2. Outline
       - Introduction
       - Arabic Spell Checker Techniques
  • 5. First Paper: An Approach for Analyzing and Correcting Spelling Errors for Non-native Arabic Learners
       - Based on a questioning environment.
  • 6. First Paper: Error Detection
       Two types of errors:
       1. Ill-formed word errors, detected with Buckwalter's Arabic Morphological Analyzer.
          Ex.: [Arabic word] is an ill-formed version of the word [Arabic word].
       2. Semantically incorrect errors.
          Ex.: a spelling question displays a happy face and asks the learner to write a word that describes the picture, and the learner enters [Arabic word]/helped instead of [Arabic word]/happy.
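The two error types on this slide can be sketched as a small check. This is only an illustration under stated assumptions: the set `VALID_WORDS` is a hypothetical stand-in for the morphological analyzer, the function name `detect_error` is invented here, and English placeholder words are used because the Arabic examples are not present in the text.

```python
from typing import Optional

# Hypothetical stand-in for a morphological analyzer: a set of accepted words.
VALID_WORDS = {"happy", "helped", "sad"}


def detect_error(answer: str, expected: str) -> Optional[str]:
    """Return the error type for a learner's answer, or None if it is correct."""
    if answer not in VALID_WORDS:
        # The analyzer does not recognize the word at all.
        return "ill-formed"
    if answer != expected:
        # A valid word, but not the one the question asks for.
        return "semantically incorrect"
    return None


print(detect_error("hapy", "happy"))
print(detect_error("helped", "happy"))
```

The order of the checks matters in this sketch: a word is tested for well-formedness first, so a semantic error is reported only for words the analyzer accepts.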
  • 7. First Paper: Error Correction
       Candidate corrections are produced with the edit distance technique and then filtered:
       1. Morphological analyzer filter.
          Ex.: after the correction techniques are applied to the word [Arabic word], [Arabic word] appears as a candidate correction; the morphological filter excludes it.
       2. Gloss filter.
          Ex.: the user misspells the word [Arabic word]/happy as [Arabic word] (the second letter is incorrectly replaced by the short vowel Fatha). The correction techniques yield two possible corrections, [Arabic word]/happy and [Arabic word]/helped, both valid Arabic words. The gloss filter excludes [Arabic word]/helped.
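A minimal sketch of the pipeline on this slide: edit distance, then the two filters. It is not the paper's implementation. The `LEXICON` dictionary, its romanized keys, the raw candidate list, and the `max_distance` threshold are hypothetical placeholders chosen for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete a character of a
                            curr[j - 1] + 1,      # insert a character of b
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


# Hypothetical lexicon: valid word -> gloss (romanized placeholders only).
LEXICON = {"sEyd": "happy", "sAEd": "helped", "ktAb": "book"}


def correct(misspelled, raw_candidates, expected_gloss, max_distance=1):
    """Keep close candidates, then apply the morphological and gloss filters."""
    close = [c for c in raw_candidates
             if edit_distance(misspelled, c) <= max_distance]
    valid = [c for c in close if c in LEXICON]                 # morphological filter
    return [c for c in valid if LEXICON[c] == expected_gloss]  # gloss filter


print(correct("sEd", ["sEyd", "sAEd", "sEdx", "ktAb"], "happy"))
```

In this sketch the gloss filter needs the expected meaning, which a questioning environment can supply because the system knows which word each question asks for.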
  • 8. First Paper: Evaluation
       Evaluated on real test data of 190 misspelled words, including both single-error and multi-error misspellings with up to three errors per word. The average word length is 5 letters.
       Result: over 80% recall and over 90% precision were achieved for each type of spelling error.
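For reference, recall and precision are usually computed as below. The slide does not say how the paper counted correct and incorrect corrections, so the argument names and the counts in the usage line are hypothetical and are not the paper's data.

```python
def precision_recall(true_positives: int, false_positives: int, false_negatives: int):
    """Return (precision, recall) from raw counts.

    true_positives:  misspellings that were corrected properly
    false_positives: corrections proposed that were wrong
    false_negatives: misspellings that were not corrected
    """
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return precision, recall


# Hypothetical counts, for illustration only.
precision, recall = precision_recall(9, 1, 3)
print(precision, recall)
```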
  • 9. Second Paper: Towards Automatic Spell Checking for Arabic
       The system is composed of an Arabic morphological analyzer, a lexicon, a spelling detector, and a spelling corrector.
       Spelling detection considers two possibilities:
       1. The misspelled word is an invalid word. Ex.: [Arabic word] for [Arabic word].
       2. The misspelled word is a valid word. Ex.: [Arabic word] in place of [Arabic word].
  • 10. Second Paper: Spelling Correction
       - Add a missing character: the candidates for the misspelled word [Arabic word] are [Arabic words].
       - Replace an incorrect character: the candidates for the misspelled word [Arabic word] are [Arabic words].
       - Remove an excess character: the candidates for the misspelled word [Arabic word] are [Arabic words].
       - Add a space to split words: the candidates for the misspelled word [Arabic word] are [Arabic words].
       Arabic morphological analyzer: breaks the inflected word down into its prefix, suffix, and stem, then checks the stem lexicon. If the stem has an entry in the lexicon, the stem is correct.
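The four correction operations and the stem check can be sketched as follows. Since the paper reports no implementation, everything here is an assumption for illustration: the Latin `ALPHABET`, the toy lexicon, the affix lists `PREFIXES` and `SUFFIXES`, and the function names.

```python
ALPHABET = "abcdefghijklmnopqrstuvwxyz"  # stand-in; a real system would use Arabic letters


def candidates(word, lexicon):
    """Generate corrections with the four operations, keeping only valid results."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    added = {l + c + r for l, r in splits for c in ALPHABET}              # add missing char
    replaced = {l + c + r[1:] for l, r in splits if r for c in ALPHABET}  # replace char
    removed = {l + r[1:] for l, r in splits if r}                         # remove excess char
    single = {w for w in added | replaced | removed
              if w in lexicon and w != word}
    # Add a space: both halves must be valid words.
    split_words = {l + " " + r for l, r in splits if l in lexicon and r in lexicon}
    return single | split_words


PREFIXES = ("", "al")  # hypothetical affix lists
SUFFIXES = ("", "at")


def has_valid_stem(word, stems):
    """Strip each prefix/suffix pair and look the remaining stem up in the stem lexicon."""
    for p in PREFIXES:
        for s in SUFFIXES:
            if (word.startswith(p) and word.endswith(s)
                    and len(word) >= len(p) + len(s)):
                if word[len(p):len(word) - len(s)] in stems:
                    return True
    return False


lexicon = {"book", "look", "in", "side"}
print(candidates("bok", lexicon), candidates("inside", lexicon))
print(has_valid_stem("alkitab", {"kitab"}))
```

Filtering candidates against the lexicon right after generation keeps the candidate set small, at the cost of needing a lexicon that covers inflected forms or a stem check like the one shown.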
  • 11. Second Paper: Evaluation
       This approach is theoretical; no experimental results were reported.
  • 12. Third Paper
       - Algorithm defined by B. Haddad and M. Yassen
       - Error patterns:
         1. Simple errors: editing errors and boundary problems
         2. Cognitive and phonetic mistakes
         3. Syntax errors
         4. Semantic errors
       Editing errors:
       - Substitution: flql (he said); the letter f is mistakenly substituted by q.
       - Deletion: sdama instead of stadama (he or it used); the letter t is missing.
       - Insertion: makttb instead of maktb (a letter in the sense of a message); a t is additionally inserted.
       - Transposition: mit instead of tim (meeting); the letter t is swapped.
       Boundary problems:
       - rasalamih instead of ras alamih
       - fa ql instead of faql (and then he said)
       Cognitive and phonetic mistakes:
       - hd or hz instead of had (the particle "that")
       Syntax errors:
       - "the girl went to [the] school": dahaba instead of dahabat.
       Semantic errors:
       - "red rebuking cells" instead of "red blood cells": ldam (the rebuking) instead of ldam (the blood).
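The four editing-error types listed on this slide can be told apart by comparing the typed word with the intended one. The following is a rough sketch that handles a single error per word; the function name `classify_edit` is hypothetical, and only the makttb/maktb pair comes from the slide.

```python
def classify_edit(typed: str, intended: str) -> str:
    """Classify a single editing error between a typed word and the intended word."""
    if typed == intended:
        return "none"
    if len(typed) == len(intended):
        diffs = [i for i in range(len(typed)) if typed[i] != intended[i]]
        if len(diffs) == 1:
            return "substitution"
        if (len(diffs) == 2 and diffs[1] == diffs[0] + 1
                and typed[diffs[0]] == intended[diffs[1]]
                and typed[diffs[1]] == intended[diffs[0]]):
            return "transposition"  # two adjacent letters swapped
    elif len(typed) == len(intended) + 1:
        # Removing one typed letter gives the intended word.
        if any(typed[:i] + typed[i + 1:] == intended for i in range(len(typed))):
            return "insertion"
    elif len(typed) + 1 == len(intended):
        # One letter of the intended word was left out.
        if any(intended[:i] + intended[i + 1:] == typed for i in range(len(intended))):
            return "deletion"
    return "other"


print(classify_edit("makttb", "maktb"))
```

Boundary problems, phonetic mistakes, and syntax or semantic errors fall under "other" here, since they need word segmentation or context rather than a character-level comparison.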
  • 13. Third Paper
       - Knowledge base: D&C = (DAWKB, NDAKB, CORSTR)
       - Derivative Arabic Word Knowledge Base (DAWKB)
         - For each valid Arabic root there is a certain number of consistent patterns.
         - A root-pattern relationship denotes a word that has at least one lexical occurrence in the Arabic vocabulary.
         - dwj = ( Prefji + PtjsubMGRi + Suffji ) MSR PNGRi
       - Database for NDW & AW (NDAKB)
         - Entries are treated as stems or lexemes collected in the knowledge base.
       - Non-Word Recognition and Error Correction Strategy (CORSTR)
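The idea of building a derivative word from a prefix, a root placed into a pattern, and a suffix can be illustrated as below. This is a simplified sketch and not the paper's formalism: the digit-slot pattern notation, the function names, and the tiny `ROOT_PATTERNS` table are assumptions, and the romanization drops vowel length.

```python
def apply_pattern(root: str, pattern: str) -> str:
    """Fill the digit slots 1..3 of a pattern with the consonants of the root."""
    return "".join(root[int(ch) - 1] if ch.isdigit() else ch for ch in pattern)


def derive(root: str, pattern: str, prefix: str = "", suffix: str = "") -> str:
    """Build prefix + (root placed in pattern) + suffix."""
    return prefix + apply_pattern(root, pattern) + suffix


# Hypothetical knowledge base: each valid root maps to its consistent patterns.
ROOT_PATTERNS = {"ktb": {"1a2a3a", "1i2a3", "ma12u3"}}


def is_derivative_word(stem: str) -> bool:
    """A stem is accepted if some root-pattern pair in the knowledge base produces it."""
    return any(apply_pattern(root, pattern) == stem
               for root, patterns in ROOT_PATTERNS.items()
               for pattern in patterns)


print(derive("ktb", "1i2a3", prefix="al"))
print(is_derivative_word("maktub"), is_derivative_word("mkatub"))
```

Storing only the valid root-pattern pairs keeps the table small while still rejecting forms such as a real root combined with a pattern it never takes.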
  • 14. Fourth Paper
       - Proposed by A. Hattab and A. Hussein.
       - The proposed system consists of three models.
       - The detection and correction model classifies words as non-words or misspellings.
  • 15. Fourth Paper: Evaluation
       - The proposed system was run twice: first without the detection and correction method, then with it.
       - The same data was used in both experiments. The results of these experiments are shown in tables.
       - The detection and correction algorithm outperformed the Bayes algorithm by about 10%. Without checking for misspelling errors, accuracy is 68.85%, while the average accuracy of the classification system with misspelling detection and correction is 71.77%.
  • 16. Thank You For Your Attention