Developing Statistic-based and Rule-based Grammar Checkers for Chinese ESL Learners
Howard Chen, Department of English, National Taiwan Normal University
[email protected]

Page 1: Developing Statistic-based and Rule-based Grammar Checkers for Chinese ESL Learners

Howard Chen
Department of English
National Taiwan Normal University
[email protected]

Page 2: The Need to Provide Feedback on Second Language Writing

  • More and more tests ask ESL/EFL students to demonstrate their writing abilities.
  • SLA researchers suggest that learners need more practice and corrective feedback.
  • However, who can give them useful feedback on meaning and form?

Page 3: Use the Existing Grammar Checkers?

  • Teachers are the best feedback providers. However, there are so many essays to correct.
  • Microsoft grammar checker: the general impression among ESL/EFL learners is that it is NOT very useful.
  • Two newer commercial packages: Vantage MyAccess and ETS Criterion.
  • Their feedback for ESL learners is not very accurate or comprehensive (perhaps because they do not target any particular L1 group and are aimed mainly at native speakers).

Page 4: A More Thorough Review of E-rater / ETS Criterion

  • Junko Otoshi (2005), a researcher at Ritsumeikan University in Japan.
  • She used 28 Japanese adult students' TOEFL writing essays to explore what Criterion can and cannot do when providing feedback on essays.
  • Criterion's critique function was compared with a human instructor's error feedback, focusing on five error categories: verbs, word choice, nouns, articles, and sentence structure.

Page 5: Errors Marked by Criterion and Human Instructors (Means)

Error Type           Criterion   Human Instructors
Verbs                0.47        0.84
Nouns                0.00        0.94
Articles             0.07        2.00
Word Choice          0.11        2.32
Sentence Structure   0.32        6.31

Page 6: Rather Disappointing Results and Possible Reasons

  • The results revealed that Criterion had difficulty detecting errors in all five categories.
  • Does it aim for higher accuracy and accept lower recall, that is, take a more conservative approach?
  • Is the size of the reference corpus a factor?
  • Another program, MyAccess, has similar problems, though the general impression from review reports is that it can detect more errors.

Page 7: Trying to Combine Different Approaches: Plan A and B for Grammar Checkers

  • With funding from the NSC in Taiwan, we planned to develop two grammar checkers.
  • Different approaches: parser, rules, statistics.
  • Plan A: use ngrams to help identify errors.
  • Plan B: use a rule-based grammar checker to identify errors.
  • If possible, Plans A and B will be merged so that the combined system can capture more errors.
  • In this paper, we discuss only Plan A.

Page 8: What's the Ngram (Statistical) Checker?

  • We do not write specific grammar rules.
  • The computer extracts and counts all the word strings (2-word and 3-word) found in a very large native-speaker corpus. This is language model building.
  • All of these are saved to a large database.
  • When students write and submit an essay to the ngram checker, the system can quickly detect the word strings that do not exist in the native corpus.
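The idea on this slide can be illustrated with a minimal Python sketch. It is not the NTNU implementation: the function names, the simple tokenizer, and the four-sentence reference list are assumptions made up for illustration, standing in for a very large native corpus.

```python
from collections import Counter

def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    tokens = (w.strip(".,;:!?\"'()") for w in text.lower().split())
    return [t for t in tokens if t]

def ngrams(tokens, n):
    """Return the list of n-word tuples in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def build_model(sentences, orders=(2, 3)):
    """Count every 2-word and 3-word string in the reference sentences."""
    counts = Counter()
    for sentence in sentences:
        tokens = tokenize(sentence)
        for n in orders:
            counts.update(ngrams(tokens, n))
    return counts

def unseen_ngrams(sentence, counts, n=2):
    """Return the ngrams of a learner sentence that never occur in the model."""
    return [g for g in ngrams(tokenize(sentence), n) if counts[g] == 0]

# Toy stand-in for a native-speaker corpus (invented for this example).
REFERENCE = [
    "I have some books.",
    "He is a new friend.",
    "I have three answers.",
    "He cut his finger.",
]

model = build_model(REFERENCE)
print(unseen_ngrams("I have some book.", model))   # flags ('some', 'book')
print(unseen_ngrams("I have some books.", model))  # nothing flagged
```

With a corpus this small almost any new sentence would be flagged, so the sketch only shows the mechanism. The approach depends on the reference corpus being large enough that an unseen word string is informative.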

Page 9: Ngram-based Checker: Advantages

  • The key idea is simple but powerful.
  • No need to write rules.
  • More robust in detecting errors.
  • A large and suitable corpus might make this very useful. (ETS used a 30-million-word news corpus.)

Page 10: The Procedure of Developing an Ngram Checker (Corpora and Tools)

  1. Find a suitable, large corpus (e.g., BNC, Wikipedia, and Google).
  2. Extract the ngrams (NLP tools: the SRI tool).
  3. Build a large ngram database.
  4. Develop and test different highlighting methods.
  5. Highlight the possibly problematic ngrams in learners' writing.
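Steps 2 to 5 can be sketched end to end in Python. This is only an illustration under stated assumptions: the slides do not describe the database schema or the highlighting method, so the SQLite table, the `[bracket]` highlighting, the frequency threshold, and the two reference sentences are all hypothetical choices.

```python
import sqlite3

def normalize(word):
    """Lowercase a word and strip surrounding punctuation."""
    return word.strip(".,;:!?\"'()").lower()

def build_database(sentences, n=2):
    """Steps 2-3: extract ngrams and store their counts in an SQLite table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE ngrams (gram TEXT PRIMARY KEY, freq INTEGER NOT NULL)")
    for sentence in sentences:
        tokens = [normalize(w) for w in sentence.split()]
        for i in range(len(tokens) - n + 1):
            gram = " ".join(tokens[i:i + n])
            # Create the row on first sight, then increment its count.
            db.execute("INSERT OR IGNORE INTO ngrams (gram, freq) VALUES (?, 0)", (gram,))
            db.execute("UPDATE ngrams SET freq = freq + 1 WHERE gram = ?", (gram,))
    db.commit()
    return db

def frequency(db, gram):
    """Look up how often an ngram occurred in the reference corpus."""
    row = db.execute("SELECT freq FROM ngrams WHERE gram = ?", (gram,)).fetchone()
    return row[0] if row else 0

def highlight(sentence, db, n=2, threshold=1):
    """Steps 4-5: bracket every word that belongs to an ngram whose
    corpus frequency is below the threshold."""
    words = sentence.split()
    keys = [normalize(w) for w in words]
    flagged = set()
    for i in range(len(keys) - n + 1):
        if frequency(db, " ".join(keys[i:i + n])) < threshold:
            flagged.update(range(i, i + n))
    return " ".join(f"[{w}]" if i in flagged else w for i, w in enumerate(words))

# Toy reference corpus (invented for this example).
db = build_database(["These reasons are too simple.", "I have three answers."])
print(highlight("These reason are too simple.", db))
```

Raising `threshold` above 1 would also highlight rare ngrams, not just unseen ones. That is one of the "different highlighting methods" step 4 might compare, though which methods were actually tested is not stated in the slides.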

Page 11: Grammar Checker Online

The links:
  • http://140.122.83.250:4000/main (BNC)
  • http://140.122.83.250/search.php (Google)
  • http://140.122.83.245/ngram-check/ (BNC)

Page 12: The Web Interface of Ngram Checker

Page 15: A Simple Example

Page 16: Evaluate the Checker Performances: Any Standard Way of Evaluating Checkers?

  • What kinds of errors should be used to test a grammar checker?
  • Fair assessment: use the same set of sentences.
  • How many sentences?
  • Many different categories and errors.
  • Lexical factors.
  • NLP researchers: F-measure, precision, and recall.
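For reference, the three measures mentioned above can be computed as follows. This is a generic sketch of the standard definitions, not the evaluation procedure used in this study. The function name and the example error positions are invented for illustration.

```python
def precision_recall_f(flagged, gold, beta=1.0):
    """Compare the errors a checker flagged with the errors a human marked.

    precision = correct flags / all flags
    recall    = correct flags / all real errors
    F         = weighted harmonic mean of the two (beta=1 gives F1)
    """
    flagged, gold = set(flagged), set(gold)
    true_pos = len(flagged & gold)
    precision = true_pos / len(flagged) if flagged else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    b2 = beta ** 2
    f = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f

# Hypothetical example: a human marked errors at word positions 1, 4, 7, 9;
# the checker flagged positions 1, 4, 5.
p, r, f = precision_recall_f(flagged={1, 4, 5}, gold={1, 4, 7, 9})
print(f"precision={p:.2f} recall={r:.2f} F1={f:.2f}")
```

A conservative checker that flags only when it is very sure tends to score high on precision and low on recall.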

Page 17: Test with the CLEC Corpus from China

  • The Chinese Learner English Corpus (CLEC) is a 1-million-word, error-tagged learner corpus.
  • It has about 60 error types.
  • We decided to single out some sentences (10 sentences) from the learner corpus and run them through our ngram checkers.

Page 18: 1. Form

Page 19: 2. Verb Phrases (Tense)

Page 20: 3. Noun Phrases

Page 21: 4. Pronouns

Page 22: 5. Adjective Phrases

Page 23: 6. Prepositions (seems to be a difficult area)

Page 24: 7. Conjunct Errors

Page 25: 8. Word Errors

Page 26: 9. Collocation Errors

Page 27: 10. Sentence Structure Errors

Page 28: The Strengths of NTNU Ngram Checkers

  • Ngrams are good at detecting errors in "local" or adjacent domains.
  • The checker can indeed find many errors in CLEC:
      • Spelling
      • Word forms
      • Verb phrases
      • Noun phrases
      • Adjective phrases
      • Collocations

Page 29: The Weaknesses of Ngram Checkers

The checker failed to catch the following effectively:
  • Tense errors
  • Conjunct errors
  • Fragments
  • Pronoun errors
  • Preposition errors
  • Run-on sentences
  • Missing words
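One plausible reason for these misses is that the evidence for the error lies outside the 2-word or 3-word window. The Python sketch below illustrates that explanation with invented sentences. It is an assumption offered for illustration, not an analysis taken from the CLEC results.

```python
def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    tokens = (w.strip(".,;:!?\"'()") for w in text.lower().split())
    return [t for t in tokens if t]

def ngram_set(sentences, orders=(2, 3)):
    """Collect every 2-word and 3-word string seen in the reference sentences."""
    seen = set()
    for sentence in sentences:
        tokens = tokenize(sentence)
        for n in orders:
            for i in range(len(tokens) - n + 1):
                seen.add(tuple(tokens[i:i + n]))
    return seen

def unseen(sentence, seen, orders=(2, 3)):
    """Return the learner ngrams that are missing from the reference set."""
    tokens = tokenize(sentence)
    return [
        tuple(tokens[i:i + n])
        for n in orders
        for i in range(len(tokens) - n + 1)
        if tuple(tokens[i:i + n]) not in seen
    ]

# Toy reference corpus (invented for this example).
seen = ngram_set([
    "Yesterday after lunch we played.",
    "After lunch I go to school.",
    "He cut his finger.",
])

# Tense error: "yesterday" and "go" are three words apart, so every
# 2-word and 3-word window looks normal and nothing is flagged.
print(unseen("Yesterday after lunch I go to school.", seen))

# Local error: the bad form sits inside the window and is flagged.
print(unseen("He cutted his finger.", seen))
```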

Page 30: The Poor Performance of Ngram Checkers for Tense and Conjuncts

Page 31: Rule-based Checkers Can Perform Better for Some Nonlocal Errors

Page 32: Wintertree Grammar Checker

Page 33: BUT Ngram Performed Better for the Local Errors

  • I have some book.
  • The informations are so rich.
  • These researches are excellent.
  • He is new friend.
  • He cutted his finger.
  • He enjoys to eat.
  • He wants jumping into the river.
  • I cannot decided about this.
  • These reason are too simple.
  • I has three answers.

Page 34: What Can We Do to Improve Feedback from Ngram Checkers?

  • Only highlighting and no detailed feedback?
  • We are facing a bigger challenge: how do we recommend correct usage? How can we find correct examples for students?
  • If students only see the errors highlighted, they might still fail to correct them.
  • For agreement errors, tense errors, and confusing words, students might be able to self-correct.
  • However, for some tense errors, collocation errors, or preposition errors, learners might need more specific suggestions.

Page 35: Find the Proper Collocates: increase and improve life

Page 36: Confusion between accept and receive your apology
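The two lookups named on these slides (increase vs. improve life, accept vs. receive your apology) suggest one way to recommend usage: substitute candidate words and compare how often each variant occurs in the native corpus. The Python sketch below assumes that approach. The candidate lists, the function names, and the toy reference sentences are hypothetical; how the NTNU system actually finds candidates is not stated in the slides.

```python
from collections import Counter

def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    tokens = (w.strip(".,;:!?\"'()") for w in text.lower().split())
    return [t for t in tokens if t]

def count_ngrams(sentences, max_n=3):
    """Count every word string of length 1..max_n in the reference sentences."""
    counts = Counter()
    for sentence in sentences:
        tokens = tokenize(sentence)
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])] += 1
    return counts

def rank_alternatives(phrase, position, candidates, counts):
    """Swap each candidate into the phrase at `position` and rank all
    variants (including the original) by corpus frequency, highest first."""
    tokens = tokenize(phrase)
    ranked = []
    for word in [tokens[position]] + list(candidates):
        variant = tokens[:position] + [word] + tokens[position + 1:]
        ranked.append((" ".join(variant), counts[tuple(variant)]))
    return sorted(ranked, key=lambda item: item[1], reverse=True)

# Toy reference corpus (invented for this example).
counts = count_ngrams([
    "I accept your apology.",
    "She will accept your apology.",
    "Did you receive your letter?",
    "Exercise can improve life for many people.",
    "We want to improve life in the city.",
])

print(rank_alternatives("receive your apology", 0, ["accept"], counts))
print(rank_alternatives("increase life", 0, ["improve", "raise"], counts))
```

The top-ranked variant could be shown to the student as a suggestion together with example sentences from the corpus.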

Page 37: Future Directions for Improvement

  1. Test with many different errors to find the strengths and limitations of ngram-based checkers and rule-based checkers.
  2. Use a tagged learner corpus to find the error patterns in learner language.
  3. Add feedback on the major error patterns to the ngram-based checkers.
  4. Better integrate the rule-based system and the ngram checkers.

Page 38: Thanks for Your Attention

Questions and Discussion

[email protected]
National Taiwan Normal University