RK EASG Seminar 080211 Plus Demo

  • 8/2/2019 RK EASG Seminar 080211 Plus Demo

    1/41

    Using Corpora for Autonomous Correction and Improvement of Academic Writing

    Ramesh Krishnamurthy

    Aston University

    February 8th 2011

    [REPORT ON WORK IN PROGRESS]

    Abstract

    1. All of our students need to improve their academic writing skills. This is true for Home students as well as for the increasing numbers of EU and International students.

    2. This talk looks at the possibilities of using corpora in this process, and specifically reports on a case study involving a Chinese-speaking student using the ACORN (Aston Corpus Network) corpora.

    3. The method requires less teacher time, offers more scope for autonomous student learning, and leads to a greater awareness of academic writing as a cyclic editorial process rather than merely as a product for assessment.

    UG1 students need to improve their academic writing skills - 1

    Examples from UG1

    The same article can be reported differently, depending on the type of newspaper it has been obtained from.

    As the case when first reported concerned the death of a young baby due to neglect and abuse, which legally the public were not allowed to be made aware of the full name of the child.

    As expected from a headline the text still reads as a statement as opposed to a structured sentence in order to grab the audiences attention.

    UG2 students need to improve their academic writing skills

    Extracts from Feedback to UG2

    Your written language needs some more work, as your errors sometimes impede communication ... some mistakes affect the clarity of the argument ... mistakes in spelling and grammar ... more noticeable are mistakes in the use of terminology ... poor grammar makes the analysis difficult to understand ... grammatical errors, and poor choice of words (especially terminology) ... spelling mistakes ... use of complex sentences affect the clarity of argument ... some rather informal comments

    UG3 students need to improve their academic writing skills

    Examples from UG3

    From my initial reading on this matter, I have read within Richard Dawkins (1976) book The Selfish Gene and this gave me a valuable insight into ... OxfordDictionaryies.com ... it is easily transferrable from any subject that is of slight annoyance, to an accident ... these memes are an interesting cause for study, particularly, as they are most widely recognised in younger Internet communities ... The area in which I propose to study is in politics and corpus

    UG3 students need to improve their academic writing skills

    Extracts from Feedback to UG3

    Some errors and weaknesses in expression (comprised from) ... weak wordings cause loss of coherence ... Weak academic style; poor proofreading; sometimes repetitive/tautologous wordings ... Weak expression often obscures meaning ... Grammar not clear; many seeming errors ... Major weaknesses in expression and style, obscures meaning at times ... Some weaknesses in use of terms ... 's for plurals ... non-grammatical sentences ... some poor wordings... errors and typos ... poor wordings, including informal, non-academic phrases ... weak grammar obscures meaning

    Masters students need to improve their academic writing skills

    Extracts from Feedback to MA/MSc

    Unfortunately, the presentation suffered considerably from poor wordings, weak academic style, and many typos and errors ... quite a lot of minor slips already noticeable in the first page, often to do with word choice ... The consistently poor quality of English throughout makes it very difficult to assess ... frequent lack of linguistic clarity and cohesion ... the content is largely obscured by the weaknesses in form ... at times repetitive, or overladen with connectors ... The main weakness is in English expression, which sometimes obscures the intended meaning ... English style is often poor (the learnt from coursebooks language ... the employment of the founding ... exerting the whole dialogues) ... problems in English expression sometimes cause difficulty for the reader ... inconsistent and inaccurate use of terminology ... very weak English academic writing style and expression throughout, often leading to considerable difficulty in comprehension

    and it's not just me!

    http://nexus.aber.ac.uk/xwiki/bin/view/Main/HEA+Annual+Conference+2009

    Higher Education Academy Annual Conference 2009

    The Wiki Way to Develop Academic Writing Competence
    Dr Rob Spence (Edge Hill University)

    This paper presented an account of an ongoing investigation into the use of wikis to develop students' academic writing skills through collaborative work. Undergraduate students of English were invited to collaborate on writing tasks with the specific aim of developing their competence through peer review and appraisal. The motivation for the wiki project arose from the widely-commented (if only anecdotal) decline in student writing skills/literacy in HE. In particular, the wiki project sought to address three widely-perceived problems: students' lack of confidence, students' inability to deal with complex issues, students' substandard written work and the tendency to Wikipedia cut-and-paste.

    http://www.humboldt.edu/english/GWPEGeneralInformation.htm

    Humboldt State University, Department of English

    History of and Rationale behind the Graduation Writing Proficiency Examination Requirement

    Because of a noticeable decline in student writing skills, the CSU Chancellor appointed a Task Force on Student Writing Skills in 1975 to investigate the problem and recommend appropriate solutions. The major portion of the Task Force's recommendations, reviewed by the Educational Policies Committee and supported by the CSU Academic Senate, was accepted by the Board of Trustees in 1976. One of the central aspects of this policy required the demonstration of writing proficiency at the upper-division level as a requirement for graduation from every campus within the CSU system.

    Learner autonomy

    autonomy: 1620s, from Gk. autonomia "independence, living by one's own laws", from auto- "self" + nomos "custom, law" [http://www.etymonline.com/]

    moral and political philosophy > sociology > education

    Holec (1979) Autonomy and Foreign Language Learning

    Boud (ed) (1981) Developing Student Autonomy in Learning

    Grenfell and James (2004) Change in the field - changing the field: Bourdieu and the methodological practice of educational research. British Journal of Sociology of Education, 25/4, 507-523

    http://www.etymonline.com/index.php?term=autonomy

    Learner autonomy: Holec (1979)

    The autonomous language learner takes responsibility for the totality of his learning situation. He does this by determining his own objectives, defining the contents to be learned and the progression of the course, selecting methods and techniques to be used, monitoring this procedure, and evaluating what he has acquired. Objectives are specific to the learner, and the learner's communicative needs determine the verbal elements chosen. Learning thus proceeds from ideas to correct grammatical, lexical, and phonological form. The self-directed learner chooses the methods of instruction through trial-and-error. His selection is based on the objectives set and its applicability to internal and external constraints. The student evaluates his attainment through his objectives, and this evaluation helps him to plan subsequent learning. The concept of autonomous learning requires a redefinition of knowledge from an objective universal to a subjective individual knowledge determined by the learner. For teachers, it means new objectives which help the learner define his personal objectives and help him acquire autonomy. Several experiments in autonomous learning are described.

    Learner autonomy: Grenfell and James (2004)

    methodological practice in educational research from the perspective of Bourdieu's field theory (p507)

    taking educational research itself to be a field (p508)

    the briefest account of methodological developments in the twentieth century would describe a move away from a positivist towards a more qualitative, naturalistic paradigm. Up until the 1960s, what educational research that did take place was mostly small, part-time and based on psychometric tests of pupils' intelligence and learning. The alternative to this approach stemmed from a philosophical critique of its founding assumptions to mimic the physical sciences and stressed instead the social and contextual aspects of education (see Hirst, 1966, 1974). What emerged was a definition of educational theory in terms of the so-called `foundational disciplines': sociology, philosophy, history, psychology.

    Learner autonomy: Grenfell and James (2004)

    The qualitative paradigm developed throughout the 1970s, 1980s and 1990s, giving rise to a range of ethnographic and naturalist methodologies, including the postmodernist. However, a sustained attack (see Hillage et al., 1998; Tooley & Darby, 1998) against this research was mounted during this last decade of the century; claiming to find its methods insufficiently rigorous, its data collection small scale and its outcomes biased. Moreover, it was argued that such research had little impact on institutional practice; while what was needed was research of the nature that answered questions such as how to improve pupil achievement. Researchers were urged to return to quantitative methods, with experiments and randomized controlled trials seen as capable of producing sufficiently `hard' evidence (see Fitz-Gibbon & Morris, 1987; Boruch, 1997; Fitz-Gibbon, 2001). (p509)

    avant-garde ... rear-garde ... process of time (p510)

    There are other features that follow from the character of fields and the avant-garde. First is the question of autonomy (p510)

    [NB NO mention anywhere in the article of learner autonomy!]

    academic products structure practice (p510)

    Focus on Product = Neglect of Process?

    League tables

    A level results

    Marking systems (class distribution)

    Equality (irrespective of motivation/performance)

    Increasing instrumentality in attitudes to education

    Grenfell and James (2004): academic products structure practice (p510)

    Initial Research: ACORN Case Studies 2006-7

    This research was first reported in the ACORN Case Studies (2008), as ACORN Case Study 2: Self-Correction of Academic Writing

    Case Study 4: Spanish Grammar Clinics has since been developed and published: Yepes, G.R. & Krishnamurthy, R. (2010). Corpus Linguistics and Second Language Acquisition - the use of ACORN in the teaching of Spanish Grammar. Lebende Sprachen 55/1: 108-122

    ACORN Case Study 2

    Context: I worked closely with Steven, a Computer Science Placement Student on ACORN, a Chinese native-speaker, who came to the UK in 2002, did 9 months of English then 2 years A-level (Maths, Chinese, Physics) at an FE college, then started at Aston in 2005. He submitted a weekly 1-page report to me on his ACORN work.

    Aims: To help Steven to improve his English and produce better reports; to trial the ACORN system with a view to software enhancements; to understand some of the pedagogic implications of the methodology

    Procedure: This started very informally, but seemed to work extremely well, so we started to preserve the data. Very rough estimates are: I spent 2-3 minutes highlighting in green any marked usages; Steven spent 5-10 minutes correcting 10-15% silly mistakes, 30 minutes checking the ACORN corpus and correcting 60-70% of other items; we spent 15 minutes going through the 15-20% remaining complex items, and 15 minutes discussing Chinese/English, corpus software design, and search procedures

    Examples of items I highlighted in Steven's draft reports: I will take a deep look into it next week ... I replied him ... He was not an expert with MySQL ... The testing that I am doing does not affect any of the current functions on ACORN except adding new records to the ACORN log ... The PHP engine on the server might out put an error message.

    ACORN screenshots were provided for these items, showing how he found the correct wording to use

    ACORN Case Study 2

    Initial Evaluation: Steven enjoys this method: he finds it empowering, and incidentally learns other lexis and grammar; he perceives for himself the value of functions missing in the ACORN software (e.g. phrase search), and this motivates him to develop them. It saves me time, and turns a more mundane task into a stimulating experimental procedure

    Afterthoughts: We have records of the marked pages and the corrected pages. We need to accurately record when Steven uses ACORN, which searches are quicker, which are impactful on his learning, which steps through the data require external prompting, etc.

    An updated report from Steven suggests that, partly because of the restricted and repetitive nature of his reports, and partly due to his past experience, the proportions of corrections are changing. The range/variety of errors has been reduced. He now estimates that he is able to self-correct 30% of errors (e.g. omission of "the"; mismatch of tense sequences), only about 20% involve ACORN searches, and perhaps up to 50% require discussion.

    I think this methodology could be used by many language teachers. It is quick for the teacher, and results in a high proportion of self-correction by the student, as well as some incidental learning. The procedure can of course also be used by students while drafting, rather than after correction, and for academic writing in French, German and Spanish.

    AntConc software

    for initial corpus analysis

    Demo

    AntConc: Word List: Drafts corpus = 15694 tokens [avge length=402], 1821 types

    AntConc: Word List: Corrected corpus = 15643 tokens [avge length=401], 1806 types
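
    The token and type counts on these two word-list slides can be compared through a type/token ratio. The following is a minimal sketch, not AntConc's own algorithm: the tokenisation rule (lower-cased runs of letters and digits) and the function names are assumptions made for illustration. On the reported figures, the ratio comes out at roughly 0.116 for the drafts and 0.115 for the corrected versions.

```python
import re
from collections import Counter


def word_list(text):
    """Return a frequency list of word forms.

    Assumption: tokens are lower-cased runs of letters/digits.
    AntConc's real token definition is configurable and may differ.
    """
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def type_token_ratio(n_types, n_tokens):
    """Distinct word forms divided by running words (0.0 for an empty corpus)."""
    return n_types / n_tokens if n_tokens else 0.0


# A sentence from Steven's Week 12 draft, used here as a tiny sample corpus.
sample = ("The reason for that was because the tables contents were "
          "ordered by the frequency of tokens")
counts = word_list(sample)
sample_tokens = sum(counts.values())  # running words in the sample
sample_types = len(counts)            # distinct word forms in the sample

# Figures reported on the two AntConc word-list slides.
drafts_ttr = type_token_ratio(1821, 15694)
corrected_ttr = type_token_ratio(1806, 15643)

print(sample_tokens, sample_types)
print(round(drafts_ttr, 3), round(corrected_ttr, 3))
```

    The two ratios are close, which fits the small change in counts between the drafts and the corrected versions; the sketch says nothing about which words changed.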

    DATASET 2: ACORN usage monitoring programs

    1. Createlog - the original monitor program

    started 05/06/07 when ACORN was first released to staff/students

    but only recorded concordance searches

    Designed to allow download as Excel file

    but dataset has now outgrown the Excel maximum record limit (c. 62k lines?)

    DATASET 2: ACORN usage monitoring programs

    2. Monitor Log - records ALL queries within ACORN (i.e. frequencies, etc. as well as concordance)

    but only started on 13/03/08 - written by Steven!

    Was also supposed to allow saving as Excel file

    But the Excel download does not work: it creates a file, but with only one line of data, always the same one!

    Extracting Steven's searches from the ACORN usage monitor logs

    This was fairly straightforward: sorting the log files on the username column
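
    Selecting one user's rows from the log can also be done with a filter on the username column rather than a sort. The sketch below is illustrative only: it assumes the log has been exported as comma-separated text with the six columns visible in the log entries shown later (ID, username, language, database, query, date). The column names and the `otheruser` row are hypothetical.

```python
import csv
import io

# Hypothetical column names for the six fields visible in the log entries.
FIELDS = ["id", "username", "language", "database", "query", "date"]

# Two rows as shown on the slides, plus one invented row for another user.
SAMPLE_LOG = """\
1376,chenz,English,eng_general_db,reason,25/10/2007
1377,chenz,English,eng_general_db,compiled,25/10/2007
9001,otheruser,English,eng_general_db,example,25/10/2007
"""


def searches_by_user(log_text, username):
    """Return the log rows (as dicts) whose username column matches exactly."""
    reader = csv.DictReader(io.StringIO(log_text), fieldnames=FIELDS)
    return [row for row in reader if row["username"] == username]


steven_rows = searches_by_user(SAMPLE_LOG, "chenz")
steven_queries = [row["query"] for row in steven_rows]
print(steven_queries)
```

    Filtering keeps the rows in their original order, so the sequence of searches is preserved for the later alignment with report dates.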

    Aligning Steven's work and ACORN usage for detailed analyses

    This was slightly trickier!

    START/END DATES of Steven's work: 06/08/07 - 12/06/08

    Week 1 draft report = 06/08/07
    Week 42 draft report = 29/05/08
    [Week 45 draft report = 13/06/08]

    Week 1 corrected report = 07/08/07
    Week 42 corrected report = 12/06/08

    BUT

    (1) Corrected versions were often submitted in batches, whenever Steven found the time in between his ACORN programming tasks, hence the detailed analyses are also initially conducted in batches

    (2) Change in ACORN usage monitoring program: As Steven only launched the monitor log program on 13/03/08, I can only check Steven's use of Concordances (and no other features) before that date
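
    One way to picture the alignment is as a date-window filter: keep the searches logged between the date of a draft and the date of its corrected version, and note which monitoring program covered that period. This is a sketch under stated assumptions, not the procedure actually used: the function names are hypothetical, and a simple draft-to-correction window would need widening wherever corrections were submitted in batches.

```python
from datetime import date, datetime

# From the slides: the Monitor Log was only launched on 13/03/08.
MONITOR_LOG_START = date(2008, 3, 13)


def parse_log_date(text):
    """Parse a dd/mm/yyyy date as written in the usage log."""
    return datetime.strptime(text, "%d/%m/%Y").date()


def searches_between(entries, start, end):
    """Keep (query, date_text) pairs dated from start to end inclusive."""
    return [(q, d) for q, d in entries if start <= parse_log_date(d) <= end]


def log_coverage(day):
    """Describe which kinds of query were being recorded on a given day."""
    if day < MONITOR_LOG_START:
        return "concordance searches only"
    return "all query types"


# Three of the log entries shown on the slides.
entries = [
    ("the", "23/10/2007"),
    ("reason", "25/10/2007"),
    ("corrections", "26/10/2007"),
]

# Week 12: draft dated 25/10/07, corrected version dated 26/10/07.
week12 = searches_between(entries, date(2007, 10, 25), date(2007, 10, 26))
print(week12, log_coverage(date(2007, 10, 25)))
```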

    Week twelve

    I updated the contents of the tutorial and case studies files by following Ramesh's corrections and then recreated them in new designed layout. And finally, I uploaded them to the server in order to allow Ramesh to them to show Professor Alison Halstead.

    For the existing parallel text files on the server, there are marks between paragraphs, where # indicates the paragraph number, so that the parallel indexing program knows that where a new paragraph starts and what the paragraph number is. However, for the new parallel indexing program which compiled last week, it recognizes a new paragraph by an empty line of String and then increments the paragraph number by 1. The reason for why I did it this way was because if it gave me the correct paragraph number, then I would not have to run the paraAlign.java program to produce the marks before running the parallel indexing program, this would shorten the time required for the whole indexing processes.

    The contents of the new created databas were changed slightly after using the new compiled program. The sequence of the values under the field ID in table tokens used to be in numerical order, from 1 to the number of total tokens in the file. But after the new compiled program was used, the sequence was not in numerical order. The reason for that was because the tables contents were ordered by the frequency of tokens, which means the most frequent word appeared on the top of the table rather than the first token in the file.

    To test whether the new database could work properly with the parallelResult.php file, I had to upload the database and the parallel text files from localhost to the live server and then move the existing parallel text files on the server to a different directory so that only the new uploaded files were read, and finally test them by using the parallel function on the website. Unfortunately the test result suggested that there were some problems because no text was shown on the parallelResult web page. While I was thinking what the problems may be, I emailed Husman to explain what I have done and what the result was, to see if he knew what had gone wrong. The reasons that I could think of were either there was something else that I had not yet done or the values under the field ID had to be in numerical order. But I did not think the possibilities were high for both of the reasons.

    I had a look at the parallelResult.php file and tried to find out what commands were used to retrieve the data from the database. But I have not resolved anything yet.

    Steven's draft

    Steven's draft with Ramesh's green highlights

    Createlog ONLY: 05/06/07 - 12/03/08
    Createlog + Monitor Log: 13/03/08 - 12/06/08

    Week 12 draft: 25/10/07
    Week 12 corrected: 26/10/07

    Items highlighted by Ramesh in draft | Items searched in ACORN
    following Ramesh's corrections | corrections
    I did not think the possibilities were high for both of the reasons | reason
    the new compiled program | compiled
    the new uploaded files | [corrected by analogy?]
    The reason for that was because | reason

    1346  chenz  English  eng_general_db  research     18/10/2007  NOT IN WEEK 12 DRAFT
    1359  chenz  English  eng_general_db  negative     21/10/2007  NOT IN WEEK 12 DRAFT
    1366  chenz  English  eng_general_db  the          23/10/2007
    1376  chenz  English  eng_general_db  reason       25/10/2007
    1377  chenz  English  eng_general_db  compiled     25/10/2007
    1378  chenz  English  eng_general_db  numerical    25/10/2007
    1379  chenz  English  eng_general_db  may          25/10/2007
    1380  chenz  English  eng_general_db  might        25/10/2007
    1381  chenz  English  eng_general_db  top          25/10/2007
    1382  chenz  English  eng_general_db  reason       25/10/2007
    1383  chenz  English  eng_general_db  the          25/10/2007
    1394  chenz  English  eng_general_db  corrections  26/10/2007
    1395  chenz  English  eng_general_db  webpage      26/10/2007  NOT IN WEEK 12 DRAFT
    1396  chenz  English  eng_general_db  text         26/10/2007
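
    The "NOT IN WEEK 12 DRAFT" flags can be reproduced mechanically by checking each searched item against the words of the draft. This is a minimal sketch assuming a simple whole-word, case-insensitive match; the function name is hypothetical, and the excerpt is two sentences quoted from the Week 12 draft rather than the full report.

```python
import re


def not_in_draft(queries, draft_text):
    """Return the queries that never occur as a whole word in the draft."""
    draft_words = set(re.findall(r"[a-z]+", draft_text.lower()))
    return [q for q in queries if q.lower() not in draft_words]


# Two sentences quoted from Steven's Week 12 draft (not the full report).
excerpt = (
    "The reason for that was because the tables contents were ordered by "
    "the frequency of tokens. Unfortunately the test result suggested that "
    "there were some problems because no text was shown on the "
    "parallelResult web page."
)

flagged = not_in_draft(["reason", "webpage", "text", "research"], excerpt)
print(flagged)
```

    Here `webpage` is flagged because the draft writes it as two words, "web page". That suggests the search may have been prompted by the wording rather than by a highlighted item, though the log alone cannot confirm this.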

    Week twelve

    I updated the contents of the tutorial and case studies files by implementing the corrections that Ramesh suggested, and then recreated them in a newly designed layout. And finally, I uploaded them to the server in order to allow Ramesh to show them to Professor Alison Halstead.

    For the existing parallel text files on the server, there are marks between paragraphs, where # indicates the paragraph number, so that the parallel indexing program knows that where a new paragraph starts and what the paragraph number is. However, the new parallel indexing program (compiled last week) recognizes a new paragraph by empty lines, and then increments the paragraph number by 1. The reason for doing it this way was that if it gave me the correct paragraph number, then I would not have to run the paraAlign.java program to produce the marks before running the parallel indexing program. This would shorten the time required for the whole indexing process.

    The contents of the newly created database were changed slightly after using the newly compiled program. The values under the field ID in table tokens used to be in numerical order, from 1 to the number of total tokens in the file. But after the newly compiled program was used, the sequence was not in numerical order. That was because the tables contents were ordered by the frequency of tokens, which means the most frequent word appeared at the top of the table rather than the first token in the file.

    To test whether the new database could work properly with the parallelResult.php file, I had to upload the database and the parallel text files from localhost to the live server and then move the existing parallel text files on the server to a different directory so that only the newly uploaded files were read, and finally test them by using the parallel function on the website. Unfortunately the test result suggested that there were some problems because no text was displayed on the parallelResult webpage. While I was thinking what the problems might be, I emailed Husman to explain what I have done and what the result was, to see if he knew what had gone wrong. The reasons that I could think of were either there was something else that I had not yet done or the values under the field ID had to be in numerical order. But I did not think the possibilities were high for either of the reasons.

    I had a look at the parallelResult.php file and tried to find out what commands were used to retrieve the data from the database. But I have not resolved anything yet.

    Steven's corrected version

    NEXT STEPS:

    I need to search the same items that Steven searched, and try to work out, by following his search path, which screen displays could have led him to make successful corrections. This will help me to evaluate the query strategy he used, think of quicker/better strategies, ways to improve the user interface, and helpfiles to train users in successful search strategies