Available online at www.sciencedirect.com

System 38 (2010) 50–62

www.elsevier.com/locate/system

Evaluation of a support intervention for senior secondary school English immersion

Elizabeth Walker *

English Department, Hong Kong Institute of Education, 10 Lo Ping Rd., Tai Po, NT, Hong Kong

Received 14 November 2008; received in revised form 22 July 2009; accepted 18 August 2009

Abstract

This paper reports on part of an evaluation of a 2-year program providing preparatory support for 430 Cantonese Chinese-native-speaking students switching from Chinese to English-medium instruction late in their secondary schooling, mainly because of aspiration to English-medium tertiary study. Focusing quantitatively and qualitatively on scientific English achievement, the paper addresses the content or cognitive–academic dimension, so far underrepresented in English-as-foreign-language research, as is senior secondary school immersion itself. While no direct cause-effect relationship between the program and achievement levels was to be claimed, the observed differences between program participants and non-participants were ultimately minimal. Reasons proposed for the intervention’s outcome seemed mainly related to key stakeholders’ apparent limited awareness of cognitive–academic language and its development. The discussion identifies factors arguably crucial for support programs for senior secondary school academic study through English-as-foreign-language. © 2009 Elsevier Ltd. All rights reserved.

Keywords: Senior secondary immersion; Support programs; Cognitive–academic language; Scientific English; Threshold specification

1. Introduction

English-medium immersion education is politicized (Tsui, 2004) in the Hong Kong Special Administrative Region of China (HK) mainly because of the social capital English proficiency brings, e.g. tertiary education. In the late 1990s, some schools, required by government to use mother tongue Cantonese Chinese as the medium of instruction, felt compelled to change to the foreign English-medium in secondary 4 (S4 – students aged 16), to ensure the school’s competitiveness in attracting the top 25% of students, who choose English-medium schooling. The switch to English-medium in S4 is very late in the students’ schooling (So and Jones, 2002), so the government responded to the demands of Chinese-medium students’ schools and parents, and commissioned a 2-year Pilot Enrichment Program (EP), one of the stated aims of which was to ‘smoothen the transfer to S4’, which means in effect to support students from selected Chinese-medium schools to develop the high English proficiency level required to learn academic subjects through English at senior secondary school level, and eventually tertiary level. The EP was conceptualized by a high-level government steering committee including senior academics; content design was tendered to English-language specialists in a local university; and the Pilot’s evaluation was tendered to the author’s university. The evaluation took 3 years because it was undertaken pre, during and post-implementation of the EP. This paper aims to report only one part of the evaluation, using data from a test comparing participating and non-participating students’ scientific English proficiency after the Pilot had finished, several months into the students’ S4 studies.

0346-251X/$ - see front matter © 2009 Elsevier Ltd. All rights reserved.

doi:10.1016/j.system.2009.12.005

* Tel.: +852 2948 7381; fax: +852 2948 7270. E-mail address: [email protected]

1.1. The intervention

The EP occurred across the 2 years prior to the students’ change to English-medium teaching in S4. It comprised 60 modules, from which teachers in each pilot school selected. Topics covered the main areas of the curriculum, e.g. consumer education, old HK (social sciences), shapes (mathematics), weather, deserts (geography, science), music and cooking. The modules and teachers’ notes resembled an integrated-skills language syllabus, with standard communicative language teaching methodology. The module materials included highly detailed teaching notes and materials. The teaching notes listed sequential learning activities for each lesson, with detail such as ‘step 3: Let students work in pairs to identify the timber products from the photos. Note: students may have problems with the pronunciation of Canterbury, sculpture, plate . . . read these words aloud and then ask students to repeat’ (from a module called ‘The Environment’). The materials comprised visuals, and texts for reading or listening, with many writing and speaking exercises based on the input, as preparation for the output. For example, in a module called ‘Natural Hazards’ there was a short reading text with word search activities using text-derived lexis such as monsoon, drought; a series of exercises in preparation for listening to a (scripted) radio report on flooding, and guidance for writing an 80-word news report on a flood.

In-school teaching time per module depended on the extent of materials use, available time and staff allocation. In some schools, the program was taught for about an hour after school, once or twice per week, by English-language teachers; while in others it was taught after school by subject teachers (hence the detailed teaching notes) or as a subject within the curriculum in place of other subject lessons. No school replaced its English-as-subject lessons with the EP modules, since the political imperative was to provide additional English exposure. Furthermore, training sessions had to be provided by the module writers because many of the subject teachers found the module’s EFL-type teaching strategies too hard to follow from the teachers’ notes.

This paper reports some of the findings of the commissioned evaluation of the EP, in which the author led the proficiency testing, mandated by the EP’s steering committee. The paper uses the proficiency test data to compare the academic (scientific) English proficiency of participant and non-participant students about three months after all students had begun S4 English-medium science studies. Specifically, the paper investigates the question: is there a significant difference in test performance in science-related English of S4 students who took a 2-year EP and those who did not? If a significant difference is observed, one factor in the difference might be the EP, indicating the viability of such a support program in enhancing subject-related English proficiency. This is important because EFL research has not much considered the content dimension of the language–content relationship, nor the nature of support (Marsh et al., 2000) required in secondary school English classes for development of cognitive–academic language proficiency (CALP) (Cummins, 1979, 1991), essential at tertiary level. This paper’s focus on the content dimension justifies the attention to CALP-related literature in the following literature review pertaining to the particular nature of senior secondary immersion learning, and relevant support interventions.

2. Literature review

2.1. Dilemmas in late immersion learning as cognitive–academic language learning

‘Immersion’ is a category of bilingual education with a set of well-known prototypical characteristics set out by Swain and Johnson (1997), all of which are met in the junior secondary HK context. While ‘late immersion’ refers to that around age 12, and ‘late late immersion’ (Burger et al., 1997) refers to university level, immersion at senior secondary is obviously somewhere in between. The major mission of senior secondary schooling everywhere, in first or foreign language (Halliday and Martin, 1993), is transition from BICS (basic interpersonal communication skills) to CALP (Cummins, 1979). Swain and Johnson’s junior secondary immersion entry characteristic, ‘limited L2 proficiency’, is, then, highly problematic for senior secondary immersion, where the time for attaining the very high levels of L2 CALP is even shorter and the cognitive–linguistic demand of the content is very high.

CAL is no-one’s mother tongue, and has to be learnt (Halliday, 1978). ‘Raising CALP in science’ means initiating students into the specialized discourse of science which ‘has developed to construct its alternative world view’ (Halliday and Martin, 1993). CAL mastery, to systemic functional linguists, means the ability to ‘reconstrue’ experience from being verb/action-based to noun/‘thingness’-based. ‘Things’ are bounded and determinate, and can be observed, measured and possibly explained (Halliday, 2007, p. 380), so ‘nouns’ are more useful in academic discourse, not to mention that a noun group is infinitely extendable, while a verb group is not. A science-related example is: ‘X can conduct electricity’ becomes ‘X’s ability to conduct electricity’. This ‘nominalization’ opens up space for more meaning because it is then possible to talk about ‘the partial or considerable or moderate ability’. Halliday (2004, p. 29) effectively illustrates the development of noun-based CAL from beginnings in everyday verb-based language by the following re-construals of meaning with imaginary individuals at the age in brackets:

Look, wasn’t it good that we watered that [plant]? See how well it’s growing! (3); How can you be sure that you really know what’s going on? You do something and then you see that it works. Like growing plants: you water them and then they grow (6); We can prove that we know exactly what’s happening by seeing that what we do is working (9); What best proves that we know something accurately is that we can act effectively (12); The best proof that our knowledge is accurate is the fact that our actions are effective (15); The truest confirmation of the accuracy of our knowledge is the effectiveness of our actions (CAL).

CAL mastery also means coping with many other grammatical ‘problems’ (see Halliday, 2004, p. 159ff), e.g. the nature of technical taxonomies ‘a is a kind of x’ (hyponymy) or ‘b is a part of y’ (meronymy, e.g. ‘kitchen’ is a meronym of ‘house’, see Halliday, 2004, p. 67; cf. metonymy, e.g. ‘the press’ is a metonym of ‘news media’); and ‘stretched’ grammar of some verbs (relational or verbal) to construct information of a lower order on which can be based a higher order meaning, e.g. ‘The table tells us (lower order) the risk of getting lung cancer increases as smoking increases (higher order)’ (ibid. p. 167; Christie and Cleirigh, 2008, my brackets).

Kong’s (2004) research with HK biology teachers has shown that, unfortunately, teachers aim to raise CALP only to the extent required by examinations, and the writing demands of public biology examinations promote CAL in a very restricted way in terms of variety of scientific genres. Moreover, in her teacher sample she found that there was little awareness of the specialized discourse of science outside examination requirements. Hoare (2003) illustrates how learning outcomes in science can be influenced by the teachers’ level of awareness of language and of the relationship between language and content. He also points out that most research into outcomes of English-medium education in HK focuses on English proficiency per se rather than the English of/as the ‘content’, or CALP (p. 34). In a rare subject-related study, Yip et al. (2003) conclude that in science, the mean score of English-medium students in a HK government achievement test was statistically significantly lower than that of the highest stratum of the Chinese-medium students (p. 324), despite the higher general intelligence score of the English-medium students. Gu’s (2006) comprehensive review of studies of English-medium achievement reinforces this gloomy picture, reiterating findings by Flowerdew et al. (2006) and Hyland (1997), at tertiary level, warning that universities will continue to have to cope with the consequences.

While Johnson (1997) had claimed that ‘educational outcomes in content subjects [at secondary level], at least in receptive knowledge, are comparable with, and in certain areas superior to achievements in other developed education systems’, Marsh et al. (2000) pointed out that Johnson’s conclusions were based on tests which did not include more language-dependent subjects (e.g. social sciences) and may not have taken account of the extent of L2 actually used in teaching. In contrast, the Marsh et al. (2000) longitudinal meta-analysis of subject achievement across S1–S3 (ages 12–15), in HK, did control for the amount of English actually used in classes, and did investigate social science subjects. They found that there were large negative effects of English-medium (EM) teaching on achievement for science, history and geography. There were slightly smaller negative effects of EM teaching for mathematics. This was explained in two ways. First, that mathematics depends more on symbols and numbers, and is thus less language-dependent. Second, while basic mathematical concepts might have been mastered in the mother tongue in primary school, science and social sciences are new subjects in secondary school. A further finding was that though students at EM schools were more academically talented than those studying through Chinese, their subject-related English proficiency was ‘not greater than might be expected in terms of their general achievement’ (p. 324). That is, the scientific English of even the best students had not benefited much from immersion at junior secondary level.

2.2. Support programs and foreign language CAL development

In contexts where English is L1, e.g. the US, there are many documented types of support for developing the proficiency of students with initially limited English proficiency (LEP). Some are ‘transitional’, non-bilingual programs aiming to develop English proficiency, e.g. ‘sheltered immersion’ (Murray, 1999) or Community College ESL programs (Kuo, 1999), while others aim to develop additive bilingualism, e.g. two-way bilingual programs (Alanis, 2000). There is also growing awareness of the need for special measures to support very high level proficiency, e.g. for academic purposes, not only in English, but in languages other than English (LOTE) (Leaver and Shekhtman, 2002; Byrnes, 2005).

However, where English is a foreign language, the literature on support programs for LEP pre-immersion students is sparse, despite LEP being the default characteristic of the students. Moreover, literature is apparently non-existent at senior secondary level, perhaps because English study often starts early enough for the assumption that there is less ‘need[ ] to develop rigorous models of instruction and learning at the highest levels of proficiency’ (Brecht, 2002). In HK, pre-immersion support is a ‘varied and complex picture’ (Man et al., 2003), possibly related to the large variation in the quality of primary education, which does not often progress much beyond everyday topics such as food, drink and entertainment (He, 2006), despite the students’ mainly academic end-use. Summer ‘bridging’ programs are provided by many of the 114 English-medium schools as well as profit-making and charitable organisations. At S1 level, Man et al. (2003) found that in ‘well established’ English-medium schools, ‘bridging’ tuition such as ‘dictionary skills’, ‘independent learning skills’ and ‘oral skills’ was supported in English classes only, while less well established schools incorporated support measures from a variety of personnel, such as alumni and senior students, in so-called ‘English-loaded subjects’ as well as in English-as-subject. Schools also adopt ‘curriculum tailoring’ and a ‘slower teaching pace’ (p. 137) throughout S1. Very few schools in the study reported explicitly prioritizing CAL support, and when they did, some referred to ‘teaching subject-specific special terminology’ or ‘subject-related jargon’ (p. 136), indicating a word-level view of the CAL support required.

Explicit, comprehensive attention to CAL was also absent from the junior-secondary-related recommendations of Marsh et al. (2000), who simply suggested that, in view of the slow pace of second language development, [HK] ‘students may require a sufficiently long transition period, spent entirely on learning English to an appropriate threshold level of proficiency, prior to starting an English-language secondary school’ (p. 342, my italics). This kind of long-term, intensive, pre-immersion language support was suggested because their study found that the negative effects of immersion ‘declined somewhat over time’ (ibid. p. 341), suggesting that any possible benefits of (late) immersion may take more than 3 years to materialize. The Marsh et al. recommendations were subsequently applied by the HK government steering committee to senior secondary, and the EP was conceived as one form of pre-immersion support. The ‘appropriate threshold level’ and the kind of language support were, however, never defined (see Section 3.3).

3. Methodology of the evaluation

3.1. Sample

All test-takers were S4 native speakers of Cantonese Chinese, around 16 years of age, of average to good academic ability in their age cohort, but who had been assessed by government tests as ineligible for English immersion in S1. They had studied English-as-subject for about 3 h per week (in primary school) to 8 h per week (in secondary school), for around 10 years, and studied academic subjects through Cantonese from S1 to S3. Of these students, 430 had participated in the 2-year EP and 44 had not. The imbalance was unavoidable, because the EP was seen by parents as highly desirable English exposure.

3.2. Data and overall methodology

The data are students’ scores and short-answer responses to two of the items on a test of science-related English CALP (see Appendix). The methodology is essentially cross-sectional in that a snapshot is provided of students’ scientific CALP at one point in time, a few months after the Pilot. The ‘snapshot’ is the chosen method, because the focus here is on the differences at one point in time between test performances by EP and non-EP participants. For space reasons, there is no presentation of other data collected in the longitudinal study detailed in Chow et al., 2004a, e.g. classroom observation and interviews, although these are referred to in Section 5.

3.3. Threshold proficiency and the proficiency test

EP evaluators were required by the steering committee to assess students’ EP exit proficiency levels. Since one of the steering committee’s stated aims of the EP was to ‘smoothen the transition’ to S4 English-medium study, as mentioned, it made sense for the test to be based on informed definitions of S4 subject-related CALP ‘threshold’ levels in order to assess whether the EP students’ exit CALP was adequate for ‘smooth transition’. Doing so turned out to be a research project in itself, in view of two main factors. First, the context-specificity, geographically speaking, of ‘thresholds’ and consequent unsuitability of existing instruments from elsewhere, e.g. the Common European Framework, diagnostic word lists (Xue and Nation, 1984), and the threshold level word list (Van Ek, 1977); and second, the lack of HK government English threshold specifications at S1, not to mention S4, where the threshold level for achieving additive bilingualism is higher. Eventually, an attempt had to be made by the EP evaluators to conceptualize what senior secondary threshold levels might mean in the HK context, and subsequently to develop a feasible, justifiable form of CALP test of scientific English.

The rationale for the part of the proficiency test reported here was that senior secondary science students would need to: understand and act on instructions/explanations for science-specific classroom tasks and conceptual understanding; read, with understanding, science-specific texts; use the texts to learn or to solve problems; and write to display knowledge of science-specific concepts and processes (see Chow et al., 2004a, pp. 34–36 for details of the test development process). It was appreciated that students would have to harness their non-subject-specific English resources (e.g. knowledge of grammar), mainly derived from 10 years of English-as-subject, plus subject-related English resources, ostensibly derived from the 2-year EP (e.g. some science-compatible lexis such as ‘metal’) to do the science-specific test tasks in English. It was felt that the task was authentic, feasible and justifiable because the instructions were in Chinese; the topic had been previously studied in Chinese, which in HK generally means that, because of the mixed use of Cantonese and English, key technical English words (nouns and verbs) are comprehensible (Yip et al., 2003; Lin, 2006); the actual task was only very slightly adapted from a textbook widely used in S3 HK English-medium schools; similar tasks in previous tests had already been piloted in other parts of the evaluation study; and students’ L1 and L2 grammatical competence was taken into account (see scoring below). The test could not be designed to test EP module learning per se, because of the large number of modules and their internal variation. In any case, no single, direct cause-effect relationship between the EP and test result could be sought or claimed, because it was impossible to parcel out the effects on proficiency of regular English-as-subject classes and other variables.

Task one was a weak form of note-taking, an aural cloze. Task two (a) required the definition of an alloy, with possible exemplification and (b) required a cause-effect-type evaluation of alloy use (see Appendix). The performance bands and descriptors were quite general. ‘Above-threshold’ (scores 5–6) meant a complete, linguistically–conceptually accurate response, where ‘accurate’ meant writers’ ‘choices are possible and acceptable within the nexus of intended [scientific] meanings, available [linguistic–systemic] resources, and privileged forms of expression as the [scientific CAL cultural community] has evolved them’ (Byrnes, 2002, p. 45, my bracketed wording). ‘At-threshold’ (scores 3–4) meant a generally linguistically–conceptually competent response. A score of 3, called ‘at-lower-threshold’, was less complete and/or requiring more careful reading. ‘Below-threshold’ (scores 1–2, and 0) meant a linguistically–conceptually partially appropriate response, the 1 score exhibiting only a sense of a correct meaning. Score 0 meant a linguistically–conceptually flawed, and/or unintelligible, minimal or nil response. The wording ‘linguistically–conceptually’ in these general descriptors construes the researchers’ view of the fundamental integration of language–thought–content. That is, conceptual understanding is the language choice; there is no systematic thought or content without language (Halliday, 1978).
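The banding described above can be summarized in a short sketch. It is only an illustration of the score-to-band mapping as the descriptors state it; the function name `calp_band` and the exact label strings are assumptions made here, not part of the evaluation instruments.

```python
def calp_band(score: int) -> str:
    """Map a 0-6 rating to the band label used in the general descriptors.

    Hypothetical helper: the name and label strings are illustrative only.
    """
    if not isinstance(score, int) or not 0 <= score <= 6:
        raise ValueError("score must be an integer from 0 to 6")
    if score >= 5:
        return "above-threshold"      # scores 5-6: complete, accurate response
    if score == 4:
        return "at-threshold"         # generally competent response
    if score == 3:
        return "at-lower-threshold"   # less complete, needs more careful reading
    return "below-threshold"          # scores 0-2: partially appropriate to nil


# Band labels for every possible score, in order from 0 to 6.
bands = [calp_band(s) for s in range(7)]
```

Keeping score 3 as its own label mirrors the descriptors, which treat it as the lower edge of the 3–4 band rather than as a separate band.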

Text 1 below was considered a ‘threshold’ response for the definition task and Text 2 a ‘threshold’ response for the evaluation task. The wording in brackets would be ‘above threshold’ 6, exhibiting more typical CAL features (Halliday, 2004) such as nominalization. Raters were instructed that a 4 or borderline 5 score could be obtained by conceptual–linguistic accuracy without such nominalization as ‘for the improvement of’, or such non-basic structures as passive voice or to-clauses, and despite slips in one or two of the grammatical elements underlined. The non-bold words were provided in the test rubrics.

Text 1

An alloy is made by adding (the addition of) other elements (metals or non-metals) to a metal to form a (uniform) mixture to improve (for the improvement of) the properties of the metal. (For example, brass is composed of about 70% copper and about 30% zinc. Brass is harder and more resistant to corrosion than pure copper and pure zinc).

Text 2

Alloy A is suitable for making warships because the low density (enables flotation) makes the ship float easily; a warship needs a high strength body (for war damage reduction) to reduce damage in war; warships are always in water (contact), so the body must be resistant to corrosion. Alloy B (C) is suitable for a window frame because it is cheaper and its strength and resistance to corrosion are not low (its low price enables easy purchase, and its medium strength and corrosion resistance enable durability.)

3.4. Test implementation

Thirteen intact classes of S4 students in 12 schools were tested over 3 months. Since the evaluators had been required to administer proficiency tests regularly over the preceding 3 years, the testing process had been constantly refined, and by the time this test was administered near the end of the EP evaluation, the process was as rigorous and consistent as normal variation in in-school testing conditions permits. One experienced rater scored all science papers, with random sub-sample double rating by one other experienced rater. The experienced raters had both marked the science section of the tests from four previous EP testing periods, where inter-rater reliability was established, and raters received feedback on their rating for standardization purposes. There is support for single-rater assessment of science performances, though not necessarily in other disciplines (Ruiz-Primo and Shavelson, 1996, p. 1050; Shavelson et al., 1999). Single-rater assessment also has the advantage that rater error remaining after the double rating should be consistent, though some readers may see this as a limitation of the research.

3.5. Data analysis

Regarding the quantitative data, comparison was made between the itemized test scores of the EP and non-EP students. Comparisons were made by examining percentages of ordinal scores at each score, as presented in Tables 2–4; and, following standard practice (Harwell and Gatti, 2001), treating ordinal data as interval data and testing for statistically significant differences in ‘mean’ scores by applying parametric t-tests, as presented in Table 1. Knapp (1990) and Wright (2003) present positive views of this practice. There was no attempt to quantitatively account for differences between EP delivery by subject teachers and by English teachers, though qualitative data did address this indirectly (Chow et al., 2004b). This was mainly because the teachers were essentially volunteers and the issue could have been counter-productive.

Table 1
Comparison of 2-year EP students and non-EP students’ CALP (scientific English CALP scale 0–6; independent samples t-test).

| Task | EP mean (n = 430) | Non-EP mean (n = 44) | Mean difference | t | Sig. p value |
|---|---|---|---|---|---|
| Task 1. Taking dictation of technical vocab.: extraction and disposal of metals | 2.67 (SD 1.33) | 2.34 (SD 1.08) | 0.33 | 1.85 | 0.06 |
| Task 2(a). Defining an alloy | 0.63 (SD 0.87) | 0.57 (SD 0.85) | 0.06 | 0.49 | 0.62 |
| Task 2(b). Applying knowledge of properties of alloys to a practical problem | 2.20 (SD 1.51) | 2.36 (SD 1.63) | −0.16 | −0.65 | 0.52 |

0 = flawed, minimal, nil; 1–2 = partially appropriate, below-threshold; 3–4 = generally competent, near or at-threshold; 5–6 = linguistically–conceptually accurate, above threshold.

Table 2
Comparison of Task 1 performances: listening and writing – dictation of vocabulary on extraction and disposal of metals.

| Performance | % EP students (n = 430) | % Non-EP students (n = 44) |
|---|---|---|
| 0 | 0.9 | 0.0 |
| 1 | 23.5 | 22.7 |
| 2 | 18.4 | 38.6 |
| 3 | 37.2 | 25.0 |
| 4 | 6.3 | 9.1 |
| 5 | 12.8 | 4.5 |
| 6 | 0.9 | 0.0 |

0 = flawed, minimal, nil; 1–2 = partially appropriate, below-threshold; 3–4 = generally competent, near or at-threshold; 5–6 = linguistically–conceptually accurate, above threshold.

Table 3
Comparison of Task 2 (a) performances: reading and writing – defining an alloy.

| Performance | % EP students (n = 430) | % Non-EP students (n = 44) |
|---|---|---|
| 0 | 58.1 | 61.4 |
| 1 | 24.9 | 25.0 |
| 2 | 12.6 | 9.1 |
| 3 | 4.2 | 4.5 |
| 4 | 0.2 | 0.0 |
| 5 | 0.0 | 0.0 |
| 6 | 0.0 | 0.0 |

0 = flawed, minimal, nil; 1–2 = partially appropriate, below-threshold; 3–4 = generally competent, near or at-threshold; 5–6 = linguistically–conceptually accurate, above threshold.

Table 4
Comparison of Task 2 (b) performances: reading and writing: evaluating alloys.

| Performance | % EP students (n = 430) | % Non-EP students (n = 44) |
|---|---|---|
| 0 | 16.3 | 20.5 |
| 1 | 18.6 | 9.1 |
| 2 | 23.3 | 22.7 |
| 3 | 22.1 | 15.9 |
| 4 | 10.7 | 27.3 |
| 5 | 8.8 | 2.3 |
| 6 | 0.2 | 2.3 |

0 = flawed, minimal, nil; 1–2 = partially appropriate, below-threshold; 3–4 = generally competent, near or at-threshold; 5–6 = linguistically–conceptually accurate, above threshold.
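As a rough illustration of the parametric comparison described above, the sketch below recomputes a t statistic from the rounded means and standard deviations reported for the three tasks. The paper does not say whether equal variances were assumed, so the unequal-variance (Welch) form used here is an assumption, as are the names `welch_t` and `summary_stats`. Because the inputs are rounded, the results should land close to, but not exactly on, the reported t values.

```python
import math


def welch_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """t statistic for two independent samples, unequal-variance (Welch) form."""
    standard_error = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return (mean_a - mean_b) / standard_error


# Rounded summary statistics as reported for the EP (n = 430) and
# non-EP (n = 44) groups: (EP mean, EP SD, non-EP mean, non-EP SD).
summary_stats = {
    "Task 1": (2.67, 1.33, 2.34, 1.08),
    "Task 2(a)": (0.63, 0.87, 0.57, 0.85),
    "Task 2(b)": (2.20, 1.51, 2.36, 1.63),
}

t_values = {
    task: welch_t(m_ep, sd_ep, 430, m_non, sd_non, 44)
    for task, (m_ep, sd_ep, m_non, sd_non) in summary_stats.items()
}

for task, t in t_values.items():
    print(f"{task}: t = {t:.2f}")
```

With group sizes this unequal (430 against 44), the standard error is dominated by the smaller group, which is one reason modest mean differences do not reach significance.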

Regarding the qualitative data, the short answers to question 2, the brief text descriptions are informed by consultation with a subject-expert and research into the language of scientific genres (e.g. Unsworth, 2001; Halliday, 2004). Capitalized terms for grammatical functions, e.g. Thematized Subject, follow Systemic Functional Linguistics convention.

4. Results

Table 1 shows that in Task 1 EP students were able to represent heard words in writing to a reasonable extent, though not statistically significantly better than non-EP students. Table 2 shows that about 13% of EP students performed above threshold in the 5–6 band on this task, and 44% in the at-threshold 3–4 band. On the other hand, no non-EP students scored 6, only around 5% scored above-threshold 5, and around 40% scored below-threshold 1–2.

Tables 3 and 4 below show that most performances on the definition task were far below threshold, at zero, among both participants and non-participants, indicating students’ unfamiliarity with the term ‘alloy’, despite the clue that ships and windows were made of them. In the evaluation task (with more cues), though many students in both groups scored zero, most students in both groups scored in the range from below-threshold 2 to at-lower-threshold 3. Interestingly, however, while 42% of EP participants scored below-threshold 1–2, and only 32% scored lower-or-at-threshold 3–4, this was reversed for the non-EP participants, of whom 43% scored lower-or-at-threshold 3–4, and 27% scored at-threshold 4.

Typical responses (score 0) for the alloy definition task were: ‘The warship is steel’; ‘warship is made from aluminium’. Below-threshold (score 1) responses were ‘The alloy is corrosion resistance’; ‘the frame of window was made from aluminium alloys’; and a rarer below-threshold response (score 2) was ‘Alloy is make from two different metals and it will not corrosion resistance faster metals’.

The responses exhibit students’ difficulties with making appropriate scientific meanings, though ‘definition’ is generally considered a task carrying lower cognitive–linguistic demand than evaluation. The positioning of writer and reader by declarative grammatical mood, construing information-giving, is one of the few accuracies. Relational ‘is’ would appropriately construe definition if the Thematized Subject entity was ‘an alloy’, but only the ‘score 2’ response manages to achieve this to a limited extent.

A typical below-threshold response (score 2) for the evaluation task was:

D is used to make warship. The corrosion resistance is high and the density and strength are medium. Price is high because all the warship need many money to build. C is used to make window, because window is not like warship. The price cannot be very high and the corrosion resistance is high [sic].

An at-lower-threshold response (score 3) was:

‘Alloy A is most suitable for making the warship as A is good in corrosion resistance. A also has a low density. Alloy B is most suitable for making the frame of window as B is cheap and it is strong’ [sic].

This item may have elicited better responses because more cues were available in the rubrics. The at-lower-threshold ‘3’ text is more cognitively–linguistically appropriate than below-threshold text ‘2’ because of a more conceptually appropriate choice of alloy (A was more suitable than D), and less manipulation of the grammar of the rubrics. Both texts omit an answer for part of the question (reasons for chosen criteria are required) possibly because English cues for this part were not available in the rubrics. There are also problems in construing scientific generality in noun groups and verb tense. Modality is absent in the ‘3’ text and minimal in the ‘2’, though it is an unmarked feature of evaluative text-types (Halliday/Matthiessen, 2004). There is evidence of stance in ‘good in’ in the ‘3’ text, but evaluative lexis is minimal and mostly copied in the ‘2’ text. The ‘3’ text exhibits generally appropriate textual meaning through lexical cohesion, adequate Thematization, and conjunction, though some referencing with ‘the’, referring to the test rubrics, reduces the text’s independence. Clausal Themes (Halliday/Matthiessen, 2004) in the ‘2’ text are poorly selected for demonstrating conceptual understanding, and lexical cohesion seems achieved only through copying the test rubrics.

5. Discussion

While research (e.g. Shavelson et al., 1999) has established that wider multi-tasking than was possible in this study is the most reliable way of assessing science performances, the results of the reported parts of this evaluation do tend to cast some doubt on the viability of the intervention, which was implemented as a kind of frequent, though not intensive, discipline-related language teaching. The teaching turned out to be an overly weak form of CAL teaching. In terms of science, after 2 years, participants seemed to do no better than non-participants. Moreover, surprisingly, non-participants could harness task cues better than participants in evaluating ‘alloys’ even though they were not sure of the scientific definition of alloys. Notwithstanding this counter-intuitive result, the below-threshold performances of the majority of all test-takers indicate that even though all have reached S4, the demonstrated low proficiency in written science CALP of these students switching from Chinese to English-medium so late in their schooling could be cause for some concern. The performances imply that all might face disadvantage, as Marsh et al. (2000) warned, in terms of competing for tertiary study places in science, and in learning science well in English both in senior schooling and at university, relative to those students who have had three additional years of English-medium instruction in S1–3. Furthermore, the latter were, on the basis of region-wide government aptitude testing in S1, assessed as more academically able in the first place.

The demonstrated deficiencies in students’ scientific CALP performances might be accounted for in several interrelated ways, which if addressed, could help improve the EP program and inform similar pre-immersion interventions. The improvements relate to the all-important factor, the teaching, as well as to other basic matters such as module design, selection and time allocation. The following argument draws on other parts of the evaluation, as well as the part reported here.

In regard to the teaching, other parts of the evaluation observed that the quality of EP teaching was generally not conducive to either general English development or specific CAL development. Teacher talk dominated classes, and did not exemplify language-conscious teaching (Chow et al., 2004a, p. 50); students had insufficient good quality output opportunities (ibid.); and teachers did not exploit the shared L1 as a means of access to the English CAL (ibid.). Moreover, also in other parts of the evaluation, teacher interviews (Chow et al., 2004b, pp. 55, 64) and classroom observations (Chow et al., 2004a, p. 47) showed lack of awareness of English teachers’ role in teaching English for use in English-medium subject areas, and subject teachers’ role in language-conscious subject teaching. Not only was the teaching generally unhelpful for CAL development, but it was found in another part of the evaluation that EP material design solely by English specialists meant that the modules were not specialized enough lexico-grammatically for the participating science teachers but sometimes too specialized for the participating English-as-subject teachers (Chow et al., 2004b, pp. 52 and 65).

The sudden call to the EP designers for teacher training before the EP began gave early indication of some teacher uncertainties. These may have accounted for some unhelpful implementation, such as the lack of systematic or single-subject focus in teachers’ module selections. If teachers had appreciated the very long time needed for CAL development, they might have systematically selected many or all of the modules in one subject area, such as the 12 available science modules, and there might have been more positive learning effects. As it turned out, any potential positive effects may have been diluted by recorded unsystematic and unfocused module choices.

On a more positive note, other parts of the evaluation (see Chow et al., 2004a, p. 124) found that EP participants did perform significantly better than non-participants in listening-related tasks when the data was grouped by integrated English skills (e.g. listening/writing; reading/writing), not per test item, and when the mathematics and science test scores were combined and examined both non-parametrically and parametrically. The slightly more positive result at the most elementary linguistic level, the perception of single words and sounds, suggests that this relatively long-term but non-intensive support might have been beginning to influence beneficial learning effects. Longer, more intensive, more CAL-compatible support may have facilitated more positive and wider effects. Particularly in view of the long-term nature of language and CAL learning, and the government-assessed lower general academic ability of the Chinese-medium students, stronger integration into the regular curriculum, including English-as-subject lessons, from day one in secondary school, might be warranted. After all, the decision of most schools to devote only an hour after school once or twice a week to the EP meant that it was unlikely to lead to significant learning advantages. Fuller articulation of the intervention with the regular English-as-subject curriculum and with the senior subject curriculum is suggested, because the English curriculum, which often occupies up to one third of the HK school timetable, could possibly bear some responsibility for students’ demonstrated generic weaknesses. Teacher interviews from another part of the evaluation (Chow et al., 2004b, p. 56) support the view, illustrated by Davison (2005, p. 235), that foreign-language-as-subject curricula in HK traditionally deal primarily with everyday topics and BICS, and are largely ‘irrelevant to other subjects’ curricula’. This might be unimportant elsewhere, but in HK and other locations where students need to rapidly develop very high levels of English for tertiary-level CAL, it is important that students’ English CAL profits from the large number of English-as-subject classes.

On a final positive note, in view of the fundamental limitations discussed, the student performances in this study could be seen as creditable, cause even for optimism. To build on this optimism, it seems that what is needed is deeper understanding by all stakeholders of the role of English as a cognitive–academic tool, and of the nature of CAL mastery outlined above. Since the literature tells us that the scientific English of even the most able HK students had not benefited much from immersion at junior secondary level, CAL awareness-raising, based on abundant CAL-related systemic functional linguistic research, could be a helpful element in this and similar interventions for those attempting the mammoth task at senior secondary level.

6. Conclusion

The senior secondary school mission of mastery of cognitive–academic language involves much more than a focus on dictionary skills and other features of bridging programs at junior secondary. This study’s results have shown that, in HK, more effective support for very late English immersion at senior secondary appears necessary if Chinese-medium students are not to face disadvantage in tertiary level English-medium studies, and if their secondary and tertiary study is to be the rich experience to which they are entitled. In terms of pre-immersion intervention programs in contexts where very high levels of cognitive–academic English are required in a short time, it is suggested that more effective support entails greater stakeholder awareness of the complex and long-term nature of cognitive–academic language mastery; more informed module planning and selection; longer-term, more CAL-compatible English teaching; and more student-centred integration with other good quality English learning opportunities.


Appendix A

: Section I Listening Comprehension and Writing

1. Listen carefully and follow the instructions. A passage will be read three times in English. Listen carefully and fill in the blanks with the words you hear.

Extraction and disposal of metals

There are environmental problems associated with the ______ and ______ of metals. In extracting metals from the earth, human beings cause ______. In addition, metal needs a large amount of energy, and produces a large amount of air ______ such as ______ dioxide.

In disposing of metals, other environmental problems arise. For example, huge amounts of ______ are created and some of this waste, such as mercury, is ______ for humans and animals.

: Section II Solving written problems and writing solutions

2. Read the following problems and write your answer in English.

(a) The warship and the frame of window are made from different alloys. What is an alloy?

(b) The following table lists information about four alloys. Which alloy in the table is most suitable for making the warship? Which alloy is suitable for making the frame of window? State TWO reasons for each choice.

         Property
Alloy    Density    Strength    Corrosion resistance    Price
A        Low        High        High                    High
B        High       High        Medium                  Low
C        Medium     Low         High                    Low
D        Medium     Medium      High                    High

References

Alanis, I., 2000. A Texas two-way bilingual program: its effects on linguistic and academic achievement. Bilingual Research Journal 24, 3 (online).

Brecht, R.D., 2002. Foreword. In: Leaver, B.L., Shekhtman, B. (Eds.), Developing Professional-Level Language Proficiency. Cambridge University Press, Cambridge.

Burger, S., Wesche, M., Migneron, M., 1997. Late-late immersion: discipline-based second language teaching at the University of Ottawa. In: Johnson, R.K., Swain, M. (Eds.), Immersion Education: International Perspectives. Cambridge University Press, Cambridge.

Byrnes, H., 2002. Contexts for advanced foreign language learning: a report on an immersion institute. In: Leaver, B.L., Shekhtman, B. (Eds.), Developing Professional-Level Language Proficiency. Cambridge University Press, Cambridge, pp. 61–76.

Byrnes, H., 2005. Reconsidering the nexus of content and language: a mandate of the NCLB legislation. Modern Language Journal 89 (2), 277–282.

Chow, A., Li, B., Ma, A., Pang, M., Tse-tso, Y.W., Tong, A., Walker, E., Chan, J., Wong, W., 2004a. Final Report: Study on the Effectiveness of an Enrichment Program in CMI Schools. Hong Kong Institute of Education and Education and Manpower Bureau, Hong Kong.

Chow, A., Li, B., Ma, A., Pang, M., Tse-tso, Y.W., Tong, A., Walker, E., Chan, J., Wong, W., 2004b. Final Report: Study on the Effectiveness of an Enrichment Program in CMI School: Report on Case Study Schools. Hong Kong Institute of Education and Education and Manpower Bureau, Hong Kong.

Christie, F., Cleirigh, C., 2008. On the importance of ‘showing’. In: Wu, C.Z., Matthiessen, C.M.I.M., Herke, M., Proceedings ISCF, Voices Around the world, vol. 35, 35th ISCF organizing committee, Sydney, pp. 13–19.

Cummins, J., 1979. Cognitive/academic language proficiency, linguistic interdependence, the optimum age question and some other matters. Working Papers on Bilingualism 19, 121–129.

Cummins, J., 1991. Conversational and academic proficiency in bilingual contexts. Association Internationale de Linguistique Appliquée Review 8, 75–89.

Davison, C., 2005. Learning your lines: negotiating language and content in subject English. Linguistics and Education 16 (2), 219–237.

Flowerdew, J., Miller, L., Li, D.C.S., 2006. Chinese lecturers’ perceptions, problems and strategies in lecturing in English to Chinese-speaking students. RELC Journal 31 (1), 116–138.

Gu, Y., 2006. Towards a coherent foreign language policy in China: lessons from Hong Kong. Asian Journal of English Language Teaching 16, 67–87.

Halliday, M.A.K., 1978. Language as Social Semiotic: The Social Interpretation of Language and Meaning. Edward Arnold, UK.

Halliday, M.A.K., 2004. The language of science. In: Webster, J.J. (Ed.), The Collected Works of M.A.K. Halliday, vol. 5. Continuum, London and New York.

Halliday, M.A.K., revised by Matthiessen, C.M.I.M., 2004. An Introduction to Functional Grammar, third ed. Arnold, London and New York.

Halliday, M.A.K., 2007. Language and education. In: Webster, J.J. (Ed.), The Collected Works of M.A.K. Halliday, vol. 9. Continuum, London and New York.

Halliday, M.A.K., Martin, J.R., 1993. Writing Science: Literacy and Discursive Power. Falmer, London.

Harwell, M.R., Gatti, G.G., 2001. Rescaling ordinal data to interval data in educational research. Review of Educational Research 71 (1), 105–131.

He, A.E., 2006. Subject matter in Hong Kong primary English Classrooms: a critical analysis of teacher talk. Critical Inquiry in Language Studies 3 (2&3), 169–188.

Hoare, P., 2003. Effective Teaching of Science Through English in Hong Kong Secondary Schools. Unpublished PhD Thesis, University of Hong Kong, Hong Kong.

Hyland, K., 1997. Is EAP necessary? A survey of Hong Kong undergraduates. Asian Journal of English Language Teaching 7, 77–79.

Johnson, R.K., 1997. The Hong Kong education system: late immersion under stress. In: Johnson, R.K., Swain, M. (Eds.), Immersion Education: International Perspectives. Cambridge University Press, Cambridge.

Knapp, T.R., 1990. Treating ordinal scales as interval scales: an attempt to resolve the controversy. Nursing Research 39 (2), 121–123.

Kong, S., 2004. Writing in the Immersion Classroom: Developing Students’ Content Knowledge and Second Language Proficiency. Unpublished PhD Thesis, City University of Hong Kong.

Kuo, E.W., 1999. English as a second language in the community college curriculum. New Directions for Community Colleges 100, 53–62.

Leaver, B.L., Shekhtman, B. (Eds.), 2002. Developing Professional-Level Language Proficiency. Cambridge University Press, Cambridge.

Lin, A., 2006. Beyond linguistic pluralism in language-in-education policy and practice: exploring bilingual pedagogies in a Hong Kong Science classroom. Language and Education 20 (4), 287–305.

Man, E.Y.F., Coniam, D., Lee, I., 2003. Adapting to teaching in the medium of English: how are schools helping their secondary one students cope? Journal of Basic Education 12 (2), 125–149.

Marsh, H.W., Hau, K.T., Kong, C.K., 2000. Late immersion and language of instruction in Hong Kong high schools: achievement growth in language and non-language subjects. Harvard Educational Review 70 (3), 302–346.

Murray, G.L., 1999. Autonomy, technology and language learning in a sheltered ESL immersion program. TESL Canada Journal 17 (1), 1–15.

Ruiz-Primo, A., Shavelson, R.J., 1996. Rhetoric and reality in science performance assessments: an update. Journal of Research in Science Teaching 33 (10), 1045–1063.

Shavelson, R.J., Ruiz-Primo, A., Wiley, E.W., 1999. Notes on sources of sampling variability in science performance assessments. Journal of Educational Measurement 36 (1), 61–71.

So, D.W.C., Jones, G.M. (Eds.), 2002. Education and Society in Plurilingual Contexts. Brussels University Press, Brussels.

Swain, M., Johnson, R.K., 1997. Immersion education: a category within bilingual education. In: Johnson, R.K., Swain, M. (Eds.), Immersion Education: International Perspectives. Cambridge University Press, Cambridge, pp. 1–15.

Tsui, A.B.M., 2004. Medium of instruction in Hong Kong: one country, two systems, whose language? In: Tollefson, J.W., Tsui, A.B.M. (Eds.), Medium of Instruction Policies: Which Agenda? Whose Agenda? Lawrence Erlbaum Associates, Mahwah, NJ, pp. 283–294.

Unsworth, L., 2001. Teaching Multiliteracies Across the Curriculum: Changing Contexts of Text and Image in Classroom Practice. Open University Press, New York and London.

Van Ek, J.A., 1977. Threshold Level for Modern Language Learning in Schools. Longman, London.

Wright, D.B., 2003. Making friends with your data: Improving how statistics are conducted and reported. British Journal of Educational Psychology 73, 123–136.

Xue, G., Nation, I.S.O., 1984. A university word list. Language Learning and Communication 3 (2), 215–229.

Yip, D.Y., Tsang, W.K., Cheung, S.P., 2003. Evaluation of the effects of medium of instruction on the science learning of Hong Kong secondary students: performance on the science achievement test. Bilingual Research Journal 27 (2), 295–331.