
Language Proficiency Testing for Chinese as a Foreign · PDF fileIn 2010, the “new HSK” replaced the former HSK version.6 However, in China (2013) some universities still offer


1 Introduction

Since the People’s Republic of China adopted its Reform and Opening policy in 1978, the economic and political importance of China has grown enormously, and more and more individuals want or need to learn Chinese. Although reliable data on the worldwide number of learners of Chinese do not exist (Sūn Déjūn, 2009, p. 19), there is evidence of a strong increase. In South Korea, there are around 100,000 learners in schools and universities, and together with those who study via TV, radio or other media, they exceed 1,000,000 (Niè Hóngy�ng, 2007, p. 87). In Japan, Chinese has become the second most popular foreign language after English, with 2,000,000 learners (S� Jìng, 2009, p. 88). Europe still lags behind; however, in Germany more than 4,000 students learn Chinese in intensive language programs at universities and colleges (Bermann and Guder, 2010), while an unknown number study it in optional classes. Together with learners at secondary schools, students of Chinese in Germany number 10,000 in total, a figure exceeded in Europe only by France (Fachverband Chinesisch, 2011). In the United States, nearly 2,000 high schools already offer Chinese, which has become the third most popular language after English and Spanish (ibid.).

Figure 1: HSK test taker development (black: foreign group; gray: Chinese ethnic minorities). Bars for 2006 are estimated from a total of 160,000 test takers (Yáng Chéngq�ng and Zhāng Jìnj�n, 2007, p. 108); other data from Sūn Déjūn (2009, p. 20). Data rights shifted from the HSK Center to the Hanban5 in 2005.

5 The Hanban (Hànbàn or Guójiā Hànbàn) stands for (Zhōngguó Guójiā Hànyǔ Guójì Tuīguǎng Lǐngdǎo Xiǎozǔ

[Figure 1 chart: y-axis “HSK test takers” (0 to 100,000), x-axis “Year” (1990 to 2006)]


In addition, more and more students participate in language proficiency tests for Chinese as a foreign language (CFL) (cf. Figure 1), and the number of tests has also risen. Since the early 1980s, far more than twenty tests have been launched (cf. chapter 1.5). These tests fulfill different purposes, such as helping test takers enter a Chinese or Taiwanese university, placing students into appropriate language courses, giving credit points to students who have gained considerable knowledge prior to their studies, or helping companies find employees who are able to handle business communication and translation work. In South Korea, many job applicants are expected to be able to use Chinese due to the strong economic ties between China and Korea, and the level of proficiency is often directly related to salaries (Niè Hóngy�ng, 2007, p. 87). There, the HSK (Hànyǔ Shuǐpíng Kǎoshì), the official Chinese proficiency test of the People’s Republic of China (PRC), can have a major impact on test takers’ lives, that is, it can affect “the candidates’ future lives substantially,” and it can therefore be considered a high-stakes test (Davies, Brown, Elder, Hill, Lumley, and McNamara, 1999, p. 185; Bachman and Palmer, 1996, pp. 96–97).

The HSK has the largest test population of all CFL tests, and it has prompted the most research. In 2007, more than 1,000,000 test takers participated in it (Wáng Jímín, 2007, p. 126). In Germany, the HSK was the only CFL proficiency test available until 2009, when the Taiwanese TOCFL (Test of Chinese as a Foreign Language; Huáyǔwén Nénglì Cèyàn) entered Germany. In 2010, the “new HSK” replaced the former HSK version.6 However, as of 2013, some universities in China still offered the old HSK.

In fact, overhauling the old HSK was necessary because it had several major limitations: the HSK resembled the format of a discrete-point test7; it did not directly assess oral and written productive skills; and its score and level system was not easy to comprehend (cf. Meyer, 2009), which made it difficult for stakeholders to interpret the meaning of HSK scores. On the other hand, the HSK had several advantages: it was a highly standardized test with very high objectivity and reliability. Both of these qualities derived partly from the fact that the test almost exclusively used items in multiple-choice format. The test intended to measure the Chinese language ability needed for successfully studying in China, and test takers’ results were set in relation to a norm-reference group. It was a high-stakes test for many Koreans, Japanese, Chinese ethnic minorities, and, in part, other foreigners interested in studying in China. The (old) HSK has now been used for

Bàngōngshì; “The Office of Chinese Language Council International”). It is a non-governmental and non-profit organization affiliated with the PRC’s Ministry of Education.

6 According to a high-ranking HSK official, the new HSK has absolutely nothing in common with the old one “despite its name” (private conversation in 2010). Official documents and research literature spell the term inconsistently, as “new HSK” or “New HSK.” In this dissertation, the spelling “new HSK” has been adopted.

7 Such a test is called fēnlìshì cèshì in Chinese.


more than 20 years, during which time it underwent changes, and some research on it is still being conducted. However, the major question is which inferences can be drawn from the scores of test takers, especially those with a “Western” native language background, such as individuals from Germany. Therefore, this work will examine the quality of the (old) HSK, the core question being whether HSK scores, or rather their interpretations, can be considered valid. Is it a fair exam, or is it biased8 in favor of Japanese, Korean or other East Asian test takers9? What do HSK scores tell us about learners of Chinese?

1.1 An integrative validation of the old HSK

Although many HSK validation studies have already been conducted, this is the first work to provide an integrative validation approach, which attempts to incorporate all studies. Before beginning this undertaking, however, one important fact should be stressed: there is no perfect test. As Cronbach ([1949] 1970) has stated:

Different tests have different virtues; no one test in any field is “the best” for all purposes. No test maker can put into his test all desirable qualities. A design feature that improves the test in one respect generally sacrifices some other desirable quality. Some tests work with children but not with adults; some give precise measures but require much time; … Tests must be selected for the purpose and situation for which they are to be used. (ibid., p. 115; italics added)

Thus, this work examines whether the HSK is a valid test for a specific purpose.10 For what kind of use do the interpretations of HSK scores make sense? How can we interpret HSK scores and what inferences can we draw from HSK results? What is the intended use of the HSK, and what else should the HSK measure? In what sense are interpretations limited? What do the HSK and Chinese language testing research tell us about the quality of the HSK? What are the logical inferences leading from HSK test performance to conclusions about test takers? Which parts of the HSK consist of weak inferences that should be improved? And finally, what are the intended and unintended outcomes of using the HSK?

Another question concerns whether the HSK can be used as a diagnostic tool for the Chinese language acquisition process, especially for Western learners.

8 If a test or an item favors a group of test takers, and performance on it is influenced by a trait or feature of this group that is not part of the construct the test intends to assess, then the test or the specific item can be considered biased (cf. section 4.5.4).

9 In this work, the terms test taker, (test) candidate, testee, participant and examinee are used synonymously.

10 Ziermann (1996) compared the answering time of the HSK listening subtest with that of other language proficiency tests, such as the TOEFL or the Certificate for German as a Foreign Language (Zertifikat Deutsch als Fremdsprache). This comparison rests on the assumption that there might be a universal and appropriate answering time for listening subtests in language proficiency tests in general (across languages and across tests), which reflects a fundamental misunderstanding of testing.


Many Western learners did not consider (old) HSK “scores”11 a valid measure of their Chinese language competence, and they complained that the HSK had several shortcomings. First, the HSK did not assess productive oral skills. Second, Chinese characters were displayed in all sections and subtests (e.g., also in the multiple-choice answers of the listening subtest). And third, the HSK was mostly a multiple-choice test showing features of a discrete-point test, which did not replicate authentic language tasks.12 In contrast, HSK researchers claimed that the HSK “conforms to objective and real situations” (Liú Yīnglín, [1990] 1994, p. 1, preface).

This work shows that the old HSK provided valid score interpretations for assessing Chinese learners’ listening and reading abilities for the purpose of studying in China. Thus, the HSK’s usefulness should be evaluated with respect to its specific purpose. The validation, or the evaluation of its usefulness, will be undertaken in chapter 4 on the basis of HSK research. This validation study reveals weak aspects of the inferences drawn from the scores of HSK test takers. For instance, inferences about test takers’ productive skills are rather limited. Hence, one major goal of this study is to explain clearly which parts of the HSK should be strengthened to provide a better estimate of whether learners’ Chinese language abilities suffice for studying at a Chinese university. The validation approach used in this dissertation is an argument-based approach (Kane, 1990, 1992, 2006), which has been used successfully in recent years and was adopted in developing the new Test of English as a Foreign Language™ (TOEFL®), the TOEFL iBT (Chapelle, Enright, and Jamieson, 2008).

In chapter 5, the HSK is used as a diagnostic tool for estimating the learning progress of learners of Chinese in relation to the time they have spent studying the language in class. The study was conducted in Germany, which has one of the largest Chinese learning communities in Europe. Over two years, 257 test takers participated in this study13, and 99 learners without any Chinese language background provided a good estimate of how many hours an “average” German learner needed to spend in class to achieve a specific (old) HSK level. The main questions guiding this research are:

- Does a positive correlation exist between the time learners spent in Chinese language classes and HSK scores?14

11 Scores themselves can never be valid or invalid, just the interpretations of scores and their use can be valid or not. This will be explained in more detail in section 3.3.

12 The HSK consisted of 170 items. Of these, 154 were multiple-choice items with four answer choices (one key and three distractors). In the cloze test (the last 16 items), test takers had to fill in blanks with characters to complete short texts.

13 The surveys were conducted directly after the test, and participation was optional.

14 Other scenarios might also be possible. For instance, there could be a correlation up to a certain number of hours of Chinese classes a learner has taken, e.g., up to 1,000 hours, but beyond this threshold other factors could become more important for gaining competence in Chinese (e.g., communicating with Chinese friends, watching Chinese movies,


- If there is a relation between the time spent in classes and HSK results, what is the nature of this relation? Is it possible to estimate a regression line for predicting how long it takes to reach a certain level of proficiency in Chinese?

- What do these results tell us about the nature of the Chinese language acquisition process of German learners? What are the main factors influencing this process?
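The correlation and regression questions above can be made concrete with a small computation. The following is only an illustrative sketch: it assumes Pearson’s r and an ordinary least-squares line as the analysis (the procedure actually used in chapter 5 may differ), and all hours and scores are invented, hypothetical values rather than data from this study. The helper names pearson_r, ols_line and hours_for_score are hypothetical. A second invented data set mimics the threshold scenario of footnote 14, in which score gains level off after about 1,000 class hours.

```python
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def ols_line(xs, ys):
    """Ordinary least-squares intercept a and slope b for y = a + b * x."""
    mx, my = mean(xs), mean(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b


def hours_for_score(target, a, b):
    """Invert the fitted line: class hours predicted to reach a target score."""
    return (target - a) / b


# Hypothetical data (invented for illustration): class hours and test scores.
hours = [200, 400, 600, 800, 1000, 1200, 1400, 1600]
linear_scores = [110, 150, 185, 230, 265, 300, 345, 380]
# Threshold scenario: scores level off after about 1,000 class hours.
plateau_scores = [110, 150, 185, 230, 265, 270, 262, 268]

a, b = ols_line(hours, linear_scores)
print(f"r (linear scenario):  {pearson_r(hours, linear_scores):.3f}")
print(f"r (plateau scenario): {pearson_r(hours, plateau_scores):.3f}")
print(f"hours predicted for a score of 300: {hours_for_score(300, a, b):.0f}")

# Restrict the plateau data to learners with at least 1,000 class hours.
above = [i for i, h in enumerate(hours) if h >= 1000]
r_above = pearson_r([hours[i] for i in above], [plateau_scores[i] for i in above])
print(f"r (plateau, >= 1,000 hours): {r_above:.3f}")
```

In this invented example, r is lower for the plateau data than for the linear data, and it falls close to zero when only learners above the 1,000-hour threshold are considered. Inverting a fitted line to predict required class hours is therefore only meaningful in the range where the relation is roughly linear.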

1.2 Why a validation of the old HSK is useful

This work (a) investigates language proficiency testing for CFL, (b) gives new insight into how Western test takers acquire Chinese, and (c) discusses these issues on the basis of theoretical approaches and methods from the field of testing (especially psychological testing). Thus, perspectives from different research fields and disciplines, which all overlap to a certain extent, need to be incorporated (cf. Figure 2). Chinese proficiency testing influences teaching Chinese as a foreign language (TCFL). Almost all large-scale CFL proficiency tests are based on word and grammar syllabi, which, in turn, have a huge influence on course books and other learning material. At the same time, CFL proficiency testing is strongly affected by the field of language testing, which is mostly dominated by Anglo-Saxon countries, particularly the United States and England. And finally, language testing is largely embedded in the theoretical grounds provided by psychological testing.

Figure 2: Localization of research fields relevant for this dissertation.

So, why does this dissertation investigate the old HSK, which was replaced by the new HSK in 2010? First, the old HSK was the most widespread proficiency test for CFL in the world. Moreover, this dissertation deals with how German test takers perform on CFL proficiency tests,15 and as of 2007 the HSK was the only proficiency test available in Germany, so empirical research could be conducted only on this test.

etc.); therefore, no correlation may be found beyond this number of class hours. If the relation between the two variables is non-linear, the correlation coefficient is usually lower, because it measures only the linear association between them.

15 In Europe, the HSK was first administered in Hamburg on June 4, 1994 (Ziermann, 1996).

[Figure 2 labels: Educational testing; Language testing; TCFL; Proficiency testing for CFL]


Second, the HSK has one of the longest histories of all CFL proficiency tests. Researchers have generated a vast number of studies, which helped to develop and improve the HSK, and this offers a rich pool for understanding how CFL testing and research in China have developed and how they function. Investigations of the (old) HSK continued until recently (e.g., Huáng Chūnxiá, 2011a, 2011b).16 Therefore, by using the concrete tool “HSK” and its research history, this work highlights the crucial mechanisms generally inherent in CFL testing. To reach this goal, the fundamental debate about today’s test theory, the concept of validity, and a useful and feasible approach to validation have been integrated into this work. Hopefully, this will offer new insights into CFL acquisition and a better understanding of the “CFL construct” and its assessment. As Liú Yīnglín (1994d) clearly stated, testing in CFL, as in other disciplines, is an ongoing process of making compromises and finding an appropriate and useful trade-off. To understand these compromises, a concrete test must be integrated into a clear and coherent argumentative framework explaining what the test intends to measure.

1.3 Research overview and approach

With the rise of the HSK in the PRC (1990)17 and the TOCFL18 in the Republic of China (2004), proficiency testing for CFL came onto the agenda.19 More than 450 studies related to the HSK have been published, starting with Liú Xún, Huáng Zhèngchéng, Fāng Lì, S�n J�nlín, and Guō Shùjūn (1986).20 Many studies were published between 1989 and 2010 in the eight edited volumes on the HSK21; one edited volume deals with language test theory and CFL testing (Zhāng Kǎi, 2006a). The majority of these studies were conducted by professional HSK test developers22 to further improve the test. In the late 1990s, more critical studies followed,

16 In DIF (differential item functioning) studies, she investigated performance differences between Western and Asian test takers.

17 The HSK was reviewed by experts in 1990. In 1992, it became the official language proficiency test of the PRC (Liú Yīnglín, 1994, preface, p. 1).

18 In 2003, the test was originally named CPT (Chinese Proficiency Test). In 2007, it was renamed TOP (“Test of Proficiency – Huayu”). On August 4, 2010, the Ministry of Education of the Republic of China announced that the “TOP – Huayu” would be called “Test of Chinese as a Foreign Language” (TOCFL) from that day on. The Chinese name, Huáyǔwén Nénglì Cèyàn, has never been changed.

19 In 1981, the “Chinese Language Test” (Zhōngguóyǔ Jiǎndìng Shìyàn) was launched in Japan by the predecessor of the “Japanese Society for Testing Chinese” (Rìběn Zhōngguóyǔ Jiǎndìng Xiéhuì). Approximately 15,000 test takers per year participate in this test (Yáng Ch�xi�o, 2007, p. 45).

20 Actually, more essays have been published. However, in some studies the HSK plays only a very subordinate role, so they have not been counted.

21 The last volume focused on the Gǎijìnbǎn HSK (Revised HSK).

22 The term “test developer” refers to individuals who design and develop tests or assessments. “Test users” are individuals who make decisions based on assessments.


often published by test practitioners, such as test administrators or language teachers, the latter because their teaching was affected by the HSK. These studies were often related to washback issues. Figure 3 shows the number of HSK studies published each year.

Figure 3: Chinese studies related to the HSK or using it as a research tool (in total 421).

The Chinese literature on CFL testing has not received much attention outside of China, although the numbers of standardized Chinese language proficiency test takers and test centers outside of China have constantly risen (Meyer, 2009). Mainland Chinese research can be divided into studies focusing on the old HSK, the Gǎijìnbǎn HSK (Revised HSK), and the new HSK. Research on the old HSK can be subdivided according to the three HSK test formats, which covered different levels of Chinese proficiency: (a) the Elementary-Intermediate HSK, (b) the Advanced HSK, and (c) the Basic HSK. This dissertation primarily targets the Elementary-Intermediate HSK, which was the first test officially launched, in 1990. This test (together with its successor, the new HSK) still has by far the largest total test-taking population of all CFL proficiency tests (cf. Figure 1), which is why the majority of HSK studies examine it. Because this dissertation focuses on the Elementary-Intermediate HSK, which is also the most important test for German test takers, studies on the Basic and the Advanced HSK will be mentioned only when necessary.23 HSK research was also conducted on different test-taker groups, especially on ethnic minorities, and on test takers from specific countries,

23 The Advanced HSK was taken by few German test takers because German and other Western test takers almost never reached this proficiency level (Kaden, 2004, p. 4; Meyer, 2006).

[Figure 3 chart: y-axis “Studies related to the HSK” (0 to 70), x-axis “Year” (1986 to 2010)]


mostly Asian countries, because Asian test takers account for more than 95% of all foreign HSK test takers (Huáng Chūnxiá, 2011b, p. 61).24 Some studies investigated non-Asian test-taker groups, for example the situation in Italy (Xú Yùm�n and Bulfoni, 2007; S�n Yúnhè, 2011) or Australia (Wáng Z�léi, 2009). Unfortunately, these studies generally do not differentiate explicitly between test takers who have a native Chinese language background and those who do not; exceptions are Yè Tíngtíng’s (2011) study on the situation in Malaysia and Shàn Méi (2006), who investigated the HSK’s face validity. This dissertation provides, for the first time, data distinguishing between the two groups, and it will give new insights into learners who have no native Chinese language background at all.25 HSK research covers a vast variety of topics, even historical aspects of testing in China.26 Other HSK research deals with the first revised version of the HSK, the Gǎijìnbǎn HSK (Revised HSK, launched in 2007), and the new HSK (Xīnbǎn HSK, launched in 2010). The volume edited by Zhāng Wàngxī and Wáng Jímín (2010) deals solely with the Gǎijìnbǎn HSK. Most studies on the new HSK have appeared in recent years, starting with Lù Shìyì and Yú Jiāyuán (2003), who published the first essay about the new HSK.27 Up to now, around 40 studies in total concern the Gǎijìnbǎn HSK and the new HSK. In both China and Taiwan, one monograph on CFL testing has been published.28 Wáng Jímín (2011) covers the whole spectrum of language assessment, with many examples drawn from CFL testing, while Zhāng Lìpíng (2002) focuses entirely on testing for CFL.

Compared to the situation in China, Western research is rather scarce. Several studies originated in the United States, most of which deal with classroom assessment (e.g., Bai, 1998; Muller, 1972) or test formats or test types (Ching, 1972;

24 These studies included, e.g., Korean (Cu� Sh�yàn, 2009), Japanese (S� Jìng, 2009; Yáng Ch�xi�o, 2007, 2011), Vietnamese (L� Xiá and Lín K�, 2007), Mongolian (Zhāng Ruìfāng, 2008, 2011; S� Dé and Táo Gétú, 1999), Malaysian (Yè Tíngtíng, 2011) and Thai test takers (Lóng W�ihuá, 2011).

25 The distinction is important because a certain proportion of foreign HSK test takers have a native Chinese language background, e.g., approximately 35% in Germany (cf. chapter 5).

26 Rén Xi�oméng (1998) compared the HSK with the Chinese imperial civil-service examination system (kējǔ).

27 This essay was a political text by Hanban officials who wanted to “explain” why the previous research on the old HSK conducted by HSK research specialists was supposedly meritless and not very scientifically fruitful. Other studies, e.g., Yáng Chéngq�ng and Zhāng Jìnj�n (2007), explained why the old HSK should lower its difficulty to ensure better access for Chinese learners outside of China and to “promote” the development of the Chinese language.

28 Several unpublished master’s theses exist. In the library of the Graduate Institute for Teaching Chinese (Huáyǔwén Jiàoxué Yánjiūsuǒ) at National Taiwan Normal University (NTNU; Guólì Táiwān Shīfàn Dàxué), one master’s thesis on grammar assessment for CFL could be found (Yáng Yùsh�ng, 2007).


Lowe, 1982; Yao, 1995). Chun and Worthy (1985) discuss the ACTFL29 Chinese language speaking proficiency levels. Hayden (1998) and Tseng (2006) examine language gain. In Germany, only five studies on the HSK have been published (Meyer, 2006, 2009; Reick, 2010; Ziermann, 1995b, 1996). Fung-Becker (1995) writes about achievement testing for CFL, and Lutz (1995) presents some thoughts on methods for assessing the oral ability of learners of Chinese.30

On the one hand, considerable knowledge about CFL testing exists in China; on the other hand, almost no literature exists outside of China. Thus, this work presents the major findings of the rich HSK research to a Western audience and identifies crucial questions in CFL proficiency testing. It also explains why a “perfect” language proficiency test for CFL will never exist: testing goals, test takers, the context in which Chinese is used and assessed, and the available resources and testing technologies will always vary, and they have to be specified and adjusted to the specific needs and uses of a test. However, the crucial points or main theoretical issues will remain. I hope this study can contribute to the above-mentioned fields by clearly revealing what these main issues are and how they affect CFL testing.

Over time, the quality of HSK studies has gradually improved. Studies in the 1980s were concerned with the foundation of the HSK, which especially included the target language domain, the scoring, and the reliability of the HSK. One of the main targets of researchers at that time was to provide norm-referenced scores and to make the HSK a stable measure. Validation studies began in 1986 and emerged in greater numbers in the 1990s. In the 2000s, washback studies emerged. Jìng Chéng (2004) claims that researchers who were not involved in the HSK test development process had no access to test-taker data and could not obtain samples large enough to yield statistically meaningful results; the author argues that non-test developers therefore had to engage more in qualitative than in quantitative research (p. 23). Nevertheless, HSK research maintained high quality and shifted from broad fields to increasingly specialized topics. Although confirmatory studies initially dominated HSK research, several studies were very critical and disclosed controversial points, and non-test developers later expanded on these critiques. One specific criticism stemmed from teachers and universities in the autonomous region of Xīnjiāng, whose test takers had outnumbered the foreign test takers after 1999 (cf. Figure 1, p. 11), and for whom the HSK became a high-stakes test because admission officers required HSK certificates as part of the decision-making process for admitting ethnic minority students to Chinese universities and colleges.

29 The ACTFL (American Council on the Teaching of Foreign Languages) aims to improve and expand the teaching and learning of foreign languages in the United States.

30 Ziermann (1995a) wrote a master’s thesis (Magisterarbeit; unpublished) on one HSK administration in Germany.


Thus, some investigations on the HSK that are thematically related to this work provided rich information for this dissertation and are quoted in several chapters31, while some could be summarized in one or two sentences, and others were not mentioned because they did not provide new insights. The majority of HSK studies used quantitative approaches; qualitative studies investigating individual learners appear only occasionally, although triangulation, i.e., combining different methods in a way appropriate to the specific research field (e.g., Grotjahn, [2003] 2007, p. 497; Kelle, [2007] 2008), is known among Chinese language testing experts (e.g., Chén Hóng, [1997c] 2006, p. 235).

In chapter 4, HSK research is used to validate the test, and the validation focuses on the Elementary-Intermediate HSK. In chapter 2, the term language proficiency will be discussed in detail to foster a better understanding of Chinese HSK research. In addition, terminology relevant for this dissertation will be defined. Chapter 3 provides the theoretical foundation of testing: it presents the quality criteria in language testing and explains the crucial concept of validity, how it has been understood in psychological testing, and how it is used in this dissertation. Based on this validity concept, the theoretical approach underlying the validation in this work will be described in detail. Chapter 5 extends the HSK validation with an empirical investigation of HSK test takers in Germany. The validity argument for the HSK will be presented in chapter 6, followed by the conclusion in chapter 7.

1.4 History of the HSK

Sūn Déjūn (2009) divides the development of the HSK into three periods: (a) an initial phase (chūchuàngqī) from 1980 to 1990, (b) an expanding stage (tuòzhǎnqī) from 1990 to 2000, and (c) an innovative stage (chuàngxīnqī) from 2000 onward. A fourth stage, which began with the new HSK in 2010, ended the innovative stage.

In 1981, the development of the HSK started with research on small-scale tests. At that time, HSK development was strongly influenced by standardized language tests in the United States and England, especially the TOEFL, which had just reached the Chinese mainland and shifted the focus in Chinese foreign language didactics from language knowledge to language ability (p. 19; cf. Liú Yīnglín, [1988b] 1989, pp. 110–111; Sūn Déjūn, 2009). After the “HSK design group” (Hànyǔ Shuǐpíng Kǎoshì Shèjì Xiǎozǔ) was founded in December 1984, led by Liú Xún and consisting of ten members32, the first test was developed and

31 Sūn Déjūn (2009) says that nowadays HSK experts are able to discuss and exchange ideas on an equal footing with other leading language testing experts, for example from Educational Testing Service (ETS) in the United States.

32 According to Zhāng Kǎi (2006c), the group had been formed in October 1984. Other founding members were Huáng Zhèngchéng, Fāng Lì, S�n J�nlín and Guō


pretested in June 1985 at the BLCU33 (Liú Xún et al., [1986] 1997, p. 77; Liú Yīnglín, [1990b] 1994, p. 45; Liú Yīnglín et al., [1988] 2006, p. 23; Sūn Déjūn, 2007, p. 130; Zhāng Kǎi, 2006c, p. 1). Liú Xún reported the results of the 1985 pretest at the first conference on “International Chinese Didactics,” where they caused a “stir.” Afterwards, further large-scale pretests were conducted in 1986 and 1987; in 1988, the BLCU launched the first official HSK and issued certificates to the test takers, who have had to pay a test fee since 1989 (Sūn Déjūn, 2007, p. 130, 2009, p. 19). At that time, the HSK consisted only of the test format that was later renamed the Elementary-Intermediate HSK.

From June 1985 to January 1990, 8,392 test takers from 85 countries participated in the HSK, and the examinations were held at 33 test sites in 16 Chinese provinces, cities, and autonomous regions (Liú Yīnglín and Guō Shùjūn, [1991] 1994, p. 12). From 1985 on, five large-scale pretests were administered, one per year. In March 1989, the BLCU established the Chinese Proficiency Test Center (HSK Center; Hànyǔ Shuǐpíng Kǎoshì Zhōngxīn; Zhāng Kǎi, 2006c, p. 2), which provided the professional basis for HSK development and research. In 1990, the HSK was appraised by experts and officially launched.

In 1991, the HSK was launched outside of China, and the number of test takers steadily increased.34 Because the HSK only assessed the elementary and intermedi-ate proficiency levels, the “Advanced HSK” (God�ng HSK) was introduced in 1993, and the original HSK was renamed to Elementary-Intermediate HSK (Ch�-zh�ngd�ng HSK). In 1997, the Basic HSK (J�ch� HSK) en-tered the scene.

In 2000, the number of test takers reached 85,238, whereby 31,067 test takers were “foreigners” and 54,171 belonged to Chinese ethnic minorities. In this phase, research was conducted investigating to what extent the HSK fulfilled the needs of different stakeholders, which, in addition to Chinese learners, included universities, companies, and other organizations that used HSK scores of test takers for making decisions about university admission, employment, etc., and the HSK “product” was revised, also in terms of economic aspects (S�n Déj�n, 2009, p. 19). In 2006, the HSK Threshold (Rù-mén jí HSK) and the C.TEST (Shíyòng Hàny� Shu�píng Rèndìng K�oshì ) were launched. The former test had been designed to measure the Chinese language ability of learners who had attended fewer than 200 study hours in Chinese. The test was developed to meet market demand created by rising numbers of Chinese learners outside of China who studied Chinese as a hobby. The C.TEST was created for assessing the Chi-

Shùj�n . In 1986, the core group consisted of Liú Y�nglín , Gu� Shùj�n and Wáng Zhìfng (p. 1). S�n Déj�n (2009) indicates only six people (p. 19).

33 BLCU stands for Beijing Language and Culture University, in Chinese B�ij�ng Y�yán Dàxué (formerly called B�ij�ng Y�yán Xuéyuàn ).

34 Statistics show that every year the HSK had more test takers in China than outside, at least till 2005 (S�n Déj�n, 2009, p. 20).

Page 13: Language Proficiency Testing for Chinese as a Foreign · PDF fileIn 2010, the “new HSK” replaced the former HSK version.6 However, in China (2013) some universities still offer

22

nese language ability needed for working in China and daily life, and it should help Chinese companies recruit non-native Chinese employees (S�n Déj�n, 2006, p. 4). In 2007, an oral examination was additionally offered, called “C.TEST oral exami-nation” (C.TEST K uy� K�oshì ; Wáng Jímín, 2011, p. 36).

These years also saw two further important developments. First, in 2006 the total number of HSK test takers exceeded 1,000,000. Second, around 2005, the Chinese Ministry of Education withdrew the HSK authorization from the HSK Center and transferred all rights to the Hanban35 (L� Háng, 2010, p. 952). The Hanban then founded its own test section, and as a result the HSK Center has not been able to access test-taker data since 2005–2006. Moreover, the Hanban neither supported nor promoted the first revision of the HSK, the Gǎijìnbǎn HSK (Revised HSK), which the HSK Center had developed and launched on April 21, 2007. The Gǎijìnbǎn HSK had actually been intended to replace the old HSK (Zhāng Wàngxī and Wáng Jímín, 2010). Instead, in 2010 the Hanban introduced the new HSK (Xīn Hànyǔ Shuǐpíng Kǎoshì), which drastically lowered the standards in CFL and, moreover, amateurishly linked the test to the Common European Framework of Reference for Languages (CEFR; cf. Xiè Xiǎoqìng, 2011, p. 11). The number of test takers skyrocketed immediately in 2010. Three factors drove this increase: the lower standards, the introduction of subtests assessing productive oral and written Chinese abilities, and a massive promotion campaign run by the Confucius Institutes outside of China (cf. Sūn Yúnhè, 2011). In addition, the Hanban introduced the Business Chinese Test (BCT; Shāngwù Hànyǔ Kǎoshì) and the Youth Chinese Test (YCT; Xīn Zhōng-Xiǎoxuésheng Hànyǔ Kǎoshì). With this background knowledge, the following paragraph by Sūn Déjīn, the former head of the HSK Center, appears in a completely new light:

… We [the researchers of the HSK Center] believe that the development and the existence of the HSK have to insist on scientific principles and directions. … If there is no scientific basis, there will be no future for the HSK. (Sūn Déjīn, 2009, p. 20)

35 Cf. footnote 5, p. 6.

1.5 Other Chinese language proficiency tests

Obtaining an overview of proficiency tests for CFL has become more confusing year after year. As Zh� Hóngy� (2009) notes, almost ten tests in Mainland China alone already aim to assess the Chinese language ability of non-native speakers (p. 54). An attempt to list every CFL test in existence worldwide would probably fail, and the scientific value of such a list is also doubtful. Comparing tests is difficult and usually not very fruitful, because every test has its own specific purpose and circumstances (e.g., different target populations). Nevertheless, this section gives a short overview of the most important proficiency tests for CFL. The following criteria guided the selection of specific CFL tests: (a) the size of the test-taking population and/or (b) participation by Westerners, and (c) whether the test can be considered a high-stakes test. The HSK and the TOCFL have already been discussed in the preceding sections. For the reasons given above, the following list does not claim to be exhaustive.

The first test that must be mentioned is the Chūgokugo kentei shiken (Chinese Proficiency Test), launched in 1981 by the Japanese Society for Testing Chinese (Nihon Chūgokugo kentei kyōkai). This test appears to be the first professional CFL proficiency test, and it is designed for native speakers of Japanese. As of 2011, a total of 75 exams had been administered, with 600,000 candidates participating. The test is offered three times per year. Of all 600,000 test takers, 180,000 received a certificate. The listening subtest also includes a dictation, and the test has a translation subtest (Chinese–Japanese–Chinese). Every year, all 18 tests (three sessions per year, six formats each) are published within half a year, together with audio recordings, answer keys, and explanation sheets. Approximately 20,000 test takers currently take the test each year. In 2004, more Japanese took this test than took the HSK (Oikawa, 2009; Sū Jìng, 2009; Sūn Déjīn, 2009; Wikipedia, 2011; Yáng Ch�xi�o, 2011).

Another test from Japan, the Chūgokugo komyunikēshon nōryoku kentei (Test of Communicative Chinese, TECC), was initiated by the Chūgokugo kōryū kyōkai (Society for the Exchange of Chinese) and launched in 1998. The test is designed to assess communicative ability in Chinese. Chinese language experts and major Japanese companies with experience trading with Chinese partners were behind the initiative. Japanese companies readily accept these certificates, and the number of test takers has risen significantly in recent years (Sū Jìng, 2009, p. 91). Although the name of the test suggests that it measures communicative ability, it consists only of a listening subtest and a reading subtest, which last 35 and 45 minutes, respectively (Zhāng Lìpíng, 2002, p. 9).

In the United States, three major tests evaluate whether students have mastered the level of Chinese usually taught in a four-semester college course. The certificates are regularly used in applications for university admission. The CPT (Chinese Proficiency Test) was developed in 1983 by the Center for Applied Linguistics (CAL). Its target population consists of English-speaking learners of Chinese, generally students who have studied Chinese for two or more years at a college or university in the U.S. The CPT has a listening subtest and a reading subtest, the latter of which also includes a structure subtest. All response options on the listening subtest are in English, as are all questions on the other two subtests. All 150 items are multiple-choice items with four answer choices. The CPT is also available in a Cantonese version (Center for Applied Linguistics, 2010). In addition, the CAL offers a Preliminary Chinese Proficiency Test (Pre-CPT) for students who have studied Chinese in school for three to four years and for college students who have studied it for at least one year.

The SAT (Scholastic Aptitude Test) Subject Test in Chinese with Listening measures the reading and listening abilities of students who have studied Chinese for two to four years in high school. It helps place them in higher-level Chinese language classes at colleges or universities. The SAT is developed by the Educational Testing Service (ETS). It has three subtests: listening (30 items), grammar (25 items), and reading (30 items). As with the CPT, the tasks are mostly in English. All items of the grammar subtest are displayed in simplified characters, traditional characters, Pīnyīn, and the Taiwanese transcription Zhùyīn Fúhào (also called Bopomofo).

The Advanced Placement Program® (AP®) offers a Chinese Language and Culture examination, which roughly corresponds to a four-semester college course. It is a computer-based test that is also administered by ETS. Questions are provided in simplified and traditional characters, and test takers can choose which system to use for writing (answers are typed using a keyboard). The test has four subtests: listening (30 items, 20 minutes), reading (35–40 items, 1 hour), writing (2 tasks, 30 minutes), and speaking (7 tasks, ca. 11 minutes). The whole test usually lasts around 2 hours and 15 minutes. Questions and answer choices are all given in English, and the writing and speaking tasks are rated holistically (The College Board, 2011).

1.6 Transcription system in this work

This dissertation uses the Hànyǔ Pīnyīn transcription for Chinese words and names. Exceptions are established names such as Peking University, Tsinghua University, or the above-mentioned Hanban. Normally, the order applied here is Hànyǔ Pīnyīn, Chinese characters, and then the English translation. When the Chinese characters are the focus, they may be placed first. For titles of studies, books, or syllabi, the English translation comes first. Chinese authors who have published in Chinese are given family name first, followed by the given name (without a comma). Pīnyīn spelling follows the rules of the Xīnhuá Pīnxiě Cídiǎn [Chinese Transliteration Dictionary], published in 2002. Accordingly, diacritics are used throughout the work, and proper nouns are capitalized. Korean names are transcribed using McCune-Reischauer romanization, and Japanese words using Hepburn romanization. Any translation or spelling mistakes are the author's own. This also applies to block quotations from Chinese and their translations.