
Empirical Articles

Validity of Indirect Assessment of Writing Competency for Deaf and Hard-of-Hearing College Students

Gerald P. Berent, Vincent J. Samar, Ronald R. Kelly
National Technical Institute for the Deaf, Rochester Institute of Technology

Rusti Berent
University of Rochester

Joseph Bochner, John Albertini, Jeannee Sacken
National Technical Institute for the Deaf, Rochester Institute of Technology

Indirect tests of writing competency are often used at the college level for a variety of educational, programmatic, and research purposes. Although such tests may have been validated on hearing populations, it cannot be assumed that they validly assess the writing competency of deaf and hard-of-hearing students. This study used a direct criterion measure of writing competency to determine the criterion validity of two indirect measures of writing competency. Results suggest that the validity of indirect writing tests for deaf and hard-of-hearing baccalaureate-level students is weak. We recommend that direct writing tests be used with this population to ensure fair and accurate assessment of writing competency.

The assessment of writing skills is commonplace in educational environments. During a student's education, writing skills are assessed at various stages, both formally and informally, for a variety of reasons. Writing is generally assessed through tests administered for college admission, course placement, and course evaluation. In addition, colleges and universities often have writing policies intended to ensure that students do not graduate until their writing skills reflect those of educated members of society and are sufficient for competent functioning in their intended careers.

This research was conducted at the National Technical Institute for the Deaf, a college of Rochester Institute of Technology, in the course of an agreement with the U.S. Department of Education. A preliminary version of these findings was presented at the April 1994 annual convention of Teachers of English to Speakers of Other Languages in Baltimore, Maryland. Correspondence should be sent to Gerald P. Berent, Department of Applied Language and Cognition Research, National Technical Institute for the Deaf, Rochester Institute of Technology, 52 Lomb Memorial Drive, Rochester, NY 14623-5604.

Copyright © 1996 Oxford University Press. CCC 1081-4159.


To ensure that students are assessed accurately, educational institutions have the responsibility of finding and using valid, fair assessment tools. The use of an inappropriate test can have serious negative consequences on a student's educational success and career opportunities. Invalid testing might result in a qualified student being denied acceptance to a particular college or program, being placed in the wrong course or program of study, failing a certain course, not graduating on time, or not qualifying for a certain job. Conversely, invalid testing might result in an unqualified student inappropriately gaining access to opportunities, passing courses, or graduating.

Direct measures of writing competency, which involve raters' judgments of the quality of students' writing samples, have gained popularity in educational institutions since the mid-1970s. Generally, direct measures are valid and reliable when appropriate attention is paid to details of administration, testing conditions, population factors, and scoring (Huot, 1990). Their clear construct validity makes them attractive candidates for writing assessment for a variety of student populations. However, for ease of administration and scoring, indirect measures are still used in many educational institutions to assess writing competency. They appear to be particularly popular in community college environments, where there is a demand for rapid, on-site testing at the time of admission.


Generally, indirect measures assess students' recognition of specific domains of language knowledge such as grammar but do not require any actual writing on the part of students (Perkins, 1983).

The continued use of indirect measures raises concerns about the validity of writing competency assessment for deaf and hard-of-hearing students. For example, in some high schools, colleges, and universities, writing assessment policies may require all students to be tested with indirect measures. Hence, the writing competency of deaf and hard-of-hearing students who happen to be mainstreamed in such institutions would ordinarily, without special provision, be tested using indirect measures. However, it cannot be assumed that indirect measures are equally valid for all populations. A variety of testing and language factors might selectively alter the validity of indirect measures for the deaf and hard-of-hearing population, including factors such as the comprehensibility of test instructions, students' general reading skills, the degree to which deaf and hard-of-hearing students' actual writing performance depends upon the specific skills and language knowledge tested by indirect tests, and so on. Examining the validity of indirect tests of writing competency for this population is an issue of immediate and growing concern because the recent Americans with Disabilities Act of 1990 has opened the way for increased enrollment of deaf and hard-of-hearing students into mainstream college environments such as the country's many community colleges.

Direct measures of writing competency have clear face validity in that they require students to perform the actual behavior being measured, namely, writing, and they allow students to draw upon a wide range of language knowledge, including knowledge of mechanics, grammar, style, organization, and logical development (Bochner, Albertini, Samar, & Metz, 1992; Perkins, 1983). On the other hand, indirect measures of writing competency have been criticized for their lack of face validity and credibility among educators (Braddock, Lloyd-Jones, & Schoer, 1963; Breland & Gaynor, 1979). Indirect tests tend to be biased toward the assessment of a few discrete language properties, for example, mechanics, grammar, and vocabulary, while other significant aspects of written language are ignored, for example, organization, content, and coherence (Bochner et al., 1992).

Furthermore, they are typically paper and pencil tests and therefore tend to confound reading and writing competency. This may be problematic when indirect tests are used to assess the writing competency of deaf and hard-of-hearing students.

The popularity of indirect tests apparently stems from the belief that their scores correlate with performance on direct writing tests, despite their lack of construct equivalence. Diederich (1974) asserted that the indirect, "objective" tests of English usage often correlate about as highly with written essay grades as written essay grades correlate among themselves. Therefore, Diederich argued, "It does not matter that they do not 'really' measure 'the same thing.' If students who are good at one tend to be good at the other, and vice versa, then it is a good indicator of proficiency in written English" (p. 80). Diederich's reasoning amounts to the claim that indirect tests may display criterion validity; that is, they may correlate sufficiently well with direct tests of writing competency to provide a useful psychometric substitute for those direct tests.

The present study examines the issue of the criterion validity of indirect measures for use with deaf and hard-of-hearing baccalaureate-level students. Two potentially appropriate indirect writing tests are compared with a direct measure of the quality of actual writing samples obtained from deaf and hard-of-hearing college students pursuing a variety of majors at the baccalaureate level.

Method

Participants. Fifty-six deaf and hard-of-hearing college students, ranging in age from 18 to 38 years (M = 22.91, SD = 3.96), participated in this study. Twenty-five of the participants were women and 31 were men. The participants' pure tone average hearing losses, measured in the better ear at 500, 1000, and 2000 Hz (ISO, 1975), ranged from moderate to profound (M = 90.45 dB, SD = 15.37 dB, min = 51 dB, max = 120 dB). These students were pursuing bachelor's degrees in a broad range of arts and science disciplines in eight of the colleges of the Rochester Institute of Technology. The participants' grade point averages of academic achievement on a scale of 0 (low) to 4 (high) ranged from 1.91 to 3.73 (M = 2.89, SD = 0.43).


Test selection: indirect measures. A search was carried out for published, standardized indirect measures of writing competency that had potential for use with deaf and hard-of-hearing college students. A detailed description of this process and a review of the tests located are included in Berent and Berent (1990).¹ The search for indirect measures uncovered no writing tests specifically designed for or normed on deaf and hard-of-hearing college students.² Of 44 tests originally identified as potential writing tests appropriate for the target population, all but nine were eliminated from further consideration. Some of the tests were out of print; some were determined to be too elementary; and some had too narrow a focus on one language skill (e.g., grammar) to be construed in any plausible way as indirect measures of writing. The remaining nine below were reviewed in detail.

1. ACT Collegiate Assessment of Academic Proficiency: Writing Skills Test (American College Testing Program, 1988)

2. Advanced Placement Exam in English Language and Composition (College Entrance Examination Board, 1990)

3. The Written English Expression Placement Test (Educational Testing Service, 1985)

4. The College Board Achievement Test in English Composition (Educational Testing Service, 1990)

5. College English Placement Test (Riverside Publishing Company, 1969)

6. The College-Level Examination Program: Subject Examination in College Composition (The College Board, 1989)

7. The Descriptive Tests of Language Skills in Conventions of Written English (The College Board, 1988)

8. Diagnostic Skill Level Inventory for Writing Skills (Educational Diagnostic Services, 1988)

9. The New Jersey High School Proficiency Test: Writing Section (Cooperman & Bloom, 1988)

The tests, all timed, consist of various multiple-choice items. The various skills assessed by each test were determined either by analyzing the individual items or by examining information provided in the test manual. The Appendix presents the skills targeted by these tests (see Berent & Berent, 1990, for more detail).

We felt that the first six items listed in the Appendix refer to the basic skills, strategies, and techniques necessary for good writing in nearly all contexts. As for the seventh item, vocabulary skills, a good knowledge of vocabulary is desirable in the sense that the writer must be able to supply words appropriate to communicate the required meanings of sentences. However, a knowledge of sophisticated or esoteric vocabulary is generally not essential for good clear writing in most situations. In fact, many guidelines for clear and effective writing recommend simplifying word choice wherever possible. This issue is particularly important in the evaluation of the writing skills of deaf and hard-of-hearing students. These students often have difficulty mastering the more advanced and infrequent English vocabulary. Given the nonessential role of a sophisticated or esoteric vocabulary in most writing situations, it is desirable to select a test for deaf and hard-of-hearing students that is not substantially weighted toward the assessment of this sort of vocabulary knowledge.

The last four skills in the Appendix deal with factual and content knowledge that may vary considerably among different individuals, cultures, and situations. For example, different writing situations may place idiosyncratic demands on an individual's world knowledge. Individuals might write well when confronted with, say, a technical writing situation within their scope of world knowledge, but write poorly when confronted with a requirement to produce literary criticism, with which they may have little experience. Tests that depend upon a limited scope of world knowledge, therefore, might misrepresent the writing abilities of students who do not have that knowledge for cultural or other reasons. This is a particularly significant issue for the deaf and hard-of-hearing population, which may be quite culturally diverse.

In consideration of the above points, tests that tapped any of the last five skills in the Appendix were eliminated as candidates for use in this study. The remaining tests, namely, tests 1, 3, 7, and 9, were retained for further consideration. These tests tapped at least three of the first six skills in the Appendix (punctuation, grammar and usage, sentence structure, organization, style, and rhetorical strategy). Due to time limitations and factors such as participant fatigue, counterbalancing requirements, and so on, we selected only two of these four tests for inclusion in the study. The final selection was based on relatively minor differences among the tests.

We anticipated that the language of the test directions would need to be modified in structure, length, and vocabulary for use with a deaf and hard-of-hearing population. Not doing so would potentially compromise the assessment of the true strength of the relationship between the indirect measures and the direct measure used in this study, since, because of English language difficulties, the standard directions might be misunderstood by some students. By the same token, we felt that making the least extensive modification of test directions would pose the least threat to a test's general validity. Accordingly, test 1 was excluded because its directions would require more extensive modification for use with deaf and hard-of-hearing students than those of the other three tests. Test 7 was excluded as well because it places somewhat heavier emphasis on English grammar than the other tests do and because it targets fewer of the first six writing skills in the Appendix than the remaining two tests. Thus, test 3, the Written English Expression Placement Test, and test 9, the New Jersey High School Proficiency Test: Writing Section, were ultimately selected for this validity study.

The Written English Expression Placement Test (WEEPT) was designed as a 25-minute test and contains 40 four-choice items in two formats. One format includes 20 error recognition items, each consisting of a sentence in which three segments are underlined, one of which may contain an error. There is a fourth no-error option. The second format involves 20 sentence correction items. For each item, part or all of a sentence is underlined and is followed by four options for rephrasing what is underlined. The first option always repeats the exact wording of the original sentence. The WEEPT has a Kuder-Richardson 20 internal consistency of .83. The test is purported to have good content validity and a median predictive validity coefficient of .32 between test scores obtained one year before college entry and English course grades during the first year of college. No information on the correlation of the WEEPT with direct writing measures is available for the normative population (Using and Interpreting Scores, 1979).

The New Jersey High School Proficiency Test: Writing Section, referred to hereafter as the New Jersey Writing Test (NJWT), is a 60-minute test containing 66 four-choice items using 11 formats, each consisting of six items. The eleven formats are as follows:

1. Sentence reformulation: A sentence is presented along with the beginning of a reformulation, followed by four possible completions.

2. Sentence combining: Four suggested one-sentence rewrites follow a three- or four-sentence paragraph.

3. Sentence removal: An irrelevant sentence must be removed from a paragraph.

4. Sentence error identification: An error in a sentence must be identified.

5. Paragraph error identification: An error in a paragraph must be identified.

6. Paragraph completion: A paragraph must be completed by choosing words to fill in blanks.

7. Sentence error correction: An error in a sentence must be corrected.

8. Main idea identification: The main idea in a paragraph must be identified.

9. Supporting detail choice: The best supporting detail for the main idea of a sentence must be chosen.

10. Sentence correction: An underlined part of a sentence must be corrected.

11. Paragraph arrangement: Four or five sentences must be arranged into a logical paragraph.

The NJWT has a Kuder-Richardson 20 internal consistency of .885. The test is purported to have good content validity. No information on the correlation of the NJWT with direct writing measures is available for the normative population (Cooperman & Bloom, 1988).

The directions for the WEEPT and the NJWT were modified to make them as accessible as possible to the deaf and hard-of-hearing college students participating in the study. Although the intent of the original directions was maintained, sentence structures, sentence length, and vocabulary were simplified as much as possible.³

Test selection: direct measure. The Test of Written English (1986), or TWE, was chosen for use in this study as a direct measure of writing competency. The TWE, a 30-minute essay test, is normally administered in association with the Test of English as a Foreign Language (TOEFL). Examinees perform academic writing tasks in which they have an opportunity to generate and organize ideas on paper, to support those ideas with examples or evidence, and to use the conventions of standard written English. The TWE is scored by readers who rate the proficiency of the essays on a scale of 1 to 6, where 1 means the writer has demonstrated incompetence in writing and 6 means the writer has demonstrated clear competence in writing (TOEFL Test of Written English Guide, 1989).

The TWE is recognized as a valid measure of writing ability with demonstrated high reliability. Interrater reliability coefficients for the TWE are reasonably high (.86 to .88), and TWE scores tend to remain stable with change of topic. The coefficient alpha estimate of internal consistency ranged from .89 to .91 in eight large-scale administrations of the TWE between September 1989 and May 1991, during which a total of 523,779 essays were collected and scored. The TWE has been constructed to possess good content validity, using writing tasks similar to those required of college and university students in North America. It also encompasses universally recognized components of written language facility, including organization, development, support, illustration, unity, coherence, progression, syntax, and usage. TWE scoring is done by two specialists in English writing and English as a second language with a provision for discrepancy resolution by a third reader.

The TWE also shows evidence of construct validity in that TWE scores correlate appropriately with TOEFL scaled scores. These correlations (ranging from .60 to .72) are high enough to suggest that the TWE is measuring the appropriate construct but low enough to account for the fact that the TWE and the TOEFL are designed to measure somewhat distinct English language proficiency skills. Importantly, the TWE has been used with deaf and hard-of-hearing college students, whose performance on the test is comparable to performance on the TWE by hearing speakers of other languages (Traxler, 1990). Thus, the TWE generally appears to be a valid and reliable direct measure of writing ability appropriate for use with deaf and hard-of-hearing college students.

Procedure. The 56 students who participated in this study were randomly assigned to three groups of approximately equal size (n = 18, 19, and 19) for group testing. The WEEPT, the NJWT, and the TWE were administered to each group in a single session. The WEEPT and the NJWT were administered and scored according to standard procedures, except for the necessary modification of test directions. Care was taken to ensure the validity and reliability of the TWE by using an essay question supplied by the Educational Testing Service (ETS), by following established procedures, and by channeling students' essays through ETS for scoring. The order of administration of the tests was counterbalanced over student groups according to a Latin square rotation, so that each test was scheduled to appear equally often in each position in the three-test sequence. Before each test and between each timed section of a test, the proctor for each student group displayed the relevant directions on an overhead projector and answered students' questions using simultaneous communication. Each group received a 10-minute break between test administrations. Due to a proctoring error, one group of students (n = 19) was not administered one of the sections of the NJWT. Those students were eliminated from all analyses involving the NJWT.
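The specific order assignments were not reported; purely as an illustration, one 3 x 3 Latin square rotation of the sort described would be:

Group 1: WEEPT, NJWT, TWE
Group 2: NJWT, TWE, WEEPT
Group 3: TWE, WEEPT, NJWT

Under any such rotation, each test occupies each session position exactly once across the three groups.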

Scoring and analysis. The WEEPT and NJWT were scored automatically from students' op-scan answer sheets. The TWE writing samples were returned to ETS to be included in a special scoring session in spring 1991. The 56 writing samples from this study were interspersed among the writing samples from several other special administrations at other sites involving other populations of students. Hence, the TWE scoring procedure was not biased in any way by raters' knowledge of, or expectations about, the population sampled in this study.

Available scores for each test were combined across groups. Score distributions and item difficulties were examined for each indirect measure, and an estimate of the internal consistency reliability was obtained for each indirect measure based on Horst's modification of the Kuder-Richardson 20 formula (Guilford & Fruchter, 1978, p. 429). The Kuder-Richardson 20 formula was adopted to avoid the inherent sampling difficulties of correlation-based reliability measures.


Horst's modification provides a correction for biases in reliability estimation due to unequal item difficulties within a test and was motivated for use here by a direct examination of the item difficulties for the WEEPT and the NJWT. Scatter plots of each indirect test against the TWE were examined to evaluate the form of any functional relationships.

We computed validity coefficients that estimated the validity of the indirect measures as tests of writing competency after correcting for the expected imperfect reliability of the tests. Correcting for imperfect reliability permits a more accurate assessment of a test's true criterion validity than does a simple correlation between a given test and a criterion measure. Conceptually, the correction involves removing the error of measurement associated with a test that has nothing to do with the construct the test measures. The error of measurement removed is associated with incidental, construct-irrelevant factors that simply add noise to a student's score on any particular administration of the test. These factors include fatigue, boredom, health factors, unintended response errors, environmental distractions, and many other specific factors that interfere with students' ability to display their true competency in dealing with the material and demands of the test. The collective influence of such factors is reflected in the numerical estimate of a test's reliability. Reliability coefficients indicate, for example, how likely it is that a student will get exactly the same score on two separate administrations of the same test.

Two tests may be perfectly reliable, in that each gives exactly the same score on a second administration for a particular student, yet these tests may measure entirely different things. In this case, neither test would be a valid measure of the construct that the other test measures despite its perfect reliability. Conversely, two tests may have only moderate reliability in that a student's score may vary considerably on repeat administrations. Still these tests can have perfect validity in that, apart from all the noise that affects their scores, the actual underlying construct that they measure is identical. If one computes the actual correlation of these two tests, it will not be perfect because the noise of measurement tends to disrupt the pairing of test scores. However, if we could reduce the noise, then the correlation would go up, ultimately to a perfect value of 1.0 if all the noise (unreliability) were eliminated. This happens because the only thing left for the two tests to measure after the noise is eliminated is the construct that the two tests share perfectly.

Figure 1. TWE, WEEPT, and NJWT score distributions. [Histograms of TWE scores, WEEPT scores, and NJWT scores; the plots themselves are not reproduced here.]


The reduction of correlations between actual test scores due solely to the influence of poor reliability is referred to as "attenuation" in the tests' validity coefficient. Fortunately, it is a simple matter to estimate the true validity coefficient for a pair of tests by applying a mathematical "correction for attenuation," based on estimates of the imperfect reliability of each test, as has been done in this study. This correction allows a direct estimate to be made of the degree to which the indirect tests actually measure the construct measured by the direct test, namely, writing competency, when there is no measurement error at all.
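In formula terms (our notation, not given in the original sources), the classical correction for attenuation estimates the correlation between the constructs underlying two tests from the observed correlation and the tests' reliabilities:

\[ \hat{\rho}_{XY} = \frac{r_{XY}}{\sqrt{r_{XX}\, r_{YY}}} \]

where \(r_{XY}\) is the observed correlation between tests X and Y, and \(r_{XX}\) and \(r_{YY}\) are their reliability coefficients. Correcting for unreliability in the criterion alone, as is done first below, uses only \(\sqrt{r_{YY}}\) in the denominator.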

Results

The score distributions for the TWE, WEEPT, and NJWT appear in Figure 1, and their associated summary statistics are displayed in Table 1.


Table 1. Test score means, standard deviations, standard errors, ranges, and numbers of students for the TWE, WEEPT, and NJWT

          TWE (rating)   WEEPT (% correct)   NJWT (% correct)
M         4.8            63.9                83.1
SD        .8             13.3                10.3
SE        .1             1.8                 1.6
Range     3.0-6.0        33.0-93.0           55.0-98.0
n         56             56                  37


Table 2. Item difficulty means, standard deviations, standard errors, ranges, and numbers of items for the WEEPT and NJWT

          WEEPT      NJWT
M         .64        .83
SD        .19        .14
SE        .03        .02
Range     .27-.95    .23-.98
n         40         66

The distribution for the TWE reveals that writing ability in our student sample ranged from medium-level skills (a score of 3) to proficient skills (a score of 6). Although not invariably true, it is generally unlikely for deaf and hard-of-hearing baccalaureate-level students to be incompetent or very low in writing competency. Furthermore, Traxler (1990) reported a similar distribution on the TWE for deaf and hard-of-hearing college students at Gallaudet University. These considerations suggest that our sample is reasonably representative of the typical range of writing skills found in this population. The WEEPT showed a very broad range of test scores distributed essentially symmetrically about a mean in the mid-range of scores. This pattern suggests that the WEEPT displays an appropriate range of difficulty for the language assessment of baccalaureate-level deaf and hard-of-hearing students. The NJWT showed a narrower range of test scores restricted to values in the upper half of potential test scores and distributed in a negatively skewed fashion about a high-end mean. This pattern suggests that the NJWT is a relatively easy test for our student sample and may discriminate less well among high-performing students.


Figure 2. WEEPT and NJWT item difficulty distributions. [Histograms of WEEPT and NJWT item difficulties; the plots themselves are not reproduced here.]

The distributions of item difficulties for the WEEPT and the NJWT appear in Figure 2, and their associated summary statistics are displayed in Table 2. The histograms in Figure 2 show the number of items in each test associated with a particular item difficulty. Item difficulty is a proportional score computed as the number of students who correctly answered an item out of the total number of students tested.
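In symbols (our notation), the difficulty of item i is simply the proportion correct,

\[ p_i = \frac{c_i}{N}, \]

where \(c_i\) is the number of students answering item i correctly and N is the total number of students tested; higher values therefore indicate easier items.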

The WEEPT item difficulties are very broadly distributed about a mean in the mid-range of difficulty. This is to be expected because the WEEPT is designed to include items that fall into three nominal categories of difficulty: 11 easy items, 21 items of medium difficulty, and 8 difficult items. The means of the distributions of the observed item difficulties in the present sample, broken out by these difficulty categories, were .82, .63, and .39, respectively, with overlapping ranges of .70 to .95, .39 to .82, and .27 to .59. This suggests that deaf and hard-of-hearing baccalaureate-level students and the general hearing population have roughly similar difficulty in answering individual WEEPT items. The NJWT item difficulties were largely distributed over a more restricted range than the WEEPT item difficulties, with a negatively skewed distribution centered about a high mean probability of success. Nevertheless, about one third of the items on the NJWT possessed item difficulties in the range of .24 to .79, indicating substantial variability in item difficulty. Generally, the NJWT distribution in Figure 2 confirms the impression given by the distribution of student NJWT scores in Figure 1 that the NJWT is a fairly easy test for baccalaureate-level deaf and hard-of-hearing college students.


Figure 3. Scatter plots for the WEEPT-TWE and NJWT-TWE regression analyses. [TWE scores plotted against WEEPT test scores (% correct) and against NJWT test scores (% correct); the plots themselves are not reproduced here.]


Given the clear variability in item difficulties on the WEEPT and the NJWT, we computed an internal consistency reliability measure for each test using Horst's modification of the Kuder-Richardson 20 formula. The internal consistency reliability coefficients for the WEEPT and the NJWT were .733 and .905, respectively. These coefficients indicate moderately high reliability for the WEEPT and high reliability for the NJWT. These reliability estimates are similar to the Kuder-Richardson 20 estimates for the general population for the WEEPT of .83 (Using and Interpreting Scores, 1979) and for the NJWT of .885 (Cooperman & Bloom, 1988). The reliability of the TWE could not be estimated for the present subject sample since only one essay could be collected from each student.
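For reference, the uncorrected Kuder-Richardson 20 coefficient on which these estimates are based is (our notation; Horst's modification additionally adjusts for the unequal item difficulties and is given in Guilford & Fruchter, 1978, p. 429):

\[ r_{KR20} = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k} p_i (1 - p_i)}{\sigma_X^2}\right), \]

where k is the number of items, \(p_i\) is the difficulty (proportion correct) of item i, and \(\sigma_X^2\) is the variance of the total test scores.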

Figure 3 shows scatter plots of the TWE against the WEEPT and NJWT, respectively, with associated regression results displayed in Table 3.

Table 3. Regression coefficients (r) and coefficients of determination (r²) for the WEEPT-TWE and NJWT-TWE regressions

          WEEPT-TWE   NJWT-TWE
r         .491        .598
r²        .241        .358

Precautionary curve fitting procedures revealed no significant nonlinear relationships in these two scatter plots. The WEEPT-TWE relationship was best fit by a simple linear regression equation. The regression coefficient of .491 accounted for only about 24% of the variance in the relationship (t = 4.141, df = 55, p < .0001). The NJWT-TWE relationship was also best fit by a simple linear regression equation. The regression coefficient of .598 accounted for only about 36% of the variance in the relationship (t = 4.409, df = 36, p < .0001).
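As an arithmetic check (our calculation, using the standard relation between r and t for a simple linear regression),

\[ t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}, \]

which gives t of roughly 4.14 for r = .491 with n = 56 and roughly 4.41 for r = .598 with n = 37, consistent with the values reported above.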

The regression coefficients in Table 3 indicate how well the indirect writing tests predict actual single-administration TWE test scores. However, these coefficients do not adequately represent the validity of indirect writing tests because they are attenuated by the measurement error inherent in a single administration of the TWE. The goal of using an indirect measure of writing competency is to predict an individual's true score on a valid direct measure, not a score subject to measurement error. A true TWE score is the score that an examinee would receive if there were no measurement error in the TWE itself. To correct for TWE measurement error in the estimation of the validity coefficients, it is necessary to divide the regression coefficients by the square root of the reliability coefficient for the TWE.

Unfortunately, a directly measured TWE reliability coefficient for this student population is not available. Therefore, the average of the eight alpha coefficient measures of internal consistency reliability (.896) for the TWE presented in the TOEFL TWE Guide (1989) was used. Table 4 displays the validity coefficients and the coefficients of determination for the WEEPT and the NJWT corrected for attenuation due to imperfect reliability in the TWE.


Table 4. Regression coefficients (r) and coefficients of determination (r²) for the WEEPT-TWE and NJWT-TWE regressions corrected for attenuation due to imperfect TWE reliability

          WEEPT-TWE   NJWT-TWE
r         .519        .632
r²        .269        .399

Table 5. Regression coefficients (r) and coefficients of determination (r²) for the WEEPT-TWE and NJWT-TWE regressions corrected for attenuation due to imperfect reliability of the TWE and the indirect measures

          WEEPT-TWE   NJWT-TWE
r         .606        .664
r²        .367        .441

The validity coefficients, which are correlations, are the coefficients that describe the best prediction of writing competency that could be expected for deaf and hard-of-hearing baccalaureate-level students, given actual indirect test measurements. The coefficients of determination (the squares of the validity coefficients, describing variance accounted for) indicate how much information about a student's writing skills is shared by the TWE and each of the indirect tests. In other words, the coefficients of determination indicate the proportion of the information contained in the TWE score that is provided to the examiner by the indirect test scores. A perfect coefficient of determination (r² = 1.0) between the TWE and an indirect test would mean that the indirect test measured exactly the same skills as the TWE. In this case, the indirect test could be used in place of the TWE to provide the same information. An r² of zero between the TWE and an indirect test would mean that the indirect test measured none of the same skills as the TWE.
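Concretely (our worked arithmetic, not shown in the original), the Table 4 values follow from dividing the observed coefficients in Table 3 by the square root of the normative TWE reliability:

\[ \frac{.491}{\sqrt{.896}} \approx .519 \quad\text{and}\quad \frac{.598}{\sqrt{.896}} \approx .632. \]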

Generally, the validity of the two indirect measures is poor for the purpose of predicting students' writing competency from a single administration of one of these indirect measures. The coefficients of determination in Table 4 show that, at best, less than one half of the variance in writing ability due to individual differences among students is accounted for by the NJWT. In other words, knowing an individual student's score on the NJWT, for example, gives the examiner only a rough and very error-prone estimation of the student's actual writing skill as it would be revealed by a perfect TWE measurement.

Finally, it is possible to address the question of how good the predictive validity of the indirect tests would be if there were a way to eliminate or reduce their own measurement error, say, by repeated administrations of the tests. These results, based on a further correction for attenuation in regression coefficients due to the imperfect reliability of the indirect tests, appear in Table 5. The r values in Table 5 are estimates of the theoretical upper limit on the validity of writing competency measurement by the WEEPT and the NJWT. The low values for the coefficients of determination (r²) suggest that, regardless of the precision of testing, the WEEPT and the NJWT will very likely misrepresent deaf and hard-of-hearing students' writing competencies to a substantial degree.
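Again as a check (our arithmetic), the fully disattenuated coefficients in Table 5 are obtained by also dividing by the square root of each indirect test's own reliability:

\[ \frac{.491}{\sqrt{.896 \times .733}} \approx .606 \quad\text{and}\quad \frac{.598}{\sqrt{.896 \times .905}} \approx .664. \]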

Discussion

We have presented evidence that indirect measures obtained in single test administrations have low validity for assessing the writing competency of deaf and hard-of-hearing baccalaureate-level students. Furthermore, by estimating the theoretical upper limit on the validity of writing competency measurement by the WEEPT and the NJWT, it is also possible to conclude that repeated test administrations of these indirect measures would only marginally improve their validity for assessing the writing competency of deaf and hard-of-hearing baccalaureate-level students. The most accurate and appropriate way to assess their writing competency, then, appears to be through the use of a valid and reliable direct measure such as the TWE.

At this point there are no reliability data available for the TWE for this population. Therefore, it was not possible to estimate with precision the measurement error associated with a single administration of the TWE to deaf and hard-of-hearing baccalaureate-level students. However, the reliability of the TWE in this study is likely quite high. By using test materials supplied by ETS, by closely following the ETS recommended procedures for administering the TWE, and by channeling the essays through ETS for scoring, we ensured that most of the factors that could adversely influence reliability for any population were controlled. Furthermore, there is some reason to suspect that the normative population value (.885) might be approximately correct for the deaf and hard-of-hearing population in particular.


Bochner et al. (1992) reported that, for a general college population of deaf and hard-of-hearing students who were given another valid, essay-based direct measure of writing competency (the NTID Writing Test), the measured internal consistency reliability ranged from .83 to .91. The TWE and the NTID Writing Test make very similar demands on the writer and are both scored by multiple raters. Therefore, the numerical similarity of the reliability coefficients from the TWE normative population and from the sample of Bochner et al. intimates that writing performance in rater-scored, essay-based writing tests is relatively stable both for hearing and for deaf and hard-of-hearing populations.

If the TWE reliability for deaf and hard-of-hearing baccalaureate-level students is approximately the same as for the TWE normative population, then the standard error of measurement for a single TWE test administration would be .283, indicating that, with 95% confidence, an individual's measured TWE score would be at most approximately a half point away from their true score. This error would cover a range of approximately 1 scale interval out of 6. This range seems considerably more acceptable than the corresponding confidence intervals for the prediction of true TWE scores from the two indirect tests, which cover a range of approximately 2.5 to 3 scale intervals out of 6, nearly half the TWE scale range. Future TWE reliability studies will be necessary to confirm the accuracy of these calculations.
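The standard error of measurement used here follows the usual classical test theory formula (our notation),

\[ SEM = SD_{TWE}\,\sqrt{1 - r_{TWE}}, \]

and the associated 95% confidence band is approximately \(\pm 1.96 \times SEM\); with the reported SEM of .283 this is roughly \(\pm .55\), or about half a scale point in either direction, consistent with the interval of about one scale interval described above.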

However, even if the reliability of the TWE for the deaf and hard-of-hearing baccalaureate-level student population is lower than the value for the normative population, it presents no fundamental limitation to accurate assessment of writing competency. Unlike the indirect tests, whose ability to predict true TWE scores cannot be substantially improved by repeated test administrations, direct TWE measurement can be made as accurate as desired simply by immediate repeated testing of a student. Accuracy will approach perfect measurement with each additional testing.
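The reason repeated direct testing helps is captured by the Spearman-Brown relation (our addition; no formula is given in the original): averaging k parallel administrations of a test with reliability r yields an effective reliability of

\[ r_k = \frac{k\,r}{1 + (k-1)\,r}, \]

which approaches 1.0 as k increases, whereas repeated indirect testing cannot raise the correlation with true writing competency above the disattenuated values in Table 5.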

Although some educators have promoted the use of indirect measures as a convenient and objective method of assessing writing competency, and although such measures continue to be used widely in educational environments, we note that actual correlations between indirect and direct measures of writing competency in the general population across different age levels tend to be no higher than about .5 to .7 (Breland, Camp, Jones, Morris, & Rock, 1987; Diederich, 1974; Hogan & Mishler, 1980; Moss, Cole, & Khampalikit, 1982). The results of the current study suggest that the correlations of indirect with direct writing competency measures within the deaf and hard-of-hearing population are comparable in magnitude to those of the general population, when efforts are made to select indirect tests that are specifically appropriate for use with deaf and hard-of-hearing students. Nevertheless, correlations of .5 to .7 imply that the prediction of writing competency by indirect measures will be relatively poor, because indirect measures only account for about one quarter to one half of the variance in students' writing competency. This suggests that, despite their appeal, indirect tests should not be used indiscriminately to assess students' writing competency, regardless of their hearing status.

In conclusion, the demonstrated low validity of the indirect writing tests examined in this study suggests that indirect writing tests in general are inappropriate for use with deaf and hard-of-hearing baccalaureate-level students. The use of indirect measures to assess writing competency could have serious negative consequences for the educational and career advancement of this population. Indirect tests of writing competency lack face and construct validity and, based on the results of this study, also lack substantial criterion validity. We recommend that writing competency assessment for deaf and hard-of-hearing people be carried out through the use of direct measures. However, care must still be taken to follow established procedures for test administration and scoring to ensure the reliability and validity of writing competency assessment (see Breland, Camp, Jones, Morris, & Rock, 1987, for further discussion).

Appendix

Various Skills Targeted by the Nine Tests Comprehensively Reviewed in Berent and Berent (1990)

Punctuation: Use of commas, semicolons, periods, etc., and capitalization
Grammar and Usage: Use of parts of speech, verb tenses, subject-verb agreement, pronoun agreement, etc.
Sentence Structure: Sentence construction rules, relationships between clauses, etc.
Organization: Organization of ideas, including order, coherence, and unity
Style: Parallel structure, economy, preferred word choice, etc.
Rhetorical Strategy: Appropriateness of expression in relation to audience and purpose, supporting material, etc.
Vocabulary: Use of lexical items to express required meanings; analogies, etc.
Literature: Knowledge of literary works and genres
World Knowledge: Knowledge of world events, practical facts, and their implications
Library Skills: Knowledge of library facilities and resources
Rote Knowledge: Knowledge of specific definitions (e.g., parts of speech), formulas, etc.

Notes

1. Copies of Berent and Berent (1990) are available from the Staff Resource Center, National Technical Institute for the Deaf, Rochester Institute of Technology, 52 Lomb Memorial Drive, Rochester, NY 14623-5604.

2. An anonymous reviewer raised the question why the Test of Syntactic Abilities (Quigley, Steinkamp, Power, & Jones, 1978), developed to assess structural knowledge of English in deaf children and adolescents, was not considered as a possible indirect measure suitable for the deaf and hard-of-hearing population addressed in this study. There were two reasons for our decision to exclude this test. First, the Test of Syntactic Abilities is not regarded as a writing test, but rather as a test of English language sentence structure. More importantly, Bochner (1981) has reported data that indicate that students at the National Technical Institute for the Deaf (NTID) in general tend to perform near ceiling levels on this test. Therefore, the Test of Syntactic Abilities would likely have inadequate sensitivity to individual differences in writing ability among baccalaureate-level deaf and hard-of-hearing students in particular, since they tend to be the deaf and hard-of-hearing students with the highest levels of English language proficiency at the Rochester Institute of Technology, including NTID.

3. An anonymous reviewer inquired whether the indirect tests used in this study were esoteric or not widely used. The WEEPT was developed and distributed to 2,500 colleges, schools, school systems, and education associations as part of the Comparative Guidance and Placement Program of the College Board. The NJWT was developed as a part of the New Jersey Statewide Testing System for use throughout that state. Thus, these tests have been widely distributed and used. However, regardless of the distribution and popularity of these particular tests, they were selected for their appropriateness for potential use with deaf and hard-of-hearing students based on specific content-oriented test selection criteria.

References

ACT Collegiate Assessment of Academic Proficiency: Writing Skills Test. (1988). Iowa City, IA: The American College Testing Program.

Advanced Placement Exam in English Language and Composition. (1990). Princeton, NJ: College Entrance Examination Board.

Berent, R., & Berent, G. P. (1990). Report on indirect measures of writing competency for use with deaf and hard-of-hearing students (Technical Report). Rochester, NY: Rochester Institute of Technology, National Technical Institute for the Deaf.

Bochner, J. H. (1981). Linguistics and diagnostic testing: Remarks on the TSA. Teaching English to the Deaf, 7(2), 18-22.

Bochner, J. H., Albertini, J. A., Samar, V. J., & Metz, D. E. (1992). External and diagnostic validity of the NTID Writing Test: An investigation using direct magnitude estimation and principal components analysis. Research in the Teaching of English, 26, 299-314.

Braddock, R., Lloyd-Jones, R., & Schoer, L. (1963). Research in written composition. Urbana, IL: National Council of Teachers of English.

Breland, H. M., Camp, R., Jones, R. J., Morris, M. M., & Rock, D. A. (1987). Assessing writing skill. New York: College Entrance Examination Board.

Breland, H. M., & Gaynor, J. L. (1979). A comparison of direct and indirect assessments of writing skill. Journal of Educational Measurement, 16, 119-128.

The College Board Achievement Test in English Composition. (1990). Princeton, NJ: Educational Testing Service.

College English Placement Test. (1969). Chicago, IL: Riverside Publishing Company.

The College-Level Examination Program: Subject Examination in College Composition. (1989). Princeton, NJ: College Board Publications.

Cooperman, S., & Bloom, J. (1988). New Jersey Statewide Testing System: High School Proficiency Test (Technical Report, vol. 1). Trenton, NJ: New Jersey Department of Education.

The Descriptive Tests of Language Skills in Conventions of Written English. (1988). Princeton, NJ: The College Board.

Diagnostic Skill Level Inventory for Writing Skills. (1988). Corunna, IN: Educational Diagnostic Services.

Diederich, P. (1974). Measuring growth in English. Urbana, IL: National Council of Teachers of English.

Guilford, J. P., & Fruchter, B. (1978). Fundamental statistics in psychology and education (6th ed.). New York: McGraw-Hill.

Hogan, T. P., & Mishler, C. (1980). Relationships between essay tests and objective tests of language skills for elementary school students. Journal of Educational Measurement, 17, 219-227.

Huot, B. (1990). The literature of direct writing assessment: Major concerns and prevailing trends. Review of Educational Research, 60, 237-263.

International Standards Organization. (1975). Acoustics - Standard reference zero for the calibration of pure-tone audiometers (ISO 389). Geneva.

Moss, P. A., Cole, N. S., & Khampalikit, C. (1982). A comparison of procedures to assess written language skills at grades 4, 7, and 10. Journal of Educational Measurement, 19, 37-47.

Perkins, K. (1983). On the use of composition scoring techniques, objective measures, and objective tests to evaluate ESL writing ability. TESOL Quarterly, 17, 651-671.

Quigley, S. P., Steinkamp, M., Power, D. J., & Jones, B. (1978). Test of Syntactic Abilities. Beaverton, OR: Dormac.

The Test of Written English. (1986). Princeton, NJ: Educational Testing Service.

TOEFL Test of Written English Guide. (1989). Princeton, NJ: Educational Testing Service.

Traxler, C. B. (1990, April). Assessing the writing competence of deaf college students: A new use for the TOEFL TWE. Paper presented at the annual meeting of the American Educational Research Association, Boston, MA.

Using and interpreting scores on the CGP self-scoring placement tests in English and mathematics. (1979). Princeton, NJ: Educational Testing Service.

The Written English Expression Placement Test. (1985). Princeton, NJ: Educational Testing Service.