
Reading for Information Technical Manual

Page 2: Reading for Information Technical Manual · Reading for Information Technical Manual. ... foundational skills in reading can perform on the job and more readily learn job-specific

ACT endorses the Code of Fair Testing Practices in Education and the Code of Professional Responsibilities in Educational Measurement, guides to the conduct of those involved in educational testing. ACT is committed to ensuring that each of its testing programs upholds the guidelines in each Code. A copy of each Code may be obtained free of charge from ACT Customer Services (68), P.O. Box 1008, Iowa City, IA 52243-1008, 319/337-1429.

Title: WorkKeys Reading for Information Technical Manual. Author: ACT. Publisher: ACT.

© 2008 by ACT, Inc. All rights reserved. ACT® and WorkKeys® are trademarks of ACT, Inc.

For questions or more information, please contact the ACT national office:

www.act.org/workkeys/index

500 ACT Drive
P.O. Box 168
Iowa City, IA 52243-0168

Telephone: 800/967-5539

Fax: 319/339-1790


Contents

Executive Summary
    The WorkKeys® System
    Reading for Information—A Core WorkKeys Assessment
    Development of Reading for Information
    Comprehensive Evaluation of Reading for Information Scores
    A Multi-Faceted Approach to Validity
    Alignment with Industry Standards and Government Requirements
    Organization of Technical Manual

The WorkKeys System
    Chapter Abstract
    Reading for Information—A Foundational WorkKeys Assessment
    The Growing Need for Objective Assessment of Job Skills
    WorkKeys: A Common Language for Assessing Job Skills
    Components of the WorkKeys System
    WorkKeys Assessments
        Personal Skills Assessments
        Foundational Skills Assessments
    WorkKeys Job Analysis
        WorkKeys Estimator
        SkillMap® Job Inventory
        WorkKeys Job Profiling
    Value-Added Resources for Workforce Development
        Educational Program Planning and Evaluation
        Curriculum Development and Instruction
        Skills Certification and Job Opportunities
    About the Reading for Information Assessment

Construct Definition
    Chapter Abstract
    The Need for a Reading for Information Assessment
    Reading in Context: Classroom Versus Workplace
        Classroom Reading
        Workplace Reading
        Differences between Reading in Classrooms and Reading in Workplaces
    The Construct of Reading for Information
        Construct Aspects
        Aspect 1: Reading Skills
        Aspect 2: Document Types
        Aspect 3: Level of Complexity
    Representing the World of Work in Reading for Information

Assessment Development
    Chapter Abstract
    Overview of the WorkKeys Test Development Process
    Development of the Reading for Information Assessment

Scaling and Equating
    Chapter Abstract
    Level Score Scale
        Level Score Scaling Study
    Scale Scores
    Equating
        Pool Calibration and Pre-Equating


Reliability
    Chapter Abstract
    The Concept of Reliability
    Internal Consistency for Number-Correct Scores
    Generalizability Analyses
    Standard Errors of Measurement
    Classification Consistency of Reading for Information Level Scores

Validity
    Chapter Abstract
    Validity: A Unitary Evidence-Based Concept
    Construct-Related Evidence
    WorkKeys Reading for Information and the ACT Assessment
    Criterion-Related Evidence
        Correlations between Reading for Information and Performance Ratings
        Classification Consistency
    Content-Related Evidence
    Job Analysis and High-Stakes Employment Decisions
        Adverse Impact and Validity Evidence
        Gender and Race/Ethnicity Analyses
        Conclusion

Appendix 1: WorkKeys Assessment Formats, Administration Times, and Delivery Options

Appendix 2: Special Testing Arrangements: Accommodations for WorkKeys Assessments

Appendix 3: Readability and Grade Level of Reading for Information

Works Cited


Tables

3.1   Skill Definition for Reading for Information
3.2   Demographic Statistics of Initial Item Writers (N=20)
3.3   Descriptive Statistics of Operational Forms Based on Pretest Data
4.1   Summary Statistics for Number-Correct Scores for the Reading for Information Assessment
4.2   Reading for Information Boundary Thetas, Form-Specific Cutoff Thetas, and NC Score Cutoffs
4.3   Percentages of Test Takers by Level Score by Form
4.4   Conversion Table of Scale Scores to Levels for Reading for Information (an Example)
4.5   Summary Statistics for Scores and Scale Scores
5.1   Estimated Variance Components, Error Variances, and Generalizability Coefficients
5.2   Predicted Classification Consistency for Level Scores
6.1   Correlations between WorkKeys Reading for Information, ACT Reading, and ACT English
6.2   Conditional Distributions of Score Ranges for WorkKeys Reading for Information Level Scores and ACT Reading Scale Scores
6.3   Correlations between WorkKeys Reading for Information Scores and Job Performance Ratings
6.4   Job Classification Consistency of Reading for Information
6.5   Descriptive Statistics of Reading for Information Mean Level Scores by Gender and Race/Ethnicity
A6.1  MetaMetrics Lexile and Grade Range Compared to Flesch-Kincaid Grade Level for WorkKeys Reading for Information Form 13AA
A6.2  Mean Readability of Reading for Information Reading Selections in Forms C, D, and E

Figures

1.1   WorkKeys Skill Assessment System
1.2   Personal Skills Assessments
1.3   WorkKeys Foundational Skills Assessments
2.1   Reading in the Classroom Versus Reading in the Workplace
2.2   Three Interacting Dimensions of the Reading for Information Construct
2.3   Factors Affecting Complexity of Reading for Information
2.4   World-of-Work Career Clusters and Areas
3.1   Content Review Checklist
3.2   Fairness Review Checklist
3.3a  Target Distribution of Skills on Reading for Information Test Forms (30 Scored Items Total)
3.3b  Target Distribution of Document Types on Reading for Information Test Forms (30 Scored Items Total)
4.1   Item p-Values (p) and Mean Item p-Values by Level of Item
4.2   Reading for Information Level Characteristic Curves
5.1   SEMs for Two WorkKeys Reading for Information Test Forms
6.1   Boxplots of Scale Scores on ACT Reading at Each Level Score on WorkKeys Reading for Information
6.2   How ACT WorkKeys Job Analysis Procedures Meet Uniform Guidelines Requirements for Content Validation


Executive Summary

In today’s high-performance workplace, employers increasingly recognize that many workers demonstrate serious gaps in many of the essential skills required for success on the job. Such perceived gaps in job skills reflect dramatic changes in the demographic profile of the American workforce. As our workforce grows more diverse and newly created jobs require higher levels of skill, the proportion of workers with the necessary skill credentials is expected to decline, while the demand for workers with better skills is expected to grow. Such findings have prompted those with a stake in workforce development to see the need for more effective means of assessing job skills relative to performance standards. Responding to this need, ACT created WorkKeys, a standardized skill assessment system, to furnish stakeholders with clear, objective information about the skills of our workforce. WorkKeys gives employers, educators, and other stakeholders a common language for communicating about the personal characteristics and foundational skills of tomorrow’s workforce.

The WorkKeys System

As a workforce development system, WorkKeys has three main components: assessments, job analysis, and value-added user resources. The assessments include criterion-referenced measures of foundational skills such as Reading for Information, which can be augmented with norm-referenced assessments of work-related personal characteristics such as integrity. In addition, users can select among job analysis methods to determine standards of performance in the foundational skills most relevant to a job. WorkKeys also provides access to a wide range of value-added resources for anyone invested in the process of workforce development. Most notable is the recent introduction of ACT’s National Career Readiness Certificate Program, a resource designed to communicate information about the skills of job applicants in the three core WorkKeys skill areas of Reading for Information, Applied Mathematics, and Locating Information.

Reading for Information—A Core WorkKeys Assessment

This manual documents the technical characteristics of Reading for Information, one of the core foundational skills assessments in the WorkKeys system. Reading for Information is based on the premise that workers who possess foundational skills in reading can perform on the job and more readily learn job-specific skills through experience or additional training. Reading for Information is a short multiple-choice test designed to measure skills in reading work-related documents. These documents are based on materials such as memos, letters, directions, signs, notices, bulletins, policies, and regulations that reflect the actual reading demands of the workplace.

Reading for Information is based on a construct defined in terms of three aspects: Reading Skills, Document Types, and Level of Complexity. Reading Skills include choosing main ideas or details, understanding word meanings, applying instructions, applying information, and applying reasoning. The document types used as reading selections include instructions, information, policies, contracts, and other legal documents. Like all WorkKeys foundational skills assessments, Reading for Information also incorporates an aspect of increasing Complexity with respect to the tasks and skills assessed.


In developing Reading for Information, ACT translated this construct into a working set of test specifications that is used to guide the construction of test forms. The test specifications determine for each level of items what reading skills are required and what document types are used. In addition, ACT uses its World-of-Work Career Clusters to ensure that problems are presented in a wide variety of workplace situations.

Development of Reading for Information

To develop Reading for Information, ACT first reviewed the relevant literature, consulted with advisory panels from business and education, and studied a broad range of jobs requiring skill in reading. From this research, ACT came to define Reading for Information as a measure of the skills people use when they read and use written text in order to do a job. The skills to be assessed were then defined in terms of WorkKeys skill levels ranging from Level 3 to Level 7, the levels of greatest interest to employers who need to administer skill tests to job applicants.

Next, ACT used a draft set of test specifications to develop a prototype assessment consisting of 75 multiple-choice items organized into five skill levels, each containing 15 items. To determine the properties of the prototype, ACT administered it to samples of students and employees in three midwestern states. ACT then considered the statistical results and participant survey feedback from the prototype administration in making adjustments to the Reading for Information skill definition and skill level descriptions.

ACT then developed a pool of Reading for Information items designed to meet the WorkKeys standards and specifications for item content and format. From this pool, ACT selected items for pretesting and subsequent psychometric evaluation. Following this evaluation, ACT withdrew any problematic items and conducted content and fairness reviews on all of the remaining items with acceptable pretest statistics. Pretest items that met the content, fairness, and statistical specifications were then used to assemble the first three operational forms of Reading for Information. Each form consisted of 30 operational items with 6 items at each skill level, which were selected to meet the content and statistical specifications of the test. After collecting sufficient operational score data on the published forms, ACT conducted a scaling study to place scores from the different test forms on a common scale.

ACT continues to develop and pretest items, which are embedded in the new operational forms of Reading for Information published periodically. When sufficient data are available, ACT also evaluates operational items for differential item functioning (DIF). To ensure each form’s readiness for operational use, ACT uses classical and item response theory (IRT) methods to examine its statistical properties and asks advisory panels to conduct final reviews for content and fairness. ACT equates the forms to adjust statistically for differences in their difficulty. Meanwhile, ACT continues to expand the pool of Reading for Information items designed to meet the WorkKeys standards and specifications for item content and format.


Comprehensive Evaluation of Reading for Information Scores

To ensure the accuracy and meaningfulness of test scores on Reading for Information, ACT has conducted extensive psychometric analyses, beginning with scaling and equating. Scaling is a process of setting up a rule of correspondence between a test’s observed scores and the numbers assigned to them. Reading for Information assigns scores on two scales: Level Scores and Scale Scores.

Level Scores reflect the expectation that test takers have mastery of the level specified in the score and all the levels below it. To establish the Level Score scale, ACT conducted an empirical scaling study of pools of Reading for Information items. Using IRT methods, ACT determined an assignment of Level Scores for test takers that reliably supported the assumptions that mastery of a level means a test taker is able to correctly answer 80 percent of the items representing the level, and that test takers have mastery of all levels up to and including the level specified in the Level Score. ACT used an expected proportion correct method to define the Level Score scale. This method rests on certain statistical assumptions, notably the fit of the IRT model to the score data. The results of the scaling study showed the fit of the model to be very good.
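
To make the expected proportion correct idea concrete, the sketch below finds the ability value at which an examinee would be expected to answer 80 percent of a level’s items correctly under a three-parameter logistic (3PL) model, the model named later in this manual. It is an illustration only, not ACT’s operational procedure; the item parameters and function names are hypothetical.

import numpy as np
from scipy.optimize import brentq

def p_correct_3pl(theta, a, b, c):
    """3PL probability of answering an item correctly at ability theta."""
    return c + (1.0 - c) / (1.0 + np.exp(-1.7 * a * (theta - b)))

def boundary_theta(items, target=0.80):
    """Theta at which the expected proportion correct over the items
    representing one level equals the mastery target (80 percent)."""
    a, b, c = (np.array(x) for x in zip(*items))
    gap = lambda theta: p_correct_3pl(theta, a, b, c).mean() - target
    return brentq(gap, -4.0, 4.0)  # search a wide ability range

# Hypothetical (discrimination, difficulty, guessing) triples for one level
level_items = [(1.1, -0.6, 0.2), (0.9, -0.3, 0.2), (1.3, -0.1, 0.2),
               (1.0, 0.1, 0.2), (1.2, -0.4, 0.2), (0.8, -0.2, 0.2)]
print(round(boundary_theta(level_items), 2))  # boundary theta for this level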

Scale Scores give users more detailed information for purposes of program evaluation and outcome measurement. Scale Scores make finer distinctions among test takers’ abilities than Level Scores. ACT used the equal standard error of measurement method to determine the assignment of Scale Scores, which fall on a scale of 25 points ranging from 65 to 90.
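
One textbook way to construct a reporting scale with approximately equal standard errors of measurement is to apply a variance-stabilizing arcsine transformation to number-correct scores and then rescale linearly. The sketch below illustrates that general idea only; it is not necessarily ACT’s operational procedure, and only the 30-item form length and the 65 to 90 score range are taken from this manual.

import numpy as np

N_ITEMS, LO, HI = 30, 65, 90

def arcsine_transform(nc):
    """Variance-stabilizing transform of a number-correct (NC) score."""
    return 0.5 * (np.arcsin(np.sqrt(nc / (N_ITEMS + 1))) +
                  np.arcsin(np.sqrt((nc + 1) / (N_ITEMS + 1))))

def to_scale_score(nc):
    """Map NC scores onto a 65-90 reporting scale (illustrative only)."""
    g = arcsine_transform(np.asarray(nc, dtype=float))
    g_min = arcsine_transform(0.0)
    g_max = arcsine_transform(float(N_ITEMS))
    return np.rint(LO + (HI - LO) * (g - g_min) / (g_max - g_min)).astype(int)

print(to_scale_score([0, 10, 20, 30]))  # endpoints map to 65 and 90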

As new forms of Reading for Information are developed, each is constructed to adhere to the test specifications. To control for the inevitable slight differences in form difficulty, scores on different forms are equated so that they have the same meaning when reported as Level Scores or Scale Scores. Depending on the circumstances of administration and other factors, ACT uses different data collection designs and accepted methods of equating. ACT also uses IRT methods to maintain a pool of Reading for Information items calibrated such that their item parameter estimates can be used to assemble new, pre-equated forms.
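
As a rough illustration of what equating accomplishes, the following sketch applies one accepted method, linear equating under a random-groups data collection design, to simulated number-correct scores. It is a minimal sketch on invented data, not a description of ACT’s operational designs or methods.

import numpy as np

def linear_equate(new_scores, ref_scores):
    """Return a function that maps new-form number-correct scores onto the
    reference form's scale by matching means and standard deviations."""
    mu_x, sd_x = np.mean(new_scores), np.std(new_scores)
    mu_y, sd_y = np.mean(ref_scores), np.std(ref_scores)
    return lambda x: mu_y + (sd_y / sd_x) * (x - mu_x)

rng = np.random.default_rng(0)
ref = rng.binomial(30, 0.62, size=2000)   # reference form, 30 items
new = rng.binomial(30, 0.58, size=2000)   # slightly harder new form
to_ref_scale = linear_equate(new, ref)
print(round(to_ref_scale(18), 2))         # reference-form equivalent of an NC score of 18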

In addition to scaling and equating, ACT evaluated the reliability of Reading for Information test scores using a variety of techniques. These include estimating the internal consistency of test forms, conducting generalizability analyses, computing Scale Score reliability estimates, and estimating classification consistency. Classification consistency refers to the extent to which classifications of examinees agree when obtained from two independent administrations of a test or two parallel forms of a test. For Reading for Information, ACT used the three-parameter logistic (3PL) model to determine IRT estimates of classification consistency. Overall, the results showed Reading for Information to be an acceptably reliable measure of work-related reading skills.
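
The sketch below shows, in simplified form, how an IRT-based estimate of classification consistency can be obtained: simulate two parallel administrations of a form under a 3PL model and count how often the simulated examinees are assigned the same level both times. The item parameters, number-correct cutoffs, and sample size are invented for illustration and do not reflect ACT’s calibration or cut scores.

import numpy as np

rng = np.random.default_rng(1)

def simulate_nc(theta, a, b, c):
    """Simulated number-correct scores for one administration of a form."""
    p = c + (1 - c) / (1 + np.exp(-1.7 * a * (theta[:, None] - b)))
    return (rng.random(p.shape) < p).sum(axis=1)

n_items, n_people = 30, 5000
a = rng.uniform(0.8, 1.6, n_items)       # discriminations
b = np.sort(rng.normal(0, 1, n_items))   # difficulties
c = np.full(n_items, 0.2)                # pseudo-guessing parameters
theta = rng.normal(0, 1, n_people)       # simulated abilities

cuts = [8, 14, 20, 26]                   # hypothetical NC cutoffs for Levels 4-7
to_level = lambda nc: np.searchsorted(cuts, nc, side="right") + 3
same = to_level(simulate_nc(theta, a, b, c)) == to_level(simulate_nc(theta, a, b, c))
print(f"estimated classification consistency: {same.mean():.2f}")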


A Multi-Faceted Approach to Validity

ACT has examined three types of validity evidence collected to justify the use of Reading for Information scores in making employment decisions: construct-related evidence, criterion-related evidence, and content-related evidence. To accumulate such evidence, ACT has conducted validity studies or worked with organizations to collect data on students and employees.

To support the construct-related validity of Reading for Information test scores, ACT examined the relationship between Reading for Information and the ACT Reading and English Tests, which measure the language skills identified as prerequisites to successful performance in entry-level college courses in reading and English. ACT found moderate correlations between these measures: In general, test takers who received higher Scale Scores on the ACT Reading and English Tests also received higher Level Scores on Reading for Information.

To support the criterion-related validity of Reading for Information test scores, ACT has gathered study data from various organizations on the correlation between Reading for Information test scores and the job performance ratings of employees. These studies showed positive correlations between test scores and performance ratings ranging from 0.12 to 0.86, which compare favorably with the correlations found in the general research literature on the criterion-related validity of employment tests. ACT has also conducted classification consistency studies, comparing employees’ job performance classifications to their classifications by Reading for Information skill level. In these studies, the percentage of employees classified the same way by both measures ranged from 71 percent to 79 percent.
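
As a minimal illustration of the two kinds of statistics summarized above, the sketch below computes a Pearson correlation between test scores and supervisor ratings and a simple percent-agreement figure for two classifications. All of the data are simulated and the cutoffs are arbitrary; the sketch says nothing about the actual study designs ACT used.

import numpy as np

rng = np.random.default_rng(2)
scores = rng.normal(80, 5, 200)                        # hypothetical Scale Scores
ratings = 0.05 * scores + rng.normal(0, 0.4, 200)      # noisy performance ratings

r = np.corrcoef(scores, ratings)[0, 1]                 # criterion-related correlation

meets_cut = scores >= 80                               # classification by test score
rated_high = ratings >= np.median(ratings)             # classification by performance
agreement = np.mean(meets_cut == rated_high)           # proportion classified alike

print(f"r = {r:.2f}, agreement = {agreement:.0%}")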

To support the content-related validity of Reading for Information test scores, ACT uses two job analysis procedures—WorkKeys Job Profiling and the SkillMap Job Inventory—to link the Reading for Information skill levels to relevant job behaviors. Profiling and SkillMap are both designed to meet federal standards and other industry guidelines for content validation of employment tests used for high-stakes decisions such as hiring and promotion. Both procedures can be used to define critical job tasks, determine which WorkKeys skills are relevant to performing the tasks, and identify the level of skill required for performing them.


Alignment With Industry Standards and Government Requirements

In developing and maintaining the Reading for Information assessment, ACT has closely followed well-established policies and procedures consistent with industry standards and government requirements, as published in:

• Code of Fair Testing Practices in Education (2004) prepared by the Joint Committee on Testing Practices

• Code of Professional Responsibilities in Educational Measurement (1995) prepared by the National Council on Measurement in Education

• Standards for Educational and Psychological Testing (1999) prepared by the American Educational Research Association, American Psychological Association, and National Council on Measurement in Education

• Uniform Guidelines on Employee Selection Procedures (1978) adopted by the U.S. Equal Employment Opportunity Commission (EEOC)

Organization of Technical Manual

Chapter 1, The WorkKeys System: Describes the WorkKeys system and how Reading for Information fits into it

Chapter 2, Construct Definition: Discusses how reading, as taught in the classroom, differs from reading applied to tasks in the workplace, and how these differences informed ACT’s definition of the Reading for Information construct

Chapter 3, Test Development: Offers an overview of the WorkKeys test development process and how it was applied to the development of Reading for Information

Chapter 4, Scaling and Equating: Documents the procedures used to develop the Reading for Information score scales and to equate test forms

Chapter 5, Reliability: Documents the various techniques used to estimate the reliability of Reading for Information test forms

Chapter 6, Validity: Summarizes validation procedures and studies regarding the meaningful and appropriate interpretation of Reading for Information test scores


The WorkKeys System

Chapter Abstract

In today’s high-performance workplace, employers increasingly recognize that many workers demonstrate serious gaps in many of the skills required for success on the job. Such perceived gaps in job skills reflect dramatic changes in the demographic profile of the American workforce. As our workforce grows more diverse, the proportion of workers with the necessary academic credentials is expected to decline, while the demand for workers with better skills is expected to grow. Such findings have prompted those with a stake in workforce development to identify a need for more effective means of assessing job skills relative to performance standards. Responding to this need, ACT created WorkKeys—a standardized skill assessment system—to furnish stakeholders with clear, objective information about job skills. WorkKeys gives employers, educators, and other stakeholders a common language for communicating about the personal characteristics and foundational skills of tomorrow’s workers.

The WorkKeys System

As a workforce development system, WorkKeys has three components: assessments, job analysis, and value-added resources for educators and employers. The assessments include criterion-referenced measures of foundational skills such as Reading for Information, which can be augmented with norm-referenced assessments of personal skills. In addition, users can select from one of three methods of WorkKeys job analysis to determine standards of performance in the foundational skills most relevant to a job. WorkKeys also provides access to a wide range of value-added resources for anyone invested in the process of improving job skills. Most notable is the recent introduction of ACT’s National Career Readiness Certificate Program, which is designed to communicate information about the skills of job applicants in the three foundational WorkKeys skill areas of Reading for Information, Applied Mathematics, and Locating Information.

Reading for Information—A Foundational WorkKeys Skill Assessment

As an assessment of foundational skills, Reading for Information emphasizes the application of skills to the solution of specific workplace problems and situations. Specifically, Reading for Information is a multiple-choice test designed to measure skill in reading written text in order to do a job. The test documents are written to reflect the way written information is presented in a workplace setting. The items simulate workplace tasks using the document.

And, like all WorkKeys foundational skills assessments, Reading for Information is criterion-referenced. This means that an individual’s performance on the test is compared to an established set of criteria—in this case, the range of skills established in the WorkKeys skill description for Reading for Information. Scores are reported as WorkKeys skill levels, which are delineated in terms of the complexity of the tasks and materials used on the job. These levels are designed to build on each other, each incorporating the skills assessed at the previous levels. In the case of Reading for Information, items are ordered in five levels of increasing complexity, from Level 3 to Level 7. These WorkKeys skill levels provide a commonly understood framework for describing the foundational skills employees need to function effectively on their first day in a job, to demonstrate sustained performance over time, and to learn new, job-specific skills.


The Growing Need for Objective Assessment of Job Skills

Employers who have long relied on our schools to educate their workforce now face a powerful dilemma: Traditional credentials such as high school diplomas and college degrees no longer offer consistent assurances that workers are ready to participate in today’s fast-paced, high-performance workplace. Increasingly, employers now recognize that workers often demonstrate serious gaps in many of the relevant personal characteristics and foundational skills required for success on the job. For instance, a recent survey of employers in manufacturing found a “paradoxical mismatch…between the need for the highest skill levels ever and the current need to address basic employability issues and basic skills in general” (National Association of Manufacturers, 2005). As manufacturing processes change, employers know they need workers with more sophisticated technical skills. At the same time, they perceive that many workers lack the relevant personal characteristics (such as timeliness) and foundational skills (such as reading, writing, teamwork, and mathematics) essential to effective performance.

Such perceived gaps in job skills reflect a dynamic redrawing of America’s demographic profile. The fastest growing demographic groups in the U.S. are the least educated. And, as our workforce grows more diverse, “the proportion of workers with high school diplomas and college degrees will decrease and the personal income of Americans will decline over the next fifteen years” (National Center for Public Policy and Higher Education, 2005). Paradoxically, this projected decline “coincides with the growth of a knowledge-based economy” in which about two thirds of the jobs now require some post-secondary education (Carnevale and Desroches, 2003).

Even so, conventional educational programs offer no guarantees that workers possess the job skills employers say they need. A recent study of student skills finds, for example, that “Twenty percent of U.S. college students completing 4-year degrees—and 30 percent of students earning 2-year degrees—have only basic quantitative literacy skills. . . . They are unable to estimate if their car has enough gasoline to get to the next gas station or calculate the total cost of ordering office supplies” (American Institute for Research, 2006).

WorkKeys: A Common Language for Assessing Job Skills

Such findings have prompted those with substantial stakes in workforce development to identify a need for more effective means of assessing job skills relative to performance standards. Responding to this need, ACT created WorkKeys—a standardized skill assessment system—to furnish stakeholders with clear, objective information about job skills. WorkKeys offers employers, educators, and other stakeholders a common language for communicating about the personal characteristics and foundational skills most relevant to job success.


Components of the WorkKeys System

The common language of WorkKeys has three components—assessments, job analysis, and value-added resources for workforce development. WorkKeys supplies criterion-referenced assessments of essential literacy and other foundational skills, which can be augmented with norm-referenced assessments of personal characteristics such as integrity. To determine standards of performance in the foundational skills most relevant to a job, users can select from one of three methods of WorkKeys job analyses. Finally, WorkKeys provides access to a wide range of value-added resources for stakeholders interested in developing and documenting the job readiness of the workforce. Most notable is the recent introduction of ACT’s National Career Readiness Certificate, a program designed to communicate information about the skills of job applicants in the core WorkKeys skill areas of Reading for Information, Applied Mathematics, and Locating Information.

WorkKeys Assessments

The WorkKeys system incorporates two batteries of assessments—personal skills and foundational skills—each designed to measure crucial aspects of employability. Personal skills assessments measure an individual’s characteristics in relation to the demands of specific jobs and broader occupational categories. Foundational skills assessments measure whether individuals possess the levels of cognitive skills needed to prepare for work, to learn a new job, or to perform a job with increasing effectiveness. Here we provide an overview of the personal skills and foundational skills assessments. Details about the available formats, administration times, and delivery options for each assessment appear in Appendix 1.

Figure 1.1
WorkKeys Skill Assessment System

[Figure: The WorkKeys assessments grouped by type. Foundational Skills comprise Essential Literacy Skills (Reading for Information, Locating Information, Applied Mathematics, Business Writing) and Complementary Employability Skills (Listening, Applied Technology, Teamwork, Observation). Personal Skills comprise assessments of work-related personality characteristics and interests (Performance, Talent, Fit).]


Personal Skills Assessments

WorkKeys includes three personal skills assessments—Performance, Talent, and Fit. These measures provide incremental predictions of job performance over and above the data provided by other employer assessments such as structured interviews, work samples, reference checks, and WorkKeys foundational skills assessments. As summarized below, Performance, Talent, and Fit are norm-referenced measures of individual characteristics that may be helpful in matching people to the kinds of jobs that are right for them.

Figure 1.2
Personal Skills Assessments

Performance is an integrity test used to screen individuals for potentially problematic work behaviors. Performance identifies job candidates who might be prone to problems with general work attitude and conduct, as well as concerns with following safety rules and procedures. Test results include a performance index that helps human resources staff in their selection decisions (WorkKeys Performance Assessment User and Technical Guide, 2007).

Talent is an inventory of normal personality. It measures a set of twelve personality characteristics reflecting a spectrum of behaviors and attitudes commonly found in the workplace. These characteristics are important because they are associated with a variety of work outcomes, such as organizational citizenship and teamwork, and because they vary in importance depending on job demands and job complexity. Accordingly, when an organization seeks to hire and develop quality employees, it may be important to consider the variables assessed by Talent (WorkKeys Talent Assessment User and Technical Guide, 2007).

Fit evaluates a job candidate’s interests and values to determine the closeness of fit between the candidate and different occupations. Specifically, it compares the self-reported activity preferences (interests) and work values of examinees to the corresponding activities and characteristics of occupations, as drawn from the current O*NET database (National Center for O*NET Development, 2006). Test results summarize the examinee’s highest and lowest measured interests and values, and provide Fit scores for specified occupations that can be used to inform resource management decisions (WorkKeys Fit Assessment User and Technical Guide, 2007).


Foundational Skills Assessments

In addition to personal skills assessments, WorkKeys offers two classes of foundational skills assessments—those for essential literacy skills and additional tests of complementary employability skills. WorkKeys assessments of essential literacy skills address the “four Rs” first studied in the classroom and later applied in the workplace:

• Reading for Information (literacy in prose comprehension)

• Locating Information (document literacy)

• Applied Mathematics (mathematical literacy or numeracy)

• Business Writing (literacy in prose production)

Though less often taught in the classroom, complementary employability skills are necessary for success in many jobs. WorkKeys currently provides assessments for four such skills: Listening, Applied Technology, Teamwork, and Observation.

As described in Figure 1.1, all of the foundational skills assessments emphasize the application of skills to the solution of specific workplace problems and situations. The test items are written to reflect the way skills are used in workplace settings, with an emphasis on critical thinking and problem solving.

The foundational skills assessments are criterion-referenced. This means that an individual’s performance on the test is compared to an established set of criteria—in this case, the range of skill established in the WorkKeys skill description for a particular assessment. Scores are reported as WorkKeys skill levels, which are delineated in terms of the complexity of the tasks and materials used on the job. Skill levels provide a commonly understood framework for describing the skills employees need to function effectively on their first day in a job, to demonstrate sustained performance over time, and to learn new, job-specific skills.

For each foundational skills assessment, ACT uses a list of typical workplace tasks to develop the skill description and to define its hierarchy of skill levels. In this hierarchy, the lowest level comprises the least complex skills for which businesses typically find it useful to conduct an assessment. This level has been identified as Level 3 to allow for the possibility that lower levels might be defined in the future to meet some business need. The highest level—identified as Level 7—is set at the point at which more complex levels of skill would probably require specialized training. These skill levels constitute the primary scale used for scoring the foundational skills assessments and for conducting WorkKeys job analysis studies, as discussed in the next section.


Figure 1.3
WorkKeys Foundational Skills Assessments

Essential Literacy Skills

Reading for Information measures the skills people use when they read and use written text in order to do a job. Written texts include memos, letters, directions, signs, notices, bulletins, policies, and regulations and are based on materials that reflect actual reading demands of the workplace.

Locating Information measures the skills people use when they work with workplace graphics such as charts, graphs, tables, forms, flowcharts, diagrams, floor plans, maps, and instrument gauges. Test takers are asked to find information in a graphic or insert information into a graphic. They also must compare, summarize, and analyze information found in related graphics.

Applied Mathematics measures the skills people use when they apply mathematical reasoning and problem-solving techniques to work-related problems. Test takers are asked to set up and solve the types of problems and do the types of calculations that actually occur in the workplace.

Business Writing measures the skills people use when they write an original response to a work-related situation. Components of writing skills include sentence structure, mechanics, grammar, word usage, tone and word choice, organization and focus, and development of ideas.

Complementary Employability Skills

Listening measures the skills people use when they receive verbal information in the workplace and relay it to another person.

Applied Technology measures the skills people use when they solve problems with machines and equipment found in the workplace. Test takers are asked to apply basic principles in four areas of technology: electricity, mechanics, fluid dynamics, and thermodynamics.

Teamwork measures the skills people use for choosing behaviors that lead toward the accomplishment of work tasks and support the relationships between team members. A team is defined as any workplace group with a common goal and ownership of shared responsibility in achieving that goal.

Observation measures the skills people use when they pay attention to and remember work-related instructions, demonstrations, and procedures.

WorkKeys Job Analysis

Job analysis is the systematic process of discovering, understanding, and describing what people do at work (Brannick, Levine, and Morgeson, 2007). Using various models and methods, industrial-organizational psychologists conduct job analysis studies for such purposes as job classification, job redesign, performance appraisals, training, workforce planning, and standard setting. WorkKeys job analysis is designed to set performance standards for the WorkKeys foundational skills assessments. These standards are expressed in terms of the skill levels required to perform different jobs or categories of occupations. Integral to WorkKeys job analyses are the WorkKeys skill descriptions and skill levels, as defined for the foundational skills assessments. ACT currently offers three methods of WorkKeys job analyses—WorkKeys Estimator®, the SkillMap® Job Inventory, and WorkKeys Job Profiling.


WorkKeys Estimator

WorkKeys Estimator is a step-by-step process designed to provide users with a method of documenting their decisions concerning the use of WorkKeys foundational skills assessments. Companies may use WorkKeys Estimator to assist with low-stakes uses of the assessments such as enhancing recruiting efforts and developing training goals. Effective implementation of WorkKeys Estimator depends on the appointment of a coordinator, who uses standardized, written instructions to facilitate the job analysis. This coordinator is tasked with collecting data, managing the flow of information, and communicating with job experts and business management throughout the process.

Job experts are individuals knowledgeable about the job and how it is performed. They work with the coordinator to independently review descriptions of WorkKeys skills and skill levels and to estimate the levels needed for completing job tasks. Decision makers in management then review the job experts’ estimates along with additional information from WorkKeys occupational profiles. Their decisions about which skill level estimates to use for a job are based on all of the information collected by the coordinator and the recommendations included in the WorkKeys Estimator documentation. The WorkKeys Estimator process generates skill level estimates only. It does not create task lists that link skill levels to the tasks of the job. If these are needed for high-stakes decisions such as hiring, employers should consider using the SkillMap Job Inventory or WorkKeys Job Profiling, as described next.

SkillMap Job Inventory

SkillMap is a Web-delivered job inventory procedure that links the tasks of a job to the WorkKeys skills and skill levels. In this procedure, a local administrator coordinates the job analysis activities using the instructions built into SkillMap. This administrator contacts job experts and informs them about the Web-based activities they are asked to complete. SkillMap guides the job experts through the process of identifying job tasks and the WorkKeys skills and skill levels needed for completing those tasks.

After the job experts have entered information about job tasks and skill levels, the software produces a SkillMap Job Inventory Report. The job inventory lists the required WorkKeys skills and skill levels and indicates how critical they are to the job. As with job profiling, the skills and skill levels correspond to the WorkKeys assessments and cutoff scores. In addition, the report provides content validity evidence that documents the appropriate use of WorkKeys assessments for personnel selection, training, or promotion. In this way, the SkillMap procedure complies with the reporting requirements outlined in the Uniform Guidelines on Employee Selection Procedures (1978), as more fully discussed in Chapter 6, Validity.

WorkKeys Job Profiling

WorkKeys Job Profiling is a more comprehensive method of job analysis conducted by individuals trained and authorized by ACT industrial-organizational psychologists. The training consists of several weeks of distance learning activities culminating in an onsite workshop. Profilers are trained to develop task lists and to conduct profiling sessions at job sites. In the sessions, profilers work with subject matter experts (SMEs), who provide information and explanations about how the job tasks require specified skills and skill levels. The SMEs are individuals familiar with the job being studied. They typically include job incumbents and may include their supervisors or other employees familiar with the job.


The outcome of job profiling is a set of recommended and prioritized standards describing the WorkKeys skill levels required for job entry and effective performance of a job. These skill levels correspond directly to scores on the WorkKeys foundational skills assessments. After the profiling sessions have been completed, the profiler prepares a written report that indicates which skills and skill levels are relevant to the job and lists them according to their criticality. The report includes a task list and links the tasks to the WorkKeys skills and skill levels needed to perform those tasks. Thus, the report provides thorough documentation of content validity evidence and facilitates the use of the WorkKeys assessments for high-stakes personnel selection and/or promotion decisions, as more fully discussed in Chapter 6, Validity.

Value-Added Resources for Workforce Development

The WorkKeys system’s unique combination of job analysis and standardized assessments has established a common language for sharing information about the skills of today’s evolving workforce. Job analysis can be used for determining the levels of foundational skills required for different kinds of work. The foundational skills assessments can then be administered to determine whether individuals have sufficient levels of skill to perform certain jobs. In addition, decision-makers can augment employment screening processes with personal skills assessments, which are designed to provide incremental predictions of job performance in the areas of integrity, work conduct, and attitudes toward safety (Performance); a broad range of behavioral characteristics and attitudes relevant to multiple work outcomes (Talent); and the match between a person and specified occupations (Fit).

As a system for sharing information about workforce development, WorkKeys also acts as a bridge between the education and business communities. A recent survey on students’ readiness for college and work finds, for example, that “employers and colleges say they are looking for the same basic skills: [High school] graduates should be able to write and speak clearly, analyze information, conduct research, and solve difficult mathematics problems” (American Diploma Project, 2005). In a similar vein, ACT’s own research on student skills in Reading for Information and Applied Mathematics shows that “whether planning to enter college or workforce training programs, high school students need to be educated to a comparable level of readiness if they are to succeed in college-level courses without remediation and to enter workforce training programs ready to learn job-specific skills” (ACT, 2006).

To help educators and employers reach their common goals for workforce readiness, WorkKeys offers users a suite of value-added resources for:

• Educational program planning and evaluation

• Curriculum development and instruction

• Skills certification and job opportunities, as highlighted below

Educational Program Planning and Evaluation

Because WorkKeys is based on the principle that those who want to improve their skills can do so with appropriate direction, educators and trainers can use WorkKeys information about skill requirements and skill gaps to develop curricula, advise and place learners in programs, and evaluate learning outcomes based on objective standards of performance. Several states, for instance, have aligned WorkKeys assessments with their high school curriculum standards, and Illinois and Michigan have incorporated the Reading for Information and Applied Mathematics assessments into their high school graduation requirements.


At the post-secondary level, many community college programs have analyzed the WorkKeys skills individuals need to succeed in their courses of study. These schools also use WorkKeys profiles to set requirements for entry into their programs and, when necessary, to determine what skills training is required. And, in the arena of adult basic education, the U.S. Department of Education and U.S. Department of Labor have approved the use of Reading for Information and Applied Mathematics assessments in several federally funded literacy programs.

Case Study:

Using WorkKeys to gauge student abilities and measure progress

The Challenge

In the mid-1990s, employers in Greeley, Colorado couldn’t find the qualified workers they needed. At the same time, high school dropouts and at-risk students in Greeley had no way to qualify for skilled, well-paying jobs. Many were caught in a cycle of low-wage, entry-level work.

The Solution

In 1994, to connect education with the needs of local businesses, Aims Community College in Greeley became the nation’s first ACT WorkKeys Value-Added Reseller. (There are now more than 400 resellers.) In 1998, Aims joined with the Centennial Board of Cooperative Educational Services to start the Weld County High School Diploma Program, a self-paced, competency-based chance for students to earn a high school diploma.

The Results

The diploma program grew from one student at its start to more than 1,600 graduates since 1999. The number of at-risk students entering the program has continued to grow and the program has expanded to two counties, five towns, and 14 school districts. An advisory board was established in 2001 to oversee and review the content areas of the program in order to ensure the quality of the diploma at all sites. Due to the success of the program, most Greeley-area high schools have integrated WorkKeys-based skills training into their curricula.

Curriculum Development and Instruction

The WorkKeys system also supplies resources for curriculum development and delivery, beginning with the Targets for Instruction. The Targets are guides designed to help educators and trainers develop curriculum and instruction strategies for the WorkKeys foundational skills. The Targets can be used to:

• Identify the skill levels of competencies and learning objectives

• Select developmental materials that match specific WorkKeys skill levels

• Estimate the skill levels of materials currently in use


The guide for each skill includes skill-building strategies, sample work-based tasks and problems for each level, guidelines for obtaining and using workplace materials, and a detailed description of the WorkKeys skill and skill levels.

Learners can access numerous opportunities to improve their job skills through schools, employer training programs, or independent distance-learning providers. Self-paced training courses based on the skills and objectives of the WorkKeys assessments can be accessed, for example, through the ACT Center™ network, whose sites are located primarily at community and technical colleges, where they serve as local workforce development resources. More information about the Targets for Instruction, the ACT Center sites, and links to online skill training courses is available at http://www.act.org/workkeys/overview/prod.html#training.

Skills Certification and Job Opportunities

Besides helping educators, WorkKeys assessments yield timely, reliable, and valid information for employers making decisions about screening, hiring, promotion, and additional training. Three of the foundational skills assessments—Reading for Information, Applied Mathematics, and Locating Information—also serve as the platform for ACT’s National Career Readiness Certificate program, which is designed to link qualified individuals with employers who recognize the value of skilled job applicants. The National Career Readiness Certificate Program has four components:

• The National Career Readiness Certificate, which verifies that an individual has the literacy and numeracy skills necessary to be successful entering employment or a training program

• An Internet-based Certificate Registry, which allows individuals to view WorkKeys scores, apply online for certificates, and show employers that they hold certificates

• A Talent Bank in which individuals who qualify for a certificate can post their credentials and search job postings in a national database

• A Job Bank in which employers who use the National Career Readiness Certificate can post job opportunities and search for qualified candidates

The National Career Readiness Certificate tells employers which workers have the essential core employability skills that are critical for success in their businesses. Individuals with higher WorkKeys scores are prepared for a greater range of jobs or training programs. When businesses require or recommend the certificate, they improve their chances of hiring a more highly skilled workforce and improving their productivity.

About the Reading for Information Assessment

The remainder of this guide provides technical documentation for Reading for Information, one of the core WorkKeys foundational skills assessments. Reading for Information is a multiple-choice test designed to measure skills in reading work-related documents. All of the reading selections and items formulated for inclusion on the Reading for Information assessment are based on the construct defined in the next chapter.



Construct Definition

Chapter Abstract

Experts in both reading education and workplace productivity have noted the increasingly sophisticated kinds of reading skills required in today’s global economy. However, comparative research shows that traditional academic instruction in reading may not provide workers with the reading skill sets they most need in the context of the workplace. Widespread concern that American workers do not have the foundational skills in reading needed for success on the job led ACT to develop the WorkKeys® Reading for Information assessment. Designed to measure the reading skills required for entering and succeeding in a wide range of jobs, Reading for Information is based on the premise that workers who possess foundational skills in reading can more readily learn job-specific skills through experience or additional training.

Reading for Information is a multiple-choice test designed to measure skills in reading work-related documents. The documents—which include memos, letters, directions, signs, notices, bulletins, policies, and regulations—are based on material that reflects the actual reading demands of the workplace. All of the reading selections and items formulated for inclusion in the Reading for Information assessment are based on a construct defined by three aspects: Reading Skills, Document Types, and Level of Complexity. Reading Skills include the categories of choosing main ideas or details, understanding word meanings, applying instructions, applying information, and applying reasoning. The Document Types used as reading selections are categorized as Instructions, Information, Policies, Contracts, and other Legal Documents. And, like all WorkKeys foundational skills assessments, Reading for Information incorporates an aspect of increasing Complexity with respect to the tasks and skills assessed.

ACT translated this theoretical construct of Reading for Information into a working set of test specifications, which is used to guide the construction of standardized test forms. Guided by these specifications, ACT selects items for the test that present reading problems in the context of a job in which the problems are defined in terms of specific reading skills and document types, both of which increase in complexity by skill level. In addition, ACT uses its World-of-Work Career Clusters to ensure that the test items present reading problems in a variety of workplace situations.

The Need for a Reading for Information Assessment

Experts in reading education and workplace productivity agree that proficiency in reading is essential to successful participation in today’s knowledge-based economy. The National Council of Teachers of English and the International Reading Association note in their current Standards for the English Language Arts that:

Literacy expectations are likely to accelerate in the coming decades. To participate fully in society and the workplace in 2020, citizens will need powerful literacy abilities that until now have been achieved by only a small percentage of the population. At the same time, individuals will need to develop technological competencies undreamed of as recently as ten years ago. One unexpected outcome of the recent explosion in electronic media has been a remarkable increase in the use of written language, suggesting that predictions about the decline of conventional literacy have been misplaced and premature (http://www.ncte.org/about/over/standards).

Such conclusions about the need for better reading skills are reinforced by the results of surveys conducted by government, business, and industry. According to the most recent National Adult Literacy Survey, “Between 1992 and 2003, prose literacy declined for adults with a high school diploma, and…declined for adults with some college or with high levels of education” (National Center for Education Statistics, 2005). The American Management Association’s most recent survey of its corporate membership and client base found that over one-third (34.1 percent) of job applicants administered basic reading and mathematics skill tests “lacked the skills necessary to perform the jobs they sought” (AMA, 2001). And, in a recent national survey of manufacturing companies, 51 percent of the respondents agreed that employees would need more reading, writing, and communication skills over the next three years (National Association of Manufacturers, 2005). Similarly, in a survey of human resource and other senior executives, The Conference Board (2006) found that while 63 percent of the employers rated reading comprehension as “very important” for high school graduates, 38 percent of them rated high school graduates as deficient in this same skill.

Paradoxically, while the expectations for reading skills in school and the workplace are rising, the achievement of U.S. students in reading continues to lag. In its most recent Program for International Student Assessment (PISA), the Organisation for Economic Co-operation and Development (OECD) found that only about one-third of U.S. 15-year-olds are performing at satisfactory reading levels, with nine countries ranking statistically significantly higher than the U.S. in average performance (OECD, 2004).

Such findings echo widespread concern that American workers do not have the reading skills that educators and employers say they need. Such concerns led ACT to develop the WorkKeys Reading for Information assessment. Designed to measure the reading skills required for entering and succeeding in a wide range of jobs, Reading for Information is based on the premise that workers who possess the necessary skills in reading can more readily learn job-specific skills through experience or additional training.

Reading in Context: Classroom Versus Workplace

To help delineate the construct of Reading for Information, ACT reviewed the relevant literature on reading skills. In general, it was noted that reading instruction does not always respond to the disparate needs of individuals as they manage households, go to college, and go to work. A growing body of research has documented the differences between reading as it is taught in the classroom versus how it is applied in the workplace. As highlighted below, this research indicates that successful application of reading skill is situation-specific, with reading behaviors dictated by the reader’s purpose and circumstances.



Classroom Reading

Classroom reading materials differ from workplace documents in their content, structure, and task requirements. Classroom reading generally includes textbooks, narratives (stories and essays), dramatic literature, and poetry. The ACT National Curriculum Survey (2005–2006), for example, found that a high proportion of participating language arts educators rated the understanding of prose fiction as prerequisite to future success. Humanities-based texts were also highly rated.

In addition, teachers usually provide reading materials designed to be appropriate for students at a specified level. Often prepared by professional writers and editors, these materials use predictable methods of organization, including topic sentences and supporting details. Reading selections are typically clear, logical, and even entertaining. As noted by Human Resources and Skills Development Canada (2004), “Much classroom reading is narrative. The reader is expected to start at the beginning and read consecutively to the end. Skimming skills are taught to allow readers to ‘get the gist,’ and scanning may be encouraged to find features such as the table of contents and index. Still, most classroom reading tasks are linear (i.e., we generally proceed line by line).” Using this narrative, linear approach, students read to follow directions, to acquire knowledge, to learn about how literature is structured, and to become familiar with great writers and historic ideas. Such tasks are often assigned to facilitate understanding, and students may approach them with guidance from a teacher or in partnership with their classmates. Thus, learning goals such as comprehension are often achieved through structured materials and cooperative efforts.

Workplace Reading

While electronic recordings can sometimes be substituted for live speech or demonstrations, the written word is still the most consistently available medium in the workplace. Employees who need to learn or review a procedure, verify previously encountered information, or find answers to job-related questions frequently do so by reading. As noted in the Public Broadcasting Service Literacy Link program, “Whether it’s fixing a cranky copy machine or mixing up a batch of lawn fertilizer, good reading ... skills can mean the difference between success and disaster.” Similarly, information about the workplace itself and the employees’ behavior in it are commonly defined in writing. “Employers rate good communication skills among the highest qualities they value in their employees. Every workplace has its own style of communication and sometimes new workers have difficulty mastering the new language of work” (Workplace Essential Skills, 1999).

In contrast to classroom reading selections, workplace reading materials are usually written by individuals more qualified by their content knowledge than their writing skills. While these materials may be intended to convey precise meaning, they are not always easy to understand. Such materials may be used to train employees on safety and work procedures, or to provide information on employee benefits such as insurance policies and retirement plans. Employees read many of these materials in order to make decisions about some immediate course of action. Other materials describe behaviors or circumstances that may be relevant to their jobs in a more general sense or in the future. In both cases, the employees’ comprehension of the text and their compliance with its dictates may be taken for granted.



According to Human Resources and Skills Development Canada (2004), “A great deal of workplace reading is ‘reading to do,’ with the reader taking various actions and assuming risks associated with error. The fact that the reader takes various actions as a result of reading materials changes the dynamics of reading considerably. That is why the person with hands-on experience to support the knowledge gained through reading is often the best equipped to carry out the work.”

Differences between Reading in Classrooms and Reading in Workplaces

Figure 2.1 summarizes the essential differences between classroom reading and workplace reading. The literacy demands encountered in schools are considerably different from those encountered at work. The differences are likely to be in the purposes for reading, the type of reading materials, and the amount of help that readers can expect when they approach reading tasks.

Though classroom reading and workplace reading overlap to some degree, students are more likely to read with guidance, while employees are more likely to read for guidance. In both situations, readers may look for main ideas, details, or the relationships between main ideas and details. However, because classroom reading is carefully structured, students and employees may not need the same skills to identify the most important aspect of the text.

In addition, what is most important to employees may change as conditions change. A policy for quality control, for example, can be comprehended and implemented differently under different conditions such as season of the year or staffing capacity. An employee who does not comprehend such differences may make a mistake that could result in accidental injury or financial loss. Students who make similar mistakes may earn a lower grade or may be expected to do additional schoolwork, but they are not likely to put people or businesses at risk.



Figure 2.1
Reading in Classrooms Versus Reading in the Workplace

Typical Materials
What Schools Teach: Literature, textbooks on different subjects, worksheets, materials aimed at the reader’s grade level
What the Workplace Requires: Procedural and informational documents, materials that may be poorly written, a wide range of readability levels

Authors
What Schools Teach: Textbook and literary authors
What the Workplace Requires: Technical writers, content experts, lawyers, supervisors, secretaries, human resources personnel

Conditions
What Schools Teach: Deadlines and distractions controlled by the teacher
What the Workplace Requires: Deadlines and distractions subject to real-time circumstances

Logic
What Schools Teach: Theoretical, academic, emphasis on symbolic meaning
What the Workplace Requires: Problem-solving, pragmatic, goal-oriented, emphasis on literal meaning

Process
What Schools Teach: Reading guided by an instructor using techniques such as questioning, topic sentences, headings
What the Workplace Requires: Independent reading where the reader must either target specific information with a precise meaning or develop a general understanding of overall themes and rationales

Purpose
What Schools Teach:
• To learn reading and language skills
• To gain academic and cultural knowledge
• To learn to read for fun or new information
What the Workplace Requires: To gain job-related information in such areas as:
• Work procedures, protocols, and safety rules
• Instructions for specific tasks
• Rules and policies regulating on-the-job behavior
• Human resource opportunities
• Changing work conditions

Actions Taken
What Schools Teach: Talk or write about learned information or about one’s own ideas
What the Workplace Requires: Decide what to do next or when to perform a step in a series of steps; determine what criteria to apply in a certain situation

Responsibility and Consequences
What Schools Teach: Complete assignments and earn a grade, honors, or scholarships
What the Workplace Requires: Make on-the-job decisions, while mitigating risks



The Construct of Reading for Information

Understanding the workplace requirements discussed above, ACT designed Reading for Information to assess a wide range of skills related to reading and understanding workplace information, instructions, procedures, and policies. The action-oriented texts found in many workplaces differ from the explanatory and narrative texts on which most academic reading programs are based. In addition, unlike academic texts, which are usually organized to ease understanding, workplace communication is not necessarily well written or easy to read. The reading selections in Reading for Information are based on actual workplace materials representing a variety of occupations and workplace situations. These selections and their associated test items are designed to reflect the Reading for Information construct, as defined below.

Construct Aspects

Figure 2.2 shows the three aspects essential to the construct of Reading for Information: Reading Skills, Document Types, and Level of Complexity. These aspects can be mutually exclusive or they can interact. Workplace Reading Skills vary in their Level of Complexity. Moreover, the Reading Skills applied depend on the Document Types employees are asked to read.

Figure 2.2
Three Interacting Dimensions of the Reading for Information Construct

[Figure: a three-dimensional diagram in which one axis lists the Document Types (Instructions, Information, Policies, Contracts, Legal Documents), a second axis lists the Reading Skills (choosing main ideas or details, understanding word meanings, applying instructions, applying information, applying reasoning), and the third axis lists the Levels of Complexity (Level 3 through Level 7).]



Aspect 1: Reading Skills

The skills tested by Reading for Information can be loosely grouped into the following five categories, or strands, that vary in complexity as follows.

Choosing main ideas or details. Looking for main ideas and details is a common reading task. In academic reading programs, the reader’s focus is often directed toward finding the main idea in a topic sentence at the beginning of a paragraph or in a concluding sentence. Written communication found in work situations is not necessarily produced to correspond to this reading strategy. The critical information may not be at the beginning or end of a paragraph. Consequently, employees need to be able to use clues other than placement to identify the main ideas and important details.

Understanding word meanings. Work-related reading skills include the need to know simple words, to identify definitions clearly stated in the reading, and to use context to determine specific word meanings. However, reading workplace materials often requires the use of context to determine the meanings of more difficult and specialized words. Thus, the use of jargon, technical terminology, acronyms, and words with multiple meanings increases as the contexts become more complex.

Applying instructions. Conveying instructions is the principal purpose of a great deal of workplace communication. Skill in applying instructions involves sequencing and generalizing. At the less-complex levels, employees apply instructions in a situation that is the same as that described in the reading materials. Tasks become more complex when employees must apply instructions in new situations that are less similar to the one described, when the instructions contain more steps, and when conditional statements are added.

Applying information. For effective performance of a task, it is often necessary to apply information given in workplace communications to situations that are similar to the one described and to situations that are not the same. As in the previous category, employees completing less complex tasks apply information to clearly described situations. Tasks become more complex when employees must apply information to similar situations and then to new situations.

Applying reasoning. Employees must often use reading materials to predict the consequences of certain actions, to summarize information, and to understand the reasoning—which may or may not be stated—behind a policy or procedure. Such abstract skills are generally found at the most complex Reading for Information skill levels.

Aspect 2: Document Types

ACT identified five types of workplace documents to use as reading selections, or stimuli, on Reading for Information: Instructions, Information, Policies, Contracts, and other Legal Documents. As in the workplace, procedural text such as instruction manuals and information memos forms the bulk of the stimuli. Policy selections might outline rules and guidelines regulating employee behavior in the workplace. A small number of the reading selections are based on legal documents and contracts, which are typically written at a higher level of complexity.



Aspect 3: Level of Complexity

In developing the Reading for Information assessment, ACT received input from employers and educators indicating that some workplace tasks require readers to look for basic pieces of information and to find the next step in a set of instructions, while other tasks require readers to analyze, summarize, or generalize in order to draw meaning from a text. Therefore, the skill levels were developed to measure skills in this range. In progressing from Level 3 to Level 7, the information, sentence structure, and vocabulary become more complex. Reading materials at Level 3 are short and direct. The material becomes longer, denser, and more difficult to use as readers move toward Level 7. Written texts range from clearly stated memos and instructions at Level 3 to complex legal regulations and policies at Level 7. The tasks also become more complex: At Level 3, readers begin by finding very obvious details and following short instructions. At the higher levels, readers need to make inferences, draw conclusions, and determine what information is not useful or relevant. Although readability is one consideration in choosing Reading for Information materials and assigning them to levels (see Appendix 3 for a readability study), it is only one small piece of the construct. Figure 2.3 summarizes the factors in the reading selections and items that influence their complexity at different levels.

Figure 2.3
Factors Affecting Complexity of Reading for Information

• Reading materials include more complicated information, including multi-step procedures and legal regulations

• Similar pieces of information are differentiated by many minor details

• Longer sentences are typical

• More difficult words, including jargon, technical terms, acronyms, or words that have multiple meanings, are used

• Information needed to answer the items may not be clearly stated

• Readers must think about changing conditions or multiple considerations in choosing a course of action

• Readers apply information to new situations or draw conclusions from the information given
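Because readability is mentioned above as one (small) consideration in leveling materials, it may help to see how a conventional readability index is computed. The manual does not state which formula the Appendix 3 study used, so the sketch below, including its rough syllable-counting heuristic, the function names, and the sample memo text, is purely an illustrative assumption and not ACT's procedure.

```python
import re

def count_syllables(word: str) -> int:
    """Rough heuristic: count groups of consecutive vowels, with a common
    adjustment for a trailing silent 'e'. Real readability tools use
    dictionaries or stronger heuristics."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and count > 1:
        count -= 1
    return max(count, 1)

def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid grade level:
    0.39*(words/sentences) + 11.8*(syllables/words) - 15.59."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / max(len(sentences), 1)
            + 11.8 * syllables / max(len(words), 1) - 15.59)

memo = ("Report to the loading dock at eight. "
        "Sign the safety log before you start your shift.")
print(round(flesch_kincaid_grade(memo), 1))
```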

Representing the World of Work in Reading for Information

As more fully described in the next chapter on test development, ACT translated its theoretical construct of Reading for Information into a working set of test specifications, which is used to guide the construction of standardized test forms. As discussed, the items on the test present reading problems in the context of a job, where these problems are defined in terms of specific reading skills and document types, both of which increase in complexity by skill level.

In addition, ACT uses its World-of-Work Career Clusters to ensure that a broad scope of careers is sampled in the test items. This sampling approach eliminates the possibility of over-representing familiar topics or jobs that may give some test takers an unfair advantage in answering the items. The ACT World-of-Work Map organizes occupations into six career clusters and twenty-six career areas, or groups of similar jobs (Prediger, 1976), as listed in Figure 2.4. Because the map covers all U.S. jobs, ACT uses it as a guide to represent a fair balance of careers in the Reading for Information test forms. The complete map and explanations can be found at www.act.org/wwm.

Figure 2.4
World-of-Work Career Clusters and Areas

Career Cluster: Career Areas within Cluster

Administration and Sales: Employment-Related Services; Marketing and Sales; Management; Regulation and Protection

Business Operations: Communications and Records; Financial Transactions; Distribution and Dispatching

Technical: Transport Operation and Related; Agriculture, Forestry, and Related; Computer and Information Specialties; Mechanical and Electrical Specialties; Crafts and Related; Manufacturing and Processing; Construction and Maintenance

Science and Technology: Engineering and Technologies; Natural Science and Technologies; Medical Technologies; Medical Diagnosis and Treatment; Social Science

Arts: Applied Arts (Visual); Creative and Performing Arts; Applied Arts (Written and Spoken)

Social Services: Health Care; Education; Community Services; Personal Services



Assessment Development

Chapter Abstract

This chapter first delineates the general process ACT uses to develop the foundational skills assessments included in the WorkKeys system. It then documents how this process was applied to the development of Reading for Information. For any WorkKeys foundational skills test, the general process involves five phases: (1) development of the skill definition and item specifications, (2) development and administration of a prototype assessment, (3) development and administration of pretest items, (4) development and administration of initial operational forms, and (5) on-going psychometric analysis, evaluation, and development of new items and test forms. To develop Reading for Information, ACT implemented these five phases, as follows.

In Phase 1, to define the general domain of skill, ACT reviewed the relevant literature, consulted with advisory panels, and studied task examples representing a broad range of jobs. The resulting test specifications defined Reading for Information as a measure of the skills people use when they read and use written text in order to do a job. Like the other foundational skills assessments, Reading for Information is a criterion-referenced test in which the assessed skills are defined in terms of ordered levels of proficiency, where each level builds upon the skill components assessed at the lower levels. The finished skill definition for Reading for Information includes five levels of skill ranging from 3 to 7.

In Phase 2, ACT used the test specifications formulated in Phase 1 to develop a prototype of the Reading for Information assessment. The prototype consisted of 75 multiple-choice items organized into the five levels, each containing 15 items. To determine the properties of the prototype, ACT administered it to small samples of students and employees in three Midwestern states. ACT then considered the statistical results and participant feedback from this prototype administration in making adjustments to the Reading for Information skill definition and skill level descriptions.

In Phase 3, ACT trained content experts to develop a pool of Reading for Information items designed to meet the WorkKeys standards and specifications for item content and format. From this pool, ACT selected items for pretesting. Pretesting permitted psychometric evaluation of the individual test items. Following this evaluation, ACT withdrew any problematic items from the pool and conducted content and fairness reviews on all of the remaining items with acceptable pretest statistics.

In Phase 4, ACT used the pretest items that met the content, fairness, and statistical specifications to assemble the first three operational forms of Reading for Information. Each form consisted of 30 operational items with 6 items at each level, which were selected to meet the content and statistical specifications of the test. To ensure each form’s readiness for operational use, ACT used IRT methods to examine its statistical properties and asked advisory panels to conduct final reviews for content and fairness. The forms were then published.



In Phase 5, after collecting sufficient operational score data, ACT conducted scaling to place scores from the different test forms on a common scale and equated the forms to adjust statistically for differences in their difficulty. Given sufficient data, ACT has also evaluated operational items for differential item functioning (DIF). ACT continues on a regular basis to develop and pretest new items, which are embedded in the new operational forms of Reading for Information published periodically.

Overview of the WorkKeys Test Development Process

When a new foundational skills assessment is developed, ACT typically consults with numerous professionals in labor, business, and education, and reviews the literature regarding the skill to be assessed. The goal of this research is to find out how the skill is used in jobs so that the assessment includes realistic tasks or problems based on the actual demands of the workplace.

In developing the initial forms of a WorkKeys assessment, ACT staff takes steps to ensure the assessment’s suitability of format, quality of content, fairness to diverse groups, and psychometric soundness. The general test development process involves five interrelated phases:

1) Development of the skill definition and test specifications

2) Development and administration of a prototype assessment

3) Development and administration of pretest items

4) Construction of the initial operational forms

5) On-going psychometric analysis, evaluation, and development of new items and test forms

At critical points in the development process, ACT consults with advisory panels made up of employers and educators to ensure appropriateness of the test content. In addition, item writers, content reviewers, and fairness reviewers from across the country participate. These individuals hold a variety of jobs representing such diverse occupational fields as healthcare, engineering, marketing and sales, agriculture, social services, and the performing arts. During the development of WorkKeys assessments, ACT consistently seeks a broad representation of skills, workplaces, geographic regions, and demographic groups.



Phase 1: Development of Skill Definition and Test Specifications

General Approach

To delimit the skill area for a WorkKeys assessment, ACT consults with advisory panels of experts in business and education. ACT also researches task examples representing a broad range of jobs and uses this information to define the skill area, its associated strands of subskills, and the skill levels that make up the skill scale. To begin, ACT identifies the general domain of the skill, defining it to be as homogeneous as possible while still appropriate for applications across jobs. Next, test specifications are drafted that define the skill and its strands as applied in the workplace. This draft is reviewed by the advisory panel members, whose feedback and recommendations help ACT refine the test specifications.

Foundational skills assessments are criterion-referenced: That is, an individual’s performance on the test is compared to an established set of criteria (Crocker and Algina, 1986), which, in the WorkKeys case, is the range of skill defined as the skill area. For purposes of employee selection, the skill standard is usually defined as the level of proficiency required for performing a particular job in a particular location effectively. Therefore, an essential part of defining a WorkKeys skill and its corresponding test specifications is describing the skill in terms of levels.

Development of WorkKeys Skill Levels

For each assessment, ACT uses a list of typical workplace tasks to develop a continuum of skills and to define a hierarchy of skill levels, whose top and bottom are determined as follows:

• The lowest level comprises the least complex skills for which a business would be willing to pay for assessment. That is, if the job requires a lower level of the skill, it would probably not be cost-effective for the business to formally assess it. This level is identified as Level 3 to allow for the possibility that lower levels might be defined in the future to meet some business need.

• The highest level is set at the point at which a more complex level of skill would probably require specialized training. The level is identified as Level 7, allowing for five levels in all that can be reliably distinguished from one another.

In other words, ACT delineated the WorkKeys skill levels to be far enough apart to be psychometrically distinguishable yet close enough to provide useful information. To meet these criteria, ACT staff views a WorkKeys assessment as a series of item pools, one pool for each level. The levels are designed such that:

• Items within each pool are relatively homogeneous with respect to the skill components assessed and to the degree of complexity.

• Each level subsumes and builds on the skill components assessed at lower levels.

The levels constitute the skill scale that is used for scoring the assessment and for conducting WorkKeys-related job analyses. Therefore, it is important to define the skill components in terms that can be related to both specific job tasks and to test items. To this end, ACT conducts a qualitative analysis of workplace situations requiring various levels of skill in numerous occupations. Variables that contribute to the complexity of the skill area are identified and examined with regard to the purpose of the assessment. Possible interactions or other joint effects of these variables are also considered. This extensive analysis leads to the assignment of a set of skill components to each level.

Development of WorkKeys Item Specifications

For each assessment, ACT also determines the types of tasks that best fit the defined skill levels—for example, discrete multiple-choice items or essay prompts. The items are developed to be similar to problems faced by employees on the job. As discussed in Chapter 2, examples representing a variety of occupations drawn from the World-of-Work clusters help to establish the job relevance of the assessment. The challenge is to incorporate a variety of tasks while maintaining the homogeneity of the skill and avoiding job-specific knowledge. For all the WorkKeys tests that do not specifically assess reading skills, ACT staff uses clear, uncomplicated language to keep the reading skill required for taking the tests as low as possible. This practice ensures that the skill of interest is assessed, not reading ability.

To test and further refine the specifications, ACT drafts an initial set of test items, which undergo reviews for realism, accuracy, and fairness. Participants in these reviews include ACT test development staff and external business and education experts from diverse cultural and ethnic backgrounds. Feedback from these reviews is then considered in the course of further item revisions and editing to prepare the items for a prototype of the assessment.

Phase 2: Development and Administration of WorkKeys Prototype

In Phase 2, ACT creates a prototype form of the assessment based on the test and item specifications developed in Phase 1. To determine the properties of the prototype test, ACT administers it to small samples of students and employees in selected locations. Surveys about the testing experience are also administered to participating students, teachers, employees, supervisors, and managers. ACT then analyzes the test results to see if the specifications are working properly. More importantly, the results are used to qualitatively evaluate the functioning of the skill levels. These statistical analyses and the qualitative feedback from the test takers and test administrators help ACT make any necessary adjustments to the skill and skill level descriptions.

Phase 3: Development and Administration of WorkKeys Pretests

Pretest Item Development

After the prototype testing, ACT selects and trains item writers who have work experience in a wide range of entry-level jobs in a variety of workplace situations. These writers develop a larger pool of items for pretesting. ACT edits the items to meet the content standards for the skills and tasks. Since the assessment is designed to be workplace relevant, depictions of the workplaces in the tasks have to be realistic. Stimulus materials also need to be appropriate to the workplace described, and the questions asked have to reflect types that might actually be encountered in that workplace.

Each item is developed to have integrity of its own and to represent genuine workplace needs and issues. Accuracy in the content of the tasks is critically important. A person knowledgeable about any workplace depicted in the assessment should not be able to identify any of the tasks, circumstances, procedures, or keyed responses (correct answers) as questionable, inappropriate, or otherwise inaccurate. It is especially important to avoid items where an individual in the situation depicted would have no reason to care about the answer to the question, either because the answer is of no use to the person or because the person already knows the answer.

Pretest Administration and Data Analysis

Pretesting items allows ACT to evaluate their psychometric properties such as reliability and scalability. Standard item statistics such as item difficulty and item-total correlations also produce information about item performance. ACT staff and external content reviewers examine items that do not perform as expected. The statistical fit of items to levels is reviewed to ensure that each item meets the statistical requirements of its level. For example, a Level 3 item may need to be reclassified at a higher level if pretesting reveals that many examinees scoring at Level 4 or higher did not answer it correctly. At this step, any problematic items are withdrawn from the pool of potential operational items.
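The item statistics named here (item difficulty, item-total correlations) and the biserial and coefficient alpha values reported later in Table 3.3 are standard classical quantities that can be computed directly from a matrix of scored 0/1 responses. The sketch below is a generic illustration of those formulas, not ACT's production code; the function names and the simulated data are assumptions made for the example.

```python
import numpy as np
from scipy.stats import norm

def item_statistics(responses: np.ndarray):
    """responses: (n_examinees, n_items) matrix of 0/1 scored answers.
    Returns item p-values, biserial item-total correlations, and
    coefficient alpha for the whole item set."""
    n_people, n_items = responses.shape
    p = responses.mean(axis=0)                 # proportion correct (item difficulty)
    total = responses.sum(axis=1)

    # Point-biserial: correlation of each item with the total score,
    # excluding the item itself from that total
    r_pb = np.array([
        np.corrcoef(responses[:, j], total - responses[:, j])[0, 1]
        for j in range(n_items)
    ])
    # Biserial correlation via the usual conversion from the point-biserial
    y = norm.pdf(norm.ppf(p))                  # normal ordinate at the p-th quantile
    r_bis = r_pb * np.sqrt(p * (1 - p)) / y

    # Coefficient alpha: k/(k-1) * (1 - sum of item variances / total-score variance)
    item_var = responses.var(axis=0, ddof=1)
    alpha = n_items / (n_items - 1) * (1 - item_var.sum() / total.var(ddof=1))
    return p, r_bis, alpha

# Illustration with simulated 0/1 responses (500 examinees, 30 items)
rng = np.random.default_rng(0)
ability = rng.normal(size=500)
difficulty = np.linspace(-2, 2, 30)
prob = 1 / (1 + np.exp(-(ability[:, None] - difficulty[None, :])))
data = (rng.random((500, 30)) < prob).astype(int)

p, r, a = item_statistics(data)
print(p.round(2), r.round(2), round(a, 3))
```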

Content and Fairness Reviews of Pretested Items

All pretested items are reviewed for realism, accuracy, content, and fairness. Participants in these reviews include ACT test development staff and external business and education experts from diverse cultural and ethnic backgrounds. Content-qualified reviewers examine each item to ensure its soundness from a workplace perspective. Fairness reviewers representing gender, cultural, and ethnic/racial subgroups work to ensure that no item is unfair to any minority group members. ACT gives the reviewers written guidelines and requires them to write an evaluation of each item. Figure 3.1 shows the content review checklist, and Figure 3.2 shows the fairness review checklist.

Figure 3.1
Content Review Checklist

1. The wording is straightforward and easily read and understood.

2. The question being asked is clear.

3. Items are independent of one another.

4. The factual information is accurate.

5. The consequences of actions or procedures are logical.

6. The item is relevant and appropriate to the workplace described.

7. The questions or situations are similar to those in the actual workplace setting.

8. The workplace details contained in the item are realistic.

9. The amount of detail provided is appropriate to create a relevant context for the item.

10. The keyed answer is the best one for that item.

11. No responses other than the key are correct or equally appropriate for the specific item.

12. The alternative choices are plausible in the context of the question.

13. The alternative choices are attractive to uninformed test takers.

14. The test instructions are clear and precise.

15. The criteria for scoring or identifying the correct or best answer are clear.



Figure 3.2
Fairness Review Checklist

1. The language used is not likely to be considered offensive to any subgroup.

2. Test-taker subgroups are not given an unfair advantage or disadvantage.

3. The concepts or words likely to be unfamiliar to any subgroup are used only when inherent to the workplace document and correspondingly to the related questions.

4. The item topic and context are not likely to upset test takers or distract them from the examination task.

5. The portrayal of population subgroups is accurate and fair without reference to stereotypes.

6. The presentations of group differences, if any, are relevant to the context of the item.

7. The portrayal of groups includes a full range of roles, occupations, and activities.

8. Culturally specific language is avoided.

9. The vocabulary is representative of that used in the workplace setting described.

10. Gender-neutral terms and culture-neutral situations are used whenever possible and appropriate.

11. Background conceptual information and job-specific information that is not part of the question are equally accessible to all test-taker groups.

12. All test-taker groups have equal access to test questions at each level.

ACT reviews the evaluations and responds to any concerns the reviewers raise. Any item rejected by the reviewers is removed from the operational pool. Those that pass reviews and meet specifications are left intact to preserve the accuracy of the pretest item data. Such items form the pool from which subsequent operational forms are drawn.

Phase 4: Construction of Operational Forms

Operational WorkKeys test forms are constructed such that in each form occupations are evenly distributed across levels, the items depict approximately equal distribution of power between men and women, and names used in the items sound like those used by a variety of racial and ethnic groups. The items included in the operational forms have to (a) meet psychometric quality standards individually and (b) as a set, satisfy the statistical criteria established for each level. Finally, standard item and test statistics are reviewed to ensure, to the extent possible, that the respective levels are represented comparably across the forms.

ACT also uses item response theory (IRT) methods to calculate item parameter estimates and determine the expected test parameters of the operational forms. For each operational form of a WorkKeys assessment, the estimated mean item difficulty (p) values and IRT b parameters should indicate that the items increase in difficulty at each new level. The data should further suggest that the difficulty of the items selected for each level is comparable across forms. ACT examines the statistical properties of the operational forms, and experts on the advisory panels conduct final reviews for content and fairness. Once the forms satisfy this round of scrutiny, ACT considers them ready for operational use.
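The a, b, and c values reported for each level in Table 3.3 correspond, in standard IRT notation, to discrimination, difficulty, and lower-asymptote (pseudo-guessing) parameters. As a point of reference only, the sketch below evaluates a three-parameter logistic (3PL) item response function; treating the table's columns as 3PL parameters, the scaling constant D = 1.7, and the example parameter values are conventional illustrative assumptions rather than details stated in this manual.

```python
import numpy as np

def p_correct_3pl(theta, a, b, c, D=1.7):
    """Three-parameter logistic model: probability that an examinee with
    ability theta answers an item correctly, where a = discrimination,
    b = difficulty, and c = lower asymptote (pseudo-guessing)."""
    return c + (1.0 - c) / (1.0 + np.exp(-D * a * (theta - b)))

# Example: an easier item versus a harder item, with parameter values
# roughly in line with the Level 4 and Level 7 means shown in Table 3.3
thetas = np.linspace(-3, 3, 7)
easy = p_correct_3pl(thetas, a=1.16, b=-1.37, c=0.17)
hard = p_correct_3pl(thetas, a=1.04, b=1.75, c=0.18)
for t, pe, ph in zip(thetas, easy, hard):
    print(f"theta={t:+.1f}  P(easy item)={pe:.2f}  P(hard item)={ph:.2f}")
```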



Phase 5: Ongoing Psychometric Evaluation and Development

Development of New Test Forms

After the first WorkKeys test forms have been administered operationally, they are scaled and equated—processes more fully described in Chapter 4. At the same time, ACT continues to develop and pretest new items. These new pretest items are embedded in operational forms, but performance on them does not affect the test taker’s final score. As in Phase 2, the data from the administration of the items are used to evaluate the psychometric properties of the items. And, as before, reviews by content and fairness experts ensure the fairness and content validity of the items. Those pretested items that meet statistical and judgmental requirements become viable candidates for inclusion in new operational forms.

Differential Item Functioning (DIF)

Once items on an operational form have been administered to a sufficient number of test takers, ACT uses DIF analyses to evaluate and flag any non-pretest items that could be unfair to any group of test takers. Items found to be fair in earlier qualitative reviews can still function differently for specific population subgroups. DIF shows a statistical difference between the probability that a specific population group (the focal group) will get the item right and the probability that a comparison population group (the base group) will get the item right, assuming that both groups have the same level of expertise with respect to the content being tested. In many instances, one group on average will have a higher probability of correctly answering an item, a difference that may be explained by differing levels of expertise between the groups. DIF procedures take these background group differences into account and indicate whether an item may unfairly favor one group over another. ACT uses the standardized difference in proportion correct (STD) and the Mantel-Haenszel common-odds-ratio (MH) statistics to detect the existence of DIF in items on WorkKeys test forms. Items found to exceed critical values for DIF are reviewed singly and overall. The results of this review may lead to the removal of one or more items from a form.
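For readers who want a concrete picture of the two statistics named above, the sketch below computes a standardized difference in proportion correct and a Mantel-Haenszel common odds ratio from focal- and base-group responses stratified by a matching score. It is a generic textbook-style implementation, not ACT's operational procedure; stratifying on the raw total score, the variable names, and the flagging remark at the end are all assumptions.

```python
import numpy as np
from collections import defaultdict

def dif_statistics(item, total, group):
    """item: 0/1 scores on the studied item; total: matching (stratifying)
    scores; group: 'focal' or 'base' for each examinee.
    Returns the standardized p-difference (STD) and the
    Mantel-Haenszel common odds ratio."""
    strata = defaultdict(lambda: {"focal": [], "base": []})
    for score, t, g in zip(item, total, group):
        strata[t][g].append(score)

    std_num = std_den = mh_num = mh_den = 0.0
    for cell in strata.values():
        f, r = np.array(cell["focal"]), np.array(cell["base"])
        if len(f) == 0 or len(r) == 0:
            continue
        n_k = len(f) + len(r)
        # STD: focal-group-weighted difference in proportion correct
        std_num += len(f) * (f.mean() - r.mean())
        std_den += len(f)
        # MH: 2x2 counts (right/wrong by group) within the stratum
        a, b = r.sum(), len(r) - r.sum()      # base group right / wrong
        c, d = f.sum(), len(f) - f.sum()      # focal group right / wrong
        mh_num += a * d / n_k
        mh_den += b * c / n_k

    return std_num / std_den, mh_num / mh_den

# Illustrative use of the flags (thresholds here are examples only):
# review items whose |STD| is large or whose MH odds ratio is far from 1.
```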

Development of the Reading for Information Assessment

Development of the Reading for Information assessment followed the five general phases described in the first part of this chapter. In Phase 1, ACT consulted with advisory panels from business and education and researched tasks from a broad range of jobs that required the use of reading skills. ACT used the results of this research to define the general skill area of Reading for Information and its associated strands, which include choosing main ideas or details, understanding word meanings, applying instructions, applying information, and applying reasoning.



ACT also conducted an extensive analysis of typical workplace tasks involving Reading for Information to define the hierarchy of skill levels, which range from Level 3 to Level 7. ACT determined that reading selections (stimuli) and multiple-choice items worked best for assessing the types of reading tasks faced by employees on the job. During Phase 1, ACT drafted an initial set of stimuli and test items, which were reviewed for realism, accuracy, and fairness. Content and fairness reviewers provided feedback which was factored into the development of the prototype assessment planned for Phase 2.

In Phase 2, ACT created a prototype of Reading for Information that consisted of 75 multiple-choice items organized into five levels of 15 items each. The prototype, along with a survey of the testing experience, was administered to small samples of students and employees in three Midwestern states. ACT then analyzed the test and survey results and used the findings to make adjustments to the Reading for Information skill definition and skill level descriptions, as shown in Table 3.1.


Table 3.1
Skill Definition for Reading for Information

Level 3
Characteristics of Stimuli and Items:
• Reading materials include basic company policies, procedures, and announcements
• Reading materials are short and simple, with no extra information
• Reading materials tell readers what they should do
• All needed information is stated clearly and directly
• Items focus on the main points of the passages
• Wording of the questions and answers is similar or identical to the wording used in the reading materials
Skills:
• Identify main ideas and clearly stated details
• Choose the correct meaning of a word that is clearly defined in the reading
• Choose the correct meaning of common, everyday and workplace words
• Choose when to perform each step in a short series of steps
• Apply instructions to a situation that is the same as the one in the reading materials

Level 4
Characteristics of Stimuli and Items:
• Reading materials include company policies, procedures, and notices
• Reading materials are straightforward, but have longer sentences and contain a number of details
• Reading materials use common words, but do have some harder words, too
• Reading materials describe procedures that include several steps
• When following the procedures, individuals must think about changing conditions that affect what they should do
• Questions and answers are often paraphrased from the passage
Skills:
• Identify important details that may not be clearly stated
• Use the reading material to figure out the meaning of words that are not defined
• Apply instructions with several steps to a situation that is the same as the situation in the reading materials
• Choose what to do when changing conditions call for a different action (follow directions that include “if-then” statements)

Level 5
Characteristics of Stimuli and Items:
• Policies, procedures, and announcements include all of the information needed to finish a task
• Information is stated clearly and directly, but the materials have many details
• Materials also include jargon, technical terms, acronyms, or words that have several meanings
• Application of information given in the passage to a situation that is not specifically described in the passage
• There are several considerations to be taken into account in order to choose the correct actions
Skills:
• Figure out the correct meaning of a word based on how the word is used
• Identify the correct meaning of an acronym that is defined in the document
• Identify the paraphrased definition of a technical term or jargon that is defined in the document
• Apply technical terms and jargon and relate them to stated situations
• Apply straightforward instructions to a new situation that is similar to the one described in the material
• Apply complex instructions that include conditionals to situations described in the materials

Level 6
Characteristics of Stimuli and Items:
• Reading materials include elaborate procedures, complicated information, and legal regulations found in all kinds of workplace documents
• Complicated sentences with difficult words, jargon, and technical terms
• Most of the information needed to answer the items is not clearly stated
Skills:
• Identify implied details
• Use technical terms and jargon in new situations
• Figure out the less common meaning of a word based on the context
• Apply complicated instructions to new situations
• Figure out the principles behind policies, rules, and procedures
• Apply general principles from the materials to similar and new situations
• Explain the rationale behind a procedure, policy, or communication

Level 7
Characteristics of Stimuli and Items:
• Very complex reading materials
• Information includes a lot of details
• Complicated concepts
• Difficult vocabulary
• Unusual jargon and technical terms are used, but not defined
• Writing often lacks clarity and direction
• Readers must draw conclusions from some parts of the reading and apply them to other parts
Skills:
• Figure out the definitions of difficult, uncommon words based on how they are used
• Figure out the meaning of jargon or technical terms based on how they are used
• Figure out the general principles behind the policies and apply them to situations that are quite different from any described in the materials


In Phase 3, ACT recruited qualified item writers from a wide variety of occupations to develop a pool of pretest items. Table 3.2 provides demographic statistics for these writers.

Table 3.2
Demographic Statistics of Initial Item Writers (N = 20)

Gender: Male, 7 (35 percent); Female, 13 (65 percent)
Race/Ethnicity: White, 19 (95 percent); African American, 1 (5 percent)

The writers wrote 570 test items, from which ACT selected the items that met WorkKeys standards for skill, task, and format. The items were presented in six pretest forms, each form containing five levels of difficulty of 15 items each. Pretesting allowed ACT to evaluate the performance of the items and identify the pool of acceptable items from which operational forms could be drawn. This evaluation included external reviews by 13 content reviewers and 8 fairness reviewers.

In Phase 4, ACT initially constructed three operational forms of Reading for Information, selecting from the pool of pretested items that met the content, fairness, and statistical specifications. Each test form contained 30 scored items—six items at each level—fit to a target item distribution defined by reading skill, level of complexity, and document type. Figure 3.3a shows the approximate distribution of items for each reading skill by WorkKeys skill level. For example, approximately nine items on the test address the skill of identifying the main idea or details, both obvious and implied, in a reading selection.

Items are also distributed across five skill levels, where Level 3 is the least complex and Level 7, the most complex. Reading selections that are short and direct at Level 3 become longer, denser, and more difficult to use as readers move toward Level 7. Written texts range from clearly stated memos and instructions at Level 3 to complex legal regulations and policies at Level 7. The tasks also become more complex as readers move from Level 3 to Level 7. At Level 3, readers begin by finding very obvious details and following short instructions. At the more complex levels, tasks can involve more application and interpretation, as described in Table 3.1.



Figure 3.3a
Target Distribution of Skills on Reading for Information Test Forms (30 Scored Items Total)

Skill (approximate number of items per level, Levels 3 / 4 / 5 / 6 / 7):

• Identifying the main idea or details, both obvious and implied: 3 / 2 / 2 / 1 / 1

• Understanding word and acronym meanings using context, ranging from explicit to implied definitions: 1 / 1 / 1 / 1 / 1

• Applying instructions by sequencing steps, knowing when to use conditions, and cause and effect: 2 / 2 / 1 / 1 / 0

• Applying information to described and new situations: 0 / 1 / 2 / 1 / 1

• Analyzing and synthesizing information to identify rationale and principles and apply them to new situations: 0 / 0 / 0 / 2 / 3

Figure 3.3b shows the target distribution of document types used as reading selections, which include a range of work-related materials such as memos, letters, signs, notices, bulletins, and contracts. Instructions, for example, comprise the most frequently used type of document on the assessment.

Figure 3.3b
Target Distribution of Document Types on Reading for Information Test Forms (30 Scored Items Total)

Document Type Used for Stimuli (approximate number of items per form):

• Contract: 2

• Policy: 7

• Instructions: 13

• Legal document: 2

• Informational: 6

After selecting items using the target distributions as a guide, ACT carefully reviewed each of the three initial Reading for Information forms to ensure that the items, both individually and collectively, met the test specifications. IRT methods were used to ensure that the difficulty of the items selected for each level was comparable across forms.



Table 3.3 shows statistics for three typical operational forms of Reading for Information. The decreasing p-values across levels are consistent with the increase in difficulty of the test items from Level 3 to Level 7. The biserial r indicates a high degree of discrimination at each level. Moreover, each test form exhibits good internal consistency, as indicated by coefficient alpha values of 0.823 or greater. (For more detail on IRT metrics, please see Chapter 4.)

Table 3.3
Descriptive Statistics of Operational Forms Based on Pretest Data

Form     Level   No. of   Mean      Mean               Mean IRT Values
                 Items    p value   Biserial   Alpha   a        b        c
Form 1   3       6        .934      .783       .651    1.290    -1.871   .136
         4       6        .869      .752       .661    1.156    -1.366   .165
         5       6        .701      .640       .556     .950     -.677   .188
         6       6        .446      .695       .571     .983      .663   .165
         7       6        .298      .691       .361    1.038     1.748   .179
         3–7     30       .650      .712       .823    1.084     -.300   .167
Form 2   3       6        .959      .775       .570    1.261    -2.289   .112
         4       6        .830      .704       .630    1.080    -1.210   .173
         5       6        .676      .716       .676    1.057     -.417   .147
         6       6        .433      .740       .599    1.306      .871   .173
         7       6        .394      .656       .392     .936     1.395   .214
         3–7     30       .659      .718       .840    1.128     -.330   .164
Form 3   3       6        .953      .785       .611    1.283    -2.132   .144
         4       6        .828      .625       .567     .817    -1.407   .120
         5       6        .646      .692       .642    1.012     -.269   .165
         6       6        .491      .732       .624    1.190      .414   .179
         7       6        .300      .680       .282     .955     1.834   .181
         3–7     30       .643      .703       .826    1.052     -.312   .158

Experts on the advisory panels were again asked to perform final reviews for content and fairness. Once the forms satisfied all of the standard quality control checks, ACT made the assessments available for operational use.

In Phase 5, when sufficient score data had been collected, ACT conducted scaling and equating, two processes more fully described in Chapter 4. At the same time, ACT has continued to monitor the psychometric quality of the published test forms and to develop new pretest items. The new items are embedded in operational forms and, as in Phase 2, the data from their administration are used to evaluate their psychometric properties. Any pretested items that meet statistical and judgmental requirements become viable candidates for inclusion in new operational forms.


Scaling and Equating

Chapter Abstract
This chapter documents the methods used to develop meaningful score scales for the Reading for Information assessment. It describes the procedures used to scale and equate the scores of different test forms. Scaling is a process of setting up a rule of correspondence between a test's observed scores and the numbers assigned to them. The Reading for Information assessment assigns scores on two scales: Level Scores and Scale Scores. Level Scores reflect the expectation that test takers have mastery of the level specified in the score and all the levels below it. To establish the Level Score scale, ACT conducted an empirical scaling study of pools of Reading for Information items. Using the methods of item response theory (IRT), ACT sought to determine an assignment of Level Scores to test takers that reliably supported the assumptions that (1) mastery of a level means a test taker is able to correctly answer 80 percent of the items representing the level, and (2) test takers have mastery of all levels up to and including the level specified in the Level Score (but no higher). ACT used an Expected Proportion Correct (EPC) method to define the Level Score scale. This method rests on certain statistical assumptions, notably the fit of the IRT model to the score data. The results of the scaling study showed the fit of the model to be very good.

In addition to Level Scores, ACT developed Scale Scores to give users more detailed information for purposes of program evaluation and outcome measurement. Scale Scores, which are a function of the Number Correct (NC) Score, make finer distinctions among test takers' abilities than Level Scores. ACT used the equal standard error of measurement method to determine the assignment of Scale Scores, which fall on a scale of 26 points ranging from 65 to 90. To take the "guessing effect" into account, ACT used a combination of classical test theory and IRT to truncate the scale's lower end. ACT's psychometric goals in developing this additional scale were to provide an adequate number of score points for the anticipated uses of the scores, while avoiding having more score points than could be supported by the number of items on a test form.

As new forms of Reading for Information are developed, each is constructed to adhere to the same test specifications. To control for the inevitable slight differences in form difficulty, however, scores on different forms are equated so that, when they are reported as either Level Scores or Scale Scores, they have the same meaning. Depending on the circumstances of administration and other factors, ACT uses different data-collection designs and methods for equating test forms. For example, if a common-item nonequivalent group design is used, ACT typically uses observed-score equating methods. ACT also uses IRT to maintain a pool of items calibrated to place their item parameter estimates on a common scale. These estimates can be used to assemble new, pre-equated forms. When possible, ACT conducts studies to evaluate the comparability of pre-equating results with the equating results derived from other methods.


Level Score Scale
Each Reading for Information test item is written to assess a specified level of skill applied in a workplace situation with a specified level of complexity. Five levels designated Levels 3 through 7 were initially defined through expert judgment, as described in Chapters 2 and 3. Pretesting demonstrated that the items met statistical specifications as well. This chapter describes how the scores based on an initial set of Reading for Information forms were related to the same five skill levels through a process called scaling. The equating method used to establish statistical comparability of the forms is described later in this chapter.

The method of assigning Level Scores to test takers was developed to support two basic assumptions about Level Scores:

1. Mastery of a level should mean that a test taker is able to correctly answer 80 percent of the items representing the level.

2. Test takers have mastery of all levels up to and including the level specified in the score, and do not have mastery of higher levels.

Initially determined by content experts, the 80 percent standard was then implemented with respect to pooled domains of items called level pools. For each of the five levels, ACT created a pool of eighteen items composed of six items from each of three operational forms assembled according to the same test specifications. Though they had no items in common, these three forms—identified here as Forms 1, 2, and 3—were designed to be comparable in difficulty based on item statistics from pretest studies and were administered to randomly equivalent groups. ACT then applied an item response theory (IRT) model to these five level pools to derive a Level Score scale based on the 80 percent criterion of mastery.

In WorkKeys job analysis, the skill level required for entry into a specified job is established based on the most complex tasks a newly hired employee would be expected to complete using the skill. This remains true even if the job also involves less-complex tasks corresponding to lower levels of the same skill. The WorkKeys scoring system must therefore reflect a reasonable expectation that test takers have mastery of the level specified in the score and mastery of all the preceding levels (Guttman, 1950). For example, a test taker scoring at Level 5 is expected to have mastered the skills at Levels 5, 4, and 3.


Level Score Scaling Study
The data collection process and the analyses that defined the WorkKeys levels are referred to here as the Level Score scaling study. All three test forms were administered to randomly equivalent groups of high school juniors and seniors by spiraling test forms within classrooms. This means that in each classroom the first person received Form 1, the next person received Form 2, and the next received Form 3. This pattern was repeated so that a third of the test takers received each form.

Table 4.1 shows the summary statistics for Number Correct (NC) scores on the Reading for Information forms used in the scaling study. Sample sizes for the forms ranged from 2,020 to 2,032. The mean NC scores ranged from 20.3 to 21.2, with skewness of approximately –1 and kurtosis greater than 1, except for Form 3. The KR-20 reliability coefficients were 0.77 to 0.80 (Schulz, Kolen, and Nicewander, 1999). At 0.78 to 0.81, the reliability coefficients based on the three-parameter logistic (3PL) IRT model (Kolen, Zeng, and Hanson, 1996) were similar to the KR-20 reliability coefficients; the differences between the coefficients derived by the two methods were small.

Table 4.1
Summary Statistics for Number Correct Scores for Reading for Information Assessment

                        Form 1          Form 2          Form 3
                        (N = 2,032)     (N = 2,020)     (N = 2,024)
Mean                       20.7            21.2            20.3
Standard Deviation          4.4             4.2             4.5
KR-20                      0.79            0.77            0.80
3PL IRT Reliability        0.79            0.78            0.81

Figure 4.1 displays the summary p-values, or difficulties, of the items comprising the level pools. This plot shows that while item difficulties overlapped across levels, average item difficulty increased substantially by level, as shown by decreasing mean item p-values.

Figure 4.1
Item p-Values (p) and Mean Item p-Values by Level of Item
(Diamonds connected by lines = mean item p-values)
[Scatterplot of item p-values (vertical axis, 0.0 to 1.0) by level of item (horizontal axis, Levels 3 through 7), with the mean item p-value for each level connected by a line.]


The 3PL IRT model was fit to the data separately for each test form using the BILOG program (Mislevy and Bock, 1990). Test-taker skill is represented in the 3PL model as a unidimensional, continuous variable, θ (theta). Theta is assumed to be approximately normally distributed in the sample to which the test is administered. Items are represented in the 3PL model by three statistics denoted a, b, and c, where:

• a represents the discriminating power of the item

• b represents the difficulty of the item

• c represents the lower asymptote of the item response function on θ, which is sometimes referred to as the guessing parameter

The item statistics from the BILOG analyses were used with the IRT model to predict expected proportion correct (EPC) scores on the level pools as a function of θ. Figure 4.2 shows the EPC scores on the Reading for Information level pools as a function of θ. The curves in the figure represent level response functions. The lower boundary of each Reading for Information level on the θ scale is shown to be the θ coordinate corresponding to an EPC of 0.8 on the corresponding level pool. For example, the dotted vertical line on the left intersects the Level 3 characteristic curve at the coordinates of 0.8 on the EPC axis and at –1.68 on the θ axis. This means that a test taker with a θ of –1.68 would be expected to answer 80 percent of the items correctly within the Level 3 item pool.
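To make the EPC-based definition of a level boundary concrete, the minimal Python sketch below evaluates a 3PL level response function for a pool of items and solves numerically for the θ at which the expected proportion correct reaches 0.8. The item parameters shown are invented placeholders, not operational Reading for Information estimates.

    import numpy as np
    from scipy.optimize import brentq

    def irf_3pl(theta, a, b, c, D=1.7):
        """3PL item response function: probability of a correct response at theta."""
        return c + (1.0 - c) / (1.0 + np.exp(-D * a * (theta - b)))

    def epc(theta, params):
        """Expected proportion correct on a pool of items at a given theta."""
        return float(np.mean([irf_3pl(theta, a, b, c) for a, b, c in params]))

    # Hypothetical (a, b, c) estimates for an 18-item level pool (illustrative only)
    level_pool = [(1.2, -1.9, 0.14), (1.1, -1.6, 0.15), (0.9, -2.0, 0.18)] * 6

    # Lower boundary of the level: the theta at which EPC on the pool equals 0.80
    boundary_theta = brentq(lambda t: epc(t, level_pool) - 0.80, -4.0, 4.0)
    print(f"Lower boundary on the theta scale: {boundary_theta:.2f}")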

Figure 4.2
Reading for Information Level Characteristic Curves
[Level response functions: expected proportion correct (EPC) on each level pool (Levels 3 through 7) plotted against θ, with mastery and nonmastery regions separated at EPC = 0.8 and dotted vertical lines marking the level boundaries on the θ scale.]


EPC scores represent a test taker's level of skill in two ways that observed scores cannot. First, EPC scores represent performance on a larger set of items than those on any given single form. For Reading for Information, test takers took only 6 items per level, but an EPC score represents expected performance on all 18 items representing the level. EPC scores therefore provide a more consistent basis for assigning Level Scores to test takers who take different forms.

Second, EPC scores represent levels of performance that do not necessarily correspond to any observed score. In particular, an 80 percent correct criterion for mastery does not correspond exactly to a NC score for six items representing a level of Reading for Information on a single form or eighteen items representing the level more generally.

The EPC method of defining levels of skill used in the scaling study rests on the assumptions that the data fit the IRT model and that the samples of test takers taking alternate forms were randomly equivalent. The fit of the data to the model was evaluated by its ability to predict the observed distributions of Level Scores under three different scoring methods, and to account for observed patterns of mastery over levels (Schulz et al., 1997; 1999). The fit of the model was judged to be very good in these respects. To estimate the EPC on level pools, item statistics from form-specific BILOG analyses were treated as belonging to a common scale. This treatment rests on the assumption of randomly equivalent groups.

Table 4.2 shows the boundary thetas, form-specific cutoff thetas, and NC score cutoffs that define the levels of Reading for Information used in the Level Score scaling study.

• The lower boundary of Level 3 on the θ scale is shown to be –1.68, as illustrated in Figure 4.2.

• Similarly, the θ coordinates of the dotted vertical lines representing the lower boundaries of Levels 4, 5, 6, and 7 in Figure 4.2 are shown in the Lower Boundary column of Table 4.2 to be –0.95, 0.11, 1.15, and 2.88, respectively.

Table 4.2
Reading for Information
Boundary Thetas, Form-Specific Cutoff Thetas, and NC Score Cutoffs

                          Form-Specific Cutoff Theta         NC Score Cutoff
Level   Lower Boundary    Form 1    Form 2    Form 3      Form 1   Form 2   Form 3
3           –1.68          –1.57     –1.72     –1.66        14       14       13
4           –0.95          –1.04     –1.06     –1.06        17       17       16
5            0.11           0.24      0.13      0.30        22       22       22
6            1.15           1.25      1.02      1.26        25       25       25
7            2.88           2.86      2.73      2.40        28       28       28


Because the θ distribution in a BILOG analysis is assumed to be a standard normal distribution, θ values have approximately the same meaning as Z-scores (standard normal variates) for the distribution of true Level Scores. This meaning is useful for understanding how difficult it is to achieve a given level of skill. For example, approximately 5 percent of a standard normal distribution is below a Z-score of –1.68. It is therefore reasonable to suppose that approximately 5 percent of the test takers who took the Reading for Information forms in the scaling study had skills below Level 3.

Table 4.2 also shows how cutoff scores were selected. First, the IRT model was used to find a θ for each NC score on each form: each NC score was treated as a true score, and its corresponding θ was computed and rounded to three decimal places (Schulz et al., 1999). The NC score with the θ closest to the boundary θ for a level was then chosen as the cutoff score for that level.
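The cutoff-selection step can be sketched as follows: compute the form's test characteristic curve (the IRT true NC score as a function of θ), invert it numerically to obtain a θ for each NC score, and choose the NC score whose θ lies closest to the boundary θ for the level. The item parameters and the boundary used below are illustrative assumptions, not operational values.

    import numpy as np
    from scipy.optimize import brentq

    def irf_3pl(theta, a, b, c, D=1.7):
        return c + (1.0 - c) / (1.0 + np.exp(-D * a * (theta - b)))

    def true_score(theta, params):
        """Test characteristic curve: expected number-correct (true) score at theta."""
        return sum(irf_3pl(theta, a, b, c) for a, b, c in params)

    # Hypothetical 3PL parameters for a 30-item form (six items per level)
    rng = np.random.default_rng(0)
    form = list(zip(rng.uniform(0.8, 1.4, 30),
                    np.repeat([-1.9, -1.3, -0.4, 0.8, 1.7], 6),
                    np.full(30, 0.17)))

    def theta_for_nc(nc, params):
        """Find the theta whose true score equals the given NC score."""
        return brentq(lambda t: true_score(t, params) - nc, -6.0, 6.0)

    boundary_theta = -1.68                              # Level 3 boundary from the scaling study
    floor = int(np.ceil(sum(c for _, _, c in form)))    # true scores below sum(c) are unreachable
    cutoff = min(range(floor + 1, 30),
                 key=lambda nc: abs(theta_for_nc(nc, form) - boundary_theta))
    print(f"NC cutoff score for the level: {cutoff}")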

The form-specific cutoff θ is the θ corresponding to a cutoff score. As shown in Table 4.2 for Reading for Information Level 3, the form-specific cutoff θs were –1.57 for Form 1, –1.72 for Form 2, and –1.66 for Form 3. These cutoff θs were associated with a NC score of 14 for Forms 1 and 2, and 13 for Form 3. On Form 1, the lowest NC score at Level 3 was 14 and the highest NC score at Level 3 was 16. Therefore, for Form 1, the NC scores ranging from 14 to 16 were assigned to Level 3.

The fact that the form-specific cutoff thetas do not generally correspond exactly to the boundary thetas reflects the difference between continuous and discrete variables. The EPC and θ scales represent achievement and criterion-referenced standards as continuous variables. These scales can represent a 79 percent or 81 percent standard of mastery as precisely as an 80 percent correct standard. NC scores cannot represent all possible standards precisely because they are discrete. For example, a 0.8 EPC score has no NC representation in an 18-item level pool.

Variation across forms in the θs associated with a particular NC score represents a combination of systematic and random effects across forms. Systematic effects include the true psychometric characteristics of the forms. For example, the fact that the θ of –1.72 associated with a NC score of 14 on Form 2 is lower than the θ of –1.57 associated with a score of 14 on Form 1 suggests that it may be slightly easier to score 14 on Form 2 than on Form 1. However, random effects such as the error in estimates of IRT parameters and random differences in ability among test takers in the Form 1 and Form 2 groups also play a role. Remarkably, cutoff scores were often the same across forms.

With the exception of the Level 3 and Level 4 cutoff scores for Form 3, the cutoff scores for the Reading for Information levels were the same across all forms: 14 for Level 3, 17 for Level 4, 22 for Level 5, 25 for Level 6, and 28 for Level 7. These results attest to the reliability of item statistics from pretest data and to the care taken when these statistics were used to make the alternate forms psychometrically equivalent.


Since the forms were administered to randomly equivalent groups, and cutoff scores were selected to implement standards consistently across forms, the distribution of Level Scores should be similar across forms. The results in Table 4.3 confirm this expectation. The percentages of test takers scoring at any given level differ by no more than six points across forms. The data reflect a fairly consistent distribution of performance across the samples for all three Reading for Information forms.

Table 4.3
Percentages of Test Takers by Level Score by Form

Level      Form 1   Form 2   Form 3
Below 3       6        5        6
3             7        7        8
4            38       36       42
5            31       30       27
6            15       19       15
7             2        3        2

The method of selecting cutoff scores is slightly lenient. The cutoff for each form is not necessarily higher than the boundary θ. For example, the Level 3 cutoff θ of –1.72 for Form 2 is not higher than the Level 3 boundary θ of –1.68. This practice tends to produce a high false-positive-to-false-negative error ratio and a higher overall classification error rate than would occur if the cutoff θ always equaled or exceeded the boundary θ. The slightly lenient scoring rule was chosen for two important reasons:

1. The current scoring procedure replaces one that was also lenient (Schulz et al., 1997; 1999). Both the current procedure and the previous one produce similar frequency distributions of observed Level Scores. This is important for helping WorkKeys users connect current results with past results.

2. A lenient implementation of the 0.8 EPC standard in WorkKeys is justified by the error inherent in measuring with reference to a standard.

In addition to the measurement error associated with a test taker's score, there is also error in setting a criterion-referenced standard. One or both of these types of errors are typically cited in choosing a cutoff score that is more lenient and gives the benefit of the doubt to the test taker.

Leniency typically takes the form of a cutoff score that is one or more standard errors of measurement below the score that strictly represents the standard. ACT's particular method of scoring WorkKeys tests is less lenient than this approach. Strict implementation of the 0.8 EPC standard would require the cutoff θ to exceed the boundary θ. In about half the cases, it already does. In the other half, the cutoff score would be a lower value than would be required by a strict implementation of the standard. One NC point of difference is less than one standard error of measurement on the NC scale for the WorkKeys tests.


Scale Scores
Scaling is a process of setting up a rule of correspondence between the observed scores and the numbers assigned to them. The usefulness of a score scale depends on whether or not it can facilitate meaningful interpretation and can minimize misinterpretation and unwarranted inferences (Petersen, Kolen, and Hoover, 1989). The purpose of developing an additional score scale for each WorkKeys test was to provide users with more detailed information for use in program evaluation and outcome measurement. Therefore, the new score scale makes finer distinctions than can be made with the Level Score scale. Table 4.4 shows an example of the conversion tables for the Scale Scores corresponding to Level Scores on Reading for Information.

Table 4.4
Conversion Table of Scale Scores to Levels for Reading for Information (An Example)

Level Score    Scale Scores
<3             65–72
3              73–74
4              75–78
5              79–81
6              82–84
7              85–90

The Scale Scores for the WorkKeys tests were developed using the equal standard error of measurement methodology developed by Kolen (1988). In this method, the number correct (NC) scores are first transformed using the arcsine transformation described by Freeman and Tukey (1950) to stabilize error variance. The form of this transformation is

c(i) = (1/2) [ sin^-1( sqrt( i / (K + 1) ) ) + sin^-1( sqrt( (i + 1) / (K + 1) ) ) ]

where sin^-1 is the arcsine function and K is the number of items. This nonlinear transformation is designed to equalize error variance across the score points.

The transformed arcsine values are then linearly transformed to the new score scale using

s = A*c(i) + B

where s is the Scale Score, A is the slope, and B is the intercept.

More specifically,

A = (s1 – s2)/[c(i1) – c(i2)] and B = s2 – A*c(i2) or B = s1 – A*c(i1),

where s1 and s2 correspond to the lowest and highest Scale Score points. The non-integer Scale Scores are then rounded to integers to obtain reported scores.
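A minimal sketch of this two-step scaling (the Freeman and Tukey arcsine stabilization followed by a linear transformation and rounding) is given below. The NC anchor points used for the linear step, and the truncation of the low end, are hypothetical choices made for illustration; they are not the operational WorkKeys constants.

    import numpy as np

    def arcsine_transform(i, K):
        """Freeman-Tukey arcsine transformation; corresponds to c(i) in the text."""
        return 0.5 * (np.arcsin(np.sqrt(i / (K + 1.0))) +
                      np.arcsin(np.sqrt((i + 1.0) / (K + 1.0))))

    K = 30                                  # items on a Reading for Information form
    nc = np.arange(0, K + 1)                # possible number-correct scores
    g = arcsine_transform(nc, K)

    # Linear transformation to the reporting scale. The NC anchor points i1 and i2
    # are assumptions made for illustration; the low end is truncated at s1 to
    # allow for the guessing effect.
    s1, s2 = 65.0, 90.0                     # lowest and highest Scale Score points
    i1, i2 = 8, 30                          # hypothetical NC scores tied to s1 and s2
    A = (s1 - s2) / (g[i1] - g[i2])         # slope
    B = s1 - A * g[i1]                      # intercept
    scale = np.clip(np.rint(A * g + B), s1, s2).astype(int)
    print(dict(zip(nc.tolist(), scale.tolist())))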


After considering scales of various lengths, ACT chose a scale of 26 points ranging from 65 to 90 for the linear transformation. In consideration of the "guessing effect," scores at the lower end were truncated. A combination of classical test theory and IRT was used to determine at what score truncation should occur. The goals were to provide an adequate number of score points for the anticipated uses of the scores, and to avoid having more score points than the number of items on a test form could support.

The Scale Score is a function of the NC score. Scale Scores also incorporate equal conditional standard errors of measurement (SEM) along most of the score scale. The standard error of measurement is about 1.5 to 2 points, so an approximate 68 percent confidence interval can be formed by adding plus or minus 2 points to the Scale Scores. As an illustration, Table 4.5 provides summary statistics for the distributions of NC scores and Scale Scores for the test takers in a statewide testing program for grade 11.

Table 4.5
Summary Statistics for Scores and Scale Scores

                  N          Score           Mean     Standard Deviation
Spring 2002    121,304    NC Score          21.75           4.42
                          Scale Score       79.15           3.87
Spring 2003    122,820    NC Score          21.03           4.49
                          Scale Score       78.73           3.68


Equating
New test forms for the WorkKeys assessments are developed as needed. Though each form is constructed to adhere to the same content and statistical specifications, the forms may be slightly different in difficulty. To control for these differences, scores on all forms are equated to the same scale so that when they are reported as either Level Scores or Scale Scores, the equated scale scores have the same meaning regardless of the form administered. For this reason, Level Scores and Scale Scores are comparable across test forms and test dates; however, they are not comparable across different WorkKeys tests. For instance, a Level Score of 3 or a Scale Score of 73 in Reading for Information does not compare in any way to a Level Score of 3 or a Scale Score of 73 on any other WorkKeys test.

Two data-collection designs are commonly used to equate WorkKeys test forms: the randomly equivalent groups design and the common-item nonequivalent groups design. In a randomly equivalent groups design, new test forms are administered along with an anchor form that has already been equated to previous forms. In this design, test forms are distributed in each testing room so that the first person receives Form 1, the next Form 2, and the next Form 3. This spiraling pattern is repeated so that each form is given to a third of the test takers, and the forms are given to randomly equivalent groups. When this design is used, the difference in group performance on the new and anchor forms is considered a direct indication of the difference in difficulty between the forms.

Scores on the new forms are equated to the Score Scale using various equating methodologies, including linear and equipercentile procedures (Kolen and Brennan, 2004). When the Level Score and Scale Score conversions are chosen for each form, the equating functions are examined, as are the resulting distributions of the scores and their means, standard deviations, skewness, and kurtosis.
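Under the randomly equivalent groups design, the simplest observed-score approach is linear equating, which places a new-form score on the anchor-form scale by matching the means and standard deviations of the two spiraled groups; equipercentile equating instead matches the full score distributions. The sketch below uses simulated scores purely for illustration.

    import numpy as np

    def linear_equate(x, scores_new, scores_anchor):
        """Map a new-form score x to the anchor-form scale by matching the mean
        and standard deviation of the two randomly equivalent groups."""
        mu_x, sd_x = np.mean(scores_new), np.std(scores_new, ddof=1)
        mu_y, sd_y = np.mean(scores_anchor), np.std(scores_anchor, ddof=1)
        return mu_y + (sd_y / sd_x) * (x - mu_x)

    # Simulated NC scores for two spiraled, randomly equivalent groups (illustrative only)
    rng = np.random.default_rng(1)
    new_form = np.clip(rng.normal(20.3, 4.5, 2000).round(), 0, 30)
    anchor_form = np.clip(rng.normal(20.7, 4.4, 2000).round(), 0, 30)

    for x in (14, 17, 22, 25, 28):
        print(x, "->", round(linear_equate(x, new_form, anchor_form), 2))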

A common-item nonequivalent groups design has been used when the spiraling technique of assigning forms cannot be used, when only a single form can be administered on a test date, or when some items are changed in a revised form. In a common-item nonequivalent groups design, the new form(s) and base form have a set of items in common. These anchor items are chosen to represent the content and statistical characteristics of the test and are usually interspersed among the other items in the forms. The different forms are administered to different test takers. In this design, the groups are not assumed to be equivalent. The common items are used to adjust for group differences. Observed differences between group performances can result from a combination of differences in test taker groups and test forms. Strong statistical assumptions are usually required to separate these differences.

The various equating methods used under the common-item nonequivalent groups design are distinguished by their statistical assumptions (Kolen and Brennan, 2004). Observed-score equating methods are usually used in equating WorkKeys test forms. For each form, the equating functions are examined, as are the resulting distributions of scaled scores and the mean, standard deviation, skewness, and kurtosis of the scaled scores. The set of equating conversions chosen for each form is the one that results in scaled score distributions and scaled score moments that are judged to be reasonable based on the sample sizes, the magnitudes of the form and group differences, and the historical statistics for the test.
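One observed-score method that can be applied under the common-item nonequivalent groups design is chained linear equating: new-form scores are linearly linked to the common-item (anchor) score within the new-form group, and the anchor score is then linearly linked to the base form within the base-form group. The sketch below is only an illustration of that idea with simulated data; other methods (for example, Tucker or frequency estimation) rest on different assumptions.

    import numpy as np

    def linear_link(x, from_scores, to_scores):
        """Linear linking that matches means and standard deviations."""
        mu_f, sd_f = np.mean(from_scores), np.std(from_scores, ddof=1)
        mu_t, sd_t = np.mean(to_scores), np.std(to_scores, ddof=1)
        return mu_t + (sd_t / sd_f) * (x - mu_f)

    # Simulated data: each group takes its own form plus the common anchor items
    rng = np.random.default_rng(2)
    new_total, new_anchor = rng.multivariate_normal(
        [20.0, 8.0], [[20.0, 7.0], [7.0, 4.0]], 1500).T
    base_total, base_anchor = rng.multivariate_normal(
        [21.0, 8.5], [[19.0, 7.0], [7.0, 4.0]], 1500).T

    def chained_equate(x_new):
        """New-form score -> anchor scale (new group) -> base-form scale (base group)."""
        v = linear_link(x_new, new_total, new_anchor)
        return linear_link(v, base_anchor, base_total)

    for x in (14, 22, 28):
        print(x, "->", round(chained_equate(x), 2))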


Pool Calibration and Pre-Equating
After being field tested or operationally administered, the items are calibrated and placed in an item pool. A calibrated item pool is a group of items that have their item parameter estimates placed on a scale common to all of them. The initial item pool was developed using the items available at that time, and new items are added to it as they are developed. To calibrate the item pool, a linking plan was selected. The plan listed all the forms and their links to a base form. A 3PL IRT model was used in the calibration. Using the BILOG-MG program, items were calibrated either concurrently or separately (Zimowski, Muraki, Mislevy, and Bock, 1996), depending on the data-collection designs. Items in the different forms administered under the randomly equivalent groups design were calibrated concurrently. The item parameter estimates for all the forms administered were placed on the scale for the base form using the Stocking-Lord characteristic curve method (Stocking and Lord, 1983).
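The Stocking-Lord step can be sketched as a small optimization: find the slope A and intercept B of the θ-scale transformation that minimize the squared difference between the test characteristic curves of the linking items computed from their base-scale estimates and from their rescaled new-calibration estimates (a/A, A*b + B, c). The item parameters below are invented placeholders, and equal weights over the θ grid are used for simplicity.

    import numpy as np
    from scipy.optimize import minimize

    def irf_3pl(theta, a, b, c, D=1.7):
        return c + (1.0 - c) / (1.0 + np.exp(-D * a * (theta - b)))

    def tcc(theta, params):
        """Test characteristic curve of a set of items over a grid of theta values."""
        return sum(irf_3pl(theta, a, b, c) for a, b, c in params)

    # Hypothetical estimates for the same linking items from two separate calibrations
    base_params = [(1.2, -1.0, 0.15), (0.9, 0.2, 0.18), (1.4, 1.1, 0.12), (1.0, -0.3, 0.20)]
    new_params  = [(1.1, -0.8, 0.15), (0.8, 0.5, 0.18), (1.3, 1.4, 0.12), (0.9, 0.0, 0.20)]

    grid = np.linspace(-4, 4, 41)          # theta points on the base scale (equal weights)

    def stocking_lord_loss(AB):
        A, B = AB
        rescaled = [(a / A, A * b + B, c) for a, b, c in new_params]
        return float(np.sum((tcc(grid, base_params) - tcc(grid, rescaled)) ** 2))

    result = minimize(stocking_lord_loss, x0=[1.0, 0.0], method="Nelder-Mead")
    A_hat, B_hat = result.x
    print(f"Estimated scale transformation: A = {A_hat:.3f}, B = {B_hat:.3f}")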

During the calibration process, item statistics based on both classical test theory and IRT analyses are reviewed. Pretest items with very low discrimination indices are excluded from the pool. All of the estimated item parameters from multiple calibrations of any set of items are plotted and compared to each other. For most items, the estimates are similar. If they are not, the item is not used as an anchor item in the scaling process. The estimates for pretest items are replaced by the estimates from operational test administrations when they become available.

Creating an IRT-calibrated item pool makes it possible to pre-equate new forms prior to administration. The item parameter estimates can be used in assembling new forms and conducting pre-equating. As described above, most WorkKeys forms are currently equated using either randomly equivalent groups or common-item equating methods. When these two conventional equating designs cannot be implemented, pre-equating is performed if the items have been calibrated previously. The goal of pre-equating is to be able to derive raw-to-scale score conversions before a form is administered intact. Quality of pre-equating rests on the premise that item parameter estimates established prior to the operational administration are appropriate for use during the operational administration. When possible, research studies are carried out to evaluate the comparability of the pre-equating results with the equating results derived from other methods (Gao, Chen, and Harris, 2005; Gao, Harris, Yi, and Ming, 2003). In addition, the stability of parameter estimates of pretest items and their impact on pre-equating are evaluated (Chen, Gao, and Harris, 2006).


Reliability

Chapter Abstract
This chapter documents how ACT evaluated the reliability of test scores of Reading for Information using a variety of estimation techniques, beginning with internal consistency. Internal consistency reliability measures the consistency within a test by comparing all items with each other. To determine the internal consistency of the Reading for Information test scores, ACT computed reliability coefficients from two data sets obtained from a Midwestern state—one from 121,304 high school students in 2002 (Form A) and another from 122,820 high school students in 2003 (Form B). The reliability coefficients (KR-20) for the two forms were 0.82 and 0.90, which are considered high for a 30-item test.

Generalizability theory provides a broad conceptual and statistical framework for evaluating the measurement precision of tests. Generalizability analyses produce reliability-like coefficients to indicate reliability of measurement. For Reading for Information, ACT used data based on 1,332 test takers to conduct generalizability analyses. The results indicated that items in the middle levels of difficulty contribute most to the universe score variances. The reliability coefficients for both rank-ordering test takers and judging test takers' levels of performance were found to be at or above 0.80.

Closely related to test reliability, the standard error of measurement (SEM) summarizes the amount of error or inconsistency in Number Correct (NC) scores on a test. For the two forms of Reading for Information, ACT transformed the NC scores to Scale Scores and calculated conditional SEMs based on the three-parameter logistic (3PL) model. The SEMs were found to be mostly less than 2 points, showing that the Scale Scores for Reading for Information were developed to have approximately constant SEMs across most of the Scale Score range. The Scale Score reliability estimates for Form A and Form B were 0.81 and 0.85, respectively. These results are quite consistent across forms.

Classification consistency is defined as the extent to which classifications of examinees agree when obtained from two independent administrations of a test or two parallel forms of a test. Classification consistency is relevant to Reading for Information, because—like the other Foundational Skills assessments in the WorkKeys system—it is designed primarily to classify test takers by WorkKeys Skill Level.

For Reading for Information, ACT used the 3PL model to determine estimates of classification consistency. Estimates were derived from data collected from a state testing program in the Midwest in 2002–2003. First, ACT estimated the percentage of test takers who would receive exactly the same Level Score from two strictly parallel test forms. For Reading for Information, it is estimated that cases of exact agreement amounted to not less than 55 percent and as high as 61 percent of the test takers in the study. ACT also analyzed the consistency of "at-or-above" classifications by Skill Level. (For example, test takers might be consistently classified with respect to being at or above Level 4 if they were to score at Level 4 on one form and Level 5 on a second, strictly parallel form.) Classification consistency was found to be higher for at-or-above classifications than for exact classifications. At-or-above consistency of Reading for Information scores is estimated to be not less than 85 percent and as high as 98 percent.


The Concept of Reliability
For a test to function as intended, the scores reflecting examinee performance need to be reliable and score interpretations need to be valid. Both of these characteristics have been defined by the Standards for Educational and Psychological Testing (1999). According to the Standards, reliability is "the consistency of … measurements when the testing procedures are repeated on a population of individuals or groups." That is, reliability values document the degree to which the test is consistent—whether, for example, it is consistent with itself across administrations of the same form, across administrations of parallel forms, or across raters. Reliability coefficients are estimates of the consistency of test scores. They range from zero to one, with values near one indicating greater consistency and those near zero indicating little or no consistency. The Standards (1999) also advise test publishers to provide indices that reflect random effects on test scores. In many situations, test users are also concerned with how measurement error affects score interpretations. Estimates of the standard error of measurement (SEM) indicate the expected variation of an examinee's observed scores about the true score. The meaning and the specific values of the standard errors depend on the score scale. This chapter reports on the reliability and standard error of Level Scores and the standard error of Scale Scores.

WorkKeys tests are often used as classification tests. They are designed to permit accurate at-or-above classifications of test takers with regard to the particular level of skill that may be required in a given job setting. Therefore, ACT has also examined the classification consistency of Reading for Information test scores in addition to reliability and standard error of measurement. The higher the classification consistency indices, the greater the precision of the assessment.

Internal Consistency for Number-Correct Scores
Internal consistency reliability measures the consistency within a test by comparing all items with each other. ACT computed internal consistency reliability coefficients for two test forms, A and B, based on data sets obtained for 121,304 and 122,820 high school students in a Midwestern state in spring 2002 and spring 2003, respectively. The reliability coefficients (KR-20) for the two forms equaled 0.87 and 0.90, which are considered high for a 30-item test.
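For reference, KR-20 for a matrix of dichotomously scored responses can be computed as in the brief sketch below; the simulated 0/1 responses stand in for the operational data.

    import numpy as np

    def kr20(responses):
        """KR-20 internal consistency for a (test takers x items) matrix of 0/1 scores."""
        k = responses.shape[1]
        p = responses.mean(axis=0)                     # proportion correct per item
        total_var = responses.sum(axis=1).var(ddof=1)  # variance of NC scores
        return (k / (k - 1)) * (1.0 - np.sum(p * (1 - p)) / total_var)

    # Simulated responses for illustration only (not operational data)
    rng = np.random.default_rng(3)
    ability = rng.normal(size=(5000, 1))
    difficulty = np.linspace(-2, 2, 30)
    responses = (rng.random((5000, 30)) < 1 / (1 + np.exp(-(ability - difficulty)))).astype(int)

    print(f"KR-20 = {kr20(responses):.3f}")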


Generalizability Analyses
Generalizability theory provides a broad conceptual and statistical framework for evaluating measurement precision (Cronbach, Gleser, Nanda, and Rajaratnam, 1972). In particular, generalizability theory presents a multidimensional perspective on error variance. It enables test users to differentiate between multiple sources of error and to estimate the magnitudes of the errors (sampling variabilities). Generalizability analyses produce reliability-like coefficients known as generalizability and dependability coefficients to indicate reliability of measurement. For example, univariate generalizability analyses can estimate the following (a computational sketch of the simple person-by-item case appears after this list):

• variability (variance components, σ 2) associated with test takers (p), items(i ), and the interaction between test takers and items (pi );

• measurement error variances for norm-referenced (rank-ordering test takers)and domain-referenced (assessing performance level) decisions [σ 2(δ) and σ 2(∆)]; and

• generalizability (reliability-like) coefficients for norm-referenced and domain-referenced decisions (Ερ2 and Φ).
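The sketch below illustrates the simplest univariate case, a crossed person-by-item (p x i) design with one observation per cell, using expected mean squares to estimate these variance components and the two coefficients; the 0/1 responses are simulated for illustration only and do not come from operational data.

    import numpy as np

    def g_study_pxi(scores):
        """Univariate G study for a crossed p x i design (one observation per cell)."""
        n_p, n_i = scores.shape
        grand = scores.mean()
        ss_p = n_i * np.sum((scores.mean(axis=1) - grand) ** 2)
        ss_i = n_p * np.sum((scores.mean(axis=0) - grand) ** 2)
        ss_total = np.sum((scores - grand) ** 2)
        ms_p = ss_p / (n_p - 1)
        ms_i = ss_i / (n_i - 1)
        ms_pi = (ss_total - ss_p - ss_i) / ((n_p - 1) * (n_i - 1))

        var_pi = ms_pi                        # person-by-item interaction (with error)
        var_p = (ms_p - ms_pi) / n_i          # universe score (person) variance
        var_i = (ms_i - ms_pi) / n_p          # item variance
        rel_err = var_pi / n_i                # sigma^2(delta), norm-referenced error
        abs_err = (var_i + var_pi) / n_i      # sigma^2(Delta), domain-referenced error
        e_rho2 = var_p / (var_p + rel_err)    # generalizability coefficient
        phi = var_p / (var_p + abs_err)       # dependability coefficient
        return var_p, var_i, var_pi, e_rho2, phi

    # Simulated 0/1 responses for illustration only
    rng = np.random.default_rng(4)
    ability = rng.normal(size=(1332, 1))
    difficulty = np.linspace(-2, 2, 30)
    scores = (rng.random((1332, 30)) < 1 / (1 + np.exp(-(ability - difficulty)))).astype(int)

    var_p, var_i, var_pi, e_rho2, phi = g_study_pxi(scores)
    print(f"E(rho^2) = {e_rho2:.3f}, Phi = {phi:.3f}")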

Furthermore, generalizability theory can treat multivariate models in which each test taker has multiple universe scores associated with a specific level of a fixed domain (Brennan, 2001). A multivariate generalizability theory approach can be used to address issues involved in analyzing test data at the level of a table of specifications. For Reading for Information, items are nested within levels of difficulty. Multivariate generalizability analyses can estimate:

• sampling variability associated with items (i) at each fixed level; with item levels (h); and with the interaction between item levels and test takers (ph);

• generalizability coefficients for the total scores; and

• proportions of the universe score variance at each item level to the variance of the composite (total) universe scores.

Both univariate and multivariate generalizability analyses were conducted using data based on 1,332 test takers. The mean, standard deviation, skewness, and kurtosis of NC scores for these test takers were 20.142, 4.549, –0.628, and 3.269, respectively. Table 5.1 presents the results of the analyses. The results indicate that items in the middle levels of difficulty contribute most to the universe score variances (the weight). The reliability-like coefficients for both rank-ordering test takers and judging test takers' levels of performance are at or above 0.80 for the test (see the total-score coefficients in Table 5.1).


Table 5.1
Estimated Variance Components, Error Variances, and Generalizability Coefficients

Univariate Analysis
Level       σ²(p)   σ²(i)   σ²(pi)   σ²(δ)   σ²(∆)   Eρ²     Φ       Weight
3           0.006   0.000   0.026    0.004   0.004   0.582   0.581   0.076
4           0.021   0.000   0.094    0.016   0.016   0.578   0.577   0.193
5           0.048   0.006   0.157    0.026   0.027   0.647   0.639   0.310
6           0.040   0.011   0.200    0.033   0.035   0.548   0.534   0.265
7           0.017   0.002   0.200    0.033   0.034   0.338   0.336   0.156
All Items   0.018   0.061   0.144    0.005   0.007   0.792   0.728

Multivariate Analysis
              σ²(p)   σ²(h)   σ²(ph)   σ²c(δ)   σ²c(∆)   Eρ²c    Φc
Total Score   0.019   0.069   0.010    0.005    0.005    0.804   0.800

Note: Weight indicates the proportional contribution of the universe score variance at each level of items to the composite universe score variance (effective weight).

Standard Errors of Measurement
Closely related to test reliability, the standard error of measurement (SEM) summarizes the amount of error or inconsistency in test scores. Nonlinear transformations of number-correct (NC) scores to scaled scores alter the relative magnitudes of the conditional SEMs for scaled scores (Kolen, Hanson, and Brennan, 1992). Figure 5.1 presents the conditional SEM for the two forms of Reading for Information as a function of the NC true score (expected NC score), E(X|θ), and the expected Scale Score, E(S|θ), based on the 3PL IRT model. The SEMs are mostly less than 2 points, showing that the Scale Scores for the Reading for Information test were developed to have approximately constant SEMs on most Scale Scores.
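Under the 3PL model (with the usual local independence assumption), the conditional SEM of the NC score at a given θ follows from the fact that the NC score is a sum of item scores, so its conditional variance is the sum of P(1 - P) across items; a Scale Score SEM can then be obtained by carrying this through the NC-to-Scale-Score conversion. The sketch below uses invented item parameters for illustration.

    import numpy as np

    def irf_3pl(theta, a, b, c, D=1.7):
        return c + (1.0 - c) / (1.0 + np.exp(-D * a * (theta - b)))

    def conditional_sem_nc(theta, params):
        """Conditional SEM of the NC score at theta: sqrt of the sum of P*(1 - P)."""
        probs = np.array([irf_3pl(theta, a, b, c) for a, b, c in params])
        return float(np.sqrt(np.sum(probs * (1.0 - probs))))

    # Hypothetical 3PL parameters for a 30-item form (six items per level)
    rng = np.random.default_rng(5)
    form = list(zip(rng.uniform(0.8, 1.4, 30),
                    np.repeat([-1.9, -1.3, -0.4, 0.8, 1.7], 6),
                    np.full(30, 0.17)))

    for theta in (-2.0, -1.0, 0.0, 1.0, 2.0):
        e_nc = sum(irf_3pl(theta, a, b, c) for a, b, c in form)
        print(f"theta = {theta:+.1f}: E(NC) = {e_nc:5.1f}, "
              f"SEM(NC) = {conditional_sem_nc(theta, form):.2f}")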


Figure 5.1
SEMs for Two WorkKeys Reading for Information Test Forms
[Four panels: conditional SEM plotted against the Number-Correct Score and against the Scale Score for Form A (Spring 2002) and Form B (Spring 2003).]

Scale Score average SEMs were estimated using the 3PL IRT model. The estimated Scale Score reliability R was calculated as

R = 1 – SEM²/S²

where SEM is the estimated Scale Score average standard error of measurement and S is the standard deviation for the observed Scale Scores. For the data sets described above, the Scale Score reliability estimates for Form A and Form B were 0.79 and 0.87, respectively.



Classification Consistency of Reading for Information Level Scores
The Standards (1999) advises publishers of classification tests to provide information about the percentage of test takers who would be classified in the same way if they were to take the test two times using the same form or using alternate forms. The Standards notes that reliability coefficients and standard errors do not directly answer this practical question. Decision consistency can help address this question because it is an important reliability concept for measurements that involve classification decisions. Classification consistency is defined as the extent to which classifications agree when obtained from two independent administrations of a test or two parallel forms of a test. According to Subkoviak's review (1984), two important classification consistency indices are:

1) the agreement index p0, which is the proportion of consistent classification based on two parallel forms, and

2) coefficient κ, which is the proportion of consistent classification adjusted for chance agreement.

One principal output from the analysis of classification consistency is a symmetric contingency table. The contingency table can be estimated based on a psychometric model using test scores obtained from a single test administration.
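Given an estimated contingency table of Level Score classifications on two parallel forms, the agreement index p0 and coefficient κ can be computed as in the sketch below; the table values here are invented solely to illustrate the calculation.

    import numpy as np

    def classification_consistency(table):
        """Agreement index p0 and kappa from a contingency table whose rows and
        columns are Level Score classifications on two parallel forms."""
        joint = np.asarray(table, dtype=float)
        joint = joint / joint.sum()                 # joint classification proportions
        p0 = float(np.trace(joint))                 # proportion classified identically
        p_chance = float(np.sum(joint.sum(axis=0) * joint.sum(axis=1)))
        kappa = (p0 - p_chance) / (1.0 - p_chance)
        return p0, kappa

    # Hypothetical 6 x 6 table for Level Scores <3, 3, 4, 5, 6, and 7 (illustrative only)
    table = [[ 40,  15,   5,   0,   0,   0],
             [ 15,  50,  25,   2,   0,   0],
             [  5,  25, 330,  60,   3,   0],
             [  0,   2,  60, 250,  40,   1],
             [  0,   0,   3,  40, 120,  10],
             [  0,   0,   0,   1,  10,  18]]

    p0, kappa = classification_consistency(table)
    print(f"p0 = {p0:.2f}, kappa = {kappa:.2f}")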

Using the IRT methodology described by Schulz et al. (1997, 1999), ACT derived estimates of classification consistency for Reading for Information. This methodology performed well when compared with classical methods (Lee, Brennan, and Hanson, 2000). Indices of classification consistency are more directly informative about the effects of measurement error on a classification test than are SEMs. Expressed as a proportion, classification consistency has the same range as the reliability coefficient: 0.00 to 1.00, with 1.00 being the maximum or best possible. When expressed as a percentage, classification consistency ranges from 0 to 100.

Table 5.2 shows estimates of classification consistency for Reading for Information. The data sets collected from a state testing program in the Midwest in 2002–2003 were used to estimate classification consistency based on an IRT approach. The 3PL IRT model was fit to the data. In Table 5.2, the first row, labeled Exact, shows the percentage of test takers who would receive the same Level Score from two strictly parallel test forms. For example, if a test taker were to take two strictly parallel forms of the test and score at Level 3 on both forms, this would be a case of exact agreement. For Reading for Information, it is estimated that cases of exact agreement are not less than 55 percent and can be as high as 61 percent of the test takers in this study.


Table 5.2
Predicted Classification Consistency for Level Scores

                    Spring 2002            Spring 2003
                    (N = 121,304)          (N = 122,820)
Level               p0        κ            p0        κ
Exact               0.55      0.43         0.61      0.50
At-or-above 3       0.97      0.84         0.98      0.89
At-or-above 4       0.94      0.76         0.94      0.79
At-or-above 5       0.85      0.68         0.87      0.73
At-or-above 6       0.85      0.66         0.86      0.63
At-or-above 7       0.89      0.55         0.95      0.51

Note: Exact classifications specify a particular skill level for the test taker. The remaining classifications specify that the test taker is at or above the indicated level.

The remaining rows in Table 5.2 show the consistency of at-or-above classifications by level. For example, entries in the row labeled At-or-above 5 reflect the consistency of classifying test takers at or above Level 5. Say a test taker were to take two strictly parallel forms of Reading for Information and score at Level 4 on the first form and Level 5 on the second. That test taker would not be consistently classified with respect to being at or above Level 5, but would be consistently classified with respect to being at or above Level 4. Classification consistency is clearly higher for at-or-above classifications than for exact classifications. At-or-above consistency of Reading for Information scores is estimated to be not less than 85 percent and as high as 98 percent.

Estimates of classification consistency are sensitive to the distribution of skill. For example, the lower boundary of 0.11 on the θ scale for Reading for Information Level 5 is near zero, which is the mean of the Reading for Information θ distribution used to compute classification consistency and classification error. (The θ distribution for each skill is assumed to be standard normal.) This means that the true skill of a relatively large proportion of these test takers was close to the Level 5 boundary. Generally, test takers are more likely to be misclassified because of measurement error when their true skill is closer to the criterion. Given this fact, an 85 percent to 87 percent classification consistency for the at-or-above Level 5 classification for Reading for Information is very good.

By the same reasoning, however, an 89 percent to 95 percent classification consistency for the at-or-above Level 7 classification is probably overly optimistic. At 2.88, the Level 7 boundary is far above the skill of most test takers in a standard normal θ distribution. Applicants for jobs requiring Level 7 skill in Reading for Information, however, will probably have skills closer to the Level 7 boundary than the test takers in this study. In that case, the classification consistency for actual job applicants is likely to be lower than is indicated by the Level Score values shown in Table 5.2. Kappas are moderate, and users should decide if they are acceptable for their decision making.


Validity

Chapter Abstract
This chapter provides an overview of the validity concept as it applies to the interpretation of scores on the WorkKeys foundational skills assessments. This overview covers three types of validity evidence collected to justify the use of WorkKeys foundational skills scores in making employment decisions: construct-related evidence, criterion-related evidence, and content-related evidence. For the Reading for Information assessment, ACT has conducted validity studies or worked with employers to collect data for each type of validity evidence, summarized as follows.

To support the construct-related validity of Reading for Information test scores, ACT examined the relationship between Reading for Information and the ACT Reading and English Tests, which cover the major content areas that are prerequisite to successful performance in entry-level college reading and English courses. ACT found a moderate correlation between Reading for Information and ACT Reading and English Test scores: In general, test takers who received higher Scale Scores on the ACT Reading and English Tests also received higher Level Scores on Reading for Information.

To support the criterion-related validity of Reading for Information test scores, ACT has gathered data from various organizations on the correlation between Reading for Information test scores used to select job applicants and their subsequent job performance ratings. While sample sizes and correlations vary from study to study, all of the correlations have been positive, ranging from 0.12 to 0.86, which compares favorably with the correlations typically found in the general research literature on criterion-related validity of employment tests. ACT has also conducted classification consistency studies, comparing the employees' job performance classification to their classification by Reading for Information Skill Level. In these studies, the percentage of employees classified the same way by both measures ranged from 71 percent to 79 percent.

To support the content-related validity of Reading for Information test scores, ACT uses two job analysis procedures—WorkKeys Job Profiling and the SkillMap Job Inventory—to link the Reading for Information Skill Levels to relevant job behaviors. WorkKeys Job Profiling and SkillMap are both designed to meet federal standards and other industry guidelines for content validation of employment tests used for high-stakes decisions such as hiring and promotion. Both procedures can be used to define critical job tasks, determine which WorkKeys skills are relevant to performing the tasks, and identify the level of skill required for performing them.

Because WorkKeys assessments such as Reading for Information can be used for high-stakes employment decisions, ACT has analyzed Skill Level scores for evidence of adverse impact by gender and racial/ethnic groups. Evidence of adverse impact has been found to be consistent with existing research on the validity of employment test scores used for high-stakes selection decisions. In this context, such findings reinforce the need to clearly link use of WorkKeys test scores to the critical tasks and skills required for the job. Such linkages can be readily established through content validity studies using either WorkKeys Job Profiling or SkillMap.


Validity: A Unitary Evidence-Based Concept
The Uniform Guidelines on Employee Selection Procedures (1978) describes three kinds of validity with respect to the interpretation of test scores: content validity, construct validity, and criterion-related validity. At the same time, the Standards for Educational and Psychological Testing (1999) describes test validity as a unitary concept supported by three kinds of evidence. That evidence can be "based on relations to other variables" (criterion-related), "based on test content" (content-related), or established by "the validity argument" (construct-related), which is "an explicit scientific justification of the degree to which accumulated evidence and theory support the proposed interpretation(s) of test scores" (pp. 13, 11, 184, 174). In other words, evidence may be accumulated in a number of ways to establish test validity as a whole. The value of each way is determined, not by any inherent value, but by its appropriateness to the assessment situation.

The Standards (pp. 9–11) also explains that the need for validity evidence is based on the assumption that a test is used for a purpose, and it is necessary to provide evidence showing that using the test for that purpose is appropriate. When a test is administered under standard conditions, and a score is reported, it is necessary to determine what the score means within the context in which the test is used. To make this determination, the test provider accumulates evidence to show what the test score means. Thus, validity refers to the degree to which the evidence supports the interpretation of the scores, not the test itself. Because WorkKeys assessments are designed for use in varied business and educational settings, validity evidence must be obtained from these different contexts.

Similarly, the Principles for the Validation and Use of Personnel Selection Procedures (2003) also provides professional standards for three ways to establish validity evidence for the use of an assessment within an employment setting: construct validity evidence, criterion-related validity evidence, and content validity evidence. Any or all of these sources of validity evidence may be used to support the interpretation of scores for a specific purpose, though—as argued in the Standards—none should be viewed as an end in itself.

In summary, the validation process is unitary: Appropriate evidence is collected and analyzed, and the results are used to support the validity of score interpretations for a specified purpose. The process begins with a statement of what the score is expected to indicate and an explanation of how it will make that indication. Validation is achieved when a scientifically sound validity argument has been presented. Such an argument is one that supports the intended interpretation of the test scores by showing what they mean within a specific context.

Within the context of employment testing, the validation of test scores for WorkKeys foundational skills assessments like Reading for Information relies primarily on the gathering of evidence for content validity. However, as discussed in the following sections, ACT has gathered evidence from a variety of sources for all three kinds of validity: construct-related, criterion-related, and content-related. Moreover, because Reading for Information can be used for high-stakes employment decisions, ACT also analyzed Reading for Information scores for evidence of adverse impact on gender and racial/ethnic groups. The findings reinforce the need to clearly link use of the test scores to the critical tasks and skills required for the job. Such linkages can be readily established through content validity studies using either WorkKeys Job Profiling or the SkillMap Job Inventory.


Construct-Related Evidence
Construct-related evidence for test validity focuses primarily on the test score as a measure of the psychological characteristic of interest. The process of compiling construct-related evidence starts with test development and continues until the pattern of empirical relationships between test scores and other relevant variables clearly indicates the meaning of the test score.

WorkKeys Reading for Information and the ACT Assessment
When focusing on construct-related evidence, for example, one would expect scores on Reading for Information—which focuses on the reading and understanding of work-related instructions and policies—to be related to scores on other reading tests. As part of the validation process, then, ACT examined the relationship between Reading for Information and the ACT Reading and English Tests. The ACT Reading and English Tests are part of the ACT Assessment, which is designed to measure the skills acquired during secondary education that are most important to success in postsecondary education. The material the tests cover emphasizes the major content areas that are prerequisite to successful performance in entry-level college reading and English courses.

ACT conducted a study of the correlation between scores on Reading for Information and scores on the ACT Reading and English Tests. Table 6.1 presents the correlation coefficients between these scores. ACT collected score data from two test administrations in a Midwestern state, one from a sample of 121,304 students in spring 2002 and another from a sample of 122,820 students in spring 2003. The correlations shown in the table indicate that Reading for Information scores are moderately correlated with scores on the ACT Reading and English Tests. These results imply that the reading-related achievements and skills measured by the three tests share many similarities, while maintaining some differences.

Table 6.1
Correlations between WorkKeys Reading for Information, ACT Reading, and ACT English

                        Number Correct Score              Scale Score¹
                      ACT Reading   ACT English       ACT Reading   ACT English
Range                    1–40          1–75              1–36          1–36
2002 WorkKeys RFI        0.650         0.692             0.608         0.639
2003 WorkKeys RFI        0.657         0.711             0.620         0.660

¹ The correlations for the Scale Scores are lower than the correlations for the Number Correct Scores because the Scale Scores are more restricted in their numeric range.


Table 6.2 presents the conditional distributions of the ACT Reading Scale Scores for each Level Score on Reading for Information. Each row shows, for the Level Score indicated, the percentage of those cases that received an ACT Reading score within the range indicated for the column. For example, for both data sets, about 89 percent of the test takers who scored below Level 3 on Reading for Information received Scale Scores below 16 on ACT Reading.

Figure 6.1 presents these same conditional distributions, showing the range (excluding extreme values), median, and quartiles of the Scale Scores on ACT Reading for each Reading for Information Level Score. For example, for test takers in 2003 who scored below Level 3 on Reading for Information, the median Scale Score on ACT Reading was 12, with an observed range of 9 to 15. In summary, the results of the correlation study provide evidence of construct validity by showing a relatively strong relationship between Reading for Information and ACT Reading scores. In general, test takers who received higher Level Scores on Reading for Information received higher Scale Scores on ACT Reading.
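The conditional distributions in Table 6.2 are, in effect, row-normalized cross-tabulations. A minimal sketch of that calculation appears below (Python with pandas); the score values are hypothetical, and the bin edges simply mirror the column ranges used in the table.

    import pandas as pd

    # Hypothetical paired scores (illustration only).
    scores = pd.DataFrame({
        "rfi_level": [3, 3, 4, 5, 5, 6, 6, 7, 7, 7],
        "act_reading": [14, 18, 19, 22, 25, 27, 30, 31, 34, 36],
    })

    # Bin the ACT Reading Scale Scores into the ranges used in Table 6.2.
    bins = [0, 15, 19, 23, 27, 32, 36]
    labels = ["Below 16", "16-19", "20-23", "24-27", "28-32", "33-36"]
    scores["act_range"] = pd.cut(scores["act_reading"], bins=bins, labels=labels)

    # normalize="index" converts each row to percentages within that Level Score group.
    table = pd.crosstab(scores["rfi_level"], scores["act_range"], normalize="index") * 100
    print(table.round(2))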

Table 6.2
Conditional Distributions of Score Ranges for WorkKeys Reading for Information Level Scores and ACT Reading Scale Scores

        Reading for Information       ACT Reading Scale Scores (percent)
Year    Level Score          Below 16   16–19    20–23    24–27    28–32    33–36    Total
2002    Below 3                89.09     8.29     1.78     0.58     0.24     0.02     100
        3                      80.01    16.46     3.03     0.40     0.09     0.01     100
        4                      45.43    32.61    15.50     5.07     1.32     0.07     100
        5                      14.61    27.47    28.70    18.66     9.12     1.45     100
        6                       3.58    12.89    23.83    26.59    24.61     8.50     100
        7                       0.57     3.45    10.47    21.26    35.71    28.55     100
        Total percent          30.85    24.08    19.22    13.22     9.34     3.31     100
2003    Below 3                89.74     7.26     1.94     0.75     0.24     0.07     100
        3                      79.60    15.70     3.89     0.73     0.08     0.00     100
        4                      49.88    28.66    14.91     5.13     1.38     0.04     100
        5                      19.70    26.64    27.11    17.67     8.12     0.78     100
        6                       5.61    13.57    23.67    27.91    24.41     4.82     100
        7                       1.18     4.03    12.41    23.71    40.85    17.82     100
        Total percent          31.50    21.34    19.11    14.73    10.91     2.42     100


Figure 6.1
Boxplots of Scale Scores on ACT Reading at Each Level Score on WorkKeys Reading for Information

[Two boxplot panels, one for the 2002 administration and one for the 2003 administration. In each panel the vertical axis is the ACT Reading Scale Score (0–40) and the horizontal axis is the WorkKeys Reading for Information Level Score (0, 3, 4, 5, 6, 7). Each box shows the lower end of the distribution, the 1st quartile, the median, the 3rd quartile, and the upper end of the distribution.]


Criterion-Related Evidence

The Standards (1999) pays specific attention to the area of employment skills testing, stating that “the fundamental inference to be drawn from test scores in most applications of testing in employment settings is one of prediction: the test user wishes to make an inference from test results to some future job behavior or job outcome.” The required behavior, which might be the satisfactory completion of some aspect of job performance, is commonly called a job criterion. For example, a required job behavior may be that an employee be able to read and interpret technical materials associated with the job. Criterion-related evidence for validity might, therefore, show that a test taker with a certain score on a certain test can fairly be expected to read and interpret job-related technical materials that have a certain level of difficulty.

To collect criterion-related evidence, researchers administer the tests to a sample of job applicants or employees and compare the results to some measure of their observed work behavior, such as supervisor’s performance ratings. Two types of criterion-related evidence are commonly studied.

• In a predictive study, a test is administered to a group of job applicants, but is not used for selection decisions. Some of the applicants are hired, and some are not. Performance data are later collected for those hired, and the test scores are compared to the performance data. This process makes it possible to gather information about the accuracy with which test data can be used to estimate criterion scores obtained at a later time.

• In a concurrent study, a test is administered to job incumbents (hired applicants only), and the scores are compared to their current performance data, with no delay between the test administration and the collection of job performance ratings.

Performance is both multidimensional and dynamic (Borman, 1991; Austin and Villanova, 1992), so an employer’s performance measures will, to some extent, include something other than true performance. Measures such as performance ratings, absenteeism, and tardiness may fail to include or accurately measure relevant aspects of true performance, and they may address “contaminating aspects” that are not relevant to true performance.

Criterion-related evidence is typically underestimated due to restriction of range (because applicants who were not hired are excluded) and the less-than-perfect reliability of the two measures (e.g., test scores and supervisor ratings). For example, performance ratings have been known to vary due to rater errors (Woehr and Huffcutt, 1994), and ratings frequently fail to include all of the dimensions that are important to evaluating performance (Schmidt, 1993). As a result, a study using performance ratings as the criterion is limited in its ability to determine the true criterion-related validity of employment test scores. Both the performance ratings and the test scores are subject to random measurement error, which reduces their reliability. The correlation between the predictor variable and the criterion variable is limited by the reliability of each measure; it cannot exceed the square root of the product of the two reliabilities. Therefore, the values of the reported correlations between the test scores and the job ratings are attenuated due to the less-than-perfect reliability of the two measures. Despite these limitations, studies of correlations between test scores and criteria such as job performance ratings can provide additional, supporting evidence of validity for job skills tests such as Reading for Information.
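A minimal sketch of the classical correction for attenuation is shown below (Python). The observed correlation and reliability values are hypothetical, not taken from any WorkKeys study; the sketch simply illustrates how less-than-perfect reliability shrinks an observed validity coefficient relative to the underlying relationship.

    import math

    # Hypothetical values (illustration only).
    r_observed = 0.33   # observed test-criterion correlation
    rel_test = 0.85     # reliability of the test scores
    rel_ratings = 0.60  # reliability of the supervisor ratings

    # Classical (Spearman) correction for attenuation: an estimate of the
    # correlation if both measures were perfectly reliable.
    r_corrected = r_observed / math.sqrt(rel_test * rel_ratings)
    print(f"Observed r = {r_observed:.2f}, disattenuated r = {r_corrected:.2f}")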


Correlations between Reading for Information and Performance Ratings

As part of the process of validating WorkKeys tests for use in hiring decisions, the tests are administered to job incumbents. Then supervisors rate these incumbents on their job performance, and the results are compared. Because these ratings are usually based on overall job performance, they can include many factors not related to the WorkKeys assessment under study. Ratings can also focus on specific aspects of the employees’ job performance. The correlations between WorkKeys test scores and job performance ratings provide criterion-related evidence for the validity of using a WorkKeys test in relation to a specific job. The correlations reported here for Reading for Information are based on studies of the performance of job incumbents, not job candidates. This restriction of range tends to reduce the correlations between the test scores and job ratings; as noted above, criterion-related evidence is typically underestimated due to range restriction and the less-than-perfect reliability of the two measures.
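The sketch below illustrates the range-restriction effect with simulated data (Python); the true correlation and the selection rule are hypothetical and chosen only for illustration. Correlating the two variables within the "selected" upper portion of the predictor distribution yields a noticeably smaller coefficient than the correlation in the full group.

    import numpy as np

    rng = np.random.default_rng(0)

    # Simulate test scores and job performance with a true correlation of 0.5.
    cov = [[1.0, 0.5], [0.5, 1.0]]
    data = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=10_000)
    test, performance = data[:, 0], data[:, 1]

    # Full applicant pool versus "incumbents" selected from the top half of test scores.
    r_full = np.corrcoef(test, performance)[0, 1]
    selected = test > np.median(test)
    r_restricted = np.corrcoef(test[selected], performance[selected])[0, 1]

    print(f"Correlation in full pool:       {r_full:.2f}")
    print(f"Correlation among 'incumbents': {r_restricted:.2f}")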

Table 6.3 presents the results of an abbreviated selection of correlation studies, numbered 1 through 15 for reference only. These studies looked at the correlation between Reading for Information test scores and job performance ratings obtained from various organizations that studied the appropriateness of using WorkKeys test scores for job applicant selection. The jobs cover a wide spectrum that includes, for example, machine operators, lab technicians, clerks, supervisors, and social workers. Sample sizes and correlations vary from study to study; however, all the correlations are positive, ranging from 0.12 to 0.86. These correlations compare quite favorably with the correlations ranging from 0.20 to 0.30 typically found in the general research literature on criterion-related validity.

Table 6.3
Correlations between WorkKeys Reading for Information Scores and Job Performance Ratings

Study    Sample Size    Correlation
1             10           0.86
2             47           0.58
3             31           0.51
4             19           0.47
5             30           0.43
6             26           0.39
7             56           0.38
8             27           0.34
9            142           0.33
10            21           0.26
11            36           0.17
12           103           0.16
13           120           0.16
14           173           0.14
15           314           0.12


Classification Consistency

Another way to measure criterion-related validity evidence for WorkKeys foundational skills assessments is to examine the percentage of employees correctly classified by the tests, as shown in Table 6.4.

• Incumbent employees are classified as successful or less-successful employees based on a minimally acceptable cutoff score for supervisor ratings of their job performance.

• Employees can also be classified according to their WorkKeys test scores. If they achieve the minimum acceptable cutoff score, they are classified as successful; otherwise, they are classified as not successful.

Comparing the employees’ job performance classification with their WorkKeys test classification yields a measure of classification consistency for the WorkKeys assessments. Correctly classified employees are those who were classified the same way by both measures. That is, the total number of correctly classified employees is the number classified as successful by both measures plus the number classified as unsuccessful by both measures. Employees who are not classified the same way by both measures are considered misclassified.
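A minimal sketch of this classification-consistency calculation is shown below (Python); the scores, ratings, and cutoffs are hypothetical and are not drawn from the studies reported in Table 6.4.

    # Hypothetical data: one WorkKeys Level Score and one supervisor rating per employee.
    workkeys_levels = [3, 4, 5, 5, 6, 3, 4, 6, 5, 4]
    ratings = [2.5, 3.8, 4.1, 3.0, 4.5, 3.2, 2.9, 4.0, 3.9, 3.5]

    workkeys_cutoff = 4   # minimum Level Score treated as "successful"
    rating_cutoff = 3.5   # minimum rating treated as "successful"

    # An employee is correctly classified when both measures agree
    # (successful on both, or unsuccessful on both).
    agreements = sum(
        (level >= workkeys_cutoff) == (rating >= rating_cutoff)
        for level, rating in zip(workkeys_levels, ratings)
    )
    percent_correct = 100 * agreements / len(workkeys_levels)
    print(f"Correctly classified: {percent_correct:.0f}%")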

Classification consistency is typically underestimated due to restrictions in the range of scores and the less-than-perfect reliability of the two measures. Consistency is also affected if the cutoff score is set too high or too low. For hiring decisions, a cutoff score generally represents the minimum acceptable skill level. In other situations, however, this cutoff score might be set at a more desirable skill level that exceeds minimum requirements. Setting a higher cutoff has the effect of reducing the percentage of correct classifications when incumbents are tested.

With these caveats in mind, Table 6.4 presents data from selected classification consistency studies in which Reading for Information test scores were compared to job performance ratings provided by participating employers from various business sectors. Their participation also helped them consider the appropriateness of using WorkKeys foundational skills assessments to select job applicants or to set training goals. In these studies, the percentage of employees classified the same way by both measures ranged from 71 percent to 79 percent.

Table 6.4
Job Classification Consistency of Reading for Information

Study    N      WorkKeys Cutoff Score    Percent Correctly Classified
A         33    Level 3                  79
B        120    Level 6                  75
C        103    Level 5                  73
D         56    Level 3                  71


Content-Related Evidence

The Uniform Guidelines indicates that employers using a content validation strategy should focus on observable work behaviors as the aspect of job performance to which they link the content of the test. The test may measure a type of knowledge, skill, or ability needed to perform these observable work behaviors. However, the Uniform Guidelines states that a content validation approach is not sufficient for justifying the use of tests that measure mental processes, such as “common sense” or “personality,” that are not directly observable or discernible through observable work behaviors. The Standards states that content-related validity evidence used for selecting, promoting, or classifying employees should be based on a job analysis that defines the work performed. The Uniform Guidelines further specifies that the job analysis should yield information regarding the critical work behaviors or tasks that the job comprises.

For WorkKeys foundational skills assessments like Reading for Information, content-related validity evidence can be established in employment settings by linking test scores to the set of job behaviors or job outcomes of interest. The WorkKeys job profiling procedure enables job profilers trained and authorized by ACT to conduct a job analysis to document content-related validity evidence for each WorkKeys assessment. During the profiling procedure, the job profilers work with focus groups of subject matter experts (SMEs). After meeting with the focus groups, job profilers prepare a report that includes the job profile data.

Test scores can also be linked to job behaviors using the Web-based SkillMap® Job Inventory program, which ACT developed for users seeking an alternative to the focus group format of job profiling. Experts familiar with the specified job participate in the SkillMap process without meeting as a group. When the job experts have completed their SkillMap tasks, SkillMap generates a job inventory. In this way, content-related validity evidence is documented, and the skill levels identified as required can then be used as criteria, or cutoff scores, on the specified WorkKeys foundational skills assessments.

WorkKeys Job Profiling and SkillMap are two different procedures that are both designed to meet the standards for content validation established in the Uniform Guidelines. Both of them are used to define critical job tasks, determine which WorkKeys skills are relevant to performing the tasks, and identify the level of skill required for performing them. Figure 6.2 summarizes how these two procedures meet the requirements of the Uniform Guidelines for content validation of assessments.


Figure 6.2
How ACT WorkKeys Job Analysis Procedures Meet Uniform Guidelines Requirements for Content Validation

Requirement: Conduct a job analysis that generates descriptions of job behaviors, descriptions of tasks, and measures of their criticality.
  Job Profiling: SMEs establish a list that describes behaviors and tasks, then rate each task for Importance and Relative Time Spent to yield a Criticality rating for each task.
  SkillMap: Job experts establish a list that describes behaviors and tasks, then rate each task for Importance and Relative Time Spent in order to yield a Criticality rating for each task.

Requirement: Demonstrate that the test is related to described job behaviors and tasks.
  Job Profiling: Job profilers report the percentage of tasks that require the skill.
  SkillMap: SkillMap job inventory software lists the tasks linked to a skill and shows the number of job experts who linked each task to the skill.

Requirement: Define skills in terms of observable work outcomes.
  Job Profiling: Each WorkKeys skill and skill level is defined with specific criteria and illustrated with multiple workplace examples. SMEs link these definitions to job behaviors and tasks.
  SkillMap: Each WorkKeys skill and skill level is defined with specific criteria and illustrated with multiple workplace examples. Job experts link these definitions to job behaviors and tasks.

Requirement: Explain how the skills are used to perform tasks or behaviors.
  Job Profiling: SMEs identify tasks that require the skill under review, link specific tasks to a skill level, and say how the level is used for the tasks.
  SkillMap: Job experts identify tasks that require the skill under review, link specific tasks to a skill level, and say how the level is used for the tasks. Job experts assign tasks to skills and skill levels.

Requirement: Make no decisions based on knowledge, skills, and abilities that can be learned quickly on the job or in training.
  Job Profiling: SMEs identify the skill level required for job entry. New hires should enter the job with this level, not learn it on the job.
  SkillMap: Job experts identify tasks performed at job entry and link them to skill levels. An algorithm compiles the results.

Requirement: Assess applicants on skills for higher-level jobs only if new hires may advance quickly.
  Job Profiling: SMEs identify the skill level required for performing the job on the first day. They may, in addition, set a higher skill level for performing the job effectively after training.
  SkillMap: Job experts identify the skill level required for performing the job on the first day. They may, in addition, set a higher skill level for performing the job effectively after training.

Requirement: Provide a rationale for setting cutoff scores.
  Job Profiling: SMEs identify cutoff skill levels by describing job tasks and linking skill level descriptions and examples to them.
  SkillMap: SkillMap uses an algorithm to set cutoff scores based on task criticality and the job experts’ assignment of tasks to skill levels.

Requirement: Use cutoff scores consistent with normal expectations of workers.
  Job Profiling: SMEs identify the cutoff skill level based on the normal requirements of the job, not on unusual job situations, desired capabilities, or their beliefs regarding their own skill levels.
  SkillMap: Job experts identify the cutoff skill level based on the normal requirements of the job, not on unusual job situations, desired capabilities, or their beliefs regarding their own skill levels.

Requirement: Do not use results supporting pass/fail decisions only to rank test takers.
  Job Profiling: WorkKeys scores show that test takers either have the required skill levels or do not have them. It is not appropriate to rank applicants based on their WorkKeys scores.
  SkillMap: WorkKeys scores show that test takers either have the required skill levels or do not have them. It is not appropriate to rank applicants based on their WorkKeys scores.

Requirement: Maintain documentation regarding validation efforts.
  Job Profiling: Job profilers present a full report documenting content-related validity evidence, and they retain all related worksheets and computer records.
  SkillMap: SkillMap generates content-related validity documentation and a thorough record of the entire process. Users may download SkillMap data.


Job Analysis and High-Stakes Employment Decisions

The WorkKeys foundational skills assessments are designed for use by anyone needing to determine if a person’s performance meets established standards. Frequently, for example, employers use WorkKeys to screen job applicants to verify that they have the skill levels required for performing the job. Employers, educators, and governmental agencies often want to identify skill gaps among employees or prospective employees to determine what training is needed and by whom. However, the use of WorkKeys in educational settings and employment training is typically less prone to legal ramifications than the use of the assessments for selecting and promoting employees. Any time an assessment is used to screen or differentiate applicants for a position, there is the potential that the use of the assessment may be legally challenged. This section focuses on the validation of the use of the WorkKeys foundational skills assessments in such high-stakes employment situations. That is, it explains the procedure necessary to justify the use of specific WorkKeys assessments and cutoff scores in employment decisions.

The United States Equal Employment Opportunity Commission, Civil Service Commission, Department of Labor, and Department of Justice adopted the Uniform Guidelines on Employee Selection Procedures (1978)—referred to here as the Uniform Guidelines—as rules for the nondiscriminatory use of employment tests. The Guidelines provide information for employers regarding the legally defensible use of such tests and methods for documenting the justification of test use. Failure to meet the standards set forth in the Uniform Guidelines may result in an employer’s inability to defend its use of a particular employment test.

When ACT developed the WorkKeys system, it adhered to the federal standards outlined in the Uniform Guidelines. In addition, the Standards for Educational and Psychological Testing (1999) and the Principles for the Validation and Use of Personnel Selection Procedures (2003)—referred to here as the Principles—provide technical and ethical information about developing meaningful assessments and using them properly. ACT used information regarding important issues spelled out in the Standards when developing the tests and scoring systems, and continues to consult the Principles when setting policies regarding test use.

The Standards explains validity as the degree to which evidence supports the use of tests and the interpretation of test scores in a given context. Thus, one may think of validation in an employment context as justifying the use of an employment test and justifying the interpretation of its scores for a particular decision, such as those regarding selection, placement, and promotion.


Adverse Impact and Validity Evidence

Discrimination in selection, placement, and promotion may be looked for when a test is shown to have an adverse impact on a group of people protected by Title VII of the Civil Rights Act of 1964 (Uniform Guidelines §1607.2A). To estimate whether adverse impact exists, an employer looks at the number of individuals hired compared to the number of applicants, and then compares the ratio (e.g., ten out of ten is 100 percent) for the majority group to the ratio for the minority group. If the ratio for the minority group is at least four-fifths the size of the ratio for the majority group, then no adverse impact is indicated. If it is less than four-fifths, then adverse impact is indicated (§1607.3A). This can also be expressed as:

If 100 percent of the applicants from a majority group are hired, and if less than 80 percent of the applicants from a minority group are hired, then adverse impact is indicated.
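A minimal sketch of the four-fifths calculation appears below (Python); the applicant and hire counts are hypothetical.

    # Hypothetical hiring data (illustration only).
    majority_applicants, majority_hired = 100, 60
    minority_applicants, minority_hired = 50, 20

    majority_rate = majority_hired / majority_applicants   # 0.60
    minority_rate = minority_hired / minority_applicants   # 0.40

    # The four-fifths (80 percent) rule compares the two selection rates.
    impact_ratio = minority_rate / majority_rate
    if impact_ratio < 0.80:
        print(f"Impact ratio {impact_ratio:.2f}: adverse impact is indicated.")
    else:
        print(f"Impact ratio {impact_ratio:.2f}: no adverse impact is indicated.")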

However, a testing procedure shown to have adverse impact will not be considered “discriminatory…if the procedure has been validated in accordance with these guidelines” (§1607.3A). When the results of a validity study show a relationship between the characteristics measured by the assessment and successful job performance, there is no discrimination even if adverse impact has been shown. Therefore, when the use of an employment assessment results in adverse impact, an employer may continue to use the assessment after:

• Demonstrating the business necessity of the test by validating its use.

• Demonstrating that the use of a similar test will not result in less adverse impact.

The WorkKeys system, when used appropriately, provides evidence that there is a relationship between the content of WorkKeys assessments and the content of a specified job, occupation, or curriculum.

Gender and Race/Ethnicity Analyses

Table 6.5 presents data comparing Level Scores for male and female test takers who took Reading for Information. There were statistically significant differences between males and females, but because only integer Level Scores are reported (decimals are not reported), both groups effectively scored at the same level. Because there is a potential for adverse impact with any cognitive ability test, employers should make sure that a well-documented job analysis links the job to the skills and the skills to the assessment tool. The cutoff score should be set at a level that is clearly appropriate, and the reasons for using that score should also be well documented.

The fact that statistically significant differences in cognitive ability test performance are typical between a majority and a minority group (for example, a difference of one standard deviation between Caucasians and African Americans) has been thoroughly researched and documented (Ryan, 2001). Performance on the WorkKeys foundational skills assessments is consistent with these findings. A review of Table 6.5 also shows that, for Reading for Information, statistically significant differences in test-taker scores by race/ethnicity were detected.


Table 6.5
Descriptive Statistics of Reading for Information Mean Level Scores by Gender and Race/Ethnicity

                                                          N        Mean   Standard Deviation
Gender           Female                                 627,236    4.60   1.203
                 Male                                   632,084    4.38   1.369
Race/Ethnicity   African American/Black, Non-Hispanic   249,720    4.06   1.253
                 Asian-American or Pacific Islander      27,488    4.53   1.338
                 Caucasian/White, Non-Hispanic          719,758    4.74   1.225
                 Hispanic/Latino                         43,248    3.89   1.469

With both the gender analysis and the race/ethnicity analysis, it was important to look at practical differences. A difference in mean Level Score of .50, or more, among the four race/ethnic groups was considered practically significant. A one-way analysis of variance was used to compare groups, and it indicated significant differences.

Using the performance level difference of .50 or more, results of a Bonferroni post hoc test determined that there were statistically significant and practical mean differences between Caucasians and African Americans, between Caucasians and Hispanic/Latinos, and between Asian Americans and Hispanic/Latinos.
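A minimal sketch of this kind of comparison is shown below (Python with SciPy); the group score arrays are hypothetical and far smaller than the samples in Table 6.5. It runs a one-way ANOVA across the groups and then applies a Bonferroni correction to pairwise comparisons.

    from itertools import combinations
    from scipy import stats

    # Hypothetical Level Scores for three groups (illustration only).
    groups = {
        "Group A": [5, 4, 5, 6, 4, 5, 5, 6],
        "Group B": [4, 4, 5, 3, 4, 5, 4, 4],
        "Group C": [3, 4, 3, 4, 4, 3, 5, 4],
    }

    # One-way ANOVA across all groups.
    f_stat, p_value = stats.f_oneway(*groups.values())
    print(f"ANOVA: F = {f_stat:.2f}, p = {p_value:.4f}")

    # Pairwise t tests with a Bonferroni adjustment to the p-values.
    pairs = list(combinations(groups, 2))
    for name_1, name_2 in pairs:
        t_stat, p = stats.ttest_ind(groups[name_1], groups[name_2])
        p_adjusted = min(1.0, p * len(pairs))
        print(f"{name_1} vs {name_2}: Bonferroni-adjusted p = {p_adjusted:.4f}")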

Conclusion

While these findings on adverse impact are consistent with existing research on validation of employment skill tests, an employer’s use of any assessment for employment decisions should be clearly linked to the critical tasks required for the job. The task and WorkKeys skill requirements for a job can be readily established through a validity study using, for example, the WorkKeys Job Profiling or SkillMap job analysis procedures. The results of such studies serve to establish which assessments are appropriate for high-stakes employment decisions.



Appendix 1

WorkKeys Assessment Formats, Administration Times, and Delivery Options*

Summary of Test Formats, Administration Times, and Delivery Options

WorkKeys Assessments          Items/Prompts    Computer      Paper/Pencil   Audio/Video   Spanish

Foundational Skills Tests
Applied Mathematics           33 items         55 min        45 min         No            Yes
Business Writing              1 prompt         30 min        30 min         No            No
Locating Information          38 items         55 min        45 min         No            Yes
Reading for Information       33 items         55 min        45 min         No            Yes
Applied Technology            34 items         55 min        45 min         No            Yes
Listening                     6 messages       N/A           40 min         Audio         No
Observation                   36 items         N/A           60 min         Video         No
Teamwork                      36 items         N/A           64 min         Video         No

Personal Skills Assessments
Performance                   55 items         10–15 min**   N/A            No            No
Talent                        165 items        30–35 min**   N/A            No            No
Fit                           102 items        10–20 min**   N/A            No            No

*Organizations can obtain assessments directly from ACT to administer themselves, or they can use secure test administration facilities available from the network of ACT WorkKeys service partners. An up-to-date list of WorkKeys test administration services is provided at www.act.org/workkeys/index

**Estimated test administration time


Appendix 2

Special Testing Arrangements: Accommodations for WorkKeys Assessments

Eligibility

Test takers with documented physical or learning disabilities who cannot complete the WorkKeys assessments in the standard time limits, using standard materials, and under standard conditions may, at the discretion of the test administrator, following review of disability documentation, be tested under special conditions and/or using special testing materials available from ACT.

ACT Guidelines for Testing Accommodation

Written documentation of evaluation and diagnosis of disability by a qualified professional within the past five years should be required for all requests for accommodation. The testing site is responsible for acquiring, keeping confidential, and maintaining such documentation for a period of at least one year.

• Documentation should clearly identify the disability for which the accommodation is to be given, i.e., the basis of the claim.

• There should be a clear statement of the functional limitations emanating from that disability that are known to impact the person’s ability to perform tasks in the assessment. A statement such as “Test taker has ADHD and therefore requires additional time” is not adequate. The statement identifies the diagnosis but fails to indicate any limitation. A limitation for this individual might be that memory skills are weak and the test taker needs to reread questions in order to understand the meaning implied.

• There must be a link between the disability, the limitation, and the tasks required for this particular testing situation. A learning disability in math does not imply difficulty with reading or written expression. Individuals with reading difficulties often have difficulty with written expression, but not always, and many individuals who have difficulty with written expression have no trouble with reading (either decoding or reading comprehension).

• Although accommodation in school does not necessarily imply the necessity of accommodation in standardized testing, in most cases a current Individual Educational Plan (IEP) prepared by appropriate academic/psychological staff for a student will be acceptable documentation for accommodation.

• Test administrators should keep all accommodation documentation on file for a period of at least one year after testing is completed and be able to supply that documentation if a testing site audit is conducted.

Note: ACT no longer reports accommodations on score reports; however, that information is stored in the database.

Large-Print Assessment Materials

ACT offers large-print (22 point) WorkKeys assessment booklets and answer documents. When a test taker records responses on a large-print answer document, testing personnel must transfer those responses, in the test taker’s presence, to a regular answer document for scoring by ACT.


Braille Assessment Materials

Braille assessment books are available for the following assessments: Applied Mathematics, Applied Technology, Locating Information, and Reading for Information.

Captioned Assessment Materials

Captioned videos are available for the Listening, Observation, Teamwork, and Writing assessments.

Reader/Signer

If a reader assists a test taker, assessments must be administered in a separate room to avoid disturbing other test takers. It is important that readers read the assessment exactly as printed, using a WorkKeys Reader’s script, with no interpretation. Likewise, sign language interpreters may supply no additional information. Some assessments, such as Locating Information, use numerous graphics that do not lend themselves to the use of a reader or signer. ACT cautions that using a reader or signer may substantially change the skill being measured in some assessments.

Assistance in Recording Responses

When a test taker is unable to mark responses on the regular WorkKeys answer document, testing personnel may offer one of the following accommodations:

• testing personnel mark the answer document as the test taker indicates the responses

• the test taker records the responses on the assessment booklet

• the test taker marks responses directly on the large-print answer document

• the test taker uses a typewriter or computer

If the test taker is recording responses on the assessment booklet or the large-print answer document, testing personnel must, in the test taker’s presence, transfer responses to a standard answer document for scoring by ACT. The same is true when a typewriter or computer is used to record answers, except for Listening and Writing and Business Writing, where the printed sheet may be attached to the answer booklet. In the case of Listening and Writing and Business Writing, spell check and grammar check devices on the computer must be disabled. If a test taker is giving verbal responses for testing personnel to record, the assessments must be administered in a separate room from other test takers.

Computer-Based Testing

All ACT Centers are required to have ADA-compliant workstations. These consist of a workstation with adjustable height and greater width, a standard PC, and an ADA kit including a keyboard with 1-inch square keys, an ergonomic track ball mouse, and Big Shot (screen magnification) software. All computer-based tests can be delivered under the same extended time conditions as paper-and-pencil tests (time-and-a-half, double time, three hours). The administrator simply selects the proper extension when assigning the test taker to a group.

Test Takers for Whom English is a Second Language

Test takers for whom English is a second language may use a foreign language dictionary. The test taker must supply the dictionary. The test administrator must check the dictionary before and after testing to ensure that it does not contain notes or other unauthorized testing aids.


Appendix 3

Readability and Grade Level of Reading for Information

The difficulty of reading selections in WorkKeys assessments is not based solely on readability formulas. The “Reading” section in the Encyclopedia of Educational Research provides a helpful overview of readability issues, stressing that “readable” means “understandable,” and that readability can be determined using judgment, measures, and/or readability formulas.

Readability formulas are generally used to analyze characteristics of text samples by comparing features such as sentence length or the familiarity of words in the text samples to those found in texts used at specified grade levels. These comparisons may be augmented by considering how well students at that grade level are actually able to comprehend the text. There are many methods and formulas for estimating the appropriateness of specified grade levels, and the results of them often vary. Some reading experts attach very little credibility to readability formulas.
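As one illustration, the sketch below (Python) computes the two readability indices used later in Table A6.2, using the published Flesch Reading Ease and Flesch-Kincaid Grade Level equations. The syllable counter is a rough vowel-group heuristic rather than the dictionary-based counts used by commercial tools, so its results will differ somewhat from those of Microsoft Word.

    import re

    def count_syllables(word: str) -> int:
        # Rough heuristic: count groups of consecutive vowels (minimum of one).
        return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

    def readability(text: str) -> tuple[float, float]:
        sentences = max(1, len(re.findall(r"[.!?]+", text)))
        words = re.findall(r"[A-Za-z']+", text)
        syllables = sum(count_syllables(w) for w in words)
        wps = len(words) / sentences   # words per sentence
        spw = syllables / len(words)   # syllables per word
        reading_ease = 206.835 - 1.015 * wps - 84.6 * spw
        grade_level = 0.39 * wps + 11.8 * spw - 15.59
        return reading_ease, grade_level

    sample = "Report any spill to your shift supervisor. Use the absorbent kit stored near the exit."
    ease, grade = readability(sample)
    print(f"Flesch Reading Ease: {ease:.1f}, Flesch-Kincaid Grade Level: {grade:.1f}")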

Table A6.1 shows the results of a 1994 study conducted by MetaMetrics, Inc. When developing their lexile scale, MetaMetrics used information about the comprehension of text by test takers (based on testing them using a cloze technique) to augment information about textual characteristics. In a special study, MetaMetrics used 30 samples of the stimulus text from WorkKeys Reading for Information Form 13AA to calculate a total-raw-score to lexile conversion table. The midpoints of the raw score scale ranges for each WorkKeys level were then used to identify the lexile associated with that skill level.

Table A6.1
MetaMetrics Lexile and Grade Range Compared to Flesch-Kincaid Grade Level for WorkKeys Reading for Information Form 13AA

Skill Level    Lexile    Grade Range of Reading Selections (MetaMetrics)    Mean Grade Level of Reading Selections (Flesch-Kincaid)
3                832     5                                                   5.2
4                981     Late 6, Early 7                                     5.7
5              1,161     11                                                  9.4
6              1,350     13–14 (Early college)                              10.5
7              1,721     Graduate school                                    12.0

Table A6.2 shows the results of a study conducted for Reading for Information text passages from three forms using the two readability formulas available in Microsoft® Word. The values presented are consistent with those obtained on other Reading for Information forms and represent grade levels that increase with the levels, as they should. In addition, the mean values are of a reasonable difficulty for the population of test takers.


Table A6.2
Mean Readability of Reading for Information Reading Selections in Forms C, D, and E

Skill Level    Mean Flesch Reading Ease(a)    Mean Flesch-Kincaid Grade Level(b)
3                      77.7                            5.6
4                      74.5                            6.3
5                      64.8                            8.5
6                      28.6                           11.9
7                      36.1                           12.0

(a) Higher numbers for Flesch Reading Ease indicate greater readability.
(b) Lower numbers for Flesch-Kincaid Grade Level indicate greater readability. Numbers for Flesch-Kincaid Grade Levels indicate the U.S. grade-level equivalent. No grade level higher than 12 is calculated or reported.

When assigning a Reading for Information level to workplace materials used in assessments, WorkKeys staff does not depend exclusively or even primarily on readability formulas. Rather, the final readability determination depends on a combination of factors listed in each level description. These factors can include any or all of the following: length of words, sentences, and/or stimuli; difficulty or unfamiliarity of the vocabulary; number of confusing or similar details; clarity of the writing; and number and type of conditional (if…then) statements. While certain types of documents are associated with certain levels, the complexity of both the stimulus and the items that go with it determines the level to which a particular stimulus-and-item set is assigned. For example, memos typically are categorized as appropriate to Levels 3, 4, or 5, but some memos are complex enough for Levels 6 or 7.


Works Cited

ACT. (1992). A strategic plan for the development and implementation of the WorkKeys system. [Internal report]. Iowa City, IA: American College Testing (now ACT).

ACT. (1994). WorkKeys Targets for Instruction: Reading for Information. Iowa City, IA: Author.

ACT. (1995a). WorkKeys guide for reviewers [Internal document]. Iowa City, IA: Author.

ACT. (1995b). WorkKeys SkillPro [Computer software]. Iowa City, IA: Author.

ACT. (2005, July). Continued analysis of the WorkKeys job profiling and assessment databases. [Internal technical report]. Iowa City, IA: Author.

ACT. (2007). National Curriculum Survey. Retrieved from www.act.org/research/policymakers/pdf/NationalCurriculumSurvey2006.pdf

American Management Association. (2001). AMA survey on workplace testing: Basic skills, job skills, psychological measurement. New York: Author.

Are they really ready to work? Employers’ perspectives on the basic knowledge and applied skills of new entrants to the 21st century U.S. workforce. (2006). [Washington, DC]: The Conference Board, Corporate Voices for Working Families, the Partnership for 21st Century Skills, and the Society for Human Resource Management. Retrieved from www.21stcenturyskills.org/documents/FINAL_REPORT_PDF09-29-06.pdf

Austin, J. T., and Villanova, P. (1992). The criterion problem: 1917–1992. Journal of Applied Psychology, 77(6), 836–874.

Borman, W. C. (1991). Job behavior, performance, and effectiveness. In M. D. Dunnette and L. M. Hough (Eds.), Handbook of industrial and organizational psychology (2nd ed., Vol. 2, pp. 271–326). Palo Alto, CA: Consulting Psychologists Press.

Brennan, R. L. (2001). Generalizability theory. New York: Springer-Verlag.

Carnevale, A. (2001, January). Help wanted… College required. Presented at Business Education Partnerships Conference, Chicago. Princeton, NJ: ETS Leadership 2000 series.

Chen, H., Gao, X., and Harris, D. J. (2006). Selecting the population: An investigation of the stability of item parameter estimates across populations. Paper presented at the Annual Meeting of the National Council on Measurement in Education, San Francisco, CA.

Code of fair testing practices in education. (2004). Washington, DC: Joint Committee on Testing Practices.

Code of professional responsibilities in educational measurement. (1995). Madison, WI: National Council on Measurement in Education.

Cronbach, L. J., Gleser, G. C., Nanda, H. I., and Rajaratnam, N. (1972). The dependability of behavioral measurement: Theory of generalizability of scores and profiles. New York: Wiley.

Education at a glance: OECD indicators 2004. (2004). Paris: Organisation for Economic Co-operation and Development.

Equal Employment Opportunity Commission, Department of Labor. (2000, revised). Uniform Guidelines on Employee Selection Procedures (1978). 43 FR 38290–38315 (August 25, 1978). 29 CFR part 1607.

Freeman, M. F., and Tukey, J. W. (1950). Transformation related to the angular and square root. The Annals of Mathematical Statistics, 21, 607–611.

Gao, X., Chen, H., and Harris, D. J. (2005). Consistency of equating functions across different equating designs, methods, and samples: An empirical investigation. Paper presented at the Annual Meeting of the American Educational Research Association, Montreal, Canada.

Gao, X., Harris, D. J., Yi, Q., and Lei, M. (2003). Examining consistency of item parameters estimated in pretest and operational test administrations. Paper presented at the Annual Meeting of the National Council on Measurement in Education, Chicago.

Guttman, L. (1950). The basis for scalogram analysis. In S. A. Stouffer, L. Guttman, E. A. Suchman, P. F. Lazarsfeld, S. A. Star, and J. A. Clausen (Eds.), Measurement and prediction (pp. 60–90). Princeton: Princeton University.

Human Resources and Skills Development Canada. (2004). Comparing classroom and workplace reading. Retrieved September 9, 2004, from www15.hrdc-drhc.gc.ca/awm/main/c_tf_readl_e.asp

Kolen, M. J. (1988). Traditional equating methodology. Educational Measurement: Issues and Practice, 7(4), 29–36.

Kolen, M. J., and Brennan, R. L. (2004). Test equating, scaling, and linking (2nd ed.). New York: Springer-Verlag.

Kolen, M. J., Hanson, B. A., and Brennan, R. L. (1992). Conditional standard errors of measurement for scale scores. Journal of Educational Measurement, 29, 285–307.

Kolen, M. J., Zeng, L., and Hanson, B. A. (1996). Conditional standard errors of measurement for scale scores using IRT. Journal of Educational Measurement, 33, 129–140.

Lee, W., Brennan, R. L., and Hanson, B. A. (2000). Procedures for computing classification consistency and accuracy indices with multiple categories. ACT Research Report Series. Iowa City, IA: ACT.

Levy, F., and Murnane, R. J. (2006). How computerized work and globalization shape human skill demands. (Working paper), 1–25.

Mislevy, R. J., and Bock, R. D. (1990). BILOG 3: Item analysis and test scoring with binary logistic models (2nd ed.). Mooresville, IN: Scientific Software.

National Association of Manufacturers. (2005). 2005 skills gap report: A survey of the American manufacturing workforce. Washington, DC: Author.

Pearson, P. D. (1992). Reading. In Encyclopedia of Educational Research (4th ed., Vol. 3, pp. 1075–1085). New York: Macmillan.

Petersen, N. S., Kolen, M. J., and Hoover, H. D. (1989). Scaling, norming, and equating. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 221–262). New York: American Council on Education and Macmillan.

Prediger, D. J. (1976). A World-of-Work Map for career exploration. Vocational Guidance Quarterly, 24, 198–208.

Ryan, A. M. (2001). Explaining the black-white test score gap: The role of test performance. Human Performance, 14(1), 45–75.

Schmidt, F. L. (1993). Personnel psychology at the cutting edge. In N. Schmitt and W. C. Borman (Eds.), Personnel selection in organizations (pp. 497–516). San Francisco: Jossey-Bass.

Schulz, E. M., Kolen, M. J., and Nicewander, W. A. (1997). A study of modified-Guttman and scale scores using IRT. Journal of Educational Measurement, 33, 129–140.

Schulz, E. M., Kolen, M. J., and Nicewander, W. A. (1999). A rationale for defining achievement levels using IRT-estimated domain scores. Applied Psychological Measurement, 23, 347–362.

Society for Industrial and Organizational Psychology. (2003). Principles for the validation and use of personnel selection procedures (4th ed.). Bowling Green, OH: Author.

Standards for educational and psychological testing. (1999). Washington, DC: American Educational Research Association, American Psychological Association, and National Council on Measurement in Education.

Stocking, M. L., and Lord, F. M. (1983). Developing a common metric in item response theory. Applied Psychological Measurement, 7, 201–210.

Subkoviak, M. J. (1984). Estimating the reliability of mastery-nonmastery classifications. In R. A. Berk (Ed.), A guide to criterion-referenced test construction (pp. 267–290). Baltimore: The Johns Hopkins University Press.

U.S. Chamber of Commerce. (Winter 2002). Higher skills: Bottom-line results: A Chamber guide to improving workplace literacy. Washington, DC: Center for Workforce Preparation.

U.S. Department of Education, Institute of Education Sciences. (2005). Key concepts and features of the 2003 National Assessment of Adult Literacy (NCES 2006-471).

U.S. Equal Employment Opportunity Commission. Title VII of the Civil Rights Act of 1964 (Pub. L. 88-352) (Title VII), as amended, as it appears in volume 42 of the United States Code, beginning at section 2000e.

Woehr, D. J., and Huffcutt, A. I. (1994). Rater training for performance appraisal: A quantitative review. Journal of Occupational and Organizational Psychology, 67(3), 189–205.

Workplace Essential Skills. (1999). From LiteracyLink: A joint project of the Kentucky Educational Television and the Public Broadcasting System funded by the U.S. Department of Education. Accessed at www.literacylink.net

Zimowski, M. F., Muraki, E., Mislevy, R. J., and Bock, R. D. (1996). BILOG-MG: Multiple-group item analysis and test scoring. Chicago: Scientific Software International.