
Validity of TOMAGS Results - Prufrock


Validity refers to the degree to which an assessment instrument measures the attributes that its author says it measures. Gronlund and Linn (1990) suggested that

1. Validity refers to the appropriateness of the interpretation of the results of a test or evaluation instrument for a given group of individuals, and not to the instrument itself. . . . It is more correct to speak of the validity of the interpretation to be made from the results.

2. Validity is a matter of degree; it does not exist on an all-or-none basis. . . . Validity is best considered in terms of categories that specify degree, such as high validity, moderate validity, and low validity.

3. Validity is always specific to some particular use or interpretation. No test is valid for all purposes. . . . Thus, when appraising or describing validity, it is necessary to consider the specific interpretation to be used or made of the results. Evaluation results are never just valid; they have a different degree of validity for each particular interpretation to be made. (pp. 49-50)

6 Validity of TOMAGS Results


Three interrelated types of validity are often considered: content, criterion-related, and construct validity. The following sections describe how each type of validity was examined with relation to the TOMAGS.

Content Validity

Content validity involves the "systematic examination of the test content to determine whether it covers a representative sample of the behavior domain to be measured" (Anastasi & Urbina, 1997, pp. 114-115). Obviously, this kind of validity has to be built into an assessment instrument when the items are constructed. Instrument developers usually address content validity by showing that the abilities chosen to be measured are consistent with the current knowledge about a particular area, as well as by demonstrating that the items hold up statistically.

Two demonstrations of content validity are offered for the TOMAGS. First, a rationale for the scale's content and format is presented. Second, the validity of the items is empirically demonstrated by the results of "classical" item analysis procedures that were used during the developmental stages of the scale's construction.

Rationale Behind Content and Format Selection

The purpose of this section is twofold: first, to describe the process used to select the "best" items, and second, to demonstrate that the TOMAGS was developed using ideas about content and format that are drawn from the existing research literature dealing with mathematical talent in elementary students. Content validation is a valuable first phase in the construction of a measure. The process begins by identifying, understanding, and defining the constructs or characteristics that will be measured.

In Chapter 1, we briefly discussed the three principles that guided the development of the TOMAGS as a measure of mathematical giftedness in elementary school-aged children. To reiterate, the principles are as follows:

1. The test should be aligned to the NCTM curriculum and evaluation standards.

2. The test should require a student to use the attributes of good mathematical thinkers.

3. Mathematical talent should be measured using open-ended questions in a problem-solving format.

These three principles were derived from an in-depth analysis of the NCTM standards and a review of the literature in assessment of giftedness in mathematics.

Choosing the "Best" Items

The TOMAGS is a result of several years of pilot testing with students who were nominated for gifted mathematics programs or who were identified as gifted in mathematics. In 1994 the authors of the TOMAGS were approached by the director of gifted programs in a medium-sized school district in the South to assist in the identification of children who were gifted in mathematics. Numerous excellent sources identified or suggested critical items related to identifying giftedness in mathematics. These sources included Curriculum and Evaluation Standards for School Mathematics (National Council of Teachers of Mathematics, 1989); Topics in Contemporary Mathematics (Bello & Britton, 1993); Mathematics for Elementary Teachers (Krause, 1991); Addison-Wesley Mathematics (Addison-Wesley, 1991); Mathematics in Action (Macmillan/McGraw-Hill, 1991); Exploring Mathematics (Scott, Foresman, & Co., 1991); Mathematics Unlimited (Harcourt, Brace, Jovanovich, 1991); Mathematics Exploring Your World (Silver, Burdett, & Ginn, 1991); Mathematics (Houghton Mifflin, 1991); Investigations in Number, Data, and Space (TERC, 1991); and numerous issues of the Arithmetic Teacher and the Mathematics Teacher journals, both publications of the National Council of Teachers of Mathematics.

After reviewing these sources, items for the first pilot version of the TOMAGS were written. At this stage, the TOMAGS consisted of four separate tests for Grades 2 through 5, called the TOMAGS 2 through the TOMAGS 5. The TOMAGS 2 consisted of 28 items, the TOMAGS 3 consisted of 26 items, the TOMAGS 4 consisted of 35 items, and the TOMAGS 5 consisted of 32 items. These TOMAGS were reviewed by 25 teachers of gifted children and three other professionals in the area of gifted education. In addition, 372 children who were gifted in mathematics took the TOMAGS in the fall of 1994, and an item analysis was conducted. Based on the results of the item analysis, 4 items were removed and 17 items were added to the TOMAGS 2, which then consisted of 41 items; 5 items were removed and 21 items were added to the TOMAGS 3, which then consisted of 42 items; 8 items were removed and 14 items were added to the TOMAGS 4, which then consisted of 41 items; and 7 items were removed and 4 items were added to the TOMAGS 5, which then consisted of 29 items. At this point, we knew which of our original items worked well with children already identified as gifted and which items did not work well with this population. Our next task was to determine the degree to which the tests discriminated between children gifted in mathematics and children not gifted in mathematics.
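The item counts reported for the first revision can be checked with a few lines of arithmetic. The sketch below uses only the figures given in the paragraph above; the dictionary layout and the function name `revised_count` are illustrative choices, not part of the TOMAGS materials.

```python
# Item counts for the first pilot (fall 1994) and the changes made after the
# first item analysis. All figures come from the text; the data structure and
# names are illustrative only.
first_pilot = {
    "TOMAGS 2": {"initial": 28, "removed": 4, "added": 17},
    "TOMAGS 3": {"initial": 26, "removed": 5, "added": 21},
    "TOMAGS 4": {"initial": 35, "removed": 8, "added": 14},
    "TOMAGS 5": {"initial": 32, "removed": 7, "added": 4},
}


def revised_count(form):
    """Number of items after removals and additions."""
    return form["initial"] - form["removed"] + form["added"]


revised = {name: revised_count(form) for name, form in first_pilot.items()}

for name, count in revised.items():
    print(f"{name}: {count} items after the first item analysis")
```

Each computed total agrees with the count stated in the text (41, 42, 41, and 29 items).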

With this in mind, in the spring of 1995, we piloted the rewritten TOMAGS, referred to as the second pilot, with 719 children who were nominated for a gifted mathematics program. We next compared the scores of two groups of children: those in the second pilot who were eventually identified as gifted in mathematics and those in the second pilot who were not identified as gifted in mathematics. We calculated a t-test with these two groups' mean raw scores and concluded that the TOMAGS discriminated well between groups of children identified and not identified as gifted in mathematics.

We also conducted a second item analysis using the data generated from our second pilot. Based on these results, we took our best items and wrote two versions of the TOMAGS, a Primary version consisting of 36 items and an Intermediate version consisting of 47 items. These two versions were piloted again in the spring of 1996 and a third item analysis was conducted. The results of the item analysis were for the most part satisfactory, although 4 items were deleted and 7 items were added to the TOMAGS Primary, for a total of 39 items, and 8 items were deleted and 13 items were added to the TOMAGS Intermediate, for a total of 52 items. These became the norming versions of the tests.

Relationship of the Items to the NCTM Standards

In Chapter 1, the NCTM curriculum standards for Grades K through 4 and 5 through 8 were outlined. As discussed earlier, the first three standards for both grade intervals are mathematics as problem solving, mathematics as communication, and mathematics as reasoning. These three standards are reflected in the overall format of the TOMAGS in that they use a mathematical problem-solving and reasoning format and require the child to communicate mathematically. In addition, items were written specifically to reflect particular standards. Some items could be aligned to more than one standard. For example, Item 11 on the TOMAGS Primary could be aligned to both (a) Geometry and Spatial Sense and (b) Fractions and Decimals. For the TOMAGS we chose to incorporate some standards within others. Fractions and Decimals is one example.

The TOMAGS Primary was developed for children ages 6-0 through 9-11. The K through 4 standards used to construct the TOMAGS Primary are reflected in Table 6.1 with the aligned test items. Additional standards we chose to reflect in the development of the TOMAGS Primary are Number Sense and Numeration, Concepts of Whole Number Operations and Whole Number Computation, Geometry and Spatial Sense, Measurement, Statistics and Probability, and Patterns and Relationships.

TABLE 6.1
Alignment of the TOMAGS Primary to the NCTM K Through 4 Curriculum Standards

Standards                                                           Item Numbers
Number Sense and Numeration                                         2a, 2b1, 2b2, 2c, 2d1, 2d2, 4a, 4b
Concepts of Whole Number Operations and Whole Number Computation    3a, 3b, 5a, 5b, 6
Geometry and Spatial Sense                                          8a, 8b, 9a, 9b, 11a, 11b
Measurement                                                         7a, 7b, 10, 12a, 12b, 12c, 12d
Statistics and Probability                                          13a, 13b, 13c, 13d
Patterns and Relationships                                          1a1, 1a2, 1b1, 1b2, 1c1, 1c2, 1d1, 1d2

Items 2a, 2b1, 2b2, 2c, 2d1, 2d2, 4a, and 4b relate to Number Sense and Numeration. Number Sense and Numeration includes an understanding of the numeration system in terms of counting, grouping, and place-value concepts, as well as a developing number sense (i.e., number meanings). Items 2a through 2d are related to grouping and place value, whereas Items 4a and 4b are related to number sense.

Items 3a, 3b, 5a, 5b, and 6 relate to Concepts of Whole Number Operations and Whole Number Computation. Concepts of Whole Number Operations include understanding the fundamental operations, which requires a recognition of conditions that indicate when a particular operation would be useful. Closely related to this standard is Whole Number Computation, which includes using a variety of mental computations, developing reasonable proficiency with basic facts and algorithms, and selecting and using computation techniques appropriate to specific problems. Items 3a, 3b, 5a, and 5b require the child to recognize which operation would be useful and use that operation to solve a problem. Item 6 requires the child to illustrate an understanding of the word sum by choosing two numbers that sum to 12.

Items 8a, 8b, 9a, 9b, 11a, and 11b relate to Geometry and Spatial Sense. This standard includes relating geometric ideas to number and measurement ideas, drawing shapes, developing spatial sense, and predicting the results of combining, subdividing, and changing shapes. Items 8a and 8b require the child to subdivide a two-dimensional shape into fractional parts. Items 9a and 9b require the child to draw a shape and illustrate their understanding of the meaning of the geometric term perimeter. Item 11 requires the child to change and add to a shape and at the same time to illustrate spatial sense and an understanding of fractions.

Items 7a, 7b, 10, 12a, 12b, 12c, and 12d relate to Measurement. Measurement includes an understanding of the attributes of length, capacity, weight, mass, area, volume, time, temperature, and angle and an ability to make and use estimates of measurement. Items 7a and 7b require the child to choose the best estimate of a measurement. Item 10 requires the child to illustrate an understanding of volume and other related measurement terms. Items 12a through 12d require the child to illustrate an understanding of a variety of measurement units.

Items 13a, 13b, 13c, and 13d relate to Statistics and Probability. This standard includes an understanding of how to organize and construct displays of data. Items 13a through 13d require the child to organize and construct data using a table.

Items 1a1, 1a2, 1b1, 1b2, 1c1, 1c2, 1d1, and 1d2 relate to Patterns and Relationships. Patterns and Relationships includes recognizing, describing, and extending a wide variety of patterns. Items 1a through 1d require the child to extend a series of patterns that become increasingly difficult.

Because the TOMAGS Intermediate was developed for children ages 9-0 through 12-11, a combination of the K-4 and 5-8 curriculum standards was used to construct the test. The standards used to construct the TOMAGS Intermediate are reflected in Table 6.2 with the associated test items. In the development of the TOMAGS Intermediate, we chose to use these standards: Number and Number Relationships, Number Systems and Number Theory, Computation and Estimation, Patterns and Functions, Algebra, Statistics and Probability, Geometry and Spatial Sense, and Measurement.

Items 7a, 7b, 7c, and 14 relate to Number and Number Relationships, which includes understanding, representing, and using numbers in a variety of equivalent forms, applying ratios, and investigating relationships among fractions, decimals, and percents. Items 7a through 7c require the child to change a percent to a fraction, a fraction to a decimal, and a decimal to a percent. Item 14 requires the child to form a ratio to solve a problem.

TABLE 6.2
Alignment of the TOMAGS Intermediate to the NCTM K Through 4 and 5 Through 8 Curriculum Standards

Standards                             Item Numbers
Number and Number Relationships       7a, 7b, 7c, 14
Number Systems and Number Theory      5a, 5b, 6a, 6b
Computation and Estimation            3a, 3b, 4a, 4b
Patterns and Functions                1a1, 1a2, 1b1, 1b2, 1c1, 1c2
Algebra                               2a, 2b, 2c, 2d, 2e, 9a, 9b
Statistics and Probability            15a, 15b, 15c, 16a, 16b, 16c, 16d
Geometry and Spatial Sense            11a, 11b, 11c, 12a, 12b, 12c, 12d, 13a, 13b, 13c, 13d
Measurement                           8a, 8b, 10a, 10b

Items 5a, 5b, 6a, and 6b relate to Number Systems and Number Theory. This standard includes using order relations for numbers of all kinds, extending understanding of whole number operations to fractions, decimals, integers, and rational numbers, and developing and applying number theory (e.g., factors). Item 5a requires the child to find two factors of a whole number and Item 5b requires the child to find a missing addend. Items 6a and 6b require the child to find two addends for a fraction and two factors for a decimal and to illustrate an understanding of mathematical terms.

Items 3a, 3b, 4a, and 4b relate to Computation and Estimation, which includes computing with numbers of all kinds; using computation, estimation, and proportions to solve problems; and using estimation to check reasonableness of answers. Items 3a, 3b, 4a, and 4b all require the child to use computational methods to solve problems and to employ mental arithmetic and estimation to assist in solving the problems.

Items 1a1, 1a2, 1b1, 1b2, 1c1, and 1c2 relate to Patterns and Functions. Patterns and Functions includes describing, extending, and analyzing a wide variety of patterns. Items 1a through 1c require the child to extend three patterns that become increasingly difficult.

Items 2a, 2b, 2c, 2d, 2e, 9a, and 9b relate to Algebra. This standard includes solving linear equations using concrete, informal, and formal methods and applying algebraic methods to solve a variety of mathematical problems. Items 2a through 2e and Items 9a and 9b require the child to informally set up a series of linear equations to solve mathematical problems.

Items 15a, 15b, 15c, 16a, 16b, 16c, and 16d relate to Statistics and Probability, which are two separate standards for Grades 5 through 8 and a single standard for Grades K through 4 (as described previously). Statistics includes collecting, organizing, and describing data, and reading and interpreting graphs. Probability includes carrying out experiments or simulations to determine probabilities. Items 15a through 15c require the child to read and interpret a graph, and Items 16a through 16d require the child to carry out a simulation to determine the probability of an event.

Items 11a, 11b, 11c, 12a, 12b, 12c, 12d, 13a, 13b, 13c, and 13d relate to Geometry and Spatial Sense. This standard was discussed previously. Items 11a through 11c require the child to change and add to a shape and at the same time to illustrate spatial sense and an understanding of fractions. Items 12a through 12d require the child to use spatial sense. Items 13a through 13d require the child to draw a shape and calculate its area.

Items 8a, 8b, 10a, and 10b relate to Measurement, as described for the TOMAGS Primary. Items 8a and 8b require the child to choose the best estimate of a measurement. Items 10a and 10b require the child to illustrate an understanding of a variety of measurement units.

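TABLE 6.3
Median Discriminating Powers for the TOMAGS Primary at Four Ages and All Ages (Decimals Omitted)

          Age
Group     6     7     8     9     All
Normal    46    36    39    30    31
Gifted    36    41    41    46    44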

"Classical" Item Analysis

In the previous section, evidence for the TOMAGS's content validity was provided. This section provides evidence for the scale's content validity in the form of an item-discrimination analysis, a type of analysis that is traditional, time tested, and indispensable. According to Anastasi and Urbina (1997), "Item discrimination refers to the degree to which an item differentiates correctly among test takers in the behavior that the test is designed to measure" (p. 179). The point biserial correlation technique, in which each item is correlated with the total scale score, was used to determine item discrimination (sometimes called discriminating power or item validity). Garrett (1965) noted that items with a point biserial (i.e., discriminating power) of .20 or more "can be taken to be valid if the test is fairly long. In a short test, items of higher validity are needed" (p. 233). Unfortunately, neither Garrett nor other authorities have indicated how many items an assessment instrument must have in order to be considered "fairly long." Nor have these authorities provided guidance concerning how much higher the coefficients should be in order to be considered satisfactory on shorter instruments. Thus, we decided to apply conventions governing the interpretation of validity coefficients to the interpretation of discriminating powers. Anastasi and Urbina (1997) suggested that statistically significant coefficients of .2 or .3 can be considered acceptable. For the present purpose, we arbitrarily selected the more conservative value of .3 to ensure that the items retained for the TOMAGS would be acceptable.

Based on the procedures previously described, the unsatisfactory items (those not satisfying the criteria described above) were deleted from the pilot versions of the test. The final item analysis conducted with the entire norming sample resulted in 39 good items for the TOMAGS Primary and 47 good items for the TOMAGS Intermediate. The good items (those that satisfied the item discrimination criterion) make up the current versions of the instruments.

To demonstrate conclusively the item characteristics of the final version, a final item analysis was performed. In this study, the entire normative sample served as participants. Median item discrimination coefficients were computed for each age interval for both the TOMAGS Primary and the TOMAGS Intermediate. The resulting median item discrimination coefficients are reported by age for the two normative groups (normal and gifted) in Tables 6.3 and 6.4. All of the scale items satisfy the requirements previously described and provide quantitative evidence of content validity.
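The item-discrimination procedure described in this section can be illustrated with a short sketch. It correlates each item with the total score and keeps items that reach the .3 criterion. The response matrix is hypothetical (it is not TOMAGS data), items are assumed to be scored 0 or 1, and the names `pearson_r`, `responses`, and `CRITERION` are illustrative. With a 0/1 item, the Pearson correlation computed here is the point biserial correlation.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation; for a 0/1 item this equals the point biserial."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical responses: one row per child, one column per item (1 = correct).
responses = [
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
]
totals = [sum(row) for row in responses]  # total scale score per child

CRITERION = 0.30  # the more conservative value chosen in the text

discrimination = {}
for item in range(4):
    item_scores = [row[item] for row in responses]
    discrimination[item + 1] = pearson_r(item_scores, totals)

retained = [item for item, r in discrimination.items() if r >= CRITERION]

for item, r in discrimination.items():
    print(f"Item {item}: discriminating power = {r:.2f}")
print("Retained items:", retained)
```

In this made-up data set, the fourth item is unrelated to the total score and would be dropped under the .3 criterion, while the other three would be kept.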

TABLE 6.4
Median Discriminating Powers for the TOMAGS Intermediate at Four Ages and All Ages (Decimals Omitted)

          Age
Group     9     10    11    12    All
Normal    30    34    42    33    33
Gifted    32    35    32    32    31

Criterion-Related Validity

Wallace, Larsen, and Elksnin (1992) stated that criterion-related validity is used either to compare an assessment instrument with a valued measure having similar characteristics (concurrent validity) or to predict the future performance of a student (predictive validity). Although the predictive validity of the TOMAGS has yet to be explored, the test has been examined with respect to concurrent validity. Concurrent validity is derived by correlating an instrument with other established measures of similar constructs, in this case, mathematical aptitude.

For the TOMAGS Primary, three concurrent validity studies were conducted. Study 1 investigated the concurrent validity of the TOMAGS Primary by correlating its Quotient score with the Total School Ability Index (SAI) of the Otis-Lennon School Ability Test (OLSAT; Otis & Lennon, 1990). The OLSAT Total SAI examines verbal comprehension, verbal reasoning, pictorial reasoning, figural reasoning, and quantitative reasoning. Study 1 was conducted with 69 children ranging in age from 7-8 to 8-6 in a medium-sized school district in the South. Thirty-seven children were identified as gifted in mathematics and 32 were nominated for the gifted mathematics program but were not selected.

Study 2 investigated the concurrent validity of the TOMAGS Primary by correlating its Quotient score with the Quantitative score of the Cognitive Abilities Test (CogAT; Thorndike & Hagen, 1986). The CogAT Quantitative Battery consists of three subtests: Quantitative Relations, Number Series, and Equation Building. Children's ability to identify relationships among numbers, complete a series of numbers, and select numbers and symbols to match an answer is assessed. Study 2 was conducted with 22 children identified as gifted in mathematics, ranging in age from 8-6 to 9-5 in a small-sized school district in the West.

TABLE 6.5
Correlation Between TOMAGS Primary and Selected Tests (Decimals Omitted)

Criterion Measures                      TOMAGS Primary
OLSAT Total School Abilities Index      67
CogAT Quantitative Battery              73
SAT Mathematics Total                   62

Note. OLSAT = Otis-Lennon School Ability Test (Otis & Lennon, 1990); CogAT = Cognitive Abilities Test (Thorndike & Hagen, 1986); SAT = Stanford Achievement Test (Gardner et al., 1982).

Study 3 correlated the TOMAGS Primary Quotient score with the Mathematics Total Score of the Stanford Achievement Test (SAT; Gardner, Rudman, Karlsen, & Merwin, 1982). The SAT Mathematics Total consists of three subtests: Concepts of Number, Mathematics Computation, and Mathematics Applications. The Mathematics Total examines children's comprehension of whole numbers and place value, fractions, operations and properties; children's ability to add, subtract, multiply, divide, and problem solve mathematically; and their knowledge of geometry, measurement, and graphs. Study 3 was conducted with 29 children ranging in age from 7-6 to 8-5 in a medium-sized school district in the West. Twelve children were identified as gifted in mathematics and 17 were not identified.

The resulting correlations were all statistically significant (see Table 6.5), supporting the concurrent validity of the TOMAGS Primary.
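The text does not say which significance test was applied to these correlations. As a rough check, the sketch below assumes the conventional t test for a Pearson correlation, t = r * sqrt((n - 2) / (1 - r^2)) with n - 2 degrees of freedom, and applies it to the coefficients in Table 6.5 and the sample sizes given for Studies 1 through 3. The function name `t_for_r` and the choice of a two-tailed .05 level are assumptions made for illustration.

```python
import math


def t_for_r(r, n):
    """t statistic for testing a Pearson r against zero (df = n - 2)."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


# (criterion measure, r from Table 6.5, sample size from the study description)
studies = [
    ("OLSAT Total SAI", 0.67, 69),
    ("CogAT Quantitative", 0.73, 22),
    ("SAT Mathematics Total", 0.62, 29),
]

# Two-tailed .05 critical value of t for 20 df, the smallest df among the
# three studies. Critical values for larger df are smaller, so this single
# cutoff is conservative for the other two studies.
T_CRITICAL = 2.086

results = {name: t_for_r(r, n) for name, r, n in studies}

for name, t in results.items():
    print(f"{name}: t = {t:.2f}, exceeds critical value: {t > T_CRITICAL}")
```

Under these assumptions, all three coefficients clear the cutoff comfortably, which is consistent with the statement that the correlations were statistically significant.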

For the TOMAGS Intermediate version, two concurrent validity studies were conducted. Study 1 investigated the concurrent validity of the TOMAGS Intermediate by correlating its Quotient score with the CogAT Quantitative score (see discussion of TOMAGS Primary). This study was conducted with 55 children identified as gifted in mathematics, ranging in age from 9-2 to 12-4. The children were from medium- or large-sized school districts in the North Central or West.

Study 2 investigated the concurrent validity of the TOMAGS Intermediate by correlating its Quotient score with the Mathematics Total score of the Iowa Tests of Basic Skills (ITBS; Hieronymous & Hoover, 1985). The ITBS Mathematics Total consists of three subtests: Mathematics Concepts, Problem Solving, and Computation. The authors of the ITBS stated that one of its primary uses is to diagnose specific strengths and weaknesses in children's educational development. Study 2 was conducted with 38 children, ranging in age from 10-6 to 12-8. Sixteen of the children were from a North Central small-sized school district and were not identified as gifted in mathematics, whereas 22 of the children were from a North Central large-sized school district and were identified as gifted in mathematics.

The resulting correlations were all statistically significant (see Table 6.6), supporting the concurrent validity of the TOMAGS Intermediate.

TABLE 6.6
Correlation Between TOMAGS Intermediate and Selected Tests (Decimals Omitted)

Criterion Measures              TOMAGS Intermediate
CogAT Quantitative Battery      67
ITBS Mathematics Total          44

Note. CogAT = Cognitive Abilities Test (Thorndike & Hagen, 1986); ITBS = Iowa Tests of Basic Skills (Hieronymous & Hoover, 1985).

Construct Validity

Construct validity, the final type of validity to be examined, relates to (a) the degree to which the underlying traits of an assessment instrument can be identified and (b) the extent to which these traits reflect the theoretical model on which the instrument is based. Gronlund and Linn (1990) offered a three-step procedure for demonstrating this kind of validity. First, several constructs presumed to account for test performance are identified. Second, hypotheses based on the identified constructs are generated. Third, the hypotheses are verified by logical or empirical methods. The following basic constructs believed to underlie the TOMAGS are discussed in the remainder of the chapter:

1. Because the TOMAGS was designed to identify mathematical talent, it should differentiate between groups of students identified as gifted in mathematics and those not identified as gifted in mathematics.

2. Because the TOMAGS was built to conform to specific aspects of the NCTM standards, a factor analysis should confirm the relationship of the test to the standards.

3. Because the TOMAGS was designed to identify talent in mathematics, it should not be biased, that is, give advantages to one group over another group.

4. Because the items measure similar traits, the items should be highly correlated with the total score.

Group Differentiation

One way of establishing an assessment instrument's validity is to study the performances of different groups of individuals on the instrument. Each group's results should make sense, given what is known about the relationship of the instrument's content to the group. In the case of the TOMAGS, an assessment of mathematical talent, one would expect that individuals identified as gifted in mathematics would score higher than individuals not so identified. In fact, an instrument whose results did not differentiate between such groups would have no diagnostic value; it would have no construct validity.

We would expect to find statistically significant differences between individuals identified as gifted in mathematics and those individuals not so identified. To test for these differences, eight t-tests were conducted (one for each age interval for both the TOMAGS Primary and the TOMAGS Intermediate). The Bonferroni procedure was used to control for Type I error and the alpha was set at .006. The means and standard deviations, t-test results, and probability levels are presented in Tables 6.7 and 6.8 (in each case, p < .001).

The mean standard scores in Tables 6.7 and 6.8 support the construct validity of the TOMAGS. The scores made by individuals identified as gifted in mathematics are as one would expect. At every age, this group's mean TOMAGS Quotient was higher than the normal group's mean by roughly 1 to 2 standard deviations (15 to 27 points on a scale with a standard deviation of about 15). The results of the t-tests indicate that these differences are large enough to be statistically significant.
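The reported alpha of .006 is what the Bonferroni procedure gives for eight tests if the family-wise alpha is .05. The text does not state the family-wise level, so the value of .05 in the sketch below is an assumption. The probability levels are those printed in Tables 6.7 and 6.8, where an entry of .0000 is read as a value that rounds to zero at four decimal places.

```python
# Assumption: a family-wise alpha of .05 (not stated in the text).
FAMILY_ALPHA = 0.05
NUM_TESTS = 8  # four age intervals for each of the two forms

# Bonferroni: divide the family-wise alpha by the number of tests.
per_test_alpha = FAMILY_ALPHA / NUM_TESTS  # 0.00625, reported as .006

# Probability levels as printed in Tables 6.7 (Primary) and 6.8 (Intermediate).
reported_p = {
    ("Primary", 6): 0.0001,
    ("Primary", 7): 0.0000,
    ("Primary", 8): 0.0000,
    ("Primary", 9): 0.0001,
    ("Intermediate", 9): 0.0001,
    ("Intermediate", 10): 0.0001,
    ("Intermediate", 11): 0.0001,
    ("Intermediate", 12): 0.0001,
}

all_significant = all(p < per_test_alpha for p in reported_p.values())

print(f"Per-test alpha: {per_test_alpha:.5f}")
print("All eight comparisons significant:", all_significant)
```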

TABLE 6.7
Means, Standard Deviations, t-Test Results, and Probability Levels for Normal and Gifted Groups on the TOMAGS Primary

        Subject Group
        Normal              Gifted
Age     M         SD        M         SD        t-ratio    p
6       99.19     15.00     119.00    9.34      15.58      .0001
7       100.74    15.81     116.12    16.15     8.18       .0000
8       100.15    15.49     116.36    13.68     11.84      .0000
9       99.19     15.00     119.46    9.34      15.58      .0001

TABLE 6.8
Means, Standard Deviations, t-Test Results, and Probability Levels for Normal and Gifted Groups on the TOMAGS Intermediate

        Subject Group
        Normal              Gifted
Age     M         SD        M         SD        t-ratio    p
9       100.00    14.99     126.72    14.03     14.48      .0001
10      100.00    15.01     120.13    12.21     13.97      .0001
11      100.00    15.00     119.60    10.27     13.81      .0001
12      99.99     14.99     125.23    6.73      13.93      .0001

In addition, we conducted a study in a medium-sized school district in the South to determine the degree to which the TOMAGS Primary discriminated between Grade 2 students identified as gifted in mathematics and students not identified as gifted in mathematics. The school district in which the study was conducted identifies Grade 2 students as gifted in mathematics using the following five measures: the Otis-Lennon School Ability Test (Otis & Lennon, 1990), the Matrix Analogies Test (Naglieri, 1985), the Iowa Tests of Basic Skills Mathematics Total (Hieronymous & Hoover, 1985), a teacher checklist in mathematics/language arts, and the TOMAGS Primary. Thirty-seven Grade 2 students were identified and 32 Grade 2 students were not identified for the gifted mathematics program using the scores from the identification measures.

We conducted a discriminant function analysis, which was statistically significant. Results of the analysis indicated that, of the students the identification measures predicted would be identified, 85.0% were in fact identified, and of the students predicted not to be identified, 89.7% were not identified (see Table 6.9). The measure that discriminated between the two groups to the greatest degree was the OLSAT Total, followed by the TOMAGS Primary, the ITBS Mathematics Total, the Matrix Analogies Test, and the teacher checklist in mathematics (see Table 6.10). The results of the discriminant function analysis support using the TOMAGS Primary as an identification measure to predict giftedness in mathematics.

TABLE 6.9
Prediction of Group Membership Using All Five Measures for the Test of Mathematical Abilities for Gifted Students Program for Grade 2 (N = 76)

                                Nominated Gifted Mathematics
                                Identified      Not Identified
Predicted Group Membership       N      %         N      %
Predicted Identified            34   85.0         6   15.0
Predicted Not Identified         3   10.3        26   89.7
Total                           37               32

TABLE 6.10
Correlations Between Five Measures Used To Predict Group Membership and Standardized Canonical Discriminant Functions

Measure                     Function
OLSAT                       .895
TOMAGS Primary              .558
ITBS Mathematics Total      .549
Matrix Analogies Test       .352
Teacher Checklist           .302

Note. OLSAT = Otis-Lennon School Ability Test (Otis & Lennon, 1990); ITBS = Iowa Tests of Basic Skills (Hieronymous & Hoover, 1985).

Factor Analysis

Construct validity also relates to the degree to which the underlying traits of an assessment device can be identified and the extent to which these traits reflect the theoretical model on which the instrument is based. One way to investigate this type of validity for the TOMAGS was to analyze the scores of the children in the normative sample using the principal components method. This method identified which items were related to one another. Because we constructed the TOMAGS to be aligned with the NCTM standards, we would expect items to load on factors related to the standards.

In our factor analysis, the items for the TOMAGS Primary loaded on four factors. Factor 1 contains items that measure number sense, numeration, and whole number computation and operations. Factor 2 contains items that measure geometry, spatial sense, and measurement. Factor 3 contains items that measure patterns and relationships. Factor 4 contains items that measure statistics and probability. The factors, the loading of each item, and the eigenvalues are presented in Table 6.11.

The items for the TOMAGS Intermediate also loaded on four factors. Factor 1 contains items that measure number relationships, systems, and theory; computation; estimation; and patterns. Factor 2 contains items that measure geometry, spatial sense, and measurement. Factor 3 contains items that measure statistics and probability. Factor 4 contains items that measure algebra. The factors, the loading of each item, and the eigenvalues are presented in Table 6.12.

TABLE 6.11
Factors and Loadings of the Items of the TOMAGS Primary (Decimals Omitted)

Number Sense/       Geometry/Spatial    Patterns/           Statistics/
Numeration          Sense/Measurement   Relationships       Probability
Item     Loading    Item     Loading    Item     Loading    Item     Loading
2a       31         7a       47         1a1      82         13a      30
2b1      64         7b       30         1a2      80         13b      71
2b2      64         8a       47         1b1      62         13c      87
2c       39         8b       33         1b2      62         13d      85
2d1      61         9a       33         1c1      75
2d2      55         9b       39         1c2      75
3a       43         10       36         1d1      39
3b       42         11a      37         1d2      38
4a       38         11b      42
4b       42         12a      68
5a       32         12b      68
5b1      32         12c      31
5b2      33         12d      52
6        30
eigenvalue 10.15    eigenvalue 2.16     eigenvalue 1.60     eigenvalue 1.13

TABLE 6.12
Factors and Loadings of the Items of the TOMAGS Intermediate (Decimals Omitted)

Number Relationships/
Systems/Theory/
Computation/            Geometry/Spatial        Statistics/
Estimation/Patterns     Sense/Measurement       Probability             Algebra
Item     Loading        Item     Loading        Item     Loading        Item     Loading
1a1      51             8a       25             15a      26             2a       72
1a2      50             8b       37             15b      23             2b       71
1b1      71             10a      29             15c      20             2c       71
1b2      71             10b      31             16a      85             2d       80
1c1      59             11a      40             16b      95             2e       78
1c2      62             11b      35             16c      96             9a       32
3a       56             11c      37             16d      35             9b       31
3b       50             12a      74
4a       45             12b      74
4b       44             12c      44
5a       32             12d      66
5b       59             13a      19
6a       59             13b      57
6b       40             13c      40
7a       59             13d      55
7b       45
7c       19
14       44
eigenvalue 11.06        eigenvalue 3.03         eigenvalue 2.43         eigenvalue 1.89

Item Bias

Camilli and Shepard (1994) recommended that test developers employ statistical techniques to detect item bias. To detect item bias, we focused specifically on the item performances of individuals in selected subgroups who took the test. We chose to use the Delta Scores approach, developed by Jensen (1980). Delta Scores (i.e., derived linear scales that relate to item difficulties) are linear transformations of the z scale (Delta = 4z + 13). Jensen noted that the Pearson r between Delta Scores of different groups "indicates the degree of group resemblance in relative item difficulties when the rank order of the items is eliminated" (p. 422). The results are reported as correlation coefficients; the larger the coefficients, the smaller the bias in the test.
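The Delta Scores comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' procedure: the function names and the item proportions are hypothetical (they are not TOMAGS data), and the sketch assumes that z is the standard normal deviate for the proportion of examinees answering an item incorrectly, so that harder items receive larger Delta values.

```python
from math import sqrt
from statistics import NormalDist, mean


def delta_scores(p_values):
    """Convert item proportions correct to Delta = 4z + 13.

    Assumption: z is the standard normal deviate for the proportion
    answering incorrectly, so harder items receive larger Delta values.
    """
    std_normal = NormalDist()
    return [4 * std_normal.inv_cdf(1 - p) + 13 for p in p_values]


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical proportions correct on five items for two subgroups (not TOMAGS data).
group_a = [0.90, 0.75, 0.60, 0.40, 0.25]
group_b = [0.85, 0.78, 0.55, 0.42, 0.20]

delta_a = delta_scores(group_a)
delta_b = delta_scores(group_b)
r = pearson_r(delta_a, delta_b)
print(round(r, 2))
```

A coefficient near 1.00 indicates that the items have nearly the same relative difficulty in both subgroups, which is the pattern interpreted here as low item bias.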

This procedure was applied to three dichotomous groups: female versus male, African American versus non-African American, and Mexican American versus non-Mexican American. The resulting coefficients are reported in Tables 6.13 and 6.14 for the TOMAGS Primary and the TOMAGS Intermediate, respectively. MacEachron (1982) described coefficients of the magnitude listed in the tables as "high" or "very high." These coefficients provide further evidence that the TOMAGS items contain little or no bias in the groups investigated.

TABLE 6.13
Correlation Between Delta Values for Subgroup Comparisons on the TOMAGS Primary (Decimals Omitted)

Dichotomous Groups: Male/Female; African American/Non-African American; Mexican American/Non-Mexican American
Normative Sample: Normal, Gifted
Coefficients: 96, 97, 87, 97, 93, 90

TABLE 6.14
Correlation Between Delta Values for Subgroup Comparisons on the TOMAGS Intermediate (Decimals Omitted)

Dichotomous Groups: Male/Female; African American/Non-African American; Mexican American/Non-Mexican American
Normative Sample: Normal, Gifted
Coefficients: 97, 96, 96, 98, 97, 93

Item Validity

Guilford and Fruchter (1978) pointed out that information about a scale's construct validity can be obtained by correlating performance on the items with the total score made on the instrument. The procedure is also used in the early stages of test construction to establish each item's discriminating power and to select good items. Strong evidence of the TOMAGS's validity is found in the discriminating powers reported in Tables 6.3 and 6.4. Scales having poor construct validity are unlikely to be composed of items having coefficients of the size reported in these tables.

Based on information provided in this chapter, one may conclude that the TOMAGS is a valid measure of mathematical talent, and examiners can use this scale with confidence.
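The item-total procedure described under Item Validity can be illustrated with a short Python sketch. It is a hedged example only: the function names and the response matrix are hypothetical (not TOMAGS data), items are assumed to be scored 0 or 1, and each item is correlated with the uncorrected total score, which still includes that item.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def item_total_correlations(responses):
    """Correlate each item's 0/1 scores with examinees' total scores.

    responses: one row per examinee, one 0/1 score per item.
    """
    totals = [sum(row) for row in responses]
    n_items = len(responses[0])
    return [
        pearson_r([row[i] for row in responses], totals) for i in range(n_items)
    ]


# Hypothetical 0/1 scores for six examinees on four items (not TOMAGS data).
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]

for item_number, r in enumerate(item_total_correlations(responses), start=1):
    print(f"Item {item_number}: r = {r:.2f}")
```

Items whose scores rise with the total score yield larger positive coefficients, which is what is meant by greater discriminating power.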