Gifted Child Quarterly (http://gcq.sagepub.com/)
2013 57: 101
Jacob A. Giessman, James L. Gambrell and Molly S. Stebbins
DOI: 10.1177/0016986213477190
The online version of this article can be found at: http://gcq.sagepub.com/content/57/2/101
Published by: http://www.sagepublications.com
On behalf of: National Association for Gifted Children
Version of Record: Mar 13, 2013
Downloaded from gcq.sagepub.com at University of Missouri-Columbia on March 19, 2013

Minority Performance on the Naglieri Nonverbal Ability Test, Second Edition, Versus the Cognitive Abilities Test, Form 6: One Gifted Program’s Experience


Gifted Child Quarterly, 57(2), 101–109
© 2013 National Association for Gifted Children
Reprints and permission: sagepub.com/journalsPermissions.nav
DOI: 10.1177/0016986213477190
gcq.sagepub.com

Article

Jacob A. Giessman1, James L. Gambrell2, and Molly S. Stebbins1

1Columbia (Mo.) Public Schools, Columbia, MO, USA
2University of Iowa, Iowa City, IA, USA

* This manuscript was accepted under the previous editor, Carolyn M. Callahan.

Corresponding Author: Jacob A. Giessman, Center for Gifted Education, Columbia (Mo.) Public Schools, 4303 South Providence Road, Columbia, MO 65203, USA. Email: [email protected]

Abstract

The Naglieri Nonverbal Ability Test, Second Edition (NNAT2), is used widely to screen students for possible inclusion in talent development programs. The NNAT2 claims to provide a more culturally neutral evaluation of general ability than tests such as Form 6 of the Cognitive Abilities Test (CogAT6), which has Verbal and Quantitative batteries in addition to a Nonverbal battery. This study compared the performance of 5,833 second graders who took the CogAT6 and 4,038 kindergartners, first graders, and second graders who took the NNAT2 between 2005 and 2011 as part of a grade-wide screening for a gifted program. Comparison between minorities and Whites on the CogAT6 and the NNAT2 found slightly larger gaps on the CogAT6 Composite for Hispanics and English-Language Learners (ELL) but the same gap for Black students. Considered alone, the Nonverbal battery of CogAT6 produced smaller gaps than the NNAT2 for Blacks, Hispanics, Asians, and ELL students. Fisher’s exact tests showed no significant differences between the CogAT6 Composite and the NNAT2 in subgroup identification rates at hypothetical cuts for gifted identification (top 20%, 10%, or 5%), except for Asian and ELL students. The CogAT6 Nonverbal score appeared to identify as many or more high-ability students from underrepresented groups as the NNAT2. Wechsler Intelligence Scale for Children, Fourth Edition, follow-up on the top 5% showed greater predictive validity for the CogAT6 Composite. These results suggest that gifted programs should not assume that using a figural screening test such as the NNAT2, without other adjustments to selection protocol, will address minority underrepresentation.

Keywords

NNAT2, CogAT6, WISC-IV, gifted, talent, minority, underrepresentation

The continued underrepresentation of Black, Hispanic, and English-Language Learner (ELL) students in gifted programs is recognized as an important problem by theorists and practitioners in the field (e.g., Callahan, 2005; Donovan & Cross, 2002; Ford, 1998; U.S. Department of Education, 1993). Borland (2004), for one, argues that, because of chronic underrepresentation of certain groups, gifted programs may actually “widen the gap between society’s have’s and have-not’s” (p. 6). Borland maintains that, although gifted education is by no means the primary cause of achievement differences between demographic groups, it is morally and politically imperative that administrators do what they can to address minority underrepresentation in gifted programs.

In recognition of this issue, the National Association for Gifted Children (NAGC; 2010b) recommends that “students with identified needs represent diverse backgrounds and reflect the total student population of the district” and, to that end, supports “non-biased and equitable” identification strategies, including the use of nonverbal tests (Standard 2.3).

Some have cautioned that nonverbal tests do not measure entirely the same constructs as the tests they are meant to supplement or replace and that they may contain unique forms of bias (Anastasi & Urbina, 1997; Lohman, 2005b; Lohman & Gambrell, 2012). Others, however, have argued that nonverbal tests are relatively free of test bias against children from non-English speaking homes, culturally diverse backgrounds, or with limited opportunity to learn and that they are better measures of ability for any child. Naglieri (2010), for example, argued that ability tests with verbal or quantitative content are inappropriate for measuring general ability because they are heavily loaded with achievement factors. Naglieri, Brulles, and Landsdowne (2008) deemed nonverbal measures more equitable for all children and argued that “a nonverbal measure of ability can overcome the injustice of under-representation of minorities in gifted programs” (p. 10).

The Naglieri Nonverbal Ability Test, Second Edition (NNAT2; Naglieri, 2008a), has been advertised by its publisher as “a culturally neutral evaluation of students’ nonverbal reasoning and general problem-solving ability, regardless of the individual student’s primary language, education, culture or socioeconomic background” (Pearson, 2012). In an analysis of the standardization data for the first edition of the NNAT (Naglieri, 1997), Naglieri and Ford (2003) found that White, Black, and Hispanic children had similar mean scores and were similarly likely to meet common percentile cuts for participation in gifted programming (see also Naglieri & Ronning, 2000). Carman and Taylor (2010), however, found that low socioeconomic status students from underrepresented minority groups scored 14 Naglieri Ability Index score (NAI) points lower on the NNAT than nonminority students from middle-class families. Like Villarreal (2005), Carman and Taylor cautioned that the NNAT be used only in conjunction with other measures of ability. Naglieri and Ford’s (2003) findings were also questioned on statistical and methodological grounds by Lohman (2005a). A response from Naglieri and Ford (2005) included a call for similar empirical investigations of race and ethnic differences on the Cognitive Abilities Test, Form 6 (CogAT6; Lohman & Hagen, 2001b). The present study responds to Naglieri and Ford’s request by analyzing archival data sets from one gifted program’s use of the NNAT2 and the CogAT6 in grade-wide screenings.

The Present Study

The Midwestern school district (approximately 18,000 students) studied used grade-wide screenings with a group ability test as one major means of identifying students who might benefit from gifted services. Students who met district cut scores on the group ability test were referred for further evaluation, which typically included administration of the Wechsler Intelligence Scale for Children, Fourth Edition (WISC-IV; Wechsler, 2003a), perhaps the most commonly used test in identification for gifted-talented services (NAGC, 2010a).

In the fall of 2010, the district switched from using the CogAT6 to the NNAT2 for its grade-wide screenings in hopes that the NNAT2 might yield a more diverse pool for further evaluation and, ultimately, a more diverse group of students in the district’s gifted programs. This study used district screening results from both instruments to explore three questions of interest to the district and to the larger conversation about nonverbal testing as a tool for addressing minority underrepresentation in gifted education.

Research Question 1: On which of the two screening tests were mean scores and variances most similar among subgroups?

Research Question 2: Which screening test best moderated minority underrepresentation at hypothetical gifted program cut scores (top 20%, 10%, and 5%)?

Research Question 3: Which screening test best predicted high performance on the WISC-IV?

Additional Exploration: Because the CogAT6 Nonverbal battery is similar to the NNAT2, CogAT6 Nonverbal Standard Age Scores (NSAS) were included for comparison where possible in the analysis.

Method

Sample

Data were drawn from district testing records that included 5,833 students who took the CogAT6 in second grade in the 2005-2006 to the 2009-2010 school years, and 4,035 students who took the NNAT2 in kindergarten, first grade, and second grade during the 2010-2011 school year. Because these were grade-wide screenings, the sample included four complete grade cohorts for the CogAT6 and three complete grade cohorts for the NNAT2. With the exception of a higher representation of ELL students in the NNAT2 group (6.2% as opposed to 3.4%), demographic characteristics were nearly identical between the two groups (approximately 51% male, 64% White, 20.5% Black, 5% Asian or Pacific Islander, 5% Hispanic, 5.5% multiracial, and 1% American Indian or Alaska Native).

Although the full sample was relevant to the first two research questions, only a subset of the sample was used for evaluation of our question pertaining to WISC-IV predictivity. District policy for the most part limited WISC-IV testing to students with high screening test scores; therefore, correlations including all students identified by the screening test could not be calculated. Instead, investigation of the third question focused on the top 5% of scorers for each screening test.

Measures

District databases provided gender and ethnicity as well as ELL status at the time of screening. Although of interest to the authors, socioeconomic status information in the form of free and reduced lunch status was withheld by the district due to its interpretation of privacy law.

CogAT6. The CogAT6 is a multidimensional group ability test and consists of three batteries measuring verbal, quantitative, and nonverbal reasoning (Lohman & Hagen, 2001a). At second grade (Level 2), there are 48 items on each battery, and two item types per battery. No reading is required at this level. On the Verbal and Quantitative batteries, children listen to questions read by the test administrator and choose their answers from a set of pictures. On the Nonverbal Battery, the test administrator simply paces children through the 44 nonverbal items. The CogAT6 yields a standard age score (SAS) for each battery (VSAS, QSAS, and NSAS), three partial composite standard age scores (VQSAS, VNSAS, and QNSAS), and a full composite standard age score (Verbal, Quantitative, and Nonverbal standard age score [VQNSAS]), all with a mean of 100 and a standard deviation of 16 (Lohman & Hagen, 2002).

Reliabilities for the three batteries using the Kuder-Richardson Formula 20 (KR20) are reported in the research handbook for CogAT6 (Lohman & Hagen, 2002). These ranged from .86 to .92 in the Primary Battery (grades K-2). The KR20 reliability for VQNSAS for these grades was reported as .96, which corresponds to a standard error of measurement of 3.2 SAS points. The handbook also reported test–retest reliability as .92 when different forms of the test were administered 2 weeks apart. Correlations between the overall composite score and scores on other tests include .69 with the Woodcock-Johnson III (Lohman, 2003b; Woodcock, McGrew, & Mather, 2001), .79 with the WISC, Third Edition (Lohman, 2003a; Wechsler, 1991), and .86 with the Iowa Tests of Basic Skills (Hoover, Hieronymous, Frisbie, & Dunbar, 1994; Lohman & Hagen, 2002).
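The link between the reported composite reliability (.96) and the standard error of measurement (3.2 SAS points) follows from the classical test theory relation SEM = SD × sqrt(1 − reliability). The short sketch below reproduces that arithmetic; the function name is ours, for illustration only, and is not taken from the handbook.

```python
import math


def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """Classical test theory: SEM = SD * sqrt(1 - reliability)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie in [0, 1]")
    return sd * math.sqrt(1.0 - reliability)


# CogAT6 composite (VQNSAS): SD = 16 and KR20 = .96, as reported in the text.
sem_vqnsas = standard_error_of_measurement(16.0, 0.96)
print(f"SEM = {sem_vqnsas:.1f} SAS points")
```

With SD = 16 and reliability = .96, the result agrees with the 3.2 points quoted from the handbook.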

NNAT2. The NNAT2 is a shorter, unidimensional group-administered ability test that uses 48 figure matrix items at all levels (Naglieri, 2008a). The NNAT2 yields the NAI, which has a mean of 100 and a standard deviation of 16 (Naglieri, 2008b). The district used the online version.

The NNAT2 technical manual (Naglieri, 2008b) reported that KR20 reliability coefficients for the test levels used in the present study ranged from .84 to .92. The standard error of measurement at these levels ranged from 4.79 to 6.36. Test–retest reliability ranged from .75 to .78. Validity was examined through correlation with the Otis-Lennon School Ability Test, Eighth Edition (OLSAT-8; Otis & Lennon, 2003) and the Stanford Achievement Tests, Tenth Edition (Stanford 10; Pearson, 2003). Pearson r with OLSAT-8 at second grade was .53 for the Verbal section, .68 for the Nonverbal section, and .69 for the Composite. For kindergarten through second grade, correlations with Stanford 10 Reading ranged from .61 to .70 and with Stanford 10 Math ranged from .62 to .70. At first grade, a comparison of ELL student scores with matched control groups showed a difference of 3.57 NAI points for non-Spanish speaking ELL students and .93 NAI points for Spanish-speaking ELL students.

WISC-IV. The WISC-IV is an individual ability test with subtests yielding index scores for Verbal Comprehension (VCI), Perceptual Reasoning (PRI), Working Memory (WMI), and Processing Speed (PSI; Wechsler, 2003a). The first two batteries together yield a General Ability Index (GAI), and all four batteries in combination yield a Full-Scale IQ (FSIQ). These index scores have a mean of 100 and a standard deviation of 15 (Wechsler, 2003b).

Internal consistency estimates reported by the technical manual (Wechsler, 2003b) include .94 for VCI, .92 for PRI, .92 for WMI, .88 for PSI, and .97 for FSIQ. Rowe, Kingsley, and Thompson (2010) studied the correlation of GAI and FSIQ with the reading and math composites from the Wechsler Individual Achievement Test, Second Edition (WIAT-II; Psychological Corporation, 2001) among gifted referrals. They found GAI among these higher ability students to correlate with WIAT-II Reading at .50 and WIAT-II Math at .43. Correlations were higher with FSIQ (Reading, .59 and Math, .47).

Several analyses have been cited by NAGC (2010a) supporting use of GAI over FSIQ in identification, especially in cases where subscores are highly discrepant. During part of the study period, the district only administered the VCI and PRI subtests of the WISC-IV. Our analysis, therefore, is confined to VCI, PRI, and GAI.

Statistical Analyses

The shape of the score distribution on each screening test was analyzed in terms of mean, standard deviation, skewness, and kurtosis. Skewness and kurtosis were tested for significance at p < .05. Differences between subgroup means on each screening test were tested for significance at p < .05 and p < .001 with independent samples t tests. The lower and upper limits of the 95% confidence interval were also reported. We used PS (version 3.0) to determine that the sample size necessary to detect a real difference of 5 points at a power level greater than .80 was approximately 135 for the smaller group (Dupont & Plummer, 1997). Accordingly, the Native American and Pacific Islander groups were left out of analyses due to sample size. Comparisons between Asian and non-Asian ELL students were included, despite a less than ideal sample size, because observed differences were strikingly large. Along with each mean comparison, we also tested for differences between subgroup variances using Levene’s test. Mean comparisons with significant variance differences used a separate variance t test algorithm for significance testing.
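As an illustration of the separate variance (Welch) comparison described above, the sketch below recomputes a mean difference and an approximate 95% confidence interval from published summary statistics alone. It is not the authors' SPSS procedure: the function name is hypothetical, the inputs are the rounded Black and White NAI means, standard deviations, and sample sizes from Table 1, and the interval uses a normal critical value, which is an assumption that is reasonable only because the degrees of freedom are large.

```python
import math
from statistics import NormalDist


def welch_from_summary(m1, sd1, n1, m2, sd2, n2, conf=0.95):
    """Return (difference, t, df, (ci_low, ci_high)) for group 1 minus group 2."""
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2
    se = math.sqrt(v1 + v2)  # separate-variance standard error
    diff = m1 - m2
    t_stat = diff / se
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    # Normal critical value; close to the t critical value when df is large
    z = NormalDist().inv_cdf(0.5 + conf / 2)
    return diff, t_stat, df, (diff - z * se, diff + z * se)


# Black vs. White students on NAI (rounded values from Table 1)
diff, t_stat, df, (ci_low, ci_high) = welch_from_summary(
    84.5, 16.5, 820, 100.5, 15.6, 2567
)
print(f"difference = {diff:.1f}, t = {t_stat:.1f}, df = {df:.0f}")
print(f"approximate 95% CI: ({ci_low:.1f}, {ci_high:.1f})")
```

Because the inputs are rounded, the interval should be close to, but not necessarily identical with, the limits reported for the Black and White NAI comparison in Table 2.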

Next, differences in the proportion of each subgroup scoring in the top 20%, 10%, and 5% on CogAT6 versus NNAT2 were tested for significance using Fisher’s exact test. The size of the effect is indexed using the natural log of the odds ratio (LOR; subgroup odds of selection on CogAT6/subgroup odds of selection on NNAT2). Rosenthal (1996) gave guidelines for interpretation of effect sizes in the odds ratio metric based on Cohen (1998). Suggested values for small, medium, and large effect sizes translate into LOR of .40, .90, and 1.5, respectively. The statistical power of these tests depends not only on assumptions about possible effect sizes but also on the exact proportions involved. The sample size necessary in the smaller (NNAT2) sample to detect a medium-sized effect increases from approximately 140 to 560 as the smaller proportion in the comparison decreases from 10% to 2%. Thus, statistical power should be sufficient to detect effects in the top 20% and 10% comparisons. For the top 5%, only comparisons in groups with relatively large samples and/or large proportions selected have adequate power, but results and tests for all groups were reported for descriptive purposes. To ensure that any possible differences were detected, we used an uncorrected alpha level of .05 despite the many comparisons. Although this decision increases the chances of detecting a difference where none exists, it minimizes the possibility of missing real differences. A more stringent alpha level could be viewed as a means of masking real differences between the tests.
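The log odds ratio effect size defined above can be recomputed directly from a pair of identification rates. The sketch below does so for ELL students at the top 20% cut, using the percentages printed in Table 3 (17.7% on NAI, 25.1% on NSAS, 5.5% on VQNSAS); the function name is ours and is only illustrative.

```python
import math


def log_odds_ratio(p_cogat: float, p_nnat: float) -> float:
    """Natural log of (odds of selection on CogAT6 / odds of selection on NNAT2)."""
    odds_cogat = p_cogat / (1.0 - p_cogat)
    odds_nnat = p_nnat / (1.0 - p_nnat)
    return math.log(odds_cogat / odds_nnat)


# ELL students, top 20% cut (percentages from Table 3 expressed as proportions)
lor_nsas = log_odds_ratio(0.251, 0.177)    # CogAT6 NSAS vs. NNAT2 NAI
lor_vqnsas = log_odds_ratio(0.055, 0.177)  # CogAT6 VQNSAS vs. NNAT2 NAI
print(round(lor_nsas, 2), round(lor_vqnsas, 2))  # 0.44 -1.31
```

A positive value means the subgroup's odds of selection were higher on the CogAT6 score, and a negative value means they were higher on the NNAT2.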

Exploration of relationships between each screening test and WISC-IV performance was complicated by district testing policy and data collection. WISC-IVs usually were given to students scoring above 125 on VQNSAS when CogAT6 was administered, or above an NAI score of 118 when the NNAT2 was administered, but exceptions and incomplete WISC-IV records were not unusual. To create a fair comparison between the screening tests as predictors, we compared the WISC-IV performance of only those students scoring in the top 5% on either screening test. This cut point is just above the district cut score on VQNSAS and well above the district cut score on NAI. If the pool of talent available in the district was stable across the studied time period and both tests were equally good at predicting WISC-IV performance, then we could expect no difference in observed scores after the selectivity was equalized.

Finally, to explore whether the inclusion of kindergarten and first grade NNAT2 scores may have influenced the comparison between screening tests, given that the CogAT6 sample was composed entirely of second graders, we presented grade-disaggregated results for the NNAT2. All statistical analyses were conducted in SPSS 20.

Results

Mean Scores and Variances

The first research question asked, “Which screening test generated mean scores and variances that were more similar between subgroups?” Table 1 shows mean scores and standard deviations by subgroup for NSAS, VQNSAS, and NAI. Although NSAS and VQNSAS scores were normally distributed, NAI scores showed significant negative skew (−.461) and positive kurtosis (+.380), p < .05. This means there were more NAI scores at the extremes of the distribution, and in particular more very low scores, than would be expected under normality. NAI score standard deviations at kindergarten and first grade were larger than the expected 16, which would exacerbate the tendency toward extreme scores. A similar pattern of deviation from the expected distribution was found for the first edition of NNAT (Lohman, Korb, & Lakin, 2008). Meanwhile, NSAS and VQNSAS standard deviations were slightly smaller than expected. Table 2 presents the differences between subgroup means using White students as a reference among ethnic groups.

Table 1. Descriptive Statistics for Subgroups.

| Group | CogAT6 sample: n | NSAS, M (SD) | VQNSAS, M (SD) | NNAT2 sample: n | NAI, M (SD) |
|---|---|---|---|---|---|
| Grade | | | | | |
| K | | | | 1,251 | 98.0 (18.3) |
| 1 | | | | 1,432 | 97.5 (17.9) |
| 2 | 5,833 | 104.1 (15.1) | 102.4 (14.8) | 1,352 | 96.3 (15.7) |
| Gender | | | | | |
| Female | 2,843 | 104.9 (14.9) | 102.9 (14.7) | 1,967 | 97.5 (16.7) |
| Male | 2,990 | 103.5 (15.3) | 102.0 (14.9) | 2,068 | 97.0 (17.9) |
| Ethnicity | | | | | |
| White | 3,665 | 106.7 (14.1) | 106.5 (13.4) | 2,567 | 100.5 (15.6) |
| Black | 1,217 | 94.6 (14.2) | 90.5 (12.7) | 820 | 84.5 (16.5) |
| Hispanic | 284 | 101.0 (13.2) | 96.1 (12.6) | 191 | 93.2 (15.8) |
| Asian | 296 | 114.8 (14.8) | 108.0 (15.1) | 214 | 109.6 (16.5) |
| Native American | 30 | 106.1 (13.6) | 102.6 (12.5) | 21 | 102.0 (11.3) |
| Native Hawaiian/Pacific Islander | 9 | 109.4 (19.0) | 100.0 (13.0) | 8 | 97.6 (12.4) |
| Multiracial | 332 | 103.6 (13.7) | 101.2 (12.9) | 214 | 97.4 (15.9) |
| ELL status | | | | | |
| Non-ELL | 5,634 | 104.2 (15.0) | 102.8 (14.7) | 3,786 | 97.5 (17.1) |
| ELL | 199 | 103.1 (18.0) | 92.0 (13.6) | 249 | 93.2 (19.3) |
| Non-Asian ELL | 127 | 95.8 (15.0) | 87.3 (11.6) | 172 | 88.5 (17.3) |
| Asian ELL | 72 | 116.1 (15.3) | 100.3 (13.0) | 77 | 104.8 (19.4) |

Note. CogAT6 = Cognitive Abilities Test–Form 6; NNAT2 = Naglieri Nonverbal Ability Test, Second Edition; NSAS = Nonverbal Standard Age Scores; VQNSAS = Verbal, Quantitative, and Nonverbal Standard Age Score; NAI = Naglieri Ability Index; ELL = English-language learner. The normative population mean and SD for all tests is 100 and 16, respectively.

Mean scores were substantially higher on the CogAT6 than on the NNAT2, a difference of more than 6 points in most cases. This large overall mean difference is the reason we did not attempt any testing of mean differences across the screening tests. Little significant gender difference was noted for either test. Blacks had the lowest mean scores, scoring a full standard deviation below Whites on VQNSAS and NAI and three quarters of a standard deviation below Whites on NSAS (p < .001). However, Blacks did have larger score variability than Whites on the NAI (p = .005).

On all three measures, Hispanic and multiracial means fell between Black and White means, while Asians scored the highest. The difference between Asian and White means was 8.1 and 9.1 points, respectively, on NSAS and NAI (p < .001) but insignificant on VQNSAS (p > .05). Despite no mean advantage, Asians showed greater variance than Whites on VQNSAS (SD +1.7, p = .005).

ELL students scored 10.8 points lower than non-ELL students on VQNSAS (p < .001), but only 4.3 points lower on NAI (p < .001) and showed no significant difference on NSAS (p > .05). As would be expected, this indicates that the larger gap on VQNSAS is due to the Verbal and Quantitative batteries, which include spoken English language instructions at Grade 2. Further analysis showed sharp differences between Asian ELL and other, mostly Hispanic, ELL students. In fact, some of the largest mean score differences noted (13.0, 16.1, and 20.3, p < .001) favored Asian ELL over non-Asian ELL students. This suggests that any ELL advantage on NSAS or NAI was largely attributable to an overall Asian advantage on nonverbal measures. Due to the large gap between Asian and non-Asian ELL students, standard deviations were larger for the ELL than non-ELL sample on both NSAS (+3, p < .001) and NAI (+2.2, p = .002).

Identification Rates

The second research question asked which screening test yielded identification rates on likely cut scores most similar across subgroups. Table 3 details the percentage of each subgroup that fell within the top 20%, 10%, and 5% of sample scores for NSAS, VQNSAS, and NAI. If perfect proportionality among subgroups were to hold, each cell value would match the cut percent. For example, all of the cell values in the first section of the table (top 20%) would be 20.0.
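One simple way to read the table against this proportionality benchmark is to divide each cell by its cut percent, so that 1.0 marks proportional identification. The sketch below applies that idea to a few NAI percentages from the top 20% section of Table 3. The ratio and its name are our own illustrative device, not a statistic reported in the study.

```python
def representation_ratio(subgroup_pct: float, cut_pct: float) -> float:
    """Subgroup identification rate relative to the cut; 1.0 means proportional."""
    return subgroup_pct / cut_pct


# NNAT2 NAI, top 20% cut (percentages from Table 3)
top20_nai = {"White": 25.4, "Black": 4.1, "Hispanic": 11.0, "Asian": 50.5}
ratios = {group: representation_ratio(pct, 20.0) for group, pct in top20_nai.items()}

for group, ratio in ratios.items():
    print(f"{group}: {ratio:.2f}")
```

Values below 1.0 indicate that a subgroup was identified at less than the proportional rate at that cut, and values above 1.0 indicate the opposite.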

Neither instrument identified proportionally at three hypothetical cut scores gifted programs might apply during identification for services. Fisher exact tests (p < .05) were used, though, to indicate subgroups for which one test held an advantage over another at each cut. The finding of larger variance for Black students on NAI did not translate into more high scores since the additional variability was caused by an excess of low scores. NAI identified proportionately more ELL and more Asian students at all three score levels. The effect size for ELL students was large, whereas the effect for Asian students was moderate. The only significant subgroup advantage on the CogAT6 over the NNAT2 was a very small effect for Whites on VQNSAS at the 10% cut only. As this advantage occurred at only one cut point, it may be simply the result of chance.

Table 2. Subgroup Score Differences.

| Group | CogAT6 sample: n | NSAS: Difference | NSAS: 95% CI LL | NSAS: 95% CI UL | VQNSAS: Difference | VQNSAS: 95% CI LL | VQNSAS: 95% CI UL | NNAT2 sample: n | NAI: Difference | NAI: 95% CI LL | NAI: 95% CI UL |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Gender | | | | | | | | | | | |
| Female − Male | 2,843 | 1.4** | 0.6 | 2.2 | 0.9* | 0.2 | 1.7 | 1,967 | 0.6 | −0.5 | 1.6 |
| Ethnicity | | | | | | | | | | | |
| Black − White | 1,217 | −12.1** | −11.2 | −13.0 | −16.1** | −15.2 | −16.9 | 820 | −16.0**a | −17.3 | −14.8 |
| Hispanic − White | 284 | −5.7** | −4.0 | −7.4 | −10.5** | −8.9 | −12.1 | 191 | −7.3** | −9.7 | −5 |
| Asian − White | 296 | 8.1** | 9.8 | 6.4 | 1.5a | 3.1 | −.1 | 214 | 9.1** | 6.9 | 11.3 |
| Multiracial − White | 332 | −3.1** | −1.6 | −4.7 | −5.4** | −3.9 | −6.9 | 214 | −3.1* | −5.3 | −0.9 |
| ELL status | | | | | | | | | | | |
| ELL − Non-ELL | 199 | −1.0a | −3.2 | 1.1 | −10.8** | −12.8 | −8.7 | 249 | −4.3**a | −6.5 | −2.1 |
| Asian ELL − Non-Asian ELL | 72 | 20.3** | 16.0 | 24.7 | 13.0**a | 8.8 | 17.2 | 77 | 16.1** | 11.5 | 20.7 |

Note. CogAT6 = Cognitive Abilities Test–Form 6; NNAT2 = Naglieri Nonverbal Ability Test, Second Edition; NSAS = Nonverbal Standard Age Scores; VQNSAS = Verbal, Quantitative, and Nonverbal Standard Age Score; NAI = Naglieri Ability Index; ELL = English-language learner; CI = confidence interval; UL = upper limit; LL = lower limit. Listed sample sizes are for the focal groups in each comparison.
a Significant variance differences.
*p < .05. **p < .001.

No significant differences were found at any cut between NAI and CogAT6 NSAS. In fact, NSAS identification rates for underrepresented groups were as high as or slightly higher than NAI identification rates. From this we can also infer that the significantly lower Asian selection advantage on VQNSAS compared with NAI stems from differences on the VSAS and QSAS batteries.
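For readers who want to see how a Fisher exact comparison of two identification rates works, the following is a minimal two-sided implementation using only the Python standard library. The counts in the example are NOT the study's data: they are rough reconstructions from rounded figures in Tables 1 and 3 (5.5% of 199 ELL students on VQNSAS is about 11, and 17.7% of 249 ELL students on NAI is about 44, at the top 20% cut), so the resulting p value is illustrative only. The function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact p value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x selected students in row 1
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_observed = prob(a)
    p_value = 0.0
    for x in range(max(0, col1 - row2), min(row1, col1) + 1):
        p = prob(x)
        # Sum every table that is no more likely than the observed one
        if p <= p_observed * (1 + 1e-7):
            p_value += p
    return p_value


# Approximate counts reconstructed from rounded percentages (assumption):
# rows = screening test, columns = (in top 20%, not in top 20%)
p_ell = fisher_exact_two_sided(11, 199 - 11, 44, 249 - 44)
print(f"p = {p_ell:.5f}")
```

In practice a statistics package would normally be used; SciPy, for example, provides `scipy.stats.fisher_exact` for the same test.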

Relationship to WISC-IV

The third research question asked which screening test best predicted high performance on the WISC-IV. Table 4 compares WISC-IV performance between the top 5% of VQNSAS scorers and NNAT2 scorers in the sample, showing what score level on WISC-IV is predicted by a high score on the screening test. Results showed VQNSAS was a significantly better predictor of high VCI and high GAI, with the top 5% scoring 12 and 5.8 points higher, respectively, on VCI and GAI than the top 5% of NAI scorers. NAI appeared nominally better at predicting high PRI, but the difference was not significant.

Impact of Grade and Age Differences in Sample

Because the CogAT6 data used in this study came exclusively from second graders and the NNAT2 data included a mix of kindergarteners, first graders, and second graders, it was conceivable that age differences influenced the comparison of subgroup performance and WISC-IV performance between the two screening tests. To investigate this possibility, means, standard deviations, and WISC-IV results were disaggregated by grade, as shown in Table 5.

No evidence of substantial differences in NAI mean scores across grade levels was found, although variability decreased as grade level increased. This increased variability in kindergarten and first grade may have resulted in more high scores on NNAT2 than would have been found if the sample had been restricted to second graders. First grade PRI and GAI scores were significantly higher than kindergarten scores (p < .05). However, the trend does not continue into second grade, as these scores were not significantly different from either grade.

Table 3. Percentage of Students Within Subgroups Above Selected Score Levels and CogAT:NNAT Log Odds Ratio Effect Sizes.

| Group | NNAT NAI: % | CogAT NSAS: % | CogAT NSAS: Log odds effect size | CogAT VQNSAS: % | CogAT VQNSAS: Log odds effect size |
|---|---|---|---|---|---|
| Top 20% | | | | | |
| Female | 21.6 | 22.0 | .02 | 21.8 | .01 |
| White | 25.4 | 24.0 | −.08 | 26.9 | .08 |
| Black | 4.1 | 6.2 | .44 | 2.6 | −.47 |
| Hispanic | 11.0 | 11.3 | .03 | 6.0 | −.66 |
| Asian | 50.5 | 48.0 | −.10 | 34.5* | −.66 |
| Multiracial | 16.8 | 20.2 | .23 | 16.8 | .00 |
| ELL | 17.7 | 25.1 | .44 | 5.5* | −1.31 |
| Top 10% | | | | | |
| Female | 10.2 | 10.9 | .07 | 11.0 | .08 |
| White | 11.6 | 12.1 | .05 | 13.9* | .21 |
| Black | 1.6 | 2.6 | .50 | 1.0 | −.48 |
| Hispanic | 3.7 | 4.6 | .23 | 3.5 | −.06 |
| Asian | 36.0 | 30.1 | −.27 | 18.9* | −.88 |
| Multiracial | 9.8 | 8.1 | −.21 | 4.8 | −.77 |
| ELL | 10.4 | 15.1 | .43 | 1.0* | −2.44 |
| Top 5% | | | | | |
| Female | 4.5 | 5.2 | .15 | 5.2 | .15 |
| White | 5.7 | 6.1 | .07 | 6.9 | .20 |
| Black | 0.6 | 0.7 | .16 | .4 | −.41 |
| Hispanic | 2.1 | 1.1 | −.66 | 1.1 | −.66 |
| Asian | 22.0 | 16.9 | −.33 | 11.1* | −.81 |
| Multiracial | 1.9 | 2.1 | .10 | 3.6 | .66 |
| ELL | 7.6 | 8.5 | .12 | .5* | −2.80 |

Note. NNAT = Naglieri Nonverbal Ability Test, Second Edition; CogAT = Cognitive Abilities Test; NSAS = Nonverbal Standard Age Score; VQNSAS = Verbal, Quantitative, and Nonverbal Standard Age Score; NAI = Naglieri Ability Index; ELL = English-language learners.
*Significant Fisher Exact test when compared with NAI percentage at p < .05.

Table 4. WISC-IV Performance of Top 5%.

| Screening test | n | VCI, M (SD) | PRI, M (SD) | GAI, M (SD) |
|---|---|---|---|---|
| VQNSAS | 182 | 124.3* (11.5) | 126.4 (10.2) | 130.0* (10.6) |
| NAI | 161 | 112.3 (14.8) | 128.4 (11.5) | 124.2 (13.1) |

Note. WISC-IV = Wechsler Intelligence Scale for Children, Fourth Edition; VQNSAS = Verbal, Quantitative, and Nonverbal Standard Age Score; NAI = Naglieri Ability Index; VCI = Verbal Comprehension; PRI = Perceptual Reasoning; GAI = General Ability Index.
*Difference between CogAT6 and NNAT2 sample was significant at p < .001.

Table 5. NNAT2 Descriptives and WISC-IV Scores of Top 5% by Grade.

| Group | Grade K, M (SD) | Grade 1, M (SD) | Grade 2, M (SD) |
|---|---|---|---|
| Overall | 97.9 (18.3) | 97.5 (17.9) | 96.3 (15.7) |
| Female | 97.6 (17.9) | 97.8 (17.0) | 97.1 (15.0) |
| White | 101.3 (16.6) | 101.2 (15.7) | 99.0 (14.4) |
| Black | 85.0 (18.1) | 83.5 (16.8) | 85.2 (14.7) |
| Hispanic | 93.8 (18.9) | 92.7 (14.7) | 93.2 (14.0) |
| Asian | 108.8 (15.3) | 112.5 (16.7) | 107.5 (16.9) |
| Multiracial | 96.4 (17.9) | 99.1 (16.1) | 97.0 (13.0) |
| ELL | 95.3 (21.7) | 91.0 (19.7) | 93.5 (17.0) |
| (Top 5%) WISC-IV VCI | 110.2 (16.6) | 113.8 (12.3) | 113.3 (15.4) |
| (Top 5%) WISC-IV PRI | 125.8 (11.3) | 131.4 (10.2) | 128.0 (12.8) |
| (Top 5%) WISC-IV GAI | 121.3 (13.3) | 126.9 (11.0) | 124.5 (14.8) |

Note. NNAT2 = Naglieri Nonverbal Ability Test, Second Edition; WISC-IV = Wechsler Intelligence Scale for Children, Fourth Edition; VCI = Verbal Comprehension; PRI = Perceptual Reasoning; GAI = General Ability Index.

Discussion

The purpose of this study was to compare subgroup performance and WISC-IV prediction of the CogAT6 and the NNAT2 in the context of selection for gifted services. Our field data inform the debate about whether or not the NNAT2 is an effective tool for addressing the underrepresentation of minorities in gifted programming. In this study, none of the three screening measures (VQNSAS, NSAS, and NAI) yielded similar mean performance or identification rates across subgroups, meaning that performance gaps among subgroups persisted across instruments.

Within our sample, multiracial, Hispanic, and ELL students did perform less disparately on average from White students on the NNAT2 than they did on the CogAT6 VQNSAS, but this was not true for Black students. Furthermore, any narrowing of performance gaps did not translate into significantly higher rates of identification at likely selection cut scores, with the exception of ELL students. The advantage to ELL students on the NNAT2 may be attributable to an overall Asian advantage on nonverbal items. Asian ELL students outperformed non-Asian ELL students, and the overall Asian sample outperformed all other groups in both mean scores and identification rates, most significantly on the nonverbally oriented NSAS and NAI. Exceptional Asian and Asian-ELL performance may also be partly attributable to the fact that the Asian population in this district is affiliated disproportionately with a large research university and several medical institutions, and thus is a particularly talented Asian sample that has been attracted from other states and countries.

Of the three screening measures, VQNSAS yielded the lowest ELL means and identification rates relative to non-ELLs, which could suggest either a disadvantage on verbal items or difficulty with directions spoken in the English language. The CogAT6 Directions for Administration anticipate this and advise that

students who have just begun instruction in English are not likely to be able to answer many of the questions on the Verbal and Quantitative batteries. . . . However, these students can generally take the tests in the Nonverbal battery. (Lohman & Hagen, 2001b, p. 9)

In fact, our results suggested that the CogAT6 Nonverbal battery was similar to the NNAT2 in identifying students from underrepresented groups at hypothetical cut scores and was better than the NNAT2 at moderating the mean score disadvantage to Black, Hispanic, multiracial, and non-Asian ELL students.

Of the three screening measures, VQNSAS was the best predictor of high GAI, which may be taken as evidence that it is a better measure of general intelligence because of the broader range of item formats and reasoning abilities sampled. This conclusion, however, assumes that GAI is a “gold standard” measure; one could conclude alternatively that the CogAT6 Verbal and Quantitative batteries and the WISC-IV Verbal subtests simply share a heavy loading of achievement or language factors.

Limitations

Several limitations of this study stem from its dependence on data from one district’s gifted testing records. First, it would have been preferable for analysis if all students had taken the CogAT6, the NNAT2, and the WISC-IV. Instead, practical and financial considerations at the district level meant that each student took either the CogAT6 or the NNAT2 and only a small portion of these students took the WISC-IV. Second, due to sample size limitations, we compared results from slightly different grade levels and from test administrations that took place over the course of 5 years. This increased the possibility of unmeasured changes in the sampled population. Third, the fact that the NNAT2 was administered online may raise questions about a disadvantage for subgroups with less early childhood experience with computers (Huff & Sireci, 2001). Fourth, the latest form of the CogAT (CogAT7, Lohman, 2012a), which has been updated to improve ELL fairness, was not represented in this study (see also Lohman & Gambrell, 2012). Finally, the study is limited to one Midwestern district and may not be representative of other districts.

Two theoretical issues also limit the practical implications of the results. First, one cannot expect one test to perform in isolation as a reliable, valid, and equitable selection tool when matching gifted services to students (NAGC, 2010b, Standard 2.2.5). Dai (2010) confirmed that dependence on a single measure is common (p. 248), but likened it to “putting all of the eggs in one basket” (pp. 224-225). Using a group ability test as a screening test to inform who goes on for individual testing is a similarly flawed practice. Ideally, one would administer multiple measures to all students and find a fair way to interpret them in combination (Lohman, 2012b). Furthermore, improving minority representation in gifted programming need not require the development of a test on which all subgroups perform identically. The use of local and subgroup norms, for example, offers a defensible framework for identifying talent among underrepresented groups (Lohman, 2012b, p. 27). The utility of the NNAT2 in addressing underrepresentation is as much about how the test fits into a larger approach to identification as it is about how different groups perform on it.

The second theoretical limitation relates to WISC-IV predictivity. The fact that high performance on the CogAT6 was more predictive of high performance on the WISC-IV than was high performance on the NNAT2 could be interpreted either as evidence that the WISC-IV and the CogAT6 share the “achievement” loading that the NNAT2 seeks to avoid, or as evidence that the CogAT6 is a better measure of general ability than the NNAT2. This is ultimately a philosophical question about which abilities should be used to define academic giftedness, as well as a practical question of which abilities are most required by a particular gifted program.

Conclusion

This study raises doubts about the claims of at least one nonverbal test that it can better identify students from underrepresented groups for gifted services. Districts should not assume that one instrument will be a panacea and, instead, might consider using nonverbal ability tests as one tool in a wider approach to identifying and serving students in these groups.

Declaration of Conflicting Interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The authors received no financial support for the research, authorship, and/or publication of this article.

References

Anastasi, A., & Urbina, S. (1997). Psychological testing (7th ed.). Upper Saddle River, NJ: Prentice Hall.

Borland, J. H. (2004). Issues and practices in the identification and education of gifted students from under-represented groups (Research Monograph No. 04186). Storrs: University of Connecticut, The National Research Center on the Gifted and Talented.

Callahan, C. M. (2005). Identifying gifted students from underrepresented populations. Theory into Practice, 44, 98-104.

Carman, C. A., & Taylor, D. K. (2010). Socioeconomic status effects on using the Naglieri Nonverbal Ability Test (NNAT) to identify the gifted/talented. Gifted Child Quarterly, 54, 75-84.

Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum.

Dai, D. Y. (2010). The nature and nurture of giftedness: A new framework for understanding gifted education. New York, NY: Teachers College Press.

Donovan, M. S., & Cross, C. T. (Eds.). (2002). Minority students in special and gifted education. Washington, DC: National Academies Press.

Dupont, W. D., & Plummer, W. D. (1997). PS power and sample size program available for free on the Internet. Controlled Clinical Trials, 18, 274.

Ford, D. Y. (1998). The underrepresentation of minority students in gifted education. Journal of Special Education, 32, 4-14.

Huff, K. L., & Sireci, S. G. (2001). Validity issues in computer-based testing. Educational Measurement: Issues and Practice, 20, 16-25.

Lohman, D. F. (2003a). The Wechsler Intelligence Scale for Children III and the Cognitive Abilities Test (Form 6): Are the general factors the same? Retrieved from http://faculty.education.uiowa/dlohman/pdf/CogAT-WISC_final_2col2r.pdf

Lohman, D. F. (2003b). The Woodcock-Johnson III and the Cognitive Abilities Test (Form 6): A concurrent validity study. Retrieved from http://faculty.education.uiowa.edu/dlohman/pdf/CogAT_WJIII_final_2col%202r.pdf

Lohman, D. F. (2005a). Review of Naglieri and Ford (2003): Does the Naglieri Nonverbal Ability Test identify equal proportions of high-scoring White, Black, and Hispanic students? Gifted Child Quarterly, 49, 19-28.

Lohman, D. F. (2005b). The role of nonverbal ability tests in identifying academically gifted students: An aptitude perspective. Gifted Child Quarterly, 49, 111-138.

Lohman, D. F. (2012a). Cognitive Abilities Test (Form 7). Rolling Meadows, IL: Riverside.

Lohman, D. F. (2012b). Decision strategies. In S. L. Hunsaker (Ed.), Identification: The theory and practice of identifying students for gifted and talented education services (pp. 217-248). Mansfield Center, CT: Creative Learning Press.

Lohman, D. F., & Gambrell, J. L. (2012). Using nonverbal tests to help identify academically talented children. Journal of Psychoeducational Assessment, 30, 25-44.

Lohman, D. F., & Hagen, E. P. (2001a). Cognitive Abilities Test (Form 6). Itasca, IL: Riverside.

Lohman, D. F., & Hagen, E. P. (2001b). Cognitive Abilities Test (Form 6): Directions for administration. Itasca, IL: Riverside.

Lohman, D. F., & Hagen, E. P. (2002). Cognitive Abilities Test (Form 6): Research handbook. Itasca, IL: Riverside.

Lohman, D. F., Korb, K. A., & Lakin, J. M. (2008). Identifying academically gifted English-language learners using nonverbal tests. Gifted Child Quarterly, 52, 275-296.

Naglieri, J. A. (1997). Naglieri Nonverbal Ability Test. San Antonio, TX: Psychological Corporation.

Naglieri, J. A. (2008a). Naglieri Nonverbal Ability Test (2nd ed.). San Antonio, TX: NCS Pearson.

Naglieri, J. A. (2008b). Naglieri Nonverbal Ability Test (Second Edition) manual: Technical information and normative data. San Antonio, TX: NCS Pearson.

Naglieri, J. A. (2010, July). The truth about IQ and achievement. Paper presented at Learning and the Brain Conference, Boston, MA. Retrieved from http://www.jacknaglieri.com/wordpress/wp-content/uploads/2010/11/The-Truth-About-IQ-Ach-HNDT.pdf

Naglieri, J. A., Brulles, D., & Landsdowne, K. (2008). Helping all gifted children learn: A teacher’s guide to using the NNAT2. San Antonio, TX: Pearson.

Naglieri, J. A., & Ford, D. Y. (2003). Addressing underrepresentation of gifted minority children using the Naglieri Nonverbal Ability Test (NNAT). Gifted Child Quarterly, 47, 155-160.

Naglieri, J. A., & Ford, D. Y. (2005). Increasing minority children’s participation in gifted classes using the NNAT: A response to Lohman. Gifted Child Quarterly, 49, 29-36.

Naglieri, J. A., & Ronning, M. E. (2000). Comparison of White, African-American, Hispanic, and Asian children on the Naglieri Nonverbal Ability Test. Psychological Assessment, 12, 328-334.


National Association for Gifted Children. (2010a). NAGC position statement: WISC-IV. Retrieved from http://www.nagc.org/index.aspx?id=2455

National Association for Gifted Children. (2010b). NAGC pre-K-grade 12 gifted education programming standards: A blueprint for high quality gifted education programs. Washington, DC: Author.

Otis, A. S., & Lennon, R. T. (2003). Otis-Lennon School Ability Test (8th ed.). San Antonio, TX: Psychological Corporation.

Pearson. (2003). Stanford Achievement Test Series (10th ed.). San Antonio, TX: Author.

Pearson. (2012). Introduction to the Naglieri Nonverbal Ability Test–Second Edition (NNAT2). Retrieved from http://www.pearsonassessments.com/haiweb/Cultures/en-US/Site/Community/Education/Products/NNAT2/nnat2.htm

Psychological Corporation. (2001). Wechsler Individual Achievement Test (2nd ed.). San Antonio, TX: Author.

Rosenthal, J. A. (1996). Qualitative descriptors of strength association and effect size. Social Service Research, 21, 37-59.

Rowe, E. W., Kingsley, J. M., & Thompson, D. F. (2010). Predictive ability of the General Ability Index (GAI) versus the Full Scale IQ among gifted referrals. School Psychology Quarterly, 25, 119-128.

U.S. Department of Education. (1993). National excellence: A case for developing America’s talent. Washington, DC: Author.

Villarreal, C. A. (2005). An analysis of the reliability and validity of the Naglieri Nonverbal Ability Test (NNAT) with English Language Learner (ELL) Mexican-American Children (Doctoral dissertation). Retrieved from http://repository.tamu.edu/bitstream/handle/1969.1/3850/etd-tamu-2005A-SPSY-Villarr.pdf?sequence=1

Wechsler, D. (1991). Wechsler Intelligence Scale for Children (3rd ed.). San Antonio, TX: Psychological Corporation.

Wechsler, D. (2003a). Wechsler Intelligence Scale for Children (4th ed.). San Antonio, TX: Psychological Corporation.

Wechsler, D. (2003b). Wechsler Intelligence Scale for Children (Fourth Edition): Technical and interpretive manual. San Antonio, TX: Psychological Corporation.

Woodcock, R. W., McGrew, K. S., & Mather, N. (2001). Woodcock-Johnson III Tests of Cognitive Abilities. Itasca, IL: Riverside.

Author Biographies

Jacob A. Giessman is Co-Director of the Center for Gifted Education at the Columbia (Mo.) Public Schools. He is the former head of Academy Hill School in Springfield, Massachusetts, and served on the Massachusetts Department of Elementary and Secondary Education’s Gifted-Talented Advisory Council.

James L. Gambrell is a doctoral student in Educational Psychology at the University of Iowa. His research interests include test validity, growth modeling, school effectiveness, and the use of assessments to identify gifted children.

Molly S. Stebbins is the Coordinator of Psychological Services for the Columbia Public Schools in Missouri and has served in the public school system for over 13 years. She is a nationally certified school psychologist and an adjunct assistant professor at the University of Missouri-Columbia.
