A CONSTRUCT VALIDATION STUDY OF PHONOLOGICAL AWARENESS FOR

CHILDREN ENTERING PRE-KINDERGARTEN

by

MI-YOUNG WEBB

(Under the Direction of Seock-Ho Kim)

ABSTRACT

The purpose of this study was to determine the psychometric characteristics of phonological awareness assessment in pre-kindergarten children based on Messick’s (1989) framework for unitary construct validity. Four hundred and fifteen pre-kindergarten children were given eight tasks of phonological awareness drawn from “The Phonological Awareness Test” (Robertson & Salter, 1997). Four aspects of construct validity (content, substantive, structural, and external) were examined. The item analysis indicated high internal consistency; however, the items in each task were fairly difficult for this age group. Factor analysis with varimax rotation suggested that two factors may underlie the phonological awareness measurement. Although the effect size was small, multiple regression analysis indicated that a linear combination of two tasks had statistically significant predictive validity for beginning alphabet sound knowledge in pre-kindergarten.

INDEX WORDS: Validation, Messick’s unitary construct validity, Reading, Phonological awareness, Assessment



by

MI-YOUNG WEBB

B.S., Cheongju University, South Korea, 1997

A Thesis Submitted to the Graduate Faculty of The University of Georgia in Partial

Fulfillment of the Requirement for the Degree

MASTER OF ARTS

ATHENS, GEORGIA

2003

© 2003

Mi-Young Webb

All Rights Reserved



by

MI-YOUNG WEBB

Major Professor: Seock-Ho Kim

Committee: Steve Olejnik
Paula Schwanenflugel

Electronic Version Approved:

Maureen Grasso
Dean of the Graduate School
The University of Georgia
August 2003

TABLE OF CONTENTS

LIST OF TABLES

LIST OF FIGURES

CHAPTER

I. INTRODUCTION
    Reading and Academic Performance
    The Component of Reading Acquisition
    Overview

II. PHONOLOGICAL AWARENESS
    Definition of Phonological Awareness
    The Role of Phonological Awareness in Reading Acquisition
    Developmental Sequence of Phonological Awareness
    Validity Test for Phonological Awareness Tasks
    Purpose of the Study

III. VALIDITY
    Traditional Conception of Validity
    Unified Conception of Validity
    Validity as Integrated Evidence
    Facets of the Unitary Validity

IV. METHOD
    Participants
    Materials
    Procedure

V. RESULTS
    The Content Aspect of Construct Validity
    The Substantive Aspect of Construct Validity
    The Structural Aspect of Construct Validity
    The External Aspect of Construct Validity

VI. DISCUSSION
    The Content Aspect of Construct Validity
    The Substantive Aspect of Construct Validity
    The Structural Aspect of Construct Validity
    The Generalizability Aspect of Construct Validity
    The External Aspect of Construct Validity
    The Consequential Aspect of Construct Validity

VII. CONCLUSION

REFERENCES

APPENDIX: PHONOLOGICAL AWARENESS TEST

LIST OF TABLES

Table 1: The Maximum Scores, the Means, and the Standard Deviations for Phonological Awareness Tasks Based on the Preliminary Item Condition

Table 2: The Maximum Scores, the Means, and the Standard Deviations for Phonological Awareness Tasks Based on the Actual Item Condition

Table 3: Coefficients Alpha and the Standard Error of Measurements for Phonological Awareness Tasks Based on the Preliminary Item Condition

Table 4: Coefficients Alpha and the Standard Error of Measurements for Phonological Awareness Tasks Based on the Actual Item Condition

Table 5: The Mean Levels of Task Difficulty of Phonological Awareness

Table 6: Item Analyses for Rhyming Discrimination Task Based on the Preliminary Item Condition

Table 7: Item Analyses for Rhyming Discrimination Task Based on the Actual Item Condition

Table 8: Item Analyses for Syllable Segmentation Task Based on the Preliminary Item Condition

Table 9: Item Analyses for Syllable Segmentation Task Based on the Actual Item Condition

Table 10: Item Analyses for Initial Isolation Task Based on the Preliminary Item Condition

Table 11: Item Analyses for Initial Isolation Task Based on the Actual Item Condition

Table 12: Item Analyses for Phoneme Blending Task Based on the Preliminary Item Condition

Table 13: Item Analyses for Phoneme Blending Task Based on the Actual Item Condition

Table 14: Intercorrelations among the Phonological Awareness Tasks

Table 15: Factors, Eigenvalues, Percentage of Variance Accounted for

Table 16: Factor Loadings for One-Factor Solution

Table 17: Factor Loadings for Two-Factor Solution after Varimax Rotation

Table 18: The Means and the Standard Deviations of Alphabet Sound Upper and Lower Case Knowledge Tests

Table 19: Predictive Correlations between Phonological Awareness Tasks and Alphabet Sound Upper and Lower Case Knowledge Tests

Table 20: The Means and the Standard Deviations of Phonological Awareness Tasks by Gender Groups

Table 21: The Means and the Standard Deviations of Phonological Awareness Tasks by Ethnicity Group

Table 22: The Means and the Standard Deviations of Phonological Awareness Tasks by Socioeconomic Group

LIST OF FIGURES

Figure 1: Developmental Sequence of Phonological Awareness

Figure 2: Facets of Unitary Validity

Figure 3: Plot of Eigenvalues and Factors of Scree Test

Figure 4: The Procedure for Assessment Construction and Validation

I. INTRODUCTION

Reading and Academic Performance

Research in early reading acquisition has received considerable attention because

children’s early reading skills have a strong and continuous relationship with their later

academic performance. Children who learn to read early and well are more likely to

become familiarized with print and to increase knowledge domains (Cunningham &

Stanovich, 1997). On the other hand, children who experience difficulties in learning to

read at early ages tend to continue their reading difficulties over time regardless of

remedial services (Johnston & Allington, 1991) and to fall behind in other academic

areas that depend heavily on reading skills (Stanovich, 1986; Chall, Jacobs, &

Baldwin, 1990; Stevenson & Newman, 1986).

The Component of Reading Acquisition

No single factor determines the emergence of literacy because reading

development involves complex cognitive levels and multiple activities. Some studies

indicated positive and longitudinal correlations between oral language skills and reading

(Bishop & Adams, 1990). Other research suggests vocabulary skills significantly

influence learning to read (Wagner, Torgesen, Rashotte, Hecht, Barker, Burgess,

Donahue, & Garon, 1997). Whitehurst and Lonigan (1998) proposed three

components of emergent literacy: oral language skills, phonological

processing abilities, and print knowledge. Lonigan, Burgess, and Anthony (2000) found

that phonological sensitivity and letter knowledge explained 54% of the variation in

children’s decoding skills. Although researchers differ on the

components of emergent literacy, a substantial amount of research has revealed a

significant and continual relationship between phonological awareness and the

acquisition of early reading and spelling (Bradley & Bryant, 1983; Goswami & Bryant,

1990). Much research has suggested that children’s implicit understanding of and ability

to manipulate the sound system of language, which is known as phonological awareness,

is a crucial precursor to the emergence of early literacy. Because of the important role of

phonological awareness in young children, a considerable amount of research has tried to

operationalize the concept of phonological awareness.

Overview

This study investigates measures of phonological awareness for pre-kindergarten

children in terms of their psychometric characteristics. It focuses on how the

framework for unitary construct validity suggested by Messick (1989) can be

implemented in practice. Before the validation study, previous research on phonological

awareness, including the relationship between phonological awareness and early

reading acquisition, the developmental sequence of phonological awareness, and

validity studies of phonological awareness, is briefly reviewed in the next section.

II. PHONOLOGICAL AWARENESS

Definition of Phonological Awareness

Because phonological awareness involves understanding that words can be

divided into segments of sound smaller than a syllable and learning about individual

phonemes, one must know what a phoneme is in order to understand the concept of

phonological awareness (Torgesen & Mathes, 2000). A phoneme is the smallest unit of

sound in a language that makes a difference in meaning. Phonemic awareness

– a subset of phonological awareness – refers to the awareness that spoken language

consists of a sequence of phonemes (Yopp & Yopp, 2000).

Broadly speaking, phonological awareness refers to the sensitivity to or explicit

awareness of and the ability to manipulate the sound units in spoken language. Thus,

phonological awareness includes the ability to generate and recognize rhyming words, to

count syllables, to segment a word into phonemes, and to separate the beginning of a word

from its ending. Beginning readers should understand the fundamental principle that

speech can be segmented and these sound units can be represented by printed forms

(Liberman, Shankweiler, Fischer, & Carter, 1974). Without phonological awareness,

young children have difficulty in understanding how alphabetic transcription works, and

consequently, their ability to learn to read is hindered (Torgesen, 1999; Blachman, 1994;

Liberman, Shankweiler, & Liberman, 1989).

The Role of Phonological Awareness in Reading Acquisition

Overwhelming evidence from a variety of populations and tasks has indicated a

strong and specific relationship between phonological awareness and early acquisition of

reading and spelling (Adams, 1990; Bradley & Bryant, 1983; Bryant, MacLean, &

Bradley, 1990; Goswami & Bryant, 1990; Stanovich, 1992; Wagner & Torgesen, 1987).

Children who have better abilities in analyzing and manipulating rhymes, syllables, and

phonemes are better at learning to read than children who have difficulties in acquiring

these skills. The relationship between phonological awareness and early reading

acquisition is present even after such factors as intelligence, vocabulary skills, and

listening comprehension are partialled out (Bryant, MacLean, Bradley, & Crossland,

1990; Stanovich, 1992; Wagner & Torgesen, 1987).

Some researchers have explained that the complex relationship between the

sounds of speech and the signs of print makes it difficult for young readers to perceive

the phonemic segments in speech (Liberman, 1978; Torgesen & Mathes, 2000). For

example, the three segments of the written word “lag” overlap with one another (coarticulation)

and create a single sound in speech production. Coarticulating the phonemes in words

makes it difficult for beginning readers to identify phonemes as unique parts of speech.

Also, letters and phonemes do not always correspond to each other consistently, which

means that a given graphic symbol may represent different speech sounds in different words

(Liberman, Cooper, Shankweiler, & Studdert-Kennedy, 1967).

Torgesen and Mathes (2000) explain that phonological awareness is not the only

determinant of the early acquisition of reading, but it is a critical precursor to effective

reading skills. Phonological awareness promotes children’s understanding of the

relationship between speech and alphabetic orthography. Children must understand that

speech is composed of sound segments at the level of phonemes in order to read the

words in print (Blachman, 1994; Liberman, Shankweiler, & Liberman, 1989; Yopp &

Yopp, 2000). Also, phonological awareness helps children perceive the categories of

common sounds that are represented by common letters. The ability to observe the

correspondence between letters and sounds in words reinforces children’s knowledge of

common spelling patterns and accurate recognition of whole words that come up in print

repeatedly (Bryant, MacLean, & Bradley, 1990; Goswami, 1986, 1988; Torgesen &

Mathes, 2000). Finally, phonological awareness enables children to produce possible

words in context from the partially sounded out words by elaborating similar phonemes

in words. Indeed, children who are quick to develop the ability to analyze and to

construct a connection between sound segments and letters almost invariably become

better readers than children who have difficulties in developing these skills (Share &

Stanovich, 1995).

Developmental Sequence of Phonological Awareness

Numerous studies and interventions that used various tasks of phonological

awareness have found that, regardless of task requirements, phonological awareness tasks

account for a large portion of the common variance of the construct that underlies the

measurement. In addition, these studies have demonstrated the different developmental

levels of task difficulty (Adams, 1990; Stahl & Murray, 1994; Stanovich, Cunningham, &

Cramer, 1984; Yopp, 1988). Understanding the developmental sequence of phonological

awareness is important because it is directly related to the issues of validity of

assessment. Different tasks involve different levels of cognitive and linguistic abilities or

age-appropriateness; thus, a child’s assessed level of phonological awareness might be

largely determined by the complexity of the tasks (Backman, 1983; Burt, Holm, & Dodd,

1999).

Generally, the ability to analyze larger units (rhyme and syllable) is developed

prior to the ability to analyze smaller units (phoneme). Hoien, Lundberg, Stanovich, and

Bjaalid (1995) proposed that sensitivity to rhyme is the beginning of the

developmental continuum of phonological awareness, phoneme segmentation is the

end of the continuum, and syllable segmentation may be an intermediate level.

Children as young as 3 years of age show sensitivity to rhyme, which is a

more global aspect of sound structure of words (Lonigan, Burgess, Anthony, & Barker,

1998; MacLean, Bryant, & Bradley, 1987). Children’s knowledge of nursery rhymes at

age 3 is significantly related to the measure of rhyme detection a year later (MacLean et

al., 1987), and early sensitivity to rhyme and alliteration predicts later awareness of

phonemes, which plays an important role in reading development (Bryant et al., 1990).

There is a ceiling effect on rhyme detection and production tasks at the kindergarten

level, and most children are able to blend and segment words into the syllabic unit.

Nonetheless, they cannot segment the words into a series of phonemes at this age level

(Blachman, 1994; Stanovich et al., 1984; Yopp, 1988). By the end of first grade, the

majority of children can manipulate phonemes. They can add, delete, or move phonemes

and generate words. More specific developmental processes of phonological awareness

can be found in Figure 1 (cf. Hill, 1999; Torgesen & Mathes, 2000).

Validity Test for Phonological Awareness Tasks

As discussed earlier, a great amount of research using various measures has

focused on the concept of phonological awareness and has found convergent evidence

that performances on phonological awareness tasks are intercorrelated.

Furthermore, regardless of the measures that have been used, phonological awareness

tasks share a large portion of total variance, which, in turn, provides evidence for

construct validity of phonological awareness (Hoien et al., 1995; Stanovich et al., 1984;

Yopp, 1988). Two examples of test validity for phonological awareness tasks are briefly

discussed in this section.

Yopp (1988) administered 10 commonly used phonological awareness tasks,

including a rhyming task, auditory discrimination, phoneme blending, phoneme counting,

phoneme deletion, phoneme segmentation, sound isolation, and word-to-word matching

task, to 96 kindergarten children with an average age of 5 years, 10 months. She found

that phoneme deletion was the most difficult task and rhyming was the easiest

task. She conducted a principal factor analysis with oblique rotation and found that the

first factor accounted for 58.7% of the variance and the second factor accounted for an

additional 9.5% of the variance. In addition, phoneme blending, phoneme counting,

phoneme segmentation, and sound isolation all loaded highly on the first factor and the

two phoneme deletion tasks loaded highly on the second factor. She labeled the first

factor as “Simple Phonemic Awareness”, and the second factor as “Compound Phonemic

Awareness”. A stepwise regression analysis was also conducted, with the score on the

learning rate test as the dependent variable and 10 tests of phonological awareness as

predictors. The sound isolation task explained 52% of the variance, and the phoneme

deletion task explained 10% of the variance in the learning rate test.
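
The percentages of variance attributed to each factor can be related to eigenvalues. The following is a minimal sketch, assuming the factors are extracted from a correlation matrix of standardized task scores, so that the total variance equals the number of tasks and each factor's share is its eigenvalue divided by that number. In a principal factor analysis with estimated communalities the denominator may instead be the total common variance, so the sketch only approximates that case. The eigenvalues are hypothetical and are not taken from Yopp (1988); the function name `variance_explained` is likewise illustrative.

```python
def variance_explained(eigenvalues, n_variables):
    """Return (proportion, cumulative proportion) pairs, one per factor.

    Assumes the eigenvalues come from a correlation matrix of
    n_variables standardized variables, whose total variance is n_variables.
    """
    results = []
    cumulative = 0.0
    for value in eigenvalues:
        proportion = value / n_variables
        cumulative += proportion
        results.append((proportion, cumulative))
    return results


# Hypothetical eigenvalues for eight tasks (illustrative only, not study data).
eigenvalues = [4.0, 1.2, 0.9, 0.7, 0.5, 0.4, 0.2, 0.1]
table = variance_explained(eigenvalues, n_variables=len(eigenvalues))

for index, (proportion, cumulative) in enumerate(table, start=1):
    print(f"Factor {index}: {proportion:6.1%} of variance, {cumulative:6.1%} cumulative")
```

Reading the cumulative column alongside the individual proportions shows how much additional variance each successive factor contributes, which is the comparison reported above for the first and second factors.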

Hoien, Lundberg, Stanovich, and Bjaalid (1995) utilized a very large sample size

to examine the differential validity of the different levels of phonological awareness. Six

types of phonological awareness tasks including rhyme recognition, syllable counting,

phoneme counting, initial phoneme matching, initial phoneme deletion, and phoneme

blending were administered to 128 Norwegian preschool children. The average age of

the children was 6 years, 11 months. A principal factor analysis using varimax rotation

revealed a three-factor solution. Initial phoneme matching, initial phoneme deletion,

phoneme blending, and phoneme counting loaded highly on the first factor,

which accounted for 38.6% of the variance. Syllable counting loaded highest on the

second factor, and rhyme recognition loaded highest on the third factor. The second and

third factors accounted for 18.4% and 17.6% of the variance, respectively. Hoien et

al. (1995) concluded that preschool children without any

formal reading instruction and with very limited reading skills showed phonemic

awareness.

Purpose of the Study

The studies of Yopp (1988) and Hoien et al. (1995) used large sample sizes and

included a variety of tasks to systematically investigate the concept of phonological

awareness in 5- to 6-year-old children. Similarly, most studies of

phonological awareness have assessed preliterate children at school entry, prior to

formal reading instruction. By comparison, much less research has

focused on the development of phonological awareness at the preschool age level,

specifically at age 4, even though considerable evidence indicates that

preschool children as young as age 3 show implicit knowledge of phonological

awareness (Bryant et al., 1990; MacLean et al., 1987).

The purpose of this study was to conduct a validity study regarding the off-level

use of The Phonological Awareness Test (Robertson & Salter, 1997) for identifying

phonological awareness in preliterate pre-kindergarten children using Messick’s (1989)

framework for unitary construct validity. Because validity is the most important

consideration in test development and use, the traditional view of validity and the six aspects of

the unitary concept of validity proposed by Messick (1989, 1995) are briefly reviewed

prior to the validation process for phonological awareness tasks.

III. VALIDITY

Validity is “the degree to which evidence and theory support the interpretations of

test scores entailed by proposed uses of tests” (AERA, APA, & NCME, 1999, p. 9).

Accordingly, validation is the most crucial procedure in test development and use

because it is a process of collecting evidence to support the intended interpretation of test

scores and implications of the score meaning.

Traditional Conception of Validity

The conception of validity has gradually shifted from numerous specific criterion

validities to a few distinct validity types and finally to a unitary validity concept (Messick,

1989). Although there has been increasing emphasis on construct validity as a unitary

conception of validity, three or four different types of validity have been commonly

utilized in various assessment settings since the early 1950s. The traditional view of

validity argues that the types or aspects of validity depend on the inferences to be drawn

from the test scores and the implications of entailed test interpretations. These separate

types of validity and the limitations of the traditional conception of validity are briefly

discussed.

Content Validity

Content validity refers to the degree to which the content of test samples

represents the content of a particular behavioral domain of interest. Content validity is

primarily concerned with adequate sampling of the content of the domain. The

knowledge and skills that are measured by the test items should be representative of the

larger domain of knowledge and skills. The other aspect of content validity involves the

format of the test such as clarity of questions or directions and appropriateness of

language. Content validity is evaluated based on the professional judgment about the

domain relevance and representativeness of the content according to specific criteria or

objectives. Based upon the agreement in judgments by a panel of content experts, test

developers revise or select the final items. Hence, content validation specifies the

universe of item content and the item-selection procedures (Messick, 1989).

Content validity is important because it accumulates judgmental evidence to

support the domain relevance and representativeness of the test content, which bear on

the nature of score inferences supported by other evidence. However, Messick (1989)

argues that using content validity as the sole validity evidence has a critical limitation.

Content validity does not take into consideration the response processes, the internal and

external structure of the test, or performance differences; thus, it does not provide enough

evidence supporting inferences to be made from the test scores.

Criterion-related Validity

Criterion-related validity is the degree to which the test scores are systematically

associated with one or more external criteria considered to directly measure the same

variable. There are two aspects of criterion-related validity – predictive and concurrent

criterion-related validity. Predictive validity refers to the extent to which the test scores

predict the future performance on the criteria, and concurrent validity indicates the extent

to which the test scores estimate the present performance on the criteria. Therefore,

criterion-related validity is a matter of how accurately the test scores predict criterion

performance. Criterion-related validity is evaluated based on the level of empirical

relationship, commonly estimated by correlations or regressions, between the test scores

and criteria scores. For this reason, determining appropriate criteria is a critical step in

criterion-related validation.
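
As an illustration of the empirical relationship described above, the following minimal sketch computes a Pearson correlation between test scores and criterion scores, a quantity often reported as a validity coefficient, together with its square as the proportion of variance the two measures share. The scores are hypothetical and are not data from any study discussed here; the function name `pearson_r` and the variable names are illustrative choices.

```python
import math


def pearson_r(test_scores, criterion_scores):
    """Pearson product-moment correlation between two equal-length score lists."""
    if len(test_scores) != len(criterion_scores) or len(test_scores) < 2:
        raise ValueError("need two score lists of equal length (at least 2 scores)")
    n = len(test_scores)
    mean_x = sum(test_scores) / n
    mean_y = sum(criterion_scores) / n
    # Deviations of each score from its own mean.
    dev_x = [x - mean_x for x in test_scores]
    dev_y = [y - mean_y for y in criterion_scores]
    cross = sum(dx * dy for dx, dy in zip(dev_x, dev_y))
    ss_x = sum(dx * dx for dx in dev_x)
    ss_y = sum(dy * dy for dy in dev_y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("scores must vary for the correlation to be defined")
    return cross / math.sqrt(ss_x * ss_y)


# Hypothetical scores for six children (illustrative only, not study data).
phonological_task = [2, 4, 5, 7, 8, 10]
alphabet_sound = [1, 3, 2, 6, 5, 9]

r = pearson_r(phonological_task, alphabet_sound)
print(f"validity coefficient r = {r:.2f}, shared variance r^2 = {r * r:.2f}")
```

Whether the coefficient is read as predictive or concurrent evidence depends on when the criterion scores were collected relative to the test scores, not on the computation itself.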

Criterion-related validity is not about the pattern of relationships between test

scores and other measures, but rather it is about prediction which is more concerned with

non-causal dependence. Furthermore, criterion-related validity relies very heavily on the

empirical relationships with selected external measures. For this reason, criterion-related

validity may be too narrow to reflect the definition of validity because it does not

consider any other sources of evidence besides specific test-criterion relationships

(Messick, 1989).

Construct Validity

Construct validity refers to the extent to which test scores support the presence of

the psychological construct that underlies the measurement. In this manner, construct

validity is concerned with abstract and theoretical traits such as self-esteem, motivation,

temperament, and creativity. Construct validation begins with the operational definition

of the construct based on the literature reviews and theoretical reasoning. The process of

operationalizing the concept is similar to the process of content validation. Operational

definition is a process of defining the theoretical terms and specifying the hypotheses for

the legitimate experimental procedures for applying a theory (Messick, 1989). After the

construct is operationally defined, the hypotheses – the relationships between the

measures of concepts – are logically and empirically examined. In this process, it is

crucial to evaluate the test items for bias or construct-irrelevant variance, which

systematically influences the test scores. Finally, the empirical evidence is interpreted

in terms of whether it is consistent with the hypotheses or with rival theories.

Construct validity can be assessed through internal and external test structures, which

concern the pattern of relationships among item scores or between test scores and other

measures. Construct validity also involves study of performance differences over time,

across groups and settings, and in response to experimental treatments. On these

grounds, construct validity is an integration of any evidence to support the meaning of the

test scores (Messick, 1989).

Unified Conception of Validity

Traditional distinct types of validity – content, criterion-related, and construct

validity – have been widely utilized in various assessment settings. However, it is

common that the inferences to be drawn from test scores require several types of

validation rather than just one (e.g., Cronbach & Meehl, 1955). Moreover,

content validity as sole validity evidence is insufficient because it does not reflect on the

internal and external test structures and response processes. Thus, it does not provide

evidence that bears on inferences to be made from the test scores. Likewise, criterion-related

validity strictly depends on the specific test-criterion relationships and does not

consider any other sorts of evidence. On that account, Messick (1995) argues that the

traditional conception of validity is fragmented and incomplete because it fails to take

into consideration the evidence for the actual and potential consequences of score

interpretation and use. In addition, he argues that the types of validity are not

alternatives but supplements to one another because all of these forms of evidence

fundamentally support the interpretation and implication of the test scores. Hence, the

relation between the evidence and the inferences should determine the validation

approach focus rather than a type of validity (Messick, 1989). This is why validity is

identified as a unitary concept.

In Messick’s (1989, 1995) view, construct validity incorporates content relevance

and representativeness as well as criterion-relatedness since information about the

domain content relevance and about the specific criterion-relationships predicted by the

test scores clearly influences score interpretation. Therefore, construct validity comprises

almost all aspects of validity evidence. A unitary conception of validity should intermix

considerations of content, criteria, and consequences into a construct framework to

empirically test the rational hypotheses about the interpretation and utility of the test

scores (Messick, 1989, 1995).

Messick’s new unified concept of validity heavily emphasizes both score

meaning and social values in test interpretation and use. Messick (1989, 1995) suggests

six distinguishable aspects of construct validity to address the multiple and interrelated

validity questions to justify score interpretation and use. These are the content, substantive,

structural, generalizability, external, and consequential aspects of construct validity.

Descriptions of these six aspects are outlined to guide the validation of phonological

awareness tasks.

The Content Aspect of Construct Validity

Test content refers to the “themes, wording, and formats of the items, tasks, or

questions on a test as well as guidelines for procedures regarding administration and

scoring” (AERA et al., 1999, p. 11). Hence, the content aspect of construct validity

subsumes theoretical and empirical analyses of adequacy of content relevance,

representativeness, and technical quality (Messick, 1989, 1995). This validation process

is to gather the construct-relevant sources of task difficulty and to guide the rational

development and scoring of performance tasks and other assessment formats.

The sources of invalidity are worth addressing because they arise mostly while

the theoretical and empirical domains of the construct are being defined, that is, in the content aspect of

validation (Benson, 1998). According to Messick (1989, 1995), one of the threats to

validity is known as “Construct Underrepresentation”. Construct underrepresentation

occurs when the assessment is defined too narrowly and fails to adequately cover the

important theoretical domain of the construct. Another threat to validity is “Construct-

Irrelevancy”. Construct-irrelevant variance occurs when the assessment is defined too broadly

and contains excess reliable variance associated with other distinct constructs in addition

to the focal construct. That is, aspects of the task are extraneous to the focal construct

and make the task irrelevantly difficult or easy for particular individuals or groups.

In essence, evidence about content is primarily concerned with the basis for

specifying the boundaries and structure of the construct to be assessed. The construct and

test content domain are carefully evaluated through the professional judgments of a panel

of experts, whose documentation addresses the potential sources of irrelevant difficulty or

easiness that require further analysis and samples domain processes in terms of their

functional importance (AERA et al., 1999; Messick, 1989, 1995).

On that ground, one needs to consider the definition of phonological awareness –

the sensitivity to or awareness of, and the ability to manipulate the sound units in spoken

language. Then, phonological awareness tasks should be designed to assess an individual’s

awareness of and ability to manipulate the spoken language segments which make up

words. Regarding the sources of invalidity, understanding the developmental sequence

of phonological awareness in children is important because the difficulty and complexity

of the tasks directly influence children’s performances. The age of subjects and

demographic characteristics should be addressed in this validation step.

The Substantive Aspect of Construct Validity

The substantive aspect of construct validity requires engagement between judged

content relevance and representativeness and empirical response consistency or

performance regularity in the assessment tasks (Loevinger, 1957; Messick, 1989).

Theoretical and empirical analyses of response processes provide evidence for

appropriate sampling of domain and accrue empirical evidence for sampled processes

that are actually engaged by respondents in task performance.

Inferences about processes involved in performance are generally developed by

analyzing individual responses such as eye movements, response times, performance

strategies, or responses to particular items. Empirical evidence of response consistency

also derives from correlation patterns among parts of the test and between the test and

other variables or from consistency in response times for task segments (AERA et al.,

1999; Messick, 1995). In addition to evaluating the response in tasks, the scoring rubrics

or scoring guidelines should be carefully reviewed for the appropriateness of scoring

processes to the intended interpretation or construct definition.

In brief, the matter of test content entails not only the content representativeness

of the construct measure but also the process representation of the construct and the

degree to which these processes are reflective of construct measurement. The content

representativeness of the test items needs to be assessed in terms of the empirical domain

structure which underlies the ultimate test form and score interpretation (Messick, 1989,

1995). Therefore, the scoring and recording response process should be clearly indicated.

The Structural Aspect of Construct Validity

The structural aspect of validity refers to “the extent to which structural relations

between test items parallel the structural relations of other manifestations of the trait

being measured” (Loevinger, 1957, p. 661). The analyses of internal structure of a test

are to determine the degree of the relationships among test items and the intended

structure of the theoretical domain. Thus, the structural aspect of construct validity

examines the consistency or fidelity of the scoring structure related to the structure of the

construct domain.

The structural aspect can be assessed by various statistical methods such as

intercorrelation among the items and subscales, exploratory and confirmatory factor

analysis, and item response theory. The specific types of analysis and interpretations of

the results rely on the implication and utility of the test scores (AERA et al., 1999). For

instance, if a set of test items of increasing difficulty is of interest, empirical analyses of

the number of items answered correctly or the pattern of scoring key should be provided.

The structural aspect of validity also includes the appropriateness and adequacy of

scaling and equating procedures using item response theory. The adequacy of scaling is

the degree to which the relative weights for different types of items are consistent with

the construct interpretation of the test results (Miller & Linn, 2000).

Indeed, the structural component of construct validity includes both the selection

or construction of relevant assessment tasks and the logical development of construct-

based scoring criteria, guidelines, and rubrics. The internal structure of the assessment

including intercorrelations among the items and subtests, degree of homogeneity in the test,

and the dimensionality of the interitem structure should be consistent and reflect the

internal structure of the construct domain (Messick, 1989, 1995). In this aspect, item

analyses including item difficulty, item discrimination, internal consistency, and factor

analysis should be reviewed in addition to the scoring guidelines or the procedure of

scoring on the phonological awareness tasks.
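
As an illustration of the item statistics named above, the following Python sketch computes item difficulty as the proportion of examinees answering an item correctly and item discrimination as the corrected item-total correlation. The response matrix and the function names are hypothetical and are not taken from the study data. The corrected item-total correlation is assumed here as one common discrimination index, since the text does not specify which index is intended.

```python
from math import sqrt
from typing import List, Optional, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> Optional[float]:
    """Pearson correlation; returns None if either variable has no variance."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / sqrt(sxx * syy)


def item_difficulty(responses: Sequence[Sequence[int]]) -> List[float]:
    """Proportion of examinees answering each item correctly (0/1 scoring)."""
    n_people = len(responses)
    n_items = len(responses[0])
    return [sum(row[j] for row in responses) / n_people for j in range(n_items)]


def item_discrimination(responses: Sequence[Sequence[int]]) -> List[Optional[float]]:
    """Corrected item-total correlation: item score vs. total of the other items."""
    n_items = len(responses[0])
    result = []
    for j in range(n_items):
        item = [row[j] for row in responses]
        rest = [sum(row) - row[j] for row in responses]
        result.append(pearson(item, rest))
    return result


# Hypothetical data: 4 examinees (rows) by 3 dichotomous items (columns).
scores = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(item_difficulty(scores))
print(item_discrimination(scores))
```

In this sketch a lower difficulty value means a harder item, and a discrimination value near zero or below zero would flag an item that does not separate higher-scoring from lower-scoring examinees.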

The Generalizability Aspect of Construct Validity

Generalizability is concerned with the numerous factors such as sampling

fluctuations and reliability of measures that contribute to systematic variability in

behavior and performance. Generalizability refers to the degree to which a construct

interpretation empirically generalizes to and across population groups (population

generalizability), situations or settings (ecological generalizability), time periods

(temporal generalizability), and task domains (task generalizability) (Messick, 1989).

For example, ecological generalizability involves the sources of invalidity from the

standardization of test materials and administration conditions. As another example,

population generalizability examines the test scores across random samples of diverse

ethnic groups in order to indicate that the test measures the same construct in these

populations. In addition, the limits of score meaning are also influenced by the degree of

generalizability across observers or raters of the task performances.

The degree of generalizability of construct meaning across contexts can be

evaluated by assessing the degree to which test scores reflect comparable patterns of

relationships with other measures or similar responsiveness to treatment across groups,

situations, times, and tasks (Messick, 1989). Also, generalizability theory is the

application of analysis of variance models and random variance components to estimate

universe score variance which examines the consistency of the assessment procedures

under different conditions of population groups or tasks (Miller & Linn, 2000). The

generalizability aspect of validity evidence is determined by the degree of correlation of

the assessment tasks with other tasks representing the construct, by the nature of the

construct assessed, and by the scope of its theoretical applicability (Messick, 1989, 1995).
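
For a simple design in which persons (p) are crossed with tasks (t), the variance-component logic of generalizability theory can be written as follows. This is a standard textbook form offered only as an illustrative sketch; the single-facet design and the symbols are assumptions and do not describe a design analyzed in this study.

```latex
\sigma^2(X_{pt}) = \sigma^2_p + \sigma^2_t + \sigma^2_{pt,e},
\qquad
E\rho^2 = \frac{\sigma^2_p}{\sigma^2_p + \sigma^2_{pt,e} / n_t}
```

Here \sigma^2_p is the universe score variance, \sigma^2_t is the variance due to tasks, \sigma^2_{pt,e} is the person-by-task interaction confounded with residual error, n_t is the number of tasks, and E\rho^2 is the generalizability coefficient for relative decisions.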

In summary, generalizability is primarily concerned with sources of measurement

error associated with the sampling of tasks, occasions, and raters which underlie

traditional reliability. The generalizability study presents an evidential basis for

judgments of the test interpretation and use across various contexts.

The External Aspect of Construct Validity

The external aspect refers to the degree to which the relationships of test scores

with other measures and non-assessment behaviors or performances reflect the expected

relations in the theory of construct being assessed (Loevinger, 1957). Indeed, “the

construct represented in the assessment should rationally account for the external pattern

of correlations” (Messick, 1995, p. 746).

The external component of validity evidence fundamentally depends on the

correlations between the total score of assessment and any subscores. Accordingly, the

external aspect can be established by the theoretical bases for the obtained patterns and

by structural equation models that reproduce the observed correlations in construct-

consistent ways. According to Benson (1998), the multitrait-multimethod matrix procedure

connects the structural and external stages of validation. The multitrait-multimethod

matrix generates two important correlation patterns. One is the “convergent validity

coefficient”, which indicates the relationships between the test scores and other measures

of the same construct on theoretical grounds. Another correlation pattern is the

“discriminant validity coefficient” that specifies the relationships between the test scores

and measures of distinct constructs (AERA et al., 1999; Benson, 1998; Messick, 1995).

In addition, group differentiation also can be relevant if the theoretical construct suggests

the presence or absence of the group differences in the proposed test interpretation.

Contrasting the mean scores of gender, diverse ethnic groups, and socio-economic status

are examples of this approach.
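
The contrast between convergent and discriminant coefficients can be sketched with a small Python example. The three score lists are invented for illustration, and the names `pearson_r`, `phon_awareness`, `same_construct`, and `distinct_construct` are hypothetical. The sketch only shows the expected pattern: a high correlation with another measure of the same construct and a low correlation with a measure of a distinct construct.

```python
from math import sqrt
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation (assumes both variables vary)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical scores for five children (not study data).
phon_awareness = [2, 4, 5, 7, 9]
same_construct = [3, 4, 6, 7, 9]      # another measure of the same construct
distinct_construct = [5, 3, 6, 4, 5]  # a measure of a theoretically distinct construct

convergent = pearson_r(phon_awareness, same_construct)
discriminant = pearson_r(phon_awareness, distinct_construct)
print(round(convergent, 2), round(discriminant, 2))
```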

In short, in the external stage of validation, the meaning of the test scores is verified

by assessing the relevance of their potential relationships with other criterion measures.

Test validation, in essence, is to ensure that empirical evidence of such relations attests

to the utility of the scores for the applied purpose.

The Consequential Aspect of Construct Validity

The consequential aspect appraises the intended and unintended consequences of

test uses and implications of score interpretation. AERA et al. (1999) address the

distinction between validity evidence about consequences and issues of social policy. If

consequences of assessment are traced to any sources of invalidity such as construct

underrepresentation or construct-irrelevant variance, they are directly related to validity.

Hence, consequences as validity evidence affect or change the score interpretations and

implications of score meaning (Miller & Linn, 2000).

Consequences of assessment are either intended or unintended. Intended

consequences include improved instructional or educational practices, a test used in

placement decisions, or selections of effective treatment in therapy. On the other hand,

unintended or adverse consequences include bias in the assessment, unfairness of

assessment, and misinterpretations for certain individuals or groups. Fundamentally, the

measurement is concerned with any negative implication on individuals or groups that are

derived from any sources of invalidity. For example, low scores should not occur

because the test measures knowledge or skills unrelated to the construct domain. Also, low

scores should not occur because the assessment contains something sensitive to particular

individuals or groups unintended to be part of the construct.

It is clear that the consequential aspect of validity evidence comprises the value

implication of score interpretations as a basis for actions in addition to actual and

potential consequences of test use (Messick, 1995). Since consequences as a source of

evidence for validity affect the inferences and use of the assessment, the value

implications of score interpretations should be addressed as a part of validity framework

(Messick, 1989; Miller & Linn, 2000).

Validity as Integrated Evidence

The six aspects of construct validity are emphasized as a unified concept that

addresses score-based interpretations, utility of scores, and value implications as a basis

for action. Validity rationale eventually accumulates various sources of evidence to

provide a sound scientific basis for the intended interpretation of test score for specific

use (AERA et al., 1999). Thus, integrating various components of evidence involves

appropriate sampling of domain, relevant assessment task construction procedures,

adequate score reliability, proper test administration and scoring procedures, accurate

score scaling and equating, standard setting, and careful attention to test invalidity.

These aspects of validity should be viewed as interdependent and complementary

forms of validity evidence rather than distinct and substitutable validity types. Indeed,

evidence relevant to all of the six aspects needs to be integrated into an overall validity

judgment to support score-based interpretations and action implications. Once again, the

unified concept of validity brings considerations of content, criteria, and consequences

together into a construct framework for testing rational hypotheses about theoretical and

score-based inferences (Messick, 1989).

Facets of the Unitary Validity

The unified concept of validity is highlighted because it integrates the

appropriateness, meaningfulness, and usefulness of score-based inferences. Messick

(1989, 1995) suggests two interconnected facets of the unitary validity concept as a way

of cutting and combining validity evidence. The facets of validity enable the prevention

of excessive reliance on selected forms of evidence and emphasize the supplementary

role of content- and criterion-related inferences to applied decisions and actions based on

the test scores.

The sources of justification of the testing (evidence or consequence) and the

function or outcome of the testing (interpretation or use) generate a four-fold

classification as presented in Figure 2. The evidential basis of test interpretation is

construct validity because construct validity means evidence and rationales support the

score meaning. The evidential basis of test use is also construct validity because it

involves the score meaning. Also, the evidential basis of test use is supported by

evidence for the relevance and utility of the test to the specific applied purpose and

setting. The consequential basis of test interpretation is the evaluation of value

implications of score meaning and is construct validity since the score interpretation is

necessary to assess the value implications. Finally, the consequential basis of test use is

the evaluation of both actual and potential social consequences of applied testing. The

social consequences also involve evidence of score meaning, of relevance, and of utility.

IV. METHOD

This study utilized data obtained from an ongoing study by Hamilton, Schwanenflugel,

Neuharth-Pritchett, and Restrepo on pre-kindergarten literacy development. The

descriptions presented here were based on the information provided by these original

investigators.

Participants

A total of 415 pre-kindergarten children (213 boys and 202 girls) participated in

the study. The initial investigators recruited participants at the pre-kindergarten

registration in spring of 2002 in three Northeastern Georgia school districts. Children

were attending 26 public elementary schools in these three school districts. The age of

children ranged from 4 years to 5 years, 7 months, with an average age of 4 years, 6

months when school started in August 2002. The

ethnic population was diverse: 41.7% (n = 173) were African-American, 33.4% (n =

139) were Caucasian, 18.5% (n = 77) were Hispanic, 5% (n = 21) were Asian, and 1.4%

(n = 6) were Bi-Racial. Of the children, 75.8% (n = 314) spoke English as a first

language, 20.4% (n = 85) spoke Spanish, and 3.8% (n = 16) spoke another language as

a first language. Children were predominantly drawn from a low to lower-middle socio-

economic class population: 32.9% (n = 137) of the children were reported as receiving free or

reduced lunch, and 71% (n = 295) of the children’s families were reported as earning less than

$25,000 per year.

The majority of children in this age population did not have any detectable letter

knowledge prior to the pre-kindergarten level. None of the children had acquired any

reading skills at the pre-kindergarten age level for the given tasks.

Materials

Phonological Awareness Tasks

A subset of The Phonological Awareness Test (Robertson & Salter, 1997) was

used to assess phonological awareness of pre-kindergarten children in this study. The

Phonological Awareness Test was designed to diagnose deficits in phonological

processing and phoneme-grapheme correspondence. The intended population of The

Phonological Awareness Test is five through nine years of age. The Phonological

Awareness Test included rhyming, segmentation, isolation, deletion, substitution,

blending, graphemes, and decoding subtests.

Eight phonological awareness tasks were drawn from The Phonological

Awareness Test by the initial investigators, Schwanenflugel and Blake. The initial

investigators included the tasks that were considered to be potentially significant

predictors of reading ability in previous studies and the tasks that were to be included in the

intervention. The rhyming discrimination, sentence segmentation, syllable segmentation,

initial isolation, syllable blending, phoneme blending, consonant graphemes, and long

and short vowels graphemes were included to assess the child’s phonological awareness

in this study. However, instructions were modified slightly and ceiling rules were created

because of the age of the participants. Each of the tasks is described in detail as follows.

The actual task items and correct responses are presented in the Appendix.

Rhyming Discrimination: The rhyming discrimination task was to measure the

child’s ability to identify rhyming words presented in pairs. The examiner said to the

child, “I am going to say two words and ask you if they rhyme. Listen carefully. Do

these words rhyme? Fan – man.” Then the child should respond with either “yes” or

“no”. The examiner indicated whether each response was correct or incorrect, and

provided the correct response, “Fan – man. Yes, they do rhyme.” If the child responded

with other than “yes” or “no”, the examiner repeated the question to elicit a “yes” or “no”

response. The stimulus phrase, “Do these words rhyme?” could be repeated, but no other

prompts were given to the examinees. The actual ten task items were administered to the

child who responded correctly to at least one of the three practice items. Thus, the child

who responded to all three practice items incorrectly was excluded from the task

administration. Practice items included “Fan – man”, “Fan – tan”, and “Fan – dog”.

Only words that the child responded to correctly on their own were scored as correct, with a

possible score range of 0 to 10, excluding the three practice items. The examiner stopped

administering the task if there were three consecutive wrong items in the child’s

responses.

Sentence Segmentation: The purpose of the sentence segmentation task was to

assess the child’s ability to divide sentences into their constituent words. The examiner

told the child, “I am going to say a sentence, and I want you to clap one time for each

word I say. My house is big. Now, clap it with me.” The examiner said the sentence

again and clapped as she/he said each word. “My – house – is – big. Now, you try it by

yourself. My house is big.” The child should respond with clapping four times, while

she/he repeated the sentence word by word. The examiner indicated whether the child’s

response was correct or incorrect. If the child responded incorrectly, the examiner

repeated the sentence and asked the child to clap with her/him. The stimulus phrase,

“Clap one time for each word I say.” was given to the examinees without any other

prompts. Three practice items, “My – house – is – big.”, “My – name – is –

_______.”, and “I – like – dogs.”, were given prior to the actual task items. However, the

task administration took a considerably long time for this age population. The initial

investigators decided that this task was too long for the concentration level of this age

population. The sentence segmentation task was dropped from the battery after it was

administered to about 50 pre-kindergarten children.

Syllable Segmentation: The purpose of the syllable segmentation task was to

assess the child’s ability to divide the words into syllables. The examiner told the child,

“I am going to say a word, and I want you to clap one time for each word part or syllable

I say. Saturday. Now, clap it with me.” The examiner said the word again and clapped

once as she/he said each syllable. “Sat – ur – day. Now, try it by yourself.” The words,

including “Saturday”, “Friday”, and “Dog” were given as practice items. The child

should respond with claps, one for each syllable as the child said the word by syllable.

The examiner acknowledged a correct response. If the child responded incorrectly, the

examiner repeated the word and asked the child to clap with her/him. The stimulus

phrase, “Clap one time for each syllable in the word.” was repeated, but no other prompts

were given to the child. After three practice trials, the actual task items were

administered to the child who responded to at least one of the three practice items

correctly; hence, the child was excluded from the task administration if he/she responded

to all three practice items incorrectly. Only words that the child responded to correctly

on their own were scored as correct. The examiner stopped the task administration if the

child responded to three consecutive items incorrectly. The child’s score was the number

of correct responses, with a possible score range of 0 to 10, excluding the three practice

items.

Initial Isolation: The initial isolation task was to measure the child’s ability to

identify the initial phoneme in a word. The examiner began the task by saying, “I am

going to say a word, and I want you to tell me the beginning or first sound in the word.

What is the beginning sound in the word CAT?” The child should respond with /k/ or

“kuh”. The examiner gave feedback by saying, “That is correct.” or by saying, “The

beginning sound in CAT is /k/.” The stimulus phrase, “What is the beginning sound in

_______.”, was given to the child. The examiner emphasized the word “sound” if the

child gave letter names; however, she/he scored the item as incorrect and did not repeat

the item. After the three practice trials, including “CAT”, “MAD”, and “JANE”, the

examiner administered the actual task items to the child who correctly responded to at

least one of the three practice items. The items that the child responded to correctly on

their own were scored as correct. Scores had a possible range of 0 to 10 correct,

excluding the three practice items. The task administration stopped if the child responded

to three consecutive items incorrectly.

Syllable Blending: The syllable blending task was designed to assess the child’s

ability to blend individually presented syllables to form a word. The examiner told the

child, “I will say the parts of a word. You guess what the word is. What word is this?

Ta – ble.” The examiner paused for one second between syllables. If the child responded

with table as a whole word without pausing between syllables, the child’s response was

scored as correct. The examiner indicated whether each response was correct or incorrect.

If the child repeated the word in parts, the examiner told the child, “Say it faster, like this,

table.” Three practice items, including ta – ble, mo – ther, and he – llo were given to the

examinees before the administration of actual ten task items. However, the task

administration took too long for this age population. The task was dropped from the

battery after it was administered to 50 pre-kindergarten children.

Phoneme Blending: The purpose of the phoneme blending task was to measure

the child’s ability to blend phonemes together to form a word when phonemes were

presented individually. The examiner told the child, “I will say the sounds of a word.

You guess what the word is. What word is this? /P – o – p/.” The examiner paused for

one second between sounds. The child should respond with the word pop without

pausing or distorting any sounds. If the child repeated the sounds as given by the

examiner, she/he was told, “Say it faster, like this pop.” Each child was given three

practice items, including /p – o – p/, /d – o – g/, and /c – a – t/, prior to administration of

the test items. The examiner acknowledged a correct response. If the child responded

incorrectly, the examiner said, “/p – o – p/ is pop.” The stimulus phrase, “What word is

this?” was given to the child without any other prompts. The examiner administered the

actual task items to the child who responded correctly to at least one of the three practice

items. The child’s score was based on the total number of correct responses, with a

possible range of 0 to 10 correct, excluding the three practice items. When there were

three consecutive wrong items in the child’s responses, the examiner stopped

administering the task.

Consonants Graphemes: The consonants graphemes task was to assess the

child’s knowledge of sound and symbol correspondence when the letters were

individually presented. The task was not given to children who did not know the

letters in their names. The examiner told the child, “I am going to show you some

letters. I want you to tell me what sound each letter makes.” Some of the letters had two

acceptable sounds. For instance, if the child responded with /k/ or /s/ for the letter c, the

examiner scored the item as correct. However, the consonants graphemes task took too long to

administer, and the initial investigators decided to drop the consonants graphemes task

from the battery after it had been administered to 50 children.

Long and Short Vowels Graphemes: The purpose of this task was to measure

the child’s knowledge of sound and symbol correspondence of vowels. The examiner

showed the vowel cards to the child and said, “I am going to show you some

letters. I want you to tell me what sound each letter makes.” The task was given to the

children who knew the letters in their names. If the child responded with one vowel

sound, the examiner said to the child, “Tell me the other sound this letter makes.” There

were no practice items for this task. However, the administration for this task was too

long for this age population. The task was dropped from the battery after the task was

administered to 50 children.

Criterion Measure

There are many different measures that can be employed to appraise the criterion-

related validity. The initial investigators developed an alphabet knowledge test to

measure the child’s ability to identify the letter names and sounds of the alphabet. An

alphabet test was included to determine the predictive validity of each of the phonological

awareness tasks.

Alphabet Knowledge Test: An alphabet test was to assess the child’s knowledge

of alphabet letter names and sound correspondence. The examiner showed the child a list

of upper and lower case letters presented in a random order. The examiner pointed to

each letter sequentially and asked the child, “Do you know what this is?” If the child

responded with a correct letter name, he or she was asked, “What sound does it make?”

If the child responded with a correct letter name, the child was asked what the letter’s

sound was. The examiner recorded the child’s responses as either correct or incorrect on the

paper. Any correct pronunciation of the given letter was deemed letter sound knowledge.

For example, if the alphabet letter ‘C’ or ‘c’ was pronounced /k/, /s/, or /ch/, the child’s

response was scored as correct for the letter sound knowledge. The alphabet test

comprised four subtests: letter name knowledge for upper and lower case and letter

sound knowledge for upper and lower case. Each of the alphabet tests consisted of 16 upper

and 16 lower case letters in a random order. Only the alphabet letters that the child responded

to correctly on their own were scored as correct. Scores on the alphabet tests ranged from 0

to 16.

Procedure

Assessment Procedure

Assessment of phonological awareness tasks took place over a three-month period

from August through October of the 2002 pre-kindergarten year. Fifteen

examiners were trained by the initial investigators for two days prior to the assessment

session. The initial investigators observed the assessment process for a week in order to

ensure that the examiners were fully familiar with the administration and scoring

procedures.

The number of sessions taken to complete the assessment relied on the levels of

examinees’ concentration and frustration. Each of the phonological awareness tasks was

administered individually in a quiet room. Items in each task were directly drawn from

The Phonological Awareness Test (Robertson & Salter, 1997), and were given to the

examinees in sequential order. The phonological awareness tasks were administered in

random order to avoid order effects. All examinees were

given three practice items prior to the actual task items. Ten actual task items were

administered only to examinees who responded to at least one of the three practice items

correctly. With respect to the examinees’ age, frustration, and concentration level, the

task administration was stopped if the examinees responded to three consecutive items

incorrectly. If the child was losing track of the task, the examiner went back to the

practice items to remind the child of the task.
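
The administration and scoring rules described above (a practice gate followed by a ceiling of three consecutive incorrect responses) can be summarized in a short Python sketch. The function name `score_task` and its arguments are hypothetical. Returning `None` for a child who failed all practice items is an assumption made for illustration, since the text states only that such children were excluded from the task administration.

```python
from typing import Optional, Sequence


def score_task(practice: Sequence[bool], items: Sequence[bool],
               ceiling: int = 3) -> Optional[int]:
    """Return the number of correct task items, or None if the task is not given.

    practice: correctness of the three practice items.
    items: correctness of the ten task items, in administration order.
    ceiling: number of consecutive incorrect responses that stops the task.
    """
    # Practice gate: at least one practice item must be answered correctly.
    if not any(practice):
        return None

    score = 0
    consecutive_wrong = 0
    for correct in items:
        if correct:
            score += 1
            consecutive_wrong = 0
        else:
            consecutive_wrong += 1
            if consecutive_wrong == ceiling:
                break  # ceiling rule: remaining items are not administered
    return score


# Hypothetical child: passes one practice item, then hits the ceiling at item 5.
example = score_task([True, False, False],
                     [True, True, False, False, False, True, True, True, True, True])
print(example)
```

Under this rule, items that follow the stopping point contribute nothing to the score, which is one reason observed item difficulty can look higher for items placed late in a task.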

The criterion measure, the alphabet knowledge test, was given to the examinees

during January and February of 2003. The alphabet test was

administered by a new set of assessors who were similarly trained.

Validation Procedure

The validation study for the phonological awareness tasks in this study focused on

the content, substantive, structural, and external aspects of construct validity proposed by

Messick (1989). Each aspect of validation procedures is briefly reviewed as follows.

The content aspect of validity began with a literature review of the relationship

between phonological awareness and reading skills of three- to seven-year-old children.

The initial investigators selected phonological awareness tasks that were considered to be

related to later reading skills. The content aspect of construct validity was enhanced

by a pilot study with 19 pre-kindergarten children and 11 kindergarten children.

The substantive aspect of construct validity focused on the age-appropriateness of

task administration. The initial investigators reconstructed guidelines for task

administration and scoring procedures. Because the age population in this study was

younger than the intended population of The Phonological Awareness Test, the

investigators set the ceiling for all subtasks. The actual task items were administered

only to examinees who responded to at least one of three practice items correctly.

Moreover, if the examinees responded to three consecutive items incorrectly, or if the

examinees showed signs of frustration, the examiners stopped the task

administration. The examiners were trained on the phonological awareness task

administration and scoring procedures by the initial investigators for two days. The

assessment process was observed by the investigators to ensure that the examiners

were fully informed about phonological awareness task administration and scoring

procedures. In addition, the mean performances and the standard deviations were

calculated, as was internal consistency, using coefficient alpha.

The structural aspect of construct validity was established by the empirical

analyses of item difficulty, item discrimination, and intercorrelations among the tasks.

Factor analysis was also conducted to evaluate the internal structure of the assessment.

Finally, as a part of the external aspect of validity, criterion-relatedness was

evaluated by multiple regression analysis with total score on the alphabet upper and

lower sound knowledge test as the dependent variable and the scores on phonological

awareness tasks as the independent variables. In addition to the multiple regression

analysis, the correlation coefficients between the alphabet name and sound knowledge tests and

phonological awareness tasks were calculated. The external aspect of construct validity

also included group differentiation in phonological awareness performance by

gender, ethnicity, and socioeconomic status.

V. RESULTS


Descriptions of the phonological awareness tasks are presented in the Appendix.

In addition to the task descriptions, the Appendix displays the items and correct

responses, including the three practice items and the ten actual items.


Descriptive Statistics

Table 1 and Table 2 summarize subjects’ performances on the tasks of

phonological awareness. The possible maximum scores, the mean scores, and the

standard deviations are presented, as well as the internal consistency of each task for this

sample.

Table 1 is based on the scores that took into consideration practice items. Recall

that the actual task items were administered to subjects who responded to at least one of

the three practice items correctly. If the subject was given the actual task items, the first

item, labeled the ‘preliminary item’ in the current context, was scored as correct.

Likewise, when the computations of the means, the standard deviations, and reliabilities

included the preliminary item, the condition was called the ‘preliminary item condition’. Hence, Table 1

had a possible score range of 0 to 11, and a score of 0 indicated that the child responded

to all three practice items incorrectly. Table 2 summarizes subjects’ performance based

on the actual task items. If the computations of the means, standard deviations, and

reliabilities were based on only the ten actual task items, it was called the ‘actual item condition’.

A possible score range in the actual item condition was 0 to 10.
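
The two scoring conditions described above can be sketched as follows. This is only an illustration: the function name score_task and the use of None for items that were never administered are assumptions introduced here, not part of the original scoring materials.

```python
from typing import Optional, Sequence


def score_task(practice: Sequence[int], items: Sequence[Optional[int]]) -> dict:
    """Score one task under both conditions (illustrative sketch).

    practice: 0/1 responses to the three practice items.
    items:    0/1 responses to the ten actual items; None marks an item
              that was never administered (hypothetical encoding).
    """
    # A child qualifies for the actual items with at least one correct practice item.
    qualified = any(r == 1 for r in practice)
    # The preliminary item is scored 1 for every child who qualified.
    preliminary = 1 if qualified else 0
    # Items that were not administered contribute nothing to the score.
    actual = sum(r for r in items if r is not None) if qualified else 0
    return {
        "preliminary_item_condition": preliminary + actual,  # possible range 0 to 11
        "actual_item_condition": actual,                      # possible range 0 to 10
    }
```

Under this sketch, a child who misses all three practice items scores 0 in both conditions, while a child who qualifies but answers no actual item correctly scores 1 in the preliminary item condition and 0 in the actual item condition.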

In both the preliminary item condition and the actual item condition, the rhyming

discrimination task had the highest mean scores (M = 3.64, SD = 3.88 and M = 3.10, SD =

3.46, respectively). On the other hand, the initial isolation task had the lowest mean

performance among the tasks in both the preliminary item condition and the actual

item condition (M = 0.87, SD = 2.54 and M = 0.68, SD = 2.29, respectively). In the

actual item condition, the phoneme blending task also had a low mean score of 0.74, with a

standard deviation of 1.86.

Task Reliability

The reliability of each task of phonological awareness was determined by

coefficient alpha. Table 3 displays the coefficient alpha of each task of phonological

awareness, as well as standard error of measurement in the preliminary item condition,

which took into consideration three practice items. Table 4 presents the coefficient alpha

and standard error of measurement of each task of phonological awareness based on the

actual item condition. According to Hills (1981), a reliability coefficient should be at least

.85 if the test is to be used to make decisions about individuals. By this standard, the reliability

coefficients indicated that all four phonological awareness tasks had high internal

consistencies, with α > .85. In both the preliminary item condition and the actual item

condition, the initial isolation task had the highest internal consistency, with coefficient

alphas of .97 and .98, respectively. In contrast, the syllable segmentation task had the lowest

internal consistency, with coefficient alpha of .89 and .88, respectively.
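
For readers who wish to reproduce this kind of reliability summary, coefficient alpha and the standard error of measurement can be computed from an examinee-by-item score matrix. The sketch below uses the standard formulas, alpha = k/(k - 1) × (1 - Σ item variances / total-score variance) and SEM = SD × √(1 - alpha); the function names and the small data set in the usage note are hypothetical and are not taken from the study.

```python
import math
import statistics


def coefficient_alpha(scores):
    """Coefficient alpha for a list of examinee rows of 0/1 item scores."""
    k = len(scores[0])  # number of items
    item_vars = [statistics.variance([row[i] for row in scores]) for i in range(k)]
    total_var = statistics.variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


def standard_error_of_measurement(scores):
    """SEM = SD of total scores times sqrt(1 - alpha)."""
    sd_total = statistics.stdev([sum(row) for row in scores])
    return sd_total * math.sqrt(1 - coefficient_alpha(scores))
```

For example, with the hypothetical rows [[1, 1, 1], [0, 0, 0], [1, 1, 0], [1, 0, 0]], the item variances sum to half of the total-score variance, so alpha works out to 1.5 × 0.5 = .75.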


Item Analyses

All of the items on the phonological awareness tasks were dichotomously scored.

The difficulty level of each task was obtained by dividing the mean total score by the

number of items on the task. Table 5 displays the mean difficulty levels. Examinees

experienced the greatest difficulty with the initial isolation task, which required identifying the

beginning phonemes in the words (P = .079 in the preliminary item condition, and P =

.067 in the actual item condition). The rhyming discrimination task proved to be the easiest

among the tasks (P = .330 in the preliminary item condition and P = .310 in the actual

item condition).

Because examinees experienced great difficulty with some of the tasks, item

analyses were conducted based on the number of examinees who actually responded to

the item in addition to the total number of examinees. The item difficulty corresponded

to the proportion of examinees who responded to the item correctly. The point-biserial

correlation between an item score and the total score was used as the index of item

discrimination. A point-biserial correlation coefficient of .350 or greater is considered

to differentiate relatively high ability examinees from relatively low ability examinees.

None of the items across the phonological awareness tasks had an item discrimination index

below .350. The results are presented for the respective tasks below.
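
The two item statistics described above can be sketched in a few lines. The encoding of unadministered items as None, the function names, and the choice to correlate each item with a total score supplied by the caller are assumptions made for illustration; the original analyses may have been set up differently.

```python
import math


def item_difficulty(responses, responders_only=False):
    """Proportion correct for one item.

    responses: one 0/1 entry per examinee, or None where the item was
               not administered (hypothetical encoding).
    responders_only: if True, divide by the number of examinees who
               actually responded; otherwise by the total number.
    """
    if responders_only:
        answered = [r for r in responses if r is not None]
        return sum(answered) / len(answered)
    return sum(r or 0 for r in responses) / len(responses)


def point_biserial(item, totals):
    """Pearson correlation between a 0/1 item score and the total score."""
    n = len(item)
    mean_i, mean_t = sum(item) / n, sum(totals) / n
    cov = sum((x - mean_i) * (y - mean_t) for x, y in zip(item, totals))
    ss_i = sum((x - mean_i) ** 2 for x in item)
    ss_t = sum((y - mean_t) ** 2 for y in totals)
    return cov / math.sqrt(ss_i * ss_t)
```

The responders_only switch mirrors the two denominators used in the item analyses: the same item looks harder when difficulty is computed over all examinees than when it is computed only over those who reached the item.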

Rhyming Discrimination: Tables 6 and 7 display the results of item analyses on

the rhyming discrimination task. The item discrimination ranged from .484 to .823 in the

preliminary item condition, and ranged from .471 to .817 in the actual item condition.

Approximately 46% of the examinees responded to all of the three practice items

incorrectly and did not qualify to take the task. Examinees were more likely to

have difficulty detecting the non-rhyme words than detecting the rhyme words. All of

the non-rhyme words had item difficulty levels of .169 to .222 based on the total

number of examinees, and of .393 to .484 based on the number of examinees who

actually responded to the items. Although the levels of item difficulty were assumed to

systematically decrease as the task administration progressed, the item difficulty seemed

to be unsystematically distributed.

Syllable Segmentation: About 54 % of examinees responded to at least one of

the three practice items correctly. Tables 8 and 9 show the item difficulty and item

discrimination of the syllable segmentation task. In the preliminary item condition, the

item discrimination ranged from .466 to .727. In the actual item condition, item

discrimination ranged from .475 to .742. Examinees had greater difficulty with words of more

segments (e.g., watermelon or kindergarten) than with words of fewer segments (e.g.,

pizza or candy). All of the four-segment words had an item difficulty of less than

.100 when the item analyses were based on the total number of examinees. On the other

hand, those items had slightly higher levels of item difficulty, .162 to .204, when the

analyses were based on the number of examinees who actually responded to the items.

The levels of item difficulty seemed to be unsystematically distributed on the syllable

segmentation task.

Initial Isolation: Examinees had the greatest difficulty with the initial isolation task.

Only 19 % of the examinees responded to at least one of the three practice items

correctly. Tables 10 and 11 summarize the item difficulty and item discrimination on the

initial isolation task. Item discrimination ranged from .609 to .953 in the preliminary

item condition, and ranged from .839 to .955 in the actual item condition. When the item

analyses were conducted based on the total number of examinees, the levels of item

difficulty seemed to systematically decrease. Moreover, none of the actual task items had

a difficulty level greater than .083. In contrast, the levels of item difficulty seemed

to be unsystematically distributed when the item analyses were based on the number of

examinees who actually responded to the items. The item difficulty levels increased

dramatically when the item analyses were based on the number of examinees who

actually responded. The initial isolation task seemed to be too difficult for this age population.

Phoneme Blending: Tables 12 and 13 display the item analyses of the phoneme

blending task. The actual items of the phoneme blending task were administered to about 31

% of the total examinees, indicating that about 69% of the examinees responded to all

three practice items incorrectly. Item discrimination ranged from .566 to .772 in the

preliminary item condition, and ranged from .577 to .755 in the actual item condition.

Although there were more examinees who responded to at least one of the three practice

items correctly on the phonemes blending task than on the initial isolation task,

examinees seemed to have more difficulty with the actual task items on the phonemes

blending task. When the analyses were based on the number of examinees who actually

responded to the items, none of the items had an item difficulty level greater than .50

except the first item (P = .598). Furthermore, the levels of item difficulty seemed to

systematically decrease when the analyses were based on the number of examinees who

actually responded, as well as on the total number of examinees. The phoneme blending task also

seemed to be too difficult for this age population.

Task Intercorrelations

The interrelationships between the phonological awareness tasks are demonstrated

in the correlation matrix as shown in Table 14. The correlation coefficients were

computed based on the actual item condition. Using the Bonferroni approach to control

for Type I error across the six correlations (.05/6 = .0083), all of the tasks were

significantly correlated with one another. The tasks that correlated the highest were the initial

isolation task and the phoneme blending task (r = .51, p < .001). The syllable segmentation

task and the phoneme blending task had the lowest correlation coefficient (r = .32, p <

.001). The percentage of variance accounted for by the significant correlations ranged

from 10.2 % to 26 %, indicating medium to large relationships (J.

Cohen & P. Cohen, 1983).
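
The arithmetic behind the adjusted significance level and the percentages of shared variance is simple enough to check directly. The helper names below are illustrative only.

```python
def bonferroni_alpha(family_alpha, n_tests):
    """Per-test significance level under the Bonferroni approach."""
    return family_alpha / n_tests


def percent_variance_shared(r):
    """Percentage of variance two measures share, given their correlation r."""
    return 100 * r ** 2


# Six pairwise correlations among four tasks, tested at a family-wise .05 level.
per_test_alpha = bonferroni_alpha(0.05, 6)
lowest = percent_variance_shared(0.32)   # syllable segmentation with phoneme blending
highest = percent_variance_shared(0.51)  # initial isolation with phoneme blending
```

Squaring the smallest and largest correlations (.32 and .51) reproduces the reported range of roughly 10.2 % to 26 % of variance accounted for.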

Factor Analysis

A principal component factor analysis was carried out on the correlation matrix of

phonological awareness tasks (see Table 14 for correlations). The Kaiser-Meyer-Olkin (KMO)

measure of sampling adequacy of .722 indicated that the correlation matrix of

phonological awareness tasks was middling in its suitability for factoring. Two criteria were

used to determine the number of factors to rotate: eigenvalues-greater-than-one criterion

and the scree test. Table 15 displays the eigenvalues and the percentage of variance

accounted for. The eigenvalues indicate the variance accounted for by each factor, and

SPSS extracts the number of factors that have eigenvalues greater than one (Green,

Salkind, & Akey, 1997). Only the first factor exceeded the eigenvalues-greater-than-one

criterion for number of factors, and it accounted for 54.8 % of the total variance. The

factor loadings under the eigenvalues-greater-than-one criterion are presented in Table 16.

The plot of eigenvalues indicated that a two-factor solution might also be

appropriate, especially given that an additional 18.3 % of variance is accounted for (see

Figure 3). Two factors were extracted by specifying the number of factors in the

analysis, and were rotated using a varimax procedure. Table 17 presents the loadings of

the phonological awareness tasks on the factors after varimax rotation, revealing that the

smaller-unit tasks, phoneme blending and initial isolation, loaded highly on Factor 1,

whereas the larger-unit tasks, rhyming discrimination and syllable segmentation, loaded

highly on Factor 2. This implies that two factors might underlie the four tasks of

phonological awareness.
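
The relation between the two retention criteria can be made concrete. When components are extracted from a correlation matrix, the eigenvalues sum to the number of variables, so each eigenvalue equals its proportion of variance multiplied by that number. The sketch below applies this to the two reported percentages; the eigenvalues it produces are derived here for illustration and are not quoted from Table 15.

```python
def eigenvalue_from_proportion(proportion_of_variance, n_variables):
    """Eigenvalue implied by a component's share of the total variance.

    Holds for components of a correlation matrix, whose eigenvalues
    sum to the number of variables.
    """
    return proportion_of_variance * n_variables


N_TASKS = 4  # four phonological awareness tasks entered into the analysis

first = eigenvalue_from_proportion(0.548, N_TASKS)   # 54.8 % of the variance
second = eigenvalue_from_proportion(0.183, N_TASKS)  # an additional 18.3 %

# Eigenvalues-greater-than-one criterion: keep components with eigenvalue > 1.
retained = [ev for ev in (first, second) if ev > 1.0]
```

With four variables, the second component's implied eigenvalue falls below one, which is why it is retained only when the scree plot, rather than the eigenvalue criterion, guides the decision.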


Relationships to Alphabet Knowledge Test

The mean performances and the standard deviations on four tests of alphabet

name and sound knowledge are displayed in Table 18, including the possible maximum

scores. The letter name knowledge-upper case test had the highest mean score (M =

12.06, SD = 9.92), and the letter sound knowledge-lower case test had the lowest mean score

(M = 4.03, SD = 6.67). The predictive correlations between four tasks of phonological

awareness and four tests of alphabet knowledge are presented in Table 19. The

correlation coefficients were computed based on the number of examinees who actually

responded to the phonological awareness task items. Using the Bonferroni method to

control for Type I error across the 16 correlations, a p-value of less than .0031 (.05 / 16 =

.0031) was required for significance. None of the predictive correlations between the

phonological awareness tasks and the alphabet knowledge tests were statistically

significant. The initial isolation task had the highest

correlation with the letter sound knowledge-upper case test (r = .25, n = 36, p = .139).

The phoneme blending task had the lowest correlation with the letter name knowledge-

lower case test (r = -.03, n = 78, p = .767).

Regression Analysis

A forward regression analysis was conducted with a total score on the alphabet

sound-upper and lower case tests as the dependent variable and the four tasks of

phonological awareness as the independent variables. The regression analysis was

conducted based on the total number of examinees. The mean performance on the

alphabet sound knowledge test was 9.06, with a standard deviation of 13.61.

A linear combination of two tasks, initial isolation and phoneme blending, made a

significant contribution to explaining the variation in the alphabet sound knowledge test,

F (2, 398) = 5.45, p = .005. The sample multiple correlation was .163, indicating that

approximately 2 % of the variance of the alphabet sound knowledge test in the sample

can be accounted for by the linear combination of initial isolation task and phonemes

blending task. The regression equation is shown below.

Ŷ (Predicted Alphabet Sound) = 1.12 × Initial Isolation - 0.90 × Phoneme Blending + 9.08

The squared cross-validated correlation coefficient was calculated to evaluate how

useful the sample regression equation would be when applied to other examinees in

the population (Browne, 1975). The squared cross-validated correlation coefficient was

fairly small (R²cv = .019) and was similar in value to the squared sample multiple

correlation coefficient (R² = .027, from R = .163).
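
As a consistency check on the reported regression statistics, the squared multiple correlation can be recovered from the F ratio and its degrees of freedom using the standard identity R² = (F × df1) / (F × df1 + df2). The sketch below also wraps the reported regression equation in a function; both function names are illustrative.

```python
import math


def r_squared_from_f(f_value, df_model, df_residual):
    """Squared multiple correlation implied by an overall regression F test."""
    return (f_value * df_model) / (f_value * df_model + df_residual)


def predict_alphabet_sound(initial_isolation, phoneme_blending):
    """Predicted alphabet sound score from the reported regression equation."""
    return 1.12 * initial_isolation - 0.90 * phoneme_blending + 9.08


# Reported test: F(2, 398) = 5.45
r_squared = r_squared_from_f(5.45, 2, 398)
multiple_r = math.sqrt(r_squared)
```

The implied multiple correlation agrees with the reported value of .163, and its square is close to .027, a small effect of the same order as the cross-validated value of .019.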

Group Differentiation

Gender Differences: A series of independent samples t-tests was conducted to

evaluate the relationship between gender and the performance on each of the

phonological awareness tasks. The Bonferroni procedure was used to control for Type I

error across the tests, with a p-value of less than .0125 (.05/4) required for significance. The

mean performances and the standard deviations on each of the phonological awareness tasks

are shown in Table 20. Effect size, as an index of practical importance, was calculated as the

standardized mean difference. The independent samples t-tests indicated that the groups

did not significantly differ on the following tasks: rhyming discrimination (t (394) =

0.136, p = .892, d = .014); syllable segmentation (t (392) = 0.627, p = .531, d = .063);

initial isolation (t (391) = -0.045, p = .964, d = -.004); and phoneme blending (t (391) = -

0.658, p = .511, d = -.070).
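
The standardized mean difference used as the effect size can be sketched as follows. The pooled standard deviation is assumed as the standardizer here, which is the usual choice for Cohen's d but is not stated explicitly in the text; the function name and the data in the usage note are hypothetical.

```python
import math
import statistics


def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * statistics.variance(group_a)
        + (n_b - 1) * statistics.variance(group_b)
    ) / (n_a + n_b - 2)
    return (statistics.mean(group_a) - statistics.mean(group_b)) / math.sqrt(pooled_var)
```

For instance, hypothetical groups [2, 4, 6] and [1, 3, 5] differ by one point in their means and share a pooled standard deviation of 2, giving d = .50; the values reported above are all far smaller than that.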

Ethnicity Differences: Table 21 displays the means and the standard deviations

on each task of phonological awareness by ethnic group. A series of one-way

analyses of variance (ANOVAs) was conducted to determine whether there were differences between

ethnic groups. The Bonferroni method was used to control for the Type I error rate

across the tests (.05/4 = .0125). The ANOVA results revealed no statistically

significant differences among the ethnic groups in performance on the phonological

awareness tasks: rhyming discrimination (F (4, 312) = 2.42, p = .049, partial η² = .030);

syllable segmentation (F (4, 310) = 0.38, p = .826, partial η² = .005); initial isolation (F

(4, 309) = 1.16, p = .328, partial η² = .015); and phoneme blending (F (4, 309) = 0.57, p

= .687, partial η² = .007).

Socioeconomic Differences: The two socioeconomic groups were identified

based on whether or not the child received free or reduced lunch. Approximately 30 %

of the participants received free or reduced lunch and were identified as the lower

socioeconomic group. The mean performances and the standard deviations on each of

the phonological awareness tasks are shown in Table 22. A series of independent

samples t-tests was conducted to examine the relationship between socioeconomic

status and the performance on the phonological awareness tasks, using the Bonferroni

method to control for Type I error across the tests (.05/4 = .0125). The independent

samples t-tests indicated no significant relationship between socioeconomic status and

the performances on the phonological awareness tasks: rhyming discrimination (t (381) =

0.491, p = .624, d = .061); syllable segmentation (t (379) = 0.236, p = .814, d = .027);

initial isolation (t (378) = -0.676, p = .500, d = -.077); and phoneme blending (t (378) =

1.137, p = .256, d = .130).

VI. DISCUSSION

The purpose of the current study was to examine the psychometric characteristics of

phonological awareness assessment in pre-kindergarten children. The distinctive feature of this

validation study is its pursuit of the six distinguishable and interdependent aspects of unitary

construct validity suggested by Messick (1989). Based upon the theoretical framework,

the study aimed to empirically integrate various components of evidence to form an

overall validity judgment to support the intended score interpretation and the implication

of score meaning. The aspects of construct validity the study focused on and the

limitations of the study are discussed in the following section, as well as the restatement

of the six aspects of construct validity.


Validity evidence about content establishes the theoretical and empirical

basis for specifying the boundaries and the structure of the construct domain to be

assessed. The theoretical domain entails the scientific theory about the construct,

previous research, and one’s own observations. The empirical domain involves the

specific set of observed variables that measure the construct (Benson, 1998). Hence, a

matter for discussion about the content-related evidence is to address the professional

judgment and documentation to ensure all important parts of the construct domain are

covered (Messick, 1995).

The content-related validation was carried out primarily by examining

previous research about phonological awareness development in young children. The

initial investigators, Schwanenflugel and Blake, reviewed approximately 64 studies using

a wide variety of phonological awareness tasks to measure three to seven-year-old

children’s knowledge of sound segments, with the intent of designing a phonological

awareness intervention for the ongoing research project “PAVEd for Success” (Hamilton,

Schwanenflugel, Neuharth-Pritchett, & Restrepo, 2002). They summarized the studies by

the population age, the types of tasks used, the types of study design, and the findings of

the study. The initial investigators selected a subset of eight tasks of phonological

awareness that were considered to be significantly related to later reading and decoding

skills. The initial investigators also took into consideration a mixture of tasks along the

developmental path of phonological awareness. They included tasks that were

considered to be the beginning of the developmental continuum, such as rhyme and

syllable tasks in order to measure the beginning levels of phonological awareness. The

phoneme and grapheme tasks were included to assess the later development of the

phonological awareness. The tasks and the items were directly drawn from The

Phonological Awareness Test (Robertson & Salter, 1997). The initial investigators

conducted a pilot study with 19 pre-kindergarten and 11 kindergarten children during the

months of December and January of the year of 2002 from a local elementary school with

parental consent.

The initial investigators systematically investigated and brought the boundaries of

theoretical domain into focus based on the series of previous studies concerning the

construct. Furthermore, the tasks and the items used in the study were drawn from the

instrument that had established norms. Nonetheless, some of the tasks of

phonological awareness in the study proved to be potentially incompatible with this age

population during task administration (cf. the item analyses results). For instance,

some of the tasks were dropped from the battery because the administration took too long for

the age level. As another example, the initial isolation and phoneme blending tasks

seemed to be too difficult for this age population. This might be due to the fact that the

intended age population of The Phonological Awareness Test was discordant with the age

population of the study. The test manual indicates that administering The Phonological

Awareness Test to children younger than 5 years may not be appropriate since they are

normally not developmentally ready to perform all of the assessment tasks in The

Phonological Awareness Test. Yet, the test manual points out that it is left to the

researcher’s discretion whether the administration of particular tasks would be beneficial for

obtaining useful information (Robertson & Salter, 1997).

As discussed earlier, understanding the developmental sequence of phonological

awareness is important because the different developmental levels of task difficulty are

directly related to the issues of assessment validity. The child’s assessed level of

phonological awareness can be dramatically affected by the difficulty or complexity of

the tasks; that is, the different types of tasks depend on the different levels of cognitive

and linguistic abilities of the child.

Task items that exceed the subjects’ attention span or the levels of

developmental task difficulty may lead to construct invalidity because the tasks are

too difficult for the age population in construct-irrelevant ways. Therefore, the tasks should be revised for

this age level. For example, the initial investigators might need to reconstruct items by

using words more familiar to this age population.

One way of reconstructing the tasks or items is to convene a panel of experts to

evaluate the content and format relevance. The content experts’ judgment about the

degree to which the item reflects the content defined by the facet of the domain

specification can provide ongoing professional test-development and systematic

documentation of the consensus of multiple judges (Messick, 1989). The judgment of

experts on content relevance can be numerically summarized with statistical techniques

[e.g., the index of item congruence (Hambleton, 1980)]. The index of item congruence ranges

from -1 to +1, with the highest value of +1 indicating that all content experts agree that

the item is congruent with the domain specification. In addition, factor analysis or multi-

dimensional scaling of relevance rating by multiple experts can be useful tools for the

purpose of content validity that examine the theoretical boundaries of the construct

(Benson, 1998; Messick, 1989).


The substantive component of construct validity incorporates the content

properties and the response consistencies. Specifically, the substantive aspect provides

theoretical rationales and empirical evidence of response consistencies or performance

regularities that manifest the domain specifications (Loevinger, 1957; Messick, 1989,

1995).

The substantive aspect of validity in the present study focused on the structure of

task administration and scoring procedures in order to make subjective judgments and to

show that the scores were based on the completion of a process. In view of the

concentration levels and the cognitive abilities of the age population, the initial

investigators set up two types of ceiling for all the subtasks. The ceiling for

administration starting rule was that the actual task items were administered to subjects

who responded to at least one of the three practice items correctly. Then, the ceiling for

administration termination rule was that the task administration was stopped if there were

three consecutive incorrect items in the responses. Setting the ceiling for the tasks might

be one of the reasons that some of the tasks turned out to be too difficult. For instance,

the majority of the subjects were not qualified for taking the actual task items on the

initial isolation and phonemes blending tasks. Termination of the task administration

after three consecutive incorrect responses also reduced the number of respondents as

the administration progressed. The reduction in the number of respondents might

influence the levels of item difficulty. Use of the ceiling for both the starting rule and

the termination rule may require careful inspection because the observed set of responses

used to estimate the subjects’ abilities to successfully perform the task would be

restricted by the application of the ceiling.

The empirical evidence of response consistency in the study was derived from the

correlation patterns among the items on each task. The internal consistency was

measured by coefficient alpha, revealing that all four tasks had high internal

consistencies, with α > .85. The high internal consistencies of the phonological

awareness tasks in the present study are likely a consequence of the task

difficulty or of setting the ceiling for the tasks. For example, if a particular task was

difficult for most of the subjects, the variance would be small, and the task reliability

would increase. Accordingly, one should be cautious in interpreting the coefficient alpha

since the task reliability is an important consideration in task selection, and the task

reliability can be affected by multiple factors, such as the variance, the length of the task,

or the quality of instrument itself.

In addition, Messick (1989) suggests a combined convergent-discriminant

strategy for test construction as an elaboration of the substantive approach. The convergent-

discriminant strategy is to develop measures of two or more distinct constructs at the same

time. Items in the combined pool that correlate more highly with their own purported

construct score than with the scores for other constructs are kept on the given

construct scale. Hence, item selection could be systematically based upon convergent

and discriminant evidence, while method contaminants could be suppressed at the same

time. The present study was not able to pursue such a strategy since the items were drawn

from the commercial assessment instrument. If a whole task construction process was

employed, it would be feasible to conduct such an elaboration of substantive approach to

investigate the convergent and discriminant evidence for item selection to provide

explicit reference to task cover and to rationally attune to the nature of the construct in

sound.


The structural aspect of construct validity entails the analyses of internal structure

of the task that appraise the relationships among the task items and the theory of the

construct domain. Messick (1995) notes that the structural aspect of validity should

evaluate not only the selection or construction of assessment tasks related to the domain

construct but also the rational development of construct-based scoring criteria, rubrics,

and guidelines. The structural component of validation in the study subsumes the

empirical analyses of item difficulty, item discrimination, and factor analysis in addition

to the task intercorrelations.

Item Analyses

The results of item analyses obtained in the current study agree with previous

studies regarding the levels of task difficulty. Generally, the rhyme task is thought to be the

easiest, and phoneme deletion or phoneme segmentation is considered to be the most

difficult among the phonological awareness tasks (Hoien et al., 1994; Stanovich et al.,

1984; Yopp, 1988). Likewise, the present study found that rhyming discrimination was

the easiest, while initial isolation was the most difficult among the phonological

awareness tasks used in the study.

As noted earlier, the item analyses were conducted under two sets of conditions.

First, there was the preliminary item condition that took into consideration three practice

items. In this case, the preliminary item was scored as 1 if the subject responded to at

least one of the three practice items correctly; otherwise, it was scored as 0 on the

preliminary item. Then, there was the actual item condition which considered only the

actual task items. Therefore, item difficulties of preliminary items in Tables 6, 8, 10, and

12 represent the proportion of subjects who qualified to take the actual task items.

This was based on the assumption that the score of 0 on the preliminary item condition

differed from the score of 0 on the actual item condition.

The second set of item analyses conditions relied on the number of subjects

considered in the data analyses. The item analyses were conducted based on the total

number of subjects (N = 415), and based on the number of subjects who actually

responded to the items. Indeed, the subjects who responded to any one of the three

consecutive items incorrectly were excluded from the latter case of the item analyses.

This was because the examinees experienced great difficulty with some of the tasks, and

this strategy was to ensure more suitable item analyses.

It was assumed that the item difficulties on each task would systematically
decrease as the task administration progressed because of the ceiling rule for
termination. The results indicated that the levels of item difficulty were
unsystematically distributed on the rhyming discrimination and syllable segmentation
tasks, whether the analyses were based on the total number of subjects or on the actual
number of respondents. On the initial isolation task, the item difficulties seemed to
decrease systematically as the administration progressed when the analyses were based on
the total number of subjects; however, they were unsystematically distributed when the
analyses were based on the actual number of respondents. In contrast, on the phonemes
blending task the levels of item difficulty seemed to decrease systematically under both
the total number of subjects and the actual number of respondents.

Interestingly, there were great discrepancies in the levels of item difficulty on the
initial isolation task between the analyses based on the total number of subjects and
those based on the number of subjects who actually responded to the items. When the
analyses considered only the subjects who actually responded to the items, the item
difficulty values increased greatly. The item difficulties ranged from .053 (the last
actual item, laugh) to .190 (preliminary item) when the analyses were based on the total
number of subjects, but from .190 (preliminary item) to .857 (the fourth actual item,
fudge) when they were based on the number of subjects who actually responded to the
items. Similarly, the item difficulties on the phonemes blending task ranged from .031
(the last actual item, /s – l – i – p – çr/) to .308 (preliminary item) when the total
number of subjects was used. When the analyses were based on the number of subjects who
actually responded to the items, the item difficulties ranged from .216 (the second
actual item, /n – ç/) to .598 (the first actual item, /b – oi/) (see Tables 10 and 12).
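The gap between the two sets of values follows from the change of denominator alone. The following is a minimal sketch in Python, not the procedure used in the study. The count of 34 correct responses on the item bite is an assumption inferred here from the proportions reported in Table 10; it is not a figure reported in the study, and the function name is illustrative.

```python
def item_difficulty(n_correct, n_examinees):
    """Item difficulty index: proportion of examinees answering correctly."""
    if n_examinees <= 0:
        raise ValueError("n_examinees must be positive")
    return n_correct / n_examinees


# Assumption: 34 correct responses on "bite" (inferred, not reported).
N_CORRECT = 34
N_TOTAL = 415        # all examinees
N_RESPONDENTS = 79   # examinees who actually responded to the item (Table 10)

p_total = item_difficulty(N_CORRECT, N_TOTAL)
p_respondents = item_difficulty(N_CORRECT, N_RESPONDENTS)

print(f"based on all examinees: {p_total:.3f}")
print(f"based on respondents:   {p_respondents:.3f}")
```

The same number of correct responses therefore yields a much larger proportion when examinees who never reached the item are left out of the denominator.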

Item discrimination for each item was estimated by the point-biserial correlation
coefficient. None of the items on any of the tasks had an item discrimination of less
than .35, indicating that the items discriminated well between subjects with relatively
high abilities and subjects with relatively low abilities.
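As an illustration of the index, the point-biserial coefficient can be sketched in Python as below. The response data are hypothetical, and the sketch assumes that the total score includes the item itself (an uncorrected coefficient); the study does not state which variant was used.

```python
from math import sqrt


def point_biserial(item, total):
    """Point-biserial correlation between a 0/1 item and a total score."""
    n = len(item)
    ones = [t for i, t in zip(item, total) if i == 1]
    zeros = [t for i, t in zip(item, total) if i == 0]
    if not ones or not zeros:
        raise ValueError("item must contain both 0 and 1 responses")
    mean_all = sum(total) / n
    # Population standard deviation of the total scores.
    sd = sqrt(sum((t - mean_all) ** 2 for t in total) / n)
    if sd == 0:
        raise ValueError("total scores must vary")
    p = len(ones) / n                      # proportion answering correctly
    mean_correct = sum(ones) / len(ones)
    mean_incorrect = sum(zeros) / len(zeros)
    return (mean_correct - mean_incorrect) / sd * sqrt(p * (1 - p))


# Hypothetical responses of eight examinees (not data from the study).
item_scores = [1, 1, 1, 0, 1, 0, 0, 0]
total_scores = [9, 8, 7, 6, 6, 3, 2, 1]

r_pb = point_biserial(item_scores, total_scores)
print(f"point-biserial = {r_pb:.3f}")
```

Larger values mean that examinees who answered the item correctly tended to have higher total scores.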

Separate item analyses of all three practice items, instead of the combined
preliminary item, would provide valuable information for evaluating the appropriateness
of the ceiling and for estimating subjects’ abilities to perform the tasks successfully.
That is, the ability of subjects who responded to all three practice items incorrectly
is likely to differ from the ability of subjects who responded to only one practice item
incorrectly. In that sense, it would be desirable to record all of the subjects’
responses on the practice items, in addition to the actual task items, to allow more
detailed empirical analyses of those items and better estimates of the subjects’
potential abilities to perform the tasks.


Task Intercorrelations

Findings of previous studies indicated that various tasks measuring the
knowledge of sound segments were correlated with one another (e.g., Hoien et al., 1995;
Stanovich et al., 1984; Yopp, 1988). Likewise, the four tasks of phonological awareness
used in the study were significantly intercorrelated (see Table 14), suggesting that they
tap much of the same underlying construct.

Factor Analysis

Since statistically significant intercorrelations were obtained in the correlation
matrix, a principal component factor analysis was conducted to examine the underlying
structure. Using the eigenvalues-greater-than-one criterion, the factor analysis
extracted one factor, which accounted for 54.8 % of the total variance. Each of the tasks
loaded strongly on the factor, indicating that a single construct may explain each task
well. Yopp (1988) conducted a factor analysis of ten phonological awareness tasks that
yielded a two-factor solution. She labeled the first factor Simple Phonemic Awareness and
the second factor Compound Phonemic Awareness. Hoien and his associates (1995) conducted
a factor analysis with six tasks of phonological awareness and found a three-factor
solution: a phoneme factor, a syllable factor, and a rhyme factor. The one-factor
solution of the present study does not agree with these two studies regarding
dimensionality. This might be because the current study conducted the factor analysis on
a limited number of tasks.
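The eigenvalues-greater-than-one criterion can be illustrated with the task intercorrelations in Table 14. The sketch below uses Python with NumPy, which is an assumption of this illustration and not the software used in the study. Because the correlations in Table 14 are rounded to two decimals, the results only approximate the values in Table 15.

```python
import numpy as np

# Task intercorrelations as reported in Table 14 (rounded to two decimals).
# Order: rhyming discrimination, syllable segmentation,
#        initial isolation, phonemes blending.
R = np.array([
    [1.00, 0.40, 0.36, 0.36],
    [0.40, 1.00, 0.43, 0.32],
    [0.36, 0.43, 1.00, 0.51],
    [0.36, 0.32, 0.51, 1.00],
])

# Eigenvalues of the symmetric correlation matrix, largest first.
eigenvalues = np.sort(np.linalg.eigvalsh(R))[::-1]

# Each eigenvalue's share of the total variance (the trace of R).
pct_variance = 100 * eigenvalues / eigenvalues.sum()

# Eigenvalues-greater-than-one criterion.
n_retained = int((eigenvalues > 1.0).sum())

for value, pct in zip(eigenvalues, pct_variance):
    print(f"eigenvalue = {value:.2f}, variance = {pct:.1f} %")
print("factors retained:", n_retained)
```

Under this criterion only the first component is retained, which corresponds to the one-factor solution described above.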

The plot of eigenvalues (scree plot) was also used to determine the number of factors
to rotate, and it showed that a two-factor solution might also be appropriate. Since the
second factor accounted for an additional 18.3 % of the total variance, two factors were
extracted by specifying the number of factors in the analysis. Phonemes blending and
initial isolation tasks loaded highly on the first factor, with loadings of .88 and .79,
respectively, while rhyming discrimination and syllable segmentation tasks loaded highly
on the second factor, with loadings of .81 and .81, respectively (see Table 17). This
finding is somewhat consistent with the findings of Hoien and his colleagues (1995),
which indicated that the ability to analyze the smaller units, phonemes, is separable
from the ability to analyze the larger units, rhymes or syllables. Since the scree test
is reported to yield a more accurate analysis than the eigenvalues-greater-than-one
criterion (Green, Salkind, & Akey, 1997) and the second factor accounted for a large
amount of the total variance, it is concluded that two factors underlie the construct of
phonological awareness. However, it would be advisable to conduct a confirmatory factor
analysis to verify the underlying structure of phonological awareness found in the
present study.
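As a consistency check on the two-factor solution, the variance accounted for can be recovered from the rotated loadings in Table 17. This is a minimal Python sketch that relies only on the rounded loadings, so the total only approximates the 73.12 % reported in Table 15; the variable names are illustrative.

```python
# Varimax-rotated loadings as reported in Table 17: (factor 1, factor 2).
loadings = {
    "Rhyming discrimination": (0.20, 0.81),
    "Syllable segmentation": (0.22, 0.81),
    "Initial isolation": (0.79, 0.32),
    "Phonemes blending": (0.88, 0.15),
}

n_tasks = len(loadings)

# Sum of squared loadings per factor = variance explained by that factor.
ss_factor1 = sum(f1 ** 2 for f1, _ in loadings.values())
ss_factor2 = sum(f2 ** 2 for _, f2 in loadings.values())

# Communality = share of each task's variance explained by both factors.
communalities = {task: f1 ** 2 + f2 ** 2 for task, (f1, f2) in loadings.items()}

pct_total = 100 * (ss_factor1 + ss_factor2) / n_tasks

print(f"factor 1: {ss_factor1:.3f}, factor 2: {ss_factor2:.3f}")
print(f"total variance accounted for: {pct_total:.1f} %")
```

Rotation redistributes the explained variance between the two factors but leaves their combined share unchanged, which is why the total matches the unrotated solution.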

The Generalizability Aspect of Construct Validity

The generalizability component examines the replicability or consistency of
assessment results across population groups, situations, time periods, and task domains
in order to set the boundaries of score meaning (Messick, 1995). According to Messick
(1989), the generality of construct meaning can be evaluated by any or all of the
techniques of construct validation. Assessing the comparability of correlation patterns
with other measures, examining test scores across random samples of different groups
(e.g., ethnic, cultural, or SES groups), and combining indicators of test-retest
reliability and construct meaning are examples of techniques for appraising the
generalizability of score meaning. Therefore, the present study also assesses the
generality of construct meaning, since its purpose was to empirically follow the
construct validation process advocated by Messick (1989), although it provides only
limited evidence about the consistency of assessment results across multiple levels of
the random facets of phonological awareness assessment.

Devising a more direct way to appraise the generalizability aspect of construct
validity would be beneficial. For example, Benson (1998) recommended generalizability
theory as a useful method for differentiating types of measurement error and for
providing evidence of how well the empirical domain represents the theoretical domain.
Furthermore, she suggests an informative set of studies that includes confirmatory factor
analysis and generalizability theory. Confirmatory factor analysis is designed to
determine how well a specific set of observed variables fits the structure of the
theoretical domain, and generalizability theory evaluates how adequately the items
represent the empirical domain.

The External Aspect of Construct Validity


The external aspect of construct validity evaluates how well the assessed
construct empirically correlates, in an expected way, with different constructs and
characteristics of the subjects. Evidence about the external structure becomes
especially important if the assessment results are used for selection, placement,
licensure, or program evaluation (Messick, 1995). To establish external evidence, the
present study examines the empirical relationships between the tasks of phonological
awareness and the tests of alphabet knowledge through correlation coefficients and
multiple regression analysis, as well as group differentiation.


Relationships to Alphabet Knowledge Tests

None of the phonological awareness tasks were statistically significantly
correlated with the four tests of alphabet knowledge. This finding contradicts the
findings of Lonigan and his associates (2000) that there was a predictive relation
between phonological awareness and later letter knowledge. This conflict might be due to
the difference in the time interval between the administration of the phonological
awareness tasks and the alphabet knowledge tests. Lonigan and his associates (2000) had
about a 12-month interval between the phonological awareness tasks at time 1 and the
letter knowledge test at time 2. In contrast, the current study administered the
alphabet name and sound knowledge tests after an interval of about four months. The
non-significant correlations between the phonological awareness tasks and the alphabet
knowledge tests might also result from unknown characteristics of the subjects that
affected the test scores in the present study. The subjects came from various ethnic and
language backgrounds. Thus, there might be some outliers or confounding variables, such
as limited English proficiency or speech impairment, that affected the score
interpretation. For example, examining the outliers through the residuals would be
beneficial, as would gathering more detailed information about language-related
impairments. Further investigation with data collected later than the current study and
with more explicit reading tests might provide a clearer explanation of whether
phonological awareness is a significant predictor of later reading and decoding skills.
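How sample size bears on these tests can be illustrated with the usual t test for a correlation coefficient. The sketch below uses the initial isolation task and the letter sound knowledge-upper case test from Table 19 (r = .25, N = 36). The two-tailed .05 significance level and the tabled critical value are assumptions of this illustration, not details reported in the study.

```python
from math import sqrt


def t_for_correlation(r, n):
    """t statistic for testing H0: rho = 0, with n - 2 degrees of freedom."""
    if n <= 2:
        raise ValueError("n must be greater than 2")
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    return r * sqrt((n - 2) / (1 - r ** 2))


# Values taken from Table 19 (initial isolation, letter sound-upper case).
r, n = 0.25, 36
t_stat = t_for_correlation(r, n)

# Assumption: two-tailed test at alpha = .05; tabled critical t for df = 34.
T_CRITICAL = 2.032
significant = abs(t_stat) > T_CRITICAL

print(f"t({n - 2}) = {t_stat:.3f}, significant: {significant}")
```

With only 36 respondents, a correlation of .25 does not reach the assumed critical value, even though its magnitude is not trivial.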

Regression Analysis

A linear combination of the initial isolation and phonemes blending tasks made a
statistically significant contribution to accounting for the variance in the alphabet
sound (upper and lower case) test, although only 2 % of the variance in the alphabet
sound test was accounted for by this linear combination, and the squared cross-validated
correlation coefficient was similarly small (.019). The result of the regression
analysis is similar to the finding of Hoien and his colleagues (1995) that phonemic
awareness proved to be a more potent predictor of early reading acquisition than
syllable or rhyme tasks. Because the linear combination of the initial isolation and
phonemes blending tasks explained only 2 % of the variance in the alphabet sound
knowledge test, one needs to be cautious in interpreting the result of the regression
analysis. To evaluate the relationship between phonological awareness and reading
development more precisely, it might be useful to include more explicit measures of
reading and decoding skills, assessed one or more years later.
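The size of this effect can be expressed with Cohen's f squared, computed from the proportion of variance accounted for. The sketch below is an illustration added here, not an analysis from the study; the benchmark of about .02 for a small effect is Cohen's conventional guideline.

```python
def cohens_f2(r_squared):
    """Cohen's f squared effect size for a multiple regression model."""
    if not 0 <= r_squared < 1:
        raise ValueError("r_squared must be in [0, 1)")
    return r_squared / (1 - r_squared)


# About 2 % of the variance was accounted for (value from the text above).
f2 = cohens_f2(0.02)
print(f"f squared = {f2:.4f}")
```

A value this close to .02 falls at the conventional threshold for a small effect, which agrees with the caution urged above.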

Group Differentiation

The results of the current study are consistent with the findings of Burt and her
associates (1999) that there is no significant gender difference in performance on
phonological awareness tasks. The current study also found no statistically significant
differences in phonological awareness task performance between the lower socioeconomic
status (SES) group and the upper SES group; in contrast, Burt and her colleagues (1999)
found that the upper socioeconomic group significantly outperformed the lower group.
Since the subjects were ethnically diverse, and about 24.2 % of the subjects spoke a
language other than English as a first language, the present study also took ethnic
differences into consideration. There were no statistically significant differences
among the ethnic groups. In sum, the tasks of phonological awareness do not seem to show
gender, SES, or ethnicity differences when each is tested separately.

In addition, studies using the multitrait-multimethod matrix and structural equation
modeling can provide valuable information about the external structure of the
assessments. The multitrait-multimethod matrix provides an empirical collection of
convergent and discriminant evidence by displaying all of the intercorrelations generated
when each of several constructs or traits is measured by each of several methods.
Therefore, the multitrait-multimethod matrix allows the relative contributions of trait
and method variance associated with particular construct measures to be estimated
(Messick, 1989). Such an analysis would be beneficial because the
multitrait-multimethod matrix entails sound judgment about the constructs to be included
in the matrix and offers provisional evidence to support the nomological validity of the
construct. Benson (1998) suggests structural equation modeling (SEM) for examining the
external aspect of construct validation. In SEM, the measurement model links a specific
set of items to the hypothesized structure of the construct, and the structural model
links the constructs with the nomological network, that is, the theoretical constructs
and the hypothesized relationships among them.

The Consequential Aspect of Construct Validity

The consequential component of construct validity is fundamentally concerned
with any negative implications for individuals or groups due to construct
underrepresentation or construct-irrelevant variance. Although some of the tasks used in
the current study might be too difficult for this age population, the levels of task
difficulty did not seem to affect the scores of particular individuals or groups, such
as different ethnic groups or SES groups. Additionally, the tasks used to measure the
subjects’ sensitivity to, or ability to analyze, the spoken language segments that
comprise words agree with the purpose of the instrument, which was designed to diagnose
deficits in phonological awareness and phoneme-grapheme correspondence.

As noted earlier, validity evidence relevant to all six aspects needs to be
accumulated into an overall validity judgment to support score-based interpretations and
action implications. This process includes sampling a relevant domain, constructing
relevant assessment tasks, using appropriate task administration and scoring procedures,
and paying careful attention to task invalidity. Figure 4 displays the assessment
construction procedures corresponding to the six aspects of construct validation.


VII. CONCLUSION

The present study provides information about the psychometric characteristics of

phonological awareness assessment in pre-kindergarten children. Most of all, the study

aims to empirically implement the theoretical framework for unitary construct validity

that integrates various sources of evidence to support the validity of the score derived

from the test.

The current study confirms previous findings regarding the developmental
levels of task difficulty. Although some of the tasks seemed to be too difficult for this
age level, the study found that two factors underlie the construct of phonological
awareness. These two factors accounted for 73.12 % of the total variance, supporting the
structural concept of phonological awareness. Furthermore, a linear combination of the
initial isolation and phonemes blending tasks, both from the first factor, supports the
predictive validity for the initial stage of reading acquisition, although the practical
importance was fairly small. In addition, the initial investigators modified the
technical quality of the testing system to establish standard setting for the age level,
as well as appropriate task administration and scoring procedures. The levels of task
difficulty do not seem to affect certain types of individuals or groups. From the
various components of validity evidence, it is concluded that the tasks of phonological
awareness in the study provide valuable information about the knowledge of sound
segments in pre-kindergarten children.


Indeed, the present study carries out the unitary conception of construct validation,
which brings content, criteria, and consequences together to form a scientific basis for
addressing score-based interpretations, the utility of score meaning, and value
implications as a ground for action. One should note that validity is a matter of degree
rather than an all-or-none property. The degree to which score interpretations and the
implications of score meaning remain valid across individuals or populations, across
settings, or across task contexts is a continual issue because the interpretation of
scores on the construct changes as social conditions shift (Benson, 1998; Messick,
1989). This is why validity is an evolving property and validation is a continual
process. Therefore, ongoing validation studies are necessary to reestablish validity so
that a test remains valid over time.


REFERENCES

Adams, M. J. (1990). Beginning to read: Thinking and learning about print.

Cambridge, MA: MIT Press.

American Educational Research Association, American Psychological Association, &

National Council on Measurement in Education (1999). Standards for

educational and psychological testing. Washington DC: Author.

Backman, J. (1983). The role of psycholinguistic skills in reading acquisition: A look at

early readers. Reading Research Quarterly, 18, 466-479.

Benson, J. (1998). Developing a strong program of construct validation: A test anxiety

example. Educational Measurement: Issues and Practice, 17(1), 10-17, 22.

Bishop, D. V. M., & Adams, C. (1990). A prospective study of the relationship between
specific language impairment, phonological disorders, and reading retardation.
Journal of Child Psychology and Psychiatry and Allied Disciplines, 31, 1027-1050.

Blachman, B. A. (1994). Early literacy acquisition: The role of phonological awareness.
In G. P. Wallach & K. G. Butler (Eds.), Language learning disabilities in
school-age children and adolescents: Some principles and applications (pp. 253-274).
New York, NY: Macmillan.

Bradley, L. L., & Bryant, P. E. (1983). Categorizing sounds and learning to read: A

causal connection. Nature, 301, 419-421.

Browne, M. W. (1975). Predictive validity of a linear regression equation. British
Journal of Mathematical and Statistical Psychology, 28, 79-87.

Bryant, P. E., MacLean, M., & Bradley, L. L. (1990). Rhyme, language, and children’s

reading. Applied Psycholinguistics, 11, 237-252.

Bryant, P. E., MacLean, M., Bradley, L. L., & Crossland, J. (1990). Rhyme and

alliteration, phoneme detection, and learning to read. Developmental Psychology,

26, 429-438.

Burt, L., Holm, A., & Dodd, B. (1999). Phonological awareness skills of 4-year-old

British children: An assessment and developmental data. International Journal of

Language & Communication Disorders, 34, 311-335.

Chall, J. S., Jacobs, V., & Baldwin, L. (1990). The reading crisis: Why poor children fall

behind. Cambridge, MA: Harvard University Press.

Cohen, J., & Cohen, P. (1983). Applied multiple regression/correlation analysis for the

behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum.

Cronbach, L., & Meehl, P. (1955). Construct validity in psychological tests.

Psychological Bulletin, 52, 281-302.

Cunningham, A. E., & Stanovich, K. E. (1997). Early reading acquisition and its relation

to reading experience and ability 10 years later. Developmental Psychology, 33,

934-945.

Goswami, U. (1986). Children’s use of analogy in learning to read: A developmental

study. Journal of Experimental Child Psychology, 42, 73-83.

Goswami, U. (1988). Children’s use of analogy in learning to spell. British Journal of

Developmental Psychology, 6, 21-33.


Goswami, U., & Bryant, P. (1990). Phonological skills and learning to read. Hillsdale,

NJ: Lawrence Erlbaum.

Green, S. B., Salkind, N. J., & Akey, T. M. (1997). Using SPSS for windows: Analyzing

and understanding data (2nd ed.). Upper Saddle River, NJ: Prentice Hall.

Hambleton, R. K. (1980). Test score validity and standard-setting methods. In R. A.

(Ed.). Criterion-referenced measurement: The state of the art (pp. 80-123).

Baltimore, MD: Johns Hopkins University Press.

Hamilton, C. E., Schwanenflugel, P., Neuharth-Pritchett, S., & Restrepo, M. A. (2002).

Data from on-going research PAVEd for Success, Unpublished.

Hill, S. (1999). Phonics. York, ME: Stenhouse Publishers.

Hills, J. R. (1981). Measurement and evaluation in the classroom (2nd ed.). Columbus,

OH: Charles E. Merrill.

Hoien, T., Lundberg, I., Stanovich, K. E., & Bjaalid, I. (1995). Components of
phonological awareness. Reading and Writing: An Interdisciplinary Journal, 7,
171-188.

Johnston, P., & Allington, R. (1991). Remediation. In R. Barr, M. Kamil, P. Mosenthal,

& P. D. Pearson (Eds.), Handbook of reading research (pp. 984-1012). New

York: Longman.

Liberman, A. M., Cooper, F. S., Shankweiler, D. P., & Studdert-Kennedy, M. (1967).

Perception of the speech code. Psychological Review, 74, 431-461.

Liberman, I. Y. (1978). Segmentation of the spoken word and reading acquisition.

Bulletin of the Orton Society, 23, 65-77.


Liberman, I. Y., Shankweiler, D. P., Fischer, F. W., & Carter, B. (1974). Explicit

syllable and phoneme segmentation in the young child. Journal of Experimental

Child Psychology, 18, 201-212.

Liberman, I. Y., Shankweiler, D. P., & Liberman, A. M. (1989). The alphabetic principle

and learning to read. In D. Shankweiler & I. Y. Liberman (Eds.), Phonology and

reading disability: Solving the reading puzzle (pp. 1-33). Ann Arbor: University

of Michigan Press.

Loevinger, J. (1957). Objective tests as instruments of psychological theory.

Psychological Reports, 3, 635-694.

Lonigan, C. J., Burgess, S. R., Anthony, J. L., & Barker, T. A. (1998). Development of

phonological sensitivity in 2-to-5-year-old children. Journal of Educational

Psychology, 90, 294-311.

Lonigan, C. J., Burgess, S. R., & Anthony, J. L. (2000). Development of emergent
literacy and early reading skills in preschool children: Evidence from a
latent-variable longitudinal study. Developmental Psychology, 36, 596-613.

MacLean, M., Bryant, P. E., & Bradley, L. L. (1987). Rhymes, nursery rhymes, and

reading in early childhood. Merrill-Palmer Quarterly, 33, 11-37.

Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement (3rd ed.,

pp. 13-103). New York: Macmillan.

Messick, S. (1995). Validity of psychological assessment: Validation of inferences from
persons’ responses and performances as scientific inquiry into score meaning.
American Psychologist, 50, 741-749.

Miller, M. D., & Linn, R. L. (2000). Validation of performance based assessments.
Applied Psychological Measurement, 24, 367-378.

Robertson, C., & Salter, W. (1997). The phonological awareness test. East Moline, IL:
LinguiSystems.

Share, D. L., & Stanovich, K. E. (1995). Cognitive processes in early reading

development: Accommodating individual differences into a model of acquisition.

Issues in Education: Contributions from Educational Psychology, 1, 1-57.

Stahl, S. A., & Murray, B. A. (1994). Defining phonological awareness and its

relationship to early reading. Journal of Educational Psychology, 86, 221-234.

Stanovich, K. E. (1986). Matthew effects in reading: Some consequences of individual
differences in the acquisition of literacy. Reading Research Quarterly, 21,
360-407.

Stanovich, K. E. (1992). Speculations on the cause and consequences of individual

differences in early reading acquisition. In P. B. Gough, L. C. Ehri, & R. Treiman

(Eds.), Reading acquisition (pp. 307-342). Hillsdale, NJ: Lawrence Erlbaum.

Stanovich, K. E., Cunningham, A. E., & Cramer, B. B. (1984). Assessing phonological

awareness in kindergarten children: Issues of task comparability. Journal of

Experimental Child Psychology, 38, 175-190.

Stevenson, H. W., & Newman, R. S. (1986). Long-term prediction of achievement and

attitudes in mathematics and reading. Child Development, 57, 646-659.

Sulzby, E., & Teale, W. (1991). Emergent literacy. In R. Barr, M. Kamil, P.

Mosenthal, & P. D. Pearson (Eds.), Handbook of reading research (pp. 727-758).

New York: Longman.

Torgesen, J. K. (1999). Phonologically based reading disabilities: Toward a coherent
theory of one kind of learning disability. In R. J. Sternberg & L. Spear-Swerling
(Eds.), Perspectives on learning disabilities: Biological, cognitive, contextual (pp.
106-135). Boulder, CO: Westview Press.

Torgesen, J. K., & Mathes, P. G. (2000). A basic guide to understanding, assessing, and

teaching phonological awareness. Austin, TX: Pro-ED.

Wagner, R. K., & Torgesen, J. K. (1987). The nature of phonological processing and its
causal role in the acquisition of reading skills. Psychological Bulletin, 101,
192-212.

Wagner, R. K., Torgesen, J. K., Rashotte, C. A., Hecht, S. A., Barker, T. A., Burgess, S.
R., Donahue, J., & Garon, T. (1997). Changing relations between phonological
processing abilities and word-level reading as children develop from beginning to
skilled readers: A 5-year longitudinal study. Developmental Psychology, 33,
468-479.

Whitehurst, G. J., & Lonigan, C. J. (1998). Child development and emergent literacy.

Child Development, 69, 848-872.

Yopp, H. K. (1988). The validity and reliability of phonemic awareness tests. Reading

Research Quarterly, 23, 159-177.

Yopp, H. K., & Yopp, R. H. (2000). Supporting phonemic awareness development in the

classroom. The Reading Teacher, 54, 130-143.


Table 1

The Maximum Scores, the Means, and the Standard Deviations for Phonological

Awareness Tasks Based on the Preliminary Item Condition

Task Max. Score M SD N

Rhyming discrimination 11 3.64 3.88 415

Syllable segmentation 11 2.35 2.93 415

Initial isolation 11 0.87 2.54 415

Phonemes blending 11 1.05 2.17 415

Note. The preliminary item is scored 0 if the examinee responded to all three practice
items incorrectly; otherwise, it is scored 1.

Table 2

The Maximum Scores, the Means, and the Standard Deviations for Phonological

Awareness Tasks Based on the Actual Item Condition

Task Max. Score M SD N

Rhyming discrimination 10 3.10 3.46 415

Syllable segmentation 10 1.81 2.58 415

Initial isolation 10 0.68 2.29 415

Phonemes blending 10 0.74 1.86 415


Table 3

Coefficients Alpha and the Standard Error of Measurements for Phonological Awareness

Tasks Based on the Preliminary Item Condition

Task α SEM N

Rhyming discrimination .93 1.02 415

Syllable segmentation .88 0.98 415

Initial isolation .97 0.45 415

Phonemes blending .89 0.71 415




Table 4

Coefficients Alpha and the Standard Error of Measurements for Phonological

Awareness Tasks Based on the Actual Item Condition

Task α SEM N

Rhyming discrimination .92 0.98 415

Syllable segmentation .88 0.93 415

Initial isolation .98 0.34 415

Phonemes blending .89 0.60 415


Table 5

The Mean Levels of Task Difficulty of Phonological Awareness Tasks

Task                      Preliminary item condition    Actual item condition    N
Rhyming discrimination    .330                          .310                     415
Syllable segmentation     .214                          .181                     415
Initial isolation         .079                          .067                     415
Phonemes blending         .095                          .074                     415

Table 6

Item Analyses for Rhyming Discrimination Task Based on the Preliminary Item

Condition

Item                     Item difficultyᵃ    Item difficultyᵇ    nᵇ     Item discriminationᵃ
Preliminary              .542                .542                415    .823
book – look              .412                .760                225    .759
fun – run                .417                .772                224    .782
ring – rat               .222                .414                222    .484
box – mess               .222                .449                205    .560
fish – dish              .371                .762                202    .812
mop – hop                .357                .767                193    .813
shoe – fan               .219                .484                188    .601
sweater – better         .347                .778                185    .802
camper – hamper          .361                .829                181    .817
pudding – table          .169                .393                178    .565

ᵃ Item difficulties and item discrimination are based on the total number of examinees
(N = 415).
ᵇ Item difficulties are based on the number of examinees who actually responded to the
items.

Table 7

Item Analyses for Rhyming Discrimination Task Based on the Actual Item Condition


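Item                     Item difficultyᵃ    Item difficultyᵇ    nᵇ     Item discriminationᵃ
book – look              .412                .760                225    .736
fun – run                .417                .772                224    .761
ring – rat               .222                .414                222    .471
box – mess               .222                .449                205    .558
fish – dish              .371                .762                202    .809
mop – hop                .357                .767                193    .814
shoe – fan               .219                .484                188    .604
sweater – better         .347                .778                185    .803
camper – hamper          .361                .829                181    .817
pudding – table          .169                .393                178    .576

ᵃ Item difficulties and item discrimination are based on the total number of examinees
(N = 415).
ᵇ Item difficulties are based on the number of examinees who actually responded to the
items.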

Table 8

Item Analyses for Syllable Segmentation Task Based on the Preliminary Item Condition


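Item                     Item difficultyᵃ    Item difficultyᵇ    nᵇ     Item discriminationᵃ
Preliminary              .540                .540                415    .650
pizza                    .316                .585                224    .672
watermelon               .087                .162                222    .466
fix                      .337                .639                219    .575
calendar                 .166                .375                184    .352
television               .089                .204                181    .572
moose                    .275                .659                173    .634
elephant                 .106                .273                161    .587
pillow                   .178                .481                154    .695
kindergarten             .070                .195                149    .565
candy                    .190                .552                143    .727

ᵃ Item difficulties and item discrimination are based on the total number of examinees
(N = 415).
ᵇ Item difficulties are based on the number of examinees who actually responded to the
items.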

Table 9

Item Analyses for Syllable Segmentation Task Based on the Actual Item Condition


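Item                     Item difficultyᵃ    Item difficultyᵇ    nᵇ     Item discriminationᵃ
pizza                    .316                .585                224    .633
watermelon               .087                .162                222    .475
fix                      .337                .639                219    .514
calendar                 .166                .375                184    .664
television               .089                .204                181    .597
moose                    .275                .659                173    .604
elephant                 .106                .273                161    .608
pillow                   .178                .481                154    .709
kindergarten             .070                .195                149    .596
candy                    .190                .552                143    .742

ᵃ Item difficulties and item discrimination are based on the total number of examinees
(N = 415).
ᵇ Item difficulties are based on the number of examinees who actually responded to the
items.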

Table 10

Item Analyses for Initial Isolation Task Based on the Preliminary Item Condition


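Item                     Item difficultyᵃ    Item difficultyᵇ    nᵇ     Item discriminationᵃ
Preliminary              .190                .190                415    .609
bite                     .082                .430                79     .953
toy                      .075                .397                78     .897
dinosaur                 .065                .355                76     .862
fudge                    .072                .857                35     .927
nose                     .072                .833                36     .900
apple                    .065                .750                36     .881
garage                   .063                .844                36     .722
happy                    .063                .743                35     .890
chalk                    .065                .794                34     .862
laugh                    .053                .647                34     .827

ᵃ Item difficulties and item discrimination are based on the total number of examinees
(N = 415).
ᵇ Item difficulties are based on the number of examinees who actually responded to the
items.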

Table 11

Item Analyses for Initial Isolation Task Based on the Actual Item Condition


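Item                     Item difficultyᵃ    Item difficultyᵇ    nᵇ     Item discriminationᵃ
bite                     .082                .430                79     .955
toy                      .075                .397                78     .898
dinosaur                 .065                .355                76     .867
fudge                    .072                .857                35     .934
nose                     .072                .833                36     .904
apple                    .065                .750                36     .888
garage                   .063                .844                36     .848
happy                    .063                .743                35     .901
chalk                    .065                .794                34     .867
laugh                    .053                .647                34     .839

ᵃ Item difficulties and item discrimination are based on the total number of examinees
(N = 415).
ᵇ Item difficulties are based on the number of examinees who actually responded to the
items.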

Table 12

Item Analyses for Phoneme Blending Task Based on the Preliminary Item Condition


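Item                     Item difficultyᵃ    Item difficultyᵇ    nᵇ     Item discriminationᵃ
Preliminary              .308                .308                415    .600
/b – oi/                 .183                .598                127    .772
/n – ç/                  .065                .216                125    .579
/p – ö/                  .067                .286                98     .647
/s – i – t/              .092                .396                96     .651
/f – l – î/              .087                .456                70     .743
/m – ou – s/             .082                .472                72     .663
/k – î – n – d/          .051                .313                67     .630
/s – n – a – p/          .043                .327                55     .646
/m – i – l – k/          .043                .316                57     .566
/s – l – i – p – çr/     .031                .236                55     .589

ᵃ Item difficulties and item discrimination are based on the total number of examinees
(N = 415).
ᵇ Item difficulties are based on the number of examinees who actually responded to the
items.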

Table 13

Item Analyses for Phoneme Blending Task Based on the Actual Item Condition


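Item                     Item difficultyᵃ    Item difficultyᵇ    nᵇ     Item discriminationᵃ
/b – oi/                 .183                .598                127    .706
/n – ç/                  .065                .216                125    .577
/p – ö/                  .067                .286                98     .656
/s – i – t/              .092                .396                96     .640
/f – l – î/              .087                .456                70     .755
/m – ou – s/             .082                .472                72     .662
/k – î – n – d/          .051                .313                67     .653
/s – n – a – p/          .043                .327                55     .679
/m – i – l – k/          .043                .316                57     .583
/s – l – i – p – çr/     .031                .236                55     .624

ᵃ Item difficulties and item discrimination are based on the total number of examinees
(N = 415).
ᵇ Item difficulties are based on the number of examinees who actually responded to the
items.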

Table 14

Intercorrelations among the Phonological Awareness Tasks

Task                          1      2      3      4
1. Rhyming Discrimination     —      .40    .36    .36
2. Syllables Segmentation            —      .43    .32
3. Initial Isolation                        —      .51
4. Phonemes Blending                               —

Note. Computations are based on the actual item condition.

Table 15

Factors, Eigenvalues, and Percentage of Variance Accounted for

Factor    Eigenvalue    Percentage of Variance    Cumulative Percentage
1         2.19          54.83                     54.83
2         .73           18.29                     73.12
3         .62           15.44                     88.56
4         .46           11.44                     100.00

Note. Factor analysis is conducted based on the actual item condition.

Table 16

Factor Loadings for One-Factor Solution

Task Factor

Rhyming discrimination .70

Syllables segmentation .72

Initial isolation .79

Phonemes blending .75
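One way to read the one-factor loadings is that their squared values sum to the variance the factor accounts for. The short sketch below checks this against the first eigenvalue in Table 15, assuming the loadings come from an unrotated principal-components extraction (an assumption made for illustration; the variable names are hypothetical).

```python
# One-factor loadings as printed in Table 16.
loadings = {
    "Rhyming discrimination": 0.70,
    "Syllables segmentation": 0.72,
    "Initial isolation": 0.79,
    "Phonemes blending": 0.75,
}

# Sum of squared loadings approximates the factor's eigenvalue.
sum_squared = sum(value ** 2 for value in loadings.values())
# Dividing by the number of tasks gives the share of total variance.
percent_variance = 100.0 * sum_squared / len(loadings)

print(round(sum_squared, 3), round(percent_variance, 2))
```

The result is close to the 2.19 and 54.83% reported for the first factor; the small gap comes from the loadings being rounded to two decimals.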



Table 17

Factor Loadings for Two-Factor Solution after Varimax Rotation

Task Factor 1 Factor 2

Rhyming Discrimination .20 .81

Syllables Segmentation .22 .81

Initial Isolation .79 .32

Phonemes Blending .88 .15
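The rotated loadings can be summarized as communalities (the variance in each task explained by the two factors together). The sketch below computes them from the printed loadings. It relies on the general property that an orthogonal rotation such as varimax redistributes variance between factors without changing the total. The variable names are hypothetical and the inputs are rounded to two decimals.

```python
# (Factor 1, Factor 2) loadings as printed in Table 17.
rotated = {
    "Rhyming Discrimination": (0.20, 0.81),
    "Syllables Segmentation": (0.22, 0.81),
    "Initial Isolation": (0.79, 0.32),
    "Phonemes Blending": (0.88, 0.15),
}

# Communality: sum of squared loadings across factors for each task.
communality = {task: f1 ** 2 + f2 ** 2 for task, (f1, f2) in rotated.items()}

# Variance attributed to each rotated factor: column sums of squared loadings.
factor_variance = [
    sum(pair[i] ** 2 for pair in rotated.values()) for i in range(2)
]

# Share of total variance explained by the two factors together.
total_percent = 100.0 * sum(communality.values()) / len(rotated)

for task, h2 in communality.items():
    print(f"{task}: {h2:.2f}")
print([round(v, 2) for v in factor_variance], round(total_percent, 1))
```

The combined share is about 73.1%, in line with the cumulative 73.12% listed for the first two factors in Table 15.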



Table 18

The Means and the Standard Deviations of Alphabet Knowledge Tests

Tests                                 Max. Score    M        SD      N
Letter name knowledge-upper case      16            12.06    9.92    415
Letter sound knowledge-upper case     16             5.03    7.21    415
Letter name knowledge-lower case      16             9.55    8.80    415
Letter sound knowledge-lower case     16             4.03    6.67    415


Table 19

Predictive Correlations between Phonological Awareness Tasks and Alphabet Knowledge Tests

                                     Task
                                     Rhyming           Syllables         Initial           Phonemes
                                     discrimination    segmentation      isolation         blending
Letter name knowledge-upper case     .07 (p = .298)    .16 (p = .033)    .08 (p = .666)    -.03 (p = .767)
Letter sound knowledge-upper case    .05 (p = .434)    .20 (p = .006)    .25 (p = .139)     .12 (p = .300)
Letter name knowledge-lower case     .07 (p = .258)    .17 (p = .021)    .09 (p = .605)    -.06 (p = .629)
Letter sound knowledge-lower case    .05 (p = .479)    .20 (p = .007)    .25 (p = .140)     .15 (p = .179)
N                                    212               184               36                78

Note. Correlation coefficients are computed based on the number of examinees who

actually responded to the items on the phonological awareness tasks
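Because each task has a different number of respondents, the same correlation can be significant for one task and not for another. The sketch below illustrates this with the Fisher z normal approximation, which is a stand-in chosen here so the example needs only the standard library. It is not necessarily the test used in the study, so its p-values will differ slightly from those in the table. The function name is hypothetical.

```python
import math

def fisher_z_p_value(r, n):
    """Two-sided p-value for H0: rho = 0 using the Fisher z normal approximation."""
    z = math.atanh(r) * math.sqrt(n - 3)
    # For a standard normal Z, P(|Z| > z) equals erfc(z / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2.0))

# Letter sound knowledge-upper case, using r and N as printed in Table 19.
p_syllables = fisher_z_p_value(0.20, 184)  # syllables segmentation
p_initial = fisher_z_p_value(0.25, 36)     # initial isolation

print(round(p_syllables, 3), round(p_initial, 3))
```

The larger correlation (.25) does not reach the .05 level with only 36 respondents, while the smaller one (.20) does with 184, which matches the pattern of p-values in the table.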


Table 20

The Means and Standard Deviations of Phonological Awareness Tasks by Gender Group

                          Male                   Female
Task                      M       SD      N      M       SD      N
Rhyming discrimination    3.18    3.47    200    3.13    3.45    196
Syllables segmentation    1.89    2.70    199    1.73    2.40    195
Initial isolation         0.68    2.31    198    0.69    2.32    195
Phonemes blending         0.68    1.91    198    0.81    1.80    195

Note. The analysis is based on the total number of examinees in the actual item condition.
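The size of a group difference in this table can be expressed as a standardized mean difference. The sketch below computes Cohen's d with a pooled standard deviation for the phonemes blending row, treating the last value in each row as the female group size. This statistic is added here for illustration only and is not reported in the study. The function name is hypothetical.

```python
import math

def cohens_d(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Standardized mean difference (group b minus group a) with pooled SD."""
    pooled_var = ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / (n_a + n_b - 2)
    return (mean_b - mean_a) / math.sqrt(pooled_var)

# Phonemes blending: male (M = 0.68, SD = 1.91, N = 198),
# female (M = 0.81, SD = 1.80, N = 195), as printed in Table 20.
d_blending = cohens_d(0.68, 1.91, 198, 0.81, 1.80, 195)
print(round(d_blending, 2))
```

A value this close to zero indicates that the male and female means on this task are nearly identical relative to the spread of scores.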


Table 21

The Means and the Standard Deviations of Phonological Awareness Tasks by Ethnic Group

                          African-American    Asian           Bi-Racial       Caucasian       Hispanic
Task                      M       SD          M       SD      M       SD      M       SD      M       SD
Rhyming discrimination    2.64    3.23        1.94    2.82    3.75    2.87    3.48    3.58    4.02    3.85
Syllables segmentation    1.75    2.49        1.56    2.39    2.25    2.63    1.58    2.40    2.03    2.86
Initial isolation         0.67    2.35        0.00    0.00    0.00    0.00    0.50    1.89    1.12    2.96
Phonemes blending         0.63    1.79        0.19    0.75    0.75    1.50    0.75    1.75    0.88    1.97
N                         128                 16              4               106             60



Table 22

The Means and the Standard Deviations of Phonological Awareness Tasks by Socioeconomic Group

                          Lower group      Upper group
Task                      M       SD       M       SD
Rhyming discrimination    3.33    3.25     3.14    3.55
Syllables segmentation    1.86    2.63     1.79    2.56
Initial isolation         0.57    2.14     0.75    2.43
Phonemes blending         0.90    2.05     0.66    1.76
N                         115              265

Note. Socioeconomic status is based on whether the subject receives free or reduced lunch or not.


Figure 1

Developmental Sequence of Phonological Awareness

Age            Development in phonological awareness tasks
3-year-olds    Can recite nursery rhymes.
4-year-olds    Can detect if two words rhyme.
               Can produce a rhyme for a simple word.
5-year-olds    Can understand the components of sounds that make them the same or different.
               Can isolate and pronounce the initial sound in a word.
               Can blend and segment words into the syllabic units.
6-year-olds    Can isolate and pronounce sounds in up to three-phoneme words.
               Can blend the sounds in four-phoneme words.
7-year-olds    Can manipulate phonemes, including adding, deleting, and moving any phonemes to generate designated words.


Figure 2

Facets of the Unitary Validity

                       Test Interpretation                         Test Use
Evidential Basis       Construct Validity                          Construct Validity + Relevance and Utility
Consequential Basis    Construct Validity + Value Implications     Construct Validity + Relevance and Utility + Value Implications + Social Consequences


Figure 3

Plot of Eigenvalues and Factors of Scree Test

[Scree plot: eigenvalue (vertical axis, 0.0 to 2.5) plotted against factor number (horizontal axis, 1 to 4).]


Figure 4

The Procedure for Assessment Construction and Construct Validation

Assessment construction Aspect of validity Validation procedure

• Specifying cognitive outcomes

/ taxonomy of objectives

• Table of specification

• Developing assessment tasks –

construction of items

Content aspect • Specifying domain of construct

– previous research and

observation

• Construct underrepresentation

and construct irrelevancy

• Index of item congruence

• Developing answer keys

• Developing scoring rubrics

• Developing models for scoring

Substantive

aspect

• Administering and scoring

considerations

• Evaluating assessment

instruments – task reliability

• Summarizing measurement

data

• Gathering information about

item analysis

Structural aspect • Item and subscale

intercorrelations

• Item analysis

• Factor analysis

• Item response theory

• Multitrait-multimethod matrix

Generalizability

• Generalizability theory

• Meta-analysis


External aspect • Multitrait-multimethod matrix

• Group differentiation

• Correlations with other

measures

• Regression analysis

• Structural equation modeling

• Selecting items from the

information about item analysis

and item bias detection

• Developing question / item file

Consequential

aspect

• Detecting item bias and fair

selection

• Evaluating intended /

unintended consequences of

score interpretation and use

• Evaluating the impact of test

invalidity


APPENDIX: PHONOLOGICAL AWARENESS TEST (PAT)

Ceiling for all subtests: Stop the administration if all of the three practice items are

wrong, or when there are 3 consecutive wrong items. If the child is

losing track of the task, go back to the example to remind the

child of the task.
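The ceiling rule above can be sketched as a small scoring routine. This is an illustrative reading of the rule, not code from the test publisher: the function name and the boolean-list input format are hypothetical, and it assumes that a subtest stopped at the practice stage receives a raw score of 0.

```python
def score_subtest(practice_correct, item_correct, ceiling=3):
    """Return (raw_score, items_administered) under the stated stopping rule.

    practice_correct: booleans for the three practice items.
    item_correct: booleans for the scored items, in administration order.
    ceiling: number of consecutive wrong items that ends the subtest.
    """
    if not any(practice_correct):
        return 0, 0  # all practice items wrong: stop before the scored items
    raw_score = 0
    consecutive_wrong = 0
    administered = 0
    for correct in item_correct:
        administered += 1
        if correct:
            raw_score += 1
            consecutive_wrong = 0  # a correct answer resets the run of errors
        else:
            consecutive_wrong += 1
            if consecutive_wrong == ceiling:
                break  # ceiling reached: remaining items are not given
    return raw_score, administered

# Example: two correct answers, then three misses in a row end the subtest.
responses = [True, True, False, False, False, True, True, True, True, True]
print(score_subtest([True, False, True], responses))
```

In this example the child is credited with 2 points and is administered only 5 of the 10 items, which illustrates how a stopping rule of this kind reduces the number of examinees who respond to later items.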

Name: _______________________________________

Date of Administration: _________________________

Examiner: ____________________________________

Summary of Results

Test Raw Score

Rhyming Discrimination

Sentence Segmentation

Syllable Segmentation

Initial Isolation

Syllable Blending

Phoneme Blending

Consonants Graphemes

Long & Short Vowels Graphemes


Rhyming Discrimination

“I am going to say two words and ask you if they rhyme. Listen carefully. Do these

words rhyme? Fan – man.”

Stimulus phrase: “Do these words rhyme? _____ - _____ ”

Practice items: 1. Fan – man (yes), 2. Fan – tan (yes), 3. Fan – dog (no).

Item                Correct Response    Examinee's Response    Score
book – look         Yes                                        1  0
fun – run           Yes                                        1  0
ring – rat          No                                         1  0
box – mess          No                                         1  0
fish – dish         Yes                                        1  0
mop – hop           Yes                                        1  0
shoe – fan          No                                         1  0
sweater – better    Yes                                        1  0
camper – hamper     Yes                                        1  0
pudding – table     No                                         1  0
TOTAL SCORE


Sentence Segmentation

“I am going to say a sentence, and I want you to clap one time for each word I say. My

house is big. Now, clap it with me.” Say the sentence again and clap once as you say

each word. “My – house – is – big. Now, you try it by yourself. My house is big.”

Stimulus phrase: “Clap one time for each word I say. ____________________”

Practice items: 1. My – house – is – big. (4 claps) 2. My – name – is – _____. (4 claps)

3. I – like – dogs. (3 claps)

Item                        Correct Response    Examinee's Response    Score
He can swim                 3 claps                                    1  0
My cat is black             4 claps                                    1  0
I am very tall              4 claps                                    1  0
My dad’s car won’t start    5 claps                                    1  0
That flower is pretty       4 claps                                    1  0
Some cows give milk         4 claps                                    1  0
The clown has big feet      5 claps                                    1  0
Let’s go to school          4 claps                                    1  0
I have ten books            4 claps                                    1  0
The kite is flying high     5 claps                                    1  0
TOTAL SCORE


Syllable Segmentation

“I am going to say a word, and I want you to clap one time for each word part or syllable

I say. Saturday. Now, clap it with me.” Say the word and clap once as you say each

syllable. “Sat – ur – day. Now, you try it by yourself. Saturday.”

Stimulus phrase: “Clap one time for each syllable in the word _____.”

Practice items: 1. Sat – ur – day (3 claps) 2. Fri – day (2 claps) 3. Dog (1 clap)

Item            Correct Response    Examinee's Response    Score
pizza           2 claps                                    1  0
watermelon      4 claps                                    1  0
fix             1 clap                                     1  0
calendar        3 claps                                    1  0
television      4 claps                                    1  0
moose           1 clap                                     1  0
elephant        3 claps                                    1  0
pillow          2 claps                                    1  0
kindergarten    4 claps                                    1  0
candy           2 claps                                    1  0
TOTAL SCORE


Initial Isolation

“I am going to say a word, and I want you to tell me the beginning or first sound in the

word. What’s the beginning sound in the word CAT?”

Stimulus phrase: “What’s the beginning sound in the word _____?”

Practice items: 1. CAT /k/ 2. MAD /m/ 3. JANE /j/

Item        Correct Response    Examinee's Response    Score
bite        /b/                                        1  0
toy         /t/                                        1  0
dinosaur    /d/                                        1  0
fudge       /f/                                        1  0
nose        /n/                                        1  0
apple       /a/                                        1  0
garage      /g/                                        1  0
happy       /h/                                        1  0
chalk       /ch/                                       1  0
laugh       /l/                                        1  0
TOTAL SCORE


Syllable Blending

“I’ll say the parts of a word. You guess what the word is. What word is this?” Pause for

one second between syllables. “ta – ble” If the child repeats the word in parts, say “Say

it faster, like this, table.”

Stimulus phrase: “What word is this? _____ .”

Practice items: 1. ta – ble (table) 2. mo – ther (mother) 3. he – llo (hello)

Item                   Correct Response    Examinee's Response    Score
win – dow              window                                     1  0
flow – er              flower                                     1  0
can – dy               candy                                      1  0
com – pu – ter         computer                                   1  0
moun – tain            mountain                                   1  0
bas – ket              basket                                     1  0
tel – e – phone        telephone                                  1  0
croc – o – dile        crocodile                                  1  0
dic – tion – ar – y    dictionary                                 1  0
con – ver – ti – ble   convertible                                1  0
TOTAL SCORE


Phoneme Blending

“I’ll say the sound. You guess what the word is. What word is this?” Pause for one

second between syllables. “p – o – p” If the child repeats the word by sounds, say, “Say

it faster, like this, pop.”

Stimulus phrase: “What word is this? _____ .”

Practice items: 1. p – o – p (pop) 2. d – o – g (dog) 3. c – a – t (cat)

Item                    Correct Response    Examinee's Response    Score
/b – oi/                boy                                        1  0
/n – ē/                 knee                                       1  0
/p – ö/                 paw                                        1  0
/s – i – t/             sit                                        1  0
/f – l – ī/             fly                                        1  0
/m – ou – s/            mouse                                      1  0
/k – ī – n – d/         kind                                       1  0
/s – n – a – p/         snap                                       1  0
/m – i – l – k/         milk                                       1  0
/s – l – i – p – çr/    slipper                                    1  0
TOTAL SCORE


Consonants Graphemes – Discontinue if the child gets 8 consecutive letters wrong and

does not know those in his or her name.

“I’m going to show you some letters. I want you to tell me what sound each letter

makes.”

Stimulus phrase: “Tell me what sound this makes.”

Note: If the student gives one correct sound of /c, g, s/, prompt for the other sound by

asking, “What’s another sound this makes?” If the student is able to provide one correct

sound, score the item as correct.

Use the graphemes booklet for this subtest.

Item    Correct Response    Examinee's Response    Score      Item    Correct Response    Examinee's Response    Score
b       /b/                                        1  0       n       /n/                                        1  0
c       /k, s/                                     1  0       p       /p/                                        1  0
d       /d/                                        1  0       q       /k, kw/                                    1  0
f       /f/                                        1  0       r       /r/                                        1  0
g       /g, j/                                     1  0       s       /s, z/                                     1  0
h       /h/                                        1  0       t       /t/                                        1  0
j       /j/                                        1  0       v       /v/                                        1  0
k       /k/                                        1  0       w       /w/                                        1  0
l       /l/                                        1  0       x       /eks, z, ks/                               1  0
m       /m/                                        1  0       z       /z/                                        1  0
TOTAL SCORE


Long & Short Vowels Graphemes

Use the same vowel card to elicit both the short and long vowel sounds below. If

necessary, prompt with “Now, tell me the other sound this letter makes.”

Note: Use the vowel sounds booklet for this subtest.

Item    Correct Response             Examinee's Response    Score
A       /a/ as in bat                                       1  0
A       /ā/ as in cake                                      1  0
E       /e/ as in met                                       1  0
E       /ē/ as in me                                        1  0
I       /i/ as in sit                                       1  0
I       /ī/ as in high                                      1  0
O       /o/ as in top                                       1  0
O       /ō/ as in over                                      1  0
U       /u/ as in but                                       1  0
U       /ū/ as in use or tool                               1  0
TOTAL SCORE
