Application of Generalizability Theory to Concept-Map Assessment Research

Yue Yin & Richard J. Shavelson
Stanford Educational Assessment Laboratory (SEAL), Stanford University & CRESST

AERA 2004, San Diego, CA





Overview

• Part 1: Feasibility of applying G-theory to concept-map assessment (CMA) research
  - Examining the dependability of CMA scores
  - Designing a CMA for a particular application
  - Narrowing down alternatives

• Part 2: Empirical study of using G-theory to compare two CMAs:
  - Construct-a-map with created linking phrases (C)
  - Construct-a-map with selected linking phrases (S)

A Concept Map

[Figure: an example concept map with its parts labeled: concepts/terms, linking lines, linking phrases, and propositions]

Variations in CMA: Components and Variation Examples

Task:
- Topic only
- Topic and concepts (C)
- Topic, concepts, and linking phrases (S)
- Topic, incomplete concepts or incomplete linking phrases (fill-in-the-nodes or fill-in-the-lines)

Response:
- Computer
- Paper-and-pencil

Scoring System:
- Link score
- Concept score
- Proposition score
- Structure score

Part 1: Feasibility of Applying G Theory to CMA Research

Viewing CMA with G Theory

• Basic idea: A particular type of score, given by a particular rater, based on a particular type of concept map, on a particular occasion, … is a sample from a multifaceted universe.

• Object of measurement: People, i.e., the variation in students' knowledge structures

• Facets: Task (concept & proposition), response format, scoring system, rater, occasion, …

G Theory vs. CTT: Similarities

• Concept-term sampling ↔ Equivalence of alternate forms
• Proposition sampling ↔ Internal consistency
• Rater sampling ↔ Inter-rater reliability
• Occasion sampling ↔ Stability over time

G Theory's Advantages

• Integrates conceptually, and evaluates simultaneously, all the technical properties above

• Estimates not only the effects of individual facets but also their interaction effects

• Permits us to optimize an assessment's technical quality

Examining Technical Properties & Designing Assessments

• Examining dependability (G study): How well can a measure of a student's declarative knowledge structure be generalized across concept-map tasks? Scoring systems? Occasions? Raters? Propositions? Different concept samples?

• Designing an assessment (D study): How many concept-map tasks, scoring systems, occasions, raters, propositions, and/or concept samples are needed to obtain a reliable measurement of students' declarative knowledge structure?
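As an illustrative sketch (not taken from the paper), the D-study question can be answered by projecting the relative G coefficient from G-study variance components. The person x occasion x item design and the component values below are hypothetical:

```python
# D-study sketch for a crossed person x occasion x item (p x o x i) design.
# The variance components are hypothetical, for illustration only.

def relative_g(var_p, var_po, var_pi, var_poi_e, n_o, n_i):
    """Relative (norm-referenced) G coefficient for n_o occasions and n_i items.

    Only person-by-facet interactions contribute to relative error."""
    rel_error = var_po / n_o + var_pi / n_i + var_poi_e / (n_o * n_i)
    return var_p / (var_p + rel_error)

# Hypothetical variance components (as proportions of total variability)
var_p, var_po, var_pi, var_poi_e = 0.30, 0.05, 0.15, 0.50

# Reliability grows as more items/propositions are averaged over
for n_i in (4, 8, 16, 32):
    print(n_i, round(relative_g(var_p, var_po, var_pi, var_poi_e, n_o=1, n_i=n_i), 3))
```

With these made-up components, one occasion and 4 items give a coefficient near 0.59, while 32 items raise it above 0.80, which is the kind of trade-off a D study quantifies.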

Narrowing Down Alternatives

• Task
  - Which task type is more reliable over raters, occasions, propositions, and concept samples?
  - Accordingly, that task needs fewer raters, occasions, propositions, and concept samples.

• Scoring system
  - Which scoring system is more reliable over raters, occasions, propositions, and concept samples?
  - Accordingly, that scoring system needs fewer raters, occasions, propositions, and concept samples.

Part 2: Empirical Study of Using G-Theory to Compare Two CMAs

Two Frequently Used CMAs

• Construct-a-map with created linking phrases (C): provides a cognitively valid measure of knowledge structure (e.g., Ruiz-Primo et al., 2001; Yin et al., 2004)

• Construct-a-map with selected linking phrases (S): provides an efficient way to measure knowledge structure (e.g., Klein et al., 2001)

Method

• Participants
  - 92 eighth-graders (46 girls)
  - Had previously studied a related unit
  - No related instruction between the two occasions

• Procedures (task order across the two occasions)
  - C → S (n = 22)
  - S → C (n = 23)
  - C → C (n = 26)
  - S → S (n = 21)

• Concept-map task
  - 9 concepts (for C & S): water, volume, cubic centimeter, wood, density, mass, buoyancy, gram, and matter
  - 6 linking phrases (for S only): is a measure of…; has a property of…; depends on…; is a form of…; is mass divided by…; divided by volume equals…

Criterion Map

[Figure: criterion concept map connecting the nine concepts (water, wood, matter, mass, volume, density, buoyancy, gram, cubic centimeter) with linking phrases such as "is a unit of," "has a property of," "depends on," "is a form of," "is mass divided by," and "divided by volume equals"; the mandatory propositions are marked]
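A hypothetical sketch (not the study's actual scoring rubric) of a simple proposition score: count how many mandatory propositions from the criterion map appear in a student's map. The triple representation and the example propositions are assumptions for illustration:

```python
# Hypothetical proposition scoring sketch: a proposition is represented as a
# (concept, linking phrase, concept) triple, and the score is the number of
# mandatory criterion-map propositions present in the student's map.
def proposition_score(student_props, mandatory_props):
    """Count mandatory propositions reproduced in the student's map."""
    return len(set(student_props) & set(mandatory_props))

# Illustrative propositions built from the unit's nine concepts
mandatory = {
    ("gram", "is a unit of", "mass"),
    ("density", "is mass divided by", "volume"),
    ("buoyancy", "depends on", "density"),
}
student = [
    ("gram", "is a unit of", "mass"),
    ("density", "is mass divided by", "volume"),
    ("wood", "is a form of", "matter"),  # not mandatory, earns no credit here
]
print(proposition_score(student, mandatory))  # -> 2
```

In the actual study each proposition would also be judged for quality rather than just matched exactly; this sketch shows only the counting step.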

Sources of Variation

CS & SC (person x format x item):
• Person (P)
• Proposition/Item (I)
• Format (F)
• P x F
• P x I
• F x I
• P x F x I, e

CC & SS (person x occasion x item):
• Person (P)
• Proposition/Item (I)
• Occasion (O)
• P x O
• P x I
• O x I
• P x O x I, e
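A minimal sketch, under stated assumptions (simulated scores, estimation via the standard expected-mean-square equations, not the authors' own analysis), of how variance components for a fully crossed p x o x i design such as CC & SS can be estimated:

```python
# G-study sketch: estimate variance components for a crossed p x o x i design
# (all facets random, one observation per cell) from ANOVA mean squares.
# The simulated data and true component values are hypothetical.
import numpy as np

def g_study_poi(X):
    """Variance-component estimates for scores X of shape (n_p, n_o, n_i)."""
    n_p, n_o, n_i = X.shape
    m = X.mean()
    m_p = X.mean(axis=(1, 2)); m_o = X.mean(axis=(0, 2)); m_i = X.mean(axis=(0, 1))
    m_po = X.mean(axis=2); m_pi = X.mean(axis=1); m_oi = X.mean(axis=0)

    # Mean squares for main effects, two-way interactions, and the residual
    ms_p = n_o * n_i * np.sum((m_p - m) ** 2) / (n_p - 1)
    ms_o = n_p * n_i * np.sum((m_o - m) ** 2) / (n_o - 1)
    ms_i = n_p * n_o * np.sum((m_i - m) ** 2) / (n_i - 1)
    ms_po = n_i * np.sum((m_po - m_p[:, None] - m_o[None, :] + m) ** 2) / ((n_p - 1) * (n_o - 1))
    ms_pi = n_o * np.sum((m_pi - m_p[:, None] - m_i[None, :] + m) ** 2) / ((n_p - 1) * (n_i - 1))
    ms_oi = n_p * np.sum((m_oi - m_o[:, None] - m_i[None, :] + m) ** 2) / ((n_o - 1) * (n_i - 1))
    resid = (X - m_po[:, :, None] - m_pi[:, None, :] - m_oi[None, :, :]
             + m_p[:, None, None] + m_o[None, :, None] + m_i[None, None, :] - m)
    ms_res = np.sum(resid ** 2) / ((n_p - 1) * (n_o - 1) * (n_i - 1))

    # Solve the expected-mean-square equations (negative estimates set to 0)
    v = {
        "poi,e": ms_res,
        "po": (ms_po - ms_res) / n_i,
        "pi": (ms_pi - ms_res) / n_o,
        "oi": (ms_oi - ms_res) / n_p,
        "p": (ms_p - ms_po - ms_pi + ms_res) / (n_o * n_i),
        "o": (ms_o - ms_po - ms_oi + ms_res) / (n_p * n_i),
        "i": (ms_i - ms_pi - ms_oi + ms_res) / (n_p * n_o),
    }
    return {k: max(val, 0.0) for k, val in v.items()}

# Simulate scores with known (hypothetical) components; interactions are 0 here
rng = np.random.default_rng(0)
n_p, n_o, n_i = 200, 2, 20
X = (rng.normal(0, np.sqrt(0.30), (n_p, 1, 1))        # person: 0.30
     + rng.normal(0, np.sqrt(0.05), (1, n_o, 1))      # occasion: 0.05
     + rng.normal(0, np.sqrt(0.10), (1, 1, n_i))      # item: 0.10
     + rng.normal(0, np.sqrt(0.50), (n_p, n_o, n_i))) # residual: 0.50
est = g_study_poi(X)
print({k: round(val, 3) for k, val in est.items()})
```

Expressing each estimate as a percent of their sum reproduces the kind of "percent of total variability" chart shown on the next slides.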

Variance Component Estimates: G Study in CS & SC

[Figure: bar chart of the percent of total variability (0%–70%) accounted for by each source (P, F, I, PF, PI, FI, PFI,e), plotted separately for the CS and SC groups]

Variance Component Estimates: G Study in CC & SS

[Figure: bar chart of the percent of total variability (0%–70%) accounted for by each source (P, O, I, PO, PI, OI, POI,e), plotted separately for the CC and SS groups]

D Study for the C CMA

[Figure: relative G coefficient (0–1) as a function of the number of items/propositions (0–32), with three plotted series labeled 1, 2, and 3]

D Study for the S CMA

[Figure: relative G coefficient (0–1) as a function of the number of items/propositions (0–32), with three plotted series labeled 1, 2, and 3]

Conclusions

• G study pinpoints multiple sources of measurement error, thereby giving insight into how to improve the reliability and applicability of CMA via a D study

• C and S mapping tasks are not equivalent in their technical properties

• Fewer occasions and propositions are needed in S than in C to obtain a reliable evaluation of students' declarative knowledge structure

Thank You for Your Interest!

To get the complete paper, please either contact Yue Yin at [email protected] or download the file directly at http://www.stanford.edu/dept/SUSE/SEAL/