Upload
others
View
9
Download
0
Embed Size (px)
Citation preview
Computerized adaptive testingusing Bayesian networks
Jirı Vomlel
Academy of Sciences of the Czech Republic
11th July, 2007
J. Vomlel (UTIA AV CR) CAT 11th July, 2007 1 / 25
Educational Testing Service (ETS)
ETS is the world’s largest private educational testing organization.It has 2300 permanent employees.
Number of participants in tests in the school year 2000/2001:3 185 000 SAT I Reasoning Test and SAT II: Subject Area Tests2 293 000 PSAT: Preliminary SAT/National Merit Scholarship
Qualifying Test1 421 000 AP: Advanced Placement Program
801 000 The Praxis Series: Professional Assessments for Be-ginning Teachers and Pre-Professional Skills Tests
787 000 TOEFL: Test of English as a Foreign Language449 000 GRE: Graduate Record Examinations General Test
ETS has a research unit doing research on adaptive testing usingBayesian networks: http://www.ets.org/research/
J. Vomlel (UTIA AV CR) CAT 11th July, 2007 2 / 25
Educational Testing Service (ETS)
ETS is the world’s largest private educational testing organization.It has 2300 permanent employees.Number of participants in tests in the school year 2000/2001:3 185 000 SAT I Reasoning Test and SAT II: Subject Area Tests2 293 000 PSAT: Preliminary SAT/National Merit Scholarship
Qualifying Test1 421 000 AP: Advanced Placement Program
801 000 The Praxis Series: Professional Assessments for Be-ginning Teachers and Pre-Professional Skills Tests
787 000 TOEFL: Test of English as a Foreign Language449 000 GRE: Graduate Record Examinations General Test
ETS has a research unit doing research on adaptive testing usingBayesian networks: http://www.ets.org/research/
J. Vomlel (UTIA AV CR) CAT 11th July, 2007 2 / 25
Educational Testing Service (ETS)
ETS is the world’s largest private educational testing organization.It has 2300 permanent employees.Number of participants in tests in the school year 2000/2001:3 185 000 SAT I Reasoning Test and SAT II: Subject Area Tests2 293 000 PSAT: Preliminary SAT/National Merit Scholarship
Qualifying Test1 421 000 AP: Advanced Placement Program
801 000 The Praxis Series: Professional Assessments for Be-ginning Teachers and Pre-Professional Skills Tests
787 000 TOEFL: Test of English as a Foreign Language449 000 GRE: Graduate Record Examinations General Test
ETS has a research unit doing research on adaptive testing usingBayesian networks: http://www.ets.org/research/
J. Vomlel (UTIA AV CR) CAT 11th July, 2007 2 / 25
Rasch model (G. Rasch,1960)
Model variables
Yn,i binary response variable - its values indicateswhether the answer of person n to question i wascorrect
n = 1, . . . ,N person indexi = 1, . . . , I question index
Model parameters
δi difficulty of question i - fixed effectsβn ability (knowledge level) of person n - a random ef-
fect
J. Vomlel (UTIA AV CR) CAT 11th July, 2007 3 / 25
Rasch model (G. Rasch,1960)
Model variables
Yn,i binary response variable - its values indicateswhether the answer of person n to question i wascorrect
n = 1, . . . ,N person indexi = 1, . . . , I question index
Model parameters
δi difficulty of question i - fixed effectsβn ability (knowledge level) of person n - a random ef-
fect
J. Vomlel (UTIA AV CR) CAT 11th July, 2007 3 / 25
Models for the response variable Y
Yn,i =
{1 if βn ≥ δi0 otherwise.
P(Yn,i = 1) =exp(βn − δi)
1 + exp(βn − δi)
P(Yn,i = 1 | βn) for δi = −2
-6 -4 -2 0 2
0.2
0.4
0.6
0.8
1
J. Vomlel (UTIA AV CR) CAT 11th July, 2007 4 / 25
Models for the response variable Y
Yn,i =
{1 if βn ≥ δi0 otherwise.
P(Yn,i = 1) =exp(βn − δi)
1 + exp(βn − δi)
P(Yn,i = 1 | βn) for δi = −2
-6 -4 -2 0 2
0.2
0.4
0.6
0.8
1
J. Vomlel (UTIA AV CR) CAT 11th July, 2007 4 / 25
Models for the response variable Y
Yn,i =
{1 if βn ≥ δi0 otherwise.
P(Yn,i = 1) =exp(βn − δi)
1 + exp(βn − δi)
P(Yn,i = 1 | βn) for δi = −2
-6 -4 -2 0 2
0.2
0.4
0.6
0.8
1
J. Vomlel (UTIA AV CR) CAT 11th July, 2007 4 / 25
Probability distribution for random effect βn
P(βn) = N (0, σ2)
a normal (Gaussian) distributionwith the mean equal zero, and variance σ2.
-3 -2 -1 0 1 2 30
0.1
0.2
0.3
0.4
J. Vomlel (UTIA AV CR) CAT 11th July, 2007 5 / 25
Probability distribution for random effect βn
P(βn) = N (0, σ2)
a normal (Gaussian) distributionwith the mean equal zero, and variance σ2.
-3 -2 -1 0 1 2 30
0.1
0.2
0.3
0.4
J. Vomlel (UTIA AV CR) CAT 11th July, 2007 5 / 25
Computations with the Rasch model
prior Nβ(0,1)
-3 -2 -1 0 1 2 30
0.1
0.2
0.3
0.4
Nβ(0,1)
J. Vomlel (UTIA AV CR) CAT 11th July, 2007 6 / 25
Computations with the Rasch model
P(Y = 1 | β, δ1 = −2) posterior
-3 -2 -1 0 1 2 3
0.2
0.4
0.6
0.8
1
-3 -2 -1 0 1 2 30
0.05
0.1
0.15
0.2
0.25
0.3
0.35
Nβ(0,1) · P(Y = 1 | β, δ1 = −2)
J. Vomlel (UTIA AV CR) CAT 11th July, 2007 6 / 25
Computations with the Rasch model
P(Y = 0 | β, δ2 = 0) posterior
-3 -2 -1 0 1 2 3
0.2
0.4
0.6
0.8
1
-3 -2 -1 0 1 2 30
0.025
0.05
0.075
0.1
0.125
0.15
0.175
Nβ(0,1) · P(Y = 1 | β, δ1 = −2) · P(Y = 0 | β, δ2 = 0)
J. Vomlel (UTIA AV CR) CAT 11th July, 2007 6 / 25
Computations with the Rasch model
P(Y = 0 | β, δ3 = +1) posterior
-3 -2 -1 0 1 2 3
0.2
0.4
0.6
0.8
1
-3 -2 -1 0 1 2 30
0.02
0.04
0.06
0.08
0.1
0.12
0.14
Nβ(0,1) · P(Y = 1 | β, δ1 = −2) · P(Y = 0 | β, δ2 = 0)
·P(Y = 0 | β, δ3 = +1)
J. Vomlel (UTIA AV CR) CAT 11th July, 2007 6 / 25
Computations with the Rasch model
P(Y = 0 | β, δ4 = +2) posterior
-3 -2 -1 0 1 2 3
0.2
0.4
0.6
0.8
1
-3 -2 -1 0 1 2 30
0.02
0.04
0.06
0.08
0.1
0.12
Nβ(0,1) · P(Y = 1 | β, δ1 = −2) · P(Y = 0 | β, δ2 = 0)
·P(Y = 0 | β, δ3 = +1) · P(Y = 0 | β, δ4 = +2)
J. Vomlel (UTIA AV CR) CAT 11th July, 2007 6 / 25
Student model and evidence models(R. Almond and R. Mislevy, 1999)
The variables of the models are: (1) skills, abilities,misconceptions, etc. - for brevity called skills - the vector of skillsis denoted S and (2) items (questions) - the vector of questions isdenoted X .
student model describes relations between student skills.evidence models - one for each item (question) - describesrelations of the item to the skills.
S4
X2
S1 S3
S4
S2S3
X3
S2
X1
S3
S5
S1
S5
J. Vomlel (UTIA AV CR) CAT 11th July, 2007 7 / 25
Student model and evidence models(R. Almond and R. Mislevy, 1999)
The variables of the models are: (1) skills, abilities,misconceptions, etc. - for brevity called skills - the vector of skillsis denoted S and (2) items (questions) - the vector of questions isdenoted X .student model describes relations between student skills.
evidence models - one for each item (question) - describesrelations of the item to the skills.
S4
X2
S1 S3
S4
S2S3
X3
S2
X1
S3
S5
S1
S5
J. Vomlel (UTIA AV CR) CAT 11th July, 2007 7 / 25
Student model and evidence models(R. Almond and R. Mislevy, 1999)
The variables of the models are: (1) skills, abilities,misconceptions, etc. - for brevity called skills - the vector of skillsis denoted S and (2) items (questions) - the vector of questions isdenoted X .student model describes relations between student skills.evidence models - one for each item (question) - describesrelations of the item to the skills.
S4
X2
S1 S3
S4
S2S3
X3
S2
X1
S3
S5
S1
S5
J. Vomlel (UTIA AV CR) CAT 11th July, 2007 7 / 25
Building Bayesian network models
Three basic approaches:
Discussions with domain experts: expert knowledge is used toget the structure and parameters of the modelA dataset of records is collected and a machine learning methodis used to to construct a model and estimate its parameters.A combination of previous two: e.g. experts suggest thestructure and collected data are used to estimate parameters.
J. Vomlel (UTIA AV CR) CAT 11th July, 2007 8 / 25
Building Bayesian network models
Three basic approaches:Discussions with domain experts: expert knowledge is used toget the structure and parameters of the model
A dataset of records is collected and a machine learning methodis used to to construct a model and estimate its parameters.A combination of previous two: e.g. experts suggest thestructure and collected data are used to estimate parameters.
J. Vomlel (UTIA AV CR) CAT 11th July, 2007 8 / 25
Building Bayesian network models
Three basic approaches:Discussions with domain experts: expert knowledge is used toget the structure and parameters of the modelA dataset of records is collected and a machine learning methodis used to to construct a model and estimate its parameters.
A combination of previous two: e.g. experts suggest thestructure and collected data are used to estimate parameters.
J. Vomlel (UTIA AV CR) CAT 11th July, 2007 8 / 25
Building Bayesian network models
Three basic approaches:Discussions with domain experts: expert knowledge is used toget the structure and parameters of the modelA dataset of records is collected and a machine learning methodis used to to construct a model and estimate its parameters.A combination of previous two: e.g. experts suggest thestructure and collected data are used to estimate parameters.
J. Vomlel (UTIA AV CR) CAT 11th July, 2007 8 / 25
Example of a simple diagnostic task
We want to diagnose presence/absence of three skills
S1,S2,S3
using three questionsX1,2,X1,3,X2,3 .
The questions depend on skills and their dependence is described byconditional probability distributions
P(Xi,j = 1|Si = si ,Sj = sj) =
{1 if (si , sj) = (1,1)0 otherwise.
Assume all answers were wrong, i.e.,
X1,2 = 0, X1,3 = 0, X2,3 = 0 .
J. Vomlel (UTIA AV CR) CAT 11th July, 2007 9 / 25
Example of a simple diagnostic task
We want to diagnose presence/absence of three skills
S1,S2,S3
using three questionsX1,2,X1,3,X2,3 .
The questions depend on skills and their dependence is described byconditional probability distributions
P(Xi,j = 1|Si = si ,Sj = sj) =
{1 if (si , sj) = (1,1)0 otherwise.
Assume all answers were wrong, i.e.,
X1,2 = 0, X1,3 = 0, X2,3 = 0 .
J. Vomlel (UTIA AV CR) CAT 11th July, 2007 9 / 25
Example of a simple diagnostic task
We want to diagnose presence/absence of three skills
S1,S2,S3
using three questionsX1,2,X1,3,X2,3 .
The questions depend on skills and their dependence is described byconditional probability distributions
P(Xi,j = 1|Si = si ,Sj = sj) =
{1 if (si , sj) = (1,1)0 otherwise.
Assume all answers were wrong, i.e.,
X1,2 = 0, X1,3 = 0, X2,3 = 0 .
J. Vomlel (UTIA AV CR) CAT 11th July, 2007 9 / 25
Example of a simple diagnostic task
We want to diagnose presence/absence of three skills
S1,S2,S3
using three questionsX1,2,X1,3,X2,3 .
The questions depend on skills and their dependence is described byconditional probability distributions
P(Xi,j = 1|Si = si ,Sj = sj) =
{1 if (si , sj) = (1,1)0 otherwise.
Assume all answers were wrong, i.e.,
X1,2 = 0, X1,3 = 0, X2,3 = 0 .
J. Vomlel (UTIA AV CR) CAT 11th July, 2007 9 / 25
Reasoning under the assumption of skills’independence
X1,2
X1,3
X2,3
S1 S3
S2
First, assume the skills arepairwise independent, i.e.,
P(S1,S2,S3) = P(S1)·P(S2)·P(S3)
and for i = 1,2,3, si = 0,1P(Si = si) = 0.5
Then conditional probabilities for j = 1,2,3 are
P(Sj = 0 | X1,2 = 0,X1,3 = 0,X2,3 = 0) = 0.75 ,
i.e., we cannot decide with certainty, which skills are present and whichare absent.
J. Vomlel (UTIA AV CR) CAT 11th July, 2007 10 / 25
Reasoning under the assumption of skills’independence
X1,2
X1,3
X2,3
S1 S3
S2
First, assume the skills arepairwise independent, i.e.,
P(S1,S2,S3) = P(S1)·P(S2)·P(S3)
and for i = 1,2,3, si = 0,1P(Si = si) = 0.5
Then conditional probabilities for j = 1,2,3 are
P(Sj = 0 | X1,2 = 0,X1,3 = 0,X2,3 = 0) = 0.75 ,
i.e., we cannot decide with certainty, which skills are present and whichare absent.
J. Vomlel (UTIA AV CR) CAT 11th July, 2007 10 / 25
Reasoning under the assumption of skills’independence
X1,2
X1,3
X2,3
S1 S3
S2
First, assume the skills arepairwise independent, i.e.,
P(S1,S2,S3) = P(S1)·P(S2)·P(S3)
and for i = 1,2,3, si = 0,1P(Si = si) = 0.5
Then conditional probabilities for j = 1,2,3 are
P(Sj = 0 | X1,2 = 0,X1,3 = 0,X2,3 = 0) = 0.75 ,
i.e., we cannot decide with certainty, which skills are present and whichare absent.
J. Vomlel (UTIA AV CR) CAT 11th July, 2007 10 / 25
Modeling dependence between skills
X2,3
X1,3
X1,2
S1 S3
S2
Now, assume there is adeterministic hierarchy amongskills
S1 ⇒ S2, S2 ⇒ S3
Then conditional probabilities are
P(S1 = 0 | X1,2 = 0,X1,3 = 0,X2,3 = 0) = 1P(S2 = 0 | X1,2 = 0,X1,3 = 0,X2,3 = 0) = 1P(S3 = 0 | X1,2 = 0,X1,3 = 0,X2,3 = 0) = 0.5
For i = 1,2,3: P(Si | X1,2 = 0,X1,3 = 0,X2,3 = 0) = P(Si | X2,3 = 0).X2,3 = 0 gives the same information as X1,2 = X1,3 = X2,3 = 0.
J. Vomlel (UTIA AV CR) CAT 11th July, 2007 11 / 25
Modeling dependence between skills
X2,3
X1,3
X1,2
S1 S3
S2
Now, assume there is adeterministic hierarchy amongskills
S1 ⇒ S2, S2 ⇒ S3
Then conditional probabilities are
P(S1 = 0 | X1,2 = 0,X1,3 = 0,X2,3 = 0) = 1P(S2 = 0 | X1,2 = 0,X1,3 = 0,X2,3 = 0) = 1P(S3 = 0 | X1,2 = 0,X1,3 = 0,X2,3 = 0) = 0.5
For i = 1,2,3: P(Si | X1,2 = 0,X1,3 = 0,X2,3 = 0) = P(Si | X2,3 = 0).X2,3 = 0 gives the same information as X1,2 = X1,3 = X2,3 = 0.
J. Vomlel (UTIA AV CR) CAT 11th July, 2007 11 / 25
Modeling dependence between skills
X2,3
X1,3
X1,2
S1 S3
S2
Now, assume there is adeterministic hierarchy amongskills
S1 ⇒ S2, S2 ⇒ S3
Then conditional probabilities are
P(S1 = 0 | X1,2 = 0,X1,3 = 0,X2,3 = 0) = 1P(S2 = 0 | X1,2 = 0,X1,3 = 0,X2,3 = 0) = 1P(S3 = 0 | X1,2 = 0,X1,3 = 0,X2,3 = 0) = 0.5
For i = 1,2,3: P(Si | X1,2 = 0,X1,3 = 0,X2,3 = 0) = P(Si | X2,3 = 0).X2,3 = 0 gives the same information as X1,2 = X1,3 = X2,3 = 0.
J. Vomlel (UTIA AV CR) CAT 11th July, 2007 11 / 25
Modeling dependence between skills
X2,3
X1,3
X1,2
S1 S3
S2
Now, assume there is adeterministic hierarchy amongskills
S1 ⇒ S2, S2 ⇒ S3
Then conditional probabilities are
P(S1 = 0 | X1,2 = 0,X1,3 = 0,X2,3 = 0) = 1P(S2 = 0 | X1,2 = 0,X1,3 = 0,X2,3 = 0) = 1P(S3 = 0 | X1,2 = 0,X1,3 = 0,X2,3 = 0) = 0.5
For i = 1,2,3: P(Si | X1,2 = 0,X1,3 = 0,X2,3 = 0) = P(Si | X2,3 = 0).X2,3 = 0 gives the same information as X1,2 = X1,3 = X2,3 = 0.
J. Vomlel (UTIA AV CR) CAT 11th July, 2007 11 / 25
Fixed test vs. adaptive test
wrong correct wrong
wrong correct
correct
correctwrongcorrectcorrectwrong wrong correct wrong
Q5
Q4
Q3
Q2
Q1
Q2
Q1
Q3
Q4
Q5
Q6
Q7
Q8
Q9
Q10
Q6 Q8
Q7
Q8
Q4 Q7 Q10
Q6
Q7
Q9
J. Vomlel (UTIA AV CR) CAT 11th July, 2007 12 / 25
Computerized Adaptive Testing (CAT)
The goal of CAT is to tailor each test so that it brings most informationabout each student.
Two basic steps are repeated:
1 estimation of the knowledge level of the tested student2 selection of appropriate question to ask the student
Entropy as an information measure:H (P(S)) = −
∑s P(S = s) · log P(S = s)
0
0.5
1
0 0.5 1
entr
opy
probability
“The lower the entropy the more we know.”
J. Vomlel (UTIA AV CR) CAT 11th July, 2007 13 / 25
Computerized Adaptive Testing (CAT)
The goal of CAT is to tailor each test so that it brings most informationabout each student.Two basic steps are repeated:
1 estimation of the knowledge level of the tested student
2 selection of appropriate question to ask the studentEntropy as an information measure:
H (P(S)) = −∑
s P(S = s) · log P(S = s)
0
0.5
1
0 0.5 1
entr
opy
probability
“The lower the entropy the more we know.”
J. Vomlel (UTIA AV CR) CAT 11th July, 2007 13 / 25
Computerized Adaptive Testing (CAT)
The goal of CAT is to tailor each test so that it brings most informationabout each student.Two basic steps are repeated:
1 estimation of the knowledge level of the tested student2 selection of appropriate question to ask the student
Entropy as an information measure:H (P(S)) = −
∑s P(S = s) · log P(S = s)
0
0.5
1
0 0.5 1
entr
opy
probability
“The lower the entropy the more we know.”
J. Vomlel (UTIA AV CR) CAT 11th July, 2007 13 / 25
Computerized Adaptive Testing (CAT)
The goal of CAT is to tailor each test so that it brings most informationabout each student.Two basic steps are repeated:
1 estimation of the knowledge level of the tested student2 selection of appropriate question to ask the student
Entropy as an information measure:
H (P(S)) = −∑
s P(S = s) · log P(S = s)
0
0.5
1
0 0.5 1
entr
opy
probability
“The lower the entropy the more we know.”
J. Vomlel (UTIA AV CR) CAT 11th July, 2007 13 / 25
Computerized Adaptive Testing (CAT)
The goal of CAT is to tailor each test so that it brings most informationabout each student.Two basic steps are repeated:
1 estimation of the knowledge level of the tested student2 selection of appropriate question to ask the student
Entropy as an information measure:H (P(S)) = −
∑s P(S = s) · log P(S = s)
0
0.5
1
0 0.5 1
entr
opy
probability
“The lower the entropy the more we know.”
J. Vomlel (UTIA AV CR) CAT 11th July, 2007 13 / 25
Computerized Adaptive Testing (CAT)
The goal of CAT is to tailor each test so that it brings most informationabout each student.Two basic steps are repeated:
1 estimation of the knowledge level of the tested student2 selection of appropriate question to ask the student
Entropy as an information measure:H (P(S)) = −
∑s P(S = s) · log P(S = s)
0
0.5
1
0 0.5 1
entr
opy
probability
“The lower the entropy the more we know.”
J. Vomlel (UTIA AV CR) CAT 11th July, 2007 13 / 25
Building strategies using the models
Visit Denmark.
Visit Norway.
Visit Canary Islands.
Do you like hot climate?
Visit Egypt.
Do you like mountains?
Do you like the sea?
X3 = yes
X2 = noX1 = yes
X1 = no
X3 = noX2 = yes
For all terminal nodes (leaves) ` ∈ L(s) of a strategy s we define:
steps that were performed to get to that node (e.g. questionsanswered in a certain way). It is called collected evidence e`.Using the probabilistic model of the domain we can computeprobability of getting to a terminal node P(e`).Also during the process, when we collected evidence e, weupdate the probability of getting to a terminal node to conditionalprobability P(e`|e)
J. Vomlel (UTIA AV CR) CAT 11th July, 2007 14 / 25
Building strategies using the models
Visit Denmark.
Visit Norway.
Visit Canary Islands.
Do you like hot climate?
Visit Egypt.
Do you like mountains?
Do you like the sea?
X3 = yes
X2 = noX1 = yes
X1 = no
X3 = noX2 = yes
For all terminal nodes (leaves) ` ∈ L(s) of a strategy s we define:steps that were performed to get to that node (e.g. questionsanswered in a certain way). It is called collected evidence e`.
Using the probabilistic model of the domain we can computeprobability of getting to a terminal node P(e`).Also during the process, when we collected evidence e, weupdate the probability of getting to a terminal node to conditionalprobability P(e`|e)
J. Vomlel (UTIA AV CR) CAT 11th July, 2007 14 / 25
Building strategies using the models
Visit Denmark.
Visit Norway.
Visit Canary Islands.
Do you like hot climate?
Visit Egypt.
Do you like mountains?
Do you like the sea?
X3 = yes
X2 = noX1 = yes
X1 = no
X3 = noX2 = yes
For all terminal nodes (leaves) ` ∈ L(s) of a strategy s we define:steps that were performed to get to that node (e.g. questionsanswered in a certain way). It is called collected evidence e`.Using the probabilistic model of the domain we can computeprobability of getting to a terminal node P(e`).
Also during the process, when we collected evidence e, weupdate the probability of getting to a terminal node to conditionalprobability P(e`|e)
J. Vomlel (UTIA AV CR) CAT 11th July, 2007 14 / 25
Building strategies using the models
Visit Denmark.
Visit Norway.
Visit Canary Islands.
Do you like hot climate?
Visit Egypt.
Do you like mountains?
Do you like the sea?
X3 = yes
X2 = noX1 = yes
X1 = no
X3 = noX2 = yes
For all terminal nodes (leaves) ` ∈ L(s) of a strategy s we define:steps that were performed to get to that node (e.g. questionsanswered in a certain way). It is called collected evidence e`.Using the probabilistic model of the domain we can computeprobability of getting to a terminal node P(e`).Also during the process, when we collected evidence e, weupdate the probability of getting to a terminal node to conditionalprobability P(e`|e)
J. Vomlel (UTIA AV CR) CAT 11th July, 2007 14 / 25
Building strategies using the models
For all terminal nodes ` ∈ L(s) of a strategy s we also define:an evaluation function f : ∪s∈SL(s) 7→ R,
The evaluation function can be, e.g., the information gain.Information gain in a node n of a strategyIG(en) = H(P(S))− H(P(S | en))
For each strategy we can compute:expected value of the strategy:
Ef (s) =∑`∈L(s)
P(e`) · f (e`)
The goal isto find a strategy that maximizes its expected value.Specifically, we will maximize the expected information gain,which corresponds to minimization of expected entropy.
J. Vomlel (UTIA AV CR) CAT 11th July, 2007 15 / 25
Building strategies using the models
For all terminal nodes ` ∈ L(s) of a strategy s we also define:an evaluation function f : ∪s∈SL(s) 7→ R,The evaluation function can be, e.g., the information gain.
Information gain in a node n of a strategyIG(en) = H(P(S))− H(P(S | en))
For each strategy we can compute:expected value of the strategy:
Ef (s) =∑`∈L(s)
P(e`) · f (e`)
The goal isto find a strategy that maximizes its expected value.Specifically, we will maximize the expected information gain,which corresponds to minimization of expected entropy.
J. Vomlel (UTIA AV CR) CAT 11th July, 2007 15 / 25
Building strategies using the models
For all terminal nodes ` ∈ L(s) of a strategy s we also define:an evaluation function f : ∪s∈SL(s) 7→ R,The evaluation function can be, e.g., the information gain.Information gain in a node n of a strategyIG(en) = H(P(S))− H(P(S | en))
For each strategy we can compute:expected value of the strategy:
Ef (s) =∑`∈L(s)
P(e`) · f (e`)
The goal isto find a strategy that maximizes its expected value.Specifically, we will maximize the expected information gain,which corresponds to minimization of expected entropy.
J. Vomlel (UTIA AV CR) CAT 11th July, 2007 15 / 25
Building strategies using the models
For all terminal nodes ` ∈ L(s) of a strategy s we also define:an evaluation function f : ∪s∈SL(s) 7→ R,The evaluation function can be, e.g., the information gain.Information gain in a node n of a strategyIG(en) = H(P(S))− H(P(S | en))
For each strategy we can compute:
expected value of the strategy:
Ef (s) =∑`∈L(s)
P(e`) · f (e`)
The goal isto find a strategy that maximizes its expected value.Specifically, we will maximize the expected information gain,which corresponds to minimization of expected entropy.
J. Vomlel (UTIA AV CR) CAT 11th July, 2007 15 / 25
Building strategies using the models
For all terminal nodes ` ∈ L(s) of a strategy s we also define:an evaluation function f : ∪s∈SL(s) 7→ R,The evaluation function can be, e.g., the information gain.Information gain in a node n of a strategyIG(en) = H(P(S))− H(P(S | en))
For each strategy we can compute:expected value of the strategy:
Ef (s) =∑`∈L(s)
P(e`) · f (e`)
The goal isto find a strategy that maximizes its expected value.Specifically, we will maximize the expected information gain,which corresponds to minimization of expected entropy.
J. Vomlel (UTIA AV CR) CAT 11th July, 2007 15 / 25
Building strategies using the models
For all terminal nodes ` ∈ L(s) of a strategy s we also define:an evaluation function f : ∪s∈SL(s) 7→ R,The evaluation function can be, e.g., the information gain.Information gain in a node n of a strategyIG(en) = H(P(S))− H(P(S | en))
For each strategy we can compute:expected value of the strategy:
Ef (s) =∑`∈L(s)
P(e`) · f (e`)
The goal is
to find a strategy that maximizes its expected value.Specifically, we will maximize the expected information gain,which corresponds to minimization of expected entropy.
J. Vomlel (UTIA AV CR) CAT 11th July, 2007 15 / 25
Building strategies using the models
For all terminal nodes ` ∈ L(s) of a strategy s we also define:an evaluation function f : ∪s∈SL(s) 7→ R,The evaluation function can be, e.g., the information gain.Information gain in a node n of a strategyIG(en) = H(P(S))− H(P(S | en))
For each strategy we can compute:expected value of the strategy:
Ef (s) =∑`∈L(s)
P(e`) · f (e`)
The goal isto find a strategy that maximizes its expected value.
Specifically, we will maximize the expected information gain,which corresponds to minimization of expected entropy.
J. Vomlel (UTIA AV CR) CAT 11th July, 2007 15 / 25
Building strategies using the models
For all terminal nodes ` ∈ L(s) of a strategy s we also define:an evaluation function f : ∪s∈SL(s) 7→ R,The evaluation function can be, e.g., the information gain.Information gain in a node n of a strategyIG(en) = H(P(S))− H(P(S | en))
For each strategy we can compute:expected value of the strategy:
Ef (s) =∑`∈L(s)
P(e`) · f (e`)
The goal isto find a strategy that maximizes its expected value.Specifically, we will maximize the expected information gain,
which corresponds to minimization of expected entropy.
J. Vomlel (UTIA AV CR) CAT 11th July, 2007 15 / 25
Building strategies using the models
For all terminal nodes ` ∈ L(s) of a strategy s we also define:an evaluation function f : ∪s∈SL(s) 7→ R,The evaluation function can be, e.g., the information gain.Information gain in a node n of a strategyIG(en) = H(P(S))− H(P(S | en))
For each strategy we can compute:expected value of the strategy:
Ef (s) =∑`∈L(s)
P(e`) · f (e`)
The goal isto find a strategy that maximizes its expected value.Specifically, we will maximize the expected information gain,which corresponds to minimization of expected entropy.
J. Vomlel (UTIA AV CR) CAT 11th July, 2007 15 / 25
Space of tests of length 2 of 3 possible questions
X3
X1
X3
X3
X2
X3
X2
X1
X2
X1
X2
X2
X3
X1
X1
Entropy in node n:H(en) = H(P(S | en))
Expected entropy of a test tEH(t) =
∑`∈L(t) P(e`) · H(e`)
Let T be the set of all possibletests (e.g. of a given length).A test t? is optimal ifft? = arg mint∈T EH(t).Myopically optimal test:in each step a we select aquestion that minimizesexpected entropy of the testthat would terminate after thisquestion (one step lookahead).
J. Vomlel (UTIA AV CR) CAT 11th July, 2007 16 / 25
Space of tests of length 2 of 3 possible questions
X3
X1
X3
X3
X2
X3
X2
X1
X2
X1
X2
X2
X3
X1
X1
Entropy in node n:H(en) = H(P(S | en))
Expected entropy of a test tEH(t) =
∑`∈L(t) P(e`) · H(e`)
Let T be the set of all possibletests (e.g. of a given length).A test t? is optimal ifft? = arg mint∈T EH(t).Myopically optimal test:in each step a we select aquestion that minimizesexpected entropy of the testthat would terminate after thisquestion (one step lookahead).
J. Vomlel (UTIA AV CR) CAT 11th July, 2007 16 / 25
Space of tests of length 2 of 3 possible questions
X3
X1
X3
X3
X2
X3
X2
X1
X2
X1
X2
X2
X3
X1
X1
Entropy in node n:H(en) = H(P(S | en))
Expected entropy of a test tEH(t) =
∑`∈L(t) P(e`) · H(e`)
Let T be the set of all possibletests (e.g. of a given length).A test t? is optimal ifft? = arg mint∈T EH(t).Myopically optimal test:in each step a we select aquestion that minimizesexpected entropy of the testthat would terminate after thisquestion (one step lookahead).
J. Vomlel (UTIA AV CR) CAT 11th July, 2007 16 / 25
Space of tests of length 2 of 3 possible questions
X3
X1
X3
X3
X2
X3
X2
X1
X2
X1
X2
X2
X3
X1
X1
Entropy in node n:H(en) = H(P(S | en))
Expected entropy of a test tEH(t) =
∑`∈L(t) P(e`) · H(e`)
Let T be the set of all possibletests (e.g. of a given length).
A test t? is optimal ifft? = arg mint∈T EH(t).Myopically optimal test:in each step a we select aquestion that minimizesexpected entropy of the testthat would terminate after thisquestion (one step lookahead).
J. Vomlel (UTIA AV CR) CAT 11th July, 2007 16 / 25
Space of tests of length 2 of 3 possible questions
X3
X1
X3
X3
X2
X3
X2
X1
X2
X1
X2
X2
X3
X1
X1
Entropy in node n:H(en) = H(P(S | en))
Expected entropy of a test tEH(t) =
∑`∈L(t) P(e`) · H(e`)
Let T be the set of all possibletests (e.g. of a given length).A test t? is optimal ifft? = arg mint∈T EH(t).
Myopically optimal test:in each step a we select aquestion that minimizesexpected entropy of the testthat would terminate after thisquestion (one step lookahead).
J. Vomlel (UTIA AV CR) CAT 11th July, 2007 16 / 25
Space of tests of length 2 of 3 possible questions
X3
X1
X3
X3
X2
X3
X2
X1
X2
X1
X2
X2
X3
X1
X1
Entropy in node n:H(en) = H(P(S | en))
Expected entropy of a test tEH(t) =
∑`∈L(t) P(e`) · H(e`)
Let T be the set of all possibletests (e.g. of a given length).A test t? is optimal ifft? = arg mint∈T EH(t).Myopically optimal test:in each step a we select aquestion that minimizesexpected entropy of the testthat would terminate after thisquestion (one step lookahead).
J. Vomlel (UTIA AV CR) CAT 11th July, 2007 16 / 25
Adaptive test of basic operations with fractions
Examples of tasks:
T1:(
34 ·
56
)− 1
8 = 1524 −
18 = 5
8 −18 = 4
8 = 12
T2: 16 + 1
12 = 212 + 1
12 = 312 = 1
4
T3: 14 · 1
12 = 1
4 ·32 = 3
8
T4:(
12 ·
12
)·(
13 + 1
3
)= 1
4 ·23 = 2
12 = 16 .
J. Vomlel (UTIA AV CR) CAT 11th July, 2007 17 / 25
Elementary and operational skills
CP Comparison (common numerator or de-nominator)
12 >
13 ,
23 >
13
AD Addition (comm. denom.) 17 + 2
7 = 1+27 = 3
7
SB Subtract. (comm. denom.) 25 −
15 = 2−1
5 = 15
MT Multiplication 12 ·
35 = 3
10
CD Common denominator(1
2 ,23
)=(3
6 ,46
)CL Canceling out 4
6 = 2·22·3 = 2
3
CIM Conv. to mixed numbers 72 = 3·2+1
2 = 312
CMI Conv. to improp. fractions 312 = 3·2+1
2 = 72
J. Vomlel (UTIA AV CR) CAT 11th July, 2007 18 / 25
Misconceptions
Label Description Occurrence
MAD ab + c
d = a+cb+d 14.8%
MSB ab −
cd = a−c
b−d 9.4%
MMT1 ab ·
cb = a·c
b 14.1%
MMT2 ab ·
cb = a+c
b·b 8.1%
MMT3 ab ·
cd = a·d
b·c 15.4%
MMT4 ab ·
cd = a·c
b+d 8.1%
MC abc = a·b
c 4.0%
J. Vomlel (UTIA AV CR) CAT 11th July, 2007 19 / 25
Student model
ACD
HV2 HV1
AD SB CMI CIM CL CD MT
MMT1 MMT2 MMT3 MMT4MCMAD MSB
ACMI ACIM ACL
CP
J. Vomlel (UTIA AV CR) CAT 11th July, 2007 20 / 25
Evidence model for task T 1
(34· 5
6
)− 1
8=
1524− 1
8=
58− 1
8=
48
=12
T1 ⇔ MT & CL & ACL & SB & ¬MMT3 & ¬MMT4 & ¬MSB
X1
P(X1|T1)
MSB
SB
CL ACL
MMT3
MT
MMT4T1
J. Vomlel (UTIA AV CR) CAT 11th July, 2007 21 / 25
Evidence model for task T 1
(34· 5
6
)− 1
8=
1524− 1
8=
58− 1
8=
48
=12
T1 ⇔ MT & CL & ACL & SB & ¬MMT3 & ¬MMT4 & ¬MSB
X1
P(X1|T1)
MSB
SB
CL ACL
MMT3
MT
MMT4T1
J. Vomlel (UTIA AV CR) CAT 11th July, 2007 21 / 25
Evidence model for task T 1
(34· 5
6
)− 1
8=
1524− 1
8=
58− 1
8=
48
=12
T1 ⇔ MT & CL & ACL & SB & ¬MMT3 & ¬MMT4 & ¬MSB
X1
P(X1|T1)
MSB
SB
CL ACL
MMT3
MT
MMT4T1
J. Vomlel (UTIA AV CR) CAT 11th July, 2007 21 / 25
Evidence model for task T 1 connected with thestudent model
MMT1
T1
MMT2 MMT3 MMT4MCMAD MSB
ACMI ACIM ACL ACD
CD MTCLCIMCMISBAD
HV1HV2
X1
CP
J. Vomlel (UTIA AV CR) CAT 11th July, 2007 22 / 25
Skill Prediction Quality
74
76
78
80
82
84
86
88
90
92
0 2 4 6 8 10 12 14 16 18 20
Number of answered questions
adaptive testfixed descending order
fixed ascending order
J. Vomlel (UTIA AV CR) CAT 11th July, 2007 23 / 25
Entropy
3
3.5
4
4.5
5
5.5
6
6.5
7
0 2 4 6 8 10 12 14 16 18 20
Number of answered questions
adaptive testfixed descending order
fixed ascending order
J. Vomlel (UTIA AV CR) CAT 11th July, 2007 24 / 25
Conclusions and References
We have shown how Bayesian networks can be used for theconstruction of adaptive tests in CAT.
Adaptive tests can substantially reduce the number of questionswe need to ask a tested student.
References:Howard Wainer, David Thissen, and Robert J. Mislevy.Computerized Adaptive Testing: A Primer. Mahwah, N.J.,Lawrence Erlbaum Associates, Second edition, 2000.Russell G. Almond and Robert J. Mislevy. Graphical models andcomputerized adaptive testing. Applied PsychologicalMeasurement, Vol. 23(3), pp. 223–237, 1999.J. Vomlel: Bayesian networks in educational testing, InternationalJournal of Uncertainty, Fuzziness and Knowledge BasedSystems, Vol. 12, Supplementary Issue 1, 2004, pp. 83—100.
J. Vomlel (UTIA AV CR) CAT 11th July, 2007 25 / 25
Conclusions and References
We have shown how Bayesian networks can be used for theconstruction of adaptive tests in CAT.Adaptive tests can substantially reduce the number of questionswe need to ask a tested student.
References:Howard Wainer, David Thissen, and Robert J. Mislevy.Computerized Adaptive Testing: A Primer. Mahwah, N.J.,Lawrence Erlbaum Associates, Second edition, 2000.Russell G. Almond and Robert J. Mislevy. Graphical models andcomputerized adaptive testing. Applied PsychologicalMeasurement, Vol. 23(3), pp. 223–237, 1999.J. Vomlel: Bayesian networks in educational testing, InternationalJournal of Uncertainty, Fuzziness and Knowledge BasedSystems, Vol. 12, Supplementary Issue 1, 2004, pp. 83—100.
J. Vomlel (UTIA AV CR) CAT 11th July, 2007 25 / 25
Conclusions and References
We have shown how Bayesian networks can be used for theconstruction of adaptive tests in CAT.Adaptive tests can substantially reduce the number of questionswe need to ask a tested student.
References:
Howard Wainer, David Thissen, and Robert J. Mislevy.Computerized Adaptive Testing: A Primer. Mahwah, N.J.,Lawrence Erlbaum Associates, Second edition, 2000.Russell G. Almond and Robert J. Mislevy. Graphical models andcomputerized adaptive testing. Applied PsychologicalMeasurement, Vol. 23(3), pp. 223–237, 1999.J. Vomlel: Bayesian networks in educational testing, InternationalJournal of Uncertainty, Fuzziness and Knowledge BasedSystems, Vol. 12, Supplementary Issue 1, 2004, pp. 83—100.
J. Vomlel (UTIA AV CR) CAT 11th July, 2007 25 / 25
Conclusions and References
We have shown how Bayesian networks can be used for theconstruction of adaptive tests in CAT.Adaptive tests can substantially reduce the number of questionswe need to ask a tested student.
References:Howard Wainer, David Thissen, and Robert J. Mislevy.Computerized Adaptive Testing: A Primer. Mahwah, N.J.,Lawrence Erlbaum Associates, Second edition, 2000.
Russell G. Almond and Robert J. Mislevy. Graphical models andcomputerized adaptive testing. Applied PsychologicalMeasurement, Vol. 23(3), pp. 223–237, 1999.J. Vomlel: Bayesian networks in educational testing, InternationalJournal of Uncertainty, Fuzziness and Knowledge BasedSystems, Vol. 12, Supplementary Issue 1, 2004, pp. 83—100.
J. Vomlel (UTIA AV CR) CAT 11th July, 2007 25 / 25
Conclusions and References
We have shown how Bayesian networks can be used for theconstruction of adaptive tests in CAT.Adaptive tests can substantially reduce the number of questionswe need to ask a tested student.
References:Howard Wainer, David Thissen, and Robert J. Mislevy.Computerized Adaptive Testing: A Primer. Mahwah, N.J.,Lawrence Erlbaum Associates, Second edition, 2000.Russell G. Almond and Robert J. Mislevy. Graphical models andcomputerized adaptive testing. Applied PsychologicalMeasurement, Vol. 23(3), pp. 223–237, 1999.
J. Vomlel: Bayesian networks in educational testing, InternationalJournal of Uncertainty, Fuzziness and Knowledge BasedSystems, Vol. 12, Supplementary Issue 1, 2004, pp. 83—100.
J. Vomlel (UTIA AV CR) CAT 11th July, 2007 25 / 25
Conclusions and References
We have shown how Bayesian networks can be used for theconstruction of adaptive tests in CAT.Adaptive tests can substantially reduce the number of questionswe need to ask a tested student.
References:Howard Wainer, David Thissen, and Robert J. Mislevy.Computerized Adaptive Testing: A Primer. Mahwah, N.J.,Lawrence Erlbaum Associates, Second edition, 2000.Russell G. Almond and Robert J. Mislevy. Graphical models andcomputerized adaptive testing. Applied PsychologicalMeasurement, Vol. 23(3), pp. 223–237, 1999.J. Vomlel: Bayesian networks in educational testing, InternationalJournal of Uncertainty, Fuzziness and Knowledge BasedSystems, Vol. 12, Supplementary Issue 1, 2004, pp. 83—100.
J. Vomlel (UTIA AV CR) CAT 11th July, 2007 25 / 25