
MATHEMATICAL MODELS OF LEARNING AND PERFORMANCE IN A CAI SETTING*

Patrick Suppes
Stanford University
Stanford, California

In this lecture I give some (but by no means all) of the technical details of our research in the psychology of arithmetic. The first three sections deal with performance models and the last section deals with a learning model. Each section attempts to dig a step deeper than its predecessor into the skills of arithmetic. For simplicity I have restricted the analysis to the simple case of column addition, but the methods either already have been or in principle can be extended to essentially the entire domain of elementary-school mathematics. On the other hand, a good many additional developments will be needed to extend this work even to routine parts of the undergraduate college curriculum. (Some very empirical first steps at this college level are to be found in Goldberg & Suppes, 1972; Kane, 1972; Moloney, 1972.)

Linear Regression Models

I begin with regression models that use as independent variables structural features of individual arithmetic exercises (Suppes, Hyman & Jerman, 1967; Suppes, Jerman & Brian, 1968). I denote the jth structural feature of exercise i in a given set of exercises by f_ij. The parameters estimated from the data are the values attached to each structural feature. (In previous publications we have referred to these structural features as factors, but this can lead to confusion with the concept of factor as used in factor analysis.) I denote the coefficient assigned to the jth structural feature by d_j, and I emphasize that the structural features themselves, as opposed to their coefficients, are objectively identifiable by the experimenter in terms of the exercises themselves, independent of the response data.

Let p_i be the observed proportion of correct responses on exercise i for a given group of students. The natural linear regression in terms of the structural features f_ij and the coefficients d_j is simply

$$(1)\qquad p_i = \sum_j d_j f_{ij}.$$

Unfortunately, when the regression is put in this form, there is no guarantee that probability will be preserved as the structural features are combined to predict the observed proportion of correct responses. To guarantee conservation of probability, it is natural to make the following transformation and to define a new variable z_i:

$$(2)\qquad z_i = \log \frac{1 - p_i}{p_i}.$$
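As an illustration of how such a model is fitted, the following minimal sketch (in Python with numpy) applies the transform of equation (2) and estimates the coefficients d_j by ordinary least squares; the feature values and proportions below are invented for illustration and are not the data of Table 1.

    import numpy as np

    # Hypothetical structural-feature matrix for five exercises
    # (columns: SUMR, CAR, VF) and observed proportions correct.
    F = np.array([[1., 0., 0.],
                  [2., 1., 0.],
                  [3., 1., 1.],
                  [4., 2., 1.],
                  [5., 2., 5.]])
    p = np.array([0.95, 0.88, 0.80, 0.71, 0.55])

    z = np.log((1 - p) / p)                    # transform of equation (2)

    X = np.column_stack([F, np.ones(len(p))])  # append an intercept column
    d, *_ = np.linalg.lstsq(X, z, rcond=None)  # least-squares coefficients

    p_hat = 1 / (1 + np.exp(X @ d))            # invert (2): predicted proportions
    print(d.round(2), p_hat.round(3))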

*This paper is a slightly revised version of the appendix of "Facts and Fantasies of Education," and is reprinted by courtesy of Phi Delta Kappa. The research reported has been supported by National Science Foundation Grant NSF GJ-443X and U.S. Office of Education Grant OEG-9-70-0024(051).

K.L. Zinn, M. Refice, & A. Romano (Eds.), Computers in the Instructional Process: Report of an International School. Ann Arbor, MI: Extend, 1974, pp. 339-353.


The third structural feature VF reflects the vertical format of the exercise. The vertical exercises with one-digit responses were given the value 0. Multicolumn exercises with multidigit responses and one-column addition exercises with a response of 11 were given the value 1. One-column exercises with a multidigit response other than 11 were given the value 5. For example,

         ab                abc                 a
       − cd              + def               + b
       ————              —————               ———
                           ghi                cd

      VF = 0             VF = 1             VF = 5

This structural feature is meant to reflect the likelihood of the mistake of reversing the digits of the correct response, especially in a one-column addition exercise. In the computer-assisted instruction environment where students were responding at teletype terminals, responses to vertical exercises were typed from right to left, while responses to horizontal exercises were typed from left to right. Thus, it was possible for a student to have in mind the correct answer, but to err by typing the digits in the reverse order. It is fair to say that this structural feature is of more importance in working at a computer-based terminal than when using paper and pencil.
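A minimal sketch of the VF classification just described, assuming an exercise is summarized by its number of columns, its correct response, and whether it is presented vertically (this encoding is an illustration, not the paper's own formalism):

    def vf(columns: int, response: int, vertical: bool = True) -> int:
        """Structural feature VF for an addition exercise."""
        if not vertical:
            return 0                  # the feature concerns vertical format only
        if len(str(response)) == 1:
            return 0                  # one-digit response: no reversal possible
        if columns > 1 or response == 11:
            return 1                  # multicolumn multidigit, or response 11
        return 5                      # one-column, multidigit response other than 11

    # e.g., 7 + 8 = 15 is one-column with a multidigit response other than 11:
    assert vf(columns=1, response=15) == 5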

Table 1 shows a pretest on column addition given to third graders. The following regression equation was obtained for the mean response data of 63 students taking the test:

$$(3)\qquad z_i = .53\,\mathrm{SUMR}_i + .93\,\mathrm{CAR}_i + .31\,\mathrm{VF}_i - 4.06.$$

The multiple R was .74 and R² was .54, which reflects a reasonable fit to the data. I shall not enter into further details of the regression model, but shall move on to the next level of analysis of these same response data. As should be obvious, I am not attempting anything like a systematic presentation of data, but only enough to give a sense of how some of the models do fit.

Three-state Automaton Model

The central weakness of the regression models is that they are not process models. They do not provide even a schematic analysis of the algorithmic steps the student uses to find an answer. Automaton models are process models, and therefore their use represents a natural extension of the regression analysis. For the exercises in column addition we may restrict ourselves to finite automata, but as ordinarily defined they have no place for errors. However, this is easily introduced by moving from deterministic state transitions to probabilistic ones.


I begin with the definition of a finite deterministic automaton, and then generalize. These developments follow Suppes (1969).

Definition 1. A structure 𝔄 = (A, V_I, V_O, M, Q, s_0) is a finite (deterministic) automaton with output if and only if

(i) A is a finite, nonempty set (the set of states),

(ii) V_I and V_O are finite, nonempty sets (the input and output vocabularies, respectively),

(iii) M is a function from the Cartesian product A × V_I to A (M defines the transition table),

(iv) Q is a function from the Cartesian product A × V_I to V_O (Q is the output function),

(v) s_0 is in A (s_0 is the initial state).

As an example of a finite automaton with output in the sense of this definition, we may characterize an automaton that will perform two-row column addition. The states are the two possible values of the carry, A = {0,1}; the input vocabulary is the set of pairs (m,n) of digits standing in a given column; the output vocabulary is the set of the 10 digits; and for k in A,

$$M(k,(m,n)) = \begin{cases} 0 & \text{if } k + m + n \le 9,\\ 1 & \text{if } k + m + n > 9,\end{cases} \qquad Q(k,(m,n)) = (k + m + n) \bmod 10,$$

with s_0 = 0. Thus the automaton operates by adding first the ones' column, storing as internal state 0 if there is no carry and 1 if there is a carry, outputting the sum of the ones' column modulo 10, and then moving on to the input of the two tens'-column digits, etc. The initial internal state s_0 is 0, because at the beginning of the exercise there is no 'carry.'
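To make the operation of the automaton concrete, here is a minimal sketch in Python; the right-aligned string encoding and the zero-padding of the shorter addend are conventions of the sketch, not of the paper.

    def column_add(top: str, bottom: str) -> str:
        """Two-row column addition as a finite automaton with output."""
        width = max(len(top), len(bottom))
        top, bottom = top.rjust(width, "0"), bottom.rjust(width, "0")
        k = 0                              # initial internal state s0 = 0: no carry
        out = []
        for m, n in zip(reversed(top), reversed(bottom)):  # right to left
            s = k + int(m) + int(n)
            out.append(str(s % 10))        # output Q(k,(m,n)) = (k+m+n) mod 10
            k = 0 if s <= 9 else 1         # transition M(k,(m,n))
        if k:
            out.append("1")                # a final carry yields a leading digit
        return "".join(reversed(out))

    assert column_add("15", "24") == "39"
    assert column_add("87", "58") == "145"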


Definition 2. A structure 𝔄 = (A, V_I, V_O, p, q, s_0) is a (finite) probabilistic automaton if and only if

(i) A is a finite, nonempty set,

(ii) V_I and V_O are finite, nonempty sets,

(iii) p is a function on A × V_I × A to the interval [0,1] such that for each s in A and σ in V_I, p_{s,σ} is a probability density over A, i.e.,

  (a) for each s' in A, p_{s,σ}(s') ≥ 0,
  (b) Σ_{s'∈A} p_{s,σ}(s') = 1,

(iv) q is a function on A × V_I × V_O to [0,1] such that for each s in A and σ in V_I, q_{s,σ} is a probability density over V_O,

(v) s_0 is in A.

In the probabilistic generalization of the automaton for column addition, the number of possible parameters that can be introduced is uninterestingly large. Each transition M(k,(m,n)) may be replaced by a probabilistic transition with probabilities 1 − ε_{k,m,n} and ε_{k,m,n}, and each output Q(k,(m,n)) by 10 probabilities: the 2 states and 100 digit pairs yield 200 transition probabilities and 200 × 10 output probabilities, for a total of 2,200 parameters.

A three-parameter automaton model structurally rather close to the regression model is easily defined. First, two parameters, ε and η, are introduced according to whether there is a 'carry' to the next column:

$$P(M(k,(m,n)) = 0 \mid k + m + n \le 9) = 1 - \varepsilon,$$

and

$$P(M(k,(m,n)) = 1 \mid k + m + n > 9) = 1 - \eta.$$

In other words, if there is no 'carry,' the probability of a correct transition is 1 − ε, and if there is a 'carry' the probability of such a transition is 1 − η. The third parameter, γ, is simply the probability of an output error. Conversely, the probability of a correct output is

$$P(Q(k,(m,n)) = (k + m + n) \bmod 10) = 1 - \gamma.$$

Consider now exercise i with C_i carries and D_i digits. If we ignore the probability of two errors leading to a correct response (e.g., a transition error followed by an output error), then the probability of a correct answer is just

$$(4)\qquad P(\text{Correct Answer to Exercise } i) = (1 - \gamma)^{D_i}\,(1 - \eta)^{C_i}\,(1 - \varepsilon)^{D_i - C_i - 1}.$$

As already indicated, it is important to realize that this equation is an approximation of the 'true' probability. However, to compute the exact probability it is necessary to make a definite assumption about how the probability γ of an output error is distributed among the nine possible wrong responses. A simple and intuitively appealing one-parameter model is the one that arranges the 10 digits on a circle in natural order with 9 next to 0, and then makes the probability of an error j steps to the right or left of the correct response δ^j. For example, if 5 is the correct digit, then the probability of responding 4 is δ, of 3 is δ², of 2 is δ³, of 1 is δ⁴, of 0 is δ⁵, of 6 is δ, of 7 is δ², etc. Thus in terms of the original model

$$\gamma = 2(\delta + \delta^2 + \delta^3 + \delta^4) + \delta^5.$$
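Given δ, the circle model fixes γ; a one-line check, as a sketch:

    def gamma_from_delta(delta: float) -> float:
        """Total output-error probability under the circle model: two wrong
        digits at each of distances 1 through 4, one at distance 5."""
        return 2 * sum(delta ** j for j in range(1, 5)) + delta ** 5

    # A delta near .021 reproduces an output-error rate of about .043:
    print(round(gamma_from_delta(0.021), 4))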

Let there be n digit responses and let x_i be the random variable that assumes the value 1 if the i-th response is correct and 0 otherwise. It is then easy to see that

$$(5)\qquad P(x_i = 1) = \begin{cases} 1 - \gamma & \text{for a response in the ones' column,}\\ (1-\varepsilon)(1-\gamma) & \text{for a response not in the ones' column when the internal state is } 0,\\ (1-\eta)(1-\gamma) & \text{for a response when the internal state is } 1, \end{cases}$$

granted that εγ-type terms are ignored. Similarly, for the same three alternatives, P(x_i = 0) is γ, 1 − (1−ε)(1−γ), and 1 − (1−η)(1−γ), respectively.


So for a string of actual digit responses x_1,...,x_n we can write the likelihood function as

$$(6)\qquad L(x_1,\ldots,x_n) = (1-\gamma)^{a}\,\gamma^{\,b}\,(1-\varepsilon)^{c}\,(1-\eta)^{d}\,\bigl(1-(1-\gamma)(1-\varepsilon)\bigr)^{e}\,\bigl(1-(1-\gamma)(1-\eta)\bigr)^{f},$$

where a = number of correct responses, b = number of incorrect responses in the ones' column, c = number of correct responses not in the ones' column when the internal state is 0, d = number of correct responses when the internal state is 1, e = number of incorrect responses not in the ones' column when the internal state is 0, and f = number of incorrect responses when the internal state is 1. (In this model statistical independence of responses is assured by the correction procedure.) It is more convenient to estimate γ' = 1 − γ, ε' = 1 − ε, and η' = 1 − η. Making this change, (6) becomes L = γ'^a (1 − γ')^b ε'^c η'^d (1 − γ'ε')^e (1 − γ'η')^f; taking the logarithm of both sides and differentiating with respect to each of the variables, we obtain three equations that determine the maximum-likelihood estimates of γ', ε', and η':

$$\frac{a}{\gamma'} - \frac{b}{1-\gamma'} - \frac{e\varepsilon'}{1-\gamma'\varepsilon'} - \frac{f\eta'}{1-\gamma'\eta'} = 0, \qquad \frac{c}{\varepsilon'} - \frac{e\gamma'}{1-\gamma'\varepsilon'} = 0, \qquad \frac{d}{\eta'} - \frac{f\gamma'}{1-\gamma'\eta'} = 0.$$

Solving these equations, we obtain as estimates:

$$\hat{\gamma}' = \frac{a-c-d}{a+b-c-d}\,, \qquad \hat{\varepsilon}' = \frac{c\,(a+b-c-d)}{(c+e)(a-c-d)}\,, \qquad \hat{\eta}' = \frac{d\,(a+b-c-d)}{(d+f)(a-c-d)}\,.$$
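As a sketch, the closed-form estimates can be computed directly from the six response counts; the counts below are invented for illustration (they are not reported in the paper).

    def mle_estimates(a, b, c, d, e, f):
        """Maximum-likelihood estimates of gamma', epsilon', eta'
        from the response counts defined for the likelihood (6)."""
        gp  = (a - c - d) / (a + b - c - d)
        ep  = c * (a + b - c - d) / ((c + e) * (a - c - d))
        etp = d * (a + b - c - d) / ((d + f) * (a - c - d))
        return gp, ep, etp

    gp, ep, etp = mle_estimates(a=910, b=40, c=300, d=150, e=30, f=25)
    print(1 - gp, 1 - ep, 1 - etp)   # error rates gamma, epsilon, eta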

Estimates of the parameters for the same third-grade data already described, as well as a graph of the observed and predicted response probabilities for the exercises shown in Table 1, are given in Chapter 4 of Suppes and Morningstar (1972). (This chapter was written in collaboration with Alex Cannara and he is responsible for the data analysis.) The estimates are: γ̂ = .0430, ε̂ = .0085 and η̂ = .0576. The graph of response probabilities is reproduced as Figure 1. A detailed discussion of the fit of the model and further analysis of some of the discrepancies are to be found in the chapter mentioned. Here I have tried only to give a sense of how this kind of model can be brought into direct confrontation with data.

Register Machines with Perceptual Instructions

To introduce greater generality and to deepen the analysis to include specific ideas about the perceptual processing of a column-addition exercise, I move on to register machines. This research is being conducted in collaboration with Lindsay L. Flannery.

Register machines were first introduced by Shepherdson and Sturgis (1963) to give a natural representation of computable functions in terms that are closer to a Turing machine. In the case of the representation of the algorithms students are expected to learn, it is natural to postulate only a finite fixed number of registers that the student can use.

The basic idea of this approach is to simplify drastically the perceptual situation by conceiving each exercise as being presented on a grid. The student is represented by a model that has instructions for attending to a given square on the grid; for example, in the standard algorithms of addition, subtraction and multiplication we begin in the upper right-hand corner and then have instructions to move downward through each column and from right to left across columns. Additional instructions for storing the results of an operation, for outputting the last digit of a stored numeral, etc., are needed.

The basic idea of register machines is that the different algorithms are represented by subroutines. One subroutine may be called in another, as complex routines are built up. The procedure is familiar to most of us, even if the language I am using is not. For example, in performing column multiplication we use the algorithm of addition, which in this case means calling the subroutine for addition; in long division we call the subroutines for subtraction and multiplication, as well as for addition. Each basic subroutine is represented by a program in terms of the primitive instructions. The problem from a psychological standpoint is to find instructions that provide not only a realistic description of what the student does, a description that can be fitted to data in the same way that the automaton models have been applied to data, but also a fuller account of how the student processes the exercise.

At the first stage of analyzing the register-machine models we can get results similar to those for the automaton models by postulating error parameters for execution of main subroutines of the routine for a given algorithm.

For column addition three registers suffice in our scheme of analysis. First there is the stimulus-supported register [SS] that holds an encoded representation of a printed symbol to which the student is perceptually attending. In the present case the alphabet of such symbols consists of the 10 digits and the underline symbol '_'. As a new symbol is attended to, previously stored symbols are lost unless transferred to a non-stimulus-supported register. The second register is the non-stimulus-supported register [NSS]. It provides long-term storage for computational results. The third register is the operations register [OP] that acts as a short-term store, both for encodings of external stimuli and for results of calculations carried out on the contents of other registers. It is also primarily non-stimulus-supported.

We drastically simplify the perceptual situation by conceiving each exercise as being presented on a grid with at most one symbol in each square of the grid. For column addition we number the coordinates of the grid from the upper right-hand corner. Thus, in the exercise

      15
      24
    + 37

the coordinates of the digit 5 are (1,1), the coordinates of 4 are (2,1), the coordinates of 7 are (3,1), the coordinates of 1 are (1,2) and so forth, with the first coordinate being the row number and the second being the column number.

The restricted set of instructions we need for column addition are the following 10.

Attend (a,b):        Direct attention to grid position (a,b).

Attend (±a,±b):      Shift attention on the grid by (±a,±b).

Readin [SS]:         Read into the stimulus-supported register the physical symbol in the grid position addressed by Attend.

Lookup [R1] + [R2]:  Look up the table of basic addition facts for adding the contents of registers [R1] and [R2] and store the result in [R1].

Copy [R1] in [R2]:   Copy the content of register [R1] in register [R2].

Deleteright [R]:     Delete the rightmost symbol of register [R].

Outright [R]:        Output the rightmost symbol of register [R] at the grid position being attended to.

Jump L:              Jump to the line labeled L.

Jump (σ) [R], L:     Jump to the line labeled L if the content of register [R] is of the indicated type σ (a digit 0-9, a blank, or the underline '_').

End:                 Terminate the program; in a subroutine, Exit returns to the calling program.

An important feature of the model is that the student is perceptually processing only one grid square at a time, so that he must have a check for finding the bottom row by looking continually for an underline symbol. Otherwise he could, according to an apparently natural subroutine, proceed indefinitely downward, encountering only blanks and leaving entirely the immediate perceptual region of the formatted exercise. Here is the subroutine. In the main program it is preceded by an Attend instruction.

Vertical Scan Subroutine

    Rd    Readin
          Jump (0-9, _) SS, Fin
          Attend (+1,+0)
          Jump Rd
    Fin   Exit

The labels Rd and Fin of two of the lines are shown on the left.

The second subroutine is one that outputs all the digits in a register, working from right to left. For example, in column addition, after the leftmost column has been added, there may still be several digits remaining to print out to the left of this column in the 'answer' row.

Output [R]

    Put   Outright [R]
          Deleteright [R]
          Attend (+0,+1)
          Jump (Blank) R, Fin
          Jump Put
    Fin   Exit

Using these two subroutines the program for vertical addition is relatively straightforward and requires 26 lines. I number the lines for later reference; they are not a part of the program.

Vertical Addition

     1.        Attend (1,1)
     2.        Readin
     3.        Copy [SS] in [OP]
     4.        Attend (+1,+0)
     5.        Readin
     6.  Opr   Lookup [OP] + [SS]
     7.  Rd    Attend (+1,+0)
     8.        Readin
     9.        Jump (0-9) SS, Opr
    10.        Jump (Blank) SS, Rd
    11.        Attend (+1,+0)
    12.        Outright [OP]
    13.        Deleteright [OP]
    14.        Copy [OP] in [NSS]
    15.        Attend (1,+1)
    16.        V-scan (0-9, _)
    17.        Jump (_) SS, Fin
    18.        Jump (0-9) NSS, Car
    19.        Copy [SS] in [OP]
    20.        Jump Rd
    21.  Car   Copy [NSS] in [OP]
    22.        Jump Opr
    23.  Fin   Jump (Blank) NSS, Out
    24.        Attend (+1,+0)
    25.  Out   Output [NSS]
    26.        End
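To show that the program is executable as it stands, the following Python sketch mirrors its control flow; the dictionary encoding of the grid and the treatment of a blank [OP] as zero in Lookup are assumptions of the sketch rather than commitments of the paper. Line numbers in the comments refer to the 26-line program above.

    class Machine:
        """The three registers SS, NSS, OP plus an attended grid position.
        The grid maps (row, column) to a symbol; rows are numbered from the
        top and columns from the right, as in the text."""
        def __init__(self, grid):
            self.grid, self.pos = dict(grid), (1, 1)
            self.SS = self.NSS = self.OP = ""
            self.typed = []                      # answer digits, typed right to left

        def attend(self, a, b):                  # Attend: int = absolute, str = relative
            r, c = self.pos
            r = r + int(a) if isinstance(a, str) else a
            c = c + int(b) if isinstance(b, str) else b
            self.pos = (r, c)

        def readin(self):                        # Readin [SS]
            self.SS = self.grid.get(self.pos, " ")

        def lookup(self):                        # Lookup [OP] + [SS]; blank OP read as 0
            self.OP = str(int(self.OP or "0") + int(self.SS))

        def v_scan(self):                        # Vertical Scan: down to a digit or '_'
            while True:
                self.readin()
                if self.SS.isdigit() or self.SS == "_":
                    return
                self.attend("+1", "+0")

    def vertical_addition(grid):
        m = Machine(grid)
        m.attend(1, 1); m.readin(); m.OP = m.SS        # lines 1-3
        while True:
            m.attend("+1", "+0"); m.readin()           # lines 4-5 and 7-8 (Rd)
            if m.SS.isdigit():                         # line 9: another row digit
                m.lookup()                             # line 6 (Opr)
            elif m.SS == " ":                          # line 10: skip a blank square
                continue
            else:                                      # underline: column finished
                m.attend("+1", "+0")                   # line 11: move to answer row
                m.typed.append(m.OP[-1])               # line 12: Outright [OP]
                m.OP = m.OP[:-1]                       # line 13: Deleteright [OP]
                m.NSS = m.OP                           # line 14: store carry in [NSS]
                m.attend(1, "+1")                      # line 15: top of next column
                m.v_scan()                             # line 16
                if m.SS == "_":                        # line 17: no digits remain
                    m.typed.extend(reversed(m.NSS))    # lines 23-25: flush the carry
                    break                              # line 26: End
                if m.NSS:                              # lines 18, 21-22: carry present
                    m.OP = m.NSS
                    m.lookup()
                else:                                  # lines 19-20: no carry
                    m.OP = m.SS
        return "".join(reversed(m.typed))

    # 15 + 24 + 37 = 76 on a grid with digit rows 1-3 and an underline row 4:
    grid = {(1,1): "5", (1,2): "1", (2,1): "4", (2,2): "2",
            (3,1): "7", (3,2): "3", (3,3): "+",
            (4,1): "_", (4,2): "_", (4,3): "_"}
    print(vertical_addition(grid))   # prints 76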


To show how the program works, we may consider a simple one-column addition exercise. I show at the right of each line the content of each register just before the next row is attended to, i.e., after all operations have been performed.


The verbal instructions given to the student should enable him to construct an appropriate program. (I emphasize an because the internal program constructed is not necessarily unique.)

I restrict myself here to an example of this approach. I take as the class of exercises single-column addition, but with an indefinite number of rows. The program is simpler than the general one given above, and it is easy to see the relation between what is said to the student by the teacher or computer and the desired internal program. In Figure 2, I show the verbal instructions on the right, with the physical pointing to the relevant part of the displayed exercise indicated in parentheses. When errors are made, still more detailed instructions, tailored to the particular error, can be given, but I do not consider such error messages here.

    Internal Program                     Verbal Instructions

          Attend (1,1)
          Readin                         [1] Start here (pointing)
          Transfer [SS] to [OP]

          Attend (+1,+0)
          Readin                         [2] Add first two digits (pointing)
    Opr   Lookup [OP] + [SS]

          Attend (+1,+0)
          Readin                         [3] Now add again (pointing)
          Jump (0-9) SS, Opr                 (if conditional jump satisfied)
                                             or Notice end of column (pointing at _)
                                             (if conditional jump not satisfied)

          Attend (+1,+0)
          Output [OP]                    [4] Write answer here (pointing)
          End

Figure 2. Single-column addition.

In Figure 2, learning parameters c_1, c_2, c_3 and c_4 are shown for the four segments of the program. Various learning models can be formulated in terms of these four parameters. The simplest is the one that assumes independence of the four parts. If we treat the probability of successive errors combining to yield a correct response as having probability zero, then the mean probability of a correct response on trial n for the independence model is simply

$$P_n(\text{Correct Response}) = \prod_{i=1}^{4} \left(1 - (1 - c_i)^{n-1}\right).$$
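A sketch of the resulting learning curve, with invented parameter values:

    from math import prod

    def p_trial(n: int, c=(0.30, 0.20, 0.25, 0.40)) -> float:
        """Independence model: mean probability of a correct response on
        trial n; segment i has been learned with prob. 1 - (1-c_i)^(n-1)."""
        return prod(1 - (1 - ci) ** (n - 1) for ci in c)

    for n in (1, 2, 5, 10, 20):
        print(n, round(p_trial(n), 3))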

At the other extreme, a hierarchical model postulates that the i-th segment of the program cannot be learned until the (i − 1)-st segment is learned. This hierarchical model leads to a transition matrix among the states of learning in which state i can be entered only from state i − 1.
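The postulate is easy to simulate; in the Monte Carlo sketch below (parameter values again invented), only the first unlearned segment can be learned on any trial.

    import random

    def hierarchical_curve(c=(0.30, 0.20, 0.25, 0.40), trials=25,
                           runs=20000, seed=1):
        """Estimated P_n(correct) when segment i can be learned only after
        segment i-1; a response is correct once all four are learned."""
        random.seed(seed)
        curve = [0] * trials
        for _ in range(runs):
            learned = 0                       # segments mastered so far
            for n in range(trials):
                curve[n] += (learned == 4)    # correct iff whole program known
                if learned < 4 and random.random() < c[learned]:
                    learned += 1
        return [k / runs for k in curve]

    print([round(p, 2) for p in hierarchical_curve()[:10]])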

References

Kane, M.T. "Variability in the Proof Behavior of Students in a CAI Course in Logic as a Function of Problem Characteristics." Technical Report No. 192, Stanford University, Institute for Mathematical Studies in the Social Sciences, 1972.

Rottmayer, W.A. "A Formal Theory of Perception." Technical Report, Stanford University, Institute for Mathematical Studies in the Social Sciences, 1970.

Shepherdson, J.C., & Sturgis, H.E. "Computability of Recursive Functions." Journal of the Association for Computing Machinery, 10 (1963).

Suppes, P. "Stimulus-response Theory of Finite Automata." Journal of Mathematical Psychology, 6 (1969).

Suppes, P., L. Hyman, & M. Jerman. "Linear Structural Models for Response and Latency Performance in Arithmetic on Computer-controlled Terminals." In J.P. Hill (Ed.), Minnesota Symposia on Child Psychology. Minneapolis: University of Minnesota Press, 1967.

Suppes, P., M. Jerman, & D. Brian. Computer-assisted Instruction: The 1965-66 Stanford Arithmetic Program. New York: Academic Press, 1968.

Suppes, P., & M. Morningstar. Computer-assisted Instruction at Stanford, 1966-68: Data, Models, and Evaluation of the Arithmetic Programs. New York: Academic Press, 1972.