SEPTEMBER, 1943    PSYCHOMETRIKA, VOL. 8, NO. 3

A PREFERENTIAL MATCHING PROBLEM

LT. J. A. GREENWOOD, U. S. N. R.
BUREAU OF AERONAUTICS, NAVY DEPARTMENT, WASHINGTON, D. C.

A matching method proposed by Dr. C. E. Stuart is presented in some detail and the essentials for a test of significance are derived. This method differs from the older matching methods in that partial credit is allowed for a near miss. A slight variation of the method permits the matching of one item with M sets of n traits.

The matching method provides a means of applying statistical treatment to observations which cannot easily be classified according to fixed criteria. A simple form of the matching problem is the following: Suppose five sets of identical twins provide handwriting specimens. Five specimens, consisting of one specimen from each pair, are placed before a judge. He is asked to match the other five specimens to the first set according to degree of similarity. His resulting five similar pairs are then checked for agreement with the original twin pairing. If a consistently greater-than-chance agreement is noted, then it is possible to conclude safely that the handwriting of identical twins is similar even though there exist no fixed objective criteria of similarity.

In recent years there has been considerable literature on the subject of matching techniques similar to the above. As far as I have ascertained, they have dealt only with the numbers of perfect correspondences. Stuart (1) has recently proposed a technique in which each of the comparing items is "preferentially matched" or rated against each of the compared ones, depending on their closeness of "similarity."

This paper presents the mathematics of the latter situation and in addition presents a new aspect of the method. It is divided into cases (i), (ii), and (iii), of which the first two represent a formal description and derivation of the mathematics of Stuart's method. A brief description of the matching method under discussion follows.

Let two sets of items to be matched be $A_i$ and $B_i$, $i = 1, 2, \ldots, n$. $A_1$ is compared to each of the $B_i$ and the one which it most nearly "resembles" is given the number $n$,* the one next to it in "resemblance" is given the number $n - 1$, and so on to 1. This procedure is done with each $A_i$. Suppose $A_i$ really paired with $B_i$. Then with perfect recognition of the true association, $B_i$ would have been allocated $n$. However, any number from 1 to $n$ may actually have been given it. The numbers thus allocated to the true correspondences of the $A_i$ are summed as the test score.

* Stuart gave it the value of 1. The mean and variance of the score are the same for the two ways of numbering.

It is clear that the set of assignments can be displayed in an n-square matrix where the number in the i-th row and j-th column is that assigned when $A_i$ is compared with $B_j$. The test score is then the sum of the numbers on the principal diagonal,* providing $A_i$ and $B_i$ are true correspondences for $i = 1, 2, \ldots, n$. Thus a matrix with $n = 4$ and test score of 12 is illustrated by the accompanying diagram.

          B_1   B_2   B_3   B_4

    A_1    4     1     3     2
    A_2    3     1     2     4
    A_3    3     1     4     2
    A_4    2     4     1     3

Two cases are distinguished in practice. Case (i): The allocations in any row are considered independent of all the other rows except inasmuch as actual similarities influence the decisions. Case (ii): Each column (and of course each row) must contain an $n$. This necessitates remembering which $B_i$ has already been allocated a value $n$, and not thereafter assigning it an $n$. Something between case (i) and case (ii) appears to take place in actual experiments. That is, the judges appear to be influenced by their previous allocations of "firsts" (those receiving $n$), but the difficult memory feat involved seems to preclude its effective extension to "seconds," etc.

In both cases (i) and (ii), the null hypothesis assumes that a priori each matching of an $A_i$ with a $B_j$ ($i, j = 1, 2, \ldots, n$) has the same probability of an $n$, $n - 1$, ..., or 1 assignment. This is obviously not to say that the assignments are independent.

* That is, from the upper left-hand corner to the lower right-hand corner.

Case (i)

On a chance basis the probability of a score of $r$ is the coefficient of $x^r$ in

$$\phi(x) = \frac{(x + x^2 + \cdots + x^n)^n}{n^n}.$$

Therefore, the $j$-th factorial moment of $r$ is $K_j = \phi^{(j)}(1)$, the $j$-th derivative of $\phi(x)$ evaluated at $x = 1$. In order to obtain the moments about the mean, $\mu_i$, $i = 2, 3, 4$, it is thus necessary to obtain the first four derivatives of $\phi(x)$.* This is facilitated by carrying out the differentiation of $\phi(x)$ in terms of $A(x) = x + x^2 + \cdots + x^n$ and its derivatives. The straightforward differentiation is omitted, but the essential values used are†

$$A^{(i)}(1) = \frac{(n+1)^{(i+1)}}{i+1}.$$

Substituting these derivatives evaluated at $x = 1$ in the formulas connecting factorial moments $K_j$ and ordinary central moments $\mu_i$, we obtain the following:

$$\text{mean} = K_1 = n(n+1)/2,$$

$$\text{variance} = \mu_2 = K_2 + K_1 - K_1^2 = (n+1)^{(3)}/12,$$

$$\mu_3 = K_3 + \mu_2(3 - 3K_1) - (K_1^3 - 3K_1^2 + 2K_1) = 0,$$

$$\mu_4 = K_4 + \mu_3(6 - 4K_1) - \mu_2(6K_1^2 - 18K_1 + 11) + (6K_1^3 - K_1^4 - 11K_1^2 + 6K_1) = \frac{(n+1)^{(3)}}{240}(5n^3 - 2n^2 - 5n - 2),$$

$$\alpha_3 = \frac{\mu_3}{\mu_2^{3/2}} = 0, \qquad \alpha_4 = \frac{\mu_4}{\mu_2^2} = \frac{3(5n^3 - 2n^2 - 5n - 2)}{5n(n^2 - 1)}, \qquad \lim_{n \to \infty} \alpha_4 = 3.$$
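As a numerical check on the case (i) results, the following minimal sketch expands the generating function by repeated convolution and computes the central moments exactly. It assumes, as the null hypothesis of case (i) implies, that each diagonal cell is independently uniform on 1..n; the function names `case_i_distribution` and `central_moment` are hypothetical helpers, not part of the paper.

```python
from fractions import Fraction


def case_i_distribution(n):
    """Exact null distribution of the score in case (i).

    Returns {score: probability}, i.e. the coefficients of
    ((x + x^2 + ... + x^n) / n)^n, built by repeated convolution.
    """
    dist = {0: Fraction(1)}
    for _ in range(n):                    # one factor per diagonal cell
        nxt = {}
        for score, p in dist.items():
            for k in range(1, n + 1):     # each value 1..n has probability 1/n
                nxt[score + k] = nxt.get(score + k, Fraction(0)) + p / n
        dist = nxt
    return dist


def central_moment(dist, order):
    """Moment of the given order about the mean of a {value: prob} dict."""
    mean = sum(r * p for r, p in dist.items())
    return sum((r - mean) ** order * p for r, p in dist.items())


n = 4
dist = case_i_distribution(n)
mean = sum(r * p for r, p in dist.items())
mu2 = central_moment(dist, 2)
mu3 = central_moment(dist, 3)
mu4 = central_moment(dist, 4)
print(mean, mu2, mu3, mu4)  # 10 5 0 133/2
```

For n = 4 these agree with the closed forms above: n(n+1)/2 = 10, (n+1)n(n-1)/12 = 5, and (60/240)(320 - 32 - 20 - 2) = 133/2.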

* We note that for case (i) the desired moments could be obtained ultimately from the moments of the score of a single cell of the matrix.

† Define $a^{(b)} = a(a-1) \cdots (a-b+1)$, $b$ a positive integer.

Case (ii)

Let

$$I_r = \sum_{i=0}^{r} (-1)^i \binom{r}{i} (r-i)!, \qquad I_0 = 1 \quad (2).$$

Let $M_q^m = \binom{m}{q} I_{m-q}$ be the number of terms in a determinant of order $m$, each term having $q$ elements, only, of the principal diagonal. The probability of $j$ $n$'s on the principal diagonal of the matrix of numbers is then $M_j^n / n!$. The probability generating function of the total score on the principal diagonal is $\psi(x)$ where

$$n!\,\psi(x) = \sum_{i=0}^{n} M_i^n \, x^{ni} \left( \frac{x + x^2 + \cdots + x^{n-1}}{n-1} \right)^{n-i}.$$

Again the mean score $K_1$ equals $\psi^{(1)}(1)$ and $K_2 = \psi^{(2)}(1)$. In order to maintain the upper limit $n$ on the summation sign throughout the successive differentiations, it is sufficient to bound $x$ away from zero in a neighborhood about 1. The following values are also needed.

$$\sum_{i=0}^{n} M_i^n = n! \quad \text{since } \psi(1) = 1;$$

$$i M_i^n = n M_{i-1}^{n-1} \quad \text{so} \quad \sum_{i=0}^{n} i M_i^n = n!;$$

$$\sum_{i=0}^{n} i^2 M_i^n = 2(n!).$$
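The paper states these sums without proof. One way to see them, offered here only as a sketch that uses the definition $M_i^n = \binom{n}{i} I_{n-i}$ and assumes $n \ge 2$, is the following:

```latex
% Absorption identity for binomial coefficients:
%   i * C(n, i) = n * C(n-1, i-1)
\begin{align*}
i\,M_i^n &= i\binom{n}{i} I_{n-i} = n\binom{n-1}{i-1} I_{(n-1)-(i-1)} = n\,M_{i-1}^{n-1}, \\
\sum_{i=0}^{n} i\,M_i^n &= n \sum_{i=1}^{n} M_{i-1}^{n-1} = n\,(n-1)! = n!, \\
\sum_{i=0}^{n} i(i-1)\,M_i^n &= n(n-1) \sum_{i=2}^{n} M_{i-2}^{n-2} = n(n-1)\,(n-2)! = n!, \\
\sum_{i=0}^{n} i^2\,M_i^n &= \sum_{i=0}^{n} i(i-1)\,M_i^n + \sum_{i=0}^{n} i\,M_i^n = 2\,(n!).
\end{align*}
```

Each inner sum is the first identity of the list applied to a determinant of lower order.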

Carrying through the differentiation in terms of $a(x) = (x + x^2 + \cdots + x^{n-1})/(n-1)$, from which $a(1) = 1$, $a^{(1)}(1) = n/2$, $a^{(2)}(1) = n(n-2)/3$, one readily obtains $K_1 = n(n+1)/2$ as in case (i).

After substituting the derivatives of $a(x)$ in $\psi^{(2)}(x)$, all evaluated at $x = 1$, there results

$$K_2 = \psi^{(2)}(1) = \frac{n}{12}(3n^3 + 7n^2 - 3n - 4),$$

$$\text{variance} = \mu_2 = K_2 + K_1 - K_1^2 = \frac{n}{12}(n^2 + 2) = \left(1 + \frac{3}{n^2 - 1}\right) \text{ times } \mu_2 \text{ of case (i)}.$$

To obtain the third and fourth moments of case (ii) by this method of attack would be a considerable exercise. If it is desired to evaluate the sum of results of a repetition of such an experiment as encompassed by either case (i) or case (ii), one need only apply the usual formulas for the moments of a mean.
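The case (ii) mean and variance can be checked numerically by expanding the generating function term by term. The sketch below follows the structure of $\psi(x)$ as written above: with probability $M_i^n / n!$ exactly $i$ diagonal cells hold $n$, and each remaining diagonal cell is treated as uniform on 1..n-1. The function names are hypothetical and the code is an illustration of that formula, not a simulation of an actual judge.

```python
from fractions import Fraction
from math import comb, factorial


def derangements(r):
    """I_r = sum_{i=0}^{r} (-1)^i C(r, i) (r - i)!, with I_0 = 1."""
    return sum((-1) ** i * comb(r, i) * factorial(r - i) for i in range(r + 1))


def case_ii_distribution(n):
    """Score distribution implied by the generating function psi(x)."""
    dist = {}
    for i in range(n + 1):
        # M_i^n / n!: probability of exactly i n's on the principal diagonal.
        weight = Fraction(comb(n, i) * derangements(n - i), factorial(n))
        if weight == 0:
            continue
        part = {n * i: Fraction(1)}       # the i cells holding n contribute n*i
        for _ in range(n - i):            # remaining cells: uniform on 1..n-1
            nxt = {}
            for score, p in part.items():
                for k in range(1, n):
                    nxt[score + k] = nxt.get(score + k, Fraction(0)) + p / (n - 1)
            part = nxt
        for score, p in part.items():
            dist[score] = dist.get(score, Fraction(0)) + weight * p
    return dist


n = 4
dist = case_ii_distribution(n)
mean = sum(r * p for r, p in dist.items())
variance = sum((r - mean) ** 2 * p for r, p in dist.items())
print(mean, variance)  # 10 6
```

For n = 4 the variance n(n^2 + 2)/12 = 6 is 6/5 of the case (i) value of 5, in agreement with the factor 1 + 3/(n^2 - 1).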

As an illustration of this technique, one attempts to identify authors of handwriting specimens with character sketches of each. Let

    n = number of specimens (or sketches),
    A_i = handwriting specimen of i-th author,
    B_j = character sketch of j-th author.

In the experiment to test whether or not to a given judge the handwriting specimens exhibit significantly identifying properties in common with the true associated character sketches, all identifying marks should be removed from all the A's (or the B's) and a code substituted before handing over to the judge. The procedure described previously should then be carried out by the judge and the score totaled by the experimenter, after decoding so that the true correspondences of the A's can be identified.

Now it might well happen that $A_3$, say, suggested both $B_3$ and $B_5$ and that $B_3$ is the true associate of $A_3$. The usual "correct matching" procedure depends for any value of the matching score upon deciding correctly which of the two it should be, and further, if it is missed, it affects adversely the possibility of correctly associating the remaining ones. (This is the method where in effect the B's are lined up and the A's opposite them in one-to-one correspondence and correct pairings counted for the score.) On the other hand, in the presently suggested method, at worst $B_3$ is allocated $n - 1$ and $B_5$ the value $n$. Thus the true pair still contributes $n - 1$ to the total score, and the error has no effect in case (i) and little effect in case (ii) on the remaining matchings. This would appear to indicate the essential advantage of the proposed method.

In order to test the null hypothesis, the difference between the observed score and the theoretical value $n(n+1)/2$ is divided by the standard deviation of case (i) or case (ii) (whichever the experimenter decides is more applicable) and the normal tables entered for an approximate probability. In the illustration at the beginning of the article $n = 4$, score = 12, theoretical mean = 10, and standard deviation [assuming case (ii)] is $\sqrt{6}$. While for only one sample $n$ is too small to admit of accurate evaluation on the assumption of normality,* the formal procedure gives a critical ratio of $(11.5 - 10)/\sqrt{6} = 0.61$ with a probability of .3, roughly, of obtaining a score as large as 12.†
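The test procedure above can be sketched as follows. The normal tail probability is computed from `math.erfc` rather than read from tables; the function names and the `case` and `continuity` parameters are assumptions made for illustration. The half-unit reduction is applied as in the paper's example, which presumes an observed score above the mean.

```python
from math import erfc, sqrt


def critical_ratio(score, n, case="ii", continuity=0.5):
    """Critical ratio of an observed score under case (i) or case (ii)."""
    mean = n * (n + 1) / 2
    if case == "i":
        variance = n * (n * n - 1) / 12   # mu_2 of case (i)
    else:
        variance = n * (n * n + 2) / 12   # mu_2 of case (ii)
    # The deviation is reduced by half a unit to allow for discontinuity.
    return (score - continuity - mean) / sqrt(variance)


def upper_tail(z):
    """P(Z >= z) for a standard normal variate Z."""
    return 0.5 * erfc(z / sqrt(2))


z = critical_ratio(12, 4, case="ii")
p = upper_tail(z)
print(round(z, 2), round(p, 2))  # 0.61 0.27
```

The computed tail probability of about .27 is what the text rounds to ".3, roughly."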

* Even in the more favorable case (i), it appears that for a reasonably good evaluation of the single sample by normal tables the number of pairs to be matched should be in the neighborhood of 20. However, the more likely situation to be evaluated is that of an experiment consisting of a series of such matchings of equal numbers of pairs. (In this connection note a caution by the author in J. Psychol., 1940, 53, 614-615, to the effect that the same material should not be matched more than once.) The distribution of the sum of scores obtained is reasonably close to normal when the number of pairs and numbers of matchings are at least as large as [illegible] and 8, respectively. However, if either number were made much larger, then the other could be made smaller.

† To allow for discontinuity of the data, the deviation is reduced half a unit.

Case (iii)

A different situation is involved in the following illustration. Choose $M$ subjects (of comparable age and physical condition) and list $n$ selected trait descriptions of each, as: measure of initiative, sample of handwriting, reaction time on a certain test, etc. Suppose the question is whether or not a photograph of a selected person of the group of $M$ can be reliably (in the sense of statistically significantly) associated with the correct set of trait descriptions.

Designate the $j$-th trait of the $i$-th subject as $B_{ij}$ and the property of the chosen subject (photograph, e.g.) as $A$. The procedure is as follows: All identifications of traits to subjects are removed and code substituted. The sets of traits are randomized and a judge is asked to preferentially match $A$ with each of the sets $B_{1j}, B_{2j}, \ldots, B_{Mj}$, $j = 1, 2, \ldots, n$. The score $s$ is the sum of numbers assigned to each of the true associates of $A$ in the preferential matching process.

There are two possibilities to consider. The chosen traits have (a) assignments uncorrelated by pairs, or (b) assignments correlated by pairs. Let $\rho_{kl}$ be the theoretical correlation between assignment scores of traits $k$ and $l$.

Under the assumption of equi-probability that each of the traits $B_{ij}$ corresponding to $A$ be assigned any value from 1 to $M$, we may utilize the results of case (i). This gives the mean and variance of a single cell of the matrix of assignments to be $(M + 1)/2$ and $(M^2 - 1)/12$, respectively. Then applying formulas for the mean and variance of a linear expression of variates,

$$\text{mean score} = n(M + 1)/2,$$

$$\sigma_s^2 = \text{variance of score} = \frac{M^2 - 1}{12} \left( n + 2 \sum_{i<j} \rho_{ij} \right).*$$

If the trait scores may be considered as uncorrelated, then $\rho_{ij} = 0$ and

$$\sigma_s^2 = n(M^2 - 1)/12.$$
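The step from the single-cell moments to the variance of the score is the standard formula for the variance of a sum of correlated variates. A short sketch, in which $X_j$ (a symbol not used in the paper) denotes the number assigned to the $j$-th trait of the true subject:

```latex
% s is the sum of n single-cell scores X_1, ..., X_n, each with
% mean (M+1)/2 and variance sigma^2 = (M^2 - 1)/12.
\begin{align*}
s &= \sum_{j=1}^{n} X_j, \qquad
E(s) = \sum_{j=1}^{n} E(X_j) = \frac{n(M+1)}{2}, \\
\sigma_s^2 &= \sum_{j=1}^{n} \operatorname{Var}(X_j)
   + 2 \sum_{i<j} \operatorname{Cov}(X_i, X_j)
   = n\sigma^2 + 2\sigma^2 \sum_{i<j} \rho_{ij}
   = \frac{M^2 - 1}{12} \left( n + 2 \sum_{i<j} \rho_{ij} \right).
\end{align*}
```

Setting every $\rho_{ij} = 0$ leaves only the first term, $n(M^2 - 1)/12$.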

For convenience, a form matrix for scoring is shown in the illustration below in which $B_j$ indicates the $j$-th trait. Under each $B_j$ is the column of values preferentially assigned to the $j$-th trait of each of the subjects. The matrix can of course be filled in only after the traits have been decoded.

* It is recognized that estimates, only, of $\rho_{ij}$ may be known, and error thereby introduced into the use of $\sigma_s^2$.

Illustration: Let $M = 6$, $n = 3$, and suppose that the selected subject was in actuality the second one. Let the completed matrix have the following form:

    Subject   B_1   B_2   B_3

       1       3     1     5
       2       5     3     6
       3       1     6     1
       4       4     4     2
       5       6     2     4
       6       2     5     3

Assuming $\rho_{ij} = 0$, then the mean = 10.5, $\sigma_s^2 = 8.75$, $\sigma_s = 2.96$, $s = 14$. Critical ratio = $(13.5 - 10.5)/2.96 = 1.01$. Referring to the normal tables, the probability that a score as large as 14 would have been obtained equals .16.*

* Under the assumption that $\rho_{ij} = 0$, the same general restrictions as to sizes of $n$ and $M$ for using the normal tables hold as for case (i). I am unable to give any exact statements and proofs of minimum values for $n$ and $M$ for given desirable approximations to true probabilities when using normal tables. However, from examining the values of $\alpha_3$ and $\alpha_4$ for selected values of $n$, $M$, I would select $n = 3$ and $M = 6$ as minimum values for these constants. Of course increasing $n$ would permit the decreasing of $M$.
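The case (iii) illustration can be reproduced with a short sketch. The function `case_iii_test` and its parameters (`rho_sum` for the sum of $\rho_{ij}$ over pairs $i < j$, `continuity` for the half-unit reduction) are hypothetical names introduced here; the matrix is the one given in the illustration, read with rows as subjects and columns as traits.

```python
from math import erfc, sqrt

# Rows are subjects 1..6, columns are traits B_1..B_3.
assignments = [
    [3, 1, 5],
    [5, 3, 6],
    [1, 6, 1],
    [4, 4, 2],
    [6, 2, 4],
    [2, 5, 3],
]


def case_iii_test(matrix, true_subject, rho_sum=0.0, continuity=0.5):
    """Score, null mean, null variance, critical ratio and upper-tail p.

    true_subject is 1-based; rho_sum is the assumed sum of the pairwise
    correlations rho_ij over i < j (0 when traits are uncorrelated).
    """
    M, n = len(matrix), len(matrix[0])
    s = sum(matrix[true_subject - 1])
    mean = n * (M + 1) / 2
    variance = (M * M - 1) / 12 * (n + 2 * rho_sum)
    z = (s - continuity - mean) / sqrt(variance)
    p = 0.5 * erfc(z / sqrt(2))           # P(Z >= z), standard normal
    return s, mean, variance, z, p


s, mean, variance, z, p = case_iii_test(assignments, true_subject=2)
print(s, mean, round(variance, 2), round(z, 2), round(p, 2))
# 14 10.5 8.75 1.01 0.16
```

A positive `rho_sum` enlarges the variance and so lowers the critical ratio for the same score.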

REFERENCES

1. Stuart, C. E. An ESP test with drawings. J. Parapsychology, 1942, 6, 20-43.
2. Whitworth, W. A. Choice and chance. 1942 reprint, p. 102, G. E. Stechert & Co., New York.