Bivariate Dependence Orderings for Unordered Categorical Variables


Alessandra Giovagnoli1, Johnny Marzialetti1 and Henry Wynn2

1 Department of Statistical Sciences, Via Belle Arti 41, Bologna 40126, [email protected], [email protected]

2 London School of Economics and Political Science, Houghton Street, London WC2A 2AE, UK. [email protected]

1 Introduction

Several statistical concepts (such as location, dispersion, concentration and dependence) can be studied via order and equivalence relations. The concept to be defined can be described by means of a partial ordering or a pre-ordering among the variables of interest, and the corresponding measures are thus order-preserving functions. Bickel and Lehmann (1975) were among the first authors to introduce this approach to statistics. Many well-known properties of established statistical measures can be derived from the ordering representation.

Stochastic orderings (i.e. order relations among unidimensional or multidimensional random variables) and the corresponding order-preserving functions have a long history: an introduction is in Chapter 9 of the book by Ross (1995) and fundamental works are by Shaked and Shanthikumar (1994) and Müller and Stoyan (2002). Applications are to general statistical theory, in particular testing, reliability theory, and more recently risk and insurance. In this chapter it is not our intention to cover the basic material but rather to describe and investigate concepts related to association of random variables, as in Goodman and Kruskal (1979), namely the degree of dependence among the components of a multivariate variable. We stress that different concepts of association are possible: for example interdependence (all the variables have an exchangeable role) and dependence of one variable on the others. Furthermore, as pointed out in Chapter 11 of Bishop et al. (1975), a special case of association among variables is inter-observer agreement, which has important applications in several fields. In this case, there is one characteristic of interest observed on the same statistical units by different observers who may partly agree and partly disagree in their classifications or their scores, and the multidimensional random variables to be compared from the agreement viewpoint express the judgements of the “raters” in vector form.


So far studies of dependence and interdependence through orderings have focussed mainly on real random variables (see Joe (1997)): in the tradition of Italian statistics, Forcina and Giovagnoli (1987) and Giovagnoli (2002a,b) have defined some dependence orderings for bivariate nominal random variables, i.e. with values in unordered categories. This is the type of variables that we restrict ourselves to in this chapter. We review the existing results and introduce further developments in Section 2. In particular, we are not aware of a theory of agreement orderings so far, apart from a brief hint in the already mentioned paper by Giovagnoli (2002b). A possible definition and some new results are the topic of Section 3. Section 4 points at directions for research.

For reasons of simplicity and space in this chapter we only look at nominal variables in two dimensions. A further development which deals with an agreement ordering for a different type of multivariate variables, namely discrete or continuous, is the object of another paper by the same authors (Giovagnoli et al. (2006)).

We end this introduction with some terminology. An equivalence in a set $S$ is a reflexive, symmetric, transitive relation. A pre-order $\preceq$ in $S$ is a reflexive and transitive relation (anti-symmetry is not required). To every pre-order $\preceq$ there corresponds an equivalence relation $\sim$: if $x, y \in S$ and $x \preceq y$, $y \preceq x$, then we say that $x \sim y$. All the one-to-one maps $\varphi$ of $S$ onto $S$ such that $\varphi(x) \sim x$ for all $x \in S$ form a set $G_I$, called the invariance set of $\preceq$. All the maps $\psi$ of $S$ into $S$ such that $x \preceq y$ implies $\psi(x) \preceq \psi(y)$ for all $x, y \in S$ form the equivariance set $G_E$ of $\preceq$. All the maps $\phi$ of $S$ into $S$ such that $\phi(x) \preceq x$ for all $x \in S$ form the contraction set $G_K$ of $\preceq$. As well as the contractions, we can define the expansions: all the maps $\phi$ of $S$ into $S$ such that $x \preceq \phi(x)$ for all $x \in S$. Clearly $G_I \subset G_E$ and $G_I \subset G_K$; $G_I$ is a group, whereas $G_E$ and $G_K$ are semigroups.

A function $f : S \to \mathbb{R}$ is order-preserving if $x \preceq y$ implies $f(x) \le f(y)$. A trivial remark: if $f$ is order-preserving and $g : \mathbb{R} \to \mathbb{R}$ is non-decreasing, then $g \circ f$ too is order-preserving. Clearly order-preserving functions must be invariant w.r.t. $G_I$.

2 Dependence orderings for two nominal variables

2.1 S-dependence and D-dependence of one variable on the other

Let $X$ and $Y$ be categorical variables with a finite number of nominal categories, which we shall denote by $x_1, \dots, x_r$ and $y_1, \dots, y_c$ respectively, just in order to label them, without the labels implying any order among the categories, i.e. $x_1$ does not “come before” and is not “less than” $x_2$, etc. We are interested in the joint (frequency or probability) distribution of $(X, Y)$, identified from now on by a table


$$P_{r \times c} = \begin{pmatrix} p_{11} & p_{12} & \dots & p_{1c} \\ p_{21} & p_{22} & \dots & p_{2c} \\ \vdots & \vdots & & \vdots \\ p_{r1} & p_{r2} & \dots & p_{rc} \end{pmatrix} = (p_{ij}),$$

$i = 1, \dots, r$; $j = 1, \dots, c$, where $p_{ij} \ge 0$, $\sum_{ij} p_{ij} = 1$. An alternative description is by means of the conditional and marginal distributions, i.e. either $(P^*, \mathbf{p}_r)$, where $P^* = (p_{ij}/p_{i+})$ and $\mathbf{p}_r = (p_{1+}, \dots, p_{r+})^t$, or $(P^{**}, \mathbf{p}_c)$, where $P^{**} = (p_{ij}/p_{+j})$ and $\mathbf{p}_c = (p_{+1}, \dots, p_{+c})^t$. It is sometimes useful to include tables with null rows and/or columns, in which case $P^*$ or $P^{**}$ are defined by setting all zeroes in the corresponding row and/or column.

To describe the dependence of $Y$ on $X$, the following order relation $\le_S$ (we call it S-dependence) was defined in Forcina and Giovagnoli (1987).

Definition 1.
Let $P$ and $Q$ be two bivariate tables. Then $Q \le_S P$ if and only if there exists a stochastic matrix $S$ such that $Q = S^t P$.

The relation $\le_S$ implies that the column margins of $P$ and $Q$ are equal: $\mathbf{p}_c = \mathbf{q}_c$.

Proposition 1 (Forcina and Giovagnoli (1987)).
$Q \le_S P$ is equivalent to the following two conditions holding simultaneously:

i) $Q^* = \tilde{S} P^*$

ii) $\mathbf{p}_r = \tilde{S}^t \mathbf{q}_r$

with $\tilde{S}$ another stochastic matrix.

Forcina and Giovagnoli (1987) showed that $\le_S$ satisfies some intuitive requirements for a dependence ordering.

We call S-equivalence the equivalence relation $\sim_S$ defined by $\le_S$. Since permutation matrices are doubly stochastic, a permutation of the rows of $P$ gives an S-equivalent table; the converse is not true, namely not all S-equivalent tables can be obtained by permutation, as the following result shows.

Proposition 2.

(1) Row-aggregation, i.e. replacing one of two rows by their sum and the other by a row of zeroes, leads to a distribution with less S-dependence.

(2) When two rows are proportional, both row-aggregation and row-splitting (namely the inverse operation to row-aggregation) imply S-equivalence.

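Proof. The matrices

$$S_1 = \begin{pmatrix} \begin{matrix} 1 & 0 \\ 1 & 0 \end{matrix} & \mathbf{0} \\ \mathbf{0}^t & I_{r-2} \end{pmatrix} \quad \text{and} \quad S_2 = \begin{pmatrix} \begin{matrix} \alpha & 1-\alpha \\ 1 & 0 \end{matrix} & \mathbf{0} \\ \mathbf{0}^t & I_{r-2} \end{pmatrix}$$

are stochastic. Pre-multiplication by $S_1^t$ gives aggregation of the first two rows. On the other hand, if the second row is zero, pre-multiplication by $S_2^t$ splits the first row into two proportional ones.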
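The two operations in the proof can be checked numerically. The following is only a minimal sketch in plain Python: the 3 × 2 table `P`, the value of `alpha` and the helper functions `matmul` and `transpose` are illustrative assumptions, not part of the original chapter.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*a)]


# Hypothetical 3 x 2 table whose first two rows are proportional (ratio 1:2).
P = [[0.10, 0.20],
     [0.20, 0.40],
     [0.05, 0.05]]

# S1 from the proof with r = 3: pre-multiplying by its transpose
# replaces row 1 by the sum of rows 1 and 2, and row 2 by zeroes.
S1 = [[1, 0, 0],
      [1, 0, 0],
      [0, 0, 1]]
aggregated = matmul(transpose(S1), P)

# S2 from the proof: with the second row now zero, pre-multiplying by
# its transpose splits row 1 into alpha and (1 - alpha) times itself.
alpha = 1 / 3  # assumed value, chosen so that the split recovers P
S2 = [[alpha, 1 - alpha, 0],
      [1, 0, 0],
      [0, 0, 1]]
split = matmul(transpose(S2), aggregated)

print(aggregated)
print(split)
```

With this choice of `alpha` the split table coincides with `P` up to rounding, which illustrates why aggregation and splitting of proportional rows give S-equivalent tables.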

There is another type of dependence of Y on X.

Definition 2.
We define D-dependence as

$$Q \le_D P \overset{\text{def}}{\iff} Q = PD \tag{1}$$

with $D$ a T-matrix, i.e. a product of T-transforms, namely matrices of the form $T_\alpha = (1 - \alpha) I + \alpha \Pi^{(2)}$, $0 \le \alpha \le 1$, and $\Pi^{(2)}$ a permutation matrix that exchanges only 2 elements; such a $D$ is doubly stochastic.

This can be thought of as a model for errors in the $Y$ variable: since perfect dependence is obtained when to each x-category there corresponds precisely one y-category, $\alpha$ stands for the probability (frequency) of mistakenly exchanging two y-categories. This ordering is known in the literature as chain-majorization (see Marshall and Olkin (1979)). The equivalence relation $\sim_D$ defined by $\le_D$ is permutation of the columns of $P$.
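To make the error-model reading concrete, the sketch below builds a single T-transform and applies it to a table with exact dependence of $Y$ on $X$. It is an illustration only: the table `P`, the value `alpha = 0.1` and the helper names `t_transform` and `matmul` are assumptions introduced here.

```python
def t_transform(n, i, j, alpha):
    """Return T_alpha = (1 - alpha) I + alpha Pi, where Pi swaps indices i and j."""
    T = [[1.0 if h == k else 0.0 for k in range(n)] for h in range(n)]
    T[i][i] = T[j][j] = 1.0 - alpha
    T[i][j] = T[j][i] = alpha
    return T


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


# Hypothetical table with exact dependence: each x-category maps to one y-category.
P = [[0.5, 0.0, 0.0],
     [0.0, 0.3, 0.0],
     [0.0, 0.0, 0.2]]

# Exchange y-categories 1 and 2 with probability alpha (assumed value).
T = t_transform(3, 0, 1, alpha=0.1)
Q = matmul(P, T)  # Q = P D with D a single T-transform, so Q <=_D P

print(Q)
```

Post-multiplication leaves the row margins unchanged and spreads each row's mass over the two exchanged columns, so the dependence of $Y$ on $X$ is no longer exact.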

Clearly these two orderings can be combined.

Definition 3.
Define SD-dependence as follows

$$Q \le_{SD} P \overset{\text{def}}{\iff} Q = S^t P D \tag{2}$$

where $S$ is a stochastic matrix and $D$ a product of T-transforms.

The $\le_{SD}$ ordering was defined in Forcina and Giovagnoli (1987), who however did not carry out a proper investigation of its properties. Note that (2) can be written as $\mathrm{vec}(Q) = (S \otimes D)^t\, \mathrm{vec}(P)$ and means that there exists a bivariate distribution table $R$ such that $R \le_S P$ and $Q \le_D R$, and also that there exists $R$ such that $R \le_D P$ and $Q \le_S R$. Clearly both $\le_S$ and $\le_D$ are special cases of $\le_{SD}$. On the other hand, $\le_{SD}$ allows more comparisons, in particular matrices $P$ and $Q$ no longer need to have identical row or column margins.

Relative to maximal and minimal elements w.r.t. the ordering $\le_{SD}$, the following results hold true:

Proposition 3.

(1) All the tables with independent rows and a given marginal distribution of $Y$ are S-equivalent and are smaller w.r.t. $\le_S$ than all the other tables with the same $Y$-margin.



(2) All the tables giving exact dependence of $Y$ on $X$ with the same $Y$-margin are S-equivalent and are greater w.r.t. $\le_S$ than all the other tables with the same $Y$-margin.

The proof is easily obtainable by the same techniques employed in Theorem 3 of Forcina and Giovagnoli (1987).

Observe that Proposition 3 is not true for the ordering $\le_D$. However from Proposition 3 there follows

Corollary 1.

(1) The independence table with uniform margins $(1/rc) J_{r \times c}$, where $J$ stands for the matrix of all ones, is smaller w.r.t. $\le_{SD}$ than any other $r \times c$ joint probability table.

(2) All the tables giving exact dependence of $Y$ on $X$ are SD-equivalent and are greater w.r.t. $\le_{SD}$ than all the other $r \times c$ tables.

Let us now look at ways of transforming the order relation.

Proposition 4.
The invariance group and contraction set of $\le_S$, $\le_D$ and $\le_{SD}$ are as follows, where the pair $(A, B)$ denotes two matrices of dimension $r \times r$ and $c \times c$ respectively, acting on $P$ by pre- and by post-multiplication respectively.

(1) $G_I(\le_S) = \{(\Pi_1, I_c);\ I_c$ the identity, $\Pi_1$ a permutation matrix$\}$

$G_I(\le_D) = \{(I_r, \Pi_2);\ I_r$ the identity, $\Pi_2$ a permutation matrix$\}$

$G_I(\le_{SD}) = \{(\Pi_1, \Pi_2);\ \Pi_1, \Pi_2$ permutation matrices$\} = G_I(\le_S) \vee G_I(\le_D)$

(2) $G_K(\le_S) = \{(S^t, I_c);\ I_c$ the identity, $S$ stochastic$\}$

$G_K(\le_D) = \{(I_r, D);\ I_r$ the identity, $D$ a T-matrix$\}$

$G_K(\le_{SD}) = \{(S^t, D);\ S$ stochastic, $D$ a T-matrix$\} = G_K(\le_S) \vee G_K(\le_D)$

Furthermore it can be shown that

(3) $G_E(\le_S) \supseteq \{(S_1^t, S_2);\ S_1$ stochastic and of full rank, $S_2$ stochastic$\}$

$G_E(\le_D) \supseteq \{(S^t, D);\ S$ stochastic, $D$ a T-matrix$\}$

$G_E(\le_{SD}) = G_K(\le_S) \cap G_K(\le_D)$

Clearly, if we were interested in comparing bivariate tables as regards the dependence of $X$ on $Y$ we would consider the “transpose” order of $\le_{SD}$, namely

$$Q \le^t_{SD} P \overset{\text{def}}{\iff} Q = DPS \quad \text{where } S \text{ is stochastic and } D \text{ a T-matrix.} \tag{3}$$


2.2 Measures of the dependence of Y on X

Measures of dependence are usually required to take on their minimum value when $X$ and $Y$ are independent, and usually such a minimum is assumed to be equal to 0. Furthermore they should have their maximum value when the dependence of one variable on the other is perfect. Usually such a maximum value is used to standardize the indicator so that it varies between 0 and 1. According to the approach of this chapter all the indicators of the dependence of $Y$ on $X$ (possibly before standardization) must preserve the $\le_{SD}$-ordering, which amounts to preserving both orderings $\le_S$ and $\le_D$. In particular, they must take the same value on tables that are SD-equivalent. Thus all measures of SD-dependence must be invariant with respect to permutations of rows and of columns, row aggregation of proportional rows and row splitting when one row is zero.

The dependence indicators used in the literature, in general, seem to fall into three main types:

Type I) measures that compare optimal prediction of $Y$ given $X$ with optimal prediction of $Y$ when $X$ is unknown, i.e.

$$\Phi_{Y \cdot X} = \frac{\Delta(\mathbf{p}_c) - \sum_i p_{i+}\, \Delta(\mathbf{p}^*_i)}{\Delta(\mathbf{p}_c)}$$

where $\Delta : \mathbb{R}^r \to \mathbb{R}$ stands for a measure of dispersion/heterogeneity of the distribution of $Y$, or of minimal expected loss in predicting $Y$.

Type II) weighted averages of measures of the mean information of $Y$ given $X$ relative to the unconditional information of $Y$, i.e.

$$\Psi_{Y \cdot X} = g\left( \sum_i p_{i+}\, d(\mathbf{p}^*_i, \mathbf{p}_c) \right)$$

where $g : \mathbb{R} \to \mathbb{R}$ is increasing and $d(\mathbf{u}, \mathbf{v})$ is a measure of the “distance” or “diversity” of a distribution $\mathbf{u}$ on the set of unordered categories from another distribution $\mathbf{v}$, or of information gain from prior $\mathbf{v}$ to posterior $\mathbf{u}$.

Type III) weighted averages of the “distances” $d(\cdot, \cdot)$ between all pairs of distributions of $Y$ conditional on $X$

$$\Lambda_{Y \cdot X}(P) = g\left( \sum_{i=1}^{r} \sum_{i'=1}^{r} p_{i+}\, p_{i'+}\, d(\mathbf{p}^*_i, \mathbf{p}^*_{i'}) \right)$$

where $g(\cdot)$ and $d(\cdot, \cdot)$ are as in Type II.

Proposition 5.
The order $\le_{SD}$ is preserved by Type I indices when $\Delta$ is convex and permutation invariant.


Examples are:

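1. Guttman’s $\lambda$

$$\lambda(P) = \frac{\sum_i \max_j p_{ij} - \max_j p_{+j}}{1 - \max_j p_{+j}}$$

where $\Delta(\mathbf{u}) = 1 - \max_i u_i$.

2. Goodman and Kruskal (1954)’s $\tau$

$$\tau(P) = \frac{\sum_i \sum_j p_{ij}^2 / p_{i+} - \sum_j p_{+j}^2}{1 - \sum_j p_{+j}^2}$$

where $\Delta(\mathbf{u}) = 1 - \sum_i u_i^2$.

3. Theil’s index $\eta$

$$\eta(P) = \frac{-\sum_i \sum_j p_{ij} \log\left( p_{ij} / (p_{i+}\, p_{+j}) \right)}{\sum_j p_{+j} \log p_{+j}},$$

where $\Delta$ is Shannon’s entropy: $\Delta(\mathbf{u}) = -\sum_i u_i \log(u_i)$.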
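The three Type I examples can be computed directly from a table. The code below is a minimal sketch under stated assumptions: the function names and the two test tables are introduced here for illustration, natural logarithms are used, and cells or margins equal to zero are skipped (the usual convention $0 \log 0 = 0$).

```python
import math


def margins(P):
    """Return (row margins p_{i+}, column margins p_{+j}) of a table."""
    return [sum(row) for row in P], [sum(col) for col in zip(*P)]


def guttman_lambda(P):
    """Guttman's lambda, with Delta(u) = 1 - max_i u_i."""
    _, cols = margins(P)
    return (sum(max(row) for row in P) - max(cols)) / (1 - max(cols))


def goodman_kruskal_tau(P):
    """Goodman and Kruskal's tau, with Delta(u) = 1 - sum_i u_i^2."""
    rows, cols = margins(P)
    within = sum(p * p / ri for row, ri in zip(P, rows) if ri > 0 for p in row)
    between = sum(c * c for c in cols)
    return (within - between) / (1 - between)


def theil_eta(P):
    """Theil's eta: mutual information of (X, Y) over the entropy of Y."""
    rows, cols = margins(P)
    mutual = sum(p * math.log(p / (ri * cj))
                 for row, ri in zip(P, rows)
                 for p, cj in zip(row, cols) if p > 0)
    entropy = -sum(c * math.log(c) for c in cols if c > 0)
    return mutual / entropy


# Hypothetical tables: independence and exact dependence of Y on X.
independent = [[0.12, 0.28],
               [0.18, 0.42]]
exact = [[0.3, 0.0],
         [0.0, 0.7]]

for name, table in (("independent", independent), ("exact", exact)):
    print(name, guttman_lambda(table), goodman_kruskal_tau(table), theil_eta(table))
```

On these two tables each index sits at its minimum 0 under independence and at its maximum 1 under exact dependence, up to rounding, in line with the standardization described at the start of this section.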

Proposition 6.
The order $\le_{SD}$ is preserved by Type II indices when $g$ is convex and $d(\mathbf{u}, \mathbf{v})$ is a convex function on $\mathbb{R}^r \times \mathbb{R}^r$, invariant under permutations of $y_1, \dots, y_c$.

Examples are:

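1. Gini’s connection index $G$

$$G(P) = \frac{1}{2} \sum_i \sum_j \left| p_{ij} - p_{i+}\, p_{+j} \right|. \tag{4}$$

Here $d(\mathbf{u}, \mathbf{v}) = \sum_i |u_i - v_i|$ is the city-block distance.

2. Good’s class of measures $J$

$$J_\lambda(P) = \sum_i \sum_j \frac{p_{ij}^\lambda}{p_{i+}^{\lambda - 1}\, p_{+j}^{\lambda - 1}}$$

(which is Pearson’s $\chi^2$ when $\lambda = 2$). Here $d(\mathbf{u}, \mathbf{v}) = \sum_i \dfrac{u_i^\lambda}{v_i^{\lambda - 1}} - 1$.

3. Halphen’s modulus of dependence $H$

$$H(P) = \sum_i \sum_j p_{ij} \log p_{ij} - \sum_i p_{i+} \log p_{i+} - \sum_j p_{+j} \log p_{+j} \tag{5}$$

Here $d(\mathbf{u}, \mathbf{v}) = \sum_i u_i \log(u_i / v_i)$.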
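As a rough numerical companion to the Type II examples, the sketch below evaluates formulas (4) and (5) and Good's $J_\lambda$ on a table. The function names and the example tables are assumptions made for illustration; zero cells are skipped and natural logarithms are used.

```python
import math


def margins(P):
    """Return (row margins p_{i+}, column margins p_{+j}) of a table."""
    return [sum(row) for row in P], [sum(col) for col in zip(*P)]


def gini_connection(P):
    """Gini's connection index, formula (4)."""
    rows, cols = margins(P)
    return 0.5 * sum(abs(p - ri * cj)
                     for row, ri in zip(P, rows)
                     for p, cj in zip(row, cols))


def good_j(P, lam=2.0):
    """Good's J_lambda; it equals 1 under independence, before the -1 in d(u, v)."""
    rows, cols = margins(P)
    return sum(p ** lam / (ri * cj) ** (lam - 1)
               for row, ri in zip(P, rows)
               for p, cj in zip(row, cols) if p > 0)


def halphen_h(P):
    """Halphen's modulus of dependence, formula (5)."""
    rows, cols = margins(P)

    def xlogx(values):
        return sum(v * math.log(v) for v in values if v > 0)

    return xlogx([p for row in P for p in row]) - xlogx(rows) - xlogx(cols)


# Hypothetical tables: an independence table and a dependent one.
independent = [[0.12, 0.28],
               [0.18, 0.42]]
dependent = [[0.4, 0.2],
             [0.1, 0.3]]

for name, table in (("independent", independent), ("dependent", dependent)):
    print(name, gini_connection(table), good_j(table), halphen_h(table))
```

All three functions are unchanged when the table is transposed, which is the symmetry property that matters for interdependence measures.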


Proposition 7.
The order $\le_{SD}$ is preserved by Type III indices if $g(\cdot)$ is an increasing real function and $d(\cdot, \cdot)$ is convex in both components and invariant under permutations of $y_1, \dots, y_c$.

For example, Goodman and Kruskal’s indicator is obtained by letting $d(\cdot, \cdot)$ be the Euclidean distance and $g(\cdot)$ the square root.

Propositions 5 and 6 were proved by Forcina and Giovagnoli (1987); Proposition 7 is proved in Giovagnoli (2002a).

2.3 Interdependence Orderings

Interdependence between two variables may be defined as some type of “distance” of their joint distribution from the reference situation of independence, which corresponds to no association. A natural requirement is again invariance with respect to permutations of the rows and of the columns. Several ways of introducing orders of bilateral association between bivariate distributions, which make use of the heuristic arguments presented in Forcina and Giovagnoli (1987) for the dependence case, have been suggested in Giovagnoli (2002a), but we are not aware of any thorough investigation carried out on these order relations. For distributions with the same margins, Cifarelli and Regazzini (1986) and Scarsini (1991) also define an association order. Alternatively, another way is to combine the dependence ordering $\le_{SD}$ of $Y$ on $X$ with its transpose (dependence of $X$ on $Y$). Thus we can define an ordering $\le_{SD\text{-}bil}$ of bilateral association if $X$ is SD-dependent on $Y$ and $Y$ is SD-dependent on $X$.

Definition 4.

$$Q \le_{SD\text{-}bil} P \overset{\text{def}}{\iff} Q = S_1^t P D_1 \ \text{ and } \ Q = D_2^t P S_2 \tag{6}$$

for some stochastic matrices $S_1$ and $S_2$ and some T-matrices $D_1$ and $D_2$.

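This is clearly a pre-ordering, invariant under any permutation of the rows and any permutation of the columns. As an example take:

$$P = \begin{pmatrix} 0.4 & 0.2 \\ 0.1 & 0.3 \end{pmatrix}, \quad Q = \begin{pmatrix} 0.25 & 0.35 \\ 0.25 & 0.15 \end{pmatrix}, \quad S_1 = \begin{pmatrix} 0.4 & 0.6 \\ 0.9 & 0.1 \end{pmatrix}, \quad S_2 = \begin{pmatrix} 0.25 & 0.75 \\ 0.75 & 0.25 \end{pmatrix},$$

$$D_1 = D_2 = I_2.$$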
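The example can be verified by direct multiplication. The following sketch, with helper functions `matmul`, `transpose` and `close` that are assumptions introduced here, checks both conditions of (6) for the matrices given above with $D_1 = D_2 = I_2$.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*a)]


def close(a, b, tol=1e-12):
    """Entrywise comparison of two matrices up to a small tolerance."""
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))


P = [[0.4, 0.2], [0.1, 0.3]]
Q = [[0.25, 0.35], [0.25, 0.15]]
S1 = [[0.4, 0.6], [0.9, 0.1]]
S2 = [[0.25, 0.75], [0.75, 0.25]]

# With D1 = D2 = I, condition (6) reduces to Q = S1^t P and Q = P S2.
by_rows = matmul(transpose(S1), P)
by_columns = matmul(P, S2)

print(close(by_rows, Q), close(by_columns, Q))
```

Both products agree with $Q$ up to floating-point rounding, so $Q \le_{SD\text{-}bil} P$ for this pair of tables.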

Proposition 8.

(1) For all $P$, let $C = P \mathbf{1} \mathbf{1}^t P$ be the table with the same margins as $P$ whose margins are independent. Then $C \le_{SD\text{-}bil} P$.

(2) Table $(1/rc) J_{r \times c}$, standing for the independence distribution with uniform margins, is smaller w.r.t. $\le_{SD\text{-}bil}$ than any other $r \times c$ joint probability table.

(3) Exact dependence of $Y$ on $X$ and $X$ on $Y$ is possible only when $r = c$. When this happens, tables with exact dependence of rows on columns and columns on rows are SD-bil-maximal.

Note that if both $P$ and $Q$ have uniform margins, then $S_1$ and $S_2$ must be doubly stochastic for (6) to hold.

A special case of (6) is when there exist T-matrices $D_1$ and $D_2$ such that

$$Q = D_2 P D_1, \quad \text{i.e.} \quad \mathrm{vec}(Q) = (D_2 \otimes D_1)^t\, \mathrm{vec}(P) \tag{7}$$

Since T-matrices are doubly stochastic, so is $D_2 \otimes D_1$, thus (7) is also a special case of majorization of the vecs, namely the association order $\overset{m}{\prec}$ defined by Joe (1985)

$$Q \overset{m}{\prec} P \iff \mathrm{vec}(Q) = D\, \mathrm{vec}(P)$$

with $D$ an $rc \times rc$ doubly stochastic matrix.

2.4 Measures of interdependence between X and Y

As to the order-preserving functions w.r.t. $\le_{SD\text{-}bil}$, all the measures of SD-dependence seen in Section 2.2 which are symmetric, i.e. invariant under transposition of the matrix $P$ ($\Phi_{Y \cdot X}(P) = \Phi_{Y \cdot X}(P^t)$), are clearly measures of bilateral SD-dependence. Among them there are:

1. Good’s measures $J_\lambda$ and more in general measures of the form

$$\sum_i \sum_j p_{i+}\, p_{+j}\; g\left( \frac{p_{ij}}{p_{i+}\, p_{+j}} \right),$$

with $g(\cdot)$ convex on $[0, \infty)$, as mentioned in Scarsini (1991).

2. Gini’s indicator (4).

3. Halphen’s indicator (5).

4. Type III indicators with $d(\mathbf{u}, \mathbf{v}) = d(\mathbf{v}, \mathbf{u})$.

In the literature one is advised to consider as a measure of reciprocal dependence of $X$ and $Y$ some type of average of two measurements, dependence of $X$ on $Y$ and dependence of $Y$ on $X$, calculated by the same indicator:

$$M_{XY}(P) = M\left( \Phi_{Y \cdot X}(P),\, \Phi_{X \cdot Y}(P) \right)$$

which clearly are SD-bil-preserving if $\Phi_{Y \cdot X}$ is SD-preserving.


3 Inter-raters agreement for categorical classifications

3.1 How to compare agreement among raters

Classification and rating are basic in all scientific fields, and often there is the need to test the reliability of a classification process by assessing the level of agreement between two or more different observers (the raters) who classify the same group of statistical units (the subjects). In this section we want to apply stochastic orderings to define what we mean by “agreement” among a group of observers who rate the same units on a categorical scale. Note that identical arguments apply if, instead, the same group of observers classify distinct sets of individuals or items, or the same observers rate the same items at different times.

In the literature the extent of inter-rater agreement has been studied mainly by means of indicators. Cohen’s Kappa is the most popular index of agreement for the case of two raters on a categorical scale of measurement (Cohen (1960)). A different approach is by means of statistical models which describe the structure of the relationship, mainly log-linear and latent class models, see Banerjee et al. (1999).

Our approach is different: given the sample space (the subjects) and the set $S$ of all the possible ratings (unordered classes), we talk about the agreement among a group of observers as a type of dependence among the jointly distributed random variables with values in $S$ representing the ratings of the various observers (Bishop et al. (1975)). The question “when does one group of raters show more agreement than another?” is answered by defining an order relation among multidimensional random variables (see also Giovagnoli (2002b)). Indicators of agreement will be all the real-valued functions preserving the ordering under consideration.

We start off with a set of “reasonable” requirements for any agreement ordering. Assume for now that $d$ raters classify just one statistical unit (a subject) into one of $m$ unordered classes. The way in which agreement of raters relative to that unit is expressed should satisfy the following set of axioms:

A-0 The agreement is a maximum when all the raters classify the subject into the same class.

A-1 The extent of agreement is independent of how we order (label) the classes.

A-2 The extent of agreement is independent of how we order (label) the raters.

We believe these axioms are sufficient to characterize agreement in the case of just two raters, whereas further properties may be needed when their number is greater than two. In particular we want to express the fact that agreement increases if one of the raters in a minority group changes his/her classification of a particular subject (statistical unit) to a class with larger consensus among the other raters. If we define the class distribution of that subject to be the frequency distribution of the way in which the statistical unit is diversely classified by the raters, we state that

A-3 The extent of agreement cannot decrease when the class distribution of the subject among the raters becomes more concentrated.

We point out that Axiom A-3 is in accordance with the widespread approach in the literature (Armitage et al. (1966), Fleiss (1971), Davies and Fleiss (1982)) that measures agreement by counting pairs of raters who agree on the classification of a single unit.

3.2 An agreement ordering for the case of two raters

For simplicity in this chapter we consider the case of just two raters. Our aim is to define an agreement ordering for bivariate distributions in such a way that Axioms A-0 to A-2 hold true, so that these properties are automatically satisfied by any agreement measure that preserves the ordering. This approach will help us clarify the behaviour of some commonly used indicators. Let $P = (p_{ij})$ with $i, j = 1, \dots, m$, denote the joint classification probabilities of the two raters. By Axioms A-1 and A-2, for every $m \times m$ table $P$ and permutation matrix $\Pi$, we require $P$ to be equivalent to $\Pi^t P \Pi$ and also to be equivalent to $P^t$, so that

$$P \sim_{agr} P, \quad P \sim_{agr} \Pi^t P \Pi, \quad P \sim_{agr} P^t, \quad P \sim_{agr} \Pi^t P^t \Pi \tag{8}$$

where we write $\sim_{agr}$ to mean equivalence under agreement. Furthermore, by Axiom A-3 if we let

$$\mathbf{p}_i^t = (p_{i1}, p_{i2}, \dots, p_{ij}, \dots, p_{im})$$

the agreement increases by replacing this row with

$$\mathbf{p}_i^t = (p_{i1}, p_{i2}, \dots, p_{ii} + \delta, \dots, p_{ij} - \delta, \dots, p_{im})$$

where $0 < \delta \le p_{ij}$, and similarly for any column. This can be formalized as follows. Define $E_{ij} = (e^{ij}_{hk})$ with $i \ne j$ the $m \times m$ “shift” matrix where

$$e^{ij}_{hk} = 1 \ \text{ if } h = k = i, \qquad e^{ij}_{hk} = -1 \ \text{ if } h = i,\ k = j, \qquad e^{ij}_{hk} = 0 \ \text{ otherwise.}$$

Definition 5.
Given two tables $P = (p_{ij})$ and $Q = (q_{ij})$, $i, j = 1, \dots, m$, the agreement of $P$ is lower than that of $Q$ if $Q$ is obtained from $P$ by means of a finite number of “shifts” on the rows or columns, i.e.

$$Q = P + \sum_{i \ne j} \delta_{ij} E_{ij} + \sum_{i \ne j} \tilde{\delta}_{ij} E^t_{ij} \quad \text{where } \delta_{ij} \ge 0,\ \tilde{\delta}_{ij} \ge 0,\ \delta_{ij} + \tilde{\delta}_{ij} \le p_{ij}$$

We call this relation “$\Delta$-ordering” and write $\le_\Delta$. It is easy to check that:

Proposition 9.
The $\Delta$-ordering is a partial order.


It is also easy to show that the $\Delta$-ordering is consistent with the axioms. A-0 is clearly true. Furthermore:

Proposition 10.
Given two $m \times m$ tables $P$ and $Q$

$$P \le_\Delta Q \implies \Pi^t P \Pi \le_\Delta \Pi^t Q \Pi \ \text{ and } \ P^t \le_\Delta Q^t \tag{9}$$
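A single shift from Definition 5 is easy to sketch in code. In the example below the function names, the bounds check and the choice of table are assumptions made for illustration; the check simply keeps every cell non-negative. Indices are 0-based, so row 1 of the text is row 0 in the code.

```python
def row_shift(P, i, j, delta):
    """Return P + delta * E_ij: move mass delta from cell (i, j) to cell (i, i)."""
    if i == j or not 0 <= delta <= P[i][j]:
        raise ValueError("need i != j and 0 <= delta <= P[i][j]")
    Q = [row[:] for row in P]
    Q[i][i] += delta
    Q[i][j] -= delta
    return Q


def column_shift(P, i, j, delta):
    """Return P + delta * E_ij^t: move mass delta from cell (j, i) to cell (i, i)."""
    if i == j or not 0 <= delta <= P[j][i]:
        raise ValueError("need i != j and 0 <= delta <= P[j][i]")
    Q = [row[:] for row in P]
    Q[i][i] += delta
    Q[j][i] -= delta
    return Q


def tpa(P):
    """Sum of the diagonal cells, i.e. the proportion of agreeing classifications."""
    return sum(P[i][i] for i in range(len(P)))


# Hypothetical 3 x 3 table of joint classification probabilities.
P = [[0.10, 0.20, 0.50],
     [0.01, 0.00, 0.01],
     [0.16, 0.01, 0.01]]

Q = row_shift(P, 0, 1, 0.10)  # P <=_Delta Q
print(Q[0], tpa(P), tpa(Q))
```

Each shift leaves the total mass unchanged and raises the diagonal sum by exactly `delta`, which is the sense in which the shifted table shows more agreement.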

Remark 1.
It is easy to see that given $m \times m$ tables $P$, $Q$ and $T$

$$P \le_\Delta T \ \text{ and } \ T \sim_{agr} Q \implies \text{there exists } R \text{ such that } P \sim_{agr} R \ \text{ and } \ R \le_\Delta Q$$

This enables us to widen the definition of agreement ordering to the following relation $\le_{agr}$.

Definition 6.
$P \le_{agr} Q$ if and only if there exists $T$ such that $P \le_\Delta T$ and $T \sim_{agr} Q$.

Proposition 11.
The relation $P \le_{agr} Q$ is a pre-order. Furthermore, $P \le_{agr} Q$ and $Q \le_{agr} P$ if and only if $P \sim_{agr} Q$, with $\sim_{agr}$ defined as in (8).

Proof. By virtue of Remark 1, transitivity of $\le_{agr}$ holds.

The invariance, equivariance and contraction sets of $\le_{agr}$ are implicitly defined by (8), (9) and Def. 5 respectively.

3.3 Order preserving indicators with respect to the agreement ordering

We now want to check how the definition of $\le_{agr}$ fits in with existing measures of agreement. A detailed description of such measures is given in Shoukri (2004).

In order to preserve $\le_{agr}$ an indicator must be a function of table $P$ which

1. is invariant under any permutation of the rows and the same permutation of the columns of $P$;

2. is invariant by transposition of $P$;

3. preserves $\le_\Delta$.

Measures that are invariant under any permutation of the rows and any permutation of the columns do not appear to be suitable as potential measures of agreement. This applies for instance to Pearson’s well-known Chi-squared indicator.

We recall that $Q \le_{agr} P$ implies that there exists a permutation $\pi$ such that

1. $q_{ii} \le p_{\pi(i)\pi(i)}$, for all $i = 1, \dots, m$;

2. either $q_{ij} \ge p_{\pi(i)\pi(j)}$, or $q_{ij} \ge p_{\pi(j)\pi(i)}$, for all $i = 1, \dots, m$.

Note that when $\pi$ is the identity, property 1. above defines the $\le_{NAIF}$ order relation of Giovagnoli (2002b).

Total Proportion of Agreement

The Total Proportion of Agreement is the indicator $TPA = \sum_i p_{ii}$. Clearly TPA is order-preserving with respect to $\le_{agr}$.

Cohen’s Kappa

When the TPA measure is chance-corrected, namely the amount of agreement obtained for the effect of chance alone is subtracted, and the result is normalized with respect to the maximum value it can assume, it gives rise to the Kappa measure introduced by Cohen (1960):

$$\kappa(P) = \frac{\sum_i p_{ii} - \sum_i p_{i+}\, p_{+i}}{1 - \sum_i p_{i+}\, p_{+i}}.$$

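Corollary 2.
For tables with the same margins, Cohen’s Kappa preserves the agreement ordering $\le_{agr}$.

However the following counterexample, in which $P \le_{agr} Q \le_{agr} R$ and $\kappa(R) \le \kappa(Q) \le \kappa(P)$, shows that this result does not hold in general for any two bivariate distribution tables:

$$P = \begin{pmatrix} 0.10 & 0.20 & 0.50 \\ 0.01 & 0.00 & 0.01 \\ 0.16 & 0.01 & 0.01 \end{pmatrix}, \quad \kappa = -0.2970$$

$$Q = \begin{pmatrix} 0.20 & 0.10 & 0.50 \\ 0.01 & 0.00 & 0.01 \\ 0.16 & 0.01 & 0.01 \end{pmatrix}, \quad \kappa = -0.2989$$

$$R = \begin{pmatrix} 0.30 & 0.00 & 0.50 \\ 0.01 & 0.00 & 0.01 \\ 0.16 & 0.01 & 0.01 \end{pmatrix}, \quad \kappa = -0.3014.$$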
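The counterexample can be re-computed with a few lines of code. The sketch below implements the Kappa formula given earlier in this section; the function names `tpa` and `cohen_kappa` are assumptions introduced here, and the tables are the three tables of the counterexample.

```python
def tpa(P):
    """Total Proportion of Agreement: the sum of the diagonal cells."""
    return sum(P[i][i] for i in range(len(P)))


def cohen_kappa(P):
    """Cohen's Kappa: chance-corrected and normalized TPA."""
    rows = [sum(row) for row in P]
    cols = [sum(col) for col in zip(*P)]
    chance = sum(r * c for r, c in zip(rows, cols))
    return (tpa(P) - chance) / (1 - chance)


P = [[0.10, 0.20, 0.50], [0.01, 0.00, 0.01], [0.16, 0.01, 0.01]]
Q = [[0.20, 0.10, 0.50], [0.01, 0.00, 0.01], [0.16, 0.01, 0.01]]
R = [[0.30, 0.00, 0.50], [0.01, 0.00, 0.01], [0.16, 0.01, 0.01]]

for name, table in (("P", P), ("Q", Q), ("R", R)):
    print(name, round(tpa(table), 2), round(cohen_kappa(table), 4))
```

Each table is obtained from the previous one by moving 0.10 from cell (1, 2) to cell (1, 1), so TPA increases along the chain while Kappa decreases, because the column margins, and hence the chance term, change with each shift.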

Remark 2.

i) This is an undesirable behaviour of the Kappa indicator. However, it can be shown to take place only under special circumstances, and only when its values are negative, which does not occur very frequently in actual practice (a detailed discussion can be found in Marzialetti (2006)).

ii) It can also be shown that for $m = 2$ Cohen’s Kappa always preserves the $\Delta$-ordering and thus the $\le_{agr}$-ordering (Marzialetti (2006)).


Farewell and Sprott (1999)’s index

Another measure that clearly is invariant under a permutation of the subscripts and their exchange is

$$\sum_{i<j} \log \frac{p_{ii}\, p_{jj}}{p_{ij}\, p_{ji}}.$$

Because of properties 1. and 2. this measure preserves $\le_{agr}$.

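Rogot and Goldberg (1966)’s indices

The following measures are defined only for $m = 2$, and are both order-preserving w.r.t. $\le_{agr}$:

i) $A_1 = \dfrac{p_{11}}{p_{1+} + p_{+1}} + \dfrac{p_{22}}{p_{2+} + p_{+2}}$

$A_1$ is clearly invariant under permutations and transposition. Considering just one row shift for simplicity, we must show that

$$\frac{p_{11}}{p_{1+} + p_{+1}} + \frac{p_{22}}{p_{2+} + p_{+2}} \le \frac{p_{11} + \delta}{p_{1+} + p_{+1} + \delta} + \frac{p_{22}}{p_{2+} + p_{+2} - \delta}$$

This is always true since

$$\frac{p_{11}}{p_{1+} + p_{+1}} \le \frac{p_{11} + \delta}{p_{1+} + p_{+1} + \delta} \quad \text{and} \quad \frac{p_{22}}{p_{2+} + p_{+2}} \le \frac{p_{22}}{p_{2+} + p_{+2} - \delta}.$$

Hence $A_1$ preserves $\le_{agr}$.

ii) $A_2 = \dfrac{p_{11}}{p_{1+}} + \dfrac{p_{11}}{p_{+1}} + \dfrac{p_{22}}{p_{2+}} + \dfrac{p_{22}}{p_{+2}}$

$A_2$ is clearly invariant under permutations and transposition. The rest of the proof retraces the steps of the previous one.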

Jacquard’s coefficient

This is given by

$$J = \frac{p_{11}}{p_{11} + p_{12} + p_{21}}$$

This measure of agreement is not permutation invariant and does not appear to fit in our set-up, since one of the two categories, namely the first one, is taken to be of special interest, and not interchangeable with the second one.

In conclusion, we have shown that in most cases the order relation $\le_{agr}$ defined by Definition 6 seems to capture the essence of the concept of agreement between two observers measured by the most commonly used indicators.

4 Conclusions and further research

There are many ways of defining association for multivariate nominal variables, and in particular agreement among observers seen as a special case of the degree of association of their joint classification probabilities. We hope to have shown that stochastic orderings, sometimes combined with group theory, provide a powerful mathematical environment for investigation of measures of association and agreement. The group theoretical ideas need further development. Moreover, the idea of matrix G-majorization (defined in Giovagnoli and Wynn (1985)) could be extended to cover some of the order relations defined in this chapter, in particular the order relation of Definition 3.

We believe that a research programme in this area should address the following questions, which are only partly answered in the present chapter.

1. a) When does a multidimensional random variable show the same “association structure” as another one? In other words, how can we describe the equivalence relation “two multidimensional variables are equally associated”?

b) Which operations on a joint probability distribution do not affect the dependence structure?

2. a) When is one multidimensional random variable more associated than another?

b) Which operations on a set of jointly distributed variables increase or decrease dependence among them?

3. What are suitable criteria for representing the suggested ordering on a numerical scale, i.e. what is the class of all order preserving functions (consistent indicators) for that type of association?

Acknowledgements

Section 3 is part of the second author’s PhD thesis (Marzialetti (2006)), jointly supervised by the other authors of the present work. This research was completed when the third author was a Senior Fellow of the I.S.A. (Institute of Advanced Study) of the University of Bologna. Thanks are also due to Professor Antonio Forcina of Perugia University for stimulating discussions.

References

Armitage, P., Blendis, L. M., and Smyllie, H. C. (1966). The measurement of observer disagreement in the recording of signs. Journal of the Royal Statistical Society, 129, 98–109.

Banerjee, M., Capozzoli, M., McSweeney, L., and Sinha, D. (1999). Beyond kappa: A review of interrater agreement measures. The Canadian Journal of Statistics, 27(1), 3–23.

Bickel, P. J. and Lehmann, E. L. (1975). Descriptive statistics for non-parametric models. I: Introduction. The Annals of Statistics, 3, 1038–1044.

Bishop, Y. M. M., Fienberg, S. E., and Holland, P. W. (1975). Discrete multivariate analysis: Theory and practice. MIT Press (Cambridge, MA).

Cifarelli, D. M. and Regazzini, E. (1986). Concentration function and its role in descriptive statistics. Proceedings of the 33rd Scientific Meeting of the Italian Statistical Society, 2, 347–352.

Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20, 37–46.

Davies, M. and Fleiss, J. (1982). Measuring agreement for multinomial data. Biometrics, 38, 1047–1051.

Farewell, V. T. and Sprott, D. A. (1999). Conditional inference for predictive agreement. Statistics in Medicine, 18, 1435–1449.

Fleiss, J. L. (1971). Measuring nominal scale agreement among many raters. Psychological Bulletin, 76, 378–381.

Forcina, A. and Giovagnoli, A. (1987). Mathematical properties of cross-classification matrices. Bollettino Unione Matematica Italiana, VII(1-B)(2), 523–544.

Giovagnoli, A. (2002a). Differenti aspetti della connessione: un approccio attraverso la teoria degli ordinamenti. In Studi in onore di Angelo Zanella. Ed. Vita e Pensiero, 329–334.

Giovagnoli, A. (2002b). Stochastic orderings and their use in statistics: the case of association between two variables. In Proceedings of the XLI Scientific Meeting of the Italian Statistical Society, 5-7 June 2002, Milan, 95–104.

Giovagnoli, A. and Wynn, H. P. (1985). G-majorization with applications to matrix orderings. Linear Algebra and its Applications, 67, 111–135.

Giovagnoli, A., Marzialetti, J., and Wynn, H. P. (2006). A new approach to inter-rater agreement through stochastic orderings: the discrete case. Manuscript.

Goodman, L. A. and Kruskal, W. H. (1954). Measures of association for cross classifications. Journal of the American Statistical Association, 49, 732–764.

Goodman, L. A. and Kruskal, W. H. (1979). Measures of association for cross classification. Springer-Verlag, New York.

Joe, H. (1985). An ordering of dependence for contingency tables. Linear Algebra and its Applications, 70, 89–103.

Joe, H. (1997). Multivariate models and dependence concepts. Chapman & Hall, London.

Marshall, A. W. and Olkin, I. (1979). Inequalities: Theory of Majorization and Its Applications. Academic Press.

Marzialetti, J. (2006). Lo Studio dell’Agreement mediante gli Ordinamenti. Ph.D. thesis, Department of Statistical Sciences, University of Bologna.

Müller, A. and Stoyan, D. (2002). Comparison Methods for Stochastic Models and Risks. Wiley.

Rogot, E. and Goldberg, I. D. (1966). A proposed index for measuring agreement in test-retest studies. Journal of Chronic Diseases, 19, 991–1006.

Ross, S. (1995). Stochastic Processes. Wiley.

Scarsini, M. (1991). An ordering of dependence. In Topics in Statistical Dependence, pages 403–414. Block, H. W., Sampson, A. R., Savits, T. H. eds, Institute of Mathematical Statistics (Hayward).

Shaked, M. and Shanthikumar, J. G. (1994). Stochastic Orders and their Applications. Academic Press, New York, London.

Shoukri, M. M. (2004). Measures of Interobserver Agreement. Chapman & Hall, Boca Raton.
