
Journal of Statistical Planning and Inference 45 (1995) 347-372

The reduction and ancillary classification of observations

E.E.M. van Berkum*, H.N. Linssen, D.A. Overdijk

Department of Mathematics and Computing Science, Eindhoven University of Technology, P.O. Box 513, 5600 MB Eindhoven, Netherlands

Received 3 March 1992; revised 20 December 1993

Abstract

Generalized definitions of the invariance, (partial) sufficiency and ancillarity principles are given in a measure-theoretic context. Data reduction in statistical inference is described in terms of these principles.

AMS Subject Classification: 62A05, 62B05, 62B20

Key words: Invariance; Sufficiency; Ancillarity; Data reduction

0. Introduction

The concepts of sufficiency and ancillarity in statistical inference were introduced by Fisher. Many authors contributed to the generalization of these concepts (see Bhapkar (1989, 1991) for references). In this paper a measure-theoretic description of the invariance, (partial) sufficiency and (partial) ancillarity principles is given. Measure-theoretic formulations of these principles are independent of particular parametrizations of the family of probability distributions. Consequently this approach allows for general and concise definitions.

In Section 1 we give the mathematical prerequisites that are used in the other sections. Section 2 describes in general terms the probability structure of the observational evidence. Sections 3 and 4 describe its sufficient reduction and ancillary conditioning. Statistical inference may have a given form, e.g. there may be a parameter of interest. This interest specification together with the sufficiently reduced and conditioned probability structure constitutes an inference model as described in Section 5. The interest specification makes invariant reduction possible as defined in Section 6.

* Corresponding author.

0378-3758/95/$09.50 © 1995 Elsevier Science B.V. All rights reserved. SSDI 0378-3758(94)00088-3


Invariant reduction as defined by Barnard (1963) is a special case. In Sections 7 and 8 the concepts of partially sufficient reduction and partially ancillary conditioning are introduced. These definitions generalize other definitions of partial sufficiency and ancillarity (see Bhapkar, 1989, 1991). In Section 9 we propose a sequence in which the transformations described in the previous sections should be applied. Through this sequence of transformations every probability structure is changed into one or more reference models. In Section 10 we discuss some aspects of our reduction and classification scheme. Section 11 gives a number of examples.

1. Preliminaries

In this section we collect preliminaries for reference in subsequent sections. The reading of the paper can therefore start with Section 2; the relevant parts of this section can be consulted when they are referenced.

1.1. Let $\lambda$ be a map from a set $X$ into a set $Y$. We refer to $X$ as the domain and to $Y$ as the codomain of $\lambda$. The powerset of $X$, i.e. the collection of subsets of $X$, is written as $P(X)$. The maps

$\lambda_{\rightarrow}: P(X) \to P(Y),$
$\lambda_{\leftarrow}: P(Y) \to P(X)$

are defined as

$\lambda_{\rightarrow}(A) := \{\lambda(a) \in Y \mid a \in A\}, \quad A \subset X,$    (1.1.1)
$\lambda_{\leftarrow}(B) := \{b \in X \mid \lambda(b) \in B\}, \quad B \subset Y.$

Let $\mathscr{F}$ be a $\sigma$-field of subsets of the codomain $Y$, i.e. $(Y, \mathscr{F})$ is a measurable space. We refer to

$\sigma(\lambda) := \{\lambda_{\leftarrow}(B) \subset X \mid B \in \mathscr{F}\}$    (1.1.2)

as the $\sigma$-field of subsets of the domain $X$ generated by the map $\lambda$ from $X$ into the measurable space $(Y, \mathscr{F})$.

1.2. Let $X$ be a set and let $\mathscr{A} \subset P(X)$ be a collection of subsets of $X$. The intersection of the $\sigma$-fields of subsets of $X$ containing $\mathscr{A}$ is said to be the $\sigma$-field $\sigma(\mathscr{A})$ generated by $\mathscr{A}$.

1.3. Let $(X, \mathscr{F})$ be a measurable space. For $A \subset X$ we write

$\mathscr{F}|A := \{B \cap A \mid B \in \mathscr{F}\},$    (1.3.1)

and we refer to the $\sigma$-field $\mathscr{F}|A$ on $A$ as the trace of $\mathscr{F}$ on $A$.


1.4. We now formulate a theorem for later use.

Theorem. Let $\lambda: X \to (X_1, \mathscr{F}_1)$ be a surjection from a set $X$ onto a measurable space $(X_1, \mathscr{F}_1)$, and let $\mathscr{F}_0 \subset \sigma(\lambda)$ be a $\sigma$-field on $X$. We have:
(i) $\lambda_{\leftarrow}\lambda_{\rightarrow}(A) = A$ for all $A \in \sigma(\lambda)$;
(ii) $\{\lambda_{\rightarrow}(A) \mid A \in \mathscr{F}_0\} \subset \mathscr{F}_1$ is a $\sigma$-field.

Proof. Let $A \in \sigma(\lambda)$. There exists $B \in \mathscr{F}_1$ such that $A = \lambda_{\leftarrow}(B)$. Since the map $\lambda$ is a surjection, it follows that

$\lambda_{\rightarrow}\lambda_{\leftarrow}(B) = B,$

and we can then conclude that

$\lambda_{\leftarrow}\lambda_{\rightarrow}(A) = \lambda_{\leftarrow}\lambda_{\rightarrow}\lambda_{\leftarrow}(B) = \lambda_{\leftarrow}(B) = A,$

which proves (i).

It is easily verified that

$\{\lambda_{\rightarrow}(A) \mid A \in \mathscr{F}_0\} = \{B \in \mathscr{F}_1 \mid \lambda_{\leftarrow}(B) \in \mathscr{F}_0\}.$

The collection on the right-hand side is a $\sigma$-field, which completes the proof of the theorem.

1.5. Let $X$ be a set and let $I$ be an index set such that to every $i \in I$ there correspond a measurable space $(X_i, \mathscr{F}_i)$ and a map $\lambda_i: X \to X_i$. The $\sigma$-field on $X$ generated by the collection

$\Lambda := \{\lambda_i \mid i \in I\}$

of maps on $X$ is written as $\sigma(\Lambda)$ and

$\sigma(\Lambda) := \sigma\bigl(\bigcup_{i \in I} \sigma(\lambda_i)\bigr);$    (1.5.1)

see 1.2 and (1.1.2).

1.6. Let $(X_1, \mathscr{F}_1)$ and $(X_2, \mathscr{F}_2)$ be two measurable spaces. A bijection $\lambda: X_1 \to X_2$ is said to be a measurable isomorphism between $(X_1, \mathscr{F}_1)$ and $(X_2, \mathscr{F}_2)$ if both $\lambda$ and its inverse $\lambda^{-1}$ are measurable. If $(X_1, \mathscr{F}_1) = (X_2, \mathscr{F}_2)$, then we refer to $\lambda$ as a measurable automorphism on $(X_1, \mathscr{F}_1)$.

Let $\lambda$ be a measurable isomorphism between $(X_1, \mathscr{F}_1)$ and $(X_2, \mathscr{F}_2)$. Note that the bijection $\lambda_{\rightarrow}: \mathscr{F}_1 \to \mathscr{F}_2$ satisfies

$\lambda_{\rightarrow}(A \cap B) = \lambda_{\rightarrow}(A) \cap \lambda_{\rightarrow}(B)$    (1.6.1)

for all $A, B \in \mathscr{F}_1$.


1.7. Let $\delta$ be a measurable automorphism on the measurable space $(X, \mathscr{F})$; see 1.6. The $\sigma$-field

$J(\delta) := \{A \in \mathscr{F} \mid \delta_{\rightarrow}(A) = A\}$    (1.7.1)

is called the $\sigma$-field of invariant sets under $\delta$.

1.8. Let $(X_1, \mathscr{F}_1)$ and $(X_2, \mathscr{F}_2)$ be two measurable spaces. A bijection $\lambda: \mathscr{F}_1 \to \mathscr{F}_2$ is said to be an isomorphism between the $\sigma$-fields $\mathscr{F}_1$ and $\mathscr{F}_2$ if

$\lambda(A \cap B) = \lambda(A) \cap \lambda(B)$    (1.8.1)

for all $A, B \in \mathscr{F}_1$. If $(X_1, \mathscr{F}_1) = (X_2, \mathscr{F}_2)$, then we refer to $\lambda$ as an automorphism on the $\sigma$-field $\mathscr{F}_1$. It follows from (1.6.1) that a measurable isomorphism between $(X_1, \mathscr{F}_1)$ and $(X_2, \mathscr{F}_2)$ induces an isomorphism between the $\sigma$-fields $\mathscr{F}_1$ and $\mathscr{F}_2$. The converse, however, is not true in general.

1.9. Let $\gamma$ be an automorphism on the $\sigma$-field $\mathscr{F}$ of a measurable space $(X, \mathscr{F})$; see 1.8. The $\sigma$-field

$J(\gamma) := \{A \in \mathscr{F} \mid \gamma(A) = A\}$    (1.9.1)

is called the $\sigma$-field of invariant sets under $\gamma$.

1.10. Let $(X, \mathscr{F})$ be a measurable space. The set of probability measures on $\mathscr{F}$ is denoted by $\hat{\mathscr{F}}$. For $A \in \mathscr{F}$ consider the map

$\lambda_A(p) := p(A) \in [0, 1], \quad p \in \hat{\mathscr{F}},$    (1.10.1)

from $\hat{\mathscr{F}}$ into the unit interval. The $\sigma$-field of Borel subsets of the unit interval is written as $\mathscr{B}[0, 1]$. The $\sigma$-field generated by the collection

$\{\lambda_A \mid A \in \mathscr{F}\}$

of maps from $\hat{\mathscr{F}}$ into the measurable space $([0, 1], \mathscr{B}[0, 1])$ is denoted by $\bar{\mathscr{F}}$; see (1.5.1). We refer to $(\hat{\mathscr{F}}, \bar{\mathscr{F}})$ as the space of probability measures on the $\sigma$-field $\mathscr{F}$. Let $\mathscr{P} \subset \hat{\mathscr{F}}$ be a collection of probability measures on $\mathscr{F}$. We refer to $(\mathscr{P}, \bar{\mathscr{F}}|\mathscr{P})$ as a space of probability measures on $\mathscr{F}$; see (1.3.1).

1.11. Let $(X, \mathscr{F})$ be a measurable space and let $\mathscr{F}_0 \subset \mathscr{F}$ be a $\sigma$-field on $X$. Furthermore, let $(\mathscr{P}, \bar{\mathscr{F}}|\mathscr{P})$ be a space of probability measures on $\mathscr{F}$; see 1.10. The marginal probability measure on $\mathscr{F}_0$ corresponding to $p \in \mathscr{P}$ is denoted by $\varphi(p)$. We refer to the map

$\varphi: (\mathscr{P}, \bar{\mathscr{F}}|\mathscr{P}) \to (\hat{\mathscr{F}}_0, \bar{\mathscr{F}}_0)$    (1.11.1)

as the marginalization map on $\mathscr{P}$ corresponding to $\mathscr{F}_0$. Here $(\hat{\mathscr{F}}_0, \bar{\mathscr{F}}_0)$ is the space of probability measures on $\mathscr{F}_0$; see 1.10. The $\sigma$-field $\sigma(\varphi)$ generated by the map $\varphi$ from $\mathscr{P}$ into $(\hat{\mathscr{F}}_0, \bar{\mathscr{F}}_0)$ is the smallest $\sigma$-field on $\mathscr{P}$ such that the map

$\lambda_A(p) := p(A), \quad p \in \mathscr{P},$

from $\mathscr{P}$ into $([0, 1], \mathscr{B}[0, 1])$ is measurable for all $A \in \mathscr{F}_0$. Hence,

$\sigma(\varphi) \subset \bar{\mathscr{F}}|\mathscr{P}.$    (1.11.2)

Let $\mathscr{F}_0 \subset \mathscr{F}_1 \subset \mathscr{F}$ be two $\sigma$-fields on $X$. The marginalization maps on $\mathscr{P}$ corresponding to $\mathscr{F}_0$ and $\mathscr{F}_1$ are written as $\varphi_0$ and $\varphi_1$, respectively. We have

$\sigma(\varphi_0) \subset \sigma(\varphi_1).$    (1.11.3)

1.12. Let $X$ be a set and let $\mathscr{F}_1, \mathscr{F}_2$ be two $\sigma$-fields on $X$. The $\sigma$-fields $\mathscr{F}_1$ and $\mathscr{F}_2$ are called algebraically independent, notation $\mathscr{F}_1 \perp \mathscr{F}_2$, if every inclusion between sets of $\mathscr{F}_1$ and $\mathscr{F}_2$ is trivial, i.e. for all $A \in \mathscr{F}_1$ and $B \in \mathscr{F}_2$ we have

$A \subset B \implies A = \emptyset \text{ or } B = X.$    (1.12.1)
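For finite sample spaces, condition (1.12.1) can be checked by brute force. The sketch below is ours, not the paper's; it assumes the two $\sigma$-fields are given explicitly as collections of frozensets.

```python
def algebraically_independent(F1, F2, X):
    """Check (1.12.1): every inclusion A <= B with A in F1 and B in F2
    must be trivial, i.e. force A to be empty or B to equal X."""
    return all(not (A <= B) or not A or B == X
               for A in F1 for B in F2)

# Two sigma-fields on X = {1, 2, 3, 4}, generated by the partitions
# {{1,2},{3,4}} and {{1,3},{2,4}}; only trivial inclusions hold.
X = frozenset({1, 2, 3, 4})
F1 = {frozenset(), frozenset({1, 2}), frozenset({3, 4}), X}
F2 = {frozenset(), frozenset({1, 3}), frozenset({2, 4}), X}
print(algebraically_independent(F1, F2, X))   # True
```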

1.13. Let $(X, \mathscr{F})$ be a measurable space and let $\mathscr{P} \subset \hat{\mathscr{F}}$ be a set of probability measures on $\mathscr{F}$. The space of all probability measures on $\mathscr{F}$ is written as $(\hat{\mathscr{F}}, \bar{\mathscr{F}})$; see 1.10. The $\sigma$-field $\mathscr{F}_0 \subset \mathscr{F}$ on $X$ is said to be regular if there exists a measurable map $\psi$ from $(X \times \mathscr{P}, \mathscr{F}_0 \otimes \bar{\mathscr{F}}|\mathscr{P})$ into $(\hat{\mathscr{F}}, \bar{\mathscr{F}})$ such that

$\int_A \psi(x, p)(B)\, p(\mathrm{d}x) = p(A \cap B)$    (1.13.1)

for all $A \in \mathscr{F}_0$, $B \in \mathscr{F}$ and $p \in \mathscr{P}$. Here $\mathscr{F}_0 \otimes \bar{\mathscr{F}}|\mathscr{P}$ is the product $\sigma$-field on the Cartesian product $X \times \mathscr{P}$ corresponding to the $\sigma$-fields $\mathscr{F}_0$ and $\bar{\mathscr{F}}|\mathscr{P}$ on $X$ and $\mathscr{P}$, respectively. For $p \in \mathscr{P}$ we refer to the map

$\psi(\cdot, p): (X, \mathscr{F}_0) \to (\hat{\mathscr{F}}, \bar{\mathscr{F}})$    (1.13.2)

as the regular conditional probability measure on $\mathscr{F}$ given $\mathscr{F}_0$ corresponding to $p \in \mathscr{P}$. A measurable map $S$ from $(X, \mathscr{F})$ into a measurable space is said to be a statistic on $(X, \mathscr{F})$ if the $\sigma$-field $\sigma(S) \subset \mathscr{F}$ on $X$ generated by $S$ is regular.

1.14. Let $(X, \mathscr{F})$ be a measurable space and let $\mathscr{P} \subset \hat{\mathscr{F}}$ be a set of probability measures on $\mathscr{F}$. The space of all probability measures on $\mathscr{F}$ is written as $(\hat{\mathscr{F}}, \bar{\mathscr{F}})$; see 1.10. Let $\mathscr{F}_0 \subset \mathscr{F}$ be a regular $\sigma$-field on $X$; see 1.13. In general the map $\psi$ satisfying (1.13.1) is not unique. Let $\mathscr{C}$ be the nonempty collection of maps $\psi$ satisfying (1.13.1). Fix $x \in X$ and let $\sigma(\psi(x, \cdot)) \subset \bar{\mathscr{F}}|\mathscr{P}$ be the $\sigma$-field on $\mathscr{P}$ generated by the map $\psi(x, \cdot)$ from $\mathscr{P}$ into $(\hat{\mathscr{F}}, \bar{\mathscr{F}})$. The collection $\mathscr{C}$ is partially ordered by $\le_{;x}$ defined as

$\psi_1 \le_{;x} \psi_2 \;:\Longleftrightarrow\; \sigma(\psi_1(x, \cdot)) \subset \sigma(\psi_2(x, \cdot)), \quad \psi_1, \psi_2 \in \mathscr{C}.$    (1.14.1)


2. Probability structure

Consider an experiment. The set of possible outcomes of the experiment is denoted by $\Omega$. The $\sigma$-field of events on $\Omega$ is written as $\Sigma$. The measurable space $(\Omega, \Sigma)$ is said to be the sample space of the experiment. Let $(\hat{\Sigma}, \bar{\Sigma})$ be the space of probability measures on $\Sigma$; see 1.10. The probability distribution on $\Sigma$ corresponding to the outcome of the experiment is not known. However, a subset $\mathscr{P} \subset \hat{\Sigma}$ is given such that the probability distribution of the outcome of the experiment is in the set $\mathscr{P}$. The space $(\mathscr{P}, \bar{\Sigma}|\mathscr{P})$ of probability measures on $\Sigma$ is referred to as the probability model for the outcome of the experiment; see 1.10.

Consider the $\sigma$-ring

$\mathscr{N} := \{A \in \Sigma \mid p(A) = 0 \text{ for all } p \in \mathscr{P}\}$    (2.1)

of the so-called negligible subsets of the sample space. All equalities and inclusions between sets in $\Sigma$ have to be interpreted modulo $\mathscr{N}$, i.e. for all $A, B \in \Sigma$

$A \subset B \;:\Longleftrightarrow\; A \setminus B \in \mathscr{N}.$    (2.2)

The two measurable spaces

sample space: $(\Omega, \Sigma)$,
probability model: $(\mathscr{P}, \bar{\Sigma}|\mathscr{P})$

are said to constitute the probability structure of the experiment.
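In the finite case a probability structure can be stored directly in the matrix form used in Section 11: one row per measure in $\mathscr{P}$, one column per outcome, with probabilities proportional to the entries (see (11.1)). A minimal sketch in Python, ours rather than the authors', using the matrix of Example 11.1:

```python
import numpy as np

# Matrix notation of Section 11: rows are the measures A, B, C in P,
# columns are the outcomes 1, 2, 3; p({j}) = M[i, j] / (row sum), see (11.1).
M = np.array([[0, 0, 5],
              [2, 2, 1],
              [1, 1, 3]], dtype=float)
P = M / M.sum(axis=1, keepdims=True)   # the probability model, row by row
print(P)
```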

3. Sufficiency

Let $(\Omega, \Sigma)$ be the sample space and $(\mathscr{P}, \bar{\Sigma}|\mathscr{P})$ the probability model of a probability structure. Consider a statistic $S$ on the sample space; see 1.13. The $\sigma$-field on $\Omega$ generated by $S$ is denoted by $\sigma(S) \subset \Sigma$; see (1.1.2). The marginalization map from $\mathscr{P}$ into the space $(\widehat{\sigma(S)}, \overline{\sigma(S)})$ of probability measures on $\sigma(S)$ is written as $\varphi$; see 1.11. Let

$\psi: (\Omega \times \mathscr{P}, \sigma(S) \otimes \bar{\Sigma}|\mathscr{P}) \to (\hat{\Sigma}, \bar{\Sigma})$    (3.1)

be a map as described in 1.13. Here $(\hat{\Sigma}, \bar{\Sigma})$ is the space of probability measures on $\Sigma$; see 1.10. Hence, for $p \in \mathscr{P}$ the map $\psi(\cdot, p)$ from $(\Omega, \sigma(S))$ into $(\hat{\Sigma}, \bar{\Sigma})$ is a regular conditional probability measure given $S$ corresponding to $p \in \mathscr{P}$. Fix $\omega \in \Omega$ and consider the map

$\psi(\omega, \cdot): (\mathscr{P}, \bar{\Sigma}|\mathscr{P}) \to (\hat{\Sigma}, \bar{\Sigma}).$    (3.2)

The $\sigma$-field on $\mathscr{P}$ generated by the map $\psi(\omega, \cdot)$ from $\mathscr{P}$ into $(\hat{\Sigma}, \bar{\Sigma})$ is denoted by $\sigma(\psi(\omega, \cdot)) \subset \bar{\Sigma}|\mathscr{P}$.


The statistic $S$ on $(\Omega, \Sigma)$ is said to be sufficient if the conditional probability measure $\psi(\omega, p)$ given $S = S(\omega)$ is independent of $p \in \mathscr{P}$ for all $\omega \in \Omega$, i.e. the map $\psi$ in (3.1) can be chosen such that

$\sigma(\psi(\omega, \cdot)) = \{\emptyset, \mathscr{P}\}$    (3.3)

for all $\omega \in \Omega$. The sufficient statistic $S$ is said to be minimal sufficient if for all sufficient statistics $S'$ on $(\Omega, \Sigma)$ we have

$\sigma(S') \subset \sigma(S) \implies \sigma(S') = \sigma(S).$    (3.4)

Two statistics are considered to be identical if they generate the same $\sigma$-field on the sample space. In that sense there exists in general a unique minimal sufficient statistic. Let $S$ be the unique minimal sufficient statistic. So for every sufficient statistic $S'$ we have

$\sigma(S) \subset \sigma(S').$    (3.5)

We now transform the given probability structure, and we refer to this transformation as the sufficient reduction of the probability structure under consideration.

The sample space of the new probability structure is

$(\Omega, \sigma(S)).$    (3.6)

The probability model of the new probability structure can be written as

$(\mathscr{P}_1, \overline{\sigma(S)}|\mathscr{P}_1),$    (3.7)

where

$\mathscr{P}_1 := \{\varphi(p) \in \widehat{\sigma(S)} \mid p \in \mathscr{P}\}.$

The conditional probability measure $\psi(\omega, p)$ is independent of $p \in \mathscr{P}$ and therefore the map

$\varphi: (\mathscr{P}, \bar{\Sigma}|\mathscr{P}) \to (\mathscr{P}_1, \overline{\sigma(S)}|\mathscr{P}_1)$    (3.8)

is a measurable isomorphism; see 1.6. If

$\sigma(S) = \Sigma,$    (3.9)

then the sufficient reduction of the probability structure is said to be trivial. For an illustration of the concepts in this section we refer to Example 11.1.
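Condition (3.3) is easy to verify numerically in a finite structure: within every block of the partition induced by $S$, the conditional distribution must be the same for every $p$ that gives the block positive probability. A sketch (our code, not the paper's) checking this for Example 11.1:

```python
import numpy as np

M = np.array([[0, 0, 5],                   # A
              [2, 2, 1],                   # B
              [1, 1, 3]], dtype=float)     # C
P = M / M.sum(axis=1, keepdims=True)
blocks = [[0, 1], [2]]                     # S(1) = S(2) = 1, S(3) = 3 (0-indexed)

def is_sufficient(P, blocks, tol=1e-12):
    # (3.3): conditionals given S must not depend on p where they are defined
    for block in blocks:
        conds = [p[block] / p[block].sum() for p in P if p[block].sum() > tol]
        if not conds:
            continue
        if any(np.max(np.abs(c - conds[0])) > tol for c in conds[1:]):
            return False
    return True

print(is_sufficient(P, blocks))   # True: S is sufficient, as in the text
```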

4. Conditioning

Let $(\Omega, \Sigma)$ be the sample space and $(\mathscr{P}, \bar{\Sigma}|\mathscr{P})$ the probability model of a probability structure. Consider a statistic $C$ on the sample space. The $\sigma$-field on $\Omega$ generated by $C$ is denoted by $\sigma(C) \subset \Sigma$; see (1.1.2). The marginalization map from $\mathscr{P}$ into the space


$(\widehat{\sigma(C)}, \overline{\sigma(C)})$ of probability measures on $\sigma(C)$ is written as $\varphi$; see 1.11. The $\sigma$-field on $\mathscr{P}$ generated by $\varphi$ is written as $\sigma(\varphi) \subset \bar{\Sigma}|\mathscr{P}$. The statistic $C$ is said to be an ancillary statistic if the probability distribution of $C$ is independent of $p \in \mathscr{P}$, i.e.

$\sigma(\varphi) = \{\emptyset, \mathscr{P}\}.$    (4.1)

The ancillary statistic $C$ is said to be a maximal ancillary statistic if for all ancillary statistics $C'$ on $(\Omega, \Sigma)$ we have

$\sigma(C) \subset \sigma(C') \implies \sigma(C) = \sigma(C').$    (4.2)

In general there does not exist a unique maximal ancillary statistic. The probability distribution of a maximal ancillary statistic $C$ is independent of $p \in \mathscr{P}$, and therefore we can consider the experiment as a mixture of experiments corresponding to the possible values of $C$. So we transform, for every $\omega \in \Omega$, the given probability structure in the following way.

Let $C$ be a maximal ancillary statistic on $(\Omega, \Sigma)$. Let

$\psi: (\Omega \times \mathscr{P}, \sigma(C) \otimes \bar{\Sigma}|\mathscr{P}) \to (\hat{\Sigma}, \bar{\Sigma})$    (4.3)

be a map as described in 1.13. Fix $\omega \in \Omega$ and suppose that $\psi$ in (4.3) has been chosen such that $\psi$ is minimal with respect to the partial order $\le_{;\omega}$ in (1.14.1). The sample space of the new probability structure is

$(\Omega, \Sigma).$    (4.4)

The probability model of the new probability structure can be written as

$(\mathscr{P}^\omega, \bar{\Sigma}|\mathscr{P}^\omega),$    (4.5)

where

$\mathscr{P}^\omega := \{\psi(\omega, p) \in \hat{\Sigma} \mid p \in \mathscr{P}\}.$

If

$\sigma(C) = \{\emptyset, \Omega\},$    (4.6)

then the maximal ancillary statistic $C$ is said to be trivial. For an illustration of the concepts in this section we refer to Examples 11.2 and 11.3.
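Condition (4.1) admits an equally direct finite-case check: the vector of block probabilities of $C$ must be the same under every $p \in \mathscr{P}$. A small sketch under that convention; the matrix here is hypothetical, chosen only so that both rows split their mass equally over the two blocks.

```python
import numpy as np

def is_ancillary(P, blocks, tol=1e-12):
    # (4.1): the distribution of C must be the same for every measure p
    marg = np.array([[p[b].sum() for b in blocks] for p in P])
    return bool(np.all(np.abs(marg - marg[0]) < tol))

P = np.array([[0.3, 0.2, 0.1, 0.4],      # hypothetical measures
              [0.1, 0.4, 0.3, 0.2]])
blocks = [[0, 1], [2, 3]]                # C = 1 on {1,2}, C = 2 on {3,4}
print(is_ancillary(P, blocks))           # True: both rows give (1/2, 1/2)
```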

5. Interest specification and inference model

Let $(\Omega, \Sigma)$ be the sample space and let $(\mathscr{P}, \bar{\Sigma}|\mathscr{P})$ be the probability model of a probability structure such that the sufficient reduction is trivial, i.e. for every sufficient statistic $S$ on $(\Omega, \Sigma)$ we have

$\sigma(S) = \Sigma,$    (5.1)


and every ancillary statistic is trivial, i.e. for every ancillary statistic $C$ on $(\Omega, \Sigma)$ we have

$\sigma(C) = \{\emptyset, \Omega\}.$    (5.2)

Let $p_0$ be the probability distribution of the outcome of the experiment. It may be that one is interested only in a specific aspect of $p_0$, i.e. a $\sigma$-field $\mathscr{B} \subset \bar{\Sigma}|\mathscr{P}$ is specified such that every inferential statement can be written as $p_0 \in A$ with $A \in \mathscr{B}$. We refer to $\mathscr{B}$ as the $\sigma$-field of interest.

The probability models $(\mathscr{P}, \bar{\Sigma}|\mathscr{P})$ and $(\mathscr{P}^\omega, \bar{\Sigma}|\mathscr{P}^\omega)$, $\omega \in \Omega$ (see (4.5)), are not isomorphic in general, and therefore the initial specification of the $\sigma$-field $\mathscr{B}$ of interest should take place in a probability structure satisfying (5.2).

The triple

sample space: $(\Omega, \Sigma)$,
probability model: $(\mathscr{P}, \bar{\Sigma}|\mathscr{P})$,
$\sigma$-field of interest: $\mathscr{B} \subset \bar{\Sigma}|\mathscr{P}$

is said to constitute an inference model for the experiment. Specific $\sigma$-fields of interest can be found in Examples 11.4-11.11.

6. Invariance

Consider a group of transformations on the sample space such that the parameter space is invariant under the induced transformations. If the distribution of some statistic can be parametrized by a set of parameters including the parameter of interest and the statistic is constant on each orbit and takes different values on different orbits of the above group, then statistical inference should be based on the value of the statistic and its distribution. This is the invariance principle; compare G-sufficiency in Barndorff-Nielsen (1978). A measure-theoretic formulation is given below.

Let $(\Omega, \Sigma)$ be the sample space, $(\mathscr{P}, \bar{\Sigma}|\mathscr{P})$ the probability model and $\mathscr{B} \subset \bar{\Sigma}|\mathscr{P}$ the $\sigma$-field of interest of an inference model. The set of automorphisms of the $\sigma$-field $\Sigma$ is denoted by $\Gamma$ (see 1.8) and the set of measurable automorphisms of $(\mathscr{P}, \bar{\Sigma}|\mathscr{P})$ by $\Delta$ (see 1.6). Introduce the set

$V := \{(\gamma, \delta) \in \Gamma \times \Delta \mid \delta(p)(\gamma(A)) = p(A) \text{ for all } p \in \mathscr{P},\ A \in \Sigma\}.$    (6.1)

For $(\gamma, \delta) \in V$ let $J(\gamma) \subset \Sigma$ be the $\sigma$-field of invariant sets under $\gamma$, and let $J(\delta) \subset \bar{\Sigma}|\mathscr{P}$ be the $\sigma$-field of invariant sets under $\delta$; see (1.9.1) and (1.7.1). The marginalization map on $\mathscr{P}$ corresponding to $J(\gamma)$ is written as $\varphi_\gamma$, i.e. for $(\gamma, \delta) \in V$

$\varphi_\gamma: (\mathscr{P}, \bar{\Sigma}|\mathscr{P}) \to (\hat{J}(\gamma), \bar{J}(\gamma)),$    (6.2)

where $(\hat{J}(\gamma), \bar{J}(\gamma))$ is the space of probability measures on $J(\gamma)$; see 1.11. The $\sigma$-field on $\mathscr{P}$ generated by $\varphi_\gamma$ is denoted by $\sigma(\varphi_\gamma) \subset \bar{\Sigma}|\mathscr{P}$.


Consider a statistic $I$ on the sample space. The $\sigma$-field on $\Omega$ generated by $I$ is denoted by $\sigma(I) \subset \Sigma$; see (1.1.2). The marginalization map from $\mathscr{P}$ into the space $(\widehat{\sigma(I)}, \overline{\sigma(I)})$ of probability measures on $\sigma(I)$ is written as $\varphi$; see 1.11. The $\sigma$-field on $\mathscr{P}$ generated by $\varphi$ is denoted by $\sigma(\varphi) \subset \bar{\Sigma}|\mathscr{P}$. The statistic $I$ is said to be invariant if the interesting aspect of $p \in \mathscr{P}$ is a function of the marginal distribution $\varphi(p)$, i.e.

$\mathscr{B} \subset \sigma(\varphi),$    (6.3)

and there exists a set $B \subset V$ such that

$\sigma(I) = \bigcap_{(\gamma,\delta) \in B} J(\gamma).$    (6.4)

The invariant statistic $I$ is said to be minimal invariant if for all invariant statistics $I'$ we have

$\sigma(I') \subset \sigma(I) \implies \sigma(I') = \sigma(I).$    (6.5)

If a statistic satisfies (6.4) for some $B \subset V$, then in the literature the statistic is said to be maximal invariant with respect to $B \subset V$; see Giri (1982).

Let $I$ be an invariant statistic. So there exists $B \subset V$ such that (6.3) and (6.4) are satisfied. If $A \in \mathscr{B}$, then $A$ is invariant under $\delta$ for every $(\gamma, \delta) \in B$, i.e.

$\mathscr{B} \subset \bigcap_{(\gamma,\delta) \in B} J(\delta).$    (6.6)

We now prove (6.6). For $(\gamma, \delta) \in V$, $A \in J(\gamma)$ and $0 \le x \le 1$ consider

$W := \{p \in \mathscr{P} \mid p(A) \le x\} \in \sigma(\varphi_\gamma).$

For $p \in W$ we get

$\delta(p)(A) = \delta(p)(\gamma(A)) = p(A) \le x,$
$\delta^{-1}(p)(A) = \delta^{-1}(p)(\gamma^{-1}(A)) = p(A) \le x.$

So we conclude $W \in J(\delta)$, and therefore $\sigma(\varphi_\gamma) \subset J(\delta)$ for all $(\gamma, \delta) \in V$. Use (1.11.3) to obtain $\sigma(\varphi) \subset \sigma(\varphi_\gamma)$ for all $(\gamma, \delta) \in B$. Hence, for all $(\gamma, \delta) \in B$ we have

$\mathscr{B} \subset \sigma(\varphi) \subset \sigma(\varphi_\gamma) \subset J(\delta) \subset \bar{\Sigma}|\mathscr{P},$    (6.7)

which completes the proof of (6.6).

We now discuss the existence of a unique invariant statistic. Introduce the set

$Q := \{(\gamma, \delta) \in V \mid \mathscr{B} \subset \sigma(\varphi_\gamma)\},$    (6.8)

and the $\sigma$-field

$\Sigma_m := \bigcap_{(\gamma,\delta) \in Q} J(\gamma) \subset \Sigma.$    (6.9)


Let $I_m$ be a statistic such that

$\sigma(I_m) = \Sigma_m,$    (6.10)

and let $\varphi_m$ be the marginalization map from $\mathscr{P}$ into the space $(\hat{\Sigma}_m, \bar{\Sigma}_m)$ of probability measures on $\Sigma_m$. The $\sigma$-field on $\mathscr{P}$ generated by $\varphi_m$ is denoted by $\sigma(\varphi_m) \subset \bar{\Sigma}|\mathscr{P}$. We have

$\sigma(\varphi_m) \subset \bigcap_{(\gamma,\delta) \in Q} \sigma(\varphi_\gamma).$    (6.11)

We now conclude the following. If

$\mathscr{B} \subset \sigma(\varphi_m),$    (6.12)

then $I_m$ is the unique minimal invariant statistic.

We transform the given inference model and refer to this transformation as the invariant reduction of the inference model under consideration. Let $I$ be a minimal invariant statistic on the sample space. The sample space of the new inference model is

$(\Omega, \sigma(I)).$    (6.13)

The probability model of the new inference model can be written as

$(\mathscr{P}_1, \overline{\sigma(I)}|\mathscr{P}_1),$    (6.14)

where

$\mathscr{P}_1 := \{\varphi(p) \in \widehat{\sigma(I)} \mid p \in \mathscr{P}\}.$

For $A \in \mathscr{B} \subset \sigma(\varphi)$ we have by use of Theorem 1.4

$\varphi_{\leftarrow}\varphi_{\rightarrow}(A) = A,$

and

$\mathscr{B}_1 := \{\varphi_{\rightarrow}(A) \mid A \in \mathscr{B}\}$

is a $\sigma$-field on $\mathscr{P}_1$. Therefore the $\sigma$-field $\mathscr{B}_1 \subset \overline{\sigma(I)}|\mathscr{P}_1$ is the $\sigma$-field of interest in the new inference model. If

$\sigma(I) = \Sigma,$    (6.15)

then the invariant reduction of the inference model is said to be trivial. For an illustration of the concepts in this section we refer to Examples 11.4 and 11.5.
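The defining property of $V$ in (6.1) can be verified by enumeration in a finite inference model. The sketch below (our code, with 0-indexed outcomes) checks, for Example 11.4, that the permutation $\gamma$ swapping outcomes 1 and 2 and the permutation $\delta$ swapping $A_1$ and $A_2$ satisfy $\delta(p)(\gamma(A)) = p(A)$ for every measure and every event:

```python
import numpy as np
from itertools import chain, combinations

M = np.array([[1, 0, 2, 1],    # A1
              [0, 1, 2, 1],    # A2
              [1, 1, 1, 1],    # A3
              [1, 1, 2, 0]])   # B1
P = M / M.sum(axis=1, keepdims=True)
gamma = [1, 0, 2, 3]           # swaps outcomes 1 and 2 (0-indexed)
delta = [1, 0, 2, 3]           # swaps measures A1 and A2

events = list(chain.from_iterable(combinations(range(4), r) for r in range(5)))
in_V = all(abs(P[delta[i]][[gamma[j] for j in A]].sum()
               - P[i][list(A)].sum()) < 1e-12
           for i in range(4) for A in events)
print(in_V)   # True: (gamma, delta) is in V, and J(gamma) = sigma({1,2},{3},{4})
```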

7. Partial sufficiency

If the distribution of some statistic can be parametrized by a set of parameters including the parameter of interest, and the conditional distribution of the observation, given the value of the statistic, is not informative with respect to the above set of parameters, then statistical inference should be based on the value of the statistic and its distribution. This is the sufficiency principle in the presence of incidental (or nuisance) parameters; compare S-sufficiency in Barndorff-Nielsen (1978, pp. 47, 50) and strong and weak p-sufficiency in Bhapkar (1991, p. 188). In the case of S-sufficiency and strong or weak p-sufficiency the distribution of the statistic depends on the parameter of interest alone. Moreover, in the case of S-sufficiency and strong p-sufficiency the conditional distribution of the observation does not depend on the parameter of interest. So all these definitions are generalized by our concept of partial sufficiency. A measure-theoretic formulation is given below.

Let $(\Omega, \Sigma)$ be the sample space, $(\mathscr{P}, \bar{\Sigma}|\mathscr{P})$ the probability model and $\mathscr{B} \subset \bar{\Sigma}|\mathscr{P}$ the $\sigma$-field of interest of an inference model. Consider a statistic $R$ on $(\Omega, \Sigma)$. The $\sigma$-field on $\Omega$ generated by $R$ is denoted by $\sigma(R) \subset \Sigma$; see (1.1.2). The marginalization map from $\mathscr{P}$ into the space $(\widehat{\sigma(R)}, \overline{\sigma(R)})$ of probability measures on $\sigma(R)$ is written as $\varphi$; see 1.11. The $\sigma$-field on $\mathscr{P}$ generated by $\varphi$ is written as $\sigma(\varphi) \subset \bar{\Sigma}|\mathscr{P}$. Let

$\psi: (\Omega \times \mathscr{P}, \sigma(R) \otimes \bar{\Sigma}|\mathscr{P}) \to (\hat{\Sigma}, \bar{\Sigma})$    (7.1)

be a map as described in 1.13. Fix $\omega \in \Omega$ and consider the map

$\psi(\omega, \cdot): (\mathscr{P}, \bar{\Sigma}|\mathscr{P}) \to (\hat{\Sigma}, \bar{\Sigma}).$    (7.2)

The $\sigma$-field on $\mathscr{P}$ generated by the map $\psi(\omega, \cdot)$ from $\mathscr{P}$ into $(\hat{\Sigma}, \bar{\Sigma})$ is denoted by $\sigma(\psi(\omega, \cdot)) \subset \bar{\Sigma}|\mathscr{P}$.

The statistic $R$ is said to be partially sufficient if the interesting aspect of $p \in \mathscr{P}$ is a function of the marginal probability distribution $\varphi(p)$, i.e.

$\mathscr{B} \subset \sigma(\varphi),$    (7.3)

and the conditional probability distribution $\psi(\omega, p)$ is not informative with respect to the marginal distribution $\varphi(p)$ for all $\omega \in \Omega$, i.e. if $\psi$ in (7.1) can be chosen such that

$\sigma(\psi(\omega, \cdot)) \perp \sigma(\varphi)$    (7.4)

for all $\omega \in \Omega$; see 1.12.

The partially sufficient statistic $R$ is said to be minimal partially sufficient if for all partially sufficient statistics $R'$ on $(\Omega, \Sigma)$ we have

$\sigma(R') \subset \sigma(R) \implies \sigma(R') = \sigma(R).$    (7.5)

We conjecture that there exists a unique minimal partially sufficient statistic in general.

Let $R$ be a minimal partially sufficient statistic. We now transform the given inference model and refer to this transformation as the partially sufficient reduction of the inference model under consideration. The sample space of the new inference model is

$(\Omega, \sigma(R)).$    (7.6)


The probability model of the new inference model can be written as

$(\mathscr{P}_1, \overline{\sigma(R)}|\mathscr{P}_1),$    (7.7)

where

$\mathscr{P}_1 := \{\varphi(p) \in \widehat{\sigma(R)} \mid p \in \mathscr{P}\}.$

For $A \in \mathscr{B} \subset \sigma(\varphi)$ we have by use of Theorem 1.4

$\varphi_{\leftarrow}\varphi_{\rightarrow}(A) = A,$

and

$\mathscr{B}_1 := \{\varphi_{\rightarrow}(A) \mid A \in \mathscr{B}\}$

is a $\sigma$-field on $\mathscr{P}_1$. Therefore the $\sigma$-field $\mathscr{B}_1 \subset \overline{\sigma(R)}|\mathscr{P}_1$ is the $\sigma$-field of interest in the new inference model. If

$\sigma(R) = \Sigma,$    (7.8)

then the partially sufficient reduction of the inference model is said to be trivial. For an illustration of the concepts in this section we refer to Examples 11.6 and 11.7.

8. Partial conditioning

If the conditional distribution of the observation, given the value of some statistic, can be parametrized by a set of parameters including the parameter of interest, and the distribution of this statistic is not informative with respect to the parameter of interest, then statistical inference should be based on the observation and its conditional distribution. The marginal distribution of the statistic may be essential for the verification of inference rules; see Fisher (1961). This is the conditioning principle in the presence of incidental parameters; compare S-ancillarity in Barndorff-Nielsen (1978, pp. 47, 50) and strong and weak p-ancillarity in Bhapkar (1989, p. 141). By arguments similar to those in Section 7 it follows that all these definitions are generalized by our concept of partial ancillarity. A measure-theoretic formulation is given below.

Let $(\Omega, \Sigma)$ be the sample space, $(\mathscr{P}, \bar{\Sigma}|\mathscr{P})$ the probability model and $\mathscr{B} \subset \bar{\Sigma}|\mathscr{P}$ the $\sigma$-field of interest of an inference model. Consider a statistic $A$ on $(\Omega, \Sigma)$. The $\sigma$-field on $\Omega$ generated by $A$ is denoted by $\sigma(A) \subset \Sigma$; see (1.1.2). The marginalization map from $\mathscr{P}$ into the space $(\widehat{\sigma(A)}, \overline{\sigma(A)})$ of probability measures on $\sigma(A)$ is written as $\varphi$; see 1.11. The $\sigma$-field on $\mathscr{P}$ generated by $\varphi$ is written as $\sigma(\varphi) \subset \bar{\Sigma}|\mathscr{P}$. Let

$\psi: (\Omega \times \mathscr{P}, \sigma(A) \otimes \bar{\Sigma}|\mathscr{P}) \to (\hat{\Sigma}, \bar{\Sigma})$    (8.1)


be a map as described in 1.13. Fix $\omega \in \Omega$ and consider the map

$\psi(\omega, \cdot): (\mathscr{P}, \bar{\Sigma}|\mathscr{P}) \to (\hat{\Sigma}, \bar{\Sigma}).$    (8.2)

The $\sigma$-field on $\mathscr{P}$ generated by $\psi(\omega, \cdot)$ is denoted by $\sigma(\psi(\omega, \cdot))$.

The statistic $A$ is said to be a partially ancillary statistic with respect to $\omega \in \Omega$ if the marginal probability distribution $\varphi(p)$ is not informative with respect to the interesting aspect of $p \in \mathscr{P}$, i.e.

$\sigma(\varphi) \perp \mathscr{B}$    (8.3)

(see 1.12), and the interesting aspect of $p \in \mathscr{P}$ is a function of the conditional probability distribution $\psi(\omega, p)$, i.e. the map $\psi$ in (8.1) can be chosen such that

$\mathscr{B} \subset \sigma(\psi(\omega, \cdot)).$    (8.4)

The partially ancillary statistic $A$ is said to be a maximal partially ancillary statistic if for all partially ancillary statistics $A'$ on $(\Omega, \Sigma)$ we have

$\sigma(A) \subset \sigma(A') \implies \sigma(A) = \sigma(A').$    (8.5)

In general there does not exist a unique maximal partially ancillary statistic with respect to $\omega \in \Omega$.

Let $A$ be a maximal partially ancillary statistic on $(\Omega, \Sigma)$ with respect to $\omega \in \Omega$. We transform the inference model in the following way. The sample space of the new inference model is

$(\Omega, \Sigma).$    (8.6)

Let $\mathscr{C}^\omega$ be the nonempty collection of maps $\psi$ in (8.1) satisfying (8.3) and (8.4). Let $\psi_1 \in \mathscr{C}^\omega$ be minimal with respect to the partial order $\le_{;\omega}$ in (1.14.1). The probability model of the new inference model can be written as

$(\mathscr{P}^\omega, \bar{\Sigma}|\mathscr{P}^\omega),$    (8.7)

where

$\mathscr{P}^\omega := \{\psi_1(\omega, p) \in \hat{\Sigma} \mid p \in \mathscr{P}\}.$

For $A' \in \mathscr{B} \subset \sigma(\psi_1(\omega, \cdot))$ we have by use of Theorem 1.4

$\psi_1(\omega, \cdot)_{\leftarrow}\psi_1(\omega, \cdot)_{\rightarrow}(A') = A',$

and

$\mathscr{B}^\omega := \{\psi_1(\omega, \cdot)_{\rightarrow}(A') \in \bar{\Sigma}|\mathscr{P}^\omega \mid A' \in \mathscr{B}\}$    (8.8)

is a $\sigma$-field on $\mathscr{P}^\omega$. Therefore the $\sigma$-field $\mathscr{B}^\omega \subset \bar{\Sigma}|\mathscr{P}^\omega$ is the $\sigma$-field of interest in the new inference model. If

$\sigma(A) = \{\emptyset, \Omega\},$    (8.9)

then the partially ancillary statistic $A$ is said to be trivial.


For an illustration of the concepts in this section we refer to Examples 11.8 and 11.9.

9. Reference model

In this section we propose a sequence in which the transformations described in the previous sections should be performed. We refer to this scheme of data reduction as the SCIRA data reduction, where

S: sufficiency,
C: conditioning,
I: invariance,
R: partial sufficiency,
A: partial conditioning.

Let $(\Omega, \Sigma)$ be the sample space and $(\mathscr{P}, \bar{\Sigma}|\mathscr{P})$ the probability model of a probability structure. By use of the unique minimal sufficient statistic $S$ we transform the probability structure $[(\Omega, \Sigma), (\mathscr{P}, \bar{\Sigma}|\mathscr{P})]$ as described in Section 3.

Let $(\Omega_S, \Sigma_S)$ be the sample space and $(\mathscr{P}_S, \bar{\Sigma}_S|\mathscr{P}_S)$ the probability model of the new probability structure. Hence, for every sufficient statistic $S$ on $(\Omega_S, \Sigma_S)$ we have

$\sigma(S) = \Sigma_S;$    (9.1)

see (3.9). By use of a maximal ancillary statistic $C$ we transform the probability structure $[(\Omega_S, \Sigma_S), (\mathscr{P}_S, \bar{\Sigma}_S|\mathscr{P}_S)]$ as described in Section 4.

Let $(\Omega_C, \Sigma_C)$ be the sample space and $(\mathscr{P}_C, \bar{\Sigma}_C|\mathscr{P}_C)$ the probability model of the new probability structure. Hence, for every ancillary statistic $C$ on $(\Omega_C, \Sigma_C)$ we have

$\sigma(C) = \{\emptyset, \Omega_C\}$    (9.2)

(see (4.6)), and for every sufficient statistic $S$ on $(\Omega_C, \Sigma_C)$ we have

$\sigma(S) = \Sigma_C$    (9.3)

(see (5.2) and (5.1)). We now specify the $\sigma$-field $\mathscr{B} \subset \bar{\Sigma}_C|\mathscr{P}_C$ of interest. The triple $[(\Omega_C, \Sigma_C), (\mathscr{P}_C, \bar{\Sigma}_C|\mathscr{P}_C), \mathscr{B}]$ constitutes the inference model under consideration; see Section 5. By use of a minimal invariant statistic $I$ on $(\Omega_C, \Sigma_C)$ we transform the inference model under consideration as described in Section 6.

Let $(\Omega_I, \Sigma_I)$ be the sample space, $(\mathscr{P}_I, \bar{\Sigma}_I|\mathscr{P}_I)$ the probability model and $\mathscr{B}_I \subset \bar{\Sigma}_I|\mathscr{P}_I$ the $\sigma$-field of interest of the new inference model. As illustrated by Example 11.4, there may exist a nontrivial invariant statistic on $(\Omega_I, \Sigma_I)$. In that case the invariant reduction of the inference model $[(\Omega_I, \Sigma_I), (\mathscr{P}_I, \bar{\Sigma}_I|\mathscr{P}_I), \mathscr{B}_I]$ as described in Section 6 must be applied again. Now suppose that every invariant statistic on $(\Omega_I, \Sigma_I)$ is trivial. By use of a minimal partially sufficient statistic $R$ on $(\Omega_I, \Sigma_I)$ we transform the inference model $[(\Omega_I, \Sigma_I), (\mathscr{P}_I, \bar{\Sigma}_I|\mathscr{P}_I), \mathscr{B}_I]$ as described in Section 7.


Let $(\Omega_R, \Sigma_R)$ be the sample space, $(\mathscr{P}_R, \bar{\Sigma}_R|\mathscr{P}_R)$ the probability model and $\mathscr{B}_R \subset \bar{\Sigma}_R|\mathscr{P}_R$ the $\sigma$-field of interest of the new inference model. Suppose that every invariant or partially sufficient statistic on $(\Omega_R, \Sigma_R)$ is trivial. If this is not true, then we start with the invariant reduction again. By use of a maximal partially ancillary statistic $A$ we transform the inference model $[(\Omega_R, \Sigma_R), (\mathscr{P}_R, \bar{\Sigma}_R|\mathscr{P}_R), \mathscr{B}_R]$ as described in Section 8.

Let $(\Omega_A, \Sigma_A)$ be the sample space, $(\mathscr{P}_A, \bar{\Sigma}_A|\mathscr{P}_A)$ the probability model and $\mathscr{B}_A \subset \bar{\Sigma}_A|\mathscr{P}_A$ the $\sigma$-field of interest of the new inference model. If every invariant or partially sufficient or partially ancillary statistic on $(\Omega_A, \Sigma_A)$ is trivial, then the inference model $[(\Omega_A, \Sigma_A), (\mathscr{P}_A, \bar{\Sigma}_A|\mathscr{P}_A), \mathscr{B}_A]$ is said to be a reference model. If the new inference model is not a reference model, then we start with the invariant reduction again. Hence, the SCIRA data reduction ultimately yields a reference model.

The SCIRA data reduction scheme is depicted in the figure below. Double arrows should be followed only if the corresponding reduction is nontrivial.

[Figure: probability structure ==S==> ==C==> inference model ==I==> ==R==> ==A==> reference model; the interest specification enters after the C-step, and the I-, R- and A-steps are repeated until all of them are trivial.]

The SCIRA data reduction is illustrated in Examples 11.10 and 11.11.

The reduction and conditioning transformations described in the previous sections seem to be self-evident. However, the specific sequence described above may be open to debate. The I-, R- and A-steps in the SCIRA scheme are only possible after interest has been specified, so the S- and C-steps should come first. Reduction of the observation should precede conditioning, so the S-step precedes the C-step. Similarly, the A-step is preceded by the reducing I- and R-steps. The authors conjecture that the I- and R-steps may be interchanged without changing the outcome. The authors considered other alternative sequences, but all of these seem to be inadequate in the sense that for every sequence there is a crucial example where it leads to unsatisfactory results. So, in statistical inference, every probability structure should be transformed according to SCIRA, and every statistical inference problem is equivalent to the construction of inference rules in the reference models resulting from SCIRA.

10. Discussion

This paper does not solve any particular statistical inference problem. However, it formulates the principles of reduction and ancillary classification of the observation(s).


This formulation should be independent of the parametrization of the probability model under consideration. Therefore we formulate these principles in a measure-theoretic context. As mentioned in the previous section, the solution of a statistical inference problem is equivalent to the construction of inference rules in the reference models resulting from SCIRA. In two forthcoming papers the authors will describe a general method to construct inference rules and their confidence levels. Again, like the SCIRA data reduction and classification, this method is an elaboration and extension of Fisher's work, especially of his idea of sampling the reference set (Fisher, 1961). In Fisher's terminology SCIRA gives 'the set to be sampled'. We briefly illustrate the idea of this general method by use of the inference model in Example 11.9. In this case the set to be sampled can be written as

$\{(x, s^2) \mid x \in \mathbb{R}\},$

where $s^2$ is the observed value of the statistic $S^2$. The confidence level of an inference rule for the parameter $\mu$ should be based on the conditional distribution of $X$ given $S^2 = s^2$. However, this distribution depends on $\sigma^2$ also. By use of the marginal distribution of the partially ancillary statistic $S^2$, a fiducial distribution for $\sigma^2$ is obtained for every observation $s^2$ of $S^2$. The confidence level of an inference rule for $\mu$ is calculated by use of these fiducial distributions.

Many authors criticized the use of fiducial distributions because supposedly they are not unique. However, in our first forthcoming paper we present a general definition of inference rules and the corresponding fiducial distributions parametrized by the observation, and we show that the fiducial distribution is uniquely determined by the inference rule and the observation. In the other forthcoming paper we describe a general method to construct inference rules and their confidence levels. The basic ideas in this construction differ essentially from the methods in decision theory; see Fisher (1956, pp. 99-103). However, there is not necessarily a conflict between the relevant principles of decision theory and the SCIRA reduction and classification scheme presented in this paper. In a reference model several incomparable inference rules can be constructed, each with its own unique fiducial distribution, and decision-theoretic criteria may be used to select the 'best' of these rules. It is our opinion that SCIRA should be applied before the construction of inference rules. Especially the omission of the conditioning steps in SCIRA can lead to erroneous inference rules; see Example 11.2.

11. Examples

In this section we give some examples to illustrate the concepts given in Sections 2-9. In some of these examples both the set $\Omega$ of the sample space $(\Omega, \Sigma)$ and


the set $\mathscr{P}$ of the probability model $(\mathscr{P}, \bar{\Sigma}|\mathscr{P})$ are finite. First we present the notation that we use in such examples.

The probability structure is represented by a matrix $M$ as shown below:

A
B       M
C
...
    1  2  ...

Here the possible outcomes are denoted by 1, 2, 3, ... and the probability measures on $\Sigma$ by A, B, C, .... The $\sigma$-field $\Sigma$ of events is equal to the powerset $P(\Omega)$.

By use of the matrix $M$ we can compute probabilities as follows. The probability of the event $\{j\}$, corresponding to a probability measure $p \in \mathscr{P}$ on $\Sigma$, is equal to

$M_{ij} \big/ \sum_k M_{ik},$    (11.1)

where $i$ is the number of the row that corresponds to the measure $p$.

Example 11.1. The probability structure is given by

A   0  0  5
B   2  2  1
C   1  1  3
    1  2  3

Consider the statistic $S$ defined by $S(1) = S(2) = 1$ and $S(3) = 3$. The map $\psi$ described in (3.1) satisfies:
- $\psi(1, A) = \psi(2, A)$ can be chosen arbitrarily in $\hat{\Sigma}$;
- $\psi(3, A) = \psi(3, B) = \psi(3, C)$ is the probability measure on $\Sigma$ concentrated in $\{3\}$;
- $\psi(1, B) = \psi(2, B) = \psi(1, C) = \psi(2, C)$ with $\psi(1, B)(\{i\}) = \tfrac{1}{2}$, $i = 1, 2$.

If we choose $\psi(1, A) = \psi(1, B)$, then

$\sigma(\psi(\omega, \cdot)) = \{\emptyset, \mathscr{P}\}$


for all $\omega \in \Omega$. So the statistic $S$ is sufficient; see (3.3). After sufficient reduction the probability structure is represented by

A   0  5
B   4  1
C   2  3
   1,2  3

Example 11.2. A well-known example in the literature concerns the case where there are two measuring instruments available with different measurement standard deviations. A fair coin is tossed to decide which to use. The statistic $C$ defined by $C = i$ if the $i$th instrument is used ($i = 1, 2$) is an ancillary statistic. Conditioning leads to two probability structures. See Dawid (1982) for more details.
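The consequence of ignoring the ancillary $C$ can be made concrete by simulation. The numbers below are ours (standard deviations 1 and 10 for the two instruments, and an interval half-width tuned to 95% marginal coverage): an interval calibrated unconditionally has very unequal confidence given which instrument was actually used, which is exactly why the conditioning step matters.

```python
import numpy as np

rng = np.random.default_rng(0)
n, mu = 200_000, 0.0
sigma = np.where(rng.random(n) < 0.5, 1.0, 10.0)   # fair coin picks instrument
x = rng.normal(mu, sigma)

half_width = 16.45            # about 10 * 1.645: 95% coverage on average
covered = np.abs(x - mu) < half_width
print(covered.mean())                      # approx 0.95 over the whole mixture
print(covered[sigma == 1.0].mean())        # approx 1.00 given instrument 1
print(covered[sigma == 10.0].mean())       # approx 0.90 given instrument 2
```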

Example 11.3. The probability structure is given by

A   1  2  3  4
B   0  3  4  3
    1  2  3  4

In this example there exist two maximal ancillary statistics. Consider the statistic $C$ defined by $C(1) = C(2) = 1$, $C(3) = C(4) = 2$. The statistic $C$ is ancillary since

$\sigma(\varphi) = \{\emptyset, \mathscr{P}\},$

and is maximal. By conditioning on the value of $C$ we obtain two probability structures. They are represented by

C = 1:
A   1  2  0  0
B   0  3  0  0
    1  2  3  4

C = 2:
A   0  0  3  4
B   0  0  4  3
    1  2  3  4

The ancillary statistic $C'$ defined by $C'(1) = C'(3) = 1$, $C'(2) = C'(4) = 2$ is also maximal.
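Both statistics can be checked against (4.1) numerically. The check below (our code) confirms that $C$ and $C'$ each have a distribution free of $p$ while generating different $\sigma$-fields, so the maximal ancillary statistic is indeed not unique here:

```python
import numpy as np

M = np.array([[1, 2, 3, 4],                  # A
              [0, 3, 4, 3]], dtype=float)    # B
P = M / M.sum(axis=1, keepdims=True)

for name, blocks in [("C",  [[0, 1], [2, 3]]),    # C(1)=C(2)=1, C(3)=C(4)=2
                     ("C'", [[0, 2], [1, 3]])]:   # C'(1)=C'(3)=1, C'(2)=C'(4)=2
    marg = np.array([[p[b].sum() for b in blocks] for p in P])
    print(name, marg[0], "ancillary:", bool(np.allclose(marg, marg[0])))
# C  [0.3 0.7] ancillary: True
# C' [0.4 0.6] ancillary: True
```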

In the next examples we have to deal with a $\sigma$-field of interest $\mathscr{B}$. In the examples with a finite set $\Omega$ and a finite set $\mathscr{P}$, we denote the $\sigma$-field of interest as follows. The probability measures are written as $A_1, A_2, \ldots, A_{n_1}, B_1, B_2, \ldots, B_{n_2}, \ldots$ This notation means that the $\sigma$-field of interest $\mathscr{B}$ is equal to

$\mathscr{B} = \sigma(\{A_1, \ldots, A_{n_1}\}, \{B_1, \ldots, B_{n_2}\}, \ldots).$    (11.2)

Example 11.4. The inference model is given by

A1   1  0  2  1
A2   0  1  2  1
A3   1  1  1  1
B1   1  1  2  0
     1  2  3  4

The only nontrivial element of the set $V$ of (6.1) is the pair $(\gamma, \delta)$, where

$\gamma(\{1\}) = \{2\}, \quad \gamma(\{2\}) = \{1\}, \quad \gamma(\{3\}) = \{3\}, \quad \gamma(\{4\}) = \{4\},$

and

$\delta(A_1) = A_2, \quad \delta(A_2) = A_1, \quad \delta(A_3) = A_3, \quad \delta(B_1) = B_1.$

Using (6.8) (choose $Q = V$) and (6.9)-(6.12), we conclude that there exists a unique minimal invariant statistic $I_m$ with $\sigma(I_m) = \sigma(\{1, 2\}, \{3\}, \{4\})$. After invariant reduction the following inference model results:

A1,2   1  2  1
A3     2  1  1
B1     2  2  0
      1,2  3  4

In this inference model there again exists a nontrivial invariant statistic. Invariant reduction leads to the inference model

A1,2,3   3  1
B1       4  0
       1,2,3  4

Remark. Barnard (1963) demands that the group of transformations operates transitively on $\{A_1, A_2, A_3\}$. So, according to his definition, invariant reduction is not allowed in this case.

Example 11.5. Let $X$ and $S^2$ be independent random variables with $X \sim N(\mu, \sigma^2)$ and $S^2/\sigma^2 \sim \chi^2$. So $\Omega = \mathbb{R} \times \mathbb{R}_+$ and every probability measure in $\mathscr{P}$ can be represented by $(\mu, \sigma^2)$. The interesting aspect of $p \in \mathscr{P}$ is $\sigma^2$. The groups of translations on the first component of $(\mu, \sigma^2)$ and on the first component of $(X, S^2)$, respectively, lead to the unique minimal invariant statistic defined by $I(X, S^2) = S^2$. In the transformed inference model the set $\Omega$ of the sample space can be represented by $\{s^2 \mid s^2 \in \mathbb{R}_+\}$ and the set $\mathscr{P}_1$ of the probability model by $\{\sigma^2 \mid \sigma^2 \in \mathbb{R}_+\}$.
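A quick simulation illustrates the invariant reduction. The degrees of freedom of the $\chi^2$ variable are not fixed in the text, so $k = 5$ below is our choice; the point is only that the distribution of $I(X, S^2) = S^2$ is unchanged when $\mu$ is translated.

```python
import numpy as np

rng = np.random.default_rng(1)
k, sigma2, n = 5, 2.0, 100_000    # k is an assumption; the text leaves it open

for mu in (0.0, 7.5):             # translate the first component of (mu, sigma^2)
    x = rng.normal(mu, np.sqrt(sigma2), n)
    s2 = sigma2 * rng.chisquare(k, n)
    print(f"mu={mu}: mean(S^2)={s2.mean():.3f}, var(S^2)={s2.var():.3f}")
# Both lines report mean approx k*sigma^2 = 10 and variance approx 2k*sigma^4 = 40:
# S^2 ignores mu.
```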

Example 11.6. The inference model is given by

A1   2   4  14
A2   3   3  14
A3   4   8   8
A4   6   6   8
B1   6  12   2
B2   9   9   2
     1   2   3

The statistic $R$ is defined by $R(1) = R(2) = 1$, $R(3) = 3$. We have

$\sigma(\varphi) = \sigma(\{A_1, A_2\}, \{A_3, A_4\}, \{B_1, B_2\})$

and

$\sigma(\psi(1, \cdot)) = \sigma(\psi(2, \cdot)) = \sigma(\{A_1, A_3, B_1\}, \{A_2, A_4, B_2\}),$
$\sigma(\psi(3, \cdot)) = \{\emptyset, \mathscr{P}\}.$

Conditions (7.3) and (7.4) are satisfied, so $R$ is a partially sufficient statistic. $R$ is minimal and unique. After partially sufficient reduction the following inference model results:

A1,2   3  7
A3,4   6  4
B1,2   9  1
      1,2  3

Example 11.7. Let $X_1$, $X_2$ and $Y$ be independent random variables such that $X_1, X_2 \sim N(\mu, \sigma_1^2)$ and $Y \sim N(0, \sigma_2^2)$, with $\mu \in \mathbb{R}$, $\sigma_1^2 \in \mathbb{R}_+$ and $\sigma_2^2 < 1$. The minimal sufficient statistic is $(X_{(1)}, X_{(2)}, Y)$, where $X_{(1)}$ and $X_{(2)}$ are order statistics. The set $\mathscr{P}$ of the probability model is represented by $\{(\mu, \sigma_1^2, \sigma_2^2) \mid \mu \in \mathbb{R}, \sigma_1^2 \in \mathbb{R}_+, \sigma_2^2 < 1\}$. The interesting aspect of $p \in \mathscr{P}$ is $\mu$. The statistic $R$ defined by $R(X_{(1)}, X_{(2)}, Y) = (X_{(1)}, X_{(2)})$ is not invariant. However, it is the unique minimal partially sufficient statistic. In the transformed inference model the set $\mathscr{P}_1$ is represented by $\{(\mu, \sigma_1^2) \mid \mu \in \mathbb{R}, \sigma_1^2 \in \mathbb{R}_+\}$.


Remark. The statistic $(X_{(1)}, X_{(2)})$ is not weakly p-sufficient in the sense defined by Bhapkar (1991), and it is not p-sufficient in the complete sense, because the $\sigma$-field of interest is a proper subset of $\sigma(\varphi)$; see (7.3) and Definition 3.2 of Bhapkar (1991).

Example 11.8. The inference model is given by

A1   0  3  3  3
A2   0  6  3  0
B1   3  0  0  6
B2   6  0  1  2
     1  2  3  4

The statistic $A$ is defined by $A(1) = A(2) = 1$, $A(3) = A(4) = 2$. We have

$\sigma(\psi(1, \cdot)) = \sigma(\psi(2, \cdot)) = \sigma(\{A_1, A_2\}, \{B_1, B_2\}),$
$\sigma(\psi(3, \cdot)) = \sigma(\psi(4, \cdot)) = \sigma(\{A_1\}, \{A_2\}, \{B_1\}, \{B_2\}),$

and

$\sigma(\varphi) = \sigma(\{A_1, B_1\}, \{A_2, B_2\}).$

Conditions (8.3) and (8.4) are satisfied for all $\omega \in \Omega$, so $A$ is a partially ancillary statistic with respect to every $\omega \in \Omega$. It is also maximal and unique. After conditioning on the value of $A$ the following inference models result:

A = 1:
A1,2   0  1
B1,2   1  0
       1  2

A = 2:
A1   3  3
A2   6  0
B1   0  6
B2   2  4
     3  4

where columns with only zeroes are deleted; see (2.1) and (2.2).

Example 11.9. Consider the probability structure defined in Example 11.5. However, the interesting aspect of $p \in \mathscr{P}$ is now $\mu$. The statistic $A$ defined by $A(X, S^2) = S^2$ is the unique maximal partially ancillary statistic. For every $s^2$ we transform the inference model as follows. The set $\Omega$ of the sample space can be represented by $\{(x, s^2) \mid x \in \mathbb{R}\}$ and the set $\mathscr{P}^{s^2}$ of the probability model by $\{(\mu, \sigma^2) \mid \mu \in \mathbb{R}, \sigma^2 \in \mathbb{R}_+\}$.


Remark 1. The statistic $A$ is not weakly p-ancillary in the sense defined by Bhapkar (1989), and it is not p-ancillary in a complete sense, because the $\sigma$-field of interest is a proper subset of $\sigma(\psi(s^2, \cdot))$; see (8.4) and Definition 2.3 of Bhapkar (1989).

Remark 2. The statistics $X$ and $S^2$ are independent. So the conditional distribution of $X$, given $S^2 = s^2$, is independent of $s^2$. On the other hand, the distribution of $X$ depends on $\sigma^2$. Fisher (1961) indicated a method to construct inference rules in this and similar cases; see also Section 10.
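Remark 2 can also be seen in simulation: because $X$ and $S^2$ are independent, the empirical law of $X$ is the same within any stratum of $S^2$, while it still moves with $\sigma^2$. A short check (our code, again with an assumed $k = 5$ degrees of freedom):

```python
import numpy as np

rng = np.random.default_rng(2)
mu, sigma2, k, n = 1.0, 4.0, 5, 200_000
x = rng.normal(mu, np.sqrt(sigma2), n)
s2 = sigma2 * rng.chisquare(k, n)

low = s2 < np.median(s2)                   # condition on the value of S^2
print(x[low].mean(), x[~low].mean())       # both approx mu: same conditional law
print(x[low].var(),  x[~low].var())        # both approx sigma^2
```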

Example 11.10. In this example all the transformations of the SCIRA data reduction occur once. The initial probability structure is given by

A   0   0  12   6   6   0  12   2
B   0   0  12   6   6  12   0   2
C   0   0  12   3   9   6   6   2
D   0   0  24   6   6   0   0   2
E   0   0  24   3   9   0   0   2
F   4   8   0   0   0  12  12   2
G   8  16   0   2   2   4   4   2
H   8  16   0   1   3   4   4   2
    1   2   3   4   5   6   7   8

Sufficient reduction leads to the probability structure

A    0  12   6   6   0  12   2
B    0  12   6   6  12   0   2
C    0  12   3   9   6   6   2
D    0  24   6   6   0   0   2
E    0  24   3   9   0   0   2
F   12   0   0   0  12  12   2
G   24   0   2   2   4   4   2
H   24   0   1   3   4   4   2
   1,2   3   4   5   6   7   8

There exists a unique maximal ancillary statistic. Conditioning on its value leads to the two following probability structures:

1.
A, B, C, D, E, F, G, H   1
                         8


2.
A    0  12   6   6   0  12
B    0  12   6   6  12   0
C    0  12   3   9   6   6
D    0  24   6   6   0   0
E    0  24   3   9   0   0
F   12   0   0   0  12  12
G   24   0   2   2   4   4
H   24   0   1   3   4   4
   1,2   3   4   5   6   7

In probability structure 1 only a trivial $\sigma$-field of interest can be specified; see Section 5. We specify the $\sigma$-field of interest $\mathscr{B}$ in probability structure 2 as follows:

$\mathscr{B} = \sigma(\{A, B, C, D, E\}, \{F, G, H\}).$

Define accordingly $A_1 := A$, $A_2 := B$, $A_3 := C$, $A_4 := D$, $A_5 := E$ and $B_1 := F$, $B_2 := G$, $B_3 := H$. There exists a unique minimal invariant statistic defined by $I(6) = I(7) = 5$, $I(1, 2) = 1$, $I(3) = 2$, $I(4) = 3$, $I(5) = 4$. The transformed inference model is given by

A1,2    0  12   6   6  12
A3      0  12   3   9  12
A4      0  24   6   6   0
A5      0  24   3   9   0
B1     12   0   0   0  24
B2     24   0   2   2   8
B3     24   0   1   3   8
      1,2   3   4   5  6,7

There exists a unique minimal partially sufficient statistic defined by $R(4) = R(5) = 1$, $R(1, 2) = 2$, $R(3) = 3$, $R(6, 7) = 4$. The transformed inference model is given by

A1,2,3   0  3   3   3
A4,5     0  6   3   0
B1       3  0   0   6
B2,3     6  0   1   2
        1,2  3  4,5 6,7


There exists a unique maximal partially ancillary statistic defined by $A(1, 2) = A(3) = 1$, $A(4, 5) = A(6, 7) = 2$. Conditioning on the value of $A$ leads to the following reference models:

1.
A1,2,3,4,5   0  1
B1,2,3       1  0
            1,2  3

2.
A1,2,3   3  3
A4,5     6  0
B1       0  6
B2,3     2  4
        4,5 6,7

Example 11.11. The inference model is given by

A1   0   3  3   3  1  8
A2   0  12  6   0  0  0
B1   6   0  0  12  0  0
B2   6   0  1   2  0  9
     1   2  3   4  5  6

This is an example where one has to condition twice on the value of a maximal partially ancillary statistic. The invariant and partially sufficient statistics are trivial. The statistic $A$ defined by $A(1) = A(2) = A(3) = A(4) = 1$, $A(5) = A(6) = 2$ is the unique maximal partially ancillary statistic. After conditioning on the value of $A$ two inference models result:

A = 1:
A1   0  3  3  3
A2   0  6  3  0
B1   3  0  0  6
B2   6  0  1  2
     1  2  3  4

A = 2:
A1   1  8
B2   0  9
     5  6

Model 2 is a reference model, since the invariant, partially sufficient and partially ancillary statistics are trivial. Model 1 is the model in Example 11.8, so we have to condition again on the value of a maximal partially ancillary statistic.


Acknowledgement

The authors thank Dr. J.G.F. Thiemann for a comment which led to an essential improvement of the paper.

References

Barnard, G.A. (1963). Some logical aspects of the fiducial argument. J. Roy. Statist. Soc. B 25, 111-114.
Barndorff-Nielsen, O. (1978). Information and Exponential Families in Statistical Theory. Wiley, New York.
Bhapkar, V.P. (1989). Conditioning on ancillary statistics and loss of information in the presence of nuisance parameters. J. Statist. Plann. Inference 21, 139-160.
Bhapkar, V.P. (1991). Loss of information in the presence of nuisance parameters and partial sufficiency. J. Statist. Plann. Inference 28, 185-203.
Dawid, A.P. (1982). Statistical inference I. In: Encyclopedia of Statistical Sciences. Wiley, New York.
Fisher, R.A. (1956). Statistical Methods and Scientific Inference. Oliver and Boyd, Edinburgh (3rd ed., Hafner, New York).
Fisher, R.A. (1961). Sampling the reference set. Sankhyā A 23, 3-7.
Giri, N.C. (1982). Invariance concepts in statistics. In: Encyclopedia of Statistical Sciences. Wiley, New York.