
METRON - International Journal of Statistics, 2003, vol. LXI, n. 2, pp. 267-283

ABDERRAHIM OULHAJ – MICHEL MOUCHART

Partial sufficiency with connection to the identification problem

Summary - Let M^X_Θ = (R_X, X, P_Θ = {P_θ : θ ∈ Θ}) be a parametrized statistical model and g : Θ → G be a non-injective function characterizing a parameter of interest. The basic idea of partial sufficiency is to find a (minimal) statistic sufficient for making inference on g(θ). Following Fraser (1956), Barndorff-Nielsen (1978) has defined a concept of S-sufficiency. Our contribution is first to establish the connection between S-sufficiency and the identification concept. Second, we establish some properties of S-sufficiency; in particular, we compare the properties of sufficiency for the complete parameter with those of S-sufficiency.

Key Words - Sufficiency; S-sufficiency; Identification; Partial observability; Sufficient parameter.

1. Introduction

Let M^X_Θ = (R_X, X, P_Θ = {P_θ : θ ∈ Θ}) be a parametrized statistical model, where (R_X, X) is the sample space of X, P_Θ is a family of probability measures on (R_X, X) and Θ is the parameter space. For the sake of exposition, a statistic will be indifferently considered either as a measurable function of the complete observation, i.e. T = f(X), or as a sub-σ-field T of X. The connection between the two approaches is evidently that T is the σ-field generated by T, i.e. T = σ(T). We also denote by P_θ^T the trace of P_θ on T.

Two different contexts should be borne in mind when studying partial sufficiency.

Context 1: in the case of partial observability, X represents a latent vector and T is an observable one. M^X_Θ is the structural model and M^T_Θ, the image of M^X_Θ under the transformation f, is the statistical model. Here T is given by a criterion of observability rather than of sufficiency and the object is to analyze the identification of the statistical model; in particular, whether g(θ) is identified or not in M^T_Θ. In such a framework, the concept of S-sufficiency may be used as a sufficient condition to identify g(θ) in M^T_Θ (Theorems 1 and 2). As a matter of fact, instead of identifying g(θ) directly in M^T_Θ, which is usually not easy, one may first identify g(θ) in M^X_Θ (often easy to do, because the complete parameter is usually identified in the structural model) and thereafter find conditions under which the observation T is S-sufficient for g(θ). This approach happens to be promising when studying the identification of discrete choice models.

Received April 2003 and revised July 2003.

Context 2: if our object is to reduce the sample space by marginalizing w.r.t. an S-sufficient statistic, identification may be considered as an easy way of detecting non-S-sufficient statistics (Theorems 1, 2 and Example 1). Furthermore, a deeper understanding of the concept of S-sufficiency is gained by examining some of its properties, in particular by reviewing the properties of sufficiency for the complete parameter and checking which ones also hold for S-sufficiency (Theorems 3, 4 and Corollary 1).

Another aspect of this paper is to point out several overlapping concepts in the statistical literature on partial sufficiency. We paid particular attention to connecting the many concepts proposed earlier with one another, and we showed that only a small number of them are actually different, even when they have been introduced from different perspectives. An appendix collects the proofs of the main results.

This paper updates an older but more extended version, Mouchart and Oulhaj (2001). The work underlying this paper may be put into a wider perspective in the analysis of partially observable conditional models in Oulhaj (2003).

2. A review of: sufficiency, identification and a variation free reparameterization

In this section we shortly review, and sometimes extend, some basic concepts already available in the literature. We start by recalling the usual concepts of a sufficient statistic for a complete parameter and of identification of a function of the parameters. For the sake of convenience, we use the notation P(A | θ) for the probability of event A ∈ X according to P_θ, and a submodel of M^X_Θ is denoted as M^X_{Θ0} where Θ0 ⊂ Θ.

Definition 1 (Barra (1981)). A statistic T of X is said to be sufficient for M^X_Θ if for every A ∈ X, a version of P(A | T, θ) exists which is the same for all θ ∈ Θ or, equivalently:

∀A ∈ X : ⋂_{θ ∈ Θ} P(A | T, θ) ≠ ∅   (1)


where, for each θ, P(A | T, θ) is the equivalence class of all versions of the conditional probability of A given T. When there is no ambiguity, one simply says that T is sufficient.
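Definition 1 can be made concrete on a textbook model (not one taken from this paper): for X = (X1, ..., Xn) i.i.d. Bernoulli(p), the statistic T = ΣXi is sufficient, since P(X = x | T = t, p) = 1/C(n, t) does not depend on p. A minimal numerical sketch:

```python
from itertools import product
from math import comb

def joint_pmf(x, p):
    """P(X = x) for i.i.d. Bernoulli(p) coordinates."""
    t = sum(x)
    return p ** t * (1 - p) ** (len(x) - t)

def conditional_given_T(x, p):
    """P(X = x | T = sum(x)) under parameter p."""
    n, t = len(x), sum(x)
    p_T = comb(n, t) * p ** t * (1 - p) ** (n - t)  # T ~ Binomial(n, p)
    return joint_pmf(x, p) / p_T

n = 4
for x in product((0, 1), repeat=n):
    # The conditional law of X given T is the same for every p: 1 / C(n, t).
    vals = {round(conditional_given_T(x, p), 12) for p in (0.2, 0.5, 0.9)}
    assert len(vals) == 1
    assert abs(conditional_given_T(x, 0.5) - 1 / comb(n, sum(x))) < 1e-12
print("T = sum(X) is sufficient: the conditional law of X given T is parameter-free")
```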

Definition 2 (LeCam and Schwartz (1960)). A parameter function g(θ) is said to be identified in M^T_Θ, or T-identified, if:

P_θ1^T = P_θ2^T ⟹ g(θ1) = g(θ2).   (2)

Definition 2 implies that any function of an identified parameter is identified, simply because for any function h: g(θ1) = g(θ2) ⟹ h[g(θ1)] = h[g(θ2)]. Furthermore, if g is T2-identified then g is T1-identified for every T1 ⊃ T2. In particular:

g is T-identified ⟹ g is X-identified.   (3)

Barankin (1961) refers to Definition 2 as g(θ) being a necessary parameter for T. We stick, however, to the more frequent usage of an "identified" parameter.

As a matter of fact, the concept of sufficiency of a parameter for a statistic may be viewed as a reciprocal of identification, in view of the following definition.

Definition 3 (Barankin (1961)). A parameter function g(θ) is said to be sufficient in M^T_Θ or, equivalently, T-sufficient, if:

g(θ1) = g(θ2) ⟹ P_θ1^T = P_θ2^T.   (4)

Relation (4) actually describes a binary relationship between a statistic T and a parameter of interest g(θ). Thus Picci (1977) refers to (4) as the function g being unresolvable for T, and Basu (1977) as the statistic T being g-oriented.

Later on, we shall also make use of Θ_g, the set of all g-oriented statistics, i.e. the set of all T satisfying (4) for a given function g. The next lemma, useful for later results, gives a characteristic property of the set Θ_g. It states that if T ∈ Θ_g and θ is identified in M^X_Θ, then for every two distinct parameter values (θ1, θ2) such that g(θ1) = g(θ2), there is no common version of their T-conditional distributions. More specifically:

Lemma 1. If θ is X-identified and T ∈ Θ_g, then ∀θ1 ≠ θ2:

g(θ1) = g(θ2) ⟹ ∃A ∈ X : P(A | T, θ1) ∩ P(A | T, θ2) = ∅.   (5)

Combining Definitions 2 and 3, Barankin (1961) introduces the concept of a minimal sufficient parameter, i.e. a parameter which is both T-identified and T-sufficient. Such a concept is also called a maximal identifiable (or maximal unresolvable) parameter in Picci (1977). Note that when g(θ) = θ (or, more generally, when g is injective), θ is, by definition of a statistical model, a sufficient parameter; therefore, when θ is identified, it is minimal sufficient. Thus it might be more adequate to say that the parametrization θ is identified, even though some authors also say that the model is identified.

Finally, let us turn to the concept of a variation free reparameterization. A natural extension of Barndorff-Nielsen (1978) is provided in the next definition.

Definition 4. Let h and g be two functions defined on Θ, with range spaces h(Θ) and g(Θ) respectively. The transformation (h, g) is a variation free reparameterization or, equivalently, h and g are variation free complements, if the following condition holds:

{(h(θ), g(θ)) : θ ∈ Θ} = h(Θ) × g(Θ).   (6)

In the sequel, we denote by V_g the set of all the variation free complements of a given parameter of interest g. Following Fraser (1956), Barndorff-Nielsen (1978) has defined a concept of S-sufficiency, a natural extension of which is the following:

Definition 5. A statistic T of X is S-sufficient for g if:

(i) T is g-oriented (i.e. T ∈ Θ_g) and
(ii) there exists a variation free complement of g, say h, such that:

∀A ∈ X, ∀β ∈ h(Θ) : ⋂_{θ ∈ h^{-1}(β)} P(A | T, θ) ≠ ∅.   (7)

Basu (1977) has called condition (7) "T is specific sufficient for γ = g(θ)". Heuristically, to say that a statistic T is S-sufficient for a given g means the following: θ admits a variation free reparameterization θ ↔ (β, γ), where β = h(θ) is a variation free complement of γ = g(θ), such that the joint density p(x | θ) can be factorised into:

p(x | θ) = p(t | γ) p(x | t, β),   (β, γ) ∈ h(Θ) × g(Θ).   (8)

For many examples, see, among others, Basu (1977), Barndorff-Nielsen (1978), Dawid (1975), Fraser (1956) or Godambe (1980). The same structure as (8) has also been called a cut by Barndorff-Nielsen (1978) when γ is not necessarily a parameter of interest. He also noticed that the standard concept of sufficiency (resp. ancillarity) corresponds to a cut when g(θ) is injective (resp. g(θ) is a constant function). Also, Basu (1977) has called the structure in (8) "T is p-sufficient for γ" or "T is p-ancillary for β".
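A classical illustration of a cut (a textbook example, not one from this paper): let N ~ Poisson(λ) and, given N = n, X ~ Binomial(n, p). Then p(n, x | λ, p) = p(n | λ) p(x | n, p), so the statistic N realizes the factorization (8) with (γ, β) = (λ, p): the marginal model for N involves λ only, the N-conditional model involves p only, and λ and p are variation free. A numerical sketch:

```python
from math import comb, exp, factorial

def joint(n, x, lam, p):
    """p(n, x | lam, p) = Poisson(n; lam) * Binomial(x; n, p)."""
    return (exp(-lam) * lam ** n / factorial(n)) * comb(n, x) * p ** x * (1 - p) ** (n - x)

def marginal_N(n, lam, p):
    return sum(joint(n, x, lam, p) for x in range(n + 1))

def conditional_X_given_N(x, n, lam, p):
    return joint(n, x, lam, p) / marginal_N(n, lam, p)

# The marginal law of N depends on lam only (it is Poisson(lam), whatever p)...
for n in range(10):
    assert abs(marginal_N(n, 3.0, 0.2) - marginal_N(n, 3.0, 0.9)) < 1e-12
    assert abs(marginal_N(n, 3.0, 0.2) - exp(-3.0) * 3.0 ** n / factorial(n)) < 1e-12
# ...and the N-conditional law of X depends on p only, whatever lam.
for x in range(6):
    assert abs(conditional_X_given_N(x, 5, 3.0, 0.4) - conditional_X_given_N(x, 5, 8.0, 0.4)) < 1e-12
print("N operates a cut: the marginal involves lam only, the conditional p only")
```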

3. Connection between sufficiency and identification for the complete parameter θ

When the complete parameter θ, or one of its injective transformations, is of interest, the property given in (3) raises a natural question: which property of the statistic T would actually make X-identification equivalent to T-identification? This question has been settled in the Bayesian framework by Florens et al. (1990, p. 162). The next theorem ensures that in a sampling theory framework the (sampling) sufficiency of T also provides the answer.

Theorem 1. If the statistic T is sufficient, then: θ is T-identified ⇐⇒ θ is X-identified.

According to the two contexts given in the introduction, Theorem 1 may be understood as follows. It states, in context 1, that when we reduce an identified complete model by marginalizing w.r.t. a sufficient statistic T, we do not lose the identification property. In other words, if the observable vector T is sufficient for the structural model M^X_Θ, it is equivalent to check the identification of θ in that structural model or in the statistical one. In context 2, Theorem 1 may be interpreted as follows: for a given identified model M^X_Θ, a necessary condition for a statistic T to be sufficient for M^X_Θ is that θ be identified in M^T_Θ. In the homogeneous case, the existence, and the construction, of a sufficient statistic of fixed dimensionality is limited to the exponential family. In the non-homogeneous case, as for instance in some non-parametric or semi-parametric models, checking the sufficiency of a statistic may be seriously hampered by the difficulty, or the impossibility, of building the distribution conditional on a given statistic. The next example provides an illustration where Theorem 1 gives an easy check of non-sufficiency by means of a loss of identification.

Example 1. Let Xi, i = 1, 2, ..., n, be independent with Xi ~ N(μi, 1). Let Ti = X1 − X_{i+1} for i = 1, 2, ..., n − 1 and consider the statistic T = (T1, T2, ..., T_{n−1}). To show that the statistic T is not sufficient, it suffices (by Theorem 1) to show that θ = (μ1, ..., μn) is not T-identified. As a matter of fact, Ti ~ N(μ1 − μ_{i+1}, 2) for all i = 1, 2, ..., n − 1 and cov(Ti, Tj) = 1 for all i ≠ j. Now let us consider θ′ = (μ′1, μ′2, ..., μ′n) and θ″ = (μ″1, μ″2, ..., μ″n) such that μ″i = μ′i + k for all i = 1, 2, ..., n, with k a constant different from zero. Clearly, θ′ ≠ θ″ but the distribution of T is the same for these two different values. Then θ is not T-identified and consequently T is not sufficient. Notice that this argument is easier than the alternative of computing the T-conditional distribution of X and checking whether or not it depends on θ.

Example 2. In conditional binary response models, we observe a realization of a vector X = (T, Z), where T ∈ {0, 1} is the response, or endogenous, variable and Z ∈ R_Z ⊂ R^{K+1} is the vector of K exogenous or explanatory variables plus a constant term. The binary response variable is modelled as T = 1{Y1 ≥ Y0}, where, for each j ∈ {0, 1}, Yj represents the random (with respect to the statistician) utility associated with the alternative j. The structural model describing the conditional distribution (Y0, Y1 | Z) is latent because (Y0, Y1) are not observable, and is called the Random Utility Model. The statistical model, i.e. the one describing the conditional distribution (T | Z), is the image of that structural model under the transformation arg max_{j ∈ {0,1}}. The statistical model may therefore be viewed as a model of partial observability because only the sign of the latent variable Y1 − Y0 is observable. In a semi-parametric approach, each latent variable Yi is modelled by Yi = Z′βi + δi, i ∈ {0, 1}, where (δ0, δ1) is a random vector with unknown joint distribution F and, for each i ∈ {0, 1}, βi ∈ R^{K+1}. The complete parameter characterizing this structural semi-parametric model is then θ = (β0, β1, F). This structural model is a semi-parametric bivariate regression model, possibly subject to the usual identification restrictions on F: zero expectation, finite variance and uncorrelatedness between Z and (δ0, δ1). For the statistical model, note that the manifest variable T is given by T = 1{Y1 ≥ Y0} = 1{ε − Z′β ≤ 0}, where β = β1 − β0 and ε = δ0 − δ1. One way to check whether the statistic T is sufficient for the complete parameter θ could be to evaluate the distribution of (Y0, Y1) conditional on T and to check whether it depends on θ or not. This strategy is typically not operational in a semi-parametric model. However, Theorem 1 gives us an answer. In fact, let θ0 = (β0^(0), β1^(0), F) and θ1 = (β0^(1), β1^(1), F) be two distinct parameter values with β1^(0) − β0^(0) = β1^(1) − β0^(1). Then θ1 and θ0 induce the same distribution for T although θ0 ≠ θ1. Consequently θ is not T-identified and then, by Theorem 1, T is not sufficient for θ.
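The identification failure in Example 2 can be illustrated numerically. Take, for concreteness, a logistic distribution for ε = δ0 − δ1 (a hypothetical choice of F, made only for illustration, with scalar Z for simplicity) and two parameter values sharing the same difference β = β1 − β0; the implied choice probabilities P(T = 1 | Z) then coincide:

```python
from math import exp

def G(x):
    """CDF of eps = delta0 - delta1; logistic here, purely for illustration."""
    return 1.0 / (1.0 + exp(-x))

def prob_T1(z, beta0, beta1):
    """P(T = 1 | Z = z) = P(eps <= z * (beta1 - beta0)) = G(z * beta)."""
    return G(z * (beta1 - beta0))

# Two distinct theta values with the same beta = beta1 - beta0 (and the same F).
theta0 = (0.0, 1.5)   # (beta0, beta1)
theta1 = (2.0, 3.5)   # both coordinates shifted: different theta, same difference

for z in [-3.0, -1.0, 0.0, 0.5, 2.0]:
    assert abs(prob_T1(z, *theta0) - prob_T1(z, *theta1)) < 1e-12
print("distinct theta, identical law of (T | Z): theta is not T-identified")
```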

4. Connection between S-sufficiency and identification for a non-injective function g(θ)

Definition 5 of S-sufficiency is essentially based on the two fundamental concepts introduced in Section 2, namely the notions of a sufficient parameter and of a variation free reparameterization. In this section, we introduce the notion of a g-partition. This concept provides a deeper understanding of the variation free concept introduced by Barndorff-Nielsen (1978) and makes the conditions of applicability of S-sufficiency more explicit.

4.1. g-partitions and variation free reparameterization

Let us construct a variation free complement parameter from a given non-injective function g. The following lemma offers an alternative view of the concept of a variation free reparameterization introduced in Definition 4.


Lemma 2. Let h and g be two functions defined on Θ, with range spaces h(Θ) and g(Θ) respectively. The transformation (h, g) is a variation free reparameterization iff:

∀(β, γ) ∈ h(Θ) × g(Θ) : Card{h^{-1}(β) ∩ g^{-1}(γ)} = 1,   (9)

where the symbol Card{.} denotes the cardinality of a set.

Under the conditions of Lemma 2, if h (resp. g) is injective, then g(Θ) (resp. h(Θ)) reduces to a singleton, i.e. g (resp. h) is a constant function.

Let us denote by Θ/g = {g^{-1}(γ) : γ ∈ g(Θ)} the quotient set of the equivalence relation θ1 ~g θ2 ⇐⇒ g(θ1) = g(θ2). A g-section is a subset of Θ made of exactly one element from each equivalence class in Θ/g, namely:

Definition 6. A subset Θ*_g of Θ is said to be a g-section of Θ if:

∀γ ∈ g(Θ) : Card{g^{-1}(γ) ∩ Θ*_g} = 1.   (10)

Remarks.
(i) An alternative approach is to define a g-section as any function m : Θ/g → Θ such that v ∘ m is the identity on Θ/g, where v is the canonical surjection from Θ to Θ/g. That the two approaches are equivalent may be seen through the equality m(Θ/g) = Θ*_g.
(ii) On every g-section, g is injective and has the same image, namely g(Θ*_g) = g(Θ), so that all the g-sections are in bijection.

In the next definition, we introduce the notion of a g-partition.

Definition 7. Let Π = {Θ*_{g,β} : β ∈ B} be a parametrized family of g-sections of Θ. Π is said to be a g-partition of Θ if the elements of Π form a partition of Θ.

As a matter of fact, a g-partition is one way of representing a variation free complement of a given function g. In particular, the existence of the one is equivalent to the existence of the other, in view of the following lemma.

Lemma 3. Let g be a function defined on Θ; then:

a variation free complement of g exists ⇐⇒ a g-partition of Θ exists.

When the parameter space Θ is finite, a necessary and sufficient condition for the existence of a g-partition (and thus, by Lemma 3, of a variation free complement of g) is that all the equivalence classes defined by g have the same cardinality. That a variation free complement may fail to exist for a given function g(θ) may be seen in the following example.

Example 3. Suppose that θ = (μ1, μ2) ∈ R² and g(θ) = μ1 − μ2; then h(θ) = μ1 + μ2 is a variation free complement of g(θ). Suppose now that (μ1, μ2) ∈ [0, 1]², along with the same g(θ) = μ1 − μ2. The absence of a variation free complement of g(θ) is due to the fact that the sets g^{-1}(γ), γ ∈ [−1, 1], do not have a constant cardinality. For instance, g^{-1}(1) = {(1, 0)} and g^{-1}(−1) = {(0, 1)} are each reduced to a singleton, making it accordingly impossible to construct disjoint g-sections and hence a g-partition.
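The cardinality criterion for a finite parameter space can be checked mechanically. Below, a discretized version of Example 3 (Θ a grid in [0, 1]², g(θ) = μ1 − μ2) exhibits equivalence classes of unequal sizes, so no g-partition exists; by contrast, the projection g(θ) = μ1 on the same grid has equal-sized classes, so a g-partition (and hence a variation free complement, e.g. h = μ2) does exist:

```python
from fractions import Fraction
from itertools import product

def class_sizes(theta_space, g):
    """Cardinalities of the equivalence classes g^{-1}(gamma)."""
    sizes = {}
    for theta in theta_space:
        sizes[g(theta)] = sizes.get(g(theta), 0) + 1
    return sizes

grid = [Fraction(i, 2) for i in range(3)]          # {0, 1/2, 1}
theta_space = list(product(grid, grid))            # a discretized [0, 1]^2

diff_sizes = class_sizes(theta_space, lambda t: t[0] - t[1])
proj_sizes = class_sizes(theta_space, lambda t: t[0])

# g = mu1 - mu2: class sizes 1, 2, 3, 2, 1 -> no g-partition (Lemma 3).
assert sorted(diff_sizes.values()) == [1, 1, 2, 2, 3]
# g = mu1: all classes have size 3 -> a g-partition exists.
assert set(proj_sizes.values()) == {3}
print("unequal class sizes for mu1 - mu2; equal class sizes for mu1")
```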

4.2. S-sufficiency: an alternative approach

The concept of a g-partition provides another way of looking at S-sufficiency, in view of the next proposition, which is a simple corollary of Lemma 3.

Proposition 1. Let T ⊂ X be a statistic. T is S-sufficient for g iff:

(i) T is g-oriented (i.e. T ∈ Θ_g) and
(ii) there exists a g-partition of Θ, say Π = {Θ*_{g,β} : β ∈ B}, such that T is sufficient for each submodel M^X_{Θ*_{g,β}}.

The next example illustrates the use of Proposition 1 to prove the S-sufficiency of a statistic.

Example 4. Let X = (X1, X2)′ ~ N(2)((μ1, μ2)′, I2) with θ = (μ1, μ2) ∈ R², and consider g(θ) = μ1 − μ2 as the parameter of interest. To show that the statistic T = X1 − X2 is S-sufficient for g(θ), let us consider the following g-partition:

Π = {Θ*_{g,β} : β ∈ R}   with   Θ*_{g,β} = {(μ1, μ2) ∈ R² : μ1 + μ2 = β}.   (11)

Clearly (8) is satisfied with h(θ) = μ1 + μ2, because (T | θ) ~ N(g(θ), 2), i.e. T ∈ Θ_g, and the T-conditional distribution of X depends only on h(θ), namely:

(X | T, θ) ~ N(2)( ((h(θ) + T)/2, (h(θ) − T)/2)′ , [[1/2, 1/2], [1/2, 1/2]] ).   (12)

Thus, for every Θ*_{g,β} ∈ Π, T is sufficient for M^X_{Θ*_{g,β}}: the g-partition in (11) satisfies condition (ii) of Proposition 1. Thus, by Definition 5, T is S-sufficient for the parameter function μ1 − μ2 when (μ1, μ2) ∈ R².

The following theorem uses Lemma 1 and connects S-sufficiency with the identification of a non-injective parameter function g.

Theorem 2. If the statistic T is S-sufficient for g, then: g is T-identified ⇐⇒ g is X-identified.


Theorem 2 can be considered as a generalization of Theorem 1 to the case where the parameter of interest may be a non-injective function. From the two contexts given in the introduction, Theorem 2 may be interpreted in the same way as Theorem 1. The next example (Example 2 continued) is an illustration of the utility of this theorem.

Example 5. Let us take again the same model as in Example 2. For each unknown joint distribution F of (δ0, δ1), we denote by G_F the corresponding distribution of ε = δ0 − δ1. Let us now consider a non-injective function of the complete parameter θ = (β0, β1, F), namely g(θ) = (β1 − β0, G_F). Is the statistic T = 1{Y1 ≥ Y0} = 1{ε − Z′β ≤ 0} S-sufficient for g(θ)? One way of checking this would be to evaluate the distribution of (Y0, Y1) conditional on T and to check whether or not it depends on a variation free complement of g(θ). This is unfortunately not feasible, because of the burden of computing the conditional distribution and also because of the difficulty of giving an analytical structure to a variation free complement of g(θ). However, Theorem 2 provides us with an answer. In fact, let θ0 = (β0^(0), β1^(0), F^(0)) and θ1 = (β0^(1), β1^(1), F^(1)) be two distinct parameter values such that ∀x ∈ R : G_{F^(1)}(x) = G_{F^(0)}(x/c) and β^(1) = cβ^(0), where c > 0 and β^(i) = β1^(i) − β0^(i), i ∈ {0, 1}. Then we have:

P(T = 1 | Z, θ1) = G_{F^(1)}(Z′β^(1)) = G_{F^(0)}(Z′β^(0)) = P(T = 1 | Z, θ0).   (13)

From equation (13), θ1 and θ0 induce the same distribution for T although g(θ0) = (β^(0), G_{F^(0)}) ≠ g(θ1) = (β^(1), G_{F^(1)}). Consequently g(θ) is not T-identified and then, by Theorem 2, T is not S-sufficient for g(θ).
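The scale-invariance construction in Example 5 can be illustrated numerically with a logistic G_{F^(0)} (an arbitrary choice made only for illustration, with scalar Z for simplicity): with G_{F^(1)}(x) = G_{F^(0)}(x/c) and β^(1) = cβ^(0), the two parameter values produce identical choice probabilities, as in (13):

```python
from math import exp

c = 2.0

def G0(x):
    """G_{F^(0)}: logistic CDF, an illustrative choice of the law of eps."""
    return 1.0 / (1.0 + exp(-x))

def G1(x):
    """G_{F^(1)}(x) = G_{F^(0)}(x / c): the rescaled error distribution."""
    return G0(x / c)

beta0 = 1.5          # beta^(0) = beta_1^(0) - beta_0^(0)
beta1 = c * beta0    # beta^(1) = c * beta^(0)

for z in [-2.0, -0.5, 0.0, 1.0, 3.0]:
    # P(T = 1 | Z = z, theta_1) = G1(z beta^(1)) = G0(z beta^(0)) = P(T = 1 | Z = z, theta_0)
    assert abs(G1(z * beta1) - G0(z * beta0)) < 1e-12
print("g(theta_0) != g(theta_1) yet (T | Z) has the same law: g is not T-identified")
```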

4.3. S-sufficiency: properties

The two contexts, sketched in the introduction and pursued throughout the paper, call for a deeper understanding of the concept of S-sufficiency. The latter may be viewed as an extension of sufficiency, in the sense that the two concepts coincide once g(θ) is injective. Displaying the main properties of S-sufficiency is a natural way towards a deeper understanding of what makes S-sufficiency different from sufficiency.

A first question to be asked is the following: why does S-sufficiency not require equation (7) to be satisfied for all the variation free complements of g, i.e. ∀h ∈ V_g? The next theorem shows that, for a standard family of statistical models, this requirement is not possible, because such an h is essentially unique.

Theorem 3. Let M^X_Θ be an identified homogeneous model, i.e. one in which all the probability measures have the same null sets. Let T be a g-oriented statistic and h a variation free complement of g making T an S-sufficient statistic for g. Then h is unique up to an injective transformation.


Next, let us briefly recall four well-known properties of sufficiency:

Property 1. If T is sufficient for M^X_Θ then T is also sufficient for any submodel M^X_{Θ0}.

Property 2 (Littaye-Petit et al. (1969)). If T1 is sufficient for M^X_Θ and T2 ⊂ T1 is sufficient for M^{T1}_Θ, then T2 is also sufficient for M^X_Θ.

Property 3 (Halmos and Savage (1948)). In the dominated case, if T1 is sufficient and T2 ⊃ T1, then T2 is also sufficient.

Property 4 (Halmos and Savage (1948)). In the dominated case, a minimal sufficient statistic exists.

Let us now check which of these properties also hold for S-sufficiency. Property 1 does not hold anymore for S-sufficiency, because S-sufficiency crucially depends on the non-emptiness of the set V_g. Thus, in Example 4, V_g is not empty for Θ = R² but is empty for Θ = [0, 1]² ⊂ R². However, Property 2 still holds, in view of the following theorem.

Theorem 4. Let T1 be S-sufficient for g in M^X_Θ and let T2 be S-sufficient for g in the reduced model M^{T1}_Θ. Then T2 is S-sufficient for g in the complete model M^X_Θ.

Theorem 4 means the following: let us reduce the complete model by marginalizing w.r.t. an S-sufficient statistic, say T1, and consider the reduced model M^{T1}_Θ. If there exists a statistic T2 which is S-sufficient for g in that reduced model, then it is also S-sufficient in the complete model. Property 3 does not hold anymore for S-sufficiency, because of the following simple corollary of Theorem 2.

Corollary 1. If θ is X-identified and T is sufficient, then T is not S-sufficient for any non-injective function g(θ).

Indeed, the fact that θ is X-identified and T is sufficient implies, by Theorem 1, that θ is a minimal sufficient parameter for T. This also implies that no non-injective function, say g, can be minimal sufficient for T, which in turn implies, by Theorem 2, that T is not S-sufficient for that function g. Therefore, if T is an S-sufficient statistic for a non-injective function g, then the whole sample X will never be S-sufficient for that function g, because of Corollary 1. Finally, Property 4, on the existence of a minimal sufficient statistic in the dominated case, has been addressed in Remon (1984) in the framework of L-sufficiency but, to the best of our knowledge, no such result has been obtained for S-sufficiency.


5. Conclusion

The S-sufficiency concept crucially depends on the non-emptiness of the two sets V_g and Θ_g, i.e. the set of all variation free complements of g and the set of all g-oriented statistics. If, for a given parameter of interest γ = g(θ), either of these two sets is empty, S-sufficiency is not available.

The condition Θ_g ≠ ∅ ensures that there exists a g-oriented statistic T such that the marginal model generating T contains information only on the parameter of interest or, equivalently, that γ = g(θ) is a sufficient parameter for T. That the condition Θ_g ≠ ∅ is not sufficient in itself is due, among other things, to the fact that a statistic containing T may not be g-oriented, or that a statistic sufficient for the complete parameter may fail to be S-sufficient for a non-injective function of θ (see Corollary 1).

Adding the condition V_g ≠ ∅ to the condition Θ_g ≠ ∅ is not enough to produce a useful concept: S-sufficiency requires, furthermore, that the T-conditional model admits a variation free complement of γ = g(θ) as a sufficient parameter. This property ensures that neglecting the T-conditional model loses no information on the parameter of interest. S-sufficiency is accordingly equivalent to a cut. S-sufficiency raises several difficulties; one is the proof of the existence, and the construction, of an S-sufficient statistic. Another is the existence, or non-existence, and the construction of an optimal cut, i.e. a minimal S-sufficient statistic.

However, a good and practical property of S-sufficiency (when the conditions for its existence are satisfied) is its connection to the identification problem. In fact, in context 1 given in the introduction, S-sufficiency preserves the identification of the parameter of interest g(θ). In other words, if the observable vector is S-sufficient for g(θ), then it is equivalent to identify that parameter of interest from the structural model or from the statistical one. This fact gives us an alternative approach to identifying the parameter of interest in partially observable models. As a matter of fact, instead of identifying g(θ) directly from the statistical model, which is usually not easy, one may first identify g(θ) in the structural model (often easy to do, because the complete parameter is usually identified in the structural model) and thereafter find conditions under which the observation is S-sufficient for the parameter of interest. This approach happens to be promising when studying the identification of discrete choice models.

This paper has also shown the abundance of overlapping concepts in the statistical literature on partial sufficiency. We paid particular attention to connecting the many concepts proposed earlier with one another and eventually showed that only a small number of them are actually different, even when they have been introduced from different perspectives. This is summarized in the next table.


Table 1: A glossary of overlapping concepts

Verbal expression              Reference                   Formal expression

g is T-identified              standard                    Definition 2, eq. (2)
g is necessary for T           Barankin (1961)             idem

g is T-sufficient              Barankin (1961)             Definition 3, eq. (4)
g is unresolvable for T        Picci (1977)                idem
T is g-oriented                Basu (1977)                 idem

T is sufficient for θ          standard                    Definition 1, eq. (1)
T is B-sufficient for ω        Barndorff-Nielsen (1978)    idem

T is S-sufficient for g        Barndorff-Nielsen (1978)    Definition 5
T operates a cut               Barndorff-Nielsen (1978)    idem
T is p-sufficient for g        Basu (1977)                 idem

Appendix

Proof of Lemma 1. Let us assume that there exist two distinct values (θ1, θ2) ∈ Θ² such that g(θ1) = g(θ2) and a statistic T ∈ Θ_g such that, for each A ∈ X, there exists a T-measurable common version f_A ∈ P(A | T, θ1) ∩ P(A | T, θ2). This means that ∀A ∈ X, ∀B ∈ T:

P(A ∩ B | θ1) = ∫_B f_A dP_θ1^T = ∫_B f_A dP_θ2^T = P(A ∩ B | θ2),   (14)

where the middle equality uses P_θ1^T = P_θ2^T, which follows from T ∈ Θ_g and g(θ1) = g(θ2). Taking B = R_X, equation (14) implies that:

∀A ∈ X : P(A | θ1) = P(A | θ2),   (15)

which in turn implies that θ1 = θ2, because θ is identified in M^X_Θ; this contradicts the assumption that θ1 and θ2 are distinct.

Proof of Theorem 1. The ⟹ part is trivial; thus we only prove the ⟸ part. Let us assume that θ is X-identified but is not T-identified; this means, by Definition 2, that there exist (θ1, θ2) ∈ Θ² such that:

θ1 ≠ θ2 and P_θ1^T = P_θ2^T.   (16)

As T is sufficient in M^X_Θ, there exists, for each A ∈ X and each (θ1, θ2) satisfying (16), a T-measurable common version f_{A,θ1,θ2} such that:

∀B ∈ T : P(A ∩ B | θi) = ∫_B f_{A,θ1,θ2} dP_θi^T,   i = 1, 2.   (17)

Combining (16) and (17), we have that ∀A ∈ X, ∀B ∈ T:

P(A ∩ B | θ1) = ∫_B f_{A,θ1,θ2} dP_θ1^T = ∫_B f_{A,θ1,θ2} dP_θ2^T = P(A ∩ B | θ2),   (18)

where the first and last equalities follow from (17) and the middle one from (16). Equation (18) implies, for B = R_X, that ∀A ∈ X : P(A | θ1) = P(A | θ2), which in turn implies that θ1 = θ2, because θ is identified in M^X_Θ, a contradiction.

Proof of Lemma 2 (where (i) denotes the variation free reparameterization condition (6) and (ii) the cardinality condition (9)). We first prove that (ii) implies that (h, g) is surjective onto the product space:

(β, γ) ∈ h(Θ) × g(Θ) ⟹ Card(h^{-1}(β) ∩ g^{-1}(γ)) = 1
⟹ ∃!θ ∈ Θ : (β, γ) = (h(θ), g(θ))
⟹ (β, γ) ∈ {(h(θ), g(θ)) : θ ∈ Θ}.   (19)

Equation (19) implies that h(Θ) × g(Θ) ⊂ {(h(θ), g(θ)) : θ ∈ Θ}. But, trivially, h(Θ) × g(Θ) ⊃ {(h(θ), g(θ)) : θ ∈ Θ}, and so the two sets are equal. Next we show that the function f = (h, g) is injective on Θ; more specifically:

f(θ1) = f(θ2) ⟹ h(θ1) = h(θ2) and g(θ1) = g(θ2)
⟹ ∃(β, γ) ∈ h(Θ) × g(Θ) : θ1, θ2 ∈ h^{-1}(β) ∩ g^{-1}(γ)
⟹ θ1 = θ2,

where the last implication uses (ii). Conversely, let us show that (i) implies (ii):

(β, γ) ∈ h(Θ) × g(Θ) ⟹ ∃!θ ∈ Θ : (h(θ), g(θ)) = (β, γ)
⟹ h^{-1}(β) ∩ g^{-1}(γ) = {θ}
⟹ Card(h^{-1}(β) ∩ g^{-1}(γ)) = 1.   (20)

Proof of Lemma 3. Let h be a variation free complement of g; then the quotient set generated by h, i.e. Θ/h, is a partition, the elements of which are trivially g-sections, so a g-partition exists. Conversely, let us show that every g-partition defines, on the parameter space, a unique function, say h, which is a variation free complement of g(θ). Indeed, let Π = {Θ*_{g,β} : β ∈ B} be a given g-partition of Θ and let us construct, on Θ, a function h as follows: h : Θ → B such that h(θ) = β for all θ ∈ Θ*_{g,β}. Clearly, ∀β ∈ B : h^{-1}(β) = Θ*_{g,β}, and Π coincides with the quotient set generated by h, i.e. Π = {h^{-1}(β) : β ∈ B}. Because Π is a g-partition, the function h satisfies, by condition (10): ∀(β, γ) ∈ h(Θ) × g(Θ) : Card(h^{-1}(β) ∩ g^{-1}(γ)) = 1. Then h satisfies the cardinality condition (9) of Lemma 2.


Proof of Theorem 2. Because equation (3) makes the other implication trivial, one should only prove that the T-identification of g(θ) is implied by its X-identification. Let us assume that g(θ) is not T-identified; this means, by Definition 2, that there exist two distinct values (θ1, θ2) ∈ Θ² such that:

g(θ1) ≠ g(θ2)  and  P_T^{θ1} = P_T^{θ2} .    (21)

The S-sufficiency of T implies the existence of a g-partition, say Π, satisfying: ∀Θ*_g ∈ Π, T is sufficient for M_X^{Θ*_g}. Let us now consider (θ1, θ2) ∈ Θ² satisfying (21). Then there exist two g-sections (Θ*_{1,g}, Θ*_{2,g}) ∈ Π² such that (θ1, θ2) ∈ Θ*_{1,g} × Θ*_{2,g}, because a g-partition covers all of Θ. Now, let θ′2 be an element of Θ*_{1,g} satisfying θ′2 ∼g θ2 (θ′2 exists and is unique by definition of a g-section); then P_T^{θ2} = P_T^{θ′2} (because T ∈ 𝒯_g), which implies by equation (21) that P_T^{θ1} = P_T^{θ′2}.

The S-sufficiency of T implies, in particular for (θ1, θ′2), that for each A ∈ X there exists a T-measurable common version f_{A,θ1,θ′2}, or equivalently that ∀A ∈ X, ∀B ∈ T :

P(A ∩ B | θ1) = ∫_B f_{A,θ1,θ′2} dP_T^{θ1} = ∫_B f_{A,θ1,θ′2} dP_T^{θ′2} = P(A ∩ B | θ′2) .    (22)

Equation (22) implies, for B = R_X, that ∀A ∈ X : P(A | θ1) = P(A | θ′2), which in turn implies, because g(θ) is X-identified, that g(θ1) = g(θ′2) = g(θ2), contradicting (21).
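The notion of S-sufficiency at work in this proof can be illustrated on a small hypothetical model (our own choice, not from the paper): θ = (β, γ), with X1 ~ Bernoulli(γ) and X2 ~ Bernoulli(β) independent, parameter of interest g(θ) = γ, and statistic T = X1. The g-sections fix β; the sketch checks, with exact arithmetic, that the conditional law of X given T is θ-free on each section (so T is sufficient for each restricted model, i.e. S-sufficient for γ) and that the trace P_T^θ separates the values of γ.

```python
from fractions import Fraction
from itertools import product

# Hypothetical toy model: theta = (beta, gamma), X1 ~ Bern(gamma),
# X2 ~ Bern(beta) independent; g(theta) = gamma, T = X1.
betas = [Fraction(1, 4), Fraction(3, 4)]
gammas = [Fraction(1, 3), Fraction(2, 3)]

def bern(p, x):
    return p if x else 1 - p

def law_X(beta, gamma):
    # full sampling distribution P^theta on {0, 1}^2
    return {(x1, x2): bern(gamma, x1) * bern(beta, x2)
            for x1, x2 in product((0, 1), repeat=2)}

def cond_given_T(beta, gamma, t):
    # P(X = x | T = t, theta) with T = X1
    p = law_X(beta, gamma)
    pt = sum(pr for (x1, _), pr in p.items() if x1 == t)
    return {x: pr / pt for x, pr in p.items() if x[0] == t}

# g-sections fix beta; on each section the conditional law given T is
# theta-free, so T is sufficient for each restricted model: S-sufficiency.
s_sufficient = all(
    cond_given_T(beta, gammas[0], t) == cond_given_T(beta, gammas[1], t)
    for beta in betas for t in (0, 1))

# P_T^theta = Bern(gamma) depends on theta only through g(theta) = gamma,
# and distinct gammas give distinct traces: gamma is T-identified.
t_laws = {gamma: bern(gamma, 1) for gamma in gammas}
print(s_sufficient, len(set(t_laws.values())) == len(gammas))  # True True
```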

Proof of Theorem 3. To prove Theorem 3, it suffices to show that if there exist two different g-sections Θ*_{g,1} and Θ*_{g,2} such that T is sufficient for both M_X^{Θ*_{g,1}} and M_X^{Θ*_{g,2}}, then Θ*_{g,1} ∩ Θ*_{g,2} = ∅. Let us assume that T is sufficient for both M_X^{Θ*_{g,1}} and M_X^{Θ*_{g,2}}, with Θ*_{g,1} and Θ*_{g,2} being two non-disjoint g-sections. This means that for each A ∈ X, there exist two T-measurable functions f_{i,A} ∈ ⋂_{θ∈Θ*_{g,i}} P(A | T, θ), i = 1, 2.

Let θ0 ∈ Θ*_{g,1} ∩ Θ*_{g,2}; then ∀A ∈ X, ∀B ∈ T : ∫_B f_{1,A} dP_T^{θ0} = ∫_B f_{2,A} dP_T^{θ0}. This implies that ∀A ∈ X, ∀B ∈ T : ∫_B (f_{1,A} − f_{2,A}) dP_T^{θ0} = 0, which also implies that:

∀A ∈ X : f_{1,A} = f_{2,A}   P_T^{θ0}-a.s.    (23)

Because M_X^Θ is a homogeneous model, equation (23) implies that:

∀A ∈ X : f_{1,A} = f_{2,A}   P^θ-a.s., ∀θ ∈ Θ*_{g,1} ∪ Θ*_{g,2} .    (24)

Equation (24) implies that for each A ∈ X, {f_{1,A}, f_{2,A}} are common versions of {P(A | T, θ) : θ ∈ Θ*_{g,1} ∪ Θ*_{g,2}}, which implies that:

∀A ∈ X : ⋂_{θ∈Θ*_{g,1}∪Θ*_{g,2}} P(A | T, θ) ≠ ∅ .    (25)

Because the g-sections are in bijection, equation (25) implies that within each equivalence class defined by g there exist two distinct values having a same T-measurable common version, i.e.

∀γ ∈ g(Θ), ∃(θ1, θ2) ∈ (g⁻¹(γ))² with θ1 ≠ θ2 and such that:
∀A ∈ X : P(A | T, θ1) ∩ P(A | T, θ2) ≠ ∅ .    (26)

Clearly, equation (26) contradicts Theorem 1, because T ∈ 𝒯_g and θ is M_X^Θ-identified.

Proof of Theorem 4. To prove Theorem 4 we need the following lemma.

Lemma 4. Let T1 ∈ 𝒯_g and T2 ⊂ T1. If there exists a g-section, say Θ*_{g,0}, such that:

∀B ∈ T1 : ⋂_{θ∈Θ*_{g,0}} P(B | T2, θ) ≠ ∅ ,    (27)

then every g-section satisfies (27).

Proof of Lemma 4. Let Θ*_{g,1} be an arbitrary g-section. Then, ∀θ1 ∈ Θ*_{g,1}, there exists a unique θ0 ∈ Θ*_{g,0} such that θ1 ∼g θ0, and then P_{T1}^{θ1} = P_{T1}^{θ0}, because T1 ∈ 𝒯_g. This also implies that:

∀B ∈ T1 : P(B | T2, θ1) = P(B | T2, θ0) ,

and therefore, by equation (27), that ∀B ∈ T1 : ⋂_{θ∈Θ*_{g,1}} P(B | T2, θ) ≠ ∅.

Let us now prove Theorem 4. That T1 is S-sufficient for g in M_X^Θ means that there exists a g-partition, say Π = {Θ*_β : β ∈ H}, such that:

∀β ∈ H, ∀A ∈ X : ⋂_{θ∈Θ*_β} P(A | T1, θ) ≠ ∅ .    (28)

That T2 is S-sufficient for g in M_{T1}^Θ implies the existence of a g-section satisfying (27), because T2 ⊂ T1. This implies, by Lemma 4, that equation (27) holds for every g-section in Π; more specifically:

∀β ∈ H, ∀B ∈ T1 : ⋂_{θ∈Θ*_β} P(B | T2, θ) ≠ ∅ .    (29)


Finally, let us fix β ∈ H and A ∈ X. From (28), let us denote by f_A^{Θ*_β} any common version of {P(A | T1, θ) : θ ∈ Θ*_β}. In order to show that T2 is S-sufficient in M_X^Θ, we notice:

⋂_{θ∈Θ*_β} P(A | T2, θ) = ⋂_{θ∈Θ*_β} E(P(A | T1, θ) | T2, θ) = ⋂_{θ∈Θ*_β} E(f_A^{Θ*_β} | T2, θ) ≠ ∅ .    (30)
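The equality chain in (30) rests on iterated conditioning: since T2 is coarser than T1, E(P(A | T1, θ) | T2, θ) = P(A | T2, θ). The sketch below verifies this tower property exactly, with rational arithmetic, on a hypothetical three-coordinate Bernoulli model of our own choosing (T1 = σ(X1, X2), T2 = σ(X1)).

```python
from fractions import Fraction
from itertools import product

# Hypothetical toy model: X = (X1, X2, X3) i.i.d. Bernoulli(theta).
theta = Fraction(2, 5)
omega = list(product((0, 1), repeat=3))
prob = {x: Fraction(1) for x in omega}
for x in omega:
    for xi in x:
        prob[x] *= theta if xi else 1 - theta

A = {x for x in omega if sum(x) >= 2}  # an arbitrary event in the sample space

def T1(x):
    return (x[0], x[1])  # finer statistic

def T2(x):
    return x[0]          # coarser statistic: sigma(T2) ⊂ sigma(T1)

def cond_prob(event, T, t):
    # P(event | T = t, theta)
    pt = sum(prob[x] for x in omega if T(x) == t)
    return sum(prob[x] for x in omega if T(x) == t and x in event) / pt

for t2 in (0, 1):
    # E(P(A | T1) | T2 = t2): average the T1-conditional over the cell {T2 = t2}
    pt2 = sum(prob[x] for x in omega if T2(x) == t2)
    iterated = sum(prob[x] * cond_prob(A, T1, T1(x))
                   for x in omega if T2(x) == t2) / pt2
    assert iterated == cond_prob(A, T2, t2)  # the tower property, exactly
print("tower property verified")
```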

Acknowledgments

Without implications, the authors gratefully acknowledge useful comments on earlier versions by A. Antoniadis, A. P. Dawid, J.-M. Rolin and M. Remon. The comments of two anonymous referees and of one associate editor have also been deeply appreciated, as they stimulated substantial improvements on an earlier version. Support of the contract Projet d'Actions de Recherche Concertées ARC 98/03-217 of the Belgian Government is gratefully acknowledged.

REFERENCES

Barankin, E. W. (1955b) Sufficient parameters: solution of the minimal dimensionality problem, Annals of the Institute of Statistical Mathematics, 12, 91-118.

Barndorff-Nielsen, O. (1978) Information and Exponential Families in Statistical Theory, John Wiley, New York.

Barra, J.-R. (1981) Mathematical Basis of Statistics, Academic Press, New York.

Basu, D. (1977) On the elimination of the nuisance parameters, Journal of the American Statistical Association, 72, 355-366.

Dawid, A. P. (1975) On the concepts of sufficiency and ancillarity in the presence of nuisance parameters, Journal of the Royal Statistical Society, Series B, 37, No. 2, 248-258.

Florens, J.-P., Mouchart, M., and Rolin, J.-M. (1990) Elements of Bayesian Statistics, Marcel Dekker, New York.

Fraser, D. A. S. (1956) Sufficient statistics with nuisance parameters, The Annals of Mathematical Statistics, 27, 838-842.

Godambe, V. P. (1980) On sufficiency in the presence of a nuisance parameter, Biometrika, 67, No. 1, 155-162.

Halmos, P. R. and Savage, L. J. (1949) Application of the Radon-Nikodym theorem to the theory of sufficient statistics, The Annals of Mathematical Statistics, 20, 225-241.

LeCam, L. and Schwartz, L. (1960) A necessary and sufficient condition for the existence of consistent estimates, The Annals of Mathematical Statistics, 31, 140-150.

Littaye-Petit, M., Piednoir, J.-L., and Van Cutsem, B. (1969) Exhaustivité, Annales de l'Institut Henri Poincaré, 4, 289-322.

Mouchart, M. and Oulhaj, A. (2001) A note on partial sufficiency with connection to the identification problem, DP 0126, Institut de statistique, UCL, Louvain-la-Neuve (B).

Oulhaj, A. (2003) Partially sufficient statistics and identification in conditional models, PhD Thesis, Institut de statistique, UCL, Louvain-la-Neuve (B).

Picci, G. (1977) Some connections between the theory of sufficient statistics and the identifiability problem, SIAM Journal of Applied Mathematics, 33, No. 3, 383-398.

Remon, M. (1984) On a concept of partial sufficiency: L-sufficiency, International Statistical Review, 52, 127-136.

ABDERRAHIM OULHAJ
Institut de statistique
Université catholique de Louvain
20 Voie du Roman Pays
1348 Louvain-la-Neuve (Belgium)
[email protected]

MICHEL MOUCHART
Institut de statistique
Université catholique de Louvain
20 Voie du Roman Pays
1348 Louvain-la-Neuve (Belgium)
[email protected]