Transcript

Single Equation Linear GMM

Eric Zivot

Winter 2013


Single Equation Linear GMM

Consider the linear regression model

$$y_t = z_t'\delta_0 + \varepsilon_t, \qquad t = 1, \dots, n$$

$z_t$ = $L \times 1$ vector of explanatory variables

$\delta_0$ = $L \times 1$ vector of unknown coefficients

$\varepsilon_t$ = random error term

Endogeneity

The model allows for the possibility that some or all of the elements of $z_t$ may be correlated with the error term (i.e., $E[z_{tk}\varepsilon_t] \neq 0$ for some $k$).

If $E[z_{tk}\varepsilon_t] \neq 0$, then $z_{tk}$ is called an endogenous variable.

If $z_t$ contains endogenous variables, then the least squares estimator of $\delta_0$ is biased and inconsistent.


Instruments

It is assumed that there exists a $K \times 1$ vector of instrumental variables $x_t$ that may contain some or all of the elements of $z_t$.

Let $w_t$ represent the vector of unique and nonconstant elements of $\{y_t, z_t, x_t\}$.

It is assumed that $\{w_t\}$ is a stationary and ergodic stochastic process.


Moment Conditions for General Model

Define

$$g(w_t, \delta_0) = x_t\varepsilon_t = x_t(y_t - z_t'\delta_0)$$

It is assumed that the instrumental variables $x_t$ satisfy the set of orthogonality conditions

$$E[g(w_t, \delta_0)] = E[x_t\varepsilon_t] = E[x_t(y_t - z_t'\delta_0)] = 0$$

Expanding gives the relation

$$E[x_t y_t] - E[x_t z_t']\delta_0 = 0$$

$$\Rightarrow\; \underset{(K\times 1)}{\Sigma_{xy}} = \underset{(K\times L)}{\Sigma_{xz}}\;\underset{(L\times 1)}{\delta_0}$$

where $\Sigma_{xy} = E[x_t y_t]$ and $\Sigma_{xz} = E[x_t z_t']$.


Example: Demand-Supply model with supply shifter

demand: $q_t^d = \alpha_0 + \alpha_1 p_t + \varepsilon_{dt}$

supply: $q_t^s = \beta_0 + \beta_1 p_t + \beta_2\,\mathrm{temp}_t + \varepsilon_{st}$

equilibrium: $q_t = q_t^d = q_t^s$

$$E[\mathrm{temp}_t\,\varepsilon_{dt}] = E[\mathrm{temp}_t\,\varepsilon_{st}] = 0$$

Goal: Estimate the demand equation; $\alpha_1$ = demand elasticity if the data are in logs.

Remark

Simultaneity causes $p_t$ to be endogenous in the demand equation: $E[p_t\,\varepsilon_{dt}] \neq 0$

Why? Use the equilibrium condition, solve for $p_t$, and compute $E[p_t\,\varepsilon_{dt}]$.
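Carrying out that step (a sketch, under the assumptions that $\alpha_1 \neq \beta_1$, $\mathrm{var}(\varepsilon_{dt}) = \sigma_d^2$, and that $\mathrm{temp}_t$ and $\varepsilon_{st}$ are uncorrelated with $\varepsilon_{dt}$): equating demand and supply and solving for the equilibrium price gives

$$p_t = \frac{\beta_0 - \alpha_0 + \beta_2\,\mathrm{temp}_t + \varepsilon_{st} - \varepsilon_{dt}}{\alpha_1 - \beta_1}
\;\Rightarrow\;
E[p_t\,\varepsilon_{dt}] = \frac{-\sigma_d^2}{\alpha_1 - \beta_1} \neq 0$$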


Demand/Supply model in Hayashi notation

$$y_t = z_t'\delta + \varepsilon_t, \qquad y_t = q_t, \quad \varepsilon_t = \varepsilon_{dt}$$

$$z_t = (1,\ p_t)', \qquad \delta = (\alpha_0,\ \alpha_1)'$$

$$x_t = (1,\ \mathrm{temp}_t)', \qquad w_t = (q_t,\ p_t,\ \mathrm{temp}_t)'$$

$$L = 2, \qquad K = 2$$

Note: the constant 1 is common to both $z_t$ and $x_t$.


Example: Wage Equation

$$\ln W_i = \delta_1 + \delta_2 S_i + \delta_3 EXPR_i + \delta_4 IQ_i + \varepsilon_i$$

$W_i$ = wages

$S_i$ = years of schooling

$EXPR_i$ = years of experience

$IQ_i$ = score on IQ test

$\delta_2$ = rate of return to schooling

Assume

$$E[S_i\,\varepsilon_i] = E[EXPR_i\,\varepsilon_i] = 0$$

Endogeneity: $IQ_i$ is a proxy for unobserved ability but is measured with error

$$IQ_i = \text{ability}_i + \text{measurement error}_i \;\Rightarrow\; E[IQ_i\,\varepsilon_i] \neq 0$$


Instruments:

$AGE_i$ = age in years

$MED_i$ = years of mother's education

$$E[AGE_i\,\varepsilon_i] = E[MED_i\,\varepsilon_i] = 0$$


Wage Equation in Hayashi notation

$$y_i = z_i'\delta + \varepsilon_i, \qquad y_i = \ln W_i$$

$$z_i = (1,\ S_i,\ EXPR_i,\ IQ_i)'$$

$$x_i = (1,\ S_i,\ EXPR_i,\ AGE_i,\ MED_i)'$$

$$w_i = (\ln W_i,\ S_i,\ EXPR_i,\ IQ_i,\ AGE_i,\ MED_i)'$$

$$L = 4, \qquad K = 5$$

Remarks

1. $1,\ S_i,\ EXPR_i$ are often called included exogenous variables. That is, they are included in the behavioral wage equation.

2. $AGE_i,\ MED_i$ are often called excluded exogenous variables. That is, they are excluded from the behavioral wage equation.


Identification

Identification means that $\delta_0$ is the unique solution to the moment equations

$$E[g(w_t, \delta_0)] = 0$$

That is, $\delta_0$ is identified provided

$$E[g(w_t, \delta_0)] = 0 \quad\text{and}\quad E[g(w_t, \delta)] \neq 0 \ \text{ for } \delta \neq \delta_0$$

For the linear GMM model, this is equivalent to

$$\underset{(K\times 1)}{\Sigma_{xy}} = \underset{(K\times L)}{\Sigma_{xz}}\;\underset{(L\times 1)}{\delta_0}
\quad\text{and}\quad
\underset{(K\times 1)}{\Sigma_{xy}} \neq \underset{(K\times L)}{\Sigma_{xz}}\;\underset{(L\times 1)}{\delta}
\ \text{ for } \delta \neq \delta_0$$


Rank Condition

For identification of $\delta_0$, it is required that the $K \times L$ matrix $E[x_t z_t'] = \Sigma_{xz}$ be of full rank $L$. This rank condition is usually stated as

$$\mathrm{rank}(\Sigma_{xz}) = L$$

This condition ensures that $\delta_0$ is the unique solution to $E[x_t(y_t - z_t'\delta_0)] = 0$.

Note: If $K = L$, then $\Sigma_{xz}$ is invertible and $\delta_0$ may be determined using

$$\delta_0 = \Sigma_{xz}^{-1}\Sigma_{xy}$$


Example: Demand/Supply Equation

$$x_t = (1,\ \mathrm{temp}_t)', \qquad z_t = (1,\ p_t)', \qquad K = 2, \quad L = 2$$

$$\Sigma_{xz} = E[x_t z_t'] = \begin{pmatrix} 1 & E[p_t] \\ E[\mathrm{temp}_t] & E[\mathrm{temp}_t\, p_t] \end{pmatrix}$$

For the rank condition, we need $\mathrm{rank}(\Sigma_{xz}) = L = 2$. Now, $\mathrm{rank}(\Sigma_{xz}) = 2$ if $\det(\Sigma_{xz}) \neq 0$. Further,

$$\det(\Sigma_{xz}) = E[\mathrm{temp}_t\, p_t] - E[\mathrm{temp}_t]\,E[p_t] = \mathrm{cov}(\mathrm{temp}_t,\ p_t)$$

Hence, $\mathrm{rank}(\Sigma_{xz}) = 2$ provided $\mathrm{cov}(\mathrm{temp}_t,\ p_t) \neq 0$. That is, $\delta_0$ is identified provided the instrument is correlated with the endogenous variable.
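A quick numerical check of this condition (a minimal sketch in Python/NumPy; the simulated design and all parameter values are illustrative assumptions, not numbers from the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Simulate the demand/supply system with supply shifter temp (assumed parameters).
a0, a1 = 10.0, -1.0          # demand intercept and slope
b0, b1, b2 = 2.0, 1.0, 0.5   # supply parameters
temp = rng.normal(20, 5, n)
eps_d = rng.normal(0, 1, n)
eps_s = rng.normal(0, 1, n)

# Equilibrium price solves a0 + a1*p + eps_d = b0 + b1*p + b2*temp + eps_s.
p = (b0 - a0 + b2 * temp + eps_s - eps_d) / (a1 - b1)

# Sample analogue of Sigma_xz with x_t = (1, temp_t)' and z_t = (1, p_t)'.
X = np.column_stack([np.ones(n), temp])
Z = np.column_stack([np.ones(n), p])
S_xz = X.T @ Z / n

print("det(S_xz):  ", np.linalg.det(S_xz))      # approximately cov(temp, p)
print("cov(temp,p):", np.cov(temp, p)[0, 1])    # nonzero, so the rank condition holds
```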


Order Condition

A necessary condition for the identification of $\delta_0$ is the order condition

$$K \geq L$$

which simply states that the number of instrumental variables must be greater than or equal to the number of explanatory variables.

If $K = L$, then $\delta_0$ is said to be (apparently) just identified;

if $K > L$, then $\delta_0$ is said to be (apparently) overidentified;

if $K < L$, then $\delta_0$ is not identified.

Note: the word "apparently" in parentheses is used to remind the reader that the rank condition $\mathrm{rank}(\Sigma_{xz}) = L$ must also be satisfied for identification.


Conditional Heteroskedasticity

In the regression model, the error terms are allowed to be conditionally heteroskedastic and/or serially correlated. Serially correlated errors will be treated later on.

For the case in which $\varepsilon_t$ is conditionally heteroskedastic, it is assumed that $\{g_t\} = \{x_t\varepsilon_t\}$ is a stationary and ergodic martingale difference sequence (MDS) satisfying

$$E[g_t g_t'] = E[x_t x_t'\varepsilon_t^2] = S$$

$$S = \text{nonsingular } K \times K \text{ matrix}$$

Note: if $\mathrm{var}(\varepsilon_t \mid x_t) = \sigma^2$, then $S = \sigma^2\Sigma_{xx}$, where $\Sigma_{xx} = E[x_t x_t']$.


Remark:

The matrix $S$ is the asymptotic variance-covariance matrix of the sample moments $\bar{g} = n^{-1}\sum_{t=1}^n g(w_t, \delta_0)$. This follows from the central limit theorem for ergodic stationary martingale difference sequences (see Hayashi, 2000, p. 106):

$$\sqrt{n}\,\bar{g} = \frac{1}{\sqrt{n}}\sum_{t=1}^n x_t\varepsilon_t \xrightarrow{d} N(0, S)$$

$$S = E[g_t g_t'] = \mathrm{avar}(\sqrt{n}\,\bar{g})$$


Definition of the GMM Estimator

The GMM estimator of $\delta_0$ is constructed by exploiting the orthogonality conditions $E[x_t(y_t - z_t'\delta_0)] = 0$. The idea is to create a set of estimating equations for $\delta_0$ by making sample moments match the population moments.

The sample moments for an arbitrary value δ are

$$g_n(\delta) = \frac{1}{n}\sum_{t=1}^n g(w_t, \delta) = \frac{1}{n}\sum_{t=1}^n x_t(y_t - z_t'\delta)
= \begin{pmatrix} \frac{1}{n}\sum_{t=1}^n x_{1t}(y_t - z_t'\delta) \\ \vdots \\ \frac{1}{n}\sum_{t=1}^n x_{Kt}(y_t - z_t'\delta) \end{pmatrix}$$

These moment conditions are a set of $K$ linear equations in $L$ unknowns.


Equating these sample moments to the population moment $E[x_t\varepsilon_t] = 0$ gives the estimating equations

$$\frac{1}{n}\sum_{t=1}^n x_t(y_t - z_t'\delta) = S_{xy} - S_{xz}\delta = 0$$

where

$$S_{xy} = n^{-1}\sum_{t=1}^n x_t y_t, \qquad S_{xz} = n^{-1}\sum_{t=1}^n x_t z_t'$$

are the sample moments.


Analytical Solution: Just Identified Model

If $K = L$ ($\delta_0$ is just identified) and $S_{xz}$ is invertible, then the GMM estimator of $\delta_0$ is

$$\hat{\delta} = S_{xz}^{-1}S_{xy}$$

which is also known as the indirect least squares (ILS) or instrumental variables (IV) estimator.
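In the just-identified case this estimator is one line of linear algebra. A minimal sketch (Python/NumPy) on the simulated demand/supply design used above (all parameter values are illustrative assumptions), comparing OLS with IV:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000

# Demand/supply system with supply shifter temp (assumed parameters).
a0, a1 = 10.0, -1.0
b0, b1, b2 = 2.0, 1.0, 0.5
temp = rng.normal(20, 5, n)
eps_d = rng.normal(0, 1, n)
eps_s = rng.normal(0, 1, n)
p = (b0 - a0 + b2 * temp + eps_s - eps_d) / (a1 - b1)  # equilibrium price
q = a0 + a1 * p + eps_d                                # equilibrium quantity

y = q
Z = np.column_stack([np.ones(n), p])       # regressors z_t = (1, p_t)'
X = np.column_stack([np.ones(n), temp])    # instruments x_t = (1, temp_t)'

S_xy = X.T @ y / n
S_xz = X.T @ Z / n
delta_iv = np.linalg.solve(S_xz, S_xy)     # delta_hat = S_xz^{-1} S_xy

delta_ols = np.linalg.lstsq(Z, y, rcond=None)[0]
print("true (a0, a1):", (a0, a1))
print("IV :", delta_iv)     # consistent: close to (10, -1)
print("OLS:", delta_ols)    # inconsistent because p_t is endogenous
```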


Remark: It is straightforward to establish the consistency of $\hat{\delta}$.

Since $y_t = z_t'\delta_0 + \varepsilon_t$,

$$S_{xy} = n^{-1}\sum_{t=1}^n x_t y_t = n^{-1}\sum_{t=1}^n x_t(z_t'\delta_0 + \varepsilon_t) = S_{xz}\delta_0 + S_{x\varepsilon}$$

Then

$$\hat{\delta} - \delta_0 = S_{xz}^{-1}S_{x\varepsilon}$$


By the ergodic theorem and Slutsky’s theorem

$$S_{xz} = n^{-1}\sum_{t=1}^n x_t z_t' \xrightarrow{p} E[x_t z_t'] = \Sigma_{xz}$$

$$S_{xz}^{-1} \xrightarrow{p} \Sigma_{xz}^{-1} \quad\text{provided } \mathrm{rank}(\Sigma_{xz}) = L$$

$$S_{x\varepsilon} = n^{-1}\sum_{t=1}^n x_t\varepsilon_t \xrightarrow{p} E[x_t\varepsilon_t] = 0$$

$$\Rightarrow\; \hat{\delta} - \delta_0 = S_{xz}^{-1}S_{x\varepsilon} \xrightarrow{p} 0$$


Analytic Solution: Overidentified Model

If $K > L$, then there are $K$ equations in $L$ unknowns. In general, there is no solution to the estimating equations

$$S_{xy} - S_{xz}\delta = 0$$

GMM Solution: Use the method of minimum chi-square. The idea is to try to find a $\delta$ that makes $S_{xy} - S_{xz}\delta$ as close to zero as possible. This is achieved by minimizing a weighted average of the moment equations.


GMM Objective Function

Let $\hat{W}$ denote a $K \times K$ symmetric and positive definite (p.d.) weight matrix, possibly dependent on the data, such that $\hat{W} \xrightarrow{p} W$ as $n \to \infty$ with $W$ symmetric and p.d. The GMM objective function is defined as

$$J(\delta, \hat{W}) = n\, g_n(\delta)'\hat{W} g_n(\delta) = n\,(S_{xy} - S_{xz}\delta)'\hat{W}(S_{xy} - S_{xz}\delta)$$


Remarks

1. The scaling by $n$ will be explained later.

2. The elements of $\hat{W}$ determine the weights put on the moment equations. For example, let $K = 2$, $L = 1$, so that

$$g_n(\delta) = \begin{pmatrix} g_{1n}(\delta) \\ g_{2n}(\delta) \end{pmatrix}, \qquad
\hat{W} = \begin{pmatrix} \hat{w}_{11} & \hat{w}_{12} \\ \hat{w}_{12} & \hat{w}_{22} \end{pmatrix}$$

Then

$$J(\delta, \hat{W}) = n\, g_n(\delta)'\hat{W} g_n(\delta)
= n\left(\hat{w}_{11}\, g_{1n}(\delta)^2 + 2\hat{w}_{12}\, g_{1n}(\delta)\, g_{2n}(\delta) + \hat{w}_{22}\, g_{2n}(\delta)^2\right)$$

3. We will discuss how to determine the best $\hat{W}$ later.


Definition of GMM estimator

The GMM estimator of $\delta_0$, denoted $\hat{\delta}(\hat{W})$, is defined as

$$\hat{\delta}(\hat{W}) = \arg\min_{\delta}\; J(\delta, \hat{W}) = \arg\min_{\delta}\; n\,(S_{xy} - S_{xz}\delta)'\hat{W}(S_{xy} - S_{xz}\delta)$$

Since $J(\delta, \hat{W})$ is a simple quadratic form in $\delta$, straightforward calculus may be used to determine the analytic solution for $\hat{\delta}(\hat{W})$:

$$\hat{\delta}(\hat{W}) = (S_{xz}'\hat{W} S_{xz})^{-1}S_{xz}'\hat{W} S_{xy}$$
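This closed form is easy to code directly. A minimal sketch (Python/NumPy; the helper name `linear_gmm` and the toy overidentified design are illustrative assumptions):

```python
import numpy as np

def linear_gmm(X, Z, y, W):
    """GMM estimate delta_hat(W) = (S_xz' W S_xz)^{-1} S_xz' W S_xy."""
    n = len(y)
    S_xy, S_xz = X.T @ y / n, X.T @ Z / n
    return np.linalg.solve(S_xz.T @ W @ S_xz, S_xz.T @ W @ S_xy)

# Toy overidentified design: one endogenous regressor, two excluded instruments.
rng = np.random.default_rng(2)
n = 5_000
x2, x3 = rng.normal(size=n), rng.normal(size=n)          # instruments
u = rng.normal(size=n)                                   # structural error
z1 = 0.8 * x2 + 0.4 * x3 + 0.5 * u + rng.normal(size=n)  # endogenous regressor
y = 1.0 + 2.0 * z1 + u                                   # true delta_0 = (1, 2)'

Z = np.column_stack([np.ones(n), z1])        # L = 2
X = np.column_stack([np.ones(n), x2, x3])    # K = 3 > L: overidentified
print(linear_gmm(X, Z, y, np.eye(3)))        # approximately (1, 2)
```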


Derivation of GMM estimator

First, note that

$$J(\delta, \hat{W}) = n\,(S_{xy} - S_{xz}\delta)'\hat{W}(S_{xy} - S_{xz}\delta)
= n\left[S_{xy}'\hat{W} S_{xy} - 2 S_{xy}'\hat{W} S_{xz}\delta + \delta'S_{xz}'\hat{W} S_{xz}\delta\right]$$

Then

$$\frac{\partial J(\delta, \hat{W})}{\partial \delta} = -2n\, S_{xz}'\hat{W} S_{xy} + 2n\, S_{xz}'\hat{W} S_{xz}\delta$$

Solving for $\delta$ such that $\partial J(\delta, \hat{W})/\partial \delta = 0$ gives

$$\hat{\delta}(\hat{W}) = (S_{xz}'\hat{W} S_{xz})^{-1}S_{xz}'\hat{W} S_{xy}$$

provided $S_{xz}'\hat{W} S_{xz}$ is invertible (which requires $\mathrm{rank}(S_{xz}) = L$).


Recall the results from matrix algebra

$$\frac{\partial(a'\beta)}{\partial \beta} = a, \qquad
\frac{\partial(\beta'A\beta)}{\partial \beta} = 2A\beta \ \text{ for } A \text{ symmetric}$$


Asymptotic Properties

Under standard regularity conditions (see Hayashi, 2000, Chap. 3), it can be shown that

$$\hat{\delta}(\hat{W}) = (S_{xz}'\hat{W} S_{xz})^{-1}S_{xz}'\hat{W} S_{xy} \xrightarrow{p} \delta_0$$

$$\sqrt{n}\left(\hat{\delta}(\hat{W}) - \delta_0\right) \xrightarrow{d} N\!\left(0,\ \mathrm{avar}(\hat{\delta}(\hat{W}))\right)$$

where

$$\mathrm{avar}(\hat{\delta}(\hat{W})) = (\Sigma_{xz}'W\Sigma_{xz})^{-1}\Sigma_{xz}'W S W\Sigma_{xz}(\Sigma_{xz}'W\Sigma_{xz})^{-1}$$


A consistent estimate of $\mathrm{avar}(\hat{\delta}(\hat{W}))$, denoted $\widehat{\mathrm{avar}}(\hat{\delta}(\hat{W}))$, may be computed using

$$\widehat{\mathrm{avar}}(\hat{\delta}(\hat{W})) = (S_{xz}'\hat{W} S_{xz})^{-1}S_{xz}'\hat{W}\hat{S}\hat{W} S_{xz}(S_{xz}'\hat{W} S_{xz})^{-1}$$

where $\hat{S}$ is a consistent estimate for $S = \mathrm{avar}(\sqrt{n}\,\bar{g})$.

Note: $S$ can be consistently estimated using the White or heteroskedasticity consistent (HC) estimator

$$\hat{S} = \frac{1}{n}\sum_{t=1}^n x_t x_t'\hat{\varepsilon}_t^2, \qquad \hat{\varepsilon}_t = y_t - z_t'\hat{\delta}(\hat{W})$$

because $\hat{\delta}(\hat{W}) \xrightarrow{p} \delta_0$.
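A sketch of this sandwich variance calculation (Python/NumPy; the helper name `gmm_avar` and the heteroskedastic toy design are illustrative assumptions):

```python
import numpy as np

def gmm_avar(X, Z, y, W, delta_hat):
    """Sandwich estimate of avar(delta_hat(W)) with the White/HC estimate of S."""
    n = len(y)
    S_xz = X.T @ Z / n
    eps_hat = y - Z @ delta_hat
    S_hat = (X * (eps_hat ** 2)[:, None]).T @ X / n   # (1/n) sum x_t x_t' eps_t^2
    A_inv = np.linalg.inv(S_xz.T @ W @ S_xz)
    return A_inv @ S_xz.T @ W @ S_hat @ W @ S_xz @ A_inv

# Toy design with conditionally heteroskedastic errors (illustrative assumptions):
rng = np.random.default_rng(3)
n = 5_000
x2, x3 = rng.normal(size=n), rng.normal(size=n)
u = rng.normal(size=n) * (1 + 0.5 * np.abs(x2))       # var(u|x) depends on x2
z1 = 0.8 * x2 + 0.4 * x3 + 0.5 * u + rng.normal(size=n)
y = 1.0 + 2.0 * z1 + u
Z = np.column_stack([np.ones(n), z1])
X = np.column_stack([np.ones(n), x2, x3])

W = np.eye(3)
S_xy, S_xz = X.T @ y / n, X.T @ Z / n
delta_hat = np.linalg.solve(S_xz.T @ W @ S_xz, S_xz.T @ W @ S_xy)
avar = gmm_avar(X, Z, y, W, delta_hat)
print("standard errors:", np.sqrt(np.diag(avar) / n))  # se(delta_hat) = sqrt(avar/n)
```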


Sketch of Proof

For consistency, note that

$$\hat{\delta}(\hat{W}) = (S_{xz}'\hat{W} S_{xz})^{-1}S_{xz}'\hat{W} S_{xy}
= (S_{xz}'\hat{W} S_{xz})^{-1}S_{xz}'\hat{W}(S_{xz}\delta_0 + S_{x\varepsilon})
= \delta_0 + (S_{xz}'\hat{W} S_{xz})^{-1}S_{xz}'\hat{W} S_{x\varepsilon}$$


By assumption, $\hat{W} \xrightarrow{p} W$ p.d. By the ergodic theorem and Slutsky's theorem,

$$S_{xz} = n^{-1}\sum_{t=1}^n x_t z_t' \xrightarrow{p} E[x_t z_t'] = \Sigma_{xz}$$

$$S_{xz}'\hat{W} \xrightarrow{p} \Sigma_{xz}'W$$

$$\left(S_{xz}'\hat{W} S_{xz}\right)^{-1} \xrightarrow{p} \left(\Sigma_{xz}'W\Sigma_{xz}\right)^{-1} \quad\text{provided } \mathrm{rank}(\Sigma_{xz}) = L$$

$$S_{x\varepsilon} = n^{-1}\sum_{t=1}^n x_t\varepsilon_t \xrightarrow{p} E[x_t\varepsilon_t] = 0$$

$$\Rightarrow\; \hat{\delta}(\hat{W}) - \delta_0 \xrightarrow{p} 0$$


For asymptotic normality, write

$$\sqrt{n}\left(\hat{\delta}(\hat{W}) - \delta_0\right) = \left(S_{xz}'\hat{W} S_{xz}\right)^{-1}S_{xz}'\hat{W}\,\sqrt{n}\,S_{x\varepsilon}$$

By the ergodic theorem and Slutsky's theorem,

$$S_{xz} \xrightarrow{p} E[x_t z_t'] = \Sigma_{xz}, \qquad
\left(S_{xz}'\hat{W} S_{xz}\right)^{-1} \xrightarrow{p} \left(\Sigma_{xz}'W\Sigma_{xz}\right)^{-1}$$

Since $\{g_t\} = \{x_t\varepsilon_t\}$ is a stationary and ergodic martingale difference sequence (MDS) satisfying

$$E[g_t g_t'] = E[x_t x_t'\varepsilon_t^2] = S, \qquad S = \text{nonsingular } K \times K \text{ matrix}$$

the CLT for stationary-ergodic MDS gives

$$\sqrt{n}\,S_{x\varepsilon} = \frac{1}{\sqrt{n}}\sum_{t=1}^n x_t\varepsilon_t \xrightarrow{d} N(0, S)$$


Furthermore, Slutsky's theorem gives

$$S_{xz}'\hat{W}\,\sqrt{n}\,S_{x\varepsilon} \xrightarrow{d} \Sigma_{xz}'W \cdot N(0, S)$$

Therefore

$$\sqrt{n}\left(\hat{\delta}(\hat{W}) - \delta_0\right) \xrightarrow{d} \left(\Sigma_{xz}'W\Sigma_{xz}\right)^{-1}\Sigma_{xz}'W \cdot N(0, S)
\equiv N\!\left(0,\ \left(\Sigma_{xz}'W\Sigma_{xz}\right)^{-1}\Sigma_{xz}'W S W\Sigma_{xz}\left(\Sigma_{xz}'W\Sigma_{xz}\right)^{-1}\right)$$

Note:

$$A \cdot N(0, S) \equiv N(0,\ ASA')$$


The Efficient GMM Estimator

For a given set of instruments $x_t$, the GMM estimator $\hat{\delta}(\hat{W})$ is defined for an arbitrary p.d. and symmetric weight matrix $\hat{W}$. The asymptotic variance of $\hat{\delta}(\hat{W})$ depends on the chosen weight matrix $W$:

$$\mathrm{avar}(\hat{\delta}(\hat{W})) = (\Sigma_{xz}'W\Sigma_{xz})^{-1}\Sigma_{xz}'W S W\Sigma_{xz}(\Sigma_{xz}'W\Sigma_{xz})^{-1}$$

A natural question to ask is: What weight matrix $W$ produces the smallest value of $\mathrm{avar}(\hat{\delta}(\hat{W}))$? The GMM estimator constructed with this weight matrix is called the efficient GMM estimator.


Result: Hansen (1982) showed that the efficient GMM estimator results from setting $\hat{W} = \hat{S}^{-1}$ such that $\hat{S} \xrightarrow{p} S = E[x_t x_t'\varepsilon_t^2] = \mathrm{avar}(\sqrt{n}\, g_n(\delta_0))$. For this choice of $\hat{W}$, the asymptotic variance formula reduces to

$$\mathrm{avar}(\hat{\delta}(\hat{S}^{-1})) = (\Sigma_{xz}'S^{-1}\Sigma_{xz})^{-1}$$

of which a consistent estimate is

$$\widehat{\mathrm{avar}}(\hat{\delta}(\hat{S}^{-1})) = (S_{xz}'\hat{S}^{-1}S_{xz})^{-1}$$


The efficient GMM estimator is defined as

$$\hat{\delta}(\hat{S}^{-1}) = \arg\min_{\delta}\; J(\delta, \hat{S}^{-1}) = \arg\min_{\delta}\; n\, g_n(\delta)'\hat{S}^{-1}g_n(\delta)$$

which requires a consistent estimate of $S = E[g_t g_t'] = E[x_t x_t'\varepsilon_t^2]$.

However, a consistent estimate of $S$, in turn, requires a consistent estimate of $\delta_0$. White (1982) showed that a consistent estimate of $S$ has the form

$$\hat{S} = \frac{1}{n}\sum_{t=1}^n x_t x_t'\hat{\varepsilon}_t^2 = \frac{1}{n}\sum_{t=1}^n x_t x_t'(y_t - z_t'\hat{\delta})^2$$

such that $\hat{\delta} \xrightarrow{p} \delta_0$.


Two-Step Efficient GMM

The two-step efficient GMM estimator utilizes the result that a consistent estimate of $\delta_0$ may be computed by GMM with an arbitrary positive definite and symmetric weight matrix $\hat{W}$ such that $\hat{W} \xrightarrow{p} W$. Let $\hat{\delta}(\hat{W})$ denote such an estimate.

Common choices for $\hat{W}$ are $\hat{W} = I_K$ and $\hat{W} = S_{xx}^{-1} = (n^{-1}X'X)^{-1}$, where $X$ is an $n \times K$ matrix with $t$th row equal to $x_t'$.

Then, a first step consistent estimate of $S$ is given by

$$\hat{S}(\hat{W}) = \frac{1}{n}\sum_{t=1}^n x_t x_t'\left(y_t - z_t'\hat{\delta}(\hat{W})\right)^2$$


The two-step efficient GMM estimator is then defined as

$$\hat{\delta}(\hat{S}^{-1}(\hat{W})) = \arg\min_{\delta}\; n\, g_n(\delta)'\hat{S}^{-1}(\hat{W})\, g_n(\delta)$$

$$\Rightarrow\; \hat{\delta}(\hat{S}^{-1}(\hat{W})) = (S_{xz}'\hat{S}^{-1}(\hat{W})S_{xz})^{-1}S_{xz}'\hat{S}^{-1}(\hat{W})S_{xy}$$

Drawback: The numerical value of $\hat{\delta}(\hat{S}^{-1}(\hat{W}))$ depends on $\hat{W}$.
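A compact sketch of the two-step procedure (Python/NumPy; the helper names and the toy design are illustrative assumptions):

```python
import numpy as np

def gmm_step(X, Z, y, W):
    """Closed-form linear GMM estimate for a given weight matrix W."""
    n = len(y)
    S_xy, S_xz = X.T @ y / n, X.T @ Z / n
    return np.linalg.solve(S_xz.T @ W @ S_xz, S_xz.T @ W @ S_xy)

def S_hat(X, Z, y, delta):
    """White/HC estimate of S = E[x x' eps^2] evaluated at delta."""
    eps = y - Z @ delta
    return (X * (eps ** 2)[:, None]).T @ X / len(y)

# Toy overidentified design (illustrative assumptions):
rng = np.random.default_rng(4)
n = 5_000
x2, x3 = rng.normal(size=n), rng.normal(size=n)
u = rng.normal(size=n) * (1 + 0.5 * np.abs(x2))
z1 = 0.8 * x2 + 0.4 * x3 + 0.5 * u + rng.normal(size=n)
y = 1.0 + 2.0 * z1 + u
Z = np.column_stack([np.ones(n), z1])
X = np.column_stack([np.ones(n), x2, x3])

# Step 1: consistent but inefficient estimate, here with W = S_xx^{-1}.
delta1 = gmm_step(X, Z, y, np.linalg.inv(X.T @ X / n))

# Step 2: efficient GMM with W = S_hat^{-1} from the first-step residuals.
delta2 = gmm_step(X, Z, y, np.linalg.inv(S_hat(X, Z, y, delta1)))
print("two-step efficient GMM:", delta2)
```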


Iterated Efficient GMM

The iterated efficient GMM estimator uses the two-step efficient GMM estimator $\hat{\delta}(\hat{S}^{-1}(\hat{W}))$ to update the estimation of $S$ and then recomputes the estimator. The process is repeated (iterated) until the estimates of $\delta_0$ do not change significantly from one iteration to the next. Typically, only a few iterations are required. The resulting estimator is denoted $\hat{\delta}(\hat{S}_{\mathrm{iter}}^{-1})$.

The iterated efficient GMM estimator has the same asymptotic distribution as the two-step efficient estimator. However, in finite samples, the two estimators may differ.

The iterated GMM estimator has a practical advantage over the two-step estimator in that the resulting estimates are invariant with respect to the scale of the data and to the initial weighting matrix $\hat{W}$.
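The iteration itself is a short loop around the two-step update. A sketch (Python/NumPy; the function name, convergence tolerance, and iteration cap are illustrative assumptions):

```python
import numpy as np

def iterated_gmm(X, Z, y, W0, tol=1e-8, max_iter=100):
    """Iterate: estimate delta, re-estimate S_hat(delta), repeat until convergence."""
    n = len(y)
    S_xy, S_xz = X.T @ y / n, X.T @ Z / n

    def step(W):   # closed-form linear GMM for weight matrix W
        return np.linalg.solve(S_xz.T @ W @ S_xz, S_xz.T @ W @ S_xy)

    def S_of(d):   # White/HC estimate of S at delta = d
        eps2 = (y - Z @ d) ** 2
        return (X * eps2[:, None]).T @ X / n

    delta = step(W0)                    # first-step estimate
    for _ in range(max_iter):
        delta_new = step(np.linalg.inv(S_of(delta)))
        if np.max(np.abs(delta_new - delta)) < tol:
            break                       # estimates no longer change appreciably
        delta = delta_new
    return delta_new
```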


Continuous Updating Efficient GMM

This estimator simultaneously estimates $S$, as a function of $\delta$, and $\delta$. It is defined as

$$\hat{\delta}(\hat{S}_{\mathrm{CU}}^{-1}) = \arg\min_{\delta}\; J(\delta, \hat{S}^{-1}(\delta)) = \arg\min_{\delta}\; n\, g_n(\delta)'\hat{S}^{-1}(\delta)\, g_n(\delta)$$

$$\hat{S}(\delta) = \frac{1}{n}\sum_{t=1}^n x_t x_t'(y_t - z_t'\delta)^2$$

Hansen, Heaton, and Yaron (1996) call $\hat{\delta}(\hat{S}_{\mathrm{CU}}^{-1})$ the continuous updating (CU) efficient GMM estimator.
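Because the weight matrix moves with $\delta$, the CU objective has no closed form and must be minimized numerically. A sketch using `scipy.optimize.minimize` (the starting value, optimizer choice, and toy design are illustrative assumptions):

```python
import numpy as np
from scipy.optimize import minimize

def cu_objective(delta, X, Z, y):
    """J(delta, S_hat(delta)^{-1}) = n * g_n(delta)' S_hat(delta)^{-1} g_n(delta)."""
    n = len(y)
    eps = y - Z @ delta
    g = X.T @ eps / n                          # g_n(delta)
    S = (X * (eps ** 2)[:, None]).T @ X / n    # S_hat(delta)
    return n * g @ np.linalg.solve(S, g)

# Toy overidentified design (illustrative assumptions):
rng = np.random.default_rng(5)
n = 5_000
x2, x3 = rng.normal(size=n), rng.normal(size=n)
u = rng.normal(size=n)
z1 = 0.8 * x2 + 0.4 * x3 + 0.5 * u + rng.normal(size=n)
y = 1.0 + 2.0 * z1 + u
Z = np.column_stack([np.ones(n), z1])
X = np.column_stack([np.ones(n), x2, x3])

delta0 = np.linalg.lstsq(Z, y, rcond=None)[0]   # OLS starting value
res = minimize(cu_objective, delta0, args=(X, Z, y), method="BFGS")
print("CU efficient GMM:", res.x)
```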


Remarks:

1. The CU estimator is asymptotically equivalent to the two-step and iterated estimators, but may differ in finite samples.

2. The CU estimator does not depend on an initial weight matrix $\hat{W}$, and like the iterated efficient GMM estimator, the numerical value of the CU estimator is invariant to the scale of the data.

3. The CU estimator is a non-linear function of $\delta$ and does not have a closed form expression like the two-step or iterated estimators. It is computationally more burdensome than the iterated estimator, especially for large nonlinear models, and is more prone to numerical instability.

4. Hansen, Heaton, and Yaron found that the finite sample performance of the CU estimator, and test statistics based on it, is often superior to the other estimators.


Remarks continued:

5. Under homoskedastic and normally distributed errors, the CU estimator is equivalent to the (limited information) maximum likelihood estimator.

6. More generally, the CU estimator is in the class of generalized empirical likelihood (GEL) estimators, which are related to non-parametric maximum likelihood estimators, and have certain optimality properties.

7. The CU estimator is more robust to the presence of weak instruments than the two-step or iterated GMM estimators. Here, “robust” loosely means that the asymptotic distribution of the CU estimator does not get as screwed up as the asymptotic distributions of the two-step and iterated GMM estimators when instruments are characterized as “weak”. We will more formally characterize weak instruments later on in the course.


Two-Stage Least Squares as an Efficient GMM Estimator

If the errors are conditionally homoskedastic, then

$$E[x_t x_t'\varepsilon_t^2] = \sigma^2\Sigma_{xx} = S$$

A consistent estimate of $S$ has the form $\hat{S} = \hat{\sigma}^2 S_{xx}$, where $\hat{\sigma}^2 \xrightarrow{p} \sigma^2$. Typically,

$$\hat{\sigma}^2 = n^{-1}\sum_{t=1}^n (y_t - z_t'\hat{\delta})^2, \qquad \hat{\delta} \xrightarrow{p} \delta_0$$

The efficient GMM estimator becomes

$$\hat{\delta}(\hat{\sigma}^{-2}S_{xx}^{-1}) = (S_{xz}'\hat{\sigma}^{-2}S_{xx}^{-1}S_{xz})^{-1}S_{xz}'\hat{\sigma}^{-2}S_{xx}^{-1}S_{xy}
= (S_{xz}'S_{xx}^{-1}S_{xz})^{-1}S_{xz}'S_{xx}^{-1}S_{xy} = \hat{\delta}(S_{xx}^{-1})$$

which does not depend on $\hat{\sigma}^2$.


The estimator $\hat{\delta}(S_{xx}^{-1})$ is, in fact, identical to the two-stage least squares (TSLS) estimator of $\delta_0$:

$$\hat{\delta}(S_{xx}^{-1}) = (S_{xz}'S_{xx}^{-1}S_{xz})^{-1}S_{xz}'S_{xx}^{-1}S_{xy}
= (Z'P_X Z)^{-1}Z'P_X y = (\hat{Z}'\hat{Z})^{-1}\hat{Z}'y = \hat{\delta}_{\mathrm{TSLS}}$$

where

$Z$ = $n \times L$ matrix of observations with $t$th row $z_t'$

$X$ = $n \times K$ matrix of observations with $t$th row $x_t'$

$P_X = X(X'X)^{-1}X'$

$\hat{Z} = P_X Z$
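A sketch verifying this equivalence numerically (Python/NumPy; the toy design is an illustrative assumption):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 5_000
x2, x3 = rng.normal(size=n), rng.normal(size=n)
u = rng.normal(size=n)                                   # homoskedastic errors
z1 = 0.8 * x2 + 0.4 * x3 + 0.5 * u + rng.normal(size=n)
y = 1.0 + 2.0 * z1 + u
Z = np.column_stack([np.ones(n), z1])                    # n x L
X = np.column_stack([np.ones(n), x2, x3])                # n x K

# Efficient GMM under homoskedasticity: weight matrix S_xx^{-1}.
S_xy, S_xz, S_xx = X.T @ y / n, X.T @ Z / n, X.T @ X / n
W = np.linalg.inv(S_xx)
delta_gmm = np.linalg.solve(S_xz.T @ W @ S_xz, S_xz.T @ W @ S_xy)

# TSLS via the projection P_X = X (X'X)^{-1} X'.
Z_hat = X @ np.linalg.solve(X.T @ X, X.T @ Z)            # Z_hat = P_X Z
delta_tsls = np.linalg.solve(Z_hat.T @ Z_hat, Z_hat.T @ y)

print(np.allclose(delta_gmm, delta_tsls))                # True: the two coincide
```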


The asymptotic variance of $\hat{\delta}(S_{xx}^{-1}) = \hat{\delta}_{\mathrm{TSLS}}$ is

$$\mathrm{avar}(\hat{\delta}_{\mathrm{TSLS}}) = (\Sigma_{xz}'S^{-1}\Sigma_{xz})^{-1} = \sigma^2(\Sigma_{xz}'\Sigma_{xx}^{-1}\Sigma_{xz})^{-1}$$

Although $\hat{\delta}(S_{xx}^{-1})$ does not depend on $\hat{\sigma}^2$, a consistent estimate of the asymptotic variance does:

$$\widehat{\mathrm{avar}}(\hat{\delta}_{\mathrm{TSLS}}) = \hat{\sigma}_{\mathrm{TSLS}}^2\,(S_{xz}'S_{xx}^{-1}S_{xz})^{-1}$$

$$\hat{\sigma}_{\mathrm{TSLS}}^2 = n^{-1}\sum_{t=1}^n (y_t - z_t'\hat{\delta}_{\mathrm{TSLS}})^2$$


Continuous Updating Efficient GMM under Homoskedasticity: LIML

Under conditional homoskedasticity, $S = \sigma^2\Sigma_{xx}$ and the CU efficient GMM estimator becomes

$$\hat{\delta}(\hat{S}_{\mathrm{CU}}^{-1}) = \arg\min_{\delta}\; n\, g_n(\delta)'\hat{S}^{-1}(\delta)\, g_n(\delta)$$

$$\hat{S}(\delta) = \hat{\sigma}^2(\delta)\, S_{xx} = \left(\frac{1}{n}\sum_{t=1}^n (y_t - z_t'\delta)^2\right) \times S_{xx}$$

A little algebra shows that

$$n\, g_n(\delta)'\hat{S}^{-1}(\delta)\, g_n(\delta) = n\,\frac{(y - Z\delta)'P_X(y - Z\delta)}{(y - Z\delta)'(y - Z\delta)}$$

Result: $\hat{\delta}(\hat{S}_{\mathrm{CU}}^{-1}) = \hat{\delta}_{\mathrm{LIML}}$ = the limited information maximum likelihood estimator.
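A sketch of this ratio objective (Python/NumPy with SciPy; the scale factor $n$ is dropped since it does not affect the minimizer, and the toy design and optimizer choice are illustrative assumptions):

```python
import numpy as np
from scipy.optimize import minimize

def liml_objective(delta, X, Z, y):
    """Variance ratio (y - Z d)' P_X (y - Z d) / (y - Z d)'(y - Z d)."""
    e = y - Z @ delta
    Pe = X @ np.linalg.solve(X.T @ X, X.T @ e)   # P_X e without forming P_X
    return (e @ Pe) / (e @ e)

# Toy overidentified design with homoskedastic errors (illustrative assumptions):
rng = np.random.default_rng(7)
n = 5_000
x2, x3 = rng.normal(size=n), rng.normal(size=n)
u = rng.normal(size=n)
z1 = 0.8 * x2 + 0.4 * x3 + 0.5 * u + rng.normal(size=n)
y = 1.0 + 2.0 * z1 + u
Z = np.column_stack([np.ones(n), z1])
X = np.column_stack([np.ones(n), x2, x3])

delta0 = np.linalg.lstsq(Z, y, rcond=None)[0]    # OLS starting value
res = minimize(liml_objective, delta0, args=(X, Z, y), method="Nelder-Mead")
print("LIML (CU under homoskedasticity):", res.x)
```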