Algebraic Complexity in Statistics using Combinatorial and Tensor Methods
BY
ELIZABETH GROSS
THESIS
Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Mathematics
in the Graduate College of the University of Illinois at Chicago, 2013
Chicago, Illinois
Defense Committee:
Shmuel Friedland, Chair and Advisor
Sonja Petrovic, Advisor, Penn State
Jan Verschelde
Olga Kashcheyeva
Lek-Heng Lim, University of Chicago
To Ryan and Sebastian.
ACKNOWLEDGMENTS
Thank you to my advisor, Sonja Petrovic, whose dedicated mentoring has prepared me for
life as a mathematician. Sonja has been an excellent advisor, always challenging me to do better
and always believing I could. I will continually be grateful for her guidance and knowledge.
Thank you to my committee, Shmuel Friedland, Jan Verschelde, Olga Kashcheyeva, and
Lek-Heng Lim. Jan and Olga have been supportive through their active participation in the
Graduate Computational Algebraic Geometry Seminar. Shmuel has been a second advisor to
me, and I have enjoyed our many conversations. I am grateful to Jan for his early mentoring
and introducing me to numerical algebraic geometry.
I’ve been very lucky to have multiple people who have gone above and beyond in helping me
succeed; of these people, Mathias Drton, Bernd Sturmfels, and Seth Sullivant deserve special
recognition.
I am grateful to my fellow classmates at UIC and my colleagues from SFSU. It has been
inspiring to watch everyone’s triumphs and wonderful to be part of such a vibrant and stimu-
lating department. I am also grateful for the support of my friends and family, especially my
father.
Finally, I would like to thank Ryan. Ryan, you are a loving and devoted husband and father.
You keep me grounded when I am in danger of floating away and soaring when I am leaden.
Together, we have a beautiful life. Thank you for all your help.
TABLE OF CONTENTS

1  INTRODUCTION
2  BACKGROUND
   2.1  Models, Ideals, and Varieties
   2.2  Toric models, Markov bases and Markov complexity
   2.3  Phylogenetic Models
   2.4  Tensors, Rank, and Border Rank
   2.5  Maximum Likelihood Estimation
3  TORIC IDEALS OF HYPERGRAPHS
   3.1  Introduction
   3.2  Preliminaries and notation
   3.3  Splitting sets and reducible edge sets
   3.4  Indispensable Binomials
   3.5  General degree bounds
   3.6  Hidden Subset Models
4  PHYLOGENETIC MODELS AND TENSORS OF BOUNDED RANK
   4.1  Introduction
   4.2  A characterization of V4(3, 3, 4)
   4.3  Proving case A.I.3 using degree 6 polynomials
      4.3.1  The case L = R = e3 e3^T
      4.3.2  The case L = e3 e3^T, R = e3 e2^T
   4.4  The defining polynomials of V4(4, 4, 4)
5  MAXIMUM LIKELIHOOD DEGREE OF VARIANCE COMPONENT MODELS
   5.1  Introduction
   5.2  The likelihood equations
      5.2.1  Maximum likelihood
      5.2.2  Restricted maximum likelihood
   5.3  Proof of formula for ML degree
   5.4  Proof of formula for REML degree
   5.5  Linear mixed models with multimodal likelihood functions
6  CONCLUSION

CITED LITERATURE

VITA
LIST OF FIGURES

1  Reducible balanced edge set. The green edge es is the separator.
2  Reducible balanced edge set with an improper separator. The separator consists of green edges e1 and e2.
3  Hypergraph associated to the hierarchical log-linear model for no 3-way interaction.
4  Example of a non-uniform hypergraph whose associated toric ideal is non-homogeneous.
5  Case 1. Proof of Theorem 32.
6  Case 3. Proof of Theorem 32.
SUMMARY
Fundamental questions in statistical modeling ask about the best methods for model se-
lection, goodness-of-fit testing, and estimation of parameters. For example, given a collection
of aligned DNA sequences from a group of extant species, how can we decide which evolu-
tionary tree best describes the species’ ancestral history, or, given a sparse high-dimensional
contingency table, how can we perform goodness-of-fit testing when exact tests are infeasible?
In questions such as these, combinatorics, commutative algebra and algebraic geometry play a
leading role.
We explore such questions for specific classes of models, e.g. toric models, phylogenetic
models, and variance components models, and tackle the algebraic complexity problems that
lie at the root of them. We begin our exploration by studying toric ideals of hypergraphs,
algebraic objects that are used for goodness-of-fit testing for log-linear models. In this study,
we use the combinatorics of hypergraphs to give degree bounds on the generators of the ideals,
give sufficient conditions for a binomial in the ideal to be indispensable, show that the ideal
of Tan((P1)n) is generated by quadratics and cubics in cumulant coordinates, and recover a
well-known complexity theorem in algebraic statistics due to De Loera and Onn. Second, we
explore phylogenetic models by viewing the models as sets of tensors with bounded rank. We
show that the variety of 4× 4× 4 complex-valued tensors with border rank at most 4 is defined
by polynomials of degree 5, 6, and 9. This variety corresponds to the 4-state general Markov
model on the claw tree K1,3 and its defining polynomials can be used in model selection. This
result also gives further evidence that the phylogenetic ideal of the model can be generated by
polynomials of degree 9 and less. Finally, we look at the algebraic complexity of maximum
likelihood estimation for variance components models, where we give explicit formulas for the
ML and REML degree of the random effects model for the one-way layout and give examples
of multimodal likelihood surfaces.
CHAPTER 1
INTRODUCTION
Algebraic statistics applies commutative algebra, algebraic geometry, and combinatorics to
problems arising in statistics (for surveys see [22] and [25]). In this field of study, the idea
of complexity comes up in several different respects. This thesis will explore three different
complexity issues: Markov complexity for toric models, phylogenetic complexity for phylogenetic
models, and maximum likelihood degree for variance components models. The first two of these
are both defined as the maximum degree of polynomials in a minimal generating set of a specified
ideal. The last measure of algebraic complexity, the maximum likelihood degree, is the degree
of a zero-dimensional variety.
A statistical model is a family of probability distributions where joint probabilities are
often specified parametrically. If the joint probabilities, or, more commonly, their logarithms,
are parameterized by polynomials, then the closure of the model is an algebraic variety. The
underlying idea of algebraic statistics is that information about the variety yields statistical
information about the model.
For example, the generators of the vanishing ideal of a statistical model are useful in
goodness-of-fit testing and model selection [19], [9]. These generators are called model invariants
or, in the case of log-linear models, Markov bases. Part I of this dissertation is concerned with
providing an efficient description of the model invariants for two different classes of discrete
models, those encoded by hypergraphs and those specified as tensors of bounded border rank.
In Chapter 3, we focus on statistical models that are parameterized by square-free mono-
mials. Examples of such models include log-linear models [24], [22], group-based phylogenetic
models [58], and some hidden subset models [59]. In these cases, the parameterization can be
encoded by a hypergraph H. The model invariants are the generators of the toric ideal of the
hypergraph H, the kernel of the monomial map defined by the vertex-edge incidence matrix of
H. In the context of algebraic statistics, a binomial generating set of a toric ideal is called a
Markov basis. The Markov complexity, or Markov width, of a toric ideal is the maximum degree
of the polynomials in a minimal generating set.
Section 3.4 focuses on indispensable binomials, i.e. binomials that are members of every
minimal generating set of IH . The degree of an indispensable binomial gives a lower bound
on the Markov complexity of IH . Proposition 19 gives a combinatorial sufficient condition
for determining whether a binomial f ∈ IH is indispensable. Consequently, the Graver basis
is the unique minimal generating set of IH for any 2-regular hypergraph (Proposition 20). In
Corollary 26, we apply our combinatorial lower bounds to recover a well-known complexity result
in algebraic statistics from [16], which states that Markov bases for the no 3-way interaction
model on 3× r × c contingency tables are arbitrarily complicated.
In Section 3.5, we show that a degree bound on the generators of IH is equivalent to a
combinatorial criterion on H (see Theorem 27). This result generalizes work of Villarreal [65]
and Ohsugi and Hibi [47] regarding the toric ideals of graphs and offers a way of computing an
upper bound for the Markov complexity of a given statistical model.
As an example, we consider hidden subset models. In [59], Sturmfels and Zwiernik show that
the variety of the hidden subset model for n random variables with subsets {{1}, {2}, . . . , {n}}
is isomorphic to the image of the first tangential variety Tan((P1)n) in cumulant coordinates.
Using the hypergraph approach, we are able to show that in cumulant coordinates, the defining
ideal of the first tangential variety Tan((P1)n) is generated in quadratics and cubics. Thus, as
n grows, the Markov complexity of the hidden subset model associated to Tan((P1)n) remains
constant.
In Chapter 4, we study the 4-state general tree-based Markov model on the claw tree K1,3;
this model has applications to phylogenetics [1]. In terms of multilinear algebra, the variety of
this model is the set of all tensors in C4 ⊗ C4 ⊗ C4 of border rank at most 4. Thus, we shift
from the combinatorics of hypergraphs to the study of three-way tensors in this chapter.
For phylogenetic models, the vanishing ideal associated to the model is referred to as a
phylogenetic ideal, and the maximum degree of the polynomials in a minimal generating set
is called the phylogenetic complexity of the model. For the 4-state general Markov model on
K1,3, the phylogenetic complexity is conjectured to be 9 (see Conjecture 34); this was first
conjectured in [57] and agrees with the numerical computations in [6].
In [26], Friedland shows that the variety associated to the model is cut out set-theoretically
by polynomials of degree 5, 9, and 16. The goal of Chapter 4 is to replace the degree 16 equations
with a set of degree 6 equations that are known to be in the ideal [39]. In Theorem 33, we
tighten the results of Friedland in [26] and show that the variety of tensors in C4 ⊗C4 ⊗C4 of
border rank at most 4 is cut out by polynomials of degree 5, 6, and 9. In addition to providing
supporting evidence for Conjecture 34, this result, combined with results in [26] and [6], gives
explicit polynomials that one can use to test whether a tensor in C4⊗C4⊗C4 has border rank
at most four, or equivalently, whether given data could have arisen from the 4-state general
Markov model on the claw tree K1,3.
Chapter 5 is dedicated to a different kind of complexity problem: that of maximum likelihood
estimation. In particular, we study the algebraic complexity, or ML degree, of maximum
likelihood estimation for specific models. The ML degree gives us insight into the geometry
of the likelihood surface. For example, it speaks to the total possible number of modes of the
likelihood surface. Since common maximum likelihood estimation techniques use local numerical
methods to find maxima, asking whether a given likelihood function could have more than one
local maximum is an important, but often overlooked, question in statistics.
The goal of maximum likelihood estimation is to find parameters that best explain a given
data point. This amounts to finding the zero set of the likelihood equations. This zero set is also
referred to as the likelihood locus and has been studied within an algebraic geometry setting in
[10] and [33]. The number of complex solutions to the system of likelihood equations for generic
data is called the maximum likelihood degree (ML degree); it is the degree of the likelihood locus
and it quantifies the feasibility of using symbolic algebraic methods to find maximum likelihood
estimates and gives an upper bound on the number of modes of the real likelihood surface.
Another statistical method for finding likely parameters given an array of observations is
restricted maximum likelihood estimation, a variation of maximum likelihood estimation. The
REML method involves considering a projection of the observed array; for the one-way layout
with random effects this method returns a likelihood function dependent on only two of the
three parameters. In the case of restricted maximum likelihood estimation, the REML degree
of a model is defined similarly to the ML degree: it is the number of complex solutions to the
restricted maximum likelihood equations for generic data. To our knowledge, the REML degree
has never been studied for any statistical model.
In Chapter 5, we give explicit formulas for the ML and REML degree for variance compo-
nents models, specifically one-way layouts with random effects. We conclude Chapter 5 with
two examples of multimodal likelihood functions.
CHAPTER 2
BACKGROUND
In this dissertation, we use a common shorthand notation for monomials using vectors. Let
x = (x1, . . . , xN ) be a vector of indeterminates and let a ∈ Z^N_≥0 be a non-negative integer
vector. Then we denote x^a = x1^a1 · x2^a2 · · · xN^aN.
2.1 Models, Ideals, and Varieties
In Part I, we will be concerned with discrete statistical models. The following notation
and definitions follow [22] and [60].
In the case of discrete models, we can think of the joint probability distribution of m
random variables as an m-dimensional array P. Let X1, . . . , Xm be discrete random variables
with Xl ∈ [rl]. Let R = ∏_{l=1}^{m} [rl] and N = ∏_{l=1}^{m} rl. The (i1, i2, . . . , im)th entry of P is the joint
probability P (X1 = i1, . . . , Xm = im). For simplicity, we will often flatten the tensor P into a
vector p:
Definition 1 The joint probability vector for X1, . . . Xm is the N -dimensional vector p =
(pi | i ∈ R) ∈ RN where pi is the joint probability
pi = pi1...im = P (X1 = i1, . . . , Xm = im), i = (i1, . . . , im) ∈ R.
Since we are concerned with probability distributions, the coordinates of p satisfy pi ≥ 0
for all i ∈ R and the constraint ∑_{i∈R} pi = 1. The set of all p that satisfy these constraints
forms an (N − 1)-dimensional simplex.
Definition 2 The probability simplex ∆N−1 is the set of all possible joint probability distribu-
tions for m random variables with respective state spaces [r1], . . . , [rm].
∆N−1 := {p ∈ R^N | pi ≥ 0 for all i ∈ R and ∑_{i∈R} pi = 1}.
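The flattening of the array P into the vector p and the simplex constraints can be sketched numerically; the 2 × 2 × 2 array below, with uniform entries, is a hypothetical example rather than data from the text.

```python
import numpy as np

# A hypothetical joint distribution of m = 3 binary random variables,
# stored as a 2 x 2 x 2 array P with uniform entries.
P = np.full((2, 2, 2), 1.0 / 8.0)

# Flatten the tensor P into the joint probability vector p; here
# N = r1 * r2 * r3 = 8, with coordinates indexed by i in R.
p = P.flatten()

# Membership in the probability simplex Delta_{N-1}: nonnegative
# coordinates summing to one.
in_simplex = bool(np.all(p >= 0)) and bool(np.isclose(p.sum(), 1.0))
print(p.shape, in_simplex)  # (8,) True
```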
A statistical model M for the random variables X1, . . . , Xm is a subset of ∆N−1. The
models we consider in this dissertation are parametric statistical models.
Definition 3 Let Θ ⊂ Rd be a parameter space and φ : Θ→ ∆N−1 a map. The image
M = φ(Θ) ⊂ ∆N−1
is a parametric statistical model.
In algebraic statistics, we are concerned with parametric statistical models where φ is a
rational map. In this case, it is natural to consider φ as a map from Cd → CN . The advantage
of this view is that the image φ(Cd) is well-approximated by a variety, namely its Zariski closure
(see [60, Theorem 3.6] and the discussion that follows). For completeness, we define here what we
mean by variety, ideal of a subset of CN , and Zariski closure.
In the following discussion we work over the ring R[p], treating the joint probability distri-
butions pi for i ∈ R as indeterminates.
Definition 4 Let F be a collection of polynomials in R[p]. The variety of F is the zero set of
F :
V (F ) := {x ∈ CN | f(x) = 0 for all f ∈ F}.
Notice that in the above definition the variety V (F ) may not be irreducible; in general, we will
use variety to mean an algebraic set.
Let S ⊂ CN . We define the ideal of S as
I(S) := {f ∈ R[p] |f(x) = 0 for all x ∈ S}.
Given a variety V ⊂ CN , the ideal I(V ) is the set of all polynomials that vanish on V . If
V = V (J) for an ideal J , then J is called a set of defining equations for V . Notice that, in
many cases, V = V (J) does not imply I(V ) = J ; this implication is only true when J is radical.
Thus, when we say that J defines a variety V set-theoretically, this means V = V (J) but not
necessarily I(V ) = J .
As stated above, we are often interested in the Zariski closure of the image of a rational
map. The Zariski closure of S is V (I(S)); this is the smallest variety that contains S. For a
parametric model with a polynomial parameterization φ, the ideal of the model will be denoted
by
IM := I(Im φ),
and the variety of the model as
VM := V (IM).
Chapters 3 and 4 are motivated by parametric statistical models with polynomial parame-
terizations. Our main goal is to understand the complexity of the implicitization problem, i.e.
finding the generators of IM.
2.2 Toric models, Markov bases and Markov complexity
Let C∗ = C\{0}. If the map φ : Cd → CN in Definition 3 is monomial, then the parametric
statistical model M = φ((C∗)d) ∩ ∆N−1 is called a toric model. Toric models are generally
referred to as log-linear models in the statistical literature.
In statistics, for discrete data, observations are recorded in a contingency table, an m-dimensional
array T whose (i1, . . . , im)th entry is the number of times the random vector (X1, . . . , Xm)
was observed in state (i1, . . . , im). A Markov basis is a set of integer vectors that connects
the lattice of all tables with the same sufficient statistics as T for all T in the sample space
(the sufficient statistics are determined by the model; see [22]). Given a toric model M, the
Fundamental Theorem of Markov Bases, which originally appeared in [19], establishes a one-
to-one correspondence between Markov bases and binomial generating sets of IM. The ideal
IM is a toric ideal, which is well-known to be a binomial ideal (see [56]). Since the focus of this
dissertation is more algebraic than statistical, we will use the algebraic definition of Markov
bases.
Definition 5 Let M be a toric model. Let F ⊂ R[p] such that each f ∈ F is binomial. If F
generates the ideal IM, then F is a Markov basis of M.
The Markov complexity of a toric model M is the maximum degree of all polynomials in
a minimal generating set of IM. In Section 3.4, we use the Graver basis to bound the Markov
complexity for certain toric models.
The Graver basis of a toric ideal I is the set of all binomials pu − pv such that there is no
other binomial pw−pz ∈ I such that pw divides pu and pz divides pv. Binomials in the Graver
basis are called primitive. The Graver complexity of a toric model M is the maximum degree
of all polynomials in the Graver basis of IM. Since the Graver basis of an ideal I contains the
universal Gröbner basis of I [56], we see that the Graver basis is a generating set of I, and thus,
Markov complexity of M ≤ Graver complexity of M.
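The correspondence between Markov bases and integer kernels of the defining matrix can be illustrated on a small toric model; the 2 × 2 independence model below is a standard hypothetical example, not one analyzed in this dissertation.

```python
from sympy import Matrix

# Hypothetical example: the 2 x 2 independence model p_ij = a_i * b_j.
# Columns index p11, p12, p21, p22; each row records which joint
# probabilities contain the parameter a1, a2, b1, b2.
A = Matrix([
    [1, 1, 0, 0],  # a1 divides p11 and p12
    [0, 0, 1, 1],  # a2 divides p21 and p22
    [1, 0, 1, 0],  # b1 divides p11 and p21
    [0, 1, 0, 1],  # b2 divides p12 and p22
])

# Binomials p^{u+} - p^{u-} for integer vectors u in ker A lie in the
# toric ideal I_M; here the kernel is spanned by a single vector.
kernel = A.nullspace()
u = kernel[0]
print(list(u))  # [1, -1, -1, 1] encodes the Markov basis {p11*p22 - p12*p21}
```

This matches the classical fact that the 2 × 2 independence model has the single quadratic p11 p22 − p12 p21 as its Markov basis, so its Markov complexity is 2.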
2.3 Phylogenetic Models
A problem that arises in computational biology and has been studied extensively in alge-
braic statistics is the problem of inferring phylogenetic trees given aligned DNA sequences of
several living species. Algebraic methods for inferring phylogenetic trees were first proposed
independently by Lake in [37] and Cavender and Felsenstein in [13]. The methods have been
more recently explored by Casanellas and Fernandez-Sanchez in [14].
The models used in these studies are hidden Markov models, parametric statistical models
with a polynomial parameterization. In a hidden Markov model, evolution is assumed to
proceed along a directed tree with all edges moving away from the root. In the tree, each
node corresponds to a species. The leaves correspond to extant species while the internal nodes
represent extinct species.
Common hidden Markov models used in phylogenetics are the 2-state and 4-state models.
In the 4-state model, each internal node is a hidden random variable and each leaf is an observed
random variable; the state space for each random variable Xi, hidden and observed, is {1, 2, 3, 4},
which corresponds to the nucleotide bases {A, C, G, T}. Each edge (i, j) is assigned a 4×4 transition
matrix whose (k, l)th entry is the conditional probability P (Xj = l |Xi = k). In the general
Markov model, the model we explore in Chapter 4, the only constraint on the transition matrices
is that the rows sum to 1.
We will refer to the four-state general Markov model on the tree T as MT . The ideal IMT
is referred to as a phylogenetic ideal and its generators are called phylogenetic invariants.
Definition 6 The phylogenetic complexity of a phylogenetic modelMT is the maximum degree
over all polynomials in a minimal generating set of IMT.
When the tree T is a bifurcating tree, results in [21] and [2] state that all phylogenetic
invariants of MT can be obtained from the phylogenetic invariants of MK1,3 where K1,3 is the
3-leaf claw tree. Thus, it suffices to understand IMK1,3.
Proposition 7 [22, Proposition 4.1.11] Let M0 be the 4-state general Markov model on the
claw tree K1,3. VM0 is isomorphic to V4(4, 4, 4), the set of all complex-valued 4× 4× 4 tensors
with border rank less than or equal to 4.
Thus, by Proposition 7, in order to understand VM0 , we need to be able to understand the
set of all complex-valued 4× 4× 4 tensors with border rank less than or equal to 4.
2.4 Tensors, Rank, and Border Rank
This section uses terminology and definitions from [26], [38], and [41].
In Chapter 4 we focus on elements of Cm⊗Cn⊗Cl, equivalently, three-way complex-valued
tensors of dimension m × n × l. We will take a coordinate based perspective, considering a
tensor T ∈ Cm ⊗Cn ⊗Cl as an array T = [ti,j,k]m,n,li=j=k ∈ Cm×n×l whose (i, j, k)th entry is ti,j,k.
Coordinate representations of tensors are also referred to as hypermatrices in order to call
attention to the fact that they are equipped with algebraic operations arising from the algebraic
structure of Cm ⊗ Cn ⊗ Cl rather than just data structures (see [41]).
Just as for matrices, we can define a notion of rank for tensors that is independent of the choice
of bases for the vector spaces Cm, Cn, Cl. This rank definition is also sometimes referred to as
the outer product rank.
Definition 8 A three-way tensor T ∈ Cm ⊗ Cn ⊗ Cl is a rank one tensor if it can be written
as the outer product of three vectors u ∈ Cm, v ∈ Cn, w ∈ Cl, i.e.
T = u⊗ v ⊗w.
The (i, j, k)th element of T is uivjwk.
Definition 9 The rank of a non-zero tensor T ∈ Cm ⊗ Cn ⊗ Cl, denoted rank T , is the
minimal number r such that there exist ui ∈ Cm, vi ∈ Cn, wi ∈ Cl for 1 ≤ i ≤ r such that
T = ∑_{i=1}^{r} ui ⊗ vi ⊗ wi.
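Definitions 8 and 9 can be sketched numerically; the vectors below are hypothetical choices, not examples from the text.

```python
import numpy as np

# A rank-one tensor is an outer product u (x) v (x) w (Definition 8);
# its (i,j,k)th entry is u_i * v_j * w_k.
u = np.array([1.0, 2.0])
v = np.array([3.0, 5.0])
w = np.array([1.0, -1.0])
T1 = np.einsum('i,j,k->ijk', u, v, w)
assert T1[0, 1, 0] == u[0] * v[1] * w[0]  # entry check: 1 * 5 * 1 = 5

# A tensor of rank at most 2: a sum of two rank-one tensors (Definition 9).
u2, v2, w2 = np.array([0.0, 1.0]), np.array([1.0, 0.0]), np.array([1.0, 1.0])
T = T1 + np.einsum('i,j,k->ijk', u2, v2, w2)
print(T.shape)  # (2, 2, 2)
```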
While the set of all matrices in Cm ⊗ Cn with rank less than r is closed with respect to
the Zariski topology, the set of all tensors in Cm ⊗ Cn ⊗ Cl with rank less than r is not
necessarily closed. Thus, we introduce the following notion of border rank as described in [38].
Definition 10 A tensor T has border rank r if it is a limit of tensors of rank r but is not a
limit of tensors of rank s for any s < r. In this case, we write brank T = r.
We use Vr(m,n, l) to denote the set of all tensors in Cm ⊗ Cn ⊗ Cl of border rank at most
r. The set Vr(m,n, l) is a closed irreducible variety whose projectivization is the rth secant
variety of Pm−1 × Pn−1 × Pl−1. Chapter 4 is concerned with determining defining equations of
V4(4, 4, 4), the variety associated to the 4-state general Markov model on the 3-leaf claw tree.
In Chapter 4, many of the results are phrased in terms of the slices of a tensor T ∈
Cm ⊗ Cn ⊗ Cl. Slices are matrices obtained from the tensor T = [ti,j,k] by fixing one of the
three indices: a 1-slice or horizontal slice is obtained by fixing the 1st index, a 2-slice or lateral
slice is obtained by fixing the 2nd index, and a 3-slice or frontal slice is obtained by fixing the
3rd index. A tensor T ∈ Cm ⊗ Cn ⊗ Cl has m horizontal slices, n lateral slices, and l frontal
slices. We denote slices as Tq,p where p ∈ {1, 2, 3} and indicates whether it is a 1-slice, 2-slice,
or 3-slice, and q gives the value of the fixed index. For example, T1,3 = [ti,j,1] with 1 ≤ i ≤ m, 1 ≤ j ≤ n.
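The slice conventions above can be sketched with a hypothetical 2 × 3 × 4 array, including the dimension of the span of the frontal slices.

```python
import numpy as np

# A hypothetical 2 x 3 x 4 tensor; fixing the first, second, or third
# index yields horizontal, lateral, and frontal slices respectively.
T = np.arange(24, dtype=float).reshape(2, 3, 4)

horizontal = [T[q, :, :] for q in range(2)]  # 1-slices T_{q,1}, each 3 x 4
lateral = [T[:, q, :] for q in range(3)]     # 2-slices T_{q,2}, each 2 x 4
frontal = [T[:, :, q] for q in range(4)]     # 3-slices T_{q,3}, each 2 x 3

# The dimension of the span of the frontal slices inside C^{2x3},
# computed as the rank of the matrix whose rows are flattened slices.
dim_span = np.linalg.matrix_rank(np.array([S.flatten() for S in frontal]))
print(dim_span)  # 2
```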
One way to understand the rank of a tensor is to understand the span of its frontal slices
(or horizontal or lateral slices). Let the span of the frontal slices be denoted
T3(T ) := span(T1,3, . . . , Tl,3) ⊂ Cm×n.
The following theorem from [26] states the connection between T3(T ) and the rank of T .

Theorem 11 [26, Theorem 2.1] Let T ∈ Cm ⊗ Cn ⊗ Cl. Then rank T is the minimal dimension
of a subspace U ⊂ Cm×n that contains T3(T ) and is spanned by rank one matrices.
Theorem 11 is used in Chapter 4 as we show that V4(4, 4, 4) is cut out by polynomials of
degree 5, 6, and 9.
2.5 Maximum Likelihood Estimation
In Chapter 5 we turn our attention towards algebraic complexity problems that arise in
maximum likelihood estimation. Maximum likelihood estimation is a statistical method for
estimating the most likely parameters of a probability density function given a set of observed
data (e.g. a contingency table) and a statistical model. Let
M = {f(·|θ)|θ ∈ Θ}
be a parametric statistical model with parameters θ = (θ1, . . . , θm). Since the models we explore
in Chapter 5 are continuous, we change our notation slightly from the preceding sections and
denote a joint probability density function in M as f(·|θ).
In maximum likelihood estimation, one assumes that observed data are independently and
identically distributed according to a probability density function f(·|θ0) ∈ M. Thus, given a
sample of n observations x1, . . . ,xn, the goal is to find the best estimate of θ0. This amounts
to maximizing the likelihood function.
The likelihood function is the probability of observing x1, . . . ,xn given θ0 = θ, that is, it is
a function in the parameters θ
L(θ|x1, . . . ,xn) = ∏_{i=1}^{n} f(xi|θ).
The maximum likelihood estimator (or MLE) of θ0, denoted θ̂, is the value of the parameters θ
that maximizes L(θ|x1, . . . ,xn). Since in many cases the logarithm of the likelihood function is
easier to analyze, in statistics, we often consider the log-likelihood function:
ℓ(θ|x1, . . . ,xn) = ∑_{i=1}^{n} ln f(xi|θ).
If θ̂ is a maximum of the log-likelihood function, then θ̂ is the MLE for L(θ|x1, . . . ,xn). We
will use the log-likelihood function in Chapter 5.
An interior maximum of ℓ(θ|x1, . . . ,xn) occurs where all its first partial derivatives are zero. The
likelihood equations, or log-likelihood equations, are the equations {∂L/∂θi = 0, i = 1, . . . , m} and
{∂ℓ/∂θi = 0, i = 1, . . . , m}, respectively. In Chapter 5 we use ‘likelihood equations’ to mean the
log-likelihood equations.
The number of complex solutions to the likelihood equations is constant with probability
one, and a data set is generic if it is not part of the null set for which the number of complex
solutions is different. Thus, we define the maximum likelihood degree as:
Definition 12 The maximum likelihood degree (ML degree) is the number of complex solu-
tions to the maximum likelihood equations for generic data.
If the likelihood equations or log-likelihood equations are rational, symbolic software (e.g.,
Macaulay2) and numerical software (e.g., PHCpack) can find all the complex solutions to
the likelihood equations. The remainder of the optimization problem then becomes evaluating
the likelihood function at the solutions to determine at which point the maximum is attained.
Thus the ML degree is a measure of the algebraic complexity of the problem of maximum
likelihood estimation. For more background on ML degrees, see [33], [10], [8], [22], [57], [34].
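As a small illustration of this notion, one can solve the likelihood equations of a simple model symbolically and count the complex solutions; the univariate normal example below is a hypothetical sketch, not one of the variance components models studied in Chapter 5.

```python
from sympy import symbols, solve

# A minimal sketch: likelihood equations for a univariate normal N(mu, v)
# with hypothetical generic data; the number of complex solutions for
# generic data is the ML degree.
mu, v = symbols('mu v')
data = [1, 2, 4]
n = len(data)

# Log-likelihood up to constants: -n/2 * log(v) - sum((x - mu)^2) / (2v).
# Score equations with denominators cleared (v != 0 on the model):
eq_mu = sum(x - mu for x in data)                 # from d ell / d mu = 0
eq_v = -n * v + sum((x - mu) ** 2 for x in data)  # from d ell / d v = 0

solutions = solve([eq_mu, eq_v], [mu, v], dict=True)
print(len(solutions))  # 1: the ML degree of this model is 1
```

Here the system has a single complex solution (the sample mean and sample variance), so the ML degree is 1; models with ML degree greater than one are exactly those where multimodality of the likelihood becomes possible.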
The ML degree also gives insight into the geometry of maximum likelihood estimation. The
likelihood surface is the real part of the hypersurface defined by the likelihood function. If
the ML degree of a model is greater than one, then it is possible that the likelihood surface is
multimodal, which suggests local methods of obtaining the maximum of the likelihood surface
could fail. Section 5.5 gives an example of a multimodal likelihood surface.
In Section 5.4, we study restricted maximum likelihood estimation, a variation of maximum
likelihood estimation whose algebraic complexity has not been studied before for any statistical
model. We define REML degree in terms of the restricted maximum likelihood equations.
Definition 13 The restricted maximum likelihood degree (REML degree) is the number of
complex solutions to the restricted maximum likelihood equations for generic data.
Theorem 39 gives a formula for the REML degree for variance components models.
CHAPTER 3
TORIC IDEALS OF HYPERGRAPHS
This chapter is based on work in [30] with Sonja Petrovic.
3.1 Introduction
Let H be a hypergraph on V = {1, . . . , n} with edge set E ⊂ P(V ) \ {∅}. Each edge
ei ∈ E of size d encodes a squarefree monomial xei := ∏_{j∈ei} xj of degree d in the polynomial
ring k[x1, . . . , xn]. The edge subring of the hypergraph H, denoted by k[H], is the following
monomial subring:
k[H] := k[xei : ei ∈ E(H)].
The toric ideal of k[H], denoted IH , is the kernel of the monomial map φH : k[tei ] → k[H]
defined by φH(tei) = xei . The ideal IH encodes the algebraic relations among the edges of the
hypergraph. For the special case where H is a graph, generating sets of the toric ideal of k[H]
have been studied combinatorially in [47], [48], [51], [62], [65], and [66].
The combinatorial signatures of generators of IH are balanced edge sets of H. Balanced
edge sets on uniform hypergraphs were introduced in [50], and are referred to as monomial
walks. This chapter is based on the fact that the ideal IH is generated by binomials fE arising
from primitive balanced edge sets E of H (see Proposition 14, a generalization of [50, Theorem
2.8]). A balanced edge set of H is a multiset of bicolored edges E = Eblue ⊔ Ered satisfying
the following balancing condition: for each vertex v covered by E , the number of red edges
containing v equals the number of blue edges containing v, that is,
degblue(v) = degred(v). (3.1.1)
A binomial fE arises from E if it can be written as
fE = ∏_{e∈Eblue} te − ∏_{e′∈Ered} te′.
Note that while H is a simple hypergraph (it contains no multiple edges), E allows repetition
of edges. In addition, the balanced edge set E is primitive if there exists no other balanced edge
set E ′ = E ′blue t E ′red such that E ′blue ( Eblue and E ′red ( Ered; this is the usual definition of an
element in the Graver basis of IH . If H is a uniform hypergraph, a balanced edge set is called
a monomial walk to conform with the terminology in [65], [66] and [50].
The motivation for studying toric ideals IH in this work is their connection to Markov bases
for statistical models parameterized by monomials as described in Section 2.2. In what follows,
we give two general degree bounds for generators of IH (Section 3.5), study the combinatorics
of splitting sets and reducibility (defined in Section 3.3), and explore implications to algebraic
statistics and Markov complexity throughout. Section 3.4 focuses on indispensable binomials,
i.e., binomials that are members of every minimal generating set of I_H. Proposition 19 gives a sufficient combinatorial condition for a binomial f ∈ I_H to be indispensable.
Consequently, the Graver basis is the unique minimal generating set of I_H for any 2-regular hypergraph (Proposition 20). In particular, this means that the Graver basis is equal to the universal Gröbner basis, although the defining matrix need not be unimodular. Theorem 27 is a
combinatorial criterion for the ideal of a uniform hypergraph to be generated in degree at most
d ≥ 2. The criterion is based on decomposable balanced edge sets, separators, and splitting
sets; see Definitions 15 and 16. Our result generalizes the well-known criterion for the toric
ideal of a graph to be generated in degree 2 from [47], [65], and [66]. Splitting sets translate
and extend the constructions used in [47], [65], and [66] to hypergraphs and arbitrary degrees.
Theorem 29 provides a more general result for non-uniform hypergraphs.
Since log-linear models, by definition, have a monomial parametrization, we can also associate to any log-linear model M with a squarefree parametrization a (non-uniform) hypergraph H_M. By Proposition 14, Markov moves for the model M are described by balanced edge sets of H_M: if E is a balanced edge set of H_M, then a Markov move on a fiber of the model corresponds to replacing the set of red edges in E by the set of blue edges in E. Our degree bounds thus bound the Markov complexity of the model M.
We apply our combinatorial criteria to recover a well-known complexity result in algebraic statistics from [16] in Corollary 26. Finally, we study the Markov complexity of a set of models from [59] called hidden subset models; the Zariski closures of these models are tangential varieties. Namely, Theorem 32 says that the ideal associated to the image of Tan((P^1)^n) in higher cumulants is generated by quadratics and cubics.
3.2 Preliminaries and notation
We remind the reader that all hypergraphs in this chapter are simple, that is, they contain
no multiple edges. In contrast, balanced edge sets of hypergraphs are not, since the binomials
arising from the sets need not be squarefree. Therefore, for the purpose of this manuscript, we
will refer to a balanced edge set as a multiset of edges, with implied vertex set; and, as usual,
V (E) denotes the vertex set contained in the edges in E .
For the remainder of this short section, we collect the technical details and notation we need for the proofs that follow.

A multiset M is an ordered pair (A, f) such that A is a set and f is a function from A to ℕ_{>0} that records the multiplicity of each element of A. For example, the multiset M = ({1, 2}, f) with f(1) = 1 and f(2) = 3 represents M = {1, 2, 2, 2}, where ordering does not matter. We will commonly use the latter notation.
Given a multiset M = (A, f), the support of M is supp(M) := A, and its size is |M| := ∑_{a∈A} f(a). For two multisets M_1 = (A, f_1) and M_2 = (B, f_2), we say M_2 ⊆ M_1 if B ⊆ A and, for all b ∈ B, f_2(b) ≤ f_1(b). M_2 is a proper submultiset of M_1 if M_2 ⊆ M_1 and either B ⊊ A or there exists a b ∈ B such that f_2(b) < f_1(b).
Unions, intersections, and relative complements of multisets are defined in the canonical
way:
M_1 ∪ M_2 := (A ∪ B, g), where
    g(a) = f_1(a)                  if a ∈ A \ B,
           f_2(a)                  if a ∈ B \ A,
           max(f_1(a), f_2(a))     if a ∈ A ∩ B;

M_1 ∩ M_2 := (A ∩ B, g), where g(a) = min(f_1(a), f_2(a));

M_1 − M_2 := (C, g), where
    g(a) = f_1(a)                  if a ∈ A \ B,
           f_1(a) − f_2(a)         otherwise,
and C = (A \ B) ∪ {a ∈ A ∩ B | f_1(a) − f_2(a) > 0}.

Note that the support of the union (intersection) of two multisets is the union (intersection) of their supports. Finally, we define the sum of M_1 and M_2:
M_1 ⊔ M_2 := (A ∪ B, g), where
    g(a) = f_1(a)                  if a ∈ A \ B,
           f_2(a)                  if a ∈ B \ A,
           f_1(a) + f_2(a)         if a ∈ A ∩ B.
If M_1 ⊔ M_2 is a balanced edge set, then the notation M_1 ⊔_m M_2 will be used to record the bicoloring of M_1 ⊔ M_2: edges in M_1 are blue, and edges in M_2 are red.

Finally, the number of edges in a hypergraph H containing a vertex v will be denoted by deg(v; H). For a bicolored multiset M := M_blue ⊔_m M_red, the blue degree deg_blue(v; M) of a vertex v is defined to be deg(v; M_blue). The red degree deg_red(v; M) is defined similarly.
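For computations, these multiset operations line up with the semantics of Python's collections.Counter, which can serve as a concrete model of the definitions above (a sketch; the element names are ours): union is elementwise max, intersection elementwise min, the difference M_1 − M_2 keeps only positive multiplicities, and the sum ⊔ adds multiplicities.

```python
from collections import Counter

M1 = Counter({'a': 2, 'b': 1})   # the multiset {a, a, b}
M2 = Counter({'a': 1, 'c': 3})   # the multiset {a, c, c, c}

assert (M1 | M2) == Counter({'a': 2, 'b': 1, 'c': 3})  # union: max multiplicity
assert (M1 & M2) == Counter({'a': 1})                  # intersection: min multiplicity
assert (M1 - M2) == Counter({'a': 1, 'b': 1})          # difference: positive part only
assert (M1 + M2) == Counter({'a': 3, 'b': 1, 'c': 3})  # sum: add multiplicities
```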
3.3 Splitting sets and reducible edge sets

The aim of this section is to lay the combinatorial groundwork for studying toric ideals of hypergraphs. In particular, we state explicitly what it means, combinatorially, for a binomial arising from a monomial walk to be generated by binomials of smaller degree. We begin by describing the binomial generators of I_H. Unless otherwise stated, H need not be uniform.

Proposition 14 Every binomial in the toric ideal of a hypergraph corresponds to a balanced edge set. In particular, the toric ideal I_H is generated by the binomials arising from primitive balanced edge sets.
Proof. Suppose E is a balanced multiset of edges over H. Define a binomial f_E ∈ k[t_e : e ∈ E(H)] as follows:

f_E = ∏_{e ∈ E_blue} t_e − ∏_{e′ ∈ E_red} t_{e′}.

The balancing condition (3.1.1) ensures that f_E is in the kernel of the map φ_H.

The second claim is immediate.
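The key observation in this proof can be illustrated computationally: for a balanced edge set, both monomials of f_E have the same image under φ_H, since each vertex variable appears the same number of times on each side. A minimal sketch in Python on a hypothetical 2-uniform example (the edges and sample values are ours):

```python
from collections import Counter
from functools import reduce

# A toy balanced edge set on vertices 1..4: blue {1,2},{3,4}; red {1,3},{2,4}.
blue = [frozenset({1, 2}), frozenset({3, 4})]
red  = [frozenset({1, 3}), frozenset({2, 4})]

# Balancing condition (3.1.1): every vertex has equal blue and red degree.
deg = lambda edges: Counter(v for e in edges for v in e)
assert deg(blue) == deg(red)

# phi_H sends t_e to the product of the x_j with j in e; evaluate at sample values.
x = {1: 2, 2: 3, 3: 5, 4: 7}
phi = lambda e: reduce(lambda acc, j: acc * x[j], e, 1)
blue_monomial = reduce(lambda acc, e: acc * phi(e), blue, 1)
red_monomial  = reduce(lambda acc, e: acc * phi(e), red, 1)
assert blue_monomial == red_monomial  # so f_E lies in the kernel of phi_H
```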
Motivated by the application of reducible simplicial complexes to understanding the Markov bases of hierarchical log-linear models [20], we now introduce notions of reducibility and separators for balanced edge sets. For simplicity, we will often abuse notation and use H to denote the edge set of H.

Definition 15 A balanced edge set E is said to be reducible with separator S, supp(S) ⊆ supp(E), and decomposition (Γ_1, S, Γ_2), if there exist balanced edge sets Γ_1 ≠ E and Γ_2 ≠ E with S ≠ ∅ such that S = Γ_{1,red} ∩ Γ_{2,blue}, E = Γ_1 ⊔ Γ_2, and the following coloring conditions hold: Γ_{1,red}, Γ_{2,red} ⊆ E_red and Γ_{1,blue}, Γ_{2,blue} ⊆ E_blue.

We say that S is proper with respect to (Γ_1, S, Γ_2) if S is a proper submultiset of both Γ_{1,red} and Γ_{2,blue}.

If S is not proper, then S is said to be blue with respect to (Γ_1, S, Γ_2) if Γ_{1,red} = S, and red with respect to (Γ_1, S, Γ_2) if Γ_{2,blue} = S.
Figure 1 shows an example of a reducible balanced edge set E . The separator is proper
and consists of the single green edge es; it appears twice in the balanced edge set E , once as
a blue edge and once as a red edge. Figure 2 shows a reducible balanced edge set where the
separator, consisting of the two green edges e1 and e2, is not proper. As before, the separator
edges appear twice in the balanced edge set.
Figure 1. Reducible balanced edge set. The green edge e_s is the separator.

Figure 2. Reducible balanced edge set with an improper separator. The separator consists of green edges e_1 and e_2.
If H is a hypergraph and E is a balanced edge set with supp(E) ⊆ H, then given a multiset S with supp(S) ⊆ H, we can construct a new balanced edge set in the following manner:

E + S := (E_blue ⊔ S) ⊔_m (E_red ⊔ S).

Definition 16 Let H be a hypergraph. Let E be a balanced edge set of size 2n such that supp(E) ⊆ H. A non-empty multiset S with supp(S) ⊆ H is a splitting set of E with decomposition (Γ_1, S, Γ_2) if E + S is reducible with separator S and decomposition (Γ_1, S, Γ_2).

S is said to be a blue (red, resp.) splitting set with respect to (Γ_1, S, Γ_2) if S is a blue (red, resp.) separator of E + S with respect to (Γ_1, S, Γ_2).

S is a proper splitting set of E if there exists a decomposition (Γ_1, S, Γ_2) of E + S such that S is a proper separator with respect to (Γ_1, S, Γ_2).
Example 17 (Group-based Markov model) Let V1 = {x1, x2, x3, x4}, V2 = {y1, y2, y3, y4},
and V3 = {z1, z2, z3, z4}. Let V be the disjoint union of V1, V2, and V3. Let H be the 3-uniform
hypergraph with vertex set V and edge set:
e111 = {x1, y1, z1} e122 = {x1, y2, z2} e133 = {x1, y3, z3} e144 = {x1, y4, z4}
e221 = {x2, y2, z1} e212 = {x2, y1, z2} e243 = {x2, y4, z3} e234 = {x2, y3, z4}
e331 = {x3, y3, z1} e342 = {x3, y4, z2} e313 = {x3, y1, z3} e324 = {x3, y2, z4}
e441 = {x4, y4, z1} e432 = {x4, y3, z2} e423 = {x4, y2, z3} e414 = {x4, y1, z4}
The hypergraph H has applications in algebraic phylogenetics: it represents the parametrization of the Z_2 × Z_2 group-based Markov model on the claw tree K_{1,3} (see [58, Example 25]). This model is a submodel of the 4-state general Markov model on the claw tree described in [addme].
Consider the monomial walk

W = {e324, e111, e243, e432} ⊔_m {e122, e313, e234, e441}.

Let S = {e133, e212}. Then S is a splitting set of W with decomposition (Γ_1, S, Γ_2), where

Γ_1 = {e111, e243, e432} ⊔_m {e133, e212, e441},
Γ_2 = {e133, e212, e324} ⊔_m {e122, e313, e234}.
The decomposition (Γ_1, S, Γ_2) encodes binomials in I_H that generate f_W:

f_W = t_{e324}(t_{e111} t_{e243} t_{e432} − t_{e133} t_{e212} t_{e441}) + t_{e441}(t_{e133} t_{e212} t_{e324} − t_{e122} t_{e313} t_{e234}).
The previous example illustrates the algebraic interpretation of a splitting set. Notice there is a correspondence between monomials in k[t_{e_i}] and multisets of edges of H. We will write E(t_{e_{i_1}}^{a_1} t_{e_{i_2}}^{a_2} · · · t_{e_{i_l}}^{a_l}) for the multiset ({e_{i_1}, . . . , e_{i_l}}, f), where

f : {e_{i_1}, . . . , e_{i_l}} → ℕ,  e_{i_j} ↦ a_j.

Thus the support of E(t_{e_{i_1}}^{a_1} t_{e_{i_2}}^{a_2} · · · t_{e_{i_l}}^{a_l}) corresponds to the support of the monomial t_{e_{i_1}}^{a_1} t_{e_{i_2}}^{a_2} · · · t_{e_{i_l}}^{a_l}.

If f_E = u − v ∈ I_H is the binomial arising from the balanced edge set E, then a monomial s corresponds to a splitting set S if and only if there exist two binomials u_1 − v_1, u_2 − v_2 ∈ I_H such that us = u_1 u_2, vs = v_1 v_2, and s = gcd(v_1, u_2). In this case, the decomposition of E + S is (Γ_1, S, Γ_2), where Γ_1 = E(u_1) ⊔_m E(v_1) and Γ_2 = E(u_2) ⊔_m E(v_2).
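This correspondence can be checked on the data of Example 17 by treating monomials as multisets of edge labels (a sketch; here Counter plays the role of the exponent vector, multiplication of monomials is Counter addition, and & computes the monomial gcd):

```python
from collections import Counter

# Monomials from Example 17, encoded as multisets of edge labels.
u  = Counter(['e324', 'e111', 'e243', 'e432'])   # blue part of W
v  = Counter(['e122', 'e313', 'e234', 'e441'])   # red part of W
u1 = Counter(['e111', 'e243', 'e432'])
v1 = Counter(['e133', 'e212', 'e441'])
u2 = Counter(['e133', 'e212', 'e324'])
v2 = Counter(['e122', 'e313', 'e234'])
s  = Counter(['e133', 'e212'])                   # the splitting set S

assert u + s == u1 + u2      # us = u1 * u2
assert v + s == v1 + v2      # vs = v1 * v2
assert (v1 & u2) == s        # s = gcd(v1, u2): min of exponent vectors
```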
For a balanced edge set E, the existence of a splitting set determines whether the binomial f_E ∈ I_H can be written as a linear combination of two binomials f_{Γ_1}, f_{Γ_2} ∈ I_H. While, in general, the existence of a splitting set does not imply deg(f_{Γ_1}), deg(f_{Γ_2}) < deg(f_E), if H is uniform and the splitting set is proper, then the following lemma holds.
Lemma 18 Let H be a uniform hypergraph and let W be a monomial walk with supp(W) ⊆ H and |W| = 2n. If S is a proper splitting set of W, then there exists a decomposition (Γ_1, S, Γ_2) of W + S such that |Γ_1| < |W| and |Γ_2| < |W|.

Proof. Let S be a proper splitting set of W. By definition, there exists a decomposition (Γ_1, S, Γ_2) of W + S such that S is a proper submultiset of Γ_{1,red} and Γ_{2,blue}.

Let |Γ_1| = 2n_1 and |Γ_2| = 2n_2. Since W + S = Γ_1 ⊔ Γ_2, it follows that |W + S| = |Γ_1| + |Γ_2|. Then 2n + 2|S| = 2n_1 + 2n_2, which implies 2n − 2n_1 = 2n_2 − 2|S|. But S being a proper submultiset of Γ_{2,blue} gives n_2 > |S|, which, in turn, implies n > n_1. By a similar argument, n > n_2. Thus |Γ_1| < |W| and |Γ_2| < |W|.
3.4 Indispensable Binomials

A binomial f in a toric ideal I is indispensable if f or −f belongs to every binomial generating set of I. Indispensable binomials of toric ideals were introduced by Takemura et al. and are studied in [63], [3], [11], [48], [51]. The degree of an indispensable binomial in I_H is a lower bound on the Markov complexity of the model associated to H.
Proposition 19 Let H be a hypergraph. Let E be a balanced edge set with supp (E) ⊆ H. Let
fE be the binomial arising from E. If there does not exist a splitting set of E, then fE is an
indispensable binomial of IH .
Proof. Suppose f_E is not indispensable. Then there is a binomial generating set G = {f_1, . . . , f_n} of I_H such that f_E ∉ G and −f_E ∉ G.
Since f_E = f_E^+ − f_E^− ∈ I_H, there is an f_i = f_i^+ − f_i^− ∈ G such that f_i^+ or f_i^− divides f_E^+. Without loss of generality, assume f_i^+ | f_E^+. Since f_i is a binomial in I_H, f_i arises from a balanced edge set E_i of H.

Let S = E_{i,red}. Let Γ_1 = E_i and Γ_2 = Γ_{2,blue} ⊔_m Γ_{2,red}, where

Γ_{2,blue} = (E_blue − E_{i,blue}) ⊔ E_{i,red},
Γ_{2,red} = E_red.

Since f_i^+ | f_E^+, the multiset E_{i,blue} ⊆ E_blue, and thus Γ_1 ⊔ Γ_2 = E + S. By construction, Γ_{1,red} ∩ Γ_{2,blue} = S. Therefore S is a splitting set of E.
If every Graver basis element of a binomial ideal I_H is indispensable, then the Graver basis of I_H is the unique minimal generating set of I_H. Propositions 20 and 24 describe two classes of hypergraphs where this is the case. In particular, for these hypergraphs, the universal Gröbner basis of I_H is a minimal generating set.
Proposition 20 If H is a 2-regular uniform hypergraph, then the Graver basis of IH is the
unique minimal generating set of IH .
For the proof of Proposition 20, we make use of Proposition 3.2 in [50] which concerns
balanced edge sets that are pairs of perfect matchings.
Definition 21 A matching on a hypergraph H = (V,E) is a subset M ⊆ E such that the
elements of M are pairwise disjoint. A matching is called perfect if V (M) = V .
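The matching conditions of Definition 21 are straightforward to test mechanically (a sketch; the helper names are ours, with edges encoded as frozensets of vertices):

```python
def is_matching(edges):
    """The edges (frozensets of vertices) are pairwise disjoint."""
    seen = set()
    for e in edges:
        if seen & e:          # e shares a vertex with an earlier edge
            return False
        seen |= e
    return True

def is_perfect_matching(edges, vertices):
    """A matching that covers every vertex, i.e., V(M) = V."""
    return is_matching(edges) and set().union(*edges) == set(vertices)

M = [frozenset({1, 2, 3}), frozenset({4, 5, 6})]
assert is_perfect_matching(M, range(1, 7))
assert not is_matching([frozenset({1, 2}), frozenset({2, 3})])
```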
Proof. [Proof of Proposition 20] Let G be the Graver basis of I_H and let f ∈ G. Since every element of G is a binomial, f arises from a primitive monomial walk W with supp(W) ⊆ H.

Let M_b = supp(W_blue) and M_r = supp(W_red). By primitivity of W, the intersection M_r ∩ M_b = ∅. Since W satisfies condition (3.1.1) and H is 2-regular, if e_1, e_2 ∈ M_b and e_1 ∩ e_2 ≠ ∅, then e_1 ∈ M_r or e_2 ∈ M_r, which would contradict the primitivity of W. So M_b and M_r are two edge-disjoint perfect matchings on V(W). By Proposition 3.2 in [50], W contains no multiple edges, i.e., W = M_b ⊔_m M_r. Furthermore, since H is 2-regular, the edge set of the subhypergraph induced by V(W) is M_b ∪ M_r.

Suppose S is a splitting set of W with decomposition (Γ_1, S, Γ_2). By the correspondence between primitive monomial walks and primitive binomials, there exists a primitive monomial walk Γ such that Γ_blue ⊆ Γ_{1,blue} and Γ_red ⊆ Γ_{1,red} (if Γ_1 is primitive, then Γ = Γ_1). By Proposition 3.2 in [50], Γ must be a pair of perfect matchings on V(Γ). This means Γ is a proper balanced edge subset of W, a contradiction. Therefore, by Proposition 19, f_W is indispensable. Since every element in the Graver basis of I_H is indispensable, there is no generating set of I_H strictly contained in the Graver basis, and the claim follows.
Definition 22 A k-uniform hypergraph H = (V,E) is k-partite if there exists a partition of V
into k disjoint subsets, V1, . . . , Vk, such that each edge in E contains exactly one vertex from
each Vi.
Lemma 23 Let H = (V, E) be a k-uniform k-partite hypergraph with E = E_b ⊔ E_r and E_b ∩ E_r = ∅. If there exists a V_i, 1 ≤ i ≤ k, such that deg(v; E_r) = deg(v; E_b) = 1 for all v ∈ V_i, then a monomial walk W with support E is primitive only if W contains no multiple edges.
Proof. Follows from the proof of necessity of Proposition 3.2 in [50].
Proposition 24 Let H = (V, E) be a k-uniform k-partite hypergraph. If there exists a V_i such that deg(v; E) = 2 for all v ∈ V_i, then the Graver basis of I_H is the unique minimal generating set of I_H.

Proof. The proof is similar to the proof of Proposition 20. Note that while H may not be 2-regular, one of its parts, V_i, is 'locally' 2-regular, and thus restricts the structure of monomial walks on H. In particular, Lemma 23 ensures that M_r and M_b are edge-disjoint perfect matchings on V(W)|_{V_i}, and the rest of the proof follows immediately.
Example 25 (No 3-way interaction) The toric ideal of the hypergraph H in Figure 3 cor-
responds to the hierarchical log-linear model for no 3-way interaction on 2× 2× 2 contingency
tables. This statistical model is a common example in algebraic statistics [22, Example 1.2.7].
Since there is exactly one primitive monomial walk W on H that travels through 8 edges, I_H = (f_W).

Figure 3. Hypergraph associated to the hierarchical log-linear model for no 3-way interaction.

For 2 × 3 × 3 contingency tables with no 3-way interaction, the hypergraph corresponding to this log-linear model has 18 edges. The hypergraph in this case is H = (V, E), where V = {x00, x01, x02, x10, x11, x12, y00, y01, y02, y10, y11, y12, z00, z01, z02, z10, z11, z12, z20, z21, z22} and the
edge set is:
e000 = {x00, y00, z00} e001 = {x00, y01, z01} e002 = {x00, y02, z02}
e010 = {x01, y00, z10} e011 = {x01, y01, z11} e012 = {x01, y02, z12}
e020 = {x02, y00, z20} e021 = {x02, y01, z21} e022 = {x02, y02, z22}
e100 = {x10, y10, z00} e101 = {x10, y11, z01} e102 = {x10, y12, z02}
e110 = {x11, y10, z10} e111 = {x11, y11, z11} e112 = {x11, y12, z12}
e120 = {x12, y10, z20} e121 = {x12, y11, z21} e122 = {x12, y12, z22}
Let W be the primitive monomial walk

W = {e000, e101, e011, e112, e022, e120} ⊔_m {e100, e001, e111, e012, e122, e020}.
Every edge of H that does not appear in W contains a vertex outside V(W); thus it can be easily verified that there does not exist a splitting set of W, so by Proposition 19, f_W is indispensable. In fact, H satisfies the condition of Proposition 24, and thus every binomial in I_H corresponding to a primitive monomial walk is indispensable.
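The balancing condition for this walk can also be verified mechanically (a sketch in Python; the dictionary encodes the edges listed above, and we use e020, which is the edge of H that balances the walk):

```python
from collections import Counter

# Edges of the 2x3x3 no-3-way-interaction hypergraph used by the walk W.
edges = {
    'e000': ['x00', 'y00', 'z00'], 'e101': ['x10', 'y11', 'z01'],
    'e011': ['x01', 'y01', 'z11'], 'e112': ['x11', 'y12', 'z12'],
    'e022': ['x02', 'y02', 'z22'], 'e120': ['x12', 'y10', 'z20'],
    'e100': ['x10', 'y10', 'z00'], 'e001': ['x00', 'y01', 'z01'],
    'e111': ['x11', 'y11', 'z11'], 'e012': ['x01', 'y02', 'z12'],
    'e122': ['x12', 'y12', 'z22'], 'e020': ['x02', 'y00', 'z20'],
}
blue = ['e000', 'e101', 'e011', 'e112', 'e022', 'e120']
red  = ['e100', 'e001', 'e111', 'e012', 'e122', 'e020']

def degrees(edge_names):
    """Vertex degrees of a multiset of edges."""
    return Counter(v for name in edge_names for v in edges[name])

# Balancing condition (3.1.1): blue and red degrees agree at every vertex.
assert degrees(blue) == degrees(red)
```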
From the above discussion, we can see that if a uniform hypergraph H contains an induced subhypergraph H_s that is 2-regular and there exists a bicoloring under which H_s is also a balanced edge set, then the maximum degree of any minimal generating set of I_H is at least |E(H_s)|/2.

A similar statement holds for k-uniform, k-partite hypergraphs with vertex partition V = ∪_{i=1}^k V_i. Namely, if H contains an induced subhypergraph H_s that is 2-regular on V_i (i.e., H satisfies the conditions of Proposition 24) and there exists a bicoloring under which H_s is a balanced edge set (e.g., H_s is a pair of disjoint perfect matchings), then the maximum degree of any minimal generating set of I_H is at least |E(H_s)|/2.
Recall that degree bounds on minimal generators give a Markov complexity bound for the
corresponding log-linear model in algebraic statistics. This allows us to recover a well-known
result:
Corollary 26 (Consequence of Theorem 1.2 in [16]; see also Theorem 1.2.17 in [22])
The Markov complexity for the no 3-way interaction model on 3×r×c contingency tables grows
arbitrarily large as r and c increase.
Proof. For the no 3-way interaction model on 2 × r × c contingency tables, we can construct a primitive binomial f_{H_s} of degree 2 · min(r, c) in its defining toric ideal by taking a cycle of length min(r, c) on the bipartite graph K_{r,c}. (We remind the reader that this is precisely how f_W is constructed in Example 25.) By noting that the hypergraph H_s associated to this binomial is an induced subhypergraph of the hypergraph associated to the 3 × r × c case, and that H_s is 2-regular in one of the partitions, the claim follows by Proposition 24.
3.5 General degree bounds
For uniform hypergraphs, balanced edge sets are referred to as monomial walks. In the
previous sections, we saw that splitting sets of W translate to algebraic operations on the
binomials fW , providing a general construction for rewriting a high-degree binomial in terms of
binomials corresponding to shorter walks. This, along with Lemma 18, is the key to the general
degree bound result.
Theorem 27 Given a k-uniform hypergraph H, the toric ideal I_H is generated in degree at most d if and only if for every primitive monomial walk W of length 2n > 2d, with supp(W) ⊆ H, one of the following two conditions holds:

i) there exists a proper splitting set S of W,

or

ii) there is a finite sequence of pairs (S_1, R_1), . . . , (S_N, R_N) such that

• S_1 and R_1 are blue and red splitting sets of W of size less than n with decompositions (Γ_{1_1}, S_1, Γ_{2_1}) and (Υ_{1_1}, R_1, Υ_{2_1}),

• S_{i+1} and R_{i+1} are blue and red splitting sets of W_i = Γ_{2_i,blue} ⊔_m Υ_{1_i,red} of size less than n with decompositions (Γ_{1_{i+1}}, S_{i+1}, Γ_{2_{i+1}}) and (Υ_{1_{i+1}}, R_{i+1}, Υ_{2_{i+1}}), and

• S_N ∩ R_N ≠ ∅ or there exists a proper splitting set of W_N.
Proof. [Proof of necessity (⇒)] Let H be a k-uniform hypergraph whose toric ideal I_H is generated in degree at most d. Let W be a primitive monomial walk of length 2n > 2d. Let p_W = u − v be the binomial that arises from W. Since I_H is generated in degree at most d, there exist primitive binomials of degree at most d, (u_1 − v_1), . . . , (u_s − v_s) ∈ k[t_{e_i}], and m_1, . . . , m_s ∈ k[t_{e_i}], such that

p_W = m_1(u_1 − v_1) + m_2(u_2 − v_2) + . . . + m_s(u_s − v_s).
By expanding and reordering so that m_1 u_1 = u, m_s v_s = v, and m_i v_i = m_{i+1} u_{i+1} for all i = 1, . . . , s − 1, we may and will assume that m_1, . . . , m_s are monomials.

If gcd(m_i, m_{i+1}) ≠ 1 for some i, we can add the terms m_i(u_i − v_i) and m_{i+1}(u_{i+1} − v_{i+1}) to get a new term, m′_i(u′_i − v′_i), where m′_i = gcd(m_i, m_{i+1}) and (u′_i − v′_i) is a binomial of I_H of degree less than n. Continuing recursively in this manner, we have

p_W = m′_1(u′_1 − v′_1) + m′_2(u′_2 − v′_2) + . . . + m′_r(u′_r − v′_r),

where m′_1 u′_1 = u, m′_r v′_r = v, m′_i v′_i = m′_{i+1} u′_{i+1}, gcd(m′_i, m′_{i+1}) = 1 for all i = 1, . . . , r − 1, and deg(u′_i − v′_i) < n for all i = 1, . . . , r. For convenience, we will drop the superscripts and write

p_W = m_1(u_1 − v_1) + m_2(u_2 − v_2) + . . . + m_r(u_r − v_r).
Case 1: r = 2. In this case, p_W = m_1(u_1 − v_1) + m_2(u_2 − v_2). Let

Γ_1 := E(u_1) ⊔_m E(v_1),
Γ_2 := E(u_2) ⊔_m E(v_2),
S := E(v_1) ∩ E(u_2) = E(gcd(v_1, u_2)).

We want to show (Γ_1, S, Γ_2) is a decomposition of W + S. Since S = Γ_{1,red} ∩ Γ_{2,blue}, Γ_{1,blue} ⊆ W_blue, and Γ_{2,red} ⊆ W_red, we only need to show W + S = Γ_1 ⊔ Γ_2, Γ_{1,red} ⊆ (W + S)_red, and Γ_{2,blue} ⊆ (W + S)_blue. First, notice the following equalities hold:

W + S = (W_blue ⊔ S) ⊔ (W_red ⊔ S) = E(u) ⊔ S ⊔ E(v) ⊔ S
      = E(m_1 u_1) ⊔ S ⊔ E(m_2 v_2) ⊔ S = E(m_1) ⊔ E(u_1) ⊔ S ⊔ E(m_2) ⊔ E(v_2) ⊔ S.

Let s ∈ k[t_{e_i}] be the monomial such that E(s) = S, so s = gcd(v_1, u_2). The equality m_1 v_1 = m_2 u_2 implies m_1(v_1/s) = m_2(u_2/s). Now, v_1/s and u_2/s are clearly relatively prime, and by the assumptions on p_W, m_1 and m_2 are relatively prime. This means the equality m_1(v_1/s) = m_2(u_2/s) implies m_1 = u_2/s and m_2 = v_1/s. Thus,

Γ_1 ⊔ Γ_2 = E(u_1) ⊔ E(v_1) ⊔ E(u_2) ⊔ E(v_2)
          = E(u_1) ⊔ E(v_1/s) ⊔ S ⊔ E(v_2) ⊔ E(u_2/s) ⊔ S
          = E(u_1) ⊔ E(m_2) ⊔ S ⊔ E(v_2) ⊔ E(m_1) ⊔ S.

Consequently, W + S = Γ_1 ⊔ Γ_2.

Notice the equality m_2 = v_1/s also implies Γ_{1,red} = E(v_1) = E(m_2) ⊔ S. This means Γ_{1,red} ⊆ (E(m_2 v_2) ⊔ S) = (W_red ⊔ S) = (W + S)_red. By a similar observation, Γ_{2,blue} ⊆ (W + S)_blue.
Case 2: r = 2N + 1. For 1 ≤ i ≤ N, let

Γ_{1_i} = E(u_i) ⊔_m E(v_i),
Γ_{2_i} = E(m_{i+1} u_{i+1}) ⊔_m E(m_{2N−i+2} v_{2N−i+2}),
S_i = E(v_i) ∩ E(m_{i+1} u_{i+1}) = E(gcd(v_i, m_{i+1} u_{i+1})) = E(v_i),

and let

Υ_{1_i} = E(m_i u_i) ⊔_m E(m_{2N−i+1} v_{2N−i+1}),
Υ_{2_i} = E(u_{2N−i+2}) ⊔_m E(v_{2N−i+2}),
R_i = E(m_{2N−i+1} v_{2N−i+1}) ∩ E(u_{2N−i+2}) = E(gcd(m_{2N−i+1} v_{2N−i+1}, u_{2N−i+2})) = E(u_{2N−i+2}).

One can follow the proof of Case 1 to see that S_1 and R_1 are splitting sets of W, and that S_{i+1} and R_{i+1} are splitting sets of W_i = E(m_{i+1} u_{i+1}) ⊔_m E(m_{2N−i+1} v_{2N−i+1}) for i = 1, . . . , N − 1. Furthermore, by definition, they are blue and red splitting sets (resp.) of size less than n.

Since W_{N−1,blue} = Γ_{2_{N−1},blue} and W_{N−1,red} = Υ_{1_{N−1},red}, the binomial arising from the walk W_{N−1} is

m_N u_N − m_{N+2} v_{N+2} = m_N(u_N − v_N) + m_{N+1}(u_{N+1} − v_{N+1}) + m_{N+2}(u_{N+2} − v_{N+2}).

Choose e ∈ H such that t_e | m_{N+1}; then t_e | v_N and t_e | u_{N+2}. But since S_N = E(v_N) and R_N = E(u_{N+2}), we have e ∈ S_N and e ∈ R_N, so S_N ∩ R_N ≠ ∅.
Case 3: r = 2N + 2. For 1 ≤ i ≤ N, let

Γ_{1_i} = E(u_i) ⊔_m E(v_i),
Γ_{2_i} = E(m_{i+1} u_{i+1}) ⊔_m E(m_{2N−i+3} v_{2N−i+3}),
S_i = E(v_i) ∩ E(m_{i+1} u_{i+1}) = E(gcd(v_i, m_{i+1} u_{i+1})) = E(v_i),

and let

Υ_{1_i} = E(m_i u_i) ⊔_m E(m_{2N−i+2} v_{2N−i+2}),
Υ_{2_i} = E(u_{2N−i+3}) ⊔_m E(v_{2N−i+3}),
R_i = E(m_{2N−i+2} v_{2N−i+2}) ∩ E(u_{2N−i+3}) = E(gcd(m_{2N−i+2} v_{2N−i+2}, u_{2N−i+3})) = E(u_{2N−i+3}).

We can follow the proof of Case 1 to see that S_1 and R_1 are splitting sets of W, and that S_{i+1} and R_{i+1} are splitting sets of W_i = E(m_{i+1} u_{i+1}) ⊔_m E(m_{2N−i+2} v_{2N−i+2}) for i = 1, . . . , N − 1. Furthermore, by definition, they are blue and red (resp.) splitting sets of size less than n. Since W_{N,blue} = Γ_{2_N,blue} and W_{N,red} = Υ_{1_N,red}, the binomial arising from W_N is

m_{N+1} u_{N+1} − m_{N+2} v_{N+2} = m_{N+1}(u_{N+1} − v_{N+1}) + m_{N+2}(u_{N+2} − v_{N+2}),

which is exactly Case 1; hence there exists a proper splitting set of W_N.
Proof. [Proof of sufficiency (⇐)] Assume every primitive monomial walk W of length 2n > 2d with supp(W) ⊆ H satisfies i) or ii). Let p_W = u − v be a generator of I_H which arises from the monomial walk W on H.

To show that I_H = [I_H]_{≤d}, we proceed by induction on the degree of p_W. If deg p_W ≤ d, then p_W ∈ [I_H]_{≤d}. So assume deg p_W = n > d and that every generator of I_H of degree less than n is in [I_H]_{≤d}. Since the size of W is greater than 2d, either condition i) or condition ii) holds.

Suppose i) holds. By Lemma 18, there exists a decomposition (Γ_1, S, Γ_2) of W + S such that |Γ_1| < |W| and |Γ_2| < |W|. Let p_{Γ_1} = u_1 − v_1 (resp. p_{Γ_2} = u_2 − v_2) be the binomial that arises from Γ_1 (resp. Γ_2). Let m_1 = u/u_1 and m_2 = v/v_2.

What remains to be shown is that p_W = m_1 p_{Γ_1} + m_2 p_{Γ_2}, that is, u − v = m_1(u_1 − v_1) + m_2(u_2 − v_2). However, it is clear that u = m_1 u_1 and v = m_2 v_2, so it suffices to show that m_1 v_1 = m_2 u_2, or equivalently, E(m_1 v_1) = E(m_2 u_2).
Let s ∈ k[t_{e_i}] be the monomial such that E(s) = S. Then

Γ_1 ⊔ Γ_2 = (E(u_1) ⊔ E(v_1/s) ⊔ S) ⊔ (E(u_2/s) ⊔ S ⊔ E(v_2))

and

W + S = (E(m_1) ⊔ E(u_1) ⊔ S) ⊔ (E(m_2) ⊔ E(v_2) ⊔ S).

Thus, since W + S = Γ_1 ⊔ Γ_2,

E(m_1) ⊔ E(m_2) = E(v_1/s) ⊔ E(u_2/s),

which in turn implies

m_1 m_2 = (v_1/s)(u_2/s).

Since W is primitive and the coloring conditions on (Γ_1, S, Γ_2) imply E(v_1/s) ⊆ W_red and E(m_1) ⊆ W_blue, the monomials m_1 and v_1/s are relatively prime. A similar argument shows m_2 and u_2/s are relatively prime. Thus, m_1 = u_2/s and m_2 = v_1/s, and consequently, E(m_1 v_1) = E(m_2 u_2) and p_W = m_1 p_{Γ_1} + m_2 p_{Γ_2}.

Since deg p_{Γ_1}, deg p_{Γ_2} < n, the induction hypothesis applied to p_{Γ_1} and p_{Γ_2} shows that p_W ∈ [I_H]_{≤d}.
Now suppose ii) holds. For i from 1 to N, let p_{Γ_{1_i}} = u_i − v_i and p_{Υ_{2_i}} = y_i − z_i be the binomials arising from Γ_{1_i} and Υ_{2_i}. Let w_{i_b} − w_{i_r} be the binomial arising from the walk W_i, and let p_W = w_{0_b} − w_{0_r}. For 1 ≤ i ≤ N, let m_i = w_{(i−1)_b}/u_i and q_i = w_{(i−1)_r}/z_i. Then

p_W = ∑_{i=1}^N m_i(u_i − v_i) + (w_{N_b} − w_{N_r}) + ∑_{i=1}^N q_{N+1−i}(y_{N+1−i} − z_{N+1−i}).

The preceding claim follows from three observations: (1) by construction, w_{0_b} = m_1 u_1 and w_{0_r} = q_1 z_1; (2) by the definition of W_N, w_{N_b} = m_N v_N and w_{N_r} = q_N y_N; and (3) by the definitions of m_i, q_i, and the walk W_i, m_i v_i = m_{i+1} u_{i+1} and q_{i+1} z_{i+1} = q_i y_i for 1 ≤ i ≤ N − 1.

As a consequence of the size conditions on the splitting sets of W_i, the linear combinations ∑_{i=1}^N m_i(u_i − v_i) ∈ [I_H]_{≤d} and ∑_{i=1}^N q_{N+1−i}(y_{N+1−i} − z_{N+1−i}) ∈ [I_H]_{≤d}. So if W_N satisfies condition i), the binomial w_{N_b} − w_{N_r} ∈ [I_H]_{≤d}, and thus p_W ∈ [I_H]_{≤d}.

To finish the proof, assume that S_N and R_N share an edge e. Then the claim above becomes

p_W = ∑_{i=1}^N m_i(u_i − v_i) + t_e(m_N v_N/t_e − q_N y_N/t_e) + ∑_{i=1}^N q_{N+1−i}(y_{N+1−i} − z_{N+1−i}),

and we just need to show that t_e in fact divides m_N v_N and q_N y_N. But this is clear, since e ∈ S_N implies t_e | v_N, and e ∈ R_N implies t_e | y_N.
Example 28 (Independence models) Let H be the complete k-partite hypergraph with d vertices in each partition V_1, . . . , V_k. These hypergraphs correspond to independence models in statistics. Equivalently, the edge subring of the complete k-partite hypergraph with d vertices in each partition parametrizes the Segre embedding of P^{d−1} × · · · × P^{d−1} (k copies).
The ideal I_H is generated by quadrics. To see this, let W, supp(W) ⊆ H, be a primitive monomial walk of length 2n, n > 2. Choose a multiset E′ ⊂ W consisting of n − 1 blue and n − 1 red edges. Since each edge must contain a vertex from each V_i, for each i there is at most one vertex in V(E′) ∩ V_i that is not covered by both a red edge and a blue edge from E′. Consequently, V(E′) contains, from each V_i, a vertex that belongs to at least one red edge and at least one blue edge of E′.
For a multiset of edges M with supp(M) ⊆ H, we define the max degree of a vertex:

maxdeg(v; M) := max(deg_red(v; M), deg_blue(v; M)).

The partitioning of the vertices ensures that V(E′) cannot contain more than k vertices whose maxdeg with respect to E′ is n − 1. Indeed, if there were more than k vertices with maxdeg equal to n − 1, then two of those vertices would belong to the same partition V_j. This would imply that W contains at least 4(n − 1) edges, which is impossible when n > 2.
Next, choose n − 1 new blue edges and n − 1 new red edges in the following manner. Let d_b(v) := deg_blue(v; E′) and d_r(v) := deg_red(v; E′). For i = 1, . . . , k, choose a vertex from V(E′_blue) ∩ V(E′_red) ∩ V_i that has the largest maxdeg with respect to E′; let b_{n−1} and r_{n−1} be this set of vertices. For all v ∈ b_{n−1}, reduce d_b(v) and d_r(v) by 1. Now choose b_1, . . . , b_{n−2} by the following algorithm:
for i from 1 to k do:
    let V_i := sort V(E′) ∩ V_i by d_b(v) in decreasing order;
for j from n − 2 down to 1 do:
    b_j := {v_i : v_i is the first element of V_i};
    for all v ∈ b_j do d_b(v) := d_b(v) − 1;
    for i from 1 to k do re-sort V_i by d_b(v) in decreasing order;
Choose r_1, . . . , r_{n−2} analogously, using d_r in place of d_b. Let R_1 = {b_1, . . . , b_{n−1}} and S_1 = {r_1, . . . , r_{n−1}}. Then R_1 and S_1 are red and blue splitting sets of W that share an edge. Thus, condition ii) of Theorem 27 is met, and consequently I_H is generated in degree 2.
When H is a non-uniform hypergraph, the toric ideal I_H is not necessarily homogeneous. For example, Figure 4 shows a balanced edge set supporting a binomial in I_H, where H consists of edges of size two and four; note that the edges still satisfy the balancing condition (3.1.1). However, we can still modify the conditions of Theorem 27 to find degree bounds for the toric ideals of non-uniform hypergraphs. Proposition 29 gives a prescription for determining a degree bound on the generators of I_H in terms of local structures of H.
Figure 4. Example of a non-uniform hypergraph whose associated toric ideal is non-homogeneous.

Proposition 29 Given a hypergraph H and a binomial f_E ∈ I_H arising from the balanced edge set E with n = |E_blue| ≥ |E_red|, f_E is a linear combination of binomials in I_H of degree less than n if one of the following two conditions holds:

i) there exists a proper splitting set S of E with decomposition (Γ_1, S, Γ_2), where |Γ_{i,blue}|, |Γ_{i,red}| < n for i = 1, 2,

or

ii) there is a pair of blue and red splitting sets of E, S and R, of size less than n, with decompositions (Γ_1, S, Γ_2) and (Υ_1, R, Υ_2), such that |Γ_{1,blue}|, |Υ_{2,red}| < n, |Γ_{2,blue}|, |Υ_{1,red}| ≤ n, and S ∩ R ≠ ∅.
Proof. This proof follows the proof of sufficiency for Theorem 27. Note that in that proof, the uniformity assumption does not play an essential role; it is only invoked to bound the sizes of the red and blue parts of each balanced edge set appearing in the decompositions involved. Thus, the hypothesis of Proposition 29 acts in place of the uniformity assumption in Theorem 27.
3.6 Hidden Subset Models

For the remainder of this section, we will concern ourselves with the first tangential variety, Tan((P^1)^n). In [59], Sturmfels and Zwiernik use cumulants to give a monomial parameterization of Tan((P^1)^n). The variety Tan((P^1)^n) is associated to a class of hidden subset models [59, Example 5.2] and to context-specific independence models [46]. We now derive a degree bound for the toric ideal of the image of Tan((P^1)^n) in higher cumulants and, equivalently, for the Markov complexity of these models.

Example 30 Let H = (V, E), where V = {1, . . . , n} and E = {e : e ⊆ V and |e| ≥ 2}. Then the set of polynomials vanishing on the image of Tan((P^1)^n) in higher cumulants is the toric ideal I_H (see [59, Theorem 4.1]).
The hypergraph in Example 30 is the complete hypergraph on n vertices with all singleton edges removed. A degree bound on the generators of its toric ideal can be obtained by looking at a smaller hypergraph.
Lemma 31 Let H1 = (V,E1) where V = {1, . . . , n} and E1 = {e : e ⊆ V and |e| ≥ 2}, and
let H2 = (V,E2) where E2 = {e ⊆ V : 2 ≤ |e| ≤ 3}. If the ideal IH2 is generated in degree at
most d, then the ideal IH1 is generated in degree at most d.
Proof. Consider IH2 as an ideal in the bigger polynomial ring S := k[t_e : e ∈ E1], and let IH2·S denote its extension to S. Assume that IH2, and consequently IH2·S, is generated in degree at most d. Pick an arbitrary binomial

u − v = t_{e_{i_1}} t_{e_{i_2}} · · · t_{e_{i_n}} − t_{e_{j_1}} t_{e_{j_2}} · · · t_{e_{j_m}} ∈ I_{H_1}.

Since every edge e ∈ H1 is the disjoint union of a collection of edges e_{k_1}, . . . , e_{k_l} ∈ H2, we have t_e − ∏_{i=1}^{l} t_{e_{k_i}} ∈ I_{H_1}. Noting that

t_e − ∏_{i=1}^{l} t_{e_{k_i}} = (t_e − t_{e_{k_1}} t_{∪_{i=2}^{l} e_{k_i}}) + ∑_{j=1}^{l−2} (∏_{i=1}^{j} t_{e_{k_i}}) (t_{∪_{i=j+1}^{l} e_{k_i}} − t_{e_{k_{j+1}}} t_{∪_{i=j+2}^{l} e_{k_i}}),

one easily sees that the binomial t_e − ∏_{i=1}^{l} t_{e_{k_i}} is generated by quadratics. In turn, this essentially shows that these quadratic relations allow us to rewrite u − v in terms of the variables indexed by edges of size 2 and 3 only. The claim follows, since u − v can then be expressed as a binomial in IH2·S.
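The telescoping identity above is a polynomial identity, so it can be sanity-checked by evaluating both sides at random integer points, treating each variable t_{∪ e_{k_i}} as a free symbol A_j subject only to A_{l−1} = t_{e_{k_l}}. A minimal sketch in Python (the function and variable names are ours, not the thesis's):

```python
import random
from math import prod

def telescoping_check(l, trials=200):
    """Check the identity
       t_e - prod(t_1..t_l) == (t_e - t_1*A_1)
           + sum_{j=1}^{l-2} prod(t_1..t_j) * (A_j - t_{j+1}*A_{j+1}),
    where A_j stands for t_{union of e_{k_{j+1}}, ..., e_{k_l}} and
    A_{l-1} = t_l.  Both sides are polynomials, so agreement at many
    random integer points is strong evidence of the identity."""
    rng = random.Random(0)
    for _ in range(trials):
        te = rng.randint(-50, 50)
        t = [rng.randint(-50, 50) for _ in range(l)]      # t[0..l-1] = t_1..t_l
        A = [rng.randint(-50, 50) for _ in range(l - 1)]  # A[0..l-2] = A_1..A_{l-1}
        A[l - 2] = t[l - 1]                               # A_{l-1} = t_l
        lhs = te - prod(t)
        rhs = te - t[0] * A[0]
        for j in range(1, l - 1):                         # j = 1, ..., l-2
            rhs += prod(t[:j]) * (A[j - 1] - t[j] * A[j])
        if lhs != rhs:
            return False
    return True
```

Running the check for several values of l confirms the identity used in the proof.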
Theorem 32 Let H = (V,E) where V = {1, . . . , n} and E = {e ⊆ V : 2 ≤ |e| ≤ 3}. The
toric ideal of H is generated by quadrics and cubics.
In particular, by Lemma 31, the toric ideal of the image of Tan((P1)n) in higher cumulants is generated in degrees 2 and 3.
In the following proof, we examine the local combinatorics of H to illustrate how the struc-
ture of a hypergraph reveals insights into the generating set of IH .
Proof. Let fE be a primitive binomial in IH with E a balanced edge set. Without loss
of generality, we will assume throughout the proof |Eblue| ≥ |Ered|. If E contains only 2-edges or
only 3-edges, then by [56, Theorem 14.1] fE is a linear combination of quadratics. So we will
assume E contains a 2-edge and a 3-edge.
Since |Eblue| ≥ |Ered|, Eblue must contain at least as many 2-edges as Ered, and in order to
satisfy (3.1.1), the difference between the number of 3-edges in Ered and the number of 3-edges
in Eblue must be a multiple of 2.
Notice that for every pair e1, e2 of 3-edges (where e1 and e2 need not be distinct), there are three 2-edges e3, e4, e5 in H such that

{e1, e2} ⊔ {e3, e4, e5}

is a balanced edge set. Let B2,3 ⊂ IH be the set of all binomials arising from balanced edge sets of this form. Then fE is a linear combination of binomials in B2,3 and fE′, where E′blue and E′red contain the same number of 2-edges and exactly one 3-edge each.
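Balancedness of an edge set, condition (3.1.1), just says that every vertex has the same blue degree as red degree. A quick sketch (helper names are ours), checked on the two-3-edges-versus-three-2-edges configuration above:

```python
from collections import Counter

def is_balanced(blue, red):
    """Edge sets given as lists of vertex tuples; the multiset of vertex
    degrees on the blue side must equal the one on the red side."""
    degrees = lambda edges: Counter(v for e in edges for v in e)
    return degrees(blue) == degrees(red)
```

For instance, is_balanced([(1, 2, 3), (4, 5, 6)], [(1, 2), (3, 4), (5, 6)]) holds, and taking e1 = e2 = {1, 2, 3} the 2-edges {1, 2}, {1, 3}, {2, 3} also balance.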
Since it suffices to consider primitive binomials, we will proceed inductively by showing that
every primitive degree n binomial in
Bh := {fE ∈ IH : |Eblue| = |Ered| and Eblue, Ered contain exactly one 3-edge each}
is a linear combination of binomials in Bh with degree less than n.
Let fE ∈ Bh such that degree fE = n > 3 and fE is primitive. Let e1 be the 3-edge in Ered.
Since fE is primitive, e1 must intersect a 2-edge e2 in Eblue. Let e2 = {v1, v2} where v1 ∈ e1.
The edge e2 intersects at most one other edge of Ered besides e1. We will examine the
possible intersections of e2 and Ered in order to find splitting sets of E that satisfy one of the
conditions listed in Proposition 29. For illustrations of Case 1 and Case 3 see Figures 5 and
6. In all three cases, we will construct S, Γ1 and Γ2 such that S is a splitting set of E with an
associated decomposition (Γ1, S, Γ2) which satisfies the properties of condition i) in Proposition 29.
In fact, fE will be a linear combination of fΓ1 and fΓ2 , both of which have strictly lower degree
than fE . Furthermore, since the blue and red parts of Γ1 and Γ2 will contain the same number
of 2 and 3-edges, it follows that fΓ1 , fΓ2 ∈ Bh.
Case 1: The edge e1 = e2 ∪ {v3} = {v1, v2, v3} for some v3 ∈ V (E).

Since v3 ∉ e2 and |Eblue| = |Ered|, there must be a 2-edge e3 ∈ Ered such that v3 ∉ e3 in order for (3.1.1) to hold. Let e3 = {v4, v5} and e4 = {v3, v4, v5}. The sets S, Γ1 and Γ2 in this case are:

S = {e4}
Γ1 = (Eblue − {e2}) ⊔ ((Ered − {e1, e3}) ∪ {e4})
Γ2 = {e2, e4} ⊔ {e1, e3}.
Figure 5. Case 1. Proof of Theorem 32.
Figure 6. Case 3. Proof of Theorem 32.
Case 2: The edge e1 = {v1, v3, v4} for some v3, v4 ∈ V (E) and there is a 2-edge e3 ∈ Ered such that e3 = {v2, v3}.

Since v3 ∉ e2, degblue(v3; E) = degred(v3; E) ≤ n − 1 and, thus, there exists a 2-edge e4 ∈ Ered such that v3 ∉ e4. Let e4 = {v5, v6}. Now let e5 = {v3, v4, v5} and e6 = {v3, v6}. The sets S, Γ1 and Γ2 in this case are:

S = {e5, e6}
Γ1 = (Eblue − {e2}) ⊔ ((Ered − {e1, e3, e4}) ∪ {e5, e6})
Γ2 = {e2, e5, e6} ⊔ {e1, e3, e4}.
Case 3: There is a 2-edge e3 ∈ Ered such that v2 ∈ e3 and e2 ∩ e3 = ∅. In this case, let e4 = (e1 − {v1}) ∪ (e3 − {v2}). The sets S, Γ1 and Γ2 in this case are:

S = {e4}
Γ1 = (Eblue − {e2}) ⊔ ((Ered − {e1, e3}) ∪ {e4})
Γ2 = {e2, e4} ⊔ {e1, e3}.
CHAPTER 4
PHYLOGENETIC MODELS AND TENSORS OF BOUNDED RANK
This chapter is based on work in [27] with Shmuel Friedland.
4.1 Introduction
Let Vr(m,n, l) ⊆ Cm⊗Cn⊗Cl be the variety of tensors of border rank at most r. The variety
we will explore in this chapter is V4(4, 4, 4), the variety associated to the 4-state general Markov
model on the 3-leaf claw tree denoted M0 (see Proposition 7). Motivated by understanding
the phylogenetic invariants for the modelM0, in 2007, Elizabeth Allman posed the problem of
determining the ideal I4(4, 4, 4) generated by all polynomials vanishing on V4(4, 4, 4); this has
been coined the salmon problem [2]. The salmon conjecture from [50, Conjecture 3.24] states
that I4(4, 4, 4) is generated by polynomials of degree 5 and 9.
A first nontrivial step in characterizing V4(4, 4, 4) is to characterize V4(3, 3, 4), the variety of
all complex-valued 3 × 3 × 4 tensors of border rank at most 4. In [39], Landsberg and Manivel
show that V4(3, 3, 4) satisfies a set of polynomial equations of degree 6 which are not in the ideal
generated by the equations of degree 5 from the original conjecture. (See also [40, Remark 5.7]
and [6]). Hence the revised version of the salmon conjecture states that I4(4, 4, 4) is generated
by polynomials of degree 5, 6 and 9 [57, §2]. This, in particular, implies the set-theoretic version
of the salmon conjecture, which we will prove in the remainder of this chapter:
Theorem 33 The variety of tensors in C4 ⊗ C4 ⊗ C4 of border rank at most 4 is cut out by
polynomials of degree 5, 6, and 9.
If the ideal generated by the degree 5, 6, and 9 polynomials described in Section 4.4 is indeed
radical, then Theorem 33 gives a concrete foundation to proving the following conjecture about
the phylogenetic complexity of M0. This conjecture is implied by the salmon conjecture [57,
§2], [6].
Conjecture 34 The phylogenetic complexity of M0, the 4-state general Markov model on the
3-leaf claw tree, is at most 9.
In [26], Friedland shows that V4(4, 4, 4) is cut out by polynomials of degree 5, 9 and 16 by
showing V4(3, 3, 4) is cut out by polynomials of degrees 9 and 16. In [39], Landsberg and Manivel
give an algorithm to construct the polynomials of degree 6, referred to here as the LM-polynomials,
that vanish on V4(3, 3, 4) but are not in the ideal generated by the known polynomials of degree
5. In [6], Bates and Oeding explicitly construct a basis of the space of these degree 6 polynomials,
which consists of ten linearly independent polynomials. Using methods from numerical algebraic
geometry, Bates and Oeding give numerical confirmation that V4(3, 3, 4) is the zero set of a set
of polynomials of degree 6 and 9 [6], where the degree 6 polynomials are the LM-polynomials.
The aim of this chapter is to show that V4(3, 3, 4) is cut out by polynomials of degree 6
and 9. This is done by showing that in Case A.I.3 of [26, Proof of Theorem 4.5] the use of
polynomials of degree 16 can be eliminated by use of the LM-polynomials. More precisely we
show that any 3× 3× 4 tensor X = [xi,j,k] ∈ C3×3×4 whose four frontal slices are of the form
Xk = [ x1,1,k  x1,2,k  0
       x2,1,k  x2,2,k  0
       0       0       x3,3,k ],   k = 1, 2, 3, 4,   (4.1.1)
has border rank at most four if and only if the ten basis LM-polynomials vanish on X .
As we will see later, a tensor X ∈ C3×3×4 of the form (4.1.1) has border rank at most four
if and only if either the four 2 × 2 matrices

[ x1,1,k  x1,2,k
  x2,1,k  x2,2,k ],   k = 1, 2, 3, 4,

are linearly dependent
or x3,3,k = 0 for k = 1, 2, 3, 4. Note that the condition that the above four 2 × 2 matrices are
linearly dependent is equivalent to the vanishing of the polynomial
f(X ) = det [ x1,1,1  x1,2,1  x2,1,1  x2,2,1
              x1,1,2  x1,2,2  x2,1,2  x2,2,2
              x1,1,3  x1,2,3  x2,1,3  x2,2,3
              x1,1,4  x1,2,4  x2,1,4  x2,2,4 ].   (4.1.2)
Computer-aided symbolic calculations show that the restrictions of the ten basis LM-
polynomials to X of the form (4.1.1) are the polynomials
x3,3,k x3,3,l f(X )   for 1 ≤ k ≤ l ≤ 4.   (4.1.3)
Hence X has border rank at most four if and only if the ten basis LM-polynomials vanish on
X . Combining this with the results in [26] we deduce the set-theoretic version of the salmon
conjecture, Theorem 33.
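For tensors of the form (4.1.1), the polynomial f(X) of (4.1.2) is a 4 × 4 determinant of the vectorized 2 × 2 blocks, so the dichotomy "blocks linearly dependent or all x3,3,k = 0" is easy to probe numerically. A sketch with NumPy (the array layout is our choice, not the thesis's):

```python
import numpy as np

def f_of_X(Y):
    """Y has shape (4, 2, 2): the 2x2 blocks Y_1, ..., Y_4 of the frontal
    slices in (4.1.1).  f(X) is the determinant of the 4x4 matrix whose
    k-th row is vec(Y_k), as in (4.1.2)."""
    return np.linalg.det(Y.reshape(4, 4))

Y = np.eye(4).reshape(4, 2, 2)   # Y_k = k-th standard basis matrix of C^{2x2}
Y_dep = Y.copy()
Y_dep[3] = Y[0] + Y[1] - Y[2]    # force linear dependence of the blocks
```

Here f_of_X(Y) is 1 while f_of_X(Y_dep) is numerically 0, matching the linear dependence criterion.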
We summarize briefly the content of this chapter. In Section 4.2 we restate the characterization of V4(3, 3, 4) given in [26, Theorem 4.5]. In Section 4.3 we show that the use of polynomials of degree 16 in the proof of [26, Theorem 4.5] can be replaced by the use of the LM-polynomials. In Section 4.4, we summarize the characterization of V4(4, 4, 4) as the zero set of polynomials of degree
5, 6 and 9.
4.2 A characterization of V4(3, 3, 4)
In order to understand the defining polynomials for V4(4, 4, 4), it is helpful to understand the
defining polynomials of V4(3, 3, 4). This is because the set of defining equations for V4(4, 4, 4)
inherits the equations of V4(3, 3, 4). The inheritance process is explained explicitly in Condition
2 of [26, Theorem 5.1] and also more generally in [39, Proposition 4.4]. Thus, in this section,
we focus on V4(3, 3, 4) and show that we can replace the degree 16 equations in [26] with degree
6 equations.
Let X = [xi,j,k] ∈ V4(3, 3, 4) ⊆ C3×3×4 be a 3 × 3 × 4 tensor of border rank at most 4. The four frontal slices of X are denoted by the matrices Xk := Xk,3 = [xi,j,k], i, j = 1, 2, 3, for k = 1, 2, 3, 4. Notice that since we are only working with frontal slices, we drop the second index, which indicates the type of the slice.
The following lemma is a specialization of [26, Lemma 4.2].
Lemma 35 [26, Lemma 4.2] Let X = [xi,j,k] ∈ C3×3×4 have border rank at most 4. Then there exist L,R ∈ C3×3 such that

LXk, XkR ∈ S(3,C),   k = 1, . . . , 4,   (4.2.1)

LR⊤ = R⊤L = (tr(LR⊤)/3) I3,   (4.2.2)

where S(3,C) is the set of all complex-valued 3 × 3 symmetric matrices.
The following is an immediate corollary to Lemma 35.
Corollary 36 Let X ∈ V4(3, 3, 4). Then there exist nontrivial matrices L,R ∈ C3×3 \ {0} satisfying the conditions
LXk −X>k L> = 0, k = 1, . . . , 4, L ∈ C3×3, (4.2.3)
XkR−R>X>k = 0, k = 1, . . . , 4, R ∈ C3×3. (4.2.4)
Proof. The equations (4.2.3) and (4.2.4) are a restatement of the conditions imposed by (4.2.1). □
The equations 4.2.3, 4.2.4 are referred to as the symmetrization conditions. It is known that
for any square matrix A, the matrix A−A> is skew symmetric. Thus 4.2.3, 4.2.4 each result in 12
linear homogeneous equations in the entries of L and R respectively. Let CL(X ), CR(X ) ∈ C12×9
be the coefficient matrices of these respective systems. The entries of CL(X ), CR(X ) are linear
functions in the entries of X .
For a generic X ∈ V4(3, 3, 4), rank CL(X ) = rank CR(X ) = 8, hence we can express the entries of L and R in terms of corresponding 8 × 8 minors of CL(X ) and CR(X ), respectively (for
details, see [26]). The following characterization of V4(3, 3, 4) from [26, Theorem 4.5] describes
defining polynomials of degree 9 and degree 16 in terms of the coefficient matrices CL(X ), CR(X )
and their minors.
Theorem 4.2.1 [26, Theorem 4.5] The tensor X = [xi,j,k] ∈ C3×3×4 has border rank at most 4 if and only if the following conditions hold.

1. Let Xk := [xi,j,k], i, j = 1, 2, 3, for k = 1, . . . , 4, be the four frontal slices of X . Then the ranks of CL(X ), CR(X ) are less than 9. This results in degree 9 polynomial equations.

2. Let L,R be solutions of (4.2.3) and (4.2.4) respectively, given by 8 × 8 minors of CL(X ), CR(X ); then (4.2.2) holds. This condition results in degree 16 polynomial equations.
The proof of Theorem 4.2.1 in [26] consists of discussing a number of cases. Condition 2 from Theorem 4.2.1 (the degree 16 polynomials) is used only in case A.I.3. In the next section
we show how to prove the theorem in the case A.I.3 using only the ten basis LM-polynomials
of degree 6, thus replacing Condition 2 with the LM-polynomials.
4.3 Proving case A.I.3 using degree 6 polynomials
Suppose X ∈ C3×3×4 and there exist two nonzero matrices L,R ∈ C3×3 such that (4.2.3)–
(4.2.4) hold. The case A.I.3 assumes that L and R are rank one matrices and resolves the case
where LR> = R>L = 0 without use of the degree 16 polynomials. Therefore, to eliminate the
use of the degree 16 polynomials we need to show the following.
Claim 4.3.1 Let X ∈ C3×3×4. Let L,R ∈ C3×3 be rank one matrices satisfying the conditions (4.2.3)–(4.2.4) respectively. Suppose furthermore that either LR⊤ ≠ 0 or R⊤L ≠ 0. If the ten LM-polynomials vanish on X then X ∈ V4(3, 3, 4).
In the rest of this section we prove Claim 4.3.1. Assume that L,R are rank one matrices. Then there exist u,v,x,y ∈ C3 such that L = uv⊤ and R = xy⊤.
Lemma 37 Let A ∈ Cn×n and u,v,x,y ∈ Cn. Then the following two statements hold:

uv⊤A is symmetric if and only if v⊤A = b u⊤ for some b ∈ C,   (4.3.1)
A xy⊤ is symmetric if and only if Ax = cy for some c ∈ C.   (4.3.2)
Proof. Both (4.3.1) and (4.3.2) are direct consequences of the fact that a matrix S ∈ Cn×n is a rank one symmetric matrix if and only if there exists z ∈ Cn such that S = zz⊤. □
By changing bases in two copies of C3, we can assume that u = v = e3 = (0, 0, 1)⊤. (Changes of bases do not affect the vanishing condition of either LR⊤ or R⊤L [26].) Let P,Q ∈ GL(3,C) be such that

P⊤e3, Q⊤e3 ∈ span(e3).   (4.3.3)
If A ∈ C3×3 is such that e3⊤A = b e3⊤ for some b ∈ C and Ax = cy for some c ∈ C, then by Lemma 37, e3e3⊤(PAQ) is symmetric. Observe next that (PAQ)(Q−1x)(Py)⊤ is also symmetric.
Thus we need to analyze what kind of vectors can be obtained from two nonzero vectors x,y
by applying Q−1x, Py, where P,Q satisfy (4.3.3). Notice that P and Q have the zero pattern

[ ∗ ∗ ∗
  ∗ ∗ ∗
  0 0 ∗ ].   (4.3.4)
Let Q1 := Q−1. By considering the adjoint matrix of Q, we see that Q1 has the zero pattern in (4.3.4) and satisfies the same condition as Q and P in (4.3.3).
Lemma 38 Let y ∈ C3 \ {0}. If e3⊤y ≠ 0 then there exists P ∈ GL(3,C) of the form (4.3.4) such that Py = e3. If e3⊤y = 0 then there exists P ∈ GL(3,C) of the form (4.3.4) such that Py = e2.

Proof. Assume first that e3⊤y ≠ 0. Let f = (f1, 0, f3)⊤, g = (0, g2, g3)⊤ ∈ C3 \ {0} be such that f⊤y = g⊤y = 0. Then f1g2 ≠ 0. Hence there exists P ∈ GL(3,C) of the form (4.3.4), whose first and second rows are f⊤, g⊤ respectively, such that Py = e3.

Suppose now that e3⊤y = 0. Then there exists P = P1 ⊕ [1], P1 ∈ GL(2,C), such that Py = e2. Any such P is an element of GL(3,C). □
Corollary 4.3.2 Let A ∈ C3×3 and assume that LA and AR are symmetric matrices for some rank one matrices L,R ∈ C3×3. Then there exist P,Q ∈ GL(3,C) such that, replacing A,L,R by A1 := PAQ, L1 := Q⊤LP−1, R1 := Q−1RP⊤, we may assume L1 = e3e3⊤ and R1 has one of the following 4 forms:

e3e3⊤, e3e2⊤, e2e3⊤, e2e2⊤.   (4.3.5)
As a result of Corollary 4.3.2, to prove Claim 4.3.1 we need to consider only the first three choices of R1 in (4.3.5), since the last choice implies L1R1⊤ = R1⊤L1 = 0. Furthermore, note that by interchanging the first two indices in X ∈ C3×3×4 we need to consider only the first two choices of R1 in (4.3.5).
4.3.1 The case L = R = e3e3⊤
In the remainder of this section we say that a tensor T ∈ Cm×n×l is essentially a tensor T ′ = [t′i,j,k] ∈ Cm′×n′×l′ if, after a change of bases in Cm, Cn, Cl, the tensor T is represented by a tensor [ti,j,k] ∈ Cm×n×l such that the following conditions hold. First, t′i,j,k = ti,j,k for i = 1, . . . ,m′, j = 1, . . . , n′, k = 1, . . . , l′. Second, ti,j,k = 0 whenever (i, j, k) lies outside this index range. Clearly, rank T = rank T ′ and brank T = brank T ′.
Let X1, X2, X3, X4 ∈ C3×3 be the four frontal sections of X = [xi,j,k] ∈ C3×3×4. Assume
that (4.2.3)–(4.2.4) hold. Then, since L = R = e3e3⊤, each Xk has the form (4.1.1). This is
the case discussed in [26, (4.7)].
Using Mathematica, we took the ten basis LM-polynomials available in the ancillary material
of [6, deg 6 salmon.txt] and let x1,3,k = 0, x2,3,k = 0, x3,1,k = 0, x3,2,k = 0 for k = 1, 2, 3, 4. The
resulting polynomials had 24 terms. We then factored f(X ) from these restricted polynomials.
This symbolic computation shows that the restrictions of the ten basis LM-polynomials to X satisfying (4.2.3)–(4.2.4) are the polynomials given in (4.1.3). Therefore, by the result of
Landsberg-Manivel [39], if X ∈ V4(3, 3, 4) then all polynomials in (4.1.3) vanish on X .
Conversely, suppose that all polynomials in (4.1.3) vanish on X . Let
Yk = [ x1,1,k  x1,2,k
       x2,1,k  x2,2,k ],   k = 1, 2, 3, 4,   (4.3.6)
be the projection of the four frontal sections of X given by (4.1.1) on C2×2. Then f(X ) = 0
if and only if Y1, Y2, Y3, Y4 are linearly dependent. Decompose the tensor X into a sum Y + Z.
The four frontal sections of Y are block diagonal matrices diag(Yk, 0), k = 1, 2, 3, 4 and the four
frontal sections of Z are diag(0, 0, x3,3,k), k = 1, 2, 3, 4.
Assume first that the polynomial f(X ) given by (4.1.2) vanishes on X . Since Y1, Y2, Y3, Y4 are linearly dependent, it follows that the tensor Y can be viewed as a 2 × 2 × 3 tensor.
A particular case of [12, Theorem 3.1] tells us
brank T ≤ min(n, 2m) for any T ∈ C2×m×n where 2 ≤ m ≤ n (4.3.7)
Hence the border rank of Y is at most 3. In fact, we can say more. It is straightforward
to show that any three dimensional subspace of C2×2 is spanned by 3 rank one matrices, thus
Theorem 11 implies that rank Y ≤ 3. Now, clearly rank Z ≤ 1, therefore brank X ≤ 4 (more
precisely, rank X ≤ 4.)
Assume now that f(X ) ≠ 0. Since the ten polynomials in (4.1.3) vanish on X it follows that
x3,3,k = 0 for k = 1, 2, 3, 4. So Z = 0. In this case X is essentially a 2 × 2 × 4 tensor. Hence,
by (4.3.7), its border rank is at most 4.
4.3.2 The case L = e3e3⊤, R = e3e2⊤
Let X1, X2, X3, X4 ∈ C3×3 be the four frontal sections of X = [xi,j,k] ∈ C3×3×4. Assume
that (4.2.3)–(4.2.4) hold. This means that our tensor X = [xi,j,k] ∈ C3×3×4 has the following
zero entries x1,3,k = x3,1,k = x3,2,k = x3,3,k = 0 for k = 1, 2, 3, 4. So our tensor is essentially a
2× 3× 4 tensor and hence, by (4.3.7), its border rank is at most 4.
4.4 The defining polynomials of V4(4, 4, 4)
In this section we state for the reader’s convenience the defining equations of V4(4, 4, 4). We
briefly repeat the arguments in [26] by replacing the degree 16 polynomial equations with the
degree 6 polynomial equations. Let X = [xi1,i2,i3 ] ∈ C4×4×4. For each l ∈ {1, 2, 3} we fix il while letting ip, iq range over 1, 2, 3, 4, where {p, q} = {1, 2, 3} \ {l}. In this way we obtain four l-sections X1,l, . . . , X4,l ∈ C4×4. (Note that Xk,3 = [xi,j,k], i, j = 1, . . . , 4, for k = 1, 2, 3, 4, are the four frontal sections of X .) Denote by Xl = span(X1,l, . . . , X4,l) ⊂ C4×4 the l-section subspace corresponding to X .
For each l ∈ {1, 2, 3} we define the following linear subspaces of polynomials of degrees 5, 6, 9
respectively in the entries of X . The defining polynomials could be any basis in each of these
linear subspaces.
We first describe the Strassen commutative conditions [55]. These conditions were rediscovered independently in [1]. Take U1, U2, U3 ∈ Xl and write Ui = ∑_{j=1}^{4} uj,i Xj,l for i = 1, 2, 3. So the entries of each Xj,l are fixed scalars and the uj,i, i = 1, 2, 3, j = 1, 2, 3, 4, are viewed as variables.
Let adj U2 be the adjoint matrix of U2. Then the Strassen commutative conditions are
U1(adj U2)U3 − U3(adj U2)U1 = 0.
Since the values of uj,i, i = 1, 2, 3, j = 1, 2, 3, 4 are arbitrary, we regroup the above condition
for each entry as a polynomial in uj,i. The coefficient of each monomial in the uj,i variables
is a polynomial of degree 5 in the entries of X and must be equal to zero. The set of all such
polynomials of degree 5 span a linear subspace, and we can choose any basis in this subspace.
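These degree 5 conditions are easy to check numerically on a random rank 4 tensor: its slices are simultaneously equivalent to diagonal matrices, which makes U1(adj U2)U3 symmetric in U1 and U3. A sketch with NumPy (not code from the thesis; the adjoint is computed as det · inverse, valid since a generic U2 is invertible):

```python
import numpy as np

rng = np.random.default_rng(0)

# Random rank-4 tensor in C^{4x4x4}: slices X_k = sum_r C[k,r] a_r b_r^T
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))
C = rng.standard_normal((4, 4))
slices = np.einsum('ir,jr,kr->kij', A, B, C)   # X_k for k = 0, ..., 3

def adj(M):
    """Adjoint (adjugate) of an invertible matrix: det(M) * M^{-1}."""
    return np.linalg.det(M) * np.linalg.inv(M)

# U_i = generic linear combinations of the slices
u = rng.standard_normal((3, 4))
U1, U2, U3 = (np.einsum('k,kij->ij', u[i], slices) for i in range(3))

commutator = U1 @ adj(U2) @ U3 - U3 @ adj(U2) @ U1
```

Up to floating-point error, the commutator vanishes, as Strassen's conditions predict for border rank at most 4.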
The degree 6 and 9 polynomial conditions are obtained in a slightly different way. Let P = [pij ], Q = [qij ] ∈ C4×4 be matrices whose entries are viewed as variables. View PXk,lQ, k = 1, 2, 3, 4, as the four frontal slices of the 4 × 4 × 4 tensor X (P,Q, l) = [xi,j,k(P,Q, l)], i, j, k = 1, . . . , 4. Let Y = [xi,j,k(P,Q, l)], i, j = 1, 2, 3, k = 1, . . . , 4, be the 3 × 3 × 4 subtensor. Now Y must satisfy the degree 6 polynomial conditions of
Landsberg-Manivel and the degree 9 symmetrization conditions. Since the entries of P,Q are
variables, this means that the coefficients of the monomials in the variables pij , qij , i, j = 1, 2, 3, 4
must vanish identically. This procedure gives rise to 10 polynomial conditions of degree 6 [39],
which are linearly independent, and 440 polynomial conditions of degree 9 [26], which may
be linearly dependent. Using appropriate software one may reduce the number of linearly
independent conditions of degree 9.
The zero set of the above polynomials of degrees 5, 6 and 9 defines V4(4, 4, 4). Thus, Theorem
33 holds.
CHAPTER 5
MAXIMUM LIKELIHOOD DEGREE OF VARIANCE COMPONENT
MODELS
This chapter is based on work from [29] with Mathias Drton and Sonja Petrovic.
5.1 Introduction
Linear models with fixed and random effects are widely used for dependent observations.
Such mixed models are typically fit using likelihood-based techniques, and the necessary opti-
mization problems can be solved using the numerical methods implemented in various statistical
software packages, as discussed, for instance, in [23]. Such software typically takes into account
that the variance parameters are nonnegative. However, general-purpose optimization proce-
dures do not give any guarantees that a global optimum is found (see Section 1.8 in [36]). It can
thus be appealing to compute maximum likelihood (ML) estimates algebraically. Linear mixed
models have rational likelihood equations, which can be solved using either symbolic methods
as in [22, Chap. 2] or with numerical solvers such as PHCPack [64]. While solving likelihood
equations algebraically may not be feasible in large models with several random factors, modern
computational algebra does allow one to fully understand the likelihood surface in practically
relevant settings.
The main contribution of this chapter is a study of the algebraic complexity of ML estimation
in the unbalanced one-way layout with random effects. This model concerns a collection of
grouped observations
Yij = µ+ αi + εij , i = 1, . . . , q, j = 1, . . . , ni. (5.1.1)
The overall mean µ ∈ R is a fixed (‘non-random’) but unknown parameter. The random effects
αi and the error terms εij are mutually independent normal random variables. More precisely,
αi ∼ N (0, τ) and εij ∼ N (0, ω), where τ and ω denote the common variances of the random
effects and the error terms, respectively. The distribution of observation Yij is N (µ, τ +ω), and
two observations Yij and Yik from the ith group are dependent with covariance τ . A detailed
discussion and examples of applications of this specific model can be found in Chapter 3 of [54]
and in Chapter 11 of [53].
The covariance matrix of the joint multivariate normal distribution for all Yij defined by
(5.1.1) is the product of the scalar ω and a matrix that is a function of the variance ratio
θ = τ/ω. Therefore, when θ is known, the likelihood equations for µ and ω are of the type
encountered in generalized least squares calculations, with a unique solution that is a rational
function of the data and the known value of θ. We may thus eliminate µ and ω from the
likelihood equations, which then reduce to a single univariate equation. Before turning to a
first example, we remark that we always tacitly assume suitable sample size conditions to be
satisfied such that ML estimates exist. In particular, we assume there to be q ≥ 2 groups with
64
at least one group of size ni ≥ 2. A definitive answer to the existence problem in linear mixed models is given in [17], which also treats restricted maximum likelihood (REML) estimation; see
[43] for an introduction to this technique.
Example 5.1.1 Textbook data from [15, §6.4] give the yield of dyestuff from 5 different
preparations from each of q = 6 different batches of an intermediate product; the data are also
available in the R package lme4. The layout is balanced, that is, all batch sizes are equal, here
ni = 5. In this case, the likelihood equations are well-known to be equivalent to a linear equation
system and have ML degree one. The REML degree is also one.
A different picture emerges in the unbalanced case, when the batch sizes are not all equal.
For illustration, we remove the first, second and sixth observation from the data. The first batch
then only comprises n1 = 3 preparations, and the second batch only n2 = 4. The remaining
batches are unchanged with ni = 5 for i ≥ 3. In this unbalanced case, the solutions of the
likelihood equations correspond to the solutions of the polynomial equation
− 245488320000 θ7 − 277109078400 θ6 − 58814614680 θ5 + 54052612853 θ4
+ 37792395524 θ3 + 10086075110 θ2 + 1279832076 θ + 64175517 = 0. (5.1.2)
Thus, the ML degree is 7.
Numerical optimization using the R package lme4 yields a local maximum of the likelihood
function that corresponds to θ ≈ 0.5585. We may check whether this local maximum is unique,
or at least a global optimum, by finding all roots of the above univariate polynomial. This is a
task that can be done reliably in computer algebra systems.
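Finding all roots of (5.1.2) is one line in a numerical or computer algebra system; a sketch using NumPy's companion-matrix solver, with the coefficients copied from (5.1.2):

```python
import numpy as np

# Coefficients of (5.1.2), highest degree first
coeffs = [-245488320000, -277109078400, -58814614680, 54052612853,
          37792395524, 10086075110, 1279832076, 64175517]

roots = np.roots(coeffs)
real_roots = roots[np.abs(roots.imag) < 1e-6].real
```

Seven roots come back, and among the real ones is the critical value θ ≈ 0.5585 found by the numerical optimizer.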
It is natural to ask for the ML degree of the one-way layout as a function of the number of
groups q and the group sizes n1, . . . , nq. Our main result answers this question. Theorem 39
gives formulas for both the ML and the REML degree of one-way layouts and offers a direct
comparison of the algebraic complexity of the two approaches. Its proof is given in later sections.
As explained in the paragraph before Example 5.1.1, we may reparametrize the model using the
ratio θ = τ/ω and eliminate the two parameters µ and ω from the likelihood equations. This
gives a single rational equation in θ. By carefully clearing terms from the numerator and the
denominator appearing in the rational equation, our proof produces a polynomial in θ whose
roots correspond to the solutions of the rational equation. The degree of this polynomial is the
ML/REML degree; recall Example 5.1.1.
Our theorem is stated using a notion of multiplicities. Suppose v = (v1, . . . , vq) ∈ Zq is
a tuple of integers. If v has M distinct entries, then the multiplicities of v form the integer
multiset {m1, . . . ,mM}, where mj counts how often the jth distinct entry of v appears among
all entries of v.
Theorem 39 Consider a one-way layout with random effects for q groups that are of sizes
n1, . . . , nq. Suppose M of the group sizes are distinct, with associated multiplicities m1, . . . ,mM .
Let M2 = #{j : mj ≥ 2}. Then the ML degree is 3M + M2 − 3, and the REML degree is
2M + 2M2 − 3. The ML degree exceeds the REML degree unless M2 = M , in which case
equality holds.
The condition M2 = M holds if each group size appears at least twice. In the balanced
case, we have M = M2 = 1 and the theorem recovers the well-known fact that both degrees are
one; compare [32; 54; 52]. Each degree is maximal when the group sizes n1, . . . , nq are pairwise
distinct. The degrees are then 3q − 3 for ML and 2q − 3 for REML.
Example 5.1.2 The model for the dyestuff data from Example 5.1.1 has q = 6 groups. The
unbalanced case we considered had group sizes (n1, . . . , n6) = (3, 4, 5, 5, 5, 5). The multiplicities
are {1, 1, 4}. Our formula confirms the ML degree to be 3 · 3 + 1− 3 = 7. As another example,
if (n1, . . . , n6) = (4, 4, 3, 2, 2, 2), then the ML degree is 8 and the REML degree is 7.
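The degree counts of Theorem 39 are straightforward to script; a small helper (the function name is ours):

```python
from collections import Counter

def degrees(group_sizes):
    """Return (ML degree, REML degree) for a one-way layout with the
    given group sizes, following the formulas of Theorem 39."""
    mult = Counter(group_sizes)            # distinct size -> multiplicity
    M = len(mult)
    M2 = sum(1 for m in mult.values() if m >= 2)
    return 3 * M + M2 - 3, 2 * M + 2 * M2 - 3
```

For the unbalanced dyestuff layout, degrees([3, 4, 5, 5, 5, 5]) returns (7, 5), and a balanced layout returns (1, 1).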
The remainder of the chapter is structured as follows. In Section 5.2, we review the deriva-
tion of the likelihood equations for ML and REML estimation. Section 5.3 contains the proof
of the ML degree formula from Theorem 39, and Section 5.4 treats the REML degree. Each
proof consists of a detailed study of a univariate rational equation in the variance ratio θ. We
end this chapter with Section 5.5, which gives two examples of unbalanced one-way random
effects models with bimodal likelihood functions.
5.2 The likelihood equations
Let n1, . . . , nM be the distinct group sizes, with associated multiplicities m1, . . . , mM . Let Yij = (Yij1, . . . , Yijni) be the vector comprising the observations in the jth group of size ni. Then the model for the one-way layout given by (5.1.1) can equivalently be described as stating that Y11, . . . , Y1m1 , Y21, . . . , YMmM are independent multivariate normal random vectors with

Yij ∼ N (µ1ni , Σni(ω, τ)),
where the covariance matrix is

Σni(ω, τ) = ω Ini + τ 1ni 1ni⊤.

Here, 1n = (1, . . . , 1)⊤ ∈ Rn, and In is the n × n identity matrix.
5.2.1 Maximum likelihood
Ignoring additive constants and multiplying by two, the log-likelihood function of the one-way model is

ℓ(µ, ω, τ) = ∑_{i=1}^{M} ∑_{j=1}^{mi} [ log det(Kni(ω, τ)) − (Yij − µ1ni)⊤ Kni(ω, τ) (Yij − µ1ni) ],   (5.2.1)

where

Kni(ω, τ) = (1/ω) Ini − (τ/(ω(ω + niτ))) 1ni 1ni⊤   (5.2.2)

is the inverse of Σni(ω, τ). The inverse has determinant

det(Kni(ω, τ)) = 1/(ω^{ni−1} (ω + niτ)).   (5.2.3)
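Formulas (5.2.2) and (5.2.3) are the Sherman–Morrison and matrix determinant lemmas specialized to Σn = ωI + τ11⊤, and are quick to verify numerically (the parameter values below are arbitrary):

```python
import numpy as np

n, omega, tau = 5, 2.0, 0.7
one = np.ones((n, 1))

Sigma = omega * np.eye(n) + tau * (one @ one.T)
K = np.eye(n) / omega - (tau / (omega * (omega + n * tau))) * (one @ one.T)

# K inverts Sigma, and det(K) = 1 / (omega^{n-1} (omega + n tau))
assert np.allclose(K @ Sigma, np.eye(n))
assert np.isclose(np.linalg.det(K), 1 / (omega ** (n - 1) * (omega + n * tau)))
```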
Let N = m1n1 + · · ·+mMnM be the total number of observations. For each i = 1, . . . ,M ,
define the group averages

Ȳij = (1/ni) ∑_{k=1}^{ni} Yijk,   j = 1, . . . , mi,

and the average across the groups of equal size

Ȳi = (1/mi) ∑_{j=1}^{mi} Ȳij .

From the averages, compute the between-group sum of squares

Bi = ∑_{j=1}^{mi} (Ȳij − Ȳi)².

Note that, for generic data, Bi = 0 if and only if mi = 1. Therefore, it suffices to consider the sums of squares Bi with mi ≥ 2. Finally, define the within-group sum of squares

W = ∑_{i=1}^{M} ∑_{j=1}^{mi} ∑_{k=1}^{ni} (Yijk − Ȳij)²,
which is positive for generic data.
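In code, with observations grouped by distinct group size (the data layout is our choice):

```python
def sums_of_squares(groups_by_size):
    """groups_by_size: dict mapping a group size n_i to the list of its
    m_i groups, each a list of n_i observations.  Returns (W, B), where
    B maps each size n_i to the between-group sum of squares B_i."""
    W, B = 0.0, {}
    for n, groups in groups_by_size.items():
        means = [sum(g) / n for g in groups]   # group averages Ybar_ij
        grand = sum(means) / len(means)        # class average Ybar_i
        B[n] = sum((m - grand) ** 2 for m in means)
        W += sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    return W, B
```

For instance, two groups of size 2 with observations [1, 3] and [5, 7] give W = 4 and B = 8.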
Proposition 40 Upon the substitution κ = 1/ω and θ = τ/ω, the log-likelihood function for the one-way layout can be written as

ℓ(µ, κ, θ) = N log(κ) − κW − ∑_{i=1}^{M} mi log(1 + niθ) − κ ∑_{i=1}^{M} (ni/(1 + niθ)) Bi − κ ∑_{i=1}^{M} (mi ni/(1 + niθ)) (Ȳi − µ)².   (5.2.4)
Proof. Applying (5.2.2), the quadratic form in (5.2.1) can be expanded into

(Yij − µ1ni)⊤ Kni(ω, τ) (Yij − µ1ni)
  = (1/ω) ∑_{k=1}^{ni} (Yijk − µ)² − (τ/(ω(ω + niτ))) [(Yij − µ1ni)⊤ 1ni]²
  = (1/ω) ∑_{k=1}^{ni} (Yijk − Ȳij)² + (ni/ω) (Ȳij − µ)² − (τ/(ω(ω + niτ))) ni² (Ȳij − µ)²
  = κ (ni/(1 + niθ)) (Ȳij − µ)² + κ ∑_{k=1}^{ni} (Yijk − Ȳij)².

Using this expression and (5.2.3), the log-likelihood function is seen to be equal to

ℓ(µ, κ, θ) = N log(κ) − κW − ∑_{i=1}^{M} mi log(1 + niθ) − κ ∑_{i=1}^{M} ∑_{j=1}^{mi} (ni/(1 + niθ)) (Ȳij − µ)².   (5.2.5)
The claimed form of ℓ(µ, κ, θ) is now obtained by expanding the last sum as

∑_{j=1}^{mi} (ni/(1 + niθ)) (Ȳij − µ)²   (5.2.6)
  = (ni/(1 + niθ)) ∑_{j=1}^{mi} [(Ȳij − Ȳi)² + (Ȳi − µ)² + 2(Ȳij − Ȳi)(Ȳi − µ)]   (5.2.7)
  = (mi ni/(1 + niθ)) (Ȳi − µ)² + (ni/(1 + niθ)) Bi.   (5.2.8)

□
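Proposition 40 can be double-checked numerically by evaluating twice the log-likelihood directly from (5.2.1) and comparing with (5.2.4) on simulated data (a sketch; the data layout and names are ours):

```python
import numpy as np

rng = np.random.default_rng(2)
sizes = {2: 2, 3: 1}                      # distinct size n_i -> multiplicity m_i
mu, omega, tau = 1.5, 2.0, 0.7
kappa, theta = 1 / omega, tau / omega
data = {n: rng.standard_normal((m, n)) for n, m in sizes.items()}

# Direct form (5.2.1): sum over groups of log det K minus the quadratic form
direct = 0.0
for n, Y in data.items():
    one = np.ones(n)
    K = np.eye(n) / omega - (tau / (omega * (omega + n * tau))) * np.outer(one, one)
    for y in Y:
        r = y - mu
        direct += np.log(1 / (omega ** (n - 1) * (omega + n * tau))) - r @ K @ r

# Reparametrized form (5.2.4)
N = sum(n * m for n, m in sizes.items())
W = sum(((Y - Y.mean(axis=1, keepdims=True)) ** 2).sum() for Y in data.values())
repar = N * np.log(kappa) - kappa * W
for n, Y in data.items():
    gm = Y.mean(axis=1)                   # group averages Ybar_ij
    cm = gm.mean()                        # class average Ybar_i
    Bi = ((gm - cm) ** 2).sum()
    m = len(gm)
    repar -= m * np.log(1 + n * theta)
    repar -= kappa * n / (1 + n * theta) * Bi
    repar -= kappa * m * n / (1 + n * theta) * (cm - mu) ** 2
```

The two evaluations agree to floating-point precision.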
The partial derivatives of the log-likelihood function from Proposition 40 are

∂ℓ/∂µ = 2κ ∑_{i=1}^{M} (mi ni/(1 + niθ)) (Ȳi − µ),   (5.2.9)

∂ℓ/∂κ = N/κ − [ W + ∑_{i=1}^{M} (mi ni/(1 + niθ)) (Ȳi − µ)² + ∑_{i=1}^{M} (ni/(1 + niθ)) Bi ],   (5.2.10)

∂ℓ/∂θ = − ∑_{i=1}^{M} mi ni/(1 + niθ) + κ [ ∑_{i=1}^{M} (mi ni²/(1 + niθ)²) (Ȳi − µ)² + ∑_{i=1}^{M} (ni²/(1 + niθ)²) Bi ].   (5.2.11)
Since N ≠ 0, the equation system obtained by setting the three partial derivatives to zero has the same solution set as the equation system

∑_{i=1}^{M} (mi ni/(1 + niθ)) (Ȳi − µ) = 0,   (5.2.12)

N − κ [ W + ∑_{i=1}^{M} (mi ni/(1 + niθ)) (Ȳi − µ)² + ∑_{i=1}^{M} (ni/(1 + niθ)) Bi ] = 0,   (5.2.13)

κ [ ∑_{i=1}^{M} (mi ni²/(1 + niθ)²) (Ȳi − µ)² + ∑_{i=1}^{M} (ni²/(1 + niθ)²) Bi ] − ∑_{i=1}^{M} mi ni/(1 + niθ) = 0.   (5.2.14)
Now we can solve equation (5.2.12) for µ, substitute the result into equation (5.2.13) and
solve for κ. Both µ and κ are then expressed in terms of θ. Substituting the expressions into
(5.2.14), we obtain a univariate rational equation in θ. Our proof of the ML degree formula in
Theorem 39 proceeds by cancelling terms from the numerator and denominator of this rational
expression. This is the topic of Section 5.3.
5.2.2 Restricted maximum likelihood
The REML method uses a slightly different likelihood function that is obtained by considering a projection of the observed random array (Y_{ijk}) ∈ R^N. The mean of this array has all entries equal to µ. In other words, it is modelled to lie in the space L ⊂ R^N spanned by the array with all entries equal to one. The likelihood function used in REML is obtained by taking the observation to be the projection of (Y_{ijk}) onto the orthogonal complement of L. The distribution of the projection no longer depends on µ and so the REML function only has (τ, ω) or, equivalently, (κ, θ) as arguments.
Using the formulas given, for instance, in [43], and simplifying the resulting expressions
similar to what was done in the proof of Proposition 40, we obtain the following expression for
the restricted log-likelihood function.
Proposition 41 Upon the substitution κ = 1/ω and θ = τ/ω, the restricted log-likelihood function for the one-way layout can be written as
\[
\bar\ell(\kappa,\theta) = (N-1)\log(\kappa) - \kappa W - \Bigl[\sum_{i=1}^{M} m_i\log(1+n_i\theta)\Bigr]
- \log\Bigl(\sum_{i=1}^{M} \frac{m_i n_i}{1+n_i\theta}\Bigr)
- \kappa\Bigl[\sum_{i=1}^{M}\frac{n_i}{1+n_i\theta}\,B_i\Bigr]
- \kappa\Bigl[\sum_{i=1}^{M}\frac{m_i n_i}{1+n_i\theta}\,(\bar Y_i - \mu(\theta))^2\Bigr], \tag{5.2.15}
\]
with
\[
\mu(\theta) = \frac{\sum_{i=1}^{M}\sum_{j=1}^{m_i} \frac{n_i}{1+n_i\theta}\,\bar Y_{ij}}{\sum_{i=1}^{M}\sum_{j=1}^{m_i} \frac{n_i}{1+n_i\theta}}
= \frac{\sum_{i=1}^{M} \frac{m_i n_i}{1+n_i\theta}\,\bar Y_i}{\sum_{i=1}^{M} \frac{m_i n_i}{1+n_i\theta}}. \tag{5.2.16}
\]
Note that µ(θ) is the solution to the equation in (5.2.12). Computing µ(θ) is the standard
way to obtain an estimate of µ from a REML estimate of θ.
The partial derivatives of the restricted log-likelihood function from Proposition 41 are
\[
\frac{\partial \bar\ell}{\partial \kappa} = \frac{N-1}{\kappa} - \Bigl[W + \sum_{i=1}^{M}\frac{m_i n_i}{1+n_i\theta}(\bar Y_i-\mu(\theta))^2 + \sum_{i=1}^{M}\frac{n_i}{1+n_i\theta}\,B_i\Bigr], \tag{5.2.17}
\]
\[
\frac{\partial \bar\ell}{\partial \theta} = -\Bigl[\sum_{i=1}^{M}\frac{m_i n_i}{1+n_i\theta}\Bigr]
+ \frac{\sum_{i=1}^{M}\frac{m_i n_i^2}{(1+n_i\theta)^2}}{\sum_{i=1}^{M}\frac{m_i n_i}{1+n_i\theta}}
+ \kappa\Bigl[\sum_{i=1}^{M}\frac{m_i n_i^2}{(1+n_i\theta)^2}(\bar Y_i-\mu(\theta))^2 + \sum_{i=1}^{M}\frac{n_i^2}{(1+n_i\theta)^2}\,B_i\Bigr]. \tag{5.2.18}
\]
The equation ∂ℓ̄/∂κ = 0 is easily solved. Substituting the unique solution κ(θ) into the equation ∂ℓ̄/∂θ = 0 yields again a univariate rational equation in θ. The proof of the REML degree formula in Theorem 39 requires studying cancellations from the numerator and denominator of this equation, which is the topic of Section 5.4.
5.3 Proof of formula for ML degree
Our proof of the ML degree formula in Theorem 39 proceeds in two steps. First, in Lemma 42
we derive a univariate rational equation whose number of zeros is the ML degree of the model.
Second, we simplify it in Lemmas 43 and 44 by clearing common factors from the numerator
and the denominator.
Fix the following notation, used throughout. For a vector a = (a_1, ..., a_M) ∈ R^M, define the rational functions
\[
r_a(\theta) = \sum_{i=1}^{M} \frac{m_i n_i}{1+n_i\theta}\,a_i
\qquad\text{and}\qquad
s_a(\theta) = \sum_{i=1}^{M} \frac{m_i n_i^2}{(1+n_i\theta)^2}\,a_i.
\]
We write r_1, r_{B/m}, r_Y, r_{Y^2} for the functions r_a that have
\[
a = 1_M,\quad
a = \Bigl(\frac{B_1}{m_1},\ldots,\frac{B_M}{m_M}\Bigr),\quad
a = (\bar Y_1,\ldots,\bar Y_M),\quad
a = (\bar Y_1^2,\ldots,\bar Y_M^2),
\]
respectively. It is clear from Section 5.2 that forming a common denominator for the rational equations to be studied involves the product
\[
d(\theta) = \prod_{i=1}^{M}(1+n_i\theta) = d_1(\theta)\,d_2(\theta),
\]
where
\[
d_1(\theta) = \prod_{\{i:\,m_i=1\}}(1+n_i\theta), \qquad
d_2(\theta) = \prod_{\{i:\,m_i\ge 2\}}(1+n_i\theta).
\]
For a vector a ∈ R^M, define the degree M − 1 polynomial
\[
f_a(\theta) = d(\theta)\,r_a(\theta) = \sum_{i=1}^{M} m_i n_i a_i \prod_{j\neq i}(1+n_j\theta)
\]
and the degree 2(M − 1) polynomial
\[
g_a(\theta) = d(\theta)^2\,s_a(\theta) = \sum_{i=1}^{M} m_i n_i^2 a_i \prod_{j\neq i}(1+n_j\theta)^2.
\]
Lemma 42 The ML degree of the one-way layout is the degree of the numerator created when cancelling all common factors from numerator and denominator of the following rational function in θ:
\[
\frac{1}{N\,d(\theta)^2 f_1(\theta)^2}
\times\Bigl( N\bigl[f_1(\theta)^2 g_{Y^2}(\theta) - 2 f_Y(\theta) f_1(\theta) g_Y(\theta) + f_Y(\theta)^2 g_1(\theta) + f_1(\theta)^2 g_{B/m}(\theta)\bigr]
- f_1(\theta)^2\bigl[W f_1(\theta) d(\theta) + f_{Y^2}(\theta) f_1(\theta) - f_Y(\theta)^2 + f_1(\theta) f_{B/m}(\theta)\bigr]\Bigr). \tag{5.3.1}
\]
Proof. Adopting the notation above, the solution of the first of the likelihood equations in (5.2.12) can be written as
\[
\mu(\theta) = \frac{r_Y(\theta)}{r_1(\theta)}. \tag{5.3.2}
\]
Next, rewrite the following term from the system of the three critical equations:
\[
\sum_{i=1}^{M}\frac{m_i n_i}{1+n_i\theta}\,(\bar Y_i - \mu(\theta))^2
= r_{Y^2}(\theta) - 2\,\frac{r_Y(\theta)}{r_1(\theta)}\sum_{i=1}^{M}\frac{m_i n_i}{1+n_i\theta}\,\bar Y_i
+ \frac{r_Y(\theta)^2}{r_1(\theta)^2}\sum_{i=1}^{M}\frac{m_i n_i}{1+n_i\theta} \tag{5.3.3}
\]
\[
= r_{Y^2}(\theta) - \frac{r_Y(\theta)^2}{r_1(\theta)}.
\]
Solving the second equation in (5.2.13) with µ = µ(θ) for κ thus gives
\[
\kappa(\theta) = \frac{N}{\,W + r_{Y^2}(\theta) + r_{B/m}(\theta) - \dfrac{r_Y(\theta)^2}{r_1(\theta)}\,} \tag{5.3.4}
\]
\[
= \frac{N\,r_1(\theta)}{W r_1(\theta) + r_{Y^2}(\theta)\, r_1(\theta) + r_1(\theta)\, r_{B/m}(\theta) - r_Y(\theta)^2}. \tag{5.3.5}
\]
Substituting µ(θ) and κ(θ) into the third and last equation in (5.2.14), we obtain the univariate rational equation
\[
s_{Y^2}(\theta) - 2\,\frac{r_Y(\theta)}{r_1(\theta)}\,s_Y(\theta) + \frac{r_Y(\theta)^2}{r_1(\theta)^2}\,s_1(\theta) + s_{B/m}(\theta) - \frac{r_1(\theta)}{\kappa(\theta)} = 0, \tag{5.3.6}
\]
where we have divided by the non-zero rational expression κ(θ). According to (5.3.4), this is
\[
s_{Y^2}(\theta) - 2\,\frac{r_Y(\theta)}{r_1(\theta)}\,s_Y(\theta) + \frac{r_Y(\theta)^2}{r_1(\theta)^2}\,s_1(\theta) + s_{B/m}(\theta)
- \frac{W r_1(\theta) + r_{Y^2}(\theta)\, r_1(\theta) + r_1(\theta)\, r_{B/m}(\theta) - r_Y(\theta)^2}{N} = 0. \tag{5.3.7}
\]
Reexpress (5.3.6) in terms of the f and g polynomials as
\[
\frac{g_{Y^2}(\theta)}{d(\theta)^2}
- 2\,\frac{f_Y(\theta)}{f_1(\theta)}\,\frac{g_Y(\theta)}{d(\theta)^2}
+ \frac{f_Y(\theta)^2}{f_1(\theta)^2}\,\frac{g_1(\theta)}{d(\theta)^2}
+ \frac{g_{B/m}(\theta)}{d(\theta)^2}
- \frac{W f_1(\theta) d(\theta) + f_{Y^2}(\theta) f_1(\theta) + f_1(\theta) f_{B/m}(\theta) - f_Y(\theta)^2}{N\,d(\theta)^2} = 0. \tag{5.3.8}
\]
Forming a common denominator we obtain the rational function from (5.3.1). The number of complex solutions to the likelihood equations and the number of complex roots of (5.3.1) agree. Thus, the ML degree of the one-way layout is the number of complex solutions of (5.3.1), or, equivalently, the degree of the numerator in (5.3.1) after canceling common factors from the numerator and denominator. □
The numerator given in (5.3.1) in Lemma 42 has degree 3(M−1)+M = 4M−3; the highest
degree term involves the within-group sum of squares W . The denominator in (5.3.1) has degree
2M + 2(M − 1) = 4M − 2. The next two lemmas imply that, after cancelling common factors,
the numerator of the univariate rational function from Lemma 42 has the degree claimed in the
ML degree formula from Theorem 39.
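These degree counts, and the divisibility claim of Lemma 43 below, can be confirmed in exact arithmetic for a small instance. The sketch below (the data are arbitrary generic rationals chosen for illustration, not values from the text) builds the numerator of (5.3.1) for M = 3 with m = (1, 2, 3), represents polynomials in θ as coefficient lists over Q, and checks that the numerator has degree 4M − 3 = 9, that it vanishes at θ = −1/n_1 (so (1 + n_1θ) divides it, since m_1 = 1 and hence B_1 = 0), and that it does not vanish at θ = −1/n_2 (where m_2 ≥ 2).

```python
# Exact check of the degree count for the numerator of (5.3.1) and of
# the divisibility asserted in Lemma 43, on a small generic instance.
# Divisibility by (1 + n*theta) is tested by evaluating at theta = -1/n.
# All data below are arbitrary generic rationals, not from the text.
from fractions import Fraction as F

def pmul(p, q):
    r = [F(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r

def padd(*ps):
    r = [F(0)] * max(len(p) for p in ps)
    for p in ps:
        for i, a in enumerate(p):
            r[i] += a
    return r

def pscale(p, c):
    return [c * a for a in p]

def pdeg(p):
    i = len(p) - 1
    while i > 0 and p[i] == 0:
        i -= 1
    return i

def peval(p, x):
    return sum(a * x ** i for i, a in enumerate(p))

m = [1, 2, 3]                         # multiplicities; m_1 = 1 forces B_1 = 0
n = [2, 3, 5]                         # distinct group sizes
Ybar = [F(3, 2), F(-1, 3), F(2, 7)]   # generic class means
B = [F(0), F(5, 7), F(4, 9)]          # between-group sums of squares
W = F(11, 4)
M = 3
N = sum(mi * ni for mi, ni in zip(m, n))

def prod_except(i, power):            # prod_{j != i} (1 + n_j theta)^power
    p = [F(1)]
    for j in range(M):
        if j != i:
            for _ in range(power):
                p = pmul(p, [F(1), F(n[j])])
    return p

def f(a):                             # f_a(theta), degree M - 1
    return padd(*[pscale(prod_except(i, 1), F(m[i] * n[i]) * a[i])
                  for i in range(M)])

def g(a):                             # g_a(theta), degree 2(M - 1)
    return padd(*[pscale(prod_except(i, 2), F(m[i] * n[i] ** 2) * a[i])
                  for i in range(M)])

d = pmul(pmul([F(1), F(2)], [F(1), F(3)]), [F(1), F(5)])   # d(theta)
one = [F(1)] * M
f1, fY = f(one), f(Ybar)
fY2, fBm = f([y * y for y in Ybar]), f([B[i] / m[i] for i in range(M)])
g1, gY = g(one), g(Ybar)
gY2, gBm = g([y * y for y in Ybar]), g([B[i] / m[i] for i in range(M)])

# numerator of (5.3.1):
bracket = padd(pmul(pmul(f1, f1), gY2), pscale(pmul(pmul(fY, f1), gY), F(-2)),
               pmul(pmul(fY, fY), g1), pmul(pmul(f1, f1), gBm))
inner = padd(pscale(pmul(f1, d), W), pmul(fY2, f1),
             pscale(pmul(fY, fY), F(-1)), pmul(f1, fBm))
num = padd(pscale(bracket, F(N)), pscale(pmul(pmul(f1, f1), inner), F(-1)))

print(pdeg(num))                      # degree 4M - 3 = 9 for generic data
print(peval(num, F(-1, 2)))           # 0, so (1 + n_1 theta) divides (m_1 = 1)
print(peval(num, F(-1, 3)) != 0)      # (1 + n_2 theta) does not (m_2 = 2)
```

The evaluation test works because the remainder of a polynomial p(θ) on division by (1 + nθ) is determined by p(−1/n); with exact rational arithmetic, a zero value certifies divisibility.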
Lemma 43 If mt = 1, then (1 + ntθ) divides the numerator of the rational equation (5.3.1).
Hence, the polynomial d1(θ) of degree M −M2 divides this numerator.
Lemma 44 If d1(θ) is cleared from both the numerator and the denominator of the rational
function given in (5.3.1), then the new numerator and denominator are relatively prime for
generic sufficient statistics Y1, . . . , YM , W , and Bj with mj ≥ 2.
Proof. [Proof of Lemma 43] Let m_t = 1. To show that (1 + n_tθ) divides the numerator, it is sufficient to show that (1 + n_tθ) divides the sum of
\[
N\bigl[f_1(\theta)^2 g_{Y^2}(\theta) - 2 f_Y(\theta) f_1(\theta) g_Y(\theta) + f_Y(\theta)^2 g_1(\theta) + f_1(\theta)^2 g_{B/m}(\theta)\bigr] \tag{5.3.9}
\]
and
\[
- f_1(\theta)^2\bigl[f_{Y^2}(\theta) f_1(\theta) - f_Y(\theta)^2 + f_1(\theta) f_{B/m}(\theta)\bigr]. \tag{5.3.10}
\]
The product f_1(θ)^2 g_{Y^2}(θ) in the first term of (5.3.9) may be rewritten as
\[
\Bigl[\sum_{i=1}^{M} m_i n_i \prod_{j\neq i}(1+n_j\theta)\Bigr]
\Bigl[\sum_{k=1}^{M} m_k n_k \prod_{l\neq k}(1+n_l\theta)\Bigr]
\Bigl[\sum_{r=1}^{M} m_r n_r^2 \bar Y_r^2 \prod_{s\neq r}(1+n_s\theta)^2\Bigr]
\]
\[
= \sum_{i=1}^{M}\sum_{k=1}^{M}\sum_{r=1}^{M} m_i m_k m_r n_i n_k n_r^2\, \bar Y_r^2 \prod_{j\neq i}(1+n_j\theta)\prod_{l\neq k}(1+n_l\theta)\prod_{s\neq r}(1+n_s\theta)^2.
\]
Combining this expression with the analogous expansions of the other three terms shows that the polynomial in (5.3.9) is equal to N times
\[
\sum_{i=1}^{M}\sum_{k=1}^{M}\sum_{r=1}^{M}\Bigl[(m_r \bar Y_r^2 - 2 m_r \bar Y_i \bar Y_r + m_r \bar Y_i \bar Y_k + B_r)\, m_i m_k n_i n_k n_r^2
\times \prod_{j\neq i}(1+n_j\theta)\prod_{l\neq k}(1+n_l\theta)\prod_{s\neq r}(1+n_s\theta)^2\Bigr]. \tag{5.3.11}
\]
The polynomial in (5.3.10) can be expanded similarly. We find
\[
f_{Y^2}(\theta) f_1(\theta) - f_Y(\theta)^2 + f_1(\theta) f_{B/m}(\theta) \tag{5.3.12}
\]
\[
= \sum_{i=1}^{M}\sum_{k=1}^{M} (m_k \bar Y_i^2 - m_k \bar Y_i \bar Y_k + B_k)\, m_i n_i n_k \prod_{j\neq i}(1+n_j\theta)\prod_{l\neq k}(1+n_l\theta).
\]
Expanding f_1(θ)^2 as well, we obtain that the polynomial in (5.3.10) is equal to
\[
-\sum_{i=1}^{M}\sum_{k=1}^{M}\sum_{r=1}^{M}\sum_{u=1}^{M}\Bigl[(m_k \bar Y_i^2 - m_k \bar Y_i \bar Y_k + B_k)\, m_i m_r m_u n_i n_k n_r n_u
\prod_{j\neq i}(1+n_j\theta)\prod_{l\neq k}(1+n_l\theta)\prod_{s\neq r}(1+n_s\theta)\prod_{v\neq u}(1+n_v\theta)\Bigr]. \tag{5.3.13}
\]
Now notice that (1 + n_tθ) divides every summand in (5.3.11) and (5.3.13) unless i = k = r = t in the first summation, or i = k = r = u = t in the second summation. So it suffices to only consider these 'diagonal' terms. However, under the equality of indices, the quadratic expressions in the averages Ȳ_i cancel. Hence, the terms missing a factor of (1 + n_tθ) in (5.3.9) and (5.3.10) sum to
\[
B_t\, m_t^2 n_t^4\,(N - m_t) \prod_{j\neq t}(1+n_j\theta)^4. \tag{5.3.14}
\]
Throughout the arguments, we assume that we have at least two groups with at least one group size n_i ≥ 2. Moreover, for generic data, B_t = 0 if and only if m_t = 1. Hence, for generic data, the expression in (5.3.14) is zero if and only if m_t = 1. We conclude that d_1(θ) divides the numerator of the rational function in (5.3.1). □
Note that the last part of the above proof shows not only that d_1(θ) divides the numerator of (5.3.1), but that (1 + n_tθ) does not divide the numerator when B_t ≠ 0, which holds generically if m_t ≥ 2.
Proof. [Proof of Lemma 44] Clearing d_1(θ) from the denominator in (5.3.1) yields the polynomial N d_2(θ) d(θ) f_1(θ)^2. From the preceding comment, we know that d_2(θ) and the numerator are relatively prime for generic data Ȳ_1, ..., Ȳ_M, W > 0, and B_j > 0 with m_j ≥ 2. To establish our claim, we will first show that f_1(θ) does not share a common factor with the numerator by showing that f_1(θ) and f_Y(θ)^2 g_1(θ) are relatively prime; all terms other than f_Y(θ)^2 g_1(θ) in the numerator of (5.3.1) are multiples of f_1(θ). Then, we will show that after clearing d_1(θ) in (5.3.1), d_1(θ) and the new numerator are relatively prime.

Let θ_1, ..., θ_{M−1} be the (possibly complex) roots of the degree M − 1 polynomial f_1(θ). For each 1 ≤ k ≤ M − 1, consider the linear form f_Y(θ_k) in the polynomial ring C[Y_1, ..., Y_M]. Let V(f_Y(θ_k)) ⊂ C^M be the zero locus of f_Y(θ_k). Each set V(f_Y(θ_k)) is a hyperplane of dimension M − 1. Thus, the union ∪_{k=1}^{M−1} V(f_Y(θ_k)) is an (M − 1)-dimensional algebraic subset of C^M. A generic vector of group means (Ȳ_1, ..., Ȳ_M) lies outside this lower-dimensional set, which means that f_1(θ) and f_Y(θ) are relatively prime for generic data.

To show that f_1(θ) and g_1(θ) are relatively prime, assume θ_0 = a + ib is a common root of f_1(θ) and g_1(θ). Since g_1(θ) is a sum of squares that is positive on R, we must have θ_0 ∉ R and hence b ≠ 0. Without loss of generality, let n_1 be the least of the group sizes n_i. Rewriting f_1(θ_0) = 0, we get
\[
n_1 = -\frac{\sum_{i=2}^{M} m_i n_i \prod_{j\neq i}(1+n_j\theta_0)}{m_1 \prod_{j\neq 1}(1+n_j\theta_0)}
= -\sum_{i=2}^{M} \frac{m_i n_i\,(1+n_1\theta_0)}{m_1\,(1+n_i\theta_0)}. \tag{5.3.15}
\]
The imaginary part of the right side of this equation must equal 0 since n_1 is an integer. Substituting a + ib for θ_0, the imaginary part of (5.3.15) is
\[
b \sum_{i=2}^{M} \Bigl(\frac{m_i n_i}{m_1}\Bigr) \frac{n_i - n_1}{(1+n_i a)^2 + (n_i b)^2}.
\]
Since each term in the sum is positive, we obtain that b = 0. Consequently, θ_0 ∈ R, which is a contradiction. Therefore, f_1(θ) and g_1(θ) are relatively prime.
It remains to show that the numerator and denominator obtained by clearing the factor d_1(θ) in (5.3.1) are relatively prime for generic data. We claim that if m_t = 1 then (1 + n_tθ) divides
\[
\frac{f_1(\theta)^2 g_{Y^2}(\theta) - 2 f_Y(\theta) f_1(\theta) g_Y(\theta) + f_Y(\theta)^2 g_1(\theta) + f_1(\theta)^2 g_{B/m}(\theta)}{d_1(\theta)}, \tag{5.3.16}
\]
while d_1(θ) and
\[
W f_1(\theta)\, d_2(\theta) + \frac{f_{Y^2}(\theta) f_1(\theta) - f_Y(\theta)^2 + f_1(\theta) f_{B/m}(\theta)}{d_1(\theta)} =: W f_1(\theta)\, d_2(\theta) + F(\theta) \tag{5.3.17}
\]
are relatively prime for generic data.

The ratio in (5.3.16) equals (5.3.11) divided by d_1(θ). We may rewrite (5.3.11) as
\[
\sum_{i=1}^{M}\sum_{k=1}^{M}\sum_{r=1}^{M}\Bigl[(m_r \bar Y_r^2 - m_r \bar Y_i \bar Y_r - m_r \bar Y_k \bar Y_r + m_r \bar Y_i \bar Y_k + B_r)\, m_i m_k n_i n_k n_r^2
\times \prod_{j\neq i}(1+n_j\theta)\prod_{l\neq k}(1+n_l\theta)\prod_{s\neq r}(1+n_s\theta)^2\Bigr]. \tag{5.3.18}
\]
It is clear that the square (1 + n_tθ)^2 divides all terms in the sum (5.3.18) except those for r = i = t or r = k = t. However, the quadratic form in the averages Ȳ_i vanishes if r = i or r = k. Since the terms in question have r = t, and B_r = B_t = 0 because m_t = 1, we conclude that (1 + n_tθ)^2 divides the entire sum (5.3.18), which proves that d_1(θ) divides the ratio in (5.3.16).

We are left to show that d_1(θ) and W f_1(θ) d_2(θ) + F(θ) are relatively prime for generic data. Let θ_1, ..., θ_{M−M2} be the roots of d_1(θ); each root is equal to −1/n_i for some index i. Since the n_i are distinct, no root of d_1(θ) is a root of d_2(θ). Moreover, it is easy to see that no root of d_1(θ) is a root of f_1(θ). Now let I be the ideal generated by the M − M2 polynomials W f_1(θ_k) d_2(θ_k) + F(θ_k) in the polynomial ring C[W, Y_1, ..., Y_M, B′_1, ..., B′_{M2}], where the B′_i stand for the between-group sums of squares B_i with multiplicity m_i ≥ 2. Pick sufficient statistics W = Ȳ_1 = ... = Ȳ_M ≠ 0 and B′_1 = ... = B′_{M2} = 0. Since no root of d_1(θ) is a root of d_2(θ) or f_1(θ), (5.3.12) implies that for these special data W f_1(θ_k) d_2(θ_k) + F(θ_k) ≠ 0 for each k. The zero locus V(I) is thus a proper algebraic subset of C^{M+M2+1}. Such a set is of lower dimension and, thus, d_1(θ) and W f_1(θ) d_2(θ) + F(θ) are relatively prime for generic data. □
5.4 Proof of formula for REML degree
For the proof of the REML degree formula in Theorem 39, we proceed in the same way
as for the ML degree. We begin by deriving the univariate rational function whose number of
roots is the REML degree.
Lemma 45 Consider the rational function whose numerator is
\[
(g_1(\theta) - f_1(\theta)^2)\bigl[W f_1(\theta) d(\theta) + f_{Y^2}(\theta) f_1(\theta) - f_Y(\theta)^2 + f_1(\theta) f_{B/m}(\theta)\bigr] \tag{5.4.1}
\]
\[
+ (N-1)\bigl[f_1(\theta)^2 g_{Y^2}(\theta) - 2 f_Y(\theta) f_1(\theta) g_Y(\theta) + f_Y(\theta)^2 g_1(\theta) + f_1(\theta)^2 g_{B/m}(\theta)\bigr]
\]
and denominator is
\[
d(\theta)\, f_1(\theta)\bigl[W f_1(\theta) d(\theta) + f_{Y^2}(\theta) f_1(\theta) - f_Y(\theta)^2 + f_1(\theta) f_{B/m}(\theta)\bigr]. \tag{5.4.2}
\]
The REML degree is the degree of the numerator of this rational function after clearing common
factors from the given numerator and denominator.
Proof. The equation ∂ℓ̄/∂κ = 0 has the unique solution
\[
\kappa(\theta) = \frac{N-1}{\,W + \sum_{i=1}^{M}\frac{m_i n_i}{1+n_i\theta}(\bar Y_i - \mu(\theta))^2 + \sum_{i=1}^{M}\frac{n_i}{1+n_i\theta}B_i\,};
\]
compare (5.2.17). Substituting κ(θ) into the partial derivative ∂ℓ̄/∂θ yields the univariate equation
\[
-\sum_{i=1}^{M}\frac{m_i n_i}{1+n_i\theta}
+ \frac{\sum_{i=1}^{M}\frac{m_i n_i^2}{(1+n_i\theta)^2}}{\sum_{i=1}^{M}\frac{m_i n_i}{1+n_i\theta}} \tag{5.4.3}
\]
\[
+ \kappa(\theta)\Bigl[\sum_{i=1}^{M}\frac{m_i n_i^2}{(1+n_i\theta)^2}(\bar Y_i - \mu(\theta))^2 + \sum_{i=1}^{M}\frac{n_i^2}{(1+n_i\theta)^2}B_i\Bigr] = 0;
\]
recall (5.2.18). We can now simplify and rewrite (5.4.3), forming a common denominator, to obtain the desired rational function. □
The degree of the numerator in Lemma 45 is 4M − 3 and the degree of the denominator is
4M − 2. The numerator shares common factors with the denominator. In fact, in the proof of
Lemma 43, we have shown that d1(θ) divides fY 2(θ)f1(θ)− fY (θ)2 + f1(θ)fB/m. Thus, d1(θ)2,
whose degree is 2M − 2M2, divides the denominator from Lemma 45. To prove Theorem 39, it
remains to prove the following two facts.
Lemma 46 The polynomial d1(θ)2 divides the numerator (5.4.1).
Lemma 47 After clearing d1(θ)2 from (5.4.1) and (5.4.2), the new numerator and new de-
nominator are relatively prime for generic data.
Proof. [Proof of Lemma 46] From the proof of Lemma 43, we know that d_1(θ) divides the polynomial f_{Y^2}(θ)f_1(θ) − f_Y(θ)^2 + f_1(θ)f_{B/m}(θ). Moreover, as shown in the proof of Lemma 44, the square d_1(θ)^2 divides
\[
f_1(\theta)^2 g_{Y^2}(\theta) - 2 f_1(\theta) f_Y(\theta) g_Y(\theta) + f_Y(\theta)^2 g_1(\theta) + f_1(\theta)^2 g_{B/m}(\theta).
\]
To complete the proof of the present lemma, it suffices to show that d_1(θ) divides g_1(θ) − f_1(θ)^2. However, with some distributing and grouping, we see
\[
g_1(\theta) - f_1(\theta)^2
= \sum_{i=1}^{M} m_i n_i^2 \prod_{j\neq i}(1+n_j\theta)^2
- \sum_{i=1}^{M}\sum_{k=1}^{M} m_i m_k n_i n_k \prod_{j\neq i}(1+n_j\theta)\prod_{l\neq k}(1+n_l\theta)
\]
\[
= \sum_{i=1}^{M} (m_i - m_i^2)\, n_i^2 \prod_{j\neq i}(1+n_j\theta)^2
- \sum_{i=1}^{M}\sum_{k>i} 2\, m_i m_k n_i n_k \prod_{j\neq i}(1+n_j\theta)\prod_{l\neq k}(1+n_l\theta),
\]
which is divisible by (1 + n_tθ) if and only if m_t = 1. □
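The claims of Lemma 46 can also be confirmed in exact arithmetic on a small instance. The sketch below (data are arbitrary generic rationals, not from the text) builds g_1 − f_1^2 and the REML numerator (5.4.1) for M = 3 with m = (1, 2, 3), and checks that g_1 − f_1^2 is divisible by (1 + n_1θ) but not by (1 + n_2θ), that (5.4.1) has degree 4M − 3 = 9, and that (1 + n_1θ)^2 divides (5.4.1). Divisibility by (1 + nθ)^2 is tested by evaluating the polynomial and its derivative at θ = −1/n.

```python
# Exact check of Lemma 46 on a small instance: polynomials in theta as
# coefficient lists over Q.  Data are arbitrary generic rationals.
from fractions import Fraction as F

def pmul(p, q):
    r = [F(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r

def padd(*ps):
    r = [F(0)] * max(len(p) for p in ps)
    for p in ps:
        for i, a in enumerate(p):
            r[i] += a
    return r

def pscale(p, c):
    return [c * a for a in p]

def pdeg(p):
    i = len(p) - 1
    while i > 0 and p[i] == 0:
        i -= 1
    return i

def peval(p, x):
    return sum(a * x ** i for i, a in enumerate(p))

def pdiff(p):
    return [F(i) * p[i] for i in range(1, len(p))] or [F(0)]

m = [1, 2, 3]                         # m_1 = 1 forces B_1 = 0
n = [2, 3, 5]
Ybar = [F(3, 2), F(-1, 3), F(2, 7)]
B = [F(0), F(5, 7), F(4, 9)]
W = F(11, 4)
M = 3
N = sum(mi * ni for mi, ni in zip(m, n))

def prod_except(i, power):
    p = [F(1)]
    for j in range(M):
        if j != i:
            for _ in range(power):
                p = pmul(p, [F(1), F(n[j])])
    return p

def f(a):
    return padd(*[pscale(prod_except(i, 1), F(m[i] * n[i]) * a[i])
                  for i in range(M)])

def g(a):
    return padd(*[pscale(prod_except(i, 2), F(m[i] * n[i] ** 2) * a[i])
                  for i in range(M)])

d = pmul(pmul([F(1), F(2)], [F(1), F(3)]), [F(1), F(5)])
one = [F(1)] * M
f1, fY = f(one), f(Ybar)
fY2, fBm = f([y * y for y in Ybar]), f([B[i] / m[i] for i in range(M)])
g1, gY = g(one), g(Ybar)
gY2, gBm = g([y * y for y in Ybar]), g([B[i] / m[i] for i in range(M)])

g1_minus_f1sq = padd(g1, pscale(pmul(f1, f1), F(-1)))
inner = padd(pscale(pmul(f1, d), W), pmul(fY2, f1),
             pscale(pmul(fY, fY), F(-1)), pmul(f1, fBm))
bracket = padd(pmul(pmul(f1, f1), gY2), pscale(pmul(pmul(fY, f1), gY), F(-2)),
               pmul(pmul(fY, fY), g1), pmul(pmul(f1, f1), gBm))
reml_num = padd(pmul(g1_minus_f1sq, inner), pscale(bracket, F(N - 1)))

print(peval(g1_minus_f1sq, F(-1, 2)) == 0)    # (1 + n_1 theta) divides, m_1 = 1
print(peval(g1_minus_f1sq, F(-1, 3)) != 0)    # but not (1 + n_2 theta), m_2 = 2
print(pdeg(reml_num))                         # degree 4M - 3 = 9
print(peval(reml_num, F(-1, 2)) == 0 and
      peval(pdiff(reml_num), F(-1, 2)) == 0)  # (1 + n_1 theta)^2 divides
```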
Proof. [Proof of Lemma 47] We first show that if m_t ≥ 2, then, for generic data, (1 + n_tθ) and the numerator from (5.4.1) are relatively prime. Consider
\[
(g_1(\theta) - f_1(\theta)^2)\bigl[f_{Y^2}(\theta) f_1(\theta) - f_Y(\theta)^2 + f_1(\theta) f_{B/m}(\theta)\bigr]
+ (N-1)\bigl[f_1(\theta)^2 g_{Y^2}(\theta) - 2 f_Y(\theta) f_1(\theta) g_Y(\theta) + f_Y(\theta)^2 g_1(\theta) + g_{B/m}(\theta) f_1(\theta)^2\bigr]. \tag{5.4.4}
\]
Using the results from the proof of Lemma 43 and writing out the involved summations, (5.4.4) is seen to be equal to
\[
\sum_{i=1}^{M}\sum_{k=1}^{M}\sum_{r=1}^{M} (m_k \bar Y_i^2 - m_k \bar Y_i \bar Y_k + B_k)\, m_i m_r n_i n_k n_r^2 \prod_{j\neq i}(1+n_j\theta)\prod_{l\neq k}(1+n_l\theta)\prod_{s\neq r}(1+n_s\theta)^2
\]
\[
- \sum_{i=1}^{M}\sum_{k=1}^{M}\sum_{r=1}^{M}\sum_{u=1}^{M} (m_k \bar Y_i^2 - m_k \bar Y_i \bar Y_k + B_k)\, m_i m_r m_u n_i n_k n_r n_u \prod_{j\neq i}(1+n_j\theta)\prod_{l\neq k}(1+n_l\theta)\prod_{s\neq r}(1+n_s\theta)\prod_{v\neq u}(1+n_v\theta)
\]
\[
+ (N-1)\sum_{i=1}^{M}\sum_{k=1}^{M}\sum_{r=1}^{M} (m_r \bar Y_r^2 - 2 m_r \bar Y_i \bar Y_r + m_r \bar Y_i \bar Y_k + B_r)\, m_i m_k n_i n_k n_r^2 \prod_{j\neq i}(1+n_j\theta)\prod_{l\neq k}(1+n_l\theta)\prod_{s\neq r}(1+n_s\theta)^2. \tag{5.4.5}
\]
The factor (1 + n_tθ) divides every summand in the above summations unless all summation indices equal t, so it suffices to only consider these terms. Letting t = i = k = r = u, the terms missing a factor of (1 + n_tθ) sum to a term we already encountered, namely, that in (5.3.14). The discussion following display (5.3.14) shows that if the data is generic and m_t ≥ 2, then (1 + n_tθ) does not divide the numerator given in (5.4.1).
Continuing to work through the factors of the denominator from (5.4.2), assume that θ_0 is a root of f_1(θ). Then everything vanishes in the numerator except for the two terms −g_1(θ_0)f_Y(θ_0)^2 and (N − 1)f_Y(θ_0)^2 g_1(θ_0), which add to (N − 2)f_Y(θ_0)^2 g_1(θ_0). From the proof of Lemma 44, we know f_Y(θ_0)^2 g_1(θ_0) ≠ 0 for generic data, so since we are working under the assumption of at least two groups and at least one group size n_i ≥ 2, the numerator and f_1(θ) are relatively prime for generic data.

Finally, we need to show that
\[
H(\theta) := f_1(\theta)^2 g_{Y^2}(\theta) - 2 f_1(\theta) f_Y(\theta) g_Y(\theta) + f_Y(\theta)^2 g_1(\theta) + f_1(\theta)^2 g_{B/m}(\theta)
\]
and
\[
G(\theta) := W f_1(\theta)\, d_2(\theta) + F(\theta)
\]
are relatively prime for generic data W, Ȳ_1, ..., Ȳ_M, and B_i with m_i ≥ 2; the polynomial F(θ) was defined in (5.3.17). We will again denote the between-group sums of squares with multiplicities m_i ≥ 2 by B′_1, ..., B′_{M2}. By a standard algebraic result, the polynomials G(θ) and H(θ) share a common root if and only if a certain polynomial in their coefficients vanishes; this polynomial is called the resultant and we denote it by Res(G, H). Since both H(θ) and G(θ) have coefficients that are polynomials in the sufficient statistics W, Ȳ_1, ..., Ȳ_M, and B′_1, ..., B′_{M2}, we may regard Res(G, H) as a polynomial in the ring C[W, Y_1, ..., Y_M, B′_1, ..., B′_{M2}]. By Lemma 44, for any given generic choice of Ȳ_1, ..., Ȳ_M, B′_1, ..., B′_{M2}, a root θ_0 of H is not a root of f_1(θ) or d_2(θ). Hence, θ_0 is a root of G if and only if
\[
W = -\frac{F(\theta_0)}{d_2(\theta_0)\, f_1(\theta_0)}. \tag{5.4.6}
\]
Picking W not to satisfy (5.4.6) shows that Res(G, H) is not the zero polynomial in C[W, Y_1, ..., Y_M, B′_1, ..., B′_{M2}]. Hence, the zero locus of Res(G, H) is a set of lower dimension, and we conclude that H and G are relatively prime for generic data. □
5.5 Linear mixed models with multimodal likelihood functions
To our knowledge, the literature does not supply many examples of linear mixed models
with multimodal likelihood functions. We conclude by giving two simulated examples that
demonstrate the mathematical possibility of more than one mode. Such examples were rare
in our simulations, which is in agreement with findings of [61] who also treat the unbalanced
one-way layout. While uniqueness of local optima is not explicitly discussed in [61], the authors
remark in their conclusion that “varying the iteration starting point slightly affects the rate of
convergence, but not the [mean square errors] or biases of the [ML and REML] estimators.” The
examples we give involve three positive roots to the ML or REML equations for the variance
ratio θ. We do not know of examples with more positive roots.
Example 5.5.1 Consider the one-way layout with a single grand mean µ from (5.1.1).
Take q = 5 groups of sizes
n1 = 2, n2 = 5, n3 = 10, n4 = 20, n5 = 50.
Let the sufficient statistics be the five group averages
\[
\bar Y_1 = -\tfrac{73571}{14273} \approx -5.1546, \qquad
\bar Y_2 = \tfrac{13781}{78326} \approx 0.1759,
\]
\[
\bar Y_3 = -\tfrac{13277}{92152} \approx -0.1441, \qquad
\bar Y_4 = \tfrac{31207}{202567} \approx 0.1541,
\]
\[
\bar Y_5 = -\tfrac{15713}{24121} \approx -0.6514,
\]
and the within-group sum of squares
\[
W = \tfrac{116487}{421} \approx 276.69.
\]
The univariate ML equation in θ has three nonnegative solutions, namely,
\[
\theta_{\mathrm{ML},1} \approx 0.00838738, \qquad
\theta_{\mathrm{ML},2} \approx 0.118458, \qquad
\theta_{\mathrm{ML},3} \approx 0.338944;
\]
having specified six digits we should add that the solutions were computed treating the above rational fractions as the input. The solution θ_ML,1 yields the global maximum of the likelihood function, whereas θ_ML,2 and θ_ML,3 determine a saddle point and local maximum, respectively.
In contrast, the restricted likelihood function has a unique local and global maximum for
θREML ≈ 0.771763.
The data was simulated from the model with mean µ0 = 0, and variance components τ0 = 3
and ω0 = 2, which gives θ0 = 3/2.
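Because every group size occurs exactly once here (all m_i = 1, so every B_i = 0), the likelihood equations reduce to a single profile score in θ: substitute µ(θ) from (5.2.12) and κ(θ) from (5.2.13) into (5.2.11) and look for sign changes. The following pure-Python sketch does this scan (the grid and tolerances are arbitrary choices of ours); if the rational statistics above have been decoded correctly, it should recover the three values just reported.

```python
# Numerical check of Example 5.5.1: scan the profile version of the ML
# score (5.2.11), with mu(theta) from (5.2.12) and kappa(theta) from
# (5.2.13) substituted in.  All m_i = 1 here, so the B_i terms drop out.
n = [2, 5, 10, 20, 50]
Y = [-73571 / 14273, 13781 / 78326, -13277 / 92152,
     31207 / 202567, -15713 / 24121]
W = 116487 / 421
N = sum(n)                                        # N = 87

def score(t):
    w = [ni / (1 + ni * t) for ni in n]           # n_i / (1 + n_i theta)
    mu = sum(wi * yi for wi, yi in zip(w, Y)) / sum(w)
    q = sum(wi * (yi - mu) ** 2 for wi, yi in zip(w, Y))
    kappa = N / (W + q)                           # from (5.2.13)
    s = sum(ni ** 2 / (1 + ni * t) ** 2 * (yi - mu) ** 2
            for ni, yi in zip(n, Y))
    return -sum(w) + kappa * s                    # (5.2.11), profiled

grid = [1e-4 + k * (1.0 - 1e-4) / 20000 for k in range(20001)]
vals = [score(t) for t in grid]
roots = []
for k in range(20000):
    if vals[k] * vals[k + 1] < 0:                 # bracketed root: bisect
        lo, hi = grid[k], grid[k + 1]
        for _ in range(60):
            mid = 0.5 * (lo + hi)
            if score(lo) * score(mid) <= 0:
                hi = mid
            else:
                lo = mid
        roots.append(0.5 * (lo + hi))

print([round(r, 6) for r in roots])
```

The scan brackets each sign change of the profile score on (0, 1) and refines it by bisection; the global maximizer is then found by comparing the likelihood at the resulting critical points.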
Example 5.5.2 Continuing with the setup from Example 5.5.1, change the sufficient statistics to
\[
\bar Y_1 = \tfrac{230081}{40206} \approx 5.7226, \qquad
\bar Y_2 = \tfrac{721282}{5630371} \approx 0.1281,
\]
\[
\bar Y_3 = \tfrac{29305}{95646} \approx 0.3064, \qquad
\bar Y_4 = \tfrac{15365}{37988} \approx 0.4045,
\]
\[
\bar Y_5 = -\tfrac{569}{40932} \approx -0.0139,
\]
and
\[
W = \tfrac{755002}{1759} \approx 429.22.
\]
Now, all real solutions to the ML equations are negative. Thus, the global maximum of the likelihood is achieved at the boundary point θ_ML = 0. In contrast, the REML equations have three feasible solutions for θ, namely,
\[
\theta_{\mathrm{REML},1} \approx 0.00492193, \qquad
\theta_{\mathrm{REML},2} \approx 0.159465, \qquad
\theta_{\mathrm{REML},3} \approx 0.2414611.
\]
The solution θ_REML,1 gives the global maximum of the restricted likelihood function. The solutions θ_REML,2 and θ_REML,3 determine a saddle point and a local maximum, respectively. The data was simulated as in Example 5.5.1.
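The same kind of scan as in Example 5.5.1 confirms this picture numerically: with the statistics above, the profile ML score from (5.2.11) has no sign change on (0, 1), while the profile REML score from (5.2.18) has three. A pure-Python sketch (again all m_i = 1 and B_i = 0; the grid and tolerances are arbitrary choices of ours):

```python
# Scan the profile ML score (5.2.11) and profile REML score (5.2.18)
# for Example 5.5.2; all m_i = 1 and B_i = 0, so those terms drop out.
n = [2, 5, 10, 20, 50]
Y = [230081 / 40206, 721282 / 5630371, 29305 / 95646,
     15365 / 37988, -569 / 40932]
W = 755002 / 1759
N = sum(n)                                        # N = 87

def scores(t):
    w = [ni / (1 + ni * t) for ni in n]
    mu = sum(wi * yi for wi, yi in zip(w, Y)) / sum(w)
    q = sum(wi * (yi - mu) ** 2 for wi, yi in zip(w, Y))
    s = sum(ni ** 2 / (1 + ni * t) ** 2 * (yi - mu) ** 2
            for ni, yi in zip(n, Y))
    ml = -sum(w) + N / (W + q) * s                          # (5.2.11)
    reml = (-sum(w)
            + sum(ni ** 2 / (1 + ni * t) ** 2 for ni in n) / sum(w)
            + (N - 1) / (W + q) * s)                        # (5.2.18)
    return ml, reml

def find_roots(which):
    grid = [1e-4 + k * (1.0 - 1e-4) / 20000 for k in range(20001)]
    vals = [scores(t)[which] for t in grid]
    roots = []
    for k in range(20000):
        if vals[k] * vals[k + 1] < 0:
            lo, hi = grid[k], grid[k + 1]
            for _ in range(60):
                mid = 0.5 * (lo + hi)
                if scores(lo)[which] * scores(mid)[which] <= 0:
                    hi = mid
                else:
                    lo = mid
            roots.append(0.5 * (lo + hi))
    return roots

print(find_roots(0))   # ML score: expect no positive root
print(find_roots(1))   # REML score: expect three positive roots
```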
In both Example 5.5.1 and Example 5.5.2, the first group is of the smallest size but has
group mean that is largest in absolute value. The other means are comparatively close to each
other. We experimented with permuting the means, while holding the group sizes fixed. In
Example 5.5.2, eight out of 120 permutations give bimodal restricted likelihood functions. Two
permutations yield three positive roots to the REML equations. The other six cases have two
positive roots, and one of the two local maxima occurs for θ = 0. In similar experiments for
Example 5.5.1, which features positive correlation between group sizes and means, bimodal
likelihood functions are obtained for 18 permutations. Again, these permutations keep the first
mean fixed. Only three permutations give three positive roots to the ML equations. The 18
permutations include the top six permutations in terms of large positive correlation but also
the permutation whose associated correlation ranks 43rd.
While dependence between group means and sizes plays a role in Examples 5.5.1 and 5.5.2,
the precise interplay between them appears to be subtle. For instance, when varying the mean
Y1 in Example 5.5.1 and keeping all other sufficient statistics fixed, we find that there are three
positive roots to the ML equations when −5.47 ≤ Y1 ≤ −5.08 but a unique root otherwise;
we experimented with a grid of values in [−10, 10]. In particular, the likelihood function is
unimodal for larger negative values of Y1. It would be interesting, but presumably difficult, to
get a better understanding of the semi-algebraic set of sufficient statistics that give (restricted)
likelihood functions with more than one local maximum.
CHAPTER 6
CONCLUSION
In this dissertation, we have seen three examples of how we can use tools from combina-
torial commutative algebra, multilinear algebra, and algebraic geometry to better understand
statistical models and methods. The theme of algebraic complexity arose in all three of these
examples.
In Chapter 3, we explored toric ideals of hypergraphs and showed how we can use combinatorics to understand the Markov bases of statistical models parameterized by square-free monomials. In the literature, there are many papers that have explored toric ideals of graphs; however, the content in Chapter 3 is one of only a few known instances ([30] and [50]) where toric ideals are studied through the combinatorial framework of hypergraphs. While hypergraphs are admittedly general and harder to visualize than graphs, there is a fast-growing body of work in combinatorics regarding hypergraphs, and we see opportunities for more connections to be made between toric ideals and these combinatorial constructions.
Since edge subrings of hypergraphs and their defining ideals, i.e. toric ideals of hypergraphs,
are only beginning to be explored, there are many open questions that would be interesting to
pursue. We list some of the open questions here.
Open Question 48 Given a hypergraph H, what is the Krull dimension of the edge subring
k[H]? This question could be understood by studying the vertex-edge incidence matrix of H,
see [44, Proposition 7.5], or perhaps by appropriately generalizing the results for graphs in [66,
Corollary 8.2.13].
Open Question 49 Given a hypergraph H, can we give combinatorial conditions that guar-
antee that k[H] is normal? Cohen-Macaulay? Gorenstein? These questions have been studied
for graphs in [47], [42] and [45].
Open Question 50 Given a hypergraph H, are there combinatorial conditions that imply IH is a robust toric ideal, i.e., minimally generated by a universal Gröbner basis (see [7])? For example, Proposition 20 implies that when H is a 2-regular hypergraph, IH is a robust toric ideal.
Answers to these questions would give us a better insight into the interplay between combi-
natorics and algebra in the study of toric ideals of hypergraphs, and in turn, give us better
understanding of toric models.
In Chapter 4, we described defining polynomials of the variety associated to the 4-state
general Markov model on the tree K1,3. This result marks the most recent progress toward
proving the salmon conjecture, i.e., showing that the phylogenetic ideal for the model is generated by polynomials of degree 5, 6, and 9. A final step would be to show that the polynomials we describe in Section 4.4 define a radical ideal.
The results in Chapter 4 have two important impacts. First, even though we don’t have
an ideal description of the general Markov model, we do have explicit polynomials that can be
used for phylogenetic model selection according to methods described in [14]. Second, these
polynomials can also be used to test whether a 4× 4× 4 tensor has border rank at most four.
Currently, there are only a few values of m, n, l, and r for which the defining polynomials of
Vr(m,n, l) are known, but the issue of tensor rank is important in a variety of applications
including data mining, computer vision, and neuroscience.
In Chapter 5, we turned our attention towards maximum likelihood estimation and gave
an explicit formula for the ML degree for random effects models with one-way layouts. While
not only giving us insight into the feasibility of using algebraic methods to solve the likelihood
equations, the ML degree also tells us the number of paths we need to track if using numerical
solvers such as PHCpack [64] or Bertini [5] ,which use homotopy continuation methods.
As we saw in Section 5.5, it is possible for the likelihood equations to have more than one
real, positive solution. Thus, we argue that algebraic methods, symbolic or numeric, should
be used whenever feasible for maximum likelihood estimation, since local methods are not
guaranteed to return the global maximum. Symbolic algebraic methods have complexity issues
that may be hard to overcome in practice, however, homotopy continuation methods can be
used to solve polynomial equations numerically. Homotopy continuation methods have been
recently used in maximum likelihood estimation problems in [31]. One future application of
the research in Chapter 5 is to use the ML degrees to implement code to solve the likelihood
equations for variance component models using PHCpack.
CITED LITERATURE
1. E. S. Allman and J. A. Rhodes, Phylogenetic Invariants for the General Markov Model of Sequence Mutation, Math. Biosci. 186 (2003), 113-144.
2. E. S. Allman and J. A. Rhodes, Phylogenetic ideals and varieties for the general Markov model, Advances in Appl. Math. 40 (2008), 127-148.
3. S. Aoki, A. Takemura, R. Yoshida, Indispensable monomials of toric ideals and Markov bases, Journal of Symbolic Computation 43, no. 6-7 (2008), 490-507.
4. Quentin D. Atkinson, Phonemic diversity supports a serial founder effect model oflanguage expansion from Africa, Science 332 (2011), no. 6027, 346–349.
5. D. Bates, J. D. Hauenstein, A. J. Sommese, and C. W. Wampler, Bertini: Software for numerical algebraic geometry. Available at http://www.nd.edu/~sommese/bertini.
6. D. J. Bates and L. Oeding, Toward a salmon conjecture, Experimental Mathematics 20, no. 3 (2011), 358-370.
7. A. Boocher and E. Robeva, Robust Toric Ideals, preprint, arXiv:1304.0603.
8. Max-Louis G. Buot, Serkan Hosten, and Donald St. P. Richards, Counting and locating the solutions of polynomial systems of maximum likelihood equations. II. The Behrens-Fisher problem, Statist. Sinica 17 (2007), no. 4, 1343-1354.
9. M. Casanellas and J. Fernandez-Sanchez, Performance of a new invariants method on homogeneous and non-homogeneous quartet trees, Molecular Biology and Evolution 24, no. 1 (2007), 288-293.
10. Fabrizio Catanese, Serkan Hosten, Amit Khetan, and Bernd Sturmfels, The maximum likelihood degree, Amer. J. Math. 128 (2006), no. 3, 671-697.
11. H. Charalambous, A. Katsabekis, and A. Thoma, Minimal systems of binomial generators and the indispensable complex of a toric ideal, Proceedings of the American Mathematical Society 135 (2007), 3443-3451.
12. M. V. Catalisano, A. V. Geramita, and A. Gimigliano, Ranks of tensors, secant varieties of Segre varieties and fat points, Linear Algebra Appl. 355 (2002), 263-285.
13. J. A. Cavender and J. Felsenstein, Invariants of phylogenies: a simple case with discrete states, Journal of Classification 4 (1987), 57-71.
14. M. Casanellas and J. Fernandez-Sanchez, Geometry of the Kimura 3-parameter model, Advances in Applied Mathematics 41, no. 3 (2008), 265-292.
15. O. L. Davies and P. L. Goldsmith (eds.), Statistical methods in research and production, 4th ed., Hafner, 1972.
16. J. De Loera and S. Onn, Markov bases of three-way tables are arbitrarily complicated, J. Symb. Comput. 41, no. 2 (2006), 173-181.
17. E. Demidenko and H. Massam, On the existence of the maximum likelihood estimate in variance components models, Sankhya Ser. A 61 (1999), no. 3, 431-443.
18. M. Develin and S. Sullivant, Markov bases of binary graph models, Annals of Combinatorics 7 (2003), 441-466.
19. P. Diaconis and B. Sturmfels, Algebraic algorithms for sampling from conditional distributions, Ann. Statist. 26 (1998), no. 1, 363-397.
20. A. Dobra and S. Sullivant, A divide-and-conquer algorithm for generating Markov bases of multi-way tables, Computational Statistics 19 (2004), 347-366.
21. J. Draisma and J. Kuttler, On the ideals of equivariant tree models, Mathematische Annalen 344, no. 3 (2009), 619-644.
22. M. Drton, B. Sturmfels, and S. Sullivant, Lectures on algebraic statistics, Birkhäuser Verlag AG, Basel, Switzerland, 2009.
23. J. J. Faraway, Extending the linear model with R: generalized linear, mixed effects and nonparametric regression models, Texts in Statistical Science Series, Chapman & Hall/CRC, Boca Raton, FL, 2006.
24. S. Fienberg, The Analysis of Cross-classified Categorical Data, MIT Press, Cambridge, MA, 1980.
25. S. Fienberg, Expanding the Statistical Toolkit with Algebraic Statistics, Statistica Sinica 17 (2007), 1251-1272.
26. S. Friedland, On tensors of border rank l in C^{m×n×l}, Linear Algebra Appl., in press, arXiv:1003.1968.
27. S. Friedland and E. Gross, A proof of the set-theoretic version of the salmon conjecture, Journal of Algebra 356, no. 1 (2012), 374-379.
28. I. Gitler, E. Reyes, R. Villarreal, Ring graphs and toric ideals, Electronic Notes in Discrete Mathematics 28 (2007), 393-400.
29. E. Gross, M. Drton, and S. Petrovic, Maximum likelihood degree of variance component models, Electronic Journal of Statistics 6 (2012), 993-1016.
30. E. Gross and S. Petrovic, Combinatorial degree bound for toric ideals of hypergraphs, arXiv:1206.2512.
31. J. Hauenstein, J. Rodriguez, and B. Sturmfels, Maximum Likelihood for Matrices with Rank Constraints, preprint, arXiv:1210.0198.
32. R. R. Hocking, The analysis of linear models, Brooks/Cole Publishing Co., Monterey,CA, 1985.
33. S. Hosten, A. Khetan, and B. Sturmfels, Solving the likelihood equations, Found.Comput. Math. 5 (2005), no. 4, 389–407.
34. S. Hosten and S. Sullivant, The algebraic complexity of maximum likelihood estimation for bivariate missing data, Algebraic and geometric methods in statistics, Cambridge Univ. Press, Cambridge, 2010, pp. 123-133.
35. S. Hosten and S. Sullivant, A finiteness theorem for Markov bases of hierarchicalmodels, J. Comb. Theory Ser. A 114 2 (2007) 311-321.
36. J. Jiang, Linear and generalized linear mixed models and their applications, SpringerSeries in Statistics, Springer, New York, 2007.
37. J. A. Lake, A rate-independent technique for analysis of nucleic acid sequences: evolutionary parsimony, Molecular Biology and Evolution 4 (1987), 167-191.
38. J. M. Landsberg, Tensors: Geometry and Applications, Graduate Studies in Mathematics, Vol. 128, American Mathematical Society, 2012.
39. J. M. Landsberg and L. Manivel, On the ideals of secant varieties of Segre varieties, Found. Comput. Math. 4 (2004), 397-422.
40. J.M. Landsberg and L. Manivel, Generalizations of Strassen’s equations for secantvarieties of Segre varieties, Comm. Algebra 36 (2008), 405–422.
41. L. H. Lim, "Tensors and hypermatrices," in: L. Hogben (ed.), Handbook of Linear Algebra, 2nd ed., CRC Press, Boca Raton, FL, 2013.
42. J. Martínez-Bernal and R. H. Villarreal, Toric ideals generated by circuits, Algebra Colloq. 19, 665 (2012).
43. P. McCullagh and J. A. Nelder, Generalized linear models, 2nd ed., Monographs on Statistics and Applied Probability, Chapman & Hall, London, 1989.
44. E. Miller and B. Sturmfels, Combinatorial commutative algebra, Graduate Texts in Mathematics, 227, Springer-Verlag, New York, 2005.
45. A. O'Keefe, Cohen-Macaulay toric rings arising from finite graphs, Ph.D. thesis, 2012.
46. L. Oeding, Set-theoretic defining equations of the tangential variety of the Segre variety, J. Pure and Applied Algebra 215 (2011), 1516-1527.
47. H. Ohsugi and T. Hibi, Toric ideals generated by quadratic binomials, Journal of Algebra 218 (1999), 509-527.
48. H. Ohsugi and T. Hibi, Indispensable binomials of finite graphs, J. Algebra Appl. 4 (2005), no. 4, 421-434.
49. L. Pachter and B. Sturmfels, Algebraic Statistics for Computational Biology, Cambridge University Press, 2005.
50. S. Petrovic and D. Stasi, Toric algebra of hypergraphs, Journal of Algebraic Combinatorics, to appear.
51. E. Reyes, C. Tatakis, and A. Thoma, Minimal generators of toric ideals of graphs, Adv. in Appl. Math. 48 (2012), no. 1, 64–67.
52. H. Sahai and M. M. Ojeda, Analysis of variance for random models. Vol. I. Balanced data, Birkhäuser Boston Inc., Boston, MA, 2004. Theory, methods, applications and data analysis.
53. H. Sahai and M. M. Ojeda, Analysis of variance for random models. Vol. II. Unbalanced data, Birkhäuser Boston Inc., Boston, MA, 2005. Theory, methods, applications, and data analysis.
54. S. R. Searle, G. Casella, and C. E. McCulloch, Variance components, Wiley Series in Probability and Mathematical Statistics: Applied Probability and Statistics, John Wiley & Sons Inc., New York, 1992. A Wiley-Interscience Publication.
55. V. Strassen, Rank and optimal computations of generic tensors, Linear Algebra Appl. 52/53 (1983), 645–685.
56. B. Sturmfels, Gröbner bases and convex polytopes, University Lecture Series, 8, American Mathematical Society, 1996.
57. B. Sturmfels, Open problems in algebraic statistics, Emerging Applications of Algebraic Geometry, IMA Vol. Math. Appl., vol. 149, Springer, New York, 2009, pp. 351–363.
58. B. Sturmfels and S. Sullivant, Toric ideals of phylogenetic invariants, Journal of Computational Biology 12 (2005), 204–228.
59. B. Sturmfels and P. Zwiernik, Binary cumulant varieties, Annals of Combinatorics, to appear.
60. S. Sullivant, Statistical models are algebraic varieties, lecture notes. Available at http://www4.ncsu.edu/~smsulli2/Activities/assc.html.
61. W. H. Swallow and J. F. Monahan, Monte Carlo comparison of ANOVA, MIVQUE, REML, and ML estimators of variance components, Technometrics 26 (1984), no. 1, 47–57.
62. C. Tatakis and A. Thoma, On the universal Gröbner basis of toric ideals of graphs, Journal of Combinatorial Theory, Series A 118 (2011), 1540–1548.
63. A. Takemura and S. Aoki, Some characterizations of minimal Markov basis for sampling from discrete conditional distributions, Ann. Inst. Statist. Math. 56 (2004), no. 1, 1–17.
64. J. Verschelde, Algorithm 795: PHCpack: A general-purpose solver for polynomial systems by homotopy continuation, ACM Trans. Math. Softw. 25 (1999), no. 2, 251–276. Software available at http://www.math.uic.edu/~jan/download.html.
65. R. Villarreal, Rees algebras of edge ideals, Communications in Algebra 23 (1995), no. 9, 3513–3524.
66. R. Villarreal, Monomial Algebras, Monographs and Textbooks in Pure and Applied Mathematics 238, Marcel Dekker, Inc., New York, 2001.
VITA
ELIZABETH GROSS
EDUCATION:
PhD in Mathematics. University of Illinois at Chicago, August 2013 (expected).
Advisors: Sonja Petrovic and Shmuel Friedland.
MA in Mathematics. San Francisco State University, August 2010.
Advisor: Arek Goetz.
BS in Mathematics. California State University, Chico, May, 2003.
PUBLICATIONS:
Maximum likelihood degree of variance component models, with Mathias Drton
and Sonja Petrovic, Electronic Journal of Statistics, 6, (2012), 993-1016.
A proof of the set-theoretic version of the salmon conjecture, with Shmuel
Friedland, Journal of Algebra 356, (2012), no.1, 374-379.
Combinatorial degree bound for toric ideals of hypergraphs, with Sonja Petrovic,
arXiv:1206.2512, Submitted.
PHCPack in Macaulay2, with Sonja Petrovic and Jan Verschelde, arXiv:1105.4881,
Submitted.
Modeling social networks using a random walk on a torus, M.A. Thesis, San
Francisco State University.
SOFTWARE:
PHCpack.m2, with Sonja Petrovic and Jan Verschelde, a Macaulay2 interface for
PHCpack. Available with Macaulay2 v.1.4 and later.
EXPERIENCE:
Research Assistant, Penn State, 2011–2012.
Visiting Student in the Statistics Department.
Graduate Teaching Assistant, University of Illinois at Chicago, 2009–2011.
Discussion leader for Intermediate Algebra and Calculus I. Grader and tutor for
Applied Linear Algebra.
Graduate Teaching Assistant, San Francisco State University, 2008–2009.
Instructor for Beginning Algebra and Intermediate Algebra. Discussion leader for
Calculus I, II. Grader for Calculus III.
AWARDS:
Dean's Scholar Award, University of Illinois at Chicago, 2012.
Merit fellowship sponsored by the Graduate College.
Poster Award, SIAM Conference on Applied Algebraic Geometry, 2011.
First place in poster competition.
MSCS Graduate Student Teaching Award, University of Illinois at Chicago, 2010.
Departmental award for excellent teaching by a teaching assistant.
Sergio Martins Memorial Scholarship, San Francisco State University, 2008.
Merit scholarship sponsored by the Mathematics Department.
Robert W. Maxwell Memorial Scholarship, San Francisco State University, 2008.
Merit scholarship sponsored by the College of Science and Engineering.