

Universal Gröbner Bases of Determinantal Ideals

Isaac Zebulun Burke
Supervisor: Dr. Emil Sköldberg

School of Mathematics, Statistics and Applied Mathematics
National University of Ireland, Galway

[email protected]

Abstract

Here we study the algebraic properties of log-linear independence models [1], considering in particular $2 \times 2 \times \cdots \times 2$ independence models. The fiber polytopes of these models are a special class of $n$-way transportation polytopes, the general properties of which have been well documented for $n = 2, 3$; see e.g. [2] and references therein.

Given a generic $m \times n$ matrix $A$ over a field $\Bbbk$ whose entries are algebraically independent variables $x_{ij}$, the set of $k$-minors ($0 < k \le m$, $k \le n$) of $A$ generates a determinantal ideal $I \subset \Bbbk[x_{ij}]$. We associate such a determinantal ideal $I$ with each independence model and seek to enumerate the elements of the universal Gröbner basis of $I$, drawing on results of Sturmfels [3] (see chapter 7) while making use of the software systems polymake and gfan. Implications are discussed in relation to the problem of describing graphs of $n$-way transportation polytopes for $n \ge 4$.

Our thanks go to the organisers of the June 2013 summer school on Algebraic Statistics in Nordfjordeid, Norway, where this research project was suggested. The project is currently in its early stages.

1. Log-linear Models

In statistics, a model is frequently thought of as a pair $(Y, P)$, where $Y$ is the set of possible observations and $P$ is the set of possible probability distributions (on $Y$). It is assumed that there is a distinct element of $P$ which generates the observed data. Statistical inference enables us to make statements about which element(s) of $P$ are most likely to be the ‘true one’.

Definition 1. Fix a matrix $A \in \mathbb{Z}^{d \times m}$ whose columns all sum to the same value. The $j$-th column vector $a_j$ of $A$ represents the monomial

$$\theta^{a_j} = \prod_{i=1}^{d} \theta_i^{a_{ij}} \quad \text{for } j = 1, 2, \dots, m. \tag{1}$$

The log-linear model associated with $A$ is the image of the orthant $\Theta = \mathbb{R}^d_{>0}$ under the map

$$f : \mathbb{R}^d \to \mathbb{R}^m, \quad (\theta_1, \dots, \theta_d) \mapsto \frac{1}{\sum_{j=1}^{m} \theta^{a_j}} \cdot \left(\theta^{a_1}, \theta^{a_2}, \dots, \theta^{a_m}\right). \tag{2}$$

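Example 2. Given the following matrix $A$ and the related map $\psi$:

$$A = \begin{pmatrix} 3 & 0 & 0 & 2 & 1 & 2 & 1 & 0 & 0 \\ 0 & 3 & 0 & 1 & 2 & 0 & 0 & 2 & 1 \\ 0 & 0 & 3 & 0 & 0 & 1 & 2 & 1 & 2 \end{pmatrix};$$

$$\psi(\theta_1, \theta_2, \theta_3) = \frac{1}{\sum_{j=1}^{9} \theta^{a_j}} \cdot \left(\theta_1^3,\ \theta_2^3,\ \theta_3^3,\ \theta_1^2\theta_2,\ \theta_1\theta_2^2,\ \theta_1^2\theta_3,\ \theta_1\theta_3^2,\ \theta_2^2\theta_3,\ \theta_2\theta_3^2\right);$$

the associated log-linear model is the image of the orthant $\Theta = \mathbb{R}^3_{>0}$ under the map $\psi : \mathbb{R}^3 \to \mathbb{R}^9$.

The unknowns $\theta_1$, $\theta_2$ and $\theta_3$ represent the model parameters. Typically, data is categorical and comes as $u = (u_1, \dots, u_9) \in \mathbb{Z}^9_{\ge 0}$, with sample size $N = \sum_{i=1}^{9} u_i$.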

This setup could model, for example, a simple tournament of Rock-Paper-Scissors where one round consists of three games, and it is not permissible to make three different choices in one round. A sample size $N$ (i.e. a finite number of rounds) is fixed, and data of the choices of competitor X are collected. The first entry $u_1$ in the data vector $u$ gives the number of instances where competitor X chooses Rock three times, and so on. Methods of statistical inference are employed to estimate $\theta_1$ (the probability with which X chooses Rock), $\theta_2$ (the probability with which X chooses Paper) and $\theta_3$ (the probability with which X chooses Scissors), usually under the assumption of independence.

For those who are interested, reference [1] (pp. 26-28) analyses this example in more detail.
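To make the map (2) concrete, the following minimal sketch evaluates $\psi$ from Example 2 at one parameter point. The function name `log_linear_map` and the sample value of $\theta$ are illustrative assumptions, not part of the original poster.

```python
from math import prod

# Configuration matrix A from Example 2: rows index the parameters
# theta_1..theta_3, columns index the nine possible outcomes of a round.
A = [
    [3, 0, 0, 2, 1, 2, 1, 0, 0],
    [0, 3, 0, 1, 2, 0, 0, 2, 1],
    [0, 0, 3, 0, 0, 1, 2, 1, 2],
]

def log_linear_map(A, theta):
    """Evaluate map (2): the vector of monomials theta^{a_j}, normalised to sum to 1."""
    d, m = len(A), len(A[0])
    monomials = [prod(theta[i] ** A[i][j] for i in range(d)) for j in range(m)]
    total = sum(monomials)
    return [mon / total for mon in monomials]

# Hypothetical parameter point (an assumption chosen for illustration).
theta = [0.5, 0.3, 0.2]
p = log_linear_map(A, theta)
print(p)
```

The normalising factor guarantees that the image point is a probability vector even though the nine monomials themselves do not sum to one.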

2. Contingency Tables

One of the most popular/classical applications of log-linear models is to the analysis of contingency tables. The contingency table below appeared in the article ‘Attitudes about Marijuana and Political Views’, Psychological Reports, 1973, pp. 1051-1054.

$$\begin{array}{l|ccc|c}
 & \text{Never} & \text{Rarely} & \text{Frequently} & \text{Total} \\ \hline
\text{Liberal} & 479 & 173 & 119 & 771 \\
\text{Conservative} & 214 & 47 & 15 & 276 \\
\text{Other} & 172 & 45 & 85 & 302 \\ \hline
\text{Total} & 865 & 265 & 219 & 1349
\end{array}$$

If one considers the nine interior figures as a $3 \times 3$ matrix, its rows represent Liberal, Conservative and Other political views respectively; its columns represent levels of marijuana use, Never, Rarely and Frequently respectively. In this example, a random sample of high school and college students was taken. The row and column totals (set in boldface on the original poster) are called the marginals of the contingency table.

The main hypothesis of interest with ‘two-responses’ (e.g. political views, marijuana use) sampling is whether the two responses are independent. If we let $\pi_{i+}$, $1 \le i \le 3$, and $\pi_{+j}$, $1 \le j \le 3$, represent the marginal distributions of political views and marijuana use respectively, and $\pi_{ij}$ represent the joint distribution of these two variables, then by definition, the variables are independent if (and only if) their joint distribution is the product of the marginals. Thus, we can write the hypothesis of independence as:

$$H : \pi_{ij} = \pi_{i+} \cdot \pi_{+j}, \tag{3}$$

for all $i = 1, 2, 3$ and $j = 1, 2, 3$.
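As a small worked illustration of hypothesis (3), the sketch below forms the product of the empirical marginals of the marijuana table and converts it into expected counts. Using the empirical marginals as estimates of $\pi_{i+}$ and $\pi_{+j}$ is an assumption made for this sketch; the variable names are likewise illustrative.

```python
from fractions import Fraction

# Interior 3x3 counts of the contingency table (rows: Liberal, Conservative, Other).
table = [
    [479, 173, 119],
    [214, 47, 15],
    [172, 45, 85],
]

row_totals = [sum(r) for r in table]
col_totals = [sum(c) for c in zip(*table)]
N = sum(row_totals)

# Empirical marginal distributions pi_{i+} and pi_{+j} (assumed estimates).
pi_row = [Fraction(r, N) for r in row_totals]
pi_col = [Fraction(c, N) for c in col_totals]

# Under H the joint distribution is the product of the marginals, equation (3).
pi_H = [[pr * pc for pc in pi_col] for pr in pi_row]

# Expected counts under H, for comparison with the observed counts.
expected = [[float(N * p) for p in row] for row in pi_H]
for obs_row, exp_row in zip(table, expected):
    print(obs_row, [round(e, 1) for e in exp_row])
```

Exact rational arithmetic is used for the probabilities so that the product distribution sums to exactly one.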

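Estimation of the parameters and testing of the hypothesis of independence can be performed in various ways. From an algebraic viewpoint, the Markov Chain Monte Carlo method is the most significant. The process is as follows. A configuration matrix $A$ is associated with the $3 \times 3$ table, thus revealing the log-linear structure of its independence model. Its first three rows compute the row sums of the table and its last three rows compute the column sums:

$$A = \begin{pmatrix}
1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 1 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 1 \\
1 & 0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 \\
0 & 1 & 0 & 0 & 1 & 0 & 0 & 1 & 0 \\
0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 1
\end{pmatrix}$$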

The observed table is encoded in the vector $x_0 = (479, 173, 119, 214, 47, 15, 172, 45, 85)^T$ and the marginals are represented by $t = (771, 276, 302, 865, 265, 219)^T$. Observe now that $Ax_0 = t$. The set of contingency tables $x$ satisfying $Ax = t$ is called the fiber of $t$ and denoted $F_t$, i.e. $F_t = \{x \in \mathbb{Z}^9_{\ge 0} : Ax = t\}$. The p-value of $x_0$ is defined as

$$p = P(\phi(x) \ge \phi(x_0) \mid H) = \sum_{x \in F_t,\ \phi(x) \ge \phi(x_0)} P(x \mid t = Ax_0, H), \tag{4}$$

where $\phi(x)$ is a test statistic measuring the extremality of $x$. Given the level of significance $\alpha$, we reject $H$ if $p \le \alpha$.
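The relation between the observed table and its marginals can be checked mechanically. The sketch below builds the configuration matrix of an $I \times J$ independence model from row and column indicator vectors and multiplies it by the flattened table; the helper names `independence_config` and `matvec` are assumptions introduced for this illustration.

```python
def independence_config(I, J):
    """Configuration matrix of the I x J independence model (tables flattened row by row)."""
    size = I * J
    rows = []
    for i in range(I):
        # Indicator of the cells in table row i: computes the i-th row sum.
        rows.append([1 if k // J == i else 0 for k in range(size)])
    for j in range(J):
        # Indicator of the cells in table column j: computes the j-th column sum.
        rows.append([1 if k % J == j else 0 for k in range(size)])
    return rows

def matvec(A, x):
    """Plain integer matrix-vector product."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

A = independence_config(3, 3)
x0 = [479, 173, 119, 214, 47, 15, 172, 45, 85]
t = matvec(A, x0)
print(t)
```

Any other table with the same image under `matvec` lies in the same fiber as `x0`.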

In general, there are three ways to calculate this p-value.

• Enumerate all of Ft (best, but generally infeasible);

• Directly sample x (not easy for complicated models);

• Sample x by a Markov Chain (recommended; for the most recent analysis of this method, see [5]).
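The first option in the list can be carried out by hand for tiny tables. The sketch below enumerates the fiber of a small $3 \times 3$ table and evaluates the sum in (4) exactly. Two ingredients are assumptions not fixed by the text: the Pearson chi-square statistic is used as $\phi$, and the margins $(4, 4, 3)$ and $(5, 3, 3)$ are chosen only to keep the fiber small. The conditional distribution used, $P(x \mid t, H) \propto 1 / \prod_{ij} x_{ij}!$, is the standard hypergeometric distribution on a fiber of the independence model.

```python
from fractions import Fraction
from itertools import product
from math import factorial, prod

def fiber(rows, cols):
    """Enumerate all non-negative 3x3 tables (flattened row by row) with the given margins."""
    tables = []
    # The top-left 2x2 block determines every other entry through the margins.
    for a, b, c, d in product(range(max(rows) + 1), repeat=4):
        x = [a, b, rows[0] - a - b,
             c, d, rows[1] - c - d,
             cols[0] - a - c, cols[1] - b - d, 0]
        x[8] = rows[2] - x[6] - x[7]
        if min(x) >= 0:
            tables.append(tuple(x))
    return tables

def chi_square(x, rows, cols):
    """Pearson chi-square statistic, used here as the extremality measure phi (an assumption)."""
    N = sum(rows)
    stat = Fraction(0)
    for i in range(3):
        for j in range(3):
            e = Fraction(rows[i] * cols[j], N)  # expected count under independence
            stat += (x[3 * i + j] - e) ** 2 / e
    return stat

def exact_p_value(x0, rows, cols):
    """Exact conditional p-value (4), summing P(x | t, H) over the fiber."""
    F = fiber(rows, cols)
    weight = {x: Fraction(1, prod(factorial(v) for v in x)) for x in F}
    total = sum(weight.values())
    phi0 = chi_square(x0, rows, cols)
    tail = sum(w for x, w in weight.items() if chi_square(x, rows, cols) >= phi0)
    return tail / total

rows, cols = (4, 4, 3), (5, 3, 3)      # illustrative margins
x0 = (2, 1, 1, 2, 0, 2, 1, 2, 0)       # one table with these margins
F = fiber(rows, cols)
p = exact_p_value(x0, rows, cols)
print(len(F), float(p))
```

The brute-force loop grows quickly with the sample size, which is why enumeration is generally infeasible for tables such as the marijuana data.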

3. Markov Bases

Definition 3. Let $B \subset \ker_{\mathbb{Z}} A$ be a finite set of moves for a configuration $A$. $B$ is called a Markov Basis if for all fibers $F_t$ and for all elements $x, y \in F_t$, $x \ne y$, there exist $K > 0$, $z_1, \dots, z_K \in B$ and $\varepsilon_1, \dots, \varepsilon_K \in \{-1, 1\}$, such that

$$y = x + \sum_{k=1}^{K} \varepsilon_k z_k, \qquad x + \sum_{k=1}^{L} \varepsilon_k z_k \in F_t, \quad L = 1, \dots, K - 1. \tag{5}$$

A Markov Basis is exactly what is needed to perform a random walk that can reach every element of a fiber. The first condition says that by adding or subtracting elements of $B$, we can move from $x$ to $y$. The second condition says that on the way from $x$ to $y$ we never encounter a negative frequency. Once a Markov Basis is obtained for some model, it is easy to construct a Markov Chain over $F_{Ax_0}$, where $x_0$ is the observed frequency vector and $F_{Ax_0} = F_t$ is the fiber containing $x_0$. One then performs an MCMC method to find the p-value.
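One common way to turn a set of moves into a Markov Chain is a Metropolis-Hastings walk, sketched below under stated assumptions: the target distribution is the hypergeometric distribution $P(x \mid t, H) \propto 1 / \prod_i x_i!$, moves and signs are proposed uniformly, and the $2 \times 2$ table and its single move are hypothetical data chosen for illustration. The function name `mh_walk` is also an assumption.

```python
import random
from math import factorial, prod

def mh_walk(x0, moves, steps, seed=0):
    """Metropolis-Hastings walk on the fiber of x0 using the given moves."""
    rng = random.Random(seed)
    x = list(x0)
    samples = []
    for _ in range(steps):
        z = rng.choice(moves)          # pick a move uniformly
        eps = rng.choice((-1, 1))      # pick a sign uniformly
        y = [a + eps * b for a, b in zip(x, z)]
        if min(y) >= 0:                # stay inside the fiber: no negative counts
            # Ratio of target probabilities P(y)/P(x) with P(x) proportional to 1/prod(x_i!).
            ratio = prod(factorial(a) for a in x) / prod(factorial(b) for b in y)
            if rng.random() < min(1.0, ratio):
                x = y
        samples.append(tuple(x))
    return samples

# Hypothetical 2x2 table (flattened row by row) and its single basic move.
x0 = (3, 1, 2, 4)
moves = [(1, -1, -1, 1)]
samples = mh_walk(x0, moves, steps=200)
print(len(set(samples)))
```

Because every proposal adds an element of $\ker_{\mathbb{Z}} A$ and negative tables are rejected, each visited state has the same marginals as `x0`; the fraction of samples with $\phi(x) \ge \phi(x_0)$ then estimates the p-value.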

Theorem 4. The set of $2 \times 2$ minors of the form $\begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix}$ forms a Markov Basis for the $I \times J$ independence model of two-way contingency tables.
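The moves in Theorem 4 are easy to generate: choose two rows and two columns and place the $\pm 1$ pattern on the four selected cells. The following sketch does this for tables flattened row by row; the function name `basic_moves` is an assumption introduced for illustration.

```python
from itertools import combinations

def basic_moves(I, J):
    """All 2x2 moves of an I x J table, as flattened vectors of length I*J."""
    moves = []
    for i1, i2 in combinations(range(I), 2):
        for j1, j2 in combinations(range(J), 2):
            z = [0] * (I * J)
            z[i1 * J + j1] = 1      # +1 on the diagonal of the chosen 2x2 block
            z[i2 * J + j2] = 1
            z[i1 * J + j2] = -1     # -1 on the anti-diagonal
            z[i2 * J + j1] = -1
            moves.append(tuple(z))
    return moves

B = basic_moves(3, 3)
print(len(B))
```

Each move has zero row sums and zero column sums, so adding it to a table leaves the marginals unchanged. For an $I \times J$ table there are $\binom{I}{2}\binom{J}{2}$ such moves, up to sign.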

Example 5. Let $I = J = 3$, as in the example of the previous section. Then the set

L = {(1,0,−1,0,0,0,−1,0,1), (0,1,−1,0,0,0,0,−1,1), (0,0,0,1,0,−1,−1,0,1), (0,0,0,0,1,−1,0,−1,1)}

forms a lattice basis for $\ker_{\mathbb{Z}} A$. A Markov Basis $B$ is given by augmenting L with

(1,−1,0,−1,1,0,0,0,0), (1,0,−1,−1,0,1,0,0,0), (1,−1,0,0,0,0,−1,1,0), (0,0,0,1,−1,0,−1,1,0), (0,1,−1,0,−1,1,0,0,0).

By way of illustration, the $3 \times 3$ tables $x$ and $y$ below belong to the same fiber. Using only the elements of L, moving from $x$ to $y$ takes at least four steps, since $y - x = \ell_1 - \ell_2 - \ell_3 + \ell_4$ (numbering the elements of L in the order listed), and some orderings of these steps pass through a negative frequency: subtracting $\ell_2$ first, for instance, gives a table whose last entry is $-1$. By contrast, we can perform the move in a single step (and hence without encountering negative frequencies) using the element $(1, -1, 0, -1, 1, 0, 0, 0, 0) \in B$.

$$x \sim \begin{array}{ccc|c} 2 & 1 & 1 & 4 \\ 2 & 0 & 2 & 4 \\ 1 & 2 & 0 & 3 \\ \hline 5 & 3 & 3 & 11 \end{array}, \qquad y \sim \begin{array}{ccc|c} 3 & 0 & 1 & 4 \\ 1 & 1 & 2 & 4 \\ 1 & 2 & 0 & 3 \\ \hline 5 & 3 & 3 & 11 \end{array}$$
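A breadth-first search over the fiber makes the comparison between the two move sets concrete: it counts the fewest moves needed to get from $x$ to $y$ while every intermediate table stays non-negative. The sketch below is illustrative; the function name `fiber_distance` is an assumption, and the full set of nine $2 \times 2$ moves is generated rather than typed in.

```python
from collections import deque
from itertools import combinations

def fiber_distance(x, y, moves):
    """Fewest +/- moves from x to y through non-negative tables, or None if unreachable."""
    seen = {x}
    queue = deque([(x, 0)])
    while queue:
        cur, dist = queue.popleft()
        if cur == y:
            return dist
        for z in moves:
            for eps in (1, -1):
                nxt = tuple(a + eps * b for a, b in zip(cur, z))
                if min(nxt) >= 0 and nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, dist + 1))
    return None

# Lattice basis L from Example 5.
L = [
    (1, 0, -1, 0, 0, 0, -1, 0, 1),
    (0, 1, -1, 0, 0, 0, 0, -1, 1),
    (0, 0, 0, 1, 0, -1, -1, 0, 1),
    (0, 0, 0, 0, 1, -1, 0, -1, 1),
]

# All nine 2x2 moves of a 3x3 table (the Markov Basis of Theorem 4).
B = []
for i1, i2 in combinations(range(3), 2):
    for j1, j2 in combinations(range(3), 2):
        z = [0] * 9
        z[3 * i1 + j1] = z[3 * i2 + j2] = 1
        z[3 * i1 + j2] = z[3 * i2 + j1] = -1
        B.append(tuple(z))

x = (2, 1, 1, 2, 0, 2, 1, 2, 0)
y = (3, 0, 1, 1, 1, 2, 1, 2, 0)
print(fiber_distance(x, y, L), fiber_distance(x, y, B))
```

For these two tables the search reports four steps with L and one step with B. A lattice basis is not guaranteed to connect every fiber, whereas a Markov Basis connects all of them by Definition 3.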

4. Difficulties and Research Plans

Unfortunately, finding Markov Bases for more complicated models of contingency tables is difficult and each model generally needs separate consideration, see [4]. The Fundamental Theorem of Markov Bases implies that (Universal) Gröbner Bases can be used instead of Markov Bases, but computation times for these become similarly infeasible, even for basic models. For the model in Example 2, for instance: #[Markov Basis]: 17, #[Gröbner Basis (GRevLex)]: 19, #[Reduced Gröbner Bases]: 54,828 (361.378 seconds), #[Universal Gröbner Basis]: 213 (477.084 seconds).

The goal of this research project is to identify methods of computing Universal Gröbner Bases for $2 \times 2 \times \cdots \times 2$ independence models which are faster than those that currently exist. This model has many symmetries which can be exploited by software such as gfan (Jensen). A theorem of Sturmfels (1996) which characterises Universal Gröbner Bases via the edges of certain fiber polytopes will be the starting point for exploring a more geometric approach to basis finding.

1. Drton, M., Sturmfels, B. and Sullivant, S. (2009). Lectures on Algebraic Statistics. Oberwolfach Seminars, Vol. 39.
2. De Loera, J.A., Kim, E.D., Onn, S. and Santos, F. (2009). Graphs of transportation polytopes. Journal of Combinatorial Theory, Vol. 116, pp. 1306-1325.
3. Sturmfels, B. (1996). Gröbner Bases and convex polytopes. University Lecture Series, AMS, Vol. 8.
4. Hara, H., Aoki, S. and Takemura, A. (2011). Running Markov Chain without Markov Basis. arXiv:1109.0078v1.
5. Hara, H., Aoki, S. and Takemura, A. (2012). Markov bases in Algebraic Statistics. Springer, New York.

as2014 at iit, Algebraic Statistics Conference, 19-22 May 2014, Illinois Institute of Technology, Chicago, IL