
Introduction to Markov bases and contingency table models in statistics

Akimichi Takemura

University of Tokyo

July 14, 2014. NIMS workshop

A.Takemura (U.Tokyo) Markov bases 2014/7/14 1 / 49

Outline

1 Background and project research on algebraic statistics

2 Two-way contingency tables and their Markov bases

3 The case of three-way contingency tables

4 Toric models and fundamental theorem of Markov bases

5 Some results on complexity of Markov bases

6 Generalized hypergeometric distribution


1. Background: origins of algebraic statistics

Two papers:

Pistone and Wynn (1996) “Generalised confounding with Gröbner bases”, Biometrika.

Diaconis and Sturmfels (1998) “Algebraic algorithms for sampling from conditional distributions”, Annals of Statistics.

In particular, the latter established the connection between Markov bases (MB) for exact tests and sets of generators of toric ideals.

“Fundamental theorem of MB”: a Markov basis ↔ a set of generators of a toric ideal


My own research on algebraic statistics

2001: started working on MB for contingency tables with S.Aoki.

2003: started collaboration with Hibi and Ohsugi.

Bidirectional developments:

Statistics → Algebra: algebraic structure of statistical models

Algebra → Statistics: application of known algebraic results

We studied MB for hierarchical models of contingency tables, logistic regression, etc.


My own research on algebraic statistics

Book: Aoki, Hara and Takemura (2012) “Markov Bases in Algebraic Statistics”, Springer.


Project research with Takayuki Hibi

Project: October 2008 – March 2014. Japan Science and Technology Agency CREST area.

“Alliance for breakthrough between mathematics and sciences”

Project title: “Harmony of Gröbner bases and the modern industrial society” (PI: Takayuki Hibi)

Three groups: theory, computation, application

Theory (algebra): Hibi, Ohsugi

Computation: Takayama, Noro, Hamada

Application (statistics): Takemura, Aoki



Book in Japanese (2011): “Gröbner Dojo”, Kyoritsu Shuppan

Chapter 4: Markov bases


English version (2013): “Gröbner Bases: Statistics and Software Systems”, edited by T. Hibi

Chapter 7 with more than 100 pages gives hands-on examples.



Collaboration among Theory, Computation and Application

New development: D-module theory and Gröbner bases for the ring of differential operators applied to statistics

Holonomic Gradient Method (HGM)


2. Two-way contingency tables and their Markov bases

Table 1: Example of scores of a class (hypothetical data)

Algebra \ Statistics    A    B    C   total
A                       7    5    1     13
B                       5   10    6     21
C                       2    6    8     16
total                  14   21   15     50

This is an example of a 3 × 3 two-way contingency table.


Introduction to MB via contingency tables

$I \times J$ two-way contingency table

factor 1 \ factor 2    1        ...   J        row sum
1                      x_{11}   ...   x_{1J}   x_{1+}
...                    ...            ...      ...
I                      x_{I1}   ...   x_{IJ}   x_{I+}
column sum             x_{+1}   ...   x_{+J}   n

$x_{ij}$: joint frequency

$x_{i+}$, $x_{+j}$: marginal frequencies; $n$: total frequency


Markov basis for two-way tables

Statistical interest: hypothesis of independence

\[ H_0 : p_{ij} = \alpha_i \beta_j, \quad \forall i, j \tag{1} \]

$p_{ij}$ is the joint probability (or cell probability).

$\alpha_i = p_{i+}$ and $\beta_j = p_{+j}$ are the marginal probabilities.

This is the parametric form of the model of independence.



“Restriction” or “implicit” specification: the hypothesis of independence can also be written as

\[ \frac{p_{ij}\, p_{i'j'}}{p_{ij'}\, p_{i'j}} = 1 \qquad \text{“odds ratio”}^{1},\ \text{“log linear”} \tag{2} \]

or

\[ p_{ij}\, p_{i'j'} - p_{ij'}\, p_{i'j} = 0 \qquad \text{binomial form} \tag{3} \]

for all $i \neq i'$, $j \neq j'$. Indeed

\[ p_{ij}\, p_{i'j'} - p_{ij'}\, p_{i'j} = \alpha_i \alpha_{i'} \beta_j \beta_{j'} - \alpha_i \alpha_{i'} \beta_j \beta_{j'} = 0. \]

(These two forms seem to be somewhat different.)

Conversely, for example, by setting $i' = j' = 1$ we have

\[ p_{ij} = \frac{1}{p_{11}}\, p_{i1}\, p_{1j} = \alpha_i \beta_j. \]

$^{1}$ Odds is already a ratio of probabilities, usually $p/(1 - p)$.
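The equivalence between the parametric form (1) and the binomial form (3) can be checked numerically. The following is a minimal sketch in exact rational arithmetic; the marginal probabilities `alpha` and `beta` are illustrative values chosen here, not data from the talk.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical marginal probabilities (illustrative values, not from the talk).
alpha = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]   # row marginals p_{i+}
beta = [Fraction(1, 4), Fraction(1, 4), Fraction(1, 2)]    # column marginals p_{+j}

# Parametric form (1): p_ij = alpha_i * beta_j.
p = [[a * b for b in beta] for a in alpha]

def binomials(p):
    """Yield p_ij p_i'j' - p_ij' p_i'j for all i < i', j < j' (binomial form (3))."""
    for i, i2 in combinations(range(len(p)), 2):
        for j, j2 in combinations(range(len(p[0])), 2):
            yield p[i][j] * p[i2][j2] - p[i][j2] * p[i2][j]

all_zero = all(v == 0 for v in binomials(p))

# Converse direction with i' = j' = 1: rebuild p_ij from the first row and column.
rebuilt = [[p[i][0] * p[0][j] / p[0][0] for j in range(3)] for i in range(3)]

print(all_zero, rebuilt == p)
```

Both checks should succeed for any strictly positive choice of `alpha` and `beta`.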

Pearson’s χ2 statistic

Look at differences between observed frequencies and the “expected” frequencies.

Expected frequency: proportionally distribute column sums and row sums (13 : 21 : 16 and 14 : 21 : 15)

\[ \hat{x}_{ij} = \frac{x_{i+}\, x_{+j}}{n}. \]

Table 2: Observed and expected frequencies

Observed:

Alg \ Stat    A      B      C     total
A             7      5      1       13
B             5     10      6       21
C             2      6      8       16
total        14     21     15       50

Expected:

Alg \ Stat    A      B      C     total
A            3.64   5.46   3.9      13
B            5.88   8.82   6.3      21
C            4.48   6.72   4.8      16
total        14     21     15       50


Pearson’s χ2 statistic

\[ \chi^2 = \sum_{i,j} \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}} = \frac{(7 - 3.64)^2}{3.64} + \cdots + \frac{(8 - 4.8)^2}{4.8} \approx 9.18 \]

If this value is “large”, then we reject the null hypothesis $H_0$.

(Fisher’s) Exact test: We evaluate how large it is based on the hypergeometric distribution over the set of contingency tables with given row sums and column sums.

Classically we compare $\chi^2 \approx 9.18$ to the upper 5 percentile of the $\chi^2$-distribution with $4 = (3 - 1) \times (3 - 1)$ degrees of freedom (“asymptotic approximation”).
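The expected frequencies and Pearson's statistic for the score table can be recomputed directly from the observed counts. This is a minimal sketch in plain Python; the variable names are chosen here for illustration.

```python
# Observed 3 x 3 table of scores (rows: Algebra, columns: Statistics).
observed = [[7, 5, 1],
            [5, 10, 6],
            [2, 6, 8]]

row_sums = [sum(row) for row in observed]
col_sums = [sum(col) for col in zip(*observed)]
n = sum(row_sums)

# Expected frequency of cell (i, j) under independence: x_{i+} x_{+j} / n.
expected = [[r * c / n for c in col_sums] for r in row_sums]

# Pearson's chi-square: sum over cells of (observed - expected)^2 / expected.
chi2 = sum((o - e) ** 2 / e
           for obs_row, exp_row in zip(observed, expected)
           for o, e in zip(obs_row, exp_row))

# Degrees of freedom for the asymptotic approximation.
df = (len(row_sums) - 1) * (len(col_sums) - 1)

print([[round(e, 2) for e in row] for row in expected])
print(round(chi2, 2), df)
```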


Hypergeometric distribution appears as the conditional distribution

Consider $n$ multinomial draws of cells with cell probabilities $p_{ij}$.

The probability of observing $x$:

\[ p(x) = \frac{n!}{x_{11}! \cdots x_{IJ}!} \prod_{i,j} p_{ij}^{x_{ij}}. \]

Under the independence model we have

\[ \prod_{i,j} p_{ij}^{x_{ij}} = \prod_{i,j} (\alpha_i \beta_j)^{x_{ij}} = \prod_i \alpha_i^{x_{i+}} \prod_j \beta_j^{x_{+j}}. \]

The conditional distribution is free from $\alpha_i$, $\beta_j$:

\[ p(\{x_{ij}\} \mid \{x_{i+}\}, \{x_{+j}\}) \propto \frac{1}{\prod_{i,j} x_{ij}!} \]

$\propto$: proportional, i.e., except for the normalizing constant
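That the nuisance parameters cancel can be illustrated on a tiny case. The sketch below compares two 2 × 2 tables with the same margins under two different independence models and checks that the ratio of their multinomial probabilities always equals the ratio of the weights $1/\prod x_{ij}!$. The tables and parameter values are hypothetical choices made for this illustration.

```python
from fractions import Fraction
from math import factorial, prod

def multinomial_prob(x, p):
    """Multinomial probability of table x under cell probabilities p."""
    n = sum(map(sum, x))
    coef = Fraction(factorial(n), prod(factorial(v) for row in x for v in row))
    return coef * prod(p[i][j] ** x[i][j]
                       for i in range(len(x)) for j in range(len(x[0])))

def weight(x):
    """Unnormalized conditional probability 1 / prod_ij x_ij!."""
    return Fraction(1, prod(factorial(v) for row in x for v in row))

# Two tables sharing row sums (2, 2) and column sums (2, 2).
x = [[2, 0], [0, 2]]
y = [[1, 1], [1, 1]]

def ratio_under(alpha, beta):
    """p(x) / p(y) under the independence model p_ij = alpha_i * beta_j."""
    p = [[a * b for b in beta] for a in alpha]
    return multinomial_prob(x, p) / multinomial_prob(y, p)

r1 = ratio_under([Fraction(1, 2), Fraction(1, 2)], [Fraction(1, 2), Fraction(1, 2)])
r2 = ratio_under([Fraction(1, 5), Fraction(4, 5)], [Fraction(2, 3), Fraction(1, 3)])

print(r1, r2, weight(x) / weight(y))
```

All three printed values agree, which is the sense in which the conditional distribution does not depend on the marginal parameters.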


Configuration matrix

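Marginal frequencies: $b = (13, 21, 16, 14, 21, 15)^t$

Observed frequencies: $x = (x_{11}, x_{12}, \dots, x_{33})^t = (7, 5, \dots, 8)^t$

“Configuration”

\[ A = \begin{pmatrix}
1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 1 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 1 \\
1 & 0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 \\
0 & 1 & 0 & 0 & 1 & 0 & 0 & 1 & 0 \\
0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 1
\end{pmatrix}
= \begin{pmatrix} E_3 \otimes 1_3^t \\ 1_3^t \otimes E_3 \end{pmatrix} \]

($E_k$: identity matrix, $1_k$: vector consisting of 1’s)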
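The Kronecker-product description of the configuration can be turned into a few lines of code. This is a sketch using plain Python lists; the helper names `kron`, `identity`, `ones_row` and `config_two_way` are introduced here for illustration.

```python
def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [[x * y for x in row_a for y in row_b]
            for row_a in a for row_b in b]

def identity(k):
    """E_k, the k x k identity matrix."""
    return [[1 if i == j else 0 for j in range(k)] for i in range(k)]

def ones_row(k):
    """1_k^t as a 1 x k matrix."""
    return [[1] * k]

def config_two_way(I, J):
    """Stack the row-sum block E_I (x) 1_J^t on top of the column-sum block 1_I^t (x) E_J."""
    return kron(identity(I), ones_row(J)) + kron(ones_row(I), identity(J))

def matvec(A, x):
    """Matrix-vector product A x."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]

A = config_two_way(3, 3)
x = [7, 5, 1, 5, 10, 6, 2, 6, 8]   # the observed score table, read row by row
b = matvec(A, x)

for row in A:
    print(row)
print(b)   # [13, 21, 16, 14, 21, 15]
```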


Configuration matrix

The relation between observed frequencies and the marginal frequencies:

\[ b = Ax \]

Fiber: the set of contingency tables sharing the common marginal frequencies:

\[ \mathcal{F}_b = \{ x \in \mathbb{N}^{I \times J} \mid b = Ax \}, \qquad \mathbb{N} = \{0, 1, \dots\} \]

In the above example, $|\mathcal{F}_b| = 10310$.

As the table gets larger, $|\mathcal{F}_b|$ also gets larger and enumeration of the fiber becomes difficult.

→ We want to sample from the fiber.
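For a table as small as the 3 × 3 example, the fiber can still be enumerated by brute force. The following sketch fills the table row by row; the function names are chosen here, and the approach is only practical for small margins.

```python
def bounded_compositions(total, caps):
    """Yield tuples v with 0 <= v[j] <= caps[j] and sum(v) == total."""
    if len(caps) == 1:
        if total <= caps[0]:
            yield (total,)
        return
    for first in range(min(total, caps[0]) + 1):
        yield from ((first,) + rest
                    for rest in bounded_compositions(total - first, caps[1:]))

def fiber(row_sums, col_sums):
    """Yield every non-negative integer table with the given margins, as a tuple of rows."""
    if len(row_sums) == 1:
        # The last row is forced: it must use up what is left in each column.
        if row_sums[0] == sum(col_sums):
            yield (tuple(col_sums),)
        return
    for row in bounded_compositions(row_sums[0], col_sums):
        remaining = [c - v for c, v in zip(col_sums, row)]
        for rest in fiber(row_sums[1:], remaining):
            yield (row,) + rest

tables = list(fiber([13, 21, 16], [14, 21, 15]))
print(len(tables))   # 10310
```

The count grows quickly with the table size and the total frequency, which is why sampling is preferred over enumeration.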


Markov chain over the fiber

Markov chain: walk around the fiber

Basic move: choose the following $z$ and add it to the current table

\[ z = \begin{array}{c|cc} & j & j' \\ \hline i & +1 & -1 \\ i' & -1 & +1 \end{array} \tag{4} \]

$Az = 0$: the marginal frequencies remain the same.

If a negative cell appears, then discard the move.

The move $z$ above corresponds to the binomial in (3):

\[ p_{ij}\, p_{i'j'} - p_{ij'}\, p_{i'j} = 0 \qquad \text{a move} \leftrightarrow \text{a binomial} \]
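The walk described above can be sketched as a short Metropolis sampler. The slide only says that a move is chosen and discarded when a negative cell appears; the acceptance step below, which targets the hypergeometric weight $1/\prod x_{ij}!$, is the standard Metropolis rule and is an assumption added here, as are the function names and the fixed random seed.

```python
import random
from math import factorial

def random_basic_move(I, J, rng):
    """Pick rows i != i2 and columns j != j2; return the basic move as cell -> +1 / -1."""
    i, i2 = rng.sample(range(I), 2)
    j, j2 = rng.sample(range(J), 2)
    return {(i, j): +1, (i2, j2): +1, (i, j2): -1, (i2, j): -1}

def metropolis_step(x, rng):
    """One Metropolis step targeting the weight 1 / prod x_ij! on the fiber of x."""
    move = random_basic_move(len(x), len(x[0]), rng)
    if any(x[i][j] + d < 0 for (i, j), d in move.items()):
        return x                                   # negative cell: discard the move
    # Acceptance ratio weight(new) / weight(old) = prod x_ij! / prod (x_ij + d)!
    ratio = 1.0
    for (i, j), d in move.items():
        ratio *= factorial(x[i][j]) / factorial(x[i][j] + d)
    if rng.random() < min(1.0, ratio):
        x = [row[:] for row in x]
        for (i, j), d in move.items():
            x[i][j] += d
    return x

rng = random.Random(0)             # fixed seed, an arbitrary choice
x = [[7, 5, 1], [5, 10, 6], [2, 6, 8]]
for _ in range(1000):
    x = metropolis_step(x, rng)

print(x)
print([sum(r) for r in x], [sum(c) for c in zip(*x)])
```

Every visited table keeps the row sums (13, 21, 16) and column sums (14, 21, 15), because each move changes two cells in a row or column by +1 and -1.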



Question: can we reach every table of the fiber by these moves, avoiding negative cells?

YES for two-way tables, independent of $I$, $J$.

A Markov basis: a set of moves that allows us to reach all tables of a fiber while avoiding negative cells (and this for all fibers).

For two-way tables, the moves in (4) form a Markov basis.

Fundamental theorem (again): Markov basis ↔ generators of a toric ideal


3. The case of three-way contingency tables

Consider the “no-three-factor interaction model” for three-way contingency tables.

All the line sums (three directions) of three-way tables are fixed:

[Figure: line sums of a three-way table in the three directions (example of a 3 × 4 × 6 table)]

“Higher Lawrence lifting” of the two-way independence model.


Markov bases for three-way tables

“Basic move” for this case:

[Figure: the 2 × 2 × 2 basic move]

However, these basic moves do not form a Markov basis (there are cases where we cannot avoid negative elements).

We need additional moves.

Markov bases become much more complicated.


Markov bases for 3 × 3 × K tables

A move of degree 6 is needed for 3 × 3 × 3:

[Figure: a move of degree 6 for the 3 × 3 × 3 table]

However, these moves are sufficient for all K ≥ 5! (Aoki and Takemura (2003))


Notation for I × J × K three-way tables

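$I \times J \times K$ three-way contingency table (e.g., frequencies of scores for Algebra, Statistics and Geometry)

\[ \begin{array}{ccc} x_{111} & \cdots & x_{11K} \\ \vdots & & \vdots \\ x_{1J1} & \cdots & x_{1JK} \end{array}
\quad \cdots \quad
\begin{array}{ccc} x_{I11} & \cdots & x_{I1K} \\ \vdots & & \vdots \\ x_{IJ1} & \cdots & x_{IJK} \end{array} \]

Line sums: two-dimensional marginal frequencies

\[ x_{ij+} = \sum_k x_{ijk} \qquad (x_{i+k},\ x_{+jk} \text{ similarly defined}) \]

Fiber: all contingency tables with the line sums $b$

\[ \mathcal{F}_b = \{ x = (x_{ijk}) \in \mathbb{N}^{I \times J \times K} \mid \{x_{ij+}, x_{i+k}, x_{+jk}\} = b \} \tag{5} \]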


Basic moves do not make a fiber connected

Consider the following fiber with I = J = K = 3

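\[ \mathcal{F} = \{ x = (x_{ijk}) \mid x_{ijk} \in \mathbb{N},\ x_{ij+} = x_{i+k} = x_{+jk} = a,\ 1 \le i, j, k \le 3 \} \]

The following element of the fiber is isolated.

\[ \begin{pmatrix} a & 0 & 0 \\ 0 & a & 0 \\ 0 & 0 & a \end{pmatrix}
\quad
\begin{pmatrix} 0 & a & 0 \\ 0 & 0 & a \\ a & 0 & 0 \end{pmatrix}
\quad
\begin{pmatrix} 0 & 0 & a \\ a & 0 & 0 \\ 0 & a & 0 \end{pmatrix} \]

($a$ can be arbitrarily large)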
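The isolation claim can be verified by trying every 2 × 2 × 2 basic move on this table. In the sketch below the three displayed slices are taken to be the slices $i = 1, 2, 3$ with rows $j$ and columns $k$ (an assumption about the layout; the pattern is symmetric in the three indices), indices are 0-based, and the value of `a` is arbitrary.

```python
from itertools import combinations, product

a = 5   # any positive integer; 5 is an arbitrary choice

# Cell (i, j, k) holds a exactly when k = i + j (mod 3), which reproduces the three slices.
x = {(i, j, k): (a if (k - i - j) % 3 == 0 else 0)
     for i, j, k in product(range(3), repeat=3)}

def basic_moves():
    """Yield all 2 x 2 x 2 basic moves (both signs) as dicts cell -> +1 / -1."""
    for (i, i2), (j, j2), (k, k2) in product(combinations(range(3), 2), repeat=3):
        plus = [(i, j, k), (i, j2, k2), (i2, j, k2), (i2, j2, k)]
        minus = [(i2, j, k), (i, j2, k), (i, j, k2), (i2, j2, k2)]
        for sign in (1, -1):
            z = {c: sign for c in plus}
            z.update({c: -sign for c in minus})
            yield z

# A move is applicable if adding it produces no negative cell.
applicable = [z for z in basic_moves()
              if all(x[c] + d >= 0 for c, d in z.items())]

line_sums = {sum(x[i, j, k] for k in range(3)) for i in range(3) for j in range(3)}
print(line_sums, len(applicable))   # {5} 0
```

No basic move can be applied, whatever the value of `a`, because each move subtracts 1 from at least one empty cell.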


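Configuration matrix for 2 × 2 × 2 case

\[ A = \begin{pmatrix}
1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 \\
1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 1 \\
1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 1
\end{pmatrix}
= \begin{pmatrix} E_I \otimes E_J \otimes 1_K^t \\ E_I \otimes 1_J^t \otimes E_K \\ 1_I^t \otimes E_J \otimes E_K \end{pmatrix},
\qquad (I = J = K = 2) \]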
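The three Kronecker blocks can be assembled in code and compared with the matrix above. This is a sketch with plain Python lists; the helper names are introduced here, and the cells are ordered (1,1,1), (1,1,2), (1,2,1), ..., (2,2,2).

```python
def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [[x * y for x in row_a for y in row_b]
            for row_a in a for row_b in b]

def identity(k):
    """E_k, the k x k identity matrix."""
    return [[1 if i == j else 0 for j in range(k)] for i in range(k)]

def ones_row(k):
    """1_k^t as a 1 x k matrix."""
    return [[1] * k]

def config_three_way(I, J, K):
    """Stack the blocks computing x_{ij+}, x_{i+k} and x_{+jk}."""
    E, one = identity, ones_row
    return (kron(kron(E(I), E(J)), one(K))
            + kron(kron(E(I), one(J)), E(K))
            + kron(kron(one(I), E(J)), E(K)))

A = config_three_way(2, 2, 2)

# The 2 x 2 x 2 basic move: +1 on cells 111, 122, 212, 221 and -1 on 112, 121, 211, 222.
z = [1, -1, -1, 1, -1, 1, 1, -1]
Az = [sum(a * v for a, v in zip(row, z)) for row in A]

for row in A:
    print(row)
print(Az)   # twelve zeros: the move keeps all line sums fixed
```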


No-three-factor interaction model

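Parametric model:

\[ p_{ijk} = \alpha_{ij}\, \beta_{jk}\, \gamma_{ik}. \tag{6} \]

Implicit specification: for $i \neq i'$, $j \neq j'$, $k \neq k'$

\[ 1 = \frac{p_{ijk}\, p_{ij'k'}\, p_{i'jk'}\, p_{i'j'k}}{p_{i'jk}\, p_{ij'k}\, p_{ijk'}\, p_{i'j'k'}} \qquad \text{ratios of odds ratios, log linear} \]

or

\[ p_{ijk}\, p_{ij'k'}\, p_{i'jk'}\, p_{i'j'k} - p_{i'jk}\, p_{ij'k}\, p_{ijk'}\, p_{i'j'k'} = 0 \qquad \text{binomial} \tag{7} \]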


Hypergeometric distribution for no-three-factor interaction model

As in the two-way case, the conditional distribution of the frequency vector $x$ given all the line sums is the hypergeometric distribution over the fiber in (5).

\[ p(x \mid \{x_{ij+}, x_{i+k}, x_{+jk}\} = b) \propto \frac{1}{\prod_{i,j,k} x_{ijk}!}, \qquad x \in \mathcal{F}_b \]


Move of degree 6 again

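\[ 1 = \frac{p_{111}\, p_{122}\, p_{212}\, p_{221}}{p_{112}\, p_{121}\, p_{211}\, p_{222}} \cdot \frac{p_{222}\, p_{233}\, p_{123}\, p_{132}}{p_{122}\, p_{133}\, p_{223}\, p_{232}} \]

In the rational form, this degree 6 move is written in terms of basic moves.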


Move of degree 6 again

However,

\[ p_{111}\, p_{212}\, p_{221}\, p_{233}\, p_{123}\, p_{132} - p_{112}\, p_{121}\, p_{211}\, p_{133}\, p_{223}\, p_{232} \]

cannot be written as a polynomial combination of the binomials in (7).

This suggests: avoiding negative cells ↔ using only polynomials
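The same phenomenon can be seen at the level of integer moves. In the sketch below the degree 6 move is the sum of the two basic moves appearing in the rational form, yet on the table given by its negative part neither basic move can be applied without creating a negative cell. Cells are written as strings such as "111"; the helper names are chosen here for illustration.

```python
def move(plus, minus):
    """Sparse integer vector on cells 'ijk': +1 per cell in plus, -1 per cell in minus."""
    z = {}
    for c in plus:
        z[c] = z.get(c, 0) + 1
    for c in minus:
        z[c] = z.get(c, 0) - 1
    return {c: v for c, v in z.items() if v != 0}

def add(u, v):
    """Cell-wise sum of two sparse integer vectors, dropping zeros."""
    out = dict(u)
    for c, w in v.items():
        out[c] = out.get(c, 0) + w
    return {c: w for c, w in out.items() if w != 0}

def line_sums_vanish(z):
    """True if the sums of z over k, over j and over i are all zero."""
    for axis in range(3):
        totals = {}
        for c, v in z.items():
            key = c[:axis] + c[axis + 1:]
            totals[key] = totals.get(key, 0) + v
        if any(totals.values()):
            return False
    return True

def applicable(x, z):
    """True if x + z has no negative cell."""
    return all(x.get(c, 0) + v >= 0 for c, v in z.items())

# The two basic moves read off from the two factors of the rational form.
z1 = move(["111", "122", "212", "221"], ["112", "121", "211", "222"])
z2 = move(["222", "233", "123", "132"], ["122", "133", "223", "232"])
# The degree 6 move read off from the binomial.
z6 = move(["111", "212", "221", "233", "123", "132"],
          ["112", "121", "211", "133", "223", "232"])

# Table equal to the negative part of z6.
x = {c: 1 for c, v in z6.items() if v < 0}

print(add(z1, z2) == z6, line_sums_vanish(z6))
print(applicable(x, z6), applicable(x, z1), applicable(x, z2))
```

The first line confirms that the degree 6 move is an integer combination of basic moves with all line sums zero; the second shows that it is applicable at `x` while neither of the two basic moves is.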


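A Gröbner basis for 3 × 3 × 3 case from Diaconis and Sturmfels (1998)

A Gröbner basis also contains

28 degree 7 moves

\[ \begin{pmatrix} 0 & 0 & 0 \\ 0 & -1 & +1 \\ 0 & +1 & -1 \end{pmatrix}
\quad
\begin{pmatrix} +1 & 0 & -1 \\ -1 & +1 & 0 \\ 0 & -1 & +1 \end{pmatrix}
\quad
\begin{pmatrix} -1 & 0 & +1 \\ +1 & 0 & -1 \\ 0 & 0 & 0 \end{pmatrix} \]

Just one degree 9 move

\[ \begin{pmatrix} -2 & +1 & +1 \\ +1 & 0 & -1 \\ +1 & -1 & 0 \end{pmatrix}
\quad
\begin{pmatrix} +1 & 0 & -1 \\ 0 & 0 & 0 \\ -1 & 0 & +1 \end{pmatrix}
\quad
\begin{pmatrix} +1 & -1 & 0 \\ -1 & 0 & +1 \\ 0 & +1 & -1 \end{pmatrix} \]

Aoki showed me this move. I was very much surprised and got interested in Markov bases in 2001.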
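Both arrays can be checked mechanically. The sketch below reads each move as three 3 × 3 slices placed side by side, so that each printed row of 9 numbers is one row across the three slices (an assumption about the flattened layout), and verifies that all line sums vanish and that the degrees are 7 and 9.

```python
# Each move is stored as three 3 x 3 slices (assumed layout: slices side by side).
deg7 = [[[0, 0, 0], [0, -1, 1], [0, 1, -1]],
        [[1, 0, -1], [-1, 1, 0], [0, -1, 1]],
        [[-1, 0, 1], [1, 0, -1], [0, 0, 0]]]
deg9 = [[[-2, 1, 1], [1, 0, -1], [1, -1, 0]],
        [[1, 0, -1], [0, 0, 0], [-1, 0, 1]],
        [[1, -1, 0], [-1, 0, 1], [0, 1, -1]]]

def is_move(z):
    """True if every line sum of the 3 x 3 x 3 array z, in all three directions, is zero."""
    r = range(3)
    return (all(sum(z[s][a][b] for s in r) == 0 for a in r for b in r)
            and all(sum(z[s][a][b] for a in r) == 0 for s in r for b in r)
            and all(sum(z[s][a][b] for b in r) == 0 for s in r for a in r))

def degree(z):
    """Sum of the positive entries of z."""
    return sum(v for slab in z for row in slab for v in row if v > 0)

print(is_move(deg7), degree(deg7))   # True 7
print(is_move(deg9), degree(deg9))   # True 9
```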


A Gröbner basis for 3 × 3 × 3 case from Diaconis and Sturmfels (1998)

Picture of the degree 7 move

We can use either one of the two basic moves to avoid −1 at the black cell. Hence this move is not needed for connectivity of fibers.


Markov bases for I × J × K tables and multiway tables

Markov bases for the no-three-factor interaction model of general $I \times J \times K$ tables are very complicated.

For multiway tables (four-way or higher), we can fix various marginal totals.

These correspond to “hierarchical models” of contingency tables.


4. Toric models and fundamental theorem of Markov bases

We have been considering contingency tables because of their importance in statistics.

Markov bases are defined for general “toric models”.

Toric models are “exponential family models over a finite sample space with sufficient statistics consisting of integers”.

Let $\Omega = \{\omega_1, \dots, \omega_\nu\}$ be a finite set of possible outcomes.

$\Omega$ is called the sample space.

In contingency tables, each cell is an outcome and $\Omega$ is a direct product of finite sets.

$I \times J \times K$ contingency table: $\Omega = [I] \times [J] \times [K]$


Toric models and fundamental theorem of Markov bases

Let $p(\omega)$, $\omega \in \Omega$, denote the probability of observing $\omega$ in one draw from $\Omega$.

Suppose that each $\omega$ is characterized by $d$ integers (often indicators, taking values 0 or 1 only)

\[ a_{1\omega}, a_{2\omega}, \dots, a_{d\omega} \]

Let $q_1, \dots, q_d$ be non-negative real parameters of a model.

A toric model with the configuration matrix $A = (a_{i\omega})_{i=1,\dots,d,\ \omega \in \Omega}$ (a $d \times \nu$ integer matrix) specifies the probability $p(\omega)$ as

\[ p(\omega) = q_1^{a_{1\omega}}\, q_2^{a_{2\omega}} \cdots q_d^{a_{d\omega}}. \tag{8} \]

This is the parametric form.



How do we obtain an implicit specification in the binomial form from the parametric form?

→ Obtain a set of generators of a toric ideal (usually by Gröbner basis computation).

$k$: a field

$k[p(\omega), \omega \in \Omega]$: the polynomial ring in the indeterminates $p(\omega)$, $\omega \in \Omega$.

$k[q_1, \dots, q_d]$: the polynomial ring in $q_1, \dots, q_d$.

A homomorphism from $k[p(\omega), \omega \in \Omega]$ to $k[q_1, \dots, q_d]$:

\[ \pi_A : p(\omega) \mapsto \prod_{i=1}^{d} q_i^{a_{i\omega}}. \]

$I_A = \ker \pi_A$: the toric ideal

We consider sets of generators of $I_A$.
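To make the map from parameters to cell values concrete, the sketch below evaluates the monomial parametrization for the 2 × 2 independence configuration and checks that the binomial attached to a move vanishes on its image, while a binomial attached to a vector outside the kernel of $A$ does not. The parameter values are arbitrary and the resulting values are not normalized to sum to 1; the function names are chosen here.

```python
from fractions import Fraction
from math import prod

# Configuration of the 2 x 2 independence model; columns are the cells 11, 12, 21, 22.
A = [[1, 1, 0, 0],
     [0, 0, 1, 1],
     [1, 0, 1, 0],
     [0, 1, 0, 1]]

def pi_A(q):
    """Monomial parametrization: p(omega) = prod_i q_i ** a_{i omega}."""
    return [prod(qi ** row[w] for qi, row in zip(q, A)) for w in range(len(A[0]))]

def binomial(z, p):
    """Evaluate p^{z+} - p^{z-} for an integer vector z."""
    plus = prod(pw ** max(zw, 0) for pw, zw in zip(p, z))
    minus = prod(pw ** max(-zw, 0) for pw, zw in zip(p, z))
    return plus - minus

q = [Fraction(2), Fraction(3), Fraction(5), Fraction(7)]   # arbitrary parameter values
p = pi_A(q)

z = [1, -1, -1, 1]       # satisfies A z = 0
w = [1, 0, 0, -1]        # does not satisfy A w = 0

print(p)                 # values 10, 14, 15, 21
print(binomial(z, p))    # 0
print(binomial(w, p))    # non-zero
```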


Classical log linear model view (a side remark)

Consider the vector of the logarithms of the cell probabilities.

Our model specifies that this vector lies in the row space of $A$ (“log linear model”).

The parametric form expresses the row space of $A$ by a (linear algebraic) basis.

The implicit specification specifies the kernel of $A$.

They are orthogonal complements of each other.
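The orthogonality can be spelled out in one line. Taking logarithms in the parametric form (8), assuming all $q_i > 0$, and pairing with an integer vector $z$ such that $Az = 0$, written as $z = z^+ - z^-$ with non-negative parts (notation introduced here for illustration), gives:

```latex
\log p(\omega) = \sum_{i=1}^{d} a_{i\omega} \log q_i
\quad\Longleftrightarrow\quad
\log p = A^{t} \log q,
\qquad
z^{t} \log p = (Az)^{t} \log q = 0
\quad\Longrightarrow\quad
\prod_{\omega} p(\omega)^{z^{+}(\omega)} = \prod_{\omega} p(\omega)^{z^{-}(\omega)} .
```

So every vector in the kernel of $A$ yields a binomial relation among the cell probabilities, which is the implicit specification.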



Moves and Markov bases

Denote $\mathbb{N}A = \{Ax \mid x \in \mathbb{N}^\nu\}$. For $b \in \mathbb{N}A$, the fiber $\mathcal{F}_b$ is defined as

\[ \mathcal{F}_b = \{ x \in \mathbb{N}^\nu \mid b = Ax \}. \]

In statistical terminology $b = Ax$ is the “sufficient statistic” of the frequency vector $x$ for the toric model (8).

Moves are elements of the integer kernel of $A$:

\[ \ker_{\mathbb{Z}} A = \ker A \cap \mathbb{Z}^\nu. \]

Adding a move $z \in \ker_{\mathbb{Z}} A$ to a frequency vector $x$ does not change the sufficient statistic:

\[ A(x + z) = Ax. \]



Moves and Markov bases

Let B ⊂ kerZ A be a finite set of moves.

For $x, y$ in the same fiber, we draw an edge between them if

\[ x - y \in B \quad \text{or} \quad y - x \in B. \]

Then each fiber can be considered as a graph.

$B$ is called a Markov basis if every fiber becomes a connected graph.

I.e., by a Markov basis we can walk around all over every fiber by adding or subtracting moves from $B$, avoiding negative cells.

Fundamental theorem of Markov bases: a Markov basis ↔ a set of generators of a toric ideal
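The graph definition translates directly into a breadth-first search. The sketch below checks connectivity of one small two-way fiber, first with all basic moves and then with a single move only. It tests a single fiber, whereas a Markov basis must connect every fiber; the margins and names are illustrative choices made here.

```python
from collections import deque
from itertools import combinations, product

I = J = 3
margins = ((2, 2, 2), (2, 2, 2))   # illustrative row and column sums (an assumption)

def margins_of(x):
    """Row and column sums of a table flattened row by row."""
    rows = tuple(sum(x[i * J:(i + 1) * J]) for i in range(I))
    cols = tuple(sum(x[j::J]) for j in range(J))
    return rows, cols

# Brute-force fiber: every cell is at most 2 because every margin equals 2.
fiber = {x for x in product(range(3), repeat=I * J) if margins_of(x) == margins}

# Basic moves of the two-way table, flattened; B holds one sign of each move.
B = []
for (i, i2), (j, j2) in product(combinations(range(I), 2), combinations(range(J), 2)):
    z = [0] * (I * J)
    z[i * J + j] = z[i2 * J + j2] = 1
    z[i * J + j2] = z[i2 * J + j] = -1
    B.append(tuple(z))

def is_connected(fiber, moves):
    """Breadth-first search on the graph with an edge whenever x - y is +/- a move."""
    start = next(iter(fiber))
    seen, queue = {start}, deque([start])
    while queue:
        x = queue.popleft()
        for z in moves:
            for sign in (1, -1):
                y = tuple(a + sign * b for a, b in zip(x, z))
                if y in fiber and y not in seen:   # membership rules out negative cells
                    seen.add(y)
                    queue.append(y)
    return len(seen) == len(fiber)

print(len(fiber), is_connected(fiber, B))   # 21 True
print(is_connected(fiber, B[:1]))           # False
```

With only one move the third row can never change, so the graph falls apart into several components.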



Now is the time for advertising our book again.

Section 4.4 of our book contains a 4-page proof of this theorem, because one direction is easy but the other direction is not obvious.



One of my colleagues told me that he failed to give a proof of this theorem in front of students, which was embarrassing.

“MB ⇒ set of generators” is easy to see.

“Set of generators ⇒ MB” is not obvious.

We need to express a binomial $p^x - p^y$, $x, y \in \mathcal{F}_b$, not only as a polynomial combination of generators, but as a monomial combination of generators with all coefficients equal to 1.


5. Some results on complexity of Markov bases

In Aoki and Takemura (2003) we showed that for the case of 3 × 3 × K tables, as K becomes large, there is a certain bound of complexity.

If $I$, $J$ are fixed and only $K$ is increased, then a finiteness result (an upper bound for complexity) holds.

Many notions of complexity of Markov bases have been defined so far.


Some results on complexity of Markov bases

Santos and Sturmfels (2003): general result on “Graver complexity” for higher Lawrence lifting

Kudo and Takemura (2012): a lower bound for Graver complexity for $I \times J \times K$ tables

Yamaguchi, Ogawa and Takemura (2013): the conjecture of Diaconis and Eriksson (2006) on the complexity of Markov bases for the Birkhoff polytope is solved.

Koyama, Ogawa and Takemura (2014) “Markov degree of configurations defined by fibers of a configuration”, arXiv:1405.2676, generalizes the result of Yamaguchi et al.


6. Generalized hypergeometric distribution

For the exact test, we considered the hypergeometric distribution as the null distribution

\[ p(x \mid Ax = b) \propto \frac{1}{\prod_{\omega \in \Omega} x(\omega)!} \]

This is the null model.

Under the alternative hypothesis, we need to consider an exponential family with respect to this null distribution.

\[ p(x \mid Ax = b, \theta) \propto \frac{\theta^x}{\prod_{\omega \in \Omega} x(\omega)!} \]

This distribution is called the generalized hypergeometric distribution.


Generalized hypergeometric distribution

The maximum likelihood estimator for this distribution is called the conditional maximum likelihood estimator. It is important in a setting of infinitely many nuisance parameters (“Neyman-Scott problem”).

For the conditional MLE we need the normalizing constant of this distribution

\[ Z(\theta) = \sum_{x \in \mathcal{F}_b} \frac{\theta^x}{\prod_{\omega \in \Omega} x(\omega)!}. \]

In Chapter 6 of the Gröbner Dojo book, Nobuki Takayama explains that $Z(\theta)$ satisfies an A-hypergeometric system.
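For very small fibers the normalizing constant can be computed by direct summation, which makes the definition concrete. The sketch below does this for a 2 × 2 table, reading $\theta^x$ as $\prod_\omega \theta(\omega)^{x(\omega)}$; the margins and the alternative value of $\theta$ are hypothetical choices for illustration, and this brute-force sum is not the differential-equation approach mentioned above.

```python
from fractions import Fraction
from math import factorial, prod

def fiber_2x2(r1, r2, c1, c2):
    """All 2 x 2 tables (x11, x12, x21, x22) with row sums r1, r2 and column sums c1, c2."""
    lo, hi = max(0, c1 - r2), min(r1, c1)
    return [(t, r1 - t, c1 - t, r2 - c1 + t) for t in range(lo, hi + 1)]

def term(x, theta):
    """theta^x / prod_omega x(omega)!  with  theta^x = prod_omega theta(omega)^x(omega)."""
    numerator = prod(Fraction(th) ** v for th, v in zip(theta, x))
    return numerator / prod(factorial(v) for v in x)

def Z(theta, fiber):
    """Normalizing constant: sum of the terms over the fiber."""
    return sum(term(x, theta) for x in fiber)

fiber = fiber_2x2(3, 2, 2, 3)
theta_null = (1, 1, 1, 1)      # null model: ordinary hypergeometric distribution
theta_alt = (2, 1, 1, 1)       # hypothetical alternative

probs_null = [term(x, theta_null) / Z(theta_null, fiber) for x in fiber]
probs_alt = [term(x, theta_alt) / Z(theta_alt, fiber) for x in fiber]

print(fiber)
print(probs_null)   # 1/10, 3/5, 3/10
print(probs_alt)
```

Under `theta_null` the probabilities coincide with the classical hypergeometric probabilities of the upper-left cell; changing $\theta$ tilts the distribution over the same fiber.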


Generalized hypergeometric distribution

This means that partial differential equations satisfied by $Z(\theta)$ are explicitly known.

In our current research, it is not straightforward to use the differential equations in a numerically fast and stable manner.

Also the singularity of the partial differential equations is serious and causes numerical difficulties in obtaining the MLE.


Summary

I talked about two-way and three-way contingency tables.

I presented the general form of the toric model.

I presented more recent results on complexity of Markov bases and mentioned current research on the generalized hypergeometric distribution.

Thank you for your attention!

