Markov degree of the Birkhoff model
Akimichi Takemura (1, 2)
joint with T. Yamaguchi (1) and M. Ogawa (1)
(1) University of Tokyo
(2) JST CREST
January, 2014 at CASTA2014
Outline
1 The Birkhoff model
2 Markov degree
3 (n, r)-Birkhoff model
4 Main theorem and idea of our proof
  Latin squares
  Sketch of our proof
Reference
“Markov degree of the Birkhoff model”
T. Yamaguchi, M. Ogawa and A. Takemura
Journal of Algebraic Combinatorics
DOI: 10.1007/s10801-013-0488-z
Definition of the Birkhoff model
Voters are asked to rank n candidates (“total ranking”).
Let $S_n = \{\sigma_1, \dots, \sigma_{n!}\}$ be the set of possible votes.
$\sigma = (\sigma(1), \dots, \sigma(n)) \in S_n$, where $\sigma(j)$ denotes the candidate ranked at the jth position.
$p_\sigma$ denotes the probability of observing $\sigma \in S_n$.
The Birkhoff model
$$\log p_\sigma = \psi_0 + \sum_{j=1}^{n} \psi_{j\sigma(j)}$$
Definition of the Birkhoff model (cont.)
The Birkhoff model
$$\log p_\sigma = \psi_0 + \sum_{j=1}^{n} \psi_{j\sigma(j)}$$
If $\psi_{jk}$ is large, candidate k is likely to be ranked at the jth position.
The sufficient statistic consists of the numbers of voters who rank candidate k at the jth position, for all pairs (j, k).
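As an illustration of the sufficient statistic, the following minimal Python sketch counts, for each pair (position j, candidate k), the voters who put k at position j. The function name `sufficient_statistic` and the four votes are hypothetical and serve only as an example.

```python
from collections import Counter

def sufficient_statistic(votes):
    """Count, for every (position j, candidate k), the votes with sigma(j) == k.

    votes: iterable of tuples; vote[j - 1] is the candidate ranked at position j.
    Returns a Counter keyed by (j, k) with 1-based positions.
    """
    counts = Counter()
    for vote in votes:
        for j, k in enumerate(vote, start=1):
            counts[(j, k)] += 1
    return counts

# Hypothetical data: four voters ranking candidates 1, 2, 3.
votes = [(1, 2, 3), (1, 3, 2), (2, 1, 3), (1, 2, 3)]
t = sufficient_statistic(votes)
print(t[(1, 1)])  # voters who rank candidate 1 first → 3
```

Two datasets with the same counts `t` are indistinguishable to the model, which is why the test is carried out conditionally on `t`.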
Hypothesis testing
Does the Birkhoff model fit the observed data?
Conditional test based on the chi-square statistic.
The size of the fiber $|F_t|$ is huge.
⇒ Metropolis–Hastings algorithm with Markov bases (Diaconis & Sturmfels 1998)
Markov basis
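Markov basis
x: vector of frequencies
t: the sufficient statistic for x
A: configuration matrix satisfying t = Ax
$F_t = \{x \in \mathbb{Z}_{\ge 0}^{n!} : Ax = t\}$: the fiber of t
$B \subset \ker A \cap \mathbb{Z}^{n!}$ is a Markov basis ⇔ for every t and all $x, y \in F_t$, there exist $M > 0$, $z_i \in B$ and $\epsilon_i \in \{-1, 1\}$, $i = 1, \dots, M$, such that
$$y = x + \sum_{i=1}^{M} \epsilon_i z_i, \qquad x + \sum_{i=1}^{m} \epsilon_i z_i \in F_t \quad (1 \le m \le M).$$
If we have a Markov basis, then we can construct a connected Markov chain on each fiber $F_t$.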
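A minimal Python sketch of such a chain for n = 3 follows. It assumes multinomial sampling, so that the conditional distribution on the fiber is proportional to $1 / \prod_\sigma x_\sigma!$, and it uses the single degree-three move for n = 3 (even permutations +1, odd permutations −1). The names `MOVES`, `log_weight` and `metropolis_step`, the starting vector and the seed are illustrative assumptions, not taken from the talk.

```python
import math
import random

# Columns ordered as (123), (132), (213), (231), (312), (321).
# For n = 3 the minimal Markov basis consists of one move of degree three.
MOVES = [(1, -1, -1, 1, 1, -1)]

def log_weight(x):
    """Log of the unnormalized conditional weight 1 / prod(x_i!)."""
    return -sum(math.lgamma(v + 1) for v in x)

def metropolis_step(x, moves, rng):
    """One Metropolis-Hastings step on the fiber containing x."""
    z = rng.choice(moves)
    eps = rng.choice((-1, 1))
    y = tuple(xi + eps * zi for xi, zi in zip(x, z))
    if min(y) < 0:
        return x  # the proposal leaves the fiber, so the chain stays put
    accept = math.exp(min(0.0, log_weight(y) - log_weight(x)))
    return y if rng.random() < accept else x

# Hypothetical starting point: observed frequencies of the six votes.
start = (2, 1, 1, 2, 2, 1)
rng = random.Random(0)
x = start
for _ in range(1000):
    x = metropolis_step(x, MOVES, rng)
```

Every state visited differs from `start` by an integer multiple of the move, so the sufficient statistic never changes along the chain.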
Configuration matrix of the Birkhoff model
Configuration matrix A of the Birkhoff model for n = 3:

        (123) (132) (213) (231) (312) (321)
(1, 1)    1     1     0     0     0     0
(1, 2)    0     0     1     1     0     0
(1, 3)    0     0     0     0     1     1
(2, 1)    0     0     1     0     1     0
(2, 2)    1     0     0     0     0     1
(2, 3)    0     1     0     1     0     0
(3, 1)    0     0     0     1     0     1
(3, 2)    0     1     0     0     1     0
(3, 3)    1     0     1     0     0     0

The columns are labeled by $\sigma \in S_3$ and the rows are labeled by (j, k) = (position, candidate).
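The matrix for n = 3 can be rebuilt programmatically. The Python sketch below, with hypothetical helper names `configuration_matrix` and `apply_matrix`, constructs A, computes t = Ax for an illustrative frequency vector, and checks that the alternating-sign vector z lies in the kernel of A.

```python
from itertools import permutations

def configuration_matrix(n):
    """Build A for full rankings of n candidates.

    Rows are labeled (j, k) = (position, candidate); columns are the
    permutations of 1..n in lexicographic order.
    """
    sigmas = list(permutations(range(1, n + 1)))
    labels = [(j, k) for j in range(1, n + 1) for k in range(1, n + 1)]
    A = [[int(sigma[j - 1] == k) for sigma in sigmas] for (j, k) in labels]
    return labels, sigmas, A

def apply_matrix(A, x):
    """Return the matrix-vector product A x as a list."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]

labels, sigmas, A = configuration_matrix(3)

# Hypothetical frequency vector: how many voters cast each of the six votes.
x = [2, 1, 1, 2, 2, 1]
t = apply_matrix(A, x)

# Even permutations get +1, odd permutations get -1.
z = [1, -1, -1, 1, 1, -1]
print(apply_matrix(A, z))  # all zeros, so x and x + z have the same t
```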
Markov degree
Degree of moves
Degree of $z = (z_\sigma) \in B$:
$$\deg z = \sum_{\sigma:\, z_\sigma > 0} z_\sigma = \sum_{\sigma:\, z_\sigma < 0} (-z_\sigma) = \|z\|_1 / 2$$
Markov degree
Markov degree = maximum degree of the moves in a minimal Markov basis
Conjecture of Diaconis & Eriksson (2006)
The Markov degree of the Birkhoff model is three (i.e. the toric ideal is generated by binomials of degree at most three, but not by binomials of degree at most two).
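The definitions of degree and Markov degree translate directly into code. In this small Python sketch the function names `degree` and `markov_degree` are hypothetical; the example basis is the single move for n = 3, written in the column order (123), (132), (213), (231), (312), (321).

```python
def degree(z):
    """Degree of an integer move z: the sum of its positive entries."""
    positive = sum(v for v in z if v > 0)
    negative = sum(-v for v in z if v < 0)
    if positive != negative:
        # Moves lie in ker A, and the rows of A for position 1 add up to the
        # all-ones vector, so the entries of a move must sum to zero.
        raise ValueError("entries of a move must sum to zero")
    return positive

def markov_degree(basis):
    """Maximum degree over the moves of a (minimal) Markov basis."""
    return max(degree(z) for z in basis)

basis_n3 = [[1, -1, -1, 1, 1, -1]]
print(markov_degree(basis_n3))  # → 3
```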
Size of the Markov bases for the Birkhoff model
Let n be the number of candidates. The table shows the number of moves of each degree in the minimal Markov basis for the Birkhoff model. (Diaconis & Eriksson 2006)

n    deg.2    deg.3      deg.4   deg.5   deg.6
3    0        1          0       0       0
4    18       160        0       0       0
5    1050     28840      0       0       0
6    57510    7056240    0       0       0

We know many examples of configurations whose Markov degree is two. However, not many configurations with Markov degree three are known.
(n, r)-Birkhoff model
Consider an election in which each voter is asked to choose r (≤ n) preferred candidates and to rank them (“partial ranking”).
For this case we define the (n, r)-Birkhoff model.
The set of possible votes is $S_{n,r} = \{\sigma_1, \dots, \sigma_{n!/(n-r)!}\}$, with
$\sigma = (\sigma(1), \dots, \sigma(r)) \in S_{n,r}$,
where $\sigma(j)$ denotes the candidate at the jth position.
(n, r)-Birkhoff model (cont.)
$p_\sigma$: the probability of observing $\sigma \in S_{n,r}$.
(n, r)-Birkhoff model
$$\log p_\sigma = \psi_0 + \sum_{j=1}^{r} \psi_{j\sigma(j)}$$
The sufficient statistic consists of the numbers of voters who rank candidate k at the jth position (the same as in the Birkhoff model).
Configuration of the (n, r)-Birkhoff model
The configuration matrix A of the (3, 2)-Birkhoff model is

        (12)  (13)  (21)  (23)  (31)  (32)
(1, 1)    1     1     0     0     0     0
(1, 2)    0     0     1     1     0     0
(1, 3)    0     0     0     0     1     1
(2, 1)    0     0     1     0     1     0
(2, 2)    1     0     0     0     0     1
(2, 3)    0     1     0     1     0     0

The columns are labeled by $\sigma \in S_{3,2}$ and the rows are labeled by (j, k) = (position, candidate).
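For partial rankings the same construction applies with r positions instead of n. The Python sketch below, whose function name `partial_configuration_matrix` is hypothetical, enumerates $S_{n,r}$ with `itertools.permutations(..., r)` and confirms that the number of columns is n!/(n−r)!.

```python
from itertools import permutations
from math import factorial

def partial_configuration_matrix(n, r):
    """Build A for the (n, r)-Birkhoff model.

    Columns are the ordered selections of r out of n candidates in
    lexicographic order; rows are labeled (j, k) with 1 <= j <= r, 1 <= k <= n.
    """
    votes = list(permutations(range(1, n + 1), r))
    labels = [(j, k) for j in range(1, r + 1) for k in range(1, n + 1)]
    A = [[int(vote[j - 1] == k) for vote in votes] for (j, k) in labels]
    return labels, votes, A

labels, votes, A = partial_configuration_matrix(3, 2)
assert len(votes) == factorial(3) // factorial(3 - 2)  # n!/(n-r)! columns
for label, row in zip(labels, A):
    print(label, row)
```

With r = n the function enumerates all of $S_n$, so the full-ranking model is the special case (n, n).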
Size of the Markov bases
The number of moves of each degree in the minimal Markov bases for the (n, r)-Birkhoff model:

n   r   deg.2    deg.3    deg.4 or more
3   2   0        1        0
4   2   6        4        0
4   3   18       160      0
5   2   30       10       0
5   3   360      1000     0
5   4   1050     28840    0
6   2   90       20       0
6   3   2160     3680     0
7   2   210      35       0
7   3   8190     10325    0
8   2   420      56       0
8   3   23940    24416    0
Main result
Theorem
For n ≥ 3, r ≥ 2, the Markov degree of the (n, r)-Birkhoff model is three.
Our proof is based on the proof of Jacobson & Matthews (1998) for Latin squares.
Latin squares
A Latin square contains every symbol exactly once in each row and each column.

a b c d e
b c d e a
c d e a b
d e a b c
e a b c d

Every Latin square can be generated by swap operations among at most three rows. (Jacobson & Matthews 1998)
Example of swap operation
If we swap a in the (1, 1) entry and b in the (2, 1) entry, the resulting table is not a Latin square.

a b c d e        b b c d e
b c d e a        a c d e a
c d e a b   →    c d e a b
d e a b c        d e a b c
e a b c d        e a b c d
Representation of the dataset
N voters rank r preferred candidates chosen from n candidates. The observed dataset is represented by an N × r matrix:
$$\begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1r} \\ x_{21} & x_{22} & \cdots & x_{2r} \\ \vdots & \vdots & & \vdots \\ x_{N1} & x_{N2} & \cdots & x_{Nr} \end{pmatrix}$$
$x_{ij}$ denotes the candidate at the jth position in the ith vote (row). Note that every candidate is ranked at most once in each vote.
Denote the sufficient statistic by t and the fiber by $F_t$.
Operation on the fiber $F_t$
Every element of $F_t$ can be obtained from any other element by swapping candidates within the same column.
Each swap corresponds to a move.
To prove that the Markov degree is three, it is enough to show that swap operations involving at most three rows suffice to generate all elements of $F_t$.
Example of the operation on $F_t$
We are not allowed to stop the operation while some row contains the same candidate more than once. We have to somehow resolve the “collision”.

a b c        b b c
b c d   →    a c d
c d e        c d e
d e a        d e a
Example of the operation on $F_t$
The following is an example of a possible operation:

a b c        b d c
b c d   →    a c d
c d e        c b e
d e a        d e a
Example of the operation on $F_t$
This operation involves four rows at the same time.

a b c        d b c
b c d   →    a c d
c d e        b d e
d e a        c e a
Example of the operation on $F_t$
In this operation, each swap involves at most three rows.

a b c        d b c        d b c
b c d   →    b c d   →    a c d
c d e        a d e        b d e
d e a        c e a        c e a
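The examples on the last few slides can be checked mechanically. The following Python sketch writes each vote as a string and uses three hypothetical helpers (`is_proper`, `same_fiber`, `rows_changed`) to confirm that every intermediate dataset stays in the fiber and that the four-row operation splits into two steps touching at most three rows each.

```python
def is_proper(data):
    """True if every row (vote) ranks each candidate at most once."""
    return all(len(set(row)) == len(row) for row in data)

def same_fiber(x, y):
    """True if each column holds the same multiset of candidates in x and y."""
    return len(x) == len(y) and all(
        sorted(cx) == sorted(cy) for cx, cy in zip(zip(*x), zip(*y))
    )

def rows_changed(x, y):
    """Number of rows that differ between two datasets of equal size."""
    return sum(1 for rx, ry in zip(x, y) if rx != ry)

start = ["abc", "bcd", "cde", "dea"]
middle = ["dbc", "bcd", "ade", "cea"]     # after the first step
end = ["dbc", "acd", "bde", "cea"]        # after the second step
collision = ["bbc", "acd", "cde", "dea"]  # naive swap of a and b in column 1

print(rows_changed(start, end))     # → 4
print(rows_changed(start, middle))  # → 3
print(rows_changed(middle, end))    # → 2
print(is_proper(collision))         # → False
```

The collision dataset has the correct column counts but is not a valid dataset, which is the situation the improper matrices of the proof are designed to handle.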
Improper dataset (needed only for the proof)
Let $\bar{F}_t$ be the set of N × r matrices obtained by adding to $F_t$ the matrices with one element of the form a + b − c. ($F_t \subset \bar{F}_t$)
We call a + b − c an improper element.
a + b − c is understood as a and b each appearing once and c appearing −1 times. In total, every row must contain each candidate either zero times or once.
We call a matrix in $\bar{F}_t \setminus F_t$ an improper matrix and a matrix in $F_t$ a proper matrix.
Example of an improper matrix
The following matrix is an example of an improper matrix in the case of N = 4, r = 3, n = 5. The set of candidates is {a, b, c, d, e}.

d          b  c
a + b − d  c  d
c          d  e
d          e  a
Operation on $\bar{F}_t$
Consider operations on $\bar{F}_t$: within one column we add +(a − b) to one row and −(a − b) to another row, which preserves the sufficient statistic.
For $\bar{F}_t$, we only consider operations involving two rows at a time.

a b c                  d          b  c
b c d     ±(a−d)       a + b − d  c  d
c d e       →          c          d  e
d e a                  d          e  a
Resolvable pair
If there is an improper element a + b − c, then c also appears in another row of the same column.
In this case, the improper matrix can be transformed into a proper matrix by an operation on these two rows.

d          b  c     ±(a−d)     a  b  c
a + b − d  c  d       →        b  c  d

We call the pair of rows above a resolvable pair.
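To make the bookkeeping with improper elements concrete, here is a Python sketch that stores each cell as a dictionary from candidate to signed multiplicity, so that a + b − d becomes `{"a": 1, "b": 1, "d": -1}`. The representation and all function names (`from_strings`, `apply_move`, `row_is_valid`, `is_proper`) are assumptions made for illustration, not the authors' implementation.

```python
def from_strings(rows):
    """Turn rows such as "abc" into lists of single-candidate cells."""
    return [[{c: 1} for c in row] for row in rows]

def add_to_cell(cell, candidate, amount):
    """Return a copy of cell with amount added to candidate's multiplicity."""
    new = dict(cell)
    new[candidate] = new.get(candidate, 0) + amount
    if new[candidate] == 0:
        del new[candidate]
    return new

def apply_move(data, col, row_plus, row_minus, a, b):
    """Add (a - b) to data[row_plus][col] and -(a - b) to data[row_minus][col]."""
    new = [list(row) for row in data]
    new[row_plus][col] = add_to_cell(add_to_cell(new[row_plus][col], a, 1), b, -1)
    new[row_minus][col] = add_to_cell(add_to_cell(new[row_minus][col], a, -1), b, 1)
    return new

def row_is_valid(row):
    """Each candidate appears zero times or once in total within the row."""
    totals = {}
    for cell in row:
        for cand, mult in cell.items():
            totals[cand] = totals.get(cand, 0) + mult
    return all(v in (0, 1) for v in totals.values())

def is_proper(data):
    """True if every cell holds exactly one candidate with multiplicity one."""
    return all(list(cell.values()) == [1] for row in data for cell in row)

proper = from_strings(["abc", "bcd"])
# Add (a - d) to row 2 and -(a - d) to row 1, both in column 1.
improper = apply_move(proper, 0, row_plus=1, row_minus=0, a="a", b="d")
# Resolve the pair: add (a - d) to row 1 and -(a - d) to row 2.
back = apply_move(improper, 0, row_plus=0, row_minus=1, a="a", b="d")
print(improper[1][0])  # the improper element a + b - d
print(back == proper)  # → True
```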
Lemma
We call a pair of votes R of two improper matrices I, I′ a compatible pair if there exists a common resolvable pair R′ of I and I′ such that |R ∪ R′| ≤ 3.
Lemma
All elements of $F_t$ can be generated by the operations on the compatible pairs.
Sketch of the proof
For two proper matrices $P, P' \in F_t$, we can transform P to P′ by the operations on the compatible pairs.
Decompose the process from P to P′ into segments, each consisting of transformations from one proper dataset to another proper dataset:
$$P_1 \longleftrightarrow I_1 \longleftrightarrow \cdots \longleftrightarrow I_j \longleftrightarrow I_{j+1} \longleftrightarrow \cdots \longleftrightarrow I_m \longleftrightarrow P_m$$
Each $\longleftrightarrow$ denotes an operation on two rows.
Sketch of the proof (cont.)
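For each pair of improper matrices $I_j, I_{j+1}$, there exist proper matrices $P_j, P'_j, P'_{j+1}$ satisfying
$$P_j \longleftrightarrow I_j \longleftrightarrow I_{j+1} \longleftrightarrow P'_{j+1} \qquad (1)$$
$$P'_j \longleftrightarrow I_j \longleftrightarrow P_j \qquad (2)$$
$P_j, P'_j, P'_{j+1}$ are inserted in order to avoid improper matrices temporarily.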
Sketch of the proof (cont.)
The operations in (2) involve three rows in total, since both of the operations $P'_j \longleftrightarrow I_j$ and $I_j \longleftrightarrow P_j$ involve a common improper element.
The transformation in (1) is achieved by operations involving three rows, because:
By the compatibility, the operation $I_j \longleftrightarrow I_{j+1}$ involves one of the rows of a resolvable pair.
By the operations on the resolvable pairs, $I_j$ and $I_{j+1}$ can be transformed to proper matrices $P_j, P'_{j+1}$.
Sketch of the proof (cont.)
The transformations from $P_j$ to $P'_{j+1}$ and from $P'_j$ to $P_j$ can each be performed by operations on $F_t$ involving at most three rows.
This proves the theorem.