104
Decomposition Behavior in Aggregated Data Sets Sarah Berube Karl-Dieter Crisman Gordon College Oct. 24, 2009 Berube and Crisman (Gordon College) Decomposing Aggregated Data Oct. 24, 2009 1 / 29

Decomposition Behavior in Aggregated Data ... - Gordon Collegekcrisman/StackingTalk.pdfGordon College Oct. 24, 2009 Berube and Crisman (Gordon College)Decomposing Aggregated DataOct

  • Upload
    others

  • View
    9

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Decomposition Behavior in Aggregated Data ... - Gordon Collegekcrisman/StackingTalk.pdfGordon College Oct. 24, 2009 Berube and Crisman (Gordon College)Decomposing Aggregated DataOct

Decomposition Behavior in Aggregated Data Sets

Sarah Berube Karl-Dieter Crisman

Gordon College

Oct. 24, 2009

Berube and Crisman (Gordon College) Decomposing Aggregated Data Oct. 24, 2009 1 / 29

Page 2: Decomposition Behavior in Aggregated Data ... - Gordon Collegekcrisman/StackingTalk.pdfGordon College Oct. 24, 2009 Berube and Crisman (Gordon College)Decomposing Aggregated DataOct

Outline

Background

Definitions

Decomposing Stacks of Ranks

Pure Basics

Complements

Berube and Crisman (Gordon College) Decomposing Aggregated Data Oct. 24, 2009 2 / 29

Page 3: Decomposition Behavior in Aggregated Data ... - Gordon Collegekcrisman/StackingTalk.pdfGordon College Oct. 24, 2009 Berube and Crisman (Gordon College)Decomposing Aggregated DataOct

Background

Outline

Background

Definitions

Decomposing Stacks of Ranks

Pure Basics

Complements

Berube and Crisman (Gordon College) Decomposing Aggregated Data Oct. 24, 2009 3 / 29

Page 4: Decomposition Behavior in Aggregated Data ... - Gordon Collegekcrisman/StackingTalk.pdfGordon College Oct. 24, 2009 Berube and Crisman (Gordon College)Decomposing Aggregated DataOct

Background

Paradox in non-parametric statistics

Aggregation can be a source of paradox in statistics. Here is a simple(Yule-)Simpson-like example:

Example

Imagine the following stores convinced x out of y (x/y) customers to buysomething on the following days:

Day 1 Day 2 Total

Store 1 2/3 7/20 9/23Store 2 9/20 1/3 10/23

Even though Store 1 has a better success rate on both days, the aggregatedata suggests that Store 2 was actually better at luring customers to buy.

The point is, aggregation of data can yield unexpected results, and that isparticularly true when looking solely at ranking of data.

Berube and Crisman (Gordon College) Decomposing Aggregated Data Oct. 24, 2009 4 / 29

Page 5: Decomposition Behavior in Aggregated Data ... - Gordon Collegekcrisman/StackingTalk.pdfGordon College Oct. 24, 2009 Berube and Crisman (Gordon College)Decomposing Aggregated DataOct

Background

Paradox in non-parametric statistics

Aggregation can be a source of paradox in statistics. Here is a simple(Yule-)Simpson-like example:

Example

Imagine the following stores convinced x out of y (x/y) customers to buysomething on the following days:

Day 1 Day 2 Total

Store 1 2/3 7/20 9/23Store 2 9/20 1/3 10/23

Even though Store 1 has a better success rate on both days, the aggregatedata suggests that Store 2 was actually better at luring customers to buy.

The point is, aggregation of data can yield unexpected results, and that isparticularly true when looking solely at ranking of data.

Berube and Crisman (Gordon College) Decomposing Aggregated Data Oct. 24, 2009 4 / 29

Page 6: Decomposition Behavior in Aggregated Data ... - Gordon Collegekcrisman/StackingTalk.pdfGordon College Oct. 24, 2009 Berube and Crisman (Gordon College)Decomposing Aggregated DataOct

Background

Paradox in non-parametric statistics

Aggregation can be a source of paradox in statistics. Here is a simple(Yule-)Simpson-like example:

Example

Imagine the following stores convinced x out of y (x/y) customers to buysomething on the following days:

Day 1 Day 2 Total

Store 1 2/3 7/20 9/23Store 2 9/20 1/3 10/23

Even though Store 1 has a better success rate on both days, the aggregatedata suggests that Store 2 was actually better at luring customers to buy.

The point is, aggregation of data can yield unexpected results, and that isparticularly true when looking solely at ranking of data.

Berube and Crisman (Gordon College) Decomposing Aggregated Data Oct. 24, 2009 4 / 29

Page 7: Decomposition Behavior in Aggregated Data ... - Gordon Collegekcrisman/StackingTalk.pdfGordon College Oct. 24, 2009 Berube and Crisman (Gordon College)Decomposing Aggregated DataOct

Background

Paradox in non-parametric statistics

In the last two decades, tools from the mathematics of voting have beenused to begin to unravel some of these paradoxes. Haunsperger specificallyaddresses aggregation paradoxes in her 2003 Social Choice and Welfarepaper, from whence we draw our first example.

First, we recall the Kruskal-Wallis test. As a uniformity test for datasamples with three populations, it may be viewed as follows:

I Take sampling data for each population and organize it in a table.

I Replace the data by the rank order of the data, smallest to largest.

I Sum the columns of the ranks and determine whether they are toodissimilar to be from identical populations.

I Alternately, one may view this as giving a ‘ranking’ of the populations.

Berube and Crisman (Gordon College) Decomposing Aggregated Data Oct. 24, 2009 5 / 29

Page 8: Decomposition Behavior in Aggregated Data ... - Gordon Collegekcrisman/StackingTalk.pdfGordon College Oct. 24, 2009 Berube and Crisman (Gordon College)Decomposing Aggregated DataOct

Background

Paradox in non-parametric statistics

In the last two decades, tools from the mathematics of voting have beenused to begin to unravel some of these paradoxes. Haunsperger specificallyaddresses aggregation paradoxes in her 2003 Social Choice and Welfarepaper, from whence we draw our first example.First, we recall the Kruskal-Wallis test. As a uniformity test for datasamples with three populations, it may be viewed as follows:

I Take sampling data for each population and organize it in a table.

I Replace the data by the rank order of the data, smallest to largest.

I Sum the columns of the ranks and determine whether they are toodissimilar to be from identical populations.

I Alternately, one may view this as giving a ‘ranking’ of the populations.

Berube and Crisman (Gordon College) Decomposing Aggregated Data Oct. 24, 2009 5 / 29

Page 9: Decomposition Behavior in Aggregated Data ... - Gordon Collegekcrisman/StackingTalk.pdfGordon College Oct. 24, 2009 Berube and Crisman (Gordon College)Decomposing Aggregated DataOct

Background

Paradox in non-parametric statistics

In the last two decades, tools from the mathematics of voting have beenused to begin to unravel some of these paradoxes. Haunsperger specificallyaddresses aggregation paradoxes in her 2003 Social Choice and Welfarepaper, from whence we draw our first example.First, we recall the Kruskal-Wallis test. As a uniformity test for datasamples with three populations, it may be viewed as follows:

I Take sampling data for each population and organize it in a table.

I Replace the data by the rank order of the data, smallest to largest.

I Sum the columns of the ranks and determine whether they are toodissimilar to be from identical populations.

I Alternately, one may view this as giving a ‘ranking’ of the populations.

Berube and Crisman (Gordon College) Decomposing Aggregated Data Oct. 24, 2009 5 / 29

Page 10: Decomposition Behavior in Aggregated Data ... - Gordon Collegekcrisman/StackingTalk.pdfGordon College Oct. 24, 2009 Berube and Crisman (Gordon College)Decomposing Aggregated DataOct

Background

Paradox in non-parametric statistics

In the last two decades, tools from the mathematics of voting have beenused to begin to unravel some of these paradoxes. Haunsperger specificallyaddresses aggregation paradoxes in her 2003 Social Choice and Welfarepaper, from whence we draw our first example.First, we recall the Kruskal-Wallis test. As a uniformity test for datasamples with three populations, it may be viewed as follows:

I Take sampling data for each population and organize it in a table.

I Replace the data by the rank order of the data, smallest to largest.

I Sum the columns of the ranks and determine whether they are toodissimilar to be from identical populations.

I Alternately, one may view this as giving a ‘ranking’ of the populations.

Berube and Crisman (Gordon College) Decomposing Aggregated Data Oct. 24, 2009 5 / 29

Page 11: Decomposition Behavior in Aggregated Data ... - Gordon Collegekcrisman/StackingTalk.pdfGordon College Oct. 24, 2009 Berube and Crisman (Gordon College)Decomposing Aggregated DataOct

Background

Paradox in non-parametric statistics

In the last two decades, tools from the mathematics of voting have beenused to begin to unravel some of these paradoxes. Haunsperger specificallyaddresses aggregation paradoxes in her 2003 Social Choice and Welfarepaper, from whence we draw our first example.First, we recall the Kruskal-Wallis test. As a uniformity test for datasamples with three populations, it may be viewed as follows:

I Take sampling data for each population and organize it in a table.

I Replace the data by the rank order of the data, smallest to largest.

I Sum the columns of the ranks and determine whether they are toodissimilar to be from identical populations.

I Alternately, one may view this as giving a ‘ranking’ of the populations.

Berube and Crisman (Gordon College) Decomposing Aggregated Data Oct. 24, 2009 5 / 29

Page 12: Decomposition Behavior in Aggregated Data ... - Gordon Collegekcrisman/StackingTalk.pdfGordon College Oct. 24, 2009 Berube and Crisman (Gordon College)Decomposing Aggregated DataOct

Background

Paradox in non-parametric statistics

In the last two decades, tools from the mathematics of voting have beenused to begin to unravel some of these paradoxes. Haunsperger specificallyaddresses aggregation paradoxes in her 2003 Social Choice and Welfarepaper, from whence we draw our first example.First, we recall the Kruskal-Wallis test. As a uniformity test for datasamples with three populations, it may be viewed as follows:

I Take sampling data for each population and organize it in a table.

I Replace the data by the rank order of the data, smallest to largest.

I Sum the columns of the ranks and determine whether they are toodissimilar to be from identical populations.

I Alternately, one may view this as giving a ‘ranking’ of the populations.

Berube and Crisman (Gordon College) Decomposing Aggregated Data Oct. 24, 2009 5 / 29

Page 13: Decomposition Behavior in Aggregated Data ... - Gordon Collegekcrisman/StackingTalk.pdfGordon College Oct. 24, 2009 Berube and Crisman (Gordon College)Decomposing Aggregated DataOct

Background

Haunsperger’s Example

I Consider the two sets of dataA B C

5.89 5.81 5.805.98 5.90 5.99

and

A B C

5.69 5.63 5.625.74 5.71 6.00

I Both will give rise to the same matrix of ranks,

A B C

3 2 15 4 6

.

The column sums are 8, 6, and 7.

I

Combining the two sets gives the following ma-trix of ranks, which has column sums 26, 22,and 30 - so that not only are the differencesmore pronounced, but C seems now to be thepopulation with the ‘biggest’ result.

A B C

8 7 610 9 113 2 15 4 12

Berube and Crisman (Gordon College) Decomposing Aggregated Data Oct. 24, 2009 6 / 29

Page 14: Decomposition Behavior in Aggregated Data ... - Gordon Collegekcrisman/StackingTalk.pdfGordon College Oct. 24, 2009 Berube and Crisman (Gordon College)Decomposing Aggregated DataOct

Background

Haunsperger’s Example

I Consider the two sets of dataA B C

5.89 5.81 5.805.98 5.90 5.99

and

A B C

5.69 5.63 5.625.74 5.71 6.00

I Both will give rise to the same matrix of ranks,

A B C

3 2 15 4 6

.

The column sums are 8, 6, and 7.

I

Combining the two sets gives the following ma-trix of ranks, which has column sums 26, 22,and 30 - so that not only are the differencesmore pronounced, but C seems now to be thepopulation with the ‘biggest’ result.

A B C

8 7 610 9 113 2 15 4 12

Berube and Crisman (Gordon College) Decomposing Aggregated Data Oct. 24, 2009 6 / 29

Page 15: Decomposition Behavior in Aggregated Data ... - Gordon Collegekcrisman/StackingTalk.pdfGordon College Oct. 24, 2009 Berube and Crisman (Gordon College)Decomposing Aggregated DataOct

Background

Haunsperger’s Example

I Consider the two sets of dataA B C

5.89 5.81 5.805.98 5.90 5.99

and

A B C

5.69 5.63 5.625.74 5.71 6.00

I Both will give rise to the same matrix of ranks,

A B C

3 2 15 4 6

.

The column sums are 8, 6, and 7.

I

Combining the two sets gives the following ma-trix of ranks, which has column sums 26, 22,and 30 - so that not only are the differencesmore pronounced, but C seems now to be thepopulation with the ‘biggest’ result.

A B C

8 7 610 9 113 2 15 4 12

Berube and Crisman (Gordon College) Decomposing Aggregated Data Oct. 24, 2009 6 / 29

Page 16: Decomposition Behavior in Aggregated Data ... - Gordon Collegekcrisman/StackingTalk.pdfGordon College Oct. 24, 2009 Berube and Crisman (Gordon College)Decomposing Aggregated DataOct

Background

Recent Work

As it turns out, this is not unusual behavior.

I Haunsperger shows that nearly all data sets are to some extentinconsistent under such aggregation for Kruskal-Wallis.

I Bargagliotti (2009) extends this to the whole class of such tests.

I On the other hand, Bargagliotti and Greenwell show that thestatistical significance of current results is negligible.

And, one can analyze these things using voting theory!

I Many nonparametric procedures create a test statistic by a methodequivalent to first creating a voting profile, to which standardprocedures are applied. (This is Haunsperger and Saari’s approach.)

I Hence, looking at a decomposition of the profile vector with respectto a useful basis could help! Work in this direction is begun inBargagliotti and Saari (2007); for instance, criteria for avoidingcertain paradoxes is given.

Berube and Crisman (Gordon College) Decomposing Aggregated Data Oct. 24, 2009 7 / 29

Page 17: Decomposition Behavior in Aggregated Data ... - Gordon Collegekcrisman/StackingTalk.pdfGordon College Oct. 24, 2009 Berube and Crisman (Gordon College)Decomposing Aggregated DataOct

Background

Recent Work

As it turns out, this is not unusual behavior.

I Haunsperger shows that nearly all data sets are to some extentinconsistent under such aggregation for Kruskal-Wallis.

I Bargagliotti (2009) extends this to the whole class of such tests.

I On the other hand, Bargagliotti and Greenwell show that thestatistical significance of current results is negligible.

And, one can analyze these things using voting theory!

I Many nonparametric procedures create a test statistic by a methodequivalent to first creating a voting profile, to which standardprocedures are applied. (This is Haunsperger and Saari’s approach.)

I Hence, looking at a decomposition of the profile vector with respectto a useful basis could help! Work in this direction is begun inBargagliotti and Saari (2007); for instance, criteria for avoidingcertain paradoxes is given.

Berube and Crisman (Gordon College) Decomposing Aggregated Data Oct. 24, 2009 7 / 29

Page 18: Decomposition Behavior in Aggregated Data ... - Gordon Collegekcrisman/StackingTalk.pdfGordon College Oct. 24, 2009 Berube and Crisman (Gordon College)Decomposing Aggregated DataOct

Background

Recent Work

As it turns out, this is not unusual behavior.

I Haunsperger shows that nearly all data sets are to some extentinconsistent under such aggregation for Kruskal-Wallis.

I Bargagliotti (2009) extends this to the whole class of such tests.

I On the other hand, Bargagliotti and Greenwell show that thestatistical significance of current results is negligible.

And, one can analyze these things using voting theory!

I Many nonparametric procedures create a test statistic by a methodequivalent to first creating a voting profile, to which standardprocedures are applied. (This is Haunsperger and Saari’s approach.)

I Hence, looking at a decomposition of the profile vector with respectto a useful basis could help! Work in this direction is begun inBargagliotti and Saari (2007); for instance, criteria for avoidingcertain paradoxes is given.

Berube and Crisman (Gordon College) Decomposing Aggregated Data Oct. 24, 2009 7 / 29

Page 19: Decomposition Behavior in Aggregated Data ... - Gordon Collegekcrisman/StackingTalk.pdfGordon College Oct. 24, 2009 Berube and Crisman (Gordon College)Decomposing Aggregated DataOct

Background

Recent Work

As it turns out, this is not unusual behavior.

I Haunsperger shows that nearly all data sets are to some extentinconsistent under such aggregation for Kruskal-Wallis.

I Bargagliotti (2009) extends this to the whole class of such tests.

I On the other hand, Bargagliotti and Greenwell show that thestatistical significance of current results is negligible.

And, one can analyze these things using voting theory!

I Many nonparametric procedures create a test statistic by a methodequivalent to first creating a voting profile, to which standardprocedures are applied. (This is Haunsperger and Saari’s approach.)

I Hence, looking at a decomposition of the profile vector with respectto a useful basis could help! Work in this direction is begun inBargagliotti and Saari (2007); for instance, criteria for avoidingcertain paradoxes is given.

Berube and Crisman (Gordon College) Decomposing Aggregated Data Oct. 24, 2009 7 / 29

Page 20: Decomposition Behavior in Aggregated Data ... - Gordon Collegekcrisman/StackingTalk.pdfGordon College Oct. 24, 2009 Berube and Crisman (Gordon College)Decomposing Aggregated DataOct

Background

Recent Work

As it turns out, this is not unusual behavior.

I Haunsperger shows that nearly all data sets are to some extentinconsistent under such aggregation for Kruskal-Wallis.

I Bargagliotti (2009) extends this to the whole class of such tests.

I On the other hand, Bargagliotti and Greenwell show that thestatistical significance of current results is negligible.

And, one can analyze these things using voting theory!

I Many nonparametric procedures create a test statistic by a methodequivalent to first creating a voting profile, to which standardprocedures are applied. (This is Haunsperger and Saari’s approach.)

I Hence, looking at a decomposition of the profile vector with respectto a useful basis could help! Work in this direction is begun inBargagliotti and Saari (2007); for instance, criteria for avoidingcertain paradoxes is given.

Berube and Crisman (Gordon College) Decomposing Aggregated Data Oct. 24, 2009 7 / 29

Page 21: Decomposition Behavior in Aggregated Data ... - Gordon Collegekcrisman/StackingTalk.pdfGordon College Oct. 24, 2009 Berube and Crisman (Gordon College)Decomposing Aggregated DataOct

Background

Recent Work

As it turns out, this is not unusual behavior.

I Haunsperger shows that nearly all data sets are to some extentinconsistent under such aggregation for Kruskal-Wallis.

I Bargagliotti (2009) extends this to the whole class of such tests.

I On the other hand, Bargagliotti and Greenwell show that thestatistical significance of current results is negligible.

And, one can analyze these things using voting theory!

I Many nonparametric procedures create a test statistic by a methodequivalent to first creating a voting profile, to which standardprocedures are applied. (This is Haunsperger and Saari’s approach.)

I Hence, looking at a decomposition of the profile vector with respectto a useful basis could help! Work in this direction is begun inBargagliotti and Saari (2007); for instance, criteria for avoidingcertain paradoxes is given.

Berube and Crisman (Gordon College) Decomposing Aggregated Data Oct. 24, 2009 7 / 29

Page 22: Decomposition Behavior in Aggregated Data ... - Gordon Collegekcrisman/StackingTalk.pdfGordon College Oct. 24, 2009 Berube and Crisman (Gordon College)Decomposing Aggregated DataOct

Background

Basics under aggregation

I The component of any decomposition which yields the fewestparadoxes is called the Basic component.

I From a theoretical viewpoint, it is useful to look at the mostconsistent situation first.

I So we raise the following questions regarding the Basic component:

Questions

I How does it behave under aggregation, or at least under replication?I How close can we come to a data set with no other components?I How might one recognize such a data set?

I We answer many of these questions in this talk.

Berube and Crisman (Gordon College) Decomposing Aggregated Data Oct. 24, 2009 8 / 29

Page 23: Decomposition Behavior in Aggregated Data ... - Gordon Collegekcrisman/StackingTalk.pdfGordon College Oct. 24, 2009 Berube and Crisman (Gordon College)Decomposing Aggregated DataOct

Background

Basics under aggregation

I The component of any decomposition which yields the fewestparadoxes is called the Basic component.

I From a theoretical viewpoint, it is useful to look at the mostconsistent situation first.

I So we raise the following questions regarding the Basic component:

Questions

I How does it behave under aggregation, or at least under replication?I How close can we come to a data set with no other components?I How might one recognize such a data set?

I We answer many of these questions in this talk.

Berube and Crisman (Gordon College) Decomposing Aggregated Data Oct. 24, 2009 8 / 29

Page 24: Decomposition Behavior in Aggregated Data ... - Gordon Collegekcrisman/StackingTalk.pdfGordon College Oct. 24, 2009 Berube and Crisman (Gordon College)Decomposing Aggregated DataOct

Background

Basics under aggregation

I The component of any decomposition which yields the fewestparadoxes is called the Basic component.

I From a theoretical viewpoint, it is useful to look at the mostconsistent situation first.

I So we raise the following questions regarding the Basic component:

Questions

I How does it behave under aggregation, or at least under replication?I How close can we come to a data set with no other components?I How might one recognize such a data set?

I We answer many of these questions in this talk.

Berube and Crisman (Gordon College) Decomposing Aggregated Data Oct. 24, 2009 8 / 29

Page 25: Decomposition Behavior in Aggregated Data ... - Gordon Collegekcrisman/StackingTalk.pdfGordon College Oct. 24, 2009 Berube and Crisman (Gordon College)Decomposing Aggregated DataOct

Background

Basics under aggregation

I The component of any decomposition which yields the fewestparadoxes is called the Basic component.

I From a theoretical viewpoint, it is useful to look at the mostconsistent situation first.

I So we raise the following questions regarding the Basic component:

Questions

I How does it behave under aggregation, or at least under replication?

I How close can we come to a data set with no other components?I How might one recognize such a data set?

I We answer many of these questions in this talk.

Berube and Crisman (Gordon College) Decomposing Aggregated Data Oct. 24, 2009 8 / 29

Page 26: Decomposition Behavior in Aggregated Data ... - Gordon Collegekcrisman/StackingTalk.pdfGordon College Oct. 24, 2009 Berube and Crisman (Gordon College)Decomposing Aggregated DataOct

Background

Basics under aggregation

I The component of any decomposition which yields the fewestparadoxes is called the Basic component.

I From a theoretical viewpoint, it is useful to look at the mostconsistent situation first.

I So we raise the following questions regarding the Basic component:

Questions

I How does it behave under aggregation, or at least under replication?I How close can we come to a data set with no other components?

I How might one recognize such a data set?

I We answer many of these questions in this talk.

Berube and Crisman (Gordon College) Decomposing Aggregated Data Oct. 24, 2009 8 / 29

Page 27: Decomposition Behavior in Aggregated Data ... - Gordon Collegekcrisman/StackingTalk.pdfGordon College Oct. 24, 2009 Berube and Crisman (Gordon College)Decomposing Aggregated DataOct

Background

Basics under aggregation

I The component of any decomposition which yields the fewestparadoxes is called the Basic component.

I From a theoretical viewpoint, it is useful to look at the mostconsistent situation first.

I So we raise the following questions regarding the Basic component:

Questions

I How does it behave under aggregation, or at least under replication?I How close can we come to a data set with no other components?I How might one recognize such a data set?

I We answer many of these questions in this talk.

Berube and Crisman (Gordon College) Decomposing Aggregated Data Oct. 24, 2009 8 / 29

Page 28: Decomposition Behavior in Aggregated Data ... - Gordon Collegekcrisman/StackingTalk.pdfGordon College Oct. 24, 2009 Berube and Crisman (Gordon College)Decomposing Aggregated DataOct

Background

Basics under aggregation

I The component of any decomposition which yields the fewestparadoxes is called the Basic component.

I From a theoretical viewpoint, it is useful to look at the mostconsistent situation first.

I So we raise the following questions regarding the Basic component:

Questions

I How does it behave under aggregation, or at least under replication?I How close can we come to a data set with no other components?I How might one recognize such a data set?

I We answer many of these questions in this talk.

Berube and Crisman (Gordon College) Decomposing Aggregated Data Oct. 24, 2009 8 / 29

Page 29: Decomposition Behavior in Aggregated Data ... - Gordon Collegekcrisman/StackingTalk.pdfGordon College Oct. 24, 2009 Berube and Crisman (Gordon College)Decomposing Aggregated DataOct

Definitions

Outline

Background

Definitions

Decomposing Stacks of Ranks

Pure Basics

Complements

Berube and Crisman (Gordon College) Decomposing Aggregated Data Oct. 24, 2009 9 / 29

Page 30: Decomposition Behavior in Aggregated Data ... - Gordon Collegekcrisman/StackingTalk.pdfGordon College Oct. 24, 2009 Berube and Crisman (Gordon College)Decomposing Aggregated DataOct

Definitions

Data Definitions

We will need a number of definitions before proceeding.

I We have already encountered a data set and the correspondingmatrix of ranks:

A B C

14.5 15.6 16.714.3 11.2 13.4

A B C

4 5 63 1 2

I We can then create a profile and profile vector.I Look at all possible triplets of ranks (one for each item) and, for each

of these triplets, return the ranking of the items corresponding to that.I In this example, we can see that (4 1 2) would correspond to

A � C � B, while (4 1 6) gives C � A � B, and so on.I Our example gives (0, 2, 2, 2, 0, 2), using the usual order

A � B � C , A � C � B, . . . , B � A � C .

Berube and Crisman (Gordon College) Decomposing Aggregated Data Oct. 24, 2009 10 / 29

Page 31: Decomposition Behavior in Aggregated Data ... - Gordon Collegekcrisman/StackingTalk.pdfGordon College Oct. 24, 2009 Berube and Crisman (Gordon College)Decomposing Aggregated DataOct

Definitions

Components

We use the standard irreducible symmetric decomposition from BasicGeometry of Voting, and more recently Orrison et al.:

I The Basic components, BA = (1, 1, 0,−1,−1, 0),BB = (0,−1,−1, 0, 1, 1), and BC = (−1, 0, 1, 1, 0,−1).

I The Reversal components RA = (1, 1,−2, 1, 1,−2),RB = (−2, 1, 1,−2, 1, 1), and RC = (1,−2, 1, 1,−2, 1).

I The Condorcet component C = (1,−1, 1,−1, 1,−1).

I The Kernel component K = (1, 1, 1, 1, 1, 1) measures the number ofvoters.

In our example, we get(4 5 63 1 2

)⇒ (0, 2, 2, 2, 0, 2)⇒ (−1/3,−2/3,−1/3, 0,−2/3, 4/3)

which can be written 13 (−BA − 2BB − RA − 2C + 4K ).

Berube and Crisman (Gordon College) Decomposing Aggregated Data Oct. 24, 2009 11 / 29

Page 32: Decomposition Behavior in Aggregated Data ... - Gordon Collegekcrisman/StackingTalk.pdfGordon College Oct. 24, 2009 Berube and Crisman (Gordon College)Decomposing Aggregated DataOct

Definitions

ComponentsWe use the standard irreducible symmetric decomposition from BasicGeometry of Voting, and more recently Orrison et al.:

I The Basic components, BA = (1, 1, 0,−1,−1, 0),BB = (0,−1,−1, 0, 1, 1), and BC = (−1, 0, 1, 1, 0,−1).

I The Reversal components RA = (1, 1,−2, 1, 1,−2),RB = (−2, 1, 1,−2, 1, 1), and RC = (1,−2, 1, 1,−2, 1).(Note that they have the same algebraic structure as the Basicprofiles, over Σ3.)

I The Condorcet component C = (1,−1, 1,−1, 1,−1).I The Kernel component K = (1, 1, 1, 1, 1, 1) measures the number of

voters.

In our example, we get(4 5 63 1 2

)⇒ (0, 2, 2, 2, 0, 2)⇒ (−1/3,−2/3,−1/3, 0,−2/3, 4/3)

which can be written 13 (−BA − 2BB − RA − 2C + 4K ).

Berube and Crisman (Gordon College) Decomposing Aggregated Data Oct. 24, 2009 11 / 29

Page 33: Decomposition Behavior in Aggregated Data ... - Gordon Collegekcrisman/StackingTalk.pdfGordon College Oct. 24, 2009 Berube and Crisman (Gordon College)Decomposing Aggregated DataOct

Definitions

Components

We use the standard irreducible symmetric decomposition from BasicGeometry of Voting, and more recently Orrison et al.:

I The Basic components, BA = (1, 1, 0,−1,−1, 0),BB = (0,−1,−1, 0, 1, 1), and BC = (−1, 0, 1, 1, 0,−1).

I The Reversal components RA = (1, 1,−2, 1, 1,−2),RB = (−2, 1, 1,−2, 1, 1), and RC = (1,−2, 1, 1,−2, 1).

I The Condorcet component C = (1,−1, 1,−1, 1,−1).

I The Kernel component K = (1, 1, 1, 1, 1, 1) measures the number ofvoters.

In our example, we get(4 5 63 1 2

)⇒ (0, 2, 2, 2, 0, 2)⇒ (−1/3,−2/3,−1/3, 0,−2/3, 4/3)

which can be written 13 (−BA − 2BB − RA − 2C + 4K ).

Berube and Crisman (Gordon College) Decomposing Aggregated Data Oct. 24, 2009 11 / 29

Page 34: Decomposition Behavior in Aggregated Data ... - Gordon Collegekcrisman/StackingTalk.pdfGordon College Oct. 24, 2009 Berube and Crisman (Gordon College)Decomposing Aggregated DataOct

Definitions

Components

We use the standard irreducible symmetric decomposition from BasicGeometry of Voting, and more recently Orrison et al.:

I The Basic components, BA = (1, 1, 0,−1,−1, 0),BB = (0,−1,−1, 0, 1, 1), and BC = (−1, 0, 1, 1, 0,−1).

I The Reversal components RA = (1, 1,−2, 1, 1,−2),RB = (−2, 1, 1,−2, 1, 1), and RC = (1,−2, 1, 1,−2, 1).

I The Condorcet component C = (1,−1, 1,−1, 1,−1).

I The Kernel component K = (1, 1, 1, 1, 1, 1) measures the number ofvoters.

In our example, we get(4 5 63 1 2

)⇒ (0, 2, 2, 2, 0, 2)⇒ (−1/3,−2/3,−1/3, 0,−2/3, 4/3)

which can be written 13 (−BA − 2BB − RA − 2C + 4K ).

Berube and Crisman (Gordon College) Decomposing Aggregated Data Oct. 24, 2009 11 / 29

Page 35: Decomposition Behavior in Aggregated Data ... - Gordon Collegekcrisman/StackingTalk.pdfGordon College Oct. 24, 2009 Berube and Crisman (Gordon College)Decomposing Aggregated DataOct

Definitions

Aggregation Definitions

Haunsperger provides useful definitions, for a given statistical procedurewhose outcome is ranking of the candidates, and for all matrices of ranks:

I The procedure is consistent under aggregation if any aggregate of ksets of data, all of which yield a given ordering of the candidates, alsoyields the same ordering.

I The procedure is consistent under replication if any aggregate of ksets of data, all of which have the same matrix of ranks, yields thesame ordering as any individual data set.

In the sequel, our concern is with a specific form of replication, which wecall stacking.

Berube and Crisman (Gordon College) Decomposing Aggregated Data Oct. 24, 2009 12 / 29

Page 36: Decomposition Behavior in Aggregated Data ... - Gordon Collegekcrisman/StackingTalk.pdfGordon College Oct. 24, 2009 Berube and Crisman (Gordon College)Decomposing Aggregated DataOct

Decomposing Stacks

Outline

Background

Definitions

Decomposing Stacks of Ranks

Pure Basics

Complements

Berube and Crisman (Gordon College) Decomposing Aggregated Data Oct. 24, 2009 13 / 29

Page 37: Decomposition Behavior in Aggregated Data ... - Gordon Collegekcrisman/StackingTalk.pdfGordon College Oct. 24, 2009 Berube and Crisman (Gordon College)Decomposing Aggregated DataOct

Decomposing Stacks

Defining Stacking

I Stacking is aggregating k data sets, all of which have the samematrix of ranks, and which in addition do not have any overlapbetween the numerical ranges of their data.

I We stack our original example, with k = 3:

16 17 1815 13 14

10 11 129 7 8

4 5 63 1 2

I Each part of the matrix corresponding to the original matrix of ranks

we will call a stanza, and we will typically delineate the stanzas.I A naive idea of how this might occur is taking samples of the same

things, but before and after some big event.

I Prices before and after a huge tax increaseI Animal populations before and after a conservation effort.

Berube and Crisman (Gordon College) Decomposing Aggregated Data Oct. 24, 2009 14 / 29

Page 38: Decomposition Behavior in Aggregated Data ... - Gordon Collegekcrisman/StackingTalk.pdfGordon College Oct. 24, 2009 Berube and Crisman (Gordon College)Decomposing Aggregated DataOct

Decomposing Stacks

Defining Stacking

I Stacking is aggregating k data sets, all of which have the samematrix of ranks, and which in addition do not have any overlapbetween the numerical ranges of their data.

I We stack our original example, with k = 3:

16 17 1815 13 14

10 11 129 7 8

4 5 63 1 2

I Each part of the matrix corresponding to the original matrix of rankswe will call a stanza, and we will typically delineate the stanzas.

I A naive idea of how this might occur is taking samples of the samethings, but before and after some big event.

I Prices before and after a huge tax increaseI Animal populations before and after a conservation effort.

Berube and Crisman (Gordon College) Decomposing Aggregated Data Oct. 24, 2009 14 / 29

Page 39: Decomposition Behavior in Aggregated Data ... - Gordon Collegekcrisman/StackingTalk.pdfGordon College Oct. 24, 2009 Berube and Crisman (Gordon College)Decomposing Aggregated DataOct

Decomposing Stacks

Defining Stacking

I Stacking is aggregating k data sets, all of which have the samematrix of ranks, and which in addition do not have any overlapbetween the numerical ranges of their data.

I We stack our original example, with k = 3:

16 17 1815 13 14

10 11 129 7 8

4 5 63 1 2

I Each part of the matrix corresponding to the original matrix of ranks

we will call a stanza, and we will typically delineate the stanzas.

I A naive idea of how this might occur is taking samples of the samethings, but before and after some big event.

I Prices before and after a huge tax increaseI Animal populations before and after a conservation effort.

Berube and Crisman (Gordon College) Decomposing Aggregated Data Oct. 24, 2009 14 / 29

Page 40: Decomposition Behavior in Aggregated Data ... - Gordon Collegekcrisman/StackingTalk.pdfGordon College Oct. 24, 2009 Berube and Crisman (Gordon College)Decomposing Aggregated DataOct

Decomposing Stacks

Defining Stacking

I Stacking is aggregating k data sets, all of which have the samematrix of ranks, and which in addition do not have any overlapbetween the numerical ranges of their data.

I We stack our original example, with k = 3:

16 17 1815 13 14

10 11 129 7 8

4 5 63 1 2

I Each part of the matrix corresponding to the original matrix of ranks

we will call a stanza, and we will typically delineate the stanzas.I A naive idea of how this might occur is taking samples of the same

things, but before and after some big event.

I Prices before and after a huge tax increaseI Animal populations before and after a conservation effort.

Berube and Crisman (Gordon College) Decomposing Aggregated Data Oct. 24, 2009 14 / 29

Page 41: Decomposition Behavior in Aggregated Data ... - Gordon Collegekcrisman/StackingTalk.pdfGordon College Oct. 24, 2009 Berube and Crisman (Gordon College)Decomposing Aggregated DataOct

Decomposing Stacks

Defining Stacking

I Stacking is aggregating k data sets, all of which have the samematrix of ranks, and which in addition do not have any overlapbetween the numerical ranges of their data.

I We stack our original example, with k = 3:

16 17 1815 13 14

10 11 129 7 8

4 5 63 1 2

I Each part of the matrix corresponding to the original matrix of ranks

we will call a stanza, and we will typically delineate the stanzas.I A naive idea of how this might occur is taking samples of the same

things, but before and after some big event.I Prices before and after a huge tax increase

I Animal populations before and after a conservation effort.

Berube and Crisman (Gordon College) Decomposing Aggregated Data Oct. 24, 2009 14 / 29

Page 42: Decomposition Behavior in Aggregated Data ... - Gordon Collegekcrisman/StackingTalk.pdfGordon College Oct. 24, 2009 Berube and Crisman (Gordon College)Decomposing Aggregated DataOct

Decomposing Stacks

Defining Stacking

I Stacking is aggregating k data sets, all of which have the samematrix of ranks, and which in addition do not have any overlapbetween the numerical ranges of their data.

I We stack our original example, with k = 3:

16 17 1815 13 14

10 11 129 7 8

4 5 63 1 2

I Each part of the matrix corresponding to the original matrix of ranks

we will call a stanza, and we will typically delineate the stanzas.I A naive idea of how this might occur is taking samples of the same

things, but before and after some big event.I Prices before and after a huge tax increaseI Animal populations before and after a conservation effort.

Berube and Crisman (Gordon College) Decomposing Aggregated Data Oct. 24, 2009 14 / 29

Page 43: Decomposition Behavior in Aggregated Data ... - Gordon Collegekcrisman/StackingTalk.pdfGordon College Oct. 24, 2009 Berube and Crisman (Gordon College)Decomposing Aggregated DataOct

Decomposing Stacks

Decomposing Profiles from Stacks of Ranks

We are now ready to completely answer the first question about basics,with respect to stacking.

TheoremIf we stack an n × 3 matrix of ranks k times, each Basic component ismultiplied by k2, each Reversal component is multiplied by k, theCondorcet component is multiplied by k2, and the Kernel component ismultiplied by k3.

The implication is that as long as you start with a Condorcet componentsmaller than the Basic components, stacking is a good way to find datasets with very large Basic components (and hence great regularity inoutcome with respect to a variety of procedures).

Berube and Crisman (Gordon College) Decomposing Aggregated Data Oct. 24, 2009 15 / 29

Page 44: Decomposition Behavior in Aggregated Data ... - Gordon Collegekcrisman/StackingTalk.pdfGordon College Oct. 24, 2009 Berube and Crisman (Gordon College)Decomposing Aggregated DataOct

Decomposing Stacks

Decomposing Profiles from Stacks of Ranks

We are now ready to completely answer the first question about basics,with respect to stacking.

TheoremIf we stack an n × 3 matrix of ranks k times, each Basic component ismultiplied by k2, each Reversal component is multiplied by k, theCondorcet component is multiplied by k2, and the Kernel component ismultiplied by k3.

The implication is that as long as you start with a Condorcet componentsmaller than the Basic components, stacking is a good way to find datasets with very large Basic components (and hence great regularity inoutcome with respect to a variety of procedures).

Berube and Crisman (Gordon College) Decomposing Aggregated Data Oct. 24, 2009 15 / 29

Page 45: Decomposition Behavior in Aggregated Data ... - Gordon Collegekcrisman/StackingTalk.pdfGordon College Oct. 24, 2009 Berube and Crisman (Gordon College)Decomposing Aggregated DataOct

Decomposing Stacks

Decomposing Profiles from Stacks of Ranks

We are now ready to completely answer the first question about basics,with respect to stacking.

TheoremIf we stack an n × 3 matrix of ranks k times, each Basic component ismultiplied by k2, each Reversal component is multiplied by k, theCondorcet component is multiplied by k2, and the Kernel component ismultiplied by k3.

The implication is that as long as you start with a Condorcet componentsmaller than the Basic components, stacking is a good way to find datasets with very large Basic components (and hence great regularity inoutcome with respect to a variety of procedures).

Berube and Crisman (Gordon College) Decomposing Aggregated Data Oct. 24, 2009 15 / 29

Page 46: Decomposition Behavior in Aggregated Data ... - Gordon Collegekcrisman/StackingTalk.pdfGordon College Oct. 24, 2009 Berube and Crisman (Gordon College)Decomposing Aggregated DataOct

Decomposing Stacks

Decomposing Profiles from Stacks of Ranks

At least for this sort of aggregation, we can avoid some paradox. We haveseveral immediate corollaries:

Corollary

The Kruskal-Wallis test is consistent under stacking, as are any procedures(such as Mann-Whitney) which only rely on pairwise data.

Corollary

All tests derived from points-based voting procedures (such as the V test)are consistent under stacking of data sets with no Reversal component.

Corollary

Paradoxes due solely to Reversal components (for instance, including mostdifferences between Kruskal-Wallis and the V test) lessen under stacking ktimes (and disappear in the limit as k →∞).

Berube and Crisman (Gordon College) Decomposing Aggregated Data Oct. 24, 2009 16 / 29

Page 47: Decomposition Behavior in Aggregated Data ... - Gordon Collegekcrisman/StackingTalk.pdfGordon College Oct. 24, 2009 Berube and Crisman (Gordon College)Decomposing Aggregated DataOct

Decomposing Stacks

Decomposing Profiles from Stacks of Ranks

At least for this sort of aggregation, we can avoid some paradox. We haveseveral immediate corollaries:

Corollary

The Kruskal-Wallis test is consistent under stacking, as are any procedures(such as Mann-Whitney) which only rely on pairwise data.

Corollary

All tests derived from points-based voting procedures (such as the V test)are consistent under stacking of data sets with no Reversal component.

Corollary

Paradoxes due solely to Reversal components (for instance, including mostdifferences between Kruskal-Wallis and the V test) lessen under stacking ktimes (and disappear in the limit as k →∞).

Berube and Crisman (Gordon College) Decomposing Aggregated Data Oct. 24, 2009 16 / 29

Page 48: Decomposition Behavior in Aggregated Data ... - Gordon Collegekcrisman/StackingTalk.pdfGordon College Oct. 24, 2009 Berube and Crisman (Gordon College)Decomposing Aggregated DataOct

Decomposing Stacks

Decomposing Profiles from Stacks of RanksAt least for this sort of aggregation, we can avoid some paradox. We haveseveral immediate corollaries:

Corollary

The Kruskal-Wallis test is consistent under stacking, as are any procedures(such as Mann-Whitney) which only rely on pairwise data.

(This is because the K-W test, since it comes from the Borda Count, onlyobeys the Basic component, and in general the Condorcet and Basiccomponents will always be in the same proportion.)

Corollary

All tests derived from points-based voting procedures (such as the V test)are consistent under stacking of data sets with no Reversal component.

Corollary

Paradoxes due solely to Reversal components (for instance, including mostdifferences between Kruskal-Wallis and the V test) lessen under stacking ktimes (and disappear in the limit as k →∞).

Berube and Crisman (Gordon College) Decomposing Aggregated Data Oct. 24, 2009 16 / 29

Page 49: Decomposition Behavior in Aggregated Data ... - Gordon Collegekcrisman/StackingTalk.pdfGordon College Oct. 24, 2009 Berube and Crisman (Gordon College)Decomposing Aggregated DataOct

Decomposing Stacks

Decomposing Profiles from Stacks of Ranks

At least for this sort of aggregation, we can avoid some paradox. We haveseveral immediate corollaries:

Corollary

The Kruskal-Wallis test is consistent under stacking, as are any procedures(such as Mann-Whitney) which only rely on pairwise data.

Corollary

All tests derived from points-based voting procedures (such as the V test)are consistent under stacking of data sets with no Reversal component.

Corollary

Paradoxes due solely to Reversal components (for instance, including mostdifferences between Kruskal-Wallis and the V test) lessen under stacking ktimes (and disappear in the limit as k →∞).

Berube and Crisman (Gordon College) Decomposing Aggregated Data Oct. 24, 2009 16 / 29

Page 50: Decomposition Behavior in Aggregated Data ... - Gordon Collegekcrisman/StackingTalk.pdfGordon College Oct. 24, 2009 Berube and Crisman (Gordon College)Decomposing Aggregated DataOct

Decomposing Stacks

Decomposing Profiles from Stacks of RanksAt least for this sort of aggregation, we can avoid some paradox. We haveseveral immediate corollaries:

Corollary

The Kruskal-Wallis test is consistent under stacking, as are any procedures(such as Mann-Whitney) which only rely on pairwise data.

Corollary

All tests derived from points-based voting procedures (such as the V test)are consistent under stacking of data sets with no Reversal component.

(These procedures only differ when it comes to the Reversal component,and otherwise the same argument about Condorcet and Borda applies.)

Corollary

Paradoxes due solely to Reversal components (for instance, including mostdifferences between Kruskal-Wallis and the V test) lessen under stacking ktimes (and disappear in the limit as k →∞).

Berube and Crisman (Gordon College) Decomposing Aggregated Data Oct. 24, 2009 16 / 29

Page 51: Decomposition Behavior in Aggregated Data ... - Gordon Collegekcrisman/StackingTalk.pdfGordon College Oct. 24, 2009 Berube and Crisman (Gordon College)Decomposing Aggregated DataOct

Decomposing Stacks

Decomposing Profiles from Stacks of Ranks

At least for this sort of aggregation, we can avoid some paradox. We haveseveral immediate corollaries:

Corollary

The Kruskal-Wallis test is consistent under stacking, as are any procedures(such as Mann-Whitney) which only rely on pairwise data.

Corollary

All tests derived from points-based voting procedures (such as the V test)are consistent under stacking of data sets with no Reversal component.

Corollary

Paradoxes due solely to Reversal components (for instance, including mostdifferences between Kruskal-Wallis and the V test) lessen under stacking ktimes (and disappear in the limit as k →∞).

Berube and Crisman (Gordon College) Decomposing Aggregated Data Oct. 24, 2009 16 / 29

Page 52: Decomposition Behavior in Aggregated Data ... - Gordon Collegekcrisman/StackingTalk.pdfGordon College Oct. 24, 2009 Berube and Crisman (Gordon College)Decomposing Aggregated DataOct

Decomposing Stacks

Proof of the Stacking Theorem

The proof is actually instructive and elegant. Recall the theorem:

TheoremIf we stack an n × 3 matrix of ranks k times, each Basic and Condorcetcomponent is multiplied by k2, each Reversal component is multiplied byk, and the Kernel component is multiplied by k3.

The rest of the proof comes down to two lemmas:

LemmaAll triplets that are formed from elements taken from three differentstanzas add only kernel components to the resulting profile decomposition.

LemmaFor a stacking with k = 2, the Basic and Condorcet components arequadrupled, and each Reversal component is doubled.

Berube and Crisman (Gordon College) Decomposing Aggregated Data Oct. 24, 2009 17 / 29

Page 53: Decomposition Behavior in Aggregated Data ... - Gordon Collegekcrisman/StackingTalk.pdfGordon College Oct. 24, 2009 Berube and Crisman (Gordon College)Decomposing Aggregated DataOct

Decomposing Stacks

Proof of the Stacking Theorem

The proof is actually instructive and elegant. Recall the theorem:

TheoremIf we stack an n × 3 matrix of ranks k times, each Basic and Condorcetcomponent is multiplied by k2, each Reversal component is multiplied byk, and the Kernel component is multiplied by k3.

The rest of the proof comes down to two lemmas:

LemmaAll triplets that are formed from elements taken from three differentstanzas add only kernel components to the resulting profile decomposition.

LemmaFor a stacking with k = 2, the Basic and Condorcet components arequadrupled, and each Reversal component is doubled.

Berube and Crisman (Gordon College) Decomposing Aggregated Data Oct. 24, 2009 17 / 29

Page 54: Decomposition Behavior in Aggregated Data ... - Gordon Collegekcrisman/StackingTalk.pdfGordon College Oct. 24, 2009 Berube and Crisman (Gordon College)Decomposing Aggregated DataOct

Decomposing Stacks

Proof of the Stacking TheoremThe proof is actually instructive and elegant. Recall the theorem:

TheoremIf we stack an n × 3 matrix of ranks k times, each Basic and Condorcetcomponent is multiplied by k2, each Reversal component is multiplied byk, and the Kernel component is multiplied by k3.

(The proof of the Kernel may be done trivially. For a general p × 3 matrixof ranks, there are p3 triplets, so the size of the kernel is p3/6; hence, for akp × 3 matrix, we get k3(p3/6) as the size.)

The rest of the proof comes down to two lemmas:

LemmaAll triplets that are formed from elements taken from three differentstanzas add only kernel components to the resulting profile decomposition.

LemmaFor a stacking with k = 2, the Basic and Condorcet components arequadrupled, and each Reversal component is doubled.

Berube and Crisman (Gordon College) Decomposing Aggregated Data Oct. 24, 2009 17 / 29

Page 55: Decomposition Behavior in Aggregated Data ... - Gordon Collegekcrisman/StackingTalk.pdfGordon College Oct. 24, 2009 Berube and Crisman (Gordon College)Decomposing Aggregated DataOct

Decomposing Stacks

Proof of the Stacking Theorem

The proof is actually instructive and elegant. Recall the theorem:

TheoremIf we stack an n × 3 matrix of ranks k times, each Basic and Condorcetcomponent is multiplied by k2, each Reversal component is multiplied byk, and the Kernel component is multiplied by k3.

The rest of the proof comes down to two lemmas:

LemmaAll triplets that are formed from elements taken from three differentstanzas add only kernel components to the resulting profile decomposition.

LemmaFor a stacking with k = 2, the Basic and Condorcet components arequadrupled, and each Reversal component is doubled.

Berube and Crisman (Gordon College) Decomposing Aggregated Data Oct. 24, 2009 17 / 29

Page 56: Decomposition Behavior in Aggregated Data ... - Gordon Collegekcrisman/StackingTalk.pdfGordon College Oct. 24, 2009 Berube and Crisman (Gordon College)Decomposing Aggregated DataOct

Decomposing Stacks

Proof of the Stacking Theorem

The proof is actually instructive and elegant. Recall the theorem:

TheoremIf we stack an n × 3 matrix of ranks k times, each Basic and Condorcetcomponent is multiplied by k2, each Reversal component is multiplied byk, and the Kernel component is multiplied by k3.

The rest of the proof comes down to two lemmas:

LemmaAll triplets that are formed from elements taken from three differentstanzas add only kernel components to the resulting profile decomposition.

LemmaFor a stacking with k = 2, the Basic and Condorcet components arequadrupled, and each Reversal component is doubled.

Berube and Crisman (Gordon College) Decomposing Aggregated Data Oct. 24, 2009 17 / 29

Page 57: Decomposition Behavior in Aggregated Data ... - Gordon Collegekcrisman/StackingTalk.pdfGordon College Oct. 24, 2009 Berube and Crisman (Gordon College)Decomposing Aggregated DataOct

Decomposing Stacks

Proof of the Stacking Theorem

The proof is actually instructive and elegant. Recall the theorem:

TheoremIf we stack an n × 3 matrix of ranks k times, each Basic and Condorcetcomponent is multiplied by k2, each Reversal component is multiplied byk, and the Kernel component is multiplied by k3.

The rest of the proof comes down to two lemmas:

LemmaAll triplets that are formed from elements taken from three differentstanzas add only kernel components to the resulting profile decomposition.

(In fact, for m > 3 ‘candidates’, all m-tuplets formed from elements takenfrom m different stanzas add only kernel components.)

LemmaFor a stacking with k = 2, the Basic and Condorcet components arequadrupled, and each Reversal component is doubled.

Berube and Crisman (Gordon College) Decomposing Aggregated Data Oct. 24, 2009 17 / 29

Page 58: Decomposition Behavior in Aggregated Data ... - Gordon Collegekcrisman/StackingTalk.pdfGordon College Oct. 24, 2009 Berube and Crisman (Gordon College)Decomposing Aggregated DataOct

Decomposing Stacks

Proof of the Stacking Theorem

The proof is actually instructive and elegant. Recall the theorem:

TheoremIf we stack an n × 3 matrix of ranks k times, each Basic and Condorcetcomponent is multiplied by k2, each Reversal component is multiplied byk, and the Kernel component is multiplied by k3.

The rest of the proof comes down to two lemmas:

LemmaAll triplets that are formed from elements taken from three differentstanzas add only kernel components to the resulting profile decomposition.

LemmaFor a stacking with k = 2, the Basic and Condorcet components arequadrupled, and each Reversal component is doubled.

Berube and Crisman (Gordon College) Decomposing Aggregated Data Oct. 24, 2009 17 / 29

Page 59: Decomposition Behavior in Aggregated Data ... - Gordon Collegekcrisman/StackingTalk.pdfGordon College Oct. 24, 2009 Berube and Crisman (Gordon College)Decomposing Aggregated DataOct

Decomposing Stacks

Proof of the Stacking Theorem (cont.)

LemmaTriplets from elements taken from three different stanzas add only kernelcomponents.

LemmaFor k = 2, the Basic and Condorcet components are quadrupled, and theReversal component is doubled.

One proves this by computing carefully how the initial profile vector(a, b, c , d , e, f ) changes upon doubling (stacking k = 2), which is

(4a + b + c + e + f , a + 4b + c + d + f , a + b + 4c + d + e,

b + c + 4d + e + f , a + c + d + 4e + f , a + b + d + e + 4f ) .

Now multiplying both of these profiles by the decomposition matrix andcomparing the two results yields the lemma.

Berube and Crisman (Gordon College) Decomposing Aggregated Data Oct. 24, 2009 18 / 29

Page 60: Decomposition Behavior in Aggregated Data ... - Gordon Collegekcrisman/StackingTalk.pdfGordon College Oct. 24, 2009 Berube and Crisman (Gordon College)Decomposing Aggregated DataOct

Decomposing Stacks

Proof of the Stacking Theorem (cont.)

LemmaTriplets from elements taken from three different stanzas add only kernelcomponents.

One proves this by simply checking how many there are of each preferenceX � Y � Z , and it turns out there are exactly

(k3

)n3 of each.

LemmaFor k = 2, the Basic and Condorcet components are quadrupled, and theReversal component is doubled.

One proves this by computing carefully how the initial profile vector(a, b, c , d , e, f ) changes upon doubling (stacking k = 2), which is

(4a + b + c + e + f , a + 4b + c + d + f , a + b + 4c + d + e,

b + c + 4d + e + f , a + c + d + 4e + f , a + b + d + e + 4f ) .

Now multiplying both of these profiles by the decomposition matrix andcomparing the two results yields the lemma.

Berube and Crisman (Gordon College) Decomposing Aggregated Data Oct. 24, 2009 18 / 29

Page 61: Decomposition Behavior in Aggregated Data ... - Gordon Collegekcrisman/StackingTalk.pdfGordon College Oct. 24, 2009 Berube and Crisman (Gordon College)Decomposing Aggregated DataOct

Decomposing Stacks

Proof of the Stacking Theorem (cont.)

LemmaTriplets from elements taken from three different stanzas add only kernelcomponents.

LemmaFor k = 2, the Basic and Condorcet components are quadrupled, and theReversal component is doubled.

One proves this by computing carefully how the initial profile vector(a, b, c , d , e, f ) changes upon doubling (stacking k = 2), which is

(4a + b + c + e + f , a + 4b + c + d + f , a + b + 4c + d + e,

b + c + 4d + e + f , a + c + d + 4e + f , a + b + d + e + 4f ) .

Now multiplying both of these profiles by the decomposition matrix andcomparing the two results yields the lemma.

Berube and Crisman (Gordon College) Decomposing Aggregated Data Oct. 24, 2009 18 / 29

Page 62: Decomposition Behavior in Aggregated Data ... - Gordon Collegekcrisman/StackingTalk.pdfGordon College Oct. 24, 2009 Berube and Crisman (Gordon College)Decomposing Aggregated DataOct

Decomposing Stacks

Proof of the Stacking Theorem (cont.)

LemmaTriplets from elements taken from three different stanzas add only kernelcomponents.

LemmaFor k = 2, the Basic and Condorcet components are quadrupled, and theReversal component is doubled.

One proves this by computing carefully how the initial profile vector(a, b, c , d , e, f ) changes upon doubling (stacking k = 2), which is

(4a + b + c + e + f , a + 4b + c + d + f , a + b + 4c + d + e,

b + c + 4d + e + f , a + c + d + 4e + f , a + b + d + e + 4f ) .

Now multiplying both of these profiles by the decomposition matrix andcomparing the two results yields the lemma.

Berube and Crisman (Gordon College) Decomposing Aggregated Data Oct. 24, 2009 18 / 29

Page 63: Decomposition Behavior in Aggregated Data ... - Gordon Collegekcrisman/StackingTalk.pdfGordon College Oct. 24, 2009 Berube and Crisman (Gordon College)Decomposing Aggregated DataOct

Decomposing Stacks

Proof of the Stacking Theorem (cont.)

Now we prove the theorem.

LemmaTriplets from elements taken from three different stanzas add only kernelcomponents.

LemmaFor k = 2, the Basic and Condorcet components are quadrupled, and theReversal component is doubled.

Considering the k stanzas individually, we get k times the originalcomponents.The first lemma indicates we only need to look at rankings coming fromtwo different stanzas, of which there are

(k2

)possible choices. So we

obtain 2(k

2

)= k2 − k additional (BX and C , but not RX ) components.

Adding these to the k components we already have gives k2, as desired,except for Reversal which remains at k , also as desired.

Berube and Crisman (Gordon College) Decomposing Aggregated Data Oct. 24, 2009 19 / 29

Page 64: Decomposition Behavior in Aggregated Data ... - Gordon Collegekcrisman/StackingTalk.pdfGordon College Oct. 24, 2009 Berube and Crisman (Gordon College)Decomposing Aggregated DataOct

Decomposing Stacks

Proof of the Stacking Theorem (cont.)Now we prove the theorem.

LemmaTriplets from elements taken from three different stanzas add only kernelcomponents.

LemmaFor k = 2, the Basic and Condorcet components are quadrupled, and theReversal component is doubled.

Considering the k stanzas individually, we get k times the originalcomponents.(So the second lemma really is just saying that when k = 2, we get noadditional Reversal, but double our Basic and Condorcet.)

The first lemma indicates we only need to look at rankings coming fromtwo different stanzas, of which there are

(k2

)possible choices. So we

obtain 2(k

2

)= k2 − k additional (BX and C , but not RX ) components.

Adding these to the k components we already have gives k2, as desired,except for Reversal which remains at k , also as desired.

Berube and Crisman (Gordon College) Decomposing Aggregated Data Oct. 24, 2009 19 / 29

Page 65: Decomposition Behavior in Aggregated Data ... - Gordon Collegekcrisman/StackingTalk.pdfGordon College Oct. 24, 2009 Berube and Crisman (Gordon College)Decomposing Aggregated DataOct

Decomposing Stacks

Proof of the Stacking Theorem (cont.)

Now we prove the theorem.

LemmaTriplets from elements taken from three different stanzas add only kernelcomponents.

LemmaFor k = 2, the Basic and Condorcet components are quadrupled, and theReversal component is doubled.

Considering the k stanzas individually, we get k times the originalcomponents.The first lemma indicates we only need to look at rankings coming fromtwo different stanzas, of which there are

(k2

)possible choices. So we

obtain 2(k

2

)= k2 − k additional (BX and C , but not RX ) components.

Adding these to the k components we already have gives k2, as desired,except for Reversal which remains at k , also as desired.

Berube and Crisman (Gordon College) Decomposing Aggregated Data Oct. 24, 2009 19 / 29

Page 66: Decomposition Behavior in Aggregated Data ... - Gordon Collegekcrisman/StackingTalk.pdfGordon College Oct. 24, 2009 Berube and Crisman (Gordon College)Decomposing Aggregated DataOct

Pure Basics

Outline

Background

Definitions

Decomposing Stacks of Ranks

Pure Basics

Complements

Berube and Crisman (Gordon College) Decomposing Aggregated Data Oct. 24, 2009 20 / 29

Page 67: Decomposition Behavior in Aggregated Data ... - Gordon Collegekcrisman/StackingTalk.pdfGordon College Oct. 24, 2009 Berube and Crisman (Gordon College)Decomposing Aggregated DataOct

Pure Basics

First Results

I The results so far lead one to ask about the component whichbehaves best in terms of paradoxes, and what results we might haveregarding that. This is of course the Basic component.

I Although there is no set which has only a Basic component (nor anyprofile with a positive number of voters!), we call any voting profilewith only Kernel and Basic non-vanishing components a Pure Basic.

I Hence the following results are useful!

TheoremStacking can yield matrices of ranks with as large a Basic component asone desires, without being pure Basic.

FactPure Basic data sets exist.

Berube and Crisman (Gordon College) Decomposing Aggregated Data Oct. 24, 2009 21 / 29

Page 68: Decomposition Behavior in Aggregated Data ... - Gordon Collegekcrisman/StackingTalk.pdfGordon College Oct. 24, 2009 Berube and Crisman (Gordon College)Decomposing Aggregated DataOct

Pure Basics

First Results

I The results so far lead one to ask about the component whichbehaves best in terms of paradoxes, and what results we might haveregarding that. This is of course the Basic component.

I Although there is no set which has only a Basic component (nor anyprofile with a positive number of voters!), we call any voting profilewith only Kernel and Basic non-vanishing components a Pure Basic.

I Hence the following results are useful!

TheoremStacking can yield matrices of ranks with as large a Basic component asone desires, without being pure Basic.

FactPure Basic data sets exist.

Berube and Crisman (Gordon College) Decomposing Aggregated Data Oct. 24, 2009 21 / 29

Page 69: Decomposition Behavior in Aggregated Data ... - Gordon Collegekcrisman/StackingTalk.pdfGordon College Oct. 24, 2009 Berube and Crisman (Gordon College)Decomposing Aggregated DataOct

Pure Basics

First Results

I The results so far lead one to ask about the component whichbehaves best in terms of paradoxes, and what results we might haveregarding that. This is of course the Basic component.

I Although there is no set which has only a Basic component (nor anyprofile with a positive number of voters!), we call any voting profilewith only Kernel and Basic non-vanishing components a Pure Basic.

I Hence the following results are useful!

TheoremStacking can yield matrices of ranks with as large a Basic component asone desires, without being pure Basic.

FactPure Basic data sets exist.

Berube and Crisman (Gordon College) Decomposing Aggregated Data Oct. 24, 2009 21 / 29

Page 70: Decomposition Behavior in Aggregated Data ... - Gordon Collegekcrisman/StackingTalk.pdfGordon College Oct. 24, 2009 Berube and Crisman (Gordon College)Decomposing Aggregated DataOct

Pure Basics

First Results

I The results so far lead one to ask about the component whichbehaves best in terms of paradoxes, and what results we might haveregarding that. This is of course the Basic component.

I Although there is no set which has only a Basic component (nor anyprofile with a positive number of voters!), we call any voting profilewith only Kernel and Basic non-vanishing components a Pure Basic.

I Hence the following results are useful!

TheoremStacking can yield matrices of ranks with as large a Basic component asone desires, without being pure Basic.

FactPure Basic data sets exist.

Berube and Crisman (Gordon College) Decomposing Aggregated Data Oct. 24, 2009 21 / 29

Page 71: Decomposition Behavior in Aggregated Data ... - Gordon Collegekcrisman/StackingTalk.pdfGordon College Oct. 24, 2009 Berube and Crisman (Gordon College)Decomposing Aggregated DataOct

Pure Basics

Proofs of First Results

TheoremStacking can yield matrices of ranks with as large a Basic component asone desires, without being pure Basic.

FactPure Basic data sets exist.

Implicit in Bargagliotti and Saari (2007) are propositions that if a profilecomes from a pure Basic data set, it must have n3 divisible by both 2 and3, and hence n is divisible by six. Now a direct computation using theopen source mathematics software Sage revealed that out of overseventeen million possible data sets of size n = 6, only about eightthousand were pure Basic - but they were there!See also the relevant note in the Communications of the ACM. Note thatwe still need the theorems, since the next possible size (n = 12) isapproximately nine orders of magnitude more difficult of a computation!

Berube and Crisman (Gordon College) Decomposing Aggregated Data Oct. 24, 2009 22 / 29

Page 72: Decomposition Behavior in Aggregated Data ... - Gordon Collegekcrisman/StackingTalk.pdfGordon College Oct. 24, 2009 Berube and Crisman (Gordon College)Decomposing Aggregated DataOct

Pure Basics

Proofs of First Results

TheoremStacking can yield matrices of ranks with as large a Basic component asone desires, without being pure Basic.

Take any matrix with no Condorcet component. Now just note that Pk2

eventually outstrips Qk , no matter what P, Q are.

FactPure Basic data sets exist.

Implicit in Bargagliotti and Saari (2007) are propositions that if a profilecomes from a pure Basic data set, it must have n3 divisible by both 2 and3, and hence n is divisible by six. Now a direct computation using theopen source mathematics software Sage revealed that out of overseventeen million possible data sets of size n = 6, only about eightthousand were pure Basic - but they were there!See also the relevant note in the Communications of the ACM. Note thatwe still need the theorems, since the next possible size (n = 12) isapproximately nine orders of magnitude more difficult of a computation!

Berube and Crisman (Gordon College) Decomposing Aggregated Data Oct. 24, 2009 22 / 29

Page 73: Decomposition Behavior in Aggregated Data ... - Gordon Collegekcrisman/StackingTalk.pdfGordon College Oct. 24, 2009 Berube and Crisman (Gordon College)Decomposing Aggregated DataOct

Pure Basics

Proofs of First Results

TheoremStacking can yield matrices of ranks with as large a Basic component asone desires, without being pure Basic.

FactPure Basic data sets exist.

Implicit in Bargagliotti and Saari (2007) are propositions that if a profilecomes from a pure Basic data set, it must have n3 divisible by both 2 and3, and hence n is divisible by six. Now a direct computation using theopen source mathematics software Sage revealed that out of overseventeen million possible data sets of size n = 6, only about eightthousand were pure Basic - but they were there!See also the relevant note in the Communications of the ACM. Note thatwe still need the theorems, since the next possible size (n = 12) isapproximately nine orders of magnitude more difficult of a computation!

Berube and Crisman (Gordon College) Decomposing Aggregated Data Oct. 24, 2009 22 / 29

Page 74: Decomposition Behavior in Aggregated Data ... - Gordon Collegekcrisman/StackingTalk.pdfGordon College Oct. 24, 2009 Berube and Crisman (Gordon College)Decomposing Aggregated DataOct

Pure Basics

Proofs of First Results

TheoremStacking can yield matrices of ranks with as large a Basic component asone desires, without being pure Basic.

FactPure Basic data sets exist.

Implicit in Bargagliotti and Saari (2007) are propositions that if a profilecomes from a pure Basic data set, it must have n3 divisible by both 2 and3, and hence n is divisible by six. Now a direct computation using theopen source mathematics software Sage revealed that out of overseventeen million possible data sets of size n = 6, only about eightthousand were pure Basic - but they were there!

See also the relevant note in the Communications of the ACM. Note thatwe still need the theorems, since the next possible size (n = 12) isapproximately nine orders of magnitude more difficult of a computation!

Berube and Crisman (Gordon College) Decomposing Aggregated Data Oct. 24, 2009 22 / 29

Page 75: Decomposition Behavior in Aggregated Data ... - Gordon Collegekcrisman/StackingTalk.pdfGordon College Oct. 24, 2009 Berube and Crisman (Gordon College)Decomposing Aggregated DataOct

Pure Basics

Proofs of First Results

TheoremStacking can yield matrices of ranks with as large a Basic component asone desires, without being pure Basic.

FactPure Basic data sets exist.

Implicit in Bargagliotti and Saari (2007) are propositions that if a profilecomes from a pure Basic data set, it must have n3 divisible by both 2 and3, and hence n is divisible by six. Now a direct computation using theopen source mathematics software Sage revealed that out of overseventeen million possible data sets of size n = 6, only about eightthousand were pure Basic - but they were there!See also the relevant note in the Communications of the ACM. Note thatwe still need the theorems, since the next possible size (n = 12) isapproximately nine orders of magnitude more difficult of a computation!

Berube and Crisman (Gordon College) Decomposing Aggregated Data Oct. 24, 2009 22 / 29

Page 76: Decomposition Behavior in Aggregated Data ... - Gordon Collegekcrisman/StackingTalk.pdfGordon College Oct. 24, 2009 Berube and Crisman (Gordon College)Decomposing Aggregated DataOct

Pure Basics

Characterizing Pure BasicsWe are nowhere near a full characterization of pure Basic data sets, noteven at the level of the characterizations of pure Condorcet, Reversal, andKernel voting profiles arising from nonparametric data sets found inBargagliotti and Saari (2007). Nonetheless, there are interesting first steps.

TheoremIf any three entries in a pure Basic profile vector are known, or if we knowtwo entries which do not correspond to opposite rankings (such asA � B � C and C � B � A), it is possible to find the remaining entries.

TheoremIf n = 6` is the size of the data set and the data set is pure Basic, then allentries in the underlying profile vector are divisible by 3`.

For instance, all profile entries from a pure Basic data set with sixobservations are divisible by three. These are the first results we know ofalong these lines, which rely in a fundamental way upon the profile arisingfrom a nonparametric data set.

Berube and Crisman (Gordon College) Decomposing Aggregated Data Oct. 24, 2009 23 / 29

Page 77: Decomposition Behavior in Aggregated Data ... - Gordon Collegekcrisman/StackingTalk.pdfGordon College Oct. 24, 2009 Berube and Crisman (Gordon College)Decomposing Aggregated DataOct

Pure Basics

Characterizing Pure BasicsWe are nowhere near a full characterization of pure Basic data sets, noteven at the level of the characterizations of pure Condorcet, Reversal, andKernel voting profiles arising from nonparametric data sets found inBargagliotti and Saari (2007). Nonetheless, there are interesting first steps.

TheoremIf any three entries in a pure Basic profile vector are known, or if we knowtwo entries which do not correspond to opposite rankings (such asA � B � C and C � B � A), it is possible to find the remaining entries.

TheoremIf n = 6` is the size of the data set and the data set is pure Basic, then allentries in the underlying profile vector are divisible by 3`.

For instance, all profile entries from a pure Basic data set with sixobservations are divisible by three. These are the first results we know ofalong these lines, which rely in a fundamental way upon the profile arisingfrom a nonparametric data set.

Berube and Crisman (Gordon College) Decomposing Aggregated Data Oct. 24, 2009 23 / 29

Page 78: Decomposition Behavior in Aggregated Data ... - Gordon Collegekcrisman/StackingTalk.pdfGordon College Oct. 24, 2009 Berube and Crisman (Gordon College)Decomposing Aggregated DataOct

Pure Basics

Characterizing Pure BasicsWe are nowhere near a full characterization of pure Basic data sets, noteven at the level of the characterizations of pure Condorcet, Reversal, andKernel voting profiles arising from nonparametric data sets found inBargagliotti and Saari (2007). Nonetheless, there are interesting first steps.

TheoremIf any three entries in a pure Basic profile vector are known, or if we knowtwo entries which do not correspond to opposite rankings (such asA � B � C and C � B � A), it is possible to find the remaining entries.

TheoremIf n = 6` is the size of the data set and the data set is pure Basic, then allentries in the underlying profile vector are divisible by 3`.

For instance, all profile entries from a pure Basic data set with sixobservations are divisible by three. These are the first results we know ofalong these lines, which rely in a fundamental way upon the profile arisingfrom a nonparametric data set.

Berube and Crisman (Gordon College) Decomposing Aggregated Data Oct. 24, 2009 23 / 29

Page 79: Decomposition Behavior in Aggregated Data ... - Gordon Collegekcrisman/StackingTalk.pdfGordon College Oct. 24, 2009 Berube and Crisman (Gordon College)Decomposing Aggregated DataOct

Pure Basics

Characterizing Pure BasicsWe are nowhere near a full characterization of pure Basic data sets, noteven at the level of the characterizations of pure Condorcet, Reversal, andKernel voting profiles arising from nonparametric data sets found inBargagliotti and Saari (2007). Nonetheless, there are interesting first steps.

TheoremIf any three entries in a pure Basic profile vector are known, or if we knowtwo entries which do not correspond to opposite rankings (such asA � B � C and C � B � A), it is possible to find the remaining entries.

TheoremIf n = 6` is the size of the data set and the data set is pure Basic, then allentries in the underlying profile vector are divisible by 3`.

For instance, all profile entries from a pure Basic data set with sixobservations are divisible by three. These are the first results we know ofalong these lines, which rely in a fundamental way upon the profile arisingfrom a nonparametric data set.

Berube and Crisman (Gordon College) Decomposing Aggregated Data Oct. 24, 2009 23 / 29

Page 80: Decomposition Behavior in Aggregated Data ... - Gordon Collegekcrisman/StackingTalk.pdfGordon College Oct. 24, 2009 Berube and Crisman (Gordon College)Decomposing Aggregated DataOct

Pure Basics

Proving Characterizations

TheoremIf any three entries in a pure Basic profile vector are known, or if we knowtwo entries which do not correspond to reversed rankings (such asA � B � C and C � B � A), it is possible to find the remaining entries.

TheoremIf n = 6` is the size of the data set and the data set is pure Basic, then allentries in the underlying profile vector are divisible by 3`.

Berube and Crisman (Gordon College) Decomposing Aggregated Data Oct. 24, 2009 24 / 29

Page 81: Decomposition Behavior in Aggregated Data ... - Gordon Collegekcrisman/StackingTalk.pdfGordon College Oct. 24, 2009 Berube and Crisman (Gordon College)Decomposing Aggregated DataOct

Pure Basics

Proving Characterizations

TheoremIf any three entries in a pure Basic profile vector are known, or if we knowtwo entries which do not correspond to reversed rankings (such asA � B � C and C � B � A), it is possible to find the remaining entries.

For three, the proof is simply linear algebra. For two, it is in additionnecessary to use the proofs of the lemmas from earlier which guaranteethat n is divisible by 2 and 3. There do exist non-equivalent pure Basicprofiles where two reversed rankings have the same numbers in the profile,so this theorem is sharp.

TheoremIf n = 6` is the size of the data set and the data set is pure Basic, then allentries in the underlying profile vector are divisible by 3`.

Berube and Crisman (Gordon College) Decomposing Aggregated Data Oct. 24, 2009 24 / 29

Page 82: Decomposition Behavior in Aggregated Data ... - Gordon Collegekcrisman/StackingTalk.pdfGordon College Oct. 24, 2009 Berube and Crisman (Gordon College)Decomposing Aggregated DataOct

Pure Basics

Proving Characterizations

TheoremIf any three entries in a pure Basic profile vector are known, or if we knowtwo entries which do not correspond to reversed rankings (such asA � B � C and C � B � A), it is possible to find the remaining entries.

TheoremIf n = 6` is the size of the data set and the data set is pure Basic, then allentries in the underlying profile vector are divisible by 3`.

Berube and Crisman (Gordon College) Decomposing Aggregated Data Oct. 24, 2009 24 / 29

Page 83: Decomposition Behavior in Aggregated Data ... - Gordon Collegekcrisman/StackingTalk.pdfGordon College Oct. 24, 2009 Berube and Crisman (Gordon College)Decomposing Aggregated DataOct

Pure Basics

Proving Characterizations

TheoremIf any three entries in a pure Basic profile vector are known, or if we knowtwo entries which do not correspond to reversed rankings (such asA � B � C and C � B � A), it is possible to find the remaining entries.

TheoremIf n = 6` is the size of the data set and the data set is pure Basic, then allentries in the underlying profile vector are divisible by 3`.

In fact, if one decomposes a profile coming from a nonparametric data setwith n rows, one can prove that the Basic components are all multiples ofn/6, the Reversal components are either multiples of 1/3 or 1/6, and theCondorcet component is either an even or odd multiple of n/6! (Theselast two depend on whether n is even or odd.)

Berube and Crisman (Gordon College) Decomposing Aggregated Data Oct. 24, 2009 24 / 29

Page 84: Decomposition Behavior in Aggregated Data ... - Gordon Collegekcrisman/StackingTalk.pdfGordon College Oct. 24, 2009 Berube and Crisman (Gordon College)Decomposing Aggregated DataOct

Pure Basics

Proving Characterizations (cont.)

TheoremIf n = 6` is the size of the data set and the data set is pure Basic, then allentries in the underlying profile vector are divisible by 3`.

To prove this theorem, we need a new concept - that of a transposition orswap of two elements (i , j) of a matrix of ranks. This is simply a switch ofthese ranks between two matrices of ranks.The set of all neighbor swaps (i , i − 1) from a given matrix of ranks willgenerate all possible matrices of ranks for a given shape n × 3. Inparticular, we can begin with a canonical ‘unanimity’ matrix of rankswhich has profile (n3, 0, 0, 0, 0, 0) and decompositionn3

6 (BA − BC − RB + C + K ) and work from this fixed point.Finally, since n must be even, we let n = 2k and write the decompositionas 4k3

3 (BA − BC − RB + C + K ).

Berube and Crisman (Gordon College) Decomposing Aggregated Data Oct. 24, 2009 25 / 29

Page 85: Decomposition Behavior in Aggregated Data ... - Gordon Collegekcrisman/StackingTalk.pdfGordon College Oct. 24, 2009 Berube and Crisman (Gordon College)Decomposing Aggregated DataOct

Pure Basics

Proving Characterizations (cont.)

TheoremIf n = 6` is the size of the data set and the data set is pure Basic, then allentries in the underlying profile vector are divisible by 3`.

To prove this theorem, we need a new concept - that of a transposition orswap of two elements (i , j) of a matrix of ranks. This is simply a switch ofthese ranks between two matrices of ranks.

The set of all neighbor swaps (i , i − 1) from a given matrix of ranks willgenerate all possible matrices of ranks for a given shape n × 3. Inparticular, we can begin with a canonical ‘unanimity’ matrix of rankswhich has profile (n3, 0, 0, 0, 0, 0) and decompositionn3

6 (BA − BC − RB + C + K ) and work from this fixed point.Finally, since n must be even, we let n = 2k and write the decompositionas 4k3

3 (BA − BC − RB + C + K ).

Berube and Crisman (Gordon College) Decomposing Aggregated Data Oct. 24, 2009 25 / 29

Page 86: Decomposition Behavior in Aggregated Data ... - Gordon Collegekcrisman/StackingTalk.pdfGordon College Oct. 24, 2009 Berube and Crisman (Gordon College)Decomposing Aggregated DataOct

Pure Basics

Proving Characterizations (cont.)

TheoremIf n = 6` is the size of the data set and the data set is pure Basic, then allentries in the underlying profile vector are divisible by 3`.

To prove this theorem, we need a new concept - that of a transposition orswap of two elements (i , j) of a matrix of ranks. This is simply a switch ofthese ranks between two matrices of ranks.The following shows a (5, 2) transposition:(

6 5 41 3 2

)becomes

(6 3 51 2 4

).

The set of all neighbor swaps (i , i − 1) from a given matrix of ranks willgenerate all possible matrices of ranks for a given shape n × 3. Inparticular, we can begin with a canonical ‘unanimity’ matrix of rankswhich has profile (n3, 0, 0, 0, 0, 0) and decompositionn3

6 (BA − BC − RB + C + K ) and work from this fixed point.Finally, since n must be even, we let n = 2k and write the decompositionas 4k3

3 (BA − BC − RB + C + K ).

Berube and Crisman (Gordon College) Decomposing Aggregated Data Oct. 24, 2009 25 / 29

Page 87: Decomposition Behavior in Aggregated Data ... - Gordon Collegekcrisman/StackingTalk.pdfGordon College Oct. 24, 2009 Berube and Crisman (Gordon College)Decomposing Aggregated DataOct

Pure Basics

Proving Characterizations (cont.)

TheoremIf n = 6` is the size of the data set and the data set is pure Basic, then allentries in the underlying profile vector are divisible by 3`.

To prove this theorem, we need a new concept - that of a transposition orswap of two elements (i , j) of a matrix of ranks. This is simply a switch ofthese ranks between two matrices of ranks.The set of all neighbor swaps (i , i − 1) from a given matrix of ranks willgenerate all possible matrices of ranks for a given shape n × 3. Inparticular, we can begin with a canonical ‘unanimity’ matrix of rankswhich has profile (n3, 0, 0, 0, 0, 0) and decompositionn3

6 (BA − BC − RB + C + K ) and work from this fixed point.

Finally, since n must be even, we let n = 2k and write the decompositionas 4k3

3 (BA − BC − RB + C + K ).

Berube and Crisman (Gordon College) Decomposing Aggregated Data Oct. 24, 2009 25 / 29

Page 88: Decomposition Behavior in Aggregated Data ... - Gordon Collegekcrisman/StackingTalk.pdfGordon College Oct. 24, 2009 Berube and Crisman (Gordon College)Decomposing Aggregated DataOct

Pure Basics

Proving Characterizations (cont.)

TheoremIf n = 6` is the size of the data set and the data set is pure Basic, then allentries in the underlying profile vector are divisible by 3`.

To prove this theorem, we need a new concept - that of a transposition orswap of two elements (i , j) of a matrix of ranks. This is simply a switch ofthese ranks between two matrices of ranks.The set of all neighbor swaps (i , i − 1) from a given matrix of ranks willgenerate all possible matrices of ranks for a given shape n × 3. Inparticular, we can begin with a canonical ‘unanimity’ matrix of rankswhich has profile (n3, 0, 0, 0, 0, 0) and decompositionn3

6 (BA − BC − RB + C + K ) and work from this fixed point.Finally, since n must be even, we let n = 2k and write the decompositionas 4k3

3 (BA − BC − RB + C + K ).

Berube and Crisman (Gordon College) Decomposing Aggregated Data Oct. 24, 2009 25 / 29

Page 89: Decomposition Behavior in Aggregated Data ... - Gordon Collegekcrisman/StackingTalk.pdfGordon College Oct. 24, 2009 Berube and Crisman (Gordon College)Decomposing Aggregated DataOct

Pure Basics

Proving Characterizations (cont.)Now we can outline the proof.

LemmaAny neighbor transposition (i , i − 1) between the columns for candidatesY and Z (respectively) changes the Condorcet component by ±2k

3 , the

Basic component by k3 (BZ − BY ), and the Reversal component by an

integer multiple of 16 (RY − RZ ).

LemmaA sequence of neighbor transpositions which brings the Condorcetcomponent to zero makes the Basic component an integer multiple of k.

Proof of Theorem.Recall that if n = 6`, then k = 3`, so that the Basic components are amultiple of 3`. The Kernel also is, as n3/6 = (6`)(6`)(2k)/6 = 3`(4k`),and clearly the Condorcet and Reversal components are, since they arezero! Then we multiply by the (integer!) column matrix obtained from thebasis, whereupon all entries are still divisible by 3`.

Berube and Crisman (Gordon College) Decomposing Aggregated Data Oct. 24, 2009 26 / 29

Page 90: Decomposition Behavior in Aggregated Data ... - Gordon Collegekcrisman/StackingTalk.pdfGordon College Oct. 24, 2009 Berube and Crisman (Gordon College)Decomposing Aggregated DataOct

Pure Basics

Proving Characterizations (cont.)Now we can outline the proof.

LemmaAny neighbor transposition (i , i − 1) between the columns for candidatesY and Z (respectively) changes the Condorcet component by ±2k

3 , the

Basic component by k3 (BZ − BY ), and the Reversal component by an

integer multiple of 16 (RY − RZ ).

LemmaA sequence of neighbor transpositions which brings the Condorcetcomponent to zero makes the Basic component an integer multiple of k.

Proof of Theorem.Recall that if n = 6`, then k = 3`, so that the Basic components are amultiple of 3`. The Kernel also is, as n3/6 = (6`)(6`)(2k)/6 = 3`(4k`),and clearly the Condorcet and Reversal components are, since they arezero! Then we multiply by the (integer!) column matrix obtained from thebasis, whereupon all entries are still divisible by 3`.

Berube and Crisman (Gordon College) Decomposing Aggregated Data Oct. 24, 2009 26 / 29

Page 91: Decomposition Behavior in Aggregated Data ... - Gordon Collegekcrisman/StackingTalk.pdfGordon College Oct. 24, 2009 Berube and Crisman (Gordon College)Decomposing Aggregated DataOct

Pure Basics

Proving Characterizations (cont.)Now we can outline the proof.

LemmaAny neighbor transposition (i , i − 1) between the columns for candidatesY and Z (respectively) changes the Condorcet component by ±2k

3 , the

Basic component by k3 (BZ − BY ), and the Reversal component by an

integer multiple of 16 (RY − RZ ).

LemmaA sequence of neighbor transpositions which brings the Condorcetcomponent to zero makes the Basic component an integer multiple of k.

The proofs of the lemmas are unenlightening computations with votingprofile differentials, and we omit them here.

Proof of Theorem.Recall that if n = 6`, then k = 3`, so that the Basic components are amultiple of 3`. The Kernel also is, as n3/6 = (6`)(6`)(2k)/6 = 3`(4k`),and clearly the Condorcet and Reversal components are, since they arezero! Then we multiply by the (integer!) column matrix obtained from thebasis, whereupon all entries are still divisible by 3`.

Berube and Crisman (Gordon College) Decomposing Aggregated Data Oct. 24, 2009 26 / 29

Page 92: Decomposition Behavior in Aggregated Data ... - Gordon Collegekcrisman/StackingTalk.pdfGordon College Oct. 24, 2009 Berube and Crisman (Gordon College)Decomposing Aggregated DataOct

Pure Basics

Proving Characterizations (cont.)Now we can outline the proof.

LemmaAny neighbor transposition (i , i − 1) between the columns for candidatesY and Z (respectively) changes the Condorcet component by ±2k

3 , the

Basic component by k3 (BZ − BY ), and the Reversal component by an

integer multiple of 16 (RY − RZ ).

LemmaA sequence of neighbor transpositions which brings the Condorcetcomponent to zero makes the Basic component an integer multiple of k.

Proof of Theorem.Recall that if n = 6`, then k = 3`, so that the Basic components are amultiple of 3`. The Kernel also is, as n3/6 = (6`)(6`)(2k)/6 = 3`(4k`),and clearly the Condorcet and Reversal components are, since they arezero! Then we multiply by the (integer!) column matrix obtained from thebasis, whereupon all entries are still divisible by 3`.

Berube and Crisman (Gordon College) Decomposing Aggregated Data Oct. 24, 2009 26 / 29

Page 93: Decomposition Behavior in Aggregated Data ... - Gordon Collegekcrisman/StackingTalk.pdfGordon College Oct. 24, 2009 Berube and Crisman (Gordon College)Decomposing Aggregated DataOct

Complements

Outline

Background

Definitions

Decomposing Stacks of Ranks

Pure Basics

Complements

Berube and Crisman (Gordon College) Decomposing Aggregated Data Oct. 24, 2009 27 / 29

Page 94: Decomposition Behavior in Aggregated Data ... - Gordon Collegekcrisman/StackingTalk.pdfGordon College Oct. 24, 2009 Berube and Crisman (Gordon College)Decomposing Aggregated DataOct

Complements

Directions to Proceed

There is of course plenty more work to do in this regard!

Questions

I Will stacking help us with other aggregation questions?

I Can one say more about aggregation directly from the raw matrix ofranks (in the vein of Haunsperger or Bargagliotti), and not just usingthe proxy of voting profiles?

I On a somewhat more ambitious note, one could also try to generalizethe specifics of some of these ideas for n > 3. This seems harder.

I On a very ambitious note, can one characterize the subset of generalvoting profile space that matrices of ranks generate?

Berube and Crisman (Gordon College) Decomposing Aggregated Data Oct. 24, 2009 28 / 29

Page 95: Decomposition Behavior in Aggregated Data ... - Gordon Collegekcrisman/StackingTalk.pdfGordon College Oct. 24, 2009 Berube and Crisman (Gordon College)Decomposing Aggregated DataOct

Complements

Directions to Proceed

There is of course plenty more work to do in this regard!

Questions

I Will stacking help us with other aggregation questions?

I Can one say more about aggregation directly from the raw matrix ofranks (in the vein of Haunsperger or Bargagliotti), and not just usingthe proxy of voting profiles?

I On a somewhat more ambitious note, one could also try to generalizethe specifics of some of these ideas for n > 3. This seems harder.

I On a very ambitious note, can one characterize the subset of generalvoting profile space that matrices of ranks generate?

Berube and Crisman (Gordon College) Decomposing Aggregated Data Oct. 24, 2009 28 / 29

Page 96: Decomposition Behavior in Aggregated Data ... - Gordon Collegekcrisman/StackingTalk.pdfGordon College Oct. 24, 2009 Berube and Crisman (Gordon College)Decomposing Aggregated DataOct

Complements

Directions to Proceed

There is of course plenty more work to do in this regard!

Questions

I Will stacking help us with other aggregation questions?

I Can one say more about aggregation directly from the raw matrix ofranks (in the vein of Haunsperger or Bargagliotti), and not just usingthe proxy of voting profiles?

I On a somewhat more ambitious note, one could also try to generalizethe specifics of some of these ideas for n > 3. This seems harder.

I On a very ambitious note, can one characterize the subset of generalvoting profile space that matrices of ranks generate?

Berube and Crisman (Gordon College) Decomposing Aggregated Data Oct. 24, 2009 28 / 29

Page 97: Decomposition Behavior in Aggregated Data ... - Gordon Collegekcrisman/StackingTalk.pdfGordon College Oct. 24, 2009 Berube and Crisman (Gordon College)Decomposing Aggregated DataOct

Complements

Directions to Proceed

There is of course plenty more work to do in this regard!

Questions

I Will stacking help us with other aggregation questions?

I Can one say more about aggregation directly from the raw matrix ofranks (in the vein of Haunsperger or Bargagliotti), and not just usingthe proxy of voting profiles?

I On a somewhat more ambitious note, one could also try to generalizethe specifics of some of these ideas for n > 3. This seems harder.

I On a very ambitious note, can one characterize the subset of generalvoting profile space that matrices of ranks generate?

Berube and Crisman (Gordon College) Decomposing Aggregated Data Oct. 24, 2009 28 / 29

Page 98: Decomposition Behavior in Aggregated Data ... - Gordon Collegekcrisman/StackingTalk.pdfGordon College Oct. 24, 2009 Berube and Crisman (Gordon College)Decomposing Aggregated DataOct

Complements

Acknowledgments

Finally, I’d like to thank the following:

I Sarah Berube - for her enthusiasm and talent as a research and REUstudent, and collaborator

I Anna Bargagliotti - for helpful emails and encouraging the project

I The Gordon College Faculty Development Committee - for theInitiative Grant which made the REU possible

I Mike Veatch and the queuing theory group at Gordon - for a goodwork environment

I Don Saari - for organizing this conference and helpful feedback

I All of you - for coming!

Berube and Crisman (Gordon College) Decomposing Aggregated Data Oct. 24, 2009 29 / 29

Page 99: Decomposition Behavior in Aggregated Data ... - Gordon Collegekcrisman/StackingTalk.pdfGordon College Oct. 24, 2009 Berube and Crisman (Gordon College)Decomposing Aggregated DataOct

Complements

Acknowledgments

Finally, I’d like to thank the following:

I Sarah Berube - for her enthusiasm and talent as a research and REUstudent, and collaborator

I Anna Bargagliotti - for helpful emails and encouraging the project

I The Gordon College Faculty Development Committee - for theInitiative Grant which made the REU possible

I Mike Veatch and the queuing theory group at Gordon - for a goodwork environment

I Don Saari - for organizing this conference and helpful feedback

I All of you - for coming!

Berube and Crisman (Gordon College) Decomposing Aggregated Data Oct. 24, 2009 29 / 29

Page 100: Decomposition Behavior in Aggregated Data ... - Gordon Collegekcrisman/StackingTalk.pdfGordon College Oct. 24, 2009 Berube and Crisman (Gordon College)Decomposing Aggregated DataOct

Complements

Acknowledgments

Finally, I’d like to thank the following:

I Sarah Berube - for her enthusiasm and talent as a research and REUstudent, and collaborator

I Anna Bargagliotti - for helpful emails and encouraging the project

I The Gordon College Faculty Development Committee - for theInitiative Grant which made the REU possible

I Mike Veatch and the queuing theory group at Gordon - for a goodwork environment

I Don Saari - for organizing this conference and helpful feedback

I All of you - for coming!

Berube and Crisman (Gordon College) Decomposing Aggregated Data Oct. 24, 2009 29 / 29

Page 101: Decomposition Behavior in Aggregated Data ... - Gordon Collegekcrisman/StackingTalk.pdfGordon College Oct. 24, 2009 Berube and Crisman (Gordon College)Decomposing Aggregated DataOct

Complements

Acknowledgments

Finally, I’d like to thank the following:

I Sarah Berube - for her enthusiasm and talent as a research and REUstudent, and collaborator

I Anna Bargagliotti - for helpful emails and encouraging the project

I The Gordon College Faculty Development Committee - for theInitiative Grant which made the REU possible

I Mike Veatch and the queuing theory group at Gordon - for a goodwork environment

I Don Saari - for organizing this conference and helpful feedback

I All of you - for coming!

Berube and Crisman (Gordon College) Decomposing Aggregated Data Oct. 24, 2009 29 / 29

Page 102: Decomposition Behavior in Aggregated Data ... - Gordon Collegekcrisman/StackingTalk.pdfGordon College Oct. 24, 2009 Berube and Crisman (Gordon College)Decomposing Aggregated DataOct

Complements

Acknowledgments

Finally, I’d like to thank the following:

I Sarah Berube - for her enthusiasm and talent as a research and REUstudent, and collaborator

I Anna Bargagliotti - for helpful emails and encouraging the project

I The Gordon College Faculty Development Committee - for theInitiative Grant which made the REU possible

I Mike Veatch and the queuing theory group at Gordon - for a goodwork environment

I Don Saari - for organizing this conference and helpful feedback

I All of you - for coming!

Berube and Crisman (Gordon College) Decomposing Aggregated Data Oct. 24, 2009 29 / 29

Page 103: Decomposition Behavior in Aggregated Data ... - Gordon Collegekcrisman/StackingTalk.pdfGordon College Oct. 24, 2009 Berube and Crisman (Gordon College)Decomposing Aggregated DataOct

Complements

Acknowledgments

Finally, I’d like to thank the following:

I Sarah Berube - for her enthusiasm and talent as a research and REUstudent, and collaborator

I Anna Bargagliotti - for helpful emails and encouraging the project

I The Gordon College Faculty Development Committee - for theInitiative Grant which made the REU possible

I Mike Veatch and the queuing theory group at Gordon - for a goodwork environment

I Don Saari - for organizing this conference and helpful feedback

I All of you - for coming!

Berube and Crisman (Gordon College) Decomposing Aggregated Data Oct. 24, 2009 29 / 29

Page 104: Decomposition Behavior in Aggregated Data ... - Gordon Collegekcrisman/StackingTalk.pdfGordon College Oct. 24, 2009 Berube and Crisman (Gordon College)Decomposing Aggregated DataOct

Complements

Acknowledgments

Finally, I’d like to thank the following:

I Sarah Berube - for her enthusiasm and talent as a research and REUstudent, and collaborator

I Anna Bargagliotti - for helpful emails and encouraging the project

I The Gordon College Faculty Development Committee - for theInitiative Grant which made the REU possible

I Mike Veatch and the queuing theory group at Gordon - for a goodwork environment

I Don Saari - for organizing this conference and helpful feedback

I All of you - for coming!

Berube and Crisman (Gordon College) Decomposing Aggregated Data Oct. 24, 2009 29 / 29