Masters Project edited from the commentslike to thank Davar Khoshnevisan for getting me started in the right direction and for the advice he offered in finding books that would aide

1

STATISTICAL MANIPULATION:

AN EXAMINATION OF

VARIOUS SUBJECTIVE METHODS

USED IN DATA COLLECTION

AND PRESENTATION

by

Joshua S. Rempfer

A Master’s Project submitted to the faculty of The University of Utah

In partial fulfillment of the requirements for the degree of

Masters of Science Degree for Secondary School Teachers

Mathematics (Teaching)

College of Science

The University of Utah

April 2013

2

THE UNIVERSITY OF UTAH GRADUATE SCHOOL

SUPERVISORY COMMITTEE APPROVAL

of a project submitted by

Joshua S. Rempfer This project has been read by each member of the following supervisory committee and by majority vote has been found to be satisfactory. Date Committee Chair: Hugo Rossi Date Committee Member: Amanda Cangelosi Date Committee Member: Davar Khoshnevisan

3

Table of Contents

Acknowledgements………………………………………………………………………… 4

Abstract………...……………………………………………………………………………. 5

Part I: Literature Review

Introduction………………………………………………………………………………………..7

Strategy I: Consider the Source…………………………………..………………………………8

Strategy II: Identifying Bias……………………………………………………………………..11

Strategy III: Accurate Definitions……………………………………………………………….13

Strategy IV: Misleading Comparisons…………………………………………………………..15

Strategy V: Misleading Presentations……………………………………………………...........22

Conclusion: ……………………………………………………………………………………..26

Part II: Scientific Research

Introduction: …………………………………………………………………………………….29

Background Information:………………………………………………………………………...29

Question, Hypothesis, Method of Research:……………..………….…………………………...31

Defining “Clutch” Performance:…………………………………………………………………31

Gathering Data:…………………………………………………………………………………..32

Glossary of Terms:…………………………………………………………………………….....32

Making Accurate Comparisons:………………………………………………………………....34

Results:…………………………………………………………………………………………...35

Analysis of Results:……………………………………………………………………………...40

Additional Commentary:………………………………………………………………………...47

Conclusion:………………………………………………………………………………………49

Bibliography:…………………………………………………………………………………….50

Appendix A:……………………………………………………………………………………...51

4

Acknowledgements:

I would like to thank the professors who taught each class throughout this cohort. I learned a

great deal and felt that my understanding and appreciation for math grew each semester. I would

like to thank Davar Khoshnevisan for getting me started in the right direction and for the advice

he offered in finding books that would aide in my research. I would like to thank Hugo Rossi

and Amanda Cangelosi for the hours they spent reading my work, the sound advice they offered,

and for taking the time to meet with me on numerous occasions. I would also like to thank my

family, especially my wife, Kory. She rearranged schedules for 2 years, allowed me to spend

countless evenings and weekends doing homework and research, and encouraged me every step

of the way.

5

ABSTRACT

STATISTICAL MANIPULATION:

An examination of various

subjective methods used in

data collection and presentation

by

Joshua Rempfer, Master of Science

University of Utah, 2013

Major Professor: Dr. Hugo Rossi

Degree: Master of Science for Secondary School Teachers in Mathematics

Methods of data collection in statistical research can often be flawed. As a result, many

statistics presented to the general public are unsound. Unfortunately, people willingly believe

errant or misleading statistics without doing any analysis. My goal in this project was really

three-fold. I wanted to see if I could find commonalities among factors contributing to unsound

statistics. I wanted to provide simple questions, or strategies, that could aid in the process of

analyzing statistics. I also wanted to do some research of my own to find out if various

commonly used baseball statistics accurately measured a player’s performance. In addition, it is

my hope that this reading and research can help the statistics unit I teach in my 7th grade classes

be more meaningful.

6

Project Outline:

Part 1: I completed a literature review examining methods of data collection and presentation.

Questionable figures (and their publication) can be credited to a number of factors. I addressed

five major contributing factors: innumeracy, bias, poor definitions/measurements, poor

comparisons, and unsound presentation. I provided a series of examples illustrating how these

factors can be problematic. In addition, I provided a series of questions that an individual can

ask when they encounter statistics that appear to be questionable.

Part 2: I researched and analyzed a particular data set (the batting average of Major League

Baseball players) that I felt was misleading. My findings are presented and I made note of the

accurate methods that were used as I gathered and analyzed all of the data.

Part 3: Although not included in this project, lesson plans that implement the strategies from this

study will be taught in my 7th grade classes. The goal of these lessons will be to provide my

students with the preliminary tools needed for critical analysis.

7

Part I: Literature Review

Introduction

“All statistics, even the most authoritative, are created by people. This does not mean

that they are all inevitably flawed or wrong, but it does mean that we ought to ask ourselves just

how the statistics we encounter were created (Best, 22).”

“Misinforming people by the use of statistical material might be called statistical

manipulation (Huff, 100).”

These two quotes help to illustrate why views concerning statistics can be quite polarized.

I observed two basic opinions concerning misleading statistics. 1) The belief that unsound

information is presented with the intent to mislead. 2) People who mislead with statistics do so

unintentionally because they have no idea what they are doing. In other words, the method by

which data is gathered and presented is filled with errors.

Prior to any research and reading, an individual might assume that misleading statistics

are presented intentionally. However, in my research I observed enough errors in statistical

application to become convinced that the second opinion is closer to the truth and the reason

misleading statistics take shape. As Huff (2004) stated, data gathered by people has the potential

to be misinterpreted and misleading. This observation led me to ask two questions. 1) Why do

so many people believe unsound statistical information? 2) What can I do to help people

(specifically my students) become more critical when it comes to recognizing unsound statistics?

In following pages, I will seek to thoroughly answer both of these questions.

8

Defining Innumeracy:

People, says Paulos (1988), can be very poor judges of the numbers they encounter. His

words serve as a simple definition for innumeracy. I am not saying that our population lacks

intelligence. Rather, a good portion of the population lacks the critical numeracy skills necessary

to spot misleading statistics. Examples of innumeracy often go unnoticed: the cashier who does

not understand why you give $5.05 when your bill is $4.05, the 7th grade student who asks which

side of the ruler measures inches, people who are excited to get their tax return, as if receiving

their own money is some kind of a gift.

The general public seems to have a real hesitation or fear of anything having to do with

mathematics (or even just numbers). In parent teacher conferences, I have had more than a

handful of parents tell me they cannot help their children with the 7th grade work that is sent

home. Parents who simply want their child to “pass” math only help to promote innumeracy.

Many students like to think that most mathematics is not readily useful and cannot wait to be

finished with their math requirements. The problem that results from innumeracy is an inability

to accurately analyze various forms of numerical information. In my opinion, innumeracy is the

main reason unsound statistics “sound” believable.

My goal is to help combat this innumeracy. It is not my goal to turn people into skeptics,

questioning every statistic they encounter. I simply want to provide some questions (I’ll call

them strategies) that can be asked when an individual encounters numbers or statistics. Huff

(1954) called this the ability to “talk back” to a statistic.

9

Strategy 1: Consider the Source

The general public is bombarded by statistics. Often times these “statistics” are simply

numbers attached to claims. However, just attaching a number does not mean research or proper

analysis occurred.

One of the more memorable examples of attaching a number to make a claim sound

believable can be credited to Joseph McCarthy. In 1950, McCarthy made a claim about

communists in the state department. Seife (2010) was quick to point out that it was not

McCarthy’s first statement that people believed. “In my opinion the State Department, which is

one of the most important government departments, is thoroughly infested with communists.” It

was his second. “I have a list in my hand of 205 known communists.” Of course the number

had no foundation and since no audio recording remains, people still argue whether it was 205 or

207. By simply attaching a number, McCarthy’s claim seemed to have validity and as a result

people believed it.

This is a rather dramatic example, but simpler ones can be found every day. In a recent

email, a group of people (myself included) was informed that a child of the sender had been

named the Class 1A Volleyball MVP. Then a claim was made, “we are so proud that our

daughter is one of the top 5 players in the state.” There are 5 classes in Utah High School

Volleyball. Each class named an MVP. The assumption was made that this group of MVP’s

represented the five best players in the state. Keep in mind, that 1A consists of the smallest

schools in Utah; 5A consists of the largest schools. It is not totally accurate to claim this “Top

5” achievement. There is a good chance that a number of 5A players could have been the Class

1A MVP had they attended a smaller school. The five “best” players could all play in 5A, but

10

only one can be named MVP. This is just another example of how attaching a number (without

analysis) seems to validate a claim. I wondered if other recipients had a similar reaction.

Although written some 60 years prior to Seife’s book, How to Lie with Statistics, by

Darrell Huff, mentions the same idea. Huff (1954) calls it the “insert a figure” approach. This

figure may come from some data that has been gathered. Or, it can simply be a figure that makes

the statistic or statement sound as desirable as possible. The first two examples seem to be

guesses. An example of this approach, with actual data, takes place frequently in the real estate

world.

A realtor might make the claim that the average price of a home, in the neighborhood

where you are looking, is $180,000. The uses of averages can be misleading. Average is

defined as the arithmetic mean, the sum of a group of numbers, divided by however many

numbers were in that group. However, averages are affected by outliers. A home with a value

of $500,000 can increase the average value dramatically. Similarly, a home with a low value can

decrease average value. The “number” that is most appealing will undoubtedly be used by the

real estate agent. There are other measures of central tendency (namely the median) that might

give a buyer a better indication of what represents the “average”. The point is that the attached

number gives the claim added support. The claim may lack any sort of analysis. But, when

numbers are attached the claim immediately sounds more believable.

These examples lead us to the first question that should be asked when examining a

statistic. Huff (1954) advises an individual to ask, “Who says so?” Best (2004) states the same

idea in a slightly more academic manner. “How did someone arrive at this number (Best, 59)?”

If we want to be critical analyzers of the claims we are presented with on a daily basis,

we must be aware of what makes a good statistic (Best, 2004). Huff (1954) also writes that we

11

must ask, “average of what?” or “who is included?” Numbers can sucker people into believing

statistics that are not accurate or true. Seife (2010) questions if our population is really capable

of the critical thought necessary to break down a questionable statistic.

Seife’s question was a key point of emphasis for this project. I believe learning to be a

critical thinker starts at a young age. If we can teach our students to be more analytical and more

comfortable with numbers, I believe a certain level of numeracy can be achieved. In doing so,

we may have some success in eliminating what Seife (2010) calls Potemkin numbers. He

defines these numbers as facades that look like real data (Seife, 15). By first asking the source of

the number or statistic, we can start out on the right path to eliminating some of the misleading

statistics that are presented on a regular basis.

Summarizing Strategy 1, the first step in identifying a good statistic is identifying the

source. A credible source is not an individual guess or statement. A credible source is one that

can back up their statistic with accurate data. This leads directly into the second strategy.

Strategy 2: Identifying Bias

Bias can be a major factor in identifying statistics that are misleading. Any given statistic

will have some form of bias. Statistics are not indisputable truth (Seife, 2010). Gathered data

will have a better chance of being reliable when efforts are made to gather a random sample that

represents a good cross section of the population.

One of the more common forms of bias can be found in results gathered from surveys.

Biased responses are typical for a number of reasons. Often surveys are responded to by those

who have a strong opinion. This is readily evident if one were to look at the comment section for

any restaurant. Rarely do comments tend toward average food or service. People respond to

these surveys if they were truly pleased or very disappointed.

12

The same could be said of any type of rating. College students are regularly asked to rate

their professors. These comments are often made public and can be easily accessed in a Google

search. I recently read the ratings of a close friend who teaches at Murray State in Kentucky.

Most of the reviews were positive but the comments were very generic. However, one review

was scathing. This student was clearly upset with his grade and wanted to make it known that he

did not care for the way this class was set up or taught. A strong feeling of like or dislike is a

very common form of bias. We must be careful when looking at the data resulting from surveys.

Surveys consisting of randomly polled people must also receive some scrutiny. The

results here may also possess some bias even when the polling is random. People that are

randomly polled tend to answer in a more positive manner. That is to say, if you randomly took

someone aside and asked them a question about a particular issue, the response given on the spot

may differ from the response given if they answered privately. As Huff (1954) said, we often

assume that people are telling the truth.

Another problem with various surveys is the percent that are actually returned. In a

college course (like the one mentioned above), a student may be required to complete a survey or

give feedback before receiving their grade. A sample with required responses will have less bias

than a sample consisting of responses that are optional. Last school year, my school finished its

initial accreditation process. A major contributing factor to the collected data was the use of

surveys. Student and teacher surveys had a 100% rate of return. There was certainly a chance

for bias. As mentioned before, people do not always give the most honest answer when

responding to a survey. The larger problem came in the form of the parent surveys. Only 52%

of the parents actually returned their surveys. There is certainly a chance that the remaining 48%

13

would have answered in a similar fashion. But, it is not accurate to assume this would be the

case.

Surveys are certainly not the only biased method of data collection. They do, however,

provide an excellent means to explain how bias is present.

The point I am trying to make is that every statistic has the potential to have some form

of bias. The second strategy was not to eliminate bias, but rather to identify it. When looking at

a statistic, we must be able to identify possible forms of bias. Asking these questions can aide in

the process of recognizing biased data. 1) Who was asked? 2) What method of data collection

was used? 3) Does the sample accurately represent the population?

Strategy 3: Accurately Define What is Being Measured

Mistakes can be made when gathering data if definitions are unclear. The definition is

really the parameter that identifies a statistical measurement. Best (2004) discussed dark figures

that result when definitions are either too broad or too narrow. Definitions that are too broad

result in data that includes many false positives. Definitions that are too narrow result in many

false negatives. Before gathering any data, a definition about the data being gathered must be

clear. In addition, if the data is presented to the population, the definition should be made public

(Best, 59).

Poor definitions often take the form of examples. If I ask a student to define the term

“equation”, a common answer might be 3x + 10 = 13. This is an example of an equation but, it

does not define the term.

To drive this point home, Best (2004) used the very serious issue of child abuse to

illustrate why an example cannot serve as the definition. If child abuse was defined by an

example, say a horrific incident where a fatality occurred, that definition fails to include cases

14

where other forms of abuse are present. If child protective services only sought to prevent cases

like this, the lives of many children would be still in danger. If a definition is too narrow, the

population affected is not accurately represented. This is a dramatic example but it clearly shows

the problem of using examples as definitions when gathering data. This narrow definition would

leave many children in harm’s way.

Broad definitions also lead to inaccurate statistics. Best (2004) referred to an example

from the Reagan presidency. Advocacy groups claimed homelessness was an issue that was not

being addressed properly. They claimed that as many 3,000,000 Americans were homeless. The

Reagan camp countered that the actual number was closer to 300,000. The advocacy groups,

wanting to bring a greater awareness the issue, counted homeless individuals using a broad

definition. Individuals living with relatives were counted as homeless (Best, 44). The larger

number also included people who spent one night without housing (Best, 44). Who was right?

The real problem was not in being right or wrong, but the lack of a common, clear definition.

Yet another example of a poorly defined measurement can be linked to standardized test

scores. Last school year, my class of math students scored quite well on their 7th grade end of

level tests. In fact, 91% of my students were rated as proficient. That number sounds great and I

was pleased to receive some praise from my administrators and colleagues. However, if one

were to look at little closer at that number it is not as good as it sounds. Being proficient meant

that a student scored a 3 or 4 on the end of level test. A score of “4” means that student scored

an 84% or better on a test that consisted of 70 questions. I would consider a test score of 84% to

be proficient. A score of “3” meant that a student scored between 63% and 83%. I do not know

the exact number of students who scored a 63%. But, I would not consider a score of 63% to be

proficient. A 63% is nearly a failing grade. To include scores in the 60’s as proficient, inflates

15

the percentage of proficient students. This inaccurate definition leads to a statistic that does not

accurately reflect my 7th grade class. If I were to look back and see the number of scores in the

60’s and eliminate them, the overall proficiency percentage would obviously decrease. I may not

receive as many compliments but the proficiency percentage would be a more accurate statistic.

This inaccurate statistic developed as a result of a poor definition. Test scores will again be

discussed in Strategy 4 as we identify yet another way that statistics can be misleading.

Best (2004) wrote that agreeing on definitions is difficult. At the very least, we must

agree that definitions, like the ones previously mentioned, have limitations. In order to have

reliable statistics, a clear definition is necessary. When looking at a statistic, the 3rd useful

question to ask is this: what parameters were used in the gathering of data? If the definition is

too narrow or too broad, the resulting statistic can be misleading.

Strategy 4: Watch out for Misleading Comparisons

Poor comparisons are another common form of misleading statistics. Best (2004) is quick

to point out that comparisons are absolutely necessary in understanding statistics. However, one

can be easily misled if comparisons are not carefully made. In many cases, people make the

mistake of comparing apples and oranges (Best 2004). That is to say, connections are made

between sets data that really have no connection at all. The examples that follow show how

these statistical errors are all too common.

Best (2004) narrowed these poor comparisons into 3 categories: comparisons of time,

comparisons between groups, and comparisons among places.

Comparisons of time:

To illustrate this point Best again referred to child abuse. In 1963, there were 150,000

reported cases of child abuse (Best 2004). In 1995, that number had grown dramatically to

16

3,000,000 reported cases. Comparing these two numbers is misleading for two reasons. One,

the definition of child abuse changed radically in that 35 year period. The number of people

responsible for reporting child abuse increased. New laws required teachers, doctors, day-care

workers and other professionals working with children to report any suspicion of child abuse.

Secondly, the population increased between 1963 and 1995. An increase in population will most

likely result in an increase of reported cases. It goes without saying that it is absolutely

necessary for laws to be in place requiring these reports. The safety of children is of utmost

importance. However, referring to the increase as an epidemic of child abuse is not sound.

Making the comparison between years should be done in more careful manner. The comparisons

should be made in years following the required changes in law or in years prior. Measurement

will never be completely accurate and every individual case is a tragedy. A broader definition is

necessary in helping to keep our children safe. But, comparing 1963 with 1995 is really

comparing two sets of data that are defined differently. This can have misleading results.

Seife (2010) wrote about another misleading comparison. In recent years, the number of

children affected by autism has grown at a seemingly exponentially rate. Some are quick to say

that the growth has been caused by something external, like immunizations. While that may be

the case, Seife suggests that the increase is caused by a change in the definition of autism. The

definition used in 1970 for an autistic child is not the same as the definition used in 2010 (or

2013). These authors are not medical experts. However, it is perfectly plausible to suggest that

the exponential growth is really the result of a change in the definition. To compare 1970 with

2013 is again comparing two sets of data that are not measuring the same thing. While they are

both numbers counting cases of autism, the definition of what constitutes a child falling on the

autism spectrum has changed dramatically. These two years should not be compared. Like the

17

cases of child abuse that Best wrote about, if we are to draw accurate comparisons, we must

compare sets of data that are measured or gathered in the same way.

In both cases of these misleading comparisons, we can see a domino effect taking place.

A change in definition has the potential to increase (or decrease) the data being collected. These

changes lead to comparisons that are misleading because the changes take place over long

periods of time. The next few paragraphs will provide more examples of misleading

comparisons. Autism and child abuse were used to provide examples of how comparing data

sets from two different time periods can be misleading. We still need to discuss how

comparisons between groups and comparing geographic locations can be misleading.

Groups:

The first example of a misleading comparison between groups is not necessarily two

groups but rather two items. Huff (1954) used the example of a misleading comparison that is

still commonly used today. He referred to an ad in Good House Keeping. The ad claimed that a

certain advertised juicer extracts 26% more juice. Using a previously mentioned strategy, one

might first ask, who says so? In this case, the source is the Good Housekeeping Institute. That

seems like a reputable source. However, Good Housekeeping might earn a great deal of money

from this advertisement. This brings our source into question just a little. We can also make

immediate use of Strategy 4 by asking, 26% more than what? Finding exactly what two things

are being compared is absolutely crucial. The juicer being advertised could be tested against 5

juicers that are known to be less productive. This comparison would of course produce a

desirable result.

As it turned out (Huff, 1954), the electric juicer was being compared to an old fashioned

hand juicer. I would certainly expect the electric juicer to extract more juice. The real question,

18

or comparison, should be between two juicers that are electric. One might actually find this

juicer to be the poorest on the market. But, when comparing it to a hand juicer it immediately

looks like a quality product.

In a more recent advertisement from Crest, the commercial claimed that this particular

toothpaste made your teeth 23% whiter. The questions to be asked are the same as in the

previous example. Who says so? I was not quick enough to read the fine print at the bottom of

the screen. I would have to guess that it was a test run by Crest. The second question to ask is,

23% whiter than what? This comparison may have been made between brushing with Crest and

not brushing your teeth at all or, between Crest and a toothpaste that did not have a whitening

additive. The point is that we do not have all the information. It is very hard to determine the

validity of the comparison if we are not sure about the two items or “groups” being compared.

These two examples used to identify misleading comparisons between groups may not

sufficiently illustrate the idea of two “groups” being compared. These were in fact items. I hope

that the following examples can further illustrate how strategy four (Identifying Misleading

Comparisons) is useful.

I would like to go back to the issue of test scores to illustrate how comparing two groups

can be misleading. I had mentioned previously that 91% of my 7th grade students scored

“proficient” or better on their end of level tests. When the data was presented during our back to

school week of teacher meetings, this statistic was compared to the previous 7th grade class

(where only 37% scored proficient or better). Can you see where I am going with this? These

are two totally different classes. Comparing the previous class to the current one is misleading if

we are trying to measure growth. The make-up of the student body is totally different. The test

was the same, so the percents can provide some useful information. A better comparison would

19

be identifying the growth of the previous class. The previous class had 73% of the students score

proficient or better on the 8th grade end of level test. This is a significant improvement. I believe

the 8th grade teacher is the one who should receive the accolades. The test was more difficult and

the scores improved greatly. It should also be noted that the previous class had 3 different

teachers when they were in 7th grade.

These comparisons are rampant in the education system. Seife (2010) calls it fruit-

packing. More specifically, comparisons are often manipulated to make test scores look better

than they really are. Thankfully, strides are being made to better track individual students from

year to year. Growth is better measured and more accurate comparisons can be made when we

track a class or a student from one year to the next. It was not accurate to say that the 7th grade

end of level test scores improved from 37% to 91% in one year. While those numbers are true,

they represent two totally different classes.

I noticed another example of a misleading comparison between groups in a headline from

the Salt Lake Tribune. “Mormonism is leading the way in US religious growth” [1]. This

headline was in fact true. If we look a little closer, however, we can see why it is misleading.

About 75% of the US population identifies as Christian [2]. While about 2% identifies as

Mormon [1]. For example, if we had a population of 1000 people, by percentage approximately

750 would identify as Christian and 20 would be Mormon. If over the course of 10 years, the

number of people identifying as Christian increased to 775. That increase of 25 people

represents a growth of 3.3%. If the number of Mormons increased from 20 to 30, that would

represent a 50% increase. It is easy to see that with a smaller population, identifying growth in

terms of percent can be misleading.

20

This method of comparing groups with completely different populations in terms of

percent is misleading, but not uncommon. Best (2004) refers to the US population to illustrate

another way that comparisons among groups can be misleading. He is quick to add that the

comparison is over simplified, but the method is used and many times goes unnoticed. Consider

the following statement. “There are more poor white families than black families.” By all

accounts, this statement may be true. But, if we look a little closer there is something misleading

that needs to be noted. Let us look again at a generic population to point out the error behind this

statement. If we had a population of 700 families, statistically about 600 of these families would

be white families and about 100 would be black families. From this pseudo population, perhaps

60 of the white families would fall below the poverty line and perhaps 20 of the black families

would fall below the poverty line. Obviously, the previous statement is true. There are three

times as many poor white families as there are poor black families. The claim loses all

credibility when we analyze these numbers a little more critically. The percentages paint a

totally different picture. This breakdown shows that 20% of the black families fell below the

poverty line while only 10% of the white families fell below this mark.

In this example, using percentages actually gives a more accurate comparison. The

context of the previous two examples is what differs. In both cases, the statistic that creates the

best headline is the one that is used. The individual reader of these headlines must possess the

ability to analyze and determine which context is most accurate. At times comparing

percentages is more reliable, while in other circumstances comparing the actual numbers is a

better choice.

When making comparisons between groups, especially if the population of each group

vastly differs, there are basic questions that deserve more consideration. (Best, 121)

21

1) Are these groups really comparable? 2) Are they different in some way that affects the

statistics used to make the comparison? 3) Is there an unmentioned variable? (Best, 121)

Geographic Comparisons:

Perhaps you have seen the recent Exxon Mobil ads dealing with international scores in

math and science. These comparisons help point out certain inaccuracies that can occur when

comparisons are made between two (or more) geographic regions. The slogan of the ad, “Let’s

Fix This”, makes implications about the math and science education in the US. While

improvements can always be made, the real problem here lies in the comparison. When schools

in the US take a standardized test, every student in the school or particular grade level is tested.

In other countries, public education is set up in many different fashions. The students who are

tested may not be the same cross section as the students tested in the US. In many countries after

primary school, around age 11, the higher scoring students are in different programs all together.

The students who have lower scores are directed in a different path. When the secondary school

testing takes place, students from different countries may take the same test, but the sample of

students being tested is greatly different (Best 2004). When Exxon Mobil presents the data, and

shows the US to rank 26th in the world the comparison being made is not sound. The “tested

student” is not the same. If every country had the same sample of students, students ranging

from low to high, the results could be entirely different. This is just one example of how making

geographic comparisons can lead to inaccurate statistics.

Last May I had a conversation with Professor Khoshnevisan. This conversation related

directly to the idea of comparing different geographic regions. The data I had originally intended

to study was life expectancy. He quickly pointed out that finding good clean data to work with

might be hard to come by. But, he shared one example that related to possible inaccuracies that

22

might develop when comparing two greatly different geographic regions. If one were to

compare the average life span of Americans with the average life span of people in rural African

communities, differences would surely arise. At first glance, one might be led to believe or even

make the statement that Americans live longer. However, Dr. Khoshnevisan pointed out

(personal communication, May 10, 2012) that if we look at the these two groups and eliminate

any deaths that occurred in infancy or early childhood the average life expectancy would be

much closer. In America, a child with complications during infancy and early years will most

likely have better access to medicine and treatment that could be life-saving. In a rural African

community, children susceptible to life threatening complications do not have the same

opportunities. In short, comparing life expectancy in different geographic regions can be like

what Best (2004) called comparing apples and oranges.

This quote from Best summarizes the problems that can occur in these geographic

comparisons. “The basic problem with geographic comparisons, then, is that there is a good

chance that statistics gathered from different places are based on different definitions and

different measurements, so they are not really comparable. It is desirable to make geographic

comparisons, but we need to be cautious when interpreting those statistics (Best, 113).”

Strategy 5: Be aware of how data is presented

Up to this point five basic problems that are present in many misleading statistics have

been identified. To review, the problems identified were as follows: innumeracy, attaching

numbers to make statistics sounds believable, bias, poor definitions and poor measurements, and

inaccurate comparisons. In many cases, one misstep can lead to another and a snowball effect

begins to take place. The culmination of all these factors leads to strategy number five. We

must be aware of how data is presented. What I discovered in looking through many texts and

23

various statistical miscues is that unsound presentation comes in a few basic forms. The

following paragraphs will provide just a few examples of how these methods of presentation can

lead people astray.

Repetition:

The simple act of repeating a statistic based on hearsay can lead to serious problems.

Without doing proper analysis, errant numbers are repeated and these numbers can grow into

what Best (2004) calls mutant statistics. Best started his book by mentioning what he called the

“worst social statistic of all time”. A certain doctoral candidate (that he left unnamed) began

their presentation by saying, “the number of children gunned down has doubled each year since

1950.” Of course this is ludicrous if one has a little knowledge of exponential growth. If this

were actually true, the number of deaths of this nature would actually be larger than the

population. The professors on this committee did a little research and found that at one point the

wording from one article to another changed just slightly. In the original article, the statement

read that the number of deaths had doubled from 1950 to 1995. In the article the student quoted,

the wording changed to “doubled each year”. Just a simple misquote, a few words changed, led

to a mutant statistic. The student repeated this and thankfully it was caught before it was

repeated again.

Unfortunately this is too common. Statistics are repeated and repeated inaccurately. The

first strategy made mention of “innumeracy”. Innumeracy is really the root of many mutant

statistics. The inability to spot these poor statistics lends itself to these numbers being repeated.

Often times it is not deliberate, but once something is repeated, the source is harder to locate and

a poor statistic can live on.

Graphs

24

Statistics can be easily manipulated when they are presented in graphical form. Huff

(1954) wrote about tendencies to present data in a way that best supports what is trying to be

displayed by the individual, group, or company. He identified numerous examples where the x

or y axis had been manipulated. Figures that are seemingly mundane can be made to look very

dramatic when you “zoom” in on a graph. Slow gain or loss can be made to look like a spike

when the graph is carefully manipulated. If the opposite takes place, that is, if we “zoom out”

curves can be made to look quite linear. Both techniques are used when data is displayed in a

graph. A quote from Huff states this idea quite clearly, “simply change the proportion between

the ordinate and the abscissa. There’s no rule against it, and it does give your graph a prettier

shape (Huff, 63).”

Note the increments on the y-axis of this graph. A graph on the next page displays the same data

with a simple change in y-axis.

Graph 1

The change in increments (y-values) on these two graphs make the graphs look very different

despite displaying the same set of information. (See Graph 2 on the top of the next page)

25

Graph 2

This bar graph attempts to make a minimal

increase look massive.

Seife (2010) wrote about something fairly similar. In a 2004 study by Nature magazine,

data containing the Olympics times from the 100 meter dash was analyzed. Researchers felt they

discovered some unique patterns in the decreasing times of the men and women in this event.

They believed the pattern to be so evident that they fit the data into a straight line. The times

never strayed too far from this line of best fit. The times for women were even decreasing at rate

faster than the men. A prediction was made, in 2156 the world record time for women would

surpass the world record time of the men in 100 meter dash. Perhaps this is possible, but if the

graph were carried out even further to year 2500, the women and the men would each break the

seven second barrier. This exemplifies how using a line of best fit can be misleading. A seven

26

second 100 meter dash means that men and women are both running at a rate of 60 miles per

hour! There is a maximum speed that humans can reach. The pattern will eventually flat line, it

will not continually decrease. In addition, it is not surprising that the times of women decreased

at a faster rate. They have been racing competitively for a shorter period of time. The point here

is simple; graphs can be misleading when they are manipulated.

Projections:

Making a prediction based on data is known as a projection. When data is manipulated

so it appears to be linear, projections can be misleading. Like many of the methods previously

written about, projections are necessary. Making projections, however, can be misleading if the

gathered data has not be analyzed properly. Sadly, poor projections are usually noticed after the

fact.

An example to illustrate this point comes from the early 1980’s. In the early years of

HIV and AIDS, the general public was not fully aware of how this disease was contracted. As

medical professionals learned more, people came to realize that it could be contracted through

heterosexual contact and blood transfusions. Thus, contracting AIDS became a bigger concern

to many Americans. “In a 1984 article, a major newsmagazine projected that 10 million

Americans would be affected by the disease in 1991 (Best, 108).” This projection caused a great

amount of reaction. Money was spent in excess of 100 million dollars to raise awareness and

help those affected by the disease. However, as 1991 rolled around the Centers for Disease

control reported that number affected was actually closer to 200,000. (Best, 2004) Perhaps the

money spent prevented the exponential growth but, there is a large difference between 200,000

and 10 million.

27

Projections, graphs, and simply repeating information are all examples of how data is

presented. These forms of presentation are all necessary in the world of statistics. Strategy five

simply stated that people should be aware of how data is presented. While these forms of

presentation are used and necessary, there is a potential for error. Being aware of common errors

can help individuals identify unsound methods of presentation.

Conclusion:

Throughout the course of the previous pages, numerous examples of misleading statistics

were presented. Some examples were of a more serious nature and others were simple examples

that I came upon while doing research, reading the paper, and watching television. It was my not

intention to take sides on any one particular issue or make light of something that could be

considered a very serious problem. The entire goal was to make the reader more aware of

misleading methods that are used when statistical information is gathered and presented.

Statistics have often received a bad reputation because of all the manipulation that takes

place. Yet, at the same time we need them. Statistics were first derived as a method of keeping

quantitative evidence of a population. People keeping track of these numbers were called

statists. Over time the statists’ numbers became known as statistics. A healthy state kept

records. Government used these numbers to develop wise policies. Statistics helped to

summarize a lot of information in a concise manner. (Best 2004)

A description of statistics in this way helps us to see them in a different light. Statistics

were not born bad, or with the intention to mislead. As the keeping of statistics became more

prevalent, the idea of using them to justify beliefs or decisions became increasingly popular.

Activists, the media, and businesses used them to help support their own causes. (Best 2004) It

is easy see how statistical manipulation became more popular as a result.

28

Some people are naïve when presented with any statistical information. Being naïve can

translate into believing any statistic, accurate or misleading. This is most likely the result of

innumeracy. On the other hand, some people become cynical and doubt every statistic that is

presented. It is my hope that people (more specifically my students) simply become more aware.

I want them to look at statistics with a more critical eye and be able to ask some very simple

questions.

1) Who is making the claim? A good statistic is more than just a guess or attaching a number to

make something sound desirable. 2) What bias might be present? A certain amount of bias is

going to be involved when obtaining information but analyzing the data obtained is critical. Data

must be gathered in a manner that truly represents the population being considered.

3) Is the definition clear? Definitions, the parameters by which the data is collected, must be

clear. Definitions that are too broad or too narrow lead to inaccurate statistics. 4) Are the

comparisons reasonable? The largest portion of this literature review was spent providing

examples of comparisons that were misleading. Good comparisons involve comparable items.

Comparable statistics count the same things in the same way. (Best, 2004) These can be the

trickiest to spot and are probably used more often than we realize. 5) Is the presentation

accurate? Graphs and projections can be manipulated just as easily as comparisons. Analyzing

any type of presentation must be done critically and carefully.

Having a critical mindset certainly requires more thought. But, it is extremely necessary

when analyzing statistics. Being a bit skeptical, critical, careful, and analytical can be helpful in

ways that go far beyond simply identifying misleading statistics.

29

Part II: Scientific Research

Examining “Clutch” Performance in Major League Baseball

Introduction:

“My experience has been that most scientists engage in their research because they are

interested in the results and because they get intellectual excitement out of the work (Salzburg,

2).”

This quote made sense and gave me a bit of confidence as I started this portion of my

project. I do not consider myself to be a scientist or a statistician. I am fascinated by statistics.

Admittedly, I was amazed to see all the distributions that existed. Until our cohort’s statistics

class in August, I was only aware of the “normal” distribution. To read and learn about the

Gamma, Poisson, and Binomial distributions, as well as Bayesian statistics and other topics was

truly enlightening. While I am still a novice, this study gave me a chance to show how I would

accurately gather and analyze sets of data.

If there was an area where I felt comfortable, it was in analyzing statistics from Major

League Baseball. This leads back to the original quote. If I were to do research in one particular

area, it made sense to choose something I really enjoy. There is really no point in studying

something unless you are intrigued by it.

Background for Research:

In recent years general managers, owners, and managers have taken a more statistical

approach to the game of baseball. If you have seen the movie Moneyball, this is clearly the

direction most teams are headed. Some of the newly created statistics are truly helpful when

analyzing players. Others are merely created by agents in an effort to make their client look

more appealing and demand a higher salary.

30

Any baseball fan recognizes the importance of statistics as a measure of a player’s

performance in the field and at the plate. While defense is certainly an important part of the

game, offensive performance often takes the spotlight. I have always been interested in

examining player performance in various situations. I became particularly curious about how

players perform in clutch situations. “Clutch” being defined as how someone performs in the

late innings of close games. A more precise definition of clutch will come in a few paragraphs.

I have also often wondered if the players selected for various awards at the end of the

season were truly the most deserving. In 2009, Tim Lincecum of the San Francisco Giants won

the National League Cy Young award. This is an annual award given to the best pitcher. Adam

Wainwright came in second. He plays for the St. Louis Cardinals. I follow the Cardinals very

closely and was definitely disappointed when the award went to Lincecum. I felt that

Wainwright had the better statistics. I did not do any research to see if Lincecum performed

better in clutch situations but, I still felt that Wainwright had the better season in 2009.

The previous statement lacks any sort of objectivity. It was simply a gut feeling. Gut

feelings are not always supported by objective research. The basis for my research came from

another gut feeling. In 2012, Cardinals highest paid player was Matt Holliday. Teams expect a

high level of performance from their highest paid players. His offensive statistics were generally

above average when compared to players in the Major Leagues. However, as a fan, I felt that

Matt Holliday never performed well in the later innings of games that were close. In my

observations, I felt that his best performances were generally in games where the Cardinals won

by a larger margin. I wanted to analyze his statistics, along with other players who performed at

high levels in 2012. My goal was to find out if players who performed at high levels during the

regular season maintained their high levels of success in “clutch” situations.

31

Question for study:

Do the players who performed the best during the regular season maintain the same level of

success during the later innings of close games?

Hypothesis:

I predicted that in the sample of players I studied there would be a significant decline in

production in “clutch” situations.

Method of Research:

I analyzed the performance of each player in Major League Baseball who batted over .300 for

the 2012 regular season. This batting average is considered to be highly successful and

productive for an offensive player. In 2012, there were 25 players who batted over .300. I

wanted to see if these players had the same sort of success in the later innings of games that were

close. These 25 players and Matt Holliday served as the sample for my research.

Defining Parameters for “clutch” situations:

To measure this, I had to decide what constituted a “clutch” situation. In order to gather

a fair amount of data, I decided that any at bats after the 5th inning, (a game is 9 innings) were

considered the latter part of the game. Aiding in this decision was the fact that data already

exists for the 8th inning or later in close games. I wanted to develop my own set of data.

Choosing the 5th inning or later marks the 2nd half of the game and many games are decided

before the final two innings. I also selected games decided by 3 runs or less as the definition of a

close game. I made this choice because if a team trails or leads by 3 runs going into the final

inning of a game, the game tying run will have a chance to reach the plate. For pitchers this is

called a “save” situation.

32

Included with these 25 players was Matt Holliday. He batted .295 and fell slightly below

the .300 mark. However, his performance was really the motivation for this research. I felt it

only fair to include him and see if my observations about his performance were actually true.

Gathering of Data:

I researched each player individually. I first looked through each of the 162 games

played during the regular season. I marked each game that was decided by 3 runs or less. This

number of games ranged anywhere from 90-120 for each player. It was interesting to note that

well over half of the games played during the regular season were “close” games. This gave

each player hundreds of opportunities to show their worth in clutch situations. This data

collection took a fair amount of work. I needed to look at each close game and then tally what

every player did after the 5th inning. I went through each at bat, in each close game, every month

from April through October. I broke the data into the following categories.

Glossary of Terms:

G AB H 2B 3B HR Runs RBI BB SO AVG OBP SLG OPS

* These are the categories commonly listed for offensive statistics on websites covering Major

League Baseball (MLB).

(G) Games – The number of games played in a regular season is 162. *My data collection

includes 2 rows. Row 1 contains all the data from the regular season. Row 2 contains all the

data from the 5th inning or later in close games.

*See appendix A for a complete collection of the statistics.

(AB) At-Bats – This category refers the number of At-Bats. This includes the number of times

the player batted during the season. At bats are only counted if a player reaches base safely, if a

33

player reaches on an error, or if the player gets out. At bats do not include the number of times

the player draws a walk (BB), hits a sacrifice (SAC), a play in which a teammate advances safely

as a direct result of their intended effort, or gets Hit by a Pitch (HBP).

(H) Hits – This includes the number of times a player reaches base via a single, double, triple, or

home run.

1B (Single) - When a batting player reaches first base safely on a batted ball.

2B (Double) - When a batting player reaches second base safely on a batted ball.

3B (Triple) - When a batting player reaches third base safely on a batted ball.

HR (Home Run) - When a batted ball leaves the field of play or when a player crosses all four

bases safely (this is called an inside the park home run).

Runs – The number of times a player safely crosses home plate.

RBI (Run Batted In) – A batting player is awarded an RBI when a runner on base crosses home

plate as a result of their at bat. (A batting player also receives an RBI when they hit home run as

they, too, will cross home plate) (A batting player may also receive an RBI if they draw a walk

(BB) when the bases are loaded.)

BB (Base on Balls) – A batter is awarded first base if he receives four balls (pitches that are not

strikes) in a single at-bat. (This is sometimes called a “walk”)

SO – (Strike Out) – When a player receives three strikes before putting a batted ball in play.

AVG – (Batting Average) – This is calculated by the dividing the number of hits by the number

of at bats.

OBP – (On Base Percentage) – This is calculated by the following formula !!!!!!"#!"!!!!!"#!!"#

SLG – (Slugging Percentage) – This is calculated by the following formula

! !! !! !! !! !! !!(!")!"

34

OPS – (On Base Percentage + Slugging) – This is calculated by adding OBP + SLG.

*Note: Hit by Pitch (HBP) and SAC (Sacrifices) are not included in a normal batting line (as

shown above). However, these were counted and included as I tabulated statistics.

I calculated a new batting line including these statistics for each player in the 5th inning or later

of close games. This line was compared the batting line for the entire season.

Making an Accurate Comparison:

After gathering the new data, I had to decide what statistics were going to be compared.

The first number I decided to take into account was batting average. I wanted to make a

comparison between the batting averages for the entire season and the batting averages in clutch

situations. If a player is truly “clutch”, their batting average should remain about the same in

these situations.

The second area that I felt was important in close games was OBP (on base percentage).

If runs are to be scored in latter innings of games, runners need to be on base. A home run is not

the only way to win a game. A simple single or BB (base on balls) can result in a run scored. If

players are truly clutch, their OBP should be just as high in the latter innings of close games.

The last area that I felt shared equal importance in measuring performance in clutch

situations was SLG (slugging percentage). This statistic helps determine the number of extra

base hits. The more doubles, triples, and home runs a player has will result in a higher slugging

percentage. If these hits occur in late inning games, scoring runs becomes much easier.

To determine if my hypothesis was supported, I separated and graphed each of these statistics for

the 26 players that were included in my study. The graphs compared the overall season averages

with the averages for at bats after the 5th inning in games decided by 3 runs or less.

35

Results:

Batting Average:

Player Regular Season

Close Game 5+ Inning Plus/Minus

Buster Posey 0.336 0.302 -0.034 Miguel Cabrera 0.330 0.343 0.013 Andrew McCutchen 0.327 0.321 -0.006 Mike Trout 0.326 0.236 -0.090 Adrian Beltre 0.321 0.273 -0.048 Ryan Braun 0.319 0.305 -0.014 Joe Mauer 0.319 0.303 -0.016 Derek Jeter 0.316 0.304 -0.012 Yadier Molina 0.315 0.309 -0.006 Prince Fielder 0.313 0.320 0.007 Torii Hunter 0.313 0.310 -0.003 Billy Butler 0.313 0.336 0.023 Robinson Cano 0.313 0.296 -0.017 Jordan Pacheco 0.309 0.291 -0.018 Allen Craig 0.307 0.303 -0.004 Marco Scutaro 0.306 0.309 0.003 David Wright 0.306 0.358 0.052 David Murphy 0.304 0.252 -0.052 Alex Rios 0.304 0.271 -0.033 Carlos Gonzalez 0.303 0.248 -0.055 Aaron Hill 0.302 0.261 -0.041 Martin Prado 0.301 0.298 -0.003 Austin Jackson 0.300 0.286 -0.014 Aramis Ramirez 0.300 0.309 0.009 Dexter Fowler 0.300 0.317 0.017 Matt Holliday * 0.295 0.246 -0.049

Average Difference in Batting Average

-0.015

36

0.000

0.025

0.050

0.075

0.100

0.125

0.150

0.175

0.200

0.225

0.250

0.275

0.300

0.325

0.350

0.375

0.400 Bu

ster Posey

Migue

l Cabrera

Andrew

McCutchen

Mike Trou

t Ad

rian Be

ltre

Ryan Braun

Joe Mauer

Derek Jeter

Yadier M

olina

Prince Fielder

Torii Hun

ter

Billy Butler

Robinson

Cano

Jordan Pache

co

Allen Craig

Marco Scutaro

David Wrig

ht

David Murph

y Alex Rios

Carlo

s Gon

zalez

Aaron Hill

MarRn

Prado

Au

sRn Jackson

Aram

is Ra

mire

z De

xter Fow

ler

MaT

Holliday

Averages

Players with Average above .300 for the Regular Season

Ba;ng Average Comparison

Regular Season

Close Game 5+ Inning

37

On Base Percentage:



Joe Mauer 0.416 0.421 0.005 Prince Fielder 0.412 0.394 -0.018 Buster Posey 0.408 0.393 -0.015 Andrew McCutchen 0.400 0.393 -0.007 Mike Trout 0.399 0.312 -0.087 Miguel Cabrera 0.393 0.425 0.032 Ryan Braun 0.391 0.383 -0.008 David Wright 0.391 0.443 0.052 Dexter Fowler 0.389 0.417 0.028 David Murphy 0.380 0.345 -0.035 Robinson Cano 0.379 0.356 -0.023 Matt Holliday 0.379 0.345 -0.034 Austin Jackson 0.377 0.368 -0.009 Yadier Molina 0.373 0.371 -0.002 Billy Butler 0.373 0.417 0.044 Carlos Gonzalez 0.371 0.322 -0.049 Torii Hunter 0.364 0.366 0.002 Derek Jeter 0.362 0.336 -0.026 Aaron Hill 0.360 0.312 -0.048 Aramis Ramirez 0.360 0.384 0.024 Martin Prado 0.359 0.348 -0.011 Adrian Beltre 0.359 0.33 -0.029 Allen Craig 0.354 0.381 0.027 Marco Scutaro 0.348 0.333 -0.015 Jordan Pacheco 0.341 0.304 -0.037 Alex Rios 0.334 0.304 -0.03

Average Difference in On-Base Percentage

-0.010

38

0.000

0.050

0.100

0.150

0.200

0.250

0.300

0.350

0.400

0.450

0.500

Joe Mauer

Prince Fielder

Buster Posey

Andrew

McCutchen

Mike Trou

t Migue

l Cabrera

Ryan Braun

Da

vid Wrig

ht

Dexter Fow

ler

David Murph

y Ro

binson

Cano

MaT

Holliday

AusRn Jackson

Yadier M

olina

Billy Butler

Carlo

s Gon

zalez

Torii Hun

ter

Derek Jeter

Aaron Hill

Aram

is Ra

mire

z MarRn

Prado

Ad

rian Be

ltre

Allen Craig

Marco Scutaro

Jordan Pache

co

Alex Rios

On Base Percent

Selected Players

On-‐Base Percentage

Regular Season


39

Slugging Percentage:



Miguel Cabrera 0.606 0.702 0.096 Ryan Braun 0.595 0.570 -0.025 Mike Trout 0.564 0.355 -0.209 Adrian Beltre 0.561 0.505 -0.056 Andrew McCutchen 0.553 0.505 -0.048 Robinson Cano 0.550 0.475 -0.075 Buster Posey 0.549 0.537 -0.012 Aramis Ramirez 0.540 0.553 0.013 Prince Fielder 0.528 0.525 -0.003 Aaron Hill 0.522 0.455 -0.067 Allen Craig 0.522 0.533 0.011 Alex Rios 0.516 0.488 -0.028 Carlos Gonzalez 0.510 0.379 -0.131 Billy Butler 0.510 0.590 0.08 Yadier Molina 0.501 0.506 0.005 Matt Holliday 0.497 0.455 -0.042 David Wright 0.492 0.540 0.048 Austin Jackson 0.479 0.417 -0.062 David Murphy 0.479 0.374 -0.105 Dexter Fowler 0.474 0.500 0.026 Torii Hunter 0.451 0.441 -0.01 Joe Mauer 0.446 0.468 0.022 Martin Prado 0.438 0.479 0.041 Derek Jeter 0.429 0.429 0 Jordan Pacheco 0.421 0.391 -0.03 Marco Scutaro 0.405 0.381 -0.024

Average Difference in Slugging Percentage

-0.023

40

Analyzing Results:

My original hypothesis was not supported by the data. There was really no indication of

a significant decline in any of the selected categories. While a few players had numbers that

dipped well below their season averages, most performed at a fairly similar in clutch situations.

The overall batting average dropped an average of 0.015 thousandths. If a player batted .310 for

the regular and batted .295 in “clutch” situations, this level of performance would be welcomed

by any manager, owner, or general manager.

0.000

0.100

0.200

0.300

0.400

0.500

0.600

0.700

0.800

Migue

l Cabrera

Ryan Braun

Mike Trou

t

Adria

n Be

ltre

Andrew

McCutchen

Robinson

Cano

Buster Posey

Aram

is Ra

mire

z

Prince Fielder

Aaron Hill

Allen Craig

Alex Rios

Carlo

s Gon

zalez

Billy Butler

Yadier M

olina

MaT

Holliday

David Wrig

ht

AusRn Jackson

David Murph

y

Dexter Fow

ler

Torii Hun

ter

Joe Mauer

MarRn

Prado

Derek Jeter

Jordan Pache

co

Marco Scutaro

Slugging Percent

Selected Players

Slugging Percentage

Regular Season


41

On Base Percentage showed the least amount of decline with an average decline of .010

thousandths. To put this in perspective, if a player’s on-base percentage was .390 during the

regular season and .380 in “clutch” situations, that would again be welcomed by any team. On

average players reached base in the latter innings of close games as often as they did in games

that were not as close. This measure of consistency truly took me by surprise. I expected this

number to be vastly different.

Lastly Slugging Percentage, this average declined the most. The overall average decline

in slugging was .023 thousandths. Yet even this difference means that a player has slightly fewer

extra base hits (2B, 3B, and HR) in the latter innings of close games. This does not indicant a

significant decline as I had originally hypothesized. This also includes two outliers, (Carlos

Gonzalez and Mike Trout), if these outliers were not present clutch SLG percentage would be

relatively identical.

Deeper Analysis:

After looking over these statistics multiple times and receiving some sound advice, I

decided that only analyzing the average change was insufficient. My advisors suggested looking

to the Fisher Discriminant to determine if one these offensive categories (batting average,

slugging percentage, on-base percentage) could be the best predictor of performance in the

clutch. I stated in my hypothesis that a significant decline in production would be evident in

clutch situations. What I found in my original test was that the overall difference was relatively

insignificant.

The Fisher Discriminant allows us to look at multiple variables and determine if one of

the variables is the best predictor of clutch performance. This analysis leads us away from the

42

original test and hypothesis. Rather than trying to find significant decline, we can show how

closely regular season statistics match up with the statistics in clutch situations.

Here is a description of the Fisher discriminant as written by Dr. Rossi (personal

communication, March 13, 2013). Let us measure n variables measured in m instances (like

height, weight, beltsize) measured for m people. So, each variable produces a column vector of

length m. Let X1,…Xn, be those variables, and X, the m x n matrix whose columns are X1,…Xn.

Suppose we take a new variable A, which is a linear combination of the X1,…Xn: A = a1X1

+a2X2 + …+anXn. Suppose we wanted to find the variance of A. Well, we'd fill in the column

whose jth entry is the same linear combination of the jth entries of X1,…,Xn, and then compute

the variance of that vector.

Here is another way. Let Cij be the covariance of Xi and Xj, and C the n x n matrix whose

(i,j) entry is Cij. Then the variance of A is the number AtCA, where A is the n x 1 column formed

of the a1,…,an and At is the transpose of A (and hence a 1 x n matrix).

Now, eigenvalue theory tells us this: There are n linear combinations of X1,…,Xn, call them

Y1,…,Yn, with these properties:

1. They are independent; that is, the covariance of Yi with Yj (for i not equal to j) is zero.

2. CYi = eiYi, where the number ei (called the ith eigenvalue) is the variance of Yi.

If we order the Yi so that the ei are non-increasing, then Y1 accounts for the largest portion of the

variance in the data, Y2 for the next largest, and so forth. Fisher invented this scheme for the

following situation: We have a large number of variables, and want to simplify our study to a

small number of variables that account for the most variance. So here, he might pick the two or

three eigenvectors which have the largest eigen values. For those eigenvectors with very small

43

eigen values, they correspond to variables that are most likely just linear combinations of the

others.

Analysis #1

In the first test I measured the differences in the regular season and late inning batting

average, on-base percentage, slugging percentage. These numbers were then normalized and

using those normalized values we were able to set up a 3 x 3 covariance matrix. The following

table shows the results of the first test.

Fisher Discriminant Analysis Normalized Means

Player SLG OBP BA

3 x 3 Covariance Matrix Aaron Hill -0.045 -0.038 -0.026

0.0043759 0.0017924 0.0016399

Adrian Beltre -0.034 -0.019 -0.033

0.0017924 0.0009852 0.0008165 Alex Rios -0.006 -0.020 -0.018

0.0016399 0.0008165 0.0008513

Allen Craig 0.034 0.037 0.011

Andrew McCutchen -0.026 0.003 0.009

Eigen Values

Aramis Ramirez 0.036 0.034 0.024

1.683E-10 Austin Jackson -0.040 0.001 0.001

1.098E-06

Billy Butler 0.103 0.054 0.038

1.721E-07 Buster Posey 0.011 -0.005 -0.019

Correlation

Carlos Gonzalez -0.109 -0.039 -0.040

0.8632704 0.8496242 0.8915456 David Murphy -0.083 -0.025 -0.037

SLG OBP BA

David Wright 0.071 0.062 0.067 Derek Jeter 0.023 -0.016 0.003 Dexter Fowler 0.049 0.038 0.032 Joe Mauer 0.045 0.015 -0.001 Jordan Pacheco -0.007 -0.027 -0.003 Marco Scutaro -0.002 -0.005 0.018 Martin Prado 0.064 -0.001 0.012 Matt Holliday -0.020 -0.024 -0.034 Miguel Cabrera 0.119 0.042 0.028 Mike Trout -0.187 -0.077 -0.075 Prince Fielder 0.020 -0.008 0.022 Robinson Cano -0.053 -0.013 -0.002 Ryan Braun -0.003 0.002 0.001 Torii Hunter 0.013 0.012 0.012 Yadier Molina 0.028 0.008 0.009

44

The small eigen values show that there is not a lot of variance in the data: "on average"

all 3 data streams give the same result. It is not a surprise that with little variation in the data the

correlation coefficients are quite strong.

Analysis #2

In the second analysis we chose to use the ratio of the clutch performance to regular

season performance (i.e. clutch BA/reg. season BA). I took the natural log of each ratio and

again normalized each of these values.

Fisher Discriminant Analysis Normalized Means

Player BA OBP SLG

3x3 Covariance Matrix Aaron Hill 0.085 0.111 0.092

BA OBP SLG

Adrian Beltre 0.053 0.052 0.108

BA 0.0176729 0.0095748 0.0113881 Alex Rios 0.003 0.062 0.061

OBP 0.0095748 0.0072020 0.0075514

Allen Craig -‐0.073 -‐0.106 -‐0.041

SLG 0.0113881 0.0075514 0.0100156 Andrew McCutchen 0.038 -‐0.015 -‐0.036

Aramis Ramirez -‐0.076 -‐0.097 -‐0.084

Correlation Austin Jackson 0.086 -‐0.008 -‐0.006

0.8486882 0.85596909 0.889123468

Billy Butler -‐0.198 -‐0.144 -‐0.125

BA OBP SLG Buster Posey -‐0.030 0.005 0.053

Carlos Gonzalez 0.244 0.109 0.146 David Murphy 0.195 0.064 0.133 David Wright -‐0.146 -‐0.157 -‐0.211 Derek Jeter -‐0.053 0.042 -‐0.015 Dexter Fowler -‐0.106 -‐0.102 -‐0.109 Joe Mauer -‐0.101 -‐0.044 -‐0.003 Jordan Pacheco 0.021 0.083 0.006 Marco Scutaro 0.009 0.012 -‐0.064 Martin Prado -‐0.142 -‐0.001 -‐0.044 Matt Holliday 0.036 0.062 0.127 Miguel Cabrera -‐0.200 -‐0.110 -‐0.093 Mike Trout 0.410 0.214 0.269 Prince Fielder -‐0.047 0.012 -‐0.076 Robinson Cano 0.094 0.030 0.002 Ryan Braun -‐0.010 -‐0.012 -‐0.009 Torii Hunter -‐0.030 -‐0.038 -‐0.045 Yadier Molina -‐0.062 -‐0.027 -‐0.035

The covariance matrices in table 1 and 2 are used in the Fisher analysis. The correlation boxes in

each table are used calculating multivariate regression. However, testing the differences and

45

ratios had similar results. None of the variables seemed to predict clutch performance better than

another. This deeper analysis actually matched up with the simple, less sophisticated, analysis of

the means.

My hypothesis was not supported by the data. While Matt Holliday showed the decline

that I felt was present, my conclusion, generally speaking, is that regular season performance

best predicts a player's clutch performance in each of the three measures chosen.

At first, I felt that this conclusion was a bit of a disappointment. However, the

conclusion is actually quite strong. Every calculation (means, variances, covariances, Fisher

analysis, and multivariate regression) told us the same thing. Clutch performance is best

predicted by looking at regular season performance. The anomalies or outliers actually deserve

the deeper analysis when it comes to managing a Major League Baseball team. The final table I

am going to include will display the three original sets of data in one graph. This table does not

include means or variances. It simply shows the increase or decline in BA, OBP, and SLG for

each of the 26 players selected. From the graph below, it is very easy to spot the outliers. The

outliers, especially those whose performance significantly declines, are the players who would

concern a manager the most. In the 2012 playoffs, Joe Girardi, the manager of the New York

Yankees, made use of statistics just like this! Alex Rodriguez (not included in this study) is the

highest paid player in all of baseball. His career statistics are phenomenal. However, in clutch

situations he is an outlier. For this reason, his manager decided to leave him out of the line up in

some very clutch situations. Similarly, in the 2011 World Series, the Cardinals highest paid

player Matt Holliday did not play in the 7th and deciding game. Tony LaRussa and Joe Girardi

were both aware that these players did not perform well in clutch situations. Statistics help

justify these difficult decisions when managing a baseball team.

46

Player Change in BA Change in OBP Change in SLG Aaron Hill -0.041 -0.048 -0.067 Adrian Beltre -0.048 -0.029 -0.056 Alex Rios -0.033 -0.03 -0.028 Allen Craig -0.004 0.027 0.011 Andrew McCutchen -0.006 -0.007 -0.048 Aramis Ramirez 0.009 0.024 0.013 Austin Jackson -0.014 -0.009 -0.062 Billy Butler 0.023 0.044 0.08 Buster Posey -0.034 -0.015 -0.012 Carlos Gonzalez -0.055 -0.049 -0.131 David Murphy -0.052 -0.035 -0.105 David Wright 0.052 0.052 0.048 Derek Jeter -0.012 -0.026 0 Dexter Fowler 0.017 0.028 0.026 Joe Mauer -0.016 0.005 0.022 Jordan Pacheco -0.018 -0.037 -0.03 Marco Scutaro 0.003 -0.015 -0.024 Martin Prado -0.003 -0.011 0.041 Matt Holliday -0.049 -0.034 -0.042 Miguel Cabrera 0.013 0.032 0.096 Mike Trout -0.09 -0.087 -0.209 Prince Fielder 0.007 -0.018 -0.003 Robinson Cano -0.017 -0.023 -0.075 Ryan Braun -0.014 -0.008 -0.025 Torii Hunter -0.003 0.002 -0.01 Yadier Molina -0.006 -0.002 0.005

-‐0.25

-‐0.2

-‐0.15

-‐0.1

-‐0.05

0

0.05

0.1

0.15

Aaron Hill

Adria

n Be

ltre

Alex Rios

Allen Craig

Andrew

McCutchen

Aram

is Ra

mire

z

AusRn Jackson

Billy Butler

Buster Posey

Carlo

s Gon

zalez

David Murph

y

David Wrig

ht

Derek Jeter

Dexter Fow

ler

Joe Mauer

Jordan Pache

co

Marco Scutaro

MarRn

Prado

MaT

Holliday

Migue

l Cabrera

Mike Trou

t

Prince Fielder

Robinson

Cano

Ryan Braun

Torii Hun

ter

Yadier M

olina

Change

Players

Overall Change Regular Season vs. Clutch SituaCons

Change in BA Change in OBP Change in SLG

47

Additional Commentary:

In the background for research section, I mentioned the disappointment I had with 2009

Cy Young award winner. I did not have any research to back up my claim in this particular

scenario. There was a similar argument this year for the 2012 American League MVP. The two

leading candidates were Miguel Cabrera and Mike Trout. Both players were in my sample of

players whose averages were above .300. Many baseball writers felt that Trout should be the

winner. His rookie season was truly one of kind. He played great defense, hit for power, and

was dangerous while running bases. He was truly a deserving candidate. However, Miguel

Cabrera accomplished a feat that not occurred since 1967. He won the Triple Crown. He led the

Major Leagues in batting average, home runs, and runs batted in. In my opinion, Cabrera was

the leading candidate. On many baseball websites, namely ESPN, Mike Trout was the favorite.

In the end Cabrera won.

If we take a closer look at the data I collected, Cabrera is clearly the obvious choice. His

stats were far superior to Mike Trout in “clutch” situations. Miguel Cabrera’s batting average for

the regular season was .330. His average in clutch situations rose to .343. An increase of .013 is

fairly significant considering he already had the highest average in all of baseball. His OBP (On-

Base Percentage) rose from .393 to .425! This is truly amazing. This meant that in clutch

situations Cabrera was getting on base nearly 1 out of every 2 times he went to the plate. His

slugging percentage rose from .606 to an astounding to .720. This means that when he did get

hits, they were often for extra bases (doubles and home runs). These hits are more valuable in

the latter innings of close games.

Trout on the other hand showed the single largest decline of any player. His batting

average dropped from .329 (2nd highest in the American League) to a pedestrian .236 in the latter

48

innings of close games. This was the largest drop off in batting average of any player that was

included in my study. His OBP dropped from .399 to .312. This drop of .087 was again the

largest decline of any of the players that were analyzed. His slugging percentage dropped from

.564 to .355. This decline of .209 was the largest decline of any of the players in any of the

offensive categories.

I am not trying to take anything away from Mike Trout’s season. He certainly did things

in the earlier parts of games that contributed to victories for his team. But, it should also be

noted that Cabrera’s team, due in large part to his efforts, made it all the way to the World Series.

The Angels, Trout’s team, did not make the playoffs.

This motivated me to look a little closer at former MVP’s. What I found could be the

beginning of a new study. The MVP has been awarded in each league for 101 years. Of the 202

MVP awards handed out only 23 players had a batting average below .300. [4] Of the 23 that

were below .300, 20 batted above .290. It would be interesting to see if, like Cabrera, they were

indeed “clutch” performers. It would also be interesting to look at years where the voting was

extremely close. If two candidates were front runners in a particular season, it would be

interesting to see if the player chosen had better statistics in clutch situations, similar to Miguel

Cabrera.

It should also be noted that Matt Holliday, the player who gave me the motivation for this

study, had the 2nd worst decline in batting average. The only player who had a larger drop off

was Mike Trout. So, my observations about Matt Holliday were correct. However, on the

whole, most players who perform well throughout the regular season also perform well in the

latter innings of close games, what was defined as a “clutch” situation.

49

Conclusion:

In doing this study I tried to avoid those things that were labeled as “misleading” in the

first portion of this project. I did not insert numbers that simply helped support what I was trying

to prove. If I only studied Matt Holliday, the data would have supported my hypothesis.

However, this would have shown a great deal of bias. I chose a large list of players that also

performed at high levels to eliminate any personal bias. I did choose the categories that I

believed best measured clutch performance. Others may have had differing opinions. Strategy

two did not say we would eliminate bias, only that we recognize it and take it into account. I

clearly defined what I was going to research. A glossary of terms was provided to the reader to

give a clear insight into what constitutes baseball statistics. I recorded data in the same way for

each player. Comparisons were accurately made and explained. And lastly, graphs were

depicted in ways that would not mislead anyone trying to read them.

Doing the literature review enabled me to gather and present data in way that would not

mislead any readers. In doing so, I was able to make some interesting comparisons. While my

hypothesis was not supported, I was able to gain information that might lead to further research.

That is really the beauty of any research. We may not always find what we were hoping for, but

there is always something to be learned. It is my hope that I can continue to do my own

statistical research and also help my students to become more critical analyzers as they encounter

statistics.

Games AB H 2B 3B HR Runs RBI BB SO AVG OBP SLG OPS148 530 178 39 1 24 78 103 69 96 0.336 0.408 0.549 0.95793 172 52 17 0 6 19 30 28 39 0.302 0.393 0.537 0.930






Games AB H 2B 3B HR Runs RBI BB SO AVG OBP SLG OPS

Entire Season5+ Inn. (close games)

Mike Trout

Buster Posey

Andrew McCutchen

Miguel Cabrera



Austin Jackson



Prince Fielder


Torii Hunter

139 559 182 27 8 30 129 83 67 139 0.329 0.399 0.564 0.96378 152 36 7 1 3 20 21 17 39 0.236 0.312 0.355 0.667







Derek Jeter


5+ Inn. (close games)

Ryan Braun


Aramis Ramirez

Entire Season

David Murphy


Adrian Beltre

5+ Inn. (close games)Entire Season

Joe Mauer












5+ Inn. (close games)

Robinson Cano

Entire Season

Marco Scutaro

Jordan Pacheco

Matt Holliday

Allen Craig


Yadier Molina








Games AB H 2B 3B HR Runs RBI BB SO AVG OBP SLG OPSMartin Prado


Aaron Hill



Alex Rios


David Wright



Billy Butler

Dexter Fowler

Carlos Gonzalez

156 617 186 42 6 10 81 70 58 69 0.301 0.359 0.438 0.79789 188 56 13 3 5 25 26 16 16 0.298 0.348 0.479 0.827


50

Bibliography

Books

Best, J. (2001). Damned Lies and Statistics: Untangling Numbers from the Media,

Politicians, and Activists. Berkeley and Los Angeles, CA: University of California Press.

Seife, C. (2010). Proofiness: How You’re Being Fooled by the Numbers. New York, NY:

Penguin Books.

Huff, D. (1954). How to Lie with Statistics. New York, NY: W.W. Norton & Company.

Salsburg, D. (2001). The Lady Tasting: How Statistics Revolutionized Science in the 21st

Century. New York, NY: Holt Paperback

Paulos, J.A. (1995). A Mathematician Reads the Newspaper. New York, NY: Basics Books.

Internet

[1] http://www.sltrib.com/sltrib/news/54026798-78/lds-religious-church-largest.html.csp

[2] http://en.wikipedia.org/wiki/Religion_in_the_United_States

[3] http://www.espn.com

[4] http://www.baseball-reference.com

Documents

Masters Project edited from the commentslike to thank Davar Khoshnevisan for getting me started in the right direction and for the advice he offered in finding books that would aide