
MATH 1040 Skittles Data Project - Single Mom Studying


Laura Boren

MATH 1040 Skittles Data Project

For our project in MATH 1040, everyone in the class was asked to buy a 2.17-ounce individual-sized bag of Skittles and count the number of candies of each color in the bag. The class data was compiled, and we used it for several exercises, each involving a different aspect of statistics.

For the first part of the project, we determined the proportion of each color of candy and created a Pareto chart and a pie chart of the total number of candies of each color in the entire class. We compared the class data to our own personal data and noted any similarities or differences.

For part 2 of the project, we used the Skittles data to compute summary statistics: the mean, standard deviation, and 5-number summary. We made a frequency histogram of the total number of candies per bag as well as a box plot. Individually, I also wrote a paragraph about the significance of different qualitative and quantitative methods of analysis.

The last part of the project involved confidence intervals. We constructed three confidence intervals, one each for the population proportion, mean, and standard deviation, and wrote an analysis of what each interval meant.


Laura Boren, Melissa Oneal, Justin Peck, Nathan Schafer

Math 1040 Class Skittles Proportions

Color                                     Count    Proportion of Total
Red Skittles                                564    0.199
Orange Skittles                             564    0.199
Green Skittles                              566    0.199
Purple Skittles                             559    0.197
Yellow Skittles                             586    0.206
Total Number of Skittles in the class      2839    1.000
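The proportions in the table can be reproduced with a few lines of Python. This is only a sketch for checking the arithmetic: the counts are copied from the table, and the names `class_counts` and `class_proportions` are made up for this example.

```python
# Class counts copied from the table above.
class_counts = {
    "Red": 564,
    "Orange": 564,
    "Green": 566,
    "Purple": 559,
    "Yellow": 586,
}

# Total number of candies in the class sample.
total = sum(class_counts.values())

# Proportion of each color, rounded to three decimals like the table.
class_proportions = {
    color: round(count / total, 3) for color, count in class_counts.items()
}

for color, proportion in class_proportions.items():
    print(f"{color:<7}{class_counts[color]:>5}  {proportion:.3f}")
print(f"{'Total':<7}{total:>5}")
```

Each proportion is simply the color count divided by the class total of 2839.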

[Chart: MATH 1040 Skittles Data]


Laura Boren, Melissa Oneal, Justin Peck, Nathan Schafer

Does the Class data represent a random sample?

Yes, the class data does represent a random sample. Each student bought their own bag, so not every bag of Skittles in the region had an equal chance of being selected, but the distribution of bags from the central plant or warehouse was most likely random. The Skittles company most likely does not count colors as it fills the bags and simply fills them by weight. Assuming students did not make any biased decisions about which bag to grab off the shelf, every bag produced had an equal chance of being shipped to any location in the country and of being selected at random by a student in the class.

What would the population be?

In this study, the sample is the class data. Since not everyone in the class currently lives in the same state, the population would be all 2.17-ounce bags of Skittles in the United States. Other manufacturing plants currently operate overseas, so the population can only reasonably be expanded to the United States distribution circuit.


Laura Boren

Math 1040 Skittles Data

Skittles Color     Class Total   Proportion   My Total   Proportion
Red Skittles               564        0.199         16        0.258
Yellow Skittles            586        0.206         11        0.177
Orange Skittles            564        0.199         10        0.161
Green Skittles             566        0.199         15        0.242
Purple Skittles            559        0.197         10        0.161
Total Skittles            2839                      62

My bag differed quite a bit from the class data. It had a noticeably higher proportion of red and green Skittles than the class as a whole (0.258 and 0.242, versus 0.199 each), but, like the class data, it had the fewest purple Skittles (tied with orange in my bag). I had always assumed that red was the most common Skittles color, but that may just be because red is so vibrant that it gets noticed more. In my bag red was the most common color, but that was not supported by the class data. I was surprised to see that yellow was the most common color in the class.
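The comparison between one bag and the class can also be laid out numerically. The sketch below uses the counts from the table; `proportions` is a hypothetical helper written for this example, not part of the assignment.

```python
# Counts copied from the table above.
class_counts = {"Red": 564, "Yellow": 586, "Orange": 564, "Green": 566, "Purple": 559}
my_counts = {"Red": 16, "Yellow": 11, "Orange": 10, "Green": 15, "Purple": 10}


def proportions(counts):
    """Convert raw color counts into proportions of the total."""
    total = sum(counts.values())
    return {color: count / total for color, count in counts.items()}


class_p = proportions(class_counts)
my_p = proportions(my_counts)

# Positive difference: the color is more common in my bag than in the class.
differences = {color: my_p[color] - class_p[color] for color in class_counts}

for color, diff in sorted(differences.items(), key=lambda item: item[1], reverse=True):
    print(f"{color:<7} class {class_p[color]:.3f}  bag {my_p[color]:.3f}  diff {diff:+.3f}")
```

Sorting by the difference puts the colors that are over-represented in the single bag at the top of the list.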


1. Using the total number of candies in each bag in our class sample, compute the following measures for the variable “Total candies in each bag”:

(a) mean number of candies per bag

The mean number of candies per bag is 59.1 candies.

(b) standard deviation of the number of candies per bag

The standard deviation of the number of candies per bag is 6.4 candies.

(c) 5-number summary for the number of candies per bag

The 5-number summary is minimum 34, first quartile 58, median 60, third quartile 62, maximum 71.

Report these summary statistics rounded to one decimal place, if needed.
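For anyone repeating these calculations, the sketch below shows one way to compute the same three measures in Python. The individual bag totals are not listed in this report, so `sample_totals` is a made-up placeholder list and will not reproduce the class results. `five_number_summary` is a hypothetical helper, and the "inclusive" quartile method is an assumption; a calculator or another program may use a slightly different quartile rule.

```python
import statistics


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    # "inclusive" is one common quartile convention; it is an assumption here,
    # and other tools may give slightly different Q1 and Q3 values.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return (min(values), q1, median, q3, max(values))


# Placeholder totals for illustration only. These are NOT the class data.
sample_totals = [34, 55, 58, 59, 60, 60, 61, 62, 63, 71]

mean = statistics.mean(sample_totals)
std_dev = statistics.stdev(sample_totals)  # sample standard deviation (divides by n - 1)
summary = five_number_summary(sample_totals)

print(f"mean = {mean:.1f}")
print(f"standard deviation = {std_dev:.1f}")
print("5-number summary =", summary)
```

Replacing `sample_totals` with the real list of bag totals would give the class statistics, subject to the quartile convention noted above.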


[Figure: Math 1040 Skittle Data 2015]


Laura Boren

Skittles Data Part 3

1. From these graphs we can conclude that the frequency histogram is skewed to the left, although our boxplot appeared rather symmetrical, likely because the number line did not have small enough increments. This skew is expected because the median number of candies per bag is 60 but the mean is only 59.1. One of the main causes of the negative skew is that several of the bags had only 30-40 candies in them, in some cases little more than half the median number of candies per bag. Those bags are outliers, and they pull the mean toward the left. My data agrees with the data collected by the whole class because the highest frequency of candies per bag was in the 60-65 class. My bag had 62 candies, which falls right in that class.

2. Categorical variables are also known as qualitative variables. These variables can be put into different categories, such as the model of a car, color, or gender. Quantitative data is data that can be ordered and measured. The number of candies in a bag of Skittles is quantitative, whereas the color of the candy is categorical.

Quantitative data is best graphed with histograms, stem-and-leaf plots, dot plots, bar graphs, and box plots. All of these types of graphs can be used to show the quantity of a certain variable. Categorical data is best graphed using a method that lets you compare the groups to one another. A bar graph can work for both quantitative and categorical data, but a pie chart doesn’t make sense for quantitative data because it compares categories to the whole. A pie chart would effectively show the percentage of each color of Skittles in a bag (categorical data), but it cannot effectively show the number of Skittles in a bag (quantitative data).

When it comes to calculations, mean and median only make sense for quantitative data. The mean is the average value across an entire sample, so it is meaningful only when applied to quantitative data. The median represents the middle value of the ordered data and, once again, makes sense only when applied to quantitative data. The best measure of central tendency for categorical data is the mode. When looking at the colors of candy in a bag of Skittles, you may not be able to find the average color or the median color, but you can establish which color occurs most often. Likewise, when looking at the number of candies in a bag of Skittles, the most useful measures of center are the mean and median number of Skittles.
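As a small illustration of the mode for categorical data, the sketch below finds the most common color in the class counts reported earlier. The variable names are made up for this example.

```python
# Class color counts from the class proportions table.
class_counts = {"Red": 564, "Orange": 564, "Green": 566, "Purple": 559, "Yellow": 586}

# The mode of a categorical variable is the category with the highest count.
mode_color = max(class_counts, key=class_counts.get)

print("Most common color (mode):", mode_color)
```

There is no comparable way to average the color names themselves, which is why the mode is the measure of center used for this kind of data.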


Laura Boren, Nathan Schafer, Justin Peck, Melissa Oneal

99% Confidence Interval estimate for the population proportion of yellow candies

x = 586

n = 2839

z-value for a 99% CI = 2.576

p = 586/2839 = 0.206

Standard error = √[p(1 - p)/n] = 0.007596

0.206 +/- 2.576 * (0.007596)

0.206 +/- 0.01957

99% Confidence Interval Estimate: (0.186, 0.226)

Confidence interval estimates of a population proportion are used to estimate, with a specified degree of confidence, the proportion of a population that has a given characteristic. For the Skittles, we are 99% confident that the proportion of yellow candies among all Skittles in the population falls between 0.186 and 0.226.
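The arithmetic for this interval can be checked with a short Python sketch. It uses the normal-approximation formula p +/- z * √[p(1 - p)/n] with the values reported above; the variable names are made up for this example.

```python
import math

x = 586      # yellow candies in the class sample
n = 2839     # total candies in the class sample
z = 2.576    # critical z-value for 99% confidence, as reported above

p_hat = x / n                                        # sample proportion
standard_error = math.sqrt(p_hat * (1 - p_hat) / n)  # standard error of the proportion
margin = z * standard_error                          # margin of error

lower = p_hat - margin
upper = p_hat + margin

print(f"p = {p_hat:.3f}, standard error = {standard_error:.6f}, margin = {margin:.5f}")
print(f"99% CI: ({lower:.3f}, {upper:.3f})")
```

Carrying the unrounded proportion through the calculation gives a lower bound of about 0.187 rather than 0.186. The small difference comes from rounding p to 0.206 before subtracting the margin of error.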

95% Confidence Interval estimate for the population mean number of skittles per bag

n = 49

Sx = 6.38

Sample mean = 59.15

Standard error of the mean = Sx/√n = 6.38/7 = 0.9114

To find the t-value, a t-table was consulted using 50 degrees of freedom, a close approximation to the exact n - 1 = 48. The t-value is 2.009.

59.15 +/- 2.009 * (0.9114)

59.15 +/- 1.83

59.15 + 1.83 = 60.98

59.15 - 1.83 = 57.32

95% Confidence Interval Estimate: (57.32, 60.98)

Confidence interval estimates of the population mean use sample data to construct an interval that, with the specified degree of confidence, should contain the population mean. In this case, we are 95% confident that the population mean number of Skittles per bag is between 57.32 and 60.98.
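The same interval can be reproduced in Python. This sketch takes the t-value of 2.009 from the table lookup described above instead of computing it, so it needs only the standard library; the variable names are made up for this example.

```python
import math

n = 49               # number of bags in the class sample
s = 6.38             # sample standard deviation of candies per bag
sample_mean = 59.15  # sample mean number of candies per bag
t_critical = 2.009   # t-value read from the table, as reported above

standard_error = s / math.sqrt(n)     # standard error of the mean
margin = t_critical * standard_error  # margin of error

lower = sample_mean - margin
upper = sample_mean + margin

print(f"standard error = {standard_error:.4f}, margin = {margin:.2f}")
print(f"95% CI: ({lower:.2f}, {upper:.2f})")
```

Because n = 49 is a perfect square, the standard error is simply 6.38 divided by 7.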


Laura Boren, Nathan Schafer, Justin Peck, Melissa Oneal

98% confidence interval estimate for the population standard deviation of the number of candies per bag

n = 49

s = 6.378

s² = 40.679

Area to the right of χ²(1-α/2) = 0.99

Area to the right of χ²(α/2) = 0.01

On the chi-square distribution table, the row for 50 degrees of freedom was used. The value for χ²(1-α/2) was 29.707, and for χ²(α/2) it was 76.154.

Bound = √[s²(df)/χ²], with df = n - 1 = 48

Lower bound: √[40.679(48)/76.154] = 5.06

Upper bound: √[40.679(48)/29.707] = 8.11

Confidence interval estimates of the population standard deviation use the sample standard deviation to generate an interval that, with the specified level of confidence, should contain the population standard deviation of the number of candies per bag. In this case, we are 98% confident that the population standard deviation is between 5.06 and 8.11 candies. The problem with confidence interval estimates based on the sample standard deviation is that the sample standard deviation may be quite different from the actual population standard deviation.
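The bounds can be checked in Python using the chi-square values read from the table. This sketch assumes n - 1 = 48 in the numerator, which is the value that reproduces the reported bounds of 5.06 and 8.11; the variable names are made up for this example.

```python
import math

n = 49
s_squared = 40.679    # sample variance, s = 6.378
chi2_right = 76.154   # table value with area 0.01 to its right
chi2_left = 29.707    # table value with area 0.99 to its right

df = n - 1  # degrees of freedom used in the numerator (assumption, see above)

# Dividing by the larger chi-square value gives the lower bound, and vice versa.
lower = math.sqrt(df * s_squared / chi2_right)
upper = math.sqrt(df * s_squared / chi2_left)

print(f"98% CI for the standard deviation: ({lower:.2f}, {upper:.2f})")
```

The interval is not symmetric around s = 6.378 because the chi-square distribution itself is not symmetric.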


Laura Boren

The purpose of taking sample data and calculating statistics from them is to apply those statistics to a larger population. Since a population is larger than a sample, how well a sample statistic estimates a population parameter is an issue. A confidence interval helps to solve that issue by providing a range of values that the population parameter is likely to fall within. The intervals are constructed with a certain level of confidence, expressed as a percentage such as 95%, 98% or 99%. This means that if the same population were sampled on multiple occasions and an interval calculated each time, about X% of the intervals would contain the true parameter, where X% is the confidence level.
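The repeated-sampling idea can be illustrated with a small simulation. The sketch below is not part of the class project: the true proportion of 0.2, the sample size of 500, and the 2000 repetitions are all made-up settings, and `proportion_interval` and `coverage` are hypothetical helpers. It draws many samples from a population with a known proportion and counts how often the 95% interval contains that proportion.

```python
import math
import random


def proportion_interval(successes, n, z=1.96):
    """Normal-approximation confidence interval for a proportion."""
    p_hat = successes / n
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


def coverage(true_p, n, trials, z=1.96, seed=0):
    """Fraction of simulated intervals that contain the true proportion."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(trials):
        # Draw one sample of size n and count the "successes".
        successes = sum(rng.random() < true_p for _ in range(n))
        lower, upper = proportion_interval(successes, n, z)
        if lower <= true_p <= upper:
            hits += 1
    return hits / trials


# Made-up settings for illustration only.
rate = coverage(true_p=0.2, n=500, trials=2000)
print(f"Share of 95% intervals containing the true proportion: {rate:.3f}")
```

The share printed should land close to 0.95, which is what the confidence level describes: a long-run success rate for the method, not a guarantee about any single interval.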


Laura Boren

Skittle Project Reflection

When I first started the Skittles project, I was intimidated by the process of using statistical concepts to interpret real-life data. As the project went on, I became much more comfortable with concepts such as confidence intervals and with creating Pareto charts and frequency histograms. In my volunteer work as a lactation educator and also as a nursing student, I sometimes find myself reading and interpreting peer-reviewed clinical research. Understanding what confidence intervals are and what makes data significant or unusual is very helpful in interpreting such studies and thinking critically about what the data actually means. There are even some aspects of statistics that I used before taking this class. In Human Physiology we were required to calculate the mean, median, and standard deviation of lung inspiratory volume as part of our laboratory unit on the respiratory system.

Taking calculus really helped me to understand real-world math applications, and statistics only supported what I already knew about the practicality of math. Statistics is a fundamental part of scientific literacy and has numerous applications in the world of business and economics. Completing the Skittles project helped me to understand how businesses and corporations might need to use statistics, particularly standard deviations, in order to produce accurate and consistent products. Statistics can also be used to calculate demand, determine shipping and distribution needs, and evaluate product quality and customer satisfaction. In our Skittles project we determined the average proportion of each color of candy that came in a bag, as well as a confidence interval for that population proportion. This could be helpful in evaluating customer candy preferences and overall satisfaction based on flavor preference. A company might use similar statistics in real life to ensure product standardization.