
Tenorio 1

IB Math SL Internal Assessment: Farmville Statistics

Arielle Tenorio
Period 6

Farmville is a popular computer game that is hosted by the social networking website, Facebook. This game allows players to manage a virtual farm by plowing, planting, growing, and harvesting on their virtual farmland. Crops, trees, and livestock can be purchased with the “FarmCoins” that are earned by harvesting. There are also levels in this game that are achieved by reaching a certain amount of experience points. Players of higher levels tend to have larger farms, more crops, and more FarmCoins than those of lower levels.

This assignment will examine the relationship between the number of trees a Farmville player has and what level they are on in the game. It is predicted that there will be a positive relationship. This prediction can be confirmed or rejected by analyzing and processing collected data. First, a scatter plot will be produced with a line of linear regression to display the trend of the data. The correlation coefficient for the two variables will also be determined. A box and whisker plot will compare the number of trees owned by the highest-ranking players surveyed with the number owned by the lowest-ranking players. A chi-squared test for independence will then determine whether the two factors are related or are unrelated events.


Data samples were collected from 25 random Farmville players after logging onto Facebook and opening the Farmville game. After visiting the virtual farms of 25 “Friends” and counting the number of trees on each farm, a table was drawn up to organize the collected values.

Figure 1: Collected Data

#     Farmville Level   Number of Trees
1     7                 7
2     8                 9
3     9                 9
4     9                 16
5     10                11
6     13                17
7     13                23
8     13                45
9     15                19
10    15                20
11    16                33
12    16                35
13    16                16
14    18                21
15    19                18
16    20                20
17    22                79
18    22                35
19    23                41
20    23                28
21    25                62
22    26                35
23    28                44
24    31                94
25    34                40

Figure 1: This table displays the data that was collected.

From the table, it can be observed that the number of trees generally increases as the level increases. The values in this table will be plotted on a scatter plot.


A scatter plot is used to visually display the relationship between two variables on a two-dimensional graph. A line of linear regression, or trend line, can be found to confirm the observed relationship. The more closely the data points cluster around the trend line, the stronger the correlation between the variables.

Figure 2: Scatter Plot and Linear Regression Line

[Scatter plot titled "The Relationship Between Level and Number of Trees": Farmville Level on the x-axis (0 to 40), Number of Trees on the y-axis (0 to 100), with trend line y = 2.0783x - 6.4127.]

Figure 2: This scatter plot shows a positive relationship between the level of a Farmville player and the number of trees the player has. The line of linear regression was produced using Microsoft Excel. The calculations to find this equation manually are shown below.

Line of Linear Regression:

The formula for finding the linear regression line for y on x is

$$y - \bar{y} = \frac{S_{xy}}{S_x^2}\,(x - \bar{x})$$

where $\bar{y}$ is the average of the Y variables, $\bar{x}$ is the average of the X variables, $S_{xy}$ is the covariance of X and Y, and $S_x^2$ is the standard deviation of X, squared.

In order to find these values, the data was organized into a table, below.


Figure 3: Table for Linear Regression Line

#      Level (x)   Trees (y)   xy       x²       y²
1      7           7           49       49       49
2      8           9           72       64       81
3      9           9           81       81       81
4      9           16          144      81       256
5      10          11          110      100      121
6      13          17          221      169      289
7      13          23          299      169      529
8      13          45          585      169      2025
9      15          19          285      225      361
10     15          20          300      225      400
11     16          33          528      256      1089
12     16          35          560      256      1225
13     16          16          256      256      256
14     18          21          378      324      441
15     19          18          342      361      324
16     20          20          400      400      400
17     22          79          1738     484      6241
18     22          35          770      484      1225
19     23          41          943      529      1681
20     23          28          644      529      784
21     25          62          1550     625      3844
22     26          35          910      676      1225
23     28          44          1232     784      1936
24     31          94          2914     961      8836
25     34          40          1360     1156     1600
Sum    451         777         16671    9413     35299
Mean   18.04       31.08       666.84   376.52   1411.96

Figure 3: The sums and averages of x, y, xy, x² and y² were found and listed. Organizing the data in this manner made it easier to find the values for Sxy and Sx². The calculations are shown below.

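n = 25

To find the average of x:

$$\bar{x} = \frac{\sum x}{n} = \frac{451}{25} = 18.04$$

To find the average of y:

$$\bar{y} = \frac{\sum y}{n} = \frac{777}{25} = 31.08$$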


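To find Sxy:

$$S_{xy} = \frac{\sum xy}{n} - \bar{x}\,\bar{y} = 666.84 - (18.04)(31.08) = 106.157$$

To find Sx²:

$$S_x^2 = \frac{\sum x^2}{n} - \bar{x}^2 = 376.52 - (18.04)^2 = 51.078$$

To find the equation of the line of linear regression:

$$y - 31.08 = \frac{106.157}{51.078}\,(x - 18.04)$$

$$y - 31.08 = 2.078x - 37.493$$

$$y = 2.078x - 6.413$$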
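As a cross-check on the manual calculation, the regression line can be recomputed directly from the raw data in Figure 1. The sketch below is illustrative only: it uses plain Python rather than the Microsoft Excel trend line, the names `levels`, `trees`, and `regression_line` are hypothetical, and it assumes the covariance and variance are taken in the "mean of products minus product of means" form.

```python
# Raw data from Figure 1: Farmville level and number of trees for the 25 players.
levels = [7, 8, 9, 9, 10, 13, 13, 13, 15, 15, 16, 16, 16, 18, 19,
          20, 22, 22, 23, 23, 25, 26, 28, 31, 34]
trees = [7, 9, 9, 16, 11, 17, 23, 45, 19, 20, 33, 35, 16, 21, 18,
         20, 79, 35, 41, 28, 62, 35, 44, 94, 40]


def regression_line(x, y):
    """Return (slope, intercept) of the least-squares line of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Assumed form: covariance = mean(xy) - mean(x) * mean(y),
    # variance = mean(x^2) - mean(x)^2.
    s_xy = sum(a * b for a, b in zip(x, y)) / n - mean_x * mean_y
    s_xx = sum(a * a for a in x) / n - mean_x ** 2
    slope = s_xy / s_xx
    # The line passes through the point (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = regression_line(levels, trees)
print(f"slope = {slope:.4f}, intercept = {intercept:.4f}")
```

The slope and intercept printed by this sketch can be compared with the trend line equation shown on the scatter plot.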

The correlation between the two values can also be found. Pearson’s correlation coefficient formula is used to find this value. If r = 1, then it is said that the x and y values are perfectly correlated. If r = 0, then x and y are not correlated. If r = -1, then x and y are perfectly negatively correlated. By calculating the correlation coefficient, the degree of linearity between X and Y can be determined.

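Pearson’s Correlation Coefficient Formula:

The formula for finding the correlation coefficient is

$$r = \frac{S_{xy}}{S_x\,S_y}$$

where $S_x$ and $S_y$ are the standard deviations of X and Y, with $\bar{y} = 31.08$ and $\bar{x} = 18.04$.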


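Most of the values have already been determined while finding the linear regression line equation. The remaining value is the variance of y:

$$S_y^2 = \frac{\sum y^2}{n} - \bar{y}^2 = 1411.96 - (31.08)^2 = 445.9936$$

To find the correlation coefficient, r:

$$r = \frac{S_{xy}}{\sqrt{S_x^2\,S_y^2}} = \frac{106.1568}{\sqrt{(51.0784)(445.9936)}}$$

r = 0.70334

r² = 0.49468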
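The correlation coefficient can also be checked numerically from the raw data. This is a minimal sketch in plain Python, assuming the same "mean of products minus product of means" form for the covariance and variances; the function name `pearson_r` is hypothetical.

```python
import math

# Raw data from Figure 1: Farmville level and number of trees for the 25 players.
levels = [7, 8, 9, 9, 10, 13, 13, 13, 15, 15, 16, 16, 16, 18, 19,
          20, 22, 22, 23, 23, 25, 26, 28, 31, 34]
trees = [7, 9, 9, 16, 11, 17, 23, 45, 19, 20, 33, 35, 16, 21, 18,
         20, 79, 35, 41, 28, 62, 35, 44, 94, 40]


def pearson_r(x, y):
    """Pearson's correlation coefficient: covariance over the product of standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum(a * b for a, b in zip(x, y)) / n - mean_x * mean_y
    s_xx = sum(a * a for a in x) / n - mean_x ** 2  # variance of x
    s_yy = sum(b * b for b in y) / n - mean_y ** 2  # variance of y
    return s_xy / math.sqrt(s_xx * s_yy)


r = pearson_r(levels, trees)
print(f"r = {r:.5f}, r squared = {r * r:.5f}")
```

Because r depends only on the ratio of the covariance to the standard deviations, dividing by n or by n - 1 throughout would give the same value.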

The correlation value can be rounded to 0.703. It can be stated that there is a moderate, positive correlation between x and y. The positive r value means that as the level of a Farmville player (x) increases, so does the number of trees (y). The graph also shows this positive relationship. However, it should be noted that some data points, such as (22, 79) and (31, 94), do not cluster as closely to the trend line as the others. These points are considered outliers. They might appear as a result of the freedom every player has to purchase a wide variety of items other than trees (animals, seeds, decorations, buildings, etc.). Not all players have the same desire to purchase trees.

Parallel boxplots can be used to display some of the descriptive statistics of the number of trees for two groups of players. They present a visual comparison of the distribution of the data as well as the descriptive statistics: the median, range, interquartile range, minimum, and maximum. The spread of data for the number of trees owned by the lowest-ranking players surveyed (levels 7-15) will be compared to that of the highest-ranking players from the group of 25 players (levels 16-34). It is predicted that the lower-level players will have fewer trees while higher-level players will have a greater number of trees, but there may be some overlapping data.

Figure 4: Number of Trees for Levels 7-15 and 16-34

Statistic     Levels 7-15   Levels 16-34
Minimum       7             16
Quartile 1    9             21
Median        16.5          35
Quartile 3    20            44
Maximum       45            94



Figure 4: This table shows the five-number summaries of the number of trees for each group of levels. The data organized here will be shown in the box and whisker plot.
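The five-number summaries in Figure 4 can be reproduced from the raw data with a short script. The sketch below assumes the quartiles are found as the medians of the lower and upper halves of the sorted data (leaving out the middle value when the group size is odd); other quartile conventions can give slightly different values. The names `low_group`, `high_group`, and `five_number_summary` are hypothetical.

```python
from statistics import median

# Number of trees from Figure 1, split by level group.
low_group = [7, 9, 9, 16, 11, 17, 23, 45, 19, 20]            # levels 7-15
high_group = [33, 35, 16, 21, 18, 20, 79, 35, 41, 28,
              62, 35, 44, 94, 40]                            # levels 16-34


def five_number_summary(values):
    """Return (minimum, quartile 1, median, quartile 3, maximum)."""
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]    # values below the median position
    upper = data[-half:]   # values above the median position
    return (data[0], median(lower), median(data), median(upper), data[-1])


for label, group in (("Levels 7-15", low_group), ("Levels 16-34", high_group)):
    print(label, five_number_summary(group))
```

With this convention the interquartile range is 20 - 9 = 11 trees for levels 7-15 and 44 - 21 = 23 trees for levels 16-34.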

Figure 5: Box and Whisker Plot

[Parallel box and whisker plots of the number of trees (vertical axis, 0 to 100) for Levels 7-15 and Levels 16-34, each showing the minimum, quartile 1, median, quartile 3, and maximum.]

Figure 5: The box and whisker plot compares the spread of data for Farmville players and the number of trees they own. The middle fifty percent of the highest-ranking players in the group that was surveyed own from 21 to 44 trees, whereas the middle fifty percent of the lowest-ranking players own from 9 to 20 trees. Some beginner players, however, seem to own as many trees as the higher-level players.

By comparing the descriptive statistics for the number of trees that the highest-ranking players own with those for the lower-ranking players, it can be seen that while higher-ranking players tend to have more trees, lower-ranking players can still surpass them in number of trees owned. This can be seen on the plot, as the top twenty-five percent of the lower-level players own about as many trees as the higher-level group’s middle fifty percent. However, the higher-level group has a greater median than the lower-level group, which suggests that they own more trees than most of the beginner players.

A chi-squared test will now be performed to determine if the number of trees a player has and their level in the game are independent or dependent events. The equation for the chi-squared test is

$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$$

where $f_o$ is the observed frequency and $f_e$ is the expected frequency. Contingency tables will be constructed to show the results of the 25 surveyed players. One table displays the observed values, while another displays the expected values.

Observed values table:

               Trees 7-30   Trees >30   Total
Level 7-15     10           0           10
Level 16-34    4            11          15
Total          14           11          25

Expected values table:

               Trees 7-30   Trees >30   Total
Level 7-15     5.6          4.4         10
Level 16-34    8.4          6.6         15
Total          14           11          25

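To find the expected value (for box 7-15 x 7-30), the row total is multiplied by the column total and divided by the overall total:

$$f_e = \frac{10 \times 14}{25} = 5.6$$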

Before performing the chi-squared test, the null and alternative hypotheses are formed, the degrees of freedom are calculated, and the significance level is stated.

H0 (null hypothesis) states that game level and number of trees are independent events.
H1 (alternative hypothesis) states that the two events are not independent.

To find degrees of freedom for a 2 x 2 contingency table:

df = (r - 1)(c - 1)
df = (2 - 1)(2 - 1)
df = 1

There is 1 degree of freedom. At a 5% (0.05) significance level with df = 1, the critical value of χ² is 3.84.

Using the contingency tables, χ² is found using the equation quoted above. The table below organizes the values needed for the calculation.


Figure 6: χ² Calculation

fo     fe     fo - fe   (fo - fe)²   (fo - fe)²/fe
10     5.6    4.4       19.36        3.457142857
0      4.4    -4.4      19.36        4.4
4      8.4    -4.4      19.36        2.304761905
11     6.6    4.4       19.36        2.933333333
                        Total =      13.0952381

Figure 6: This table shows how the chi-squared value was found.

Because χ² = 13.095 is greater than the critical value of 3.84 for df = 1, we reject the null hypothesis, which states that the Farmville player’s level and number of trees are independent events.
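The steps of the test (expected frequencies from the row and column totals, the χ² sum, and the degrees of freedom) can be sketched in a few lines of Python. This is an illustrative sketch rather than the method used to build the tables: the observed counts are copied from the observed values table, the function name `chi_squared` is hypothetical, and 3.841 is the standard 5% critical value for 1 degree of freedom.

```python
# Observed counts copied from the observed values table.
observed = [[10, 0],    # levels 7-15:  trees 7-30, trees >30
            [4, 11]]    # levels 16-34: trees 7-30, trees >30


def chi_squared(table):
    """Return (statistic, degrees of freedom, expected table) for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    # Expected frequency = row total * column total / overall total.
    expected = [[r * c / total for c in col_totals] for r in row_totals]
    stat = sum((f_o - f_e) ** 2 / f_e
               for obs_row, exp_row in zip(table, expected)
               for f_o, f_e in zip(obs_row, exp_row))
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, expected


stat, df, expected = chi_squared(observed)
CRITICAL_5_PERCENT_DF1 = 3.841  # 5% critical value for df = 1
print(f"chi-squared = {stat:.4f}, df = {df}")
print("reject H0" if stat > CRITICAL_5_PERCENT_DF1 else "do not reject H0")
```

The sketch applies no continuity correction, which matches the plain sum of (fo - fe)²/fe used in the calculation table.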

According to the scatter plot and the line of linear regression, there is a positive relationship between the number of trees a Farmville player has and what level they are on in the game. By finding Pearson’s correlation coefficient, it was determined that there is a moderate correlation between the two variables. As stated before, this could be because more experienced players tend to have more “FarmCoins” to purchase trees. Lower-level players and beginners are more likely to buy smaller, cheaper plants. The boxplot also showed that higher-level players tend to own more trees, but also suggested that some lower-level players can own more trees than some higher-level players. The chi-squared test showed that the level of a Farmville player and the number of trees they own in the game are dependent events. They have a positive correlation, suggesting that as a player rises in level, they buy more trees.

There were a couple of data points that did not cluster as closely to the linear regression line as the other data points did. These data points are considered to be outliers. Each player has the freedom to use their “FarmCoins” on various accessories for their farms, such as animals, seeds, and decorations, and not all players are interested in buying the same items for their virtual farm. Some players may buy more trees than seeds or animals. To determine if these outliers skew the data significantly, a chi-squared test will be performed on the data again with the outliers removed. The table below displays the data samples without the two outliers, (22, 79) and (31, 94).


Figure 7: Data without Outliers

Farmville Level   Number of Trees
7                 7
8                 9
9                 9
9                 16
10                11
13                17
13                23
13                45
15                19
15                20
16                33
16                35
16                16
18                21
19                18
20                20
22                35
23                41
23                28
25                62
26                35
28                44
34                40

Figure 7: This data will be used to perform a second chi-squared test.

Observed values table:

               Trees 7-30   Trees >30   Total
Level 7-15     10           0           10
Level 16-34    4            9           13
Total          14           9           23

Expected values table:

               Trees 7-30   Trees >30   Total
Level 7-15     6.086957     3.913043    10
Level 16-34    7.913043     5.086957    13
Total          14           9           23

H0 (null hypothesis) states that game level and number of trees are independent events.
H1 (alternative hypothesis) states that the two events are not independent.

There is 1 degree of freedom. At a 5% (0.05) significance level with df = 1, the critical value of χ² is 3.84.

Using the contingency tables, χ² is found using the equation quoted above. The table below organizes the values needed for the calculation.

Figure 8: χ² Calculation without Outliers

fo     fe     fo - fe   (fo - fe)²   (fo - fe)²/fe
10     6.1    3.9       15.21        2.493443
0      3.9    -3.9      15.21        3.9
4      7.9    -3.9      15.21        1.925316
9      5.1    3.9       15.21        2.982353
                        Total =      11.30111

Figure 8: This table shows how the chi-squared value was found.

Because χ² = 11.301 is greater than 3.84, we reject the null hypothesis, which states that the Farmville player’s level and number of trees are independent events. This suggests that the outliers did not have a significant effect on the outcome of the processed data and did not skew the results.
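The calculation table for this second test uses expected frequencies rounded to one decimal place (6.1, 3.9, 7.9, 5.1), while the expected values table lists them to six decimals. The sketch below, with the hypothetical function name `chi_squared_stat`, checks whether that rounding affects the decision; the observed counts are copied from the observed values table for the data without outliers.

```python
# Observed counts with the two outliers removed, copied from the observed values table.
observed = [[10, 0],   # levels 7-15:  trees 7-30, trees >30
            [4, 9]]    # levels 16-34: trees 7-30, trees >30


def chi_squared_stat(table, decimals=None):
    """Chi-squared statistic; optionally round each expected frequency first."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, f_o in enumerate(row):
            f_e = row_totals[i] * col_totals[j] / total
            if decimals is not None:
                f_e = round(f_e, decimals)
            stat += (f_o - f_e) ** 2 / f_e
    return stat


rounded = chi_squared_stat(observed, decimals=1)  # as in the calculation table
unrounded = chi_squared_stat(observed)            # full-precision expected values
CRITICAL_VALUE = 3.84                             # 5% level, df = 1
print(f"rounded: {rounded:.4f}, unrounded: {unrounded:.4f}")
print("both exceed the critical value:", min(rounded, unrounded) > CRITICAL_VALUE)
```

Under these assumptions the two versions differ only in the second decimal place, so the rounding does not change the outcome of the test.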