The Effects of Economic Status on International Soccer Success
BY
BAILEY MORTON
THESIS
This thesis is submitted to the faculty of the Economics Department
of the University of Florida in partial fulfillment of the requirements
for the Bachelor of Arts degree
Gainesville, Florida
Approved by:
______________________
Dr. Michelle Phillips
Thesis Advisor
I. Introduction
European and South American countries have historically dominated the international
soccer landscape, and this continues to be the case in the 21st century. Only European and
South American countries hold World Cup titles, with Europe holding the most titles of any
continent, 12, and having won each of the last four tournaments (World of
Soccer). Per the ELO ratings for world soccer, most countries in the top 50 are
South American or European. Typically, these European countries are more developed and thus
wealthier. It is usually reasonable to conclude that richer and
more developed countries can provide better resources and facilities for their citizens. If this is
the case, then it would seem reasonable to assume that the greater a country’s economic strength,
the better it would perform on the international stage, since richer countries have
access to more facilities, coaches, and technologies to enhance player performance. Yet, in
the 2018 ELO ratings, the U.S., one of the richest countries in the world, sits only at
25th, with developing countries such as Senegal only 6 places behind it. Furthermore,
many of the South American powerhouses are classified as developing countries and are
relatively poorer, but have consistently ranked far above the U.S.
Thus, this paper attempts to explore the effects of real GDP per capita and other
economic factors such as income inequality, per capita education spending, urbanization,
popularity, and corruption on a country’s success on the international soccer stage.
II. Sample
The sample for this study consists of those countries, out of the 238
countries present in the ELO ratings system, that have played at least 100 matches, observed over
the years 2002-2016, resulting in 2658 observations. More specifically, we use yearly
observations from that time frame. Thus, the number of teams that
had played 100 matches was larger in some years than in others. This includes all countries that
have participated in both friendly and competitive matches played under major soccer
confederations. The reasons for selecting countries on this basis are that “ratings
tend to converge to a team’s true strength…after about 30 matches” (ELO) and that having
played 100 matches usually indicates that the team has existed long enough to play in a greater
number of competitive matches. These matches can include World Cup qualifiers or regional
championships like the Copa America, the African Cup of Nations, or the Gold Cup in North America.
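The selection rule described above can be sketched as follows. This is only a minimal illustration: the record layout (country, year, cumulative matches played), the function name select_sample, and the toy figures are assumptions made for the example and are not the data used in this study.

```python
MIN_MATCHES = 100          # threshold stated in the text
YEARS = range(2002, 2017)  # 2002-2016 inclusive


def select_sample(records, min_matches=MIN_MATCHES, years=YEARS):
    """Keep country-year observations whose team has played enough matches.

    `records` is a hypothetical iterable of (country, year, matches_played)
    tuples, where matches_played is the cumulative total up to that year.
    """
    return [
        (country, year)
        for country, year, matches_played in records
        if year in years and matches_played >= min_matches
    ]


# Toy, made-up records purely for illustration.
toy = [
    ("A", 2002, 450),
    ("B", 2002, 98),    # below the threshold in 2002 ...
    ("B", 2003, 104),   # ... but included once it reaches 100 matches
    ("A", 2017, 600),   # outside the study window
]
sample = select_sample(toy)
```

Under this rule a team can enter the sample partway through the period, which is why the number of teams differs from year to year.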
III. Dependent Variable
The dependent variable is a country’s international soccer success,
which will be measured by the ELO rating system for the years 2002-2016. These years were
chosen to study the game during the 21st-century World Cup era, a time of a more
globalized and ever-changing world. Since much of the economic data was unavailable for 2018,
the range was reduced to 2002-2016. An ELO rating is calculated by “adding a weighting for the
kind of match, an adjustment for the home team advantage, and an adjustment for goal difference
in the match result” (ELO). Thus, this rating system is slightly more comprehensive than the
traditional FIFA ranking and reduces any bias that might occur if a team has played fewer
matches, participated in fewer major tournaments, performed better against weaker teams, etc.
Further, it displays the point totals for each team, which the FIFA rankings do not, so that
the spread between teams is more visible. For the full explanation of the ELO rating calculation
and the FIFA ranking calculation, visit https://www.eloratings.net/about and
https://www.fifa.com/fifa-world-ranking/procedure/men.html .
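To make the three adjustments concrete, the following is a rough sketch of an ELO-style update after a single match. The logistic expectation with a 400-point scale and the goal-difference multipliers reflect our reading of the description at the ELO ratings site and should be checked against that page; the function names, the match weight of 20, and the default home advantage of 0 are assumptions for illustration only.

```python
def expected_result(rating, opponent_rating, home_advantage=0.0):
    """Win expectancy from the rating difference (logistic curve, base 10)."""
    diff = rating + home_advantage - opponent_rating
    return 1.0 / (10.0 ** (-diff / 400.0) + 1.0)


def goal_difference_multiplier(goal_diff):
    """Adjustment for the margin of the result (assumed values)."""
    n = abs(goal_diff)
    if n <= 1:
        return 1.0
    if n == 2:
        return 1.5
    return (11.0 + n) / 8.0


def update_rating(rating, opponent_rating, goals_for, goals_against,
                  match_weight, home_advantage=0.0):
    """Return the new rating after one match."""
    if goals_for > goals_against:
        result = 1.0
    elif goals_for == goals_against:
        result = 0.5
    else:
        result = 0.0
    expected = expected_result(rating, opponent_rating, home_advantage)
    g = goal_difference_multiplier(goals_for - goals_against)
    return rating + match_weight * g * (result - expected)


# Two equally rated teams on neutral ground, match weight 20 (assumed):
narrow_win = update_rating(1500, 1500, 1, 0, match_weight=20)  # 1510.0
wide_win = update_rating(1500, 1500, 3, 0, match_weight=20)    # 1517.5
```

The example shows why the rating carries more information than a win-loss record: the same win is worth more when the margin is larger, and it would be worth more still in a match type with a higher weight.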
IV. Independent Variables
Real GDP Per Capita
This study will make use of each country’s yearly real GDP per capita for the years
2002-2018 from the World Bank. It seems that richer, more developed countries would perform
better on the international stage since they can fund the construction of more stadiums and
training facilities, hire better coaches, and utilize modern statistical technologies to study player
performance. Because countries with larger populations also have a larger pool of
players to draw from, using GDP per capita instead of aggregate GDP allows for a better,
more relative comparison of income. However, countries like Paraguay and Senegal, which
are typically ranked in the high 20s and low to mid 30s in the ELO rating system, are usually
ranked in the low to mid 100s in GDP ranking, as evidenced by the CIA World Fact Book and
the World Bank. There seems to be some positive correlation between ranking and GDP, in that
nearly half of the top 50 ELO-ranked countries are also among the top 50 GDP-ranked
countries. But the spread in wealth, on average, increases significantly for ELO
rankings between 50 and 100. For example, developed countries like Canada and Slovenia have
ranked below developing countries like Honduras and Iraq. Thus, it appears that higher GDP
does not necessarily guarantee international soccer success.
Income Inequality
The level of income inequality in a country can vary significantly among developing
and developed countries. The CIA World Fact Book reports Gini
coefficients on a scale that ranges from 0 to 100, with 0 referring to perfect equality and 100
referring to perfect inequality. By this measure, a developed country like Hong Kong has one of the highest
levels of inequality, at 53.9, which places Hong Kong 9th on the list. Similarly, the U.S. and
Singapore, two other developed countries, are indexed at 45.9 and 45.0, placing them in the top 40
for income inequality. While most developed countries are clustered at the bottom of the list and the
least developed at the top, there are several countries from each level of development that stray
from their respective trends. However, since Gini coefficients are not available for every year from
2002-2016, we will use, as a proxy for Gini coefficients, the share of income held by
the top 1% in each country to measure income inequality. This will be measured as a percentage
of the total income in a country. This data will be provided by the Top Income Index
database. Furthermore, if we look at the top ELO-ranked countries, those that are European and
South American, we can see that many South American countries like Brazil, Chile,
and Colombia are within the 20 highest levels of income inequality, while strong European
teams like France, Germany, and Croatia show up in the bottom 20. The fact that most
developed, high ELO-ranked European countries have tighter inequality gaps allows more people
at all income levels to invest time in soccer. Then, since Europe boasts the strongest club leagues
and thus is home to a pool of powerful investors, its countries have the money to
support stadium construction, focus on player development, and grow and secure local talent.
On the other hand, in South America, where income inequality is greater, it may be the case that
these countries have equally strong access to talent because opportunities for good
employment are slimmer and playing soccer can become the only alternative or path to financial
success and sustainability. Therefore, there is a potential for a greater number of high-level
players who wish to escape poverty and a lack of opportunities.
Education Spending Per Capita
It is worth considering the level of a country’s education spending, as it
reflects a country’s commitment to the growth of its citizens’ abilities, skills, and contributions to
the economy. Typically, developed countries invest more in
education because they can afford to better fund their students to keep their economies more
productive, so that education further develops a country and vice versa. It is also possible that
countries with better public-school funding can boast stronger soccer programs. Thus, this
funding could give players a better opportunity to develop and be exposed to the sport
at a young age. Further, they might then have an easier time being scouted at the professional level,
if they can make a name for themselves within local school leagues. Thus, this study will make
use of the percent of government expenditures on education spending for the years 2002-2018.
These percentages will be multiplied by the real GDP per capita amounts and divided by the
population to convert this metric into per capita terms. A brief insight can be gained by studying a
small but notable sample of countries. If we consider the OECD countries, then we can see that
the more developed, richer countries spend more on primary to non-secondary education per
capita. This includes highly ELO-rated countries such as the dominant European countries
France, Belgium, Germany, and the Netherlands, as well as the United States, South Korea, etc. (OECD).
However, on average, South American powerhouses with high ELO ratings, like Argentina,
Colombia, Brazil, and Chile, rank towards the bottom of this group (OECD).
Urbanization
The level of urbanization of a country, i.e. the share of people living in
urbanized areas within a country, is indicative of the developmental stage of a country. For this
study, the yearly percentage of people living in urban areas, as estimated by the World Bank,
for the years 2002-2018, will be analyzed. The inclusion of this variable is related to the
far-reaching effects urbanization has on a country. Historically, urban areas have allowed for better
access to health care, nutrition, goods and services, jobs, as well as facilities for entertainment
(Lore Central). Thus, it is of interest for this study to see whether rising levels of urbanization, which
traditionally lead to greater access to health care and nutrition, employment, and facilities for sports
teams, allow countries to better develop players for international success.
Popularity
An important aspect of success is the amount of media coverage
soccer receives in a country, as well as its relative popularity, in that “popularity of a sport
depends on its broader significance within a nation's culture” (Hoffman et al.). It is reasonable to
assume that this significance to a nation’s culture can arise from the accessibility and
awareness of the sport. Further, it appears this popularity can be a result of intense media
coverage of a sport relative to other sports, news, and ideas. An example can be seen in the U.S.,
where the popularity of soccer pales in comparison to that of sports such as football and
basketball, due to intense coverage of those sports by networks such as ESPN, in addition
to the historical popularity of those sports in American culture. Thus, it is possible that an
oversaturation of media content relating to soccer can influence more people to pursue
professional careers in the sport, due to its significance in their country. To measure this concept
of popularity, the Google Trends Index for searches of “FIFA”, a language-neutral word, from
2004-2016, will be used to indicate the number of people who have potentially been exposed
to soccer-related media such as matches or internet highlights, since FIFA owns most of these
videos, photos, etc. This analysis will be included in the appendix.
Corruption
The level of corruption in a country is indicative of that country’s ability to maintain the rule of
law, as well as to properly fund and maintain infrastructure, the educational system, and the
economy as a whole. Corruption increases the difficulty of performing transactions, uncertainty regarding
employment, the inefficient allocation of resources, etc. (Investopedia). In the case of international
soccer, the sport’s main governing body, FIFA, has faced multiple allegations of bribery and
corruption. In 2015, multiple high-ranking FIFA executives were arrested and banned from
soccer on charges relating to “money laundering, racketeering, wire fraud” (BBC).
Additionally, there is speculation that these same officials have been involved in bribery scandals
regarding the selection of host countries for the World Cup. Thus, it is reasonable to examine whether
corruption may have played a part in allowing certain countries to play easier matches and have
better chances of advancing in major tournaments. Further, it is possible that corruption may
have weakened a country’s ability to properly regulate its national team and compete at the
international level. Another facet of corruption is that an inefficient use of funds may have
prevented certain national teams from either forming or growing earlier, because there was a lack
of access to stadiums, better training, and nutrition. This study will make use of the corruption
index for the years 2002-2018 created by Transparency International, where from 2002 to 2009,
a score of 10 translates to “very clean” and 0 translates to “very corrupt”, and from 2009 onward,
100 translates to “very clean” and 0 to “very corrupt”. Thus, to standardize the values, the index
will be scaled by a factor of one-tenth for the years 2009 onward, to allow comparison with
the earlier years of the index. As with the Gini coefficient rankings, on average, many European,
highly ELO-rated countries populate the top 40 spots, whereas many highly rated South
American teams start to show up around the 100-rank mark. The corruption level is significantly
different, with these South American countries sitting 25-40 points (2.5 to 4 points on the rescaled index) below the
ELO-comparable European teams.
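The standardization step can be illustrated with a short sketch. The function name standardize_cpi and its parameters are hypothetical; the cutoff year of 2009 and the factor of one-tenth simply follow the description in the text and are exposed as parameters so that they can be changed if the scale change is dated differently.

```python
def standardize_cpi(score, year, rescale_from=2009, factor=0.1):
    """Put a corruption index score on the 0-10 scale.

    Scores from `rescale_from` onward are assumed to be reported on a
    0-100 scale and are multiplied by `factor`; earlier scores are
    assumed to be on the 0-10 scale already and are returned unchanged.
    """
    if year >= rescale_from:
        return score * factor
    return float(score)


# Illustrative values only: a 0-10 score from 2005 and a 0-100 score from 2014.
early = standardize_cpi(7.5, 2005)   # stays 7.5
late = standardize_cpi(75, 2014)     # becomes 7.5
```

On the rescaled index, a gap of 25-40 points between two countries corresponds to a gap of 2.5 to 4 points.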
V. Summary Statistics
The summary statistics for the variables used in this study are provided below. As mentioned
previously, the popularity and the popularity-income inequality variables will be considered for
the final paper, and thus do not appear in this table.
Table 5.1: Summary Statistics
Variable                     Mean      SD         Min     Max        Observations
ELO Rating                   1460      285.2716   603     2150       2658
Real GDP Per Capita          14979.1   21825.64   111.4   179308.1   2100
Urbanization                 57.423    23.66352   9.864   100        2144
Education Completion Rates   88.41     18.1       20.46   124.11     1354
Corruption Level             43.37     21.58007   8       97         2065
Income Inequality            34.33     12.41592   13.96   69.99      612
VI. Regression Results
A series of single-variable regressions was run to determine the individual relationships
between ELO rating and the independent variables, in addition to a series of multiple regressions.
Through exploration of functional forms, it was discovered that taking the log of some of the
independent variables provided useful information regarding their relationship with ELO score.
Thus, the linear-log model will be compared with the standard linear model for some variables.
These single-variable regressions and functional forms will be discussed in the appendix. The
linear-log model will be briefly mentioned in the regression results section, to highlight an
interesting multiple regression that utilizes the linear-log model. To account for the lack of
observations for the Top Income Share variable, a second regression was run, excluding the Top
Income Share variable.
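The comparison between the linear and linear-log functional forms can be sketched with a small single-variable example. The helper fit_simple_ols and the data below are assumptions for illustration: the data are made up so that the outcome depends on the natural log of the regressor, and they are not the data used in this study.

```python
import math


def fit_simple_ols(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b, r_squared)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return intercept, slope, 1.0 - ss_res / ss_tot


# Made-up data generated from y = 1000 + 50 * ln(x), so that the
# linear-log form fits exactly; these are not the thesis data.
x = [500.0, 2000.0, 8000.0, 32000.0, 64000.0]
y = [1000.0 + 50.0 * math.log(xi) for xi in x]

linear = fit_simple_ols(x, y)                               # y on x
linear_log = fit_simple_ols([math.log(xi) for xi in x], y)  # y on ln(x)
```

When the outcome rises quickly at low values of the regressor and flattens at high values, the linear-log form recovers the relationship while the linear form explains less of the variation, which is the pattern that motivates the comparison.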
Years 2002-2016    Score (Dependent)                            Score (Dependent)
Predictors         Estimates   CI                      p        Estimates   CI                      p
(Intercept)        1460.27     1354.14 – 1566.41       <0.001   1139.04     1127.60 – 1276.77       <0.001
GDP                -0.0002151  -0.001771 – 0.001341    0.786    -0.000248   -0.003509 – 0.001347    <0.001
Education          0.04        0.02 – 0.06             <0.001   0.06        0.05 – 0.08             <0.001
CPI                0.57        -0.92 – 2.06            0.454    0.26        -0.67 – 1.19            0.581
Top Income Share   -1.37       -2.39 – -0.36           0.008
Urbanization       2.26        0.75 – 3.58             0.003    5.98        5.36 – 6.60             <0.001
Observations       555                                          1847
R2 / adjusted R2   0.120 / 0.112                                0.274 / 0.272
1. Real GDP Per Capita (GDP)
One of the main goals of this study was to investigate the intuitive notion that a richer
country should be able to perform better on the international stage because of the greater
funding a country would be able to provide for a team. However, many rich European
countries consistently rank just as highly as poorer South American countries. Thus, our
expectations regarding the effect of GDP on ELO score are not clear, but we could expect
that to an extent an increase in GDP will impact ELO score, until certain other economic
factors overtake the effect of GDP. Per the initial linear regression, GDP negatively
affects ELO score, in that a one-dollar increase in real GDP per capita results in a
decrease of 0.0002151 points in ELO score, on average. This is a noticeably small
impact and was in fact not statistically significant, given that the p-value for real GDP per
capita was 0.786, which is quite high. Further, this does not align with our intuition that
real GDP per capita and ELO score are positively correlated. After dropping the Top
Income Share from the model, real GDP per capita was found to have a p-value <0.001,
so that real GDP per capita has a statistically significant effect on ELO score at all the
standard significance levels, although it still had a negative coefficient
in this model. Since the 95% confidence intervals for GDP in these multiple
regressions contained 0, we do not find it worthwhile to interpret the direction of its effect
on ELO score. Nevertheless, it is still likely that richer countries can provide more
facilities, hire better coaches, and utilize other related technological resources to
strengthen their team’s performance. Moreover, it may be that GDP is not a
strong predictor of success because it acts more as a necessary condition for success to be
feasible. That is, if a country lacked the ability to provide these
resources, its national team would not have what it needs to compete at a
high level.
2. Education Completion Rates (Education)
From the initial regression output, an increase in education spending per
capita had a positive effect on ELO score, with a coefficient of 0.04, which is consistent
with our intuition. Thus, a 1-dollar increase in education spending per capita resulted in a
0.04 increase in a country’s ELO score, on average. The p-value was found to be <0.001,
so that education spending per capita has a statistically significant effect on ELO score at
all standard significance levels. In the reduced model, the education spending per capita
variable had a similarly positive effect on ELO score, with a coefficient of 0.06 and a
p-value <0.001. Thus, we can conclude that education spending per capita and ELO score
are positively correlated. A potential interpretation of this result is that as the
government spends more on each student, there is a better chance that more students will
have early exposure to soccer through school teams and programs, allowing the country to
have a stronger pool of players to draw from. This is a reasonable conclusion because we
can expect that a greater investment in students can allow for better access to the sport.
Additionally, we can interpret this positive correlation as follows: when a country is more invested in
the growth of its citizens, it can have a larger pool of motivated and passionate players
who wish to represent their countries on the international soccer stage (potential interpretation
for the popularity variable).
3. Urbanization Rate (Urbanization)
As with real GDP per capita, it seems that a greater level of urbanization implies that more
people have better access to healthcare, jobs, and entertainment (like sports
teams and facilities). Considering the initial linear model, the coefficient for urbanization is
positive at 2.26, which is consistent with our intuition. This implies that a 1
percentage point increase in the Urbanization rate will increase a country’s ELO score by
2.26 points, on average. Further, since the p-value is 0.003, we can conclude that the
urbanization rate has a statistically significant positive effect on ELO score at all the
standard significance levels. When the top income share variable was dropped, the
urbanization rate variable had a greater positive effect on ELO score, with a coefficient of
5.98. In this model, the p-value was <0.001, so it was even more statistically significant
than in the previous model. Clearly, we can see that there is a positive correlation
between the Urbanization rate and ELO score. This is a reasonable conclusion in that as
more people move to cities, there is a greater chance that a country’s soccer team
performs better, potentially due to better access to resources; i.e., if more people live in
cities, then more people have better access to soccer fields and local teams
(more people to play with and more places to play) and can spend more
time playing soccer. Further, since these people have better access to jobs and healthcare,
it is likely that they are less concerned with finding employment and worrying about their
health and, again, have a greater chance to invest time in playing soccer. Therefore, a
team will have a larger pool of talented players to draw from.
4. Corruption (CPI)
In the case of corruption, we expected that a less corrupt country would be better able to allocate
resources and organize funding to both create and maintain a national soccer team.
However, we have seen that top-performing European and South American teams have
ranked at the top and bottom of this index, respectively, leading us to an interesting
investigation. Now, considering the initial regression results, we see that CPI has a
positive coefficient of 0.57, which is consistent with our intuition. This implies that a
1-point increase in the corruption index (that is, a country becoming “cleaner”) will
increase a country’s ELO score by 0.57 points, on average. However, the p-value for
CPI is 0.454, which implies that CPI does not have a statistically significant effect
on ELO score. In the reduced regression model, the result was similar: the CPI
coefficient was found to be 0.26, with a similarly high p-value of 0.581. Therefore, we
can conclude that there is a positive correlation between ELO score and CPI, but the
effect on ELO score is not statistically significant. Thus, we choose not to interpret the
direction of CPI’s effect on ELO score, as the 95% confidence intervals for these
estimated regression coefficients contain 0.
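The link between a confidence interval that contains 0 and a high p-value can be checked with a short calculation. The sketch below assumes a normal approximation, in which the 95% interval is the estimate plus or minus 1.96 standard errors; the function names are hypothetical, and the inputs are the CPI estimate and interval reported for the initial model.

```python
import math


def p_value_from_ci(estimate, ci_low, ci_high, z_crit=1.96):
    """Approximate two-sided p-value implied by a 95% confidence interval.

    Uses a normal approximation: the interval is taken to be
    estimate +/- z_crit * standard_error.
    """
    std_err = (ci_high - ci_low) / (2.0 * z_crit)
    z = abs(estimate) / std_err
    return math.erfc(z / math.sqrt(2.0))


def ci_contains_zero(ci_low, ci_high):
    """True when 0 lies inside the interval."""
    return ci_low <= 0.0 <= ci_high


# CPI in the initial model: estimate 0.57, CI -0.92 to 2.06.
p_cpi = p_value_from_ci(0.57, -0.92, 2.06)
cpi_zero_inside = ci_contains_zero(-0.92, 2.06)
```

The implied p-value is approximately 0.45, in line with the reported 0.454, and the interval contains 0; a coefficient whose 95% interval excludes 0 would instead give an implied p-value below 0.05.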
5. Income Inequality (Top Income Share)
We were a bit uncertain about the effect of income inequality on ELO score, given that
countries with “very clean” and “very corrupt” corruption ratings both consistently had
high ELO scores. We expected some sort of positive correlation between ELO score
and top income share. This is similar to the initial observations for real GDP per capita and the
corruption index ratings. An issue with this variable is the lack of available information,
so there is a small number of observations. Thus, we chose to include an alternate
regression model. From the initial regression model, we can see that the coefficient for
top income share is -1.37, a negative value. This is not entirely consistent with our
intuition. The p-value for Top Income Share was 0.008, which is statistically significant
at all the standard significance levels. Thus, we can conclude that there is a negative
correlation between ELO score and top income share. Therefore, as the top 1% in each
country increase their share of income, the country will perform worse on the
international soccer stage. Though this did not line up with our intuition, there may be some
sense to this finding. As the income in a country becomes more and more concentrated
in the top 1%, fewer people have the time to seriously pursue soccer, because they are
spending more time working and trying to make up for a large income gap. However,
further analysis needs to be done to better understand the effects of income concentration
and its interaction with the popularity of soccer, as we are interested in the possibility that
a large income gap might make soccer a popular avenue for financial stability.
Still, this result may be due to a bias in information availability and should be interpreted
cautiously.
VII. Conclusion
Using the reduced model, we saw that GDP, CPI, Urbanization Rate, and Education Spending
Per Capita could explain 27.2% of the variation in ELO score (by adjusted R2), compared to the 11.2% that the
initial regression model could explain. This indicates that it was useful to drop the top
income share variable and that a better variable is needed to represent income inequality, as we still
expect there to be an important relationship between ELO score and income inequality.
Extensive further analysis is needed to account for the remaining 73% of variation in ELO scores
that was left unexplained. Given that many abstract forces and factors affect success in
general, it is particularly difficult to attribute a national soccer team’s success simply to a
handful of economic growth-related variables. But we still expected that the general economic
make-up of a country would explain the ability of a country to lay the groundwork for a team’s
success, since having a functioning and well-performing economy allows stadiums to be built,
coaches to be hired, and players to be recruited. It is possible that popularity, which will be
explored in the appendix, may account for a country having access to a larger pool of
more passionate and motivated players. Other variables of interest may include those
related to climate, mental health, diet, etc. Additionally, it may be worthwhile to consider the
effect of population more directly, rather than looking at it through per capita measures, to see if
having more people gives teams access to a larger talent pool. There is much left to be
explored in future research, given the complexity of success. Given that these variables measure
economic growth over time, it may be more beneficial to focus on the percentage change in
each of these metrics to capture change over time.
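The adjusted R2 figures quoted in this section can be reproduced from the R2 values, observation counts, and numbers of predictors reported in the regression table. The sketch below uses the standard adjustment formula; the function name is our own, and the predictor counts (five in the initial model, four in the reduced model) are read off the table.

```python
def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    return 1.0 - (1.0 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)


initial = adjusted_r_squared(0.120, 555, 5)   # includes Top Income Share
reduced = adjusted_r_squared(0.274, 1847, 4)  # Top Income Share dropped
```

Rounded to three decimals these give 0.112 and 0.272. Note that the two models are estimated on different numbers of observations (555 and 1847), so the two figures describe different samples.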
VIII. Appendix
Real GDP Per Capita
If we examine the plot below, we can see that there isn’t a very strong linear relationship
between real GDP per capita and ELO rating. If anything, it is difficult to surmise any
relationship between these variables given the large clustering of data points.
However, upon examining functional forms, the relationship between log(GDP) and ELO rating
seemed to be more promising, appearing to be somewhat positively linear, as evidenced by the
plot below.
Though this relationship is still not strongly linear, it seemed worth considering. Thus, two
regressions were run.
Years 2002-2016    Score (Dependent)                            Score (Dependent)
Predictors         Estimates   CI                      p        Estimates   CI                      p
(Intercept)        1435.41     1420.86 – 1449.97       <0.001   949.14      887.56 – 1010.72        <0.001
GDP                0.002324    0.001771 – 0.002876     <0.001
log(GDP)                                                        61.14       54.04 – 68.25           <0.001
Observations       2101                                         2101
R2 / adjusted R2   0.031 / 0.031                                0.119 / 0.119
In the linear regression using GDP, as evidenced by the regression table, the coefficient for
GDP was 0.002324, the presumed positive effect, though a small one. Thus, a 1-dollar increase
in GDP per capita resulted in an increase of 0.002324 in a country’s ELO score, on average.
Further, the p-value for GDP in this model turned out to be <0.001, which is statistically
significant at all the standard significance levels, implying that its effect on ELO score is
significant. Since GDP only explains about 3% of the variation in ELO score, this refutes the
idea that GDP is a strong indicator of variation in ELO score, which we believe is assumed by
most people. In terms of statistical significance, the linear-log regression, using log(GDP), was
consistent with the linear regression with just GDP. The p-value was <0.001, which is
statistically significant at all the typical significance levels, thus indicating that log(GDP) also
has a significant effect on ELO score. The coefficient in this model was positive, 61.14, which is
much larger than the coefficient in the linear regression, although the two are not measured in the same units.
Because GDP enters this model in logs, the coefficient implies that a 1%
increase in GDP per capita increased a country’s ELO score by roughly 0.61 points (61.14/100), on average. Since
log(GDP) explained nearly 12% of the variation on its own, it may be more appropriate to consider the
linear-log model for this investigation.
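The interpretation of a linear-log coefficient can be worked through numerically. The sketch below assumes that the model has the form ELO = a + b * ln(GDP), with the natural logarithm; the function name is hypothetical, and 61.14 is the log(GDP) estimate from the table above.

```python
import math


def elo_change_linear_log(beta, pct_increase):
    """Change in predicted ELO when x rises by pct_increase percent,
    for a model of the form ELO = a + beta * ln(x)."""
    return beta * math.log(1.0 + pct_increase / 100.0)


one_percent = elo_change_linear_log(61.14, 1.0)   # close to beta / 100
doubling = elo_change_linear_log(61.14, 100.0)    # beta * ln(2)
```

A 1% increase in GDP per capita corresponds to about 0.61 ELO points, which is approximately the coefficient divided by 100, while a doubling of GDP per capita corresponds to about 42 points.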
Education Spending Per Capita
In studying the education variable, it was difficult to visually determine the relationship between
education spending per capita and ELO score, given the same clustering issue that was seen
in the plot for real GDP per capita and ELO score. Again, taking log(Education) showed a
clearer relationship between the variables, similar to that between log(GDP) and ELO
score.
From the regression output, an increase in education spending per capita had a
positive effect on ELO score, with a coefficient of 0.09, which is consistent with our intuition.
Thus, a 1-dollar increase in education spending per capita resulted in a 0.09 increase in a
country’s ELO score, on average. The p-value was found to be <0.001, so that education
spending per capita has a statistically significant effect on ELO score. However, like real GDP
per capita, education spending per capita only explained a small amount of the variation in ELO
score, 9.6%.
Years 2002-2016    Score (Dependent)
Predictors         Estimates   CI                      p
(Intercept)        1426.34     1413.44 – 1439.25       <0.001
Education          0.09        0.08 – 0.10             <0.001
Observations       2103
R2 / adjusted R2   0.096 / 0.096
Urbanization
In this case, the relationships of log(Urbanization) and Urbanization with ELO score were
similar.
In fact, there seems to be a clearer positive correlation between these variables and ELO score.
Considering the regression output for the linear model, the coefficient for Urbanization is
positive at 5.29, which is consistent with our intuition. This implies that a 1 percentage
point increase in the Urbanization rate will increase a country’s ELO score by 5.29 points, on
average. Further, since the p-value is <0.001, we can conclude that the Urbanization rate has a
statistically significant positive effect on ELO score. It is worth noting that this is a much larger
positive effect than the previous effects for real GDP and education spending per capita, which
might explain why the Urbanization rate could explain 19.1% of the variation in ELO score.
Years 2002-2016    Score (Dependent)                            Score (Dependent)
Predictors         Estimates   CI                      p        Estimates   CI                      p
(Intercept)        1163.65     1135.05 – 1192.26       <0.001   428.63      344.08 – 513.19         <0.001
Urbanization       5.29        4.83 – 5.75             <0.001
log(Urbanization)                                               263.55      242.27 – 284.83         <0.001
Observations       2145                                         2145
R2 / adjusted R2   0.191 / 0.191                                0.216 / 0.215
Though this still indicates that the Urbanization rate is not a strong predictor of variation in ELO
score, its explanatory power is comparatively larger than that of the previously examined
variables. Similarly, the coefficient for the Urbanization rate in the linear-log model was found to
be positive at 263.55, which is quite strong. Since the Urbanization rate is already a percentage,
this effect may be exaggerated. Further, the p-value was <0.001, indicating that this effect on
ELO score was indeed significant.
Corruption
The relationship of CPI and log(CPI) with ELO score is unclear visually, but the
log(CPI) plot appears to possibly show a bit of positive correlation with ELO score.
Years 2002-2016     Score(Dependent)                           Score(Dependent)
Predictors          Estimates   CI                     p       Estimates   CI                     p
(Intercept)         1313.35     1287.14 – 1339.55      <0.001  786.38      698.22 – 874.53        <0.001
CPI                 4.19        3.65 – 4.73            <0.001
log(CPI)                                                       194.04      170.11 – 217.98        <0.001
Observations        2066                                       2066
R2 / adjusted R2    0.100 / 0.100                              0.109 / 0.109
Considering the initial regression results, we see that CPI has a positive coefficient of 4.19,
which is consistent with our intuition. This implies that a 1-point increase in the corruption
index (which corresponds to a country becoming “cleaner”) is associated with a 4.19-point increase
in a country’s ELO score, on average. Furthermore, the p-value for CPI is <0.001, which implies
that CPI has a statistically significant effect on ELO score. CPI explains only 10% of the
variation, however, so it is not a strong predictor of ELO score. Indeed, the plots for the linear
and linear-log models show many highly corrupt countries that still have high ELO scores. This
suggests that CPI affects ELO score, but only to an extent. The linear-log model yielded similar
results, with a p-value <0.001 indicating a statistically significant effect on ELO score. It
explained around the same amount of variation in ELO score (10.9%), so either functional form is
satisfactory. In the linear-log model, a 1% increase in CPI score is associated with an increase of
about 1.94 points in a country’s ELO score, on average.
Income Inequality
It is difficult to assess the relationship between ELO score and Top Income Share, since there are
a large number of 0’s in the data and the observations cluster around middle-to-high ELO scores
and middle-to-high Top Income Share percentages.
From the regression output, we can see that the coefficient for Top Income Share is -1.7821, a
negative value. This is not entirely consistent with our intuition. It implies that a 1
percentage point increase in the top 1%’s share of income is associated with a decrease of 1.7821
points in a country’s ELO score, on average. The p-value for Top Income Share was <0.001, which is
statistically significant at all the standard significance levels. However, per the R^2 value of
0.024, Top Income Share is a very weak predictor of ELO score.
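The adjusted R^2 reported alongside R^2 in the regression tables penalizes R^2 for the number of predictors. The sketch below applies the standard formula, assuming the tables use the usual definition. The function name `adjusted_r2` is illustrative, and because the inputs are the rounded R^2 values from the tables, the results are approximate.

```python
def adjusted_r2(r2: float, n_obs: int, n_predictors: int) -> float:
    """Standard adjusted R^2: 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Top Income Share regression: R^2 = 0.024, 612 observations, 1 predictor
top_income = adjusted_r2(0.024, 612, 1)

# Four-predictor model (GDP, Education, CPI, log(Urbanization)):
# R^2 = 0.294, 1847 observations
multiple = adjusted_r2(0.294, 1847, 4)

print(round(top_income, 3), round(multiple, 3))
```

With these inputs the formula gives values that round to the 0.022 and 0.292 shown in the tables. With hundreds of observations and only a few predictors, the adjustment is small, which is why the two figures are nearly identical throughout.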
Years 2002-2016 Score(Dependent)
Predictors Estimates CI p
(Intercept) 1714.30 1677.93 – 1750.66 <0.001
Top Income Share -1.78 -2.68 – -0.88 <0.001
Observations 612
R2 / adjusted R2 0.024 / 0.022
Multiple Log Table
Years 2002-2016     Score(Dependent)                           Score(Dependent)
Predictors          Estimates   CI                     p       Estimates   CI                     p
(Intercept)         367.59      269.24 – 465.94        <0.001  446.52      326.10 – 566.94        <0.001
GDP                 -0.001943   -0.002996 – -0.00089   <0.001
Education           0.06        0.05 – 0.08            <0.001  0.04        0.03 – 0.05            <0.001
CPI                 0.43        -0.48 – 1.33           0.354
log(Urbanization)   279.52      252.72 – 306.32        <0.001  251.29      216.33 – 286.25        <0.001
log(GDP)                                                       7.46        -8.00 – 22.92          0.344
log(CPI)                                                       -7.73       -46.88 – 31.42         0.699
Observations        1847                                       1847
R2 / adjusted R2    0.294 / 0.292                              0.289 / 0.287
Miscellaneous Regressions
Years 2002-2016     Score(Dependent)                           Score(Dependent)
Predictors          Estimates   CI                     p       Estimates   CI                     p
(Intercept)         1153.03     1072.36 – 1233.70      <0.001  1184.38     1116.20 – 1252.56      <0.001
GDP                 0.002485    0.001181 – 0.003788    <0.001  -0.00       -0.001138 – 0.002259   0.190
Education           2.81        1.88 – 3.73            <0.001  -0.69       -1.60 – 0.22           0.136
CPI                 1.56        0.26 – 2.85            0.019
Urbanization                                                   6.36        5.60 – 7.11            <0.001
Observations        1222                                       1340
R2 / adjusted R2    0.133 / 0.131                              0.228 / 0.226
These regressions were included to further display some additional reduced models. It is worth
noting that the leftmost regression is the only notable (including at least 3 independent variables)
multiple regression in which GDP was found to have the predicted positive coefficient and have
a statistically significant effect on ELO score.
Years 2002-2016     Score(Dependent)                           Score(Dependent)
Predictors          Estimates   CI                     p       Estimates   CI                     p
(Intercept)         694.06      285.35 – 1102.76       0.001   1202.19     1127.60 – 1276.77      <0.001
GDP                 0.0005159   -0.001135 – 0.002167   0.539   0.0002445   -0.0009924 – 0.001481  0.698
Education           5.62        1.47 – 9.76            0.008   -1.16       -2.16 – -0.16          0.023
CPI                 3.33        1.27 – 5.39            0.002   0.19        -1.02 – 1.40           0.754
Top Income Share    -0.34       -1.63 – 0.96           0.612
Urbanization        2.64        0.77 – 4.50            0.006   6.76        5.86 – 7.65            <0.001
Observations        321                                        1222
R2 / adjusted R2    0.214 / 0.202                              0.265 / 0.263
These regressions include a newly defined version of the Education variable which is now
measured by primary completion rate, “the number of new entrants (enrollments minus
repeaters) in the last grade of primary education, regardless of age, divided by the population at
the entrance age for the last grade of primary education” (World Bank) for the years 2002-2016.
This variable is not as highly correlated with GDP as is education spending per capita, but the
signs of the variables were not consistent across the individual reduced models.
Popularity
We chose to include this variable in the appendix as it was of significant interest to our study. As
proposed earlier, the more popular soccer is in a particular country, the higher that country’s
ELO score is likely to be.
From the scatter plot above, we can see that the relationship between ELO score and popularity
is positive, but highly clustered between 0 and 20. Since this data set has meaningful
0’s, we are unable to investigate the log form of the variable, which might have given a clearer
depiction of the relationship between the two variables. Nevertheless, we perform a single linear
regression of ELO score on popularity, as seen below.
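To make the issue with zeros concrete, the short Python sketch below shows that the natural log is undefined at 0. The popularity values are hypothetical and are not the thesis data. The `log1p` transform, ln(1 + x), is shown only as one commonly used workaround. It was not used in this study, and it changes how the coefficient would be interpreted.

```python
import math
from typing import Optional


def safe_log(value: float) -> Optional[float]:
    """Return ln(value), or None when value is not strictly positive."""
    return math.log(value) if value > 0 else None


# Hypothetical popularity scores for illustration only (NOT the thesis data)
popularity = [0.0, 3.0, 12.0, 20.0]

# A plain log transform has no value for the zero observation
logged = [safe_log(v) for v in popularity]

# ln(1 + x) is defined at zero, at the cost of a different interpretation
log1p_values = [math.log1p(v) for v in popularity]

print(logged)
print(log1p_values)
```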
The single variable regression for popularity indicates that it has a statistically significant
positive effect on ELO score, with a coefficient of 10.39 and a p-value <0.001. This coefficient is
the largest of any of the non-log, single variable regressions, but popularity explains only around
13% of the variation in ELO score. Thus, this is consistent with our hypothesis that popularity
has a positive effect on ELO score.
Score(Dependent)
Predictors Estimates CI p
(Intercept) 1321.93 1303.24 – 1340.63 <0.001
Popularity 10.39 9.25 – 11.53 <0.001
Observations 2169
R2 / adjusted R2 0.129 / 0.129
We then ran a multiple regression with similar full and reduced models as seen in the previous
multiple regressions. In the full model, the only variables with statistically significant effects on
ELO score were Urbanization, Education and Popularity, with each of these variables having the
predicted positive effects. Further, Popularity had the largest coefficient, which is consistent
with the non-log, single variable regressions. We once again chose to remove Top Income Share to
increase the number of observations, which resulted in all variables except CPI having
statistically significant effects on ELO score, consistent with the previous multiple
regressions. More importantly, this regression was able to explain about 35% of the variation in
ELO score, which is roughly 10 percentage points more than when Popularity was not included in
the model.
                    Score                                      Score
Predictors          Estimates   CI                     p       Estimates   CI                     p
(Intercept)         1248.49     1136.89 – 1360.09      <0.001  1054.09     1014.99 – 1093.20      <0.001
GDP                 -0.000735   -0.00225 – 0.000785    0.343   -0.002214   -0.00328 – -0.00114    <0.001
Education           0.03        0.01 – 0.05            0.007   0.05        0.03 – 0.07            <0.001
CPI                 1.29        -0.21 – 2.79           0.092   0.33        -0.61 – 1.28           0.491
Top Income Share    0.29        -0.75 – 1.33           0.585
Urbanization        2.38        0.91 – 3.85            0.002   5.53        4.92 – 6.14            <0.001
Popularity          8.98        7.15 – 10.81           <0.001  8.24        7.12 – 9.35            <0.001
Observations        493                                        1680
R2 / adjusted R2    0.269 / 0.260                              0.351 / 0.349
The coefficients on Education, Urbanization, and Popularity were all positive, which lined up
with our intuition. However, the coefficient for GDP remained negative, which is not in line with
our intuition. This may be due to multicollinearity, so the regression was re-run using the newly
defined version of the Education variable (primary completion rate) from the miscellaneous
regression section.
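One common way to gauge multicollinearity between two predictors is the variance inflation factor (VIF), which for a pair of predictors reduces to 1 / (1 - r^2), where r is their correlation. The thesis does not report VIFs, so the sketch below is only an illustration of the idea. The data values and function names are hypothetical and are not drawn from the thesis data set.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


def vif_two_predictors(x: Sequence[float], y: Sequence[float]) -> float:
    """VIF when a model has exactly two predictors: 1 / (1 - r^2)."""
    r = pearson_r(x, y)
    return 1.0 / (1.0 - r ** 2)


# Hypothetical values for illustration only (NOT the thesis data)
gdp_per_capita = [1000.0, 5000.0, 12000.0, 30000.0, 45000.0, 60000.0]
education_spending = [40.0, 210.0, 640.0, 1500.0, 2600.0, 3100.0]

r = pearson_r(gdp_per_capita, education_spending)
vif = vif_two_predictors(gdp_per_capita, education_spending)
print(f"r = {r:.3f}, VIF = {vif:.1f}")
```

A frequently quoted rule of thumb treats a VIF above roughly 5 to 10 as a sign that the coefficient estimates may be unstable, which would fit a GDP coefficient whose sign changes across specifications.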
                    Score(Dependent)                           Score(Dependent)
Predictors          Estimates   CI                     p       Estimates   CI                     p
(Intercept)         686.97      265.86 – 1108.07       0.002   1106.80     1030.32 – 1183.28      <0.001
GDP                 -0.000531   -0.00 – 0.00           0.530   0.00002841  -0.00 – 0.00           0.964
Education           3.22        -0.94 – 7.37           0.130   -0.88       -1.88 – 0.13           0.087
CPI                 4.50        2.41 – 6.59            <0.001  0.08        -1.16 – 1.31           0.905
Top Income Share    2.20        0.79 – 3.62            0.003
Urbanization        2.26        0.39 – 4.13            0.019   6.05        5.15 – 6.94            <0.001
Popularity          9.56        7.13 – 11.99           <0.001  8.36        6.95 – 9.76            <0.001
Observations        280                                        1103
R2 / adjusted R2    0.344 / 0.330                              0.337 / 0.333
It appears that redefining the Education variable did little to fix the issue: although the GDP
coefficient changed sign in the reduced model, GDP was not statistically significant in either
regression, since the confidence intervals for GDP contained 0. However, these models were able
to explain about 33% to 34% of the variation in ELO score, which is, again, comparatively high.
Thus, it seems reasonable to conclude that GDP may not have the expected positive effect because
GDP serves only as a prerequisite for success; the country must be able to effectively use the
resources that its wealth grants.
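The link between a confidence interval and statistical significance used above can be sketched in a few lines of Python. The rows are copied from the reduced (right-hand) model in the preceding table, and the intervals are assumed to be 95% intervals, which the tables do not state explicitly. GDP is left out because its interval bounds are printed only as the rounded values -0.00 and 0.00. The `Estimate` type and `excludes_zero` helper are illustrative names.

```python
from typing import NamedTuple


class Estimate(NamedTuple):
    name: str
    coef: float
    ci_low: float
    ci_high: float


def excludes_zero(est: Estimate) -> bool:
    """True when the interval lies entirely on one side of zero."""
    return est.ci_low > 0 or est.ci_high < 0


# Rows from the reduced model that uses the redefined Education variable
rows = [
    Estimate("Education", -0.88, -1.88, 0.13),
    Estimate("CPI", 0.08, -1.16, 1.31),
    Estimate("Urbanization", 6.05, 5.15, 6.94),
    Estimate("Popularity", 8.36, 6.95, 9.76),
]

# Keep only predictors whose interval does not contain zero
significant = [row.name for row in rows if excludes_zero(row)]
print(significant)
```

Only Urbanization and Popularity have intervals that exclude zero, which agrees with the p-values reported for that model.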
VII. Data Sources
Real GDP Per Capita: The CIA World Fact Book
https://www.cia.gov/library/publications/the-world-factbook/rankorder/2001rank.html
https://data.worldbank.org/indicator/NY.GDP.PCAP.CD
Gini Coefficients: The CIA World Fact Book, Top Income Index
https://www.cia.gov/library/publications/the-world-factbook/rankorder/2172rank.html
https://wid.world/data/#countriestimeseries/gptinc_p0p100_992_j/US;FR;DE;CN;ZA;GB;WO/1930/2017/eu/k/p/yearly/g
Corruption Index: Transparency International
https://www.transparency.org/research/cpi
Education: The CIA World Fact Book, OECD
https://data.oecd.org/eduatt/population-with-tertiary-education.htm#indicator-chart
https://data.worldbank.org/indicator/SE.PRM.CMPT.ZS?view=chart
ELO Ratings: World Football ELO Ratings. An in-depth look at the formula used and the
weighting of matches and goal differences can be found at the official website for ELO ratings.
https://www.eloratings.net/about
https://www.fifa.com/fifa-world-ranking/procedure/men.html
(For comparison with the ELO ratings system)
Urbanization: World Bank
https://data.worldbank.org/indicator/SP.URB.TOTL.IN.ZS?end=2017&start=1960&year_low_desc=true
https://www.lorecentral.org/2018/01/advantages-and-disadvantages-of-urbanisation.html
Popularity: “The Socio-Economic Determinants of International Soccer Performance” (Hoffman et al.)
https://trends.google.com/trends
FIFA Scandal: BBC
https://www.bbc.com/news/world-europe-32897066
Corruption Information: Investopedia
https://www.investopedia.com/articles/investing/012215/how-corruption-affects-emerging-economies.asp
World Cup Title Information: World of Soccer
http://www.aworldofsoccer.com/top_tournaments/world_cup_winnersbycontinent.htm
References
1. 12 Advantages and Disadvantages of Urbanisation. LORECENTRAL, 12 Jan. 2018.
2. “COUNTRY COMPARISON: DISTRIBUTION OF FAMILY INCOME - GINI INDEX.” World
Fact Book, Central Intelligence Agency.
3. “COUNTRY COMPARISON: GDP (PURCHASING POWER PARITY).” World Fact Book,
Central Intelligence Agency.
4. “Education Resources - Education Spending - OECD Data.” The OECD Database, OECD.
5. “FIFA Trends Data.” Google Trends, Google.
6. “FIFA Corruption Crisis: Key Questions Answered.” BBC News, BBC, 21 Dec. 2015.
7. “Government Expenditure on Education, Total (% of GDP).” The World Bank Database, World
Bank.
8. Hoffman, Robert, et al. “The Socio-Economic Determinants of International Soccer
Performance.” Journal of Applied Economics, V, no. 2, Nov. 2012.
9. “Individuals Using the Internet (% of Population).” The World Bank Database, World Bank.
10. Mirzayev, Elvin. The Economic and Social Effects of Corruption. Investopedia, 1 Aug. 2018.
11. “Population, Total.” The World Bank Database, World Bank.
12. Research - CPI - Overview, Transparency International.
13. Top Soccer Tournaments: Soccer World Cup Winners by Continent, A World of Soccer.
14. “Urban Population (% of Total).” The World Bank Database, World Bank.
15. World Football ELO Ratings 2002-2018.