UNIVERSITAT AUTÒNOMA DE BARCELONA
Non-Sponsored Technology
Adoption: How Network Effects
Create Dominant Standards
PUE Final Project
Josep Nueno Guitart
Tutor: Gabriel Izard Granados
Abstract
This project studies the role of network effects in the diffusion of technical innovations. To do so we study the transition from an incumbent video container format (AVI) to a newer, more efficient one (MP4) in the context of The Pirate Bay, a peer-to-peer file-sharing network, using a database of the 2.1 million files available in its catalog. To carry out our analysis we divide the peers active in The Pirate Bay into those that upload files and those that download them, and we find statistically significant evidence of network effects in the MP4 adoption shares for both groups. Lastly, we propose a theoretical model to explain some of the observed phenomena.
PROGRAMA UNIVERSITAT-EMPRESA
PRESENTATION OF THE FINAL DEGREE PROJECT
Josep Nueno Guitart, student of the twenty-third cohort of the Programa Universitat-Empresa, hereby submits in duplicate the final degree project entitled: Non-sponsored technology adoption: how network effects create dominant standards, with which he takes part in the nineteenth call of the Beques Universitat Empresa. He declares that he knows and accepts the terms of the call. He likewise declares that the final degree project he submits is unpublished, not plagiarized, and that he has respected the confidentiality commitment with the PUE companies.
He also authorizes the Programa Universitat-Empresa to publish his work.
Bellaterra (Cerdanyola del Vallès), 30 May 2013
Signed:
Table of contents
1. Introduction
2. Overview of The Pirate Bay and BitTorrent
3. Overview of digital audiovisual formats
4. Literature review
5. Econometric model
6. Data
7. Results
8. Theoretical model
9. Simulation
10. Discussion
11. Appendix 1 – Code used for database cleanup
12. Appendix 2 – Descriptive statistics for television shows
13. Appendix 3 – Stata output
14. Appendix 4 – MKV regressions and comments
15. Appendix 5 – Theoretical model derivations
16. Appendix 6 – Code used for the simulation
17. References
Index of figures
Figure 1. Revolutionary innovations in media technology
Figure 2. Revolutionary and Evolutionary innovations within media paradigms
Figure 3. Monthly uploaded AVI and MP4 files to TPB
Figure 4. Monthly format share of uploaded AVI and MP4 files to TPB
Figure 5. Broadband penetration and TPB use
Figure 6. Monthly uploads to TPB by format
Figure 7. HD television sets sold in 2005-2010
Figure 8. Monthly uploaded media files to TPB per codec
Figure 9. Dataset creation process
Figure 10. Average number of seeders per file for AVI and MP4 files
Figure 11. Average number of seeders per file (TV Shows sample)
Figure 12. Average number of comments per file (TV Shows sample)
Figure 13. Net share effects for = 0.9
Figure 14. Equilibrium adoption shares for firms varying and
Figure 15. Equilibrium adoption shares for firms varying n and fkmax
Figure 16. Equilibrium adoption shares for consumers varying and
Figure 17. Equilibrium adoption shares for consumers varying n and fkmax
Index of tables
Table 1. Descriptive statistics
Table 2. Random effects MLE regression of MP4 share for users
Table 3. Random effects MLE regression of AVI share for users
Table 4. Random effects MLE regression of MP4 share for uploaders
Table 5. Possible interactions between firms and consumers
Table 6. List of parameters used in the simulation
Table 7. List of variables used in the simulation
1. Introduction
After increasing significantly during the late 20th century, the rate at which technological innovations are generated has accelerated to unusually high levels over the last decade: digitalization creates very demanding environments, as the relative ease of transition between technologies results in a faster innovation-substitution cycle. Even if these changes rarely amount to a drastic improvement, the high frequency at which they take place makes them an interesting subject.
Varian and Shapiro (1999) categorize technical innovations depending on whether the new standard is compatible with the old one: if it is, they call it an "evolutionary innovation"; if it is not, a "revolutionary innovation". Looking at the literature on standards from this perspective, one notices that most work addresses the dynamics of revolutionary innovations (i.e. disruptive changes among standards), while there is much less focus on evolutionary ones (the substitution activity within a technology as many innovative alternatives fight to become the dominant standard). Another classification prevalent in the reviewed literature is the distinction between sponsored and unsponsored standards. The former are proprietary technologies sold by an agent capable of strategic maneuvering to maximize the chances of its standard becoming the dominant one. With unsponsored standards, on the other hand, no one besides the final consumers stands to gain anything from adoption. While some theoretical work on unsponsored standards exists, most notably Katz and Shapiro (1985, 1986) and Farrell and Saloner (1986), little empirical research has been conducted on them, and the emphasis has always been on competition rather than on the replacement of an incumbent format by a more effective new one.
One aspect on which most research agrees is the relevance of direct and indirect network effects in the adoption of new technologies. Direct network effects are a consequence of adoption by other users: the classical example is the increase in utility that users connected to a telephone network experience when an additional user decides to join. Indirect network effects are different in that, while their impact may also increase with user adoption, they are not a direct consequence of it. An example is software variety in different operating systems for personal computers: a higher
adoption rate for Mac computers increases the incentive of developers to create products
that work in that platform, which in turn increases the variety of applications available
for that operating system. This is a self-reinforcing loop, since more variety entails a
higher attractiveness for the platform which increases adoption by users. While harder
to identify than direct effects, indirect network effects play a huge role in determining
whether or not a specific technology will succeed in carving out a user base1.
Figure 1: Revolutionary innovations in media technology
Figure 2: Revolutionary (red) and Evolutionary (blue) innovations within media paradigms
This project intends to study the dynamics of adoption in the case of evolutionary
innovations, paying special attention to the impact of network effects. A case at hand is
1 For a detailed empirical investigation of the relevance of indirect network effects see Ohashi (2004)
the gradual replacement that has taken place in digital video formats, where Audio Video Interleave (AVI), a format originally designed by Microsoft but freely licensed, was gradually replaced by ISO's MPEG-4 Part 14 (MP4). This change was by no means revolutionary, since it did not challenge the governing technical paradigm (digital video, see Figures 1 and 2), but it did carry incremental improvements to the quality and usefulness of the contents offered. Figures 3 and 4 show the monthly count and share of AVI and MP4 multimedia files uploaded to The Pirate Bay which, for the time being, can be taken as the dominant catalog of files shared over BitTorrent, a peer-to-peer file-sharing protocol. We can see how during the second half of the '00s MP4 gained ground over AVI as the preferred format in which to distribute media files, becoming the dominant one by mid-2012. This setting is ideal for studying how non-disruptive, non-sponsored technologies bid for dominance within a technical paradigm, in this case digital video. Furthermore, the exponential growth of MP4 is suggestive of self-reinforcing dynamics, which in turn points to network effects as a driver of adoption. Files shared over BitTorrent are ideal for studying the role of indirect network effects, since peers on the network can be divided into file uploaders (a small subset of total peers) and regular users who only download files; by studying how the adoption decisions of one group impact those of the other we can separate and assess the impact of direct and indirect network effects. Additionally, since the studied technologies are freely used in this context, we can zero in on network effects while paying only limited attention to strategic maneuvering by the original designers or other parties.
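The monthly counts and format shares plotted in Figures 3 and 4 reduce to a group-and-normalize computation over the dump. A minimal sketch with toy rows (the actual field names in the TPB database dump may differ):

```python
from collections import Counter, defaultdict

# Toy stand-in for rows of the cleaned TPB dump: (upload month, container).
uploads = [
    ("2008-01", "avi"), ("2008-01", "avi"),
    ("2008-01", "mp4"), ("2008-02", "mp4"),
]

# Count uploads per month and per container format.
counts = defaultdict(Counter)
for month, fmt in uploads:
    counts[month][fmt] += 1

def share(month, fmt):
    """Share of a container format among that month's uploads."""
    total = sum(counts[month].values())
    return counts[month][fmt] / total

print(counts["2008-01"]["avi"])           # 2
print(round(share("2008-01", "mp4"), 2))  # 0.33
```

Running the same aggregation over all 2.1 million rows of the dump yields the series behind Figures 3 and 4.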
Sections 2 and 3 give a quick overview of the workings of BitTorrent, The Pirate Bay and video container formats (AVI and MP4). Section 4 reviews the literature written on the subject. Section 5 presents an econometric model to explain the changes in adoption of video formats by peers, and section 6 describes the data manipulations that were carried out in order to test the model. Section 7 reports the results, section 8 proposes a theoretical model to explain them, and in section 9 a simulation is run in order to better understand its dynamics. Finally, section 10 discusses the findings of the project.
Figure 3: Monthly uploads to TPB of AVI and MP4 files (series: MP4, AVI; monthly counts, Aug 2004 – Jan 2013)
Figure 4: Monthly share of MP4 and AVI for media files uploaded to TPB (series: MP4, AVI; monthly upload share 0–100%, Aug 2004 – Jan 2013)
2. Overview of BitTorrent and The Pirate Bay
BitTorrent is a protocol that facilitates peer-to-peer sharing of large files. Peer-to-peer downloading differs from traditional client-server downloading in that the transfer of a file is not handled by a single central server, but instead is carried out by a network of computers running peer-to-peer file-sharing software (a client). When a specific file is downloaded from a peer-to-peer network, each computer in the network that has the requested file transfers a small part of it, which greatly improves efficiency, both in terms of congestion and in terms of download time. BitTorrent was devised with the goal of incentivizing sharing and minimizing the free-rider behavior that is so prevalent in peer-to-peer sharing (mainly, disconnecting from the network as soon as the download is complete). It does so through a "tit-for-tat" system which ranks each peer by the amount of time it remains connected to the network after finishing its downloads, with future download speeds improving along with the ranking. Computers that are sharing the complete file are known as "seeders", and their number is an indicator of the availability of a file on the network. Since upload speed is significantly lower than download speed for most peers, BitTorrent allows parallel download of file chunks, thereby bypassing the bandwidth bottleneck that other peer-to-peer networks face.
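The bandwidth argument can be made concrete with back-of-envelope arithmetic; all figures below are illustrative assumptions, not measurements:

```python
# Compare download time from a single server against a BitTorrent swarm
# where every peer re-shares the chunks it already holds.
file_mb = 700          # size of a typical video file, in megabytes
server_up_mbps = 100   # upload capacity of a single central server
peer_up_mbps = 1       # typical peer upload speed (well below download speed)
n_peers = 500          # number of peers requesting the file

# Client-server: all n downloads share the server's upload pipe.
t_server = file_mb * 8 * n_peers / server_up_mbps

# Swarm: aggregate upload capacity grows with the number of peers.
t_swarm = file_mb * 8 * n_peers / (peer_up_mbps * n_peers)

print(t_server)  # 28000.0 seconds to serve everyone
print(t_swarm)   # 5600.0 seconds: capacity scales with swarm size
```

The swarm's advantage grows with the number of peers, which is precisely the direct network effect exploited later in the analysis.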
All these factors have contributed to a steady increase over the last decade in the number of users and companies using the BitTorrent protocol to share files, legally or otherwise. As Figure 5 illustrates, the advent of broadband for mass consumption has facilitated the exchange of increasingly large files, and since the early 2000s sharing video files (be it movies, video clips or television shows) has become commonplace for many internet users. Following this surge in popularity, websites appeared that indexed the files in the BitTorrent network and provided the information necessary to access them (Torrent Files and Magnet Links2).
2 Torrent Files contain data about the locations of a file within the BitTorrent network while Magnet
Links contain a unique identifier that is derived from the contents of the file. Both can be used to start peer-to-peer downloads in the BitTorrent protocol.
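The "unique identifier derived from the contents of the file" mentioned in the footnote is the BitTorrent info-hash: the SHA-1 digest of the bencoded info dictionary of the torrent. A minimal sketch using a toy bencoded payload (not a real torrent, which would also carry piece hashes and piece length):

```python
import hashlib

# Toy bencoded info dictionary (keys appear in sorted order, as bencoding
# requires); the name and length are made up for illustration.
bencoded_info = b"d6:lengthi734003200e4:name8:show.mp4e"

# The info-hash uniquely identifies the file's contents on the network.
info_hash = hashlib.sha1(bencoded_info).hexdigest()
magnet = f"magnet:?xt=urn:btih:{info_hash}"

print(len(info_hash))  # 40 hex characters (a 160-bit SHA-1 digest)
print(magnet.startswith("magnet:?xt=urn:btih:"))  # True
```

Because the identifier is derived purely from the content, any peer holding the same file can serve it, regardless of where the magnet link was found.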
Figure 5: Broadband penetration and TPB use (left axis: broadband penetration in OECD countries, %; right axis: bi-quarterly file uploads to TPB). Source: OECD, own preparation.

One of the biggest repositories is The Pirate Bay (TPB), established in 2004 and indexing, as of February 2013, more than 2 million files. Throughout its history TPB has frequently been in the spotlight due to its status as one of the biggest networks providing access to pirated content, which has made it the target of much attention from intellectual property enforcers, who have repeatedly taken legal action against it. Despite their efforts the website remains active, although it has been forced to change its country domain several times (currently all its traffic is routed from the island of Saint Martin). Its long trajectory and popularity make it an ideal candidate for studying how the MP4 format has come to dominate most video downloads. The website has a page for each file in its catalog, which contains information about the file and allows users to leave comments. Adding to its attractiveness, a dump of its database was carried out in February 2013 by programmer Karel Bilek, who publicly posted the results, making them available for download by anyone interested.

The motivations and incentives of peers operating in a BitTorrent network are clearer for those that download files than for those that upload them. The incentives of the downloading side hardly need explaining, since there is an obvious benefit to obtaining media for free and at a decent speed. The uploading side, however, the one that creates the file and prepares it for sharing, invests significant effort apparently without any compensation other than the occasional thank you from people that download. In some cases Torrent files contain text documents that lead to the uploader's website, generating traffic which may translate into ad revenue. Other files may contain malware, or may require the completion of surveys or the disclosure of personal information in order to unlock the contents. Finally, some peers seem to operate out of idealism, their final goal being the free circulation of information, whatever its shape or content. Regardless of the motivation behind an upload, one thing is clear: the more a file is shared the better. All peers that upload files wish to maximize their impact by making sure the files will have a wide distribution. Format is one of the choices that factors into the success of a file, and peers therefore decide which one to adopt with this maximization goal in mind. For clarity, during the rest of the project those users that upload files will be referred to as "uploaders" while those that download them will simply be "users".
The fact that downloading a file over BitTorrent increases the download speeds of that file for all users makes the system ideal for studying direct network effects. Indirect network effects can also be assessed, in particular the effect that the diversity of contents available in either format has on the adoption rates of each type of peer.
3. Overview of digital audiovisual formats
In order to play a digital video it first needs to be contained in a wrapper, or container format. This wrapper holds data about the video file which the media player, the application in charge of turning the digital information contained in the file into actual images and sounds, needs in order to run the video. There are several competing formats available, each with its strengths and weaknesses. Figure 6 shows the number of monthly uploads for four of the most frequently used wrappers, which for convenience will be called "the big four": MKV, WMV, MP4 and AVI. Despite this variety, it quickly becomes clear that two formats have held dominating positions over the decade in terms of the files available for download on TPB: Microsoft's Audio Video Interleave (AVI) and ISO's MP4. The fate of WMV, another Microsoft wrapper, is very closely tied to that of its more popular and successful cousin AVI. Furthermore, due to methodological constraints, WMV was not suitable for our study since, as will be explained later on, the empirical part centers on format changes for television shows, of which only a very small part is wrapped in WMV (a disproportionate amount of WMV files are "adult" content, for some reason). MKV, on the other hand, also experienced a big increase in usage during the same time period. While this project focuses mostly on the competition between AVI and MP4, we provide some estimates and comments on MKV adoption in Appendix 4, and its usefulness for future research will be examined in the discussion section.
For the better part of the last decade AVI was in a dominant position, with most audiovisual files exchanged on BitTorrent wrapped in that format. An AVI file is divided into three parts, also known as "chunks": the first contains metadata, such as the video's definition (width and height) or frame rate. The second contains the audiovisual content proper, encoded using a software library known as a codec: before a video is packaged into its container it must be encoded, i.e. compressed into a digital representation that a player can decode. There is a large variety of codecs and many are freely available to the public. The final chunk is optional and contains additional metadata on the file.
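AVI's chunked layout follows Microsoft's RIFF container, whose 12-byte outer header ('RIFF' tag, little-endian payload size, 'AVI ' form type) can be parsed directly. A sketch with a fabricated minimal header for illustration:

```python
import struct

def read_riff_header(data: bytes):
    """Parse the 12-byte outer header of a RIFF/AVI file:
    a 4-byte 'RIFF' tag, a little-endian 32-bit payload size,
    and a 4-byte form type ('AVI ' for Audio Video Interleave)."""
    tag, size, form = struct.unpack("<4sI4s", data[:12])
    if tag != b"RIFF":
        raise ValueError("not a RIFF file")
    return form.decode("ascii").strip(), size

# Fabricated minimal header (no real chunks follow it).
fake = b"RIFF" + struct.pack("<I", 4) + b"AVI "
print(read_riff_header(fake))  # ('AVI', 4)
```

The database cleanup described later only needs the file extension, but a check like this is how a player distinguishes a genuine AVI wrapper from a mislabeled file.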
Figure 6: Monthly uploads to TPB for the big four (series: MP4, AVI, WMV, MKV; monthly counts, Aug 2004 – Jan 2013)

Despite its initial success, the AVI format has several limitations, especially regarding compression and aspect ratio, as well as a lack of standardization for features such as the time code, which are important for professional use of the file. Some competing formats have solved these issues, allowing for more efficient file transfer and
manipulation. One of these is ISO's MP4 format (also known as MPEG-4 Part 14), developed as a version of Apple's QuickTime File Format. Although it was first published in 2001, it didn't start enjoying success until the mid-'00s, mostly due to the popularization of High Definition (HD) media. Before HD the most successful codec was DivX, which allowed the efficient compression of large videos into a digital file. Most AVI files had their second chunk encoded with DivX, so the popularity of the codec fueled the diffusion of the format. However, one of the shortcomings of the public version of the codec was its inefficiency when it came to encoding HD video. There were many codec alternatives (such as x264), although at first they didn't enjoy much success. This is attributable to the lack of demand for HD files, since most monitors at the beginning of the 2000s were not able to reproduce this content. However, as the decade advanced, the widespread adoption of HD television sets and screens (see Figure 7) changed this situation, greatly increasing the demand for HD media files.
Figure 7: HD television sets sold in 2005 – 2010
Source: GfK Retail and Technology, July 2010
The fact that many of these television sets allowed the reproduction of encoded digital files increased the demand for digital media. However, this increase in demand did not spread evenly among codecs and, as can be appreciated in Figure 8, those better suited to encoding HD files (in particular x264) absorbed most of the bump.
While MP4 was not the only container format capable of carrying x264-encoded video files, it benefited greatly from the switch to the new codec since, due to the factors examined later in this section, it was in an advantageous position to exploit any weakness AVI showed. The similarity between the x264 trend in Figure 8 and the MP4 trend in Figure 3 illustrates a very strong tie between the x264 codec and the format.

Another force that played an important role in the adoption of MP4 was the advent of smartphones and digital music. One of the first companies to take advantage of the improvements MP4 offered was Apple, which created an audio codec (Apple Lossless) that stored audio data in an MP4 wrapper and was used for music offered through the iTunes Store. Other portable devices also offered compatibility with the standard, and the advent of mobile computing entailed a large increase in the installed base of devices compatible with the new format. Apple later released the source code of the Apple Lossless codec, making it open source and royalty free and further fuelling its growth. At that point most of the content being distributed in MP4 was music, and despite its advantages over AVI the format was hardly used for packaging audiovisual contents: mobile devices had neither the memory nor the resources necessary to play those videos, and HD screens were not yet popular. Furthermore, on the TPB site the huge popularity of AVI had locked users and uploaders into the incumbent format. The release of additional MP4-compatible devices such as the PlayStation 3 and the incipient penetration of media centers into households further increased the attractiveness of the format.

Figure 8: Number of monthly uploaded files to TPB per codec (series: DivX, x264; monthly counts, Apr 2004 – Jan 2013)
This account summarizes the external trends that explain the adoption of MP4 by users. Without them, network effects alone would not have been sufficient to move the user base of TPB and the wider BitTorrent community away from the old format. Later parts of the project take this into account, even though our focus remains on the network effects.
4. Literature review
The first literature on technology adoption was written in the 1980s and focuses heavily on the role network effects play in the process. In their seminal paper, Katz and Shapiro (1985) described how consumption externalities generated by the users of a product impact its demand. They identify two possible types of consumption externalities: the first, corresponding to direct network effects, is a "direct physical effect of the number of purchasers on the quality of the product", as in the telephone case described in the introduction. Indirect network effects are variables that, while possibly related to the number of users of a product, are not exclusively dependent on it: market share or the cost of post-sale services could be instances of these. They go on to build a model in which firms compete to attract consumers to their networks through pricing, and find that under certain circumstances the optimal solution is to allow compatibility between networks, amplifying the intensity of network effects and thereby maximizing adherence by consumers.
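The fulfilled-expectations logic of such models can be sketched as a fixed-point computation: a consumer with standalone valuation r adopts if r plus the network benefit exceeds the price, and in equilibrium the expected network size equals the realized one. The linear utility, the uniform taste distribution and all parameter values below are illustrative assumptions, not the paper's specification:

```python
# Toy fulfilled-expectations adoption share in the spirit of Katz-Shapiro:
# a consumer with taste r (uniform on [0, 1]) adopts iff r + v * n_e >= p.

def adoption_share(n_expected, v=0.5, price=0.8):
    """Share of consumers adopting given an expected network size."""
    threshold = price - v * n_expected   # taste of the marginal consumer
    return min(1.0, max(0.0, 1.0 - threshold))

def equilibrium(v=0.5, price=0.8, iters=200):
    """Iterate expectations until the adoption share is self-fulfilling."""
    n = 0.5
    for _ in range(iters):
        n = adoption_share(n, v, price)
    return n

n_star = equilibrium()
print(round(n_star, 3))  # 0.4: the share at which expectations are fulfilled
```

Raising v strengthens the externality and shifts the fixed point upward, which is the self-reinforcing mechanism the literature emphasizes.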
Further developing this approach, Katz and Shapiro (1985, 1986 and 1995) extend their model and apply it to technology adoption. They introduce the distinction between sponsored and unsponsored standards and identify the inefficiencies that can arise from adoption: specifically, they show that the incumbent technology has an advantage over the new one in the case of unsponsored standards, while a sponsored standard will have a strategic advantage over unsponsored ones, even if it is inferior, since its sponsor can behave strategically to make sure its technology is the one that finally succeeds. This process, in which a superior technology is displaced by an inferior or less mature one, is called excess momentum. They go on to describe the possible maneuvers a sponsor can use in order to gain the upper hand, such as committing to future prices or, in the software-hardware paradigm, integrating vertically.
Another cornerstone of the late-80s technology adoption literature is the work of Farrell and Saloner (1985, 1986). Their model describes the adoption of an unsponsored standard and includes two additional factors in the consumers' choice: first, they assume consumers form expectations as to which standard will succeed, which can lead to a bandwagon effect in which the choice of the first consumer creates a cascade that makes all subsequent adoption decisions identical to the first one. Second, they introduce the notion of an installed base, which reflects the number of users committed to a standard on day zero. Under uncertainty, this installed base can trap the market in an old, inferior standard since, despite being individually interested in adopting, consumers do not dare to do so because they do not know what the choices of subsequent adopters will be, a process they call excess inertia.
A final set of models proposed in the 80s focused on increasing returns and path dependence, with Arthur (1987, 1990 and 1994) as one of the main exponents of the current. In the series of papers compiled in the book Increasing Returns and Path Dependence in the Economy he provides models that illustrate how historical accidents may explain why an inferior technological standard ends up being adopted over a superior one. The main analytical tool he uses is a statistical model known as a Polya urn process: "it can be pictured by imagining a table to which balls are added one at a time; they can be of several possible colors – white, red, green or blue. The color of the ball to be added next is unknown, but the probability of a given color depends on the current proportion of colors on the table"3. Arthur goes on to describe several economic processes governed by similar self-reinforcing dynamics, such as the choice of geographical location by firms, or technology adoption. In the case of technology adoption he defines the adoption choice as a random walk with critical bounds: once the process crosses a threshold, all future choices go to the same technology. He demonstrates the existence of several stable equilibria for such problems and examines how historical accidents condition which one is eventually reached. His models fit into the Evolutionary Economics school and are a great introduction to path dependence, a notion that has gained a lot of relevance in development, spatial and financial economics.

3 Arthur, Brian (1994), Increasing Returns and Path Dependence in the Economy, p. 6.
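Arthur's urn is straightforward to simulate; the sketch below (the step count and seeds are arbitrary choices) shows the reinforcement mechanism behind this path dependence, with each run locking onto its own limiting share:

```python
import random

def polya_urn(steps, seed):
    """Polya urn: starting with one white and one red ball, repeatedly add
    a ball whose color is drawn with probability equal to current shares."""
    rng = random.Random(seed)
    white, red = 1, 1
    for _ in range(steps):
        if rng.random() < white / (white + red):
            white += 1
        else:
            red += 1
    return white / (white + red)

# Different histories (seeds) settle on different limiting shares:
# an early accident conditions which equilibrium is reached.
shares = [polya_urn(10_000, seed) for seed in range(3)]
print(all(0.0 <= s <= 1.0 for s in shares))  # True
```

Reading "white" as MP4 uploads and "red" as AVI uploads gives the flavor of the self-reinforcing format dynamics studied in this project.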
During the 1990s and 2000s most of the literature turned to the study of sponsored standards and technologies and to the strategic aspects of the problem. Some papers follow the tradition of the earlier studies while others develop their theories within the two-sided-markets approach. An example of the former is Besen and Farrell (1994), who describe different competitive strategies in standard setting, the mechanisms by which a firm may try to steer the market in its favor, and how market structure influences the outcome. They show that when firms are similar they choose the same compatibility strategy and therefore facilitate the emergence of a single, consolidated standard. However, if firms are dissimilar a standards battle is likely to occur: bigger firms may want to prevent new entrants from joining their networks. In the same line, Götz (1999) analyzes the adoption and diffusion of a technology in markets with monopolistic competition and shows that in a non-cooperative setting identical firms may adopt a new technology at different dates. For non-identical firms he assigns a rank to each firm's good, which alters consumers' demand for it, and shows that bigger firms have a bigger incentive to adopt. This project proposes a variant of Götz's model in which consumers, and not only firms, also face an adoption choice (see Section 8).
Varian and Shapiro (ibid.) and Varian et al. (2004) give a business-oriented overview of the state of the art in the standards literature and create many useful classifications of technical innovations. Their focus is on strategic maneuvering and on discussing past cases, specifically standards wars such as VHS vs. Beta or standard gauge vs. broad gauge in the early days of railroads.
A last, and quite recent, contribution to this line of inquiry is the handbook chapter by Farrell and Klemperer (2006), which focuses mainly on network externalities and studies competition under switching costs and network effects. They show how these effects can lock customers into their early choices and how arrangements that are suboptimal from a social welfare point of view can prevail under these conditions.
Literature on platform competition in two sided markets is also relevant when
studying technology adoption, in particular in competition between sponsored non-
19
compatible standards, although the bulk of it does not deal with it explicitly. This is
illustrated by the change in language, with platform being used more often than
technology or standard. Still models of two sided markets share many similarities with
models of technology adoption, as in essence both deal with network effects and the
incentives they generate. Rochet and Tirole (2002, 2006) have published several of their
works on the dynamics of competition in two sided markets, with a special focus on
pricing. In their models, platforms compete to attract a demand which is split on two
sides, with at least one of them experiencing a positive membership externality when
additional customers join the opposite side. They demonstrate that the profit-
maximizing decision of a monopolistic platform is to subsidize the side of the demand
that has higher price elasticity and overbill the inelastic side. Armstrong (2002)
develops his own model and pays special attention to competition between platforms.
He also allows the demand side to perform a broader range of behaviors; specifically, he
allows those agents to multihome (i.e. use several platforms at the same time). In the
technology adoption context his model could be applied to
Empirical literature
Much of the empirical literature on technology adoption has been produced within the
Marketing field (and a surprising amount of it focuses on adoption of electronic
payment systems), and tends to be based on attitudinal rather than hard data. There is a
lot of variety in the frameworks these researchers use, but most build their
investigations around the Technology Acceptance Model (TAM). Proposed by Davis
(1989), TAM is applied mostly in research on the diffusion of Information Technology, and
focuses on users' perception of what Davis considers the two main drivers behind
an adoption decision: ease of use and usefulness. The model has undergone several
extensions and modifications and has been widely used since its introduction.
Some research has also been dedicated to the exploration of network effects using
data from other sources. For example, Rysman (2003) studies competition between
networks by examining how Yellow Pages directories compete. His final goal is to
determine whether or not standardization would be preferable to competition from a
social welfare point of view, since standardization would maximize the magnitude of
network effects. His conclusion is that, in the specific case of Yellow Pages directories,
competition is preferred to standardization.
As for technology adoption, Ohashi (2003) studies the competition
between Beta and VHS between 1978 and 1986, and the impact network effects may have had
in the final victory of VHS. In order to identify the network effects he incorporates into
the consumers' utility function an installed base variable for each of the competing
standards. He then estimates adoption using a nested logit model, in
which he first estimates the likelihood of adoption of any VCR device, and then the
likelihood of choosing either VHS or Beta. His model allows him to run simulations
with which he can contrast hypotheticals, and one of the most remarkable results he
obtains is that the success of VHS would have been unlikely had its price been higher
during the first stages of competition.
Along this line of work, Clements and Ohashi (2004) study the role of indirect
network effects on the videogame market in the United States. Videogame platforms are
an instance of sponsored non-compatible technological standards, and securing a broad
variety of software products early on in order to get a large installed base is one of the
main concerns of platform rivals. The paper goes on to model the strategic interactions
between platforms and software providers.
5. Econometric Model
The objective of the empirical part of this project is to determine the impact of direct
and indirect network effects on the substitution of AVI by MP4 for audiovisual file-
sharing in TPB. In order to do so, two models have been developed to explain the
variations in the share of adoption of MP4 files (one for each side of the BitTorrent
ecosystem).
Direct network effects are a consequence of adoption of MP4 by other users: as
explained before, download speed for a specific file increases with the number of users
that have a copy of it, which increases the attractiveness of a format as it gains users.
However, guessing which files each user is interested in is impossible with the data
accessible to us, so we decided to cluster our sample by television show: we
assume that someone who downloads an episode of a television show is more likely to
be interested in other episodes, and he or she stands to benefit from the direct network
effect generated by larger shares of a specific format within that subset of files.
Furthermore, clustering by television show holds the additional advantage of allowing
us to follow a group of users and uploaders over time, since new episodes are uploaded
into TPB after they are broadcast through traditional television, so dummy variables for
unobserved demographic characteristics and time can be added.
As for the indirect network effect, we will model it as a function of the variety of
audiovisual media being offered in MP4. In order to parameterize this variety we will
use the lagged share of audiovisual media being offered in that format, or to put it
another way, the percentage of uploaded media files in MP4 up to that date.
With this in mind we define the following MP4 adoption share function:

ShareUsers_{TV,t} = β0 + β1·IBUp_{t−1} + β2·TVIBUs_{TV,t−j} + δ_TV + δ_t + ε_{TV,t}

where ShareUsers_{TV,t} is the MP4 adoption share among users for files uploaded at
date t for television show TV; TVIBUs_{TV,t−j} is the format adoption share of users for
previous episodes of that television show; IBUp_{t−1} is the lagged proportion of media
files being offered in MP4, and δ_TV and δ_t are television show and time dummies. We
decided to lag IBUp since, given the large number of media files available for
download, it is unlikely that users would react immediately to variations in the format
composition of the aggregate total of files available.
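As an illustration of how a specification of this kind can be estimated, the sketch below fits it by plain least squares on simulated data. All variable names, parameter values, and the simulated coefficients are hypothetical; the actual estimation in Section 7 uses a random effects MLE in Stata.

```python
import numpy as np

# Illustrative sketch (not the thesis code): estimating
#   Share = b0 + b1*IBUp_lag + b2*TVIBUs_lag + show dummies + year dummies
# by least squares on simulated data. All names and numbers are hypothetical.
rng = np.random.default_rng(42)
n_shows, n_months = 10, 24
n = n_shows * n_months
show = np.repeat(np.arange(n_shows), n_months)      # television show id
year = np.tile(np.arange(n_months) // 12, n_shows)  # yearly dummy level
ib_up_lag = rng.uniform(0.0, 0.5, n)   # lagged aggregate MP4 upload share (indirect effect)
tv_ib_us = rng.uniform(0.0, 0.8, n)    # per-show MP4 user installed base (direct effect)
share = np.clip(0.05 + 0.4 * ib_up_lag + 0.3 * tv_ib_us
                + rng.normal(0.0, 0.03, n), 0.0, 1.0)

# Design matrix: constant, the two network-effect regressors, and dummies
# (dropping one level of each categorical to avoid perfect collinearity).
show_dummies = (show[:, None] == np.arange(1, n_shows)).astype(float)
year_dummies = (year[:, None] == np.arange(1, year.max() + 1)).astype(float)
X = np.column_stack([np.ones(n), ib_up_lag, tv_ib_us, show_dummies, year_dummies])
beta, *_ = np.linalg.lstsq(X, share, rcond=None)
b1, b2 = beta[1], beta[2]   # estimated network-effect coefficients
print(round(b1, 2), round(b2, 2))
```

The same design matrix can be reused for each of the big four formats by swapping in the corresponding dependent variable.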
Time dummies are included to account for unobserved changes in the utility of
MP4. As was discussed in Section 3, starting in the mid-'00s MP4 became more
attractive thanks to its compatibility with HD content and mobile devices. Lacking any way
to model these effects we include yearly time dummies as a means to capture the
increase in popularity the format had during the decade for reasons other than network
effects. Finally a dummy is also included for television shows, which is useful mainly to
cluster the sample in differentiated user groups and to a lesser extent to control for
unobserved demographic characteristics affecting the adoption decision.
The model estimating the adoption rate among uploaders of a specific television
show is similar, only in this case both the direct and indirect network effects come from
users. This is based on the assumption that uploaders don't experience any direct
benefit resulting from other uploaders switching to MP4 and instead benefit from it
indirectly through a ricochet effect: more uploads in MP4 mean more users in MP4,
which in turn makes the format more attractive to uploaders. In order to capture it we
use the total installed base share for users at time t, taking into account all audiovisual
media. For the modeling of the direct network effect we proceed in the same fashion as
in the users’ case, taking into account only the user installed base within a single
television show. Thus we specify the following model for upload share:

ShareUploaders_{TV,t} = β0 + β1·IBUs_{t−1} + β2·TVIBUs_{TV,t−j} + δ_TV + δ_t + ε_{TV,t}

where IBUs_{t−1} is the share of installed base MP4 has among all users downloading
media files and TVIBUs_{TV,t−j}, δ_TV and δ_t are the same as in the user specification
(installed base share for a specific television show, television show dummies and time
dummies, respectively).
6. Data
The TPB database used was compiled by Karel Bilek4 by running a Perl script on the
TPB website. The whole process took "about six months" and, according to his own
account, between 100 and 300 Torrent files are missing (negligible compared to
the more than 2 million Torrent files he did manage to compile). The data
was stored in an XML format and the uncompressed file weighs 4.4 GB. For each
Torrent file the dump contained several fields, all of which were discarded except the
following:
Identification number: a unique identifier for each file.
Title: the title of the file, often specifying the format as well in the case of media.
Seeders: the number of users that have a full copy of the file and are sharing it.
Upload date: the date on which the Torrent file was created.
Information: comments left by the uploader, which were also checked for file format.
User comments: comments left by downloaders of the file.
Due to the large size of the database most of the cleaning and sampling was carried out
with the UNIX terminal. A first step consisted in creating a new XML field containing
the number of comments each Torrent file had (which, although available on the TPB
website, was not included in Bilek's dump) and removing the actual comments in order
4 Karel Bilek’s Github page: http://runn1ng.github.io/piratebay.html
to make the file more manageable. The resulting dataset, which we will call the "global
dataset", accounted for all 2 million observations. The next step was identifying the
format of each file: the format of many of them is not specified on the TPB website, and
several formats are not relevant to this project (such as audio formats), so all those
instances had to be removed. At the end of the process, files whose format was identified
as belonging to one of the big four were assigned to the "media dataset" and were used
to estimate the installed base each format had among uploaders and users. The media
dataset included 284,858 observations with upload dates ranging from April 2004 to
February 2013. A final dataset was created for all the Torrent files of episodes of 50 TV
shows with original airdates in the time range covered by the media dataset. For a
television show to be selected it needed to have broadcast at least two seasons in the
time period studied and each season needed to have at least 13 episodes (so miniseries
were excluded).5
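The format-identification step can be sketched as follows. The regular expression and function are illustrative; the actual cleanup was done with UNIX shell tools on the XML dump.

```python
import re

# Hypothetical sketch of the format-identification step. A file is tagged with
# one of the "big four" containers when its title or uploader notes mention it.
BIG_FOUR = re.compile(r"\b(avi|mp4|mkv|wmv)\b", re.IGNORECASE)

def detect_format(title, info=""):
    """Return the first big-four format mentioned, or None if unidentified."""
    match = BIG_FOUR.search(title) or BIG_FOUR.search(info)
    return match.group(1).upper() if match else None

print(detect_format("Show.S01E05.720p.HDTV.x264.mkv"))   # MKV
print(detect_format("Some Album 2009 [320kbps mp3]"))    # None
```

Files for which neither the title nor the uploader notes name a format would remain unidentified and be dropped, as described above.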
Figure 9: Dataset creation process
• Raw database: Karel Bilek's extraction; 2.2 million files
• Global dataset: replaced comments by comment count, eliminated useless fields; 2.2 million files
• Media dataset: contains AVI, MP4, MKV and WMV files; 284k files
• TV Shows dataset: files of episodes (50 TV shows); 9k files

5 Samples of the code used during the cleanup process can be found in Appendix 1

The original metric that was to be used to assess the adoption rate of
MP4 among users was the number of seeders each file had. However, the data didn't
actually reflect how many users had downloaded the Torrent at any point in time;
instead it reflected the number of users that had a copy of it at the moment at which the
database was extracted: the seeders field contains transversal information corresponding
to the extraction date (between the end of 2012 and the beginning of 2013). As can be
seen in Figure 10, the average number of seeders per file declines very fast, with the
number of seeders for files six months old barely representing 10% of that for more
recent files, leaving many observations with one seeder or none. This was
problematic because it greatly increased the volatility of the variable containing the user
adoption share.
To tackle this, the workaround we found is to use the number of
comments users left on the page of each Torrent file as a proxy for its popularity. We
based this on the assumption that users were equally likely to leave comments on any
Torrent page, an assumption that was further reinforced when the sample was homogenized
by restricting it to television shows. Unlike seeders, once a comment is left on a page it
remains there indefinitely, since forum moderation in TPB is virtually nonexistent.
Fig. 10: Average number of seeders per file for MKV, WMV, AVI and MP4 files

With the aggregate totals for file count and comment count for AVI and MP4
files, the installed base shares for each format at each point in time and for each peer
group were calculated. The time unit used is months, since our intent was to set up a
panel and shorter time intervals added a lot of noise and significantly hurt its
balance. Since we consider adoption decisions to be permanent, in order to estimate
the user installed base we used the cumulative share of comments left on files belonging
to either format. Using cumulative values has the advantage of creating more robust
variables and gives a more accurate estimate of the current installed base share. This
approach is easier to justify for uploads than for users, since in the former case it is an
actual reflection of the share of files available in each format at moment t. In the case of
users the approach is valid as long as we only use the share and not the absolute values,
and as long as the adoption decision is not reversible.
For users the installed base share at moment t was calculated using the Media
Dataset as

IBUs_t = (Σ_{i=0}^{t} CommentsMP4_i) / (Σ_{i=0}^{t} CommentsMedia_i)

where CommentsMP4_i is the count of comments left on MP4 files uploaded at moment i, and
CommentsMedia_i is the comment count of media files in any of the big four video formats
uploaded at moment i.
For uploaders the installed base share at moment t was calculated using the
Media Dataset as

IBUp_t = (Σ_{i=0}^{t} UploadsMP4_i) / (Σ_{i=0}^{t} UploadsMedia_i)

where UploadsMP4_i is the count of files uploaded in MP4 at moment i and UploadsMedia_i is
the count of media files in any of the big four video formats uploaded at moment i.
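A minimal sketch of these two cumulative-share computations, assuming a tidy table with one row per file; column names and figures are illustrative, not the actual dataset.

```python
import pandas as pd

# Hypothetical sketch of the installed-base variables: the cumulative share of
# MP4 comments (users, IBUs_t) and of MP4 uploads (uploaders, IBUp_t) over all
# big-four files. Column names and figures are illustrative.
media = pd.DataFrame({
    "month":    ["2008-01", "2008-01", "2008-02", "2008-02", "2008-03"],
    "format":   ["AVI", "MP4", "AVI", "MP4", "MP4"],
    "comments": [10, 2, 8, 6, 9],
})

# Users: cumulative comment counts per format.
comments = media.pivot_table(index="month", columns="format",
                             values="comments", aggfunc="sum", fill_value=0)
ib_users = comments["MP4"].cumsum() / comments.sum(axis=1).cumsum()

# Uploaders: cumulative file counts per format.
uploads = media.groupby(["month", "format"]).size().unstack(fill_value=0)
ib_uploaders = uploads["MP4"].cumsum() / uploads.sum(axis=1).cumsum()
```

Because both series are cumulative ratios, a one-month lag is just a `shift(1)` on the resulting series.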
Restricting the sample to specific television shows holds many advantages. First,
it substantially facilitated the task of cleaning up the original database extraction,
particularly in the case of MP4, which is used to contain both audio and video: after
filtering the original extraction by file format, a significant proportion of the MP4
files that remained were audio files (mostly music records). Even though their presence
was relevant to the study, since it was indicative of the degree of penetration of the
format among users and uploaders and was used to assess the indirect network effects,
audio content does not compete against video formats but against audio formats (such as MP3 or
FLAC), so those files were distorting the calculation of the variable we used to capture the
direct effects in video files.
The second advantage of restricting the sample to specific television shows is
that it minimizes time biases (for instance, older Torrent files having more
comments because they have been up for a longer period of time). All the shows
selected for the sample were airing new episodes at some point during the time interval
studied, and, as the sharp drop in the number of seeders in Figure 11 shows, most activity
registered on a TPB page happens soon after the file's upload date.
A quick survey of the most active uploaders6 shows that the Torrent file of a new
episode is uploaded within 24 hours of its original broadcast. These peaks of
activity attenuate (without completely eliminating) the time bias observed for user
comments since, as the sharp drop in the number of seeders in Figure 11 demonstrates,
most of the user downloads occur soon after the upload moment. While more reliable
than seeders, the comment count still presents a bias, since files that have been up
longer tend to have a higher number of comments. This is a minor problem
since, for the same reason, older files are also likely to have more downloads, if only
because they have been up longer. Using proportions instead of absolute values should
minimize this bias.
6 An example of an episode upload catalog can be found at www.eztv.it
Figure 11: Average number of seeders per file (TV Shows sample)
A third advantage of restricting the sample to TV shows is that it
allows controlling for variations that stem from the characteristics of the viewership of a
specific show. Furthermore, since those viewers follow the show over a period of time,
using the TV Shows dataset makes it possible to follow the evolution of their
preferences over time.
The share for user adoption of MP4 during month t and television show TV was
calculated using the TV Shows Dataset as

ShareUsers_{TV,t} = CommentsMP4_{TV,t} / CommentsMedia_{TV,t}

where CommentsMP4_{TV,t} is the count of comments in MP4 files uploaded in month t for
television show TV and CommentsMedia_{TV,t} is the comment count for files in any of the
big four media formats uploaded in month t for television show TV.
The share for uploader adoption was estimated in a similar way (also with the
TV Shows Dataset) as

ShareUploaders_{TV,t} = UploadsMP4_{TV,t} / UploadsMedia_{TV,t}

where ShareUploaders_{TV,t} is the percentage of uploads in MP4 during month t of
television show TV, UploadsMP4_{TV,t} is the count of uploads in MP4 in month t for
television show TV, and UploadsMedia_{TV,t} is the count of files in any of the big four
media formats uploaded in month t for television show TV.

Figure 12: Average of comments per uploaded file (TV Shows sample)
Finally, to calculate the installed base of users of a specific television show we
used

TVIBUs_{TV,t−j} = (Σ_{i=0}^{t−j} CommentsMP4_{TV,i}) / (Σ_{i=0}^{t−j} CommentsMedia_{TV,i})

where CommentsMP4_{TV,i} is the comment count left on MP4 files belonging to television show
TV at moment i, and CommentsMedia_{TV,i} is the comment count left on all media files
belonging to television show TV at moment i; t−j is the moment corresponding to the
observation that chronologically precedes observation t.
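A minimal sketch of these per-show variables, again assuming a tidy per-file table with illustrative column names and figures:

```python
import pandas as pd

# Hypothetical sketch of the per-show variables: the monthly MP4 user share
# (ShareUsers_{TV,t}) and the lagged cumulative per-show installed base
# (TVIBUs_{TV,t-j}). Column names and figures are illustrative.
tv = pd.DataFrame({
    "show":     ["A", "A", "A", "A"],
    "month":    ["2009-01", "2009-01", "2009-02", "2009-02"],
    "format":   ["AVI", "MP4", "AVI", "MP4"],
    "comments": [6, 2, 3, 9],
})
monthly = tv.pivot_table(index=["show", "month"], columns="format",
                         values="comments", aggfunc="sum", fill_value=0)

share_users = monthly["MP4"] / monthly.sum(axis=1)   # ShareUsers_{TV,t}
cum = monthly.groupby(level="show").cumsum()          # per-show running totals
tv_ib_us = (cum["MP4"] / cum.sum(axis=1)).groupby(level="show").shift(1)
```

The `shift(1)` within each show implements the t−j lag: each observation sees only the installed base built up through the preceding observation.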
Equivalent variables were generated for all other big four formats. Even though
this section describes only MP4 variables, the variables for AVI, MKV and WMV were
calculated following exactly the same procedure. Table 1 contains the descriptive
statistics for all of them.
Table 1: Descriptive statistics
Variable Obs Mean Std. Dev. Min Max
MP4
UserShare 1295 0.1842 0.3261 0.0000 1.0000
UploadShare 1541 0.1602 0.3722 0.0000 1.0000
IBUp (lagged) 1541 0.2596 0.1263 0.0000 0.5586
IBUs (lagged) 1541 0.1924 0.0854 0.0000 0.4262
TVIBUs 1541 0.1358 0.1457 0.0000 0.8333
AVI
UserShare 1295 0.6649 0.4007 0.0000 1.0000
UploadShare 1541 0.6582 0.3722 0.0000 1.0000
IBUp (lagged) 1541 0.4163 0.2019 0.1199 1.0000
IBUs (lagged) 1541 0.4845 0.1684 0.1529 1.0000
TVIBUs 1541 0.7595 0.2190 0.0000 1.0000
MKV
UserShare 1295 0.1406 0.2687 0.0000 1.0000
UploadShare 1541 0.1667 0.2699 0.0000 1.0000
IBUp (lagged) 1541 0.2254 0.1339 0.0000 0.5274
IBUs (lagged) 1541 0.1662 0.0944 0.0000 0.3323
TVIBUs 1541 0.0923 0.1239 0.0000 1.0000
WMV
UserShare 1295 0.0103 0.0920 0.0000 1.0000
UploadShare 1541 0.0146 0.1061 0.0000 1.0000
IBUp (lagged) 1541 0.0988 0.0492 0.0000 0.3612
IBUs (lagged) 1541 0.1569 0.0476 0.0000 0.4071
TVIBUs 1541 0.0124 0.0696 0.0000 1.0000
7. Results
Results for user share
Before proceeding to the actual estimation we ran a Hausman test for each user
specification in order to determine whether it was suitable to proceed with a
random effects model or whether we should use a fixed effects one instead. The result7
was that we were not able to reject the null hypothesis that the random effects estimator
is consistent, so we carried out our estimates using it. Additionally this
allowed us to perform our regressions using Maximum Likelihood Estimation instead
of Ordinary Least Squares, which is advisable considering that the relation our
independent variables share with the dependent one may not be linear.
A summary of the results obtained for the regression of MP4 share for users can
be found in Table 2. Both the direct (TVIBUs) and indirect (IBUp) network effects
variables are significant at the 99% and 95% confidence level, respectively. The indirect
effect seems to have a stronger influence on the adoption of MP4 than the direct effect; that
is, the variety of media available in a specific format strongly affects the decision
to switch to MP4. This makes sense since it is very likely that users don't restrict their
downloads to episodes of a single show, and that the availability of other media in a
specific format conditions their choice. Evidence of a direct network effect was also
significant, if lower than in the case of the indirect network effect, with monthly
adoption of MP4 increasing along with the share of users of a specific television show.
This is likely the consequence of restricting the direct network effect to the television
show level, which we did in order to ensure that the direct and indirect network effects
were properly distinguished. Even so, the effect has the sign we expected and confirms
that format adoption by the users affiliated to a television show does have an impact on
future adoption decisions. The yearly dummies we included are not significant,
probably because the random effects model already accounts for some of the variation over
time, which detracts intensity and significance from our dummies. We chose to keep them
because running the same model for the share of adoption of AVI revealed some
interesting effects.
7 Stata results for the Hausman tests are available in Appendix 3.
Table 2: Random Effects MLE regression of MP4 share for users with year dummies
Group variable: show   Groups: 50   Obs per group: min = 3, avg = 22.8, max = 65
Observations: 1255   Pseudo R-sq: 0.1996   Log-likelihood: -291.935
LR Chi2: 145.65   Prob > Chi2: 0.0000
ShareUsers (MP4)   Coef.   Std. Err.   t   P>t   [95% Conf. Interval]
IBUp .4122 .1983 2.08 0.038 .0234 .8010
TVIBUs .2704 .0892 3.03 0.002 .0956 .4553
d2005 .0732 .2217 0.33 0.741 -.3613 .5078
d2006 .0305 .2194 0.14 0.889 -.3995 .4605
d2007 .0726 .2195 0.33 0.741 -.3577 .5030
d2008 .0965 .2197 0.44 0.660 -.3342 .5273
d2009 -.0099 .2232 -0.04 0.964 -.4474 .4275
d2010 .0897 .2240 0.40 0.689 -.3493 .5288
d2011 -.0206 .2242 -0.09 0.926 -.4602 .4188
d2012 .1736 .2164 -0.74 0.460 -.2872 .6344
Cons -0.221 .2164 -0.10 0.918 -.4463 .4020
Table 3 shows the results of applying the same user adoption model to the AVI
user adoption share (with the AVI variables). Interestingly, the adoption of this format
also shows statistically significant direct and indirect network effects (the latter with
80% confidence). The indirect network effect is weaker than in the case of
MP4. However, the most interesting element in this
regression is the year dummies, for two reasons: first, they show a higher degree of
significance overall than in the case of MP4, which increases as we approach 2012;
secondly, all the estimated coefficients are negative. While in the case of MP4 the
network effects dominated the adoption process, in the case of AVI there are two forces
at work: one is the loss in usefulness the format experienced during the period studied
(captured by the yearly dummies), which drives users away from it, while the other is
the network effect generated by the format's initially widespread adoption, which
attracts users. As we saw in Section 4, the repulsive force ended up overcoming the
attracting one, but without the former it is likely that the weaker intensity of the indirect
network effect might not have sufficed to move the user base of TPB to MP4,
which would have remained locked into AVI. This result points to the relevance of external adoption
drivers in the process of technological replacement with network effects, something that
will be examined further in Section 8.
Table 3: Random Effects MLE regression of AVI share for users with year dummies
Group variable: show   Groups: 50   Obs per group: min = 3, avg = 22.8, max = 65
Observations: 1255   Pseudo R-sq: 0.2025   Log-likelihood: -463.903
LR Chi2: 235.65   Prob > Chi2: 0.0000
ShareUsers (AVI)   Coef.   Std. Err.   t   P>t   [95% Conf. Interval]
IBUp .2105 .1496 1.41 0.159 -.0827 .5038
TVIBUs .2947 .0709 4.15 0.000 .1556 .4339
d2005 -.0836 .2535 -0.33 0.742 -.5806 .4133
d2006 -.0368 .2506 -0.15 0.883 -.5281 .4544
d2007 -.1682 .2507 -0.67 0.502 -.6597 .3231
d2008 -.1735 .2502 -0.69 0.488 -.6639 .3168
d2009 -.1198 .2538 -0.47 0.637 -.6174 .3777
d2010 -.1883 .2586 -0.73 0.466 -.6952 .3185
d2011 -.1410 .2595 -0.54 0.587 -.6498 .3676
d2012 -.4311 .2635 -1.64 0.102 -.9476 .0853
cons .5532 .2802 1.97 0.048 .0039 1.1025
Results for uploader share
Before running the regression we also performed a Hausman test on the specification,
which confirmed that the random effects model was efficient as well.
Table 4 shows the results of the regression, and confirms the presence of an
indirect network effect in the monthly uploader share. As was discussed in Section 6, the
variable IBUs captures the impact an increase in the installed base of users has on the
dependent variable, and there seems to be evidence of an uploader response to the
change in the preferences of media-consuming users. However, the direct network effect
doesn't seem to be significant, which is understandable considering that users of
a specific television show are only a small subset of the total user base of TPB.
Encoding and uploading a Torrent file is not a trivial task, and acquiring the technical
proficiency needed to do so efficiently entails an investment in learning costs
several orders of magnitude larger than the one users (who only have to download or
update their media player) assume when they decide to switch formats.
Furthermore, the process of uploading files into the BitTorrent network has economies
of scale, since the method used can be automated and ported to all kinds of media,
regardless of their content. All of this, together with the fact that the final goal of an
uploader is to maximize downloads, explains the weak role direct network effects have
in this instance: switching to a new format is only worthwhile if all the media a specific
uploader is going to release will be in the new one, and switching to MP4 to cater
exclusively to a small subset of users is just not worth it.
Table 4: Random Effects MLE regression of MP4 share for uploaders with year dummies
Group variable: show   Groups: 50   Obs per group: min = 4, avg = 29.8, max = 70
Observations: 1491   Pseudo R-sq: 0.1207   Log-likelihood: -180.342
LR Chi2: 49.52   Prob > Chi2: 0.0000
ShareUploaders (MP4)   Coef.   Std. Err.   t   P>t   [95% Conf. Interval]
IBUs .9075 .1694 5.36 0.000 .5754 1.2397
TVIBUs .0113 .0641 0.18 0.859 -.1143 .1370
d2005 .2988 .2001 1.49 0.135 -0.0934 .6911
d2006 .1181 .0633 1.86 0.063 -.0066 .2429
d2007 .1818 .0539 3.37 0.010 .0795 .2876
d2008 .1340 .0468 2.86 0.004 .0421 .2258
d2009 .0638 .0396 1.61 0.107 -.0138 .1416
d2010 .1127 .0358 3.15 0.002 .0425 .1829
d2011 .0873 .0304 2.87 0.004 .0276 .1470
d2012 .0508 .0320 1.59 0.112 -.0119 .1135
Cons -.0960 .0593 -1.62 0.105 -.2123 .0219
Another interesting result of the uploaders’ specification is the statistical
significance of the year dummies we included. This suggests that while network effects
were the main factor leading to MP4 adoption among users, the switch in the case of
uploaders was strongly influenced by the external factors that made the format more
attractive over the time period studied. It is also indicative of a more active role of the
uploaders’ side in the adoption process of MP4, since the group’s behavior is consistent
with the changes in the technical landscape that ended up deciding the adoption of the
format.
8. Theoretical model
In this section we propose a microeconomic model to explain some of the results
observed in our empirical study. In order to allow its generalization to contexts
other than file formats in TPB we have included production costs and prices,
which would be roughly equivalent to download time or the cost of encoding and
uploading different file formats. We also added adoption costs to this model, which
were omitted from the empirical part due to a lack of data. Our model is set in a
monopolistic competition market, because of the high degree of differentiation between
the television shows that were analyzed in the empirical part.
The model shares some similarities with the one developed by Götz (ibid),
which also describes a monopolistic competition market in which firms adopt in order
to reduce their marginal cost and increase their profits via demand volume. However,
the equilibrium of Götz's model is always full adoption by all firms, which is not what
we observed in our empirical study and is not what happens in general in the real world.
So in this section we strive to create a model that reflects those incomplete-penetration
equilibria and that also takes into account the adoption decisions made by
consumers.
Basic setting
The starting point is identical to the one postulated by Götz in his model: the industry is
composed of a number n of active firms which produce a differentiated product,
and a number E of consumers. One of the first departure points is that, unlike Götz, we
assume that consumers are myopic and therefore only maximize for the present moment
in time. Each has preferences described by a Dixit-Stiglitz consumption index
U(t) = [∫_0^n y(j,t)^ρ dj]^(1/ρ),  with 0 < ρ < 1   [1]
where y(j,t) is the amount of variety j demanded by a consumer at time t. We will
assume all consumers have the same income, which we normalize to 1, and that there is
no numeraire good on which to spend money. As only one unit of income is spent on the
goods, total expenditure in the market equals E. Therefore firm j at moment t faces
the aggregate demand
y(j,t) = E · p(j,t)^(−σ) / ∫_0^n p(k,t)^(1−σ) dk,  with σ = 1/(1−ρ)   [2]
where p(j,t) is the price of variety j at moment t. As Götz does in his paper, the term in
the denominator will be called the "price index", and the actions of rivals impact the
demand firm j faces through it. Another thing worth noticing is that the demand a single
consumer has for good j at moment t is the same expression without multiplying by
E. For a marginal cost c the profit-maximizing price firms set is p = c/ρ.
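As a numerical illustration, the sketch below searches a fine price grid for the profit-maximizing price implied by the CES demand, holding the price index fixed; all parameter values are hypothetical, and with ρ = 0.5 and c = 1 the optimum lands at c/ρ = 2.

```python
# Numerical sanity check (hypothetical parameters) that the constant-markup
# price p = c / rho maximizes a firm's profit when the price index in the
# denominator of the demand function is taken as given.
def profit(p, c=1.0, rho=0.5, E=1.0, price_index=10.0):
    sigma = 1.0 / (1.0 - rho)                  # elasticity of substitution
    demand = E * p ** (-sigma) / price_index   # CES demand, index held fixed
    return (p - c) * demand

# Search a fine price grid above marginal cost for the profit maximizer.
best_price = max((i / 100 for i in range(101, 500)), key=profit)
print(best_price)  # close to c / rho = 2.0
```

Because the price index is taken as given, each firm's problem separates and the familiar constant markup over marginal cost emerges.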
Initially firms have a common and constant marginal cost c0, which they can
reduce by adopting a new technology. This switches their costs to c1, with

c1 = c0 / γ,  γ > 1   [3]

where γ is a constant that reflects the efficiency gain. Here our model starts to depart
widely from the one postulated by Götz, since we introduce the additional assumption
that consumers also need to adopt the technology in order to be able to purchase goods
from one of the firms using the new technology. If a consumer is not an adopter he will
only be able to purchase goods from firms that are producing with the old technology.
However we assume that the technical innovation offers backwards compatibility, so
that a consumer that adopted the technology will still be able to purchase goods from
firms that didn’t adopt. Table 5 summarizes the possible interactions.
Table 5: Possible interactions between firms and consumers
(rows: consumer technology; columns: firm technology)

                          Firm adopts              Firm doesn't adopt
Consumer adopts           Purchases at price p1    Purchases at price p0
Consumer doesn't adopt    Doesn't purchase         Purchases at price p0

where p0 = c0/ρ and p1 = c1/ρ are the prices set by non-adopting and adopting firms.
Additionally we define variables to measure adoption shares in each side of the
market: for that purpose let q and s be the adoption rates of the technology among firms
and consumers, respectively.
The final difference with the Götz model is that consumers and firms are not
identical: each agent i has an adoption cost F_i^k, with k ∈ {C, F}, a sunk cost
which they need to incur in order to adopt the technical innovation. These can be
interpreted as learning costs or monetary costs that the agent needs to assume
in order to get up and running with the new technology. We will assume that F_i^k is
identically and independently distributed within both groups, with maximum values F̄^k.
Technology adoption by consumers
Since the market is fragmented, the price index a consumer faces changes depending
on whether or not he or she has adopted the technology. Consumers that have done so
will be able to purchase goods from firms in the [0, n] interval, so their price index can
be written as

P_A = ∫_0^n p(j,t)^(1−σ) dj = qn·p1^(1−σ) + (1−q)n·p0^(1−σ)   [4]

On the other hand, consumers that have not done so will only be able to
purchase from firms which have not adopted the new technology, located
in the [qn, n] interval. Their price index will be

P_N = (1−q)n·p0^(1−σ)   [5]

where p0 = c0/ρ and p1 = c1/ρ are the prices set by non-adopting and adopting firms,
respectively.
Given these assumptions and using [1] and [2] we can describe the value of the
Dixit-Stiglitz index for a consumer that has adopted the technology as

U_A = P_A^(1/(σ−1)) = [qn·p1^(1−σ) + (1−q)n·p0^(1−σ)]^(1/(σ−1))   [6]
For a consumer that has not adopted the technology we have the following
utility:

U_N = P_N^(1/(σ−1)) = [(1−q)n·p0^(1−σ)]^(1/(σ−1))   [7]
Consumers will take on the technology whenever the difference between their post- and pre-adoption utility levels is higher than their adoption cost k_c:

V_A - V_N > k_c    [8]

Having the costs distributed i.i.d. (uniformly, with maximum k_c^max) has the property that the threshold adoption cost at moment t can be used to estimate the adoption share a technology has. If a consumer with adoption cost k̄_c decides to adopt the technology, it must be that all the consumers with adoption costs below that level have already adopted or will choose to do so as well. This way the adoption share for consumers can be expressed as

s = k̄_c / k_c^max    [9]
By plugging [4] and [5] into [6] and [7] (respectively) and then substituting in equation [8] we get the following function for threshold adoption costs (some steps of the process are included in Appendix 4):

k̄_c = (αy/c) n^{(1-α)/α} { (1 - q + q β^{-α/(1-α)})^{(1-α)/α} - (1-q)^{(1-α)/α} }

Finally, by substituting the prices by the expressions in [3] and k̄_c by the one in [9] and then rearranging, we obtain the user adoption share as a function of all the other variables:

s = (αy n^{(1-α)/α} / (c k_c^max)) [ (1 - q + q β^{-α/(1-α)})^{(1-α)/α} - (1-q)^{(1-α)/α} ]    [10]

Equation [10] has several interesting elements. First, we can see how the adoption share among users is positively related to the total number of companies operating in the market. On one hand this reflects the preference for variety consumers have, and their unwillingness to abandon consumption of a portion of the available products in order to adopt the new technology: the bigger the market is on the supply side, the less variety they are giving up when finally adopting. On the other hand, it is due to the fact that a higher number of firms means that a given proportion of adopters entails a bigger increase in purchased volume (and therefore utility) than it would if the total number of firms were lower. The second interesting element is that the expression between the square brackets vanishes when q = 0, which is interpreted as a null adoption share among users. As we will soon see, firms have a similar bias, pointing to the need to have an installed base on day zero for a technology to be able to break into a market. In the empirical case we studied, MP4 had such an installed base, since some music and HD contents had to be offered in the wrapper due to the limitations of AVI. Without these external factors, it's unlikely that the format would have enjoyed the success it finally ended up having. Also of note is the negative relationship the share has with the price of the non-technological goods, c/α: intuition dictates that if the price of the non-technological firms is very high, an innovation that reduces it would be a welcome development, and would be more desirable than in a case where the price was already low. However, we need to keep in mind that the decrease in the price is proportional, that is, higher initial prices mean higher prices after adoption as well. This, together with the fact that after purchasing the technology the consumer won't stop consuming the non-technological goods (love of variety), adds up to a lower relative increase in the consumer's utility. The negative influence of k_c^max is easier to explain, since a higher value for it means that adoption costs are spread over a larger interval: an increase of it means, all else equal, a decrease in the proportion of users that have a utility differential higher than their adoption cost.
Finally, we have to explain what the expression between the square brackets is. It is derived from the adopters' utility function, and it captures the "share effect" technology adoption has in terms of utility. In the first term, (1 - q + q β^{-α/(1-α)})^{(1-α)/α}, the marginal utility of a share q of firms is replaced by the new, higher marginal utility associated with their lower prices. On the other hand, (1-q)^{(1-α)/α} reflects the loss of utility that comes from not adopting and being unable to purchase from firms that are producing with the new technology. In the first case, the impact bigger shares have is positive, increasing adopter utility and therefore the adoption rate, while in the second, bigger shares have a negative effect. We refer to the result of subtracting the non-adopters' share effect from the adopters' as the "net share effect".
Figure 13 shows how the net share effect changes when we vary q and α for a constant level of β.
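The two share effects are easy to evaluate numerically. The sketch below follows the equations as reconstructed above; the normalization y = c = k_c^max = 1 and the example parameter values are illustrative assumptions, not values taken from the empirical part:

```python
# Net share effect and consumer adoption share, following the reconstructed
# equations above. All parameter values here are illustrative assumptions.

def adopter_share_effect(q: float, alpha: float, beta: float) -> float:
    """Utility term of an adopter: a share q of firms sells at the lower price."""
    e = (1 - alpha) / alpha
    return (1 - q + q * beta ** (-alpha / (1 - alpha))) ** e

def non_adopter_share_effect(q: float, alpha: float) -> float:
    """Utility term of a non-adopter, restricted to the (1 - q) share of old firms."""
    return (1 - q) ** ((1 - alpha) / alpha)

def net_share_effect(q: float, alpha: float, beta: float) -> float:
    """The quantity plotted in Figure 13: adopters' minus non-adopters' share effect."""
    return adopter_share_effect(q, alpha, beta) - non_adopter_share_effect(q, alpha)

def consumer_share(q, alpha, beta, n, y=1.0, c=1.0, ckmax=1.0):
    """Equation [10], truncated to [0, 1] since it is a share."""
    s = alpha * y * n ** ((1 - alpha) / alpha) / (c * ckmax) * net_share_effect(q, alpha, beta)
    return min(max(s, 0.0), 1.0)
```

At q = 0 the net share effect is zero regardless of α and β, which is the installed-base problem discussed above; for β = 0.9 and α = 0.5 it reduces to (10/9)q, within the 0 to 1.2 range plotted in Figure 13.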
Technology adoption by firms
To model the adoption decision on the firms' side we will use a similar method as with the consumers, only in this case instead of utility we will use the benefit of each firm i, which we define as

π_i = (p_i - c_i) x_i    [11]
Here the benefit will also need to be calculated separately for adopting and non-adopting firms, since the former are servicing only the subset of technology adopters sE (recall that E was the total number of customers in the market). We obtain the benefit function for firms that adopt the technology by applying [2] and substituting the prices:

π_A = (βc/α - βc) (βc/α)^{-1/(1-α)} [ sEy / ∫_0^n p_j^{-α/(1-α)} dj ]    [12]
Recall that s was the share of consumers that have adopted the technology, and that this is the only group an adopting firm will service. Notice, however, that the integral in the denominator runs over all n firms: that is because adopting consumers also purchase goods from non-adopting firms, so their price index covers the whole market. The benefit for non-adopting firms is
[Figure 13: Net share effects for β = 0.9. The net share effect is plotted against q (0.1 to 0.9), with one curve for each value of α from 0.1 to 0.9.]
π_N = (c/α - c) (c/α)^{-1/(1-α)} [ sEy / ∫_0^n p_j^{-α/(1-α)} dj + (1-s)Ey / ∫_{qn}^n p_j^{-α/(1-α)} dj ]    [13]
Firms will adopt the technology if

π_A - π_N > k_f    [14]
Substituting [12] and [13] in [14] and developing we obtain

(1-α) (Ey/n) [ β^{-α/(1-α)} s / (1 - q + q β^{-α/(1-α)}) - s / (1 - q + q β^{-α/(1-α)}) - (1-s)/(1-q) ] > k_f

which can be rearranged into

k̄_f = ((1-α) Ey / n) [ (β^{-α/(1-α)} - 1) s / (1 - q + q β^{-α/(1-α)}) - (1-s)/(1-q) ]    [15]
Function [15] determines the threshold adoption cost for firms as a function of the other variables. The most striking feature of this solution is that the adoption decision of firms doesn't depend on the initial marginal cost: this was expected, since under this kind of monopolistic competition the markup of firms is p - c = ((1-α)/α) c. The derivative of the markup with respect to the cost is always positive and constant, which means that the markup decreases at a constant rate along with the costs. Given this, the only relevant variable to firms is the variation in the marginal cost and not the cost level itself.
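Written out, with p = c/α as the pre-adoption price (the standard markup rule under this kind of monopolistic competition, as assumed in our reading of the model):

```latex
% Markup under CES monopolistic competition, pricing rule p = c/alpha
p - c \;=\; \frac{c}{\alpha} - c \;=\; \frac{1-\alpha}{\alpha}\, c ,
\qquad
\frac{\partial (p - c)}{\partial c} \;=\; \frac{1-\alpha}{\alpha} \;>\; 0 .
```

After adoption the markup becomes (1-α)βc/α, so the relative change in the markup equals β for any level of c, which is why only β, and not c, appears in equation [15].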
When we derived the adoption function for consumers we saw how their decision was conditioned only by the number of adopting firms, and the proportion of consumers that had adopted was irrelevant to them. However, this is not so in the case of firms, since the decisions of other agents on their side of the market do impact their benefit function, as denoted by the appearance of q in their adoption function.
Since 0 < β < 1 and 0 < α < 1, ∂k̄_f/∂q < 0, which means that the impact q has on the adoption threshold is negative: the higher the share of firms that adopt, the lower the incentive non-adopting ones will have to do so as well. The average number of consumers per company, E/n, has a positive relationship with the threshold adoption costs (which means a positive relationship with the share of adopting firms). This result is related to consumers' love of variety: in more competitive markets the increase in sales volume doesn't make up for the loss of revenue that comes from servicing a smaller share of the market. This result can be related to a market's maturity level: more established markets will be more likely to become locked into the older technology, while firms operating in more concentrated markets will stand to gain more from adoption, since they will experience a higher increase in demand when they lower prices. There is an exception to this point when the market is operating under a monopoly, in which case the firm will never adopt the technology because its income does not vary.
Finally, on the right side of equation [15] we find a fraction: its denominator, 1 - q + q β^{-α/(1-α)}, should be familiar, since it is similar to the expression we found for the consumers' adopter share effect, only in this case it is not reflecting the increase in utility of adoption but the variation in the price index as a function of the proportion of adopting firms. It divides (β^{-α/(1-α)} - 1) s, an expression indicative of the increase in income adopting the technology entails: it measures the variation in the firm's income after adopting the new technology, taking into account the share of adopting consumers as well as the effect a reduction in the price has on the demanded volume. If this fraction is smaller than (1-s)/(1-q), the firm is making less by adopting the technology than by not doing so, and this means a negative value for k̄_f, which we interpret as no adoption on the firms' side. The ratio can be interpreted as the increase in income derived from adoption, corrected by the proportion of adopting firms. The more firms that choose to adopt, the lower its value will be, since more firms will already be competing at lower prices. On the other hand, the value of the ratio increases together with the proportion of adopting consumers, since higher adoption rates on the consumer side entail more demand for the technological goods.
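These comparative statics can be checked numerically. The sketch below implements equation [15] as reconstructed above (the parameter values in the checks are illustrative assumptions) and confirms that the threshold falls as q rises and rises with s:

```python
# Threshold adoption cost for firms, equation [15] as reconstructed above.
# Valid for 0 <= q < 1; the example parameter values are illustrative.

def firm_threshold(q, s, alpha, beta, E, n, y=1.0):
    m = beta ** (-alpha / (1 - alpha))     # demand boost from the lower adopter price
    gain = s * (m - 1) / (1 - q + q * m)   # extra income earned on the s*E adopters
    loss = (1 - s) / (1 - q)               # income forgone on non-adopting consumers
    return (1 - alpha) * E * y / n * (gain - loss)
```

A negative value is read as no adoption on the firms' side; note that the initial marginal cost c does not appear anywhere, as discussed above.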
Fit with empirical evidence
The microeconomic model greatly emphasizes the need for a day-zero installed base for an evolutionary innovation to be able to succeed. This was also the case with MP4, where external factors such as the advent of media centers and portable hardware greatly incentivized adoption, providing an installed base for the successful diffusion of the format. Furthermore, the introduction of adoption costs also seems to reflect reality, providing an explanation for the gradual acceptance and incomplete penetration of the format. The model also seems to capture the cascading and self-reinforcing relation between the adoption shares of firms and consumers: higher proportions on one side reinforce adoption on the other, while, at least on the supply side, actions taken on the same side push in the opposite direction.
Finally, even though it was not explicitly addressed in the empirical part, another way in which our theoretical model parallels the format adoption dynamics in TPB is the importance the maturity of a market has in the success or failure of the new technology: the user base of TPB increased during the whole period studied, which may have facilitated the transition from AVI to MP4. Along the same lines, it captures the incentive some firms have to develop a niche market and keep offering the non-technological good despite the migration of other competitors to the new technology. A low substitution elasticity increases this effect and makes adoption less attractive, since the actual quantities consumed will barely be affected by changes in the prices, while an income effect (coming from the price reduction of a part of the goods traded) increases quantities consumed in general.
As mentioned before, the most important departure point of the model with respect to the empirical part is the inclusion of prices for the goods, which translate into a demanded volume for each good. However, the improvements MP4 offers don't necessarily translate into a quantitative increase in demand but rather into a qualitative improvement of the goods purchased. A far-fetched way of looking at it is in terms of bandwidth (which would be equivalent to income) and some kind of quality/size ratio (which would be equivalent to price). More elegant modifications to the model that take these nuances into account are examined in the discussion section.
9. Simulation
In order to better understand the dynamics of the model we ran several simulations varying some of the parameters described in Section 8. The software we used was NetLogo, an application developed at Northwestern University that facilitates the creation and execution of agent-based models. In order to implement our model we created two sets of agents, consumers and firms. We kept all the parameters for consumers constant across all the simulations, varying only the firm-side and global ones. A total of 5'000 consumers were created, and each of them was assigned an adoption cost in the [0 – 100] range, with all values independently and identically distributed amongst the population. The number of firms varied from simulation to simulation in the [50 – 500] range. Their adoption costs were set in the range [0 – fkmax], where fkmax is a parameter whose value was also varied in different simulations. In order to solve the initial-conditions problem the microeconomic model presents (i.e. that no firm will adopt if no consumers have adopted, and no consumer will adopt if no firm has adopted), a proportion ibc of consumers were made into adopters at random before the simulation started. Tables 6 and 7 describe the parameters and values used in the simulation.
Table 6: List of parameters used in the simulation

Parameter Name   Explanation                                        Value
cons             Number of consumers (E)                            5'000
n                Number of firms (n)                                [50 – 500]
alpha            Substitution elasticity (α)                        [0.1 – 0.9]
beta             Marginal cost reduction (β)                        [0.1 – 0.9]
ckmax            Maximum value for consumer adoption costs          100
fkmax            Maximum value for firm adoption costs              [50 – 500]
ibc              Initial adoption share of consumers                [0 – 0.9]
cost             Marginal cost of production (c)                    [100 – 500]
Table 7: List of variables used in the simulation

Variable Name       Explanation
s                   Adoption share of consumers
q                   Adoption share of firms
kc                  Threshold adoption cost for consumers
kf                  Threshold adoption cost for firms
ConsAdoptionCost    Adoption cost for individual consumers (owned)
FirmAdoptionCost    Adoption cost for individual firms (owned)
The following pseudocode describes the routine executed during each simulation (the actual code used can be found in Appendix 5):
create consumers, assign them ConsAdoptionCost
create firms, assign them FirmAdoptionCost
make a proportion ibc of consumers adopt the technology
while there is no equilibrium
calculate q, s, kc, kf
consumers with ConsAdoptionCost < kc adopt the technology
firms with FirmAdoptionCost < kf adopt the technology
check for equilibrium
loop
When adoption shares for both parts stop varying from cycle to cycle the system
enters a stationary state and the equilibrium condition is considered fulfilled, which in
turn causes the simulation to stop and to return the equilibrium value of the variables as
well as the value of the parameters used in that iteration.
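Because the adoption costs are uniformly distributed, the agent-based loop above can also be written as a deterministic fixed-point iteration on the shares themselves: each cycle maps (s, q) to the thresholds of equations [10] and [15] divided by ckmax and fkmax. The sketch below is a simplified stand-in for the NetLogo routine, built on the closed forms as reconstructed in Section 8; the income level y and the example parameter values are illustrative assumptions:

```python
def simulate(alpha, beta, n, E, cost, ckmax, fkmax, ibc, y=1.0, max_iter=10_000):
    """Iterate the share map until a stationary state (the equilibrium condition)."""
    m = beta ** (-alpha / (1 - alpha))
    e = (1 - alpha) / alpha
    s, q = ibc, 0.0                            # seed with the initial consumer base
    for _ in range(max_iter):
        # consumer threshold (kc) and share, equation [10]
        kc = alpha * y / cost * n ** e * ((1 - q + q * m) ** e - (1 - q) ** e)
        s_new = max(s, min(kc / ckmax, 1.0))   # adoption is irreversible
        # firm threshold (kf) and share, equation [15]; guard the q = 1 limit
        if q < 1.0:
            kf = (1 - alpha) * E * y / n * (s * (m - 1) / (1 - q + q * m) - (1 - s) / (1 - q))
        else:
            kf = fkmax
        q_new = max(q, min(max(kf, 0.0) / fkmax, 1.0))
        if abs(s_new - s) < 1e-9 and abs(q_new - q) < 1e-9:
            break                              # stationary state reached
        s, q = s_new, q_new
    return s, q
```

With one illustrative parameterization, an initial base of 30% of consumers is not enough to trigger firm adoption, while 70% tips the market to full adoption on both sides, reproducing the day-zero installed-base behaviour discussed in Section 8.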
Simulation results
A total of 12’000 simulations were ran, varying the parameters as specified on Table 6.
On the firm side it was immediately evident that the main determinants of adoption
were the values of alpha and beta: high values of beta are a disincentive to adoption,
since as we saw on the previous section they have a direct impact on the firms’ markup.
On the other hand higher values for alpha exert an opposite force and lead to more
9 The actual code used can be found in Appendix 5
45
adoption. Figure 14 shows firm adoption shares for different equilibrium states varying
only alpha and beta.
The rest of the parameters also had an impact on the equilibrium adoption shares, although they condition adoption levels rather than the binary success or failure of a new technology in getting a foothold in a market. As can be appreciated in Figure 15, fkmax has a negative impact on the adoption share, and so does n, resulting in higher equilibrium adoption shares for lower values of both.
[Figure 14: Equilibrium adoption shares for firms with fkmax = 250, n = 500, cost = 250 and ibc = 0.3. The firm adoption share is plotted against alpha (0.05 to 0.95), with one curve per value of beta from 0.1 to 0.9.]

[Figure 15: Equilibrium adoption shares for firms with alpha = 0.75, beta = 0.25, cost = 20, and ibc = 0.3. The firm adoption share is plotted against fkmax (100 to 500), with one curve per value of n from 50 to 500.]
On the consumers’ side the effect of beta on the adoption share is the opposite
as in the firms’ case: a higher reduction in marginal cost translates into lower prices for
consumers, increasing the incentive they have to adopt. On the other hand, lower values
for alpha also favor technology adoption, since a lower substitution elasticity allows
consumers to take full advantage of the decrease in prices for a part of the goods offered
without sacrificing much welfare for the loss of diversity in their consumption basket. It
should be noted that consumers’ adoption decisions are not independent for the firms’:
this is especially noticeable when we look at the equilibrium shares for consumers when
we vary alpha and beta, leaving the rest of parameters constant.
It quickly becomes apparent that if there is no adoption on the firms’ side there
will not be further adoption among consumers either. This becomes clear when we vary
the values of beta: higher values are attractive to consumers and unattractive to firms,
and when we increase its values we see how the effect on the firms’ side dominates the
other one, as evidenced by the lower values for equilibrium shares of consumers. Figure
16 shows the consumer equilibrium shares for the market that was described in Figure
14 (varying also alpha and beta).
[Figure 16: Equilibrium adoption shares for consumers with fkmax = 250, n = 500, cost = 250 and ibc = 0.3. The consumer adoption share is plotted against alpha (0.1 to 0.9), with one curve per value of beta from 0.1 to 0.8.]

Finally, another set of parameters relevant to consumers are the initial marginal cost and the number of firms operating in the market. Figure 17 shows how the equilibrium adoption share varies when we change them, leaving all the remaining parameters constant. As was explained in the previous section, a higher marginal cost translates into higher prices for consumers, which makes it harder to justify the investment necessary to adopt the new technology. On the other hand, a higher number of companies constitutes
an incentive to adoption since even a small proportion of adopting firms can mean a
great variety of products being offered at a lower price, which encourages consumer
adoption. This figure can be compared to Figure 15, since both show equilibria on
different sides for identical markets. Notice how even if the penetration of the
technology is incomplete on the firm side, it can be a sufficient incentive to encourage
adoption by all consumers.
10. Discussion
In the empirical part of this project (Sections 6 and 7) we demonstrated the impact network effects have had on the substitution of AVI by MP4 as the dominant format for containing audiovisual files in TPB. We found significant coefficients for both forces in the case of peers that download, and for the indirect effect in the case of uploaders. Even though the pseudo-R-squared coefficients we ended up obtaining were modest, they are sufficient when one considers that our models only accounted for network effects and not for market conditions, which probably explain a lot of the variance in the sample.
[Figure 17: Equilibrium adoption shares for consumers with fkmax = 250, alpha = 0.75, beta = 0.25 and ibc = 0.3. The consumer adoption share is plotted against cost (25 to 475), with one curve per value of n from 100 to 500.]

Another factor that might have moderated the determination coefficients is the noise present in the sample: due to the technical restrictions, only the files that unambiguously
belonged to a format and television show were selected, a process during which many
observations whose content or format we were not certain of were discarded. If we had used a less basic algorithm and had gotten into content disambiguation, or had even used the magnet links provided in the dump to directly check the extension of each Torrent file, we would have ended up with a more balanced panel and the results of our analysis could have been more reliable overall. Carrying out these kinds of techniques requires a time investment and a level of technical expertise which were not available to us when developing this project and are, all in all, not its purpose.
A possible improvement for the empirical part is related to the data generation
itself. Carrying out our own extraction while taking into account the empirical goals of
our research would enable us to obtain additional fields that were not collected in the
raw database we used for this project. The one containing the name or alias of the
uploader would have been of particular interest to us, since it would provide us a way of
controlling for distinct uploaders across different television shows. Since the author of
the database did upload some of the code he used doing something along those lines
may be feasible in the future. Obtaining additional geographical data of all peers would
also be a useful development: controlling for geographical factors would allow us to
incorporate variables that account for the popularization of the format resulting from the
generalization in the use of mobile devices or HD screens, since this were introduced at
different points in time for different countries. Doing so would allow creating
something akin to a treatment variable, further improving the reliability of the model.
However a regular dump of the TPB website would not suffice garner that information,
since it is not openly available to the public. In order to get access to it the collaboration
of the site administrators would be essential, and given the ambiguous legal status of
some of the files shared it is unlikely that they would be willing to give us access to the
information.
Another interesting way of developing the empirical model would be to set up a discrete choice model similar to the one used by Ohashi (2004) for studying standard competition in the VCR market. This approach separates more elegantly the network effects from the market conditions which push a user to choose either format. In order to implement such an approach additional information would be needed, and it would be useful to focus on a single side of the BitTorrent ecosystem. A nested logit model such as the one developed by Ohashi would probably be unnecessary, but that still leaves the construction of the model for estimating the utility function, for which we would need much more information on both the users and the contents being studied.
A first step towards that end could be diving deeper into the demographic characteristics of the audience of each television show. If we had information on the age, income or education of spectators, additional variables could have been included to model more accurately the trends that, throughout the decade, made MP4 more attractive than AVI. Ratings companies such as Nielsen, Google, IMDB or Yahoo have data which might have been used towards that end; however, it is proprietary and they only make it available in exchange for large fees. On the other hand, broadcast companies do offer some free information to that effect, although it's often through press releases that only contain partial data.
Including sales data of devices compatible with MP4, such as the PlayStation 3 or HD television sets, is another way of giving more accuracy to the time effects and obtaining more reliable estimates for the magnitude of the network effects. Sales figures are an attractive metric to model the increase in utility the format experienced throughout the 2000s. Besides the different introduction dates, an additional complication to this approach is that these platforms may have enjoyed different degrees of popularity in different countries, and lacking a way of segregating our TPB observations along geographic lines complicates the identification of hardware effects.
With the same dataset we could also have studied similar format competition in music instead of audiovisual media. Looking at how MP4 has performed against other audio container formats such as MP3 or FLAC should yield similar results to the ones we observed for video container formats. However, the approach we used of clustering our observations by television show would not be valid in this instance, since downloads of music files have a longer peak cycle. A way to proceed would be using Billboard charts as a way of measuring the popularity of an artist at a given point in time and looking at uploads and downloads of their works taking that into account. Video codec battles, such as DivX versus x264, could also be studied, although sampling for them would be complicated given that x264 files are more likely to be identified as such, since the codec is often used as an alternative way of publicizing a video file as high definition.
As for the microeconomic model, a possible way of furthering its development would be to allow for more differentiation among agents. A way of achieving that goal among consumers would be to do away with their homogeneous income, which would probably present us with interesting dynamics regarding income distribution and technology adoption. Withdrawing their obligation to spend all their income on the goods offered in the monopolistic market is another possibility, which would allow us to have a numeraire and therefore measure more reliably the variation in utility that adopting a technology entails. A final possibility on the consumer side would be the inclusion of a utility bonus tied to the technology, proportional to the quantity consumed from adopting firms. On the firms' side, allowing for variable costs is also a way of including diversity and making the model more realistic. Alternatively, a rank constant could be assigned to each company in order to obtain different levels of consumption for each company.
This project was carried out exclusively with information freely available to the public, and the database used is by no means one-of-a-kind. Staggering amounts of information that can shed light on a myriad of questions raised by the social sciences are at the disposal of anyone interested and with a measure of technical know-how. The information that was finally used in the project is dwarfed by the actual contents of the database, which is already an incomplete copy of the information available on the TPB website. The increasing availability of large public-domain datasets that, after cleansing and preparation, supply almost universal observations opens a very stimulating field for new theory development, as well as for replication of previous research that was based on attitudinal data or smaller samples.
11. Appendix 1 – Code used for database cleanup
NOTE: the character “_” indicates that the line was truncated for it to fit this document
Karel Bilek’s original XML file had the following structure:
<!ELEMENT archive (Torrent*)>
<!ELEMENT Torrent (id, title, magnet, size, seeders,_
leechers, quality?, uploaded, nfo, comments)>
<!ELEMENT quality (up,down)>
<!ELEMENT id (#PCDATA)>
<!ELEMENT magnet (#PCDATA)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT size (#PCDATA)>
<!ELEMENT seeders (#PCDATA)>
<!ELEMENT leechers (#PCDATA)>
<!ELEMENT uploaded (#PCDATA)>
<!ELEMENT nfo (#PCDATA)>
<!ELEMENT when (#PCDATA)>
<!ELEMENT what (#PCDATA)>
<!ELEMENT up (#PCDATA)>
<!ELEMENT down (#PCDATA)>
From the Raw Dataset to Global Dataset
First step was removing all the fields that were not used in the project. For example the
following command gets rid of all the information in the “magnet” field from the file
“raw.dataset.xml”, and saves the output into a new XML document
(“global.dataset.xml”).
sed '/^<magnet>/d' raw.dataset.xml > global.dataset.xml
Several of these commands can be concatenated in order to speed up the process.
In this example we get rid of the “magnet”, “quality” and “leechers” fields.
sed '/^<magnet>/d' raw.dataset.xml | sed '/^<quality>/d' |_
sed '/^<leechers>/d' > global.dataset.xml
In order to count the comments and then remove them we used something similar
to the following command, that counts the comments in each Torrent file and for each
one of them adds the “<countcomm>” field, which contains the actual number.
awk '/^<comment>/ {count++} /^<\/comments>/ {print; print_
"<countcomm>" count "<\/countcomm>"; count=0; next}1'_
raw.dataset.xml > global.dataset.xml
Then the actual comments were deleted following a similar method to the one
used in irrelevant or redundant fields.
From the Global Dataset to the Media Dataset
The only step to go from the Global to the Media Dataset is removing all those files that
do not belong to one of the big four formats. To do so first we created a different XML
file with all the entries that contained patterns that matched the formats we were
studying. For example for MP4 the command was
awk '/^<Torrent>/ {pre_print=1; do_print=0}_ pre_print==1_
{y=x "\n" $0; x=y}; do_print==1 {print}_ /[Mm][Pp]4/_
{do_print=1; pre_print=0; print y; y=""}_ /^<\/Torrent>/_
{do_print=0; y=""; x=""}' global.dataset.xml > mp4.data
In the case of AVI we had to be extra careful since many files contained the
pattern “AVI” without them actually being wrapped in it (for example, the previous
code for AVI would return an entry like “The Aviator.mp4”). For this reason, in that
case the pattern had to be either preceded by a dot (“.”) or an isolated word (surrounded
by blank spaces).
Once we had the XML files for each separate format a new field <format> was
added to them containing the format (obviously), and then all the files were
concatenated into “media.dataset.xml” with the following command
cat mp4.data avi.data mkv.data wmv.data > media.dataset.xml
At this point the “<nfo>” field, containing miscellaneous information on the
Torrent was deleted.
From the Media Dataset to the TV Shows Dataset
The following code gets all the files for the television show “The Walking Dead” and
puts them into the file “TheWalkingDead.data”. It also adds a “<show>” field
(containing in this case “TheWalkingDead”) which was used as the group variable in
the regressions.
awk '/^<Torrent>/ {pre_print=1; do_print=0} pre_print==1_
{y=x "\n" $0; x=y}; do_print==1 {print}_
/[Tt]he.[Ww]alking.[Dd]ead/ {do_print=1; pre_print=0; print_
y; y=""} /[Tt]he[Ww]alking[Dd]ead/ {do_print=1;_
pre_print=0; print y; y=""} /^<\/Torrent>/ {do_print=0;_
y=""; x=""}' media.dataset.xml | awk '/^<countcomm>/_
{print;print "<show>TheWalkingDead<\/show>";next}1' > _
TheWalkingDead.data
Finally all the television show files generated were concatenated into
“shows.dataset.xml” following a similar procedure as the one described before.
12. Appendix 2 – Descriptive statistics for television shows
Show                     Freq.   Average MP4 user share   Average AVI user share   Average MKV user share   Average WMV user share
American Dad 37 0.1477 0.8207 0.0316 0.0000
American Idol 21 0.0660 0.8041 0.0823 0.0476
Archer 20 0.5305 0.1750 0.2945 0.0000
Battlestar Galactica 34 0.1082 0.7162 0.1609 0.0147
Big Bang Theory 44 0.1997 0.4933 0.2843 0.0227
Boston Legal 8 0.1250 0.7500 0.1250 0.0000
Breaking Bad 24 0.1598 0.6691 0.1711 0.0000
CSI 48 0.0487 0.8372 0.1141 0.0000
Chuck 37 0.3522 0.5766 0.0441 0.0270
Cold Case 13 0.0000 0.9650 0.0350 0.0000
Community 25 0.0911 0.7455 0.1434 0.0200
Criminal Minds 38 0.1913 0.6066 0.2020 0.0000
Deadwood 7 0.0000 0.8571 0.1429 0.0000
Desperate Housewives 45 0.0692 0.8522 0.0787 0.0000
Dexter 33 0.1288 0.7135 0.1577 0.0000
Family Guy 59 0.3015 0.5823 0.0989 0.0173
Fringe 38 0.1722 0.5753 0.2525 0.0000
Game Of Thrones 10 0.4153 0.2318 0.3479 0.0050
Gilmore Girls 7 0.0000 1.0000 0.0000 0.0000
Gossip Girl 32 0.2937 0.6166 0.0897 0.0000
How I Met Your Mother 46 0.2493 0.4275 0.3233 0.0000
Jersey Shore 7 0.4603 0.5397 0.0000 0.0000
King Of The Hill 4 0.0000 0.8750 0.1250 0.0000
Last Airbender 13 0.3382 0.3867 0.1185 0.1566
Mad Men 21 0.1021 0.7166 0.1814 0.0000
Masterchef 7 0.4286 0.5714 0.0000 0.0000
Modern Family 27 0.0996 0.7371 0.1633 0.0000
My Name Is Earl 17 0.0000 1.0000 0.0000 0.0000
NCIS 51 0.2410 0.6633 0.0761 0.0196
One Tree Hill 28 0.0134 0.9866 0.0000 0.0000
Prison Break 36 0.0762 0.8232 0.1006 0.0000
Rescue Me 13 0.0833 0.8776 0.0362 0.0028
Robot Chicken 9 0.1453 0.6880 0.1667 0.0000
Scrubs 24 0.2575 0.6650 0.0358 0.0417
Simpsons 66 0.2445 0.6376 0.1179 0.0000
Smallville 46 0.1841 0.7281 0.0878 0.0000
Sons Of Anarchy 4 0.4250 0.0000 0.5750 0.0000
South Park 52 0.2076 0.6715 0.1123 0.0085
Star Wars Clone Wars 33 0.2117 0.4986 0.2844 0.0053
Stargate 59 0.1883 0.6575 0.1494 0.0048
The Closer 10 0.1857 0.8143 0.0000 0.0000
The Mole 4 0.5000 0.5000 0.0000 0.0000
The Office 47 0.2581 0.6005 0.0921 0.0493
The Sopranos 7 0.4286 0.4286 0.1429 0.0000
The Walking Dead 14 0.2229 0.3337 0.4434 0.0000
The West Wing 4 0.0000 1.0000 0.0000 0.0000
Ugly Betty 13 0.0000 1.0000 0.0000 0.0000
Vampire Diaries 26 0.0618 0.7400 0.1597 0.0385
Young Justice 13 0.0587 0.5152 0.4261 0.0000
iCarly 14 0.2143 0.4286 0.3571 0.0000
Show  Freq.  Average MP4 upload share  Average AVI upload share  Average MKV upload share  Average WMV upload share
American Dad 37 0.12 0.78 0.10 0.00
American Idol 21 0.05 0.75 0.08 0.11
Archer 20 0.15 0.57 0.28 0.00
Battlestar Galactica 34 0.08 0.72 0.18 0.01
Big Bang Theory 44 0.16 0.51 0.30 0.02
Boston Legal 8 0.08 0.83 0.08 0.00
Breaking Bad 24 0.18 0.64 0.18 0.00
CSI 48 0.06 0.72 0.20 0.03
Chuck 37 0.10 0.74 0.16 0.00
Cold Case 13 0.09 0.79 0.12 0.00
Community 25 0.04 0.72 0.20 0.04
Criminal Minds 38 0.32 0.57 0.12 0.00
Deadwood 7 0.00 0.78 0.22 0.00
Desperate Housewives 45 0.05 0.83 0.12 0.00
Dexter 33 0.12 0.73 0.14 0.01
Family Guy 59 0.28 0.60 0.10 0.02
Fringe 38 0.12 0.58 0.30 0.00
Game Of Thrones 10 0.19 0.36 0.44 0.00
Gilmore Girls 7 0.00 1.00 0.00 0.00
Gossip Girl 32 0.17 0.66 0.17 0.00
How I Met Your Mother 46 0.20 0.49 0.32 0.00
Jersey Shore 7 0.08 0.77 0.15 0.00
King Of The Hill 4 0.24 0.00 0.76 0.00
Last Airbender 13 0.20 0.58 0.22 0.00
Mad Men 21 0.23 0.55 0.14 0.08
Masterchef 7 0.15 0.73 0.12 0.00
Modern Family 27 0.18 0.75 0.07 0.00
My Name Is Earl 17 0.49 0.13 0.38 0.00
NCIS 51 0.08 0.80 0.07 0.04
One Tree Hill 28 0.17 0.74 0.09 0.01
Prison Break 36 0.06 0.92 0.02 0.00
Rescue Me 13 0.10 0.78 0.11 0.01
Robot Chicken 9 0.09 0.68 0.19 0.04
Scrubs 24 0.13 0.78 0.09 0.00
Simpsons 66 0.25 0.66 0.06 0.03
Smallville 46 0.20 0.60 0.19 0.00
Sons Of Anarchy 4 0.14 0.57 0.29 0.00
South Park 52 0.22 0.64 0.13 0.01
Star Wars Clone Wars 33 0.21 0.64 0.14 0.01
Stargate 59 0.24 0.55 0.20 0.01
The Closer 10 0.12 0.61 0.26 0.00
The Mole 4 0.35 0.34 0.31 0.00
The Office 47 0.16 0.74 0.06 0.04
The Sopranos 7 0.34 0.31 0.27 0.08
The Walking Dead 14 0.48 0.24 0.26 0.02
The West Wing 4 0.26 0.51 0.23 0.00
Ugly Betty 13 0.15 0.56 0.29 0.00
Vampire Diaries 26 0.07 0.85 0.04 0.03
Young Justice 13 0.08 0.56 0.36 0.00
iCarly 14 0.14 0.37 0.49 0.00
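As a sanity check on the two tables, the four format shares for a given show appear to partition that show's files, so each row should sum to one up to rounding. A quick check with a few values transcribed from the rows above:

```python
# Shares transcribed from a few rows of the tables above
# (MP4, AVI, MKV, WMV); each row should sum to ~1 up to rounding.
rows = {
    "Archer (user share)":          (0.5305, 0.1750, 0.2945, 0.0000),
    "Game Of Thrones (user share)": (0.4153, 0.2318, 0.3479, 0.0050),
    "American Idol (upload share)": (0.05, 0.75, 0.08, 0.11),
}
for show, shares in rows.items():
    assert abs(sum(shares) - 1.0) < 0.025, show
```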
13. Appendix 3 – Stata output of the regressions
Stata output for the Hausman test for the User Share specification of MP4
. hausman fixed random
Coefficients
(b) (B) (b-B) sqrt(diag(V_b-V_B))
fixed random Difference S.E.
mp4ibuplag .4599806 .4024638 .0575168 .0035099
tvibusmp4j .012848 .3178492 -.3050012 .0571412
d2005 .0944475 .06784 .0266075 .
d2006 .0668862 .023194 .0436922 .
d2007 .1034385 .0669219 .0365166 .
d2008 .148703 .0865165 .0621865 .
d2009 .0372154 -.0198837 .0570991 .
d2010 .1371569 .0788659 .058291 .
d2011 .0207218 -.0311989 .0519207 .
d2012 .2007474 .1654744 .0352731 .
b = consistent under Ho and Ha; obtained from xtreg
B = inconsistent under Ha, efficient under Ho; obtained from xtreg
Test: Ho: difference in coefficients not systematic
chi2(10) = (b-B)'[(V_b-V_B)^(-1)](b-B)
= 4.37
Prob>chi2 = 0.9293
(V_b-V_B is not positive definite)
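The chi2(10) statistic above is the quadratic form printed in the output. As an illustration only, here is the same computation in NumPy with hypothetical two-coefficient fixed- and random-effects estimates (these numbers are not the thesis data):

```python
import numpy as np

# Hypothetical FE (b) and RE (B) estimates and covariance matrices,
# purely for illustration of the formula shown in the Stata output.
b   = np.array([0.5, 0.1])    # fixed effects: consistent under Ho and Ha
B   = np.array([0.4, 0.3])    # random effects: efficient under Ho
V_b = np.diag([0.05, 0.02])
V_B = np.diag([0.04, 0.01])

d = b - B
# Hausman statistic: (b-B)' [V_b - V_B]^(-1) (b-B)
H = d @ np.linalg.inv(V_b - V_B) @ d
```

Under Ho the statistic follows a chi-squared distribution with as many degrees of freedom as tested coefficients; large values reject the random-effects specification. The values reported in this appendix (4.37 and 8.48 on 10 degrees of freedom) are far below the rejection region, so the random-effects models are retained.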
Stata output for the random effects regression of MP4 user share using the 2004-2012 panel and year dummies
. xtreg mp4usershare mp4ibuplag tvibusmp4j d2005 d2006 d2007 d2008 d2009 d2010
> d2011 d2012, mle
Fitting constant-only model:
Iteration 0: log likelihood = -374.33886
Iteration 1: log likelihood = -365.58808
Iteration 2: log likelihood = -364.79654
Iteration 3: log likelihood = -364.77108
Iteration 4: log likelihood = -364.77103
Fitting full model:
Iteration 0: log likelihood = -293.38403
Iteration 1: log likelihood = -291.94392
Iteration 2: log likelihood = -291.93558
Iteration 3: log likelihood = -291.93558
Random-effects ML regression Number of obs = 1255
Group variable: shows Number of groups = 50
Random effects u_i ~ Gaussian Obs per group: min = 3
avg = 25.1
max = 65
LR chi2(10) = 145.67
Log likelihood = -291.93558 Prob > chi2 = 0.0000
mp4usershare Coef. Std. Err. z P>|z| [95% Conf. Interval]
mp4ibuplag .4122865 .1983688 2.08 0.038 .0234907 .8010822
tvibusmp4j .2704914 .0892177 3.03 0.002 .0956278 .4453549
d2005 .0732896 .2217404 0.33 0.741 -.3613137 .5078928
d2006 .0305307 .2194145 0.14 0.889 -.3995139 .4605753
d2007 .0726784 .2195981 0.33 0.741 -.3577261 .5030828
d2008 .0965474 .2197812 0.44 0.660 -.3342158 .5273107
d2009 -.0099462 .2232052 -0.04 0.964 -.4474203 .427528
d2010 .0897151 .2240411 0.40 0.689 -.3493974 .5288276
d2011 -.0206898 .2242757 -0.09 0.926 -.460262 .4188825
d2012 .1736053 .2351182 0.74 0.460 -.2872179 .6344285
_cons -.0221572 .2164335 -0.10 0.918 -.4463591 .4020448
/sigma_u .0642748 .0175428 .037646 .1097396
/sigma_e .3011312 .0062099 .2892026 .3135517
rho .0435735 .0231201 .0138579 .1112324
Likelihood-ratio test of sigma_u=0: chibar2(01)= 9.09 Prob>=chibar2 = 0.001
. display "pseudo R2=" (e(ll_0)-e(ll))/e(ll_0)
pseudo R2=.19967444
Stata output for the Hausman test for the Uploader Share specification of MP4
. hausman fixed random
Coefficients
(b) (B) (b-B) sqrt(diag(V_b-V_B))
fixed random Difference S.E.
mp4ibuslag .9148485 .9079244 .0069241 .0208008
tvibusmp4j -.0165451 .0086567 -.0252018 .033146
d2005 -.1796632 -.1801901 .0005269 .0172514
d2006 -.1073136 -.1159659 .0086523 .0167336
d2007 -.1483019 -.1632687 .0149667 .0161921
d2008 -.2231884 -.2336473 .0104589 .0171741
d2009 -.1713512 -.1845737 .0132225 .0171765
d2010 -.1935188 -.2097648 .016246 .0172677
d2011 -.2282999 -.2460666 .0177668 .0176648
d2012 -.2830568 -.2971939 .014137 .0178864
b = consistent under Ho and Ha; obtained from xtreg
B = inconsistent under Ha, efficient under Ho; obtained from xtreg
Test: Ho: difference in coefficients not systematic
chi2(10) = (b-B)'[(V_b-V_B)^(-1)](b-B)
= 8.48
Prob>chi2 = 0.5821
Stata output for the random effects regression of MP4 uploader share using the 2004-2012 panel and year dummies
. xtreg mp4uploadshare mp4ibuslag tvibusmp4j d2004 d2005 d2006 d2007 d2008 d2009
> d2010 d2011, mle
Fitting constant-only model:
Iteration 0: log likelihood = -205.44047
Iteration 1: log likelihood = -205.10224
Iteration 2: log likelihood = -205.10104
Fitting full model:
Iteration 0: log likelihood = -181.95025
Iteration 1: log likelihood = -180.34638
Iteration 2: log likelihood = -180.3429
Iteration 3: log likelihood = -180.3429
Random-effects ML regression Number of obs = 1491
Group variable: shows Number of groups = 50
Random effects u_i ~ Gaussian Obs per group: min = 4
avg = 29.8
max = 70
LR chi2(10) = 49.52
Log likelihood = -180.3429 Prob > chi2 = 0.0000
mp4uploadsh~e Coef. Std. Err. z P>|z| [95% Conf. Interval]
mp4ibuslag .9075766 .1694564 5.36 0.000 .5754483 1.239705
tvibusmp4j .0113603 .0641271 0.18 0.859 -.1143265 .1370472
d2004 .2988555 .2001379 1.49 0.135 -.0934076 .6911186
d2005 .118147 .0636625 1.86 0.063 -.0066291 .2429231
d2006 .1818232 .0539996 3.37 0.001 .0759859 .2876605
d2007 .134018 .0468656 2.86 0.004 .0421632 .2258729
d2008 .0638723 .0396789 1.61 0.107 -.0138969 .1416415
d2009 .11274 .0358199 3.15 0.002 .0425343 .1829456
d2010 .0873114 .0304604 2.87 0.004 .0276101 .1470127
d2011 .0508353 .0320134 1.59 0.112 -.0119098 .1135804
_cons -.0960992 .0593363 -1.62 0.105 -.2123962 .0201978
/sigma_u .0800258 .0125805 .0588053 .1089039
/sigma_e .2677043 .0049919 .2580969 .2776693
rho .0820309 .0240443 .0443224 .139962
Likelihood-ratio test of sigma_u=0: chibar2(01)= 52.81 Prob>=chibar2 = 0.000
. display "pseudo R2=" (e(ll_0)-e(ll))/e(ll_0)
pseudo R2=.12071189
Stata output for the random effects regression of AVI user share using the 2004-2012 panel and year dummies
. xtreg aviusershare aviibuplag tvibusavij d2005 d2006 d2007 d2008 d2009 d2010
> d2011 d2012, mle
Fitting constant-only model:
Iteration 0: log likelihood = -614.54308
Iteration 1: log likelihood = -587.16213
Iteration 2: log likelihood = -582.33959
Iteration 3: log likelihood = -581.74952
Iteration 4: log likelihood = -581.72998
Iteration 5: log likelihood = -581.72995
Fitting full model:
Iteration 0: log likelihood = -466.15552
Iteration 1: log likelihood = -463.91868
Iteration 2: log likelihood = -463.90312
Iteration 3: log likelihood = -463.90312
Random-effects ML regression Number of obs = 1255
Group variable: shows Number of groups = 50
Random effects u_i ~ Gaussian Obs per group: min = 3
avg = 25.1
max = 65
LR chi2(10) = 235.65
Log likelihood = -463.90312 Prob > chi2 = 0.0000
aviusershare Coef. Std. Err. z P>|z| [95% Conf. Interval]
aviibuplag .2105681 .1496401 1.41 0.159 -.082721 .5038573
tvibusavij .2947808 .0709902 4.15 0.000 .1556426 .4339189
d2005 -.0836263 .2535782 -0.33 0.742 -.5806304 .4133777
d2006 -.0368414 .2506619 -0.15 0.883 -.5281298 .4544469
d2007 -.168287 .2507594 -0.67 0.502 -.6597665 .3231924
d2008 -.1735844 .2502113 -0.69 0.488 -.6639895 .3168208
d2009 -.1198652 .2538927 -0.47 0.637 -.6174857 .3777553
d2010 -.1883493 .2586176 -0.73 0.466 -.6952305 .3185319
d2011 -.141057 .2595698 -0.54 0.587 -.6498045 .3676905
d2012 -.4311116 .2635305 -1.64 0.102 -.9476218 .0853986
_cons .5532267 .2802638 1.97 0.048 .0039198 1.102534
/sigma_u .0842322 .0195723 .0534185 .1328203
/sigma_e .3444233 .0070843 .3308144 .358592
rho .0564343 .0251837 .0216804 .1249395
Likelihood-ratio test of sigma_u=0: chibar2(01)= 14.01 Prob>=chibar2 = 0.000
. display "pseudo R2=" (e(ll_0)-e(ll))/e(ll_0)
pseudo R2=.20254558
Stata output for the random effects regression of AVI uploader share using the 2004-2012 panel and year dummies
. xtreg aviuploadshare aviibuslag tvibusavij d2005 d2006 d2007 d2008 d2009 d2010
> d2011 d2012 ,mle
Fitting constant-only model:
Iteration 0: log likelihood = -586.84842
Iteration 1: log likelihood = -584.89276
Iteration 2: log likelihood = -584.79917
Iteration 3: log likelihood = -584.79854
Fitting full model:
Iteration 0: log likelihood = -551.7956
Iteration 1: log likelihood = -549.91838
Iteration 2: log likelihood = -549.86191
Iteration 3: log likelihood = -549.8617
Random-effects ML regression Number of obs = 1491
Group variable: shows Number of groups = 50
Random effects u_i ~ Gaussian Obs per group: min = 4
avg = 29.8
max = 70
LR chi2(10) = 69.87
Log likelihood = -549.8617 Prob > chi2 = 0.0000
aviuploadsh~e Coef. Std. Err. z P>|z| [95% Conf. Interval]
aviibuslag .5039363 .1608559 3.13 0.002 .1886645 .8192081
tvibusavij .2637252 .0593309 4.44 0.000 .1474388 .3800116
d2005 .0034731 .2500632 0.01 0.989 -.4866419 .493588
d2006 -.0102054 .2469804 -0.04 0.967 -.4942781 .4738673
d2007 -.0650242 .2473783 -0.26 0.793 -.5498766 .4198283
d2008 -.0289777 .2467597 -0.12 0.907 -.5126179 .4546625
d2009 .0444943 .248639 0.18 0.858 -.4428293 .5318179
d2010 .0872295 .2528111 0.35 0.730 -.4082711 .58273
d2011 .1185005 .2542021 0.47 0.641 -.3797265 .6167274
d2012 .0701628 .2580629 0.27 0.786 -.4356312 .5759568
_cons .141896 .2791412 0.51 0.611 -.4052107 .6890026
/sigma_u .1463492 .0205957 .1110711 .1928323
/sigma_e .3399712 .0063566 .327738 .352661
rho .1563382 .0378037 .0936167 .2418272
Likelihood-ratio test of sigma_u=0: chibar2(01)= 93.12 Prob>=chibar2 = 0.000
. display "pseudo R2=" (e(ll_0)-e(ll))/e(ll_0)
pseudo R2=.05974167
14. Appendix 4 – MKV regressions and comments
Stata output for the random effects regression of MKV user share using the 2004-2012 panel and year dummies
. xtreg mkvusershare mkvibuplag tvibusmkvj d2005 d2006 d2007 d2008 d2009 d2010
> d2011 d2012, mle
Fitting constant-only model:
Iteration 0: log likelihood = -116.86974
Iteration 1: log likelihood = -103.3203
Iteration 2: log likelihood = -102.22466
Iteration 3: log likelihood = -102.18585
Iteration 4: log likelihood = -102.18574
Fitting full model:
Iteration 0: log likelihood = -44.050166
Iteration 1: log likelihood = -43.214718
Iteration 2: log likelihood = -43.211549
Iteration 3: log likelihood = -43.211548
Random-effects ML regression Number of obs = 1255
Group variable: shows Number of groups = 50
Random effects u_i ~ Gaussian Obs per group: min = 3
avg = 25.1
max = 65
LR chi2(10) = 117.95
Log likelihood = -43.211548 Prob > chi2 = 0.0000
mkvusershare Coef. Std. Err. z P>|z| [95% Conf. Interval]
mkvibuplag -.0493801 .1395335 -0.35 0.723 -.3228608 .2241006
tvibusmkvj .3232374 .0771018 4.19 0.000 .1721206 .4743542
d2005 -.0029317 .181876 -0.02 0.987 -.3594021 .3535386
d2006 -.0080571 .179619 -0.04 0.964 -.3601039 .3439897
d2007 .029761 .1795184 0.17 0.868 -.3220885 .3816106
d2008 .0604957 .1786583 0.34 0.735 -.2896682 .4106595
d2009 .0990813 .1788387 0.55 0.580 -.2514362 .4495987
d2010 .1080792 .1828104 0.59 0.554 -.2502226 .466381
d2011 .1747784 .1831672 0.95 0.340 -.1842227 .5337794
d2012 .2084746 .1808754 1.15 0.249 -.1460347 .5629839
_cons .0033926 .1778643 0.02 0.985 -.3452151 .3520002
/sigma_u .0436975 .0110393 .0266328 .0716961
/sigma_e .2477991 .0050423 .2381108 .2578817
rho .0301588 .0149637 .0105102 .0736249
Likelihood-ratio test of sigma_u=0: chibar2(01)= 10.00 Prob>=chibar2 = 0.001
. display "pseudo R2=" (e(ll_0)-e(ll))/e(ll_0)
pseudo R2=.57712741
Stata output for the random effects regression of MKV uploader share using the 2004-2012 panel and year dummies
. xtreg mkvuploadshare mkvibuslag tvibusmkvj d2005 d2006 d2007 d2008 d2009 d2010
> d2011 d2012 , mle
Fitting constant-only model:
Iteration 0: log likelihood = -115.36743
Iteration 1: log likelihood = -113.87002
Iteration 2: log likelihood = -113.79839
Iteration 3: log likelihood = -113.79799
Fitting full model:
Iteration 0: log likelihood = -86.005337
Iteration 1: log likelihood = -84.69051
Iteration 2: log likelihood = -84.668561
Iteration 3: log likelihood = -84.668528
Random-effects ML regression Number of obs = 1491
Group variable: shows Number of groups = 50
Random effects u_i ~ Gaussian Obs per group: min = 4
avg = 29.8
max = 70
LR chi2(10) = 58.26
Log likelihood = -84.668528 Prob > chi2 = 0.0000
mkvuploadsh~e Coef. Std. Err. z P>|z| [95% Conf. Interval]
mkvibuslag .2217118 .2228183 1.00 0.320 -.2150041 .6584276
tvibusmkvj .3341119 .0784412 4.26 0.000 .1803699 .4878538
d2005 .1820709 .1838887 0.99 0.322 -.1783443 .5424862
d2006 .1234724 .1817065 0.68 0.497 -.2326657 .4796105
d2007 .1740835 .181261 0.96 0.337 -.1811816 .5293486
d2008 .2278536 .1806629 1.26 0.207 -.1262392 .5819463
d2009 .1450172 .1811716 0.80 0.423 -.2100725 .500107
d2010 .1333542 .1843655 0.72 0.469 -.2279954 .4947039
d2011 .1713201 .184535 0.93 0.353 -.190362 .5330021
d2012 .1729023 .1862589 0.93 0.353 -.1921584 .537963
_cons -.0463475 .1808427 -0.26 0.798 -.4007926 .3080977
/sigma_u .090553 .013442 .0676936 .1211317
/sigma_e .2499455 .0046692 .2409596 .2592666
rho .1160257 .0309744 .0660476 .1882694
Likelihood-ratio test of sigma_u=0: chibar2(01)= 75.68 Prob>=chibar2 = 0.000
. display "pseudo R2=" (e(ll_0)-e(ll))/e(ll_0)
pseudo R2=.25597519
Some notes on the case of MKV
MKV is interesting in that direct network effects seem to have dominated adoption for
both users and uploaders, with the indirect ones lacking statistical significance in either
case (although in the uploaders’ specification they come close). One of the format’s most
important advantages over its rivals is its almost universal compatibility with video and
audio codecs, present and, thanks to its architecture, future. This makes it an attractive
choice for uploaders, since technical improvements on the codec side can be
accommodated without changing formats. On the user side it is attractive because it
supports most of the features characteristic of retail media (i.e. DVD and Blu-ray), such
as 3D, menus, HD and captions.
A first explanation for what we observe is that MKV, unlike MP4, is strictly an
audiovisual wrapper, so it has a harder time building an installed base: the range of
contents it can offer is more limited. This, together with its lack of compatibility with
devices other than PCs, would partially explain its reliance on direct network effects for
its diffusion. An additional factor that may explain the significance of the direct network
effect in the uploaders’ specification is that the format is sought after and explicitly
requested by users, something future research could check by examining the content of
the comments left on TPB.
Finally, the pseudo coefficients of determination for the MKV regressions were
considerably higher than those for MP4 or AVI, which supports our reading of an
adoption process less conditioned by external factors and driven mostly by network
effects, which is precisely what our econometric model accounts for. It remains to be
seen whether, over time (as users start asking more for the features at which MKV has a
comparative advantage, or as the format’s compatibility improves), its adoption
dynamics will take a form similar to the one we observed for MP4.
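The pseudo coefficient of determination quoted under each Stata output is computed as (ll_0 - ll)/ll_0, i.e. McFadden's pseudo R². A quick check in Python with the log likelihoods reported in the outputs above:

```python
def pseudo_r2(ll_0, ll):
    """McFadden pseudo R^2: (ll_0 - ll) / ll_0, where ll_0 is the
    constant-only log likelihood and ll the full-model log likelihood."""
    return (ll_0 - ll) / ll_0

# Log likelihoods as reported in the Stata outputs:
mkv_user = pseudo_r2(-102.18574, -43.211548)   # MKV user share regression
mp4_user = pseudo_r2(-364.77103, -291.93558)   # MP4 user share regression
```

These reproduce the reported values of .57712741 and .19967444, so the noticeably better fit of the MKV specifications is not a transcription artifact.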
15. Appendix 5 – Theoretical model derivations
Derivation of the price indexes
Consumers that adopt the technology may purchase their goods from all the companies
operating in the market, but will do so at different prices depending on whether or not
the firm is also an adopter. Firms in the [n-qn] interval are not adopters, while firms in
the [qn-0] interval have adopted and sell at the adopters' price. The adopters' price
index therefore aggregates the prices set in both intervals:
[1] (adopters' price index; the displayed expression was lost in the conversion of this copy)
Non-adopting consumers are only able to purchase from non-adopting firms, so their
price index only accounts for companies in the [n-qn] interval, all of which sell at the
same price:
[2] (non-adopters' price index; expression lost in conversion)
Derivation of consumer utilities
Based on the demand each consumer has for each specific good, we defined the utility
of a consumer that has adopted the technology as an aggregate over the goods purchased
from all firms. Applying the same method as before, we divide the companies into those
that have adopted and those that have not, and substituting the price index of expression
[1] we obtain
[3] (utility of an adopting consumer; expression lost in conversion)
Consumers that have not adopted purchase only from companies in the [n-qn] interval,
so all the goods they buy share a single price and their utility simplifies accordingly.
Substituting the price index of equation [2], the expression becomes
[4] (utility of a non-adopting consumer; expression lost in conversion)
Derivation of the consumers’ adoption threshold
The adoption threshold was defined as the difference between the adopters’ and the
non-adopters’ utility functions:
[5] (adoption threshold; expression lost in conversion)
Plugging the expressions of equations [3] and [4] into equation [5] and rearranging
yields the adoption threshold shown in Section 8.
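For reference, the two adoption-cost thresholds that this derivation leads to can be transcribed into standard notation directly from the simulation code in Appendix 6, exactly as that code computes them (writing C for the number of consumers, `cons`, and c for the consumers' cost parameter, `cost`; s and q are the consumer and firm adoption shares):

```latex
k_f = \frac{C}{n}\,(1-\alpha)\,
      \frac{s\,\beta^{\frac{\alpha}{\alpha-1}}-1}
           {1+q\left(\beta^{\frac{\alpha}{\alpha-1}}-1\right)}

k_c = \frac{\alpha}{c}\, n^{\frac{2-\alpha}{\alpha}}
      \left[\left(1-q+q\,\beta^{\frac{\alpha}{\alpha-1}}\right)^{\frac{1-\alpha}{\alpha}}
      - \left(1-q\right)^{\frac{2-\alpha}{\alpha}}\right]
```

A firm adopts when its adoption cost falls below k_f, and a consumer when it falls below k_c.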
16. Appendix 6 – Code used in the simulation
All text after a semicolon corresponds to annotations and has no effect on the execution
of the simulation. The underscore symbol “_” at the end of a line indicates that the line
was truncated for it to fit into this document. Copying and pasting the code into
NetLogo will not work because, due to the application’s restrictions, some of the
parameters are initialized in the simulation interface. However, a fully functioning copy
of the file is available at:
http://tinyurl.com/JNG-FP-SIM
;code for the microeconomic model simulation
breed [firms firm] ;initialize firms class
breed [consumers consumer] ;initialize consumers class
globals [q s kf kc aux stop?] ;initialize global variables
firms-own [tech? fadoptionCost] ;initialize firm-specific variables
consumers-own [adopter? adoptionCost] ;initialize consumer-specific
;variables
to setup ;prepare the simulation
__clear-all-and-reset-ticks
random-seed 6267753 ;set random seed to allow for replication of the
;results
set q 0
set s 0
set aux 0
set stop? false
create-firms n ;create firms
[ set tech? false
;give firms an iid adoption cost
set fadoptionCost (fkmax / n) * aux
set aux aux + 1]
set aux 0
create-consumers cons
[ set adopter? false
;give users an iid adoption cost
set adoptionCost (ckmax / cons) * aux
set aux aux + 1
;random day zero adoption (a proportion ibc of consumers adopts)
;command random 10 returns a random value in the [0-9] interval
if (random 10) + 1 > (10 - ibc) [set adopter? true]]
end
to go ;main procedure, this is executed every cycle
updateGlobalVars ;update global variables
updateBreedVars ;update variables owned by firms or consumers
tick
;stop the simulation if the stopping condition is fulfilled
if stop? [stop]
end
to updateGlobalVars
;set the stopping condition to true if adoption shares don't change
if count consumers with [adopter? = true] / cons = s and count_
firms with [tech? = true] / n = q [set stop? true]
;update adoption shares for firms and consumers
set s count consumers with [adopter? = true] / cons
set q count firms with [tech? = true] / n
;update threshold adoption costs for firms
set kf (cons / n) * (1 - alpha) * (s * (beta ^ (alpha / (alpha -_
1))) - 1) / (1 + q * ((beta ^ (alpha / (alpha - 1))) - 1))
;update threshold adoption costs for customers
set kc (alpha / cost) * (n ^ ((2 - alpha) / alpha)) * (((1 - q + q_
* (beta ^ (alpha / (alpha - 1)))) ^ ((1 - alpha) / alpha)) - (1 - q)_
^ ((2 - alpha) / alpha))
end
to updateBreedVars
;ask consumers that have not adopted yet to do so if their adoption
;cost is smaller than the threshold for consumers
ask consumers with [not adopter?] [if adoptionCost <= kc [set_
adopter? true]]
;ask firms that have not adopted yet to do so if their adoption cost
;is smaller than the threshold for firms
ask firms with [not tech?] [if fadoptionCost <= kf [set tech? true]]
end
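The NetLogo loop above can also be read as a simple fixed-point iteration on the two adoption shares. A compact Python sketch of the same dynamics follows; the parameter values are illustrative (the real ones live in the simulation interface), and, unlike the NetLogo version, day-zero adopters are taken deterministically as the ibc/10 cheapest consumers so the sketch is reproducible:

```python
import numpy as np

def simulate(n=50, cons=100, alpha=0.5, beta=0.5, cost=1.0,
             fkmax=1.0, ckmax=1.0, ibc=6):
    """Iterate the firm/consumer adoption thresholds until shares converge.
    Illustrative parameter values, not the thesis calibration."""
    f_cost = (fkmax / n) * np.arange(n)        # firm adoption costs
    c_cost = (ckmax / cons) * np.arange(cons)  # consumer adoption costs
    firm = np.zeros(n, dtype=bool)
    consumer = np.arange(cons) < cons * ibc / 10.0  # day-zero adopters
    B = beta ** (alpha / (alpha - 1))          # B > 1 whenever beta < 1
    while True:
        s, q = consumer.mean(), firm.mean()
        # threshold adoption costs, as in updateGlobalVars above
        kf = (cons / n) * (1 - alpha) * (s * B - 1) / (1 + q * (B - 1))
        kc = (alpha / cost) * n ** ((2 - alpha) / alpha) * (
            (1 - q + q * B) ** ((1 - alpha) / alpha)
            - (1 - q) ** ((2 - alpha) / alpha))
        consumer |= c_cost <= kc   # agents below the threshold adopt
        firm |= f_cost <= kf
        if consumer.mean() == s and firm.mean() == q:
            return s, q            # shares unchanged: fixed point reached
```

With these illustrative parameters an initial consumer base of 60% tips the market (consumer adoption reaches 1 and most firms follow), while a 30% base stays put, which is the kind of critical-mass behaviour the model studies.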
17. References
ARMSTRONG, M., 2006, ‘Competition in Two-Sided Markets’, The RAND Journal of
Economics, 37(3): 668-691
ARTHUR, W. B., 1989, ‘Competing Technologies, Increasing Returns, and Lock-In by
Historical Events’, Economic Journal, 99: 116-131
ARTHUR, W. B., 1994, Increasing Returns and Path Dependence in the Economy, Ann
Arbor, MI: University of Michigan Press
CHURCH, J. and N. GANDAL, 1992, ‘Network Effects, Software Provision and
Standardization’, Journal of Industrial Economics, 40(1): 85-103
CHURCH, J. and N. GANDAL, 1993, ‘Complementary Network Externalities and
Technological Adoption’, International Journal of Industrial Organization, 11: 239-260
CLEMENTS M.T and H. OHASHI, 2004, ‘Indirect Network Effects and the Product
Cycle: Video Games in the U.S., 1994-2002’, Working Paper, University of Tokyo
CLEMENTS, M.T, 2003, ‘Inefficient Adoption of Technological Standards: Inertia and
Momentum Revisited’, Economic Inquiry, 43(3): 507-518
DAVIS, F. D., 1989, ‘Perceived usefulness, perceived ease of use, and user acceptance
of information technology’, MIS Quarterly, 13(3): 319–340
DIXIT, A. and STIGLITZ J., 1977, ‘Monopolistic Competition and Optimal Product
Diversity’, American Economic Review, 67: 297-308
FARRELL, J. and P. KLEMPERER, 2003, ‘Coordination and Lock-in: Competition
with Switching Costs and Network Effects’, Handbook of Industrial Organization 3,
Amsterdam: North Holland
FARRELL, J. and G. SALONER, 1985, ‘Standardization, Compatibility and
Innovation’, Rand Journal of Economics, 16: 70-83
FARRELL, J. and G. SALONER, 1986, ‘Standardization and Variety’, Economics
Letters, 20: 71-74
FARRELL, J. and G. SALONER, 1986, ‘Installed Base and Compatibility: Innovation,
Pre-announcement and Predation’, American Economic Review, 76(5): 940-955
GÖTZ, G., 1999, ‘Monopolistic competition and the diffusion of new technology’, The
RAND Journal of Economics, 30(4): 679-693
KATZ, M. L. and C. SHAPIRO, 1985, ‘Network Externalities, Competition and
Compatibility’, The American Economic Review, 75(3): 424-440
KATZ, M. L. and C. SHAPIRO, 1986, ‘Technology Adoption in the Presence of
Network Externalities’, Journal of Political Economy, 94(4): 822-841
KATZ, M. L. and C. SHAPIRO, 1994, ‘Systems Competition and Network Effects’,
The Journal of Economic Perspectives, 8: 93-115
OHASHI, H., 2003, ‘The Role of Network Effects in the U.S. VCR Market, 1978-86’,
Journal of Economics and Management Strategy, 12(4): 447-494
ROCHET, J. and J. TIROLE, 2003, ‘Platform Competition in Two-Sided Markets’,
Journal of the European Economic Association, 1: 990-1029
ROCHET, J. and J. TIROLE, 2006, ‘Two-Sided Markets: a Progress Report’, The
RAND Journal of Economics, 37(3): 645-667
RYSMAN, M., 2002, ‘Competition between Networks: A Study of the Market for
Yellow Pages’, Boston University
VARIAN, H. R., FARRELL, J. and C. SHAPIRO, 2004, The Economics of
Information Technology, Cambridge: Cambridge University Press
VARIAN, H. R. and C. SHAPIRO, 1999, ‘The Art of Standards Wars’, California
Management Review, 41(2): 8-32